Nucleic acid-based detection methods should be cost-effective, fast, sensitive, and accurate. The detection platform should also be simple to use and interpret, stable under a wide range of operating conditions (such as temperature, humidity, lighting conditions, and access to infrastructure), and preferably portable and disposable (Yager et al., Nature, 2006, 442, 412-418). Furthermore, such methods should provide the required sensitivity and specificity (Weigl et al., Lab Chip, 2008, 8, 1999-2014). The ability to perform multiplex tests is another important prerequisite for detection methods and devices.
One application of nucleic acid sequence detection is the detection of pathogens. Infectious diseases remain a major cause of morbidity and mortality throughout the world. Conventional and standard methods of pathogen detection include cell culture, PCR, and enzyme immunoassay, which are often labor-intensive and can take from several hours to days to perform (Foudeh et al., Lab Chip, 2012, 12, 3249-3266). In the World Health Organization's 2004 report, infectious diseases were identified as the second leading cause of mortality throughout the world after cardiovascular disease (WHO, The World Health Report 2004, Genève, 2004). This problem is particularly magnified in areas of poor hygiene with limited access to centralized labs for diagnostics and treatments. Even in industrialized nations, there remain issues to be addressed with respect to food industries, pathogen outbreaks, and sexually transmitted diseases. In the political sphere, the threat of biological warfare is also a real possibility. Effective pathogen detection and identification is critical for the prevention and treatment of infectious diseases.
There are other applications for a nucleic acid sequence detection method in addition to pathogen detection and identification. Two examples are detection of a plant from which a toxin is derived (e.g., the castor oil plant, which contains ricin) and single nucleotide polymorphism (SNP) detection for the purposes of human identification or tumor sequencing.
Therefore, there is a need in the art to provide a cost-effective, fast, sensitive, and accurate method of nucleic acid detection and identification that can be carried out in a range of operating conditions. Provided herein are methods and compositions that address this need.
Provided herein are methods and compositions for the detection and identification of target nucleic acids, using target-specific guide nucleic acid (gNA) mediated nuclease system proteins, such as guide RNA (gRNA) mediated CRISPR/Cas system proteins.
In one aspect, provided herein is a method of identifying a target in a sample comprising: (a) contacting nucleic acid from a sample with a plurality of gNA-nucleic acid-guided nuclease system protein complexes, wherein the complexes are targeted to at least one target, and wherein the nucleic acid, the gNA-nucleic acid-guided nuclease system protein complexes, or both the nucleic acid and the gNA-nucleic acid-guided nuclease system protein complexes comprise a label; and (b) identifying the target in the sample, wherein the identifying is achieved by detecting a specific signal from the label, wherein the presence of a specific signal indicates binding of the gNA-nucleic acid-guided nuclease system protein complex to the nucleic acid. In some embodiments, the nucleic acid comprises a label. In some embodiments, the nucleic acid label is an intercalating label. In some embodiments, the nucleic acid is labeled prior to the contacting step. In some embodiments, the nucleic acid is labeled after the contacting step. In some embodiments, a plurality of the gNA-nucleic acid-guided nuclease system protein complexes comprise a label. In some embodiments, a plurality of the gNA-nucleic acid-guided nuclease system protein complexes are labeled prior to the contacting step. In some embodiments, a plurality of the gNA-nucleic acid-guided nuclease system protein complexes are labeled after the contacting step. In some embodiments, the nucleic acid and the plurality of the gNA-nucleic acid-guided nuclease system protein complexes are labeled; in some embodiments, they are both labeled prior to the contacting step; and in some embodiments, they are both labeled after the contacting step. In some embodiments, the nucleic acid is labeled prior to the contacting step, and the plurality of the gNA-nucleic acid-guided nuclease system protein complexes are labeled after the contacting step. 
In some embodiments, the nucleic acid is labeled after the contacting step, and the plurality of the gNA-nucleic acid-guided nuclease system protein complexes are labeled before the contacting step. In some embodiments, the nucleic acid comprises a first label and the gNA-nucleic acid-guided nuclease system protein complexes comprise a second label, wherein the first and second label comprise a donor/acceptor pair for fluorescence resonance energy transfer (FRET). In some embodiments, the contacting is carried out at room temperature. In some embodiments, the identifying comprises detecting a single signal. In some embodiments, the identifying comprises detecting multiple signals. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, CasX, CasY, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the nucleic acid-guided nuclease system protein is a dead nucleic acid-guided nuclease system protein. In some embodiments, the CRISPR/Cas system protein is a dead CRISPR/Cas system protein. In some embodiments, the CRISPR/Cas system protein is dCas9. In some embodiments, the nucleic acid-guided nuclease system protein is a nucleic acid-guided nuclease system nickase protein. In some embodiments, the CRISPR/Cas system protein is a CRISPR/Cas system nickase protein. In some embodiments, the nucleic acid-guided nuclease system protein exhibits reduced off-target binding. In some embodiments, the label is selected from the group consisting of an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group, an aptamer, one member of a binding pair, and combinations thereof. 
In some embodiments, the label is a detectable label. In some embodiments, the sample is selected from the group consisting of a clinical sample, a forensic sample, an environmental sample, a metagenomic sample, and a food sample. In some embodiments, the sample is from a human. In some embodiments, the sample is not processed prior to the contacting. In some embodiments, the nucleic acid, the gNA-nucleic acid-guided nuclease system protein complexes, or both the nucleic acid and the gNA-nucleic acid-guided nuclease system protein complexes comprise multiple labels. In some embodiments, the complexes are targeted to a plurality of targets. In some embodiments, the target is a pathogen. In some embodiments, the pathogen is selected from the pathogens in Table 1. In some embodiments, the complexes targeted to different targets comprise different labels. In some embodiments, the complexes targeted to different targets comprise the same label. In some embodiments, the gNA-nucleic acid-guided nuclease system protein complexes are attached to a substrate. In some embodiments, the nucleic acid-guided nuclease system proteins are attached to a substrate. In some embodiments, the gNAs are attached to a substrate. In some embodiments, the nucleic acid is attached to a substrate. In some embodiments, the substrate is silica, plastic, glass, or metal. In some embodiments, the substrate is a 2-dimensional substrate. In some embodiments, the substrate comprises a 3-dimensional substrate. In some embodiments, the substrate comprises a chamber or is a cylindrical array. In some embodiments, the gNAs or gNA-nucleic acid-guided nuclease system protein complexes are attached to the substrate in a known order. In some embodiments, the substrate is reusable. In some embodiments, the contacting takes place in solution. In some embodiments, the gNA-nucleic acid-guided nuclease system protein complexes, gNAs, or nucleic acid-guided nuclease system proteins are in solution. 
In some embodiments, the gNA-nucleic acid-guided nuclease system protein complexes comprise at least 1 unique gNA. In some embodiments, the nucleic acid is sheared. In some embodiments, the nucleic acid is amplified. In some embodiments, the nucleic acid is blocked at one or more termini with non-extendible nucleotides. In some embodiments, the nucleic acid is further analyzed by sequencing after the identifying step. In some embodiments, a plurality of targets are simultaneously identified. In some embodiments, the method is carried out in less than 24 hours. In some embodiments, the sample is subject to a prior screening step. In some embodiments, the sample is subject to multi-detection steps. In some embodiments, the binding of a gNA-nucleic acid-guided nuclease system protein complex to the nucleic acid indicates the presence of a target. In some embodiments, the lack of binding of a gNA-nucleic acid-guided nuclease system protein complex to a nucleic acid indicates the absence of a target. In some embodiments, the method further comprises generating droplets, wherein a subset of the droplets comprise a gNA-nucleic acid-guided nuclease system protein complex. In some embodiments, a subset of the droplets comprise a gNA-nucleic acid-guided nuclease system protein complex bound to the nucleic acid. In any of the embodiments provided herein, the nucleic acid from the sample comprises DNA. In any of the embodiments provided herein, the nucleic acid from the sample is DNA. In any of the embodiments provided herein, the nucleic acid from the sample comprises RNA. In any of the embodiments provided herein, the nucleic acid from the sample is RNA.
In another aspect, provided herein is a collection of guide nucleic acids (gNAs) targeted to at least one target, wherein a plurality of the gNAs comprise a label. In some embodiments, the collection is targeted to a plurality of targets. In some embodiments, the target is a pathogen. In some embodiments, the pathogen is selected from the pathogens in Table 1. In some embodiments, the gNAs targeted to different targets comprise different labels. In some embodiments, the gNAs targeted to different targets comprise the same labels. In some embodiments, the label is a detectable label. In some embodiments, the label is selected from the group consisting of an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group, an aptamer, one member of a binding pair, and combinations thereof. In some embodiments, the gNAs comprise more than one label. In some embodiments, the gNAs are attached to a substrate. In some embodiments, the substrate is silica, plastic, glass, or metal. In some embodiments, the substrate is a 2-dimensional substrate. In some embodiments, the substrate comprises a 3-dimensional substrate. In some embodiments, the substrate comprises a chamber or is a cylindrical array. In some embodiments, the gNAs are attached to a substrate in a known order. In some embodiments, the substrate is reusable. In some embodiments, the gNAs are in solution. In some embodiments, the collection comprises at least 1 unique gNA. In some embodiments, the gNAs are complexed to nucleic acid-guided nuclease system proteins. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the gNAs are gRNAs. 
In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, CasX, CasY, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the CRISPR/Cas system protein is a dead CRISPR/Cas system protein. In some embodiments, the CRISPR/Cas system protein is dCas9. In some embodiments, the CRISPR/Cas system protein is a CRISPR/Cas9 system nickase protein. In some embodiments, the nucleic acid-guided nuclease system protein exhibits reduced off-target binding. In some embodiments, the nucleic acid-guided nuclease system proteins also comprise a label.
In another aspect, provided herein is a collection of gNA-nucleic acid-guided nuclease system protein complexes, wherein the gNAs are targeted to at least one target, and wherein a plurality of the complexes comprise a label. In some embodiments, a plurality of the gNAs comprise a label. In some embodiments, a plurality of the nucleic acid-guided nuclease system proteins comprise a label. In some embodiments, both a plurality of gNAs and a plurality of nucleic acid-guided nuclease system proteins comprise a label. In some embodiments, the gNAs are targeted to a plurality of targets. In some embodiments, the target is a pathogen. In some embodiments, the pathogen is selected from the pathogens in Table 1. In some embodiments, the gNAs targeted to different targets comprise different labels. In some embodiments, the gNAs targeted to different targets comprise the same label. In some embodiments, the nucleic acid-guided nuclease system proteins comprise different labels. In some embodiments, the nucleic acid-guided nuclease system proteins comprise the same label. In some embodiments, the label is a detectable label. In some embodiments, the label is selected from the group consisting of an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group, an aptamer, one member of a binding pair, and combinations thereof. In some embodiments, the complexes comprise more than one label. In some embodiments, the complexes are attached to a substrate. In some embodiments, the gNAs are attached to the substrate. In some embodiments, the nucleic acid-guided nuclease system proteins are attached to the substrate. In some embodiments, the substrate is silica, plastic, glass, or metal. In some embodiments, the substrate is a 2-dimensional substrate. 
In some embodiments, the substrate is a 3-dimensional substrate. In some embodiments, the substrate comprises a chamber or is a cylindrical array. In some embodiments, the complexes are attached to a substrate in a known order. In some embodiments, the substrate is reusable. In some embodiments, the complexes are in solution. In some embodiments, the collection comprises at least 1 unique gNA. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, CasX, CasY, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the nucleic acid-guided nuclease system protein is a dead nucleic acid-guided nuclease system protein. In some embodiments, the CRISPR/Cas system protein is a dead CRISPR/Cas system protein. In some embodiments, the CRISPR/Cas system protein is dCas9. In some embodiments, the nucleic acid-guided nuclease system protein is a nucleic acid-guided nuclease system nickase protein. In some embodiments, the CRISPR/Cas system protein is a CRISPR/Cas system nickase protein. In some embodiments, the nucleic acid-guided nuclease system protein exhibits reduced off-target binding.
In another aspect, provided herein are kits comprising any one of the collections and compositions described herein.
Provided herein are methods and compositions for the detection and identification of targets (e.g. pathogens), using target-specific (e.g. pathogen-specific) guide nucleic acids (gNAs) such as gRNAs and nucleic acid-guided nuclease system proteins, such as CRISPR/Cas system proteins. In some preferred embodiments, the gNAs are gRNAs and the nucleic acid-guided nuclease system proteins are CRISPR/Cas system proteins, such as Cas9. Whenever methods employing gRNAs and CRISPR/Cas system proteins are discussed herein, related methods employing other appropriate nucleic acid-guided nuclease system proteins and gNAs are also contemplated.
Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.
Numeric ranges are inclusive of the numbers defining the range.
For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. If any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.
As used herein, the singular forms “a”, “an”, and “the” include plural references unless indicated otherwise.
It is understood that aspects and embodiments of the invention described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.
The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se.
The term “nucleic acid”, as used herein, refers to a molecule comprising one or more nucleic acid subunits. A nucleic acid can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), and modified versions of the same. A nucleic acid comprises deoxyribonucleic acid (DNA), ribonucleic acid (RNA), combinations, or derivatives thereof. A nucleic acid may be single-stranded and/or double-stranded.
The nucleic acids comprise “nucleotides”, which, as used herein, is intended to include those moieties that contain purine and pyrimidine bases, and modified versions of the same. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the term “nucleotide” or “polynucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides or polynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
For any of the structural and functional characteristics described herein, methods of determining these characteristics are known in the art.
Provided herein are methods and compositions for the detection and identification of targets. The methods utilize target-specific guide nucleic acids (gNAs) (such as guide RNAs) and nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins such as Cas9). The target contains a nucleic acid, which is utilized for identification and detection.
Contemplated targets include, but are not limited to, species or genus-level detection in a sample (e.g., the presence of human DNA); pathogens (such as those used in illustrative embodiments below); single nucleotide polymorphisms (SNPs), insertions, deletions, tandem repeats, or translocations (e.g., for tumor profiling or diagnosis of genetic disease); identifying markers indicative of an individual (e.g., a human individual), such as human SNPs or STRs (e.g., for identification of an individual's DNA in a forensic sample); potential toxins; or animals, fungi, and plants (e.g., ricin-containing traces of castor plant; raw materials used in food).
In some cases, the genomes of one or more subjects present in a complex sample are substantially identical and can be difficult to resolve using standard technologies. In some cases, the one or more subjects have genomes that are 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 99.99%, or 99.999% identical. In some cases, the one or more subjects are one or more different strains of microorganisms, for example, one or more strains of bacteria, virus, fungus and the like.
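As an illustration of the identity percentages listed above, the following minimal sketch computes percent identity between two sequences. It assumes the sequences have already been aligned to equal length with gaps written as '-'; the function name and the gap convention are illustrative assumptions and are not part of the methods described herein.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: gaps are written as '-'. Columns where both sequences
    have a gap are ignored; a gap against a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Two 10-nt toy sequences differing at one position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Under this convention, two strains whose aligned genomes differ at one position in every thousand would score 99.9% identical.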
Target nucleic acid sequences can comprise one or more genetic features. The one or more genetic features can distinguish one subject from another. A genetic feature as referred to herein can be a genome, a genotype, a haplotype, chromatin, a chromosome, a chromosome locus, chromosomal material, an allele, a gene, a gene cluster, a gene locus, a genetic polymorphism, a genetic mutation, a single nucleotide polymorphism (SNP), a restriction fragment length polymorphism (RFLP), a variable tandem repeat (VTR), a copy number variant (CNV), a microsatellite sequence, a genetic marker, a sequence marker, a sequence tagged site (STS), a plasmid, a transcription unit, a transcription product, a gene expression level, or a genetic expression state. A target nucleic acid sequence can comprise essentially any known genetic feature.
Provided herein are methods and compositions for the detection and identification of pathogens. Pathogens can be bacterial, viral, fungal, algal, or protozoal. Pathogens can be eukaryotic or prokaryotic.
In some embodiments, the pathogen is bacterial. In one embodiment, the pathogen is a gram-negative bacterium. In another embodiment, the pathogen is a gram-positive bacterium. In one embodiment the pathogen is a tuberculosis-causing bacterium. In an exemplary embodiment, the pathogen is Mycobacterium tuberculosis. In another embodiment, the pathogen is a bacterium of the genus Escherichia. In another embodiment, the pathogen is an Escherichia coli (E. coli) bacterium. In an exemplary embodiment, the pathogen is E. coli O157:H7. In an exemplary embodiment, the pathogen is E. coli K12. In an exemplary embodiment, the pathogen is E. coli S88. In an exemplary embodiment, the pathogen is E. coli O45:K1.
In some embodiments, the pathogen is a disease-causing pathogen. In some embodiments, the pathogen causes a foodborne infection. In some embodiments, the pathogen causes a urinary tract infection. In some embodiments, the pathogen causes pneumonia. In some embodiments, the pathogen causes an upper respiratory infection. In some embodiments, the pathogen causes sepsis or septic shock. In some embodiments, the pathogen causes a gastrointestinal illness. In some embodiments, the pathogen is a sexually-transmitted pathogen.
In some embodiments, the pathogen is used as a bioweapon. In some embodiments, such a pathogen is the causative agent of anthrax. In one exemplary embodiment, the pathogen is Bacillus anthracis. In one exemplary embodiment, the pathogen is Yersinia pestis.
In some embodiments, the pathogen is a eukaryotic pathogen (is pathogenic in a eukaryotic organism).
In some embodiments, the pathogen is a mammalian pathogen (can be pathogenic in a mammalian organism). In some embodiments, the pathogen is a human pathogen. In some embodiments, the pathogen is a primate pathogen. In some embodiments, the pathogen is a non-primate pathogen. In some embodiments, the pathogen is a monkey pathogen. In some embodiments, the pathogen is specific for livestock, for example for a horse, a sheep, a cow, a pig, or a donkey. In some embodiments, the pathogen is specific for a domestic animal, for example a cat, a dog, a gerbil, a mouse, or a rat.
In some embodiments, the pathogen is a mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.
In some embodiments, the pathogen is a non-mammalian pathogen (can be pathogenic in a non-mammalian organism).
In some embodiments, the pathogen is a plant pathogen. In some embodiments, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.
In some embodiments, the pathogen is an avian pathogen (is pathogenic in birds and other avian organisms). An avian organism includes, but is not limited to, chicken, turkey, duck and goose.
In exemplary embodiments, Table 1 (adapted from the CDC's “Summary of Notifiable Diseases—United States, 2010” accessed on Feb. 18, 2016 at http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5953a1.htm) provides exemplary disease causing pathogens that can be identified using the methods and compositions provided herein.
Haemophilus influenzae, invasive disease
Streptococcus pneumoniae, invasive disease
Target-Specific gNAs
Provided herein are target-specific guide nucleic acids (gNAs). These target-specific gNAs are utilized for the detection and identification of nucleic acid targets in a sample. The gNAs guide the nucleic acid-guided nuclease proteins to the gNA's cognate target, for binding.
In some embodiments, the target-specific gNAs are capable of being labeled.
Also provided herein is a collection of gNAs targeted to at least one target. In some embodiments, the target is a pathogen. In some embodiments, the target is a SNP.
In some embodiments, a collection of gNAs is targeted to a single target.
In some embodiments, a collection of gNAs is targeted to a plurality of targets.
In some embodiments, a collection comprises gNAs targeted to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100, 200, 250, 300, 400, 500, 600, 700, 750, 800, 900, 1000, 2500, 5000, 7500, or even at least 10,000 targets. In some exemplary embodiments, a collection comprises gNAs targeted to about 1-3, 1-5, 1-10, 1-25, 1-50, 1-75, 1-100, 5-10, 5-25, 5-50, 5-75, 5-100, 10-20, 10-25, 10-50, 10-75, 10-100, 25-50, 25-75, 25-100, 50-75, 50-100, 75-100, 100-150, 100-200, 100-300, 100-400, 100-500, 100-600, 100-700, 100-800, 100-900, 100-1000, 200-300, 200-400, 200-500, 200-600, 200-700, 200-800, 200-900, 200-1000, 300-400, 300-500, 300-600, 300-700, 300-800, 300-900, 300-1000, 400-500, 400-600, 400-700, 400-800, 400-900, 400-1000, 500-600, 500-700, 500-800, 500-900, 500-1000, 600-700, 600-800, 600-900, 600-1000, 700-800, 700-900, 700-1000, 800-900, 800-1000, 900-1000, 500-5000, 500-10,000, 1000-5000, 1000-10,000, or even about 5000-10,000 targets.
In some embodiments, the target-specific gNAs comprise a label.
In some embodiments, a collection of gNAs is targeted to a plurality of targets, and the gNAs targeted to each individual target comprise a different label, or the gNAs targeted to each individual target comprise multiple labels.
In some embodiments, the gNAs in a collection comprise targeting sequences directed to target sequences spaced every 10^6 bp, 10^5 bp, 10^4 bp, 10^3 bp, 10^2 bp, 50 bp, 25 bp or less across the genome of a target organism.
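The fixed-spacing arrangement described above can be sketched as follows. This is a hedged illustration only: it assumes a 20-nt targeting sequence immediately followed by an NGG motif on the scanned strand (the motif commonly associated with S. pyogenes Cas9), scans one strand only, and uses the hypothetical function name `spaced_target_sites`. None of these choices is prescribed by the present description.

```python
import re


def spaced_target_sites(genome: str, spacing: int, protospacer_len: int = 20):
    """Pick at most one candidate target site per `spacing`-bp window.

    Assumption: a candidate site is `protospacer_len` nucleotides
    immediately followed by an NGG motif on the given strand.
    Returns a list of (start_position, targeting_sequence) tuples.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    genome = genome.upper()
    # A lookahead is used so that overlapping candidates are all visited.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % protospacer_len)
    sites = []
    next_window_start = 0
    for match in pattern.finditer(genome):
        if match.start() >= next_window_start:
            sites.append((match.start(), match.group(1)))
            # The next chosen site must fall in a later window.
            next_window_start = (match.start() // spacing + 1) * spacing
    return sites


if __name__ == "__main__":
    toy_genome = "A" * 20 + "TGG" + "C" * 30 + "A" * 20 + "CGG"
    for start, seq in spaced_target_sites(toy_genome, spacing=50):
        print(start, seq)
```

A smaller `spacing` yields a denser collection; windows that contain no candidate site are simply skipped in this sketch.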
In some embodiments, the gNAs in a collection comprise targeting sequences directed to unique sequences in the genome of a target organism.
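One simple way to find candidate unique sequences is to count every fixed-length subsequence and keep those that occur exactly once. The sketch below assumes exact matching on a single strand and a hypothetical helper name `unique_kmers`; a practical design would likely also consider the reverse complement and near matches, which are omitted here.

```python
from collections import Counter


def unique_kmers(genome: str, k: int = 20):
    """Return {kmer: position} for k-mers occurring exactly once.

    Assumption: exact, single-strand matching only.
    """
    genome = genome.upper()
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


if __name__ == "__main__":
    # "ACGT" occurs twice in this toy sequence, so it is excluded.
    print(unique_kmers("ACGTACGTTT", k=4))
```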
In some embodiments, the gNAs are attached to a substrate. In some embodiments the gNAs are attached to the substrate in a known, referenceable, or predetermined order.
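When gNAs are attached in a known order, a signal observed at a given position can be mapped back to the corresponding target. The following sketch illustrates that lookup; the layout, the signal values, and the `threshold` cutoff are hypothetical and would depend on the label and detector actually used.

```python
def targets_from_array(layout, signals, threshold):
    """Map per-position signal readings back to target names.

    layout:    target name for each substrate position, in attachment order
    signals:   measured signal for each position (same order)
    threshold: hypothetical cutoff above which a position counts as bound
    """
    if len(layout) != len(signals):
        raise ValueError("layout and signals must have the same length")
    detected = []
    for target, signal in zip(layout, signals):
        # Report each target once, even if several of its positions light up.
        if signal > threshold and target not in detected:
            detected.append(target)
    return detected


if __name__ == "__main__":
    layout = ["target_A", "target_A", "target_B", "target_C"]
    signals = [0.9, 0.8, 0.1, 0.7]
    print(targets_from_array(layout, signals, threshold=0.5))
```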
In some embodiments the gNAs are in solution.
In some embodiments, the gNAs are complexed with a nucleic acid-guided nuclease system protein. In some embodiments a collection of gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system proteins. In one specific embodiment, a collection of gNAs as provided herein comprises members that exhibit specificity for a dCas9 protein and another dead CRISPR/Cas system protein selected from the group consisting of deadCpf1, deadCas3, deadCas8a-c, deadCas10, deadCse1, deadCsy1, deadCsn2, deadCas4, deadCsm2, and deadCm5.
In some embodiments, a collection of gNAs targeted to at least one target comprises at least 1, at least 5, at least 10, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 750, at least 1000, at least 2500, at least 5000, at least 7500, at least 10,000, at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, at least 750,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, at least 7,500,000, or at least 10,000,000 unique gNAs. In some embodiments, a collection of gNAs targeted to at least one target comprises at least 5-10, 10-50, 50-100, 10-100, 100-500, 100-1000, 100-10,000, 100-100,000, 100-1,000,000, 100-10,000,000, 1000-10,000, 1000-100,000, 1000-1,000,000, 1000-10,000,000, 10,000-100,000, 10,000-1,000,000, 10,000-10,000,000, 100,000-1,000,000, 100,000-10,000,000, or 1,000,000-10,000,000 unique gNAs. In some embodiments a collection of gNAs comprises at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 unique gNAs. In some embodiments a collection of gNAs contains a total of at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 gNAs.
In some embodiments, the gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).
In some embodiments, the target-specific gNA comprises a first RNA component comprising a targeting sequence and a second RNA component comprising a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the first component comprises a crRNA and the second component comprises tracrRNA. In some embodiments, the two components are covalently bound. In some embodiments, the two components are not covalently bound. In some embodiments, the two components are associated with each other.
Pathogen-Specific gNAs
In some embodiments, a collection of gNAs is targeted to a single pathogen, at least one pathogen, or to a plurality of pathogens. In some embodiments, the pathogen causes one or more of the diseases listed in Table 1.
In some embodiments, a collection comprises gNAs targeted to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100, 200, 250, 300, 400, 500, 600, 700, 750, 800, 900, 1000, 2500, 5000, 7500, or even at least 10,000 pathogens. In some exemplary embodiments, a collection comprises gNAs targeted to about 1-3, 1-5, 1-10, 1-25, 1-50, 1-75, 1-100, 5-10, 5-25, 5-50, 5-75, 5-100, 10-20, 10-25, 10-50, 10-75, 10-100, 25-50, 25-75, 25-100, 50-75, 50-100, 75-100, 100-150, 100-200, 100-300, 100-400, 100-500, 100-600, 100-700, 100-800, 100-900, 100-1000, 100-200 targets, 200-300, 200-400, 200-500, 200-600, 200-700, 200-800, 200-900, 200-1000, 300-400, 300-500, 300-600, 300-700, 300-800, 300-900, 300-1000, 400-500, 400-600, 400-700, 400-800, 400-900, 400-1000, 400-600, 400-700, 400-800, 400-900, 400-1000, 500-600, 500-700, 500-800, 500-900, 500-1000, 600-700, 600-800, 600-900, 600-1000, 700-800, 700-900, 700-1000, 800-900, 800-1000, 900-1000, 500-1000, 500-5000, 500-10,000, 1000-5000, 1000-10,000, or even about 5000-10,000 pathogens.
In some embodiments, a collection of gNAs is targeted to a plurality of pathogens of the same genus. In some embodiments, a collection of gNAs is targeted to a plurality of pathogens of different genera.
In some embodiments, a collection of gNAs is targeted to a plurality of pathogens of the same species. In some embodiments, a collection of gNAs is targeted to a plurality of pathogens of different species.
In some embodiments, a collection of gNAs is targeted to a plurality of pathogens of the same serotype. In some embodiments, a collection of gNAs is targeted to a plurality of pathogens of differing serotypes.
In some embodiments, the pathogen-specific gNAs comprise a label.
In some embodiments, a collection of gNAs is targeted to a plurality of pathogens, and the gNAs targeted to each individual pathogen comprise a different label, or the gNAs targeted to each individual pathogen comprise multiple labels.
In an exemplary embodiment, a collection of gNAs is targeted to five pathogens, wherein the pathogens are Ebola, HIV, Dengue, Zika, and Chikungunya. In this exemplary collection, the gNAs specific for Ebola comprise a first label; the gNAs specific for HIV comprise a second label; the gNAs specific for Dengue comprise a third label; the gNAs specific for Zika comprise a fourth label; and the gNAs specific for Chikungunya comprise a fifth label.
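The one-label-per-pathogen scheme of this exemplary collection can be sketched as a simple lookup. The following Python is a minimal illustration only; the label identifiers (label_1 through label_5) and the function name are hypothetical placeholders, since the disclosure refers only to a first through fifth label.

```python
# Hypothetical label identifiers; the text only names a "first" to "fifth" label.
LABEL_BY_PATHOGEN = {
    "Ebola": "label_1",
    "HIV": "label_2",
    "Dengue": "label_3",
    "Zika": "label_4",
    "Chikungunya": "label_5",
}

# Invert the mapping so that an observed label identifies its pathogen.
PATHOGEN_BY_LABEL = {label: name for name, label in LABEL_BY_PATHOGEN.items()}


def pathogens_from_labels(observed_labels):
    """Return the sorted pathogens whose labels were observed.

    Repeated labels are counted once and unknown labels are ignored.
    """
    return sorted(
        PATHOGEN_BY_LABEL[label]
        for label in set(observed_labels)
        if label in PATHOGEN_BY_LABEL
    )
```

For instance, observing label_4 and label_1 in a sample would be read as Zika and Ebola under this assumed mapping.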
In some embodiments, the gNAs in a collection comprise targeting sequences directed to pathogen sequences spaced every 10^6 bp, 10^5 bp, 10^4 bp, 10^3 bp, 10^2 bp, 50 bp, 25 bp or less across the genome of a pathogen.
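As an illustration of regularly spaced targeting sequences, the following Python sketch lists start coordinates at a fixed interval across a genome of a given length. It is a simplification under stated assumptions: the function name and the default 20-nucleotide target length are hypothetical, and a practical design would also have to account for sequence constraints at each position (for example, the presence of a PAM).

```python
def tiled_target_starts(genome_length, spacing, target_length=20):
    """Return 0-based start coordinates of targets placed every `spacing` bp.

    Only starts whose full target fits inside the genome are returned.
    The default target_length of 20 is an assumption for illustration.
    """
    if spacing <= 0 or target_length <= 0:
        raise ValueError("spacing and target_length must be positive")
    last_start = genome_length - target_length
    return list(range(0, last_start + 1, spacing))
```

With a 100 bp sequence and 25 bp spacing, this yields starts at 0, 25, 50, and 75; a sequence shorter than the target length yields no starts.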
In some embodiments, the pathogen-specific gNAs in a collection comprise targeting sequences directed to unique sequences in the genome of a pathogen.
In some embodiments, the pathogen-specific gNAs are attached to a substrate. In some embodiments the pathogen-specific gNAs are attached to the substrate in a known, referenceable, or predetermined order.
In some embodiments the pathogen-specific gNAs are in solution.
In some embodiments, the pathogen-specific gNAs are complexed with a nucleic acid-guided nuclease system protein. In some embodiments a collection of pathogen-specific gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system proteins. In one specific embodiment, a collection of pathogen-specific gNAs as provided herein comprises members that exhibit specificity for a dCas9 protein and another dead Cas/CRISPR system protein selected from the group consisting of deadCpf1, deadCas3, deadCas8a-c, deadCas10, deadCse1, deadCsy1, deadCsn2, deadCas4, deadCsm2, and deadCmr5.
In some embodiments, a collection of pathogen-specific gNAs targeted to at least one pathogen comprises at least 1, at least 5, at least 10, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 750, at least 1000, at least 2500, at least 5000, at least 7500, at least 10,000, at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, at least 750,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, at least 7,500,000, or at least 10,000,000 unique pathogen-specific gNAs. In some embodiments, a collection of pathogen-specific gNAs targeted to at least one pathogen comprises 5-10, 10-50, 50-100, 10-100, 100-500, 100-1000, 100-10,000, 100-100,000, 100-1,000,000, 100-10,000,000, 1000-10,000, 1000-100,000, 1000-1,000,000, 1000-10,000,000, 10,000-100,000, 10,000-1,000,000, 10,000-10,000,000, 100,000-1,000,000, 100,000-10,000,000, or 1,000,000-10,000,000 unique pathogen-specific gNAs. In some embodiments a collection of gNAs comprises at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 unique gNAs. In some embodiments a collection of gNAs contains a total of at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 pathogen-specific gNAs.
In some embodiments, the pathogen-specific gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).
In some embodiments the pathogen-specific gNA comprises a first RNA component comprising a targeting sequence and a second RNA component comprising a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the first component comprises a crRNA and the second component comprises tracrRNA. In some embodiments the two components are covalently bound. In some embodiments, the two components are not covalently bound. In some embodiments, the two components are associated with each other.
Organization of Target-Specific gNAs
The target-specific gNAs provided herein can be organized in regions on a surface. For example, the target-specific gNAs can be organized as spots, blocks, beads, droplets, wells, or other organization structures on an array or other surface or substrate (e.g., beads, plates). Populations of gNAs can be organized in regions on a surface, e.g., in partitions such as wells or droplets. Populations of gNAs can be organized and located with nucleic acid-guided nuclease system proteins to enable binding to targets within the organization.
A given region, whether a spot, block, bead, droplet, well, or other organizational structure, can comprise gNAs targeting a single targeting sequence. In other cases, a given region can comprise a population of guide nucleic acids targeting at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more targeting sequences. Multiple targeting sequences within a single region can each be different targeting sequences for the same target, such as different targeting sequences that each identify a particular species of pathogen. In other cases, multiple targeting sequences within a single region can each be targeting sequences for different targets, such as different members of the same genus.
Subject-specific regions can be distributed on a surface in such a way as to be individually addressable (e.g., individually addressable for detection), such as in discrete spots or clusters. The plurality of gNAs corresponding to a subject-specific region can be arranged into one or more sets of gNAs. Within each set of gNAs, the plurality of gNAs can be identical or they can be different from one another. Within each set, the plurality of gNAs can each comprise a subject-specific feature. The plurality of gNAs within each set can comprise one or more subject-specific features that distinguish one subject from another. In some cases, a subject-specific feature can be a spot or an area on an array, such as a circular, square, or rectangular area. In some cases, a subject-specific feature can be a bead. In some cases, a subject-specific feature can be a series of gNAs labeled with a feature specific tag. The feature specific tag can be, for example, a feature specific barcode or a binding site for a feature specific label. In some instances, features have replicate features. In some instances, the replicate features are identical. In some instances, the replicate features are designed to identify the same target polynucleotides. In some instances, the replicate features are designed to identify the same genome. In some instances, the replicate features are designed to identify any strain within a species. In some cases, the replicate features are designed to identify an individual.
Multiple unique gNAs within a gNAs set or subject-specific region can be located in an area that is smaller than or is comparable in size to the resolution of a detection system employed to detect signal from the device. The area encompassed by multiple unique and ordered gNAs could be less than the resolution of the detection system, equal to the resolution of the detection system, or the area encompassed by all the unique gNAs could be larger as long as the area encompassed by at least 2 of the randomly ordered unique gNAs in the set is roughly equivalent to, or less than, the resolution of the detection system. In such cases, signal from multiple unique gNAs or features can be collected or integrated in one or few pixels, or other resolution elements. Such an approach can achieve similar results as pooling non-identical guide nucleic acids into a single feature.
gNAs can be designed to detect false positives. For example, gNA sets can be designed so that the individual gNAs within one set have one or more bases that are mismatched to individual guide nucleic acids in other gNA sets. In some cases, the gNA sets are complementary. In other cases, the gNA sets are not complementary. In another example, a gNA set can be designed to search for a subject organism that has multiple similar strains. In this example, a gNA set could be added to detect individual sequences contained in strains that are not the target but have genomes that are very close to that of the target, with one or more individual unique characteristics.
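One way to check that gNAs in one set are mismatched against gNAs in another set is to count position-wise differences between targeting sequences. The Python below is a minimal sketch under the assumption that all targeting sequences have equal length and are compared base by base; the function names are hypothetical and no particular mismatch threshold is implied by the disclosure.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_cross_set_mismatches(set_a, set_b):
    """Smallest mismatch count between any guide in set_a and any guide in set_b."""
    return min(mismatches(a, b) for a in set_a for b in set_b)
```

A designer could, for example, require that the value returned by min_cross_set_mismatches be at least one before accepting a pair of gNA sets.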
Regions can each be different including, without limitation, containing a different number of gNAs, different types of gNAs, different subject-specific features targeted by the gNAs, different average representations of unique gNAs, and the like.
Methods of the present disclosure utilize nucleic acid-guided nucleases. As used herein, a “nucleic acid-guided nuclease” is any nuclease that cleaves DNA, RNA or DNA/RNA hybrids, and which uses one or more guide nucleic acids (gNAs) to confer specificity. Nucleic acid-guided nucleases include CRISPR/Cas system proteins as well as non-CRISPR/Cas system proteins.
The nucleic acid-guided nucleases provided herein can be DNA guided DNA nucleases; DNA guided RNA nucleases; RNA guided DNA nucleases; or RNA guided RNA nucleases. The nucleases can be endonucleases. The nucleases can be exonucleases. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-DNA endonuclease. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-RNA endonuclease.
A nucleic acid-guided nuclease system protein-binding sequence is a nucleic acid sequence that binds any protein member of a nucleic acid-guided nuclease system. For example, a CRISPR/Cas system protein-binding sequence is a nucleic acid sequence that binds any protein member of a CRISPR/Cas system.
In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of CAS Class I Type I, CAS Class I Type III, CAS Class I Type IV, CAS Class II Type II, and CAS Class II Type V proteins. In some embodiments, nucleic acid-guided nuclease system proteins comprise CRISPR/Cas system proteins, including proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems. In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of Cas9, CasX, CasY, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and NgAgo. In an exemplary embodiment, the nucleic acid-guided nuclease is Cas9.
In some embodiments, nucleic acid-guided nuclease system proteins can be from any bacterial or archaeal species.
In some embodiments, the nucleic acid-guided nuclease system proteins are from, or are derived from, nucleic acid-guided nuclease system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.
In some embodiments, the nucleic acid-guided nuclease system proteins are naturally occurring. In some embodiments, the nucleic acid-guided nuclease system proteins are engineered. In some embodiments, the nucleic acid-guided nuclease system proteins are isolated, recombinantly produced, or synthetic.
In some embodiments, naturally occurring nucleic acid-guided nuclease system proteins comprise CRISPR/Cas system proteins including Cas9, CasX, CasY, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cmr5.
In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins include catalytically dead nucleic acid-guided nuclease system proteins. The term “catalytically dead” or simply “dead nucleic acid-guided nuclease system protein” generally refers to a nucleic acid-guided nuclease system protein that has, for example in the case of some CRISPR/Cas system proteins, inactivated HNH and RuvC nucleases. Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the gNA), but the protein is unable to cleave or nick the double-stranded DNA. In some embodiments, the catalytically dead nucleic acid-guided nuclease system protein is a catalytically dead Cas9 (dCas9). In one embodiment, a dead nucleic acid-guided nuclease system protein-gNA complex binds to targets determined by the gNA sequence. The bound dead nucleic acid-guided nuclease system protein can prevent cutting by nucleic acid-guided nuclease system proteins while other manipulations proceed.
In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system nickase proteins. A nucleic acid-guided nuclease system nickase refers to a modified version of a nucleic acid-guided nuclease system protein, containing a single inactive catalytic domain. In one embodiment, the nucleic acid-guided nuclease system nickase is Cas9 nickase. A Cas9 nickase may contain a single inactive catalytic domain, for example, either the RuvC- or the HNH-domain. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the gNA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to 2 gNAs that target opposite strands will create a double-strand break in the DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both gNA-nucleic acid-guided nuclease complexes (e.g., Cas9/gRNA complexes) be specifically bound at a site before a double-strand break is formed.
In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include a high-specificity mutant of a nucleic acid-guided nuclease system protein, containing amino acid changes that confer reduced off-target binding to similar but not identical sequences to the target.
In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system fusion proteins. For example, a nucleic acid-guided nuclease system protein may be fused to another protein, for example an activator, a repressor, a nuclease, an enzyme, a fluorescent molecule, a chemical tag, a radioactive tag, or a transposase. For example, the nucleic acid-guided nuclease system protein can be fused to an EGFP or a terminal transferase.
In some embodiments, the nucleic acid-guided nuclease system proteins are attached to a substrate. In some embodiments, the nucleic acid-guided nuclease system proteins are in solution.
In some embodiments, CRISPR/Cas system proteins are used. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.
In some embodiments, CRISPR/Cas system proteins can be from any bacterial or archaeal species.
In some embodiments, the CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic. In some embodiments, examples of CRISPR/Cas system proteins can be naturally occurring, can mimic naturally occurring versions, or can be engineered versions.
In some embodiments, the CRISPR/Cas system proteins are from, or are derived from, CRISPR/Cas system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.
In some embodiments, naturally occurring CRISPR/Cas system proteins can belong to CAS Class I Type I, III, or IV, or CAS Class II Type II or V, and can include Cas9, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and Cpf1.
In an exemplary embodiment, the CRISPR/Cas system protein comprises Cas9.
A “CRISPR/Cas system protein-gNA complex” refers to a complex comprising a CRISPR/Cas system protein and a gNA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.
A CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type CRISPR/Cas system protein. The CRISPR/Cas system protein may have all the functions of a wild type CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
The term “CRISPR/Cas system protein-associated gNA” refers to a gNA. The CRISPR/Cas system protein-associated gNA may exist as isolated NA, or as part of a CRISPR/Cas system protein-gNA complex.
In some embodiments, the CRISPR/Cas system protein nucleic acid-guided nuclease is Cas9 or comprises Cas9. The Cas9 of the present invention can be isolated, recombinantly produced, or synthetic. The Cas9 can be naturally occurring, can mimic naturally occurring versions, or can be an engineered version.
Examples of Cas9 proteins that can be used in the embodiments herein can be found in F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang; “In vivo genome editing using Staphylococcus aureus Cas9,” Nature 520, 186-191 (9 Apr. 2015) doi:10.1038/nature14299, which is incorporated herein by reference.
In some embodiments, the Cas9 is from a Type II CRISPR system derived from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.
In some embodiments, the Cas9 is from a Type II CRISPR system derived from S. pyogenes and the PAM sequence is NGG, located immediately 3′ of the target-specific guide sequence. The PAM sequences of Type II CRISPR systems from exemplary bacterial species can also include: Streptococcus pyogenes (NGG), Staphylococcus aureus (NNGRRT), Neisseria meningitidis (NNNNGATT), Streptococcus thermophilus (NNAGAA) and Treponema denticola (NAAAAC), which are all usable without deviating from the present invention.
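The relationship between a targeting sequence and its PAM can be illustrated with a short search routine. The Python below is a sketch only: it scans a single strand for a stretch of bases immediately followed on the 3′ side by a PAM written in IUPAC notation. The function name and the default 20-nucleotide target length are assumptions for illustration, and the opposite strand would need to be scanned separately (for example, by passing in the reverse complement).

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pam_sites(sequence, pam="NGG", target_length=20):
    """Return (target_start, target, pam_match) tuples found on the given strand.

    A site is reported when `target_length` bases are immediately followed,
    on their 3' side, by a match to the IUPAC `pam` pattern.
    The default target_length of 20 is an assumption for illustration.
    """
    sequence = sequence.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # The zero-width lookahead allows overlapping sites to be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{target_length}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]
```

Calling find_pam_sites with pam="NNGRRT" applies the same search to the Staphylococcus aureus PAM listed above.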
In one exemplary embodiment, Cas9 sequence can be obtained, for example, from the pX330 plasmid (available from Addgene), re-amplified by PCR then cloned into pET30 (from EMD biosciences) to express in bacteria and purify the recombinant 6His tagged protein.
A “Cas9-gNA complex” refers to a complex comprising a Cas9 protein and a gNA. A Cas9 protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type Cas9 protein, e.g., to the Streptococcus pyogenes Cas9 protein. The Cas9 protein may have all the functions of a wild type Cas9 protein, or only one or some of the functions, including binding activity and nuclease activity.
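A percent-identity figure of this kind can be computed from a pairwise alignment. The Python below is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" as the gap character, a gap opposite a residue counts as a mismatch, and columns where both sequences are gapped are skipped. The disclosure does not specify how identity is calculated, and other conventions exist; the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must already be aligned to equal length. Columns in which
    both sequences carry a gap are skipped; a gap opposite a residue is
    counted as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=60.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under these conventions, the toy alignment "MKV-L" against "MKVAL" has four identical columns out of five and would satisfy a 60% threshold but not a 90% threshold.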
The term “Cas9-associated gNA” refers to a gNA as described above. The Cas9-associated gNA may exist isolated, or as part of a Cas9-gNA complex.
In some embodiments, non-CRISPR/Cas system proteins are used in the embodiments provided herein.
In some embodiments, the non-CRISPR/Cas system proteins can be from any bacterial or archaeal species.
In some embodiments, the non-CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.
In some embodiments, the non-CRISPR/Cas system proteins are from, or are derived from, Aquifex aeolicus, Thermus thermophilus, Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, Natronobacterium gregoryi, or Corynebacterium diphtheriae.
In some embodiments, the non-CRISPR/Cas system proteins can be naturally occurring, can mimic naturally occurring versions, or can be engineered versions.
In some embodiments, a naturally occurring non-CRISPR/Cas system protein is NgAgo (Argonaute from Natronobacterium gregoryi).
A “non-CRISPR/Cas system protein-gNA complex” refers to a complex comprising a non-CRISPR/Cas system protein and a gNA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.
A non-CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type non-CRISPR/Cas system protein. The non-CRISPR/Cas system protein may have all the functions of a wild type non-CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
The term “non-CRISPR/Cas system protein-associated gNA” refers to a gNA. The non-CRISPR/Cas system protein-associated gNA may exist as isolated NA, or as part of a non-CRISPR/Cas system protein-gNA complex.
In some embodiments, engineered examples of nucleic acid-guided nucleases include catalytically dead nucleic acid-guided nucleases (CRISPR/Cas system nucleic acid-guided nucleases or non-CRISPR/Cas system nucleic acid-guided nucleases). The term “catalytically dead” generally refers to a nucleic acid-guided nuclease that has inactivated nucleases, for example inactivated HNH and RuvC nucleases. Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the gNA), but the protein is unable to cleave or nick the nucleic acid.
Accordingly, the catalytically dead nucleic acid-guided nuclease allows separation of the mixture into unbound nucleic acids and catalytically dead nucleic acid-guided nuclease-bound fragments. In one exemplary embodiment, a dCas9/gRNA complex binds to the targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed.
In another embodiment, the catalytically dead nucleic acid-guided nuclease can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.
In some embodiments, the catalytically dead nucleic acid-guided nuclease is dCas9, dCpf1, dCas3, dCas8a-c, dCas10, dCse1, dCsy1, dCsn2, dCas4, dCsm2, dCmr5, dCsf1, dC2c2, or dNgAgo.
In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease protein is a dCas9.
In some embodiments, engineered examples of nucleic acid-guided nucleases include nucleic acid-guided nuclease nickases (referred to interchangeably as nickase nucleic acid-guided nucleases).
In some embodiments, engineered examples of nucleic acid-guided nucleases include CRISPR/Cas system nickases or non-CRISPR/Cas system nickases, containing a single inactive catalytic domain.
In some embodiments, the nucleic acid-guided nuclease nickase is a Cas9 nickase, Cpf1 nickase, Cas3 nickase, Cas8a-c nickase, Cas10 nickase, Cse1 nickase, Csy1 nickase, Csn2 nickase, Cas4 nickase, Csm2 nickase, Cmr5 nickase, Csf1 nickase, C2c2 nickase, or an NgAgo nickase.
In one embodiment, the nucleic acid-guided nuclease nickase is a Cas9 nickase.
In some embodiments, a nucleic acid-guided nuclease nickase can be used to bind to a target sequence. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of a target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the gNA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to 2 gNAs that target opposite strands can create a double-strand break in the nucleic acid. This “dual nickase” strategy increases the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA complexes be specifically bound at a site before a double-strand break is formed.
In exemplary embodiments, a Cas9 nickase can be used to bind to a target sequence. The term “Cas9 nickase” refers to a modified version of the Cas9 protein, containing a single inactive catalytic domain, i.e., either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide RNA-hybridized strand or the non-hybridized strand may be cleaved. Cas9 nickases bound to 2 gRNAs that target opposite strands will create a double-strand break in the DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both Cas9/gRNA complexes be specifically bound at a site before a double-strand break is formed.
Capture of DNA can be carried out using a nucleic acid-guided nuclease nickase. In one exemplary embodiment, a nucleic acid-guided nuclease nickase cuts a single strand of double stranded nucleic acid, wherein the double stranded region comprises methylated nucleotides.
In some embodiments, thermostable nucleic acid-guided nucleases are used in the methods provided herein (thermostable CRISPR/Cas system nucleic acid-guided nucleases or thermostable non-CRISPR/Cas system nucleic acid-guided nucleases). In such embodiments, the reaction temperature is elevated, inducing dissociation of the protein; the reaction temperature is then lowered, allowing for the generation of additional cleaved target sequences. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, at least 55% activity, at least 60% activity, at least 65% activity, at least 70% activity, at least 75% activity, at least 80% activity, at least 85% activity, at least 90% activity, at least 95% activity, at least 96% activity, at least 97% activity, at least 98% activity, at least 99% activity, or 100% activity, when maintained at a temperature of at least 75° C. for at least 1 minute. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained for at least 1 minute at least at 75° C., at least at 80° C., at least at 85° C., at least at 90° C., at least at 91° C., at least at 92° C., at least at 93° C., at least at 94° C., at least at 95° C., at least at 96° C., at least at 97° C., at least at 98° C., at least at 99° C., or at least at 100° C. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained at least at 75° C. for at least 1 minute, 2 minutes, 3 minutes, 4 minutes, or 5 minutes. In some embodiments, a thermostable nucleic acid-guided nuclease maintains at least 50% activity when the temperature is elevated and then lowered to 25° C.-50° C. In some embodiments, the temperature is lowered to 25° C., to 30° C., to 35° C., to 40° C., to 45° C., or to 50° C. In one exemplary embodiment, a thermostable enzyme retains at least 90% activity after 1 min at 95° C.
In some embodiments, the thermostable nucleic acid-guided nuclease is thermostable Cas9, thermostable Cpf1, thermostable Cas3, thermostable Cas8a-c, thermostable Cas10, thermostable Cse1, thermostable Csy1, thermostable Csn2, thermostable Cas4, thermostable Csm2, thermostable Cmr5, thermostable Csf1, thermostable C2c2, or thermostable NgAgo.
In some embodiments, the thermostable CRISPR/Cas system protein is thermostable Cas9.
Thermostable nucleic acid-guided nucleases can be isolated after being identified, for example, by sequence homology in the genomes of thermophilic organisms such as Streptococcus thermophilus and Pyrococcus furiosus. Nucleic acid-guided nuclease genes can then be cloned into an expression vector. In one exemplary embodiment, a thermostable Cas9 protein is isolated.
In another embodiment, a thermostable nucleic acid-guided nuclease can be obtained by in vitro evolution of a non-thermostable nucleic acid-guided nuclease. The sequence of a nucleic acid-guided nuclease can be mutagenized to improve its thermostability.
Complexes of Target-Specific gNAs and Nucleic Acid-Guided Nuclease System Proteins
Provided herein are nucleic acid-guided nuclease system proteins that are complexed to target-specific guide nucleic acids (gNAs). The gNAs direct the nucleic acid-guided nuclease system proteins to the target nucleic acid. A nucleic acid-guided nuclease system can be an RNA-guided nuclease system. A nucleic acid-guided nuclease system can be a DNA-guided nuclease system. In some cases, the nucleic acid-guided nuclease system proteins are complexed to target-specific gNAs that direct the nucleic acid-guided nuclease system protein to an RNA target. In some cases, the nucleic acid-guided nuclease system proteins are complexed to target-specific gNAs that direct the nucleic acid-guided nuclease system protein to a DNA target.
Provided herein are target-specific gNAs complexed to nucleic acid-guided nuclease system proteins. Also provided herein are collections of gNA-nucleic acid-guided nuclease system protein complexes, wherein the gNAs are targeted to at least one target (e.g. pathogen-specific gNAs).
In some embodiments, the complexes comprise a label. In some embodiments, the label is a detectable label.
In some embodiments, the gNAs in the collection of complexes are targeted to a plurality of targets.
In some embodiments, the complexes comprise more than one label.
In some embodiments, the complexes are attached to a substrate. In some embodiments, the complexes are attached to a substrate in a known or pre-determined order. In some embodiments, the substrate is reusable.
In some embodiments, the complexes are in solution.
Provided herein are methods and compositions for the detection and identification of a target in a sample, based on the nucleic acid present in the sample.
The contemplated samples include, but are not limited to biological samples, clinical samples, forensic samples, environmental samples, metagenomic samples, and the like.
In some embodiments, the sample is a tumor tissue sample.
In some embodiments, the sample is blood, serum, plasma, mucus, hair, urine, feces, saliva, breath, cerebrospinal fluid, lymph, tissue, skin, or a biopsy.
In some embodiments, the sample is a food sample, for example a sample of meat, dairy, or produce.
In some embodiments, the sample is a fabric sample.
In some embodiments, the sample is a soil sample. In some embodiments, the sample is a rock sample. In some embodiments, the sample is a plant sample.
In some embodiments, the sample is a water sample.
In some embodiments, the sample is an air sample. In some embodiments, the sample is derived from an air filter.
In some embodiments, the sample is a processed sample. In some embodiments, the sample is an unprocessed sample.
In some embodiments, the sample comprises DNA.
In some embodiments, the sample comprises RNA. In some embodiments, the sample comprises RNA and is reverse transcribed to produce cDNA.
In some embodiments, the nucleic acid in the sample is sheared prior to use. In some embodiments, the nucleic acid in the sample is enzymatically sheared. In some embodiments, the nucleic acid in the sample is mechanically sheared.
In some embodiments, the nucleic acid in the sample (e.g. the DNA) is amplified prior to use.
In some embodiments, the nucleic acid in the sample (e.g. the DNA) is circularized prior to use.
In some embodiments, the nucleic acid in the sample (e.g. the DNA) is amplified by rolling circle amplification (RCA).
In some embodiments, the nucleic acid in the sample (e.g. the DNA) is amplified by an isothermal amplification technique, including but not limited to loop-mediated isothermal amplification (LAMP), helicase-dependent amplification, nicking enzyme amplification reaction (NEAR), and recombinase polymerase amplification (RPA).
Generally, the size of the target nucleic acid to be identified ranges from 20 bp to 10^9 bp.
In some embodiments, the nucleic acid in the sample (e.g. the DNA) is blocked with dideoxy CTP (a non-extendable nucleotide). In some embodiments, the nucleic acid in the sample (e.g. the DNA) is blocked at one or more termini with non-extendible nucleotides.
Provided herein are methods and compositions for the detection and identification of targets, using target-specific gNAs and nucleic acid-guided nuclease system proteins. The method comprises contacting a nucleic acid from a sample with target-specific gNA-nucleic acid-guided nuclease system protein complexes. In some embodiments, the nucleic acid comprises, or is, DNA. In some embodiments, the nucleic acid comprises, or is, RNA.
In some embodiments, the target-specific gNAs comprise a label, or are capable of being labeled. Merely by way of example, fluorescent RNA-binding dyes include, but are not limited to, SYBR Green II, OliGreen, and RiboGreen.
In some embodiments, the nucleic acid-guided nuclease system proteins comprise a label, or are capable of being labeled (e.g., comprise a reactive site to which a label can be attached).
In some embodiments, the target-specific gNA-nucleic acid-guided nuclease system protein complexes comprise a label, or are capable of being labeled. In some embodiments, the target-specific gNAs are labeled prior to contacting nucleic acid and the nucleic acid-guided nuclease system protein is labeled after contacting the nucleic acid. In some embodiments, the target-specific gNAs are labeled after contacting the nucleic acid and the nucleic acid-guided nuclease system protein is labeled prior to contacting the nucleic acid. In some embodiments, both the target-specific gNAs and the nucleic acid-guided nuclease system protein are labeled prior to contacting the nucleic acid. In some embodiments, both the target-specific gNAs and the nucleic acid-guided nuclease system protein are labeled after contacting the nucleic acid.
In some embodiments, the target-specific gNAs and the nucleic acid comprise a label, or are capable of being labeled. In some embodiments, the nucleic acid-guided nuclease system proteins and the nucleic acid comprise a label, or are capable of being labeled.
In some embodiments, the target-specific gNA-nucleic acid-guided nuclease system protein complexes and the nucleic acid comprise a label, or are capable of being labeled.
In some embodiments, the same label is used to tag different targets that belong to the same group. For example, in some embodiments, all gNAs specific for bacterial pathogens comprise a first label, all gNAs specific for viral pathogens comprise a second label, and all gNAs specific for fungal pathogens comprise a third label.
In some embodiments, different labels are used to tag different pathogens that belong to the same group. For example, gNAs specific for E. coli bacteria comprise a first label, and gNAs specific for B. subtilis bacteria comprise a second label.
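Merely by way of illustration, the two labeling schemes described above (one label per group, or one label per pathogen within a group) can be sketched as follows. This is a minimal sketch under stated assumptions: the gNA identifiers, the catalogue contents other than the E. coli and B. subtilis example, the placeholder label names, and the function `assign_labels` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue: gNA identifier -> (pathogen group, organism).
GNA_CATALOGUE: Dict[str, Tuple[str, str]] = {
    "gNA-1": ("bacterial", "E. coli"),
    "gNA-2": ("bacterial", "B. subtilis"),
    "gNA-3": ("viral", "hypothetical virus"),
    "gNA-4": ("fungal", "hypothetical fungus"),
}

# Placeholder label names; any distinguishable labels could be substituted.
PALETTE: List[str] = ["first label", "second label", "third label", "fourth label"]


def assign_labels(catalogue: Dict[str, Tuple[str, str]],
                  key_index: int,
                  palette: List[str]) -> Dict[str, str]:
    """Give every gNA sharing the same key (group or organism) the same label.

    key_index 0 groups by pathogen group; key_index 1 groups by organism.
    """
    label_for_key: Dict[str, str] = {}
    assignment: Dict[str, str] = {}
    for gna_id, record in catalogue.items():
        key = record[key_index]
        if key not in label_for_key:
            if len(label_for_key) >= len(palette):
                raise ValueError("not enough distinct labels for this scheme")
            label_for_key[key] = palette[len(label_for_key)]
        assignment[gna_id] = label_for_key[key]
    return assignment


by_group = assign_labels(GNA_CATALOGUE, 0, PALETTE)     # same label within a group
by_organism = assign_labels(GNA_CATALOGUE, 1, PALETTE)  # different label per organism
print(by_group)
print(by_organism)
```

Under the per-group scheme both bacterial gNAs share a label, whereas under the per-organism scheme the E. coli and B. subtilis gNAs are distinguishable from each other.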
In some embodiments, a complex comprising target-specific gNA 102 and a nucleic acid-guided nuclease system protein 103 is labeled with multiple labels or multi-component labels (see, e.g.,
In some embodiments, the target-specific gNAs and the nucleic acid-guided nuclease system proteins are both labeled with the same label.
In some embodiments, target-specific gNAs and the nucleic acid-guided nuclease system proteins are labeled with different labels. In some embodiments, detection and localization of both labels, for example on a substrate, indicates that a complex has been formed between the gNA and the nucleic acid-guided nuclease system protein.
In some embodiments, complexes targeting different targets comprise different labels. In some embodiments, complexes targeting different targets comprise the same label, but are attached to a substrate in a known order. In some embodiments, complexes targeting different targets do not comprise a label.
In some embodiments, the nucleic acid comprises a label, or is capable of being labeled.
In some embodiments, the nucleic acid label is an intercalating label.
In some embodiments, the nucleic acid label is a non-specific nucleic acid-binding label.
In some embodiments, the nucleic acid comprises a nucleic acid dye label. Examples of suitable light-emitting nucleic acid dyes include, but are not limited to, EvaGreen dye, GelRed, GelGreen, SYBR Green I (U.S. Pat. Nos. 5,436,134 and 5,658,751), SYBR GreenEr, SYBR Gold, LC Green, LC Green Plus, BOXTO, BEBO, SYBR DX, SYTO9, SYTOX Blue, SYTOX Green, SYTOX Orange, SYTO dyes, POPO-1, POPO-3, BOBO-1, BOBO-3, YOYO-1, YOYO-3, TOTO-1, TOTO-3, PO-PRO-1, BO-PRO-1, YO-PRO-1, TO-PRO-1, JO-PRO-1, PO-PRO-3, LO-PRO-1, BO-PRO-3, YO-PRO-3, TO-PRO-3, TO-PRO-5, Ethidium Homodimer-1, Ethidium Homodimer-2, Ethidium Homodimer-3, propidium iodide, ethidium bromide, various Hoechst dyes, 4′,6-diamidino-2-phenylindole (DAPI), ResoLight, Chromofy, and acridine homodimer. Other nucleic acid dyes include those disclosed in U.S. Pat. No. 4,883,867 to Lee (1989), U.S. Pat. No. 5,582,977 to Yue et al. (1996), U.S. Pat. No. 5,321,130 to Yue et al. (1994), U.S. Pat. No. 5,410,030 to Yue et al. (1995), U.S. Pat. No. 5,863,753, and U.S. Patent Publication Nos. 2006/0211028 and 2008/0145526. Many of the above-mentioned dyes are commercially available from Invitrogen, Sigma, Biotium and numerous other companies.
In some embodiments, the nucleic acid is labeled prior to being contacted with the target-specific guide NA-nucleic acid-guided nuclease system protein complexes. In some embodiments, the nucleic acid is labeled after being contacted with the target-specific guide NA-nucleic acid-guided nuclease system protein complexes.
In some embodiments, the target-specific gNA-nucleic acid-guided nuclease system protein complexes are labeled prior to being contacted with the nucleic acid. In some embodiments, the target-specific gNA-nucleic acid-guided nuclease system protein complexes are labeled after being contacted with the nucleic acid.
In some embodiments, both the nucleic acid and the target-specific gNA-nucleic acid-guided nuclease system protein complexes are labeled prior to the contacting of the nucleic acid to the complexes. In some embodiments, both the nucleic acid and the target-specific gNA-nucleic acid-guided nuclease system protein complexes are labeled after the contacting of the nucleic acid to the complexes.
In some embodiments, different methods are utilized to detect the label on the nucleic acid and the label on the gNA-nucleic acid-guided nuclease system protein complex.
In some embodiments, the nucleic acid and the gNA-nucleic acid-guided nuclease system protein complex are labeled for signal detection based upon principles of fluorescent resonance energy transfer (FRET). For example, nucleic acid can be labeled with YOYO-1 intercalator dye (donor) and the gNA-nucleic acid-guided nuclease system protein complex can be labeled with Cy3 (acceptor). When the nucleic acid is bound by the gNA-nucleic acid-guided nuclease system protein complex (and a donor/acceptor pair is created), a sensitized Cy3 emission will be detectable. Exemplary donor moieties include, but are not limited to, YOYO-1, Cy5, Cy3, DY-630, DiD, Dy-635, and exemplary acceptor moieties include, but are not limited to, Alexa Fluor® dyes such as Alexa Fluor® 647, Alexa Fluor® 350, Alexa Fluor® 405, and Alexa Fluor® 430.
In some embodiments, the label is a moiety that is further capable of being attached to a label.
In some embodiments, the label is a detectable label.
Contemplated labels include, but are not limited to an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, and combinations thereof.
In specific embodiments, the label is a fluorophore. A fluorophore can be any substance which absorbs light of one wavelength and emits light of a different wavelength. Typical fluorophores include fluorescent dyes, semiconductor nanocrystals, lanthanide chelates, and fluorescent proteins. Exemplary fluorescent dyes include fluorescein, 6-FAM, rhodamine, Texas Red, tetramethylrhodamine, a carboxyrhodamine, carboxyrhodamine 6G, carboxyrhodol, carboxyrhodamine 110, Cascade Blue, Cascade Yellow, coumarin, Cy2 ®, Cy3 ®, Cy3.5 ®, Cy5 ®, Cy5.5 ®, Cy-Chrome, phycoerythrin, PerCP (peridinin chlorophyll-a Protein), PerCP-Cy5.5, JOE (6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein), NED, ROX (5-(and-6)-carboxy-X-rhodamine), HEX, Lucifer Yellow, Marina Blue, Oregon Green 488, Oregon Green 500, Oregon Green 514, Alexa Fluor® 350, Alexa Fluor® 430, Alexa Fluor® 488, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 633, Alexa Fluor® 647, Alexa Fluor® 660, Alexa Fluor® 680, 7-amino-4-methylcoumarin-3-acetic acid, BODIPY FL, BODIPY FL-Br2, BODIPY 530/550, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, BODIPY R6G, BODIPY TMR, BODIPY TR, Quantum Dots, conjugates thereof, and combinations thereof.
Exemplary lanthanide chelates include europium chelates, terbium chelates and samarium chelates.
Exemplary enzymes include alkaline phosphatase, horseradish peroxidase, beta-galactosidase, glucose oxidase, galactose oxidase, neuraminidase, a bacterial luciferase, an insect luciferase and sea pansy luciferase (Renilla koellikeri), which can create a detectable signal in the presence of suitable substrates and assay conditions, known in the art.
Exemplary haptens and/or members of a binding pair include avidin, streptavidin, digoxigenin, biotin, and those described above.
In some embodiments, the nucleic acid, the target-specific gNAs, the nucleic acid-guided nuclease system proteins, or the gNA-nucleic acid-guided nuclease complexes are labeled with different labels.
In some embodiments, detection and localization of labels, for example on a substrate, indicates that a complex has been formed between the gNA and the nucleic acid-guided nuclease system protein. In one exemplary embodiment, the target-specific gNA and the nucleic acid-guided nuclease system protein are labeled for signal detection based upon principles of fluorescent resonance energy transfer (FRET). In such an embodiment, one label is a donor moiety, and the other label is an acceptor moiety. For example, the nucleic acid-guided nuclease system protein label is detectable unless quenched by signal from the target-specific gNA label (or vice versa), which occurs when the target-specific gNA and the nucleic acid-guided nuclease system protein form a complex (donor/acceptor pair). Another FRET pair is a gNA comprising a donor moiety; and a gNA comprising an acceptor moiety.
Another FRET pair is a gNA comprising a donor moiety; and a nucleic acid (e.g. a DNA) comprising an acceptor moiety.
Another FRET pair is a gNA comprising an acceptor moiety; and a nucleic acid (e.g. a DNA) comprising a donor moiety.
Another FRET pair is a nucleic acid-guided nuclease system protein comprising a donor moiety; and a nucleic acid-guided nuclease system protein comprising an acceptor moiety.
Another FRET pair is a nucleic acid-guided nuclease system protein comprising a donor moiety; and a nucleic acid (e.g. a DNA) comprising an acceptor moiety.
Another FRET pair is a nucleic acid (e.g. a DNA) comprising a donor moiety; and a nucleic acid-guided nuclease system protein comprising an acceptor moiety.
In some embodiments, only the gNAs or only the nucleic acid-guided nuclease system proteins comprise FRET labels.
For FRET embodiments, exemplary donor moieties include, but are not limited to, YOYO-1, Cy5, Cy3, DY-630, DiD, Dy-635, and exemplary acceptor moieties include, but are not limited to, Alexa Fluor® dyes such as Alexa Fluor® 647, Alexa Fluor® 350, Alexa Fluor® 405, and Alexa Fluor® 430.
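Merely by way of illustration, the distance dependence that makes a donor/acceptor pair informative about complex formation can be sketched with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where r is the donor-acceptor distance and R0 is the Förster distance of the pair. This relation is general FRET theory and is not recited in this disclosure; the function name `fret_efficiency` and the 5.0 nm value used for R0 below are placeholders for illustration and are not measured values for any dye pair named herein.

```python
def fret_efficiency(distance_nm: float, forster_distance_nm: float) -> float:
    """Energy-transfer efficiency from the standard Forster relation.

    E = 1 / (1 + (r / R0)**6), where r is the donor-acceptor distance and
    R0 is the distance at which transfer efficiency is 50%.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if forster_distance_nm <= 0:
        raise ValueError("Forster distance must be positive")
    return 1.0 / (1.0 + (distance_nm / forster_distance_nm) ** 6)


# Placeholder Forster distance (an assumption, not a value from the text).
R0_NM = 5.0

# Bound (labels close together) versus unbound (labels far apart).
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
```

Because efficiency falls off with the sixth power of distance, a sensitized acceptor signal is expected only when the labeled components are brought into close proximity, for example upon binding.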
Detection
Detection of the labels of the invention can be carried out with standard methods known in the art. For example, detection can be achieved by eye, by detecting a visual color change, by using a spectrophotometer, a fluorescence reader, or a fluorescent microscope. In some embodiments, a hand-held unit is utilized for detection. In some embodiments, signal detection is based upon principles of fluorescent resonance energy transfer (FRET); in such embodiments, a FRET signal can be detected using a FRET channel on a microscope.
In some embodiments, detection can be performed in multiple steps. In some embodiments, detection is performed on multiple detecting systems.
In an exemplary embodiment, a two-step detection is performed. In this embodiment, a label in the solution is initially detected (e.g. a color change), indicating a further detection step is required on the sample.
Substrates
The current invention provides methods and compositions for detecting and identifying a target using target-specific gNAs and nucleic acid-guided nuclease system proteins. Provided herein are a variety of substrates useful for this purpose.
In some embodiments, target-specific gNAs are attached to a substrate. In some embodiments, the target-specific gNAs are attached to a substrate in a known/pre-determined/referenceable order.
In some embodiments, nucleic acid-guided nuclease system proteins are attached to a substrate.
In some embodiments, complexes comprising target-specific gNAs and nucleic acid-guided nuclease system proteins are attached to a substrate.
In some embodiments, the nucleic acid that is analyzed to identify the target is attached to a substrate. In some of the embodiments provided herein, the target nucleic acid comprises, or is, DNA. In some of the embodiments provided herein, the target nucleic acid comprises, or is, RNA.
A gNA, nucleic acid, nucleic acid-guided nuclease system protein, or a complex comprising the same, is attached to a substrate when it is associated with the substrate through a non-random chemical or physical interaction. In some embodiments, the attachment is through a covalent bond. In some embodiments, the nucleic acid is reversibly bound to carboxyl molecules on the surface of the substrate.
The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat.
In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array.
Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material).
In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere.
In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based.
In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers. In some embodiments, the outer surface of the substrate is tethered with sample nucleic acid or target-specific gNA-nucleic acid-guided nuclease system protein complexes. In some embodiments, the interior surface of the substrate is tethered with sample nucleic acid or target-specific gNA-nucleic acid-guided nuclease system protein complexes. In some embodiments, both the outer surface and the interior surface of the substrate are tethered with sample nucleic acid or target-specific gNA-nucleic acid-guided nuclease system protein complexes.
A gNA, nucleic acid, nucleic acid-guided nuclease system protein, or a complex can be attached to a substrate via a linker. A linker is a chemical moiety that is attachable to a substrate on one end and the gNA, nucleic acid, nucleic acid-guided nuclease system protein, or complex on the other end. The linker comprises atoms or molecules that link or bond two entities, but that are not a part of either of the individual linked entities. In general, linker molecules are oligomeric chain moieties containing 1-500 linearly connected chemical bonds. In some embodiments, the linker contains PEG linkers, I-Linker™ (Integrated DNA Technologies) modifiers, amino modifiers, thiol modifiers, etc. In some embodiments, photolabile linkers are used to attach gNAs or nucleic acid-guided nuclease system proteins to the substrate surface, and upon light irradiation, the complexes can be released. In some embodiments, the complexes are released by enzyme digestion or chemical degradation at the site of the linkers.
In some embodiments, the complexes are attached to the substrate via the nucleic acid-guided nuclease system proteins, such as using substrates coated with antibodies against nucleic acid-guided nuclease system proteins, or directly immobilizing the nucleic acid-guided nuclease system proteins on the substrate.
In some embodiments, nucleic acids are attached to the substrate in a known order.
In some embodiments, gNAs are in vitro transcribed on the substrate and then complexed with nucleic acid-guided nuclease system proteins.
In some embodiments, the substrate is reusable. In such embodiments, the substrate can comprise target-specific gNAs or target-specific gNA-nucleic acid-guided nuclease system protein complexes attached to the substrate in a known order. The gNAs and gNA-nucleic acid-guided nuclease system protein complexes can be organized in spots, blocks, beads, droplets, wells, or other organization structures on a given substrate. In some embodiments, the nucleic acid is stripped off, and the gNA-nucleic acid-guided nuclease system protein complexes remain for reuse.
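Merely by way of illustration, attaching gNAs or complexes to a substrate in a known order means that the position of a signal is sufficient to identify the target, even when every complex carries the same label. The following is a minimal sketch of such a position-based readout; the layout, the target names, the signal values, the threshold, and the function `decode_readout` are all hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a spot on the substrate

# Hypothetical layout: which target the gNA at each position is specific for.
LAYOUT: Dict[Position, str] = {
    (0, 0): "target A",
    (0, 1): "target B",
    (1, 0): "target C",
    (1, 1): "no-gNA control",
}


def decode_readout(signals: Dict[Position, float],
                   layout: Dict[Position, str],
                   threshold: float) -> List[str]:
    """Return the layout entries whose spots gave a signal at or above threshold.

    Positions missing from `signals` are treated as having no signal.
    """
    hits: List[str] = []
    for position in sorted(layout):
        if signals.get(position, 0.0) >= threshold:
            hits.append(layout[position])
    return hits


# Made-up normalized signal intensities for one sample.
example_signals = {(0, 0): 0.05, (0, 1): 0.92, (1, 0): 0.88, (1, 1): 0.03}
print(decode_readout(example_signals, LAYOUT, threshold=0.5))
# ['target B', 'target C']
```

The same lookup applies after a reusable substrate is stripped and reloaded with a new sample, because the layout itself does not change between uses.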
A substrate can contain gNAs targeted to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100, 200, 250, 300, 400, 500, 600, 700, 750, 800, 900, 1000, 2500, 5000, 7500, or even at least 10,000 targets. In some exemplary embodiments, a substrate comprises gNAs targeted to about 1-3, 1-5, 1-10, 1-25, 1-50, 1-75, 1-100, 5-10, 5-25, 5-50, 5-75, 5-100, 10-20, 10-25, 10-50, 10-75, 10-100, 25-50, 25-75, 25-100, 50-75, 50-100, 75-100, 100-150, 100-200, 100-300, 100-400, 100-500, 100-600, 100-700, 100-800, 100-900, 100-1000, 200-300, 200-400, 200-500, 200-600, 200-700, 200-800, 200-900, 200-1000, 300-400, 300-500, 300-600, 300-700, 300-800, 300-900, 300-1000, 400-500, 400-600, 400-700, 400-800, 400-900, 400-1000, 500-600, 500-700, 500-800, 500-900, 500-1000, 600-700, 600-800, 600-900, 600-1000, 700-800, 700-900, 700-1000, 800-900, 800-1000, 900-1000, 500-5000, 500-10,000, 1000-5000, 1000-10,000 targets, or even about 5000-10,000 targets.
A substrate can comprise at least 1, at least 5, at least 10, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 750, at least 1000, at least 2500, at least 5000, at least 7500, at least 10,000, at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, at least 750,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, at least 7,500,000, or at least 10,000,000 unique gNAs attached to the substrate. In some embodiments, a substrate comprises at least 5-10, 10-50, 50-100, 10-100, 100-500, 100-1000, 100-10,000, 100-100,000, 100-1,000,000, 100-10,000,000, 1000-10,000, 1000-100,000, 1000-1,000,000, 1000-10,000,000, 10,000-100,000, 10,000-1,000,000, 10,000-10,000,000, 100,000-1,000,000, 100,000-10,000,000, or 1,000,000-10,000,000 unique gNAs attached to the substrate. In some embodiments, a substrate comprises at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 unique gNAs attached to the substrate.
Provided herein are methods for the detection and identification of targets (i.e. target nucleic acids, target DNA and/or target RNA), using target-specific gNAs and nucleic acid-guided nuclease system proteins.
In particular, provided herein is a method of identifying a target in a sample comprising: (a) contacting nucleic acid from a sample with a plurality of gNA-nucleic acid-guided nuclease system protein complexes, wherein the complexes are targeted to at least one target, and wherein the nucleic acid, the gNA-nucleic acid-guided nuclease system protein complexes, or both the nucleic acid and the gNA-nucleic acid-guided nuclease system protein complexes comprise a label; and (b) identifying the target in the sample, wherein the identifying is achieved by detecting a specific signal from the label, wherein the presence of a specific signal indicates binding of a gNA-nucleic acid-guided nuclease system protein complex to the nucleic acid. As provided herein, the nucleic acid of the target can be DNA, RNA, or a mixture of the two.
The methods of the invention can be carried out under any operating conditions. For example, the methods can be carried out at 0° C.-100° C. In some embodiments, the method is carried out at 0° C., 25° C., 37° C., 50° C., 72° C., or even 100° C. In an exemplary embodiment, the method is carried out at room temperature. In an exemplary embodiment, the method is carried out at a temperature in the range of 50° C.-80° C.
The methods of the invention can be carried out in under 10 minutes, in under 15 minutes, in under 30 minutes, in under 60 minutes, in under 90 minutes, in under 120 minutes, in under 150 minutes, in under 180 minutes, in under 210 minutes, in under 240 minutes, in under 270 minutes, in under 300 minutes, in under 330 minutes, in under 360 minutes, in under 7 hours, in under 8 hours, in under 9 hours, in under 10 hours, in under 11 hours, in under 12 hours, in under 15 hours, in under 20 hours, in under 24 hours, or in under 36 hours.
Specific embodiments of the methods of the invention are discussed as exemplary schemes in turn below. It is to be noted that the exemplary schemes illustrate an embodiment of the invention where the target nucleic acid comprises DNA. It is to be understood that this is illustrative only, and that the methods and compositions provided herein are applicable for target identification when the target nucleic acid comprises DNA, RNA, or comprises a mixture of DNA and RNA.
As described and provided herein, the target nucleic acid (e.g., DNA), the gNAs (e.g., gRNAs), the nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins), the gNA-nucleic acid-guided nuclease system protein complexes, or combinations of the same can comprise a label (e.g., 105, 106, 107, 108). This is illustrated, for example, in
In another exemplary embodiment depicted in
In another exemplary embodiment, the target detection method comprises an initial target screening step. In some embodiments, a sample is obtained and stained with nucleic acid dyes. This allows for an initial determination of whether any nucleic acid is present within the sample. In other embodiments, a sample is subjected to an initial screen by incubating the sample, at room temperature, with a first detection marker such as HRP-labeled complexes comprising target-specific gNA and nucleic acid-guided nuclease system proteins. Unbound complexes are washed away. Substrate for the HRP is added and further incubated. If the sample contains the target nucleic acids (e.g. DNA), it will change color. Such a rapid initial screening step can determine which samples require further processing for more detailed identification. In some embodiments, the initial screen is followed by further identification of target DNA within the sample. In this multi-screening context, multi-component tags are particularly useful: for example, the complex could contain an HRP label for initial detection, and a fluorophore label for more detailed identification.
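Merely by way of illustration, the decision logic of the two-step screen described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function `two_step_detection`, its arguments, and the result strings are hypothetical, and the sketch models only the triage decision, not the chemistry.

```python
from typing import List, Optional


def two_step_detection(color_change: bool,
                       fluorescent_hits: Optional[List[str]] = None) -> str:
    """Summarize the outcome of an initial screen plus optional second step.

    color_change:     result of the initial colorimetric screen (e.g. HRP label).
    fluorescent_hits: targets identified in the second, more detailed step,
                      or None if that step has not been run yet.
    """
    if not color_change:
        # No signal in the rapid screen: the sample is not processed further.
        return "negative in initial screen; no further processing"
    if fluorescent_hits is None:
        return "positive in initial screen; detailed identification required"
    if not fluorescent_hits:
        return "positive in initial screen; no target identified in second step"
    return "identified: " + ", ".join(fluorescent_hits)


print(two_step_detection(False))
print(two_step_detection(True))
print(two_step_detection(True, ["target B"]))
```

Only samples that change color in the first step proceed to the second step, which is where a multi-component tag carrying both labels on the same complex is useful.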
In another exemplary embodiment (
In one particular embodiment, depicted for example in
In another embodiment,
In another exemplary embodiment (
In another exemplary embodiment (see
In another exemplary embodiment (
In some embodiments (see, e.g.,
Nucleic acids can be brought into proximity with a substrate by a variety of means in addition to diffusion. Electrophoresis and/or fluid flow can be used to concentrate nucleic acids at or near a surface. Other techniques can also be employed. For example, a surface can have hydrophobic surface chemistry over all or some of its surface (e.g., at features), and target nucleic acids can be tagged with a hydrophobic moiety, leading the nucleic acids to have an energetic preference for the hydrophobic regions of the surface. In another example, target nucleic acids can be tagged with a magnetic particle, and magnetic fields can be used to bring the target nucleic acids toward an array surface.
Volume-excluding compounds can also be used to effectively concentrate sample nucleic acids, such as sample DNA. A volume excluder can be used to exclude sample material from the liquid volume occupied by the volume excluder, thereby concentrating the sample material in the remaining liquid volume. This mechanism can help accelerate capture or binding of sample material, such as hybridization of sample nucleic acids to a substrate. For example, volume excluders can be included in a hybridization buffer to improve hybridization kinetics. Volume excluders can be, for example, beads or polymers, including but not limited to dextran sulfate, ficoll, and polyethylene glycol. Volume excluders can be high molecular weight polymers. Volume excluders can be negatively charged, for example to reduce binding of nucleic acids to the volume excluders.
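Merely by way of illustration, the concentrating effect of a volume excluder can be estimated with simple arithmetic. The following is a minimal sketch under an idealized assumption, namely that the sample material is completely excluded from the fraction of the liquid volume occupied by the excluder; real polymers and beads will deviate from this. The function `effective_concentration` and the example quantities are hypothetical.

```python
def effective_concentration(amount_ng: float,
                            total_volume_ul: float,
                            excluded_fraction: float) -> float:
    """Concentration (ng/uL) in the volume still accessible to the sample.

    Idealized model: the sample occupies only the fraction of the total
    volume that is not taken up by the volume excluder.
    """
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    if not 0.0 <= excluded_fraction < 1.0:
        raise ValueError("excluded fraction must be in [0, 1)")
    accessible_volume_ul = total_volume_ul * (1.0 - excluded_fraction)
    return amount_ng / accessible_volume_ul


# Made-up example: 100 ng of sample DNA in 50 uL of buffer.
without_excluder = effective_concentration(100.0, 50.0, 0.0)
with_excluder = effective_concentration(100.0, 50.0, 0.2)
print(f"{without_excluder:.2f} ng/uL without excluder")
print(f"{with_excluder:.2f} ng/uL with 20% of the volume excluded")
```

In this idealized model, excluding 20% of the volume raises the effective concentration by a factor of 1/(1 - 0.2) = 1.25.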
The present application provides kits comprising any one or more of the compositions and collections described herein, including but not limited to target-specific gNAs, a collection of target-specific gNAs, labeled target-specific gNAs, target-specific gNAs complexed with nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins), target-specific gNAs attached to a substrate, target-specific gNAs complexed with nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) and attached to a substrate, pathogen-specific gNAs, a collection of pathogen-specific gNAs, labeled pathogen-specific gNAs, pathogen-specific gNAs complexed with nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins), pathogen-specific gNAs attached to a substrate, pathogen-specific gNAs complexed with nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) and attached to a substrate, substrates, and the like. In one embodiment, the nucleic acid-guided nuclease system protein is Cas9.
The present application also provides for compositions and kits with all essential reagents and instructions for carrying out the methods of making, labeling, or attaching the target-specific gNAs and the target-specific gNA-nucleic acid-guided nuclease system protein complexes (e.g., gNA-nucleic acid-guided nuclease system protein complexes) to substrates, as described herein. Reagents can include dyes and fluorescent nucleotides necessary for detection.
Also provided herein is computer software for monitoring the information before and after carrying out the methods of detection and identification provided herein.
The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.
In this example, a clinical sample (e.g., blood, swab, cerebrospinal fluid) is obtained and DNA or RNA is extracted (e.g., using a Qiagen Blood DNA or RNA extraction kit). If RNA is extracted, it is converted into cDNA using standard methods. Optionally, the DNA can be sheared using enzymatic or mechanical means. The DNA is then applied to a substrate that contains pathogen-specific gRNAs complexed with dCas9 in a known specific order. A colorimetric or fluorescent readout determines the type of pathogen present (e.g., Ebola, HIV, Mycobacterium tuberculosis, etc.). The specific gRNA library used in the detector can be tailored to detect pathogens in different scenarios (e.g, pneumonia, urinary tract infection, foodborne infections). The rapid readout of the detector shortens the time between sample collection and diagnosis compared to standard methods such as culturing or RT-PCR, and detection can be carried out on-site—for example, in the field at the site of an outbreak. The target DNA can also be eluted from the substrate and subsequently sequenced to obtain additional information about the pathogen.
In this example, an environmental sample (e.g., air filter, soil sample, surface swab) is obtained and DNA or RNA is extracted (e.g., using a MO Bio Soil DNA extraction kit). If RNA is extracted, it is converted into cDNA using standard methods. Optionally, the DNA can be sheared using enzymatic or mechanical means. The DNA is then applied to a substrate, which contains gRNAs targeting a set of potential bioweapon agents, and the colorimetric or fluorescent readout determines the type of pathogenic bioweapon present (e.g., Bacillus anthracis, Yersinia pestis). The rapid readout of the detector shortens the time between sample collection and detection of a threat compared to standard methods such as RT-PCR, and detection can be carried out on-site (e.g., in the field). The target DNA can also be eluted from the substrate and subsequently sequenced to obtain additional information about the pathogen.
In this example, a sample of a bioweapon toxin is obtained, and DNA or RNA is extracted (e.g., using a MO Bio Soil DNA extraction kit). If RNA is extracted, it is converted into cDNA using standard methods. Optionally, the DNA can be sheared using enzymatic or mechanical means. The DNA is then applied to a substrate, which contains gRNAs targeting a set of potential toxin agents. For example, if the toxin were ricin, then detection of DNA from the castor plant itself could allow rapid identification of the toxin, as well as of the possible subspecies and region of origin of the castor plant.
Product-specific gRNAs could be made to identify different lots or origins of materials such that the origin of a product could be tracked. For example, a baked good could contain ingredients from multiple countries of origin, or a beef sample could be contaminated with horsemeat. An array of location-specific gRNAs could direct enforcement agencies to the location of the bulk ingredients for a product. It could also be helpful in the detection of counterfeit goods that are smuggled into the country by identifying DNA markers on the product packaging.
In this example, a sample containing human DNA (e.g., forensic sample) is obtained and DNA is extracted (e.g., using a QIAamp DNA Micro kit). Optionally, the DNA can be sheared using enzymatic or mechanical means. The DNA is then applied to a substrate, which contains gRNAs targeting a set of SNPs selected for human identification, and the colorimetric or fluorescent readout can be selected so as to be unique for each individual. The gRNAs would be selected such that different SNP alleles would produce differential binding of the gRNAs. The rapid readout of the detector shortens the time between sample collection and suspect identification compared to standard methods such as PCR and capillary electrophoresis, and detection can be carried out on-site (e.g., in the field). The target DNA can also be eluted from the substrate and subsequently sequenced to obtain additional information about the individual (e.g., phenotype information).
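The differential binding described above could, under some assumptions, be turned into a per-SNP genotype profile and compared against a reference profile. The Python sketch below is illustrative only and is not taken from this application: it assumes one gRNA spot per allele (reference and alternate) for each SNP, a single fixed signal threshold, and hypothetical SNP labels such as `snp_1`. The function names `call_genotype`, `build_profile`, and `profiles_match` are likewise introduced here.

```python
from typing import Dict, Tuple

# For each SNP: (signal at reference-allele gRNA, signal at alternate-allele gRNA)
Readout = Dict[str, Tuple[float, float]]


def call_genotype(ref_signal: float, alt_signal: float, threshold: float = 0.5) -> str:
    """Call a genotype from which allele-specific spots are bound (assumed rule)."""
    ref_bound = ref_signal > threshold
    alt_bound = alt_signal > threshold
    if ref_bound and alt_bound:
        return "het"
    if ref_bound:
        return "hom_ref"
    if alt_bound:
        return "hom_alt"
    return "no_call"


def build_profile(readout: Readout, threshold: float = 0.5) -> Dict[str, str]:
    """Map every SNP in the readout to its genotype call."""
    return {snp: call_genotype(r, a, threshold) for snp, (r, a) in readout.items()}


def profiles_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """True when all SNPs called in both profiles agree (and at least one exists)."""
    shared = [s for s in a if s in b and "no_call" not in (a[s], b[s])]
    return bool(shared) and all(a[s] == b[s] for s in shared)


# Made-up readouts with hypothetical SNP labels, for illustration only.
evidence = build_profile({"snp_1": (0.9, 0.1), "snp_2": (0.8, 0.7), "snp_3": (0.1, 0.9)})
suspect = build_profile({"snp_1": (0.8, 0.2), "snp_2": (0.9, 0.9), "snp_3": (0.0, 0.7)})
same_person = profiles_match(evidence, suspect)
```

A real identification panel would need many more SNPs and a statistical match criterion; exact agreement on a handful of sites is used here only to keep the sketch short.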
In this example, a sample containing cancer DNA (e.g., a tumor sample) is obtained and DNA is extracted (e.g., using a Qiagen DNeasy tissue kit). Optionally, the DNA can be sheared using enzymatic or mechanical means. The DNA is then applied to a substrate, which contains gRNAs targeting a set of sites commonly mutated in cancers, and the colorimetric or fluorescent readout can be selected so as to be unique for each mutation. The gRNAs would be selected such that different SNP alleles would produce differential binding of the gRNAs. The rapid readout of the detector shortens the time between sample collection and tumor profiling compared to standard methods such as exome sequencing, and detection can be carried out on-site (e.g., in the clinic). The target DNA can also be eluted from the substrate and subsequently sequenced to obtain additional information.
While the invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective spirit and scope of the described invention. All such modifications are intended to be within the scope of the claims appended hereto.
Patents, patent applications, patent application publications, journal articles and protocols referenced herein are incorporated by reference in their entireties, for all purposes.
This application claims the priority benefit of U.S. Provisional Patent Application Ser. No. 62/298,937, filed on Feb. 23, 2016, which is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US17/19212 | 2/23/2017 | WO | 00 |

Number | Date | Country |
---|---|---|
62298937 | Feb 2016 | US |