This Application incorporates by reference the Sequence Listing XML file submitted herewith via the patent office electronic filing system having the file name “211390 sequences.xml” and created on Jan. 24, 2024 with a file size of 47,195 bytes.
Rapid and specific detection of infectious agents is necessary for the efficient treatment of the corresponding diseases and for epidemiologic surveillance. Nucleic-acid-based detection methods such as polymerase chain reaction (PCR) can be considered a gold standard for infectious disease diagnostics, offering high specificity and sensitivity, but they are limited by high cost, the requirement for expensive instrumentation, and the need for highly trained personnel. Detection methods based on Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) protein systems have recently emerged as an alternative to PCR-based diagnostics, with the potential to enable less complex detection assays while offering the diagnostic accuracy of PCR.
In recent years, the enzymes and methods referred to as CRISPR tools have been exploited to carry out a number of functions, from DNA and RNA mutagenesis, modification, and labeling to the enrichment, detection, and characterization of known and unknown nucleic acid (NA) sequences. The Cas proteins, which in their most important function are essentially programmable, RNA-guided DNA- or RNA-binding proteins, can be applied to direct a number of actions that one might wish to perform on an NA target.
For the CRISPR-based detection of nucleic acid targets (especially those with high variability), it is important to understand the relationship between the sequence and function of the CRISPR guide RNA (crRNA) molecules that define the specificity for the target NA. CRISPR detection applications rely on “design rules” for crRNAs against specific DNA (DETECTR) or RNA (SHERLOCK) targets using Cas12 and Cas13, respectively. Empirical design rules for RNA targets using Cas13 were originally derived from experimental observations of the collateral cleavage of fluorescent molecular beacons following specific activation of Cas13 with a relatively small number of candidate crRNAs. Another approach, for automated, target-specific crRNA design across large numbers of different pathogens, uses neural-network-based machine learning algorithms trained on data obtained from massively parallel Cas13a crRNA collateral nuclease screens. Such an approach, however, depends on the selection of multiple specific guide RNAs to cover a broad phylogenetic target range.
A need exists for a technique for the detection of highly variable nucleic acid targets in which the design rules for degenerate crRNAs are applicable to a wide array of targets.
Described herein is a technique to identify the smallest number (or a close approximation thereto) of degenerate crRNAs that could activate Cas13a collateral activity to produce a simple binary result for the presence or absence of any member of a phylogenetically diverse group.
In one embodiment, a method for detecting members of a phylogenetically diverse group includes (1) identifying conserved regions in a set of diverse nucleic acids in the phylogenetically diverse group; (2) designing candidate degenerate complementary spacer regions of CRISPR guide RNAs (crRNAs) corresponding to the conserved regions; (3) conducting high-throughput screening of the candidate degenerate crRNAs against complementary synthetic targets to obtain high performing degenerate crRNAs; (4) conducting high-throughput screening of the high performing degenerate crRNAs against targets representing at least a majority of the phylogenetically diverse group and at least one target representing a near neighbor to the phylogenetically diverse group to obtain a dataset; and (5) using a machine learning algorithm to analyze the dataset to identify generalizable crRNA design rules for detection of members of the phylogenetically diverse group.
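Steps (1) and (2) can be illustrated with a minimal sketch, assuming the target sequences have already been aligned without gaps and written in the DNA alphabet. The function names (`degenerate_consensus`, `most_conserved_window`, `reverse_complement`) and the criterion of choosing the window with the fewest degenerate positions are hypothetical illustrations, not the procedure used in this disclosure. Because the crRNA spacer pairs antiparallel with the target, the spacer is taken here as the reverse complement of the target-sense degenerate consensus.

```python
# IUPAC ambiguity code for every non-empty set of DNA bases.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Complement of each IUPAC symbol (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def degenerate_consensus(aligned):
    """IUPAC consensus of gap-free, equal-length aligned DNA sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    columns = zip(*(s.upper() for s in aligned))
    return "".join(IUPAC_CODE[frozenset(col)] for col in columns)


def reverse_complement(seq):
    """Reverse complement that preserves IUPAC ambiguity symbols."""
    return seq.translate(COMPLEMENT)[::-1]


def most_conserved_window(aligned, width):
    """Start index of the window with the fewest degenerate consensus
    positions (hypothetical criterion; the earliest window wins ties)."""
    consensus = degenerate_consensus(aligned)
    if width > len(consensus):
        raise ValueError("window longer than alignment")
    degenerate = [c not in "ACGT" for c in consensus]
    scores = [sum(degenerate[i:i + width]) for i in range(len(consensus) - width + 1)]
    return min(range(len(scores)), key=scores.__getitem__)


# Toy example: three aligned target-sense sequences (illustrative only).
aligned = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
start = most_conserved_window(aligned, 4)
protospacer = degenerate_consensus([s[start:start + 4] for s in aligned])
spacer_dna = reverse_complement(protospacer)
```

In practice the window width would match the spacer length of the chosen Cas13a ortholog, and a real pipeline would also have to handle alignment gaps, which this sketch rejects.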
Further embodiments include a CRISPR guide RNA (crRNA) comprising a nucleic acid sequence identified as described herein, as well as Cas-based assay systems incorporating such crRNAs.
Before describing the present invention in detail, it is to be understood that the terminology used in the specification is for the purpose of describing particular embodiments, and is not necessarily intended to be limiting. Although many methods, structures and materials similar, modified, or equivalent to those described herein can be used in the practice of the present invention without undue experimentation, the preferred methods, structures and materials are described herein. In describing and claiming the present invention, the following terminology will be used in accordance with the definitions set out below.
As used herein, the singular forms “a”, “an,” and “the” do not preclude plural referents, unless the content clearly dictates otherwise.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
As used herein, the term “about” when used in conjunction with a stated numerical value or range denotes somewhat more or somewhat less than the stated value or range, to within a range of ±10% of that stated.
The approach to identify the smallest number (or a close approximation thereto) of degenerate crRNAs to detect members of a phylogenetically diverse group via Cas13a collateral activity used Lassa virus (LASV) as a model taxon. This entailed the following steps:
Additional details can be found in the Appendices of U.S. Provisional Patent Application No. 63/441,196.
Results demonstrated that a “single” degenerate crRNA, which is de facto a mixture of 512 distinct crRNAs covering all permutations of bases at the degenerate sites of the spacer, can detect targets representing all seven currently identified LASV lineages. Detection was independent of the presence of a genomic nucleic acid background, and only one of the eleven closely related near neighbors tested cross-reacted. Analysis of the experimental dataset revealed that the primary variable predicting above-threshold collateral cleavage (positive detection) was the total number of mismatches between the crRNA spacer and the target region. To a lesser extent, the signal was affected by the positions and distribution of the mismatches and by the identity of the nucleotides at the protospacer flanking site (PFS). This indicates that far fewer crRNAs need to be dedicated to the detection of phylogenetically diverse targets in assays having a limited number of test channels. The approach could also prove useful for other CRISPR-Cas applications.
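The two quantities in the paragraph above, the number of distinct crRNAs in a degenerate mixture and the total spacer-target mismatch count, can be computed as in the following sketch. It assumes the degenerate site is written in target sense with IUPAC symbols; the helper names are hypothetical. The mismatch convention used here, counting a position only when the target base is not among the bases allowed at that position, equals the mismatch count of the best-matching member of the mixture; this is one plausible convention and not necessarily the one used to build the dataset.

```python
from math import prod

# Bases encoded by each IUPAC nucleotide symbol (DNA alphabet).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def n_permutations(degenerate_seq):
    """Number of distinct sequences encoded by a degenerate sequence:
    the product of the number of bases allowed at each position."""
    return prod(len(IUPAC_BASES[c]) for c in degenerate_seq.upper())


def min_mismatches(degenerate_protospacer, target_site):
    """Mismatches between a target site and the best-matching member of the
    degenerate mixture (a position counts only if the target base is not
    allowed by the IUPAC symbol at that position)."""
    if len(degenerate_protospacer) != len(target_site):
        raise ValueError("sequences must have equal length")
    pairs = zip(degenerate_protospacer.upper(), target_site.upper())
    return sum(t not in IUPAC_BASES[d] for d, t in pairs)
```

For example, nine twofold-degenerate positions give 2^9 = 512 permutations, but so do other combinations (three N positions plus three twofold positions give 4^3 × 2^3 = 512), so a permutation count alone does not identify the degeneracy pattern of a spacer.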
The Lassa virus (LASV) was used to demonstrate this technique for the design and use of degenerate crRNAs for the detection of highly diverse RNA targets. The LASV genome has conserved regions; however, even these exhibit a significant degree of variability among the seven known lineages of the virus.
Multiple degenerate crRNAs were designed for each of the conserved targets in the L and GPC genes of lineage I and lineage IV LASV (
Both crRNA synthesis and Cas13a activity assays were conducted using a high-throughput workflow in 384-well plates, with fluid transfers handled by an Echo 525 acoustic liquid handler (Beckman Coulter, Indianapolis, IN) using the Plate Reformat software provided by the manufacturer. To generate crRNAs, 27 to 50 crRNA transcription reactions were set up using the reagent volumes described above for an individual reaction. The reactions included the tested crRNAs and a negative control (crRNA template oligonucleotide replaced with TE buffer). First, a master mix containing all reaction components except the template oligonucleotide was distributed with the Echo instrument from a 6-well Echo-qualified reservoir plate to an Echo-qualified 384-well microplate, with 23.5 μL of master mix transferred to each well. Subsequently, 1.5 μL of each crRNA template oligonucleotide was added with the Echo instrument to a well containing master mix, from a previously prepared Echo-qualified 384-well microplate. The plates were spun briefly in a centrifuge at approximately 1500×g to bring all the liquid to the bottom of the wells and remove air bubbles. The plates were sealed and incubated for 2 hours at 37° C. After incubation, the plates with transcribed crRNAs were stored at −80° C. To determine the efficacy of each crRNA, Cas13a nuclease activity assays were conducted using Cas13a enzyme from L. wadei, which was synthesized and purified by GenScript Biotech (Piscataway, NJ). The enzyme was stored and diluted in storage buffer (50 mM Tris-HCl, 600 mM NaCl, 5% glycerol, 2 mM DTT, pH 7.5).
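The two-pass transcription plate setup described above (23.5 μL master mix into every well, then 1.5 μL of template oligonucleotide or TE buffer, for 25 μL per reaction) can be expressed as a transfer list. The sketch below is hypothetical throughout: the row-major well order, the single reservoir source well, the column names, and the `NEG_TE` label are illustrative assumptions, and the output is not claimed to match the file format read by the Echo Plate Reformat software.

```python
import csv
import io
from collections import defaultdict
from string import ascii_uppercase

ROWS = ascii_uppercase[:16]   # rows A-P of a 384-well plate (24 columns)
MASTER_MIX_NL = 23_500        # 23.5 uL master mix per reaction (from the text)
TEMPLATE_NL = 1_500           # 1.5 uL template oligo or TE buffer (from the text)


def well_384(index):
    """Row-major 384-well name: 0 -> 'A1', 23 -> 'A24', 24 -> 'B1'."""
    if not 0 <= index < 384:
        raise ValueError("index outside a 384-well plate")
    row, col = divmod(index, 24)
    return f"{ROWS[row]}{col + 1}"


def transcription_picklist(template_ids, mm_source="A1"):
    """Hypothetical transfer list: master mix into every well first, then one
    template per well; a TE-buffer negative control is the last reaction."""
    samples = list(template_ids) + ["NEG_TE"]
    transfers = []
    for i, _ in enumerate(samples):        # pass 1: master mix from the reservoir
        transfers.append({"Source Plate": "reservoir_6well", "Source Well": mm_source,
                          "Destination Well": well_384(i),
                          "Transfer Volume": MASTER_MIX_NL, "Sample": "master_mix"})
    for i, name in enumerate(samples):     # pass 2: template oligos (or TE buffer)
        transfers.append({"Source Plate": "template_384", "Source Well": well_384(i),
                          "Destination Well": well_384(i),
                          "Transfer Volume": TEMPLATE_NL, "Sample": name})
    return transfers


def volumes_per_well(transfers):
    """Total transferred volume (nL) per destination well."""
    totals = defaultdict(int)
    for t in transfers:
        totals[t["Destination Well"]] += t["Transfer Volume"]
    return dict(totals)


def to_csv(transfers):
    """Serialize the transfer list as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(transfers[0]))
    writer.writeheader()
    writer.writerows(transfers)
    return buf.getvalue()
```

Checking `volumes_per_well` before a run is a cheap way to confirm that every reaction well receives the intended 25 μL.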
Each nuclease activity assay was performed in a 20 μL reaction that included 1 μL of 1 μM Cas13a, 1 μL of 2 μM RNaseAlert v2 (from the RNaseAlert™ QC System v2, ThermoFisher, Grand Island, NY), 17.2 μL of nuclease assay buffer (40 mM Tris-HCl, 60 mM NaCl, 6 mM MgCl2, pH 7.3), 0.4 μL of crRNA (from the unpurified transcription reaction), and 0.4 μL of target RNA (50 ng/μL for the full-length target or 12 ng/μL for the short target). For each crRNA, a total of six reactions were set up: three target-negative and three target-positive. First, a master mix containing all reaction components except the crRNA and target RNA was distributed with the Echo instrument from a 6-well Echo-qualified reservoir plate to a 384-well assay plate (black with clear flat bottom, cat #3762, Corning Life Sciences, Tewksbury, MA), with 19.2 μL of master mix transferred to each well. Next, 0.4 μL of each crRNA from the previously prepared 384-well microplate was transferred with the Echo instrument to wells containing master mix, such that each crRNA was added to six consecutive wells of the reaction plate. Finally, 0.4 μL of the target RNAs (previously placed in the area of the crRNA plate not occupied by transcribed crRNAs) was added to three of the six wells for each crRNA. The Cas13a reaction plates were spun briefly in a centrifuge at approximately 1500×g to bring the liquid to the bottom of the wells and remove air bubbles. Immediately after spinning, the reaction plates were sealed using MicroAmp sealers. The plates were incubated in a Biotek Synergy Neo2 plate reader (Biotek, Winooski, VT) at 37° C., and fluorescence was read from the bottom of the wells every 5 minutes for 2 hours using excitation at 490 nm, emission at 520 nm, and gain set at 100.
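The reaction arithmetic and plate layout above can be checked with a short sketch. The component volumes and stock concentrations come from the text: they sum to 20 μL and give about 50 nM Cas13a, 100 nM RNaseAlert v2, and 20 ng of full-length target per reaction. The layout function assumes row-major filling and, as a hypothetical choice, that the last three of each crRNA's six consecutive wells receive target RNA; the text states only that three of the six wells do.

```python
import math
from string import ascii_uppercase

TOTAL_UL = 20.0
# (volume in uL, stock concentration, unit) per reaction, as listed in the text.
COMPONENTS = {
    "Cas13a": (1.0, 1.0, "uM"),
    "RNaseAlert v2": (1.0, 2.0, "uM"),
    "nuclease assay buffer": (17.2, None, None),
    "crRNA (unpurified transcription)": (0.4, None, None),
    "target RNA (full length)": (0.4, 50.0, "ng/uL"),
}
ROWS = ascii_uppercase[:16]  # rows A-P of a 384-well plate


def final_concentrations(components, total_ul=TOTAL_UL):
    """Check that volumes add up, then report final nM (molar stocks) or
    ng per reaction (mass stocks)."""
    vol = sum(v for v, _, _ in components.values())
    if not math.isclose(vol, total_ul):
        raise ValueError(f"components sum to {vol} uL, not {total_ul} uL")
    out = {}
    for name, (v, stock, unit) in components.items():
        if stock is None:
            continue
        if unit == "ng/uL":
            out[name] = (stock * v, "ng per reaction")
        else:  # uM stock diluted into the reaction, reported in nM
            out[name] = (stock * v * 1000 / total_ul, "nM")
    return out


def assay_layout(crrna_names, wells_per_crrna=6, target_wells=(3, 4, 5)):
    """Map well -> (crRNA, target_added). Each crRNA occupies six consecutive
    row-major wells; which three get target is a hypothetical choice."""
    if len(crrna_names) * wells_per_crrna > 384:
        raise ValueError("too many crRNAs for one 384-well plate")
    layout = {}
    for k, name in enumerate(crrna_names):
        for j in range(wells_per_crrna):
            row, col = divmod(k * wells_per_crrna + j, 24)
            layout[f"{ROWS[row]}{col + 1}"] = (name, j in target_wells)
    return layout
```

With six wells per crRNA, one 384-well plate holds at most 64 crRNAs, four per plate row.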
The integrated, background-corrected final fluorescence value reflecting Cas13a RNase activation for each crRNA was calculated by subtracting the sum of the averaged fluorescence values measured for the template-negative samples over the course of the experiment (25 measurements) from the corresponding sum for the template-positive samples. The crRNAs were then classified according to this integrated, background-subtracted fluorescent signal relative to the highest signal obtained. Three performance classes were defined: high performance (signal at 80% or higher), intermediate (signal lower than 80% but higher than 20%), and low (signal at 20% or lower).
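The signal calculation and three-class scheme above can be sketched as follows, assuming each crRNA has three target-positive and three target-negative replicate traces of 25 readings (every 5 minutes for 2 hours). The function names are hypothetical. At each time point the replicate readings are averaged, the averages are summed over the run, and the negative sum is subtracted from the positive sum; classes use the 80% and 20% cut-offs relative to the best crRNA.

```python
from statistics import mean


def integrated_signal(pos_traces, neg_traces):
    """Background-corrected integrated fluorescence for one crRNA.

    Each argument is a list of replicate traces, and each trace is the list
    of readings over the run. Per-timepoint replicate averages are summed,
    and the negative (no target) sum is subtracted from the positive sum."""
    pos = sum(mean(reads) for reads in zip(*pos_traces))
    neg = sum(mean(reads) for reads in zip(*neg_traces))
    return pos - neg


def performance_class(signals):
    """Classify each crRNA relative to the highest integrated signal."""
    top = max(signals.values())
    if top <= 0:
        raise ValueError("no positive signal to normalize against")
    classes = {}
    for name, s in signals.items():
        rel = s / top
        if rel >= 0.8:
            classes[name] = "high"
        elif rel > 0.2:
            classes[name] = "intermediate"
        else:
            classes[name] = "low"
    return classes
```

The boundaries follow the text: exactly 80% counts as high, and exactly 20% counts as low.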
The performance testing results are shown in
One of the crRNA molecules tested using this experimental setup (crRNA #5_LIV, SEQ ID No: 57) was able to detect all lineages of LASV while cross-reacting with only one near neighbor species (
The data obtained in Cas13a assays using the eight crRNA sequences (lineage II and IV versions of each of crRNAs #5, #9, #29, and #33) were used to identify the variables affecting crRNA performance.
To understand how the distribution of mismatches and other variables influences the outcome of the detection assays, a predictive model based on the RuleFit algorithm was developed from the experimental data. The RuleFit classifier was trained on the mismatch datasets to predict whether a guide-target pair would yield a signal above or below threshold (defined as 20% of the maximum signal). Following training, a variety of features were related to mismatch count and position (see
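Before a classifier such as RuleFit can be trained, each guide-target pair has to be turned into a feature vector. The sketch below computes features of the kinds named in this disclosure (mismatch count, mismatch positions and their distribution, and the flanking nucleotide) for one non-degenerate protospacer. All feature names, the split into thirds, the 5′-to-3′ target-sense indexing, and taking the PFS as the single base immediately 3′ of the site are hypothetical choices, not the feature set actually used.

```python
def mismatch_features(protospacer, target_context, start):
    """Hypothetical feature vector for one guide-target pair.

    protospacer: target-sense sequence complementary to the spacer (no IUPAC codes).
    target_context: target sequence that contains the site at index `start`.
    Positions are indexed 5'->3' along the target-sense protospacer."""
    n = len(protospacer)
    site = target_context[start:start + n]
    if len(site) != n:
        raise ValueError("site runs past the end of the target")
    mm = [i for i, (g, t) in enumerate(zip(protospacer, site)) if g != t]

    # Longest run of consecutive mismatched positions.
    longest, run, prev = 0, 0, None
    for i in mm:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        longest = max(longest, run)
        prev = i

    third = n / 3
    feats = {
        "total_mismatches": len(mm),
        "longest_mismatch_run": longest,
        "mm_5prime_third": sum(i < third for i in mm),
        "mm_middle_third": sum(third <= i < 2 * third for i in mm),
        "mm_3prime_third": sum(i >= 2 * third for i in mm),
        # Base immediately 3' of the site (empty if the site ends the target).
        "pfs_3prime": target_context[start + n] if start + n < len(target_context) else "",
    }
    # One binary indicator per spacer position (1-based names).
    feats.update({f"mm_pos_{i + 1}": int(i in mm) for i in range(n)})
    return feats
```

For a degenerate crRNA, the same features could be computed for each expanded member or for the best-matching member; which of these conventions suits a given dataset is left open here.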
The RuleFit classifier model identifies the prediction rules that associate the most important features.
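In general, RuleFit fits an ensemble of decision trees and turns each root-to-node path into a binary rule, that is, a conjunction of feature thresholds. It then fits a sparse linear model over those rules, optionally together with the raw features. The sketch below shows only how such a fitted model scores one guide-target pair. The two rules, their weights, and the intercept are placeholders invented for illustration; they are not the rules learned from the LASV dataset.

```python
import math
import operator
from dataclasses import dataclass

OPS = {"<=": operator.le, ">": operator.gt}


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((feature, op, threshold), ...) joined by logical AND
    weight: float

    def fires(self, x):
        """True if every condition holds for feature dict x."""
        return all(OPS[op](x[f], thr) for f, op, thr in self.conditions)


def rulefit_score(x, rules, intercept, linear=None):
    """Linear score: intercept + weights of firing rules + linear feature terms."""
    score = intercept + sum(r.weight for r in rules if r.fires(x))
    for feat, coef in (linear or {}).items():
        score += coef * x[feat]
    return score


def predict_above_threshold(x, rules, intercept, linear=None):
    """Logistic link: probability of above-threshold signal and a 0.5-cutoff call."""
    p = 1.0 / (1.0 + math.exp(-rulefit_score(x, rules, intercept, linear)))
    return p, p >= 0.5


# Placeholder rules and weights (hypothetical, for illustration only).
RULES = (
    Rule((("total_mismatches", "<=", 2),), 2.0),
    Rule((("total_mismatches", ">", 2), ("mm_middle_third", ">", 0)), -1.5),
)
INTERCEPT = -0.5
```

An advantage of this representation is interpretability: each rule that fires can be read back as a statement about mismatch count or position, which is what makes rule-based design criteria possible.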
While the example used the RuleFit machine learning model, other models could be used to develop suitable degenerate crRNAs.
The degenerate crRNA design features identified in this disclosure may potentially be applied to targets other than LASV by selectively grouping the desired taxon sequences and excluding others using the recommended feature ranges. A general algorithm for selecting degenerate crRNA candidates would be: (1) initial selection of crRNA sequences with degenerate spacers; (2) determination of the mismatch positions between the spacer and all versions of the target and near-neighbor sequences; (3) use of the RuleFit classifier model to predict the outcome of the Cas13a-based CRISPR assay for the targets and near neighbors; and (4) assessment of whether the tested crRNA fulfills the desired criteria (detecting the intended targets without cross-reacting with near neighbors). If the criteria are not fulfilled, the algorithm generates a new candidate crRNA and returns to step (2); otherwise, it proceeds to step (5), in vitro validation of the optimized crRNAs.
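The selection loop above can be outlined with the predictor as a pluggable function. In this sketch, candidates are supplied as a list of target-sense degenerate protospacers, so step (1) is simply iteration over that list. The stand-in predictor (positive if the best-matching member has at most `k` mismatches) is a hypothetical placeholder for the trained RuleFit model. It reflects only the observation that total mismatch count was the primary predictor. Step (5), in vitro validation, lies outside the code.

```python
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def min_mismatches(degenerate, target):
    """Mismatches of the best-matching member of the degenerate mixture."""
    if len(degenerate) != len(target):
        raise ValueError("sequences must have equal length")
    return sum(t not in IUPAC_BASES[d] for d, t in zip(degenerate, target))


def mismatch_threshold_predictor(k):
    """Hypothetical stand-in for the trained classifier: predicts detection
    when the mismatch count is at most k."""
    return lambda degenerate, target: min_mismatches(degenerate, target) <= k


def select_degenerate_crrna(candidates, targets, near_neighbors, predict):
    """Return the first candidate predicted to detect every target and no
    near neighbor, or None if no candidate qualifies."""
    for cand in candidates:                                   # step (1) / regenerate
        detects_all = all(predict(cand, t) for t in targets)  # steps (2)-(3)
        cross_reacts = any(predict(cand, n) for n in near_neighbors)
        if detects_all and not cross_reacts:                  # step (4)
            return cand  # step (5), in vitro validation, happens outside this sketch
    return None
```

Keeping the predictor as a parameter reflects the point made above that models other than RuleFit could be substituted without changing the surrounding loop.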
Suitable Cas-based assay systems include those known in the art such as those described in Spangler, J. R., Leski, T. A., Schultzhaus, Z. et al. “Large scale screening of CRISPR guide RNAs using an optimized high throughput robotics system,” which is incorporated herein by reference for the purposes of teaching Cas-based assays.
The design of degenerate crRNAs for detection of highly variable nucleic acid targets as described herein offers the following advantages over alternatives. First, a single degenerate crRNA can detect variable targets, avoiding the need to multiplex CRISPR-based detection reactions to cover broad taxonomic ranges; degenerate crRNAs composed of up to 2048 guide sequence permutations were shown to perform well. Second, the technique offers the potential for rapid design of crRNA sequences for any variable target using RuleFit-based design rules. In contrast, most current design methods require extensive empirical testing of designed crRNAs in vitro, whereas existing automated design methods (e.g., ADAPT) do not accommodate degenerate crRNAs and require multiple crRNAs to detect highly variable targets. Finally, this technique makes it easy to synthesize and test degenerate crRNAs, whereas alternative methods use expensive and inflexible direct RNA synthesis with modified bases.
All documents mentioned herein are hereby incorporated by reference for the purpose of disclosing and describing the particular materials and methodologies for which the document was cited.
Although the present invention has been described in connection with preferred embodiments thereof, it will be appreciated by those skilled in the art that additions, deletions, modifications, and substitutions not specifically described may be made without departing from the spirit and scope of the invention. Terminology used herein should not be construed as being “means-plus-function” language unless the term “means” is expressly used in association therewith.
Detection of Pathogens (Including LASV) with Highly Variable Sequences
This application claims the benefit of U.S. Provisional Patent Application No. 63/441,196 filed on Jan. 26, 2023, the entirety of which, inclusive of appendices, is incorporated herein by reference.
The United States Government has ownership rights in this invention. Licensing inquiries may be directed to Office of Technology Transfer, US Naval Research Laboratory, Code 1004, Washington, DC 20375, USA; +1.202.767.7230; techtran@nrl.navy.mil, referencing NC 211390.
| Number | Date | Country |
|---|---|---|
| 63441196 | Jan 2023 | US |