The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Mar. 24, 2021, is named 002806-097220WOPT_SL.txt and is 302,231 bytes in size.
The technology described herein relates to multiplexed methods, kits, and compositions for detecting target RNAs, such as viral RNAs.
Highly scalable and highly sensitive viral diagnostics (e.g., for SARS-CoV-2) are critical for both pandemic response and long-term epidemiological surveillance. During a pandemic, population-wide testing can provide effective control and monitoring of the viral spread and allow safe return to work. In the long term, regular and population-wide monitoring promises a “bio-weather map” to identify and forecast new viral infection hotspots, preventing the “next outbreak”. Furthermore, the ability to sequence and identify emerging viral variants (e.g., B.1.1.7 and B.1.427 for SARS-CoV-2), also on the population scale, allows real-time monitoring of the rate of transmission and pathogenicity, as well as informing public health policies and vaccine development. Current diagnostic methods fall short of these requirements, as they are limited in sample processing throughput, testing sensitivity and reliability, or the ability to identify different viral variants.
At present, molecular tests using “gold standard” quantitative reverse transcription polymerase chain reaction (RT-qPCR) in central laboratory facilities have demonstrated high detection sensitivity (down to 200 gce/mL-1,000 gce/mL of SARS-CoV-2, by the FDA’s comparison panel results), but they are limited in throughput by the requirements of RNA extraction and PCR thermocycling on each sample individually, as well as other liquid handling operations; see e.g., Vandenberg et al. Nat Rev Microbiol 19, 171-183 (Oct. 14, 2020); MacKay et al. Nat Biotechnol 38, 1021-1024 (Aug. 20, 2020); Esbin et al., RNA 26, 771-783 (May 1, 2020); Arnaout et al. SARS-CoV2 Testing: The Limit of Detection Matters (bioRxiv, Jun. 4, 2020); the contents of each of which are incorporated herein by reference in their entireties. As a result, it is challenging for most current clinical labs to perform more than 10,000 diagnostic tests per day, even with the help of automation; see e.g., Cobas SARS-CoV-2 Instructions for Use (Mar. 12, 2020), available on the world wide web at fda.gov/media/136049/download; the content of which is incorporated herein by reference in its entirety. By re-purposing large-scale liquid handling and sample automation, up to 100,000 tests per day can be achieved, but this approach requires heavy upfront capital investment and personnel costs.
Next-generation sequencing (NGS) based methods have long been attractive alternatives to RT-qPCR in two ways: (i) the intrinsic high-throughput readout for multiplexed diagnostics, and (ii) the ability to obtain viral genome sequences for variant identification. In principle, the very high throughput (up to 10^10 reads per session, on an Illumina NovaSeq™ machine) allows a single testing lab to process up to a million patient samples per day with pooled analysis, if they could avoid the handling of individual samples. Since the beginning of the COVID-19 pandemic, several methods for NGS-based multiplexed testing have been proposed and developed. See e.g., Bloom et al., Swab-Seq: A high-throughput platform for massively scaled up SARS-CoV-2 testing, medRxiv (Aug. 6, 2020); Illumina™ COVIDSeq Test Instructions for Use (May 1, 2020); Hossain et al. A massively parallel COVID-19 diagnostic assay for simultaneous testing of 19200 patient samples. Google Docs (Mar. 20, 2020); Schmid-Burgk et al. LAMP-Seq: Population-Scale COVID-19 Diagnostics Using a Compressed Barcode Space bioRxiv (Apr. 8, 2020); Wu et al., INSIGHT: A population-scale COVID-19 testing strategy combining point-of-care diagnosis with centralized high-throughput sequencing. Sci Adv 7, (Feb. 12, 2021); Yelagandula et al. SARSeq, a robust and highly multiplexed NGS assay for parallel detection of SARS-CoV2 and other respiratory infections (medRxiv, Nov. 3, 2020); the contents of each of which are incorporated herein by reference in their entireties.
As expected, methods that achieved detection sensitivity close to that of the RT-qPCR tests (200-1,000 gce/mL) mostly followed the traditional barcoding and sequencing workflows, which also required RNA extraction and PCR thermocycling steps (see e.g., Bloom, Illumina, or Yelagandula, supra), or used an extraction-free protocol but with ~10× lower sensitivity (see e.g., Bloom, supra; Bruce et al., PLoS Biol 18, e3000896 (Oct. 2, 2020); the contents of each of which are incorporated herein by reference in their entireties), which in practice limited the maximum achievable sample throughput. Furthermore, current methods either do not report viral variant information, or perform whole genome sequencing (WGS), which further limits the achievable throughput due to the large number of sequencing reads required. As such, there is great need for sequencing-based methods that achieve high sensitivity, high throughput, and identification of viral variants.
The technology described herein is directed to multiplexed methods of detecting at least one target RNA in at least two samples. Specifically, the methods use primers comprising at least one barcode region. Also described herein are kits, compositions, and systems associated with such methods. Such multiplexed methods, also referred to herein as “One-Seq,” exhibit at least the following advantages compared to existing detection methods: (1) the workflow permits barcoding of 50-5,000 samples per batch, with up to ~100,000 total samples per sequencing run; (2) the workflow permits pre-amplification pooling of reverse transcription products; (3) the method can be used to detect multiple loci on one target RNA molecule in one test; (4) the method can be used to detect multiple RNA target molecules, e.g., multiple viruses, in one test; (5) the method exhibits high sensitivity, e.g., as the number of RNA targets that are on one RNA molecule increases, the level of sensitivity increases (e.g., the sensitivity of the SARS-CoV-2 detection method approaches 50-150 genome copy equivalents per mL (gce/mL), compared to other sequencing-based tests that detect over 1000 gce/mL); (6) the method exhibits high efficiency, with reduced labor (e.g., no upfront extraction step, a one-pot reverse transcription step, reduced liquid-handling steps, etc.) and reduced cost per test; (7) the protector nucleic acid described herein can be used to reduce or eliminate barcode crosstalk that can result from reverse transcription primer carry-over into the amplification step; and (8) specially-designed primers can be used to detect variations of interest in the target RNA.
Accordingly, in one aspect described herein is a multiplexed method of detecting at least one target RNA in at least two samples, comprising: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; (c) contacting the pooled reverse transcription product mixture with a DNA polymerase and a set of second primers under conditions permitting the generation of amplification products; and (d) sequencing the amplification products, thereby detecting at least one target RNA, if present, in the at least two samples.
In some embodiments of any of the aspects, step (b) is performed before step (c).
In some embodiments of any of the aspects, steps (a)-(d) are performed sequentially.
In some embodiments of any of the aspects, the detection method has a limit of detection of at least 500 target RNA copies per mL for a given target RNA.
In some embodiments of any of the aspects, the detection method has a limit of detection of at least 1000 target RNA copies per mL for a given target RNA.
In some embodiments of any of the aspects, the detection method has a dynamic range of at least 3 logs.
In some embodiments of any of the aspects, at least 2 target RNAs in a single sample are detected.
In some embodiments of any of the aspects, the at least 2 target RNAs are on the same RNA molecule.
In some embodiments of any of the aspects, the at least 2 target RNAs are on different RNA molecules.
In some embodiments of any of the aspects, at least one target RNA is a viral RNA.
In some embodiments of any of the aspects, at least 2 target RNAs are from the same virus.
In some embodiments of any of the aspects, at least 2 target RNAs are from at least 2 different viruses.
In some embodiments of any of the aspects, at least one viral RNA is a SARS-CoV-2 RNA.
In some embodiments of any of the aspects, target RNAs from at least 50 samples are detected in a single performance of steps (a)-(d).
In some embodiments of any of the aspects, prior to step (a), the at least one target RNA is not extracted from the sample.
In some embodiments of any of the aspects, the reverse transcriptase (RT) is an engineered or recombinant version of a Moloney Murine Leukemia Virus (MMLV) RT, Avian Myeloblastosis Virus (AMV) RT, or another naturally occurring RT.
In some embodiments of any of the aspects, the first primer or each primer in the first set of primers comprises, from 5′ to 3′: (a) an adaptor region; (b) a first barcode region; and (c) a target-binding region that is complementary or substantially complementary to and permits hybridization to at least one target RNA.
In some embodiments of any of the aspects, the first primer or each primer in the first set of primers comprises, from 5′ to 3′: (a) an adaptor region; (b) a first barcode region; (c) a second barcode region; and (d) a target-binding region that is complementary or substantially complementary to and permits hybridization to at least one target RNA.
In some embodiments of any of the aspects, the barcode region of a first primer in the first set of barcoded primers is a Hamming distance of at least 10 from each other barcode region of any other primer in the first set of barcoded primers.
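As a non-limiting illustration of the Hamming-distance criterion above, the following Python sketch checks whether every pair of barcodes in a candidate set differs at a minimum number of positions. The function names and the example barcode sequences are hypothetical and are not taken from the Sequence Listing; equal-length barcodes are assumed.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes: list[str]) -> int:
    """Smallest Hamming distance over all pairs of barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def satisfies_min_distance(barcodes: list[str], min_distance: int) -> bool:
    """True if every pair of barcodes is at least min_distance apart."""
    return min_pairwise_hamming(barcodes) >= min_distance


# Hypothetical 12-nt barcodes, for illustration only.
GOOD_SET = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG", "TTTTTTTTTTTT"]
# Adding a barcode that shares six positions with two existing ones
# brings the minimum pairwise distance down to 6.
BAD_SET = GOOD_SET + ["AAAAAACCCCCC"]

if __name__ == "__main__":
    print(satisfies_min_distance(GOOD_SET, 10))
    print(satisfies_min_distance(BAD_SET, 10))
```

A check of this kind could be run once when a barcode set is designed, so that a sequencing error at a few positions cannot convert one valid barcode into another.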
In some embodiments of any of the aspects, the first or second barcode region on the first primer or set of first primers comprises one of SEQ ID NOs: 18-989.
In some embodiments of any of the aspects, at least one barcode region on the first primer or set of first primers corresponds to and is different for each of the at least two samples.
In some embodiments of any of the aspects, at least one barcode region on the first primer or set of first primers corresponds to and is different for each of the target RNAs.
In some embodiments of any of the aspects, the target-binding region of a primer in the first set of primers binds at most 5 nucleotides away from a variation of interest in the target RNA.
In some embodiments of any of the aspects, the variation of interest is selected from the group consisting of: a single-nucleotide variation; a point mutation; a substitution; an insertion; and a deletion.
In some embodiments of any of the aspects, the target RNA is SARS-CoV-2 S gene and the variation of interest is selected from the group consisting of: del69-70, del144, K417N, K417T, L452R, E484K, N501Y, D614G, P681H, and A701V.
In some embodiments of any of the aspects, step (a) further comprises contacting the sample with a detergent.
In some embodiments of any of the aspects, the detergent lyses viral particles or cells in the sample.
In some embodiments of any of the aspects, the detergent releases target RNA from the sample.
In some embodiments of any of the aspects, the detergent is a nonionic surfactant.
In some embodiments of any of the aspects, the detergent is Triton X-100.
In some embodiments of any of the aspects, step (a) further comprises contacting the sample with carrier nucleic acid.
In some embodiments of any of the aspects, the carrier nucleic acid reduces loss of the target RNA.
In some embodiments of any of the aspects, the carrier nucleic acid is poly-A60 DNA oligonucleotide or E. coli tRNA.
In some embodiments of any of the aspects, step (a) further comprises contacting the sample with a positive control nucleic acid.
In some embodiments of any of the aspects, the positive control nucleic acid is a primer comprising from 5′ to 3′: (a) an adaptor region; (b) a first barcode region; and (c) a target-binding region that is complementary to or substantially complementary to a sample nucleic acid.
In some embodiments of any of the aspects, the positive control nucleic acid comprises, from 5′ to 3′: (a) a region that is not identical or substantially identical to any target RNA being assayed; and (b) a region that is identical or substantially identical to at least one target RNA.
In some embodiments of any of the aspects, the region of the positive control nucleic acid that is identical or substantially identical to at least one target RNA is complementary or substantially complementary to the target-binding region of at least one primer from the first set of primers.
In some embodiments of any of the aspects, the positive control nucleic acid comprises SEQ ID NO: 11.
In some embodiments of any of the aspects, the sample is contacted with at least 10^0-10^4 copies/ul of positive control nucleic acid.
In some embodiments of any of the aspects, step (a) further comprises contacting the samples with a stabilization agent.
In some embodiments of any of the aspects, the stabilization agent prevents degradation of the RNA target and/or reverse transcriptase for at least 6 hours at room temperature.
In some embodiments of any of the aspects, the stabilization agent prevents degradation of the RNA target and/or reverse transcriptase for at least 24 hours at room temperature.
In some embodiments of any of the aspects, the stabilization agent is an RNA-preserving agent or a reverse-transcriptase-preserving agent.
In some embodiments of any of the aspects, the RNA-preserving agent is an RNase inhibitor, a metal-chelating agent, or a reducing agent.
In some embodiments of any of the aspects, the RNase inhibitor is murine RNase inhibitor or a thermostable RNase inhibitor.
In some embodiments of any of the aspects, the metal-chelating agent is ethylenediaminetetraacetic acid (EDTA).
In some embodiments of any of the aspects, the reducing agent is dithiothreitol (DTT).
In some embodiments of any of the aspects, the reverse-transcriptase-preserving agent is an antibiotic, an antimycotic, or a protease inhibitor.
In some embodiments of any of the aspects, step (a) comprises a reverse transcription reaction.
In some embodiments of any of the aspects, step (a) comprises: (i) incubating the sample, reverse transcriptase, and first primer or first set of primers comprising at least one barcode at a temperature of at least 50° C. for at least 30 minutes; and (ii) inactivating the reverse transcription reaction at a temperature of at least 95° C. for at least 5 minutes.
In some embodiments of any of the aspects, the reverse transcription products from step (a) comprise a barcoded DNA comprising a region that is complementary to a portion of at least one target RNA.
In some embodiments of any of the aspects, reverse transcription products from step (a) from at least 5 different samples are combined in one container.
In some embodiments of any of the aspects, prior to step (c) the first set of barcoded primers is substantially removed.
In some embodiments of any of the aspects, prior to step (c) the target RNA and/or sample is substantially removed.
In some embodiments of any of the aspects, prior to step (c) the first set of barcoded primers or the RNA target is substantially removed using a bead-based purification method or a spin-column-based purification method.
In some embodiments of any of the aspects, the DNA polymerase is a thermostable DNA polymerase I.
In some embodiments of any of the aspects, the DNA polymerase is a Thermus aquaticus (Taq) DNA polymerase.
In some embodiments of any of the aspects, the second set of primers comprises forward and reverse amplification primers.
In some embodiments of any of the aspects, the forward primer in the second set of primers comprises from 5′ to 3′: (a) an adaptor region; and (b) an adaptor-binding region that is identical or substantially identical to the adaptor region of a primer in the first set of barcoded primers.
In some embodiments of any of the aspects, a forward primer in the second set of primers comprises from 5′ to 3′: (a) an adaptor region; (b) a third barcode region; and (c) an adaptor-binding region that is identical or substantially identical to the adaptor region of a primer in the first set of barcoded primers.
In some embodiments of any of the aspects, a reverse primer in the second set of primers comprises, from 5′ to 3′: (a) an adaptor region; (b) a second barcode region; and (c) a target-binding region that is identical or substantially identical to at least one target RNA.
In some embodiments of any of the aspects, a reverse primer in the second set of primers comprises, from 5′ to 3′: (a) an adaptor region; and (b) a region that is identical or substantially identical to at least one target RNA.
In some embodiments of any of the aspects, the barcode region of a first primer in the second set of barcoded primers is a Hamming distance of at least 5 from each other barcode region of any other primer in the second set of barcoded primers.
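As a non-limiting illustration of why a minimum Hamming distance between barcodes is useful, the following Python sketch assigns an observed barcode read to the nearest known barcode while tolerating sequencing errors. With a minimum pairwise distance of 5, up to 2 mismatches can be tolerated without ambiguity, since a read within 2 mismatches of one barcode is at least 3 mismatches from every other. The barcode sequences, pool labels, and function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(
    observed: str, known: dict[str, str], max_mismatches: int = 2
) -> Optional[str]:
    """Return the label of the unique known barcode within max_mismatches
    of the observed sequence, or None if there is no unique match."""
    hits = [
        label
        for seq, label in known.items()
        if len(seq) == len(observed) and hamming(seq, observed) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8-nt barcodes with a minimum pairwise Hamming distance of 5.
KNOWN_BARCODES = {
    "AAAAAAAA": "pool_1",
    "CCCCCCCC": "pool_2",
    "GGGGGAAA": "pool_3",
}

if __name__ == "__main__":
    print(assign_barcode("AAAAAAAT", KNOWN_BARCODES))  # one mismatch
    print(assign_barcode("ACGTACGT", KNOWN_BARCODES))  # too far from all
```

Reads that cannot be assigned uniquely are returned as None in this sketch and could simply be discarded during demultiplexing.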
In some embodiments of any of the aspects, the second or third barcode region in the second set of primers comprises one of SEQ ID NOs: 18-989.
In some embodiments of any of the aspects, step (c) further comprises contacting the reverse transcription product with Uracil-DNA Glycosylase (UDG) enzyme.
In some embodiments of any of the aspects, step (c) further comprises contacting the reverse transcription product or amplification product thereof with a protector nucleic acid.
In some embodiments of any of the aspects, the protector nucleic acid comprises single stranded DNA.
In some embodiments of any of the aspects, the protector nucleic acid comprises, from 5′ to 3′: (a) a region complementary or substantially complementary to a region of at least one target RNA or amplification product thereof, comprising: (i) a 5′ region that is identical or substantially identical to the target-binding region of at least one primer in the first set of primers; and (ii) a 3′ region that is complementary to the target RNA sequence downstream of the target-binding region of at least one primer in the first set of primers; and (b) a 3′ nucleic acid modification that inhibits synthesis of a complementary strand by a polymerase.
In some embodiments of any of the aspects, the 3′ complementary region of the protector nucleic acid is at least 15 nucleotides long.
In some embodiments of any of the aspects, the 3′ complementary region of the protector nucleic acid is at most 30 nucleotides long.
In some embodiments of any of the aspects, the 3′ nucleic acid modification is selected from the group consisting of: (a) an inverted base; (b) a spacer; (c) a dideoxynucleotide; (d) a base that is not complementary to the target RNA; and (e) a non-canonical base.
In some embodiments of any of the aspects, the protector nucleic acid displaces a primer from the first set of primers from an amplification product of the reverse transcription product.
In some embodiments of any of the aspects, the protector nucleic acid inhibits or substantially inhibits a primer from the first set of primers from being extended by the DNA polymerase.
In some embodiments of any of the aspects, the protector nucleic acid has a higher binding affinity to an amplification product of the reverse transcription product than the target-binding region of the at least one primer from the first set of primers.
In some embodiments of any of the aspects, the protector nucleic acid has a higher Tm than the target-binding region of the at least one primer from the first set of primers.
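As a non-limiting illustration of the Tm comparison above, the following Python sketch estimates melting temperatures with a widely used GC-content approximation for oligonucleotides longer than about 13 nucleotides, Tm = 64.9 + 41 × (G + C − 16.4) / N. This formula is a crude estimate that ignores salt and oligonucleotide concentrations and nearest-neighbor effects; it is an assumption of this sketch, not a method stated herein. The sequences are hypothetical, and the 3′ modification of the protector is not modeled.

```python
def approx_tm_celsius(seq: str) -> float:
    """Crude GC-content Tm estimate (degrees C) for oligos longer than ~13 nt.

    Tm = 64.9 + 41 * (G + C - 16.4) / N
    """
    seq = seq.upper()
    n = len(seq)
    if n < 14:
        raise ValueError("approximation intended for oligos longer than 13 nt")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / n


# Hypothetical 20-nt target-binding region of a first (reverse transcription) primer.
PRIMER_TBR = "ACGTTGCAAGCTTGACCTGA"
# Hypothetical protector: a 5' region identical to the primer's target-binding
# region, followed by a 20-nt 3' region that extends the duplex with the
# amplification product.
PROTECTOR = PRIMER_TBR + "GGCATCGATCCGTAGCTAGC"

if __name__ == "__main__":
    print(round(approx_tm_celsius(PRIMER_TBR), 1))
    print(round(approx_tm_celsius(PROTECTOR), 1))
```

Because the protector forms a longer duplex than the primer's target-binding region alone, its estimated Tm is higher in this example, which is consistent with selecting an annealing temperature at which the protector, but not the carried-over primer, remains bound.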
In some embodiments of any of the aspects, the protector nucleic acid inhibits or substantially inhibits a primer from the first set of primers from binding to an amplification product of the reverse transcription product.
In some embodiments of any of the aspects, the protector nucleic acid is at least 15 nucleotides long.
In some embodiments of any of the aspects, the protector nucleic acid is at least 30 nucleotides long.
In some embodiments of any of the aspects, the protector nucleic acid is present at a concentration that is greater than the concentration of the primers in the first set of primers.
In some embodiments of any of the aspects, the protector nucleic acid is present at a concentration of at least 0.5 uM.
In some embodiments of any of the aspects, the protector nucleic acid is present at a concentration of at least 2.0 uM.
In some embodiments of any of the aspects, step (c) comprises a nucleic acid amplification method.
In some embodiments of any of the aspects, the amplification method comprises polymerase chain reaction amplification (PCR).
In some embodiments of any of the aspects, step (c) comprises: (i) a denaturation step; (ii) an annealing step; and (iii) an extension step, wherein steps (i)-(iii) are repeated at least 30 times.
In some embodiments of any of the aspects, step (c) further comprises an initial denaturation step, before the first step (i), at a temperature of at least 95° C. for at least 60 seconds.
In some embodiments of any of the aspects, step (i) is performed at a temperature of at least 95° C. for at least 15 seconds.
In some embodiments of any of the aspects, step (ii) is performed at a temperature of at least 60° C. for at least 30 seconds.
In some embodiments of any of the aspects, the first two iterations of step (ii) are performed at a temperature of at least 52° C.
In some embodiments of any of the aspects, the iterations of step (ii) after the first two iterations of step (ii) are performed at a temperature of at least 68° C.
In some embodiments of any of the aspects, step (iii) is performed at a temperature of at least 72° C. for at least 30 seconds.
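As a non-limiting illustration, the cycling embodiments above can be written out as an explicit thermocycling program. The following Python sketch uses the stated minimum temperatures and durations as example settings, with a lower annealing temperature for the first two cycles. The 30-second annealing hold for the two-stage variant, the class and function names, and the omission of ramp times are assumptions of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(cycles: int = 30, low_anneal_cycles: int = 2) -> list[Step]:
    """Initial denaturation followed by `cycles` rounds of
    denaturation, annealing, and extension."""
    program = [Step("initial denaturation", 95.0, 60)]
    for cycle in range(cycles):
        # First cycles anneal at the lower temperature; later cycles at the higher one.
        anneal_c = 52.0 if cycle < low_anneal_cycles else 68.0
        program += [
            Step("denaturation", 95.0, 15),
            Step("annealing", anneal_c, 30),  # hold time is an assumed value
            Step("extension", 72.0, 30),
        ]
    return program


def total_hold_seconds(program: list[Step]) -> int:
    """Sum of hold times only; instrument ramp times are ignored."""
    return sum(step.seconds for step in program)


if __name__ == "__main__":
    prog = build_program()
    print(len(prog), "steps,", total_hold_seconds(prog) / 60, "minutes of hold time")
```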
In some embodiments of any of the aspects, step (c) further comprises contacting at least one reverse transcription product with a protector nucleic acid, and wherein step (ii) is performed at a temperature of at least 64° C.
In some embodiments of any of the aspects, step (c) further comprises contacting at least one reverse transcription product with a protector nucleic acid, and wherein step (ii) is performed at a temperature of at least 72° C.
In some embodiments of any of the aspects, step (c) further comprises contacting at least one reverse transcription product with a protector nucleic acid, and at least one of the following: (I) step (ii) is performed at a temperature of at least 64° C.; (II) the 3′ complementary region of the protector nucleic acid is at least 20 nucleotides long; and/or (III) the protector nucleic acid is present at a concentration of at least 0.5 uM.
In some embodiments of any of the aspects, step (c) further comprises contacting at least one reverse transcription product with a protector nucleic acid, and at least one of the following: (I) step (ii) is performed at a temperature of at least 68° C.; (II) the 3′ complementary region of the protector nucleic acid is at least 30 nucleotides long; and/or (III) the protector nucleic acid is present at a concentration of at least 2.0 uM.
In some embodiments of any of the aspects, at least 10 amplification product sets from step (c) are combined in one container.
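As a non-limiting illustration of two-level pooling, the following Python sketch treats the barcode added during reverse transcription as identifying a sample within a pool and the barcode added during amplification as identifying the pool, so that the pair of barcodes identifies one sample in the sequencing run. This mapping, the function names, and the example numbers are assumptions made for illustration.

```python
from __future__ import annotations


def run_capacity(samples_per_pool: int, pools_per_run: int) -> int:
    """Number of samples distinguishable by a (sample barcode, pool barcode) pair."""
    return samples_per_pool * pools_per_run


def sample_index(rt_barcode_idx: int, pool_barcode_idx: int, samples_per_pool: int) -> int:
    """Map a (reverse transcription barcode, amplification barcode) pair
    to a run-wide sample index."""
    if not 0 <= rt_barcode_idx < samples_per_pool:
        raise ValueError("reverse transcription barcode index out of range")
    return pool_barcode_idx * samples_per_pool + rt_barcode_idx


def decode(sample_idx: int, samples_per_pool: int) -> tuple[int, int]:
    """Inverse of sample_index: returns (pool_barcode_idx, rt_barcode_idx)."""
    return divmod(sample_idx, samples_per_pool)


if __name__ == "__main__":
    # Example numbers only: 5,000 samples per batch and 20 batches per run.
    print(run_capacity(5000, 20))
    print(decode(sample_index(7, 3, 96), 96))
```

Under these example numbers, 20 pooled batches of 5,000 samples each would correspond to the ~100,000 samples per sequencing run mentioned herein.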
In some embodiments of any of the aspects, prior to step (d) the second set of barcoded primers is substantially removed.
In some embodiments of any of the aspects, prior to step (d) the second set of barcoded primers is substantially removed using a bead-based purification method or a spin-column-based purification method.
In some embodiments of any of the aspects, the sequencing method is a high-throughput sequencing method.
In some embodiments of any of the aspects, the sequencing method is selected from the group consisting of: sequencing by synthesis, dideoxy chain termination sequencing, pyrosequencing, sequencing by ligation and detection, polony sequencing, ion semiconductor sequencing, sequencing by hybridization, and nanopore sequencing.
In some embodiments of any of the aspects, the sequencing method is sequencing by synthesis.
In some embodiments of any of the aspects, the sequencing method comprises contacting the amplification products with a third set of primers, comprising at least first and second sequencing primers.
In some embodiments of any of the aspects, the first and second sequencing primers comprise an adaptor-binding region that is complementary or substantially complementary to the adaptor region of a primer in the first or second set of primers.
In some embodiments of any of the aspects, the sequencing method produces a sequencing read from the first or second sequencing primer.
In some embodiments of any of the aspects, the sequencing read from the first sequencing primer comprises the sequence of the first barcode region from a primer in the first primer set.
In some embodiments of any of the aspects, the sequencing read from the second sequencing primer comprises the sequence of the first and second barcode regions from a primer in the first primer set.
In some embodiments of any of the aspects, the sequencing read from the second sequencing primer comprises the sequence of the second barcode region from a primer in the second primer set.
In some embodiments of any of the aspects, the sequencing read from the first or second sequencing primer comprises sequence from the target RNA.
In some embodiments of any of the aspects, the sequencing read from the first or second sequencing primer comprises at least one variation of interest in the target RNA.
In some embodiments of any of the aspects, the target RNA is detected in the sample if a first and second barcode region associated with the specific target RNA is detected in the sequencing read of the amplification product.
In some embodiments of any of the aspects, the target RNA is not detected in the sample if a first or second barcode region associated with the specific target RNA is not detected in the sequencing read of the amplification product.
In some embodiments of any of the aspects, at least n target RNAs in a single sample are detected, and the at least n target RNAs are on the same assayed RNA molecule.
In some embodiments of any of the aspects, the assayed RNA molecule is: (i) determined to be present in the sample if at least one of the n target RNAs is detected; or (ii) determined to not be present in the sample if none of the n target RNAs are detected.
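As a non-limiting illustration of the detection logic above, the following Python sketch calls a target RNA as detected when the first and second barcode regions associated with that target are observed together in a sequencing read, and calls the assayed RNA molecule as present when at least one of its n target RNAs is detected. The barcode labels, target names, and function names are hypothetical, and no read-count threshold is applied in this sketch.

```python
from __future__ import annotations


def call_targets(
    observed_pairs: set[tuple[str, str]],
    expected: dict[str, tuple[str, str]],
) -> dict[str, bool]:
    """For each target, report whether its (first barcode, second barcode)
    pair was observed together in at least one read."""
    return {target: pair in observed_pairs for target, pair in expected.items()}


def molecule_present(target_calls: dict[str, bool]) -> bool:
    """Assayed RNA molecule is present if at least one of its targets is detected."""
    return any(target_calls.values())


# Hypothetical barcode pairs expected for three target regions on one RNA molecule.
EXPECTED = {
    "target_1": ("BC1_a", "BC2_a"),
    "target_2": ("BC1_b", "BC2_b"),
    "target_3": ("BC1_c", "BC2_c"),
}

if __name__ == "__main__":
    # Barcode pairs decoded from the reads of one sample (illustrative values).
    reads = {("BC1_b", "BC2_b"), ("BC1_a", "BC2_x")}
    calls = call_targets(reads, EXPECTED)
    print(calls)
    print(molecule_present(calls))
```

In this sketch, a read carrying only one of the two expected barcodes does not count toward detection of that target, and detecting several targets on the same molecule gives several independent opportunities to call the molecule present.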
In one aspect described herein is a method of preparing at least two pooled barcoded amplification sets from at least one target RNA in at least two samples, comprising the sequential steps of: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; and (c) contacting the pooled reverse transcription product mixture with a DNA polymerase and a set of second primers under conditions permitting the generation of amplification products.
In one aspect described herein is a reverse transcription solution comprising: (a) a reverse transcriptase; (b) a first set of primers comprising at least one barcode; (c) a detergent; (d) carrier nucleic acid; (e) at least one positive control nucleic acid; (f) at least one stabilization agent; and/or (g) reverse transcription reaction buffer.
In one aspect described herein is a collection tube containing a reverse transcription solution as described herein.
In one aspect described herein is a kit for detecting a target RNA in a sample, comprising: (a) a reverse transcriptase; (b) a first set of primers comprising at least one barcode; (c) a detergent; (d) a carrier nucleic acid; (e) a positive control nucleic acid; (f) at least one stabilization agent; (g) at least two containers; (h) a DNA polymerase; (i) a second set of primers; (j) Uracil-DNA Glycosylase (UDG) enzyme; (k) a protector nucleic acid; and/or (l) a third set of primers.
In one aspect described herein is a composition comprising: (a) a target RNA; (b) a reverse transcriptase; (c) a first primer or a first set of primers comprising at least one barcode; (d) a detergent; (e) a carrier nucleic acid; (f) a positive control nucleic acid; and/or (g) at least one stabilization agent.
In one aspect described herein is a composition comprising: (a) a barcoded reverse transcription product; (b) a second set of primers; (c) DNA polymerase; (d) Uracil-DNA Glycosylase (UDG) enzyme; and/or (e) a protector nucleic acid.
The technology described herein is directed to multiplexed methods of detecting at least one target RNA in at least two samples. Specifically, the methods use primers comprising at least one barcode region. Also described herein are kits, compositions, and systems associated with such methods. Such multiplexed methods, also referred to herein as “One-Seq,” exhibit at least the following advantages compared to existing detection methods: (1) the workflow permits barcoding of 50-5,000 samples per batch, with up to ~100,000 total samples per sequencing run; (2) the workflow permits pre-amplification pooling of reverse transcription products; (3) the method can be used to detect multiple loci on one target RNA molecule in one test; (4) the method can be used to detect multiple RNA target molecules, e.g., multiple viruses, in one test; (5) the method exhibits high sensitivity, e.g., as the number of RNA targets that are on one RNA molecule increases, the level of sensitivity increases (e.g., the sensitivity of the SARS-CoV-2 detection method approaches 50-150 genome copy equivalents per mL (gce/mL), compared to other sequencing-based tests that detect over 1000 gce/mL); (6) the method exhibits high efficiency, with reduced labor (e.g., no upfront extraction step, a one-pot reverse transcription step, reduced liquid-handling steps, etc.) and reduced cost per test; (7) the protector nucleic acid described herein can be used to reduce or eliminate barcode crosstalk that can result from reverse transcription primer carry-over into the amplification step; and (8) specially-designed primers can be used to detect variations of interest in the target RNA. The following discusses considerations to permit those of ordinary skill in the art to make and practice the compositions and methods described herein.
In multiple aspects, described herein are methods of detecting a target RNA. The target RNA can be detected at the single-molecule level using the methods, kits, and systems as described herein. In one aspect described herein is a multiplexed method of detecting at least one target RNA in at least two samples, comprising: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; (c) contacting the pooled reverse transcription product mixture with a DNA polymerase and a set of second primers under conditions permitting the generation of amplification products; and (d) sequencing the amplification products, thereby detecting at least one target RNA, if present, in the at least two samples. In some embodiments, step (a) is performed before step (b). In some embodiments, step (b) is performed before step (c). In some embodiments, step (c) is performed before step (d). In some embodiments, steps (a)-(d) are performed sequentially.
In one aspect described herein is a multiplexed method of detecting at least one target RNA in at least two samples, comprising the sequential steps of: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; (c) contacting the pooled reverse transcription product mixture with a DNA polymerase and a set of second primers under conditions permitting the generation of amplification products; and (d) sequencing the amplification products, thereby detecting at least one target RNA, if present, in the at least two samples.
In one aspect described herein is a multiplexed method of detecting at least one target RNA in at least two samples, consisting of: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; (c) contacting the pooled reverse transcription product mixture with a DNA polymerase and a set of second primers under conditions permitting the generation of amplification products; and (d) sequencing the amplification products, thereby detecting at least one target RNA, if present, in the at least two samples.
In one aspect described herein is a multiplexed method of detecting at least one target RNA in at least two samples, comprising: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; (c) contacting the pooled reverse transcription product mixture with a DNA polymerase, at least one protector nucleic acid, and a set of second primers under conditions permitting the generation of amplification products; and (d) sequencing the amplification products, thereby detecting at least one target RNA, if present, in the at least two samples.
In one aspect, described herein is a method of preparing at least two pooled barcoded amplification sets from at least one target RNA in at least two samples, comprising the sequential steps of: (a) contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products; (b) combining reverse transcription products from samples in step (a) in one container to form a pooled reverse transcription product mixture; and (c) contacting the pooled reverse transcription product mixture with a DNA polymerase and a set of second primers under conditions permitting the generation of amplification products. In some embodiments of any of the aspects, at least one target RNA in the at least two pooled barcoded amplification sets is detected using a sequencing method.
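As a non-limiting illustration of how sequencing reads from a pooled mixture can be assigned back to individual samples, the following minimal sketch demultiplexes reads by a sample barcode and reports each sample as detected or not detected. The barcode sequences, the barcode position at the start of the read, the target fragment, and the read-count threshold are all hypothetical values chosen for illustration; they are not taken from the methods described herein.

```python
from collections import Counter

# Hypothetical sample barcodes (assumptions, not from the source). Each read is
# assumed to begin with the barcode introduced by the reverse transcription primer.
SAMPLE_BARCODES = {
    "ACGTAC": "sample_1",
    "TGCATG": "sample_2",
    "GATCGA": "sample_3",
}
BARCODE_LEN = 6
MIN_READS = 2  # illustrative positivity threshold (assumption)


def demultiplex(reads, target_fragment):
    """Count reads per sample that carry a known barcode followed by target sequence."""
    counts = Counter()
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = SAMPLE_BARCODES.get(barcode)
        if sample is not None and target_fragment in insert:
            counts[sample] += 1
    return counts


def call_samples(counts):
    """Report every barcoded sample as detected (True) or not detected (False)."""
    return {name: counts[name] >= MIN_READS for name in SAMPLE_BARCODES.values()}


pooled_reads = [
    "ACGTAC" + "TTGGCCAATT",
    "ACGTAC" + "AATTGGCCAATTC",
    "TGCATG" + "TTGGCCAATT",
    "GATCGA" + "CCCCCCCCCC",  # known barcode, but no target sequence
    "NNNNNN" + "TTGGCCAATT",  # unrecognized barcode, ignored
]
counts = demultiplex(pooled_reads, target_fragment="GGCCAA")
calls = call_samples(counts)
print(calls)
```

In this sketch, only reads that carry both a recognized barcode and the target fragment contribute to a sample's count, so pooling before amplification does not prevent per-sample reporting.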
The detection methods as described herein are highly multiplexed. In some embodiments of any of the aspects, the multiplexed method detects at least one target RNA in at least two samples or as many as 100,000 samples in one sequencing run. In some embodiments of any of the aspects, at least one target RNA from at least 50 samples is/are detected, e.g., in a single performance of steps (a) - (d). In some embodiments of any of the aspects, at least one target RNA from at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370, at least 380, at least 390, at least 400, at least 410, at least 420, at least 430, at least 440, at least 450, at least 460, at least 470, at least 480, at least 490, at least 500, at least 510, at least 520, at least 530, at least 540, at least 550, at least 560, at least 570, at least 580, at least 590, at least 600, at least 610, at least 620, at least 630, at least 640, at least 650, at least 660, at least 670, at least 680, at least 690, at least 700, at least 710, at least 720, at least 730, at least 740, at least 750, at least 760, at least 770, at least 780, at least 790, at least 800, at least 810, at least 820, at least 830, at least 840, at least 850, at least 860, at least 870, at least 880, at least 890, at least 900, at least 910, at least 920, at least 930, at least 940, at least 950, at least 960, at least 970, at least 980, at least 990, at least 1000, at least 1500, at least 2000, at least 2500, at least 3000, at least 3500, at least 4000, at least 4500, at least 5000, at least 5500, at least 6000, at least 6500, at least 7000, at least 7500, at least 8000, at least 8500, at least 9000, at least 9500, at least 10000, at least 15000, at least 20000, at least 25000, at least 30000, at least 35000, at least 40000, at least 45000, at least 50000, at least 55000, at least 60000, at least 65000, at least 70000, at least 75000, at least 80000, at least 85000, at least 90000, at least 95000, at least 100000 or more samples is/are detected. This improved workflow, facilitated for example by pre-amplification barcoding and pooling ahead of next generation sequencing, permits highly increased throughput without sacrificing sensitivity.
In some embodiments of any of the aspects, at least one target RNA from at least 50 samples are detected per batch, e.g., in a single performance of steps (a) - (c). In some embodiments of any of the aspects, at least one target RNA from at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370, at least 380, at least 390, at least 400, at least 410, at least 420, at least 430, at least 440, at least 450, at least 460, at least 470, at least 480, at least 490, at least 500, at least 510, at least 520, at least 530, at least 540, at least 550, at least 560, at least 570, at least 580, at least 590, at least 600, at least 610, at least 620, at least 630, at least 640, at least 650, at least 660, at least 670, at least 680, at least 690, at least 700, at least 710, at least 720, at least 730, at least 740, at least 750, at least 760, at least 770, at least 780, at least 790, at least 800, at least 810, at least 820, at least 830, at least 840, at least 850, at least 860, at least 870, at least 880, at least 890, at least 900, at least 910, at least 920, at least 930, at least 940, at least 950, at least 960, at least 970, at least 980, at least 990, at least 1000 samples are detected per batch.
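As a non-limiting illustration of the relationship between batch size and the number of batches combined into one sequencing run, the following minimal sketch computes how many batches would be needed for a given number of samples. The batch sizes of 50 to 5,000 samples and the figure of approximately 100,000 samples per run are those stated above; the function name and the assumption that every batch is filled to the same size are illustrative only.

```python
import math


def batches_needed(total_samples, samples_per_batch):
    """Return the number of batches required to cover total_samples."""
    if total_samples < 0 or samples_per_batch <= 0:
        raise ValueError("total_samples must be >= 0 and samples_per_batch > 0")
    return math.ceil(total_samples / samples_per_batch)


# Number of batches for one run of ~100,000 samples at several batch sizes.
run_plan = {size: batches_needed(100_000, size) for size in (50, 1_000, 5_000)}
for size, n_batches in run_plan.items():
    print(f"{size} samples per batch -> {n_batches} batches")
```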
The detection methods as described herein are highly sensitive. In some embodiments of any of the aspects, the detection method has a limit of detection of at least 500 target RNAs per mL for a given target RNA. As used herein, the term “limit of detection” (LoD or detection limit) refers to the lowest quantity of the target RNA that can be distinguished from the absence of target RNA with a predetermined confidence level (e.g., 90% or 95% detection rate). In some embodiments of any of the aspects, the detection method has a limit of detection of at least 1000 target RNA copies per mL for a given target RNA. In some embodiments of any of the aspects, the detection method, e.g., using one primer per target RNA molecule, has a limit of detection of at least 500 target RNA copies per mL for a given target RNA. In some embodiments of any of the aspects, the detection method, e.g., using four primers per target RNA molecule, has a limit of detection of at least 100 target RNA copies per mL for a given target RNA. In some embodiments of any of the aspects, the limit of detection of the target RNA decreases and the sensitivity increases as the number of primers specific for a given target RNA molecule increases.
In some embodiments of any of the aspects, the detection method has a limit of detection of at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370, at least 380, at least 390, at least 400, at least 410, at least 420, at least 430, at least 440, at least 450, at least 460, at least 470, at least 480, at least 490, at least 500, at least 510, at least 520, at least 530, at least 540, at least 550, at least 560, at least 570, at least 580, at least 590, at least 600, at least 610, at least 620, at least 630, at least 640, at least 650, at least 660, at least 670, at least 680, at least 690, at least 700, at least 710, at least 720, at least 730, at least 740, at least 750, at least 760, at least 770, at least 780, at least 790, at least 800, at least 810, at least 820, at least 830, at least 840, at least 850, at least 860, at least 870, at least 880, at least 890, at least 900, at least 910, at least 920, at least 930, at least 940, at least 950, at least 960, at least 970, at least 980, at least 990, or at least 1000 or more target RNA copies per mL for a given target RNA.
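As a non-limiting illustration of the limit of detection as defined above, the following minimal sketch estimates the LoD from replicate measurements as the lowest tested concentration at which that concentration, and every higher tested concentration, meets a predetermined detection rate. The replicate data are hypothetical, and the selection rule is one simple convention assumed here for illustration rather than a prescribed procedure.

```python
def limit_of_detection(replicates, min_detection_rate=0.95):
    """Return the lowest concentration meeting the detection rate, or None.

    replicates maps a concentration (copies per mL) to a list of booleans,
    one per replicate, where True means the target RNA was detected.
    """
    lod = None
    # Walk from the highest concentration down and stop at the first failure.
    for concentration in sorted(replicates, reverse=True):
        results = replicates[concentration]
        detection_rate = sum(results) / len(results)
        if detection_rate < min_detection_rate:
            break
        lod = concentration
    return lod


# Hypothetical replicate results (20 replicates per concentration).
hypothetical_data = {
    1000: [True] * 20,
    500: [True] * 20,
    100: [True] * 20,
    50: [True] * 17 + [False] * 3,
    10: [True] * 5 + [False] * 15,
}
print(limit_of_detection(hypothetical_data))
print(limit_of_detection(hypothetical_data, min_detection_rate=0.8))
```

With these hypothetical data, relaxing the required detection rate lowers the reported LoD, which is why the confidence level is stated together with the LoD value.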
In some embodiments of any of the aspects, the detection method has a dynamic range of at least 3 logs. As used herein, the term “dynamic range” refers to the variation of target RNA concentrations detectable by the methods described herein. Dynamic range can be calculated as the base-10 logarithmic value (“logs”) of the ratio of the largest to the smallest signal value. In some embodiments of any of the aspects, the detection method has a dynamic range of at least 5 logs. In some embodiments of any of the aspects, the detection method has a dynamic range of at least 6 logs. In some embodiments of any of the aspects, the detection method has a dynamic range of at least 3 logs, at least 3.25 logs, at least 3.5 logs, at least 3.75 logs, at least 4 logs, at least 4.25 logs, at least 4.5 logs, at least 4.75 logs, at least 5 logs, at least 5.25 logs, at least 5.5 logs, at least 5.75 logs, at least 6 logs, at least 6.25 logs, at least 6.5 logs, at least 6.75 logs, at least 7 logs or more.
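As a non-limiting illustration, the following minimal sketch expresses a dynamic range in logs. It assumes that the span between the smallest and largest detectable values is expressed on a base-10 logarithmic scale, i.e., as the base-10 logarithm of the largest value divided by the smallest value; this reading, the function name, and the example concentrations are assumptions made for illustration.

```python
import math


def dynamic_range_logs(lowest, highest):
    """Return the span between lowest and highest in base-10 logs."""
    if lowest <= 0 or highest < lowest:
        raise ValueError("require 0 < lowest <= highest")
    return math.log10(highest / lowest)


# Hypothetical example: detectable from 100 to 100,000,000 copies per mL.
span = dynamic_range_logs(100, 100_000_000)
print(f"dynamic range: {span:.2f} logs")
```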
In some embodiments of any of the aspects, between any of the steps, the reaction product is diluted before being added to the next reaction step. In some embodiments of any of the aspects, the reaction product of step (a) (e.g., the RT step) is diluted prior to being added to step (b) (e.g., the pooling step). In some embodiments of any of the aspects, the pooled mixture of step (b) (e.g., the pooling step) is diluted prior to being added to step (c) (e.g., the amplification step). In some embodiments of any of the aspects, the reaction product of step (c) (e.g., the amplification step) is diluted prior to being added to step (d) (e.g., the sequencing step). In some embodiments, such a dilution step reduces the level of components (e.g., primers, stabilization agents, metal-chelating agents, etc.) that can inhibit subsequent enzymatic reaction(s).
In some embodiments of any of the aspects, the diluent comprises the reaction buffer of the next reaction or an aqueous solution. In some embodiments of any of the aspects, the dilution comprises a ratio of at least 4:5, at least 2:3, at least 1:2, at least 1:3, at least 1:4, at least 1:5, at least 1:6, at least 1:7, at least 1:8, at least 1:9, at least 1:10, at least 1:20, at least 1:30, at least 1:40, at least 1:50, at least 1:60, at least 1:70, at least 1:80, at least 1:90, at least 1:100, at least 1:200, at least 1:300, at least 1:400, at least 1:500, at least 1:600, at least 1:700, at least 1:800, at least 1:900, at least 1:10^3, at least 1:10^4, or at least 1:10^5, of reaction product to diluent.
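As a non-limiting illustration of how such a dilution lowers the level of a carried-over component, the following minimal sketch computes the concentration remaining after dilution. It assumes that a ratio written as A:B of reaction product to diluent means A parts of reaction product combined with B parts of diluent, so that the concentration is multiplied by A/(A+B); the starting concentrations are hypothetical.

```python
from fractions import Fraction


def diluted_concentration(stock, parts_product, parts_diluent):
    """Concentration after mixing parts_product of stock with parts_diluent of diluent."""
    if parts_product <= 0 or parts_diluent < 0:
        raise ValueError("parts_product must be > 0 and parts_diluent >= 0")
    return stock * Fraction(parts_product, parts_product + parts_diluent)


# Hypothetical component at 1 unit; a 1:9 dilution leaves one tenth of it.
one_to_nine = diluted_concentration(Fraction(1), 1, 9)
# A 4:5 dilution of a hypothetical 9-unit component leaves 4 units.
four_to_five = diluted_concentration(Fraction(9), 4, 5)
print(one_to_nine, four_to_five)
```

Exact fractions are used here only to keep the arithmetic free of rounding; under the alternative convention in which 1:10 denotes one part in ten total parts, the factor would instead be A/B.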
In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d), or a sub-part thereof, are performed between 12° C. and 72° C. As a non-limiting example, steps (a), (b), (c), and/or (d), or a sub-part thereof, are performed at a temperature of at least 12° C., at least 13° C., at least 14° C., at least 15° C., at least 16° C., at least 17° C., at least 18° C., at least 19° C., at least 20° C., at least 21° C., at least 22° C., at least 23° C., at least 24° C., at least 25° C., at least 26° C., at least 27° C., at least 28° C., at least 29° C., at least 30° C., at least 31° C., at least 32° C., at least 33° C., at least 34° C., at least 35° C., at least 36° C., at least 37° C., at least 38° C., at least 39° C., at least 40° C., at least 41° C., at least 42° C., at least 43° C., at least 44° C., at least 45° C., at least 46° C., at least 47° C., at least 48° C., at least 49° C., at least 50° C., at least 51° C., at least 52° C., at least 53° C., at least 54° C., at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., at least 65° C., at least 66° C., at least 67° C., at least 68° C., at least 69° C., at least 70° C., at least 71° C., at least 72° C. or more. In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d) further comprise a step of heat-inactivation, e.g., heat-inactivation of an enzyme (reverse transcriptase; UDG; etc.). Such heat inactivation can be performed at a temperature of at least 50° C., at least 55° C., at least 60° C., at least 65° C., at least 70° C., at least 75° C., at least 80° C., at least 85° C., at least 90° C., at least 91° C., at least 92° C., at least 93° C., at least 94° C., at least 95° C., at least 96° C., at least 97° C., at least 98° C., at least 99° C., or at least 99.5° C.
In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d), or a sub-part thereof, are performed at a temperature of at most 20° C., at most 21° C., at most 22° C., at most 23° C., at most 24° C., at most 25° C., at most 26° C., at most 27° C., at most 28° C., at most 29° C., at most 30° C., at most 31° C., at most 32° C., at most 33° C., at most 34° C., at most 35° C., at most 36° C., at most 37° C., at most 38° C., at most 39° C., at most 40° C., at most 41° C., at most 42° C., at most 43° C., at most 44° C., at most 45° C., at most 46° C., at most 47° C., at most 48° C., at most 49° C., at most 50° C., at most 51° C., at most 52° C., at most 53° C., at most 54° C., at most 55° C., at most 56° C., at most 57° C., at most 58° C., at most 59° C., at most 60° C., at most 61° C., at most 62° C., at most 63° C., at most 64° C., at most 65° C., at most 66° C., at most 67° C., at most 68° C., at most 69° C., at most 70° C., at most 71° C., or at most 72° C.
In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d) are performed at room temperature. As used herein, the term “room temperature” refers to the ambient temperature of a space, which is typically 20° C.-22° C. In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d) are performed at body temperature. As used herein, the term “body temperature” refers to the temperature of the subject such as that of a human subject, which is typically 37° C. In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d) are performed on a heat block or an incubator capable of maintaining a stable temperature. In some embodiments of any of the aspects, the heat block or incubator is set to approximately 50° C. In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d) are performed in a thermocycler.
In some embodiments of any of the aspects, steps (a), (b), (c), and/or (d) are performed in at most 30 minutes. As a non-limiting example, steps (a), (b), (c), and/or (d) are performed in at most 5 minutes, at most 6 minutes, at most 7 minutes, at most 8 minutes, at most 9 minutes, at most 10 minutes, at most 15 minutes, at most 20 minutes, at most 25 minutes, at most 30 minutes, at most 40 minutes, at most 50 minutes, at most 60 minutes, at most 70 minutes, at most 80 minutes, at most 90 minutes, or at most 100 minutes.
In some embodiments of any of the aspects, steps (a), (b), and (c) are performed in at most 60 minutes. In some embodiments of any of the aspects, steps (a), (b), and (c) are performed in at most 60 minutes, at most 65 minutes, at most 70 minutes, at most 75 minutes, at most 80 minutes, at most 85 minutes, at most 90 minutes, at most 95 minutes, at most 100 minutes, at most 105 minutes, at most 110 minutes, at most 115 minutes, at most 120 minutes, at most 2.5 hours, at most 3 hours, at most 3.5 hours, at most 4 hours, at most 4.5 hours, at most 5 hours, at most 5.5 hours, at most 6 hours, at most 6.5 hours, at most 7 hours, at most 7.5 hours, at most 8 hours, at most 8.5 hours, at most 9 hours, at most 9.5 hours, at most 10 hours, at most 10.5 hours, at most 11 hours, at most 11.5 hours, at most 12 hours, at most 12.5 hours, at most 13 hours, at most 13.5 hours, at most 14 hours, at most 14.5 hours, at most 15 hours, at most 15.5 hours, at most 16 hours, at most 16.5 hours, at most 17 hours, at most 17.5 hours, or at most 18 hours.
In some embodiments of any of the aspects, steps (a), (b), (c), and (d) are performed in at most 180 minutes. In some embodiments of any of the aspects, steps (a), (b), (c), and (d) are performed in at most 2 hours, at most 2.5 hours, at most 3 hours, at most 3.5 hours, at most 4 hours, at most 4.5 hours, at most 5 hours, at most 5.5 hours, at most 6 hours, at most 6.5 hours, at most 7 hours, at most 7.5 hours, at most 8 hours, at most 8.5 hours, at most 9 hours, at most 9.5 hours, at most 10 hours, at most 10.5 hours, at most 11 hours, at most 11.5 hours, at most 12 hours, at most 12.5 hours, at most 13 hours, at most 13.5 hours, at most 14 hours, at most 14.5 hours, at most 15 hours, at most 15.5 hours, at most 16 hours, at most 16.5 hours, at most 17 hours, at most 17.5 hours, or at most 18 hours.
Described herein are methods, kits, and systems permitting detection of a target RNA from a sample. The term “sample” or “test sample” as used herein denotes a sample taken or isolated from a biological organism, e.g., a subject in need of testing. In some embodiments of any of the aspects, the technology described herein encompasses several examples of a biological sample, including but not limited to a saliva sample, sputum sample, a nasopharyngeal sample, a pharyngeal sample, or a nasal sample. In some embodiments of any of the aspects, the sample is a saliva sample. In some embodiments of any of the aspects, the sample is obtained using a swab or another collection tool. In some embodiments of any of the aspects, the biological sample is cells, or tissue, or peripheral blood, or bodily fluid. Depending on the type of target RNA to be detected, exemplary biological samples include, but are not limited to, a biopsy, a tumor sample, biofluid sample; blood; serum; plasma; urine; semen; mucus; tissue biopsy; organ biopsy; synovial fluid; bile fluid; cerebrospinal fluid; mucosal secretion; effusion; sweat; saliva; and/or tissue sample, etc. The term also includes a mixture of the above-mentioned samples. The term “test sample” also includes untreated or pretreated (or pre-processed) biological samples. In some embodiments of any of the aspects, a test sample can comprise cells from a subject.
In some embodiments of any of the aspects, the sample is contacted with a transport medium, such as a viral transport medium (VTM). In some embodiments of any of the aspects, the transport medium preserves the target RNA between the time of sample collection and assaying the sample for the detection of the target RNA. The constituents of suitable viral transport media are designed to provide an isotonic solution containing protective agents, including protein protective agents, antibiotics to control microbial contamination, and one or more buffers to control the pH. Isotonicity, however, is not an absolute requirement; some highly successful transport media contain hypertonic solutions of sucrose. Liquid transport media are used primarily for transporting swabs or materials released into the medium from a collection swab. Liquid media can be added to other specimens when inactivation of the viral agent is likely and when the resultant dilution is acceptable. An exemplary VTM comprises FBS (e.g., 2%; heat-inactivated at 56° C. for 30 min, Gibco™ 26140079), 1x Antibiotic-Antimycotic (Gibco™, 15240096) and phenol red (e.g., 11 mg/L), in 1x Hank’s balanced salt solution. In some embodiments of any of the aspects, the VTM further comprises a detergent, in an amount that does not interfere with subsequent enzymatic reactions; the detergent can allow for viral lysis without the need for a nucleic-acid extraction step. Another exemplary VTM suitable for use in collecting throat and nasal swabs from human patients is prepared as follows: (1) add 10 g veal infusion broth and 2 g bovine albumin fraction V to sterile distilled water (to 400 ml); (2) add 0.8 ml gentamicin sulfate solution (50 mg/ml) and 3.2 ml amphotericin B (250 µg/ml); and (3) sterilize by filtration. Additional non-limiting examples of viral transport media include COPAN Universal Transport Medium; Eagle Minimum Essential Medium (E-MEM); Transport medium 199; and PBS-Glycerol transport medium; see e.g., Johnson, Transport of Viral Specimens, CLINICAL MICROBIOLOGY REVIEWS, April 1990, p. 120-131; Collecting, preserving and shipping specimens for the diagnosis of avian influenza A(H5N1) virus infection, Guide for field operations, October 2006. In some embodiments of any of the aspects, the viral transport medium does not inhibit the detection methods as described herein.
In some embodiments of any of the aspects, prior to the reverse transcription (RT) step, total RNA is not isolated from the sample. In some embodiments of any of the aspects, prior to the RT step, the at least one target RNA is not extracted from the sample. In some embodiments of any of the aspects, prior to the RT step, a standard RNA isolation method or kit is not used. Non-limiting examples of standard RNA extraction methods, which need not be used herein, include: (1) organic extraction, such as phenol-Guanidine Isothiocyanate (GITC)-based solutions (e.g., TRIZOL and TRI reagent); (2) silica-membrane based spin column technology (e.g., RNeasy and its variants); (3) paramagnetic particle technology (e.g., DYNABEADS mRNA DIRECT MICRO); (4) density gradient centrifugation using cesium chloride or cesium trifluoroacetate; (5) lithium chloride and urea isolation; (6) oligo(dT)-cellulose column chromatography; and (7) non-column poly (A)+ purification/isolation. In some embodiments of any of the aspects, prior to the RT step, the sample is not heat-inactivated.
In some embodiments of any of the aspects, prior to the RT step, the sample is contacted with a detergent, in an amount that does not interfere with subsequent enzymatic reactions; the detergent can allow for viral lysis without the need for a nucleic-acid extraction step. Alternatively, the sample can be contacted with a detergent in an amount that facilitates release of viral nucleic acids, but that may be high enough to impact subsequent enzymatic steps; in this instance, dilution of the detergent-containing sample prior to enzymatic reaction (e.g., RT reaction, amplification reaction, or both) can reduce the detergent to a level that permits efficient enzyme activity. Non-limiting examples of detergents include Triton X-100, sodium tri-isopropyl naphthalene sulfonate, lithium dodecyl sulfate (LDS), sodium dodecyl sulfate (SDS), NP-40, lecithin, a Span group (e.g., Span 20 or 80), a Tween group (e.g., Tween 20, 21, 40, 60, 60 K, 61, 65, 80, 80 K, 81, or 85), a sugar amide (e.g., polysaccharide amide), or an alkyl polyglucoside. In some embodiments of any of the aspects, the detergent is Triton X-100 (2-[4-(2,4,4-trimethylpentan-2-yl)phenoxy]ethanol).
In some embodiments of any of the aspects, the test sample can be an untreated test sample. As used herein, the phrase “untreated test sample” refers to a test sample that has not had any prior sample pre-treatment except for dilution and/or suspension in a solution. While pre-treatment is not required, and the lack of such requirement provides an advantage for assay workflow and throughput, in some embodiments the test sample can be treated prior to performing the RNA detection methods as described herein. Exemplary methods for treating a test sample include, but are not limited to, centrifugation, filtration, sonication, homogenization, heating, freezing and thawing, and combinations thereof. In some embodiments of any of the aspects, the test sample can be a frozen test sample. The frozen sample can be thawed before employing methods, assays and systems described herein. After thawing, a frozen sample can be centrifuged before being subjected to methods, assays and systems described herein. In some embodiments of any of the aspects, the test sample is a clarified test sample, for example, by centrifugation and collection of a supernatant comprising the clarified test sample. In some embodiments of any of the aspects, a test sample can be a pre-processed test sample, for example, supernatant or filtrate resulting from a treatment selected from the group consisting of centrifugation, homogenization, sonication, filtration, thawing, purification, and any combinations thereof. In some embodiments of any of the aspects, the test sample can be treated with a chemical and/or biological reagent. Chemical and/or biological reagents can be employed, for example, to protect and/or maintain the stability of the sample, including biomolecules (e.g., nucleic acid and protein) therein, during processing. The skilled artisan is well aware of methods and processes appropriate for pre-processing of biological samples required for detection of a nucleic acid as described herein.
Described herein are methods, kits, and systems that can be used to detect a target RNA, which can also be referred to as “an RNA of interest.” Ribonucleic acid (RNA) is a polymeric nucleic acid molecule that plays essential biological roles in the coding, decoding, regulation, and expression of genes. Each nucleotide in RNA contains a ribose sugar, with carbons numbered 1′ through 5′. A base is attached to the 1′ position, in general, adenine (A), cytosine (C), guanine (G), or uracil (U). A phosphate group is attached to the 3′ position of one ribose and the 5′ position of the next. The phosphate groups each carry a negative charge, making RNA a charged molecule (polyanion). An important structural component of RNA that distinguishes it from DNA is the presence of a hydroxyl group at the 2′ position of the ribose sugar. In some embodiments of any of the aspects, the target RNA can be any known type of RNA. In some embodiments of any of the aspects, the target RNA comprises an RNA selected from Table 11.
In some embodiments of any of the aspects, at least 2 target RNAs in a single sample are detected, which can be on the same RNA molecule or different RNA molecules. Targeting more than one sequence on an RNA molecule, including but not limited to more than one sequence on a viral genomic RNA, can permit increased sensitivity for the assay. This is especially true of longer RNA molecules, which can be subject to some degree of degradation: an assay designed to detect any of a number of sequences on the RNA molecule can improve the chances for detection by increasing the number of possible targets for detection. If one target site has been disrupted by cleavage or other degradation, other sites may remain intact, permitting detection. In some embodiments of any of the aspects, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370, at least 380, at least 390, at least 400, at least 410, at least 420, at least 430, at least 440, at least 450, at least 460, at least 470, at least 480, at least 490, at least 500, at least 510, at least 520, at least 530, at least 540, at least 550, at least 560, at least 570, at least 580, at least 590, at least 600, at least 610, at least 620, at least 630, at least 640, at least 650, at least 660, at least 670, at least 680, at least 690, at least 700, at least 710, at least 720, at least 730, at least 740, at least 750, at least 760, at least 770, at least 780, at least 790, at least 800, at least 810, at least 820, at least 830, at least 840, at least 850, at least 860, at least 870, at least 880, at least 890, at least 900, at least 910, at least 920, at least 930, at least 940, at least 950, at least 960, at least 970, at least 980, at least 990, or at least 1000 target RNAs in a single sample are detected, which can be on the same RNA molecule or different RNA molecules.
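As a non-limiting illustration of why targeting several sequences on one RNA molecule can improve the chances of detection, the following minimal sketch computes the probability that at least one of several target sites yields a detectable product. It assumes, purely for illustration, that each site is intact and detected with the same probability and that the sites behave independently; neither the per-site probability nor the independence assumption is taken from the methods described herein.

```python
def prob_any_site_detected(per_site_probability, n_sites):
    """Probability that at least one of n_sites independent sites is detected."""
    if not 0.0 <= per_site_probability <= 1.0:
        raise ValueError("per_site_probability must be between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # Complement of the event that every site fails to be detected.
    return 1.0 - (1.0 - per_site_probability) ** n_sites


# Hypothetical per-site probability of 0.5 for a partially degraded RNA.
for n in (1, 2, 4, 8):
    print(n, prob_any_site_detected(0.5, n))
```

Under these assumptions the probability of missing every site shrinks geometrically as sites are added, which is consistent with the qualitative statement above that other sites may remain intact when one site is disrupted.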
In some embodiments of any of the aspects, at least one target RNA is a viral RNA. In some embodiments of any of the aspects, at least 2 target RNAs are from the same virus, which can be an RNA virus, a retrovirus, or a DNA virus. In some embodiments of any of the aspects, at least 2 target RNAs are from at least 2 different viruses, non-limiting examples of which are provided herein. Accordingly, in one aspect described herein is a method of detecting an RNA virus in a sample from a subject, comprising: obtaining a sample from the subject; and performing the methods as described herein (e.g., One-Seq) to detect the target viral RNA.
As used herein, the term “RNA virus” refers to a virus comprising an RNA genome. In some embodiments of any of the aspects, the RNA virus is a double-stranded RNA virus, a positive-sense RNA virus, a negative-sense RNA virus, or a reverse transcribing virus (e.g., retrovirus).
In some embodiments of any of the aspects, the RNA virus is a Group III (i.e., double stranded RNA (dsRNA)) virus. In some embodiments of any of the aspects, the Group III RNA virus belongs to a viral family selected from the group consisting of: Amalgaviridae, Birnaviridae, Chrysoviridae, Cystoviridae, Endornaviridae, Hypoviridae, Megabirnaviridae, Partitiviridae, Picobirnaviridae, Reoviridae (e.g., Rotavirus), Totiviridae, and Quadriviridae. In some embodiments of any of the aspects, the Group III RNA virus belongs to the Genus Botybirnavirus. In some embodiments of any of the aspects, the Group III RNA virus is an unassigned species selected from the group consisting of: Botrytis porri RNA virus 1, Circulifer tenellus virus 1, Colletotrichum camelliae filamentous virus 1, Cucurbit yellows associated virus, Sclerotinia sclerotiorum debilitation-associated virus, and Spissistilus festinus virus 1.
In some embodiments of any of the aspects, the RNA virus is a Group IV (i.e., positive-sense single-stranded RNA (ssRNA)) virus. In some embodiments of any of the aspects, the Group IV RNA virus belongs to a viral order selected from the group consisting of: Nidovirales, Picornavirales, and Tymovirales. In some embodiments of any of the aspects, the Group IV RNA virus belongs to a viral family selected from the group consisting of: Arteriviridae, Coronaviridae (e.g., Coronavirus, SARS-CoV), Mesoniviridae, Roniviridae, Dicistroviridae, Iflaviridae, Marnaviridae, Picornaviridae (e.g., Poliovirus, Rhinovirus (a common cold virus), Hepatitis A virus), Secoviridae (e.g., subfamily Comovirinae), Alphaflexiviridae, Betaflexiviridae, Gammaflexiviridae, Tymoviridae, Alphatetraviridae, Alvernaviridae, Astroviridae, Barnaviridae, Benyviridae, Bromoviridae, Caliciviridae (e.g., Norwalk virus), Carmotetraviridae, Closteroviridae, Flaviviridae (e.g., Yellow fever virus, West Nile virus, Hepatitis C virus, Dengue fever virus, Zika virus), Fusariviridae, Hepeviridae, Hypoviridae, Leviviridae, Luteoviridae (e.g., Barley yellow dwarf virus), Polycipiviridae, Narnaviridae, Nodaviridae, Permutotetraviridae, Potyviridae, Sarthroviridae, Statovirus, Togaviridae (e.g., Rubella virus, Ross River virus, Sindbis virus, Chikungunya virus), Tombusviridae, and Virgaviridae. In some embodiments of any of the aspects, the Group IV RNA virus belongs to a viral genus selected from the group consisting of: Bacillariornavirus, Dicipivirus, Labyrnavirus, Sequiviridae, Blunervirus, Cilevirus, Higrevirus, Idaeovirus, Negevirus, Ourmiavirus, Polemovirus, Sinaivirus, and Sobemovirus. In some embodiments of any of the aspects, the Group IV RNA virus is an unassigned species selected from the group consisting of: Acyrthosiphon pisum virus, Bastrovirus, Blackford virus, Blueberry necrotic ring blotch virus, Cadicistrovirus, Chara australis virus, Extra small virus, Goji berry chlorosis virus, Hepelivirus, Jingmen tick virus, Le Blanc virus, Nedicistrovirus, Nesidiocoris tenuis virus 1, Niflavirus, Nylanderia fulva virus 1, Orsay virus, Osedax japonicus RNA virus 1, Picalivirus, Plasmopara halstedii virus, Rosellinia necatrix fusarivirus 1, Santeuil virus, Secalivirus, Solenopsis invicta virus 3, and Wuhan large pig roundworm virus. In some embodiments of any of the aspects, the Group IV RNA virus is a satellite virus selected from the group consisting of: Family Sarthroviridae, Genus Albetovirus, Genus Aumaivirus, Genus Papanivirus, Genus Virtovirus, and Chronic bee paralysis virus.
In some embodiments of any of the aspects, the RNA virus is a Group V (i.e., negative-sense ssRNA) virus. In some embodiments of any of the aspects, the Group V RNA virus belongs to a viral phylum or subphylum selected from the group consisting of: Negarnaviricota, Haploviricotina, and Polyploviricotina. In some embodiments of any of the aspects, the Group V RNA virus belongs to a viral class selected from the group consisting of: Chunqiuviricetes, Ellioviricetes, Insthoviricetes, Milneviricetes, Monjiviricetes, and Yunchangviricetes. In some embodiments of any of the aspects, the Group V RNA virus belongs to a viral order selected from the group consisting of: Articulavirales, Bunyavirales, Goujianvirales, Jingchuvirales, Mononegavirales, Muvirales, and Serpentovirales. In some embodiments of any of the aspects, the Group V RNA virus belongs to a viral family selected from the group consisting of: Amnoonviridae (e.g., Taastrup virus), Arenaviridae (e.g., Lassa virus), Aspiviridae, Bornaviridae (e.g., Borna disease virus), Chuviridae, Cruliviridae, Feraviridae, Filoviridae (e.g., Ebola virus, Marburg virus), Fimoviridae, Hantaviridae, Jonviridae, Mymonaviridae, Nairoviridae, Nyamiviridae, Orthomyxoviridae (e.g., Influenza viruses), Paramyxoviridae (e.g., Measles virus, Mumps virus, Nipah virus, Hendra virus, and NDV), Peribunyaviridae, Phasmaviridae, Phenuiviridae, Pneumoviridae (e.g., RSV and Metapneumovirus), Qinviridae, Rhabdoviridae (e.g., Rabies virus), Sunviridae, Tospoviridae, and Yueviridae. In some embodiments of any of the aspects, the Group V RNA virus belongs to a viral genus selected from the group consisting of: Anphevirus, Arlivirus, Chengtivirus, Crustavirus, Tilapineviridae, Wastrivirus, and Deltavirus (e.g., Hepatitis D virus).
In some embodiments of any of the aspects, the RNA virus is a Group VI RNA virus, which comprises a virally encoded reverse transcriptase. In some embodiments of any of the aspects, the Group VI RNA virus belongs to the viral order Ortervirales. In some embodiments of any of the aspects, the Group VI RNA virus belongs to a viral family or subfamily selected from the group consisting of: Belpaoviridae, Caulimoviridae, Metaviridae, Pseudoviridae, Retroviridae (e.g., Retroviruses, e.g. HIV), Orthoretrovirinae, and Spumaretrovirinae. In some embodiments of any of the aspects, the Group VI RNA virus belongs to a viral genus selected from the group consisting of: Alpharetrovirus (e.g., Avian leukosis virus; Rous sarcoma virus), Betaretrovirus (e.g., Mouse mammary tumour virus), Bovispumavirus (e.g., Bovine foamy virus), Deltaretrovirus (e.g., Bovine leukemia virus; Human T-lymphotropic virus), Epsilonretrovirus (e.g., Walleye dermal sarcoma virus), Equispumavirus (e.g., Equine foamy virus), Felispumavirus (e.g., Feline foamy virus), Gammaretrovirus (e.g., Murine leukemia virus; Feline leukemia virus), Lentivirus (e.g., Human immunodeficiency virus 1; Simian immunodeficiency virus; Feline immunodeficiency virus), Prosimiispumavirus (e.g., Brown greater galago prosimian foamy virus), and Simiispumavirus (e.g., Eastern chimpanzee simian foamy virus). In some embodiments of any of the aspects, the RNA virus is any known RNA virus.
In some embodiments of any of the aspects, the RNA virus is a coronavirus. The scientific name for coronavirus is Orthocoronavirinae or Coronavirinae. Coronaviruses belong to the family Coronaviridae, order Nidovirales, and realm Riboviria. They are divided into alphacoronaviruses and betacoronaviruses, which infect mammals, and gammacoronaviruses and deltacoronaviruses, which primarily infect birds. Non-limiting examples of alphacoronaviruses include: Human coronavirus 229E, Human coronavirus NL63, Miniopterus bat coronavirus 1, Miniopterus bat coronavirus HKU8, Porcine epidemic diarrhea virus, Rhinolophus bat coronavirus HKU2, Scotophilus bat coronavirus 512, and Feline Infectious Peritonitis Virus (FIPV, also referred to as Feline Infectious Hepatitis Virus). Non-limiting examples of betacoronaviruses include: Betacoronavirus 1 (e.g., Bovine Coronavirus, Human coronavirus OC43), Human coronavirus HKU1, Murine coronavirus (also known as Mouse hepatitis virus (MHV)), Pipistrellus bat coronavirus HKU5, Rousettus bat coronavirus HKU9, Severe acute respiratory syndrome-related coronavirus (e.g., SARS-CoV, SARS-CoV-2), Tylonycteris bat coronavirus HKU4, Middle East respiratory syndrome (MERS)-related coronavirus, and Hedgehog coronavirus 1 (EriCoV). Non-limiting examples of gammacoronaviruses include: Beluga whale coronavirus SW1, and Infectious bronchitis virus. Non-limiting examples of deltacoronaviruses include: Bulbul coronavirus HKU11, and Porcine coronavirus HKU15.
In some embodiments of any of the aspects, the coronavirus is selected from the group consisting of: severe acute respiratory syndrome-associated coronavirus (SARS-CoV); severe acute respiratory syndrome-associated coronavirus 2 (SARS-CoV-2); Middle East respiratory syndrome-related coronavirus (MERS-CoV); HCoV-NL63; and HCoV-HKU1. In some embodiments of any of the aspects, the coronavirus is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease of 2019 (COVID-19 or simply COVID). In some embodiments of any of the aspects, the coronavirus is severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1), which causes SARS. In some embodiments of any of the aspects, the coronavirus is Middle East respiratory syndrome-related coronavirus (MERS-CoV), which causes MERS.
In some embodiments of any of the aspects, the RNA virus is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In some embodiments of any of the aspects, at least one viral RNA is a SARS-CoV-2 RNA. In some embodiments of any of the aspects, the target nucleic acid comprises at least a portion of Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2 (see e.g., complete genome, SARS-CoV-2 Jan. 2020/NC_045512.2 Assembly (wuhCor1)). In some embodiments of any of the aspects, the target nucleic acid comprises any gene from SARS-CoV-2, such as the N gene, the S gene, or the ORF1ab gene. In some embodiments of any of the aspects, the target nucleic acid comprises SEQ ID NO: 1001 (Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2, N gene). In some embodiments of any of the aspects, the target nucleic acid comprises SEQ ID NO: 1002 (Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2, S gene). In some embodiments of any of the aspects, the target nucleic acid comprises SEQ ID NO: 1018 (Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2, ORF1ab gene). In some embodiments of any of the aspects, the target nucleic acid comprises one of SEQ ID NOs: 1001-1002 or SEQ ID NO: 1018, or a nucleic acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 1001-1002 or SEQ ID NO: 1018 that maintains the same function, or a codon-optimized version of SEQ ID NOs: 1001-1002. In some embodiments of any of the aspects, the target nucleic acid comprises one of SEQ ID NOs: 1001-1002 or SEQ ID NO: 1018, or a nucleic acid sequence that is at least 95% identical to one of SEQ ID NOs: 1001-1002 or SEQ ID NO: 1018 that maintains the same function.
SEQ ID NO: 1001, Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, N nucleocapsid phosphoprotein, Gene ID: 43740575, 1260 bp ss-RNA, NC_045512 REGION: 28274-29533
SEQ ID NO: 1002, Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, S surface glycoprotein, Gene ID: 43740568, 3822 bp ss-RNA, NC_045512 REGION: 21563-25384
SEQ ID NO: 1003, Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, S surface glycoprotein, Gene ID: 43740568, 1273 aa
SEQ ID NO: 1018, ORF1ab polyprotein, Severe acute respiratory syndrome coronavirus 2, isolate Wuhan-Hu-1, NCBI Reference Sequence: NC_045512.2 region: 266-21555, 21290 nt
In some embodiments of any of the aspects, the target RNA comprises a variation of interest. In some embodiments of any of the aspects, the variation of interest is selected from the group consisting of: a single-nucleotide variation; a point mutation; a substitution; an insertion; and a deletion. In some embodiments of any of the aspects, the variation of interest is associated with a variant of SARS-CoV-2. In some embodiments of any of the aspects, the SARS-CoV-2 variant is selected from the group consisting of: B.1.1.7 (also referred to as the United Kingdom variant, 20I/501Y.V1, or VOC 202012/01); B.1.351 (also referred to as the South African variant, or 20H/501Y.V2); P.1 (also referred to as the Brazilian variant); and CAL.20C (also referred to as the California variant). Non-limiting examples of variations of interest include del69-70, del144, K417N, K417T, L452R, E484K, N501Y, D614G, P681H, or A701V in the SARS-CoV-2 S protein (see e.g., SEQ ID NO: 1003). In some embodiments of any of the aspects, variations associated with the B.1.1.7 variant include del69-70, del144, N501Y, D614G, P681H, and/or A701V in the SARS-CoV-2 S protein. In some embodiments of any of the aspects, variations associated with the B.1.351 variant include K417N, E484K, and/or N501Y in the SARS-CoV-2 S protein. In some embodiments of any of the aspects, variations associated with the P.1 variant include K417T, E484K, and/or N501Y in the SARS-CoV-2 S protein. In some embodiments of any of the aspects, a variation associated with the CAL.20C variant includes L452R in the SARS-CoV-2 S protein. See e.g., Table 12 for exemplary variations of interest in the SARS-CoV-2 S gene and associated nucleic acid mutations in the target nucleic acid (T (thymine) and U (uracil) are used interchangeably).
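For illustration only, the variant-associated variations recited above can be arranged as a lookup table and compared against the variations observed in a sample. The following Python sketch is not part of the described methods: the names `VARIANT_VARIATIONS` and `candidate_variants` are hypothetical, the table simply transcribes the S-protein variations recited in this paragraph, and reporting any overlap (rather than requiring every listed variation) is an assumption reflecting the "and/or" language.

```python
# Hypothetical lookup table transcribing the S-protein variations recited above.
VARIANT_VARIATIONS = {
    "B.1.1.7": {"del69-70", "del144", "N501Y", "D614G", "P681H", "A701V"},
    "B.1.351": {"K417N", "E484K", "N501Y"},
    "P.1": {"K417T", "E484K", "N501Y"},
    "CAL.20C": {"L452R"},
}


def candidate_variants(observed):
    """Map each variant to the observed variations it shares (assumed any-overlap rule)."""
    observed = set(observed)
    hits = {}
    for variant, variations in VARIANT_VARIATIONS.items():
        shared = observed & variations
        if shared:
            hits[variant] = sorted(shared)
    return hits


if __name__ == "__main__":
    print(candidate_variants(["K417N", "E484K"]))
```

Because several variations in this listing (e.g., N501Y and E484K) are shared by more than one variant, a single observed variation can return more than one candidate.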
In some embodiments of any of the aspects, the viral RNA is an RNA produced by a virus with a DNA genome, i.e., a DNA virus. As a non-limiting example, the DNA virus is a Group I (dsDNA) virus, a Group II (ssDNA) virus, or a Group VII (dsDNA-RT) virus. In some embodiments of any of the aspects, the RNA produced by a DNA virus comprises an RNA transcript of the DNA genome.
Described are methods, kits, and systems that can be used to detect a target RNA. In some embodiments of any of the aspects, the target RNA is reverse transcribed to a complementary DNA (cDNA) that is thereafter amplified and detected. Accordingly, the methods described herein comprise a step (a) (i.e., the RT step) of contacting the sample with a reverse transcriptase and a first primer or first set of primers. In some embodiments of any of the aspects, the method comprises contacting the at least two samples with a reverse transcriptase and a first primer or first set of primers comprising at least a first barcode, under conditions permitting the generation of reverse transcription products. As used herein, the phrase “conditions permitting the generation of reverse transcription products” refers to temperature(s), time(s), and/or reagent(s) that allow the reverse transcriptase to reverse-transcribe a cDNA from the target RNA using at least one primer from the first set of primers; non-limiting examples of such conditions are described herein. In some embodiments of any of the aspects, prior to step (a) (i.e., the RT step) the at least one target RNA is not extracted from the sample, as described herein with regard to sample preparation.
The term “reverse transcriptase” (RT) refers to an RNA-dependent DNA polymerase used to generate complementary DNA (cDNA) from an RNA template. In some embodiments of any of the aspects, the cDNA is single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA). Reverse transcriptases are used by retroviruses to replicate their genomes, by retrotransposon mobile genetic elements to proliferate within the host genome, by eukaryotic cells to extend the telomeres at the ends of their linear chromosomes, and by some non-retroviruses such as the hepatitis B virus, a member of the Hepadnaviridae, which are dsDNA-RT viruses. Reverse transcriptases are also used in the synthesis of extrachromosomal DNA/RNA chimeric elements called multicopy single-stranded DNA (msDNA) in bacteria. Retroviral RT has three sequential biochemical activities: RNA-dependent DNA polymerase activity, ribonuclease H (RNase H) activity, and DNA-dependent DNA polymerase activity. Collectively, these activities permit the enzyme to convert single-stranded RNA into single-stranded cDNA or double-stranded cDNA.
In some embodiments of any of the aspects, the reverse transcriptase can be any enzyme that can produce cDNA from an RNA transcript. In some embodiments of any of the aspects, the reverse transcriptase comprises an HIV-1 reverse transcriptase from human immunodeficiency virus type 1. In some embodiments of any of the aspects, the reverse transcriptase comprises M-MuLV reverse transcriptase from the Moloney murine leukemia virus (referred to as M-MuLV, M-MLV, or MMLV). In some embodiments of any of the aspects, the reverse transcriptase comprises AMV reverse transcriptase from the avian myeloblastosis virus (AMV). In some embodiments of any of the aspects, the reverse transcriptase comprises telomerase reverse transcriptase that maintains the telomeres of eukaryotic chromosomes. In some embodiments of any of the aspects, the reverse transcriptase is selected from those expressed by any Group VI or Group VII virus. In some embodiments of any of the aspects, the reverse transcriptase is a naturally occurring RT selected from the group consisting of: an M-MLV RT, an AMV RT, a retrotransposon RT, a telomerase reverse transcriptase, and an HIV-1 reverse transcriptase.
In some embodiments of any of the aspects, the reverse transcriptase (RT) is an engineered or recombinant version of, for example, a Moloney Murine Leukemia Virus (MMLV) RT, Avian Myeloblastosis Virus (AMV) RT, or another naturally occurring RT. In some embodiments of any of the aspects, the reverse transcriptase is ProtoScript® II Reverse Transcriptase, which is also referred to herein as ProtoScript® II RT or Protoscriptase II. ProtoScript® II RT is a recombinant Moloney Murine Leukemia Virus (M-MuLV) reverse transcriptase, e.g., a fusion of the Escherichia coli trpE gene with the central region of the M-MuLV pol gene.
In some embodiments of any of the aspects, the reverse transcriptase is selected from the group consisting of: Maxima® RT (e.g., Maxima H Minus® RT); Omniscript® RT; PowerScript® RT; Sensiscript® RT (SES); SuperScript® II (SSII or SS2); SuperScript® III (SSIII or SS3); SuperScript® IV (SSIV); Accuscript® RT (ACC); a recombinant HIV RT; imProm-II® (IP2) RT; M-MLV RT (MML); Protoscript® RT (PRS); Smart MMLV (SML) RT; ThermoScript® (TSR) RT; and RapiDxFire™ RT (see e.g., Levesque-Sergerie et al., BMC Molecular Biology volume 8, Article number: 93 (2007); Okello et al., PLoS One. 2010 Nov 10;5(11):e13931). Non-limiting examples of RTs derived from MMLV include PowerScript®, ACC, MML, SML, SS2, SS3, and SSIV. Non-limiting examples of RTs derived from AMV include PRS and TSR. Non-limiting examples of RTs derived from proprietary sources include IP2, SES, Omniscript®, and RapiDxFire™ RT (derived from viral DNA isolated from hot springs). In some embodiments of any of the aspects, the reverse transcriptase exhibits increased thermostability (e.g., up to 48° C.) compared to the wild type RT.
In some embodiments of any of the aspects, the reverse transcriptase is SuperScript® IV (see e.g.,
As used herein, one unit (“U”) of reverse transcriptase (e.g., SuperScript® IV RT) is defined as the amount of enzyme that will incorporate 1 nmol of dTTP into acid-insoluble material in a total reaction volume of 50 µl in 10 minutes at 37° C. using poly(rA)•oligo(dT)18 (“(dT)18” disclosed as SEQ ID NO: 1017) as template. In some embodiments of any of the aspects, the reverse transcriptase is provided at a concentration of at least 1 U/µL, at least 2 U/µL, at least 3 U/µL, at least 4 U/µL, at least 5 U/µL, at least 6 U/µL, at least 7 U/µL, at least 8 U/µL, at least 9 U/µL, at least 10 U/µL, at least 20 U/µL, at least 30 U/µL, at least 40 U/µL, at least 50 U/µL, at least 60 U/µL, at least 70 U/µL, at least 80 U/µL, at least 90 U/µL, at least 100 U/µL, at least 110 U/µL, at least 120 U/µL, at least 130 U/µL, at least 140 U/µL, at least 150 U/µL, at least 160 U/µL, at least 170 U/µL, at least 180 U/µL, at least 190 U/µL, at least 200 U/µL, at least 210 U/µL, at least 220 U/µL, at least 230 U/µL, at least 240 U/µL, at least 250 U/µL, at least 260 U/µL, at least 270 U/µL, at least 280 U/µL, at least 290 U/µL, at least 300 U/µL, at least 310 U/µL, at least 320 U/µL, at least 330 U/µL, at least 340 U/µL, at least 350 U/µL, at least 360 U/µL, at least 370 U/µL, at least 380 U/µL, at least 390 U/µL, at least 400 U/µL, at least 410 U/µL, at least 420 U/µL, at least 430 U/µL, at least 440 U/µL, at least 450 U/µL, at least 460 U/µL, at least 470 U/µL, at least 480 U/µL, at least 490 U/µL, or at least 500 U/µL. In some embodiments of any of the aspects, the reverse transcriptase is provided at a concentration of 20 U/µL. In some embodiments of any of the aspects, the reverse transcriptase is provided at a concentration of 200 U/µL.
In some embodiments of any of the aspects, the sample is contacted with a first primer or first set of primers comprising at least a first barcode. In some embodiments of any of the aspects, the sample is contacted with a first primer comprising at least a first barcode. In some embodiments of any of the aspects, the sample is contacted with a first set of primers comprising at least a first barcode. In some embodiments of any of the aspects, the first primer or first set of primers comprises one barcode region. In some embodiments of any of the aspects, the first primer or first set of primers comprises 1, 2, 3, 4, 5, or more barcode regions.
As used herein, the term “primer” denotes a single-stranded nucleic acid that hybridizes to a nucleic acid region of interest and provides a starting point for nucleic acid synthesis, i.e. for enzymatic synthesis of a nucleic acid strand complementary to a template, e.g., a target RNA. In some embodiments of any of the aspects, the primer can be DNA, RNA, modified DNA, modified RNA, synthetic DNA, synthetic RNA, or another synthetic nucleic acid that serves as a substrate for extension when hybridized to a target RNA template. In some embodiments, the primer, e.g., in the first set of primers is about 60 nucleotides long. In some embodiments, the primer, e.g., in the first set of primers is about 40-80 nucleotides long. As a non-limiting example, the primer is 40 nucleotides (nt) long, 41 nt, 42 nt, 43 nt, 44 nt, 45 nt, 46 nt, 47 nt, 48 nt, 49 nt, 50 nt, 51 nt, 52 nt, 53 nt, 54 nt, 55 nt, 56 nt, 57 nt, 58 nt, 59 nt, 60 nt, 61 nt, 62 nt, 63 nt, 64 nt, 65 nt, 66 nt, 67 nt, 68 nt, 69 nt, 70 nt, 71 nt, 72 nt, 73 nt, 74 nt, 75 nt, 76 nt, 77 nt, 78 nt, 79 nt, 80 nt or more. In some embodiments of any of the aspects, at least one primer, e.g., from the first set of primers, comprises sequences selected from Table 4.
In some embodiments of any of the aspects, the first primer or each primer in the first set of primers comprises, from 5′ to 3′: (a) an adaptor region; (b) a first barcode region; and (c) a target-binding region that is complementary or substantially complementary to and permits hybridization to at least one target RNA. In some embodiments of any of the aspects, the first primer or each primer in the first set of primers comprises, from 5′ to 3′: (a) an adaptor region; (b) a first barcode region; (c) a second barcode region; and (d) a target-binding region that is complementary or substantially complementary to and permits hybridization to at least one target RNA.
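The 5′ to 3′ primer layout described above can be sketched as a simple concatenation. This Python example is illustrative only: the function name `assemble_rt_primer` is hypothetical, and the sequences are arbitrary placeholders rather than any SEQ ID NO from the Sequence Listing.

```python
def assemble_rt_primer(adaptor, barcodes, target_binding):
    """Concatenate regions 5' to 3': adaptor, one or two barcode regions, target-binding region."""
    if not 1 <= len(barcodes) <= 2:
        raise ValueError("this sketch supports one or two barcode regions")
    return adaptor + "".join(barcodes) + target_binding


# Arbitrary placeholder sequences (hypothetical; not taken from the Sequence Listing).
ADAPTOR = "AAAACCCCGGGGTTTTAAAA"
FIRST_BARCODE = "GATTACAGATTACA"
SECOND_BARCODE = "CTAGCTAGCT"
TARGET_BINDING = "TTGGCCAAGGTTCCAAGGTT"

if __name__ == "__main__":
    one_barcode = assemble_rt_primer(ADAPTOR, [FIRST_BARCODE], TARGET_BINDING)
    two_barcodes = assemble_rt_primer(ADAPTOR, [FIRST_BARCODE, SECOND_BARCODE], TARGET_BINDING)
    print(len(one_barcode), one_barcode)
    print(len(two_barcodes), two_barcodes)
```

Placing the target-binding region at the 3′ end keeps the extendable end of the primer annealed to the target RNA, while the adaptor and barcode regions remain as a 5′ tail that is carried into the cDNA.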
In some embodiments of any of the aspects, the adaptor region, e.g., of the first primer or each primer in the first set of primers, comprises an amplification adaptor region such as a PCR adaptor region. The adaptor region provides a hybridization or binding site for an amplification primer to be used after reverse transcription and pooling of reverse-transcription products. Inclusion of an adaptor thus permits amplification of an entire pooled population of cDNA products with, for example, a common forward amplification primer or one pair of forward and reverse amplification primers. In some embodiments of any of the aspects, the adaptor region, e.g., of the first primer or each primer in the first set of primers, is complementary or substantially complementary to an adaptor binding region of a primer in a second or subsequent set of primers. In some embodiments of any of the aspects, the adaptor region, e.g., of the first primer or each primer in the first set of primers, comprises SEQ ID NO: 13 or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 13 that maintains the same function (e.g., amplification adaptor or binding to amplification primer).
In some embodiments of any of the aspects, the first or second barcode region on the first primer or set of first primers is at least 25 nucleotides long. As a non-limiting example, the barcode region can be 10 nucleotides (nt) long, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt long or more. In some embodiments of any of the aspects, the barcode region of a first primer in the first set of barcoded primers is a Hamming distance of at least 10 from each other barcode region of any other primer in the first set of barcoded primers. As used herein, the term “Hamming distance” refers to the number of positions (e.g., base pairs) at which the corresponding sequences are different. In some embodiments of any of the aspects, the barcode region of a first primer in the first set of barcoded primers is a Hamming distance of at least 12 from each other barcode region of any other primer in the first set of barcoded primers. In some embodiments of any of the aspects, the barcode region of a first primer in the first set of barcoded primers is a Hamming distance of at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20 or more from each other barcode region of any other primer in the first set of barcoded primers (or barcode region in a second, third, fourth, etc. set of barcoded primers).
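As a minimal sketch of the Hamming distance criterion defined above, the following Python example counts differing positions between two equal-length barcodes and checks whether every pair in a set meets a minimum distance. The function names and the example barcodes are hypothetical and are not drawn from the Sequence Listing.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in a barcode set."""
    return min(hamming_distance(a, b) for a, b in combinations(barcodes, 2))


def meets_minimum_distance(barcodes, minimum):
    """True if every pair of barcodes differs at `minimum` or more positions."""
    return min_pairwise_distance(barcodes) >= minimum


if __name__ == "__main__":
    # Hypothetical 12-nt barcodes used only to exercise the functions.
    example = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "AAAAAACCCCCC"]
    print(min_pairwise_distance(example))
    print(meets_minimum_distance(example, 5))
```

A larger minimum distance allows more sequencing or synthesis errors to occur in a barcode before it can be mistaken for another barcode in the set.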
In some embodiments of any of the aspects, the first or second barcode region on the first primer or set of first primers comprises one of SEQ ID NOs: 18-989 or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 18-989 that maintains the same function (e.g., identification). In some embodiments of any of the aspects, the first barcode region on the first primer or set of first primers comprises one of SEQ ID NOs: 30-989 (see e.g., Table 5 or Table 6); such barcodes are also referred to herein as “sample barcode,” “sample ID”, “patient barcode,” or “patient ID.” In some embodiments of any of the aspects, at least one barcode region on the first primer or set of first primers corresponds to and is different for each of the at least two samples. In some embodiments of any of the aspects, at least one barcode region on the first primer or set of first primers corresponds to and is different for each of the target RNAs.
In some embodiments of any of the aspects, a target-binding region is complementary or substantially complementary to and permits hybridization to at least one target RNA. In some embodiments of any of the aspects, the target-binding region permits hybridization to at least one target RNA under conditions permitting the generation of a reverse transcription product. In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, is about 20 nucleotides long. In some embodiments, the target-binding region, e.g., of a primer in the first set of primers, is about 15-35 nucleotides long. As a non-limiting example, the target-binding region can be 15 nucleotides (nt) long, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt long or more. In some embodiments, the target-binding region, e.g., of a primer in the first set of primers, has a Tm of about 53° C.-62° C., e.g., at least 53° C., at least 54° C., at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C. or more.
In some embodiments of any of the aspects, the target-binding region of a primer in the first set of primers binds to a region of the SARS-CoV-2 N gene or S gene (see e.g., SEQ ID NOs: 1001-1002). In some embodiments of any of the aspects, the target-binding region of a primer in the first set of primers comprises one of SEQ ID NO: 3 (N#1_RT), SEQ ID NO: 5 (N#2_RT), SEQ ID NO: 7 (del6970_RT), SEQ ID NO: 9 (D614_RT), or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 3, 5, 7, or 9 that maintains the same function (e.g., binding to the target RNA or positive control RNA) (see e.g., Table 4).
In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, binds at most 5 nucleotides away from, e.g., between the 3′ end of the primer and the 5′ end of, a variation of interest in the target RNA. In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, binds 0 nt, 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, or 10 nt away from a variation of interest in the target RNA (see e.g.,
In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, comprises at most 1 nucleotide mismatch (i.e., non-complementary nucleotide) compared to a target RNA (see e.g., Table 7). In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, does not specifically bind to a non-target nucleic acid, e.g., a nucleic acid that is not a target RNA. In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, is at most 80% identical to a non-target nucleic acid (see e.g., Table 8 for non-limiting examples of non-target microbial nucleic acids). In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the first set of primers, is at most 40%, at most 45%, at most 50%, at most 55%, at most 60%, at most 65%, at most 70%, at most 75%, or at most 80% identical to a non-target nucleic acid.
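The mismatch and percent-identity criteria above can be illustrated with a short Python sketch. It rests on stated assumptions: the primer region and its binding site have equal length, the comparison is ungapped, and percent identity is computed position by position, which is simpler than the alignment-based identity calculations typically used in practice. All function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def mismatches_to_target(primer_region, target_site_rna):
    """Count non-complementary positions between a DNA primer region and an RNA site.

    Both sequences are written 5' to 3'; U in the RNA is treated as T.
    """
    site_dna = target_site_rna.upper().replace("U", "T")
    perfect_primer = reverse_complement(site_dna)
    if len(perfect_primer) != len(primer_region):
        raise ValueError("primer region and target site must have equal length")
    return sum(p != q for p, q in zip(primer_region.upper(), perfect_primer))


def percent_identity(a, b):
    """Ungapped, position-by-position percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Hypothetical 6-nt site and primer region, far shorter than a real target-binding region.
    print(mismatches_to_target("AGCCAA", "AUGGCU"))
    print(percent_identity("ACGT", "ACGA"))
```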
In some embodiments of any of the aspects, the first primer or each primer in the first set of primers comprises, from 5′ to 3′: (a) an adaptor region (e.g., SEQ ID NO: 13); (b) a first barcode region (e.g., one of SEQ ID NOs: 30-989); and (c) a target-binding region that is complementary or substantially complementary to and permits hybridization to at least one target RNA (e.g., one of SEQ ID NOs: 3, 5, 7, or 9). SEQ ID NO: 1005 is an exemplary primer from the first set of primers, comprising from 5′ to 3′: SEQ ID NO: 13 (bolded), SEQ ID NO: 30, and SEQ ID NO: 3 (bold italicized).
SEQ ID NO: 1005, 61 nt (see e.g.,
In some embodiments of any of the aspects, the first primer or each primer in the first set of primers is present in the RT reaction at a concentration of at least 125 nM. In some embodiments of any of the aspects, the first primer or each primer in the first set of primers is present in the RT reaction at a concentration of at least 25 nM, at least 30 nM, at least 35 nM, at least 40 nM, at least 45 nM, at least 50 nM, at least 55 nM, at least 60 nM, at least 65 nM, at least 70 nM, at least 75 nM, at least 80 nM, at least 85 nM, at least 90 nM, at least 95 nM, at least 100 nM, at least 105 nM, at least 110 nM, at least 115 nM, at least 120 nM, at least 125 nM, at least 130 nM, at least 135 nM, at least 140 nM, at least 145 nM, at least 150 nM, at least 160 nM, at least 170 nM, at least 180 nM, at least 190 nM, at least 200 nM, at least 210 nM, at least 220 nM, at least 230 nM, at least 240 nM, at least 250 nM, at least 260 nM, at least 270 nM, at least 280 nM, at least 290 nM, at least 300 nM, at least 310 nM, at least 320 nM, at least 330 nM, at least 340 nM, at least 350 nM, at least 360 nM, at least 370 nM, at least 380 nM, at least 390 nM, at least 400 nM, at least 410 nM, at least 420 nM, at least 430 nM, at least 440 nM, at least 450 nM, at least 460 nM, at least 470 nM, at least 480 nM, at least 490 nM, at least 500 nM.
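As a worked example of the primer concentrations above, the following Python sketch applies the standard dilution relation C1·V1 = C2·V2 to find the volume of primer stock needed for a given final concentration, and converts a concentration and volume into an amount of primer. The 10 µM stock concentration and 20 µL reaction volume are assumptions chosen for illustration and are not taken from the description; the function names are hypothetical.

```python
def stock_volume_ul(final_nm, reaction_volume_ul, stock_um):
    """Volume of primer stock (uL) giving `final_nm` in the reaction, from C1*V1 = C2*V2."""
    stock_nm = stock_um * 1000.0  # convert uM to nM
    return final_nm * reaction_volume_ul / stock_nm


def primer_amount_pmol(final_nm, reaction_volume_ul):
    """Amount of primer (pmol) in the reaction: nM x uL gives fmol, divided by 1000 gives pmol."""
    return final_nm * reaction_volume_ul / 1000.0


if __name__ == "__main__":
    # Assumed values for illustration: 125 nM final, 20 uL reaction, 10 uM stock.
    print(stock_volume_ul(125, 20, 10))   # uL of stock to add
    print(primer_amount_pmol(125, 20))    # pmol of primer in the reaction
```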
In some embodiments of any of the aspects, step (a) (the RT step) further comprises contacting the sample with a detergent (also referred to as a surfactant). Such detergent can be included in the viral transport medium, or added thereafter, e.g., in a diluent or RT solution. In some embodiments of any of the aspects, the detergent lyses viral particles or cells in the sample. In some embodiments of any of the aspects, the detergent allows target RNA detection in extraction-free samples, i.e., without the need for a nucleic acid-extraction step. In some embodiments of any of the aspects, the detergent releases target RNA from the sample. In some embodiments of any of the aspects, the detergent releases at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or more of the target RNA from the sample. Non-limiting examples of detergents include anionic surfactants, cationic surfactants, nonionic surfactants, amphoteric/zwitterionic surfactants, and co-surfactants or mixtures thereof.
In some embodiments of any of the aspects, the detergent is a nonionic surfactant. Non-limiting examples of nonionic surfactants include Triton X-100, sodium tri-isopropyl naphthalene sulfonate, LDS, SDS, NP-40, lecithin, a Span group (e.g., Span 20 or 80), or a Tween group (e.g., Tween 20, 21, 40, 60, 60 K, 61, 65, 80, 80 K, 81, or 85), a sugar amide (e.g., polysaccharide amide), or an alkyl polyglucoside. In some embodiments of any of the aspects, the detergent is Triton X-100 (2-[4-(2,4,4-trimethylpentan-2-yl)phenoxy]ethanol). Non-limiting examples of anionic surfactants include alkyl sulfosuccinate, sodium dioctyl sulfosuccinate (AOT), sodium dihexyl sulfosuccinate (AMA), ammonium or sodium lauryl ether sulfate, alkyl or acyl taurates, alkyl or acyl sarcosinates, alkyl ether sulfates, alkyl ether sulfonates, or alkyl ether carboxylates (e.g., counterion can be sodium, ammonium, or potassium). Alkyl sulfosuccinate can include a mono or dialkyl sulfosuccinate or a C6-C22 sulfosuccinate. Non-limiting examples of cationic surfactants include a quaternary ammonium compound (e.g., an alkyldimethylammonium halogenide), alkyl pyridinium chlorides or bromides, or other halogenides. Non-limiting examples of amphoteric surfactants include, for example, a quaternary amino acid, an alkyl amine oxide, or an alkyl betaine.
In some embodiments of any of the aspects, the detergent is present in an amount that does not interfere with subsequent enzymatic reactions (e.g., the RT step, the amplification step, and/or the sequencing step). If the detergent concentration can interfere with subsequent enzymatic reactions then it is diluted or the reaction product is isolated prior to the subsequent enzymatic reactions. In some embodiments of any of the aspects, the detergent (e.g., Triton X-100) is present in the RT reaction at a concentration of at least 0.1%. In some embodiments of any of the aspects, the detergent (e.g., Triton X-100) is present in the RT reaction at a concentration of at least 0.01%, at least 0.02%, at least 0.03%, at least 0.04%, at least 0.05%, at least 0.06%, at least 0.07%, at least 0.08%, at least 0.09%, at least 0.1%, at least 0.2%, at least 0.3%, at least 0.4%, at least 0.5%, at least 0.6%, at least 0.7%, at least 0.8%, at least 0.9%, at least 1% or more.
In some embodiments of any of the aspects, step (a) (the RT step) further comprises contacting the sample with carrier nucleic acid. Such carrier nucleic acid can be included, for example, in the viral transport medium, or added thereafter. In some embodiments of any of the aspects, carrier nucleic acid reduces loss of the target RNA, e.g., preserves at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or more of the target RNA in the sample. In some embodiments of any of the aspects, the carrier nucleic acid is poly-A60 DNA oligonucleotide (e.g., a DNA comprising at least 60 adenosines; “(dA)60” disclosed as SEQ ID NO: 1025) or E. coli tRNA (e.g., E. coli MRE 600; see e.g., Sigma™, 10109541001).
In some embodiments of any of the aspects, the carrier nucleic acid (e.g., poly-A60 DNA oligonucleotide) is present at a concentration of at least 0.5 uM in the RT reaction. In some embodiments of any of the aspects, the carrier nucleic acid (e.g., poly-A60 DNA oligonucleotide) is present at a concentration of at least 0.01 uM, at least 0.02 uM, at least 0.03 uM, at least 0.04 uM, at least 0.05 uM, at least 0.06 uM, at least 0.07 uM, at least 0.08 uM, at least 0.09 uM, at least 0.1 uM, at least 0.2 uM, at least 0.3 uM, at least 0.4 uM, at least 0.5 uM, at least 0.6 uM, at least 0.7 uM, at least 0.8 uM, at least 0.9 uM, at least 1 uM, at least 2 uM, at least 3 uM, at least 4 uM, at least 5 uM, at least 6 uM, at least 7 uM, at least 8 uM, at least 9 uM, at least 10 uM or more in the RT reaction.
In some embodiments of any of the aspects, the carrier nucleic acid (e.g., E. coli tRNA) is present at a concentration of at least 15 ug/ml in the RT reaction. In some embodiments of any of the aspects, the carrier nucleic acid (e.g., E. coli tRNA) is present at a concentration of at least 1 ug/ml, at least 2 ug/ml, at least 3 ug/ml, at least 4 ug/ml, at least 5 ug/ml, at least 6 ug/ml, at least 7 ug/ml, at least 8 ug/ml, at least 9 ug/ml, at least 10 ug/ml, at least 11 ug/ml, at least 12 ug/ml, at least 13 ug/ml, at least 14 ug/ml, at least 15 ug/ml, at least 16 ug/ml, at least 17 ug/ml, at least 18 ug/ml, at least 19 ug/ml, at least 20 ug/ml, at least 21 ug/ml, at least 22 ug/ml, at least 23 ug/ml, at least 24 ug/ml, at least 25 ug/ml or more in the RT reaction.
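By way of illustration only, the first-recited minimum concentrations for the RT reaction components discussed above (primer at 125 nM, Triton X-100 at 0.1%, poly-A60 carrier at 0.5 uM, and E. coli tRNA at 15 ug/ml) can be checked programmatically. The following Python sketch is not part of the described methods; the dictionary keys, the function name, and the example mixture are hypothetical and chosen only for this example.

```python
# Illustrative only: the minimum values are the first "at least" values
# recited in the text; the key names are hypothetical labels.
MINIMUMS = {
    "rt_primer_nM": 125.0,
    "triton_x100_percent": 0.1,
    "poly_a60_uM": 0.5,
    "ecoli_trna_ug_per_ml": 15.0,
}


def components_below_minimum(mix):
    """Return the names of components present in `mix` but below their minimum.

    Components absent from `mix` are skipped, because each additive is
    optional in the embodiments described.
    """
    return sorted(
        name
        for name, minimum in MINIMUMS.items()
        if name in mix and mix[name] < minimum
    )


# Hypothetical RT mixture with too little detergent and no tRNA carrier.
example_mix = {
    "rt_primer_nM": 125.0,
    "triton_x100_percent": 0.05,
    "poly_a60_uM": 0.5,
}
print(components_below_minimum(example_mix))
```

Any of the alternative "at least" values recited above could be substituted into `MINIMUMS` to represent a different embodiment.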
In some embodiments of any of the aspects, step (a) (the RT step) further comprises contacting the sample with a positive control nucleic acid. In some embodiments of any of the aspects, the positive control nucleic acid is a positive sample control nucleic acid or a positive enzymatic control nucleic acid. As discussed further below, a sample control tests for the presence of a host (e.g., human) gene transcript to control for the integrity of the sample nucleic acid. In some embodiments of any of the aspects, the reverse transcription reaction comprises a positive sample control nucleic acid. In some embodiments of any of the aspects, the reverse transcription reaction comprises a positive enzymatic control nucleic acid. The enzymatic control tests for the activity or activities of the RT and amplification enzymes used in the reaction. In some embodiments of any of the aspects, the reverse transcription reaction comprises both a positive sample control nucleic acid and a positive enzymatic control nucleic acid.
In some embodiments of any of the aspects, the detection methods described herein comprise a “split amplification” step, e.g., in order to allow optimal detection of the positive control nucleic acids during the sequencing step. In such a split amplification, the pooled reverse transcription product mixture from step (b) is divided into at least two portions, e.g., a “positive control portion” and a “target portion,” and a separate step (c) (e.g., the amplification step) is performed for each portion. In some embodiments, the positive control portion (e.g., the smaller portion) is used to amplify the positive control nucleic acids, e.g., using forward and reverse amplification primers specific for the positive control nucleic acids. The positive control portion can be used to amplify the sample control and/or the enzymatic control. In some embodiments, the target portion (e.g., the larger portion) is used to amplify the target RNAs, e.g., using forward and reverse amplification primers specific for the target cDNAs (e.g., viral targets). After the split amplification step, the at least two portions comprising amplification products from the positive controls and target nucleic acids are combined in one container for step (d) (e.g., the sequencing step). In some embodiments, before step (d) (e.g., the sequencing step), the amplified portions are combined at the same ratio as before the split amplification. In some embodiments, before step (d) (e.g., the sequencing step), the amplified portions are combined at a new ratio, e.g., with a higher proportion of the positive control amplification products to the target amplification products than before the split, in order to allocate more sequencing reads for the positive control sequences. In some embodiments, the pooled reverse transcription product mixture from step (b) is split 1:10, e.g., into 1 part positive control portion and 10 parts target portion.
In some embodiments, before step (d) (e.g., the sequencing step), the amplification products are combined 1:10, e.g., 1 part positive control amplification product and 10 parts target amplification product. In some embodiments, before step (d) (e.g., the sequencing step), the amplification products are combined at a ratio higher than 1:10, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more parts positive control amplification product and 10 parts target amplification product.
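The arithmetic of the split and recombination ratios described above can be sketched as follows. This Python example is illustrative only and assumes that "parts" are parts by volume; the function names and the 110 ul pooled volume are hypothetical.

```python
from fractions import Fraction


def split_volumes(total_ul, control_parts=1, target_parts=10):
    """Divide a pooled RT product volume into control and target portions.

    Assumes the split ratio is expressed in parts by volume.
    """
    total_parts = control_parts + target_parts
    control = Fraction(total_ul) * control_parts / total_parts
    target = Fraction(total_ul) * target_parts / total_parts
    return control, target


def control_fraction(control_parts, target_parts=10):
    """Fraction of the recombined mixture contributed by the control product."""
    return Fraction(control_parts, control_parts + target_parts)


# A hypothetical 110 ul pool split 1:10 gives 10 ul and 100 ul portions.
control_ul, target_ul = split_volumes(110)
print(control_ul, target_ul)

# Recombining at 1:10 versus 5:10 changes the control share of the mixture.
print(control_fraction(1), control_fraction(5))
```

Under these assumptions, recombining at 5:10 rather than 1:10 raises the control share of the combined mixture from 1/11 to 1/3. The share of sequencing reads actually obtained for the controls would also depend on the amplification yield of each portion, which this sketch does not model.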
In some embodiments of any of the aspects, the positive control nucleic acid (e.g., “positive sample control nucleic acid” or “sample control”) is a primer comprising from 5′ to 3′: (a) an adaptor region; (b) a first barcode region; and (c) a target-binding region that is complementary to or substantially complementary to a sample nucleic acid (e.g., RPP30). The “positive sample control nucleic acid” targets a nucleic acid that is present in the sample, e.g., a “sample nucleic acid,” e.g., a nucleic acid from the subject species or patient, e.g., a human nucleic acid. In some embodiments of any of the aspects, the sample control targets human Ribonuclease P protein subunit p30 (hRPP30 or RPP30 or RPP) gene. RPP30 is a single copy gene present in the human genome. In some embodiments, the sample control targets an RNA (e.g., a specific mRNA) present in the sample. In some embodiments of any of the aspects, the sample control (e.g., primer binding to hRPP30) functions as a control to indicate presence or absence of sample (see e.g.,
In some embodiments of any of the aspects, the forward primer in the second set of primers (i.e., FW PCR primer) for the reverse transcription product of the sample control (e.g., SEQ ID NO: 11) is SEQ ID NO: 14. In some embodiments of any of the aspects, the reverse primer in the second set of primers (i.e., RV PCR primer) for the reverse transcription product of the sample control comprises a target-binding region that is complementary or substantially complementary to the sample nucleic acid. In some embodiments of any of the aspects, the first and second sequencing primers in the third set of primers for the sample control are SEQ ID NO: 15 and SEQ ID NO: 17. If a sequencing signal is detected from the sample control, then the RT reaction comprised a sample that included RNA that could be reverse transcribed and amplified for detection. If a sequencing signal is not detected from the sample control, then the RT reaction did not comprise a sample that included such RNA.
In some embodiments of any of the aspects, the sample control is present in the RT reaction at a concentration of at least 125 nM. In some embodiments of any of the aspects, the sample control is present in the RT reaction at a concentration of at least 25 nM, at least 30 nM, at least 35 nM, at least 40 nM, at least 45 nM, at least 50 nM, at least 55 nM, at least 60 nM, at least 65 nM, at least 70 nM, at least 75 nM, at least 80 nM, at least 85 nM, at least 90 nM, at least 95 nM, at least 100 nM, at least 105 nM, at least 110 nM, at least 115 nM, at least 120 nM, at least 125 nM, at least 130 nM, at least 135 nM, at least 140 nM, at least 145 nM, at least 150 nM, at least 160 nM, at least 170 nM, at least 180 nM, at least 190 nM, at least 200 nM, at least 210 nM, at least 220 nM, at least 230 nM, at least 240 nM, at least 250 nM, at least 260 nM, at least 270 nM, at least 280 nM, at least 290 nM, at least 300 nM, at least 310 nM, at least 320 nM, at least 330 nM, at least 340 nM, at least 350 nM, at least 360 nM, at least 370 nM, at least 380 nM, at least 390 nM, at least 400 nM, at least 410 nM, at least 420 nM, at least 430 nM, at least 440 nM, at least 450 nM, at least 460 nM, at least 470 nM, at least 480 nM, at least 490 nM, at least 500 nM.
In some embodiments of any of the aspects, the target-binding region of the sample control comprises a 15 nt - 25 nt sequence that is complementary to or substantially complementary to SEQ ID NO: 1006, or a 15 nt - 25 nt sequence that is complementary to or substantially complementary to a nucleic acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 1006 that maintains the same function (e.g., specifically binding a nucleic acid in the sample; e.g., specifically binding hRPP30 mRNA). In some embodiments of any of the aspects, the target-binding region of the sample control comprises SEQ ID NO: 1019 or a nucleic acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 1019 that maintains the same function (e.g., specifically binding hRPP30 mRNA).
SEQ ID NO: 1006, Homo sapiens ribonuclease P/MRP subunit p30 (RPP30), transcript variant 1, mRNA, 4521 nt
SEQ ID NO: 1019, RPP30 RT primer, target-binding region, 20 nt GAGCGGCTGTCTCCACAAGT
SEQ ID NO: 1020, RPP30 RV amplification primer, target-binding region, 20 nt GTGTTTGCAGATTTGGACCT
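As a non-authoritative illustration of the complementarity and percent-identity language used above, the following Python sketch computes the reverse complement of the RT primer target-binding region (SEQ ID NO: 1019) and an ungapped percent identity between two equal-length sequences. The function names and the single-mismatch variant are hypothetical. Position-by-position comparison is a simplifying assumption; identity is more commonly determined with a sequence alignment tool that permits gaps.

```python
RPP30_RT_TARGET_BINDING = "GAGCGGCTGTCTCCACAAGT"  # SEQ ID NO: 1019
RPP30_RV_TARGET_BINDING = "GTGTTTGCAGATTTGGACCT"  # SEQ ID NO: 1020

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_identity(a, b):
    """Ungapped percent identity between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for ungapped comparison")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Sequence a perfectly complementary strand would have, written 5' to 3'.
print(reverse_complement(RPP30_RT_TARGET_BINDING))

# Hypothetical variant with one mismatch at the first position (19 of 20 match).
variant = "A" + RPP30_RT_TARGET_BINDING[1:]
print(percent_identity(RPP30_RT_TARGET_BINDING, variant))
```

Whether a given variant "maintains the same function" (e.g., specific binding to hRPP30 mRNA) is an experimental question that a percent-identity calculation alone cannot answer.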
In some embodiments of any of the aspects, the primer in the first set of primers (i.e., RT primer) for the sample control (e.g., RPP30, SEQ ID NO: 1006) comprises SEQ ID NO: 1019. In some embodiments of any of the aspects, the forward primer in the second set of primers (i.e., FW PCR primer) for the sample control (e.g., RPP30, SEQ ID NO: 1006) is SEQ ID NO: 14. In some embodiments of any of the aspects, the reverse primer in the second set of primers (i.e., RV PCR primer) for the sample control (e.g., RPP30, SEQ ID NO: 1006) comprises SEQ ID NO: 1020. In some embodiments of any of the aspects, the first and second sequencing primers in the third set of primers for the sample control are SEQ ID NO: 15 and SEQ ID NO: 17 (see e.g., Table 15).
In some embodiments of any of the aspects, the positive control nucleic acid (e.g., a “positive enzymatic control nucleic acid” or “enzymatic control”) comprises, from 5′ to 3′: (a) a region that is not identical or substantially identical to any target RNA being assayed; and (b) a region that is identical or substantially identical to at least one target RNA region. In some embodiments of any of the aspects, the positive control nucleic acid (e.g., a “positive enzymatic control nucleic acid”) comprises, from 5′ to 3′: (a) a region that is not identical or substantially identical to any target RNA being assayed; and (b) a region that is complementary or substantially complementary to the target-binding region of at least one primer from the first set of primers. In some embodiments of any of the aspects, the region of the positive control nucleic acid that is identical or substantially identical to at least one target RNA is complementary or substantially complementary to the target-binding region of at least one primer from the first set of primers. In some embodiments of any of the aspects, the enzymatic control comprises SEQ ID NO: 11 or a nucleic acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 11 that maintains the same function (e.g., specific binding to at least one primer in the first set of primers).
In some embodiments of any of the aspects, the enzymatic control functions as a control for the enzymatic reactions (e.g., the RT step, the amplification step, and/or the sequencing step). In some embodiments of any of the aspects, the primer in the first set of primers (i.e., RT primer) for the enzymatic control (e.g., SEQ ID NO: 11) comprises SEQ ID NO: 3, or e.g., SEQ ID NO: 1005. In some embodiments of any of the aspects, the forward primer in the second set of primers (i.e., FW PCR primer) for the enzymatic control (e.g., SEQ ID NO: 11) is SEQ ID NO: 14. In some embodiments of any of the aspects, the reverse primer in the second set of primers (i.e., RV PCR primer) for the enzymatic control (e.g., SEQ ID NO: 11) comprises SEQ ID NO: 12. In some embodiments of any of the aspects, the first and second sequencing primers in the third set of primers for the enzymatic control are SEQ ID NO: 15 and SEQ ID NO: 17 (see e.g., Table 15).
If a sequencing signal is detected from the enzymatic control (e.g., SEQ ID NO: 11), then all of the enzymatic reactions were completed successfully. If a sequencing signal is not detected from the enzymatic control (e.g., SEQ ID NO: 11), then at least one of the enzymatic reactions (e.g., the RT step, the amplification step, and/or the sequencing step) was not completed successfully.
In some embodiments of any of the aspects, the sample is contacted with at least 100 copies/ul of enzymatic control (e.g., SEQ ID NO: 11). In some embodiments of any of the aspects, the sample is contacted with at least 10^4 copies/ul of enzymatic control (e.g., SEQ ID NO: 11). In some embodiments of any of the aspects, the sample is contacted with at least 10^1 copies/ul, at least 10^2 copies/ul, at least 10^3 copies/ul, at least 10^4 copies/ul, at least 10^5 copies/ul, at least 10^6 copies/ul, at least 10^7 copies/ul, at least 10^8 copies/ul, at least 10^9 copies/ul, at least 10^10 copies/ul or more of enzymatic control. In some embodiments of any of the aspects, the sample is contacted with both a sample control (e.g., primer specific to hRPP30) and an enzymatic control (e.g., SEQ ID NO: 11).
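The interpretation of the two positive controls described above can be summarized in a short decision function. The following Python sketch is illustrative only. The function name and result strings are hypothetical, and evaluating the enzymatic control first is an assumption made for this example, on the reasoning that a missing sample control signal is uninformative if the enzymatic reactions themselves failed.

```python
def interpret_controls(sample_control_detected, enzymatic_control_detected):
    """Classify a reaction from the presence of the two control signals.

    Assumption for this sketch: the enzymatic control is evaluated first.
    """
    if not enzymatic_control_detected:
        # At least one of the RT, amplification, or sequencing steps failed.
        return "enzymatic reaction failure"
    if not sample_control_detected:
        # Enzymes worked, but no reverse-transcribable sample RNA was present.
        return "no detectable sample RNA"
    return "controls passed"


for sample_ok in (True, False):
    for enzymes_ok in (True, False):
        print(sample_ok, enzymes_ok, interpret_controls(sample_ok, enzymes_ok))
```

In practice, "detected" would be defined by a read-count threshold for each control, which is not specified here.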
In some embodiments of any of the aspects, step (a) (e.g., the RT step) further comprises contacting the samples with a stabilization agent. In some embodiments of any of the aspects, the stabilization agent prevents degradation of the RNA target and/or reverse transcriptase for at least 6 hours at room temperature. The stabilization agent or agents can be present, for example, in the viral transport medium, such that RNA is protected as soon as the sample is placed in the medium. In some embodiments of any of the aspects, the stabilization agent prevents degradation of the RNA target and/or reverse transcriptase for at least 24 hours at room temperature. In some embodiments of any of the aspects, the stabilization agent prevents degradation of the RNA target and/or reverse transcriptase for at least 1 hour, at least 2 hours, at least 3 hours, at least 4 hours, at least 5 hours, at least 6 hours, at least 7 hours, at least 8 hours, at least 9 hours, at least 10 hours, at least 11 hours, at least 12 hours, at least 13 hours, at least 14 hours, at least 15 hours, at least 16 hours, at least 17 hours, at least 18 hours, at least 19 hours, at least 20 hours, at least 21 hours, at least 22 hours, at least 23 hours, at least 24 hours, at least 25 hours, at least 26 hours, at least 27 hours, at least 28 hours, at least 29 hours, at least 30 hours, at least 31 hours, at least 32 hours, at least 33 hours, at least 34 hours, at least 35 hours, at least 36 hours, at least 37 hours, at least 38 hours, at least 39 hours, at least 40 hours, at least 41 hours, at least 42 hours, at least 43 hours, at least 44 hours, at least 45 hours, at least 46 hours, at least 47 hours, at least 48 hours, at least 49 hours, at least 50 hours, at least 51 hours, at least 52 hours, at least 53 hours, at least 54 hours, at least 55 hours, at least 56 hours, at least 57 hours, at least 58 hours, at least 59 hours, at least 60 hours, at least 61 hours, at least 62 hours, at least 63 hours, at least 
64 hours, at least 65 hours, at least 66 hours, at least 67 hours, at least 68 hours, at least 69 hours, at least 70 hours, at least 71 hours, at least 72 hours or more, e.g., at room temperature.
In some embodiments of any of the aspects, the stabilization agent is an RNA-preserving agent and/or a reverse-transcriptase-preserving agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises an RNA-preserving agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises a reverse-transcriptase-preserving agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises both an RNA-preserving agent and a reverse-transcriptase-preserving agent.
In some embodiments of any of the aspects, the RNA-preserving agent is an RNase inhibitor, a metal-chelating agent, and/or a reducing agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises an RNase inhibitor. In some embodiments of any of the aspects, the reverse transcription reaction comprises a metal-chelating agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises a reducing agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises an RNase inhibitor and a metal-chelating agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises an RNase inhibitor and a reducing agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises a metal-chelating agent and a reducing agent. In some embodiments of any of the aspects, the reverse transcription reaction comprises an RNase inhibitor, a metal-chelating agent, and a reducing agent.
In some embodiments of any of the aspects, the reverse-transcriptase-preserving agent is an antibiotic, an antimycotic, and/or a protease inhibitor. In some embodiments of any of the aspects, the reverse transcription reaction comprises an antibiotic. In some embodiments of any of the aspects, the reverse transcription reaction comprises an antimycotic. In some embodiments of any of the aspects, the reverse transcription reaction comprises a protease inhibitor. In some embodiments of any of the aspects, the reverse transcription reaction comprises an antibiotic and an antimycotic. In some embodiments of any of the aspects, the reverse transcription reaction comprises an antibiotic and a protease inhibitor. In some embodiments of any of the aspects, the reverse transcription reaction comprises an antimycotic and a protease inhibitor. In some embodiments of any of the aspects, the reverse transcription reaction comprises an antibiotic, an antimycotic, and a protease inhibitor.
In some embodiments of any of the aspects, the viral transport medium or reverse transcription reaction comprises at least one of the following stabilization agents: (a) an RNase inhibitor; (b) a metal-chelating agent; (c) a reducing agent; (d) an antibiotic; (e) an antimycotic; and/or (f) a protease inhibitor. Table 13 provides exemplary combinations of such stabilization agents. In some embodiments, if the reverse transcription reaction does not comprise a specific stabilization agent, it can be added in a subsequent step.
Table 13: Non-Limiting Examples of Stabilization Agents in the RT Reaction; “RI” indicates an RNase inhibitor; “MC” indicates a metal-chelating agent; “RA” indicates a reducing agent; “AB” indicates an antibiotic; “AM” indicates an antimycotic; and “PI” indicates a protease inhibitor.
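For illustration, the possible combinations of "at least one of" the six stabilization agent classes can be enumerated as every non-empty subset of the six abbreviations defined above. The following Python sketch is not a reproduction of Table 13 and makes no claim about which combinations that table lists; the function name is hypothetical.

```python
from itertools import combinations

# Abbreviations as defined in the Table 13 legend.
AGENTS = ("RI", "MC", "RA", "AB", "AM", "PI")


def stabilization_combinations(agents=AGENTS):
    """Return every non-empty combination of the given agent classes."""
    return [
        combo
        for size in range(1, len(agents) + 1)
        for combo in combinations(agents, size)
    ]


combos = stabilization_combinations()
print(len(combos))  # 2**6 - 1 = 63 non-empty subsets
print(combos[:6])   # the six single-agent combinations come first
```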
In some embodiments of any of the aspects, the RNase inhibitor is murine RNase inhibitor or a thermostable RNase inhibitor. In some embodiments of any of the aspects, the RNase inhibitor specifically inhibits RNases A, B and C, which specifically cleave ssRNA or dsRNA. RNase A and RNase B are endoribonucleases that specifically degrade single-stranded RNA at C and U residues. RNase C recognizes dsRNA and cleaves it at specific targeted locations to transform it into mature RNAs. In some embodiments of any of the aspects, the RNase inhibitor is present in the reverse transcription reaction at a concentration of at least 10% (e.g., volume per volume, v/v, percent). In some embodiments of any of the aspects, the RNase inhibitor is present in the reverse transcription reaction at a concentration of at least 0.1%, at least 0.2%, at least 0.3%, at least 0.4%, at least 0.5%, at least 0.6%, at least 0.7%, at least 0.8%, at least 0.9%, at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, or at least 20%.
Exemplary RNase inhibitors include, but are not limited to, mammalian ribonuclease inhibitor proteins such as porcine ribonuclease inhibitor and human ribonuclease inhibitor (e.g., human placenta ribonuclease inhibitor and recombinant human ribonuclease inhibitor), vanadyl ribonucleoside complexes, proteinase K, phenylglyoxal, p-hydroxyphenylglyoxal, polyamines, spermidine, 9-aminoacridine, iodoacetate, bentonite, poly[2′-O-(2,4-dinitrophenyl)]poly(adenylic acid), zinc sulfate, bromopyruvic acid, formamide, dimethylformamide, copper, zinc, aurintricarboxylic acid (ATA) and salts thereof such as triammonium aurintricarboxylate (aluminon), adenosine 5′-pyrophosphate, 2′-cytidine monophosphate free acid (2′-CMP), 5′-diphosphoadenosine 3′-phosphate (ppA-3′-p), 5′-diphosphoadenosine 2′-phosphate (ppA-2′-p), leucine, oligovinylsulfonic acid, poly(aspartic acid), tyrosine-glutamic acid polymer, 5′-phospho-2′-deoxyuridine 3′-pyrophosphate P′→5′-ester with adenosine 3′-phosphate (pdUppAp), and analogs, derivatives and salts thereof.
In some embodiments of any of the aspects, the RNase inhibitor is a ribonuclease inhibitor protein, such as a recombinant RNase inhibitor, e.g., a recombinant mammalian RNase inhibitor. In some embodiments of any of the aspects, the RNase inhibitor is murine RNase inhibitor or RNasin® Plus (Promega™). In some embodiments of any of the aspects, the RNase inhibitor is murine RNase inhibitor or a thermostable RNase inhibitor. In some embodiments of any of the aspects, the RNase inhibitor is a thermostable RNase inhibitor, e.g., RNasin® Plus. One unit is defined as the amount of RNase inhibitor (e.g., RNasin®) required to inhibit the activity of 5 ng of ribonuclease A by 50%; activity is measured by the inhibition of hydrolysis of cytidine 2′,3′-cyclic monophosphate by ribonuclease A.
In some embodiments of any of the aspects, the RNase inhibitor, i.e., a ribonuclease inhibitor protein, is added to a final concentration of at least 0.01 U/µL, at least 0.02 U/µL, at least 0.03 U/µL, at least 0.04 U/µL, at least 0.05 U/µL, at least 0.06 U/µL, at least 0.07 U/µL, at least 0.08 U/µL, at least 0.09 U/µL, at least 0.1 U/µL, at least 0.2 U/µL, at least 0.3 U/µL, at least 0.4 U/µL, at least 0.5 U/µL, at least 0.6 U/µL, at least 0.7 U/µL, at least 0.8 U/µL, at least 0.9 U/µL, at least 1.0 U/µL, at least 1.1 U/µL, at least 1.2 U/µL, at least 1.3 U/µL, at least 1.4 U/µL, at least 1.5 U/µL, at least 1.6 U/µL, at least 1.7 U/µL, at least 1.8 U/µL, at least 1.9 U/µL, at least 2.0 U/µL, at least 2.1 U/µL, at least 2.2 U/µL, at least 2.3 U/µL, at least 2.4 U/µL, at least 2.5 U/µL, at least 2.6 U/µL, at least 2.7 U/µL, at least 2.8 U/µL, at least 2.9 U/µL, at least 3.0 U/µL, at least 3.1 U/µL, at least 3.2 U/µL, at least 3.3 U/µL, at least 3.4 U/µL, at least 3.5 U/µL, at least 3.6 U/µL, at least 3.7 U/µL, at least 3.8 U/µL, at least 3.9 U/µL, at least 4.0 U/µL, at least 4.1 U/µL, at least 4.2 U/µL, at least 4.3 U/µL, at least 4.4 U/µL, at least 4.5 U/µL, at least 4.6 U/µL, at least 4.7 U/µL, at least 4.8 U/µL, at least 4.9 U/µL, at least 5.0 U/µL, at least 5.1 U/µL, at least 5.2 U/µL, at least 5.3 U/µL, at least 5.4 U/µL, at least 5.5 U/µL, at least 5.6 U/µL, at least 5.7 U/µL, at least 5.8 U/µL, at least 5.9 U/µL, at least 6.0 U/µL, at least 6.1 U/µL, at least 6.2 U/µL, at least 6.3 U/µL, at least 6.4 U/µL, at least 6.5 U/µL, at least 6.6 U/µL, at least 6.7 U/µL, at least 6.8 U/µL, at least 6.9 U/µL, at least 7.0 U/µL, at least 7.1 U/µL, at least 7.2 U/µL, at least 7.3 U/µL, at least 7.4 U/µL, at least 7.5 U/µL, at least 7.6 U/µL, at least 7.7 U/µL, at least 7.8 U/µL, at least 7.9 U/µL, at least 8.0 U/µL, at least 8.1 U/µL, at least 8.2 U/µL, at least 8.3 U/µL, at least 8.4 U/µL, at least 8.5 U/µL, at least 8.6 U/µL, at least 8.7 U/µL, at least 8.8 U/µL, at 
least 8.9 U/µL, at least 9.0 U/µL, at least 9.1 U/µL, at least 9.2 U/µL, at least 9.3 U/µL, at least 9.4 U/µL, at least 9.5 U/µL, at least 9.6 U/µL, at least 9.7 U/µL, at least 9.8 U/µL, at least 9.9 U/µL, at least 10 U/µL, at least 20 U/µL, at least 30 U/µL, at least 40 U/µL, or at least 50 U/µL.
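As a worked illustration of the final concentrations recited above, the volume of an RNase inhibitor stock needed to reach a given final concentration follows from the standard dilution relationship C1 x V1 = C2 x V2. The following Python sketch is illustrative only. The function names, the 40 U/µL stock concentration, and the 20 µL reaction volume are hypothetical values chosen for this example, not values taken from the described methods.

```python
def stock_volume_ul(final_u_per_ul, reaction_volume_ul, stock_u_per_ul):
    """Volume of stock (in uL) giving the desired final concentration.

    Uses C1 * V1 = C2 * V2, where the reaction volume is the final volume.
    """
    if stock_u_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if final_u_per_ul > stock_u_per_ul:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_u_per_ul * reaction_volume_ul / stock_u_per_ul


def volume_percent(component_ul, reaction_volume_ul):
    """Volume-per-volume percentage of a component in the final reaction."""
    return 100.0 * component_ul / reaction_volume_ul


# Hypothetical example: 1.0 U/uL final in a 20 uL reaction from a 40 U/uL stock.
needed_ul = stock_volume_ul(1.0, 20.0, 40.0)
print(needed_ul)                        # volume of stock to add, in uL
print(volume_percent(needed_ul, 20.0))  # the same amount expressed as % v/v
```

The second function relates the U/µL values to the volume-per-volume percentages also used in this description, under the assumption that the percentage refers to the volume of inhibitor stock in the final reaction.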
In some embodiments of any of the aspects, the metal-chelating agent is selected from the group consisting of ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N′,N′-tetraacetic acid (EGTA), 2,3-dimercapto-1-propanesulfonic acid sodium (DMPS), dimercaptosuccinic acid (DMSA), metallothionein, and desferrioxamine. Chelation is the binding of ions and molecules to metal ions, involving the formation or presence of two or more separate coordinate bonds between a polydentate (multiple bonded) ligand and a single central metal atom. In some embodiments of any of the aspects, the metal-chelating agent is EDTA. In some embodiments of any of the aspects, the metal-chelating agent (e.g., EDTA) is present in the reverse transcription reagent at a concentration of at least 0.5 mM. In some embodiments of any of the aspects, the metal-chelating agent (e.g., EDTA) is present in the reverse transcription reagent at a concentration of at least 0.01 mM, at least 0.02 mM, at least 0.03 mM, at least 0.04 mM, at least 0.05 mM, at least 0.06 mM, at least 0.07 mM, at least 0.08 mM, at least 0.09 mM, at least 0.1 mM, at least 0.2 mM, at least 0.3 mM, at least 0.4 mM, at least 0.5 mM, at least 0.6 mM, at least 0.7 mM, at least 0.8 mM, at least 0.9 mM, at least 1 mM or more.
It should be noted that metal-chelating agents, e.g., EDTA, can inhibit polymerase function as well as nuclease activities. In some embodiments of any of the aspects, the metal-chelating agent is diluted out or removed from the solution prior to the RT and/or amplification reactions.
In some embodiments of any of the aspects, the reducing agent is selected from the group consisting of: tris-(2-carboxyethyl)-phosphine (TCEP), cysteine, dithionite, dithioerythritol, dithiothreitol (DTT), dysteine, 2-mercaptoethanol, mercaptoethylene, bisulfite, sodium metabisulfite, pyrosulfite, pentaerythritol, thioglycolic acid, urea, uric acid, vitamin C, vitamin E, superoxide dismutases, and analogs, derivatives and salts thereof. In some embodiments of any of the aspects, the reducing agent is dithiothreitol (DTT). Dithiothreitol (DTT) is a redox reagent used to stabilize proteins which possess free sulfhydryl groups (e.g., RT).
The reducing agent can be added in any desired amount. In some embodiments of any of the aspects, the reducing agent is present in the reverse transcription reaction at a concentration of at least 5 mM. For example, the reducing agent can be added to a final concentration of at least 0.1 mM, at least 0.2 mM, at least 0.3 mM, at least 0.4 mM, at least 0.5 mM, at least 0.6 mM, at least 0.7 mM, at least 0.8 mM, at least 0.9 mM, at least 1 mM, at least 2 mM, at least 3 mM, at least 4 mM, at least 5 mM, at least 6 mM, at least 7 mM, at least 8 mM, at least 10 mM, at least 11 mM, at least 12 mM, at least 13 mM, at least 14 mM, at least 15 mM, at least 16 mM, at least 17 mM, at least 18 mM, at least 19 mM, at least 20 mM, at least 25 mM, at least 30 mM, at least 35 mM, at least 40 mM, at least 45 mM, at least 50 mM, at least 55 mM, at least 60 mM, at least 65 mM, at least 70 mM, at least 75 mM, at least 80 mM, at least 85 mM, at least 90 mM, at least 95 mM, at least 100 mM or more.
In some embodiments of any of the aspects, the reverse transcription reaction comprises an antibiotic (i.e., anti-bacterial) and/or an antimycotic (i.e., anti-fungal), which permits stabilization of the reverse transcriptase and prevents bacterial or fungal contamination of the sample (e.g., during incubation at room temperature for 6-24 hours). In some embodiments of any of the aspects, the antibiotic is penicillin (e.g., 10,000 units/mL) and/or streptomycin (e.g., 10,000 µg/mL). Penicillin was originally purified from the fungus Penicillium and acts by interfering directly with the turnover of the bacterial cell wall and indirectly by triggering the release of enzymes that further alter the cell wall. Penicillin inhibits gram-positive bacteria. Streptomycin was originally purified from Streptomyces griseus. Streptomycin acts by binding to the 30S subunit of the bacterial ribosome leading to inhibition of protein synthesis and death in susceptible bacteria. Streptomycin inhibits gram-positive and gram-negative bacteria.
In some embodiments of any of the aspects, the antibiotic (also referred to as anti-bacterial) is selected from the group consisting of: aminoglycosides, ansamycins, beta-lactams, bis-biguanides, carbacephems, carbapenems, cationic polypeptides, cephalosporins, fluoroquinolones, glycopeptides, iron-sequestering glycoproteins, lincosamides, lipopeptides, macrolides, monobactams, nitrofurans, oxazolidinones, penicillins, polypeptides, quaternary ammonium compounds, quinolones, silver compounds, sulfonamides, tetracyclines, and any combinations thereof. In some embodiments of any of the aspects, the antimicrobial agent can comprise an antibiotic.
Some exemplary specific antimicrobial agents include broad penicillins, amoxicillin (e.g., Ampicillin, Bacampicillin, Carbenicillin Indanyl, Mezlocillin, Piperacillin, Ticarcillin), Penicillins and Beta Lactamase Inhibitors (e.g., Amoxicillin-Clavulanic Acid, Ampicillin-Sulbactam, Benzylpenicillin, Cloxacillin, Dicloxacillin, Methicillin, Oxacillin, Penicillin G, Penicillin V, Piperacillin Tazobactam, Ticarcillin Clavulanic Acid, Nafcillin), Cephalosporins (e.g., Cephalosporin I Generation, Cefadroxil, Cefazolin, Cephalexin, Cephalothin, Cephapirin, Cephradine), Cephalosporin II Generation (e.g., Cefaclor, Cefamandole, Cefonicid, Cefotetan, Cefoxitin, Cefprozil, Cefmetazole, Cefuroxime, Loracarbef), Cephalosporin III Generation (e.g., Cefdinir, Ceftibuten, Cefoperazone, Cefixime, Cefotaxime, Cefpodoxime proxetil, Ceftazidime, Ceftizoxime, Ceftriaxone), Cephalosporin IV Generation (e.g., Cefepime), Macrolides and Lincosamides (e.g., Azithromycin, Clarithromycin, Clindamycin, Dirithromycin, Erythromycin, Lincomycin, Troleandomycin), Quinolones and Fluoroquinolones (e.g., Cinoxacin, Ciprofloxacin, Enoxacin, Gatifloxacin, Grepafloxacin, Levofloxacin, Lomefloxacin, Moxifloxacin, Nalidixic acid, Norfloxacin, Ofloxacin, Sparfloxacin, Trovafloxacin, Oxolinic acid, Gemifloxacin, Pefloxacin), Carbapenems (e.g., Imipenem-Cilastatin, Meropenem), Monobactams (e.g., Aztreonam), Aminoglycosides (e.g., Amikacin, Gentamicin, Kanamycin, Neomycin, Netilmicin, Streptomycin, Tobramycin, Paromomycin), Glycopeptides (e.g., Teicoplanin, Vancomycin), Tetracyclines (e.g., Demeclocycline, Doxycycline, Methacycline, Minocycline, Oxytetracycline, Tetracycline, Chlortetracycline), Sulfonamides (e.g., Mafenide, Silver Sulfadiazine, Sulfacetamide, Sulfadiazine, Sulfamethoxazole, Sulfasalazine, Sulfisoxazole, Trimethoprim-Sulfamethoxazole, Sulfamethizole), Rifampin (e.g., Rifabutin, Rifampin, Rifapentine), Oxazolidinones (e.g., Linezolid, Streptogramins, Quinupristin Dalfopristin), Bacitracin, Chloramphenicol, Fosfomycin, Isoniazid, Methenamine, Metronidazole, Mupirocin, Nitrofurantoin, Nitrofurazone, Novobiocin, Polymyxin, Spectinomycin, Trimethoprim, Colistin, Cycloserine, Capreomycin, Ethionamide, Pyrazinamide, Para-aminosalicylic acid, Erythromycin ethylsuccinate, and the like.
In some embodiments of any of the aspects, the antimycotic is Amphotericin B (e.g., 25 µg/mL). Amphotericin B is an antifungal agent that prevents the growth of fungi and yeast by causing an increase in fungal plasma membrane permeability. In some embodiments of any of the aspects, the antimycotic (also referred to as anti-fungal) is selected from the group consisting of: polyene antifungals, Amphotericin B, Candicidin, Filipin, Hamycin, Natamycin, Nystatin, Rimocidin, imidazole antifungals, triazole antifungals, thiazole antifungals, Bifonazole, Butoconazole, Clotrimazole, Econazole, Fenticonazole, Isoconazole, Ketoconazole, Luliconazole, Miconazole, Omoconazole, Oxiconazole, Sertaconazole, Sulconazole, Tioconazole, Triazoles, Albaconazole, Efinaconazole, Epoxiconazole, Fluconazole, Isavuconazole, Itraconazole, Posaconazole, Propiconazole, Ravuconazole, Terconazole, Voriconazole, Abafungin, Allylamines, amorolfin, butenafine, naftifine, terbinafine, Echinocandins, Anidulafungin, Caspofungin, Micafungin, Aurones, Benzoic acid, Ciclopirox, Flucytosine, 5-fluorocytosin, Griseofulvin, Haloprogin, Tolnaftate, Undecylenic acid, Triacetin, Crystal violet, Castellani’s paint, Orotomide, Miltefosine, Potassium iodide, Coal tar, Copper(II) sulfate, Selenium disulfide, Sodium thiosulfate, Piroctone olamine, Iodoquinol, clioquinol, Acrisorcin, Zinc pyrithione, and Sulfur. Additional antifungals known in the art can also be used.
In some embodiments of any of the aspects, the antibiotic(s) and/or antimycotic(s) is present in the reverse transcription reaction at a concentration of at least 10 µg/mL, at least 15 µg/mL, at least 20 µg/mL, at least 25 µg/mL, at least 30 µg/mL, at least 35 µg/mL, at least 40 µg/mL, at least 45 µg/mL, at least 50 µg/mL, at least 60 µg/mL, at least 70 µg/mL, at least 80 µg/mL, at least 90 µg/mL, at least 100 µg/mL, at least 110 µg/mL, at least 120 µg/mL, at least 130 µg/mL, at least 140 µg/mL, at least 150 µg/mL, at least 160 µg/mL, at least 170 µg/mL, at least 180 µg/mL, at least 190 µg/mL, at least 200 µg/mL, at least 210 µg/mL, at least 220 µg/mL, at least 230 µg/mL, at least 240 µg/mL, at least 250 µg/mL, at least 260 µg/mL, at least 270 µg/mL, at least 280 µg/mL, at least 290 µg/mL, at least 300 µg/mL, at least 310 µg/mL, at least 320 µg/mL, at least 330 µg/mL, at least 340 µg/mL, at least 350 µg/mL, at least 360 µg/mL, at least 370 µg/mL, at least 380 µg/mL, at least 390 µg/mL, at least 400 µg/mL, at least 410 µg/mL, at least 420 µg/mL, at least 430 µg/mL, at least 440 µg/mL, at least 450 µg/mL, at least 460 µg/mL, at least 470 µg/mL, at least 480 µg/mL, at least 490 µg/mL, at least 500 µg/mL, at least 510 µg/mL, at least 520 µg/mL, at least 530 µg/mL, at least 540 µg/mL, at least 550 µg/mL, at least 560 µg/mL, at least 570 µg/mL, at least 580 µg/mL, at least 590 µg/mL, at least 600 µg/mL, at least 610 µg/mL, at least 620 µg/mL, at least 630 µg/mL, at least 640 µg/mL, at least 650 µg/mL, at least 660 µg/mL, at least 670 µg/mL, at least 680 µg/mL, at least 690 µg/mL, at least 700 µg/mL, at least 710 µg/mL, at least 720 µg/mL, at least 730 µg/mL, at least 740 µg/mL, at least 750 µg/mL, at least 760 µg/mL, at least 770 µg/mL, at least 780 µg/mL, at least 790 µg/mL, at least 800 µg/mL, at least 810 µg/mL, at least 820 µg/mL, at least 830 µg/mL, at least 840 µg/mL, at least 850 µg/mL, at least 860 µg/mL, at least 870 µg/mL, at least 880 µg/mL, at least 890 µg/mL, at least 900 µg/mL, at least 910 µg/mL, at least 920 µg/mL, at least 930 µg/mL, at least 940 µg/mL, at least 950 µg/mL, at least 960 µg/mL, at least 970 µg/mL, at least 980 µg/mL, at least 990 µg/mL, at least 1000 µg/mL, at least 1500 µg/mL, at least 2000 µg/mL, at least 2500 µg/mL, at least 3000 µg/mL, at least 3500 µg/mL, at least 4000 µg/mL, at least 4500 µg/mL, at least 5000 µg/mL, at least 5500 µg/mL, at least 6000 µg/mL, at least 6500 µg/mL, at least 7000 µg/mL, at least 7500 µg/mL, at least 8000 µg/mL, at least 8500 µg/mL, at least 9000 µg/mL, at least 9500 µg/mL, at least 10,000 µg/mL or more.
In some embodiments of any of the aspects, the reverse transcription reaction does not comprise an antiviral. Non-limiting examples of antivirals include Abacavir, Acyclovir, Adefovir, Amantadine, Ampligen, Amprenavir, antiretroviral, Arbidol, Atazanavir, Atripla, Cidofovir, Combivir, Darunavir, Delavirdine, Didanosine, Docosanol, Dolutegravir, Ecoliever, Edoxudine, Efavirenz, Emtricitabine, Enfuvirtide, Entecavir, Famciclovir, Fomivirsen, Fosamprenavir, Foscarnet, Fosfonet, Fusion inhibitor, Ibacitabine, Idoxuridine, Imiquimod, Imunovir, Indinavir, Inosine, Integrase inhibitor, Interferon, Interferon type I, Interferon type II, Interferon type III, Lamivudine, Lopinavir, Loviride, Maraviroc, Methisazone, Moroxydine, Nelfinavir, Nevirapine, Nexavir, Nitazoxanide, Norvir, Nucleoside analogues, Oseltamivir (Tamiflu), Peginterferon alfa-2a, Penciclovir, Peramivir, Pleconaril, Podophyllotoxin, viral protease inhibitor, Pyramidine, Raltegravir, Reverse transcriptase inhibitor, Ribavirin, Rimantadine, Ritonavir, Saquinavir, Sofosbuvir, Stavudine, Synergistic enhancer (antiretroviral), Telaprevir, Tenofovir, Tenofovir disoproxil, Tipranavir, Trifluridine, Trizivir, Tromantadine, Truvada, Valaciclovir (Valtrex), Valganciclovir, Vicriviroc, Vidarabine, Viramidine, Zalcitabine, Zanamivir (Relenza), or Zidovudine.
Protease inhibitors inhibit peptide degradation, e.g., degradation of the reverse transcriptase. Non-limiting classes of protease inhibitors include reversible or irreversible inhibitors of substrate (e.g., peptide) binding to the protease. Particular non-limiting classes of protease inhibitors include serine and cysteine protease inhibitors. Specific non-limiting examples of protease inhibitors include PMSF, PMSF Plus, APMSF, antithrombin III, Amastatin, Antipain, aprotinin, Bestatin, Benzamidine, Chymostatin, calpain inhibitor I and II, E-64, 3,4-dichloroisocoumarin, DFP, Elastatinal, Leupeptin, Pepstatin, 1,10-Phenanthroline, Phosphoramidon, TIMP-2, TLCK, TPCK, trypsin inhibitor (soybean or chicken egg white), hirustasin, alpha-2-macroglobulin, 4-(2-aminoethyl)-benzenesulfonyl fluoride hydrochloride (AEBSF) and Kunitz-type protease inhibitors.
In some embodiments of any of the aspects, the protease inhibitor is a protease inhibitor cocktail (e.g., cOmplete™ tablets). Such protease inhibitor tablets inhibit a broad spectrum of serine, cysteine, and metalloproteases, as well as calpains. Due to the composition of the tablets, they show excellent inhibition effects, and are well suited for the protection of proteins isolated from animal tissues, plants, yeast, and bacteria. Such protease inhibitor tablets comprise both irreversible and reversible protease inhibitors. Such protease inhibitor tablets can be substantially free of metal-chelating agents, such as EDTA.
In some embodiments of any of the aspects, the protease inhibitor is present at a concentration of one tablet per 10 mL of reverse transcriptase reaction buffer. In some embodiments of any of the aspects, the protease inhibitor is present at a concentration of at least 1, at least 2, at least 3, at least 4, at least 5 or more tablets per 10 mL of reverse transcriptase reaction buffer. In some embodiments of any of the aspects, the protease inhibitor is present at a concentration of one tablet for at least 1 mL, at least 2 mL, at least 3 mL, at least 4 mL, at least 5 mL, at least 6 mL, at least 7 mL, at least 8 mL, at least 9 mL, at least 10 mL, at least 11 mL, at least 12 mL, at least 13 mL, at least 14 mL, at least 15 mL, at least 16 mL, at least 17 mL, at least 18 mL, at least 19 mL, at least 20 mL or more of reverse transcriptase reaction buffer.
In some embodiments of any of the aspects, step (a) comprises a reverse transcription reaction. In some embodiments of any of the aspects, the RT step comprises one round of polymerization, wherein the target RNA is reverse-transcribed into a single-stranded cDNA. In some embodiments of any of the aspects, the reverse transcription products from step (a) (the RT step) comprise a barcoded DNA comprising a region that is complementary to a portion of at least one target RNA.
In some embodiments of any of the aspects, the reverse transcription step comprises contacting the sample with a reverse transcriptase, a first primer or a first set of primers, and a reverse transcription reaction buffer. In some embodiments, the RT reaction buffer comprises at least one of the following: water, magnesium acetate (or another magnesium compound such as magnesium chloride), and/or dNTPs. In some embodiments of any of the aspects, the reaction buffer maintains the reaction at a specific optimal pH (e.g., 7-9; e.g., 8.1) and can include such components as Tris, KCl, MgCl2, and other buffers or salts. Magnesium ions (Mg2+) can function as a cofactor for polymerases, increasing their activity. Deoxynucleoside triphosphates (dNTPs) are free nucleoside triphosphates comprising deoxyribose as the sugar (e.g., dATP, dGTP, dCTP, and dTTP) that are used in the polymerization of the cDNA.
In one aspect, described herein is a reverse transcription solution comprising at least one of the following: (a) a reverse transcriptase; (b) a first primer or a first set of primers comprising at least one barcode; (c) a detergent; (d) carrier nucleic acid; (e) at least one positive control nucleic acid; (f) at least one stabilization agent; and/or (g) an RT reaction buffer. Table 14 provides exemplary combinations of such reverse transcription solution components. In some embodiments, if the reverse transcription solution does not comprise a specific component, it can be added in a subsequent step.
“RT” indicates reverse transcriptase; “FP” indicates first primer or a first set of primers comprising at least one barcode; “Det.” indicates a detergent; “CN” indicates carrier nucleic acid; “PC” indicates at least one positive control nucleic acid; “SA” indicates at least one stabilization agent; and “Buf.” indicates an RT reaction buffer.
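As a non-limiting illustration, the possible combinations of the seven listed components can be enumerated programmatically. The following is a minimal sketch that uses only the abbreviations defined above; it does not reproduce Table 14, and the function names `enumerate_solutions` and `missing_components` are hypothetical names chosen for this example.

```python
from itertools import combinations

# Abbreviations follow the legend above; the enumeration is illustrative only
# and is not a reproduction of Table 14.
COMPONENTS = {
    "RT": "reverse transcriptase",
    "FP": "first primer or first set of primers comprising at least one barcode",
    "Det.": "detergent",
    "CN": "carrier nucleic acid",
    "PC": "positive control nucleic acid",
    "SA": "stabilization agent",
    "Buf.": "RT reaction buffer",
}


def enumerate_solutions(min_components=1):
    """Return every combination of abbreviations with at least min_components members."""
    keys = list(COMPONENTS)
    result = []
    for size in range(min_components, len(keys) + 1):
        result.extend(combinations(keys, size))
    return result


def missing_components(solution):
    """List the components absent from a solution (candidates for addition in a later step)."""
    return [key for key in COMPONENTS if key not in solution]


if __name__ == "__main__":
    print(len(enumerate_solutions()))  # 2**7 - 1 = 127 non-empty combinations
    print(missing_components(("RT", "FP", "Buf.")))
```

The `missing_components` helper mirrors the statement that a component not present in the solution can be added in a subsequent step.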
In one aspect, described herein is a collection container (e.g., a collection tube) containing a reverse transcription solution as described herein. In some embodiments of any of the aspects, the sample collection container further contains viral transport media, as described further herein. In some embodiments of any of the aspects, a sample from the subject can be added directly to the collection container, reducing the number of liquid handling steps (see e.g.,
In some embodiments of any of the aspects, step (a) (the RT step) comprises: (i) incubating the sample, reverse transcriptase, and first primer or first set of primers comprising at least one barcode at a temperature of at least 50° C. for at least 30 minutes; and (ii) inactivating the reverse transcription reaction at a temperature of at least 95° C. for at least 5 minutes. In some embodiments of any of the aspects, step (i) further comprises incubating the sample in an RT reaction solution as described herein (see e.g., Table 13 and Table 14).
In some embodiments of any of the aspects, step (i) (e.g., the incubation step) of the RT reaction comprises incubating the reaction at a temperature of at least 50° C. In some embodiments of any of the aspects, step (i) (e.g., the incubation step) of the RT reaction comprises incubating the reaction at a temperature of at least 30° C., at least 31° C., at least 32° C., at least 33° C., at least 34° C., at least 35° C., at least 36° C., at least 37° C., at least 38° C., at least 39° C., at least 40° C., at least 41° C., at least 42° C., at least 43° C., at least 44° C., at least 45° C., at least 46° C., at least 47° C., at least 48° C., at least 49° C., at least 50° C., at least 51° C., at least 52° C., at least 53° C., at least 54° C., at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., at least 65° C., at least 66° C., at least 67° C., at least 68° C., at least 69° C., at least 70° C. or more. In some embodiments of any of the aspects, the RT step is performed at body temperature (e.g., 37° C.). In some embodiments of any of the aspects, the RT step is performed on a heat block set to approximately 50° C. or an incubator set to approximately 50° C.
In some embodiments of any of the aspects, step (i) (e.g., the incubation step) of the RT reaction comprises incubating the reaction for at least 30 minutes. In some embodiments of any of the aspects, step (i) (e.g., the incubation step) of the RT reaction comprises incubating the reaction for at least 15 minutes, at least 20 minutes, at least 25 minutes, at least 30 minutes, at least 40 minutes, at least 50 minutes, at least 60 minutes, at least 70 minutes, at least 80 minutes, at least 90 minutes, or at least 100 minutes. The specific conditions, e.g., of temperature, time, and buffer conditions can be varied as necessary to accommodate different RT enzymes.
In some embodiments of any of the aspects, step (ii) (e.g., the inactivation step) of the RT reaction comprises inactivating the reaction at a temperature of at least 95° C. In some embodiments of any of the aspects, step (ii) (e.g., the inactivation step) of the RT reaction comprises inactivating the reaction at a temperature of at least 50° C., at least 55° C., at least 60° C., at least 65° C., at least 70° C., at least 75° C., at least 80° C., at least 85° C., at least 90° C., at least 91° C., at least 92° C., at least 93° C., at least 94° C., at least 95° C., at least 96° C., at least 97° C., at least 98° C., at least 99° C., or at least 99.5° C. In some embodiments of any of the aspects, step (ii) (e.g., the inactivation step) of the RT reaction comprises inactivating the reaction for at least 5 minutes. In some embodiments of any of the aspects, step (ii) (e.g., the inactivation step) of the RT reaction comprises inactivating the reaction for at least 1 minute, at least 2 minutes, at least 3 minutes, at least 4 minutes, at least 5 minutes, at least 6 minutes, at least 7 minutes, at least 8 minutes, at least 9 minutes, at least 10 minutes, at least 15 minutes, at least 20 minutes, at least 25 minutes, or at least 30 minutes.
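By way of a non-limiting example, the two-step RT profile described above (incubation followed by heat inactivation) can be captured as a small data record and checked against chosen minimum values. The following sketch assumes the exemplary minimums of 50° C. for 30 minutes and 95° C. for 5 minutes as defaults; the class name `RTProfile` and function name `meets_minimums` are hypothetical, and the thresholds can be changed to reflect other embodiments or other RT enzymes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTProfile:
    """A hypothetical record of an RT step: incubation (i) then inactivation (ii)."""
    incubation_temp_c: float
    incubation_min: float
    inactivation_temp_c: float
    inactivation_min: float


def meets_minimums(profile, min_inc_temp_c=50.0, min_inc_min=30.0,
                   min_inact_temp_c=95.0, min_inact_min=5.0):
    """Return True if every parameter is at or above its 'at least' threshold."""
    return (
        profile.incubation_temp_c >= min_inc_temp_c
        and profile.incubation_min >= min_inc_min
        and profile.inactivation_temp_c >= min_inact_temp_c
        and profile.inactivation_min >= min_inact_min
    )


if __name__ == "__main__":
    standard = RTProfile(50.0, 30.0, 95.0, 5.0)
    body_temp = RTProfile(37.0, 30.0, 95.0, 5.0)
    print(meets_minimums(standard))
    # A 37 C incubation fails the default check but passes a relaxed threshold.
    print(meets_minimums(body_temp))
    print(meets_minimums(body_temp, min_inc_temp_c=37.0))
```

Keeping the thresholds as parameters reflects that the specific conditions of temperature and time can be varied as necessary.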
In some embodiments of any of the aspects, the reverse transcription products from step (a) for different samples are combined in one container to form a pooled reverse transcription product mixture. Such a step is in contrast to other methods, in which products can only be combined after the amplification step, not the reverse transcription step. Contacting the sample with a first primer or a first set of primers comprising at least one barcode, which produces individually barcoded cDNAs, allows for pre-amplification pooling of the reverse transcription products. In some embodiments of any of the aspects, reverse transcription products from step (a) (the RT step) of at least 5 samples are combined in one container. In some embodiments of any of the aspects, reverse transcription products from step (a) (the RT step) of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1500, at least 2000, at least 2500, at least 3000, at least 3500, at least 4000, at least 4500, at least 5000 or more samples are combined in one container.
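The reason barcoding permits pre-amplification pooling can be shown with a toy model: once each cDNA carries a sample-specific barcode, molecules from different samples can be mixed and still be assigned back to their sample of origin. The following is a minimal sketch under simplifying assumptions that are not taken from the disclosure: barcodes are of equal length, sit at the 5′ end of the molecule, and are matched exactly. All sequences and function names are hypothetical.

```python
from collections import Counter


def pool_barcoded_products(products_by_sample, barcode_by_sample):
    """Prefix each cDNA with its sample barcode and merge everything into one pool."""
    pool = []
    for sample, sequences in products_by_sample.items():
        barcode = barcode_by_sample[sample]
        pool.extend(barcode + seq for seq in sequences)
    return pool


def demultiplex(pool, barcode_by_sample):
    """Count pooled molecules per sample by exact match on the leading barcode."""
    lengths = {len(bc) for bc in barcode_by_sample.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of equal length")
    length = lengths.pop()
    sample_by_barcode = {bc: sample for sample, bc in barcode_by_sample.items()}
    counts = Counter()
    for molecule in pool:
        sample = sample_by_barcode.get(molecule[:length])
        if sample is not None:  # molecules with unknown barcodes are ignored
            counts[sample] += 1
    return counts


if __name__ == "__main__":
    barcodes = {"sample_1": "ACGT", "sample_2": "TGCA"}  # hypothetical barcodes
    products = {"sample_1": ["GGGAAA", "GGGAAA"], "sample_2": ["GGGAAA"]}
    pool = pool_barcoded_products(products, barcodes)
    print(demultiplex(pool, barcodes))
```

In this toy example the target-derived portion of every molecule is identical, so only the barcode distinguishes the samples after pooling.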
In some embodiments of any of the aspects, the reverse transcription step is performed in at most 30 minutes. As a non-limiting example, the reverse transcription step is performed in at most 20 minutes, at most 25 minutes, at most 30 minutes, at most 40 minutes, at most 50 minutes, at most 60 minutes, at most 70 minutes, at most 80 minutes, at most 90 minutes, at most 100 minutes, at most 110 minutes, or at most 120 minutes.
In another aspect, provided herein are compositions useful in detecting an RNA target. The composition can comprise any of the reagents discussed herein. In one aspect, described herein is a reverse transcription composition comprising at least two of the following: (a) a target RNA; (b) a reverse transcriptase; (c) a first primer or a first set of primers comprising at least one barcode; (d) a detergent; (e) a carrier nucleic acid; (f) a positive control nucleic acid; and/or (g) at least one stabilization agent. It is noted that a composition can comprise any one, two, three, four, five, six, or all seven of the components listed above.
Described are methods, kits, and systems that can be used to detect a target RNA. In some embodiments of any of the aspects, the cDNA resulting from the RT step is amplified to detectable levels. In some embodiments, the target RNA is present at a low starting amount, such that amplification is needed in order to detect the RNA. As used herein, “amplification” is defined as the production of additional copies of a nucleic acid sequence, i.e., for example, amplicons or amplification products. Methods of amplifying nucleic acid sequences are well known in the art. Such methods include, but are not limited to, polymerase chain reaction (PCR) and variants of PCR such as Rapid amplification of cDNA ends (RACE); ligase chain reaction (LCR); multiplex RT-PCR; immuno-PCR; Sequence-Independent, Single-Primer-Amplification (SISPA); Real Time RT-qPCR; nanofluidic digital PCR; or isothermal amplification methods. Accordingly, the methods described herein comprise an amplification step (e.g., step (c)) of contacting the pooled reverse transcription product mixture with a DNA polymerase and a second set of primers, e.g., under conditions permitting the generation of amplification products. As used herein, the phrase “conditions permitting the generation of amplification products” refers to temperature(s), time(s), and/or reagent(s) that allow the DNA polymerase to catalyze the generation of dsDNA from the cDNA using at least one primer (e.g., at least two primers) from the second set of primers. In some embodiments of any of the aspects, the second set of primers comprises at least 2 primers and comprises a forward primer and reverse primer that together amplify a target of 15 base pairs (bp) - 50,000 bp, unless indicated otherwise.
In some embodiments of any of the aspects, the amplification step permits an amplification reaction, such as a polymerase chain reaction. In general, the PCR procedure relates to a method of gene amplification which is comprised of (i) sequence-specific hybridization of primers to specific genes or sequences within a nucleic acid sample or library, (ii) subsequent amplification involving multiple rounds of annealing, elongation, and denaturation using a thermostable DNA polymerase, and (iii) screening the PCR products for a band of the correct size. The primers used are oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization, i.e. each primer is specifically designed to be complementary or include sequence complementary to a strand of the template (e.g., target cDNA) to be amplified. In an alternative embodiment, mRNA level of gene expression products described herein can be determined by reverse-transcription (RT) PCR or quantitative RT-PCR (QRT-PCR) or real-time PCR methods. Methods of RT-PCR and QRT-PCR are well known in the art.
In some embodiments of any of the aspects, the amplification method comprises isothermal amplification, which permits rapid and specific amplification of DNA at a constant temperature. In general, isothermal amplification is comprised of (i) sequence-specific hybridization of primers to specific genes or sequences within a nucleic acid sample or library, (ii) subsequent amplification involving multiple rounds of primer annealing, elongation, and strand displacement (as a non-limiting example, using a combination of recombinase, single-stranded binding proteins, and DNA polymerase), and (iii) detection of the product. In some embodiments of any of the aspects, the isothermal amplification product can be detected through such methods as sequencing to confirm the identity of the amplified product or general assays such as turbidity. In some types of isothermal amplification, turbidity results from pyrophosphate byproducts produced during the reaction; these byproducts form a white precipitate that increases the turbidity of the solution. The primers used in isothermal amplification are oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization, i.e. each primer is specifically designed to be complementary or include sequence complementary to a strand of the template (e.g., target cDNA) to be amplified. In contrast to the polymerase chain reaction (PCR) technology in which the reaction is carried out with a series of alternating temperature steps or cycles, isothermal amplification is carried out at one temperature, and does not require a thermal cycler or thermostable enzymes.
Non-limiting examples of isothermal amplification include: Recombinase Polymerase Amplification (RPA), nested RPA, Loop Mediated Isothermal Amplification (LAMP), Helicase-dependent isothermal DNA amplification (HDA), thermophilic helicase-dependent amplification (tHDA), Rolling Circle Amplification (RCA), strand displacement amplification (SDA), ligase chain reaction (LCR), nicking enzyme amplification reaction (NEAR), polymerase Spiral Reaction (PSR), polymerase cross-linking spiral reaction (PCLSR), and transcription-based amplification systems (TAS) such as nucleic acid sequence based amplification (NASBA), “RACE” and “one-sided PCR.” See e.g., Yan et al., Isothermal amplified detection of DNA and RNA, March 2014, Molecular BioSystems 10(5), DOI: 10.1039/c3mb70304e, the content of which is incorporated herein by reference in its entirety. In some embodiments of any of the aspects, the isothermal amplification reaction is Recombinase Polymerase Amplification (RPA) or Loop Mediated Isothermal Amplification (LAMP).
In some embodiments of any of the aspects, the isothermal amplification reaction is Recombinase Polymerase Amplification (RPA). RPA is a low temperature DNA and RNA amplification technique. The RPA process employs three core enzymes - a recombinase, a single-stranded DNA-binding protein (SSB) and a strand-displacing polymerase. Recombinases are capable of pairing oligonucleotide primers with homologous sequence in duplex DNA. SSBs bind to displaced strands of DNA and prevent the primers from being displaced. Finally, the strand-displacing polymerase begins DNA synthesis where the primer has bound to the target DNA. By using two opposing primers, much like PCR, if the target sequence is indeed present, an exponential DNA amplification reaction is initiated. No other sample manipulation such as thermal or chemical melting is required to initiate amplification. At optimal temperatures (e.g., 37-42° C.), the RPA reaction progresses rapidly and results in specific DNA amplification from just a few target copies to detectable levels, typically within 10 minutes, for rapid detection of the target nucleic acid. In some embodiments of any of the aspects, the single-stranded DNA-binding protein is a gp32 SSB protein. In some embodiments of any of the aspects, the recombinase is a uvsX recombinase. See e.g., U.S. Pat. No. 7,666,598, the content of which is incorporated herein by reference in its entirety. In some embodiments of any of the aspects, RPA can also be referred to as Recombinase Aided Amplification (RAA). Accordingly, in some embodiments of any of the aspects, the amplification step comprises contacting the pooled reverse transcription product mixture from step (b) with a recombinase and single-stranded DNA binding protein.
In some embodiments of any of the aspects, the amplification step(s) comprises contacting the pooled reverse transcription product mixture from step (b) with a DNA polymerase, a second set of primers, a recombinase, and single-stranded DNA binding protein.
In some embodiments of any of the aspects, the isothermal amplification reaction is Loop Mediated Isothermal Amplification (LAMP). LAMP is a single tube technique for the amplification of DNA; LAMP uses 4-6 primers, which form loop structures to facilitate subsequent rounds of amplification. Accordingly, in some embodiments of any of the aspects, the amplification step(s) comprises contacting the pooled reverse transcription product mixture from step (b) with a DNA polymerase and a set of primers, wherein the set of primers comprises 4, 5, or 6 loop-forming primers.
In some embodiments of any of the aspects, prior to step (c) (the amplification step) the first set of barcoded primers is substantially removed, e.g., from the pooled reverse transcription product mixture. In some embodiments of any of the aspects, prior to step (c) the target RNA is substantially removed, e.g., from the pooled reverse transcription product mixture. In some embodiments of any of the aspects, prior to step (c) the sample (e.g., the patient sample; e.g., the viral sample) is substantially removed, e.g., from the pooled reverse transcription product mixture. In some embodiments of any of the aspects, prior to step (c) the first set of barcoded primers, the RNA target, and/or the sample is substantially removed using a bead-based purification method. In some embodiments of any of the aspects, prior to step (c) the first set of barcoded primers, the RNA target, and/or the sample is substantially removed using a spin-column-based purification method.
Spin column-based nucleic acid purification is a solid phase extraction method to quickly purify nucleic acids. This method relies on the fact that nucleic acid will bind to the solid phase of silica under certain conditions. Magnetic bead/particle-based purification methods also employ a bind-wash-elute process. However, instead of using centrifugation or vacuum manifolds to remove the aqueous phase from contact with the silica matrix, these workflows use magnetic beads or particles functionalized with silica surfaces to allow selective binding of DNA in the presence of high concentrations of salt. DNA bound to a magnetic bead can be easily separated from the aqueous phase using a magnet; thereby allowing rapid sample processing and fine control of solution volumes. Magnetic-based methods are ideal for automation of high throughput processing, as they eliminate the need for centrifugation and other time-consuming steps.
In some embodiments of any of the aspects, the DNA polymerase used in the amplification step is a DNA-dependent DNA polymerase. DNA polymerases catalyze the synthesis of DNA molecules from nucleoside triphosphates, the molecular precursors of DNA, using a DNA or cDNA template. In some embodiments of any of the aspects, the DNA polymerase is a thermostable DNA polymerase, e.g., capable of withstanding (i.e., not irreversibly denaturing at) the high temperatures used in the amplification step. In some embodiments of any of the aspects, the DNA polymerase is a thermostable DNA polymerase I. DNA polymerase I (Pol I) is a prokaryotic polymerase, which is encoded by the polA gene and ubiquitous among prokaryotes. This repair polymerase is involved in excision repair with both 3′-5′ and 5′-3′ exonuclease activity and processing of Okazaki fragments generated during lagging strand synthesis. Pol I is the most abundant polymerase in most prokaryotes.
Non-limiting examples of thermostable DNA polymerases include: Taq DNA polymerase from Thermus aquaticus; AmpliTaq™ Gold from Thermus aquaticus; HotTub™ from Thermus flavus; rTth from Thermus thermophilus; DNA polymerase from Thermotoga maritima (Ultma); Pwo DNA polymerase (Pyrococcus woesei); Tfl DNA polymerase (Thermus flavus); Tli DNA polymerase (Thermococcus litoralis); see e.g., Al-Soud et al., Appl Environ Microbiol. 1998 Oct; 64(10): 3748-3753. In some embodiments of any of the aspects, the DNA polymerase is a Thermus aquaticus (Taq) DNA polymerase or variant thereof (see e.g., SEQ ID NO: 1007). Taq polymerase is a heat-stable enzyme of this family that lacks proofreading ability. In some embodiments of any of the aspects, the DNA polymerase comprises SEQ ID NO: 1007 or an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 1007 that maintains the same function (e.g., DNA-dependent DNA polymerase).
SEQ ID NO: 1007, DNA polymerase I, thermostable, polA, Thermus aquaticus, UniProtKB - P19821, 832 aa
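As a non-limiting illustration of a percent-identity threshold (e.g., at least 70% identical to a reference sequence), the following sketch computes identity over two sequences that have already been aligned. It assumes one particular convention, identical positions divided by the number of alignment columns, which is only one of several conventions in use and is not stated to be the convention of the disclosure. The toy sequences are hypothetical and are not taken from SEQ ID NO: 1007.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are pre-aligned and of equal length; columns in which
    both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the two aligned sequences are at least threshold_percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    reference = "ACDEFGHIKL"  # hypothetical toy sequence
    variant = "ACDE-GHIRL"    # one gap and one substitution relative to the toy reference
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, 70.0))
```

In practice an alignment tool would produce the aligned strings; that step is outside the scope of this sketch.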
In some embodiments of any of the aspects, the DNA polymerase is provided (i.e., added to the reaction mixture) at a sufficient concentration to promote polymerization, e.g., 0.1 U/µL to 100 U/µL. As used herein, one unit (“U”) of DNA polymerase (e.g., Taq) is defined as the amount of enzyme that incorporates 10 nmol of total deoxyribonucleoside triphosphates into acid precipitable DNA within 60 min at +65° C. In some embodiments of any of the aspects, the DNA polymerase is provided at a concentration of at least 0.1 U/µL, at least 0.2 U/µL, at least 0.3 U/µL, at least 0.4 U/µL, at least 0.5 U/µL, at least 0.6 U/µL, at least 0.7 U/µL, at least 0.8 U/µL, at least 0.9 U/µL, at least 1 U/µL, at least 2 U/µL, at least 3 U/µL, at least 4 U/µL, at least 5 U/µL, at least 6 U/µL, at least 7 U/µL, at least 8 U/µL, at least 9 U/µL, at least 10 U/µL, at least 20 U/µL, at least 30 U/µL, at least 40 U/µL, at least 50 U/µL, at least 60 U/µL, at least 70 U/µL, at least 80 U/µL, at least 90 U/µL, at least 100 U/µL or more.
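For example, the volume of enzyme stock needed to reach a chosen final concentration in U/µL follows from a simple dilution relationship (final concentration × reaction volume = stock concentration × stock volume). The sketch below is illustrative only; the stock concentration and reaction volume are hypothetical inputs, and the function name `polymerase_stock_volume_ul` is chosen for this example.

```python
def polymerase_stock_volume_ul(final_u_per_ul, reaction_volume_ul, stock_u_per_ul):
    """Volume of enzyme stock (µL) giving final_u_per_ul in the full reaction volume."""
    if final_u_per_ul <= 0 or reaction_volume_ul <= 0 or stock_u_per_ul <= 0:
        raise ValueError("all quantities must be positive")
    if stock_u_per_ul < final_u_per_ul:
        raise ValueError("stock must be at least as concentrated as the target")
    total_units = final_u_per_ul * reaction_volume_ul  # units required in the reaction
    return total_units / stock_u_per_ul


if __name__ == "__main__":
    # Hypothetical case: 0.1 U/µL final in a 50 µL reaction from a 5 U/µL stock.
    print(polymerase_stock_volume_ul(0.1, 50.0, 5.0))
```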
In some embodiments of any of the aspects, the sample is contacted with a second set of primers (i.e., after the first set of RT primers). In some embodiments of any of the aspects, the second set of primers is specific to the target RNA. In some embodiments of any of the aspects, the second set of primers is specific (i.e., binds specifically through complementarity) to cDNA, in other words, the DNA produced in the RT step that is complementary to the target RNA. The second set of primers can be specific to any region of the target RNA. In some embodiments of any of the aspects, the second set of primers comprises at least one barcode region. In some embodiments of any of the aspects, the second set of primers comprises 1, 2, 3, 4, 5, or more barcode regions.
In some embodiments, a forward primer, e.g., in the second set of primers is about 50 nucleotides long. In some embodiments, a reverse primer, e.g., in the second set of primers is about 80 nucleotides long. In some embodiments, a primer, e.g., in the second set of primers is about 40-100 nucleotides long. As a non-limiting example, the primer is 40 nucleotides (nt) long, 41 nt, 42 nt, 43 nt, 44 nt, 45 nt, 46 nt, 47 nt, 48 nt, 49 nt, 50 nt, 51 nt, 52 nt, 53 nt, 54 nt, 55 nt, 56 nt, 57 nt, 58 nt, 59 nt, 60 nt, 61 nt, 62 nt, 63 nt, 64 nt, 65 nt, 66 nt, 67 nt, 68 nt, 69 nt, 70 nt, 71 nt, 72 nt, 73 nt, 74 nt, 75 nt, 76 nt, 77 nt, 78 nt, 79 nt, 80 nt, 81 nt, 82 nt, 83 nt, 84 nt, 85 nt, 86 nt, 87 nt, 88 nt, 89 nt, 90 nt, 91 nt, 92 nt, 93 nt, 94 nt, 95 nt, 96 nt, 97 nt, 98 nt, 99 nt, 100 nt or more. In some embodiments of any of the aspects, at least one primer, e.g., from the second set of primers, comprises sequences selected from Table 4. In some embodiments of any of the aspects, the second set of primers comprises forward and reverse amplification primers.
In some embodiments of any of the aspects, a forward primer in the second set of primers comprises from 5′ to 3′: (a) an adaptor region; and (b) an adaptor-binding region that is identical or substantially identical to the adaptor region of a primer in the first set of barcoded primers. In some embodiments of any of the aspects, a forward primer in the second set of primers comprises from 5′ to 3′: (a) an adaptor region; (b) a third barcode region; and (c) an adaptor-binding region that is identical or substantially identical to the adaptor region of a primer in the first set of barcoded primers.
In some embodiments of any of the aspects, the adaptor region, e.g., of a forward primer in the second set of primers, comprises a sequencing adaptor region that allows for a high throughput sequencing method (e.g., P5 adaptor or P7 adaptor). In some embodiments of any of the aspects, the adaptor-binding region, e.g., of a forward primer in the second set of primers, specifically binds to the reverse complement of the adaptor region (e.g., PCR adaptor) of a primer in the first set of primers. In some embodiments of any of the aspects, the PCR adaptor-binding region, e.g., of a forward primer in the second set of primers, comprises SEQ ID NO: 13. In some embodiments of any of the aspects, a forward primer in the second set of primers, e.g., comprising the adaptor region and the adaptor-binding region, comprises SEQ ID NO: 14 or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 14 that maintains the same function (e.g., amplification adaptor and/or sequencing adaptor). In some embodiments of any of the aspects, a forward primer in the second set of primers allows the amplification product to specifically bind to a sequencing primer (e.g., read 1 primer, SEQ ID NO: 15).
In some embodiments of any of the aspects, a reverse primer in the second set of primers comprises, from 5′ to 3′: (a) an adaptor region; (b) a second barcode region; and (c) a target-binding region that is identical or substantially identical to at least one target RNA. In some embodiments of any of the aspects, a reverse primer in the second set of primers comprises, from 5′ to 3′: (a) an adaptor region; and (b) a region that is identical or substantially identical to at least one target RNA. In some embodiments of any of the aspects, the adaptor region, e.g., of a reverse primer in the second set of primers, comprises a sequencing adaptor region that allows for a high throughput sequencing method (e.g., P7 adaptor or P5 adaptor).
In some embodiments of any of the aspects, the adaptor region, e.g., of a reverse primer in the second set of primers, comprises SEQ ID NO: 16 or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 16 that maintains the same function (e.g., sequencing adaptor). In some embodiments of any of the aspects, a reverse primer in the second set of primers allows the amplification product to specifically bind to a sequencing primer (e.g., read 2 primer, SEQ ID NO: 17).
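As a minimal sketch, the 5′ to 3′ layout of a reverse primer described above (adaptor region, optional barcode region, target-binding region) can be expressed as simple string concatenation. The function names and all three example sequences below are hypothetical placeholders chosen for illustration; they are not SEQ ID NOs from the Sequence Listing.

```python
# Sketch only: placeholder sequences, NOT sequences from the Sequence Listing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_reverse_primer(adaptor: str, target_binding: str,
                         barcode: str = "",
                         barcode_as_revcomp: bool = False) -> str:
    """Concatenate, 5'->3': adaptor, optional barcode, target-binding region."""
    if barcode and barcode_as_revcomp:
        barcode = reverse_complement(barcode)
    return adaptor + barcode + target_binding


# Hypothetical example regions (assumptions made for this sketch).
EXAMPLE_ADAPTOR = "GATTACAGATTACAGATTAC"   # 20 nt placeholder adaptor
EXAMPLE_BARCODE = "AACCGGTTAC"             # 10 nt placeholder barcode
EXAMPLE_TARGET = "CCATGGTTAACCGGAATTCC"    # 20 nt placeholder target-binding region

example_primer = build_reverse_primer(
    EXAMPLE_ADAPTOR, EXAMPLE_TARGET, EXAMPLE_BARCODE, barcode_as_revcomp=True
)

if __name__ == "__main__":
    print(len(example_primer), example_primer)
```

Passing `barcode_as_revcomp=True` mirrors embodiments in which the barcode is carried as its reverse complement; omitting the barcode argument yields the adaptor plus target-binding layout.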
In some embodiments of any of the aspects, a barcode region on a primer in the second set of primers is shorter than the barcode region on a primer in the first set of primers. In some embodiments of any of the aspects, a barcode region on a primer in the second set of primers is at least 8 nucleotides long. As a non-limiting example, the barcode region can be 10 nucleotides (nt) long, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt long or more. In some embodiments of any of the aspects, the barcode region of a first primer in the second set of barcoded primers is a Hamming distance of at least 5 from each other barcode region of any other primer in the second set of barcoded primers. In some embodiments of any of the aspects, the barcode region of a first primer in the second set of barcoded primers is a Hamming distance of 4-6 from each other barcode region of any other primer in the second set of barcoded primers. In some embodiments of any of the aspects, the barcode region of a first primer in the second set of barcoded primers is a Hamming distance of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10, or more from each other barcode region of any other primer in the second set of barcoded primers (or barcode region in a first, third, fourth, etc. set of barcoded primers).
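The Hamming-distance criterion above can be checked programmatically. The following is a minimal sketch, assuming equal-length barcodes; the function names and the example barcodes are hypothetical and are not taken from SEQ ID NOs: 18-989.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes):
    """Smallest Hamming distance over all pairs in a barcode set."""
    return min(hamming_distance(a, b) for a, b in combinations(barcodes, 2))


def violating_pairs(barcodes, min_distance):
    """Pairs of barcodes that are closer than the required minimum distance."""
    return [(a, b) for a, b in combinations(barcodes, 2)
            if hamming_distance(a, b) < min_distance]


# Hypothetical 10-nt barcodes used only to exercise the functions.
EXAMPLE_BARCODES = ["AAAAAAAAAA", "AAAAACCCCC", "CCCCCAAAAA", "GGGGGGGGGG"]

if __name__ == "__main__":
    print(min_pairwise_hamming(EXAMPLE_BARCODES))
    print(violating_pairs(EXAMPLE_BARCODES + ["AAAAAAAACC"], 5))
```

A set passes, e.g., the "at least 5" embodiment when `violating_pairs(barcodes, 5)` returns an empty list.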
In some embodiments of any of the aspects, the second or third barcode region on a primer in the second set of primers comprises one of SEQ ID NOs: 18-989 or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 18-989 that maintains the same function (e.g., identification). In some embodiments of any of the aspects, the first barcode region on the first primer or set of first primers comprises one of SEQ ID NOs: 18-29 or SEQ ID NO: 992 (see e.g., Table 4 or
In some embodiments of any of the aspects, a target-binding region is complementary or substantially complementary to and permits hybridization to at least one target RNA. In some embodiments of any of the aspects, the target-binding region permits hybridization to at least one target RNA under conditions permitting the generation of a reverse transcription product. In some embodiments of any of the aspects, the target-binding region, e.g., of a primer in the second set of primers, is about 20 nucleotides long. In some embodiments, the target-binding region, e.g., of a primer in the second set of primers, is about 15-35 nucleotides long. As a non-limiting example, the target-binding region can be 15 nucleotides (nt) long, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt long or more. In some embodiments, the target-binding region, e.g., of a primer in the second set of primers, has a Tm of about 60° C.-62° C., e.g., at least 60° C., at least 60.5° C., at least 61° C., at least 61.5° C., at least 62° C. or more.
In some embodiments of any of the aspects, the target-binding region of a primer in the second set of primers binds to a region of SARS-CoV-2 N gene or S gene (see e.g., SEQ ID NOs: 1001-1002). In some embodiments of any of the aspects, the target-binding region of a primer in the second set of primers comprises one of SEQ ID NO: 4 (N#1_PCR), SEQ ID NO: 6 (N#2_PCR), SEQ ID NO: 8 (del6970_PCR), SEQ ID NO: 10 (D614_PCR), SEQ ID NO: 12 (positive control PCR) or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 4, 6, 8, 10, or 12 that maintains the same function (e.g., binding to the target RNA or positive control RNA) (see e.g., Table 4).
In some embodiments of any of the aspects, the reverse primer in the second set of primers comprises, from 5′ to 3′: (a) an adaptor region (e.g., SEQ ID NO: 16); (b) optionally, a second barcode region (e.g., one of SEQ ID NOs: 18-29 or SEQ ID NO: 992 or the reverse complement thereof); and (c) a target-binding region that is identical or substantially identical to at least one target RNA (e.g., one of SEQ ID NOs: 4, 6, 8, 10, or 12). SEQ ID NO: 1008 is an exemplary reverse primer from the second set of primers, comprising from 5′ to 3′: SEQ ID NO: 16 (bolded), the reverse complement of SEQ ID NO: 992, and SEQ ID NO: 4 (bold italicized).
SEQ ID NO: 1008, 85 nt (see e.g.,
In some embodiments of any of the aspects, the forward and/or reverse primers of the second set of primers are present in the amplification reaction at a concentration of at least 0.125 µM. In some embodiments of any of the aspects, the forward and/or reverse primers of the second set of primers are present in the amplification reaction at a concentration of at least 0.25 µM. In some embodiments of any of the aspects, the forward and/or reverse primers of the second set of primers are present in the amplification reaction at a concentration of at least 0.5 µM. In some embodiments of any of the aspects, the forward and/or reverse primers of the second set of primers are present in the amplification reaction at a concentration of at least 25 nM, at least 30 nM, at least 35 nM, at least 40 nM, at least 45 nM, at least 50 nM, at least 55 nM, at least 60 nM, at least 65 nM, at least 70 nM, at least 75 nM, at least 80 nM, at least 85 nM, at least 90 nM, at least 95 nM, at least 100 nM, at least 105 nM, at least 110 nM, at least 115 nM, at least 120 nM, at least 125 nM, at least 130 nM, at least 135 nM, at least 140 nM, at least 145 nM, at least 150 nM, at least 160 nM, at least 170 nM, at least 180 nM, at least 190 nM, at least 200 nM, at least 210 nM, at least 220 nM, at least 230 nM, at least 240 nM, at least 250 nM, at least 260 nM, at least 270 nM, at least 280 nM, at least 290 nM, at least 300 nM, at least 310 nM, at least 320 nM, at least 330 nM, at least 340 nM, at least 350 nM, at least 360 nM, at least 370 nM, at least 380 nM, at least 390 nM, at least 400 nM, at least 410 nM, at least 420 nM, at least 430 nM, at least 440 nM, at least 450 nM, at least 460 nM, at least 470 nM, at least 480 nM, at least 490 nM, at least 500 nM.
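As a non-limiting worked example of the concentrations recited above, the volume of primer stock needed to reach a given final concentration follows from C1·V1 = C2·V2. The sketch below assumes a hypothetical 10 µM primer stock and a hypothetical 20 µL reaction volume; neither value is specified in this section, and the function name is illustrative only.

```python
def stock_volume_ul(final_conc_nm: float, reaction_volume_ul: float,
                    stock_conc_um: float) -> float:
    """Volume of stock (in uL) giving final_conc_nm in reaction_volume_ul.

    Uses C1 * V1 = C2 * V2 after converting the stock from uM to nM.
    """
    stock_conc_nm = stock_conc_um * 1000.0  # 1 uM = 1000 nM
    if final_conc_nm > stock_conc_nm:
        raise ValueError("final concentration exceeds stock concentration")
    return final_conc_nm * reaction_volume_ul / stock_conc_nm


if __name__ == "__main__":
    # Hypothetical setup: 10 uM stock, 20 uL reaction.
    for final_nm in (125.0, 250.0, 500.0):
        print(final_nm, "nM ->", stock_volume_ul(final_nm, 20.0, 10.0), "uL")
```

Under these assumed values, a final concentration of 250 nM (0.25 µM) corresponds to 0.5 µL of stock per reaction.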
In some embodiments of any of the aspects, specific combinations of primers in the first and second set of primers are used for the reverse transcription and amplification reactions. In some embodiments of any of the aspects, the same set of sequencing primers (i.e., the third set of primers) can be used for sequencing the amplification products (see e.g., Table 15).
For the RT primer and RV PCR primers, the SEQ ID NOs correspond to the target-binding regions of the specific primers; as described herein, the full primers can also comprise adaptor regions and/or barcode regions. For the FW PCR primer and sequencing primers, the SEQ ID NOs correspond to the full-length primer, or a portion thereof.
Described herein are protector nucleic acids (or simply “protectors”) that are capable of reducing barcode crosstalk. Such barcode crosstalk can arise due to binding of primers from the first set of primers (i.e., RT primers) to amplification products of the RT product during the amplification step. As used herein, the term “protector nucleic acid” denotes a single-stranded nucleic acid that hybridizes to a region of an amplification product of the reverse transcription product (or RT primers) and prevents extension of the RT primer during the amplification step. Specifically, the protector nucleic acid can hybridize to an amplification product that is identical to, or the same sense as, the target RNA, and that comprises a region that is complementary to the target-binding region of an RT primer from the first set of primers. In some embodiments of any of the aspects, the protector nucleic acid can be DNA, RNA, modified DNA, modified RNA, synthetic DNA, synthetic RNA, or another synthetic nucleic acid.
In some embodiments of any of the aspects, step (c) (amplification step) further comprises adding a protector nucleic acid to the amplification reaction mixture. In this way, the amplification reaction of step (c) comprises contacting the reverse transcription product (or pooled reverse transcription product mixture or amplification product thereof) with at least one protector nucleic acid (see e.g., upper panel of
In some embodiments of any of the aspects, region (a)(ii) of the protector nucleic acid (also known as the “toe-hold region” or “3′ complementary region”) is at least 15 nucleotides long. In some embodiments of any of the aspects, the 3′ complementary region of the protector nucleic acid is at most 30 nucleotides long. In some embodiments of any of the aspects, the 3′ complementary region of the protector nucleic acid is at least 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, 36 nt, 37 nt, 38 nt, 39 nt, 40 nt, 41 nt, 42 nt, 43 nt, 44 nt, 45 nt, 46 nt, 47 nt, 48 nt, 49 nt, 50 nt or more long.
In some embodiments of any of the aspects, an amplification product of the reverse transcription product comprises one of SEQ ID NOs: 1009-1012 or a nucleic acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 1009-1012 that maintains the same function (RNA target region).
SEQ ID NO: 1009, N#1 target amplification product (showing only the RNA target region, e.g., nt 131-197 of SEQ ID NO: 1001); bolded text indicates where the target-binding region of the RT primer (e.g., SEQ ID NO: 3) binds (nt 49-67 of SEQ ID NO: 1009); double-underlined text indicates where an exemplary protector nucleic acid binds (e.g., SEQ ID NO: 1013); the N#1 reverse transcription product corresponds to the reverse complement of SEQ ID NO: 1009.
AAGGAAGACCTTAAATT
SEQ ID NO: 1010, N#2 target amplification product (showing only the RNA target region, e.g., nt 876-1002 of SEQ ID NO: 1001); bolded text indicates where the target-binding region of the RT primer (e.g., SEQ ID NO: 5) binds (nt 111-127 of SEQ ID NO: 1010); double-underlined text indicates where an exemplary protector nucleic acid binds (e.g., SEQ ID NO: 1014); the N#2 reverse transcription product corresponds to the reverse complement of SEQ ID NO: 1010.
CCTTCGGGAACGTGGTTGACCTACACA
SEQ ID NO: 1011, del6970 target amplification product (showing only the RNA target region, e.g., nt 163-233 of SEQ ID NO: 1002); bolded text indicates where the target-binding region of the RT primer (e.g., SEQ ID NO: 7) binds (nt 53-71 of SEQ ID NO: 1011); double-underlined text indicates where an exemplary protector nucleic acid binds (e.g., SEQ ID NO: 1015); the del6970 reverse transcription product corresponds to the reverse complement of SEQ ID NO: 1011.
TGGGACCAATGGTACTAAGAG
SEQ ID NO: 1012, D614 target amplification product (showing only the RNA target region, e.g., nt 1785-1861 of SEQ ID NO: 1002); bolded text indicates where the target-binding region of the RT primer (e.g., SEQ ID NO: 9) binds (nt 59-77 of SEQ ID NO: 1012); double-underlined text indicates where an exemplary protector nucleic acid binds (e.g., SEQ ID NO: 1016); the D614 reverse transcription product corresponds to the reverse complement of SEQ ID NO: 1012.
ATCAGGATGTTAACTGCACAGAAGTCC
In some embodiments of any of the aspects, the protector nucleic acid is complementary or substantially complementary to a region of at least one of SEQ ID NOs: 1009-1012. In some embodiments of any of the aspects, the protector nucleic acid is complementary or substantially complementary to a 3′ region of at least one of SEQ ID NOs: 1009-1012. In some embodiments of any of the aspects, the protector nucleic acid is complementary or substantially complementary to a region of at least one of SEQ ID NOs: 1009-1012 that overlaps with the region bound by the target-binding region of an RT primer (e.g., the bolded regions of SEQ ID NOs: 1009-1012).
SEQ ID NOs: 1021-1024 represent exemplary protector nucleic acids comprising: (i) a 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., one of SEQ ID NOs: 3, 5, 7, 9); and (ii) a 30-nt-long 3′ region (i.e., toe-hold region) that is complementary to the target RNA sequence downstream of the target-binding region of the primer in the first set of primers (e.g., one of SEQ ID NOs: 3, 5, 7, 9) on the reverse transcription product.
SEQ ID NO: 1021, exemplary protector nucleic acid for the N#1 reverse transcription product (e.g., SEQ ID NO: 1009); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 3) and unformatted text is the 30-nt-long toehold region: AATTTAAGGTCTTCCTTGCCATGTTGAGTGAGAGCGGTGAACCAAGACG
SEQ ID NO: 1022, exemplary protector nucleic acid for the N#2 reverse transcription product (e.g., SEQ ID NO: 1010); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 5) and unformatted text is the 30-nt-long toehold region: TGTGTAGGTCAACCACGTTCCCGAAGGTGTGACTTCCATGCCAATGC
SEQ ID NO: 1023, exemplary protector nucleic acid for the del6970 reverse transcription product (e.g., SEQ ID NO: 1011); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 7) and unformatted text is the 30-nt-long toehold region: CTCTTAGTACCATTGGTCCCAGAGACATGTATAGCATGGAACCAAGTAA
SEQ ID NO: 1024, exemplary protector nucleic acid for the D614 reverse transcription product (e.g., SEQ ID NO: 1012); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 9) and unformatted text is the 30-nt-long toehold region: GGACTTCTGTGCAGTTAACATCCTGATAAAGAACAGCAACCTGGTTAGA
SEQ ID NOs: 1013-1016 represent exemplary protector nucleic acids comprising: (i) a 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., one of SEQ ID NOs: 3, 5, 7, 9); and (ii) a 20-nt-long 3′ region (i.e., toe-hold region) that is complementary to the target RNA sequence downstream of the target-binding region of the primer in the first set of primers (e.g., one of SEQ ID NOs: 3, 5, 7, 9) on the reverse transcription product.
SEQ ID NO: 1013, exemplary protector nucleic acid for the N#1 reverse transcription product (e.g., SEQ ID NO: 1009); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 3) and unformatted text is the 20-nt-long toehold region: AATTTAAGGTCTTCCTTGCCATGTTGAGTGAGAGCGGTG
SEQ ID NO: 1014, exemplary protector nucleic acid for the N#2 reverse transcription product (e.g., SEQ ID NO: 1010); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 5) and unformatted text is the 20-nt-long toehold region: TGTGTAGGTCAACCACGTTCCCGAAGGTGTGACTTCC
SEQ ID NO: 1015, exemplary protector nucleic acid for the del6970 reverse transcription product (e.g., SEQ ID NO: 1011); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 7) and unformatted text is the 20-nt-long toehold region: CTCTTAGTACCATTGGTCCCAGAGACATGTATAGCATGG
SEQ ID NO: 1016, exemplary protector nucleic acid for the D614 reverse transcription product (e.g., SEQ ID NO: 1012); bolded region indicates 5′ region that is identical to the target-binding region of a primer in the first set of primers (e.g., SEQ ID NO: 9) and unformatted text is the 20-nt-long toehold region: GGACTTCTGTGCAGTTAACATCCTGATAAAGAACAGCAA
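The relationship between the exemplary protector nucleic acids and the amplification products can be illustrated with a short consistency check. The sketch below uses the protector sequences recited above for SEQ ID NOs: 1013-1016 and 1021-1024, together with the sequence fragments printed in this text for SEQ ID NOs: 1009-1012 (which may be only partial excerpts of the full sequences in the Sequence Listing). The dictionary and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Fragments as printed in the text for SEQ ID NOs: 1009-1012 (possibly partial).
PRINTED_PRODUCT_FRAGMENTS = {
    "N#1": "AAGGAAGACCTTAAATT",
    "N#2": "CCTTCGGGAACGTGGTTGACCTACACA",
    "del6970": "TGGGACCAATGGTACTAAGAG",
    "D614": "ATCAGGATGTTAACTGCACAGAAGTCC",
}

# Protectors with a 20-nt toehold region (SEQ ID NOs: 1013-1016).
PROTECTORS_20NT = {
    "N#1": "AATTTAAGGTCTTCCTTGCCATGTTGAGTGAGAGCGGTG",
    "N#2": "TGTGTAGGTCAACCACGTTCCCGAAGGTGTGACTTCC",
    "del6970": "CTCTTAGTACCATTGGTCCCAGAGACATGTATAGCATGG",
    "D614": "GGACTTCTGTGCAGTTAACATCCTGATAAAGAACAGCAA",
}

# Protectors with a 30-nt toehold region (SEQ ID NOs: 1021-1024).
PROTECTORS_30NT = {
    "N#1": "AATTTAAGGTCTTCCTTGCCATGTTGAGTGAGAGCGGTGAACCAAGACG",
    "N#2": "TGTGTAGGTCAACCACGTTCCCGAAGGTGTGACTTCCATGCCAATGC",
    "del6970": "CTCTTAGTACCATTGGTCCCAGAGACATGTATAGCATGGAACCAAGTAA",
    "D614": "GGACTTCTGTGCAGTTAACATCCTGATAAAGAACAGCAACCTGGTTAGA",
}


def check_protectors():
    """Per target: does the protector pair with the printed product fragment,
    and does the 30-nt-toehold protector extend the 20-nt-toehold one?"""
    results = {}
    for name, fragment in PRINTED_PRODUCT_FRAGMENTS.items():
        short, long_ = PROTECTORS_20NT[name], PROTECTORS_30NT[name]
        results[name] = {
            "pairs_with_fragment": short.startswith(reverse_complement(fragment)),
            "long_extends_short": long_.startswith(short),
            "extra_toehold_nt": len(long_) - len(short),
        }
    return results


if __name__ == "__main__":
    for target, result in check_protectors().items():
        print(target, result)
```

Each protector begins with the reverse complement of the printed fragment of its amplification product, and each 30-nt-toehold protector is the corresponding 20-nt-toehold protector extended by ten nucleotides at its 3′ end.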
In some embodiments of any of the aspects, the protector nucleic acid comprises one of SEQ ID NOs: 1013-1016 or SEQ ID NOs: 1021-1024 or functional fragment thereof or a nucleic acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 1013-1016 or SEQ ID NOs: 1021-1024 that maintains the same function (e.g., protector nucleic acid, reduction of barcode crosstalk during amplification step).
In some embodiments of any of the aspects, the protector nucleic acid comprises a nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase. In some embodiments of any of the aspects, the protector nucleic acid comprises a 3′ nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase. In some embodiments of any of the aspects, the 3′ nucleic acid modification is selected from the group consisting of: (a) an inverted base; (b) a spacer; (c) a dideoxynucleotide; (d) a base that is not complementary to the target RNA; and (e) a non-canonical base.
In some embodiments of any of the aspects, the nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase is an inverted nucleotide. As used herein, the term “inverted nucleotide” refers to a nucleotide that is inserted by a DNA polymerase inverted onto a DNA molecule; e.g., the 3′ OH group is used for polymerization, as opposed to the 5′ OH group. In some embodiments of any of the aspects, the inverted nucleotide is an inverted dT, inverted dA, inverted dG, or inverted dC. In some embodiments of any of the aspects, the inverted nucleotide is a 3′ Inverted dT. Inverted dT can be incorporated at the 3′-end of the protector nucleic acid, leading to a 3′-3′ linkage which inhibits both degradation by 3′ exonucleases and extension by DNA polymerases.
In some embodiments of any of the aspects, the nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase is a spacer. In some embodiments of any of the aspects, the spacer is located at an internal location of one or both primers. Non-limiting examples of spacers include the C3 spacer (phosphoramidite); hexanediol; 1′,2′-Dideoxyribose (dSpacer; e.g., an abasic site); Spacer 9 (a triethylene glycol spacer); and Spacer 18 (an 18-atom hexa-ethyleneglycol spacer).
In some embodiments of any of the aspects, the nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase is a dideoxynucleotide. Dideoxynucleotides are chain-elongation inhibitors of DNA polymerase, e.g., as used in the Sanger method for DNA sequencing. A dideoxynucleotide, when attached or incorporated at the 3′ end of an oligonucleotide or a growing strand, does not present a substrate for elongation by DNA polymerase. Dideoxynucleotides are also known as 2′,3′-dideoxynucleotides because both the 2′ and 3′ positions on the ribose lack hydroxyl groups, and are abbreviated as ddNTPs (ddGTP, ddATP, ddTTP and ddCTP). In some embodiments of any of the aspects, the nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase is selected from the group consisting of ddGTP, ddATP, ddTTP and ddCTP.
In some embodiments of any of the aspects, the nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase is a base that is not complementary to the target RNA. As a non-limiting example, A-T and G-C represent proper base-pairing; as such, non-limiting examples of non-complementary base-pairing include: A-G, A-C, A-A, G-T, G-G, T-T, T-C, T-G, C-C, C-T, or C-A. If the final 3′ nucleotide of an oligonucleotide is not complementary to the template, it cannot be extended.
In some embodiments of any of the aspects, the nucleic acid modification capable of inhibiting synthesis of a complementary strand by a polymerase is a non-canonical base. In some embodiments of any of the aspects, the non-canonical base is isocytosine (iso-dC). In some embodiments of any of the aspects, the non-canonical base is isoguanosine (iso-dG).
In some embodiments of any of the aspects, the protector nucleic acid displaces a primer from the first set of primers from an amplification product of the reverse transcription product. In some embodiments of any of the aspects, the protector nucleic acid inhibits or substantially inhibits a primer from the first set of primers from being extended by the DNA polymerase. In some embodiments of any of the aspects, the protector nucleic acid has a higher binding affinity to an amplification product of the reverse transcription product than the target-binding region of the at least one primer from the first set of primers.
In some embodiments of any of the aspects, the protector nucleic acid has a higher Tm than the target-binding region of the at least one primer from the first set of primers. In some embodiments of any of the aspects, the protector nucleic acid has a Tm that is at least 1° C., at least 2° C., at least 3° C., at least 4° C., at least 5° C., at least 6° C., at least 7° C., at least 8° C., at least 9° C., at least 10° C., at least 11° C., at least 12° C., at least 13° C., at least 14° C., at least 15° C., at least 16° C., at least 17° C., at least 18° C., at least 19° C., or at least 20° C. higher than the target-binding region of the at least one primer from the first set of primers.
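As a rough illustration of the Tm relationship above, the following sketch compares an estimated Tm for an exemplary protector (SEQ ID NO: 1013) with that of its first 19 nucleotides, i.e., the 5′ region described as identical to the 19-nt target-binding region of the corresponding RT primer. The estimate uses a basic GC-content approximation, Tm ≈ 64.9 + 41 × (G + C − 16.4) / N, which is an assumption made for this example and is not a method described herein; nearest-neighbor thermodynamic models, salt concentration, and oligonucleotide concentration would change the absolute values.

```python
def estimate_tm_celsius(seq: str) -> float:
    """Basic GC-content Tm approximation for oligonucleotides (assumption):
    Tm = 64.9 + 41 * (G + C - 16.4) / N
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


# SEQ ID NO: 1013 as recited in the text (19-nt 5' region + 20-nt toehold).
PROTECTOR_N1 = "AATTTAAGGTCTTCCTTGCCATGTTGAGTGAGAGCGGTG"
# 5' region identical to the RT primer target-binding region (first 19 nt).
RT_TARGET_BINDING_N1 = PROTECTOR_N1[:19]

tm_protector = estimate_tm_celsius(PROTECTOR_N1)
tm_primer_region = estimate_tm_celsius(RT_TARGET_BINDING_N1)

if __name__ == "__main__":
    print(f"protector: {tm_protector:.1f} C")
    print(f"RT target-binding region: {tm_primer_region:.1f} C")
    print(f"difference: {tm_protector - tm_primer_region:.1f} C")
```

Under this approximation the longer protector has the higher estimated Tm, consistent with the protector outcompeting the RT primer for binding to the amplification product.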
In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting at least one primer from the first set of primers, e.g., if present in the amplification reaction, with a protector nucleic acid (see e.g., lower panel of
In some embodiments of any of the aspects, the protector nucleic acid is at least 15 nucleotides long. In some embodiments of any of the aspects, the protector nucleic acid is at least 30 nucleotides long. In some embodiments of any of the aspects, the protector nucleic acid is at least 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, 36 nt, 37 nt, 38 nt, 39 nt, 40 nt, 41 nt, 42 nt, 43 nt, 44 nt, 45 nt, 46 nt, 47 nt, 48 nt, 49 nt, 50 nt, 51 nt, 52 nt, 53 nt, 54 nt, 55 nt, 56 nt, 57 nt, 58 nt, 59 nt, 60 nt, 61 nt, 62 nt, 63 nt, 64 nt, 65 nt, 66 nt, 67 nt, 68 nt, 69 nt, 70 nt or more long.
In some embodiments of any of the aspects, the protector nucleic acid is present at a concentration that is greater than the concentration of the primers in the first set of primers. In some embodiments of any of the aspects, the protector nucleic acid is present at a concentration of at least 0.5 µM. In some embodiments of any of the aspects, the protector nucleic acid is present at a concentration of at least 2.0 µM. In some embodiments of any of the aspects, the protector nucleic acid is present, e.g., in the amplification reaction, at a concentration of at least 0.1 µM, at least 0.2 µM, at least 0.3 µM, at least 0.4 µM, at least 0.5 µM, at least 0.6 µM, at least 0.7 µM, at least 0.8 µM, at least 0.9 µM, at least 1 µM, at least 2 µM, at least 3 µM, at least 4 µM, at least 5 µM, at least 6 µM, at least 7 µM, at least 8 µM, at least 9 µM, at least 10 µM, or more.
In some embodiments of any of the aspects, prior to step (c) the first set of barcoded primers is substantially removed, for example, using a bead-based purification method or a spin-column-based purification method, and during step (c) the reverse transcription product or amplification product thereof is contacted with at least one protector nucleic acid.
In some embodiments of any of the aspects, step (c) comprises a nucleic acid amplification method. In some embodiments of any of the aspects, the amplification step comprises 35-50 rounds or cycles of amplification in which the DNA polymerase replicates the cDNA using forward and reverse primers in the second set of primers. In some embodiments of any of the aspects, the product of the amplification step comprises a barcoded dsDNA library, each comprising a region that is complementary to a portion of at least one target RNA.
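To give a sense of scale for the cycle numbers above, the sketch below computes the theoretical fold amplification after n cycles using the standard exponential model, fold = (1 + E)^n, where E is the per-cycle efficiency (E = 1 for perfect doubling). The function name and the 0.9 efficiency value are assumptions for illustration; real reactions plateau as primers and dNTPs are depleted, so these figures are upper bounds rather than expected yields.

```python
def theoretical_fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR model: each cycle multiplies copies by (1 + E)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (35, 40, 45, 50):
        ideal = theoretical_fold_amplification(n)
        reduced = theoretical_fold_amplification(n, efficiency=0.9)  # assumed value
        print(f"{n} cycles: ideal {ideal:.3e}-fold, E=0.9 {reduced:.3e}-fold")
```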
In some embodiments of any of the aspects, the amplification step comprises contacting the pooled reverse transcription product mixture with a DNA polymerase, a second set of primers, optionally at least one protector nucleic acid, and an amplification reaction buffer. In some embodiments of any of the aspects, the amplification step further comprises contacting the reverse transcription product with carrier nucleic acid, e.g., poly-A60 DNA oligonucleotide and/or E. coli tRNA. In some embodiments of any of the aspects, the carrier nucleic acid can be provided at a similar concentration as in the RT step.
In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting the reverse transcription product with Uracil-DNA Glycosylase (UDG or UNG) enzyme. UNG can be used to eliminate carryover polymerase chain reaction (PCR) products. This method modifies PCR products such that in a new reaction, any residual products from any previous PCR amplifications are digested and prevented from amplifying, but the true cDNA templates are unaffected. PCR synthesizes abundant amplification products each round, but contamination of further rounds of PCR with trace amounts of these products, called carry-over contamination (e.g., on surfaces of a laboratory), yields false positive results. Carry-over contamination from some previous PCR can be a significant problem, due both to the abundance of PCR products, and to the ideal structure of the contaminant material for re-amplification. In some embodiments, carry-over contamination can be controlled by the following two steps: (i) incorporating dUTP in all PCR products (e.g., by substituting dUTP for dTTP, either completely or partially, or by incorporating uracil during synthesis of primers); and (ii) treating all subsequent fully preassembled starting reactions with uracil DNA glycosylase (UDG), followed by thermal inactivation of UDG. UDG cleaves the uracil base from the phosphodiester backbone of uracil-containing DNA, but has no effect on natural (i.e., thymine-containing) DNA. The resulting apyrimidinic sites block replication by DNA polymerases, and are very labile to acid/base hydrolysis. Because UDG does not react with dTTP, and is also inactivated by heat denaturation prior to the actual PCR, carry-over contamination of PCRs can be controlled effectively if the contaminants contain uracils in place of thymines.
In some embodiments of any of the aspects, the amplification reaction buffer comprises dNTPs (e.g., dATP, dGTP, dCTP, and dTTP). In some embodiments of any of the aspects, the amplification reaction buffer comprises UNG and dNTPs (e.g., dATP, dGTP, dCTP, dUTP, and/or dTTP). In some embodiments of any of the aspects, the reaction buffer maintains the reaction at a specific optimal pH (e.g., 8.3) and can include such components as water, Tris-HCl, KCl, MgCl2, and other buffers or salts.
In some embodiments of any of the aspects, the amplification reaction buffer comprises a detectable marker, e.g., for the presence of amplification product, e.g., dsDNA. In some embodiments of any of the aspects, the amount of amplification product can be determined by quantitative PCR (qPCR) or real-time PCR methods, e.g., using a set of primers specific to the amplification product and/or SYBR® GREEN, or an equivalent dye, or a detectable probe. Methods of qPCR and real-time qPCR are known in the art.
In some embodiments of any of the aspects, step (c) (the amplification step) comprises: (i) a denaturation step; (ii) an annealing step; and (iii) an extension step. In some embodiments of any of the aspects, step (c) (e.g., the amplification step) is performed in a thermocycler. In some embodiments of any of the aspects, (i)-(iii) of the amplification (e.g., PCR) are repeated at least 30 times (e.g., 30-40 times). In some embodiments of any of the aspects, (i)-(iii) of the amplification (e.g., PCR) are repeated at least 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more times.
In some embodiments of any of the aspects, step (c) (the amplification step) further comprises an initial denaturation step, before the first step (i), at a temperature of at least 95° C. for at least 60 seconds. Such an initial denaturation step can denature the cDNA, the UNG enzyme, and/or the reverse transcriptase. In some embodiments of any of the aspects, the initial denaturation step is performed at a temperature of at least 90° C., at least 91° C., at least 92° C., at least 93° C., at least 94° C., at least 95° C., at least 96° C., at least 97° C., at least 98° C., at least 99° C., or at least 99.5° C. In some embodiments of any of the aspects, the initial denaturation step is performed for at least 10 seconds, at least 20 seconds, at least 30 seconds, at least 40 seconds, at least 50 seconds, at least 1 minute, at least 2 minutes, at least 3 minutes, at least 4 minutes, at least 5 minutes, at least 6 minutes, at least 7 minutes, at least 8 minutes, at least 9 minutes, at least 10 minutes or more.
In some embodiments of any of the aspects, step (i) of the amplification (e.g., the denaturation step) is performed at a temperature of at least 95° C. for at least 15 seconds (sec). In some embodiments of any of the aspects, step (i) of the amplification (e.g., the denaturation step) is performed at a temperature of at least 90° C., at least 91° C., at least 92° C., at least 93° C., at least 94° C., at least 95° C., at least 96° C., at least 97° C., at least 98° C., at least 99° C., or at least 99.5° C. In some embodiments of any of the aspects, step (i) of the amplification (e.g., the denaturation step) is performed for at least 5 sec, at least 6 sec, at least 7 sec, at least 8 sec, at least 9 sec, at least 10 sec, at least 11 sec, at least 12 sec, at least 13 sec, at least 14 sec, at least 15 sec, at least 16 sec, at least 17 sec, at least 18 sec, at least 19 sec, at least 20 sec, at least 21 sec, at least 22 sec, at least 23 sec, at least 24 sec, at least 25 sec, at least 26 sec, at least 27 sec, at least 28 sec, at least 29 sec, at least 30 sec or more.
In some embodiments of any of the aspects, step (ii) of the amplification (e.g., the annealing step) is performed at a temperature of at least 60° C. for at least 30 seconds. In some embodiments of any of the aspects, step (ii) of the amplification (e.g., the annealing step) is performed at a temperature of at least 60° C. In some embodiments of any of the aspects, step (ii) of the amplification (e.g., the annealing step) is performed at a temperature of at least 45° C., at least 46° C., at least 47° C., at least 48° C., at least 49° C., at least 50° C., at least 51° C., at least 52° C., at least 53° C., at least 54° C., at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., at least 65° C., at least 66° C., at least 67° C., at least 68° C., at least 69° C., at least 70° C., at least 71° C., at least 72° C., at least 73° C., at least 74° C., at least 75° C. or more. In some embodiments of any of the aspects, step (ii) of the amplification (e.g., the annealing step) is performed for at least 30 seconds. In some embodiments of any of the aspects, step (ii) of the amplification (e.g., the annealing step) is performed for at least 15 sec, at least 20 sec, at least 25 sec, at least 30 sec, at least 35 sec, at least 40 sec, at least 45 sec, at least 50 sec, at least 55 sec, at least 60 sec, at least 65 sec, at least 70 sec, at least 75 sec, at least 80 sec, at least 85 sec, at least 90 sec, at least 95 sec, at least 100 sec, at least 105 sec, at least 110 sec, at least 115 sec, or at least 120 sec or more.
In some embodiments of any of the aspects, at least the first iteration of step (ii) of the amplification (e.g., the annealing step) is performed at a lower temperature than subsequent iterations of step (ii). In some embodiments of any of the aspects, the first two iterations of step (ii) of the amplification (e.g., the annealing step) are performed at a temperature of at least 52° C. In some embodiments of any of the aspects, the first 1, 2, 3, 4, 5, or more iterations of step (ii) of the amplification (e.g., the annealing step) are performed at a temperature of at least 52° C. In some embodiments of any of the aspects, the first 1, 2, 3, 4, 5, or more iterations of step (ii) (e.g., the annealing step) of the amplification are performed at a temperature of at least 58° C. In some embodiments of any of the aspects, the first 1, 2, 3, 4, 5, or more iterations of step (ii) (e.g., the annealing step) of the amplification are performed at a temperature of at least 45° C., at least 46° C., at least 47° C., at least 48° C., at least 49° C., at least 50° C., at least 51° C., at least 52° C., at least 53° C., at least 54° C., at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., or at least 65° C.
In some embodiments of any of the aspects, the subsequent iterations of step (ii) (e.g., after the first two iterations of step (ii), e.g., the annealing step) are performed at a temperature of at least 68° C. In some embodiments of any of the aspects, the subsequent iterations of step (ii) (e.g., after the first 1, 2, 3, 4, 5, or more iterations of step (ii) of the amplification, e.g., the annealing step) are performed at a temperature of at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., at least 65° C., at least 66° C., at least 67° C., at least 68° C., at least 69° C., at least 70° C., at least 71° C., at least 72° C., at least 73° C., at least 74° C., or at least 75° C.
In some embodiments of any of the aspects, step (iii) of the amplification (e.g., the extension step) is performed at a temperature of at least 72° C. for at least 30 seconds. In some embodiments of any of the aspects, step (iii) of the amplification (e.g., the extension step) is performed at a temperature of at least 72° C. In some embodiments of any of the aspects, step (iii) of the amplification (e.g., the extension step) is performed at a temperature of at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., at least 65° C., at least 66° C., at least 67° C., at least 68° C., at least 69° C., at least 70° C., at least 71° C., at least 72° C., at least 73° C., at least 74° C., at least 75° C., at least 76° C., at least 77° C., at least 78° C., at least 79° C., or at least 80° C. or more. In some embodiments of any of the aspects, step (iii) of the amplification (e.g., the extension step) is performed for at least 30 seconds. In some embodiments of any of the aspects, step (iii) of the amplification (e.g., the extension step) is performed for at least 15 sec, at least 20 sec, at least 25 sec, at least 30 sec, at least 35 sec, at least 40 sec, at least 45 sec, at least 50 sec, at least 55 sec, at least 60 sec, at least 65 sec, at least 70 sec, at least 75 sec, at least 80 sec, at least 85 sec, at least 90 sec, at least 95 sec, at least 100 sec, at least 105 sec, at least 110 sec, at least 115 sec, at least 120 sec, at least 130 sec, at least 140 sec, at least 150 sec, at least 160 sec, at least 170 sec, at least 180 sec, at least 190 sec, at least 200 sec or more.
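As a non-limiting illustration, the cycling conditions described above can be written out as a simple schedule. The following Python sketch assumes that the minimum values recited above are used as exact set points (an initial denaturation at 95° C. for 60 seconds, followed by cycles of 95° C. for 15 seconds, annealing for 30 seconds, and 72° C. for 30 seconds), with the first two annealing iterations at 52° C. and subsequent iterations at 68° C. The names `Step`, `build_program`, and `total_hold_seconds` are hypothetical and are not part of any instrument software; ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(cycles=30, low_anneal_cycles=2,
                  low_anneal_c=52.0, high_anneal_c=68.0):
    """Return the ordered list of hold steps for one amplification run.

    The set points are illustrative assumptions taken from the
    "at least" values recited in the text, not validated conditions.
    """
    if cycles < low_anneal_cycles:
        raise ValueError("cycles must be >= low_anneal_cycles")
    program = [Step("initial denaturation", 95.0, 60)]
    for i in range(cycles):
        # The first few annealing iterations use the lower temperature.
        anneal_c = low_anneal_c if i < low_anneal_cycles else high_anneal_c
        program.append(Step("denaturation", 95.0, 15))
        program.append(Step("annealing", anneal_c, 30))
        program.append(Step("extension", 72.0, 30))
    return program


def total_hold_seconds(program):
    """Sum of programmed hold times (instrument ramp time is excluded)."""
    return sum(step.seconds for step in program)


program = build_program(cycles=30)
print(len(program), total_hold_seconds(program))
```

Under these assumed set points, 30 cycles amount to 2,310 seconds of programmed hold time (38.5 minutes) before ramp time is added; shorter holds or fewer cycles would be needed to fit a shorter run.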
In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting at least one reverse transcription product (or at least one primer from the first set of primers, if present) with a protector nucleic acid, and wherein step (ii) (e.g., the annealing step) is performed at a temperature of at least 64° C. In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting at least one reverse transcription product (or at least one primer from the first set of primers, if present) with a protector nucleic acid, and wherein step (ii) (e.g., the annealing step) is performed at a temperature of at least 72° C. In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting at least one reverse transcription product (or at least one primer from the first set of primers, if present) with a protector nucleic acid, and wherein step (ii) (e.g., the annealing step) is performed at a temperature of at least 55° C., at least 56° C., at least 57° C., at least 58° C., at least 59° C., at least 60° C., at least 61° C., at least 62° C., at least 63° C., at least 64° C., at least 65° C., at least 66° C., at least 67° C., at least 68° C., at least 69° C., at least 70° C., at least 71° C., at least 72° C., at least 73° C., at least 74° C., or at least 75° C.
In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting at least one reverse transcription product with a protector nucleic acid, and at least one of the following: (I) step (ii) (e.g., the annealing step) is performed at a temperature of at least 64° C.; (II) the 3′ complementary region (i.e., toe-hold region) of the protector nucleic acid is at least 20 nucleotides long; and/or (III) the protector nucleic acid is present at a concentration of at least 0.5 uM. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 64° C. In some embodiments of any of the aspects, (II) the 3′ complementary region of the protector nucleic acid is at least 20 nucleotides long. In some embodiments of any of the aspects, (III) the protector nucleic acid is present at a concentration of at least 0.5 uM. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 64° C.; and (II) the 3′ complementary region of the protector nucleic acid is at least 20 nucleotides long. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 64° C.; and (III) the protector nucleic acid is present at a concentration of at least 0.5 uM. In some embodiments of any of the aspects, (II) the 3′ complementary region of the protector nucleic acid is at least 20 nucleotides long; and (III) the protector nucleic acid is present at a concentration of at least 0.5 uM. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 64° C.; (II) the 3′ complementary region of the protector nucleic acid is at least 20 nucleotides long; and (III) the protector nucleic acid is present at a concentration of at least 0.5 uM.
In some embodiments of any of the aspects, step (c) (the amplification step) further comprises contacting at least one reverse transcription product with a protector nucleic acid, and at least one of the following: (I) step (ii) (e.g., the annealing step) is performed at a temperature of at least 68° C.; (II) the 3′ complementary region (i.e., toe-hold region) of the protector nucleic acid is at least 30 nucleotides long; and/or (III) the protector nucleic acid is present at a concentration of at least 2.0 uM. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 68° C. In some embodiments of any of the aspects, (II) the 3′ complementary region of the protector nucleic acid is at least 30 nucleotides long. In some embodiments of any of the aspects, (III) the protector nucleic acid is present at a concentration of at least 2.0 uM. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 68° C.; and (II) the 3′ complementary region of the protector nucleic acid is at least 30 nucleotides long. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 68° C.; and (III) the protector nucleic acid is present at a concentration of at least 2.0 uM. In some embodiments of any of the aspects, (II) the 3′ complementary region of the protector nucleic acid is at least 30 nucleotides long; and (III) the protector nucleic acid is present at a concentration of at least 2.0 uM. In some embodiments of any of the aspects, (I) step (ii) is performed at a temperature of at least 68° C.; (II) the 3′ complementary region of the protector nucleic acid is at least 30 nucleotides long; and (III) the protector nucleic acid is present at a concentration of at least 2.0 uM.
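As a non-limiting illustration, the combinations of conditions (I), (II), and (III) described above can be checked programmatically. The following Python sketch is a hypothetical helper (the name `satisfied_conditions` and its parameters are assumptions introduced only for illustration); its default thresholds are the 64° C., 20-nucleotide, and 0.5 uM values recited above, and the stricter 68° C., 30-nucleotide, and 2.0 uM values can be passed explicitly.

```python
def satisfied_conditions(anneal_c, toehold_nt, protector_um,
                         min_anneal_c=64.0, min_toehold_nt=20,
                         min_protector_um=0.5):
    """Return the labels of the protector conditions that are met.

    "I"   : annealing temperature is at least min_anneal_c (deg C)
    "II"  : 3' complementary (toe-hold) region is at least min_toehold_nt long
    "III" : protector concentration is at least min_protector_um (uM)
    """
    met = []
    if anneal_c >= min_anneal_c:
        met.append("I")
    if toehold_nt >= min_toehold_nt:
        met.append("II")
    if protector_um >= min_protector_um:
        met.append("III")
    return met


# Default thresholds (64 deg C, 20 nt, 0.5 uM)
print(satisfied_conditions(68, 18, 2.0))               # ['I', 'III']
# Stricter thresholds (68 deg C, 30 nt, 2.0 uM)
print(satisfied_conditions(66, 30, 2.0, 68.0, 30, 2.0))  # ['II', 'III']
```

An embodiment reciting "at least one of the following" corresponds to a non-empty result; an embodiment reciting all three conditions corresponds to the result `['I', 'II', 'III']`.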
In some embodiments of any of the aspects, at least 2 batches of amplification products from step (c) (the amplification step) are combined in one container. As used herein, the term “batch” refers to the combined products from one reaction, e.g., the barcoded amplification products from a single amplification reaction. In some embodiments of any of the aspects, at least 10 amplification product batches from step (c) (the amplification step) are combined in one container. In some embodiments of any of the aspects, at least 2 batches, at least 3 batches, at least 4 batches, at least 5 batches, at least 6 batches, at least 7 batches, at least 8 batches, at least 9 batches, at least 10 batches, at least 15 batches, at least 20 batches, at least 25 batches, at least 30 batches, at least 35 batches, at least 40 batches, at least 45 batches, at least 50 batches, at least 55 batches, at least 60 batches, at least 65 batches, at least 70 batches, at least 75 batches, at least 80 batches, at least 85 batches, at least 90 batches, at least 95 batches, at least 100 batches or more of amplification products from step (c) are combined in one container.
In some embodiments of any of the aspects, the amplification step is performed in at most 30 minutes. As a non-limiting example, the amplification step is performed in at most 20 minutes, at most 25 minutes, at most 30 minutes, at most 40 minutes, at most 50 minutes, at most 60 minutes, at most 70 minutes, at most 80 minutes, at most 90 minutes, at most 100 minutes, at most 110 minutes, at most 120 minutes, at most 130 minutes, at most 140 minutes, at most 150 minutes, at most 160 minutes, at most 170 minutes, or at most 180 minutes. The specific conditions, e.g., of temperature, time, and buffer conditions can be varied as necessary to accommodate different DNA polymerases.
In one aspect, described herein is an amplification composition comprising at least two of the following: (a) a barcoded reverse transcription product; (b) a second set of primers; (c) DNA polymerase; (d) Uracil-DNA Glycosylase (UDG) enzyme; and/or (e) a protector nucleic acid. It is noted that a composition can comprise any one, two, three, four, or all five of the components listed above.
In some embodiments as described further herein, nucleic acid samples (e.g., amplified nucleic acid samples) can be sequenced. Accordingly, the detection method comprises sequencing the amplification products, thereby detecting at least one target RNA, if present, in the at least two samples. Sequencing is the process of determining the order of monomers in a polymer. For example, DNA or RNA sequencing is the process of determining a nucleic acid sequence, i.e., the order of nucleotides in DNA or RNA, respectively, from a sample. DNA or RNA sequencing can also be referred to herein as “nucleic acid sequencing” or simply “sequencing.”
In some embodiments of any of the aspects, prior to step (d) (the sequencing step) the second set of barcoded primers are substantially removed. In some embodiments of any of the aspects, prior to step (d) (the sequencing step) the second set of barcoded primers are substantially removed using, for example, a bead-based purification method or a spin-column-based purification method.
Methods of sequencing a nucleic acid sequence are well known in the art. Briefly, a sample obtained from a subject can be contacted with one or more primers which specifically hybridize to a single-strand nucleic acid sequence flanking the target gene sequence and a complementary strand is synthesized. In some next-generation technologies, an adaptor (double or single-stranded) is ligated to nucleic acid molecules in the sample and synthesis proceeds from the adaptor or adaptor compatible primers. In some third-generation technologies, the sequence can be determined, e.g. by determining the location and pattern of hybridization of probes, or measuring one or more characteristics of a single molecule as it passes through a sensor (e.g. the modulation of an electrical field as a nucleic acid molecule passes through a nanopore).
In some embodiments as described herein, nucleic acid sequence data can be obtained from a sequencing platform. The term “sequencing platform” refers not only to a particular machine or device used for sequencing, but also to the particular chemical and/or physical approaches applied to extract or derive the sequence information from a sample. Exemplary methods of sequencing include, but are not limited to, Sanger sequencing, dideoxy chain termination, high-throughput sequencing, next generation sequencing, pyrosequencing (e.g., 454), sequencing by ligation and detection (SOLiD™), polony sequencing, sequencing by synthesis (e.g., Illumina™), ion semiconductor sequencing (e.g., Ion Torrent™), sequencing by hybridization, nanopore sequencing, HeliScope single molecule sequencing, single-molecule real-time sequencing (SMRT), RNAP sequencing, combinatorial probe anchor synthesis (cPAS), chain termination sequencing, DNA nanoball sequencing, and the like. Methods and protocols for performing these sequencing methods are known in the art, see, e.g., “Next Generation Genome Sequencing” Ed. Michal Janitz, Wiley-VCH; “High-Throughput Next Generation Sequencing” Eds. Kwon and Ricke, Humana Press, 2011; and Sambrook et al., Molecular Cloning: A Laboratory Manual (4th ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012); which are incorporated by reference herein in their entireties.
Early methods of DNA sequencing, or “first generation sequencing,” included Sanger sequencing (also known as chain terminator sequencing) and Maxam-Gilbert sequencing (also known as chemical sequencing). High-throughput sequencing methods have significantly reduced the cost and time to sequence nucleic acid samples. High-throughput sequencing can also be referred to herein as “next-generation sequencing”, “second-generation sequencing”, “third-generation sequencing”, or “massively parallel signature sequencing (MPSS)”.
Non-limiting examples of ion semiconductor sequencing platforms include Ion Torrent™ sequencing platforms comprising Ion S5™, Ion AmpliSeq™, Ion Proton™, Ion PGM™ (e.g., PGM 314™, PGM 316™, PGM 318™, PI™, or PII™), or Ion Chef™ platforms, from ThermoFisher™ (see e.g., U.S. Pat. 7,785,785, US 8552771, US8692298B2, US8731847B2, US8742472B2, US8841217B1, US8912580B2, US8912005B1, US8962366B2, US8963216B2, US9116117B2, US9128044B2, US9194000B2, US9239313B2, US9404920B2, US9841398B2, US9927393B2, US9944981B2, US9958414B2, US9960253B2, which are incorporated herein by reference in their entireties).
Pyrosequencing, an example of sequencing by synthesis, can also be referred to as 454 Life Sciences™ sequencing, 454 sequencing, or 454 pyrosequencing. Non-limiting examples of 454 pyrosequencing platforms include Genome Sequencer FLX™, GS20™, or GS Junior™ sequencing platforms. Pyrosequencing can also be performed on any of the following sequencing platforms from QIAGEN: PyroMark Q48 Autoprep™, PyroMark Q24 Advanced™, PyroMark Q24™, or PyroMark Q96 ID™ (see e.g., U.S. Pat. US 6,210,891, US 7,323,305, US 8,748,102, US 8,765,380, which are incorporated herein by reference in their entireties).
Sequencing by synthesis methods include, for example, Illumina™ sequencing or Solexa™ sequencing. Non-limiting examples of Illumina™ sequencing platforms include cBot™, Genome Analyzer (GA)™, MiniSeq™, NextSeq™, MiSeq™, HiSeq 2500™, HiSeq 3000™, HiSeq 4000™, HiSeq X™ (e.g., Hiseq Ten™), iSeq™ 100, HiScan™, and iScan™ Illumina platforms (see e.g., U.S. Pat. US 7,414,116, US 7,329,860, US 7,589,315, US 7,960,685, US 8,039,817, US 8,071,962, US 8,158,926, US 8,241,573, US 8,778,848, US 8,778,849, US 8,244,479, US 8,315,817, US 8,412,467, US 8,422,031, US 8,446,573, US 8,914,241, US 8,965,076, US 9,012,022, US 9,068,220, US 9,121,063, US 9,365,898, US 9,410,977, US 9,512,422, US 9,540,690, US 9,670,535, US 9,752,186, US 9,777,325, US 9,994,687, US 10,005,083, US 10,053,730, US 10,152,776, which are incorporated herein by reference in their entireties).
Additional non-limiting examples of sequencing by synthesis platforms can comprise GeneReader™ from QIAGEN or Mini-20™ from AZCO Biotech™, Inc.
Non-limiting examples of SMRT sequencing platforms include C1™, C2™, P4-XL™, P5-C3™, P6-C4™, RS™, RS II™, or Sequel™ platforms, all from PacBio™ sequencing. SMRT sequencing can also be referred to as PacBio™ sequencing.
Non-limiting examples of cPAS sequencing platforms include BGISEQ-50™, MGISEQ 200™, BGISEQ-500™, or MGISEQ-2000™ cPAS platforms. cPAS sequencing platforms can also utilize DNA nanoball sequencing methods (e.g., BGISEQ-500™, or MGISEQ-2000™).
Non-limiting examples of SOLiD™ sequencing platforms include 5500xl SOLiD™, 5500 SOLiD™, SOLiD 5500xl Wildfire™, or SOLiD 5500 Wildfire™, from Thermo Fisher Scientific™.
Non-limiting examples of Nanopore sequencing platforms include SmidgeION™, MinION™, and PromethION™, all from Oxford Nanopore Technologies™.
Non-limiting examples of chain termination sequencing platforms can comprise Microfluidic Sanger sequencing platforms or the Apollo 100™ platform (Microchip Biotechnologies™, Inc.).
Non-limiting examples of Polony sequencing platforms include a Polonator™ platform (Dover™) or a fluorescence microscope and a computer-controlled flowcell.
Non-limiting examples of HeliScope single molecule sequencing platforms include Helicos® Genetic Analysis System platform or the HeliScope™ Sequencer.
Additional non-limiting examples of sequencing methods include tunneling currents DNA sequencing, sequencing by hybridization, sequencing with mass spectrometry, microscopy-based techniques, RNA polymerase (RNAP) sequencing, or in vitro virus high-throughput sequencing.
In some embodiments of any of the aspects, the sequencing method is sequencing by synthesis. In some embodiments of any of the aspects, the sequencing method is Illumina™ sequencing. In some embodiments of any of the aspects, the sequencing method comprises contacting the amplification products with a third set of primers, comprising at least first and second sequencing primers. In some embodiments of any of the aspects, the first and second sequencing primers comprise at least one of SEQ ID NOs: 15 and 17 or a nucleic acid sequence that is at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to one of SEQ ID NOs: 15 and 17 that maintains the same function (e.g., priming for sequencing by synthesis). In some embodiments of any of the aspects, the first and second sequencing primers comprise an adaptor-binding region that is complementary or substantially complementary to the adaptor region of a primer in the first or second set of primers.
In some embodiments of any of the aspects, the sequencing method produces a sequencing read from the first or second sequencing primer (see e.g.,
In some embodiments of any of the aspects, the sequencing read from the first or second sequencing primer (e.g., SEQ ID NOs: 15 or 17) comprises sequence from the target RNA (e.g., one of SEQ ID NOs: 1009-1012 or the reverse complement thereof). In some embodiments of any of the aspects, the sequencing read from the first or second sequencing primer comprises at least one variation of interest in the target RNA.
In some embodiments of any of the aspects, the target RNA is detected in the sample if a first and second barcode region associated with the specific target RNA are detected in the sequencing read of the amplification product. In some embodiments of any of the aspects, the target RNA is detected in the sample if at least one first barcode region associated with the specific target RNA is detected in the sequencing read of the amplification product. In some embodiments of any of the aspects, the target RNA is detected in the sample if at least one second barcode region associated with the specific target RNA is detected in the sequencing read of the amplification product. In some embodiments of any of the aspects, the target RNA is not detected in the sample if a first or second barcode region associated with the specific target RNA is not detected in the sequencing read of the amplification product. In some embodiments of any of the aspects, if the target RNA is not present in the sample, then no barcode regions associated with the specific target RNA are detected in the sequencing reads of the amplification product.
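As a non-limiting illustration, the barcode-based detection rules described above can be expressed as a simple decision function. The following Python sketch assumes that the barcode regions in a sequencing read have already been identified and are supplied as a set of barcode identifiers; the function name `target_detected`, the `rule` parameter, and the identifiers `BC1-07` and `BC2-12` are hypothetical and are not taken from the Sequence Listing.

```python
def target_detected(read_barcodes, first_barcode, second_barcode, rule="both"):
    """Decide whether a target RNA is called as detected from one read.

    read_barcodes  : set of barcode identifiers found in the sequencing read
    first_barcode  : first barcode region associated with the target RNA
    second_barcode : second barcode region associated with the target RNA
    rule           : "both"   -> both barcode regions must be detected
                     "first"  -> the first barcode region is sufficient
                     "second" -> the second barcode region is sufficient
    """
    has_first = first_barcode in read_barcodes
    has_second = second_barcode in read_barcodes
    if rule == "both":
        return has_first and has_second
    if rule == "first":
        return has_first
    if rule == "second":
        return has_second
    raise ValueError(f"unknown rule: {rule!r}")


read = {"BC1-07", "BC2-12"}  # hypothetical barcode identifiers
print(target_detected(read, "BC1-07", "BC2-12"))                      # True
print(target_detected({"BC1-07"}, "BC1-07", "BC2-12"))                # False
print(target_detected({"BC1-07"}, "BC1-07", "BC2-12", rule="first"))  # True
```

Requiring both barcode regions is the most stringent of the three rules; the single-barcode rules correspond to the embodiments in which one associated barcode region is sufficient for detection.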
In some embodiments of any of the aspects, at least n target RNAs in a single sample are detected, and the at least n target RNAs are on the same assayed RNA molecule. In some embodiments of any of the aspects, the assayed RNA molecule is determined to be present in the sample if at least one of the n target RNAs is detected. In some embodiments of any of the aspects, the assayed RNA molecule is determined to not be present in the sample if none of the n target RNAs are detected.
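As a non-limiting illustration, the presence call for an assayed RNA molecule carrying n target RNAs can be sketched as follows. The function name `molecule_present` and the target labels are hypothetical; the per-target detection results are assumed to have been determined beforehand.

```python
def molecule_present(target_calls):
    """Return True if at least one of the n target RNAs was detected.

    target_calls maps each target RNA label to True (detected) or
    False (not detected) for a single sample.
    """
    return any(target_calls.values())


# Hypothetical sample with n = 3 target RNAs on the same assayed RNA molecule
calls = {"target_1": False, "target_2": True, "target_3": False}
print(molecule_present(calls))  # True
print(molecule_present({"target_1": False, "target_2": False}))  # False
```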
Another aspect of the technology described herein relates to kits for detecting a target RNA. Described herein are kit components that can be included in one or more of the kits described herein. In one aspect, described herein is a kit for detecting a target RNA in a sample, comprising at least one of the following: (a) a reverse transcriptase; (b) a first set of primers comprising at least one barcode; (c) a detergent; (d) a carrier nucleic acid; (e) a positive control nucleic acid; (f) at least one stabilization agent; (g) at least two containers; (h) a DNA polymerase; (i) a second set of primers; (j) Uracil-DNA Glycosylase (UDG) enzyme; (k) a protector nucleic acid; and/or (l) a third set of primers.
In some embodiments of any of the aspects, the kit comprises a reverse transcriptase. In some embodiments of any of the aspects, the kit is used to reverse transcribe target RNA into DNA, and to amplify the DNA to a detectable amplification product. In some embodiments of any of the aspects, the reverse transcriptase is selected from the group consisting of: a Moloney murine leukemia virus (M-MLV) reverse transcriptase (RT), an avian myeloblastosis virus (AMV) RT, a retrotransposon RT, a telomerase reverse transcriptase, an HIV-1 reverse transcriptase, or a recombinant version thereof. In some embodiments of any of the aspects, the reverse transcriptase is provided at a sufficient amount, such that, e.g., at least 200 U/µL, can be added to the RT reaction mixture.
In some embodiments of any of the aspects, the kit comprises a DNA polymerase. In some embodiments of any of the aspects, the DNA polymerase is a Thermus aquaticus (Taq) DNA polymerase or variant thereof. In some embodiments of any of the aspects, the DNA polymerase(s) is provided at a sufficient amount to be added to the amplification reaction mixture.
In some embodiments of any of the aspects, the kit comprises a first set of primers (e.g., for RT), comprising at least one barcode. In some embodiments of any of the aspects, the first set of primers comprises primers that bind to target RNA and provide an adaptor region (e.g., a PCR adaptor region). In some embodiments of any of the aspects, the kit comprises a second set of primers (e.g., for amplification). In some embodiments of any of the aspects, the second set of primers is specific (i.e., binds specifically through complementarity) to cDNA, in other words, the DNA produced in the RT step that is complementary to the target RNA. In some embodiments of any of the aspects, the second set of primers provides adaptors for sequencing. In some embodiments of any of the aspects, the kit comprises a third set of primers (e.g., for sequencing). In some embodiments of any of the aspects, the first, second, and/or third sets of primers are provided at a sufficient concentration, e.g., 25 uM to 500 uM, to be added to the associated reaction mixture.
In some embodiments of any of the aspects, the kit comprises carrier nucleic acid, e.g., poly-A60 DNA oligonucleotide and/or E. coli tRNA, provided at a sufficient concentration to be added to the RT and/or amplification reaction. In some embodiments of any of the aspects, the kit comprises at least one positive control nucleic acid, provided at a sufficient concentration to be added to the RT reaction. In some embodiments of any of the aspects, the positive control nucleic acid is a positive sample control nucleic acid or a positive enzymatic control nucleic acid. In some embodiments of any of the aspects, the kit further comprises detergent, e.g., Triton X-100, provided at a sufficient concentration to be added to the RT reaction.
In some embodiments of any of the aspects, the kit comprises a stabilization agent, provided at a sufficient concentration to be added to the RT reaction. In some embodiments of any of the aspects, the kit comprises at least one of the following stabilization agents: (a) an RNase inhibitor; (b) a metal-chelating agent; (c) a reducing agent; (d) an antibiotic; (e) an antimycotic; and/or (f) a protease inhibitor (or any combination thereof, see e.g., Table 13).
In some embodiments of any of the aspects, the kit comprises at least one protector nucleic acid, provided at a sufficient concentration to be added to the amplification reaction. In some embodiments of any of the aspects, the at least one protector nucleic acid reduces or inhibits barcode crosstalk in the amplification reaction. In some embodiments of any of the aspects, the kit comprises Uracil-DNA Glycosylase (UDG) enzyme, provided at a sufficient concentration to be added to the amplification reaction, which can reduce or inhibit detection of amplification product contaminants.
In some embodiments of any of the aspects, the kit comprises at least two containers, such that at least two RT reactions can be combined into one amplification reaction, and/or at least two amplification reactions can be combined into one sequencing reaction. In some embodiments of any of the aspects, the container is a test tube, centrifuge tube, multi-well plate, and the like.
In some embodiments of any of the aspects, the kit further comprises a reaction buffer for the RT reaction and/or a reaction buffer for the amplification reaction. Such reaction buffers can comprise at least one of the following: diluent, water, magnesium acetate (or another magnesium compound such as magnesium chloride), and/or dNTPs. In some embodiments of any of the aspects, the kit further comprises a sample collection device, such as a swab. In some embodiments of any of the aspects, the kit further comprises a sample collection container, optionally containing transport media. In some embodiments of any of the aspects, the kit further comprises reagents for a bead-based purification method or a spin-column-based purification method. In some embodiments of any of the aspects, the kit further comprises at least one negative control. Non-limiting examples of negative controls for SARS-CoV-2 include MERS, SARS, 229E, NL63, and HKU1, which can be detected using specific primers.
In some embodiments, the kit comprises an effective amount of the reagents as described herein. As will be appreciated by one of skill in the art, the reagents can be supplied in a lyophilized form or a concentrated form that can be diluted or suspended in liquid prior to use. The kit reagents described herein can be supplied in aliquots or in unit doses.
In some embodiments, the components described herein can be provided singularly or in any combination as a kit. Such a kit includes the components described herein and packaging materials thereof. In addition, a kit optionally comprises informational material.
In some embodiments, the compositions in a kit can be provided in a watertight or gas tight container which in some embodiments is substantially free of other components of the kit. For example, the reagents described herein can be supplied in more than one container, e.g., it can be supplied in a container having sufficient reagent for a predetermined number of applications, e.g., 1, 2, 3 or greater. One or more components as described herein can be provided in any form, e.g., liquid, dried or lyophilized form. Liquids or components for suspension or solution of the reagents can be provided in sterile form and should not contain microorganisms or other contaminants. When the components described herein are provided in a liquid solution, the liquid solution preferably is an aqueous solution.
The informational material can be descriptive, instructional, marketing or other material that relates to the methods described herein. The informational material of the kits is not limited in its form. In some embodiments, the informational material can include information about production of the reagents, concentration, date of expiration, batch or production site information, and so forth. In some embodiments, the informational material relates to methods for using or administering the components of the kit.
The kit will typically be provided with its various elements included in one package, e.g., a fiber-based, e.g., a cardboard, or polymeric, e.g., a Styrofoam box. The enclosure can be configured so as to maintain a temperature differential between the interior and the exterior, e.g., it can provide insulating properties to keep the reagents at a preselected temperature for a preselected time.
The computing device 170 and server 180 can be connected by a network 160 and the network 160 can be connected to various other devices, servers, or network equipment for implementing the present disclosure. A computing device 170 can be connected to a display 175. Computing device 170 can be any suitable computing device, including a desktop computer, server (including remote servers), mobile device, or other suitable computing device. A computing device 170 can be used to view or process sequencer 150 data. Data output from the sequencer 150 can also be input into a program that can be stored in a database 185. In some examples, sequencing data as described herein and other associated software can be stored in database 185 and run on server 180. Additionally, sequencing data processed or produced by said programs can be stored in database 185.
It should initially be understood that the methods and systems described herein can be implemented with any type of hardware and/or software, and can include use of a pre-programmed general purpose computing device. For example, the system can be implemented using a server, a personal computer, a portable computer, a thin client, or any suitable device or devices. The kits, methods and/or components for the performance thereof can include the use of a single device at a single location, or multiple devices at a single, or multiple, locations that are connected together using any appropriate communication protocols over any communication medium such as electric cable, fiber optic cable, or in a wireless manner.
It should also be noted that the systems as described herein can be arranged or used in a format having a plurality of modules which perform particular functions. It should be understood that these modules are merely schematically illustrated based on their function for clarity purposes only, and do not necessarily represent specific hardware or software. In this regard, these modules can be hardware and/or software implemented to substantially perform the particular functions discussed. Moreover, the modules can be combined together within the disclosure, or divided into additional modules based on the particular function desired. Thus, the disclosure should not be construed to limit the present technology as disclosed herein, but merely be understood to illustrate one example implementation thereof.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some implementations, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
Implementations of the subject matter described in this specification can be performed in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
Implementations of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., CDs, disks, or other storage devices).
The operations described in this specification can be implemented as operations performed by a “data processing apparatus” on data stored on one or more computer-readable storage devices or received from other sources.
The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of these. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program can, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA or an ASIC as noted above.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from, or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
For convenience, the meaning of some terms and phrases used in the specification, examples, and appended claims, are provided below. Unless stated otherwise, or implicit from context, the following terms and phrases include the meanings provided below. The definitions are provided to aid in describing particular embodiments, and are not intended to limit the claimed invention, because the scope of the invention is limited only by the claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is an apparent discrepancy between the usage of a term in the art and its definition provided herein, the definition provided within the specification shall prevail.
For convenience, certain terms employed herein, in the specification, examples and appended claims are collected here.
As used herein, the terms “hybridizing”, “hybridize”, “hybridization”, “annealing”, and “anneal” are used interchangeably in reference to the pairing of complementary nucleic acids using any process by which a strand of nucleic acid joins with a complementary strand through base pairing to form a hybridization complex. In other words, the term “hybridization” refers to the process in which two single-stranded polynucleotides bind non-covalently through hydrogen bonding to form a stable double-stranded polynucleotide. The term “hybridization” may also refer to triple-stranded hybridization. The resulting (usually) double-stranded polynucleotide is a “hybrid” or “duplex.”
As used herein, the term “complementary” refers to nucleic acid sequences that are capable of base-pairing according to the standard Watson-Crick complementarity rules. That is, the larger purines will base pair with the smaller pyrimidines to form combinations of guanine paired with cytosine (G:C) and adenine paired with thymine (A:T) in the case of DNA, or adenine paired with uracil (A:U) in the case of RNA.
As used herein, the term “substantial” refers to an ample or considerable amount, quantity, or size as determined by a user. As a non-limiting example, the term “substantially complementary” refers to a nucleic acid that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more complementary to another nucleic acid. As another non-limiting example, the term “substantially identical” refers to a nucleic acid that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more identical to another nucleic acid. The term “essentially complementary” can be used interchangeably with “substantially complementary.” The term “essentially identical” can be used interchangeably with “substantially identical.”
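By way of non-limiting illustration, the percentage thresholds above can be sketched in code. The following is a minimal sketch only, assuming two equal-length, ungapped strands compared in antiparallel orientation; the function names and the default 90% threshold argument are hypothetical conveniences and are not part of the methods described herein.

```python
# Illustrative sketch: function names are hypothetical, and the ungapped,
# equal-length comparison is a simplifying assumption.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA uracil as thymine."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement (as DNA letters)."""
    return "".join(_COMPLEMENT[base] for base in reversed(_normalize(seq)))


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percentage of positions at which strand_a pairs with antiparallel strand_b."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a = _normalize(strand_a)
    b_rc = reverse_complement(strand_b)
    paired = sum(x == y for x, y in zip(a, b_rc))
    return 100.0 * paired / len(a)


def is_substantially_complementary(strand_a: str, strand_b: str,
                                   threshold: float = 90.0) -> bool:
    """Apply a percentage cutoff such as the 90% floor recited above."""
    return percent_complementarity(strand_a, strand_b) >= threshold
```

For a 10-nucleotide pair, a single mismatched position gives 90% complementarity and two mismatched positions give 80%, so only the former would meet a 90% cutoff under these assumptions.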
As used herein, a “barcode” is an artificial DNA sequence that provides an indication, e.g., of sample origin, target identity or other information regarding a sequencing target. In one embodiment, the presence of a barcode can be an indicator that a target sequence is or was present in a given starting sample. In general, a barcode should not be substantially identical to or substantially complementary to any sequence of the genome of a host or to the genome of, e.g., a virus one wishes to detect. Similarly, the barcodes used in a given method should not be substantially complementary to other barcodes used in that method, i.e., the barcodes are members of a minimally cross-hybridizing set. That is, the nucleotide sequence of each member of such a barcode set is sufficiently different from that of every other member of the set that no member can form a stable duplex with the complement of any other member under stringent hybridization conditions. Barcodes can vary in length, but will generally be at least 4 nucleotides in length. Longer barcodes are contemplated, but will generally be less than 36 nucleotides in length. In some embodiments, barcodes can each have a length within a range of from 4 to 36 nucleotides, or from 6 to 30 nucleotides, or from 8 to 20 nucleotides. For more details concerning barcode technologies, see e.g., U.S. Pat. US9902950, US10233490; U.S. Pat. Publications US20150298091, US2018032017, US20180216160; international patent publications WO2015164212, WO2013192292; Winzeler et al. (1999) Science 285:901; Brenner (2000) Genome Biol. 1:1; Kumar et al. (2001) Nature Rev. 2:302; Giaever et al. (2004) Proc. Natl. Acad. Sci. USA 101:793; Eason et al. (2004) Proc. Natl. Acad. Sci. USA 101: 11046; and Brenner (2004) Genome Biol. 5:240; the contents of each of which are incorporated herein by reference in their entireties.
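As a non-limiting illustration of a minimally cross-hybridizing set, the following sketch greedily selects barcodes that differ from every previously selected barcode, and from its reverse complement, at a minimum number of positions. Hamming distance is used here only as a crude stand-in for duplex stability under stringent conditions, which in practice would be assessed with thermodynamic models or experiment. The function names and the default length, distance, and count are hypothetical.

```python
from itertools import product

# Illustrative sketch: Hamming distance is an assumed proxy for
# cross-hybridization; all names and default parameters are hypothetical.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMP)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(length: int = 8, min_dist: int = 3, max_count: int = 16):
    """Greedily collect barcodes that are mutually dissimilar.

    A candidate is kept only if it differs from every kept barcode, and from
    the reverse complement of every kept barcode, at min_dist or more
    positions. Exhaustive enumeration is practical only for short lengths.
    """
    chosen = []
    for bases in product("ACGT", repeat=length):
        cand = "".join(bases)
        if all(hamming(cand, k) >= min_dist
               and hamming(cand, revcomp(k)) >= min_dist
               for k in chosen):
            chosen.append(cand)
            if len(chosen) == max_count:
                break
    return chosen
```

A fuller design would typically also screen candidates against host and target genomes and against self-complementarity, which this sketch omits.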
By adding a barcode to a primer with another region that specifically binds or hybridizes to a sequence one wishes to detect, detection of the barcode by sequencing becomes a surrogate for reading the actual signal of the target nucleic acid. When the only way to obtain an amplification product to sequence is to have a target nucleic acid present in an initial reverse-transcription and/or amplification reaction, one only needs to sequence the barcode to determine that the target sequence was present in the initial sample. Barcoding can also be used to indicate, for example, which sample a given sequence read belongs to. For example, when each sample is reverse transcribed using a primer that includes a barcode unique to that sample, detection of the sample-indicating barcode identifies which sample a given sequence read arose from. A combination of two or more barcodes can therefore provide significant information without the need to read into the actual target sequence, if so desired. For example, a primer including two barcodes (or a set or sets of primers including two barcodes), one correlating with target identity (indicating presence or absence of an RNA target) and one indicating which sample the read came from (a sample-specific barcode) can identify which sample, e.g., which individual subject, and which target nucleic acid is present in that sample without the need to sequence beyond the two barcodes, if so desired. As another example, a primer including two barcodes (or a set or sets of primers including two barcodes), one correlating with sample identity (a sample-specific barcode) and one correlating with batch identity (a batch-specific barcode indicating the reverse transcription batch) can identify the sample and reaction batch; sequencing in between the barcodes can determine the specific target sequence. In this manner, very high throughput diagnostics, e.g., viral diagnostics, can be realized. 
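The barcode-as-surrogate readout described above can be sketched as a simple lookup. This is a minimal illustration only: the read layout (an 8-nucleotide sample-specific barcode immediately followed by an 8-nucleotide target barcode at the start of each read), the barcode sequences, and the sample and target labels are all hypothetical placeholders, and exact matching is assumed with no error correction.

```python
from collections import Counter

# Illustrative sketch: barcode sequences, labels, and read layout below are
# hypothetical placeholders, not values from this disclosure.
BARCODE_LEN = 8
SAMPLE_BARCODES = {"AAACCCGG": "sample_01", "CCCAAATT": "sample_02"}
TARGET_BARCODES = {"GGTTAACC": "target_A", "TTGGCCAA": "target_B"}


def assign_read(read: str):
    """Return (sample, target) for a read, or None if either barcode is unknown."""
    sample = SAMPLE_BARCODES.get(read[:BARCODE_LEN])
    target = TARGET_BARCODES.get(read[BARCODE_LEN:2 * BARCODE_LEN])
    if sample is None or target is None:
        return None
    return sample, target


def count_detections(reads):
    """Tally reads per (sample, target) pair using only the two barcodes."""
    counts = Counter()
    for read in reads:
        hit = assign_read(read.strip().upper())
        if hit is not None:
            counts[hit] += 1
    return counts
```

In this sketch, sequence beyond the two barcodes is ignored entirely, mirroring the case in which the barcodes alone identify the sample and the target.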
Of course, additional sequence information beyond just the barcodes can be and often is obtained using NGS approaches. In addition to simply obtaining more sequence beyond the barcodes through longer reads, reads beyond the barcodes can provide information on variants of a given target, for example.
The terms “decrease”, “reduced”, “reduction”, or “inhibit” are all used herein to mean a decrease by a statistically significant amount. In some embodiments, “reduce,” “reduction” or “decrease” or “inhibit” typically means a decrease by at least 10% as compared to a reference level (e.g. the absence of a given treatment or agent) and can include, for example, a decrease by at least about 10%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more. As used herein, “reduction” or “inhibition” does not encompass a complete inhibition or reduction as compared to a reference level. “Complete inhibition” is a 100% inhibition as compared to a reference level. A decrease can be preferably down to a level accepted as within the range of normal for an individual without a given disorder.
The terms “increased”, “increase”, “enhance”, or “activate” are all used herein to mean an increase by a statistically significant amount. In some embodiments, the terms “increased”, “increase”, “enhance”, or “activate” can mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% increase or any increase between 10-100% as compared to a reference level, or at least about a 2-fold, or at least about a 3-fold, or at least about a 4-fold, or at least about a 5-fold or at least about a 10-fold increase, or any increase between 2-fold and 10-fold or greater as compared to a reference level. In the context of a marker or symptom, an “increase” is a statistically significant increase in such level.
As used herein, a “subject” means a human or animal. Usually the animal is a vertebrate such as a primate, rodent, domestic animal or game animal. Primates include chimpanzees, cynomolgus monkeys, spider monkeys, and macaques, e.g., Rhesus. Rodents include mice, rats, woodchucks, ferrets, rabbits and hamsters. Domestic and game animals include cows, horses, pigs, deer, bison, buffalo, feline species, e.g., domestic cat, canine species, e.g., dog, fox, wolf, avian species, e.g., chicken, emu, ostrich, and fish, e.g., trout, catfish and salmon. In some embodiments, the subject is a mammal, e.g., a primate, e.g., a human. The terms, “individual,” “patient” and “subject” are used interchangeably herein.
Preferably, the subject is a mammal. The mammal can be a human, non-human primate, mouse, rat, dog, cat, horse, or cow, but is not limited to these examples. Mammals other than humans can be advantageously used as subjects that represent animal models of viral infection. A subject can be male or female.
A subject can be one who has been previously diagnosed with or identified as suffering from or having a condition in need of treatment (e.g. a viral infection) or one or more complications related to such a condition, and optionally, have already undergone treatment for a viral infection or the one or more complications related to a viral infection. Alternatively, a subject can also be one who has not been previously diagnosed as having a viral infection or one or more complications related to a viral infection. For example, a subject can be one who exhibits one or more risk factors for a viral infection or one or more complications related to a viral infection or a subject who does not exhibit risk factors. A “subject in need” of testing for a particular condition can be a subject having that condition, diagnosed as having that condition, or at risk of developing that condition.
In the various embodiments described herein, it is contemplated that variants (naturally occurring or otherwise), alleles, homologs, conservatively modified variants, and/or conservative substitution variants of any of the particular polypeptides described (e.g., reverse transcriptase, DNA polymerase, etc.) are encompassed. As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid and retains the desired activity of the polypeptide. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles consistent with the disclosure.
A given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are well known. Polypeptides comprising conservative amino acid substitutions can be tested to confirm that a desired activity and specificity of a native or reference polypeptide is retained.
Amino acids can be grouped according to similarities in the properties of their side chains (in A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) nonpolar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class. Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
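The second grouping above (common side-chain properties) can be expressed as a lookup table, sketched below for illustration. The abbreviation “Nle” for norleucine and the function names are assumptions introduced for this sketch. The class test captures only the class-exchange rule for non-conservative substitutions; several of the particular conservative substitutions listed above (e.g., Ala into Gly) cross these classes, so the table is not a complete test of conservativeness.

```python
# Illustrative sketch: class labels follow the side-chain grouping in the
# text; "Nle" (norleucine) and the function names are assumptions.
SIDE_CHAIN_CLASSES = {
    "hydrophobic": {"Nle", "Met", "Ala", "Val", "Leu", "Ile"},
    "neutral hydrophilic": {"Cys", "Ser", "Thr", "Asn", "Gln"},
    "acidic": {"Asp", "Glu"},
    "basic": {"His", "Lys", "Arg"},
    "chain orientation": {"Gly", "Pro"},
    "aromatic": {"Trp", "Tyr", "Phe"},
}

# Invert the table so each residue maps to its single class label.
RESIDUE_TO_CLASS = {
    residue: label
    for label, residues in SIDE_CHAIN_CLASSES.items()
    for residue in residues
}


def side_chain_class(residue: str) -> str:
    """Return the class label for a three-letter residue code."""
    return RESIDUE_TO_CLASS[residue]


def same_class(original: str, replacement: str) -> bool:
    """True if both residues fall in the same side-chain class."""
    return side_chain_class(original) == side_chain_class(replacement)
```

Under this table, Leu into Ile stays within the hydrophobic class, whereas Asp into Lys exchanges an acidic residue for a basic one.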
In some embodiments, a polypeptide described herein (or a nucleic acid encoding such a polypeptide) can be a functional fragment of one of the amino acid sequences described herein. As used herein, a “functional fragment” is a fragment or segment of a polypeptide which retains at least 50% of the wild-type reference polypeptide’s activity according to the assays described herein. A functional fragment can comprise conservative substitutions of the sequences disclosed herein.
In some embodiments, a polypeptide described herein can be a variant of a sequence described herein. In some embodiments, the variant is a conservatively modified variant. Conservative substitution variants can be obtained by mutations of native nucleotide sequences, for example. A “variant,” as referred to herein, is a polypeptide substantially homologous to a native or reference polypeptide, but which has an amino acid sequence different from that of the native or reference polypeptide because of one or a plurality of deletions, insertions or substitutions. Variant polypeptide-encoding DNA sequences encompass sequences that comprise one or more additions, deletions, or substitutions of nucleotides when compared to a native or reference DNA sequence, but that encode a variant protein or fragment thereof that retains activity. A wide variety of PCR-based site-specific mutagenesis approaches are known in the art and can be applied by the ordinarily skilled artisan to generate and test artificial variants.
A variant DNA or amino acid sequence can be at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more, identical to a native or reference sequence. The degree of homology (percent identity) between a native and a mutant sequence can be determined, for example, by comparing the two sequences using freely available computer programs commonly employed for this purpose on the world wide web (e.g. BLASTp or BLASTn with default settings).
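For illustration only, percent identity over an existing pairwise alignment can be computed as sketched below. The sketch assumes the two sequences have already been aligned (e.g., by a program such as those mentioned above) and are supplied as equal-length strings with “-” marking gaps; it counts identical, non-gap columns over the full alignment length, which is one of several conventions in use and is not necessarily the figure a given program reports. The function name is hypothetical.

```python
# Illustrative sketch: assumes pre-aligned, equal-length inputs with "-" gaps;
# the denominator convention (all alignment columns) is an assumption.
def percent_identity(aligned_ref: str, aligned_variant: str) -> float:
    """Percentage of alignment columns with identical, non-gap residues."""
    if not aligned_ref or len(aligned_ref) != len(aligned_variant):
        raise ValueError("aligned sequences must be non-empty and equal length")
    identical = sum(
        r == v and r != "-"
        for r, v in zip(aligned_ref.upper(), aligned_variant.upper())
    )
    return 100.0 * identical / len(aligned_ref)
```

For example, a 10-column alignment with one gap column and one substitution has 8 identical columns, or 80% identity, which would fall below the 90% floor recited above.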
In some embodiments, the methods described herein relate to measuring, detecting, or determining the level of at least one target, e.g., the target RNA. As used herein, the term “detecting” or “measuring” refers to observing a signal from, e.g. a sequencing read, a probe, label, or target molecule to indicate the presence of an analyte in a sample. Any method known in the art for detecting a particular label moiety can be used for detection. Exemplary detection methods include, but are not limited to, sequencing, spectroscopic, fluorescent, photochemical, biochemical, immunochemical, electrical, optical or chemical methods. In some embodiments of any of the aspects, measuring can be a quantitative observation. Sequence determination, e.g., that indicates or confirms the presence of a given barcode region is a form of detecting used herein.
In some embodiments of any of the aspects, a polypeptide or nucleic acid as described herein can be engineered. As used herein, “engineered” refers to the aspect of having been manipulated by the hand of man. For example, a polynucleotide is considered to be “engineered” when at least one aspect of the polynucleotide, e.g., its sequence, has been manipulated by the hand of man to differ from the aspect as it exists in nature.
As used herein, “contacting” refers to any suitable means for delivering, or exposing, an agent to at least one component as described herein (e.g., sample, target RNA, cDNA, amplification product, etc.). In some embodiments, contacting comprises physical human activity, e.g., an injection; an act of dispensing, mixing, and/or decanting; and/or manipulation of a delivery device or machine.
As used herein, the term “specific binding” refers to a chemical or physical interaction between two molecules, compounds, cells and/or particles wherein the first entity binds to the second, target entity with greater specificity and affinity than it binds to a third entity which is a non-target. In some embodiments, specific binding can refer to an affinity of the first entity for the second target entity which is at least 10 times, at least 50 times, at least 100 times, at least 500 times, at least 1000 times or greater than the affinity for the third non-target entity. A reagent specific for a given target is one that exhibits specific binding for that target under the conditions of the assay being utilized.
The term “statistically significant” or “significantly” refers to statistical significance and generally means a difference of two standard deviations (2SD) or greater.
Other than in the operating examples, or where otherwise indicated, all numbers expressing quantities of ingredients or reaction conditions used herein should be understood as modified in all instances by the term “about.” The term “about” when used in connection with percentages can mean ±1%. In some embodiments of any of the aspects, the term “about” when used in connection with percentages can mean ±5%.
As used herein, the term “comprising” means that other elements can also be present in addition to the defined elements presented. The use of “comprising” indicates inclusion rather than limitation.
The term “consisting of” refers to compositions, methods, and respective components thereof as described herein, which are exclusive of any element not recited in that description of the embodiment.
As used herein the term “consisting essentially of” refers to those elements required for a given embodiment. The term permits the presence of additional elements that do not materially affect the basic and novel or functional characteristic(s) of that embodiment of the invention.
The singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. The abbreviation, “e.g.” is derived from the Latin exempli gratia, and is used herein to indicate a non-limiting example. Thus, the abbreviation “e.g.” is synonymous with the term “for example.”
Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
Unless otherwise defined herein, scientific and technical terms used in connection with the present application shall have the meanings that are commonly understood by those of ordinary skill in the art to which this disclosure belongs. It should be understood that this invention is not limited to the particular methodology, protocols, and reagents, etc., described herein and as such can vary. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which is defined solely by the claims. Definitions of common terms in immunology and molecular biology can be found in The Merck Manual of Diagnosis and Therapy, 20th Edition, published by Merck Sharp & Dohme Corp., 2018 (ISBN 0911910190, 978-0911910421); Robert S. Porter et al. (eds.), The Encyclopedia of Molecular Cell Biology and Molecular Medicine, published by Blackwell Science Ltd., 1999-2012 (ISBN 9783527600908); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8); Immunology by Werner Luttmann, published by Elsevier, 2006; Janeway’s Immunobiology, Kenneth Murphy, Allan Mowat, Casey Weaver (eds.), W. W. Norton & Company, 2016 (ISBN 0815345054, 978-0815345053); Lewin’s Genes XI, published by Jones & Bartlett Publishers, 2014 (ISBN-1449659055); Michael Richard Green and Joseph Sambrook, Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012) (ISBN 1936113414); Davis et al., Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York, USA (2012) (ISBN 044460149X); Laboratory Methods in Enzymology: DNA, Jon Lorsch (ed.) Elsevier, 2013 (ISBN 0124199542); Current Protocols in Molecular Biology (CPMB), Frederick M. 
Ausubel (ed.), John Wiley and Sons, 2014 (ISBN 047150338X, 9780471503385); Current Protocols in Protein Science (CPPS), John E. Coligan (ed.), John Wiley and Sons, Inc., 2005; and Current Protocols in Immunology (CPI), John E. Coligan, Ada M. Kruisbeek, David H. Margulies, Ethan M. Shevach, Warren Strober (eds.), John Wiley and Sons, Inc., 2003 (ISBN 0471142735, 9780471142737), the contents of which are all incorporated by reference herein in their entireties.
Other terms are defined herein within the description of the various aspects of the invention.
All patents and other publications, including literature references, issued patents, published patent applications, and co-pending patent applications, cited throughout this application are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the technology described herein. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.
The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. For example, while method steps or functions are presented in a given order, alternative embodiments can perform functions in a different order, or functions can be performed substantially concurrently. The teachings of the disclosure provided herein can be applied to other procedures or methods as appropriate. The various embodiments described herein can be combined to provide further embodiments. Aspects of the disclosure can be modified, if necessary, to employ the compositions, functions and concepts of the above references and application to provide yet further embodiments of the disclosure. Moreover, due to biological functional equivalency considerations, some changes can be made in protein structure without affecting the biological or chemical action in kind or amount. These and other changes can be made to the disclosure in light of the detailed description. All such modifications are intended to be included within the scope of the appended claims.
Specific elements of any of the foregoing embodiments can be combined or substituted for elements in other embodiments. Furthermore, while advantages associated with certain embodiments of the disclosure have been described in the context of these embodiments, other embodiments can also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.
The technology described herein is further illustrated by the following examples which in no way should be construed as being further limiting.
Some embodiments of the technology described herein can be defined according to any of the following numbered paragraphs:
This project addresses the urgent need for high-throughput viral diagnostics. The rapid, exponential spread of the COVID-19 virus in the US and across the world has forced a switch from a containment to a mitigation strategy. A national-scale lockdown, while perhaps effective in the short run, is neither sustainable nor economically affordable. Learning from the experience of countries like China and South Korea, one strategy for resolving this crisis is to perform viral screening (and regular monitoring) at the population level: isolate the infected and let the others go to work. In particular, in a situation where there are significant numbers of infected but asymptomatic individuals in the population, population-wide testing is of vital importance. However, such a strategy requires a tremendously high testing capacity (e.g., >100,000,000 tests). As of Mar. 23, 2020, 80,000 tests had been performed across the US, with a testing capacity (e.g., <10,000 per day) that was not even enough to test all symptomatic patients. Even with the introduction of the high-volume testing systems (e.g., 10x higher throughput than conventional RT-PCR), there is at least a 100- to 1000-fold gap in testing capacity relative to need.
Described herein is an approach that uses a DNA barcoding strategy for multiplexed sample detection, to allow for massively parallel viral detection in 1,000 or more patient samples and several viral species, simultaneously. To achieve this, the method takes advantage of the tremendously high throughput of next-generation sequencing (NGS) platforms (e.g., 10 million reads per run on an Illumina MiSeq™ machine, and 10 billion reads on a NovaSeq™). Importantly, hundreds of these sequencing machines are set up in academic institutes and centralized core facilities across the country, and are readily convertible to clinical testing centers to meet the current urgent diagnostic needs. The method described herein allows highly-multiplexed viral testing (e.g., COVID-19, SARS, H1N1) in thousands of patient samples in a few hours, with an amortized instrument and reagent cost of <$1 per test. Successful implementation of this method allows massive-scale viral surveillance at a population level and can immediately impact the course of an infectious disease, such as the COVID-19 pandemic. As well as identifying asymptomatic carriers, these surveillance results provide critical data for better epidemiological understanding of the spatial and temporal dynamics of viral transmission. Apart from viral detection, the method further provides the ability to perform (e.g., partial) viral sequencing to allow monitoring of new subspecies and better understanding of their mutational and transmission dynamics. Combined, these results play a critical role in evaluating effective strategies (e.g., social isolation) and guiding public policy making for subsequent phases (e.g., months or years to come) in the battle against infectious disease, while reducing negative economic and social impacts.
Described herein is development of the workflow for highly-multiplexed RNA barcoding, sample pooling, library preparation and sequencing readout. Synthetic COVID-19 viral RNA (commercially available, e.g., from ATCC) is used as a test target. Tests involve multiplexing specificity and cross-talk and determination of the limit of detection, dynamic range, and uniformity of barcode detection sensitivity. One can test for and optimize different barcoding probes and reverse transcription primer designs, various reaction conditions (e.g., concentrations, temperature) and then test for large-scale (e.g., 1,000) multiplexed detection.
Current gold-standard RT-PCR protocols rely on RNA extraction before cDNA conversion, which limits the overall assay throughput and makes testing dependent on the availability of RNA extraction kits, which can be in short supply during pandemics. The methods described herein comprise an efficient cDNA conversion and barcoding method without a separate RNA extraction step. Methods for nuclease inhibition and reverse transcription are also utilized; see e.g., Myhrvold et al., Science, 2018. 360(6387): 444-448, the contents of which are incorporated herein by reference in its entirety. Mimicked clinical samples (e.g., spiked-in synthetic targets in human cell background) are used to assay cDNA conversion efficiency and overall detection sensitivity and uniformity across different barcodes. Finally, the method for multiplexed detection is tested with patient samples, through collaboration with hospitals. Such tests are cross-checked with standard RT-PCR methods to validate the test results and further quantify our limit of detection, false positive, and false negative rates in patient samples.
The principle of this approach is to use DNA barcoding to tag different patient samples (e.g., sample ID), as well as multiple viral species or genomic loci (e.g., locus ID) at the cDNA level, thus permitting highly-parallel readout by NGS sequencing. In contrast to traditional sequencing-based viral detection and assay methods, the approach does not sequence the viral genome. In some embodiments, it only reads out the two DNA barcodes. Additionally, the method uses limited pre-amplification in combination with bridge PCR to prevent the common problem of carryover contamination.
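As a rough illustration of the two-barcode readout described above, the following sketch tallies sequencing reads per (sample ID, locus ID) pair. The read layout (a sample barcode followed directly by a locus barcode), the barcode lengths, and all sequences and names are hypothetical placeholders chosen for the example; they are not the barcode designs of the method, and exact-match lookup is used only for simplicity.

```python
from collections import Counter

# Hypothetical barcode tables (illustrative sequences only).
SAMPLE_BARCODES = {"ACGTAC": "patient_001", "TGCATG": "patient_002"}
LOCUS_BARCODES = {"GGAT": "N_gene_locus_1", "CCTA": "S_gene_locus_1"}
SAMPLE_LEN = 6  # assumed length of the sample barcode
LOCUS_LEN = 4   # assumed length of the locus barcode


def tally_reads(reads):
    """Count reads per (sample ID, locus ID); skip reads with unknown barcodes."""
    counts = Counter()
    for read in reads:
        sample = SAMPLE_BARCODES.get(read[:SAMPLE_LEN])
        locus = LOCUS_BARCODES.get(read[SAMPLE_LEN:SAMPLE_LEN + LOCUS_LEN])
        if sample is not None and locus is not None:
            counts[(sample, locus)] += 1
    return counts


reads = ["ACGTACGGAT", "ACGTACGGAT", "TGCATGCCTA", "NNNNNNGGAT"]
counts = tally_reads(reads)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Because only the two barcodes are read, the per-read work reduces to two table lookups; an error-tolerant lookup could be substituted where sequencing errors in the barcodes are a concern.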
The molecular workflow for the method comprises four steps (see e.g.,
A single sequencing run on a MiSeq machine (e.g., 20 million reads), for 1,000 patient samples and 20 genomic loci, gives an average of 1,000 reads per patient/locus pair. This matches well with the clinically observed dynamic range of viral load, and indicates that the method can not only report the existence or absence of virus, but can also provide quantitative information on the patient’s viral load (e.g., around the swab sampling area). The test result can be interpreted as positive when most of the 20 locus IDs (e.g., >15) are observed (e.g., associated with a particular patient ID); and negative when none or only a few are observed (e.g., <5). The assay is therefore highly robust against sample degradation and barcode cross-talk.
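The read budget and the interpretation rule in the preceding paragraph can be sketched as follows. The function names are hypothetical, and the "indeterminate" outcome for intermediate locus counts is an assumption added for the example, since the text specifies only the positive (>15) and negative (<5) cases.

```python
def mean_reads_per_pair(total_reads, n_samples, n_loci):
    """Average number of reads available per patient/locus pair."""
    return total_reads / (n_samples * n_loci)


def call_sample(n_loci_observed, positive_above=15, negative_below=5):
    """Interpret one sample from the number of locus IDs observed (out of 20).

    Thresholds follow the example values in the text; the 'indeterminate'
    category for in-between counts is an assumption of this sketch.
    """
    if n_loci_observed > positive_above:
        return "positive"
    if n_loci_observed < negative_below:
        return "negative"
    return "indeterminate"


# 20 million reads shared by 1,000 samples x 20 loci
print(mean_reads_per_pair(20_000_000, 1_000, 20))
print(call_sample(18), call_sample(2), call_sample(10))
```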
(a) Multiple viral pathogens can be tested simultaneously for differential diagnosis, by extending the pool of locus-specific probes to target different viral genomes (e.g., COVID-19, SARS, H1N1). (b) A unique molecular identifier (UMI) can be incorporated on the reverse primer, to allow digital counting of viral load. (c) A short segment of viral genome can be sequenced, immediately following the barcode regions, to provide viral sequence and mutation information at locations critical for the study of virus-host interaction and potentially vaccine development (e.g., the ACE2 binding site on the SARS-CoV-2 spike protein). (d) cDNA conversion and barcoding can be performed in one reaction, after heat inactivation of the virus and in the presence of nuclease inhibitor and viral transport medium (VTM).
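For option (b), digital counting with a UMI amounts to counting distinct UMI sequences per sample and locus, rather than raw reads. The sketch below is a minimal illustration under the assumption that each read has already been parsed into a (sample ID, locus ID, UMI) tuple and that UMIs are free of sequencing errors; the record layout and names are hypothetical.

```python
from collections import defaultdict


def count_umis(records):
    """Return the number of distinct UMIs seen per (sample ID, locus ID).

    records: iterable of (sample_id, locus_id, umi) tuples, one per read.
    PCR duplicates share a UMI and are therefore counted once.
    """
    seen = defaultdict(set)
    for sample_id, locus_id, umi in records:
        seen[(sample_id, locus_id)].add(umi)
    return {pair: len(umis) for pair, umis in seen.items()}


records = [
    ("patient_001", "locus_1", "AAGT"),
    ("patient_001", "locus_1", "AAGT"),  # PCR duplicate of the read above
    ("patient_001", "locus_1", "CCTG"),
    ("patient_002", "locus_1", "GGAC"),
]
molecule_counts = count_umis(records)
print(molecule_counts)
```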
The workflow for multiplexed viral RNA barcoding and detection can be used to detect 1,000 samples in a single sequencing run. A pilot test can be performed that multiplexes 100 samples. With demonstration of massively multiplexed viral detection in 1,000 patient samples, this workflow can be implemented in local hospitals.
In some embodiments, a sequencer can be used as a single molecule detector without amplification. Also, by employing DNA barcoding, several bulk biochemistry steps can be performed after pooling the individual molecules, while still recovering the identities of the individuals who contributed the samples. The assay can be expanded to multiple individuals and multiple viruses simultaneously. This technique can be immediately extended to as many viruses as one wishes and used to look at the spread of genetic variants in the populations of many samples all at once, taken from tens of thousands to hundreds of thousands of individuals. With this information, epidemiologists can design optimal strategies for predicting the course of an epidemic and for designing a strategy to contain the epidemic by identifying the carriers and segregating them from a healthy population. This kind of technique can be used to measure the efficacy of anti-virals and vaccines in smaller populations in clinical trials.
The goal for the method described herein is to reduce or remove as many pre-processing steps as possible to cut down the labor and material requirement for scaling up; such pre-processing steps include, e.g., RNA extraction, pre-amplification, and the logistics of sample handling. Barcoding and sequencing methods allow for low-crosstalk, high-dynamic-range readout. Such methods are referred to herein as “one-step” and/or “one-Seq” methods, e.g., from the patient and logistic perspective. For the patient, such methods allow at-home sample collection and remove the burden of a heating step at home. For the testing facility, such methods remove any per-tube reaction (e.g., RNA extraction, PCR/thermocycling) and any nontrivial robotic pipetting.
See, e.g., Table 1 for exemplary advantages of the sequence-based detection methods as described herein.
The workflow of the method described herein comprises barcoding at the first step. There is also no pre-amplification before pooling, allowing for a simpler biochemistry reaction in a complex environment, multiplexed detection, and semi-quantitative readout. The method also involves short amplicon sequencing.
Biochemically, the methods described herein comprise: a one-step RT reaction, e.g., in the presence of viral media and/or saliva; a multiplexed RT reaction; sample preservation before reaching a central testing facility; and/or a positive control for sample quality, amount, and/or RT reaction. See e.g.,
In summary, the one-step reaction system for viral lysis and efficient reverse transcription described herein is compatible with multiplexed RT reactions and sample storage at room temperature for up to 24 hrs; furthermore, the high-throughput sequencing readout method demonstrates a high dynamic range (see e.g.,
Outlook: Regulatory agencies, such as the FDA, have granted Emergency Use Authorization (EUA) to NGS-based COVID-19 diagnostic tests (e.g., IDT). The methods described herein can also be used for COVID-19 diagnostics, using sequence-optimized primers and barcodes, as well as multiplexed viral and viral loci detection.
The management of pandemics, such as COVID-19, requires highly scalable and sensitive viral diagnostics, together with variant identification. Next-generation sequencing (NGS) has many attractive features for highly multiplexed testing; however, current sequencing-based methods are limited in throughput by early processing steps on individual samples (e.g., RNA extraction and PCR amplification). Described herein is a method, “One-Seq”, that eliminates the bottlenecks in scalability by permitting early pooling of samples, before any extraction or amplification steps. To permit early pooling, a one-pot reaction is used for efficient reverse transcription (RT) and upfront barcoding in extraction-free clinical samples, together with a “protector” strategy in which carefully designed competing oligonucleotides prevent barcode crosstalk and preserve detection of the high dynamic range of viral load in clinical samples. One-Seq is highly sensitive, achieving a limit of detection (LoD) down to 2.5 genome copy equivalents (gce) in contrived RT samples, 10 gce in multiplexed sequencing, and 2-5 gce with multi-primer detection, indicating an LoD of 100-250 gce/ml for clinical testing. In clinical specimens, One-Seq showed quantitative viral detection against clinical Ct values with 6 logs of linear dynamic range and detection of SARS-CoV-2 positive samples down to ~300 gce/ml. In addition, One-Seq reports a number of hotspot viral mutations, allowing variant identification, at equal scalability with no extra cost. Scaling up One-Seq allows a throughput of 100,000-1,000,000 tests per day per single clinical lab, at an estimated amortized reagent cost of $3 per test and turn-around time (TAT) of 7.5-15 hr.
Highly-scalable and highly-sensitive viral diagnostics (e.g., for SARS-CoV-2) are critical for both pandemic response and long-term epidemiological surveillance. During a pandemic, population-wide testing can provide effective control and monitoring of the viral spread and allow safe return to work. In the long term, regular and population-wide monitoring promises a “bio-weather map” to identify and forecast new viral infection hotspots, preventing the “next outbreak”. Furthermore, the ability to sequence and identify emerging viral variants (e.g., B.1.1.7, B.1.427 for SARS-CoV-2), also on the population scale, allows real-time monitoring of the rate of transmission and pathogenicity, as well as informing public health policies and vaccine development. Current diagnostic methods fall short of these requirements, as they are limited in either sample processing throughput, testing sensitivity and reliability, or the ability to identify different viral variants.
At present, molecular tests using “gold standard” reverse transcription quantitative polymerase chain reaction (RT-qPCR) in central laboratory facilities have demonstrated high detection sensitivity (down to 200 gce/mL-1,000 gce/mL of SARS-CoV-2, by the FDA’s comparison panel results), but they are limited in throughput by the requirements of RNA extraction and PCR thermocycling on each sample individually, as well as other liquid handling operations; see e.g.,
Next-generation sequencing (NGS) based methods have long been attractive alternatives to RT-qPCR in two ways: (i) the intrinsic high-throughput readout for multiplexed diagnostics, and (ii) the ability to obtain viral genome sequences for variant identification. In principle, the very high throughput (up to 10^10 reads per session, on an Illumina NovaSeq™ machine) allows a single testing lab to process up to a million patient samples per day with pooled analysis, if the handling of individual samples can be avoided. Since the beginning of the COVID-19 pandemic, several methods for NGS-based multiplexed testing have been proposed and developed. See e.g., Bloom et al., Swab-Seq: A high-throughput platform for massively scaled up SARS-CoV-2 testing, medRxiv (Aug. 6, 2020); Illumina™ COVIDSeq Test Instructions for Use (May 1, 2020); Hossain et al. A massively parallel COVID-19 diagnostic assay for simultaneous testing of 19200 patient samples. Google Docs (Mar. 20, 2020); Schmid-Burgk et al. LAMP-Seq: Population-Scale COVID-19 Diagnostics Using a Compressed Barcode Space bioRxiv (Apr. 8, 2020); Wu et al., INSIGHT: A population-scale COVID-19 testing strategy combining point-of-care diagnosis with centralized high-throughput sequencing. Sci Adv 7, (Feb. 12, 2021); Yelagandula et al. SARSeq, a robust and highly multiplexed NGS assay for parallel detection of SARS-CoV2 and other respiratory infections (medRxiv, Nov. 3, 2020); the contents of each of which are incorporated herein by reference in their entireties.
As expected, methods that reported detection sensitivity close to the RT-qPCR tests (200-1000 gce/ml) mostly followed the traditional barcoding and sequencing workflows and require individual RNA extraction and PCR thermocycling steps; see e.g.,
To overcome these limitations, described herein is a sequencing-based method that achieves high sensitivity, high throughput, and identification of viral variants. To obtain high throughput a “pooling-before-amplification” strategy was implemented (see e.g.,
To overcome the bottleneck in throughput, One-Seq introduces a “pooling-before-amplification” strategy (see e.g.,
Such a workflow involves at least two critical challenges. First, the one-step, extraction-free reaction has to perform three tasks simultaneously: viral lysis and release of viral RNA, an efficient reverse-transcription that allows high-sensitivity viral detection, and preservation of patient samples at room temperature for up to 24 hr during sample collection and transport to the central lab. Second, by performing pooling before amplification, the library amplification reaction must faithfully preserve the high dynamic range of viral load known to exist in clinical samples (e.g., up to a 10^6- to 10^7-fold range), and at the same time achieve high detection sensitivity. In particular, the method needs to stringently avoid any barcode crosstalk that can arise from amplification and sequencing steps, as this crosstalk would result in false positive diagnoses. The detection methods described herein overcome at least those challenges.
Described herein is an extraction-free and high-sensitivity method for viral lysis and reverse transcription (RT), which can be performed in the presence of potential inhibitors in patient samples (e.g. NP swab or saliva). Since reverse transcriptases are in general more resistant to inhibitors than thermostable polymerases, there is an unappreciated advantage in separating the RT and PCR steps in the traditional RT-PCR workflow, since this allows more flexibility in formulating the RT reaction mix. To assay RT efficiency in the presence of inhibitors, contrived standard samples were prepared with human saliva collected from COVID-19 negative donors and viral RNA spike-in (e.g., synthetic RNA fragment by in vitro transcription (IVT), or full-length RNA genome from Twist Bio Sciences™). First, the RNA protection effects of different RNase inhibitors were compared, and Murine™ (New England Biolabs™) and RNAsin™ (Promega™) provided the best and similar protection at 25° C. to 50° C. The RT efficiency of various reverse transcriptases was then compared in saliva-containing samples (see e.g.,
Contrived clinical samples were next prepared using pooled COVID-19 negative remnant clinical specimens (nasopharyngeal (NP) swab in viral transport medium (VTM), N=15), with spiked-in inactivated virus standard (heat-inactivated SARS-CoV-2 from ATCC, VR-1986HK; or AccuPlex™ SARS-CoV-2 verification panel from SeraCare™, 0505-0168) (see e.g.,
To assay the analytical sensitivity of RT reaction, a roughly 2x dilution series was prepared of inactivated virus standard (ATCC) in contrived clinical samples, ranging from 100 genome copy equivalent (gce) to less than 1 gce per reaction. The RT product was assayed by qPCR in triplicate (see e.g.,
To optimize viral lysis and RNA release, the effect of using detergent was tested; see e.g., Smyrlaki et al., Nat Commun 11, 4812 (Sep. 23, 2020); Srivatsan et al. Preliminary support for a “dry swab, extraction free” protocol for SARS-CoV-2 testing via RT-qPCR (Biorxiv, Apr. 23, 2020); the content of each of which is incorporated herein by reference in its entirety. The addition of mild detergent (Triton X-100) improved the detection sensitivity by ~5x from extraction-free viral samples, from a limit of detection (LoD) = 50 gce to 10 gce (3/3 detection; see e.g.,
Multiplexed RT with multiple primers provides the ability for multi-loci and multi-virus monitoring as well as increased detection sensitivity. This effect was tested using the two SARS-CoV-2 N-gene-targeting primers in contrived clinical samples. Indeed, there was a roughly 2-fold higher detection sensitivity (LoD = 1 gce) when signals from both primers were considered (see e.g.,
The one-pot reaction system can also stabilize patient samples for up to 24 hr at room temperature, during the delay between sample collection and transport to central testing lab. To work out the parameters, using contrived saliva samples with synthetic RNA spike-in (IVT), a list of stabilization agents were screened for their sample preserving effect, including antibiotics and antimycotics, protease inhibitors, reducing agents and metal chelating agents. The stabilization agents can be grouped into RNA-preserving (e.g., EDTA and DTT) and RT enzyme-preserving (e.g., antibiotic and antimycotic, protease inhibitor) factors. Their effects were tested in contrived clinical VTM samples prepared as above, with inactivated virus spike-in. After 24 hr incubation at room temperature, both groups individually improved RT efficiency by roughly 2-fold (see e.g.,
The sample stabilization buffer was also tested in contrived saliva samples (see e.g.,
Described herein is a “pooling-before-amplification” workflow for sample pooling and PCR library amplification that not only maintains the high detection sensitivity and preserves signal linearity, but also preserves high sample dynamic range and allows quantitative report of viral load in patient samples.
A set of PCR primers was first designed for efficient library amplification (see e.g., Table 4). For each RT target, several different reverse primers were designed and the best one was selected based on library amplification efficiency by qPCR and band purity by gel electrophoresis. For Illumina™ sequencing machines, a large set of distinct sample barcodes needs to be error-tolerant and color-balanced. The IDT for Illumina™ unique dual (UD) index set (384 dual index pairs) was concatenated and expanded to 960 unique barcodes by inserting three blocks of sequence tags (see e.g.,
Amplification efficiency and dynamic range were tested for these selected barcodes, with a 10x dilution series (see e.g.,
Suppressing off-target barcode crosstalk and preserving high sample dynamic range are critical for faithful diagnostics, such as for COVID-19, since clinical samples have been shown to exhibit a large dynamic range (up to 10^6 to 10^7) of detectable viral load, and any barcode mis-assignment could result in false positive diagnoses; see e.g., Bar-On et al. SARS-CoV-2 (COVID-19) by the numbers. Elife 9 (Mar. 30, 2020); Arnaout et al., supra. The degree of barcode crosstalk in the workflow was first assayed by pooling 1 or 10 barcoded RT samples prepared with high spiked-in viral load together with 95 or 86 negative samples with other barcodes, and sequencing reads carrying any of the off-target barcodes were tallied (see e.g.,
A major source of barcode crosstalk in a “pooling-after-amplification” workflow is cross-hybridization of excess library adapters during the cluster amplification process, which then produces mis-barcoded transcripts; see e.g., Kircher et al., Nucleic Acids Res 40, e3 (2012). A similar mechanism with cross-hybridized excess RT primers during the library amplification step can account for the main source of the 0.1% barcode crosstalk observed in the One-Seq workflow. Methods for minimizing crosstalk using unique dual indices are not compatible with a “pooling-before-amplification” strategy. Described herein is a strategy to reduce this crosstalk by suppressing cross-hybridization of excess RT primers, e.g., during the PCR step (see e.g.,
First, a simple test of this protector strategy was performed using a short DNA amplicon together with an off-target barcoded RT primer, and using qPCR as the readout. The test included several different protector strand designs, including a naive approach using the complement of the RT primer sequence (see e.g.,
Next, the protector strategy was tested in multiplexed sequencing settings and in contrived clinical samples, following similar test design as above (1-10 high-load sample along with ~90 off-target barcodes) (see e.g.,
Performance of the method was validated using SARS-CoV-2 positive clinical samples (see e.g.,
To test the detection sensitivity as well as dynamic range of our method, a set of representative COVID-19 positive samples (NP swab in VTM) was chosen that spanned a wide range of clinical Ct values (e.g., from 15 to 38), and the samples were subjected to the One-Seq workflow. For this test, three distinct barcodes were mixed together for each sample and their sequencing reads were summed, to maximize the sensitivity and robustness of detection. The first assay tested the detection sensitivity of One-Seq and its dependence on input sample volume (see e.g.,
The lowest sample concentration detected was 360 gce/ml (Ct = 34.39), indicating that One-Seq can detect clinical samples with viral load in the 200-500 gce/ml range, using a single amplicon. There was a linear correlation between the detected sequencing reads and estimated viral load (calculated from clinical Ct values), over the entire range of Ct values (from 15 to 35), demonstrating that One-Seq faithfully reports viral load in a quantitative manner over 6 logs of dynamic range (see e.g.,
In this test, there were three clinically determined positive samples that were not detected. Notably, all three had only one of the two targets detected by RT-qPCR (i.e., either the SARS-CoV-2 N gene or SARS-CoV-2 ORF1ab gene was not detected), and they all had Ct values >36 for the detected target. If these samples were indeed positive, they were likely missed by the One-Seq test due to the small sample volume (6 ul) used in this test as compared to a typical RT-qPCR test (300 ul or more); further increasing sample volume can improve the detection sensitivity.
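The span of roughly 6 logs quoted above for Ct values from 15 to 35 can be checked with simple arithmetic, under the assumption that each PCR cycle doubles the template. This idealized amplification efficiency is an assumption of the sketch, not a measured value, and the function name is hypothetical.

```python
import math


def fold_change_from_ct(ct_low, ct_high, efficiency=2.0):
    """Fold difference in template between two Ct values.

    Assumes the template is multiplied by `efficiency` in every cycle
    (2.0 = perfect doubling, an idealization).
    """
    return efficiency ** (ct_high - ct_low)


fold = fold_change_from_ct(15, 35)
logs = math.log10(fold)
print(f"{fold:.3g}-fold, about {logs:.2f} logs")
```

Twenty cycles of doubling correspond to about a million-fold difference in input, which is where the 6-log figure comes from; a lower real-world efficiency would give a somewhat smaller span.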
Simultaneous detection using multiple RT primers allows multi-loci, multi-virus diagnostics, with increased viral detection sensitivity. Furthermore, if the RT primers are designed to be in close proximity to mutation hotspots (see e.g.,
RT primers were designed targeting several characteristic mutations in the SARS-CoV-2 S gene for the reported variant B.1.1.7, including del69-70, del144, N501Y, D614G and A701V, and dye-based qPCR was used to assay for RT efficiency. It was not always easy to design good RT primers in close proximity to the target mutations, likely due to the presence of strong local secondary structure in the RNA (see e.g.,
In silico analysis was performed for primer inclusivity and specificity for all designed primer pairs, following FDA guidelines. All primers aligned to all available SARS-CoV-2 genome sequences in the NCBI database (98,765 sequences) with at most 1 base mismatch, and 7 out of the 8 primers showed exact match to >99.4% of all sequences (see e.g., Table 7). Since One-Seq performs RT and PCR in separate steps, cross-reactivity analysis was only performed on RT primers. All four RT primers showed no significant (>80%) homology to genome sequences of common respiratory flora and other related viruses (see e.g., Table 8). In addition, One-Seq reads a short sequence into the viral genome, providing highly specific viral detection.
Next, a confirmatory clinical sensitivity test was performed for all designed primer pairs (4 in total) in a similar 96x multiplexed format, in both single-primer and multi-primer settings (see e.g.,
For a 4-primer multiplexed test with a 20 ul patient sample intake, this result translates to an LoD = 100-250 gce/ml in clinical samples, approaching the detection limit of RT-qPCR tests. Further increasing sample input volume, or using more primers in parallel can both further increase the detection sensitivity in a linear fashion, e.g. taking 300 ul specimen (typical for RT-qPCR tests) can allow an LoD down to 5-10 gce/ml.
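The conversion between a per-reaction detection limit and a viral load concentration can be sketched as follows. This is a minimal illustration of the arithmetic only; the function name `lod_gce_per_ml` is hypothetical and not part of the One-Seq workflow, and the inputs are the 2-5 gce (four primers) and 10 gce (single primer) per 20 ul reaction figures stated in this description.

```python
def lod_gce_per_ml(gce_per_reaction: float, sample_volume_ul: float) -> float:
    """Convert a detection limit in gce per reaction into gce per ml of sample.

    Hypothetical helper: assumes the whole sample volume enters one reaction.
    """
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    # 1 ml = 1000 ul, so scale the per-ul concentration by 1000.
    return gce_per_reaction * 1000.0 / sample_volume_ul


# Four-primer detection: 2-5 gce in a 20 ul sample intake.
four_primer_lod = (lod_gce_per_ml(2, 20), lod_gce_per_ml(5, 20))
# Single-primer detection: 10 gce in a 20 ul sample intake.
single_primer_lod = lod_gce_per_ml(10, 20)

print(four_primer_lod)    # (100.0, 250.0)
print(single_primer_lod)  # 500.0
```

Because the concentration is inversely proportional to the input volume, a larger sample intake at the same per-reaction limit gives a proportionally lower LoD in gce/ml.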
Finally, One-Seq was tested for multi-primer detection in clinical samples in a 96x multiplexed format, consisting of 56 COVID-19 clinical samples (two repeats of 28 specimens), 24 contrived standards, and 16 no-target negative controls (see e.g.,
Described herein is a method for viral RNA molecular diagnostics (e.g. SARS-CoV-2) that allows highly scalable central lab testing, achieves high detection sensitivity, and provides sequence information at targeted mutation hotspots, allowing for viral variant identification. To permit such high scalability, the method includes a “pooling-before-amplification” strategy and avoids the high-complexity steps of RNA extraction and PCR thermocycling, thus eliminating current bottlenecks in scalability. To permit early pooling, a one-pot reaction was used for efficient reverse transcription (RT) and upfront barcoding, and a “protector” strategy was used that preserved the high dynamic range of viral load in patient samples. One-Seq can reach a high detection sensitivity in unextracted samples, down to 10 gce (e.g., per 20 uL sample) by multiplexed sequencing for a single primer, and down to 2-5 gce (e.g., per 20 uL sample) for multi-primer detection with four primers. Assuming 20 ul sample intake, this is equivalent to a viral load of 100-250 gce/ml in unextracted patient sample, approaching the maximum sensitivity of extraction-based RT-qPCR assays. Scaling up sample volume can further improve the detection sensitivity linearly. In clinical samples, One-Seq quantitatively reported patient viral load, preserved 6 logs of linear dynamic range of viral load (estimated from clinical Ct values), and detected SARS-CoV-2 positive samples down to ~300 gce/ml in viral load. One-Seq further reports sequences at a number of viral mutation hotspots, allowing for variant identification at equal scalability with no extra cost.
One-Seq can be used with a two-stage barcoding and pooling strategy to test a large number (e.g., 100,000) of patient specimens, without the need to design and manufacture an equally large number of distinct barcodes (see e.g.,
One-Seq is highly scalable and cost-effective, with a fast turn-around (see e.g., Table 2). Using a high output Illumina™ sequencer such as the NovaSeq™ 6000, the maximum sample throughput is 100,000-160,000 samples per day per machine, allowing an overall throughput of up to 1,000,000 tests per day in a single clinical lab, using multiple sequencers. Further increases in sample throughput as well as cost reduction are possible with other sequencing modalities (e.g. Oxford Nanopore PromethION™ 48 allows 5x lower sequencing reagent cost, and up to 180,000 tests per day at comparable capital cost) (see e.g., Table 2). Depending on the sequencer model used and whether batch pooling and viral sequencing are desired, One-Seq sample turn-around time (TAT) ranges from a minimum of 7.5 hr (for a single batch on a MiSeq™, without viral sequencing) to a maximum of 14.5 hr (for batch pooling on a NovaSeq™ 6000, with viral sequencing), allowing for diagnostic results to be available within 24 hr of sample collection or drop-off (see e.g., Table 9). The cost per sample for the One-Seq method also scales favorably for highly-multiplexed settings. At relatively small scale (e.g., 80 samples per run on a MiSeq™ sequencer) and using off-the-shelf reagents, the cost of the method is $20 per test; at large scale (e.g., 40,000 samples per run on a NovaSeq™), sequencing reagent cost is reduced to <$0.5 per sample, and mass production can lower enzyme and reagent cost by 70% or more, bringing the total cost down to $3 (see e.g., Table 10). Due to the minimal sample processing needed for the One-Seq workflow, the consumable cost (e.g., tips, tubes) is also considerably lower, making the total cost per test lower than that of RT-qPCR or other sequencing-based testing methods.
In addition to scalability, One-Seq also shows superior performance in comparison with other methods, offering high detection sensitivity (down to LoD = 100-250 gce/ml) and the ability to test unextracted clinical samples (see e.g., Table 3). Taken together, One-Seq offers a technically and economically viable solution for highly-scalable testing on a population scale.
One-Seq also allows detection of viral hotspot mutations and monitoring of their transmission dynamics (see e.g., Table 3). This is especially important as certain mutations can confer a higher transmission rate or pathogenicity (e.g. B.1.1.7 of SARS-CoV-2) or evasion from immunity induced by vaccination or prior infection (e.g. E484K of SARS-CoV-2). It has been increasingly appreciated that identifying and tracking viral variants is as critical as diagnostic screening, and sequencing remains the only method available for effective variant identification. Current whole-genome sequencing (WGS) methods (e.g. Illumina™ COVIDSeq) typically require 50-100x sequencing reads for the same sample and are further bottlenecked in throughput by the PCR-limited sample preparation steps. In contrast, One-Seq uses targeted sequencing that requires far fewer reads per sample, and allows much higher scalability and lower amortized cost. Therefore, One-Seq is ideally suited for variant identification and tracking.
One-Seq can be clinically implemented in at least one of two ways to permit highly-scalable viral diagnostics (see e.g.,
Finally, One-Seq is flexible in at least two important ways: it can be continually updated in a matter of days to include RT primers targeting emerging viral mutations as they appear, providing a real-time monitoring of viral evolution and transmission during an ongoing pandemic; and it can be targeted to detect any single-stranded RNA viruses of positive and negative sense, including the common cold, seasonal flu, hepatitis, dengue, Ebola, West Nile, Zika, and more, or a number of them in a multiplexed manner. One-Seq allows for population-scale surveillance with a panel of viruses of special concern, allowing for the reporting of a “bio-weather map” for the early identification and tracking of emerging viral hotspots, in order to help prevent future viral outbreaks.
All clinical specimens and saliva samples used in the study were deidentified. Remnant clinical nasopharyngeal swab samples were obtained from Boca Biolistics™. None of the clinical specimens were heat-inactivated prior to use, and all operations with clinical specimens were performed inside a biosafety cabinet (BSC) following BL2+ safety protocols. SARS-CoV-2 inactivated virus standard materials were obtained from ATCC (VR-1986HK) or SeraCare™ (AccuPlex™ 0505-0168). In vitro transcribed SARS-CoV-2 viral N gene mRNA was prepared with the Invitrogen™ MAXIscript™ T7 transcription kit (ThermoFisher™, AM1312), following the manufacturer’s protocol. The template DNA was prepared from N positive control plasmid (IDT, 10006625) with T7 promoter-containing primers, and purified from an agarose gel using the QIAquick™ PCR purification kit (QIAGEN, 28104).
For clinical limit of detection studies, pooled confirmed COVID-19 negative remnant nasopharyngeal swab specimens purchased from Boca Biolistics™ (N=15) were used. Pooled clinical samples were then spiked with ATCC or SeraCare™ inactivated virus standard, or in vitro transcribed viral RNA at various specified concentrations, pre-diluted into viral transport medium (VTM). VTM was prepared with 2% FBS (heat-inactivated at 56° C. for 30 min, Gibco™ 26140079), 1x Antibiotic-Antimycotic (Gibco™, 15240096) and 11 mg/L phenol red, in 1x Hank’s balanced salt solution (Gibco™, 14025092). None of the contrived clinical samples were pre-heat-inactivated before the one-pot reverse transcription step.
For reverse transcription efficiency studies, pooled saliva specimens collected from COVID-19 negative donors were used, either with (N=4, “clean”) or without (N=9, “dirty”) mouth rinsing before collection. Pooled saliva samples were then spiked with ATCC inactivated virus standard, or in vitro transcribed viral RNA, at specified concentrations, as above.
Reverse transcription primers were designed following these criteria: (i) Tm (calculated with IDT oligo analyzer, RNA-targeting primer) in the range of 54° C.-60° C., (ii) strong 3′-end binding (e.g., the presence of G or C bases within the last five bases from the 3′ end of primers (i.e., GC clamp) helps promote specific binding at the 3′ end due to the stronger bonding of G and C bases), and (iii) high sequence coverage of available SARS-CoV-2 genomes and low homology with SARS, MERS, and related viral sequences. Furthermore, RT primers targeting mutation hotspots were designed to be in close vicinity (e.g., within 5 nt) to the targeted loci, to avoid significantly increasing the sequencing runtime (see e.g.,
960 unique patient barcodes were designed by concatenating the i7 and i5 sequences and further expanding from IDT for Illumina™ Unique Dual Index set (4x96=384 pairs in total; see e.g.,
Sequencing constructs were designed using custom read primers and PCR adapters. Read primers were designed to be orthogonal to sequencing adapters and have Tm > 70° C. A short PCR adapter sequence, which forms a part of the read 1 primer, was designed to allow for pooled amplification using a common forward primer and also compatible with the protector strand. A detailed illustration of the sequencing construct including example sequences are given in
A full list of all primers, barcodes and adapters used in this study is provided in Tables 4-6 (Table 4: primers, adapters, batch barcodes; Table 5: 960 sample barcodes; Table 6: 96 selected sample barcodes).
Positive control RNA (e.g., SEQ ID NO: 11) was designed to start with the same RT primer with the SARS-CoV-2 N gene targeting primer N#1, and extended with 8 nt sequence distinct from the viral genome. Synthetic RNA was purchased from IDT, and spiked into all samples at a concentration of 10^4-10^5 copies/ul to provide positive control reads.
One-pot sample reaction for viral lysis, reverse transcription and sample barcoding was performed with SuperScript™ IV reverse transcriptase (Thermo™, 18090010) in manufacturer-provided reaction buffer (without DTT), supplemented with 10% (v/v) murine RNase inhibitor (New England Biolabs™, M0314), 0.1% Triton X-100, 1x Antibiotic-Antimycotic (Gibco™, 15240096), 0.5 mM EDTA, 5 mM DTT, cOmplete™ protease inhibitor cocktail (1 tablet into 13.3 ml, Sigma™, 11873580001), 0.5 uM poly-A60 DNA oligonucleotide, 15 ug/ml E. coli tRNA (Sigma™, 10109541001) and 10^4-10^5 copies/ul synthetic RNA for positive control, with the further addition of 35-50% (v/v) equivalent of viral transport media or pooled clinical or saliva sample and 125 nM of barcoded RT primer (for each primer). For limit of detection studies, inactivated virus standard from ATCC or SeraCare™ was spiked into the one-pot reaction at specified concentrations. For barcode crosstalk studies, in vitro transcribed viral mRNA was used. For viral lysis and sample preservation studies, different subsets of the above components were added to the reaction mix. For primer concentration studies, 25 nM-500 nM of barcoded RT primers were used. For multiplexed sequencing samples, a master mix of the above reaction mix without barcoded primer and contrived clinical sample was first prepared and aliquoted into a 96-well plate, then RT primers with unique barcodes and samples were added to each well.
One-pot reactions were assembled on ice-cold blocks. Once assembled, the reaction was incubated at 50° C. for 30 minutes (min), followed by inactivation at 95° C. for 5 min. For tests with contrived samples, incubation was performed in a closed-lid PCR thermocycler; for tests with clinical specimen, incubation was performed in a heat block, and followed by another inactivation session at 95° C. for 5 min in a closed-lid thermocycler once moved out of the BSC. For sample preservation studies, the assembled reaction was left at room temperature and covered for up to 24 hours (hr) before starting the 50° C. incubation.
For limit of detection studies for N#1 and N#2 primers, and RT quality control for clinical sample tests, qPCR was performed after the one-pot sample reaction. 0.5 ul-1.0 ul of one-pot reaction sample was added to 40 ul qPCR mix (40x-80x dilution), containing Taq polymerase and standard buffer (New England Biolabs™, M0273), 0.2 mM dNTP mix and CDC SARS-CoV-2 primer and probe set at 0.5 uM equivalent primer concentration (IDT RUO kit, 10006713). Formation of cloudy aggregates was observed in certain clinical samples after the one-pot reaction. In such situations, to ensure adequate sample intake, the one-pot reactions were mixed by pipetting a few times before being added to the qPCR reaction. For limit of detection studies for variant targeting primers, qPCR was performed with dye-based readout, using Luna™ universal qPCR master mix (New England Biolabs™, M3003) and 0.5 uM of both forward and reverse PCR primers.
qPCR samples were run on a Bio-Rad™ C1000 thermal cycler and CFX real-time PCR system for 50 cycles, and optionally with melt curve measurement for dye-based readout. Ct values were determined by manufacturer’s auto-thresholding function when possible. For preliminary clinical sensitivity studies, limit of detection (LoD) was determined to be the lowest viral spike-in concentration at which all 3/3 tests yielded a valid Ct value. For dye-based qPCR, results were interpreted with melt curve analysis instead of Ct values.
One-pot reaction samples (20 ul-80 ul each) were pooled using multichannel pipettes from a 96-well plate into a single tube and immediately subjected to cDNA purification using a spin column (QIAquick™ PCR purification kit, QIAGEN 28104) or a bead-based method (MagMax™ viral/pathogen nucleic acid isolation kit, Thermo™ A42352). The manufacturer’s protocols were adapted for large input sample volume and high sensitivity recovery. For column purification, the sample was added multiple times to the same spin column. For bead purification, large 50 ml conical tubes were used, and centrifugation (e.g., 3,000 rcf for 3 min) was used instead of magnetic attraction for effective collection of the beads. To ensure maximum recovery, only DNA low-bind tubes and pipette tips were used for this step. The purified cDNA library was supplemented with carrier DNA and RNA (e.g., poly-A60 oligonucleotide and E. coli tRNA) to further avoid sample loss on tube walls. For purification method comparison studies, QIAquick™ nucleotide removal kit (QIAGEN, 28304) was also compared to AMPure™ XP beads (Beckman Coulter™, A63880), both following manufacturer’s protocols.
The pooled and purified cDNA library was amplified in a dUTP-incorporating PCR reaction, using Luna™ universal qPCR master mix (New England Biolabs™, M3003), supplemented with Uracil-DNA Glycosylase (UDG) enzyme at 25 units/ml (New England Biolabs™, M0372). For single-primer detection, 0.25 uM of both forward and reverse primers were used. For multi-primer detection with 4 primers, 0.5 uM of forward and 0.125 uM of each of the reverse primers were used. For multiplexed sequencing tests on clinical samples, 2 uM protector oligonucleotide was added. For protector concentration studies 0.5 uM-5 uM protector was used. For barcode crosstalk studies, a mixture of 86 or 95 off-target barcoded RT primers was further supplemented into the reaction. Library amplification was run for 40-50 cycles with a custom-optimized thermocycling program: the first two cycles used a low annealing temperature (e.g., 52° C. -58° C.), and the remaining cycles used a high annealing temperature (e.g., 68° C.).
The amplified library samples were within the 200 bp-260 bp range. Since non-specific amplification products can adversely affect loading concentration and sequencing quality, library quality was assessed on agarose gel and the desired band was purified using QIAquick™ PCR purification kit (QIAGEN, 28104). The purified library sample was then normalized using either Qubit™ or Agilent TapeStation™ before proceeding to sequencing run.
Sample libraries were sequenced on an Illumina MiSeq™ machine, at a loading concentration of 10 pM (for V2 Micro kit, 300-cycle, MS-103-1002) or 20 pM (for V3 kit, 150-cycle, MS-102-3001), supplemented with 15-20% Phi-X control v3 (Illumina™, FC-110-3001). To avoid template carryover contamination between consecutive sequencing runs, two template line washes (e.g., containing sodium hypochlorite solution, Sigma™, 239305) were performed between each run, following Illumina™ protocol.
Since the sequencing construct as well as barcodes were custom designed, custom read primers were spiked into the sequencing kit following Illumina™ protocols (e.g., 2 ul of 100 uM R1 custom read primer into well 12, and 2 ul of R2 primer into well 14). Sequencing was performed for 100+100 bases (for V2 Micro kit, 300-cycle) or 100+68 bases (for V3 kit, 150-cycle) with no indexing reads for developing the test; this can be shortened to 40-60 cycles for clinical use.
The bioinformatic analysis of sequencing results was performed in a few steps: FASTQ generation and adapter trimming (Illumina™ BaseSpace), sequence alignment (bowtie2™), and demultiplexing and read counting (custom scripts in MATLAB and Excel™). Sequence alignment was performed against sequences from one or multiple RT primers, allowing an edit distance of ≤2 between the library and sequencing read. In the case of viral sequencing and mutation identification, the reads were aligned against both original and mutated viral sequences, and the best matched genotype was reported. After alignment, each sample was identified using a combination of a front sample barcode and a reverse batch barcode. All sequencing read counts were incremented by 1 to allow easy plotting. The analysis pipeline is fast and user-friendly, taking 20-30 min per run.
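The read-assignment and counting logic can be illustrated with the following sketch. It is not the pipeline used in this study (which relied on bowtie2™ and custom MATLAB scripts); it is a simplified stand-in, assuming that each read is compared directly against a short reference sequence per sample using Levenshtein edit distance. The names `edit_distance`, `assign_read`, and `count_reads`, the tie-handling rule, and the example sequences are all hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def assign_read(read, references, max_dist=2):
    """Return the uniquely best-matching reference name, or None.

    Assumption: reads tied between two references are discarded as ambiguous.
    """
    scored = sorted((edit_distance(read, seq), name)
                    for name, seq in references.items())
    if not scored or scored[0][0] > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]


def count_reads(reads, references, max_dist=2):
    """Tally reads per reference and add 1 to every count, as described above."""
    counts = Counter({name: 0 for name in references})
    for read in reads:
        name = assign_read(read, references, max_dist)
        if name is not None:
            counts[name] += 1
    return {name: n + 1 for name, n in counts.items()}


# Hypothetical 10-nt reference sequences and reads, for illustration only.
references = {"BC01": "ACGTACGTAC", "BC02": "TTGGCCAATT"}
reads = ["ACGTACGTAC", "ACGTACGAAC", "TTGGCCAATT", "GGGGGGGGGG"]
print(count_reads(reads, references))  # {'BC01': 3, 'BC02': 2}
```

In this example the second read carries one substitution and is still assigned to BC01, while the last read exceeds the edit-distance limit for both references and is dropped.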
For barcode crosstalk studies with 1-10 high-load barcoded samples, supplemented with 86-95 off-target RT primers, after sequence alignment, the matched sequence counts for both groups of barcodes (on-target and off-target) were separately tallied. Read counts from the high-load samples were then normalized to 10^6, and then counts from the off-target barcodes and relative level of crosstalk were determined.
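One way to express this crosstalk normalization is sketched below. The description does not specify how multiple high-load samples are combined, so this sketch assumes the mean on-target read count is scaled to 10^6 (one million) and the same scale factor is applied to each off-target barcode; the function name and the example counts are hypothetical.

```python
def crosstalk_per_million(on_target: dict, off_target: dict) -> dict:
    """Scale off-target counts so that the mean on-target count equals 10^6.

    Assumption: the mean of the high-load (on-target) samples is the reference.
    """
    if not on_target:
        raise ValueError("at least one on-target sample is required")
    mean_on = sum(on_target.values()) / len(on_target)
    if mean_on <= 0:
        raise ValueError("on-target samples have no reads")
    scale = 1_000_000 / mean_on
    return {barcode: n * scale for barcode, n in off_target.items()}


# Hypothetical read counts, for illustration only.
on_target = {"S1": 500_000}
off_target = {"B1": 5, "B2": 0}
print(crosstalk_per_million(on_target, off_target))  # {'B1': 10.0, 'B2': 0.0}
```

Each returned value is the number of off-target reads expected per million on-target reads, which gives a relative crosstalk level that can be compared across runs of different depth.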
In silico analysis for RT primer specificity and inclusivity was performed following the FDA guideline (see e.g., Molecular Diagnostic Template for Laboratories, version Jul. 28, 2020). Specifically, inclusivity analysis was performed against all available SARS-CoV-2 genome sequences downloaded from NCBI (98,765 sequences), after excluding incomplete genomes (e.g., sequences with consecutive N’s and sequence fragments less than 20,000 nt in length). Specificity analysis was performed on Blastn against the recommended list of common respiratory flora and other viral pathogens (see e.g., full list available in Table 8), using parameters optimized for detection of short, somewhat similar sequences.
Confirmatory clinical sensitivity studies were performed in pooled negative remnant clinical specimen background with different concentrations of inactivated virus spike-in (ATCC) in a roughly 2x dilution series, based on results from pilot studies. All tests were performed with the 96x multiplexed sample processing workflow. Each testing condition was repeated 20-22 times using high-quality, unique barcodes (i.e. not repeated 20-22 times with the same barcode) selected from the barcode QC experiment. Each primer was tested multiple times with different batch barcodes on the reverse side. Sequencing read threshold values were calculated from reads obtained from negative control samples using the 3-σ formula (cut-off = mean + 3x stdev). The final limit of detection (LoD) for each target primer pair was determined using a 95% detection rate cut-off (e.g., 19/20 or 21/22 detection) or a 90% cut-off (when specified).
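The detection-rate rule for the LoD can be sketched as follows. This is an illustrative reading of the rule rather than the analysis code used in the study: it assumes the LoD is the lowest concentration in the dilution series for which that concentration and every higher one meet the cut-off. The function name, the integer-percent parameter, and the example detection counts are hypothetical.

```python
def limit_of_detection(results: dict, min_rate_percent: int = 95):
    """Return the lowest concentration meeting the detection-rate cut-off.

    results maps concentration -> (number detected, number of replicates).
    Integer arithmetic avoids floating-point issues at exactly 95% (19/20).
    Assumption: the search stops at the first failing concentration when
    walking down the dilution series from the highest concentration.
    """
    lod = None
    for conc in sorted(results, reverse=True):
        detected, total = results[conc]
        if total <= 0:
            raise ValueError("each condition needs at least one replicate")
        if 100 * detected >= min_rate_percent * total:
            lod = conc
        else:
            break
    return lod


# Hypothetical 2x dilution series (concentration in gce/ml), for illustration only.
results = {400: (20, 20), 200: (20, 20), 100: (19, 20), 50: (15, 20), 25: (6, 20)}
print(limit_of_detection(results))      # 100
print(limit_of_detection(results, 90))  # 100
```

In this example 19/20 detection at 100 gce/ml meets the 95% cut-off while 15/20 at 50 gce/ml does not, so the LoD is reported as 100 gce/ml under either the 95% or the 90% cut-off.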
For sensitivity studies and clinical sample tests by multiplexed sequencing, positive samples can be determined using the 3-σ threshold, e.g., any sample with a matched record count higher than the mean + 3x stdev of all measurements obtained on the negative control samples is determined to be positive. Here, record count can be measured in one of two ways: either using the raw sequencing read count (+1), or using the above read count normalized by the read count of the positive control RNA.
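The 3-σ calling rule can be sketched as below. This is a minimal illustration under stated assumptions: the sample standard deviation (`statistics.stdev`) is used because the description does not say whether the sample or population form was applied, and the positive control read count is also incremented by 1 before normalization to avoid division by zero. The function names and the example read counts are hypothetical.

```python
from statistics import mean, stdev


def record_count(raw_reads: int, control_reads=None) -> float:
    """Raw read count (+1), optionally normalized by positive control reads (+1)."""
    count = raw_reads + 1
    if control_reads is None:
        return float(count)
    return count / (control_reads + 1)


def three_sigma_cutoff(negative_counts) -> float:
    """cut-off = mean + 3 x stdev over the negative control record counts.

    Requires at least two negative controls for the sample standard deviation.
    """
    return mean(negative_counts) + 3 * stdev(negative_counts)


def call_positives(samples: dict, negative_counts) -> dict:
    """Flag each sample whose record count is strictly above the cut-off."""
    cutoff = three_sigma_cutoff(negative_counts)
    return {name: value > cutoff for name, value in samples.items()}


# Hypothetical record counts (raw reads + 1), for illustration only.
negatives = [1, 2, 1, 3, 1, 2, 1, 1]
samples = {"P1": 120, "P2": 4, "P3": 2}
calls = call_positives(samples, negatives)
print(calls)  # {'P1': True, 'P2': True, 'P3': False}
```

With these negative controls the cut-off is about 3.77, so a record count of 4 is called positive and a count of 2 is not. The same functions apply unchanged when the inputs are control-normalized record counts.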
Table 2 shows key performance characteristics for scalable diagnostics with One-Seq. “*” indicates column was scaled (2x) to match the capital cost of one NovaSeq™ 6000 sequencer; “**” indicates assuming an average of 2.5x10^5 sequencing reads per sample. “***” indicates the estimated amortized cost with mass production. See e.g., Table 9 for details.
Table 3 compares performance between One-Seq and other methods. “*” indicates that for RNA extraction or PCR limited tests, throughput is estimated assuming sample processing in 96-well formats, and under the assumption that RNA extraction takes 0.5 hr, and PCR thermocycling takes 1.5 hr. PCR throughput is estimated using 384-well plates.
indicates that it was tested by FDA’s SARS-CoV-2 Reference Panel (see e.g., fda.gov/medical-devices/coronavirus-covid-19-and-medical-devices/sars-cov-2-reference-panel-comparative-data#table2a);
indicates projected sensitivity using four primers;
indicates an estimate.
Table 4 lists One-Seq primers, adapters, and batch barcodes. All Tm values were calculated using IDT oligo analyzer (available on the world wide web at idtdna.com/calc/analyzer), with qPCR default parameters.
Table 5 lists the 960x unique sample barcodes (e.g., Barcode IDs: UDPX001-960). “#” in the first column indicates SEQ ID NO. “UDPX” in the second column indicates UDPX ID number.
Table 6 lists the 96x selected sample barcodes (Barcode IDs: UDPS001-096).
Table 7 shows the inclusivity analysis of primers used.
Table 8 lists the organisms and taxonomy ID used for cross-reactivity analysis.
Chlamydia pneumoniae
Haemophilus influenzae
Legionella pneumophila
Mycobacterium tuberculosis
Streptococcus pneumoniae
Streptococcus pyogenes
Bordetella pertussis
Mycoplasma pneumoniae
Pneumocystis jirovecii
Candida albicans
Pseudomonas aeruginosa
Staphylococcus epidermidis
Streptococcus salivarius
Table 9 shows the breakdown of One-Seq processing times.
Table 10 shows a breakdown of One-Seq reagent cost. “*” indicates that all costs are estimated for 20 ul patient sample input. “**” indicates that enzyme costs can be significantly reduced when mass produced, estimated as 25% of current off-the-shelf cost.
This application claims benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 62/994,072 filed Mar. 24, 2020, U.S. Provisional Application No. 63/040,790 filed Jun. 18, 2020, and U.S. Provisional Application No. 63/159,033 filed Mar. 10, 2021, the contents of each of which are incorporated herein by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/023978 | 3/24/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63159033 | Mar 2021 | US | |
63040790 | Jun 2020 | US | |
62994072 | Mar 2020 | US |