This application contains a Sequence Listing that has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 4, 2021, is named 103182-1270065-005910WO_SL.txt and is 11,926 bytes in size.
In order to contain the Covid-19 global pandemic, protect high-risk populations from infection, and sustainably resume economic activity, a significant fraction of the asymptomatic global population must be routinely tested for the causative SARS-CoV-2 virus. Periodic testing of asymptomatic essential workers and students will require a 100-fold expansion of global testing capacity, yet commercial laboratories are currently unable to process even symptomatic patients on clinically relevant time scales, and rapid self-tests are not sufficiently accurate to provide actionable diagnostic information. There is therefore an urgent need for technological advances that scale clinical-grade viral testing by orders of magnitude.
The terms “invention,” “the invention,” “this invention” and “the present invention,” as used in this document, are intended to refer broadly to all of the subject matter of this patent application and the claims below. Statements containing these terms should be understood not to limit the subject matter described herein or to limit the meaning or scope of the patent claims below. This summary is a high-level overview of various aspects of the invention and introduces some of the concepts that are described and illustrated in the present document and the accompanying figures. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification, any or all figures and each claim. Some of the illustrative embodiments of the present invention are discussed below.
The present disclosure provides methods for concurrent sample processing, called Identity Preserving Sample Multiplexing (IPSM), that provide the ability to scale SARS-CoV-2 testing by orders of magnitude.
In one aspect, the disclosure provides a method for rapid identification of a SARS-CoV-2 positive subject, the method comprising:
In a further aspect, the disclosure provides a method for rapid identification of a SARS-CoV-2 positive subject, the method comprising:
In another aspect, provided herein is a method for rapid identification of a patient that is infected with a single-stranded RNA (ssRNA) virus, the method comprising: (a) incubating a patient nucleic acid sample comprising RNA obtained from a patient to be evaluated for infection with the ssRNA virus with an oligonucleotide that comprises a patient-specific identifying sequence that distinguishes the nucleic acid sample from the patient from nucleic acid samples from other patients in a pool of patient nucleic acid samples, wherein incubation comprises annealing ssRNA nucleic acid, if present in the patient nucleic acid sample, with at least three collinear oligonucleotides that are reverse complementary to the ssRNA target sequence under conditions to form a hybridized oligonucleotide-viral nucleic acid complex, wherein the at least three collinear oligonucleotides are each hybridized at adjacent positions to the respective target region of the ssRNA genome, and wherein each of the three oligonucleotides hybridizes to a target region of the ssRNA viral nucleic acid;
In an additional aspect, the disclosure provides a method of rapid identification of a patient infected with a ssRNA virus, the method comprising: (a) incubating a patient nucleic acid sample comprising RNA obtained from a patient to be evaluated for infection with the ssRNA virus with an oligonucleotide that comprises a patient-specific identifying sequence at the 5′ end that distinguishes the nucleic acid sample from the patient from nucleic acid samples from other patients in a pool of patient nucleic acid samples, wherein incubation comprises annealing ssRNA viral nucleic acid, if present in the patient nucleic acid sample, with the oligonucleotide, wherein the oligonucleotide hybridizes to a target region of the ssRNA viral nucleic acid; (b) pooling the patient nucleic acid sample following incubation in step (a) with a plurality of nucleic acid samples from other patients incubated as in (a), but where the patient-specific identifier sequence is different for each of the other patient samples present in the pool, relative to each of the other patient-specific barcodes; (c) purifying hybridized oligonucleotide-ssRNA virus nucleic acid complexes, when present, from the pool; (d) performing a reverse transcriptase reaction to extend the oligonucleotide hybridized to the ssRNA viral nucleic acids; and (e) performing an amplification reaction on a portion of the product obtained in (d) that is capable of detecting ssRNA viral nucleic acids to determine whether the pool is positive for the presence of ssRNA viral polynucleotide sequences.
In some embodiments, the method further comprises (f) performing an asymmetric RNaseH-dependent PCR on a positive pool to provide a library of nucleic acid molecules for sequencing, wherein the asymmetric PCR comprises amplification using patient-specific primers, each of which hybridizes to a patient-specific barcode sequence, and is present in approximately the same limiting concentration; and (g) sequencing the library of nucleic acid molecules to determine the patient-specific identifier sequences, thereby identifying a patient that is infected with the ssRNA virus. In some embodiments, the target region of the SARS-CoV-2 viral nucleic acid of each of the collinear oligonucleotides has low secondary structure. In some embodiments, the oligonucleotide has a GC content from about 45% to about 55%. In some embodiments, the amplification reaction of (e) is quantitative PCR. In alternative embodiments, the amplification reaction of (e) is rolling circle amplification (RCA) or loop-mediated isothermal amplification (LAMP). In some embodiments, the Tm of the oligonucleotide is in the range of 65° C. to 95° C.
As used herein, the terms “a”, “an”, and “the” can refer to one or more unless specifically noted otherwise.
The terms “about” and “approximately” as used herein shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. For example, exemplary degrees of error for temperature may be less than 5%, e.g., 4%, 3%, 2%, 1%, or 0.5% of a given value or range of values. Any reference to “about X” or “approximately X” specifically indicates at least the values X, 0.95X, 0.96X, 0.97X, 0.98X, 0.99X, 1.01X, 1.02X, 1.03X, 1.04X, and 1.05X. Thus, expressions “about X” or “approximately X” are intended to teach and provide written support for a claim limitation of, for example, “0.98X.” Numerical quantities given herein are approximate unless stated otherwise, meaning that the term “about” or “approximately” can be inferred when not expressly stated. When “about” is applied to the beginning of a numerical range, it applies to both ends of the range.
The term “low secondary structure” in an RNA virus target sequence, e.g., a SARS-CoV-2 target sequence, refers to a region that is not predicted to form a helix through intramolecular base pairing between RNA nucleotides in the SARS-CoV-2 RNA genome. SARS-CoV-2 RNA secondary structure has been described (see, e.g., Rangan & Das, RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses. BioRxiv, 2020). RNA secondary structure can also be predicted using software for numerous other RNA structure prediction models, e.g., RNAfold, RNAstructure, RNAshapes, CONTRAfold, CentroidFold, ContextFold, pknotsRG, Probknot, Pknot, Knotty, MC-Fold, MC-Fold-DP, CycleFold, and EvoClustRNA, among others.
The term “collinear” in the context of “collinear” oligonucleotides refers to oligonucleotides that hybridize to adjacent sequences of a target nucleic acid, such that there are no unhybridized intervening bases of the target nucleic acid sequence between the adjacent oligonucleotides.
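The definition above can be illustrated with a short sketch. It assumes each oligonucleotide's footprint on the target is described by a hypothetical (start, end) pair of 1-based, inclusive target positions; the function name and coordinate convention are illustrative only and are not part of the disclosure. Overlapping footprints are treated as not collinear, which is an assumption of this sketch.

```python
from typing import List, Tuple


def are_collinear(footprints: List[Tuple[int, int]]) -> bool:
    """Return True if adjacent footprints leave no unhybridized target bases.

    Each footprint is a (start, end) pair of 1-based, inclusive target
    positions (a hypothetical convention chosen for this sketch).
    """
    ordered = sorted(footprints)
    # Each footprint must begin exactly one base after the previous one ends.
    return all(
        next_start == prev_end + 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )
```

For example, footprints (1, 30), (31, 60) and (61, 90) are collinear under this check, whereas (1, 30) and (32, 60) are not, because target position 31 would remain unhybridized.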
A “polynucleotide” or “nucleic acid” includes any form of RNA or DNA, including, for example, genomic DNA; complementary DNA (cDNA), and DNA molecules produced synthetically or by amplification. “Polynucleotides” include nucleic acids comprising non-standard bases. A polynucleotide in accordance with the disclosure will generally contain phosphodiester bonds, although in some cases, nucleic acid analogs may be used that may have alternate backbones, comprising, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press); positive backbones; non-ionic backbones, and non-ribose backbones. Polynucleotides may be single-stranded, double-stranded, or partially double-stranded. An “oligonucleotide” as used herein is preferably DNA; and includes embodiments in which an oligonucleotide contains one or more modified nucleotides.
As used herein, the term “complementary” refers to the capacity for precise pairing between two nucleotides. I.e., if a nucleotide at a given position of a nucleic acid is capable of hydrogen bonding with a nucleotide of another nucleic acid, then the two nucleic acids are considered to be complementary to one another at that position. A “complement” may be an exactly or partially complementary sequence. Two oligonucleotides are considered to have “complementary” sequences when there is sufficient complementarity that the sequences hybridize (forming a partially double stranded region) under assay conditions.
The terms “anneal”, “hybridize” or “bind,” in reference to two polynucleotide sequences, segments or strands, are used interchangeably and have the usual meaning in the art. Two complementary sequences (e.g., DNA and/or RNA) anneal or hybridize by forming hydrogen bonds with complementary bases to produce a double-stranded polynucleotide or a double-stranded region of a polynucleotide.
As used herein, “amplification” of a nucleic acid sequence has its usual meaning, and refers to in vitro techniques for enzymatically increasing the number of copies of a target sequence. Amplification methods include both asymmetric methods (in which the predominant product is single-stranded) and conventional methods (in which the predominant product is double-stranded).
As will be understood from context, every description of a method step or of an interaction of a reagent with SARS-CoV-2 RNA in a patient sample or pooled sample contemplates that the same steps or activities may be carried out in samples comprising SARS-CoV-2 (positive samples) and in samples that do not comprise SARS-CoV-2 (negative samples). For example, the step of “ligating the three oligonucleotides hybridized to a SARS-CoV-2 nucleic acid” contemplates that ligase and oligonucleotides will be added to a negative pool, which will be maintained under ligation conditions, even though the oligonucleotides are not ligated together in a pool free from viral RNA.
The IPSM technology described herein eliminates the retesting bottleneck of conventional pooling by individually labeling samples with patient-specific barcodes before pooling to preserve patient identities during pooled viral purification and enzymatic sample processing. This provides the ability to perform concurrent viral isolation, purification and enzymatic processing of 100-1000 patients per cohort, rapid screening of positive cohorts, and quantification of individual patient viral titers by massively-parallel barcode sequencing. A schematic of the method is provided in
Although the invention is largely described in the context of SARS-CoV-2 infection, the methods described herein can be employed for rapid screening for other viral infections, including other coronaviruses, such as SARS-CoV, MERS-CoV, or any other single-stranded RNA (ssRNA) virus. Further, the methodology can also be employed for rapid screening for single-stranded DNA (ssDNA) virus infections. Accordingly, the steps of the methods described herein, can be applied to detect other ssRNA or ssDNA viruses.
The patient screening methods of the present disclosure employ sequence-based barcodes, which provide trackable patient identifiers for SARS-CoV-2 sequences, if present, from a test sample obtained from a patient, thus allowing transcripts from pooled patient samples to be sequenced simultaneously in a single massively parallel sequencing pool without loss of the ability to trace the patient sample from which transcripts originated.
The present disclosure thus provides a method of rapidly identifying SARS-CoV-2-positive patients by incubating (i) a nucleic acid preparation from a patient with (ii) one, two, or three or more oligonucleotides that hybridize to target regions of SARS-CoV-2 RNA. One of the oligonucleotides comprises a patient-specific identification region, i.e., barcode. The oligonucleotides are incubated with the patient nucleic acid sample under conditions in which oligonucleotides can anneal to viral nucleic acids, if present in the sample. Following incubation, the patient sample is pooled with nucleic acid samples from other patients, e.g., from 10-100 different patients, that are similarly processed, but where the oligonucleotide(s) incubated with nucleic acid samples from different patients comprise different patient-specific identifying sequences.
Hybridized complexes comprising collinear oligonucleotides hybridized to SARS-CoV-2 RNA genome are isolated following pooling of nucleic acid samples; and, in instances in which two or more oligonucleotides are employed, ligated by RNA-splinted DNA ligation to ligate the oligonucleotides to provide a single oligonucleotide molecule comprising the patient-specific barcode hybridized to the SARS-CoV-2 nucleic acid. In embodiments in which a single oligonucleotide comprising a patient identifier sequence is hybridized to SARS-CoV-2 RNA instead of collinear oligonucleotides, a reverse transcriptase is employed to extend the hybridized oligonucleotide following pooling and isolation of hybridized oligonucleotide-SARS-CoV-2 complexes.
An amplification reaction, e.g., a quantitative PCR using SARS-CoV-2 primers, is then performed on a portion of the pool comprising the nucleic acids from different patients to determine whether the pool is positive or negative for the presence of SARS-CoV-2 polynucleotide sequences.
Positive pools are further processed for sequencing to balance the sequencing library so that SARS-CoV-2 sequences from patients having a high SARS-CoV-2 viral titer do not dominate the sequencing library and prevent identification of SARS-CoV-2 sequences from other patients who may have low viral SARS-CoV-2 titers. This procedure employs an asymmetric RNaseH-dependent PCR reaction to generate the balanced cohort sequencing library of nucleic acid molecules. For the asymmetric PCR, each patient-specific primer that targets the corresponding patient identifier barcode, is supplied in a common limiting concentration during PCR amplification. During this asymmetric PCR, each patient sub-library transitions from exponential to linear amplification once the patient-specific primer is consumed. The number of double stranded ligation products generated by this asymmetric PCR will then be narrowly distributed across all patients in the cohort. The library is then sequenced to determine the patient barcode sequences, thereby identifying patients that are positive for SARS-CoV-2.
In some embodiments, oligonucleotides for hybridization to SARS-CoV-2 RNA sequences are designed to target regions of the SARS-CoV-2 genome that have low secondary structure. In some embodiments, such oligonucleotides have a GC content of about 45% to about 55%. The oligonucleotides are thus designed to be stably bound to target during manipulations subsequent to annealing. One of skill understands how to work at temperatures that do not disrupt the duplex. Generally, the SARS-CoV-2 binding region of an oligonucleotide provided herein can range in size from 15 to 50 nucleotides, although in some embodiments, the binding region may be longer. In some embodiments, the SARS-CoV-2 binding region is from 25 to 35 nucleotides in length. In some embodiments, the binding region is 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
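The GC-content and length considerations above can be screened computationally. The following is a minimal sketch, assuming a candidate binding region is supplied as a plain DNA string; the function names and the use of hard cut-offs (rather than the “about” tolerance defined herein) are simplifications introduced for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def passes_design_window(
    seq: str,
    gc_min: float = 0.45,   # lower GC bound from the text
    gc_max: float = 0.55,   # upper GC bound from the text
    len_min: int = 25,      # binding-region length range from the text
    len_max: int = 35,
) -> bool:
    """Check a candidate binding region against the GC and length windows."""
    return (
        len_min <= len(seq) <= len_max
        and gc_min <= gc_fraction(seq) <= gc_max
    )
```

A 28-nucleotide candidate with 50% GC content passes this screen, whereas a 30-nucleotide poly-A sequence does not. Secondary structure of the target region is not assessed by this sketch.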
In embodiments employing multiple collinear oligonucleotides to target SARS-CoV-2, an oligonucleotide that does not comprise a patient identifier sequence, e.g., an oligonucleotide that binds to a SARS-CoV-2 target region positioned between the sequences to which flanking oligonucleotides bind, has a Tm that is about the temperature of the ligation reaction in which collinear oligonucleotides are joined, or higher. For example, ligation reactions can be performed at room temperature or higher. Thus, in some embodiments, the oligonucleotide-viral RNA duplex may have a Tm of at least about 22° C. In some embodiments, the Tm is at least 10° C. higher, or at least 20° C. higher, than the temperature at which the ligation reaction is performed. In some embodiments, the oligonucleotides are designed to have a Tm of at least 50° C. or at least 55° C. or at least 60° C. In other embodiments, the Tm is at least 65° C. In some embodiments, the Tm is at least 70° C. In some embodiments, the Tm is at least 75° C. In some embodiments, the Tm is at least 80° C. or at least 85° C. In some embodiments, suitable oligonucleotides have a Tm in the range of about 45° C. to about 95° C. In some embodiments, the Tm is in the range of about 50° C. to about 95° C. In some embodiments, the Tm is in the range of about 55° C. to about 95° C. In some embodiments, the Tm is in the range of about 60° C. to about 95° C. In some embodiments, the Tm is in the range of about 65° C. to about 90° C. Tm can be calculated using known methods, for example, using the tool available at the www http address idtdna.com/pages/tools/oligoanalyzer.
An oligonucleotide that comprises a patient identifier sequence is generally designed to have a Tm that is at least about 20° C. above the temperature at which collinear oligonucleotides are ligated or reverse transcription is conducted. Thus, for example, in embodiments employing collinear oligonucleotides, the Tm of an oligonucleotide that comprises the patient identifier region is generally designed to be above about 42° C., i.e., 20° C. above a room temperature ligation reaction. In embodiments in which a single oligonucleotide comprising the patient identifier region is extended by reverse transcriptase, the oligonucleotide may have a Tm of at least about 62° C., i.e., 20° C. above a reverse transcription reaction. Accordingly, in some embodiments, the Tm is at least about 45° C. In some embodiments, the Tm is at least 50° C. or at least 55° C. or at least 60° C. In other embodiments, the Tm is at least 65° C. In some embodiments, the Tm is at least 70° C. In some embodiments, the Tm is at least 75° C. In some embodiments, the Tm is at least 80° C. or at least 85° C. In some embodiments, suitable oligonucleotides have a Tm in the range of about 45° C. to about 95° C. In some embodiments, the Tm is in the range of about 50° C. to about 95° C. In some embodiments, the Tm is in the range of about 55° C. to about 95° C. In some embodiments, the Tm is in the range of about 60° C. to about 95° C. In some embodiments, the Tm is in the range of about 65° C. to about 90° C.
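The Tm margin described above is simple arithmetic and can be sketched as follows. The reaction temperatures of 22° C. (room temperature ligation) and 42° C. (reverse transcription) are inferred from the 42° C. and 62° C. thresholds stated above; the constant and function names are hypothetical. The Tm value itself is assumed to be obtained separately, e.g., from an oligonucleotide analysis tool.

```python
# Reaction temperatures inferred from the thresholds in the text (assumption).
ROOM_TEMP_LIGATION_C = 22.0
REVERSE_TRANSCRIPTION_C = 42.0
# Margin for an oligonucleotide comprising the patient identifier sequence.
BARCODE_OLIGO_MARGIN_C = 20.0


def min_barcode_oligo_tm(
    reaction_temp_c: float, margin_c: float = BARCODE_OLIGO_MARGIN_C
) -> float:
    """Lowest acceptable Tm for a barcode-bearing oligonucleotide."""
    return reaction_temp_c + margin_c


def tm_is_sufficient(
    tm_c: float, reaction_temp_c: float, margin_c: float = BARCODE_OLIGO_MARGIN_C
) -> bool:
    """True if a supplied Tm clears the reaction temperature by the margin."""
    return tm_c >= min_barcode_oligo_tm(reaction_temp_c, margin_c)
```

Under these assumptions, an oligonucleotide with a Tm of 65° C. would be acceptable for a reverse transcription embodiment, whereas one with a Tm of 60° C. would not.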
In embodiments in which multiple collinear oligonucleotides are employed, the Tms of the individual oligonucleotides may differ. In some embodiments, the Tms are within 5° C. or 10° C. of one another. In some embodiments, the Tms are the same.
In some approaches, a target hybridization region is a region that will anneal to oligonucleotide(s) having a GC content of about 45% to about 55%. In a preferred embodiment, the target hybridization region is a region of low secondary structure in the SARS-CoV-2 RNA sequence. For example, in some embodiments in which multiple, e.g., three, collinear oligonucleotides are used, the oligonucleotide that binds to the region between the 5′-most and 3′-most oligonucleotides may bind at a region starting at position 28448 within the N gene of SARS-CoV-2, as defined using the MT007544.1 genome build (NCBI, Severe acute respiratory syndrome coronavirus 2 isolate Australia/VIC01/2020, complete genome).
In some embodiments, the oligonucleotide(s) comprise one or more modified nucleotides. Any suitable modified nucleotide may be included, but in some embodiments, the modification includes a Tm-enhancing modification, that is, a modification that increases Tm relative to an oligonucleotide that has the same sequence, but does not include the modification. Such Tm-enhancing modifications include, for example, a 5-methyl deoxycytidine (5-methyl-dC); 2,6-diaminopurine; a locked nucleic acid (LNA); a bridged nucleic acid (also referred to as a bicyclic nucleic acid or BNA); a tricyclic nucleic acid; a peptide nucleic acid (PNA); a C5-modified pyrimidine base; a propynyl pyrimidine; a morpholino; a phosphoramidite; or a 5′-Pyrene cap. In embodiments in which multiple oligonucleotides are employed for annealing and subsequent ligations, each of the oligonucleotides typically comprises the same type of modified nucleotides to increase Tm.
The same oligonucleotide design considerations, such as Tm, GC content, and length of SARS-CoV-2 binding region detailed above are employed for embodiments in which one oligonucleotide is to be annealed to target viral RNA and extended by reverse transcriptase after annealing and pooling of the patient sample with other samples.
In some embodiments at least two, and preferably at least three, oligonucleotides are annealed to SARS-CoV-2 and subsequently ligated to each other. In such embodiments, the SARS-CoV-2 binding region of each of the oligonucleotides may be of the same length. Alternatively, the SARS-CoV-2 binding region of each oligonucleotide may differ in length. For example, in some embodiments, the binding regions may differ in length by 1-5 nucleotides, or by 1-10 nucleotides.
Embodiments in which collinear oligonucleotides are annealed to viral nucleic acids and joined by ligation typically employ three oligonucleotides. However, in some embodiments, more than three oligonucleotides, e.g., 4 or 5, may be used to increase specificity.
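As an illustration of how collinear oligonucleotides relate to a target region, the sketch below splits a target RNA sequence into adjacent segments and derives the DNA reverse complement of each. The function names and the toy sequence in the example are hypothetical; no sequence from the Sequence Listing is reproduced, and barcodes, purification tags and modified nucleotides are not modeled.

```python
from typing import List

# RNA base -> complementary DNA base.
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def dna_reverse_complement(rna: str) -> str:
    """DNA reverse complement (written 5' to 3') of an RNA sequence."""
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G, U")
    return rna.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def collinear_oligos(target_rna: str, lengths: List[int]) -> List[str]:
    """Split the target 5' to 3' into adjacent segments and return one
    reverse-complementary DNA oligonucleotide per segment, in target order."""
    if sum(lengths) != len(target_rna):
        raise ValueError("segment lengths must cover the target exactly")
    oligos, start = [], 0
    for n in lengths:
        oligos.append(dna_reverse_complement(target_rna[start:start + n]))
        start += n
    return oligos
```

Because each oligonucleotide is reverse complementary to its segment, joining the oligonucleotides in the reverse of target order (3′-most target segment first) reproduces the reverse complement of the whole target region, which corresponds to the ligated product read 5′ to 3′.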
Identification of a patient that is infected with SARS-CoV-2 is achieved through the use of patient-specific identifier sequences, i.e., barcodes, incorporated at the 5′ or 3′ end of at least one of the oligonucleotides that is incubated with a patient sample for annealing to SARS-CoV-2 RNA, when present in the sample. For embodiments employing one oligonucleotide in which an oligonucleotide annealed to viral target RNA is extended using RT, the barcode sequence is present at the 5′ end of the oligonucleotide. For embodiments in which multiple oligonucleotides are annealed to viral target RNA and ligated, the barcode sequence may be present at the 3′ end of the oligonucleotide that targets the region that is the farthest upstream (i.e., 5′), relative to the target regions of the other oligonucleotide(s) (also referred to herein as the “5′-most” oligonucleotide). Thus, when the multiple oligonucleotides are ligated to one another, the barcode is at the 3′ end of the ligated product. Alternatively, the barcode sequence may be present at the 5′ end of the oligonucleotide that targets the region that is farthest downstream (i.e., 3′) relative to the target region of the other oligonucleotide(s) (also referred to herein as “3′-most”). Accordingly, when the multiple oligonucleotides are ligated to one another, the barcode is at the 5′ end of the ligated product. In some embodiments, the barcode sequence may be included at both the 3′ end of the oligonucleotide that hybridizes to the target at the position farthest upstream, and the 5′ end of the oligonucleotide that hybridizes to the target at the position farthest downstream. The resulting ligated product, when the oligonucleotides are ligated to one another, will then contain the patient-specific identifying region at both the 3′ and 5′ ends.
The patient-specific identifying regions are typically the same size relative to one another. In some embodiments, the size may be anywhere from 15-25 nucleotides in length, for example, 15, 16, 17, 18, 19, or 20 nucleotides in size. In some embodiments, the barcode region is 16 nucleotides in length.
The barcode sequences are designed to result in one or more base-pair mismatches if the barcode hybridizes to any primer (for the RNase H asymmetric extension, as detailed below) other than the primer specific for the particular patient-specific barcode. In some embodiments, the barcode sequences are selected for a Hamming distance of 1, 2, 3, 4, 5, or 6, or more, nucleotides up to the length of the barcode sequence. In some embodiments, the barcode sequences are selected for a Hamming distance of 4 nucleotides. Additional considerations in barcode design include the GC content (preferably about 50%) and incorporation of an RNA base for the corresponding primer used in the RNaseH-dependent PCR.
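One way to obtain a barcode set with a minimum pairwise Hamming distance is greedy selection from random candidates, sketched below. This is an illustrative approach and is not asserted to be the procedure used to generate the barcodes of the disclosure; the function names, the fixed random seed, and the choice of exactly 50% GC content are assumptions. Incorporation of an RNA base in the corresponding primer is not modeled.

```python
import random
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(
    n: int,
    length: int = 16,      # barcode length from the text
    min_dist: int = 4,     # Hamming distance from the text
    seed: int = 0,         # fixed seed so the sketch is reproducible
    max_tries: int = 100_000,
) -> List[str]:
    """Greedily collect n barcodes with ~50% GC and pairwise distance >= min_dist."""
    rng = random.Random(seed)
    n_gc = length // 2
    chosen: List[str] = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        # Half the positions are G/C and the rest A/T, then shuffled.
        bases = [rng.choice("GC") for _ in range(n_gc)]
        bases += [rng.choice("AT") for _ in range(length - n_gc)]
        rng.shuffle(bases)
        candidate = "".join(bases)
        if all(hamming(candidate, b) >= min_dist for b in chosen):
            chosen.append(candidate)
    if len(chosen) < n:
        raise RuntimeError("could not find enough barcodes; relax the constraints")
    return chosen
```

With a Hamming distance of 4, any primer paired with a barcode other than its own encounters at least four mismatched positions.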
An oligonucleotide that anneals to the viral nucleic acid target may also comprise additional sequences, such as a unique molecular identifier that identifies sequences that are amplified from the same initial template molecule; and a universal amplification sequence, i.e., a primer binding site for a universal primer.
In some embodiments, the oligonucleotide is designed to contain a sequence that forms a hairpin. For example, an oligonucleotide that comprises the barcode at the 3′ end and hybridizes to the 5′-most target sequence in the viral nucleic acid may be designed such that the first few, e.g., 4-12, non-complementary bases are reverse complementary to the initial (complementary) bases of the oligonucleotide, forming a stem-loop structure in the absence of viral templates. As the temperature is decreased during annealing, the hairpin region adopts one of two thermodynamically favorable configurations: it is either specifically annealed to the viral RNA template, or collapsed as a sequestered hairpin, which cannot anneal to viral RNA after pooling with other patient samples.
Oligonucleotides for annealing to target viral nucleic acids as described herein are also often attached to a molecule that allows for easy purification. Thus, for example, an oligonucleotide may be biotinylated, e.g., at the 5′ end. Examples of other purification moieties include a hapten, a ligand that binds to a cognate binding partner, or an alternative purification tag.
Viral RNA is extracted from a sample obtained from a patient to be evaluated for SARS-CoV-2 infection. The sample may be from a throat swab, a nasopharyngeal swab, sputum or tracheal aspirate, or any other sample that may contain viral nucleic acids. At least one oligonucleotide as described above, which comprises a patient-specific identifier sequence, is then incubated with the nucleic acids extracted from the sample under conditions suitable for annealing, i.e., conditions in which the oligonucleotide will anneal to target SARS-CoV-2 sequences. In one approach, the samples are heated to a temperature above the Tm of the oligonucleotide, and then cooled, e.g., allowed to cool to room temperature, so that the oligonucleotide anneals to the target sequence, if present in the sample, and provides a stable hybridization complex in which oligonucleotides hybridized to the viral nucleic acid remain hybridized when pooled with other samples and throughout subsequent manipulations.
RNA-containing samples obtained from each of a plurality of patients are separately incubated with one or more oligonucleotides. As explained above, the patient-specific identifying sequence for each patient differs in sequence from the patient-identifying sequences for other patients. The barcode-comprising oligonucleotides in the separate incubations thus contain distinct barcodes for each patient. Samples can be separately incubated in droplets, microfluidic devices, wells, tubes, or any other compartments in which each patient sample is in a separate compartment.
Following the annealing step, the patient nucleic acid preparation containing the oligonucleotides (hybridized to SARS-CoV-2 viral RNA, if it is present) is pooled with the nucleic acid preparations from other patients that were similarly processed, i.e., incubated with an oligonucleotide comprising a barcode region that is specific for each patient under conditions in which the oligonucleotide will anneal to the target viral sequence if it is present in the sample. Hybridization complexes are then purified from the pool, e.g., via a biotin tag.
In some embodiments, in order to mitigate potential binding of oligonucleotides that comprise other patient identifier sequences, the oligonucleotide comprising the patient-specific barcode is added in significant molar excess, e.g., 2-fold or 5-10-fold, of target viral RNA in the annealing incubation to block specific binding of alternate barcodes after pooling. In some embodiments, a single-stranded DNA nuclease, e.g., a 5′ exonuclease, is added after annealing, but prior to pooling, to remove free, i.e., unannealed, oligonucleotides that may otherwise anneal at room temperature.
In some embodiments, at least two, and preferably at least three, collinear oligonucleotides are annealed to the target viral nucleic acid and ligated to one another via RNA-splinted DNA ligation. An example of an RNA-splinted DNA ligase is Chlorella virus DNA ligase (PBCV-1 DNA ligase) (see, e.g., Lohman et al., Nucleic Acids Res. 42:1831-1844, 2014) or an analog or homolog thereof. PBCV-1 DNA ligase ligates adjacent, single-stranded DNA splinted by a complementary RNA strand. In preferred embodiments, at least three collinear oligonucleotides annealed to target viral RNA are ligated to provide at least two ligations, which can reduce or eliminate non-specific ligation events.
In embodiments in which one oligonucleotide is used to incubate with patient nucleic acid samples in the annealing step, the oligonucleotide annealed to the viral target region is extended using reverse transcriptase.
Following ligation or reverse transcription, a portion of the pooled nucleic acid sample is then amplified to determine whether or not a pool contains SARS-CoV-2 sequences. Any type of amplification reaction can be used. In some embodiments, qPCR is performed using SARS-CoV-2 specific primers to amplify viral nucleic acids.
Alternative amplification reactions to determine positive pools include T7 amplification, rolling circle amplification (RCA), loop-mediated isothermal amplification (LAMP) or any other suitable amplification reaction. For example, LAMP or RCA amplification reactions can be employed to generate a fluorescently amplified product that can be quantified.
Pools that are determined to be negative are not analyzed further. Positive pools are further processed in preparation for sequencing.
A positive pool is processed to balance the library to provide a balanced cohort sequencing library such that SARS-CoV-2 sequences from patients having a high SARS-CoV-2 viral titer do not dominate the sequencing library and prevent identification of SARS-CoV-2 sequences from other patients who may have very low viral SARS-CoV-2 titers. This procedure employs an asymmetric RNase H-dependent PCR reaction.
For the asymmetric PCR, each patient-specific primer that targets the corresponding patient identifier barcode, is supplied in a common limiting concentration during PCR amplification. During this asymmetric PCR, each patient sub-library transitions from exponential to linear amplification once the patient-specific primer is consumed. The number of double stranded ligation products generated by this asymmetric PCR will then be narrowly distributed across all patients in the cohort. The library is then sequenced to determine the patient barcode sequences, thereby identifying patients that are positive for SARS-CoV-2.
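The balancing effect of a limiting patient-specific primer can be illustrated with a toy model. The sketch assumes perfect doubling efficiency, counts only strands that incorporate the patient-specific primer, and stops counting once that primer is exhausted, so the subsequent linear phase is not modeled. The function name and all molecule counts are hypothetical and are not data from the disclosure.

```python
def limited_primer_products(
    initial_templates: int, primer_molecules: int, cycles: int
) -> int:
    """Count products that incorporated the limiting patient-specific primer.

    Toy model: in each cycle every existing strand can template one new
    copy, and each new copy consumes one primer molecule.
    """
    products = 0
    primer_left = primer_molecules
    for _ in range(cycles):
        # Exponential growth until the primer pool runs out.
        made = min(initial_templates + products, primer_left)
        products += made
        primer_left -= made
    return products
```

In this model, a patient sub-library starting from 10 templates and one starting from 10,000 templates both plateau at the number of primer molecules supplied once enough cycles have been run, which is the sense in which product numbers become narrowly distributed across the cohort.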
RNase H-dependent PCR reactions are known (see, e.g., Dobosy et al., BMC Biotechnology, 11:80, 2011). The reaction employs a cleavable RNA base in a PCR primer to increase specificity. In some embodiments, the RNA base is incorporated at or near the 3′ end of the primer, e.g., within 1, 2, 3, 4, 5, 6, or 7 nucleotides of the 3′ end. The following examples illustrate primer sequences that hybridize to the patient-specific identifier regions:
The 6th from the last base is the cleavable RNA base, the terminal 3′ base is a mismatch and each primer is blocked at the 3′ end with a spacer.
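The primer layout described above (cleavable RNA base, 3′ terminal mismatch, 3′ blocking spacer) can be sketched in Python as follows. This is an illustrative annotation helper only: the input sequence in the usage note is hypothetical, the `r` prefix and the `/3SpC3/` blocker string follow a vendor-style ordering notation assumed here, the mismatch substitution table is arbitrary, and "6th from the last base" is interpreted as counting the terminal base as the first.

```python
# Arbitrary illustrative substitutions used to create the 3' terminal mismatch.
MISMATCH = {"A": "C", "C": "A", "G": "T", "T": "G"}


def annotate_rh_primer(matched_dna: str, rna_from_end: int = 6,
                       blocker: str = "/3SpC3/") -> str:
    """Annotate a fully matched DNA primer with an RNA base, a 3' mismatch and a blocker.

    rna_from_end counts from the 3' end with the terminal base as position 1
    (an assumed reading of the text). The blocker string is an assumed notation.
    """
    seq = matched_dna.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    if not 2 <= rna_from_end <= len(seq):
        raise ValueError("rna_from_end must fall inside the primer, upstream of the 3' base")
    rna_index = len(seq) - rna_from_end
    parts = []
    for i, base in enumerate(seq):
        if i == rna_index:
            parts.append("r" + ("U" if base == "T" else base))  # RNA base (T becomes U)
        elif i == len(seq) - 1:
            parts.append(MISMATCH[base])  # deliberate 3' terminal mismatch
        else:
            parts.append(base)
    return "".join(parts) + blocker
```

For example, `annotate_rh_primer("ACGTACGTAC")` returns `ACGTrACGTAA/3SpC3/` under these assumptions.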
The above procedure provides a balanced cohort library from a positive pool. The library is then processed for sequencing using high throughput sequencing methodology.
An illustrative protocol employing three collinear oligonucleotides for annealing to SARS-CoV-2 RNA, if present in a patient sample, and reagents for performing the method are provided below. Sequences referred to in the protocol are provided at the end. One of skill in the art will recognize that variations are possible.
Patients for which both paired samples admit a p-value less (greater) than a selected false positive rate (FPR) are reported as positive (negative). Patients with discordant p-values (one above and one below the FPR threshold) are considered indeterminate and will be re-processed using the remaining 20 μl of barcoded sample from step 2.
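The paired-sample decision rule above can be expressed as a short Python sketch. The function name and the example threshold are hypothetical; the text does not specify how a p-value exactly equal to the FPR is handled, so this sketch treats it as not below the threshold, which is an assumption.

```python
def classify_patient(p_values: tuple, fpr: float) -> str:
    """Classify a patient from the p-values of the two paired samples.

    Both below the FPR  -> "positive"
    Neither below       -> "negative"
    One of each         -> "indeterminate" (sample is re-processed)
    A p-value equal to the FPR is treated as not below it (assumption).
    """
    if len(p_values) != 2:
        raise ValueError("exactly two paired p-values are expected")
    below = [p < fpr for p in p_values]
    if all(below):
        return "positive"
    if not any(below):
        return "negative"
    return "indeterminate"
```

For instance, with a hypothetical FPR of 0.01, paired p-values of 0.001 and 0.2 are discordant and the patient is reported as indeterminate.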
CAAGCAGAAGACGGCATACGAGATTCGGTAGTAGCCAATTTGGTCATCTGGAC
It is understood that the examples and embodiments described throughout the specification are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims.
The examples provide data illustrating aspects of Identity Preserving Sample Multiplexing (IPSM) technology, which preserves the identity of patient samples by non-enzymatically barcoding each patient virus sample prior to pooling, purification, concurrent enzymatic processing and patient barcode sequencing. In particular, the examples illustrate high sensitivity sample barcoding (the limit of detection is currently less than 50 molecules and on track for single-digit sensitivity), low levels of crosstalk between pooled patient samples (fewer than 1 in 1,000,000 barcodes misrepresent the patient origin), and efficient, massively parallel sequencing of patient barcodes using a pool balancing technique termed internal cohort balancing (ICB).
Patient samples were barcoded by annealing high melting-temperature DNA oligonucleotides to lysed viral RNA. Annealing was highly efficient across a broad range of lysis conditions, and barcodes remained stably bound during subsequent purification and ligation reactions, as these procedures are executed at room temperature, well below the melting temperature of the barcoding oligonucleotides. To test the specificity and sensitivity of both annealing and RNA-splinted DNA ligation, we synthesized a hybrid oligonucleotide sequence comprising a 52 base pair RNA sequence that is the complement of a covalently linked DNA sequence (
Although the RNA-splinted DNA ligase SplintR is highly discriminating against base pair mismatches at the ligation junction, short regions of perfect complementarity near the ligation junction may exist among sequences present in co-isolated host RNA. In order to reduce or eliminate these non-specific ligation events, we adopted a dual ligation scheme in which three collinear oligonucleotides were annealed and subsequently ligated following purification (
To estimate the limit of detection, we synthesized a 252 bp RNA fragment containing the barcode target region, and performed the detection assay across a titration ranging from more than 100,000 to fewer than 50 molecules. We then used droplet digital TaqMan PCR (ddPCR, Bio-Rad) to directly count the number of ligation events in each sample. These data establish a current limit of detection between 5 molecules (the detection limit of the ddPCR platform) and 50 molecules (the lowest abundance template tested to date,
In this example, we opted to detect RNA by ligation because of a unique property of how the barcode is oriented: the barcoding oligonucleotide (a, see
In order to measure the abundance of viral RNA for each sample within a pooled cohort by sequencing, a balanced library is constructed such that each patient sub-library is similarly represented. Otherwise, low-titer positive samples will be obscured by over-sequenced patient samples with high viral load. While it is straightforward to balance reads across separate patient pools, building a library that balances reads within a patient pool is important, given that viral loads vary by many orders of magnitude among patients. To solve this problem, we developed a novel framework for internal cohort balancing (ICB) using asymmetric PCR (
For the ICB approach to work at scale in a multiplexed, asymmetric PCR reaction (e.g., 100-1000 patients per cohort), it is important that each patient sub-library saturates independently. Consequently, we have employed RNase H-dependent PCR to ensure that patient-specific primers do not cross-amplify mismatched targets. The degree to which barcodes are resolved by this mechanism will determine the maximum achievable patient cohort size. A distinct advantage of this “balance & sequence” approach over a naïve patient-specific qPCR readout is that even if the ICB PCR is not perfectly specific, crosstalk only affects the efficiency of the sequencing read-out (by introducing an imbalanced representation of samples) and does not result in wrongly diagnosed patients.
Examples 1-4 thus support that this method is robust and promises to dramatically reduce per-sample sequencing costs.
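The effect of library balance on sequencing cost can be illustrated with a simple expected-value calculation in Python. The sketch assumes that reads are drawn in proportion to each patient's share of the library and asks how many total reads are needed for the least abundant patient to receive a chosen expected number of reads; the function name, abundances, and read target are hypothetical.

```python
def total_reads_needed(abundances: dict, min_expected_reads: int) -> int:
    """Total reads so the rarest patient sub-library expects at least min_expected_reads.

    Assumes reads are sampled in proportion to integer library abundances.
    """
    total = sum(abundances.values())
    rarest = min(abundances.values())
    if rarest <= 0:
        raise ValueError("every patient sub-library must have a positive abundance")
    # Integer ceiling of min_expected_reads * total / rarest.
    return (min_expected_reads * total + rarest - 1) // rarest


if __name__ == "__main__":
    # Hypothetical unbalanced cohort versus the same cohort after balancing.
    unbalanced = {"p1": 10, "p2": 1_000, "p3": 100_000}
    balanced = {"p1": 1_000, "p2": 1_000, "p3": 1_000}
    print(total_reads_needed(unbalanced, 100))  # 1010100
    print(total_reads_needed(balanced, 100))    # 300
```

In this hypothetical three-patient cohort, balancing reduces the required depth by more than three orders of magnitude, which is the sense in which a balanced library lowers per-sample sequencing cost.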
In this example, a region of low secondary structure in the SARS-CoV-2 RNA that also provides for design of oligonucleotides having a GC content of about 45% to about 55% serves as the target hybridization region. For example, in some embodiments, oligonucleotides may bind at a region starting at position 28448 within the N gene of SARS-CoV-2 (based on the MT007544.1 genome build; NCBI, Severe acute respiratory syndrome coronavirus 2 isolate Australia/VIC01/2020, complete genome). Examples of sequences of the SARS-CoV-2-targeting region of three collinear oligonucleotides designated as alpha, beta, or gamma as designated in
An example of a complete alpha oligonucleotide sequence is: /5Phos/TCGAGGGAATTTAAGGTCTTCCTTGCCATGTCGANNNNNNNNNN (SEQ ID NO:40). The self-complementary sequences TCGA are shown in bold, as is the barcode, represented by “N”.
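The layout of the alpha oligonucleotide can be checked with a small Python sketch that verifies the self-complementarity of the TCGA motif, computes the GC fraction of the fixed (non-barcode) portion, and substitutes a concrete barcode for the run of N positions. The helper names and the example barcode are hypothetical, the 5′ phosphate modification is omitted from the string, and the sketch makes no claim about which bases form the target-binding region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# SEQ ID NO:40 as given in the text, without the /5Phos/ modification prefix.
ALPHA_TEMPLATE = "TCGAGGGAATTTAAGGTCTTCCTTGCCATGTCGANNNNNNNNNN"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def fill_barcode(template: str, barcode: str) -> str:
    """Replace the trailing run of N positions with a concrete barcode."""
    n_count = template.count("N")
    if not template.endswith("N" * n_count):
        raise ValueError("N positions are expected as a single 3' run")
    if len(barcode) != n_count or set(barcode) - set("ACGT"):
        raise ValueError("barcode must match the number of N positions and use A, C, G, T")
    return template[: len(template) - n_count] + barcode


if __name__ == "__main__":
    fixed_part = ALPHA_TEMPLATE.rstrip("N")
    print(reverse_complement("TCGA") == "TCGA")        # palindromic motif check
    print(round(gc_fraction(fixed_part), 3))
    print(fill_barcode(ALPHA_TEMPLATE, "ACGTACGTAC"))  # hypothetical barcode
```

Because TCGA is its own reverse complement, the two copies of the motif can base pair with each other, consistent with their description as self-complementary sequences.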
An example of a complete beta oligonucleotide sequence is:
GGCATACGAGATTCGGTAGTAGCCAATTTGGTCATCTGGACT 3′.
The sequence shown in bold is a universal amplification sequence.
All publications, patents, and patent applications cited herein are hereby incorporated by reference with respect to the material for which they are expressly cited.
This application claims priority to U.S. provisional application No. 63/088,855 filed Oct. 7, 2020, the entire contents of which are incorporated by reference.
This invention was made with Government support under contract W911NF1920185 awarded by the Defense Advanced Research Projects Agency. The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/053834 | 10/6/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63088855 | Oct 2020 | US |