The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is 830109_416WO_SEQUENCE_LISTING.txt. The text file is 97.5 KB, was created on Sep. 15, 2019, and is being submitted electronically via EFS-Web.
The present disclosure relates to methods and kits for depleting unwanted RNA species from RNA samples, especially for constructing transcriptome sequencing libraries.
Libraries constructed for transcriptome sequencing are heavily composed of unwanted species (e.g., cytoplasmic ribosomal RNA, mitochondrial ribosomal RNA, and globin mRNA) that take up a majority of the sequencing budget and render RNA sequencing extremely inefficient. rRNA alone constitutes greater than 80% of the RNA found in a sample. As a result, various methods have been developed to enrich for mRNA or deplete unwanted RNA from next generation sequencing (NGS) libraries. For example, poly(A) RNA may be isolated from RNA samples. While effective, this procedure is laborious and does not allow for the characterization of long non-coding RNAs or other RNAs that lack poly(A) tails. In addition, it is unsuitable for heavily damaged samples, such as FFPE samples. Other methods use antisense DNA or RNA probes that hybridize to unwanted RNAs in RNA samples prior to NGS library construction. After hybridization, in one approach, the samples are digested with RNase H, an enzyme that specifically degrades the RNA strand of RNA:DNA hybrids, thus removing the unwanted RNAs hybridized to DNA probes. However, this method is not very efficient and is fraught with technical uncertainties. In an alternative approach, the probes are biotinylated, allowing unwanted RNAs to be selectively removed from the samples by capturing the probe/target RNA complexes on streptavidin-coated beads or surfaces. However, this method is time consuming, costly, and only somewhat effective. In addition, the bead binding and washing steps are arduous and usually result in significant sample loss due to non-specific binding and capture.
The present disclosure provides methods, blocking oligonucleotides, compositions, and kits for depleting unwanted RNA species from RNA samples.
In one aspect, the present disclosure provides a method for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription, comprising:
(a) providing an RNA sample that comprises one or more desired RNA species and one or more unwanted RNA species,
(b) annealing one or more blocking oligonucleotides to one or more regions of the one or more unwanted RNA species in the RNA sample to generate a template mixture,
wherein the one or more blocking oligonucleotides are complementary, and stably bind, to the one or more regions of the one or more unwanted RNA species, and comprise 3′ modifications that prevent the one or more blocking oligonucleotides from being extended, and
(c) incubating the template mixture with a reaction mixture that comprises:
(i) at least one reverse transcriptase,
(ii) one or more reverse transcription primers, and
(iii) a reaction buffer,
under conditions sufficient to synthesize cDNA molecules using the one or more desired RNA species as template(s), wherein cDNA synthesis using the one or more unwanted RNA species is inhibited.
In another aspect, the present disclosure provides a set of blocking oligonucleotides that are complementary (preferably fully complementary) to a plurality of regions of an unwanted RNA species, wherein each blocking oligonucleotide comprises one or more modified nucleotides that increase its binding to a region of the unwanted RNA species.
In a related aspect, the present disclosure provides a plurality of sets of blocking oligonucleotides.
In another aspect, the present disclosure provides a kit for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample, comprising:
(1) (a) one or more blocking oligonucleotides that are complementary to one or more regions of one or more unwanted RNA species in the RNA sample, and each comprise one or more modified nucleotides that increase the binding between the one or more blocking oligonucleotides and the regions of the one or more unwanted RNA species, or
(2) a reverse transcriptase.
In another aspect, the present disclosure provides a method for designing blocking oligonucleotides for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription, comprising:
(a) generating multiple blocking oligonucleotides complementary to regions of the one or more unwanted RNA species,
(b) filtering out unacceptable blocking oligonucleotides,
(c) generating one or more groups of blocking oligonucleotides that are complementary to multiple different regions of the one or more unwanted RNA species, and
(d) optionally shuffling blocking oligonucleotides among the groups to generate new groups of blocking oligonucleotides and selecting one or more of the new groups of blocking oligonucleotides.
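As an illustration only, design steps (a) to (d) above could be sketched in Python as follows. The filtering criteria (GC content between 30% and 70%, no homopolymer run longer than four bases), the minimum gap between tiled oligonucleotides, and the rule of keeping the group with the most oligonucleotides are hypothetical placeholders chosen for this sketch, not criteria stated in this disclosure; all function names are likewise hypothetical. Sequences are handled 5′→3′, and candidates are plain DNA antisense strands without modified nucleotides.

```python
import random

# Map each target base (RNA or DNA alphabet) to the complementary DNA base.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense(region: str) -> str:
    """Return the DNA oligo (5'->3') fully complementary to a region given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(region.upper()))


def generate_candidates(target: str, length: int) -> list[tuple[int, str]]:
    """Step (a): one antisense candidate per window start, stored as (start, oligo)."""
    return [(i, antisense(target[i:i + length]))
            for i in range(len(target) - length + 1)]


def is_acceptable(oligo: str, gc_min: float = 0.3, gc_max: float = 0.7,
                  max_run: int = 4) -> bool:
    """Step (b): HYPOTHETICAL filter on GC content and homopolymer run length."""
    gc = sum(base in "GC" for base in oligo) / len(oligo)
    longest = run = 1
    for prev, cur in zip(oligo, oligo[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return gc_min <= gc <= gc_max and longest <= max_run


def fits(start: int, group, length: int, min_gap: int) -> bool:
    """True if a candidate at `start` leaves at least `min_gap` nt to every region in `group`."""
    return all(abs(start - s) >= length + min_gap for s, _ in group)


def tile(candidates, length: int, min_gap: int) -> list[tuple[int, str]]:
    """Step (c): greedily build one group of non-overlapping, tiled candidates."""
    group = []
    for start, oligo in sorted(candidates):
        if fits(start, group, length, min_gap):
            group.append((start, oligo))
    return group


def shuffle_groups(groups, length: int, min_gap: int,
                   trials: int = 100, seed: int = 0) -> list[tuple[int, str]]:
    """Step (d), optional: rebuild groups from the pooled oligos in random order and
    keep the group with the most oligos (a HYPOTHETICAL selection rule)."""
    rng = random.Random(seed)
    pool = sorted({cand for group in groups for cand in group})
    best = max(groups, key=len)
    for _ in range(trials):
        rng.shuffle(pool)
        new_group = []
        for start, oligo in pool:
            if fits(start, new_group, length, min_gap):
                new_group.append((start, oligo))
        if len(new_group) > len(best):
            best = sorted(new_group)
    return best


# Hypothetical toy target; a real design would use the unwanted RNA sequence.
target = "AUGCUAGC" * 8
length, gap = 18, 2
candidates = [c for c in generate_candidates(target, length) if is_acceptable(c[1])]
groups = [tile([c for c in candidates if c[0] >= offset], length, gap)
          for offset in range(3)]
print([start for start, _ in shuffle_groups(groups, length, gap)])
```

Building several groups from different starting offsets, and then reshuffling the pooled oligonucleotides, is one simple way to realize the grouping and shuffling steps; a practical design would replace the placeholder filter and selection rule with whatever criteria the user adopts.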
In another aspect, the present disclosure provides use of the kit of any of claims 28 to 43 or component (1) thereof in inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample.
The present disclosure provides methods, blocking oligonucleotides, compositions, and kits for depleting unwanted RNA species from RNA samples. The resulting depleted RNA samples are useful for various downstream applications, especially for constructing transcriptome sequencing libraries.
The methods provided herein use blocking oligonucleotides complementary to regions of unwanted RNA species (e.g., locked nucleic acid (LNA)-enhanced antisense oligonucleotides) to inhibit cDNA synthesis of the unwanted RNA species during reverse transcription.
Also disclosed are methods for designing tiled blocking oligonucleotides (e.g., LNA-enhanced antisense oligonucleotides), along an undesired RNA (e.g., cytoplasmic and mitochondrial rRNA, globin mRNA) at designated positions. The LNA bases are positioned in the oligonucleotides to facilitate the persistent binding of the antisense oligonucleotides to the unwanted RNA at commonly used reverse transcription temperatures.
The methods for depleting unwanted RNA species provided herein have one or more of the following advantages compared to existing methods: (1) because unwanted RNA depletion according to the present methods occurs during, rather than prior to, NGS library construction, they are faster and take fewer steps; (2) the present methods can be used not only with anchored oligo(dT) primed libraries, but also with random hexamer primed libraries; (3) the present methods can be used to deplete any unwanted RNAs (as opposed to enriching only poly(A)-containing RNAs using oligo(dT)); (4) the present methods do not significantly alter the remaining RNA profile of the samples (as opposed to poly(A) mRNA enrichment using oligo(dT)); (5) the present methods are more effective than or at least as effective as existing methods in depleting unwanted RNAs; and (6) the present methods cause less sample loss (e.g., compared to rRNA removal using biotin-labeled antisense oligonucleotides and streptavidin coated magnetic beads).
In the following description, any ranges provided herein include all the values in the ranges. It should also be noted that the term “or” is generally employed in its sense including “and/or” (i.e., to mean either one, both, or any combination thereof of the alternatives) unless the content dictates otherwise. Also, as used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content dictates otherwise. The terms “include,” “have,” “comprise” and their variants are used synonymously and are to be construed as non-limiting. The term “about” refers to ±10% of a reference value. For example, “about 50° C.” refers to “50° C.±5° C.” (i.e., 50° C.±10% of 50° C.).
In one aspect, the present disclosure provides a method for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription, comprising:
(a) providing an RNA sample that comprises one or more desired RNA species and one or more unwanted RNA species,
(b) annealing one or more blocking oligonucleotides to one or more regions of the one or more unwanted RNA species in the RNA sample to generate a template mixture,
wherein the one or more blocking oligonucleotides are complementary, and stably bind, to the one or more regions of the one or more unwanted RNA species, and comprise 3′ modifications that prevent the one or more blocking oligonucleotides from being extended, and
(c) incubating the template mixture with a reaction mixture that comprises:
(i) at least one reverse transcriptase,
(ii) one or more reverse transcription primers, and
(iii) a reaction buffer,
under conditions sufficient to synthesize cDNA molecules using the one or more desired RNA species as template(s), wherein cDNA synthesis using the one or more unwanted RNA species is inhibited.
1. Inhibiting cDNA Synthesis
cDNA synthesis of an RNA species is inhibited if the amount of single stranded or double stranded cDNA generated using the RNA species as a template during reverse transcription is reduced to a statistically significant degree under a modified condition (e.g., in the presence of one or more blocking oligonucleotides complementary to one or more regions of the RNA species) compared to the amount of single stranded or double stranded cDNA generated during reverse transcription under a reference condition (e.g., in the absence of the one or more blocking oligonucleotides).
The reduction in the amount of synthesized cDNA may be measured using qPCR or transcriptome sequencing as disclosed in the Examples provided herein, or using other techniques known to those skilled in the art (e.g., DNA microarrays).
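As a hedged illustration of how a qPCR readout might be converted into a reduction in cDNA, the following sketch compares quantification cycle (Cq) values for an unwanted RNA species with and without blocking oligonucleotides. It assumes the same amplification efficiency in both reactions (1.0 by default, i.e., the amplicon doubles every cycle) and omits normalization to a reference transcript; the function names and example Cq values are hypothetical, and statistical significance would still have to be assessed separately across replicates.

```python
def remaining_fraction(cq_blocked: float, cq_unblocked: float,
                       efficiency: float = 1.0) -> float:
    """Fraction of target cDNA left after blocking, relative to the unblocked reaction.

    Assumes the same amplification efficiency in both reactions;
    efficiency=1.0 means the amplicon doubles every cycle.
    """
    return (1.0 + efficiency) ** -(cq_blocked - cq_unblocked)


def percent_reduction(cq_blocked: float, cq_unblocked: float,
                      efficiency: float = 1.0) -> float:
    """Percent reduction in cDNA synthesized from the unwanted RNA species."""
    return 100.0 * (1.0 - remaining_fraction(cq_blocked, cq_unblocked, efficiency))


# Hypothetical example: Cq of 15.0 without and 20.0 with blocking oligonucleotides.
# Five cycles later corresponds to 1/32 of the cDNA remaining.
print(percent_reduction(20.0, 15.0))  # 96.875
```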
The inhibition of cDNA synthesis of an RNA species may be referred to as depletion of the RNA species or as depleting the RNA species. Even though the RNA species is not physically removed from an initial RNA sample, the involvement of the RNA species in the downstream manipulation or analysis of the initial RNA sample is reduced or eliminated due to the inhibition of cDNA synthesis of the RNA species.
2. Unwanted RNA Species
The term “unwanted RNA species,” “unwanted RNAs,” or “unwanted RNA molecules” refers to RNA species or molecules undesired in an initial RNA composition for a given downstream manipulation or analysis of the RNA composition. Such RNA species or molecules are not the targets of, but may interfere with, downstream manipulation or analysis.
The unwanted RNA may be any undesired RNA present in the initial RNA composition. The unwanted RNA may comprise any sequence as long as it is distinguishable by its sequence from the remaining RNA population of interest to allow a sequence-specific design of blocking oligonucleotides.
According to one embodiment, the unwanted RNA is selected from one or more of the group consisting of rRNA, tRNA, snRNA, snoRNA and abundant protein mRNA.
When processing eukaryotic samples, the unwanted RNA may be a eukaryotic rRNA, preferably selected from 28S rRNA, 18S rRNA, 5.8S rRNA, 5S rRNA, mitochondrial 12S rRNA and mitochondrial 16S rRNA. Preferably, at least two, at least three, or more preferably at least four of the aforementioned rRNA types are depleted, wherein 18S rRNA and 28S rRNA are preferably among the rRNAs to be depleted. According to one embodiment, all of the aforementioned rRNA types are depleted. Furthermore, it is preferred to also deplete other non-coding rRNA species, such as eukaryotic mitochondrial 12S and 16S rRNA molecules, in addition to the 28S rRNA and 18S rRNA. Where total RNA from plant samples is processed, plastid rRNA, such as chloroplast rRNA, may be depleted.
In certain embodiments, the unwanted RNA is one or more selected from the group consisting of prokaryotic 23S, 16S and 5S rRNA. This is particularly suitable when processing a prokaryotic sample. Preferably, all of these rRNA types are depleted using one or more groups of blocking oligonucleotides specific for the respective rRNA type.
Furthermore, the methods of the present disclosure may also be used to specifically deplete abundant protein-coding mRNA species. Depending on the processed sample, the mRNA in the sample may correspond predominantly to a certain abundant mRNA type. For example, when analyzing (e.g., sequencing) the transcriptome of a blood sample, most of the mRNA in the sample will correspond to globin mRNA. However, for many applications, the sequence of the globin mRNA is not of interest; thus globin mRNA, even though it is a protein-coding mRNA, also represents an unwanted RNA for such applications. Additional unwanted, abundant protein-coding mRNAs may include mRNAs of ACTB, B2M, GAPDH, GUSB, HPRT1, HSP90AB1, LDHA, NONO, PGK1, PPIH, RPLP0, TFRC or various mitochondrial genes.
In certain embodiments, as described below, the RNA sample may be derived from (e.g., isolated from) a starting material that contains nucleic acids from multiple organisms, such as an environmental sample that contains plant, animal, and/or bacterial species or a clinical sample that contains human cells or tissues and one or more bacterial species. In such embodiments, unwanted RNA species may encompass or consist of a specific type of RNA species (e.g., 5S rRNA) from multiple organisms (e.g., multiple different bacteria) present in the starting material so that the method is capable of inhibiting cDNA synthesis of the specific type of RNA species from the multiple organisms (e.g., inhibiting cDNA synthesis of 5S rRNA from multiple bacteria in a starting material). In some other embodiments, unwanted RNA species may encompass or consist of multiple types of RNA species (e.g., 5S, 16S and 23S rRNAs) from multiple organisms (e.g., multiple different bacteria) present in the starting material so that the method is capable of inhibiting cDNA synthesis of multiple types of RNA species from the multiple organisms (e.g., inhibiting cDNA synthesis of 5S, 16S and 23S rRNAs from multiple bacteria in a starting material).
In certain embodiments, the number of different unwanted RNA species to which blocking oligonucleotides are complementary is at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200, at least 300, at least 400, or at least 500, and/or at most 1,000,000, at most 500,000, at most 100,000, at most 50,000, at most 10,000, at most 9000, at most 8000, at most 7000, at most 6000, at most 5000, at most 4000, at most 3000, or at most 2000, such as from 2 to 1,000,000, from 100 to 500,000, from 500 to 100,000, and from 1000 to 10,000.
3. RNA Sample
As described above, step (a) of a method for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription disclosed herein is to provide an RNA sample that comprises one or more desired RNA species and one or more unwanted RNA species.
The term “RNA sample” refers to an RNA-containing sample. Preferably, an RNA sample is a sample containing RNAs isolated from a starting material. An RNA sample may further contain DNAs isolated from the starting material. In some embodiments, an RNA sample contains RNA molecules that have been isolated from a starting material and further fragmented. In other cases, an RNA sample is derived from a directly lysed sample without specific nucleic acid isolation.
The term “nucleic acid” or “nucleic acids” as used herein refers to a polymer comprising ribonucleosides or deoxyribonucleosides that are covalently bonded, typically by phosphodiester linkages between subunits. Nucleic acids include DNA and RNA. DNA includes but is not limited to genomic DNA, linear DNA, circular DNA, plasmid DNA, cDNA and free circulating DNA (e.g., tumor derived or fetal DNA). RNA includes but is not limited to hnRNA, mRNA, noncoding RNA (ncRNA), and free circulating RNA (e.g., tumor derived RNA). Noncoding RNA includes but is not limited to rRNA, tRNA, lncRNA (long non-coding RNA), lincRNA (long intergenic non-coding RNA), miRNA (micro RNA), and siRNA (small interfering RNA).
The starting material from which the RNA sample is generated can be any material that comprises RNA molecules. The starting material can be a biological sample or material, such as a cell sample, an environmental sample, a sample obtained from a body, in particular a body fluid sample, and a human, animal or plant tissue sample. Specific examples include but are not limited to whole blood, blood products, plasma, serum, red blood cells, white blood cells, buffy coat, urine, sputum, saliva, semen, lymphatic fluid, amniotic fluid, cerebrospinal fluid, peritoneal effusions, pleural effusions, fluid from cysts, synovial fluid, vitreous humor, aqueous humor, bursa fluid, eye washes, eye aspirates, pulmonary lavage, bone marrow aspirates, lung aspirates, biopsy samples, swab samples, animal (including human) or plant tissues, including but not limited to samples from liver, spleen, kidney, lung, intestine, brain, heart, muscle, pancreas, cell cultures, as well as lysates, extracts, or materials and fractions obtained from the samples described above or any cells and microorganisms and viruses that may be present on or in a sample and the like.
Materials obtained from clinical or forensic settings that contain RNA are also within the intended meaning of a starting material. Preferably, the starting material is a biological sample derived from a eukaryote or prokaryote, preferably from human, animal, plant, bacteria or fungi. Preferably, the starting material is selected from the group consisting of cells, tissue, tumor cells, bacteria, virus and body fluids such as blood, blood products (e.g., buffy coat, plasma and serum), urine, liquor, sputum, stool, CSF and sperm, epithelial swabs, biopsies, bone marrow samples and tissue samples, preferably organ tissue samples such as lung, kidney or liver.
The starting material also includes processed samples such as preserved, fixed and/or stabilized samples. Non-limiting examples of such samples include cell containing samples that have been preserved, such as formalin fixed and paraffin-embedded (FFPE samples) or other samples that were treated with cross-linking or non-crosslinking fixatives (e.g., glutaraldehyde) or the PAXgene Tissue system. For example, tumor biopsy samples are routinely stored after surgical procedures as FFPE samples, which may compromise RNA integrity and, in particular, degrade the RNA contained therein. Thus, an RNA sample may consist of or comprise modified or degraded RNA. The modification or degradation can be due to, for example, treatment with one or more preservatives.
Nucleic acids can be isolated from a starting material according to methods known in the art to provide an RNA sample. The RNA sample may contain both DNA and RNA. In certain embodiments, the RNA sample contains predominantly RNA as DNA in the starting material has been removed or degraded. RNA in an RNA sample may be total RNA isolated from a starting material. Alternatively, RNA in an RNA sample may be a fraction of total RNA (e.g., the fraction containing mostly mRNA) isolated from a starting material where certain RNA species (e.g., RNA without a poly(A) tail) have been depleted or removed.
As disclosed above, an RNA sample may contain RNA molecules that have been isolated from a starting material and further fragmented. Fragmenting nucleic acids, such as isolated RNAs, may be performed physically, enzymatically or chemically. Physical fragmentation includes acoustic shearing, sonication, and hydrodynamic shearing. Enzymatic fragmentation may use an endonuclease (e.g., RNase III) that cleaves RNA into small fragments with 5′ phosphate and 3′ hydroxyl groups. Chemical fragmentation includes treatment with heat and divalent metal cations (e.g., magnesium or zinc).
Also as disclosed above, in certain embodiments, an RNA sample is from a crude lysate where specific nucleic acid isolation has not been performed.
4. Desired RNA Species
In addition to unwanted RNAs, an RNA sample also contains one or more desired RNA species. Desired RNA species can be any RNA species or molecules whose characteristics (e.g., expression level or sequence) are of interest. In certain embodiments, the desired RNA species comprise mRNA, preferably mRNAs whose expression level changes (compared with a reference expression level) or sequence changes (compared with wild type sequences) are associated with a disease or disorder or with responsiveness to a treatment of a disease or disorder.
5. Blocking Oligonucleotides
The term “blocking oligonucleotide” as used herein refers to an oligonucleotide that is complementary and capable of stably binding to a region of an unwanted RNA species. The blocking oligonucleotide may be described as “targeting” the region of the unwanted RNA species. The blocking oligonucleotide is incapable of being extended due to a modification at its 3′ terminus (i.e., “3′ modification”). Consequently, the blocking oligonucleotide is able to inhibit cDNA synthesis using the region of the unwanted RNA species as a template during reverse transcription.
An oligonucleotide is capable of stably binding to a region of an RNA species if the oligonucleotide anneals to the region of the RNA species and stays bound to the region of the RNA species during reverse transcription of an RNA sample comprising the RNA species.
Preferably, a blocking oligonucleotide contains one or more modified nucleotides that increase the binding between the oligonucleotide and the region of the unwanted RNA species compared to an oligonucleotide with the same sequence but without any modified nucleotides. In certain other embodiments, a blocking oligonucleotide does not contain any of the above-described modified nucleotides, but is sufficiently long to be able to stably bind to a region of the unwanted RNA species during reverse transcription.
In the embodiments where a blocking oligonucleotide contains one or more modified nucleotides that increase the binding between the oligonucleotide and the region of an unwanted RNA species, the region of the unwanted RNA species to which the blocking oligonucleotide is complementary may be at least 10 nucleotides in length, such as at least 11, 12, 13, 14, 15, 16, 17, or 18 nucleotides in length. Such a region may be at most 100 nucleotides in length, such as at most 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, or 20 nucleotides in length. In certain embodiments, the region may be 10 to 100 nucleotides in length, such as 15 to 80, 20 to 60, 25 to 40, 10 to 30, 16 to 24, or 18 to 22 nucleotides in length.
In the embodiments where a blocking oligonucleotide does not contain any modified nucleotides that increase the binding between the oligonucleotide and the region of an unwanted RNA species, the region of the unwanted RNA species to which the blocking oligonucleotide is complementary may be at least 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length. Such a region may be at most 100 nucleotides in length, such as at most 90, 80, 70, 60, or 50 nucleotides in length. In certain embodiments, the region may be 20 to 100 nucleotides in length, such as 25 to 90, 25 to 80, 25 to 70, 25 to 60, 25 to 50, 25 to 40, 25 to 30, 30 to 90, 30 to 80, 30 to 70, 30 to 60, 30 to 50, 30 to 40, 35 to 90, 35 to 80, 35 to 70, 35 to 60, 35 to 50, 35 to 40, 40 to 90, 40 to 80, 40 to 70, 40 to 60, or 40 to 50 nucleotides in length.
As disclosed above, a blocking oligonucleotide is complementary to a region of an unwanted RNA species. An oligonucleotide is complementary to a region of an unwanted RNA species if at least 80%, such as at least 85%, at least 90% or preferably at least 95% of nucleotides in the oligonucleotide are complementary to the region of the unwanted RNA species. In certain embodiments, a blocking oligonucleotide comprises one or more (e.g., at most 6, at most 5, at most 4, at most 3, at most 2, or only 1) nucleotide mismatches with the region of the unwanted RNA species. Preferably, the mismatch is at or near (e.g., within the first 10 nucleotides, such as within the first 5 nucleotides, from) the 5′ terminus of the oligonucleotide. For example, a blocking oligonucleotide having the sequence 5′-GACAAACCCTTGTGTCGAG-3′ (SEQ ID NO: 15) is complementary to the region 5′-GTCGACACAAGGGTTTGTC-3′ (SEQ ID NO: 508) of an unwanted RNA species even though the 3′ terminal “G” of the oligonucleotide is mismatched with the 5′ terminal “G” of the region. In certain other embodiments, a blocking oligonucleotide may comprise an insertion of one or more nucleotides (e.g., an insertion of at most 6, at most 5, at most 4, at most 3, at most 2, or only 1 nucleotide) when compared with the fully complementary sequence of the region of the unwanted RNA species. For example, a blocking oligonucleotide may comprise two segments that are fully complementary to two contiguous sections of a region of an unwanted RNA species, respectively, but are separated by one or more nucleotides.
Preferably, a blocking oligonucleotide is fully complementary to a region of an unwanted RNA species. An oligonucleotide is fully complementary to a region of an unwanted RNA species if each nucleotide of the oligonucleotide is complementary to the nucleotide at the corresponding position in the region of the unwanted RNA species. For example, an oligonucleotide having the sequence 5′-GACAAACCCTTGTGTCGAG-3′ (SEQ ID NO: 15) is fully complementary to the region 5′-CTCGACACAAGGGTTTGTC-3′ (SEQ ID NO: 509) of an unwanted RNA species.
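The complementarity definitions above lend themselves to a simple programmatic check. The sketch below is a minimal illustration, assuming both the oligonucleotide and the target region are written 5′→3′ (the usual convention in a sequence listing) and are aligned end to end without insertions; the function names are hypothetical. Applied to the sequence strings shown above for SEQ ID NO: 15 and SEQ ID NO: 509, it finds no mismatches; applied to SEQ ID NO: 15 and SEQ ID NO: 508, it finds a single mismatch at the oligonucleotide's 3′-terminal position, i.e., about 94.7% complementarity.

```python
# Map each base (RNA or DNA alphabet) to the complementary DNA base.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_positions(oligo: str, region: str) -> list[int]:
    """0-based positions, counted from the oligo's 5' end, that do not base-pair.

    Both sequences are written 5'->3' and must be the same length (no insertions).
    Oligo position i pairs with region position len(region) - 1 - i.
    """
    if len(oligo) != len(region):
        raise ValueError("oligo and region must have the same length")
    oligo = oligo.upper().replace("U", "T")
    return [i for i, (o, r) in enumerate(zip(oligo, reversed(region.upper())))
            if COMPLEMENT[r] != o]


def percent_complementary(oligo: str, region: str) -> float:
    """Percentage of oligo nucleotides that pair with the region."""
    return 100.0 * (1 - len(mismatch_positions(oligo, region)) / len(oligo))


def is_complementary(oligo: str, region: str, min_percent: float = 80.0) -> bool:
    """Apply the 'at least 80%' threshold (or a stricter one, e.g. 95.0)."""
    return percent_complementary(oligo, region) >= min_percent


oligo = "GACAAACCCTTGTGTCGAG"  # SEQ ID NO: 15 as written above
print(reverse_complement(oligo))  # CTCGACACAAGGGTTTGTC
print(mismatch_positions(oligo, "GTCGACACAAGGGTTTGTC"))  # [18], the oligo's 3'-terminal base
```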
Also as disclosed above, a blocking oligonucleotide has a 3′ modification that prevents the oligonucleotide from being extended during reverse transcription. The 3′ modification replaces the 3′-OH of an oligonucleotide with another group (e.g., a phosphate group), rendering the resulting oligonucleotide incapable of being extended by a reverse transcriptase during reverse transcription. 3′ modifications that prevent oligonucleotides from being extended include, but are not limited to, 3′ ddC (dideoxycytidine), 3′ inverted dT, 3′ C3 spacer, 3′ Amino Modifier (3AmMo), and 3′ phosphorylation. Some of these 3′ modifications are commercially available, such as from Integrated DNA Technologies.
a. Blocking Oligonucleotides Having Modified Nucleotides for Increasing Binding
As disclosed above, preferably, a blocking oligonucleotide comprises one or more modified nucleotides that increase the binding between the blocking oligonucleotide and a region of an unwanted RNA species to which the blocking oligonucleotide is complementary compared to an oligonucleotide with the same sequence but without any modified nucleotide.
Modified nucleotides are nucleotides other than the naturally occurring nucleotides, each of which comprises a phosphate group, a 5-carbon sugar (i.e., deoxyribose or ribose), and a nitrogenous base selected from adenine, cytosine, guanine, thymine and uracil.
A modified nucleotide increases the binding between an oligonucleotide and a region of an unwanted RNA species, compared to an oligonucleotide with the same sequence but without any modified nucleotides, if it increases the melting temperature of the duplex formed between the oligonucleotide comprising the modified nucleotide and the region of the unwanted RNA species relative to the melting temperature of the duplex formed between the unmodified oligonucleotide of the same sequence and the same region, measured under the same conditions (e.g., in 20 mM KCl).
The melting temperature (Tm) of an oligonucleotide as used in the present disclosure is the temperature at which 50% of the oligonucleotide is duplexed with its perfect complement and 50% is free in 115 mM KCl. Tm is determined by measuring the absorbance change of the oligonucleotide with its complement as a function of temperature (i.e., generating a melting curve). The Tm is the reading halfway between the double-stranded DNA and single stranded DNA plateaus in the melting curve.
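The midpoint reading described above can be sketched as follows. This minimal Python illustration assumes flat (not sloping) duplex and single-strand plateaus, estimates each plateau as the mean of a fixed number of points at either end of a melting curve recorded at ascending temperatures, and linearly interpolates the temperature at which the normalized absorbance reaches 0.5. The function name, the `plateau_points` parameter, and the example data are hypothetical; other approaches, such as fitting sloping baselines or taking the maximum of the first derivative, are also used.

```python
def melting_temperature(temps_c, absorbance, plateau_points=3):
    """Estimate Tm as the temperature halfway between the two absorbance plateaus.

    temps_c must be ascending. Absorbance rises as the duplex melts, so the low
    plateau is taken as double-stranded and the high plateau as single-stranded.
    """
    if len(temps_c) != len(absorbance):
        raise ValueError("temps_c and absorbance must have the same length")
    if len(temps_c) < 2 * plateau_points + 1:
        raise ValueError("curve too short for the chosen plateau size")
    low = sum(absorbance[:plateau_points]) / plateau_points    # duplex plateau
    high = sum(absorbance[-plateau_points:]) / plateau_points  # single-strand plateau
    if high == low:
        raise ValueError("no melting transition detected")
    frac = [(a - low) / (high - low) for a in absorbance]      # 0 = duplex, 1 = melted
    for i in range(1, len(temps_c)):
        f0, f1 = frac[i - 1], frac[i]
        if f0 <= 0.5 <= f1 and f1 != f0:
            # Linear interpolation between the two points bracketing the midpoint.
            return temps_c[i - 1] + (0.5 - f0) * (temps_c[i] - temps_c[i - 1]) / (f1 - f0)
    raise ValueError("normalized curve never crosses 0.5")


# Hypothetical melting curve (arbitrary absorbance units).
temps = [60, 65, 70, 75, 80, 85, 90, 95, 100]
a260 = [100, 100, 100, 110, 120, 130, 140, 140, 140]
print(melting_temperature(temps, a260))  # 80.0
```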
Exemplary nucleotides capable of increasing the Tm of oligonucleotides that comprise them include, but are not limited to, nucleotides comprising 2′-O-methylribose, 5-hydroxybutynyl-2′-deoxyuridine (Integrated DNA Technologies), 2-amino-2′-deoxyadenosine (IBA Lifesciences), 5-methyl-2′-deoxycytidine (IBA Lifesciences), or locked nucleic acids (LNA).
Preferably, blocking oligonucleotides comprise one or more LNAs. LNA is a modified RNA nucleotide. The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. The bridge “locks” the ribose in the 3′-endo (North) conformation, which is often found in A-form duplexes. LNA nucleotides can be mixed with DNA or RNA residues in an oligonucleotide and hybridize with DNA or RNA according to Watson-Crick base-pairing rules. The locked ribose conformation enhances base stacking and backbone pre-organization, which significantly increases the hybridization properties (melting temperature) of oligonucleotides (see e.g., Kaur et al., Biochemistry 45(23): 7347-55, 2006; Owczarzy et al., Biochemistry 50(43): 9352-67, 2011). The duplex melting temperature can increase by 2-8° C. per LNA nucleotide incorporated into an oligonucleotide. DNA or RNA oligonucleotides that comprise one or more LNA nucleotides are referred to as “LNA oligonucleotides.” Such oligonucleotides can be synthesized by conventional phosphoramidite chemistry and are commercially available (e.g., from Exiqon).
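Taking the 2-8° C. per LNA figure above at face value, a rough, purely additive estimate of the duplex Tm after LNA incorporation could be sketched as below. The additive model is a simplification assumed only for illustration: the actual contribution of each LNA depends on sequence context and position, and nearest-neighbor models (e.g., Owczarzy et al., 2011, cited above) give better estimates. The function names and the example starting Tm are hypothetical; the helper simply checks whether the estimated range overlaps the preferred 86 to 92° C. duplex Tm range disclosed herein.

```python
def estimated_tm_range(base_tm_c: float, n_lna: int,
                       per_lna_min_c: float = 2.0,
                       per_lna_max_c: float = 8.0) -> tuple[float, float]:
    """Additive (simplified) Tm range after adding n_lna LNA nucleotides."""
    if n_lna < 0:
        raise ValueError("n_lna must be non-negative")
    return (base_tm_c + n_lna * per_lna_min_c, base_tm_c + n_lna * per_lna_max_c)


def may_reach_window(base_tm_c: float, n_lna: int,
                     window: tuple[float, float] = (86.0, 92.0)) -> bool:
    """True if the estimated Tm range overlaps the target window."""
    low, high = estimated_tm_range(base_tm_c, n_lna)
    return low <= window[1] and high >= window[0]


# Hypothetical unmodified duplex Tm of 60 degrees C.
print(estimated_tm_range(60.0, 4))  # (68.0, 92.0)
print(may_reach_window(60.0, 1))    # False
```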
Additional blocking oligonucleotides may be peptide nucleic acid oligomers, which are synthetic polymers similar to DNA or RNA but with a backbone composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In peptide nucleic acid oligomers, various purine and pyrimidine bases are linked to the backbone by a methylene bridge (—CH2—) and a carbonyl group (—(C═O)—).
The number of modified nucleotides (e.g., LNAs) in a blocking oligonucleotide ranges from 3 to 30, preferably 4 to 16, more preferably 3 to 15.
The lengths of blocking oligonucleotides may be at least 10 nucleotides, such as at least 11, 12, 13, 14, 15, 16, 17, or 18 nucleotides. They may be at most 100 nucleotides, such as at most 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, or 20 nucleotides. In certain embodiments, the lengths may be 10 to 100 nucleotides, such as 15 to 80, 20 to 60, 25 to 40, 10 to 30, 16 to 24, or 18 to 22 nucleotides.
The melting temperatures of duplexes formed between blocking oligonucleotides and the regions of unwanted RNA species to which they are complementary range from 80 to 96° C., 82 to 94° C., or preferably 86 to 92° C., as measured in 115 mM KCl.
b. Blocking Oligonucleotides without Modified Nucleotides for Increasing Binding
As disclosed above, in certain embodiments, a blocking oligonucleotide does not comprise any modified nucleotides that increase the binding between the blocking oligonucleotide and a region of an unwanted RNA species to which the blocking oligonucleotide is complementary, but is sufficiently long to be able to stably bind to a region of the unwanted RNA species during reverse transcription.
The lengths of blocking oligonucleotides without the above-described modified nucleotides may be at least 20 nucleotides in length, such as at least 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. They may be at most 100 nucleotides, such as at most 90, 80, 70, 60, 50, 45, or 40 nucleotides in length. In certain embodiments, the lengths may be 25 to 100 nucleotides, such as 30 to 80, 30 to 70, 30 to 60, 30 to 50, 30 to 45, 30 to 40, 35 to 80, 35 to 70, 35 to 60, 35 to 50, 35 to 45, 40 to 80, 40 to 70, 40 to 60, 40 to 50, or 40 to 45 nucleotides.
The melting temperatures of duplexes formed between these blocking oligonucleotides and the regions of unwanted RNA species to which they are complementary range from 80 to 96° C., 82 to 94° C., or preferably 86 to 92° C., as measured in 115 mM KCl.
c. Multiple Blocking Oligonucleotides
The number of blocking oligonucleotides used in the method disclosed herein may be at least 2, at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 150, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1500, at least 2000, at least 3000, at least 4000, at least 5000, at least 6000, at least 7000, at least 8000, at least 9000, or at least 10,000, and/or at most 100,000, at most 90,000, at most 80,000, at most 70,000, at most 60,000, or at most 50,000, such as from 2 to 100,000, from 100 to 80,000, or from 800 to 50,000.
In certain embodiments, 2 or more blocking oligonucleotides are complementary to multiple different regions (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) of a single unwanted RNA species. In certain other embodiments, 2 or more blocking oligonucleotides are complementary to multiple different regions (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 different regions) of multiple unwanted RNA species (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 unwanted RNA species).
In certain embodiments where multiple blocking oligonucleotides are complementary to multiple different regions of one or more unwanted RNA species, the distances between two neighboring regions of the one or more unwanted RNA species to which the blocking oligonucleotides are complementary may range from 0 to 100 nucleotides, such as 0 to 75 nucleotides, 0 to 50 nucleotides, 20 to 100 nucleotides, 20 to 75 nucleotides, 20 to 50 nucleotides, 30 to 100 nucleotides, 30 to 75 nucleotides, 30 to 50 nucleotides, or 30 to 45 nucleotides.
In certain embodiments, the blocking oligonucleotides comprise or consist of a set of blocking oligonucleotides for inhibiting cDNA synthesis of a single unwanted RNA species (e.g., E. coli 5S rRNA). The blocking oligonucleotides are complementary to multiple different (preferably evenly spaced as described in detail in other sections below) regions of the unwanted RNA species.
In certain other embodiments, the blocking oligonucleotides comprise or consist of a plurality of sets of blocking oligonucleotides for inhibiting cDNA synthesis of multiple unwanted RNA species. Each set of blocking oligonucleotides is complementary to multiple different (preferably evenly spaced) regions of an unwanted RNA species as described above, and different sets of blocking oligonucleotides are complementary to evenly spaced regions of different unwanted RNA species.
Blocking oligonucleotides may also be referred to herein as “blockers,” “blocking antisense oligonucleotides,” or the like.
Exemplary blocking oligonucleotides (Blockers B1 to B193) that can be used in depleting human 18S rRNA in the method according to the present disclosure are described in the Examples. Exemplary blocking oligonucleotides (Blockers 5S1 to 5S100, Blockers 16S1 to 16S100, Blockers 23S1 to 23S100) that can be used in depleting bacterial 5S, 16S, and 23S rRNAs, respectively, are described in Example 4.
Additional descriptions of blocking oligonucleotides are provided in Sections B, C and D of the present disclosure below.
6. Annealing Blocking Oligonucleotides to Unwanted RNAs
As disclosed above, step (b) of a method for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription disclosed herein is to anneal one or more blocking oligonucleotides to one or more regions of one or more unwanted RNA species in the RNA sample to generate a template mixture.
This step may be performed by mixing an RNA sample with one or more blocking oligonucleotides under conditions appropriate for the blocking oligonucleotide(s) to anneal to the one or more regions of the one or more unwanted RNA species in the RNA sample. The resulting mixture is referred to herein as “annealing mixture.”
Typically, the annealing mixture is first heated to a high temperature (e.g., about 65° C., about 70° C., about 75° C., about 80° C., about 85° C., about 90° C., or about 95° C., or at least 65° C., at least 70° C., or preferably at least 75° C.) for a sufficient period of time (e.g., at least about 30 seconds, such as at least 1 minute or at least 2 minutes) to denature the RNA molecules in the RNA sample, and is then cooled to a lower temperature (e.g., at or below 40° C., such as at or below 25° C., at or below room temperature (22° C. to 25° C.), or at 4° C.).
The cooling process may be performed in various ways, such as by gradually reducing the temperature in defined steps held for defined time periods, or by letting the mixture cool naturally to room temperature. Exemplary cooling processes include but are not limited to the following:
Turn off thermocycler, let it cool down to room temperature
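As an illustration of the heat-then-cool procedure described above, the following Python sketch builds a stepwise cooling program and checks it against the example limits given in the text (denaturation at 65° C. or higher for at least 30 seconds, ending at or below 40° C.). The 5° C. step size, the hold times, and the function names `stepdown_program` and `meets_annealing_limits` are hypothetical choices for illustration, not a protocol taken from the Examples.

```python
from typing import List, Tuple

# Each step is (temperature in degrees C, hold time in seconds).
Step = Tuple[float, float]


def stepdown_program(denature_c: float = 95.0, denature_s: float = 120.0,
                     end_c: float = 25.0, step_c: float = 5.0,
                     hold_s: float = 30.0) -> List[Step]:
    """Hypothetical program: denature, then lower the temperature in fixed steps."""
    if step_c <= 0 or end_c >= denature_c:
        raise ValueError("need step_c > 0 and end_c below denature_c")
    program = [(denature_c, denature_s)]
    t = denature_c - step_c
    while t > end_c:
        program.append((t, hold_s))
        t -= step_c
    program.append((end_c, hold_s))  # final hold at the lower temperature
    return program


def meets_annealing_limits(program: List[Step], min_denature_c: float = 65.0,
                           min_denature_s: float = 30.0,
                           max_final_c: float = 40.0) -> bool:
    """Check the first (denaturing) step and the final temperature against example limits."""
    first_c, first_s = program[0]
    final_c, _ = program[-1]
    return first_c >= min_denature_c and first_s >= min_denature_s and final_c <= max_final_c


program = stepdown_program()
print(len(program), program[0], program[-1])  # 15 (95.0, 120.0) (25.0, 30.0)
print(meets_annealing_limits(program))        # True
```

Letting a switched-off thermocycler cool naturally, the option listed above, has no fixed program and is therefore not modeled here.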
The amount of one or more blocking oligonucleotides in the annealing mixture may be from about 0.1 pmol to about 50 pmol per blocking oligonucleotide, such as from about 0.5 pmol to about 20 pmol, from about 0.5 pmol to about 10 pmol, from about 1 pmol to about 20 pmol, from about 1 pmol to about 10 pmol, from about 1.5 pmol to about 10 pmol, from about 1.5 pmol to about 8 pmol, or from about 2 pmol to about 7 pmol per blocking oligonucleotide.
Preferably, about the same amount of each of the different blocking oligonucleotides is present in the annealing mixture. In certain embodiments, the amounts of different blocking oligonucleotides differ. For example, the molar ratio of the blocking oligonucleotide present in the highest amount to that present in the lowest amount may be from about 1.1 to about 10, from about 1.1 to about 5, or from about 1.1 to about 2.
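The per-blocker amounts and the highest-to-lowest molar ratio described above are straightforward to tabulate. The sketch below is a hypothetical helper, not taken from the source, that reports the total amount and the max/min ratio for a blocker pool, and converts an amount to a pipetting volume using the identity 1 µM = 1 pmol/µL. The blocker names reuse the B-numbering of the Examples, but the amounts and the 10 µM stock concentration are invented for illustration.

```python
from typing import Dict


def pool_summary(amounts_pmol: Dict[str, float]) -> Dict[str, float]:
    """Total pmol and highest-to-lowest molar ratio for a blocker pool."""
    if not amounts_pmol:
        raise ValueError("pool is empty")
    values = list(amounts_pmol.values())
    if min(values) <= 0:
        raise ValueError("every blocker amount must be positive")
    return {"total_pmol": sum(values), "max_to_min_ratio": max(values) / min(values)}


def volume_ul(amount_pmol: float, stock_um: float) -> float:
    """Volume of a stock (in uM, i.e. pmol/uL) that delivers amount_pmol."""
    return amount_pmol / stock_um


pool = {"B1": 4.0, "B2": 5.0, "B3": 6.0}  # hypothetical amounts, pmol per blocker
summary = pool_summary(pool)
print(summary)               # {'total_pmol': 15.0, 'max_to_min_ratio': 1.5}
print(volume_ul(5.0, 10.0))  # 0.5 (uL of a 10 uM stock)
```

With these invented amounts, every blocker falls within the 0.1 to 50 pmol range given above, and the max/min ratio of 1.5 falls within the narrowest (1.1 to 2) range.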
The amount of RNA in the annealing mixture may range from about 1 pg to about 5000 ng, such as from about 5 pg to about 5000 ng, about 10 pg to about 5000 ng, about 100 pg to about 5000 ng, about 1 ng to about 5000 ng, about 5 ng to about 5000 ng, about 10 ng to about 5000 ng, about 100 ng to about 5000 ng, about 5 pg to about 3000 ng, about 10 pg to about 3000 ng, about 100 pg to about 3000 ng, about 1 ng to about 3000 ng, about 5 ng to about 3000 ng, about 10 ng to about 3000 ng, about 100 ng to about 3000 ng, about 5 pg to about 1000 ng, about 10 pg to about 1000 ng, about 100 pg to about 1000 ng, about 1 ng to about 1000 ng, about 5 ng to about 1000 ng, about 10 ng to about 1000 ng, about 100 ng to about 1000 ng, or from about 25 ng to about 500 ng. The amount of RNA may be at least about 1 pg, about 5 pg, about 10 pg, about 50 pg, about 100 pg, about 500 pg, about 1 ng, about 5 ng, about 10 ng, about 50 ng, or about 100 ng and/or at most about 500 ng, about 1000 ng, about 3000 ng, or about 5000 ng.
The annealing mixture may contain, in addition to one or more blocking oligonucleotides and an RNA sample, one or more monovalent cations (e.g., Na+ and K+) to increase the annealing of the blocking oligonucleotides to unwanted RNA species. The monovalent cation concentration in the annealing mixture ranges from 5 mM to 50 mM, such as 10 mM to 30 mM or 15 mM to 25 mM.
Preferably, the annealing mixture contains NaCl or KCl at a concentration of 10 mM to 30 mM, such as 15 mM to 25 mM.
The annealing mixture may optionally comprise a buffer with a pH ranging from 5 to 9, such as a buffer containing 20-50 mM phosphate, pH 6.5 to 7.5.
Once the annealing process is complete, the annealing mixture may be referred to as the “template mixture,” which provides the templates for subsequent cDNA synthesis. In certain embodiments, the annealing mixture may be cleaned up before being used as templates for cDNA synthesis. For example, the cleanup may be performed using a solid support that binds nucleic acids (e.g., RNA) by mixing the annealing mixture with the solid support, separating the solid support with nucleic acids bound thereto from the liquid phase, optionally washing the solid support, and eluting the nucleic acids from the solid support. This mixing, separating, optional washing, and eluting process may be repeated once (i.e., two rounds of cleanup), twice (i.e., three rounds of cleanup), or more times. An exemplary solid support is the QIAseq beads used in the Examples described below.
7. Reverse Transcription
As disclosed above, step (c) of a method for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription disclosed herein is to incubate the template mixture generated as described above with a reaction mixture that comprises: (i) at least one reverse transcriptase, (ii) one or more reverse transcription primers, and (iii) a reverse transcription buffer under conditions sufficient to synthesize cDNA molecules using one or more desired RNA species as template(s). Because the one or more blocking oligonucleotides are annealed to the one or more unwanted RNA species, reverse transcription of those unwanted RNA species is inhibited.
8. Reverse Transcriptase
The term “reverse transcriptase” refers to an RNA-dependent DNA polymerase capable of synthesizing a complementary DNA (cDNA) strand using an RNA template. Reverse transcriptases useful in step (c) may be one or more viral reverse transcriptases or other polymerases having reverse transcriptase activity, including but not limited to AMV reverse transcriptase, RSV reverse transcriptase, MMLV reverse transcriptase, HIV reverse transcriptase, EIAV reverse transcriptase, RAV reverse transcriptase, TTH DNA polymerase, C. hydrogenoformans DNA polymerase, Superscript® I reverse transcriptase, Superscript® II reverse transcriptase, Thermoscript™ RT MMLV, ASLV and RNase H mutants thereof, or a mixture of some of the above enzymes. Preferably, the reverse transcriptase is EnzScript™ M-MLV Reverse Transcriptase RNA H- (Enzymatics), which contains three point mutations that eliminate the measurable RNase H activity native to wild type M-MLV reverse transcriptase. Loss of RNase H activity enables greater yields of full-length cDNA transcripts (5 kb) and is accompanied by increased thermal stability relative to wild type M-MLV reverse transcriptase. The increased thermostability allows higher incubation temperatures for the first-strand reaction (up to 50° C.), aiding denaturation of template RNA secondary structure in GC-rich regions.
9. Reverse Transcription Primers
Reverse transcription primers useful in step (c) may be oligo(dT) primers, that is, single-stranded sequences of deoxythymidine (dT). The length of oligo(dT) primers may vary from 8 to 30 bases, and the primers may be a mixture of oligo(dT) of different lengths, such as oligo(dT)12-18, or oligo(dT) of a single defined length, such as oligo(dT)18 or oligo(dT)20.
Preferably, reverse transcription primers used in step (c) are random primers, such as random hexamers (N6), heptamers (N7), octamers (N8), nonamers (N9), etc.
In certain embodiments, reverse transcription primers may be a mixture of one or more oligo(dT) primers and one or more random primers.
In certain other embodiments, reverse transcription primers may comprise primers specific for one or more desired RNA species.
The reverse transcription primers may be immobilized or anchored, such as anchored oligo(dT) primers. Alternatively, they may be in solution and not immobilized to a solid phase (e.g., beads).
10. Reaction Buffer and Other Components
The reaction mixture of step (c) (also referred to as “reverse transcription reaction mixture”) comprises a reaction buffer suitable for reverse transcription, such as a Tris buffer with a pH of about 8.3 or 8.4 at a concentration ranging from about 20 to about 50 mM.
The reaction mixture also comprises dNTPs at a concentration ranging from about 0.1 to about 1 mM (e.g., about 0.5 mM) of each dNTP.
The reaction mixture typically also comprises MgCl2 at a concentration ranging from about 1 to about 10 mM, such as about 3 to about 5 mM.
The reaction mixture optionally further comprises a reducing agent, such as DTT at a concentration ranging from about 5 to about 20 mM, such as about 10 mM.
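To show how the final concentrations listed above translate into a reaction setup, here is a C1V1 = C2V2 sketch. The 20 µL reaction volume and all stock concentrations are hypothetical; only the final concentrations (50 mM Tris, 0.5 mM of each dNTP, 3 mM MgCl2, 10 mM DTT) are picked from within the ranges given in the text. The names `Component` and `component_volumes` are illustrative, not part of any kit protocol.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    stock_mm: float  # stock concentration, mM (hypothetical)
    final_mm: float  # desired final concentration in the reaction, mM


def component_volumes(components: List[Component], total_ul: float) -> Dict[str, float]:
    """Volume (uL) of each stock from C1*V1 = C2*V2, plus the volume left over."""
    volumes: Dict[str, float] = {}
    for c in components:
        if c.final_mm > c.stock_mm:
            raise ValueError(f"{c.name}: final concentration exceeds stock")
        volumes[c.name] = c.final_mm * total_ul / c.stock_mm
    remainder = total_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks alone exceed the reaction volume")
    # Left for the template mixture, primers, reverse transcriptase and water.
    volumes["remainder"] = remainder
    return volumes


mix = [
    Component("Tris-HCl pH 8.3", stock_mm=1000.0, final_mm=50.0),
    Component("dNTP mix (each dNTP)", stock_mm=10.0, final_mm=0.5),
    Component("MgCl2", stock_mm=100.0, final_mm=3.0),
    Component("DTT", stock_mm=100.0, final_mm=10.0),
]
for name, vol in component_volumes(mix, total_ul=20.0).items():
    print(f"{name}: {vol:.2f} uL")
```

With these assumed stocks, the buffer components take 4.6 µL, leaving 15.4 µL for the template mixture, primers, enzyme, and water.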
11. Conditions for Reverse Transcription
The reaction mixture is subjected to conditions sufficient to synthesize cDNA molecules using one or more desired RNA species in an RNA sample as templates. The conditions typically include incubating the reaction mixture at one or more appropriate temperatures (e.g., at about 35° C. to about 50° C. or about 37° C. to about 45° C., such as at about 35° C., about 37° C., about 40° C., about 42° C., about 45° C., or about 50° C.) for a sufficient period of time (e.g., for about 30 minutes to about 1 hour). In certain embodiments, a low temperature incubation step (e.g., at 25° C. for about 2 to about 10 minutes) may be performed before the higher temperature incubation step for first strand cDNA synthesis, allowing the primers to be extended and thereby increasing their Tm.
12. Synthesizing 2nd cDNA Strands
In certain embodiments, after step (c) (i.e., the synthesis of the first strand cDNA), the method disclosed herein may comprise step (d) of synthesizing the second strand cDNA to generate double stranded cDNA.
Procedures known in the art for synthesizing the second strand cDNA may be used in step (d). For example, E. coli RNase H may be used to introduce nicks and gaps into the mRNA strand of the mRNA:cDNA hybrid, in addition to any nicks and gaps resulting from the endogenous RNase H activity of the reverse transcriptase. E. coli DNA polymerase I then initiates second strand synthesis by nick translation. E. coli DNA ligase subsequently seals any breaks left in the second strand cDNA, generating double stranded cDNA products.
Step (d) may also be performed using QIAseq Stranded Total RNA Library kit (QIAGEN) or other commercially available kits (e.g., from Illumina, New England BioLabs, KAPA Biosystems, Thermo Fisher Scientific).
13. Constructing Sequencing Library and Sequencing
In certain embodiments, after double stranded cDNA is generated in step (d), the method disclosed herein further comprises step (e) of amplifying the double stranded cDNA generated in step (d) to construct a sequencing library. The sequencing library may be used to sequence the one or more desired RNA species in a further step, step (f).
The double stranded cDNA generated in step (d) may be used to prepare a sequencing library in step (e) using methods known in the art. For example, the double stranded cDNA may be end-repaired, subjected to A-addition, and ligated to adapters. The adapter-ligated cDNA molecules may be further amplified via one or more rounds of amplification (e.g., universal PCR, bridge PCR, emulsion PCR, or rolling circle amplification) to generate a sequencing library (i.e., a collection of DNA fragments that are ready to be sequenced, e.g., fragments comprising a sequencing primer-binding site).
The sequencing library may be sequenced using methods known in the art in step (f) (see, Myllykangas et al., Bioinformatics for High Throughput Sequencing, Rodriguez-Ezpeleta et al. (eds.), Springer Science+Business Media, LLC, 2012, pages 11-25). Exemplary high throughput DNA sequencing systems include, but are not limited to, the GS FLX sequencing system originally developed by 454 Life Sciences and later acquired by Roche (Basel, Switzerland), Genome Analyzer developed by Solexa and later acquired by Illumina Inc. (San Diego, Calif.) (see, Bentley, Curr Opin Genet Dev 16:545-52, 2006; Bentley et al., Nature 456:53-59, 2008), the SOLiD sequencing system by Life Technologies (Foster City, Calif.) (see, Smith et al., Nucleic Acids Res 38: e142, 2010; Valouev et al., Genome Res 18:1051-63, 2008), CGA developed by Complete Genomics and acquired by BGI (see, Drmanac et al., Science 327:78-81, 2010), PacBio RS sequencing technology developed by Pacific Biosciences (Menlo Park, Calif.) (see, Eid et al., Science 323: 133-8, 2009), and Ion Torrent developed by Life Technologies Corporation (see, U.S. Patent Application Publication Nos. 2009/0026082; 2010/0137143; and 2010/0282617).
Sequencing reads obtained from sequencing the sequencing library may be analyzed to determine the expression levels and/or sequences of RNA species of interest. Such information may be useful in diagnosing diseases or predicting responsiveness of the subjects from which the RNA samples are obtained to specific treatments.
14. Other Downstream Uses
The double stranded cDNA generated in step (d) may be used in microarray analysis to determine expression levels, including the presence or absence, of RNA species of interest. Additional uses include functional cloning to identify genes based on the functions of their encoded proteins, discovery of novel genes, and study of alternative splicing in different cells or tissues.
15. Depletion Efficiency
The first strand cDNA molecules may be used as templates in qPCR to check the efficiency of the blocking oligonucleotides in inhibiting cDNA synthesis from unwanted RNA species to which the blocking oligonucleotides are complementary. An exemplary method is disclosed in Example 1 below. Briefly, an increase in Ct of amplifying a cDNA reverse transcribed from an unwanted RNA species when one or more blocking oligonucleotides are used during reverse transcription compared with when no blocking oligonucleotides are used during reverse transcription indicates that the one or more blocking oligonucleotides are effective in inhibiting cDNA synthesis from the unwanted RNA species. The increase in Ct may be compared with that of another treatment (e.g., a commercially available treatment) to demonstrate equivalence to or improvement over the other treatment.
In certain embodiments, the Ct value of amplifying a cDNA reverse transcribed from an unwanted RNA species when one or more blocking oligonucleotides are used during reverse transcription is at least 2 times, at least 2.5 times, at least 3 times, or at least 4 times as much as the Ct value when no blocking oligonucleotides are used during reverse transcription.
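The Ct comparison above can be expressed in a few lines. The sketch computes the Ct ratio used in the preceding paragraph and the ΔCt. It also converts ΔCt into an approximate fold reduction of cDNA template, which is an added assumption not stated in the text: it presumes the template is amplified by a factor of (1 + E) per cycle, with E = 1 meaning perfect doubling. The Ct values and function names are invented for illustration.

```python
def ct_ratio(ct_blocked: float, ct_unblocked: float) -> float:
    """Ct with blockers divided by Ct without blockers (the ratio used in the text)."""
    return ct_blocked / ct_unblocked


def delta_ct(ct_blocked: float, ct_unblocked: float) -> float:
    """Increase in Ct caused by the blockers."""
    return ct_blocked - ct_unblocked


def approx_fold_reduction(d_ct: float, efficiency: float = 1.0) -> float:
    """Assumes amplification by (1 + efficiency) per cycle; efficiency=1.0 is ideal doubling."""
    return (1.0 + efficiency) ** d_ct


ct_without, ct_with = 12.0, 30.0  # hypothetical 18S rRNA Ct values
print(ct_ratio(ct_with, ct_without))                          # 2.5
print(approx_fold_reduction(delta_ct(ct_with, ct_without)))  # 262144.0
```

With these hypothetical values, the Ct ratio of 2.5 would satisfy the "at least 2.5 times" embodiment. Real qPCR efficiencies below 100% would make the fold-reduction estimate smaller.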
The efficiency of the blocking oligonucleotides in inhibiting cDNA synthesis from unwanted RNA species may also be analyzed via whole transcriptome sequencing. An exemplary method is disclosed in Example 2 below. Briefly, a decrease in the percentage of total reads that are derived from an unwanted RNA species (e.g., 18S rRNA) when one or more blocking oligonucleotides are used during reverse transcription compared with when no blocking oligonucleotides are used during reverse transcription indicates that the one or more blocking oligonucleotides are effective in inhibiting cDNA synthesis from the unwanted RNA species. The decrease in percentage may be compared with that of another treatment (e.g., a commercially available treatment) to demonstrate equivalence to or improvement over the other treatment.
The percentage of total reads that are derived from an unwanted RNA species (e.g., 18S rRNA) when one or more blocking oligonucleotides are used during reverse transcription according to the present disclosure may be at most 5%, at most 4%, at most 3%, at most 2%, at most 1%, at most 0.8%, at most 0.6%, at most 0.5%, at most 0.4%, at most 0.3%, at most 0.2%, at most 0.1% or at most 0.05%.
The ratio of the percentage of total reads that are derived from an unwanted RNA species (e.g., 18S rRNA) when one or more blocking oligonucleotides are used during reverse transcription to that when no blocking oligonucleotides are used may be at most 0.2, at most 0.15, at most 0.1, at most 0.08, at most 0.06, at most 0.05, at most 0.04, at most 0.03, or at most 0.02.
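The two read-based metrics above, the percentage of reads from an unwanted species and the blocked-to-unblocked ratio of those percentages, can be computed directly from read counts. In this sketch the read counts are hypothetical, and the function names are illustrative; in practice the counts would come from aligning reads to the unwanted RNA reference (e.g., 18S rRNA), a step not shown here.

```python
def percent_of_reads(unwanted_reads: int, total_reads: int) -> float:
    """Percentage of all reads that derive from the unwanted RNA species."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * unwanted_reads / total_reads


def blocked_to_unblocked_ratio(pct_blocked: float, pct_unblocked: float) -> float:
    """Ratio of the unwanted-read percentage with blockers to that without."""
    return pct_blocked / pct_unblocked


pct_without = percent_of_reads(450_000, 1_000_000)  # hypothetical: 45% of reads are 18S
pct_with = percent_of_reads(3_000, 1_000_000)        # hypothetical: 0.3% of reads are 18S
ratio = blocked_to_unblocked_ratio(pct_with, pct_without)
print(f"{pct_with:.2f}% vs {pct_without:.2f}%, ratio {ratio:.4f}")
```

These invented numbers (0.30% of reads with blockers, a ratio of about 0.0067) would fall within the "at most 0.3%" and "at most 0.02" embodiments listed above.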
16. Off-Target Depletion
The first strand cDNA molecules may be used as templates in qPCR to check the degree of off-target depletion by blocking oligonucleotides. An exemplary method is disclosed in Example 1 below. Briefly, an increase in Ct of amplifying a cDNA reverse transcribed from a desired RNA species when one or more blocking oligonucleotides targeting one or more unwanted RNA species are used during reverse transcription compared with when no blocking oligonucleotides are used during reverse transcription indicates that the one or more blocking oligonucleotides cause inhibition of cDNA synthesis from the desired RNA species. Such inhibition is referred to as “off-target depletion.” The increase in Ct may be compared with that of another treatment (e.g., a commercially available treatment) to compare the off-target depletion of the two treatments.
In certain embodiments, the increase in Ct value of amplifying a cDNA reverse transcribed from a desired RNA species (e.g., GAPDH mRNA) between when one or more blocking oligonucleotides are used during reverse transcription and when no blocking oligonucleotides are used during reverse transcription is at most 20%, at most 15%, at most 10%, at most 8%, at most 6%, or at most 5% of the Ct value when no blocking oligonucleotides are used during reverse transcription.
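The off-target criterion above expresses the Ct increase of a desired transcript as a percentage of its Ct without blockers. The sketch computes that percentage and flags transcripts exceeding a chosen threshold. GAPDH comes from the text; ACTB, all Ct values, and the 10% threshold are hypothetical examples.

```python
from typing import Dict, List, Tuple


def ct_increase_percent(ct_blocked: float, ct_unblocked: float) -> float:
    """Ct increase for a desired transcript, as a percentage of the unblocked Ct."""
    return 100.0 * (ct_blocked - ct_unblocked) / ct_unblocked


def flag_off_target(cts: Dict[str, Tuple[float, float]], max_percent: float = 10.0) -> List[str]:
    """Names of desired transcripts whose Ct rose by more than max_percent.

    cts maps transcript name -> (Ct with blockers, Ct without blockers).
    """
    return sorted(name for name, (with_b, without_b) in cts.items()
                  if ct_increase_percent(with_b, without_b) > max_percent)


cts = {"GAPDH": (21.0, 20.0), "ACTB": (19.5, 17.0)}  # hypothetical Ct values
print(ct_increase_percent(*cts["GAPDH"]))  # 5.0
print(flag_off_target(cts))                # ['ACTB']
```

Here the invented GAPDH values give a 5% increase, within the "at most 5%" embodiment, while the invented ACTB values (about 14.7%) would be flagged.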
The degree of off-target depletion by blocking oligonucleotides may also be analyzed via whole transcriptome sequencing. An exemplary method is disclosed in Example 2 below. Briefly, a scatter plot may be generated comparing the relative gene expression for genes other than those encoding the one or more unwanted RNA species when one or more blocking oligonucleotides are used during reverse transcription with that when no blocking oligonucleotides are used during reverse transcription. The R² of the scatter plot indicates how similar the relative gene expression is between treatment with the one or more blocking oligonucleotides and no treatment. The closer R² is to 1, the lower the degree of off-target depletion associated with the use of the one or more blocking oligonucleotides.
In certain embodiments, R² of the scatter plot as generated above is at least 0.85, at least 0.86, at least 0.87, at least 0.88, at least 0.89, at least 0.90, or at least 0.91.
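The R² comparison above can be sketched as follows, under two assumptions not stated in the text: expression values are log2-transformed with a pseudocount of 1 before plotting, and R² is the squared Pearson correlation, which equals the R² of an ordinary least-squares line with an intercept. Genes encoding the targeted unwanted species are excluded before the calculation. The gene identifiers and counts are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def log2_expression(values: Iterable[float], pseudocount: float = 1.0) -> List[float]:
    # Log transform with a pseudocount is an assumption; the text does not specify the scale.
    return [math.log2(v + pseudocount) for v in values]


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation (= R^2 of a least-squares line with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one axis has no variance")
    return sxy * sxy / (sxx * syy)


def off_target_r_squared(blocked: Dict[str, float], unblocked: Dict[str, float],
                         excluded: Iterable[str]) -> float:
    """R^2 over genes measured in both runs, excluding the targeted unwanted species."""
    genes = sorted((blocked.keys() & unblocked.keys()) - set(excluded))
    x = log2_expression(unblocked[g] for g in genes)
    y = log2_expression(blocked[g] for g in genes)
    return r_squared(x, y)


# Hypothetical counts; "RNA18S" stands in for the depleted rRNA and is excluded.
unblocked = {"RNA18S": 50000.0, "GAPDH": 1023.0, "ACTB": 2047.0, "MYC": 63.0, "TP53": 127.0}
blocked = {"RNA18S": 100.0, "GAPDH": 980.0, "ACTB": 2100.0, "MYC": 70.0, "TP53": 120.0}
print(round(off_target_r_squared(blocked, unblocked, ["RNA18S"]), 3))
```

Excluding the targeted species matters: its large, intended drop would otherwise lower R² and be misread as off-target depletion.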
In one aspect, the present disclosure provides a method for designing blocking oligonucleotides for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample during reverse transcription, comprising:
(a) generating multiple blocking oligonucleotides complementary (preferably fully complementary) to regions of the one or more unwanted RNA species,
(b) filtering unacceptable blocking oligonucleotides,
(c) generating one or more groups of blocking oligonucleotides that are complementary to multiple different (preferably evenly spaced) regions of the one or more unwanted RNA species, and
(d) optionally shuffling blocking oligonucleotides among the groups to generate new groups of blocking oligonucleotides, and selecting one or more of the new groups of blocking oligonucleotides.
The selected group of blocking oligonucleotides is effective in inhibiting cDNA synthesis of the one or more unwanted RNA species, preferably with minimal off-target depletion. Both the inhibition of cDNA synthesis from the one or more unwanted RNA species and the off-target depletion caused by the selected group of blocking oligonucleotides may be evaluated as described above in Section A.
Preferably, the blocking oligonucleotides each comprise one or more modified nucleotides that increase the binding between the blocking oligonucleotides and their targeted regions of unwanted RNA species. Also preferably, the blocking oligonucleotides each comprise a 3′ modification that prevents them from being extended.
The following description uses LNA oligonucleotides as exemplary blocking oligonucleotides. Blocking oligonucleotides containing other modified nucleotides as well as those without any modified nucleotides for increasing binding to regions of unwanted RNA species but of a sufficient length for stably binding to regions of unwanted RNA species may be designed similarly to be effective in depleting unwanted RNA species and preferably with little or no off-target depletion.
1. Step (a)
Step (a) of the method for designing blocking oligonucleotides provided herein is to generate multiple blocking oligonucleotides complementary (preferably fully complementary) to regions of the one or more unwanted RNA species.
In this step, one or more parameters of blocking oligonucleotides, such as the lengths of blocking oligonucleotides, predicted Tms of duplexes formed between blocking oligonucleotides and their corresponding regions of unwanted RNA species (i.e., regions of unwanted RNA species to which the blocking oligonucleotides are fully complementary), self-hybridization, and off-target hybridization in the transcriptome to which the unwanted RNA species belong(s), may be characterized and scored. The scores of the one or more parameters of each blocking oligonucleotide are used to generate a final combined score, in which different parameters may be weighted differently.
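One simple way to turn per-parameter scores into a final combined score is a weighted mean. The sketch below assumes each parameter score has already been normalized to the range 0 to 1 (higher is better); the parameter names follow the text, but the normalization, the weighted-mean form, and the specific weights are hypothetical, since the text does not give the actual scoring function.

```python
from typing import Dict


def combined_score(param_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-parameter scores (each assumed normalized to 0..1, higher is better)."""
    missing = weights.keys() - param_scores.keys()
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * param_scores[name] for name, w in weights.items()) / total_weight


# Hypothetical weights: Tm and off-target hybridization weighted most heavily.
WEIGHTS = {"length": 1.0, "tm": 3.0, "self_hybridization": 1.0, "off_target": 2.0}

candidate = {"length": 1.0, "tm": 0.8, "self_hybridization": 0.9, "off_target": 0.5}
print(round(combined_score(candidate, WEIGHTS), 3))  # 0.757
```

Ranking candidates by such a value and discarding those below a minimum would correspond to the scoring in step (a) and the filtering in step (b).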
The algorithm for predicting Tms of duplexes formed between blocking oligonucleotides and their corresponding regions of unwanted RNA species may be based on SantaLucia, Proc. Natl. Acad. Sci. USA 95: 1460-5, 1998, and Tm measurements of LNA containing blocking oligonucleotides.
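For orientation, the sketch below implements the DNA/DNA nearest-neighbor Tm calculation with the SantaLucia (1998) unified parameters at 1 M NaCl, the entropy salt correction from the same paper, and the two-state Tm formula for a non-self-complementary duplex at equal strand concentrations. It deliberately omits what the text adds on top: corrections derived from Tm measurements of LNA-containing oligonucleotides, and any adjustment for RNA/DNA hybrids. Treating 115 mM KCl as equivalent to 115 mM Na+ and using a 0.25 µM strand concentration are assumptions. Expect values well below the 80 to 96° C. range targeted above; the LNA substitutions are what raise the duplex Tm into that range.

```python
import math

# SantaLucia (1998) unified nearest-neighbor parameters for DNA/DNA duplexes in 1 M NaCl.
# Keys are 5'->3' dinucleotides of one strand; values are (dH kcal/mol, dS cal/(mol*K)).
_NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once per duplex end according to its terminal base pair.
_INIT = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
_R = 1.987  # gas constant, cal/(mol*K)


def nn_tm_dna(seq: str, na_molar: float = 0.115, strand_molar: float = 0.25e-6) -> float:
    """Tm (deg C) of a perfectly matched, non-self-complementary DNA/DNA duplex.

    No LNA or RNA/DNA hybrid corrections are applied.
    """
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("sequence must be at least 2 nt of A/C/G/T")
    dh = sum(_NN[seq[i:i + 2]][0] for i in range(len(seq) - 1))
    ds = sum(_NN[seq[i:i + 2]][1] for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        dh += _INIT[end][0]
        ds += _INIT[end][1]
    # Salt correction to the entropy (SantaLucia 1998), using N - 1 phosphates per strand.
    ds += 0.368 * (len(seq) - 1) * math.log(na_molar)
    # Two-state Tm for two different strands at equal concentration: C_T / 4.
    tm_kelvin = (dh * 1000.0) / (ds + _R * math.log(strand_molar / 4.0))
    return tm_kelvin - 273.15


print(round(nn_tm_dna("ATGCATGCATGCATGCATGC"), 1))
```

When a blocker is present in large excess over its target, the C_T/4 term would typically be replaced by the excess strand concentration; the simpler equal-concentration form is kept here for brevity.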
Preferably, a memetic algorithm is used to improve and select the best blocking oligonucleotides by testing different parameters. For example, the Tm of the duplex formed between a blocking oligonucleotide and its corresponding region of an unwanted RNA species may be improved by the following four methods: (1) reducing the number of LNA nucleotides, (2) increasing the number of LNA nucleotides, (3) altering the LNA nucleotide pattern, and (4) altering the blocking oligonucleotide length. In this manner, multiple small algorithms are used to test different parameters to determine whether the changes improve the overall score of a blocking oligonucleotide.
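The four moves listed above lend themselves to a local-search step. The sketch below represents a design as a length plus a set of LNA positions, generates neighbors using exactly those four moves, and hill-climbs until no single move improves the score. Two parts are assumptions: a full memetic algorithm would also maintain and recombine a population of designs, which is omitted here, and `toy_score` is a hypothetical placeholder (a fixed Tm gain per nucleotide and per LNA) standing in for the real combined score. It is not a validated Tm model.

```python
from typing import Callable, FrozenSet, List, Tuple

# A design is (length in nt, 0-based positions carrying an LNA).
Design = Tuple[int, FrozenSet[int]]


def neighbors(design: Design, min_len: int = 16, max_len: int = 24) -> List[Design]:
    length, lnas = design
    out: List[Design] = []
    for p in lnas:                      # (1) reduce the number of LNAs
        out.append((length, lnas - {p}))
    for p in range(length):             # (2) increase the number of LNAs
        if p not in lnas:
            out.append((length, lnas | {p}))
    for p in lnas:                      # (3) alter the pattern: shift one LNA by one position
        for q in (p - 1, p + 1):
            if 0 <= q < length and q not in lnas:
                out.append((length, (lnas - {p}) | {q}))
    for new_len in (length - 1, length + 1):  # (4) alter the length at the 3' end
        if min_len <= new_len <= max_len:
            out.append((new_len, frozenset(p for p in lnas if p < new_len)))
    return out


def hill_climb(start: Design, score: Callable[[Design], float], max_iter: int = 1000) -> Design:
    current = start
    for _ in range(max_iter):
        candidates = neighbors(current)
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= score(current):
            break  # local optimum: no single move improves the score
        current = best
    return current


def toy_score(design: Design, target_tm: float = 89.0) -> float:
    """Hypothetical placeholder: +2.5 C per nt, +3 C per LNA, small penalty per LNA."""
    length, lnas = design
    est_tm = 2.5 * length + 3.0 * len(lnas)
    return -abs(est_tm - target_tm) - 0.1 * len(lnas)


best = hill_climb((20, frozenset()), toy_score)
print(best[0], sorted(best[1]))
```

Because the toy score has no term for LNA spacing, the resulting LNA pattern is not a realistic design; a real score would penalize features such as long LNA runs and self-hybridization.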
LNA blocking oligonucleotides may have one, more, or all of the following characteristics:
(1) Their lengths may range from 10 to 30 nucleotides, preferably 16 to 24 nucleotides, 17 to 23 nucleotides or 18 to 22 nucleotides.
(2) The number of LNAs in each LNA blocking oligonucleotide may range from 2 to 20, preferably 4 to 16, and more preferably 3 to 15.
(3) The melting temperatures of duplexes formed between LNA blocking oligonucleotides and the regions of unwanted RNA species to which the LNA blocking oligonucleotides are complementary range from 80 to 96° C., preferably 86 to 92° C.
(4) The number of LNA blocking oligonucleotides generated in step (a) is at least 100, at least 500, at least 1000, at least 2000, at least 3000, at least 4000, at least 5000, at least 6000, at least 7000, at least 8000, at least 9000, or at least 10000, and/or at most 1,000,000, at most 500,000, at most 100,000, at most 90,000, at most 80,000, at most 70,000, at most 60,000, or at most 50,000, such as from 100 to 1,000,000, from 500 to 100,000, or from 1000 to 10,000.
(5) LNA blocking oligonucleotides are likely to bind to the regions of unwanted RNA species to which the LNA blocking oligonucleotides are complementary rather than to themselves.
(6) LNA blocking oligonucleotides are likely to bind to the regions of unwanted RNA species to which the LNA blocking oligonucleotides are complementary rather than to other regions in the transcriptome to which the unwanted RNA species belong(s).
(7) The number of the different unwanted RNA species to which the LNA blocking oligonucleotides are complementary (preferably fully complementary) is at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200, at least 300, at least 400, or at least 500, and/or at most 1,000,000, at most 500,000, at most 100,000, at most 50,000, at most 10,000, at most 9000, at most 8000, at most 7000, at most 6000, at most 5000, at most 4000, at most 3000, or at most 2000, such as from 2 to 1,000,000, from 100 to 500,000, from 500 to 100,000, or from 1000 to 10,000.
Additional descriptions of blocking oligonucleotides are provided in Section A.5 (Blocking Oligonucleotides) above and Section C (Sets of Blocking Oligonucleotides).
2. Step (b)
Step (b) of the method for designing blocking oligonucleotides provided herein is to filter unacceptable blocking oligonucleotides. This may be done by setting a minimum final combined score for blocking oligonucleotides. Blocking oligonucleotides with final combined scores less than the minimum final combined score are deemed unacceptable and filtered out.
3. Step (c)
Step (c) of the method for designing blocking oligonucleotides provided herein is to generate one or more groups of blocking oligonucleotides that are complementary to multiple different (preferably evenly spaced) regions of the one or more unwanted RNA species.
In certain embodiments, the groups of blocking oligonucleotides target multiple regions of a single RNA species (e.g., human 5S rRNA).
In certain other embodiments, the groups of blocking oligonucleotides target a single type of RNA species from multiple organisms (e.g., bacterial 5S rRNA).
In certain other embodiments, the groups of blocking oligonucleotides target multiple types of RNA species of a single organism (e.g., human rRNAs).
In certain other embodiments, the groups of blocking oligonucleotides target multiple types of RNA species of multiple organisms (e.g., bacterial rRNAs).
To inhibit cDNA synthesis of an unwanted RNA species, it is preferred that blocking oligonucleotides are spread out along the unwanted RNA species so that no region of the unwanted RNA species will be reverse transcribed into cDNA and detected in downstream analysis. A program may be used in this step to select blocking oligonucleotides with top final combined scores and pick those that spread out evenly across the unwanted RNA species.
Preferably, multiple different regions of an unwanted RNA species to which blocking oligonucleotides are complementary are evenly spaced along the unwanted RNA species. The even distribution of the different regions allows effective inhibition of cDNA synthesis of the unwanted RNA species with a minimal or reduced number of different blocking oligonucleotides.
Regions of an unwanted RNA species are evenly spaced if the longest distance between neighboring regions is at most 2.5 times, preferably at most 2 times or at most 1.5 times, the shortest distance between neighboring regions. The distance between neighboring regions is the number of nucleotides between the 3′ terminus of the upstream region (i.e., the region closer to the 5′ terminus of the unwanted RNA species) and the 5′ terminus of the downstream region (i.e., the region closer to the 3′ terminus of the unwanted RNA species). For example, if the distances between neighboring regions of an unwanted RNA species are 30, 32, 35, 37, 38, 40, 43, and 45, such regions are deemed evenly spaced because the longest distance between neighboring regions, 45, is 1.5 times the shortest distance, 30.
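The spacing definition and worked example above translate directly into code. In this sketch, regions are given as 0-based, inclusive (start, end) coordinates on the unwanted RNA, read 5′ to 3′; that coordinate convention and the function names are assumptions for illustration. The check compares the longest distance with max_ratio times the shortest rather than dividing, so abutting regions (distance 0) are handled without division by zero.

```python
from typing import List, Tuple

# A region is (start, end): 0-based, inclusive positions on the unwanted RNA, 5' to 3'.
Region = Tuple[int, int]


def neighbor_distances(regions: List[Region]) -> List[int]:
    """Nucleotides between the 3' end of each region and the 5' end of the next one."""
    ordered = sorted(regions)
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]


def evenly_spaced(distances: List[int], max_ratio: float = 2.5) -> bool:
    """True if the longest distance is at most max_ratio times the shortest."""
    if not distances:
        raise ValueError("need at least two regions")
    return max(distances) <= max_ratio * min(distances)


# The worked example from the text: longest 45, shortest 30.
example = [30, 32, 35, 37, 38, 40, 43, 45]
print(evenly_spaced(example), evenly_spaced(example, max_ratio=1.5))  # True True

# Three hypothetical 20-nt regions.
print(neighbor_distances([(0, 19), (50, 69), (102, 121)]))  # [30, 32]
```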
The distances between evenly distributed neighboring regions of an unwanted RNA species to which blocking oligonucleotides are complementary may range from 20 to 50, 25 to 50, 30 to 50, 20 to 45, 25 to 45, 30 to 45, or 31 to 43 nucleotides.
In certain embodiments, multiple different regions of an unwanted RNA species to which blocking oligonucleotides are complementary are not evenly distributed. The distance between neighboring regions may range from 0 to 100 nucleotides, such as 0 to 75 nucleotides, 0 to 50 nucleotides, 5 to 100 nucleotides, 5 to 75 nucleotides, 5 to 50 nucleotides, 5 to 40 nucleotides, 5 to 30 nucleotides, 10 to 100 nucleotides, 10 to 75 nucleotides, 10 to 50 nucleotides, 10 to 40 nucleotides, 10 to 30 nucleotides, 20 to 100 nucleotides, 20 to 75 nucleotides, 20 to 60 nucleotides, or 30 to 100 nucleotides. In general, more blocking oligonucleotides are required if neighboring regions of an unwanted RNA species to which the blocking oligonucleotides are complementary are located close to each other (e.g., at most 25, 20, 15, 10, or 5 nucleotides apart). However, the neighboring regions should not be too far apart (e.g., more than 75, 100, 125, or 150 nucleotides apart) to avoid inadequate inhibition of cDNA synthesis using the sequences between the neighboring regions of the unwanted RNA species as templates.
In certain embodiments where a large number (e.g., at least 10, at least 50, at least 100, at least 500, at least 1000, at least 2000, at least 3000, at least 4000, or at least 5000) of different unwanted RNA species are to be depleted, the group may be formed by selecting the blocking oligonucleotides that most increase the total coverage of the targeted unwanted RNA species. The different unwanted RNA species may be of a single type of unwanted RNA from multiple organisms (e.g., bacterial 5S rRNA), multiple types of unwanted RNA from a single organism (e.g., human abundant mRNAs), or multiple types of unwanted RNA from multiple organisms (e.g., bacterial rRNAs).
In some embodiments, a single blocking oligonucleotide may target unwanted RNA species from multiple organisms that are homologous to each other (e.g., 5S rRNA from certain bacterial strains). Thus, the number of the blocking oligonucleotides in a group may be less than the number of unwanted RNA species that the blocking oligonucleotides target.
A greedy algorithm may be used for maximizing coverage of a large number of different unwanted RNA species. A greedy algorithm is an algorithm that always makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. An exemplary greedy algorithm may include first defining the blocking oligonucleotide length (“BLOCKER LENGTH”), the distance between neighboring blocking oligonucleotides (“DISTANCE”) when annealing to the unwanted RNA species, and the number of blocking oligonucleotides (“NUMBER”) to form a group, and performing the following steps:
1. Count frequencies of all kmers with K=BLOCKER LENGTH in the set of target sequences,
2. Sort kmers by frequency,
3. Add most frequent kmer to blocker set,
4. Find location of selected kmer in all target sequences,
5. Determine kmers within 0.5 to 2 DISTANCE (preferably 1 DISTANCE) downstream of kmer location and 0.2 to 1 DISTANCE (preferably 0.5 DISTANCE) upstream in each target sequence,
6. Decrement the frequencies of the kmers identified in step 5 in the frequency list, and
7. Repeat steps 2-6 until the NUMBER of blockers is reached.
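Steps 1-7 can be expressed compactly in code. The Python sketch below is a minimal illustration under stated assumptions, not the disclosed implementation: target sequences are plain strings written 5′ to 3′, “downstream” means toward the 3′ end (higher index), windows are measured in kmer start positions, ties are broken by comparing kmer strings, and each kmer occurrence is decremented at most once so that counts reflect occurrences not yet covered. The returned kmers are in target (sense) orientation; the blocking oligonucleotide would be the reverse complement of each.

```python
from collections import Counter


def greedy_blockers(targets, blocker_length, distance, number,
                    down_frac=1.0, up_frac=0.5):
    """Greedy kmer selection following steps 1-7 (a sketch, not the method)."""
    k = blocker_length
    down = int(down_frac * distance)  # window extent toward the 3' end
    up = int(up_frac * distance)      # window extent toward the 5' end
    # Step 1: count every kmer occurrence across all target sequences.
    counts = Counter(s[i:i + k] for s in targets for i in range(len(s) - k + 1))
    covered = set()  # (sequence index, start) occurrences already decremented
    selected = []
    while len(selected) < number:
        # Steps 2-3: take the most frequent kmer not yet selected.
        remaining = [(c, m) for m, c in counts.items() if c > 0 and m not in selected]
        if not remaining:
            break  # every occurrence is already covered
        _, best = max(remaining)
        selected.append(best)
        for idx, s in enumerate(targets):
            # Step 4: every start position of the selected kmer in this target.
            hits = [i for i in range(len(s) - k + 1) if s[i:i + k] == best]
            for pos in hits:
                # Step 5: kmer starts from `up` nt upstream to `down` nt downstream.
                lo, hi = max(0, pos - up), min(len(s) - k, pos + down)
                for j in range(lo, hi + 1):
                    if (idx, j) not in covered:
                        # Step 6: decrement each occurrence at most once.
                        covered.add((idx, j))
                        counts[s[j:j + k]] -= 1
    return selected  # step 7: loop ends at NUMBER or when nothing is left


print(greedy_blockers(["AAAACCCC", "AAAAGGGG"], blocker_length=4,
                      distance=2, number=2))  # ['AAAA', 'GGGG']
```

Recording covered occurrences is a choice made here to keep counts from going negative when the windows of two selected kmers overlap; other bookkeeping would be equally consistent with the steps as written.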
An example of using such an algorithm is provided in Example 4 for designing blocking oligonucleotides to deplete bacterial 5S, 16S and 23S rRNA sequences.
Such a design algorithm is useful in selecting, at each step, the blocker that most increases the total coverage of the target sequences. Because kmer frequencies are often autocorrelated, decrementing the counts of adjacent kmers avoids selecting a blocker in a region already covered by a previously selected blocker. Decrementing kmer counts upstream avoids selecting a blocker too close to an already selected blocker downstream. Such an algorithm is tuned to partially cover as many target sequences as possible rather than to cover fewer target sequences completely.
4. Step (d)
In certain embodiments where multiple groups are generated in step (c), the method for designing blocking oligonucleotides may further comprise shuffling blocking oligonucleotides among the groups to generate new groups of blocking oligonucleotides and selecting one or more of the new groups of blocking oligonucleotides.
Groups of blocking oligonucleotides may be scored as the average score of the blocking oligonucleotides in the group. Parameters affecting scoring include physical parameters of blocking oligonucleotides such as melting temperature of duplexes formed between blocking oligonucleotides and their corresponding regions of unwanted RNA species, lengths of blocking oligonucleotides, self-hybridization of blocking oligonucleotides, LNA patterns, numbers of LNA nucleotides in blocking oligonucleotides, and off-target hybridization of blocking oligonucleotides; and group parameters such as minimum and maximum distances between neighboring blocking oligonucleotides when annealing to their corresponding regions of unwanted RNA species and cross-hybridization among blocking oligonucleotides within the group.
In this step of shuffling blocking oligonucleotides among groups of blocking oligonucleotides, cross-hybridization within a group of blocking oligonucleotides is minimized. For example, the number of blocking oligonucleotides that may form duplexes with each other with a high Tm (e.g., more than 65° C.) is minimized.
A program may be used to shuffle blocking oligonucleotides and test whether the score of a group of blocking oligonucleotides would be increased. This process may be repeated multiple times to generate a group of blocking oligonucleotides with the highest group score. Multiple groups of blocking oligonucleotides may be generated, each with the highest group score for a given unwanted RNA species (e.g., one group targeting human 5.8S rRNA with the highest group score and another group targeting human 18S rRNA with its own highest group score) or for a given type of unwanted RNA species (e.g., one group targeting bacterial rRNAs with the highest group score and another group targeting bacterial 16S rRNAs with its own highest group score).
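Shuffling and scoring could be prototyped as a simple hill-climbing search. In the Python sketch below, which is hypothetical throughout, per-oligo scores (`oligo_score`) and predicted duplex Tm values between oligo pairs (`pair_tm`) are assumed to have been computed beforehand by some other tool. The group score is taken as the average per-oligo score minus a penalty for each within-group pair whose predicted Tm exceeds 65° C., and a random swap between two groups is kept only if the summed group score increases.

```python
import random
from itertools import combinations


def group_score(group, oligo_score, pair_tm, tm_cutoff=65.0, penalty=1.0):
    """Average per-oligo score minus a penalty per cross-hybridizing pair."""
    mean = sum(oligo_score[o] for o in group) / len(group)
    clashes = sum(1 for a, b in combinations(group, 2)
                  if pair_tm.get(frozenset((a, b)), 0.0) > tm_cutoff)
    return mean - penalty * clashes


def shuffle_groups(groups, oligo_score, pair_tm, rounds=1000, seed=0):
    """Swap oligos between random pairs of groups; keep swaps that help."""
    rng = random.Random(seed)
    groups = [list(g) for g in groups]  # work on copies of the input groups

    def total():
        return sum(group_score(g, oligo_score, pair_tm) for g in groups)

    best = total()
    for _ in range(rounds):
        a, b = rng.sample(range(len(groups)), 2)  # two distinct groups
        i, j = rng.randrange(len(groups[a])), rng.randrange(len(groups[b]))
        groups[a][i], groups[b][j] = groups[b][j], groups[a][i]
        new = total()
        if new > best:
            best = new  # keep the improving swap
        else:
            groups[a][i], groups[b][j] = groups[b][j], groups[a][i]  # undo
    return groups, best
```

A fuller score would likely also fold in the physical and distance parameters listed above; only cross-hybridization is modeled here.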
The selected group with the highest score may have at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, or at least 1000 different blocking oligonucleotides, and/or at most 10,000, at most 9000, at most 8000, at most 7000, at most 6000, or at most 5000 different blocking oligonucleotides, such as from 10 to 10,000 or from 100 to 5000 different blocking oligonucleotides.
In certain embodiments where multiple groups of blocking oligonucleotides are selected, such groups may be pooled together when annealing to unwanted RNA species from an RNA sample. Alternatively, they may anneal to their target unwanted RNA species separately.
5. Experimental Testing for Blocking Efficiency and Off-Target Depletion
The selected group of blocking oligonucleotides may be further tested experimentally for its blocking efficiency and/or off-target depletion. Exemplary methods for such testing are described in Section A above and in the Examples below.
In one aspect, the present disclosure provides a set of blocking oligonucleotides for inhibiting cDNA synthesis of an unwanted RNA species. The blocking oligonucleotides are complementary (preferably fully complementary) to multiple different (preferably evenly spaced) regions of the unwanted RNA species.
The number of blocking oligonucleotides in a set may be at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50, and/or at most 1000, at most 900, at most 800, at most 700, at most 600, at most 500, at most 400, at most 300, or at most 200, such as from 2 to 1000, from 5 to 500, or from 10 to 300.
Preferably, the set of blocking oligonucleotides is a set of LNA blocking oligonucleotides, and may have from one to all of the following characteristics:
(1) Their lengths may range from 10 to 30 nucleotides, preferably 16 to 24 nucleotides, 17 to 23 nucleotides or 18 to 22 nucleotides.
(2) The number of LNAs in each LNA blocking oligonucleotide may range from 2 to 20, preferably 4 to 16, and more preferably 3 to 15.
(3) The melting temperatures of duplexes formed between LNA blocking oligonucleotides and the regions of unwanted RNA species to which the LNA blocking oligonucleotides are complementary range from 80 to 96° C., preferably 86 to 92° C.
(4) Depending on the length of the unwanted RNA species, the number of LNA blocking oligonucleotides is at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, or at least 80.
(5) LNA blocking oligonucleotides are likely to bind to the regions of the unwanted RNA species to which the LNA blocking oligonucleotides are complementary rather than to themselves.
(6) LNA blocking oligonucleotides are likely to bind to the regions of the unwanted RNA species to which the LNA blocking oligonucleotides are complementary rather than to other regions in the transcriptome to which the unwanted RNA species belongs.
(7) (a) Regions of an unwanted RNA species to which blocking oligonucleotides are complementary are evenly distributed along the unwanted RNA species, and the distances between neighboring regions may range from 20 to 50, 25 to 50, 30 to 50, 20 to 45, 25 to 45, 30 to 45, or 31 to 43 nucleotides, or
In a related aspect, the present disclosure provides a plurality of sets of blocking oligonucleotides for inhibiting cDNA synthesis of multiple unwanted RNA species. Each set of blocking oligonucleotides is complementary (preferably fully complementary) to multiple different (preferably evenly spaced) regions of an unwanted RNA species as described above. In certain embodiments, different sets of blocking oligonucleotides are complementary to multiple different (preferably evenly spaced) regions of different unwanted RNA species.
The number of sets may be at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200, at least 300, at least 400, or at least 500, and/or at most 10,000, at most 9000, at most 8000, at most 7000, at most 6000, at most 5000, at most 4000, at most 3000, or at most 2000, such as from 2 to 10,000, from 2 to 5000, from 2 to 1000, from 2 to 500, from 2 to 200, from 10 to 10,000, from 10 to 5000, from 10 to 1000, from 10 to 500, from 10 to 200, from 100 to 10,000, from 100 to 5000, from 100 to 1000, or from 100 to 500.
The total number of blocking oligonucleotides in the plurality of sets of blocking oligonucleotides may be at least 5, at least 10, at least 50, at least 100, at least 150, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1500, at least 2000, at least 3000, at least 4000, at least 5000, at least 6000, at least 7000, at least 8000, at least 9000, or at least 10,000, and/or at most 100,000, at most 90,000, at most 80,000, at most 70,000, at most 60,000, or at most 50,000, such as from 2 to 100,000, from 100 to 80,000, or from 800 to 50,000.
In certain embodiments, the multiple unwanted RNA species targeted by a plurality of sets of blocking oligonucleotides belong to multiple types of RNA species from a single organism (e.g., human 5.8S rRNA, human 18S rRNA and human 28S rRNA). In certain other embodiments, the multiple unwanted RNA species are from multiple organisms. In such embodiments, the multiple unwanted RNA species may belong to a single type of RNA species (e.g., 5S rRNA from multiple bacterial strains) or multiple different types of RNA species (e.g., 5S rRNA, 16S rRNA, and 23S rRNA from multiple bacterial strains).
The number of the different unwanted RNA species to which the sets of blocking oligonucleotides are fully complementary is at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200, at least 300, at least 400, or at least 500, and/or at most 1,000,000, at most 500,000, at most 100,000, at most 50,000, at most 10,000, at most 9000, at most 8000, at most 7000, at most 6000, at most 5000, at most 4000, at most 3000, or at most 2000, such as from 2 to 1,000,000, from 100 to 500,000, from 500 to 100,000, or from 1000 to 10,000.
In certain embodiments, multiple sets of blocking oligonucleotides are prepared, each set targeting one or more unwanted species from a single organism (e.g., human, a plant, a specific bacterial strain). Depending on what organisms are potentially present in a given sample, different sets of blocking oligonucleotides targeting unwanted species for such organisms may be combined together and used in depleting the unwanted RNA species from those organisms. The number of different organisms whose unwanted RNA species are to be depleted may be at least 2, at least 3, at least 4, at least 5, at least 10, at least 25, at least 50, and/or at most 10,000, at most 5,000, at most 1000, at most 500, or at most 100, such as 2 to 10,000, 5 to 5,000, or 10 to 1,000.
In a related aspect, the present disclosure provides a composition or mixture comprising one or more blocking oligonucleotides, a set of blocking oligonucleotides, and/or a plurality of sets of blocking oligonucleotides as described in this section and other sections (e.g., Section A). For example, the mixture may comprise a plurality of sets of blocking oligonucleotides that target human unwanted RNA species and one or more blocking oligonucleotides that target one or more unwanted RNA species from a pathogenic bacterial strain.
The present disclosure also provides a kit for inhibiting cDNA synthesis of one or more unwanted RNA species in an RNA sample, comprising: (1) (a) one or more blocking oligonucleotides that are complementary (preferably fully complementary) to one or more regions of one or more unwanted RNA species in the RNA sample, or (b) a set or a plurality of sets of blocking oligonucleotides, and (2) a reverse transcriptase.
The sections above (e.g., Sections A. 5. and C) are referred to for describing the one or more blocking oligonucleotides, the set or plurality of sets of blocking oligonucleotides, and reverse transcriptases that may be included in the kit.
In certain embodiments, the kit may further comprise from one to all of the following components:
reverse transcription primers,
reaction buffer suitable for reverse transcription,
enzymes for second cDNA strand synthesis (e.g., E. coli RNase H, DNA polymerase I, and E. coli DNA ligase),
DNA polymerase (e.g., Taq DNA polymerase, Pfu DNA polymerase, KOD DNA polymerase, hot-start DNA polymerase, Bst DNA polymerase, Bsu DNA polymerase, Tth DNA polymerase, and Pwo DNA polymerase),
DNA Ligase (e.g., E. coli DNA ligase, T4 DNA ligase, mammalian DNA ligase, and thermostable DNA ligase),
DNA polymerase for sequencing (e.g., T7 DNA polymerase, Sequenase, Sequenase version 2),
oligonucleotide primers for DNA amplification and/or sequencing, and
adaptors (single-stranded or double stranded oligonucleotides that may be ligated to single-stranded or double stranded DNA molecules).
The components of the kits are typically contained in separate vessels or compartments. However, when appropriate, some of the components may be provided as a mixture or composition. Additional descriptions of the components are provided in other sections, including the Examples, of the present disclosure.
The following examples are for illustration and are not limiting.
The following materials and reagents were used in Examples 1-3 of the present disclosure:
Universal Human Reference RNA (UHRR) (Agilent Technologies).
193 pool of Blockers (B1-B193), sequences of which are shown in the table below.
96 pool of Blockers (B1-B193 but only odd numbered wells, i.e., B1, B3, . . . , B193).
5×BC3 RT Buffer: 5× reverse transcription buffer from Qiagen RT2 First Strand Kit
QIAseq Beads
N6 Primer: Random Hexamer ordered from IDT (standard desalting).
2× PA-012 Master Mix: 2× master mix for qPCR that comprises a DNA polymerase from QIAGEN.
This Example compares unwanted RNA depletion by an exemplary method of the present disclosure with that by the Ribo-Zero rRNA Removal kit by Illumina, as measured by qPCR.
Step by Step Workflow:
1a. Hybridize blockers to total RNA sample
1b. rRNA depletion using Illumina Ribo-zero rRNA Removal kit:
2a. Reverse transcription reaction after step 1a
2b. Reverse transcription reaction after step 1b
3. Purify cDNA
4. Perform qPCR
Summary of Data:
Ct values of samples 1-5 show that using increasing amounts of B1-B193 blockers resulted in less synthesis of the 18S rRNA cDNA region measured by the 4 qPCR primer assays (18S FP2 and RP2, 18S FP1 and RP1, 18S FP3 and RP3, and 18S FP4 and RP4) compared with sample 6 without any blockers. Using 18.55 pmol of each blocker gave the best results in blocking 18S rRNA cDNA synthesis.
Ct values for the 3 house-keeping genes (GAPDH, ACTB and RPLP0) of samples 2-5 were similar to those of sample 6 without any blockers, indicating that the blockers caused no off-target effects in these samples. In contrast, 18.55 pmol of each blocker (sample 1) caused additional off-target effects compared with no blockers (sample 6).
Comparisons of Ct values between sample 7 (Ribo-Zero depleted) and sample 8 (no Ribo-Zero depletion) show that using Ribo-Zero rRNA Removal kit resulted in less synthesis of the 18S rRNA cDNA region measured by the qPCR primer assays, and that the Ribo-Zero depletion did not cause off-target effects.
The data further show that 8.75 pmol of each blocker in the 193 blocker pool worked at least as well as Ribo-Zero, both in reducing the amount of rRNA cDNA and with respect to off-target effects.
This Example compared 18S rRNA depletion of an exemplary method of the present disclosure with those using the RiboZero kit, poly(A) mRNA enrichment, and no treatment via sequencing of whole transcriptome libraries.
Step by Step Workflow:
1. A. For 193 pool of Blockers: Mix together 100 ng UHRR with 8.75 pmol of each blocker. Proceed with QIAseq stranded Total RNA Library Kit in step 2 below.
2. QIAseq Stranded Total RNA Library Kit:
Sequencing Parameters:
Illumina NextSeq 500 system with 150 cycles (75×2 paired end) high-output v2. Load 1.4 pM library.
Analysis was done using Galaxy (http://usegalaxy.org). Paired-end reads were aligned to reference genome b37 hg19 using the HISAT2 alignment program (Galaxy Version 2.1.0). Gene counting was done with the featureCounts counting program (Galaxy Version 1.6.0.2), with reference genome b37 hg19 and an rRNA gtf file obtained from the UCSC table browser.
Summary of Sequencing Results:
Examination of the percentage of total reads that are rRNA reveals that the Blockers 193 pool out-performed Ribo-Zero.
Scatter plots (
This Example tested performance of blockers at different RNA amounts.
Step by Step Workflow:
The workflow included the same steps as in Example 2 except adjusting for different input amounts, different blocker pools, different adapter dilutions, and cycles of PCR amplification (see qPCR data table below for the specifics of these changes that occurred in the QIAseq stranded RNA library kit protocol as described in Example 2). Duplicates were performed for each condition.
Summary of qPCR Data:
Blocking of rRNA with 8.75 pmol blocker (Samples 1 and 7) worked as well as with 100 ng input (Sample 13). There was only a slight reduction in blocking of rRNA with 4.38 pmol (compare Sample 3 with Sample 1 and Sample 9 with Sample 7). For the 3 house-keeping genes (GAPDH, ACTB, and RPLP0), inclusion of blockers significantly improved detection and quantification of these genes, as indicated by the decreases in Ct values of Samples 1 and 3 compared with Sample 5 and of Samples 7 and 9 compared with Sample 11.
When using the pool of 193 blockers, blocking of rRNA with 8.75 pmol blocker (Samples 19 and 27) worked as well as with 100 ng input (Sample 13). Again, there was only a slight reduction in blocking of rRNA with 4.38 pmol (compare Sample 23 with Sample 19 and Sample 31 with Sample 27). There was no additional negative effect on the 3 house-keeping genes (Samples 19 and 27) compared with 100 ng input (Sample 13).
When using the pool of 96 blockers, there was more substantial negative impact on blocking of rRNA (compare Ct values of 18S rRNA assays between Samples 21 and 19, between Samples 22 and 23, between samples 29 and 27, and between Samples 30 and 31). However, there was no additional negative impact on house-keeping genes as compared to 100 ng input (compare Ct values of house-keeping gene assays between Samples 20, 21, 29 and 30 with Sample 13).
Sequencing Parameters:
Sequencing was performed using Illumina NextSeq 500 system with 150 cycles (75×2 paired end) high-output v2. Load 1.6 pM library.
Analysis was done using Galaxy (http://usegalaxy.org). Alignment of paired-end reads was performed using HISAT2 alignment program (Galaxy Version 2.1.0) to reference genome b38 hg38. Gene counting was done with featureCounts counting program (Galaxy Version 1.6.0.2) with reference genome b38 hg38 and rRNA gtf file obtained from UCSC table browser.
Summary of the Above Table:
At all RNA input amounts tested, 8.75 pmol of each blocker in the pool of 193 blockers worked best in reducing the number of reads that were rRNA (see Samples 1, 2, 7, 8, 13, 14, 19, 20, 27, and 28). 4.38 pmol of each blocker in the 193 pool also worked well, but with some reduction in rRNA blocking performance (see Samples 3, 4, 9, 10, 15, 16, 23, 24, 31, and 32).
Sequencing Results for Non-rRNA Genes (Scatter Plots):
Scatter plots were generated to show the gene expression profiles for 11,000 unique non-rRNA genes for input amounts of 25 ng, 100 ng, 500 ng, and 1000 ng using the pool of 193 blockers at 4.38 pmol or 8.75 pmol each blocker. Each dot represents the log2 of reads for each unique non-rRNA gene normalized to the average of 2 house-keeping genes, GAPDH and ACTB. The scatter plots are summarized in Tables A and B below.
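The normalization and reproducibility metric can be sketched as follows. This is an assumption-laden illustration: it takes raw per-gene read counts as a dictionary, divides each gene's count by the mean count of GAPDH and ACTB before taking log2, drops genes with zero reads (log2 of 0 is undefined), and computes R2 as the squared Pearson correlation over genes present in both replicates. The exact transformation behind Tables A and B is not stated beyond the description above, and the counts in the usage lines are toy values, not data from these Examples.

```python
import math


def normalized_log2(counts, housekeeping=("GAPDH", "ACTB")):
    """log2(reads / mean housekeeping reads) per gene; zero-count genes dropped."""
    ref = sum(counts[g] for g in housekeeping) / len(housekeeping)
    return {g: math.log2(c / ref) for g, c in counts.items() if c > 0}


def r_squared(x, y):
    """Squared Pearson correlation over genes shared by two profiles."""
    genes = sorted(x.keys() & y.keys())
    xs = [x[g] for g in genes]
    ys = [y[g] for g in genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)


# Toy duplicate counts (hypothetical), compared after normalization.
rep1 = {"GAPDH": 100, "ACTB": 100, "G1": 50, "G2": 200, "G3": 0}
rep2 = {"GAPDH": 220, "ACTB": 180, "G1": 90, "G2": 410, "G3": 4}
print(r_squared(normalized_log2(rep1), normalized_log2(rep2)))
```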
Because the QIAseq Stranded Total RNA Library Kit has a suggested minimum input of 100 ng total RNA, technical duplicates at 25 ng input had poor R2 values, as expected (see Table A, Ref. Nos. 1 and 2). However, inclusion of the blockers improved R2 values compared with no blockers (compare the R2 values of Ref. Nos. 1 and 2 with that of Ref. No. 3). This improvement resulted from the blockers enhancing the sensitivity of detection and quantification of non-rRNA genes.
Reproducibility of technical duplicates was good for 100 ng, 500 ng, and 1000 ng input (see Table A, Ref. Nos. 4, 5, 7, 8, 10, and 11), and again was better with blockers than with no treatment (compare R2 values in Table A between Ref. No. 4 or 5 and Ref. No. 6, between Ref. No. 7 or 8 and Ref. No. 9, and between Ref. No. 10 or 11 and Ref. No. 12).
Scatter plots show very good correlation of non-rRNA gene expression profiles among the 100 ng, 500 ng, and 1000 ng inputs for all blocker amounts (see Table B, Ref. Nos. 17, 21, 25, 27, 30, 31, 33, and 34, all of which have R2 values greater than 0.96), indicating that using the pool of 193 blockers at either 8.75 pmol or 4.38 pmol did not negatively alter gene expression profiles while still effectively eliminating rRNA.
This Example describes the design of blockers for blocking cDNA synthesis of bacterial 5S, 16S and 23S rRNA sequences. This design is applicable for samples that are either single-species (for example E. coli K12) or mixed communities as in complex samples, such as stool, sewage or environmental, where there are potentially thousands of different rRNA sequences.
For design, 5S bacterial rRNA sequences (7,300 total sequences) were downloaded from the 5S rRNA Database (http://combio.pl/rrna/), 16S bacterial rRNA sequences (168,096 total sequences) were downloaded from SILVA (https://www.arb-silva.de/) and 23S bacterial rRNA sequences (592,605 total sequences) were downloaded from SILVA (https://www.arb-silva.de/). As sequences may be continually added to, modified in, or deleted from the databases, future designs could take into account altered numbers of sequences.
The molecular nature of the bacterial rRNA cDNA synthesis blockers is principally similar to that of the blockers used to block cDNA synthesis of human, mouse and rat rRNA (see blockers B1-B193 described above). The oligonucleotides are (on average) 20 bp in length, are spaced (on average) 30 bp apart when tiled antisense against the rRNA sequences, contain LNA nucleotides, and contain a blocking residue at the 3′ terminus of each oligonucleotide. The blockers are expected to block cDNA synthesis of bacterial rRNA in a similar manner to the human, mouse and rat rRNA blockers.
Due to the sheer number of bacterial rRNA sequences, each blocker was picked to most increase the total coverage when all of the rRNA sequences for a particular rRNA type (whether 5S, 16S or 23S) were considered. Each blocker is designed to be antisense to the target rRNA sequence of interest. Specifically, after the BLOCKER LENGTH (i.e., about 20 bp), the DISTANCE between neighboring blockers (i.e., about 30 bp) when annealing to a set of target rRNA sequences (e.g., bacterial 5S rRNA), and the NUMBER of blockers to select (e.g., 1000 or 2000) were defined, the following design algorithm was used:
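Tiling antisense oligonucleotides across a single rRNA at a fixed length and spacing can be sketched as below. This Python example is hypothetical: it treats the 30 bp spacing as the gap between neighboring regions in the sense defined earlier in this section (so consecutive windows start length + gap nucleotides apart), writes oligos as DNA 5′ to 3′, and does not model LNA placement, the 3′ blocking residue, or any scoring.

```python
# Watson-Crick complement of RNA bases, written in the DNA alphabet.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def tile_antisense(rna, length=20, gap=30):
    """Return (start, oligo) pairs tiled across an RNA written 5'->3'.

    start is the 0-based position of the targeted region on the RNA; oligo is
    the antisense DNA sequence (reverse complement of the region), 5'->3'.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    step = length + gap
    tiles = []
    for start in range(0, len(rna) - length + 1, step):
        region = rna[start:start + length]
        tiles.append((start, region.translate(RNA_TO_DNA_COMPLEMENT)[::-1]))
    return tiles


print(tile_antisense("ACGU", length=4, gap=0))  # [(0, 'ACGT')]
```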
1. Count frequencies of all kmers with K=BLOCKER LENGTH in the set of target sequences,
2. Sort kmers by frequency,
3. Add most frequent kmer to blocker set,
4. Find location of selected kmer in all target sequences,
5. Determine kmers within DISTANCE downstream of kmer location and 0.5 DISTANCE upstream in each target sequence,
6. Decrement kmers identified in step 5 in the frequency list, and
7. Repeat steps 2-6 until the NUMBER of blockers is reached.
An example of the above process is shown in
The total fraction of rRNA sequences covered increases when the number of blockers increases (see
It is not required to include all blockers when attempting to block cDNA synthesis of bacterial rRNA. The coverage was 83% for 5S rRNA (using first 1000 blockers), 84% for 16S rRNA (using first 2000 blockers), and 84% for 23S rRNA (using first 1000 blockers). The sequences of the first 100 blockers for 5S rRNA, 16S rRNA, and 23S rRNA are shown as exemplary blockers in the tables below. 35 nmol of each oligo was synthesized using standard desalt purification. Following synthesis, the four pools were combined together to generate a blocker mix that contained 4000 blockers and was used in Examples 5-8.
The sequences of 100 exemplary blockers for each of bacterial 5S rRNA, 16S rRNA and 23S rRNA are provided in the tables below.
This Example describes blocking bacterial rRNAs with the blocker mix as described in Example 4. The amount related to a blocker mix described in this Example is the amount of each blocker in the blocker mix. For example, 2.9 pmol blocker mix refers to a blocker mix containing 2.9 pmol of each blocker.
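Because amounts are stated per blocker, the total oligonucleotide in a reaction scales with the number of blockers in the mix. The Python sketch below works through that arithmetic for the 4000-blocker mix of Example 4; the helper names are hypothetical, and the reactions-per-synthesis figure idealizes the 35 nmol synthesis of each oligo as fully usable with no losses.

```python
import math


def total_blocker_pmol(per_blocker_pmol, n_blockers):
    """Total oligo (pmol) when every blocker is present at the same amount."""
    return per_blocker_pmol * n_blockers


def reactions_per_synthesis(synth_nmol, per_blocker_pmol):
    """Whole reactions supplied by one synthesis of each oligo (no losses)."""
    return math.floor(synth_nmol * 1000 / per_blocker_pmol)


print(total_blocker_pmol(2.9, 4000))     # about 11600 pmol, i.e. 11.6 nmol
print(reactions_per_synthesis(35, 2.9))  # 12068
```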
i. RNA (100 ng of Turbo DNase treated total RNA):
ii. Blocker depletion procedure
The results are shown in the table below.
The results show:
Without blockers, both samples (E. coli and ATCC gut) had a high percentage of rRNA reads.
2.9 pmol blockers gave the best performance with respect to rRNA blocking with both E. coli and ATCC gut samples.
For the E. coli sample, decreasing the amount of blockers had negligible effect on rRNA blocking. However, for the ATCC gut sample, when the amount of blocker was reduced, the amount of reads mapped to rRNA increased.
rRNA blocking led to an increased number of genes detected.
The blocking efficacy is consistent with that predicted by the blocker design algorithm: for the E. coli sample, the design algorithm predicted the blocking efficacy to be 93% for 5S, 99% for 16S, and 99% for 23S rRNA. The above results show that this was achieved in practice, as 97% of all rRNA was removed.
Bacterial rRNA blockers reduced reads mapped to rRNA from about 97% to about 3% for the E. coli sample and from about 95% to about 12% for the ATCC gut sample.
This Example describes blocking bacterial rRNAs with the blocker mix as described in Example 4 at different concentrations and with different bead cleanup steps. Similar to Example 5, the amount related to a blocker mix described in this Example is the amount of each blocker in the blocker mix.
In this Example, the ATCC gut sample as described in Example 5 was used as the RNA sample. The method and materials were the same as in Example 5 except that the amounts of the blocker mix used in this Example were 2.9 pmol and 5.8 pmol, and that two versions of bead cleanups were performed: one (“one round”) was the same as in Example 5, the other (“two rounds”) had the following additional steps between steps 3.c. and 3.d.:
(i) Add 15 μl of nuclease-free water and 19.5 μl of QIAseq NGS Bead Binding Buffer. Mix thoroughly by vortexing, and incubate for 5 min at room temperature.
(ii) Centrifuge in a tabletop centrifuge until the beads are completely pelleted (about 2 min).
(iii) Place the tubes/plate on a magnetic rack for 2 min. Once the solution has cleared, with the beads still on the magnetic stand, carefully remove and discard the supernatant.
The results are shown in the table below.
The results show:
Doubling the amount of blocker from 2.9 pmol to 5.8 pmol improved depletion of rRNA.
NGS libraries prepared when 5.8 pmol blocker mix was used had a low concentration.
Even though the use of 5.8 pmol blocker mix resulted in improved rRNA depletion, it resulted in fewer genes positively called, whether the cutoff was an FPKM of 0.3 or 3.0.
2 rounds of 1.3× bead cleanup had a neutral effect.
While 5.8 pmol blocker mix was more effective in rRNA depletion, 2.9 pmol may be more preferred when both rRNA depletion and positively expressed genes are considered.
This Example also describes blocking bacterial rRNAs with the blocker mix as described in Example 4 at different concentrations and with different bead cleanup steps. Similar to Example 5, the amount related to a blocker mix described in this Example is the amount of each blocker in the blocker mix.
In this Example, the ATCC gut sample as described in Example 5 was used as the RNA sample. The method and materials were the same as in Example 6 except that the amounts of the blocker mix used in this Example were 2.9 pmol, 4.35 pmol, and 5.8 pmol.
The results are shown in the table below.
The results show:
Increasing blockers from 2.9 pmol to 4.35 pmol and further to 5.8 pmol improved depletion of rRNA.
NGS libraries prepared using 5.8 pmol blocker mix had a low concentration, regardless of the number of rounds of bead cleanups.
2 rounds of 1.3× bead cleanup improved the number of genes detected, but also increased rRNA percentage. On balance, it is more desirable to have an increased number of genes detected.
Reads mapped in pairs also increased with 2 rounds of 1.3× bead cleanup.
The combination of 2.9 pmol of blocker mix and 2 rounds of 1.3× bead cleanup provides the most desirable results.
This Example also describes blocking bacterial rRNAs with the blocker mix as described in Example 4 with different bead cleanup steps. Similar to Example 5, the amount related to a blocker mix described in this Example is the amount of each blocker in the blocker mix.
In this Example, two different RNA samples were used. One was the ATCC gut sample described in Example 5. The other (“ATCC 3 Mix”) was a mixture of the following:
a. 20 Strain Even Mix Whole Cell Material (ATCC, cat. no. MSA-2002)
b. Skin Microbiome Whole Cell Mix (ATCC, cat. no. MSA-2005)
c. Oral Microbiome Whole Cell Mix (ATCC, cat. no. MSA-2004)
The method and materials were otherwise the same as in Example 6 except that the amount of the blocker mix used in this Example was 2.9 pmol.
The results are shown in the table below.
The results show:
For the ATCC gut sample, 2.9 pmol blocker mix depleted rRNA from about 95% to about 13% or 20%, depending on whether 1 round or 2 rounds of 1.3× bead cleanup are used. Between 1 round and 2 rounds of bead cleanup, the additional round allowed for increased gene detection.
For the ATCC 3 Mix sample (which consists of 28 bacterial species when overlapping species are accounted for), 2.9 pmol blocker mix depleted rRNA from about 95% to about 10% or about 15%, depending on whether 1 round or 2 rounds of 1.3× bead cleanup are used. Between 1 round and 2 rounds of bead cleanup, the additional round allowed for increased gene detection. Increasing the amount of blocker mix from 2.9 pmol to 4.35 pmol to 5.8 pmol improved depletion of rRNA.
The combination of 2.9 pmol of blocker mix and 2 rounds of 1.3× bead cleanup provides the most desirable results when considering both the rRNA depletion and gene expression results.
The results of Examples 5-8 show that for depleting bacterial rRNA, 2.9 pmol of each blocker was the optimal amount with two rounds of bead cleanups. However, for rRNA depletion, 1.45 pmol and even 5.8 pmol of each blocker also worked to deplete rRNA, even with a single round of bead cleanup.
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
This application claims the benefit of priority to U.S. Provisional Application No. 62/736,006, filed Sep. 25, 2018, which application is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/051999 | 9/19/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62736006 | Sep 2018 | US |