The variety and significance of RNA have expanded over time so that today many species of RNAs of varying abundance, sizes, structures, and functions have been described. Circular RNAs (circRNAs) are found in all kingdoms of life; thousands have been identified across species from Archaea to humans. CircRNAs were previously thought to be insignificant byproducts of splicing errors. However, more recent studies suggest that most circRNAs found in nature are generated through back splicing in vivo. CircRNAs are associated with multiple functions in the multicellular host. They participate in modulating certain protein-protein and protein-RNA interactions (Kristensen, et al., Nature Reviews Genetics, 20, 675-691 (2019)). CircRNAs are believed to act as sponges for microRNA (miRNA) and proteins and may additionally act as protein decoys, scaffolds, and recruiters. Some circRNAs act as translation templates in multiple pathophysiological processes. CircRNAs bind and sequester specific proteins to appropriate subcellular positions. Functions of circRNA include: providing a template for translation of proteins associated with cancer such as circZNF609 in mammalian myoblasts; regulation of gene expression such as circEIF3J; indirect regulation of miRNA target genes, e.g. CDR1/ciRS-7 in mammalian brain and miR7; and regulation of RNA binding protein dependent function, e.g. CircMbl in Drosophila and Mannan binding lectin (MBL) protein. Cancer studies have revealed that the degree of circRNA production is correlated with disease progression where 11.3% of circRNA is essential for cell proliferation. Moreover, circRNA levels increase in the brain in age-associated neurological disorders such as Alzheimer’s disease and Parkinson’s disease (see for example, Vo, et al. Cell (2019) 176, 869-881).
Endogenous circRNAs lack the free ends necessary for exonuclease-mediated degradation, rendering them resistant to several mechanisms of RNA turnover and granting them extended lifespans as compared to their linear mRNA counterparts. For this reason, circularization may allow for the stabilization of mRNAs that generally suffer from short half-lives and may therefore improve the overall efficacy of exogenous mRNA in a variety of applications including RNA vaccines and RNA based therapeutics (see for example: Salzman, et al. PLoS One. 2012; 7(2):e30733; and Muller, et al. RNA Biol 2017; 14(8): 1018-1027 and WesselHoeft, et al. Nature Communications (2018) 9, Article 2629).
There is a continuing need to be able to learn more about naturally occurring circRNAs including their sequences. Sensitive analytical tools are required to detect and accurately sequence circRNAs that range in size from very small (20 nucleotides (nt)) to significantly large (10 kb or more). Recently with the advent of vaccines and therapeutics based on mRNAs, circularizing in vitro synthesized large linear RNA (greater than 100 bases) is one approach to stabilizing these molecules. Such methods result in the need for analysis and sequencing of large circRNAs to determine the quality of the synthetic product. Such sequencing methods should preferably be fast, efficient, reliable and show reduced bias.
In general, methods, compositions and kits are provided that enable the detection, analysis and/or sequencing of small or large target RNA molecules, whether synthetic, purified, or present in a biological fluid or cell lysate that may contain non-target RNA and other contaminating molecules, without the need for depletion or purification steps that diminish what might already be low concentrations of the target molecule. The methods, compositions and kits rely on the use of a Group II Intron reverse transcriptase (Intron-RT) that has strand displacing properties and can generate concatemers in cDNA by rolling circle reverse transcription of circRNAs that may be naturally circular or circularized in vitro from linear RNA.
In one aspect, a reaction mixture is provided that includes (a) a sample comprising eukaryotic circRNA, or synthetic circRNA having an artificial sequence; and (b) a Group II bacterial or archaeal Intron-RT. In one embodiment, the circular eukaryotic RNA or the synthetic circRNA in the sample is a circularized linear RNA. The reaction mixture may further comprise a ribozyme or a DNA oligonucleotide where the oligonucleotide may be a primer or adapter. In one example, the oligonucleotide primer comprises a 3' end having a target specific complementary sequence for hybridizing to the circRNA; or a degenerate sequence at the 3' end, and optionally a 5' tail. The reaction mixture may also include one or more enzymes selected from the group consisting of a 5'- 3' RNA exonuclease, a 3'- 5' RNA exonuclease, a DNA polymerase, a ligase, a thermolabile proteinase and an endonuclease. For example, where a DNA polymerase is included in the reaction mixture, it may be any of Phi29, Taq, Bst, Bst large fragment, Bsu, Bsu large fragment, E. coli Polymerase I, Klenow, Deep Vent, Vent®, Pfu, KOD, Tgo or 9°N™ DNA polymerase (all commercially available from New England Biolabs, Ipswich, MA).
In certain embodiments, the circRNA may be circularized in vivo or in vitro. The circRNA in the reaction mixture may be an enriched and/or purified preparation from a cell or bodily fluid. Alternatively, the reaction mixture may contain a cell lysate or bodily fluid sample where the RNA has been partially enriched or purified or has been neither enriched nor purified.
In certain embodiments, the circRNA in the reaction mixture has a size in the range of 20 bases-50 kilobases. The reaction mixture may include a concatemeric first strand cDNA that is the product of the reaction between the eukaryotic circRNA in the sample and the bacterial or archaeal reverse transcriptase. Each concatemeric cDNA has multiple repeat units that are complementary copies of the circRNA, and the median length of the concatemeric cDNA is at least 3 times the length of the circRNA.
In some embodiments the cDNA in the reaction mix comprises at least 20 complementary copies of the circRNA. In certain examples, the cDNA contains at least 500 nucleotides. Another feature of the cDNA product is that the repeat units in a single cDNA share more than 90% sequence identity with each other.
In general, a method is provided for identifying a circRNA in a sample by characterizing a first strand cDNA molecule, comprising: (a) incubating a sample comprising a circRNA, with a Group II Intron-RT and dNTPs, to produce by rolling circle reverse transcription, a reaction product comprising: concatemeric first strand cDNA molecules; and (b) characterizing the cDNA by (i) obtaining a sequence of the concatemeric first strand cDNA molecules, wherein the sequence reads represent a consensus complementary sequence of repeat units of the circRNA; or (ii) amplifying the concatemeric first strand cDNA by a DNA amplification reaction by using primers.
In certain embodiments, the circRNA is isolated from a sample containing eukaryotic cells or bodily fluid; is contained within a cell lysate or bodily fluid; is a circularized linear RNA; or is synthesized in vitro with an artificial sequence. In an example of the method, the step of sequencing the cDNA is preceded by a step of detecting the cDNA by amplification. In another example, the circRNA is circularized linear RNA that is a transcription product of a DNA. In another example of the method, a ribozyme or a DNA oligonucleotide such as a primer or an adapter is included in the incubation step (a) with the circRNA and Intron-RT.
In one embodiment, the methods described herein may include the step of amplifying the full length first strand cDNA concatemer using a randomly-primed amplification method, to produce an amplified concatemer. The first strand cDNA concatemer generated in this way may have a length of at least 500 bases; and contain at least 3 complementary copies of the circRNA. The concatemer cDNA or the double strand DNA amplification product of the concatemer cDNA can be sequenced by long-read sequencing.
In embodiments of the method, rolling circle reverse transcription can be preceded or followed by an enrichment step that involves enzymatic depletion of linear RNA using a 5'- 3' RNase and a 3'- 5' RNase. Alternatively or additionally, enrichment can be achieved by size separation of the concatemeric first strand cDNA from the non-concatemeric cDNA. In embodiments of the method, rolling circle reverse transcription can be performed at a temperature in the range of 20° C.-60° C. in which range the Intron-RT can reverse transcribe circRNA. However, it may be preferable to incubate the reaction mixture at a temperature in the range of 50° C.-60° C.; so as to reduce the formation of RNA secondary structure and to enhance the efficiency of rolling circle reverse transcription. The amount of cDNA copies of the circRNA within a concatemer results in at least a 2 fold higher concentration than can be obtained using a retroviral RT.
In certain embodiments of the method, a DNA polymerase is included in the reaction mixture for rolling circle reverse transcription. Alternatively the DNA polymerase may be used in a subsequent reaction. The DNA polymerase may be added to the cDNA concatemer to amplify the DNA for ease of quantitative detection or for sequencing. In these circumstances, Phi29 DNA polymerase may be selected for the amplification reaction, in which case, it may be desirable to add an endonuclease for removing branching of the amplified DNA. Alternatively any other DNA polymerase may be used as desired for isothermal amplification or for PCR.
Sequencing may be performed on the cDNA directly, after second strand synthesis or after DNA amplification. Consensus sequence reads are obtained by comparing the sequences for individual units in the concatemers to provide an accurate sequence determination for the circRNA. The consensus sequences also permit determination of the error rate of the Intron-RT for reverse transcription. This permits a determination of the error rate of secondary enzymes such as DNA dependent RNA polymerases and/or ligases on or after copying and/or joining nucleic acid sequences.
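By way of illustration only, the consensus step described above may be sketched as follows. The sketch assumes that the repeat unit length is already known, that the read begins at a unit boundary, and that units differ only by substitutions (no insertions or deletions); a practical workflow would first align the units. The function names `split_units` and `consensus` are hypothetical and are not part of the disclosure.

```python
from collections import Counter


def split_units(concatemer: str, unit_len: int) -> list[str]:
    """Cut a concatemer read into consecutive full-length repeat units.

    Any trailing partial unit is discarded.
    """
    n_units = len(concatemer) // unit_len
    return [concatemer[i * unit_len:(i + 1) * unit_len] for i in range(n_units)]


def consensus(units: list[str]) -> str:
    """Return the most frequent base at each position across the units."""
    if not units:
        raise ValueError("no repeat units supplied")
    if len({len(u) for u in units}) != 1:
        raise ValueError("this simplified sketch requires equal-length units")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*units))


# Illustrative read: four 8-nt units (the third carries one substitution)
# followed by a 3-nt partial unit.
read = "ACGTTGCA" + "ACGTTGCA" + "ACCTTGCA" + "ACGTTGCA" + "ACG"
units = split_units(read, 8)
unit_consensus = consensus(units)
```

In this toy read the single substitution in the third unit is outvoted by the other three units, which is the sense in which multiple copies per concatemer can improve sequence accuracy.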
In general a kit is provided that includes a bacterial or archaeal Group II Intron-RT, and a synthetic non-natural oligonucleotide or ribozyme. The oligonucleotide may include a degenerate sequence for use as a unique identifier that is incorporated at the completion or start of each rolling circle copy. The kit may further include at least one enzyme selected from the group consisting of a DNA dependent RNA polymerase, a 5'- 3' RNA exonuclease, a 3'- 5' RNA exonuclease, a DNA polymerase, a ligase, a thermolabile proteinase and an endonuclease. At least one of the enzymes selected from the group consisting of an Intron-RT, DNA dependent RNA polymerase, a 5'- 3' RNA exonuclease, a 3'- 5' RNA exonuclease, a DNA polymerase, a ligase, a thermolabile proteinase and an endonuclease may be lyophilized. The lyophilized enzyme may be positioned on the surface of a polymer, within a porous polymer matrix or in a cake within a tube. One or more of the enzymes may be in the same or different containers from the Intron-RT.
In one embodiment, a method is provided for assaying the transcription fidelity of an RNA polymerase, comprising: (a) selecting a synthetic linear DNA; (b) transcribing the DNA with a DNA dependent RNA polymerase in a reaction mixture; (c) producing a circularized RNA from the transcript of (b); (d) reverse transcribing the circRNA with a Group II Intron-RT to form a population of concatemeric cDNA; and (e) sequencing the population of cDNA to determine the transcription fidelity of the RNA polymerase.
The error rate of the RNA polymerase can be determined by comparing the consensus reads from units in individual concatemers. In certain examples, the cDNA can be amplified with a DNA polymerase and sequenced by means of long-read sequencing.
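One way to picture this comparison is sketched below, as an assumption-laden illustration rather than a prescribed analysis. Differences between a concatemer's consensus and the known DNA template are counted as template-level errors, which this sketch attributes to transcription (they could equally arise during circularization). Differences between an individual unit and its own consensus are counted as copy-level errors, which would include reverse transcription and sequencing errors. All units are assumed to be pre-aligned, equal in length to the reference, and expressed in the same sense as the reference. The name `fidelity_summary` is hypothetical.

```python
from collections import Counter


def column_consensus(units: list[str]) -> str:
    """Most frequent base at each position across equal-length units."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*units))


def fidelity_summary(reference: str, concatemers: list[list[str]]) -> tuple[float, float]:
    """Return (template_error_rate, copy_error_rate) as errors per base.

    template errors: consensus of a concatemer differs from the reference
    copy errors:     a single unit differs from its concatemer's consensus
    """
    template_errors = template_bases = 0
    copy_errors = copy_bases = 0
    for units in concatemers:
        cons = column_consensus(units)
        template_errors += sum(c != r for c, r in zip(cons, reference))
        template_bases += len(reference)
        for unit in units:
            copy_errors += sum(b != c for b, c in zip(unit, cons))
            copy_bases += len(unit)
    return template_errors / template_bases, copy_errors / copy_bases


reference = "ACGTACGT"
concatemers = [
    ["ACGTACGT", "ACGTACGT", "ACGAACGT"],  # one copy-level error in the third unit
    ["ACCTACGT", "ACCTACGT", "ACCTACGT"],  # one error shared by every unit
]
template_rate, copy_rate = fidelity_summary(reference, concatemers)
```

An error that appears in every unit of a concatemer must have been present in the circRNA template, whereas an error confined to one unit arose during or after copying; separating the two is what allows the reverse transcriptase contribution to be set aside.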
In one aspect, a method is provided for amplifying a long linear RNA, which includes: performing first strand cDNA synthesis of a long linear RNA using an Intron-RT; and transcribing in vitro the cDNA using a DNA dependent RNA polymerase to make multiple copies of the long linear RNA. In one example, the long linear RNA contains modified bases. The long linear RNA may have a size of at least 1 kilobase and be capable of being reverse transcribed by Intron-RT in 30 minutes.
In one embodiment, a composition is provided that includes a Group II Intron-RT and a synthetic non-natural RNA oligonucleotide adapter complementary to a DNA splint oligonucleotide.
In another embodiment, a composition is provided that includes: a concatemeric single strand DNA, wherein the concatemer comprises at least 3 repeat units of a single sequence, wherein (i) the single sequence has a length in the range of 20 bases to 50 kilobases, (ii) the single sequence in each of the 3 repeat units differs by no more than 10%; and (iii) the concatemer is a product of rolling circle reverse transcription of an RNA.
In another embodiment, a lyophilized Group II Intron-RT is provided, wherein the lyophilized transcriptase is associated with a polymer matrix. For example, the Intron-RT may be contained within a porous polymer matrix for example, where the Intron-RT is within the polymer matrix in a cylinder, or the Intron-RT is positioned on a surface of the polymer. In this example, the polymer may be bead shaped with the lyophilized Intron-RT on its surface.
In another embodiment, the Group II Intron-RT is combined with a colored dye to form a mixture wherein the colored dye is at a concentration in the range of 0.003% to 1% (w/v); and wherein the reverse transcriptase mixture does not contain Taq polymerase and does not contain RNA. In one example, the colored dye is one or a combination of xylene cyanol, tartrazine, orange G or a combination of two or more of xylene cyanol, orange G and tartrazine.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Still, certain terms are defined herein with respect to embodiments of the disclosure and for the sake of clarity and ease of reference.
Sources of commonly understood terms and symbols may include: standard treatises and texts such as Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); Singleton, et al., Dictionary of Microbiology and Molecular Biology, 2d ed., John Wiley and Sons, New York (1994), and Hale & Markham, the Harper Collins Dictionary of Biology, Harper Perennial, N.Y. (1991) and the like. As used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. For example, the term “a protein” refers to one or more proteins, i.e., a single protein and multiple proteins. The claims can be drafted to exclude any optional element when exclusive terminology such as “solely” or “only” is used in connection with the recitation of claim elements or when a negative limitation is specified.
Aspects of the present disclosure can be further understood in light of the embodiments, section headings, figures, descriptions and examples, none of which should be construed as limiting the entire scope of the present disclosure in any way. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the disclosure.
Each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present teachings. Any recited method can be carried out in the order of events recited or in any other order which is logically possible. Numeric ranges are inclusive of the numbers defining the range. All numbers should be understood to encompass the midpoint of the integer above and below the integer i.e. the number 2 encompasses 1.5-2.5. The number 2.5 encompasses 2.45-2.55 etc. When sample numerical values are provided, each alone may represent an intermediate value in a range of values and together may represent the extremes of a range unless specified.
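The rounding convention stated above can be illustrated with a short sketch, offered only as one reading of the convention: a recited number is taken to encompass plus or minus half a unit of its last written digit. The function name `encompassed_range` is hypothetical, and numbers are passed as strings so that the number of written decimal places is preserved.

```python
from decimal import Decimal


def encompassed_range(value: str) -> tuple[Decimal, Decimal]:
    """Return the (low, high) range encompassed by a recited number.

    The half-width is half a unit of the last written digit:
    "2" -> +/- 0.5, "2.5" -> +/- 0.05.
    """
    number = Decimal(value)
    exponent = number.as_tuple().exponent  # 0 for "2", -1 for "2.5"
    half_step = Decimal(5).scaleb(exponent - 1)
    return number - half_step, number + half_step


low_2, high_2 = encompassed_range("2")
low_2_5, high_2_5 = encompassed_range("2.5")
```

Using `Decimal` rather than binary floating point keeps boundary values such as 2.45 and 2.55 exact.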
In the context of the present disclosure, “non-naturally occurring” refers to a polynucleotide, polypeptide, carbohydrate, lipid, or composition that does not exist in nature. Such a polynucleotide, polypeptide, carbohydrate, lipid, or composition may differ from naturally occurring polynucleotides, polypeptides, carbohydrates, lipids, or compositions in one or more respects. For example, a polymer (e.g., a polynucleotide, polypeptide, or carbohydrate) may differ in the kind and arrangement of the component building blocks (e.g., nucleotide sequence, amino acid sequence, or sugar molecules). A polymer may differ from a naturally occurring polymer with respect to the molecule(s) to which it is linked. For example, a “non-naturally occurring” protein may differ from naturally occurring proteins in its secondary, tertiary, or quaternary structure, by having a chemical bond (e.g., a covalent bond including a peptide bond, a phosphate bond, a disulfide bond, an ester bond, an ether bond, and others) to a polypeptide (e.g., a fusion protein), a lipid, a carbohydrate, or any other molecule. Similarly, a “non-naturally occurring” polynucleotide or nucleic acid may contain one or more other modifications (e.g., an added label or other moiety) to the 5' end, the 3' end, and/or between the 5' and 3' ends (e.g., methylation) of the nucleic acid. A “non-naturally occurring” composition may differ from naturally occurring compositions in one or more of the following respects: (a) having components that are not combined in nature; (b) having components in concentrations not found in nature; (c) omitting one or more components otherwise found in naturally occurring compositions; (d) having a form not found in nature, e.g., dried, freeze dried, crystalline, aqueous; and (e) having one or more additional components beyond those found in nature (e.g., buffering agents, a detergent, a dye, a solvent or a preservative).
Embodiments of the present invention permit a novel approach to detecting and sequencing circular RNA (circRNA), including naturally occurring circRNA, synthetic circRNA and circularized linear RNA. The circRNA is reverse transcribed by rolling circle reverse transcription to produce cDNA containing concatemers of sequence units that are complementary to the circRNA.
A “concatemer” refers to a long continuous DNA molecule that contains multiple copies of the same DNA sequence linked in series formed by rolling circle reverse transcription of an RNA. In embodiments, concatemeric single strand DNA are described that contain at least 3 repeat units of a single sequence, and the single sequence has a length in the range of 20 bases to 50 kilobases, such that preferably the single sequence in each of the 3 repeat units differs by no more than 10%.
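The criteria recited in this definition can be expressed as a simple check, sketched here under simplifying assumptions: the repeat units have already been extracted and are of equal length, and the "differs by no more than 10%" criterion is read as the fraction of mismatched positions between every pair of units. The function names and default thresholds below are hypothetical conveniences drawn from the numbers in the text.

```python
from itertools import combinations


def fraction_different(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length units differ."""
    if len(a) != len(b):
        raise ValueError("this simplified sketch requires equal-length units")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def meets_concatemer_criteria(
    units: list[str],
    min_units: int = 3,
    min_len: int = 20,
    max_len: int = 50_000,
    max_diff: float = 0.10,
) -> bool:
    """Check unit count, unit length range, and pairwise unit similarity."""
    if len(units) < min_units:
        return False
    if not all(min_len <= len(u) <= max_len for u in units):
        return False
    return all(fraction_different(a, b) <= max_diff for a, b in combinations(units, 2))


unit = "ACGTACGTACGTACGTACGT"          # 20 nt
one_mismatch = "ACGTACGTACGTACGTACGA"  # differs at 1 of 20 positions (5%)
three_mismatches = "TCGTACGTACCTACGTACGA"  # differs at 3 of 20 positions (15%)

ok = meets_concatemer_criteria([unit, unit, one_mismatch])
too_few = meets_concatemer_criteria([unit, unit])
too_divergent = meets_concatemer_criteria([unit, unit, three_mismatches])
```

Units containing insertions or deletions would need to be aligned before a comparison of this kind is meaningful.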
“Long-read sequencing” refers to sequencing that has been developed for analysis of long stretches of nucleic acids in a single read (greater than about 5000 bases or 10,000 bases). Examples of sequencing platforms suitable for long-read sequencing are commercially available from, for example, Pacific Biosciences (Menlo Park, CA) and Oxford Nanopore Technologies (Oxford, UK).
CircRNA can occur in low amounts in cells and in biological fluids. Embodiments of the invention describe workflows that enable detection and identification of these RNAs via sequencing. These workflows are predicated on concatemer formation during reverse transcription using Group II Intron reverse transcriptases (Intron-RTs). Other embodiments described herein demonstrate how circRNA can be synthesized in vitro from linear RNAs and how target RNA molecules can be enriched by means of circularization without the requirement for depletion of non-target RNAs. CircRNAs, whether natural or synthetic, have been found to be more stable than linear RNAs. This property makes circRNA particularly useful as a diagnostic target and also as a reagent for therapeutic and/or vaccine use.
CircRNAs have been detected in exosomes that may contain a complex cargo of contents derived from the original cell, including any or all of proteins, lipids, mRNA, miRNA and DNA. CircRNAs exert biological functions by acting as transcriptional regulators, microRNA (miRNA) sponges and protein templates. Moreover, emerging evidence has revealed that a group of circRNAs can serve as protein decoys, scaffolds and recruiters and play crucial roles in a variety of diseases, enabling them to potentially act as diagnostic biomarkers and therapeutic targets. Because circRNA are more stable than linear RNAs, there are advantages in circularizing linear RNA for sequencing, and for vaccine and therapeutic purposes. When the circRNA is combined with Intron-RTs, cDNA containing concatemers of complementary copies of the circRNA are formed where the cDNA can be size separated from high concentrations of cDNA from other linear RNAs and accurately sequenced by analyzing the consensus sequences in the repeat sequences that make up the cDNA concatemers.
Linear RNAs can be circularized using splint adapters and a suitable RNA ligase as described in
Intron-RT refers to a family of reverse transcriptases that are functionally capable of performing efficient rolling circle reverse transcription of circRNAs to create concatemeric cDNAs. Examples of Intron RTs are provided below in SEQ ID. Nos 1-49 any of which may be used for rolling circle reverse transcription in the compositions, kits and methods described herein. It should be appreciated that other members of the Intron RT group for use herein may be obtained by metagenome analysis using functional, structural or sequence dependent features. Intron-RT as used herein also refers to variants of wild type enzymes where the variants may include naturally occurring and/or artificially introduced insertions, truncations, deletions and/or nucleotide mutations. In some embodiments Intron-RTs may be used with representative sequences from a sequence database such as disclosed below in SEQ ID NOs: 1-49 having at least 80%, 85%, 90% or 95% sequence identity to any of SEQ ID NOs: 1-49.
Intron-RTs from wild type sources are encoded by mobile Group II Introns that function in Intron mobility (“retrohoming”) by a process that requires reverse transcription of a highly structured Intron RNA with high processivity and fidelity. Intron-RTs have been shown here to have higher fidelity, processivity, and strand displacement activity than retroviral RTs. To facilitate purification, these enzymes have been expressed as fusion proteins with a rigidly linked, non-cleavable solubility tag (see for example, Mohr et al. RNA (2013) vol 7, 958-70). Group II intron reverse transcriptases that are commercially available for reverse transcribing linear RNA include TGIRT™ (Ingex, Olivette, MO), Induro (New England Biolabs), and MarathonRT (Kerafast, Boston, MA).
Intron-RTs have certain characteristic structural features including an N-terminal extension (NTE) typically containing a conserved sequence block (RT0) and two structurally-conserved insertions (RT2a and 3a) between the universally-conserved RT sequence blocks (RT1-7) (Lentzsch, et al. JBC (2019) vol 294, P19764-19784). The RT1-7 corresponds to the fingers and palm of retroviral RTs; thumb, with predicted α-helices corresponding to those in the HIV-1 RT thumb; DNA-binding domain and DNA endonuclease domain.
Present embodiments rely on the use of Intron-RTs (for example, see SEQ ID NO: 1-49 or variants or mutants thereof), which are characterized by the conserved structures described above, even where the sequence varies. These reverse transcriptases were shown here to be especially suited for rolling circle reverse transcription of circRNA of any desired size including small and large RNAs and to consistently produce multiple copies of the circRNA in the form of linear concatemeric cDNA where each unit of the cDNA concatemer represents one copy of the circRNA. These reverse transcriptases are also suitable for reverse transcribing linear RNA without concatemer formation. For linear RNA, the enzymes have advantages over retroviral reverse transcriptases because of their processivity (length of RNA substrates transcribed), fidelity (low error rate), and ability to bypass modified nucleotides as well as to tolerate contaminants in the reaction mix. However, until now, the use of these reverse transcriptases has been restricted to linear RNAs. Reverse transcription of circRNA using this class of reverse transcriptases has provided exciting new technologies described herein where some features include the following:
A comparison of the amount of cDNA produced by reverse transcription using commercially available retroviral reverse transcriptases and Intron-RTs shows that the yield from Intron-RTs is much higher than from the retroviral RTs. A variety of reverse transcriptases have been obtained from different sources including modifications of these. The advantages of Intron-RTs compared to viral RTs (e.g. M-MLV RT, PSII, and SuperScript IV (SSIV) (ThermoFisher Scientific, Waltham, MA)) are illustrated in
A similar difference in yield could be observed with whole RNA at a concentration of 0.0001 pg-1 ng of a 0.5 kb circRNA from a pure preparation of circRNA or from total brain RNA (see for example,
In embodiments of the method, sequencing revealed that more than 70%, more specifically greater than 80% of the cDNA were concatemers yielding consensus sequences for unique circRNAs.
A population of concatemeric first strand cDNA molecules may be obtained from circRNA by rolling circle reverse transcription where the median length of the molecules is at least 0.5 kb. In some embodiments, the median length is in the range of 1.5 kb-50 kb, where for example, some molecules in the population may be in excess of 20 kb in length. In various embodiments, at least 90% of the molecules may have a length that is at least 1.5 kb, at least 2 kb or at least 3 kb. The number of concatemeric first strand cDNA molecules in the population may vary. However, in some embodiments, the population may comprise at least 100, at least 1,000, at least 5,000, or at least 10,000 of the concatemeric first strand cDNA molecules and each molecule may have, on average, at least 3 units, e.g., at least 4, or at least 5 units. In some cases, the sequences of the concatemeric cDNA may vary in a sample, reflecting the diversity of sequences of circRNAs in the sample. Within a molecule, each unit corresponding to one copy of the circRNA may be at least 25 nt, at least 50 nt, at least 200 nt, at least 500 nt, or at least 1,000 nt. In these embodiments, the units may range in size in the population, reflecting the lengths of the circRNAs in the sample.
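As an illustration of how the population metrics in this paragraph might be tabulated from sequencing read lengths, a minimal sketch follows. It assumes a single circRNA species of known unit length, so that the number of complete units per molecule can be estimated by integer division; a mixed sample would need each molecule's unit length determined separately. The function name `summarize_population` and the dictionary keys are hypothetical.

```python
from statistics import median


def summarize_population(
    molecule_lengths: list[int], unit_length: int, min_length: int = 1500
) -> dict[str, float]:
    """Summarize a population of concatemeric cDNA molecules by length."""
    n = len(molecule_lengths)
    if n == 0:
        raise ValueError("no molecules supplied")
    return {
        "molecules": n,
        "median_length": median(molecule_lengths),
        # fraction of molecules at or above the length threshold (e.g. 1.5 kb)
        "fraction_at_least_min_length": sum(
            length >= min_length for length in molecule_lengths
        ) / n,
        # complete units per molecule, averaged over the population
        "mean_units_per_molecule": sum(
            length // unit_length for length in molecule_lengths
        ) / n,
    }


# Illustrative lengths (nt) for cDNA from a hypothetical 500 nt circRNA.
summary = summarize_population([1500, 2000, 2500, 5000, 1000], unit_length=500)
```

For these illustrative values the median length is 2,000 nt, 80% of molecules are at least 1.5 kb, and the molecules carry 4.8 complete units on average.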
The ability to form cDNA concatemers from circRNA was here observed for each of a plurality of Intron-RTs tested. In embodiments, an oligonucleotide is utilized to initiate reverse transcription.
In some embodiments, a reaction mix for producing concatemeric first strand cDNA is provided. In these embodiments, the reaction mix may comprise: a sample comprising circRNA (e.g., one or more RNAs that have been circularized in vitro, or a sample comprising RNA isolated from a cell or bodily fluid, particularly from a eukaryote, e.g., a protozoan, an invertebrate, a plant, or a mammal, where the sample comprises circRNAs); and an Intron-RT, where the combination is non-natural. In addition, the reaction mix may comprise an oligonucleotide. The oligonucleotide can be a primer. Examples of primers for use with Intron-RT include a primer where the 3' end (e.g. at least 8 nt in length) hybridizes to the circRNA and the 5' end of the primer is either complementary to the circRNA and does not have a 5' tail (e.g. a target-specific primer may be relatively short, e.g., less than 20, less than 15 or less than 10 nt in length) or it does have a 5' tail that is not complementary. Where the primer has a tail, the tail may be at least 8 nt at the 5' end, or the tail may be short, e.g., 6 or less, 5 or less or 4 or less nt. 5' tails may contain a sequence that provides a binding site for a PCR primer.
Another example of a primer is one that has a degenerate sequence at the 3' end (e.g., a sequence of at least 4, at least 5 or at least 6 degenerate bases). These primers may be relatively short, e.g., less than 10 nt in length, although longer degenerate or sequence specific primers may be used. The oligonucleotide may be a primer containing a degenerate sequence that hybridizes at its 3' end to the circRNA to initiate reverse transcription and has a 5' tail suitable for downstream priming of DNA amplification. Where degenerate primers are used for amplification, embodiments of the method provide for an endonuclease such as T7 endonuclease to remove branched DNA molecules that can result from primer hybridization to cDNA concatemers as shown in
The oligonucleotide primer may have a complementary sequence to the circRNA and further include a unique identifier sequence commonly comprising a degenerate base sequence. In some embodiments, the short length of the primer can impose an upper limit on the temperature of the reverse transcription reaction. This effect may be circumvented by extension of the primer. In some embodiments, a retroviral reverse transcriptase such as ProtoScript II may be used to extend a short complementary primer (such as a degenerate primer) hybridized to the circRNA for enhancing Intron-RT reverse transcription. In some embodiments, multiple primers (e.g. hexamers) hybridize to the circRNA at different locations. In any embodiment, the reaction mix does not need to comprise a template switching oligonucleotide. Template switching oligonucleotides typically have a stretch of riboguanosines (e.g., rGrGrG) at the 3' end.
The oligonucleotide in the reaction mix may be an adapter, ribozyme or other DNA molecule suitable for enhancing rolling circle reverse transcription of circRNAs.
The cDNA concatemers that result from Intron-RT may be amplified in entirety prior to sequencing. The preferred amplification is whole genome amplification using for example Phi29 or any other DNA polymerase that is commonly used for long range isothermal amplification (see for example, Examples 6, 9 and 10) as the average size of the concatemers may be in excess of 1.5 kb, 3 kb and as much as 9 kb or more. If qPCR is performed on the cDNA or amplicons thereof, only a portion of the concatemer is amplified to quantify circRNAs.
Although there have been some reports of rolling circle reverse transcription of RNA using retroviral RT, these enzymes were found to be inefficient with respect to yield of cDNA when compared with Intron-RT. SomaGenics (US 9,493,818) reported the use of adapter ligation to artificially circularize very small linear RNAs (miRNAs) for PCR amplification that relied on the adapter to provide a specific primer hybridizing site. The reported method utilized a retroviral RT called Superscript™ II (ThermoFisher Scientific, Waltham, MA) to reverse transcribe circRNA with low efficiency to produce a linear cDNA. While not wishing to be bound by theory, it is believed that the observed inefficiencies of retroviral RTs arise from the difficulties in passing through primer bound to the RNA substrate owing to low affinity binding of the enzyme for the RNA substrate and limited if any strand displacement properties.
An advantage of Intron-RT is its flexibility in accomplishing rolling circle reverse transcription of circRNA having a wide range of sizes e.g. 20 bases to 50 kb. For example, a long cDNA (median size >5 kb) was generated from a 42 nucleotide circRNA thereby containing more than 100 units of sequence complementary to a unique circRNA within 15 minutes (see
Advantages of concatemer formation in cDNAs from circRNAs by Intron-RTs over single copy cDNAs include:
Use of concatemeric cDNA containing at least 4 copies of the substrate circRNA provides advantages that include: (a) improving the accuracy of sequencing data; (b) distinguishing circRNA from linear RNA; and (c) detection of error artefacts in synthetic circRNA.
The ability to form concatemers of cDNA from the circRNA has an added advantage for enhancing accuracy of RNA sequence determination. The consensus sequence from aligned cDNA sequence units obtained from the concatemers can distinguish nucleotide variations that result from experimental error from actual variants.
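The consensus step described above can be illustrated with a minimal sketch. It assumes the repeat units from one concatemer have already been split and aligned to equal length; the function names and the toy sequences are hypothetical and are not taken from the examples herein. A position where only some units disagree with the majority is treated as a likely experimental error, whereas a difference from the reference that is present in the consensus is reported as a candidate variant.

```python
from collections import Counter


def consensus_from_units(units):
    """Majority-vote consensus over pre-aligned, equal-length repeat units.

    Returns the consensus string and the positions where at least one
    unit disagrees with the majority base (likely experimental errors).
    """
    if not units:
        raise ValueError("need at least one repeat unit")
    length = len(units[0])
    if any(len(u) != length for u in units):
        raise ValueError("units must be pre-aligned to the same length")

    consensus = []
    minority_sites = []
    for pos in range(length):
        counts = Counter(u[pos] for u in units)
        base, n = counts.most_common(1)[0]
        consensus.append(base)
        if n < len(units):
            minority_sites.append(pos)
    return "".join(consensus), minority_sites


def variants_vs_reference(consensus, reference):
    """List (position, reference base, consensus base) for each mismatch."""
    return [
        (i, r, c)
        for i, (r, c) in enumerate(zip(reference, consensus))
        if r != c
    ]


# Hypothetical toy data: four repeat units read from one concatemer.
units = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA", "ACGTTGCA"]
reference = "ACCTTGCA"  # hypothetical expected sequence

consensus, minority_sites = consensus_from_units(units)
variants = variants_vs_reference(consensus, reference)
print(consensus, minority_sites, variants)
```

In this toy case the single-unit difference at position 3 is outvoted and does not reach the consensus, while the difference at position 2 is shared by every unit and is therefore reported against the reference. Insertions and deletions are not handled by this sketch.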
The convenience of forming concatemers of large circRNAs may provide a means of quality control for mRNA that is synthesized and circularized for vaccine production or as therapeutic molecules for modulating a phenotype of an organism. The convenience of forming or identifying small circRNAs in total RNA or RNA in blood plasma or other biological fluid is the ability to recognize rare molecules that occur at low concentrations without the cost of further loss of sample through depletion reactions to reduce background. Moreover, the concatemer formation in cDNA is an amplifying effect that allows for further second strand DNA amplification and considerable enhancement of signal in diagnostic tests and sequencing experiments. Uses also include testing novel DNA dependent RNA polymerases for error rates where the reverse transcriptase error rate can be reliably subtracted. This is also shown with triangles and circles in
More specifically, an embodiment of the methods includes the following: (i) incubating a reaction mix comprising sample comprising RNA isolated from a cell or bodily fluid (where the sample contains circRNAs), an Intron-RT, dNTPs and a primer that has a degenerate 3' end and may not have a 5' tail, to produce a product comprising concatemeric first strand cDNA molecules; (ii) amplifying the product first strand cDNA molecules using a randomly-primed amplification method (e.g., using Phi29 polymerase and random primers, or any suitable randomly primed WGA method), to produce an amplification product; and/or sequencing the amplification product, e.g., using a long range sequencing platform such as an Oxford Nanopore sequencer. This method has the potential to revolutionize how circRNAs are analyzed, because the method can be PCR-free. In addition, the method eliminates template switching, which is an inefficient process in this context. Finally, the method eliminates the need for selection of molecules of a defined size range (i.e., molecules that are about 1 kb in length; see Zhang, et al. Nat. Biotech. 2021 39: 836-845).
A reagent kit for use in analyzing circRNAs may include an Intron-RT preparation having the following features:
The Intron-RT preparation containing an Intron-RT may be stored in a suitable storage buffer or in a lyophilized form in a storage container such as a tube or on a matrix such as a paper, beads or a plastic substrate.
The Intron-RT may be a fusion protein with a second protein domain such as maltose binding domain (MBP), chitin binding domain (CBD), SNAP-tag® (New England Biolabs, Ipswich, MA) or other suitable protein binding domain for immobilizing the Intron-RT on the substrate. The Intron-RT may be fused to a second RNA binding domain for enhancing its binding to RNA.
The Intron-RT preparation may include reagents for circularizing linear RNA such as any of a ribozyme, an RNA ligase, an adapter or chemicals capable of circularizing the RNA and may further include an oligonucleotide for circularization of an RNA, a splint adapter and/or a primer.
The Intron-RT preparation will be preferably RNA-free prior to use.
The Intron-RT preparation may further include additional proteins such as a 3'-5' exonuclease such as RNase R. The Intron-RT preparation may also include a 5'-3' exonuclease such as XRN-1.
The Intron-RT preparation may further include a reversible binding aptamer for inhibiting the reverse transcriptase activity prior to the desired reaction time. The Intron-RT preparation may additionally include one or more of: DNA dependent RNA polymerase, a DNA polymerase such as Phi29 or other polymerase, an exonuclease, an endonuclease such as T7 endonuclease, a ligase and/or random or specific primers and dNTPs. Any of these reagents may be present in the Intron-RT preparation or provided in a separate container either individually or together.
The Intron-RT preparation or kit may further include a T4 RNA ligase, a ribozyme, an adapter and/or suitable chemicals for circularizing a target RNA. The Intron-RT preparation may further include in a separate container, a Thermolabile Proteinase K (New England Biolabs, Ipswich, MA) for removing enzymes at a particular time in the workflow and/or a DNAse for removing contaminating DNA from a sample.
SEQ ID NO: 1
SEQ ID NO: 2
SEQ ID NO: 3
SEQ ID NO: 4
SEQ ID NO: 5
SEQ ID NO: 6
SEQ ID NO: 7
SEQ ID NO: 8
SEQ ID NO: 9
SEQ ID NO: 10
SEQ ID NO: 11
SEQ ID NO: 12
SEQ ID NO: 13
SEQ ID NO: 14
SEQ ID NO: 15
SEQ ID NO: 16
SEQ ID NO: 17
SEQ ID NO: 18
SEQ ID NO: 19
SEQ ID NO: 20
SEQ ID NO: 21
SEQ ID NO: 22
SEQ ID NO: 23
SEQ ID NO: 24
SEQ ID NO: 25
SEQ ID NO: 26
SEQ ID NO: 27
SEQ ID NO: 28
SEQ ID NO: 29
SEQ ID NO: 30
SEQ ID NO: 31
SEQ ID NO: 32
SEQ ID NO: 33
SEQ ID NO: 34
SEQ ID NO: 35
SEQ ID NO: 36
SEQ ID NO: 37
SEQ ID NO: 38
SEQ ID NO: 39
SEQ ID NO: 40
SEQ ID NO: 41
SEQ ID NO: 42
SEQ ID NO: 43
SEQ ID NO: 44
SEQ ID NO: 45
SEQ ID NO: 46
SEQ ID NO: 47
SEQ ID NO: 48
SEQ ID NO: 49
All publications, patents, and patent applications mentioned in this specification, including U.S. Provisional Application 63/260,323, are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
1. A reaction mixture, comprising:
2. The reaction mixture according to paragraph 1, wherein the circular eukaryotic RNA or the synthetic circRNA in the sample is a circularized linear RNA.
3. The reaction mixture according to the above paragraphs 1 or 2, further comprising a DNA oligonucleotide.
4. The reaction mixture according to paragraph 3, wherein the DNA oligonucleotide is a primer or an adapter.
5. A reaction mixture according to paragraph 4, wherein the DNA oligonucleotide is a primer comprising a 3' end having a target specific complementary sequence for hybridizing to the circRNA; or a degenerate sequence at the 3' end, and optionally a 5' tail.
6. The reaction mixture according to any of the preceding paragraphs, further comprising one or more enzymes selected from the group consisting of a 5'- 3' RNA exonuclease, a 3'- 5' RNA exonuclease, a DNA polymerase, a ligase, a thermolabile proteinase and an endonuclease.
7. The reaction mixture according to any of the preceding paragraphs, further comprising a DNA polymerase.
8. The reaction mixture according to paragraph 7, wherein the DNA polymerase is Phi29, Taq, Bst, Bst large fragment, Bsu, Bsu large fragment, E.coli Polymerase I, Klenow, Deep Vent, Vent, Pfu, KOD, Tgo or 9°N DNA polymerase.
9. The reaction mixture of any prior paragraph, wherein the sample comprises an RNA that has been circularized in vitro.
10. The reaction mixture of any prior paragraph, wherein the sample comprises an RNA isolated from a cell or bodily fluid.
11. The reaction mixture according to any of paragraphs 1-8, wherein the sample is a cell lysate or bodily fluid.
12. The reaction mixture according to any prior paragraph, wherein the circRNA has a size in the range of 20 bases-50 kilobases.
13. The reaction mixture of any prior paragraph, further comprising concatemeric first strand cDNAs, wherein each concatemeric cDNA contains repeat units that are complementary copies of circRNA, and wherein the median length of the concatemeric cDNA is at least 3 times the length of the circRNA.
14. The reaction mixture according to paragraph 13, wherein the cDNA in the reaction mix comprises at least 20 complementary copies of the circRNA.
15. The reaction mixture of paragraph 13, wherein the cDNA contains at least 500 nucleotides.
16. The reaction mixture according to any of the prior paragraphs, wherein the repeat units in a single cDNA share more than 90% sequence identity.
17. A method for identifying a circular RNA (circRNA) in a sample by characterizing a first strand cDNA molecule, comprising:
18. The method of paragraph 17, wherein the circRNA is:
19. The method of paragraph 17 or 18, wherein the step of sequencing the cDNA is preceded by a step of detecting the cDNA by amplification.
20. The method according to any of paragraphs 17-19, wherein the circRNA is a circularized linear RNA that is a transcription product of a DNA.
21. The method of any of paragraphs 17-20, wherein (a) further comprises: combining a DNA oligonucleotide or a ribozyme with the circRNA and Intron-RT.
22. The method of paragraph 21, wherein the DNA oligonucleotide is a primer or an adapter.
23. The method according to any of paragraphs 17-22 wherein (b) further comprises: amplifying the full length first strand cDNA concatemer using a randomly-primed amplification method, to produce an amplified concatemer.
24. The method of any of paragraphs 17-23, further comprising: forming the concatemeric first strand in (i) having a length of at least 500 bases; and containing at least 3 complementary copies of the circRNA.
25. The method of any of paragraphs 17-24, wherein the sequencing of the concatemeric first strand cDNA in (i) is by long-read sequencing.
26. The method according to any of paragraphs 17-25, wherein the sequencing of the amplification product of the first strand cDNA in (i) is by long-read sequencing.
27. The method according to any of paragraphs 17-26, wherein step (a) further comprises, enriching the circRNA in total RNA by degrading linear RNA with a 5'- 3' RNase and a 3'- 5' RNase.
28. The method according to any of paragraphs 17, further comprising size separating the concatemeric first strand cDNA from non-concatemeric cDNA.
29. The method according to paragraph 28, wherein step (a) does not include RNase treatment of the sample containing circRNA.
30. The method according to any of paragraphs 17-29, wherein (a) further comprises forming the concatemeric cDNA at a temperature in the range of 20° C. to 60° C.
31. The method according to paragraph 30, wherein (a) further comprises incubating the sample with the Intron-RT at a temperature in the range of 50° C. to 60° C., so as to reduce the formation of RNA secondary structure.
32. The method according to any of paragraphs 17-31, further comprising: amplifying the cDNA with the DNA polymerase in the reaction mixture.
33. The method according to paragraph 32, wherein the DNA polymerase is Phi29.
34. The method according to paragraph 33, further comprising treating the amplified first strand cDNA with a nuclease for removing branching.
35. The method according to any of paragraphs 17-34, wherein (b) further comprises aligning multiple repeat sequences to obtain a consensus sequence.
36. The method according to any of paragraphs 17-35, wherein the amount of cDNA copies of the circRNA within a concatemer provides at least 2 fold higher concentration of copies than can be obtained using a retroviral RT.
37. A kit comprising a bacterial or archaeal Group II Intron reverse transcriptase (Intron-RT), and a synthetic non-natural oligonucleotide.
38. The kit according to paragraph 37, further comprising at least one enzyme selected from the group consisting of: a DNA dependent RNA polymerase; a 5'- 3' RNA exonuclease; a 3'- 5' RNA exonuclease; a DNA polymerase; a ligase; a thermolabile proteinase; and an endonuclease.
39. A kit according to paragraph 37 or 38, wherein at least one of the enzymes selected from the group consisting of: an Intron-RT; DNA dependent RNA polymerase; a 5'- 3' RNA exonuclease; a 3'- 5' RNA exonuclease; a DNA polymerase; a ligase; a thermolabile proteinase; and an endonuclease is lyophilized.
40. The kit according to paragraph 39, wherein the one or more lyophilized enzymes is on the surface of a polymer, within a porous polymer matrix or in a cake within a tube.
41. The kit according to paragraph 40, wherein one or more of the enzymes are in the same or different containers from the Group II Intron reverse transcriptase (Intron-RT).
42. A method for assaying the transcription fidelity of an RNA polymerase, comprising:
43. The method according to paragraph 42, further comprising, determining the error rate of the RNA polymerase from the occurrence of errors in the consensus sequences in individual cDNAs.
44. The method according to paragraph 42 or 43, wherein step (d) further comprises amplifying the cDNA with a DNA polymerase.
45. The method according to paragraph 42 or 43, wherein (e) further comprises long-read DNA sequencing.
46. A method for amplifying a long linear RNA, comprising:
47. The method according to paragraph 46, wherein the long linear RNA contains modified bases.
48. The method according to paragraph 46 or 47, further comprising reverse transcribing a long linear RNA of at least 1 kb in less than 30 minutes.
49. A composition, comprising: a Group II Intron reverse transcriptase (Intron-RT) and a synthetic non-natural RNA oligonucleotide adapter complementary to a DNA splint oligonucleotide.
50. A composition, comprising: a concatemeric single strand DNA, wherein the concatemer is at least 3 repeat units of a single sequence, wherein (i) the single sequence has a length in the range of 20 bases to 50 kilobases, (ii) the single sequence in each of the 3 repeat units differ no more than 10%; and (iii) the concatemer is a product of rolling circle reverse transcription of an RNA.
51. A lyophilized Group II Intron reverse transcriptase (Intron-RT), wherein the lyophilized transcriptase is associated with a polymer matrix.
52. The lyophilized Intron-RT according to paragraph 51, wherein Group II Intron-RT is contained within a porous polymer matrix.
53. The lyophilized Intron-RT of paragraph 52, wherein the porous polymer matrix is a cylinder.
54. The lyophilized Intron-RT according to paragraph 51, wherein Group II Intron-RT is positioned on a surface of the polymer.
55. The lyophilized Intron-RT according to paragraph 54, wherein the polymer is in the shape of a bead.
56. A Group II Intron reverse transcriptase (Intron-RT) combined with a colored dye to form a mixture wherein the colored dye is at a concentration in the range of 0.003% to 1% (w/v); and wherein the reverse transcriptase mixture does not contain Taq polymerase and does not contain RNA.
57. The Group II Intron reverse transcriptase mixture of paragraph 56, wherein the colored dye is one or a combination of xylene cyanol, tartrazine, orange G or a combination of two or more of xylene cyanol, orange G and tartrazine.
Rolling circle reverse transcription of circRNA (0.4 kb, sequence shown below) was carried out with an Intron-RT or retroviral reverse transcriptase (SuperScript™ II, ThermoFisher Scientific, Waltham, MA). The reaction using Intron-RT was performed as follows: 2 µl of Induro™ buffer (New England Biolabs, Ipswich, MA), 1 µl of dNTP (10 mM of each), 1 µl of circRNA specific primer (10 µM, sequence shown below), 1 µg of circRNA and 100 ng of Intron-RT. The reaction was incubated at 55° C. for 0 minutes, 20 minutes, 1 hour and 2 hours, respectively. The reaction for SuperScript II was performed under the following conditions: 2 µl of SuperScript II buffer (5X), 1 µl of dNTP (10 mM of each), 1 µl of circRNA specific primer (10 µM), 1 µg of circRNA and 0.5 µl of SuperScript II RT. The reaction was incubated at 42° C. for 0 minutes, 20 minutes, 1 hour and 2 hours. The results are shown in
The circRNA was in vitro synthesized following the protocol published by Wesselhoeft RA, et al., Nature Commun 9, 2629 (2018).
circRNA (0.4 kb) (SEQ ID NO: 50)
Primer sequence:
In a second example, the retroviral reverse transcriptases were SuperScript IV (SSIV) and ProtoScript® II (PSII) (New England Biolabs, Ipswich, MA), and the Intron-RTs were TGIRT™ (Ingex, Olivette, MO) and Induro. Reverse transcription of 1 kb RNA was initiated with primers specific to the RNAs that were either circular (C) (SEQ ID NO: 52) or linear (L) (SEQ ID NO: 53). Reverse transcription with retroviral RTs (SSIV and PSII) and Intron-RTs (TGIRT and Induro) was carried out according to manufacturers’ protocols. For the Intron-RTs, 1 pmol of linear or circRNA was incubated with 2 µl of 10 µM primers (SEQ ID NO: 51 and SEQ ID NO: 54 respectively) in a volume of 28 µl at 65° C. for 5 minutes. The reaction was cooled down to 4° C. with a 0.1° C./second ramp, and 2 µl 10 mM dNTPs, 8 µl 5X Intron-RT buffer and 200 ng Intron-RT were added to the reaction followed by an incubation at 55° C. for 1 hour. When the reverse transcription reactions reached completion, 1.5 µl of Thermolabile Proteinase K (New England Biolabs, Ipswich, MA, P8111S) was added to all the RT reactions and incubated for 15 minutes at 37° C. and 20 minutes at 55° C. cDNAs were purified with SPRI beads and eluted in 26 µl water. 24 µl of this elution was mixed with 3 µl 10X RNaseH buffer, 1.5 µl RNaseH (New England Biolabs, Ipswich, MA), and 1.5 µl RNase If (New England Biolabs, Ipswich, MA) to degrade the template RNA and incubated at 37° C. for 30 minutes and 70° C. for 20 minutes. 10 µl of the reactions were run on a 1% agarose gel together with a 10 kb ladder (New England Biolabs, Ipswich, MA). cDNA (>10 kb) containing multiple repeat complementary sequences to circRNA was observed using Intron-RTs, while the majority of the cDNA produced by MMLV-RT variants (SSIV and PSII) was around 1 kb, equivalent to a single copy of circRNA, due to the poor strand displacement activity. The cDNA for linear RNA was single copy for all reverse transcriptases. The results are shown in
circRNA (1.3 kb) (SEQ ID NO: 52)
Primer sequence: SEQ ID NO: 51
Linear RNA (SEQ ID NO: 53)
CircRNAs having a range of sizes: (A) 1.3 kb (SEQ ID NO: 52) (B) 0.8 kb (SEQ ID NO: 55) and (C) 0.4 kb (SEQ ID NO: 56) were reverse transcribed by rolling circle reverse transcription using the primer (SEQ ID NO: 51) (
circRNA (0.8 kb) (SEQ ID NO: 55)
circRNA (0.4 kb) (SEQ ID NO: 56)
Primer sequence: (SEQ ID NO: 51)
An in vitro synthesized circRNA (1.3 kb) (SEQ ID NO: 52) at various inputs (0.001 pg (1), 0.01 pg (2), 0.1 pg (3), 1 pg (4), 10 pg (5), 100 pg (6) and 1 ng (7)) was reverse transcribed with retroviral RTs (PSII, SSIV), and an Intron-RT (Induro) using a circRNA specific primer (SEQ ID NO: 51). The RT reactions for PSII and SSIV were carried out according to manufacturers' protocols. For the Intron-RT, 2 µl input RNA was incubated with 2 µl 10 mM of the specific primer in 25 µl total volume at 65° C. for 5 minutes then cooled down to 4° C. 8 µl 5X Induro Buffer, 2 µl 10 mM dNTPs, 1 µl RNase inhibitor (New England Biolabs, Ipswich, MA) and 200 ng Intron-RT were added to the reactions and incubated for 1 hour at 55° C. qPCR was performed with Luna® Universal qPCR Master Mix (New England Biolabs, Ipswich, MA) according to manufacturer’s protocol. The results of the qPCR are shown in
1 µg of total human brain RNA was reverse transcribed using random hexamer primers. PSII and SSIV reactions were carried out according to manufacturers' recommendations. For Intron-RT, 1 µg total human brain RNA was incubated with 4 µl of 60 µM random hexamer and oligo dT mix in 25 µl total volume at 65° C. for 5 minutes. Then the reaction was cooled down to 4° C. with a 0.1° C./second ramp. 8 µl 5X Induro buffer, 2 µl 10 mM dNTPs, 1 µl RNase inhibitor (New England Biolabs, Ipswich, MA) and 200 ng Intron-RT were added to the reactions and incubated for 10 minutes at 23° C. and 5 minutes at 30° C. for initial extension of the hexamers, then 1 hour at 55° C.
qPCR was performed with Luna Universal qPCR Master Mix according to manufacturer’s protocol with the primers listed below to detect endogenous circRNAs from human brain (ZNF609, RIMS2, TULP4, XPO1, HIPK3). The results are shown in
To test the effectiveness of two exoribonucleases for depletion of linear RNA, the following four reactions were set up: 1) Mock without any treatment; 2) RNase R only; 3) Poly A polymerase + RNase R; and 4) Poly A polymerase + RNase R + XRN-1. A-tailing of linear RNA by Poly A polymerase has been shown to improve the linear RNA degradation by RNase R (Xiao et al, Nucleic Acids Research, (2019) 47, 8755-8769).
The reagents were obtained as follows: RNase R (Lucigen, Middleton, WI); Poly(A) polymerase, MMLV buffer; RNA cleanup: Monarch® RNA Cleanup Kit; mRNA Decapping enzyme (MDE); T4 Polynucleotide Kinase (T4 PNK); XRN-1; NEBuffer 3.1 (New England Biolabs, Ipswich, MA). Adjust Buffer: 100 mM Tris-Cl, pH 8.0, 750 mM KCl.
Reverse transcription was performed in the following 20 µl reaction: 4 µl of 5x PSII reaction buffer, 2 µl of 100 mM DTT, 1 µl of 10 mM dNTP, 1 µl of 60 µM RT primer (Reverse Primer), 1 µl RNase Inhibitor (New England Biolabs, Ipswich, MA), 200 units of PSII and 10 µl of eluted RNA from the enrichment procedure described above. The reaction was incubated at 25° C. for 10 minutes, 42° C. for 1 hour and 65° C. for 20 minutes.
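As a worked check of the recipe above, the final concentration of each component follows from stock concentration multiplied by added volume and divided by total reaction volume. The short sketch below applies that arithmetic to the stated 20 µl reaction; the helper name is hypothetical, and units are carried only in the comments.

```python
def final_concentration(stock, volume_ul, total_ul):
    """Final concentration after dilution, in the same unit as the stock."""
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock * volume_ul / total_ul


TOTAL_UL = 20  # reaction volume stated in the text

buffer_x = final_concentration(5, 4, TOTAL_UL)      # 5x buffer, 4 ul
dtt_mm = final_concentration(100, 2, TOTAL_UL)      # 100 mM DTT, 2 ul
dntp_mm = final_concentration(10, 1, TOTAL_UL)      # 10 mM dNTP, 1 ul
primer_um = final_concentration(60, 1, TOTAL_UL)    # 60 uM primer, 1 ul

print(buffer_x, dtt_mm, dntp_mm, primer_um)
```

Under these inputs the buffer is at 1x, DTT at 10 mM, dNTP at 0.5 mM and the RT primer at 3 µM in the assembled reaction.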
qPCR was performed with Luna Universal qPCR Master Mix according to manufacturer’s protocol. The results are shown in
The primers for qPCR used to analyze the cDNA resulting from reverse transcription of circRNA were as follows:
Illumina libraries were constructed with NEBNext® Ultra™ II Directional RNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich, MA) according to the manufacturer’s protocol. The data was analyzed using CIRI2 (Gao, et al, Briefings in Bioinformatics, 2018, 19(5):803-810)(
Rolling circle reverse transcription was performed in an aqueous reaction mixture containing: 360 ng of purified circRNA template (1.8 kb, sequence shown below), 1x Induro Buffer, 1 mM dNTPs, 1 µM RT primer, 10 mM DTT, 100 ng Induro RT (an Intron-RT) and 20 units Murine RNase Inhibitor in a final volume of 20 µL. Reactions were incubated at 50° C. for 30 minutes. The cDNA reaction product was purified using NEBNext® Sample Purification Beads (New England Biolabs, Ipswich, MA) with a 1:1 volume ratio, following the manufacturer’s directions. 500 ng of cDNA was resuspended in 50 mM NaCl and the EP oligo was added to a final concentration of 1 µM, in a final volume of 20 µL. The oligos were annealed to the cDNA by heating to 60° C. for 5 minutes to create a double stranded cDNA end with an A overhang suitable for the Oxford Nanopore Ligation Sequencing Kit (Oxford Nanopore Technologies, Oxford, UK). A ligation reaction was then performed by adding 20 µL of the Oxford Nanopore sequencing adapter as well as 50 µL of Blunt/TA ligation master mix (New England Biolabs, Ipswich, MA) in a final volume of 100 µL and reacted for 30 minutes at room temperature. Reactions were then purified and prepared for sequencing following the Oxford Nanopore Ligation Sequencing protocol.
RT Primer:
EP oligo:
circRNA (~1.8kb) (SEQ ID NO: 81)
Primer sequence:
In
Randomized primers (N6) were first annealed to the circRNA. In this example, PSII was used to elongate the N6 primer (
Since the PSII has weak strand displacement activity, the elongation of the primer stops when the strand reaches another annealed primer. The Intron-RT subsequently added to the reaction continued reverse transcription in a rolling circle to produce concatemers due to the strong strand displacement activity of the Intron-RT. If the amount of the multiple-repeat cDNA was relatively low (due to low input of starting RNA), Phi29 (New England Biolabs, Ipswich, MA) was used to amplify the cDNA followed by T7 Endonuclease (New England Biolabs, Ipswich, MA) to debranch the product. The Phi29 and T7 Endonuclease reactions were performed according to the manufacturer’s instructions. Library preparation was then carried out according to the sequencing platform of choice using the standard protocols described by New England Biolabs, Ipswich, MA (see for example, New England Biolabs product E7805S for Illumina sequencing platform and New England Biolabs product E7180S for Oxford Nanopore Sequencing Platform). An example of a workflow for the above is shown in
CircRNA in human brain RNA (1 µg) could be enriched using the circRNA enrichment steps described above including rRNA depletion and linear RNA depletion. The rRNA depletion was performed using NEBNext® rRNA Depletion Kit v2 (Human/Mouse/Rat) (New England Biolabs, Ipswich, MA). The linear RNA depletion was performed according to the method above. As an alternative, circRNA in human brain could be size separated after rolling circle reverse transcription as described below to obviate the need for depletion of unwanted RNA prior to reverse transcription.
Human brain RNA (1 µg) or the enriched circRNA was reverse transcribed with PSII to elongate the randomized hexamer (N6) primer. The final reaction conditions for 20 µl were: 4 µl of 5x PSII reaction buffer, 2 µl of 100 mM DTT, 1 µl of 10 mM dNTP (N0447S), 1 µl of 10 µM N6 primer, 200 units of PSII and human brain RNA. The reaction was incubated at 25° C. for 10 minutes.
If PSII was used to extend the N6, rolling circle reverse transcription was performed by adding 20 µl of the following components to the reaction: 8 µl of 5X Induro buffer, 200 ng of Intron-RT (Induro), 2 µl 10 mM dNTP and 8 µl H2O. The reaction was incubated at 55° C. for 40 minutes.
If PSII was not used to extend the N6, rolling circle reverse transcription was performed as follows in a 20 µl reaction: 4 µl of 5X Group II Intron-RT buffer, 100 ng of Intron-RT, 1 µl 10 mM dNTP and human brain RNA. The reaction was incubated at 25° C. for 10 minutes followed by 55° C. for 40 minutes.
Size selection was carried out with beads resuspension buffer MgCl500-PEG5 according to the paper (Stortchevoi, et al, Journal of Biomolecular techniques, 31: 7-10). The product was eluted in 10 µl water.
The DNA amplification with Phi29 was carried out with the following conditions in a 50 µl reaction: 5 µl of 5X Phi29 DNA polymerase reaction buffer, 5 µl of 10 mM dNTP, 2.5 µl 1 mM phosphorothioate N6, 1 µl of product from rolling circle reverse transcription and 20 units of Phi29 DNA polymerase. The reaction was incubated overnight at 30° C. Then 20 µl of T7 Endonuclease I was added to the reaction to debranch the DNA. The effect of debranching with an endonuclease on the final cDNA product is shown in
The Nanopore library was prepared according to the Ligation Sequencing Kit and the sequencing was performed with Spot-ON® Flow Cell, R9 version on GridION® machine (Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer’s protocols. The N50 (median) length and distribution of lengths of cDNA obtained from circRNA in human brain is shown in
The experimental protocols described here are similar to Example 6, except that the input RNA here is total human brain RNA without circRNA enrichment. This example demonstrated that, due to the size difference, the concatemeric cDNA derived from rolling circle reverse transcription of circRNA could be readily separated from non-concatemeric cDNA derived from linear RNA. As shown in
Small RNAs (sRNAs) are an important group of non-coding RNAs that have great potential as diagnostic and prognostic biomarkers for treatment of a wide variety of diseases. The portability and affordability of nanopore sequencing technology makes it ideal for point of care and low resource settings. Currently sRNAs cannot be reliably sequenced on the nanopore platform due to the short size of sRNAs and high error rate of the nanopore sequencer.
A highly efficient nanopore-based sequencing strategy for sRNAs is described here (see
For synthetic RNA oligos (sequence shown below), 5 pmol of RNA was used as input and the protocol starts at the circularization step. For human brain RNA, sRNAs (< 200 base pairs) were isolated from 10 µg of total human brain RNA using RNA XP beads (Beckman Coulter, Indianapolis, IN) following the manufacturer’s protocol. 50 ng of the isolated sRNA was used as input for a ligation reaction that would circularize the sRNA.
The RNA was first denatured by heating to 70° C. for 2 minutes and placed on ice immediately. The ligation mixture was then added to the RNA. The ligation mixture contained 1X T4 RNA Ligase Buffer (New England Biolabs, Ipswich, MA), 20% PEG-8000, 0.05% Tween-20, 20 pmol of the annealed DNA splint (bottom strand)/RNA adapter (top strand) (see below) and 200 units of T4 RNA Ligase 2 Truncated K/Q (New England Biolabs, Ipswich, MA) in a total volume of 20 µl. The ligation reaction was incubated at 25° C. for 1 hour. Subsequently, 2 units of USER® Enzyme (New England Biolabs, Ipswich, MA) and 4 units of DNase I (New England Biolabs, Ipswich, MA) were added to the reaction mixture and incubated at 37° C. for 30 minutes to remove the bottom DNA splint strand of the adapter. The T4 RNA Ligase 2 Truncated K/Q was then heat inactivated at 75° C. for 5 minutes and then cooled down to 4° C. The reaction mixture was then diluted to a total volume of 40 µL with 1X T4 RNA Ligase Buffer, 1 mM ATP, 10 units of T4 Polynucleotide Kinase and 30 units of T4 RNA Ligase 1 (New England Biolabs, Ipswich, MA). The circularization reaction was then allowed to proceed for 1 hour at 25° C. Linear RNAs were then degraded by adding the following mixture of enzymes: 2 µL XRN-1, 1 µL 5' deadenylase (New England Biolabs, Ipswich, MA), RNase R (purified in house), ATP and Poly(A) Polymerase (New England Biolabs, Ipswich, MA). The reaction mixture was then diluted to 80 µL with 2X Intron-RT buffer (1X final concentration), 1 µM primer (supplemental table 1, oligo 2), 1 mM dNTPs and 100 ng of Induro RT. The reaction was incubated at 38° C. for 5 minutes, 60° C. for 30 minutes and then 95° C. for 5 minutes. The resulting cDNA was purified using 96 µl of NEBNext Sample Purification Beads, following the manufacturer’s directions, with a modified elution protocol. The elution was incubated at 37° C. for 10 minutes with occasional vortexing.
Second strand synthesis was performed using Taq DNA polymerase (New England Biolabs, Ipswich, MA). A primer (supplemental table 1, oligo 5) was annealed to the adapter sequence and the 5'-3' exonuclease activity of the polymerase could remove primers annealed to the internal sequence of cDNA. The reaction mixture contained 1 µg of the rolling circle cDNA product, 1X ThermoPol® Buffer (New England Biolabs, Ipswich, MA), 1 mM dNTPs, 10 pmol primer and 5 units of Taq DNA Polymerase in a total volume of 50 µL. The reaction was incubated at 95° C. for 30 seconds, 62° C. for 1 minute and 65° C. for 20 minutes. These reactions were purified using 25 µl of NEBNext Sample Purification Beads. The Nanopore Sequencing was performed with the purified DNA.
Reads were filtered by length (> 1000 bp) and average quality (>= 7) and then converted to FASTA format. SPADE (23) was used to detect periodic repeats in the reads and to extract consensus sequences. Iterative testing was performed to find optimal parameter tuning and the final parameters were used as follows: K-mer size = 5, sliding window size = 1000, peak height threshold = 10, gap threshold = 200, margin = 200, letter consistency threshold = 0.5. All other parameters were used with their defaults. Custom R scripts (R Core Team, version 3.6.3. https://www.R-project.org/) were used to parse the resulting GenBank files from the SPADE output to collect the consensus sequences. Consensus sequences generated in this manner could have any random circular orientation, therefore, to phase them we generated all possible rotations of each consensus sequence and aligned the adapter to them using pairwise alignments with the Needleman-Wunsch algorithm as implemented in the R package Biostrings (R package, version 2.62.0. https://bioconductor.org/packages/Biostrings.) We chose the first rotation of the sequence that gave the longest un-gapped alignment anchored to either the start or end of the read. The adapter sequence was then trimmed from the rotated consensus sequences to yield the final trimmed consensus sequences.
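The phasing and trimming step described above can be sketched in simplified form. The analysis described used Needleman-Wunsch pairwise alignment in the R package Biostrings; the sketch below is written in Python and substitutes exact string matching for that alignment, so it illustrates only the rotation logic and would not tolerate sequencing errors in the adapter. The function names and the toy sequences are hypothetical.

```python
def rotations(seq):
    """All circular rotations of a sequence, starting with the input itself."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def phase_and_trim(consensus, adapter):
    """Rotate a circular consensus so the adapter sits at one end, then trim it.

    Returns the insert sequence from the first rotation in which the adapter
    matches exactly at the start or the end, or None if no rotation matches.
    Exact matching is an assumption of this sketch, standing in for the
    anchored pairwise alignment described in the text.
    """
    if not adapter:
        raise ValueError("adapter must be non-empty")
    for rot in rotations(consensus):
        if rot.startswith(adapter):
            return rot[len(adapter):]
        if rot.endswith(adapter):
            return rot[:-len(adapter)]
    return None


# Hypothetical toy data: adapter + insert, observed at an arbitrary rotation.
adapter = "CCGG"
observed_consensus = "TACACCGGGAT"  # a rotation of "CCGG" + "GATTACA"

insert = phase_and_trim(observed_consensus, adapter)
print(insert)
```

The sketch considers only one strand orientation; a consensus derived from the opposite strand would need its reverse complement tested as well.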
Synthetic RNA:
RNA adapter:
Bottom DNA splint strand:
RT Primer:
Second Strand Synthesis Primer:
The results are shown in
This example shows that Intron-RT mediated rolling circle reverse transcription allows for circRNA profiling from samples with very low RNA input, such as biofluids (urine, serum, saliva).
The workflow is shown in
Circularization of RNA followed by rolling-circle reverse transcription achieved highly accurate RNA sequencing for identifying sequence heterogeneity, including single-nucleotide polymorphisms (SNP), and was used for measuring the error rates of RNA polymerases (see
The error rate of in vitro transcription of a synthetic template by T7 RNA polymerase was measured. The RNA template was circularized either by means of an insert sequence flanked by permuted intron-exon splicing elements plus homology arms, as described in Wesselhoeft et al., Nature communications 1-10 (2018) doi:10.1038/s41467-018-05096-6, or by ligation (as described in Example 8, or Petkovic et al., 2015, Nucleic Acids Research, Volume 43, 2454-2465).
After circularization of the transcribed RNA, rolling circle cDNA synthesis was performed by Intron-RT, producing long concatemeric cDNA. All reagents were from New England Biolabs, Ipswich, MA. The cDNA synthesis reaction contained 1 µg of circularized RNA, 1X RT Buffer, 200 nM specific primer, 100 ng Intron-RT and 1 mM each dNTP, and was incubated at 44° C. for 1 minute. The reaction was stopped by the addition of 1 µl Thermolabile Proteinase K and incubated for 15 minutes at 25° C., 15 minutes at 37° C., and 1 minute at 95° C., followed by the addition of 1 µl RNaseA (diluted 1/10 in TE) and incubation for 15 minutes at 37° C. The single-stranded cDNA was purified by ethanol precipitation. Next, the concatemeric cDNA was made double-stranded by A-tailing and annealing an oligo-dT primer to the newly formed poly-dA sequence at the 3' end. The oligo-dT primer can be extended by a DNA polymerase, such as Phi29 DNA polymerase or Taq DNA polymerase, following standard protocols for primer extension. The double-stranded DNA product was then made into libraries for long-read DNA sequencing, such as Pacific Biosciences Single-Molecule Real-Time (SMRT) sequencing, using commercially available template preparation kits. The DNA libraries were sequenced on a Pacific Biosciences sequencer (Sequel II) to generate HiFi reads (high accuracy sequencing reads) of the concatemeric double-stranded cDNA. A consensus sequence was generated by aligning the sequences of all individual monomers from a single concatemer sequence. This consensus sequence of an individual monomer was compared to the template RNA sequence to determine RNA polymerase errors. Example error rates for T7 RNA polymerase using this method are provided below.
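The consensus-and-compare step described above can be sketched in Python as follows. This is an illustrative simplification, not the analysis pipeline used in this example: it assumes the monomers from one concatemer have already been aligned to equal length, that all sequences are written in the DNA alphabet and in the same strand orientation as the template, and that only substitutions are counted. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(monomers):
    """Per-position majority vote across equal-length, pre-aligned monomers."""
    if not monomers:
        raise ValueError("at least one monomer is required")
    length = len(monomers[0])
    if any(len(m) != length for m in monomers):
        raise ValueError("monomers must be pre-aligned to equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*monomers)
    )


def substitution_spectrum(consensus, template):
    """Count each substitution type, written as 'template>observed'."""
    if len(consensus) != len(template):
        raise ValueError("consensus and template must have equal length")
    return Counter(
        f"{t}>{c}" for t, c in zip(template, consensus) if t != c
    )


def error_rate(spectrum, bases_compared):
    """Substitutions per base compared."""
    return sum(spectrum.values()) / bases_compared


if __name__ == "__main__":
    # Toy monomers from one hypothetical concatemer; the third copy carries
    # a sequencing or reverse transcription error that the vote removes.
    monomers = ["ACGTAC", "ACGTAC", "ACCTAC"]
    template = "ACGTAA"
    consensus = majority_consensus(monomers)
    spectrum = substitution_spectrum(consensus, template)
    print(consensus, dict(spectrum), error_rate(spectrum, len(template)))
```

The design point is that an error introduced during transcription is present in the circular RNA itself and therefore recurs in every monomer, so it survives the majority vote, whereas an error made in a single copy during cDNA synthesis or sequencing is outvoted.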
Mutational spectrum of T7 RNA polymerase (average across all samples):
The error rate of Intron-RT reverse transcriptase was determined using the method described in Example 9. In this case, the HiFi sequencing reads of the cDNA concatemers were analyzed by comparing the high-accuracy CCS (circular consensus sequencing) reads of the full length cDNA concatemer to the high-accuracy consensus sequence of the individual RNA monomers (which corresponded to the sequence of the RNA strand that acted as a template for reverse transcription) to determine any errors made by the reverse transcriptase. The error rate of Intron-RT was found to be:
Using a 10 minute incubation at 50° C., 55° C. or 60° C., targets from total human RNA, namely SHDA (1.9 kb), XRN (4.7 kb), HERC1 (5.5 kb), SMG1 (9.3 kb), HERC1 (12.2 kb) and HERC1 (14.2 kb), were reverse transcribed with the Intron-RT (Induro). After first strand cDNA synthesis, an aliquot was amplified by PCR using LongAmp® Taq 2X Master Mix (New England Biolabs, Ipswich, MA). It was found that at temperatures of 55° C.-60° C., the Intron-RT (Induro) had a cDNA synthesis rate of about 2 kb/minute.
This application claims priority from U.S. Provisional Application 63/260,323 filed on Aug. 17, 2021, herein incorporated by reference. The contents of the electronic sequence listing (NEB-430-US.txt; Size: 138.171 bytes; and Date of Creation: Aug. 12, 2022) are herein incorporated by reference in their entirety.
Number | Date | Country
---|---|---
63260323 | Aug 2021 | US