The present invention is in the field of molecular biology and diagnostics and, more particularly, expression profiling.
Over the years, research in the field of transcriptome analysis has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Since 2006, next-generation sequencing technologies have revolutionized transcriptomics by enabling multidimensional examination of cellular transcriptomes, in which high-throughput expression data are obtained at single-base resolution. Table 1 summarizes gene expression profiling milestones.
Numerous protocols and commercial kits have been developed for mRNA-seq by next-generation sequencing. Typically, a standard transcriptome analysis workflow comprises 11 steps (the number and order of steps may vary slightly between protocols and platforms; library protocols for small RNA-seq based on RNA ligation follow a different workflow and are not considered here):
Enrichment of mRNAs and/or depletion of the rRNA are key steps for successful cDNA library generation and sequencing with minimal redundancy, because the major part of the total RNA consists of rRNA molecules.
For whole-transcriptome profiling of complex organisms such as humans, a large number of sequencing reads must be generated. Based on experimental data from Illumina sequencing platforms for the human transcriptome, Table 2 gives an overview of the number of reads and the sequencing strategies required for distinct analysis goals.
The enormous capacity for massively parallel sequencing on next-generation sequencing platforms and the worldwide efforts in the genomics field have led to a tremendous improvement of our knowledge of the human genome and its expression profiles. Accordingly, we expect that almost all transcripts, including splice variants, will be discovered within just a few years.
However, complete analysis of complex transcriptomes, including data analysis and the extraction of biologically and medically relevant information, is still expensive and labor-intensive.
Furthermore, many working steps from RNA extraction to sequencing are time-consuming and error-prone, and they make comparative studies difficult. Finally, many NGS machines do not have the capacity to generate enough reads for a whole complex transcriptome within one sequencing run.
The present invention achieves the following improvements in this field. It significantly improves sample preparation for RNA sequencing on NGS machines by reducing the number of working steps: no mRNA enrichment or purification, no rRNA depletion and no adapter ligation are necessary. The method addresses NGS machines with limited sequencing capacity through targeted gene expression profiling (gene panel-oriented assays). It enables multiplexed analysis by indexing, analysis of gene expression levels, and analysis of known (examples 1 and 2) and unknown (example 3) splice site variants as well as their quantification, at single-base resolution and hence with SNP detection (see example 3). The method also provides a large digital dynamic range.
Further, in contrast to U.S. Pat. No. 7,361,488, no support is needed and, more importantly, the sequence of the in vivo RNA itself is determined, rather than merely the hybridized oligonucleotide being detected. This difference is substantial.
A “composition” herein is an aqueous solution comprising one or more ribonucleic acid molecules.
A “first nucleic acid molecule with a 3′-tail wherein said tail does not hybridize to an RNA in the composition” is an oligonucleotide which has two parts: a first part that is able to bind its RNA target (specifically) if the target is present in the composition, and a second part that does not bind an RNA in the composition.
A “second nucleic acid molecule with a 5′-tail wherein said tail does not hybridize to an RNA in the composition” is an oligonucleotide which has two parts: a first part that is able to bind its RNA target (specifically) if the target is present in the composition, and a second part that does not bind an RNA in the composition.
The invention relates to a method for determining the sequence and/or quantity of a ribonucleic acid in a composition, comprising the steps of:
The separation between the two nucleic acid molecules on the target is ideally between 2 and 1000 nucleotides, preferably between 5 and 500 nucleotides and most preferably between 35 and 150 nucleotides. The two molecules are deoxyribonucleic acids (DNA) or comprise DNA, such that the antibody is functional and binds the hybrid.
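The preferred separation ranges can be checked mechanically during probe design. The following Python sketch is illustrative only: the helper names, the 0-based coordinate convention and the assumption that the first molecule binds upstream of the second on the target are ours, not part of the method as described.

```python
def probe_gap(first_start: int, first_len: int, second_start: int) -> int:
    """Unpaired target nucleotides between two probe binding sites.

    Positions are 0-based on the target RNA; the first molecule is assumed
    (for this sketch) to bind upstream of the second.
    """
    gap = second_start - (first_start + first_len)
    if gap < 0:
        raise ValueError("binding sites overlap")
    return gap


def classify_gap(gap: int) -> str:
    """Map a gap length onto the ranges given in the description."""
    if 35 <= gap <= 150:
        return "most preferred"
    if 5 <= gap <= 500:
        return "preferred"
    if 2 <= gap <= 1000:
        return "acceptable"
    return "outside stated range"


# A 25-nt first probe at position 100 and a second probe at position 200
gap = probe_gap(100, 25, 200)
print(gap, classify_gap(gap))  # 75 most preferred
```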
U.S. Pat. No. 7,361,488 discloses a method wherein nucleic acid probes that have hybridized to an RNA target are ligated together and subsequently amplified and detected. The drawback of this method is that detection occurs by means of the probes that were originally added to the reaction. No de novo in vivo sequence is determined (only known sequences are detectable), and the detection is only indirect, as one must assume from the detection of the probe that a certain RNA was present. New and unknown sequences are not detectable. Was that RNA actually present? That remains unclear when using the method of U.S. Pat. No. 7,361,488. The present invention solves this problem, as its method steps allow, for the first time, the actual sequence determination of defined RNA stretches from, e.g., mRNA transcripts.
Probes and primers of the present invention are designed to have at least a portion that is complementary to the polyadenylated mRNA target sequence or to an RNA from another species, such that hybridization occurs between the polyadenylated mRNA target sequence (or the RNA from the other species) and the probes of the present invention. As outlined below, complementarity need not be perfect; there may be any number of base-pair mismatches that will interfere with hybridization between the polyadenylated mRNA target sequence and a single-stranded nucleic acid of the present invention. However, if the number of mutations is so great that no hybridization can occur under even the least stringent hybridization conditions, the sequence is not a complementary polyadenylated mRNA target sequence (the same applies to an RNA from another species). Hence, the probes described in claim 1 must be “substantially complementary”, which herein means that the probes are sufficiently complementary to the polyadenylated mRNA (or RNA from the other species) to hybridize under normal reaction conditions and preferably give the required specificity.
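As an illustration of “substantially complementary”, the sketch below counts mismatches between a DNA probe and the RNA segment it is meant to pair with, antiparallel. The mismatch threshold `max_mismatches=2` is a hypothetical placeholder; whether a given number of mismatches still allows hybridization depends on the conditions discussed in the following paragraphs.

```python
# Complement table for DNA/RNA letters; U (RNA) pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """DNA reverse complement; U in an RNA input is treated like T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(probe: str, target_rna: str) -> int:
    """Mismatches when a DNA probe (5'->3') pairs antiparallel with an
    equally long RNA segment (5'->3')."""
    if len(probe) != len(target_rna):
        raise ValueError("probe and target segment must have equal length")
    expected = reverse_complement(target_rna)  # perfectly complementary probe
    return sum(p != e for p, e in zip(probe.upper(), expected))


def substantially_complementary(probe: str, target_rna: str,
                                max_mismatches: int = 2) -> bool:
    # max_mismatches is a hypothetical threshold, not a value from the text
    return count_mismatches(probe, target_rna) <= max_mismatches


print(count_mismatches("GTAGCCTT", "AUGGCUAC"))  # 1
```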
A variety of hybridization conditions may be used in the present invention, including high, moderate and low stringency conditions; see, for example, Maniatis et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, 1989, and Short Protocols in Molecular Biology.
Stringent conditions are sequence-dependent and will differ in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5 to 10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength. The Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the polyadenylated mRNA target sequence at equilibrium (as the target sequences are present in excess at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7 to 8.3, and the temperature is at least about 30° C. for short probes (e.g. 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g. greater than 50 nucleotides). Stringent conditions may also preferably be achieved herein by the addition of helix-destabilizing agents.
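For a first estimate of Tm and the corresponding stringent window, one commonly used empirical approximation for oligonucleotides is Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 600/N, with N the probe length. The Python sketch below applies that formula and the 5 to 10° C. offset stated above. The formula is our choice of approximation, not one prescribed here; its constants differ somewhat between published variants, and nearest-neighbor models are more accurate.

```python
import math


def salt_adjusted_tm(seq: str, na_molar: float) -> float:
    """Empirical Tm estimate in deg C (one published variant; an approximation)."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def stringent_window(tm: float) -> tuple[float, float]:
    """Hybridization temperatures about 10 to 5 deg C below Tm."""
    return (tm - 10.0, tm - 5.0)


probe = "ACGTACGTACGTACGTACGT"  # 20 nt, 50% GC
tm = salt_adjusted_tm(probe, na_molar=1.0)
print(round(tm, 1), stringent_window(round(tm, 1)))  # 72.0 (62.0, 67.0)
```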
The method as designated above makes use of a large number of two-part nucleic acid hybridization probes, also termed herein target probes, which are used in particular in a multiplex fashion. A plurality of these probes means 10 or more such probes, preferably between 15 and 100, more preferably between 100 and 500, and even more preferably between 100 and 1000.
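Because the method targets NGS machines with limited capacity, the sequencing depth available per probe is worth estimating when choosing panel size and degree of multiplexing. A back-of-the-envelope Python sketch follows; the run output, sample count and the `usable_fraction` of reads passing filters are hypothetical inputs, not values from this description.

```python
def reads_per_probe(run_reads: float, n_samples: int, n_probes: int,
                    usable_fraction: float = 0.8) -> float:
    """Average reads available per probe per sample in one sequencing run.

    usable_fraction is a hypothetical share of reads that pass filtering and
    map to a probe; uniform coverage across probes is assumed.
    """
    return run_reads * usable_fraction / (n_samples * n_probes)


# Hypothetical run: 20 million reads, 10 indexed samples, 100-probe panel
print(reads_per_probe(20_000_000, 10, 100, usable_fraction=1.0))  # 20000.0
```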
As outlined above, the first nucleic acid molecule has a 3′-tail and the second nucleic acid molecule has a 5′-tail. These are so-called universal priming sites. By “universal priming site” herein is meant a sequence of the probe that will bind a PCR primer for amplification. Each probe preferably comprises an upstream universal priming site and a downstream universal priming site; herein, these are located on said first nucleic acid molecule and said second nucleic acid molecule. Again, “upstream” and “downstream” are not meant to convey a particular 5′- or 3′-orientation and will depend on the orientation of the system. Preferably, only a single upstream universal priming site and a single downstream universal priming site are used in a probe set. These sequences are generally chosen to be as unique as possible given the particular assays and host genomes, to ensure specificity of the assay.
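One way to check that candidate universal tails are as unique as possible, and in particular that they will not hybridize to an RNA in the composition, is to screen them for shared k-mers against the host transcript sequences. The Python sketch below does this by exact k-mer lookup in both orientations; the k-mer length of 12 is a hypothetical choice, and exact matching is only a simplification of true hybridization behavior.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings, with U normalized to T."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tail_is_unique(tail: str, transcripts: list[str], k: int = 12) -> bool:
    """True if no k-mer of the tail, in either orientation, occurs in any
    transcript. A DNA tail could pair with RNA containing its reverse
    complement; checking both orientations is a conservative choice."""
    tail_kmers = kmers(tail, k) | kmers(reverse_complement(tail), k)
    return all(tail_kmers.isdisjoint(kmers(t, k)) for t in transcripts)


transcripts = ["ACGUUGCAACGUUGCAAGGCUU"]  # toy RNA sequence
print(tail_is_unique("GATCGATCGATCGATC", transcripts))  # True
print(tail_is_unique("TTGCAACGTTGCAAGG", transcripts))  # False
```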
It is preferred that the isolation is done by capturing the hybrids with an antibody that is specific for a DNA/RNA hybrid, said antibody being bound to a solid phase such as a magnetic particle.
Even better results are achieved if the RNA is enzymatically digested prior to the amplification step (iv).
The length of the first nucleic acid molecule with a 3′-tail wherein said tail does not hybridize to an RNA in the composition and of the second nucleic acid molecule with a 5′-tail wherein said tail does not hybridize to an RNA in the composition is preferably between 5 and 100 nucleotides, more preferably between 10 and 50 nucleotides and most preferably between 15 and 30 nucleotides.
The tail of these molecules, wherein said tail does not hybridize to an RNA in the composition, is preferably between 5 and 100 nucleotides, more preferably between 10 and 50 nucleotides and most preferably between 15 and 30 nucleotides.
In one optional embodiment, the first nucleic acid molecule and/or the second nucleic acid molecule comprise a further barcode sequence, which is chosen so that it does not bind the target RNA (see also
The further barcode sequence is preferably 5 to 6 nucleotides in length; it may be between 3 and 20 or between 5 and 8 nucleotides in length. Other lengths may be envisioned.
These molecular barcodes are generated by introducing random nucleotides between the universal tail and the target-specific sequence of the probe. They allow differentiation between fragments derived from a target RNA molecule and copies generated during PCR amplification, which causes a sequence bias (
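The deduplication principle can be sketched as follows: reads carrying the same target and the same random barcode are counted once, as copies of a single captured molecule. This Python sketch assumes reads have already been assigned a target identifier and a barcode string (a hypothetical input format) and ignores sequencing errors within barcodes, which practical tools handle by clustering similar barcodes. With a 6-nucleotide barcode, 4^6 = 4096 distinct barcodes are possible per target.

```python
from collections import defaultdict


def count_molecules(reads):
    """reads: iterable of (target_id, barcode) tuples, one per sequencing read.

    Returns the number of distinct barcodes per target, i.e. the estimated
    number of captured molecules after collapsing PCR copies.
    """
    barcodes = defaultdict(set)
    for target, barcode in reads:
        barcodes[target].add(barcode)
    return {target: len(codes) for target, codes in barcodes.items()}


def barcode_diversity(length: int) -> int:
    """Number of distinct random barcodes of a given length (4 bases)."""
    return 4 ** length


reads = [("ACTB", "AACGTG"), ("ACTB", "AACGTG"),  # two PCR copies of one molecule
         ("ACTB", "TTGCAA"), ("GAPDH", "CCCAAA")]
print(count_molecules(reads))  # {'ACTB': 2, 'GAPDH': 1}
print(barcode_diversity(6))    # 4096
```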
The first nucleic acid molecule with a 3′-tail wherein said tail does not hybridize to an RNA in the composition and the second nucleic acid molecule with a 5′-tail wherein said tail does not hybridize to an RNA in the composition are specific for a nucleic acid sequence selected from the group consisting of a human nucleic acid sequence, a viral sequence, a bacterial sequence, an animal sequence and a plant sequence.
Probes and primers of the present invention are designed to have at least a portion that is complementary to the polyadenylated mRNA target sequence or to an RNA from another species, such that hybridization may occur.
Preferably, the sequence for which they are specific is an mRNA; it may be an exon-exon junction and/or a 5′ or 3′ UTR region.
Preferably, the next generation sequencing method applied is selected from the group of,
Illumina single-end reads of up to 150 bases or paired-end reads of up to 300 bases (2×150 bases) are preferred.
Preferably, 25 to 500 bases are read per read in the next-generation sequencing method, more preferably between 25 and 200 nucleotides and most preferably between 25 and 150 nucleotides. As an alternative to single reads, paired-end reads may be used.
The method depends to a certain extent on the concentration of the first nucleic acid molecule with a 3′-tail wherein said tail does not hybridize to an RNA in the composition and of the second nucleic acid molecule with a 5′-tail wherein said tail does not hybridize to an RNA in the composition, which ideally is between 1 fM and 1000 nM.
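Converting a probe amount into a molar concentration for comparison with the stated 1 fM to 1000 nM range is simple arithmetic. The Python sketch below uses a hypothetical 5 pmol of a single probe in a hypothetical 50 µL reaction volume, which gives 100 nM.

```python
def molar_concentration(amount_mol: float, volume_l: float) -> float:
    """Concentration in mol/L from an amount in mol and a volume in L."""
    return amount_mol / volume_l


def within_stated_range(conc_molar: float) -> bool:
    """Check against the 1 fM to 1000 nM range given in the description."""
    return 1e-15 <= conc_molar <= 1000e-9


# Hypothetical: 5 pmol of one probe in a 50 uL hybridization volume
c = molar_concentration(5e-12, 50e-6)
print(f"{c * 1e9:.0f} nM", within_stated_range(c))  # 100 nM True
```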
The invention also relates to a kit comprising a first nucleic acid molecule with a 3′-tail wherein said tail does not hybridize to an RNA in the composition, a second nucleic acid molecule with a 5′-tail wherein said tail does not hybridize to an RNA in the composition, wherein said first and said second nucleic acid molecules, when and if hybridized to their target RNA, lie on one single-stranded RNA molecule separated from each other by between 2 and 1000 nucleotides, and an antibody which is specific for an RNA/DNA duplex hybrid molecule.
The invention is best illustrated by Example 3 and
A standard RNA sample preparation workflow for whole transcriptome library generation and sequencing is shown. B. Simplified workflow for example 1 of the invention. Capture DNA probes consist of nucleotide sequences with homology to the targeted mRNAs (light blue, dark blue and brown lanes) and 5′ (red) as well as 3′ (green) universal adapter tails without homology to the transcriptome. After hybridization, RNA/DNA hybrids are captured with paramagnetic bead-coupled antibodies (blue Y). Finally, the captured oligonucleotide libraries are amplified with universal primers to introduce sequence motifs for cluster generation, sequencing primer annealing and, optionally, indexing, which allows sample multiplexing.
The experimental workflow for example 1, experiment 1, is shown. Varying probe amounts spanning 2 orders of magnitude were hybridized with a constant amount of total RNA. The results of the capture experiment were analyzed in 2 independent experiments by quantification of both captured probes and mRNA (target and off-target).
Amplification plots (left) and melting curves (right) of captured hybridization probes from 2 different experiments are shown (upper row: 1st experiment; lower row: repeated 2nd experiment). Similar amplification curves and ct values indicate similar amounts of captured oligo probes. The melting curves are similar to those of the oligo mixture before hybridization, indicating that the amplification plots obtained after SybrGreen qPCR are specific for the probes used.
SybrGreen amplification plots of 2 independent experiments (upper row: 1st experiment; lower row: 2nd experiment) are shown. Left: amplification plots of the ACT cDNA (target region). Right: amplification plots of the RPL13a cDNA (off-target region). Control reactions indicated by K were performed after cDNA synthesis using 200 ng total RNA without hybridization. Template amounts for the control reactions are not comparable with the amounts obtained after hybridization; therefore, their ct values are not comparable either. Amplification plots and ct values of the cDNAs derived from captured ACT mRNA are very similar, independent of the amount of probes used for hybridization. Amplification plots and ct values of the cDNA derived from the off-target region (RPL13a mRNA) are identical to those of the negative control (hybridization without probes), indicating successful enrichment of the targeted RNAs. The amplification curves of the negative control may be explained by the nature of SybrGreen PCR and/or some degree of unspecific capture by the beads in the absence of probes.
A column chart of the qPCR data after hybridization and reverse transcriptase (RT) reaction for target and off-target mRNAs is shown.
An experimental workflow for example 1—experiment 2 is shown. Varying amounts of total RNA in a range between 50 ng and 1000 ng were hybridized with 0.7 nmol capture probe mixture (5.5 pmol each). Captured probes were analyzed exactly as in experiment 1 by SybrGreen qPCR or reverse transcriptase reaction and SybrGreen qPCR, respectively.
SybrGreen qPCR results of 2 independent experiments for the quantification of captured probe oligos are shown. qPCR was performed using primers homologous to the tail sequences of the probes. In addition to the determination of ct values, dissociation curves were generated to show the specificity of the PCR products (for details see protocol in the appendix RSE0205). Unfortunately, PCR reactions without template partially yielded amplification products. However, the ct values obtained for these controls were significantly higher than those obtained for samples with template, and we therefore do not expect a significant influence on the results.
SybrGreen qPCR results after reverse transcription of captured RNA using random 9mer primers are shown. With increasing RNA amounts, the yields of captured RNA increased, as indicated by decreasing ct values. Whereas the total RNA amounts used for hybridization and the captured targeted RNAs (detection of mRNA from the ACT gene) show a strong correlation, the yield of the off-target mRNA from the RPL13a gene remains nearly constant. Doubling the amount of total RNA used for hybridization results in a delta-ct value of approximately −1 for targeted RNAs. Ct values obtained for samples without hybridization cannot be compared with data obtained from capture experiments because different RNA amounts were used.
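The observed delta-ct of about −1 follows from the exponential nature of PCR: if each cycle multiplies the template by (1 + E), where E is the amplification efficiency, a fold change F in starting template shifts the ct value by ΔCt = −log(F)/log(1 + E), which is −log2(F) at 100% efficiency. The Python sketch below evaluates this expectation; treating the efficiency as constant across samples is an idealizing assumption.

```python
import math


def expected_delta_ct(fold_change: float, efficiency: float = 1.0) -> float:
    """Expected ct shift for a fold change in starting template.

    efficiency = 1.0 means the template doubles every cycle (idealized).
    """
    return -math.log(fold_change) / math.log(1.0 + efficiency)


print(expected_delta_ct(2.0))                 # -1.0  (doubling the RNA input)
print(round(expected_delta_ct(20.0), 2))      # -4.32 (20-fold input range)
print(round(expected_delta_ct(2.0, 0.9), 2))  # -1.08 at 90% efficiency
```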
A general workflow for example 2 of the invention is shown: hybridization- and ligation-mediated gene expression profiling. Long blue, black, brown and pink lanes indicate targeted mRNAs. Short lanes and arrows in the same colors indicate oligo probes that are reverse complementary to their targets. Phosphorylation of the 5′ end of oligoA is not shown. Short red and green lanes and arrows indicate the universal 5′ and 3′ tails of the probes, respectively. Primers for enrichment PCR are shown in red and yellow or in green and light blue, respectively, to indicate homology to the probe tails as well as the sequencing-specific ends.
An experimental design for ligation of oligonucleotides after hybridization on RNA templates, including the expected results, is shown. Identical colors of arrows and lanes indicate primers and probes with homologous sequences. Phosphorylation of the 5′ end of probes DDX56A and DDX67A is not shown.
Ct values of hybridized and ligated oligo probe DDX67A+B on RNA templates DDX56 and DDX67 after SybrGreen PCR in different buffer systems are shown.
A schematic presentation of the workflow for example 3 of the invention is shown.
A workflow for example 3—experiment 4 of the invention is shown.
Agilent 2100 analysis of the generated probes after PCR enrichment (endpoint PCR) is shown. Fragments indicated with green-tagged PCR primer pairs were expected to be detected, and fragments indicated with red-tagged primer pairs were expected to be absent. Only the expected fragments were detected, at the correct size.
Schematic presentation of the workflow of the invention (Example 3). I. Total RNA. II. Hybridization of total RNA with a mixture of target-specific DNA probes. Tailed probes A and B match their target at a distance from each other. III. Closing of the gap between oligonucleotides A and B by RT polymerase reaction and ligation in the presence of ATP. Phosphorylation of the 5′ end of probe B is not shown. IV. Enrichment of newly synthesized cDNA molecules by antibody-based purification of the DNA/RNA hybrids. V. Release of newly synthesized DNA by denaturation and RNase treatment. VI. Enrichment of targeted cDNAs by PCR before sequencing.
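Because the gap between probes A and B is filled by polymerase copying the RNA template, the stretch between the two probe sequences in a read reflects the RNA itself. A Python sketch of extracting that stretch and listing substitutions against a reference follows. It assumes the read, both probe sequences (as they appear in the read) and the reference gap are given in the same orientation, and it does not handle insertions or deletions; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_gap_fill(read: str, probe_b: str, probe_a: str) -> Optional[str]:
    """Return the sequence between the two probe sequences in a read,
    or None if either probe sequence is not found."""
    i = read.find(probe_b)
    if i < 0:
        return None
    start = i + len(probe_b)
    j = read.find(probe_a, start)
    if j < 0:
        return None
    return read[start:j]


def call_substitutions(gap_fill: str,
                       reference_gap: str) -> Optional[List[Tuple[int, str, str]]]:
    """List (position, reference base, observed base) for each substitution.

    Returns None on a length difference (indel or alternative splice
    junction), which this sketch does not resolve.
    """
    if len(gap_fill) != len(reference_gap):
        return None
    return [(pos, ref, alt)
            for pos, (ref, alt) in enumerate(zip(reference_gap, gap_fill))
            if ref != alt]


read = "AAAA" + "GGTTCC" + "ACGTAC" + "TTGGAA" + "CCCC"  # toy read
gap = extract_gap_fill(read, probe_b="GGTTCC", probe_a="TTGGAA")
print(gap, call_substitutions(gap, reference_gap="ACGAAC"))  # ACGTAC [(3, 'A', 'T')]
```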
mRNA Profiling by Hybrid Capture Technology:
Specific DNA oligonucleotides containing universal adapters at the 5′ and 3′ ends were hybridized to targeted mRNAs and subsequently captured with antibodies that bind DNA/RNA hybrids. After magnetic separation of the DNA/RNA hybrid molecules, the purified probe libraries were enriched by PCR prior to sequencing (
DNA probes for hybridization with the mRNAs of interest were specifically designed to have comparable thermodynamic properties. Hybridization of the RNA with an excess of oligonucleotides, followed by purification of the DNA/RNA hybrids, allows quantification of the targeted mRNAs by determining the number of DNA probes via sequencing.
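Quantification by counting can be sketched as follows: each read is assigned to a gene by the target-specific probe sequence it contains, and counts are normalized to counts per million. This Python sketch uses exact substring matching and a hypothetical probe-to-gene table; a practical pipeline would use an aligner and would typically combine counting with barcode-based deduplication.

```python
from collections import Counter


def count_probe_reads(reads, probes):
    """Assign each read to a gene via the first probe sequence it contains.

    probes: dict probe_id -> (gene, target-specific sequence); hypothetical.
    """
    counts = Counter()
    for read in reads:
        for gene, seq in probes.values():
            if seq in read:
                counts[gene] += 1
                break  # count each read at most once
    return counts


def counts_per_million(counts):
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


probes = {"p1": ("GAPDH", "ACGTACGT"), "p2": ("ACTB", "TTGGCCAA")}
reads = ["NNACGTACGTNN", "TTGGCCAANN", "ACGTACGT", "GGGGGGGG"]  # toy reads
counts = count_probe_reads(reads, probes)
print(dict(counts))  # {'GAPDH': 2, 'ACTB': 1}
```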
By placing probes on exon-exon junctions and adjusting suitable hybridization conditions, the selectivity for distinct mRNAs can be increased. Furthermore, this allows expression profiling of different splice variants of an mRNA.
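For splice variant profiling, a probe spanning an exon-exon junction pairs fully with the spliced mRNA only when the two exons are joined. The Python sketch below builds such a probe from the last and first `arm` nucleotides of two exons written in the sense (mRNA) orientation with T in place of U; the arm length and the example exon sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def junction_probe(upstream_exon: str, downstream_exon: str, arm: int = 12) -> str:
    """DNA probe complementary to the spliced mRNA across an exon-exon junction.

    Takes `arm` nucleotides from each side of the junction (hypothetical length).
    """
    if arm > min(len(upstream_exon), len(downstream_exon)):
        raise ValueError("arm longer than an exon")
    junction = upstream_exon[-arm:] + downstream_exon[:arm]
    return reverse_complement(junction)


# Toy exons: variant 1 joins exon1-exon2, variant 2 joins exon1-exon3
exon1, exon2, exon3 = "AAAACCCCGGGG", "TTTTAAAACCCC", "CCCCATATATAT"
p1 = junction_probe(exon1, exon2, arm=4)
print(p1, reverse_complement(p1) in exon1 + exon2, reverse_complement(p1) in exon1 + exon3)
# AAAACCCC True False
```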
Experiment 1: Hybridization of Total RNA with Varying Amounts of Hybrid Capture Probes
Sample: Total RNA from human T-cell Leukemia (Jurkat)
Target: mRNAs of following genes: GAPDH, ACTB, CBL, CEBPA1, NRAS
Off-Target: mRNAs of gene RPL13a
A description of the target and off-target sequences, probes for hybridization and primers for SybrGreen qPCR may be found in the appendix.
Although probe amounts differing by 2 orders of magnitude (0.07 nmol, 0.7 nmol and 7 nmol) were hybridized with 500 ng total RNA, comparable probe amounts were captured in all cases (
Analysis of Captured mRNAs:
The similar ct values obtained for captured target RNAs after hybridization with different amounts of probes indicate successful enrichment independent of the probe excess (
Experiment 2: Hybridization of varying amounts of total RNA with an excess of hybrid capture probes was performed as follows:
Sample: Total RNA from human T-cell Leukemia (Jurkat)
Target: mRNAs of following genes: GAPDH, ACTB, CBL, CEBPA1, NRAS
Off-Target: mRNAs of gene RPL13a
Analysis of Hybridized Probes by SybrGreen qPCR:
We expected that, after hybridization of varying amounts of RNA with an excess of probe oligos, the yields of both captured RNA and captured probes would correlate with the amount of starting RNA. The yield of captured hybridization products increased with increasing RNA amounts, as indicated by decreasing ct values after SybrGreen qPCR. In
mRNA Profiling by Ligation of Oligonucleotide Probes on RNA Templates:
A probe consists of two tailed oligonucleotides. OligoB contains a universal 5′ tail and a target-specific 3′ sequence. OligoA consists of a target-specific 5′ end and a universal 3′ tail. The two tails differ in their base composition. In addition, the 5′ end of oligoA is phosphorylated. Both oligos hybridize directly adjacent to each other, without a gap, on their target RNA molecule, allowing ligation of the 3′ end of oligoB to the phosphorylated 5′ end of oligoA. After hybridization and ligation, the fused oligo probes can be amplified by standard PCR using sequencer platform-specific enrichment primers (
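The probe geometry described above can be sketched in Python. Given the target written 5′→3′ and the position where the two probes meet, oligoB carries the complement of the downstream target segment and oligoA that of the upstream segment, so that the 3′ end of oligoB and the 5′ end of oligoA abut. Tail sequences, arm lengths and the function name are hypothetical, and the 5′ phosphate of oligoA is a synthesis modification that a plain string does not represent.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_ligation_pair(target: str, junction: int, len_a: int, len_b: int,
                         tail_b: str, tail_a: str):
    """Return (oligoB, oligoA, ligated product), all written 5'->3'.

    target is the RNA 5'->3'. OligoA pairs with target[junction-len_a:junction]
    and oligoB with target[junction:junction+len_b], so the 3' end of oligoB
    and the 5' end of oligoA meet at a nick without a gap.
    """
    if junction - len_a < 0 or junction + len_b > len(target):
        raise ValueError("probe arms extend beyond the target")
    oligo_b = tail_b + reverse_complement(target[junction:junction + len_b])
    oligo_a = reverse_complement(target[junction - len_a:junction]) + tail_a
    return oligo_b, oligo_a, oligo_b + oligo_a  # ligation joins B (3') to A (5')


b, a, fused = design_ligation_pair("AUGGCUACGAUCCGUA", junction=8, len_a=4,
                                   len_b=4, tail_b="GGGG", tail_a="TTTT")
print(b, a, fused)  # GGGGGATC GTAGTTTT GGGGGATCGTAGTTTT
```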
To evaluate whether this idea is feasible, a model experiment was designed as follows:
Two PCR fragments with T7 RNA polymerase promoter sequence at one end (DDX56 and DDX67) were generated with tailed primers (LRT7_DDX06.p1—01+LR_DDX5.q1—01 and LR_DDX07.p1—01+LRT7_DDX06.q1—01, respectively) using human gDNA as template and subsequently transcribed in vitro using T7 RNA polymerase (see genomic DNA). Purified RNAs derived from both PCR fragments were used as template for hybridization and ligation experiments.
Tailed DNA probes, each consisting of 2 separate oligonucleotides, were designed for their mRNA targets DDX56 and DDX67 as indicated in
mRNA Profiling by Hybridization, Reverse Transcriptase Reaction and Subsequent Ligation of Tailed Oligonucleotide Probes on RNA Templates:
Probes are designed as in example 2, but oligoA and oligoB hybridize to their RNA target at a defined distance from each other. Therefore, after hybridization a polymerase step is required to close the gap between the two probe oligos prior to ligation (
This additional DNA synthesis step offers some advantages in comparison to the previous examples:
According to
For RNA DDX56, probes DDX56C+D and DDX56E+F were synthesized with identical sequence homology to RNA DDX56 but different tails.
For RNA DDX67, 3 probes were designed. Probes DDX67G+H and DDX67J+K differ only in the tail sequence, whereas probe DDX67E+F is located at a different position on RNA DDX67. Based on these probes and templates, 3 different hybridization experiments A, B and C were set up (
Experiment 5: Model Experiment to Evaluate the Correlation between Targeted RNA Template Amount and Fused Oligo Probes
A mixture of probes DDX56E+F and DDX67J+K was hybridized to different amounts of RNA DDX56 and DDX67 (
Molecular barcodes, generated by introducing random nucleotides between the universal tail and the target-specific sequences of the probes, allow differentiation between fragments derived from a target RNA molecule and copies generated during PCR amplification, which causes a sequence bias (
Number | Date | Country | Kind |
---|---|---|---|
11191749.8 | Dec 2011 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2012/074062 | 11/30/2012 | WO | 00 | 5/30/2014 |