Genomes contain the essential genetic information for the development and homeostasis of any living organism. Understanding biological phenomena requires knowledge of how genetic information is utilized in a cell or tissue at a given time point. Many cases are known where mistakes in the utilization of genetic information and related regulatory pathways, or within the expressed genetic information itself, cause diseases in humans, plants and animals. RNA expression can be very different in individual cells in a given tissue or in an entire organism. It is therefore desirable to develop a novel method that enables the preparation and capture of RNA and DNA molecules from a limited number of cells, so that even individual cells within a tissue can be analyzed for their RNA expression, promoter usage and expressed genomic information. New directions in the field of life science are addressing such needs. Novel methodologies are being developed for the capture and analysis of individual DNA and/or RNA molecules, and for the understanding of entire biological systems as, for example, in gene network studies.
Different methods are used for expression profiling and annotation of transcripts. Briefly, large-scale expression studies nowadays use approaches based either on in situ hybridization using, e.g., microarrays, or on high-throughput sequencing of short tags, e.g. SAGE, CAGE, MPSS. Such studies may further be combined with classical approaches like RT-PCR or Northern Blotting to address expression levels of individual genes.
High-throughput expression profiling is commonly done by so-called DNA microarrays (Jordan B., DNA Microarrays: Gene Expression Applications, Springer-Verlag, Berlin Heidelberg New York, 2001; Schena A, DNA Microarrays, A Practical Approach, Oxford University Press, Oxford 1999, both of which are hereby incorporated herein by reference). For such experiments, specific probes representing certain individual genes or transcripts are placed on a support and incubated under hybridizing conditions with a variety of DNA molecules. Positive signals are obtained if a probe on the support reacts with a molecule present in the sample. Such experiments allow the parallel analysis of a large number of genes or transcripts. However, this approach is limited by the fact that only genes or transcripts that have previously been identified by other means can be studied. Such means include cDNA libraries, partial sequence tags and/or results obtained from computer predictions. In the future, the concept of tiling arrays may also allow for an unbiased expression profiling in organisms for which genomic sequence information is available (Kapranov P. et al., Science 296, 916-919 (2002), hereby incorporated herein by reference), although for tiling arrays interpretation is difficult with respect to the nature of the transcripts detected in the experiment.
Due to the limitations of DNA microarray experiments, alternative approaches are in use for gene discovery and expression profiling based on partial sequences or tags obtained from a plurality of RNA samples. The so-called SAGE (Serial Analysis of Gene Expression) method is known as an efficient method for obtaining partial information on the base sequence of an RNA molecule (Velculescu V. E. et al., Science 270, 484-487 (1995), hereby incorporated herein by reference). To achieve high throughput in tag sequencing, DNA concatemers are formed by ligating multiple short DNA fragments (initially about 10 bp in length) containing information on the base sequences at the 3′-end of multiple RNA molecules. A one-pass sequencing read of such a concatemer can determine the base sequences of many tags, i.e., different RNA molecules, within a DNA concatemer. Recently an improved version of SAGE, the so-called LongSAGE, has been published so as to allow for the cloning of longer SAGE tags (Saha S. et al., Nat. Biotechnol. 20, 508-12 (2002); and US patent applications 20030008290 and 20030049653, all of which are hereby incorporated herein by reference). The concept has been further expanded by the so-called “SuperSAGE” method providing sequencing tags of some 25 bp in length (Matsumura, H. et al., Cell. Microbiol. 7, 11-18 (2005), hereby incorporated herein by reference). The SAGE method is currently in wide use as an important method for analyzing genes expressed in specific cells, tissues or organisms, and SAGE tags are available for reference in the public domain, e.g., at http://cgap.nci.nih.gov/SAGE. More information about recent developments in the SAGE field can be found in Wang, S. M.: SAGE: Current Technologies and Applications, Horizon Bioscience, Norwich, 2005, hereby incorporated herein by reference.
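By way of illustration only, the readout of many tags from a single one-pass read of a concatemer may be sketched as follows. The sketch assumes that adjacent tags are separated by a fixed spacer sequence; the spacer "CATG", the length limits, the example read and the function name split_concatemer are assumptions made for this example and are not taken from the cited references.

```python
from collections import Counter


def split_concatemer(read, spacer="CATG", min_len=9, max_len=27):
    """Split one concatemer read into tags at each occurrence of the spacer.

    The spacer and the length window are hypothetical parameters; pieces
    outside the window (e.g. empty flanks) are discarded.
    """
    pieces = read.upper().split(spacer)
    return [p for p in pieces if min_len <= len(p) <= max_len]


# Hypothetical concatemer made of three 10-bp tags, two of them identical.
read = "CATG" + "AAACCCGGGT" + "CATG" + "TTTGGGCCCA" + "CATG" + "AAACCCGGGT" + "CATG"

tags = split_concatemer(read)
counts = Counter(tags)  # tag frequency as a crude measure of expression
```

In this sketch the number of times a tag occurs serves as a simple proxy for the abundance of the RNA molecule from which it was derived.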
U.S. Pat. Nos. 6,352,828, 6,306,597, 6,280,935, 6,265,163, and 5,695,934, all of which are hereby incorporated herein by reference, disclose different approaches for the high-throughput sequencing of short sequence tags, also denoted as Massively Parallel Signature Sequencing or “MPSS”. As described in more detail in Brenner S., et al., Nat. Biotechnol. 18, 630-634 (2000), and Brenner S., et al., Proc. Natl. Acad. Sci. USA 97, 1655-1670 (2000), both of which are hereby incorporated herein by reference, short sequences from the 3′-end of transcripts are obtained in a highly parallel manner by performing cycles with different enzymatic reactions on a single layer of beads. In WO03/091416, hereby incorporated herein by reference, modifications to the aforementioned approach are disclosed to also make the sequencing of short sequences possible from the 5′-end of transcripts. However, in its initial form MPSS was quite limited by the short read length of signature tags.
The shift from 3′-end related information to 5′-end related information, although technically more demanding, is a mandatory move to link expressed information to a regulatory principle which causes the transcriptional event. Common regulatory elements in the control of gene expression are located in the proximity of the 5′-end of a transcript in the so-called promoter regions of a given gene. Due to alternative promoter usage and rearrangements within primary transcripts due to RNA processing and splicing, for most transcripts in higher organisms, promoter regions cannot be identified from information derived from the 3′-end. Hence new approaches have been developed to specifically obtain sequence tags from 5′-ends of transcripts. Such an approach has been disclosed in PCT/JP03/07514, Shiraki T. et al., Proc. Natl. Acad. Sci. USA 100, 15776-15781 (2003), Kodzius R. et al. Nature Methods 3, 211-222 (2006), and US Patent Application 20050250100, all of which are hereby incorporated herein by reference. This so-called CAGE (Cap-Analysis-Gene-Expression) approach allows for the cloning of 5′-end-specific tags into concatemers in a way similar to the SAGE technology. The so-called CAGE tags not only enable the detection of transcripts and their expression profiling, but further provide information on transcriptional start sites to allow for mechanistic studies on the regulation of transcription or the higher annotation of transcripts. Similar approaches for the cloning of concatemers comprising 5′-end specific sequence information have lately also been published by a number of other laboratories, such as in Hwang B. J. et al., Proc. Natl. Acad. Sci. USA 101, 1650-1655 (2004); Hashimoto S. et al., Nat. Biotechnol. 22, 1146-1149 (2004); Zhang Z. and Dietrich F. S., Nuc. Acids Res. 33, 2838-2851 (2005); and Wei C. L. et al. Proc. Natl. Acad. Sci. USA 101, 11701-11706 (2004), all of which are hereby incorporated herein by reference.
All those approaches differ in the technical means by which the capture of true 5′-ends is achieved, e.g., by applying the so-called Cap-Trapper or Oligo-Capping methods further outlined below. Further information on the value of 5′-end related tags can be found in Harbers M. and Carninci P., Nature Methods 2, 495-502 (2005), hereby incorporated herein by reference.
The aforementioned approaches are still limited with respect to the throughput of tag sequencing. In addition, they require many manipulation steps that can cause mistakes in the sequence information obtained from the concatemers. In particular, amplification steps can cause artifacts as well as a bias in the tag frequencies due to distinct amplification rates for individual DNA fragments. To overcome these limitations, future directions must aim at the direct capture of DNA and/or RNA molecules for direct analysis, so as to omit unnecessarily complicated manipulations and cloning steps, and at a much higher throughput in data acquisition. Recent developments in the field will open up such new avenues to obtain sequence information at a much higher throughput than presently possible by the classical approaches.
The ability to read and decode the genetic code has been one of the greatest breakthroughs in life sciences. Sequencing technologies have become the key to obtaining genetic information. A person skilled in the art knows different approaches for obtaining sequence information including, but not limited to, those described by Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference, like the classic Sanger and Maxam-Gilbert sequencing methods. In recent years, new approaches for high throughput sequencing have been developed. They make use of electrophoresis, sequencing by hybridization, sequencing by synthesis, MPSS sequencing, and non-enzymatic single molecule sequencing. For example, sequencing by hybridization makes use of high-density microarray platforms presenting a set of oligonucleotides of defined sequences, whose hybridization patterns allow for de novo sequencing of unknown sequences. Perlegen, for example, has used such approaches for the analysis of point mutations within the human genome. Alternatively, sequencing by synthesis can be performed by having a polymerase incorporate nucleotides during the extension of a DNA molecule along a DNA template. When performed on a surface, the incorporation of an individual nucleotide at a defined location can be monitored, and the subsequent order of incorporating nucleotides at a defined location determines the sequence of the nucleic acid molecule at the location. Since a very large number of extension reactions can be performed within one reactor, e.g., on the surface of a glass slide or on a bead array, the approach enables highly parallel sequencing of over 100,000 to 1,000,000 samples per reaction depending on the equipment used.
Of particular interest for applying the present invention are novel approaches for the detection and sequencing of single DNA molecules as recently reviewed in Metzker M. L., Genome Res. 15, 1767-1776 (2005); Kling J., Nature Biotechnology 23, 1333-1335 (2005); and Shendure J. et al., Nature Reviews Genetics 5, 335-344 (2004), all of which are hereby incorporated herein by reference. Some of those approaches are subject to commercial applications as offered by companies like 454 Life Sciences at http://www.454.com/, Helicos BioScience at http://www.helicosbio.com/, Solexa at http://www.solexa.com/Company/overview.htm, Visigen at http://www.visigenbio.com/index.html, or GeneoVoxx GmbH at http://www.genovoxx.de/ (the information found at any of the web pages is hereby incorporated herein by reference). These new approaches should make presently unmatched sequencing rates possible. Depending on the methodology used, some devices should sequence very short tags of about 20 to 25 bp at an ultra high throughput of 1,000,000 or more reads per run (e.g. Helicos, and Solexa), and the device from 454 Life Sciences can realize much longer sequencing reads of over 100 bp at a rate of some 200,000 to 300,000 reads per run.
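By way of illustration only, the principle that the order of nucleotide incorporation at a defined location determines the sequence at that location may be sketched as follows. The data layout (one dictionary per extension cycle, keyed by a surface coordinate) and the function names are assumptions made for this example and do not describe any particular instrument.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_locations(cycles):
    """Assemble one read per location from per-cycle incorporation events.

    `cycles` is a list with one dict per extension cycle, mapping a
    location (hypothetically an (x, y) coordinate) to the base that was
    incorporated there in that cycle. Locations without an event in a
    cycle are simply absent from that cycle's dict.
    """
    reads = {}
    for cycle in cycles:
        for location, base in cycle.items():
            reads[location] = reads.get(location, "") + base
    return reads


def template_from_read(read):
    """Return the template strand, i.e. the reverse complement of the read."""
    return "".join(COMPLEMENT[base] for base in reversed(read))


# Three hypothetical cycles monitored at two locations on a surface.
cycles = [
    {(0, 0): "A", (0, 1): "G"},
    {(0, 0): "C", (0, 1): "G"},
    {(0, 0): "T"},
]
reads = read_locations(cycles)
```

Because every location is followed independently, the same loop applies whether the surface carries a few or a very large number of extension reactions.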
Isolation of Full-Length cDNA Molecules
The analysis of RNAs from a biological sample greatly benefits from the isolation of full-length RNA molecules and the preparation of cDNAs derived from such full-length RNAs. In particular as outlined above, sequence information from 5′-ends of RNAs enables a wider interpretation of data. Moreover, full-length cDNA technologies are mandatory for the analysis of biological processes related to RNA processing and splicing. Recent data obtained by different experimental approaches show that the complexity of higher organisms is achieved by an alternative use of genomic information. Additional mechanisms for the combinatorial rearrangement and processing of the transcribed information are important for diversification and expansion of the genetic pool (Zavolan M., et al., Genome Res. 13 (2003) 1290-1300, hereby incorporated herein by reference).
Different approaches for the preparation of full-length cDNAs have been described in the literature as summarized by Das, M., et al., Physiol Genomics 6, 57-80 (2001), hereby incorporated herein by reference. Out of those, the Cap-Trapper and Oligo-Capping methods are most frequently used besides other approaches known to a person skilled in the art in the field. These approaches have been instrumental in understanding genome and transcript structures and decoding protein sequences. Moreover, only full-length cDNAs give access to proteins encoded therein, which can be expressed from functional vectors when needed for experimental studies or industrial applications.
To apply the Cap-Trapper method (Carninci P. and Hayashizaki Y., Methods Enzymol. 303, 19-44 (1999); and U.S. Pat. Nos. 5,962,272 and 6,022,715, all of which are hereby incorporated herein by reference), the diol group of the Cap structure is chemically biotinylated to capture mRNA/cDNA hybrids on Streptavidin-coated beads. Remaining single-stranded RNAs including rRNAs and tRNAs, or RNA portions within partly double-stranded DNA-RNA hybrids are destroyed by RNase I digestion, whereas RNA moieties within mRNA/cDNA hybrids are protected against degradation. Full-length enriched cDNAs are then released from the beads by RNA hydrolysis.
The alternative Oligo-Capping method starts from the modification of 5′-ends of mRNAs (Maruyama K. and Sugano S., Gene 138, 171-174 (1994); and Suzuki Y. and Sugano S., Methods Mol. Biol. 221, 73-91 (2003), both of which are hereby incorporated herein by reference). In the first step, un-capped and truncated mRNAs are dephosphorylated at their 5′-ends by a phosphatase, followed by decapping of capped mRNAs by treatment with tobacco acid pyrophosphatase (TAP). This treatment leaves only full-length mRNAs phosphorylated at their 5′-ends. Therefore, RNA ligase can only attach an oligonucleotide to the 5′-ends of phosphorylated full-length mRNAs. The oligonucleotide attached at the 5′-end of mRNA can be used in later manipulations of the cDNAs derived from such modified mRNAs, e.g., for 2nd strand cDNA synthesis.
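By way of illustration only, the selection logic of the Oligo-Capping steps described above may be sketched as a simple model in which each RNA carries one of three 5′-end states. The class Rna, the state names, the oligonucleotide sequence and the function names are assumptions made for this example; the sketch models only the order of the enzymatic steps, not their efficiency.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rna:
    name: str
    five_prime: str  # "cap", "phosphate", "hydroxyl" or "oligo"
    sequence: str


def phosphatase(rna):
    """Remove the 5'-phosphate of uncapped or truncated RNA; caps are untouched."""
    if rna.five_prime == "phosphate":
        return replace(rna, five_prime="hydroxyl")
    return rna


def tap(rna):
    """Decap capped RNA, leaving a 5'-phosphate behind."""
    if rna.five_prime == "cap":
        return replace(rna, five_prime="phosphate")
    return rna


def rna_ligase(rna, oligo):
    """Attach the oligonucleotide only to RNA carrying a 5'-phosphate."""
    if rna.five_prime == "phosphate":
        return replace(rna, five_prime="oligo", sequence=oligo + rna.sequence)
    return rna


def oligo_cap(rnas, oligo="ACGUACGU"):  # hypothetical oligonucleotide
    """Apply phosphatase, then TAP, then RNA ligase to every RNA."""
    return [rna_ligase(tap(phosphatase(r)), oligo) for r in rnas]


sample = [
    Rna("full_length", "cap", "GGCAUUCG"),
    Rna("truncated", "phosphate", "AUUCG"),
]
tagged = [r.name for r in oligo_cap(sample) if r.five_prime == "oligo"]
```

The order of the steps is what provides the selectivity: if TAP were applied before the phosphatase, the decapped full-length RNAs would lose their phosphate as well and no molecule would be ligated.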
Alternative approaches to full-length cDNA selection include the use of a Cap-binding protein (Edery, I., et al., Mol. Cell. Biol. 15, 3363-3371 (1995), hereby incorporated herein by reference) and an antibody against the Cap structure (Theissen, H., et al., Embo J 5, 3209-3217 (1986), hereby incorporated herein by reference), the attachment of an oligonucleotide to the Cap-structure (U.S. Pat. Nos. 5,962,272 and 6,022,715, both of which are hereby incorporated herein by reference), or the SMART™ method from Clontech (http://www.clontech.com/clontech/smart/index.shtml, the information provided therein is hereby incorporated herein by reference). However, the SMART™ approach as a Cap-Switching method (Zhu Y. et al., Biotechniques 30, 892-897 (2001), hereby incorporated herein by reference) adds the trinucleotide GGG to the 5′-end of mRNA, which makes it unfavorable for 5′-end tag cloning due to the reduced length of the informative part of the tag, particularly when sequencing approaches can only make very short sequencing reads.
The present invention relates to the modification of an RNA molecule or a plurality of RNA molecules to introduce sequence information at its/their 5′-end. The invention relates to the modification of RNA molecules, so that information added to the RNA molecules is used for their manipulation and/or analysis.
The present invention provides an innovative solution on how to obtain RNA and/or DNA molecules or fragments thereof for single molecule detection and sequencing. The present invention offers a new solution on how to capture individual molecules for detection and analysis as needed for new approaches to single molecule detection. The present invention modifies an RNA molecule and a DNA molecule derived therefrom in such a way that sequence information from a specified region of such modified RNA or DNA molecule can be obtained. Therefore, the invention provides a new high-throughput sequencing approach and its use in, for example, expression profiling, transcript characterization, genome annotation, cloning for further analysis, and other classical means. In particular, the invention provides a further method of high value to studies including, but not limited to, expression profiling based on 5′-end specific sequences, which is an essential component of commercial applications, reagents and services including, but not limited to, life science, drug development, diagnostics, or forensic studies.
In one embodiment, the present invention relates to the transcriptional conversion of a native or artificial RNA molecule or a plurality of RNA molecules into cDNAs. Hence, the invention relates to the synthesis and preparation of single-stranded DNA molecules. As such, the invention relates to a method for the isolation of fragments from nucleic acid molecules for the purpose of detection and analysis. Moreover, the invention relates to the conversion of an RNA sample containing one or more nucleic acid molecules of a single kind or plural kinds into DNA molecules.
In another embodiment, the invention relates to the manipulation of nucleic acid molecules so as to prepare nucleic acid molecules in the form of linear single-stranded DNAs. The invention relates to the preparation and manipulation of linear single-stranded DNAs which are transcripts derived from RNAs.
In a different embodiment, the invention provides a method for introducing functional groups at an end of an RNA or DNA molecule. Thus, the invention provides a method for capturing RNA and DNA molecules for analysis and manipulation by means of a functional group. In this embodiment, the invention relates to the isolation of a single nucleic acid molecule for the purpose of analysis and detection or sequencing.
Hence the invention provides a method for preparing a template for single molecule detection and high-throughput sequencing.
In another embodiment, the invention relates to the use of single-stranded DNA molecules for directly obtaining sequence information thereof. Hence the invention relates to obtaining sequence information from defined regions of single-stranded DNA fragments. In one particular embodiment, the 5′-end specific sequence information obtained from a DNA fragment prepared according to the invention relates to the 5′-end sequence of an RNA molecule. Thus, the invention relates to obtaining sequence information from an RNA molecule.
In another different embodiment, the invention provides a method for modifying the opposite ends of a DNA molecule derived from an RNA, where the modifications at the end corresponding to the 5′-end of the RNA and/or the end corresponding to the 3′-end of the RNA introduce a functional group or a group that otherwise has a function for the further manipulation, detection, and analysis of the DNA molecule. In one specific embodiment, a functional group is introduced at the end corresponding to the 3′-end of the RNA to capture the cDNA by binding it to a surface. In one particular embodiment, the 3′-end specific sequence information obtained from a DNA fragment prepared according to the invention relates to the end sequence of an RNA or DNA molecule. In another embodiment, both ends of the cDNA molecule are modified in such a way that the cDNA molecule can be amplified by means of the LAMP process or by means of the rolling circle amplification (RCA) process.
In another embodiment, the invention provides a method for introducing regions of a defined sequence, the so-called “Identifier Sequence,” at the 5′-end of an RNA. Such an Identifier Sequence identifies the origin of a molecule. Hence, with the introduction of an Identifier Sequence, it becomes possible to analyze a pooled sample which comprises samples of different origins. Such different origins may relate to different cell lines, organisms, or tissues used, or they may relate to different developmental stages or various time points within an experimental study. In one embodiment, the Identifier Sequence is part of the sequence obtained from a molecule prepared according to the invention. In one more embodiment, the Identifier Sequence is used to capture a molecule having such an Identifier Sequence to a specific location on a surface for the purpose of detection and analysis or sequencing. In just one more embodiment, the Identifier Sequence is used to prime the sequencing of a molecule among a plurality of molecules in a pooled sample. Similar to the foregoing, the invention provides a method for introducing Identifier Sequences not only at the 5′-end of an RNA, but also at the region of a cDNA equivalent to the 3′-end of the RNA. The use of an Identifier Sequence is not limited to the 5′-end of the RNA, but may be used at either end of a cDNA derived from the RNA depending on experimental requirements.
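By way of illustration only, the readout of an Identifier Sequence from a pooled sample may be sketched as follows, assuming that the Identifier Sequence forms the first bases of each sequence obtained. The identifier sequences, sample names, reads and the function name demultiplex are hypothetical and chosen only for this example.

```python
def demultiplex(reads, identifiers):
    """Assign each read to its sample of origin by its leading Identifier Sequence.

    `identifiers` maps an Identifier Sequence to a sample name. The
    identifier is removed from the read before it is stored; reads with
    no known identifier are kept unchanged under "unassigned".
    """
    pools = {sample: [] for sample in identifiers.values()}
    pools["unassigned"] = []
    for read in reads:
        for ident, sample in identifiers.items():
            if read.startswith(ident):
                pools[sample].append(read[len(ident):])
                break
        else:
            pools["unassigned"].append(read)
    return pools


# Hypothetical Identifier Sequences for two tissues pooled in one run.
identifiers = {"ACGT": "liver_day1", "TGCA": "brain_day1"}
reads = ["ACGTGGATTACA", "TGCACCGGTTAA", "ACGTCCGGTTAA", "GGGGATATATAT"]
pools = demultiplex(reads, identifiers)
```

This sketch requires an exact match; a practical implementation would likely choose Identifier Sequences that remain distinguishable in the presence of sequencing errors.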
The invention relates to the sequencing of certain regions of DNA fragments obtained according to the invention for the purpose of their annotation by computational means including their statistical analysis, annotation by means of alignments to reference information, and/or mapping to genomic sequences. Thus, the invention relates to a method for gene discovery, gene identification, gene expression profiling, and their annotation.
In another further embodiment, the invention relates to the sequencing of DNA fragments obtained according to the invention to allow for their annotation by computational means, the readout of Identifier Sequences, and the statistical analysis of sequences, where such sequences are related to regions within genomes. Hence, the invention relates to the characterization of genetic elements within genomes with reference to transcriptional start sites.
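By way of illustration only, the mapping of 5′-end sequences to a genomic sequence and their statistical analysis with reference to transcriptional start sites may be sketched as follows. The sketch uses exact matching on one strand and reports only the first occurrence of each tag; the reference sequence, the tags and the function name count_tag_starts are hypothetical, and a practical analysis would use a dedicated alignment tool.

```python
from collections import Counter


def count_tag_starts(tags, reference):
    """Count how many tags map to each start position in the reference.

    Matching is exact, forward-strand only, and uses the first occurrence
    of a tag. Returns the per-position counts and the number of tags that
    could not be mapped.
    """
    starts = Counter()
    unmapped = 0
    for tag in tags:
        position = reference.find(tag)
        if position < 0:
            unmapped += 1
        else:
            starts[position] += 1
    return starts, unmapped


# Hypothetical 31-bp reference and four 6-bp tags.
reference = "GATTACAGGCCTTAAGCTAGCTAGGATCCAT"
tags = ["GATTAC", "GGATCC", "GATTAC", "AAAAAA"]
starts, unmapped = count_tag_starts(tags, reference)
```

Positions supported by many tags are candidate transcriptional start sites, and the tag count at each position gives a simple estimate of how often that start site is used.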
In yet another different embodiment, the invention relates to the preparation of hybridization probes from the ends of nucleic acid molecules, where such regions would be analyzed by means of in situ hybridization. In a preferred embodiment, the in situ hybridization experiment makes use of a tiling array.
In one more embodiment, the invention relates to the full-length cloning of nucleic acid molecules in such a way that the sequence information obtained from DNA fragments according to the invention is amplified. It is within the scope of the invention to amplify and clone transcribed regions as well as genomic fragments. Such fragments may contain promoter regions.
Thus, the invention provides a method for the analysis of nucleic acid molecules and short fragments thereof as needed, for example, for the characterization of biological samples. Moreover, the invention provides a method for fast and effective manipulation and/or sequencing of RNA and DNA fragments to make use of such fragments in analytical assays, such as single molecule detection. Hence, the invention is in particular suitable for high-throughput sequencing approaches and the parallel detection of RNA or DNA molecules on a solid support.
In a particular embodiment, the invention relates to the construction of a bidirectional template by means of a modified DNA that can be converted into a circular single-stranded DNA molecule. After amplification of the circular single-stranded DNA molecule by means of the RCA reaction, a bidirectional linear single-stranded DNA molecule is obtained that can be directly attached to a defined location on a solid support. The present invention makes it possible to obtain multiple sequencing reads from the same template at a defined location, which links different sequencing reads to the same template. By the use of a bidirectional linear single-stranded DNA molecule as template, sequence information from both strands of the modified DNA molecule or both ends of an RNA molecule can be obtained from the same template.
The invention provides a required method for designing and performing analytical assays that can be used in life science studies and diagnostics. Hence, the invention relates to a method for analyzing a biological system or for diagnostics.
The invention also provides a method for designing and manufacturing a kit and reagents to perform the invention as such or in part as needed to satisfy experimental requirements.
The invention encompasses a method for handling single-stranded as well as double-stranded nucleic acids in the form of linear and circular nucleic acid molecules. Double-stranded DNA means any nucleic acid molecules each of which is composed of two polymers formed by deoxyribonucleotides and in which the two polymers have substantially complementary sequences to each other allowing for their association to form a dimeric molecule. The two polymers are bound to each other by specific hydrogen bonds between matching base pairs within the deoxyribonucleotides. Any DNA molecule composed only of one polymer chain formed by two or more deoxyribonucleotides having no matching complementary DNA molecule to associate with is considered to be a single-stranded DNA molecule for the purpose of the invention, even if such a molecule may form secondary structures comprising double-stranded DNA portions. As used interchangeably herein, the terms “nucleic acid molecule(s)” and “polynucleotide(s)” include RNA or DNA regardless of whether single- or double-stranded, coding or non-coding, complementary or not, and sense or antisense, and also include hybrid sequences thereof. In particular, they encompass genomic DNAs and complementary DNAs, which may be transcribed or untranscribed, spliced or unspliced, incompletely spliced or processed, independent of their origin, cloned from a biological material, or obtained by means of synthesis. RNAs for the purpose of the invention are considered single-stranded nucleic acid molecules even if such molecules may form secondary structures comprising double-stranded RNA portions. In particular, RNAs encompass for the purpose of the invention any form of nucleic acid molecules comprising ribonucleotides, and do not relate to a particular sequence or origin.
Thus, RNAs may be transcribed in vivo or in vitro by artificial systems or untranscribed, spliced or unspliced, incompletely spliced or processed, independent of their natural origin or derived from artificially designed templates. They may include mRNA, tRNA, rRNA, miRNA, siRNA, RNAi obtained by means of synthesis, or any mixture thereof. RNAs may derive from biological samples or more specifically from fluids of a biological origin, such as blood or serum. For instance, such a sample may contain viral RNA or RNA of other potential parasites from the blood of an individual human; or the RNA may be obtained from purified cells, including flow-sorted cells from dissected tissue, where cells may be labeled with a selectable fluorescent antibody for cell sorting, or labeled by the transgenic expression of a marker such as the green fluorescent protein (GFP), using methods known to a person skilled in the art of the field. Alternatively, these cells are selected based on their morphology or by laser capture microdissection. More precisely, the expressions “DNA”, “RNA”, “nucleic acid”, and “sequence” encompass nucleic acid materials themselves and are thus not restricted to particular sequence information, vector, phagemid or any other specific nucleic acid molecules. The term “nucleic acid” is also used herein to encompass naturally occurring nucleic acids, artificially synthesized or prepared nucleic acids, and any modified nucleic acids into which one or more modifications have been introduced by naturally occurring events or through approaches known to a person skilled in the art. Similarly, a “tag” or an “Identifier Sequence” according to the invention can be any region of a nucleic acid molecule as prepared by means of the invention. The term “tag” or “Identifier Sequence” as used herein encompasses any nucleic acid fragment, no matter whether it comes from a naturally occurring source or is artificially synthesized or prepared.
It may also encompass any modified nucleic acids into which at least one modification has been introduced by naturally occurring events or through approaches known to a person skilled in the art. Furthermore, the terms “tag” or “Identifier Sequence” do not relate to any particular sequence information or their composition. The terms “purity”, “enriched”, “purification”, “enrichment”, and “selection” are used interchangeably herein and do not require absolute purity or enrichment of a product. The terms “specific”, “preferable”, or “preferential” are used interchangeably herein and do not require absolute specificity of a DNA or RNA hybridization probe or an enzyme for its substrate, but rather they are intended to signify the possibility that an enzyme may have low or lower affinity compared to other compounds related or unrelated to its substrate. Similarly, the terms used to name an enzyme or an enzymatic activity are to describe the function or activity of such a component and do not require the absolute purity of such a component. Thus, any mixture containing a specific enzyme or enzymes with other components of the same, related or unrelated function are within the scope of the invention. Similarly, DNA or RNA molecules may function in a specific manner as hybridization probes, and as such, they may have “complementary sequences” for the purpose of the invention. DNAs or RNAs having complementary sequences can be used for the detection of a related nucleic acid molecule, even if such a probe and its target molecule may be distinct due to naturally occurring or artificially introduced mutations at different positions. The term “biological samples” includes any kind of material obtained from living organisms including microorganisms, animals, and plants, as well as any kind of infectious particles including viruses and prions, which depend on a host organism for their replication. 
As such, “biological samples” include any kind of material obtained from a patient, animal, plant or infectious particle for the purpose of research, development, diagnostics or therapy. Thus, the invention is not limited to the use of any particular nucleic acid molecules or their origin, but the invention provides a general method to be applied to and used for the manipulation and processing of any given nucleic acid. Any such nucleic acid molecules as applied to perform the invention can be obtained or prepared by any method known to a person skilled in the art including, but not limited to, those described in Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference.
The invention relates to methods for the isolation of fragments from nucleic acid molecules for the purpose of analysis and detection. The analysis of a nucleic acid molecule may include, but is not limited to, obtaining part or the entire sequence information of a nucleic acid molecule. A person skilled in the art knows different approaches for obtaining sequence information including, but not limited to, those described in Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, or newly evolving high throughput technologies described in the background art. The invention is not limited to use with any particular sequencing approach or technology, and it provides a general method for manipulating RNA and DNA for analysis and detection as most suitable for the experimental needs or as appropriate in light of new developments in the field.
For manipulation, detection, or analysis including a sequencing reaction, nucleic acid molecules may be attached or otherwise bound to a solid support. A solid support may be any solid material with which components can be associated directly or indirectly. Such material includes, but is not limited to, acrylamide, agarose, cellulose, nitrocellulose, glass, gold, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, Teflon, fluorocarbons, nylon, silicone rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, functionalized silane, polypropylfumarate, collagen, glycosaminoglycans, polyamino acids, or any combination thereof. Solid supports may further include thin films, membranes, bottles, dishes, fibers, woven fibers, shaped polymers such as tubes, particles, beads, microparticles, or any combination thereof.
Thus, the invention relates to the conversion of a sample containing one or more nucleic acid molecules. Such nucleic acid molecules or any mixture of nucleic acid molecules would be converted into DNA. To perform the invention, nucleic acid molecules can be derived from any naturally occurring genomic DNA or RNA sample or from an existing DNA library of artificial origin, or any mixture thereof. The invention is not limited to the use of an individual nucleic acid molecule or any plurality of nucleic acid molecules, but the invention can be performed on an individual nucleic acid molecule or any plurality of nucleic acid molecules regardless of whether such molecules would occur in nature, be derived from an existing library, or be artificially created. Furthermore, the invention can process any nucleic acid molecule regardless of its origin or nature. Thus, it is within the scope of the invention that the nucleic acid molecules could be full-length molecules as compared to naturally occurring nucleic acid molecules, or any fragment thereof. Furthermore, it can be envisioned that such fragments of nucleic acid molecules may be prepared by a random process or by a targeted dissection of nucleic acid molecules by means of an enzymatic activity with a preference for a certain sequence, or by means which would allow for the fragmentation based on the structure of the nucleic acid molecule including, but not limited to, exons and introns within transcribed regions. Thus, the invention is not restricted to the use of any particular starting material.
The invention relates to the modification of an RNA molecule or a plurality of RNA molecules to introduce sequence information and/or a functional group at the 5′-end of an individual RNA molecule or RNA molecules within a pool of RNA molecules. Such a functional group may comprise 1, 3, 1 to 5, 5 to 10, 10 to 15, 15 to 25, 25 to 35, 35 to 45 or more than 45 nucleotides. Hence, the invention relates to the modification of an RNA in such a way that information added to the RNA molecule is used for the manipulation and/or analysis of the RNA molecule or for the preparation and analysis of the modified RNA.
A person skilled in the art knows about different enzymatic and chemical approaches for the modification of RNA. Preferably, in order to practice the invention, an RNA is modified by enzymatic reactions so that a selective use of different enzymatic activities allows a targeted modification of certain RNA species within groups of RNAs. More preferably, mRNA molecules within total RNA are preferentially targeted for modification to allow for selective enrichment. However, the invention is not limited to the analysis of mRNA but provides a general method for capturing an RNA species for analysis and detection. Here, results from recent studies point to entirely new RNA species like miRNA and other short RNA molecules (Alvarez-Garcia I. and Miska E. A., Development 132, 4653-4662 (2005), hereby incorporated herein by reference) that could become subject to specific modification and analysis.
To perform the invention, the target RNA is subjected to three conceptually different steps: (1) masking of the non-full-length mRNA molecules, (2) conversion of the Cap structure of the full-length molecules into a reactive 5′-phosphate group, and (3) attaching an oligonucleotide to the 5′-end of the treated target RNAs. A standard procedure for adding an RNA oligonucleotide to the 5′-ends of mRNAs is the so-called Oligo-Capping method (Maruyama K. and Sugano S., Gene 138, 171-174 (1994); and Suzuki Y. and Sugano S., Methods Mol. Biol. 221, 73-91 (2003), both of which are hereby incorporated herein by reference), and modifications thereof. RNA preparations from a living organism contain RNA species marked by the presence of a Cap structure at the 5′-ends of full-length mRNAs. This Cap structure makes them distinct from other truncated RNA species lacking such a Cap structure and having instead a free phosphate group at their 5′-ends. The Oligo-Capping approach makes use of this unique feature of full-length mRNAs for selective enrichment. Oligo-Capping comprises a number of enzymatic steps to specifically modify mRNA molecules within a pool of RNAs. In the first enzymatic reaction, uncapped RNAs, such as truncated mRNAs, small RNAs, tRNAs, and rRNAs, are dephosphorylated at their 5′-ends by a phosphatase, followed by a second reaction step in which capped mRNAs are decapped by treatment with tobacco acid pyrophosphatase (TAP). This treatment leaves only full-length mRNAs phosphorylated at their 5′-ends. Therefore, in a third enzymatic reaction an RNA ligase can only attach an oligonucleotide to the 5′-ends of phosphorylated full-length mRNAs.
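The selection logic of the three enzymatic steps described above can be illustrated with a minimal sketch. It is a toy model only: the 5′-end state labels, the `Rna` class, the function names, and the placeholder oligonucleotide name are assumptions made for illustration and do not appear in the cited protocols.

```python
from dataclasses import dataclass
from typing import Optional

# 5'-end states used in this toy model (labels are illustrative assumptions)
CAP, PHOSPHATE, HYDROXYL = "cap", "phosphate", "hydroxyl"


@dataclass
class Rna:
    name: str
    five_prime: str               # one of CAP, PHOSPHATE, HYDROXYL
    oligo: Optional[str] = None   # oligonucleotide ligated to the 5'-end, if any


def phosphatase(rna: Rna) -> Rna:
    """Step 1: remove free 5'-phosphates; capped ends are left untouched."""
    if rna.five_prime == PHOSPHATE:
        rna.five_prime = HYDROXYL
    return rna


def tap(rna: Rna) -> Rna:
    """Step 2: decapping leaves a 5'-phosphate on formerly capped RNA."""
    if rna.five_prime == CAP:
        rna.five_prime = PHOSPHATE
    return rna


def ligase(rna: Rna, oligo: str) -> Rna:
    """Step 3: ligate the oligonucleotide only to 5'-phosphorylated RNA."""
    if rna.five_prime == PHOSPHATE:
        rna.oligo = oligo
    return rna


def oligo_capping(pool, oligo):
    """Apply the three steps in order to every RNA in the pool."""
    return [ligase(tap(phosphatase(r)), oligo) for r in pool]


pool = [
    Rna("full_length_mRNA", CAP),
    Rna("truncated_mRNA", PHOSPHATE),
    Rna("rRNA", PHOSPHATE),
]
# "HYPOTHETICAL_OLIGO" is a placeholder name, not a real sequence
tagged = [r.name for r in oligo_capping(pool, "HYPOTHETICAL_OLIGO") if r.oligo]
print(tagged)
```

In this model only the RNA that started with a Cap structure carries the oligonucleotide after the third step, which mirrors the enrichment principle stated in the text.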
For the first reaction step any phosphatase can be used that is able to remove the phosphate group from the 5′-end of RNA. More specifically, the phosphatase can be selected from Bacterial Alkaline Phosphatase (BAP), Calf Intestine Alkaline Phosphatase (CIAP), Shrimp Alkaline Phosphatase (SAP), or Antarctic Phosphatase. Similarly, different pyrophosphatases may be used to perform the invention, where most commonly the tobacco acid pyrophosphatase (TAP) is used for the removal of the Cap structure. For the RNA ligation step, any RNA ligase can be used that can ligate a DNA and/or RNA oligonucleotide to phosphorylated RNA. Most commonly the T4 RNA ligase or the Thermo Phage single-stranded DNA ligase is used in this reaction. The Thermo Phage single-stranded DNA ligase is a commercially available enzyme that can work both on single-stranded DNA and RNA (for more information on the enzyme refer to the product information under http://www.prokaria.com/upload/files/Thermophage-ssDNA-ligase-version-4-2.pdf, hereby incorporated herein by reference). Therefore this enzyme may be preferable to directly ligate a DNA oligonucleotide to RNA. Hence the invention provides a method for directly ligating DNA to RNA so as to prepare a linear heteropolymer composed of deoxyribonucleotides or DNA oligonucleotides and ribonucleotides or RNA or RNA oligonucleotides.
A person skilled in the art knows different modifications of the Oligo-Capping approach that can be used to perform the invention. Most preferably the invention makes use of a procedure where all enzymatic reactions are performed in a single reaction vial as disclosed in patent application JP2006-106770, hereby incorporated herein by reference. In brief, the first reaction step makes use of a phosphatase that can be inactivated by heat treatment, such as Antarctic Phosphatase. After inactivation of the first enzyme in the reaction chain, buffer conditions are changed by the addition of new components suitable for running the TAP reaction. TAP can again be inactivated by heat treatment. Therefore, only another change in the buffer conditions by the addition of additional components and an oligonucleotide is sufficient to perform the ligation of an oligonucleotide to phosphorylated RNA as a final reaction step (compare
In the above, the modification reaction is performed in such a way that mRNA molecules within a pool of RNAs are modified for further manipulation. However, the invention is not restricted to the modification of mRNAs. In a different example, all phosphorylated RNA molecules lacking a Cap structure are directly modified by the ligation of an oligonucleotide to the 5′-end of RNAs. In this example, the invention enables a selective modification of non-mRNA molecules and truncated mRNA molecules. In yet another example of the invention, RNA molecules lacking a Cap structure are modified in a first enzymatic reaction. In one example, only the RNA molecules lacking a Cap structure are modified for manipulation according to the invention. In a different example, the first reaction step is followed by other steps to stepwise modify different RNA molecules. Therefore, in a second enzymatic reaction, the Cap structure of the full-length mRNA molecules is removed by an enzymatic reaction, for example with TAP, to create phosphate groups at the 5′-end of the full-length mRNA molecules. In the last reaction step, an oligonucleotide of the same or a different sequence is ligated to the full-length mRNA molecules. In this embodiment, the invention provides a method for adding different oligonucleotides to certain different RNA species within a pool of various RNA molecules.
Following the course of events outlined above and further described in
Most commonly the oligonucleotides are designed to function as “primers” for the introduction of priming sites at 5′-ends of RNAs. Primers may be an oligonucleotide comprising 5, 6, 5 to 10, 10 to 15, 15 to 20, 20 to 30, 30 to 40, 40 to 50 or more than 50 nucleotides. After synthesis of a complementary nucleic acid strand, the 3′-end of the newly synthesized second strand will have complementary sequences to the oligonucleotide attached to the 5′-end of RNA. Hence, oligonucleotides having entirely or in part the same sequence as the oligonucleotide added to the 5′-end of RNAs can be used to prime the synthesis of nucleic acid molecules having in part or entirely the same sequence as that of the modified mRNAs. The priming of the second strand can be used, for example, for the preparation of double-stranded or single-stranded DNAs, for DNA or RNA amplification, and for sequencing. Different approaches for the synthesis of a second DNA strand by means of a DNA polymerase can be found in standard textbooks such as Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference. Such DNA polymerases include, but are not limited to, the Klenow fragment of DNA polymerase I, T4 and T7 DNA polymerases, DNA polymerase I, Taq polymerase, Tfl DNA polymerase, Tth DNA polymerase, Tli DNA polymerase, or any other DNA polymerase known in the field. For example, for the preparation of linear single-stranded DNA, various technologies have been developed that are familiar to a person skilled in the state of the art in the field. Some approaches use a DNA-polymerase-based synthesis of single-stranded DNA from a DNA or RNA template. In a particular case, the synthesis of a single-stranded DNA can be achieved by the so-called asymmetric PCR reaction, in which the two primers are used at different concentrations. 
After the rate-limiting primer is exhausted, the reaction switches from the exponential amplification of double-stranded DNA to the linear amplification of the one strand primed by the primer used in excess over the rate-limiting primer. In an alternative approach lambda exonuclease is used to digest the one strand of double-stranded DNA having a 5′-phosphorylated end. Such a template can be prepared in PCR reactions in which only one out of two primers is phosphorylated at the 5′-end. The lambda exonuclease, also denoted as “Strandase™”, is commercially available from Novagen, Madison, USA, and the documentation on its “Strandase™ ssDNA Preparation Kit”, Cat. No. 69202, is hereby incorporated herein by reference. Similarly, the enzyme can also be obtained as lambda exonuclease from Epicentre, Madison, USA (Cat. Nos. LE035H and LE032K). For a number of applications of single-stranded linear DNA, the single-stranded DNA is prepared by means of the PCR reaction in which one of the two primers is specifically tagged. While not limited to it, a biotin label is most frequently applied to separate the strand having a functional group and the second undesired strand from the template DNA. This approach is of value particularly when the strand of interest is supposed to be used as attached to a matrix or any kind of solid support. The immobilized single-stranded DNA can be directly purified on the support and used in detection assays depending on strand specific preparation and isolation of single-stranded DNA or in the preparation of a template for DNA sequencing. One such application includes, but is not limited to, the detection and characterization of SNPs in genomic DNA in, for example, the so-called DASH SNP detection system. This approach is described in US Patent Application No. 2001046670, which is hereby incorporated herein by reference. S. Stahl et al. (Stahl, S. 
et al., Nucleic Acids Research 16, 3025-3038 (1988), hereby incorporated herein by reference) have found a different application in which biotinylated DNA is used, for example, for sequencing on solid phase. In yet another example, in a reaction cycle combining the activities of a reverse transcriptase, RNase H, and a DNA-dependent RNA polymerase, a modified RNA molecule can be amplified in accordance with the method published by Guatelli J. C. et al., Proc. Natl. Acad. Sci. USA 87, 1874-1878 (1990), hereby incorporated herein by reference.
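The switch from exponential to linear amplification in the asymmetric PCR reaction described above can be made concrete with a small numerical sketch. The model below is deliberately idealized and rests on stated assumptions: perfect efficiency in every cycle, one primer molecule consumed per newly synthesized strand, and no re-annealing of the single-stranded product. The function name and parameters are hypothetical.

```python
def asymmetric_pcr(cycles, templates, limiting_primer, excess_primer):
    """Return a list of (double_stranded, single_stranded) counts per cycle.

    Idealized toy model: while both primers are available, every duplex is
    copied into two duplexes. Once the limiting primer is exhausted, each
    duplex yields only one extra single strand per cycle, primed by the
    primer that was supplied in excess.
    """
    ds, ss = templates, 0
    history = []
    for _ in range(cycles):
        # Duplexes that can still be fully duplicated (both primers present)
        duplicated = min(ds, limiting_primer, excess_primer)
        limiting_primer -= duplicated
        excess_primer -= duplicated
        # Remaining duplexes are extended by the excess primer only
        linear = min(ds - duplicated, excess_primer)
        excess_primer -= linear
        ds += duplicated
        ss += linear
        history.append((ds, ss))
    return history


# One template, 7 molecules of limiting primer, large excess of the other
history = asymmetric_pcr(cycles=6, templates=1,
                         limiting_primer=7, excess_primer=1000)
for cycle, (ds, ss) in enumerate(history, start=1):
    print(cycle, ds, ss)
```

With these illustrative numbers the duplex count doubles for three cycles, then stays constant while the single-stranded product grows by a fixed amount per cycle.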
In a different embodiment, the oligonucleotides attached to the 5′-end of RNA are designed to have sequence information to enable the manipulation of the RNA molecule or any DNA molecule derived therefrom. A person skilled in the art knows many enzymatic activities that depend on binding to specific sequences or recognition sites. Many such enzymes can be commercially obtained from different suppliers including, but not limited to, FERMENTAS UAB (Vilnius, Lithuania), New England Biolabs Inc. (Beverly, USA), Promega (Madison, USA), Takara (Tokyo, Japan), Roche (Mannheim, Germany), and GE Biosciences (Cardiff, United Kingdom). Commonly restriction endonucleases cut only double-stranded DNA but do not cut single-stranded DNA. Most commonly restriction endonucleases are used to digest DNA molecules at defined locations such as their recognition site or locations in the proximity of their recognition site. In one example the recognition site introduced by an oligonucleotide and attached to the 5′-end of RNA is a restriction site for a class-IIs restriction enzyme. These enzymes cleave outside of their recognition sequence, where, for example, the Class IIs restriction enzyme MmeI cleaves 20/18 base pairs apart from its recognition site. Therefore, MmeI is commonly used for the isolation of short sequencing tags as, for example, in the aforementioned LongSAGE, 5′-SAGE and CAGE approaches. Other applications would make use of restriction endonucleases for the purpose of DNA recombination and cloning known to a person skilled in the art, and further described in Sambrook J. and Russuell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference. Moreover, a modified or otherwise designed restriction endonuclease can function as a strand-specific nicking enzyme, which cleaves only one DNA strand within its recognition sequence in a double-stranded DNA substrate. 
Such enzymes include, but are not limited to, the commercially available nucleases N.Bpu 10I (FERMENTAS UAB, Vilnius, Lithuania), N.Bbv C IA, N.Bst NB I and N.Alw I (New England Biolabs Inc, Beverly, USA). Nicking enzymes are of particular interest to create priming sites within double-stranded DNA, which can be used for primer extension reactions toward DNA synthesis and sequencing. An example of a reaction in which one DNA strand within a double-stranded DNA molecule is nicked by an enzymatic activity to create a priming site for DNA synthesis has been described by Walker T. G. et al., Proc. Natl. Acad. Sci. USA 89, 392-396 (1992), hereby incorporated herein by reference. In a different example, the oligonucleotide introduces a recognition site for an RNA polymerase including, but not limited to, the T3 RNA polymerase, T7 RNA polymerase, or SP6 RNA polymerase, all of which are DNA-dependent RNA polymerases with specificity for their respective double-stranded promoters. Starting from the promoters or their recognition sites, they catalyze the 5′-to-3′ synthesis of a complementary RNA from either a single-stranded DNA or double-stranded DNA template. Guatelli J. C. et al. have described an example for the use of a DNA-dependent RNA polymerase in Proc. Natl. Acad. Sci. USA 87, 1874-1878 (1990), hereby incorporated herein by reference. In a different example, the oligonucleotide may have recognition sites for DNA binding proteins. Many DNA binding proteins are known to a person skilled in the art, which can be of natural occurrence or may have been prepared by means of protein design. Such DNA binding proteins include, but are not limited to, transcription factors, proteins of regulatory function that bind directly or indirectly to recognition sites in genomic DNA. Transcription factors are essential molecules for life and needed for the utilization of genomic information. Every living organism contains a large number of transcription factors. 
As an example, Kanamori M. et al. have published a database on all the known transcription factors from mouse in Biochem Biophys Res Commun, 322, 787-93 (2004), hereby incorporated herein by reference. Transcription factors are distinct in terms of their affinity to different recognition sites. This specificity can be used for the enrichment of DNA molecules comprising recognition sites for a given transcription factor and/or group of transcription factors. However, the binding specificity of a transcription factor is not limited to binding a certain sequence, as a person skilled in the art will know proteins that recognize structures rather than specific sequences. For example, the transcription factor DAX-1 can bind to different DNA structures as described by Zazopoulos E. et al., Nature 390, 311-315 (1997), hereby incorporated herein by reference. In a different example, a DNA binding protein may bind specifically to single-stranded DNA. Single-stranded-DNA binding proteins include, but are not limited to, SSB from E. coli, the product of the phage T4 Gene 32, the adenovirus DBP, an antibody directed against single-stranded DNA, calf thymus UPI, or any mixture thereof. In addition there are proteins that specifically bind to mismatches in double-stranded DNA. This group of proteins includes, but is not limited to, the family of MutS proteins (for reference on the protein family refer to http://www.tigr.org/˜jeisen/MutS/MutS.html, the content of this webpage is hereby incorporated herein by reference), related to a major mismatch repair pathway in E. coli. Where primers are used in primer extension reactions that have a mismatch in their sequence as compared to the complementary sequence or parts thereof attached to the modified RNA or any DNA derived therefrom, a MutS protein or any member of the gene family may be used to specifically enrich double-stranded DNA species having mismatches. 
In addition MutS or any member of the gene family may be used to block or otherwise manipulate primer extension reactions. Some MutS proteins are commercially available as, for example, Taq MutS from Nippongene (Tokyo, Japan, Code Number 316-04011). In a different example the oligonucleotide may have regions of a given sequence that can be used as an “Identifier Sequence” or “Barcode”. Such a given sequence can be used as an Identifier Sequence to mark the origin of a sample, or it can function as a tag to specifically capture a modified RNA or any DNA derived thereof by means of hybridization to a nucleic acid molecule or the like having complementary sequence to the Identifier Sequence. Such a sequence can also be used as a specific and selective priming site for any of the aforementioned enzymatic reactions. As such, the Identifier Sequence can be a selective priming site for second-strand synthesis by a DNA polymerase, amplification, for example, by means of a PCR reaction, or preparation of a single-stranded DNA. Hence, in combination with the aforementioned method for introducing different oligonucleotides such as Identifier Sequences or recognition sites to different RNA species, the invention provides another method for separately manipulating individual RNA molecules within a plurality of RNA molecules or the total RNA.
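The use of an Identifier Sequence to mark the origin of a sample can be sketched computationally as follows. This is a minimal illustration under stated assumptions: the four-nucleotide barcodes, the sample names, the reads, and the function `demultiplex` are all hypothetical, and the Identifier Sequence is assumed to sit at the very start of each read with exact matching only.

```python
from collections import defaultdict

# Hypothetical Identifier Sequences and sample names, for illustration only
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, length=4):
    """Group reads by the Identifier Sequence found at their 5'-end.

    Returns a dict mapping each sample name to its reads with the
    Identifier Sequence removed. Reads that carry no known Identifier
    Sequence are collected unchanged under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:length])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[length:])
    return dict(bins)


reads = ["ACGTGGATTC", "TGCACCGTAA", "ACGTTTAGGC", "GGGGAAAACC"]
result = demultiplex(reads, BARCODES)
for sample, sequences in result.items():
    print(sample, sequences)
```

Exact prefix matching keeps the sketch short; a practical pipeline would likely tolerate sequencing errors in the Identifier Sequence, which is outside the scope of this illustration.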
In a different embodiment, the invention provides a method for introducing functional groups at the end of an RNA molecule or a variety of RNA molecules. Many different functional groups have an affinity to bind to a binding molecule. A functional group may include, but is not limited to, a reactive group or cross linker suitable to form a covalent bond in a chemical reaction, an amino group, biotin, digoxigenin, antibody, antigen, a protein, a nucleic acid, a nucleic acid binding molecule, or any combination thereof. The functional group and any molecule attached to the functional group can bind to binding molecules which are presented on a matrix. For the purpose of the invention a matrix may be selected from any immobilized form of a reactive group that can be used in a chemical reaction to form a covalent bond, such as avidin, streptavidin, a digoxigenin-binding molecule, an oligonucleotide having a defined sequence, an antibody or its ligand, and a chemical matrix. If the applied functional group is biotin, then the related matrix is avidin or streptavidin. Similarly, when the functional group is digoxigenin, the matrix is a digoxigenin-binding molecule (see Roche Diagnostics GmbH Catalog, the documentation therein is hereby incorporated herein by reference). When the functional group is an oligonucleotide, the matrix is an oligonucleotide having a sequence complementary to that of the functional group, or when the functional group is an antigen, the matrix may be an antibody or an antibody-binding protein such as protein I or protein G. Hence, the invention provides a method for introducing a functional group to an RNA molecule, where such a functional group is attached to the oligonucleotide. Modified oligonucleotides can be commercially obtained from many providers. Most frequently biotin-labeled oligonucleotides are used in the field. 
For an example of the preparation of different modified oligonucleotides, see the web site of MWG Biotech at http://www.mwg-biotech.com/html/s_synthetic_acids/s_modifications.shtml, the information available therein is hereby incorporated herein by reference. MWG Biotech can provide oligonucleotides having biotin or digoxigenin as a functional group at different positions in an oligonucleotide. Moreover, modified oligonucleotides can be obtained having one or more functional groups such as reactive groups for cross linking like the 5′ Aminolink C3/C5/C6/C12, 3′ Aminolink C3/C6/C7, Amino (C2/C6)-dT, Amino C6-dC, Spacer C3/C9 (TEG), Spacer C12/C18 (HEG), or a reduced Thiol modifier. RNA oligonucleotides can be purchased, for example, from Invitrogen and some information on such available RNA oligonucleotides is available at http://www.invitrogen.com/content.cfm?pageid=9900. This information is hereby incorporated herein by reference. Also, Operon provides some useful information at http://www.operon.com/, and such information is hereby incorporated herein by reference.
In the aforementioned embodiments, and as further outlined in
An RNA molecule can be used as a template to prepare a DNA transcript by means of a reverse transcriptase as described in Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference, and a person skilled in the art in the field knows many modifications of the process including different reaction conditions and enzyme modifications. For example, reverse transcriptases include, but are not limited to, AMV reverse transcriptase, M-MLV reverse transcriptase, or M-MLV reverse transcriptase RNase H minus or any other modifications thereof. Any modified RNAs obtained in accordance with any or all aforedescribed steps and further outlined in
The aforementioned synthesis of a DNA from an RNA template leads to the formation of a double-stranded DNA/RNA molecule. The RNA portion within any such double-stranded DNA/RNA molecule can be removed by means of an RNA degrading enzyme or changes in the pH of the reaction buffer. For example, the enzyme RNase H specifically digests RNAs within double-stranded DNA/RNA molecules, making it a preferable enzyme to practice the invention. The removal of RNA applies for any kind of cDNA regardless of the priming, either random priming, specific priming or oligo-dT priming, used for the reverse transcription reaction. Examples for the removal of RNA from a DNA/RNA template are described in Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference. Any such treatment of a double-stranded DNA/RNA molecule releases the DNA strand which can be obtained as a single-stranded DNA molecule whose entire template or most of the template had been made of ribonucleotides. In case the DNA/RNA hybrid contains regions of double-stranded DNA, for example, when parts or the entirety of the oligonucleotide added to the RNA molecule at a previous step have been made out of DNA, the removal of the RNA portion of the hybrid molecule will lead to the preparation of a DNA molecule comprising regions of a double-stranded DNA at the end equal to the 5′-end of the RNA template. Hence, the invention provides a method for preparing single-stranded and/or partly single-stranded DNA molecules comprising sequence information derived from an RNA molecule which may be an mRNA or a total RNA or sequence information introduced by means of manipulation of such an RNA molecule.
Single-stranded DNA molecules are important for DNA analysis and manipulation, and many applications and technologies in molecular biology and biotechnology require the strand-specific preparation of single-stranded DNA. Such applications include, but are not limited to, the preparation of a template DNA for sequencing or for strand-specific DNA synthesis including synthesis of labeled probes, the replacement of thymine residues by uracil, the introduction of point mutations, the preparation of testers and drivers for subtractive hybridizations or the detection and isolation of individual clones in a mixture of various DNA or RNA molecules, the detection and analysis of single nucleotide polymorphisms (SNPs), and the preparation of microarrays. Those methods and their applications are well known to those skilled in the art of molecular biology and are further described by Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference.
DNA templates obtained by means of the invention may be distinct in their features depending on the nature of the oligonucleotide added to the RNA species at the early stage as, for example, shown in
In the aforementioned examples, the invention provided a method for modifying 5′-ends of mRNAs. However, the invention is not limited to the modification of the 5′-ends. In accordance with the examples given above for the priming of cDNA synthesis in reverse transcription reaction and depicted further in
In another example, the primer used in the reverse transcription reaction includes a functional group attached to such primer. Such functional group will be incorporated into the cDNA regardless of the nature of the primer such as a set of random primers, a specific primer, or an oligo-dT primer. Hence the invention provides a method for the preparation of cDNA fragments having a modified 5′-end with a functional group. A person skilled in the art knows many different functional groups that have an affinity to bind to a binding molecule. A functional group may include, but is not limited to, a reactive group, an amino group, biotin, digoxigenin, an antibody, an antigen, a protein, and a nucleic acid binding molecule. The functional group and any molecule attached to such a functional group can bind to a binding molecule presented on a matrix. For the purpose of the invention a matrix may be selected from any immobilized form of avidin, streptavidin, a digoxigenin-binding molecule, an antibody and its ligand and/or chemical matrix. If the applied functional group is a reactive group such as an amino group, then in a chemical reaction the reactive group can be used to form a covalent bond to the matrix. When the functional group is biotin, then the related matrix is avidin or streptavidin. Similarly, when the functional group is digoxigenin, the matrix is a digoxigenin-binding molecule (see Roche Diagnostics GmbH Catalog, which is hereby incorporated herein by reference). When the functional group is an antigen, the matrix may be an antibody or an antibody-binding protein such as protein I or protein G. Modified oligonucleotides can be commercially obtained from many providers. Frequently, biotin-labeled oligonucleotides are used in the field. 
As an example for the preparation of different modified oligonucleotides see the web pages of MWG Biotech at http://www.mwg-biotech.com/html/s_synthetic_acids/s_modifications.shtml, Invitrogen at http://www.invitrogen.com/content.cfm?pageid=9900, or Operon at http://www.operon.com/, the information found in those pages is hereby incorporated herein by reference. In a different example, the oligonucleotide may have recognition sites for a DNA binding protein. Many DNA binding proteins are known to a person skilled in the art, and they can be of natural occurrence or may have been prepared by means of protein design. Such DNA binding proteins include, but are not limited to, transcription factors, proteins of regulatory function that bind directly or indirectly to recognition sites in genomic DNA. Every living organism contains a large number of transcription factors. As an example, Kanamori M. et al. have published a database on all the known transcription factors from mouse in Biochem Biophys Res Commun, 322, 787-93 (2004), hereby incorporated herein by reference. Transcription factors are distinct by their affinity to different recognition sites in such a way that transcription factors bind to specific sequences. This specificity can be used for the enrichment of DNA molecules comprising recognition sites for a given transcription factor and/or group of transcription factors. However, the binding specificity of a transcription factor is not limited to binding a certain sequence, as a person skilled in the art knows proteins that rather recognize structures than specific sequences. For example, the transcription factor DAX-1 can bind to different DNA structures such as those described by Zazopoulos E. et al., Nature 390, 311-315 (1997), hereby incorporated herein by reference. In a different example, a DNA binding protein may bind specifically to a single-stranded DNA. Single-stranded-DNA binding proteins including, but not limited to, SSB from E. 
coli, the product of the phage T4 Gene 32, the adenovirus DBP, an antibody directed against a single-stranded DNA, calf thymus UPI, or any mixture thereof. In a different example, the binding protein may be MutS or a member of the MutS gene family. In a further example, the oligonucleotide may have regions of a given sequence that can be used as an Identifier Sequence. Such a given sequence or the Identifier Sequence can be used to mark the origin of a sample, or can function as a tag to specifically capture a modified RNA or any DNA derived therefrom by means of hybridization to a nucleic acid molecule or the like having a sequence complementary to the Identifier Sequence, or can be used as a specific and selective priming site for any of the aforementioned enzymatic reactions. As such, the Identifier Sequence can be a selective priming site for the second-strand synthesis by a DNA polymerase, the amplification, for example, by means of a PCR reaction, the preparation of a single-stranded DNA, or the priming of a sequencing reaction. Hence, in combination with the aforementioned methods, the invention provides a method for introducing different oligonucleotides into a cDNA which is derived from RNA molecules among a plurality of RNAs.
In accordance with any of the steps outlined in the foregoing, the invention provides a method for introducing a functional group at a position equal to the 5′-end of an RNA. In addition, the invention provides a method for introducing a functional group at a position equal to the 3′-end of RNA or the 5′-end of a first strand cDNA. Hence, the invention provides a method for introducing a functional group at either end of a cDNA derived from an RNA. The functional group attached to an RNA or a cDNA has a binding affinity to another molecule, and the functional group can be used to capture the modified RNA or cDNA and attach molecules to a surface. Examples for combinations of a functional group and a binding molecule may include, but are not limited to, a reactive group such as an amino group that can be used to form a covalent bond to the matrix in a chemical reaction of the reactive group, biotin binding to avidin or streptavidin, digoxigenin binding to a digoxigenin-binding molecule, an oligonucleotide binding to a complementary sequence, an antigen binding to an antibody, or an antibody binding to an antibody-binding protein such as protein I or protein G. Depending on the location of the functional group, the modified RNA or DNA may be attached to a surface in a different manner or orientation. For example, a partly double-stranded cDNA molecule can be attached to a surface by means of an interaction of the functional group at a position equal to the 5′-end of RNA. In this example, the partly double-stranded region of the cDNA molecule enables the sequencing of the cDNA fragments from the end equal to the 5′-end of RNA (compare
In the aforementioned embodiment, the invention provides a method for preparing single-stranded or partly single-stranded RNA and/or DNA molecules. Using a functional group, such molecules can be attached to a surface. Molecules on a surface can be washed by different buffers for purification and further manipulation. Hence, the invention provides a method for purifying single-stranded DNAs, partly single-stranded DNAs, or RNAs. Such a method for purifying single-stranded DNAs, partly single-stranded DNAs, or RNAs is mandatory for the detection of single molecules as achieved by new technologies including, but not limited to, those described in Metzker M. L. Genome Res. 15, 1767-1776 (2005), Kling J., Nature Biotechnology 23, 1333-1335 (2005) and Shendure J. et al., Nature Reviews Genetics 5, 335-344 (2004), all of which are hereby incorporated herein by reference. Hence, the invention relates to the use of single-stranded DNA, partly single-stranded DNA, or RNA molecules for directly obtaining sequence information thereof. In a preferable embodiment, the invention relates to obtaining sequence information from defined regions of single-stranded DNA fragments. In a preferable example, the 5′-end specific sequence information of RNA is obtained from a DNA fragment prepared according to the invention having a sequence complementary to the 5′-end sequence of an RNA molecule. Thus, the invention relates to obtaining sequence information from RNAs.
In accordance with the foregoing and any of the steps outlined in
In a preferable example, a DNA molecule having loop structures at each end is converted into a circular single-stranded DNA molecule by steps comprising a first reaction in which the free 3′-end of one hairpin structure is extended by means of a DNA polymerase lacking any exonuclease and strand-displacement activities. Such DNA polymerases include, but are not limited to, any reverse transcriptase such as the M-MuLV Reverse Transcriptase, H Minus M-MuLV Reverse Transcriptase, Superscript II, Superscript III, AMV Reverse Transcriptase, MonsterScript, Expand Reverse Transcriptase, or any mixture thereof. Other DNA polymerases may include, but are not limited to, the Klenow fragment of DNA polymerase I, T4 and T7 DNA polymerases, DNA polymerase I, Taq polymerase, Tfl DNA polymerase, Tth DNA polymerase, Tli DNA polymerase, or any other DNA polymerase known in the field. Due to the lack of a strand-displacement activity, the DNA polymerase will stop when reaching the 5′-end of the opposite hairpin structure. In a second reaction step, the open ends of the single-stranded DNA molecule are ligated to each other to form a circular DNA molecule. Such a ligation reaction can be performed by any DNA ligase including, but not limited to, the T4 DNA ligase, E. coli DNA ligase, or Taq DNA ligase. Circular single-stranded DNA molecules can be amplified by means of the rolling circle amplification method (so-called RCA method). The RCA reaction is driven by a DNA polymerase that can extend oligonucleotide primers on a circular template in an isothermal reaction as further described in U.S. Pat. Nos. 5,854,033 and 6,143,495, both of which are hereby incorporated herein by reference. The reaction product is a linear chain of single-stranded DNA which contains copies of a template linked in tandem. Depending on the reaction conditions and time, the reaction product may contain tens, hundreds, or even thousands of copies of the original template in one molecule. 
Special DNA polymerases for use in RCA reactions are known to a person skilled in the art in the field including, but not limited to, the phi29 DNA Polymerase, which has a strong strand displacement activity needed for efficient isothermal DNA amplification. A person skilled in the art in the field knows many different applications and modifications of the RCA method. For further reference on the RCA method, see the following review articles: Gusev, Y. et al., American J. Pathology 159, 63-69 (2001), or Zhang D. et al. Clin. Chim. Acta. 363, 61-70 (2006), both of which are hereby incorporated herein by reference. Hence, a DNA molecule prepared according to the invention can be amplified in such a way that a linear polymer of repetitive sequences is obtained, and such a polymer contains sequences that can be used to drive a sequence reaction.
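By way of illustration only, the tandem-repeat structure of an RCA reaction product described above can be modelled computationally. The following is a minimal sketch in Python, not part of the disclosed method; the function names, the 20-nt template, and the 8-nt priming site are hypothetical and were chosen solely for this example. It treats the product as the reverse complement of the circular template repeated in tandem, so that a priming site present once in the template appears once per copy in the product.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def rca_product(circular_template: str, copies: int) -> str:
    """Model the linear single-stranded RCA product.

    The polymerase copies the circle repeatedly, so the product is modelled
    as the reverse complement of the template repeated in tandem.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return reverse_complement(circular_template) * copies


def count_sites(product: str, site: str) -> int:
    """Count non-overlapping occurrences of a priming site in the product."""
    return product.count(site.upper())


# Hypothetical 20-nt circular template and hypothetical 8-nt priming site.
template = "ACGTTGCAGGATCCTTAGCA"
priming_site = reverse_complement(template)[:8]
product = rca_product(template, copies=100)
print(len(product), count_sites(product, priming_site))
```

In this simplified model each copy of the template contributes one priming site, which is why such a polymer can drive many sequencing reactions from a single molecule.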
In one example, the RCA reaction is performed in such a way that the reaction product is directly or indirectly bound to a defined location or a point called the point of detection, analysis, or sequencing. As one example, Nallur G. et al., Nucleic Acids Res, 29, e118 (2001) describes procedures for RCA mediated signal amplification on glass slides. In this example, RCA is the enabling step to perform clonal amplification of individual targets within a plurality of nucleic acid molecules, in which each molecule is amplified at a defined location on a surface. Here, the RCA reaction can be performed in a highly parallel manner without taking the risk of amplification biases known, for example, from classical PCR reactions. Using primers that are attached to a surface in the RCA reaction, an arrayed matrix of reaction products can be obtained so that each reaction product contains multiple copies of the template in one molecule. Hence, the RCA reaction can greatly amplify the sensitivity of detection or analysis, or can make it possible to perform a reliable sequencing reaction at a given location.
In a different example, the RCA reaction is used to prepare a template for detection, analysis, and/or sequencing at a defined location, where the template is subject to one or more detection steps, analysis, or sequencing reactions. Hence, the invention provides a method of sequencing a template in a first step, removing the amplification products produced during the sequence reaction from the sequencing template in a second reaction step, and re-sequencing the same template in a third reaction step by a different primer at the same location. Such a course of reactions may be performed to obtain two different sequencing reads from one template, three different sequencing reads from one template, four different sequencing reads from one template, five different sequencing reads from one template, or even more than five different sequencing reads from one template. The covalent attachment of DNA to a surface is discussed in Beier M. and Hoheisel J. D., Nucleic Acids Research, 27, 1970-1977 (1999), where it is shown that the covalent bond allows for at least 30 cycles of hybridization and stripping of the hybridized DNA. Hence, the invention provides a method for providing more than one type of sequence information from a template, in which different sequencing reactions are performed at the same location, at which the link is defined between the different sequencing reads obtained from the same template.
In another example, the RCA reaction is performed to prepare a reaction product that contains multiple copies of the sense and anti-sense strands of an original RNA molecule. Such reaction products are obtained when a circular template for the reaction is prepared in accordance with the steps shown in
In a different embodiment, the invention relates to the use of Identifier Sequences introduced at the 5′-end of RNAs or at regions equivalent to the 3′-end of RNAs, and the use thereof. As outlined in the foregoing, the invention provides a method for introducing specific sequences or Identifier Sequences at the opposite ends of a cDNA as prepared in accordance with the invention. In a preferable example, the Identifier Sequences are located in close proximity to the ends of the RNA or cDNA to enable their sequencing within the same sequencing reaction used to obtain sequence information from the RNA or cDNA itself. Identifier Sequences may be designed according to certain rules to fulfill their functions, which are unique within a given experiment. An Identifier Sequence may be 1 bp long, 2 bp long, 3 bp long, 4 bp long, 5 bp long, 6 bp long, 6 to 10 bp long, 10 to 15 bp long, 15 to 20 bp long, or longer than 20 bp. Preferable Identifier Sequences are 6 to 12 bp in length or 25 to 75 bp in length. Identifier Sequences may be of arbitrary nature: they may have random sequences. They may be designed by computational means, taken from a biological sample or artificially created. They may also comprise recognition sites for restriction endonucleases or other enzymes and proteins, or priming sites. Identifier Sequences can be designed in accordance with any or all of the following rules:
In a preferable example of the invention, Identifier Sequences are used to mark the origin of a sample within a pool of samples, in which all members of the pooled sample are manipulated jointly in the same experiment. The samples within the pooled samples should be mixed as early as possible, preferably already as modified RNA samples. A sample obtained by mixing different RNA samples having different Identifier Sequences would create a “pooled sample” comprising different forms of modified RNAs (compare
In one embodiment, the Identifier Sequences are used to mark nucleic acid molecules in a particular RNA from multiple biological samples which may include cells from different organisms, tissues or various temporal or treatment stages of a biological experiment, or of different cell types. The pooling of samples within an experimental design may serve different functions including, but not limited to, increasing the complexity of the sample to make full use of the very high throughput of novel sequencing approaches, simplifying the handling of many samples by reducing the number of samples to be handled at the same time, or enabling certain forms of data analysis. In one preferable embodiment, samples are pooled so as to have the same systematic errors over all steps of manipulation for a common statistical analysis as compared to individual experiments in which distinct systematic errors would occur for different samples. For example, in one typical application, the Identifier Sequences are added in proximity of the 5′-end of the RNAs while creating a modified RNA according to the invention, and the modified RNA samples are then mixed prior to the preparation of cDNAs thereof. The pooled sample is prepared to have a mixture of different species of modified RNA samples having distinct Identifier Sequences. This pool of modified RNA samples is then treated as a single sample according to the invention in order to obtain data related to the pooled sample. For example, sequencing reads related to the modified RNA samples can be obtained within the pooled library. The sequence information can be determined by any method known in the field, but it is preferable that each sequence read contains the sequence information of the Identifier Sequence plus sequence information derived from the original RNA sample or cDNA. 
After the determination of the sequencing reads, each individual sequence can be processed computationally in order to recognize the Identifier Sequence, and to group sequence reads having the same Identifier Sequence for further analysis. The sequence information related to the original RNA is analyzed separately from the Identifier Sequences in accordance with the needs of the experimental design. These sequences may relate to so-called “sequence tags” or short sequencing reads comprising partial sequence information derived from an RNA or cDNA. Sequence tags can be used to identify transcripts or certain locations in the genome, or may be used for a statistical analysis on the expression level of transcripts within a pooled sample (for further details on the use of sequencing tags refer to Harbers M. and Carninci P., Nature Meth. 2, 495-502 (2005), hereby incorporated herein by reference). Sequence information may be further stored in databases or by other computational means for the purpose of analysis, archiving, or reference data set building. Such a database could contain, for example, sequence information, the frequency of appearance of each sequence tag within different tissues and cell lines, and annotation data related to transcripts, genes, and functional elements within genomes.
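The computational processing described above, in which each read is inspected for its Identifier Sequence and reads sharing an Identifier Sequence are grouped, can be sketched as follows. This is a minimal illustration in Python under stated assumptions and is not the software used by the inventors: the 6-nt Identifier Sequences, the sample names, and the function names are hypothetical, and the sketch assumes that the Identifier Sequence occupies the first bases of every read and must match exactly.

```python
from collections import Counter, defaultdict

# Hypothetical 6-nt Identifier Sequences and sample names, for illustration only.
IDENTIFIERS = {
    "ACGTAC": "liver",
    "TGCATG": "brain",
}
ID_LENGTH = 6


def split_read(read: str):
    """Split a read into (Identifier Sequence, sequence tag).

    Assumes the Identifier Sequence occupies the first ID_LENGTH bases.
    """
    read = read.upper()
    return read[:ID_LENGTH], read[ID_LENGTH:]


def group_reads(reads):
    """Group sequence tags by sample of origin and count each tag."""
    counts = defaultdict(Counter)
    unassigned = []
    for read in reads:
        identifier, tag = split_read(read)
        sample = IDENTIFIERS.get(identifier)
        if sample is None:
            unassigned.append(read)  # unknown identifier: set aside
        else:
            counts[sample][tag] += 1
    return dict(counts), unassigned


reads = [
    "ACGTACGGGTTTAAACCC",
    "ACGTACGGGTTTAAACCC",
    "TGCATGCCCAAATTTGGG",
    "NNNNNNCCCAAATTTGGG",
]
tag_counts, unassigned = group_reads(reads)
for sample, tags in tag_counts.items():
    print(sample, dict(tags))
```

The per-sample tag counts produced in this way correspond to the frequency of appearance of each sequence tag, which is the kind of value that could be stored in a database as described above. A practical implementation would likely also tolerate sequencing errors within the Identifier Sequence, which this sketch does not attempt.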
In a different example, the Identifier Sequences are not identified by sequencing, but are used to form specific hybrids with nucleic acid molecules having complementary sequence to the Identifier Sequences bound to a solid matrix or support (compare
In a different example, the Identifier Sequences are not identified by sequencing or hybridization, but are used to bind specifically to proteins that have a binding affinity to the Identifier Sequence and are bound to a solid matrix or support. In this example, the Identifier Sequences are used to group samples derived from the same origin to defined locations on a surface. The location will define the nature of the Identifier Sequence or the origin of the RNA, DNA or sample within a pooled sample. Hence, the readout of an Identifier Sequence does not necessarily require direct sequencing or hybridization, but can otherwise be performed by binding to a protein having high affinity for an Identifier Sequence.
The invention relates further to the sequencing of regions of DNA fragments obtained according to the invention for the purpose of their annotation by computational means, including their statistical analysis, annotation by means of alignments to reference information, and/or mapping to genomic sequences. Thus, the invention relates to a method for gene discovery, gene identification, gene expression profiling, and their annotation.
In another embodiment, the invention relates to the preparation of hybridization probes from the ends of nucleic acid molecules for analyzing such regions by means of in situ hybridization. In a preferred example, the in situ hybridization experiment makes use of a tiling array. In this embodiment, the invention relates further to the design of hybridization probes including, but not limited to, those presented on a microarray.
Thus, the invention provides a method for analyzing nucleic acid molecules and short fragments thereof as needed, for example, for the characterization of biological samples. Moreover, the invention provides a method for fast and effective manipulation of RNA and DNA fragments to make use of such fragments in analytical assays. In this sense, the invention provides a new method for making use of the ever-higher throughput of new sequencing devices and new sequencing technologies.
In another embodiment, modified RNA prepared according to the invention by the use of specific primers in the reverse transcription reaction can be used to determine the real 5′-end sequence similar to protocols known to a person skilled in the art as RACE.
The invention provides a method necessary for obtaining information of value to describe the status of a biological system, namely on the use of genetic information or expression profiles, and the activity of regulatory pathways or regulatory networks. Hence, the invention relates to the design and performance of analytical assays that can be used in studies in life science and in diagnostics. The invention provides a method for analyzing a biological system and for diagnostics.
The invention or parts thereof can be used for the production of a kit containing, among other components, reagents, nucleic acid molecules, and/or enzymes for the manipulation of RNA and the preparation of DNA. In one embodiment, a kit provides the reagents needed to modify RNA. In a different embodiment, a kit provides the reagents used for preparing a DNA template. In a preferable embodiment, a kit provides the reagents to prepare a template for single molecule detection. In a preferable embodiment, a kit provides the reagents for a research purpose. In a more preferable embodiment, a kit provides the reagents for a diagnostic assay.
Key steps of the present invention will now be further explained in more detail with reference to the following examples. All names and abbreviations as used to describe the invention herein shall have the meaning as known to a person skilled in the art.
To perform an example according to the invention, mRNA or total RNA samples were prepared by standard methods known to a person skilled in the art of molecular biology as, for example, given in more detail in Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference. Furthermore, Carninci P. et al. (Biotechniques 33 (2002) 306-309, hereby incorporated herein by reference) describe a method for obtaining cytoplasmic mRNA fractions. Although the use of cytoplasmic RNA can be preferable, the invention is not limited to this method, and any other approach for the preparation of mRNA or total RNA should allow for the performance of the invention in a similar manner.
The preparation of mRNA from total RNA or cytoplasmic RNA is preferable, but not essential, to perform the invention, as the use of total RNA can provide satisfactory results in combination with the Cap-selection step performed during full-length cDNA library preparation. The amount of mRNA represents about 1-3% of the total RNA preparations, and it can be subsequently prepared by using commercial kits based on oligo dT-cellulose matrices. Such commercial kits include, but are not limited to, the MACS mRNA isolation kit (Miltenyi), which provided satisfactory mRNA yields under the recommended conditions when applied to the preparation of mRNA fractions for performing the invention. To perform the invention, one cycle of oligo-dT mRNA selection is sufficient, as extensive mRNA purification can cause a loss of long mRNAs.
All RNA samples used to perform the invention were analyzed for their ratios of the OD readings at 230, 260 and 280 nm to monitor the RNA purity. Removal of polysaccharides was considered successful when the 230/260 ratio was lower than 0.5, and an effective removal of proteins was obtained when the 260/280 ratio was higher than 1.8 or around 2.0. The RNA samples were further analyzed by electrophoresis in an agarose gel to prove a good ratio between the 28S and 18S rRNA in total RNA preparations (note that the rRNA size may change for preparation of total RNA from species other than mammals), and to show the integrity of the RNA fractions.
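The purity criteria stated above can be expressed as a short check. The following Python sketch is given for illustration only; the function name and the example OD readings are hypothetical, while the thresholds (230/260 ratio below 0.5, 260/280 ratio above 1.8) are those given in the text.

```python
def rna_purity(od230: float, od260: float, od280: float) -> dict:
    """Evaluate RNA purity from OD readings at 230, 260 and 280 nm.

    Thresholds follow the text: a 230/260 ratio below 0.5 indicates
    successful polysaccharide removal, and a 260/280 ratio above 1.8
    indicates effective protein removal.
    """
    if od260 <= 0 or od280 <= 0:
        raise ValueError("OD260 and OD280 must be positive")
    ratio_230_260 = od230 / od260
    ratio_260_280 = od260 / od280
    return {
        "230/260": ratio_230_260,
        "260/280": ratio_260_280,
        "polysaccharides_removed": ratio_230_260 < 0.5,
        "proteins_removed": ratio_260_280 > 1.8,
    }


# Hypothetical readings for one sample.
result = rna_purity(od230=0.20, od260=0.50, od280=0.25)
print(result)
```

Such a check covers only the spectrophotometric criteria; the integrity of the RNA and the 28S to 18S rRNA ratio still have to be judged from the agarose gel.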
This example is a typical protocol for the derivatization of 5′-ends of RNA molecules with RNA oligonucleotides. All reactions were performed in a 500 microliter siliconized microtube, using a siliconized tip each time to avoid nucleic acid losses.
The RNA sample was first dephosphorylated. The RNA (for instance 1 nanogram to 1 microgram) was added to a tube, together with 2 micrograms of glycogen, in a total volume of 5 microliters. The reaction buffer was 1/10 the common concentration, or 5 mM Bis-Tris-Propane-HCl, 0.1 mM MgCl2, 0.01 mM ZnCl2, pH 6.0 at 25° C. Glycogen was used to avoid attachment of RNA to the plastic during the operation. The sample was denatured at 65° C. for 5 minutes to expose the phosphate groups to be later removed, and after being held at 37° C. for 2 minutes the Antarctic phosphatase (New England Biolabs) was added (2.5 units). The sample was treated for 3 hours to overnight at 37° C. Overnight dephosphorylation allowed removal of 98-99% of the phosphate groups. Short incubation could also be performed at 45° C. in the presence of trehalose at 0.6 M final concentration, which increased the activity at 45° C.
Then, the Antarctic phosphatase was inactivated at 65° C., but before doing this, the divalent ions had to be chelated. For this reason, 0.55 microliters of a solution of 0.5 M sodium acetate (pH 6.0), 10 mM EDTA, 1% β-mercaptoethanol, and 0.1% Triton X-100 were added. The EDTA chelated the divalent ions and created conditions suitable for the subsequent TAP treatment. The Antarctic phosphatase was also inhibited by EDTA in the buffer. The inactivation was carried out at 65° C. for 5 to 15 minutes.
The foregoing steps were then followed by decapping by simple addition of 0.2 microliters (2 units) of tobacco acid pyrophosphatase (TAP). It was also possible to increase the quantity of TAP up to 20 units per experiment. The reaction was carried out for 2 hours at 37° C., followed by heat inactivation in this buffer at 65° C. for 15 minutes, after which the sample was cooled on ice. Optionally, betaine could also be added (1 M), which helped to melt GC-rich secondary structures in RNAs. After this treatment, the TAP did not degrade ATP anymore. ATP was necessary for the subsequent step. Then, the ligation was carried out by adding a “capping RNA” oligonucleotide of any sequence at a concentration of 5 micromolar. To 6.75 microliters of reaction, 2 microliters of RNA ligase buffer (500 mM HEPES-NaOH (pH 8.0 at 25° C.), 100 mM MgCl2, 100 mM DTT) were added. DTT inhibited the TAP. Optionally, hexammine cobalt chloride (HCC) could also be added at 1 mM concentration, but this was not necessary. Polyethylene glycol (PEG 8000) was then added at a final concentration of 25%, ATP at a concentration of 125 micromolar, and finally 10 units of T4 RNA ligase (Fermentas) were added. Under such conditions, the resulting mixture of previous buffers was not inhibitory for the ligation steps.
The sample was then ligated for 2 hours to overnight (16 hours) at 20° C. At this point, the former Cap structure of the RNA was replaced with an oligonucleotide, and this could be used for different tests as they appeared in other examples, such as full-length cDNA preparation.
The activity of each enzyme used in Example 2 and their buffers were tested by:
(A) Evaluation of the activity of the Antarctic Phosphatase (New England Biolabs). 5′ phosphorylated oligoribonucleotides were dephosphorylated for 120 minutes at 37° C. in the following buffers. The oligoribonucleotides were subsequently radiolabelled with T4 Kinase and gamma-32P-ATP and analysed by PAGE. In the absence of prior dephosphorylation, radiolabelling was impossible due to the 5′ phosphate.
(B) Evaluation of the activity of the Tobacco Acid Pyrophosphatase (TAP) (Epicentre). gamma-32P-ATP was incubated with 2 U TAP in a reaction buffer. The TAP was heated 15 minutes before incubation with radioactive ATP.
(C) Evaluation of the activity of the T4 RNA ligase (Fermentas). A radiolabelled oligoribonucleotide was incubated in the presence of an unlabelled oligoribonucleotide. Ligation results in a shift of the electrophoretic mobility in a polyacrylamide gel.
The sample prepared as above was desalted using a Microcon YM-100 filter as described by the manufacturer (Millipore). To the ligated RNA were added water and reverse transcriptase (RT) primers, which can be obtained from Invitrogen. 800 ng of the primer AGA GAG AGA CCU CGA GCC UAG GUC CGA C were used for a 20 microliter reaction, and 3 microliters of the sorbitol-trehalose mixture (3.3 M stock) were added to give a final concentration of 0.5 M sorbitol and 4% trehalose in the final RT reaction. The RNA-primer mixture was heated for 10 minutes at 65° C. and then stored on ice while the remaining reagents were prepared. Then a premix composed of 11 microliters of 2×GC buffer (described in Carninci, Shiraki et al, Biotechniques, 2002; 32, 984-985, hereby incorporated herein by reference) was added, and then 1 microliter of 10 mM dNTPs stock, and finally, 1 microliter of MMLV reverse transcriptase (RNaseH minus, Fermentas) were added. The GC buffer system was replaced by a buffer as recommended by the manufacturer. To this reaction mixture, the RNA sample was added, and incubated for 2 minutes at 25° C. (to anneal the samples), 30 minutes at 42° C., 10 minutes at 52° C., and 10 minutes at 56° C. before the reaction was stopped. In this way, cDNA that spans the 5′-end of the original mRNAs was obtained at high frequency. This was further purified or processed. For instance, it could be treated with proteinase K (addition of 20 micrograms, together with EDTA at 10 mM final concentration), followed by RNA and proteinase inactivation at 95° C. for 15 minutes. This sample could then be used on CL-4B (Amersham-Pharmacia) to fractionate by size or to eliminate the primers.
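The final concentrations quoted for the RT reaction follow from simple dilution arithmetic. The following Python sketch is offered only as a worked check; the function name is hypothetical, and it assumes that the 3.3 M figure refers to the sorbitol content of the stock and that the final reaction volume is 20 microliters.

```python
def final_concentration(stock: float, added_volume: float, final_volume: float) -> float:
    """Dilution rule C1 * V1 = C2 * V2, solved for the final concentration C2."""
    if final_volume <= 0:
        raise ValueError("final_volume must be positive")
    return stock * added_volume / final_volume


# Assumption: 3.3 M is the sorbitol concentration of the stock mixture.
sorbitol_final_molar = final_concentration(stock=3.3, added_volume=3.0, final_volume=20.0)

# 800 ng of primer in a 20 microliter reaction, expressed in ng per microliter.
primer_ng_per_microliter = 800 / 20.0

print(round(sorbitol_final_molar, 3), primer_ng_per_microliter)
```

Under these assumptions the sorbitol concentration comes to just under 0.5 M, in agreement with the rounded value given in the protocol.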
The cDNA was amplified by PCR. To the cDNA, Takara EX-taq buffer was added at a final concentration of 1×, then dNTPs were added (final concentration: 200 micro molar each), 5′ oligonucleotide (sequence: acc tcg agc cta ggt ccg ac) and 3′ end oligonucleotide (sequence: ca gcg tcc tca agc ggc cgc), each oligonucleotide at 400 nM concentration, MgCl2 at 2.5 mM, and KCl at a final concentration of 50 mM. The components were mixed and then after 5 minutes at 94° C., samples were incubated for 30 seconds at 94° C., 30 seconds at 58° C., and 1.5 minutes at 68° C., for 30 cycles.
This produced 5′-end cDNAs that were complete and could be blunted and cloned following standard techniques into a plasmid vector (see Sambrook et al., supra, for general information about molecular cloning and sequencing).
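The thermal cycling conditions given above for the PCR amplification can be written down as a small data structure, which makes the programmed hold time easy to verify. This is an illustrative Python sketch only; the class and variable names are hypothetical, and the calculation ignores the ramp times between temperatures, which depend on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a thermal cycling program."""
    name: str
    temperature_c: float
    seconds: int


# Conditions as stated in the example: 5 minutes at 94 C, then 30 cycles of
# 30 seconds at 94 C, 30 seconds at 58 C, and 1.5 minutes at 68 C.
INITIAL_DENATURATION = Step("initial denaturation", 94, 5 * 60)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 58, 30),
    Step("extension", 68, 90),
)
CYCLES = 30


def total_hold_seconds(initial: Step, cycle, n_cycles: int) -> int:
    """Sum the programmed hold times, excluding instrument ramp times."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


total_minutes = total_hold_seconds(INITIAL_DENATURATION, CYCLE, CYCLES) / 60
print(total_minutes)
```

The actual run time on a given thermal cycler will be longer than the summed hold times because of heating and cooling between steps.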
The capped RNA was prepared as in Example 2, with the only difference that the RNA oligonucleotide had a different sequence as described below. By using the process of Example 2, followed by PCR, it was possible to amplify 5′-ends by RACE. The experiment was performed as follows: 500 ng of total RNA from liver was subjected to ONE-Tube oligo-capping, followed by the removal of the unreacted oligoribonucleotides, and reverse-transcription with random primers. The 5′ ends were amplified with a gene-specific primer having a sequence of:
and a primer complementary to the oligo-cap having a sequence of:
The cDNA was prepared as in Example 2. However, the oligonucleotides were prepared and designed in order to have the different adaptors at the 3′ and 5′-end of the RNA, respectively:
/5BioTEG/CCUAUCCCCUGUGUGCCUUGCCUAUCCCCUGUUGCGUGUCUCAG Adaptor B was used as an “oligo-capping” sequence, and Adaptor A was used conjugated to an oligo-random primer for the first strand synthesis. After the first strand synthesis, the material was passed through a CL-4B spin column to separate the excess of unreacted primer. Subsequently, the sample was subjected to emulsion PCR and then sequencing reactions as described for the 454 Life Sciences sequencing instrument (Margulies et al, Nature, 2005; 437(7057): 376-380, hereby incorporated herein by reference). This resulted in identifying hundreds of thousands of sequences in a single run.
The cDNA was obtained as in Example 2 and in the subsequent examples, and the sample was processed until the second strand cDNA was obtained by using standard protocols known to a person skilled in the art, such as the one described in Kodzius et al., Nat. Methods. 2006 March; 3(3): 211-22, hereby incorporated herein by reference. The cDNA was then cleaved with MmeI and followed by addition of a second linker, amplification, purification and production of concatamers. Detailed protocols for such procedures are described elsewhere, such as in Kodzius et al., Nat. Methods. 2006 March; 3(3):211-22, hereby incorporated herein by reference. These sequencing tags could then be further used for sequencing and then identifying gene borders (like in Carninci et al, Science. 2005 Sep. 2; 309(5740):1559-63, hereby incorporated herein by reference) and expression profiling, or as a promoter of the genes (Harbers and Carninci, Nat. Methods, 2005 July; 2(7): 495-502, hereby incorporated herein by reference).
The cDNA was obtained as in Example 2 and the subsequent examples, and the sample was processed until the first strand cDNA was obtained by using standard protocols known to a person skilled in the art, such as those described in Kodzius et al., Nat. Methods. 2006 March, 3(3):211-22. Subsequently, the nucleic acids were attached to a solid-phase matrix as in the US patent application Nos. 20060012793, 20060012784, and 20060008824, and instruments based on such technology.