The present invention relates generally to the field of molecular biology. More particularly, it concerns methods for full length RNA sequencing of single-cells.
Single cell sequencing (SCS) has emerged as a powerful new tool for studying rare cells and delineating complex populations. The currently used methods are aimed at capturing the polyadenylated fraction of the transcriptome (~1% of the whole transcriptome). However, most single-cell protocols, such as Cel-Seq and Smart-seq (Hashimshony, T. Genome Biol. 17, 77 (2016) and Picelli, S. et al. Nat. Methods 10, 1096 (2013)), miss important RNA species such as non-polyadenylated long non-coding RNA, tRNA, miRNA, snoRNA and snRNA. The key role of snoRNAs is to guide modifications of RNA, whilst most snRNAs are essential parts of the spliceosome. Hence, these RNA species play a crucial role in RNA structure and in the generation of different isoforms resulting in the translation of proteins with distinct functions.
Recently, Hayashi et al. (Nat. Commun. 9, (2018)) described an SCS method, named RamDa-seq, for detecting non-polyadenylated long non-coding RNA. However, this method does not allow small RNA species to be captured and simultaneously read out in single cells. Furthermore, it is not possible to tag the RNA molecules with barcodes and unique molecular identifiers (UMIs), which makes it difficult to perform high-throughput sequencing and molecule counting using UMIs.
Thus there remains a need for new methods that allow high throughput and full length RNA sequencing of single cells.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. One skilled in the art will recognize many methods and materials similar or equivalent to those described herein, which could be used in the practice of the present invention. Indeed, the present invention is in no way limited to the methods and materials described.
For purposes of the present invention, the following terms are defined below.
Methods of carrying out the conventional techniques used in methods of the invention will be evident to the skilled worker. The practice of conventional techniques in molecular biology, biochemistry, computational chemistry, cell culture, recombinant DNA, bioinformatics, genomics, sequencing and related fields is well known to those of skill in the art and is discussed, for example, in the following literature references: Sambrook et al. Molecular Cloning. A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989; Ausubel et al. Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1987 and periodic updates; and the series Methods in Enzymology, Academic Press, San Diego.
“A,” “an,” and “the”: these singular form terms include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a cell” includes a combination of two or more cells, and the like.
As used herein, the term “about” is used to describe and account for small variations. For example, the term can refer to less than or equal to ±10%, such as less than or equal to ±5%, less than or equal to ±4%, less than or equal to ±3%, less than or equal to ±2%, less than or equal to ±1%, less than or equal to ±0.5%, less than or equal to ±0.1%, or less than or equal to ±0.05%. Additionally, amounts, ratios, and other numerical values are sometimes presented herein in a range format. It is to be understood that such range format is used for convenience and brevity and should be understood flexibly to include numerical values explicitly specified as limits of a range, but also to include all individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly specified. For example, a ratio in the range of about 1 to about 200 should be understood to include the explicitly recited limits of about 1 and about 200, but also to include individual ratios such as about 2, about 3, and about 4, and sub-ranges such as about 10 to about 50, about 20 to about 100, and so forth.
“And/or”: the term “and/or” refers to a situation wherein one or more of the stated cases may occur, alone or in combination with at least one of the stated cases, up to with all of the stated cases.
“Comprising”: this term is construed as being inclusive and open ended, and not exclusive. Specifically, the term and variations thereof mean the specified features, steps or components are included. These terms are not to be interpreted to exclude the presence of other features, steps or components.
“Primer based amplification” refers to a polynucleotide amplification reaction, namely, a population of polynucleotides that are replicated from one or more starting sequences, i.e. a primer. A suitable primer may have a sequence length of 15-30 nucleotides. Amplifying may refer to a variety of amplification reactions, including but not limited to polymerase chain reaction (PCR), linear polymerase reactions, nucleic acid sequence-based amplification, rolling circle amplification and the like.
“Sequencing” refers to determining the order of nucleotides (base sequences) in a nucleic acid sample.
“High throughput sequencing technologies”, also referred to in the art as next generation sequencing, such as offered by Roche, Illumina and Applied Biosystems, or also referred to in the art as third generation sequencing, as described by David J Munroe & Timothy J R Harris in Nature Biotechnology 28, 426-428 (2010) and such as offered by Pacific Biosciences and Oxford Nanopore Technologies, may also be used. Such technologies allow multiple sequence reads to be obtained from one DNA sample in a single run. For example, the number of sequence reads may range from several hundred up to billions of reads in a single run of a high throughput sequencing technology. High throughput sequencing technologies may be performed according to the manufacturer's instructions (as e.g. provided by Roche, Illumina or Applied Biosystems). The technology may involve the preparation of DNA before carrying out a sequencing run. Such preparation may include ligation of adaptors to DNA. Adaptors may include identifier sequences to distinguish between samples. Depending on the size of DNA that is suitable or compatible with the high throughput sequencing technology used, the DNA that is to be sequenced may be subjected to a fragmenting step.
“Size selection” according to the invention involves techniques with which particular size ranges of molecules, e.g. (ligated) DNA fragments or amplified (ligated) DNA fragments, are selected. Techniques that can be used are, for instance, gel electrophoresis, size exclusion and gel extraction chromatography, but are not limited thereto; any technique with which molecules of a particular size can be selected or excluded will suffice.
The term “PCR” encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, assembly PCR and the like. Reaction volumes range from a few hundred nanoliters, e.g., 200 nL, to a few hundred microliters, e.g., 200 microliters. “Reverse transcription PCR,” or “RT-PCR,” means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, e.g., Tecott et al., U.S. Pat. No. 5,168,038. “Real-time PCR” means a PCR for which the amount of reaction product, i.e., amplicon, is monitored as the reaction proceeds. There are many forms of real-time PCR that differ mainly in the detection chemistries used for monitoring the reaction product, e.g., Gelfand et al., U.S. Pat. No. 5,210,015 (“Taqman”); Wittwer et al., U.S. Pat. Nos. 6,174,670 and 6,569,627 (intercalating dyes); Tyagi et al., U.S. Pat. No. 5,925,517 (molecular beacons). Detection chemistries for real-time PCR are reviewed in Mackay et al., Nucleic Acids Research, 30:1292-1305 (2002). “Nested PCR” means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, “initial primers” in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and “secondary primers” mean the one or more primers used to generate a second, or nested, amplicon. “Multiplexed PCR” means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g. Bernard et al. (1999) Anal. Biochem., 273:221-228 (two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified.
“Quantitative PCR” means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Techniques for quantitative PCR are well-known to those of ordinary skill in the art, as exemplified in the following references: Freeman et al., Biotechniques, 26:112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17:9437-9447 (1989); Zimmerman et al., Biotechniques, 21:268-279 (1996); Diviacco et al., Gene, 122:3013-3020 (1992); Becker-Andre et al., Nucleic Acids Research, 17:9437-9446 (1989); and the like.
As used herein, the term “adapter” is a single-stranded, double-stranded, partly double-stranded, Y-shaped or hairpin nucleic acid molecule that can be attached, preferably ligated, to the end of other nucleic acids, and preferably has a limited length, e.g., about 10 to about 200, or about 10 to about 100 bases, or about 10 to about 80, or about 10 to about 50, or about 10 to about 30 bases in length, and is preferably chemically synthesized. The optionally double-stranded structure of the adapter may be formed by two distinct oligonucleotide molecules that are base paired with one another, or by a hairpin structure of a single oligonucleotide strand.
As demonstrated herein, we have developed “Vast transcriptome Analysis in Single cells by A-tailing” (VASA-seq), a novel single-cell method for whole transcriptome and full-length analysis. This method can be used, for example, on single cells after FACS sorting, enabling multicolor antibody labeling of cell types (such as HSPCs) and recording of FACS-index data. VASA-seq detects full-length isoforms of mRNA and long noncoding RNA with reduced technical noise due to the addition of UMIs to each fragment. It also provides strand information and, most importantly, is able to capture snoRNA and snRNA from the same cell. The method described herein allows better understanding of cell-to-cell heterogeneity due to broader detection of different RNA species in single cells.
The novelty is to perform fragmentation, end repair and poly-A tailing directly at the single cell level. This way, each fragment (and RNA species that naturally lack poly-A tails such as lncRNA, snRNA and snoRNA) can be primed with a barcoded poly-T primer. In some embodiments, the poly-T primer has a barcode and an UMI (unique molecular identifier), and it is thus possible to achieve much higher throughput in terms of the number of cells that can be processed compared to RamDa-seq (Hayashi et al., supra). RamDa-seq also lacks the UMI (which can only be added in combination with a barcode), which is very important for noise reduction of the sequencing data. VASA-seq also exhibits strand specificity because it is always the same end of the RNA fragments that gets poly-A tailed and primed. This feature is lacking in RamDa-seq and Smart-seq2 (a method for full-length RNA sequencing of single cells, not full transcriptomics).
Thus, the methods described herein are suitable for generating NGS libraries corresponding to any RNA starting material of interest and are not limited to polyadenylated RNAs. For example, the subject methods may be used to generate NGS libraries from non-polyadenylated RNAs, including microRNAs, small RNAs, siRNAs, and/or any other type of non-polyadenylated RNA of interest. The methods also find use in generating strand-specific information, which can be helpful in determining allele specific expression or in distinguishing overlapping transcripts in the genome.
In a first aspect, there is provided a method for processing an RNA sample, the method comprising the steps of:
In one embodiment, the method provides for producing a cDNA comprising an identifier sequence (barcode) and a unique molecular identifier (UMI), the method comprising the steps:
After the fragmentation step, the sample mixture will contain polyadenylated RNA and non-polyadenylated RNAs. Polyadenylation is the addition of a poly(A) tail to the fragmented RNA. It is an objective of the invention to polyadenylate the non-polyadenylated RNA so that these RNAs can be sequenced. Adenylation of RNA may be performed using any convenient approach. According to certain embodiments, the adenylation is performed enzymatically, e.g., using Poly(A) polymerase or any other enzyme suitable for catalyzing the incorporation of adenine residues at the 3′ terminus of the precursor RNA. Reaction mixtures for carrying out the adenylation reaction may include any useful components, including but not limited to, a polymerase, a buffer (e.g., a Tris-HCl buffer), one or more metal cations (e.g. MgCl2, MnCl2, or combinations thereof), a salt (e.g., NaCl), one or more enzyme-stabilizing components (e.g., DTT), ATP, and any other reaction components useful for facilitating the adenylation of a precursor RNA. The adenylation reaction may be carried out at a temperature (e.g., 30° C.-50° C., such as 37° C.) and pH (e.g., pH 7-pH 8.5, such as pH 7.9) compatible with the polymerase being employed, e.g., polyA polymerase. Other approaches for adding nucleotides to a precursor RNA include ligation-based strategies, where an RNA ligase (e.g., T4 RNA ligase) catalyzes the covalent joining of a defined sequence to an end (e.g., the 3′ end) of the precursor RNA to produce a template RNA.
In one embodiment, the nucleotide sequence that is capable of hybridizing to nucleic acids is designed to hybridize to the poly-A tail of mRNA, such as a poly-T sequence. In one embodiment, the poly-T sequence and/or analogues thereof or a combination thereof comprise at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30 or 40 nucleotides.
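The tailing and priming logic described above can be illustrated with a toy in-silico sketch. This is an illustration only, not a model of the enzymatic reaction: the function names, the example fragment sequence and the default tail length of 20 are assumptions introduced here, and hybridization is reduced to a simple check for a run of adenines at the 3′ end.

```python
def poly_a_tail(fragment: str, tail_length: int = 20) -> str:
    """Return the RNA fragment with `tail_length` adenines appended at its 3' end.

    The default tail length is an arbitrary illustrative value.
    """
    return fragment + "A" * tail_length


def can_prime(rna: str, poly_t_length: int) -> bool:
    """True if the 3' end of `rna` carries at least `poly_t_length` consecutive A's,
    i.e. a poly-T primer of that length has a fully complementary stretch to anneal to."""
    return poly_t_length > 0 and rna.endswith("A" * poly_t_length)


# Hypothetical non-polyadenylated fragment (5'->3'), for illustration only.
fragment = "GAUACUUACCUGGCAGG"
tailed = poly_a_tail(fragment, tail_length=20)

print(can_prime(fragment, 6))   # False: no poly-A stretch at the 3' end
print(can_prime(tailed, 6))     # True: the added tail offers a priming site
```

In this simplified picture, a fragment that could not be primed by a poly-T sequence becomes primable once it has been tailed, which is the purpose of the polyadenylation step.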
As used herein, a “barcode” refers to a nucleic acid sequence that is preferably used to identify the cell or batch origin of nucleic acid after amplification and sequencing processes. The unique barcode sequence allows each cell's or batch's nucleic acids (genome or transcriptome) to be associated with the original cell/batch. In another embodiment the barcode sequence is used to trace back the genome to each cell. According to one embodiment, the barcode sequence comprises at least 2 nucleotides or alternatively, more than 2 nucleotides, or alternatively, at least 4 nucleotides, or alternatively, at least 6 nucleotides, or alternatively, at least 8 nucleotides, or alternatively, at least 10 nucleotides, or alternatively, at least 12 nucleotides, or alternatively, at least 14 nucleotides, or alternatively, at least 20 nucleotides, or alternatively, at most 8 nucleotides, or alternatively, more than 8 nucleotides, or alternatively, at most 10 nucleotides, or alternatively, at most 14 nucleotides, or alternatively, at most 20 nucleotides. In some embodiments, first strand synthesis is primed with an (anchored) oligo dT primer (or potentially with a randomer or a combination of the two) that is appended with a barcode, an amplification primer binding site, and optionally a template switch (TS) primer sequence.
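As a rough guide to how barcode length relates to the number of cells or batches that can be told apart, the following sketch computes the theoretical number of distinct sequences for a given length (four possible bases per position). The function names are introduced here for illustration. In practice fewer barcodes are usually used than this upper bound, for example to keep barcodes distinguishable despite sequencing errors; that consideration is not modeled.

```python
def barcode_capacity(length: int) -> int:
    """Upper bound on the number of distinct barcodes of a given length (A, C, G, T)."""
    return 4 ** length


def min_barcode_length(n_cells: int) -> int:
    """Smallest barcode length whose theoretical capacity covers `n_cells` cells."""
    length = 1
    while barcode_capacity(length) < n_cells:
        length += 1
    return length


print(barcode_capacity(2))      # 16
print(barcode_capacity(8))      # 65536
print(min_barcode_length(384))  # 5, since 4**4 = 256 < 384 <= 4**5 = 1024
```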
In addition or alternatively, the prepared sequencing library and/or the processed RNA sample may comprise an UMI. Hence, the prepared sequencing library and/or the processed RNA may comprise at least one of a barcode and an UMI. In an embodiment, the barcode can be preceded or followed by a second barcode that is a molecular barcode (unique molecular identifier, or “UMI”) that would allow for the detection of PCR duplicates. UMI sequences have been described in the art, such as by Kivioja et al., 2012, Nat Methods 9: 72-74. The UMI sequence is a random sequence which may be added to quantify absolute numbers of each transcript molecule and eliminate amplification biases. Thus, in some embodiments, first strand synthesis as performed in step d) is primed with an oligo dT that has been appended with a barcode and an UMI. It will be appreciated that the order of the barcode and UMI on the first strand synthesis primer can be varied. For example, in some embodiments, the barcode (BC) is positioned 3′ to the UMI. In some embodiments, the barcode is positioned 5′ to the UMI. In some embodiments, the barcode is directly contiguous with the UMI. In some embodiments, the barcode is separated from the UMI by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more than 10 nucleotides. In some embodiments, the sample barcode overlaps with the UMI by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more than 10 nucleotides.
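How a barcode and an UMI can be used together for molecule counting may be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis pipeline of the invention: the read layout (a 6-nucleotide UMI directly followed by an 8-nucleotide barcode at the start of the read), the constant and function names, and the example records are all hypothetical, and UMIs are compared by exact match without error correction.

```python
from collections import defaultdict

UMI_LEN = 6      # hypothetical UMI length
BARCODE_LEN = 8  # hypothetical barcode length


def split_tag(read: str):
    """Split the start of a read into (barcode, umi), assuming the UMI comes first
    and the barcode is directly contiguous with it."""
    umi = read[:UMI_LEN]
    barcode = read[UMI_LEN:UMI_LEN + BARCODE_LEN]
    return barcode, umi


def count_molecules(records):
    """Count distinct UMIs per (barcode, gene).

    `records` is an iterable of (read_sequence, gene) pairs. Reads sharing barcode,
    gene and UMI are treated as PCR duplicates of one molecule and counted once.
    """
    seen = defaultdict(set)
    for read, gene in records:
        barcode, umi = split_tag(read)
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


records = [
    ("AAACGT" + "ACGTACGT" + "TTTT", "GeneA"),  # cell 1, molecule 1
    ("AAACGT" + "ACGTACGT" + "TTTT", "GeneA"),  # PCR duplicate of the read above
    ("CCCGTA" + "ACGTACGT" + "TTTT", "GeneA"),  # cell 1, molecule 2
    ("AAACGT" + "TTGGCCAA" + "TTTT", "GeneA"),  # cell 2, molecule 1
]
print(count_molecules(records))
# {('ACGTACGT', 'GeneA'): 2, ('TTGGCCAA', 'GeneA'): 1}
```

Four reads collapse to three molecules here, which illustrates how UMIs remove amplification duplicates while the barcode keeps the counts of different cells apart.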
In one embodiment of the invention, the RNA sample is a cellular RNA sample, preferably from a single cell, more preferably wherein the RNA is isolated from a cell nucleus. The invention is not limited to any specific cell type. Preferably, the cell is a nucleated cell. Preferably, the cell is a mammalian cell, preferably a human cell. A preferred human cell can be at least one of a tumor cell, an embryonic cell and a brain cell.
In an embodiment, the RNA sample comprises non-polyadenylated RNA that can become polyadenylated in the method of the invention. Preferably, the non-polyadenylated RNA that becomes polyadenylated, and optionally sequenced, in the method of the invention is a non-coding RNA. Preferably, the non-coding RNA is selected from the group consisting of long non-coding RNA (lncRNA), tRNA, miRNA, snoRNA and snRNA. Preferably, the non-coding RNA is a small non-coding RNA, preferably at least one of a miRNA, a snoRNA and a snRNA.
In one embodiment, the RNA sample is a cellular RNA sample, preferably the RNA sample is the total RNA of a single cell. In another embodiment, the RNA is isolated from a cell nucleus, preferably from a single cell nucleus.
As used herein, a “single cell” refers to one cell. Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Furthermore, in general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic single celled organisms including bacteria, fungi or yeast.
In some embodiments of the method described herein, obtaining the RNA sample can include the step of first obtaining single cells and then lysing the cells to release the RNA.
A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually. For example, a 96-well plate can be used, such that each single cell is placed in a single well. Methods for manipulating single cells are known in the art and include fluorescence activated cell sorting (FACS), micromanipulation and the use of semi-automated cell pickers (e.g. the Quixell™ cell transfer system from Stoelting Co.). Individual cells can, for example, be individually selected based on features detectable by microscopic observation, such as location, morphology, or reporter gene expression.
Methods of lysing the cells or single cell to release the RNA are well known in the art. Lysis can be achieved by, for example, heating the cells, or by the use of detergents or other chemical methods, or by a combination of these. However, any suitable lysis method known in the art can be used. Preferably, the lysis step is performed by at least one of
In one embodiment of the invention, the fragmentation step b) is performed using a divalent metal cation at a temperature between about 55-100° C.
Methods to fragment RNA are known to the skilled person. Common fragmentation methods include enzymatic fragmentation, nebulization, RNA hydrolysis and heat digestion of the RNA with a divalent metal cation, i.e. exposure to divalent cations at elevated temperature. In one embodiment of the invention, the fragmentation of RNA in step b) is performed by heat digestion of the RNA with a divalent metal cation. Preferably, the heat digestion is performed at a temperature between about 55-100° C., more preferably the heat digestion is performed at a temperature between about 65-85° C. In another embodiment of the invention the divalent metal cation is selected from the group consisting of Mg2+, Mn2+, Ca2+ and Zn2+. Preferably, the divalent metal cation is Mg2+ or Mn2+.
In another embodiment of the invention, fragmentation of RNA in step b) can be carried out by chemical reactions including, for example, hydrolysis reactions such as base and acid hydrolysis.
For example, in particular embodiments, RNA can be fragmented under alkaline conditions because RNA is unstable under alkaline conditions. See, e.g., Nordhoff et al. (1993) “Ion stability of nucleic acids in infrared matrix-assisted laser desorption/ionization mass spectrometry”, Nucl. Acids Res., 21(15):3347-57. In one embodiment of the invention, RNA is fragmented by hydrolysis using Na2CO3.
Fragmentation of the RNA can result in a phosphate group at the 3′ end of the fragmented RNA. In one embodiment, end-repair of the fragmented RNA might be required before polyadenylation can take place. The end-repair step replaces the phosphate group that is created by RNA fragmentation at the 3′ end of the RNA strand with an OH group, so that the fragmented RNA can be subjected to polyadenylation. End-repair can be performed by methods known in the art, including for example, by using a polynucleotide kinase, preferably the T4 Polynucleotide Kinase (PNK) and a source of phosphate such as ATP.
In one embodiment, the method of the invention further comprises sequencing of the cDNA obtained in step d). In one embodiment, next generation sequencing (NGS) is used, such as offered by Roche, Illumina and Applied Biosystems. Third generation sequencing, as described by David J Munroe & Timothy J R Harris in Nature Biotechnology 28, 426-428 (2010) and such as offered by Pacific Biosciences and Oxford Nanopore Technologies, may also be used.
In one embodiment, the method of the invention further comprises one or more of the following steps:
As used herein, “in vitro transcription” or “IVT” refers to the process whereby transcription occurs in vitro in a non-cellular system to produce “synthetic RNA molecules”. In one embodiment, the method of the invention comprises in vitro transcription (step f) of the cDNA obtained in step d), whereby amplified RNA (aRNA) is obtained. By “amplified RNA” it is meant that for each initial source of nucleic acid, multiple corresponding RNAs are produced.
The person skilled in the art straightforwardly understands that there can be additional or alternative methods to amplify the cDNA obtained in step d), such as, but not limited to, polymerase chain reaction (PCR). As a non-limiting example, PCR may be used instead of, or in addition to, IVT when e.g. an adapter is ligated at the 5′ site of the, optionally poly-adenylated, RNA of step a), b) or c) and/or an adapter may be ligated to the cDNA obtained in step d). This adapter may comprise a primer binding site for amplifying the cDNA. A second primer binding site may be located in the poly-T primer and/or in any further adapter ligated to the other site of the cDNA molecule, such that the cDNA molecule is flanked by primer binding sites for amplifying the cDNA.
Ribosomal RNA (rRNA) comprises about 95% to about 98% of the RNA in a cell, and its presence can complicate the analyses of RNA molecules of interest in a sample. In one embodiment, the method of the invention thus includes a step of rRNA depletion. The skilled person straightforwardly understands that the cDNA molecule obtained in step d) of the method of the invention can be further extended with any preferred additional nucleotide sequence, preferably to extend the cDNA molecule of step d) with one or more universal sequences as specified herein. Preferably, the cDNA molecule in the sequencing library comprises on one site of the molecule at least an UMI obtained by reverse transcription as specified in step d), and at the other site any preferred additional nucleotide sequence, such as, but not limited to, one or more universal sequences, a barcode and/or an UMI.
For example, at different steps of the method as specified herein, there may be one or more adapters ligated to the cDNA and/or RNA molecule. In addition or alternatively, additional nucleotide sequences may be added during the reverse transcription and/or amplification steps by incorporating these sequences in the primer used for respectively reverse transcription and/or amplification.
In an embodiment, the method of the invention further comprises a step h) of ligating an oligonucleotide adapter to the aRNA obtained in step f). Preferably, the adapter-ligated RNA molecule comprises at one site of the molecule at least an UMI obtained by reverse transcription as specified in step d), and at the other site an adapter obtained in step h). Preferably, the adapter comprises one or more universal sequences as defined herein. The adapter may further comprise at least one of a barcode and an UMI.
Alternatively or in addition, the adapter can be ligated to at least one of:
In an embodiment, the adapter is ligated to the RNA provided in step a). Preferably, the adapter is ligated to the 5′ end of the RNA molecule. Preferably, the adapter comprises one or more universal sequences as defined herein. The adapter may further comprise at least one of a barcode and an UMI. Optionally, the 5′ end is phosphorylated prior to adapter ligation.
In an embodiment, the adapter is ligated to the cDNA obtained in step d). Preferably, the adapter-ligated cDNA comprises on one site of the molecule an UMI obtained by reverse transcription as specified in step d), and comprises at the other site an adapter. Preferably, the adapter comprises one or more universal sequences as defined herein. The adapter may further comprise at least one of a barcode and an UMI.
In an embodiment, the adapter is ligated to the double-stranded cDNA obtained in step e). Preferably, the adapter-ligated double-stranded cDNA molecule comprises at one site of the molecule an UMI obtained by reverse transcription as specified in step d), and at the other site an adapter. Preferably, the adapter comprises one or more universal sequences as defined herein. The adapter may further comprise at least one of a barcode and an UMI.
In an embodiment, the method of the invention further comprises a step i) of performing reverse transcription of the aRNA. In an embodiment, the adapter is ligated to the cDNA molecule obtained in step i). Preferably, the adapter-ligated cDNA comprises on one site of the molecule an UMI obtained by reverse transcription as specified in step d), and comprises at the other site an adapter. Preferably, the adapter comprises one or more universal sequences as defined herein. The adapter may further comprise at least one of a barcode and an UMI.
In an embodiment, any preferred additional sequence, preferably any universal sequence, is present in a primer used in step e) to perform the second-strand synthesis. The primer may further comprise at least one of a barcode and an UMI.
In an embodiment, any preferred additional sequence, preferably any universal sequence, is present in a primer used in step i) to perform reverse transcription. The primer may further comprise at least one of a barcode and an UMI.
In an embodiment, any preferred additional sequence, preferably any universal sequence, is present in a primer used in step k) to amplify the cDNA. The primer may further comprise a barcode.
In an embodiment, the method of the invention further comprises a step j) of degrading the remaining aRNA.
In an embodiment, the method of the invention further comprises a step k) of amplifying the cDNA samples to generate a cDNA library comprising double-stranded cDNA.
In an embodiment, the method of the invention further comprises a step l) of selecting by size the cDNA library obtained in step k).
In an embodiment, the method of the invention further comprises a step m) of sequencing at least one of:
In one embodiment, the method of the invention further comprises the following steps:
The adapters that are added to the 5′ and/or 3′ end of a nucleic acid can comprise a universal sequence.
In an embodiment, the cDNA molecules of the sequencing library, e.g. the cDNA molecules obtained by the method of the invention, are flanked by one or more universal sequences.
A universal sequence is a region of nucleotide sequence that is common to, i.e., shared by, two or more nucleic acid molecules. Optionally, the two or more nucleic acid molecules also have regions of sequence differences. Thus, for example, the 5′ adapters can comprise identical or universal nucleic acid sequences and the 3′ adapters can comprise identical or universal sequences. A universal sequence that may be present in different members of a plurality of nucleic acid molecules can allow the replication or amplification of multiple different sequences using a single universal primer that is complementary to the universal sequence. Some universal primer sequences used in examples presented herein include the V2.A14 and V2.B15 sequences. However, it will be readily appreciated that any suitable adapter sequence can be utilized in the methods presented herein. The universal sequences may comprise a binding site for a sequencing primer, preferably a binding site for a deep-sequencing primer.
Methods to degrade the remaining aRNA are known in the art and include, for example, degrading the RNA with a DNase-free RNase, such as RNase A.
Optional step k) is carried out to prepare a standard NGS library, including ligating sequencing adapters to the DNA templates for direct sequencing.
In one embodiment of the invention, the size of the size-selected PCR products is between 150 bp and 1000 bp, preferably between 300 bp and 450 bp.
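The effect of such a size window can be illustrated computationally. The sketch below filters a list of fragment lengths (in bp) to the stated ranges; the function name and the example lengths are assumptions introduced here, the bounds are treated as inclusive, and physical size selection (e.g. on a gel or with beads) is of course less sharp than this cut-off.

```python
def select_by_size(lengths_bp, min_bp=150, max_bp=1000):
    """Keep only fragment lengths within [min_bp, max_bp], bounds inclusive."""
    return [n for n in lengths_bp if min_bp <= n <= max_bp]


# Hypothetical fragment lengths of PCR products, in bp.
lengths = [100, 150, 320, 450, 1000, 1200]

print(select_by_size(lengths))            # [150, 320, 450, 1000]
print(select_by_size(lengths, 300, 450))  # [320, 450]
```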
In a second aspect, the invention provides for a cDNA fragment comprising a barcode and a UMI obtainable by the methods according to the invention.
In one embodiment, the cDNA fragment of the invention is further processed to be sequenced either by Sanger sequencing or by NGS as described above.
1. Sorting and Lysis
2. Fragmentation
3. End repair
4. Poly-A Tailing
5. cDNA Synthesis/Reverse Transcription (RT)
6. Second Strand Synthesis
7. Pool&Cleanup
8. IVT
9. DNA cleanup
EXO-SAP (to remove primers):
10. Pool&Cleanup
1. rRNA Depletion
Make Hyb-Mix (4+1 Reactions)
Make RNase-Mix (4+1 Reactions)
Make DNase-Mix (4+1 Reactions)
2. Pool&Cleanup
3. Adapter Ligation
4. 2nd cDNA Synthesis
5. Strand Degradation
6. PCR Amplification:
Add to each tube:
Amplify the tube in the thermal cycler using the following PCR cycling conditions:
7. Size Selection
8. Bead Cleanup
Number | Date | Country | Kind |
---|---|---|---|
18203188.0 | Oct 2018 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/079512 | 10/29/2019 | WO | 00 |