The present invention is comprised in the field of molecular biology and nanotechnology and relates to a method for producing miRNA libraries for massive parallel sequencing by applying nanotechnology for reducing biases, increasing efficiency, and reducing costs.
Eukaryotic cells (and some viruses) produce small non-coding RNA molecules (19 to 25 nucleotides in their mature forms) which regulate the expression of a number of genes. In humans, it has been estimated that 30% of the genome is regulated by microRNAs. microRNAs mainly act on messenger RNAs in the cytoplasm, recognizing specific sequences of UTRs (untranslated regions) through which they reduce the frequency of translation and the half-life of the messenger. Molecular functioning of a certain complexity requires the involvement of protein structures in the cytoplasm (such as the RNA-induced silencing complex, RISC).
miRNAs play a relevant role in processes as important as cell proliferation, apoptosis, differentiation, or energy metabolism. miRNA biogenesis is subjected to strict spatial and temporal control. miRNA deregulation is associated with most chronic pathological processes in humans (including cancer, diabetes, endothelial dysfunction, and neurodegenerative diseases). The stability of miRNA in blood and its relevance in chronic pathological processes suggest that markers indicating the presence, degree, and prognosis of a disease could be found within the population of circulating miRNAs. In this sense, there is an important field of research in cancer (included in the concept of liquid biopsy) and, developed to a lesser extent, in type II diabetes and neurodegenerative diseases.
There are unresolved methodological barriers which limit miRNA analysis, as well as the existence of biases not taken into consideration that lead to the publication of contradicting results and the lack of reproducibility of many studies. Methodological problems originate from two characteristics of miRNAs:
There are three methodological groups to address miRNA analysis: (1) specific miRNA sequence analysis by means of RT-PCR (reverse transcription—PCR), (2) microarray hybridization, and (3) miRNA-seq (massive parallel sequencing).
The study of circulating miRNAs as cancer biomarkers presents additional challenges. Blood concentrations are much lower than in tumor tissue, but furthermore, the proportion of circulating miRNA originating from neoplastic cells may vary greatly since it depends on the volume of the tumor mass and on the stage of the cancer. Moreover, certain miRNA species may originate from exosomes released by minority neoplastic populations that are, however, of great relevance (such as tumor-initiating cells or cancer stem cells).
Massive parallel miRNA sequencing (miRNA-seq) should be powerful enough to successfully address the challenges of circulating miRNA. However, results are not reproducible between the different methodologies and platforms and worsen when applied to the peculiarities of circulating miRNA. Accordingly, the “classic and obsolete” microarray is still the method of choice for massive miRNA sequence analysis.
The problems of reproducibility do not originate from massive sequencing per se, but rather from the level of massive sequencing library (miRNA-seq library) production.
There are several methods to produce miRNA-seq libraries. All these methods use one- or two-step ligation reactions. The ligation reaction introduces biases given that the probability of attaching two molecules (DNA or RNA) depends on their sequences. Accordingly, when producing libraries by means of ligation reactions, a significant alteration of relative frequencies may happen within the population of molecules to be sequenced (overestimating certain sequences and underestimating others). Ligation bias is a yet-to-be-resolved methodological problem, the error of which is very hard to quantify.
Furthermore, in miRNA, biases associated with the ligation reaction increase considerably for two reasons:
Therefore, it is necessary to develop powerful techniques to perform an accurate genetic analysis, and these techniques must be accurate enough so that circulating miRNA can reflect, through its composition, incipient tumors or tumor subpopulations of clinical relevance.
The authors of the present invention have developed a new methodology to create miRNA libraries. The new methodology applies nanotechnology to the process, which allows adding (actually elongating) adapters by means of DNA polymerase, avoiding ligation reaction and problems associated therewith (reduced efficiency and biases).
Therefore, a first aspect of the invention relates to a method for obtaining a massive sequencing library with the complementary DNA, cDNA, of a population of miRNAs of interest, which comprises:
Step (a) consists of the specific capturing of the genetic material on the surface of the colloidal tools.
The particle bears on its surface an oligonucleotide covalently attached at its 5′ end (in the example of
Therefore, in a preferred embodiment of this aspect of the invention, the magnetic particle is characterized in that:
Ideally, the poly(A) tail added to the 3′ end of the miRNAs is 20-30 nucleotides long. The reaction time and the amount of enzyme can be adjusted so that the poly(A) tail falls within the desired range. To stop the reaction, heating for 10 minutes at 65° C. is sufficient to denature the enzyme.
Once the tailing reaction is performed, colloidal tools which capture the population of miRNAs as a result of a single-stranded sequence of (ideally between 18 and 20) thymines are added. The oligonucleotide (DNA) attached to the particle bears, in its 3′ half, the poly(T) sequence and, in its 5′ half, the sequence of one of the massive sequencing adapters (
The colloidal tools can capture the miRNAs in the same buffer in which the tailing reaction was performed (i.e., the reaction product does not need to be purified by means of chemical methods). Capturing (by means of hybridization between the poly(A) tail of the miRNAs and the poly(T) sequence of the particles) is performed for 2-4 hours under stochastic stirring and at a temperature of 55° C.
In the example of the invention, after creating the amide bond, the magnetic particles are settled on a magnet, washed twice with 200 mM NaOH, and incubated in the same alkaline solution for 30 minutes. The sudden increase in pH caused by the NaOH solution has two functions:
Once the alkaline incubation is performed, pH is re-equilibrated by means of two washings in 100 mM Tris-HCl buffer at pH 7.4 and the particles are resuspended in 10 mM Tris-HCl buffer at pH 7.4.
In the example of the invention, the sequence of the oligonucleotide covalently attached to the particle consists of:
Lastly, the particle is hybridized with a complementary oligonucleotide of the Tc-P1 sequence (in a suitable buffer) and washed in 10 mM Tris-HCl at pH 7.4 to remove excess non-hybridized oligonucleotide. This optional step improves capturing efficiency.
In step (b) or reverse transcription on the particles, the attachment between the oligonucleotide of the particles and the population of miRNAs is used for priming the reverse transcription reaction. For this reason, it is advisable for the length of the poly(T) tail of the oligonucleotide attached to the particles to be shorter than the poly(A) tail of the miRNAs (which thereby ensures that the 3′ ends of the oligonucleotide of the particles remain hybridized to the poly(A) tail of the miRNAs, increasing reverse transcription efficiency).
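The priming geometry described above can be sketched in a few lines of Python. This is only an illustrative model, not the procedure of the invention: the adapter sequence, the function names, and the `gap` parameter (the number of tail adenines left between the miRNA body and the 3′ end of the annealed poly(T)) are assumptions introduced here, and perfect base pairing is assumed.

```python
# Hypothetical placeholder for the first sequencing adapter (not from the source).
ADAPTER_1 = "ACGTACGTAC"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def rna_to_cdna(rna):
    """Return the reverse complement of an RNA sequence, written as DNA 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna))


def on_particle_cdna(mirna, poly_a_len, poly_t_len, gap=0):
    """Model the strand on the particle after reverse transcription.

    The poly(T) primer must fit entirely inside the poly(A) tail so that
    its 3' end stays paired; otherwise the model refuses to prime.
    """
    if gap < 0 or gap + poly_t_len > poly_a_len:
        raise ValueError("poly(T) does not fit within the poly(A) tail")
    primer = ADAPTER_1 + "T" * poly_t_len
    # The polymerase copies the remaining tail adenines, then the miRNA body.
    template = mirna + "A" * gap
    return primer + rna_to_cdna(template)


if __name__ == "__main__":
    # Illustrative 22-nt RNA sequence, 25-nt tail, 18-nt poly(T), 3-nt gap.
    print(on_particle_cdna("UAGCUUAUCAGACUGAUGUUGA", 25, 18, gap=3))
```

In this model the T run in the product is `poly_t_len + gap` long, which is one way to picture why the homopolymer tracts in the final library can vary in size from molecule to molecule.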
The particles attached to miRNAs are resuspended in a reverse transcription reaction medium (any commercial reverse transcription kit can be used), and the reaction is carried out (conventionally) by means of a 30-60 minute incubation at 42° C. Right before adding the reverse transcriptase, it is advisable to heat the mixture at 70-80° C. for 5-10 minutes in order to remove secondary RNA structures that may reduce reverse transcription efficiency.
After the reverse transcription reaction, alkaline washing is performed on the particles (step (c)), and this washing has two functions:
As a result, the magnetic particles have covalently attached thereto at 5′ single-stranded DNA molecules bearing:
In step (d), blocking of the particle, the oligonucleotides attached to the colloidal tool must be in excess in order to increase miRNA capturing and reverse transcription efficiency. Accordingly, after steps (a)-(c), a high proportion of oligonucleotides does not undergo elongation (does not acquire a cDNA sequence at 3′). These oligonucleotides without elongation must be blocked so that they do not interfere with the final steps of the process. Blocking of the particle is an essential requirement.
Blocking is performed by selectively adding a terminator nucleotide (dideoxy-thymine) at the 3′ end of the oligonucleotides which have not acquired cDNA sequences. To that end, the particles are resuspended in a medium with PCR reaction buffer, Taq-polymerase (any Taq-polymerase or another commercial thermostable DNA polymerase can be used), an oligonucleotide (which was referred to as an elongation template), and dideoxy-thymine-triphosphate (ddTTP) at a concentration of about 0.2 mM.
Dideoxynucleotides (such as dideoxy-thymine) lack a 3′ hydroxyl group; once one is incorporated, the strand cannot form a phosphodiester bond with a further nucleotide and elongation stops (which is the basis for Sanger sequencing, for example).
The elongation template consists of:
By means of this elongation template, a dideoxy-thymine is incorporated only in those oligonucleotides of the particle which have not acquired a cDNA sequence.
The blocking reaction is performed following a protocol consisting of several cycles:
The proportion of elongated molecules can be increased by performing several cycles (5-15 cycles), reaching practically 100%. This is why the 3′ ends of the elongation templates are inactivated (by means of a dideoxynucleotide or another terminator nucleotide). This is to ensure that elongation can only proceed from the 3′ ends of the DNA strands covalently attached to the particle.
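The effect of repeating the cycle can be illustrated with a simple model. The sketch below assumes that each cycle independently elongates a fixed fraction of the oligonucleotides that are still unblocked; this per-cycle efficiency is a hypothetical parameter, not a value given in the specification.

```python
def fraction_blocked(per_cycle_efficiency, cycles):
    """Cumulative fraction of oligonucleotides blocked after a number of cycles.

    Assumes every cycle blocks the same fraction of the molecules that
    remain unblocked (a simplifying assumption made for illustration).
    """
    if not 0.0 <= per_cycle_efficiency <= 1.0:
        raise ValueError("per_cycle_efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1.0 - (1.0 - per_cycle_efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical 50% efficiency per cycle, over the 5-15 cycle range.
    for n in (1, 5, 10, 15):
        print(n, fraction_blocked(0.5, n))
```

Under this assumption even a modest per-cycle efficiency approaches complete blocking within the 5-15 cycle range mentioned above.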
Finally, two alkaline washings of the particle are performed, followed by pH re-equilibration, and resuspension in buffer at pH 7.4. This alkaline washing removes the blocking reaction medium and the elongation template used to specifically block oligonucleotides without a cDNA sequence (step (d)) and only leaves single-stranded strands covalently attached at 5′ on the particle.
The process of blocking non-elongated oligonucleotides of the particle is shown in
Elongation of the second adapter is performed in steps (e) and (f). By means of a terminal transferase, a DNA tailing reaction which adds a guanine (poly(G)) tail to the 3′ ends (which can be elongated) of the single-stranded DNA attached to the particles is performed. Unlike RNA tailing (step (a)), DNA tailing can add any type of nucleotide to the 3′ end of (both single- and double-stranded) DNA molecules. DNA tailing reaction, performed in the presence of only dGTP, adds a poly(G) tail.
For the DNA tailing reaction, the particles are resuspended in a suitable reaction medium (any commercial terminal transferase is applicable) containing 0.2 mM dGTP. Ideally, the poly(G) tail added to the 3′ end of the DNA attached to the particles is 15-20 nucleotides long, never exceeding 20. The reaction time and the amount of enzyme can be adjusted so that the poly(G) tail is within the desired range. To stop the reaction, heating for 10 minutes at 65° C. is sufficient to denature the enzyme.
After DNA tailing, magnetic particles are washed twice in a suitable buffer (with PBS, TBS, or another similar buffer) in order to remove the substrates of the reaction.
The poly(G) tail allows elongating the second massive sequencing adapter by means of an elongation template which specifically hybridizes with the poly (G) tail. In this example, this second elongation template consists of:
The elongation reaction is performed by means of the same method described for the blocking reaction of step (d), with the exception that instead of dideoxy-thymine-triphosphate, a mixture of the 4 deoxynucleotide triphosphates (dATP, dTTP, dGTP, dCTP) is added at 0.2 mM. Two alkaline washings (and pH re-equilibration) are then performed to remove any remaining reaction medium and DNA strands not covalently attached to the particles.
These alkaline washings remove the elongation reaction medium and the elongation template used to add the sequence of the second adapter, leaving only single DNA strands covalently attached at 5′ on the particle.
Tailing with guanines which hybridize with the poly(C) tail of an elongation template has been shown in the example. The design of the present invention is not limited to this combination and other complementary nucleotide ends can be used.
The process of elongating the second adapter is shown in
The library is completed in step (g). For completion, a standard PCR is performed with the particles and using primers specific for the ends of the two adapters. The particle is removed (causing it to settle on a magnet) and the PCR product is purified (
The result is a massive sequencing library with the cDNA of the population of miRNAs flanked by two homopolymers (A/T and G/C) of 10-20 base pairs each.
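The resulting layout (first adapter, poly(T), cDNA, poly(G), second adapter, read along the strand that was attached to the particle) suggests how a read could be parsed back into the original small RNA. The following is a minimal sketch under stated assumptions: both adapter sequences are hypothetical placeholders, the minimum homopolymer length of 10 is taken from the range given above, and the function name is introduced here for illustration.

```python
import re

# Hypothetical placeholder adapters (not the sequences used in the invention).
ADAPTER_1 = "ACGTACGTAC"
ADAPTER_2 = "CATGCATGCA"

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# adapter 1, poly(T) run, cDNA insert, poly(G) run, adapter 2
LIBRARY_PATTERN = re.compile(
    rf"^{ADAPTER_1}(?P<poly_t>T{{10,}})(?P<cdna>[ACGT]+?)(?P<poly_g>G{{10,}}){ADAPTER_2}$"
)


def extract_small_rna(read):
    """Return the RNA sequence encoded by a library read, or None if no match."""
    match = LIBRARY_PATTERN.match(read)
    if match is None:
        return None
    cdna = match.group("cdna")
    # The cDNA is the reverse complement of the original RNA.
    sense_dna = cdna.translate(_DNA_COMPLEMENT)[::-1]
    return sense_dna.replace("T", "U")


if __name__ == "__main__":
    read = ADAPTER_1 + "T" * 19 + "AACTATACAACCTACTACCTCA" + "G" * 16 + ADAPTER_2
    print(extract_small_rna(read))
```

One limitation of this kind of trimming is worth noting: an RNA that ends in A at its 3′ end, or whose cDNA ends in G, cannot be distinguished from the adjacent homopolymer, so terminal bases of that kind would be absorbed into the tail in this sketch.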
It should be noted that although the present invention preferably relates to microRNAs (miRNAs), given that they are the most abundant population among small non-coding RNAs, there are other types of small non-coding RNAs such as siRNAs, piwi-RNAs, or tRNAs which can also be detected and quantified by means of the present technology.
In this sense and as it is used throughout the present invention, the term “small non-coding RNAs” encompasses, but is not limited to, a polynucleotide molecule varying from about 10 (preferably 17) to about 450 nucleotides in length, which can be endogenously transcribed or exogenously produced (in a chemical or synthetic manner), but which is not translated into a protein. Preferably, said term encompasses, but is not limited to, a polynucleotide molecule varying from about 10 nucleotides, preferably 15 nucleotides, more preferably 17 nucleotides to about 50 nucleotides, more preferably 30 nucleotides, even more preferably 25 nucleotides in length, which can be endogenously transcribed or exogenously produced (in a chemical or synthetic manner), but which is not translated into a protein.
Examples of small non-coding RNAs include various molecules such as siRNA (small interfering RNA), piwi-RNA, tRNA (transfer ribonucleic acid), snRNA (small nuclear RNA), snoRNA (small nucleolar RNA), tncRNAs (transfer RNA-derived small ncRNAs), and microRNAs. Likewise, this term, “small non-coding RNAs”, also includes primary miRNA transcripts (also known as pri-pre-miRNAs, pri-mirs, and pri-miRNAs), varying from about 70 nucleotides to about 450 nucleotides in length, as well as pre-miRNAs (also known as miRNA precursors), ranging from about 50 nucleotides to about 110 nucleotides in length. In other words, the first aspect of the present invention, as well as all the preferred embodiments of this aspect, can be applied to a method for obtaining a massive sequencing library with the complementary DNA, cDNA, of a population of small non-coding RNAs of interest, which comprises:
Step (a) consists of the specific capturing of genetic material on the surface of the colloidal tools.
In a preferred embodiment, the small non-coding RNAs are selected from any polynucleotide molecule which has a length from 10 nucleotides, more preferably 17 nucleotides, to about 450 nucleotides, and can be endogenously transcribed or exogenously produced (in a chemical or synthetic manner) but cannot be translated into a protein. Preferably, the small non-coding RNAs are selected from any polynucleotide molecule which has a length from about 10 nucleotides, preferably 15 nucleotides, more preferably 17 nucleotides to about 50 nucleotides, more preferably 30 nucleotides, even more preferably 25 nucleotides in length, and can be endogenously transcribed or exogenously produced (in a chemical or synthetic manner) but cannot be translated into a protein.
In another preferred embodiment, the small non-coding RNAs are selected from the list consisting of siRNA (small interfering RNA), piwi-RNA, tRNA (transfer ribonucleic acid or transfer RNA), snRNA (small nuclear RNA), snoRNA (small nucleolar RNA), tncRNAs (transfer RNA-derived small non-coding RNAs), and microRNAs.
In yet another preferred embodiment, the population of small non-coding RNAs of interest comprises non-coding RNAs selected from at least one from the list consisting of siRNAs (small interfering RNAs), piwi-RNAs, tRNAs (transfer ribonucleic acids), snRNAs (small nuclear RNAs), snoRNAs (small nucleolar RNAs), tncRNAs (transfer RNA-derived small ncRNAs), and microRNAs. Preferably, the small non-coding RNAs of interest comprise microRNAs or comprise mainly microRNAs, more specifically more than 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the total population of small non-coding RNAs are miRNAs. The same proportions could be extrapolated to the other small non-coding RNAs described herein.
Another aspect of the invention relates to a kit or device, hereinafter kit or device of the invention, comprising the elements necessary for carrying out the method of the invention.
Another aspect of the invention relates to the use of the method of the invention or of the kit or device of the invention for the high-resolution analysis of the populations of miRNAs in a biological sample, and also for the high-resolution analysis of the populations of any other small non-coding RNA such as siRNAs (small interfering RNAs), piwi-RNAs, tRNAs (transfer ribonucleic acids or transfer RNAs), snRNAs (small nuclear RNAs), snoRNAs (small nucleolar RNAs), or tncRNAs (transfer RNA-derived small non-coding RNAs). Preferably, the biological sample is blood, and even more preferably plasma.
A more effective and less costly analysis of miRNAs, as well as of any other small non-coding RNA, in blood would have several applications in the diagnosis of various diseases. For example, but without limitation, it would allow the use of circulating miRNAs as biomarkers for cancer and, more specifically, breast cancer.
Therefore, another aspect of the invention relates to the use of the method of the invention or of the kit or device of the invention for the diagnosis of a disease. More preferably, the disease is cancer. In a particular embodiment, the disease is breast cancer.
Nucleic acids or polynucleotides for sequencing include, but are not limited to, nucleic acids such as DNA, RNA, or PNA (peptide nucleic acid), variants or fragments thereof, and/or concatemers thereof. The polynucleotides can be of a known or unknown sequence, natural or artificial, and can be from any source (for example, eukaryotes or prokaryotes). The polynucleotides can be naturally derived, recombinantly produced, or chemically synthesized. Concatemerized polynucleotides can contain subunits or analogs thereof which may or may not be naturally occurring, or modified subunits. Methods as described herein can be used to determine a polynucleotide sequence. The length of the target nucleic acid for sequencing may vary. For example, the nucleic acid for sequencing can include at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 10000, at least 100,000, at least 1,000,000, at least 10,000,000 nucleotides. The polynucleotide for sequencing can be of genomic origin or can be fragments or variants thereof. The nucleic acid chain for sequencing can be of single chain and it may or may not be derived from a double-stranded nucleic acid molecule. The single-stranded molecules can also be produced, for example, by means of in vitro or chemical synthesis methods and technologies. The embodiments as described in the present specification are not limited by the nucleic acid preparation methods and those skilled in the art can practice any number of methods to provide a composition for use in the described methods. For example, in the sequence by means of synthesis methodologies, a library comprising the target nucleic acids is often generated, and a part of the DNA library is then sequenced.
“Operatively attached” means that two chemical structures are attached to one another such that they remain attached through various manipulations to which they are expected to be subjected. Normally, the functional moiety and the coding oligonucleotide are covalently attached through a suitable binding group. For example, the binding group can be a bifunctional moiety with a binding site for the coding oligonucleotide and a binding site for the functional moiety.
Attachment between the 5′ end of the oligonucleotide and the surface of the particle must be by means of a covalent bond. Preferably, there are two options: an amide bond (as shown in the examples of the invention) or bonds based on thiol groups such as a disulfide bond.
The methods described in the present specification are not limited by any sequencing sample preparation method in particular, and the alternatives will be readily evident to any person skilled in the art and are considered within the scope of the present description.
In this specification, the term “colloidal tool” is synonymous to “magnetic particles attached to oligonucleotides”.
To analyze the performance of the methodology, miRNA-seq libraries were produced from synthetic RNA (a collection of 10 synthetic molecules corresponding to human miRNA sequences was used in the test) at very low concentrations (of the order of pmoles and fmoles), in an attempt to emulate existing concentrations.
After completing the miRNA libraries, a proof of concept test was performed by means of Sanger sequencing. The miRNA library is made up of a collection of different sequences, so it is not a suitable substrate for Sanger sequencing.
The miRNA libraries were purified and ligated into the pGEM-T plasmid (a commercial amplicon cloning system), followed by transformation into E. coli (JM109 strain). The bacteria were then grown in a selective medium with ampicillin (the pGEM plasmid provides resistance to ampicillin). Only those bacterial clones which took up a plasmid were capable of growing in the selective medium and forming colonies on the agar with the culture medium (each colony is a clone carrying the pGEM-T plasmid with a single insert version).
After cloning, twenty colonies were picked and grown, from which plasmid DNA was extracted, and this was sequenced by means of the Sanger method using specific primers flanking the insertion point. The sequence corresponding to the inserts had a mean size of 128 base pairs.
The insert is attached to the open plasmid by means of a ligation reaction that, if unbiased, would have two options with equal frequency (sense or antisense, with respect to the plasmid sequence). However, in all the sequencing performed in the test of the present invention, the insert was in the sense direction. This datum proves the existence of strong ligation biases which increase when working with relatively short sequences.
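The strength of this observation can be put in numbers with a short calculation. The sketch below assumes, for illustration only, that all twenty picked colonies yielded a readable insert and that clones are independent; the function name is introduced here.

```python
def prob_all_same_orientation(n_clones, p_sense=0.5):
    """Probability that every one of n independent clones is in the sense direction."""
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    if not 0.0 <= p_sense <= 1.0:
        raise ValueError("p_sense must be between 0 and 1")
    return p_sense ** n_clones


if __name__ == "__main__":
    # Unbiased ligation (p = 0.5) and twenty sequenced clones (assumed number).
    print(prob_all_same_orientation(20))  # 9.5367431640625e-07
```

Under an unbiased ligation, finding twenty inserts all in the sense direction would therefore be expected less than once in a million experiments.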
Next (
It is furthermore observed that the poly(C/G) and poly(A/T) tails have variable sizes due to the actual nature of the tailing reactions and the elongation process on the particle.
The study consists of two cohorts:
The study includes control samples from healthy women and samples from women with breast cancer in different stages of progression (stages 0, I, II, III, and IV) and with different phenotypes (Luminal A, Luminal B, HER2, and triple negative). For those women undergoing neoadjuvant therapy, the cohort will consist of blood samples obtained before and after treatment.
The high-resolution composition of circulating miRNAs will be evaluated by analyzing the following parameters:
The illustrations shown in the specification show the production of a miRNA library using adapters of the Ion S5 platform. The methodology described in the illustrations can be adapted to the production of libraries for the Illumina platform by introducing small modifications. Taking into account that the length of Illumina adapters is greater than the length of Ion S5 adapters, it is advisable for the reverse transcription and elongation reactions (on particle) to incorporate incomplete versions of Illumina adapters, and for the primers used in final PCR reaction to complete said adapters during the amplification of the library.
Experiments have been performed with synthetic standards prepared from equimolar mixtures of synthetic RNA sequences corresponding to 30 human miRNAs (miR-17, -18a, -20a, -21, -23a, -23b, -24, -26b, -29c, -34a, -34b, -34c, -125b, -135a, -135b, -145, -320a, -125a, -130a, -135b, -150, -155, -200c, -210, -221, -223, -301a, -365a, -454, -663b). The libraries produced with the synthetic standards were massively sequenced on the Illumina platform, incorporating Nextera type adapters. Data analysis showed a dispersion of less than 5% with respect to the equimolarity of the initial standard mixture.
The studies performed with synthetic miRNA standards have demonstrated that the minimum amount of miRNA which can be used to produce a massive sequencing library is 1 pg.
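One way to quantify deviation from equimolarity is sketched below. The specification does not define how the dispersion was calculated, so the metric used here (largest relative deviation of any read count from the equimolar expectation) is an assumption, and the counts in the example are invented for illustration rather than taken from the experiments.

```python
def relative_deviations(counts):
    """Relative deviation of each read count from the equimolar expectation."""
    if not counts:
        raise ValueError("counts must not be empty")
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total read count must be positive")
    expected = total / len(counts)
    return {name: (count - expected) / expected for name, count in counts.items()}


def max_abs_deviation(counts):
    """Largest absolute relative deviation across all sequences."""
    return max(abs(dev) for dev in relative_deviations(counts).values())


if __name__ == "__main__":
    # Invented read counts for three sequences of an equimolar mixture.
    example = {"seq_a": 102, "seq_b": 98, "seq_c": 100}
    print(max_abs_deviation(example))
```

Other definitions, such as the coefficient of variation of the counts, would be equally reasonable readings of "dispersion".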
When libraries are produced from very small amounts of miRNA, relevant increases in amplification bias occur at the PCR level (the invention includes a final PCR step). In these cases, it is advisable to incorporate UMI sequences (Unique Molecular Identifiers, Nat Methods, 2017. PMID: 28448070) into the PCR primers. UMI sequences are used to correct amplification biases.
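The principle of UMI-based correction can be shown with a minimal sketch: reads that share both the insert sequence and the UMI are treated as PCR copies of a single original molecule and counted once. The function name and the input format (pairs of insert sequence and UMI) are assumptions made for illustration, and the sketch ignores sequencing errors within the UMI and chance collisions between UMIs.

```python
from collections import Counter


def count_molecules(reads):
    """Count unique molecules per insert sequence.

    `reads` is an iterable of (insert_sequence, umi) pairs; duplicates of
    the same pair are collapsed into one molecule.
    """
    unique_pairs = set(reads)
    return Counter(sequence for sequence, _umi in unique_pairs)


if __name__ == "__main__":
    reads = (
        [("AAA", "U1")] * 5   # one molecule amplified five times
        + [("AAA", "U2")]     # a second molecule of the same sequence
        + [("CCC", "U1")] * 3 # one molecule of a different sequence
    )
    print(count_molecules(reads))
```

In the example, nine reads collapse to three molecules, so the heavily amplified sequence is no longer overrepresented in the counts.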
1. A method for obtaining a massive sequencing library with the cDNA of a population of miRNAs of interest, which comprises:
2. The method according to clause 1, wherein precipitation and washing with a suitable buffer are performed after each step.
3. The method according to any of clauses 1-2, wherein the elongation of step 5 is performed with a terminal transferase.
4. The method according to any of clauses 1-3, wherein the polyG tail of step 5 must have between 15 and 20 guanine nucleotides.
5. The method according to any of clauses 1-4, wherein the elongation template of step f) can have a dideoxynucleotide at the 3′ end which prevents the elongation of the template itself.
6. The method according to any of clauses 1-5, wherein, in the case of libraries produced from tissue RNA in which larger RNAs (ribosomal and messenger RNAs) have not been removed, it is necessary to perform a selective purification of sizes (below 200 bp) after step g).
7. The method according to any of clauses 1-6, wherein other complementary nucleotide ends are used.
8. Use of the method according to any of clauses 1-7 for the identification of miRNAs of interest in a biological sample.
9. Use of the method according to the preceding clause, wherein the biological sample is blood.
10. Use of the method according to clause 8, wherein the biological sample is plasma.
11. Use of the method according to any of clauses 1-7 for the diagnosis, prognosis, or response to treatment of a disease.
12. Use of the method according to any of clauses 1-7 for the diagnosis, prognosis, or response to treatment of cancer.
Number | Date | Country | Kind |
---|---|---|---|
P202031119 | Nov 2020 | ES | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/ES2021/070804 | 11/8/2021 | WO |