The present invention is in the field of diagnostic and sequencing technologies and relates to a high throughput sequencing method, and to a kit comprising tools for performing this method, that combine a capture and amplification by switching detection step, preferably the so-called “Capture and Amplification by Tailing and Switching” (CATS) technology, with a sequencing technology, preferably the so-called “Nanoball sequencing” technology.
A Capture and Amplification by Switching technology, especially the so-called “Capture and Amplification by Tailing and Switching” (CATS) technology, is a ligase-free method to produce DNA libraries for subsequent sequencing from RNA or DNA and is described in the international patent application WO2015/173402-A1.
There is a need to improve RNA sequencing (or RNA-Seq), which uses next generation sequencing (NGS) technologies to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing cellular transcriptome. The Capture and Amplification by Switching protocols, especially the CATS protocol, are more efficient for RNA-Seq library creation than protocols using ligase, because they incorporate adaptors during cDNA synthesis in a single reaction tube. In particular, the CATS technology allows optimal sequencing of sensitive, degraded, cell free RNA (cfRNA) sequences, plasma derived RNA sequences, non-coding RNA (ncRNA) sequences such as miRNA sequences or long non-coding RNA (lncRNA) sequences, exosomal RNA sequences, and rare and low input RNA samples, that are efficient markers of different diseases, such as cancers.
Improved sequencing protocols, especially the “Nanoball sequencing” technology disclosed by Drmanac et al. (Science 327 (5961), pages 78-81 (2010)), require a fragmentation of genomic DNA, wherein individual fragments are used to produce circular DNAs, in which platform specific oligonucleotide adapters separate the genomic DNA sequences.
The obtained circular DNAs are amplified to generate advantageously single-stranded concatemers (DNA nanoballs (DNBs) that have a size of about 300 nanometers) that can be immobilized on a substrate at a specific location and that remain separated from each other, because of their negative charges, upon a patterned substrate containing up to 3 billion spots, each spot containing one (and only one) DNA nanoball.
The present invention aims to provide a new detection and sequencing method and tools for performing such method that do not present the drawbacks of the method and kit of the state of the art.
A first aim of the present invention is to obtain a method and tools for performing this method that improve the nucleic acids libraries production and sequencing, especially of sensitive, degraded, chemically modified, cell free nucleic acid sequences, especially all kinds of RNA sequences (coding or non-coding RNA sequences, miRNAs, miscRNAs, piRNAs, rRNAs, siRNAs, snRNAs, snoRNAs, tRNAs, . . . ), regardless of a spike-in possibly obtained from a single cell.
A further aim of the invention is to obtain such a method and tools for performing this method that are easy to use, with minimal hands-on time, and that are also robust and present an improved sensitivity and excellent reproducibility.
All literature and similar material cited in the application, including, but not limited to, patents, patent applications, scientific articles, books and web pages, are expressly incorporated by reference in their entirety into the description of the present invention.
Unless otherwise defined, all terms used in disclosing the invention, including technical and scientific terms, have the meaning as commonly understood by one of ordinary skill in the art in the field of the invention.
As used in this description and claims, the singular forms “a”, “an”, and “the” include singular and plural referents, unless the content of the description clearly dictates otherwise.
The terms “comprising”, “comprises” and “composed of” are synonymous with “including” or “containing”, are inclusive or open-ended, and do not exclude any additional, non-recited members, elements or method steps.
The terms “one or more” or “at least one” are clear per se and encompass a reference to any one of these members, or to any two or more of the members and up to all members.
The term “about” as used herein, when referring to a measurable value, such as an amount of a compound, dose, time and the like, is meant to encompass variations of 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5% or even 0.1% of the specified amount or value.
As used in the description and claims, the term “nucleic acid(s)” includes polymeric and oligomeric macromolecules of DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), made of monomers known as nucleotides, comprising bases selected from the group consisting of Adenine (A), Thymine (T), Cytosine (C), Guanine (G) and Uracil (U).
The terms “single stranded nucleic acids” (ss nucleic acid) refer to a nucleic acid consisting of only one polynucleotide or oligonucleotide strand. In contrast, a “double stranded nucleic acid” (ds nucleic acid) consists of two polynucleotide or oligonucleotide strands wherein the majority of the nucleotides are paired according to known pairing rules.
The term “genetic amplification” refers to a biochemical technology used in molecular biology for many years to amplify, by means of primer sequences, a single or a few copies of a piece or portion of DNA by replication and copy across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence. The best known genetic amplification technology is the so-called “Polymerase Chain Reaction” or PCR, as described in U.S. Pat. Nos. 4,683,195-B2 and 4,683,202-B2, using primer sequences and a heat stable DNA polymerase, such as the Taq polymerase obtained from the bacterium Thermus aquaticus, allowing thermal cycling.
The term “primer” refers to an oligonucleotide sequence, usually comprising between about 12 nucleotides and about 25 nucleotides, hybridizing specifically to a target sequence of interest and which functions as a substrate onto which nucleotides can be polymerized by a polymerase.
The terms “Template Switch Oligo” or TSO, refer to an oligo that hybridizes to untemplated C nucleotides added by a reverse transcriptase during reverse transcription.
The present invention is related to a high throughput (detection and) sequencing method of a nucleic acid strand sequence as well as tools (preferably included into a kit) for performing this method, this (detection and sequencing) method comprising at least (or consisting of) the steps of, preferably the consecutive steps of:
In the method of the invention, the synthesized double stranded nucleic acid sequences present a length preferably comprised between about 200 and about 500 nucleotides.
According to the invention, the native single stranded nucleic acid sequence or native double stranded nucleic acid sequence is preferably selected from the group consisting of fragmented and/or bisulfite-converted DNA sequence, mRNA sequence, miRNA sequence, small RNA sequence, piRNA sequence, bisulfite-converted RNA or a mixture thereof.
In the method according to the invention, the at least 5 consecutive identical nucleotides are preferably selected from the group consisting of ribonucleotides, deoxyribonucleotides or dideoxyribonucleotides of A, T, C, G or U, that are preferably added by an enzyme selected from the group consisting of a poly(A)-polymerase, poly(U)-polymerase, poly(G)-polymerase, terminal transferase, DNA ligase, RNA ligase and the dinucleotide and trinucleotide RNA ligases.
Another aspect of the invention concerns an apparatus or a sequencing kit for performing the method of the invention, this kit or apparatus comprising (or consisting of) the following reagents present in suitable vials:
In the method, apparatus and kit according to the invention, the priming oligonucleotide preferably comprises the nucleotide sequence disclosed in claims 9 to 12 and claims 19 and 20 of WO2015/173402 incorporated herein by reference.
Advantageously, in the method, apparatus and kit according to the invention, the rolling circle amplification is obtained by addition of a sufficient amount of the Phi 29 DNA polymerase, this enzyme allowing a production of concatemers or DNA nanoballs (DNBs) as a long single stranded DNA sequence comprising several head-to-tail copies of the circular template, wherein the resulting nanoparticle self-assembles into a tight ball of DNA.
This polymerase replicates the looped DNA and, when it finishes one circle, it does not stop: it continues the replication by peeling off its previously copied DNA. This copying process continues over and over, forming the DNA nanoball, a large mass of repeating DNA to be sequenced, all connected together.
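By way of illustration only, the head-to-tail structure of the rolling circle amplification product described above can be sketched in Python. The function names, the toy template and the convention of writing the circular template as a linear string starting at the priming position are assumptions made for this sketch; they do not form part of the disclosed method.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def rolling_circle_product(circle: str, n_rounds: int) -> str:
    """Model the single-stranded product of rolling circle amplification.

    Assumption: `circle` is the circular template written as a linear string.
    Each trip of the polymerase around the circle appends one more copy of the
    template's complement, so the product is `n_rounds` head-to-tail repeats.
    """
    if n_rounds < 1:
        raise ValueError("n_rounds must be at least 1")
    unit = reverse_complement(circle)
    return unit * n_rounds


# Toy example: a 5-nucleotide circle copied three times.
concatemer = rolling_circle_product("AACGT", 3)
print(concatemer)
```

In this simplified model the concatemer length grows linearly with the number of trips around the circle, which is why a single small circular template can yield a large mass of repeating DNA.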
Preferably, in the method, apparatus and kit according to the invention, the patterned array flow cell is a silicon wafer coated with silicon dioxide, titanium, hexamethyldisilazane (HMDS) and a photoresist material, and each DNA nanoball selectively binds to the positively charged aminosilane according to the pattern.
Advantageously, in the method of the invention, the ligase base sequencing is obtained by adding dNTPs incorporated by a polymerase, each dNTP preferably being conjugated to a particular label or comprising a modification that allows its subsequent detection through binding with one or more labeled antibody(ies) (CoolNGS® technology, improved in sensitivity and less costly, for obtaining more accurate and longer reads), the label preferably being a fluorophore or dye and possibly containing a terminator blocking additional extension, wherein unincorporated dNTPs are washed away, wherein an image is captured, wherein the dye and terminator are preferably cleaved and wherein these steps are repeated until sequencing is complete.
The CoolNGS technology is based on the use of multiple fluorescent dye molecules attached to the antibodies, providing a higher signal-to-noise ratio and reduced consumption of expensive materials, together with the incorporation of natural bases with no interference between sequencing cycles.
In addition, in the method of the invention, the added fluorophore is excited with a laser at a specific wavelength of light and the emission of fluorescence from each DNA nanoball is captured on a high-resolution CCD camera, wherein the color of each DNA nanoball corresponds to a base at the interrogated position and wherein the computer records the base position information.
A last aspect of the invention concerns the use of the apparatus, the kit or the method according to any one of the preceding claims. The preferred use is proposed for sequencing or expression analysis, for cloning, for labelling, for the identification of genes or mutations, in the detection of human or animal diseases or in forensic science, for the analysis of infectious diseases and of genomes of viruses, bacteria, fungi, animals or plants, including their derived cells, for the characterization of plants and fruits, for breeding checks, and for the detection of plant or fruit diseases.
The present invention will be described hereafter in the following examples, presented as non-limiting preferred embodiments of the present invention.
The applicant has obtained the averaged per base sequence distribution of the samples sequenced in lane 03. This distribution displays a typical Capture and Amplification by Switching detection construct, namely a CATS small RNA-seq construct with a short insert size in conformity with the nature of the RNAs sequenced (small non-coding RNAs), and also displays, after the small RNA reads, the expected poly(A) tail synthesized during library preparation.
The N content is non-zero, but low enough not to cause problems later on during data analysis. The template switch motif (template switching oligonucleotide, TSO) is absent from the first (1-3) sequencing cycles, as the sequencing was done in dark cycling mode for those cycles.
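As a purely illustrative sketch, a per base sequence distribution of this kind, including the N content per sequencing cycle, could be computed from a set of reads as follows. The function name and the toy reads are hypothetical, and reads are assumed to be upper-case strings over A, C, G, T and N; this is not the analysis software used by the applicant.

```python
from collections import Counter


def per_cycle_base_fractions(reads, alphabet="ACGTN"):
    """Return, for each sequencing cycle, the fraction of each base.

    Reads may have different lengths; a read only contributes to the cycles
    it actually covers. Characters outside `alphabet` count toward the total
    of a cycle but are not reported.
    """
    if not reads:
        return []
    n_cycles = max(len(read) for read in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values())
        fractions.append({base: counts.get(base, 0) / total for base in alphabet})
    return fractions


# Toy reads (hypothetical): a short insert followed by the start of a poly(A) tail.
toy_reads = ["ACGAAA", "ACNAAA", "TCGAA"]
for cycle, dist in enumerate(per_cycle_base_fractions(toy_reads), start=1):
    print(cycle, {base: round(frac, 2) for base, frac in dist.items()})
```

In such a profile, a poly(A) tail appears as a rise of the A fraction toward 1 in the later cycles, and the N fraction per cycle gives the N content.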
The applicant has also obtained the averaged quality distribution of the DNBs sequenced in lane 03. The vast majority of the DNBs (>85%) obtained across lane 03 present a quality score above 30, which makes the sequencing of CATS small RNA libraries on the DNBSEQ-G400 system an efficient and high quality sequencing system.
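For illustration, a Q30 percentage can be derived from the quality strings of a FASTQ file as sketched below. A Phred quality score Q corresponds to an error probability p through Q = -10 log10(p), so that Q30 corresponds to an error probability of 1 in 1000. The sketch assumes the common Phred+33 ASCII encoding and counts bases with a score of 30 or higher; both conventions, as well as the function name, are assumptions of this example and are not taken from the applicant's pipeline.

```python
def q30_fraction(quality_strings, offset=33, threshold=30):
    """Return the fraction of bases whose Phred score is >= threshold.

    Assumption: qualities are ASCII-encoded with the given offset (Phred+33).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for char in qual:
            total += 1
            if ord(char) - offset >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no quality values provided")
    return passing / total


# Toy quality strings (hypothetical): 'I' is Q40, '?' is Q30, '5' is Q20, '#' is Q2.
print(q30_fraction(["II?5", "?#II"]))
```

A value above 0.85 for a lane or a sample corresponds to the Q30>85% criterion referred to in this description.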
With the method of the invention, the applicant has obtained the reads allocated per sample (#index n°) in the different sequencing lanes and the mean Q30% for the samples in the different sequencing lanes. The obtained results show that the libraries can be sequenced normally, regardless of a spike-in, and produce high quality reads (Q30>85%).
Furthermore, the relative proportion of mapped reads (%) out of the trimmed reads was obtained for the different samples across the different lanes. Most of the reads, after filtering and trimming, map (STAR) to a reference genome (hg19) at a percentage expected for a CATS small RNA library. The different lanes in which the sequencing was performed, regardless of the spike-in content, do not impact the mapping statistics. This means that the sequencing method and system according to the invention are reproducible across lanes.
The complete diversity biotyping at TPM higher than or equal to 2 of the libraries sequenced in lane 03 was obtained by using the Ensembl annotations. Most of the library contents are annotated as non-coding RNA, even though a certain fraction comes from protein coding transcripts, constituting products of degradation that are captured during library preparation. This biotyping representation is unexpectedly fully in accordance with the library representation obtained by the state of the art methods and systems, especially the so-called Illumina (ILMN) sequencing method and system.
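As a hedged illustration of the TPM threshold used for biotyping, the following Python sketch computes transcripts per million (TPM) with the usual length-normalized definition and then counts, per biotype, the features reaching the threshold. The feature identifiers, counts, lengths and biotype labels are hypothetical, and the applicant's actual quantification for small RNA libraries may differ from this generic formula.

```python
from collections import Counter


def tpm(counts, lengths):
    """Compute TPM per feature: (count / length), rescaled to sum to 1e6.

    `counts` and `lengths` are dicts keyed by feature identifier; lengths are
    assumed to be positive and expressed in nucleotides.
    """
    rates = {feature: counts[feature] / lengths[feature] for feature in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero")
    return {feature: rate / total * 1e6 for feature, rate in rates.items()}


def biotype_counts(tpm_values, biotypes, min_tpm=2.0):
    """Count features per biotype among those with TPM >= min_tpm."""
    return Counter(
        biotypes[feature] for feature, value in tpm_values.items() if value >= min_tpm
    )


# Hypothetical toy data; identifiers and biotype labels are placeholders.
counts = {"g1": 100, "g2": 300, "g3": 0}
lengths = {"g1": 100, "g2": 100, "g3": 50}
biotypes = {"g1": "miRNA", "g2": "snoRNA", "g3": "miRNA"}

values = tpm(counts, lengths)
print(values)
print(biotype_counts(values, biotypes))
```

Applying a minimum TPM before counting biotypes removes features supported by very few reads, so that the resulting distribution reflects reproducibly detected transcripts.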
The small non-coding RNA diversity biotyping at TPM higher than or equal to 2 of the libraries sequenced in lane 03 was obtained by using the Ensembl annotations. The non-coding RNAs identified by the method and system of the invention span a wide diversity of small non-coding RNAs, ranging from miRNAs to snoRNAs. Therefore, the claimed method and system according to the invention are as efficient as the known methods and systems of the state of the art, especially the so-called Illumina (ILMN) sequencing method and system.
Number | Date | Country | Kind |
---|---|---|---|
PCT/EP2019/057777 | Mar 2019 | EP | regional |
19200404.2 | Sep 2019 | EP | regional |
The present application is the US national phase of PCT/EP2020/058791 filed Mar. 27, 2020, which claims the benefit of PCT/EP2019/057777, filed Mar. 27, 2019, and EP19200404.2 filed Sep. 30, 2019.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/058791 | 3/27/2020 | WO | 00 |