The present invention relates to a method for preparing an RNA or DNA sample for a target specific next generation sequencing comprising performing a one-step target enrichment in a single reaction vessel or in a single reaction mixture, as well as to a kit for preparing an RNA or DNA sample for next generation sequencing in a one-step target enrichment. Further envisaged is the use of the method or the kit for a rapid virus detection, a rapid leukocyte antigen-associated gene identification or a rapid blood group associated gene identification.
In the past 15 years, a variety of Next Generation Sequencing (NGS) technologies have been developed, following the founding Sanger dideoxy sequencing method of 1977. NGS, also known as high-throughput sequencing, represents an assortment of sequencing methods which transcend the capacity of traditional DNA sequencing technologies with respect to cost, speed and data output. This technology supports massively parallel sequencing, thus allowing rapid analysis of a multitude of samples. There is a variety of NGS platforms using different sequencing technologies, which can be grouped into two major categories: sequencing by hybridization and sequencing by synthesis (SBS).
Sequencing by hybridization uses arrayed DNA oligonucleotides of known sequence on filters that are hybridized to labelled fragments of the DNA to be sequenced. By repeatedly hybridizing and washing away the unwanted non-hybridized DNA, it is possible to determine whether the hybridizing labelled fragments match the sequence of the DNA probes on the filter. This technology depends on using specific probes to interrogate sequences, such as in diagnostic applications for identifying disease-related SNPs (single-nucleotide polymorphisms) in specific genes or identifying chromosome abnormalities (Slatko et al., Curr Protoc Mol Biol., 2018, 122(1): e59).
SBS methods are a further development of Sanger sequencing, without the dideoxy terminators, in combination with repeated cycles of synthesis, imaging, and methods to incorporate additional nucleotides in the growing chain. Two major SBS technologies are prevalent on the market, the Ion Torrent technology and the Illumina technology.
The Illumina technology is defined by its use of terminator molecules that are similar to those used in Sanger sequencing, in which the ribose 3′-OH group is blocked, thus preventing elongation. The technology is based on a so-called “bridge amplification” wherein DNA molecules with appropriate adaptors ligated on each end are used as substrates for repeated amplification synthesis reactions on a solid support (i.e. a glass slide) that contains oligonucleotide sequences complementary to a ligated adaptor. The oligonucleotides on the slide are spaced such that the DNA, which is then subjected to repeated rounds of amplification, creates clonal “clusters” consisting of about 1000 copies of each oligonucleotide fragment. During the synthesis, nucleotides, each carrying a different fluorescent label, are incorporated and then detected by direct imaging (Slatko et al., Curr Protoc Mol Biol., 2018, 122(1): e59). The nucleotide label serves as a terminator for polymerization, so after each dNTP incorporation, the fluorescent dye is imaged to identify the base and then enzymatically cleaved to allow incorporation of the next nucleotide.
A similar well-known sequencing-by-synthesis technology is ion semiconductor sequencing, often referred to as Ion Torrent technology. This method is based on the detection of hydrogen ions that are released during the polymerization of DNA. As such, no images are created and analysed, as opposed to various other techniques. Unlike the Illumina technology, this approach relies on a single signal to mark the incorporation of a dNTP into an elongating strand. As a consequence, an iterative addition of each of the four nucleotides to a sequencing reaction is necessary to ensure that only one dNTP is responsible for the signal. Another difference of the Ion Torrent technology lies in the dNTPs themselves, which do not need to be blocked, as the absence of the next nucleotide in the sequencing reaction prevents elongation (Goodwin et al., Nat Rev Genet, 2016; 17(6), 333-51).
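By way of illustration only, the iterative nucleotide addition described above may be sketched as follows. The flow order `TACG`, the function names, and the simplification that one flow incorporates a whole run of identical nucleotides (so that the signal scales with the run length) are assumptions made for this sketch and do not describe any particular instrument.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Simulate per-flow incorporation counts for a strand being synthesized.

    `synthesized` is the sequence of the growing strand (5' to 3').
    Returns a list of (nucleotide, count) tuples, one per flow.
    """
    synthesized = synthesized.upper()
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by flow order: {sorted(unknown)}")

    signals = []
    pos = 0
    while pos < len(synthesized):
        for base in flow_order:
            count = 0
            # Only the nucleotide present in this flow can be incorporated;
            # a run of identical bases is incorporated within a single flow.
            while pos < len(synthesized) and synthesized[pos] == base:
                count += 1
                pos += 1
            signals.append((base, count))
            if pos == len(synthesized):
                break
    return signals


def call_bases(signals):
    """Reconstruct the synthesized sequence from the per-flow counts."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    flows = flowgram("TTAG")
    print(flows)
    print(call_bases(flows))
```

A flow with a count of zero corresponds to a nucleotide addition that produces no signal, which is why unblocked dNTPs suffice in this scheme.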
The sample preparation for many sequencing-by-synthesis approaches is generally rather similar and may comprise a) fragmentation of DNA sequences into suitable sizes (between 25 and 600 bp), b) target enrichment, c) adapter ligation, and d) attachment of indices or barcodes to distinguish between the multitude of samples. However, the sample preparation is highly time-consuming due to the many steps, and error-prone if not performed with due care. A major bottleneck and speed-limiting factor for NGS sample preparation has been the selective enrichment of a target, the attachment of required sequences, such as indices and adaptors, and the various purification steps between each of these steps. As a result, it usually takes as long as 2-4 days to generate a sample that is ready to be sequenced and another 1-2 days to complete an entire sequencing process.
In times when a rapid and accurate method of analysing samples is required, such as seen with the outbreak of the COVID-19 pandemic, time-consuming and complex sample preparation proves to be rather challenging in view of the flood of samples to be analysed.
Hence, there is a need for a fast sample preparation for NGS applications, which avoids time-consuming process steps and can be automated.
The present invention addresses these needs and provides in one aspect a method for preparing an RNA sample for a target specific next generation sequencing comprising performing a one-step target enrichment in a single reaction vessel or in a single reaction mixture, wherein said enrichment comprises the steps: (i) exposing the RNA to be sequenced in a single reaction vessel to a mixture comprising a reverse transcriptase, a DNA polymerase, and (a) one or more target-specific reverse primers, suitable for the preparation of a target specific cDNA; and (b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence, and (c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing primer sequence is different from the forward indexing primer sequence, and a second reverse adaptor sequence; and desoxyribonucleoside triphosphates (dNTPs); and (ii) subjecting the reaction mixture of (i) to a series of temperature changes under conditions sufficient to yield a first strand cDNA copy of at least a portion of the RNA to be sequenced, preferably a gene sequence, and subsequently a target specific amplicon comprising starting from the 5′- to the 3′-end a second forward adaptor sequence, a forward index sequence, a first forward adaptor sequence, a forward target specific primer sequence, target sequence, a reverse target specific primer sequence, a first reverse adaptor sequence, a reverse index sequence and a second reverse adaptor sequence.
In a further aspect the present invention relates to a method for preparing a DNA sample for a target specific next generation sequencing comprising performing a one-step target enrichment in a single reaction vessel or in a single reaction mixture, wherein said enrichment comprises the steps: (i) exposing the DNA to be sequenced in a single reaction vessel to a mixture comprising a DNA polymerase, and (a) one or more target-specific forward primers and one or more target specific reverse primers, suitable for the preparation of a target specific DNA; and (b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence, and (c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing primer sequence is different from the forward indexing primer sequence, and a second reverse adaptor sequence; and desoxyribonucleoside triphosphates (dNTPs); and (ii) subjecting the reaction mixture of (i) to a series of temperature changes under conditions sufficient to yield a target specific amplicon of at least a portion of the DNA to be sequenced, preferably a gene sequence, said amplicon comprising starting from the 5′- to the 3′-end a second forward adaptor sequence, a forward index sequence, a first forward adaptor sequence, a forward target specific primer sequence, target sequence, a reverse target specific primer sequence, a first reverse adaptor sequence, a reverse index sequence and a second reverse adaptor sequence.
The currently claimed one-step target enrichment strategy for RNA and DNA samples can advantageously be used for a very wide range of applications, as target specific primers can be designed for any kind of genetic target, similar to conventional PCR. Further, since the adaptor sequences in the Target Specific Primer and the Indexing Primer can be modified, the one-step target enrichment strategy of the present invention is also applicable to different sequencing platforms. A further relevant advantage of the one-step target enrichment strategy of the present invention is the usage of two separate primer sets (i.e. a Target Specific Primer and an Indexing Primer). The same Indexing Primer can thus be combined with different Target Specific Primers in different applications. There is hence no need to design and synthesize new Indexing Primers if a new target region must be sequenced. Furthermore, the use of dual indexing, i.e. of forward Indexing Primers in combination with reverse Indexing Primers, advantageously allows for the unambiguous assignment of the sequence reads to the samples. More importantly, only a relatively low number of Indexing Primers is required to analyze a high number of targets and samples. Accordingly, the costs for primer design and synthesis can be significantly reduced. Further details of important embodiments can also be derived from
In one set of embodiments, the method additionally comprises as first step the extraction of RNA or DNA from a sample obtained from a subject. In a preferred embodiment, the RNA or DNA is made accessible for further steps by cell lysis.
In another embodiment, the sample is a liquid sample such as a cell culture, cell suspension, whole blood, blood plasma, urine, lavage, smear, mouth swab, throat swab, cerebrospinal fluid, saliva or stool sample, or a tissue or biopsy sample.
In a further embodiment, the target sequence is, or is derived from, a target gene or a part of the target gene, such as an exon or intron or part of both, a target intergenic region, or a genomic sequence or a part of it.
In yet another embodiment, the method additionally comprises a control amplification of one or more additional target sequences. In a preferred embodiment, the control amplification is performed with an independent subject-based target such as a mammalian house-keeping gene. It is particularly preferred to use an RNase gene.
In another preferred embodiment, the control amplification is an extraction control yielding information on the amount and/or quality of the sample.
In one embodiment, the method additionally comprises a step of sample registration, which is performed prior to the enrichment.
In a preferred embodiment, the sample registration comprises the unambiguous linking of the sample to a digital code or number.
In a further embodiment, the sample registration is performed by a subject providing the sample.
In one embodiment, the sample registration is performed online, preferably with a mobile digital device such as a cellphone, tablet computer, smartwatch, or a laptop computer; or with any non-mobile computer system.
In a further embodiment, the method further comprises a purification of the amplicon as obtained in step (ii).
In another embodiment, the method further comprises a step of quantifying the amplicon.
In yet another embodiment, the method comprises a step of sequencing the amplicon as obtained in step (ii), preferably with an NGS system such as Illumina, Ion Torrent, Oxford Nanopore, or SMRT Sequencing.
In one embodiment, the method additionally comprises assembling sequence reads.
In another embodiment, the obtained sequence is aligned and/or compared with one or more reference sequences.
In yet another embodiment, the method additionally comprises a phylogenetic comparison of the obtained sequence(s) with one or more reference sequences.
In one embodiment, the obtained sequence is stored in, and optionally retrievable from, a computer system, a database, a public sequence repository, a cloud system, a hospital computer system, a doctor's association computer system, a local health organization database, a regional health organization database, a national health organization database, or an international health organization database.
In a further embodiment, the preparation of the sample for a target specific next generation sequencing is for the detection of a virus, a microbe or a genotype of a higher eukaryote.
In another embodiment, the detection of a virus or microbe additionally includes an identification of said virus or microbe, preferably of sub-species, strain or variant or mutant version of said virus or microbe.
In yet another embodiment, the virus is a positive strand ssRNA virus, preferably belonging to the order of Nidovirales, Picornavirales or Tymovirales, or to the family of Coronaviridae, Picornaviridae, Caliciviridae, Flaviviridae or Togaviridae, wherein said virus is more preferably a rhinovirus, Norwalk-Virus, Echo-Virus or enterovirus, or a Coronavirus or belongs to the group of Coronaviruses, or belongs to the group of alpha or beta coronaviruses, such as human or Microchiroptera (bat) coronavirus, most preferably a SARS-CoV-2 virus.
In one embodiment, the detection of a genotype of a higher eukaryote comprises the identification of a blood group antigen or of a leukocyte antigen.
In a further embodiment, said blood group is a human blood group, preferably an ABO, MNS, Rhesus, Lutheran, Kell, Lewis, Duffy, Kidd, Diego, Yt, Scianna, Dombrock, Colton, Cromer, or Vel blood group.
In another embodiment, said leukocyte antigen is a human leukocyte antigen, preferably HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G, HLA-DRA1, HLA-DRB1, HLA-DRB3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1, or HLA-DPB1, or variants thereof.
In yet another embodiment, the method is performed computer-based, preferably automatically or semi-automatically.
The present invention provides in a further aspect, a kit for preparing an RNA sample for next generation sequencing in a one-step target enrichment comprising: a) a reverse transcriptase (RT); b) one or more target-specific reverse primers, suitable for the preparation of a target specific cDNA, c) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence; d) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing sequence is different from the forward indexing sequence, and a second reverse adaptor sequence; e) desoxyribonucleoside triphosphates (dNTPs); and f) a DNA polymerase.
In yet another aspect the present invention relates to a kit for preparing a DNA sample for next generation sequencing in a one-step target enrichment comprising: a) one or more target-specific forward primers and one or more target specific reverse primers, suitable for the preparation of a target specific DNA; b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence; c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing sequence is different from the forward indexing sequence, and a second reverse adaptor sequence; d) desoxyribonucleoside triphosphates (dNTPs); and e) a DNA polymerase.
In one embodiment, the adaptor sequence has a length of about 8 to 45 nucleotides.
In another embodiment, the indexing primer sequence has a length of about 4 to 20 nucleotides.
In yet another embodiment, the adaptor sequence is capable of binding to a substrate, preferably a sequencing chip or flow cell.
In a further embodiment, the target-specific primer or said target-specific primer pair is specific for a target sequence. Preferably said target sequence is a viral gene or a part of a viral genome, a leukocyte antigen-associated gene, or a blood group associated gene.
In another embodiment, the target sequence is a viral gene of a coronavirus, preferably a SARS-CoV-2 virus gene or genomic portion, or a part of it. More preferably the viral gene is a 5′UTR, 3′UTR, ORF1ab, Orf3a, Orf6, Orf7a, Orf7b, Orf8, Orf10, M gene region, E gene region, N gene region, or S gene region of SARS-CoV-2 virus.
In yet another embodiment the target sequence comprises one or more of the following nucleotide positions according to the nucleotide numbering of the reference genome of SARS-CoV-2 (reference genome with NCBI Reference Sequence No: NC_045512.2; SEQ ID NO: 63): 100, 733, 1264, 2749, 3267, 3828, 5388, 5648, 6319, 6573, 6613, 6954, 7600, 7851, 10667, 11078, 11288-11296, 11824, 12964, 12778, 13860, 17259, 19602, 19656, 21614, 21621, 21638, 21765-21770, 21974, 21991-21993, 22132, 22812, 23012, 23063, 23271, 23525, 23604, 23709, 24506, 24642, 24914, 26149, 27853, 27972, 28048, 28111, 28167, 28253, 28262, 28280, 28512, 28628, 28877, 28975, 28977, 29722, 29754.
In one embodiment, the target sequence is a leukocyte antigen-associated gene selected from HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G, HLA-DRA1, HLA-DRB1, HLA-DRB3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1, or HLA-DPB1.
In another embodiment, the target sequence is a blood group associated antigen associated with the ABO, MNS, Rhesus, Lutheran, Kell, Lewis, Duffy, Kidd, Diego, Yt, Scianna, Dombrock, Colton, Cromer, or Vel blood group antigens.
In yet another embodiment, the kit additionally comprises synthetic RNA-spike-ins.
In a further preferred embodiment of the method or kit as defined above said forward indexing primer is a primer selected from the group comprising primers of SEQ ID NO: 32 to SEQ ID NO: 39.
In another preferred embodiment of the method or kit as defined above, said reverse indexing primer is a primer selected from the group comprising primers of SEQ ID NO: 40 to SEQ ID NO: 51.
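The effect of combining the forward and reverse indexing primer groups mentioned above can be illustrated by a short sketch. It assumes, purely for illustration, that each of the eight forward indexing primers (SEQ ID NO: 32 to 39) may be paired with each of the twelve reverse indexing primers (SEQ ID NO: 40 to 51); the sample names and the function name are hypothetical.

```python
from itertools import product

# Identifiers only; the actual primer sequences are given in the sequence listing.
FORWARD_INDEX_PRIMERS = [f"SEQ ID NO: {n}" for n in range(32, 40)]  # 8 primers
REVERSE_INDEX_PRIMERS = [f"SEQ ID NO: {n}" for n in range(40, 52)]  # 12 primers


def assign_index_pairs(samples,
                       forward=FORWARD_INDEX_PRIMERS,
                       reverse=REVERSE_INDEX_PRIMERS):
    """Give every sample its own (forward, reverse) indexing primer pair."""
    if len(set(samples)) != len(samples):
        raise ValueError("sample identifiers must be unique")
    pairs = list(product(forward, reverse))
    if len(samples) > len(pairs):
        raise ValueError(
            f"{len(samples)} samples but only {len(pairs)} index pairs available"
        )
    return dict(zip(samples, pairs))


if __name__ == "__main__":
    samples = [f"sample_{i:03d}" for i in range(96)]  # hypothetical sample IDs
    assignment = assign_index_pairs(samples)
    print(len(set(assignment.values())), "distinct index pairs")
    print(assignment["sample_000"])
```

Under these assumptions, 20 indexing primers suffice to label 96 samples unambiguously, which is in line with the reduced primer synthesis effort noted above.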
In another preferred embodiment of the method or kit as defined above, said enrichment comprises a multiplexing amplification.
In yet another preferred embodiment of the method or kit as defined above, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target sequences are simultaneously amplified. It is particularly preferred that 2 or 3 target sequences are simultaneously amplified.
In yet another preferred embodiment of the method defined above, the method allows for a qualitative detection of the target sequence and/or an organism or virus comprising said target sequence or a sequence being highly similar to the target sequence, preferably having a sequence identity of 97% or more.
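The sequence identity threshold mentioned above may, for example, be evaluated as in the following minimal sketch. It assumes that the two sequences have already been aligned to equal length by a separate alignment step and that gap characters (`-`) count as mismatches; the function name and these conventions are illustrative assumptions only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_highly_similar(aligned_a, aligned_b, threshold=97.0):
    """True if the identity reaches the threshold (97% in the text above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "A" * 100
    read = "A" * 97 + "C" * 3
    print(percent_identity(reference, read))
    print(is_highly_similar(reference, read))
```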
In a further preferred embodiment, the method according to the invention comprises the detection of one or more of the following nucleotide exchanges or modifications at positions of the reference genome of SARS-CoV-2 (reference genome with NCBI Reference Sequence No: NC_045512.2; SEQ ID NO: 63): C100T, T733C, G1264T, C2749T, C3267T, C3828T, C5388A, A5648C, A6319G, C6573T, A6613T, T6954C, C7600T, C7851T, T10667G, T11078C, del11288-11296, C11824T, A12964G, C12778T, C13860T, G17259T, C19602T, G19656T, C21614T, C21621A, C21638T, del21765-21770, G21974T, del21991-21993, G22132T, A22812C, G23012A, A23063T, C23271A, C23525T, C23604A, C23709T, T24506G, C24642T, G24914C, T26149C, A27853C, C27972T, G28048T, A28111G, G28167A, C28253T, insG28262GAACA, G28280C, C28512G, G28628T, AGTAGGG28877-28883TCTAAAC, G28975T, C28977T, C29722T, and C29754T.
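The notation used for the nucleotide exchanges listed above can be handled programmatically, for instance as in the following sketch. The four token shapes and their interpretation (in particular reading `insG28262GAACA` as the reference base G at position 28262 being replaced by GAACA) are assumptions inferred from the list itself; the function name and the returned dictionary layout are hypothetical.

```python
import re

# Token shapes inferred from the list above (assumption of this sketch).
_PATTERNS = [
    ("substitution", re.compile(r"^([ACGT])(\d+)([ACGT])$")),
    ("deletion", re.compile(r"^del(\d+)-(\d+)$")),
    ("insertion", re.compile(r"^ins([ACGT]+)(\d+)([ACGT]+)$")),
    ("block_substitution", re.compile(r"^([ACGT]+)(\d+)-(\d+)([ACGT]+)$")),
]


def parse_variant(token):
    """Parse one variant token into kind, 1-based start/end, ref and alt."""
    token = token.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(token)
        if match is None:
            continue
        g = match.groups()
        if kind == "substitution":
            pos = int(g[1])
            return {"kind": kind, "start": pos, "end": pos, "ref": g[0], "alt": g[2]}
        if kind == "deletion":
            start, end = int(g[0]), int(g[1])
            if end < start:
                raise ValueError(f"inverted range in {token!r}")
            return {"kind": kind, "start": start, "end": end, "ref": None, "alt": ""}
        if kind == "insertion":
            start = int(g[1])
            end = start + len(g[0]) - 1
            return {"kind": kind, "start": start, "end": end, "ref": g[0], "alt": g[2]}
        start, end = int(g[1]), int(g[2])
        if end - start + 1 != len(g[0]):
            raise ValueError(f"range length does not match reference bases in {token!r}")
        return {"kind": kind, "start": start, "end": end, "ref": g[0], "alt": g[3]}
    raise ValueError(f"unrecognised variant token: {token!r}")


if __name__ == "__main__":
    for tok in ["C100T", "del11288-11296", "insG28262GAACA",
                "AGTAGGG28877-28883TCTAAAC"]:
        print(tok, "->", parse_variant(tok))
```

Strict parsing of this kind also rejects malformed tokens, for example a substitution lacking its final base, before they are compared against the reference genome.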
A further aspect of the present invention relates to a use of the method or the kit as defined above for an enrichment for a rapid virus detection.
In yet another aspect the present invention relates to a use of the method or the kit as defined above for an enrichment for a rapid leukocyte antigen-associated gene identification.
In yet another aspect the present invention relates to a use of the method or the kit for an enrichment for a rapid blood group associated antigen identification.
In a further embodiment the method as defined above additionally comprises a step of sequence comparison with a reference sequence.
Although the present invention will be described with respect to particular embodiments, this description is not to be construed in a limiting sense.
Before describing in detail exemplary embodiments of the present invention, definitions important for understanding the present invention are given.
As used in this specification and in the appended claims, the singular forms of “a” and “an” also include the respective plurals unless the context clearly dictates otherwise.
In the context of the present invention, the terms “about” and “approximately” denote an interval of accuracy that a person skilled in the art will understand to still ensure the technical effect of the feature in question. The term typically indicates a deviation from the indicated numerical value of ±20%, preferably ±15%, more preferably ±10%, and even more preferably ±5%.
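The deviation ranges given above translate into numerical intervals as shown in the following small sketch; the function name is hypothetical and the example value of 45 nucleotides is chosen only for illustration.

```python
def about_interval(value, tolerance_pct=20.0):
    """Return the (lower, upper) bounds for "about value" at a given ± percentage."""
    if tolerance_pct < 0:
        raise ValueError("tolerance must not be negative")
    delta = abs(value) * tolerance_pct / 100.0
    return (value - delta, value + delta)


if __name__ == "__main__":
    # The four tolerance levels named in the definition above.
    for pct in (20.0, 15.0, 10.0, 5.0):
        print(pct, about_interval(45, pct))
```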
It is to be understood that the term “comprising” is not limiting. For the purposes of the present invention the term “consisting of” or “essentially consisting of” is considered to be a preferred embodiment of the term “comprising”. If hereinafter a group is defined to comprise at least a certain number of embodiments, this is meant to also encompass a group which preferably consists of these embodiments only.
Furthermore, the terms “(i)”, “(ii)”, “(iii)” or “(a)”, “(b)”, “(c)”, “(d)”, or “first”, “second”, “third” etc. and the like in the description or in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein. In case the terms relate to steps of a method or use, there is no time or time interval coherence between the steps, i.e. the steps may be carried out simultaneously or there may be time intervals of seconds, minutes, hours, days, weeks, etc. between such steps, unless otherwise indicated.
It is to be understood that this invention is not limited to the particular methodology, protocols, reagents, etc. described herein as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention that will be limited only by the appended claims. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art.
As has been set out above, the present invention concerns in one aspect a method for preparing an RNA sample for a target specific next generation sequencing comprising performing a one-step target enrichment in a single reaction vessel or in a single reaction mixture, wherein said enrichment comprises the steps: (i) exposing the RNA to be sequenced in a single reaction vessel to a mixture comprising a reverse transcriptase, a DNA polymerase, and (a) one or more target-specific reverse primers, suitable for the preparation of a target specific cDNA; and (b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence, and (c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing primer sequence is different from the forward indexing primer sequence, and a second reverse adaptor sequence; and desoxyribonucleoside triphosphates (dNTPs); and (ii) subjecting the reaction mixture of (i) to a series of temperature changes under conditions sufficient to yield a first strand cDNA copy of at least a portion of the RNA to be sequenced, preferably a gene sequence, and subsequently a target specific amplicon comprising starting from the 5′- to the 3′-end a second forward adaptor sequence, a forward index sequence, a first forward adaptor sequence, a forward target specific primer sequence, target sequence, a reverse target specific primer sequence, a first reverse adaptor sequence, a reverse index sequence and a second reverse adaptor sequence.
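The order of elements in the resulting amplicon, as recited above, may be visualized with the following sketch. All sequences are arbitrary placeholders written as they would read on one strand; they are not the primers or adaptors of the invention, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    seq: str


def assemble_amplicon(target, fwd_primer, rev_primer,
                      adaptor1_fwd, adaptor1_rev,
                      index_fwd, index_rev,
                      adaptor2_fwd, adaptor2_rev):
    """Return the amplicon segments in 5' to 3' order as recited in step (ii)."""
    if index_fwd == index_rev:
        raise ValueError("forward and reverse index sequences must differ")
    return [
        Segment("second forward adaptor", adaptor2_fwd),
        Segment("forward index", index_fwd),
        Segment("first forward adaptor", adaptor1_fwd),
        Segment("forward target specific primer", fwd_primer),
        Segment("target", target),
        Segment("reverse target specific primer", rev_primer),
        Segment("first reverse adaptor", adaptor1_rev),
        Segment("reverse index", index_rev),
        Segment("second reverse adaptor", adaptor2_rev),
    ]


def to_sequence(segments):
    """Concatenate the segment sequences into one string."""
    return "".join(segment.seq for segment in segments)


if __name__ == "__main__":
    # Placeholder sequences only (assumption of this sketch).
    amplicon = assemble_amplicon(
        target="ACGTACGTACGT", fwd_primer="GGGAAA", rev_primer="TTTCCC",
        adaptor1_fwd="AAAAC", adaptor1_rev="GTTTT",
        index_fwd="ACAC", index_rev="TGTG",
        adaptor2_fwd="CCCCA", adaptor2_rev="TGGGG",
    )
    for segment in amplicon:
        print(f"{segment.name:32s}{segment.seq}")
    print(to_sequence(amplicon))
```

The first adaptor sequence is shared between the target specific primer and the indexing primer, which is what allows the indexing primer to extend the product of the target specific primer within the same reaction mixture.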
The term “in a single reaction vessel” as used herein means that, due to the innovative combination of primers and ingredients, reverse transcription, target enrichment, index ligation, and sequencing adaptor ligation can be performed in a single place, e.g. a vessel, without any additional intervention for purification or similar steps. Accordingly, all ingredients necessary for the preparation of an RNA sample, or in a further embodiment a DNA sample, can be mixed in said single vessel. This advantageously minimizes the time for the sample processing steps and minimizes the risk of sample mix-up or cross-contamination; the reaction can be controlled by the temperature and cycle duration conditions during the amplification steps.
The term “in a single reaction mixture” as used herein means that, due to the innovative combination of primers and ingredients, all steps of the method of the present invention can be performed in one mixture of ingredients without the necessity of adding further ingredients or inactivating ingredients after a certain step. This advantageously reduces the time and effort for performing the method and minimizes the risk of sample mix-up or cross-contamination.
For many applications, wherein whole genome sequencing is not required, it is often desirable to only sequence a specific subset of genes or regions of the genome. The term “target enrichment” refers to the amplification or multiple reproduction of such specific gene regions, usually by means of polymerase chain reaction amplification, or similar techniques.
Generally, amplification processes are carried out on a DNA target. For the present invention, it may be desirable to analyse not only DNA but also RNA, for example extracted viral RNA. For this purpose, synthesis of DNA from an RNA template via reverse transcription, also known as cDNA synthesis, needs to be carried out prior to sequencing of the RNA sample. Accordingly, as described herein below in more detail, complementary DNA (cDNA) copies are created by using a reverse transcriptase (RT) or a DNA polymerase having RT activity, which results in the production of single-stranded cDNA molecules.
The method thus envisages in a first step (i) the exposure of the RNA to be sequenced in a single reaction vessel to a mixture comprising a reverse transcriptase, a DNA polymerase, several different primers and dNTPs. The term “exposure” as used herein means a contacting of at least one RNA molecule, preferably 1 to 1000 RNA molecules, with an enzyme and dNTPs. The contacting may be performed for any suitable time period, e.g. during the entire method, or until an amplicon has been obtained. The exposure may further be performed in a suitable buffer or reagent. The buffer may comprise KCl, MgCl2, Tris-HCl, DTT, Tween, DMSO, betaine, BSA, urea, gelatine, spermidine, or any other suitable component in any suitable concentration known to the skilled person. The buffer may, in non-limiting examples, comprise Tris-HCl, e.g. in a concentration of 250 mM, KCl, e.g. in a concentration of 375 mM, MgCl2, e.g. in a concentration of 15 mM, and DTT, e.g. in a concentration of 0.1 M, preferably at a pH of 8.3. In addition, a suitable amount of dNTPs, e.g. dATP, dCTP, dGTP and dTTP, has to be used, e.g. in a suitable concentration such as 10 mM. The buffer may further preferably comprise RNase-blocking compounds or RNase inhibitors such as RNaseZap, Superase, RNaseOUT, ribonuclease inhibitor, RNasin or the like.
The term “reverse transcriptase (RT)” as used herein refers to a class of polymerases characterized as RNA dependent DNA polymerases. All known RTs require a primer to synthesize a DNA transcript from an RNA template. The reverse transcriptase to be included in the mixture may be any suitable reverse transcriptase capable of producing cDNA known to the skilled person. Examples of such suitable reverse transcriptases include MMLV reverse transcriptase without RNase H activity, avian myeloblastosis virus (AMV) RT, or commercially available reverse transcriptases such as SuperScript, SuperScript II, SuperScript III, Superscript IV, StrataScript, One step PrimeScript, Qiagen OneStep RT-PCR kit (Qiagen), Luna Universal Probe One-Step RT-qPCR Kits (NEB), TaqPath 1-Step RT-qPCR Master Mix (ThermoFisher) etc. The reverse transcriptase may further be a thermostable transcriptase such as Superscript IV or a non-thermostable transcriptase such as PrimeScript. This property of the transcriptase may have an influence on the reaction conditions, e.g. the reaction temperature for reverse transcription. It is preferred to use One Step PrimeScript (Takarabio), Qiagen OneStep RT-PCR kit (Qiagen), Luna Universal Probe One-Step RT-qPCR Kits (NEB), or TaqPath 1-Step RT-qPCR Master Mix (ThermoFisher).
The DNA polymerase to be included in the mixture may be any suitable DNA polymerase capable of producing amplicons known to the skilled person. Suitable examples include Taq-DNA polymerase, SuperFi DNA polymerase (Thermo Fisher), Q5 High Fidelity DNA polymerase (NEB), One Taq-DNA polymerase (NEB), Bst DNA polymerase (NEB), Pfu DNA polymerase (Promega), GoTaq polymerase (Promega), Taq DNA Polymerase (Thermofisher), Platinum II Taq Hot-Start DNA Polymerase (ThermoFisher), and FastStart Taq DNA Polymerase (Roche).
The mixture further comprises an innovative selection of primers. The term “primer” as used herein refers to a short single-stranded nucleic acid which serves as a starting point for replicating enzymes, such as DNA polymerase or RT, during DNA or cDNA synthesis. The selection of primers according to the method of the present invention comprises as a first group of primers (a) one or more target-specific reverse primers, suitable for the preparation of a target specific cDNA. The term “reverse primer” as used herein relates to a primer that is complementary to the RNA strand. It accordingly allows for the provision of a DNA copy (cDNA) of the RNA strand after synthesis by a reverse transcriptase. The target-specificity or complementarity may be complete or 99%, 98%, 97% or lower. For example, it may allow for one or two mismatches. It is preferred that the complementarity is complete. The target specific reverse primer for cDNA preparation may have any suitable length, e.g. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 or more nucleotides. It is preferred that the target specific reverse primer can distinguish between different targets, e.g. different virus strains, as well as one or more internal controls. In one mixture one or more target-specific reverse primers may be present, e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. This group of primers may be used for a multiplexing of target sequences, thus yielding a group of cDNA molecules which can subsequently be further processed or enriched. If more than one target-specific reverse primer is used, at least one primer may bind to a sequence of a target entity, whereas the other primer may bind to a control sequence, e.g. a sequence from the host or any other suitable sequence. This step thus allows for a multiplexing amplification during the enrichment procedure.
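The notion of complementarity between a reverse primer and the RNA strand, including the tolerance for one or two mismatches, may be illustrated by the following sketch. The template and primer sequences are invented for illustration, and the simple mismatch count ignores thermodynamic effects; the function names are hypothetical.

```python
# Maps each base to its DNA complement; U (RNA) pairs with A.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_binding_site(template, primer):
    """Find the template window (0-based start) to which the primer anneals
    with the fewest mismatches. Returns (start, mismatches)."""
    primer = primer.upper()
    k = len(primer)
    if k == 0 or k > len(template):
        raise ValueError("primer must be non-empty and no longer than the template")
    best = None
    for start in range(len(template) - k + 1):
        # A perfectly matching primer equals the reverse complement of its site.
        site = reverse_complement(template[start:start + k])
        mismatches = sum(a != b for a, b in zip(primer, site))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


def complementarity_percent(primer, mismatches):
    """Percentage of primer positions that pair with the template."""
    return 100.0 * (len(primer) - mismatches) / len(primer)


if __name__ == "__main__":
    rna = "AUGGCUAGCUUAGGC"   # invented RNA template, 5' to 3'
    primer = "AAGCTA"          # invented reverse primer, 5' to 3'
    start, mismatches = best_binding_site(rna, primer)
    print(start, mismatches, complementarity_percent(primer, mismatches))
```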
In certain embodiments 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target sequences are simultaneously amplified. The mixture may accordingly comprise a corresponding number of target specific primers. It is particularly preferred that 2 or 3 target sequences are simultaneously amplified.
The term “target sequence” as used in the context of the present invention relates to any sequence of interest. It may, for example, be a sequence which is derived from a gene, i.e. a specific target gene, or a part of said gene, such as an exon or intron or part of both, an intergenic region, a transcript (RNA), a genomic sequence or a part of it, a splice site, a functional domain, a regulatory sequence such as a promoter sequence, a sector with known SNPs, a mutational hotspot, a sequence associated with a disease, with a resistance to a drug, with an immunological deficit etc. The target sequence may be of RNA or DNA origin.
In preferred embodiments the target-specific primer (in particular in case of an RNA sample where cDNA is produced) or the target-specific primer pair (in particular in case of a DNA sample where a dsDNA amplicon is produced) is specific for a target sequence. The target sequence may have any suitable length, e.g. 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, 1000 nt, 2000 nt, 3000 nt, 4000 nt, 5000 nt, 10 000 nt, 15 000 nt or more or any value in between the mentioned values. The target sequence is, in particularly preferred embodiments, a viral gene or a part of a viral genome. It may also be a leukocyte antigen-associated gene, or a blood group antigen associated gene.
The viral gene may, in certain embodiments, be a viral gene of a coronavirus, in particular a gene of the SARS-CoV-2 virus. It may also be a genomic portion, or it may be a part or sub-section of a gene, e.g. a region spanning any 100, 150, 200, 300, 400, 500 nt etc. The target sequence may, for example, comprise, essentially consist of, or consist of the 5′UTR, 3′UTR, ORF1ab, Orf3a, Orf6, Orf7a, Orf7b, Orf8, Orf10, M gene region, E gene region, N gene region, or S gene region of SARS-CoV-2, or any 100, 150, 200, 300 nt etc. fragment within these entities, or spanning two or more of these entities.
In further very specific embodiments, the target sequence comprises one or more positions of genomic mutation in the genome of SARS-CoV-2. This may also include not yet known mutations, which are to be detected in the future. These mutations may, in typical situations, lead to synonymous or nonsynonymous amino acid substitutions, or deletions or other changes in the genome. The mutations may, in preferred embodiments, be associated with phenotypical changes of virus biology, e.g. lead to a changed binding or infection behavior, a changed mortality, a changed susceptibility of the virus to vaccination induced immune reactions etc. In several cases the mutation may have an influence on the structure and/or conformation of the spike or S protein of SARS-CoV-2. In more specific embodiments the mutation may have an influence on the binding interface of the spike protein and its cognate receptor, e.g. ACE2. The present invention accordingly envisages one or more of the following nucleotide positions according to the nucleotide numbering of the reference genome of SARS-CoV-2 (reference genome with NCBI Reference Sequence No: NC_045512.2; SEQ ID NO: 63), which are to be comprised in a target sequence, e.g. of any suitable size such as 100, 150, 200, 300, 400, 500 nt: Position (according to the numbering scheme of NC_045512.2) 100, 733, 1264, 2749, 3267, 3828, 5388, 5648, 6319, 6573, 6613, 6954, 7600, 7851, 10667, 11078, 11288-11296, 11824, 12964, 12778, 13860, 17259, 19602, 19656, 21614, 21621, 21638, 21765-21770, 21974, 21991-21993, 22132, 22812, 23012, 23063, 23271, 23525, 23604, 23709, 24506, 24642, 24914, 26149, 27853, 27972, 28048, 28111, 28167, 28253, 28262, 28280, 28512, 28628, 28877, 28975, 28977, 29722, and/or 29754.
In further specific embodiments the method according to the present invention comprises the detection of one or more of the following nucleotide exchanges or modifications at positions of the reference genome of SARS-CoV-2 (reference genome with NCBI Reference Sequence No: NC_045512.2; SEQ ID NO: 63): C100T, T733C, G1264T, C2749T, C3267T, C3828T, C5388A, A5648C, A6319G, C6573T, A6613T, T6954C, C7600T, C7851T, T10667G, T11078C, del11288-11296, C11824T, A12964G, C12778T, C13860T, G17259T, C19602T, G19656T, C21614T, C21621A, C21638T, del21765-21770, G21974T, del21991-21993, G22132T, A22812C, G23012A, A23063T, C23271A, C23525T, C23604A, C23709T, T24506G, C24642T, G24914C, T26149C, A27853C, C27972T, G28048T, A28111G, G28167A, C28253T, insG28262GAACA, G28280C, C28512G, G28628T, AGTAGGG28877-28883TCTAAAC, G28975T, C28977T, C29722T, and C29754T. These nucleotide exchanges or modifications relate to a difference vs. the wild-type sequence set out in NC_045512.2. In several instances the presence of a nucleotide exchange indicates a mutated virus. In such cases a specific alert may be triggered. Furthermore, information may be aggregated into statistics, and sequence information may be provided to local, regional, national or international health organizations or decision makers.
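The notation used in the above list can be handled programmatically. The following Python sketch is an illustration only; its function names and the example window are hypothetical. It parses the two simplest notations, single-nucleotide exchanges such as C100T and deletions such as del11288-11296, and checks whether a chosen target window on NC_045512.2 covers the affected positions. The insertion and multi-nucleotide replacement notations in the list are deliberately not handled and raise an error.

```python
import re

# Only substitutions ("C100T") and deletions ("del11288-11296") are parsed;
# other notations from the list are rejected on purpose.
SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")
DELETION = re.compile(r"^del(\d+)-(\d+)$")


def parse_mutation(label: str) -> dict:
    """Turn a mutation label into a dictionary with 1-based coordinates."""
    match = SUBSTITUTION.match(label)
    if match:
        position = int(match.group(2))
        return {"type": "substitution", "start": position, "end": position,
                "ref": match.group(1), "alt": match.group(3)}
    match = DELETION.match(label)
    if match:
        return {"type": "deletion", "start": int(match.group(1)),
                "end": int(match.group(2)), "ref": None, "alt": None}
    raise ValueError(f"unsupported mutation notation: {label}")


def covered_by(mutation: dict, target_start: int, target_end: int) -> bool:
    """True if the whole mutated region lies inside the target window."""
    return target_start <= mutation["start"] and mutation["end"] <= target_end
```

For example, `parse_mutation("A23063T")` yields a substitution at position 23063, and `covered_by` reports whether an assumed target window such as 22900 to 23200 contains it.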
In further embodiments the target sequence is a leukocyte antigen-associated gene. Envisaged examples include HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G, HLA-DRA1, HLA-DRB1, HLA-DRB3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1, or HLA-DPB1, or variants thereof. Corresponding genetic information may be known to the skilled person or can be derived from suitable literature or internet sources. For example, information may be derived from the IPD-IMGT/HLA Database (https://www.ebi.ac.uk/ipd/imgt/hla/; last visited on Apr. 20, 2021); see also Robinson et al., Nucleic Acids Research, 2020, 48: D948-D955.
In a further specific embodiment the target sequence is a blood group antigen associated gene. Envisaged blood group antigens comprise those of the ABO, MNS, Rhesus, Lutheran, Kell, Lewis, Duffy, Kidd, Diego, Yt, Scianna, Dombrock, Colton, Cromer, and Vel blood groups. Corresponding genetic information may be known to the skilled person or can be derived from suitable literature or internet sources. For example, information may be derived from the BGMUT database or the dbRBC (database Red Blood Cells) resource of NCBI at the NIH (see also Patnaik et al., Nucleic Acids Res., 2012, 40, D1023-D1029).
The innovative selection of primers in the method of the invention further comprises as a second group of primers (b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence. The group of primers is envisaged for an enrichment step, which specifically enriches a target sequence. The term “forward primer” as used herein relates to a primer that is complementary to the sequence of the reverse strand. Accordingly, the forward primer allows for providing copies of the template strand, e.g. of the cDNA and subsequently derived DNA molecules. The forward target specific primer advantageously comprises two sections or portions, a target specific portion and an adaptor portion. The target specific portion is complementary to a target sequence on the cDNA. The target specific forward primer portion may have any suitable length, e.g. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 or more nucleotides. It may be fully complementary or allow for one or two mismatches. The adaptor portion is located at the 5′ end of the primer. It corresponds to a sequencing primer sequence and, at the same time, may be used as adaptor for binding to primers of group (c), i.e. to indexing primers. The adaptor portion may have any suitable length, e.g. 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 or more nucleotides. It is preferred that the adaptor sequence has a length of 8 to 45 nucleotides. The reverse target specific primer comprising a first reverse adaptor sequence as used herein is constructed in a similar way as the forward primer. Accordingly, the reverse primer allows for providing copies of the complementary strand, e.g. a DNA molecule derived from cDNA. 
The reverse target specific primer advantageously comprises two sections or portions, a target specific portion and an adaptor portion. The target specific portion is complementary to a target sequence on the cDNA-derived complementary DNA strand. The target specific reverse primer portion may have any suitable length, e.g. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 or more nucleotides. It may be fully complementary or allow for one or two mismatches. The adaptor portion is located at the 5′ end of the primer. It corresponds to a sequencing primer sequence and, at the same time, may be used as adaptor for binding to primers of group (c), i.e. to indexing primers. The adaptor portion may have any suitable length, e.g. 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 or more nucleotides. It is preferred that the adaptor sequence has a length of 8 to 45 nucleotides. This group of primers may be used for sequence enrichment. The target specific forward and reverse primers bind to the cDNA sequence (and its complement) which can be obtained with the primers of group (a) and which is unique for every target of interest. Their use therefore makes it possible to distinguish between different targets and even allows for including an internal control. Accordingly, a huge number of double stranded DNA molecules, which are fully or at least highly complementary to the cDNA template sequence, can be synthesized with the help of a DNA polymerase, e.g. via PCR steps.
The innovative selection of primers in the method of the invention further comprises as a third group of primers (c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing primer sequence is different from the forward indexing primer sequence, and a second reverse adaptor sequence. The forward primer of said group (c) accordingly comprises three sections of which the most 3′ oriented section is a first forward adaptor sequence, which may be identical to the first forward adaptor sequence of the forward primers of group (b) and thus complementary to a corresponding portion of the enriched molecules. Also envisaged are sequences which are partially identical, e.g. 80% or more identical. This adaptor sequence is designed to be bound by a sequencing primer for a subsequent sequencing activity. The second section, more 5′ oriented, is an indexing sequence. The term “indexing sequence” as used herein relates to a sequence which is artificially included in a polynucleotide and which serves for identification purposes after a characterization step, e.g. after sequencing. The indexing sequence may, thus, inform the user which of several samples is being characterized, e.g. sequenced. An indexing section accordingly comprises a unique sequence which is provided only once, i.e. for one type of molecule/polynucleotide, e.g. within one sample. The indexing sequence is preferably different from known naturally occurring sequence motifs. In other embodiments, it is preferably long enough to avoid mix-ups with naturally occurring sequences or different indexing sequences. According to preferred embodiments, the indexing sequence has a length of at least 4 to about 25 or more nucleotides, preferably a length of about 4 to 20 nucleotides. 
Further details would be known to the skilled person, or can be derived from suitable literature sources such as Kozarewa et al., 2011, Methods Mol. Biol. 733, 279-298. The third section, at the 5′ terminus, is a further, second adaptor sequence. This second adaptor sequence is capable of interacting with a substrate or device, e.g. a flow cell, to facilitate sequencing. The second adaptor sequence may have any suitable length, e.g. 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 or more nucleotides. It is preferred that said adaptor sequence has a length of 8 to 45 nucleotides. The sequence is preferably complementary to a fishing sequence at a substrate or device, e.g. at the surface of a sequencing chip or flow cell such as an Illumina sequencing flow cell. It is particularly preferred that the forward indexing primer is a primer selected from the group comprising primers of SEQ ID NO: 32 to SEQ ID NO: 39.
The reverse indexing primer of said group (c) has an identical arrangement of three sections as described above for the forward indexing primer. Accordingly, it comprises a first adaptor sequence, an indexing sequence and a second adaptor sequence. Importantly, it is advantageously envisaged that the indexing sequence of the reverse primer of said group (c) is not identical to the indexing sequence of the forward primer. This allows for a distinction of both strands upon sequencing and thus provides two differentiable and separately identifiable strands of a molecule. It is particularly preferred that the reverse indexing primer is a primer selected from the group comprising primers of SEQ ID NO: 40 to SEQ ID NO: 51.
Accordingly, the primers of group (c) allow for a preparation of the molecule for a subsequent sequencing step. This preparation includes a dual indexing. Thus, a unique combination of forward and reverse index sequences is added to every sample, which advantageously allows a pooling of a high number of samples and their simultaneous sequencing. Accordingly, the number of samples sequenced in parallel in a single run is not limited by the possibilities to design sample specific indexes, but may rather depend on the sequencing platform and potentially its data output.
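The dual indexing scheme can be illustrated with a short Python sketch. It assumes, for illustration only, that each of the forward indexing primers of SEQ ID NO: 32 to 39 and each of the reverse indexing primers of SEQ ID NO: 40 to 51 carries a distinct index, so that 8 × 12 = 96 unique forward/reverse combinations are available. The function name and the sample identifiers are hypothetical.

```python
from itertools import product

# Labels only; the actual index sequences are not reproduced here.
forward_primers = [f"SEQ ID NO: {n}" for n in range(32, 40)]  # 8 forward
reverse_primers = [f"SEQ ID NO: {n}" for n in range(40, 52)]  # 12 reverse


def assign_index_pairs(sample_ids, forward, reverse):
    """Give every sample its own (forward, reverse) indexing primer pair."""
    pairs = list(product(forward, reverse))
    if len(sample_ids) > len(pairs):
        raise ValueError(
            f"{len(sample_ids)} samples but only {len(pairs)} index pairs")
    return dict(zip(sample_ids, pairs))


# Hypothetical sample identifiers for a full 96-sample pool.
samples = [f"sample_{i:03d}" for i in range(1, 97)]
assignment = assign_index_pairs(samples, forward_primers, reverse_primers)
```

Under these assumptions a pool of up to 96 samples can be resolved after sequencing, and adding further index sequences on either side increases the number of available combinations multiplicatively.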
The method according to the invention comprises, upon contacting the RNA molecule with the primer groups as described above, (ii) subjecting the reaction mixture of (i) to a series of temperature changes. These temperature changes are designed to successively make use of primer groups (a), (b) and (c).
The first set of conditions is designed to allow for the production of a cDNA molecule. Conditions for this step may vary according to the primer length and sequence and the reverse transcriptase used. For example, for thermostable reverse transcriptase, e.g. Superscript IV, a suitable annealing temperature for the primer and a reaction temperature of about 50° C. may be used. For non-thermostable reverse transcriptases a lower temperature, e.g. of 25° C. may be used, preferably with e.g. OneStep PrimeScript. It is preferred to use a low temperature of about 25° C. for the reverse transcription. The reverse transcription step may be performed for any suitable length of time, e.g. for about 3 to 15 min, preferably for about 5 minutes.
The second set of conditions is designed to allow for the enrichment of target sequences from the cDNA molecule with the primers of group (b) and the preparation of molecules for sequencing with the primers of group (c). Conditions for this step may vary according to the primer length, the target sequence and the DNA polymerase used. Typically, a denaturation step, a primer annealing step and an extension or polymerisation step are used. These steps are repeated several times, e.g. 15 to 35 times. For example, the denaturation may be performed at temperatures of about 95° C. The annealing step may be performed in the range of about 50 to 60° C. The extension may be performed, depending on the DNA polymerase, at a temperature of about 55 to 72° C. Time periods may be adapted to the target sequence length or the number of cycles. Typically, denaturation periods are about 15 to 30 sec and annealing periods are about 15 to 30 sec. The extension period may vary considerably. Typically, about 1 min of extension time may be calculated for about 1000 base pairs to be produced.
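The two sets of conditions can be summarized as a simple thermal program. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the default values (25° C. for 5 min, 30 cycles, 95° C., 55° C., 72° C., 20 sec holds) are merely picked from within the ranges given above, and ramp times as well as any initial denaturation or final hold are ignored.

```python
import math


def extension_seconds(amplicon_bp, seconds_per_kb=60):
    """Extension time from the rule of thumb of about 1 min per 1000 bp."""
    return math.ceil(amplicon_bp * seconds_per_kb / 1000)


def build_program(amplicon_bp, cycles=30, rt_temp_c=25, rt_minutes=5,
                  denature_c=95, anneal_c=55, extend_c=72, hold_seconds=20):
    """Return steps as (label, temperature in deg C, seconds, repeats)."""
    return [
        ("reverse transcription", rt_temp_c, rt_minutes * 60, 1),
        ("denaturation", denature_c, hold_seconds, cycles),
        ("annealing", anneal_c, hold_seconds, cycles),
        ("extension", extend_c, extension_seconds(amplicon_bp), cycles),
    ]


def total_seconds(program):
    """Sum of all hold times, ignoring ramping between temperatures."""
    return sum(seconds * repeats for _, _, seconds, repeats in program)
```

For an assumed 500 bp amplicon the extension step is 30 sec, and `total_seconds(build_program(500))` gives 2400 sec, i.e. 40 min of hold time for the whole one-step reaction.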
After having finished the enrichment and preparation steps, a target specific amplicon is obtained. This amplicon comprises the following segments from 5′ to 3′ end: (1) a second forward adaptor sequence which is suitable for binding to a substrate, (2) a forward indexing sequence, (3) a first forward adaptor sequence which is complementary to a sequencing primer, (4) a forward target specific primer sequence, (5) a target sequence of variable length according to the selected target and the selected primers, (6) a reverse target specific primer sequence, (7) a first reverse adaptor sequence, which is complementary to a sequencing primer, (8) a reverse indexing sequence, (9) a second reverse adaptor sequence, which is suitable for binding to a substrate. The double stranded amplicon can thus be sequenced and identified according to the indexing sequence on both strands in parallel.
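The nine-segment architecture of the amplicon can be expressed as an ordered layout. The Python sketch below is illustrative only: the function name is hypothetical and the segment lengths are made-up values chosen within the ranges mentioned in this description. It computes 1-based start and end coordinates of each segment and thereby the expected total amplicon length.

```python
# Segment order (1) to (9), from the 5' end to the 3' end of the amplicon.
SEGMENT_ORDER = [
    "second forward adaptor",
    "forward index",
    "first forward adaptor",
    "forward target-specific primer",
    "target sequence",
    "reverse target-specific primer",
    "first reverse adaptor",
    "reverse index",
    "second reverse adaptor",
]


def amplicon_layout(lengths):
    """Return (segment, start, end) tuples with 1-based inclusive coordinates."""
    missing = [name for name in SEGMENT_ORDER if name not in lengths]
    if missing:
        raise ValueError(f"missing segment lengths: {missing}")
    layout = []
    start = 1
    for name in SEGMENT_ORDER:
        end = start + lengths[name] - 1
        layout.append((name, start, end))
        start = end + 1
    return layout


# Hypothetical lengths in nucleotides, for illustration only.
example_lengths = {
    "second forward adaptor": 29, "forward index": 8,
    "first forward adaptor": 33, "forward target-specific primer": 20,
    "target sequence": 300, "reverse target-specific primer": 20,
    "first reverse adaptor": 34, "reverse index": 8,
    "second reverse adaptor": 24,
}
layout = amplicon_layout(example_lengths)
```

With these assumed lengths the target sequence occupies positions 91 to 390 and the complete amplicon is 476 nt long, which can be compared with the fragment size observed in a quality control step.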
Advantageously, this resulting product can be obtained in a single vessel, which allows for a very efficient, high-throughput and less time-consuming overall sequencing approach. The approach thus minimizes the time for the sample processing steps and minimizes the risk of sample mix-up or cross-contamination, and the process can be controlled by specific parameters such as temperature and cycle duration conditions during amplification steps. As a further advantageous feature of the invention, PCR products can, in certain embodiments, be pooled after the target enrichment since every single sample is combined with a sample specific index. Consequently, the number of vessels can be reduced by pooling of samples. Further steps such as PCR product purification and normalization can be performed with a significantly reduced number of vessels, which saves time and reagents. Furthermore, subsequent sequencing steps may directly be performed with the obtained product. The present invention envisages a pooling of 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 1000, 2000, 2500, 3000 or more or any value in between the mentioned values of different enrichment products (e.g. derived from a corresponding number of single vessels). In specific embodiments the maximum number of pooled different enrichment products may be limited by the number of available different indexing sequences. The maximum number of pooled different enrichment products can accordingly be adjusted to the choice and amount of different indexing sequences, or to any other suitable parameter.
In an alternative aspect the present invention relates to a method for preparing a DNA sample for a target specific next generation sequencing comprising performing a one-step target enrichment in a single reaction vessel or in a single reaction mixture, wherein said enrichment comprises the steps: (i) exposing the DNA to be sequenced in a single reaction vessel to a mixture comprising a DNA polymerase, and (a) one or more target-specific forward primers and one or more target specific reverse primers, suitable for the preparation of a target specific DNA; and (b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence, and (c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing sequence is different from the forward indexing sequence, and a second reverse adaptor sequence; and (ii) subjecting the reaction mixture of (i) to a series of temperature changes under conditions sufficient to yield a target specific amplicon comprising, starting from the 5′- to the 3′-end, a second forward adaptor sequence, a forward indexing sequence, a first forward adaptor sequence, a forward target specific primer sequence, a target sequence, a reverse target specific primer sequence, a first reverse adaptor sequence, a reverse indexing sequence and a second reverse adaptor sequence.
The method for preparing a DNA sample for a target specific next generation sequencing largely corresponds to the method for preparing an RNA sample for target specific next generation sequencing. The above explained features and details thus apply also to the method for preparing a DNA sample, with the exception that the innovative group of primers (a) is designed for the amplification of a DNA molecule. Accordingly, the mixture does not comprise a reverse transcriptase, but only a DNA polymerase as mentioned above. Accordingly, the primers of group (a) may comprise one or more forward and reverse primers for the target sequence, thus allowing an amplification of both strands of the target DNA at the same time. The initial amplification may be followed by a target enrichment and sample indexing step which fully corresponds to the RNA based method mentioned above. As to step (ii), the reaction mixture of (i) is subjected to a series of temperature changes under conditions sufficient to yield a target specific amplicon. These steps differ from the steps mentioned in the context of the RNA based method by the omission of a first reverse transcription step. Accordingly, a series of temperature changes including denaturation, annealing and extension as explained above may be used.
Accordingly, after having finished the enrichment and preparation steps for the DNA sample, a target specific amplicon is obtained. This amplicon comprises the following segments from 5′ to 3′ end: (1) a second forward adaptor sequence which is suitable for binding to a substrate, (2) a forward indexing sequence, (3) a first forward adaptor sequence which is complementary to a sequencing primer, (4) a forward target specific primer sequence, (5) a target sequence of variable length according to the selected target and the selected primers, (6) a reverse target specific primer sequence, (7) a first reverse adaptor sequence, which is complementary to a sequencing primer, (8) a reverse indexing sequence, (9) a second reverse adaptor sequence, which is suitable for binding to a substrate. The double stranded amplicon can thus be sequenced and identified according to the indexing sequence on both strands in parallel.
In further embodiments, the method as defined above additionally comprises, as a first step, the extraction of RNA from a sample obtained from a subject, preferably by sample lysis, or, alternatively, the extraction of DNA from a sample obtained from a subject.
For the extraction of RNA, maintaining RNA integrity is critical and requires special precautions during extraction, processing, storage, and experimental use. It is accordingly preferred to perform the method with nuclease-free labware and reagents. To isolate and purify RNA, a variety of strategies are available depending on the type of source material. It is, in particular, envisaged to stabilize RNA molecules, to inhibit RNases, and to maximize yield. Envisaged purification methods typically remove endogenous compounds, such as complex polysaccharides that may interfere with enzyme activity, and common inhibitors of reverse transcriptases, such as salts, metal ions, ethanol, and phenol. Typically, the extraction is performed with a suitable cell lysis buffer, e.g. a commercially available cell lysis buffer such as RNeasy (Qiagen) or RLA (Promega). Typically, the cell lysis buffer for RNA extraction is highly denaturing and usually contains guanidine isothiocyanate. RNase inhibitors are usually present in the lysis buffer, since RNases can be resistant to denaturation and remain active. Also envisaged is the use of paramagnetic beads, e.g. SPRI beads.
For extraction of DNA a similar approach may be used. However, the employment of RNA stabilizers and RNase inhibitors is not necessary. Typically, cells in a sample are separated from each other, often by a physical means such as grinding or vortexing, and put into a solution containing salt. The positively charged sodium ions in the salt help protect the negatively charged phosphate groups that run along the backbone of the DNA. Subsequently, as much of the cellular debris as possible needs to be removed. This is typically done by using a protease to degrade DNA-associated proteins and other cellular proteins. Alternatively, some of the cellular debris can be removed by filtering the sample. Finally, the DNA is precipitated by adding isopropanol to the mixture. Further, magnetic beads-based methods or column-based methods can be used. For cell lysis typically a lysis buffer which commonly contains SDS is used. Also envisaged are commercial extraction kits such as DNAzol (ThermoFisher), PureLink (ThermoFisher), Monarch (NEB) or Wizard (Promega).
In embodiments of the invention the sample may be a liquid sample.
The term “liquid sample” refers to a liquid material obtained via suitable methods from one or more biological organisms or comprising one or more biological organisms, or processed after having been obtained. The liquid sample may further be material obtained from contexts or environments in which biological organisms are present, or processed variants thereof. Typically, the liquid sample is an aqueous sample. In preferred embodiments, it may comprise a bio-organic fluid obtained from the body of a mammal that is taken for analysis, testing, quality control, or investigation purposes. In a preferred embodiment, said liquid sample may be a cell culture sample, a cell suspension, whole blood, blood plasma, urine, lavage, smear, mouth swab, throat swab, cerebrospinal fluid, saliva or stool sample, or a tissue or biopsy sample. It may further be a blood component or banked blood sample, bile, nasal fluid, ear fluid, sweat, sputum, semen, breast fluid, milk, colostrum, pleural fluid, ascites, amniotic fluid or bronchoalveolar lavage fluid, gastric fluid, aqueous humor, vitreous humor, gastrointestinal fluid, exudate, transudate, pericardial fluid, upper airway fluid, peritoneal fluid, or a liquid stool sample. Also envisaged are a fluid harvested from a site of an immune response, or fluid harvested from a pooled collection site. Furthermore, the liquid sample may contain a tissue extract derived from body tissues, e.g. tissues obtained via biopsy or resections, preferably from a eukaryotic organism, more preferably from a mammalian organism, even more preferably from a human being. The biopsy material may be derived, for example, from all suitable organs, e.g. the lung, the muscle, brain, liver, skin, pancreas, stomach, heart, intestine etc., a nucleated cell sample, a fluid associated with a mucosal surface, or skin.
In order to be extracted, the biopsy material is typically homogenized and/or resuspended in a suitable buffer solution as known to the skilled person. Such samples may, in specific embodiments, be pre-processed e.g. by enrichment steps and/or dilution steps etc. Typically, the sample is processed by lysis and subsequent RNA or DNA extraction as outlined above. The sample may, in further embodiments, be a solid sample. A solid sample, e.g. a solid tissue sample or a solid cell accumulation may subsequently be diluted in a suitable buffer for further processing steps. In addition, any suitable sample derived from the environment, food sources, organic or biological sources (e.g. animals, in particular mammals, plants etc.) may be used, e.g. after processing. It is preferred that the sample is a swab sample, e.g. taken from nose and/or mouth and/or throat zones of the body. Also preferred are blood and processed blood samples, tissue sample or cell culture sample.
For control purposes one or more additional sequences of interest may be analysed. These additional sequences are selected among genes or transcripts which are typically expected to be present in a sample. Such sequences are further expected to be present in a wide variety of tissues, cell types and samples and to show no or only minimal changes in expression levels between individual samples and experimental conditions. For RNA detection typically expressed genes are used. A suitable example is a mammalian house-keeping gene such as RNase. Also envisaged is the use of Glyceraldehyde 3-phosphate dehydrogenase (GAPDH). For DNA detection any genomic sequence may be used.
Additional sequences as mentioned above may advantageously be used for different purposes. They can be used as an extraction control yielding information on the amount and/or quality of the sample. They may further be used as a process control to show whether amplification steps have worked properly. In typical embodiments the control readout is the generation of a sequence read for the envisaged control sequence.
In preferred embodiments of the invention a sample obtained from a subject is registered. The term “sample registration” as used herein means that the sample is unambiguously connected to a subject, as well as to a date and optionally a place, a time, a subject's birth date, email address, telephone number, street address, the subject's responsible general practitioner, the subject's emergency contact, a subject's health insurance information etc. The sample registration is preferably performed prior to the enrichment. The registration of a subject's data may be rendered anonymous. For example, the data may be connected to a number and be stored in a separate place or system. It is particularly preferred to register samples with a digital code or number. This digital code or number is preferably chosen to be unambiguous, i.e. it should have a suitable length or complexity.
The registration may be performed by the subject during the sample taking process. For example, the subject is asked to provide all necessary information. It is preferred to collect the information electronically, e.g. via a mobile digital device such as a cell phone, tablet computer, smartwatch, or a laptop computer, preferably via an app working on the device or a suitable interface, e.g. a web interface. Alternatively, the information may also be collected with any non-mobile computer system, or in another form, e.g. on paper or as audio data.
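A registration with an unambiguous digital code, in which the subject's data are stored separately from the sample, may be sketched as follows in Python. This is an illustration under assumptions: the function name, the field names and the in-memory dictionary standing in for the separate storage system are hypothetical, and a code length of 16 hexadecimal characters is just one possible choice of "suitable length or complexity".

```python
import secrets


def register_sample(subject_data, identity_store, code_bytes=8):
    """Create a random sample code and file the subject data under it.

    Only the returned code travels with the sample; `identity_store`
    stands in for the separate place or system holding personal data.
    """
    code = secrets.token_hex(code_bytes)
    while code in identity_store:  # extremely unlikely, but keeps codes unique
        code = secrets.token_hex(code_bytes)
    identity_store[code] = dict(subject_data)
    return code


# Hypothetical usage with made-up subject data.
identity_store = {}
sample_code = register_sample(
    {"name": "Jane Doe", "date": "2021-04-20"}, identity_store)
```

The sample tube and all downstream sequencing data would then carry only `sample_code`, while the link to the subject can be resolved solely through the separately kept store.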
In further embodiments, the method additionally comprises a step of purification of the amplicon as obtained in step (ii). This step is envisaged in order to avoid quality and efficiency problems during a subsequent sequencing step. The purification of the DNA amplicons may be performed according to any suitable method known to the skilled person. For example, obtained amplicons may be purified with a column-based technique or magnetic/streptavidin bead based methods. For example, spin columns may be used to quickly and efficiently purify PCR products from enzymes, dNTPs, salts and primers. The DNA is typically eluted from the spin column with a buffer and can subsequently be used for sequencing steps. Envisaged commercial examples are the QIAquick PCR purification kit (Qiagen) or GenCatch (Epoch).
The method as described herein additionally comprises a step of quantifying the amplicon. This step may, for example, be performed spectrophotometrically, e.g. by measuring intrinsic absorptivity properties of nucleic acids (DNA or RNA), or with fluorophore based methods, a fragment analyzer or by real-time PCR. When an absorption spectrum is measured, nucleic acids absorb light with a characteristic peak at 260 nm. A corresponding signal is typically measured by spectrophotometers or spectrometers. Alternatively, a quantification measurement may be performed via electrophoresis of an amplicon sample and a subsequent staining, e.g. with ethidium bromide.
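A spectrophotometric quantification can be turned into a concentration with a short calculation. The Python sketch below relies on two widely used conventions that are not taken from this description and should be treated as assumptions: an absorbance of 1.0 at 260 nm (1 cm path length) corresponds to about 50 ng/µl of double stranded DNA, and one base pair has an average molar mass of about 660 g/mol. The function names are hypothetical.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # common convention for dsDNA, 1 cm path
GRAMS_PER_MOL_PER_BP = 660.0     # average molar mass of one base pair


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate the dsDNA concentration in ng/µl from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def molarity_nm(conc_ng_per_ul, amplicon_bp):
    """Convert ng/µl to nmol/l for a dsDNA amplicon of known length."""
    return conc_ng_per_ul * 1e6 / (GRAMS_PER_MOL_PER_BP * amplicon_bp)
```

For an assumed reading of A260 = 0.2 on an undiluted sample, the estimate is about 10 ng/µl, which for a 500 bp amplicon corresponds to roughly 30 nM. Such molar values are useful when amplicons from different samples are to be pooled in equal amounts.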
In further specific embodiments the method allows for a qualitative detection of a target sequence. The method thus provides a diagnostically relevant answer to the question whether a sequence is present or not. The qualitative detection may, for example, be based on a predefined cut-off amount of detected molecules in a specific volume and/or after a specific number of PCR cycles. Should the detected number be below said threshold, a negative answer is given; if the threshold is reached or surpassed, a positive answer is given. The exact threshold value may depend on the equipment and reagents used. It may further be calibrated with specific control and calibration solutions, e.g. comprising a predefined amount of target sequence. In a corresponding embodiment, the qualitative detection of a target sequence may also provide a diagnostically relevant answer to the question whether an organism or virus comprising said target sequence is or was, or parts of it are or were, present or not in the sample. In further embodiments, the method allows for a qualitative detection of a sequence having a sequence identity of 97% or more with a certain, e.g. predefined target sequence.
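The sequence identity criterion mentioned at the end of the preceding paragraph can be illustrated as follows. This is a minimal sketch, assuming that the detected sequence and the target sequence have already been aligned without gaps and have equal length; real comparisons would typically rely on an alignment tool. The function names are illustrative.

```python
def sequence_identity(detected, target):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(detected) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(detected.upper(), target.upper()))
    return matches / len(target)


def matches_target(detected, target, min_identity=0.97):
    """True if the detected sequence reaches the required identity to the target."""
    return sequence_identity(detected, target) >= min_identity


if __name__ == "__main__":
    target = "ACGT" * 25                      # 100 nt illustrative target
    detected = "TTT" + target[3:]             # 3 mismatches in 100 positions
    print(sequence_identity(detected, target), matches_target(detected, target))
```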
As used herein, the terms “next generation sequencing” and “deep sequencing” are related terms that describe a DNA sequencing technology which allows multi-million DNA samples to be sequenced simultaneously. This next generation sequencing approach is typically a massively parallel sequencing approach which may include any suitable sequencing method that determines the nucleotide sequence of the amplicon according to the present invention in a highly parallel fashion. For example, more than 10⁸ molecules may be sequenced simultaneously. The sequencing may be performed according to any suitable massive parallel approach. Typical platforms include Roche 454, GS FLX Titanium, Illumina, Life Technologies, Ion Torrent, Oxford Nanopore Technologies, Solexa, Solid or Helicos Biosciences Heliscope systems, MGI Tech or SMRT Sequencing. Preferred is the Illumina platform. The sequencing may further include subsequent imaging and initial data analysis steps.
It is further envisaged that the method steps according to the invention, including or excluding steps such as nucleic acid extraction, NGS sequencing, imaging and initial data analysis, be performed in a semi-automatic or automatic manner. For example, the core steps of preparing an RNA sample or a DNA sample for sequencing and optionally also the other steps such as nucleic acid extraction, NGS sequencing, imaging and initial data analysis may be performed in a sample analyzer or a robotic or liquid sample handling system. The analyzer or handling system may, for example, comprise modules for one or more different assay(s) or activities, e.g. an RNA or DNA preparation module and a sequencing module, a pH sensor, a sensor for ionic concentrations etc. Also envisaged is the presence of reaction zones, which comprise one or more reagent(s) necessary for the performance of a method, e.g. buffers, ions, nucleotides, dyes etc. The analyzer may further or alternatively be equipped with an image recognition module. The analyzer may also be equipped with microfluidic elements, which allow samples or sample portions to be transported to different areas of the device. Furthermore, robotic components including robotic arms etc. may be included. In further embodiments, the analyzer may be used in combination with one or more further analyzer(s). For example, a chain or conveyor structure may be provided in which a sample is analyzed by two or more analyzers, e.g. in a row. These analyzers may further be connected and/or share data with each other and/or an external database or the like. In further embodiments, the analyzers may be integrated in a laboratory management system, e.g. a laboratory information management system.
Correspondingly obtained data are typically provided in the form of sequencing reads which may be single-end or paired-end reads. Obtaining such sequencing data may further include the addition of assessment steps or data analysis steps.
The sequencing length may be any suitable sequencing read length. It is preferred to make use of sequencing reads of a length of about 50 to about 1000, preferably about 150 to about 500 nucleotides, e.g. 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 350, 400, 450, 500 or more nucleotides or any value in between the mentioned values. The length may vary depending on the specific target sequence, the organism from which it is derived, the genetic or genomic structure of the target sequence or the scientific/diagnostic problem to be solved.
In certain embodiments the obtained sequence is aligned and/or compared with one or more reference sequences. The terms “alignment”, “aligning” or “comparing” as used herein relate to the process of sequence comparison and matching a sequencing read with one or more predefined sequences, e.g. with one or more reference sequences or a part thereof. In the context of the present invention alignment exclusively relates to nucleotide sequences. For the performance of an alignment operation or sequence comparison any suitable algorithm or tool can be used. Preferred is an algorithm such as the Burrows-Wheeler Aligner (BWA), e.g. as described by Li and Durbin, 2009, Bioinformatics, 25, 1754-1760.
It is preferred that the alignment is performed in the form of a phylogenetic comparison of the obtained sequence(s) with one or more reference sequences. Suitable algorithms for a phylogenetic comparison include PGLS (phylogenetic generalized least squares), which is used to test whether there is a relationship between two (or more) variables while accounting for the fact that lineages are not independent, or Monte Carlo simulations. An outcome of the phylogenetic comparison is typically a phylogenetic tree, which indicates relationship and lineage of compared sequences. This approach is particularly advantageous if different samples, or samples from different subjects, are sequenced at the same time. Also a comparison of the outcome of a phylogenetic comparison with earlier comparison runs or literature data or independent data is envisaged.
Sequence reads may, in certain embodiments, also be assembled. The assembly is typically performed with the help of a reference sequence which is used as scaffold or framework allowing for a placement of the sequence reads at corresponding positions after sequence comparison, e.g. in the form of contigs. A suitable tool is, for example, GAML (Boza et al., Algorithms Mol Biol. 2015, 10:18).
Also envisaged is an assembly of sequence reads without the use of a reference genome, i.e. a de novo assembly. Reads with sufficient amount of overlapping parts at the start or the end positions may be used to form contigs, i.e. sets of mutually overlapping reads. Examples of corresponding algorithms include Cortex (Iqbal et al., Nature Genetics, 2012) and SPAdes (Bankevich et al., Journal of Computational Biology, 2012). A suitable tool is, for example, ABySS (Simpson et al., Genome Res. 2009, 19(6), 1117-23).
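The principle of forming contigs from mutually overlapping reads can be sketched in a few lines. The following is a deliberately simplified greedy illustration that merges the pair of sequences with the longest exact suffix/prefix overlap until no overlap of at least `min_overlap` bases remains. It is not the algorithm used by Cortex, SPAdes or ABySS, it ignores sequencing errors and reverse complements, and all names are illustrative.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Greedily merge reads into contigs based on exact end overlaps."""
    contigs = list(dict.fromkeys(reads))      # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs                    # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Reads that share no sufficient overlap with any other read remain as separate contigs in the returned list.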
The term “reference sequence” as used herein relates to a sequence which is used for alignment purposes within the context of the present invention. The reference sequence is typically an organism's or entity's genomic sequence or part of a genomic sequence, e.g. a virus genome or part of it, a mammalian genome or part of it, e.g. the genomic sequence of a chromosome or a sub-section thereof. The reference sequence may further be limited to certain sectors of the genome, e.g. specific chromosomes, or parts of a chromosome, or certain genes, groups of genes or gene clusters etc. Particularly preferred are sectors which correspond to known mutational hotspots or which have been described as being involved in the etiology of diseases. The sequence may be provided in any suitable direction or orientation. The reference sequence may be selected as any suitable genomic sequence derivable from databases known to the skilled person. For example, a reference sequence may be derived from the reference assembly provided by the Genome Reference Consortium, or from the depository of genomic sequences at NCBI, e.g. for viruses (https://www.ncbi.nlm.nih.gov, last visited on Apr. 20, 2021). For example, the genomic sequence of SARS-CoV-2 may be derived as NCBI Reference Sequence: NC_045512.2 from NCBI as mentioned above.
The present invention further envisages, in certain embodiments, a step of sequence comparison with a reference sequence, e.g. a reference sequence as mentioned above. The comparison may be performed with any suitable tool or program, e.g. an algorithm as mentioned above. The comparison may yield results as to the presence of a sequence deviation from the reference sequence, e.g. the presence of a mutated or changed nucleotide. For a massive parallel sequencing approach, the comparison results may further be grouped and/or fed into a phylogenetic algorithm or program to detect relationships between the sequences.
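A sequence comparison that reports deviations from a reference can be illustrated as follows. This is a minimal sketch, assuming that the obtained sequence has already been aligned to the reference segment without gaps; `offset` is a hypothetical parameter giving the 0-based start of the segment within the full reference, so that positions are reported in 1-based reference coordinates.

```python
def find_deviations(reference, sample, offset=0):
    """List (position, reference base, sample base) for every mismatch.

    Both sequences must be aligned, gap-free and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(reference.upper(), sample.upper())
    return [(offset + i + 1, ref_base, sample_base)
            for i, (ref_base, sample_base) in enumerate(pairs)
            if ref_base != sample_base]


if __name__ == "__main__":
    # Illustrative sequences only, not taken from any real genome.
    print(find_deviations("ACGTACGT", "ACGTTCGT", offset=100))
```

The resulting list of deviations could then be grouped per sample before being passed to a downstream comparison.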
The present invention further envisages that the sequence information obtained from a subject's sample and/or the result of a sequence comparison as mentioned above is stored on a computer system, a database, a public sequence repository, a cloud system, a hospital computer system, a doctors association computer system, a local health organization database, a regional health organization database, a national health organization database and/or an international health organization database. The sequence information may be stored in any suitable format. It may further be linked to a subject's registration data, e.g. as defined above. The information may further be linked to one or more aggregated or derived statistical values. The information may, preferably, be evaluated with respect to a specific diagnostic or clinical question, e.g. infection by an organism, infection by a specific type of organism, presence of a certain genotype etc. Also envisaged is a linkage to an alert system comprising a connection to a subject's registration data.
It is further envisaged to connect the obtained information to a diagnostic database which may comprise information on the disease and/or on potential therapeutic options. Also included may be a conclusion on the most promising treatment, or a potential therapy plan. The corresponding information may also be derived from suitable literature sources, e.g. an electronic literature depository.
It is further envisaged that the information can be retrieved from any of the mentioned systems, e.g. by the subject, or medical practitioner, or a hospital, or a health office etc.
In specific embodiments, the preparation of a sample for a target specific next generation sequencing is for the detection of a virus, a microbe or a genotype of a higher eukaryote. The virus may be any virus, preferably a positive strand ssRNA virus. It may, in particular, belong to the order of Nidovirales, Picornavirales or Tymovirales, or to the family of Coronaviridae, Picornaviridae, Caliciviridae, Flaviviridae or Togaviridae. In more preferred embodiments the virus is a rhinovirus, Norwalk-Virus, Echo-Virus or enterovirus. It may further be a Coronavirus or belong to the group of Coronaviruses, or belong to the group of alpha or beta coronaviruses. Particularly preferred is a human or Microchiroptera (bat) coronavirus, in particular a SARS-CoV-2 virus or any mutational derivative thereof.
Further envisaged are PHEV, FCoV, IBV, HCoV-OC43 and HCoV-HKU1, JHMV, HCoV NL63, HCoV 229E, TGEV, PEDV, FIPV, CCoV, MHV, BCoV, SARS-CoV, MERS-CoV or any mutational derivative thereof. The term “mutational derivative thereof” as used herein relates to virus variants which do not have the same genomic sequence as the mentioned viruses (e.g. as defined by reference sequences such as those stored at NCBI, mentioned above) but are derived therefrom by mutational events which are typical for this virus group. These events may lead to changes in the infectious behavior of the virus, but still allow for a classification of the virus, and thus an identification of the virus as belonging to the group, e.g. of coronaviruses.
Also envisaged are other viruses such as a negative strand ssRNA virus including RSV, metapneumovirus, or an influenza virus; a dsRNA virus including a rotavirus; an ssDNA virus including Smacoviridae or Spiraviridae; a dsDNA virus including human papillomavirus (HPV), an adenovirus, or Herpes simplex virus Type 1 and Type 2 (HSV-1, HSV-2).
A “microbe” as envisaged by the present invention may be a bacterium, e.g. a bacterium which is pathogenic for mammals, in particular for human beings, or a fungus. Examples of bacteria to be analysed according to the present invention include Streptococcus pneumoniae, Haemophilus influenzae, Staphylococcus aureus, in particular MRSA, Escherichia coli, Salmonella spp. and Neisseria meningitidis.
The term “genotype of a higher eukaryote” as used herein relates to any part of the transcriptome or genome of a higher eukaryotic organism, e.g. a mammal, preferably a human being. Such genotype may, preferably, be associated with a diagnostically or clinically relevant or interesting situation, e.g. a disease or predisposition for a disease, or a therapeutically relevant or interesting situation. In particularly preferred embodiments, the genotype is linked to or comprises blood-group antigens or leukocyte antigens. Envisaged blood groups comprise ABO, MNS, Rhesus, Lutheran, Kell, Lewis, Duffy, Kidd, Diego, Yt, Scianna, Dombrock, Colton, Cromer, and Vel blood group. Corresponding genetic information may be known to the skilled person or can be derived from suitable literature or internet sources, e.g. as mentioned above.
In further particularly preferred embodiments, the genotype is linked to or comprises a human leukocyte antigen. Envisaged examples include HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G, HLA-DRA1, HLA-DRB1, HLA-DRB3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1, or HLA-DPB1, or variants thereof. Corresponding genetic information may be known to the skilled person or can be derived from suitable literature or internet sources, e.g. as mentioned above.
In a further aspect the present invention relates to a kit for preparing an RNA sample for next generation sequencing in a one-step target enrichment. The RNA sample kit according to the present invention comprises a) a reverse transcriptase (RT); b) one or more target-specific reverse primers, suitable for the preparation of a target specific cDNA; c) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence; d) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing sequence is different from the forward indexing sequence, and a second reverse adaptor sequence; e) deoxyribonucleoside triphosphates (dNTPs); and f) a DNA polymerase. In a preferred embodiment the kit typically comprises all these elements in one vessel. The vessel may be provided in suitable form, e.g. refrigerated or at any suitable temperature or humidity. In further embodiments, the kit may comprise the above listed components in different containers which may, for example, be mixed when used, e.g. when starting the method.
In another aspect the present invention relates to a kit for preparing a DNA sample for next generation sequencing in a one-step target enrichment. The DNA sample kit comprises a) one or more target-specific forward primers and one or more target specific reverse primers, suitable for the preparation of a target specific DNA; b) a forward target specific primer comprising a first forward adaptor sequence, and a reverse target specific primer comprising a first reverse adaptor sequence; c) a forward indexing primer comprising a first forward adaptor sequence, a forward indexing sequence and a second forward adaptor sequence; and a reverse indexing primer, comprising a first reverse adaptor sequence, a reverse indexing sequence, wherein the reverse indexing sequence is different from the forward indexing sequence, and a second reverse adaptor sequence; d) deoxyribonucleoside triphosphates (dNTPs); and e) a DNA polymerase. In a preferred embodiment the kit typically comprises all these elements in one vessel. The vessel may be provided in suitable form, e.g. refrigerated or at any suitable temperature or humidity. In further embodiments, the kit may comprise the above listed components in different containers which may, for example, be mixed when used, e.g. when starting the method.
In specific embodiments, the kit may comprise a forward indexing primer selected from the group comprising primers of SEQ ID NO: 32 to SEQ ID NO: 39. In further specific embodiments the kit may alternatively or additionally comprise a reverse indexing primer selected from the group comprising primers of SEQ ID NO: 40 to SEQ ID NO: 51.
The kit may be formulated as diagnostic composition and may comprise suitable carriers, diluents etc. The components or ingredients of the kit may, according to the present invention, be comprised in one or more containers or separate entities. The nature of the agents is determined by the method of detection for which the kit is intended.
In further embodiments the kit may comprise synthetic RNA spike-ins. The term “synthetic RNA spike-in” as used herein relates to an RNA molecule of known sequence and quantity which is used to calibrate measurement assays. The spike-in is typically designed to bind to a DNA molecule with a matching sequence, i.e. a control probe. Since a known quantity of RNA spike-in is mixed with the experiment sample during preparation, the degree of hybridization between the spike-ins and the control probes can be used to normalize hybridization measurements of sample RNA.
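The normalization principle described above can be illustrated with a simple calculation. The following is a minimal sketch, assuming a linear relationship between the amount of RNA and the measured signal, which is a simplification; the function names, units and example values are hypothetical.

```python
def spike_in_scale_factor(known_spike_in_amount, measured_spike_in_signal):
    """Amount of RNA represented by one unit of measured signal."""
    if measured_spike_in_signal <= 0:
        raise ValueError("spike-in signal must be positive")
    return known_spike_in_amount / measured_spike_in_signal


def normalize_signals(sample_signals, known_spike_in_amount, measured_spike_in_signal):
    """Convert raw per-target signals into amounts calibrated by the spike-in."""
    factor = spike_in_scale_factor(known_spike_in_amount, measured_spike_in_signal)
    return {target: signal * factor for target, signal in sample_signals.items()}


if __name__ == "__main__":
    # 1000 spike-in copies were added and produced a signal of 500 units.
    print(normalize_signals({"target_A": 200.0, "target_B": 50.0}, 1000.0, 500.0))
```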
The kit may optionally comprise a package insert or a leaflet with instructions. The term “package insert” or “leaflet with instructions” is used to refer to instructions customarily included in commercial packages of diagnostic or biochemical products that contain information about the usage, calibration and/or warnings concerning the use etc. The leaflet with instructions may be part of the kit.
In a further aspect the present invention relates to a use of the method or the kit as defined above for an enrichment for a rapid virus detection. The enrichment, which may be implemented with primers of group b) as mentioned above, allows for a very efficient amplification of relevant sequences and thus provides for rapid and massively performable virus detection. This approach is hence capable of saving time and resources and provides essential sequence information for a huge number of samples in a very short period of time.
In a further aspect the present invention relates to the use of the method or the kit as defined above for an enrichment for a rapid leukocyte antigen-associated gene identification. The enrichment, which may be implemented with primers of group b) as mentioned above, allows for a very efficient amplification of relevant sequences and thus provides for rapid and massively performable leukocyte antigen-associated gene detection. This approach is hence capable of saving time and resources and provides essential sequence information for a huge number of samples in a very short period of time.
In yet another aspect the present invention relates to the use of the method or the kit as defined above for an enrichment for a rapid blood group associated gene identification. The enrichment, which may be implemented with primers of group b) as mentioned above, allows for a very efficient amplification of relevant sequences and thus provides for rapid and massively performable blood group associated gene identification. This approach is hence capable of saving time and resources and provides essential sequence information for a huge number of samples in a very short period of time.
The examples and figures provided herein are intended for illustrative purposes. It is thus understood that examples and figures are not to be construed as limiting. The person skilled in the art will clearly be able to envisage further modifications of the principles laid out herein.
The following master-mix and cycle profile was used:
The used Cycler Profile:
The following components were added to the single reaction in a single well of a 96-well plate or 384-well plate:
Thermocycler Profile
142 samples were labeled with _RP or _SR0403 and (i) include 2 different amplicons (1 virus specific + 1 internal control), (ii) in every sample the internal control and the virus specific region should be counted separately, (iii) the primer sequence is highlighted in yellow, and (iv) the target region is highlighted in red.
RNAseP primers (i.e. those designated with “RP”) were used for internal control. The product length based on these primers was 113 base pairs for the primer pair E_Sarbeco_F1 and E_Sarbeco_R2, and 65 base pairs for the primer pair RP-F and RP-R. Resulting products:
142 samples were labeled with _SC or _SC0403 and (i) include 5 different amplicons (4 virus specific + 1 internal control), (ii) in every sample the internal control and the virus specific region should be counted separately, (iii) the primer sequence is highlighted in yellow, and (iv) the target region is highlighted in red. The MiSeq run 2 results are shown in
RNAseP primers (i.e. those designated with “RP”) were used for internal control. The product length based on these primers was 113 base pairs for the primer pair E_Sarbeco_F1 and E_Sarbeco_R2, 72 base pairs for the primer pair 2019-nCoV_N1-F and 2019-nCoV_N1-R, 67 base pairs for the primer pair 2019-nCoV_N2-F and 2019-nCoV_N2-R, 72 base pairs for primer pairs 2019-nCoV_N3-F and 2019-nCoV_N3-R, and 65 base pairs for the primer pair RP-F and RP-R. Resulting products:
The primer sequence is highlighted in yellow, and the target region is highlighted in red. The NovaSeq results are shown in
RNAseP primers (i.e. those designated with “RP”) were used for internal control. The product length based on these primers was 113 base pairs for the primer pair E_Sarbeco_F1 and E_Sarbeco_R2, 65 base pairs for primer pairs 2019-nCoV_N3-F and 2019-nCoV_N3-R, and 65 base pairs for the primer pair RP-F and RP-R. Resulting products:
The following primers were designed/used in the context of Illumina NGS approaches.
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
GAC-
AATGATACGGCGACCACCGAGATCTACAC
GATCGC]TCGTCGG-
The following primers can be used as indexing primers in the context of the present invention.
GenBank: MG772933.1
MG772933.1 Bat SARS-Like Coronavirus Isolate Bat-SL-CoVZC45, Complete Genome
CGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTTGTGGTATTCTTGCTAGTCACACTAGCCATCCT
TT
GGAATGTCACGCATTGGCATGGAAGTCACACCTTCGGGAACATGGCTGACTTATCATGGAGCCATTA
Problem: The novel coronavirus has created a pandemic (COVID-19). A safe release of a country lockdown requires fast and reliable identification of all infected individuals, including asymptomatic ones, combined with rapid communication of results.
The solution: the RavenC2 System presents a complete end-to-end and direct to consumer workflow to perform large scale SARS-CoV-2 testing—using pre-barcoded sample collection vials, RNA extraction-free and 1-step targeted library preparation with high throughput Next Generation Sequencing, combined with rapid data analysis and integration with a smartphone app to securely manage the data flow between healthcare providers and citizens.
A novel coronavirus has created a pandemic (COVID-19). Until a safe vaccination is available, opening of our society and economy requires regular testing of asymptomatic individuals combined with rapid communication of results. This makes high-throughput testing key to monitoring the outbreak and preventing future waves of the pandemic. For fast and reliable large-scale SARS-CoV-2 testing, a complete end-to-end workflow was established in a clinical laboratory using pre-barcoded swab devices that are simply and quickly registered by the citizen via a smartphone app at sample collection. Samples are quickly processed using an RNA extraction-free, 1-step targeted RT-PCR approach. The use of separate target-specific primer sets for 3 viral targets and 1 internal human control gene, in combination with indexing primer sets, allows modular combination of molecular barcode (index) kits to ensure scalability from several samples to 3000-12000 samples per sequencing run. Libraries are compatible with all Illumina platforms (tested: MiSeq, HiSeq2500, NovaSeq) and can be directly uploaded and analyzed in BaseSpace or using a local DRAGEN server. Result data are directly communicated via an API and reported to the citizen via smartphone app (see
To demonstrate the sensitivity and specificity of the sequencing workflow, 400 test samples, full virus controls and virus target PCR amplicons are interrogated on the MiSeq™ System. The full RAVENC2 workflow is tested in a pilot experiment on a multitude of individuals.
Single swabs of citizens are collected in pre-barcoded collection vials, and the tested person registers in the app by scanning the vial barcode. Once collected samples have reached the laboratory and have been registered by their barcode via LIMS, they are immediately introduced into the laboratory process.
The swab samples are transferred into a lysis buffer and forwarded to an automated RNA extraction workflow. In a 1-step approach target-specific cDNA is generated with SARS-CoV-2 specific virus markers and enriched in each sample (targeted RT-PCR). A human control amplicon is used as process control.
Sequencing barcodes (indices) are introduced simultaneously during the target PCR reaction. Barcodes that identify the individual samples consist of a unique combination of two 8-nucleotide indices. This enables the parallel analysis of around 3,000 samples per sequencing pool (see
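The combinatorial use of two indices per sample can be sketched as follows. This is a minimal illustration, assuming that every forward index may be paired with every reverse index except an identical sequence, in line with the requirement that the reverse indexing sequence differs from the forward indexing sequence. The index sequences and sample names below are hypothetical placeholders, not the indexing primers of the invention.

```python
from itertools import product


def dual_index_pairs(forward_indices, reverse_indices):
    """All (forward, reverse) index pairs in which the two sequences differ."""
    return [(fwd, rev) for fwd, rev in product(forward_indices, reverse_indices)
            if fwd != rev]


def assign_dual_indices(sample_ids, forward_indices, reverse_indices):
    """Give each sample its own unique pair of indices."""
    pairs = dual_index_pairs(forward_indices, reverse_indices)
    if len(sample_ids) > len(pairs):
        raise ValueError(f"{len(sample_ids)} samples but only {len(pairs)} index pairs")
    return dict(zip(sample_ids, pairs))


if __name__ == "__main__":
    forward = ["AAGGTTCC", "CCTTGGAA"]              # hypothetical 8-nt indices
    reverse = ["GGAACCTT", "TTCCAAGG", "AAGGTTCC"]  # hypothetical 8-nt indices
    samples = [f"vial_{n:04d}" for n in range(1, 6)]
    for sample, pair in assign_dual_indices(samples, forward, reverse).items():
        print(sample, *pair)
```

Because the number of usable pairs grows with the product of the two index set sizes, index sets of a few dozen sequences each are sufficient to label several thousand samples in one pool.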
Library pools are sequenced on a MiSeq using 2×51 bp read length with default settings. The solution is scalable and compatible with all supported Illumina sequencers. Data are directly uploaded to Illumina BaseSpace Sequence Hub and/or stored locally. The RAVENC2 panel includes oligo probes targeting viral sequences and also human genome positive control probes, in the event a sample does not contain any viral target. Using DRAGEN locally or in BaseSpace, all sequencing data from a run are demultiplexed into FASTQ files for each individual sample. The paired-end reads are then aligned to a reference sequence containing the entire SARS-CoV-2 genome plus human control amplicon sequences. After filtering low-quality reads and alignments purely matching to primer sequences (potential PCR artifacts), DRAGEN counts the number of reads aligning specifically to each amplicon. Sample-level QC, to establish that sample collection and the PCR worked as expected, is done using the counts of reads to the human control RNA (e.g. RPP). On samples passing QC, virus detection is then performed by aggregating read counts from all viral amplicons and comparing them to thresholds established from calibration studies. The DRAGEN system uses an FPGA card for hardware acceleration, enabling processing of an entire sequencing run with thousands of samples in under one hour (MiSeq or NextSeq). APIs are defined to transmit results to the laboratory information system as well as to the RAVENC2 server communicating with the smartphone app.
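The two-stage evaluation described above (sample-level QC on the human control, followed by aggregation of viral read counts) can be sketched as follows. This is an illustration only and not the DRAGEN implementation; the threshold values, the amplicon names and the function name are hypothetical, since the actual thresholds are established from calibration studies.

```python
def call_sample(read_counts, control_amplicon="RPP",
                min_control_reads=50, viral_cutoff=100):
    """Classify one sample from its per-amplicon read counts.

    Returns "invalid" if the human control fails QC, otherwise "positive"
    or "negative" depending on the aggregated viral read count.
    """
    control_reads = read_counts.get(control_amplicon, 0)
    if control_reads < min_control_reads:
        return "invalid"                      # sample collection or PCR failed
    viral_reads = sum(count for amplicon, count in read_counts.items()
                      if amplicon != control_amplicon)
    return "positive" if viral_reads >= viral_cutoff else "negative"


if __name__ == "__main__":
    run = {
        "vial_0001": {"RPP": 820, "E": 1500, "N1": 900, "N2": 40},
        "vial_0002": {"RPP": 640, "E": 3, "N1": 0, "N2": 1},
        "vial_0003": {"RPP": 4, "E": 0, "N1": 0, "N2": 0},
    }
    for sample, counts in run.items():
        print(sample, call_sample(counts))
```

In this sketch a sample with a failed human control is reported as invalid regardless of its viral counts, so that such samples can be routed to a confirmatory test.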
The Raven App provides a decentralized and secure digital solution for direct return of Covid-19 results to individuals tested on the high-throughput RavenC2 laboratory platform.
The goal of the RAVENC2 application and infrastructure is to digitally transmit Covid-19 test results directly from the laboratory to each individual at scale, while minimizing the processing of personal data. Individuals using the RAVENC2 Application will be able to scan the unique barcode printed on each test kit, which is later entered into the Laboratory Information Management System (LIMS). Once sequenced, each result can then be transmitted back to the individual based on a unique identifier. The use of temporary access tokens ensures that only users who have scanned the code on a test kit can access the test result assigned to that test kit (see
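The idea that only the user who scanned a test kit can retrieve its result can be sketched as follows. This is a minimal in-memory illustration, assuming a random token that is issued when the barcode is scanned, stored only as a hash, and valid for a limited time. The class, its method names and the validity period are hypothetical and do not describe the actual RAVENC2 server.

```python
import hashlib
import hmac
import secrets
import time


class ResultStore:
    """Links test results to kit barcodes and guards them with temporary tokens."""

    def __init__(self):
        self._tokens = {}    # barcode -> (token hash, expiry timestamp)
        self._results = {}   # barcode -> result string

    def register_scan(self, barcode, ttl_seconds=14 * 24 * 3600, now=None):
        """Issue a token to the user who scanned the barcode; only its hash is kept."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._tokens[barcode] = (digest, now + ttl_seconds)
        return token

    def publish_result(self, barcode, result):
        """Called on the laboratory side once the sample has been evaluated."""
        self._results[barcode] = result

    def fetch_result(self, barcode, token, now=None):
        """Return the result (None while pending) if the token is valid."""
        now = time.time() if now is None else now
        entry = self._tokens.get(barcode)
        presented = hashlib.sha256(token.encode()).hexdigest()
        if entry is None or now > entry[1] or not hmac.compare_digest(entry[0], presented):
            raise PermissionError("no valid access token for this test kit")
        return self._results.get(barcode)


if __name__ == "__main__":
    store = ResultStore()
    token = store.register_scan("KIT-000123")
    store.publish_result("KIT-000123", "negative")
    print(store.fetch_result("KIT-000123", token))
```

No personal data are needed in this sketch: the barcode and the token are the only identifiers that link a user to a result.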
Validation with Control Samples
Initial validation of the RAVENC2 Panel is performed with a set of 96 test samples which were pretested via qPCR, one commercially available virus sample and viral and human control PCR amplicons as targets. After library preparation and sequencing, resulting data is aligned and read counts from all viral amplicons are aggregated, compared to established thresholds from calibration studies (see
Implementing this workflow all around the EU or even worldwide is assumed to multiply the number of tests that could be performed on a daily basis. Since modern sequencing platforms are able to produce terabytes of data and the virus amplicons only need a few megabytes, the number of concurrently analysed samples is mostly limited by the number of unique barcodes. Redesigning those to a length of 10 or 12 bp will greatly increase the capacity of each sequencing run to >1,000,000 samples per run.
With the RavenC2 approach we can see that testing of 3000 samples in a day on a single instrument is highly achievable. This can further be scaled via a number of axes:
Multiple instruments, since there are thousands of sequencing instruments globally.
Higher capacity instruments (newer sequencing instruments produce terabases of data and the virus amplicons only need a few megabases). Theoretically, a single run on the latest sequencing instruments could handle millions of samples in a single day; however, the ability to individually identify each sample would need some significant advances.
Increasing the barcode length and/or availability of unique dual indexing (both will enable more samples to be combined on a single run and help increase the number of samples per day). Using our current configuration, increasing the barcode complexity will enable at least approximately 12,000 samples in a day per instrument.
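The effect of barcode length on capacity can be illustrated with a short calculation. The figures below are theoretical upper bounds only: 4^L is the number of distinct sequences of length L, whereas practical index sets are much smaller because indices must differ from each other in several positions and must meet further design constraints. The function names and the example set sizes are illustrative.

```python
def max_index_sequences(length_nt):
    """Theoretical number of distinct index sequences of a given length."""
    return 4 ** length_nt


def combinatorial_capacity(n_forward, n_reverse):
    """Samples per run when every forward index is combined with every reverse index."""
    return n_forward * n_reverse


def unique_dual_capacity(n_forward, n_reverse):
    """Samples per run when each index is used for one sample only."""
    return min(n_forward, n_reverse)


if __name__ == "__main__":
    for length in (8, 10, 12):
        print(f"{length} nt: up to {max_index_sequences(length):,} sequences")
    print("combinatorial, 48 x 64 indices:", combinatorial_capacity(48, 64))
    print("unique dual, 48 x 64 indices:", unique_dual_capacity(48, 64))
```

The calculation shows that an index length of 10 nt is the shortest of the three lengths whose sequence space alone exceeds one million.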
Single pharyngeal swabs of 1571 citizens were collected in pre-barcoded collection vials (Fisher Scientific), prefilled with 800 μl lysis buffer (Macherey Nagel, Roche). Samples were transferred into the laboratory process within 2-48 hrs. of collection.
The resuspension took place during transportation and no additional resuspension was necessary.
200 μl of the swab lysis resuspension were transferred by Hamilton Chemagic Liquid handling System (Hamilton), from the barcoded vials into barcoded 96 deep well extraction plates.
RNA extraction was performed on a MagNA Pure 96 System, using the MagNA Pure 96 DNA and Viral NA Kit (Roche Diagnostics) according to the manufacturer's guide. The input volume was 200 μl and the elution volume was set to 100 μl.
A one-step target enrichment approach was used to generate Illumina Libraries combining cDNA synthesis, target enrichment, sample indexing and sequence adaptor ligation in a single reaction.
Three separate target-specific primer sets for viral targets and one internal human target gene as process control were used. The primer sets were chosen according to the CDC primer set for COVID-19, with a mean target amplicon size of 72 bp. In combination with the gene-specific primer sets, a sample specific indexing primer set was added containing a dual molecular barcode sequence and the “Nextera” adapter sequence overhang, allowing several samples (up to 3000) to be combined in one sequencing run.
The RT-PCR reaction was prepared in a total volume of 12.5 μl on a Hamilton M liquid handling system using One Step PrimeScript™ III RT-qPCR Mix, with UNG (Takara RR601B) in 384 well plates. Each well received either 1.5 μl RNA extract from a test sample, 1 μl of a standard, or a no template control.
Every 384 well reaction plate included a dilution series with known standard virus copies, from 200 copies/μl down to 3 copies/μl (EDX SARS-CoV-2 Standard, BioRad), and a no template control. Five known positive samples were included in the workflow. The standard dilution series was used to define a sequencing-run-specific cut-off and to distinguish positive from negative samples. RT-PCR was carried out on ProFlex™ 2×384-Well-PCR-Systems (ThermoFisher).
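The concentrations of the standards can be sketched as follows. The text gives only the end points of the series (200 and approximately 3 copies/μl) and, in the results, identifies standard 2 with 100 copies/μl and standard 7 as the lowest standard; a two-fold serial dilution with seven standards is consistent with these values but is an assumption here, as is the function name.

```python
def dilution_series(start_copies_per_ul, n_standards, fold=2.0):
    """Return the concentrations (copies/µl) of a serial dilution, highest first."""
    return [start_copies_per_ul / fold ** i for i in range(n_standards)]


# Assumption: two-fold steps and seven standards (not stated explicitly in the text).
standards = dilution_series(200.0, 7)
for number, concentration in enumerate(standards, start=1):
    print(f"standard {number}: {concentration:g} copies/µl")
```

Under these assumptions, standard 2 is 100 copies/μl and standard 7 is 3.125 copies/μl, which matches the rounded value of 3 copies/μl.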
2 μL of every RT-PCR well was combined into a single sequencing library pool. Library clean-up was performed with QIAquick PCR Purification Kit according to the instructions in the manual (Qiagen).
The following ingredients and conditions were used:
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
GAC-
Library pools were sequenced on a MiSeq and on a NovaSeq 6000 System using a 2×51 bp read length with default settings.
Data were stored locally and uploaded to the Illumina Analytics Platform. Using DRAGEN, all sequencing data from a run were de-multiplexed into FASTQ files for each individual sample. The paired-end reads were then aligned to a reference sequence containing the entire SARS-CoV-2 genome plus the human control amplicon sequences. After filtering out low-quality reads and alignments matching purely to primer sequences (potential PCR artefacts), DRAGEN counted the number of reads aligning specifically to each amplicon. Sample-level QC, to establish that sample collection and the PCR worked as expected, was done using the read counts for the human control RNA (e.g. RPP30). For samples passing QC, virus detection was then performed by aggregating the read counts from all viral amplicons, comparing them to the internal human control and evaluating them according to thresholds established from calibration studies.
Positive and invalid samples from the NGS screening assay were retested by real-time RT-PCR using the ampliCube Coronavirus SARS-CoV-2 (Mikrogen GmbH) assay following the manufacturer's recommendation. The assay was carried out on a BioRad CFX 96 RT-PCR System.
Following primary data analysis, the read counts of all samples, internal standards, NTC and positive controls were determined for nCoV2, nCoV3 and RPP30.
Of the 1571 analyzed samples, two samples and one positive control had to be excluded from the analysis due to a laboratory error.
The mean value and the standard deviation of the read counts for the control gene RPP30 were determined across all samples. All samples that had fewer RPP30 reads than the mean minus one standard deviation were considered not evaluable. 21 samples (1.3%) could not be evaluated and were reported as invalid.
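The RPP30 quality-control rule can be sketched as below. The sample names and read counts are hypothetical, the function name is illustrative, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def rpp30_qc(rpp30_counts):
    """Return the QC cutoff (mean minus one SD) and the samples falling below it."""
    values = list(rpp30_counts.values())
    cutoff = mean(values) - stdev(values)  # assumption: sample standard deviation
    invalid = sorted(name for name, count in rpp30_counts.items() if count < cutoff)
    return cutoff, invalid


# Hypothetical RPP30 read counts per sample.
example_counts = {"S1": 1000, "S2": 1200, "S3": 900, "S4": 1100, "S5": 50}
cutoff, invalid = rpp30_qc(example_counts)
print(f"cutoff: {cutoff:.1f} reads; not evaluable: {invalid}")
```

Because the cutoff is derived from the run itself, it adapts to the sequencing depth of each run, but it also means that a fraction of samples will fall below it in every run.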
To obtain a normalized comparison value, the counts for nCoV2 and nCoV3 were added and divided by the counts for RPP30, i.e. (nCoV2+nCoV3)/RPP30. The counts for nCoV1 were excluded from the analysis because too many unspecific reads were generated from that amplicon. The normalized comparison value was compared with the values of the dilution series of the reference material.
All samples that achieved a value higher than standard 2 (corresponding to 100 copies/μl) were evaluated as positive. Four samples showed higher values than standard 2. These samples were the positive controls.
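The normalization and the comparison against a standard can be sketched as follows. All read counts are hypothetical and the function names are illustrative; only the formula (nCoV2+nCoV3)/RPP30, the exclusion of nCoV1 and the rule "higher than the standard means positive" are taken from the description.

```python
def normalized_value(counts):
    """Compute (nCoV2 + nCoV3) / RPP30 for one sample; nCoV1 is deliberately ignored."""
    if counts["RPP30"] <= 0:
        raise ValueError("RPP30 count must be positive; sample is not evaluable")
    return (counts["nCoV2"] + counts["nCoV3"]) / counts["RPP30"]


def call_sample(counts, standard_counts):
    """Return 'positive' if the sample value exceeds that of the chosen standard."""
    if normalized_value(counts) > normalized_value(standard_counts):
        return "positive"
    return "negative"


# Hypothetical read counts for the chosen standard and two samples.
standard_2 = {"nCoV2": 400, "nCoV3": 600, "RPP30": 2000}
sample_a = {"nCoV1": 900, "nCoV2": 5, "nCoV3": 3, "RPP30": 1600}
sample_b = {"nCoV1": 700, "nCoV2": 3000, "nCoV3": 2500, "RPP30": 1000}

print(call_sample(sample_a, standard_2))
print(call_sample(sample_b, standard_2))
```

Dividing by the RPP30 count compensates for differences in input material and sequencing depth between wells, so that values from different samples in one run can be compared with the same standard.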
When the value of the lowest standard, standard 7 (3 copies/μl), was used as the cutoff, 66 samples had more reads than the lowest standard (4.2% false positives). All of these samples, and all samples that were classified as not evaluable due to insufficient RPP30 values, were additionally analyzed by real-time PCR. No sample was confirmed as positive in real-time PCR.
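The reported percentages can be reproduced with the arithmetic below. The denominator of 1569 (1571 analyzed samples minus the two excluded samples) is an assumption; using 1571 gives the same values after rounding to one decimal place.

```python
def percent(part, total):
    """Percentage of part in total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


analyzed = 1571
excluded_samples = 2
evaluated = analyzed - excluded_samples  # assumption: excluded samples are not counted

invalid_pct = percent(21, evaluated)
above_lowest_standard_pct = percent(66, evaluated)
print(f"invalid: {invalid_pct}%; above lowest standard: {above_lowest_standard_pct}%")
```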
The pilot project has shown that the NGS screening approach according to the invention enables reliable detection of SARS-CoV-2 infections. In particular, the approach enables a very large number of samples to be screened simultaneously in a short time.
Number | Date | Country | Kind |
---|---|---|---|
20171403.7 | Apr 2020 | WO | international |
20175796.0 | May 2020 | WO | international |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/060704 | 4/23/2021 | WO |