The present disclosure relates generally to the field of molecular biology. More particularly, it concerns methods of enhancing detection of sequence variants by selective allele enrichment or depletion prior to sequencing by next-generation sequencing.
In biological samples, such as cell-free DNA from peripheral blood, rare DNA sequence variants, such as cancer driver mutations, are present at less than 1% allele frequency, but can nonetheless provide important therapy guidance or patient stratification information. Additionally, there is a need to simultaneously analyze many potential mutations to achieve high clinical sensitivity. Next-generation sequencing has been applied to detection and quantitation of rare DNA variants through deep sequencing with molecular barcodes. However, these methods are inherently inefficient and expensive due to the large number of NGS reads wasted on sequencing wildtype (i.e., healthy) DNA.
The disclosure describes a class of methods to allow low-throughput detection and quantification of rare variants, such as somatic cancer mutations in peripheral blood plasma. The large number of reads needed for liquid biopsy applications prevents the detection and quantification of rare events by low-throughput NGS. However, allelic enrichment/depletion enables low-throughput NGS instruments, such as the Illumina MiSeq, the Qiagen GeneReader, and the Thermo Fisher Proton systems, to perform liquid biopsy detection of cancer mutations (see
In one embodiment, provided herein are methods of detecting the presence of rare sequence variants within a DNA region of interest, the method comprising: (a) amplifying one or more regions of interest using polymerase chain reaction (PCR) with primers, each primer comprising a 5′ sequence-adaptor region and a 3′ gene-specific region, thereby generating double-stranded amplicons; (b) denaturing the double-stranded amplicons, thereby generating single-stranded amplicons; (c) hybridizing the single-stranded amplicons to a mixture of negative-selection Sinks; (d) removing the single-stranded amplicons bound to Sinks; (e) amplifying the remaining single-stranded amplicons by PCR using primers comprising sequencing adaptor sequences; and (f) performing high-throughput DNA sequencing. In some aspects, the rare variant is of unknown sequence identity. In some aspects, the rare variant is of known sequence identity.
In some aspects, step (c) further comprises hybridizing the single-stranded amplicons to a mixture of positive-selection Probes. In certain aspects, the Probes comprise toehold probes, fine-tuned probes, or X-probes. In certain aspects, the Probes and Sinks are thermodynamically competitive. In some aspects, there is one Probe and one Sink for each rare sequence variant. In some aspects, there is one Probe for each rare sequence variant. In some aspects, two or more rare sequence variants may use the same Sink. In some aspects, the Probes comprise Probes having paired probe complement and probe protector oligonucleotides of Table 1. In some aspects, the Sinks comprise Sinks having paired sink complement and sink protector oligonucleotides of Table 2. In certain aspects, step (d) further comprises collecting amplicons bound to Probes. In certain aspects, step (d) is performed via streptavidin-coated magnetic beads, collecting is performed using a magnet, and the Probes in step (c) are either directly functionalized with a biotin or hybridized to a universal oligonucleotide functionalized with a biotin. In certain aspects, step (d) is performed via streptavidin-coated agarose beads, collecting is performed using centrifugal force, and the Probes in step (c) are either directly functionalized with a biotin or hybridized to a universal oligonucleotide functionalized with a biotin. In certain aspects, removing the single-stranded amplicons bound to Sinks occurs by way of collecting amplicons bound to Probes.
In some aspects, the hybridization in step (c) is performed at a temperature of between about 15° C. and about 75° C. In some aspects, the hybridization in step (c) is performed in a buffer with a monovalent cation concentration of between about 50 mM and about 5 M. In certain aspects, the monovalent cation is sodium. In some aspects, the hybridization in step (c) is performed in a buffer with a divalent cation concentration of between about 3 mM and about 30 mM. In certain aspects, the divalent cation is magnesium.
In some aspects, the PCR of step (a) is multiplex PCR when amplifying more than one region of interest. In some aspects, the PCR of step (a) is carried out for 4-20 cycles. In some aspects, the PCR of step (a) is carried out for no more than 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 cycles.
In some aspects, step (b) is performed via heat denaturation. In certain aspects, heat denaturation comprises heating the amplicon mixture to at least 80° C. for at least 2 minutes. In some aspects, step (b) is performed via DNase activity, wherein one of the primers in step (a) is modified with either a 5′ phosphate functionalization to encourage degradation or a 5′ functionalization to inhibit degradation. In certain aspects, the 5′ primer functionalization comprises a phosphorothioate, a 2′-O-methyl group, or a non-natural nucleotide.
In some aspects, the Sinks in step (c) comprise toehold probes, fine-tuned probes, or X-probes. In some aspects, the removing in step (d) is performed via solid-phase separation. In some aspects, step (d) is performed via streptavidin-coated magnetic beads, removing is performed using a magnet, and the Sinks in step (c) are either directly functionalized with a biotin or hybridized to a universal oligonucleotide functionalized with a biotin. In some aspects, step (d) is performed via streptavidin-coated agarose beads, removing is performed using centrifugal force, and the Sinks in step (c) are either directly functionalized with a biotin or hybridized to a universal oligonucleotide functionalized with a biotin.
In some aspects, the primers in step (e) are universal primers. In some aspects, the primers in step (e) further comprise a sample barcode or index sequence. In some aspects, the sequencing in step (f) is sequencing-by-synthesis. In some aspects, the sequencing in step (f) is nanopore sequencing. In some aspects, the sequencing in step (f) is sequencing-by-hybridization (e.g., Nanostring).
In some aspects, the method further comprises (g) analyzing the DNA sequencing data to calculate the ratio of reads observed for variant sequences as compared to wild-type sequences. In some aspects, the sequencing in step (f) is paired-end sequencing. In some aspects, the analysis in step (g) does not consider any sequencing read in which the forward read and the reverse read do not perfectly agree on the sequence of the amplicon insert. In certain aspects, the analysis in step (g) does not consider any sequencing read in which a read quality score is below 30. In certain aspects, the read quality score is a threshold FASTQ score.
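The read filtering and ratio calculation of step (g) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `ReadPair` record and both function names are hypothetical, the threshold of 30 is applied here to the lowest per-base Phred score of either read (the disclosure does not specify how the read quality score is aggregated), and the reverse read is assumed to have been reverse-complemented already so that it can be compared directly with the forward read.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ReadPair:
    """Hypothetical record for one paired-end read covering an amplicon insert."""
    forward_seq: str           # insert sequence called from the forward read
    reverse_seq: str           # insert sequence from the reverse read, already reverse-complemented
    forward_quals: List[int]   # per-base Phred scores of the forward read
    reverse_quals: List[int]   # per-base Phred scores of the reverse read


def passes_filters(pair: ReadPair, min_quality: int = 30) -> bool:
    """Keep a pair only if both reads agree exactly and no base is below min_quality."""
    if pair.forward_seq != pair.reverse_seq:
        return False
    quals = pair.forward_quals + pair.reverse_quals
    if not quals:
        return False  # no quality information: treat as failing (assumption)
    return min(quals) >= min_quality


def variant_to_wildtype_ratio(
    pairs: Iterable[ReadPair], wildtype_seq: str, min_quality: int = 30
) -> Optional[float]:
    """Ratio of variant reads to wild-type reads among pairs that pass the filters.

    Any passing insert that differs from wildtype_seq is counted as a variant.
    Returns None when no wild-type read survives filtering.
    """
    variant = 0
    wildtype = 0
    for pair in pairs:
        if not passes_filters(pair, min_quality):
            continue
        if pair.forward_seq == wildtype_seq:
            wildtype += 1
        else:
            variant += 1
    if wildtype == 0:
        return None
    return variant / wildtype
```

In practice, variants would be called per locus against a reference for each amplicon; the single `wildtype_seq` argument is a simplification for illustration.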
In some aspects, the method is further defined as a method of quantifying the presence of rare sequence variants within a DNA region of interest.
As used herein, “essentially free,” in terms of a specified component, means that none of the specified component has been purposefully formulated into a composition and/or is present only as a contaminant or in trace amounts. The total amount of the specified component resulting from any unintended contamination of a composition is therefore well below 0.05%, preferably below 0.01%. Most preferred is a composition in which no amount of the specified component can be detected with standard analytical methods.
As used herein in the specification, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising,” the words “a” or “an” may mean one or more than one.
The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” As used herein “another” may mean at least a second or more.
Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, or the variation that exists among the study subjects.
Other objects, features and advantages of the present disclosure will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only, since various changes and modifications within the spirit and scope of the disclosure will become apparent to those skilled in the art from this detailed description.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure. The disclosure may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
The present disclosure describes methods for using toehold probes, fine-tune probes, or X-probes to apply allele-specific enrichment or depletion to amplicons from multiplex PCR on a biological DNA sample. Due to the high sequence specificity of the probes, a large majority of the wild-type sequences are removed, and the allele frequency of mutations is significantly increased. Consequently, low-depth sequencing becomes sufficient to detect and quantitate rare mutations. Thus, the industrial applicability of this disclosure is to significantly reduce sequencing costs for analyzing rare mutations.
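The effect of selective depletion on allele frequency can be illustrated with simple arithmetic. The sketch below is not taken from the disclosure: the function name and the retention fractions (the fraction of variant and wild-type molecules that survive the selection step) are hypothetical values chosen only to show how removing most wild-type molecules raises the variant allele frequency (VAF).

```python
def vaf_after_selection(
    initial_vaf: float, variant_retention: float, wildtype_retention: float
) -> float:
    """Variant allele frequency after a selection step.

    initial_vaf        -- fraction of molecules carrying the variant before selection
    variant_retention  -- fraction of variant molecules that survive selection (assumed)
    wildtype_retention -- fraction of wild-type molecules that survive selection (assumed)
    """
    for name, value in (
        ("initial_vaf", initial_vaf),
        ("variant_retention", variant_retention),
        ("wildtype_retention", wildtype_retention),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    variant = initial_vaf * variant_retention
    wildtype = (1.0 - initial_vaf) * wildtype_retention
    total = variant + wildtype
    if total == 0.0:
        raise ValueError("no molecules remain after selection")
    return variant / total


# Illustrative numbers only: 0.1% starting VAF, 50% of variant molecules
# and 1% of wild-type molecules retained.
enriched = vaf_after_selection(0.001, 0.5, 0.01)
```

With these illustrative numbers, a 0.1% variant rises to roughly 4.8% after selection, so far fewer reads are spent on wild-type molecules for each variant read obtained.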
Although these allele-specific enrichment and depletion probes have been previously demonstrated on DNA targets, integration with NGS is non-trivial. For example, direct application of toehold probes to biological DNA is undesirable because low probe capture yield and limited sample input quantity may result in false negatives, in which rare mutations are present in the original DNA sample but not captured by probes and consequently not represented in NGS data.
Furthermore, the current dominant method of NGS analysis of biological DNA is to end-repair fragmented genomic DNA and subsequently ligate it to sequencing adaptors. However, end-repair and ligation are both low-yield enzymatic processes, and likewise can result in false negatives through loss of the few DNA molecules that bear a rare mutation.
Yet another possible but ultimately undesirable method is to perform many cycles of multiplexed amplification of gene regions of interest, and perform toehold probe enrichment or depletion on the final product. The drawback of this approach is that with high-cycle multiplexed PCR, primer dimers become dominant, practically preventing this approach from scaling to more than 20 genetic loci.
The present disclosure thus describes the approach of performing low-cycle (e.g., 5) multiplex PCR to pre-amplify gene regions of interest by roughly 10- to 30-fold to counter probe binding loss, while simultaneously remaining scalable to high multiplexing due to the unlikelihood of accumulating high concentrations of primer dimers within only a few PCR cycles (see
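The roughly 10- to 30-fold figure for about 5 cycles is consistent with the standard PCR yield model, in which each cycle multiplies the template count by (1 + E) for a per-cycle efficiency E between 0 and 1. The following sketch assumes that model; the function name and the example efficiencies are illustrative and are not values stated in the disclosure.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Expected fold increase in template copies after a number of PCR cycles.

    Uses the standard model (1 + efficiency) ** cycles, where efficiency is the
    fraction of templates copied in each cycle (1.0 means perfect doubling).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


perfect = fold_amplification(5, 1.0)   # 2 ** 5
modest = fold_amplification(5, 0.6)    # 1.6 ** 5
```

Under this model, five cycles give 32-fold amplification at perfect efficiency and about 10.5-fold at 60% efficiency, which brackets the range quoted above.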
In some embodiments, positive selection of the known rare variant is performed in combination with negative selection of the corresponding wild-type allele. In other embodiments, negative selection of wild-type alleles may be performed without concurrent positive selection, which allows for the detection of rare variants of unknown sequence.
ASES sources of error and VAF limit of detection. False positives arise from either PCR errors due to limited enzyme fidelity (e1) or NGS sequencing error (e0) (
In some embodiments, the present disclosure provides synthetic oligonucleotide probes for use in allele-specific enrichment and/or depletion. In particular embodiments, the oligonucleotide probes are toehold probes, X-probes, or fine-tune probes. The oligonucleotide probes can have a length of 30 to 200 nucleotides, particularly 50 to 100 nucleotides, such as between 60 and 70 nucleotides. Further, the oligonucleotide probes can comprise part or all of sequencing primer sequences or their binding sites, such as index sequencing primers for particular sequencing platforms (e.g., Illumina index primers).
The molecular specificity of the enrichment and depletion probes is beneficial to the accurate inference of genomic DNA variants. Nonspecific binding of variant enrichment probes to wild-type loci would defeat the purpose of enrichment. Likewise, nonspecific binding of wild-type depletion probes to variant loci would result in the desired target being lost from the sample. Toehold probes with protector oligonucleotides can be employed to enhance the molecular specificity of the Probes and Sinks. In some aspects, the toehold probes may be fine-tune probes as described in U.S. Pat. Publn. No. 2016/0340727, which is incorporated herein by reference in its entirety. In some aspects, the toehold probes may be X-probes as described in U.S. Pat. Publn. No. 2016/0326600, which is incorporated herein by reference in its entirety.
In some embodiments, a protector oligonucleotide comprising a region that is partially complementary to the target complementarity region is introduced. Importantly, at least five continuous nucleotides on the target complementarity region are not bound by the protector, i.e., form a toehold, in order to allow initiation of hybridization between the target and the Probe/Sink. This protector oligonucleotide can improve the specificity of hybridization reactions (see Zhang et al., 2012, Wang and Zhang, 2015, U.S. Pat. No. 9,284,602, and U.S. Pat. Publn. No. 2016/0340727, each of which is incorporated herein by reference in its entirety), and maintains high sequence selectivity across a large range of temperatures and buffer conditions. In some aspects, the protector oligonucleotide is present in molar excess.
In some embodiments, the nucleic acid probes are rationally designed so that the standard free energy for hybridization (e.g., theoretical standard free energy) between the specific target nucleic acid molecule and the target complementarity region is close to zero, while the standard free energy for hybridization between a spurious target (even one differing from the specific (actual) target by as little as a single nucleotide) and the probe is high enough to make their binding unfavorable by comparison.
The “toehold” region is present in the target complementarity region, is complementary to a target sequence, and is not complementary to the protector oligonucleotide. The sequences of the complementary regions are rationally designed to achieve this matching under desired conditions of temperature and probe concentration. As a result, the equilibrium for the actual target and Probe/Sink rapidly approaches 50% target:probe::protector:probe (or whatever ratio is desired), while equilibrium for the spurious target and Probe/Sink greatly favors protector:probe.
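The relationship between a near-zero standard free energy and a roughly 50% equilibrium can be sketched numerically. The example below is a simplified two-state model and not the design procedure of the disclosure: it treats the reaction as target + complement:protector ⇌ target:complement + protector, assumes that both the probe duplex and free protector are in large excess over the target so their concentrations stay constant, and uses a hypothetical +3 kcal/mol penalty for a single-nucleotide mismatch purely for illustration.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal/(mol*K)


def bound_fraction(
    delta_g_kcal_per_mol: float,
    temperature_celsius: float = 37.0,
    probe_to_protector_ratio: float = 1.0,
) -> float:
    """Equilibrium fraction of target bound to the probe complement strand.

    delta_g_kcal_per_mol     -- standard free energy of the displacement reaction
    probe_to_protector_ratio -- [complement:protector] / [free protector], assumed
                                constant because both are in excess over target
    """
    if probe_to_protector_ratio <= 0.0:
        raise ValueError("probe_to_protector_ratio must be positive")
    temperature_kelvin = temperature_celsius + 273.15
    k_eq = math.exp(-delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_kelvin))
    x = k_eq * probe_to_protector_ratio
    return x / (1.0 + x)


matched = bound_fraction(0.0)      # intended target, free energy tuned to zero
mismatched = bound_fraction(3.0)   # spurious target with an assumed +3 kcal/mol penalty
```

In this model a standard free energy of zero gives exactly 50% binding of the intended target, while the assumed mismatch penalty pushes binding of the spurious target below 1% at 37 °C. Changing `probe_to_protector_ratio` shifts both values, which corresponds to the "whatever ratio is desired" tuning described above.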
Mechanistically, it is thought that hybridization to a target begins at the toehold and continues along the length of the target complementarity region until the probe is no longer “double-stranded.” This assumes complementarity between the target and the target complementarity region. When nucleotide mismatches exist between a spurious target and the target complementarity region, displacement of the second strand (i.e., the protector oligonucleotide) is thermodynamically unfavorable and the association between the target complementarity region and the spurious target is reversed.
Because the standard free energy favors a complete match (fully complementary) between the target sequence of the nucleic acid and toehold regions of the probe rather than a mismatch (e.g., single nucleotide change), the target complementarity region of the probe will bind stably to a target in the absence of a mismatch but not in the presence of a mismatch. If a mismatch exists between the target complementarity region of the probe and the target, the probe duplex prefers to reform. In this way, the frequency of producing a ligation product when a target sequence is not present is reduced. This type of discrimination is typically not possible using the standard single-stranded probes because in those reactions there is no competing nucleic acid strand (such as the protector oligonucleotide) to which a mismatched probe strand would prefer to bind. In some aspects, the thermodynamics of the Probes and Sinks are designed to satisfy that of a Competitive Composition. See, U.S. Pat. Publn. No. US2017/0029875, which is incorporated herein by reference in its entirety.
In some aspects, the Sinks are functionalized with, for example, a biotin group to enable the removal of any target nucleic acids that are bound by the Sinks. In other aspects, the Probes are functionalized but the Sinks are not, thereby allowing any target nucleic acids bound by the Probes to be collected; in this aspect, the Sinks serve to compete with the Probes for binding to the non-desired targets to increase the specificity of the hybridization of the Probes.
In some embodiments, the sequence of the functionalized strand is decoupled from the sequence of the target-specific strand, such as, for example, in the case of X-probes. See, e.g., U.S. Pat. Publn. No. 2016/0326600, which is incorporated by reference herein in its entirety. In this embodiment, the probe system comprises a universal component and a target-specific component. The universal component comprises at least a first universal oligonucleotide/strand, which comprises at least one region. The sequence of the universal strand is not target specific and therefore can be used with any target-specific component. The target-specific component comprises a protector strand and a target-specific/complement strand. The target-specific strand (i.e. complement strand) comprises at least two regions. At least one of the regions of the target-specific strand is fully or partially complementary to the at least one region of the first universal strand, which gives rise to a first double-stranded region. In some instances, the protector strand has a region that is at least partially complementary (and in some instances fully complementary) to all or a portion of the target-specific strand, which gives rise to a second double-stranded region. The target-specific strand contains a toehold region that is not hybridized to any other strand of the probe, but is complementary to a portion of the target sequence. To be clear, the region of the target-specific strand that is complementary to a region of the protector strand is also complementary to the target sequence. In some embodiments, the first universal strand comprises a functionalization conjugated thereto.
Upon hybridization of the probe to the target nucleic acid, the protector strand and any universal strand hybridized thereto dissociate from the target-specific strand, leaving the target-specific strand, along with any universal strand hybridized thereto, hybridized to the target nucleic acid. Thus, the probes of the present disclosure permit the use of functionalized universal components with a variety of target-specific components, thereby eliminating the expense of synthesizing a different functionalized probe for each desired target sequence.
In some aspects, the universal strands on the Sinks are functionalized with, for example, a biotin group to enable the removal of any target nucleic acids that are bound by the Sinks. In other aspects, the universal strands on the Probes are functionalized but the universal strands on the Sinks are not, thereby allowing any target nucleic acids bound by the Probes to be collected; in this aspect, the Sinks serve to compete with the Probes for binding to the non-desired targets to increase the specificity of the hybridization of the Probes.
A. Target Nucleic Acid Molecules
Nucleic acids in a nucleic acid sample being analyzed (or processed) in accordance with the present disclosure can be from virtually any nucleic acid source, including but not limited to genomic DNA, complementary DNA (cDNA), RNA (e.g., messenger RNA, ribosomal RNA, short interfering RNA, microRNA, etc.), plasmid DNA, mitochondrial DNA, etc. Furthermore, as any organism can be used as a source of nucleic acids to be processed in accordance with the present disclosure, no limitation in that regard is intended. Exemplary organisms include, but are not limited to, plants, animals (e.g., reptiles, mammals, insects, worms, fish, etc.), bacteria, fungi (e.g., yeast), viruses, etc. In certain embodiments, the nucleic acids in the nucleic acid sample are derived from a mammal, where in certain embodiments the mammal is a human. A nucleic acid molecule of interest can be a single nucleic acid molecule or a plurality of nucleic acid molecules. Also, a nucleic acid molecule of interest can be of biological or synthetic origin. Examples of nucleic acid molecules include genomic DNA, cDNA, cell-free DNA (cfDNA), RNA, amplified DNA, a pre-existing nucleic acid library, etc. In some aspects, the target nucleic acid is a double-stranded DNA molecule, such as, for example, human genomic DNA.
A nucleic acid molecule of interest may be subjected to various treatments, such as repair treatments and fragmenting treatments. Fragmenting treatments include mechanical, sonic, chemical, enzymatic, degradation over time, etc. Repair treatments include nick repair via extension and/or ligation, polishing to create blunt ends, removal of damaged bases such as deaminated, derivatized, abasic, or crosslinked nucleotides, etc. A nucleic acid molecule of interest may also be subjected to chemical modification (e.g., bisulfite conversion, methylation/demethylation), extension, amplification (e.g., PCR, isothermal, etc.), etc.
An RNA molecule may be obtained from a sample, such as a sample comprising total cellular RNA, a transcriptome, or both; the sample may be obtained from one or more viruses; from one or more bacteria; or from a mixture of animal cells, bacteria, and/or viruses, for example. The sample may comprise mRNA, such as mRNA that is obtained by affinity capture. Obtaining nucleic acid molecules may comprise generation of a cDNA molecule by reverse transcribing the mRNA molecule with a reverse transcriptase, such as, for example, Tth DNA polymerase, HIV Reverse Transcriptase, AMV Reverse Transcriptase, MMLV Reverse Transcriptase, or a mixture thereof.
B. Amplification of Nucleic Acids
A number of template-dependent processes are available to amplify the target nucleic acids present in a given sample. One of the best known amplification methods is the polymerase chain reaction (referred to as PCR™), which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159, each of which is incorporated herein by reference in its entirety. Briefly, two synthetic oligonucleotide primers, which are complementary to two regions of the template DNA (one for each strand) to be amplified, are added to the template DNA (that need not be pure), in the presence of excess deoxynucleotides (dNTPs) and a thermostable polymerase, such as, for example, Taq (Thermus aquaticus) DNA polymerase. In a series (typically 30-35) of temperature cycles, the target DNA is repeatedly denatured (around 90° C.) and annealed to the primers (typically at 50-60° C.), and a daughter strand is extended from the primers (72° C.). As the daughter strands are created they act as templates in subsequent cycles. Thus, the template region between the two primers is amplified exponentially, rather than linearly.
A barcode, such as a sample barcode, may be added to the target nucleic acid molecules during amplification. One method involves annealing a primer to the target nucleic acid molecule, the primer including a first portion complementary to the target nucleic acid molecule and a second portion including a barcode; and extending the annealed primer to form a barcoded nucleic acid molecule. Thus, the primer may include a 3′ portion and a 5′ portion, where the 3′ portion may anneal to a portion of the target nucleic acid molecule and the 5′ portion comprises the barcode.
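The primer layout described above, with the barcode in the 5′ portion and the target-annealing sequence in the 3′ portion, can be sketched as plain string handling. The function names and the example sequences are hypothetical and serve only to illustrate the arrangement; real primers would also carry adaptor sequences and would be designed for melting temperature and specificity.

```python
from typing import Tuple

DNA_ALPHABET = set("ACGT")


def _check_dna(name: str, sequence: str) -> None:
    """Reject empty sequences and characters other than A, C, G, T."""
    if not sequence or set(sequence) - DNA_ALPHABET:
        raise ValueError(f"{name} must be a non-empty string of A, C, G, T")


def build_barcoded_primer(barcode: str, target_specific: str) -> str:
    """Return the primer written 5'->3': barcode first, annealing portion last."""
    _check_dna("barcode", barcode)
    _check_dna("target_specific", target_specific)
    return barcode + target_specific


def split_barcode(read: str, barcode_length: int) -> Tuple[str, str]:
    """Split a read that starts with the barcode into (barcode, remainder)."""
    if not 0 < barcode_length <= len(read):
        raise ValueError("barcode_length must be between 1 and the read length")
    return read[:barcode_length], read[barcode_length:]


# Made-up sequences for illustration only.
primer = build_barcoded_primer("ACGTAC", "GGATCC")
```

Because extension copies the whole primer into the product, every molecule amplified with this primer begins with the barcode, which is what allows reads to be assigned back to their sample after sequencing.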
C. Sequencing of Nucleic Acids
Methods are also provided for the sequencing of the library of nucleic acid molecules. Any technique for sequencing nucleic acids known to those skilled in the art can be used in the methods of the present disclosure. DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, sequencing-by-synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing-by-synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real time monitoring of the incorporation of labeled nucleotides during a polymerization step, and SOLiD sequencing.
The nucleic acid library may be generated with an approach compatible with Illumina sequencing, such as a Nextera™ DNA sample prep kit, and additional approaches for Illumina next-generation sequencing library preparation are described, e.g., in Oyola et al. (2012). In other embodiments, a nucleic acid library is generated with a method compatible with a SOLiD™ or Ion Torrent sequencing method (e.g., a SOLiD® Fragment Library Construction Kit, a SOLiD® Mate-Paired Library Construction Kit, a SOLiD® ChIP-Seq Kit, a SOLiD® Total RNA-Seq Kit, a SOLiD® SAGE™ Kit, an Ambion® RNA-Seq Library Construction Kit, etc.). Additional next-generation sequencing methods, including various methods for library construction that may be used with embodiments of the present disclosure, are described, e.g., in Pareek (2011) and Thudi (2012).
In particular aspects, the sequencing technologies used in the methods of the present disclosure include the HiSeq™ system (e.g., HiSeq™ 2000 and HiSeq™ 1000) and the MiSeq™ system from Illumina, Inc. The HiSeq™ system is based on massively parallel sequencing of millions of fragments using attachment of randomly fragmented genomic DNA to a planar, optically transparent surface and solid phase amplification to create a high density sequencing flow cell with millions of clusters, each containing about 1,000 copies of template per sq. cm. These templates are sequenced using four-color DNA sequencing-by-synthesis technology. The MiSeq™ system uses TruSeq™, Illumina's reversible terminator-based sequencing-by-synthesis.
Another example of a DNA sequencing platform is the QIAGEN GeneReader platform, a next-generation sequencing (NGS) platform utilizing proprietary modified nucleotides whose 3′ OH groups are reversibly terminated by a small moiety to perform sequencing-by-synthesis (SBS) in a massively parallel manner. Briefly, the sequencing templates are first clonally amplified on a solid surface (such as beads) to generate hundreds of thousands of identical copies for each individual sequencing template, denatured to generate single-stranded sequencing templates, hybridized with sequencing primer, and then immobilized on the flow cell. The immobilized sequencing templates are then subjected to a nucleotide incorporation reaction in a reaction mix that includes modified nucleotides with a cleavable 3′ blocking group that enables the incorporation and detection of only one specific nucleotide onto each sequencing template in each cycle. See U.S. Pat. Nos. 6,664,079; 8,612,161; and 8,623,598, each of which is incorporated by reference herein.
Further examples of DNA sequencing platforms are the Ion Torrent PGM™ sequencer (Thermo Fisher) and the Ion Torrent Proton™ sequencer (Thermo Fisher), which are ion-based sequencing systems that sequence nucleic acid templates by detecting ions produced as a byproduct of nucleotide incorporation. Typically, hydrogen ions are released as byproducts of nucleotide incorporations occurring during template-dependent nucleic acid synthesis by a polymerase. The Ion Torrent PGM™ sequencer and Ion Torrent Proton™ sequencer detect the nucleotide incorporations by detecting the hydrogen ion byproducts of the nucleotide incorporations. The Ion Torrent PGM™ sequencer and Ion Torrent Proton™ sequencer include a plurality of nucleic acid templates to be sequenced, each template disposed within a respective sequencing reaction well in an array. The wells of the array are each coupled to at least one ion sensor that can detect the release of H+ ions or changes in solution pH produced as a byproduct of nucleotide incorporation. The ion sensor comprises a field effect transistor (FET) coupled to an ion-sensitive detection layer that can sense the presence of H+ ions or changes in solution pH. The ion sensor provides output signals indicative of nucleotide incorporation, which can be represented as voltage changes whose magnitude correlates with the H+ ion concentration in a respective well or reaction chamber. Different nucleotide types are flowed serially into the reaction chamber, and are incorporated by the polymerase into an extending primer (or polymerization site) in an order determined by the sequence of the template. Each nucleotide incorporation is accompanied by the release of H+ ions in the reaction well, along with a concomitant change in the localized pH. The release of H+ ions is registered by the FET of the sensor, which produces signals indicating the occurrence of the nucleotide incorporation.
Nucleotides that are not incorporated during a particular nucleotide flow will not produce signals. The amplitude of the signals from the FET may also be correlated with the number of nucleotides of a particular type incorporated into the extending nucleic acid molecule thereby permitting homopolymer regions to be resolved. Thus, during a run of the sequencer multiple nucleotide flows into the reaction chamber along with incorporation monitoring across a multiplicity of wells or reaction chambers permit the instrument to resolve the sequence of many nucleic acid templates simultaneously. Further details regarding the compositions, design and operation of the Ion Torrent PGM™ sequencer can be found, for example, in U.S. Pat. Publn. Nos. 2009/0026082; 2010/0137143; and 2010/0282617, all of which are incorporated by reference herein in their entireties.
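The flow-based readout described above, in which a flow that incorporates nothing yields no signal and the signal amplitude scales with the number of identical nucleotides incorporated, can be sketched as a toy decoder. This is an idealized illustration and not the instrument's base-calling algorithm: the function name is hypothetical, the signals are assumed to be already normalized so that a single incorporation reads about 1.0, and the simple repeating flow order "TACG" is an assumption chosen for clarity.

```python
import math
from typing import Sequence


def decode_flowgram(signals: Sequence[float], flow_order: str = "TACG") -> str:
    """Convert normalized per-flow signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporations
    (half values round up); that many copies of the nucleotide flowed in
    that step are appended. Signals near zero contribute no bases.
    """
    if not flow_order:
        raise ValueError("flow_order must not be empty")
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        count = max(0, math.floor(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Flows T, A, C, G, T, A with made-up signals: one T, no A, two C, no G, no T, one A.
called = decode_flowgram([1.02, 0.05, 1.96, 0.00, 0.10, 0.98])
```

Here the third flow reads close to 2.0 and is decoded as the two-base homopolymer "CC". Because real signals are noisy, resolving long homopolymers by amplitude alone becomes progressively harder as the run length grows.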
Another example of a DNA sequencing technique that can be used in the methods of the present disclosure is 454 sequencing (Roche) (Margulies et al., 2005). 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains a 5′-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (pico-liter sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.
Another example of a DNA sequencing technique that can be used in the methods of the present disclosure is SOLiD technology (Life Technologies, Inc.). In SOLiD sequencing, genomic DNA is sheared into fragments, and adaptors are attached to the 5′ and 3′ ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5′ and 3′ ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5′ and 3′ ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3′ modification that permits bonding to a glass slide.
Another example of a DNA sequencing technique that can be used in the methods of the present disclosure is the Ion Torrent system (Life Technologies, Inc.). Ion Torrent uses a high-density array of micro-machined wells to perform sequencing by synthesis in a massively parallel way. Each well holds a different DNA template. Beneath the wells is an ion-sensitive layer and beneath that a proprietary ion sensor. If a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion will be released. The charge from that ion will change the pH of the solution, which can be detected by the proprietary ion sensor. The sequencer will call the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change will be recorded and no base will be called. If there are two identical bases on the DNA strand, the voltage will be double, and the chip will record two identical bases called. Because this is direct detection (no scanning, no cameras, no light), each nucleotide incorporation is recorded in seconds.
Another example of a sequencing technology that can be used in the methods of the present disclosure includes the single molecule, real-time (SMRT™) technology of Pacific Biosciences. In SMRT™, each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked. A single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Detection of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.
A further sequencing platform includes the CGA Platform (Complete Genomics). The CGA technology is based on preparation of circular DNA libraries and rolling circle amplification (RCA) to generate DNA nanoballs that are arrayed on a solid support (Drmanac et al., 2010). Complete Genomics' CGA Platform uses a novel strategy called combinatorial probe anchor ligation (cPAL) for sequencing. The process begins by hybridization between an anchor molecule and one of the unique adapters. Four degenerate 9-mer oligonucleotides are labeled with specific fluorophores that correspond to a specific nucleotide (A, C, G, or T) in the first position of the probe. Sequence determination occurs in a reaction where the correct matching probe is hybridized to a template and ligated to the anchor using T4 DNA ligase. After imaging of the ligated products, the ligated anchor-probe molecules are denatured. The process of hybridization, ligation, imaging, and denaturing is repeated five times using new sets of fluorescently labeled 9-mer probes that contain known bases at the n+1, n+2, n+3, and n+4 positions.
A further sequencing platform includes nanopore sequencing (Oxford Nanopore). Nanopore detection arrays are described in US2011/0177498; US2011/0229877; US2012/0133354; WO2012/042226; WO2012/107778, and have been used for nucleic acid sequencing as described in US2012/0058468; US2012/0064599; US2012/0322679 and WO2012/164270, all of which are hereby incorporated by reference. A single molecule of DNA can be sequenced directly using a nanopore, without the need for an intervening PCR amplification step or a chemical labelling step or the need for optical instrumentation to identify the chemical label. Commercially available nanopore nucleic acid sequencing units are developed by Oxford Nanopore (Oxford, United Kingdom). The GridION™ system and miniaturised MinION™ device are designed to provide novel qualities in molecular sensing such as real-time data streaming, improved simplicity, efficiency and scalability of workflows and direct analysis of the molecule of interest. Using the Oxford Nanopore nanopore sequencing platform, an ionic current is passed through the nanopore by setting a voltage across the membrane in which the nanopore is set. If an analyte passes through the pore or near its aperture, this event creates a characteristic disruption in current. Measurement of that current makes it possible to identify the molecule in question. For example, this system can be used to distinguish between the four standard DNA bases G, A, T and C, and also modified bases. It can be used to identify target proteins, small molecules, or to gain rich molecular information, for example to distinguish between the enantiomers of ibuprofen or study molecular binding dynamics. These nanopore arrays are useful for scientific applications specific for each analyte type; for example when sequencing DNA, the technology may be used for resequencing, de novo sequencing, and epigenetics.
“Amplification,” as used herein, refers to any in vitro process for increasing the number of copies of a nucleotide sequence or sequences. Nucleic acid amplification results in the incorporation of nucleotides into DNA or RNA. As used herein, one amplification reaction may consist of many rounds of DNA replication. For example, one PCR reaction may consist of 30-100 “cycles” of denaturation and replication.
“Polymerase chain reaction,” or “PCR,” means a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors well-known to those of ordinary skill in the art, e.g., exemplified by the references: McPherson et al., editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively).
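As a non-limiting illustration of repeated cycling, the number of copies after a given number of cycles can be approximated with the idealized exponential model N = N0 × (1 + E)^n, where E is the per-cycle efficiency (1.0 meaning perfect doubling). The function name and parameters below are hypothetical, and the model is an assumption that ignores the plateau phase seen in real reactions.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of PCR.

    `efficiency` is the fraction of templates copied per cycle (0.0 to 1.0).
    This is a textbook approximation, not a model of any specific assay.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0.0 and 1.0")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Perfect doubling of a single template over 10 cycles.
    print(pcr_copies(1, 10))
    # The same reaction at 90% per-cycle efficiency yields fewer copies.
    print(pcr_copies(1, 10, efficiency=0.9))
```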
“Primer” means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3′ end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers are generally of a length compatible with their use in synthesis of primer extension products, and are usually in the range of between 8 to 100 nucleotides in length, such as 10 to 75, 15 to 60, 15 to 40, 18 to 30, 20 to 40, 21 to 50, 22 to 45, 25 to 40, and so on, more typically in the range of between 18-40, 20-35, 21-30 nucleotides long, and any length between the stated ranges. Typical primers can be in the range of between 10-50 nucleotides long, such as 15-45, 18-40, 20-30, 21-25 and so on, and any length between the stated ranges. In some embodiments, the primers are usually not more than about 10, 12, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, or 70 nucleotides in length.
As used herein, a nucleic acid “region” or “domain” is a consecutive stretch of nucleotides of any length.
“Incorporating,” as used herein, means becoming part of a nucleic acid polymer.
A “nucleoside” is a base-sugar combination, i.e., a nucleotide lacking a phosphate. It is recognized in the art that there is a certain inter-changeability in usage of the terms nucleoside and nucleotide. For example, the nucleotide deoxyuridine triphosphate, dUTP, is a deoxyribonucleoside triphosphate. After incorporation into DNA, it serves as a DNA monomer, formally being deoxyuridylate, i.e., dUMP or deoxyuridine monophosphate. One may say that one incorporates dUTP into DNA even though there is no dUTP moiety in the resultant DNA. Similarly, one may say that one incorporates deoxyuridine into DNA even though that is only a part of the substrate molecule.
“Nucleotide,” as used herein, is a term of art that refers to a base-sugar-phosphate combination. Nucleotides are the monomeric units of nucleic acid polymers, i.e., of DNA and RNA. The term includes ribonucleotide triphosphates, such as rATP, rCTP, rGTP, or rUTP, and deoxyribonucleotide triphosphates, such as dATP, dCTP, dUTP, dGTP, or dTTP.
The term “nucleic acid” or “polynucleotide” will generally refer to at least one molecule or strand of DNA, RNA, DNA-RNA chimera or a derivative or analog thereof, comprising at least one nucleobase, such as, for example, a naturally occurring purine or pyrimidine base found in DNA (e.g., adenine “A,” guanine “G,” thymine “T” and cytosine “C”) or RNA (e.g. A, G, uracil “U” and C). The term “nucleic acid” encompasses the terms “oligonucleotide” and “polynucleotide.” “Oligonucleotide,” as used herein, refers collectively and interchangeably to two terms of art, “oligonucleotide” and “polynucleotide.” Note that although oligonucleotide and polynucleotide are distinct terms of art, there is no exact dividing line between them and they are used interchangeably herein. The term “adaptor” may also be used interchangeably with the terms “oligonucleotide” and “polynucleotide.” In addition, the term “adaptor” can indicate a linear adaptor (either single stranded or double stranded) or a stem-loop adaptor. These definitions generally refer to at least one single-stranded molecule, but in specific embodiments will also encompass at least one additional strand that is partially, substantially, or fully complementary to at least one single-stranded molecule. Thus, a nucleic acid may encompass at least one double-stranded molecule or at least one triple-stranded molecule that comprises one or more complementary strand(s) or “complement(s)” of a particular sequence comprising a strand of the molecule. As used herein, a single stranded nucleic acid may be denoted by the prefix “ss,” a double-stranded nucleic acid by the prefix “ds,” and a triple stranded nucleic acid by the prefix “ts.”
A “nucleic acid molecule” or “nucleic acid target molecule” refers to any single-stranded or double-stranded nucleic acid molecule including standard canonical bases, hypermodified bases, non-natural bases, or any combination of the bases thereof. For example and without limitation, the nucleic acid molecule may contain the four canonical DNA bases (adenine, cytosine, guanine, and thymine) and/or the four canonical RNA bases (adenine, cytosine, guanine, and uracil). Uracil can be substituted for thymine when the nucleoside contains a 2′-deoxyribose group. The nucleic acid molecule can be transformed from RNA into DNA and from DNA into RNA. For example, and without limitation, mRNA can be converted into complementary DNA (cDNA) using reverse transcriptase and DNA can be transcribed into RNA using RNA polymerase. A nucleic acid molecule can be of biological or synthetic origin. Examples of nucleic acid molecules include genomic DNA, cDNA, RNA, a DNA/RNA hybrid, amplified DNA, a pre-existing nucleic acid library, etc. A nucleic acid may be obtained from a human sample, such as blood, serum, plasma, cerebrospinal fluid, cheek scrapings, biopsy, semen, urine, feces, saliva, sweat, etc. A nucleic acid molecule may be subjected to various treatments, such as repair treatments and fragmenting treatments. Fragmenting treatments include mechanical, sonic, and hydrodynamic shearing. Repair treatments include nick repair via extension and/or ligation, polishing to create blunt ends, removal of damaged bases, such as deaminated, derivatized, abasic, or crosslinked nucleotides, etc. A nucleic acid molecule of interest may also be subjected to chemical modification (e.g., bisulfite conversion, methylation/demethylation), extension, amplification (e.g., PCR, isothermal, etc.), etc.
Nucleic acid(s) that are “complementary” or “complement(s)” are those that are capable of base-pairing according to the standard Watson-Crick, Hoogsteen or reverse Hoogsteen binding complementarity rules. As used herein, the term “complementary” or “complement(s)” may refer to nucleic acid(s) that are substantially complementary, as may be assessed by the same nucleotide comparison set forth above. The term “substantially complementary” may refer to a nucleic acid comprising at least one sequence of consecutive nucleobases, or semiconsecutive nucleobases if one or more nucleobase moieties are not present in the molecule, that is capable of hybridizing to at least one nucleic acid strand or duplex even if fewer than all nucleobases base pair with a counterpart nucleobase. In certain embodiments, a “substantially complementary” nucleic acid contains at least one sequence in which about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, to about 100%, and any range therein, of the nucleobase sequence is capable of base-pairing with at least one single or double-stranded nucleic acid molecule during hybridization. In certain embodiments, the term “substantially complementary” refers to at least one nucleic acid that may hybridize to at least one nucleic acid strand or duplex in stringent conditions.
In certain embodiments, a “partially complementary” nucleic acid comprises at least one sequence that may hybridize in low stringency conditions to at least one single or double-stranded nucleic acid, or contains at least one sequence in which less than about 70% of the nucleobase sequence is capable of base-pairing with at least one single or double-stranded nucleic acid molecule during hybridization.
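The percentage-based definitions above can be illustrated with a minimal sketch. It assumes two equal-length strands written 5′ to 3′, aligned antiparallel without gaps, and counts only Watson-Crick pairs; it does not model the hybridization-stringency criteria that the definitions also recite. The function names and the 70% cutoff parameter are illustrative choices.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fraction_paired(strand: str, other: str) -> float:
    """Fraction of positions in `strand` that Watson-Crick pair with `other`.

    Both sequences are given 5'->3'; `other` is reversed so that the two
    strands are compared in antiparallel orientation, with no gaps.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("sequences must be non-empty and of equal length")
    other_antiparallel = other.upper()[::-1]
    paired = sum(
        1 for a, b in zip(strand.upper(), other_antiparallel)
        if WATSON_CRICK.get(a) == b
    )
    return paired / len(strand)


def classify(fraction: float, threshold: float = 0.70) -> str:
    """Map a paired fraction onto the terms used in the definitions above."""
    if fraction == 0:
        return "non-complementary"
    if fraction >= threshold:
        return "substantially complementary"
    return "partially complementary"


if __name__ == "__main__":
    # A sequence and its exact reverse complement pair at every position.
    print(classify(fraction_paired("AAGGCCTTAG", "CTAAGGCCTT")))
```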
The term “non-complementary” refers to nucleic acid sequence that lacks the ability to form at least one Watson-Crick base pair through specific hydrogen bonds.
The term “ligase” as used herein refers to an enzyme that is capable of joining the 3′ hydroxyl terminus of one nucleic acid molecule to a 5′ phosphate terminus of a second nucleic acid molecule to form a single molecule. The ligase may be a DNA ligase or RNA ligase. Examples of DNA ligases include E. coli DNA ligase, T4 DNA ligase, and mammalian DNA ligases.
“Sample” means a material obtained or isolated from a fresh or preserved biological sample or synthetically-created source that contains nucleic acids of interest. In certain embodiments, a sample is the biological material that contains the variable immune region(s) for which data or information are sought. Samples can include at least one cell, fetal cell, cell culture, tissue specimen, blood, serum, plasma, saliva, urine, tear, vaginal secretion, sweat, lymph fluid, cerebrospinal fluid, mucosa secretion, peritoneal fluid, ascites fluid, fecal matter, body exudates, umbilical cord blood, chorionic villi, amniotic fluid, embryonic tissue, multicellular embryo, lysate, extract, solution, or reaction mixture suspected of containing immune nucleic acids of interest. Samples can also include non-human sources, such as non-human primates, rodents and other mammals, other animals, plants, fungi, bacteria, and viruses.
As used herein in relation to a nucleotide sequence, “substantially known” refers to having sufficient sequence information in order to permit preparation of a nucleic acid molecule, including its amplification. This will typically be about 100%, although in some embodiments some portion of an adaptor sequence is random or degenerate. Thus, in specific embodiments, substantially known refers to about 50% to about 100%, about 60% to about 100%, about 70% to about 100%, about 80% to about 100%, about 90% to about 100%, about 95% to about 100%, about 97% to about 100%, about 98% to about 100%, or about 99% to about 100%.
The technology herein includes kits for creating libraries of target nucleic acids in a sample. A “kit” refers to a combination of physical elements. For example, a kit may include, for example, one or more components, such as Sinks and Probes, either with or without protector oligonucleotides, as well as specific primers, enzymes, reaction buffers, an instruction sheet, and other elements useful to practice the technology described herein. These physical elements can be arranged in any way suitable for carrying out the disclosure.
The components of the kits may be packaged either in aqueous media or in lyophilized form. The container means of the kits will generally include at least one vial, test tube, flask, bottle, syringe or other container means, into which a component may be placed, and preferably, suitably aliquoted (e.g., aliquoted into the wells of a microtiter plate). Where there is more than one component in the kit, the kit also will generally contain a second, third or other additional container into which the additional components may be separately placed. However, various combinations of components may be comprised in a single vial. The kits of the present disclosure also will typically include a means for containing the nucleic acids, and any other reagent containers in close confinement for commercial sale. Such containers may include injection or blow molded plastic containers into which the desired vials are retained.
A kit will also include instructions for employing the kit components as well as the use of any other reagent not included in the kit. Instructions may include variations that can be implemented. It is contemplated that such reagents are embodiments of kits of the disclosure. Such kits, however, are not limited to the particular items identified above.
The following examples are included to demonstrate preferred embodiments of the disclosure. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the disclosure, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the disclosure.
A 114-plex non-pathogenic panel has been completed using both positive and negative selection. See Tables 1-3 for the full sequence list used for the 114-plex panel. See
The genomic DNA input sample consisted of 498.5 ng NA18537 cell line DNA and 1.5 ng NA18562 cell line DNA. The sample had a 0.3% allele frequency in all single nucleotide polymorphisms (SNPs) in which both NA18537 and NA18562 are homozygous but differ from each other. In a 2.2M read library, with roughly 10,000× depth per locus, there are roughly 30 variant reads per locus, as expected for the 0.3% allele frequency sample (
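The arithmetic behind the stated allele frequency and expected variant read count can be checked with a short sketch. It assumes that the mass fraction of the minor cell line DNA equals its fraction of genome copies, and that the locus is homozygous in both cell lines, as stated above; the function names are hypothetical.

```python
def spike_in_vaf(minor_ng: float, major_ng: float) -> float:
    """Variant allele frequency of a two-component DNA mixture.

    Assumes equal genome mass per cell line, so the mass fraction of the
    minor component equals its allele fraction at homozygous differing SNPs.
    """
    return minor_ng / (minor_ng + major_ng)


def expected_variant_reads(depth: float, vaf: float) -> float:
    """Expected variant-supporting reads at a locus without enrichment."""
    return depth * vaf


if __name__ == "__main__":
    vaf = spike_in_vaf(minor_ng=1.5, major_ng=498.5)   # 1.5 / 500 = 0.003
    reads = expected_variant_reads(depth=10_000, vaf=vaf)
    print(f"VAF = {vaf:.2%}, expected variant reads per locus = {reads:.0f}")
```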
The distribution of fold-enrichment per locus for the 114-plex panel was analyzed. Median fold-enrichment observed was 52, and 90% of the Variants were enriched 8-fold or more (
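One way such per-locus statistics could be computed is sketched below. The disclosure does not define fold-enrichment in this passage, so the sketch assumes it is the ratio of the variant-to-wildtype read odds after enrichment to the variant-to-wildtype odds in the input sample. The function names and the per-locus values in the demo are hypothetical and are not data from the 114-plex panel.

```python
from statistics import median


def fold_enrichment(variant_reads: int, wildtype_reads: int, input_vaf: float) -> float:
    """Odds-based fold-enrichment at one locus (an assumed definition)."""
    if wildtype_reads <= 0 or not 0.0 < input_vaf < 1.0:
        raise ValueError("need wildtype_reads > 0 and 0 < input_vaf < 1")
    odds_after = variant_reads / wildtype_reads
    odds_before = input_vaf / (1.0 - input_vaf)
    return odds_after / odds_before


def summarize(folds, threshold: float = 8.0):
    """Return (median fold-enrichment, fraction of loci at or above threshold)."""
    folds = list(folds)
    if not folds:
        raise ValueError("no loci to summarize")
    fraction = sum(f >= threshold for f in folds) / len(folds)
    return median(folds), fraction


if __name__ == "__main__":
    # Hypothetical per-locus values, for illustration only.
    example_folds = [4.0, 10.0, 52.0, 80.0, 120.0]
    print(summarize(example_folds))
```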
In addition, panels using only negative selection, in which only wild-type alleles are depleted and thus the sequence of the variant need not be known, are also contemplated.
Amplicons were hybridized to variant-specific probes and wildtype-specific sinks to enrich variants of interest (
Estimation of sample VAF based on ASES VRF.
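A minimal sketch of such an estimate follows. It reads VRF as the variant read fraction observed after enrichment and VAF as the variant allele frequency in the input sample, and it assumes that enrichment multiplies the variant-to-wildtype odds by a locus-specific factor that has been calibrated separately. That model, the function name, and the parameter names are assumptions made for illustration and are not taken from the disclosure.

```python
def estimate_vaf(vrf: float, enrichment_factor: float) -> float:
    """Back-calculate the input VAF from a post-enrichment VRF.

    Assumed model: VRF / (1 - VRF) = enrichment_factor * VAF / (1 - VAF).
    `enrichment_factor` is a hypothetical, separately calibrated per-locus
    constant.
    """
    if not 0.0 <= vrf < 1.0:
        raise ValueError("vrf must satisfy 0 <= vrf < 1")
    if enrichment_factor <= 0:
        raise ValueError("enrichment_factor must be positive")
    observed_odds = vrf / (1.0 - vrf)
    input_odds = observed_odds / enrichment_factor
    return input_odds / (1.0 + input_odds)


if __name__ == "__main__":
    # A locus read out at 50% VRF with an assumed 99-fold odds enrichment
    # corresponds to an input VAF of about 1% under this model.
    print(f"{estimate_vaf(0.5, 99.0):.4f}")
```

Under this model a VRF close to 1 carries little quantitative information, because small changes in read counts translate into large changes in the estimated VAF.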
A cancer mutation panel, based on the National Comprehensive Cancer Network (NCCN) guidelines of actionable mutations, was generated. The panel covers 112 actionable mutations distributed across 42 amplicons (
Validation of the ASES cancer mutation panel on clinical cfDNA samples is shown in
All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this disclosure have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the disclosure. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the disclosure as defined by the appended claims.
The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
The present application is a continuation of U.S. application Ser. No. 16/227,790, filed Dec. 20, 2018, which claims the priority benefit of U.S. Provisional Application No. 62/608,197, filed Dec. 20, 2017, the entire contents of each of which are incorporated herein by reference.
This invention was made with government support under Grant No. R01 HG008752 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
62608197 | Dec 2017 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16227790 | Dec 2018 | US
Child | 17735062 | | US