The present disclosure relates to methods and compositions for nucleic acid library preparation and their use in sequencing applications. In certain aspects, the present disclosure relates to methods of making a library of concatenated amplicons from a target nucleic acid. In some embodiments, the libraries disclosed and generated by the methods described herein may be useful in various downstream applications, such as analyzing and characterizing the molecular features of genomic targets. Compositions and kits for making a library of concatenated amplicons (e.g., using any of the exemplary methods described herein) are also provided.
Since the advent of “second-generation” sequencing (or next-generation sequencing), the cost of genome sequencing has dropped precipitously (Mardis, (2008) Trends Genet. 24(3):133-41). These technologies, which can produce short reads a few hundred base pairs in length, have enabled the sequencing of many new genomes along with widespread resequencing efforts to analyze genomic diversity (Schatz et al., (2010) Genome Res. 20(9):1165-73; 1000 Genomes Project Consortium, (2010) Nature 467(7319):1061-73). Although second-generation sequencing has enabled population-scale analyses of single nucleotide and other small variants, analysis of larger structural variations has proved difficult. Further, new genomes assembled de novo using second-generation technologies are often of lower quality compared with those genomes sequenced using older, more expensive methods (International Rice Genome Sequencing Project, (2005) Nature 436(7052):793-800; Lander et al., (2001) Nature 409(6822):860-921). Resequencing projects may also be limited in their analysis of structural variations, missing tens of thousands of structural variants or more per mammalian-sized genome (Chaisson et al., (2015) Nature 517(7536):608-11).
The availability of “third-generation” single-molecule sequencing technologies that are affordable for many laboratories and can produce average read lengths of more than 10,000 base pairs has enabled improved analysis of genome structure (Lee et al., (2016) “Third-generation sequencing and the future of genomics,” DOI: 10.1101/048603). With respect to structural variation analysis, long reads improve “split-read” analyses such that insertions, deletions, translocations, and other structural changes can be more readily recognized (Chaisson et al., (2015) Nature 517(7536):608-11). Single-molecule sequencing technologies can also produce more uniform coverage of the genome since they are not as sensitive to GC- or AT-biased content as second-generation technologies, which tend to have reduced or completely absent coverage over regions with imbalanced sequence composition (Ross et al., (2013) Genome Biol. 14(5):R51). Additional advantages of single-molecule sequencing include single-molecule sensitivity and continuous or real-time readouts.
Long-read technologies, such as single-molecule real-time (SMRT®) technology (Pacific Biosciences, Menlo Park, Calif.) and nanopore-based methods (Oxford Nanopore Technologies, Oxford, UK), address several limitations of short-read sequencers. However, long-read technologies still suffer from low throughput (ranging from about 100,000 to about 10 million reads) compared to competing short-read sequencing platforms, in addition to a variable raw error rate (up to about 10-20%). Long-read technologies have also been hampered by sample and preparation methods that are not suitable for long-read sequencing, such as those for oncology and prenatal testing applications, which typically use short nucleic acid fragments such as cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA) present in trace amounts in blood (Newman et al., (2014) Nat Med. 20(5):548-54). Thus, novel sample preparation strategies capable of providing long DNA templates could increase the throughput of single-molecule sequencing platforms. Such methods could also increase the versatility of these platforms to cost-effectively sequence both long and short DNA molecules.
Molecular biology methods designed to generate long DNA templates by concatenating DNA fragments into genes or gene clusters have been proposed. See, e.g., WO 2018/108328; Schlecht et al., (2017) Scientific Reports 7:5252; Kadkhodaei et al., (2016) RSC Adv. 6:66682-94; Mitani et al., (2004) BioTechniques 37(1):124-9; Ramteke et al., (2016) F1000Research 4:160; Marcozzi et al., (2019) “CyclomicsSeq a sensitive liquid biopsy genetic test real-time and cost-efficient cancer monitoring in blood.” However, current methods, such as those using Gibson Assembly to covalently link DNA fragments with complementary ends, have limitations, including (i) a requirement for a minimum fragment size; (ii) assembly of amplicons in a random order; (iii) a wide distribution of product size; (iv) the ability to only assemble up to about 5 amplicons; and/or (v) a requirement for a purification step between any amplicon synthesis and assembly reactions. Thus, there remains a need for more effective methods of library preparation, particularly those that are capable of harnessing the advantages of long-read single-molecule sequencing platforms and may also be applied to other downstream applications (e.g., gene assembly, molecular characterization of sequence variations, etc.).
The present disclosure provides, in part, novel methods and compositions for nucleic acid library preparation and improved sequencing/sequence assembly methods. In certain aspects, the present disclosure provides methods and compositions for concatenating multiple discrete amplicons into one or more longer amplicons. In certain aspects, the present disclosure provides a method of making a library of concatenated amplicons from a target nucleic acid by generating tagged amplicons from the target nucleic acid (e.g., by amplifying two or more regions of interest (ROIs)); concatenating the tagged amplicons to generate one or more concatenated amplicons; and amplifying the one or more concatenated amplicons to generate a library of concatenated amplicons. In some embodiments, each ROI is amplified with a forward primer and a reverse primer. In some embodiments, each primer comprises a 5′ tag sequence and a sequence capable of hybridizing to an ROI. In some embodiments, the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for another ROI.
In some embodiments, amplicons are designed to enrich genomic sequences of interest (e.g., exons). In some embodiments, enrichment of such genomic sequences allows sequencing reads and/or other downstream analyzers to focus on regions of interest and exclude other regions (e.g., non-coding sequences, e.g., introns). Thus, in some embodiments, enrichment may result in time and/or cost savings. In some embodiments, amplicons are concatenated in a predetermined order. In some embodiments, amplicons are concatenated such that the assembled concatemer comprises single-copy representation of each amplicon.
In some embodiments, the methods and compositions disclosed herein may be useful in various downstream applications. An exemplary application of the disclosed methods and compositions is sequencing analysis, e.g., using single-molecule sequencing. In some embodiments, the methods and compositions disclosed herein provide one or more advantages over alternate methods for nucleic acid library preparation and/or related sequencing using such a library (e.g., those using Gibson assembly for amplicon concatenation). Exemplary advantages include, without limitation: (i) no restriction on fragment size, thereby providing compatibility with short, degraded samples, such as formalin-fixed paraffin-embedded (FFPE) or cell-free DNA (liquid biopsy) samples; (ii) a self-normalizing workflow capable of generating a product with a defined size and amplicons concatenated in a uniform (e.g., 1:1) stoichiometry; (iii) ability to concatenate more amplicons (e.g., more than 5 amplicons); (iv) no requirement for a purification step between any amplicon synthesis and assembly reactions; (v) reduction in time and/or cost for sample preparation; and (vi) increased throughput for downstream applications (e.g., single-molecule sequencing, e.g., cost-effective multiple gene sequencing assays that can be configured on a single flow cell). In some embodiments, the methods and compositions disclosed herein provide effective strategies for nucleic acid library preparation that can be applied to sequencing across panels of different genes and/or markers.
In some embodiments, the methods and compositions disclosed herein increase the size of multiple discrete amplicons via amplicon concatenation. In some embodiments, the amplicon concatenation methods described herein generate concatemer templates suitably sized for downstream applications (e.g., using single-molecule sequencing). In some embodiments, the amplicon concatenation methods described herein may increase throughput of single-molecule sequencing by up to about 50-fold, up to about 100-fold, or more, as compared to alternate methods for nucleic acid library preparation. In some embodiments, the methods and compositions described herein may have advantages not only for sequencing analysis, but also for other downstream applications. Exemplary potential applications include gene assembly and molecular characterization of sequence variations (e.g., single nucleotide variants (SNV), indels, gene chimera, and copy number changes) within target loci, e.g., using analyzers other than single-molecule sequencing platforms.
In some embodiments, the present disclosure provides a method of making a library of concatenated amplicons from a target nucleic acid, the method comprising:
In some embodiments, amplifying two or more ROIs comprises polymerase chain reaction (PCR) or isothermal amplification. In some embodiments, amplifying two or more ROIs comprises PCR. In some embodiments, amplifying two or more ROIs comprises multiplex PCR. In some embodiments, PCR and/or multiplex PCR comprises magnesium in a working concentration of about 0.5 mM to about 4 mM. In some embodiments, PCR and/or multiplex PCR comprises magnesium in a working concentration of about 1 mM to about 3.5 mM. In some embodiments, PCR and/or multiplex PCR comprises magnesium in a working concentration of about 1.5 mM to about 3 mM. In some embodiments, PCR and/or multiplex PCR comprises dimethyl sulfoxide (DMSO) in a working concentration of about 1% to about 8% by volume (v/v). In some embodiments, PCR and/or multiplex PCR comprises DMSO in a working concentration of about 3% to about 6% by volume. In some embodiments, PCR and/or multiplex PCR comprises a pH of about 8 to about 10. In some embodiments, PCR and/or multiplex PCR comprises a pH of about 8.5 to about 9.2.
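By way of non-limiting illustration, the broadest magnesium, DMSO, and pH ranges recited above may be captured in a small configuration check. The following is a minimal sketch only: the class name `PcrConditions`, its field names, and the example values are hypothetical, and the sketch treats the recited endpoints as hard limits, ignoring the latitude that “about” provides.

```python
from dataclasses import dataclass

# Broadest ranges recited in the text; the narrower recited ranges fall inside these.
MG_MM_RANGE = (0.5, 4.0)      # magnesium working concentration, mM
DMSO_PCT_RANGE = (1.0, 8.0)   # DMSO working concentration, % v/v
PH_RANGE = (8.0, 10.0)        # reaction pH


@dataclass(frozen=True)
class PcrConditions:
    """Hypothetical container for a reaction set-up (not part of the source)."""
    magnesium_mm: float
    dmso_percent: float
    ph: float

    def out_of_range(self):
        """Return the names of parameters outside the broadest recited ranges."""
        checks = {
            "magnesium_mm": (self.magnesium_mm, MG_MM_RANGE),
            "dmso_percent": (self.dmso_percent, DMSO_PCT_RANGE),
            "ph": (self.ph, PH_RANGE),
        }
        return [name for name, (value, (low, high)) in checks.items()
                if not low <= value <= high]


# Illustrative values chosen inside the narrower recited ranges.
mix = PcrConditions(magnesium_mm=2.0, dmso_percent=5.0, ph=8.8)
print(mix.out_of_range())  # prints []
```

A set-up such as `PcrConditions(5.0, 5.0, 7.5)` would report `magnesium_mm` and `ph` as out of range under these assumptions.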
In some embodiments, amplifying two or more ROIs comprises amplifying at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, e.g., at least 12, or at least 14 ROIs. In some embodiments, each ROI is about 2, about 5, about 10, about 20, about 30, about 40, about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length.
In some embodiments, the working concentration of one or more primers in step (i) is about 1 nM to about 5,000 nM (e.g., about 10 nM to about 100 nM, e.g., about 30 nM). In some embodiments, the working concentration of one or more primers in step (i) is about 10 nM to about 100 nM (e.g., about 30 nM). In some embodiments, the working concentration of one or more primers in step (i) is about 30 nM.
In some embodiments, one or more primers in step (i) are depleted prior to concatenating the tagged amplicons. In some embodiments, one or more primers in step (i) are selected to prevent formation of one or more primer dimers. In some embodiments, the one or more primers lack 5 or more (e.g., 5, 6, 7, 8, or more) exactly-matched bases at the 3′ end of the primer sequences. In some embodiments, the one or more primers prevent formation of one or more primer dimers (e.g., one or more exponentially amplifiable primer dimers). In some embodiments, the one or more primers lack 7 or more (e.g., 7, 8, 9, 10, or more) exactly-matched bases at the 3′ end of the primer sequences. In some embodiments, the one or more primers prevent formation of one or more primer dimers (e.g., one or more linearly amplifiable primer dimers). In some embodiments, one or more primers in step (i) comprise minimal sequence that is capable of hybridizing to an ROI and also complementary to a sequence in another primer. In some embodiments, the minimal sequence is about 6 to about 100 nucleotides in length, e.g., about 6 to about 50 or about 15 to about 30 nucleotides in length, e.g., about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is about 6 to about 50 nucleotides in length, e.g., about 6 to about 30 or about 15 to about 30 nucleotides in length, e.g., about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is about 6 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 4 to about 40, about 5 to about 35, or about 6 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 10, about 15, about 20, about 25, about 30, or about 35 nucleotides in length. In some embodiments, the minimal sequence is about 15 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 18 to about 20 nucleotides in length.
In some embodiments, the minimal sequence is at least about 4, about 5, about 6, about 7, about 8, about 9, or about 10 nucleotides in length. In some embodiments, the minimal sequence is at least about 6 nucleotides in length.
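By way of non-limiting illustration, the primer-dimer criterion described above (primers lacking 5 or more exactly-matched bases at their 3′ ends) may be screened computationally. The sketch below adopts one possible reading of that criterion, namely that the last k bases of one primer pair perfectly, in antiparallel orientation, with the last k bases of another. The function names, primer names, and primer sequences are hypothetical, and the check does not model mismatches, internal binding sites, or thermodynamics.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(a, b):
    """Largest k for which the last k bases of a and of b pair perfectly.

    Both primers are written 5'->3'; two 3' ends anneal antiparallel, so the
    3' k-mer of a must equal the reverse complement of the 3' k-mer of b.
    """
    best = 0
    for k in range(1, min(len(a), len(b)) + 1):
        if a[-k:] == reverse_complement(b[-k:]):
            best = k
    return best


def risky_pairs(primers, min_match=5):
    """List (name_a, name_b, k) for primer pairs with k >= min_match.

    Self-pairs are included, since a primer can also dimerize with itself.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations_with_replacement(primers.items(), 2):
        k = three_prime_overlap(a, b)
        if k >= min_match:
            flagged.append((name_a, name_b, k))
    return flagged


# Made-up primers: fwd_1 and rev_2 share a 6-base complementary 3' end.
primers = {
    "fwd_1": "TTGACCATGCAAGCTG",
    "rev_2": "GGTCATCCTACAGCTT",
    "fwd_3": "ACACACACACACACAC",
}
print(risky_pairs(primers))  # prints [('fwd_1', 'rev_2', 6)]
```

Raising `min_match` to 7 would correspond to the stricter threshold recited for linearly amplifiable primer dimers; under that threshold the example pair would no longer be flagged.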
In some embodiments, one or more primers in step (i) are selected to minimize formation of one or more dead-end intermediate products. In some embodiments, the one or more dead-end intermediate products cannot form one or more concatenated amplicons. In some embodiments, one or more primers in step (i) comprise at least one adenine between the 5′ tag sequence and the sequence capable of hybridizing to the ROI. In some embodiments, one or more primers in step (i) comprise a 5′ phosphate. In some embodiments, one or more primers in step (i) comprise a molecular barcode. In some embodiments, the 5′ tag sequence in one or more primers is an artificial tag sequence. In some embodiments, the artificial tag sequence is not homologous to a human genome sequence.
In some embodiments, the tagged amplicons are not purified prior to concatenation. In some embodiments, concatenating the tagged amplicons comprises providing a DNA polymerase. In some embodiments, the DNA polymerase has 3′ to 5′ exonuclease activity. In some embodiments, the DNA polymerase is a high-fidelity DNA polymerase. In some embodiments, the DNA polymerase is a Q5, Pfu, or Kapa HiFi HotStart DNA polymerase. In some embodiments, concatenating the tagged amplicons comprises providing at least one adjuvant. In some embodiments, the at least one adjuvant comprises TMAC, ThermaGo, and/or ThermaStop.
In some embodiments, concatenating the tagged amplicons comprises concatenating at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 tagged amplicons. In some embodiments, each tagged amplicon is about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length. In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 50,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 20,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 10,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 5,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 3,000 to about 4,000 nucleotides.
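By way of non-limiting illustration, the relationship between the number and length of tagged amplicons and the total concatemer length may be estimated with simple arithmetic. The sketch below assumes, as a labeled simplification not stated in the text, that each junction is formed by overlap of a single shared tag, so that the tag is counted once per junction. The function name, the 20-nucleotide tag length, and the example amplicon lengths are hypothetical.

```python
def concatemer_length(amplicon_lengths, tag_length):
    """Estimate concatemer length when neighbouring amplicons share one tag.

    amplicon_lengths: lengths of the tagged amplicons, tags included.
    tag_length: assumed length of the tag shared at each junction.
    """
    if not amplicon_lengths:
        return 0
    junctions = len(amplicon_lengths) - 1
    # Each junction overlaps by one tag, so subtract one tag per junction.
    return sum(amplicon_lengths) - junctions * tag_length


# 14 amplicons of 250 nt joined through 20-nt tags: 14*250 - 13*20
print(concatemer_length([250] * 14, tag_length=20))  # prints 3240
```

Under these assumptions, fourteen 250-nucleotide amplicons give a concatemer of 3,240 nucleotides, which falls within the recited range of about 3,000 to about 4,000 nucleotides.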
In some embodiments, the one or more concatenated amplicons are in a predetermined order. In some embodiments, the predetermined order results from the tag sequences in the primers. In some embodiments, the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for the ROI immediately downstream. In some embodiments, the order of the one or more concatenated amplicons is identical to the order of the corresponding ROIs in the target nucleic acid.
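By way of non-limiting illustration, the manner in which the tag sequences determine the order of concatenation may be sketched as follows. The sketch assumes that all tags are written 5′ to 3′, so that “complementary” corresponds to the reverse complement, and that each tag is unique. The function names, ROI names, and tag sequences are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concatenation_order(primers: Dict[str, Tuple[str, str]]) -> List[str]:
    """Infer amplicon order from the 5' tags of each ROI's primer pair.

    primers maps ROI name -> (forward_tag, reverse_tag), both written 5'->3'.
    ROI X is followed by ROI Y when X's reverse tag is the reverse
    complement of Y's forward tag.
    """
    by_forward_tag = {fwd: roi for roi, (fwd, _) in primers.items()}
    successor = {}
    for roi, (_, rev) in primers.items():
        nxt = by_forward_tag.get(reverse_complement(rev))
        if nxt is not None:
            successor[roi] = nxt
    # The first ROI is the only one that no other ROI links to.
    starts = [roi for roi in primers if roi not in successor.values()]
    if len(starts) != 1:
        raise ValueError("tags do not define a single linear order")
    order = [starts[0]]
    while order[-1] in successor:
        nxt = successor[order[-1]]
        if nxt in order:
            raise ValueError("tags form a cycle")
        order.append(nxt)
    if len(order) != len(primers):
        raise ValueError("tags do not link every ROI into one chain")
    return order


# Made-up artificial tags; T0 and T3 are the outermost tags of the concatemer.
T0, T1, T2, T3 = "GATTACAGGC", "CCTAGGATTC", "TTGACCGTAA", "AGCCTTGAGA"
primers = {
    "exon3": (T2, reverse_complement(T3)),
    "exon1": (T0, reverse_complement(T1)),
    "exon2": (T1, reverse_complement(T2)),
}
print(concatenation_order(primers))  # prints ['exon1', 'exon2', 'exon3']
```

In this example the order is recovered regardless of the order in which the primer pairs are listed, because it is encoded entirely in the tags.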
In some embodiments, the one or more concatenated amplicons comprise single-copy representation of each tagged amplicon. In some embodiments, the ratio of the one or more concatenated amplicons to the corresponding ROIs in the target nucleic acid is about 1 to 1.
In some embodiments, amplifying the one or more concatenated amplicons comprises PCR and/or multiplex PCR. In some embodiments, the PCR and/or multiplex PCR conditions comprise magnesium. In some embodiments, the magnesium is in a working concentration of about 0.5 mM to about 4 mM. In some embodiments, PCR and/or multiplex PCR comprises magnesium, e.g., in a working concentration of about 1 mM to about 3.5 mM. In some embodiments, PCR and/or multiplex PCR comprises magnesium in a working concentration of about 1.5 mM to about 3 mM. In some embodiments, the PCR and/or multiplex PCR conditions comprise DMSO. In some embodiments, the DMSO is in a working concentration of about 1% to about 8% by volume. In some embodiments, PCR and/or multiplex PCR comprises DMSO in a working concentration of about 3% to about 6% by volume. In some embodiments, the PCR and/or multiplex PCR conditions comprise a pH of about 8 to about 10. In some embodiments, PCR and/or multiplex PCR comprises a pH of about 8.5 to about 9.2.
In some embodiments, amplifying the one or more concatenated amplicons comprises a first end primer capable of hybridizing to a tag sequence at the 5′ end of a concatenated amplicon and a second end primer capable of hybridizing to a tag sequence at the 3′ end of a concatenated amplicon. In some embodiments, the tag sequence at the 5′ end of the concatenated amplicon is identical to or overlaps with the 5′ tag sequence of a forward primer used to amplify an ROI in step (i). In some embodiments, the tag sequence at the 3′ end of the concatenated amplicon is identical to or overlaps with the 5′ tag sequence of a reverse primer used to amplify an ROI in step (i). In some embodiments, the first end primer and the second end primer are added in any one of steps (i)-(iii). In some embodiments, the first end primer and the second end primer are added in step (i). In some embodiments, the first end primer and the second end primer are added in step (ii) or step (iii).
In some embodiments, a method described herein (e.g., a method of making a library of concatenated amplicons) further comprises analyzing a library of concatenated amplicons. In some embodiments, analyzing comprises sequencing, gene assembly, and/or structural variation characterization.
In some embodiments, sequencing comprises single-molecule sequencing. In some embodiments, sequencing comprises long-read sequencing. In some embodiments, sequencing comprises sequencing about 800 nucleotides or longer. In some embodiments, sequencing comprises nanopore sequencing or single-molecule real-time (SMRT) sequencing. In some embodiments, structural variation characterization comprises detecting or quantifying single nucleotide variants (SNV), repeat sequences, indels, gene chimera, and/or gene copy number. In some embodiments, detecting or quantifying gene copy number comprises detecting or quantifying one or more molecular barcodes. In some embodiments, the one or more molecular barcodes are in one or more primers in step (i). In some embodiments, detecting or quantifying gene copy number comprises using and/or comparing to an external spiking control. In some embodiments, the external spiking control comprises a synthetic gBlock control. In some embodiments, structural variation characterization comprises labeling and/or direct imaging.
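By way of non-limiting illustration, quantifying gene copy number from molecular barcodes with reference to an external spiking control may be sketched as follows. The text does not specify a formula; the normalization used here (distinct barcodes per target divided by distinct barcodes for the control, scaled by the control's known copy number) is an assumption made for the example, and the function name, barcodes, and read data are hypothetical toy values.

```python
from collections import defaultdict


def estimate_copy_number(reads, control, control_copies):
    """Estimate relative copy number per target from barcode counts.

    reads: iterable of (target_name, barcode) pairs, one per observed amplicon.
    control: name of the external spiking control among the targets.
    control_copies: known copy number of the control.

    Distinct barcodes approximate distinct template molecules, so duplicate
    reads of the same barcode are counted once. The normalization is an
    assumed simplification, not a method taken from the source.
    """
    barcodes = defaultdict(set)
    for target, barcode in reads:
        barcodes[target].add(barcode)
    control_count = len(barcodes.get(control, ()))
    if control_count == 0:
        raise ValueError("no barcodes observed for the control")
    return {
        target: control_copies * len(seen) / control_count
        for target, seen in barcodes.items()
        if target != control
    }


# Toy data: four distinct SMN1 barcodes, two distinct control barcodes.
reads = [
    ("SMN1", "AAC"), ("SMN1", "AAC"), ("SMN1", "GTT"),
    ("SMN1", "CCA"), ("SMN1", "TGA"),
    ("spike", "ACG"), ("spike", "TTC"), ("spike", "TTC"),
]
print(estimate_copy_number(reads, control="spike", control_copies=1))
# prints {'SMN1': 2.0}
```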
In some embodiments, a target nucleic acid comprises one or more genes or a multiple gene panel. In some embodiments, the one or more genes comprise a human gene. In some embodiments, the human gene is a human disease gene. In some embodiments, the human gene is a human cancer gene. In some embodiments, the one or more genes comprise CFTR, SMN1, SMN2, KRAS, BRAF, PIK3C, EGFR, and/or ERBB2. In some embodiments, the human gene is a human gene with high modeled fetal disease risk (MFDR). In some embodiments, the one or more genes comprise SMN1, SMN2, FMR1, HBA1, HBA2, and/or GBA. In some embodiments, the one or more genes comprise CFTR, FMR1, SMN1, SMN2, IKBKAP, ABCC8, FANCC, GALT, GBA, G6PC, HBA1, HBA2, HBB, BLM, ASPA, TMEM216, BCKDHA, BCKDHB, ACADM, MCOLN1, NEB, SMPD1, F8, HEXA, PCDH15, DMD, CYP21A2, and/or CLRN1. In some embodiments, the one or more genes comprise CFTR, FMR1, SMN1, and/or SMN2.
In some embodiments, a target nucleic acid is used in a multiple gene panel. In some embodiments, the multiple gene panel is a newborn or carrier screening panel. In some embodiments, the multiple gene panel comprises a human gene. In some embodiments, the multiple gene panel comprises at least about 20 human genes (e.g., at least about 22 human genes). In some embodiments, the multiple gene panel comprises at least about 22 human genes. In some embodiments, the human gene is a human disease gene. In some embodiments, the human gene is a human cancer gene. In some embodiments, the multiple gene panel comprises CFTR, SMN1, SMN2, KRAS, BRAF, PIK3C, EGFR, and/or ERBB2. In some embodiments, the human gene is a human gene with high modeled fetal disease risk (MFDR). In some embodiments, the multiple gene panel comprises SMN1, SMN2, FMR1, HBA1, HBA2, and/or GBA. In some embodiments, the multiple gene panel comprises CFTR, FMR1, SMN1, SMN2, IKBKAP, ABCC8, FANCC, GALT, GBA, G6PC, HBA1, HBA2, HBB, BLM, ASPA, TMEM216, BCKDHA, BCKDHB, ACADM, MCOLN1, NEB, SMPD1, F8, HEXA, PCDH15, DMD, CYP21A2, and/or CLRN1. In some embodiments, the multiple gene panel comprises CFTR, FMR1, SMN1, and/or SMN2.
In some embodiments, a target nucleic acid is from a biological sample (e.g., a liquid and/or biopsy sample). In some embodiments, the biological sample comprises a blood sample. In some embodiments, the biological sample comprises a buccal sample. In some embodiments, the biological sample comprises a biopsy sample. In some embodiments, the biopsy sample comprises frozen tissue or formalin-fixed paraffin-embedded (FFPE) tissue. In some embodiments, the biopsy sample comprises a liquid biopsy sample. In some embodiments, the liquid biopsy sample comprises cell-free DNA or DNA from circulating tumor cells (i.e., circulating tumor DNA (ctDNA)).
The present disclosure further provides, in some embodiments, a library of concatenated amplicons, wherein the library is made by:
Further provided herein, in some embodiments, is a method of selecting a set of primers capable of amplifying two or more regions of interest (ROIs) from a target nucleic acid, comprising selecting a forward primer and a reverse primer for each ROI, wherein each primer comprises a 5′ tag sequence and a sequence capable of hybridizing to the ROI, and wherein:
Further provided herein, in some embodiments, is a kit comprising a set of primers and instructions for use of the primers in amplifying two or more regions of interest (ROIs) from a target nucleic acid, wherein the set of primers comprises a forward primer and a reverse primer for each ROI, wherein each primer comprises a 5′ tag sequence and a sequence capable of hybridizing to the ROI, and wherein:
In some embodiments of the methods and compositions (e.g., libraries, kits) described herein, one or more primers (e.g., all primers) comprise minimal sequence that is capable of hybridizing to an ROI. In some embodiments, one or more primers (e.g., all primers) comprise minimal sequence that is complementary to a sequence in another primer. In some embodiments, one or more primers (e.g., all primers) comprise minimal sequence that is capable of hybridizing to an ROI and also complementary to a sequence in another primer. In some embodiments, the minimal sequence is about 6 to about 100 nucleotides in length, e.g., about 6 to about 50 or about 15 to about 30 nucleotides in length, e.g., about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is about 6 to about 50 nucleotides in length, e.g., about 6 to about 30 or about 15 to about 30 nucleotides in length, e.g., about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is about 6 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 4 to about 40, about 5 to about 35, or about 6 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 10, about 15, about 20, about 25, about 30, or about 35 nucleotides in length. In some embodiments, the minimal sequence is about 15 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is at least about 4, about 5, about 6, about 7, about 8, about 9, or about 10 nucleotides in length. In some embodiments, the minimal sequence is at least about 6 nucleotides in length. In some embodiments, one or more primers comprise at least one adenine between the 5′ tag sequence and the sequence capable of hybridizing to the ROI. In some embodiments, one or more primers comprise a 5′ phosphate. In some embodiments, one or more primers comprise a molecular barcode. 
In some embodiments, the 5′ tag sequence in one or more primers is an artificial tag sequence. In some embodiments, the artificial tag sequence is not homologous to a human genome sequence.
Also provided herein, in some embodiments, is a method of sequencing a library of concatenated amplicons, wherein the library of concatenated amplicons is made by any of the exemplary methods described herein.
Also provided herein, in some embodiments, is a method of sequencing a target nucleic acid, the method comprising:
In some embodiments of the methods (e.g., the sequencing methods) described herein, amplifying two or more ROIs comprises amplifying at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, e.g., at least 12, or at least 14 ROIs. In some embodiments, each ROI is about 2, about 5, about 10, about 20, about 30, about 40, about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length.
In some embodiments, concatenating the tagged amplicons comprises concatenating at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 tagged amplicons. In some embodiments, each tagged amplicon is about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length. In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 50,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 20,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 10,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 5,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 3,000 to about 4,000 nucleotides.
In some embodiments, the one or more concatenated amplicons are in a predetermined order. In some embodiments, the predetermined order results from the tag sequences in the primers. In some embodiments, the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for the ROI immediately downstream. In some embodiments, the order of the one or more concatenated amplicons is identical to the order of the corresponding ROIs in the target nucleic acid.
In some embodiments, the one or more concatenated amplicons comprise single-copy representation of each tagged amplicon. In some embodiments, the ratio of the one or more concatenated amplicons to the corresponding ROIs in the target nucleic acid is about 1 to 1.
In some embodiments, sequencing comprises single-molecule sequencing. In some embodiments, sequencing comprises long-read sequencing. In some embodiments, sequencing comprises sequencing about 800 nucleotides or longer. In some embodiments, sequencing comprises nanopore sequencing or single-molecule real-time (SMRT) sequencing.
In some embodiments, a method described herein (e.g., a method of sequencing a target nucleic acid) further comprises analyzing a library of concatenated amplicons before, during, or after sequencing. In some embodiments, analyzing comprises gene assembly and/or structural variation characterization. In some embodiments, structural variation characterization comprises detecting or quantifying single nucleotide variants (SNV), repeat sequences, indels, gene chimera, and/or gene copy number. In some embodiments, detecting or quantifying gene copy number comprises detecting or quantifying one or more molecular barcodes. In some embodiments, the one or more molecular barcodes are in one or more primers in step (i). In some embodiments, detecting or quantifying gene copy number comprises using and/or comparing to an external spiking control. In some embodiments, the external spiking control comprises a synthetic gBlock control. In some embodiments, structural variation characterization comprises labeling and/or direct imaging.
In some embodiments, a target nucleic acid comprises one or more genes or a multiple gene panel. In some embodiments, the one or more genes comprise a human gene. In some embodiments, the human gene is a human disease gene. In some embodiments, the human gene is a human cancer gene. In some embodiments, the one or more genes comprise CFTR, SMN1, SMN2, KRAS, BRAF, PIK3C, EGFR, and/or ERBB2. In some embodiments, the human gene is a human gene with high modeled fetal disease risk (MFDR). In some embodiments, the one or more genes comprise SMN1, SMN2, FMR1, HBA1, HBA2, and/or GBA. In some embodiments, the one or more genes comprise CFTR, FMR1, SMN1, SMN2, IKBKAP, ABCC8, FANCC, GALT, GBA, G6PC, HBA1, HBA2, HBB, BLM, ASPA, TMEM216, BCKDHA, BCKDHB, ACADM, MCOLN1, NEB, SMPD1, F8, HEXA, PCDH15, DMD, CYP21A2, and/or CLRN1. In some embodiments, the one or more genes comprise CFTR, FMR1, SMN1, and/or SMN2.
In some embodiments, a target nucleic acid is used in a multiple gene panel. In some embodiments, the multiple gene panel is a newborn or carrier screening panel. In some embodiments, the multiple gene panel comprises a human gene. In some embodiments, the multiple gene panel comprises at least about 20 human genes (e.g., at least about 22 human genes). In some embodiments, the multiple gene panel comprises at least about 22 human genes. In some embodiments, the human gene is a human disease gene. In some embodiments, the human gene is a human cancer gene. In some embodiments, the multiple gene panel comprises CFTR, SMN1, SMN2, KRAS, BRAF, PIK3C, EGFR, and/or ERBB2. In some embodiments, the human gene is a human gene with high modeled fetal disease risk (MFDR). In some embodiments, the multiple gene panel comprises SMN1, SMN2, FMR1, HBA1, HBA2, and/or GBA. In some embodiments, the multiple gene panel comprises CFTR, FMR1, SMN1, SMN2, IKBKAP, ABCC8, FANCC, GALT, GBA, G6PC, HBA1, HBA2, HBB, BLM, ASPA, TMEM216, BCKDHA, BCKDHB, ACADM, MCOLN1, NEB, SMPD1, F8, HEXA, PCDH15, DMD, CYP21A2, and/or CLRN1. In some embodiments, the multiple gene panel comprises CFTR, FMR1, SMN1, and/or SMN2.
In order that the disclosure may be more readily understood, certain terms are defined throughout the detailed description. Unless defined otherwise herein, all scientific and technical terms used in connection with the present disclosure have the same meaning as commonly understood by those of ordinary skill in the art.
All references cited herein are also incorporated by reference in their entirety. To the extent a cited reference conflicts with the disclosure herein, the specification shall control.
As used herein, the singular forms of a word also include the plural form, unless the context clearly dictates otherwise. As examples, the terms “a,” “an,” and “the” are understood to be singular or plural. Likewise, “an element” means one or more elements. The term “or” shall mean “and/or” unless the specific context indicates otherwise. All ranges include the endpoints and all points in between unless the context indicates otherwise.
The term “about” or “approximately,” as used herein in the context of numerical values and ranges, refers to values or ranges that approximate or are close to the recited values or ranges such that the embodiment may perform as intended, as is apparent to the skilled person from the teachings contained herein. Thus, these terms encompass values beyond those resulting from systematic error. In some embodiments, “about” or “approximately” means plus or minus 10% of a numerical amount.
In certain aspects, the present disclosure provides methods and compositions for nucleic acid library preparation. In certain aspects, the methods and compositions disclosed herein are used in various downstream applications (e.g., single-molecule sequencing, gene assembly, structural variation characterization, etc.).
In some embodiments, the methods and compositions disclosed herein relate to the concatenation of multiple discrete amplicons into one or more longer amplicons. In some embodiments, the methods disclosed herein comprise generating tagged amplicons, concatenating tagged amplicons, and/or amplifying one or more concatenated amplicons. In some embodiments, generating tagged amplicons comprises amplifying two or more regions of interest (ROIs) from a target nucleic acid, e.g., using tagged, gene-specific primers. In some embodiments, generating tagged amplicons comprises PCR (e.g., multiplex PCR, e.g., multiplex overlap extension (MOE)-PCR).
In some embodiments, the tagged amplicons are assembled by concatenation into one or more longer amplicons. In some embodiments, the one or more concatenated amplicons comprise multiple shorter amplicons in a predetermined order. In some embodiments, the predetermined order results from the tag sequences in the gene-specific primers used for amplification. In some embodiments, the one or more concatenated amplicons comprise single-copy representation (e.g., a defined unitary copy number) of each tagged amplicon. In some embodiments, the methods and related compositions (e.g., libraries, kits) disclosed herein offer one or more benefits for nucleic acid library preparation, including but not limited to increased simplicity, scale, and/or specificity. In some embodiments, the methods and related compositions (e.g., libraries, kits) disclosed herein may be useful in various downstream applications, such as sequencing (e.g., single-molecule sequencing, e.g., nanopore sequencing or single-molecule real-time (SMRT) sequencing). Other exemplary applications for the disclosed methods and compositions include, without limitation, gene assembly and molecular characterization of sequence variations (e.g., single nucleotide variants (SNV), indels, gene chimera, and copy number changes).
An exemplary embodiment is a method of making a library of concatenated amplicons from a target nucleic acid, the method comprising:
Another exemplary embodiment is a library of concatenated amplicons, wherein the library is made by:
Another exemplary embodiment is a method of selecting a set of primers capable of amplifying two or more regions of interest (ROIs) from a target nucleic acid, comprising selecting a forward primer and a reverse primer for each ROI, wherein each primer comprises a 5′ tag sequence and a sequence capable of hybridizing to the ROI, and wherein:
Another exemplary embodiment is a kit comprising a set of primers and instructions for use of the primers in amplifying two or more regions of interest (ROIs) from a target nucleic acid, wherein the set of primers comprises a forward primer and a reverse primer for each ROI, wherein each primer comprises a 5′ tag sequence and a sequence capable of hybridizing to the ROI, and wherein:
Also provided herein, in certain aspects, are methods of using the methods and compositions disclosed herein. For instance, in some embodiments, a library of concatenated amplicons (e.g., a library described herein and/or generated using any of the exemplary methods described herein) can be analyzed. In some embodiments, analyzing comprises sequencing, gene assembly, and/or structural variation characterization.
An exemplary embodiment is a method of sequencing a library of concatenated amplicons, wherein the library of concatenated amplicons is made by any of the exemplary methods described herein.
Another exemplary embodiment is a method of sequencing a target nucleic acid, the method comprising:
As used herein, the term “region of interest” or “ROI” refers to a nucleic acid (e.g., a genomic sequence, gene, gene fragment, or other nucleic acid of interest) that is analyzed (e.g., using any of the exemplary methods described herein). In some embodiments, an ROI is a portion of a genome or region of genomic DNA. In some embodiments, an ROI comprises or consists of an exon or multiple exons. In some embodiments, an ROI comprises or consists of a portion of an exon. In some embodiments, an ROI comprises more than one ROI. In some embodiments, an ROI may be a template for an amplification reaction (e.g., PCR, e.g., multiplex PCR). In some embodiments, an ROI may be split into two or more amplicons. In some embodiments, amplifying an ROI from a target nucleic acid yields one amplicon (e.g., one tagged amplicon). In some embodiments, amplifying an ROI yields two, 3, 4, or 5, or more, amplicons (e.g., two, 3, 4, or 5, or more, tagged amplicons). In some embodiments, amplifying an ROI yields two amplicons (e.g., two tagged amplicons). In some embodiments, the methods disclosed herein comprise amplifying two or more ROIs from a target nucleic acid. In some embodiments, the methods disclosed herein comprise amplifying at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 ROIs from a target nucleic acid. In some embodiments, the methods disclosed herein comprise amplifying at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, e.g., at least 12, or at least 14 ROIs from a target nucleic acid.
The term “nucleic acid” is used herein interchangeably with the term “polynucleotide,” and refers to a polymer of nucleotides (e.g., ribonucleotides and deoxyribonucleotides, both natural and non-natural) including DNA, RNA, and their subcategories, such as cDNA, mRNA, etc. A nucleic acid may be single-stranded or double-stranded and generally contains 5′-3′ phosphodiester bonds, although in some cases, nucleotide analogs may have other linkages. Nucleic acids may include naturally occurring bases (adenosine, guanosine, cytosine, uracil and thymidine), as well as non-natural bases. Non-natural bases may have a particular function, e.g., increasing the stability of a nucleic acid duplex, inhibiting nuclease digestion, or blocking primer extension or strand polymerization. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. In some embodiments, degenerate codon substitutions may be achieved in a nucleic acid by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., (1991) Nucleic Acids Res. 25(19):5081; Ohtsuka et al., (1985) J Biol Chem. 260(5):2605-8; Rossolini et al., (1994) Mol Cell Probes 8(2):91-8). In some embodiments, a nucleic acid is a target nucleic acid.
The terms “target nucleic acid,” “target sequence,” and “target” are used herein interchangeably to refer to any nucleic acid of interest, or a portion thereof, which is to be amplified, detected, and/or analyzed. The terms also include all variants of a target sequence. In some embodiments, a target nucleic acid is a gene or a gene fragment. In some embodiments, a target nucleic acid is or comprises non-coding sequence(s). In some embodiments, a target nucleic acid is an entire genome, including all genes, gene fragments, and intergenic regions (entire genome). In some embodiments, a target nucleic acid is a portion of a genome, e.g., only the coding regions of a genome (exome). In some embodiments, a target nucleic acid contains a locus of a genetic variant, e.g., a polymorphism, including a single nucleotide polymorphism or variant (SNP or SNV), or a genetic rearrangement resulting, e.g., in a gene fusion. In some embodiments, a target nucleic acid comprises a biomarker, i.e., a gene whose variants are associated with a disease or condition (e.g., a cancer). In some embodiments, a target nucleic acid comprises DNA. The DNA can be, e.g., genomic DNA, mitochondrial DNA, viral DNA, synthetic DNA, or cDNA reverse transcribed from RNA. In some embodiments, the DNA is genomic DNA. In some embodiments, a target nucleic acid is naturally fragmented, e.g., circulating cell-free DNA (cfDNA) or chemically degraded DNA, such as DNA typically found in chemically preserved or archived samples.
The term “amplicon,” as used herein, refers to a nucleic acid generated via an amplification reaction (e.g., PCR or isothermal amplification). An amplicon is typically double-stranded DNA; however, it may be RNA and/or DNA:RNA. In some embodiments, an amplicon comprises DNA complementary to a template nucleic acid (e.g., a target nucleic acid). In some embodiments, one or more primer pairs are selected and/or designed to generate one or more amplicons from a template nucleic acid. As such, in some embodiments, an amplicon comprises the primer pair, the complement of the primer pair, and the region of a template nucleic acid that was amplified to generate the amplicon. In some embodiments, an amplicon further comprises a tag sequence. An amplicon comprising a tag sequence may be referred to herein as a “tagged amplicon.”
As used herein, the term “library” refers to a plurality of nucleic acids. In some embodiments, a library is a library of concatenated amplicons. In some embodiments, a library comprises one or more concatenated amplicons. In some embodiments, a library comprises up to about 200 concatenated amplicons, e.g., about 1 to about 200, about 1 to about 150, about 1 to about 100, about 1 to about 50, about 1 to about 20, or about 1 to about 10 concatenated amplicons. In some embodiments, a library comprises up to about 100 concatenated amplicons, e.g., about 1 to about 100, about 1 to about 50, about 1 to about 20, or about 1 to about 10 concatenated amplicons. In some embodiments, a library comprises up to about 50 concatenated amplicons, e.g., about 1 to about 50, about 1 to about 20, or about 1 to about 10 concatenated amplicons. In some embodiments, a library comprises up to about 20 concatenated amplicons, e.g., about 1, about 5, about 10, about 15, or about 20 concatenated amplicons.
The terms “amplify,” “amplifying,” and “amplification,” as used herein in the context of nucleic acids, refer to the production of one or more copies of a polynucleotide, or a portion of the polynucleotide (e.g., starting from a small amount of the polynucleotide (e.g., a single polynucleotide molecule)), wherein the amplification products or amplicons are generally detectable. Amplification of polynucleotides encompasses a variety of chemical and enzymatic processes. Exemplary forms of amplification include the generation of multiple DNA copies from one or a few copies of a target or template DNA molecule during, e.g., a polymerase chain reaction (PCR) or isothermal amplification. In some embodiments, the amplification reaction is PCR (e.g., multiplex PCR). In some embodiments, the amplification reaction is multiplex PCR. In some embodiments, the amplification reaction is isothermal amplification.
In some embodiments, amplifying two or more ROIs comprises PCR or isothermal amplification. In some embodiments, amplifying two or more ROIs comprises PCR. In some embodiments, amplifying two or more ROIs comprises multiplex PCR.
The term “polymerase chain reaction” or “PCR,” as used herein, refers to a DNA synthesis reaction capable of amplifying a DNA template. A typical PCR reaction mixture comprises primer sequences which are complementary to the ends of a desired template, deoxynucleotide triphosphates (dNTPs), various buffer components, and a DNA polymerase. In general, the reaction mixture is admixed with a DNA sample known or suspected of harboring the desired template. The resulting mixture is then subjected to repeated cycles of template denaturation, primer annealing to the denatured template, and primer extension by the DNA polymerase, to create copies of the template. Because the product of each cycle can act as a template for subsequent reaction cycles, amplification generally proceeds in an exponential fashion (see, e.g., U.S. Pat. No. 4,683,202, and McPherson & Moller, PCR: The Basics (2nd Ed., Taylor & Francis) (2006)). Variations to this exemplary technique are known in the art and encompassed in the term PCR as used herein.
The term “multiplex PCR,” as used herein, refers to an amplification reaction capable of amplifying multiple DNA templates in parallel (e.g., in a single-tube PCR). In multiplex PCR, more than one target sequence can be amplified, e.g., by using multiple primer pairs in the reaction mixture. Thus, in some embodiments, a plurality of PCR products (i.e., amplicons) can be produced. Multiplex PCR can be broadly divided into single template PCR reactions, and multiple template PCR reactions. A single template PCR reaction may use a single template (e.g., genomic DNA) together with several pairs of forward and reverse primers to amplify specific regions within the template. A multiple template PCR reaction may use multiple templates and several primer sets in the same reaction tube. In some embodiments, multiplex PCR comprises a single template PCR reaction. In some embodiments, multiplex PCR comprises a multiple template reaction. In some embodiments, multiplex PCR is multiplex overlap extension (MOE)-PCR (see, e.g., Kadkhodaei et al., (2016) RSC Adv. 6:66682-94).
In some embodiments, PCR and/or multiplex PCR comprises magnesium, e.g., in a working concentration of about 0.5 mM to about 4 mM. In some embodiments, PCR and/or multiplex PCR comprises magnesium in a working concentration of about 1 mM to about 3.5 mM (e.g., about 0.8 mM, about 0.9 mM, about 1 mM, about 1.1 mM, about 1.2 mM, about 1.3 mM, about 1.4 mM, about 1.5 mM, about 1.6 mM, about 1.7 mM, about 1.8 mM, about 1.9 mM, about 2 mM, about 2.1 mM, about 2.2 mM, about 2.3 mM, about 2.4 mM, about 2.5 mM, about 2.6 mM, about 2.7 mM, about 2.8 mM, about 2.9 mM, about 3 mM, about 3.1 mM, about 3.2 mM, about 3.3 mM, about 3.4 mM, about 3.5 mM, about 3.6 mM, or about 3.7 mM). In some embodiments, PCR and/or multiplex PCR comprises magnesium in a working concentration of about 1.5 mM to about 3 mM (e.g., about 1.3 mM, about 1.4 mM, about 1.5 mM, about 1.6 mM, about 1.7 mM, about 1.8 mM, about 1.9 mM, about 2 mM, about 2.1 mM, about 2.2 mM, about 2.3 mM, about 2.4 mM, about 2.5 mM, about 2.6 mM, about 2.7 mM, about 2.8 mM, about 2.9 mM, about 3 mM, about 3.1 mM, or about 3.2 mM).
In some embodiments, PCR and/or multiplex PCR comprises dimethyl sulfoxide (DMSO), e.g., in a working concentration of about 1% to about 8% by volume (v/v) (e.g., about 0.8%, about 0.9%, about 1%, about 1.5%, about 2%, about 2.5%, about 3%, about 3.5%, about 4%, about 4.5%, about 5%, about 5.5%, about 6%, about 6.5%, about 7%, about 7.5%, about 8%, about 8.1%, or about 8.2% by volume). In some embodiments, PCR and/or multiplex PCR comprises DMSO in a working concentration of about 3% to about 6% by volume (e.g., about 2.8%, about 2.9%, about 3%, about 3.1%, about 3.2%, about 3.3%, about 3.4%, about 3.5%, about 3.6%, about 3.7%, about 3.8%, about 3.9%, about 4%, about 4.1%, about 4.2%, about 4.3%, about 4.4%, about 4.5%, about 4.6%, about 4.7%, about 4.8%, about 4.9%, about 5%, about 5.1%, about 5.2%, about 5.3%, about 5.4%, about 5.5%, about 5.6%, about 5.7%, about 5.8%, about 5.9%, about 6%, about 6.1%, or about 6.2% by volume).
In some embodiments, PCR and/or multiplex PCR comprises a pH of about 8 to about 10 (e.g., a pH of about 7.8, about 7.9, about 8, about 8.1, about 8.2, about 8.3, about 8.4, about 8.5, about 8.6, about 8.7, about 8.8, about 8.9, about 9, about 9.1, about 9.2, about 9.3, about 9.4, about 9.5, about 9.6, about 9.7, about 9.8, about 9.9, about 10, about 10.1, or about 10.2). In some embodiments, PCR and/or multiplex PCR comprises a pH of about 8.5 to about 9.2 (e.g., a pH of about 8.3, about 8.4, about 8.5, about 8.6, about 8.7, about 8.8, about 8.9, about 9, about 9.1, about 9.2, about 9.3, or about 9.4).
The terms “template” and “template nucleic acid” are used herein interchangeably to refer to a nucleic acid that is bound by a primer, e.g., for extension by a nucleic acid synthesis reaction (e.g., by PCR or multiplex PCR). In some embodiments, a nucleic acid synthesis reaction (e.g., PCR or multiplex PCR) uses less than about 2 μg of a template nucleic acid (e.g., template DNA), e.g., less than about 1.9 μg, less than about 1.8 μg, less than about 1.7 μg, less than about 1.6 μg, less than about 1.5 μg, less than about 1.4 μg, less than about 1.3 μg, less than about 1.2 μg, less than about 1.1 μg, or less than about 1.0 μg. In some embodiments, a nucleic acid synthesis reaction (e.g., PCR or multiplex PCR) uses less than about 1 μg of a template nucleic acid (e.g., template DNA), e.g., less than about 0.9 μg, less than about 0.8 μg, less than about 0.7 μg, less than about 0.6 μg, or less than about 0.5 μg.
In some embodiments, amplifying two or more ROIs comprises amplifying at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, e.g., at least 12, or at least 14 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least two, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or at least 9 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, or at least 19 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, or at least 29 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, or at least 39 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, or at least 49 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 50 ROIs, or more (e.g., at least 52, at least 55, at least 60, at least 70, at least 80, at least 90, or at least 100 ROIs, or more).
In some embodiments, each ROI is about 2, about 5, about 10, about 20, about 30, about 40, about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length. In some embodiments, each ROI is about 2, about 5, about 10, about 20, about 30, or about 40 nucleotides in length. In some embodiments, each ROI is about 50, about 60, about 70, about 80, or about 90 nucleotides in length. In some embodiments, each ROI is about 100, about 110, about 120, about 130, or about 140 nucleotides in length. In some embodiments, each ROI is about 150, about 160, about 170, about 180, or about 190 nucleotides in length. In some embodiments, each ROI is about 200, about 210, about 220, about 230, or about 240 nucleotides in length. In some embodiments, each ROI is about 250, about 300, about 350, about 400, or about 450 nucleotides in length. In some embodiments, each ROI is about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 850, about 900, or about 950 nucleotides in length. In some embodiments, each ROI is about 1,000, about 1,100, about 1,200, about 1,300, about 1,400, about 1,500, about 1,600, about 1,700, about 1,800, or about 1,900 nucleotides in length. In some embodiments, each ROI is about 2,000, about 2,200, about 2,400, about 2,600, about 2,800, about 3,000, about 3,200, about 3,400, about 3,600, about 3,800, about 4,000, about 4,200, about 4,400, about 4,600, or about 4,800 nucleotides in length. In some embodiments, each ROI is about 5,000, about 5,500, about 6,000, about 6,500, about 7,000, about 7,500, about 8,000, about 8,500, about 9,000, or about 9,500 nucleotides in length. In some embodiments, each ROI is about 10,000 nucleotides in length, or more (e.g., about 12,000, about 15,000, or about 20,000 nucleotides in length, or more).
The term “primer,” as used herein, refers to a polynucleotide capable of hybridizing with a sequence in a target nucleic acid (e.g., an ROI) and acting as a point of initiation of synthesis for a complementary strand of a nucleic acid under conditions suitable for such synthesis (e.g., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH). In some embodiments, a primer is single-stranded for maximum efficiency in amplification, but may alternatively be double-stranded. If double-stranded, in some embodiments, the primer is first treated to separate its strands before being used to prepare extension products. In some embodiments, the primer is DNA. In some embodiments, the primer is sufficiently long to prime the synthesis of extension products in the presence of an inducing agent (e.g., a DNA polymerase). The exact lengths of primers may depend on several factors, including temperature, source of primer, and the use of the method, as will be apparent to one of skill in the art. In some embodiments, a primer is about 18-22 nucleotides in length. In some embodiments, a primer is about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, or about 24 nucleotides in length. In some embodiments, a primer is less than about 18 nucleotides in length. In some embodiments, a primer is greater than about 22 nucleotides in length. In some embodiments, a primer comprises at least one sequence or sequence portion that does not hybridize to the nucleic acid of interest. For example, in some embodiments, a primer may comprise a tag sequence (e.g., any of the tag sequences described and/or exemplified herein). In some embodiments, a primer is a forward primer. In some embodiments, a primer is a reverse primer. In some embodiments, a primer comprises a set of primers (e.g., at least one forward primer and at least one reverse primer).
The term “forward primer,” as used herein, refers to a primer capable of annealing to a 5′ end of a template. In some embodiments, a forward primer can anneal to about 15-30, about 15-25, about 15-20, about 20-30, or about 20-25 nucleotides at a 5′ end of the template.
The term “reverse primer,” as used herein, refers to a primer capable of annealing to a 3′ end of a template (e.g., to a 5′ end of a reverse strand of the template). In some embodiments, a reverse primer can anneal to about 15-30, about 15-25, about 15-20, about 20-30, or about 20-25 nucleotides at a 3′ end of the template.
In some embodiments, the working concentration of one or more primers is about 1 nM to about 5,000 nM. In some embodiments, the working concentration of one or more primers is about 5 nM, about 10 nM, about 20 nM, about 30 nM, about 40 nM, about 50 nM, about 60 nM, about 70 nM, about 80 nM, about 90 nM, about 100 nM, about 150 nM, about 200 nM, about 250 nM, about 300 nM, about 350 nM, about 400 nM, about 450 nM, about 500 nM, about 550 nM, about 600 nM, about 650 nM, about 700 nM, about 750 nM, about 800 nM, about 850 nM, about 900 nM, about 950 nM, or about 1,000 nM. In some embodiments, the working concentration of one or more primers is about 1,000 nM, about 1,250 nM, about 1,500 nM, about 1,750 nM, about 2,000 nM, about 2,250 nM, about 2,500 nM, about 2,750 nM, about 3,000 nM, about 3,250 nM, about 3,500 nM, about 3,750 nM, about 4,000 nM, about 4,250 nM, about 4,500 nM, about 4,750 nM, or about 5,000 nM, or higher. In some embodiments, the working concentration of one or more primers is about 10 nM to about 100 nM. In some embodiments, the working concentration of one or more primers is about 10 nM to about 50 nM. In some embodiments, the working concentration of one or more primers is about 20 nM to about 40 nM. In some embodiments, the working concentration of one or more primers is about 30 nM.
In some embodiments, one or more primers are depleted prior to concatenating tagged amplicons. The term “depleted” or “depletion,” as used herein in the context of primer concentration, means reducing a primer concentration by at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, or at least about 99%, or 100%, relative to the starting concentration of the primer (i.e., 100% depletion is not necessarily achieved). In some embodiments, a primer concentration is reduced or depleted by at least about 80%, at least about 90%, at least about 95%, or at least about 99%. In some embodiments, a primer concentration is reduced or depleted by 100%.
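By way of non-limiting illustration, the percent depletion described above can be computed as the reduction in primer concentration relative to the starting concentration. The following Python sketch is provided for explanation only; the function names, the nanomolar units, and the 80% default threshold are assumptions chosen for the example and are not requirements of the disclosed methods.

```python
def percent_depletion(starting_nm: float, remaining_nm: float) -> float:
    """Percent reduction in primer concentration relative to the start.

    Both arguments are concentrations in nM (an assumed unit for this sketch).
    """
    if starting_nm <= 0:
        raise ValueError("starting concentration must be positive")
    if not 0 <= remaining_nm <= starting_nm:
        raise ValueError("remaining concentration must lie in [0, starting]")
    return 100.0 * (starting_nm - remaining_nm) / starting_nm


def is_depleted(starting_nm: float, remaining_nm: float,
                threshold_pct: float = 80.0) -> bool:
    """True if the reduction meets the chosen threshold (hypothetical default)."""
    return percent_depletion(starting_nm, remaining_nm) >= threshold_pct
```

For example, a primer present at 30 nM before depletion and at 3 nM afterward has been depleted by 90%, which satisfies an "at least about 80%" threshold but not an "at least about 95%" threshold.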
In some embodiments, one or more primers are selected to prevent formation of one or more primer dimers.
As used herein, the term “primer dimer” refers to a nucleic acid molecule comprising or consisting of at least two primers that have attached (i.e., hybridized) to each other due to strings of complementary bases in the primers. Primer dimers can be a potential by-product in amplification reactions such as PCR. In some embodiments, a DNA polymerase may amplify one or more primer dimers, which can result in competition for reagents and potentially inhibit amplification of the DNA sequence targeted for amplification. In some embodiments, a primer dimer may result in skipping of amplicons and/or generation of truncated amplification products. In some embodiments, such as in quantitative PCR, primer dimers may interfere with accurate quantification. In some embodiments, the methods and compositions described herein comprise selecting one or more primers that lack 5 or more (e.g., 5, 6, 7, 8, 9, 10, or more) exactly-matched bases (i.e., exactly-matched bases with one another or with any other primers) at the 3′ end of the primer sequences. In some embodiments, such selection may prevent two primers from forming a primer dimer (e.g., an exponential amplifiable primer dimer). In some embodiments, such selection may prevent two primers from forming a primer dimer (e.g., a linear amplifiable primer dimer). In some embodiments, such selection may prevent two primers from forming one or more non-specific off-target products. In some embodiments, one or more primers are selected to comprise minimal sequence that is complementary to a sequence in another primer used in generating a nucleic acid library. In some embodiments, the minimal sequence is about 6 to about 100 nucleotides in length, e.g., about 6 to about 50 or about 15 to about 30 nucleotides in length, e.g., about 18 to about 20 nucleotides in length. 
In some embodiments, the minimal sequence is about 6 to about 50 nucleotides in length, e.g., about 6 to about 30 or about 15 to about 30 nucleotides in length, e.g., about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is about 6 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 4 to about 40, about 5 to about 35, or about 6 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 10, about 15, about 20, about 25, about 30, or about 35 nucleotides in length. In some embodiments, the minimal sequence is about 15 to about 30 nucleotides in length. In some embodiments, the minimal sequence is about 18 to about 20 nucleotides in length. In some embodiments, the minimal sequence is at least about 4, about 5, about 6, about 7, about 8, about 9, or about 10 nucleotides in length. In some embodiments, the minimal sequence is at least about 6 nucleotides in length.
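By way of non-limiting illustration, a candidate primer set can be screened in software for 3′-end matches of the kind discussed above. The Python sketch below makes several assumptions that are not stated in the disclosure: "exactly-matched bases" is interpreted as a run of perfectly complementary bases formed when the 3′ termini of two primers anneal in antiparallel orientation, only terminal (flush 3′) alignments are examined, and the function and variable names are hypothetical.

```python
from itertools import combinations_with_replacement

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_overlap(a: str, b: str) -> int:
    """Longest k such that the last k bases of `a` are exactly
    reverse-complementary to the last k bases of `b`."""
    a, b = a.upper(), b.upper()
    best = 0
    for k in range(1, min(len(a), len(b)) + 1):
        if a[-k:] == reverse_complement(b[-k:]):
            best = k  # k increases, so the final value is the maximum
    return best


def find_dimer_risks(primers: dict, min_run: int = 5) -> list:
    """List (name_1, name_2, overlap) for every primer pair, including a
    primer paired with itself, whose 3' overlap is at least `min_run`."""
    risks = []
    for (n1, s1), (n2, s2) in combinations_with_replacement(primers.items(), 2):
        overlap = three_prime_overlap(s1, s2)
        if overlap >= min_run:
            risks.append((n1, n2, overlap))
    return risks
```

A primer set for which `find_dimer_risks` returns an empty list lacks 5 or more exactly-matched 3′ bases under the assumptions above. A more thorough screen could also consider internal alignments and mismatched duplexes.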
In some embodiments, one or more primers are selected to minimize formation of one or more dead-end intermediate products. In some embodiments, one or more primers comprise a 5′ tag sequence and a sequence capable of hybridizing to an ROI. In some embodiments, the methods and compositions described herein comprise selecting one or more primers that have at least one adenine between the 5′ tag sequence and the sequence capable of hybridizing to an ROI. In some embodiments, such selection may minimize or eliminate formation of one or more dead-end intermediate products.
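By way of non-limiting illustration, a primer having the layout described above (a 5′ tag sequence, at least one adenine, and then the sequence capable of hybridizing to the ROI) can be assembled and checked as in the following Python sketch. The function names and the example sequences are hypothetical, and a single adenine is assumed as the default spacer.

```python
def build_tagged_primer(tag: str, roi_binding: str, spacer: str = "A") -> str:
    """Join 5' tag, spacer, and ROI-hybridizing sequence (written 5' to 3')."""
    return tag.upper() + spacer.upper() + roi_binding.upper()


def has_adenine_spacer(primer: str, tag: str, roi_binding: str) -> bool:
    """True if at least one adenine lies between the 5' tag and the
    ROI-hybridizing sequence of `primer`."""
    primer, tag, roi_binding = primer.upper(), tag.upper(), roi_binding.upper()
    if len(primer) < len(tag) + len(roi_binding):
        return False
    if not (primer.startswith(tag) and primer.endswith(roi_binding)):
        return False
    spacer = primer[len(tag):len(primer) - len(roi_binding)]
    return "A" in spacer
```

For instance, `build_tagged_primer("GCTAGC", "TTGACC")` yields `GCTAGCATTGACC`, which passes `has_adenine_spacer`, whereas the direct fusion `GCTAGCTTGACC` does not.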
As used herein, the term “dead-end intermediate product” refers to a nucleic acid molecule produced in an amplification reaction (e.g., PCR) that cannot form one or more concatenated amplicons.
As used herein, the term “tag sequence” refers to a nucleic acid that is not capable of hybridizing with a sequence in a target nucleic acid (e.g., an ROI). In some embodiments, a tag sequence may be about 10-60 nucleotides in length. In some embodiments, a tag sequence is about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, or about 29 nucleotides in length. In some embodiments, a tag sequence is about 30, about 35, about 40, about 45, about 50, about 55, or about 60 nucleotides in length, or longer (e.g., about 65 or about 70 nucleotides in length, or longer). In some embodiments, a tag sequence of a primer or amplicon is complementary to a tag sequence of another primer or amplicon. In some embodiments, a tag sequence serves as a template for concatenation. For example, in some embodiments, a 5′ tag sequence of a reverse primer for an ROI is complementary to a 5′ tag sequence of a forward primer for another ROI. In some embodiments, following amplification, the tag sequences in the resulting amplicons may hybridize and allow concatenation of the tagged amplicons. In some embodiments, a tag sequence in one or more primers and/or in one or more amplicons is an artificial tag sequence. The term “artificial” refers to a sequence that is not homologous to any part of a genomic sequence (e.g., a human genome sequence).
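By way of non-limiting illustration, the way complementary tag sequences can define a predetermined concatenation order is sketched below in Python. The sketch assumes that "complementary" means that the 5′ tag of the reverse primer for one ROI is the reverse complement of the 5′ tag of the forward primer for the next ROI, and that the tags define a single linear chain. The `PrimerPair` type, the function names, and any tag sequences supplied to them are hypothetical.

```python
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


class PrimerPair(NamedTuple):
    roi: str       # name of the region of interest
    fwd_tag: str   # 5' tag of the forward primer
    rev_tag: str   # 5' tag of the reverse primer


def concatenation_order(pairs: list) -> list:
    """Return ROI names in the order implied by tag complementarity."""
    by_fwd_tag = {p.fwd_tag.upper(): p for p in pairs}
    successor = {}
    for p in pairs:
        nxt = by_fwd_tag.get(reverse_complement(p.rev_tag))
        if nxt is not None and nxt is not p:
            successor[p.roi] = nxt.roi
    has_predecessor = set(successor.values())
    starts = [p.roi for p in pairs if p.roi not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("tags do not define a single linear order")
    order = [starts[0]]
    # The length guard stops the walk if the tags accidentally form a loop.
    while order[-1] in successor and len(order) <= len(pairs):
        order.append(successor[order[-1]])
    if len(order) != len(pairs):
        raise ValueError("tags do not link every ROI into one chain")
    return order
```

In this sketch the order is recovered from the tags alone, so the primer pairs may be supplied in any order; the outermost forward and reverse tags are simply those with no complementary partner in the set.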
Two sequences are “not homologous” if two sequences have a low percentage of nucleotides that are the same (e.g., less than about 70% identity over a specified region, or, when not specified, over the entire sequence), e.g., when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using a sequence comparison algorithm or by manual alignment and visual inspection. Optionally, the identity exists over a region that is at least about 50 nucleotides (or 10 amino acids) in length, or over a region that is 100 to 500 or 1000 or more nucleotides (or 20, 50, 200 or more amino acids) in length. In some embodiments, the identity exists over a region that is at least about 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length. In some embodiments, the identity exists over a region that is at least about 20 nucleotides in length.
In some embodiments, a tag sequence in one or more primers and/or in one or more amplicons is an artificial tag sequence that is less than about 70% identical to any part of a genomic sequence (e.g., a human genomic sequence). In some embodiments, a tag sequence in one or more primers and/or in one or more amplicons is an artificial tag sequence that is less than about 60% identical to any part of a genomic sequence (e.g., a human genomic sequence). In some embodiments, a tag sequence in one or more primers and/or in one or more amplicons is an artificial tag sequence that is less than about 50% identical, or less, to any part of a genomic sequence (e.g., a human genomic sequence). In some embodiments, percent (%) identity between an artificial tag sequence and a genomic sequence (e.g., a human genomic sequence) is measured over the entire length of the artificial tag sequence.
The percent “identity” between two sequences is a function of the number of identical positions shared by the sequences (i.e., percent identity equals number of identical positions/total number of positions×100), taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters. Additionally, or alternatively, the sequences of the present disclosure can further be used as a “query sequence” to perform a search against public databases to, for example, identify related sequences. For example, such searches can be performed using the BLAST program of Altschul et al. (J Mol Biol 1990; 215(3):403-10).
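By way of non-limiting illustration only, the percent identity calculation described above may be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned (with “-” marking each gap position) and that the total number of positions is the number of alignment columns, gaps included; the function name `percent_identity` is hypothetical and is not part of any alignment program referenced herein.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: gap characters ('-') count toward the total number of
    positions but never count as identical, so each gap lowers the score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    # percent identity = identical positions / total positions x 100
    return 100.0 * identical / len(aligned_a)


# One mismatch in ten aligned positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Other conventions for the denominator (e.g., the length of the shorter sequence) are also in use, and an alignment program may report a different value for the same pair of sequences.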
In some embodiments, an artificial tag sequence is about 20 nucleotides in length, or longer (e.g., about 25 or about 30 nucleotides in length, or longer). In some embodiments, an artificial tag sequence is about 20 nucleotides in length, or longer (e.g., about 25 or about 30 nucleotides in length, or longer), and percent (%) identity between the artificial tag sequence and a genomic sequence (e.g., a human genomic sequence) is measured over the entire length of the tag. In some embodiments, an artificial tag sequence is a 5′ tag sequence, e.g., a tag sequence at the 5′ end of a primer or amplicon. In some embodiments, an artificial tag sequence is a 5′ tag sequence that can be used in an amplification reaction without interference from a sequence in a target nucleic acid (e.g., a human genomic sequence).
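By way of non-limiting illustration only, a candidate tag sequence may be screened against a reference sequence by measuring identity over the entire length of the tag at every ungapped position on both strands, as sketched below in Python. The function names, the ungapped sliding-window comparison, and the default 70% threshold parameter are assumptions made for this example; a genome-scale screen would ordinarily rely on a dedicated search tool (e.g., BLAST) rather than this exhaustive scan.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def max_ungapped_identity(tag: str, reference: str) -> float:
    """Highest percent identity between the full-length tag and any
    same-length window of the reference, on either strand, without gaps."""
    tag, reference = tag.upper(), reference.upper()
    if not tag or len(reference) < len(tag):
        raise ValueError("reference must be at least as long as a non-empty tag")
    best = 0.0
    for strand in (reference, reverse_complement(reference)):
        for start in range(len(strand) - len(tag) + 1):
            window = strand[start:start + len(tag)]
            matches = sum(a == b for a, b in zip(tag, window))
            best = max(best, 100.0 * matches / len(tag))
    return best


def is_artificial(tag: str, reference: str, threshold: float = 70.0) -> bool:
    """True if the tag is less than `threshold` percent identical to every
    window of the reference (threshold is an illustrative parameter)."""
    return max_ungapped_identity(tag, reference) < threshold
```

Because the comparison is ungapped, this sketch can understate the identity that a gapped alignment would report.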
In some embodiments, tagged, sequence-specific primers are designed such that the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for another ROI. In some embodiments, tagged, sequence-specific primers are designed such that the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for the ROI immediately downstream. For instance, in some embodiments, tagged, sequence-specific primers are designed as shown in
In some embodiments, one or more primers comprise at least one adenine between the 5′ tag sequence and the sequence capable of hybridizing to the ROI. In some embodiments, one or more primers comprise a 5′ phosphate. In some embodiments, use of phosphorylated primers may improve specificity of amplicon ligation and concatenation (e.g., following PCR (e.g., following multiplex PCR)).
In some embodiments, one or more primers comprise a molecular barcode. The term “barcode” refers to a nucleic acid sequence that can be detected and identified, e.g., to track, categorize, or index amplified samples. Barcodes can be incorporated into various nucleic acids. Barcodes can also be sufficiently long (e.g., at least 6, 10, or 20 nucleotides in length) such that nucleic acids incorporating the barcodes can be distinguished or grouped according to the barcodes. In some embodiments, a barcode is at least 6 nucleotides in length (e.g., about 6, about 7, about 8, or about 9 nucleotides in length, or longer). In some embodiments, a barcode is at least 10 nucleotides in length (e.g., about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, or about 19 nucleotides in length, or longer). In some embodiments, a barcode is at least 20 nucleotides in length, or longer. Exemplary barcodes and uses thereof are described in U.S. Pat. No. 8,318,434, which is incorporated herein by reference.
In some embodiments, barcodes may be used to quantify the original copy input of each ROI. In some embodiments, the copy input information allows detection of copy number variation. A tag sequence may comprise a barcode. In some embodiments, one or more primers comprise a barcode within a tag sequence (e.g., a 5′ tag sequence). In some embodiments, a barcode included within a tag sequence (e.g., a 5′ tag sequence) can label each individual target molecule (e.g., each tagged amplicon) with a unique barcode sequence. For instance, in some embodiments, an amplification reaction using 10 ng input of human genomic DNA may yield approximately 3000 unique copies of a particular gene, with each copy labeled with a unique barcode. By counting the number of unique barcodes in the final sequencing reads, in some embodiments, the copy number of input molecules can be determined. For example, in some embodiments, a two-copy gene having twice the number of starting copies for amplification may have twice the number of unique barcode counts, as compared to a one-copy gene. In some embodiments, the number of unique barcode sequences incorporated into a concatemer can be counted and compared to reference counts for a known copy-number gene. In some embodiments, the copy number of the target gene can be calculated based on the molecular barcode counting ratio relative to the reference gene.
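By way of non-limiting illustration only, the barcode-counting arithmetic described above may be sketched in Python as follows. The haploid genome mass of about 3.3 pg is an assumed approximate value used here to show why 10 ng of human genomic DNA corresponds to roughly 3000 copies; the function names and the `reference_copy_number` parameter are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # assumption: approximate mass of one haploid human genome, in picograms


def haploid_genome_copies(input_ng: float) -> float:
    """Approximate number of haploid genome copies in a given mass of human genomic DNA."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def copy_number_from_barcodes(target_barcodes, reference_barcodes, reference_copy_number):
    """Estimate target copy number from the ratio of unique barcode counts.

    Each distinct barcode is taken to mark one input molecule, so repeated
    reads carrying the same barcode are counted once.
    """
    unique_target = len(set(target_barcodes))
    unique_reference = len(set(reference_barcodes))
    if unique_reference == 0:
        raise ValueError("reference gene has no barcode counts")
    return reference_copy_number * unique_target / unique_reference


# 10 ng of input corresponds to roughly 3000 haploid genome copies.
print(round(haploid_genome_copies(10)))
```

In this sketch, a target gene showing twice as many unique barcodes as a one-copy reference gene is estimated at two copies. Sequencing errors within barcodes, which would inflate the unique counts, are not modeled.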
In some embodiments, each tagged amplicon is labeled with a unique barcode sequence, and the barcodes are used to determine the copy number of each amplicon target in the starting input. In some embodiments, following amplification, concatenation, and sequencing, each amplicon having the same stoichiometry ratio (e.g., a stoichiometry ratio of about 1:1, i.e., one amplicon to one concatemer) can result in the same total reads for each amplicon. In some embodiments, if each tagged amplicon is labeled with a unique barcode sequence, barcode counting can also simultaneously allow for quantification of the actual copy number of each target amplicon in the starting input. In some embodiments, a purification step is used to remove any unincorporated barcode primers from the reaction mixture following amplification. In some embodiments, if excess barcode primers are not removed (e.g., via purification), a resampling of PCR products may occur (e.g., during a subsequent amplification reaction (e.g., during a subsequent PCR)) and result in falsely high numbers of unique copies of a target amplicon, e.g., as determined by sequencing analysis. Exemplary methods for copy number detection using barcodes are described in Ogawa et al., (2017) Scientific Reports 7(1):13576, which is incorporated herein by reference for such methods.
In some embodiments, an external spiking control may be used to quantify the original copy input of each ROI. In some embodiments, detecting or quantifying gene copy number comprises using and/or comparing to an external spiking control. In some embodiments, the external spiking control is added during amplification of two or more ROIs, e.g., in step (i) of a multiplex PCR. In some embodiments, the external spiking control comprises a spiking synthetic gBlock control. In some embodiments, the external spiking control (e.g., a spiking synthetic gBlock control) comprises gene fragments of a reference gene with a known copy number and a target gene with an unknown copy number. In some embodiments, each synthetic gene fragment contains at least one stamp code, e.g., a different base compared to the natural genomic sequence, which allows for differentiation between the natural genomic sequences and the artificial synthetic gBlocks. In some embodiments, two or more gene fragments are constructed in one synthetic gBlock to maintain a 1:1 stoichiometry ratio. In some embodiments, two or more gene fragments in a synthetic gBlock may have the opposite 5′-3′ orientation as the orientation in the final concatenation products. In some embodiments, a unique restriction site is used to cut the synthetic gBlock while maintaining an equal (1:1) molar ratio of the two or more gene fragments in the digested gBlock control. Exemplary methods for copy number detection using an external spiking control (e.g., a spiking synthetic gBlock control) are described and exemplified herein (e.g., in Example 7 and
The terms “concatenate,” “concatenating,” and “concatenation,” as used herein, refer to the linkage (e.g., covalent linkage) of two or more nucleic acids (e.g., amplicons, e.g., tagged amplicons). The terms “concatemer” and “concatenated amplicon” refer to a continuous nucleic acid molecule generated by linking (e.g., covalently linking) shorter nucleic acid molecules such as amplicons (e.g., tagged amplicons).
In some embodiments, tagged amplicons are not purified prior to concatenation. In some embodiments, tagged amplicons are joined to form one or more concatenated amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 tagged amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least two, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or at least 9 tagged amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, or at least 19 tagged amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, or at least 29 tagged amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, or at least 39 tagged amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, or at least 49 tagged amplicons. In some embodiments, concatenating the tagged amplicons comprises concatenating at least 50 tagged amplicons, or more (e.g., at least 52, at least 55, at least 60, at least 70, at least 80, at least 90, or at least 100 tagged amplicons, or more).
In some embodiments, each tagged amplicon is about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length. In some embodiments, each tagged amplicon is about 50, about 60, about 70, about 80, or about 90 nucleotides in length. In some embodiments, each tagged amplicon is about 100, about 110, about 120, about 130, or about 140 nucleotides in length. In some embodiments, each tagged amplicon is about 150, about 160, about 170, about 180, or about 190 nucleotides in length. In some embodiments, each tagged amplicon is about 200, about 210, about 220, about 230, or about 240 nucleotides in length. In some embodiments, each tagged amplicon is about 250, about 300, about 350, about 400, or about 450 nucleotides in length. In some embodiments, each tagged amplicon is about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 850, about 900, or about 950 nucleotides in length. In some embodiments, each tagged amplicon is about 1,000, about 1,100, about 1,200, about 1,300, about 1,400, about 1,500, about 1,600, about 1,700, about 1,800, or about 1,900 nucleotides in length. In some embodiments, each tagged amplicon is about 2,000, about 2,200, about 2,400, about 2,600, about 2,800, about 3,000, about 3,200, about 3,400, about 3,600, about 3,800, about 4,000, about 4,200, about 4,400, about 4,600, or about 4,800 nucleotides in length. In some embodiments, each tagged amplicon is about 5,000, about 5,500, about 6,000, about 6,500, about 7,000, about 7,500, about 8,000, about 8,500, about 9,000, or about 9,500 nucleotides in length. In some embodiments, each tagged amplicon is about 10,000 nucleotides in length, or more (e.g., about 12,000, about 15,000, or about 20,000 nucleotides in length, or more).
In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 50,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 20,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 10,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 5,000 nucleotides. In some embodiments, the total length of the one or more concatenated amplicons is about 3,000 to about 4,000 nucleotides. In some embodiments, concatenating tagged amplicons to generate one or more concatenated amplicons allows each amplicon to have a desired orientation. In some embodiments, concatenating involves hybridization of the complementary ends (i.e., tags) of the tagged amplicons.
The terms “hybridize,” “hybridizing,” and “hybridization,” as used herein, refer to the formation of a complex between nucleotide sequences that are sufficiently complementary to form a complex via Watson-Crick base pairing. For example, in some embodiments, where a primer “hybridizes” with a target (template) nucleic acid, the complex (hybrid) is sufficiently stable to serve the priming function required by, e.g., the DNA polymerase to initiate DNA synthesis. In some embodiments, where the complementary end (i.e., tag) of a tagged amplicon “hybridizes” with the complementary end (i.e., tag) of another tagged amplicon, the complex is sufficiently stable to form a concatemer of the tagged amplicons. In some embodiments, wherein a primer comprises a sequence capable of hybridizing to an ROI, the sequence in the primer and the ROI may be, but are not necessarily, completely complementary. In some embodiments, the sequence in the primer and the ROI have a perfectly matched stretch of bases that is capable of forming a complex via Watson-Crick base pairing (i.e., is 100% complementary). In some embodiments, the sequence in the primer and the ROI do not have a perfectly matched stretch of bases, but are sufficiently complementary to form a complex via Watson-Crick base pairing (e.g., the sequence in the primer and the ROI are at least about 80%, 85%, 90%, 95%, or 99% complementary).
The term “complementary,” as used herein in connection with a nucleic acid sequence, refers to the pairing of bases, A with T or U, and G with C. The term can refer to nucleic acid molecules that are completely complementary (i.e., capable of forming A to T or U pairs and G to C pairs across the entire reference sequence), as well as molecules that are substantially complementary (e.g., at least about 80%, 85%, 90%, 95%, or 99% complementary).
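By way of non-limiting illustration only, the degree of complementarity between two nucleic acid sequences of equal length may be computed as sketched below in Python. The sketch assumes antiparallel pairing without gaps or bulges, treats U as equivalent to T, and uses hypothetical function names.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement, with any U first converted to T."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def percent_complementary(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which seq_a pairs (A with T/U, G with C)
    with seq_b when the two strands are aligned antiparallel, end to end."""
    a = seq_a.upper().replace("U", "T")
    if not a or len(a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # seq_a pairs perfectly with seq_b exactly when it equals seq_b's reverse complement.
    partner = reverse_complement(seq_b)
    paired = sum(x == y for x, y in zip(a, partner))
    return 100.0 * paired / len(a)
```

Under this sketch, a completely complementary pair scores 100, and a pair with one mispaired position out of ten scores 90, which would still meet an “at least about 80%” threshold for substantial complementarity.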
In some embodiments, one or more concatenated amplicons are in a predetermined order. In some embodiments, the predetermined order results from the tag sequences in the primers. In some embodiments, the 5′ tag sequence of the reverse primer for each ROI is complementary to only the 5′ tag sequence of the forward primer for the ROI immediately downstream. In some embodiments, the order of the one or more concatenated amplicons is identical to the order of the corresponding ROIs in the target nucleic acid. In some embodiments, the order of the one or more concatenated amplicons is not identical to the order of the corresponding ROIs in the target nucleic acid and is driven instead by the predetermined pairing of the 5′ tag sequence of the reverse primer of each ROI with the 5′ tag sequence of the forward primer of another ROI. In some embodiments, the one or more concatenated amplicons comprise single-copy representation (e.g., a defined unitary copy number) of each tagged amplicon. As used herein, the term “single-copy representation” means that a concatenated amplicon contains a single copy of each tagged amplicon used to assemble the concatenated amplicon. In some embodiments, the ratio of the one or more concatenated amplicons to the corresponding ROIs in the target nucleic acid is about 1 to 1. Other ratios (i.e., any ratios other than about 1 to 1) are also contemplated and may result from the exemplary methods and compositions disclosed herein.
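By way of non-limiting illustration only, the manner in which tag pairing fixes a predetermined order may be sketched in Python as follows. Each amplicon is represented by a hypothetical pair of 5′ tag sequences (forward primer tag, reverse primer tag), and the reverse primer tag of one amplicon is assumed to be the exact reverse complement of the forward primer tag of the amplicon that follows it; the function and variable names are assumptions made for this example.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def concatenation_order(amplicons: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return amplicon names in the order implied by their tag pairing.

    `amplicons` maps a name to (forward_tag, reverse_tag). The reverse tag of
    each amplicon is matched to the forward tag it is complementary to.
    """
    by_forward_tag = {fwd.upper(): name for name, (fwd, _) in amplicons.items()}
    next_of = {}
    for name, (_, rev) in amplicons.items():
        partner = by_forward_tag.get(reverse_complement(rev))
        if partner is not None and partner != name:
            next_of[name] = partner

    # The first amplicon is the only one that no reverse tag points to.
    has_upstream = set(next_of.values())
    starts = [name for name in amplicons if name not in has_upstream]
    if len(starts) != 1:
        raise ValueError("tags do not define a single linear chain")

    order = [starts[0]]
    while order[-1] in next_of:
        following = next_of[order[-1]]
        if following in order:
            raise ValueError("tags define a cycle")
        order.append(following)
    if len(order) != len(amplicons):
        raise ValueError("some amplicons are not linked into the chain")
    return order
```

In this sketch the order depends only on which tags are paired, so the same routine describes an order that matches the genomic order of the ROIs as well as one that does not.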
In some embodiments, concatenating tagged amplicons comprises providing a DNA polymerase. In some embodiments, the DNA polymerase fills in the gaps in the structures formed by hybridization of the complementary ends (i.e., tags) of the tagged amplicons. In some embodiments, the DNA polymerase is a wild-type polymerase. In some embodiments, the DNA polymerase is a modified polymerase. In some embodiments, the DNA polymerase is a thermophilic, chimeric, and/or engineered polymerase. In some embodiments, the DNA polymerase can comprise a mixture of more than one polymerase. In some embodiments, the DNA polymerase has 3′ to 5′ exonuclease activity. In some embodiments, the DNA polymerase is a high-fidelity DNA polymerase. In some embodiments, the DNA polymerase is a Q5, Pfu, or Kapa HiFi HotStart DNA polymerase.
In some embodiments, the DNA polymerase is a Q5 DNA polymerase, e.g., M0494S, M0491S (New England Biolabs Inc.) (see, e.g., U.S. Pat. Nos. 6,627,424, 7,541,170, 7,670,808, and 7,666,645, each of which is incorporated herein by reference for the description of such polymerases and uses thereof).
In some embodiments, the DNA polymerase is a Pfu DNA polymerase, e.g., M7741/M7745 (Promega) (see, e.g., Mesalam et al., (2018) Virology 514:30-41; Pasello et al., (2018) Methods in Molecular Biology 1827; Harvey et al., (2018) Journal of Chemical Ecology 44(10):894-904; Dubos et al., (2018) General and Comparative Endocrinology 266:110-118; and Tanabe et al., (2018) Revista do Instituto de Medicina Tropical de São Paulo 60, each of which is incorporated herein by reference for the description of such polymerases and uses thereof).
In some embodiments, the DNA polymerase is a Kapa HiFi HotStart DNA polymerase, e.g., KK2601/KK2602 (Roche) (see, e.g., U.S. Pat. No. 8,481,685, which is incorporated herein by reference for the description of such polymerases and uses thereof).
In some embodiments, concatenating tagged amplicons comprises providing at least one adjuvant. The term “adjuvant,” as used herein, refers to a reagent capable of improving efficiency (i.e., higher amount of product) and/or specificity (i.e., lower amount of non-specific product) of an amplification reaction (e.g., PCR, e.g., multiplex PCR). In some embodiments, the at least one adjuvant comprises TMAC, ThermaGo, and/or ThermaStop. In some embodiments, the at least one adjuvant comprises tetramethylammonium chloride (TMAC). In some embodiments, the at least one adjuvant comprises ThermaGo (ThermaGo™ (Thermagenix)). In some embodiments, the at least one adjuvant comprises ThermaStop (ThermaStop™ (Thermagenix)). See, e.g., U.S. Pat. Nos. 7,517,977, 9,034,605, and 9,758,813; see also U.S. Publication No. 2018/0002739, each of which is incorporated herein by reference for the description of such adjuvants.
In some embodiments, amplifying the one or more concatenated amplicons comprises PCR. In some embodiments, amplifying the one or more concatenated amplicons comprises long-range PCR (i.e., PCR capable of amplifying templates at least about 10,000 nucleotides in length, or longer). Exemplary protocols, including reagents and reaction conditions, for long-range PCR are described in, e.g., Cheng et al., (1994) PNAS 91:5695-9; Barnes (1994) PNAS 91(6):2216-20; and Jia et al., (2014) Scientific Reports 4:5737, each of which is incorporated herein by reference for the disclosure of such protocols.
In some embodiments, amplifying the one or more concatenated amplicons comprises at least one first end primer and at least one second end primer.
As used herein, the term “end primer” refers to a primer capable of hybridizing with a tag sequence at an end (i.e., a 5′ or 3′ end) of a concatenated amplicon. In some embodiments, an end primer acts as a point of initiation of synthesis along a complementary strand of the concatenated amplicon. In some embodiments, the end primer is used to amplify the concatenated amplicon. In some embodiments, an end primer comprises a first end primer and a second end primer. In some embodiments, the first end primer is capable of hybridizing to a tag sequence at the 5′ end of a concatenated amplicon. In some embodiments, the 5′ end of the concatenated amplicon is identical to or overlaps with the 5′ tag sequence of a forward primer used to amplify an ROI. In some embodiments, the second end primer is capable of hybridizing to a tag sequence at the 3′ end of a concatenated amplicon. In some embodiments, the tag sequence at the 3′ end of the concatenated amplicon is identical to or overlaps with the 5′ tag sequence of a reverse primer used to amplify an ROI. Exemplary end primers are described and exemplified herein. Exemplary end primers, and their use in an exemplary method disclosed herein, are also shown in
In some embodiments, a first end primer and a second end primer are added during generation of tagged amplicons, concatenation of tagged amplicons, or amplification of one or more concatenated amplicons (i.e., in any one of steps (i)-(iii), respectively). In some embodiments, a first end primer and a second end primer are added in step (ii) or step (iii). In some embodiments, a method disclosed herein comprises 2-step PCR.
As used herein, the term “2-step PCR” refers to a method comprising a first PCR and a second PCR. In some embodiments, the first PCR and the second PCR are carried out without an intervening purification step (i.e., a purification step between the first and second PCR). In some embodiments, the first PCR comprises multiplex PCR. In some embodiments, the first PCR comprises the protocol: 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, and 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, 72° C./2 min. In some embodiments, the second PCR comprises amplification of the products from the first PCR (e.g., about 1 μl of PCR products) with end primers. In some embodiments, the end primers are added before or during the second PCR. In some embodiments, 2-step PCR may be performed in less than about 5 hours, less than about 4.5 hours, less than about 4 hours, less than about 3.5 hours, or less than about 3 hours. In some embodiments, 2-step PCR may be performed in less than about 4 hours. In some embodiments, the total active (“hands-on”) time of 2-step PCR may be less than about 1 hour, less than about 50 min, less than about 40 min, less than about 30 min, or less than about 20 min. In some embodiments, the total active time of 2-step PCR may be less than about 30 min.
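By way of non-limiting illustration only, the programmed hold time of the exemplary first PCR protocol above may be tallied as sketched below in Python. The grouping of the protocol into an initial hold and three cycling stages reflects one reading of the protocol as written; instrument ramp time between temperatures and the second PCR are not included, and the function and variable names are hypothetical.

```python
def protocol_minutes(initial_hold_s, stages):
    """Total programmed hold time in minutes.

    `stages` is a list of (number_of_cycles, [hold times in seconds per cycle]).
    """
    total_s = initial_hold_s + sum(cycles * sum(holds) for cycles, holds in stages)
    return total_s / 60


# Exemplary first PCR: 94 C/5 min, then three cycling stages.
FIRST_PCR_STAGES = [
    (2, [15, 4 * 60]),        # 2 cycles: 94 C/15 sec, 60 C/4 min
    (23, [15, 2 * 60]),       # 23 cycles: 94 C/15 sec, 72 C/2 min
    (20, [15, 60, 2 * 60]),   # 20 cycles: 94 C/15 sec, 55 C/1 min, 72 C/2 min
]

print(protocol_minutes(5 * 60, FIRST_PCR_STAGES))  # 130.25
```

On this reading, the first PCR accounts for about 130 minutes of programmed hold time, to which ramp time and the second PCR would be added when estimating the total duration of 2-step PCR.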
In some embodiments, a first end primer and a second end primer are added in step (i). In some embodiments, a method disclosed herein comprises 1-step PCR.
As used herein, the term “1-step PCR” refers to a method comprising a single PCR. In some embodiments, the single PCR comprises PCR and amplification of the products from the PCR (e.g., about 1 μl of PCR products) with end primers. In some embodiments, the PCR comprises multiplex PCR.
In some embodiments, a target nucleic acid is obtained from a biological sample (e.g., a biological sample from a human subject diagnosed with and/or suspected of being at risk for a disease (e.g., a cancer or a hereditary disorder)). In some embodiments, a target nucleic acid is used in a multiple gene panel, e.g., to detect mutations and/or structural variation in one or more target genes. In some embodiments, the multiple gene panel is a newborn or carrier screening panel. In some embodiments, the multiple gene panel comprises at least about 20 human genes (e.g., at least about 22 human genes). In some embodiments, the multiple gene panel comprises at least about 22 human genes.
In some embodiments, a library of concatenated amplicons is made from the target nucleic acid, e.g., using any of the exemplary methods disclosed herein. For example, in some embodiments, a library of concatenated amplicons is made by generating tagged amplicons from the target nucleic acid (e.g., by amplifying two or more regions of interest (ROIs)); concatenating the tagged amplicons to generate one or more concatenated amplicons; and amplifying the one or more concatenated amplicons to generate the library.
In some embodiments, two or more ROIs (e.g., ROIs in exon regions) are amplified (e.g., by PCR, e.g., by multiplex PCR) with gene-specific primers each having a tag sequence attached to the 5′ end of the primer. In some embodiments, two or more ROIs are amplified by multiplex PCR (e.g., MOE-PCR). In some embodiments, each ROI is amplified with a forward primer and a reverse primer. In some embodiments, each primer comprises a 5′ tag sequence and a sequence capable of hybridizing to an ROI. In some embodiments, the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for another ROI. In
In some embodiments, the library of concatenated amplicons made from the target nucleic acid is analyzed. In some embodiments, the library is analyzed using sequencing (e.g., single-molecule sequencing), gene assembly, and/or structural variation characterization. In some embodiments, the library is sequenced, e.g., using single-molecule sequencing or any long-read sequencing platform.
In some embodiments, the present disclosure provides a method of sequencing a target nucleic acid, the method comprising:
In some embodiments, the target nucleic acid is isolated from a biological sample. In some embodiments, the biological sample is obtained from a subject (e.g., a human subject). In some embodiments, the biological sample comprises a blood sample, a buccal sample, or a biopsy sample (e.g., a liquid biopsy sample). In some embodiments, a biopsy sample comprises frozen tissue or formalin-fixed paraffin-embedded (FFPE) tissue. In some embodiments, a biopsy sample (e.g., a liquid biopsy sample) comprises cell-free DNA or DNA from circulating tumor cells.
In some embodiments, tagged amplicons are generated by amplifying two or more ROIs using PCR (e.g., multiplex PCR). In some embodiments, tagged amplicons are generated by amplifying two or more ROIs using multiplex PCR. In some embodiments, the PCR and/or multiplex PCR comprises magnesium in a working concentration of about 1.5 mM to about 3 mM. In some embodiments, the PCR and/or multiplex PCR comprises DMSO in a working concentration of about 3% to about 6% by volume (v/v). In some embodiments, the PCR and/or multiplex PCR comprises a pH of about 8.5 to about 9.2. In some embodiments, amplifying two or more ROIs comprises amplifying at least two, at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 ROIs. In some embodiments, amplifying two or more ROIs comprises amplifying at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, e.g., at least 12, or at least 14 ROIs. In some embodiments, each ROI is about 2, about 5, about 10, about 20, about 30, about 40, about 50, about 100, about 150, about 200, about 250, about 500, about 1,000, about 2,000, about 5,000, or about 10,000 nucleotides in length.
In some embodiments, tagged amplicons are generated by amplifying two or more ROIs using a set of tagged, sequence-specific primers in a PCR reaction (e.g., a multiplex PCR reaction, e.g., a multiplex PCR reaction in a single tube). In some embodiments, a 5′ tag sequence is an artificial tag sequence. In some embodiments, a 5′ tag sequence is an artificial tag sequence that is not homologous (e.g., is less than 70% identical) to a human genome sequence. In some embodiments, the tagged, sequence-specific primers are designed such that the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for another ROI. In some embodiments, the tagged, sequence-specific primers are designed such that the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for the ROI immediately downstream. In some embodiments, the tagged, sequence-specific primers are designed such that the 5′ tag sequence of the reverse primer for each ROI is complementary to the 5′ tag sequence of the forward primer for an ROI that is not immediately downstream. In some embodiments, the tagged, sequence-specific primers are designed as shown in
Following amplification, in some embodiments, the amplicons comprise complementary tag sequences, which allow the tagged amplicons to be assembled into a single concatenated product. In some embodiments, the total length of the one or more concatenated amplicons is about 2,000 to about 50,000 nucleotides (e.g., about 3,000, about 4,000, about 5,000, or about 10,000 nucleotides, or longer). In some embodiments, concatenating the tagged amplicons comprises providing a DNA polymerase. In some embodiments, the DNA polymerase has 3′ to 5′ exonuclease activity. In some embodiments, the DNA polymerase is a high-fidelity DNA polymerase. In some embodiments, the DNA polymerase is a high-fidelity DNA polymerase (e.g., a Q5, Pfu, or Kapa HiFi HotStart DNA polymerase) and the PCR and/or multiplex PCR conditions comprise magnesium, e.g., in a working concentration of about 1.5 mM to about 3 mM. In some embodiments, the DNA polymerase is a high-fidelity DNA polymerase (e.g., a Q5, Pfu, or Kapa HiFi HotStart DNA polymerase) and the PCR and/or multiplex PCR conditions comprise DMSO, e.g., in a working concentration of about 3% to about 6% by volume (v/v). In some embodiments, the DNA polymerase is a high-fidelity DNA polymerase (e.g., a Q5, Pfu, or Kapa HiFi HotStart DNA polymerase) and the PCR and/or multiplex PCR conditions comprise a pH of about 8.5 to about 9.2. In some embodiments, the DNA polymerase is a Q5, Pfu, or Kapa HiFi HotStart DNA polymerase. In some embodiments, concatenating the tagged amplicons comprises providing at least one adjuvant. In some embodiments, the at least one adjuvant comprises TMAC, ThermaGo, and/or ThermaStop.
In some embodiments, the working concentration of one or more primers in step (i) is about 30 nM. In some embodiments, one or more primers in step (i) are depleted prior to concatenating the tagged amplicons. In some embodiments, one or more primers are depleted via purification.
In some embodiments, one or more primers in step (i) are selected to prevent formation of one or more primer dimers. In some embodiments, selection comprises designing one or more primers in step (i) to comprise minimal sequence that is capable of hybridizing to an ROI and also complementary to a sequence in another primer. Exemplary primers comprising minimal sequence that is capable of hybridizing to an ROI and also complementary to a sequence in another primer are described and exemplified herein (e.g., in Example 2 and Table 4; see also
In some embodiments, one or more primers in step (i) are selected to minimize formation of one or more dead-end intermediate products, e.g., products that cannot form one or more concatenated amplicons. In some embodiments, selection comprises designing one or more primers in step (i) to comprise at least one adenine between the 5′ tag sequence and the sequence capable of hybridizing to the ROI.
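The two primer-selection criteria above (avoiding primer dimers and placing an adenine between the 5′ tag and the ROI-hybridizing sequence) can be illustrated with a minimal sketch. The function names and the example sequences are hypothetical and are not taken from the disclosure. The dimer check looks only at perfect, end-to-end pairing of the two 3′ termini, which is a simplifying assumption rather than a full thermodynamic model.

```python
# Hypothetical helper names; sequences below are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_overlap(primer_a: str, primer_b: str) -> int:
    """Longest k for which the last k bases of primer_a pair perfectly,
    in antiparallel orientation, with the last k bases of primer_b."""
    a, b = primer_a.upper(), primer_b.upper()
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == reverse_complement(b[-k:]):
            return k
    return 0


def build_tagged_primer(tag: str, gene_specific: str, spacer: str = "A") -> str:
    """Assemble 5'-tag + spacer + gene-specific sequence-3'.
    The single-adenine spacer follows the embodiment described in the text."""
    return tag.upper() + spacer + gene_specific.upper()


if __name__ == "__main__":
    # Two made-up primers whose 3' ends pair over 5 bases.
    print(three_prime_overlap("AAAAGGATC", "TTTTTGATCC"))
    print(build_tagged_primer("ACGT", "GGCC"))
```

A primer pair that returns a large overlap from `three_prime_overlap` would be a candidate for redesign; the threshold to use is a design choice that the text does not fix.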
In some embodiments, one or more primers in step (i) do not comprise a molecular barcode. In other embodiments, one or more primers in step (i) comprise a molecular barcode. In some embodiments, one or more primers comprise a barcode within the 5′ tag sequence. In some embodiments, a barcode included within the 5′ tag sequence labels each tagged amplicon with a unique barcode sequence. In some embodiments, one or more primers comprising a barcode are depleted after amplification, e.g., via purification, to remove any unincorporated molecular barcode primers from the reaction mixture (e.g., after PCR and/or multiplex PCR). In some embodiments, following sequencing in step (v), the number of unique barcodes in the final sequencing reads is counted and the copy number of input molecules is determined. In some embodiments, following amplification, concatenation, and sequencing, the number of unique barcode sequences incorporated into a concatemer is counted and compared to reference counts for a known copy-number gene. In some embodiments, the copy number of the target gene is calculated based on the molecular barcode counting ratio relative to the reference gene.
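The barcode-counting approach to copy number can be sketched as follows. This is an illustrative calculation only. The input format (one `(gene, barcode)` pair per read), the function names, and the default reference copy number of 2 are assumptions made for the example and are not specified in the disclosure.

```python
from collections import defaultdict


def unique_barcode_counts(reads):
    """Count distinct barcodes per gene.

    `reads` is an iterable of (gene, barcode) pairs, one per sequencing read
    (hypothetical input format)."""
    seen = defaultdict(set)
    for gene, barcode in reads:
        seen[gene].add(barcode)
    return {gene: len(barcodes) for gene, barcodes in seen.items()}


def copy_number_from_barcodes(reads, target, reference, reference_copies=2.0):
    """Scale the target/reference unique-barcode ratio by the known copy
    number of the reference gene (assumed to be 2 by default)."""
    counts = unique_barcode_counts(reads)
    if counts.get(reference, 0) == 0:
        raise ValueError("no barcodes observed for the reference gene")
    return reference_copies * counts.get(target, 0) / counts[reference]


if __name__ == "__main__":
    # Made-up reads: duplicates of the same barcode are counted once.
    example_reads = [
        ("SMN1", "AAA"), ("SMN1", "AAA"), ("SMN1", "CCC"),
        ("SMN1", "TTT"), ("SMN1", "GGA"),
        ("REF", "GGG"), ("REF", "TTT"), ("REF", "GGG"),
    ]
    print(copy_number_from_barcodes(example_reads, "SMN1", "REF"))
```

Counting distinct barcodes rather than raw reads is what makes the estimate insensitive to PCR duplication of the same input molecule.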
In some embodiments, end primers with tag sequences are used to drive amplification of a concatenated amplicon (e.g., TagA and TagB primers in
In some embodiments, sequencing in step (v) comprises single-molecule sequencing. In some embodiments, the sequencing comprises long-read sequencing (e.g., sequencing about 800 nucleotides or longer). In some embodiments, the sequencing comprises nanopore sequencing or single-molecule real-time (SMRT) sequencing. In some embodiments, the sequencing comprises long-read sequencing of a target nucleic acid, e.g., using the method described above or any of the exemplary methods described herein.
In some embodiments, a target nucleic acid comprises one or more genes or a multiple gene panel. In some embodiments, the one or more genes comprise a human gene. In some embodiments, the human gene is a human disease gene. In some embodiments, the human gene is a human cancer gene. In some embodiments, the one or more genes comprise CFTR, SMN1, SMN2, KRAS, BRAF, PIK3C, EGFR, and/or ERBB2. In some embodiments, the human gene is a human gene with high modeled fetal disease risk (MFDR). In some embodiments, the one or more genes comprise SMN1, SMN2, FMR1, HBA1, HBA2, and/or GBA. In some embodiments, the one or more genes comprise CFTR, FMR1, SMN1, SMN2, IKBKAP, ABCC8, FANCC, GALT, GBA, G6PC, HBA1, HBA2, HBB, BLM, ASPA, TMEM216, BCKDHA, BCKDHB, ACADM, MCOLN1, NEB, SMPD1, F8, HEXA, PCDH15, DMD, CYP21A2, and/or CLRN1. In some embodiments, the one or more genes comprise CFTR, FMR1, SMN1, and/or SMN2.
In some embodiments, a target nucleic acid is used in a multiple gene panel. In some embodiments, a target nucleic acid is used in a multiple gene panel, e.g., to detect mutations and/or structural variation in one or more target genes. In some embodiments, the multiple gene panel is a newborn or carrier screening panel. In some embodiments, the multiple gene panel comprises one or more human genes. In some embodiments, the human gene(s) is/are human disease gene(s). In some embodiments, the methods and nucleic acid libraries disclosed herein are used to detect the presence or absence of a mutation in one or more of the human disease genes, e.g., in the newborn or carrier screening panel. In some embodiments, the human gene is a human cancer gene. In some embodiments, the multiple gene panel comprises CFTR, SMN1, SMN2, KRAS, BRAF, PIK3C, EGFR, and/or ERBB2. In some embodiments, the multiple gene panel comprises SMN1, SMN2, FMR1, HBA1, HBA2, and/or GBA. In some embodiments, the multiple gene panel comprises CFTR, FMR1, SMN1, SMN2, IKBKAP, ABCC8, FANCC, GALT, GBA, G6PC, HBA1, HBA2, HBB, BLM, ASPA, TMEM216, BCKDHA, BCKDHB, ACADM, MCOLN1, NEB, SMPD1, F8, HEXA, PCDH15, DMD, CYP21A2, and/or CLRN1. In some embodiments, the multiple gene panel comprises CFTR, FMR1, SMN1, and/or SMN2. In some embodiments, the human gene is a human gene with high modeled fetal disease risk (MFDR).
In some embodiments, a target nucleic acid and/or a multiple gene panel is used to detect a variation having clinical significance. Without wishing to be bound by theory, the clinical significance of any given sequence variant typically falls along a gradient, ranging from those in which the variant is almost certainly pathogenic for a disorder to those that are almost certainly benign. Various standards and guidelines for the classification of sequence variants have been developed using criteria informed by expert opinion and empirical data, such as the guidelines from the American College of Medical Genetics and Genomics (ACMG) (see, e.g., Richards et al., (2015) Genet Med 17(5):405-24, which is incorporated herein by reference). As used herein, the term “modeled fetal disease risk” or “MFDR” refers to the probability that a hypothetical fetus created from a random pairing of individuals would be homozygous or compound heterozygous for two mutations presumed to cause severe or profound disease (i.e., a disease that if left untreated would cause intellectual disability, a substantially shortened lifespan, or both). A gene with “high” MFDR, as used herein, means a gene having one or more sequence variants classified as pathogenic or likely pathogenic (e.g., as determined using ACMG guidelines) and presumed to cause “profound” disease (e.g., as determined using the algorithm described in Lazarin et al., (2014) PLoS One 9(12):e114391; see also Haque et al., (2016) JAMA 316(7):734-42, each of which is incorporated herein by reference).
In some embodiments, the multiple gene panel is a carrier screening panel. In some embodiments of the exemplary methods and compositions disclosed herein, nucleic acid variants relevant to carrier screening are amplified and/or captured in about 200 to about 400 discrete (short) amplicons (e.g., about 180 to about 220, about 220 to about 260, about 260 to about 300, about 300 to about 340, about 340 to about 380, or about 380 to about 420 discrete (short) amplicons). In some embodiments of the exemplary methods and compositions disclosed herein, sample input is less than about 2 μg of a template nucleic acid (e.g., template DNA), e.g., less than about 1.9 μg, less than about 1.8 μg, less than about 1.7 μg, less than about 1.6 μg, less than about 1.5 μg, less than about 1.4 μg, less than about 1.3 μg, less than about 1.2 μg, less than about 1.1 μg, or less than about 1.0 μg. In some embodiments, sample input is less than about 1 μg of a template nucleic acid (e.g., template DNA), e.g., less than about 0.9 μg, less than about 0.8 μg, less than about 0.7 μg, less than about 0.6 μg, or less than about 0.5 μg.
In some embodiments of the exemplary methods and compositions disclosed herein, the discrete (short) amplicons are concatenated into about 10 to about 50 concatenated amplicons (e.g., about 5 to about 20, about 15 to about 30, about 25 to about 40, about 35 to about 50, about 45 to about 60 concatenated amplicons). In some embodiments, the concatenated amplicons are sequenced using, e.g., single-molecule sequencing or any long-read sequencing platform. In some embodiments, the disclosed methods and compositions can be applied to sequencing across panels of different disease genes and/or markers.
In some embodiments, a target nucleic acid is from a sample (e.g., a biological sample). In some embodiments, a target nucleic acid is from a biological sample. In some embodiments, a target nucleic acid is isolated or purified from a biological sample, e.g., by a process which comprises removing one or more non-nucleic acid components from the biological sample.
As used herein, the term “sample” refers to any composition containing or presumed to contain a target nucleic acid. A sample isolated from a subject, i.e., separated from one or more of the conditions or factors present naturally in the subject, may be referred to as a “biological sample.” A biological sample can be obtained from a living subject, or can be obtained from a subject post-mortem. A biological sample can comprise cell culture constituents, such as, e.g., cultured cells, conditioned media, recombinant cells, and cell components. In some embodiments, a biological sample comprises cells. Cells can be primary cells, can be immortalized cells from a cell line, can be mammalian, or can be non-mammalian (e.g., bacteria, yeast). In some embodiments, a biological sample comprises cell components.
In some embodiments, a biological sample is obtained from a subject. The term “subject” refers to any biological entity comprising genetic material. For example, the subject can be an animal, plant, fungus, or microorganism, such as, e.g., a bacterium, virus, archaeon, microscopic fungus, or protist. In some embodiments, the subject is a human or non-human animal. Non-human animals include all vertebrates (e.g., mammals and non-mammals). In some embodiments, the subject is a mammal. In some embodiments, the subject is a human. In some embodiments, the subject is not diagnosed with and/or is not suspected of being at risk for a disease. In some embodiments, the subject is diagnosed with and/or is suspected of being at risk for a disease. In some embodiments, the disease is a cancer.
Exemplary biological samples include, without limitation, samples of tissue or liquid isolated from a subject. Non-limiting examples of tissues include, e.g., brain, bone, marrow, lung, heart, esophagus, stomach, duodenum, liver, prostate, nerve, meninges, kidneys, endometrium, cervix, breast, lymph node, muscle, hair, and skin, among others. A biological sample can also comprise liquid (e.g., a fluid). Exemplary liquid biological samples include, e.g., whole blood, plasma, serum, soluble cellular extract, extracellular fluid, cerebrospinal fluid, ascites, urine, sweat, tears, saliva, buccal sample, a cavity rinse, or an organ rinse. A biological sample may also include samples of in vitro cultures established from cells taken from a subject, including formalin-fixed paraffin-embedded (FFPE) tissue and nucleic acids isolated therefrom. A sample (e.g., a biological sample) may also include cell-free material, such as cell-free blood fraction that contains cell-free DNA (cfDNA) or DNA from circulating tumor cells (ctDNA). Exemplary methods for lysing cells include but are not limited to mechanical disruption, liquid homogenization, high frequency sound waves, freeze/thaw cycles, and manual grinding. Other exemplary methods for lysing cells or otherwise extracting nucleic acids from a sample are known and would be apparent to one of skill in the art.
In some embodiments, multiple nucleic acids, including all the nucleic acids in a sample, may be converted to library molecules using the methods and compositions described herein. In some embodiments, a sample is a biological sample derived or isolated from a human.
In some embodiments, a biological sample comprises a blood sample. In some embodiments, a biological sample comprises a buccal sample. In some embodiments, a biological sample comprises a fragment of a solid tissue or a solid tumor derived from a human patient, e.g., by biopsy. In some embodiments, the biological sample comprises a biopsy sample. In some embodiments, the biopsy sample comprises frozen tissue or FFPE tissue. In some embodiments, the biopsy sample comprises a liquid biopsy sample. In some embodiments, the liquid biopsy sample comprises cfDNA or ctDNA.
The term “sequencing,” as used herein, refers to any method of determining the sequence of nucleotides in a target nucleic acid. In some embodiments, a library of concatenated amplicons (e.g., a library described herein and/or generated using any of the exemplary methods described herein) can be sequenced. In some embodiments, a library of concatenated amplicons described herein and/or generated using any of the exemplary methods described herein is particularly advantageous in single-molecule sequencing, or in any sequencing platform capable of long-reads (i.e., reads about 800 nucleotides in length, or longer). In some embodiments, sequencing comprises single-molecule sequencing. In some embodiments, sequencing comprises long-read sequencing. In some embodiments, sequencing comprises sequencing about 800 nucleotides or longer.
Non-limiting examples of such long-read sequencing technologies include platforms using single-molecule real-time (SMRT) sequencing such as SMRT by Pacific Biosciences (Menlo Park, Calif., USA), and platforms using nanopore sequencing such as biological nanopore-based instruments manufactured by Oxford Nanopore Technologies (Oxford, UK) or Roche Genia (Santa Clara, Calif., USA) or solid state nanopore-based instruments described, e.g., in WO 2016/142925 and Stranges et al., (2016) PNAS 113(44):E6749, and any other presently existing or future single-molecule sequencing technology that is suitable for long reads. Exemplary long-read sequencing methods and instruments are also described, e.g., in Liu et al., (2017) Genome Med. 9(1):65; Gießelmann et al., (2018) “Repeat expansion and methylation state analysis with nanopore-sequencing,” (DOI: 10.1101/480285); Cheng et al., (2015) Clin Chem. 61(10):1305-6; Wei et al., (2018) Fertil Steril. 110(5):910-6; Leija-Salazar et al., (2019) Mol Genet Genomic Med. 7(3):e564; and U.S. Pat. Nos. 8,828,208, 9,057,102, 9,404,146, and 9,542,527, each of which is incorporated herein by reference for the disclosure of such methods and instruments. In some embodiments, sequencing comprises SMRT sequencing or nanopore sequencing.
In some embodiments, the compositions and methods disclosed herein can be used for structural variation characterization, e.g., of a nucleic acid in a sample. In some embodiments, structural variation characterization comprises detecting or quantifying single nucleotide variants (SNV), repeat sequences, indels, gene chimera, and/or gene copy number. In some embodiments, detecting or quantifying gene copy number comprises detecting or quantifying one or more molecular barcodes. In some embodiments, one or more molecular barcodes are used to quantify the original copy input of each ROI. In some embodiments, detecting or quantifying gene copy number comprises using and/or comparing to an external spiking control. In some embodiments, an external spiking control is used to quantify the original copy input of each ROI. In some embodiments, the external spiking control comprises a synthetic gBlock control. In some embodiments, the copy input information is used to detect copy number variation. In some embodiments, the one or more molecular barcodes are in one or more primers. In some embodiments, structural variation characterization comprises labeling and/or direct imaging.
The following examples provide illustrative embodiments of the disclosure. One of ordinary skill in the art will recognize the numerous modifications and variations that may be performed without altering the spirit or scope of the disclosure. Such modifications and variations are encompassed within the scope of the disclosure. The examples provided do not in any way limit the disclosure.
To determine whether 46 short amplicons from a QuantideX® NGS DNA Hotspot 21 Kit for cancer mutation detection (Asuragen) can be converted into one longer amplicon, 12 amplicons from the 46-amplicon panel were selected (Table 1). The end primer tags included Illumina P5, AATGATACGGCGACCACCGA (SEQ ID NO: 1) for T14007_KRAS_4_15_F2 and Illumina P7, CAAGCAGAAGACGGCATACGA (SEQ ID NO: 2) for T14008_ERBB2_774_788_R2. All other complementary tag sequences were derived from natural (genomic) sequence. For instance, in the tag sequence AGGACTGGGGTTTTATTATA (SEQ ID NO: 3) for T13984_KRAS_4_15_R, the TTTTATTATA portion (SEQ ID NO: 4) was adjacent to the natural gene-specific portion of the KRAS_4_15 sequence, while the AGGACTGGGG portion was reverse complementary to the gene-specific sequence of the KRAS_55_65_F primer.
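The two-part structure of the T13984_KRAS_4_15_R tag can be checked with a short sketch. The tag sequence and its 10/10 split are taken from the text; the helper name `reverse_complement` and the variable names are illustrative. How the resulting sequence sits within the full KRAS_55_65_F primer is not shown here, since that depends on Table 1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Tag for T13984_KRAS_4_15_R (SEQ ID NO: 3), as given in the text.
tag = "AGGACTGGGGTTTTATTATA"

# First 10 bases: reverse complementary to the neighboring primer's
# gene-specific sequence. Last 10 bases: natural sequence adjacent to the
# KRAS_4_15 gene-specific portion (SEQ ID NO: 4).
bridging_part = tag[:10]
native_part = tag[10:]

# Sequence that the neighboring forward primer must contain for the two
# amplicons to overlap during concatenation.
expected_in_neighbor = reverse_complement(bridging_part)

if __name__ == "__main__":
    print(bridging_part, native_part, expected_in_neighbor)
```

Under this reading, the gene-specific sequence of KRAS_55_65_F would contain `CCCCAGTCCT`, the reverse complement of the bridging portion.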
Three primer pools were made. Primer pool#1 had 12 primers at 500 nM each from the 1st 6 amplicons (Table 1). Primer pool#2 had 12 primers at 500 nM each from the 2nd 6 amplicons (Table 1). Primer pool#3 had the complete set of 24 primers at 500 nM each. A 10 μl PCR reaction contained 5 μl of 2× Phoenix Taq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of 500 mM TMAC, 1 μl of 500 nM primer pool (#1 or #2 or #3), and 2 μl of nuclease-free water. The pre-amplification cycle conditions were 95° C./5 min, 2 cycles of 95° C./15 sec, 64° C./4 min, 28 cycles of 95° C./15 sec, 72° C./4 min. The reactions were paused at 72° C. on the thermal cycler at the end of the first PCR and 1 μl of 15 μM tagging primer mix was added. For reactions using primer pool#1, primer pool#2, or primer pool#3, tagging primer pairs T2109-FAM-P5/T13994, T13995/T2110-P7-FAM, and T2109-FAM-P5/T2110-P7 were used, respectively. After the end primers were added, the reactions resumed with 25 cycles of 95° C./15 sec, 55° C./1 min, 72° C./2 min, followed by a final 72° C./10 min step and a 4° C. hold. The final PCR products were diluted 1:50 and 1 μl was mixed with 12 μl of HiDi (ABI) and 2 μl of ROX1000 size standard (Asuragen). Capillary electrophoresis (CE) was run with injection at 2.5 kV for 20 sec and a run at 20 kV for 40 min.
The expected full length product sequences of the 1st 6 and the 2nd 6 amplicons are set forth in Table 2. The expected sequence of the assembled 12-amplicon concatenation product is set forth in Table 3.
The full length product of the 1st 6 amplicons was detected with an observed size of 646 nt (with primer pool#1) (
To confirm whether the observed CE peaks of the 1st and the 2nd 6 amplicon concatenation reactions reflected the correct concatenation products, agarose gel was used to purify the two fragments of the 1st 6 and the 2nd 6 amplicon concatenation products. The fragments were then assembled in a separate PCR reaction with end primer T2109-FAM-P5/T2110-P7.
Single full length products were observed on CE (
To help detect the full length product of the assembled 12 amplicons from Example 1, agarose gel was used to purify the two 6-amplicon concatenation products. The two 6-amplicon concatenation products were then assembled using modified primers and modified PCR conditions to yield a 12-amplicon concatenation full length product in a single tube reaction without any purification in between.
Primers: Primers T13999_EGFR_737_761_F and T14010_EGFR_737_761_R have a perfectly matched stretch of 5 bases at their 3′ ends and are capable of forming a 78-bp primer dimer, which can result in an 80-bp deletion (
Reaction Conditions: PCR cycling conditions were also modified relative to the conditions used in Example 1. The primers were mixed at 500 nM each and 0.6 μl was used in a 10 μl PCR reaction. The final primer concentration was 30 nM. The reaction contained 5 μl of 2× PhoenixTaq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of 500 mM TMAC, 0.6 μl of 500 nM primer pool#2 (2nd 6 amplicon pool) or pool#3 (complete set of 12 amplicon pool), and 2.4 μl of nuclease-free water. The pre-amplification and concatenation PCR conditions were 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, and 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, and 72° C./2 min (total PCR: 2 hours, 40 min). 1 μl of the pre-amplification and concatenation PCR product was transferred into the assembly/tagging PCR with 5 μl of 2× Phoenix Taq master mix, 1 μl of 15 μM T13348_EGFR_486_493_F and T2110-P7-FAM (for 2nd 6 amplicon concatenation) or 1 μl of 15 μM T2109-P5-FAM and T2110-P7 (for 12 amplicon concatenation), and 3 μl of nuclease-free water. PCR cycle conditions were 95° C./5 min, 25 cycles of 95° C./15 sec, 55° C./1 min, and 72° C./2 min. The final PCR products were diluted 1:50 and 1 μl was used for CE.
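The stated 30 nM final primer concentration follows from a simple dilution calculation (stock concentration × stock volume ÷ reaction volume). The sketch below reproduces that arithmetic and checks that the listed component volumes add up to the 10 μl reaction. The function and variable names are illustrative only.

```python
def final_concentration_nM(stock_nM: float, stock_volume_ul: float,
                           reaction_volume_ul: float) -> float:
    """Concentration after diluting a stock into the full reaction volume."""
    return stock_nM * stock_volume_ul / reaction_volume_ul


# Component volumes (in microliters) for the 10 ul reaction described above.
components_ul = {
    "2x master mix": 5.0,
    "template DNA (10 ng/ul)": 1.0,
    "TMAC (500 mM)": 1.0,
    "primer pool (500 nM each)": 0.6,
    "nuclease-free water": 2.4,
}

reaction_volume_ul = sum(components_ul.values())
primer_final_nM = final_concentration_nM(
    500.0, components_ul["primer pool (500 nM each)"], reaction_volume_ul
)

if __name__ == "__main__":
    print(f"reaction volume: {reaction_volume_ul:.1f} ul")
    print(f"final primer concentration: {primer_final_nM:.1f} nM")
```

The same function gives 50 nM for 1 μl of a 500 nM pool in 10 μl, and 30 nM for 1.2 μl of a 250 nM pool in 10 μl.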
With modified primer pools and PCR conditions, improved detection of the 2nd 6 amplicon concatenation was observed (
In addition, primers T13354_EGFR_767_798_F and T13350_ERBB2_774_788_R were found to directly amplify the ERBB2 gene, resulting in a 260-bp truncation of PCR products (
To test the amplicon concatenation method on additional gene targets, 4 amplicons of the CFTR gene were designed to cover 24 common CFTR variants (Table 6). The expected sequence of the assembled 4-amplicon concatenation product is set forth in Table 7.
Reaction Conditions: The primers were mixed at 500 nM each and 0.6 μl was used in a 10 μl PCR reaction. The final primer concentration was 30 nM. The reaction contained 5 μl of 2× PhoenixTaq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of 500 mM TMAC, 0.6 μl of 500 nM primer pool, and 2.4 μl of nuclease-free water. The pre-amplification and concatenation PCR conditions were 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, and 72° C./2 min (total PCR: 2 hours, 40 min). 1 μl of the pre-amplification and concatenation PCR product was transferred into the assembly/tagging PCR with 5 μl of 2× Phoenix Taq master mix, 1 μl of 15 μM T2109-P5-FAM and T2110-P7, and 3 μl of nuclease-free water. PCR cycle conditions were 95° C./5 min, 25 cycles of 95° C./15 sec, 55° C./1 min, and 72° C./2 min. The final PCR products were diluted 1:50 and 1 μl was used for CE.
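The pre-amplification and concatenation program can be written down as data, which makes it easy to sum the programmed hold times. The representation below (a list of repeat counts with temperature/duration steps) and the names `PROGRAM` and `programmed_hold_seconds` are hypothetical. The hold times alone sum to 7815 seconds (about 130 min), so the stated total of 2 hours, 40 min would also include time that is not itemized in the program, such as temperature ramping. That attribution is an assumption, since the source does not break the total down.

```python
# Each entry: (number of repeats, [(temperature in deg C, hold in seconds), ...])
PROGRAM = [
    (1, [(94, 5 * 60)]),                         # initial denaturation
    (2, [(94, 15), (60, 4 * 60)]),               # 2 cycles, 60 C anneal/extend
    (23, [(94, 15), (72, 2 * 60)]),              # 23 two-step cycles
    (20, [(94, 15), (55, 60), (72, 2 * 60)]),    # 20 three-step cycles
]


def programmed_hold_seconds(program) -> int:
    """Sum of all hold times in the program, excluding ramp time."""
    return sum(
        repeats * sum(seconds for _temperature, seconds in steps)
        for repeats, steps in program
    )


def total_cycles(program) -> int:
    """Number of cycles, ignoring single-pass steps such as the initial hold."""
    return sum(repeats for repeats, _steps in program if repeats > 1)


if __name__ == "__main__":
    seconds = programmed_hold_seconds(PROGRAM)
    print(f"{seconds} s of programmed holds ({seconds / 60:.2f} min)")
    print(f"{total_cycles(PROGRAM)} cycles in total")
```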
An exemplary CE trace of the concatenated products is shown in
Nanopore sequencing confirmed the correct 4-amplicon concatenation sequence (1186 nt). The full length 4-amplicon concatenation peak showed as 1059 nt on CE (
Primer concentrations were also varied by testing final primer concentrations of 5 nM, 10 nM, 30 nM, and 40 nM. The 30 nM final primer concentration produced the highest full length amplicon yield and least amount of truncated product (
Generally, when using a DNA polymerase which lacks 3′ to 5′ proofreading activity, the polymerase may add a single, 3′ adenine (A) overhang to each end of the PCR product. Such non-template-based addition can have potential consequences for concatenation, e.g., preventing amplicons from further concatenation. For instance, in
To investigate whether inserting an extra thymine (T) in a DNA template (e.g., as shown in
Reaction Conditions: The modified primers were mixed at 500 nM each and 0.6 μl was used in a 10 μl PCR reaction. The final primer concentration was 30 nM. The reaction contained 5 μl of 2× PhoenixTaq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of 500 mM TMAC, 0.6 μl of 500 nM modified primer pool, and 2.4 μl of nuclease-free water. The pre-amplification and concatenation PCR conditions were 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, and 72° C./2 min (total PCR: 2 hours, 40 min). 1 μl of the pre-amplification and concatenation PCR product was transferred into the assembly/tagging PCR with 5 μl of 2× Phoenix Taq master mix, 1 μl of 15 μM T2109-P5-FAM and T2110-P7, and 3 μl of nuclease-free water. PCR cycle conditions were 95° C./5 min, 25 cycles of 95° C./15 sec, 55° C./1 min, and 72° C./2 min. The final PCR products were diluted 1:50 and 1 μl was used for CE.
An exemplary CE trace of the concatenated products is shown in
DNA polymerases were also varied by testing standard antibody-based HotStart Taq DNA polymerase and comparing to Kapa HiFi HotStart DNA polymerase. With or without an extra adenine in the primer design, Kapa HiFi HotStart DNA polymerase did not generate dead-end intermediate fragments (i.e., fragments which cannot be further concatenated into full length products), in contrast to standard antibody-based HotStart Taq DNA polymerase. However, the Kapa HiFi HotStart enzyme can have leak activity at lower temperatures, and may benefit from the addition of reagents such as TMAC, ThermaGo, and ThermaStop to suppress non-specific amplification (
To test the amplicon concatenation method on additional CFTR variants (e.g., high frequency mutation variants), the DelF508 region and the G542X region were designed (Table 10) and added to the 4 amplicons of the CFTR gene. Exemplary variants covered by the 6 amplicons are listed in Table 11. The expected sequence of the assembled 6 amplicon concatenation product is set forth in Table 12.
Reaction Conditions: The primers were mixed at 500 nM each and 0.6 μl was used in a 10 μl PCR reaction. The final primer concentration was 30 nM. The reaction contained 5 μl of 2× PhoenixTaq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of 500 mM TMAC, 0.6 μl of 500 nM primer pool, and 2.4 μl of nuclease-free water. The pre-amplification and concatenation PCR conditions were 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, and 72° C./2 min (total PCR: 2 hours, 40 min). 1 μl of the pre-amplification and concatenation PCR product was transferred into the assembly/tagging PCR with 5 μl of 2× Phoenix Taq master mix, 1 μl of 15 μM T2109-P5-FAM and T2110-P7, and 3 μl of nuclease-free water. PCR cycle conditions were 95° C./5 min, 25 cycles of 95° C./15 sec, 55° C./1 min, and 72° C./2 min. The final PCR products were diluted 1:50 and 1 μl was used for CE.
An exemplary CE trace of the concatenated products is shown in
Nanopore sequencing confirmed the correct 6 amplicon concatenation sequence (1589 nt). 400 fmol of the 6-amplicon concatemer was loaded onto a nanopore flow cell for nanopore sequencing. About 100,000 reads were obtained from the concatemer, the majority of which were full length.
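One simple way to estimate the fraction of full-length reads is to compare each read length with the expected concatemer length (1589 nt here). The sketch below is illustrative only. The ±5% length window, the function name, and the example read lengths are assumptions; the disclosure does not state how full-length reads were identified.

```python
def full_length_fraction(read_lengths, expected_length: int,
                         tolerance: float = 0.05) -> float:
    """Fraction of reads whose length is within +/- tolerance of the
    expected concatemer length. The tolerance is a hypothetical choice."""
    if not read_lengths:
        raise ValueError("no reads supplied")
    lower = expected_length * (1.0 - tolerance)
    upper = expected_length * (1.0 + tolerance)
    n_full = sum(1 for length in read_lengths if lower <= length <= upper)
    return n_full / len(read_lengths)


if __name__ == "__main__":
    # Made-up read lengths for illustration.
    example_lengths = [1589, 1600, 1500, 800, 1590, 300]
    print(full_length_fraction(example_lengths, expected_length=1589))
```

A length-only filter does not confirm that the amplicons are joined in the correct order; alignment to the expected concatemer sequence would be needed for that.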
The second PCR cycle was also varied by testing at 10, 15, 20, and 25 cycles. Full length products were observed starting at about 15 cycles, but 25 cycles produced the greatest yield (
To test whether it was possible to expand the size and increase the amplicon limit of a multiplex PCR and a concatenation reaction in a single tube, 8 additional CFTR regions of interest (ROIs) were designed and combined with the 6 CFTR amplicons from Example 5 (Table 13). The expected sequence of the assembled 14-amplicon concatenation product is set forth in Table 14.
TTCTCAATAAGTCCTGGCCAGAGGGTGAGATTTGAACACTGCTTG
Reaction Conditions: The primers were mixed and the final primer concentration was 30 nM. The reaction contained 5 μl of 2× PhoenixTaq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of 500 mM TMAC, 0.6 μl of 500 nM primer pool, and 2.4 μl of nuclease-free water. The pre-amplification and concatenation PCR conditions were 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, and 72° C./2 min (total PCR: 2 hours, 40 min). 1 μl of the pre-amplification and concatenation PCR product was transferred into the assembly/tagging PCR with 5 μl of 2× Phoenix Taq master mix, 1 μl of 15 μM T2109-P5-FAM and T2110-P7, and 3 μl of nuclease-free water. PCR cycle conditions were 95° C./5 min, 25 cycles of 95° C./15 sec, 55° C./1 min, and 72° C./2 min. The final PCR products were diluted 1:50 and 1 μl was used for CE.
An exemplary CE trace of the concatenated products is shown in
Nanopore sequencing confirmed the correct 14 amplicon concatenation sequence (3203 nt). The barcoded CFTR 14-amplicon concatemer was mixed with other samples and sequenced on a nanopore flow cell. After demultiplexing, about 10,000 reads were obtained from the CFTR 14-amplicon concatemer, many of which were full length (
The amplicon concatenation methods described herein may be applied to co-detection of CFTR variants, and SMN1/SMN2 copy number variation, disease modifiers, and/or silent carrier mutations. To investigate a method of measuring copy number using a spiking external control, the following experiment was performed. A schematic diagram of the experimental design is shown in
Briefly, a synthetic gBlock control was designed to contain one modified CFTR amplicon (CFTR* in
After nanopore sequencing, counting the sequencing reads as CFTR* with * (with stamp mark from gBlock)=A, CFTR without * (from sample genomic DNA)=B, SMN* with * (with stamp mark from gBlock)=C, SMN1 without * (from sample genomic DNA)=D, and SMN2 without * (from sample genomic DNA)=E, the copy number of SMN1 and SMN2 was calculated as:
SMN1 copy number F=2*(D/C)*(A/B) and SMN2 copy number G=2*(E/C)*(A/B).
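The copy number formulas above can be expressed as a short function. The read-count symbols A through E follow the definitions in the text; the function name and the example counts are made up for illustration. The factor of 2 is taken directly from the formula and presumably reflects two CFTR copies per diploid genome, which is an interpretation rather than a statement from the source.

```python
def smn_copy_numbers(a: float, b: float, c: float, d: float, e: float):
    """Return (SMN1 copy number F, SMN2 copy number G).

    a: CFTR* reads (gBlock, with stamp mark)
    b: CFTR reads (sample genomic DNA)
    c: SMN* reads (gBlock, with stamp mark)
    d: SMN1 reads (sample genomic DNA)
    e: SMN2 reads (sample genomic DNA)
    """
    if b == 0 or c == 0:
        raise ValueError("CFTR (b) and SMN* (c) read counts must be non-zero")
    spike_to_sample = a / b          # gBlock-to-genome ratio measured at CFTR
    f = 2 * (d / c) * spike_to_sample
    g = 2 * (e / c) * spike_to_sample
    return f, g


if __name__ == "__main__":
    # Made-up read counts for illustration.
    smn1, smn2 = smn_copy_numbers(a=1000, b=2000, c=500, d=2000, e=500)
    print(f"SMN1 copy number: {smn1:.1f}, SMN2 copy number: {smn2:.1f}")
```

Normalizing by A/B corrects for how much gBlock was spiked in relative to sample DNA, so the estimate does not depend on knowing the absolute input amounts.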
The 6 CFTR amplicon and SMN amplicon primers are listed in Table 15. The expected CFTR+SMN amplicon concatenation product sequence and the spiking control gBlock sequence are shown in Table 16. The differential bases in the gBlock relative to the natural genomic sequence are boxed in
Reaction Conditions: The primers were mixed at 250 nM each and 1.2 μl was used in a 10 μl PCR reaction. The final primer concentration was 30 nM. The reaction contained 5 μl of 2× PhoenixTaq PCR master mix (Enzymatics), 1 μl of 10 ng/μl DNA (NA12878, Coriell), 1 μl of diluted HindIII-cut T14641-gBlock (˜1500 copies/μl based on estimate from ng/μl of IDT synthesis label), 1 μl of 500 mM TMAC, 1.2 μl of 250 nM primer pool, and 0.8 μl of nuclease-free water. The pre-amplification and concatenation PCR conditions were 94° C./5 min, 2 cycles of 94° C./15 sec, 60° C./4 min, 23 cycles of 94° C./15 sec, 72° C./2 min, followed by 20 cycles of 94° C./15 sec, 55° C./1 min, and 72° C./2 min (total PCR: 2 hours, 40 min). 1 μl of the pre-amplification and concatenation PCR product was transferred into the assembly/tagging PCR with 5 μl of 2× Phoenix Taq master mix, 1 μl of 15 μM T2109-P5-FAM and T2110-P7, and 3 μl of nuclease-free water. PCR cycle conditions were 95° C./5 min, 25 cycles of 95° C./15 sec, 55° C./1 min, and 72° C./2 min. The final PCR products were diluted 1:50 and 1 μl was used for CE.
An exemplary CE trace of the concatenated products is shown in FIG. 12C. The POP 7 polymer used on CE cannot resolve and size fragments greater than 1000 nt. The 1979 nt constructs therefore showed as about 1077 nt on CE. However, agarose gel analysis confirmed a fragment size of about 2000 nt (
Genomic DNA samples were spiked in the gBlock control, concatenated, and amplified with a unique sample barcode outside P7 and the P7 tag sequence. These samples were ligated with a nanopore sequencing adaptor and sequenced. The percent (%) of read counts at the differential sites for CFTR*/CFTR, SMN*/SMN1/SMN2 were used to calculate copy number. Nanopore sequencing also confirmed the correct 7 amplicon concatenation sequence (1979 nt).
The sample HG02697 with a SMN1 copy of >4 and a SMN2 copy of 1, as determined by AmplideX® PCR/CE SMN1/2 Kit (RUO), resulted in a SMN1 copy of 4.5 and a SMN2 copy of ˜1. Several other samples with different SMN1/SMN2 ratios were also amplified, concatenated, and barcoded for nanopore sequencing. The concatenation/nanopore sequencing results of observed SMN1/SMN2 ratios were compared with the results determined by AmplideX® PCR/CE SMN1/2 Kit (RUO) (
Number | Date | Country | |
---|---|---|---|
62940537 | Nov 2019 | US |