This application is filed with a Sequence Listing which has been submitted electronically in XML format. Said XML copy, created on Jul. 31, 2023, is named “2023-07-31_01243-0013-00US ST26.xml” and is 94,245 bytes in size. The information in the electronic format of the sequence listing is incorporated herein by reference in its entirety.
This application relates to polynucleotides comprising read primer binding sequences, insert sequences derived from a target nucleic acid, a concatenation sequence, and an attachment sequence. Compositions comprising these polynucleotides and methods of generating and sequencing a concatenated nucleic acid sequencing template are also described. In addition, this disclosure relates to methods of preparing sequencing templates comprising multiple inserts. This disclosure also relates to methods of use of such templates, including analysis of contiguity information. Further, sequencing templates comprising two copies of the same insert sequence (i.e., an insert sequence and a copy of an insert sequence) can be used to correct for random errors generated during sequencing or amplification or to identify nucleobase damage or other mutation that leads to non-canonical base pairing in a double-stranded nucleic acid. These sequencing templates comprising an insert sequence and a copy of the insert sequence can also be used for methylation analysis.
Typically, the read-length on sequencing by synthesis (SBS) platforms is limited to 250-300 base pairs due to phasing/pre-phasing. This read-length limits the throughput of SBS platforms.
Previously, methods were described to improve SBS throughput using polynucleotides comprising multiple inserts. Often these methods relied on orthogonal SBS reactions, for example with different polymerases or substrate combinations or with primer blocking (See WO 2015/0002789 and US 20180312917). However, a need exists for straightforward means to increase sequencing output from a flowcell without need for non-standard reagents to allow cost-effective and user-friendly means of increasing sequencing output.
The present disclosure describes polynucleotides comprising multiple insert sequences from one or more target nucleic acids. These polynucleotides may be generated from multiple DNA libraries. Annealing of a hybridization sequence in one library product to a complement of a hybridization sequence in another library product to form a hybridized adduct can then allow elongation to form the polynucleotide comprising multiple insert sequences. Sequencing of these multiple insert sequences can be performed by sequential SBS elongation reactions based on multiple distinct read primer binding sequences comprised in the polynucleotides.
In addition, conventional short read sequencing methods comprise an initial generation of short separate fragments from intact genomic DNA or RNA. These fragments are generated in several ways such as physical shearing, enzymatic digestion, or polymerase extension from one or more primers. Template preparation then modifies and appends synthetic adapters to these fragments to enable them to be sequenced. These sequencing templates almost always contain a single fragment from the original sample comprising the sequence of bases in the same order and juxtaposition as in the intact genome. Where a template is double-stranded, the complement of a sequence is associated by hybridization of the two strands. However, when a double-stranded template is denatured, the two complementary strands separate, and a template becomes a single strand comprising a single sequence fragment from the original sample. In this process, any association between the two complementary strands is lost. In addition, in this process of fragmentation and template preparation, any association between two or more fragments that were contiguous in the original unfragmented genome is also lost.
The exception to this rule of loss of contiguity information is found in template preparation methods that employ ligation to join two or more distal fragments together prior to sequencing adapters being appended. One example is “mate-pair” libraries, wherein the ends of a large DNA fragment are joined together forming a circle, then further fragmented followed by recovery of the sub-fragment that spans the co-joined ends. The subsequent template contains two sequences from the original large fragment joined in tandem. Another example is chromatin based conformational capture where distal fragments of DNA in a genome are spatially organized in close proximity due to the structural arrangement of DNA complexed with chromatin in vivo. Ligation of fragments in proximity with one another and subsequent processing generates sequencing templates with tandem inserts that give information about the spatial relevance, and by inference, functional relevance of the individual inserts.
A number of different methods have been developed as potential means of improving preparation of sequencing templates with multiple inserts, such as Duplex Sequencing (Schmitt et al., Proc. Natl. Acad. Sci. U.S.A. 109:14508-14513 (2012)), Duplex Proximity Sequencing (Pro-Seq, as described in Pel et al., PLoS One 13:1-19 (2018)), CypherSeq (Gregory et al., Nucleic Acids Res. 44:e22 (2016)), o2n-seq (Wang et al., Nat. Commun. 8, 15335 (2017)), Circle Sequencing (Lou et al., Proc. Natl. Acad. Sci. U.S.A. 110:19872-19877 (2013)), and Bot Sequencing (Hoang et al., Proc. Natl. Acad. Sci. U.S.A. 113:9846-9851 (2016) and Abascal et al., Nature 593, 405-410 (2021)). However, all of these methods have shown drawbacks, and none has had universal applicability.
The Concatenating Original Duplex for Error Correction (CODEC) method recently described in Bae et al., bioRxiv, 10.1101/2021.06.11.448110, posted Jun. 12, 2021, involves physically linking both strands of double-stranded DNA for sequencing of a single duplex with a single read pair using specialized CODEC adapter complexes. The CODEC method can be used to identify non-canonical base-pairing that may be due to nucleobase damage or to a change comprised only in one strand of a double-stranded nucleic acid, as well as errors that may have been introduced during PCR amplification or sequencing. However, the CODEC method requires two consecutive ligations that can limit conversion efficiency, and byproducts may also be formed by undesired ligations.
In the absence of innate structural relationships between sequences in the genome, surrogate “association markers” in the form of barcodes may be used. For example, a large fragment of DNA, such as greater than 1000 base pairs, or even greater than 5000 base pairs, can be isolated by dilution, compartmentalization, or immobilization on a surface, and further fragmented wherein each sub-fragment thereafter appends a common barcode sequence. Where many fragments are thus processed in parallel, with each isolated fragment receiving a unique barcode sequence appended to its subsequent subsequences, a pool of all sub-fragments from all fragments can be sequenced in a single experiment, and the subfragments disambiguated by identifying and collating their barcode sequences. This approach enables contiguous sequences within the genome to be associated with one another and can enable the assembly in silico of numerous subfragments into much larger in silico fragments and can help with the phasing of variants in a genome.
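The collation of sub-fragments by barcode described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each read begins with a fixed-length barcode followed by the sub-fragment sequence. The barcode length, the read layout, and the function name `collate_by_barcode` are hypothetical and are not taken from this disclosure.

```python
from collections import defaultdict

BARCODE_LEN = 8  # assumed barcode length, for illustration only


def collate_by_barcode(reads, barcode_len=BARCODE_LEN):
    """Group sub-fragment sequences by the barcode at the start of each read."""
    groups = defaultdict(list)
    for read in reads:
        # Assumed layout: barcode first, then the sub-fragment sequence.
        barcode, insert = read[:barcode_len], read[barcode_len:]
        groups[barcode].append(insert)
    return dict(groups)


# Three pooled reads; two share a barcode and so derive from the same
# isolated large fragment in this toy example.
reads = [
    "AAAACCCC" + "GATTACA",
    "GGGGTTTT" + "TTAGGCA",
    "AAAACCCC" + "CCATGGA",
]
collated = collate_by_barcode(reads)
```

In this sketch, sub-fragments that share a barcode are returned together, which is the grouping that would precede in silico assembly or phasing.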
In another type of barcoding, unique molecular indices (UMIs) are used for preserving associations between sequences within a genome that physically separate during template preparation and sequencing. The UMIs comprise short barcode sequences appended to fragments of DNA or RNA during template preparation such that individual single molecules each receive a unique barcode. Reading the UMI by sequencing can distinguish individual molecules (such as fragments within a preparation of templates) even when the original sample contained two or more identical fragments, in length and in sequence. UMIs also help identify mistakes (e.g., alterations to the innate genomic sequence) generated and propagated during PCR or other such methods that make copies of original templates. This is useful in experiments for sequencing samples that contain innate variants at low frequencies that would potentially otherwise be difficult to identify in a background of artificial variants created by PCR. In another use of UMIs, a double-stranded fragment can be ligated to a double-stranded adapter containing a duplex UMI (i.e., a UMI barcode hybridized to its complement in a double-stranded adapter such that a first and a second strand of the genomic fragment each append a common UMI barcode). In this manner, after separation by denaturation the first strand and second strand can be identified and re-associated by the UMI. Such use of UMIs can help improve the accuracy of sequencing by giving two “reads” of a sequence in the genome, in other words identifying and using the “sense” and “antisense” pair of templates from a fragment to infer the validity of a base call during a sequencing read of either template.
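The re-association of sense and antisense reads by a shared UMI can be sketched as follows. This is a simplified illustration, not a description of any particular pipeline: it assumes each read is supplied as a hypothetical `(umi, strand, sequence)` tuple, that the two reads of a duplex cover the same region with no insertions or deletions, and that a disagreement between strands is simply masked as `N`.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(reads):
    """Pair '+' and '-' reads sharing a UMI and mask positions that disagree."""
    by_umi = defaultdict(dict)
    for umi, strand, seq in reads:
        by_umi[umi][strand] = seq

    consensus = {}
    for umi, strands in by_umi.items():
        if "+" not in strands or "-" not in strands:
            continue  # no duplex partner found for this UMI
        sense = strands["+"]
        antisense_as_sense = reverse_complement(strands["-"])
        if len(sense) != len(antisense_as_sense):
            continue  # this sketch only handles equal-length reads
        consensus[umi] = "".join(
            a if a == b else "N" for a, b in zip(sense, antisense_as_sense)
        )
    return consensus


reads = [
    ("UMI1", "+", "ACGTTG"),
    ("UMI1", "-", "CAACGT"),  # exact reverse complement of the sense read
    ("UMI2", "+", "ACGTTG"),
    ("UMI2", "-", "CAACGA"),  # one base differs from the expected antisense read
    ("UMI3", "+", "GGGG"),    # no antisense partner
]
result = duplex_consensus(reads)
```

Here the UMI1 pair agrees at every position, the UMI2 pair yields one masked position, and UMI3 is skipped because only one strand was observed.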
The use of barcodes to associate sequences, either distal or complementary within a genome, is in practice complex because of the constraints around designing and incorporating barcodes within adapters and sequencing reactions. For instance, there is a finite number of permutations for a given length of barcode. In one example, a four base barcode only has two-hundred and fifty-six permutations and not all are functional in practice due to self-complementarity and other sequencing considerations. Similar issues manifest when the barcode is longer but with the added penalty of requiring more cycles of sequencing to read the barcodes.
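The arithmetic behind the four-base example can be checked with a short Python sketch. The count of 4^4 = 256 sequences follows directly from the text. The filter shown, which removes barcodes identical to their own reverse complement, is only one assumed reading of "self-complementarity" chosen for illustration; practical barcode design would apply further criteria not modeled here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def all_barcodes(length):
    """Enumerate every DNA sequence of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def is_self_complementary(barcode):
    """Illustrative criterion: the barcode equals its own reverse complement."""
    return barcode == reverse_complement(barcode)


barcodes = all_barcodes(4)
usable = [b for b in barcodes if not is_self_complementary(b)]
print(len(barcodes), len(usable))  # prints "256 240"
```

Each added base multiplies the number of possible barcodes by four, at the cost of one more sequencing cycle to read the barcode.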
Adding barcodes to adapters adds complexity to the adapter itself. For instance, adding variations in performance from one adapter to another results in challenges around normalization during library pooling. Complex barcodes also require complex manufacturing, particularly when a barcode and its complement are hybridized in a double-stranded adapter.
The use of in vivo structural associations, such as mate-pairs or chromatin conformational capture, also requires complex workflows and is limited in the associations it can identify. For example, a challenge of mate-pairs is the extreme size of large fragments, while a challenge of chromatin conformational capture is chromatin-induced associations.
Disclosed herein are barcode-free methods that can provide association information about contiguous and complementary sequences within the genome. These methods may utilize a surface to link sequences in tandem within a single template. Methods may also use compartmentalization for generating templates for proximity or haplotype data. When sequenced, the resulting templates can provide information to correct errors in sequencing or identify non-canonical base pairings and also to provide contiguity information for assembly and phasing of genomic information.
Disclosed herein also are methods of detecting methylation status. Conventional methods for detecting methylation status in genomic DNA generally use a chemical or biochemical reaction to convert the bases of interest to a different base. The detection of this conversion is used to infer whether or not the base was methylated. These methods require a sample to be split in two aliquots. One aliquot is treated by the chemistries/biochemistries while the other aliquot remains untreated. Both are then sequenced and compared to one another to deduce the methylation status. One example of such chemistries is bisulfite sequencing, which uses sodium bisulfite conversion of non-methylated C bases to U bases. The uracil nucleotides are then converted to thymine nucleotides during an amplification step such as PCR. Following sequencing of both the treated and untreated sample, a comparison of the reads will indicate that, if a C base in the untreated sample is read as a T in the treated sample, this C base was not methylated in the original sample. However, where a C base in the untreated sample is still read as a C base in the treated sample, then by deduction the C base was methylated in the original sample.
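The comparison logic for the conventional two-aliquot bisulfite approach can be expressed as a minimal Python sketch. It assumes, for illustration only, that the untreated and treated reads are from the same strand, are already aligned position by position, and contain no insertions or deletions. The function name `call_methylation` and the call labels are hypothetical.

```python
def call_methylation(untreated, treated):
    """Infer methylation at each C of the untreated read from the treated read."""
    if len(untreated) != len(treated):
        raise ValueError("reads must be aligned and of equal length")

    calls = {}
    for pos, (u, t) in enumerate(zip(untreated, treated)):
        if u != "C":
            continue  # only cytosines in the untreated read are informative
        if t == "C":
            calls[pos] = "methylated"    # protected from conversion
        elif t == "T":
            calls[pos] = "unmethylated"  # converted C -> U, read as T after PCR
        else:
            calls[pos] = "ambiguous"     # neither outcome expected from conversion
    return calls


untreated = "ACGTCCGA"
treated = "ACGTTTGA"
calls = call_methylation(untreated, treated)
# calls == {1: "methylated", 4: "unmethylated", 5: "unmethylated"}
```

The sketch makes the sample-splitting burden concrete: every call needs both an untreated and a treated read of the same position.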
A similar strategy is used with the EM-Seq assay as described in Vaisvilas et al., Genome Res. 31(7): 1280-1289 (2021), except that an enzymatic reaction rather than a chemical reaction is used to convert non-methylated C's. A recent publication (Liu et al., Nature Biotechnology 37(4):424-429 (2019)) introduced an alternative chemistry based on borane that converts methylated C nucleotides and does not convert unmethylated C nucleotides. It has a reported advantage over normal C conversion chemistries such as bisulfite sequencing, because the converted genome is mostly still a 4 base genome comprising A, C, G and T as only a small percentage of the genome is methylated (in contrast with bisulfite chemistries where the converted genome is mostly A, G and T).
A common characteristic of current methods of methylation analysis is that a sample needs to be split into two aliquots, which are processed and sequenced in parallel. Technologies do exist that directly detect methylation status of bases without needing to split the sample. These methods rely on single-molecule sequencing technologies that use sequencing strategies that can differentiate methylated and unmethylated bases in the original sample. Examples of such technologies include nanopore sequencing (see, for example, “Epigenetics and methylation analysis,” Oxford Nanopore Technologies, downloaded on Oct. 7, 2021 at nanoporetech.com/applications/investigation/epigenetics-and-methylation-analysis) and SMRT sequencing (as described in Flusberg et al., Nat Methods. 7(6): 461-465 (2010)). However, these strategies are disadvantageous for methods where high-throughput sequencing is necessary or where genomes of interest are small in fragment size, such as cell-free DNA.
Described herein are methods where a single aliquot of a methylated sample is treated and sequenced to discern the methylation status of a genome. The methods include those that can discern hydroxymethylated-cytosine from methylated-cytosine. The present methods can decrease sample preparation and sequencing burden and potentially decrease the amount of starting material required for methylation analysis.
Described herein are polynucleotides comprising multiple insert sequences. These polynucleotides may be used in methods to allow sequencing of multiple insert sequences from a target nucleic acid. Also described herein are polynucleotides comprising multiple inserts for use as sequencing templates in methods of error correction and identification of non-canonical base pairing, determining contiguity data, and methylation analysis.
Embodiment 1 is a polynucleotide comprising (a) a 5′ terminal polynucleotide comprising a first read primer binding sequence; (b) a first insert sequence located 3′ of the 5′ terminal polynucleotide, wherein the first insert sequence is derived from a target nucleic acid; (c) a concatenation sequence located 3′ of the first insert sequence comprising a second read primer binding sequence and a hybridization sequence; (d) a second insert sequence located 3′ of the concatenation sequence, wherein the second insert sequence is derived from a discontiguous sequence of the target nucleic acid or from a different target nucleic acid than the first insert sequence; and (e) a 3′ terminal polynucleotide sequence.
Embodiment 2 is a polynucleotide comprising a 3′ terminal polynucleotide comprising a first read primer binding sequence; a first insert sequence 5′ of the 3′ terminal polynucleotide that is derived from a target nucleic acid; a concatenation sequence comprising a second read primer binding sequence that is orthogonal to the first read primer binding sequence, wherein the second read primer binding sequence comprises a hybridization sequence; a second insert sequence 5′ of the concatenation sequence and derived from a discontiguous sequence of the target nucleic acid or from a different target nucleic acid than the first insert sequence; and an attachment polynucleotide at the 5′ end of the polynucleotide and comprising an attachment sequence, wherein the 3′ terminal polynucleotide, the concatenation sequence, and the attachment polynucleotide are not derived from the target nucleic acid.
Embodiment 3 is the polynucleotide of embodiment 1 or 2, wherein the two insert sequences are derived from different target nucleic acids.
Embodiment 4 is the polynucleotide of any of the preceding embodiments, wherein the first insert sequence and the second insert sequence each independently comprise from 40 to 400 nucleotides, 100 to 200 nucleotides, or 150 nucleotides.
Embodiment 5 is the polynucleotide of any of the preceding embodiments, wherein the first read primer binding sequence comprises a first adapter sequence.
Embodiment 6 is the polynucleotide of any of the preceding embodiments, wherein the first read primer binding sequence further comprises the complement of a transposon end sequence.
Embodiment 7 is the polynucleotide of embodiment 5 or 6, wherein the first adapter sequence is the complement of an A14 primer sequence (A14′) or the complement of a B15 primer sequence (B15′).
Embodiment 8 is the polynucleotide of any one of embodiments and 3 to 7, wherein the 3′ terminal polynucleotide comprises the complement of a P7 primer sequence (P7′) or the complement of a P5 primer sequence (P5′).
Embodiment 9 is the polynucleotide of any one of embodiments 2 to 7, wherein the 3′ terminal polynucleotide comprises the complement of a P5 primer sequence (P5′) and the attachment polynucleotide comprises a P7 primer sequence (P7), or the 3′ terminal polynucleotide comprises the complement of a P7 primer sequence (P7′) and the attachment polynucleotide comprises a P5 primer sequence (P5).
Embodiment 10 is the polynucleotide of any one of embodiments 2 to 9, wherein the concatenation sequence comprises (a) the hybridization sequence, and optionally comprises (b) a transposon end sequence 3′ of the hybridization unit and the complement of the transposon end sequence 5′ of the hybridization unit.
Embodiment 11 is the polynucleotide of embodiment 10, wherein the second read primer binding sequence comprises the hybridization sequence and the complement of the transposon end sequence.
Embodiment 12 is the polynucleotide of any one of embodiments 2 to 11, wherein the attachment polynucleotide comprises a second adapter sequence and optionally a transposon end sequence.
Embodiment 13 is the polynucleotide of embodiment 12, wherein the second adapter sequence is an A14 sequence or a B15 sequence.
Embodiment 14 is the polynucleotide of embodiment 13, wherein the first adapter sequence is the complement of an A14 sequence (A14′) and the second adapter sequence is a B15 sequence, or the first adapter sequence is the complement of a B15 sequence (B15′) and the second adapter sequence is an A14 sequence.
Embodiment 15 is the polynucleotide of any one of embodiments 2 to 7 or 9 to 14, wherein the 3′ terminal polynucleotide and/or the attachment polynucleotide each independently comprise at least one of a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence.
Embodiment 16 is the polynucleotide of any one of embodiments 2 to 7 and 9 to 14, wherein the polynucleotide is immobilized on a solid support.
Embodiment 17 is the polynucleotide of embodiment 16, wherein the polynucleotide is immobilized on the solid support via the attachment polynucleotide.
Embodiment 18 is the polynucleotide of embodiment 17, wherein the polynucleotide is immobilized on the solid support via hybridization of the attachment polynucleotide to an attachment polynucleotide complement on the surface of the solid support.
Embodiment 19 is the polynucleotide of embodiment 17, wherein the polynucleotide is immobilized to the solid support via binding of an affinity moiety on the attachment polynucleotide to a binding moiety on the surface of the solid support.
Embodiment 20 is the polynucleotide of any one of embodiments 16 to 19, wherein the solid support is a flow cell or a bead.
Embodiment 21 is the polynucleotide of any one of embodiments 2 to 7 or 9 to 20, wherein the polynucleotide comprises, between the second insert sequence and the attachment polynucleotide, at least one insert unit comprising an insert sequence derived from a discontiguous sequence of the target nucleic acid or from a different target nucleic acid than the other insert sequences at the 5′ end and a concatenation sequence comprising a read primer binding sequence at the 3′ end, wherein the read primer binding sequence is orthogonal to the other read primer binding sequences.
Embodiment 22 is the polynucleotide of embodiment 21, wherein the polynucleotide is hybridized to its complement.
Embodiment 23 is a composition comprising the polynucleotide of any one of embodiments 1, 3-8, or 22 and its complement, wherein the complement comprises (a) a 5′ terminal complement comprising a first complement read primer binding sequence; (b) a complement sequence of the second insert sequence located 3′ of the 5′ terminal complement; (c) a complement concatenation sequence located 3′ of the complement sequence of the second insert sequence comprising: (i) a second complement read primer binding sequence, and (ii) a complement hybridization sequence; (d) a complement sequence of the first insert sequence located 3′ of the complement concatenation sequence; and (e) a 3′ terminal complement.
Embodiment 24 is a composition comprising the polynucleotide of any one of embodiments 2 to 7 or 9 to 22 and its complement, wherein the complement comprises a 3′ terminal complement comprising a first complement read primer binding sequence, wherein the first complement read primer binding sequence is orthogonal to the first and second read primer binding sequences; the complement of the second insert sequence 5′ of the 3′ terminal complement; a complement concatenation sequence 5′ of the complement of the second insert sequence and comprising a 3′ to 5′ second complement read primer binding sequence, wherein the second complement read primer binding sequence is orthogonal to the first and second read primer binding sequences, and to the first complement read primer binding sequence; the complement of the first insert sequence 5′ of the complement concatenation sequence; and a complement attachment polynucleotide at the 5′ end comprising a complement attachment sequence.
Embodiment 25 is the composition of embodiment 24, wherein the first complement read primer binding sequence is complementary to the second adapter sequence and, when present, the transposon end sequence of the attachment polynucleotide; the complement concatenation sequence is complementary to the concatenation sequence; and the complement attachment polynucleotide is complementary to the first adapter sequence and, when present, the complement of the transposon end sequence.
Embodiment 26 is the composition of embodiment 24 or 25, wherein the polynucleotide is immobilized on a solid support via the first attachment polynucleotide.
Embodiment 27 is the composition of embodiment 24 or 25, wherein the complement is immobilized on the solid support via the complement attachment polynucleotide.
Embodiment 28 is the polynucleotide of any one of embodiments 2 to 7 or 9 to 22 or the composition of any one of embodiments 24 to 27, wherein the polynucleotide has the structure: 3′-P7′-B15′-ME′-Insert 1-ME-HYB-ME′-Insert 2-ME-A14-P5-5′, wherein ME′ is the complement of a mosaic end sequence (for example, SEQ ID NO: 3).
Embodiment 29 is the polynucleotide or composition of embodiment 28, wherein the complement of the polynucleotide has the structure: 3′-P5′-A14′-ME′-Insert 2-ME-HYB′-ME′-Insert 1-ME-B15-P7-5′.
Embodiment 30 is a transposome complex comprising a transposase; a first transposon comprising the complement of a first read primer binding sequence, wherein the complement of the first read primer binding sequence comprises a 3′ portion comprising a transposon end sequence; and the complement of a first adapter sequence; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence; and the complement of a hybridization sequence.
Embodiment 31 is the transposome complex of embodiment 30, wherein the complement of the first adapter sequence is a B15 sequence.
Embodiment 32 is the transposome complex of embodiment 30 or 31, wherein the second transposon comprises a complement attachment sequence 5′ of the first read primer binding sequence, optionally wherein the complement attachment sequence comprises a P7 sequence.
Embodiment 33 is the transposome complex of embodiment 30, wherein the transposome complex has the structure:
wherein ME is a mosaic end sequence such as SEQ ID NO: 6.
Embodiment 34 is the transposome complex of any one of embodiments 30 to 33, wherein the transposome complex is immobilized on a bead via the first or second transposon.
Embodiment 35 is a transposome complex comprising a transposase; a first transposon comprising an attachment polynucleotide, wherein the attachment polynucleotide comprises a 5′ portion comprising an attachment sequence; a 3′ portion comprising a second read primer binding sequence, comprising a 3′ portion comprising a transposon end sequence; and an adapter; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence; and a hybridization sequence.
Embodiment 36 is the transposome complex of embodiment 35, wherein the adapter is an A14 sequence.
Embodiment 37 is the transposome complex of embodiment 35 or 36, wherein the attachment sequence comprises a P5 sequence.
Embodiment 38 is the transposome complex of embodiment 35, wherein the transposome complex has the structure:
Embodiment 39 is the transposome complex of any one of embodiments 35 to 38, wherein the transposome complex is immobilized to a solid support via the first or second transposon.
Embodiment 40 is the transposome complex of any one of embodiments 35 to 38, wherein the transposome complex is immobilized on a bead.
Embodiment 41 is the transposome complex of any one of embodiments 30 to 40, wherein the transposome complex is immobilized to an affinity binding partner on the solid support or bead via an affinity element connected to a linker attached to the first or second transposon.
Embodiment 42 is a composition or kit comprising more than one transposome complex, such as the transposome complex of any one of embodiments 30 to 41.
Embodiment 43 is a composition or kit comprising a solid support, optionally wherein the solid support is beads; components for generating transposome complexes, comprising a transposase; oligonucleotides for generating an oligonucleotide duplex, wherein the first oligonucleotide comprises a 3′ transposon end sequence and a 5′ first adapter sequence and the second oligonucleotide comprises a 5′ transposon end sequence and a 3′ second adapter sequence, wherein the 5′ transposon end sequence is complementary to the 3′ transposon end sequence; wherein the first and second adapter sequences are not the same; and a first and second set of primers for adding attachment sequences and hybridization sequences to fragments by PCR, wherein the first set of primers comprises primers for adding a hybridization sequence and a first attachment sequence to fragments; and wherein the second set of primers comprises primers for adding a complement hybridization sequence and a second attachment sequence to fragments; wherein the first and second attachment sequences are not the same.
Embodiment 44 is an adapter composition or kit comprising a first forked adapter complex and a second forked adapter complex, wherein the first forked adapter complex comprises a complement attachment polynucleotide comprising a 5′ portion comprising a complement attachment sequence; and a 3′ portion comprising an adapter; and a hybridization polynucleotide comprising (a) a 5′ portion comprising the complement of a portion of the adapter and hybridized thereto; and (b) the complement of a hybridization sequence, wherein the complement of the hybridization sequence is not complementary to the complement attachment polynucleotide; and the second forked adapter complex comprises an attachment polynucleotide comprising a 5′ portion comprising an attachment sequence; and a 3′ portion comprising the adapter; and a hybridization polynucleotide comprising (a) a portion comprising the complement of a portion of the adapter and hybridized thereto; and (b) a hybridization sequence, wherein the hybridization sequence is not complementary to the attachment polynucleotide.
Embodiment 45 is the adapter composition or kit of embodiment 44, wherein the attachment sequence comprises a P5 primer sequence and the complement attachment sequence comprises a P7 primer sequence.
Embodiment 46 is the adapter composition or kit of embodiment 44 or wherein the complement attachment polynucleotide comprises a B15 sequence and the hybridization polynucleotide comprises an A14 sequence.
Embodiment 47 is the adapter composition or kit of embodiment 46, wherein a first forked adapter complex has the structure:
and a second forked adapter complex has the structure:
Embodiment 48 is the adapter composition or kit of any one of embodiments 44 to 47, wherein the adapter complexes comprise methylated nucleotides (e.g., include methylated cytosines).
Embodiment 49 is a method of generating a concatenated nucleic acid sequencing template comprising attaching a first read primer binding sequence to the 3′ end of a first insert sequence derived from a first target nucleic acid; attaching a hybridization sequence to the 5′ end of the first insert sequence; attaching the complement of the hybridization sequence to the 3′ end of a second insert sequence derived from a discontiguous region of the first target nucleic acid or from a second target nucleic acid; and annealing the hybridization sequence to the complement of the hybridization sequence to form a hybridized adduct; synthesizing a fully double-stranded concatenated nucleic acid sequencing template from the hybridized adduct; wherein the region between the first and second insert sequences comprises a second read primer binding sequence that comprises the hybridization sequence and is orthogonal to the first read primer binding sequence; thereby generating a concatenated nucleic acid sequencing template.
Embodiment 50 is the method of embodiment 49, wherein the attaching the first read primer binding sequence and the attaching the hybridization sequence comprises contacting the one or more target nucleic acids with a transposome complex, under conditions suitable for tagmentation.
Embodiment 51 is the method of embodiment 49 or 50, wherein the attaching the complement of the hybridization sequence to the 3′ end of a second insert sequence derived from a discontiguous region of the first target nucleic acid or from a second target nucleic acid comprises contacting the one or more target nucleic acids with a transposome complex, under conditions suitable for tagmentation.
Embodiment 52 is the method of embodiment 49, wherein the attaching a first read primer binding sequence to the 3′ end of a first insert sequence and the attaching a hybridization sequence to the 5′ end of the first insert sequence comprise contacting one or more target nucleic acids with a first forked adapter complex of any one of embodiments 44 to 48, under conditions suitable for ligation of the adapter complexes to the ends of the fragments to form fragments ligated at both ends with the first adapter complex and fragments ligated at both ends with the second adapter complex, and denaturing the ligated fragments.
Embodiment 53 is the method of embodiment 49 or 50, wherein attaching the complement of the hybridization sequence to the 3′ end of a second insert sequence comprises contacting the one or more target nucleic acids with a second forked adapter complex, under conditions suitable for ligation of the adapter complexes to the ends of the fragments to form fragments ligated at both ends with the first adapter complex and fragments ligated at both ends with the second adapter complex, and denaturing the ligated fragments.
Embodiment 54 is a method of generating a concatenated nucleic acid sequencing template comprising contacting a first sample comprising a first target nucleic acid with a first transposome complex and a second transposome complex, wherein each transposome complex comprises a transposase; a first transposon comprising a 3′ portion comprising a transposon end sequence and a 5′ portion comprising an adapter sequence; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence and hybridized thereto; wherein the adapter sequence in the first transposome complex is the complement of a first adapter sequence and the adapter sequence in the second transposome complex is a second adapter sequence; under conditions sufficient to fragment the first target nucleic acid to generate a first tagged product comprising an insert sequence from the first target nucleic acid tagged at one end with the transposons of the first transposome complex and at the other end with the transposons of the second transposome complex; adding a complement attachment sequence to the 3′ end of the first tagged product and adding the complement of a hybridization sequence to the 5′ end of the first tagged product, optionally by polymerase chain reaction, to form a first modified tagged product; contacting a second sample comprising a second target nucleic acid with the transposome complexes under conditions sufficient to fragment the second target nucleic acid to generate a second tagged product comprising an insert sequence from the second target nucleic acid tagged at one end with the transposons of the first transposome complex and at the other end with the transposons of the second transposome complex; adding an attachment sequence to the 3′ end of the second tagged product and adding a hybridization sequence to the 5′ end of the second tagged product, optionally by polymerase chain reaction, to form a second modified tagged product; annealing the hybridization sequence of the first modified tagged product to the complement of the hybridization sequence in the second modified tagged product to form a hybridized adduct; and synthesizing a fully double-stranded concatenated nucleic acid sequencing template from the hybridized adduct, wherein the concatenated nucleic acid sequencing template comprises (a) a first read primer binding sequence 3′ of the insert sequence from the second target nucleic acid, wherein the first read primer binding sequence comprises the first adapter sequence and the complement of the transposon end sequence, and (b) a second read primer binding sequence between the two insert sequences, wherein the second read primer binding sequence comprises the transposon end sequence and the hybridization sequence, and wherein the first read primer binding sequence is orthogonal to the second read primer binding sequence.
Embodiment 55 is a method of generating a concatenated nucleic acid sequencing template comprising contacting a first sample comprising a first target nucleic acid with a first transposome complex, wherein the first transposome complex comprises a transposase; a first transposon comprising a 3′ portion comprising a transposon end sequence and a 5′ portion comprising an attachment sequence and the complement of a first adapter sequence; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence and hybridized thereto; under conditions sufficient to fragment the first target nucleic acid to generate a first tagged product comprising an insert sequence from the first target nucleic acid tagged at each end with the transposons of the first transposome complex; adding the complement of a hybridization sequence to the 5′ end of the first tagged product, optionally by polymerase chain reaction, to form a first modified tagged product; contacting a second sample comprising a second target nucleic acid with a second transposome complex, wherein the second transposome complex comprises a transposase; a first transposon comprising a 3′ portion comprising a transposon end sequence and a 5′ portion comprising a second adapter sequence and a complement attachment sequence; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence and hybridized thereto; under conditions sufficient to fragment the second target nucleic acid to generate a second tagged product comprising an insert sequence from the second target nucleic acid tagged at each end with the transposons of the second transposome complex; adding the complement of the hybridization sequence to the 5′ end of the second tagged product, optionally by polymerase chain reaction, to form a second modified tagged product; annealing the hybridization sequence of the first modified tagged product to the complement of the hybridization sequence in the second modified tagged product to form a hybridized adduct; and synthesizing a fully double-stranded concatenated nucleic acid sequencing template from the hybridized adduct, wherein the concatenated nucleic acid sequencing template comprises (a) a first read primer binding sequence 3′ of the insert sequence from the second target nucleic acid, wherein the first read primer binding sequence comprises the first adapter sequence and the complement of the transposon end sequence, and (b) a second read primer binding sequence between the two insert sequences, wherein the second read primer binding sequence comprises the transposon end sequence and the hybridization sequence, and wherein the first read primer binding sequence is orthogonal to the second read primer binding sequence.
Embodiment 56 is the method of embodiment 54 or 55, wherein the transposome complexes are immobilized on a solid support.
Embodiment 57 is a method of generating a concatenated nucleic acid sequencing template comprising (a) contacting: (i) a first double-stranded polynucleotide comprising a first target nucleic acid with a first restriction enzyme, and (ii) a second double-stranded polynucleotide comprising a second target nucleic acid with a second restriction enzyme; to produce first and second polynucleotides with compatible overhangs, wherein the restriction enzymes are chosen from type II, type IIS, type IIP, and type IIT restriction enzymes; and (b) attaching the compatible overhangs of the first and second polynucleotides using a ligase.
Embodiment 58 is the method of embodiment 57, wherein the contacting step is preceded by: (a) attaching the first restriction enzyme cut site, optionally, by using an adapter, to a first target nucleic acid and generating the first double stranded polynucleotide by primer extension; and (b) attaching the second restriction enzyme cut site, optionally, by using an adapter, to a second target nucleic acid and generating the second double stranded polynucleotide by primer extension.
Embodiment 59 is a method of generating a concatenated nucleic acid sequencing template comprising: (a) shearing or digesting a first source of nucleic acids and a second source of nucleic acids to generate a first library of nucleic acid fragments and a second library of nucleic acid fragments, respectively; (b) attaching a first adapter to each nucleic acid fragment from the first source of nucleic acids and attaching a second adapter to each nucleic acid fragment of the second source of nucleic acids comprising: (i) contacting the nucleic acid fragments with a first polymerase to produce nucleic acid fragments with blunt ends; (ii) phosphorylating 5′-hydroxyl of the nucleic acid fragments with kinase; (iii) adding 3′ adenine to the nucleic acid fragments with a second polymerase; and (iv) ligating the first adapter to each nucleic acid fragment of the first library and ligating the second adapter to each nucleic acid fragment of the second library; (c) mixing and annealing the first and second libraries of nucleic acids, optionally by PCR, wherein (i) the nucleic acids denature at elevated temperatures and (ii) A and A′ sequences hybridize to each other at lower temperatures; and (d) synthesizing a fully double-stranded concatenated nucleic acid sequencing template, optionally by PCR.
Embodiment 60 is the method of any one of embodiments 54 to 59, wherein the method comprises sequencing the concatenated nucleic acid sequencing template.
Embodiment 61 is a method of sequencing a concatenated nucleic acid sequencing template comprising sequencing the first insert sequence of a polynucleotide of any one of embodiments 1 to 22 by initiating sequencing with a first read sequencing primer complementary to the first read primer binding sequence; and sequencing the second insert sequence by initiating sequencing with a second read sequencing primer complementary to the second read primer binding sequence.
Embodiment 62 is the method of embodiment 61, wherein the method further comprises sequencing the complement of the second insert sequence by initiating sequencing with a first complement read sequencing primer complementary to the first complement read primer binding sequence; and sequencing the complement of the first insert sequence by initiating sequencing with a second complement read sequencing primer complementary to the second complement read primer binding sequence.
Embodiment 63 is the method of any one of embodiments 49 to 59, wherein a sample comprising target double-stranded nucleic acid is compartmentalized into a plurality of different compartments and the concatenated nucleic acid sequencing templates are generated within the different compartments.
Embodiment 64 is a polynucleotide comprising (a) a 5′ terminal polynucleotide comprising a first read sequencing primer sequence; (b) an insert sequence derived from a target nucleic acid, wherein the insert sequence is 3′ of the 5′ terminal polynucleotide; (c) a hybridization sequence 3′ of the insert sequence; (d) a copy of the insert sequence 3′ of the hybridization sequence; and (e) a 3′ terminal polynucleotide comprising the complement of a second read sequencing primer sequence.
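By way of non-limiting illustration only, the following minimal Python sketch shows one way reads of an insert sequence and of a copy of that insert sequence might be compared downstream to flag positions where the two disagree. The function name `consensus_of_copies` and the masking character are hypothetical, and the sketch assumes that both reads have already been placed in the same orientation and trimmed to equal length; it is not presented as the analysis method of this disclosure.

```python
from typing import List, Tuple


def consensus_of_copies(read_a: str, read_b: str, mask: str = "N") -> Tuple[str, List[int]]:
    """Compare two reads of the same insert (hypothetical helper).

    Returns a consensus string in which disagreeing positions are masked,
    together with the 0-based indices of those positions.
    """
    if len(read_a) != len(read_b):
        raise ValueError("reads must have equal length")
    consensus = []
    mismatches = []
    for i, (a, b) in enumerate(zip(read_a.upper(), read_b.upper())):
        if a == b:
            consensus.append(a)
        else:
            # Disagreement: could reflect a random sequencing or amplification
            # error in one copy, so the base call is not trusted here.
            consensus.append(mask)
            mismatches.append(i)
    return "".join(consensus), mismatches
```

A position reported in the mismatch list would be a candidate for either a random error in one copy or, where the copy derives from the opposite strand, a site of non-canonical base pairing; distinguishing the two would require additional information not modeled in this sketch.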
Embodiment 65 is a polynucleotide comprising (a) a 5′ terminal polynucleotide comprising a first read sequencing primer sequence; (b) a first insert sequence derived from a target nucleic acid, wherein the first insert sequence is 3′ of the 5′ terminal polynucleotide; (c) a hybridization sequence 3′ of the first insert sequence; (d) a second insert sequence 3′ of the hybridization sequence; and (e) a 3′ terminal polynucleotide comprising the complement of a second read sequencing primer sequence.
Embodiment 66 is the polynucleotide of embodiment 64 or 65, wherein the insert sequences comprise 40 to 400 nucleotides, optionally wherein the insert sequences comprise 1000 or fewer nucleotides.
Embodiment 67 is the polynucleotide of any one of embodiments 64 to 66, wherein the hybridization sequence comprises 10 to 30 nucleotides, optionally wherein one or more nucleotide in the hybridization sequence is a locked nucleic acid.
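By way of non-limiting illustration only, the following minimal Python sketch checks whether a candidate hybridization sequence falls within the 10 to 30 nucleotide range and derives its complement. The function names and the example sequence are hypothetical. The melting temperature estimate uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is a technique introduced here for illustration, is only a coarse approximation for short oligonucleotides, and does not account for locked nucleic acids or buffer composition.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_hybridization_sequence(hyb: str, min_len: int = 10, max_len: int = 30) -> dict:
    """Summarize a candidate hybridization sequence (hypothetical helper)."""
    hyb = hyb.upper()
    if set(hyb) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    at = hyb.count("A") + hyb.count("T")
    gc = hyb.count("G") + hyb.count("C")
    return {
        # Length range taken from the 10 to 30 nucleotide range stated above.
        "length_ok": min_len <= len(hyb) <= max_len,
        # Sequence of the complementary strand, written 5' to 3'.
        "complement": reverse_complement(hyb),
        # Wallace rule of thumb; an assumption, not a value from this disclosure.
        "rough_tm_celsius": 2 * at + 4 * gc,
    }
```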
Embodiment 68 is the polynucleotide of any one of embodiments 64 to 67, wherein the first read sequencing primer sequence and the second read sequencing primer sequence are different.
Embodiment 69 is the polynucleotide of any one of embodiments 64 to 68, wherein the first read sequencing primer sequence and the second read sequencing primer sequence each comprise an A14 sequence or a B15 sequence, or their complements.
Embodiment 70 is the polynucleotide of any one of embodiments 64 to 69, wherein the 3′ terminal polynucleotide comprises the complement of a P5 primer sequence (P5′) and the 5′ terminal polynucleotide comprises a P7 primer sequence (P7 (SEQ ID NO: 8)), or the 3′ terminal polynucleotide comprises the complement of a P7 primer sequence (P7′) and the 5′ terminal polynucleotide comprises a P5 primer sequence (P5 (SEQ ID NO: 7)).
Embodiment 71 is the polynucleotide of any one of embodiments 64 to 70, wherein the 3′ terminal polynucleotide and/or the 5′ terminal polynucleotide each independently comprise at least one of an adapter, a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence.
Embodiment 72 is the polynucleotide of any one of embodiments 64 to 71, wherein the polynucleotide is immobilized on a solid support.
Embodiment 73 is the polynucleotide of embodiment 72, wherein the polynucleotide is immobilized on the solid support via the 5′ terminal polynucleotide.
Embodiment 74 is the polynucleotide of embodiment 73, wherein the polynucleotide is immobilized to the solid support via binding of an affinity moiety on the 5′ terminal polynucleotide to a binding moiety on the surface of the solid support.
Embodiment 75 is the polynucleotide of any one of embodiments 64 to 74, wherein an affinity moiety is attached via a linker to the 5′ terminal polynucleotide.
Embodiment 76 is the polynucleotide of any one of embodiments 64 to wherein the affinity moiety is biotin, desthiobiotin, or dual biotin.
Embodiment 77 is the polynucleotide of any one of embodiments 64 or 66 to 76, wherein the polynucleotide has the structure 5′-P5-A14-Insert-HYB-Insert-B15′-P7′-3′ or 5′-P7-B15-Insert-HYB′-Insert-A14′-P5′-3′, wherein HYB is a hybridization sequence and HYB′ is the complement of a hybridization sequence.
Embodiment 78 is the polynucleotide of any one of embodiments 65 to 77, wherein the polynucleotide has the structure 5′-P5-A14-Insert1-HYB-Insert2-B15′-P7′-3′ or 5′-P7-B15-Insert1-HYB′-Insert2-A14′-P5′-3′; wherein HYB is a hybridization sequence and HYB′ is the complement of a hybridization sequence.
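By way of non-limiting illustration only, the following minimal Python sketch assembles a string with the layout 5′-P5-A14-Insert1-HYB-Insert2-B15′-P7′-3′ and confirms that its reverse complement begins with P7-B15. The function names are hypothetical, and any sequences passed in by a user are placeholders chosen by that user; the sketch does not reproduce the sequences of the Sequence Listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_template(p5: str, a14: str, insert1: str, hyb: str,
                   insert2: str, b15: str, p7: str) -> str:
    """Assemble 5'-P5-A14-Insert1-HYB-Insert2-B15'-P7'-3' (hypothetical helper).

    B15' and P7' are written as reverse complements so that the opposite
    strand reads 5'-P7-B15-...-3'.
    """
    parts = [p5, a14, insert1, hyb, insert2,
             reverse_complement(b15), reverse_complement(p7)]
    return "".join(part.upper() for part in parts)


def opposite_strand(template: str) -> str:
    """Return the complementary strand of the template, written 5' to 3'."""
    return reverse_complement(template)
```

Under these assumptions, the opposite strand has the layout 5′-P7-B15-(complement of Insert2)-HYB′-(complement of Insert1)-A14′-P5′-3′, so a primer-binding site placed at HYB on one strand has a counterpart at HYB′ on the other.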
Embodiment 79 is a composition comprising the polynucleotide of any one of embodiments 64 to 78 hybridized to its complement.
Embodiment 80 is a composition comprising the polynucleotide of any one of embodiments 64 to 78 or a composition of embodiment 79 immobilized on the surface of a solid support, wherein the affinity moiety is biotin, desthiobiotin, or dual biotin and the binding moiety is avidin or streptavidin.
Embodiment 81 is the composition of embodiment 80, wherein the solid support is a bead, slide, wall of a vessel, a flow cell, or a nanowell comprised in a flow cell.
Embodiment 82 is a forked adapter comprising two polynucleotide strands comprising (a) a first strand comprising a sequencing primer sequence and (b) a second strand comprising a 3′ hybridization sequence or its complement, wherein the 3′ end of the first strand is fully or partially complementary to the 5′ end of the second strand.
Embodiment 83 is the forked adapter of embodiment 82, wherein the hybridization sequence or its complement is bound to a blocking oligonucleotide that is fully or partially complementary to the hybridization sequence or its complement.
Embodiment 84 is the forked adapter of embodiment 83, wherein the hybridization sequence or its complement is bound to a blocking oligonucleotide that is fully complementary to the hybridization sequence or its complement.
Embodiment 85 is the forked adapter of any one of embodiments 82 to 84, wherein the first strand and/or second strand further comprise at least one of an adapter, a barcode sequence, a unique molecular identifier (UMI) sequence, a sample index sequence, a capture sequence, or a cleavage sequence.
Embodiment 86 is the forked adapter of any one of embodiments 82 to 85, wherein the first strand and/or the second strand further comprises a P7 or P5 primer sequence, or their complements.
Embodiment 87 is the forked adapter of any one of embodiments 82 to 86, wherein the sequencing primer sequence comprises a B15 sequence (SEQ ID NO: 6) or an A14 sequence (SEQ ID NO: 4), or their complements.
Embodiment 88 is the forked adapter of any one of embodiments 82 to 87, wherein the first strand comprises a 5′ affinity element capable of binding to an affinity binding partner on a solid support or bead.
Embodiment 89 is the forked adapter of embodiment 88, wherein the affinity element is connected via a linker attached to the first strand.
Embodiment 90 is a composition or kit comprising two forked adapters of any one of embodiments 82 to 89, wherein (a) the first forked adapter comprises a first strand comprising a first read sequencing primer sequence and a second strand comprising a complement of a hybridization sequence and (b) the second forked adapter comprises a first strand comprising a second read sequencing primer sequence and a second strand comprising a hybridization sequence.
Embodiment 91 is the composition or kit of any one of embodiments 44 to 48 or 90, wherein one or both forked adapters comprise a blocking oligonucleotide.
Embodiment 92 is a method of generating one or more concatenated nucleic acid sequencing templates comprising (a) contacting a sample comprising double-stranded nucleic acid fragments each comprising an insert prepared from a target nucleic acid with the composition or kit comprising two forked adapters, wherein one or both forked adapters comprise a blocking oligonucleotide, optionally wherein the first read sequencing adapter sequence comprises a first read primer binding sequence; (b) ligating the forked adapters to the double-stranded fragments to prepare tagged double-stranded fragments; (c) immobilizing the tagged double-stranded fragments on a solid support; (d) denaturing (1) the immobilized tagged double-stranded fragments to produce immobilized single-stranded fragments and (2) the blocking oligonucleotides to unblock hybridization sequences and complements of hybridization sequences; (e) hybridizing two immobilized single-stranded fragments to each other to form a bridge by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment; and (f) extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both immobilized single-stranded fragments.
Embodiment 93 is a method of generating one or more concatenated nucleic acid sequencing templates comprising (a) contacting a sample comprising double-stranded target nucleic acid with two pools of transposome complexes in solution; wherein the first pool of transposome complexes comprises (i) a transposase; (ii) a first transposon comprising a 3′ transposon end sequence and a first read sequencing adapter sequence; and (iii) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ complement of a hybridization sequence; and wherein the second pool of transposome complexes comprises (i) a transposase; (ii) a first transposon comprising a 3′ transposon end sequence and a second read sequence adapter sequence; and (iii) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ hybridization sequence, wherein one or both second transposons comprise a blocking oligonucleotide; (b) tagmenting the double-stranded nucleic acids to produce tagged double-stranded fragments; (c) releasing the transposome complex from the double-stranded fragments; (d) extending and ligating the double-stranded fragments; (e) immobilizing the tagged double-stranded fragments on a solid support; (f) denaturing (1) the immobilized tagged double-stranded fragments to produce immobilized single-stranded fragments and (2) the blocking oligonucleotides to unblock hybridization sequences and complements of hybridization sequences; (g) hybridizing two immobilized single-stranded fragments to each other to form a bridge by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment; and (h) extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both immobilized single-stranded fragments.
Embodiment 94 is the method of embodiment 92 or 93, wherein the denaturing is performed with an increase in temperature, change in pH, and/or addition of one or more chaotropic agents.
Embodiment 95 is the method of embodiment 94, wherein the increase in temperature is an increase from 45° C.-55° C. to 85° C.-95° C., optionally wherein the increase in temperature is an increase from 50° C. to 90° C.
Embodiment 96 is the method of any one of embodiments 92 to 95, wherein the one or more chaotropic agents comprise formamide and/or NaOH.
Embodiment 97 is the method of any one of embodiments 92 to 96, wherein the immobilizing is by binding of an affinity moiety (1) comprised in the first and/or second forked adapter or (2) comprised in a tag from a second transposome to one or more binding moieties on the surface of the solid support.
Embodiment 98 is the method of any one of embodiments 92 to 97, wherein the affinity moiety is biotin, desthiobiotin, or dual biotin and the binding moiety is avidin or streptavidin.
Embodiment 99 is the method of any one of embodiments 92 to 98, wherein one or more additional rounds of denaturing, hybridizing, and extending are performed.
Embodiment 100 is the method of any one of embodiments 92 to 99, wherein a first single-stranded fragment comprises an insert and a second single-stranded fragment comprises an insert that is the complement of the insert comprised in the first fragment.
Embodiment 101 is the method of any one of embodiments 92 to 100, wherein a first single-stranded fragment comprises an insert and a second fragment comprises an insert that is not the complement of the insert comprised in the first fragment.
Embodiment 102 is the method of any one of embodiments 92 to 101, wherein hybridizing occurs between single-stranded fragments prepared from double-stranded fragments comprising (1) a first forked adapter ligated at one end of each fragment and a second forked adapter ligated at the other end of each fragment or (2) a tag from a second transposon of a first transposome complex at one end of each fragment and a tag from a second transposon of a second transposome at the other end of each fragment.
Embodiment 103 is the method of any one of embodiments 92 to 102, wherein two immobilized single-stranded fragments do not hybridize to each other to form a bridge in the absence of binding of a hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment.
Embodiment 104 is the method of embodiment 103, wherein the hybridizing two immobilized single-stranded fragments to each other to form a bridge does not occur between single-stranded fragments prepared from double-stranded fragments comprising (1) the same forked adapter ligated at both ends of each fragment or (2) a tag from the same transposome complex at both ends of each fragment.
Embodiment 105 is a method of generating one or more concatenated nucleic acid sequencing templates comprising (a) compartmentalizing a sample comprising target double-stranded nucleic acid into a plurality of different compartments; (b) preparing fragments each comprising an insert from the double-stranded nucleic acid within the plurality of different compartments; (c) contacting the plurality of different compartments with a composition or kit comprising two forked adapters of embodiment 91, wherein one or both forked adapters comprise a blocking oligonucleotide; (d) ligating the forked adapters to the double-stranded fragments to prepare tagged double-stranded fragments within the plurality of different compartments; (e) denaturing (1) the immobilized tagged double-stranded fragments to produce single-stranded fragments and (2) the blocking oligonucleotides to unblock hybridization sequences and complements of hybridization sequences within the plurality of different compartments; (f) hybridizing two single-stranded fragments within the same compartment to each other to form a bridge by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment; and (g) extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both single-stranded fragments within the same compartment.
Embodiment 106 is the method of embodiment 105, wherein the target double-stranded nucleic acid comprises double-stranded DNA fragments, and the preparing fragments prepares subfragments of the double-stranded DNA fragments.
Embodiment 107 is the method of embodiment 63, 105 or 106, wherein the compartments are wells, tubes, or droplets.
Embodiment 108 is the method of any one of embodiments 105 to 107, wherein the denaturing is performed with an increase in temperature, change in pH, and/or addition of one or more chaotropic agents.
Embodiment 109 is the method of embodiment 108, wherein the increase in temperature is an increase from 45° C.-55° C. to 85° C.-95° C., optionally wherein the increase in temperature is an increase from 50° C. to 90° C.
Embodiment 110 is the method of embodiment 108 or 109, wherein the one or more chaotropic agents comprise formamide and/or NaOH.
Embodiment 111 is the method of any one of embodiments 105 to 110, wherein one or more additional rounds of denaturing, hybridizing, and extending are performed.
Embodiment 112 is the method of any one of embodiments 105 to 111, wherein a first single-stranded fragment comprises an insert and a second single-stranded fragment comprises an insert that is the complement of the insert comprised in the first fragment.
Embodiment 113 is the method of any one of embodiments 105 to 111, wherein a first single-stranded fragment comprises an insert and a second fragment comprises an insert that is not the complement of the insert comprised in the first fragment.
Embodiment 114 is the method of any one of embodiments 105 to 113, wherein hybridizing occurs between single-stranded fragments prepared from double-stranded fragments comprising a first forked adapter ligated at one end of each fragment and a second forked adapter ligated at the other end of each fragment.
Embodiment 115 is the method of any one of embodiments 105 to 114, wherein single-stranded fragments do not hybridize to each other in the absence of binding of a hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment.
Embodiment 116 is the method of embodiment 115, wherein the hybridizing two single-stranded fragments to each other does not occur between single-stranded fragments prepared from double-stranded fragments comprising the same forked adapter ligated at both ends of each fragment.
Embodiment 117 is the method of any one of embodiments 63 or 105 to 116, wherein the compartmentalizing comprises dilution of the sample such that most compartments comprise one or no target double-stranded nucleic acid.
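By way of non-limiting illustration only, the following minimal Python sketch estimates how dilute a sample might need to be for most compartments to comprise one or no target nucleic acid. It assumes, as a modeling choice not stated in this disclosure, that molecules distribute independently among compartments so that occupancy follows a Poisson distribution with mean `lam` molecules per compartment. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a compartment holds exactly k molecules (Poisson model)."""
    if lam < 0:
        raise ValueError("mean occupancy must be non-negative")
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fraction_with_at_most_one(lam: float) -> float:
    """Fraction of compartments holding zero or one molecule."""
    return poisson_pmf(0, lam) + poisson_pmf(1, lam)


def fraction_of_occupied_with_one(lam: float) -> float:
    """Among non-empty compartments, fraction holding exactly one molecule."""
    if lam <= 0:
        raise ValueError("mean occupancy must be positive")
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))
```

Under this assumed model, lowering the mean occupancy raises the fraction of occupied compartments that hold a single molecule, at the cost of leaving more compartments empty.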
Embodiment 118 is the method of embodiment 117, wherein inserts comprised in the same concatenated sequencing templates were prepared from the same target nucleic acid.
Embodiment 119 is the method of any one of embodiments 63 or 105 to 118, wherein the compartmentalizing separates most different haplotypes into different compartments and the method is used for haplotype phasing.
Embodiment 120 is the method of embodiment 119, wherein the haplotype phasing does not require barcodes.
Embodiment 121 is a solid support comprising two pools of immobilized transposome complexes, wherein (a) the first pool of transposome complexes comprises (i) a transposase; (ii) a first transposon comprising a 3′ transposon end sequence, a first read sequencing adapter sequence, and a 5′ affinity moiety; and (iii) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ complement of a hybridization sequence; and (b) the second pool of transposome complexes comprises (i) a transposase; (ii) a first transposon comprising a 3′ transposon end sequence, a second read sequence adapter sequence, and a 5′ affinity moiety; and (iii) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ hybridization sequence, wherein each first transposon is immobilized by binding of a 5′ affinity moiety to a binding moiety on the surface of the solid support.
Embodiment 122 is the solid support of embodiment 121, wherein the first or second pool of transposome complexes comprises the transposome complex of any one of embodiments 30 to 42, wherein the first read sequencing adapter sequence comprises a first read primer binding sequence.
Embodiment 123 is the solid support of embodiment 121 or 122, wherein the first and/or second pool of transposome complexes comprises homodimers and/or heterodimers.
Embodiment 124 is the solid support of embodiment 122 or 123, wherein the solid support is a bead, slide, wall of a vessel, a flow cell, or a nanowell comprised in a flow cell.
Embodiment 125 is the solid support of any one of embodiments 121 to 124, wherein one or more transposons comprises an index sequence and/or a UMI.
Embodiment 126 is the solid support of embodiment 125, wherein a first transposon comprised in a first pool of transposome complexes and/or a first transposon comprised in a second pool of transposome complexes comprise sample indexes.
Embodiment 127 is the solid support of embodiment 126, wherein both a first transposon comprised in a first pool of transposome complexes and a first transposon comprised in a second pool of transposome complexes comprise sample indexes.
Embodiment 128 is the solid support of any one of embodiments 121 to 127, wherein a second transposon comprised in a first pool of transposome complexes and/or a second transposon comprised in a second pool of transposome complexes comprise sample indexes and/or unique molecular identifiers (UMIs).
Embodiment 129 is the solid support of embodiment 128, wherein both a second transposon comprised in a first pool of transposome complexes and a second transposon comprised in a second pool of transposome complexes comprise sample indexes.
Embodiment 130 is the solid support of embodiment 128 or embodiment 129, wherein both a second transposon comprised in a first pool of transposome complexes and a second transposon comprised in a second pool of transposome complexes comprise UMIs.
Embodiment 131 is a method of generating one or more double-stranded concatenated nucleic acid sequencing templates comprising (a) applying a sample comprising a double-stranded nucleic acid immobilized to a solid support; (b) tagmenting the double-stranded nucleic acids to produce tagged double-stranded fragments comprising inserts from the double-stranded nucleic acid, wherein the double-stranded fragments are immobilized to the solid support by binding of the 5′ affinity moieties to a binding moiety on the surface of the solid support; (c) releasing the transposome complex from the double-stranded fragments; (d) extending and ligating the double-stranded fragments; (e) denaturing the double-stranded fragments into single-stranded fragments, wherein single-stranded fragments comprising a 5′ affinity moiety remain immobilized on the solid support; (f) allowing hybridization of a hybridization sequence comprised in a first immobilized single-stranded fragment to a complement of a hybridization sequence comprised in a second immobilized single-stranded fragment thereby forming a bridge; and (g) extending and generating a double-stranded concatenated nucleic acid sequencing template.
Embodiment 132 is the method of embodiment 131, wherein releasing the transposome complex from the double-stranded fragments is performed with SDS.
Embodiment 133 is the method of embodiment 131 or 132, wherein allowing hybridization comprises cooling the solid support and/or applying a hybridization buffer.
Embodiment 134 is the method of embodiment 133, wherein the cooling comprises reducing the temperature of the solid support to 60° C. or cooler.
Embodiment 135 is the method of embodiment 133 or 134, wherein the hybridization buffer comprises a high salt concentration, optionally wherein the high salt concentration is 750 mM NaCl.
Embodiment 136 is the method of any one of embodiments 131 to 135, wherein the denaturing comprises heating the solid support or applying a chemical denaturant.
Embodiment 137 is the method of embodiment 136, wherein the denaturing comprises increasing the temperature of the solid support to 90° C. or warmer.
Embodiment 138 is the method of any one of embodiments 131 to 137, wherein extending comprises providing polymerase, dNTPs, and extension buffer.
Embodiment 139 is the method of any one of embodiments 131 to 138, further comprising additional rounds of allowing hybridization and extending and generating a double-stranded concatenated nucleic acid sequencing template.
Embodiment 140 is the method of any one of embodiments 131 to 139, wherein hybridization of a hybridization sequence comprised in a first immobilized single-stranded fragment to a complement of a hybridization sequence comprised in a second immobilized single-stranded fragment only occurs when the first and second fragment are at a proximity to each other on the surface of the solid support that is closer than the length of the longer of the first or second fragment.
Embodiment 141 is the method of any one of embodiments 131 to 140, wherein the first immobilized fragment and the second immobilized fragment are immobilized in close proximity on the solid support, wherein the close proximity allows binding of 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more nucleotides comprised in the hybridization sequence comprised in the first immobilized fragment to nucleotides comprised in the complement of the hybridization sequence comprised in the second immobilized fragment.
Embodiment 142 is the method of any one of embodiments 131 to 141, wherein the first immobilized fragment and the second immobilized fragment are immobilized within 20 to 500 nanometers of each other on the surface of the solid support.
Embodiment 143 is the method of any one of embodiments 93 to 121 or 131 to 142, wherein the sample comprises multiple double-stranded nucleic acids.
Embodiment 144 is the method of embodiment 143, wherein both the first and the second immobilized fragments are prepared from the same double-stranded nucleic acid, and the double-stranded concatenated nucleic acid sequencing template comprises two inserts from the same double-stranded nucleic acid.
Embodiment 145 is the method of embodiment 144, wherein the two inserts are from two contiguous sequences comprised in the same double-stranded nucleic acid.
Embodiment 146 is the method of embodiment 144, wherein the two inserts are from two proximal sequences comprised in the same double-stranded nucleic acid, wherein the proximal sequences are separated by 100 or less nucleotides, 200 or less nucleotides, 300 or less nucleotides, 400 or less nucleotides, 500 or less nucleotides, 700 or less nucleotides, or 1,000 or less nucleotides in the double-stranded nucleic acid.
Embodiment 147 is the method of embodiment 146, wherein an area of the solid support comprises multiple double-stranded concatenated nucleic acid sequencing templates that share common insert sequences from proximal sequences comprised in the same double-stranded nucleic acid.
Embodiment 148 is a double-stranded concatenated nucleic acid sequencing template prepared by the method of any one of embodiments 131 to 147, wherein the structure of the template comprises (a) 5′-P5-i5-A14-ME-Insert1-ME′-HYB-ME-Insert2-ME′-B15′-i7′-P7′-3′; (b) 5′-P5-A14-ME-Insert1-ME′-i6-HYB-i8′-ME-Insert2-ME′-B15′-P7′-3′; or (c) 5′-P5-i5-A14-ME-Insert1-ME′-i6-HYB-i8′-ME-Insert2-ME′-B15′-i7′-P7′-3′, or their complements.
Embodiment 149 is the method of any one of embodiments 131 to 148, further comprising (a) releasing double-stranded concatenated nucleic acid sequencing templates from the solid support; and (b) sequencing the templates to determine insert sequences comprised in the templates.
Embodiment 150 is the method of embodiment 149, wherein the releasing comprises enzymatic digestion or chemical cleavage.
Embodiment 151 is the method of embodiment 149 or 150, further comprising amplifying the templates after releasing and before sequencing.
Embodiment 152 is a method of generating one or more concatenated nucleic acid sequencing templates comprising (a) compartmentalizing a sample comprising target double-stranded nucleic acid into a plurality of different compartments; (b) tagmenting the double-stranded nucleic acids to produce tagged double-stranded fragments comprising inserts from the double-stranded nucleic acid within the plurality of different compartments, wherein the tagmenting is performed with two pools of transposome complexes, wherein the first pool of transposome complexes comprises (i) a transposase; (ii) a first transposon comprising a 3′ transposon end sequence and a first read sequencing adapter sequence; and (iii) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ complement of a hybridization sequence; and wherein the second pool of transposome complexes comprises (i) a transposase; (ii) a first transposon comprising a 3′ transposon end sequence and a second read sequencing adapter sequence; and (iii) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ hybridization sequence; (c) denaturing the tagged double-stranded fragments to produce single-stranded fragments; (d) hybridizing two single-stranded fragments within the same compartment to each other by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment; and (e) extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both single-stranded fragments.
Embodiment 153 is the method of embodiment 152, wherein double-stranded concatenated nucleic acid sequencing templates are only produced from hybridizing of two single-stranded fragments present in the same compartment.
Embodiment 154 is the method of embodiment 152 or 153, wherein the hybridization sequence and/or its complement is bound to a blocking oligonucleotide that is fully or partially complementary to the hybridization sequence or its complement, and the denaturing comprises denaturing the blocking oligonucleotide to unblock the hybridization sequence and/or its complement.
Embodiment 155 is the method of any one of embodiments 152 to 154, wherein the transposome complexes are in solution.
Embodiment 156 is the method of any one of embodiments 152 to 155, wherein the compartments are wells, tubes, or droplets.
Embodiment 157 is the method of any one of embodiments 152 to 156, wherein the denaturing is performed with an increase in temperature, change in pH, and/or addition of one or more chaotropic agents.
Embodiment 158 is the method of embodiment 157, wherein the increase in temperature is an increase from 45° C.-55° C. to 85° C.-95° C., optionally wherein the increase in temperature is an increase from 50° C. to 90° C.
Embodiment 159 is the method of embodiment 157 or 158, wherein the one or more chaotropic agents comprise formamide and/or NaOH.
Embodiment 160 is the method of any one of embodiments 152 to 159, wherein one or more additional rounds of denaturing, hybridizing, and extending are performed.
Embodiment 161 is the method of any one of embodiments 152 to 160, wherein a first single-stranded fragment comprises an insert and a second fragment comprises an insert that is not the complement of the insert comprised in the first fragment.
Embodiment 162 is the method of any one of embodiments 152 to 161, wherein the compartmentalizing comprises dilution of the sample such that most compartments comprise one or no target double-stranded nucleic acid.
Embodiment 163 is the method of embodiment 162, wherein inserts comprised in the same concatenated sequencing templates were prepared from the same target nucleic acid.
Embodiment 164 is the method of any one of embodiments 63 or 152 to 163, wherein the compartmentalizing separates most different haplotypes into different compartments and the method is used for haplotype phasing.
Embodiment 165 is the method of embodiment 164, wherein the haplotype phasing does not require barcodes.
Embodiment 166 is the method of any one of embodiments 93 to 121 or 131 to 165, further comprising amplifying the templates.
Embodiment 167 is the method of any one of embodiments 49-55, 57-59, 93 to 121, or 131 to 166, further comprising sequencing the templates.
Embodiment 168 is the method of embodiment 167, wherein sequencing is performed using sequencing primers that bind to A14, B15, and/or a hybridization sequence (HYB).
Embodiment 169 is the method of embodiment 167 or 168, wherein sequencing comprises dark cycles wherein data are not being recorded for a portion of the sequencing.
Embodiment 170 is the method of embodiment 169, wherein the data not being recorded are sequence data associated with the 3′ transposon end sequence or its complement.
Embodiment 171 is the method of any one of embodiments 167 to 170, further comprising (a) evaluating sequences of inserts comprised in the same template; and (b) determining proximity data for sequences comprised in the double-stranded nucleic acid based on inserts that are comprised in the same template.
Embodiment 172 is the method of embodiment 171, wherein the proximity data are determinations that insert sequences (or their complements) were comprised in the same target nucleic acid.
Embodiment 173 is the method of any one of embodiments 167 to 172, further comprising (a) evaluating sequencing results from multiple sequences of a given insert prepared from different templates; and (b) determining instances of non-canonical base pairing based on the sequencing data from (i) the insert and its complement comprised in the same concatenated sequencing template; and/or (ii) the insert comprised in multiple concatenated sequencing templates.
Embodiment 174 is the method of any one of embodiments 167 to 173, further comprising evaluating sequencing results from multiple sequences of a given insert prepared from different templates; and correcting errors in sequencing results for this insert based on the sequencing data from (i) the insert and its complement comprised in the same concatenated sequencing template; and/or (ii) the insert comprised in multiple concatenated sequencing templates.
Embodiment 175 is a method of identifying modified cytosines comprised in an insert sequence comprised in a concatenated sequencing template, comprising (a) preparing a double-stranded concatenated sequencing template, wherein each strand comprises an insert sequence and a copy of the insert sequence and the two strands are complementary to each other; (b) subjecting the double-stranded concatenated sequencing template to a condition for altering modified and/or unmodified cytosines; (c) preparing amplicons of each strand of the double-stranded concatenated sequencing template; (d) sequencing amplicons and evaluating sequencing results for the insert sequence and the copy of the insert sequence in the amplicons produced from each strand; and (e) determining positions of modified cytosines comprised in the insert sequence based on the sequences of each strand of the double-stranded concatenated sequencing template.
Embodiment 176 is the method of embodiment 175, wherein the modified cytosines are methylated or hydroxymethylated cytosines.
Embodiment 177 is the method of embodiment 175 or 176, wherein the concatenated sequencing templates are prepared by the method of any one of embodiments 93 to 121 or 131 to 165.
Embodiment 178 is the method of embodiment 177, wherein extension to produce the double-stranded concatenated sequencing template is performed with a reaction solution comprising methylated-dCTP.
Embodiment 179 is the method of any one of embodiments 175 to 178, wherein uracils comprised in the concatenated sequencing templates are converted to thymines when preparing amplicons.
Embodiment 180 is the method of any one of embodiments 175 to 179, wherein modified cytosines or unmodified cytosines are altered, optionally wherein modified cytosines are altered by TET-Assisted Pyridine Borane Sequencing (TAPS) treatment or unmodified cytosines are altered by sodium bisulfite or enzymatic treatment.
Embodiment 181 is the method of embodiment 180, wherein modified cytosines are altered and the positions of modified cytosines are determined by the presence of (T,C) in the insert sequence and the copy of the insert sequence, respectively, and the positions of unmodified cytosines are determined by the presence of (C,C) in the insert sequence and the copy of the insert sequence, respectively, and wherein the modified and unmodified cytosines are paired with G's in the complementary strand.
Embodiment 182 is the method of embodiment 180, wherein unmodified cytosines are altered and the positions of modified cytosines are determined by the presence of (C,T) in the insert sequence and the copy of the insert sequence, respectively, and the positions of unmodified cytosines are determined by the presence of (T,T) in the insert sequence and the copy of the insert sequence, respectively, and wherein the modified and unmodified cytosines are paired with G's in the complementary strand.
Embodiment 183 is the method of embodiment 180, wherein the method differentiates positions of methylated cytosines from hydroxymethylated cytosines.
Embodiment 184 is the method of embodiment 183, wherein the subjecting each strand to a condition for altering modified and/or unmodified cytosines comprises (a) reacting each strand with β-glycosyltransferase; (b) reacting each strand with a DNA methyltransferase (DNMT); and (c) reacting each strand with a condition that converts unmodified cytosines to uracils.
Embodiment 185 is the method of embodiment 184, wherein (a) the positions of methylated cytosines are determined by the presence of (C,C) in the insert sequence and the copy of the insert sequence, respectively; (b) the positions of hydroxymethylated cytosines are determined by the presence of (C,T) in the insert sequence and the copy of the insert sequence, respectively; and (c) the positions of unmodified cytosines are determined by the presence of (T,T) in the insert sequence and the copy of the insert sequence, respectively; and wherein the methylated, hydroxymethylated, and unmodified cytosines are paired with G's in the complementary strand.
Embodiment 186 is the method of embodiment 183, wherein the subjecting each strand to a condition for altering modified and/or unmodified cytosines comprises (a) reacting each strand with a DNMT; and (b) reacting each strand with a condition that converts methylated cytosines to dihydrouracil (DHU).
Embodiment 187 is the method of embodiment 186, wherein (a) the positions of methylated cytosines are determined by the presence of (T,T) in the insert sequence and the copy of the insert sequence, respectively; (b) the positions of hydroxymethylated cytosines are determined by the presence of (T,C) in the insert sequence and the copy of the insert sequence, respectively; and (c) the positions of unmodified cytosines are determined by the presence of (C,C) in the insert sequence and the copy of the insert sequence, respectively; and wherein the methylated, hydroxymethylated, and unmodified cytosines are paired with G's in the complementary strand.
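The base-pair readouts recited in embodiments 181, 182, 185, and 187 can be summarized as lookup tables. The following Python sketch is illustrative only and is not part of any embodiment. The function name, its arguments, and the assumption that the insert read, the copy read, and the complementary-strand read are already aligned position by position are hypothetical. Only positions paired with a G in the complementary strand are evaluated, consistent with the embodiments above.

```python
# Lookup tables transcribed from embodiments 181, 182, 185, and 187.
# Keys are (base read in the insert, base read in the copy of the insert).
CALL_TABLES = {
    181: {("T", "C"): "modified", ("C", "C"): "unmodified"},
    182: {("C", "T"): "modified", ("T", "T"): "unmodified"},
    185: {
        ("C", "C"): "methylated",
        ("C", "T"): "hydroxymethylated",
        ("T", "T"): "unmodified",
    },
    187: {
        ("T", "T"): "methylated",
        ("T", "C"): "hydroxymethylated",
        ("C", "C"): "unmodified",
    },
}


def call_cytosines(insert_read, copy_read, opposite_strand, embodiment):
    """Return {position: call} for cytosine positions (hypothetical helper).

    opposite_strand holds the base read in the complementary strand opposite
    each position; only positions paired with G are treated as cytosines.
    """
    if not (len(insert_read) == len(copy_read) == len(opposite_strand)):
        raise ValueError("reads must be aligned to the same length")
    table = CALL_TABLES[embodiment]
    calls = {}
    reads = zip(insert_read.upper(), copy_read.upper(), opposite_strand.upper())
    for pos, (a, b, partner) in enumerate(reads):
        if partner == "G" and (a, b) in table:
            calls[pos] = table[(a, b)]
    return calls


# Position 3 reads (T,T) but pairs with A, so it is a thymine and is skipped.
example = call_cytosines("ACTT", "ATTT", "TGGA", embodiment=182)
```

In this example, position 1 reads (C,T) and position 2 reads (T,T), each opposite a G, so they are reported as a modified and an unmodified cytosine, respectively.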
Additional objects and advantages will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice. The objects and advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more embodiments and, together with the description, serve to explain the principles described herein.
Table 1 provides a listing of certain sequences referenced herein.
Described herein are polynucleotides comprising multiple insert sequences, wherein the insert sequences are derived from one or more target nucleic acids. These polynucleotides may comprise a concatenation sequence and multiple primer sequences. This application also describes methods of generating these polynucleotides and uses of these polynucleotides. The presence of multiple insert sequences within a given polynucleotide can increase the output of sequencing platforms by increasing the number of reads that are produced per flowcell.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art. All patents, applications, published applications and other publications referenced herein are incorporated by reference in their entirety unless stated otherwise. In the event that there are a plurality of definitions for a term herein, those in the Definitions section prevail unless stated otherwise. As used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Unless otherwise indicated, conventional methods of mass spectroscopy, NMR, HPLC, protein chemistry, biochemistry, recombinant DNA techniques and pharmacology are employed. The use of “or” or “and” means “and/or” unless stated otherwise. Furthermore, use of the term “including” as well as other forms, such as “include”, “includes,” and “included,” is not limiting. As used in this specification, whether in a transitional phrase or in the body of the claim, the terms “comprise(s)” and “comprising” are to be interpreted as having an open-ended meaning. That is, the terms are to be interpreted synonymously with the phrases “having at least” or “including at least.” When used in the context of a process, the term “comprising” means that the process includes at least the recited steps, but may include additional steps. When used in the context of a compound, composition, or device, the term “comprising” means that the compound, composition, or device includes at least the recited features or components, but may also include additional features or components.
The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
“Hybridization sequence” or “HYB,” as used herein, refers to a sequence that can hybridize to a complementary hybridization sequence. Hybridization of HYB in one library product to a HYB′ in another library product can lead to a hybridization adduct, wherein the two library products anneal to each other via hybridization of HYB/HYB′.
As used herein, a “concatenated nucleic acid sequencing template” refers to a double-stranded composition of a polynucleotide and its complement. A concatenated nucleic acid sequencing template can be generated by association of two library products by hybridization of HYB/HYB′ followed by extension to generate a double-stranded template.
“Insert sequence,” as used herein, refers to a region of a target nucleic acid that is comprised in a polynucleotide. A polynucleotide may comprise multiple insert sequences.
“Stacked reads” or “tandem reads,” as used herein, relates to sequencing reads of multiple insert sequences that are generated from a single polynucleotide. These sequencing reads may be sequential. For example, a polynucleotide comprising 2 or more insert sequences and 2 or more primer sequences can be used to generate tandem reads. A “tandem reads library,” as used herein, refers to a library of polynucleotides comprising multiple insert sequences that can be used to generate tandem reads.
“SBS,” as used herein, refers to a sequence that is incorporated into a polynucleotide to improve binding of a read primer. In embodiments wherein polynucleotides are made from library products produced by tagmentation, SBS may be a mosaic end sequence and SBS′ may be the complement of a mosaic end sequence, such as ME and ME′. SBS and SBS′ sequences may also be comprised in adapters when library products are produced using TruSeq methods (Illumina).
Described herein are polynucleotides that comprise multiple insert sequences, wherein each insert comprises a portion of one or more target nucleic acids. A single polynucleotide comprising multiple insert sequences allows for sequencing of multiple regions of the one or more target nucleic acids in the same region of a flowcell. In this way, more regions of the one or more target nucleic acids can be sequenced without the need for a larger flowcell.
In some embodiments, the polynucleotides are generated from 2 separate library products based on hybridizing of a HYB in one library product to a HYB′ sequence in the other library product to form a hybridized adduct, followed by elongation to produce a concatenated nucleic acid sequencing template.
These polynucleotides may also comprise additional sequences, such as one or more primer sequences, a concatenation sequence, or attachment polynucleotides.
In some embodiments, a polynucleotide comprises a 3′ terminal polynucleotide comprising a first read primer binding sequence; a first insert sequence 5′ of the 3′ terminal polynucleotide that is derived from a target nucleic acid; a concatenation sequence comprising a second read primer binding sequence that is orthogonal to the first read primer binding sequence, wherein the second read primer binding sequence comprises a hybridization sequence; a second insert sequence 5′ of the concatenation sequence and derived from a discontiguous sequence of the target nucleic acid or from a different target nucleic acid than the first insert sequence; and an attachment polynucleotide at the 5′ end of the polynucleotide and comprising an attachment sequence, wherein the 3′ terminal polynucleotide, the concatenation sequence, and the attachment polynucleotide are not derived from the target nucleic acid.
Polynucleotides with multiple insert sequences can allow a greater amount of sequence to be generated from a flowcell compared to a standard Illumina paired-end library, as shown in
Also described herein are polynucleotides that may be used as sequencing templates. These sequencing templates may be used with any standard sequencing methods known in the art.
In some embodiments, polynucleotides comprise more than one insert sequence. “Insert sequence” or “insert,” as used herein, refers to a region of a target nucleic acid, such as a double-stranded nucleic acid, that is comprised in a polynucleotide. A polynucleotide may comprise multiple insert sequences. In some embodiments, a polynucleotide comprises two insert sequences. In some embodiments, a polynucleotide comprises three, four, or five insert sequences. A polynucleotide comprising more than one insert that can be used as a sequencing template may be referred to herein as a “concatenated nucleic acid sequencing template” or “concatenated sequencing template.”
In some embodiments, polynucleotides comprise a hybridization sequence or the complement of a hybridization sequence. “Hybridization sequence” or “HYB,” as used herein, refers to a sequence that can hybridize to a complementary hybridization sequence. For example, hybridization of HYB in one fragment (such as a library product) to a HYB′ (the complement of a hybridization sequence) in another fragment can lead to a hybridization adduct or a bridge, wherein the two fragments anneal to each other via hybridization of HYB/HYB′. In some embodiments, HYB comprises sufficient nucleotides to attach two single-stranded fragments together when HYB hybridizes to HYB′. In some embodiments, a HYB sequence comprised in a concatenated sequencing template may be used as a primer binding site, as shown in
In some embodiments, a HYB or HYB′ comprises 10-30 nucleotides. In some embodiments, binding of the HYB in a first single-stranded nucleic acid fragment to the HYB′ in a second single-stranded nucleic acid fragment is sufficient to “bridge” the two fragments (as described in methods herein with examples shown in
In some embodiments, one or more nucleotides in the HYB or HYB′ are locked nucleic acids or bridged nucleic acids. As used herein, a “locked nucleic acid” or “LNA” refers to a modified RNA nucleotide in which the ribose moiety is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. In some embodiments, LNAs confer heightened structural stability in the HYB or HYB′ sequence, thus increasing the hybridization melting temperature (Tm) of the HYB/HYB′ interaction. For example, HYB or HYB′ sequences comprising one or more LNAs may comprise only relatively short sequences (such as 10-20 nucleotides), yet still confer sufficiently strong binding to allow formation of bridges between a first single-stranded fragment comprising a HYB and a second single-stranded fragment comprising a HYB′.
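As a purely illustrative sketch of the HYB/HYB′ relationship described above, the following Python snippet checks whether one sequence is the reverse complement of another and whether its length falls within the 10-30 nucleotide range mentioned for some embodiments. The example sequence and the function names are hypothetical and are not taken from the Sequence Listing. Requiring full-length complementarity is a simplifying assumption, and the sketch does not model modified nucleotides such as LNAs or melting temperature.

```python
# Map each unmodified DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unmodified DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_bridge(hyb, hyb_prime, min_len=10, max_len=30):
    """Hypothetical check: HYB' must be the full reverse complement of HYB,
    and HYB must be within the assumed 10-30 nucleotide length window."""
    within_length = min_len <= len(hyb) <= max_len
    return within_length and reverse_complement(hyb) == hyb_prime.upper()


hyb = "ACGTTGCAAGCT"                 # hypothetical 12-nucleotide HYB
hyb_prime = reverse_complement(hyb)  # its complement, written 5' to 3'
```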
In some embodiments, the polynucleotide comprises two or more inserts. As described herein, these inserts may be copies of the same sequence from a target nucleic acid or separate sequences from a target nucleic acid. As used herein, a “chimeric template” refers to a template comprising different inserts.
A wide variety of different polynucleotides comprising two inserts will be described herein, such as those in
For example, a polynucleotide may comprise one or more sequencing primer sequences. Such sequencing primer sequences may be used for binding primers to initiate sequencing when the polynucleotides are used as sequencing templates. In some embodiments, a polynucleotide comprises a first read sequencing primer sequence and/or a second read sequencing primer sequence. As used herein, “first read sequencing primer sequence” and “second read sequencing primer sequence” refer to sequences that can bind to a primer that may be used in different sequencing reads. These terms are not limited to any specific sequence or read order; for example, a first read sequencing primer sequence may be used to initiate a second sequencing read in a given experiment and a second read sequencing primer sequence may be used to initiate a first sequencing read in a given experiment. Such primer sequences may vary based on the sequencing platform that a user plans to utilize, and such primer sequences would be well-known in the art, such as A14 (SEQ ID NO: 4) and B15 sequences (SEQ ID NO: 5).
In some embodiments, the first read sequencing primer sequence and the second read sequencing primer sequence are different. In some embodiments, the first read sequencing primer sequence and the second read sequencing primer sequence each comprise an A14 sequence or a B15 sequence, or their complements. In some embodiments, the 3′ terminal polynucleotide comprises the complement of a P5 primer sequence (P5′) and the 5′ terminal polynucleotide comprises a P7 primer sequence (P7, SEQ ID NO: 48), or the 3′ terminal polynucleotide comprises the complement of a P7 primer sequence (P7′) and the 5′ terminal polynucleotide comprises a P5 primer sequence (P5, SEQ ID NO: 7).
In some embodiments, the 3′ terminal polynucleotide and/or the 5′ terminal polynucleotide each independently comprise at least one of an adapter, a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence. In other words, polynucleotides may comprise additional sequences of use in methods that a user wants to perform, such as sequencing.
Using methods described herein, one insert in a polynucleotide may be prepared from a fragment comprising a portion of a sense strand of a target nucleic acid and the other insert is prepared by elongation from a fragment comprising a portion of an antisense strand of a target nucleic acid. Alternatively, one insert may be prepared from a fragment comprising a portion of an antisense strand of a target nucleic acid and the other insert is prepared by elongation from a fragment comprising a portion of a sense strand of a target nucleic acid.
In some embodiments, a polynucleotide comprises two insert sequences that are copies of each other. In some embodiments, a polynucleotide comprises (a) a 5′ terminal polynucleotide comprising a first read sequencing primer sequence; (b) an insert sequence derived from a target nucleic acid, wherein the insert sequence is 3′ of the 5′ terminal polynucleotide; (c) a hybridization sequence 3′ of the insert sequence; (d) a copy of the insert sequence 3′ of the hybridization sequence; and (e) a 3′ terminal polynucleotide comprising the complement of a second read sequencing primer sequence. In some embodiments, this polynucleotide may be a sequencing template. While the two copies of the insert (i.e., the insert sequence and the copy of the insert sequence) may be expected to be identical, sequencing results may indicate that they are not. For example, the two copies of the insert may be different based on a mismatch mutation in the target nucleic acid or based on introduction of an error during PCR amplification.
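One way to picture the comparison of an insert sequence with its copy is the following Python sketch, offered as an illustration under stated assumptions rather than as a method of this disclosure. It assumes that both reads are already aligned, are of equal length, and are expressed in the same orientation. The function name and the use of "N" for discordant positions are hypothetical choices.

```python
def compare_insert_copies(insert_read, copy_read):
    """Compare aligned reads of an insert and of its copy (hypothetical helper).

    Returns a consensus string, with "N" at discordant positions, and a list
    of (position, insert_base, copy_base) tuples for those positions.
    """
    if len(insert_read) != len(copy_read):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    discordant = []
    for pos, (a, b) in enumerate(zip(insert_read.upper(), copy_read.upper())):
        if a == b:
            consensus.append(a)
        else:
            consensus.append("N")
            discordant.append((pos, a, b))
    return "".join(consensus), discordant


consensus, discordant = compare_insert_copies("ACGT", "ACTT")
```

Discordant positions may then be examined further, for example to distinguish a mismatch in the target nucleic acid from an error introduced during amplification or sequencing. The sketch itself does not make that distinction.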
In some embodiments, a polynucleotide comprises two insert sequences that are not copies of each other. In some embodiments, the two insert sequences may be different. In some embodiments, the two insert sequences comprised in a polynucleotide were prepared from different regions of a target nucleic acid. In some embodiments, a polynucleotide comprises (a) a 5′ terminal polynucleotide comprising a first read sequencing primer sequence; (b) a first insert sequence derived from a target nucleic acid, wherein the insert sequence is 3′ of the 5′ terminal polynucleotide; (c) a hybridization sequence 3′ of the insert sequence; (d) a second insert sequence 3′ of the hybridization sequence; and (e) a 3′ terminal polynucleotide comprising the complement of a second read sequencing primer sequence. As described herein for methods with immobilized transposomes, such templates with two different insert sequences can be used to determine contiguity data.
The two inserts comprised in a polynucleotide may be the same or different sizes. In some embodiments, inserts that are copies comprise the same number of nucleotides. In some embodiments, the insert sequences comprise 40 to 400 nucleotides, optionally wherein the insert sequences comprise 1000 or fewer nucleotides. In some embodiments, a paired sequencing read protocol may be performed for a larger insert, such as one comprising more than 500 nucleotides.
In some embodiments, a polynucleotide is immobilized on a solid support. In some embodiments, the polynucleotide is immobilized on the solid support via the 5′ terminal polynucleotide (such as in the embodiment shown in
In some embodiments, a polynucleotide has the structure: 5′-P5-A14-Insert-HYB-Insert-B15′-P7′-3′; or
5′-P7-B15-Insert-HYB′-Insert-A14′-P5′-3′, wherein HYB is a hybridization sequence and HYB′ is the complement of a hybridization sequence. In some embodiments, the two insert sequences are copies of the same sequence that are identical or two sequences that have greater than 95% sequence homology. Potential reasons for differences in two copies of an insert sequence are described herein, such as non-canonical base pairing or random errors introduced during sequencing. Figure shows a representative double-stranded polynucleotide that comprises two complementary concatenated sequencing templates. One template comprises two A inserts, while the complementary strand comprises two A′ inserts.
In some embodiments, a polynucleotide has the structure: 5′-P5-A14-Insert-HYB-Insert-B15′-P7′-3′; or
5′-P7-B15-Insert1-HYB′-Insert2-A14′-P5′-3′, wherein HYB is a hybridization sequence and HYB′ is the complement of a hybridization sequence. In some embodiments, Insert 1 and Insert 2 comprise different sequences with little or no sequence homology.
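The relationship between the two structures recited above can be sketched in Python as bookkeeping on element labels. Reading the opposite strand 5′ to 3′ reverses the element order and swaps each element for its complement (X becomes X′ and X′ becomes X). In this sketch the labels beginning with "Insert" are left unprimed, to match the notation used above. The helper names are hypothetical, and no nucleotide sequences are modeled.

```python
PRIME = "\u2032"  # the prime mark used for complements, as in HYB′


def toggle_prime(label: str) -> str:
    """Return the label of the complementary element (X <-> X′)."""
    return label[:-1] if label.endswith(PRIME) else label + PRIME


def complement_structure(elements: list) -> list:
    """Element order of the opposite strand, read 5′ to 3′."""
    out = []
    for label in reversed(elements):
        # Insert labels are kept unprimed, following the notation in the text.
        out.append(label if label.startswith("Insert") else toggle_prime(label))
    return out


def format_structure(elements: list) -> str:
    return "5" + PRIME + "-" + "-".join(elements) + "-3" + PRIME


top = ["P5", "A14", "Insert", "HYB", "Insert", "B15" + PRIME, "P7" + PRIME]
print(format_structure(top))
print(format_structure(complement_structure(top)))
```

Applied to the first structure, the sketch yields the P7-first structure with HYB′ between the two inserts.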
In some embodiments, a composition comprises a polynucleotide hybridized to its complement. In some embodiments, a polynucleotide hybridized to its complement may be termed a double-stranded concatenated sequencing template. In some embodiments, a double-stranded concatenated sequencing template is immobilized to the surface of a solid support by both of its 5′ ends.
In some embodiments, a polynucleotide or a composition comprising a polynucleotide and its complement is immobilized on the surface of a solid support, wherein the affinity moiety is biotin, desthiobiotin, or dual biotin and the binding moiety is avidin or streptavidin.
A wide range of different solid supports may be used for immobilization. In some embodiments, the solid support is a bead, slide, wall of a vessel, a flow cell, or a nanowell comprised in a flow cell.
In some embodiments, a linker for attaching an affinity moiety to a polynucleotide is a cleavable linker. In some embodiments, a user can release a polynucleotide from a solid support at a desired time by cleaving this cleavable linker.
A. Target Nucleic Acid
Target nucleic acids used herein can be composed of DNA, RNA or analogs thereof. The source of the target nucleic acids can be genomic DNA, messenger RNA, or other nucleic acids from native sources. In some cases, the target nucleic acids that are derived from such sources can be amplified prior to use in a method or composition herein.
Exemplary biological samples from which target nucleic acids can be derived include, for example, those from a mammal such as a rodent, mouse, rat, rabbit, guinea pig, ungulate, horse, sheep, pig, goat, cow, cat, dog, primate, human or non-human primate; a plant such as Arabidopsis thaliana, corn, sorghum, oat, wheat, rice, canola, or soybean; an alga such as Chlamydomonas reinhardtii; a nematode such as Caenorhabditis elegans; an insect such as Drosophila melanogaster, mosquito, fruit fly, honey bee or spider; a fish such as zebrafish; a reptile; an amphibian such as a frog or Xenopus laevis; a Dictyostelium discoideum; a fungus such as Pneumocystis carinii, Takifugu rubripes, yeast, Saccharomyces cerevisiae or Schizosaccharomyces pombe; or a Plasmodium falciparum. Target nucleic acids can also be derived from a prokaryote such as a bacterium, such as Escherichia coli, staphylococci or Mycoplasma pneumoniae; an archaeon; a virus such as Hepatitis C virus or human immunodeficiency virus; or a viroid. Target nucleic acids can be derived from a homogeneous culture or population of the above organisms or alternatively from a collection of several different organisms, for example, in a community or ecosystem. Nucleic acids can be isolated using methods known in the art including, for example, those described in Sambrook et al, Molecular Cloning: A Laboratory Manual, 3rd edition, Cold Spring Harbor Laboratory, New York (2001) or in Ausubel et al, Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1998), each of which is incorporated herein by reference.
In some embodiments, target nucleic acids can be obtained as fragments of one or more larger nucleic acids. Fragmentation can be carried out using any of a variety of techniques known in the art including, for example, nebulization, sonication, chemical cleavage, enzymatic cleavage, or physical shearing. Fragmentation may also result from use of a particular amplification technique that produces amplicons by copying only a portion of a larger nucleic acid. For example, PCR amplification produces fragments having a size defined by the length of the fragment between the flanking primers used for amplification.
A population of target nucleic acids, or amplicons thereof, can have an average strand length that is desired or appropriate for a particular application of the methods or compositions set forth herein. For example, the average strand length can be less than 100,000 nucleotides, 50,000 nucleotides, 10,000 nucleotides, 5,000 nucleotides, 1,000 nucleotides, 500 nucleotides, 100 nucleotides, or 50 nucleotides. Alternatively or additionally, the average strand length can be greater than 10 nucleotides, 50 nucleotides, 100 nucleotides, 500 nucleotides, 1,000 nucleotides, 5,000 nucleotides, 10,000 nucleotides, 50,000 nucleotides, or 100,000 nucleotides. The average strand length for a population of target nucleic acids, or amplicons thereof, can be in a range between a maximum and minimum value set forth above. It will be understood that amplicons generated at an amplification site (or otherwise made or used herein) can have an average strand length that is in a range between an upper and lower limit selected from those exemplified above.
In some embodiments, the target nucleic acids have a relatively short average strand length, such as less than 200 nucleotides, less than 150 nucleotides, less than 100 nucleotides, less than 75 nucleotides, less than 50 nucleotides, or less than 36 nucleotides. Sequencing of target nucleic acids with a relatively short average strand length is not limited by read-length, and increasing the number of reads could significantly increase sequencing output. Examples of sample types with a relatively short average strand length are cell-free DNA (cfDNA) and exome sequencing samples.
In some embodiments, the target nucleic acids are cell-free DNA (cfDNA) from a maternal blood sample. In some embodiments, the cfDNA is extracted from a maternal plasma sample. In some embodiments, the cfDNA is for noninvasive prenatal testing (NIPT).
In some embodiments, the target nucleic acids are exomes. In some embodiments, exomes are prepared via targeted resequencing. In some embodiments, exomes are prepared by whole-genome enrichment. In some embodiments, exomes are prepared by hybridization-based enrichment.
In some embodiments, the target nucleic acids are DNA and RNA. Separate libraries of RNA and DNA can be prepared to generate hybrid DNA/RNA polynucleotides. In some embodiments, polynucleotides comprise one or more inserts comprising RNA and one or more inserts comprising DNA. Such polynucleotides comprising RNA insert(s) and DNA insert(s) can be termed “hybrid polynucleotides” and allow multiple readouts to be generated from a single sequencing run. In some embodiments, polynucleotides comprising RNA and DNA inserts have a dual sample index to allow for self-normalizing. In some embodiments, the lesser of the DNA amount and the RNA amount in the starting libraries dictates the amount of hybrid polynucleotides generated.
Any of a variety of known amplification techniques can be used to increase the amount of template sequences present for use in a method set forth herein. Exemplary techniques include, but are not limited to, polymerase chain reaction (PCR), rolling circle amplification (RCA), multiple displacement amplification (MDA), or random prime amplification (RPA) of nucleic acid molecules having template sequences. It will be understood that amplification of target nucleic acids prior to use in a method or composition set forth herein is optional. As such, target nucleic acids will not be amplified prior to use in some embodiments of the methods and compositions set forth herein. Target nucleic acids can optionally be derived from synthetic libraries. Synthetic nucleic acids can have native DNA or RNA compositions or can be analogs thereof. Solid-phase amplification methods can also be used, including for example, cluster amplification, bridge amplification or other methods set forth below in the context of array-based methods.
In some embodiments, the polynucleotides disclosed herein can be sequenced using any suitable nucleic acid sequencing platform to determine the nucleic acid sequence of the target sequence. In some respects, sequences of interest are correlated with or associated with one or more congenital or inherited disorders, pathogenicity, antibiotic resistance, or genetic modifications. Sequencing may be used to determine the nucleic acid sequence of a short tandem repeat, single nucleotide polymorphism, gene, exon, coding region, exome, or portion thereof. As such, the methods and compositions described herein relate to methods useful in, but not limited to, cancer and disease diagnosis, prognosis and therapeutics, DNA fingerprinting applications (e.g., DNA databanking, criminal casework), metagenomic research and discovery, agrigenomic applications, and pathogen identification and monitoring.
In some embodiments, a sample used to prepare sequencing templates comprises double-stranded nucleic acid. This double-stranded nucleic acid may be referred to as target nucleic acid. In some embodiments, a double-stranded nucleic acid may be added to a solid support comprising immobilized transposomes. In some embodiments, a double-stranded nucleic acid may be fragmented and combined with a mixture of forked adapters.
In some embodiments, a sample comprises multiple double-stranded nucleic acids.
A biological sample used in accordance with the present disclosure can be any type that comprises target nucleic acids. However, the sample need not be completely purified, and can comprise, for example, nucleic acid mixed with protein, other nucleic acid species, other cellular components, and/or any other contaminant. In some embodiments, the biological sample comprises a mixture of nucleic acid, protein, other nucleic acid species, other cellular components, and/or any other contaminant present in approximately the same proportion as found in vivo. For example, in some embodiments, the components are found in the same proportion as found in an intact cell. In some embodiments, the biological sample has a 260/280 absorbance ratio of less than or equal to 2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, or 0.60. In some embodiments, the biological sample has a 260/280 absorbance ratio of at least 2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, or 0.60. Because the methods provided herein allow nucleic acid to be bound to solid supports, other contaminants can be removed merely by washing the solid support after surface bound tagmentation occurs. The biological sample can comprise, for example, a crude cell lysate or whole cells. For example, a crude cell lysate that is applied to a solid support in a method set forth herein, need not have been subjected to one or more of the separation steps that are traditionally used to isolate nucleic acids from other cellular components. Exemplary separation steps are set forth in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. Ausubel, et al, hereby incorporated by reference.
In some embodiments, the sample that is applied to the solid support has a 260/280 absorbance ratio that is less than or equal to 1.7.
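The 260/280 absorbance ratio referred to above is the absorbance at 260 nm divided by the absorbance at 280 nm. A minimal Python sketch of the check against the 1.7 value follows. The function names and the absorbance readings are hypothetical and serve only to show the arithmetic.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def at_or_below(a260: float, a280: float, max_ratio: float = 1.7) -> bool:
    """True if the sample's 260/280 ratio is less than or equal to max_ratio."""
    return a260_a280_ratio(a260, a280) <= max_ratio


# Hypothetical readings for a crude sample and a cleaner sample.
print(a260_a280_ratio(0.75, 0.50), at_or_below(0.75, 0.50))
print(a260_a280_ratio(1.00, 0.50), at_or_below(1.00, 0.50))
```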
Thus, in some embodiments, the biological sample can comprise, for example, blood, plasma, serum, lymph, mucus, sputum, urine, semen, cerebrospinal fluid, bronchial aspirate, feces, and macerated tissue, or a lysate thereof, or any other biological specimen comprising nucleic acid.
In some embodiments, the sample is blood. In some embodiments, the sample is a cell lysate. In some embodiments, the cell lysate is a crude cell lysate. In some embodiments, the method further comprises lysing cells in the sample after applying the sample to a solid support to generate a cell lysate.
In some embodiments, the sample is a biopsy sample. In some embodiments, the biopsy sample is a liquid or solid sample. In some embodiments, a biopsy sample from a cancer patient is used to evaluate sequences of interest to determine if the subject has certain mutations or variants in predictive genes.
In some embodiments, the sample comprises a target double-stranded DNA. In some embodiments, the DNA is genomic DNA. In some embodiments, the DNA is cell-free DNA (cfDNA). In some embodiments, the DNA is circulating tumor DNA (ctDNA).
In some embodiments, the DNA is double-stranded cDNA that is prepared from RNA. In some embodiments, the RNA is mRNA. In some embodiments, the RNA comprises coding, untranslated region (UTR), introns, and/or intergenic sequences.
B. 3′ Terminal Polynucleotide
In some embodiments, the 3′ terminal polynucleotide comprises a first read primer binding sequence.
In some embodiments, the 3′ terminal polynucleotide comprises at least one of a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence. In some embodiments, the 3′ terminal polynucleotide and/or the attachment polynucleotide each independently comprise at least one of a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence.
In some embodiments, the 3′ terminal polynucleotide comprises an ME′, B15′, and/or P7′ sequence. In some embodiments, the 3′ terminal polynucleotide comprises an ME′, B15′, and P7′ sequence.
In some embodiments, the 3′ terminal polynucleotide comprises the complement of a P5 primer sequence (P5′) and the attachment polynucleotide comprises a P7 primer sequence (P7). In some embodiments, the 3′ terminal polynucleotide comprises the complement of a P7 primer sequence (P7′) and the attachment polynucleotide comprises a P5 primer sequence (P5).
In some embodiments, the 3′ terminal polynucleotide comprises an ME′-B15′-P7′ sequence.
C. Insert Sequences
Insert sequences comprised in a polynucleotide comprise sequences from a target nucleic acid. As such, the polynucleotides described herein can be used for a number of purposes, such as to generate tandem reads when sequencing.
Polynucleotides described herein comprise more than one insert sequence. In some embodiments, a polynucleotide comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more insert sequences. In some embodiments, a polynucleotide comprises two insert sequences. In some embodiments, a polynucleotide comprises three insert sequences.
Insert sequences may be derived from one or more target nucleic acids.
In some embodiments, a polynucleotide comprises multiple insert sequences that are derived from multiple target nucleic acids.
In some embodiments, a polynucleotide may comprise multiple insert sequences that are all derived from the same target nucleic acid. In some embodiments, multiple insert sequences are derived from discontiguous sequences of the target nucleic acid. By discontiguous sequences, it is meant that the multiple insert sequences in a polynucleotide do not adjoin each other in the original target nucleic acid. In some embodiments, the multiple insert sequences are from random regions of the target nucleic acid. In some embodiments, the methods for generating the present polynucleotides do not select for specific insert sequences.
In some embodiments, multiple insert sequences each comprise from 40 to 400 nucleotides, or each comprise from 100 to 200 nucleotides, or each comprise 150 nucleotides. In some embodiments, a first insert sequence and a second insert sequence each comprise from 40 to 400 nucleotides, or each comprise from 100 to 200 nucleotides, or each comprise 150 nucleotides.
In some embodiments, a polynucleotide comprises more than two insert sequences. In some embodiments, a polynucleotide comprises, between the second insert sequence and the attachment polynucleotide, at least one insert unit comprising an insert sequence derived from a discontiguous sequence of the target nucleic acid or from a different target nucleic acid than the other insert sequences at the 5′ end and a concatenation sequence comprising a read primer binding sequence at the 3′ end, wherein the read primer binding sequence is orthogonal to the other read primer binding sequences.
In embodiments where a polynucleotide comprises more than two insert sequences, the polynucleotide may comprise multiple different concatenation sequences, wherein each concatenation sequence comprises a primer sequence, and wherein the primer sequences comprised in different concatenation sequences are different. In some embodiments, one or more primer sequences comprise a hybridization sequence, wherein hybridization sequences are different in different primer sequences.
For example, to generate a polynucleotide comprising three insert sequences, two different HYB/HYB′ sequence pairs can be used, such as HYB1/HYB1′ and HYB2/HYB2′. To generate the polynucleotide with three inserts, HYB1/HYB1′ can be used to link insert 1 and insert 2, and HYB2/HYB2′ can be used to link insert 2 and insert 3. A forked adapter for insert 1 could comprise P5 and HYB1, an adapter for insert 2 could comprise HYB1′ and HYB2, and an adapter for insert 3 could comprise HYB2′ and P7′.
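The three-insert example above can be sketched in Python as a compatibility check on adapter labels. Each insert unit is written as a (5′ adapter, insert, 3′ adapter) triple, and adjacent units may be linked only when the facing adapters are a hybridization sequence and its complement. The function names are hypothetical. The sketch tracks labels only and does not model strands, hybridization, or extension chemistry.

```python
PRIME = "\u2032"  # prime mark for complements, as in HYB1′


def pairs_with(three_prime_adapter: str, five_prime_adapter: str) -> bool:
    """True if the two labels are a sequence and its complement (X with X′)."""
    return (five_prime_adapter == three_prime_adapter + PRIME
            or three_prime_adapter == five_prime_adapter + PRIME)


def concatenate_units(units: list) -> list:
    """Return the linear element order, or raise if a junction cannot pair."""
    order = [units[0][0], units[0][1]]
    for prev, nxt in zip(units, units[1:]):
        if not pairs_with(prev[2], nxt[0]):
            raise ValueError(f"{prev[2]} cannot link to {nxt[0]}")
        order.append(prev[2].rstrip(PRIME))  # label the junction by its HYB name
        order.append(nxt[1])
    order.append(units[-1][2])
    return order


unit1 = ("P5", "Insert1", "HYB1")
unit2 = ("HYB1" + PRIME, "Insert2", "HYB2")
unit3 = ("HYB2" + PRIME, "Insert3", "P7" + PRIME)

print("-".join(concatenate_units([unit1, unit2, unit3])))
```

Using a distinct HYB pair at each junction is what fixes the order of the inserts in this sketch: unit 1 cannot be linked directly to unit 3, because HYB1 does not pair with HYB2′.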
Insert sequences can be generated by a number of methods to generate nucleic acid fragments, such as tagmentation or fragmentation.
D. Adapter Sequences
In some embodiments, the polynucleotide may comprise one or more adapter sequences.
Adapter sequences may comprise one or more functional sequences or components selected from the group consisting of primer sequences, anchor sequences, universal sequences, spacer regions, index sequences, capture sequences, barcode sequences, cleavage sequences, sequencing-related sequences, and combinations thereof. In some embodiments, an adapter sequence comprises a primer sequence. In other embodiments, an adapter sequence comprises a primer sequence and an index or barcode sequence. A primer sequence may also be a universal sequence. This disclosure is not limited to the type of adapter sequences that could be used and a skilled artisan will recognize additional sequences that may be of use for library preparation and next generation sequencing. A universal sequence is a region of nucleotide sequence that is common to two or more nucleic acid fragments. Optionally, the two or more nucleic acid fragments may also have regions of sequence differences. A universal sequence that may be present in different members of a plurality of nucleic acid fragments can allow for the replication or amplification of multiple different sequences using a single universal primer that is complementary to the universal sequence.
In some embodiments, the first read primer binding sequence comprises a first adapter sequence. In some embodiments, the first adapter sequence is the complement of an A14 primer sequence (A14′) or the complement of a B15 primer sequence (B15′).
In some embodiments, an adapter sequence comprises an SBS or SBS′ sequence. In some embodiments, an SBS or SBS′ sequence may comprise all or part of a standard sequence comprised in oligonucleotides used in TruSeq workflows, such that standard sequence primers can be used. In some embodiments, SBS may be a mosaic end sequence and SBS′ may be the complement of a mosaic end sequence, such as ME and ME′.
In some embodiments, an SBS or SBS′ sequence may comprise A14-ME or B15-ME, or their complements. SEQ ID NOs: 15-21 show some exemplary SBS or SBS′ sequences or adapters comprising SBS or SBS′ sequences.
In some embodiments, SBS and SBS′ are fully or partially complementary sequences that can form an adapter duplex. In some embodiments, SBS and SBS′ are partially complementary. In some embodiments, SBS and SBS′ are fully complementary. In some embodiments, SBS and/or SBS′ comprise a 13-base pair sequence. In some embodiments, the adapter duplex comprises P5-HYB′ and P7-HYB in addition to SBS or SBS′. In this way, for example, when two library fragments are stacked together (i.e., in tandem together) to generate polynucleotides with two inserts, the resulting polynucleotide can be sequenced with standard sequencing primers.
In some embodiments, an adapter sequence has a melting temperature of 65° C. or higher for binding to a sequencing primer. In some embodiments, an adapter sequence binds a sequencing primer such that the binding is not lost at the temperatures used for sequencing. In some embodiments, the adapter sequence comprises a significant proportion (greater than 10%) of each of A, T, C, and G. In some embodiments, the G/C content of the adapter sequence is 40%-60%. In some embodiments, the G/C content of the adapter sequence is 30% or greater and 70% or less. In some embodiments, the G/C content of the adapter sequence is 40% or greater and 50% or less, or 50% or greater and 60% or less.
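The composition criteria above (G/C content within a range, and more than 10% of each of A, T, C, and G) can be checked with a minimal Python sketch. The function names and the test sequences are hypothetical and are not sequences from the Sequence Listing. Melting temperature is not modeled here, because it depends on the calculation method and buffer conditions.

```python
from collections import Counter


def base_fractions(seq: str) -> dict:
    """Fraction of each of A, C, G, and T in a DNA sequence."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def gc_content(seq: str) -> float:
    fractions = base_fractions(seq)
    return fractions["G"] + fractions["C"]


def passes_composition(seq: str, gc_min: float = 0.40, gc_max: float = 0.60,
                       min_base: float = 0.10) -> bool:
    """True if G/C content is in [gc_min, gc_max] and every base exceeds min_base."""
    fractions = base_fractions(seq)
    gc = fractions["G"] + fractions["C"]
    return gc_min <= gc <= gc_max and all(f > min_base for f in fractions.values())


# Hypothetical candidate sequences.
print(gc_content("ACGTTGCAAGCTTGCA"), passes_composition("ACGTTGCAAGCTTGCA"))
print(gc_content("AAAAAAAATTTTTTTT"), passes_composition("AAAAAAAATTTTTTTT"))
```

The default bounds correspond to the 40%-60% embodiment; passing gc_min=0.30 and gc_max=0.70 would correspond to the broader range.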
In some embodiments, the attachment polynucleotide comprises a second adapter sequence. In some embodiments, the second adapter sequence is an A14 sequence or a B15 sequence.
In some embodiments, the first adapter sequence is the complement of an A14 sequence (A14′) and the second adapter sequence is a B15 sequence. In some embodiments, the first adapter sequence is the complement of a B15 sequence (B15′) and the second adapter sequence is an A14 sequence.
In some embodiments, adapter sequences are transferred to the 5′ ends of a nucleic acid fragment by a tagmentation reaction.
E. Concatenation Sequence
In some embodiments, a concatenation sequence comprises a second read primer binding sequence that is orthogonal to the first read primer binding sequence, wherein the second read primer binding sequence comprises a hybridization sequence. In some embodiments, the hybridization sequence is HYB′. In some embodiments, the second read primer binding sequence comprises a hybridization sequence (HYB) and the complement of an SBS′ sequence (ME′), as shown in
In some embodiments, the concatenation sequence comprises a transposon end sequence 3′ of the hybridization sequence and a complement of the transposon end sequence 5′ of the hybridization sequence.
In some embodiments, the concatenation sequence comprises ME′, HYB′, and/or ME. In some embodiments, the concatenation sequence comprises ME′, HYB′, and ME. In some embodiments, the concatenation sequence is ME′-HYB′-ME.
In some embodiments, the second read primer binding sequence comprises the complement of a hybridization sequence and a complement of the transposon end sequence. In some embodiments, the second read primer binding sequence comprises HYB′ or ME′. In some embodiments, the second read primer binding sequence comprises HYB′ and ME′. In some embodiments, the second read primer binding sequence is HYB′-ME′.
F. Immobilization and Attachment Polynucleotide
In some embodiments, the polynucleotide is immobilized on a solid support.
In some embodiments, the polynucleotide is immobilized on the solid support via an attachment polynucleotide. In some embodiments, the attachment polynucleotide comprises an attachment sequence.
In some embodiments, the attachment sequence is a nucleic acid sequence that hybridizes to a transposon in a transposome complex and that is immobilized on a solid support, such as a slide, flow cell, or bead. In some embodiments, the attachment sequence functions to attach a transposome complex to a solid support. In some embodiments, the attachment sequence functions to attach a polynucleotide to a solid support. In some embodiments, the attachment sequence is P5.
In some embodiments, the polynucleotide is immobilized on the solid support via hybridization of the attachment polynucleotide to an attachment polynucleotide complement on the surface of the solid support. In some embodiments, the polynucleotide is immobilized to the solid support via binding of an affinity moiety on the attachment polynucleotide to a binding moiety on the surface of the solid support.
In some embodiments, the solid support is a flow cell or a bead.
In some embodiments, the attachment polynucleotide comprises at least one of a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence.
In some embodiments, the attachment polynucleotide comprises a second adapter sequence. In some embodiments, the second adapter sequence is A14 or B15.
In some embodiments, the attachment polynucleotide comprises a transposon end sequence. In some embodiments, the transposon end sequence is ME.
In some embodiments, the attachment sequence is P5, the second adapter sequence is A14, and/or the transposon end sequence is ME. In some embodiments, the attachment polynucleotide comprises P5, A14, and/or ME. In some embodiments, the attachment polynucleotide comprises P5, A14, and ME. In some embodiments, the attachment polynucleotide comprises P5-A14-ME.
G. Sample Indexes and UMIs
In some embodiments, polynucleotides comprise, in addition to a hybridization sequence (or its complement) and at least 2 inserts, a primer sequence, an index sequence, a barcode sequence, a purification tag, or any combination thereof. In some embodiments, polynucleotides comprise sample indexes and/or unique molecular identifiers (UMIs). In some embodiments, one or more of these sequences are incorporated into polynucleotides using forked adapters that are ligated to double-stranded fragments or using forked adapters that are comprised within transposomes that are incorporated into double-stranded fragments during tagmentation. Alternatively, additional sequences may be added to polynucleotides (such as concatenated sequencing templates) after they have been generated, such as with PCR.
Unique molecular identifiers (UMIs) are sequences of nucleotides applied to or identified in nucleic acid molecules that may be used to distinguish individual nucleic acid molecules from one another. UMIs may be sequenced along with the nucleic acid molecules with which they are associated to determine whether the read sequences are those of one source nucleic acid molecule or another. The term “UMI” may be used herein to refer to both the sequence information of a polynucleotide and the physical polynucleotide per se. UMIs are similar to barcodes, which are commonly used to distinguish reads of one sample from reads of other samples, but UMIs are instead used to distinguish nucleic acid template fragments from one another when many fragments from an individual sample are sequenced together. UMIs may be defined in many ways, such as described in WO 2019/108972 and WO 2018/136248, which are incorporated herein by reference.
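The use of UMIs to tell source molecules apart can be sketched in Python as a grouping step. The sketch assumes each read has already been reduced to a (UMI, mapping position, sequence) triple, and it treats reads sharing both a UMI and a position as copies of one source molecule. That grouping key, the function names, and the example reads are assumptions made for illustration. The sketch does not tolerate sequencing errors within the UMI itself.

```python
from collections import defaultdict


def group_by_umi(reads: list) -> dict:
    """Group read sequences by (UMI, mapping position)."""
    groups = defaultdict(list)
    for umi, position, sequence in reads:
        groups[(umi, position)].append(sequence)
    return dict(groups)


def count_source_molecules(reads: list) -> int:
    """Number of distinct (UMI, position) groups, one per inferred source molecule."""
    return len(group_by_umi(reads))


# Hypothetical reads: the first two are treated as duplicates of one molecule.
reads = [
    ("AACGT", 100, "ACGTTGCA"),
    ("AACGT", 100, "ACGTTGCA"),
    ("GGTCA", 100, "ACGTTGCA"),
    ("AACGT", 250, "TTGACCGA"),
]
print(count_source_molecules(reads))
```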
In some embodiments, two sample indexes are used to prepare unique dual indexes (UDIs). In some embodiments, a sample index is an i5-i8 sequence. Alternatively, i6 and i8 sequences may be used as UMIs.
While UMIs are useful for removing PCR duplicates in double-stranded nucleic acids and for detection of low-frequency variants, UDIs are useful for mitigating sample misassignment due to index hopping in library sequencing and demultiplexing. UDIs, such as unique i5 and i7 index sequences, can be added to the ends of target nucleic acids so that both ends contain a UDI. UDIs can be used with patterned flow cells, such as Illumina's NovaSeq 6000 system (See, e.g., WO 2018/204423, WO 2018/208699, WO 2019/055715, and WO 2016/176091; which are incorporated by reference herein in their entireties). In some embodiments described herein, such as those shown in
H. Compositions Comprising a Polynucleotide and its Complement
In some embodiments, a composition comprises a polynucleotide and its complement. In some embodiments, a polynucleotide is hybridized to its complement. In some embodiments, a polynucleotide and its complement are comprised in a double-stranded composition.
In some embodiments, a composition comprises a polynucleotide and its complement, wherein the complement comprises a 3′ terminal complement comprising a first complement read primer binding sequence, wherein the first complement read primer binding sequence is orthogonal to the first and second read primer binding sequences; the complement of the second insert sequence of the 3′ terminal complement; a complement concatenation sequence 5′ of the complement of the second insert sequence and comprising a 3′ to 5′ second complement read primer binding sequence, wherein the second complement read primer binding sequence is orthogonal to the first and second read primer binding sequences, and to the first complement read primer binding sequence; the complement of the first insert sequence 5′ of the complement concatenation sequence; and a complement attachment polynucleotide at the 5′ end comprising a complement attachment sequence.
In some embodiments, a composition comprises a polynucleotide and a complement, wherein either the polynucleotide or the complement is immobilized on a solid support. In some embodiments, a composition comprises a polynucleotide that is immobilized on a solid support via the first attachment polynucleotide. In some embodiments, the complement is immobilized on the solid support via the complement attachment polynucleotide.
In some embodiments, the complement attachment polynucleotide comprises an attachment sequence. In some embodiments, the attachment sequence comprised in the complement attachment polynucleotide is P7.
In some embodiments, the complement attachment polynucleotide comprises a ME-B15-P7 sequence. In some embodiments, the complement attachment sequence comprises P7. In some embodiments, the complement concatenation sequence comprises ME-HYB-ME′. In some embodiments, the second read complement primer sequence comprises HYB-ME′. In some embodiments, the 3′ terminal polynucleotide complement comprises P5′-A14′-ME′. In some embodiments, the first read complement read primer binding sequence comprises A14′-ME′. In some embodiments, the complement hybridization sequence comprises HYB.
I. Structures of a Polynucleotide or a Composition
A polynucleotide may have a variety of structures. In some embodiments, a composition comprises a polynucleotide, or its complement, of one of the following structures.
In some embodiments, the polynucleotide has the structure: 3′-P7′-B15′-ME′-Insert 1-ME-HYB-ME′-Insert 2-ME-A14-P5-5′.
In some embodiments, the complement of the polynucleotide has the structure: 3′-P5′-A14′-ME′-Insert 2-ME-HYB′-ME′-Insert 1-ME-B15-P7-5′.
J. Kits Comprising a Polynucleotide
In some embodiments, a kit or composition comprises a first transposome complex and a second transposome complex, wherein the first transposome complex comprises a transposon comprising the complement of a hybridization sequence and the second transposome complex comprises a transposon comprising a hybridization sequence.
In some embodiments, a composition or kit comprises a solid support, optionally wherein the solid support is beads; components for generating transposome complexes, comprising a transposase; oligonucleotides for generating an oligonucleotide duplex, wherein the first oligonucleotide comprises a 3′ transposon end sequence and a 5′ first adapter sequence and the second oligonucleotide comprises a 5′ transposon end sequence and a 3′ second adapter sequence, wherein the 5′ transposon end sequence is complementary to the 3′ transposon end sequence; wherein the first and second adapter sequences are not the same; and a first and second set of primers for adding attachment sequences and hybridization sequences to fragments by PCR, wherein the first set of primers comprises primers for adding a hybridization sequence and a first attachment sequence to fragments; and wherein the second set of primers comprises primers for adding a complement hybridization sequence and a second attachment sequence to fragments; wherein the first and second attachment sequences are not the same.
In some embodiments, a kit or composition comprises one or more forked adapter complexes. In some embodiments, a kit or composition comprises a first forked adapter complex and a second forked adapter complex.
In some embodiments, a kit or composition comprises one or more assembled adapter duplexes. In some embodiments, a kit or composition comprises an assembled adapter duplex comprising a first adapter duplex and a second adapter duplex.
In some embodiments, a kit or composition comprises a forked adapter complex and an assembled adapter duplex.
In some embodiments, a kit or composition comprises assembled enzyme and transposons.
In some embodiments, a kit or composition comprises purified oligonucleotides.
A variety of methods can be used to generate the polynucleotides described herein.
A. Methods Comprising a Transposition Reaction
In some embodiments, a polynucleotide is prepared via a method comprising a transposition reaction.
A transposition reaction is a reaction wherein one or more transposons are inserted into target nucleic acids at random sites or almost random sites. Components in a transposition reaction include a transposase (or other enzyme capable of fragmenting and tagging a nucleic acid as described herein, such as an integrase) and a transposon element that includes a double-stranded transposon end sequence that binds to the transposase (or other enzyme as described herein), and an adapter sequence attached to one of the two transposon end sequences. One strand of the double-stranded transposon end sequence is transferred to one strand of the target nucleic acid and the complementary transposon end sequence strand is not (a non-transferred transposon sequence). The adapter sequence can include one or more functional sequences or components (e.g., primer sequences, anchor sequences, universal sequences, spacer regions, or index tag sequences) as needed or desired.
Transposon based technology can be utilized for fragmenting DNA, for example, as exemplified in the workflow for NEXTERA™ FLEX DNA sample preparation kits (Illumina, Inc.), wherein target nucleic acids, such as genomic DNA, are treated with transposome complexes that simultaneously fragment and tag (“tagmentation”) the target, thereby creating a population of fragmented nucleic acid molecules tagged with unique adapter sequences at the ends of the fragments.
A “transposome complex” is comprised of at least one transposase (or other enzyme as described herein) and a transposon recognition sequence. In some such systems, the transposase binds to a transposon recognition sequence to form a functional complex that is capable of catalyzing a transposition reaction. In some aspects, the transposon recognition sequence is a double-stranded transposon end sequence. The transposase binds to a transposase recognition site in a target nucleic acid and inserts the transposon recognition sequence into the target nucleic acid. In some such insertion events, one strand of the transposon recognition sequence (or end sequence) is transferred into the target nucleic acid, resulting in a cleavage event. Exemplary transposition procedures and systems can be readily adapted for use with the transposases.
Exemplary transposases that can be used with certain embodiments provided herein include (or are encoded by): Tn5 transposase, Sleeping Beauty (SB) transposase, Vibrio harveyi, MuA transposase and a Mu transposase recognition site comprising R1 and R2 end sequences, Staphylococcus aureus Tn552, Ty1, Tn7 transposase, Tn10 and IS10, Mariner transposase, Tc1, P Element, Tn3, bacterial insertion sequences, retroviruses, and retrotransposon of yeast. More examples include IS5, Tn10, Tn903, IS911, and engineered versions of transposase family enzymes. The methods described herein could also include combinations of transposases, and not just a single transposase.
In some embodiments, the transposase is a Tn5, Tn7, MuA, or Vibrio harveyi transposase, or an active mutant thereof. In other embodiments, the transposase is a Tn5 transposase or a mutant thereof. In other embodiments, the transposase is a Tn5 transposase or an active mutant thereof. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase, or an active mutant thereof. In some aspects, the Tn5 transposase is a Tn5 transposase as described in PCT Publ. No. WO2015/160895, which is incorporated herein by reference. In some aspects, the Tn5 transposase is a hyperactive Tn5 with mutations at positions 54, 56, 372, 212, 214, 251, and 338 relative to wild-type Tn5 transposase. In some aspects, the Tn5 transposase is a hyperactive Tn5 with the following mutations relative to wild-type Tn5 transposase: E54K, M56A, L372P, K212R, P214R, G251R, and A338V. In some embodiments, the Tn5 transposase is a fusion protein. In some embodiments, the Tn5 transposase fusion protein comprises a fused elongation factor Ts (Tsf) tag. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase comprising mutations at amino acids 54, 56, and 372 relative to the wild type sequence. In some embodiments, the hyperactive Tn5 transposase is a fusion protein, optionally wherein the fused protein is elongation factor Ts (Tsf). In some embodiments, the recognition site is a Tn5-type transposase recognition site (Goryshin and Reznikoff, J. Biol. Chem., 273:7367, 1998). In one embodiment, a transposase recognition site that forms a complex with a hyperactive Tn5 transposase is used (e.g., EZ-Tn5™ Transposase, Epicentre Biotechnologies, Madison, Wis.). In some embodiments, the Tn5 transposase is a wild-type Tn5 transposase.
In some embodiments, the transposome complex comprises a dimer of two molecules of a transposase. In some embodiments, the transposome complex is a homodimer, wherein two molecules of a transposase are each bound to first and second transposons of the same type (e.g., the sequences of the two transposons bound to each monomer are the same, forming a “homodimer”). In some embodiments, the compositions and methods described herein employ two populations of transposome complexes. In some embodiments, the transposases in each population are the same. In some embodiments, the transposome complexes in each population are homodimers, wherein the first population has a first adapter sequence in each monomer and the second population has a different adapter sequence in each monomer.
In some embodiments, the transposase complex comprises a transposase (e.g., a Tn5 transposase) dimer comprising a first and a second monomer. In some aspects, each monomer comprises a first transposon, a second transposon, and an attachment polynucleotide, where the first transposon includes a transposon end sequence at its 3′ end (also referred to as a 3′ transposon end sequence) and an adapter sequence at its 5′ end (also referred to as a 5′ adapter sequence); the second transposon includes a transposon end sequence at its 5′ end (also referred to as a 5′ transposon end sequence) and an adapter sequence at its 3′ end (also referred to as a 3′ adapter sequence); and the attachment polynucleotide includes an attachment adapter sequence hybridized to the 5′ adapter sequence of the first transposon, a primer sequence, and a linker. In some embodiments, the 5′ transposon end sequence of the second transposon is at least partially complementary to the 3′ transposon end sequence of the first transposon. In some embodiments, the attachment adapter sequence of the attachment polynucleotide is at least partially complementary to the adapter sequence of the first transposon. In some embodiments, the linker of the attachment polynucleotide includes a binding element.
1. Transposome Complexes
In some embodiments, a transposome complex comprises a first transposon comprising the complement of a first read primer binding sequence, wherein the complement of the first read primer binding sequence comprises: a 3′ portion comprising a transposon end sequence; the complement of a first adapter sequence; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence; and the complement of a hybridization sequence. In some embodiments, the first read primer binding sequence comprises a first read sequencing adapter sequence.
In some embodiments, the 3′ transposon end sequence comprises a mosaic end (ME) sequence and the 5′ transposon end sequence comprises an ME′ sequence.
In some embodiments, the complement of the first adapter sequence is a B15 sequence.
In some embodiments, the first read primer binding sequence is ME′-B15′.
In some embodiments, the second transposon comprises a complement attachment sequence 5′ of the first read primer binding sequence. In some embodiments, the complement attachment sequence comprises a P7 sequence.
In some embodiments, the transposome complex has a structure of:
In some embodiments, a transposome complex comprises a transposase; a first transposon comprising an attachment polynucleotide, wherein the attachment polynucleotide comprises a 5′ portion comprising an attachment sequence; a 3′ portion comprising a second read primer binding sequence, comprising a 3′ portion comprising a transposon end sequence; and an adapter; and a second transposon comprising a 5′ portion comprising the complement of the transposon end sequence; and a hybridization sequence.
In some embodiments, the adapter is an A14 sequence. In some embodiments, the attachment sequence comprises a P5 sequence.
In some embodiments, the transposome complex has a structure of:
In some embodiments, the first and second transposons as described herein are annealed to each other, and the first transposon is annealed to the attachment polynucleotide. The annealed polynucleotides are then loaded onto a transposase, such as a Tn5 transposase, thereby forming a transposome complex, which is then contacted with and bound to a solid support, such as a bead. In some embodiments, the annealed transposons are bound to a solid support such as a bead and a transposase is then complexed with the transposons, thereby creating a transposome that is bound to a solid support.
2. End Sequences
In some embodiments, the first transposon includes a 3′ transposon end sequence and the second transposon includes a 5′ transposon end sequence. In some embodiments, the 5′ transposon end sequence is at least partially complementary to the 3′ transposon end sequence. In some embodiments, the complementary transposon end sequences hybridize to form a double-stranded transposon end sequence that binds to the transposase (or other enzyme as described herein). In some embodiments, the transposon end sequence is a mosaic end (ME) sequence. Thus, in some embodiments, the 3′ transposon end sequence is an ME sequence and the 5′ transposon end sequence is an ME′ sequence.
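The notion of a 5′ transposon end sequence that is "at least partially complementary" to a 3′ transposon end sequence can be illustrated with a short calculation. The following is a minimal sketch only; the sequences are hypothetical placeholders chosen for illustration and are not the ME or ME′ sequences of this disclosure, and the function names are likewise hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def fraction_complementary(end_3p, end_5p):
    """Fraction of positions that base-pair when two equal-length strands,
    each written 5'->3', are aligned antiparallel."""
    if len(end_3p) != len(end_5p):
        raise ValueError("this sketch compares equal-length sequences only")
    fully_complementary = reverse_complement(end_3p)
    matches = sum(a == b for a, b in zip(fully_complementary, end_5p))
    return matches / len(end_3p)

# Hypothetical placeholder sequences (NOT the sequences of the disclosure)
first_end = "ACGTTGCAAG"                       # stands in for a 3' transposon end sequence
full_partner = reverse_complement(first_end)   # fully complementary partner
partial_partner = "CTTGCAACGA"                 # same partner with one mismatched base

print(fraction_complementary(first_end, full_partner))     # fully complementary
print(fraction_complementary(first_end, partial_partner))  # partially complementary
```

In this sketch, a value of 1.0 corresponds to full complementarity and any lower value to partial complementarity; the degree of complementarity needed for the two strands to hybridize and bind a transposase is not addressed by the calculation.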
3. Adapter Sequences
As discussed above in Section II.D, in any of the embodiments of the method described herein, the first transposon includes a 5′ adapter sequence and the second transposon includes a 3′ adapter sequence. In some embodiments, the attachment polynucleotide includes an attachment adapter sequence hybridized to the 5′ adapter sequence. In some embodiments, the attachment adapter sequence is at least partially complementary to the 5′ adapter sequence. In some embodiments, the adapter sequence is an A14 sequence or a B15 sequence. Thus, in some embodiments, the 5′ adapter sequence is an A14 sequence and the attachment adapter sequence is an A14′ sequence. In some embodiments, the 3′ adapter sequence is a B15′ sequence.
In any of the embodiments, the adapter sequences or transposon end sequences, including A14-ME, ME, B15-ME, ME′, A14, and B15, are provided below:
4. Immobilized Transposomes and Solid Supports
In some embodiments, the transposome complex is immobilized to a solid support via the first or second transposon. In some embodiments, the transposome complex is immobilized on a bead. In some embodiments, the transposome complex is immobilized on a bead via the first or second transposon.
The terms “solid surface,” “solid support,” and other grammatical equivalents refer to any material that is appropriate for or can be modified to be appropriate for the attachment of the transposome complexes. As will be appreciated by those in the art, the number of possible substrates is multitude. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, TEFLON, etc.), polysaccharides, polyhedral organic silsesquioxane (POSS) materials, nylon or nitrocellulose, ceramics, resins, silica, or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, optical fiber bundles, beads, paramagnetic beads, and a variety of other polymers.
In some embodiments, the transposome complex is immobilized on the solid support via a binding element (and optional linker). In some embodiments, the solid support is a bead, a paramagnetic bead, a flowcell, a surface of a microfluidic device, a tube, a well of a plate, a slide, a patterned surface, or a microparticle. In some embodiments, the solid support comprises or is a bead. In one embodiment, the bead is a paramagnetic bead. In some embodiments, the solid support comprises a plurality of solid supports. In some embodiments, transposome complexes are immobilized on a plurality of solid supports. In some embodiments, the plurality of solid supports comprises a plurality of beads. In some embodiments, the plurality of transposome complexes are immobilized on the solid support at a density of at least 10³, 10⁴, 10⁵, or 10⁶ complexes per mm². In some embodiments, the solid support is a bead or a paramagnetic bead, and there are greater than 10,000, 20,000, 30,000, 40,000, 50,000, or 60,000 transposome complexes bound to each bead.
Suitable bead compositions include, but are not limited to, plastics, ceramics, glass, polystyrene, methylstyrene, acrylic polymers, paramagnetic materials, thoria sol, carbon graphite, titanium dioxide, latex or cross-linked dextran such as Sepharose, cellulose, nylon, cross-linked micelles and TEFLON, as well as any other materials outlined herein for solid supports. In certain embodiments, the microspheres are magnetic microspheres or beads, for example paramagnetic particles, spheres or beads. The beads need not be spherical; irregular particles may be used. Alternatively or additionally, the beads may be porous. The bead sizes range from nanometers, e.g., 100 nm, to millimeters, e.g., 1 mm, with beads from 0.2 micron to 200 microns being preferred, and from 0.5 to 5 micron being particularly preferred, although in some embodiments smaller or larger beads may be used. The bead may be coated with a binding partner, for example the bead may be streptavidin coated. In some embodiments, the beads are streptavidin coated paramagnetic beads, for example, Dynabeads MyOne streptavidin C1 beads (Thermo Scientific catalog #65601), Streptavidin MagneSphere Paramagnetic particles (Promega catalog #Z5481), Streptavidin Magnetic beads (NEB catalog #S1420S) and MaxBead Streptavidin (Abnova catalog #U0087). The solid support could also be a slide, for example a flowcell or other slide that has been modified such that the transposome complex can be immobilized thereon.
In some embodiments, the binding partner is present on the solid support or bead at a density of from 1000 to 6000 pmol/mg, or 2000 to 5000 pmol/mg, or 3000 to 5000 pmol/mg, or 3500 to 4500 pmol/mg.
In some embodiments, the solid surface is the inner surface of a sample tube. In some embodiments, the solid surface is a capture membrane. In one example, the capture membrane is a biotin-capture membrane (for example, available from Promega Corporation). In some embodiments, the capture membrane is filter paper. In some embodiments of the present disclosure, solid supports are comprised of an inert substrate or matrix (e.g., glass slides, polymer beads, etc.) which has been functionalized, for example by application of a layer or coating of an intermediate material comprising reactive groups which permit covalent attachment to molecules, such as polynucleotides. Examples of such supports include, but are not limited to, polyacrylamide hydrogels supported on an inert substrate such as glass, particularly polyacrylamide hydrogels as described in WO2005/065814 and US2008/0280773, the contents of which are incorporated herein in their entirety by reference. The methods of tagmenting (fragmenting and tagging) DNA on a solid surface for the construction of a tagmented DNA library are described in WO2016/189331 and US2014/0093916A1, which are incorporated herein by reference in their entireties. In some embodiments, the transposome complex described herein is immobilized to a solid support via the binding element. In some such embodiments, the solid support comprises streptavidin as the binding partner and the binding element is biotin.
In some embodiments, transposome complexes are immobilized on a solid support, such as a bead, at a particular density or density range. In some embodiments, the density of complexes on a solid support refers to the concentration of transposome complexes in solution during the immobilization reaction. The complex density assumes that the immobilization reaction is quantitative. Once the complexes are formed at a particular density, that density remains constant for the batch of surface-bound transposome complexes. The resulting beads can be diluted, and the resulting concentration of complexes in the diluted solution is the prepared density for the beads divided by the dilution factor. Diluted bead stocks retain the complex density from their preparation, but the complexes are present at a lower concentration in the diluted solution. The dilution step does not change the density of complexes on the beads, and therefore affects library yield but not insert (fragment) size. In some embodiments, the density is between 5 nM and 1000 nM, or between 5 and 150 nM, or between 10 nM and 800 nM. In other embodiments, the density is 10 nM, or 25 nM, or 50 nM, or 100 nM, or 200 nM, or 300 nM, or 400 nM, or 500 nM, or 600 nM, or 700 nM, or 800 nM, or 900 nM, or 1000 nM. In some embodiments, the density is 100 nM. In some embodiments, the density is 300 nM. In some embodiments, the density is 600 nM. In some embodiments, the density is 800 nM. In some embodiments, the density is 1000 nM.
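The dilution arithmetic described above may be sketched as follows. This is a minimal illustration, assuming a quantitative immobilization reaction as stated above; the function name and the example values are hypothetical.

```python
def diluted_concentration_nM(prepared_density_nM, dilution_factor):
    """Concentration of complexes in a diluted bead stock.

    The prepared density (nM, defined during the immobilization reaction)
    is a property of the beads and is not changed by dilution; only the
    concentration of complexes in the diluted solution is reduced.
    """
    if dilution_factor < 1:
        raise ValueError("dilution factor must be at least 1")
    return prepared_density_nM / dilution_factor

prepared_density = 300.0  # nM, an illustrative value within the ranges recited above
for factor in (1, 2, 10):
    concentration = diluted_concentration_nM(prepared_density, factor)
    print(f"{factor}x dilution: density {prepared_density} nM, "
          f"concentration {concentration} nM")
```

In this sketch the density value is carried through unchanged for every dilution, which reflects the statement above that dilution affects library yield but not insert (fragment) size.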
In some embodiments, the composition includes a solid support and a transposome complex immobilized to the solid support. In some embodiments, the transposome complex includes a transposase, a first transposon, an attachment polynucleotide, and a second transposon. In some embodiments, the first transposon includes a 3′ transposon end sequence and a 5′ adapter sequence. In some embodiments, the attachment polynucleotide includes an attachment adapter sequence hybridized to the 5′ adapter sequence and a binding element. In some embodiments, the second transposon comprises a 5′ transposon end sequence and a 3′ adapter sequence. In some embodiments, the transposome complex is immobilized to the solid support through the attachment polynucleotide. In some embodiments, the attachment polynucleotide further comprises a primer sequence.
In some embodiments, the binding element comprises or is an optionally substituted biotin. In some embodiments, the binding element is connected to the attachment polynucleotide via a linker. In some embodiments, the binding element comprises or is a biotin linker. In some embodiments, the binding element comprises or is a 3′, 5′, or internal biotin.
Some embodiments of the transposome complex described herein include an attachment polynucleotide. As used herein, the attachment polynucleotide is a polynucleotide that hybridizes to a transposon on one end and binds to a surface on a second end. Thus, the transposome complex described herein is immobilized to a solid support through the attachment polynucleotide. In some embodiments, an attachment polynucleotide includes an attachment adapter sequence hybridized to the adapter sequence of the first transposon or the adapter sequence of the second transposon, a primer sequence, and a linker. In some embodiments, the linker includes a binding element.
As described herein the attachment adapter sequence may be at least partially complementary to the adapter sequence of the first or second transposon. In some embodiments, the attachment adapter sequence hybridizes to the 5′ adapter sequence. In embodiments when the attachment adapter sequence hybridizes to the 5′ adapter sequence, where the 5′ adapter sequence is an A14 sequence, the attachment adapter sequence is an A14′ sequence. In some embodiments, the attachment adapter sequence hybridizes to the 3′ adapter sequence. In embodiments when the attachment adapter sequence hybridizes to the 3′ adapter sequence, where the 3′ adapter sequence is a B15′ sequence, the attachment adapter sequence is a B15 sequence. In any of these embodiments, the attachment adapter sequence may be fully complementary to the adapter sequence of the first or second transposon or partially complementary to the adapter sequence of the first or second transposon.
In some embodiments, the attachment polynucleotide contains a primer sequence. In some embodiments, the primer sequence is a P5 primer sequence or a P7 primer sequence or a complement thereof (e.g., P5′ or P7′). The P5 and P7 primers are used on the surface of commercial flow cells sold by Illumina, Inc., for sequencing on various Illumina platforms. The primer sequences are described in U.S. Pat. Publ. No. 2011/0059865, which is incorporated herein by reference in its entirety. Examples of P5 and P7 primers, which may be alkyne terminated at the 5′ end, include the following:
As used herein, one example of a linker is a moiety that covalently connects a binding element to the end of the nucleotide portion of the attachment polynucleotide and may be used to immobilize the attachment polynucleotide to a solid support. The linker may be a cleavable linker, for example, a linker capable of being cleaved to remove the attachment polynucleotide, and thus the transposome complex or tagmentation product from the solid support. A cleavable linker as used herein is a linker that may be cleaved through chemical or physical means, such as, for example, photolysis, chemical cleavage, thermal cleavage, or enzymatic cleavage. In some embodiments the cleavage may be by biochemical, chemical, enzymatic, nucleophilic, reduction sensitive agent or other means. Cleavable linkers may comprise a moiety selected from the group consisting of: a restriction endonuclease site; at least one ribonucleotide cleavable with an RNAse; nucleotide analogues cleavable in the presence of certain chemical agent(s); photo-cleavable linker unit; a diol linkage cleavable by treatment with periodate (for example); a disulfide group cleavable with a chemical reducing agent; a cleavable moiety that may be subject to photochemical cleavage; and a peptide cleavable by a peptidase enzyme or other suitable means. Cleavage may be mediated enzymatically by incorporation of a cleavable nucleotide or nucleobase into the cleavable linker, such as uracil or 8-oxo-guanine.
In some embodiments, the linker described herein may be covalently and directly attached to the attachment polynucleotide, for example, forming a -O- linkage, or may be covalently attached through another group, such as a phosphate or an ester. Alternatively, the linker described herein may be covalently attached to a phosphate group of the attachment polynucleotide, for example, covalently attached to the 3′ hydroxyl via a phosphate group, thus forming a -O-P(O)3- linkage.
A binding element, as used herein, is a moiety that can be used to bind, covalently or non-covalently, to a binding partner. In some aspects, the binding element is on the transposome complex and the binding partner is on the solid support. In some embodiments, the binding element can bind or is bound non-covalently to the binding partner on the solid support, thereby non-covalently attaching the transposome complex to the solid support. In some embodiments, the binding element is capable of binding (covalently or non-covalently) to a binding partner on a solid support. In some aspects, the binding element is bound (covalently or non-covalently) to a binding partner on the solid support, resulting in an immobilized transposome complex.
In such embodiments, the binding element comprises or is, for example, biotin, and the binding partner comprises or is avidin or streptavidin. In other embodiments, the binding element/binding partner combination comprises or is FITC/anti-FITC, digoxigenin/digoxigenin antibody, or hapten/antibody. Further suitable binding pairs include, but are not limited to, desthiobiotin-avidin, dithiobiotin-avidin, iminobiotin-avidin, biotin-avidin, dithiobiotin-succinylated avidin, iminobiotin-succinylated avidin, biotin-streptavidin, and biotin-succinylated avidin. In some embodiments, the binding element is a biotin and the binding partner is streptavidin.
In some embodiments, the binding element can bind to the binding partner via a chemical reaction or is bound covalently by reaction with the binding partner on the solid support, thereby covalently attaching the transposome complex to the solid support. In some aspects, the binding element/binding partner combination comprises or is amine/carboxylic acid (e.g., binding via standard peptide coupling reaction under conditions known to one of ordinary skill in the art, such as EDC or NHS-mediated coupling). The reaction of the two components joins the binding element and binding partner through an amide bond. Alternatively, the binding element and binding partner can be two click chemistry partners (e.g., azide/alkyne, which react to form a triazole linkage).
In some embodiments, the attachment polynucleotide further includes additional sequences or components, such as a universal sequence, a spacer region, an anchor sequence, or an index tag sequence, or a combination thereof. A universal sequence is a region of nucleotide sequence that is common to two or more nucleic acid fragments. Optionally, the two or more nucleic acid fragments also have regions of sequence differences. A universal sequence that may be present in different members of a plurality of nucleic acid fragments can allow for the replication or amplification of multiple different sequences using a single universal primer that is complementary to the universal sequence.
Variations of the transposome complex, including the transposase, the transposons, and the attachment polynucleotide may be realized. For example, variations in configuration, design, hybridization, structural elements, and overall arrangement of the transposome complex may be realized. The disclosure and drawings provided herein provide several variations, but it is understood that additional variations within the scope of the disclosure may be readily realized.
In some embodiments, one or more library products used to generate a polynucleotide are produced by bead-based tagmentation. In some embodiments, one or more library products used to generate a polynucleotide are produced by solution-based tagmentation.
B. TruSeq Methods
Forked adapter-based technology can be utilized for generating polynucleotides, for example, as exemplified in the workflow for TruSeq sample preparation kits (Illumina, Inc.).
In some embodiments, an adapter composition or kit comprises a first forked adapter complex and a second forked adapter complex, wherein the first forked adapter complex comprises: a complement attachment polynucleotide comprising a 5′ portion comprising a complement attachment sequence; and a 3′ portion comprising an adapter; and a hybridization polynucleotide comprising (a) a 5′ portion comprising the complement of a portion of the adapter and hybridized thereto; and (b) the complement of a hybridization sequence, wherein the complement of the hybridization sequence is not complementary to the complement attachment polynucleotide; and the second forked adapter complex comprises an attachment polynucleotide comprising a 5′ portion comprising an attachment sequence; and a 3′ portion comprising the adapter; and a hybridization polynucleotide comprising (a) a 5′ portion comprising the complement of a portion of the adapter and hybridized thereto; and (b) a hybridization sequence, wherein the hybridization sequence is not complementary to the attachment polynucleotide.
In some embodiments, the attachment sequence comprises a P5 primer sequence and the complement attachment sequence comprises a P7 primer sequence.
In some embodiments, the complement attachment polynucleotide comprises a B15 sequence and the hybridization polynucleotide comprises an A14 sequence.
In some embodiments, the first forked adapter complex has the structure:
In some embodiments, the second forked adapter complex has the structure:
In some embodiments, the adapter complexes comprise methylated nucleotides (e.g., include methylated cytosines).
C. Methods Comprising Ligation
In some embodiments, a library of polynucleotides is prepared via a method comprising a ligation step.
In some embodiments, the overhangs are produced using restriction enzymes and restriction enzyme recognition sites. In some embodiments, the enzyme is a type II, type IIS, type IIP, or type IIT restriction enzyme. In some embodiments, the enzyme is BtgZI. In some embodiments, the enzyme is BglII. In some embodiments, the overhangs are ligated together using a ligase.
In some embodiments, the polynucleotides are attached to a binding element, such as biotin. In some embodiments, the digested ends of polynucleotides are removed by applying a binding partner, such as streptavidin magnetic beads.
In some embodiments, forked adapters are ligated to inserts and used to generate polynucleotides with different ends.
D. Methods Comprising Strand Overlap Extension (SOE)
In some embodiments, a library of polynucleotides is prepared via a method comprising strand overlap extension (SOE).
For example, a first library contains polynucleotides that have a first adapter sequence at one end and a second adapter sequence on the other end. In these embodiments, the first or the second adapter sequence bears a 3′ sequence that is complementary to the 3′ end sequence of a third adapter sequence in a second library. The mixing of the two libraries together by denaturation and reannealing allows the complementary ends from both libraries to hybridize. In these embodiments, a polymerase extension reaction extends the complementary regions to full length, thus generating dual-insert polynucleotides.
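The annealing and extension steps of the example above can be modeled on toy sequences. The following is a minimal sketch under stated assumptions: each strand is written 5′ to 3′, the overlap is an exact match, and every sequence and name shown is a hypothetical placeholder rather than an adapter or insert sequence of this disclosure. The sketch models only the sequence logic and not reaction conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def overlap_extend(strand_a, strand_b, min_overlap=4):
    """Anneal the 3' ends of two strands (each written 5'->3') and extend strand_a.

    Returns the full-length extension product of strand_a, or None when the
    3' ends share no complementary region of at least min_overlap bases.
    """
    longest = min(len(strand_a), len(strand_b))
    for k in range(longest, min_overlap - 1, -1):
        # The last k bases of strand_a pair antiparallel with the last k of strand_b
        if strand_a[-k:] == reverse_complement(strand_b[-k:]):
            # The polymerase copies the remainder of strand_b onto strand_a's 3' end
            return strand_a + reverse_complement(strand_b)[k:]
    return None

# Hypothetical placeholder sequences (illustration only)
ADAPTER_1 = "AAGG"
INSERT_1 = "CATCAT"
OVERLAP = "GATTACA"   # stands in for the complementary 3' adapter region
INSERT_2 = "TGGTGG"
ADAPTER_3 = "CCTT"

library_1_strand = ADAPTER_1 + INSERT_1 + OVERLAP
library_2_strand = reverse_complement(OVERLAP + INSERT_2 + ADAPTER_3)

dual_insert = overlap_extend(library_1_strand, library_2_strand)
print(dual_insert)
```

Under these assumptions the product carries both toy inserts on a single strand, separated by the overlap region, which corresponds to the dual-insert polynucleotide described above. Extending the second strand in the same way yields the reverse complement of this product.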
In many embodiments, the adapter may comprise a variety of sequences in a variety of combinations. In some embodiments, the adapter is a forked adapter that may include a P5, Read 1, tag, and/or A sequence. In some embodiments, the adapter is a forked adapter that may include a P7, Index, Read 2, tag, and/or A′ sequence.
In some embodiments, the tandem insert library is sequenced using multiple reads. In some embodiments, Read 1 and Read 4 give paired end data from the first insert. In some embodiments, Read 2 and Read 3 give paired end data from the second insert.
This application also discloses methods of generating a concatenated nucleic acid sequencing template. Multiple insert sequences can be sequenced from a concatenated nucleic acid sequencing template. In other words, a concatenated nucleic acid sequencing template can be used for generating tandem reads.
In some embodiments, a concatenated nucleic acid sequencing template is generated via formation of a hybridized adduct. As used herein, a “hybridized adduct” refers to a hybridization sequence annealed to a complement of a hybridization sequence. In some embodiments, a fully double-stranded concatenated nucleic acid sequencing template is generated after formation of a hybridized adduct.
In some embodiments, a method of generating a concatenated nucleic acid sequencing template comprises: attaching a first read primer binding sequence to the 3′ end of a first insert sequence derived from a first target nucleic acid; attaching a hybridization sequence to the 5′ end of the first insert sequence; attaching the complement of the hybridization sequence to the 3′ end of a second insert sequence derived from a discontiguous region of the first target nucleic acid or from a second target nucleic acid; annealing the hybridization sequence to the complement of the hybridization sequence to form a hybridized adduct; and synthesizing a fully double-stranded concatenated nucleic acid sequencing template from the hybridized adduct; wherein the region between the first and second insert sequences comprises a second read primer binding sequence that comprises the hybridization sequence and is orthogonal to the first read primer binding sequence; thereby generating a concatenated nucleic acid sequencing template.
In some embodiments, the attaching the first read primer binding sequence and the attaching the hybridization sequence comprises contacting the one or more target nucleic acids with a transposome complex under conditions suitable for tagmentation.
In some embodiments, the attaching the complement of the hybridization sequence to the 3′ end of a second insert sequence derived from a discontiguous region of the first target nucleic acid or from a second target nucleic acid comprises contacting the one or more target nucleic acids with a transposome complex under conditions suitable for tagmentation.
In some embodiments, the attaching a first read primer binding sequence to the 3′ end of a first insert sequence and the attaching a hybridization sequence to the 5′ end of the first insert sequence comprise contacting one or more target nucleic acids with a first forked adapter complex under conditions suitable for ligation of the adapter complexes to the ends of the fragments to form fragments ligated at both ends with the first adapter complex and fragments ligated at both ends with the second adapter complex, and denaturing the ligated fragments.
In some embodiments, the attaching the complement of the hybridization sequence to the 3′ end of a second insert sequence comprises contacting one or more target nucleic acids with a second forked adapter complex under conditions suitable for ligation of the adapter complexes to the ends of the fragments to form fragments ligated at both ends with the first adapter complex and fragments ligated at both ends with the second adapter complex, and denaturing the ligated fragments.
In some embodiments, a method of generating a concatenated nucleic acid sequencing template comprises contacting a first sample comprising a first target nucleic acid with a first transposome complex and a second transposome complex, wherein each transposome complex comprises:
In some embodiments, a method of generating a concatenated nucleic acid sequencing template comprises:
In some embodiments, the transposome complexes are immobilized on a solid support.
In some embodiments, forked adapters may be used to prepare sequencing templates comprising more than one insert.
In some embodiments, the adapter may be a forked adapter, also known as a Y-adapter. Forked adapter-based technology can be utilized for generating polynucleotides, for example, as exemplified in the workflow for TruSeq™ sample preparation kits (Illumina, Inc.). Reagents from the workflow for TruSight® Oncology kits (Illumina, Inc.) may also be used to assemble forked adapters. In some embodiments, a forked adapter comprises a HYB or HYB′ sequence.
As used herein, a “forked adapter” refers to an adapter comprising two strands of nucleic acid, wherein the two strands each comprise a region that is complementary to the other strand and a region that is not complementary to the other strand. In some embodiments, the two strands of nucleic acid in the forked adapter are annealed together before ligation, with the annealing based on complementary regions. In some embodiments, the complementary regions each comprise 12 nucleotides. In some embodiments, a forked adapter is ligated to both strands at the end of a double-stranded DNA fragment. In some embodiments, a forked adapter is ligated to one end of a double-stranded DNA fragment. In some embodiments, a forked adapter is ligated to both ends of a double-stranded DNA fragment. In some embodiments, the forked adapters on opposite ends of a fragment are different (as shown in
In some embodiments, each forked adapter comprises a first oligonucleotide and a second oligonucleotide that are partially hybridized to each other to form a double-stranded section and a single stranded section.
In order to block a hybridization sequence (X) and its complement (X′) from binding to each other at undesired times, blocking oligonucleotides can be employed. In some embodiments, blocking oligonucleotides comprise one or more modifications such that they are not targets of tagmentation. In other words, the blocking oligonucleotides may be designed to be resistant to transposases and thus avoid cleavage of the double-stranded nucleic acid formed by hybridization of a blocking oligonucleotide to a hybridization sequence or its complement. In some embodiments, a blocking oligonucleotide comprises a phosphorothioate backbone.
In some embodiments, a blocking oligonucleotide comprises the complement of all or part of the sequence one wants to block from hybridizing. Thus, in some embodiments, a blocking oligonucleotide may be all or part of an X or X′ sequence. As used herein, a “blocking oligonucleotide” refers to an oligonucleotide that can be used to inhibit binding of two sequences to each other, until the blocking oligonucleotide bound to at least one of the two sequences is removed. In some embodiments, a blocking oligonucleotide comprises a sequence that is fully or partially complementary to all or part of either the hybridization sequence (X or HYB) or its complement (X′ or HYB′). For example, a blocking oligonucleotide (X′B′) to block a HYB sequence (X in
In the case of the forked adapters shown in
In some embodiments, a blocking oligonucleotide (XB) is bound to the X′ sequence. In some embodiments, a blocking oligonucleotide (X′B′) is bound to the X sequence. In some embodiments, a blocking oligonucleotide is bound to both the X and X′ sequences. The blocking oligonucleotide may be fully or partially complementary to either an X or an X′ sequence. In some embodiments, the blocking oligonucleotide binds to the full X or X′ sequence. In some embodiments, the blocking oligonucleotide binds to a portion of the X or X′ sequence.
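The relationship between a blocking oligonucleotide and the sequence it blocks can be sketched in Python. This is a simplified model that treats "partially complementary" as perfect complementarity over a contiguous portion of the target; mismatched duplexes are not modeled. The X sequence and the function names are hypothetical.

```python
# Illustrative sketch only: the X sequence below is a made-up example.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def blocked_fraction(blocker: str, target: str) -> float:
    """Fraction of `target` covered by `blocker`, or 0.0 if it cannot bind.

    Binding is modeled as the blocker being the exact reverse complement of
    all or a contiguous part of the target (an assumption of this sketch).
    """
    if not blocker or revcomp(blocker) not in target:
        return 0.0
    return len(blocker) / len(target)


X = "ACGTTGCAACGG"      # hypothetical hybridization sequence (HYB)
X_PRIME = revcomp(X)    # its complement (HYB')

full_xb = X             # XB: all of X, binds the full X' sequence
partial_xb = X[:8]      # XB: part of X, binds a portion of X'
```

Here `blocked_fraction(full_xb, X_PRIME)` corresponds to a blocking oligonucleotide that binds the full X′ sequence, while `blocked_fraction(partial_xb, X_PRIME)` corresponds to one that binds only a portion of it.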
One or both forked adapters may also comprise an affinity moiety on the 5′ end of the first strand of the forked adapter. In some embodiments, such as that shown in
In some embodiments, the binding moiety serves to immobilize tagged fragments (prepared by ligation of forked adapters to fragments) on a solid support. In some embodiments, single-stranded fragments ligated to at least one first strand of a forked adapter will be immobilized on the solid support. In some embodiments, immobilized fragments can be washed and blocking oligonucleotides can be removed, without the fragments being released from the surface of the solid support.
In some embodiments, a first strand of a forked adapter comprises a 5′ affinity element capable of binding to an affinity binding partner on a solid support or bead. Such an affinity element may be biotin, as shown by the “Bio” in the first and second adapters shown in
In some embodiments, the affinity element is connected via a linker attached to the first strand. In some embodiments, this linker is a cleavable linker.
In some embodiments, the affinity moiety is linked to the first strand of a forked adapter by a linker. In some embodiments, the linker is a cleavable linker. In some embodiments, a user can release sequencing templates prepared from immobilized fragments from a solid support at a desired time by cleaving a cleavable linker between the affinity moiety and the first strand of the forked adapter. In some embodiments, amplicons of sequencing templates may be prepared on the surface of the solid support, in which case the amplicons may be sequenced without requiring release of sequencing templates from the surface.
In some embodiments, the hybridization sequence (HYB) and the complement of the hybridization sequence (HYB′) can hybridize to each other. However, in some cases, this could potentially lead to dimerization between different forked adapters based on binding of HYB in one forked adapter to a HYB′ in another forked adapter. Such adapter dimerization could decrease the ability to ligate the forked adapters to the end of fragments of nucleic acid.
In some embodiments, a blocking oligonucleotide is employed to block binding of HYB to HYB′ between different forked adapters until a user wants this binding to occur. In some embodiments, the hybridization sequence or its complement is bound to a blocking oligonucleotide that is fully or partially complementary to the hybridization sequence or its complement.
In some embodiments, a forked adapter comprising two polynucleotide strands comprises (a) a first strand comprising a sequencing primer sequence; and (b) a second strand comprising a 3′ hybridization sequence or its complement, wherein the 3′ end of the first strand is fully or partially complementary to the 5′ end of the second strand. In other words, the two strands of a forked adapter may hybridize together in a certain region, while the two strands are separate in another region. The sequences of the first and second strands may be different, or fully or partially non-complementary, in the region wherein the two strands are separate, while the first and second strands may be fully or partially complementary in the region wherein the two strands are hybridized together.
As is well-known in the field, additional sequences of interest can be comprised in forked adapters, such as UMIs and sample indexes. In other words, forked adapters are not limited to the types of sequences shown in
In some embodiments, the first strand and/or second strand further comprise at least one of an adapter, a barcode sequence, a unique molecular identifier (UMI) sequence, a sample index sequence, a capture sequence, or a cleavage sequence.
In some embodiments, the sequencing primer sequence comprised in a first strand of a forked adapter comprises a B15 sequence or an A14 sequence, or their complements. In some embodiments, the first strand of a forked adapter further comprises a P7 or P5 primer sequence, or their complements. Such embodiments are shown in
In some embodiments, a forked adapter is comprised in a mixture with another non-identical forked adapter. In some embodiments, a mixture comprises a first forked adapter and a second forked adapter that are different.
In some embodiments, a composition or kit comprises two forked adapters, wherein (a) the first forked adapter comprises a first strand comprising a first read sequencing primer sequence and a second strand comprising a complement of a hybridization sequence and (b) the second forked adapter comprises a first strand comprising a second read sequencing primer sequence and a second strand comprising a hybridization sequence. In some embodiments, one or both forked adapters comprised in a kit or composition comprise a blocking oligonucleotide.
A mixture of forked adapters may be ligated to double-stranded nucleic acid fragments. These fragments may be prepared from DNA (such as genomic DNA or cDNA prepared from RNA) using well-known techniques in the art, such as physical means using acoustics, nebulization, centrifugal force, needles, or hydrodynamics. Enzymatic means of preparing fragments are also well-known, such as DNase treatment.
When a mixture comprising a first forked adapter and a second forked adapter is combined with double-stranded nucleic acid fragments under conditions for ligating, it is predicted that 50% of fragments would be tagged with a first forked adapter at one end and a second forked adapter at a second end (
In some embodiments, tagged fragments prepared in solution by ligation of forked adapters can then be immobilized on the surface of a solid support.
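The predicted 50% of fragments carrying different adapters at their two ends follows from simple counting, sketched below in Python. The sketch assumes that the two forked adapters are present in equal amounts and that ligation at each fragment end is independent; both are assumptions of this illustration, as is the function name.

```python
from fractions import Fraction
from itertools import product


def ligation_outcomes(frac_first: Fraction = Fraction(1, 2)) -> dict[str, Fraction]:
    """Expected fraction of fragments for each adapter combination.

    frac_first is the assumed share of the first forked adapter in the mix.
    Each fragment end is assumed to receive an adapter independently.
    """
    probs = {"first": frac_first, "second": 1 - frac_first}
    outcome = {
        "first/first": Fraction(0),
        "first/second": Fraction(0),
        "second/second": Fraction(0),
    }
    for left, right in product(probs, repeat=2):
        # The two orientations of a mixed fragment are counted together.
        key = "/".join(sorted((left, right)))
        outcome[key] += probs[left] * probs[right]
    return outcome
```

With an equal mix, the mixed class is one half of all fragments and each same-adapter class is one quarter; only the mixed class carries both a hybridization sequence and its complement.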
In some embodiments, a method of generating one or more concatenated nucleic acid sequencing templates comprises contacting a sample comprising double-stranded nucleic acid fragments each comprising an insert prepared from a target nucleic acid with a composition or kit comprising two forked adapters, wherein one or both forked adapters comprise a blocking oligonucleotide. In some embodiments, after contacting the sample with the two forked adapters, the method comprises ligating the forked adapters to the double-stranded fragments to prepare tagged double-stranded fragments and immobilizing the tagged double-stranded fragments on a solid support.
In some embodiments, double-stranded fragments are applied to a solid support after ligation with forked adapters. In some embodiments, both 5′ ends of tagged double-stranded fragments comprise an affinity moiety (based on ligation of the first strand of a forked adapter comprising an affinity moiety) that can bind to a binding moiety on the surface of a solid support. In some embodiments, binding of the affinity moiety to the binding moiety immobilizes fragments on the solid support, such that they will not be released from the support by temperature changes that can allow release of a blocking oligonucleotide bound to a hybridization sequence or its complement.
After immobilizing double-stranded fragments on the surface of a solid support, a method can comprise denaturing (1) the immobilized tagged double-stranded fragments to produce immobilized single-stranded fragments and (2) the blocking oligonucleotides to unblock hybridization sequences and complements of hybridization sequences. In some embodiments, the denaturing is performed with an increase in temperature, change in pH, and/or addition of one or more chaotropic agents. In some embodiments, for example, a single temperature change can mediate denaturing of the two strands of double-stranded fragments and release of the blocking oligonucleotide. In some embodiments, the increase in temperature associated with denaturing is an increase from 45° C.-55° C. to 85° C.-95° C., optionally wherein the increase in temperature is an increase from 50° C. to 90° C. In some embodiments, the one or more chaotropic agents comprise formamide and/or NaOH.
In some embodiments, a first single-stranded fragment comprises an insert, and a second single-stranded fragment comprises an insert that is the complement of the insert comprised in the first fragment. In some embodiments, a first single-stranded fragment comprises an insert, and a second fragment comprises an insert that is not the complement of the insert comprised in the first fragment. In some embodiments, hybridizing occurs between single-stranded fragments prepared from double-stranded fragments comprising a first forked adapter ligated at one end of each fragment and a second forked adapter ligated at the other end of each fragment. In some embodiments, two immobilized single-stranded fragments do not hybridize to each other to form a bridge in the absence of binding of a hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment. In some embodiments, hybridizing two immobilized single-stranded fragments to each other to form a bridge does not occur between single-stranded fragments prepared from double-stranded fragments comprising the same forked adapter ligated at both ends of each fragment.
In some embodiments, the surface of the solid support is washed after the denaturing, and the blocking oligonucleotides will be removed by the wash, while the single-stranded fragments remain immobilized due to the interaction between the 5′ affinity moiety on the fragments with the binding moiety of the surface of the solid support. In some embodiments, the immobilizing of double-stranded or single-stranded fragments is by binding of an affinity moiety from the first and/or second forked adapter to one or more binding moieties on the surface of the solid support. In some embodiments, the affinity moiety is biotin, desthiobiotin, or dual biotin and the binding moiety is avidin or streptavidin.
Since the single-stranded fragments are prepared from double-stranded fragments that were already immobilized on a single surface on a solid support, complementary single-stranded fragments from a double-stranded fragment are likely to be in close proximity (as shown in
Next, the method comprises hybridizing two immobilized single-stranded fragments to each other to form a bridge by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment and extending from the 3′ ends of both single-stranded fragments to produce a double-stranded concatenated nucleic acid sequencing template wherein each strand of the template comprises inserts (or their complements) from both immobilized single-stranded fragments (as shown in
In some embodiments, a single-stranded fragment prepared from a double-stranded fragment ligated with a first strand of a first forked adapter (such as shown in
In some embodiments, one or more additional rounds of denaturing, hybridizing, and extending are performed. In this way, the method can proceed in making sequencing templates until single-stranded fragments do not have appropriate other single-stranded fragments with which to form bridges (and concatenated sequencing templates) via HYB/HYB′ binding.
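The pairing rule that governs bridge formation can be sketched in Python. The sketch reduces each immobilized single-stranded fragment to the tag at its 3′ end ("HYB" or "HYB'") and deliberately ignores spatial proximity on the surface; the function name and this representation are assumptions made for illustration.

```python
def pair_fragments(three_prime_tags: list[str]) -> tuple[list[tuple[int, int]], list[int]]:
    """Pair fragments whose 3' tags are HYB and HYB'.

    Returns (bridges, unpaired), where bridges holds index pairs that can
    hybridize and be extended, and unpaired holds fragments left without a
    complementary partner.
    """
    hyb = [i for i, tag in enumerate(three_prime_tags) if tag == "HYB"]
    hyb_c = [i for i, tag in enumerate(three_prime_tags) if tag == "HYB'"]
    # A bridge needs exactly one HYB fragment and one HYB' fragment.
    bridges = list(zip(hyb, hyb_c))
    paired = {index for pair in bridges for index in pair}
    unpaired = [i for i in range(len(three_prime_tags)) if i not in paired]
    return bridges, unpaired
```

For example, with tags `["HYB", "HYB'", "HYB'", "HYB'"]` only one bridge can form, and the two remaining HYB′ fragments stay single-stranded because they have no complementary partner, which is the stopping condition described above.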
In some embodiments, both single-stranded fragments prepared from a double-stranded fragment are immobilized on the surface of the same solid support. In some embodiments, the method is performed with a single surface on a solid support, so that all fragments are immobilized on the same solid support. The left and right surfaces (shown with attachment of the first and second fragments) presented in
In some embodiments, release of blocking oligonucleotides generates “free” hybridization sequences that can bind to their complement sequences. In some embodiments, the hybridization sequence comprised in one single-stranded fragment can bind to a complement of the hybridization sequence in another single-stranded fragment. Such binding may generate a “bridge” as shown in
After elongation, a concatenated sequencing template can comprise two inserts that are copies of each other, as shown in
Single-stranded fragments with identical ligated adapters cannot hybridize to each other. For example, two fragments tagged with X′ cannot pair to each other at the hybridization sequence (
Accordingly, no sequencing templates comprising two inserts can be prepared from fragments that comprise the same adapters (as indicated by the 0% shown in
In this way, 100% of sequencing templates comprising two copies of an insert are prepared from fragments that comprised different adapters (
Accordingly, a full-length concatenated sequencing template can be prepared after elongation comprising two copies of the same insert sequence and appropriate adapters that may be needed for the desired sequencing platform, as shown in
Since double-stranded fragments are first immobilized on the solid support and then denatured, there is a high probability that two single-stranded fragments denatured from the same double-stranded fragment will be immobilized in close proximity to each other on the surface. This ordering of steps means that the two single-stranded fragments from the same double-stranded fragment (wherein one fragment comprises a Strand A sequence and the other fragment comprises a Strand A′ sequence, as shown in
Alternatively, single-stranded fragments comprising unrelated insert sequences and complementary adapters can also hybridize into bridges and then generate concatenated sequencing templates. Concatenated sequencing templates with two different inserts can serve to increase the sequencing depth by allowing additional sequence reads as compared to sequencing with standard sequencing templates that comprise a single insert.
A. Methods of Compartmentalization for Evaluating Proximity Data
Any method described herein may be used with compartmentalization. In some embodiments, compartmentalization allows for generating proximity data, such as whether different inserts were comprised in the same target nucleic acid. When the same target nucleic acid is a chromosome, compartmentalization may be used for methods of haplotype phasing as described herein.
In some embodiments, compartmentalization is used with the present methods using forked adapters or transposomes to evaluate proximity data. In some embodiments, compartments may be used with dilution to limit the number of available target nucleic acids. In some embodiments, each compartment generally comprises one or no target nucleic acid after dilution (as shown in
In some embodiments, the compartments are wells, tubes, or droplets. For example,
“Droplet” means a volume of liquid on a droplet actuator. Typically, a droplet is at least partially bounded by a filler fluid. For example, a droplet may be completely surrounded by a filler fluid or may be bounded by filler fluid and one or more surfaces of the droplet actuator. As another example, a droplet may be bounded by filler fluid, one or more surfaces of the droplet actuator, and/or the atmosphere. In another example, a droplet may be bounded by filler fluid and the atmosphere. Droplets may, for example, be aqueous or non-aqueous or may be mixtures or emulsions including aqueous and non-aqueous components. Droplets may take a wide variety of shapes; nonlimiting examples include generally disc shaped, slug shaped, truncated sphere, ellipsoid, spherical, partially compressed sphere, hemispherical, ovoid, cylindrical, combinations of such shapes, and various shapes formed during droplet operations, such as merging or splitting or formed as a result of contact of such shapes with one or more surfaces of a droplet actuator. For examples of droplet fluids that may be subjected to droplet operations using the approach of the present disclosure, see Eckhardt et al., International Patent Pub. No. WO/2007/120241, entitled, “Droplet-Based Biochemistry,” published on Oct. 25, 2007, the entire disclosure of which is incorporated herein by reference. U.S. Pat. No. 10,975,371 teaches a wide variety of applications of droplets and droplet actuators and is incorporated herein in its entirety.
In some embodiments, fragments may be prepared within compartments using two pools of forked adapters: one pool comprising forked adapters comprising a hybridization sequence (i.e., the second adapter of
In some embodiments, a method of generating one or more concatenated nucleic acid sequencing templates comprises compartmentalizing a sample comprising target double-stranded nucleic acid into a plurality of different compartments and preparing fragments each comprising an insert from the double-stranded nucleic acid within the plurality of different compartments. The method may then comprise contacting the plurality of different compartments with a composition or kit comprising two forked adapters, wherein one or both forked adapters comprise a blocking oligonucleotide, and ligating the forked adapters to the double-stranded fragments to prepare tagged double-stranded fragments within the plurality of different compartments.
In some embodiments, the method may then comprise denaturing (1) the tagged double-stranded fragments to produce single-stranded fragments and (2) the blocking oligonucleotides to unblock hybridization sequences and complements of hybridization sequences within the plurality of different compartments, and hybridizing two single-stranded fragments within the same compartment to each other to form a bridge by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment. In some embodiments, the method may comprise extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both single-stranded fragments within the same compartment.
In some embodiments, the target double-stranded nucleic acid comprises double-stranded DNA fragments, and the preparing fragments prepares subfragments of the double-stranded DNA fragments. In other words, the target double-stranded nucleic acid may be fragmented into relatively large fragments, which are then fragmented into subfragments in compartments. This is shown in
Since single-stranded fragments are not immobilized in this method, concatenated sequencing templates are likely prepared comprising two different insert sequences. In some embodiments, a first single-stranded fragment comprises an insert and a second fragment comprises an insert that is not the complement of the insert comprised in the first fragment.
In some embodiments, the hybridizing occurs between single-stranded fragments prepared from double-stranded fragments comprising a first forked adapter ligated at one end of each fragment and a second forked adapter ligated at the other end of each fragment.
In some embodiments, single-stranded fragments do not hybridize to each other to form a bridge in the absence of binding of a hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment. In some embodiments, the hybridizing two single-stranded fragments to each other to form a bridge does not occur between single-stranded fragments prepared from double-stranded fragments comprising the same forked adapter ligated at both ends of each fragment.
B. Haplotype Phasing
“Haplotype phasing,” as used herein, refers to identifying alleles that are co-located on the same chromosome. Sequencing data generally consists of unphased genotypes, and such data cannot differentiate which of the two parental chromosomes, or haplotypes, a particular allele falls on.
Methods of compartmentalization (such as for use in preparing whole-genome haplotyping) are well-known in the art, such as those taught in Amini et al., Nat Genet. 46(12):1343-9 (2014); Kaper F, et al. Proc. Natl. Acad. Sci. USA. 110(14):5552-5557 (2013); Kitzman J O, et al. Nat. Biotechnol. 29(1):59-63 (2011); Peters B A, et al. Nature. 487(7406):190-195 (2012); Fan H C, et al. Nat. Biotechnol. 29(1):51-57 (2011); Levy S, et al. PLoS Biol. 5(10):e254 (2007); Duitama J, et al. Nucleic Acids Res. 40(5):2041-2053 (2012); Suk E K, et al. Genome Res. 21(10):1672-1685 (2011), each of which is incorporated by reference in its entirety herein.
In some embodiments, compartmentalizing separates different haplotypes into different compartments and the method is used for haplotype phasing. In some embodiments, target nucleic acids, such as double-stranded DNA, are aliquoted into multiple compartments by limiting dilution such that an individual compartment contains a limited number of DNA molecules whereby any position of the genome is likely to be represented by haploid DNA in a compartment.
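The effect of limiting dilution on haploid representation can be estimated with a simple occupancy model, sketched below in Python. The model assumes that copies of each of the two haplotypes covering a given genomic position fall into a compartment independently and follow a Poisson distribution with the same mean; this statistical model and the function name are assumptions of the sketch and are not taken from the methods described herein.

```python
import math


def haploid_fraction(mean_copies_per_haplotype: float) -> float:
    """Among compartments containing a position, fraction holding one haplotype only.

    mean_copies_per_haplotype is the assumed Poisson mean number of copies of
    each haplotype covering the position in one compartment.
    """
    if mean_copies_per_haplotype <= 0:
        raise ValueError("mean must be positive")
    # Probability that a given haplotype is present at least once.
    p = 1.0 - math.exp(-mean_copies_per_haplotype)
    present = 1.0 - (1.0 - p) ** 2   # at least one haplotype present
    only_one = 2.0 * p * (1.0 - p)   # exactly one of the two haplotypes present
    return only_one / present
```

Under these assumptions, lowering the mean occupancy by further dilution raises the fraction of compartments in which a position is represented by a single haplotype, at the cost of more compartments that lack the position entirely.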
In some embodiments, the limiting dilution reduces the chance that both haplotypes (such as Chr1-Hap1 and Chr2-Hap2 in
Such a method is shown in
In the example shown in
When this method is performed with a sample from an organism with a known genome, the presence of inserts from different chromosomes in the same concatenated sequencing template (because these different chromosomes were comprised in the same compartment during the method) can be resolved from the sequencing data. By analysis to determine the chromosomes that were in the same compartment, information on the alleles comprised in a haploid copy can be determined. In some embodiments, the method does not require barcodes. Instead, the present use of concatenated sequencing templates prepared in compartments allows for analysis of which insert sequences were comprised in a haploid copy without requiring barcodes.
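One way such barcode-free analysis might be organized is sketched below in Python. The sketch assumes that each sequenced template has already been reduced to a pair of allele observations of the form (chromosome, position, allele), and that two inserts found in one template on the same chromosome derive from the same haploid copy because of limiting dilution. These assumptions, the data layout, and the function name are hypothetical simplifications and do not represent a complete phasing pipeline.

```python
from collections import defaultdict

Allele = tuple[str, int, str]  # (chromosome, position, allele), hypothetical layout


def link_alleles(templates: list[tuple[Allele, Allele]]) -> list[set[Allele]]:
    """Group alleles linked through shared concatenated templates (union-find)."""
    parent: dict[Allele, Allele] = {}

    def find(x: Allele) -> Allele:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for first, second in templates:
        root_a, root_b = find(first), find(second)
        # Only inserts on the same chromosome are treated as one haploid copy.
        if first[0] == second[0] and root_a != root_b:
            parent[root_b] = root_a

    groups: dict[Allele, set[Allele]] = defaultdict(set)
    for allele in parent:
        groups[find(allele)].add(allele)
    return list(groups.values())
```

In this simplified view, a chain of templates that share inserts links alleles into one candidate haplotype block, while inserts from different chromosomes in the same template are kept in separate groups and indicate only that the chromosomes shared a compartment.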
In some embodiments, tagmentation is performed in solution to prepare tagged double-stranded fragments. These tagged double-stranded fragments may be used for preparing sequencing templates comprising multiple inserts similarly to methods described above for ligation of forked adapters. In some embodiments, tagged double-stranded fragments are prepared in solution using two pools of transposomes, and the tagged double-stranded fragments are then immobilized on a solid support. In some embodiments, the immobilizing is performed by binding of an affinity moiety that was incorporated in tagged fragments during tagmentation to a binding moiety on a solid support.
In some embodiments, a method of generating one or more concatenated nucleic acid sequencing templates comprises (a) contacting a sample comprising double-stranded target nucleic acid with two pools of transposome complexes in solution; wherein the first pool of transposome complexes comprises a transposase; a first transposon comprising a 3′ transposon end sequence and a first read sequencing adapter sequence; and a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ complement of a hybridization sequence; and wherein the second pool of transposome complexes comprises a transposase; a first transposon comprising a 3′ transposon end sequence and a second read sequencing adapter sequence; and a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ hybridization sequence.
In some embodiments, one or both second transposons comprise a blocking oligonucleotide. Such blocking oligonucleotides are described above for methods with forked adapters, and the blocking oligonucleotides may be used to inhibit binding of a hybridization sequence comprised in one pool of transposome complexes to the complement of the hybridization sequence in the other pool of transposome complexes.
In some embodiments, the method comprises tagmenting the double-stranded nucleic acids to produce tagged double-stranded fragments; releasing the transposome complex from the double-stranded fragments; and extending and ligating the double-stranded fragments.
In some embodiments, the tagged double-stranded fragments are immobilized on a solid support. In some embodiments, this immobilization is performed by binding of a 5′ affinity moiety comprised in a tag to a binding moiety on the solid support.
In some embodiments, the method then comprises denaturing (1) the immobilized tagged double-stranded fragments to produce immobilized single-stranded fragments and (2) the blocking oligonucleotides to unblock hybridization sequences and complements of hybridization sequences. In some embodiments, after the denaturing, the method comprises hybridizing two immobilized single-stranded fragments to each other to form a bridge by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment and extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both immobilized single-stranded fragments.
In some embodiments, the double-stranded concatenated nucleic acid sequencing template comprises an insert sequence and a copy of the insert sequence. In some embodiments, the double-stranded concatenated nucleic acid sequencing template comprises two insert sequences that are different from each other.
The hybridizing of a hybridization sequence in one single-stranded template to the complement of the hybridization sequence in another single-stranded template and extension to prepare concatenated sequencing templates can be performed as described above for forked adapter methods. Essentially, once tagged double-stranded fragments in solution are prepared (either by ligation of forked adapters or by tagmentation in solution), the later steps of immobilizing and preparing bridges and then concatenated sequencing templates can be performed by similar steps.
In some embodiments, hybridizing occurs between single-stranded fragments prepared from double-stranded fragments comprising a tag from a second transposon of a first transposome complex at one end of each fragment and a tag from a second transposon of a second transposome at the other end of each fragment.
In some embodiments, the hybridizing two immobilized single-stranded fragments to each other to form a bridge does not occur between single-stranded fragments prepared from double-stranded fragments comprising a tag from the same transposome complex at both ends of each fragment.
In some embodiments, sequencing templates comprising multiple inserts are prepared using transposomes immobilized on a solid support. In some embodiments, the solid support is a bead, slide, wall of a vessel, a flow cell, or a nanowell comprised in a flow cell.
As used herein, a “transposome complex” or a “transposome” is comprised of at least one transposase (or other enzyme as described herein) and a transposon recognition sequence. In some such systems, the transposase binds to a transposon recognition sequence to form a functional complex that is capable of catalyzing a transposition reaction. In some respects, the transposon recognition sequence is a double-stranded transposon end sequence. The transposase binds to a transposase recognition site in a target nucleic acid and inserts the transposon recognition sequence into a target nucleic acid. In some such insertion events, one strand of the transposon recognition sequence (or end sequence) is transferred into the target nucleic acid, resulting in a cleavage event. Exemplary transposition procedures and systems can be readily adapted for use with the transposases.
A “transposase” means an enzyme that is capable of forming a functional complex with a transposon end-containing composition (e.g., transposons, transposon ends, transposon end compositions) and catalyzing insertion or transposition of the transposon end-containing composition into a double-stranded target nucleic acid. A transposase as presented herein can also include integrases from retrotransposons and retroviruses.
Transposon-based technology can be utilized for fragmenting DNA, wherein target nucleic acids, such as genomic DNA, are treated with transposome complexes that simultaneously fragment and tag the target (“tagmentation”), thereby creating a population of fragmented nucleic acid molecules tagged with unique adapter sequences at the ends of the fragments. Tagmentation includes the modification of DNA by a transposome complex comprising transposase enzyme complexed with one or more tags (such as adapter sequences) comprising transposon end sequences (referred to herein as transposons). Tagmentation thus can result in the simultaneous fragmentation of the DNA and ligation of the adapters to the 5′ ends of both strands of duplex fragments.
A transposition reaction is a reaction wherein one or more transposons are inserted into target nucleic acids at random sites or almost random sites. Components in a transposition reaction may include a transposase (or other enzyme capable of fragmenting and tagging a nucleic acid as described herein, such as an integrase) and a transposon element that includes a double-stranded transposon end sequence that binds to the enzyme, and an adapter sequence attached to one of the two transposon end sequences. One strand of the double-stranded transposon end sequence is transferred to one strand of the target nucleic acid and the complementary transposon end sequence strand is not (i.e., a non-transferred transposon sequence). The adapter sequence can comprise one or more functional sequences (e.g., primer sequences) as needed or desired.
The term “transposon end” refers to a double-stranded nucleic acid DNA that exhibits only the nucleotide sequences (the “transposon end sequences”) that are necessary to form the complex with the transposase or integrase enzyme that is functional in an in vitro transposition reaction. In some embodiments, a transposon end is capable of forming a functional complex with the transposase in a transposition reaction. As non-limiting examples, transposon ends can include the 19-bp outer end (“OE”) transposon end, inner end (“IE”) transposon end, or “mosaic end” (“ME”) transposon end recognized by a wild-type or mutant Tn5 transposase, or the R1 and R2 transposon end as set forth in the disclosure of US 2010/0120098, the content of which is incorporated herein by reference in its entirety. Transposon ends can comprise any nucleic acid or nucleic acid analogue suitable for forming a functional complex with the transposase or integrase enzyme in an in vitro transposition reaction. For example, the transposon end can comprise DNA, RNA, modified bases, non-natural bases, modified backbone, and can comprise nicks in one or both strands. Although the term “DNA” is used throughout the present disclosure in connection with the composition of transposon ends, it should be understood that any suitable nucleic acid or nucleic acid analogue can be utilized in a transposon end.
The term “transferred strand” refers to the transferred portion of both transposon ends. Similarly, the term “non-transferred strand” refers to the non-transferred portion of both “transposon ends.” The 3′-end of a transferred strand is joined or transferred to target DNA in an in vitro transposition reaction. The non-transferred strand, which exhibits a transposon end sequence that is complementary to the transferred transposon end sequence, is not joined or transferred to the target DNA in an in vitro transposition reaction.
In some embodiments, the transposon is a forked adapter transposon. A forked adapter transposon comprises two strands. In some embodiments, the second strand of the forked adapter transposon comprises an adapter sequence and a sequence fully or partially complementary to the first strand of the forked adapter transposon. The sequences with full or partial complementarity in the first and second strands allow the two strands to hybridize together and form the forked structure.
In some embodiments, more than one type of transposome complex is immobilized on the surface of a solid support. In some embodiments, fragments can be prepared with different tags based on use of different transposomes.
In some embodiments, a solid support comprises two pools of immobilized transposome complexes. In some embodiments, a first pool of transposome complexes comprises (a) a transposase; (b) a first transposon comprising a 3′ transposon end sequence, a first read sequencing adapter sequence, and a 5′ affinity moiety; and (c) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ complement of a hybridization sequence. In some embodiments, a second pool of transposome complexes comprises (a) a transposase; (b) a first transposon comprising a 3′ transposon end sequence, a second read sequencing adapter sequence, and a 5′ affinity moiety; and (c) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ hybridization sequence. In some embodiments, each first transposon is immobilized by binding of a 5′ affinity moiety to a binding moiety on the surface of the solid support.
In some embodiments, a first pool of immobilized transposome complexes comprises a first forked adapter comprising a first oligonucleotide comprising P5.R1 and a second oligonucleotide comprising an X′ (complement of a hybridization sequence). In some embodiments, a second pool of immobilized transposome complexes comprises a second forked adapter comprising a first oligonucleotide comprising P7.R2 and a second oligonucleotide comprising an X (hybridization sequence). Such an exemplary embodiment is shown in
In some embodiments, a transposome complex comprises a dimer of two molecules of a transposase. In some embodiments, transposome complexes comprise homodimers and/or heterodimers.
In some embodiments, a transposome complex is a homodimer, wherein two molecules of a transposase are each bound to first and second transposons of the same type (e.g., the sequences of the two transposons bound to each monomer are the same, forming a “homodimer”). In some embodiments, the compositions and methods described herein employ two populations of transposome complexes. In some embodiments, the transposases in each population are the same. As used herein, “homodimers” refers to a transposome dimer that comprises the same transposon sequences at both sites. In some embodiments, the compositions and methods described herein employ a population of transposome complexes assembled by contacting a first forked adapter with a transposase to prepare a first transposome complex and contacting a second forked adapter with a transposase to assemble a second transposome complex and then pooling together the first and second transposome complexes. In some embodiments, a pool of transposome complexes comprises homodimers comprising a first forked adapter and homodimers comprising a second forked adapter.
In some embodiments, a transposome complex is a heterodimer, wherein two molecules of a transposase are each bound to a different forked adapter comprising a first and second transposon (e.g., the sequences of the two transposons bound to each monomer of a transposome complex are different, forming a “heterodimer”).
In some embodiments, the compositions and methods described herein employ a population of transposome complexes assembled by pooling a first forked adapter and a second forked adapter together with transposases to assemble the pool of transposome complexes. After this pooling, the predicted ratio of assembled transposome complexes would be 25% transposome complexes that are homodimers comprising the first forked adapter, 25% transposome complexes that are homodimers comprising the second forked adapter, and 50% transposome complexes that are heterodimers comprising the first forked adapter and the second forked adapter. In some embodiments, the first and/or second pool of transposome complexes are homodimers or heterodimers. In some embodiments, the first and the second pool of transposome complexes are homodimers or heterodimers. Exemplary homodimers, heterodimers, and solid supports comprising immobilized homodimers and their methods of use are disclosed in U.S. Pat. No. 9,683,230, which is incorporated herein in its entirety.
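The predicted 25%/25%/50% split is what random pairing of two adapter-loaded monomer types gives when both are equally abundant. The following is a minimal sketch of that arithmetic, not a model of actual transposome assembly; the function name `dimer_fractions` and the assumption of purely random, unbiased pairing are illustrative and do not come from this disclosure.

```python
from fractions import Fraction


def dimer_fractions(frac_first):
    """Expected dimer fractions under random pairing (an idealized assumption).

    frac_first is the fraction of adapter-loaded transposase monomers that
    carry the first forked adapter; the remainder carry the second.
    """
    p = Fraction(frac_first)
    q = 1 - p
    return {
        "first_homodimer": p * p,       # both monomers carry the first adapter
        "second_homodimer": q * q,      # both monomers carry the second adapter
        "heterodimer": 2 * p * q,       # one monomer of each type, either order
    }


# Equal amounts of the two forked adapters (assumed for illustration).
equal_mix = dimer_fractions(Fraction(1, 2))
for name, value in equal_mix.items():
    print(f"{name}: {float(value):.0%}")
```

Under the same assumption, changing `frac_first` shows how an unequal adapter mix would shift the balance between homodimers and heterodimers.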
In some embodiments, one or more transposons comprise at least one of an adapter, a barcode sequence, a unique molecular identifier (UMI) sequence, a capture sequence, or a cleavage sequence. In other words, transposons may comprise additional sequences of use in methods that a user wants to perform, such as sequencing. In some embodiments, one or more transposons comprises an index sequence and/or a UMI. In some embodiments, one or more transposons comprises an index sequence and a UMI. Transposons comprising UMIs and their methods of use are described in WO 2019/108972, WO 2018/136248, WO2016176091, and WO202014437, each of which is incorporated in its entirety herein.
In some embodiments, a first transposon comprised in a first pool of transposome complexes and/or a first transposon comprised in a second pool of transposome complexes comprise sample indexes. In some embodiments, both a first transposon comprised in a first pool of transposome complexes and a first transposon comprised in a second pool of transposome complexes comprise sample indexes. In a representative example, an embodiment may include a first transposon comprising i5 that is comprised in a first pool of transposome complexes and a first transposon comprising i7 that is comprised in a second pool of transposome complexes, as shown in
In some embodiments, a second transposon comprised in a first pool of transposome complexes and/or a second transposon comprised in a second pool of transposome complexes comprise sample indexes and/or UMIs. In some embodiments, both a second transposon comprised in a first pool of transposome complexes and a second transposon comprised in a second pool of transposome complexes comprise sample indexes.
In some embodiments, both a second transposon comprised in a first pool of transposome complexes and a second transposon comprised in a second pool of transposome complexes comprise UMIs. In a representative example, an embodiment may include a second transposon comprising i8 that is comprised in a first pool of transposome complexes and a second transposon comprising i6 that is comprised in a second pool of transposome complexes, wherein i6 and i8 function as UMIs, as shown in
In some embodiments, the first and second transposons comprised in both a first pool and a second pool of transposomes may comprise either a sample index sequence or a UMI. When such transposomes are used in the present methods, a polynucleotide such as shown in
In some embodiments, a method of generating one or more double-stranded concatenated nucleic acid sequencing templates (as shown in
In some embodiments, transposome complexes are then released from the double-stranded fragments. In some embodiments, releasing the transposome complex from the double-stranded fragments is performed with SDS and washing.
In some embodiments, the method comprises extending and ligating the double-stranded fragments after releasing the transposome complexes. In some embodiments, extending and ligating comprises providing polymerase, dNTPs, and extension buffer (ELMT).
In some embodiments, the method comprises denaturing the extended and ligated double-stranded fragments into single-stranded fragments, wherein single-stranded fragments comprising a 5′ affinity moiety remain immobilized on the solid support as shown in
In some embodiments, the method comprises allowing hybridization of a hybridization sequence comprised in a first immobilized single-stranded fragment to a complement of a hybridization sequence comprised in a second immobilized single-stranded fragment thereby forming a bridge. In some embodiments, allowing hybridization comprises cooling the solid support and/or applying a hybridization buffer. In some embodiments, the cooling comprises reducing the temperature of the solid support to 60° C. or cooler. In some embodiments, the hybridization buffer comprises a high salt concentration, optionally wherein the high salt concentration is 750 mM NaCl.
In some embodiments, a hybridization sequence (X or HYB) comprised in a first single-stranded fragment can hybridize to the complement of a hybridization sequence (X′ or HYB′) comprised in a second single-stranded fragment. In some embodiments, the hybridization sequence and/or its complement is bound to a blocking oligonucleotide that is fully or partially complementary to the hybridization sequence or its complement, and the denaturing comprises denaturing the blocking oligonucleotide to unblock the hybridization sequence and/or its complement. Such blocking oligonucleotides can function as described above for forked adapters, wherein association of a hybridization sequence to its complement is blocked until the blocking oligonucleotide is denatured. In some embodiments, a forked adapter comprised in a transposome comprises 3 oligonucleotides, wherein 2 oligonucleotides comprise the first and second transposon of the forked transposon and the third oligonucleotide is a blocking oligonucleotide. In some embodiments, a blocking oligonucleotide (such as XB or X′B′) is hybridized to the forked adapter transposon at the 3′-ended single-stranded section of the second transposon. This blocking oligonucleotide may be hybridized to either, or both, the first and second adapter of a forked adapter transposon. In some embodiments, a blocking oligonucleotide prevents a first forked adapter transposon and second forked adapter transposon from hybridizing to one another via the 3′ complementary section of the second oligonucleotides. In some embodiments, the blocking oligonucleotide comprises nucleotides that are not a target for tagmentation.
In some embodiments, binding of a HYB comprised in a first immobilized single-stranded fragment to a HYB′ comprised in a second immobilized single-stranded fragment may be termed “bridging” (similarly to how this term is used in methods using forked adapters).
In some embodiments, a fragment comprising an X sequence can hybridize to an X′ sequence in another fragment (as shown in
In some embodiments, after bridging of two single-stranded fragments, a method comprises extending and generating a double-stranded concatenated nucleic acid sequencing template.
In some embodiments, a method comprises additional rounds of allowing hybridization and extending and generating a double-stranded concatenated nucleic acid sequencing template. In other words, the step of allowing bridging between two immobilized single-stranded fragments can be repeated until no more double-stranded concatenated nucleic acid sequencing templates can be prepared. The number of double-stranded concatenated nucleic acid sequencing templates prepared may be limited by the number of single-stranded fragments immobilized in close proximity with complementary HYB/HYB′ sequences. Once no more single-stranded fragments can partner with other single-stranded fragments, no additional concatenated sequencing templates can be prepared.
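The repeated rounds of bridging can be pictured with a toy simulation. The sketch below is a simplified illustration only: it places fragments on a one-dimensional coordinate, uses a single hypothetical `reach_nm` cutoff, and pairs the closest compatible fragments first. The function name, the coordinates, and the greedy pairing rule are all assumptions made for illustration and are not part of the described method.

```python
def pair_fragments(fragments, reach_nm):
    """Greedily bridge HYB fragments with HYB' fragments that are within reach.

    fragments: list of (position_nm, kind) tuples, kind being "HYB" or "HYB'".
    Returns index pairs in the order they were bridged. Each loop iteration
    stands in for one round of hybridization and extension.
    """
    unpaired = set(range(len(fragments)))
    pairs = []
    while True:
        candidates = [
            (abs(fragments[i][0] - fragments[j][0]), i, j)
            for i in unpaired
            for j in unpaired
            if i < j
            and fragments[i][1] != fragments[j][1]  # need one HYB and one HYB'
            and abs(fragments[i][0] - fragments[j][0]) <= reach_nm
        ]
        if not candidates:
            break  # no remaining fragment can find a partner
        _, i, j = min(candidates)  # closest compatible pair bridges first
        pairs.append((i, j))
        unpaired -= {i, j}
    return pairs


# Hypothetical positions (nm) and hybridization-sequence types.
example = [(0, "HYB"), (50, "HYB'"), (120, "HYB"), (900, "HYB'")]
bridged = pair_fragments(example, reach_nm=300)
print(bridged)
```

In this toy input, the fragments at 120 nm and 900 nm are left without partners, which mirrors the point that template yield is bounded by how many compatible fragments sit close together.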
In some embodiments, concatenated sequencing templates prepared using immobilized transposomes comprise two copies of the same insert. In some embodiments, a high ratio of DNA to transposomes leads to a high proportion of concatenated sequencing templates comprising two copies of the same insert. In some embodiments, DNA is pre-fragmented into short fragments less than 1000 bp in length before tagmentation by immobilized transposomes to produce a high proportion of concatenated sequencing templates comprising two copies of the same insert. Under such conditions, the outcome will be predominantly single-stranded fragments comprising sense and antisense complementary sequences that hybridize together, such that extension produces a concatenated sequencing template comprising two copies of the same insert.
In some embodiments, concatenated sequencing templates comprise two inserts that are not copies of each other. In some embodiments, the inserts comprised in a concatenated sequencing template are different. In some embodiments, concatenated sequencing templates comprising two different inserts are used to generate proximity data using the methods outlined below.
A. Fragmenting of Proximal or Contiguous Regions of a Double-Stranded Nucleic Acid by Spatially Localized Transposomes
Binding of double-stranded nucleic acids to transposases comprised in transposome complexes is random, but a given double-stranded nucleic acid would be fragmented by transposomes that are immobilized in a specific area of the surface of the solid support. This aspect of the method is outlined in
In sum, different fragments from the same double-stranded nucleic acid can be tagmented and immobilized across neighboring transposome complexes, as shown in
B. Proximity of Immobilized Single-Stranded Fragments for Bridging
Because single-stranded nucleic acids prepared using immobilized transposomes are immobilized before forming bridges between a HYB in a first single-stranded fragment and a HYB′ in a second single-stranded fragment, the first and second fragments that join in a bridge must be immobilized in close proximity on the surface of the solid support. For example, the first and second fragments may be the sense and antisense strands produced from the same double-stranded fragment. This is shown in
In some embodiments, single-stranded fragments prepared from different double-stranded fragments may be in close enough proximity to hybridize to each other for bridging. In essence, both the first and second single-stranded fragment are tethered to the surface of the solid support at their 5′ ends, so the free 3′ ends of each fragment (comprising HYB or HYB′) must be able to reach each other to interact. If the 3′ ends of two immobilized fragments cannot reach each other because they are immobilized too far apart on the surface of the solid support, a HYB/HYB′ bridge cannot be formed between these two fragments.
Accordingly, if the distance between two immobilized fragments is greater than the length of the longer fragment, there is no way for these fragments to interact, as their HYB/HYB′ sequences could not overlap. In some embodiments, hybridization of a hybridization sequence comprised in a first immobilized single-stranded fragment to a complement of a hybridization sequence comprised in a second immobilized single-stranded fragment only occurs when the first and second fragment are at a proximity to each other on the surface of the solid support that is closer than the length of the longer of the first or second fragment.
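The reach condition stated above reduces to a single comparison. The following minimal sketch assumes that the separation and both fragment lengths are already expressed in the same physical unit (nanometers here); the function name `within_reach` is hypothetical, and the check captures only the necessary geometric condition, not whether hybridization actually occurs.

```python
def within_reach(distance_nm, first_length_nm, second_length_nm):
    """Return True if the separation is shorter than the longer fragment.

    This is a necessary condition for a HYB/HYB' bridge, not a sufficient one.
    """
    return distance_nm < max(first_length_nm, second_length_nm)


# Hypothetical values chosen only to show both outcomes.
print(within_reach(100, 150, 80))   # separation shorter than the longer fragment
print(within_reach(200, 150, 80))   # separation longer than either fragment
```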
In some embodiments, a sufficient number of nucleotides comprised in a HYB in a first single-stranded fragment must be able to hybridize to a HYB′ in a second single-stranded fragment. If no nucleotides between the HYB in a first single-stranded fragment and a HYB′ in a second single-stranded fragment can hybridize with each other, then these two fragments cannot produce a bridge. In some embodiments, the first immobilized fragment and the second immobilized fragment are immobilized in close proximity on the solid support, wherein the close proximity allows binding of 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more nucleotides comprised in the hybridization sequence comprised in the first immobilized fragment to nucleotides comprised in the complement of the hybridization sequence comprised in the second immobilized fragment.
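The nucleotide-count thresholds above can be illustrated by counting Watson-Crick pairs between a HYB and a HYB′. The sketch below is a deliberate simplification: it assumes DNA sequences of equal length written 5′ to 3′ and annealed antiparallel end to end, and it counts paired positions without regard to contiguity or thermodynamic stability. The example sequences and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_nucleotides(hyb, hyb_prime):
    """Count base pairs when hyb and hyb_prime anneal antiparallel, end to end."""
    target = reverse_complement(hyb_prime)
    return sum(1 for a, b in zip(hyb, target) if a == b)


def can_hybridize(hyb, hyb_prime, min_pairs=4):
    """Apply a minimum-pair threshold (4 is the lowest value listed above)."""
    return paired_nucleotides(hyb, hyb_prime) >= min_pairs


# Hypothetical 10-nucleotide hybridization sequence and its exact complement.
hyb = "GATTACAGGC"
hyb_prime = reverse_complement(hyb)
print(paired_nucleotides(hyb, hyb_prime), can_hybridize(hyb, hyb_prime, min_pairs=10))
```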
In some embodiments, the first immobilized fragment and the second immobilized fragment are immobilized within 20 to 500 nanometers of each other on the surface of the solid support. In some embodiments, the first immobilized fragment and the second immobilized fragment are immobilized within 20 to 300 nanometers of each other on the surface of the solid support. In some embodiments, immobilized single-stranded fragments that are within 500 nanometers or fewer of each other may be able to bridge with each other via binding of a HYB in one fragment to a HYB′ in the other fragment. In some embodiments, two immobilized fragments from sequences that were adjacent in a double-stranded nucleic acid may be adjacent on the surface of the solid support without a different fragment being immobilized between them.
In some embodiments, a sample comprises multiple different double-stranded nucleic acids. In some embodiments, spatially localized fragments are prepared from the same double-stranded nucleic acid. In some embodiments, both the first and the second immobilized fragments are prepared from the same double-stranded nucleic acid, and the double-stranded concatenated nucleic acid sequencing template comprises two inserts from the same double-stranded nucleic acid.
In some embodiments, the two inserts are from two contiguous sequences comprised in the same double-stranded nucleic acid (such as the bridged fragments shown in
When a sequencing template is released from the solid support and sequenced, the presence of A and B inserts in a single-stranded template (and A′ and B′ inserts in another single-stranded template) can be used to indicate that A and B sequences are in close proximity in the same double-stranded nucleic acid. For example, the A and B sequences may be determined to have been in the same target nucleic acid.
In some embodiments, the concentration of double-stranded nucleic acid in a sample applied to the solid support is low enough to generally avoid single-stranded fragments from different double-stranded nucleic acid polynucleotides being in close enough proximity to bridge together. In this way, most fragments that bridge together (and allow for preparation of double-stranded concatenated sequencing templates) are those from double-stranded fragments prepared from the same double-stranded nucleic acid polynucleotide and not from another double-stranded polynucleotide in the same sample. In this way, concatenated sequencing templates that comprise fragments from unrelated double-stranded nucleic acids can generally be avoided when using methods with immobilized transposomes if the user prefers.
In some embodiments, the two inserts comprised in a first single-stranded fragment and a second single-stranded fragment that form a bridge between their HYB/HYB′ are from non-contiguous regions of the same nucleic acid. In some embodiments, the two inserts in a first single-stranded fragment and a second single-stranded fragment that form a HYB/HYB′ bridge are from two proximal sequences comprised in the same double-stranded nucleic acid. In some embodiments, the proximal sequences are separated by 100 or fewer nucleotides, 200 or fewer nucleotides, 300 or fewer nucleotides, 400 or fewer nucleotides, 500 or fewer nucleotides, 700 or fewer nucleotides, or 1,000 or fewer nucleotides in the double-stranded nucleic acid. Such relatively small distances between proximal sequences lead to a high likelihood that single-stranded fragments from these sequences may be able to bridge with each other and generate concatenated nucleic acid sequencing templates.
In some embodiments, an area of the solid support comprises multiple double-stranded concatenated nucleic acid sequencing templates that share common insert sequences from proximal sequences comprised in the same double-stranded nucleic acid. Using the example nucleic acid shown in
In some embodiments, the proximity of sequences (such as A-E in
In some embodiments, fragments that are closer on the surface of the solid support (because they were prepared from fragments that were in close proximity in the double-stranded nucleic acid that was tagmented) will bridge together with a higher frequency than those that are further away. Accordingly, neighboring fragments will generally bridge with the highest frequency to form concatenated sequencing templates (excluding reannealing of single-stranded fragments prepared from the same insert, including their insert sequences, as shown in
Neighboring sequences will be estimated to have greater frequency of being comprised in the same concatenated sequencing template as compared to sequences that were farther apart, and this frequency will decrease as the distance between the fragments increases. It follows then that any two sequences that are separated by too large a distance in the double-stranded nucleic acid that is tagmented will not be able to bridge and form a concatenated sequencing template. The lack of these concatenated sequencing templates in sequencing data can thus be interpreted as too far a distance to form bridges between single-stranded fragments comprising a given pair of inserts.
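One way to read such frequencies out of sequencing data is to tally how often each pair of inserts appears in the same concatenated template. The following is a minimal sketch under stated assumptions: each template has already been reduced to a pair of insert labels, the labels and counts are invented for illustration, and templates holding two copies of the same insert are skipped because they reflect reannealing rather than proximity. The function name `pair_frequencies` is hypothetical.

```python
from collections import Counter


def pair_frequencies(templates):
    """Count how often each unordered pair of insert labels shares a template."""
    counts = Counter()
    for first, second in templates:
        if first == second:
            continue  # same-insert templates carry no proximity information
        counts[tuple(sorted((first, second)))] += 1
    return counts


# Hypothetical observations: each tuple is one sequenced concatenated template.
observed = [("A", "B"), ("B", "A"), ("B", "C"), ("A", "B"), ("A", "C"), ("A", "A")]
counts = pair_frequencies(observed)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that never appear in `counts` would correspond to inserts that were never observed together, which the passage above interprets as being too far apart to bridge.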
Single-stranded fragments formed from the same double-stranded fragment (such as those comprising A and A′ in
In some embodiments, gaps in the nucleic acid sequence left after the tagmentation event may be filled using an extending step. In general, an extending step is followed by a ligating step. Extending and/or ligating are performed using appropriate conditions. In some embodiments, the buffer used is an extension-ligation mix buffer (e.g., extension-ligation mix buffer 3, ELM3). A polymerase such as T4 DNA pol Exo- (New England BioLabs, Catalog #M0203S) or Ttaq608 may be used in said extending and/or ligating step.
C. Representative Structures of Sequencing Templates Prepared Using Immobilized Transposomes
A user can design transposons comprising forked adapters to incorporate sequences of interest (such as adapters, primer binding sites, etc.). These sequences of interest can be selected by the user based on, for example, what sequencing platform they prefer to use and the requirements for sequencing templates on this platform.
Representative first and second forked adapters that may be comprised in transposomes for preparing sequencing templates described herein are shown in
In some embodiments, a sequencing template prepared using immobilized transposomes has a structure of:
D. Amplification
In some embodiments, the method comprises amplifying the generated double-stranded sequencing templates after releasing them from the surface of the solid support and before sequencing.
In some embodiments, sequencing templates are amplified using cluster amplification methodologies as exemplified by the disclosures of U.S. Pat. Nos. 7,985,565 and 7,115,400, the contents of each of which is incorporated herein by reference in its entirety. The incorporated materials of U.S. Pat. Nos. 7,985,565 and 7,115,400 describe methods of solid-phase nucleic acid amplification which allow amplification products to be immobilized on a solid support in order to form arrays comprised of clusters or “colonies” of immobilized nucleic acid molecules. Each cluster or colony on such an array is formed from a plurality of identical immobilized polynucleotide strands and a plurality of identical immobilized complementary polynucleotide strands. The arrays so-formed are generally referred to herein as “clustered arrays.” The products of solid-phase amplification reactions such as those described in U.S. Pat. Nos. 7,985,565 and 7,115,400 are so-called “bridged” structures formed by annealing of pairs of immobilized polynucleotide strands and immobilized complementary strands, both strands being immobilized on the solid support at the 5′ end, in some embodiments via a covalent attachment. Cluster amplification methodologies are examples of methods wherein an immobilized nucleic acid template is used to produce immobilized amplicons. Other suitable methodologies can also be used to produce immobilized amplicons from sequencing templates produced according to the methods provided herein. For example, one or more clusters or colonies can be formed via solid-phase PCR whether one or both primers of each pair of amplification primers are immobilized.
In other embodiments, sequencing templates are amplified in solution. For example, in some embodiments, the nucleic acid fragments are cleaved or otherwise liberated from the solid support and amplification primers are then hybridized in solution to the liberated molecules. In other embodiments, amplification primers are hybridized to the nucleic acid fragments for one or more initial amplification steps, followed by subsequent amplification steps in solution. Thus, in some embodiments an immobilized nucleic acid template can be used to produce solution-phase amplicons.
It will be appreciated that any of the amplification methodologies described herein or generally known in the art can be utilized with universal or target-specific primers to amplify the sequencing templates. Suitable methods for amplification include, but are not limited to, the polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription mediated amplification (TMA) and nucleic acid sequence-based amplification (NASBA), as described in U.S. Pat. No. 8,003,354, which is incorporated herein by reference in its entirety. The above amplification methods can be employed to amplify one or more nucleic acids of interest. For example, PCR, including multiplex PCR, SDA, TMA, NASBA and the like can be utilized to amplify the sequencing templates. In some embodiments, primers directed specifically to the nucleic acid of interest are included in the amplification reaction.
Methods of evaluating proximity data of sequences within a double-stranded nucleic acid may also be performed with compartments, using compartments as described above for methods with forked adapters. In some embodiments, the compartments are wells, tubes, or droplets.
In some embodiments, transposomes within compartments are in solution. In some embodiments, transposomes are not immobilized on a solid support when preparing sequencing templates in compartments.
In some embodiments, since double-stranded fragments are not immobilized before preparing single-stranded fragments, methods with transposomes in compartments generally prepare concatenated sequencing templates comprising two different inserts. This is because the selection pressure of having the two single-stranded fragments prepared from the same double-stranded fragment in close proximity of a solid support is lost when the fragments are not immobilized and instead tagmentation happens in a solution-phase.
In some embodiments, two pools of transposomes may be used. In some embodiments, a first transposome and a second transposome as shown in
In some embodiments, a method of generating one or more concatenated nucleic acid sequencing templates comprises compartmentalizing a sample comprising target double-stranded nucleic acid into a plurality of different compartments and tagmenting the double-stranded nucleic acids to produce tagged double-stranded fragments comprising inserts from the double-stranded nucleic acid within the plurality of different compartments.
In some embodiments, the tagmenting is performed with two pools of transposome complexes. In some embodiments, the first pool of transposome complexes comprises (a) a transposase; (b) a first transposon comprising a 3′ transposon end sequence and a first read sequencing adapter sequence; and (c) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ complement of a hybridization sequence. In some embodiments, the second pool of transposome complexes comprises (a) a transposase; (b) a first transposon comprising a 3′ transposon end sequence and a second read sequence adapter sequence; and (c) a second transposon comprising a 5′ sequence fully or partially complementary to the 3′ transposon end sequence and a 3′ hybridization sequence. In some embodiments, tagmentation prepares tagged double-stranded fragments. In some embodiments, a first single-stranded fragment comprises an insert and a second fragment comprises an insert that is not the complement of the insert comprised in the first fragment.
In some embodiments, the method comprises denaturing the tagged double-stranded fragments to produce single-stranded fragments, hybridizing two single-stranded fragments within the same compartment to each other by binding of the hybridization sequence in a first fragment to the complement of a hybridization sequence in a second fragment, and extending from the 3′ ends of each single-stranded fragment to produce a double-stranded concatenated nucleic acid sequencing template comprising inserts from both single-stranded fragments. In some embodiments, templates are released from compartments before further processing.
In some embodiments, double-stranded concatenated nucleic acid sequencing templates are only produced from hybridizing of two single-stranded fragments present in the same compartment. In other words, only single-stranded fragments in the same compartment can hybridize together, and single-stranded fragments in different compartments are not available to associate with each other. In some embodiments, the compartmentalizing comprises dilution of the sample such that most compartments comprise one or no target double-stranded nucleic acid. In this way, insert sequences that are comprised in the same concatenated sequencing template are likely to have been comprised in the same target nucleic acid.
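By way of non-limiting illustration, the dilution condition described above can be examined with a simple Poisson loading model. The following sketch assumes that target molecules distribute independently across compartments; the function names and the mean-occupancy parameter `lam` are hypothetical and are not part of the disclosed methods.

```python
import math


def occupancy_fractions(lam: float) -> dict:
    """Poisson fractions of compartments holding 0, 1, or 2+ targets.

    lam is an assumed mean number of target molecules per compartment.
    """
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


def single_given_occupied(lam: float) -> float:
    """Among non-empty compartments, the fraction holding exactly one target."""
    fractions = occupancy_fractions(lam)
    return fractions["single"] / (1.0 - fractions["empty"])


if __name__ == "__main__":
    # Illustrative mean occupancies only; not values taken from the disclosure.
    for lam in (0.05, 0.1, 0.5):
        print(lam, round(single_given_occupied(lam), 3))
```

Under this model, lowering the mean occupancy raises the fraction of occupied compartments that hold a single target, which is the property relied on when inferring that two inserts in one template came from the same target nucleic acid.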
Accordingly, a user can identify that two sequences comprised in the same concatenated sequencing template originated from the same target nucleic acid. Such ability to identify sequences that originated from the same target nucleic acid can help to determine the sequences that comprise a given target nucleic acid.
In some embodiments, the compartmentalizing separates different haplotypes into different compartments, and the method is used for haplotype phasing. In other words, a user could evaluate sequences comprised in the same concatenated sequencing template and determine that these sequences were comprised in the same haplotype. In some embodiments, the haplotype phasing does not require barcodes.
In some embodiments, the hybridization sequence and/or its complement is bound to a blocking oligonucleotide that is fully or partially complementary to the hybridization sequence or its complement, and the denaturing comprises denaturing the blocking oligonucleotide to unblock the hybridization sequence and/or its complement. Such blocking oligonucleotides are described above for methods with forked adapters. In some embodiments, one or more blocking oligonucleotides inhibit association of first transposomes with second transposomes in solution. In other words, the timing of association of the hybridization sequence and its complement can be controlled to happen only after single-stranded tagged fragments are prepared.
In some embodiments, the denaturing is performed with an increase in temperature, change in pH, and/or addition of one or more chaotropic agents. In some embodiments, the increase in temperature is an increase from 45° C.-55° C. to 85° C.-95° C., optionally wherein the increase in temperature is an increase from 50° C. to 90° C. In some embodiments, the one or more chaotropic agents comprise formamide and/or NaOH.
In some embodiments, one or more additional rounds of denaturing, hybridizing, and extending are performed. In other words, rounds of denaturing, hybridizing, and extending may be repeated until there are no single-stranded fragments available for hybridizing with other single-stranded fragments.
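By way of non-limiting illustration, the repetition of rounds until no single-stranded fragments remain available can be sketched as a simple counting model. The per-round `efficiency` parameter and the function name below are hypothetical assumptions introduced only for this example.

```python
def rounds_until_exhausted(n_hyb: int, n_hyb_comp: int, efficiency: float = 0.5):
    """Count rounds until no HYB / HYB' fragment pair remains available.

    efficiency is an assumed fraction of available pairs that hybridize and
    extend in one round; it is illustrative and not a measured value.
    Returns (rounds, templates_formed, unpaired_fragments_left).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    rounds, templates = 0, 0
    while min(n_hyb, n_hyb_comp) > 0:
        available = min(n_hyb, n_hyb_comp)
        paired = max(1, int(available * efficiency))  # at least one pair per round
        n_hyb -= paired
        n_hyb_comp -= paired
        templates += paired
        rounds += 1
    return rounds, templates, n_hyb + n_hyb_comp
```

In this model, any excess of one fragment type remains unpaired once the other type is exhausted, at which point further rounds produce no additional templates.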
In some embodiments, the method further comprises amplifying the templates.
In some embodiments, a method comprises sequencing a concatenated nucleic acid sequence template. In some embodiments, tandem reads are generated by sequencing a concatenated nucleic acid sequence template.
In some embodiments, the sequences of different inserts are generated sequentially. In some embodiments, a method of sequencing a concatenated nucleic acid sequencing template comprises sequencing the first insert sequence and sequencing the second insert sequence.
In some embodiments, a method of sequencing a concatenated nucleic acid sequencing template comprises sequencing the first insert sequence of a polynucleotide by initiating sequencing with a first read sequencing primer complementary to the first read primer binding sequence; and sequencing the second insert sequence by initiating sequencing with a second read sequencing primer complementary to the second read primer binding sequence. An exemplary method is presented in
In some embodiments, the first and second insert sequences may be generated from separate libraries (“Library A” and “Library B,” as shown in
In some embodiments, a method of sequencing a concatenated nucleic acid sequencing template comprises sequencing the complement of the second insert sequence and then sequencing the complement of the first insert sequence.
In some embodiments, a method of sequencing a concatenated nucleic acid comprises sequencing the complement of the second insert sequence by initiating sequencing with a first complement read sequencing primer complementary to the first complement read primer binding sequence; and sequencing the complement of the first insert sequence by initiating sequencing with a second complement read sequencing primer complementary to the second complement read primer binding sequence.
In some embodiments, more than two insert sequences or more than two complements of insert sequences from a polynucleotide may be sequenced.
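By way of non-limiting illustration, the sequential reads described above can be sketched as two reads primed from two different read primer binding sequences in one template. The sequences below are hypothetical placeholders rather than sequences of the Sequence Listing, and the sketch makes the simplifying assumption that the template is written in read sense, so that a read is a substring rather than a synthesized complement.

```python
def read_from_primer_site(template: str, primer_site: str, read_length: int) -> str:
    """Return up to read_length bases immediately downstream of primer_site.

    Simplification: the template is written in read sense, so the read is a
    substring of the template rather than a synthesized complement.
    """
    start = template.find(primer_site)
    if start < 0:
        raise ValueError("primer binding site not found in template")
    begin = start + len(primer_site)
    return template[begin:begin + read_length]


# Hypothetical placeholder sequences (not sequences from the disclosure).
RP1_SITE = "ACGTACGTAC"
RP2_SITE = "TTGGCCAATT"
INSERT_1 = "GATTACAGATTACA"
INSERT_2 = "CCCTTTAAAGGG"
TEMPLATE = RP1_SITE + INSERT_1 + RP2_SITE + INSERT_2

# The first read is primed at the first site, then the second read at the second site.
tandem_reads = [
    read_from_primer_site(TEMPLATE, RP1_SITE, len(INSERT_1)),
    read_from_primer_site(TEMPLATE, RP2_SITE, len(INSERT_2)),
]
```

A template with more than two inserts would be handled in the same manner, with one call per read primer binding sequence.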
The polynucleotides comprising multiple insert sequences described herein can be sequenced according to any suitable sequencing methodology, such as direct sequencing or next generation sequencing, including sequencing by synthesis, sequencing by ligation, sequencing by hybridization, sequencing based on detection of released protons (which can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary)), nanopore sequencing, and the like. In some embodiments, the DNA fragments are sequenced on a solid support, such as a flow cell. Exemplary SBS procedures, fluidic systems, and detection platforms that can be readily adapted for use with polynucleotides comprising multiple insert sequences of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497; U.S. Pat. No. 7,057,026; WO 91/06678; WO 07/123744; U.S. Pat. Nos. 7,329,492; 7,211,414; 7,315,019; 7,405,281, and US 2008/0108082, each of which is incorporated herein by reference.
The methods described herein are not limited to any particular type of sequencing instrumentation used.
In some embodiments, sequencing templates comprising multiple inserts are used to determine the sequences of two or more inserts from a double-stranded nucleic acid.
In some embodiments, sequencing templates comprising two or more inserts are used to produce multiple copies of the sequence of an insert from a double-stranded nucleic acid. Although each sequence from an insert comprised in such a template would be expected to have the same sequence, it is well known that a variety of different artifacts can lead to an incorrect sequence. For example, an error that is introduced into an amplicon produced from a sequencing template during amplification can cause a discrepancy in a sequence that is not related to a difference in the double-stranded nucleic acid used to prepare inserts.
A. Sequencing
In some embodiments, a method comprises releasing generated double-stranded concatenated nucleic acid sequencing templates from the solid support and sequencing the templates to determine insert sequences comprised in the templates. In some embodiments, the releasing comprises enzymatic digestion or chemical cleavage. Such means of releasing sequencing templates from the surface of a solid support are well-known in the art.
The incorporated materials of U.S. Pat. Nos. 7,985,565 and 7,115,400 describe methods of solid-phase nucleic acid amplification which allow amplification products to be immobilized on a solid support in order to form arrays comprised of clusters or “colonies” of immobilized nucleic acid molecules.
Exemplary SBS procedures, fluidic systems and detection platforms that can be readily adapted for use with amplicons produced by the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497; U.S. Pat. No. 7,057,026; WO 91/06678; WO 07/123744; U.S. Pat. Nos. 7,329,492; 7,211,414; 7,315,019; 7,405,281, and US 2008/0108082, each of which is incorporated herein by reference.
In some embodiments, sequencing is performed after amplifying. In some embodiments, amplification is not performed before sequencing. A number of different sequencing methods are known to those skilled in the art, such as those described in U.S. Pat. Nos. 9,683,230 and 10,920,219, each of which is incorporated by reference herein in its entirety.
In some embodiments, the sequencing fragments are deposited on a flow cell. In some embodiments, the sequencing fragments are hybridized to complementary primers grafted to the flow cell or surface. In some embodiments, the sequences of the sequencing fragments are detected by array sequencing or next-generation sequencing methods, such as sequencing-by-synthesis.
The P5 and P7 primers are used on the surface of commercial flow cells sold by Illumina, Inc., for sequencing on various Illumina platforms. Such primer sequences are described in U.S. Patent Publication No. 2011/0059865 A1, which is incorporated herein by reference in its entirety. While the P5 and P7 primers are given as examples, it is to be understood that any suitable amplification primers can be used in the examples presented herein.
In some embodiments, a sequencing primer used for sequencing comprises a sequence fully or partially complementary to one or more unique primer binding sequences comprised in the sequencing template. In some embodiments, a sequencing primer comprises at least an A2 sequence (SEQ ID NO: 40), at least an A14 sequence (SEQ ID NO: 4), or at least a B15 sequence (SEQ ID NO: 5), or their complements.
In some embodiments, sequencing is performed using sequencing primers that bind to A14, B15, and/or a hybridization sequence (HYB).
An advantage of certain methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art. Thus, an integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more nucleic acid fragments, the system comprising components such as pumps, valves, reservoirs, fluidic lines, and the like. A flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, e.g., in US 2010/0111768 A1 and U.S. Ser. No. 13/273,666, each of which is incorporated herein by reference. As exemplified for flow cells, one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method. Taking a nucleic acid sequencing embodiment as an example, one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above. Alternatively, an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods. Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in U.S. Ser. No. 13/273,666, which is incorporated herein by reference.
B. Dark Cycles in Sequencing
In some embodiments, a custom sequencing recipe can be prepared to comprise dark cycles (also known as dark regions), which are used to skip the recording of a particular sequence. As used herein, a “dark cycle” refers to a method wherein the sequencing chemistry of a particular sequence is carried out, but the sequencing is not imaged by the sequencer. WO 2012055929 and WO 2010127304 describe dark cycles, and each of these is incorporated by reference herein. Dark cycles can be used to mitigate phasing/prephasing issues relating to repeatedly sequencing low diversity sequences, such as a library of ME sequences, that may globally worsen the sequencing result. After the dark cycles, the imaging of sequences is resumed so that the insert sequences comprised in sequencing templates are recorded.
A custom sequencing protocol can include an appropriate number of dark cycles to span the length of the sequence to be skipped over. In other words, the number of dark cycles can be based on the number of bases intended to be skipped over. For example, if the sequence to be skipped over is an ME sequence, which is 19 bases long, 19 dark cycles are used. In some embodiments, the sequence to be skipped over is an ME sequence or its complement. In embodiments with a 19-nucleotide long ME, the number of dark cycles is 19. With an ME having a different number of nucleotides, the number of dark cycles is generally equal to the number of nucleotides in the ME. In some embodiments, a user can skip the entire ME. In some embodiments, a user can skip most of the ME domain and sequence part of it, ignoring those nucleotides comprised in the ME that are sequenced.
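By way of non-limiting illustration, the relationship between the length of a skipped sequence and the number of dark cycles can be sketched as a per-cycle plan in which chemistry is carried out on every cycle but imaging is disabled for the dark cycles. The function names and the plan representation below are hypothetical and do not correspond to any instrument recipe format.

```python
ME_LENGTH = 19  # length of the ME sequence as stated in the text


def cycle_plan(skip_length: int, insert_cycles: int, dark_first: bool = True):
    """Return a list of (cycle_number, imaged) tuples for one read.

    Chemistry runs on every cycle; imaged is False for dark cycles.
    """
    if skip_length < 0 or insert_cycles < 0:
        raise ValueError("lengths must be non-negative")
    flags = [False] * skip_length + [True] * insert_cycles
    if not dark_first:
        flags.reverse()  # imaged cycles first, dark cycles last
    return [(i + 1, imaged) for i, imaged in enumerate(flags)]


def recorded_bases(sequence: str, plan) -> str:
    """Keep only the bases from cycles that were imaged."""
    return "".join(base for base, (_, imaged) in zip(sequence, plan) if imaged)


# A read that begins with a 19-base placeholder for the ME, then 5 insert bases.
plan = cycle_plan(ME_LENGTH, 5)
recorded = recorded_bases("N" * ME_LENGTH + "GATTC", plan)
```

Here the 19 placeholder bases are incorporated but not recorded, and only the insert bases appear in `recorded`.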
In some embodiments, the sequencing method comprises dark cycles wherein data are not being recorded for a portion of the sequencing method. In some embodiments, the data not being recorded are sequence data associated with the 3′ transposon end sequence. In some embodiments, the sequence data not being recorded is an ME sequence. In some embodiments, the dark cycles comprise 19 cycles.
In some embodiments, sequencing comprises dark cycles wherein data are not being recorded for a portion of the sequencing. In some embodiments, the data not being recorded are sequence data associated with a transposon end sequence or its complement (ME or ME′).
Examples of where a sequencing primer binds to a sequencing primer sequence (i.e., a primer binding site) are shown by the arrows on top of the representative polynucleotides in
In some embodiments, the sequencing method does not comprise dark cycles. In these embodiments, custom primers are used to obviate the need for dark cycles. In some embodiments, the custom primers may be bridged primers that comprise a sequence that aligns with ME, wherein the ME sequence is not imaged.
C. Error Correction or Identification of Mutations Present in a Single Strand of a Double-Stranded Nucleic Acid
In some embodiments, concatenated sequencing templates comprising two copies of the same insert can be used for error correction and identification of mutations that are only present in a single strand. This is because, in essence, a read of a single concatenated sequencing template is equivalent to reading both strands of a double-stranded nucleic acid that is tagmented. Thus, preparing and sequencing concatenated sequencing templates can increase the sequencing depth. Increased sequencing depth can be crucial for discovering rare somatic mutations, for example in a patient with a solid tumor, because it increases the chance of identifying such mutations.
In some embodiments, results from sequencing of the concatenated sequencing templates described herein allow for error correction. Such errors can include random errors introduced during amplification or during sequencing itself.
In some embodiments, results from sequencing of the concatenated sequencing templates described herein allow for identification of mutations or other base pair differences that are present only in one strand of a double-stranded nucleic acid.
Different means of preparing such concatenated sequencing templates comprising two copies of the same insert are described herein, such as extension after bridging of single-stranded fragments prepared using ligation of forked adapters (as shown in
In some embodiments, a difference between two copies of a sequence in a concatenated sequencing template is due to an error (such as a mistake introduced by sequencing or amplifying).
In some embodiments, the method comprises evaluating sequencing results from multiple sequences of a given insert prepared from different templates and correcting errors in sequencing results for this insert. In some embodiments, correcting the error is based on the sequencing data from the insert and its complement comprised in the same concatenated sequencing template and/or the insert comprised in multiple concatenated sequencing templates.
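By way of non-limiting illustration, correction based on an insert and its complement in the same template, and on the same insert in multiple templates, can be sketched as follows. The masking rule (replace a disagreement with `N`) and the majority vote are assumptions chosen for simplicity; they are not presented as the analysis used in the disclosed methods.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(insert_read: str, copy_read: str, copy_is_revcomp: bool = True) -> str:
    """Keep bases on which both copies agree; mask disagreements with 'N'."""
    if copy_is_revcomp:
        copy_read = reverse_complement(copy_read)
    if len(insert_read) != len(copy_read):
        raise ValueError("reads must cover the same positions")
    return "".join(a if a == b else "N" for a, b in zip(insert_read, copy_read))


def multi_template_consensus(consensi) -> str:
    """Majority vote per position across templates, ignoring masked bases."""
    out = []
    for column in zip(*consensi):
        votes = Counter(base for base in column if base != "N")
        out.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(out)
```

For example, a position masked within one template can be resolved when other templates carrying the same insert agree on the base at that position.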
In some embodiments, a difference between two copies of a sequence in a concatenated sequencing template is due to a mutation that was only present in a single strand of the double-stranded nucleic acid that is tagmented. Such a mutation present in only one strand may be termed “non-canonical base pairing” and may be due to nucleobase damage or mutation. Such non-canonical base pairings can generally be difficult to evaluate, and the present method may improve on identification of such base pairings.
In some embodiments, a method comprises evaluating sequencing results from multiple sequences of a given insert prepared from different templates. In some embodiments, the method comprises determining instances of non-canonical base pairing based on the sequencing data from the insert and its complement comprised in the same concatenated sequencing template and/or the insert comprised in multiple concatenated sequencing templates.
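By way of non-limiting illustration, positions at which an insert and its copy disagree can be tabulated within each template and then compared across templates. The recurrence threshold `min_templates` is a hypothetical parameter, and how a recurrent or isolated discordance is interpreted (for example, as an error or as non-canonical base pairing) is left to the user; the sketch only collects the evidence.

```python
from collections import Counter


def strand_discordant_positions(insert_read: str, copy_read: str):
    """Positions where the insert and its copy (same orientation) disagree."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(insert_read, copy_read))
        if a != b
    ]


def recurrent_discordances(template_reads, min_templates: int = 2) -> dict:
    """Discordances observed in at least min_templates templates.

    template_reads is an iterable of (insert_read, copy_read) pairs covering
    the same insert. The threshold is an illustrative assumption.
    """
    counts = Counter()
    for insert_read, copy_read in template_reads:
        counts.update(strand_discordant_positions(insert_read, copy_read))
    return {key: n for key, n in counts.items() if n >= min_templates}
```

Each key of the returned mapping records the position together with the base seen in the insert and the base seen in the copy, so that the direction of the difference is preserved.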
D. Determining Proximity or Contiguity Information
In some embodiments, a method comprises evaluating sequences of inserts comprised in the same template and determining proximity data for sequences comprised in the double-stranded nucleic acid based on inserts that are comprised in the same template.
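By way of non-limiting illustration, proximity data can be accumulated by treating co-occurrence of inserts in one template as a link and grouping inserts that are connected by such links. The insert identifiers and the grouping by a union-find structure are assumptions of this sketch; the disclosure does not prescribe a particular data structure.

```python
def link_inserts(templates):
    """Group insert identifiers that are linked by co-occurrence in a template.

    templates is an iterable of tuples of hashable, sortable insert identifiers.
    Returns a sorted list of sorted groups.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for inserts in templates:
        roots = [find(i) for i in inserts]
        for root in roots[1:]:
            parent[find(root)] = find(roots[0])

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])
```

For example, templates pairing inserts (a, b) and (b, c) place a, b, and c in one group even though a and c were never observed in the same template.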
As shown in
In some embodiments, concatenated sequencing templates comprising an insert sequence and a copy of the same insert may be used for methylation analysis. These sequences may be described above as concatenated sequences with “two copies” of an insert sequence; however, a copy of an insert sequence would not comprise modified nucleotides (such as modified cytosines) in the absence of conditions to promote them. This aspect is shown in
As used herein, “methylation analysis” refers to evaluating whether cytosines in a given insert from a target nucleic acid are methylated or hydroxymethylated. As used herein, “modified cytosines” refers to methylated or hydroxymethylated cytosines, and “unmodified cytosines” refers to cytosines that are neither methylated nor hydroxymethylated. In some embodiments, the methylated cytosine is 5-methylcytosine (5mC), and the hydroxymethylated cytosine is 5-hydroxymethylcytosine (5hmC).
Means of performing methylation analysis are generally known in the art, but these methods may rely on comparison of two different aliquots of a sample (one aliquot treated with an agent to alter modified or unmodified cytosines and the other aliquot untreated). Standard sequencing analysis for methylation analysis can then be performed to identify modified cytosines, often by evaluating mismatch between treated and untreated aliquots and/or evaluating differences in the sequence results from complementary sequences from a target nucleic acid.
The present methods instead use double-stranded concatenated sequencing templates prepared from a sample comprising target nucleic acid without requiring two separate aliquots of a sample. Further, the present methods have an insert sequence and a copy of insert sequence linked together in a single-stranded concatenated sequencing template and differences between these two sequences can be used for methylation analysis. The analysis of these linked sequences will be more straightforward than analysis of unlinked sequences and require only a single sample.
In some embodiments, the two complementary strands of a double-stranded concatenated sequencing template are amplified (such as with cluster amplification) and sequenced on a flowcell, which allows for a base coding analysis to identify modified and unmodified cytosines, as described herein. In some embodiments, the amplification replaces uracils that are incorporated into sequencing templates with thymines, as uracils will stall polymerases used for SBS sequencing. In some embodiments, the replacement of uracils with thymines during amplification is based on the presence of dTTP in the cluster amplification mix (and absence of dUTP in the cluster amplification mix).
The present application discloses a wide variety of different ways that one skilled in the art may choose to perform such analysis, as shown in
In some embodiments, after conversion of cytosines or modified cytosines to uracils or dihydroxyuracil (DHU), a PCR reaction converts the uracils or DHUs to thymines. In this way, a T/G mismatch (instead of a standard C/G match) in complementary sequences can be evaluated as a position that comprised either a cytosine or modified cytosine, as will be discussed below.
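By way of non-limiting illustration, T/G mismatches between complementary sequences can be located by pairing each position of one read with its partner position in the complementary read. The sketch assumes that both reads are written 5′ to 3′ and have equal length; the function name is hypothetical.

```python
def tg_mismatches(top_read: str, bottom_read: str):
    """Find T/G mismatches between two complementary reads.

    Both reads are written 5'->3', so position i of the top read pairs with
    position len - 1 - i of the bottom read. Returned indices are in
    top-strand coordinates:
      first list  - top reads T opposite a bottom G (conversion on top strand)
      second list - top reads G opposite a bottom T (conversion on bottom strand)
    """
    if len(top_read) != len(bottom_read):
        raise ValueError("reads must have equal length")
    partner = bottom_read[::-1]
    pairs = list(enumerate(zip(top_read, partner)))
    top_converted = [i for i, (t, b) in pairs if t == "T" and b == "G"]
    bottom_converted = [i for i, (t, b) in pairs if t == "G" and b == "T"]
    return top_converted, bottom_converted
```

Whether a reported position corresponds to a cytosine or to a modified cytosine depends on which conversion chemistry was applied.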
In some embodiments, a method of identifying modified cytosines comprised in an insert sequence comprised in a concatenated sequencing template comprises preparing a double-stranded concatenated sequencing template, wherein each strand comprises an insert sequence and a copy of the insert sequence and the two strands are complementary to each other and subjecting each strand to a condition for altering modified and/or unmodified cytosines. A variety of approaches will be described herein, but one skilled in the art could choose any method to alter either modified or unmodified cytosines. In some embodiments, altering either modified or unmodified cytosines allows a user to identify positions of modified or unmodified cytosines in a target nucleic acid, as will be described herein for some representative methods.
An exemplary double-stranded concatenated sequencing template, wherein each strand comprises an insert sequence and a copy of the insert sequence and the two strands are complementary to each other, that may be used for the present method is shown in
In some embodiments, the method further comprises preparing amplicons of each single-stranded concatenated sequencing template and sequencing amplicons and evaluating sequencing results for the insert sequence and the copy of the insert sequence in the amplicons produced from each strand. In some embodiments, the method comprises determining positions of modified cytosines comprised in the insert sequence based on the sequences of each strand of the double-stranded concatenated sequencing template.
In figures shown herein, one strand may be referred to as a “top strand” and another as “bottom strand” to indicate that these are complementary single-stranded templates that are comprised together in a double-stranded concatenated sequencing template.
In some embodiments, the concatenated sequencing templates are prepared by a method described herein. Alternatively, other methods of preparing concatenated sequencing templates may be used, such as those described in the CODEC method (described in Bae et al., bioRxiv, 10.1101/2021.06.11.448110, posted Jun. 12, 2021), followed by the presently described methylation analysis.
In some embodiments, extension to produce the double-stranded concatenated sequencing template is performed with a reaction solution comprising methylated-dCTP, as shown in
In some embodiments, uracils comprised in the concatenated sequencing templates are converted to thymines when preparing amplicons. This aspect is shown, for example, in
In some embodiments, modified cytosines are altered by TET-Assisted Pyridine Borane Sequencing (TAPS). A method comprising TAPS is shown in
In some embodiments, unmodified cytosines are altered by a chemical or enzymatic reaction. In other words, modified cytosines may remain unaffected, but unmodified cytosines may be altered. In some embodiments, the chemical reaction is treatment with sodium bisulfite. In some embodiments, the enzymatic reaction comprises treatment with Tet methylcytosine dioxygenase 2 (TET2), T4-BGT, and APOBEC3A (using, for example, a method known as EM-seq, as described in Vaisvilas et al., Genome Res. 31(7): 1280-1289 (2021)). Such a method is shown in
In some embodiments, the method differentiates positions of methylated cytosines from hydroxymethylated cytosines. In some embodiments, additional reaction steps allow for reactions to differentiate methylated cytosines from hydroxymethylated cytosines.
In some embodiments for differentiating positions of methylated cytosines from hydroxymethylated cytosines, the subjecting each strand to a condition for altering modified and/or unmodified cytosines comprises (a) reacting each strand with β-glucosyltransferase; (b) reacting each strand with a DNA methyltransferase (DNMT); and (c) reacting each strand with a condition that converts unmodified cytosines to uracils. Such a method is shown in
In some embodiments for differentiating positions of methylated cytosines from hydroxymethylated cytosines, the subjecting each strand to a condition for altering modified and/or unmodified cytosines comprises (1) reacting each strand with a DNMT; and (2) reacting each strand with a condition that converts methylated cytosines to dihydroxyuracil (DHU, such as using TAPS). Such a method is shown in
A. Methods Comprising Conversion of Unmodified C's to U's
In some embodiments, methylation analysis is performed with conversion of unmethylated cytosine to uracil while leaving 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) intact. An exemplary method is bisulfite sequencing. Since PCR amplification of the bisulfite-treated DNA reads uracil as thymine, the modification of each cytosine can be inferred at single base resolution, where C-to-T transitions provide the locations of the unmethylated cytosines.
B. Methods Comprising Conversion of Modified C's to U's
In some embodiments, a bisulfite-free method is used for methylation analysis. In some embodiments, TET-Assisted Pyridine Borane Sequencing (TAPS) converts modified cytosine into dihydroxyuracil (DHU), a near natural base, which can be “read” as T by common polymerases. In some embodiments, TAPS detects cytosine modifications directly without affecting unmodified cytosines. In some embodiments, TAPS can be used to detect 5mC and 5hmC. Since PCR amplification of the TAPS-treated DNA reads DHU as thymine, the modification of each cytosine can be inferred at single base resolution, where C-to-T transitions provide the locations of the modified cytosines.
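By way of non-limiting illustration, the same C-to-T signal is interpreted in opposite ways under the two approaches described above: with conversion of unmodified cytosines, a C-to-T transition marks an unmodified cytosine, whereas with TAPS it marks a modified cytosine. The sketch below assumes an unconverted comparison sequence is available (here called `reference`, which is a hypothetical name) and that it is aligned base-for-base with the converted read.

```python
def interpret_cytosines(reference: str, read: str, chemistry: str) -> dict:
    """Classify each reference C as modified or unmodified from one read.

    chemistry: 'bisulfite' (unmodified C reads as T after amplification) or
               'taps' (modified C reads as T after amplification).
    """
    if chemistry not in ("bisulfite", "taps"):
        raise ValueError("chemistry must be 'bisulfite' or 'taps'")
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned and equal length")
    calls = {"modified": [], "unmodified": [], "uninformative": []}
    for i, (ref, obs) in enumerate(zip(reference, read)):
        if ref != "C":
            continue  # only cytosine positions are evaluated
        if obs == "T":
            key = "unmodified" if chemistry == "bisulfite" else "modified"
        elif obs == "C":
            key = "modified" if chemistry == "bisulfite" else "unmodified"
        else:
            key = "uninformative"  # e.g. a sequencing error at this position
        calls[key].append(i)
    return calls
```

Applied to one and the same read, the two settings return complementary calls, which is why the chemistry must be recorded alongside the sequencing data.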
C. Methods Comprising Treatment with β-glucosyltransferase
In some embodiments, β-glucosyltransferase is used in methods to selectively convert hydroxymethylcytosines (hmC) to glucosylated-methylcytosines (gmC). In some embodiments, hydroxymethylated cytosines are “protected” from later reactions that alter methylated and hydroxymethylated cytosines. Such a method is shown in
D. Methods Comprising Treatment with a DNMT
In some embodiments, a DNA methyltransferase (DNMT) is used. In some embodiments, the DNMT is DNA methyltransferase 1 (DNMT1). In some embodiments, a DNMT such as DNMT1 recognizes a hemi-methylated mCpG/GpC motif and methylates the unmethylated C to form mCpG/GpmC. DNMT1 has no activity on hemi-hydroxymethylated CpG sequences as described in Takahashi et al., FEBS Open Bio 5 (2015) 741-747. Accordingly, treatment with DNMT can be used in methods to differentiate methylated cytosines from hydroxymethylated cytosines, as shown in
Polynucleotides comprising multiple insert sequences can be generated via methods based on bead-linked transposomes (BLTs).
Exemplary polynucleotides comprising two insert sequences can be generated by tagmentation followed by PCR reactions to generate two libraries comprising different types of products: one library wherein the library products comprise P5-A14/Hyb-B15-ME sequences and one library wherein the library products comprise P7-B15/Hyb′-A14-ME sequences, as shown in
The resulting polynucleotides comprising multiple insert sequences can be used to generate a “tandem reads library,” which is a library of concatenated nucleic acid sequencing templates that can be sequenced.
The workflow of preparing the polynucleotide with multiple insert sequences leverages the well-established bead-linked transposome library preparation technology (e.g. Nextera flex) or adapter-based methods (e.g. Truseq).
In an exemplary method, library products comprising A14 and B15 sequences were generated by adding A14 and B15 sequences during a tagmentation reaction (
After clean-up, libraries are mixed. Based on hybridized adducts generated between HYB and HYB′, extended products can then be prepared. Only those products that are boxed in
In an exemplary method, library products comprising insert, adapter, and hybridization sequences were generated via tagmentation by BLTs followed by addition of HYB and HYB′. In this exemplary method, one tube used bead-based tagmentation to form a P5-HYB′ forked library and another tube used solution-based tagmentation to form a P7-HYB forked library. HYB and HYB′ were added to the library products after tagmentation.
First, a P5/HYB′ library was generated using 10 μL of BLTs (10 fmole) and washed with 200 μL wash buffer. Next, 176 μL working buffer was mixed with 10 μL of single strand binding protein. Wash buffer was removed from the beads and 44 μL of working buffer plus SSB mix was added. The solution was incubated 1 min at RT. A total of 6 μL of 10× tagmentation buffer was then added to the beads, and tagmentation proceeded for 10 minutes at 37° C. Then, 12 μL of 5% SDS was added and incubated at 37° C. for 10 minutes, followed by three washes with 200 μL wash buffer and resuspension in 200 μL wash buffer.
To add the hybridization sequence, fragments were incubated at ° C. for 5 minutes to denature the ME′ sequence. After a quick wash with 200 μL wash buffer, beads were resuspended in 80 μL of 2 μM ME′-HYB′, and an Annealrt program was run starting from 60° C., going down to 20° C. (1° C. per cycle). Beads were washed with 200 μL wash buffer, resuspended in 80 μL ELM3, and then rotated for 30 minutes at RT. Beads were washed with 200 μL wash buffer and stored at 4° C. in wash buffer.
Separately, a P7/HYB library was prepared using an oligonucleotide (oligo) duplex comprising a P7-B8-ME/ME′. The oligonucleotide duplex comprised Oligo 1 and Oligo 2. Table 2 describes the components of the reaction solution for generating the oligonucleotide duplex.
After the oligonucleotide duplex solution was prepared, an Annealrt recipe was performed on PCR using the protocol in Table 3. The duplex was saved at −20° C. for long-term storage, and multiple freeze thaw cycles were avoided.
The enzyme complex was assembled as outlined in Table 4, incubated overnight at 37° C., and then stored at −20° C.
The enzyme complex was diluted 1 into 5 in standard storage buffer to 400 nM. A tagmentation reaction was prepared based on Table 5, and the tagmentation proceeded for 5 minutes at 55° C.
Column clean-up was performed with a Zymo kit, and the library was eluted in 20 μL of resuspension buffer (RSB). Then a total of 18 μL of tagmented library plus 2 μL of 100 μM HYB-ME′ oligo (final concentration 10 μM) was incubated at 75° C. for 5 minutes, followed by a slow ramp to 20° C. to replace the ME′ oligo with HYB-ME′ using Oligo 3 and Oligo 4.
A total of 180 μL of ELM3 was added, and the solution was rotated at RT for 30 minutes. A SPRI bead clean-up was performed.
At this step, the P5 library was on beads and the P7 library was in solution. Both libraries were mixed and an Annealrt program was started going from 40° C. going down to 20° C., followed by washing the beads and resuspending in 100 μL AMS1 extension buffer (comprising a strand-displacing polymerase such as Bst polymerase and nucleotides). The resuspended solution was washed with NaOH and library was amplified off the bead surface. In this example, the PCR was performed with P5/A14 and P7/B15 primers. Ampure bead clean-up was performed to remove unattached adapters.
The Qubit concentration was measured as 0.849 μg/mL, which is approximately 2 nM. A 5 pM single-stranded library was made and seeded on a MiSeq flowcell (FC #CD79K). The clusters did not appear consistent with 5 pM and were also dim, so another 24-cycle amplification was performed.
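The conversion between a mass concentration and a molar concentration depends on the average library fragment length, which is not stated in this example. The following is a minimal sketch of the standard conversion, assuming roughly 660 g/mol per base pair of double-stranded DNA and reading the Qubit value as a mass concentration of 0.849 ng/μL (equivalent to μg/mL). The function names are hypothetical and chosen only for illustration.

```python
# Approximate molar mass of one base pair of double-stranded DNA (assumption:
# the commonly used 660 g/mol per bp average).
AVG_BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_length_bp)


def implied_mean_length_bp(conc_ng_per_ul, molarity_nM):
    """Mean fragment length (bp) implied by a mass/molar concentration pair."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * molarity_nM)


if __name__ == "__main__":
    # Fragment length that would make 0.849 ng/uL correspond to about 2 nM.
    length = implied_mean_length_bp(0.849, 2.0)
    print(f"implied mean fragment length: {length:.0f} bp")
    print(f"molarity at that length: {ng_per_ul_to_nM(0.849, length):.2f} nM")
```

Under these assumptions, the stated values would correspond to a mean library fragment length in the range of several hundred base pairs.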
The protocol forms hybrid libraries, but may not have sufficient efficiency. For example, denaturing on beads with NaOH may cause sample loss and insufficient density on the flowcell for sequencing. Preparation of both libraries on beads may improve yields.
The workflow for preparing a hybrid DNA library can be performed with bead-linked transposomes (BLTs). A difference from a standard protocol for library preparation is the presence of two types of beads (type I beads have BLTs comprising ME′-HYB′ and type II beads have BLTs comprising ME′-HYB at the non-inserted strand of the transposon).
After BLT tagmentation and gap-fill ligation (using ELM3), there are two options for library preparation completion. As shown in
The alternate method is shown as
Again, AMS1 polymerase extension mix is added to extend the strand to make a P5-P7′ or P7-P5′ library, and the libraries are then collected from the beads using PCR or other releasing conditions (such as denaturing buffer plus high temperature).
These approaches for hybridization of HYB to HYB′ and extension to form concatenated nucleic acid sequencing templates can be used for library products from other sources, such as those generated by Truseq or other types of transposome reactions.
A protocol was developed using desthiobiotin-tagged oligonucleotides. Desthiobiotin tagging can avoid the need for a NaOH denaturation step.
To generate the P5/HYB′ library, a total of 10 μL of BLTs (10 fmole) was washed with 200 μL wash buffer. 176 μL working buffer was mixed with 1 μL of single strand binding (SSB) protein. Wash buffer was removed from the beads and 44 μL of working buffer plus SSB mix was added and incubated for 1 minute at RT. Then, 6 μL of 10× tagmentation buffer was added to the beads and tagmentation proceeded for 10 minutes at 37° C. 12 μL of 5% SDS was added and incubated at 37° C. for 10 minutes. Beads were washed three times with 200 μL wash buffer and resuspended in 200 μL wash buffer. Beads were incubated at 60° C. for 5 minutes to denature ME′ and quickly washed with 200 μL wash buffer. Beads were resuspended in 80 μL of 2 μM ME′-HYB′. The Run Annealrt program was run starting from 60° C., going down to 20° C. (1° C. per cycle). Beads were washed with 200 μL wash buffer and resuspended in 80 μL ELM3 extension-ligation buffer and rotated for 30 minutes at RT, then washed with 200 μL wash buffer and saved in wash buffer at 4° C.
The P7/HYB library was generated using a single-desthiobiotin P7-B8-ME oligonucleotide to create an enzyme complex, which was assembled onto Dynabeads M280 streptavidin beads. In contrast, the P5/HYB′ library was generated using BLTs having dual desthiobiotin. The release conditions therefore differ for the two libraries: 20 mM biotin at 60° C. for the P5/HYB′ library (dual desthiobiotin) and 10 μM biotin at 70° C. for the P7/HYB library (single desthiobiotin).
To prepare the P7/HYB library, an oligonucleotide (oligo) duplex was prepared as described in Table 6.
After the oligonucleotide duplex solution was prepared, an Annealrt recipe was performed on PCR using the protocol in Table 7. The duplex was saved at −20° C. for long-term storage, and multiple freeze thaw cycles were avoided.
The enzyme complex was assembled as outlined in Table 8, incubated overnight at 37° C., and then stored at −20° C.
40 μL of M280 beads was washed with 200 μL wash buffer and resuspended in 40 μL wash buffer, and 2 μL of 2 μM transposome complex (10 fmole per BLT) was added. The beads were rotated for 30 minutes at RT, washed, and resuspended in 40 μL of wash buffer. 10 μL of enzyme beads was washed with 200 μL wash buffer. 176 μL of the working buffer was mixed with 1 μL of single strand binding protein. Wash buffer was removed from the beads and 44 μL of working buffer plus SSB mix was added and incubated for 1 min at RT. 6 μL of 10× tagmentation buffer was added to the beads and tagmentation proceeded for 10 minutes at 37° C. Then, 12 μL of 5% SDS was added and incubated at 37° C. for 10 minutes. Beads were washed three times with 200 μL wash buffer and resuspended in 200 μL wash buffer. Beads were then incubated at 60° C. for 5 minutes to denature ME′, quickly washed with 200 μL wash buffer, and resuspended in 80 μL of 2 μM ME′-HYB. A Run Annealrt program was run starting from 60° C., going down to 20° C. (1° C. per cycle). Beads were washed with 200 μL wash buffer, resuspended in 80 μL ELM3 extension ligation buffer, and rotated for 30 mins at RT. Beads were washed with 200 μL wash buffer and saved at 4° C. in wash buffer.
At this point, 2 separate library sets on beads are ready. 15 cycle PCR was performed with each library set, and the supernatant of PCR product shows BA peaks on the expected location. In the PCR reaction, for P5/HYB′ library P5 and HYB were used as PCR primer 1 and for P7/HYB library P7 and HYB′ were used as PCR primer 2, as outlined in Table 9.
P7/HYB beads were resuspended in 10 mM biotin in HT1 hybridization buffer and released at 60° C. for 10 minutes since Oligo 1 of the oligonucleotide duplex comprised a single desthiobiotin. The supernatant was added to P5/HYB′ beads and then a slow ramp down was started from 50° C. going down to 20° C. to hybridize the library products. Then, beads were washed with wash buffer, and AMS1 was added and incubated at 50° C. for 10 minutes. Polynucleotides comprising two insert sequences (one from each library) were loaded and released onto the flowcell with 20 mM biotin in HT1 hybridization buffer.
Initial experiments were performed with a HYB sequence that may be referred to as HYB1.
An updated HYB design, HYB2, involved additional A/T content, shuffling of A and G nucleotides, and a C/G lock on the 5′ end of the HYB sequence.
Polynucleotides comprising multiple insert sequences were also prepared using a Truseq PCR Free protocol.
1 μg of NA12878 genomic DNA was used as input for each forked library, followed by the Illumina Truseq PCR free protocol to shear the DNA and to do end repair and A-tailing.
For the ligation step, P5/HYB2′ and P7/HYB2 adapter sets were used. The P7/HYB2 adapters (SEQ ID NOs: 24 and 25) were used for insert sequence 1, while the P5/HYB2′ adapters (SEQ ID NOs: 26 and 27) were used for insert sequence 2. In these adapters, C's were methylated.
Adapter sets were prepared (10 μM final concentration) using the Annealrt recipe in Table 10, with the duplex saved at −20° C. for long-term storage and multiple freeze-thaw cycles avoided. The oligonucleotide stock concentration was 100 μM, with a final adapter concentration of 10 μM in 1× annealing buffer (20 mM Tris, 50 mM NaCl, 0.01 mM EDTA).
Ligation was performed following the Illumina PCR free Truseq protocol for ligation step using the custom adapter sets. Dual clean-up was performed as listed on the Truseq protocol, and final libraries were eluted in 22.5 μL Illumina resuspension buffer.
Forked libraries were then ready for stacking to prepare polynucleotides comprising two insert sequences. 6 μL of forked library product with P5/Hyb2′ and 6 μL of forked library product with P7/Hyb2 were mixed, and 1.3 μL of 10× annealing buffer was added. The annealing program on PCR listed in Table 11 was used to hybridize the two library products.
After the annealing step, 117 μL (9× the volume of annealed libraries) of AMS1 was added, followed by incubation at 50° C. for 10 minutes. After extension, Illumina-compatible tandem libraries were formed. A 1× SPRI clean-up was performed and the sample was eluted in 12 μL of Illumina resuspension buffer.
A Bioanalyzer run was done to confirm the size of the tandem library, and qPCR was used to quantify the final library product. As shown in
A tandem library can be sequenced on Illumina platforms with recipe modifications to have four reads instead of two. The location of sequencing primers was updated to use the correct sequencing primer for each sequencing read.
In these experiments, human genome library fragments were generated using bead-linked transposomes followed by preparation of polynucleotides comprising multiple inserts. Polynucleotides were sequenced on a MiSeq flowcell. Data shown in
Example reads from 10 clusters are shown in Table 12 to illustrate successful linking of two library fragments into a single cluster. 4×100 cycles of sequencing were performed and the resulting pairs of reads were mapped to the human genome. Table 12 shows the tile and the x and y coordinates of each cluster as reported in the BAM file. For a given cluster, the chromosome to which each read mapped is provided. As expected, the two paired reads from each library fragment map to the same chromosome, and the two library fragments map to different chromosomes. Thus, the results in Table 12 show that the two inserts in a polynucleotide come from different regions in the human genome.
These results of reads from individual clusters demonstrate successful linking of two library fragments into a polynucleotide and sequencing of the two separate insert sequences.
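The per-cluster check described above can be sketched as follows. This is an illustrative sketch only: the record layout, the field names, and the example values are hypothetical and are not taken from Table 12. It assumes that each cluster yields four mapped reads, two per insert.

```python
from typing import NamedTuple, Tuple


class ClusterReads(NamedTuple):
    """Mapped chromosomes for the four reads of one cluster (hypothetical layout)."""
    tile: int
    x: int
    y: int
    insert1_chroms: Tuple[str, str]  # paired reads covering the first insert
    insert2_chroms: Tuple[str, str]  # paired reads covering the second insert


def classify_cluster(cluster):
    """Classify a cluster by where its two read pairs map."""
    pair1_concordant = len(set(cluster.insert1_chroms)) == 1
    pair2_concordant = len(set(cluster.insert2_chroms)) == 1
    if not (pair1_concordant and pair2_concordant):
        return "discordant read pair"
    if cluster.insert1_chroms[0] != cluster.insert2_chroms[0]:
        return "two inserts from different chromosomes"
    return "two inserts from the same chromosome"


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = ClusterReads(1101, 1500, 2200, ("chr3", "chr3"), ("chr11", "chr11"))
    print(classify_cluster(example))
```

Inserts that map to the same chromosome are reported separately because that outcome alone does not distinguish two linked fragments from a single fragment; read positions would be needed for that.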
Polynucleotides comprising multiple insert sequences were generated using a method comprising restriction enzyme digest and ligation. In the exemplary method described herein
An 8-lane sequencing flow cell was prepared that contained polynucleotides from the tandem insert library at different concentrations: lane 1 had 2 pM, lane 2 had 10 pM, lane 3 had 20 pM, lane 6 had 2 pM, lane 7 had 10 pM, and lane 8 had 20 pM. Lanes 4 and 5 were used for control reactions: lane 4 had a monotemplate control reaction and lane 5 had a PhiX sequencing library control reaction (
As shown in
The proportion of each of the 4 bases detected at each cycle of sequencing for both inserts is represented in a % base-call per cycle plot in
Polynucleotides comprising multiple insert sequences were generated using a method comprising strand overlap extension (SOE). In the exemplary method described herein (
A sequencing flow cell was prepared that contained polynucleotides from the tandem insert library in all lanes except for lane 5, which contained a single insert control PhiX library. Reads 1 and 4 were used to sequence inserts from the PhiX monotemplate (
Primary metrics from the four-read sequencing run are shown in
The proportion of each of the 4 bases detected at each cycle of sequencing for both inserts is represented in a % base-call per cycle plot in
A method of preparing sequencing templates comprising two or more inserts may be performed with forked adapters and a surface for immobilizing fragments with ligated adapters, with the solid support allowing hybridization of multiple fragments together to generate concatenated sequencing templates.
A first and a second adapter can be prepared, as shown in
A blocking oligonucleotide may be hybridized to the 3′ end of the second strand of one or both forked adapters (i.e., the blocking oligonucleotide is hybridized to the single-stranded section of the second strand of the forked adapter). This blocking oligonucleotide may be hybridized to either, or both, the first forked adapter or the second forked adapter (
When a mixture of the first forked adapter and the second forked adapter is ligated to the ends of a double-stranded DNA fragment comprising a first strand (the top strand A in
The fragments with ligated adapters can then be added to a surface and attached via the 5′ affinity moiety of the first strands of the forked adapters. The surface may be a bead, or a slide, or a wall of a vessel, or a nanowell on a flow cell. The fragments can next be denatured and subjected to flow such that the blocking oligonucleotide is removed. Denaturation can occur in several ways known to those skilled in the art, including heat, pH, or chaotropic agents.
When the surface is subject to conditions that favor renaturation (such as cooling of the surface), the two single-stranded fragments may fully reanneal across their entire length. Alternatively, only single-stranded fragments that have an adapter sequence from a first forked adapter at one end and an adapter sequence from a second forked adapter at the other may reanneal just by their 3′ complementary ends (i.e., binding of the X sequence of the second strand of the second forked adapter with the X′ sequence of the second oligonucleotide of the first forked adapter, as shown in
Fragments that comprise a sequence from a first forked adapter at both ends cannot anneal to each other via their 3′ ends (
As shown in
The concatenated sequencing template also comprises the complement of the original A′ bottom strand linked to a copy of the A′ bottom strand. In the final stage of library preparation for sequencing, the top and bottom strands are harvested from the surface by disrupting the 5′ surface binding moiety, followed by denaturing the library. Thus, the top and bottom strands are sequenced independently of one another. They may also be replicated by PCR or other methods that copy DNA before sequencing.
In other cases, a sequencing template may comprise two or more inserts that are not copies of each other. Such sequencing templates can be generated by two fragments that anneal by binding of X to X′, without the inserts in the two fragments being complementary. In other words, some sequencing templates can have two copies of the same insert, while other sequencing templates can comprise two different inserts with unrelated sequences.
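The two outcomes described above can be illustrated with a small model of single-stranded fragments. This is a simplified sketch under stated assumptions: each strand is represented only by its 5′ adapter name, its insert sequence, and a 3′ tail labeled X or X′, and annealing via the 3′ ends is treated as possible only between one X tail and one X′ tail. The class and function names are hypothetical.

```python
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


class Strand(NamedTuple):
    five_prime_adapter: str  # e.g. "P5" or "P7" (labels only)
    insert: str              # insert sequence, written 5' to 3'
    three_prime_tail: str    # "X" or "X'"


def classify_pair(a, b):
    """Predict the template type formed when two single strands meet."""
    if {a.three_prime_tail, b.three_prime_tail} != {"X", "X'"}:
        return "no annealing via 3' ends"
    if b.insert == reverse_complement(a.insert):
        # Strands from the same duplex: extension yields an insert and its copy.
        return "insert plus copy of the same insert"
    return "two different inserts"


if __name__ == "__main__":
    top = Strand("P5", "ACGTTG", "X")
    bottom = Strand("P7", reverse_complement("ACGTTG"), "X'")
    unrelated = Strand("P7", "GGGAAA", "X'")
    print(classify_pair(top, bottom))
    print(classify_pair(top, unrelated))
```

The model ignores full-length reannealing of complementary strands, which competes with annealing via the 3′ ends.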
A method for preparing sequencing templates comprising two or more inserts may use forked adapters and a means of compartmentalization.
A pool of DNA molecules, for example, separate genomes, separate chromosomes, or large fragments of DNA (>1000 bp, preferably greater than 5000 bp) is aliquoted into multiple compartments by limiting dilution such that an individual compartment contains no DNA molecules, a single DNA molecule, or a limited number of DNA molecules equating to a fraction of one haploid copy whereby any position of the genome is likely to be represented by haploid DNA. Methods incorporating compartmentalization primarily capture contiguity information, but these methods can also produce concatenated sequencing templates with two copies of a given insert sequence (via hybridization of fragments comprising a sense strand and antisense strand of the same insert sequence).
Methods of compartmentalization (such as for use in preparing whole-genome haplotyping) are well-known in the art, such as those taught in Amini et al., Nat Genet. 46(12):1343-9 (2014); Kaper F, et al. Proc. Natl. Acad. Sci. USA. 110(14):5552-5557 (2013); Kitzman J O, et al. Nat. Biotechnol. 29(1):59-63 (2011); Peters B A, et al. Nature. 487(7406):190-195 (2012); Fan H C, et al. Nat. Biotechnol. 29(1):51-57 (2011); Levy S, et al. PLoS Biol. 5(10):e254 (2007); Duitama J, et al. Nucleic Acids Res. 40(5):2041-2053 (2012); Suk E K, et al. Genome Res. 21(10):1672-1685 (2011), each of which is incorporated by reference in its entirety herein. A user may choose a specific means of compartmentalization, such as emulsions, based on their preference and available equipment, and this method can be adapted to a variety of compartmentalization methods known in the art.
It will be appreciated that a different compartment (e.g., a compartment comprising f2, f3, etc.) will also form tandem insert templates, but only from permutations of the starting molecules within those wells. In other words, only subfragments generated in the same compartment are available to hybridize together to generate concatenated sequencing templates. Accordingly, the presence of two insert sequences together in a concatenated sequencing template can be used to infer that these insert sequences were comprised in the same starting DNA molecule (such as fragment f1, f2, or f3 in
Accordingly, contiguity information is captured in the concatenated sequencing templates even when the tandem insert templates from all compartments are pooled together and sequenced.
An advantage of using wells or tubes as compartments is that reagents can be added at each stage of the process. A potential disadvantage of using wells or tubes is the physical scale of the liquid handling and plasticware. Hence, alternative methods of compartmentalization using droplets of water in oil have been developed that use microfluidics. Droplets can be merged to add reagents such as endonucleases that fragment DNA. Droplet technology has been used to capture contiguity information (see, for example, exemplary methods outlined in “Everything you wanted to know about Linked-Reads,” 10x Genomics, Feb. 7, 2017), but such methods often require the addition of exogenous synthetic barcodes to link contiguous sequences.
As a consequence of limiting dilution to sub-haploid concentrations and compartmentalization, two copies (haplotypes) of the same gene are unlikely to be present in the same compartment. For preparing haplotype data, however, dilutions need not limit to one or no target nucleic acid in a given compartment, but instead can allow for different chromosomes to be comprised in the same compartment. The dilution would only generally need to limit the probability of two haploid copies ending up in the same compartment.
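The relationship between dilution and the chance of two copies of a locus sharing a compartment can be illustrated with a simple Poisson loading model. This model is an assumption made for illustration and is not part of the method as described: it treats the number of copies of a given locus per compartment as Poisson-distributed with a mean equal to the average number of haploid genome equivalents per compartment. The function name is hypothetical.

```python
import math


def prob_two_or_more_given_occupied(mean_copies):
    """P(at least 2 copies of a locus | at least 1 copy) under Poisson loading.

    mean_copies: average haploid genome equivalents per compartment (> 0).
    """
    if mean_copies <= 0:
        raise ValueError("mean_copies must be positive")
    p_zero = math.exp(-mean_copies)
    p_one = mean_copies * p_zero
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


if __name__ == "__main__":
    for mean in (0.05, 0.1, 0.5, 1.0):
        p = prob_two_or_more_given_occupied(mean)
        print(f"mean copies {mean:>4}: P(>=2 | >=1) = {p:.3f}")
```

The value returned is an upper bound on the chance of mixing haplotypes at a locus, since two copies in one compartment may also derive from the same haplotype.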
As shown in
Sequencing templates comprising two or more inserts can also be prepared using a solid support with immobilized transposomes. A first and a second transposome are prepared as shown in
Both the first and second adapters comprise an affinity moiety that can bind to a binding moiety on a surface of a solid support to attach the first strands to the surface. In other words, association of the binding moiety on a surface with an affinity moiety in a transposome can be used to immobilize the transposomes on the surface. The affinity moiety may be a biotin or other chemistries known to those skilled in the art. The affinity moiety is present on the 5′ end of one of the strands in a forked adapter comprised in the transposome. The first strand of the forked adapter comprised in the first transposome comprises full or partial sequences corresponding to the ‘Read 1’ sequences of Illumina's sequencing platform (e.g., P5.R1), and the first strand of the forked adapter comprised in the second transposome comprises full or partial sequences corresponding to the ‘Read 2’ sequences of Illumina's sequencing platform (e.g., P7.R2).
The second strand of each forked adapter can comprise two sections, a 5′ end section and a 3′ end section. The 5′ end section of the second strands is complementary and hybridized to the 3′ end of the first strands. The 3′ end section of the second strand (X′) of the forked adapter comprised in the first transposome is complementary to the 3′ end section of the second strand (X) of the forked adapter comprised in the second transposome.
The transposomes are attached to a surface via the 5′ end of the first strand of the forked adapter comprised in the first and second transposome. Methods for attachment are known to those skilled in the art, for example, biotinylation of oligonucleotides to attach to streptavidin-coated surfaces. Attachment to the surface may result in a random arrangement of the two transposomes (
A strand of double-stranded DNA added to this surface with immobilized transposomes will undergo tagmentation by one or multiple transposomes positioned by chance under the contact point of the DNA with the surface (
The DNA to surface transposome ratio can be selected such that no more than two tagmentation events occur per double-stranded DNA molecule. Where two tagmentation reactions occur per double-stranded DNA, bridges are formed between neighboring transposomes.
Where a tagmentation reaction occurs with a first transposome and a second transposome, a bridge is formed comprising a segment of the starting DNA (e.g., segment A) with adapters appended at both ends. The bridges may be between a first transposome and a second transposome, or a first transposome and a first transposome, or a second transposome and a second transposome. Such permutations will occur in a ratio of 50:25:25, respectively.
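The 50:25:25 ratio follows from simple counting if each end of a bridge is equally likely to be formed by a first or a second transposome, independently of the other end. That equal-density, independent-placement model is an assumption made here for illustration. A minimal sketch, with a hypothetical function name:

```python
from fractions import Fraction
from itertools import product


def bridge_type_fractions(p_first=Fraction(1, 2)):
    """Expected fractions of bridge types for a given share of first transposomes."""
    probs = {"first": p_first, "second": 1 - p_first}
    fractions = {
        "first-second": Fraction(0),
        "first-first": Fraction(0),
        "second-second": Fraction(0),
    }
    # Enumerate the transposome type at each of the two bridge ends.
    for end_a, end_b in product(probs, repeat=2):
        key = "first-second" if end_a != end_b else f"{end_a}-{end_a}"
        fractions[key] += probs[end_a] * probs[end_b]
    return fractions


if __name__ == "__main__":
    for bridge, fraction in bridge_type_fractions().items():
        print(f"{bridge}: {float(fraction):.0%}")
```

Changing `p_first` shows how an unequal mix of the two transposomes on the surface would reduce the share of first-second bridges below one half.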
When these bridges are processed to remove the Tn5 transposase (such as with SDS and washing), to seal the nicks/gaps, and then to denature the double-stranded fragments into single-stranded fragments, different combinations of templates can be formed.
For example, where the bridge is formed between a first transposome and a second transposome, two single stranded templates are formed, 5′-P5-R1-A-X-3′ and 5′-P7-R2-A′-X′-3′ (
The single-stranded strands are then treated to promote reannealing by methods known to those skilled in the art, for example, cooling or conducive buffer conditions. One outcome is that single-stranded fragments simply reanneal to their complement. Alternatively, single-stranded fragments may reanneal by their 3′ complementary ends, i.e., via binding of an X sequence to an X′ sequence. This is only possible between the first transposome and second transposome adapters, i.e., 5′-P5-R1-A-X-3′ and 5′-P7-R2-A′-X′-3′ (
Where two bridges are formed by three tagmentation events, for example the two bridges represented by A and B in
It will be appreciated that two bridges may also form between three transposomes comprising a second forked adapter or three transposomes comprising a first forked adapter (
Where more than two bridges are formed, for example the five bridges represented by A, B, C, D, E in
The process of denaturation, reannealing, and extension can be performed multiple times until all the templates comprising an adapter from the first strand of the forked adapter comprised in the first transposome at a first end and an adapter from the second strand of the forked adapter comprised in the second transposome at a second end are converted into sequencing templates comprising two inserts.
The sequencing templates can then be detached from the surface by disrupting the linkage joining the tag incorporated from the 5′ end of the first strand of the forked adapters with the surface, using means known to those skilled in the art, for instance by enzymatic digestion or chemical cleavage. The released templates can then be introduced to a sequencing platform directly or may first undergo further modification such as the addition of additional adapter sequences or amplification by PCR followed by sequencing.
The present method does not require barcodes to capture association information about contiguous and complementary sequences within the genome. However, where two or more libraries of templates from different samples are pooled before sequencing, a sample barcode may be desired. Sample barcodes may be included in the first strands of forked adapters (
Transposomes may also be used with methods of limited dilutions and/or compartmentalization as described in Example 12. The transposomes may be first and second transposomes as shown in
In such methods, transposomes may be in solution and may not be immobilized on a solid support. Transposomes may also be immobilized on a solid support (such as a bead) wherein most compartments only comprise a single solid support. DNA molecules within a compartment are tagmented with the first and second transposomes present in the compartment but not necessarily attached to a surface to produce double-stranded tagged fragments.
The tagged fragments can then be denatured to prepare single-stranded fragments, and hybridization may be allowed between an X sequence on one fragment and an X′ sequence on another fragment. After hybridization, extension may be performed to prepare concatenated sequencing templates. These concatenated sequencing templates can then be sequenced.
If solution-phase transposomes are used, this method is likely to generate concatenated sequencing templates that comprise two different insert sequences (as opposed to concatenated sequencing templates comprising two copies of the same insert) since the single-stranded fragments will not be immobilized before hybridization. Since the compartments can be optimized to generally comprise one or no DNA molecules before tagmentation, the presence of a concatenated sequencing template with two different insert sequences in sequencing results can be used to infer that these two insert sequences originated from sequences comprised in a single DNA molecule (i.e., neighboring or proximal sequences within a DNA molecule).
Concatenated sequencing templates described herein may be used for methylation analysis.
The concatenated sequencing template may then undergo a conversion process to identify methylated C's.
As shown in
The codification of the original bases is further developed and refined by collating the ‘2-base’ codes from the reads from the top strand and bottom strand of the tandem insert templates, using the method shown in
Methylation analysis can also be performed wherein the conversion is performed on methylated cytosines, and not unmethylated cytosines, as shown in
After preparation of a concatenated sequencing template using extensions with dNTPs that include methylated-dCTP, conversion of non-methylated C's to U's may be performed with any of the methods well-known in the art, such as sodium bisulfite conversion, enzymatic conversion, or borane-based conversion (
Methods can also be used to separately identify cytosines, methylated cytosines, and hydroxymethylated cytosines. As shown in
Methods can also be used to identify cytosines, methylated cytosines, and hydroxymethylated cytosines using conversion of only methylated cytosines. As shown in
Thus, the user can choose a means of methylation analysis based on the desired data and on whether differentiation of methylated cytosines from hydroxymethylated cytosines is desired.
The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the embodiments. The foregoing description and Examples detail certain embodiments and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the embodiment may be practiced in many ways and should be construed in accordance with the appended claims and any equivalents thereof.
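One of the approaches described above, in which the copy of the insert is synthesized with methylated dCTP before unmethylated cytosines are converted to uracils, can be illustrated with a simplified comparison of the two reads from one template. This sketch makes several assumptions that are not stated in the description: both reads are reported in the same orientation and are already aligned position by position, conversion is complete, the copy is fully protected from conversion, and sequencing errors are ignored. It does not distinguish methylated from hydroxymethylated cytosines. The function name is hypothetical.

```python
def call_cytosines(converted_insert_read, protected_copy_read):
    """Compare the converted insert read with the read of its protected copy.

    Returns a list of (position, call) tuples for positions where the copy
    reports a C, plus any positions where the reads disagree unexpectedly.
    """
    if len(converted_insert_read) != len(protected_copy_read):
        raise ValueError("reads must have the same length")
    calls = []
    for pos, (ins, cpy) in enumerate(zip(converted_insert_read, protected_copy_read)):
        if cpy == "C" and ins == "C":
            calls.append((pos, "methylated C"))      # survived conversion
        elif cpy == "C" and ins == "T":
            calls.append((pos, "unmethylated C"))    # converted, read as T
        elif ins != cpy:
            calls.append((pos, "mismatch"))          # possible error or damage
    return calls


if __name__ == "__main__":
    # Illustrative sequences only: position 1 is methylated, 4 and 5 are not.
    copy_read = "ACGTCCA"
    insert_read = "ACGTTTA"
    for position, call in call_cytosines(insert_read, copy_read):
        print(position, call)
```

Positions reported as mismatches are kept separate because a disagreement other than C-to-T could reflect a sequencing or amplification error or nucleobase damage rather than methylation status.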
As used herein, the term about refers to a numeric value, including, for example, whole numbers, fractions, and percentages, whether or not explicitly indicated. The term about generally refers to a range of numerical values (e.g., +/−5-10% of the recited range) that one of ordinary skill in the art would consider equivalent to the recited value (e.g., having the same function or result). When terms such as at least and about precede a list of numerical values or ranges, the terms modify all of the values or ranges provided in the list. In some instances, the term about may include numerical values that are rounded to the nearest significant figure.
This application is a continuation of PCT/US2021/055878, filed Oct. 20, 2021, which claims the benefit of priority of U.S. Provisional Application No. 63/094,422, filed Oct. 21, 2020, and U.S. Provisional Application No. 63/256,040, filed Oct. 15, 2021, the contents of which are each incorporated by reference herein in their entireties for any purpose.
Number | Date | Country | |
---|---|---|---|
63094422 | Oct 2020 | US | |
63256040 | Oct 2021 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/US2021/055878 | Oct 2021 | US |
Child | 18303905 | | US |