The content of the electronically submitted sequence listing, file name Library_SEQ_ST25.txt, size 192,421 bytes, and date of creation Feb. 28, 2018, filed herewith, is incorporated herein by reference in its entirety.
The present disclosure relates generally to the area of genetic analysis, and more specifically to a method and a kit for constructing a nucleic acid library.
Recent years have witnessed the rapid development and wide application of next-generation sequencing technologies. Next-generation sequencing typically involves the construction of a nucleic acid library from a nucleic acid sample before sequencing.
Current methods of constructing a DNA library typically involve fragmenting nucleic acid sequences to obtain double-stranded DNA fragments, and then ligating adaptors at each of the 3′ end and the 5′ end of the fragments to thereby allow sequencing of each individual double-stranded DNA fragment. In this process, the presence of single-stranded segments in the DNA molecules poses a major problem. Such segments arise, for example, from damage accumulated during sample preparation, as in formalin-fixed paraffin-embedded (FFPE) samples, or over long-term storage (e.g. fossil samples), and these damaged DNA segments commonly make sequencing difficult with current technology for constructing the DNA library.
Nucleic acid samples are sometimes extremely limited, with only nanogram or picogram quantities of nucleic acids available for further analysis. It is challenging to construct a high-quality library from such ultra-low amounts of nucleic acid. Nevertheless, this difficulty is frequently encountered in clinical applications of nucleic acid analysis, such as clinical NGS. In addition, detecting rare or ultra-rare mutations, such as those commonly associated with cancers, has proven challenging for current sequencing platforms. This is primarily because normal tissues are typically collected together with the diseased tissues, which often significantly reduces the prevalence of disease-related mutations in clinical samples and makes such rare mutations difficult to find using current sequencing technologies.
As such, genetic analysis of low-quality and/or low-quantity nucleic acid materials is particularly challenging for all current sequencing platforms and technologies.
In order to address the above-mentioned challenges for analyzing low-quality and/or low-quantity nucleic acid samples using current sequencing technologies, the present disclosure provides a method and a kit for constructing a nucleic acid library.
In a first aspect, the disclosure provides a kit for constructing a DNA library from a biological sample containing a plurality of nucleic acid sequences. The kit can include a first adaptor, a DNA ligase, and a first primer.
The first adaptor can include a first strand, which comprises a phosphate group, a barcode sequence, and a first primer recognition sequence in a direction from a 5′ end thereof to a 3′ end thereof. Herein, the barcode sequence is configured to provide barcode information to each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor. The DNA ligase is configured to allow a ligation between the 5′ end of the first strand of the first adaptor and a 3′ end of each of a plurality of single-stranded DNA molecules. Herein, each of the plurality of single-stranded DNA molecules corresponds to one of the plurality of nucleic acid sequences in the biological sample. The first primer comprises a sequence complementary to the first primer recognition sequence of the first adaptor and is configured to allow for a single-strand extension reaction to thereby form a double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor.
Herein the first primer can have a Tm of about 30-35° C., but can also have a Tm of about 55-65° C. In one specific embodiment, the first primer recognition sequence has a sequence: CCTCAGCAAG (i.e. SEQ ID NO: 913), and correspondingly the first primer comprises a sequence: CTTGCTGAGG (i.e. SEQ ID NO: 914), which is substantially a complementary sequence of the first primer recognition sequence. The barcode sequence has a length of about 2-16 nt.
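By way of non-limiting illustration only, the relationship between the first primer recognition sequence and the first primer in the specific embodiment above can be checked with a short sketch. The function names are hypothetical, and the Wallace rule used here (2° C. per A or T, 4° C. per G or C) is only a rough approximation for short oligonucleotides; it is an assumption of this sketch and is not stated in the disclosure as the way the Tm is determined.

```python
# Illustrative check only; the Wallace rule is a rough estimate for short oligos
# and is an assumption of this sketch, not a method stated in the disclosure.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Estimate Tm in degrees C as 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


RECOGNITION_SEQ = "CCTCAGCAAG"  # SEQ ID NO: 913
FIRST_PRIMER = "CTTGCTGAGG"     # SEQ ID NO: 914

# True when the primer is the exact reverse complement of the recognition site.
primer_matches = reverse_complement(RECOGNITION_SEQ) == FIRST_PRIMER
# Rough Tm estimate for the 10-nt primer.
primer_tm = wallace_tm(FIRST_PRIMER)
```

Under this approximation the 10-nt primer falls within the about 30-35° C. range mentioned above; a nearest-neighbor calculation under actual buffer conditions could give a different value.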
The kit disclosed herein can further include a solid support, and the first strand of the first adaptor can further include an immobilization portion at the 3′ end thereof, which is configured to allow immobilization of each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor at the 5′ end thereof to the solid support. The immobilization portion can comprise a first coupling partner, configured to be able to form a stable coupling with a second coupling partner attached to the solid support. Herein, the stable coupling between the first coupling partner and the second coupling partner can be a non-covalent binding or a covalent connection.
According to some embodiments, the stable coupling between the first coupling partner and the second coupling partner is a non-covalent binding, and the first coupling partner and the second coupling partner can respectively be one and another of a coupling pair, selected from one of a biotin-streptavidin pair, a biotin-avidin pair, a biotin-anti-biotin antibody pair, a carbohydrate-lectin pair, or an antigen-antibody pair. In one specific embodiment, the first coupling partner comprises a biotin moiety, and the second coupling partner comprises a streptavidin moiety attached to a magnetic bead.
According to some other embodiments, the stable coupling between the first coupling partner and the second coupling partner is a covalent connection, and the first coupling partner and the second coupling partner can respectively be one and another of a cross-linking pair. Examples of the cross-linking pair include an NHS ester-primary amine pair, a sulfhydryl-reactive chemical group pair (e.g. cysteines, or other sulfhydryls such as maleimides, haloacetyls, and pyridyl disulfides), an oxidized sugar-hydrazide pair, photoactivatable nitrophenyl azide's UV triggered addition reaction with double bonds leading to insertion into C—H and N—H sites or subsequent ring expansion to react with a nucleophile (e.g., primary amines), or carbodiimide activated carboxyl groups to amino groups (primary amines), etc.
The immobilization portion can further include a spacer between the first primer recognition sequence and the first coupling partner, and the spacer can include at least one C3 spacer unit.
According to some embodiments of the kit, the first strand of the first adaptor further includes an index sequence between the phosphate group and the barcode sequence or between the barcode sequence and the first primer recognition sequence, which is configured to provide index information for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor. Herein, the index sequence can have a length of 1-8 nt.
According to some embodiments of the kit, the first strand of the first adaptor further comprises a separator sequence disposed between the phosphate group and the barcode sequence, which is configured to serve as a separation marker between the barcode sequence and each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor. Herein, the separator sequence can have a length of about 2-16 nt. According to some specific embodiments of the kit, the separator sequence can be further configured to provide index information for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor.
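By way of non-limiting illustration only, the arrangement of elements along the first strand described above can be sketched as a small data model. The class name, field names, and the text notation for the phosphate group, C3 spacer, and biotin are hypothetical. The sketch also assumes that the separator sequence, when present, is the 5′-most sequence element, and it treats the "about" length ranges as strict bounds for simplicity.

```python
from dataclasses import dataclass


@dataclass
class FirstStrand:
    """Hypothetical model of the first strand of the first adaptor (5'->3')."""

    barcode: str                       # about 2-16 nt
    primer_site: str                   # first primer recognition sequence
    separator: str = ""                # optional, about 2-16 nt
    index: str = ""                    # optional, 1-8 nt
    index_before_barcode: bool = True  # otherwise between barcode and primer site
    c3_spacers: int = 0                # C3 spacer units before the coupling partner
    biotin: bool = False               # first coupling partner at the 3' end

    def __post_init__(self) -> None:
        # Length ranges are treated as strict bounds in this sketch.
        if not 2 <= len(self.barcode) <= 16:
            raise ValueError("barcode should be about 2-16 nt")
        if self.separator and not 2 <= len(self.separator) <= 16:
            raise ValueError("separator should be about 2-16 nt")
        if len(self.index) > 8:
            raise ValueError("index should be 1-8 nt")
        if self.c3_spacers < 0:
            raise ValueError("c3_spacers cannot be negative")

    def bases(self) -> str:
        """Nucleotide sequence only, written 5'->3'."""
        # Assumption: the separator, when present, is the 5'-most element.
        if self.index_before_barcode:
            core = self.index + self.barcode
        else:
            core = self.barcode + self.index
        return self.separator + core + self.primer_site

    def describe(self) -> str:
        """Text rendering including the 5' phosphate and immobilization portion."""
        tail = "-C3" * self.c3_spacers + ("-biotin" if self.biotin else "")
        return f"5'-P-{self.bases()}{tail}-3'"


example = FirstStrand(
    barcode="ACGTAC",          # hypothetical barcode
    primer_site="CCTCAGCAAG",  # SEQ ID NO: 913
    separator="GG",            # hypothetical separator
    c3_spacers=1,
    biotin=True,
)
```

Calling `example.describe()` renders the strand from the 5′ phosphate through the separator, barcode, and first primer recognition sequence to the spacer and biotin at the 3′ end.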
In the kit, the first adaptor can be single-stranded, and the DNA ligase can be a single-stranded DNA ligase, which can comprise at least one of CircLigase I or CircLigase II.
Alternatively, the first adaptor can be partially double-stranded. In embodiments where the first adaptor comprises a single-stranded segment at the 5′ end of the first strand, the DNA ligase can be a single-stranded DNA ligase, which can comprise at least one of CircLigase I or CircLigase II.
In some other embodiments, the first adaptor further comprises a second strand, which includes a first portion at a 5′ end thereof and a second portion at a 3′ end thereof. The first portion of the second strand forms a double-stranded duplex with the 5′ end of the first strand, and the second portion forms a single-stranded overhang in the first adaptor. As such, the DNA ligase can be a bandage strand-facilitated DNA ligase, which can include at least one of T3 DNA ligase, T4 DNA ligase, T7 DNA ligase, or Taq DNA ligase. Herein, the first portion can have a length of 8-18 nt, and the second portion can have a length of 4-10 nt. According to some embodiments, the first adaptor can comprise a set of adaptors, each configured such that a second portion of a second strand thereof comprises a random sequence. According to some other embodiments, the first adaptor can comprise one or more adaptors, each configured such that a second portion of a second strand thereof comprises a specific sequence.
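By way of non-limiting illustration only, the size of a set of adaptors whose second portion comprises a random sequence can be sketched by enumerating every possible overhang of a given length. The function name is hypothetical, and the sketch only illustrates the sequence space (4 to the power of the overhang length); it does not describe how such a set is actually prepared.

```python
from itertools import product


def random_overhangs(length: int) -> list[str]:
    """Enumerate every possible second-portion sequence of the given length."""
    # The 4-10 nt range is taken from the description of the second portion.
    if not 4 <= length <= 10:
        raise ValueError("second portion is described as 4-10 nt")
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


# A 6-nt random overhang spans 4**6 distinct sequences.
overhangs = random_overhangs(6)
overhang_count = len(overhangs)
```

The count grows fourfold with each added nucleotide, from 256 sequences at 4 nt to 1,048,576 at 10 nt.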
The kit disclosed herein can further include a second adaptor, which is configured to ligate to a free end of the barcoded double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules immobilized to the solid support at an immobilized end thereof. The second adaptor can comprise a third strand and a fourth strand. The fourth strand includes a phosphate group at a 5′ end thereof, and further includes a second primer recognition sequence, which is configured to provide a priming site for amplification of the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules. The third strand comprises a sequence complementary to a 5′-end sequence of the fourth strand, and is configured to form a duplex with, and thereby to ensure a stability of, the 5′-end sequence of the fourth strand. Herein, the fourth strand can further include at least one functional sequence at a 5′ end of the second primer recognition sequence, which can comprise at least one of a second index sequence, a second barcode sequence, or a sequencing primer sequence.
In the kit as described above, the third strand can further include, at a 5′ end thereof, at least one of: a cap structure, an overhang sequence, or a functional moiety. The cap structure can include a sequence that does not match with a 3′-end sequence of the fourth strand, and is configured to avoid concatenation of the second adaptor in a ligation reaction. The overhang sequence can form a single-stranded segment for the second adaptor.
The kit can further include a pair of primers, which are configured to amplify the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules.
Herein the pair of primers can be configured to respectively target the two end portions of the double-stranded DNA molecule. In one specific embodiment, one of the pair of primers can comprise a sequence corresponding to at least a portion of a sequence of the first primer which has been used for the single-stranded extension reaction, and can, for example, comprise a sequence corresponding to the first primer recognition sequence in the first adaptor. Another of the pair of primers can comprise a sequence corresponding to at least a portion of a sequence in the fourth strand of the second adaptor. Herein “at least a portion” of a sequence can include part or all of the sequence.
The kit as disclosed herein may further include a third adaptor that can be ligated to the free ends of the first adaptor, and can be engineered to be compatible with commercial sequencing platforms, either working together with the second adaptor to perform paired-end sequencing or working alone to perform sequencing starting from the first adaptor sequence to the DNA molecule corresponding to each of the plurality of single-stranded DNA molecules.
In a second aspect, the disclosure further provides a method for constructing a DNA library from a biological sample containing a plurality of nucleic acid sequences utilizing the kit according to any one of the embodiments as described above. The method comprises:
preparing a DNA sample from the biological sample, wherein the DNA sample comprises a plurality of single-stranded DNA molecules, each having a dephosphorylated 5′ end;
ligating a first strand of a first adaptor to a 3′ end of each of the plurality of single-stranded DNA molecules, wherein the first strand of the first adaptor comprises a phosphate group, a barcode sequence and a first primer recognition sequence along a direction from a 5′ end thereof to a 3′ end thereof; and
synthesizing a complementary strand for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to obtain a barcoded double-stranded DNA molecule corresponding thereto.
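By way of non-limiting illustration only, the ligating and synthesizing steps above can be sketched at the sequence level. The function names, the example fragment, and the example barcode are hypothetical, and the sketch assumes that the first primer recognition sequence sits at the very 3′ end of the nucleotide portion of the first strand. All sequences are written 5′ to 3′.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ligate(ssdna: str, first_strand: str) -> str:
    """Join the 3' end of the ssDNA molecule to the 5' end of the first strand."""
    return ssdna + first_strand


def extend(template: str, primer: str) -> str:
    """Return the strand made by extending the primer along the template."""
    # The primer anneals to its complement at the 3' end of the template.
    if not template.endswith(reverse_complement(primer)):
        raise ValueError("primer does not anneal at the 3' end of the template")
    return reverse_complement(template)


fragment = "ATGGCATTCA"     # hypothetical single-stranded DNA molecule
barcode = "ACGGT"           # hypothetical barcode sequence
primer_site = "CCTCAGCAAG"  # SEQ ID NO: 913
first_primer = "CTTGCTGAGG" # SEQ ID NO: 914

ligated = ligate(fragment, barcode + primer_site)
new_strand = extend(ligated, first_primer)
```

In this sketch the synthesized strand begins with the first primer, followed by the complement of the barcode and then the complement of the original fragment, so the barcode is carried on both strands of the resulting double-stranded molecule.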
In the step of ligating a first strand of a first adaptor to a 3′ end of each of the plurality of single-stranded DNA molecules in the method as described above, the barcode sequence can have any length, but preferably can have a length of 2-16 nt.
According to some embodiments, the plurality of nucleic acid sequences in the biological sample comprise a plurality of DNA sequences, and the preparing a DNA sample from the biological sample comprises: performing dephosphorylation reaction and dissociation reaction to obtain a plurality of single-stranded DNA molecules, each having a dephosphorylated 5′ end.
Herein the performing dephosphorylation reaction and dissociation reaction can comprise at least one cycle of: performing a dephosphorylation reaction, and performing a dissociation reaction, or alternatively can comprise at least one cycle of: performing a dissociation reaction, and performing a dephosphorylation reaction.
Prior to the performing dephosphorylation reaction and dissociation reaction, the preparing a DNA sample from the biological sample can further comprise: shearing the plurality of DNA sequences into a plurality of DNA fragments. Herein each of the plurality of DNA fragments can have a size of about 100-300 bp, and preferably of about 150 bp.
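By way of non-limiting illustration only, the alternative orderings of shearing, dephosphorylation, and dissociation described above can be sketched as an ordered list of steps. The function name, parameter names, and step labels are hypothetical.

```python
def plan_sample_prep(shear: bool = True,
                     dissociate_first: bool = False,
                     cycles: int = 1) -> list[str]:
    """Return the ordered preparation steps for one of the described variants."""
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    steps = ["shear"] if shear else []
    if dissociate_first:
        pair = ["dissociate", "dephosphorylate"]
    else:
        pair = ["dephosphorylate", "dissociate"]
    # Repeat the pair of reactions for the requested number of cycles.
    return steps + pair * cycles


default_plan = plan_sample_prep()
cycled_plan = plan_sample_prep(dissociate_first=True, cycles=2)
```

Setting `shear=False` corresponds to samples that do not require shearing before the dephosphorylation and dissociation reactions.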
According to some embodiments, the plurality of nucleic acid sequences in the biological sample comprise a plurality of RNA sequences, and the preparing a DNA sample from the biological sample comprises: treating the biological sample to thereby obtain a plurality of cDNA molecules, each corresponding to one of the plurality of RNA molecules.
The treating the biological sample to thereby obtain a plurality of cDNA molecules can comprise: performing a reverse transcription using an oligo(dT) as a primer to obtain a cDNA sequence corresponding to each of the plurality of RNA molecules.
According to some embodiments, prior to the performing a reverse transcription using an oligo(dT) as a primer to obtain a cDNA sequence corresponding to each of the plurality of RNA molecules, the treating the biological sample to thereby obtain a plurality of cDNA molecules further comprises: performing a polyadenylation at a 3′ end of each of the plurality of RNA molecules.
According to some embodiments, the treating the biological sample to thereby obtain a plurality of cDNA molecules comprises: performing a reverse transcription using random primers or sequence-specific primers to obtain a cDNA sequence corresponding to each of the plurality of RNA molecules.
In the method disclosed herein, the first adaptor can comprise a single-stranded segment at the 5′ end of the first strand thereof, and the ligating a first strand of a first adaptor to a 3′ end of each of the plurality of single-stranded DNA molecules comprises: performing a ligation reaction through a single-stranded DNA ligase such that the 3′ end of each of the plurality of single-stranded DNA molecules is ligated to the 5′ end of the first strand of the first adaptor. Herein, the single-stranded DNA ligase can comprise at least one of CircLigase I or CircLigase II.
According to some embodiments of the method, the first adaptor further comprises a second strand, which comprises a first portion at a 5′ end thereof and a second portion at a 3′ end thereof. The first portion of the second strand has a length of at least 1 nt, and forms a double-stranded duplex with the 5′ end of the first strand. The second portion has a length of at least 1 nt, and forms a single-stranded overhang in the first adaptor, and the ligating the 5′ end of a first strand of a first adaptor to a 3′ end of each of the plurality of single-stranded DNA molecules comprises: performing a ligation reaction through a bandage strand-facilitated DNA ligase such that the 3′ end of each of the plurality of single-stranded DNA molecules is ligated with the 5′ end of the first strand of the first adaptor.
Herein, the second portion can have a length of 4-10 nt. As such, the first adaptor can include a set of adaptors, each configured such that a second portion of a second strand thereof comprises a random sequence. The first adaptor can also include one or more adaptors, each configured such that a second portion of a second strand thereof comprises a specific sequence.
Herein the first portion can have a length of 8-18 nt. The bandage strand-facilitated DNA ligase can include at least one of T3 DNA ligase, T4 DNA ligase, T7 DNA ligase, or Taq DNA ligase.
In the method disclosed herein, the first strand of the first adaptor can further comprise an index sequence, which is disposed between the phosphate group and the barcode sequence, or between the barcode sequence and the first primer recognition sequence. The index sequence is configured to provide index information for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor. Herein, the index sequence can have a length of 1-8 nt.
The first strand of the first adaptor can further comprise a separator sequence disposed between the phosphate group and the barcode sequence, which is configured to serve as a separation marker between the barcode sequence and each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor. Herein, the separator sequence can have a length of about 2-16 nt. According to some embodiments, the separator sequence can be further configured to provide index information for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor.
In the method disclosed herein, the synthesizing a complementary strand for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to obtain a barcoded double-stranded DNA molecule corresponding thereto can comprise:
annealing a first primer with each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor, wherein the first primer comprises a sequence complementary to the first primer recognition sequence in the first strand of the first adaptor; and
performing a single-strand extension reaction to form a double-stranded DNA molecule for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor.
Herein the annealing a first primer with each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor can include: slowly altering a temperature of a reaction from an original temperature to a working temperature for the single-stranded extension reaction. According to some embodiments, the first primer has a Tm of about 30-35° C., and the slowly altering a temperature of a reaction to a working temperature for the single-stranded extension reaction comprises: increasing the temperature from an original temperature of no more than ˜20° C. (preferably no more than ˜15° C.) to the working temperature for the single-stranded extension reaction at a rate of no more than ˜3° C. per minute (preferably no more than ˜1° C. per minute). In one specific embodiment, the first primer recognition sequence has a sequence: CCTCAGCAAG (i.e. SEQ ID NO: 913), and correspondingly the first primer comprises a sequence: CTTGCTGAGG (i.e. SEQ ID NO: 914), which is substantially a complementary sequence of the first primer recognition sequence.
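By way of non-limiting illustration only, the slow temperature increase described above can be sketched as a calculation of the shortest ramp time allowed by a given maximum rate, together with a list of temperature setpoints. The function names are hypothetical, and the working temperature of 37° C. used in the example is an assumption of this sketch, since no specific working temperature is given for the extension reaction.

```python
import math


def min_ramp_minutes(start_c: float, working_c: float,
                     max_rate_c_per_min: float) -> float:
    """Shortest ramp duration that does not exceed the maximum heating rate."""
    if max_rate_c_per_min <= 0:
        raise ValueError("rate must be positive")
    if working_c < start_c:
        raise ValueError("working temperature must not be below the start")
    return (working_c - start_c) / max_rate_c_per_min


def ramp_setpoints(start_c: float, working_c: float,
                   rate_c_per_min: float, interval_min: float = 1.0) -> list[float]:
    """Temperature setpoints at fixed intervals, capped at the working temperature."""
    total = min_ramp_minutes(start_c, working_c, rate_c_per_min)
    n_intervals = math.ceil(total / interval_min)
    return [min(start_c + rate_c_per_min * interval_min * i, working_c)
            for i in range(n_intervals + 1)]


WORKING_TEMP_C = 37.0  # assumption for illustration only

# Preferred conditions: start at 15 C and heat at 1 C per minute.
preferred_minutes = min_ramp_minutes(15.0, WORKING_TEMP_C, 1.0)
preferred_setpoints = ramp_setpoints(15.0, WORKING_TEMP_C, 1.0)
```

With these assumed values the preferred ramp takes at least 22 minutes, whereas starting at 20° C. and heating at 3° C. per minute would take a little under 6 minutes.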
The synthesizing a complementary strand for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to obtain a barcoded double-stranded DNA molecule corresponding thereto can further include: performing a blunt-end repair to the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor. Herein, the blunt-end repair can be performed by at least one of T4 DNA polymerase, Klenow Fragment, or T4 polynucleotide kinase.
In the method disclosed herein, the first strand of the first adaptor can further include an immobilization portion at the 3′ end thereof, which is configured to be able to form a stable coupling to a solid support. Between the ligating a first strand of a first adaptor to a 3′ end of each of the plurality of single-stranded DNA molecules and the synthesizing a complementary strand for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to obtain a barcoded double-stranded DNA molecule corresponding thereto, the method further comprises: immobilizing each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to the solid support via the stable coupling between the immobilization portion and the solid support.
Herein, the immobilization portion can include a first coupling partner, configured to be able to stably couple (i.e. covalently connect, or non-covalently but securely bind, etc.) to a second coupling partner attached to the solid support. According to some embodiments, the first coupling partner can comprise a biotin moiety, the second coupling partner can comprise at least one of a streptavidin moiety, an avidin moiety, or an anti-biotin antibody, and the solid support can comprise at least one of a magnetic bead, a filter, a resin bead, a nanosphere, a plastic surface, a microtiter plate, a glass surface, a slide, a membrane, or a matrix.
After the synthesizing a complementary strand for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to obtain a barcoded double-stranded DNA molecule corresponding thereto, the method can further comprise: ligating a second adaptor to a free end of the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules immobilized to the solid support at an immobilized end thereof.
Herein the second adaptor can comprise a third strand and a fourth strand. The fourth strand comprises a phosphate group, which is at a 5′ end thereof, and a second primer recognition sequence, which is configured to provide a priming site for amplification of the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules. The third strand comprises a sequence complementary to a 5′-end sequence of the fourth strand, and is configured to form a duplex with, and thereby to ensure a stability of, the 5′-end sequence of the fourth strand.
As such, the method further comprises: performing a PCR reaction to thereby amplify the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules.
Herein the PCR reaction can be performed by means of a pair of primers respectively targeting the two end portions of the double-stranded DNA molecule. In one specific embodiment, one of the pair of primers can comprise a sequence corresponding to at least a portion of a sequence of the first primer which has been used for the single-stranded extension reaction, and another of the pair of primers can comprise a sequence corresponding to at least a portion of a sequence in the fourth strand of the second adaptor. Herein “at least a portion” of a sequence can include part or all of the sequence.
Between the ligating a second adaptor to a free end of the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules immobilized to the solid support at an immobilized end thereof and the performing a PCR amplification to each of the plurality of single-stranded DNA molecules, the method can further comprise: eluting the double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules from the solid support.
These and other embodiments, which will be apparent to those of skill in the art upon reading the specification, provide the art with methods for assessing, characterizing, and detecting genetic markers, such as cancer markers, and for genetic analysis, such as SNV identification. In particular, they provide methods for constructing single-stranded nucleic acids into libraries for desired analysis.
Throughout the disclosure, the term “about” or “around”, and the sign “˜” as well, generally refers to plus or minus 10% of the indicated number. For example, “about 20” may indicate a range of 18 to 22, and “about 1” may mean from 0.9-1.1. Other meanings of “about” may be apparent from the context, such as rounding off, so, for example “about 1” may also mean from 0.5 to 1.4.
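By way of non-limiting illustration only, the plus or minus 10% reading of "about" can be sketched as follows. The function name is hypothetical, and the sketch covers only the 10% interpretation, not the rounding-based interpretation also mentioned above.

```python
def about_range(value: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) bounds for 'about value' at the given tolerance."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


about_20 = about_range(20)  # bounds for "about 20"
about_1 = about_range(1)    # bounds for "about 1"
```

For the two examples given above, the sketch yields bounds of 18 to 22 and 0.9 to 1.1, up to floating-point precision.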
As used herein, the terms “double-stranded duplex”, “hybridization” and “annealing” refer to the pairing of complementary (including partially complementary) polynucleotide strands. Hybridization and the strength of hybridization (e.g., the strength of the association between polynucleotide strands) are impacted by many factors well known in the art, including the degree of complementarity between the polynucleotides, the stringency of the conditions involved (affected by such conditions as the concentration of salts), the melting temperature (Tm) of the formed hybrid, the temperature of the hybridization reaction, the presence of other components, the molarity of the hybridizing strands and the G:C content of the polynucleotide strands. When one polynucleotide is said to “hybridize” to another polynucleotide, it means that there is some complementarity between the two polynucleotides or that the two polynucleotides form a hybrid under high stringency conditions. When one polynucleotide is said to not hybridize to another polynucleotide, it means that there is no sequence complementarity between the two polynucleotides or that no hybrid forms between the two polynucleotides at a high stringency condition.
As used herein, the term “complementary” refers to the concept of sequence complementarity between regions of two polynucleotide strands (e.g. a double-stranded structure) or between two regions of the same polynucleotide strand (e.g. a “loop” or “hairpin” structure). It is known that an adenine base of a first polynucleotide region is capable of forming specific hydrogen bonds (“base pairing”) with a base of a second polynucleotide region which is antiparallel to the first region if the base is thymine or uracil. Similarly, it is known that a cytosine base of a first polynucleotide strand is capable of base pairing with a base of a second polynucleotide strand which is antiparallel to the first strand if the base is guanine. A first region of a polynucleotide is complementary to a second region of the same or a different polynucleotide if, for example, when the two regions are arranged in an antiparallel fashion, at least one nucleotide of the first region is capable of base pairing with a base of the second region. Therefore, it is not required for two complementary polynucleotides to base pair at every nucleotide position. “Complementary” refers to a first polynucleotide that is 100% or “fully” complementary to a second polynucleotide and thus forms a base pair at every nucleotide position. “Complementary” also refers to a first polynucleotide that is not 100% complementary (e.g., 90%, or 80% or 70% complementary) and contains mismatched nucleotides at one or more nucleotide positions. In one embodiment, two complementary polynucleotides are capable of hybridizing to each other under high stringency hybridization conditions.
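By way of non-limiting illustration only, the degree of complementarity between two equal-length polynucleotide regions arranged in an antiparallel fashion can be sketched as the percentage of positions that form an A:T, A:U, or G:C base pair. The function name is hypothetical, and the sketch ignores gaps, bulges, and regions of unequal length.

```python
# Base pairs named in the definition above: A with T or U, and C with G.
BASE_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
              ("G", "C"), ("C", "G")}


def percent_complementarity(first: str, second: str) -> float:
    """Percent of positions that pair when two 5'->3' regions are antiparallel."""
    if not first or len(first) != len(second):
        raise ValueError("regions must be non-empty and of equal length")
    # Reversing the second region aligns it antiparallel to the first.
    paired = sum((a, b) in BASE_PAIRS
                 for a, b in zip(first.upper(), reversed(second.upper())))
    return 100.0 * paired / len(first)


full_match = percent_complementarity("CCTCAGCAAG", "CTTGCTGAGG")
one_mismatch = percent_complementarity("CCTCAGCAAG", "CTTGCTGAGA")
```

The first pair is SEQ ID NO: 913 and SEQ ID NO: 914, which pair at every position; the second pair carries a hypothetical single-base change at the 3′ end of the primer, leaving nine of ten positions paired.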
Throughout the disclosure, the term “bandage strand-facilitated DNA ligase” refers to a DNA ligase that can catalyze a ligation between a 5′ end of a first DNA strand and a 3′ end of a second strand, facilitated by the presence of a third strand (i.e. “bandage strand”) that has one segment complementary to the 5′ end of the first DNA strand and another segment complementary to the 3′ end of the second strand. Herein the bandage strand-facilitated DNA ligase includes, but is not limited to, T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, and Taq DNA ligase, etc. The term “single-stranded DNA ligase” as used in this disclosure refers to a DNA ligase that can catalyze the ligation between a 5′ end of a first DNA strand and a 3′ end of a second strand in the absence of the bandage strand.
Unless indicated otherwise, all sequences in the present disclosure have a direction from 5′ end to 3′ end.
The present disclosure provides a method for constructing a nucleic acid library from a biological sample containing a plurality of nucleic acid sequences. As illustrated in
S100: preparing a DNA sample from the biological sample, wherein the DNA sample comprises a plurality of single-stranded DNA molecules, each having a dephosphorylated 5′ end.
According to some embodiments of the method, the biological sample comprises a plurality of DNA sequences that are often double-stranded and commonly have phosphorylated 5′ ends, and as such, step S100 can include the following sub-steps, as illustrated in
S110: Shearing the plurality of DNA sequences into DNA fragments;
S120: Performing a dephosphorylation reaction over the DNA fragments to thereby obtain dephosphorylated DNA fragments; and
S130: Performing a dissociation reaction over the dephosphorylated DNA fragments to thereby obtain the plurality of single-stranded DNA molecules.
Herein the biological sample can have a plurality of double-stranded DNA sequences, and can typically be a genomic DNA sample from a tissue, a mitochondrial DNA sample, or a cell-free DNA sample from blood or other body fluids, etc. These different types of DNA samples can be prepared based on different assays that are conventional in the field, whose description is skipped herein. Herein by means of step S100, the DNA sample comprising a plurality of single-stranded DNA molecules can be obtained from the biological sample.
In sub-step S110, the length of each DNA fragment can have a range of around 100-300 bp (preferably around 150 bp), but can vary depending on different needs. The DNA molecules in the biological sample can be sheared by a conventional shearing method. In one example, a DNA sample can be sheared into fragments of around 150 bp with Diagenode's Bioruptor at a program of 7 cycles of 30 seconds ON/90 seconds OFF using 0.65 ml Bioruptor® Microtubes. It is noted that sub-step S110 may be optional and can vary depending on the source, nature, and composition of the biological sample. In one example, long nucleic acid sequences, such as genomic DNA obtained from a conventional preparation approach which typically have large double-stranded DNA fragments, can be sheared into small fragments. In another example, circulating cell free DNAs (cfDNAs) commonly purified from human plasma typically have a size of around 140-170 bp and may not need to be sheared, or only need minor shearing.
Sub-step S120 is configured to remove the phosphate group at the 5′ end of any DNA fragment to thereby prevent the formation of concatemers between different nucleic acid fragments from the sample in the subsequent ligation reaction. Herein sub-step S120 can be performed at 37° C. in the presence of a phosphatase (such as the FastAP Alkaline Phosphatase) for 5˜10 min. Other reaction conditions are also possible.
In sub-step S130, the plurality of dephosphorylated DNA fragments can be dissociated from a double-stranded form to become a single-stranded form, to thereby obtain a plurality of single-stranded DNA molecules. As such, the sample can be heated at 95° C. for 3-15 min and snap-frozen on ice. Other reaction conditions are also possible.
It is noted that there can be other embodiments of step S100 regarding the order and the cycles for the sub-steps S120 and S130.
In one specific embodiment, after S110, the dissociation reaction (i.e. S130) can be performed prior to the dephosphorylation reaction (i.e. S120), as illustrated in
To ensure that as many phosphate groups in the nicks/gaps or at the ends of the DNA strands as possible are removed, in some embodiments of step S100, after S110, sub-steps S130 and S120 can be performed in n cycles (n≥2), as illustrated in
The actual selection of the various embodiments of step S100, as respectively illustrated in
According to some other embodiments of the method, the biological sample may contain a plurality of RNA sequences, and the method is employed to construct a DNA library from the plurality of RNA molecules in the biological sample. Correspondingly, prior to sub-step S110, step S100 comprises a sub-step of:
S109: preparing a cDNA sample comprising a plurality of cDNA molecules from the biological sample, wherein each cDNA molecule corresponds to one of the plurality of RNA molecules.
If mRNAs in the biological sample are included as the target nucleic acid sequences for the construction of the DNA library, because typically each mRNA contains a poly(A) tail at a 3′ end thereof, specifically as shown in
S1091: Performing a reverse transcription using an oligo(dT) as a primer to thereby obtain a cDNA sequence corresponding to each of the plurality of RNA molecules.
If RNAs in the plurality of RNA molecules other than the mRNAs are also included as the target nucleic acid sequences for the construction of the DNA library, because they typically do not have poly(A) tails at 3′ ends, thus specifically, as illustrated in
S1091′: Performing a polyadenylation at a 3′ end of each of the plurality of RNA molecules; and
S1092′: Performing a reverse transcription using an oligo(dT) as a primer to thereby obtain a cDNA sequence corresponding to each of the plurality of RNA molecules.
Herein S1091′ can be performed on each RNA molecule by means of a poly(A) polymerase to correspondingly obtain a treated RNA molecule having a poly(A) tail. S1092′ can include: annealing of the oligo(dT) primer with the poly(A) tail of each treated RNA molecule, and performing a reverse transcription in the presence of a reverse transcriptase. The actual processes for S1091′ and S1092′ are well known to people of ordinary skill in the field, and the description is skipped herein.
Alternatively, each RNA sequence in the biological sample can be reversely transcribed by means of random primers or sequence-specific primers. As shown in
S1091″: Performing a reverse transcription initiated by a set of random primers or sequence-specific primers to obtain cDNAs corresponding to each of the plurality of RNA molecules.
The above-mentioned embodiments of the method can be applied to a biological sample containing only RNA molecules, which is prepared, for example, by an RNA purification protocol that is known to those of ordinary skill in the field. It can also be applied to a biological sample containing both DNA molecules and RNA molecules.
It is noted that every cDNA molecule obtained from reverse transcription of an RNA molecule by the two embodiments of sub-step S109 as shown in
It is further noted that if only RNAs in the biological sample are targeted, during extraction of the RNAs, the genomic DNA can be removed by an RNA purification protocol that is known to people of ordinary skill in the field.
S200: ligating a first strand of a first adaptor to a 3′ end of each of the plurality of single-stranded DNA molecules, wherein the first strand of the first adaptor comprises a barcode sequence and a first primer recognition sequence at a 5′ end and a 3′ end thereof respectively.
Herein in the first adaptor 01, the barcode sequence 100 substantially allows each single-stranded DNA molecule to be labelled uniquely. The barcode sequence can have any length, and preferably has a length of 2-16 nt. According to some embodiments of the disclosure, the barcode sequence 100 has a length of 12 nt, which can uniquely apply a total of 4^12 (or 16,777,216) different adaptors to a plurality of single-stranded DNA molecules. It should be noted that the length of the barcode sequence 100 can vary, depending on different needs in practice, for example, on the estimated complexity and abundance of different single-stranded DNA molecules in the DNA sample.
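As a check of the arithmetic above, the number of distinct barcodes of length n over the four bases is 4^n. The following minimal Python sketch computes this value and, under the assumption that barcodes are assigned uniformly at random (an assumption made here for illustration and not stated in the disclosure), estimates the chance that a given molecule shares its barcode with at least one other molecule in the sample. The function names are hypothetical.

```python
BASES = "ACGT"


def barcode_diversity(length: int) -> int:
    """Number of distinct barcodes of the given length over A/C/G/T."""
    if length < 1:
        raise ValueError("barcode length must be a positive integer")
    return len(BASES) ** length


def shared_barcode_probability(length: int, molecules: int) -> float:
    """Chance that one molecule shares its barcode with any other molecule.

    Assumes each molecule receives a barcode drawn uniformly at random,
    which is an illustrative assumption only.
    """
    if molecules < 1:
        raise ValueError("at least one molecule is required")
    n = barcode_diversity(length)
    return 1.0 - (1.0 - 1.0 / n) ** (molecules - 1)


if __name__ == "__main__":
    for nt in (2, 8, 12, 16):
        print(f"{nt:>2} nt barcode: {barcode_diversity(nt):,} combinations")
    print(f"12 nt, 1000 molecules: {shared_barcode_probability(12, 1000):.2e}")
```

A calculation of this kind can help in choosing a barcode length that suits the estimated complexity and abundance of the single-stranded DNA molecules in the sample.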
The first primer recognition sequence 200 in the first strand of the first adaptor 01 is substantially a universal primer recognition sequence across different DNA molecules, which allows each uniquely barcodedly labelled single-stranded DNA molecule to be conveniently amplified to obtain double-stranded DNA molecules in a subsequent single-cycle PCR reaction by means of a first primer 200′ having a sequence complementary to the first primer recognition sequence 200 (as described below). Herein the first primer 200′ can thus be regarded as a universal primer. It is noted that to avoid non-specific amplification of sequences in the above mentioned single-cycle PCR reaction, the first primer recognition sequence 200 can be configured to have a relatively unique sequence among different genes and across different species. Thus the first primer recognition sequence 200 may vary based on the nature and species of the target nucleic acid sample.
According to some embodiments, the first primer recognition sequence 200 is further configured to have a Tm that allows efficient or specific amplification for the single-cycle PCR reaction, depending on different needs. The first primer recognition sequence 200 can optionally have a length of 5-30 nt.
According to some preferred embodiments, the first primer recognition sequence 200 has a Tm of ˜30-35° C., and a length of 8-12 nt. For example, the first primer recognition sequence 200 in one specific embodiment, which has a sequence of “CCTCAGCAAG” (i.e. SEQ ID NO: 913), has a length of 10 nt. In addition, to balance the length and Tm, the first primer recognition sequence 200 can be selected such that it has a GC content between 40% and 70%, and lacks any repetitive sequences. It is noted that the above configuration is especially suitable for constructing a DNA library directly from original DNA sequences in a DNA sample without any prior amplification. The use of a short first primer recognition sequence 200 in the first adaptor 01 allows a subsequent synthesis of a complementary strand of each single-stranded DNA molecule (i.e. the amplification reaction for the single-cycle PCR reaction) to be efficiently performed in the presence of a short primer (i.e. the first primer 200′ as described below, which has a sequence complementary to the first primer recognition sequence 200) having a relatively low Tm.
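The length, GC content, and approximate Tm of a candidate first primer recognition sequence can be checked with a short script. The Python sketch below uses the Wallace rule of thumb, Tm ≈ 2·(A+T) + 4·(G+C), which is a common approximation for short oligonucleotides; it is an assumption adopted here for illustration, and the disclosure does not state which Tm model was used. The longest single-base run is reported as a simple, hypothetical proxy for the absence of repetitive sequences.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm in degrees C for short oligos: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_homopolymer_run(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(group)) for _, group in groupby(seq.upper()))


if __name__ == "__main__":
    site = "CCTCAGCAAG"  # sequence quoted in the text above
    print("length (nt):", len(site))
    print("GC content :", f"{gc_content(site):.0%}")
    print("Wallace Tm :", wallace_tm(site), "deg C")
    print("longest run:", longest_homopolymer_run(site))
```

For the sequence “CCTCAGCAAG”, this approximation gives a GC content of 60% and a Tm of 32° C., which falls within the preferred ranges given above.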
According to some other embodiments, the first primer recognition sequence 200 has a length of 13-30 nt and has a Tm of 55-65° C., just like a regular PCR primer sequence. This configuration allows a relatively more specific amplification for the single-cycle PCR reaction to meet certain practical needs.
It is noted that besides the first adaptor 01 as shown in
In addition to the barcode sequence 100 and the first primer recognition sequence 200 as described above, the first adaptor 01 can optionally include an immobilization portion 300, disposed at a 3′ end of the first adaptor 01 (i.e. 3′ end of the first primer recognition sequence 200) and configured to allow immobilization of the plurality of single-stranded DNA molecules attached therewith at a 5′ end of the first adaptor 01 to a solid support 300s, as illustrated in
Herein the solid support 300s can be a filter, a bead (such as resin, or a magnetic bead, etc.), a nanosphere, a plastic surface, a microtiter plate, a glass surface, a slide, a membrane, a matrix (which can be packed into a cartridge or column structure), etc., and selection of specific solid support 300s can depend on the convenience, purpose, and situation. The solid support 300s can be treated and derivatized as is known in the art.
Immobilization of the plurality of single-stranded DNA molecules to the solid support 300s can be direct or indirect. According to some embodiments as illustrated in
As such, in any of these above embodiments of the first adaptor 01, the immobilization portion 300 can include a first coupling partner 300a, which is covalently or non-covalently but stably attached to a 3′ end of the first adaptor 01 (i.e. a 3′ end of the first strand of the first adaptor 01). The first coupling partner 300a is configured to form a stable coupling or attachment with a second coupling partner 300a′ immobilized (or covalently attached) to a solid support 300s, without interfering with other events.
Herein the stable attachment between the first coupling partner 300a and the second coupling partner 300a′ can be a covalent connection, and as such the first coupling partner-second coupling partner pair can be, but is not limited to, the functional group pair of NHS esters-primary amines. Alternatively, the stable attachment between the first coupling partner 300a and the second coupling partner 300a′ can be a non-covalent binding (or bonding), and as such the first coupling partner-second coupling partner pair can be, but is not limited to, a biotin-streptavidin/avidin pair, a biotin-anti-biotin antibody pair, a carbohydrate-lectin pair, and an antigen-antibody pair.
For example, the first coupling partner 300a can be a dye (e.g. a fluorescence dye), and the second coupling partner 300a′ can be an antibody that specifically and stably binds with the first coupling partner 300a (i.e. the dye). The use of dye as the first coupling partner 300a allows the target sequence ligated to the first adaptor 01 to be visualized, and thus additionally providing a means for quality control or for other purposes.
As such, the stable attachment between the first coupling partner 300a and the second coupling partner 300a′ allows the first adaptor 01, along with each single-stranded DNA molecule ligated thereby, to be immobilized to the solid support 300s, facilitating the capture, enrichment, isolation, and purification of the DNA molecules, which in turn brings convenience in subsequent reactions (e.g. PCR amplification, NGS sequencing, etc).
In order to increase the efficiency for the first primer 200′ to bind with the first primer recognition sequence 200 in the first adaptor 01 to thereby facilitate the subsequent single-cycle PCR reaction, the immobilization portion 300 can be configured to further include a spacer 300b, disposed between the first primer recognition sequence 200 and the first coupling partner 300a. A length of the spacer 300b can rely on the nature and composition of the immobilization portion 300. In one illustrating example also illustrated in
It should be noted that the biotin-streptavidin pair as described above and illustrated in
It is further noted that the spacer 300b can include other spacer units, and can further include another moiety, such as triethyleneglycol (TEG). Herein the TEG spacer can be disposed to attach a biotin moiety, which can avoid hindrance issues and can be beneficial for attaching oligonucleotides to nanospheres or magnetic beads.
Additionally, the first adaptor 01 can optionally include an index sequence 400, disposed either at a 5′ end of the first adaptor 01 (i.e. at a 5′ end of the barcode sequence 100, as shown in
Furthermore, the first adaptor 01 can optionally include a separator sequence 500, disposed at a 5′ end of the barcode sequence 100 (i.e. 5′ end of the first adaptor 01 as illustrated in
According to some embodiments as illustrated in
One specific example of the first adaptor as described above and shown in
In addition to the aforementioned single-stranded first adaptor 01 (i.e. the first adaptor 01 includes only the first strand and all functional elements are substantially in the first strand of the first adaptor 01), which is described as a first embodiment of the first adaptor 01 and illustrated in
In a second embodiment of the first adaptor as shown in
In a third embodiment of the first adaptor as shown in
It is noted that due to the presence of the bandage strand (i.e. the second strand 01b in the first adaptor 01″), the ligation reaction by means of the bandage strand-facilitated DNA ligase (e.g. T4 DNA ligase) is demonstrably more efficient than a ligation reaction using a single-stranded DNA ligase. Additionally, the “overhang” sequence (i.e. the second portion) on the second strand 01b of the first adaptor 01″ can add selection power to the ligation reaction by selectively annealing to target single-stranded DNA molecules whose 3′ end sequences are complementary to the “overhang” sequences.
In order to ensure a sufficient coverage, according to some embodiments, the first adaptor 01″ substantially includes a set of adaptors, where the second portion in the second strand of each adaptor comprises a random sequence, configured such that the random sequences in the second portion in the second strand of the plurality of adaptors together can cover all possible sequences of the 3′ end of the plurality of single-stranded DNA molecules. As such, all possible single-stranded DNA sequences in the sample can be ligated to the first adaptor 01″ to thus be incorporated in the library via the bandage strand-facilitated DNA ligase (e.g. T4 DNA ligase).
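The coverage requirement for the random second portion can be illustrated as follows. The Python sketch below enumerates every possible overhang of a given length k and confirms that, for an arbitrary 3′ end, the set contains the overhang that can anneal to it (taken here to be the reverse complement of the last k bases). The overhang length k and the example fragment are hypothetical values chosen for illustration, and are not taken from the disclosure.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def all_overhangs(k: int) -> list[str]:
    """Every possible overhang of length k (4**k sequences in total)."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


def overhang_for_3prime_end(fragment: str, k: int) -> str:
    """Overhang (5' to 3') able to anneal to the last k bases of the fragment."""
    if len(fragment) < k:
        raise ValueError("fragment is shorter than the overhang length")
    return reverse_complement(fragment[-k:])


if __name__ == "__main__":
    k = 4  # hypothetical overhang length
    pool = set(all_overhangs(k))
    fragment = "GGGTTACG"  # hypothetical single-stranded DNA molecule
    needed = overhang_for_3prime_end(fragment, k)
    print(len(pool), "overhangs in the pool; needed:", needed, needed in pool)
```

Because the pool contains all 4^k sequences, every possible 3′ end of length k finds a matching overhang, which is the sense in which the set of adaptors covers all single-stranded DNA molecules in the sample.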
According to some other embodiments, the second portion of the first adaptor 01″ can comprise one or more specific sequences, which allow a relatively specific ligation of the first adaptor 01″ with certain target species in the single-stranded DNA molecules whose 3′ end sequences are complementary to the second portion.
In step S200, the ligation of the 5′ end of the first strand of the first adaptor to the 3′ end of each of the plurality of single-stranded DNA molecules is carried out by a DNA ligase. In other words, under the action of the DNA ligase, a 5′ end of the first strand of the first adaptor can be ligated to a 3′ end of each of the plurality of single-stranded DNA molecules. Herein the DNA ligase can be any of CircLigase II, CircLigase I, T4 DNA ligase, etc.
The CircLigase II and CircLigase I can be the single-stranded DNA ligase used to perform a ligation between each of the plurality of single-stranded DNA molecules and the single-stranded first adaptor 01 (as shown in
Herein by ligating the first strand of the first adaptor 01 to the 3′ end of each single-stranded DNA molecule in step S200, each single-stranded DNA molecule is substantially labelled individually with a unique barcode (via the barcode sequence 100 in the first adaptor 01).
In embodiments where the first strand of the first adaptor 01 contains a first coupling partner 300a configured to be immobilized to a solid support attached to a second coupling partner 300a′ (via the stable coupling between the first coupling partner 300a and the second coupling partner 300a′), after step S200 and before step S300 (mentioned below), the method includes the following step:
S250: immobilizing each of the plurality of single-stranded nucleic acid molecules ligated to the first strand of the first adaptor to a solid support.
Step S250 can be performed by incubating each single-stranded DNA molecule ligated to the first strand of the first adaptor 01 with the solid support at an appropriate temperature for an appropriate time period. In one specific example, the solid support is magnetic beads coupled with streptavidin, and the first adaptor is coupled with biotin. As such, the incubation can be performed at room temperature for 10-30 min. It is noted that this step S250 is optional and can be skipped in cases where no solid support is needed.
S300: synthesizing a complementary strand for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor to obtain a barcoded double-stranded DNA molecule corresponding thereto.
Herein S300 can be performed by a single-cycle PCR reaction via the aforementioned first primer 200′, which comprises a sequence complementary to the first primer recognition sequence 200 in the first strand of the first adaptor 01. Specifically, if the first adaptor 01 takes a single-stranded form as shown in
S310: annealing the first primer with each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor; and
S320: performing a single-strand extension reaction to form the double-stranded DNA molecule for each of the plurality of single-stranded DNA molecules ligated to the first strand of the first adaptor.
Herein S310 is to ensure a sufficient binding of the first primer 200′ with the first primer recognition sequence 200 in each single-stranded DNA molecule ligated to the first strand of the first adaptor, so that the single-stranded extension reaction (i.e. single-cycle PCR) can occur in sub-step S320. Specifically, sub-step S310 can include: slowly altering (increasing or decreasing) a temperature of the reaction to a working temperature (i.e. reaction temperature) of the single-stranded extension.
According to some embodiments, slowly altering a temperature of a reaction to a working temperature of the single-stranded extension reaction comprises: increasing the temperature from an original temperature of no more than ˜20° C., and preferably no more than ˜15° C., to the working temperature for the single-stranded extension reaction at a rate of no more than ˜3° C. per minute, and preferably no more than ˜1° C. per minute.
In one specific example where the first primer 200′ has a Tm of 32° C., S310 specifically includes: (1) adding the first primer to the reaction, and incubating the reaction at 65° C. for 2 min before quickly cooling on ice; (2) adding a BST DNA polymerase in the reaction, and incubating the reaction at 15° C.; and (3) slowly increasing the temperature of the reaction at a rate of around 1° C. per minute, until the temperature reaches 37° C. Correspondingly, S320 includes: incubating the reaction at 37° C. for 3-10 min. It is noted that in this specific example where the first primer 200′ has a relatively low Tm (˜30° C.), the reaction temperature can only be slowly increased to give rise to satisfactory results, and based on an actual experiment, the manner of slowly decreasing the temperature of the reaction fails to obtain a satisfactory result.
In another specific example, the first primer 200′ has a Tm of 60° C., S310 involves: (1) adding the first primer and a BST DNA polymerase to the reaction, and incubating the reaction at 70-80° C. for 2 min; (2) slowly cooling the temperature of the reaction at a rate of around 1° C. per minute, until the temperature reaches about 60° C. Correspondingly, S320 includes: incubating the reaction at a temperature within the range of 50-72° C. for 30 min. It is noted that in the above example where the first primer 200′ has a relatively high Tm (˜60° C.), it is also possible to slowly increase the reaction temperature.
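The slow temperature change described in the two examples above can be expressed as a list of per-minute setpoints. The Python sketch below is a minimal illustration only: the function names are hypothetical, the one-minute step size is an assumption, and the starting temperature of 75° C. for the high-Tm case is an assumed value chosen from within the stated 70-80° C. range.

```python
import math


def ramp_minutes(start_c: float, target_c: float, rate_c_per_min: float) -> float:
    """Time needed to move from start to target at a fixed ramp rate."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(target_c - start_c) / rate_c_per_min


def ramp_schedule(start_c: float, target_c: float,
                  rate_c_per_min: float = 1.0) -> list[tuple[int, float]]:
    """Per-minute (minute, temperature) setpoints, rising or falling."""
    total = ramp_minutes(start_c, target_c, rate_c_per_min)
    direction = 1 if target_c >= start_c else -1
    schedule = []
    for minute in range(math.ceil(total) + 1):
        temp = start_c + direction * rate_c_per_min * minute
        # Clamp the final setpoint so the ramp never overshoots the target.
        temp = min(temp, target_c) if direction == 1 else max(temp, target_c)
        schedule.append((minute, temp))
    return schedule


if __name__ == "__main__":
    low_tm = ramp_schedule(15.0, 37.0, 1.0)   # low-Tm example: 15 to 37 deg C
    high_tm = ramp_schedule(75.0, 60.0, 1.0)  # high-Tm example, assumed 75 deg C start
    print("low-Tm ramp :", len(low_tm) - 1, "min, ends at", low_tm[-1][1])
    print("high-Tm ramp:", len(high_tm) - 1, "min, ends at", high_tm[-1][1])
```

At a rate of 1° C. per minute, the low-Tm example therefore takes about 22 minutes to reach 37° C. before the incubation of sub-step S320 begins.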
It is noted that in S320, besides the BST 3.0 polymerase, other DNA polymerases (such as Klenow fragment) or an RNA reverse transcriptase can also be used.
Optionally, after S320, the method can further include a sub-step:
S330: performing a blunt-end repair to each double-stranded DNA molecule obtained from the single-stranded extension reaction.
After the single-stranded extension reaction in S320, each double-stranded molecule may have a 3′ overhang, which needs to be removed to ensure a high efficiency for any subsequent treatment, such as ligation with a second adaptor 02 as described below. Specifically, S330 can be performed in the presence of a T4 DNA polymerase (having a 3′→5′ exonuclease activity) and incubated at 25° C. for 15 minutes. Besides T4 DNA polymerase, other choices include Klenow Fragment or T4 polynucleotide kinase. These enzymes can also be used in combination.
It should be noted that if the first adaptor 01 takes a partially double-stranded form as shown in
After the above steps S100 (i.e. preparing single-stranded DNA molecules), S200 (i.e. ligating a first strand of a first adaptor to each single-stranded DNA molecule), optionally S250 (i.e. immobilizing ligation product), and S300 (i.e. synthesizing complementary strand for each single-stranded DNA molecule), a DNA library comprising a plurality of barcode-labelled double-stranded DNA sequences is thus constructed. Each barcode-labelled double-stranded DNA sequence corresponds to one original single-stranded nucleic acid molecule.
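The relationship between steps S200 and S300 can be illustrated at the level of sequence strings. In the Python sketch below, the first strand of the first adaptor is modelled as a barcode followed by the first primer recognition sequence (5′ to 3′), ligation appends it to the 3′ end of a single-stranded DNA molecule, and the single-cycle extension produces the reverse complement of the ligated product starting from the first primer. The fragment and the barcode are hypothetical, the function names are illustrative, and optional elements such as the immobilization portion, index sequence, and separator sequence are omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ligate_first_adaptor(fragment: str, barcode: str, primer_site: str) -> str:
    """S200: join the adaptor's 5' end to the fragment's 3' end."""
    return fragment + barcode + primer_site


def first_primer(primer_site: str) -> str:
    """First primer: complementary to the first primer recognition sequence."""
    return reverse_complement(primer_site)


def extend_from_first_primer(ligated: str, primer_site: str) -> str:
    """S300: synthesize the complementary strand, returned 5' to 3'."""
    if not ligated.endswith(primer_site):
        raise ValueError("primer recognition sequence not found at the 3' end")
    # The primer anneals at the 3' end of the template and is extended
    # to the template's 5' end, giving the full reverse complement.
    return reverse_complement(ligated)


if __name__ == "__main__":
    fragment = "ACGTTGCA"        # hypothetical single-stranded DNA molecule
    barcode = "AACCGGTTAACC"     # hypothetical 12 nt barcode
    primer_site = "CCTCAGCAAG"   # sequence quoted earlier in the text
    ligated = ligate_first_adaptor(fragment, barcode, primer_site)
    new_strand = extend_from_first_primer(ligated, primer_site)
    print("ligated strand      :", ligated)
    print("first primer        :", first_primer(primer_site))
    print("complementary strand:", new_strand)
```

In this model the newly synthesized strand begins with the first primer and carries the reverse complement of the barcode, so both strands of each double-stranded molecule retain the label of the original single-stranded molecule.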
The DNA library may be subjected to further treatment or analysis depending on different purposes. For example, the DNA library may be treated such that each barcode-labelled single-stranded DNA molecule can be inserted into a vector, allowing for subsequent amplification and/or expression in a model organism (such as in E. coli, a yeast, or a phage). Alternatively, the DNA library may be subjected to amplification to thereby obtain an amplified DNA library before a subsequent genetic analysis, such as a sequencing analysis, a variant/mutation analysis, or a copy number analysis, is performed.
In the following, a specific example is provided to illustrate steps implicated to amplify each barcode-labelled double-stranded DNA sequence in the DNA library, in order to facilitate a subsequent analysis of the single-stranded nucleic acid molecules corresponding thereto. Specifically, each single-stranded nucleic acid molecule is pre-treated in step S100, labelled with a first strand of a first adaptor attached with a biotin moiety at a 3′ end thereof in step S200, immobilized to a solid support (more specifically, streptavidin-conjugated magnetic beads) in step S250 (via the biotin-streptavidin binding pair), and further treated to allow the synthesis of a complementary strand for each barcode-labelled single-stranded nucleic acid molecule in step S300. After these above steps, each original single-stranded nucleic acid molecule is converted into a corresponding barcode-labelled double-stranded DNA molecule immobilized onto a magnetic bead, which then undergoes further treatment to allow an amplification and a subsequent sequencing analysis using substantially an Illumina sequencing platform.
Specifically, as illustrated in
S400: ligating a second adaptor to a free end of each double-stranded DNA molecule immobilized to the solid support at an immobilized end.
Herein in the DNA library, each barcode-labelled double-stranded DNA sequence corresponding to one original single-stranded nucleic acid molecule is immobilized to the magnetic beads at the immobilized end via the aforementioned bonding between a biotin moiety attached to a 3′ end of the first adaptor and a streptavidin moiety attached to the magnetic beads. The free end of each double-stranded DNA molecule is substantially the end opposite to the immobilized end.
The third strand 02a comprises a sequence that is at least complementary to a 5′-end sequence of the fourth strand 02b, and is configured to form a duplex with, and thereby ensure a stability of, the 5′-end sequence in the fourth strand 02b. In order to prevent the formation of concatemers or unwanted ligation products during the subsequent ligation reaction, the third strand 02a is configured to have no phosphate group at its 5′ end.
According to some other embodiment as shown in
It is noted that besides the second primer recognition sequence 600, the fourth strand 02b of the second adaptor 02 can further comprise one or several other functional sequences, such as a second index sequence 910, a second barcode sequence 920, etc., as illustrated in
Specifically, ligation of the second adaptor 02 to the free end of each double-stranded DNA molecule immobilized to the solid support at the immobilized end can be performed using a T4 DNA ligase with an incubation at 16° C. for 1 hour, and the reaction can be performed using other enzymes and under other reaction conditions.
It is noted that because of the lack of a phosphate group in the free end (more specifically the 5′ end) of each double-stranded DNA molecule immobilized to the solid support, only the 3′ end at the free end of each double-stranded DNA molecule is ligated to the 5′ end of the fourth strand 02b, and there is a gap/nick between the 3′ end of the third strand 02a of the second adaptor 02 and the 5′ dephosphorylated end (formed in step S100) on the original single-stranded DNA molecule in each double-stranded DNA molecule (as shown by the arrow in
S500: eluting the DNA library from the solid support.
Herein in step S500, a strand complementary to the barcode-labelled and solid support-immobilized strand of each double-stranded DNA molecule in the DNA library can be eluted from the solid support, and the eluted strand substantially includes the second primer recognition sequence 600 in the second adaptor. In one specific example, step S500 can be performed by incubation at 95° C. for 5 minutes in the presence of an elution buffer (such as TET buffer composed of 10 mM Tris-HCl, 1 mM EDTA, 0.05% Tween-20). Under these conditions, the original single-stranded DNA molecule ligated to the first strand of the first adaptor can also be eluted from the solid support, because a single biotin-streptavidin coupling is unstable at high temperature. However, this DNA strand cannot serve as a PCR template, because the gap at the dephosphorylated 5′ end of the original single-stranded DNA molecule leaves no universal primer recognition sequence at the newly formed 3′ end after the first cycle of PCR amplification.
S600: performing a PCR reaction to thereby amplify each double-stranded DNA molecule.
Herein the PCR reaction can be performed by means of a pair of primers respectively targeting the two end portions of each double-stranded DNA molecule.
According to some preferred embodiments, one of the pair of primers (i.e. Primer 1) can comprise a sequence corresponding to at least a portion of a sequence of the first primer which has been used for the single-stranded extension reaction, and another of the pair of primers (i.e. Primer 2) can comprise a sequence corresponding to at least a portion of a sequence in the fourth strand of the second adaptor. Herein “at least a portion” of a sequence can include part or all of the sequence.
It is noted that there is no limitation regarding the pair of primers used in the PCR reaction as long as each double-stranded DNA molecule corresponding to each of the plurality of single-stranded DNA molecules in the sample can be amplified. Therefore, the one of the pair of primers (i.e. Primer 1) used in S600 can include a 3′ end portion that corresponds to a portion, or all, of the first primer recognition sequence 200, but can possibly include a sequence that does not correspond to the first primer recognition sequence 200 but corresponds to the sequence of the first primer at 5′ end of the first primer recognition sequence 200 (such as a second index sequence 400′ and a second sequencing primer sequence 900b in
In addition, the pair of primers can be engineered as well. For example, Primer 1 can comprise a sequence corresponding to the first primer recognition sequence 200 as mentioned above, but can also include other functional elements, depending on practical needs. Similarly, Primer 2 can comprise a sequence corresponding to the second primer recognition sequence 600 as mentioned above, but can also include other functional elements.
In the embodiment as shown in the
In the embodiment as shown in the
In both of the embodiments as shown in
Furthermore, the presence of other functional sequences would allow the target sequence to be sequenced or used for other purposes. For example, in some embodiments, Primer 1 and Primer 2 respectively include a pair of sequencing primers (e.g. Primer 2 includes a PE Primer I sequence, and Primer 1 includes a PE Primer II sequence), and thus the amplified target sequence can undergo direct sequencing using a current NGS sequencing platform (e.g. Illumina sequencing platform). Similarly, other functional elements such as the second index sequence 400′ can allow for an additional differentiation among different samples, for convenience in a subsequent analysis.
Thus through steps S400-S600 as described above, each double-stranded DNA molecule in the DNA library that corresponds to the barcode-labelled single-stranded nucleic acid molecule can be amplified. As such, there are sufficient copies of each barcode-labelled single-stranded nucleic acid molecule for a subsequent analysis, such as next generation sequencing (NGS) analysis, which can improve the sensitivity.
In addition to the sequencing analysis as described above, the amplified DNA molecules in the DNA library that correspond to the originally single-stranded nucleic acid molecules in the biological sample allow for further nucleic acid assays. Any means of testing for a sequence variant or sequence copy number variant, including without limitation, a point mutation, a deletion, an amplification, a loss of heterozygosity, a rearrangement, a duplication, may be used. Sequence variants may be detected by sequencing, by hybridization assay, by ligation assay, etc. Non-targeted assays may be used, where the location of a sequence variant is unknown. If locations of the relevant sequence variants are defined, specific assays which focus on the identified locations may be used, such as targeted sequencing, point-mutation targeted sequencing analysis (e.g. SAFE-SeqS, Duplex Sequencing, etc.). Any assay that is performed on a test sample involves a transformation, for example, a chemical or physical change or act. Assays and determinations are not performed merely by a perceptual or cognitive process in the body of a person.
The following are further noted. Single-stranded nucleic acid library construction can make assays feasible that would otherwise fail to yield valid sequencing-ready materials. The biological sample can be from any appropriate sources in the patient's body that will have nucleic acids from a cancer or lesion that can be collected and tested. Test samples can be also from any appropriate sources derived from patient tissue, such as FFPE slides, FFPE tissue blocks, and test samples can be also from any appropriate sources derived from other biological specimens, such as fossils, body remains of ancient human species or animal species.
Suitable test samples may be obtained from body tissue, stool, and body fluids, such as blood, tear, saliva, sputum, bronchoalveolar lavage, urine and different organ secreted juices. The samples may be collected using any means conventional in the art, including from surgical samples, from biopsy samples, from endoscopic ultrasound, phlebotomy, etc.
Obtaining the samples may be performed by the same person or a different person that conducts the subsequent analysis. Samples may be stored and/or transferred after collection and before analysis. Samples may be fractionated, treated, purified, enriched, prior to assay. Any of the assay results may be recorded or communicated, as a positive act or step. Communication of an assay result, diagnosis, identification, or prognosis, may be, for example, orally between two people, in writing, whether on paper or digital media, by audio recording, into a medical chart or record, to a second health professional, or to a patient. The results and/or conclusions and/or recommendations based on the results may be in a natural language or in a machine or other code. Typically, such records are kept in a confidential manner to protect the private information of the patient or the project.
Collections of barcoded adaptors, primers, control samples, and reagents can be assembled into a kit for use in the methods. The reagents can be packaged with instructions, or directions to an address or phone number from which to obtain instructions. An electronic storage medium may be included in the kit, whether for instructional purposes or for recordation of results, or as means for controlling assays and data collection.
Control samples can be obtained from the same patient from a tissue that is not apparently diseased, or can be obtained from a healthy individual or a population of apparently healthy individuals. Control samples may be from the same type of tissue or from a different type of tissue than the test sample. Control samples may be provided together with the barcoded adaptors, primers, and reagents in a kit for use in the method, where the control samples may be a standard reference sample for the purpose of validating the performance of the kit and the operation performed by the user.
The data described below document the results for the identification of ultra-rare mutations from a whole exome sequencing study based on one specific embodiment of the method for constructing a nucleic acid library as described above.
The barcoded single-stranded library construction method (as described above) is used to generate a barcoded single-stranded DNA library for NGS studies. The barcode on each individual single-stranded DNA molecule serves as a marker that labels that DNA sequence. An SNV is called only when the sequence variant is identified at the same corresponding site on two complementary DNA strands labeled with different, non-complementary barcodes. Such a barcoded single-stranded library is PCR error-proof and facilitates the identification of ultra-rare mutations (SNVs).
SNVs can be detected with confidence only when the sequencing system's error rate is significantly lower than the frequency of identified SNVs. Therefore, the baseline error rate of an NGS pipeline is critical to its performance in detecting ultra-rare SNVs. To further assess the baseline mutation frequency of this method, an updated normal exome reference database was created for the patient. With the updated reference exome, the error rate for the barcoded single-stranded library based NGS method was calculated to be 2.25×10⁻¹⁰. This error rate is very close to the theoretical error frequency of 2.08×10⁻¹⁰, and the method is sufficiently accurate to identify most ultra-rare mutations.
The ultra-rare mutation detection performance of this method was then evaluated by the success rate of re-detecting the 38 Sanger sequencing validated sequence variants in libraries created from normal DNA samples that were spiked with serial dilutions of tumor DNA. As the dilution factor increased, as expected, fewer and fewer variants were detected (
Barcoded single-stranded library construction can be used as an improved pipeline to perform NGS, particularly targeted NGS. Improved performance in a human genome WES study has been demonstrated. Aside from WES, another very important application of the barcoded single-stranded library is the targeted re-sequencing of a gene panel. Targeted re-sequencing is one of the most popular NGS applications; it allows a small cohort of gene targets to be sequenced to extreme depths, usually thousands-fold coverage. Such sequencing depth can facilitate the detection of ultra-rare mutations with great sensitivity. In a barcoded single-stranded library based WES study, capture of the entire exome of all human genes was attempted, and over 98% coverage at a depth of over 200× was achieved on a standard NGS platform. More importantly, the detection limit of this method for rare mutations on the whole exome scale is as low as 0.03%. For an even smaller cohort of target genes, the depth and coverage of barcoded single-stranded library NGS can be further increased, and the performance of ultra-rare mutation detection can subsequently be improved by several additional orders of magnitude.
Other than identifying ultra-rare SNVs with high sensitivity and accuracy, the barcoded single-stranded library construction method can also be adopted for gene copy number variant (CNV) assays. Barcoded single-stranded library construction links a unique barcode to every single-stranded DNA molecule. Such barcode information can be used not only to label the molecules and create super reads for the purpose of reducing PCR errors, but also as a location marker for DNA fragments. After mapping the super reads back to the human genome, the barcode on each super read can be assigned to the position where the super read sequence is mapped. Therefore, a human genome can be reconstructed from unique barcodes. Copy number information can be represented by the diversity of barcodes at subgenomic loci. More importantly, in this method, unique barcodes are specific to DNA single strands. Such information allows further normalization of the CNV data by taking into consideration that genomic DNA exists as duplex molecules and that the density of unique barcodes for both DNA strands should match. Such a calculation can massively improve the accuracy of CNV calling.
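The barcode-diversity idea in the preceding paragraph can be illustrated with a minimal Python sketch. The record layout `(bin_id, strand, barcode)`, the function names, the comparison against a control sample, and the choice to average the two strands and report their imbalance are all illustrative assumptions; the disclosure does not specify a particular normalization, and a practical pipeline would also need to account for differences in sequencing depth between samples.

```python
from collections import defaultdict

def barcode_diversity(super_reads):
    """Count distinct barcodes per (bin, strand).

    super_reads: iterable of (bin_id, strand, barcode) tuples, where strand
    is '+' or '-'. This layout is a hypothetical choice for illustration.
    """
    seen = defaultdict(set)
    for bin_id, strand, barcode in super_reads:
        seen[(bin_id, strand)].add(barcode)
    return {key: len(barcodes) for key, barcodes in seen.items()}

def copy_ratio(test_counts, control_counts, bin_id):
    """Return (ratio, strand_imbalance) for one genomic bin.

    ratio: mean per-strand barcode diversity in the test sample divided by
    the same quantity in the control sample.
    strand_imbalance: |plus - minus| / (plus + minus) in the test sample;
    values near 0 mean both strands agree, as expected for duplex DNA.
    Returns (None, None) when either sample has no barcodes in the bin.
    """
    t_plus = test_counts.get((bin_id, "+"), 0)
    t_minus = test_counts.get((bin_id, "-"), 0)
    c_plus = control_counts.get((bin_id, "+"), 0)
    c_minus = control_counts.get((bin_id, "-"), 0)
    control_mean = (c_plus + c_minus) / 2
    if control_mean == 0 or (t_plus + t_minus) == 0:
        return None, None
    ratio = ((t_plus + t_minus) / 2) / control_mean
    imbalance = abs(t_plus - t_minus) / (t_plus + t_minus)
    return ratio, imbalance
```

A bin with a high strand imbalance could be flagged for review rather than called, which is one way to use the expectation that both strands of duplex DNA should show matching barcode density.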
Aside from CNV analysis, large structural variants frequently observed in cancer genomes can also be analyzed in this pipeline. NGS sequencing improved by the high sensitivity and deep coverage of this library construction will provide reads covering the breakpoints with higher confidence than a standard pipeline, and targeted capture probes can be designed to specifically enrich subgenomic regions flanking common genome breakpoints. A highly sensitive pipeline for translocation and large indel identification could be built on the barcoded single-stranded library construction pipeline.
In addition to applications in basic research, barcoded single-stranded library construction has great potential in clinical NGS fields. This method can construct NGS DNA libraries highly efficiently from very low amounts of DNA material (≤20 pg), and it can detect ultra-rare mutations with high confidence. Such features are critical for NGS-based clinical diagnostics, where the samples are often limited and highly heterogeneous. A typical example is the NGS sequencing of FFPE samples. FFPE has been a standard sample preparation method for many decades. Historically archived FFPE samples are a very valuable resource for retrospective studies in biomedical research. However, due to chemical modifications during specimen preparation and chronic damage to the tissue blocks or slides over long-term storage, it has been a challenging task to conduct NGS studies with FFPE samples. Poor DNA quality and artificial sequence changes are two major issues in FFPE-based NGS studies. WES data have been reported to be discordant between FFPE and fresh frozen samples at lower coverage levels (~20×); however, this discrepancy can be reduced when higher coverage is achieved (Kerick, Isau et al. 2011). To ensure high coverage in NGS sequencing, a sufficient number of original DNA molecules needs to be incorporated into the library construction, and barcoded single-stranded library construction is a method that meets this need.
This method has great potential to discover novel low-frequency disease-causing variants in biomedical and clinical applications, and can identify more actionable therapeutic targets for patients. This method can support an unprecedented level of personalized precision medicine by revealing the most complete patient genomic profile to date, including high-frequency, low-frequency and particularly ultra-low-frequency mutations. This method can also be applied in other clinical applications, such as circulating DNA sequencing from body fluid samples, where only a limited amount of DNA material is available. In clinical NGS applications, it is critical to construct NGS libraries from very limited amounts of highly heterogeneous samples, so that sampling can be less invasive or non-invasive; to enrich target sequences highly efficiently, thereby reaching a great sequencing depth at limited cost and with improved diagnostic sensitivity; and to remove artificial sequencing errors as completely as possible for the best diagnostic specificity. This method has been demonstrated to meet these needs, with great potential in numerous NGS applications.
Materials and Methods
The paired tumor and normal tissue samples from a pancreatic cancer patient of Asian race were obtained in accordance with guidelines and regulations from Tianjin Medical University Cancer Institute & Hospital, P.R. China after Institutional Review Board (IRB) approval at Tianjin Medical University, and under full compliance with HIPAA guidelines. An informed consent for conducting this study was obtained from the patient. The tumor tissue sample has an estimated neoplastic content of 43.4%.
Library preparation: Genomic DNA from the patient's normal and tumor fresh frozen tissues was extracted using the DNeasy Blood & Tissue Kit (Qiagen) and sheared into 150 bp fragments with Diagenode's Bioruptor using a program of 7 cycles of 30 seconds ON/90 seconds OFF in 0.65 ml Bioruptor® Microtubes. Barcoded single-stranded library preparation starts with complete dissociation of the DNA duplex to form single-stranded DNA, followed by tagging the 3′ end of each DNA single strand individually with a unique digital barcode. Barcoded first adaptors were synthesized with a sequence as described above and illustrated in
Real-time PCR assays with SYBR Green detection were carried out using an ABI PRISM 7500 Sequence Detection System (Applied Biosystems). Briefly, the reaction conditions consisted of 500 ng of genomic DNA or DNA library products, 0.2 μM primers, and SYBR Green Real-Time PCR Master Mix (ThermoFisher Scientific) in a final volume of 20 μl. Each cycle consisted of denaturation at 95° C. for 15 seconds, annealing at 58.5° C. for 5 seconds and extension at 72° C. for 20 seconds. Gene-specific primers were designed using Primer 3 (Untergasser, Cutcutache et al. 2012) and their sequences are provided in
r(A/B) = ΔE^ΔCt, where ΔCt = Ct(sample B) − Ct(sample A)
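As a small worked illustration of the relation above, the following Python sketch computes a relative abundance from two Ct values. It reads the base of the exponent as a per-cycle amplification factor, which is an assumption because the text does not define it; the function name and the default value of 2.0 (perfect doubling per cycle) are likewise illustrative.

```python
def relative_abundance(ct_a, ct_b, efficiency=2.0):
    """Relative abundance of sample A over sample B from real-time PCR Ct values.

    delta_ct = Ct(sample B) - Ct(sample A), as defined in the text.
    efficiency is the assumed per-cycle amplification factor (2.0 means
    perfect doubling); the text does not state a value.
    """
    delta_ct = ct_b - ct_a
    return efficiency ** delta_ct
```

Under these assumptions, a sample A that crosses the threshold three cycles earlier than sample B would be estimated as eight times more abundant.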
Whole exome sequencing was performed on an Illumina HiSeq 2500 platform according to the manufacturer's manual. The total number of on-target reads was calculated from 5 million to 50 million randomly chosen reads. After trimming and barcoded super read grouping, SNVs were called with GATK (version 3.6) in the default mode recommended by the GATK documentation, with Hg19 as the reference genome (McKenna, Hanna et al. 2010). In brief, for every sample (tumor or normal DNA), the sequencing result was preprocessed by mapping to the reference genome with BWA (version 0.7.10), and duplicates were marked with Picard (version 2.0.1). Base Recalibration was performed to generate the reads ready for SNV analysis. For individually processed T/N pair reads, Indel Realignment was performed to generate pairwise-processed T/N pair reads. HaplotypeCaller was used for raw SNV calling. Output from variant calling was directly used for SNV detection by MuTect (version 1) (Cibulskis, Lawrence et al. 2013). Mutations were filtered through a 4-step approach introduced in the section “Mutation and ultra-rare mutation detection”. Low-quality variants with a Phred score <30.0 were discarded. Paired SNVs from complementary reads bearing different barcodes were identified as true mutations and subjected to further validation through Sanger sequencing. The data yields after each step of data analysis for a barcoded single-stranded library NGS study were shown in
Mutation and Ultra-Rare Mutation Detection
The significantly increased number of unique reads obtained through barcoded single-stranded library approach enabled us to apply our stringent filters with the following 4-step procedure.
Step 1) group reads with the same barcode that are representing PCR duplicates of an original barcoded single-stranded DNA molecule, and call it a unique read family (URF);
Step 2) combine reads within each URF obtained from Step 1) by requiring >95% sequence identity among the reads;
Step 3) extract the unique DNA sequence and the barcode sequence for each URF, and call it a “super read”;
Step 4) for all the super reads identified in Step 3), find their paired complementary super reads, and only score sequence variants with matched complementary sequences from paired super reads. To accommodate damaged DNA molecules in the sample, complementary super reads may not be at the same length (
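The 4-step procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: reads are represented as hypothetical `(barcode, sequence)` pairs, identity is measured against the most common sequence in each family, the consensus is a per-position majority vote, and complementary super reads are paired by exact containment of one sequence in the reverse complement of the other (a real pipeline would compare aligned reads at the same reference positions).

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of matching positions; 0.0 if lengths differ (simplification)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_super_reads(reads, min_identity=0.95):
    """Steps 1-3: group by barcode, combine similar reads, emit super reads.

    reads: iterable of (barcode, sequence) pairs (hypothetical layout).
    Returns {barcode: consensus_sequence}.
    """
    families = defaultdict(list)              # Step 1: one URF per barcode
    for barcode, seq in reads:
        families[barcode].append(seq)

    super_reads = {}
    for barcode, seqs in families.items():
        anchor = Counter(seqs).most_common(1)[0][0]
        # Step 2: keep reads with > min_identity to the most common sequence
        kept = [s for s in seqs if identity(s, anchor) > min_identity]
        # Step 3: per-position majority vote gives the super read sequence
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*kept)
        )
        super_reads[barcode] = consensus
    return super_reads

def pair_super_reads(super_reads):
    """Step 4: pair super reads that come from complementary strands.

    Two super reads pair when their barcodes differ and are not reverse
    complements of each other, and one sequence is contained in the reverse
    complement of the other, which tolerates strands of unequal length.
    """
    pairs = []
    items = sorted(super_reads.items())
    for i, (bc_a, seq_a) in enumerate(items):
        rc_a = reverse_complement(seq_a)
        for bc_b, seq_b in items[i + 1:]:
            if reverse_complement(bc_a) == bc_b:
                continue
            if seq_b in rc_a or rc_a in seq_b:
                pairs.append((bc_a, bc_b))
    return pairs
```

Because pairing here requires an exact match, a change present on only one strand prevents the pair from forming, so only variants supported by both strands would be scored.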
To evaluate the performance of the barcoded single-stranded library in detecting low-frequency (ultra-rare) mutations, a 100 ng tumor DNA sample was serially diluted 10-, 100-, 1,000- and 10,000-fold, and each dilution was spiked into the same amount (100 ng) of genomic DNA extracted from the paired normal tissue of the aforementioned cancer patient. This design simulates the early stages of cancer occurrence. A major obstacle in early cancer diagnostics using NGS is the very low allelic fraction of tumor-specific mutations in the sample.
Building a highly accurate reference exome for ultra-rare mutation identification: To accurately assess the baseline mutation frequency of the barcoded single-stranded library pipeline, six replicate standard NGS DNA libraries were constructed in parallel, each using 100 ng of normal DNA input. These six replicate exome datasets were used to rebuild a reference exome database for this particular patient: if the same SNV was observed in ≥5 out of 6 independent datasets, the SNV was considered a germline variant and the reference exome sequence database was updated accordingly. For a standard NGS pipeline, the error rate is 1%, and the chance of seeing exactly the same random error at a fixed position 5 times is (⅓ × 1%)⁵ = 4.12×10⁻¹³. This number means that if this approach is used to sequence the whole human genome once, there is presumably going to be only one artificial error, because 3×10¹² human genome bases × (4.12×10⁻¹³) = 1.24. However, only the human exome, which occupies just 1.5% of the human genome, is enriched and sequenced; therefore the chance of seeing a single artificial error within the entire human exome is only 1.86% (= 1.5% × 1.24). An updated, highly accurate normal exome reference database of the patient was built accordingly.
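The arithmetic in the preceding paragraph can be reproduced with a few lines of Python. All input values are taken as stated in the text; the variable names are illustrative. Small differences in the last digit relative to the figures quoted above come from rounding of intermediate values.

```python
per_base_error = 0.01                      # stated error rate of a standard NGS pipeline
specific_wrong_base = per_base_error / 3   # one particular wrong base out of three
replicates_required = 5                    # same error seen in 5 datasets

# Chance of seeing the same random error at a fixed position 5 times
p_same_error = specific_wrong_base ** replicates_required

genome_bases = 3e12                        # number of bases, as stated in the text
exome_fraction = 0.015                     # exome share of the genome, as stated

expected_genome_errors = genome_bases * p_same_error
expected_exome_errors = exome_fraction * expected_genome_errors

print(f"p(same error x5)       = {p_same_error:.3g}")
print(f"expected genome errors = {expected_genome_errors:.3g}")
print(f"expected exome errors  = {expected_exome_errors:.2%}")
```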
Barcoded Single-Stranded Library Construction Creates Errorproof Libraries with Ultra-Low Quality and Quantity of DNA
The library is prepared by a barcoded single-strand library construction method. To assess the performance of this method in creating valid NGS libraries from limited amounts of DNA, 6 barcoded single-stranded libraries were constructed from serially diluted genomic DNA (500 ng, 20 ng, 1 ng, 100 pg, 20 pg and 10 pg) extracted from the normal pancreas tissue of a cancer patient. The first step of library construction is to ligate barcoded first adaptors to single-stranded DNA molecules, and this step is critical, since it provides the initial pool of DNA molecules for all downstream procedures. The average ligation efficiencies for this step measured for the 6 libraries were 32.3%, 46.5%, 52.1%, 40.3%, 35.1% and 30.5% (
298 human cancer related genes located on chromosome 1 through 22 and chromosome X (
Our results demonstrate that the barcoded single-stranded library construction method is able to create a DNA library from a very low amount of DNA material (10 to 20 pg) and generate NGS-feasible library products (>1 μg) with high breadth of coverage. The library has no obvious GC content bias, and library molecules are evenly amplified to represent the genome sequence abundance of the original input DNA. These results also indicate that it becomes less efficient to amplify certain subgenomic regions when the DNA input amount is extremely limited, i.e. around or lower than 20 pg. To construct DNA libraries with an extremely low amount of DNA, a whole genome pre-amplification may be necessary. However, such a procedure may generate artificial errors before the initial barcoding step in library construction, and can hinder rare mutation detection. Therefore, no further tests were performed with lower amounts of DNA material for library construction, and the minimal input limit for a successful library construction was noted as 20 pg DNA. This amount (20 pg) corresponds to the total DNA material from about 3 human somatic cells. The vast majority of biological samples will offer more than this level of DNA material, and our library construction method has demonstrated excellent performance in creating NGS libraries with this low amount of DNA.
Whole Exome Sequencing
To evaluate the performance of barcoded single-stranded library construction in NGS, WES assays were performed using this method, and the data were compared with those obtained through standard NGS library preparation with a standard exome enrichment procedure. All libraries were constructed with 100 ng of genomic DNA derived from the normal tissue of the cancer patient, and 3 technical replicates were performed for each sample. All NGS runs were carried out on the same Illumina HiSeq 2500 platform with the same technical specifications. As shown in
All NGS data were analyzed on the same software pipeline with the same settings. Raw reads were filtered to remove duplicates, multiple mappers, improper pairs, and off-target reads. On average, 75.4% of reads were retained after filtering (
Next, the correlation between coverage efficiency and sequencing depth in barcoded single-stranded library was evaluated. Filtered reads were randomly selected in 5 million read increments from 5 million to 50 million. The fractions of the retained on-target reads covering the depths of at least 10×, 20×, 50×, and 100× were plotted using randomly selected 5 to 50 million reads (
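The coverage-versus-depth evaluation described above can be sketched in Python as follows. This is an illustrative outline only: reads are represented as hypothetical half-open `(start, end)` intervals on a single target region, and the function names are not taken from the study's pipeline.

```python
import random

def subsample(reads, n, seed=0):
    """Randomly choose n reads without replacement (seeded for repeatability)."""
    return random.Random(seed).sample(reads, n)

def depth_per_position(reads, target_length):
    """Per-base depth over a target of the given length.

    reads: iterable of (start, end) half-open intervals in target coordinates.
    """
    depths = [0] * target_length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, target_length)):
            depths[pos] += 1
    return depths

def fraction_covered(depths, thresholds=(10, 20, 50, 100)):
    """Fraction of target positions with depth >= each threshold."""
    total = len(depths)
    return {t: sum(d >= t for d in depths) / total for t in thresholds}
```

Repeating `subsample` at increasing read counts and plotting the output of `fraction_covered` for each would give curves of the kind described in the text.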
To assess the impact of GC content on barcoded single-stranded library WES result, normalized mean read depth was plotted against GC content. There is a correlation between GC content and read depth in the barcoded single-stranded library WES experiment (
Detection of SNVs
One of the most important goals of exome sequencing is to identify sequence variants that are disease-causing or of clinical significance. To evaluate the sensitivity and specificity of sequence variant identification by barcoded single-stranded library construction, a WES study was conducted with 100 ng of genomic DNA from a pair of normal and tumor tissue samples obtained from the same cancer patient. The same SNV calling pipeline was used for all data analysis in this study. Briefly, the normal DNA libraries created by the barcoded single-stranded library construction method were sequenced and the data were analyzed using a standard data analysis pipeline, where the single-stranded barcodes were directly trimmed off, and 78,721 SNVs were detected from the exonic sequences of the normal DNA sample at a read count of 30 million (error frequency 2.6×10⁻³,
The accuracy of mutations identified by barcoded single-stranded library based mutation calling was then examined. Following the 4-step data analysis procedure introduced in Materials and Methods, super reads were generated after Step 3). Steps 1-3 helped to reduce the mutation frequency by over 2 orders of magnitude, from 2.6×10⁻³ down to 2.5×10⁻⁵, by removing most PCR-related errors (
To determine the accuracy of variant detection by barcoded single-stranded library construction for clinically relevant mutations, the WES data generated from the normal and tumor tissue pair were analyzed side by side. For all assessed heterozygous exonic positions, the result was filtered through the 4-step procedure. The filtered result showed that the barcoded single-stranded library based WES study identified 97 sequence variants that were exclusively detected in the tumor tissue DNA sample, at different fractions, with 100× coverage. Forty moderate- to high-abundance (>5%) variants were subjected to Sanger sequencing validation, and 38 were confirmed (
A Protocol for Barcoded Single-Stranded Library Preparation
Fragmentation of Genomic DNA into 250 bp by BioRuptor
Turn on BioRuptor and water bath (set to 3° C.) at least 45 minutes before starting.
Place up to 1 μg of DNA adjusted to 57 μl with 1×TE buffer in a BioRuptor microtube.
Shear with below setting for a target size range of 175 bp:
Remove the large genomic DNA fragments through binding with 0.6×AMPure beads.
Transfer the supernatant into a new tube and then purify with 0.8×AMPure beads. Elute into 30 μl 1×TE buffer.
Heat denaturation and first adaptor ligation.
Bring the DNA to a volume of 33 μl with ddH2O in a lo-bind tube.
Add 8 μl CircLigase II 10× reaction buffer.
Add 4 μl 50 mM MnCl2.
Add 1 μl (1 U) FastAP.
Incubate at 37° C. for 10 minutes then 95° C. for 2 minutes in Eppendorf thermomixer (thermal cycler with a heated lid in paper)
Place reaction tube into an ice-water bath.
Add 32 μl 50% PEG-8000
Add 1 μl 10 μM of the first adaptor as illustrated in
Vortex Intensely to Mix
Add 1 μl CircLigase II (Epicentre)
Vortex Intensely to Mix
Incubate at 60° C. for 3 hours in a thermal cycler then hold at 4° C.
Add 2 μl stop solution (98 μl 0.5 M EDTA (pH 8.0), 2 μl Tween-20)
Freeze Overnight
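As a quick consistency check on the ligation step above, the following Python sketch sums the listed additions and derives the final concentration of each component with a stated stock concentration. It assumes that all additions from the 33 μl DNA through the CircLigase II enzyme go into a single reaction and that the stop solution is excluded from the reaction volume; the dictionary layout is illustrative.

```python
# (volume in ul, stock concentration, unit) for each addition listed in the protocol;
# None marks components whose stock concentration is not given as a simple number.
components = {
    "DNA in ddH2O":         (33.0, None, None),
    "CircLigase II buffer": (8.0, 10.0, "x"),
    "MnCl2":                (4.0, 50.0, "mM"),
    "FastAP":               (1.0, None, None),
    "PEG-8000":             (32.0, 50.0, "%"),
    "first adaptor":        (1.0, 10.0, "uM"),
    "CircLigase II":        (1.0, None, None),
}

total_volume = sum(volume for volume, _, _ in components.values())

# Final concentration = stock concentration x (component volume / total volume)
final = {
    name: (stock * volume / total_volume, unit)
    for name, (volume, stock, unit) in components.items()
    if stock is not None
}

print(f"total reaction volume: {total_volume} ul")
for name, (conc, unit) in final.items():
    print(f"{name}: {conc} {unit}")
```

Under these assumptions the 10× buffer ends up at 1× in the assembled reaction, which is consistent with the volumes given.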
Immobilization of ligation products on streptavidin beads
Wash 20 μl of MyOne C1 beads twice with 500 μl bead-binding buffer (1 M NaCl, 10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.05% Tween-20, 0.5% SDS).
Re-suspend the beads in 250 μl bead-binding buffer and transfer to a 1.5 ml siliconized tube (Sigma-Aldrich).
Thaw reaction mix.
Incubate reaction mix at 95° C. for 2 minutes
Chill reaction mix in ice water bath.
Add reaction mix to beads and pipette up and down 10 times.
Rotate tube at room temp for 20 minutes.
Remove supernatant.
Wash beads once with 200 μl of wash buffer A (100 mM NaCl, 10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.05% Tween-20, 0.5% SDS) and once with 200 μl of wash buffer B (100 mM NaCl, 10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.05% Tween-20).
Primer Annealing and Extension
Remove supernatant.
Re-suspend beads in 47 μl reaction mixture:
Incubate at 65° C. for 2 min.
Immediately chill in ice-water bath
Transfer to thermocycler pre-cooled to 15° C.
While in thermocycler add 3 μl (24 U) Bst 3.0 DNA polymerase (New England Biolabs)
Incubate reaction at 15° C. for 5 minutes, then slowly increase the reaction temperature to 37° C. at a rate of no more than 1° C. per minute, and then hold the reaction at 37° C. for 3 minutes.
Mix gently every five minutes to keep beads in suspension.
Discard supernatant.
Wash beads with wash buffer A.
Re-suspend beads in 200 μl stringency wash buffer (0.1×SSC buffer (Sigma-Aldrich), 0.1% SDS).
Incubate at 45° C. for 3 min in thermal mixer.
Wash beads with 200 μl wash buffer B.
Removal of 3′-overhangs
Re-suspend beads in 99 μl of a reaction mix containing:
Add 1 μl (5 U) T4 DNA polymerase (Fermentas).
Incubate for 15 min at 25° C. in a thermal cycler.
Gently mix every five minutes to keep beads suspended.
Add 10 μl of EDTA (0.5M) to reaction mixture and vortex.
Wash beads with wash buffer A, stringency wash buffer with 45° C. incubation for 3 mins and then wash buffer B as described above.
Prepare Double-Stranded Adaptor for Ligation
A 100 μM solution of double-stranded DNA adaptor was generated by hybridizing two oligonucleotides (double-stranded adaptor oligo 1 and double-stranded adaptor oligo 2, sequence shown below) as follows: In a PCR reaction tube, 20 μl 500 μM DEEPER DS adaptor oligo 1, 20 μl 500 μM DEEPER DS adaptor oligo 2, 9.5 μl TE buffer and 0.5 μl 5 M NaCl were combined.
This mixture was incubated for 10 seconds at 95° C. in a thermal cycler and cooled to 14° C. at a speed of 0.1° C./s. Final concentration of 100 μM was reached by dilution with 50 μl TE.
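The adaptor concentrations in this step follow from the usual C1·V1 = C2·V2 dilution relation, as the short Python sketch below shows. The helper name `diluted` is illustrative, and the sketch assumes the annealing mix is exactly the four listed volumes and that the two oligonucleotides hybridize one-to-one.

```python
def diluted(stock_conc, stock_vol, final_vol):
    """Solve C1 * V1 = C2 * V2 for C2."""
    return stock_conc * stock_vol / final_vol

annealing_volume = 20 + 20 + 9.5 + 0.5                      # ul: oligo 1 + oligo 2 + TE + NaCl
oligo_in_annealing = diluted(500, 20, annealing_volume)     # uM of each oligo
nacl_in_annealing = diluted(5000, 0.5, annealing_volume)    # mM NaCl (5 M stock)

final_volume = annealing_volume + 50                        # after adding 50 ul TE
adaptor_final = diluted(oligo_in_annealing, annealing_volume, final_volume)  # uM

print(oligo_in_annealing, nacl_in_annealing, adaptor_final)
```

With these volumes each oligonucleotide is at 200 μM during annealing, and the 50 μl TE dilution brings the duplex to the stated 100 μM.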
Blunt-End Ligation of Second Adaptor and Library Elution
Re-suspend beads in 98 μl of a reaction mix containing:
Mix thoroughly and add 2 μl (10 U) T4 DNA ligase (Fermentas).
Incubate for 1 hour at 25° C. in a thermal mixer.
Gently mix every twenty minutes to keep beads suspended.
Wash beads with 0.1×BWT+SDS (wash buffer A), stringency wash and 0.1×BWT (wash buffer B) as described above.
Re-suspend beads in 25 μl elution buffer (10 mM Tris-HCl pH 8.0, 0.05% Tween-20) and transfer to single-cap PCR tubes.
Incubate for 5 min at 95° C. in a thermal cycler with heated lid.
Collect supernatant in fresh tube.
Library Amplification
Take 1 μl ligated DNA for test PCR reaction:
Prepare a master mix by multiplying the amount in column “per reaction” by the number of reactions plus one. Add in order the following:
Mix well.
Add 1 μl DNA.
Mix well.
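The master mix rule stated above (per-reaction amount multiplied by the number of reactions plus one) can be written as a small Python helper. The component names and per-reaction volumes in the example are hypothetical placeholders, since the per-reaction table is not reproduced here; only the scaling rule comes from the protocol.

```python
def master_mix(per_reaction_ul, n_reactions):
    """Scale per-reaction volumes by (n_reactions + 1), as the protocol instructs.

    per_reaction_ul: dict mapping component name to volume in ul per reaction.
    The extra reaction provides overage for pipetting losses.
    """
    factor = n_reactions + 1
    return {name: volume * factor for name, volume in per_reaction_ul.items()}

# Hypothetical placeholder volumes for illustration only (not from the protocol).
example_per_reaction = {"component_1": 25.0, "component_2": 2.5, "water": 21.5}

for name, volume in master_mix(example_per_reaction, n_reactions=4).items():
    print(f"{name}: {volume} ul")
```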
Amplification Conditions:
1 minute at 98° C.
10 to 14 cycles of:
5 minutes at 72° C.
Hold at 4° C.
PCR primer sequences:
The PCR is performed in two wells for each sample, 50 μl each. Purify the amplified PCR product using AMPure beads at a 1:1 ratio (beads:sample) and elute in 30 μl 1×TE buffer.
Use Qubit to quantify the yield, which is typically about 150 ng/μl.
The present application claims benefit of U.S. Provisional Application No. 62/482,189, filed on Apr. 6, 2017, and is a continuation of, and claims priority of, international application No. PCT/US2018/016778, filed on Feb. 4, 2018, the disclosures of which are hereby incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
62482189 | Apr 2017 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/US2018/016778 | Feb 2018 | US |
Child | 15908190 | US |