The disclosure relates to a method for constructing a PacBio sequencing library. In particular, the disclosure relates to a method for rapidly constructing a PacBio sequencing library by utilizing the properties of double-stranded DNA and a thermostable RNA ligase at high temperature.
Next-generation sequencing technologies, which have matured in recent years, are widely used in clinical research owing to their high throughput, high accuracy, high sensitivity, high degree of automation and low operating costs. With the rapid development of sequencing technologies, third-generation sequencing technologies have also emerged, including SMRT technology from Pacific Biosciences (hereafter referred to as PacBio) [1], nanopore single-molecule technology from Oxford Nanopore Technologies [2] and Heliscope technology from Helicos [3]. Compared with the previous two generations of sequencing technologies, their most important feature is single-molecule long-fragment sequencing: SMRT technology and Heliscope technology use fluorescent signals for sequencing, while nanopore single-molecule sequencing technology uses the electrical signals generated by different bases. Because third-generation sequencing does not require PCR amplification, the sequencing reaction is fast and the GC bias is low. However, single-base accuracy is lower. PacBio sequencing libraries have a dumbbell-shaped structure, so the sequencing DNA polymerase can replicate the target fragment of the library in multiple rounds, and the results of the multiple rounds of sequencing can be used to calibrate one another. Thus, the accuracy of PacBio sequencing after calibration is high, and the accuracy for a 10 Kb target fragment can reach 99.99%.
The construction of a dumbbell-shaped PacBio sequencing library generally includes the following steps: (1) obtaining a target double-stranded DNA; (2) repairing and filling ends of the DNA; (3) ligating with PacBio linkers; (4) purifying the DNA; (5) repairing the DNA; (6) removing unligated linkers and the target DNA by exonuclease digestion; (7) removing linkers by two-step purification; (8) adding sequencing primers to anneal and DNA polymerase to form PacBio sequencing libraries. Depending on the characteristics of the target double-stranded DNA, the step (5) may be omitted. Traditional PacBio sequencing library construction is tedious, time-consuming and inefficient.
In view of the above, the disclosure provides a method for constructing a PacBio sequencing library. Specifically, the method of the disclosure for constructing a PacBio sequencing library comprises four steps: obtaining a target double-stranded DNA; respectively connecting the two ends of the double-stranded DNA to form a closed loop; purifying the DNA; and binding the sequencing primers and adding a DNA polymerase. Preferably, the method consists of these four steps. Thermostable RNA ligases, which are able to ligate single-stranded DNA (ssDNA), include Thermus bacteriophage RNA ligases [4, 5], archaebacterium RNA ligases such as Methanobacterium thermoautotrophicum RNA ligase 1 [6] and the like. At high temperature, the ends of double-stranded DNA open transiently by breathing to form single strands [7], and a thermostable RNA ligase can join the 5′ phosphate and the 3′ hydroxyl of the two single strands at each end into a loop, forming a dumbbell-shaped DNA library structure. In combination with specific sequencing primers and sequencing DNA polymerases, this library can be applied to the PacBio sequencing platform for sequencing (
Thus, the purpose of the disclosure is to solve the problem of complicated and inefficient construction of PacBio sequencing library at the current stage. After obtaining the target double-stranded DNA, the two ends of the DNA are respectively connected into a loop by a thermostable RNA ligase, and the dumbbell-shaped DNA library can be quickly obtained after purification. PacBio sequencing libraries are formed by binding sequencing primers complementary to terminal circular DNA, and binding with sequencing DNA polymerase.
Thus, in a first aspect, the present application provides a method of constructing a PacBio sequencing library, comprising the following steps: (1) obtaining a target double-stranded DNA, and optionally further purifying said target double-stranded DNA; (2) adding a thermostable RNA ligase to respectively connect two ends of said double-stranded DNA to form a closed loop to obtain a dumbbell-shaped DNA library; (3) purifying said dumbbell-shaped DNA library; and (4) binding with a sequencing primer and adding a DNA polymerase to obtain a PacBio sequencing library.
In one embodiment, the steps and reaction conditions for the specific construction of a PacBio sequencing library may vary and can be adjusted by those skilled in the art as needed. If the reaction system for obtaining the target double-stranded DNA in step (1) affects the reaction efficiency of the thermostable RNA ligase, it is necessary to add a step of purifying said double-stranded DNA after step (1). The purification method can be a magnetic bead-based or a silica membrane column-based method, and the like.
At high temperature, the thermostable RNA ligase connects each of the two ends of the DNA into a loop with high efficiency, and high-purity dumbbell-shaped DNA can be obtained directly after purification. In one embodiment, if the sequence of the target double-stranded DNA in step (1) causes the thermostable RNA ligase to be inefficient in respectively connecting the two ends of the double-stranded DNA to form a closed loop, affecting the subsequent sequencing steps, then an additional exonuclease treatment is necessary after step (2) to remove the non-dumbbell DNA.
According to an embodiment, said target double-stranded DNA is obtained by a PCR amplification, a multiplex PCR amplification, or a CRISPR/Cas9 cleavage.
In one embodiment, the double-stranded DNA is an HBB gene. In this embodiment, the primer sequences for PCR amplification are shown in SEQ ID NO: 1 and 2.
According to an embodiment, the sequences at both ends of said target double-stranded DNA are the same or different.
According to a preferred embodiment, the ends of said target double-stranded DNA are blunt ends and/or sticky ends.
According to a preferred embodiment, the 5′ base at the end of the target double-stranded DNA has a phosphate group, and the 3′ base at the end of the target double-stranded DNA has a hydroxyl group. If the 5′ base at the end of said target double-stranded DNA does not have a phosphate group, the 5′ end of the target double-stranded DNA can be phosphorylated by a kinase such as T4 polynucleotide kinase.
According to the present application, the two ends of the target double-stranded DNA are respectively connected to form a closed loop with the thermostable RNA ligase, thereby forming a dumbbell-shaped DNA library. Specifically, the thermostable RNA ligase can be a commercial product (e.g., Lucigen's CircLigase II ssDNA Ligase, Cat # CL9021K) or a purified protein, i.e., selected from Thermus bacteriophage RNA ligase, an archaebacterium RNA ligase such as Methanobacterium thermoautotrophicum RNA ligase 1 and the like. The conditions and methods for respectively connecting the two ends of the target double-stranded DNA to form a closed loop can be adjusted by those skilled in the art according to actual needs. Said thermostable RNA ligase is incubated at a temperature at which said thermostable RNA ligase remains active, for a time sufficient to respectively connect the two ends of said double-stranded DNA to form a closed loop. For example, the target double-stranded DNA may be incubated at 40-70° C., a range suitable for thermostable RNA ligase activity, for 30 minutes to 16 hours, so that the reaction of connecting the two ends to form a closed loop proceeds to completion.
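The incubation window given above can be written as a simple range check. The following Python sketch is illustrative only: the function name and the idea of validating a planned reaction in code are assumptions made here, not part of the disclosed method.

```python
# Hypothetical helper: checks a planned ligation step against the ranges
# stated in the text (40-70 °C, 30 minutes to 16 hours).
MIN_TEMP_C, MAX_TEMP_C = 40.0, 70.0
MIN_MINUTES, MAX_MINUTES = 30.0, 16 * 60.0


def ligation_conditions_in_range(temp_c: float, minutes: float) -> bool:
    """Return True if both temperature and duration fall in the stated ranges."""
    return (MIN_TEMP_C <= temp_c <= MAX_TEMP_C
            and MIN_MINUTES <= minutes <= MAX_MINUTES)


# A reaction at 60 °C for 1 hour lies inside both ranges.
print(ligation_conditions_in_range(60, 60))
# A reaction at 37 °C lies below the stated temperature range.
print(ligation_conditions_in_range(37, 60))
```

The check treats both range limits as inclusive, which is one reasonable reading of "40-70° C." and "30 minutes to 16 hours".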
According to a preferred embodiment, said thermostable RNA ligase is a pre-adenylated thermostable RNA ligase.
The purpose of the purification in step (3) is primarily to remove the enzymes required for the reactions in steps (1) and (2) and the buffer components. In one embodiment, the purification can be performed by a magnetic bead-based or a silica membrane column-based method, and the like.
According to a preferred embodiment, the circular DNA sequences at the two ends of said dumbbell-shaped DNA library are the same or different. If the circular DNA sequences at the two ends are different, the corresponding sequencing primer can be designed according to the DNA sequence of either one of the two ends.
According to a preferred embodiment, said target double-stranded DNA has or does not have a Barcode, which can be decided by a person skilled in the art as necessary.
According to a preferred embodiment, the length of said sequencing primer, which is reverse complementary to the sequence at one end of said dumbbell-shaped DNA library, is 6-40 nt. Preferably, the sequence of said sequencing primer is shown in SEQ ID NO: 3.
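The relationship between a loop-end sequence and its sequencing primer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the 14-nt example sequence are hypothetical, and the sequence of SEQ ID NO: 3 is not reproduced here.

```python
# Translation table for complementing DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    return seq.translate(_COMPLEMENT)[::-1]


def sequencing_primer_for(loop_region: str) -> str:
    """Return a primer reverse complementary to loop_region (6-40 nt)."""
    if not 6 <= len(loop_region) <= 40:
        raise ValueError("primer length must be 6-40 nt")
    return reverse_complement(loop_region)


# Hypothetical 14-nt loop region, for illustration only.
print(sequencing_primer_for("GATTACAGATTACA"))  # prints TGTAATCTGTAATC
```

Which stretch of the loop to target is a design decision left to the skilled person; the sketch only enforces the 6-40 nt length range stated above.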
The method described in the present application is characterized in that the thermostable RNA ligase respectively connects the two ends of the double-stranded DNA to form a closed loop in the range of 40-70° C., which facilitates the rapid construction of a PacBio sequencing library.
A second aspect of the present application provides a kit for constructing a PacBio sequencing library by the method according to the first aspect of the present application.
According to a preferred embodiment, said kit comprises (a) one or more reagents selected from the group consisting of an amplification primer for the target double-stranded DNA or a CRISPR/Cas9 reagent, a thermostable RNA ligase, a sequencing primer, and a DNA polymerase; and (b) instructions for use.
The superior technical effect of the method described in the present application lies mainly in the following aspects:
(1) Simple and rapid experimental procedure. After obtaining the target double-stranded DNA, it is only necessary to use the thermostable RNA ligase to respectively connect the two ends of the double-stranded DNA to form a closed loop and then the dumbbell-shaped DNA library structure can be obtained.
(2) High reaction efficiency. Under the high temperature condition, the thermostable RNA ligase has a high efficiency for connecting the two ends of the DNA to form a closed loop, so the step of exonuclease digestion to remove the un-looped DNA can be omitted and the high-purity dumbbell-shaped DNA can be directly obtained after purification.
(3) High flexibility of target double-stranded DNA ends and the sequencing primer. At high temperature, the ends of the target double-stranded DNA partially melt owing to DNA breathing. The 5′ phosphate group and the 3′ hydroxyl group at each of the two ends of the double-stranded DNA are joined by the thermostable RNA ligase to form a closed loop structure at each end, and a reverse complementary sequencing primer can be designed accordingly. Taking PCR as an example, if only a single target region is detected, a sequencing primer that is reverse complementary to the end of the target region can be designed; if multiple target regions are detected simultaneously using multiplex PCR, the same sequence can be added to the end of each PCR primer to facilitate the design of reverse complementary sequencing primers.
The disclosure will be described in detail below with reference to examples. Those skilled in the art should understand that the examples of the disclosure are only for the purpose of illustration, and do not constitute any limitation to the disclosure.
200 μL of human peripheral blood was collected in an EDTA anticoagulant tube. The reaction system was prepared according to the following table (wherein the 16 underlined bases are the Barcode sequence bc1001 provided by PacBio. If there are multiple samples, a different Barcode can be used for each sample).
GCACTCTGATATGTGGAGGGAGGGCTGAGG
GCACTCTGATATGTGGGGTGGGCCTATGACA
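The two primer sequences above share their leading 16 bases, which is consistent with the 16-base Barcode described in the text. The Python sketch below separates each primer into a barcode and a target-specific part. It assumes that the Barcode occupies the 5′ end of each primer (the underlining is not preserved here), and the function name is hypothetical.

```python
BARCODE_LENGTH = 16  # the text states that the Barcode is 16 bases long

# The two primer sequences listed above.
primers = [
    "GCACTCTGATATGTGGAGGGAGGGCTGAGG",
    "GCACTCTGATATGTGGGGTGGGCCTATGACA",
]


def split_barcode(primer: str, barcode_length: int = BARCODE_LENGTH):
    """Split a primer into (barcode, target-specific part).

    Assumption: the barcode is the leading `barcode_length` bases.
    """
    if len(primer) <= barcode_length:
        raise ValueError("primer is not longer than the barcode")
    return primer[:barcode_length], primer[barcode_length:]


for primer in primers:
    barcode, specific = split_barcode(primer)
    print(barcode, specific)
```

For multiple samples, the same split could be used to confirm that each sample's primer pair carries its own distinct barcode.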
On the PCR instrument, the amplification was performed under the following conditions:
After amplification was completed, Qubit dsDNA BR reagent (ThermoFisher, Cat # Q32850) was used to determine the DNA concentration on a Qubit 3 Fluorometer (ThermoFisher, Cat # Q33216), and ddH2O was used to dilute the amplification product to 100 ng/μl. The PCR amplification product was verified on a DNA agarose gel (
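The dilution to 100 ng/μl follows the usual C1·V1 = C2·V2 relationship. The Python sketch below computes how much ddH2O to add. The stock concentration and volume in the example are hypothetical, since the measured concentration of the amplification product is not stated in the text.

```python
def water_to_add_ul(stock_ng_per_ul: float,
                    target_ng_per_ul: float,
                    stock_volume_ul: float) -> float:
    """Volume of water (μl) needed to dilute a stock to the target concentration.

    Uses C1 * V1 = C2 * V2, so V2 = C1 * V1 / C2 and water = V2 - V1.
    """
    if target_ng_per_ul <= 0 or stock_volume_ul <= 0:
        raise ValueError("target concentration and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


# Hypothetical example: 10 μl of product measured at 250 ng/μl.
print(water_to_add_ul(250.0, 100.0, 10.0))  # prints 15.0
```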
The reaction system was prepared as indicated in the following table.
On a PCR instrument, the reaction system was reacted at 60° C. for 1 hour.
After step 2 was completed, 0.6× Ampure PB magnetic beads (PacBio, Cat #100-265-900) were used to purify the product twice according to the manufacturer's instructions, and finally 10 μl Elution Buffer was used for DNA elution. The eluate obtained is the dumbbell-shaped DNA library of the target DNA. The DNA concentration determined on a Qubit 3 Fluorometer (ThermoFisher, Cat # Q33216) using Qubit dsDNA HS reagent (ThermoFisher, Cat # Q32851) was 43.4 ng/μl.
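Two small calculations are implicit in this step: the bead volume for a given bead-to-sample ratio, and the total DNA recovered. The Python sketch below shows both. The 50 μl sample volume is a hypothetical value, because the reaction volume is not stated in the text; the concentration and elution volume are the values given above.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Bead volume (μl) for a bead-to-sample ratio such as 0.6x."""
    return ratio * sample_volume_ul


def total_dna_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass (ng) in an eluate."""
    return concentration_ng_per_ul * volume_ul


# Hypothetical 50 μl sample purified with 0.6x beads.
print(round(bead_volume_ul(50.0, 0.6), 1))  # prints 30.0

# Values from the text: 43.4 ng/μl in 10 μl of Elution Buffer.
print(round(total_dna_ng(43.4, 10.0), 1))  # prints 434.0
```

The recovered mass of about 434 ng assumes that the full 10 μl elution volume is recovered.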
The reaction system was prepared as indicated in the following table.
On the PCR instrument, the amplification was performed under the following conditions:
After the reaction was completed, 1.5× Ampure PB magnetic beads (PacBio, Cat #100-265-900) were used to purify the product twice according to the manufacturer's instructions, and the DNA was finally eluted with 10 μl Elution Buffer.
The reaction system was prepared according to the following table, in which the reagents were obtained from Sequel II Binding and Internal Control 1.0 Kit (PacBio, Cat #101-731-100):
The reaction system was reacted at 30° C. for 1 hour on the PCR instrument, and then placed at 4° C. to form a PacBio sequencing library.
92 μl Ampure PB magnetic beads (PacBio, Cat #100-265-900) were added to the product of 2). Then, the PacBio sequencing library was purified according to the instructions of the PacBio SMRT 8.0, and finally eluted with 101.1 μl Complex Dilution Buffer.
98.5 μl of the library purified in step 3) was combined with 3.8 μl of Diluted Internal Control from the Sequel II Binding and Internal Control 1.0 Kit (PacBio, Cat #101-731-100), 11.5 μl DTT and 1.2 μl Sequel Additive. After mixing evenly, the product was sequenced on the Sequel II platform using an SMRT Cell 8M sequencing chip (PacBio, Cat #101-389-001) and the sequencing reagent (PacBio, Cat #101-768-000), in CCS mode for 15 hours.
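The component volumes listed above can be summed to check the final loading volume. The Python sketch below is a bookkeeping aid only; the dictionary layout is an assumption made here, while the volumes are those given in the text.

```python
# Component volumes (μl) of the loading mix, as listed in the text.
loading_mix_ul = {
    "purified library": 98.5,
    "Diluted Internal Control": 3.8,
    "DTT": 11.5,
    "Sequel Additive": 1.2,
}

# Round to one decimal place to avoid floating-point noise in the sum.
total_ul = round(sum(loading_mix_ul.values()), 1)
print(total_ul)  # prints 115.0
```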
Representative sequencing results are presented in
It should be noted that although the above examples elucidate some features of the disclosure, they are not intended to limit the disclosure. Those skilled in the art know that various modifications and changes are possible. The reaction reagents, reaction conditions and other parameters involved in PacBio sequencing library construction can be adjusted and changed according to specific needs. Therefore, those skilled in the art can make several simple substitutions without departing from the concept and principle of the disclosure, and these should all be included within the protection scope of the disclosure.
[7] Altan-Bonnet G, et al. Bubble Dynamics in Double-Stranded DNA. Phys Rev Lett. 2003 Apr. 4; 90(13): 138101. doi: 10.1103/PhysRevLett.90.138101.
Number | Date | Country | Kind |
---|---|---|---|
201910560767.2 | Jun 2019 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/097545 | 6/22/2020 | WO |