The present invention generally relates to the technical field of nucleic acid sequencing. In particular, the present invention relates to a nucleic acid molecule capable of blocking a motor protein, a library containing the same, a method of constructing the nucleic acid molecule or the library containing the same, and uses of the nucleic acid molecule or library containing the same.
Nucleic acid sequencing has become an indispensable research method in the life sciences. Genomics technologies based on large-scale sequencing data have accordingly come to play an important role in many research and application areas, such as tracing the origins of complex diseases and dynamically monitoring their progression, directional breeding of economically important crops and animals, and the study and protection of biological genetic resources.
However, all of the above research presupposes that high-throughput parallel sequencing can deliver highly accurate and complete nucleic acid sequences. The current mainstream sequencing-by-synthesis technology is highly accurate, with a raw accuracy of more than 99.9% for most sequenced bases, but its reads are short, and it is difficult to reconstruct the complete target nucleic acid sequence from such short reads, whether by resequencing alignment or by de novo assembly.
Therefore, how to obtain long-range information from nucleic acid sequences has become one of the key topics in current sequencing technology research. The commercial companies that currently offer long-read sequencing solutions are mainly Pacific Biosciences (abbreviated as PacBio below) in the United States, which provides single-molecule real-time sequencing, and Oxford Nanopore Technologies (abbreviated as ONT below) in the United Kingdom, which provides nanopore sequencing. In addition, companies such as Quantum SI and Genia in the United States and Qitan Tech, Geneus and Axbio in China have entered this field and released prototypes.
Existing nanopore sequencing technology uses a motor protein to unwind double-stranded DNA into single-stranded DNA, which then enters the nanopore, where bases are distinguished by the current changes they cause. The motor protein therefore plays a critical role in current technology and performs two main functions. First, it unwinds the double-stranded DNA so that only a single strand enters the pore, which ensures unambiguous sequence recognition. Second, it controls the speed at which the DNA passes through the nanopore, neither too fast nor too slow, so that the base-calling algorithm can reliably distinguish different sequences. It should be emphasized, however, that in its natural state the motor protein spontaneously unwinds double-stranded DNA as soon as it binds to the library adapter when ATP is present in the system. The library to be sequenced would then be converted entirely into single strands and could not be sequenced normally. To solve this problem, the existing technical solution introduces a compound such as a short-chain polyethylene glycol (e.g., iSp18) or polypropylene glycol (e.g., SpC3) into the oligonucleotide that serves as the adapter of the library to be sequenced; this reduces the negative charge on that segment of the DNA strand and prevents the motor protein from moving further forward.
However, the current nanopore sequencing solution has several problems. During sequencing, the motor protein first stalls at the spacer; when the library reaches the nanopore, the electric field pulls the spacer through the active region of the motor protein, the spacer loses its blocking effect, and the motor protein begins to unwind normally, thereby initiating sequencing. Consequently, existing methods inevitably read the adapter sequence located after the spacer, and this sequence may cause crosstalk in subsequent analysis. First, because a fixed adapter sequence is read at the start of every read, an additional adapter-filtering step is required in downstream bioinformatic analysis, and part of the read length is wasted. Second, because of the relatively low accuracy of nanopore sequencing, the adapter sequence may be trimmed incompletely, which ultimately produces crosstalk in subsequent analyses. Third, experimental observations indicate that the special adapter segments formed by short-chain polyethylene glycol (e.g., iSp18) or polypropylene glycol (e.g., SpC3) produce weak signal features, making it difficult for downstream algorithms to determine the starting point of sequencing.
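As a purely illustrative sketch of the adapter-filtering step described above, the following Python code trims an adapter that is assumed to sit at the very start of a read, tolerating a limited number of sequencing errors by means of edit distance. The function names, the `max_errors` and `slack` parameters and the adapter sequence are hypothetical and are not taken from any particular base-calling or trimming tool.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for substitution, insertion and deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def trim_leading_adapter(read: str, adapter: str, max_errors: int, slack: int = 3):
    """Remove an adapter assumed to sit at the very start of a read.

    Read prefixes whose length is within +/- `slack` of the adapter length are
    compared with the adapter; the closest prefix is removed if its edit
    distance is at most `max_errors`. Returns (trimmed_read, bases_removed).
    """
    best_k, best_d = 0, None
    lo = max(0, len(adapter) - slack)
    hi = min(len(read), len(adapter) + slack)
    for k in range(lo, hi + 1):
        d = edit_distance(read[:k], adapter)
        if best_d is None or d < best_d:   # ties keep the shorter prefix
            best_k, best_d = k, d
    if best_d is not None and best_d <= max_errors:
        return read[best_k:], best_k
    return read, 0   # adapter not recognized: it stays in the read


ADAPTER = "GGTTAACCTTAGCAAT"   # hypothetical adapter, for illustration only
insert = "ACGTACGTAA"
noisy = "GGTTAACCTTCGCAAT" + insert   # one substitution inside the adapter
print(trim_leading_adapter(noisy, ADAPTER, max_errors=2))
```

The removed bases are read length spent on the adapter rather than on the target. If errors in a read push the adapter beyond `max_errors`, the function leaves it in place, which corresponds to the incomplete processing and crosstalk mentioned above; loosening the threshold instead risks trimming genuine target bases.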
With the advancement of science and technology, clinical mutation detection is no longer limited to small-scale variants (single nucleotide variants and small insertions/deletions), and genetic abnormalities caused by large-scale structural variants are increasingly being analyzed. Because structural variant detection is sensitive to read length, long-read sequencing will gradually move from research services into clinical testing, and the requirements on the accuracy and cost of sequencing will rise accordingly.
In terms of accuracy, the HiFi sequencing method based on circular consensus sequencing, launched by PacBio in 2019, can achieve an average accuracy of more than 99% by sequencing the same molecule 5 times. However, because the same molecule must be sequenced repeatedly, the cost is high.
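The reason repeated passes raise accuracy, and also cost, can be illustrated with a deliberately simplified model: each pass calls a given base correctly with independent probability p, and the consensus takes a strict majority. This is an assumption for illustration only; actual circular consensus algorithms do not work this way, and the per-pass accuracy of 0.9 used below is a hypothetical value.

```python
from math import comb


def majority_consensus_accuracy(p: float, n: int) -> float:
    """P(a strict majority of n passes is correct), errors independent with per-pass accuracy p."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number so that a strict majority exists")
    need = n // 2 + 1
    # Sum the binomial probabilities of getting at least `need` correct calls.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_consensus_accuracy(0.9, n), 5))
```

Under these assumptions, 1, 3 and 5 passes give 0.9, 0.972 and about 0.991, respectively, while the sequencing effort per molecule grows linearly with the number of passes.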
In terms of cost, the latest PromethION 48 system launched by ONT can achieve sequencing costs of US$2 to US$16 per Gb, gradually approaching the cost of the sequencing-by-synthesis methods widely used in the market. Although the accuracy of this system is still lower than that of PacBio's circular consensus sequencing, the latest data disclosed by ONT report an accuracy of 98.4%, a significant improvement over early nanopore data.
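Accuracy figures such as those quoted above (99.9%, more than 99% and 98.4%) are often compared on the Phred scale, Q = -10 * log10(1 - accuracy). A short Python sketch of that conversion follows; the labels are only descriptive, and 0.99 is used as the lower bound of the HiFi figure.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(error rate), where error rate = 1 - accuracy."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


for label, acc in [("sequencing-by-synthesis, raw", 0.999),
                   ("HiFi consensus, lower bound", 0.99),
                   ("nanopore, reported", 0.984)]:
    print(f"{label}: Q{accuracy_to_phred(acc):.1f}")
```

These values correspond to roughly Q30, Q20 and Q18, i.e., error rates of 0.1%, 1% and 1.6%.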
Although current long-read sequencing technology still has some shortcomings, it represents an important direction for future technology development, and as the technology advances, long-read sequencing is expected to enter clinical testing. Nanopore sequencing has become a key research direction because of its low cost and its remaining room for improvement in accuracy.
Unlike conventional sequencing-by-synthesis, a library for nanopore sequencing must be bound to a motor protein (usually a helicase) in advance, and the motor protein must be held at the adapter of the library to be sequenced without unwinding the strand to be sequenced until sequencing formally starts. This requires introducing a motor protein spacer structure during construction of the library to be sequenced. There remains a need in the art for efficient motor protein spacer structures.
The present application relates to an innovative study on efficient motor protein spacer structures.
According to one object of the present invention, a target nucleic acid molecule to be sequenced comprises at its end a modified nucleotide capable of blocking a motor protein, thereby blocking the motor protein and yielding better single-molecule sequencing data.
According to another object of the present invention, an end of a target nucleic acid molecule is modified during end phosphorylation or end repair of the target nucleic acid molecule, thereby obtaining a nucleic acid molecule comprising at its end a modified nucleotide capable of blocking a motor protein.
According to another object of the present invention, reading of an adapter sequence after a spacer during sequencing is avoided, thereby simplifying the bioinformatic analysis process.
According to yet another object of the present invention, the efficiency of motor protein blocking is improved, thereby enhancing the adapter recognition signal.
In a first aspect, the present invention provides a nucleic acid molecule capable of blocking a motor protein, the nucleic acid molecule comprising at its end a modified nucleotide capable of blocking the motor protein.
In one embodiment, the nucleic acid molecule is DNA or RNA.
In one embodiment, the nucleic acid molecule comprises at its 5′ end a modified nucleotide capable of blocking a motor protein. Preferably, the nucleotide at the 5′ end of the nucleic acid molecule is phosphorylated with a methylphosphate group to form a modified nucleotide capable of blocking a motor protein.
In one embodiment, the nucleic acid molecule comprises at its 3′ end one or more modified nucleotides capable of blocking motor proteins.
In one embodiment, the nucleic acid molecule comprises at both its 5′ end and 3′ end modified nucleotides capable of blocking motor proteins.
In one embodiment, the modified nucleotide capable of blocking a motor protein refers to a nucleotide bearing a group capable of blocking a motor protein, and such group capable of blocking a motor protein may be selected from, but not limited to: alkyl, fluorophore, streptavidin and/or biotin, cholesterol, methylene blue, dinitrophenol (DNP), digoxigenin ligand and/or anti-digoxigenin ligand, dibenzocyclooctynyl, etc. In a preferred embodiment, such group capable of blocking a motor protein may be alkyl, for example, but not limited to, methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl. Preferably, alkyl can replace a negatively charged oxygen atom on the phosphate group in the nucleotide, and other groups capable of blocking a motor protein can be connected to the base, phosphate, ribose or deoxyribose.
In one embodiment, the α (alpha)-phosphate of the modified nucleotide located at the end of the nucleic acid molecule is modified. Preferably, one negatively charged oxygen atom in the α-phosphate is substituted by the above-mentioned group capable of blocking a motor protein. In a more preferred embodiment, one negatively charged oxygen atom in the α-phosphate is substituted by alkyl, more preferably by methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl.
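The effect of replacing one negatively charged oxygen with an alkyl group can be tallied with a simple charge count. The Python sketch below assumes idealized, fully ionized phosphate groups (in practice the charge of a terminal phosphate depends on pH) and uses hypothetical class and function names; it merely shows that each substitution removes one negative charge, so that a substituted internucleotide phosphate becomes neutral.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhosphateGroup:
    """Idealized phosphate group, with all non-bridging oxygens assumed fully ionized."""
    charged_oxygens: int      # non-bridging O- atoms before modification
    substituted: int = 0      # how many of them are replaced by an R group such as methyl

    def __post_init__(self):
        if not 0 <= self.substituted <= self.charged_oxygens:
            raise ValueError("cannot substitute more oxygens than are present")

    def charge(self) -> int:
        return -(self.charged_oxygens - self.substituted)


def total_charge(groups) -> int:
    return sum(g.charge() for g in groups)


# A 10-nt strand: one 5'-terminal monophosphate (2 O-) plus 9 internal phosphodiesters (1 O- each).
internal = [PhosphateGroup(1) for _ in range(9)]
plain = [PhosphateGroup(2)] + internal
methyl_5prime = [PhosphateGroup(2, substituted=1)] + internal
print(total_charge(plain), total_charge(methyl_5prime))
```

For this 10-nt example, the count goes from -11 to -10 when the 5′-terminal phosphate carries a methyl group.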
In a preferred embodiment, the modified nucleotide can be selected from, but not limited to: ribonucleotides or deoxyribonucleotides with alkyl (e.g., methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl) substitution on the α-phosphate; ribonucleotides or deoxyribonucleotides with nucleoside modification, for example, 3-methyladenine nucleotide, 7-methylguanosine nucleotide, 1,N6-ethenoadenine nucleotide, hypoxanthine nucleotide, uracil nucleotide, etc.; or ribonucleotides or deoxyribonucleotides with modification at the sugar ring, for example, locked nucleotides, peptide nucleotides or threose nucleotides, etc.
In one embodiment, the modified nucleotide contained at the end of the nucleic acid molecule can be represented by Formula I:
In a preferred embodiment, the modified nucleotide contained at the end of the nucleic acid molecule can be represented by Formula II:
In other words, in the nucleic acid molecule, one negatively charged oxygen atom in the α-phosphate of the modified nucleotide (Formula II) contained at its end is substituted by methyl.
In a preferred embodiment, the modified nucleotide contained at the end of the nucleic acid molecule may be adenosine methylphosphate, as shown in Formula III:
It is easy for those skilled in the art to understand that although Formula I, Formula II and Formula III show deoxyribonucleoside monophosphates, ribonucleoside monophosphates with similar structures are also encompassed in the scope of the present invention. For example, when the nucleic acid molecule is RNA, the modified nucleotide contained at its end is correspondingly a ribonucleoside monophosphate in which one negatively charged oxygen atom in the α-phosphate is replaced by an R group. The corresponding structures are shown in Formula I′, Formula II′, and Formula III′, respectively:
In Formula I′, Formula II′ and Formula III′, the definitions of Base and R are the same as those in Formula I, Formula II and Formula III, respectively.
In one embodiment, the nucleic acid molecule containing at its end a modified nucleotide capable of blocking a motor protein is used for single-molecule sequencing, for example, for nanopore sequencing.
In one embodiment, the nucleic acid molecule containing at its end a modified nucleotide capable of blocking a motor protein constitutes a library, for example, a sequencing library. This sequencing library can be used for single-molecule sequencing, for example, for nanopore sequencing.
In one embodiment, when a nucleic acid molecule containing at its 5′ end a modified nucleotide capable of blocking a motor protein is used for single-molecule sequencing, for example, for nanopore sequencing, a motor protein with movement direction from 5′ to 3′, for example, but not limited to, a T4 bacteriophage DNA helicase Dda and mutant thereof (hereinafter referred to as Dda), Deinococcus radiodurans DNA helicase RecD2 (hereinafter referred to as RecD2), or human DNA helicase Pif1 (hereinafter referred to as Pif1), is used.
In one embodiment, when a nucleic acid molecule containing at its 3′ end a modified nucleotide capable of blocking a motor protein is used for single-molecule sequencing, for example, for nanopore sequencing, a motor protein that moves in the 3′ to 5′ direction is used, for example, but not limited to, Escherichia coli DNA helicase Rep and its mutant Rep-X, Drosophila DNA helicase RecQ4 (hereinafter referred to as RecQ4), or Methanothermobacter thermautotrophicus DNA helicase Hel308 (hereinafter referred to as Hel308).
In one embodiment, when a nucleic acid molecule respectively containing at its 5′ end and 3′ end modified nucleotides capable of blocking a motor protein is used for single-molecule sequencing, for example, for nanopore sequencing, a motor protein with movement direction from 5′ to 3′ can be used, and a motor protein with movement direction from 3′ to 5′ can also be used. Those skilled in the art can select appropriate motor proteins based on actual needs.
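The pairing rule set out in the embodiments above (a 5′-end block with a 5′ to 3′ motor protein, a 3′-end block with a 3′ to 5′ motor protein, and either when both ends are blocked) can be written as a small lookup. The Python sketch below lists only the example proteins named in this text; the names `Direction`, `EXAMPLE_MOTORS` and `compatible_directions` are hypothetical.

```python
from enum import Enum


class Direction(Enum):
    FIVE_TO_THREE = "5'->3'"
    THREE_TO_FIVE = "3'->5'"


# Example motor proteins named in this text; not an exhaustive list.
EXAMPLE_MOTORS = {
    Direction.FIVE_TO_THREE: ("Dda", "RecD2", "Pif1"),
    Direction.THREE_TO_FIVE: ("Rep", "Rep-X", "RecQ4", "Hel308"),
}


def compatible_directions(blocked_5prime: bool, blocked_3prime: bool) -> set:
    """Motor-protein directions usable for a molecule blocked at the given end(s)."""
    directions = set()
    if blocked_5prime:
        directions.add(Direction.FIVE_TO_THREE)
    if blocked_3prime:
        directions.add(Direction.THREE_TO_FIVE)
    if not directions:
        raise ValueError("the molecule carries no blocking modification at either end")
    return directions


for d in compatible_directions(blocked_5prime=True, blocked_3prime=False):
    print(d.value, EXAMPLE_MOTORS[d])
```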
Those skilled in the art should understand that examples of motor proteins are known in the art, and those skilled in the art can select appropriate motor proteins according to actual needs.
In other embodiments, the present invention provides a sequencing library comprising at least one nucleic acid molecule capable of blocking a motor protein, wherein the nucleic acid molecule comprises at its end a modified nucleotide capable of blocking a motor protein. The terms “nucleic acid molecule capable of blocking a motor protein” and “modified nucleotide” are as defined above.
In a second aspect, the present invention relates to a method for preparing the nucleic acid molecule capable of blocking a motor protein according to the first aspect, the method comprising a step of making the target nucleic acid molecule contain at its end a modified nucleotide capable of blocking a motor protein. The nucleic acid molecule may be DNA or RNA.
In one embodiment, the modified nucleotide capable of blocking a motor protein refers to a nucleotide bearing a group capable of blocking a motor protein. Such group capable of blocking a motor protein may be selected from, but not limited to: alkyl, fluorophore, streptavidin and/or biotin, cholesterol, methylene blue, dinitrophenol (DNP), digoxigenin ligand and/or anti-digoxigenin ligand, dibenzocyclooctynyl, etc. In a preferred embodiment, such group capable of blocking a motor protein may be alkyl, for example, but not limited to, methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl.
In one embodiment, the α (alpha)-phosphate of the modified nucleotide located at the end of the nucleic acid molecule is modified. Preferably, one negatively charged oxygen atom in the α-phosphate is substituted by the above-mentioned group capable of blocking a motor protein. In a more preferred embodiment, one negatively charged oxygen atom in the α-phosphate is substituted by alkyl, more preferably by methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl.
In a preferred embodiment, the modified nucleotide may be selected from, but not limited to: ribonucleotides or deoxyribonucleotides with alkyl (e.g., methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl) substitution on the α-phosphate; ribonucleotides or deoxyribonucleotides with nucleoside modification, for example, 3-methyladenine nucleotide, 7-methylguanosine nucleotide, 1,N6-ethenoadenine nucleotide, hypoxanthine nucleotide, uracil nucleotide, etc.; or ribonucleotides or deoxyribonucleotides in which the sugar ring is modified, for example, locked nucleotides, peptide nucleotides or threose nucleotides, etc.
In one embodiment, the modified nucleotide contained at the end of the nucleic acid molecule can be represented by Formula I, preferably Formula II, more preferably Formula III, wherein Formula I, Formula II and Formula III are all as defined in the first aspect; or can be represented by Formula I′, preferably Formula II′, more preferably Formula III′, wherein Formula I′, Formula II′ and Formula III′ are all as defined in the first aspect.
In one embodiment, the method of generating the nucleic acid molecule capable of blocking a motor protein comprises: introducing a phosphate group bearing a group capable of blocking a motor protein into the 5′-end nucleotide of the target nucleic acid molecule through 5′-end phosphorylation, wherein the group capable of blocking a motor protein may be selected from, but not limited to: alkyl, fluorophore, streptavidin and/or biotin, cholesterol, methylene blue, dinitrophenol (DNP), digoxigenin ligand and/or anti-digoxigenin ligand, dibenzocyclooctynyl, etc. In a preferred embodiment, such group capable of blocking a motor protein may be alkyl, for example, but not limited to, methyl, ethyl, propyl, isopropyl, silylmethyl or boranyl.
In a preferred embodiment, in the 5′-end phosphorylation reaction, the target nucleic acid molecule is reacted with a modified NTP represented by Formula IV (as the reaction substrate), so that the R-bearing phosphate group at the γ-position of the modified NTP represented by Formula IV is transferred to the 5′-end nucleotide of the target nucleic acid molecule:
In a more preferred embodiment, R is methyl. In the most preferred embodiment, R is methyl and Base is adenine (A), that is, the modified NTP represented by Formula IV is γ-methylphosphate ATP represented by Formula V:
Those skilled in the art can easily understand that although Formula IV and Formula V represent ribonucleoside triphosphates, deoxyribonucleoside triphosphates with similar structures are also within the scope of the present invention. The corresponding structures are represented by Formula IV′ and Formula V′, respectively:
In a preferred embodiment, 5′-end phosphorylation occurs in the presence of a polynucleotide kinase (PNK).
In a more preferred embodiment, the polynucleotide kinase (PNK) can be selected from, but is not limited to: nucleic acid kinases, phosphotransferases or phosphatases. For example, the polynucleotide kinase may be a T4 phage-derived polynucleotide kinase, a human-derived polynucleotide kinase 3′-phosphatase, a human-derived polynucleotide 5′-hydroxykinase NOL9, or a human-derived polynucleotide 5′-hydroxykinase Clp1.
In one embodiment, the reaction substrate of the 5′-end phosphorylation reaction is preferably represented by Formula IV, more preferably Formula V; or may be preferably represented by Formula IV′, more preferably Formula V′.
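The bookkeeping of the 5′-end phosphorylation step, in which whatever group sits on the γ-phosphate of the donor NTP ends up on the 5′ end of the target, can be sketched symbolically in Python. This is a toy model under stated assumptions (a free 5′-OH is required, and the donor is left as the corresponding NDP); it does not model kinetics or whether a given kinase accepts a given modified substrate, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NTP:
    base: str                       # e.g. "A" for ATP
    gamma_r: Optional[str] = None   # group on the gamma-phosphate, e.g. "CH3"; None if unmodified


@dataclass(frozen=True)
class Oligo:
    seq: str
    five_prime: str = "OH"          # "OH", "P" (phosphate) or "P-<R>" (R-bearing phosphate)


def kinase_transfer(target: Oligo, donor: NTP) -> Tuple[Oligo, str]:
    """Move the donor's gamma-phosphate, with any R group, onto the target's free 5'-OH."""
    if target.five_prime != "OH":
        raise ValueError("a free 5'-OH is required")
    group = "P" if donor.gamma_r is None else f"P-{donor.gamma_r}"
    byproduct = donor.base + "DP"   # the donor is left as the diphosphate, e.g. ADP
    return Oligo(target.seq, five_prime=group), byproduct


product, leftover = kinase_transfer(Oligo("ACGTTGCA"), NTP("A", gamma_r="CH3"))
print(product, leftover)
```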
In one embodiment, the method for generating the nucleic acid molecule capable of blocking a motor protein comprises: introducing a modified nucleotide into the 3′ end of the target nucleic acid molecule through a polymerase in a step of performing end-repair of the target nucleic acid molecule and/or a step of adding A to the 3′ end of the target nucleic acid molecule.
Those skilled in the art should understand that the “step of performing end-repair” can repair the 5′ and 3′ ends of the nucleic acid molecule as well as nicks within it. The “step of adding A at the 3′ end” is a routine operation in this field; it is not limited to adding dAMP at the 3′ end of the nucleic acid molecule, and a dNMP other than dAMP (N = C, G or T) may also be added, in which case the corresponding base on the adapter is changed to its complement (G, C or A, respectively) without affecting the result.
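The base-pairing rule for the adapter stated above can be expressed as a complement lookup. The Python function below is a hypothetical helper for illustration only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def adapter_overhang_for_tail(tail_base: str) -> str:
    """Base needed on the adapter's overhang to pair with the library's added 3' base."""
    try:
        return COMPLEMENT[tail_base.upper()]
    except KeyError:
        raise ValueError(f"unsupported tail base: {tail_base!r}") from None


for tail in "ACGT":
    print(tail, "->", adapter_overhang_for_tail(tail))
```

The standard case, a 3′ dA tail, pairs with T on the adapter; the other three tails map to G, C and A, matching the parenthetical above.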
Preferably, in the step of performing end-repair of the target nucleic acid molecule and/or the step of adding A to the 3′ end of the target nucleic acid molecule, the target nucleic acid molecule is reacted with the modified dNTP represented by Formula VI through a polymerase, thereby introducing the modified dNMP into the 3′ end of the target nucleic acid molecule, in which the phosphorus atom of the modified dNMP is connected with the R group capable of blocking a motor protein (also known as R group-modified adenosine phosphate):
In a preferred embodiment, the modified dNTP is dATP in which one negatively charged oxygen atom of the α-phosphate group is substituted by methyl. That is, the modified dNTP is α-methyl phosphate dATP represented by the following Formula VII:
In one embodiment, during the step of performing end-repair of the target nucleic acid molecule, one or more modified nucleotides can be transferred to the 3′ end of the target nucleic acid sequence by a polymerase. Preferably, in the step of performing end-repair, one or more modified nucleotides can be introduced into the 3′ end of the target nucleic acid sequence in a template-dependent manner. Alternatively, multiple modified nucleotides can also be introduced into the 3′ end of the target nucleic acid sequence in a template-independent manner by a terminal transferase.
In one embodiment, during the step of adding A to the 3′ end of the target nucleic acid molecule, one modified nucleotide can be transferred to the 3′ end of the target nucleic acid sequence by a polymerase.
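The difference between template-dependent incorporation during end-repair, single-base A-tailing and template-independent tailing by a terminal transferase can be illustrated with a toy string model in Python. Here a token ending in `*` (for example `A*`) stands for a modified nucleotide such as α-methylphosphate dAMP; the model ignores fidelity, efficiency and enzyme substrate preferences, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fill_in(strand, template_3to5, modified_bases):
    """Template-dependent extension of a recessed 3' end (end-repair fill-in).

    `strand` is a list of tokens written 5'->3'; `template_3to5` is the
    single-stranded overhang of the opposite strand, read 3'->5' so that its
    i-th base pairs with the i-th base added. Bases in `modified_bases` are
    incorporated as their modified analogue (e.g. "A*").
    """
    out = list(strand)
    for t in template_3to5:
        b = COMPLEMENT[t]
        out.append(b + "*" if b in modified_bases else b)
    return out


def a_tail(strand, base="A", modified=True):
    """Template-independent addition of a single base at the 3' end."""
    return list(strand) + [base + "*" if modified else base]


def tdt_tail(strand, base, n):
    """Template-independent addition of n modified bases (terminal-transferase-like)."""
    return list(strand) + [base + "*"] * n


print(fill_in(list("ACG"), "TTC", {"A"}))   # two modified A's opposite the two T's
print(a_tail(list("ACG")))
print(tdt_tail(list("ACG"), "A", 3))
```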
The polymerase may be, but is not limited to: nucleic acid polymerases, transcriptases, reverse transcriptases or terminal transferases. For example, the polymerase may be Thermococcus gorgonarius (Tgo) DNA polymerase or mutant thereof (e.g., PGV2), Thermus aquaticus (Taq) polymerase I, or Escherichia coli polymerase I Klenow fragment, but is not limited thereto. Examples of polymerases are known in the art, and those skilled in the art can select a suitable polymerase according to actual needs.
In one embodiment, during the step of performing end-repair and/or the step of adding A to the 3′ end, multiple modified nucleotides can be transferred to the 3′ end of the target nucleic acid sequence using a polymerase, thereby achieving better blocking effect against the motor protein.
Alternatively, the nucleic acid molecule capable of blocking a motor protein can also be generated by chemical synthesis. For example, for a short target nucleic acid molecule, chemical synthesis can be used to generate a nucleic acid molecule containing at its end (for example, at its 5′ end, at its 3′ end, or at both its 3′ and 5′ ends) one or more modified nucleotides. The modified nucleotide may be represented by Formula I, preferably Formula II, more preferably Formula III, wherein Formula I, Formula II and Formula III are all as defined in the first aspect.
In a third aspect, the present invention provides a method for constructing a sequencing library capable of blocking motor proteins, the sequencing library comprising at least one nucleic acid molecule capable of blocking motor proteins according to the first aspect, wherein the method comprises a step of making a target nucleic acid molecule contain at its end (i.e., its 5′ end or 3′ end, or both its 5′ and 3′ ends) a modified nucleotide capable of blocking a motor protein during the construction of the sequencing library. The nucleic acid molecule may be DNA or RNA.
Those skilled in the art can understand that the method of generating the nucleic acid molecule capable of blocking a motor protein as described in the second aspect of the present invention can be used in the method for constructing a sequencing library, thereby obtaining the sequencing library comprising a nucleic acid molecule capable of blocking a motor protein. The sequencing library can be used for single-molecule sequencing, for example, for nanopore sequencing.
In one embodiment, when the nucleic acid molecule in the sequencing library comprises at its 5′ end a modified nucleotide capable of blocking a motor protein and the sequencing library is used for single-molecule sequencing, for example, for nanopore sequencing, a motor protein with movement direction from 5′ to 3′ is used, which is, for example, but not limited to, Dda or its mutant, RecD2, or Pif1.
In one embodiment, when the nucleic acid molecule in the sequencing library comprises at its 3′ end a modified nucleotide capable of blocking a motor protein and the sequencing library is used for single-molecule sequencing, for example, for nanopore sequencing, a motor protein with movement direction from 3′ to 5′ is used, which is, for example, but not limited to, Rep or its mutant Rep-X, RecQ4, or Hel308.
Those skilled in the art should understand that examples of motor proteins are known in the art, and those skilled in the art can select appropriate motor proteins according to actual needs.
In a fourth aspect, the present invention relates to a use of the nucleic acid molecule capable of blocking a motor protein of the first aspect or the library comprising the same. The nucleic acid molecule capable of blocking a motor protein or the library comprising the same can be used in the field of single-molecule sequencing. For example, single-molecule nucleic acid sequencing can be performed by incubating the nucleic acid molecule or the library containing the same with a motor protein and adding the mixture to an electrophysiological detection system containing nanopore proteins.
The nucleic acid molecule capable of blocking a motor protein or the library containing the same can also be used to detect motor protein activity. For example, because the motor protein cannot unwind the nucleic acid molecule capable of blocking it, the nucleic acid molecule or the library containing the same can be incubated with the motor protein and used as a negative control when identifying motor protein activity.
In a fifth aspect, the present invention relates to a single-molecule sequencing method, preferably a nanopore sequencing method, in which the method is performed by using the nucleic acid molecule capable of blocking a motor protein of the first aspect of the present invention or the library comprising the same.
In one embodiment, the single-molecule sequencing method comprises:
Those skilled in the art can select the motor protein and/or sequencing adapter, as well as a suitable single-molecule sequencing device, for use in the single-molecule sequencing method according to actual needs.
In one embodiment, the single-molecule sequencing method comprises:
In one embodiment, the single-molecule sequencing method comprises:
In another embodiment, the single-molecule sequencing method comprises:
In another embodiment, the single-molecule sequencing method comprises:
Those skilled in the art will understand that when the 5′ end and 3′ end of the target nucleic acid molecule respectively contain modified nucleotides capable of blocking motor proteins, in single-molecule sequencing (e.g., nanopore sequencing) a motor protein that moves in the 5′ to 3′ direction can be used, and a motor protein that moves in the 3′ to 5′ direction can also be used. Those skilled in the art can select appropriate motor proteins based on actual needs.
Compared with a nucleic acid sequence or sequencing library that does not contain the modified nucleotide at its end, the nucleic acid molecule or sequencing library of the present invention, which contains at its end a modified nucleotide capable of blocking a motor protein, avoids reading an adapter sequence after a spacer during sequencing. This simplifies bioinformatic analysis and processing, reduces crosstalk from adapter sequences in subsequent analysis, and enhances the signal intensity of the special sequences, thereby helping base-calling tools to better determine the starting point of a read.
The features and advantages of the present invention will become more apparent by referring to the accompanying drawings in conjunction with the disclosure of the present application.
The term “nucleic acid” as used herein generally refers to a molecule containing one or more nucleic acid subunits. The nucleic acid may include one or more subunits selected from adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. Nucleotides may comprise A, C, G, T, or U, or variants thereof. Nucleotides may comprise any subunit that can be incorporated into a growing nucleic acid chain. Such subunits may be A, C, G, T, or U, or any other subunits that are specific for one or more complementary A, C, G, T, or U, or complementary to purine (i.e., A or G, or variant thereof) or pyrimidine (i.e., C, T, or U, or variant thereof). Subunits enable individual nucleic acid bases or groups of bases (e.g., AA, TA, AT, GC, CG, CT, TC, GT, TG, AC, CA, or uracil counterparts thereof) to be resolved. In some examples, the nucleic acid is deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), or derivative thereof. The nucleic acid can be single-stranded or double-stranded.
The term “modified nucleotide” as used herein refers to a nucleotide bearing a modifying group capable of blocking a motor protein. Preferably, such modifying group may include, but is not limited to: alkyl, fluorophore, streptavidin and/or biotin, cholesterol, methylene blue, dinitrophenol (DNP), digoxigenin ligand and/or anti-digoxigenin ligand, dibenzocyclooctynyl, etc.
The term “polymerase” as used herein generally refers to any enzyme capable of catalyzing the polymerization of bases. Examples of polymerases include, but are not limited to, nucleic acid polymerases, transcriptases, reverse transcriptases, or terminal transferases. For example, the polymerase may be Thermococcus gorgonarius (Tgo) polymerase or mutant thereof, Thermus aquaticus (Taq) polymerase I, or Escherichia coli polymerase I Klenow fragment, but is not limited thereto. Examples of polymerases are known in the art, and those skilled in the art can select a suitable polymerase according to actual needs.
The term “polynucleotide kinase (PNK)” as used herein generally refers to any enzyme capable of catalyzing the transfer of γ-phosphate group of a ribonucleotide to the 5′-end hydroxyl group of a target oligonucleotide. Examples of polynucleotide kinases include, but are not limited to, nucleic acid kinases, phosphotransferases, or phosphatases. For example, the polynucleotide kinase may be a T4 phage-derived polynucleotide kinase, a human-derived polynucleotide kinase 3′-phosphatase, a human-derived polynucleotide 5′-hydroxykinase NOL9,or a human-derived polynucleotide 5′-hydroxykinase Clp1. Examples of polynucleotide kinases are also known in the art, and those skilled in the art can select appropriate polynucleotide kinases according to actual needs.
The term “motor protein” as used herein generally refers to a protein that can directly or indirectly bind to a nucleic acid and drive the nucleic acid molecule through a nanopore by hydrolyzing an energy-supplying molecule (e.g., ATP). Examples of motor proteins include, but are not limited to, helicases, polymerases, ligases, or exonucleases. For example, the motor protein may be a helicase.
As used herein, the term “helicase” generally refers to an enzyme that binds to single-stranded nucleic acids and, by hydrolyzing the high-energy phosphate bonds of ATP molecules, breaks the hydrogen bonds between the two strands of a double-stranded nucleic acid molecule, thereby unwinding it into single-stranded nucleic acid molecules. Examples of helicases include, but are not limited to, helicases that translocate along the nucleic acid in the 3′-to-5′ direction (e.g., helicase Rep and its mutant Rep-X), or helicases that translocate in the 5′-to-3′ direction (e.g., helicase Dda and its mutants).
In the present application, the terms “motor protein” and “helicase” are used interchangeably. Moreover, examples of motor proteins and/or helicases are known in the art, and those skilled in the art can select appropriate motor proteins and/or helicases according to actual needs.
The term “nanopore” as used herein generally refers to a pore, channel or pathway formed in a membrane or otherwise provided. The membrane may be an organic membrane, such as a lipid bilayer, or a synthetic membrane, such as a membrane formed from a polymeric material. Nanopores may be disposed adjacent or close to sensing circuits, or to electrodes coupled to sensing circuits, such as complementary metal oxide semiconductor (CMOS) or field effect transistor (FET) circuits. In some examples, nanopores have a characteristic width or diameter from about 0.1 nanometer (nm) to about 1000 nm. Some nanopores are formed by proteins.
The existing technical solutions in this field are illustrated below, taking the library construction process of ONT as an example. The process starts with genomic DNA extraction, followed by optional genome fragmentation, end repair with A-tailing at the 3′ end, and adapter ligation; a tether sequence (tether) may then optionally be attached, which completes the library construction (
In the library used for nanopore sequencing, in addition to conventional nucleic acid molecules, the adapter sequence mainly contains three special functional regions (
However, in existing nanopore sequencing, the motor protein first stalls at the spacer. When the library is then captured by the nanopore, the electric field force pulls the spacer through the active region of the motor protein, so the spacer loses its blocking effect; the motor protein then begins to unwind normally, initiating sequencing. Consequently, existing approaches inevitably read the portion of the adapter sequence located after the spacer before reaching the target sequence, and this adapter signal may cause crosstalk in subsequent analysis of the results.
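The read-through problem described above can be illustrated with an idealized toy model. This is a sketch under stated assumptions, not the source's analysis: the sequences are hypothetical, sequencing is treated as error-free, every base downstream of the stall point is assumed to be read once the motor protein resumes, and the comparison case assumes a hypothetical block placed exactly at the junction between adapter and target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    """Hypothetical adapter + target construct, written in the order it is read."""
    adapter: str      # adapter bases, including the region after the spacer
    spacer_end: int   # index in `adapter` just after the spacer (hypothetical)
    target: str       # target nucleic acid bases


def idealized_read(c: Construct, block_at_target_end: bool) -> str:
    """Bases passing the pore after the motor protein resumes (error-free model)."""
    molecule = c.adapter + c.target
    # Stall point: the spacer (existing approach) or the adapter/target junction.
    stall = len(c.adapter) if block_at_target_end else c.spacer_end
    return molecule[stall:]


def adapter_bases_in_read(c: Construct, block_at_target_end: bool) -> int:
    """Number of leading read bases that come from the adapter, not the target."""
    return len(idealized_read(c, block_at_target_end)) - len(c.target)


demo = Construct(adapter="TTTTTTCCGGAACG", spacer_end=6, target="ATGCATGC")
print(adapter_bases_in_read(demo, block_at_target_end=False))  # 8
print(adapter_bases_in_read(demo, block_at_target_end=True))   # 0
```

In this toy model, the adapter bases that appear at the start of the read when the stall occurs at the spacer are exactly the post-spacer adapter region, which is the signal the paragraph identifies as a source of crosstalk.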
The embodiments of the present invention will be described in detail below with reference to examples, but those skilled in the art will understand that the following examples are only used to illustrate the present invention and should not be regarded as limiting its scope. Where specific conditions are not specified in the examples, the examples are carried out under conventional conditions or the conditions recommended by the manufacturer. Where the manufacturer of a reagent or instrument is not indicated, it is a conventional, commercially available product.
In this example, the Ad1 adapter sequence was prepared by annealing the chemically synthesized SEQ ID NO.1 and SEQ ID NO.2.
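Annealing two synthetic oligonucleotides into an adapter presumes that they share an antiparallel complementary region. A minimal Python sketch for locating the longest such region follows. It uses exact matching only (a longest-common-substring search against the reverse complement) and ignores thermodynamics; SEQ ID NO.1 and NO.2 are not reproduced here, so the example sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def longest_duplex(top: str, bottom: str) -> tuple:
    """Longest exact stretch of `top` that pairs antiparallel with `bottom`.

    Returns (length, start_in_top, start_in_reverse_complement_of_bottom),
    found by dynamic-programming longest common substring.
    """
    rc = reverse_complement(bottom)
    top = top.upper()
    best = (0, 0, 0)
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(top) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if top[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


# Hypothetical oligos: a 12 bp duplex with a 5-nt single-stranded 3' tail on top.
print(longest_duplex("ACGTTGCAAGGCTTTTT", "GCCTTGCAACGT"))  # (12, 0, 0)
```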
In this example, helicase Dda (SEQ ID NO.4) was prepared by recombinant expression in Escherichia coli, and this helicase was used as a motor protein.
In this example, a fragment of the pUC57 plasmid (SEQ ID NO.5) obtained by enzymatic digestion was used as the target nucleic acid sequence for sequencing. Using γ-methylphosphate ATP (represented by Formula V) as the reaction substrate, a methylphosphate group was introduced onto the 5′-end nucleotide of the target nucleic acid sequence by a phosphorylation reaction catalyzed by a polynucleotide kinase (i.e., the 5′-end nucleoside is as shown in Formula II). The resulting phosphorylated end-repair product was then ligated to the Ad1 adapter prepared in Example 1 and bound to the helicase Dda prepared in Example 2.
In this example, a nanopore detection platform was built on a patch clamp platform and used to perform nanopore sequencing on the target sequencing library prepared in Example 3 (with the 5′ end phosphorylated by a methylphosphate group), in order to verify the advantages, in nanopore sequencing, of the sequencing library constructed in the present application that is capable of blocking a motor protein.
The alignment results in
Comparing the results of
In Examples 5-9, polymerases were used to introduce adenosine methylphosphate at the 3′ end of the target sequencing nucleic acid molecule to construct a sequencing library, and nanopore sequencing was performed to verify the sequencing effect, which involved the preparation of compatible adapters, motor proteins and polymerases.
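The two strategies used in the examples can be summarized as a lookup keyed on the helicase's translocation direction. The pairings below (a 5′-end methylphosphate introduced by polynucleotide kinase for Dda, a 3′-end adenosine methylphosphate introduced by polymerase for Rep-X) are taken from the examples and from the helicase directions given in the definitions; treating them as a general rule for other helicases is an assumption of this sketch, and the names are illustrative.

```python
from enum import Enum


class Direction(Enum):
    """Translocation direction of the helicase along the strand it binds."""
    FIVE_TO_THREE = "5'->3'"
    THREE_TO_FIVE = "3'->5'"


# Pairings taken from the worked examples; generalizing them to other
# helicases is an assumption of this sketch, not a rule stated in the source.
STRATEGIES = {
    Direction.FIVE_TO_THREE: {
        "example_helicase": "Dda",
        "modified_end": "5'",
        "enzyme": "polynucleotide kinase",
        "introduced": "methylphosphate group (from gamma-methylphosphate ATP)",
    },
    Direction.THREE_TO_FIVE: {
        "example_helicase": "Rep-X",
        "modified_end": "3'",
        "enzyme": "polymerase (PGV2 in the examples)",
        "introduced": "adenosine methylphosphate",
    },
}


def plan_end_modification(direction: Direction) -> dict:
    """Return a copy of the example-derived plan for a helicase direction."""
    return dict(STRATEGIES[direction])


for d in Direction:
    plan = plan_end_modification(d)
    print(f"{plan['example_helicase']} ({d.value}): modify the {plan['modified_end']} end "
          f"with {plan['enzyme']}, introducing {plan['introduced']}")
```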
In this example, helicase Rep-X was prepared by recombinant expression in Escherichia coli and used as a motor protein.
A full-length Rep-X cDNA sequence was purchased from Sangon Biotech and ligated into a pET-28a(+) plasmid via the NdeI and XhoI restriction sites (double digestion), so that the expressed Rep-X protein carried an N-terminal 6×His tag and a thrombin cleavage site.
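Directional cloning with NdeI and XhoI assumes that neither enzyme cuts inside the insert. A minimal Python sketch for that check follows. The recognition sequences (NdeI CATATG, XhoI CTCGAG, and NcoI CCATGG, used later for PGV2) are standard and palindromic, so scanning one strand suffices; the example sequence is hypothetical because the Rep-X cDNA is not reproduced here, and the check is meant for the coding sequence before primer-added cloning sites.

```python
import re

# Standard recognition sequences; all three are palindromic.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG", "NcoI": "CCATGG"}


def find_sites(seq: str, enzymes=("NdeI", "XhoI")) -> dict:
    """0-based start positions of each enzyme's recognition site in `seq`."""
    s = seq.upper()
    # A zero-width lookahead also reports overlapping occurrences.
    return {name: [m.start() for m in re.finditer(f"(?={SITES[name]})", s)]
            for name in enzymes}


def internal_sites(insert: str, enzymes=("NdeI", "XhoI")) -> dict:
    """Only the enzymes that would also cut inside the insert."""
    return {name: pos for name, pos in find_sites(insert, enzymes).items() if pos}


# Hypothetical coding sequence with an internal NdeI site at position 6.
print(internal_sites("ATGAAACATATGGGTTAA"))  # {'NdeI': [6]}
```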
The cloned pET-28a(+)-Rep-X plasmid was transformed into Escherichia coli expression strain BL21(DE3). A single colony was picked, inoculated into 5 mL of LB medium containing kanamycin, and cultured overnight at 37° C. with shaking. The culture was then transferred into 1 L of LB medium, cultured with shaking at 37° C. until OD600 = 0.6-0.8, and cooled to 16° C., after which IPTG was added to a final concentration of 500 μM to induce expression overnight.
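The induction step is a simple C1·V1 = C2·V2 dilution. The Python sketch below computes the stock volume needed; the 1 M IPTG stock is an assumption (the source specifies only the 500 μM final concentration), and the culture is treated as 1 L, ignoring the roughly 5 mL of inoculum.

```python
def stock_volume_ml(final_conc_um: float, final_volume_ml: float,
                    stock_conc_mm: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, the stock volume to add, in mL."""
    stock_conc_um = stock_conc_mm * 1000.0  # convert mM to uM
    return final_conc_um * final_volume_ml / stock_conc_um


# Assumed 1 M (1000 mM) IPTG stock; 500 uM final in about 1 L of culture.
volume = stock_volume_ml(500.0, 1000.0, 1000.0)
print(f"Add {volume:.2f} mL of 1 M IPTG")  # Add 0.50 mL of 1 M IPTG
```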
Five buffer solutions were prepared according to the following formulas:
The expressed Rep-X bacterial cells were collected, resuspended in Buffer A′, disrupted with a cell disruptor, and centrifuged, and the supernatant was collected. The supernatant was mixed with Ni-NTA resin previously equilibrated with Buffer A′ and allowed to bind for 1 hour. The resin was collected and washed extensively with Buffer A′ until no more contaminating protein was washed out, and Rep-X was then eluted by adding Buffer B′ to the resin. The eluted Rep-X protein was passed through a HiTrap Desalting column (Cytiva, 29048684) previously equilibrated with Buffer C′ to exchange the buffer. An appropriate amount of thrombin (Yisheng Biotech, 20402ES05) was then added, and digestion was carried out overnight at 4° C. The digested Rep-X was purified on a HiTrap Q FF anion exchange column (Cytiva, 17505301): the Q column was equilibrated with Buffer C′, the sample was loaded, the column was washed thoroughly with Buffer C′, and Rep-X was eluted with a continuous gradient from Buffer C′ to Buffer D′. The protein purified on the Q column was concentrated and loaded onto a Superdex 200 size-exclusion column (Sigma, GE28-9909-44) run in Buffer E′. The size-exclusion chromatogram and SDS-PAGE results of the finally purified protein were shown in
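The continuous gradient from Buffer C′ to Buffer D′ on the Q column can be described as a linear ramp in the fraction of Buffer D′. The Python sketch below models such a ramp. The buffer formulas are not reproduced in this section, so the salt concentrations, column size and gradient length in the usage line are placeholders, and linear mixing of the two buffers is assumed.

```python
def percent_b(volume_ml: float, start_ml: float, length_ml: float,
              start_pct: float = 0.0, end_pct: float = 100.0) -> float:
    """Percent of the second buffer (here Buffer D') at a given eluted volume."""
    if volume_ml <= start_ml:
        return start_pct
    if volume_ml >= start_ml + length_ml:
        return end_pct
    frac = (volume_ml - start_ml) / length_ml
    return start_pct + frac * (end_pct - start_pct)


def mixed_concentration(pct_b: float, conc_a: float, conc_b: float) -> float:
    """Concentration of one component (e.g. salt) at a given %B, linear mixing."""
    return conc_a + pct_b / 100.0 * (conc_b - conc_a)


# Placeholder numbers: 20 column volumes of a hypothetical 5 mL column,
# hypothetical 50 mM salt in Buffer C' and 1000 mM in Buffer D'.
pct = percent_b(50.0, start_ml=0.0, length_ml=100.0)
print(pct, mixed_concentration(pct, 50.0, 1000.0))  # 50.0 525.0
```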
In this example, polymerase mutant PGV2 (SEQ ID NO.13) was prepared by recombinant expression in Escherichia coli.
PGV2 is a previously reported Thermococcus gorgonarius polymerase mutant capable of polymerizing methylphosphate dNTPs (Sebastian Arangundy-Franklin et al., Nature Chemistry, vol. 11, pages 533-542 (2019), see https://doi.org/10.1038/s41557-019-0255-4). The full-length cDNA sequence of polymerase mutant PGV2 was purchased from Sangon Biotech and ligated into the pET-28a(+) plasmid via the NcoI and XhoI restriction sites, so that the expressed protein carried a C-terminal 6×His tag.
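A C-terminal His tag is only expressed if the insert, cloned without its own stop codon, stays in frame with the vector sequence after the XhoI site. The Python sketch below checks this by translating the fusion with the standard genetic code. It assumes the usual pET-28a(+) arrangement, in which the XhoI site (CTC GAG, Leu-Glu) is followed in frame by six His codons and a TGA stop; verify this against the actual vector map. The example ORFs are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed pET-28a(+) sequence from the XhoI site through the tag's stop codon.
XHOI_HIS_TAG = "CTCGAG" + "CAC" * 6 + "TGA"


def translate(dna: str) -> str:
    """Translate complete codons from the first base ('*' marks a stop)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def c_terminal_fusion_ok(orf_without_stop: str) -> bool:
    """True if the ORF stays in frame with Leu-Glu-His6 and has no early stop."""
    orf = orf_without_stop.upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG"):
        return False
    protein = translate(orf + XHOI_HIS_TAG)
    # Exactly one stop codon, at the very end, right after the His6 tag.
    return protein.endswith("LEHHHHHH*") and protein.count("*") == 1


print(translate("ATGAAAGGG" + XHOI_HIS_TAG))  # MKGLEHHHHHH*
print(c_terminal_fusion_ok("ATGAAAGG"))       # False (frameshift)
```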
The cloned pET-28a(+)-PGV2 plasmid was transformed into Escherichia coli expression strain BL21(DE3) or a derivative strain thereof. A single colony was picked, inoculated into 5 mL of LB medium containing kanamycin, and cultured overnight at 37° C. with shaking. The culture was then transferred into 1 L of LB medium, cultured with shaking at 37° C. until OD600 = 0.6-0.8, and cooled to 16° C., after which IPTG was added to a final concentration of 500 μM to induce expression overnight.
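Planning when the 1 L culture will reach OD600 = 0.6-0.8 is an exponential-growth estimate. The Python sketch below gives a rough figure. It assumes a constant doubling time and no lag phase, and the overnight OD600 of 4.0 and the 30 min doubling time are hypothetical inputs rather than values from the source, so the result is only a planning aid.

```python
import math


def starting_od(overnight_od: float, inoculum_ml: float, medium_ml: float) -> float:
    """OD600 immediately after diluting an overnight culture into fresh medium."""
    return overnight_od * inoculum_ml / (inoculum_ml + medium_ml)


def hours_to_od(od_start: float, od_target: float, doubling_time_min: float) -> float:
    """Hours of exponential growth (no lag phase) from od_start to od_target."""
    if od_start <= 0 or od_target <= od_start:
        raise ValueError("need 0 < od_start < od_target")
    doublings = math.log2(od_target / od_start)
    return doublings * doubling_time_min / 60.0


# Hypothetical inputs: overnight OD600 of 4.0, 5 mL into 1 L, 30 min doubling time.
od0 = starting_od(4.0, 5.0, 1000.0)
print(f"start OD600 ~ {od0:.3f}; ~{hours_to_od(od0, 0.6, 30.0):.1f} h to reach 0.6")
```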
Five buffer solutions were prepared according to the following formulas:
The expressed PGV2 bacterial cells were collected, resuspended in Buffer F, disrupted with a cell disruptor, and centrifuged, and the supernatant was collected. The supernatant was mixed with Ni-NTA resin previously equilibrated with Buffer F and allowed to bind for 1 hour. The resin was collected and washed extensively with Buffer F until no more contaminating protein was washed out, and PGV2 was then eluted by adding Buffer G to the resin. The eluted PGV2 protein was passed through a HiTrap Desalting column previously equilibrated with Buffer H to exchange the buffer, and then loaded onto a HiTrap Q FF anion exchange column equilibrated with Buffer H. After thorough washing with Buffer H, elution was carried out with a continuous gradient from Buffer H to Buffer I. The purified protein was concentrated and loaded onto a desalting column run in Buffer J. The anion-exchange elution profile and SDS-PAGE results of the finally purified protein were shown in
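Reading an SDS-PAGE result requires the expected molecular mass of the tagged protein. The Python sketch below computes the average mass of an unmodified polypeptide from standard average residue masses (sum of residues plus one water). SDS-PAGE migration only approximates mass, PGV2 (SEQ ID NO.13) is not reproduced here, and the example peptide is hypothetical.

```python
# Standard average amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O added back for the free termini


def average_mass_da(protein: str) -> float:
    """Average mass of an unmodified linear polypeptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def mass_kda(protein: str) -> float:
    """Same value in kDa, the unit usually used to read SDS-PAGE markers."""
    return average_mass_da(protein) / 1000.0


# Hypothetical peptide ending in the Leu-Glu-His6 C-terminal tag.
print(f"{mass_kda('MKGLEHHHHHH'):.3f} kDa")
```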
In this example, SEQ ID NO.9 and SEQ ID NO.10 were annealed and ligated to obtain a 59 bp sequence, which was used as a target sequencing nucleic acid molecule to construct a target sequencing library for nanopore sequencing. The polymerase mutant PGV2 obtained in Example 7 was used to introduce one adenosine methylphosphate (shown in Formula III) at the 3′ end of the target sequencing nucleic acid sequence. Then, the obtained phosphorylated end-repair product was ligated to the Adn1 adapter prepared in Example 5, and then bound to the helicase Rep-X prepared in Example 6.
In this example, a nanopore detection platform was built on a patch clamp platform, and nanopore sequencing was performed on the target sequencing library prepared in Example 8 (with one adenosine methylphosphate introduced at the 3′ end), in order to verify the advantages, in nanopore sequencing, of the sequencing library constructed in the present application that is capable of blocking a motor protein.
Although various embodiments of the present invention have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are for illustrative purposes only. Various modifications, variations and substitutions will be apparent to those skilled in the art without departing from the present invention. Those skilled in the art will appreciate that various alternatives to the embodiments described herein may be adopted.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2021/143658 | 12/31/2021 | WO | |