This disclosure relates to biomolecules engineered for integration into electronic circuits for biopolymer sensing/identification or sequencing.
Polymeric macromolecules commonly found in biological systems generally comprise a defined set of building blocks linked in a specific order, the so-called sequence. The sequence defines a polymer's three-dimensional structure and its functions in a biological system. In the case of proteins, this function can be an enzymatic reaction or a binding event; in the case of carbohydrates, it can be a recognition element; and in the case of nucleic acids, it can be the carrying of heritable information. Therefore, accurately determining the sequence of a polymeric macromolecule is critical to understanding its functions.
In the specific case of nucleic acids, the first generation of deoxyribonucleic acid (DNA) sequencing technology (“Sanger sequencing”) employed a method analyzing polymerization products from enzymatic reactions performed in bulk solution [1]. Read lengths for this technology under ideal conditions can reach 1000 base pairs (bp) or more. This approach was used for the international Human Genome Project, which took over ten years and cost ~2.7 billion US dollars to generate the first human genome sequence [2, 3]. This technology is impractical for large-scale genomics, although it remains suitable for sequencing small genetic elements such as circular plasmids. Next-generation sequencing (NGS) technologies were developed with the goal of a $1000 genome and have reduced the cost and time required to sequence a human genome [4, 5]. However, because of their short read lengths, NGS technologies struggle with complex structural variations and repetitive sequences in the human genome.
Furthermore, because NGS is less accurate than Sanger sequencing, it more often requires deep sequencing, especially for determining mutations. An NGS variation that uses labeled enzymes instead of labeled nucleotides still only produces short reads [6].
To address these limitations, third-generation sequencing technologies have been developed that decode nucleic acids at the single-molecule level. For example, Pacific Biosciences sequencing platforms use zero-mode waveguides (ZMWs) to detect the fluorescence signals emitted by individual incorporation events [7]. This technology can read long DNA sequences but suffers from relatively high error rates. Thus, a sequencing platform with greater accuracy, more straightforward analysis, and lower deployment costs is desirable for a wide range of applications, including personalized medicine and epidemiology.
Other approaches employ nanopores for sequencing. Biological nanopores, such as those marketed by Oxford Nanopore Technologies, use transmembrane protein pores [8]. Although this technology offers increased read lengths, it also suffers from low accuracy and is therefore often used in conjunction with NGS. Biological nanopore chips are costly to manufacture, hindering affordable sequencing and widespread deployment.
Solid-state nanopores fabricated in inorganic materials by semiconductor technologies can be mass-produced cost-effectively [9]. However, the geometry of solid-state nanopores cannot be controlled as precisely as that of biological pores. Therefore, for sequencing, a sensing mechanism other than ionic-current measurement must be incorporated into the solid-state nanopore.
Various arrangements of nanopores and biosensors have been described. One approach [10] uses the biosensor to feed hybridization probes from nucleotide-triphosphate analogs into the nanopore to elicit a detectable response. Another method, described generally in [11], connects the two sides of the nanogap with a bridge molecule that transmits conformational changes of the associated biosensor arising from nucleotide incorporation events. An ideal configuration of these components maximizes sensitivity and reproducibility.
The individual components of the system may also vary in composition. For example, bridges can comprise carbon nanotubes or DNA nanowires; the latter offer the distinct advantages of being chemically defined and functionalizable at discrete locations. However, the conductivity of a single DNA molecule remains controversial, especially when its length exceeds 30 nm.
DNA polymerases from E. coli and bacteriophage phi29 (phi29pol) are routinely employed as biosensors. Disclosures such as [11], [12], [13], and [14] broadly cover myriad configurations of probes, biosensors, and linkers without detailed teaching on how to achieve them. For example, one suggested embodiment in [13] describes selective conjugation of a biosensor to a probe using well-established thiol-maleimide coupling chemistry and further acknowledges that doing so would likely necessitate the removal of all other cysteine residues in the biosensor, which is not a trivial task. In the specific case of phi29pol, seven native cysteine residues would have to be mutated to other naturally occurring residues. Doing so presents a real challenge requiring considerable experimentation, because enzymes are only marginally stable and even a single point mutation can have deleterious functional consequences [15, 16]. In other cases, cysteine residues are essential for enzyme structure or function. For example, the protease papain employs a cysteine residue in its catalytic cycle [17]. As another example, antibodies commonly rely on disulfide bonds formed by cysteine residues to maintain their structures [18].
Disclosure [13] also describes an embodiment that employs genetically incorporated unnatural amino acids to facilitate conjugation, citing “click chemistry” as a non-limiting example. Several types of biocompatible conjugation chemistry have been described; however, determining which one(s) is best suited to link a biosensor to a probe requires significant experimentation.
Another example of a problematic disclosure is found in [12], in which the biosensor is attached at multiple linkage points to enhance sensitivity through close coupling of the biosensor to the probe. However, protein surfaces offer many potential conjugation sites, and extensive experimentation is required to determine the optimum point or points of contact with the probe. Without a predetermined attachment point, one cannot control the orientation of the enzymatic probe, which may significantly affect probe function. Furthermore, each additional attachment site increases the number of possible configurations combinatorially.
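To make the combinatorial point concrete: if a protein surface offers n candidate conjugation residues and k of them are used as attachment points, the number of distinct site choices is the binomial coefficient C(n, k). The sketch below assumes a hypothetical n = 40 candidate residues; the real number depends on the biosensor and the coupling chemistry, and the count ignores orientation relative to the probe, which would multiply it further.

```python
from math import comb


def attachment_configurations(candidate_sites: int, attachment_points: int) -> int:
    """Number of distinct ways to choose `attachment_points` residues out of
    `candidate_sites` surface residues (order of attachment ignored)."""
    return comb(candidate_sites, attachment_points)


# Hypothetical biosensor with 40 candidate surface residues.
for k in range(1, 5):
    print(k, attachment_configurations(40, k))
```

With these assumptions the count grows from 40 single-site options to 780 two-site and 91,390 four-site combinations, which is why a single predetermined attachment point, or two, is easier to validate.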
Protein fusion tags have been used extensively in the prior art to enhance protein expression, solubility, and activity [19]. In the specific case of polymerases, fusing the protein Sso7d from Sulfolobus solfataricus has been shown to enhance the processivity of thermostable polymerases by maintaining the association with DNA [20]. In another example, glutathione-S-transferase (GST) has been fused to phi29pol to aid in purification, but doing so required the addition of trehalose to retain protein solubility [21]. Embodiments of polymerases that enhance expression and solubility without the use of such additives are desirable.
The present invention provides methods to engineer enzymes for integration into a molecular nanowire as a functional component for biopolymer sequencing/identification. The said enzymes include, but are not limited to, DNA polymerase, RNA polymerase, DNA helicase, DNA ligase, DNA exonuclease, reverse transcriptase, RNA primase, ribosome, sucrase, or lactase, and may be natural, mutated, or synthetic.
The biopolymer includes, but is not limited to, DNA, RNA, oligonucleotides, proteins, peptides, polysaccharides, etc., which are either natural or synthesized; and the molecular nanowire includes, but is not limited to, a double-stranded DNA (dsDNA or DNA duplex), a DNA duo (two dsDNA molecules), a DNA nanostructure as disclosed in [24], or a combination thereof. The DNA duo is a simple DNA nanostructure and has increased conductivity compared to a single DNA duplex. Below, we use the DNA duo and DNA polymerase to illustrate the method of engineering an enzyme. The same approach or principle applies to a single DNA duplex or a DNA nanostructure, and to sequencing and/or identifying other biopolymers using enzymes as sensors.
In one embodiment of the present invention, the said enzyme is an engineered DNA polymerase that carries unnatural amino acid residues containing an orthogonal functional group at two predefined positions (201,
In some embodiments of the present invention, the said enzyme is a wild-type DNA polymerase engineered with unnatural amino acids at the preselected sites (702,
In some embodiments, the mutant DNA polymerase includes a fused, genetically encoded protein conferring enhanced solubility and activity (701) (Sequence ID #1). The fused polymerase is engineered to contain only one or two cysteine residues (
In some embodiments, the fused polymerase is engineered by replacing some of its cysteines with selenocysteine (301,
In some embodiments, the unnatural amino acid used for protein engineering is a derivative of selenocysteine (shown in
In some embodiments, the said unnatural amino acid is a derivative of natural phenylalanine, which is incorporated into the said protein and mutants according to the cloning method stated in Methodology. Some of the phenylalanine derivatives are shown in
In some embodiments, the said unnatural amino acid is a derivative of natural lysine, which is incorporated into the said protein and mutants according to the cloning method stated in Methodology. Some of the lysine derivatives are shown in
In some embodiments, this invention provides a DNA duo that forms a molecular junction serving as a medium for incorporating the said protein or a mutant and for transducing the protein's movements into electrical signals. Each DNA duplex has one functionalized nucleoside (Nm), able to react with one of the said unnatural amino acids in the engineered protein (or polymerase, in the case of DNA/RNA sequencing), and two functional groups (Bm) at its two ends for attaching to the two electrodes at the nanogap, respectively (
In some embodiments, the said DNA junction is a single DNA duplex (dsDNA), each strand of which has one nucleoside functionalized (Nm), able to react with the said noncanonical and unnatural amino acids engineered into the said protein or polymerase in the case of DNA/RNA sequencing, and one or two functional groups (Bm) at each end of the duplex for attaching to the two electrodes at the nanogap (
In some embodiments, the said DNA junction is a DNA nanostructure as disclosed in [24, 25] and two predefined locations in the nanostructure have nucleosides functionalized (Nm), able to react with the said noncanonical and unnatural amino acids engineered into the said protein or polymerase in the case of DNA/RNA sequencing, and one or two functional groups (Bm) at each end of the DNA nanostructure for attaching to the two electrodes at the nanogap (
In some embodiments, the double-stranded DNA has an amino function at one of its internal bases. For example, an amino group is situated at the 5-position of a pyrimidine base or the 7-position of a purine base. Some of these nucleosides are shown in
The aminated DNA is further functionalized with functional groups that can specifically react with the said unnatural amino acids engineered into the said protein or polymerase in the case of DNA/RNA sequencing. Some of these are shown in
In some embodiments of the present invention, the DNA duo generally comprises two double-stranded DNA molecules with a length that can bridge two electrodes separated by a distance ranging from 3 to 50 nanometers. In some other embodiments, the DNA duo is replaced by two double-stranded RNA, PNA, or XNA molecules, or by hybrids of DNA to RNA, DNA to PNA, DNA to XNA, RNA to PNA, RNA to XNA, or PNA to XNA.
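A rough way to relate the 3 to 50 nm electrode gap to duplex length is to divide the gap by the helical rise per base pair. The sketch below assumes the textbook B-form rise of about 0.34 nm/bp, a straight duplex spanning the gap end to end, and no contribution from the terminal attachment groups (Bm); real linkers would shorten the duplex required.

```python
import math

# Approximate rise per base pair for B-form DNA; an assumption, since the
# actual value varies with sequence, buffer, and surface attachment.
B_DNA_RISE_NM = 0.34


def min_duplex_length_bp(gap_nm: float, rise_nm: float = B_DNA_RISE_NM) -> int:
    """Smallest number of base pairs whose straight-line contour length
    reaches across an electrode gap of `gap_nm` (end linkers ignored)."""
    if gap_nm <= 0 or rise_nm <= 0:
        raise ValueError("gap and rise must be positive")
    return math.ceil(gap_nm / rise_nm)


for gap in (3, 10, 30, 50):
    print(f"{gap} nm gap -> at least {min_duplex_length_bp(gap)} bp")
```

Under these assumptions a 3 nm gap needs about 9 bp and a 50 nm gap about 148 bp, broadly in line with the 10 to 150 base-pair duplex lengths described for these embodiments.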
In some embodiments, the sequence of a DNA duplex, either alone or as part of a DNA duo or a DNA nanostructure, contains at least 50% GC base pairs and has a length ranging from 10 to 150 base pairs. In addition to the canonical bases, the DNA duplex also includes modified nucleobases and/or base analogs for improving its conductivity.
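The two numeric design rules above (at least 50% GC and 10 to 150 bp) can be screened automatically when choosing candidate sequences. This minimal sketch checks one strand of canonical A/C/G/T bases only; how modified nucleobases or base analogs should be counted is left open here and would need its own rule. The example sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the canonical bases of one strand."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def meets_duplex_criteria(seq: str, min_gc: float = 0.5,
                          min_len: int = 10, max_len: int = 150) -> bool:
    """Check the >=50% GC and 10-150 bp window described for a duplex.

    GC content is identical on both strands of a perfect duplex, so
    checking one strand is sufficient.
    """
    return min_len <= len(seq) <= max_len and gc_fraction(seq) >= min_gc


print(meets_duplex_criteria("GCGCATGCGC"))  # 10 bp, 80% GC
print(meets_duplex_criteria("ATATGCATAT"))  # 10 bp, 20% GC
```

The length test runs first, so sequences outside the window are rejected without computing GC content.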
In some embodiments, the DNA duo comprises palindromic double-stranded DNA formed spontaneously in solution from a single-stranded oligonucleotide with a self-complementary sequence. Both double-stranded DNA molecules in the DNA duo have the same symmetry, with no polarity along their helical axes. When the DNA duo is used as a molecular wire to bridge the nanogap, either of its ends can be attached to either electrode without introducing electrical polarity.
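A strand can form such a palindromic duplex with a copy of itself only if it equals its own reverse complement. The sketch below tests that condition for canonical A/C/G/T sequences; it does not check for competing hairpins or partial self-pairing, which would need a secondary-structure tool. The example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T strand."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected an A/C/G/T sequence")
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the strand equals its own reverse complement, so two copies
    can anneal into a palindromic duplex. Hairpin formation and partial
    self-pairing are not checked."""
    return bool(seq) and seq.upper() == reverse_complement(seq)


print(is_self_complementary("GCGCATGCGC"))  # True
print(is_self_complementary("GCGCAAGCGC"))  # False
```

Because both strands are the same oligonucleotide, the resulting duplex reads identically from either end, which is the property that removes polarity when the wire is attached to the electrodes. An odd-length strand can never pass the test, since its central base would have to pair with itself.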
Cloning. A gene cassette harboring sequences encoding a fusion protein and wild-type DNA polymerase from phi29 (phi29pol) was inserted into a T7-based plasmid such as pET21a and expressed in E. coli. Point mutations were made using PCR with oligonucleotide primers containing desired mutations [23]. The recombinant protein was purified using Ni-NTA agarose. Typical yields are approximately 30 mg per liter of culture (
Activity assay. In a typical, non-limiting reaction, enzyme (100 ng) is incubated in a buffered solution containing plasmid DNA (20 ng), dNTPs, and single-stranded DNA primer at 30° C. Products are digested with EcoRI, separated by agarose gel electrophoresis, and visualized by fluorescence (
DNA-functionalization with DBCO. In a typical, non-limiting reaction, single-stranded DNA containing an amino function (50 μM) is incubated with DBCO-PEG5-TFP ester (2.5 mM) in sodium tetraborate buffer (pH 9) overnight at 25° C. Any unreacted linker is removed by ethanol precipitation.
Macromolecule-enzyme conjugation. In a typical, non-limiting reaction, enzyme (30 μM) containing a p-azidophenylalanine residue is incubated in a buffered solution containing DBCO-conjugated macromolecules (150 μM) at 20° C. (
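The two recipes above fix the reagent ratios: a 50-fold molar excess of DBCO-PEG5-TFP ester over amino-modified DNA (2.5 mM versus 50 μM), and a 5-fold excess of DBCO-conjugated macromolecule over the azido enzyme (150 μM versus 30 μM). The sketch below computes these ratios, assuming the stated values are final concentrations in the reaction. The `sites_per_limiting` parameter is a hypothetical addition for a protein carrying more than one azide.

```python
UM_PER_MM = 1000.0  # micromolar per millimolar


def molar_excess(reagent_um: float, limiting_um: float,
                 sites_per_limiting: int = 1) -> float:
    """Equivalents of reagent per reactive site on the limiting species.

    Both concentrations are in micromolar; `sites_per_limiting` is the
    number of reactive handles on each limiting molecule.
    """
    if limiting_um <= 0 or sites_per_limiting < 1:
        raise ValueError("limiting concentration and site count must be positive")
    return reagent_um / (limiting_um * sites_per_limiting)


# DBCO-PEG5-TFP ester (2.5 mM) over amino-modified ssDNA (50 uM)
linker_excess = molar_excess(2.5 * UM_PER_MM, 50.0)
# DBCO-conjugated macromolecule (150 uM) over azido enzyme (30 uM)
conjugate_excess = molar_excess(150.0, 30.0)
print(linker_excess, conjugate_excess)  # 50.0 5.0
```

For a hypothetical enzyme carrying two azides at the same 30 μM, `molar_excess(150.0, 30.0, sites_per_limiting=2)` gives 2.5 equivalents per site, a reminder that the excess per reactive handle shrinks as attachment points are added.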
Claimable items of this invention include, but are not limited to, the following:
An embodiment is a DNA duplex or a DNA duo that bridges a nanogap between two electrodes. The said DNA duplex or DNA duo comprises:
An embodiment is a functional protein engineered to contain at least one of the above said noncanonical amino acid residues at predefined positions.
An embodiment is a functional protein engineered to contain two of the above said noncanonical amino acid residues at predefined positions, wherein the said protein spontaneously and precisely forms covalent connections at two predefined positions on an engineered molecular wire.
An embodiment is a method to label enzymes with biomolecules and organic molecules.
An embodiment is the DNA duplex or DNA duo or DNA nanostructure internally carrying a nucleophile capable of reacting with the above said NHS, PFP, or TFP esters of functional molecules or other chemically active species.
An embodiment is a method to engineer DNA with different functional groups at predetermined locations.
All publications, patents, and other documents mentioned herein are incorporated by reference in their entirety.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as those commonly understood by one of ordinary skill in the art to which this invention belongs. While the present invention has been illustrated by a description of various embodiments and while these embodiments have been described in considerable detail, it is not the intention of the applicants to restrict or in any way limit the scope of the application. Additional advantages and modifications will readily appear to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, representative device, apparatus and method, and illustrative example shown and described. Accordingly, departures may be made from such details without departing from the spirit of the applicant's general inventive concept.
1. Smith L M, Sanders J Z, Kaiser R J, Hughes P, Dodd C, Connell C R, et al. Fluorescence detection in automated DNA sequence analysis. Nature. 1986; 321: 674-9.
2. Lander E S, Linton L M, Birren B, Nusbaum C, Zody M C, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001; 409: 860-921.
3. Venter J C, Adams M D, Myers E W, Li P W, Mural R J, Sutton G G, et al. The sequence of the human genome. Science. 2001; 291: 1304-51.
4. Margulies M, Egholm M, Altman W E, Attiya S, Bader J S, Bemben L A, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005; 437: 376-80.
5. Turcatti G, Romieu A, Fedurco M, Tairi A P. A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis. Nucleic Acids Res. 2008; 36: e25.
6. Previte M J, Zhou C, Kellinger M, Pantoja R, Chen C Y, Shi J, et al. DNA sequencing using polymerase substrate-binding kinetics. Nat Commun. 2015; 6: 5936.
7. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009; 323: 133-8.
8. Stoddart D, Heron A J, Mikhailova E, Maglia G, Bayley H. Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore. Proc Natl Acad Sci U S A. 2009; 106: 7702-7.
9. Dekker C. Solid-state nanopores. Nat Nanotechnol. 2007; 2: 209-15.
10. Mandell J G, Gunderson K L, Gundlach J H. Compositions, systems, and methods for detecting events using tethers anchored to or adjacent to nanopores. U.S. Patent Application No. 20190376135, 2019.
11. Merriman B L S D, Mola P W. Biomolecular sensors and methods. U.S. Patent Application No. 20180340220, 2018.
12. Merriman B L, Govindaraj V A, Mola P, Geiser T. Enzymatic circuits for molecular sensors. U.S. Patent Application No. 20180305727, 2018.
13. Merriman B L, Govindaraj V A, Mola P, Geiser T, Costa G. Binding probe circuits for molecular sensors. U.S. Patent Application No. 20190004003, 2019.
14. Merriman B L S D, Mola P, Choi C. Molecular sensors and related methods. U.S. Patent Application No. 20190094175, 2019.
15. Matthews B W. Studies on protein stability with T4 lysozyme. Adv Protein Chem. 1995; 46: 249-78.
16. Yutani K, Ogasahara K, Tsujita T, Sugino Y. Dependence of conformational stability on hydrophobicity of the amino acid residue in a series of variant proteins substituted at a unique position of tryptophan synthase alpha subunit. Proc Natl Acad Sci USA. 1987; 84: 4441-4.
17. Klein I B, Kirsch J F. The activation of papain and the inhibition of the active enzyme by carbonyl reagents. J Biol Chem. 1969; 244: 5928-35.
18. Liu H, May K. Disulfide bond structures of IgG molecules: structural variations, chemical modifications and possible impacts to stability and biological function. MAbs. 2012; 4: 17-23.
19. Costa S, Almeida A, Castro A, Domingues L. Fusion tags for protein solubility, purification and immunogenicity in Escherichia coli: the novel Fh8 system. Front Microbiol. 2014; 5: 63.
20. Wang Y, Prosen D E, Mei L, Sullivan J C, Finney M, Vander Horn P B. A novel strategy to engineer DNA polymerases for enhanced processivity and improved performance in vitro. Nucleic Acids Res. 2004; 32: 1197-207.
21. Takahashi H, Yamazaki H, Akanuma S, Kanahara H, Saito T, Chimuro T, et al. Preparation of Phi29 DNA polymerase free of amplifiable DNA using ethidium monoazide, an ultraviolet-free light-emitting diode lamp and trehalose. PLoS One. 2014; 9: e82624.
22. Chen B, Long Q, Zhao Y, Wu Y, Ge S, Wang P, et al. Sulfone-Based Probes Unraveled Dihydrolipoamide S-Succinyltransferase as an Unprecedented Target in Phytopathogens. J Agric Food Chem. 2019; 67: 6962-9.
23. Liu H, Naismith J H. An efficient one-step site-directed deletion, insertion, single and multiple-site plasmid mutagenesis protocol. BMC Biotechnol. 2008; 8: 91.
24. Zhang P, Lei M. Devices, methods and chemical reagents for biopolymer sequencing. U.S. Patent Application No. 62/794,096, 2019.
25. Zhang P, Krstic P, Lei M. Engineered DNA for molecular electronics. U.S. Patent Application No. 62/938,084, 2019.
This application claims the benefit of U.S. Provisional Patent Application No. 62/968,929, filed Jan. 31, 2020, which is incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/015965 | 1/31/2021 | WO |
Number | Date | Country | |
---|---|---|---|
62968929 | Jan 2020 | US |