Provided herein are compositions, systems, and methods employing hyper-thermostable lysine-mutant ssDNA/RNA ligases that possess both ssRNA ligase and ssDNA ligase activity. In certain embodiments, such hyper-thermostable lysine-mutant ssDNA/RNA ligases are used to ligate a first single-strand nucleic acid sequence with a 5′ adenylated end to a second single-strand nucleic acid sequence (e.g., at a temperature of at least 75° C.) to form a ligated nucleic acid sequence. In further embodiments, the ligated nucleic acid sequence is amplified and/or sequenced.
DNA or RNA ligases are divalent metal ion dependent enzymes that utilize ATP or NAD+ to catalyze phosphodiester bond formation between adjacent polynucleotide termini possessing a 3′-hydroxyl and a 5′-phosphate (Tomkinson et al., PNAS, 2006; Silber et al., PNAS, 1972). Depending on their species of origin, naturally occurring DNA or RNA ligases may have many unique properties, for example, substrate specificity, sequence and domain organization, and optimal reaction conditions such as pH, co-factor dependence, temperature, and salt tolerance. Their activities in vitro have also been exploited in numerous molecular biology protocols, making them critical tools for modern biotechnology.
All known DNA and RNA ligases perform the catalysis via a common pathway which involves three nucleotidyl transfer reactions (Lehman et al., Science, 1974; Lindahl et al., Annu Rev Biochem, 1992). In the case of ATP-dependent DNA or RNA ligases, the first step (step 1) involves the attack on the α-phosphate of ATP by ligase, which results in release of pyrophosphate and formation of a ligase-AMP intermediate. AMP is linked covalently to the amino group of a lysine residue within a conserved sequence motif. In the second step (step 2), the AMP nucleotide is transferred to the 5′-phosphate-terminated DNA or RNA strand to form a 5′-App-DNA/RNA intermediate. In the third and final step (step 3), attack by the 3′-OH strand on the 5′-App-DNA/RNA end joins the two polynucleotides and liberates AMP.
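The three-step pathway described above can be sketched as a small state model. This is only an illustrative sketch: the class and function names (`Ligase`, `Strand`, `step1_adenylate_ligase`, and so on) are hypothetical, and the model tracks chemical states only, not kinetics or reversibility.

```python
from dataclasses import dataclass


@dataclass
class Ligase:
    # True while AMP is covalently linked to the conserved-motif lysine
    amp_bound: bool = False


@dataclass
class Strand:
    five_prime: str = "phosphate"  # "phosphate" or "App" (5'-adenylated)
    three_prime: str = "OH"


def step1_adenylate_ligase(ligase: Ligase) -> str:
    """Step 1: ligase + ATP -> ligase-AMP + pyrophosphate."""
    if ligase.amp_bound:
        raise ValueError("ligase already carries AMP")
    ligase.amp_bound = True
    return "PPi"  # released by-product


def step2_adenylate_donor(ligase: Ligase, donor: Strand) -> None:
    """Step 2: AMP moves from the ligase to the donor's 5'-phosphate."""
    if not ligase.amp_bound or donor.five_prime != "phosphate":
        raise ValueError("step 2 needs ligase-AMP and a 5'-phosphate donor")
    ligase.amp_bound = False
    donor.five_prime = "App"


def step3_join(donor: Strand, acceptor: Strand) -> str:
    """Step 3: the acceptor 3'-OH attacks the 5'-App end, releasing AMP."""
    if donor.five_prime != "App" or acceptor.three_prime != "OH":
        raise ValueError("step 3 needs a 5'-App donor and a 3'-OH acceptor")
    donor.five_prime = "joined"
    acceptor.three_prime = "joined"
    return "AMP"  # released by-product


# Full pathway starting from a 5'-phosphorylated donor
ligase, donor, acceptor = Ligase(), Strand(), Strand()
released = [step1_adenylate_ligase(ligase)]
step2_adenylate_donor(ligase, donor)
released.append(step3_join(donor, acceptor))
```

A pre-adenylated donor, `Strand(five_prime="App")`, can be passed directly to `step3_join`, which mirrors the case where steps 1 and 2 are not needed.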
DNA ligases catalyze the formation of phosphodiester bonds at single-stranded nicks in double-stranded DNA. In vivo, this activity is critical for maintaining genomic integrity during DNA replication, DNA recombination, and DNA excision repair. Among them, T4 DNA ligase has been widely used in vitro to join double-stranded breaks with cohesive and blunt ends. RNA ligases are typically classified into at least two broad families. The Rnl1 family includes the T4 RNA ligase 1 (Rnl1) and the tRNA ligases from fungi, yeasts, and plants. These enzymes repair breaks in single-stranded RNA. The Rnl2 family is represented by the bacteriophage T4 RNA ligase 2 (T4Rnl2) and the RNA-editing ligases from the protozoans Trypanosoma and Leishmania (Ho and Shuman, PNAS, 2002). In vivo, these enzymes primarily seal nicks in double-stranded RNA. In vitro, T4Rnl2 and its mutants have been shown to join single-stranded RNA (Yin S et al., JBC, 2003), as well as to join the 3′-end of single-strand RNA to the 5′-end of a single-strand DNA adaptor (Viollet et al., 2011, BMC Biotechnol).
It has been observed that single-strand RNA ligases can sometimes accept single-strand DNA as a substrate, albeit with lower ligation efficiency in some cases. For example, T4 RNA ligase 1 (T4Rnl1) has long been used to ligate single-strand RNA (ssRNA) and, with lower efficiency, ssDNA (Lau et al., 2001). Homologs within the T4 RNA ligase 1 family also exhibit single-strand DNA ligation activity. For example, the thermostable single-strand DNA ligase from Thermus scotoductus bacteriophage TS2126, which is a T4Rnl1 homolog, can ligate ssDNA at elevated temperatures around 65° C. (Blondal et al., NAR, 2005) and was marketed as CircLigase by EpiCenter Biotechnologies (WO2004/027056A2). Another T4Rnl1 homolog, from Rhodothermus marinus bacteriophage RM378, was also reported to ligate both ssRNA and ssDNA (Blondal et al., NAR, 2002). Due to its unique ssDNA ligation activity at relatively high temperature (~65° C.), CircLigase quickly found use in many applications, such as library preparation for high-throughput next-generation sequencing (Gansauge M T, Nat. Protoc., 2013).
Advancement in next-generation sequencing (NGS) has transformed biomedical and clinical research with accelerating momentum. Ligases are one of the most crucial families of enzymes used in the NGS library preparation workflow. The significance of the ligation step is that it attaches short DNA or RNA adaptors of known sequence to the ends of the unknown library sequences, thus providing priming sites that facilitate downstream steps such as PCR or primer extension. For DNA-sequencing applications, the ligation step in the NGS library preparation workflow commonly occurs between double-stranded library nucleic acid fragments and a double-stranded adaptor, but is also possible between single-stranded nucleic acid fragments and a 5′-phosphorylated single-stranded adaptor. The double-stranded ligation reaction is often catalyzed by T4 DNA ligase and is usually carried out at ambient or even lower (e.g., 16° C.) temperature. Single-stranded DNA ligation is currently catalyzed by enzymes such as CircLigase, with reaction temperatures as high as 65° C. (Gansauge M T, Nat. Protoc., 2013). In other applications such as small RNA sequencing, T4Rnl1 and T4Rnl2 mutants are commonly used to attach a single-stranded DNA adaptor to single-strand RNA library fragments, usually at 37° C. or lower.
Provided herein are compositions, systems, and methods employing hyper-thermostable lysine-mutant ssDNA/RNA ligases that possess both ssRNA ligase and ssDNA ligase activity. In certain embodiments, such hyper-thermostable lysine-mutant ssDNA/RNA ligases are used to ligate a first single stranded nucleic acid sequence with a 5′ adenylated end to a second single stranded nucleic acid sequence (e.g., at a temperature of at least 75° C.) to form a ligated nucleic acid sequence. In further embodiments, the ligated nucleic acid sequence is amplified and/or sequenced.
In some embodiments, provided herein are methods of ligating single-stranded nucleic acid comprising: a) combining in a reaction mixture: i) a hyper-thermostable lysine-mutant ssDNA/RNA ligase which is a mutated version of a precursor hyper-thermostable ssRNA ligase (e.g., a precursor ssRNA ligase shown in Table 2, or truncated or mutated versions thereof), wherein the precursor hyper-thermostable ssRNA ligase has a Motif I EKx(D/N/H)G (SEQ ID NO:32) and possesses ssRNA ligase activity, but not ssDNA ligase activity, at a temperature of at least 75° C. (e.g., at least 75 . . . 85 . . . 95 . . . or 100° C., or between 75-100° C.), and wherein the hyper-thermostable lysine-mutant ssDNA/RNA ligase has an amino acid substitution at the K (lysine) in the Motif I, and possesses both ssRNA ligase and ssDNA ligase activity at a temperature of at least 75° C. (e.g., at least 75 . . . 85 . . . 95 . . . or 100° C., or between 75-100° C.), ii) a first single stranded nucleic acid sequence with a 5′ end base that is adenylated (or becomes adenylated while in the reaction mixture), and iii) a second single stranded nucleic acid sequence with a 3′ end base; and b) incubating the reaction mixture at a temperature of at least 75° C. (e.g., at least 75 . . . 85 . . . 95 . . . or 100° C., or between 75-100° C.) such that the hyper-thermostable lysine-mutant ssDNA/RNA ligase ligates the 5′ adenylated end of the first single stranded nucleic acid sequence to the 3′ end of the second single stranded nucleic acid sequence to form a ligated nucleic acid sequence. In certain embodiments, the incubating of the reaction mixture is at a temperature of at least 85° C. or at least 95° C. In particular embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligase is an adenylation-deficient ATP-dependent ligase (i.e., a ligase that cannot form the AMP-ligase intermediate by reacting with ATP).
In particular embodiments, provided herein are systems and kits comprising: a) a hyper-thermostable lysine-mutant ssDNA/RNA ligase which is a mutated version of a precursor hyper-thermostable ssRNA ligase (e.g., a precursor ssRNA ligase shown in Table 2, or truncated or mutated versions thereof), wherein the precursor hyper-thermostable ssRNA ligase has a Motif I EKx(D/N/H)G (SEQ ID NO:32) and possesses ssRNA ligase activity, but not ssDNA ligase activity, at a temperature of at least 75° C., and wherein the hyper-thermostable lysine-mutant ssDNA/RNA ligase has an amino acid substitution at the K (lysine) in the Motif I, and possesses both ssRNA ligase and ssDNA ligase activity at a temperature of at least 75° C., b) a first single stranded nucleic acid sequence with a 5′ end base that is adenylated; and c) a second single stranded nucleic acid sequence with a 3′ end base. In particular embodiments, the kits and systems further comprise a first container and a second container, wherein the hyper-thermostable lysine-mutant ssDNA/RNA ligase is in the first container, and the first and/or second single-stranded nucleic acid sequence is in the second container.
In certain embodiments, provided herein are compositions comprising: a hyper-thermostable lysine-mutant ssDNA/RNA ligase which is a mutated version of a precursor hyper-thermostable ssRNA ligase (e.g., a precursor ssRNA ligase shown in Table 2 or truncated or mutated versions thereof), wherein the precursor hyper-thermostable ssRNA ligase has a Motif I EKx(D/N/H)G (SEQ ID NO:32) and possesses ssRNA ligase activity, but not ssDNA ligase activity, at a temperature of at least 75° C. (e.g., at a temperature between 75-85 or 75-100° C.), and wherein the hyper-thermostable lysine-mutant ssDNA/RNA ligase has an amino acid substitution at the K (lysine) in the Motif I, and possesses both ssRNA ligase and ssDNA ligase activity at a temperature of at least 75° C. (e.g., at a temperature between 75-85 or 75-100° C.). In particular embodiments, the compositions further comprise at least one of the following: a) a first single stranded nucleic acid sequence with a 5′ end base that is adenylated, and b) a second single stranded nucleic acid sequence with a 3′ end base.
In particular embodiments, the 5′ end base of the first single stranded nucleic acid sequence is a DNA base, and wherein the 3′ end base of the second single stranded nucleic acid sequence is an RNA base or a DNA base. In other embodiments, the 5′ end base of the first single stranded nucleic acid sequence is a DNA base or an RNA base, and wherein the 3′ end base of the second single stranded nucleic acid sequence is a DNA base. In some embodiments, all or nearly all of the bases in the first single stranded nucleic acid sequence are DNA bases. In particular embodiments, all of the bases, or nearly all of the bases, in the second single stranded nucleic acid sequence are DNA bases.
In certain embodiments, the amino acid substitution for the K (lysine) is an amino acid selected from the group consisting of: alanine (A), serine (S), cysteine (C), valine (V), threonine (T), and glycine (G). In other embodiments, the precursor hyper-thermostable ssRNA ligase is a wild-type hyper-thermostable ssRNA ligase. In further embodiments, the wild-type hyper-thermostable ssRNA ligase is from a species selected from the group consisting of: Thermococcus kodakarensis, Pyrococcus yayanosii, Pyrococcus horikoshii, Pyrococcus abyssi, Pyrococcus furiosus, Hyperthermus butylicus, Aeropyrum pernix, Staphylothermus marinus, Pyrolobus fumarii, and Aquifex aeolicus. In additional embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligase is encoded by one of the amino acid sequences shown in SEQ ID NO:1-11. In further embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligase is encoded by an amino acid sequence that has at least 85%, 95%, or 99% sequence identity to one of the amino acid sequences shown in SEQ ID NO:1-11 (e.g., at least 85% . . . 90% . . . 95% . . . 98% . . . 99% . . . 99.5%). In certain embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligase is encoded by an N-terminal or C-terminal truncated amino acid sequence shown in SEQ ID NO:1-11 (e.g., where the SEQ ID NO is missing 1-40 amino acids at either the N or C terminal, yet still has the same ligase activity as the full length sequence).
In certain embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligase comprises a Motif sequence selected from the group consisting of: EGx(D/N/H)G (SEQ ID NO:12), EPx(D/N/H)G (SEQ ID NO:13), EAx(D/N/H)G (SEQ ID NO:14), EVx(D/N/H)G (SEQ ID NO:15), ELx(D/N/H)G (SEQ ID NO:16), EIx(D/N/H)G (SEQ ID NO:17), EMx(D/N/H)G (SEQ ID NO:18), ECx(D/N/H)G (SEQ ID NO:19), EFx(D/N/H)G (SEQ ID NO:20), EYx(D/N/H)G (SEQ ID NO:21), EWx(D/N/H)G (SEQ ID NO:22), EHx(D/N/H)G (SEQ ID NO:23), ERx(D/N/H)G (SEQ ID NO:24), EQx(D/N/H)G (SEQ ID NO:25), ENx(D/N/H)G (SEQ ID NO:26), EEx(D/N/H)G (SEQ ID NO:27), EDx(D/N/H)G (SEQ ID NO:28), ESx(D/N/H)G (SEQ ID NO:29), ETx(D/N/H)G (SEQ ID NO:30), and Ex(D/N/H)G (SEQ ID NO:31, point deletion of lysine); wherein x is any amino acid. In particular embodiments, x is an amino acid with a small or smaller side chain compared to other amino acids.
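The motif patterns listed above map naturally onto regular expressions. The following is a minimal sketch, assuming one-letter amino acid codes and treating x as any single residue; the names `MOTIF_I`, `SUBSTITUTES`, and `mutant_motif_pattern` are hypothetical, and the toy peptides in the usage lines are invented for illustration and are not sequences from this disclosure.

```python
import re

# Precursor Motif I: E-K-x-(D/N/H)-G, with x matching any single residue
MOTIF_I = re.compile(r"EK.[DNH]G")

# The 19 residues that can replace the lysine (every standard residue except K)
SUBSTITUTES = "GPAVLIMCFYWHRQNEDST"


def mutant_motif_pattern(substitute: str) -> "re.Pattern[str]":
    """Build the pattern for a mutant motif.

    `substitute` is a one-letter code from SUBSTITUTES, or "" for the
    point deletion of lysine (Ex(D/N/H)G).
    """
    if substitute != "" and substitute not in SUBSTITUTES:
        raise ValueError(f"not a valid lysine replacement: {substitute!r}")
    return re.compile(f"E{substitute}.[DNH]G")


# Hypothetical toy peptides, for illustration only
wild_type_hit = MOTIF_I.search("MAEKYDGTL")
alanine_hit = mutant_motif_pattern("A").search("MAEAYDGTL")
deletion_hit = mutant_motif_pattern("").search("MAEYDGTL")
```

Note that the deletion pattern is only four residues long and is therefore much less specific than the five-residue patterns; in practice a match would need to be confirmed against the aligned position of Motif I.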
In particular embodiments, the first single stranded nucleic acid sequence is a sequencing adapter (e.g., a next generation sequencing adapter). In some embodiments, the second single stranded nucleic acid sequence comprises a sequencing library fragment. In particular embodiments, the methods further comprise: c) sequencing the ligated nucleic acid sequence.
In some embodiments, provided herein are methods comprising: a) combining in a reaction mixture: i) a single-stranded RNA ligase from Pyrococcus furiosus (PfuRnl2, WP_014835129.1) (or truncated or mutated version thereof), ii) a single-stranded nucleic acid sequence with an un-adenylated 5′ end, and iii) ATP molecules; and b) incubating the reaction mixture under conditions such that the PfuRnl2 adenylates the un-adenylated 5′ end of the single-stranded nucleic acid sequence, thereby generating a first single stranded nucleic acid sequence with a 5′ end base that is adenylated. In particular embodiments, the incubating is at a temperature of at least 75° C. (e.g., at least 75 . . . 85 . . . 95 . . . or 100° C.; or between 75-85 or between 75-100° C.). In further embodiments, the combining in the reaction mixture further includes: iv) a second single stranded nucleic acid sequence with a 3′ end base, and v) a hyper-thermostable lysine-mutant ssDNA/RNA ligase which is a mutated version of a precursor hyper-thermostable ssRNA ligase, wherein the precursor hyper-thermostable ssRNA ligase has a Motif I EKx(D/N/H)G (SEQ ID NO:32) and possesses ssRNA ligase activity, but not ssDNA ligase activity, at a temperature of at least 75° C., and wherein the hyper-thermostable lysine-mutant ssDNA/RNA ligase has an amino acid substitution at the K (lysine) in the Motif I, and possesses both ssRNA ligase and ssDNA ligase activity at a temperature of at least 75° C. In some embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligase ligates the 5′ adenylated end of the first single stranded nucleic acid sequence to the 3′ end of the second single stranded nucleic acid sequence to form a ligated nucleic acid sequence.
In certain embodiments, provided herein are compositions, kits, or systems comprising: a) a single-stranded RNA ligase from Pyrococcus furiosus (PfuRnl2, WP_014835129.1) (or truncated or mutated version thereof with the same or similar activity), and b) a hyper-thermostable lysine-mutant ssDNA/RNA ligase which is a mutated version of a precursor hyper-thermostable ssRNA ligase, wherein the precursor hyper-thermostable ssRNA ligase has a Motif I EKx(D/N/H)G (SEQ ID NO:32) and possesses ssRNA ligase activity, but not ssDNA ligase activity, at a temperature of at least 75° C., and wherein the hyper-thermostable lysine-mutant ssDNA/RNA ligase has an amino acid substitution at the K (lysine) in the Motif I, and possesses both ssRNA ligase and ssDNA ligase activity at a temperature of at least 75° C. In other embodiments, the compositions, kits, and systems further comprise: c) a single-stranded nucleic acid sequence with an un-adenylated 5′ end.
In this application, the 5′-adenylated end of ssDNA or ssRNA is referred to as the “donor” end, and the 3′-OH end of ssDNA or ssRNA is referred to as the “acceptor” end.
Provided herein are compositions, systems, and methods employing hyper-thermostable lysine-mutant ssDNA/RNA ligases that possess both ssRNA ligase and ssDNA ligase activity. In certain embodiments, such hyper-thermostable lysine-mutant ssDNA/RNA ligases are used to ligate a first single stranded nucleic acid sequence with a 5′ adenylated end to a second single stranded nucleic acid sequence (e.g., at a temperature of at least 75° C.) to form a ligated nucleic acid sequence. In further embodiments, the ligated nucleic acid sequence is amplified and/or sequenced.
In certain embodiments, the hyper-thermostable lysine-mutants are as provided in SEQ ID NOs:1-11 in Table 1 below, or N- or C-terminal truncated versions thereof.
Table 1 (species of origin):
Thermococcus kodakarensis
Pyrococcus yayanosii
Pyrococcus horikoshii
Pyrococcus abyssi
Pyrococcus furiosus
Hyperthermus butylicus
Aeropyrum pernix
Staphylothermus marinus
Pyrolobus fumarii
Aquifex aeolicus
In some embodiments, the sequences in Table 1 above or Table 2 below are used to perform a sequence search (e.g., using BLAST) to find other hyper-thermostable ssRNA ligases from other species (e.g., by finding those with 30% . . . 50% . . . 60% or more homology). Once such ligases are identified, Motif I in such sequences can be located, and the lysine (K) in such Motif I can be mutated to another amino acid, preferably an alanine (A), serine (S), cysteine (C), valine (V), threonine (T), or glycine (G). Such candidate enzymes can then be screened for ssDNA and ssRNA ligase activities (and thermostability) using, for example, the same procedure as in Example 1 below. In this regard, finding and designing hyper-thermostable lysine-mutant ssDNA/RNA ligases may be readily accomplished.
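The locate-and-mutate step described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the candidate sequence has already been retrieved (the BLAST search itself is not shown), the function name `mutate_motif_lysine` is hypothetical, and the toy sequence in the usage lines is invented rather than taken from Table 1 or Table 2.

```python
import re
from typing import Tuple

# Motif I, E-K-x-(D/N/H)-G, with the catalytic lysine captured as group 1
MOTIF_I = re.compile(r"E(K).[DNH]G")


def mutate_motif_lysine(protein: str, new_residue: str = "A") -> Tuple[str, int]:
    """Replace the Motif I lysine and return (mutant sequence, 1-based position).

    Raises ValueError unless exactly one Motif I match is found, so that an
    ambiguous sequence is reviewed by hand instead of being mutated blindly.
    """
    if len(new_residue) != 1 or new_residue == "K":
        raise ValueError("new_residue must be a single non-lysine residue")
    matches = list(MOTIF_I.finditer(protein))
    if len(matches) != 1:
        raise ValueError(f"expected one Motif I match, found {len(matches)}")
    k_index = matches[0].start(1)  # 0-based index of the lysine
    mutant = protein[:k_index] + new_residue + protein[k_index + 1:]
    return mutant, k_index + 1


# Hypothetical toy sequence, for illustration only
mutant, position = mutate_motif_lysine("MSTEKHDGSLV", "A")
```

The returned position can be used to name the mutant in the usual wild-type residue, position, new residue format (K5A for the toy sequence above); candidates built this way would still need to be screened experimentally as the text describes.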
In certain embodiments, the hyper-thermostable lysine-mutant ssDNA/RNA ligases are used in sequencing methods, such as in attaching adapters to library fragments for subsequent sequencing. For example, in some embodiments, the disclosure provided herein finds use in a Second Generation (a.k.a. Next Generation or Next-Gen), Third Generation (a.k.a. Next-Next-Gen), or Fourth Generation (a.k.a. N3-Gen) sequencing technology including, but not limited to, pyrosequencing, sequencing-by-ligation, single molecule sequencing, sequence-by-synthesis (SBS), semiconductor sequencing, massive parallel clonal, massive parallel single molecule SBS, massive parallel single molecule real-time, massive parallel single molecule real-time nanopore technology, etc. Morozova and Marra provide a review of some such technologies in Genomics, 92: 255 (2008), herein incorporated by reference in its entirety.
Any number of DNA sequencing techniques are suitable, including fluorescence-based sequencing methodologies (See, e.g., Birren et al., Genome Analysis: Analyzing DNA, 1, Cold Spring Harbor, N.Y.; herein incorporated by reference in its entirety). In some embodiments, the present disclosure finds use in automated sequencing techniques understood in the art. In some embodiments, the present technology finds use in parallel sequencing of partitioned amplicons (PCT Publication No: WO2006084132, herein incorporated by reference in its entirety). In some embodiments, the technology finds use in DNA sequencing by parallel oligonucleotide extension (See, e.g., U.S. Pat. No. 5,750,341, and U.S. Pat. No. 6,306,597, both of which are herein incorporated by reference in their entireties). Additional examples of sequencing techniques in which the technology finds use include the Church polony technology (Mitra et al., 2003, Analytical Biochemistry 320, 55-65; Shendure et al., 2005 Science 309, 1728-1732; U.S. Pat. No. 6,432,360, U.S. Pat. No. 6,485,944, U.S. Pat. No. 6,511,803; all of which are herein incorporated by reference in their entireties), the 454 picotiter pyrosequencing technology (Margulies et al., 2005 Nature 437, 376-380; US 20050130173; herein incorporated by reference in their entireties), the Solexa single base addition technology (Bennett et al., 2005, Pharmacogenomics, 6, 373-382; U.S. Pat. No. 6,787,308; U.S. Pat. No. 6,833,246; herein incorporated by reference in their entireties), the Lynx massively parallel signature sequencing technology (Brenner et al. (2000). Nat. Biotechnol. 18:630-634; U.S. Pat. No. 5,695,934; U.S. Pat. No. 5,714,330; all of which are herein incorporated by reference in their entireties), and the Adessi PCR colony technology (Adessi et al. (2000). Nucleic Acid Res. 28, E87; WO 00018957; herein incorporated by reference in its entirety).
Next-generation sequencing (NGS) methods share the common feature of massively parallel, high-throughput strategies, with the goal of lower costs in comparison to older sequencing methods (see, e.g., Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; each herein incorporated by reference in their entirety). NGS methods can be broadly divided into those that typically use template amplification and those that do not. Amplification-requiring methods include pyrosequencing commercialized by Roche as the 454 technology platforms (e.g., GS 20 and GS FLX), Life Technologies/Ion Torrent, the Solexa platform commercialized by Illumina, GnuBio, and the Supported Oligonucleotide Ligation and Detection (SOLiD) platform commercialized by Applied Biosystems. Non-amplification approaches, also known as single-molecule sequencing, are exemplified by the HeliScope platform commercialized by Helicos BioSciences, and emerging platforms commercialized by VisiGen, Oxford Nanopore Technologies Ltd., and Pacific Biosciences, respectively.
In pyrosequencing (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. No. 6,210,891; U.S. Pat. No. 6,258,568; each herein incorporated by reference in its entirety), template DNA is fragmented, end-repaired, ligated to adapters, and clonally amplified in-situ by capturing single template molecules with beads bearing oligonucleotides complementary to the adaptors. Each bead bearing a single template type is compartmentalized into a water-in-oil microvesicle, and the template is clonally amplified using a technique referred to as emulsion PCR. The emulsion is disrupted after amplification and beads are deposited into individual wells of a picotitre plate functioning as a flow cell during the sequencing reactions. Ordered, iterative introduction of each of the four dNTP reagents occurs in the flow cell in the presence of sequencing enzymes and luminescent reporter such as luciferase. In the event that an appropriate dNTP is added to the 3′ end of the sequencing primer, the resulting production of ATP causes a burst of luminescence within the well, which is recorded using a CCD camera. It is possible to achieve read lengths greater than or equal to 400 bases, and 10^6 sequence reads can be achieved, resulting in up to 500 million base pairs (Mb) of sequence.
In the Solexa/Illumina platform (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. No. 6,833,246; U.S. Pat. No. 7,115,400; U.S. Pat. No. 6,969,488; each herein incorporated by reference in its entirety), sequencing data are produced in the form of shorter-length reads. In this method, single-stranded fragmented DNA is end-repaired to generate 5′-phosphorylated blunt ends, followed by Klenow-mediated addition of a single A base to the 3′ end of the fragments. A-addition facilitates addition of T-overhang adaptor oligonucleotides, which are subsequently used to capture the template-adaptor molecules on the surface of a flow cell that is studded with oligonucleotide anchors. The anchor is used as a PCR primer, but because of the length of the template and its proximity to other nearby anchor oligonucleotides, extension by PCR results in the “arching over” of the molecule to hybridize with an adjacent anchor oligonucleotide to form a bridge structure on the surface of the flow cell. These loops of DNA are denatured and cleaved. Forward strands are then sequenced with reversible dye terminators. The sequence of incorporated nucleotides is determined by detection of post-incorporation fluorescence, with each fluor and block removed prior to the next cycle of dNTP addition. Sequence read length ranges from 36 nucleotides to over 250 nucleotides, with overall output exceeding 1 billion nucleotide pairs per analytical run.
Sequencing nucleic acid molecules using SOLiD technology (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. No. 5,912,148; U.S. Pat. No. 6,130,073; each herein incorporated by reference in their entirety) also involves fragmentation of the template, ligation to oligonucleotide adapters, attachment to beads, and clonal amplification by emulsion PCR. Following this, beads bearing template are immobilized on a derivatized surface of a glass flow-cell, and a primer complementary to the adaptor oligonucleotide is annealed. However, rather than utilizing this primer for 3′ extension, it is instead used to provide a 5′ phosphate group for ligation to interrogation probes containing two probe-specific bases followed by 6 degenerate bases and one of four fluorescent labels. In the SOLiD system, interrogation probes have 16 possible combinations of the two bases at the 3′ end of each probe, and one of four fluors at the 5′ end. Fluor color, and thus identity of each probe, corresponds to specific color-space coding schemes. Multiple rounds (usually 7) of probe annealing, ligation, and fluor detection are followed by denaturation, and then a second round of sequencing using a primer that is offset by one base relative to the initial primer. In this manner, the template sequence can be computationally re-constructed, and template bases are interrogated twice, resulting in increased accuracy. Sequence read length averages 35 nucleotides, and overall output exceeds 4 billion bases per sequencing run.
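The mapping of 16 dinucleotides onto four fluors can be illustrated with the commonly described two-base color code, in which each base is assigned a 2-bit value and the color is the XOR of adjacent bases. This is a sketch under that assumption; the specific bit assignment and the function names are not taken from the text above.

```python
# Assumed 2-bit encoding of bases; the color of a dinucleotide is the XOR
# of its two base codes, giving 4 colors shared by 16 dinucleotides.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def encode_colors(seq: str) -> list:
    """Return the color (0-3) for each adjacent pair of bases in seq."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list) -> str:
    """Rebuild the base sequence from a known first base and a color list."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_BASE[BASE_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
decoded = decode_colors("A", colors)
```

Because each base contributes to two adjacent colors, a single isolated color error is inconsistent with its neighbors under this scheme, which is one way to read the statement that template bases are interrogated twice.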
In certain embodiments, the technology described herein finds use in nanopore sequencing (see, e.g., Astier et al., J. Am. Chem. Soc. 2006 Feb. 8; 128(5):1705-10, herein incorporated by reference). The theory behind nanopore sequencing has to do with what occurs when a nanopore is immersed in a conducting fluid and a potential (voltage) is applied across it. Under these conditions a slight electric current due to conduction of ions through the nanopore can be observed, and the amount of current is exceedingly sensitive to the size of the nanopore. As each base of a nucleic acid passes through the nanopore, this causes a change in the magnitude of the current through the nanopore that is distinct for each of the four bases, thereby allowing the sequence of the DNA molecule to be determined.
In certain embodiments, the technology described herein finds use in HeliScope by Helicos BioSciences (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. No. 7,169,560; U.S. Pat. No. 7,282,337; U.S. Pat. No. 7,482,120; U.S. Pat. No. 7,501,245; U.S. Pat. No. 6,818,395; U.S. Pat. No. 6,911,345; each herein incorporated by reference in their entirety). Template DNA is fragmented and polyadenylated at the 3′ end, with the final adenosine bearing a fluorescent label. Denatured polyadenylated template fragments are ligated to poly(dT) oligonucleotides on the surface of a flow cell. Initial physical locations of captured template molecules are recorded by a CCD camera, and then label is cleaved and washed away. Sequencing is achieved by addition of polymerase and serial addition of fluorescently-labeled dNTP reagents. Incorporation events result in fluor signal corresponding to the dNTP, and signal is captured by a CCD camera before each round of dNTP addition. Sequence read length ranges from 25-50 nucleotides, with overall output exceeding 1 billion nucleotide pairs per analytical run.
The Ion Torrent technology is a method of DNA sequencing based on the detection of hydrogen ions that are released during the polymerization of DNA (see, e.g., Science 327(5970): 1190 (2010); U.S. Pat. Appl. Pub. Nos. 20090026082, 20090127589, 20100301398, 20100197507, 20100188073, and 20100137143, incorporated by reference in their entireties for all purposes). A microwell contains a template DNA strand to be sequenced. Beneath the layer of microwells is a hypersensitive ISFET ion sensor. All layers are contained within a CMOS semiconductor chip, similar to that used in the electronics industry. When a dNTP is incorporated into the growing complementary strand, a hydrogen ion is released, which triggers the hypersensitive ion sensor. If homopolymer repeats are present in the template sequence, multiple dNTP molecules will be incorporated in a single cycle. This leads to a corresponding number of released hydrogen ions and a proportionally higher electronic signal. This technology differs from other sequencing technologies in that no modified nucleotides or optics are used. The per-base accuracy of the Ion Torrent sequencer is ~99.6% for 50 base reads, with ~100 Mb to 100 Gb generated per run. The read-length is 100-300 base pairs. The accuracy for homopolymer repeats of 5 repeats in length is ~98%. The benefits of ion semiconductor sequencing are rapid sequencing speed and low upfront and operating costs.
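The proportional homopolymer signal described above can be illustrated with an idealized flow model. This is a noise-free sketch, not a description of the instrument's signal processing: the flow order "TACG", the `max_flows` cap, and the function name `flow_signals` are assumptions made for the example.

```python
def flow_signals(growing_strand: str, flow_order: str = "TACG",
                 max_flows: int = 400) -> list:
    """Return (nucleotide, incorporations) for each successive dNTP flow.

    `growing_strand` is the sequence of the strand being synthesized.
    A flow that meets a homopolymer run incorporates the whole run, so the
    count (and the idealized hydrogen-ion signal) scales with run length.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(growing_strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(growing_strand) and growing_strand[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
        flow += 1
    return signals


signals = flow_signals("TTAGGGC")
```

In this toy run the T flow reports 2 and the G flow reports 3, while flows that find no matching base report 0; telling a run of 5 from a run of 6 in a real, noisy signal is the harder problem behind the lower homopolymer accuracy quoted above.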
The technology disclosed herein finds use in another nucleic acid sequencing approach developed by Stratos Genomics, Inc. and involves the use of Xpandomers. This sequencing process typically includes providing a daughter strand produced by a template-directed synthesis. The daughter strand generally includes a plurality of subunits coupled in a sequence corresponding to a contiguous nucleotide sequence of all or a portion of a target nucleic acid in which the individual subunits comprise a tether, at least one probe or nucleobase residue, and at least one selectively cleavable bond. The selectively cleavable bond(s) is/are cleaved to yield an Xpandomer of a length longer than the plurality of the subunits of the daughter strand. The Xpandomer typically includes the tethers and reporter elements for parsing genetic information in a sequence corresponding to the contiguous nucleotide sequence of all or a portion of the target nucleic acid. Reporter elements of the Xpandomer are then detected. Additional details relating to Xpandomer-based approaches are described in, for example, U.S. Pat. Pub No. 20090035777, which is incorporated herein in its entirety.
Other single molecule sequencing methods include real-time sequencing by synthesis using a VisiGen platform (Voelkerding et al., Clinical Chem., 55: 641-58, 2009; U.S. Pat. No. 7,329,492; U.S. patent application Ser. No. 11/671956; U.S. patent application Ser. No. 11/781166; each herein incorporated by reference in their entirety) in which immobilized, primed DNA template is subjected to strand extension using a fluorescently-modified polymerase and fluorescent acceptor molecules, resulting in detectable fluorescence resonance energy transfer (FRET) upon nucleotide addition.
In certain embodiments, provided herein are compositions, kits, and systems comprising a hyper-thermostable lysine-mutant ssDNA/RNA ligase encoded by SEQ ID NOs:1-11, or encoded by a sequence with substantial identity with SEQ ID NOs:1-11. As applied to such polypeptides, the term “substantial identity” means that two peptide sequences, when optimally aligned, such as by the programs GAP or BESTFIT using default gap weights, share at least 80 percent sequence identity, or at least 90 percent sequence identity, or at least 95 percent sequence identity or more (e.g., 95% . . . 97% . . . or 99% sequence identity). In particular embodiments, residue positions which are not identical differ by conservative amino acid substitutions. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. In some embodiments, the conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine. In certain embodiments, provided herein are peptides that have substantial identity to at least a portion of the amino acid sequences shown in SEQ ID NOS: 1-11.
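The identity thresholds and side-chain groups described above can be checked with a short sketch. It assumes the two sequences have already been aligned by some external tool and are supplied as equal-length strings with "-" for gaps; it does not reimplement GAP or BESTFIT. Counting every aligned column (including gapped ones) in the denominator is one convention among several and is an assumption here, as are the function names.

```python
# Side-chain groups as listed in the text (one-letter codes)
CONSERVATIVE_GROUPS = [
    set("GAVLI"),  # aliphatic
    set("ST"),     # aliphatic-hydroxyl
    set("NQ"),     # amide-containing
    set("FYW"),    # aromatic
    set("KRH"),    # basic
    set("CM"),     # sulfur-containing
]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both rows are gaps
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_conservative(a: str, b: str) -> bool:
    """True if two different residues fall in the same side-chain group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


# Hypothetical toy alignment, for illustration only
identity = percent_identity("MKTAY", "MKSAY")
conservative = is_conservative("T", "S")
```

For the toy alignment, the single mismatch (T versus S) is a conservative substitution within the aliphatic-hydroxyl group, and the identity is 80 percent, which is at the lowest threshold named in the text.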
First, a series of hyper-thermostable RNA ligases was identified in a search of the art. This list, in Table 2 below, includes hyperthermophilic archaeal species, some of which can endure environments above 100° C.
Thermococcus kodakarensis
Pyrococcus yayanosii
Pyrococcus horikoshii
Pyrococcus abyssi*
Pyrococcus furiosus
Hyperthermus butylicus
Aeropyrum pernix
Staphylothermus marinus
Pyrolobus fumarii
Aquifex aeolicus
Three of the 10 ligases from Table 2 (PhoRnl2, PfuRnl2, HbuRnl2) were synthesized with E. coli-optimized codons, expressed, purified, and examined for their activity on single-strand RNA and DNA substrates at high temperature (90° C. as in
For activity on ssDNA, however, although all enzymes can turn 5′-phosphorylated ssDNA to 5′-adenylated ssDNA, as shown in
Next, one of the RNA ligases (HbuRnl2) was tested for its activity on a 5′-adenylated ssDNA under conditions with or without ATP. As shown in
In step 2 of the ligation, the adenyl group is transferred from the catalytic lysine in the conserved Motif I of the ligase to the 5′-phosphorylated end. The catalytic lysine resides in the conserved motif EKx(D/N/H)G (SEQ ID NO:32). Thus, while the present invention is not limited to any particular mechanism, and an understanding of the mechanism is not necessary to practice the present invention, by mutating the catalytic lysine in Motif I to other amino acids, the step 2 reverse reaction may be blocked or reduced, so that the step 3 ligation, which is the joining between the 3′-OH end and the 5′-adenylated end, can occur.
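As a non-limiting illustration, the Motif I pattern EKx(D/N/H)G can be written as a regular expression to locate a candidate catalytic lysine in a protein sequence and to generate a lysine-substituted variant in silico. The function names and the toy peptide below are hypothetical and are not taken from SEQ ID NOs:1-11; positions are reported as 1-based residue numbers.

```python
import re

# Motif I as written in the text: E, K, any residue (x), one of D/N/H, then G.
# A lookahead is used so that overlapping occurrences would also be reported.
MOTIF_I = re.compile(r"(?=EK[A-Z][DNH]G)")


def find_catalytic_lysines(protein):
    """Return the 1-based position of the K in each Motif I occurrence."""
    # m.start() is the 0-based index of E; K is the next residue,
    # so its 1-based position is m.start() + 2.
    return [m.start() + 2 for m in MOTIF_I.finditer(protein.upper())]


def substitute(protein, position, new_residue, expected="K"):
    """Replace the residue at a 1-based position, checking what is there first."""
    idx = position - 1
    if protein[idx].upper() != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {protein[idx]}"
        )
    return protein[:idx] + new_residue + protein[idx + 1:]


# Hypothetical toy peptide containing one Motif I occurrence (EKTDG).
toy = "MSAEKTDGLLV"
positions = find_catalytic_lysines(toy)
print(positions)  # [5]

mutant = substitute(toy, positions[0], "A")  # lysine-to-alanine substitution
print(mutant)  # MSAEATDGLLV
print(find_catalytic_lysines(mutant))  # []
```

A pattern match only nominates a candidate residue; whether a given lysine is in fact the catalytic one would need to be confirmed by alignment to characterized ligases or by experiment.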
Next, HbuRnl2 was chosen as a target, and mutant HbuRnl2K106A (SEQ ID NO:7) was constructed and purified, and its activity was assayed in step 3 ligation. As shown in
Next, lysine 106 in HbuRnl2 was changed to other amino acids, and the mutants were assayed for their adenylated ssDNA circularization activity. As shown in
The optimal reaction temperature range for HbuRnl2K106A was then examined. As shown in
Due to the intrinsic flexibility of ssDNA/RNA and the proximity of the ends of the same oligonucleotide, intra-molecule circularization by ligation occurs at a much higher frequency than inter-molecule concatenation. This may be part of the reason that TS2126 ligase was named “CircLigase” (WO2004/027056A2). Circularization of ssDNA/RNA can be useful for many applications. For example, circularized oligonucleotides are resistant to exonuclease digestion and can serve as templates for amplification, such as rolling circle amplification. Often, however, it is desirable to promote inter-molecule ssDNA/RNA ligation instead of intra-molecule circularization, for example, when attaching an oligonucleotide adaptor to a library of ssDNA/RNA.
A number of strategies can be used to promote inter-molecule ligation between ssDNA/RNA. For example, the 3′-OH ends of ssDNA/RNA can be modified, such as with an NH2 group or one of many other modification groups, so that the OH group is not present for intra-molecule ligation to occur. In addition, Mn2+ and a ligation enhancer, such as PEG8000, may be used (Zhelkovsky A., et al, BMC Biotechnology, 2012, herein incorporated by reference).
In
Per the above, it was determined that mutant HbuRnl2K106A can catalyze the ligation between two pieces of ssRNA, in which both ligation donor and acceptor are RNA, and it can catalyze the ligation between two pieces of ssDNA, in which both ligation donor and acceptor are DNA. It was further demonstrated that it can ligate an ssRNA donor to an ssDNA acceptor, or an ssDNA donor to an ssRNA acceptor (data not shown). These properties are useful for many applications. For example, current protocols for small RNA cloning and sequencing use T4Rnl2 and its mutants at 37° C. or lower for the attachment of a 5′-adenylated ssDNA adaptor to the 3′-OH of ssRNA. One could use HbuRnl2K106A for the same purpose but at a much higher temperature, with the advantage of minimizing secondary structure.
Per the above, it was shown that PfuRnl2 can efficiently adenylate 5′-phosphorylated ends of ssDNA (
All genes were synthesized as double-stranded DNA fragments with optimized E. coli codon usage and a 6X-His-tag at the N-terminus by IDT (Integrated DNA Technologies, Coralville, Iowa). The DNA was then inserted into the expression vector pTXB1 under the control of the T7 promoter using Gibson Assembly (NEB #E2611). Expression constructs were transformed into the T7 Express E. coli strain (NEB #C2566). For expression in culture, the cells were grown in LB media to OD ~0.6, at which point IPTG was added to a final concentration of 0.5 mM. The induced cultures were then grown with shaking at 20° C. overnight.
For purification, 250 ml of induced cell culture was first pelleted by centrifugation. The cell pellet was then re-suspended in buffer containing 20 mM Tris (pH 7.5), 150 mM NaCl, 1× FastBreak (Promega), and 200 μg/ml lysozyme, and sonicated. Cell lysates were then centrifuged at 15,000 × g for 20 min at 4° C. The cleared lysate was then purified successively on a Ni-NTA column and a heparin column.
Site-directed mutagenesis (e.g., by using New England Biolabs' Q5 mutagenesis kit) was performed on the cloned constructs to introduce mutations at the conserved lysine residue.
Hyperthermostable ssDNA/RNA ligases (e.g., as described in SEQ ID NOS:1-11) can be used in the library preparation process for next-generation high-throughput sequencing. Currently, the dominant library construction methods include a ligation step where double-stranded library fragments are ligated to double-stranded adaptors. Single-strand ligation based methods exist, but they use CircLigase, which has an optimal reaction temperature of around 65° C. (Gansauge M T, Nat. Protoc., 2013).
As illustrated in
After ligation, a cleanup, e.g., by using AMPure beads or other methods well known in the art, is used to recover all fragments ligated with the first adaptor. The recovered fragments are then adenylated, for example, by using the hyperthermostable 5′-adenylase PfuRnl2 in a high concentration of ATP. Then, the 5′-adenylated DNA fragments are ligated with the second adaptor, for example, one with regular 5′-OH and 3′-OH ends. This ligation reaction is catalyzed by one of the mutant ligases herein, such as the hyperthermostable HbuRnl2K106A. After ligation, the ligated DNA fragments are purified and ready for further downstream steps, such as polymerase chain reaction (PCR) or cluster generation.
A number of advantages for a single-strand ligation based library construction exist with the presently disclosed methods, compositions, and systems. First, workflow of NGS sample preparation can be simplified, without end polishing steps. Second, compared with previous protocols, much less bias may exist, especially for long stretches of oligonucleotides in the library, due to high reaction temperature. Third, sequenced reads may be concatenated to form synthetic long reads and assigned to different starting genomic DNA molecules, with the assumption of random breakpoints introduced by the shearing process and relatively deep sequencing coverage so that most of the ssDNA fragments are sequenced (
Blondal et al., Isolation and characterization of a thermostable RNA ligase 1 from a Thermus scotoductus bacteriophage TS2126 with good single-stranded DNA ligation properties. Nucleic Acids Res. 2005 Jan. 7;33(1):135-42.
Blondal et al., Discovery and characterization of a thermostable bacteriophage RNA ligase homologous to T4 RNA ligase 1. Nucleic Acids Res. 2003, 31(24):7247-54.
Brooks et al., The structure of an archaeal homodimeric ligase which has RNA circularization activity. Protein Sci. 2008, 17(8):1336-45.
Gansauge and Meyer, Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat Protoc. 2013, 8(4):737-48.
Ho and Shuman, Bacteriophage T4 RNA ligase 2 (gp24.1) exemplifies a family of RNA ligases found in all phylogenetic domains. Proc Natl Acad Sci U S A. 2002, 99(20):12709-14.
Lau et al., An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science. 2001, 294(5543):858-62.
Lehman I R, DNA ligase: structure, mechanism, and function. Science. 1974 Nov. 29;186(4166):790-7.
Lindahl and Barnes, Mammalian DNA ligases. Annu Rev Biochem. 1992;61:251-81.
Silber et al., Purification and properties of bacteriophage T4-induced RNA ligase, Proceedings of the National Academy of Sciences of the United States of America, vol. 69, no. 10, pp. 3009-3013, 1972.
Tomkinson et al., DNA ligases: structure, reaction mechanism, and function. Chem Rev. 2006, 106(2):687-99.
Torchia et al., Archaeal RNA ligase is a homodimeric protein that catalyzes intramolecular ligation of single-stranded RNA and DNA. Nucleic Acids Res. 2008, 36(19):6218-27.
US2014/0128292 A1 Methods for improving ligation steps to minimize bias during production of libraries for massively parallel sequencing
Viollet et al., T4 RNA ligase 2 truncated active site mutants: improved tools for RNA analysis. BMC Biotechnol. 2011, 11:72.
WO2004/027056A2 Methods of use for thermostable RNA ligases
Yin et al., Structure-function analysis of T4 RNA ligase 2. J Biol Chem. 2003, 278(20):17601-8.
Zhuang et al., Structural bias in T4 RNA ligase-mediated 3′-adapter ligation. Nucleic Acids Res. 2012, 40(7):e54.
Zhelkovsky and McReynolds, Simple and efficient synthesis of 5′ pre-adenylated DNA using thermostable RNA ligase. Nucleic Acids Res. 2011, 39(17):e117.
Zhelkovsky and McReynolds, Structure-function analysis of Methanobacterium thermoautotrophicum RNA ligase—engineering a thermostable ATP independent enzyme. BMC Mol Biol. 2012, 13:24.
All publications and patents mentioned in the specification and/or listed below are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the relevant fields are intended to be within the scope described herein.
The present application claims priority to U.S. Provisional application Ser. No. 62/307,658, filed Mar. 14, 2016, which is herein incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US17/22232 | 3/14/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62307658 | Mar 2016 | US |