The invention relates to the use of specific terminal deoxynucleotidyl transferase (TdT) enzymes or the homologous amino acid sequence of Polμ, Polβ, Polλ, and Polθ of any species or the homologous amino acid sequence of X family polymerases of any species in a method of nucleic acid synthesis, to methods of synthesizing nucleic acids, and to the use of kits comprising said enzymes in a method of nucleic acid synthesis. The invention also relates to the use of terminal deoxynucleotidyl transferases or homologous enzymes and 3′-blocked nucleoside triphosphates in a method of template independent nucleic acid synthesis.
Nucleic acid synthesis is vital to modern biotechnology. The rapid pace of development in the biotechnology arena has been made possible by the scientific community's ability to artificially synthesise DNA, RNA and proteins.
Artificial DNA synthesis allows biotechnology and pharmaceutical companies to develop a range of peptide therapeutics, such as insulin for the treatment of diabetes. It allows researchers to characterise cellular proteins to develop new small molecule therapies for the treatment of diseases our aging population faces today, such as heart disease and cancer. It even paves the way forward to creating life, as the Venter Institute demonstrated in 2010 when they placed an artificially synthesised genome into a bacterial cell.
However, current DNA synthesis technology does not meet the demands of the biotechnology industry. Despite being a mature technology, it is practically impossible to synthesise a DNA strand greater than 200 nucleotides in length, and most DNA synthesis companies only offer up to 120 nucleotides. In comparison, an average protein-coding gene is of the order of 2000-3000 contiguous nucleotides, a chromosome is at least a million contiguous nucleotides in length and an average eukaryotic genome numbers in the billions of nucleotides. In order to prepare nucleic acid strands thousands of base pairs in length, all major gene synthesis companies today rely on variations of a ‘synthesise and stitch’ technique, where overlapping 40-60-mer fragments are synthesised and stitched together by enzymatic copying and extension. Current methods generally allow up to 3 kb in length for routine production.
DNA cannot be synthesised beyond 120-200 nucleotides at a time because the current methodology for generating DNA uses synthetic chemistry (i.e., phosphoramidite technology) to couple nucleotides one at a time. Even if each nucleotide-coupling step is 99% efficient, it is mathematically impossible to synthesise DNA longer than 200 nucleotides in acceptable yields. The Venter Institute illustrated this laborious process by spending 4 years and 20 million USD to synthesise the relatively small genome of a bacterium.
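The yield argument above can be illustrated with a short calculation. This is a minimal sketch that assumes every coupling step succeeds independently with the same efficiency, so the full-length yield is the per-step efficiency raised to the number of couplings; the function name is hypothetical. Under that assumption, 99% per step leaves roughly 13% full-length product at 200 nucleotides.

```python
def full_length_yield(coupling_efficiency: float, length: int) -> float:
    """Fraction of strands reaching full length after `length` couplings.

    Assumes each coupling succeeds independently with the same
    probability, so the per-step yields simply multiply.
    """
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return coupling_efficiency ** length


if __name__ == "__main__":
    # Compare oligonucleotide-scale and gene-scale lengths at 99% per step.
    for n in (120, 200, 2000):
        print(f"{n:>5} nt at 99% per step: {full_length_yield(0.99, n):.2%}")
```

At gene-scale lengths the same calculation gives a vanishingly small yield, which is consistent with the reliance on ‘synthesise and stitch’ assembly described above.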
Known methods of DNA sequencing use template-dependent DNA polymerases to add 3′-reversibly terminated nucleotides to a growing double-stranded substrate. In the ‘sequencing-by-synthesis’ process, each added nucleotide contains a dye, allowing the user to identify the exact sequence of the template strand. Albeit on double-stranded DNA, this technology is able to produce strands between 500 and 1000 bp long. However, this technology is not suitable for de novo nucleic acid synthesis because of the requirement for an existing nucleic acid strand to act as a template.
Various attempts have been made to use a terminal deoxynucleotidyl transferase for de novo single-stranded DNA synthesis. Uncontrolled de novo single-stranded DNA synthesis, as opposed to controlled, takes advantage of TdT's deoxynucleoside 5′-triphosphate (dNTP) 3′-tailing properties on single-stranded DNA to create, for example, homopolymeric adaptor sequences for next-generation sequencing library preparation. In controlled extensions, reversible deoxynucleoside 5′-triphosphate termination technology needs to be employed to prevent uncontrolled addition of dNTPs to the 3′-end of a growing DNA strand. The development of a controlled single-stranded DNA synthesis process through TdT would be invaluable to in situ DNA synthesis for gene assembly or hybridization microarrays as it removes the need for an anhydrous environment and allows the use of various polymers incompatible with organic solvents.
However, TdT has not been shown to efficiently add nucleoside triphosphates containing 3′-O-reversibly terminating moieties for building up a nascent single-stranded DNA chain necessary for a de novo synthesis cycle. A 3′-O-reversible terminating moiety would prevent a terminal transferase like TdT from catalysing the nucleotide transferase reaction between the 3′-end of a growing DNA strand and the 5′-triphosphate of an incoming nucleoside triphosphate.
There is therefore a need to identify modified terminal deoxynucleotidyl transferases that readily incorporate 3′-O-reversibly terminated nucleotides. Said modified terminal deoxynucleotidyl transferases can be used to incorporate 3′-O-reversibly terminated nucleotides in a fashion useful for biotechnology and single-stranded DNA synthesis processes in order to provide an improved method of nucleic acid synthesis that is able to overcome the problems associated with currently available methods.
Bottom panel: Solid support synthesis of an 8-nt nucleic acid sequence using 3′-ONH2 nucleoside 5′-triphosphates with terminal deoxynucleotidyl transferase (TdT) SEQ ID NOs 1 & 344-727. The DNA sequence 5′-ATCGATCG-3′ was synthesized by repeated exposure of a solid support bound DNA initiator to nucleotide addition mixture (NAM) solutions containing (1) TdT, (2) neutral pH buffer, (3) monovalent salt, (4) cobalt chloride, and (5) 3′-ONH2-dNTP, where N is selected from adenine, thymine, cytosine, or guanine. A cycle consists of the following: (A) NAM solution with a specified A, T, C, or G reversibly terminated nucleotide was incubated for 5 minutes on the solid support at 37 °C; (B) the solid support was then washed with a high salt solution at neutral pH; (C) the solid support was then exposed to acidic aqueous sodium nitrite; and (D) the solid support was then washed with the same high salt solution at neutral pH from (B). (A)-(D) were then repeated 7 more times to synthesize the desired 8-nt sequence. Synthesized DNA was analysed by running reactions on a denaturing polyacrylamide gel and quantified by virtue of a fluorophore covalently attached to the DNA initiator. The fraction full length (8-nt species) was determined by taking the fluorescence intensity of the 8-nt band and dividing by total lane intensity. Wild-type bovine and spotted gar TdT activities, as they relate to fraction full-length, are indicated by the dotted line. All mutations included in the TdT variants resulted in improvements in TdT relative to wild-type bovine and spotted gar TdTs.
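The cycle (A)-(D) and the fraction full-length readout described above can be sketched as follows. This is an illustrative outline only: the step wording is paraphrased from the text, the function names are hypothetical, and any band intensities passed in are placeholders rather than measured data.

```python
from typing import Dict, List, Tuple

# Step labels follow (A)-(D) in the text; the wording is paraphrased.
CYCLE_STEPS = (
    "A: incubate with NAM containing 3'-ONH2-d{base}TP (5 min, 37 °C)",
    "B: wash with high salt solution at neutral pH",
    "C: expose to acidic aqueous sodium nitrite",
    "D: wash with high salt solution at neutral pH",
)


def synthesis_schedule(sequence: str) -> List[Tuple[int, str]]:
    """Expand a 5'->3' target sequence into (cycle number, step) pairs."""
    schedule = []
    for cycle, base in enumerate(sequence.upper(), start=1):
        if base not in "ATCG":
            raise ValueError(f"unsupported base: {base!r}")
        for step in CYCLE_STEPS:
            schedule.append((cycle, step.format(base=base)))
    return schedule


def fraction_full_length(band_intensity: Dict[int, float], full_length: int) -> float:
    """Intensity of the full-length band divided by total lane intensity.

    `band_intensity` maps product length in nucleotides to fluorescence
    intensity for one gel lane.
    """
    total = sum(band_intensity.values())
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return band_intensity.get(full_length, 0.0) / total
```

For the 8-nt target 5′-ATCGATCG-3′, `synthesis_schedule("ATCGATCG")` yields eight cycles of four steps each.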
Described herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes or the homologous amino acid sequence of Polμ, Polβ, Polλ, and Polθ of any species or the homologous amino acid sequence of X family polymerases of any species. Terminal transferase enzymes are ubiquitous in nature and are present in many species. Many known TdT sequences have been reported in the NCBI database http://www.ncbi.nlm.nih.gov/.
Bos taurus
Gallus gallus
Xenopus laevis
Oncorhynchus mykiss
Monodelphis domestica
Mus musculus
Ambystoma mexicanum
Takifugu rubripes
Raja eglanteria
Ginglymostoma cirratum
Danio rerio
Canis lupus familiaris
Lemur catta
Microcebus murinus
Otolemur garnettii
Mus musculus
Rattus norvegicus
Equus caballus
Bos taurus
Xenopus (Silurana) tropicalis
Oryctolagus cuniculus
Oryctolagus cuniculus
Ailuropoda melanoleuca
Sus scrofa
Anolis carolinensis
Monodelphis domestica
Loxodonta africana
Ornithorhynchus anatinus
Cavia porcellus
Heterocephalus glaber
Macaca mulatta
Sarcophilus harrisii
Sarcophilus harrisii
Otolemur garnettii
Pan paniscus
Saimiri boliviensis boliviensis
Takifugu rubripes
Felis catus
Myotis davidii
Myotis davidii
Tupaia chinensis
Pleurodeles waltl
Orcinus orca
Trichechus manatus latirostris
Ceratotherium simum simum
Ceratotherium simum simum
Dasypus novemcinctus
Maylandia zebra
Ochotona princeps
Sorex araneus
Sorex araneus
Jaculus jaculus
Jaculus jaculus
Octodon degus
Echinops telfairi
Echinops telfairi
Condylura cristata
Condylura cristata
Mustela putorius furo
Xenopus (Silurana) tropicalis
Heterocephalus glaber
Mesocricetus auratus
Melopsittacus undulatus
Danio rerio
Danio rerio
Falco peregrinus
Chrysemys picta bellii
Microtus ochrogaster
Ictidomys tridecemlineatus
Chinchilla lanigera
Chinchilla lanigera
Cricetulus griseus
Cricetulus griseus
Geospiza fortis
Pseudopodoces humilis
Columba livia
Macaca fascicularis
Equus caballus
Pundamilia nyererei
Xiphophorus maculatus
Xiphophorus maculatus
Myotis brandtii
Myotis brandtii
Pantholops hodgsonii
Latimeria chalumnae
Alligator sinensis
Pelodiscus sinensis
Myotis lucifugus
Camelus ferus
Camelus ferus
Tupaia chinensis
Tupaia chinensis
Alligator mississippiensis
Alligator mississippiensis
Rattus norvegicus
Lepisosteus oculatus
The sequences of the various described terminal transferases show some regions of highly conserved sequence, and some regions which are highly diverse between different species. A sequence alignment for sequences from a selection of species is shown in
The inventors have modified the terminal transferase from Lepisosteus oculatus TdT (spotted gar) (shown as SEQ ID 1). However the corresponding modifications can be introduced into the analogous terminal transferase sequences from any other species, including the sequences listed above in the various NCBI entries, including those shown in
The amino acid sequence of the spotted gar (Lepisosteus oculatus) is shown below (SEQ ID 1)
An engineered variant of this sequence was previously identified as SEQ ID NO 8 in publication WO2016/128731. Further engineered improvements to this published sequence are described herein. The modified sequences disclosed herein are different to SEQ ID NO 8 disclosed in the prior art. WO2016/128731 SEQ ID NO 2 is a “mis-annotated” wild-type gar sequence.
SEQ ID NO 8 in publication WO2016/128731 is shown below with the engineered mutations identified:
The inventors have identified various amino acid modifications in the amino acid sequence having improved properties. Certain regions improve the solubility and handling of the enzyme. Certain other regions improve the ability to incorporate nucleotides with modifications; these modifications include modifications at the 3′-position of the sugar and modifications to the base.
Described herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes comprising amino acid modifications when compared to a wild type sequence SEQ ID NO 1 or a truncated version thereof or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species or the homologous amino acid sequence of Polμ, Polβ, Polλ, and Polθ of any species or the homologous amino acid sequence of X family polymerases of any species, wherein the amino acid is modified at one or more of the amino acids:
V32, A33, I34, F35, A53, V68, V71, E97, I101, M108, G109, A110, Q115, V116, S125, T137, Q143, M152, E153, N154, H155, N156, Q157, I158, I165, N169, N173, S175, E176, G177, P178, C179, L180, A181, F182, M183, R184, A185, L188, H194, A195, I196, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, A214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, G261, F262, H263, S264, L265, E267, Q269, A270, D271, N272, A273, H275, F276, T277, K278, M279, Q280, K281, S291, A292, A293, V294, C295, K296, E298, A299, Q300, A301, Q304, I305, T309, V310, R311, L312, I313, A314, I318, V319, T320, G328, K329, E330, C331, L338, T341, P342, E343, M344, G345, K346, W349, L350, L351, N352, R353, L354, I355, N356, R357, L358, Q359, N360, Q361, G362, I363, L364, L365, Y366, Y367, D368, I369, V370, K376, T377, C381, K383, D388, H389, F390, Q391, K392, F394, I397, K398, K400, K401, E402, L403, A404, A405, G406, R407, D411, A421, P422, P423, V424, D425, N426, F427, A430, R438, F447, A448, R449, H450, E451, R452, K453, M454, L455, L456, D457, N458, H459, A460, L461, Y462, D463, K464, T465, K466, K467, T474, D477, D485, Y486, I487, D488, P489.
Modifications which improve the solubility include a modification within the amino acid region WLLNRLINRLQNQGILLYYDIV shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. Regions were selected according to mutation data (
Particular modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP shown highlighted in the sequence below.
Described herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least one amino acid modification when compared to a wild type sequence SEQ ID NO 1 or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
References to particular sequences include truncations thereof. Included herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes comprising at least one amino acid modification when compared to a wild type sequence SEQ ID NO 1 or a truncated version thereof, or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
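The region lists above appear in two granularities: a shorter set (for example FMRA, TKA, YIDP) and a longer set (for example SEGPCLAFMRA, DQTKA, DYIDP). The sketch below checks that each shorter region sits inside exactly one longer region. The labels "core" and "extended" and the function name are introduced here for illustration only and are not terms used in the text.

```python
from typing import List

# Shorter region list, as recited in the text ("core" is an illustrative label).
CORE = [
    "VAIF", "MGA", "ENHNQ", "FMRA", "HAI", "TKA", "FHS", "QADNA", "MQK",
    "SAAVCK", "EAQA", "TVR", "KEC", "TPEMGK", "DHFQ", "LAAG", "APPVDN",
    "FARHERKMLLDNHA", "YIDP",
]

# Longer region list, as recited in the text ("extended" is an illustrative label).
EXTENDED = [
    "VAIF", "MGA", "MENHNQI", "SEGPCLAFMRA", "HAISSS", "DQTKA", "KGFHS",
    "QADNA", "HFTKMQK", "SAAVCK", "EAQA", "TVRLI", "GKEC", "TPEMGK",
    "DHFQK", "LAAG", "APPVDNF", "FARHERKMLLDNHALYDKTKK", "DYIDP",
]


def enclosing_regions(core: str) -> List[str]:
    """Return every extended region that contains `core` as a substring."""
    return [region for region in EXTENDED if core in region]


if __name__ == "__main__":
    for core in CORE:
        print(f"{core:>15} -> {enclosing_regions(core)}")
```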
Truncated proteins may include at least the region shown below
Described herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least the sequence:
or the homologous regions in other species, wherein the sequence has one or more amino acid modifications in one or more of the amino acid regions WLLNRLINRLQNQGILLYYDI, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence.
Disclosed herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least one amino acid modification when compared to a wild type sequence, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDI, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
Disclosed herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least one amino acid modification when compared to a wild type sequence, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
Described herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least one amino acid modification when compared to a wild type sequence SEQ ID NO 1 or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
Homologous refers to protein sequences between two or more proteins that possess a common evolutionary origin, including proteins from superfamilies in the same species of organism as well as homologous proteins from different species. Such proteins (and their encoding nucleic acids) have sequence homology, as reflected by their sequence similarity, whether in terms of percent identity or by the presence of specific residues or motifs and conserved positions. A variety of protein (and their encoding nucleic acid) sequence alignment tools may be used to determine sequence homology. For example, the Clustal Omega multiple sequence alignment program provided by the European Molecular Biology Laboratory (EMBL) can be used to determine sequence homology or homologous regions.
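As a rough illustration of how sequence similarity can be expressed as percent identity, the sketch below performs a simple global pairwise alignment. It is not Clustal Omega and does not use a substitution matrix; the scoring values (match +1, mismatch -1, gap -2), the identity definition (matches divided by alignment length), and the function names are assumptions made for this example.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a fraction of the alignment length."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a)
```

A dedicated multiple sequence alignment tool remains the appropriate choice for full-length enzymes; the sketch only shows what "percent identity" measures.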
Improved sequences as described herein can contain both modifications, namely
a. a first modification is within the amino acid region WLLNRLINRLQNQGILLYYDIV of the sequence of SEQ ID NO 1 or the homologous region in other species; and
b. a second modification is selected from one or more of the amino acid regions VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
The sequence can be truncated.
Improved sequences as described herein can contain both modifications, namely
a. a first modification is within the amino acid region WLLNRLINRLQNQGILLYYDI of the sequence of SEQ ID NO 1 or the homologous region in other species; and
b. a second modification is selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
As a comparison with other species, the sequence of Bos taurus (cow) TdT is shown below:
The homologous regions in the sequences are highlighted below.
Modifications which improve the solubility include a modification within the amino acid region QLLPKVINLWEKKGLLLYYDLV shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions LVLF, MGA, LNNYNHI, NEVSYVTFMRA, FTIISM, DKVKC, MGFRS, MSDKT, KFTKMQK, VSCVTR, EAEA, AVWAFL, GKKI, SPGSAE, YYDLV, DHFQK, MCPYENR, YATHERKMMLDNHALYDKTKR, and DYIEP shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions LVLF, MGA, NNYNH, FMRA, FTI, VKC, FRS, MSDKT, MQK, EAEA, AVW, KKI, SPGSAE, DHFQ, MCPYEN, YATHERKMMLDNHA, and YIEP shown highlighted in the sequence below.
As a comparison with other species, the sequence of Mus musculus (mouse) TdT is shown below:
Modifications which improve the solubility include a modification within the amino acid region QLLHKVTDFWKQQGLLLYCDIL shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions LVLF, MGA, LNNYNQL, NEGSCLAFMRA, FPITSM, DKVKS, MGFRT, QSDKS, RFTQMQK, VSCVNR, EAEA, AVVTFL, GKMT, SPEATE, DHFQK, SGQ, MCPYDRR, YATHERKMMLDNHALYDRTKR, and DYIEP shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions LVLF, MGA, LNNYNQ, NEGSCLAFMRA, FPI, VKS, FRT, SKIQSDKS, MQK, VSCVNR, EAEA, AVV, KMT, SPEATE, DHFQK, MCPYDR, YATHERKMMLDNHA, and YIEP shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions LVLF, MGA, NNYNQ, FMRA, FPI, VKS, FRT, QSDKS, MQK, VSCVNR, EAEA, AVV, KMT, SPEATE, DHFQ, MCPYDR, YATHERKMMLDNHA, and YIEP shown highlighted in the sequence below.
Thus by a process of aligning sequences, it is immediately apparent which regions in the sequences of terminal transferases from other species correspond to the sequences described herein with respect to the spotted gar sequence shown in SEQ ID NO 1.
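The correspondence between species can be illustrated on the solubility region recited above for spotted gar, cow, and mouse. The sketch assumes the three 22-residue regions are already aligned column by column without gaps, which is an assumption made for this example; the function name is hypothetical.

```python
from typing import Iterable

# Solubility regions as recited in the text for each species.
SOLUBILITY_REGIONS = {
    "Lepisosteus oculatus": "WLLNRLINRLQNQGILLYYDIV",
    "Bos taurus": "QLLPKVINLWEKKGLLLYYDLV",
    "Mus musculus": "QLLHKVTDFWKQQGLLLYCDIL",
}


def conserved_columns(sequences: Iterable[str]) -> str:
    """Return a consensus string: the residue where all sequences agree, '.' otherwise.

    Treats the inputs as an ungapped, pre-aligned block of equal length.
    """
    seqs = list(sequences)
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and equal in length")
    return "".join(col[0] if len(set(col)) == 1 else "." for col in zip(*seqs))


if __name__ == "__main__":
    for species, region in SOLUBILITY_REGIONS.items():
        print(f"{species:>22}  {region}")
    print(f"{'conserved':>22}  {conserved_columns(SOLUBILITY_REGIONS.values())}")
```

Columns that print as a letter are identical in all three species; columns that print as a dot vary in at least one.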
Sequence homology extends to all modified or wild-type members of family X polymerases, such as DNA Polμ (also known as DNA polymerase mu or POLM), DNA Polβ (also known as DNA polymerase beta or POLB), and DNA Polλ (also known as DNA polymerase lambda or POLL). It is well known in the art that all family X member polymerases, of which TdT is a member, either have terminal transferase activity or can be engineered to gain terminal transferase activity akin to terminal deoxynucleotidyl transferase (Biochim Biophys Acta. 2010 May; 1804(5): 1136-1150). For example, when the following human TdT loop1 amino acid sequence
was engineered to replace the following human Polμ amino acid residues
the chimeric human Polμ containing human TdT loop1 gained robust terminal transferase activity (Nucleic Acids Res. 2006 September; 34(16): 4572-4582).
Furthermore, it was generally demonstrated in US patent application no. 2019/0078065 that family X polymerases when engineered to contain TdT loop1 chimeras could gain robust terminal transferase activity. Additionally, it was demonstrated that TdT could be converted into a template-dependent polymerase through specific mutations in the loop1 motif (Nucleic Acids Research, June 2009, 37(14):4642-4656). As has been shown in the art, family X polymerases can be trivially modified to display either template-dependent or template-independent nucleotidyl transferase activities. Therefore, all motifs, regions, and mutations demonstrated in this patent can be trivially extended to modified X family polymerases to enable modified X family polymerases to incorporate 3′-modified nucleotides, reversibly terminated nucleotides, and modified nucleotides in general to effect methods of nucleic acid synthesis.
As a comparison with other family X polymerases, the human Polμ sequence is shown below:
Modifications which improve the solubility include a modification within the amino acid region GLLPRVMCRLQDQGLILYHQHQ shown highlighted in the sequence below.
Modifications which improve the incorporation of modified nucleotides can be at one or more of selected regions shown below. The second modification can be selected from one or more of the amino acid regions VAIY, LGA, LTHHNTG, SEGRLLTFCRAA, SPVTTL, EHSSR, EGLRT, REQP, KLTQQQKA, STPVLR, DVDA, AVGQA, GKLQ, HPKEGQ, YHQHQ, DAFER, VAPVSQ, FSRKEKGLWLNSHGLFDPEQK, and EYLPP shown highlighted in the sequence below.
Thus by a process of aligning sequences, it is immediately apparent which regions in the sequences of all family X polymerases from any species correspond to the sequences described herein with respect to the spotted gar sequence shown in SEQ ID NO 1.
Furthermore, the A family polymerase, DNA Polθ (also known as DNA polymerase theta or POLQ) was demonstrated to display robust terminal transferase capability (eLife. 2016; 5: e13740). DNA Polθ was also demonstrated to be useful in methods of nucleic acid synthesis (GB patent application no. 2553274). In US patent application no. 2019/0078065, it was demonstrated that chimeras of DNA Polθ and family X polymerases could be engineered to gain robust terminal transferase activity and become competent for methods of nucleic acid synthesis. Therefore, all motifs, regions, and mutations demonstrated in this patent can be trivially extended to modified A family polymerases, especially DNA Polθ, to enable modified A family polymerases to incorporate 3′-modified nucleotides, reversibly terminated nucleotides, and modified nucleotides in general to effect methods of nucleic acid synthesis.
Described herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes. Terminal transferase enzymes are ubiquitous in nature and are present in many species. Many known TdT sequences have been reported in the NCBI database. The sequences described herein are modified from the sequence of the Spotted Gar, but the corresponding changes can be introduced into the homologous sequences from other species. Homologous amino acid sequences of Polμ, Polβ, Polλ, and Polθ or the homologous amino acid sequence of X family polymerases also possess terminal transferase activity. References to terminal transferase also include homologous amino acid sequences of Polμ, Polβ, Polλ, and Polθ or the homologous amino acid sequence of X family polymerases where such sequences possess terminal transferase activity.
Disclosed herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least one amino acid modification when compared to a wild type sequence, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
Described herein is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least the sequence ID 729:
or the equivalent homologous region in other species, wherein the sequence has one or more amino acid modifications in one or more of the amino acid regions WLLNRLINRLQNQGILLYYDIV, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK, and DYIDP of the sequence. The sequence above of 355 amino acids can be attached to other amino acids without affecting the function of the enzyme. For example there can be a further N-terminal sequence that is incorporated simply as a protease cleavage site, for example the sequence MENLYFQG.
Disclosed is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least one amino acid modification when compared to a wild type sequence SEQ ID NO 1 or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein the modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
Further disclosed is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising at least two amino acid modifications when compared to a wild type sequence SEQ ID NO 1 or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein:
a. a first modification is within the amino acid region WLLNRLINRLQNQGILLYYDIV of the sequence of SEQ ID NO 1 or the homologous region in other species; and
b. a second modification is selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.
When compared to the sequence of Bos taurus (cow) TdT; SEQ ID NO 2,
a. a first modification is within the amino acid region QLLPKVINLWEKKGLLLYYDLV of the sequence of SEQ ID NO 2 or the homologous region in other species; and
b. a second modification is selected from one or more of the amino acid regions LVLF, MGA, NNYNH, FMRA, FTI, VKC, FRS, MSDKT, MQK, EAEA, AVW, KKI, SPGSAE, MCP, YATHERKMMLDNHA, and YIEP of the sequence of SEQ ID NO 2 or the homologous regions in other species.
When compared to the sequence of Mus musculus (mouse) TdT; SEQ ID NO 3,
a. a first modification is within the amino acid region QLLHKVTDFWKQQGLLLYCDIL of the sequence of SEQ ID NO 3 or the homologous region in other species; and
b. a second modification is selected from one or more of the amino acid regions LVLF, MGA, NNYNQ, FMRA, FPI, VKS, FRT, QSDKS, MQK, VSCVNR, EAEA, AVV, KMT, SPEATE, DHFQ, MCPYDR, YATHERKMMLDNHA, and YIEP of the sequence of SEQ ID NO 3 or the homologous regions in other species.
The modifications can be chosen from any amino acid that differs from the wild type sequence. The amino acid can be a naturally occurring amino acid. The modified amino acid can be selected from ala, arg, asn, asp, cys, gln, glu, gly, his, ile, leu, lys, met, phe, pro, ser, thr, trp, val, and sec.
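The choice of replacement residues described above can be sketched by enumerating every single substitution in a region. The replacement set below contains exactly the residues named in the preceding paragraph, written in standard one-letter codes (sec is U). The function name is hypothetical, and the example region MGA at positions 108 to 110 follows the numbering used elsewhere in this document.

```python
from typing import List

# One-letter codes for the residues named in the text, in the same order:
# ala arg asn asp cys gln glu gly his ile leu lys met phe pro ser thr trp val sec
ALLOWED = "ARNDCQEGHILKMFPSTWVU"


def single_substitutions(region: str, start: int) -> List[str]:
    """List labels such as 'G109A' for every single change in `region`.

    `start` is the sequence position of the first residue of the region.
    A replacement identical to the wild-type residue is skipped.
    """
    labels = []
    for offset, wild_type in enumerate(region):
        for replacement in ALLOWED:
            if replacement != wild_type:
                labels.append(f"{wild_type}{start + offset}{replacement}")
    return labels


if __name__ == "__main__":
    variants = single_substitutions("MGA", 108)
    print(len(variants), "single substitutions, e.g.", variants[:3])
```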
For the purposes of brevity, the modifications are further described in relation to SEQ ID NO 1, but the modifications are applicable to the sequences from other species, for example those sequences listed above having sequences in the NCBI database. The sequence modifications also apply to truncated versions of SEQ ID NO 1.
The sequences can be modified at positions in addition to those regions described. Embodiments of the invention may include for example sequences having modifications to amino acids outside the defined positions, providing those sequences retain terminal transferase activity. Embodiments of the invention may include for example sequences having truncations of amino acids outside the defined positions, providing those sequences retain terminal transferase activity. For example the sequences may be BRCT truncated as described in application WO2018215803 where amino acids are removed from the N-terminus whilst retaining or improving activity. Alterations, additions, insertions or deletions or truncations to amino acid positions outside the claimed regions are therefore within the scope of the invention, providing that the claimed regions as defined are modified as claimed. The sequences described herein refer to TdT enzymes, which are typically at least 300 amino acids in length. All sequences described herein can be seen as having at least 300 amino acids. The claims do not cover peptide fragments or sequences which do not function as terminal transferase enzymes.
The modification within the region WLLNRLINRLQNQGILLYYDIV or the corresponding region from other species helps improve the solubility of the enzyme. The modification within the amino acid region WLLNRLINRLQNQGILLYYDIV can be at one or more of the underlined amino acids.
Particular changes can be selected from W-Q, N-P, R-K, L-V, R-L, L-W, Q-E, N-K, Q-K or I-L.
The sequence WLLNRLINRLQNQGILLYYDIV can be altered to QLLPKVINLWEKKGLLLYYDLV.
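The changes listed above can be recovered by comparing the two 22-residue sequences position by position. The sketch below assumes an ungapped comparison and uses the 349 to 370 numbering given for this region in this document; the function name is hypothetical.

```python
from typing import List


def substitutions(wild_type: str, variant: str, start: int = 1) -> List[str]:
    """Label each differing position as <wild type><position><variant>.

    Compares the sequences position by position without gaps, so both
    inputs must have the same length.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be equal in length")
    return [
        f"{w}{start + i}{v}"
        for i, (w, v) in enumerate(zip(wild_type, variant))
        if w != v
    ]


if __name__ == "__main__":
    gar_region = "WLLNRLINRLQNQGILLYYDIV"
    altered_region = "QLLPKVINLWEKKGLLLYYDLV"
    print(substitutions(gar_region, altered_region, start=349))
```

The I-L change occurs at two positions, so the eleven differing positions correspond to the ten distinct change types listed above.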
The second modification improves incorporation of nucleotides having a modification at the 3′ position in comparison to the wild type sequence. The second modification can be selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species. The second modification can be selected from two or more of the amino acid regions VAIF, EDN, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA, and YIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species shown highlighted in the sequence below.
The identified positions commence at positions V32, M108, F182, T212, D271, M279, E298, A421, L456, Y486. Modifications disclosed herein contain at least one modification at the defined positions.
In the sequence below the modified regions are numbered thus
WLLNRLINRLQNQGILLYYDIV, 349 to 370
VAIF, 32 to 35
MGA, 108 to 110
MENHNQI, 152 to 158
SEGPCLAFMRA, 175 to 185
HAISSS, 194 to 199
DQTKA, 210 to 214
KGFHS, 260 to 264
QADNA, 269 to 273
HFTKMQK, 275 to 281
SAAVCK, 291 to 296
EAQA, 298 to 301
TVRLI, 309 to 313
GKEC, 328 to 331
TPEMGK, 341 to 346
DHFQK, 388 to 392
LAAG, 403 to 406
APPVDNF, 421 to 427
FARHERKMLLDNHALYDKTKK, 447 to 467
DYIDP, 485 to 489
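By way of illustration only, the region numbering above can be checked for internal consistency and used to look up the residue at a given position. The names `REGIONS`, `inconsistent_regions` and `residue_at` are hypothetical and are introduced solely for this sketch; the region sequences and positions are those listed above.

```python
# Illustrative sketch only: (region sequence, first position, last position) as listed above.
REGIONS = [
    ("WLLNRLINRLQNQGILLYYDIV", 349, 370),
    ("VAIF", 32, 35),
    ("MGA", 108, 110),
    ("MENHNQI", 152, 158),
    ("SEGPCLAFMRA", 175, 185),
    ("HAISSS", 194, 199),
    ("DQTKA", 210, 214),
    ("KGFHS", 260, 264),
    ("QADNA", 269, 273),
    ("HFTKMQK", 275, 281),
    ("SAAVCK", 291, 296),
    ("EAQA", 298, 301),
    ("TVRLI", 309, 313),
    ("GKEC", 328, 331),
    ("TPEMGK", 341, 346),
    ("DHFQK", 388, 392),
    ("LAAG", 403, 406),
    ("APPVDNF", 421, 427),
    ("FARHERKMLLDNHALYDKTKK", 447, 467),
    ("DYIDP", 485, 489),
]


def inconsistent_regions(regions):
    """Return regions whose sequence length does not match their numbered span."""
    return [r for r in regions if len(r[0]) != r[2] - r[1] + 1]


def residue_at(position, regions):
    """Return (region, residue) if the position lies in a listed region, else None."""
    for sequence, start, end in regions:
        if start <= position <= end:
            return sequence, sequence[position - start]
    return None


print(inconsistent_regions(REGIONS))  # [] : every span matches its sequence length
print(residue_at(271, REGIONS))       # ('QADNA', 'D')
```

A lookup of this kind returns, for example, D for position 271 and Y for position 486, in agreement with the identified positions D271 and Y486.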
The modified amino acid can be in the region FMRA. The modified amino acid can be in the region QADNA. The modified amino acid can be in the region EAQA. The modified amino acid can be in the region APP. The modified amino acid can be in the region LDNHA. The modified amino acid can be in the region YIDP. The region FARHERKMLLDNHA is advantageous for removing substrate biases in modifications. The FARHERKMLLDNHA region appears highly conserved across species.
The modification selected from one or more of the amino acid regions FMRA, QADNA, EAQA, APP, FARHERKMLLDNHA, and YIDP can be at the underlined amino acid(s).
Rather than the invention being described by modifications in defined domains, the sequences can be defined by modifications at certain defined amino acids. Described herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes comprising amino acid modifications when compared to a wild type sequence SEQ ID NO 1 or a truncated version thereof or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein the amino acid is modified at one or more of the amino acids A53, V68, V71, E97, I101, G109, Q115, V116, S125, T137, Q143, M152, N154, H155, Q157, I158, I165, N169, S175, G177, C179, L180, A181, M183, A195, S197, S198, S199, K200, D210, Q211, T212, K213, A214, E217, T239, K260, F262, S264, Q269, D271, N272, A273, H275, T277, K281, S291, K296, Q300, T309, R311, L312, I313, G328, E330, C331, T341, P342, E343, M344, G345, K346, N352, R353, L354, I355, N356, L358, N360, Q361, G362, I363, Y366, Y367, D368, V370, H389, K392, L403, A405, G406, D411, A421, P422, V424, N426, F427, R438, F447, R452, K453, L455, Y462, K464, T465, K467, D485, I487 and/or D488.
Described herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes comprising amino acid modifications when compared to a wild type sequence SEQ ID NO 1 or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) enzyme in other species, wherein the amino acid is modified at one or more of the amino acids
V32, A33, I34, F35, A53, V68, V71, E97, I101, M108, G109, A110, Q115, V116, S125, T137, Q143, M152, E153, N154, H155, N156, Q157, I158, I165, N169, N173, S175, E176, G177, P178, C179, L180, A181, F182, M183, R184, A185, L188, H194, A195, I196, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, A214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, G261, F262, H263, S264, L265, E267, Q269, A270, D271, N272, A273, H275, F276, T277, K278, M279, Q280, K281, S291, A292, A293, V294, C295, K296, E298, A299, Q300, A301, Q304, I305, T309, V310, R311, L312, I313, A314, I318, V319, T320, G328, K329, E330, C331, L338, T341, P342, E343, M344, G345, K346, W349, L350, L351, N352, R353, L354, I355, N356, R357, L358, Q359, N360, Q361, G362, I363, L364, L365, Y366, Y367, D368, I369, V370, K376, T377, C381, K383, D388, H389, F390, Q391, K392, F394, I397, K398, K400, K401, E402, L403, A404, A405, G406, R407, D411, A421, P422, P423, V424, D425, N426, F427, A430, R438, F447, A448, R449, H450, E451, R452, K453, M454, L455, L456, D457, N458, H459, A460, L461, Y462, D463, K464, T465, K466, K467, T474, D477, D485, Y486, I487, D488, P489.
Specific amino acid changes can include any one of A53G, V68I, V71I, E97A, I101V, G109E, G109R, Q115E, V116I, V116S, S125R, T137A, Q143P, M152L, M152T, N154H, H155C, H155N, H155R, Q157K, Q157R, I158V, I158L, I158M, I165V, N169R, N173R, S175N, G177S, G177V, G177D, C179A, C179S, L180V, A181E, M183R, L188V, A195T, A195P, S197I, S197L, S198R, S198C, S199L, K200R, E203Q, G204D, D210E, Q211R, T212S, K213S, A214R, A214C, A214G, I216M, E217Q, D218E, L220I, L220F, L220Y, Y222C, V228A, D230E, Q238K, T239S, L242Q, L251P, K260M, F262L, L265F, S264T, E267D, Q269K, D271N, D271E, N272K, A273S, A273T, H275Q, T277S, K281R, S291N, S291T, K296R, Q300D, Q304R, Q304C, Q304H, I305V, T309A, R311W, R311H, L312G, L312K, L312A, I313M, A314T, I318L, V319L, T320A, G328A, E330N, E330S, C331I, L338I, T341S, P342A, E343G, E343Q, M344A, M344Q, M344K, G345R, K346R, N352Q, K353R, K353D, V354I, V354L, I355V, I355M, N356R, N356D, W358L, N360K, Q361K, G362E, I363L, Y366F, Y367C, D368H, D368E, V370I, K376I, T377S, T377L, T377A, C381S, K383R, H389N, H389A, K392Q, F394W, I397L, K398R, K400E, K400N, K401A, K401Q, E402Q, L403Q, L403R, A405D, G406R, R407C, D411N, A421L, A421M, A421V, P422A, P422C, V424Y, V424I, V424A, N426R, N426C, F427Y, A430T, R438K, F447W, R452K, L455I, K453R, Y462F, K464R, T465R, K467R, T474S, D477E, D485E, I487V, and/or D488P.
Amino acid changes include any two or more of A53G, V68I, V71I, E97A, I101V, G109E, G109R, Q115E, V116I, V116S, S125R, T137A, Q143P, M152L, M152T, N154H, H155C, H155N, H155R, Q157K, Q157R, I158V, I158L, I158M, I165V, N169R, N173R, S175N, G177S, G177V, G177D, C179A, C179S, L180V, A181E, M183R, L188V, A195T, A195P, S197I, S197L, S198R, S198C, S199L, K200R, E203Q, G204D, D210E, Q211R, T212S, K213S, A214R, A214C, A214G, I216M, E217Q, D218E, L220I, L220F, L220Y, Y222C, V228A, D230E, Q238K, T239S, L242Q, L251P, K260M, F262L, L265F, S264T, E267D, Q269K, D271N, D271E, N272K, A273S, A273T, H275Q, T277S, K281R, S291N, S291T, K296R, Q300D, Q304R, Q304C, Q304H, I305V, T309A, R311W, R311H, L312G, L312K, L312A, I313M, A314T, I318L, V319L, T320A, G328A, E330N, E330S, C331I, L338I, T341S, P342A, E343G, E343Q, M344A, M344Q, M344K, G345R, K346R, N352Q, K353R, K353D, V354I, V354L, I355V, I355M, N356R, N356D, W358L, N360K, Q361K, G362E, I363L, Y366F, Y367C, D368H, D368E, V370I, K376I, T377S, T377L, T377A, C381S, K383R, H389N, H389A, K392Q, F394W, I397L, K398R, K400E, K400N, K401A, K401Q, E402Q, L403Q, L403R, A405D, G406R, R407C, D411N, A421L, A421M, A421V, P422A, P422C, V424Y, V424I, V424A, N426R, N426C, F427Y, A430T, R438K, F447W, R452K, L455I, K453R, Y462F, K464R, T465R, K467R, T474S, D477E, D485E, I487V, and/or D488P.
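By way of illustration only, amino acid changes written in the form used above (wild-type residue, position, replacement residue) can be parsed and applied to a sequence as sketched below. The helper names `parse_mutation` and `apply_mutations` and the `first_position` parameter are hypothetical. The toy input is the QADNA region at positions 269 to 273 rather than a full-length enzyme sequence.

```python
import re

# Illustrative sketch only: one-letter wild-type residue, position, one-letter replacement.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'Q269K' into ('Q', 269, 'K')."""
    match = MUTATION_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence, labels, first_position=1):
    """Apply each change, checking that the stated wild-type residue is present."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position lies outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{label}: found {residues[index]} at position {position}")
        residues[index] = replacement
    return "".join(residues)


# Toy example: the QADNA region occupies positions 269 to 273.
print(apply_mutations("QADNA", ["Q269K", "N272K"], first_position=269))  # KADKA
```

The check on the wild-type residue is a design choice of this sketch: it rejects a change whose stated starting residue does not match the sequence being modified, which can occur when numbering from one species is applied to the homologous sequence of another.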
The modification of QADNA to KADKA, QADKA, KADNA, QADNS, KADNT, or QADNT is advantageous for the incorporation of 3′-O-modified nucleoside triphosphates to the 3′-end of nucleic acids and removing substrate biases during the incorporation of modified nucleoside triphosphates. The modification of APPVDN to MCPVDN, MPPVDN, ACPVDR, VPPVDN, LPPVDR, ACPYDN, LCPVDN, or MAPVDN is advantageous for the incorporation of 3′-O-modified nucleoside triphosphates to the 3′-end of nucleic acids and removing substrate biases during the incorporation of modified nucleoside triphosphates. The modification of FARHERKMLLDNHA to WARHERKMILDNHA, FARHERKMILDNHA, WARHERKMLLDNHA, FARHERKMLLDRHA, or FARHEKKMLLDNHA is also advantageous for the incorporation of 3′-O-modified nucleoside triphosphates to the 3′-end of nucleic acids and removing substrate biases during the incorporation of modified nucleoside triphosphates.
The modification can be selected from one or more of the following sequences FRRA, QADKA, EADA, MPP, FARHERKMLLDRHA, and YIPP. Included is a modified terminal deoxynucleotidyl transferase (TdT) enzyme wherein the second modification is selected from two or more of the following sequences FRRA, QADKA, EADA, MPP, FARHERKMLLDRHA, and YIPP. Included is a modified terminal deoxynucleotidyl transferase (TdT) enzyme wherein the second modification contains each of the following sequences FRRA, QADKA, EADA, MPP, FARHERKMLLDRHA, and YIPP.
Crystal structures shown herein show the following domains, which may be preferred as domains to modify:
In order to aid purification of the expressed sequence, the amino acid sequence can be further modified. For example the amino acid sequence can contain one or more further histidine residues at the terminus. Included is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising any one of SEQ ID NOs 4 to 173 or a truncated version thereof. Sequences 4 to 173 are the full length sequences derived from the spotted gar. Included is a modified terminal deoxynucleotidyl transferase (TdT) enzyme comprising any one of SEQ ID NOs 174 to 343. Sequences 174 to 343 are N-truncated sequences as spotted gar/bovine chimeras. Sequences 344 to 727 are spotted gar sequences in truncated form. Additionally, for these sequences, there is an N-terminal sequence that is incorporated simply as a protease cleavage site (MENLYFQG . . . ).
Also disclosed is a method of nucleic acid synthesis, which comprises the steps of:
(a) providing an initiator oligonucleotide;
(b) adding a 3′-blocked nucleoside triphosphate to said initiator oligonucleotide in the presence of a terminal deoxynucleotidyl transferase (TdT) as defined herein;
(c) removal of all reagents from the initiator oligonucleotide;
(d) cleaving the blocking group in the presence of a cleaving agent; and
(e) removal of the cleaving agent.
The method can add greater than 1 nucleotide by repeating steps (b) to (e).
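By way of illustration only, the repetition of steps (b) to (e) can be sketched as a loop in which each cycle adds exactly one nucleotide. This is a toy string model and not a description of the chemistry: the `*` marker for a 3′-blocked end and the function names are hypothetical, and the wash steps (c) and (e) are represented only by comments.

```python
# Illustrative sketch only: a toy model of the cyclic method, one nucleotide per cycle.
BLOCK = "*"  # hypothetical marker for a 3'-blocked terminal nucleotide


def add_blocked_nucleotide(strand, base):
    """Step (b): extend a strand that has a free 3'-end by one 3'-blocked nucleotide."""
    if strand.endswith(BLOCK):
        raise ValueError("3'-end is blocked; no further addition is possible")
    return strand + base + BLOCK


def cleave_block(strand):
    """Step (d): remove the blocking group to restore a free 3'-end."""
    if not strand.endswith(BLOCK):
        raise ValueError("no blocking group to cleave")
    return strand[: -len(BLOCK)]


def synthesise(initiator, target):
    strand = initiator  # step (a): provide an initiator oligonucleotide
    for base in target:
        strand = add_blocked_nucleotide(strand, base)  # step (b)
        # step (c): reagents are removed from the initiator (wash, not modelled)
        strand = cleave_block(strand)  # step (d)
        # step (e): the cleaving agent is removed (wash, not modelled)
    return strand


print(synthesise("TTTTT", "ATCGATCG"))  # TTTTTATCGATCG
```

The sketch reflects why the blocking group matters: an attempt to add a second nucleotide before cleavage raises an error, so each pass through steps (b) to (e) extends the strand by a single defined nucleotide.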
References herein to ‘nucleoside triphosphates’ refer to a molecule containing a nucleoside (i.e. a base attached to a deoxyribose or ribose sugar molecule) bound to three phosphate groups. Examples of nucleoside triphosphates that contain deoxyribose are: deoxyadenosine triphosphate (dATP), deoxyguanosine triphosphate (dGTP), deoxycytidine triphosphate (dCTP) or deoxythymidine triphosphate (dTTP). Examples of nucleoside triphosphates that contain ribose are: adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP) or uridine triphosphate (UTP). Other types of nucleosides may be bound to three phosphates to form nucleoside triphosphates, such as naturally occurring modified nucleosides and artificial nucleosides.
Therefore, references herein to ‘3′-blocked nucleoside triphosphates’ refer to nucleoside triphosphates (e.g., dATP, dGTP, dCTP or dTTP) which have an additional group on the 3′ end which prevents further addition of nucleotides, i.e., by replacing the 3′-OH group with a protecting group.
It will be understood that references herein to ‘3′-block’, ‘3′-blocking group’ or ‘3′-protecting group’ refer to the group attached to the 3′ end of the nucleoside triphosphate which prevents further nucleotide addition. The present method uses reversible 3′-blocking groups which can be removed by cleavage to allow the addition of further nucleotides. By contrast, irreversible 3′-blocking groups refer to dNTPs where the 3′-OH group can neither be exposed nor uncovered by cleavage.
The 3′-blocked nucleoside 5′-triphosphate can be blocked by any chemical group that can be unmasked to reveal a 3′-OH. The 3′-blocked nucleoside triphosphate can be blocked by a 3′-O-azidomethyl, 3′-aminooxy, 3′-O—(N-oxime) (3′-O—N═CR1R2, where R1 and R2 are each a C1-C3 alkyl group, for example CH3, such that the oxime can be O—N═C(CH3)2 (N-acetoneoxime)), 3′-O-allyl, 3′-O-cyanoethyl, 3′-O-acetyl, 3′-O-nitrate, 3′-phosphate, 3′-O-acetyl levulinic ester, 3′-O-tert butyl dimethyl silane, 3′-O-trimethyl(silyl)ethoxymethyl, 3′-O-ortho-nitrobenzyl, or 3′-O-para-nitrobenzyl group.
The 3′-blocked nucleoside 5′-triphosphate can also be blocked by any chemical group that can be directly utilized in chemical ligations, such as copper-catalyzed or copper-free azide-alkyne click reactions and tetrazine-alkene click reactions. The 3′-blocked nucleoside triphosphate can include chemical moieties containing an azide, alkyne, alkene, and tetrazine.
References herein to ‘cleaving agent’ refer to a substance which is able to cleave the 3′-blocking group from the 3′-blocked nucleoside triphosphate. In one embodiment, the cleaving agent is a chemical cleaving agent. In an alternative embodiment, the cleaving agent is an enzymatic cleaving agent. The cleaving can be done in a single step, or can be a multi-step process, for example to transform an oxime (such as 3′-O—(N-oxime), for example 3′-O—N═C(CH3)2) into aminooxy (O—NH2), followed by cleaving the aminooxy to OH.
It will be understood by the person skilled in the art that the selection of cleaving agent is dependent on the type of 3′-nucleotide blocking group used. For instance, tris(2-carboxyethyl)phosphine (TCEP) or tris(hydroxypropyl)phosphine (THPP) can be used to cleave a 3′-O-azidomethyl group, palladium complexes can be used to cleave a 3′-O-allyl group, or sodium nitrite can be used to cleave a 3′-aminooxy group. Therefore, in one embodiment, the cleaving agent is selected from: tris(2-carboxyethyl)phosphine (TCEP), a palladium complex or sodium nitrite.
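By way of illustration only, the dependence of the cleaving agent on the blocking group can be expressed as a simple lookup. Only the three pairings named in the preceding paragraph are included; the names `CLEAVING_AGENTS` and `cleaving_agents_for` are hypothetical, and the plain apostrophe in the keys stands in for the prime symbol.

```python
# Illustrative sketch only: pairings taken from the preceding paragraph.
CLEAVING_AGENTS = {
    "3'-O-azidomethyl": (
        "tris(2-carboxyethyl)phosphine (TCEP)",
        "tris(hydroxypropyl)phosphine (THPP)",
    ),
    "3'-O-allyl": ("palladium complex",),
    "3'-aminooxy": ("sodium nitrite",),
}


def cleaving_agents_for(blocking_group):
    """Return the cleaving agents listed for a blocking group, or raise if none is listed."""
    try:
        return CLEAVING_AGENTS[blocking_group]
    except KeyError:
        raise ValueError(f"no cleaving agent listed for {blocking_group!r}") from None


print(cleaving_agents_for("3'-aminooxy"))  # ('sodium nitrite',)
```

Blocking groups for which no cleaving agent is named in the preceding paragraph are deliberately left out of the table rather than guessed.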
In one embodiment, the cleaving agent is added in the presence of a cleavage solution comprising a denaturant, such as urea, guanidinium chloride, formamide or betaine. The addition of a denaturant has the advantage of being able to disrupt any undesirable secondary structures in the DNA. In a further embodiment, the cleavage solution comprises one or more buffers. It will be understood by the person skilled in the art that the choice of buffer is dependent on the exact cleavage chemistry and cleaving agent required.
References herein to an ‘initiator oligonucleotide’ or ‘initiator sequence’ refer to a short oligonucleotide with a free 3′-end which the 3′-blocked nucleoside triphosphate can be attached to. In one embodiment, the initiator sequence is a DNA initiator sequence. In an alternative embodiment, the initiator sequence is an RNA initiator sequence.
References herein to a ‘DNA initiator sequence’ refer to a small sequence of DNA which the 3′-blocked nucleoside triphosphate can be attached to, i.e., DNA will be synthesised from the end of the DNA initiator sequence.
In one embodiment, the initiator sequence is between 5 and 50 nucleotides long, such as between 5 and 30 nucleotides long (e.g., between 10 and 30), in particular between 5 and 20 nucleotides long (e.g., approximately 20 nucleotides long), more particularly 5 to 15 nucleotides long, for example 10 to 15 nucleotides long, especially 12 nucleotides long.
In one embodiment, the initiator sequence is single-stranded. In an alternative embodiment, the initiator sequence is double-stranded. It will be understood by persons skilled in the art that a 3′-overhang (i.e., a free 3′-end) allows for efficient addition.
In one embodiment, the initiator sequence is immobilised on a solid support. This allows TdT and the cleaving agent to be removed (in steps (c) and (e), respectively) without washing away the synthesised nucleic acid. The initiator sequence may be attached to a solid support stable under aqueous conditions so that the method can be easily performed via a flow setup.
In one embodiment, the initiator sequence is immobilised on a solid support via a reversible interacting moiety, such as a chemically-cleavable linker, an antibody/immunogenic epitope, a biotin/biotin binding protein (such as avidin or streptavidin), or glutathione-GST tag. Therefore, in a further embodiment, the method additionally comprises extracting the resultant nucleic acid by removing the reversible interacting moiety in the initiator sequence, such as by incubating with proteinase K.
In one embodiment, the initiator sequence contains a base or base sequence recognisable by an enzyme. A base recognised by an enzyme, such as a glycosylase, may be removed to generate an abasic site which may be cleaved by chemical or enzymatic means. A base sequence may be recognised and cleaved by a restriction enzyme.
In a further embodiment, the initiator sequence is immobilised on a solid support via a chemically-cleavable linker, such as a disulfide, allyl, or azide-masked hemiaminal ether linker. Therefore, in one embodiment, the method additionally comprises extracting the resultant nucleic acid by cleaving the chemical linker through the addition of tris(2-carboxyethyl)phosphine (TCEP) or dithiothreitol (DTT) for a disulfide linker; palladium complexes for an allyl linker; or TCEP for an azide-masked hemiaminal ether linker.
In one embodiment, the resultant nucleic acid is extracted and amplified by polymerase chain reaction using the nucleic acid bound to the solid support as a template. The initiator sequence could therefore contain an appropriate forward primer sequence and an appropriate reverse primer could be synthesised.
In one embodiment, the terminal deoxynucleotidyl transferase (TdT) of the invention is added in the presence of an extension solution comprising one or more buffers (e.g., Tris or cacodylate), one or more salts (e.g., Na+, K+, Mg2+, Mn2+, Cu2+, Zn2+, Co2+, etc. all with appropriate counterions, such as Cl−) and inorganic pyrophosphatase (e.g., the Saccharomyces cerevisiae homolog). It will be understood that the choice of buffers and salts depends on the optimal enzyme activity and stability. The use of an inorganic pyrophosphatase helps to reduce the build-up of pyrophosphate due to nucleoside triphosphate hydrolysis by TdT. Therefore, the use of an inorganic pyrophosphatase has the advantage of reducing the rate of (1) backwards reaction and (2) TdT strand dismutation.
In one embodiment, step (b) is performed at a pH range between 5 and 10. Therefore, it will be understood that any buffer with a buffering range of pH 5-10 could be used, for example cacodylate, Tris, HEPES or Tricine, in particular cacodylate or Tris.
In one embodiment, step (d) is performed at a temperature less than 99° C., such as less than 95° C., 90° C., 85° C., 80° C., 75° C., 70° C., 65° C., 60° C., 55° C., 50° C., 45° C., 40° C., 35° C., or 30° C. It will be understood that the optimal temperature will depend on the cleavage agent utilised. The temperature used helps to assist cleavage and disrupt any secondary structures formed during nucleotide addition.
In one embodiment, steps (c) and (e) are performed by applying a wash solution. In one embodiment, the wash solution comprises the same buffers and salts as used in the extension solution described herein. This has the advantage of allowing the wash solution to be collected after step (c) and recycled as extension solution in step (b) when the method steps are repeated.
Also disclosed is a kit comprising a terminal deoxynucleotidyl transferase (TdT) as defined herein in combination with an initiator sequence and one or more 3′-blocked nucleoside triphosphates.
The invention includes the nucleic acid sequence used to express the modified terminal transferase. Included within the invention are the codon-optimized cDNA sequences which express the modified terminal transferase. Included are the codon-optimized cDNA sequences for each of the protein variants (SEQ ID NOs 4-727).
The nucleic acid sequence may be the sequence below (ID 728):
The invention includes a cell line producing the modified terminal transferase.
Expression of TdT Variants
Briefly, plasmids containing genes encoding terminal transferase enzymes were transformed into BL21 E. coli competent cells. Starter Luria broth (LB) cultures were grown overnight at 37° C. and inoculated into LB expression cultures. Expression cultures were grown to an optical density of 0.6 at 600 nm and induced by the addition of IPTG to 1 mM. Cultures were induced and grown overnight at 25° C. The following morning, cultures were lysed in a detergent lysis buffer and purified to homogeneity by immobilized metal affinity chromatography (IMAC).
Assaying the Incorporation of Reversible Terminators by TdT Variants
173 terminal transferase enzymes were expressed, purified, and compared against wild-type bovine TdT (SEQ ID NO 2). Purified engineered TdTs were then used in the following assays: a fluorescently labeled 15-nt ssDNA primer was incubated with 1×TdT buffer (Thermo Fisher Scientific), yeast inorganic pyrophosphatase (Sigma-Aldrich, 0.1 mU/μl), 3′-azidomethyl dTTP or 3′-aminooxy dATP, and engineered TdT (24 μg/μl) for 10 min at 37° C. Formamide (Fisher Scientific) was then used to quench the reaction, and samples were directly loaded onto and analysed by denaturing polyacrylamide gel electrophoresis. Gels were imaged and resulting gel bands were quantified with a Typhoon scanner (GE).
The results from the 173 TdT enzymes (SEQ ID NOs 1-173) are shown in
Assaying the Incorporation of Base-Modified Reversible Terminators by TdT Variants
192 terminal deoxynucleotidyl transferase (TdT) variants were expressed and purified as described above. The expressed variants had SEQ ID NOs:
345, 347, 352, 357, 359, 360, 361, 362, 364, 365, 366, 367, 368, 370, 371, 372, 375, 376, 377, 378, 380, 382, 383, 384, 385, 387, 388, 392, 393, 394, 395, 397, 398, 399, 401, 405, 406, 410, 411, 416, 418, 422, 426, 427, 430, 433, 436, 439, 440, 442, 444, 445, 446, 447, 450, 453, 454, 455, 457, 460, 461, 462, 463, 464, 467, 472, 473, 475, 476, 477, 478, 479, 480, 485, 486, 487, 489, 492, 494, 495, 497, 499, 500, 503, 505, 506, 507, 509, 510, 514, 516, 517, 519, 524, 525, 526, 527, 528, 529, 531, 532, 533, 535, 543, 544, 546, 550, 553, 555, 557, 559, 560, 561, 562, 564, 565, 567, 568, 570, 572, 573, 575, 580, 582, 584, 589, 593, 595, 598, 599, 600, 601, 604, 605, 606, 609, 611, 612, 614, 618, 619, 620, 623, 629, 637, 638, 639, 641, 643, 644, 646, 648, 649, 651, 652, 654, 657, 658, 660, 661, 662, 664, 665, 666, 667, 670, 673, 678, 679, 681, 684, 685, 687, 690, 692, 698, 699, 700, 703, 706, 707, 708, 711, 712, 715, 716, 717, 718, 720, 721, 722, 725.
The base-modified reversible terminators A*, C*, G*, and T* were supplied as substrates to engineered variants in a single nucleotide incorporation assay. A pool of DNA initiators with degenerate ends ( . . . NNN, where N=A, C, G, T) was immobilised to a solid support and exposed to nucleotide addition mixture (NAM) solutions containing (1) a TdT variant, (2) neutral pH buffer, (3) monovalent salt, (4) cobalt chloride, and (5) 3′-ONH2-dXTP, where X is the modified nucleobase resulting in A*, C*, G*, or T*. The incubation temperature was 37° C. and the reaction time was 2.5 minutes. The solid support was then washed with a high salt solution at neutral pH followed by a low salt solution at neutral pH. The initiator pools were transformed into sequencing libraries and analysed by next-generation sequencing (NGS) on an Illumina NextSeq500 with paired-end 30 cycle reads (PE30 reads). Bcl files were converted to fastq files with Illumina's bcl2fastq conversion software and analysed in R. The incorporation efficiency ([reads containing N+1 product]/{[reads containing N initiator]+[reads containing N+1 product]}) was calculated for the addition of each modified base against all possible initiator contexts. The average incorporation efficiency across all contexts is presented in
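By way of illustration only, the incorporation efficiency defined above can be computed per initiator context and then averaged. The analysis described above was carried out in R; the Python sketch below is not that analysis. The function names and the read counts are hypothetical, and the use of an unweighted mean across contexts is an assumption of this sketch.

```python
# Illustrative sketch only: efficiency = N+1 reads / (N reads + N+1 reads).
def incorporation_efficiency(initiator_reads, product_reads):
    """Fraction of reads for one context that contain the N+1 product."""
    total = initiator_reads + product_reads
    if total == 0:
        raise ValueError("no reads for this context")
    return product_reads / total


def average_efficiency(counts_by_context):
    """Unweighted mean efficiency over all initiator contexts (an assumption)."""
    efficiencies = [
        incorporation_efficiency(initiator_reads, product_reads)
        for initiator_reads, product_reads in counts_by_context.values()
    ]
    return sum(efficiencies) / len(efficiencies)


# Made-up counts: context -> (reads containing N initiator, reads containing N+1 product)
example_counts = {"AAA": (20, 80), "AAC": (50, 50), "AAG": (10, 90), "AAT": (40, 60)}
print(round(average_efficiency(example_counts), 3))
```

An unweighted mean treats every initiator context equally regardless of read depth; a read-weighted mean would instead favour contexts that are over-represented in the library.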
Assaying the Multicycling Capabilities of TdT Variants
An 8-nt nucleic acid sequence was synthesized on a solid support using 3′-ONH2 nucleoside 5′-triphosphates with terminal deoxynucleotidyl transferase (TdT) SEQ ID NOs 1 and 344-727. The TdT variants were expressed and purified as described above. The DNA sequence 5′-ATCGATCG-3′ was synthesized by repeated exposure of a solid support bound DNA initiator to nucleotide addition mixture (NAM) solutions containing (1) TdT, (2) neutral pH buffer, (3) monovalent salt, (4) cobalt chloride, and (5) 3′-ONH2-dNTP, where N is selected from adenine, thymine, cytosine, or guanine. A cycle consists of the following: (A) NAM solution with a specified A, T, C, or G reversibly terminated nucleotide was incubated for 5 minutes on the solid support at 37° C.; (B) the solid support was then washed with a high salt solution at neutral pH; (C) the solid support was then exposed to acidic aqueous sodium nitrite; and (D) the solid support was then washed with the same high salt solution at neutral pH from (B). (A)-(D) were then repeated 7 more times to synthesize the desired 8-nt sequence. Synthesized DNA was analysed by running reactions on a denaturing polyacrylamide gel and quantified by virtue of a fluorophore covalently attached to the DNA initiator. The fraction full length (8-nt species) was determined by taking the fluorescence intensity of the 8-nt band and dividing by the total lane intensity. Wild-type bovine and spotted gar TdT activities, expressed as fraction full length, are indicated by the dotted line at y=0, showing that they are incapable of synthesizing any 8-nt product. All mutations included in the TdT variants resulted in improvements in TdT relative to wild-type bovine and spotted gar TdTs. Results are shown in
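By way of illustration only, the fraction full length described above can be computed from per-band fluorescence intensities as sketched below. The function name, the dictionary layout (number of nucleotides added mapped to band intensity) and the intensity values are hypothetical.

```python
# Illustrative sketch only: fraction full length = 8-nt band intensity / total lane intensity.
def fraction_full_length(band_intensities, full_length=8):
    """band_intensities maps the number of nucleotides added to the band intensity."""
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return band_intensities.get(full_length, 0.0) / total


# Made-up lane: most signal in the 8-nt band, some truncated products.
example_lane = {6: 10.0, 7: 30.0, 8: 60.0}
print(fraction_full_length(example_lane))  # 0.6

# A lane with no 8-nt band gives 0, corresponding to the dotted line at y=0.
print(fraction_full_length({0: 70.0, 1: 30.0}))  # 0.0
```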
Number | Date | Country | Kind |
---|---|---|---|
1901501.5 | Feb 2019 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2020/050247 | 2/4/2020 | WO | 00 |