Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase

Information

  • Patent Application
  • 20060078909
  • Publication Number
    20060078909
  • Date Filed
    June 07, 2005
  • Date Published
    April 13, 2006
Abstract
The present invention relates to the field of DNA recombinant technology. More specifically, this invention relates to fusion proteins comprising an ATP generating polypeptide joined to a polypeptide that converts ATP into a detectable entity. Accordingly, this invention focuses on sulfurylase-luciferase fusion proteins. This invention also relates to pharmaceutical compositions containing the fusion proteins and methods for using them.
Description
FIELD OF THE INVENTION

The invention relates generally to fusion proteins that are useful as reporter proteins, in particular to fusion proteins of ATP sulfurylase and luciferase which are utilized to achieve an efficient conversion of pyrophosphate (PPi) to light. This invention also relates to a novel thermostable sulfurylase which can be used in the detection of inorganic pyrophosphate, particularly in the sequencing of nucleic acid.


BACKGROUND OF THE INVENTION

ATP sulfurylase has been identified as being involved in sulfur metabolism. It catalyzes the initial reaction in the metabolism of inorganic sulfate (SO4−2) (see, e.g., Robbins and Lipmann, 1958. J. Biol. Chem. 233: 686-690; Hawes and Nicholas, 1973. Biochem. J. 133: 541-550). In this reaction SO4−2 is activated to adenosine 5′-phosphosulfate (APS). ATP sulfurylase is also commonly used in pyrophosphate sequencing methods. In order to convert pyrophosphate (PPi) generated from the addition of dNMP to a growing DNA chain to light, PPi must first be converted to ATP by ATP sulfurylase.


ATP produced by an ATP sulfurylase can also be hydrolyzed using enzymatic reactions to generate light. Light-emitting chemical reactions (i.e., chemiluminescence) and biological reactions (i.e., bioluminescence) are widely used in analytical biochemistry for sensitive measurements of various metabolites. In bioluminescent reactions, the chemical reaction that leads to the emission of light is enzyme-catalyzed. For example, the luciferin-luciferase system allows for specific assay of ATP. Thus, both ATP generating enzymes, such as ATP sulfurylase, and light emitting enzymes, such as luciferase, could be useful in a number of different assays for detecting specific substances in fluids and gases and/or measuring their concentration. Since high physical and chemical stability is sometimes required for enzymes involved in sequencing reactions, a thermostable enzyme is desirable.


Because the product of the sulfurylase reaction is consumed by luciferase, bringing these two enzymes into proximity by covalently linking them in the form of a fusion protein would provide for a more efficient use of the substrate. Substrate channeling is a phenomenon in which substrates are efficiently delivered from enzyme to enzyme without equilibration with other pools of the same substrates. In effect, this creates local pools of metabolites at high concentrations relative to those found in other areas of the cell. Therefore, a fusion of an ATP generating polypeptide and an ATP converting polypeptide could benefit from the phenomenon of substrate channeling and would reduce production costs and increase the number of enzymatic reactions that occur during a given time period.


All patents and publications cited throughout the specification are hereby incorporated by reference into this specification in their entirety in order to more fully describe the state of the art to which this invention pertains.


SUMMARY OF THE INVENTION

The invention provides a fusion protein comprising an ATP generating polypeptide bound to a polypeptide which converts ATP into an entity which is detectable. In one aspect, the invention provides a fusion protein comprising a sulfurylase polypeptide bound to a luciferase polypeptide. This invention provides a nucleic acid that comprises an open reading frame that encodes a novel thermostable sulfurylase polypeptide. In a further aspect, the invention provides for a fusion protein comprising a thermostable sulfurylase joined to at least one affinity tag.


In another aspect, the invention provides a recombinant polynucleotide that comprises a coding sequence for a fusion protein having a sulfurylase polypeptide sequence joined to a luciferase polypeptide sequence. In a further aspect, the invention provides an expression vector for expressing a fusion protein. The expression vector comprises a coding sequence for a fusion protein having: (i) a regulatory sequence, (ii) a first polypeptide sequence of an ATP generating polypeptide and (iii) a second polypeptide sequence that converts ATP to an entity which is detectable. In an additional embodiment, the fusion protein comprises a sulfurylase polypeptide and a luciferase polypeptide. In another aspect, the invention provides a transformed host cell which comprises the expression vector. In an additional aspect, the invention provides a fusion protein bound to a mobile support. The invention also includes a kit comprising a sulfurylase-luciferase fusion protein expression vector.


The invention also includes a method for determining the nucleic acid sequence in a template nucleic acid polymer, comprising: (a) introducing the template nucleic acid polymer into a polymerization environment in which the nucleic acid polymer will act as a template polymer for the synthesis of a complementary nucleic acid polymer when nucleotides are added; (b) successively providing to the polymerization environment a series of feedstocks, each feedstock comprising a nucleotide selected from among the nucleotides from which the complementary nucleic acid polymer will be formed, such that if the nucleotide in the feedstock is complementary to the next nucleotide in the template polymer to be sequenced said nucleotide will be incorporated into the complementary polymer and inorganic pyrophosphate will be released; (c) separately recovering each of the feedstocks from the polymerization environment; and (d) measuring the amount of PPi with an ATP generating polypeptide-ATP converting polypeptide fusion protein in each of the recovered feedstocks to determine the identity of each nucleotide in the complementary polymer and thus the sequence of the template polymer. In one embodiment, the amount of inorganic pyrophosphate is measured by the steps of: (a) adding adenosine-5′-phosphosulfate to the feedstock; (b) combining the recovered feedstock containing adenosine-5′-phosphosulfate with an ATP generating polypeptide-ATP converting polypeptide fusion protein such that any inorganic pyrophosphate in the recovered feedstock and the adenosine-5′-phosphosulfate will react to form ATP and sulfate; (c) combining the ATP and sulfate-containing feedstock with luciferin in the presence of oxygen such that the ATP is consumed to produce AMP, inorganic pyrophosphate, carbon dioxide and light; and (d) measuring the amount of light produced.
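The feedstock cycle of steps (a) through (d) can be illustrated with a toy simulation. The sketch below is not part of the disclosure: the function name simulate_feedstocks, the fixed dispensing order, and the assumption that each incorporated nucleotide releases exactly one PPi that is converted to one unit of light are simplifications introduced here for illustration only.

```python
# Toy model of successive feedstock additions (hypothetical helper, not from the source).
# Assumption: one incorporated nucleotide -> one PPi -> one ATP -> one unit of light.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_feedstocks(template_in_read_order, flow_order="ACGT", max_flows=100):
    """Return a list of (nucleotide, ppi_released) for each feedstock dispensed.

    template_in_read_order lists the template bases in the order in which the
    polymerase encounters them.
    """
    position = 0
    signals = []
    for flow_index in range(max_flows):
        if position >= len(template_in_read_order):
            break  # the whole template has been copied
        nucleotide = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        # A run of identical template bases incorporates several nucleotides
        # from the same feedstock, releasing one PPi per incorporation.
        while (position < len(template_in_read_order)
               and COMPLEMENT[template_in_read_order[position]] == nucleotide):
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


if __name__ == "__main__":
    print(simulate_feedstocks("TTGCA"))
    # [('A', 2), ('C', 1), ('G', 1), ('T', 1)]
```

In this idealized model a feedstock that is not complementary to the next template base gives a signal of zero, and a run of identical template bases gives a proportionally larger signal.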


In another aspect, the invention includes a method wherein each feedstock comprises adenosine-5′-phosphosulfate and luciferin in addition to the selected nucleotide base, and the amount of inorganic pyrophosphate is determined by reacting the inorganic pyrophosphate feedstock with an ATP generating polypeptide-ATP converting polypeptide fusion protein thereby producing light in an amount proportional to the amount of inorganic pyrophosphate, and measuring the amount of light produced.


In another aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing one or more nucleic acid anchor primers; (b) providing a plurality of single-stranded circular nucleic acid templates disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm; (c) annealing an effective amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing an effective amount of a sequencing primer to one or more copies of said covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto the 3′ end of said sequencing primer, a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of an ATP generating polypeptide-ATP converting polypeptide fusion protein, thereby determining the sequence of the nucleic acid.


In one aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing at least one nucleic acid anchor primer; (b) providing a plurality of single-stranded circular nucleic acid templates in an array having at least 400,000 discrete reaction sites; (c) annealing a first amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing a second amount of a sequencing primer to one or more copies of the covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, when the predetermined nucleotide triphosphate is incorporated onto the 3′ end of the sequencing primer, to yield a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of an ATP generating polypeptide-ATP converting polypeptide fusion protein, thereby determining the sequence of the nucleic acid at each reaction site that contains a nucleic acid template.


In another aspect, the invention includes a method of determining the base sequence of a plurality of nucleotides on an array, the method comprising the steps of: (a) providing a plurality of sample DNAs, each disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, (b) adding an activated nucleotide 5′-triphosphate precursor of one known nitrogenous base to a reaction mixture in each reaction chamber, each reaction mixture comprising a template-directed nucleotide polymerase and a single-stranded polynucleotide template hybridized to a complementary oligonucleotide primer strand at least one nucleotide residue shorter than the templates to form at least one unpaired nucleotide residue in each template at the 3′-end of the primer strand, under reaction conditions which allow incorporation of the activated nucleoside 5′-triphosphate precursor onto the 3′-end of the primer strands, provided the nitrogenous base of the activated nucleoside 5′-triphosphate precursor is complementary to the nitrogenous base of the unpaired nucleotide residue of the templates; (c) determining whether or not the nucleoside 5′-triphosphate precursor was incorporated into the primer strands through detection of a sequencing byproduct with an ATP generating polypeptide-ATP converting polypeptide fusion protein, thus indicating that the unpaired nucleotide residue of the template has a nitrogenous base composition that is complementary to that of the incorporated nucleoside 5′-triphosphate precursor; (d) sequentially repeating steps (b) and (c), wherein each sequential repetition adds, and detects the incorporation of, one type of activated nucleoside 5′-triphosphate precursor of known nitrogenous base composition; and (e) determining the base sequence of the unpaired nucleotide residues of the template in each reaction chamber from the sequence of incorporation of said nucleoside precursors.
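Step (e), reading the template sequence from the order of incorporation, can be sketched as a small decoding routine. This is an illustrative sketch only: the function names, the unit_signal parameter, and the rule of rounding each light signal to the nearest whole number of incorporations are assumptions made here, not details given in the specification.

```python
# Hypothetical decoder for a series of (nucleotide, light_signal) measurements.
# Assumption: the signal is roughly proportional to the number of nucleotides
# incorporated, so signal / unit_signal rounds to the incorporation count.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode_incorporations(flows, unit_signal=1.0):
    """Return the bases added to the primer, in order of incorporation."""
    extended = []
    for nucleotide, signal in flows:
        count = max(round(signal / unit_signal), 0)
        extended.append(nucleotide * count)
    return "".join(extended)


def template_from_extension(extended):
    """Return the template bases that paired with the incorporated bases."""
    return "".join(COMPLEMENT[base] for base in extended)


if __name__ == "__main__":
    measured = [("A", 2.1), ("C", 0.05), ("G", 0.9), ("T", 1.1)]
    added = decode_incorporations(measured)
    print(added)                           # AAGT
    print(template_from_extension(added))  # TTCA
```

The template bases are reported in the order in which the polymerase read them, which is the reverse of the conventional 5′ to 3′ orientation of the template strand.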


In one aspect, the invention includes a method for determining the nucleic acid sequence in a template nucleic acid polymer, comprising: (a) introducing a plurality of template nucleic acid polymers into a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, each reaction chamber having a polymerization environment in which the nucleic acid polymer will act as a template polymer for the synthesis of a complementary nucleic acid polymer when nucleotides are added; (b) successively providing to the polymerization environment a series of feedstocks, each feedstock comprising a nucleotide selected from among the nucleotides from which the complementary nucleic acid polymer will be formed, such that if the nucleotide in the feedstock is complementary to the next nucleotide in the template polymer to be sequenced said nucleotide will be incorporated into the complementary polymer and inorganic pyrophosphate will be released; and (c) detecting the formation of inorganic pyrophosphate with an ATP generating polypeptide-ATP converting polypeptide fusion protein to determine the identity of each nucleotide in the complementary polymer and thus the sequence of the template polymer.


In one aspect, the invention provides a method of identifying the base in a target position in a DNA sequence of sample DNA including the steps comprising: (a) disposing sample DNA within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, said DNA being rendered single stranded either before or after being disposed in the reaction chambers, (b) providing an extension primer which hybridizes to said immobilized single-stranded DNA at a position immediately adjacent to said target position; (c) subjecting said immobilized single-stranded DNA to a polymerase reaction in the presence of a predetermined nucleotide triphosphate, wherein if the predetermined nucleotide triphosphate is incorporated onto the 3′ end of said sequencing primer then a sequencing reaction byproduct is formed; and (d) identifying the sequencing reaction byproduct with an ATP generating polypeptide-ATP converting polypeptide fusion protein, thereby determining the nucleotide complementary to the base at said target position.


The invention also includes a method of identifying a base at a target position in a sample DNA sequence comprising: (a) providing sample DNA disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, said DNA being rendered single stranded either before or after being disposed in the reaction chambers; (b) providing an extension primer which hybridizes to the sample DNA immediately adjacent to the target position; (c) subjecting the sample DNA sequence and the extension primer to a polymerase reaction in the presence of a nucleotide triphosphate whereby the nucleotide triphosphate will only become incorporated and release pyrophosphate (PPi) if it is complementary to the base in the target position, said nucleotide triphosphate being added either to separate aliquots of sample-primer mixture or successively to the same sample-primer mixture; (d) detecting the release of PPi with an ATP generating polypeptide-ATP converting polypeptide fusion protein to indicate which nucleotide is incorporated.


In one aspect, the invention provides a method of identifying a base at a target position in a single-stranded sample DNA sequence, the method comprising: (a) providing an extension primer which hybridizes to sample DNA immediately adjacent to the target position, said sample DNA disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, said DNA being rendered single stranded either before or after being disposed in the reaction chambers; (b) subjecting the sample DNA and extension primer to a polymerase reaction in the presence of a predetermined deoxynucleotide or dideoxynucleotide whereby the deoxynucleotide or dideoxynucleotide will only become incorporated and release pyrophosphate (PPi) if it is complementary to the base in the target position, said predetermined deoxynucleotides or dideoxynucleotides being added either to separate aliquots of sample-primer mixture or successively to the same sample-primer mixture, (c) detecting any release of PPi with an ATP generating polypeptide-ATP converting polypeptide fusion protein to indicate which deoxynucleotide or dideoxynucleotide is incorporated; characterized in that the PPi-detection enzyme(s) are included in the polymerase reaction step and in that in place of deoxy- or dideoxy adenosine triphosphate (ATP) a dATP or ddATP analogue is used which is capable of acting as a substrate for a polymerase but incapable of acting as a substrate for a said PPi-detection enzyme.


In another aspect, the invention includes a method of determining the base sequence of a plurality of nucleotides on an array, the method comprising: (a) providing a plurality of sample DNAs, each disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, (b) converting PPi into light with an ATP generating polypeptide-ATP converting polypeptide fusion protein; (c) detecting the light level emitted from a plurality of reaction sites on respective portions of an optically sensitive device; (d) converting the light impinging upon each of said portions of said optically sensitive device into an electrical signal which is distinguishable from the signals from all of said other regions; (e) determining a light intensity for each of said discrete regions from the corresponding electrical signal; (f) recording the variations of said electrical signals with time.
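Steps (c) through (f) describe mapping regions of an optical detector to reaction sites and recording each site's signal over time. A minimal sketch of that bookkeeping follows. The data layout (frames as nested lists of pixel values, and a regions dictionary mapping a site label to its pixel coordinates) and the use of a simple mean as the per-site intensity are assumptions made here for illustration; the specification does not prescribe them.

```python
# Hypothetical per-site light recording. Each frame is a 2D list of pixel
# values; each reaction site owns a fixed set of (row, column) pixels.


def site_intensities(frame, regions):
    """Return the mean pixel value for every reaction site in one frame."""
    return {
        site: sum(frame[row][col] for row, col in pixels) / len(pixels)
        for site, pixels in regions.items()
    }


def record_traces(frames, regions):
    """Return, for every site, its intensity in each successive frame."""
    traces = {site: [] for site in regions}
    for frame in frames:
        for site, value in site_intensities(frame, regions).items():
            traces[site].append(value)
    return traces


if __name__ == "__main__":
    regions = {"left": [(0, 0), (1, 0)], "right": [(0, 1), (1, 1)]}
    frames = [
        [[0, 10], [0, 20]],  # first frame: only the right-hand site emits light
        [[4, 0], [8, 0]],    # second frame: only the left-hand site emits light
    ]
    print(record_traces(frames, regions))
    # {'left': [0.0, 6.0], 'right': [15.0, 0.0]}
```

Keeping one trace per site is what makes the signal from each region distinguishable from the signals of all other regions.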


In one aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing one or more nucleic acid anchor primers; (b) providing a plurality of single-stranded circular nucleic acid templates disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm; (c) converting PPi into a detectable entity with the use of an ATP generating polypeptide-ATP converting polypeptide fusion protein; (d) detecting the light level emitted from a plurality of reaction sites on respective portions of an optically sensitive device; (e) converting the light impinging upon each of said portions of said optically sensitive device into an electrical signal which is distinguishable from the signals from all of said other regions; (f) determining a light intensity for each of said discrete regions from the corresponding electrical signal; (g) recording the variations of said electrical signals with time.


In another aspect, the invention includes a method for sequencing a nucleic acid, the method comprising: (a) providing at least one nucleic acid anchor primer; (b) providing a plurality of single-stranded circular nucleic acid templates in an array having at least 400,000 discrete reaction sites; (c) converting PPi into a detectable entity with an ATP generating polypeptide-ATP converting polypeptide fusion protein; (d) detecting the light level emitted from a plurality of reaction sites on respective portions of an optically sensitive device; (e) converting the light impinging upon each of said portions of said optically sensitive device into an electrical signal which is distinguishable from the signals from all of said other regions; (f) determining a light intensity for each of said discrete regions from the corresponding electrical signal; (g) recording the variations of said electrical signals with time.


In another aspect, the invention includes an isolated polypeptide comprising an amino acid sequence selected from the group consisting of: (a) a mature form of an amino acid sequence of SEQ ID NO: 2; (b) a variant of a mature form of an amino acid sequence of SEQ ID NO: 2; (c) an amino acid sequence of SEQ ID NO: 2; (d) a variant of an amino acid sequence of SEQ ID NO: 2, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 5% of amino acid residues from said amino acid sequence; and (e) at least one conservative amino acid substitution to the amino acid sequences in (a), (b), (c) or (d). The invention also includes an antibody that binds immunospecifically to the polypeptide of (a), (b), (c) or (d).


In another aspect, the invention includes an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a polypeptide comprising an amino acid sequence selected from the group consisting of: (a) a mature form of an amino acid sequence of SEQ ID NO: 2; (b) a variant of a mature form of an amino acid sequence of SEQ ID NO: 2, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 5% of the amino acid residues from the amino acid sequence of said mature form; (c) an amino acid sequence of SEQ ID NO: 2; (d) a variant of an amino acid sequence of SEQ ID NO: 2, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 15% of amino acid residues from said amino acid sequence; (e) a nucleic acid fragment encoding at least a portion of a polypeptide comprising an amino acid sequence of SEQ ID NO: 2, or a variant of said polypeptide, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 5% of amino acid residues from said amino acid sequence; and (f) a nucleic acid molecule comprising the complement of (a), (b), (c), (d) or (e).
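The percentage limits on variants (no more than 5% or 15% of residues differing) can be checked with simple arithmetic. The sketch below assumes, as a simplification not stated in the specification, that the reference and variant sequences are already aligned and of equal length, so that residues can be compared position by position; the function names and the toy sequences are hypothetical.

```python
# Hypothetical check of a variant against a percentage-difference limit.
# Assumption: both sequences are pre-aligned and have the same length.


def percent_differing(reference, variant):
    """Return the percentage of positions at which the residues differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(reference, variant) if a != b)
    return 100.0 * differences / len(reference)


def within_limit(reference, variant, max_percent):
    """Return True if the variant differs in no more than max_percent of residues."""
    return percent_differing(reference, variant) <= max_percent


if __name__ == "__main__":
    reference = "ACDEFGHIKLMNPQRSTVWY"   # toy 20-residue sequence, not SEQ ID NO: 2
    one_change = "ACDEFGHIKLMNPQRSTVWF"  # 1 of 20 residues differs
    two_changes = "GCDEFGHIKLMNPQRSTVWF" # 2 of 20 residues differ
    print(percent_differing(reference, one_change))   # 5.0
    print(within_limit(reference, two_changes, 5.0))  # False
    print(within_limit(reference, two_changes, 15.0)) # True
```

For sequences of different lengths, a real comparison would first require an alignment step, which is outside the scope of this sketch.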


In a further aspect, the invention provides a nucleic acid molecule wherein the nucleic acid molecule comprises a nucleotide sequence selected from the group consisting of: (a) a first nucleotide sequence comprising a coding sequence differing by one or more nucleotide sequences from a coding sequence encoding said amino acid sequence, provided that no more than 20% of the nucleotides in the coding sequence in said first nucleotide sequence differ from said coding sequence; (b) an isolated second polynucleotide that is a complement of the first polynucleotide; and (c) a nucleic acid fragment of (a) or (b). The invention also includes a vector comprising the nucleic acid molecule of (a) or (b). In another aspect, the invention includes a cell comprising the vector.


In a further aspect, the invention includes a method for determining the nucleic acid sequence in a template nucleic acid polymer, comprising: (a) introducing the template nucleic acid polymer into a polymerization environment in which the nucleic acid polymer will act as a template polymer for the synthesis of a complementary nucleic acid polymer when nucleotides are added; (b) successively providing to the polymerization environment a series of feedstocks, each feedstock comprising a nucleotide selected from among the nucleotides from which the complementary nucleic acid polymer will be formed, such that if the nucleotide in the feedstock is complementary to the next nucleotide in the template polymer to be sequenced said nucleotide will be incorporated into the complementary polymer and inorganic pyrophosphate will be released; (c) separately recovering each of the feedstocks from the polymerization environment; and (d) measuring the amount of PPi with an ATP sulfurylase and a luciferase in each of the recovered feedstocks to determine the identity of each nucleotide in the complementary polymer and thus the sequence of the template polymer.


In another aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing one or more nucleic acid anchor primers; (b) providing a plurality of single-stranded circular nucleic acid templates disposed within a plurality of cavities in an array on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm and at least 400,000 discrete sites; (c) annealing an effective amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing an effective amount of a sequencing primer to one or more copies of said covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto the 3′ end of said sequencing primer, a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of an ATP sulfurylase and a luciferase, thereby determining the sequence of the nucleic acid.
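The two array parameters recited above, a center to center spacing of 5 to 200 μm and at least 400,000 discrete sites, together constrain the size of the array. The arithmetic below assumes a square grid of reaction chambers, which is an assumption made here for illustration; the specification does not state the packing geometry, and the function names are hypothetical.

```python
# Back-of-the-envelope array geometry, assuming a square grid of chambers.
import math


def sites_per_cm2(pitch_um):
    """Return the number of sites per square centimetre at a given pitch."""
    per_side = 10_000 / pitch_um  # 1 cm = 10,000 micrometres
    return per_side ** 2


def square_array_side_mm(n_sites, pitch_um):
    """Return the side length (mm) of the smallest square grid holding n_sites."""
    per_side = math.isqrt(n_sites - 1) + 1  # ceiling of sqrt(n_sites) for n_sites >= 1
    return per_side * pitch_um / 1000


if __name__ == "__main__":
    print(sites_per_cm2(5))                   # 4000000.0
    print(sites_per_cm2(200))                 # 2500.0
    print(square_array_side_mm(400_000, 50))  # 31.65
```

Under this assumption, 400,000 sites need a 633 by 633 grid, so the array edge ranges from roughly 3.2 mm at a 5 μm pitch to roughly 127 mm at a 200 μm pitch.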


In another aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing at least one nucleic acid anchor primer; (b) providing a plurality of single-stranded circular nucleic acid templates in an array having at least 400,000 discrete reaction sites; (c) annealing a first amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing a second amount of a sequencing primer to one or more copies of the covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, when the predetermined nucleotide triphosphate is incorporated onto the 3′ end of the sequencing primer, to yield a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of a thermostable sulfurylase and a luciferase, thereby determining the sequence of the nucleic acid at each reaction site that contains a nucleic acid template.


In a further aspect, the invention includes a method of determining the base sequence of a plurality of nucleotides on an array, the method comprising: (a) providing a plurality of sample DNAs, each disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, (b) adding an activated nucleotide 5′-triphosphate precursor of one known nitrogenous base to a reaction mixture in each reaction chamber, each reaction mixture comprising a template-directed nucleotide polymerase and a single-stranded polynucleotide template hybridized to a complementary oligonucleotide primer strand at least one nucleotide residue shorter than the templates to form at least one unpaired nucleotide residue in each template at the 3′-end of the primer strand, under reaction conditions which allow incorporation of the activated nucleoside 5′-triphosphate precursor onto the 3′-end of the primer strands, provided the nitrogenous base of the activated nucleoside 5′-triphosphate precursor is complementary to the nitrogenous base of the unpaired nucleotide residue of the templates; (c) detecting whether or not the nucleoside 5′-triphosphate precursor was incorporated into the primer strands through detection of a sequencing byproduct with a thermostable sulfurylase and luciferase, thus indicating that the unpaired nucleotide residue of the template has a nitrogenous base composition that is complementary to that of the incorporated nucleoside 5′-triphosphate precursor; (d) sequentially repeating steps (b) and (c), wherein each sequential repetition adds, and detects the incorporation of, one type of activated nucleoside 5′-triphosphate precursor of known nitrogenous base composition; and (e) determining the base sequence of the unpaired nucleotide residues of the template in each reaction chamber from the sequence of incorporation of said nucleoside precursors.




BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 shows one embodiment of a cloning strategy for obtaining the luciferase-sulfurylase sequence.



FIGS. 2A and 2B show the preparative agarose gel of luciferase and sulfurylase as well as sulfurylase-luciferase fusion genes.



FIG. 3 shows the results of experiments to determine the activity of the luciferase-sulfurylase fusion protein on NTA-agarose and MPG-SA solid supports.




DETAILED DESCRIPTION OF THE INVENTION

This invention provides a fusion protein containing an ATP generating polypeptide bound to a polypeptide which converts ATP into an entity which is detectable. As used herein, the term “fusion protein” refers to a chimeric protein containing an exogenous protein fragment joined to another exogenous protein fragment. The fusion protein could include an affinity tag to allow attachment of the protein to a solid support or to allow for purification of the recombinant fusion protein from the host cell or culture supernatant, or both.


In a preferred embodiment, the ATP generating polypeptide and ATP converting polypeptide are from a eukaryote or a prokaryote. The eukaryote could be an animal, plant, fungus or yeast. In some embodiments, the animal is a mammal, rodent, insect, worm, mollusk, reptile, bird or amphibian. Plant sources of the polypeptides include but are not limited to Arabidopsis thaliana, Brassica napus, Allium sativum, Amaranthus caudatus, Hevea brasiliensis, Hordeum vulgare, Lycopersicon esculentum, Nicotiana tabacum, Oryza sativa, Pisum sativum, Populus trichocarpa, Solanum tuberosum, Secale cereale, Sambucus nigra, Ulmus americana or Triticum aestivum. Examples of fungi include but are not limited to Penicillium chrysogenum, Stachybotrys chartarum, Aspergillus fumigatus, Podospora anserina and Trichoderma reesei. Examples of sources of yeast include but are not limited to Saccharomyces cerevisiae, Candida tropicalis, Candida lipolytica, Candida utilis, Kluyveromyces lactis, Schizosaccharomyces pombe, Yarrowia lipolytica, Candida spp., Pichia spp. and Hansenula spp.


The prokaryote source could be bacteria or archaea. In some embodiments, the bacterium is E. coli, B. subtilis, Streptococcus gordonii, flavobacteria or green sulfur bacteria. In other embodiments, the archaeon is Sulfolobus, Thermococcus, Methanobacterium, Halococcus, Halobacterium or Methanococcus jannaschii.


The ATP generating polypeptide can be an ATP sulfurylase, hydrolase or an ATP synthase. In a preferred embodiment, the ATP generating polypeptide is ATP sulfurylase. In one embodiment, the ATP sulfurylase is a thermostable sulfurylase cloned from Bacillus stearothermophilus (Bst) and comprising the nucleotide sequence of SEQ ID NO:1. This putative gene was cloned using genomic DNA acquired from ATCC (Cat. No. 12980D). The gene is shown to code for a functional ATP sulfurylase that can be expressed as a fusion protein with an affinity tag. The disclosed Bst sulfurylase nucleic acid (SEQ ID NO:1) is 1247 nucleotides in length. An open reading frame (ORF) for the mature protein was identified beginning with an ATG codon at nucleotides 1-3 and ending with a TAA codon at nucleotides 1159-1161. The start and stop codons of the open reading frame are highlighted in bold type. The putative untranslated regions are underlined and found upstream of the initiation codon and downstream from the termination codon.

Bst Thermostable Sulfurylase Nucleotide Sequence (SEQ ID NO: 1)
GTTATGAACATGAGTTTGAGCATTCCGCATGGCGGCACATTGATCAACCGTTGGAATCGG  60
GATTACCCAATGGATGAAGCAACGAAAACGATGGAGGTGTCCAAAGCCGAAGTAAGCGAC  120
CTTGAGCTGATCGGCACAGGCGCCTACAGCCCGCTCACCGGGTTTTTAAGGAAAGCCGAT  180
TACGATGCGGTCGTAGAAACGATGCGCCTCGCTGATGGCACTGTCTGGAGCATTCCGATC  240
ACGCTGGCGGTGACGGAAGAAAAAGCGAGTGAACTCACTGTCGGCGACAAAGCGAAACTC  300
GTTTATGGCGGCGACGTCTAGGGCGTCATTGAAATCGCCGATATTTACCGCCCGGATAAA  360
ACGAAAGAAGCCAAGCTCGTCTATAAAACCGATGAACTCGCTCACCCGGGCGTGGGCAAG  420
CTGTTTGAAAAACCAGATGTGTAGGTCGGCGGAGCGGTTAGGCTCGTCAAACGGAGCGAC  480
AAAGGCCAGTTTGCTCCGTTTTATTTCGATCCGGCCGAAACGCGGAAACGATTTGCCGAA  540
CTCGGCTGGAATACCGTCGTGGGCTTCCAAACACGCAACCCGGTTCACCGGGCCCATGAA  600
TACATTCAAAAATGCGCGCTTGAAATCGTGGACGGCTTGTTTTTAAACCCGCTCGTCGGC  660
GAAACGAAAGCGGACGATATTCCGGCCGACATCCGGATGGAAAGCTATCAAGTGCTGCTG  720
GAAAACTATTATCCGAAAGACCGCGTTTTCTTGGGCGTCTTCCAAGCTGCGATGCGGTAT  780
GCCGGTCCGCGCGAAGCGATTTTCCATGCCATGGTGCGGAAAAACTTCGGCTGCACGCAC  840
TTCATCGTCGGCCGGGACCATGCGGGCGTCGGCAACTATTACGGCACGTATGATGCGCAA  900
AAAATCTTCTCGAACTTTACAGCCGAAGAGCTTGGCATTACACCGCTCTTTTTCGAACAC  960
AGCTTTTATTGCAGGAAATGCGAAGGGATGGCATCGAGGAAAACATGCCCGCACGACGCA  1020
CAATATCACGTTGTCCTTTCTGGCACGAAAGTCCGTGAAATGTTGCGTAACGGCCAAGTG  1080
CCGCCGAGCACATTCAGCCGTCCGGAAGTGGCCGGCGTTTTGATCAAAGGGCTGCAAGAA  1140
CGCGAAACGGTCACCCCGTCGACACGCTAAAGGAGGAGCGAGATGAGCACGAATATCGTT  1200
TGGCATCATACATCGGTGACAAAAGAAGATCGCCGCCAACGCAACGG  1247


The Bst sulfurylase polypeptide (SEQ ID NO:2) is 386 amino acid residues in length and is presented using the three letter amino acid code.

Bst Sulfurylase Amino Acid Sequence (SEQ ID NO: 2)
Met Ser Leu Ser Ile Pro His Gly Gly Thr Leu Ile  12
Asn Arg Trp Asn Pro Asp Tyr Pro Ile Asp Glu Ala  24
Thr Lys Thr Ile Glu Leu Ser Lys Ala Glu Leu Ser  36
Asp Leu Glu Leu Ile Gly Thr Gly Ala Tyr Ser Pro  48
Leu Thr Gly Phe Leu Thr Lys Ala Asp Tyr Asp Ala  60
Val Val Glu Thr Met Arg Leu Ala Asp Gly Thr Val  72
Trp Ser Ile Pro Ile Thr Leu Ala Val Thr Glu Glu  84
Lys Ala Ser Glu Leu Thr Val Gly Asp Lys Ala Lys  96
Leu Val Tyr Gly Gly Asp Val Tyr Gly Val Ile Glu  108
Ile Ala Asp Ile Tyr Arg Pro Asp Lys Thr Lys Glu  120
Ala Lys Leu Val Tyr Lys Thr Asp Glu Leu Ala His  132
Pro Gly Val Arg Lys Leu Phe Glu Lys Pro Asp Val  144
Tyr Val Gly Gly Ala Val Thr Leu Val Lys Arg Thr  156
Asp Lys Gly Gln Phe Ala Pro Phe Tyr Phe Asp Pro  168
Ala Glu Thr Arg Lys Arg Phe Ala Glu Leu Gly Trp  180
Asn Thr Val Val Gly Phe Gln Thr Arg Asn Pro Val  192
His Arg Ala His Glu Tyr Ile Gln Lys Cys Ala Leu  204
Glu Ile Val Asp Gly Leu Phe Leu Asn Pro Leu Val  216
Gly Glu Thr Lys Ala Asp Asp Ile Pro Ala Asp Ile  228
Arg Met Glu Ser Tyr Gln Val Leu Leu Glu Asn Tyr  240
Tyr Pro Lys Asp Arg Val Phe Leu Gly Val Phe Gln  252
Ala Ala Met Arg Tyr Ala Gly Pro Arg Glu Ala Ile  264
Phe His Ala Met Val Arg Lys Asn Phe Gly Cys Thr  276
His Phe Ile Val Gly Arg Asp His Ala Gly Val Gly  288
Asn Tyr Tyr Gly Thr Tyr Asp Ala Gln Lys Ile Phe  300
Ser Asn Phe Thr Ala Glu Glu Leu Gly Ile Thr Pro  312
Leu Phe Phe Glu His Ser Phe Tyr Cys Thr Lys Cys  324
Glu Gly Met Ala Ser Thr Lys Thr Cys Pro His Asp  336
Ala Gln Tyr His Val Val Leu Ser Gly Thr Lys Val  348
Arg Glu Met Leu Arg Asn Gly Gln Val Pro Pro Ser  360
Thr Phe Ser Arg Pro Glu Val Ala Ala Val Leu Ile  372
Lys Gly Leu Gln Glu Arg Glu Thr Val Thr Pro Ser  384
Thr Arg  386
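The lengths stated above for the open reading frame and the polypeptide can be cross-checked with simple codon arithmetic. The following is only an illustrative sketch; the function name `orf_length_nt` is an assumption introduced here and is not part of the disclosure.

```python
def orf_length_nt(n_residues, include_stop=True):
    """Nucleotides needed to encode `n_residues` amino acids (3 per codon)."""
    return 3 * n_residues + (3 if include_stop else 0)

# Figures taken from the text: 386 residues, stop codon at nucleotides 1159-1161.
last_coding_nt = orf_length_nt(386, include_stop=False)
last_stop_nt = orf_length_nt(386)
print(last_coding_nt, last_stop_nt)
```

A 386-residue polypeptide requires 1158 coding nucleotides, so an ORF that starts at nucleotide 1 would place its stop codon at nucleotides 1159-1161, which agrees with the coordinates given in the text.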


In one embodiment, the thermostable sulfurylase is active at temperatures above ambient to at least 50° C. This property is beneficial so that the sulfurylase will not be denatured at the higher temperatures commonly utilized in polymerase chain reactions (PCR) or sequencing reactions. In one embodiment, the ATP sulfurylase is from a thermophile. The thermostable sulfurylase can come from thermophilic bacteria, including but not limited to, Bacillus stearothermophilus, Thermus thermophilus, Bacillus caldolyticus, Bacillus subtilis, Bacillus thermoleovorans, Pyrococcus furiosus, Sulfolobus acidocaldarius, Rhodothermus obamensis, Aquifex aeolicus, Archaeoglobus fulgidus, Aeropyrum pernix, Pyrobaculum aerophilum, Pyrococcus abyssi, Penicillium chrysogenum, Sulfolobus solfataricus and Thermomonospora fusca.


The homology of twelve ATP sulfurylases can be shown graphically in the ClustalW analysis in Table 1. The alignment is of ATP sulfurylases from the following species: Bacillus stearothermophilus (Bst), University of Oklahoma—Strain 10 (Univ of OK), Aquifex aeolicus (Aae), Pyrococcus furiosus (Pfu), Sulfolobus solfataricus (Sso), Pyrobaculum aerophilum (Pae), Archaeoglobus fulgidus (Afu), Penicillium chrysogenum (Pch), Aeropyrum pernix (Ape), Saccharomyces cerevisiae (Sce), and Thermomonospora fusca (Tfu).
[Table 1: ClustalW alignment of the ATP sulfurylases listed above; embedded images not reproduced]


A thermostable sulfurylase polypeptide is encoded by the open reading frame (“ORF”) of a thermostable sulfurylase nucleic acid. An ORF corresponds to a nucleotide sequence that could potentially be translated into a polypeptide. A stretch of nucleic acids comprising an ORF is uninterrupted by a stop codon. An ORF that represents the coding sequence for a full protein begins with an ATG “start” codon and terminates with one of the three “stop” codons, namely, TAA, TAG, or TGA. For the purposes of this invention, an ORF may be any part of a coding sequence, with or without a start codon, a stop codon, or both. For an ORF to be considered as a good candidate for coding for a bona fide cellular protein, a minimum size requirement is often set, e.g., a stretch of DNA that would encode a protein of 50 amino acids or more.
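The ORF definition above can be illustrated with a short forward-strand scan. This is a minimal sketch under stated assumptions, not the procedure used to identify the ORF of SEQ ID NO:1: the function name `find_orfs`, the `min_codons` parameter and the toy sequence are all hypothetical. It reports stretches that begin with ATG and end at the first in-frame TAA, TAG or TGA, using 1-based inclusive coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive; `end` is the last base of the
    stop codon. `min_codons` counts coding codons, excluding the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return orfs

# Hypothetical toy sequence: ATG AAA GGG TAA preceded by two filler bases.
toy = "CCATGAAAGGGTAACC"
print(find_orfs(toy))
```

Raising `min_codons` (for example to 50) applies the kind of minimum size requirement mentioned above for candidate protein-coding ORFs.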


The invention further encompasses nucleic acid molecules that differ from the nucleotide sequences shown in SEQ ID NO:1 due to degeneracy of the genetic code and thus encode the same thermostable sulfurylase proteins as that encoded by the nucleotide sequences shown in SEQ ID NO:1. In another embodiment, an isolated nucleic acid molecule of the invention has a nucleotide sequence encoding a protein having an amino acid sequence shown in SEQ ID NO:2. In addition to the thermostable sulfurylase nucleotide sequence shown in SEQ ID NO:1 it will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of the thermostable sulfurylase polypeptides may exist within a population (e.g., the bacterial population). Such genetic polymorphism in the thermostable sulfurylase genes may exist among individuals within a population due to natural allelic variation. As used herein, the terms “gene” and “recombinant gene” refer to nucleic acid molecules comprising an open reading frame encoding a thermostable sulfurylase protein. Such natural allelic variations can typically result in 1-5% variance in the nucleotide sequence of the thermostable sulfurylase genes. Any and all such nucleotide variations and resulting amino acid polymorphisms in the thermostable sulfurylase polypeptides, which are the result of natural allelic variation and that do not alter the functional activity of the thermostable sulfurylase polypeptides, are intended to be within the scope of the invention.
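Degeneracy of the genetic code can be shown with a small translation example. The sketch below assumes the standard genetic code; the `translate` helper and the `variant` sequence are hypothetical and introduced only for illustration. The `original` string is the 18-nucleotide stretch ATGAGTTTGAGCATTCCG that appears near the start of SEQ ID NO:1 and corresponds to the first six residues of SEQ ID NO:2.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate uppercase DNA in frame 1; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

original = "ATGAGTTTGAGCATTCCG"  # Met Ser Leu Ser Ile Pro
variant = "ATGTCACTTTCGATCCCA"   # hypothetical rewrite using synonymous codons

changed = sum(a != b for a, b in zip(original, variant))
print(translate(original), translate(variant), changed)
```

Both sequences translate to the same six residues even though more than half of the nucleotide positions differ, which is the situation the paragraph above brings within the scope of the invention.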


Moreover, nucleic acid molecules encoding thermostable sulfurylase proteins from other species, which thus have a nucleotide sequence that differs from SEQ ID NO:1, are intended to be within the scope of the invention. Nucleic acid molecules corresponding to natural allelic variants and homologues of the thermostable sulfurylase cDNAs of the invention can be isolated based on their homology to the thermostable sulfurylase nucleic acids disclosed herein using the human cDNAs, or a portion thereof, as a hybridization probe according to standard hybridization techniques under stringent hybridization conditions. The invention further includes the nucleic acid sequence of SEQ ID NO:1 and mature and variant forms thereof, wherein a first nucleotide sequence comprises a coding sequence differing by one or more nucleotides from a coding sequence encoding said amino acid sequence, provided that no more than 11% of the nucleotides in the first coding sequence differ from the latter coding sequence.


Another aspect of the invention pertains to nucleic acid molecules encoding a thermostable sulfurylase protein that contains changes in amino acid residues that are not essential for activity. Such thermostable sulfurylase proteins differ in amino acid sequence from SEQ ID NO:2 yet retain biological activity. In separate embodiments, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein comprises an amino acid sequence at least about 96%, 97%, 98% or 99% homologous to the amino acid sequence of SEQ ID NO:2. An isolated nucleic acid molecule encoding a thermostable sulfurylase protein homologous to the protein of SEQ ID NO: 2 can be created by introducing one or more nucleotide substitutions, additions or deletions into the nucleotide sequence of SEQ ID NO:1 such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein.
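The percentage thresholds above can be translated into residue counts for the 386-residue protein of SEQ ID NO:2. The sketch below rests on an assumption: it treats "homologous" as simple identity over an ungapped alignment of equal length, which the text does not specify. The function names and the `variant` sequence are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions carrying the same residue."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

def max_differences(length, min_identity_pct):
    """Largest number of differing positions that still meets the threshold."""
    return length * (100 - min_identity_pct) // 100

for pct in (96, 97, 98, 99):
    print(pct, max_differences(386, pct))

reference = "MSLSIPHGGTLI"  # residues 1-12 of SEQ ID NO:2, one-letter code
variant = "MSLSIPHGGTLV"    # hypothetical single substitution at position 12
print(round(percent_identity(reference, variant), 1))
```

Under this assumption, 96% identity over 386 residues allows at most 15 substituted positions, and 99% allows at most 3.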


Mutations can be introduced into SEQ ID NO:2 by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one or more predicted, non-essential amino acid residues. A “conservative amino acid substitution” is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined within the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g. threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted non-essential amino acid residue in the thermostable sulfurylase protein is replaced with another amino acid residue from the same side chain family. Alternatively, in another embodiment, mutations can be introduced randomly along all or part of a thermostable sulfurylase coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for thermostable sulfurylase biological activity to identify mutants that retain activity. Following mutagenesis of SEQ ID NO:1, the encoded protein can be expressed by any recombinant technology known in the art and the activity of the protein can be determined.
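The side chain families listed above lend themselves to a simple lookup. The following sketch encodes exactly the six families named in the paragraph, using one-letter amino acid codes; the names `SIDE_CHAIN_FAMILIES`, `shared_families` and `is_conservative` are assumptions made for illustration.

```python
# Families as listed in the text, written with one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}

def shared_families(res_a, res_b):
    """Names of the families that contain both residues, sorted alphabetically."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if res_a in members and res_b in members
    )

def is_conservative(res_a, res_b):
    """True when two different residues share at least one family."""
    return res_a != res_b and bool(shared_families(res_a, res_b))

print(shared_families("V", "I"), is_conservative("D", "K"))
```

Because some residues belong to more than one family (for example valine is both nonpolar and beta-branched), the lookup returns every family a pair has in common rather than a single label.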


The relatedness of amino acid families may also be determined based on side chain interactions. Substituted amino acids may be fully conserved “strong” residues or fully conserved “weak” residues. The “strong” group of conserved amino acid residues may be any one of the following groups: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW, wherein the single letter amino acid codes are grouped by those amino acids that may be substituted for each other. Likewise, the “weak” group of conserved residues may be any one of the following: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, VLIM, HFY, wherein the letters within each group represent the single letter amino acid code.
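The "strong" and "weak" groups above can be applied pairwise to a substitution. This is a minimal sketch that uses the groups exactly as listed in the paragraph; the function name `classify_substitution` and the returned labels are assumptions introduced here.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "VLIM", "HFY"]

def classify_substitution(res_a, res_b):
    """Label a residue pair as 'identical', 'strong', 'weak' or 'none'."""
    if res_a == res_b:
        return "identical"
    pair = {res_a, res_b}
    if any(pair <= set(group) for group in STRONG_GROUPS):
        return "strong"
    if any(pair <= set(group) for group in WEAK_GROUPS):
        return "weak"
    return "none"

for a, b in [("I", "V"), ("S", "G"), ("W", "D")]:
    print(a, b, classify_substitution(a, b))
```

The strong groups are checked first, so a pair that falls in both a strong and a weak group is reported as strong.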


The thermostable sulfurylase nucleic acid of the invention includes the nucleic acid whose sequence is provided herein, or fragments thereof. The invention also includes mutant or variant nucleic acids any of whose bases may be changed from the corresponding base shown herein while still encoding a protein that maintains its sulfurylase-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.


A thermostable sulfurylase nucleic acid can encode a mature thermostable sulfurylase polypeptide. As used herein, a “mature” form of a polypeptide or protein disclosed in the present invention is the product of a naturally occurring polypeptide or precursor form or proprotein. The naturally occurring polypeptide, precursor or proprotein includes, by way of nonlimiting example, the full-length gene product, encoded by the corresponding gene. Alternatively, it may be defined as the polypeptide, precursor or proprotein encoded by an ORF described herein. The product “mature” form arises, again by way of nonlimiting example, as a result of one or more naturally occurring processing steps as they may take place within the cell, or host cell, in which the gene product arises. Examples of such processing steps leading to a “mature” form of a polypeptide or protein include the cleavage of the N-terminal methionine residue encoded by the initiation codon of an ORF, or the proteolytic cleavage of a signal peptide or leader sequence. Thus a mature form arising from a precursor polypeptide or protein that has residues 1 to N, where residue 1 is the N-terminal methionine, would have residues 2 through N remaining after removal of the N-terminal methionine. Alternatively, a mature form arising from a precursor polypeptide or protein having residues 1 to N, in which an N-terminal signal sequence from residue 1 to residue M is cleaved, would have the residues from residue M+1 to residue N remaining. Further as used herein, a “mature” form of a polypeptide or protein may arise from a step of post-translational modification other than a proteolytic cleavage event. Such additional processes include, by way of non-limiting example, glycosylation, myristoylation or phosphorylation. In general, a mature polypeptide or protein may result from the operation of only one of these processes, or a combination of any of them.
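The two proteolytic routes to a "mature" form described above reduce to simple slicing of the residue list. The sketch below is illustrative only; the function names are assumptions, and the signal peptide length used in the example is hypothetical (the text does not state that the Bst sulfurylase has a signal peptide).

```python
def remove_initiator_met(residues):
    """Drop residue 1 when it is Met, leaving residues 2..N."""
    return residues[1:] if residues[:1] == ["Met"] else list(residues)

def remove_signal_peptide(residues, m):
    """Drop residues 1..M (1-based), leaving residues M+1..N."""
    if not 0 <= m <= len(residues):
        raise ValueError("M must lie between 0 and N")
    return residues[m:]

# First six residues of SEQ ID NO:2, used here as a stand-in precursor.
precursor = ["Met", "Ser", "Leu", "Ser", "Ile", "Pro"]
print(remove_initiator_met(precursor))
print(remove_signal_peptide(precursor, 3))  # hypothetical M = 3
```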


The term “isolated” nucleic acid molecule, as utilized herein, refers to one that is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid. Preferably, an “isolated” nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5′- and 3′-termini of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated thermostable sulfurylase nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell/tissue from which the nucleic acid is derived (e.g., brain, heart, liver, spleen, etc.). Moreover, an “isolated” nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material or culture medium when produced by recombinant techniques, or of chemical precursors or other chemicals when chemically synthesized.


A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the nucleotide sequence of SEQ ID NO:1 or a complement of this aforementioned nucleotide sequence, can be isolated using standard molecular biology techniques and the sequence information provided herein. Using all or a portion of the nucleic acid sequence of SEQ ID NO:1 as a hybridization probe, thermostable sulfurylase molecules can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook, et al., (eds.), MOLECULAR CLONING: A LABORATORY MANUAL 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989; and Ausubel, et al., (eds.), CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, N.Y., 1993.)


A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, genomic DNA, as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to thermostable sulfurylase nucleotide sequences can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.


As used herein, the term “complementary” refers to Watson-Crick or Hoogsteen base pairing between nucleotide units of a nucleic acid molecule, and the term “binding” means the physical or chemical interaction between two polypeptides or compounds or associated polypeptides or compounds or combinations thereof. Binding includes ionic, non-ionic, van der Waals, hydrophobic interactions, and the like. A physical interaction can be either direct or indirect. Indirect interactions may be through or due to the effects of another polypeptide or compound. Direct binding refers to interactions that do not take place through, or due to, the effect of another polypeptide or compound, but instead are without other substantial chemical intermediates.
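For the Watson-Crick case, the complement of a DNA strand can be computed directly. The sketch below is a simplification: it covers only Watson-Crick pairing of unmodified A, C, G and T (Hoogsteen pairing is not modeled), and the function name `reverse_complement` is an assumption.

```python
# Watson-Crick partners for unmodified DNA bases, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGAGTTTG"))
print(reverse_complement("GCGGCCGC"))
```

The second call uses the NotI recognition sequence, which is palindromic and therefore equals its own reverse complement.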


Fragments provided herein are defined as sequences of at least 6 (contiguous) nucleic acids or at least 4 (contiguous) amino acids, a length sufficient to allow for specific hybridization in the case of nucleic acids or for specific recognition of an epitope in the case of amino acids, respectively, and are at most some portion less than a full length sequence. Fragments may be derived from any contiguous portion of a nucleic acid or amino acid sequence of choice. Derivatives are nucleic acid sequences or amino acid sequences formed from the native compounds either directly or by modification or partial substitution. Analogs are nucleic acid sequences or amino acid sequences that have a structure similar to, but not identical to, the native compound but differ from it in respect to certain components or side chains. Analogs may be synthetic or from a different evolutionary origin and may have a similar or opposite metabolic activity compared to wild type. Homologs are nucleic acid sequences or amino acid sequences of a particular gene that are derived from different species.


Derivatives and analogs may be full length or other than full length, if the derivative or analog contains a modified nucleic acid or amino acid, as described below. Derivatives or analogs of the nucleic acids or proteins of the invention include, but are not limited to, molecules comprising regions that are substantially homologous to the nucleic acids or proteins of the invention, in various embodiments, by at least about 89% identity over a nucleic acid or amino acid sequence of identical size or when compared to an aligned sequence in which the alignment is done by a computer homology program known in the art, or whose encoding nucleic acid is capable of hybridizing to the complement of a sequence encoding the aforementioned proteins under stringent, moderately stringent, or low stringent conditions. See e.g. Ausubel, et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, N.Y., 1993, and below.


A “homologous nucleic acid sequence” or “homologous amino acid sequence,” or variations thereof, refer to sequences characterized by a homology at the nucleotide level or amino acid level as discussed above. Homologous nucleotide sequences encode those sequences coding for isoforms of thermostable sulfurylase polypeptides. Isoforms can be expressed in different tissues of the same organism as a result of, for example, alternative splicing of RNA. Alternatively, isoforms can be encoded by different genes. In the invention, homologous nucleotide sequences include nucleotide sequences encoding for a thermostable sulfurylase polypeptide of species other than humans, including, but not limited to: vertebrates, and thus can include, e.g., frog, mouse, rat, rabbit, dog, cat, cow, horse, and other organisms. Homologous nucleotide sequences also include, but are not limited to, naturally occurring allelic variations and mutations of the nucleotide sequences set forth herein. Homologous nucleic acid sequences include those nucleic acid sequences that encode conservative amino acid substitutions in SEQ ID NO:1, as well as a polypeptide possessing thermostable sulfurylase biological activity. Various biological activities of the thermostable sulfurylase proteins are described below.


The thermostable sulfurylase proteins of the invention include the sulfurylase protein whose sequence is provided herein. The invention also includes mutant or variant proteins any of whose residues may be changed from the corresponding residue shown herein while still encoding a protein that maintains its sulfurylase-like activities and physiological functions, or a functional fragment thereof. The invention further encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that bind immunospecifically to any of the proteins of the invention. This invention also includes a variant or a mature form of the amino acid sequence of SEQ ID NO:2, wherein one or more amino acid residues in the variant differs in no more than 4% of the amino acid residues from the amino acid sequence of the mature form.


Several assays have been developed for detection of the forward ATP sulfurylase reaction. The colorimetric molybdolysis assay is based on phosphate detection (see e.g., Wilson and Bandurski, 1958. J. Biol. Chem. 233: 975-981), whereas the continuous spectrophotometric molybdolysis assay is based upon the detection of NADH oxidation (see e.g., Seubert, et al., 1983. Arch. Biochem. Biophys. 225: 679-691; Seubert, et al., 1985. Arch. Biochem. Biophys. 240: 509-523). The latter assay requires the presence of several detection enzymes.


Suitable enzymes for converting ATP into light include luciferases, e.g., insect luciferases. Luciferases produce light as an end-product of catalysis. The best known light-emitting enzyme is that of the firefly, Photinus pyralis (Coleoptera). The corresponding gene has been cloned and expressed in bacteria (see e.g., de Wet, et al., 1985. Proc. Natl. Acad. Sci. USA 80: 7870-7873) and plants (see e.g., Ow, et al., 1986. Science 234: 856-859), as well as in insect (see e.g., Jha, et al., 1990. FEBS Lett. 274: 24-26) and mammalian cells (see e.g., de Wet, et al., 1987. Mol. Cell. Biol. 7: 725-737; Keller, et al., 1987. Proc. Natl. Acad. Sci. USA 82: 3264-3268). In addition, a number of luciferase genes from the Jamaican click beetle, Pyrophorus plagiophthalamus (Coleoptera), have recently been cloned and partially characterized (see e.g., Wood, et al., 1989. J. Biolumin. Chemilumin. 4: 289-301; Wood, et al., 1989. Science 244: 700-702). Distinct luciferases can sometimes produce light of different wavelengths, which may enable simultaneous monitoring of light emissions at different wavelengths. Accordingly, these aforementioned characteristics are unique, and add new dimensions with respect to the utilization of current reporter systems.


Firefly luciferase catalyzes bioluminescence in the presence of luciferin, adenosine 5′-triphosphate (ATP), magnesium ions, and oxygen, resulting in a quantum yield of 0.88 (see e.g., McElroy and Selinger, 1960. Arch. Biochem. Biophys. 88: 136-145). The firefly luciferase bioluminescent reaction can be utilized as an assay for the detection of ATP with a detection limit of approximately 1×10⁻¹³ M (see e.g., Leach, 1981. J. Appl. Biochem. 3: 473-517). In addition, the overall degree of sensitivity and convenience of the luciferase-mediated detection systems have created considerable interest in the development of firefly luciferase-based biosensors (see e.g., Green and Kricka, 1984. Talanta 31: 173-176; Blum, et al., 1989. J. Biolumin. Chemilumin. 4: 543-550).


The development of new reagents has made it possible to obtain stable light emission proportional to the concentration of ATP (see e.g., Lundin, 1982. Applications of firefly luciferase. In: Luminescent Assays (Raven Press, New York)). With such stable light emission reagents, it is possible to make endpoint assays and to calibrate each individual assay by addition of a known amount of ATP. In addition, a stable light-emitting system also allows continuous monitoring of ATP-converting systems.
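Calibration by addition of a known amount of ATP can be sketched as an internal-standard calculation. The example below is an assumption-laden illustration, not a protocol from the disclosure: it assumes light output is linear in ATP, that the added standard does not appreciably change the reaction volume, and that background light has already been subtracted. The function name, the readings and the units are hypothetical.

```python
def atp_from_internal_standard(light_sample, light_after_spike, atp_added):
    """Estimate ATP in a sample from the light increase caused by a known spike.

    Under the linearity assumption, light_sample / atp_sample equals
    (light_after_spike - light_sample) / atp_added.
    """
    increase = light_after_spike - light_sample
    if increase <= 0:
        raise ValueError("spike produced no increase in light; cannot calibrate")
    return atp_added * light_sample / increase

# Hypothetical readings in arbitrary light units, with a 10 pmol ATP spike.
print(atp_from_internal_standard(1200.0, 3600.0, 10.0))
```

Because the standard is measured in the same reaction as the sample, the estimate does not depend on the absolute sensitivity of the luminometer, which is why a stable light signal makes per-assay calibration practical.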


In a preferred embodiment, the ATP generating-ATP converting fusion protein is attached to an affinity tag. The term “affinity tag” is used herein to denote a peptide segment that can be attached to a polypeptide to provide for purification or detection of the polypeptide or provide sites for attachment of the polypeptide to a substrate. In principle, any peptide or protein for which an antibody or other specific binding agent is available can be used as an affinity tag. Affinity tags include a poly-histidine tract or a biotin carboxyl carrier protein (BCCP) domain, protein A (Nilsson et al., EMBO J. 4:1075, 1985; Nilsson et al., Methods Enzymol. 198:3, 1991), glutathione S transferase (Smith and Johnson, Gene 67:31, 1988), substance P, Flag™ peptide (Hopp et al., Biotechnology 6:1204-1210, 1988; available from Eastman Kodak Co., New Haven, Conn.), streptavidin binding peptide, or other antigenic epitope or binding domain. See, in general Ford et al., Protein Expression and Purification 2: 95-107, 1991. DNAs encoding affinity tags are available from commercial suppliers (e.g., Pharmacia Biotech, Piscataway, N.J.).


As used herein, the term “poly-histidine tag,” when used in reference to a fusion protein refers to the presence of two to ten histidine residues at either the amino- or carboxy-terminus of a protein of interest. A poly-histidine tract of six to ten residues is preferred. The poly-histidine tract is also defined functionally as being a number of consecutive histidine residues added to the protein of interest which allows the affinity purification of the resulting fusion protein on a nickel-chelate or IDA column.
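A poly-histidine tract as defined above can be located with a simple pattern search. The sketch below makes one loud assumption: because tags are often preceded by a few vector-derived residues, it searches within a `window` of residues from each end rather than requiring the tract to start at the very first or last residue. The function name `his_tracts` and its parameters are hypothetical. The example input is residues 1-40 of SEQ ID NO:4 in one-letter code.

```python
import re

def his_tracts(seq, window=15, min_len=2, max_len=10):
    """Return [(terminus, run_length)] for histidine runs near either end.

    Sequences shorter than `window` are scanned twice, once per terminus.
    """
    hits = []
    regions = [("N", seq[:window]), ("C", seq[-window:])]
    for terminus, region in regions:
        for run in re.findall(r"H+", region):
            if min_len <= len(run) <= max_len:
                hits.append((terminus, len(run)))
    return hits

# Residues 1-40 of SEQ ID NO:4 (Met Arg Gly Ser His x6 Gly Met Ala Ser ...).
fusion_start = "MRGSHHHHHHGMASMEAPAAAEISGHIVRSPMVGTFYRTP"
print(his_tracts(fusion_start))
```

For this fragment the search reports a single six-residue tract near the amino terminus, which falls within the preferred range of six to ten residues.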


In some embodiments, the fusion protein has an orientation such that the sulfurylase polypeptide is N-terminal to the luciferase polypeptide. In other embodiments, the luciferase polypeptide is N-terminal to the sulfurylase polypeptide. As used herein, the term sulfurylase-luciferase fusion protein refers to either of these orientations. The terms “amino-terminal” (N-terminal) and “carboxyl-terminal” (C-terminal) are used herein to denote positions within polypeptides and proteins. Where the context allows, these terms are used with reference to a particular sequence or portion of a polypeptide or protein to denote proximity or relative position. For example, a certain sequence positioned carboxyl-terminal to a reference sequence within a protein is located proximal to the carboxyl terminus of the reference sequence, but is not necessarily at the carboxyl terminus of the complete protein.


The fusion protein of this invention can be produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different polypeptide sequences are ligated together in-frame in accordance with conventional techniques, e.g. by employing blunt-ended or “sticky”-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments that can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, for example, Ausubel et al. (eds.) CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, 1992). The two polypeptides of the fusion protein can also be joined by a linker, such as a unique restriction site, which is engineered with specific primers during the cloning procedure. In one embodiment, the sulfurylase and luciferase polypeptides are joined by a linker, for example an ala-ala-ala linker which is encoded by a NotI restriction site.
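The statement that a NotI site can encode an ala-ala-ala linker can be checked by translation. The sketch below assumes the standard genetic code and the NotI recognition sequence GCGGCCGC placed so that its first base begins a codon; the `translate` helper is hypothetical. The `junction` string is the stretch TTGGCGGCCGCTATG, which appears in SEQ ID NO:3.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate uppercase DNA in frame 1; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

NOTI_SITE = "GCGGCCGC"  # 8 bp, so one more base completes the third codon

# Every GCN codon is alanine, so the base that follows the site does not matter.
linker_by_next_base = {base: translate(NOTI_SITE + base) for base in "ACGT"}
print(linker_by_next_base)

junction = "TTGGCGGCCGCTATG"
print(translate(junction))
```

Whatever base follows the site, the in-frame reading is three alanines, so the only design constraint in this sketch is that the upstream and downstream coding sequences keep the site in frame.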


In one embodiment, the invention includes a recombinant polynucleotide that comprises a coding sequence for a fusion protein having an ATP generating polypeptide sequence and an ATP converting polypeptide sequence. In a preferred embodiment, the recombinant polynucleotide encodes a sulfurylase-luciferase fusion protein. The term “recombinant DNA molecule” or “recombinant polynucleotide” as used herein refers to a DNA molecule which is comprised of segments of DNA joined together by means of molecular biological techniques. The term “recombinant protein” or “recombinant polypeptide” as used herein refers to a protein molecule which is expressed from a recombinant DNA molecule.


In one aspect, this invention discloses a sulfurylase-luciferase fusion protein with an N-terminal hexahistidine tag and a BCCP tag. The nucleic acid sequence of the disclosed N-terminal hexahistidine-BCCP luciferase-sulfurylase (His6-BCCP L-S) gene is shown below:

His6-BCCP L-S Nucleotide Sequence (SEQ ID NO: 3):
ATGCGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGGAAGCGCCAGCAGCA  60
GCGGAAATCAGTGGTCACATCGTACGTTCCCCGATGGTTGGTAGTTTCTACCGCACCCCA  120
AGCCCGGACGCAAAAGCGTTCATCGAAGTGGGTCAGAAAGTCAACGTGGGCGATACCCTG  180
TGCATCGTTGAAGCCATGAAAATGATGAACCAGATCGAAGCGGACAAATCCGGTACCGTG  240
AAAGCAATTCTGGTCGAAAGTGGACAACCGGTAGAATTTGACGAGCCGCTGGTCGTCATC  300
GAGGGATCCGAGCTCGAGATCCAAATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCG  360
CCATTCTATCCTCTAGAGGATGGAACCGCTGGAGAGCAACTGCATAAGGCTATGAAGAGA  420
TACGCCCTGGTTCCTGGAACAATTGCTTTTACAGATGCACATATCGAGGTGAACATCACG  480
TACGCGGAATACTTCGAAATGTCCGTTCGGTTGGCAGAAGCTATGAAACGATATGGGCTG  540
AATACAAATCACAGAATCGTCGTATGCAGTGAAAACTCTCTTCAATTCTTTATGCCGGTG  600
TTGGGCGCGTTATTTATCGGAGTTGCAGTTGCGCCCGCGAACGACATTTATAATGAACGT  660
GAATTGCTCAACAGTATGAACATTTCGCAGCCTACCGTAGTGTTTGTTTCCAAAAAGGGG  720
TTGCAAAAAATTTTGAACGTGCAAAAAAAATTACCAATAATCCAGAAAATTATTATCATG  780
GATTCTAAAACGGATTACCAGGGATTTCAGTCGATGTACACGTTCGTCACATCTGATCTA  840
CCTCCCGGTTTTAATGAATACGATTTTGTACCAGAGTCCTTTGATCGTGACAAAACAATT  900
GCACTGATAATGAATTCCTCTGGATCTACTGGGTTACCTAAGGGTGTGGCCCTTCCGCAT  960
AGAACTGCCTGCGTCAGATTCTCGCATGCCAGAGATCCTATTTTTGGCAATCAAATCATT  1020
CCGGATACTGCGATTTTAAGTGTTGTTCCATTCCATCACGGTTTTGGAATGTTTACTACA  1080
CTCGGATATTTGATATGTGGATTTCGAGTCGTCTTAATGTATAGATTTGAAGAAGAGCTG  1140
TTTTTACGATCCCTTCAGGATTACAAAATTCAAAGTGCGTTGCTAGTACCAACCCTATTT  1200
TCATTCTTCGCCAAAAGCACTCTGATTGACAAATACGATTTATCTAATTTACACGAAATT  1260
GCTTCTGGGGGCGCACCTCTTTCGAAAGAAGTCGGGGAAGCGGTTGCAAAACGCTTCCAT  1320
CTTCCAGGGATACGACAAGGATATGGGCTCACTGAGACTACATCAGCTATTCTGATTACA  1380
CCCGAGGGGGATGATAAACCGGGCGCGGTCGGTAAAGTTGTTCCATTTTTTGAAGCGAAG  1440
GTTGTGGATCTGGATACCGGGAAAACGCTGGGCGTTAATCAGAGAGGCGAATTATGTGTC  1500
AGAGGACCTATGATTATGTCCGGTTATGTAAACAATCCGGAAGCGACCAACGCCTTGATT  1560
GACAAGGATGGATGGCTACATTCTGGAGACATAGCTTACTGGGACGAAGACGAACACTTC  1620
TTCATAGTTGACCGCTTGAAGTCTTTAATTAAATACAAAGGATATCAGGTGGCCCCCGCT  1680
GAATTGGAATCGATATTGTTACAACACCCCAACATCTTCGACGCGGGCGTGGCAGGTCTT  1740
CCCGACGATGACGCCGGTGAACTTCCCGGCGCCGTTGTTGTTTTGGAGCACGGAAAGACG  1800
ATGACGGAAAAAGAGATCGTGGATTACGTCGCCAGTCAAGTAACAACCGCGAAAAAGTTG  1860
CGCGGAGGAGTTGTGTTTGTGGACGAAGTACCGAAAGGTCTTACCGGAAAACTCGACGCA  1920
AGAAAAATCAGAGAGATCCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGGCGGCC  1980
GCTATGCCTGCTCCTCACGGTGGTATTCTACAAGACTTGATTGCTAGAGATGCGTTAAAG  2040
AAGAATGAATTGTTATCTGAAGCGCAATCTTCGGACATTTTAGTATGGAACTTGACTCCT  2100
AGACAACTATGTGATATTGAATTGATTCTAAATGGTGGGTTTTCTCCTCTGACTGGGTTT  2160
TTGAACGAAAACGATTACTCCTCTGTTGTTACAGATTCGAGATTAGCAGACGGCACATTG  2220
TGGACCATCCCTATTACATTAGATGTTGATGAAGCATTTGCTAACCAAATTAAACCAGAC  2280
ACAAGAATTGCCCTTTTCCAAGATGATGAAATTCCTATTGCTATACTTACTGTCCAGGAT  2340
GTTTACAAGCCAAACAAAACTATCGAAGCCGAAAAAGTCTTCAGAGGTGACCCAGAACAT  2400
CCAGCCATTAGCTATTTATTTAACGTTGCCGGTGATTATTACGTCGGCGGTTCTTTAGAA  2460
GCGATTCAATTACCTCAACATTATGACTATCCAGGTTTGCGTAAGACACCTGCCCAACTA  2520
AGACTTGAATTCCAATCAAGACAATGGGACCGTGTCGTAGCTTTCCAAACTCGTAATCCA  2580
ATGCATAGAGCCCACAGGGAGTTGACTGTGAGAGCCGCCAGAGAAGCTAATGCTAAGGTG  2640
CTGATCCATCCAGTTGTTGGACTAACCAAACCAGGTGATATAGACCATCACACTCGTGTT  2700
CGTGTCTACCAGGAAATTATTAAGCGTTATCCTAATGGTATTGCTTTCTTATCCCTGTTG  2760
CCATTAGCAATGAGAATGAGTGGTGATAGAGAAGCCGTATGGCATGCTATTATTAGAAAG  2820
AATTATGGTGCCTCCCACTTCATTGTTGGTAGAGACCATGCGGGCCCAGGTAAGAACTCC  2880
AAGGGTGTTGATTTCTACGGTCCATACGATGCTCAAGAATTGGTCGAATCCTACAAGCAT  2940
GAACTGGACATTGAAGTTGTTGCATTCAGAATGGTCACTTATTTGCCAGACGAAGACCGT  3000
TATGCTCCAATTGATCAAATTGACACCACAAAGACGAGAACCTTGAACATTTCAGGTACA  3060
GAGTTGAGACGCCGTTTAAGAGTTGGTGGTGAGATTCCTGAATGGTTCTCATATCCTGAA  3120
GTGGTTAAAATCCTAAGAGAATCCAACCCACCAAGACCAAAACAAGGTTTTTCAATTGTT  3180
TTAGGTAATTCATTAACCGTTTCTCGTGAGCAATTATCCATTGCTTTGTTGTCAACATTC  3240
TTGCAATTCGGTGGTGGCAGGTATTACAAGATCTTTGAACACAATAATAAGACAGAGTTA  3300
CTATCTTTGATTCAAGATTTCATTGGTTCTGGTAGTGGACTAATTATTCCAAATCAATGG  3360
GAAGATGACAAGGACTCTGTTGTTGGCAAGCAAAACGTTTACTTATTAGATACCTCAAGC  3420
TCAGCCGATATTCAGCTAGAGTCAGCGGATGAACCTATTTCACATATTGTACAAAAAGTT  3480
GTCCTATTCTTGGAAGACAATGGCTTTTTTGTATTTTAA  3519


The amino acid sequence of the disclosed His6-BCCP L-S polypeptide is presented using the three-letter amino acid code (SEQ ID NO:4).

His6-BCCP L-S Amino Acid Sequence (SEQ ID NO: 4)Met Arg Gly Ser His His His His His His Gly Met  1               5                  10Ala Ser Met Glu Ala Pro Ala Ala Ala Glu Ile Ser         15                  20Gly His Ile Val Arg Ser Pro Met Val Gly Thr Phe 25                  30                  35Tyr Arg Thr Pro Ser Pro Asp Ala Lys Ala Phe Ile         40                  45Glu Val Gly Gln Lys Val Asn Val Gly Asp Thr Leu     50                  55                  60Cys Ile Val Glu Ala Met Lys Met Met Asn Gln Ile                 65                  70Glu Ala Asp Lys Ser Gly Thr Val Lys Ala Ile Leu         75                  80Val Glu Ser Gly Gln Pro Val Glu Phe Asp Glu Pro 85                  90                  95Leu Val Val Ile Glu Gly Ser Glu Leu Glu Ile Gln            100                 105Met Glu Asp Ala Lys Asn Ile Lys Lys Gly Pro Ala    110                 115                 120Pro Phe Tyr Pro Leu Glu Asp Gly Thr Ala Gly Glu                125                 130Gln Leu His Lys Ala Met Lys Arg Tyr Ala Leu Val        135                 140Pro Gly Thr Ile Ala Phe Thr Asp Ala His Ile Glu145                 150                 155Val Asn Ile Thr Tyr Ala Glu Tyr Phe Glu Met Ser            160                 165Val Arg Leu Ala Glu Ala Met Lys Arg Tyr Gly Leu    170                 175                 180Asn Thr Asn His Arg Ile Val Val Cys Ser Glu Asn                185                 190Ser Leu Gln Phe Phe Met Pro Val Leu Gly Ala Leu        195                 200Phe Ile Gly Val Ala Val Ala Pro Ala Asn Asp Ile205                 210                 215Tyr Asn Glu Arg Glu Leu Leu Asn Ser Met Asn Ile            220                 225Ser Gln Pro Thr Val Val Phe Val Ser Lys Lys Gly    230                 235                 240Leu Gln Lys Ile Leu Asn Val Gln Lys Lys Leu Pro                245                 250Ile Ile Gln Lys Ile Ile Ile Met Asp Ser Lys Thr        255                 260Asp Tyr Gln Gly Phe Gln Ser Met Tyr Thr Phe Val265                 270  
               275Thr Ser His Leu Pro Pro Gly Phe Asn Glu Tyr Asp            280                 285Phe Val Pro Glu Ser Phe Asp Arg Asp Lys Thr Ile    290                 295                 300Ala Leu Ile Met Asn Ser Ser Gly Ser Thr Gly Leu                305                 310Pro Lys Gly Val Ala Leu Pro His Arg Thr Ala Cys        315                 320Val Arg Phe Ser His Ala Arg Asp Pro Ile Phe Gly325                 330                 335Asn Gln Ile Ile Pro Asp Thr Ala Ile Leu Ser Val            340                 345Val Pro Phe His His Gly Phe Gly Met Phe Thr Thr    350                 355                 360Leu Gly Tyr Leu Ile Cys Gly Phe Arg Val Val Leu                365                 370Met Tyr Arg Phe Glu Glu Glu Leu Phe Leu Arg Ser        375                 380Leu Gln Asp Tyr Lys Ile Gln Ser Ala Leu Leu Val385                 390                 395Pro Thr Leu Phe Ser Phe Phe Ala Lys Ser Thr Leu            400                 405Ile Asp Lys Tyr Asp Leu Ser Asn Leu His Glu Ile    410                 415                 420Ala Ser Gly Gly Ala Pro Leu Ser Lys Glu Val Gly                425                 430Glu Ala Val Ala Lys Arg Phe His Leu Pro Gly Ile        435                 440Arg Gln Gly Tyr Gly Leu Thr Glu Thr Thr Ser Ala445                 450                 455Ile Leu Ile Thr Pro Glu Gly Asp Asp Lys Pro Gly            460                 465Ala Val Gly Lys Val Val Pro Phe Phe Glu Ala Lys    470                 475                 480Val Val Asp Leu Asp Thr Gly Lys Thr Leu Gly Val                485                 490Asn Gln Arg Gly Glu Leu Cys Val Arg Gly Pro Met        495                 500Ile Met Ser Gly Tyr Val Asn Asn Pro Glu Ala Thr505                 510                 515Asn Ala Leu Ile Asp Lys Asp Gly Trp Leu His Ser            520                 525Gly Asp Ile Ala Tyr Trp Asp Glu Asp Glu His Phe    530                 535                 540Phe Ile Val Asp Arg Leu Lys Ser Leu Ile Lys Tyr                545                 
550Lys Gly Tyr Gln Val Ala Pro Ala Glu Leu Glu Ser        555                 560Ile Leu Leu Gln His Pro Asn Ile Phe Asp Ala Gly565                 570                 575Val Ala Gly Leu Pro Asp Asp Asp Ala Gly Glu Leu            580                 585Pro Ala Ala Val Val Val Leu Glu His Gly Lys Thr    590                 595                 600Met Thr Glu Lys Glu Ile Val Asp Tyr Val Ala Ser                605                 610Gln Val Thr Thr Ala Lys Lys Leu Arg Gly Gly Val        615                 620Val Phe Val Asp Glu Val Pro Lys Gly Leu Thr Gly625                 630                 635Lys Leu Asp Ala Arg Lys Ile Arg Glu Ile Leu Ile            640                 645Lys Ala Lys Lys Gly Gly Lys Ser Lys Leu Ala Ala    650                 655                 660Ala Met Pro Ala Pro His Gly Gly Ile Leu Gln Asp                665                 670Leu Ile Ala Arg Asp Ala Leu Lys Lys Asn Glu Leu            675                 680Leu Ser Glu Ala Gln Ser Ser Asp Ile Leu Val Trp    685                 690                 695Asn Leu Thr Pro Arg Gln Leu Cys Asp Ile Glu Leu                700                 705Ile Leu Asn Gly Gly Phe Ser Pro Leu Thr Gly Phe        710                 715Leu Asn Glu Asn Asp Tyr Ser Ser Val Val Thr Asp720                 725                 730Ser Arg Leu Ala Asp Gly Thr Leu Trp Thr Ile Pro            735                 740Ile Thr Leu Asp Val Asp Glu Ala Phe Ala Asn Gln    745                 750                 755Ile Lys Pro Asp Thr Arg Ile Ala Leu Phe Gln Asp                760                 765Asp Glu Ile Pro Ile Ala Ile Leu Thr Val Gln Asp        770                 775Val Tyr Lys Pro Asn Lys Thr Ile Glu Ala Glu Lys780                 785                 790Val Phe Arg Gly Asp Pro Glu His Pro Ala Ile Ser            795                 800Tyr Leu Phe Asn Val Ala Gly Asp Tyr Tyr Val Gly    805                 810                 815Gly Ser Leu Glu Ala Ile Gln Leu Pro Gln His Tyr                820                 825Asp Tyr Pro 
Gly Leu Arg Lys Thr Pro Ala Gln Leu        830                 835Arg Leu Glu Phe Gln Ser Arg Gln Trp Asp Arg Val840                 845                 850Val Ala Phe Gln Thr Arg Asn Pro Met His Arg Ala            855                 860His Arg Glu Leu Thr Val Arg Ala Ala Arg Glu Ala    865                 870                 875Asn Ala Lys Val Leu Ile His Pro Val Val Gly Leu                880                 885Thr Lys Pro Gly Asp Ile Asp His His Thr Arg Val        890                 895Arg Val Tyr Gln Glu Ile Ile Lys Arg Tyr Pro Asn900                 905                 910Gly Ile Ala Phe Leu Ser Leu Leu Pro Leu Ala Met            915                 920Arg Met Ser Gly Asp Arg Glu Ala Val Trp His Ala    925                 930                 935Ile Ile Arg Lys Asn Tyr Gly Ala Ser His Phe Ile                940                 945Val Gly Arg Asp His Ala Gly Pro Gly Lys Asn Ser        950                 955Lys Gly Val Asp Phe Tyr Gly Pro Tyr Asp Ala Gln960                 965                 970Glu Leu Val Glu Ser Tyr Lys His Glu Leu Asp Ile            975                 980Glu Val Val Pro Phe Arg Met Val Thr Tyr Leu Pro    985                 990                 995Asp Glu Asp Arg Tyr Ala Pro Ile Asp Gln Ile Asp                1000                1005Thr Thr Lys Thr Arg Thr Leu Asn Ile Ser Gly Thr        1010                1015Glu Leu Arg Arg Arg Leu Arg Val Gly Gly Glu Ile1020                1025                1030Pro Glu Trp Phe Ser Tyr Pro Glu Val Val Lys Ile            1035                1040Leu Arg Glu Ser Asn Pro Pro Arg Pro Lys Gln Gly    1045                1050                1055Phe Ser Ile Val Leu Gly Asn Ser Leu Thr Val Ser                1060                1065Arg Glu Gln Leu Ser Ile Ala Leu Leu Ser Thr Phe        1070                1075Leu Gln Phe Gly Gly Gly Arg Tyr Tyr Lys Ile Phe1080                1085                1090Glu His Asn Asn Lys Thr Glu Leu Leu Ser Leu Ile            1095                1100Gln Asp Phe Ile Gly Ser Gly Ser 
Gly Leu Ile Ile    1105                1110                1115Pro Asn Gln Trp Glu Asp Asp Lys Asp Ser Val Val        1120                1125Gly Lys Gln Asn Val Tyr Leu Leu Asp Thr Ser Ser        1130                1135Ser Ala Asp Ile Gln Leu Glu Ser Ala Asp Glu Pro1140                1145                1150Ile Ser His Ile Val Gln Lys Val Val Leu Phe Leu            1155                1160Glu Asp Asn Gly Phe Phe Val Phe    1165                1170
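
The relationship between a nucleotide listing such as SEQ ID NO:3 and its three-letter amino acid listing can be illustrated with a minimal Python sketch. It assumes the standard genetic code and that the input is the sequence body only (heading removed), with the running position counters (60, 120, ...) still embedded in the text. The function names `clean_listing` and `translate` are hypothetical helpers introduced for illustration and are not part of the disclosure.

```python
import re

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def clean_listing(raw):
    """Remove position counters and whitespace from a flattened listing body."""
    dna = re.sub(r"[\d\s]", "", raw).upper()
    if re.search(r"[^ACGT]", dna):
        raise ValueError("listing contains characters other than A, C, G, T")
    return dna


def translate(dna):
    """Translate an open reading frame into three-letter residues, stopping at a stop codon."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        residues.append(ONE_TO_THREE[amino_acid])
    return residues


# First 72 nucleotides of the His6-BCCP L-S listing, counter "60" still embedded.
fragment = (
    "ATGCGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGGAAGCGCCAGCAGCA60"
    "GCGGAAATCAGT"
)
print(" ".join(translate(clean_listing(fragment))))
```

For this fragment the sketch prints the first 24 residues, beginning Met Arg Gly Ser followed by the six-histidine tag, which agrees with the opening of the listing for SEQ ID NO:4.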


Accordingly, in one aspect, the invention provides for a fusion protein comprising a thermostable sulfurylase joined to at least one affinity tag. The nucleic acid sequence of the disclosed N-terminal hexahistidine-BCCP Bst ATP Sulfurylase (His6-BCCP Bst Sulfurylase) gene is shown below:

His6-BCCP Bst Sulfurylase Nucleotide Sequence (SEQ ID NO: 5)ATGCGGGGTTCTCATGATCATCATCATCATGGTATGGCTAGCATGGAAGGGCCAGCAGCA60GCGGAAATCAGTGGTCACATCGTACGTTCCCCGATGGTTGGTACTTTCTACCGCACCCCA120AGCCCGGACGCAAAAGCGTTCATCGAAGTGGGTCAGAAAGTCAACGTGGGCGATACCCTG180TGCATCGTTGAAGCCATGAAAATGATGAACCAGATCGAAGCGGACAAATCCGGTACCGTG240AAAGCAATTCTGGTCGAAAGTGGACAACCGGTAGAATTTGACGAGCCGCTGGTCGTCATC300GAGGGATCCGAGCTCGAGATCTGCAGCATGAGCGTAAGCATCCCGCATGGCGGCACATTG360ATCAACCGTTGGAATCCGGATTACCCAATCGATGAAGCAACGAAAACGATCGAGCTGTCC420AAAGCCGAACTAAGCGACCTTGAGCTGATCGGCACAGGCGCCTACAGCCCGCTCACCGGG480TTTTTAACGAAAGCCGATTACGATGCGGTCGTAGAAACGATGCGCCTCGCTGATGGCACT540GTCTGGAGCATTCCGATCACGCTGGCGGTGACGGAAGAAAAAGCGAGTGAACTCACTGTC600GGCGACAAAGCGAAACTCGTTTATGGCGGCGACGTCTACGGCGTCATTGAAATCGCCGAT660ATTTACCGCCCGGATAAAACGAAAGAAGCCAAGCTCGTCTATAAAACCGATGAACTCGCT720CACCCGGGCGTGCGCAAGCTGTTTGAAAAACCAGATGTGTACGTCGGCGGAGCGGTTACG780CTCGTCAAACGGACCGACAAAGGCCAGTTTGCTCCGTTTTATTTCGATCCGGCCGAAACG840CGGAAACGATTTGCCGAACTCGGCTGGAATACCGTCGTCGGCTTCCAAACACGCAACCCG900GTTCACCGCGCCCATGAATACATTCAAAAATGCGCGCTTGAAATCGTGGACGGCTTGTTT960TTAAACCCGCTCGTCGGCGAAACGAAAGCGGACGATATTCCGGCCGACATCCGGATGGAA1020AGCTATCAAGTGCTGCTGGAAAACTATTATCCGAAAGACCGCGTTTTCTTGGGCGTCTTC1080CAAGCTGCGATGCGCTATGCCGGTCCGCGCGAAGCGATTTTCCATGCCATGGTGCGGAAA1140AACTTCGGCTGCACGCACTTCATCGTCGGCCGCGACCATGCGGGCGTCGGCAACTATTAC1200GGCACGTATGATGCGCAAAAAATCTTCTCGAACTTTACAGCCGAAGAGCTTGGCATTACA1260CCGCTCTTTTTCGAACACAGCTTTTATTGCACGAAATGCGAAGGCATGGCATCGACGAAA1320ACATGCCCGCACGACGCACAATATCACGTTGTCCTTTCTGGCACGAAAGTCCGTGAAATG1380TTGCGTAACGGCCAAGTGCCGCCGAGCACATTCAGCCGTCCGGAAGTGGCCGCCGTTTTG1440ATCAAAGGGCTGCAAGAACGCGAAACGGTCGCCCCGTCAGCGGGCTAA1488


The amino acid sequence of the His6-BCCP Bst Sulfurylase polypeptide is presented using the three-letter amino acid code in Table 6 (SEQ ID NO:6).

Claims
  • 1-221. (canceled)
  • 222. A method of determining the base sequence of a plurality of single-stranded template nucleotides on an array, the method comprising: (a) providing a planar surface comprising at least 400,000 discrete cavities, wherein each cavity forms a reaction chamber containing single-stranded nucleic acid templates of a single species, wherein the reaction chambers have a center-to-center spacing of between 5 and 200 μm, wherein each reaction chamber contains a reaction mixture comprising a template-directed nucleotide polymerase and one of said plurality of single-stranded template nucleotides hybridized to a complementary oligonucleotide primer strand at least one nucleotide residue shorter than the single-stranded template nucleotides to form at least one unpaired nucleotide residue in each template at the 3′-end of the primer strand; (b) adding an activated nucleoside 5′-triphosphate precursor of one known nitrogenous base to the reaction chambers under conditions which allow incorporation of the activated nucleoside 5′-triphosphate precursor onto the 3′-end of the primer strand, provided the nitrogenous base of the activated nucleoside 5′-triphosphate precursor is complementary to the nitrogenous base of the unpaired nucleotide residue of the templates; (c) detecting whether or not the nucleoside 5′-triphosphate precursor was incorporated into the primer strands in each reaction chamber by detecting a sequencing byproduct with an ATP generating polypeptide-ATP converting polypeptide fusion protein or an ATP generating protein and an ATP converting protein, thus indicating that the unpaired nucleotide residue of the template has a nitrogenous base composition that is complementary to that of the incorporated nucleoside 5′-triphosphate precursor in each reaction chamber; (d) sequentially repeating steps (b) and (c), wherein each sequential repetition adds, and detects the incorporation of, one type of activated nucleoside 5′-triphosphate precursor of known nitrogenous base composition; and (e) determining the base sequence of the unpaired nucleotide residues of the template in each reaction chamber from the sequence of incorporation of said nucleoside precursors.
  • 223. The method of claim 222 wherein said sequencing byproduct is pyrophosphate.
  • 224. The method of claim 222 wherein the ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an ATP generating polypeptide portion with an amino acid sequence which is at least 96% homologous to SEQ ID NO:2.
  • 225. The method of claim 222 wherein the ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an ATP generating polypeptide portion with an amino acid sequence which is SEQ ID NO:6.
  • 226. The method of claim 222 wherein the ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an amino acid sequence of SEQ ID NO:4.
  • 227. The method of claim 222 wherein the ATP generating protein comprises an amino acid sequence which is at least 96% homologous to SEQ ID NO:2.
  • 228. The method of claim 222 wherein the ATP generating protein comprises an amino acid sequence of SEQ ID NO:2 or SEQ ID NO:6.
  • 229. The method of claim 222 wherein said ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an amino acid sequence encoded by a polynucleotide with an open reading frame of SEQ ID NO:3.
  • 230. The method of claim 222 wherein said ATP generating polypeptide comprises an amino acid sequence encoded by a polynucleotide with an open reading frame which is no more than 11% different from an open reading frame of SEQ ID NO:1.
  • 231. The method of claim 222 wherein said ATP generating polypeptide comprises an amino acid sequence encoded by an open reading frame of SEQ ID NO:1 or SEQ ID NO:5.
  • 232. The method of claim 222 wherein said ATP generating polypeptide-ATP converting polypeptide fusion protein or said ATP generating protein further comprises an affinity tag.
  • 233. The method of claim 222 wherein said ATP generating polypeptide-ATP converting polypeptide fusion protein, said ATP generating protein, or said ATP converting polypeptide is bound to a bead.
  • 234. A method of identifying a base at a target position in a sample nucleic acid sequence, comprising providing a sample nucleic acid and a primer which hybridizes to the sample nucleic acid immediately adjacent to the target position, subjecting the sample nucleic acid and primer to a polymerase reaction in the presence of a nucleotide whereby the nucleotide will only become incorporated if it is complementary to the base in the target position, and detecting said incorporation of the nucleotide by monitoring the release of inorganic pyrophosphate, whereby detection of incorporation of said nucleotide is indicative of identification of a base at a target position that is complementary to said nucleotide, and wherein the release of inorganic pyrophosphate is detected using a thermostable sulfurylase-luciferase fusion protein or a thermostable sulfurylase.
  • 235. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase comprises an amino acid sequence of at least 96% homology to SEQ ID NO:2, SEQ ID NO:4 or SEQ ID NO:6.
  • 236. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase is encoded by an open reading frame of SEQ ID NO: 1, 3 or 5.
  • 237. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase further comprises an affinity tag.
  • 238. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase is bound to a bead.
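
Steps (b) through (e) of claim 222 can be illustrated with a minimal, idealized Python sketch for a single reaction chamber. The following are assumptions made for illustration and are not taken from the specification: the names `simulate_flows` and `call_bases`, the flow order "TACG", and the premise that each incorporation releases one pyrophosphate so that the detected signal is exactly proportional to the number of bases incorporated in a flow. Real light signals are noisy, and this sketch does not model the sulfurylase or luciferase reactions.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(template_ahead, flow_order="TACG", cycles=4):
    """Simulate cyclic nucleotide addition for one reaction chamber.

    template_ahead holds the unpaired template bases beyond the primer 3'-end,
    in the order the polymerase encounters them. Returns a list of
    (nucleotide, count) pairs, where count is the idealized number of
    pyrophosphate molecules released per template copy in that flow.
    """
    position = 0
    flows = []
    for _ in range(cycles):
        for nucleotide in flow_order:
            count = 0
            # Step (b): incorporate only while the added base is complementary.
            while (position < len(template_ahead)
                   and COMPLEMENT[template_ahead[position]] == nucleotide):
                count += 1
                position += 1
            # Step (c): the count stands in for the detected byproduct signal.
            flows.append((nucleotide, count))
    return flows


def call_bases(flows):
    """Step (e): rebuild the extended strand and infer the template bases."""
    synthesized = "".join(nucleotide * count for nucleotide, count in flows)
    template = "".join(COMPLEMENT[base] for base in synthesized)
    return synthesized, template


flows = simulate_flows("GGCTA", cycles=3)
synthesized, template = call_bases(flows)
print(flows)
print(synthesized, template)
```

In this example the third flow (C) reports two incorporations because the template presents two consecutive G residues, and the inferred template bases match the input "GGCTA". A flow with a count of zero corresponds to a nucleotide that was added but not incorporated.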
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 10/154,515, filed May 23, 2002, which is a continuation-in-part of U.S. patent application Ser. No. 10/122,706, filed Apr. 11, 2002, which claims the benefit of priority to U.S. Patent Application 60/335,949, filed Oct. 30, 2001, and U.S. Patent Application 60/349,076, filed Jan. 16, 2002. All patents, patent applications and references cited in this specification are hereby incorporated by reference.

Provisional Applications (2)
Number Date Country
60335949 Oct 2001 US
60349076 Jan 2002 US
Continuations (1)
Relation Number Date Country
Parent 10154515 May 2002 US
Child 11147763 Jun 2005 US
Continuations in Part (1)
Relation Number Date Country
Parent 10122706 Apr 2002 US
Child 10154515 May 2002 US