This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.
To conform to the requirements for International Patent Applications, many of the figures presented herein are black and white representations of images originally created in color. The original color versions can be viewed in Perez-Jimenez et al., 2011, Nat Struct Mol. Biol., 18(5):592-6 (including the accompanying Supplementary Information available in the on-line version of the manuscript available on the Nature Structural & Molecular Biology web site) and Perez-Jimenez, et al., 2009, Nat Struct Mol Biol 16: 890-6, and Alegre-Cebollada et al., 2010, J Biol Chem, 285(25):18961-6. The contents of Perez-Jimenez et al., 2011, Nat Struct Mol. Biol., May; 18(5):592-6 (including the accompanying “Supplementary Information,”), Perez-Jimenez et al., 2009, Nat Struct Mol Biol 16:890-6 and Alegre-Cebollada et al., 2010, J Biol Chem, 285(25):18961-6, are herein incorporated by reference in their entireties.
All patents, patent applications and publications cited herein are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art as known to those skilled therein as of the date of the invention described herein.
The market for industrial enzymes has exploded in the past decades, with applications now including biotech, pharma, detergents, textile production, food processing, wine making, paper manufacturing, beauty products and many other areas. This has created an increasing need for enzymes that are stable at a wider range of temperatures and pH. As of today, there is no reliable method to achieve this while not simultaneously affecting the activity. A common practice nowadays is to randomly insert mutations in existing enzymes and screen for variants that exhibit the desired characteristics. However, due to the enormous combinatorial possibilities, this often becomes a costly and work-intense endeavor, and never guarantees success. Still, this has been the preferred method to discover most of the presently used industrial enzymes, many of which are patented.
Little is known about how the chemistry of primitive enzymes arose and how the environmental conditions affected the evolution of their chemistry (Zalatan et al., Nat. Chem. Biol., 5:516-520 (2009)); however, since primitive organisms lived on the primordial earth, in an environment that was much hotter and more acidic than today, their enzymes would have been optimized to have a higher thermal and acidic stability than their modern counterparts. Experimental paleogenetics and paleobiochemistry (e.g. the study of resurrected proteins) can reveal valuable information regarding the adaptation of extinct forms of life to climatic, ecological and physiological alterations (Thornton, Science 301, 1714-7 (2003); Thomson et al., Nat Genet. 37, 630-5 (2005); Boussau et al., Nature 456, 942-5 (2008); Chang et al., Mol Biol Evol 19, 1483-9 (2002)). Unfortunately, previous reconstruction and resurrection provide a journey back in time on the order of only a few million years (Myr) (Benner et al., Adv Enzymol Relat Areas Mol Biol 75, 1-132, xi (2007); Thornton, Nat Rev Genet. 5, 366-75 (2004); Gaucher et al., Nature 425, 285-8 (2003)). Consequently, many hypotheses about ancient life remain untested and cannot be directly answered by examining fossil records (Nisbet and Sleep, Nature 409, 1083-91 (2001)). There is a need for reliable methods for optimizing the pH and temperature stabilities of existing enzymes. There is also a need for methods useful for developing enzymes in a predictable and cost effective manner that are more effective and work in a wider range of environments. This invention addresses these needs.
In one aspect, the invention relates to an isolated polypeptide having a sequence selected from the group consisting of: SEQ ID NO: 1-7. In another aspect, the invention relates to an isolated polypeptide having at least about 75% identity to SEQ ID NO: 1-7. In still another aspect, the invention relates to an isolated polypeptide comprising at least about 10, at least about 20, at least about 30, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90 or at least about 100 consecutive amino acids from any of SEQ ID NOs: 1-7. In one embodiment, the isolated polypeptide does not have 100% identity with any extant polypeptide. In another embodiment, the variant has at least about 85.5%, at least about 90.5%, at least about 92.5%, at least about 95%, about 96%, about 96.5%, about 97%, about 97.5%, about 98%, about 98.5%, about 99%, about 99.5% or about 99.9% amino acid sequence identity to any one of SEQ ID NO: 1-7.
In still a further embodiment, the isolated polypeptide has enzymatic activity. In still another embodiment, the isolated polypeptide has thioredoxin activity.
In yet another embodiment, the isolated polypeptide is labeled. In one embodiment, the label is colorimetric, radioactive, chemiluminescent, or fluorescent. In still a further embodiment, the isolated polypeptide is chemically modified. In one embodiment, the chemical modification comprises covalent modification of an amino acid. In another embodiment, the covalent modification comprises methylation, acetylation, phosphorylation, ubiquitination, sumoylation, citrullination, or ADP ribosylation.
In one aspect, the invention relates to an isolated antibody that specifically binds to a polypeptide of any of SEQ ID NO: 1-7.
In another aspect, the invention relates to an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide having a sequence selected from the group consisting of: SEQ ID NO: 1-7. In another aspect, the invention relates to an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide having at least about 75% identity to SEQ ID NO: 1-7. In another aspect, the invention relates to an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide comprising at least about 10, at least about 20, at least about 30, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90 or at least about 100 consecutive amino acids from any of SEQ ID NOs: 1-7.
In one embodiment, the nucleic acid sequence is optimized for expression in a mammalian expression system. In another embodiment, the nucleic acid sequence is optimized for expression in a bacterial expression system. In one embodiment, the bacterial expression system is E. coli. In another embodiment, the isolated nucleic acid is operably linked to one or more control sequences that direct the production of the polypeptide in a suitable expression host.
In another aspect, the invention relates to a recombinant expression vector comprising an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide having a sequence selected from the group consisting of: SEQ ID NO: 1-7.
In another aspect, the invention relates to a recombinant expression vector comprising an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide having at least about 75% identity to SEQ ID NO: 1-7.
In another aspect, the invention relates to a recombinant expression vector comprising an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide comprising at least about 10, at least about 20, at least about 30, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90 or at least about 100 consecutive amino acids from any of SEQ ID NOs: 1-7.
In another aspect, the invention relates to a host cell comprising a recombinant expression vector comprising an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide having a sequence selected from the group consisting of: SEQ ID NO: 1-7.
In another aspect, the invention relates to a host cell comprising a recombinant expression vector comprising an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide having at least about 75% identity to SEQ ID NO: 1-7.
In another aspect, the invention relates to a host cell comprising a recombinant expression vector comprising an isolated nucleic acid comprising a nucleic acid sequence which encodes a polypeptide comprising at least about 10, at least about 20, at least about 30, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90 or at least about 100 consecutive amino acids from any of SEQ ID NOs: 1-7.
In still a further aspect, the invention relates to a method for producing a polypeptide having a sequence selected from the group consisting of: SEQ ID NO: 1-7, the method comprising cultivating a host cell comprising a nucleic acid construct comprising a polynucleotide encoding the polypeptide under conditions suitable for production of the polypeptide; and recovering the polypeptide.
In still a further aspect, the invention relates to a method for producing a polypeptide having at least about 75% identity to SEQ ID NO: 1-7, the method comprising cultivating a host cell comprising a nucleic acid construct comprising a polynucleotide encoding the polypeptide under conditions suitable for production of the polypeptide; and recovering the polypeptide.
In still a further aspect, the invention relates to a method for producing a polypeptide comprising at least about 10, at least about 20, at least about 30, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90 or at least about 100 consecutive amino acids from any of SEQ ID NOs: 1-7, the method comprising cultivating a host cell comprising a nucleic acid construct comprising a polynucleotide encoding the polypeptide under conditions suitable for production of the polypeptide; and recovering the polypeptide.
In still another aspect, the invention relates to a method of generating a reconstructed ancestral polypeptide having greater activity or stability at low pH than an extant polypeptide, the method comprising (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position, and wherein a polypeptide comprising the reconstructed ancestral polypeptide sequence has increased activity or stability at low pH relative to the extant polypeptide.
In still another aspect, the invention relates to a method of generating a reconstructed ancestral polypeptide having greater activity or stability at high temperature than an extant polypeptide, the method comprising (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position, and wherein a polypeptide comprising the reconstructed ancestral polypeptide sequence has increased activity or stability at high temperature relative to the extant polypeptide.
In still another aspect, the invention relates to a method of generating a reconstructed ancestral polypeptide having a higher melting temperature than an extant polypeptide, the method comprising (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position, and wherein a polypeptide comprising the reconstructed ancestral polypeptide sequence has a higher melting temperature than the extant polypeptide.
In one embodiment, the extant polypeptide is a thioredoxin polypeptide.
In another aspect, the invention relates to a polypeptide produced according to the methods described herein.
The issued patents, applications, and other publications that are cited herein are hereby incorporated by reference to the same extent as if each was specifically and individually indicated to be incorporated by reference.
Industry has a large demand for pH-stable and temperature-stable polypeptides for use in a number of industrial applications. Methods to alter polypeptide pH and temperature stability without eliminating function of the polypeptide are highly needed. The methods described herein are related in part to the finding that it is possible to predict, synthesize and characterize enzymes from extinct organisms that lived on earth as long as 4 billion years ago. In certain aspects, the methods described herein relate to the understanding that because these organisms lived on the primordial earth (i.e. in an environment that was much hotter and more acidic than today), their enzymes were necessarily optimized through selective pressure to have a higher thermal and acidic stability than their modern counterparts. In some aspects, the methods described herein relate to the finding that because enzyme homologues exist in different species, Bayesian statistics can be used to predict the ancestral gene encoding a version of the enzyme that was present in the common ancestor of these organisms.
In certain aspects, the methods described herein can be used to substitute amino acids according to their presence in resurrected protein sequences from extinct organisms. In one embodiment, the methods described herein are useful for altering (e.g. increasing) the stability of a recombinant polypeptide at low pH and/or high temperatures by making one or more conservative substitutions in the amino acid sequence of the polypeptide. In one embodiment, the methods described herein are useful for altering (e.g. increasing) the activity of a recombinant polypeptide at low pH and/or high temperatures by making one or more conservative substitutions in the amino acid sequence of the polypeptide.
In certain aspects, the invention described herein relates to the finding that single molecule force-clamp spectroscopy can be used to study protein dynamics under a mechanical force. The experimental resurrection of ancestors of these universal enzymes together with the sensitivity of single-molecule techniques can be a powerful tool towards understanding the origin and evolution of life on Earth. As described herein, the force-dependency of a reaction can be a sensitive probe of substrate nanomechanics during catalysis. This type of protein spectroscopy can also be useful for obtaining details of enzyme active site dynamics. The methods described herein can also complement structural X-ray and NMR data and provide benchmarks for molecular dynamics simulations.
The singular forms “a,” “an,” and “the” include plural references unless the content clearly dictates otherwise.
As used herein, “sequence identity” means the percentage of identical nucleotide or amino acid residues at corresponding positions in two or more sequences when the sequences are aligned to maximize sequence matching, i.e., taking into account gaps and insertions. “Percent identity” in the context of two or more nucleic acids or polypeptide sequences, refers to the percentage of nucleotides or amino acids that two or more sequences or subsequences contain which are the same. A specified percentage of amino acid residues or nucleotides can be referred to such as: 60% identity, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or more identity over a specified region, when compared and aligned for correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection.
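By way of illustration only, the percent identity described above can be computed from two sequences that have already been aligned. The following is a minimal sketch under stated assumptions: the function name `percent_identity`, the use of "-" as the gap character, and the choice of denominator (all alignment columns in which at least one sequence has a residue) are illustrative conventions and are not prescribed by this specification; other conventions for the comparison window are equally possible.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity is counted over every alignment column in which at
    least one sequence has a residue; columns that are gaps in both are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns that contain at least one residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    # A column counts as identical only when both residues match and neither is a gap.
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)
```

Under this convention, a gap opposite a residue counts as a non-identical position, so insertions and deletions lower the reported identity.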
As used herein, the term “extant” refers to taxa (such as species, genera or families) that are still in existence (living). The term extant contrasts with extinct. As used herein, the terms “extant protein”, “extant polypeptide”, “extant amino acid sequence”, “extant gene” and “extant nucleic acid sequence” refer to proteins, polypeptides, amino acid sequences, genes, and nucleic acid sequences from extant taxa.
Other definitions are provided throughout the specification.
A journey back in time is possible at the molecular level by resurrecting proteins from extinct organisms. Laboratory resurrection of these ancestral proteins enables exploration of aspects of ancient life that cannot be inferred from fossil records alone (Benner et al., Adv Enzymol Relat Areas Mol Biol 75, 1-132, xi (2007); Thornton, Nat Rev Genet. 5, 366-75 (2004); Liberles, Ancestral sequence reconstruction, xiii, 252 p. (Oxford University Press, Oxford; New York, 2007); Hall, Proc Natl Acad Sci USA 103, 5431-6 (2006)). Such time traveling is largely limited by the ambiguity in the historical models used for ancestral sequence inference (Pollock and Chang, in Ancestral sequence reconstruction, pages 85-94 (ed. Liberles, D. A., Oxford University Press, Oxford; New York, 2007); Gaucher et al., Nature 425, 285-8 (2003); Gaucher et al., Nature 451, 704-7 (2008)). For instance, uncertainties in databases, sequence alignments, failures in evolutionary theories and uncertainty in the construction of phylogenetic trees are common sources of ambiguity.
Understanding the molecular mechanisms of enzyme function presents unique challenges in biophysics. In certain aspects, the invention described herein relates to computational methods for resuscitating ancestral genes. In some embodiments, the methods described herein can be used to reconstruct the amino acid sequence of ancient proteins. Reconstructed proteins can be expressed in an expression system and, in certain applications, examined for their activity, pH stability or thermal stability (Gaucher et al. Nature, 2008. 451(7179): p. 704-U2; Gaucher et al, Nature, 2003. 425(6955): p. 285-8).
The pH and temperature stability of polypeptides can depend in part on the distribution of amino acid residues throughout the three dimensional structure of the polypeptide. In one aspect, the methods described herein relate to findings from the resurrection of seven Precambrian thioredoxin enzymes (Trx), dating back between ~1.4 and ~4 billion years (Gyr). These findings relate to the evolution of enzymatic reactions of thioredoxin enzymes (Trx) from extinct organisms that lived in the Precambrian. Their mechanism of reduction was probed using single molecule force-spectroscopy, which can readily distinguish simple nucleophiles from the more complex chemistry of the active site of Trx enzymes. As described herein, differential scanning calorimetry (DSC) showed that these resurrected enzymes have melting temperatures up to ~32° C. higher than those of extant Trx, following a trend with a slope of ~6 K/Gyr. The force-dependency of the rate of reduction of an engineered substrate can be used to determine whether the ancient Trxs utilized chemical mechanisms of reduction similar to those of modern enzymes. As described herein, the most ancient enzymes showed high activity at low pH, whereas the extant Trxs became inactive in low pH environments. The results described herein show that, while Trx enzymes have maintained their reductase chemistry unchanged, they have adapted over a 4 Gyr time span to the changes in temperature and ocean acidity that characterize the evolution of the environment from ancient to modern Earth.
The results described herein also show that the chemical mechanisms observed in modern Trx enzymes were already present in Trxs from Precambrian organisms. Ancestral Trx enzymes from LBCA, AECA and LACA that lived in the mid-to-late Hadean were highly resistant to temperature and active in relatively acidic conditions. These findings are consistent with the hypothesis that in early life Trx enzymes were present in hot environments and these environments have progressively cooled from 4 to 0.5 Gyr (Nisbet and Sleep, Nature 409, 1083-91 (2001); Gaucher et al., Nature 451, 704-7 (2008); Knauth et al., Geo. Soc. Am. Bull., 115: 566-580 (2003); Schulte, M., Oceanography 20, 42-49 (2007)). However, it is also possible that a much cooler early Earth was populated by psychrophiles, mesophiles and thermophiles and that the latter could have been the only survivors of cataclysmic events (e.g., the late heavy bombardment or global glaciations on Early Earth) (Nisbet and Sleep, Nature 409, 1083-91 (2001); Gogarten-Boekels et al., Orig. Life Evol. Biosph., 25: 251-264 (1995)). Thus, these findings indicate that important biochemical pathways in the modern biosphere were already established by 3.5 Gyr ago (Nisbet and Sleep, Nature 409, 1083-91 (2001)). For instance, metabolism is one of the most conserved cellular processes. Important pathways like energy production, sugar degradation, cofactor biosynthesis or amino acid processing are highly conserved from bacteria to humans and were likely present in LUCA (Peregrin-Alvarez et al., Genome Res 13, 422-7 (2003)).
Thus, in some aspects, the present invention is directed to a nucleic acid encoding a recombinant thioredoxin or to recombinant thioredoxin amino acid sequences, such as for example a thioredoxin polypeptide optimized to have greater stability and/or activity at high temperature and/or low pH, that has been modified to change one or more amino acids, where the one or more modifications are pH optimizing or temperature optimizing modifications.
Evolution operates at multiple levels of biological organization; however, enzymatic mechanisms accompanying adaptive changes seem to be highly conserved. The ability of enzymes to maintain specific chemical reactivities and mechanisms in disparate environments is necessary for the diversification of life. While this ability is exemplified by Trx enzymes, it can also be universal to all proteins (e.g., ubiquitin, RNase, ATPase or other metabolic enzymes that have been maintained in nearly all organisms throughout the history of life). Thus, although some of the compositions and methods described herein relate to the activity of resurrected thioredoxin, the paleoenzymological methods described herein can be used to generate polypeptides optimized to have greater stability and/or activity at high temperature and/or low pH. The experimental resurrection of ancestors of these universal proteins together with the sensitivity of single-molecule techniques can be a powerful tool towards understanding the origin and evolution of life on Earth.
In one aspect, the invention relates to computational methods for determining ancestral sequences. Such methods can be used, for example, to determine ancestral sequences for an extant polypeptide (e.g. thioredoxin). In another aspect, the invention relates to methods for increasing the stability and/or activity of a polypeptide (e.g. a thioredoxin) at low pH or at elevated temperature. Methods for determining ancestral sequences can be based on amino acid sequences or on nucleic acid sequences encoding (or predicted to encode) proteins.
In some embodiments, the computational methods described herein are based on the principle of maximum likelihood. The sequences of polypeptides used in the methods described herein can be selected on the basis of a common feature (e.g. a threshold sequence identity, common enzymatic activity, or common modular domain architecture). The methods may involve the construction of a phylogeny using an evolutionary model of the probabilities of amino acid or nucleic acid substitutions in a polypeptide among different organisms.
Where the sequences differ (e.g. due to mutation), the maximum likelihood methodology can be used to assign an amino acid or nucleic acid residue to the node of a phylogenetic tree (i.e., the branch point of the lineages). Generally, a model of sequence substitutions and then a maximum likelihood phylogeny can be determined for multiple data sets. The sequence at the base node of the maximum likelihood phylogeny is referred to as the ancestral sequence (or most recent common ancestor).
In certain embodiments, the invention is directed to methods for generating ancestral polypeptide (e.g. thioredoxin) sequences through reconstruction of phylogenetic trees. The ancestral polypeptide sequence may be any polypeptide sequence which has at least one homolog in another organism.
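By way of illustration only, the assignment of a residue to a node of a phylogenetic tree can be sketched for a single alignment column as follows. This sketch makes several assumptions that are not taken from this specification: it uses an equal-rates (Poisson) substitution model with a uniform prior over the 20 amino acids, a fixed tree with known branch lengths, and a hypothetical tree encoding in which a leaf is a one-letter residue and an internal node is a list of (child, branch length) pairs. The names `transition_prob`, `conditional_likelihoods` and `root_posteriors` are likewise hypothetical. Practical reconstructions typically rely on empirical substitution matrices and dedicated phylogenetics software.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_prob(src: str, dst: str, t: float) -> float:
    """Equal-rates (Poisson) model: probability that src is replaced by dst
    along a branch of length t (expected substitutions per site)."""
    n = len(AMINO_ACIDS)
    decay = math.exp(-n * t / (n - 1))
    if src == dst:
        return 1.0 / n + (1.0 - 1.0 / n) * decay
    return (1.0 - decay) / n


def conditional_likelihoods(node):
    """Return {state: P(residues observed below node | node is in that state)}.

    A leaf is a one-letter residue; an internal node is a list of
    (child, branch_length) pairs (hypothetical encoding for this sketch).
    """
    if isinstance(node, str):
        return {aa: 1.0 if aa == node else 0.0 for aa in AMINO_ACIDS}
    likelihoods = {aa: 1.0 for aa in AMINO_ACIDS}
    for child, branch_length in node:
        child_like = conditional_likelihoods(child)
        for parent_state in AMINO_ACIDS:
            # Sum over every state the child could be in (pruning recursion).
            likelihoods[parent_state] *= sum(
                transition_prob(parent_state, child_state, branch_length)
                * child_like[child_state]
                for child_state in AMINO_ACIDS
            )
    return likelihoods


def root_posteriors(tree):
    """Posterior probability of each amino acid at the base node of the tree,
    assuming a uniform prior over the 20 amino acids."""
    like = conditional_likelihoods(tree)
    prior = 1.0 / len(AMINO_ACIDS)
    weighted = {aa: prior * like[aa] for aa in AMINO_ACIDS}
    total = sum(weighted.values())
    return {aa: w / total for aa, w in weighted.items()}


# Illustrative column: one lineage shows A, a sister clade shows A and G.
example_tree = [("A", 0.1), ([("A", 0.1), ("G", 0.1)], 0.1)]
posteriors = root_posteriors(example_tree)
best_residue = max(posteriors, key=posteriors.get)
```

Repeating this calculation for every column of the alignment yields a table of posterior probabilities per position, from which the most probable residue at each node can be selected.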
In one aspect, the invention described herein relates to a method for increasing the temperature stability of a recombinant polypeptide produced from a nucleic acid in an expression system, the method comprising replacing one or more temperature stability decreasing amino acids of the recombinant polypeptide with one or more temperature stability increasing amino acids. In another aspect, the invention described herein relates to a method for increasing the pH stability of a recombinant polypeptide produced from a nucleic acid in an expression system, the method comprising replacing one or more pH stability decreasing amino acids of the recombinant polypeptide with one or more pH stability increasing amino acids.
In certain aspects, the present invention relates to the finding that it is possible to predict, synthesize and characterize polypeptides from extinct organisms. Thus, in one embodiment, the stability of an extant polypeptide at low pH (e.g. a pH lower than the pH at which the extant polypeptide is expressed in an organism, or the pH at which the polypeptide displays its greatest stability and/or activity) can be increased by reconstructing an ancestral polypeptide of the extant polypeptide by (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position.
In another embodiment, the stability of an extant polypeptide at high temperature (e.g. a temperature higher than the temperature at which the extant polypeptide is expressed in an organism, or the temperature at which the polypeptide displays its greatest stability and/or activity) can be increased by reconstructing an ancestral polypeptide of the extant polypeptide by (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position.
In another embodiment, the activity of an extant polypeptide at low pH (e.g. a pH lower than the pH at which the extant polypeptide is expressed in an organism, or the pH at which the polypeptide displays its greatest stability and/or activity) can be increased by reconstructing an ancestral polypeptide of the extant polypeptide by (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position.
In another embodiment, the activity of an extant polypeptide at high temperature (e.g. a temperature higher than the temperature at which the extant polypeptide is expressed in an organism, or the temperature at which the polypeptide displays its greatest stability and/or activity) can be increased by reconstructing an ancestral polypeptide of the extant polypeptide by (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position.
In another embodiment, the melting temperature of an extant polypeptide can be increased by reconstructing an ancestral polypeptide of the extant polypeptide by (a) aligning a plurality of sequences corresponding to homologues of the extant polypeptide, (b) generating a phylogenetic tree of the plurality of sequences corresponding to homologues of the extant polypeptide, (c) using Bayesian statistical analysis to generate inferred sequences of one or more ancestral genes encoding a version of the polypeptide that was present in a common ancestor of at least two or more organisms in the phylogenetic tree, (d) calculating posterior probabilities for all 20 amino acids in each inferred sequence, and (e) generating a reconstructed ancestral polypeptide sequence by assigning to each position in the inferred sequence the amino acid residue having the highest posterior probability for that position.
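By way of illustration only, steps (d) and (e) above can be sketched as follows, assuming that the posterior probabilities for each position of an ancestral node have already been calculated by a phylogenetic analysis. The function name `reconstruct_ancestor`, the data layout (one dictionary of residue-to-probability per position) and the numerical values in the example are hypothetical and are not taken from this specification or from any SEQ ID NO.

```python
from typing import Dict, List, Tuple


def reconstruct_ancestor(posteriors: List[Dict[str, float]]) -> Tuple[str, List[float]]:
    """Assign to each position the residue with the highest posterior probability.

    posteriors: one dict per alignment position, mapping each candidate
    amino acid (one-letter code) to its posterior probability at that position.
    Returns the reconstructed sequence and the posterior of each chosen residue.
    """
    residues = []
    support = []
    for site in posteriors:
        best = max(site, key=site.get)  # residue with the highest posterior
        residues.append(best)
        support.append(site[best])
    return "".join(residues), support


# Made-up posterior tables for a three-residue stretch (illustrative values only).
example_posteriors = [
    {"M": 0.90, "L": 0.10},
    {"K": 0.40, "R": 0.60},
    {"C": 0.99, "S": 0.01},
]
sequence, support = reconstruct_ancestor(example_posteriors)
```

Returning the posterior of each chosen residue alongside the sequence makes it straightforward to flag ambiguously reconstructed positions, for example those whose highest posterior falls below a user-chosen cutoff.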
In one embodiment, the sequence of a reconstructed protein can be generated by constructing a phylogenetic tree from a plurality of extant (modern) sequences of the enzyme to be reconstructed. The phylogenetic tree can be used to predict the sequences corresponding to every node of the tree. In one embodiment, the enzyme to be reconstructed can be a thioredoxin enzyme, and a plurality of extant thioredoxin enzymes can be used to construct a phylogenetic tree and predict the sequences of every node of the tree.
Generally, polypeptide sequences corresponding to homologues of the extant polypeptide can be obtained from publicly available databases (e.g., GenBank). Sequence comparison and alignment can be performed according to different analytical parameters. For example, in some cases, one sequence can be used as a reference against which all other sequences are compared. In the case of sequence comparison algorithms, test and reference sequences can be input into a computer and sequence algorithm program parameters can be designated for analysis. Alignment of the sequences can be performed using any method, algorithm or program known in the art. Examples of suitable alignment programs include, but are not limited to, MUSCLE (Edgar, Nucleic Acids Res 32, 1792-7 (2004)), Clustal W, the BioEdit program available from North Carolina State University (available at http://www.mbio.ncsu.edu/BioEdit/bioedit.html), and the SegEd program.
The terms “homologous” or “homologue” refer to related sequences that share a common ancestor or arise from gene duplication and are determined based on degree of sequence identity. Alternatively, a related sequence may be a sequence having homology, which has arisen by convergent evolution. These terms describe the relationship between a gene found in one species, subspecies, variety, cultivar or strain and the corresponding or equivalent gene in another species, subspecies, variety, cultivar or strain or, in the case of paralogous genes, two related sequences within a species, subspecies, variety, cultivar or strain. “Homologous sequences” are thought, believed, or known to be functionally related. A functional relationship may be indicated in a number of ways, including, but not limited to: (a) the degree of sequence identity; and/or (b) the same or similar biological function. Homology can be determined using software programs readily available in the art, such as those discussed in Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987).
The term “homolog” is also used to refer to proteins with amino acid sequences sharing at least about 60%, 70%, 80%, 90% or more identity with the amino acid sequences of an ancestral protein, such as the ancestral Trx proteins described herein. The term “homolog” is also used to refer to gene sequences with nucleic acid sequences sharing at least about 60%, 70%, 80%, 90% or more identity with nucleic acid sequences capable of encoding an ancestral protein, such as the ancestral Trx proteins described herein.
In certain embodiments of the methods described herein, the sequences and/or sequence alignments can be further subjected to manual correction. Other suitable alignment algorithms include, but are not limited to, the local homology algorithm of Smith and Waterman (Adv. Appl. Math. 2:482 (1981)), the homology alignment algorithm of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), the search for identity method of Pearson and Lipman (Proc. Natl. Acad. Sci. USA 85:2444 (1988)), the progressive alignment method of Feng and Doolittle (J. Mol. Evol. 25:351-60 (1987)) (e.g. PILEUP), the CLUSTAL method described by Higgins and Sharp (Gene 73:237-44 (1988); CABIOS 5:151-53 (1989)), computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), and visual inspection (see, generally Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, New York (1996)). Analysis of the percent sequence identity between the test sequence(s) and the reference sequence can be performed on the basis of designated program parameters. For example, a reference sequence can be compared to other test sequences to determine the percent sequence identity relationship using the following parameters: different gap weights, different gap length weights, and weighted end gaps. Appropriate parameters can be identified by one skilled in the art. In some embodiments, the number of sequences can also be reduced by treating conservative substitutions occupying a position in a sequence as being identical to a single residue occupying that position. The residue representing the members of one or more conservative substitution groups may be selected based on the physicochemical properties of the amino acid, the frequency of occurrence in the sequence alignment or any other criteria known in the art.
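By way of illustration only, the percent sequence identity between a test sequence and a reference sequence that have already been aligned could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `percent_identity`, the gap character, and the choice to exclude gapped columns from the denominator are illustrative and are not prescribed by the methods described herein; alignment programs differ in how they define the denominator.

```python
def percent_identity(aligned_test, aligned_reference, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: both inputs come from the same alignment and therefore have
    equal length. Gapped columns are skipped, which is one convention of several.
    """
    if len(aligned_test) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_test, aligned_reference):
        if a == gap or b == gap:
            continue  # ignore columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments, used only to exercise the function.
identity = percent_identity("MKC-GPC", "MRCAGPC")
print(round(identity, 1))
```

In this hypothetical example, six columns are compared and five of them match.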
A “conservative substitution,” when describing a protein, refers to a change in the amino acid composition of the protein that is less likely to substantially alter the protein's activity. Thus, “conservatively modified variations” of a particular amino acid sequence refers to amino acid substitutions of those amino acids that are less likely to be critical for protein activity or substitution of amino acids with other amino acids having similar properties (e.g., acidic, basic, positively or negatively charged, polar or non-polar, etc.) such that the substitutions of even critical amino acids do not substantially alter activity. Conservative substitution tables providing amino acids that are often functionally similar are well known in the art (see, e.g., Creighton, Proteins, W. H. Freeman and Company (1984)). Conservative amino acid substitutions can be made at one or more non-essential amino acid residues. A conservative amino acid substitution can be a substitution in which an amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine), aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine), aliphatic side chains (e.g., glycine, alanine, valine, leucine, isoleucine), and sulfur-containing side chains (methionine, cysteine). Substitutions can also be made between acidic amino acids and their respective amides (e.g., asparagine and aspartic acid, or glutamine and glutamic acid).
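As a non-limiting sketch, the side-chain families listed above can be encoded as sets so that a proposed substitution can be checked for membership in a shared family. The names `SIDE_CHAIN_FAMILIES`, `ACID_AMIDE_PAIRS`, `shared_families` and `is_conservative` are hypothetical and introduced here only for illustration; the family memberships are taken from the preceding paragraph, and sharing a family does not by itself establish that activity is retained.

```python
# Side-chain families as listed in the text (one-letter amino acid codes).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
    "aliphatic": set("GAVLI"),
    "sulfur-containing": set("MC"),
}

# Acidic residues and their respective amides (Asp/Asn, Glu/Gln).
ACID_AMIDE_PAIRS = {frozenset("DN"), frozenset("EQ")}


def shared_families(a, b):
    """Names of the families that contain both residues."""
    return [name for name, members in SIDE_CHAIN_FAMILIES.items()
            if a in members and b in members]


def is_conservative(original, replacement):
    """True if two different residues share a family or form an acid/amide pair."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    if frozenset((a, b)) in ACID_AMIDE_PAIRS:
        return True
    return bool(shared_families(a, b))


print(is_conservative("L", "I"), is_conservative("K", "D"))
```

Whether a given substitution actually preserves function would still be determined by assay, as described herein.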
Conservative amino acid substitutions can be utilized in making variants of the Trx enzymes described herein. For example, replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid, may not have a major effect on the properties of the resulting polypeptide or fusion polypeptide. Whether an amino acid change results in a functional polypeptide or fusion polypeptide can readily be determined by assaying the specific activity of the polypeptide or fusion polypeptide.
One skilled in the art will also be able to remove sequences below a particular size cut-off and subject the sequences to split decomposition analysis to remove any phylogenetic noise. A phylogenetic tree can then be constructed by heuristic search using a maximum likelihood (ML) approach. In one embodiment, one or more phylogenetic trees can be generated using a suitable program known in the art. Examples of suitable programs include, but are not limited to, PAUP (e.g. PAUP 4.0 beta) and PHYML. In one embodiment, the phylogenetic analysis can be performed and the phylogenetic tree generated using PAUP by the minimum evolution distance criterion with 1000 bootstrap replicates. Once phylogenetic trees are generated, one skilled in the art will appreciate that such trees can be rooted according to different parameters. In certain embodiments, the phylogenetic tree can be used to predict the sequences corresponding to every node of the tree. Parameters suitable for use with the methods described herein include, but are not limited to, strict or relaxed molecular clock model (Lai, Microbiol. Rev., 56:61-79, 1992; Lee et al., J. Virol., 73:11-18, 1999), non-reversible models of substitution, midpoint rooting, and/or outgroup criterion (Gao et al., J. Virol., 79:1154-1163, 2005; Higgins and Sharp, Gene, 73:237-244, 1988; Lai, Microbiol. Rev., 56:61-79, 1992; Lee et al., J. Virol., 73:11-18, 1999; Logvinoff et al., Proc. Natl. Acad. Sci. USA, 101:10149-10154, 2004; Mink et al., Virology, 200:246-255, 1994). The rooted tree can then be used as a template to simulate an ancestral sequence. Ancestral sequences at internal nodes, as well as at the common ancestor, can be inferred using a reconstruction program that applies Bayesian statistical analysis. An exemplary reconstruction program for Bayesian statistical analysis is PAML (e.g. PAML version 3.14).
In one embodiment, the Bayesian statistical analysis is performed using PAML and the gamma distribution for variable replacement rates across sites is incorporated (Yang, Comput Appl Biosci 13, 555-556 (1997)). In another embodiment, the Bayesian statistical analysis is performed using MrBayes (mrbayes.csit.fsu.edu). For each site of the inferred sequences, posterior probabilities can be calculated for all 20 amino acids and the amino acid residue with the highest posterior probability can be assigned at each site of an inferred sequence.
Sequences corresponding to homologues of the recombinant polypeptide can be nucleic acid sequences, amino acid sequences, confirmed sequences, predicted sequences or hypothetical sequences. Where conversion of nucleic acid sequences to amino acid sequences is required (e.g. for alignment purposes), one skilled in the art will readily be able to convert the nucleic acid sequences to amino acid sequences using appropriate codon translation tables and/or algorithms for identifying protein coding regions in nucleic acids. In certain embodiments, the sequences corresponding to homologues of the recombinant polypeptide can be selected such that at least one sequence is from an organism of the archaea domain, at least one sequence is from an organism of the bacteria domain and at least one sequence is from an organism of the eukarya domain.
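The final assignment step can be illustrated with a short sketch. It assumes that the per-site posterior probabilities have already been produced by a reconstruction program and parsed into one dictionary per site; that input layout, the function name `reconstruct_sequence`, and the example values are assumptions made for illustration and do not reflect the output format of any particular program.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def reconstruct_sequence(site_posteriors):
    """Assign to each site the residue with the highest posterior probability.

    site_posteriors: list of dicts mapping one-letter amino acid codes to
    posterior probabilities; residues missing from a dict are treated as 0.0.
    Returns the reconstructed sequence and the posterior of each chosen residue.
    """
    residues = []
    supports = []
    for posteriors in site_posteriors:
        best = max(AMINO_ACIDS, key=lambda aa: posteriors.get(aa, 0.0))
        residues.append(best)
        supports.append(posteriors.get(best, 0.0))
    return "".join(residues), supports


# Hypothetical posteriors for three sites (not taken from any real analysis).
example_sites = [
    {"M": 0.97, "L": 0.02, "V": 0.01},
    {"C": 0.60, "S": 0.40},
    {"G": 0.55, "A": 0.45},
]
sequence, supports = reconstruct_sequence(example_sites)
print(sequence, supports)
```

Returning the per-site posterior alongside the sequence makes it straightforward to flag sites where the most probable residue is only weakly supported.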
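As one illustrative possibility, conversion of a coding nucleic acid sequence to an amino acid sequence with the standard genetic code could be sketched as follows. The function name `translate`, the handling of RNA input, and the decision to stop at the first stop codon are assumptions of this sketch; identification of the correct reading frame and coding region is outside its scope.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(nucleic_acid, to_stop=True):
    """Translate a coding sequence read in frame from its first base.

    RNA input is accepted (U is treated as T). A trailing partial codon is
    ignored. Codons with ambiguous bases are rendered as 'X'.
    """
    seq = nucleic_acid.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop codon reached
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding fragment, used only to exercise the function.
print(translate("ATGTGCGGTCCGTGCTAA"))
```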
Phylogenetically related sequences may be divided according to any criteria known to a person of skill in the art. Exemplary subdivisions include, but are not limited to subdivisions according to phylogenetic distance, function, motif organization, or the like.
The methods of the present invention can be performed using a computer. In one embodiment, the invention involves the use of a computer system which is adapted to allow input of one or more sequences and which includes computer code for performing one or more of the steps of the various methods described herein. For example, the present invention encompasses a computer program that includes code for performing one or more of generating protein sequences, generating gene sequences, aligning gene or polypeptide sequences, generating phylogenetic relationships, performing maximum likelihood and/or Bayesian statistical analysis and for computing any of the methods described herein sequentially or simultaneously.
The computer systems of the invention can comprise a means for inputting data such as the sequence of proteins, a processor for performing the various calculations described herein, and a means for outputting or displaying the result of the calculations.
One of skill in the art can readily create computer code for executing the methods of the invention, using any suitable computer code language or system known in the art, such as “C” for example.
Thioredoxins belong to a broad family of oxidoreductase enzymes ubiquitous in all living organisms (Holmgren, Thioredoxin. Annu Rev Biochem 54, 237-71 (1985)). In one aspect, the methods described herein relate to the evolution of thioredoxin (Trx) enzymes. In certain aspects, the methods and compositions described herein relate to the finding that the chemical mechanisms of reduction by thioredoxin enzymes have evolved over time, and that the earliest forms of thioredoxin enzymes had capabilities that were only comparable to those of simple reducing agents like glutathione or cysteine.
The archetypical active site (CXXC) and the Trx fold are well conserved throughout evolution, indicating that Trx enzymes were present in primitive forms of life. By using single molecule force-clamp spectroscopy the chemical mechanisms of disulfide reduction by Trx enzymes can be examined in detail at the sub-Ångström scale (Wiita et al., Nature 450, 124-7 (2007); Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)). Hence, the combination of single-molecule force spectroscopy and the resurrection of ancestral proteins can reveal novel insights into the reductase activity of these sulfur-based enzymes. Thioredoxin (Trx) enzymes reduce disulfide bonds in a myriad of target proteins in both intracellular and extracellular compartments (Arnér and Holmgren, Eur J Biochem, 2000. 267(20): p. 6102-9; Kumar et al., Proc Natl Acad Sci USA, 2004. 101(11): p. 3759-64; Powis and Montfort, Annu Rev Biophys Biomol Struct, 2001. 30: p. 421-55). In addition to acting as an important cellular antioxidant, Trx can, by reducing disulfide bonds, activate signaling cascades through conformational changes in transcription factors (e.g. NF-κB) (Lillig and Holmgren, Antioxid Redox Signal, 2007. 9(1): p. 25-47) or through ion channel activation (Xu et al., TRPC channel activation by extracellular thioredoxin. Nature, 2008. 451(7174): p. 69-72). Trx plays essential roles in the life cycle of viruses (Holmgren, A., Thioredoxin and glutaredoxin systems. J Biol Chem, 1989. 264(24): p. 13963-6) and can be an activator of viral entry into cells. Trx catalyzes the reduction of disulfide bonds in the second domain of the extracellular receptor CD4 as an important step in HIV entry into cells (Matthias, et al., Nat Immunol, 2002. 3(8): p. 727-32; Matthias and Hogg, Antioxid Redox Signal, 2003. 5(1): p. 133-8). Trx is also involved in DNA replication and repair by keeping the essential enzyme ribonucleotide reductase in its reduced state (Avval and Holmgren, J Biol Chem, 2009. 284(13): p. 8233-40). Trx enzymes share a highly conserved amino acid motif, Cys-X-X-Cys, in their active sites as well as a characteristic structural motif called the Trx fold.
Thioredoxin enzymes have structural features that help position the participating sulfur atoms, such that an attack through an SN2 reaction is favored, resulting in disulfide bond reduction. An important structural feature in the Trx family of enzymes is the presence of a hydrophobic binding groove that abuts the active site of the enzyme.
Trx catalysis occurs through two conserved cysteine residues of the active site, which play complementary roles during the reduction of a target disulfide bond. First, the catalytic Cys32 attacks the target disulfide bond, resulting in a mixed disulfide between the enzyme and the substrate. Catalysis is resolved by a subsequent nucleophilic attack by Cys35 (Carvalho, et al., J Phys Chem B, 2008. 112(8): p. 2511-23; Chivers and Raines, Biochemistry, 1997. 36(50): p. 15810-6). After this cycle, the two cysteines in the active site are disulfide bonded and the enzyme is rendered inactive. Another enzyme called Trx reductase (TrxR) draws electrons from NADPH to reduce and reactivate Trx, allowing this cycle to be repeated indefinitely (Williams et al., Eur J Biochem, 2000. 267(20): p. 6110-7; Mustacich and Powis, Biochem J, 2000. 346 Pt 1: p. 1-8). The catalytic activity of Trx enzymes relies on an active cysteine thiolate.
A structural feature of thioredoxin enzymes is a polypeptide binding groove adjacent to the active site of the enzyme. The groove also serves to orient the substrate with respect to the catalytic cysteine, creating signatures that can be detected by force-clamp spectroscopy. The target binds into the binding groove and the target is then reduced by the exposed thiol of the catalytic cysteine. At least four different types of force-dependent reactions can be distinguished. As described herein, a variety of extant and ancient thioredoxins with different groove characteristics, like depth and width, can be used to examine how groove characteristics determine the force-dependency of the reaction. In certain embodiments, the methods described herein can be used to identify groove-free forms of thioredoxin by using evolutionary trees to resuscitate ancient forms of the enzyme and study their catalytic mechanisms. As described herein, molecular dynamics simulations can be used to examine the relationship between the groove characteristics and the mechanisms observed.
A fundamental step in the evolution of thioredoxin chemistry may have been the formation of this binding groove. Thus, by resurrecting ancient forms of thioredoxins, the methods described herein can be used to identify early versions of these enzymes where groove binding was either absent or shallow and poorly evolved.
Several structural features of the binding groove can be directly measured from X-ray or NMR structures of Trx enzymes and correlated with observed chemical mechanisms of action. For example, structural axes can be defined to measure the depth and width of the binding groove in the region surrounding the catalytic cysteine.
The binding groove becomes evident by studying mixed disulfide complexes between a mutant form of Trx lacking C35 and a disulfide-bonded target such as NF-κB- and Ref-1-derived polypeptides.
As described herein, ancient thioredoxin enzymes can be reconstructed that are functional and show greatly altered properties. Further, as described herein, Trx enzymes from different kingdoms can be reconstructed to identify thioredoxin enzymes showing unique features in their force-dependent rate of catalysis. Such findings can be related to their binding groove. Many X-ray structures of Trx enzymes are known (e.g. PDB: 1ZZY, 2FCH, 2FD3, etc). Similarly, X-ray structures of resurrected enzymes can also be resolved (e.g. LBCA).
In one aspect, the invention relates to Trx ancestral proteins having the Trx amino acid sequence of SEQ ID NO: 1-7. Such ancestor proteins include, for example, full-length protein, polypeptides, fragments, derivatives and analogs thereof. In one aspect, the invention provides amino acid sequences of ancestor proteins in SEQ ID NOs: 1-7. In some embodiments, the ancestor protein is functionally active.
In one embodiment, the invention is directed to a last bacterial common ancestor (LBCA) Trx amino acid having the sequence
In another embodiment, the invention is directed to a last archaeal common ancestor (LACA) Trx amino acid having the sequence
In another embodiment, the invention is directed to an archaeal/eukaryotic common ancestor (AECA) Trx amino acid having the sequence
In another embodiment, the invention is directed to a last eukaryotic common ancestor (LECA) Trx amino acid having the sequence
In another embodiment, the invention is directed to a last common ancestor of cyanobacterial and deinococcus/thermus groups (LPBCA) Trx amino acid having the sequence
In another embodiment, the invention is directed to the last common ancestor of γ-proteobacteria, ˜1.61 Gyr old (LGPCA) Trx amino acid having the sequence
In another embodiment, the invention is directed to the last common ancestor of animals and fungi (LAFCA) Trx amino acid having the sequence
A specific embodiment relates to an ancestor protein, fragment, derivative or analog that can be bound by an antibody. Such ancestor proteins, fragments, derivatives or analogs can be tested for the desired immunogenicity by procedures known in the art. (See e.g., Harlow and Lane).
In another aspect, a polypeptide is provided which consists of or comprises a fragment that has at least 8-10 contiguous amino acids of the Trx amino acid sequence as provided in any one of SEQ ID NO: 1-7. In other embodiments, the fragment comprises at least 20 or 50 contiguous amino acids of the Trx amino acid sequence as provided in any one of SEQ ID NO: 1-7.
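For illustration, the set of contiguous fragments of a given length within a polypeptide sequence can be enumerated with a sliding window. The sequence used below is a hypothetical placeholder and is not any of SEQ ID NO: 1-7; the function name `contiguous_fragments` is likewise introduced only for this sketch.

```python
def contiguous_fragments(sequence, length):
    """Return every fragment of exactly `length` contiguous residues, in order."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length]
            for i in range(len(sequence) - length + 1)]


# Hypothetical placeholder sequence, not a sequence disclosed herein.
placeholder = "MKVLAAGCGPCKAL"
fragments = contiguous_fragments(placeholder, 8)
print(len(fragments), fragments[0], fragments[-1])
```

A sequence shorter than the requested length yields an empty list.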
In one aspect, the invention is directed to polypeptide variants of any one of SEQ ID NO: 1-7. Contemplated variants of any one of SEQ ID NO: 1-7 include, but are not limited to, polypeptide sequences having at least from about 50% to about 55%, from about 55.1% to about 60%, from about 60.1% to about 65%, from about 65.1% to about 70%, from about 70.1% to about 75%, from about 75.1% to about 80%, from about 80.1% to about 85%, from about 85.1% to about 90%, from about 90.1% to about 95%, from about 95.1% to about 97%, or from about 97.1% to about 99% identity to that of any one of SEQ ID NO: 1-7.
In certain aspects, the invention is directed to a Trx amino acid sequence as provided in any one of SEQ ID NO: 1-7. In another embodiment of the above aspect of the invention, the nucleic acid comprises consecutive nucleotides having a sequence substantially identical to any one of SEQ ID NO: 1-7.
In certain aspects, the invention is directed to an isolated nucleic acid encoding, or capable of encoding, a Trx amino acid sequence as provided in any one of SEQ ID NO: 1-7. In certain aspects, the invention is directed to an isolated nucleic acid complementary to an isolated nucleic acid encoding, or capable of encoding, Trx amino acid sequences as provided in any one of SEQ ID NO: 1-7.
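By way of example, the sequence of a nucleic acid complementary to a given DNA strand can be derived by complementing each base and reversing the result. The sketch below assumes unambiguous DNA input (A, C, G, T); the function name `reverse_complement` and the example fragment are illustrative only.

```python
# Translation table mapping each DNA base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical coding fragment, used only to exercise the function.
print(reverse_complement("ATGTGC"))
```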
In certain aspects, the invention is directed to isolated amino acid sequence variants of any one of SEQ ID NO: 1-7. Variants of SEQ ID NO: 1-7 include, but are not limited to, amino acid sequences having at least from about 50% to about 55%, from about 55.1% to about 60%, from about 60.1% to about 65%, from about 65.1% to about 70%, from about 70.1% to about 75%, from about 75.1% to about 80%, from about 80.1% to about 85%, from about 85.1% to about 90%, from about 90.1% to about 95%, from about 95.1% to about 97%, or from about 97.1% to about 99% identity to that of SEQ ID NO: 1-7.
In one embodiment, the invention is directed to a polypeptide sequence comprising from about 10 to about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, or 70 consecutive amino acids from any one of SEQ ID NO: 1-7.
The invention is further directed to polypeptide sequences having from about 50% to about 99% identity to a polypeptide sequence comprising from about 8 to about 75, 80, 85, 90, 95, or 110 consecutive amino acids from any one of SEQ ID NO: 1-7.
In one embodiment, the invention is directed to an isolated nucleic acid sequence comprising from about 10 to about 50 consecutive nucleotides of a nucleic acid encoding, or capable of encoding any one of SEQ ID NO: 1-7. In another embodiment, the invention is directed to an isolated nucleic acid sequence comprising from about 10 to about 100 consecutive nucleotides of a nucleic acid encoding, or capable of encoding any one of SEQ ID NO: 1-7. In another embodiment, the invention is directed to an isolated nucleic acid sequence comprising from about 10 to about 200 consecutive nucleotides of a nucleic acid encoding, or capable of encoding any one of SEQ ID NO: 1-7. In another embodiment, the invention is directed to an isolated nucleic acid sequence comprising from about 10 to about 300 consecutive nucleotides of a nucleic acid encoding, or capable of encoding any one of SEQ ID NO: 1-7. In another embodiment, the invention is directed to an isolated nucleic acid sequence comprising from about 10 to about 320 consecutive nucleotides of a nucleic acid encoding, or capable of encoding any one of SEQ ID NO: 1-7.
In other aspects, the invention is directed to isolated nucleic acid sequences, such as primers and probes, comprising nucleic acid sequences derived from a nucleic acid encoding, or capable of encoding, any one of SEQ ID NO: 1-7. The isolated nucleic acids which can be used as primers and/or probes are of sufficient length to allow hybridization with, i.e. formation of a duplex with, a corresponding target nucleic acid sequence, or a nucleic acid encoding, or capable of encoding, any one of SEQ ID NO: 1-7, or a variant thereof.
To be expressed, the DNA segment encoding a gene can be coupled to one or more cis acting regulatory elements that regulate the expression profile of the gene. Such regulatory elements comprise, but are not limited to, elements that promote transcription, enhance transcription, silence transcription, modulate transcription such that it is responsive to extracellular and intracellular cues, regulate stability of the encoded RNA, regulate splicing of the encoded RNA, regulate export of the encoded RNA, regulate localization of the encoded RNA, regulate translation from the encoded RNA. Also apparent to those skilled in the art is that the expression profile of a given gene in one organism is frequently a reliable indicator of the expression pattern of homologs in phylogenetically related organisms.
Ancestor protein derivatives and analogs can be produced by various methods known in the art. The manipulations which result in their production can occur at the gene or protein level. For example, a nucleic acid encoding an ancestor protein can be modified by any of numerous strategies known in the art (see, e.g., Sambrook), such as by making conservative substitutions, deletions, insertions, and the like. The nucleic acid sequence can be cleaved at appropriate sites with restriction endonuclease(s), followed by further enzymatic modification, if desired, isolated, and ligated in vitro. In the production of nucleic acids encoding a fragment, derivative or analog of an ancestor protein, the modified nucleic acid typically remains in the proper translational reading frame, so that the reading frame is not interrupted by translational stop signals or other signals that interfere with the synthesis of the fragment, derivative or analog. The ancestral sequence nucleic acid can also be mutated in vitro or in vivo to create and/or destroy translation, initiation and/or termination sequences. The ancestral sequence-encoding nucleic acid can also be mutated to create variations in coding regions and/or to form new restriction endonuclease sites or destroy preexisting ones and to facilitate further in vitro modification. Any technique for mutagenesis known in the art can be used, including but not limited to chemical mutagenesis, in vitro site-directed mutagenesis, and the like. In one embodiment, genes encoding the ancestral Trxs enzymes can be synthesized and codon-optimized for expression in an expression system (e.g. E. coli cells). One skilled in the art will be able generate codon-optimized variants of the nucleic acid sequences encoding the ancestral Trx proteins described herein for expression in a desired expression system.
The ancestral polypeptides described herein can be produced in a host expression system. Exemplary host expression systems include, but are not limited to, eukaryotic expression systems, prokaryotic expression systems, plant expression systems, animal expression systems, bacterial expression systems, yeast cell expression systems, insect cell expression systems, mammalian cell expression systems, primate cell expression systems, human cell expression systems, hamster cell expression systems, mouse cell expression systems, goat cell expression systems, sheep cell expression systems, bird cell expression systems, chicken cell expression systems, and the like. The host expression system may also be any cell line suitable for recombinant protein expression, including, but not limited to, Chinese hamster ovary (CHO) cells, mouse myeloma NS0 cells, baby hamster kidney cells (BHK), human embryo kidney 293 cells (HEK-293), human C6 cells, Madin-Darby canine kidney cells (MDCK) and Sf9 insect cells. The expression system may also be an entire organism, such as a transgenic plant or animal. For example, the expression system may be a transgenic sheep or cow that is capable of expressing recombinant proteins that are secreted into the milk, or a recombinant plant capable of expressing recombinant proteins. Any suitable host system for recombinant protein expression known in the art can be used in accordance with the methods of the present invention.
Expression of nucleic acid sequences can be regulated by a second nucleic acid sequence so that the encoded nucleic acid is expressed in a host transformed with the recombinant DNA molecule. For example, expression of an ancestral sequence can be controlled by any suitable promoter/enhancer element known in the art. Suitable promoters include, for example, the SV40 early promoter region (Benoist and Chambon, Nature 290:304-10 (1981)), the promoter contained in the 3′ long terminal repeat of Rous sarcoma virus (Yamamoto et al., Cell 22:787-97 (1980)), the herpes thymidine kinase promoter (Wagner et al., Proc. Natl. Acad. Sci. USA 78:1441-45 (1981)), the Cytomegalovirus promoter, the translational elongation factor EF-1α promoter, the regulatory sequences of the metallothionein gene (Brinster et al., Nature 296:39-42 (1982)), prokaryotic promoters such as, for example, the β-lactamase promoter (Villa-Komaroff et al., Proc. Natl. Acad. Sci. USA 75:3727-31 (1978)) or the tac promoter (deBoer et al., Proc. Natl. Acad. Sci. USA 80:21-25 (1983)), plant expression vectors including the cauliflower mosaic virus 35S RNA promoter (Gardner et al., Nucl. Acids Res. 9:2871-88 (1981)), and the promoter of the photosynthetic enzyme ribulose bisphosphate carboxylase (Herrera-Estrella et al., Nature 310:115-20 (1984)), promoter elements from yeast or other fungi such as the GAL7 and GAL4 promoters, the ADH (alcohol dehydrogenase) promoter, the PGK (phosphoglycerate kinase) promoter, the alkaline phosphatase promoter, and the like.
In a specific embodiment, a vector is used that comprises a promoter operably linked to the ancestral sequence encoding nucleic acid, one or more origins of replication, and, optionally, one or more selectable markers (e.g., an antibiotic resistance gene). Suitable selectable markers include, for example, those conferring resistance to ampicillin, tetracycline, neomycin, G418, and the like. An expression construct can be made, for example, by subcloning a nucleic acid encoding an ancestral sequence into a restriction site of the pRSECT expression vector. Such a construct allows for the expression of the ancestral sequence under the control of the T7 promoter with a histidine amino terminal flag sequence for affinity purification of the expressed polypeptide.
Expression systems suitable for use with the methods described herein include, but are not limited to, in vitro expression systems and in vivo expression systems. Exemplary in vitro expression systems include, but are not limited to, cell-free transcription/translation systems (e.g. ribosome based protein expression systems). Several such systems are known in the art (see, for example, Tymms (1995) In vitro Transcription and Translation Protocols: Methods in Molecular Biology Volume 37, Garland Publishing, NY).
Exemplary in vivo expression systems include, but are not limited to prokaryotic expression systems such as bacteria (e.g., E. coli and B. subtilis), yeast expression systems (e.g., Saccharomyces cerevisiae), worm expression systems (e.g. Caenorhabditis elegans), insect expression systems (e.g. Sf9 cells), plant expression systems, and amphibian expression systems (e.g. melanophore cells).
Manipulations of the ancestral sequence can also be made at the protein level. Included within the scope of the invention are ancestor protein fragments, derivatives or analogs that are differentially modified during or after synthesis (e.g., in vivo or in vitro translation). Such modifications include conservative substitution, glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule or other cellular ligand, and the like. Any of numerous chemical modifications can be carried out by known techniques, including, but not limited to, specific chemical cleavage (e.g., by cyanogen bromide); enzymatic cleavage (e.g., by trypsin, chymotrypsin, papain, V8 protease, and the like); modification by, for example, NaBH4, acetylation, formylation, oxidation and reduction; metabolic synthesis in the presence of tunicamycin; and the like. Amino acids can be modified, for example, co-translationally or post-translationally during recombinant production (e.g., N-linked glycosylation at N-X-S/T motifs during expression in mammalian cells) or modified by synthetic means. Examples of modified amino acids suitable for use with the methods described herein include, but are not limited to, glycosylated amino acids, sulfated amino acids, prenylated (e.g., farnesylated, geranylgeranylated) amino acids, acetylated amino acids, PEG-ylated amino acids, biotinylated amino acids, carboxylated amino acids, phosphorylated amino acids, and the like. Exemplary protocols and additional amino acids can be found in Walker (1998) Protein Protocols on CD-ROM, Humana Press, Totowa, N.J.
In addition, fragments, derivatives and analogs of ancestor proteins can be chemically synthesized. For example, a peptide corresponding to a portion, or fragment, of an ancestor protein, which comprises a desired domain, can be synthesized by use of chemical synthetic methods using, for example, an automated peptide synthesizer. (See also Hunkapiller et al., Nature 310:105-11 (1984); Stewart and Young, Solid Phase Peptide Synthesis, 2nd ed., Pierce Chemical Co., Rockford, Ill., (1984).) Furthermore, if desired, nonclassical amino acids or chemical amino acid analogs can be introduced as a substitution or addition into the polypeptide sequence. Non-classical amino acids include, but are not limited to, the D-isomers of the common amino acids, α-amino isobutyric acid, 4-aminobutyric acid, 2-amino butyric acid, 6-amino hexanoic acid, 2-amino isobutyric acid, 3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline, sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine, cyclohexylalanine, β-alanine, selenocysteine, fluoro-amino acids, designer amino acids such as β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and other amino acid analogs. Furthermore, the amino acid can be D (dextrorotatory) or L (levorotatory).
The ancestral protein, fragment, derivative or analog can also be a chimeric, or fusion, protein comprising an ancestor protein, fragment, derivative or analog thereof (typically consisting of at least a domain or motif of the ancestor protein, or at least 10 contiguous amino acids of the ancestor protein) joined at its amino- or carboxy-terminus via a peptide bond to an amino acid sequence of a different protein. In one embodiment, such a chimeric protein is produced by recombinant expression of nucleic acid encoding the chimeric protein. The chimeric nucleic acid can be made by ligating the appropriate nucleic acid sequences to each other in the proper reading frame and expressing the chimeric product by methods commonly known in the art. Alternatively, the chimeric protein can be made by protein synthetic techniques (e.g., by use of an automated peptide synthesizer).
The nucleic acids encoding ancestral sequences can be inserted into an appropriate expression vector (i.e., a vector which contains the necessary elements for the transcription and translation of the inserted polypeptide-coding sequence). A variety of host-vector systems can be utilized to express the polypeptide-coding sequence(s). These include, for example, mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, sindbis virus, Venezuelan equine encephalitis (VEE) virus, and the like), insect cell systems infected with virus (e.g., baculovirus), microorganisms such as yeast containing yeast vectors, or bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their strengths and specificities. Depending on the host-vector system utilized, any one of a number of suitable transcription and translation elements can be used. In specific embodiments, the ancestral sequence is expressed in human cells, other mammalian cells, yeast or bacteria. In yet another embodiment, a fragment of an ancestral sequence comprising an immunologically active region of the sequence is expressed. In one embodiment, the ancestral genes can be cloned into a pQE80L vector and transformed into E. coli BL21 (DE3) cells. For expression, the cells can be incubated overnight in LB medium at 37° C. and protein expression can be induced with 1 mM IPTG. Expressed protein can be recovered by pelleting and sonicating the cells.
Upon expression, ancestral proteins can be isolated and purified by standard methods including chromatography (e.g., ion exchange, affinity, sizing column chromatography, high pressure liquid chromatography), centrifugation, differential solubility, or by any other standard technique for the purification of proteins. In one embodiment, the ancestral proteins can be His6-tagged. Upon recovery, the proteins can be purified by loading cell lysates onto a His GraviTrap affinity column. The purified protein can be verified by SDS-PAGE. The proteins can then be loaded onto a PD-10 desalting column and finally dialyzed against a buffer (e.g. 50 mM HEPES, pH 7.0 buffer).
Conditions for Trx enzymatic activity can vary according to the Trx enzyme because thioredoxins must be in a reduced state to be active. Reduced state Trx enzymes can be generated by any method known in the art, including but not limited to the use of a complementary bacterial or eukaryotic Trx reductase (TrxR) enzyme. Where Trx enzymes are from extant sources or are resurrected enzymes, their accompanying reductases may be unknown or unavailable. In such cases small amounts of dithiothreitol (DTT) (e.g. 50-100 μM) or Tris(2-carboxyethyl)phosphine HCl (TCEP hydrochloride) can be used to maintain the enzymes in the reduced state. The amount of DTT or TCEP can be selected such that it is sufficient to maintain the enzymes in the reduced state but low enough not to trigger the reduction of disulfide bonds by itself. Such conditions can be established for each individual enzyme.
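The reductant concentrations discussed above are typically reached by diluting a concentrated stock. The following Python sketch applies the standard dilution relation C1·V1 = C2·V2; the function name, the 100 mM stock concentration and the 1 mL reaction volume in the usage line are hypothetical values chosen for illustration and are not taken from the disclosure.

```python
def stock_volume_ul(stock_mM, final_uM, final_volume_ul):
    """Volume of stock (in microliters) giving `final_uM` in `final_volume_ul`.

    Uses C1 * V1 = C2 * V2 with the stock given in mM and the target in uM.
    """
    if stock_mM <= 0 or final_volume_ul <= 0 or final_uM < 0:
        raise ValueError("concentrations and volume must be positive")
    stock_uM = stock_mM * 1000.0  # convert mM to uM
    if final_uM > stock_uM:
        raise ValueError("target concentration exceeds the stock concentration")
    return final_uM * final_volume_ul / stock_uM


if __name__ == "__main__":
    # Hypothetical example: 75 uM DTT in 1000 uL from an assumed 100 mM stock.
    print(stock_volume_ul(100.0, 75.0, 1000.0), "uL of stock")
```

Sub-microliter volumes such as this one are difficult to pipette accurately, so an intermediate dilution of the stock would usually be prepared first.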
Enzymes can be exceptional catalysts useful for accelerating chemical reaction rates by several orders of magnitude. The mechanisms of numerous enzymatic reactions can be studied using any number of protein biochemistry as well as structural biology approaches, including, but not limited to X-ray crystallography and NMR. Such studies can be used to identify structural features and conformational changes necessary for the catalytic activity of enzymes. Single molecule techniques can also be useful for studying enzyme dynamics in solution at the Ångström scale. In certain aspects, single molecule techniques are useful where observation of rearrangements in the participating atoms necessary for catalysis is important. Such approaches generate data that, combined together with structural information as well as molecular dynamics simulations, can provide a more complete view of enzyme dynamics.
Several methods, some of which are based on spectrophotometry, can be used to determine the activity of Trx enzymes. Exemplary methods include, but are not limited to, monitoring the oxidation of NADPH in the presence of Trx reductase or ribonucleotide reductase (Holmgren, J Biol Chem, 1979. 254(18): p. 9113-9; Holmgren, J Biol Chem, 1979. 254(19): p. 9627-32); the observation of the turbidity of solutions containing insulin, which readily aggregates after reduction of its disulfide bonds (Holmgren, J Biol Chem, 1979. 254(19): p. 9627-32); or the use of Ellman's reagent (DTNB), which upon reduction by thiol groups generates products that can be easily detected with a spectrophotometer (Holmgren, Thioredoxin. Annu Rev Biochem, 1985. 54: p. 237-71). Changes in tryptophan fluorescence have also been used to measure rates of Trx oxidation and reduction (Holmgren, J Biol Chem, 1972. 247(7): p. 1992-8). Although effective in monitoring the overall activity of thioredoxin, these methods are not sensitive enough to probe the substrate-enzyme interactions that take place in the binding groove of the enzyme. Probing such interactions can be important because binding grooves are common in enzymes and enzymatic reactions. In such cases, examination of the enzymatic mechanisms and/or activity can be facilitated by single molecule techniques.
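For the NADPH-based assay mentioned above, an absorbance slope can be converted into a reaction rate with the Beer-Lambert law. The sketch below assumes the commonly used molar absorption coefficient of NADPH at 340 nm (6220 M^-1 cm^-1) and a 1 cm cuvette; the function name and the example slope are hypothetical and are not values from the disclosure.

```python
NADPH_EPSILON_340 = 6220.0  # M^-1 cm^-1, commonly used value for NADPH at 340 nm


def nadph_rate_uM_per_min(delta_a340_per_min, path_cm=1.0):
    """Convert an A340 slope (absorbance units per minute) into uM NADPH per minute.

    Beer-Lambert: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).
    The sign is dropped because NADPH oxidation lowers the absorbance.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    molar_per_min = abs(delta_a340_per_min) / (NADPH_EPSILON_340 * path_cm)
    return molar_per_min * 1e6  # M -> uM


if __name__ == "__main__":
    # Hypothetical slope of -0.0622 absorbance units per minute.
    print(nadph_rate_uM_per_min(-0.0622), "uM NADPH oxidized per minute")
```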
Described herein is a force-clamp spectrometer built on top of a “through the lens” Total Internal Reflection Fluorescence (TIRF) microscope. This experimental setup enables the application of force to a single protein while at the same time measuring a fluorescent signal. The force-spectrometer can be either an AFM (Sarkar et al., Proc Natl Acad Sci USA, 2004. 101(35): p. 12882-6), or an electromagnet (Liu et al., Biophysical Journal, 2009. 96(9): p. 3810-3821). Both of these can readily pick up and stretch a single engineered polypeptide. The design takes advantage of the stability and high spatial sensitivity of the evanescent field of the TIRF microscope. As a result of total internal reflection, an evanescent wave is formed on the surface of the microscope slide. The amplitude of the evanescent wave decays exponentially, with a space constant that can be set to be as short as ˜90 nm and up to ˜300 nm. The evanescent wave can excite any fluorophore that enters this field, and its fluorescence can readily be measured by a high performance CCD camera. The rapidly decaying evanescent field on the surface of the microscope slide can be used either to measure displacement in the z direction or to capture single molecule fluorescence without any background emanating from the solution buffer. The combined AFM/TIRF microscope can be used to demonstrate that a calibrated evanescent field can be used to track the mechanical unfolding of a single polypeptide with sub-nanometer resolution (Sarkar et al., Proc Natl Acad Sci USA, 2004. 101(35): p. 12882-6). The same TIRF microscope equipped with magnetic tweezers can track the unfolding of a polypeptide at very low forces and for very long periods of time (Liu et al., Biophysical Journal, 2009. 96(9): p. 3810-3821). However, the simplest application of the AFM/TIRF microscope is in detecting fluorescence over a very short distance of a mechanically stretched protein, without interference from the bulk.
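The use of a calibrated evanescent field to measure displacement in the z direction can be illustrated with a short calculation. The sketch below assumes that the detected fluorescence falls off as a single exponential, I(z) = I0·exp(-z/d), where d is the space constant; the function names and the numerical values are hypothetical, and a real calibration would need to account for background and for the difference between field amplitude and detected intensity.

```python
import math


def relative_intensity(z_nm, decay_nm):
    """Fluorescence relative to the surface value at height z, assuming exp(-z/d)."""
    if decay_nm <= 0:
        raise ValueError("decay length must be positive")
    return math.exp(-z_nm / decay_nm)


def height_from_intensity(intensity, intensity_at_surface, decay_nm):
    """Invert I = I0 * exp(-z/d) to recover the height z (same units as d)."""
    if intensity <= 0 or intensity_at_surface <= 0 or decay_nm <= 0:
        raise ValueError("intensities and decay length must be positive")
    return -decay_nm * math.log(intensity / intensity_at_surface)


if __name__ == "__main__":
    d = 90.0  # nm, the short end of the range of space constants given above
    signal = relative_intensity(25.0, d)  # hypothetical fluorophore 25 nm above the slide
    print(round(height_from_intensity(signal, 1.0, d), 3), "nm")
```

A shorter space constant gives a steeper intensity change per nanometer, which is why the decay length sets the displacement sensitivity.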
This technique has been demonstrated by mechanically stretching and unfolding the protein talin, a key player in coupling the cytoskeleton of a cell to the extracellular matrix (del Rio et al., Science, 2009. 323(5914): p. 638-41). These experiments demonstrate the versatility of combining force-spectroscopy with TIRF microscopy. As described herein, this technique can be used to monitor the association/dissociation reactions of single thioredoxin enzymes as they reduce disulfide bonds in substrate proteins. Trx enzymes can be labeled while remaining active; for example, exposed lysines of Trx enzymes can be labeled with Alexa Fluor 488 fluorophore so as to allow monitoring of when the enzyme binds to the exposed disulfide bond. The experimental design is shown in
The association and dissociation of fluorescently labeled thioredoxin enzymes can be measured while simultaneously monitoring reduction events using force-spectroscopy/TIRF instrumentation. The force dependency of association and dissociation can also be measured as can the dwell times between association and reduction. These data can be used to examine the mechanisms by which thioredoxin enzymes find their target disulfide bonds. As described herein, the single molecule AFM detection of disulfide bond reduction can be combined with simultaneous Total Internal Reflection (TIRF) detection of fluorescently labeled thioredoxin enzymes to follow them as they bind and unbind to the disulfide bond being reduced. This instrument enables real time visualization of the entire association, reduction and dissociation cycle of a single enzyme as it catalyzes the reduction of its target. The combined AFM/TIRF instrument can be used to study the search mechanism, and to measure association and dissociation rates as a function of the mechanical force applied to the substrate.
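As a minimal sketch of how measured dwell times could be turned into a rate, the function below assumes that the dwell times follow a single-exponential distribution, in which case the maximum-likelihood rate estimate is the reciprocal of the mean dwell time. The function name and the example values are hypothetical; multi-step kinetics would require a different model.

```python
from statistics import mean


def rate_from_dwell_times(dwell_times_s):
    """Estimate a first-order rate constant (s^-1) as 1 / mean dwell time.

    Valid only under the assumption of single-exponential dwell times.
    """
    if not dwell_times_s:
        raise ValueError("at least one dwell time is required")
    if any(t <= 0 for t in dwell_times_s):
        raise ValueError("dwell times must be positive")
    return 1.0 / mean(dwell_times_s)


if __name__ == "__main__":
    # Hypothetical dwell times (seconds) between association and reduction.
    print(rate_from_dwell_times([0.5, 1.5, 1.0]), "per second")
```

Repeating the estimate at several clamping forces would give the force dependency of the rate referred to above.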
In one aspect, the invention described herein relates to the use of single molecule force-clamp spectroscopy techniques for investigating the chemical mechanisms of catalysis of thioredoxins, a broad class of enzymes that specialize in reducing disulfide bonds and that can also function as oxidases and isomerases. Thioredoxin enzymes are present in all known organisms from bacteria to human and play crucial roles in a wide variety of cellular functions. Thioredoxins have been implicated in pathological processes such as vascular damage caused by oxidative injury, virus entry into cells, and a wide variety of immune related disorders, but also have found practical use in biotechnology.
The single molecule assay for the reduction of disulfide bonds by thioredoxin can be performed by detecting the step elongation of a protein under force, which results from the cleavage of a covalent bond (
By applying a calibrated force, the conformations of a disulfide bond substrate can be controlled, and the effect of this restriction on the activity of thioredoxin enzymes can be measured. This assay is a highly sensitive probe of the sub-Ångström level rearrangement of the sulfur atoms at the catalytic center of Trx enzymes (Wiita, A. P., et al., Nature, 2007. 450(7166): p. 124-7). By combining this new form of spectroscopy together with structural data and molecular dynamics simulations we obtain novel insights into catalysis. These studies can be generalized and understood in relation to the structure of other enzymes to evaluate the range of chemical mechanisms available to thioredoxin as well as other enzymes and how such mechanisms can be controlled by structural features such as binding grooves.
Single molecule assays can also be used to detect the oxidase activity of thioredoxin enzymes. For example, if the stretching force is quenched after a substrate disulfide bond has been reduced, the substrate protein folds; however, the disulfide bond does not reform spontaneously. By introducing a mutant form of thioredoxin, efficient re-oxidation can be obtained during folding.
Force spectroscopy can also be used to examine other covalent bond cleaving enzymes. For example, proteases share structural features in common with thioredoxins such as a binding groove adjacent to the catalytic nucleophile. A steric-switch approach, where a bond cleavage event is translated into an easily identified stepwise elongation of the substrate protein, can be adapted to detect the activity of proteases, and study their catalytic mechanisms.
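The stepwise elongation that reports a bond cleavage event can, in the simplest case, be located in a length-versus-time trace by thresholding the change between consecutive samples. The sketch below is an illustration under that assumption only; the function name, the threshold and the trace values are invented, and real recordings are noisy and would normally be filtered or fitted before step detection.

```python
def detect_steps(lengths_nm, min_step_nm):
    """Return the sample indices at which the length jumps up by at least `min_step_nm`.

    Index i is reported when lengths_nm[i] - lengths_nm[i - 1] >= min_step_nm.
    """
    if min_step_nm <= 0:
        raise ValueError("minimum step size must be positive")
    return [
        i
        for i in range(1, len(lengths_nm))
        if lengths_nm[i] - lengths_nm[i - 1] >= min_step_nm
    ]


if __name__ == "__main__":
    # Hypothetical trace (nm) with two stepwise elongations.
    trace = [0.0, 0.2, 0.1, 14.1, 14.0, 14.2, 28.3, 28.1]
    print(detect_steps(trace, 10.0))
```

The time stamps of the detected indices give the dwell time before each cleavage event, from which a rate can be estimated.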
As described herein, single molecule force-spectroscopy experiments demonstrate that the application of a mechanical force to a substrate disulfide bond can regulate the catalytic activity of thioredoxin enzymes, thereby revealing distinct chemical mechanisms of reduction that can be distinguished by their sensitivity to an applied force. Thus, the single molecule assay of thioredoxin catalysis provides a novel and useful approach to study the chemical mechanisms of catalysis in this important class of enzymes.
One advantage of the single molecule approach is that individual conformations, which can otherwise be averaged out in bulk experiments, can be observed directly and then correlated with the known structural features of the molecule. This approach can also be used for ion channels, where it was possible to provide a detailed account of the structure-function relationship for this class of membrane proteins. As described herein, single molecule assays for substrate dynamics in thioredoxin and protease catalysis can be used to study enzyme dynamics.
In single molecule force clamp spectroscopy experiments, a mechanical force is applied to a substrate protein containing a target disulfide bond, and the effect of the resulting stiffening on the rate of reduction or oxidation by thioredoxin enzymes is measured. The applied force restricts the movement of the enzymatic substrate in the binding groove of the enzyme, acting as a form of spectroscopy that can be used to investigate the types of substrate motions that occur during enzymatic catalysis. As described herein, this form of spectroscopy can be used to study the catalytic mechanisms of enzymes, including, but not limited to thioredoxin enzymes and proteases.
The application of force to a substrate disulfide bond can be used to modulate conformational dynamics in the binding groove of Trx (
During protein disulfide bond reduction, thioredoxin binds to the substrate in a catalytically favorable configuration (Qin et al., Structure, 1995. 3: p. 289-297). The mechanisms by which thioredoxin finds a substrate disulfide bond can be examined by measuring the association and dissociation of single enzymes as they find and reduce a disulfide bond. Thioredoxin enzymes may find and position the two bonded sulfur atoms out of the thousands of atoms of the host protein by utilizing a “reduced dimensionality” approach (Adam and Delbruck, Structural Chemistry and Molecular Biology, ed. A. Rich and N. Davidson. 1968, New York: W. H. Freeman and Co. 198-215; von Hippel and Berg, J Biol Chem, 1989. 264(2): p. 675-8), similar to enzymes that target DNA (Gorman et al., Mol Cell, 2007. 28(3): p. 359-70; Stanford et al., Embo J, 2000. 19(23): p. 6546-57). A reduced dimensionality search consists of at least two distinct steps: a nonspecific association with the substrate macromolecule followed by some form of processivity along the coordinates of the substrate (Riggs et al., Lac Repressor-Operator Interaction. 3. Kinetic Studies. Journal of Molecular Biology, 1970. 53(3): p. 401-7).
In the case of DNA binding enzymes, the principle of reduced dimensionality has been well established as a widespread mechanism (Halford et al., Nucleic Acids Res, 2004. 32(10): p. 3040-52). For enzymes acting on macromolecular substrates, reduced dimensionality may be important for facilitating the target search (Adam and Delbruck, Structural Chemistry and Molecular Biology, ed. A. Rich and N. Davidson. 1968, New York: W. H. Freeman and Co. 198-215; Riggs et al., Lac Repressor-Operator Interaction. 3. Kinetic Studies. Journal of Molecular Biology, 1970. 53(3): p. 401-7; Berg and Blomberg, Biophysical Chemistry, 1978. 8(4): p. 271-280; Berg et al., Biochemistry, 1981. 20(24): p. 6929-6948; von Hippel and Berg, J Biol Chem, 1989. 264(2): p. 675-8). In the case of Trx enzymes, the enzyme may first bind to a substrate and then diffuse along the extended polypeptide until it finds the disulfide bond. The polypeptide stays loosely bound to the enzymatic groove, and slides randomly towards the disulfide. The simplest expression for the mean time to target is given by
where D is the diffusion coefficient for the enzyme sliding along the polypeptide and dsl is the sliding distance between the place where Trx was first bound to the polypeptide and the exposed disulfide bond (
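The displayed expression for the mean time to target is not reproduced in this text. As a hedged reconstruction only, the standard relation for unbiased one-dimensional diffusion, in which the mean squared displacement grows as 2Dt, would suggest the form below. The symbol ⟨t⟩ is introduced here for illustration, and the original expression may differ in notation or in its numerical prefactor.

```latex
\langle x^{2} \rangle = 2 D t
\quad\Longrightarrow\quad
\langle t \rangle \approx \frac{d_{sl}^{2}}{2D}
```

Under this assumed form, the search time scales with the square of the sliding distance, so doubling the distance between the initial binding site and the disulfide bond would roughly quadruple the mean time to target.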
Although several different ancestral Trx polypeptides are described herein, one of skill in the art will recognize that other types of ancestral polypeptides can also be produced using the methods described herein. Ancestral sequences can be generated for any polypeptide using the methods described herein, including, but not limited to therapeutic proteins and proteins susceptible to industrial use.
The stability and/or activity of any polypeptide at low pH or elevated temperature can be modified according to the methods described herein. Polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein can be from any source or origin and can include a polypeptide found in prokaryotes, viruses, and eukaryotes, including fungi, plants, yeasts, insects, and animals, including mammals (e.g. humans). Polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein include, but are not limited to, any polypeptide sequences, known or hypothetical or unknown, which can be identified using common sequence repositories. Examples of such sequence repositories include, but are not limited to, GenBank, EMBL, DDBJ and the NCBI. Other repositories can easily be identified by searching on the internet. Polypeptides that can be produced using the methods described herein also include polypeptides having at least about 60%, 70%, 75%, 80%, 90%, 95%, or at least about 99% or more identity to any known or available polypeptide (e.g., a therapeutic polypeptide, a diagnostic polypeptide, an industrial enzyme, or portion thereof, and the like).
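The percent identity thresholds recited above can be evaluated once two sequences have been aligned. The following sketch computes identity over the columns of a pairwise alignment; it assumes the alignment has already been produced by a separate tool, uses '-' as the gap character, and divides by the number of aligned columns, which is only one of several conventions in use. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Columns where both sequences have a gap are ignored; a residue opposite a
    gap counts as a mismatch. Other denominators are also in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned fragments.
    print(percent_identity("ACDE", "ACDF"))
```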
Polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein also include polypeptides comprising one or more non-natural amino acids. As used herein, a non-natural amino acid can be, but is not limited to, an amino acid comprising a moiety where a chemical moiety is attached, such as an aldehyde- or keto-derivatized amino acid, or a non-natural amino acid that includes a chemical moiety. A non-natural amino acid can also be an amino acid comprising a moiety where a saccharide moiety can be attached, or an amino acid that includes a saccharide moiety.
Polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature can also comprise peptide derivatives (for example, that contain one or more non-naturally occurring amino acids). In specific embodiments, the library members contain one or more non-natural or non-classical amino acids or cyclic peptides. Non-classical amino acids include but are not limited to the D-isomers of the common amino acids, α-amino isobutyric acid, 4-aminobutyric acid, Abu, 2-amino butyric acid, γ-Abu, ε-Ahx, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid, 3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline, sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine, cyclohexylalanine, β-alanine, designer amino acids such as β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, fluoro-amino acids and amino acid analogs in general. Furthermore, the amino acid can be D (dextrorotatory) or L (levorotatory).
Also included are derivative polypeptides having an amino acid sequence selected from the group consisting of a polypeptide of SEQ ID NOs: 1-7 and which have been acetylated, carboxylated, phosphorylated, glycosylated, ubiquitinated or subjected to other post-translational modifications. In another embodiment, the derivative has been labeled with, e.g., radioactive isotopes such as 125I, 32P, 35S, and 3H. In another embodiment, the derivative has been labeled with fluorophores, chemiluminescent agents, enzymes, and antiligands that can serve as specific binding pair members for a labeled ligand.
Polypeptide modifications are well known to those of skill and have been described in detail in the scientific literature. Several common modifications, such as glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation, for instance, are described in most basic texts, such as, for instance Creighton, Protein Structure and Molecular Properties, 2nd ed., W. H. Freeman and Company (1993). Many detailed reviews are available on this subject, such as, for example, those provided by Wold, in Johnson (ed.), Posttranslational Covalent Modification of Proteins, pgs. 1-12, Academic Press (1983); Seifter et al., Meth. Enzymol. 182: 626-646 (1990) and Rattan et al., Ann. N.Y. Acad. Sci. 663: 48-62 (1992).
One can determine whether a polypeptide of the invention will be post-translationally modified by analyzing the sequence of the polypeptide to determine if there are peptide motifs indicative of sites for post-translational modification. There are a number of computer programs that permit prediction of post-translational modifications. See, e.g., expasy with the extension .org of the world wide web (accessed Nov. 11, 2002), which includes PSORT, for prediction of protein sorting signals and localization sites, SignalP, for prediction of signal peptide cleavage sites, MITOPROT and Predotar, for prediction of mitochondrial targeting sequences, NetOGlyc, for prediction of mucin-type O-glycosylation sites in mammalian proteins, big-PI Predictor and DGPI, for prediction of GPI-anchor and cleavage sites, and NetPhos, for prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins. Other computer programs, such as those included in GCG, also can be used to determine post-translational modification peptide motifs.
Examples of types of post-translational modifications include, but are not limited to: (Z)-dehydrobutyrine; 1-chondroitin sulfate-L-aspartic acid ester; 1′-glycosyl-L-tryptophan; 1′-phospho-L-histidine; 1-thioglycine; 2′-(S-L-cysteinyl)-L-histidine; 2′-[3-carboxamido (trimethylammonio)propyl]-L-histidine; 2′-alpha-mannosyl-L-tryptophan; 2-methyl-L-glutamine; 2-oxobutanoic acid; 2-pyrrolidone carboxylic acid; 3′-(1′-L-histidyl)-L-tyrosine; 3′-(8alpha-FAD)-L-histidine; 3′-(S-L-cysteinyl)-L-tyrosine; 3′,3″,5′-triiodo-L-thyronine; 3′-4′-phospho-L-tyrosine; 3-hydroxy-L-proline; 3′-methyl-L-histidine; 3-methyl-L-lanthionine; 3′-phospho-L-histidine; 4′-(L-tryptophan)-L-tryptophyl quinone; 42 N-cysteinyl-glycosylphosphatidylinositolethanolamine; 43-(T-L-histidyl)-L-tyrosine; 4-hydroxy-L-arginine; 4-hydroxy-L-lysine; 4-hydroxy-L-proline; 5′-(N-6-L-lysine)-L-topaquinone; 5-hydroxy-L-lysine; 5-methyl-L-arginine; alpha-1-microglobulin-Ig alpha complex chromophore; bis-L-cysteinyl bis-L-histidino diiron disulfide; bis-L-cysteinyl-L-N3′-histidino-L-serinyl tetrairon tetrasulfide; chondroitin sulfate D-glucuronyl-D-galactosyl-D-galactosyl-D-xylosyl-L-serine; D-alanine; D-allo-isoleucine; D-asparagine; dehydroalanine; dehydrotyrosine; dermatan 4-sulfate D-glucuronyl-D-galactosyl-D-galactosyl-D-xylosyl-L-serine; D-glucuronyl-N-glycine; dipyrrolylmethanemethyl-L-cysteine; D-leucine; D-methionine; D-phenylalanine; D-serine; D-tryptophan; glycine amide; glycine oxazolecarboxylic acid; glycine thiazolecarboxylic acid; heme P450-bis-L-cysteine-L-tyrosine; heme-bis-L-cysteine; hemediol-L-aspartyl ester-L-glutamyl ester; hemediol-L-aspartyl ester-L-glutamyl ester-L-methionine sulfonium; heme-L-cysteine; heme-L-histidine; heparan sulfate D-glucuronyl-D-galactosyl-D-galactosyl-D-xylosyl-L-serine; heme P450-bis-L-cysteine-L-lysine; hexakis-L-cysteinyl hexairon hexasulfide; keratan sulfate D-glucuronyl-D-galactosyl-D-galactosyl-D-xylosyl-L-threonine; L oxoalanine-lactic acid; L phenyllactic
acid; 1′-(8alpha-FAD)-L-histidine; L-2′,4′,5′-topaquinone; L-3′,4′-dihydroxyphenylalanine; L-3′,4′,5′-trihydroxyphenylalanine; L-4′-bromophenylalanine; L-6′-bromotryptophan; L-alanine amide; L-alanyl imidazolinone glycine; L-allysine; L-arginine amide; L-asparagine amide; L-aspartic 4-phosphoric anhydride; L-aspartic acid 1-amide; L-beta-methylthioaspartic acid; L-bromohistidine; L-citrulline; L-cysteine amide; L-cysteine glutathione disulfide; L-cysteine methyl disulfide; L-cysteine methyl ester; L-cysteine oxazolecarboxylic acid; L-cysteine oxazolinecarboxylic acid; L-cysteine persulfide; L-cysteine sulfenic acid; L-cysteine sulfinic acid; L-cysteine thiazolecarboxylic acid; L-cysteinyl homocitryl molybdenum-heptairon-nonasulfide; L-cysteinyl imidazolinone glycine; L-cysteinyl molybdopterin; L-cysteinyl molybdopterin guanine dinucleotide; L-cystine; L-erythro-beta-hydroxyasparagine; L-erythro-beta-hydroxyaspartic acid; L-gamma-carboxyglutamic acid; L-glutamic acid 1-amide; L-glutamic acid 5-methyl ester; L-glutamine amide; L-glutamyl 5-glycerylphosphorylethanolamine; L-histidine amide; L-isoglutamyl-polyglutamic acid; L-isoglutamyl-polyglycine; L-isoleucine amide; L-lanthionine; L-leucine amide; L-lysine amide; L-lysine thiazolecarboxylic acid; L-lysinoalanine; L-methionine amide; L-methionine sulfone; L-phenylalanine thiazolecarboxylic acid; L-phenylalanine amide; L-proline amide; L-selenocysteine; L-selenocysteinyl molybdopterin guanine dinucleotide; L-serine amide; L-serine thiazolecarboxylic acid; L-seryl imidazolinone glycine; L-T-bromophenylalanine; L-T-bromophenylalanine; L-threonine amide; L-thyroxine; L-tryptophan amide; L-tryptophyl quinone; L-tyrosine amide; L-valine amide; meso-lanthionine; N-(L-glutamyl)-L-tyrosine; N-(L-isoaspartyl)-glycine; N-(L-isoaspartyl)-L-cysteine; N,N,N-trimethyl-L-alanine; N,N-dimethyl-L-proline; N2-acetyl-L-lysine; N2-succinyl-L-tryptophan; N4-(ADP-ribosyl)-L-asparagine; N4-glycosyl-L-asparagine;
N4-hydroxymethyl-L-asparagine; N4-methyl-L-asparagine; N5-methyl-L-glutamine; N6-1-carboxyethyl-L-lysine; N6-(4-amino hydroxybutyl)-L-lysine; N6-(L-isoglutamyl)-L-lysine; N6-(phospho-5′-adenosine)-L-lysine; N6-(phospho-5′-guanosine)-L-lysine; N6,N6,N6-trimethyl-L-lysine; N6,N6-dimethyl-L-lysine; N6-acetyl-L-lysine; N6-biotinyl-L-lysine; N6-carboxy-L-lysine; N6-formyl-L-lysine; N6-glycyl-L-lysine; N6-lipoyl-L-lysine; N6-methyl-L-lysine; N6-methyl-N-6-poly(N-methyl-propylamine)-L-lysine; N6-mureinyl-L-lysine; N6-myristoyl-L-lysine; N6-palmitoyl-L-lysine; N6-pyridoxal phosphate-L-lysine; N6-pyruvic acid 2-iminyl-L-lysine; N6-retinal-L-lysine; N-acetylglycine; N-acetyl-L-glutamine; N-acetyl-L-alanine; N-acetyl-L-aspartic acid; N-acetyl-L-cysteine; N-acetyl-L-glutamic acid; N-acetyl-L-isoleucine; N-acetyl-L-methionine; N-acetyl-L-proline; N-acetyl-L-serine; N-acetyl-L-threonine; N-acetyl-L-tyrosine; N-acetyl-L-valine; N-alanyl-glycosylphosphatidylinositolethanolamine; N-asparaginyl-glycosylphosphatidylinositolethanolamine; N-aspartyl-glycosylphosphatidylinositolethanolamine; N-formylglycine; N-formyl-L-methionine; N-glycyl-glycosylphosphatidylinositolethanolamine; N-L-glutamyl-poly-L-glutamic acid; N-methylglycine; N-methyl-L-alanine; N-methyl-L-methionine; N-methyl-L-phenylalanine; N-myristoyl-glycine; N-palmitoyl-L-cysteine; N-pyruvic acid 2-iminyl-L-cysteine; N-pyruvic acid 2-iminyl-L-valine; N-seryl-glycosylphosphatidylinositolethanolamine; N-seryl-glycosylsphingolipidinositolethanolamine; O-(ADP-ribosyl)-L-serine; O-(phospho-5′-adenosine)-L-threonine; O-(phospho-5′-DNA)-L-serine; O-(phospho-5′-DNA)-L-threonine; O-(phospho-5′-RNA)-L-serine; O-(phosphoribosyl dephospho-coenzyme A)-L-serine; O-(sn-1-glycerophosphoryl)-L-serine; O4′-(8alpha-FAD)-L-tyrosine; O4′-(phospho-5′-adenosine)-L-tyrosine; O4′-(phospho-5′-DNA)-L-tyrosine; O4′-(phospho-5′-RNA)-L-tyrosine; O4′-(phospho-5′-uridine)-L-tyrosine; O4-glycosyl-L-hydroxyproline; O4′-glycosyl-L-tyrosine;
O4′-sulfo-L-tyrosine; O5-glycosyl-L-hydroxylysine; O-glycosyl-L-serine; O-glycosyl-L-threonine; omega-N-(ADP-ribosyl)-L-arginine; omega-N-omega-N′-dimethyl-L-arginine; omega-N-methyl-L-arginine; omega-N-omega-N-dimethyl-L-arginine; omega-N-phospho-L-arginine; O-octanoyl-L-serine; O-palmitoyl-L-serine; O-palmitoyl-L-threonine; O-phospho-L-serine; O-phospho-L-threonine; O-phosphopantetheine-L-serine; phycoerythrobilin-bis-L-cysteine; phycourobilin-bis-L-cysteine; pyrroloquinoline quinone; pyruvic acid; S hydroxycinnamyl-L-cysteine; S-(2-aminovinyl)methyl-D-cysteine; S-(2-aminovinyl)-D-cysteine; S-(6-FMN)-L-cysteine; S-(8alpha-FAD)-L-cysteine; S-(ADP-ribosyl)-L-cysteine; S-(L-isoglutamyl)-L-cysteine; S-12-hydroxyfarnesyl-L-cysteine; S-acetyl-L-cysteine; S-diacylglycerol-L-cysteine; S-diphytanylglycerol diether-L-cysteine; S-farnesyl-L-cysteine; S-geranylgeranyl-L-cysteine; S-glycosyl-L-cysteine; S-glycyl-L-cysteine; S-methyl-L-cysteine; S-nitrosyl-L-cysteine; S-palmitoyl-L-cysteine; S-phospho-L-cysteine; S-phycobiliviolin-L-cysteine; S-phycocyanobilin-L-cysteine; S-phycoerythrobilin-L-cysteine; S-phytochromobilin-L-cysteine; S-selenyl-L-cysteine; S-sulfo-L-cysteine; tetrakis-L-cysteinyl diiron disulfide; tetrakis-L-cysteinyl iron; tetrakis-L-cysteinyl tetrairon tetrasulfide; trans-2,3-cis 4-dihydroxy-L-proline; tris-L-cysteinyl triiron tetrasulfide; tris-L-cysteinyl triiron trisulfide; tris-L-cysteinyl-L-aspartato tetrairon tetrasulfide; tris-L-cysteinyl-L-cysteine persulfido-bis-L-glutamato-L-histidino tetrairon disulfide trioxide; tris-L-cysteinyl-L-N3′-histidino tetrairon tetrasulfide; tris-L-cysteinyl-L-NM'-histidino tetrairon tetrasulfide; and tris-L-cysteinyl-L-serinyl tetrairon tetrasulfide.
Additional examples of post-translational modifications can be found in web sites such as the Delta Mass database, based on Krishna, R. G. and F. Wold (1998). Posttranslational Modifications. Proteins: Analysis and Design. R. H. Angeletti. San Diego, Academic Press. 1: 121-206; Methods in Enzymology, 193, J. A. McCloskey (ed) (1990), pages 647-660; Methods in Protein Sequence Analysis edited by Kazutomo Imahori and Fumio Sakiyama, Plenum Press, (1993) “Post-translational modifications of proteins” R. G. Krishna and F. Wold pages 167-172; “GlycoSuiteDB: a new curated relational database of glycoprotein glycan structures and their biological sources” Cooper et al. Nucleic Acids Res. 29: 332-335 (2001); “O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins” Gupta et al. Nucleic Acids Research, 27: 370-372 (1999); and “PhosphoBase, a database of phosphorylation sites: release 2.0.”, Kreegipuu et al. Nucleic Acids Res 27(1):237-239 (1999); see also WO 02/21139 A2, the disclosure of which is incorporated herein by reference in its entirety.
Exemplary polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein include but are not limited to, cytokines, inflammatory molecules, growth factors, their receptors, and oncogene products or portions thereof. Examples of cytokines, inflammatory molecules, growth factors, their receptors, and oncogene products include, but are not limited to e.g., alpha-1 antitrypsin, Angiostatin, Antihemolytic factor, antibodies (including an antibody or a functional fragment or derivative thereof selected from: Fab, Fab′, F(ab)2, Fd, Fv, ScFv, diabody, tribody, tetrabody, dimer, trimer or minibody), angiogenic molecules, angiostatic molecules, Apolipopolypeptide, Apopolypeptide, Asparaginase, Adenosine deaminase, Atrial natriuretic factor, Atrial natriuretic polypeptide, Atrial peptides, Angiotensin family members, Bone Morphogenic Polypeptide (BMP-1, BMP-2, BMP-3, BMP-4, BMP-5, BMP-6, BMP-7, BMP-8a, BMP-8b, BMP-10, BMP-15, etc.); C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG), Calcitonin, CC chemokines (e.g., Monocyte chemoattractant polypeptide-1, Monocyte chemoattractant polypeptide-2, Monocyte chemoattractant polypeptide-3, Monocyte inflammatory polypeptide-1 alpha, Monocyte inflammatory polypeptide-1 beta, RANTES, 1309, R83915, R91733, HCC1, T58847, D31065, T64262), CD40 ligand, C-kit Ligand, Ciliary Neurotrophic Factor, Collagen, Colony stimulating factor (CSF), Complement factor 5a, Complement inhibitor, Complement receptor 1, cytokines, (e.g., epithelial Neutrophil Activating Peptide-78, GRO alpha/MGSA, GRO beta, GRO gamma, MIP-1 alpha, MIP-1 delta, MCP-1), deoxyribonucleic acids, Epidermal Growth Factor (EGF), Erythropoietin (“EPO”, representing a preferred target for modification by the incorporation of one or more non-natural amino acid), Exfoliating toxins A and B, Factor IX, Factor VII, Factor 
VIII, Factor X, Fibroblast Growth Factor (FGF), Fibrinogen, Fibronectin, G-CSF, GM-CSF, Glucocerebrosidase, Gonadotropin, growth factors, Hedgehog polypeptides (e.g., Sonic, Indian, Desert), Hemoglobin, Hepatocyte Growth Factor (HGF), Hepatitis viruses, Hirudin, Human serum albumin, Hyalurin-CD44, Insulin, Insulin-like Growth Factor (IGF-I, IGF-II), interferons (e.g., interferon-alpha, interferon-beta, interferon-gamma, interferon-epsilon, interferon-zeta, interferon-eta, interferon-kappa, interferon-lambda, interferon-tau, interferon-omega), glucagon-like peptide (GLP-1), GLP-2, GLP receptors, glucagon, other agonists of the GLP-1R, natriuretic peptides (ANP, BNP, and CNP), Fuzeon and other inhibitors of HIV fusion, Hirudin and related anticoagulant peptides, Prokineticins and related agonists including analogs of black mamba snake venom, TRAIL, RANK ligand and its antagonists, calcitonin, amylin and other glucoregulatory peptide hormones, and Fc fragments, exendins (including exendin-4), exendin receptors, interleukins (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, etc.), I-CAM-1/LFA-1, Keratinocyte Growth Factor (KGF), Lactoferrin, leukemia inhibitory factor, Luciferase, Neurturin, Neutrophil inhibitory factor (NIF), oncostatin M, Osteogenic polypeptide, Parathyroid hormone, PD-ECSF, PDGF, peptide hormones (e.g., Human Growth Hormone), Oncogene products (Mos, Rel, Ras, Raf, Met, etc.), Pleiotropin, Polypeptide A, Polypeptide G, Pyrogenic exotoxins A, B, and C, Relaxin, Renin, ribonucleic acids, SCF/c-kit, Signal transcriptional activators and suppressors (p53, Tat, Fos, Myc, Jun, Myb, etc.), Soluble complement receptor 1, Soluble I-CAM 1, Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15), soluble adhesion molecules, Soluble TNF receptor, Somatomedin, Somatostatin, Somatotropin, Streptokinase, Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE),
Steroid hormone receptors (such as those for estrogen, progesterone, testosterone, aldosterone, LDL receptor ligand and corticosterone), Superoxide dismutase (SOD), Toll-like receptors (such as Flagellin), Toxic shock syndrome toxin (TSST-1), Thymosin alpha 1, Tissue plasminogen activator, transforming growth factor (TGF-alpha, TGF-beta), Tumor necrosis factor beta (TNF beta), Tumor necrosis factor receptor (TNFR), Tumor necrosis factor-alpha (TNF alpha), transcriptional modulators (for example, genes and transcriptional modular polypeptides that regulate cell growth, differentiation and/or cell regulation), Vascular Endothelial Growth Factor (VEGF), virus-like particle, VLA-4/VCAM-1, Urokinase, signal transduction molecules, estrogen, progesterone, testosterone, aldosterone, LDL, corticosterone.
Additional polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein include, but are not limited to, enzymes (e.g., industrial enzymes) or portions thereof. Examples of enzymes include, but are not limited to, amidases, amino acid racemases, acylases, dehalogenases, dioxygenases, diarylpropane peroxidases, epimerases, epoxide hydrolases, esterases, isomerases, kinases, glucose isomerases, glycosidases, glycosyl transferases, haloperoxidases, monooxygenases (e.g., p450s), lipases, lignin peroxidases, nitrile hydratases, nitrilases, proteases, phosphatases, subtilisins, transaminases, and nucleases.
Other polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein include, but are not limited to, agriculturally related polypeptides such as insect resistance polypeptides (e.g., Cry polypeptides), starch and lipid production enzymes, plant and insect toxins, toxin-resistance polypeptides, Mycotoxin detoxification polypeptides, plant growth enzymes (e.g., Ribulose 1,5-Bisphosphate Carboxylase/Oxygenase), lipoxygenase, and Phosphoenolpyruvate carboxylase.
Polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein include, but are not limited to, antibodies, immunoglobulin domains of antibodies and their fragments. Examples of antibodies include, but are not limited to antibodies, antibody fragments, antibody derivatives, Fab fragments, Fab′ fragments, F(ab)2 fragments, Fd fragments, Fv fragments, single-chain Fv fragments (scFv), diabodies, tribodies, tetrabodies, dimers, trimers, and minibodies.
In another embodiment, the invention is directed to a composition comprising a recombinant polypeptide having increased stability and/or activity of any polypeptide at low pH or elevated temperature produced according to the methods described herein, and an additional component selected from the group consisting of pharmaceutically acceptable diluents, carriers, excipients and adjuvants.
Polypeptides having increased stability and/or activity of any polypeptide at low pH or elevated temperature that can be produced according to the methods described herein can also further comprise a chemical moiety selected from the group consisting of: cytotoxins, pharmaceutical drugs, dyes or fluorescent labels, a nucleophilic or electrophilic group, a ketone or aldehyde, azide or alkyne compounds, photocaged groups, tags, a peptide, a polypeptide, an oligosaccharide, polyethylene glycol with any molecular weight and in any geometry, polyvinyl alcohol, metals, metal complexes, polyamines, imidazoles, carbohydrates, lipids, biopolymers, particles, solid supports, a polymer, a targeting agent, an affinity group, any agent to which a complementary reactive chemical group can be attached, biophysical or biochemical probes, isotopically-labeled probes, spin-label amino acids, fluorophores, aryl iodides and bromides.
In some embodiments, the present invention involves mutating nucleotide sequences to add/create or remove/disrupt sequences. Such mutations can be made using any suitable mutagenesis method known in the art, including, but not limited to, site-directed mutagenesis, oligonucleotide-directed mutagenesis, positive antibiotic selection methods, unique restriction site elimination (USE), deoxyuridine incorporation, phosphorothioate incorporation, and PCR-based mutagenesis methods. Details of such methods can be found in, for example, Lewis et al. (1990) Nucl. Acids Res. 18, p3439; Bohnsack et al. (1996) Meth. Mol. Biol. 57, p1; Vavra et al. (1996) Promega Notes 58, 30; Altered Sites II in vitro Mutagenesis Systems Technical Manual #TM001, Promega Corporation; Deng et al. (1992) Anal. Biochem. 200, p81; Kunkel et al. (1985) Proc. Natl. Acad. Sci. USA 82, p488; Kunkel et al. (1987) Meth. Enzymol. 154, p367; Taylor et al. (1985) Nucl. Acids Res. 13, p8764; Nakamaye et al. (1986) Nucl. Acids Res. 14, p9679; Higuchi et al. (1988) Nucl. Acids Res. 16, p7351; Shimada et al. (1996) Meth. Mol. Biol. 57, p157; Ho et al. (1989) Gene 77, p51; Horton et al. (1989) Gene 77, p61; and Sarkar et al. (1990) BioTechniques 8, p404. Numerous kits for performing site-directed mutagenesis are commercially available, such as the QuikChange II Site-Directed Mutagenesis Kit and the Altered Sites II in vitro mutagenesis system. Such commercially available kits may also be used to optimize sequences. Other techniques that can be used to generate modified nucleic acid sequences are well known to those of skill in the art. See, for example, Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
The following examples illustrate the present invention, and are set forth to aid in the understanding of the invention, and should not be construed to limit in any way the scope of the invention as defined in the claims which follow thereafter.
A highly articulated phylogenetic tree encompassing over 200 diverse Trx sequences from the three domains of life was constructed (
The sequences of the ancestral Trx enzymes were reconstructed using statistical methods based on maximum likelihood (Liberles, Ancestral sequence reconstruction, xiii, 252 p. (Oxford University Press, Oxford; New York, 2007); Gaucher et al., Nature 425, 285-8 (2003)). For a given node in the tree, the posterior probability values for all 20 amino acids were calculated at each site of the inferred sequence. These values represent the probability that a certain residue occupied a specific position in the sequence at a particular point in the phylogeny. The posterior probabilities were calculated on the basis of an amino acid replacement matrix (Yang et al., Genetics 141, 1641-50 (1995)). The most probabilistic ancestral sequence (M-PAS) at a specific node was then reconstructed by assigning to each site the residue with the highest posterior probability.
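The final assignment step described above can be sketched as follows. This is a minimal illustration only: the function name and the three-site posterior table are hypothetical and are not taken from the reconstructions described herein, which were computed with dedicated phylogenetic software.

```python
# Minimal sketch of assembling a most probabilistic ancestral sequence (M-PAS)
# from per-site posterior probabilities. The posterior values below are
# invented for illustration; they are not data from this disclosure.

def most_probable_sequence(site_posteriors):
    """Pick, at every site, the residue with the highest posterior probability.

    site_posteriors: list of dicts mapping one-letter residue -> posterior.
    Returns the assembled sequence and the posterior of each chosen residue.
    """
    sequence = []
    support = []
    for posteriors in site_posteriors:
        residue = max(posteriors, key=posteriors.get)
        sequence.append(residue)
        support.append(posteriors[residue])
    return "".join(sequence), support


# Hypothetical posteriors for three sites at one node of the tree.
example_posteriors = [
    {"M": 0.97, "L": 0.02, "I": 0.01},
    {"S": 0.55, "T": 0.40, "A": 0.05},
    {"K": 0.80, "R": 0.20},
]

seq, support = most_probable_sequence(example_posteriors)
print(seq, support)
```

Keeping the per-site support alongside the sequence makes it easy to flag ambiguous sites, such as the second site in this example, where the runner-up residue has a comparable posterior.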
Organisms from which the Trx sequences were taken (genus names): Ovis; Fagopyrum; Clostridium; Deinococcus; Bos; Cryptosporidium; Clostridium; Thermus; Sus; Theileria; Thermoanaerobacter; Deinococcus; Equus; Plasmodium; Chlamydia; Dehalococcoides; Mus; Archaeoglobus; Chlamydophila; Chloroflexus; Rattus; Methanosaeta; Chlorobium; Chloroflexus; Methanococcoides; Chlorobium; Chloroflexus; Candidatus; Bacteroides; Aquifex; Methanospirillum; Flavobacterium; Bdellovibrio; Picrophilus; Porphyromonas; Geobacter; Callithrix; Methanococcus; Bacteroides; Bdellovibrio; Monodelphis; Methanocorpusculum; Bacteroides; Desulfovibrio; Ornithorhynchus; Methanosaeta; Porphyromonas; Geobacter; Gallus; Natronomonas; Bacteroides; Solibacter; Melopsittacus; Haloquadratum; Bacteroides; Solibacter; Ophiophagus; Haloarcula; Rhodopirellula; Acidobacteria; Xenopus; Natronomonas; Rhodopirellula; Wolinella; Tetraodon; Halobacterium; Mycobacterium; Helicobacter; Ictalurus; Archaeoglobus; Mycobacterium; Campylobacter; Danio; Thermoplasma; Corynebacterium; Helicobacter; Drosophila; Thermofilum; Thermobifida; Wolinella; Drosophila; Caldivirga; Streptomyces; Wolinella; Drosophila; Sulfolobus; Thermobifida; Agrobacterium; Apis; Sulfolobus; Mycobacterium; Sinorhizobium; Tribolium; Sulfolobus; Streptomyces; Brucella; Bombyx; Hyperthermus; Corynebacterium; Rickettsia; Graphocephala; Aeropyrum; Streptomyces; Bovin Mitochondrio; Litopenaeus; Metallosphaera; Mycobacterium; Equus; Sulfolobus; Thermobifida; Homo Mitochondrion; Aspergillus; Sulfolobus; Streptomyces; Rattus Mitochondrio; Neosartorya; Staphylothermus; Synechocystis; Mus Mitochondrion; Aspergillus; Aeropyrum; Nostoc; Thiobacillus; Aspergillus; Clostridium; Nostoc; Neisseria; Pichia; Thermoanaerobacter; Synechocystis; Thiobacillus; Candida; Bacillus; Thermosynechococcus; Burkholderia; Pichia; Bacillus; Thermosynechococcus; Bordetella; Kluyveromyces; Streptococcus; Synechocystis; Thiobacillus; Saccharomyces; Enterococcus; Thermosynechococcus; Thiobacillus; Candida; Listeria; Nostoc; Bordetella; Saccharomyces; Lactobacillus; Nostoc; Pseudomonas; Schizosaccharomyces; Lactobacillus; Thermosynechococcus; Vibrio; Monosiga; Staphylococcus; Nostoc; Yersinia; Entamoeba; Geobacillus; Prochlorococcus; Salmonella; Dictyostelium; Bacillus; Synechocystis; Shigella; Arabidopsis; Bacillus; Nostoc1; Escherichia; Arabidopsis; Lactobacillus; Escherichia; Limonium; Listeria; Shigella; Zea; Enterococcus; Salmonella; Vitis; Streptococcus; Pisum Chloroplast; Yersinia; Ostreococcus; Clostridium; Brana Chloroplast; Vibrio; Helicosporidium; Clostridium; Thermus.
Thermal Stability of Ancient Trx Enzymes
As a first step toward investigating the physico-chemical properties of these resurrected enzymes, differential scanning calorimetry (DSC) was used to measure their thermal stabilities. The denaturation temperature (Tm) indicates the temperature range in which the proteins are operative.
Force-dependent chemical kinetics of disulfide reduction
It is also of great interest to examine the chemical mechanisms of disulfide bond reduction utilized by the resurrected enzymes. Given the ancient origin of the resurrected thioredoxin enzymes, with some of them predating the buildup of atmospheric oxygen, it can be assumed that the chemical mechanisms of disulfide bond reduction utilized by the resurrected enzymes are closer to those of simple sulfur-based molecules. Simple sulfur-based molecules utilize a straightforward collision-driven substitution nucleophilic bimolecular (SN2) mechanism of reduction (Kice et al., Progress in Inorganic Chemistry (ed. Edwards, J. O.) 147-206 (2007)). By contrast, Trx enzymes utilize a complex mixture of chemical mechanisms including a critical substrate binding and rearrangement reaction that accounts for the vast increase in the efficiency of Trx over the simpler sulfur compounds that were available in early geochemistry (Wiita et al., Nature 450, 124-7 (2007); Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)).
A single molecule force-spectroscopy based assay can be used to measure the effect of applying a well-controlled force to a disulfide bonded substrate, on its rate of reduction by a nucleophile. This assay can be used to distinguish the simple SN2 chemistry of nucleophiles (e.g. hydroxide, glutathione and L-Cys), from the more complex reduction chemistry of the Trx enzymes (Wiita et al., Nature 450, 124-7 (2007); Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009); Wiita et al., Proc Natl Acad Sci USA 103, 7222-7 (2006); Koti Ainavarapu et al., J Am Chem Soc 130, 6479-87 (2008); Garcia-Manyes et al., Nature Chemistry 1, 236-242 (2009); Liang and Fernandez, Mechanochemistry: One Bond at a Time. ACS Nano (2009)). This feature makes this assay a good system to probe the chemistry of the resurrected enzymes.
This approach is described in
The chemical mechanisms of disulfide reduction can be distinguished by their sensitivity to the force applied to the substrate (Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)). Simple thiol reducing agents show a force dependency in which the rate always increases exponentially with the applied force (Wiita et al., Proc Natl Acad Sci USA 103, 7222-7 (2006); Koti Ainavarapu et al., J Am Chem Soc 130, 6479-87 (2008)). By contrast, modern Trx enzymes show a negative force dependency in the range of 30-200 pN (Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)). This mechanism is consistent with a Michaelis-Menten binding reaction followed by a force-inhibited reorientation of the substrate disulfide bond, necessary for an SN2 reaction to occur (Wiita et al., Nature 450, 124-7 (2007)). In a second mechanism, the rate of reduction increases exponentially at forces above 200 pN. This mechanism can be described by a simple SN2 reaction and is found only in Trx enzymes of bacterial origin. A third, force-independent mechanism of reduction is present in all thioredoxin enzymes and can be ascribed to a single-electron transfer reaction (Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)).
Surprisingly, the same three reduction mechanisms can be observed in the ancient enzymes with similar patterns to those found in extant Trxs (
One might expect that Trx enzymes from primitive forms of life should have less-developed chemical mechanisms. For instance, one of the main factors controlling the chemistry of Trx catalysis is the geometry of the binding groove. In the case of modern bacterial-origin Trxs, the binding groove is less pronounced than in eukaryotic Trxs (Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)). This structural difference is responsible for the different chemical behavior observed in eukaryotic versus bacterial Trxs. If ancient enzymes had a less-structured groove, it could make their chemistry more similar to that of simple reducing agents like L-Cys or TCEP (Ainavarapu et al., J Am Chem Soc 130, 436-7 (2008)). However, the chemistry of Trx enzymes seems to have been established very early in evolution, about 4 Gyr ago, in the same manner that it is observed today. This observation shows that the step from simple reducing compounds to well-structured and functional enzymes occurred early in molecular evolution (Nisbet and Sleep, Nature 409, 1083-91 (2001)).
Nevertheless, several aspects of the catalytic mechanisms of some ancestral Trxs are intriguing. For example, high activity is observed for AECA and LACA Trxs when the substrate is pulled at forces below 200 pN (
Activity of Ancestral Trxs in Acidic Conditions (pH 5)
LBCA, AECA and LACA lived in an anoxygenic environment likely rich in sulfur compounds and CO2 whereas LPBCA, LECA, LGPCA and LAFCA lived in an oxygenic environment (Nisbet and Sleep, Nature 409, 1083-91 (2001)) (
Methods Summary
Thioredoxin sequences were retrieved from GenBank. Phylogenetic analysis and sequence reconstructions were performed using MrBayes, PAUP and PAML as previously described (Gaucher et al., Nature 451, 704-7 (2008)). The reconstructed sequences were synthesized, cloned into the pQE80L vector and expressed in E. coli cells. Protein engineering and purification were carried out as described in Wiita et al., Nature 450, 124-7 (2007). Thermal stabilities were measured using a VP-Capillary DSC calorimeter from MicroCal. The heat capacity vs. temperature profiles were analyzed following the two-state thermodynamic model (Ibarra-Molero et al., Biochemistry 38, 8138-49 (1999)). AFM experiments were performed in a custom-made apparatus in its force-clamp mode (Fernandez and Li, Science 303, 1674-8 (2004)). Silicon nitride cantilevers were used with a typical spring constant of 0.02 N/m. The buffer used in the experiments contained 10 mM HEPES, 150 mM NaCl, 1 mM EDTA, 2 mM NADPH, pH 7.2. Individual (I27G32C-A75C)8 proteins were stretched at a constant force of 175-185 pN for 0.2-0.3 s. This pulse unfolds the modules up to the disulfide bond. The test-pulse force was then applied for several seconds to capture all possible reduction events. Trx reductase at 50 nM (eukaryotic or bacterial) or DTE at 200 μM was used to keep Trx enzymes in their reduced state. The traces containing reduction events were summed, normalized and fitted with a single exponential to obtain the time constant τ and thus the reduction rate (r=1/τ). A kinetic model containing two force-dependent rate constants was applied. The kinetic parameters were solved using matrix analysis and the errors were estimated using the bootstrap method. Igor software was used for data collection and analysis.
Phylogenetic Analysis and Ancestral Sequence Reconstruction.
A total of 203 thioredoxin sequences from the three domains of life were retrieved from GenBank (Table 1). Sequences were aligned using MUSCLE (Edgar, Nucleic Acids Res 32, 1792-7 (2004)) and further corrected manually. The phylogenetic analysis was carried out by the minimum evolution distance criterion with 1000 bootstrap replicates using PAUP* 4.0 beta. Ancestral sequences were reconstructed using PAML version 3.14 and incorporated the gamma distribution for variable replacement rates across sites (Yang, Comput Appl Biosci 13, 555-556 (1997)). For each site of the inferred sequences, posterior probabilities were calculated for all 20 amino acids. The amino acid residue with the highest posterior probability was then assigned at each site.
Protein Expression and Purification.
Genes encoding the ancestral Trx enzymes were synthesized and codon-optimized for expression in E. coli cells. The genes were cloned into the pQE80L vector (Qiagen) and transformed into E. coli BL21 (DE3) cells (Invitrogen). Cells were incubated overnight in LB medium at 37° C. and protein expression was induced with 1 mM IPTG. Cell pellets were sonicated and the His6-tagged proteins were loaded onto a His GraviTrap affinity column (GE Healthcare). The purified protein was verified by SDS-PAGE. The proteins were then loaded onto a PD-10 desalting column (GE Healthcare) and finally dialyzed against 50 mM HEPES, pH 7.0 buffer. The preparation of (I27G32C-A75C)8 was carried out as follows: mutations Gly32Cys and Ala75Cys were introduced into the I27 module using the QuikChange site-directed mutagenesis protocol. Multi-step cloning was performed to produce an N-C-linked eight-domain polypeptide. The gene encoding the polypeptide was cloned into pQE80L and the protein was expressed at 37° C. for 4 hours in E. coli BLR (DE3) cells. The cell pellet was lysed using a French press. The His6-tagged polypeptide was purified using Talon-Co2+ resin. The protein was further purified by size exclusion chromatography on a Superdex 200 HR 10/30 column. The protein was eluted in 10 mM HEPES, 150 mM NaCl, 1 mM EDTA, pH 7.2.
DSC Experiments
Thermal stabilities of ancestral and modern Trx enzymes were measured with a VP-Capillary DSC (MicroCal). Protein solutions were dialyzed into a buffer of 50 mM HEPES, pH 7. The scan speed was set to 1.5 K/min. Several buffer-buffer baselines were first obtained for proper equilibration of the calorimeter. Concentrations were 0.3-0.7 mg/mL and were determined spectrophotometrically at 280 nm using theoretical extinction coefficients and molecular weights. The experimental traces were analyzed following the two-state thermodynamic model (Ibarra-Molero et al., Biochemistry 38, 8138-49 (1999)).
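The spectrophotometric concentration determination mentioned above follows the Beer-Lambert law. The sketch below is illustrative only: the absorbance, extinction coefficient, molecular weight and 1 cm path length are hypothetical values chosen to land in the stated 0.3-0.7 mg/mL range, not measurements from the experiments described herein.

```python
# Minimal sketch of a Beer-Lambert concentration estimate at 280 nm.
# All numeric inputs in the example call are assumed, illustrative values.

def concentration_mg_per_ml(a280, ext_coeff_per_m_cm, mol_weight_g_per_mol, path_cm=1.0):
    """Convert an absorbance reading into a mass concentration.

    A = epsilon * c * l, so the molar concentration is c = A / (epsilon * l).
    Multiplying by the molecular weight gives g/L, which equals mg/mL.
    """
    molar = a280 / (ext_coeff_per_m_cm * path_cm)  # mol/L
    return molar * mol_weight_g_per_mol            # g/L == mg/mL


conc = concentration_mg_per_ml(
    a280=0.60,                    # hypothetical absorbance reading
    ext_coeff_per_m_cm=14000.0,   # hypothetical theoretical extinction coefficient
    mol_weight_g_per_mol=11700.0, # hypothetical molecular weight
)
print(f"{conc:.3f} mg/mL")
```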
AFM Experiments
The atomic force microscope used is a custom-made design (Fernandez and Li, Science 303, 1674-8 (2004)). Data acquisition is controlled by two PCI cards, 6052E and 6703 (National Instruments). Silicon nitride cantilevers (model MLCT) were used. The cantilevers were calibrated using the equipartition theorem (Florin et al., Biosensors & Bioelectronics 10, 895-901 (1995)), giving a typical spring constant of 0.02 N/m. The AFM works in the force-clamp mode with a length resolution of 0.5 nm. The feedback response can reach 5 ms. The buffer used in the experiment is 10 mM HEPES, pH 7.2, 150 mM NaCl, 1 mM EDTA, 2 mM NADPH. Trx enzymes are added to a desired concentration. The buffer also contains Trx reductase at 50 nM (prokaryotic or eukaryotic) to keep Trx enzymes in their reduced state. E. coli Trx reductase works well with bacterial-origin Trx enzymes whereas eukaryotic Trx reductase works with Archaea/Eukaryote Trx enzymes. Similar results are obtained when using DTE at 200 μM to keep Trx enzymes reduced, thus demonstrating that modern Trx reductases maintain ancestral Trx enzymes fully reduced. For the experiments at pH 5, 20 mM sodium acetate buffer and 200 μM DTE were used.
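The equipartition calibration referred to above relates the cantilever spring constant k to its thermal deflection fluctuations through k = kB·T/⟨x²⟩. The following sketch illustrates that relation on synthetic data. The temperature of 298 K, the sample count and the Gaussian noise model are assumptions made for this example, and the code is not the calibration routine of the instrument described herein.

```python
# Minimal sketch of an equipartition-theorem spring constant estimate.
# Synthetic deflection data are generated for an assumed 0.02 N/m cantilever.
import math
import random
import statistics

K_B = 1.380649e-23  # Boltzmann constant, J/K


def spring_constant(deflections_m, temperature_k=298.0):
    """Return k = kB*T / <x^2> in N/m from deflection samples in metres.

    The mean-square fluctuation is taken about the mean deflection, so a
    constant offset in the signal does not bias the estimate.
    """
    mean_square = statistics.pvariance(deflections_m)
    return K_B * temperature_k / mean_square


# Assumed values for the illustration (not from this disclosure).
rng = random.Random(0)
true_k = 0.02                                  # N/m
sigma = math.sqrt(K_B * 298.0 / true_k)        # expected r.m.s. deflection, m
deflections = [rng.gauss(0.0, sigma) for _ in range(50_000)]

k_est = spring_constant(deflections)
print(f"estimated spring constant: {k_est:.4f} N/m")
```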
To perform the experiment, 3-6 μl of substrate at ~0.1 mg/mL was deposited on a gold-covered coverslide. A drop of ~100 μl containing the Trx solution was then added. The force-clamp protocol consists of three pulses of force. In the first pulse, the cantilever tip is pressed against the surface at 800 pN for 2 s. In the second pulse, the attached (I27G32C-A75C)8 is stretched at 175-185 pN for 0.2-0.3 s. The third pulse is the test force, during which the reduction events are captured. This pulse is applied at forces of 30-500 pN for long enough to capture all possible reduction events.
The traces were collected and analyzed using custom-written software in Igor Pro 6.03. The traces containing the reduction events at each force were summed, normalized and fitted with a single exponential. The fit yields a time constant, τ, and thus the reduction rate at a given force (r=1/τ). The bootstrap method was used to obtain the error of the reduction rates. The bootstrap was run 1000 times for each reduction rate, giving a distribution from which the s.e.m. was calculated.
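The fitting and bootstrap procedure described above can be sketched as follows. This is a simplified illustration and not the Igor Pro software used in the experiments: each reduction event is represented only by its time of occurrence, the summed and normalized trace is taken to be the resulting staircase, the single-exponential fit is done with a one-dimensional golden-section search, and the event times are simulated with an assumed time constant of 2 s.

```python
# Minimal sketch: single-exponential fit of a summed, normalized trace and
# bootstrap estimate of the error on the reduction rate r = 1/tau.
# Function names and the simulated data are assumptions for illustration.
import math
import random
import statistics


def fit_time_constant(event_times):
    """Least-squares fit of 1 - exp(-t/tau) to the normalized staircase."""
    times = sorted(event_times)
    n = len(times)
    # Height of the normalized staircase at each event (midpoint of the step).
    heights = [(i + 0.5) / n for i in range(n)]

    def sse(log_tau):
        tau = math.exp(log_tau)
        return sum((1.0 - math.exp(-t / tau) - h) ** 2 for t, h in zip(times, heights))

    # Golden-section search over log(tau); the 1 ms to 1000 s bracket is assumed.
    a, b = math.log(1e-3), math.log(1e3)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = sse(c), sse(d)
    for _ in range(60):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = sse(c)
        else:
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = sse(d)
    return math.exp((a + b) / 2.0)


def reduction_rate_with_error(event_times, n_boot=1000, seed=0):
    """Return (rate, bootstrap standard error) for a set of event times."""
    rng = random.Random(seed)
    rate = 1.0 / fit_time_constant(event_times)
    boot_rates = []
    for _ in range(n_boot):
        # Resample the events with replacement and refit.
        resample = rng.choices(event_times, k=len(event_times))
        boot_rates.append(1.0 / fit_time_constant(resample))
    return rate, statistics.stdev(boot_rates)


# Simulated reduction event times with an assumed time constant of 2 s.
sim = random.Random(1)
events = [sim.expovariate(1.0 / 2.0) for _ in range(200)]
rate, sem = reduction_rate_with_error(events, n_boot=100)
print(f"r = {rate:.2f} +/- {sem:.2f} s^-1")
```

The example call uses 100 bootstrap resamples only to keep the run short; the default of 1000 matches the number of bootstrap runs stated above.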
AFM Data Analysis
The data were fitted following a three-state kinetic model previously described (Wiita et al., Nature 450, 124-7 (2007); Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)). In this model three different chemical mechanisms are taken into account. The rate constants used in the kinetic model are:
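$$k_{01} = \alpha_0[\mathrm{Trx}]$$
$$k_{12} = \beta_0 \exp(F\Delta x_{12}/k_B T) + \lambda_0$$
$$k_{02} = \gamma_0[\mathrm{Trx}]\exp(F\Delta x_{02}/k_B T) + \lambda_0$$
$$k_{10} = \delta_0$$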
Rate constants k01 and k02 depend linearly on Trx concentration. k12 and k02 depend exponentially on force. The kinetic model is solved using matrix analysis, and the parameters α0, β0, Δx12, γ0, Δx02, λ0 and δ0 can be obtained for each ancestral enzyme. The optimal kinetic parameters are calculated by numerical optimization using the downhill simplex method (Nelder and Mead, Computer Journal 7, 308-313 (1965)) (Table 2).
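The kinetic model and its fitting can be sketched in a few lines. Several assumptions are made here and should not be read as the method of the cited papers: state 0 is taken as the free substrate, state 1 as the enzyme-substrate complex and state 2 as the reduced substrate; the observed rate is taken as the inverse of the mean first-passage time from state 0 to state 2, which is the closed-form solution of the two-equation linear system for that time; kBT is set to 41.1 pN·Å (about 298 K); and the simplex routine is a basic textbook variant. All parameter values are illustrative and are not those of Table 2.

```python
import math

KBT_PN_ANGSTROM = 41.1  # kB*T near 298 K in pN*angstrom (temperature is an assumption)


def reduction_rate(force_pn, trx_um, params):
    """Rate of reaching state 2, taken as 1 / (mean first-passage time from state 0)."""
    alpha0, beta0, dx12, gamma0, dx02, lambda0, delta0 = params
    k01 = alpha0 * trx_um
    k12 = beta0 * math.exp(force_pn * dx12 / KBT_PN_ANGSTROM) + lambda0
    k02 = gamma0 * trx_um * math.exp(force_pn * dx02 / KBT_PN_ANGSTROM) + lambda0
    k10 = delta0
    return (k01 * k12 + k02 * (k10 + k12)) / (k01 + k10 + k12)


def sum_squared_error(params, data, trx_um):
    """Squared error against (force, rate) pairs; infinite outside a loose sanity box."""
    alpha0, beta0, dx12, gamma0, dx02, lambda0, delta0 = params
    if min(alpha0, beta0, gamma0, lambda0, delta0) <= 0 or max(abs(dx12), abs(dx02)) > 5:
        return math.inf
    return sum((reduction_rate(f, trx_um, params) - r) ** 2 for f, r in data)


def nelder_mead(f, x0, iters=500):
    """Basic downhill simplex: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += 0.1 * abs(vertex[i]) or 0.1  # 10% step, or 0.1 if the entry is 0
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


# Synthetic demonstration: order is alpha0, beta0, dx12, gamma0, dx02, lambda0, delta0.
true_params = [0.2, 20.0, -0.8, 0.01, 0.2, 0.1, 5.0]  # illustrative values only
trx_um = 10.0
forces = [30, 50, 75, 100, 150, 200, 250, 300, 400, 500]
data = [(f, reduction_rate(f, trx_um, true_params)) for f in forces]

guess = [0.3, 15.0, -0.6, 0.02, 0.15, 0.2, 4.0]
objective = lambda p: sum_squared_error(p, data, trx_um)
fitted = nelder_mead(objective, guess)
```

With seven parameters and a single Trx concentration the fit is poorly constrained, so a real analysis would likely fit several concentrations at once and try multiple starting points. The sketch only shows how the rate constants, the model solution and the simplex search fit together.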
A brief explanation of the different chemical mechanisms is as follows: when the substrate is stretched at low force (below 200 pN), k01 and k12 dominate. The negative force dependence observed in all Trx enzymes (ancestral and modern) gives rise to a negative value of Δx12. This is consistent with a shortening of the polypeptide chain. This shortening was explained by a force-inhibited rotation of the disulfide bond that is necessary for the correct alignment of the S-S bond (180°) for an SN2 reaction to occur. This mechanism is similar to a Michaelis-Menten reaction in which the formation of an enzyme-substrate complex is crucial. A second reduction mechanism occurs at forces over 200 pN, where k02 dominates. In the case of bacterial-origin Trxs, the rate of reduction is exponentially accelerated. This is consistent with a simple SN2 reaction with an elongation of the disulfide bond at the transition state, Δx02. This elongation, ~0.18 Å, is only observed in bacterial-origin Trxs. In the case of eukaryotic-origin Trxs, the rate of disulfide bond reduction when the substrate is pulled at forces over 200 pN is essentially force-independent. In this case k02=λ0. This force-independent mechanism is explained by a single-electron transfer reaction, accounted for by the parameter λ0 in the kinetic model. This mechanism seems to be common to all Trx enzymes but is particularly evident in eukaryotic-origin Trxs. The origin of this diversity of chemical mechanisms was explained on the basis of the structural features of the binding groove (Perez-Jimenez et al., Nat Struct Mol Biol 16, 890-6 (2009)).
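As a worked illustration of how a sub-Ångström elongation produces a measurable effect, the exponential factor in k02 can be evaluated at the highest test force. The temperature is not stated in this section, so kBT ≈ 4.11 pN·nm (about 298 K) is an assumption. The factor applies only to the force-dependent term of k02 and not to the constant λ0.

```latex
\frac{F\,\Delta x_{02}}{k_B T}
  = \frac{500\ \mathrm{pN} \times 0.018\ \mathrm{nm}}{4.11\ \mathrm{pN\,nm}}
  \approx 2.19,
\qquad
\exp\!\left(\frac{F\,\Delta x_{02}}{k_B T}\right) \approx 8.9
```

Under these assumptions, an elongation of 0.18 Å corresponds to a roughly ninefold acceleration of the force-dependent term at 500 pN relative to zero force.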
Described herein is data demonstrating the feasibility of reconstructing ancient thioredoxin enzymes from predicted nodes. For example, the predicted DNA sequence of a Trx enzyme from the node corresponding to the Last Bacterial Common Ancestor, dated about 4 billion years ago, was selected for gene synthesis and protein expression in our laboratory (
The resuscitated LBCA Trx showed a 26° C. higher denaturation temperature than that of modern E. coli Trx. Higher denaturation temperatures have also been reported for resuscitated elongation factor proteins (Gaucher et al., Nature, 2008. 451(7179): p. 704-U2; Gaucher et al., Nature, 2003. 425(6955): p. 285-8). The LBCA thioredoxin enzyme also showed a high rate of catalysis at pH 5, where extant enzymes are largely inactive (
Results with the last bacterial common ancestor Trx enzyme show the feasibility of resurrecting active enzymes that disappeared from Earth millions of years ago. This approach can be used to uncover variations in the chemical mechanisms of thioredoxin catalysis (
The force-dependent rate of reduction shows that human Trx, which has a much deeper groove than that of E. coli, excludes the force accelerated mechanism of reduction (type III in
Any enzyme that cleaves covalent bonds can be investigated using the single molecule force spectroscopy assay described herein. Exemplary molecules that can be examined using the methods described herein include but are not limited to proteases. Proteases are a vast group of proteins with highly important physiological functions (Lopez-Otin and Bond, J Biol Chem, 2008. 283(45): p. 30433-7). The fact that their catalytic mechanisms have been thoroughly studied by traditional techniques facilitates interpretation of the single-molecule results (Frey and Hegeman, Enzymatic reaction mechanisms. 2007, Oxford: Oxford University Press). The high substrate specificity shown by some proteases can be used to design substrates suitable for single-molecule force spectroscopy. The proteolysis of those substrates can be studied under force. The catalytic activity of proteases is expected to show a complex force dependency because proteases have substrate-binding grooves that are similar to those found in thioredoxin enzymes and because the chemical mechanism of proteolysis can involve geometric rearrangements at the transition state (Frey and Hegeman, Enzymatic reaction mechanisms. 2007, Oxford: Oxford University Press). As in the case of thioredoxins, the molecular interpretation of the force dependency of proteases will shed light on the sub-Ångström contortions of the substrate atoms as they are cleaved by the protease during catalysis.
To determine the force dependency of protease catalysis, an appropriate substrate that can detect single protease cleavage events will be constructed. Simply cleaving the backbone of a mechanically stretched protein would end the experiment, because the polypeptide would lose its mechanical continuity. Therefore, a substrate that retains its mechanical integrity upon cleavage and that also extends sufficiently to provide an unmistakable fingerprint will be constructed.
An exemplary substrate, as set forth in
Although the covalent bridge design works (
Short polypeptides containing a cleavage sequence and terminated by either thiols or maleimides (to covalently link the short polypeptide to the exposed cysteines) can also be generated. Because the intra-molecular conjugation scheme described herein is also dependent on the distance between the reactive groups, the position of the exposed cysteines conjugating bifunctional reagents (recognition sequences) can be varied among different lengths until optimal constructs are identified. The force dependency of the catalytic activity of enterokinase can be studied using these substrates. Given that enterokinase contains a substrate-binding groove (Lu et al., J Mol Biol, 1999. 292(2): p. 361-73), and that the chemistry of proteolysis involves structural rearrangements of the participating atoms (e.g., formation of a tetrahedral intermediate), these substrates can be used to determine whether force exerts a complex effect on enterokinase activity. Once the force dependency of protease catalysis is measured, kinetic models can be developed to explain the data. In particular, the measured force dependency can be used to formulate activity models as a series of chemical mechanisms that require bond rotations/elongation of the recognition sequence. The effect of the width, depth and hydrophobicity of the binding groove can be studied as functions of the measured force-dependent mechanisms. This approach can also be extended to study other specific proteases such as factor Xa and thrombin, as well as the role of substrate conformations in enzymatic catalysis. This approach can also be important for the development of drug targets given the medical importance of protease inhibitors.
An octamer of the I27 module can be mutated to incorporate two cysteine residues (G32C, A75C;
The methods described herein can be used to detect when the enzyme reduces a target disulfide bond. To determine when Trx enzymes bind to a substrate, how long it takes to reduce the substrate after binding and how long the enzyme remains attached to the substrate after the reduction event, force-clamp assays of disulfide bond reduction can be combined with single molecule fluorescence detection of enzyme binding to the exposed substrate using our newly developed AFM/TIRF instrument (
The labeled enzymes can be observed in the TIRF microscope.
The dissociation dwell time of the enzyme after a disulfide bond has been reduced can be measured from the combined AFM/TIRF experiments (
The single molecule assay described herein can also be used to study oxidative folding by thioredoxin enzymes. In vivo, thiol-disulfide exchange reactions are catalyzed by a number of enzymes belonging to the thioredoxin (Trx) superfamily. All of these enzymes share the thioredoxin fold and most feature a CXXC active site motif (Martin, Structure, 1995. 3(3): p. 245-50). In humans and other eukaryotes, thioredoxin catalyzes the cleavage of disulfide bonds whereas PDI enzymes catalyze their oxidation and isomerization. However, the function of PDI as an oxidase is not unique given that, in S. cerevisiae, deletion strains lacking the essential gene encoding PDI can be rescued by a gene encoding a simple thioredoxin C35S mutant (Chivers et al., EMBO J, 1996. 15(11): p. 2659-67). This thioredoxin variant has a CXXS active site, meaning that the conventional pathway for substrate reduction is not possible. In addition, PDI-like enzymes with CXXS active sites have also been shown to complement this yeast deletion strain (Tachibana et al., Mol Cell Biol, 1992. 12(10): p. 4601-11; LaMantia et al., Cell, 1993. 74(5): p. 899-908). In certain aspects, the new single molecule oxidative folding assay described herein (e.g.
As shown in
Shown in
To study the oxidase mechanisms of thioredoxin, the value of Δt can be varied in order to determine the rate of reoxidation by hTrxC35S. The force dependency of the rate of reoxidation can be measured by quenching the force to different values during the folding/reoxidation period Δt. The methods described herein may also reveal a complex force dependency arising from substrate-enzyme interactions during oxidative folding. To study the role played by the binding groove in the reoxidation of the substrate, the C35S mutation will be engineered into E. coli thioredoxin enzymes. E. coli thioredoxin enzymes have a much shallower groove than human Trx and show different mechanisms in their force dependency (
The single molecule assays described herein have the ability to identify and separate the different stages of protein folding (Garcia-Manyes et al., PNAS, 2009. 106(26): p. 10534-10539; Garcia-Manyes et al., PNAS, 2009. 106(26): p. 10540-10545), and can thus be used to determine at what stage of folding a thioredoxin enzyme is capable of oxidizing a substrate. Although the findings described herein show that the human thioredoxin mutant hTrxC35S gains oxidase activity, the methods described herein can also be used to determine whether the C35S mutation can have a similar effect on other members of the thioredoxin family with different groove depths.
The activity of the ancestral enzymes was measured using the conventional insulin assay (
Due to spontaneous precipitation of insulin at pH below 6, DTNB was used as a substrate for disulfide reduction to further verify the ability of the oldest enzymes to work at pH 5 (
Method Summary
Thioredoxin bulk enzymatic measurements. Bulk-solvent oxidoreductase activity for ancestral thioredoxins was determined using the insulin precipitation assay as described (Suarez, M. et al., Biophys Chem 147, 13-9 (2010); Holmgren, A., J Biol Chem 254, 9627-32 (1979); Perez-Jimenez et al., J. Biol. Chem., 283: 27121-27129 (2008)). To further verify the activity of ancestral Trx enzymes at acidic pH, DTNB (5,5′-dithiobis-(2-nitrobenzoic acid)) was used as a substrate at pH 5. In this assay, Trx enzymes were preactivated by incubation with 1 mM DTT. The reaction was initiated by adding active Trx to a final concentration of 4 μM to the cuvette containing 1 mM DTNB in 20 mM sodium acetate buffer, pH 5. The change in absorbance at 412 nm due to the formation of TNB was followed for 1 min. Activity was determined from the slope dΔA412/dt. A control experiment lacking Trx was recorded and subtracted as baseline.
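The activity calculation in the DTNB assay amounts to subtracting the Trx-free control from the sample reading and taking the slope of the difference over time. A minimal sketch follows. It assumes that both readings share the same time grid and that an ordinary least-squares line is an adequate estimate of dΔA412/dt. The function names and the absorbance values are hypothetical.

```python
def least_squares_slope(times_s, values):
    """Slope of the ordinary least-squares line through (time, value) pairs."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_s, values))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx


def dtnb_activity(times_s, a412_sample, a412_control):
    """d(delta A412)/dt after subtracting the Trx-free control point by point."""
    delta = [s - c for s, c in zip(a412_sample, a412_control)]
    return least_squares_slope(times_s, delta)


# Assumed example readings over 1 min, sampled every 5 s.
times_s = [5.0 * i for i in range(13)]
a412_sample = [0.05 + 0.0040 * t for t in times_s]   # with Trx
a412_control = [0.05 + 0.0005 * t for t in times_s]  # without Trx
activity = dtnb_activity(times_s, a412_sample, a412_control)  # absorbance units per s
```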
The crystal structure of the ancestral enzyme thioredoxin AECA is depicted in
This application claims priority to U.S. Provisional Application No. 61/364,640, filed on Jul. 15, 2010, and also claims priority to PCT/US11/44084, filed on Jul. 14, 2011, which are herein incorporated by reference in their entirety.
This invention was made with government support under HL66030 and HL61228 awarded by NIH. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US11/44275 | 7/15/2011 | WO | 00 | 5/29/2013 |
Number | Date | Country |
---|---|---|
61364640 | Jul 2010 | US |