The official copy of the sequence listing is submitted electronically via EFS-Web as an ASCII formatted sequence listing with a file named 389762SEQLIST.TXT, created on Jul. 7, 2010, and having a size of 4.14 kilobytes and is filed concurrently with the specification. The sequence listing contained in this ASCII formatted document is part of the specification and is herein incorporated by reference in its entirety.
The present invention relates to the fields of molecular biology, three-dimensional structural determinations of polypeptides, and their methods of use.
Transgenic crops carrying herbicide resistance genes allow non-selective, broad-range herbicides such as glufosinate and glyphosate to be used as selective herbicides, effectively controlling a broader spectrum of weed species while minimizing injury to the crops (Castle et al. (2006) Curr. Opin. Biotechnol. 17(2):105-112). Glyphosate inhibits 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), an enzyme in the aromatic amino acid biosynthetic pathway that is essential for plants but absent in animals. The transgene present in most glyphosate-tolerant crops codes for a glyphosate-insensitive form of EPSPS from Agrobacterium sp. (Padgette et al. (1996) In S. O. Duke (ed) Herbicide-Resistant Crops: Agricultural, Economic, Environmental, Regulatory, and Technological Aspects, Lewis Publishers: 53-84). An alternative glyphosate resistance strategy was recently reported (Castle et al. (2004) Science 304:1151-1154), in which glyphosate is converted to non-herbicidal N-acetylglyphosate in a reaction catalyzed by glyphosate N-acetyltransferase (GLYAT) optimized from B. licheniformis parental enzymes. In their native form, these enzymes acetylate glyphosate in vitro but are unable to confer tolerance to transgenic organisms. High-efficiency variants exhibiting up to ~5,000-fold enhancement in kcat/KM were obtained through multiple iterations of DNA shuffling.
Compositions and methods are needed that provide a clear understanding of how the tertiary structure of GLYAT variants impacts enzymatic activity. Such methods and compositions can be used to further develop GCN5-related N-acetyltransferases (GNATS) with improved enzymatic or substrate binding activity.
Compositions and methods for evaluating and identifying polypeptides that have an increased affinity or specificity for glyphosate when compared to a native glyphosate N-acetyltransferase (GLYAT) polypeptide are described. Further provided herein are methods for evaluating and identifying polypeptides having greater N-acetyltransferase activity when compared to a native N-acetyltransferase enzyme. Such methods involve the comparison of a three-dimensional molecular structure of region(s) of a GLYAT polypeptide with a three-dimensional molecular structure of a candidate polypeptide to evaluate the potential of the candidate polypeptide to bind to glyphosate with a higher binding affinity or specificity or to have higher activity than native GLYAT proteins. The methods further provide for the modification of the primary structure of the candidate polypeptide to maximize a similarity or relationship between the three-dimensional molecular structures of the GLYAT polypeptide region(s) and the candidate polypeptide in order to identify polypeptides with a higher binding affinity or activity for glyphosate.
Compositions include a computer-readable storage medium comprising the atomic coordinates of GLYAT polypeptide variants bound to glyphosate and acetyl coenzyme A (acetyl coA).
Provided herein is the structure of the optimized R7 or R11 variant of glyphosate N-acetyltransferase (GLYAT) bound to glyphosate and acetyl coA. Table 18 provides the atomic coordinates of GLYAT R7 bound to glyphosate and acetyl coA, whereas Table 19 provides the atomic coordinates of GLYAT R11 bound to glyphosate and acetyl coA. Compositions therefore include a computer readable storage medium as well as an electronic representation of these structures.
Further provided herein are methods for evaluating the potential of a candidate polypeptide to associate with glyphosate with a higher binding affinity and/or higher binding specificity than a native GLYAT. The method comprises providing a three-dimensional molecular structure of a candidate polypeptide and comparing the candidate polypeptide molecular structure to a three-dimensional molecular structure of at least a substrate binding cavity of a GLYAT polypeptide comprising the atomic coordinates provided herein or a variant thereof to determine if the candidate polypeptide comprises the GLYAT substrate binding cavity or variant thereof. In some embodiments of the methods of the invention, the molecular structure of the GLYAT polypeptide further comprises a GNAT wedge joining region. In these embodiments, the candidate polypeptide can be a polypeptide suspected of having, or known to have, N-acetyltransferase activity. The molecular structure of the candidate polypeptide is compared to the GNAT wedge joining region of the GLYAT polypeptide to determine if the candidate polypeptide comprises the wedge joining region, in order to evaluate the potential of the candidate polypeptide to have N-acetyltransferase activity with a higher catalytic rate (kcat), a higher catalytic efficiency (kcat/KM), or both for glyphosate when compared to a native GLYAT polypeptide. The provided molecular structures of the candidate polypeptide and GLYAT polypeptide are determined with the polypeptides bound to glyphosate and an acetyl donor (e.g., acetyl coA).
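As an illustration of the kinetic comparison described above, the following minimal sketch computes catalytic efficiency, conventionally defined as kcat/KM, and the fold change of a candidate relative to a reference enzyme. The class name, function name, units, and all numerical values are hypothetical and are not taken from the specification or from any measured GLYAT data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KineticParams:
    """Michaelis-Menten parameters for one enzyme (hypothetical values)."""
    kcat_per_min: float  # catalytic rate (turnover number), min^-1
    km_mM: float         # Michaelis constant for the substrate, mM

    def efficiency(self) -> float:
        """Catalytic efficiency kcat/KM, in min^-1 mM^-1."""
        if self.km_mM <= 0:
            raise ValueError("KM must be positive")
        return self.kcat_per_min / self.km_mM


def fold_improvement(candidate: KineticParams, reference: KineticParams) -> float:
    """Ratio of candidate kcat/KM to reference kcat/KM."""
    return candidate.efficiency() / reference.efficiency()


# Made-up numbers chosen only to demonstrate the arithmetic.
native = KineticParams(kcat_per_min=4.0, km_mM=2.0)
variant = KineticParams(kcat_per_min=1000.0, km_mM=0.25)

print(f"native  kcat/KM = {native.efficiency():.1f}")
print(f"variant kcat/KM = {variant.efficiency():.1f}")
print(f"fold improvement = {fold_improvement(variant, native):.0f}")
```

A candidate can improve efficiency by raising kcat, lowering KM, or both, which is why the two parameters are kept separate in the sketch rather than stored as a single ratio.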
Described methods involve comparing the three-dimensional molecular structures of a GLYAT polypeptide and a candidate polypeptide to evaluate the substrate binding affinity, specificity or N-acetyl transferase activity of the candidate polypeptide. As used herein, a polypeptide having N-acetyltransferase activity refers to a polypeptide having the ability to catalyze the transfer of an acetyl group from acetyl CoA (AcCoA) or another acetyl donor to an amine (e.g., primary amine, secondary amine). For example, glyphosate N-acetyltransferase (GLYAT) can transfer an acetyl group from acetyl CoA to the nitrogen of glyphosate. As used herein, a GLYAT polypeptide or enzyme comprises a polypeptide which has glyphosate-N-acetyltransferase activity (“GLYAT” activity), i.e., the ability to catalyze the acetylation of glyphosate. In specific embodiments, a polypeptide having glyphosate-N-acetyltransferase activity can transfer the acetyl group from acetyl CoA to the N of glyphosate. Some GLYAT polypeptides are also capable of catalyzing the acetylation of glyphosate analogs and/or glyphosate metabolites, e.g., aminomethylphosphonic acid. Methods to assay for this activity are disclosed, for example, in U.S. Application Publication Nos. 2003/0083480 and 2004/0082770, and U.S. Pat. No. 7,405,074, International Application Publication Nos. WO2005/012515, WO2002/36782, and WO2003/092360, each of which is herein incorporated by reference in its entirety.
The term “GLYAT polypeptide” can refer to native GLYAT polypeptides as well as variants thereof. As used herein, a “native” GLYAT polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively, that encodes or comprises a polypeptide having GLYAT activity. It should be noted, however, that the term “native GLYAT polypeptide” can be used to refer to GLYAT sequences found in nature that have been expressed recombinantly or used in other molecular biological methods. Non-limiting examples of native GLYAT polypeptides include GLYAT polypeptides from Bacillus licheniformis, including the 401, B6, and DS3 polypeptides that are encoded by the genes found in GenBank under the accession numbers AX543338, AX543339, and AX543340, respectively (Castle et al. (2004) Science 304:1151-1154, which is herein incorporated by reference in its entirety). Non-limiting variants of GLYAT polypeptides are set forth in U.S. Application Publication No. 2004/0082770 and U.S. Application Publication No. 2005/0246798, both of which are herein incorporated by reference in their entirety.
In embodiments, a recombinant GNAT polypeptide is described having an array of amino acid side chains which together comprise a glyphosate acetyltransferase active site, said active site being composed of: (i) at least the atomic coordinates of Table 1 or Table 2; or (ii) a structural variant of the substrate binding cavity of part (i), wherein said structural variant comprises a root mean square deviation from the back-bone atoms of the amino acids of Table 1 or Table 2 of not more than 2 Å, wherein said GNAT polypeptide has less than about 60% sequence identity to the native GLYAT as set forth in SEQ ID NO: 3. In embodiments, the recombinant GNAT polypeptide has less than about 55%, 50%, 45%, 40%, 35%, 30%, 25% or 20% sequence identity to SEQ ID NO: 3.
In embodiments, a recombinant GNAT polypeptide is described having an array of amino acid side chains which together comprise a glyphosate acetyltransferase active site, said active site being composed of: (i) at least the atomic coordinates of Table 7 or Table 8; or (ii) a structural variant of the GNAT wedge joining region of part (i), wherein said structural variant comprises a root mean square deviation from the back-bone atoms of the amino acids of Table 7 or Table 8 of not more than 2 Å, wherein said GNAT polypeptide has less than about 60% sequence identity to the native GLYAT as set forth in SEQ ID NO: 3. In embodiments, the recombinant GNAT polypeptide has less than about 55%, 50%, 45%, 40%, 35%, 30%, 25% or 20% sequence identity to SEQ ID NO: 3.
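The 2 Å backbone root mean square deviation (RMSD) criterion recited above can be illustrated with a short sketch. It assumes that the two sets of backbone atoms have already been paired one-to-one and optimally superposed by a separate structural alignment step, which is not shown; the function names and the toy coordinates are hypothetical and do not come from the Tables.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in angstroms


def backbone_rmsd(reference: Sequence[Coord], candidate: Sequence[Coord]) -> float:
    """RMSD over paired atoms; assumes the structures are already superposed."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, candidate):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))


def is_structural_variant(reference: Sequence[Coord],
                          candidate: Sequence[Coord],
                          cutoff: float = 2.0) -> bool:
    """True if the backbone RMSD is not more than the cutoff (default 2 angstroms)."""
    return backbone_rmsd(reference, candidate) <= cutoff


# Toy coordinates for demonstration only.
ref_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # every atom displaced by 1 angstrom
far_atoms = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]    # every atom displaced by 3 angstroms

print(backbone_rmsd(ref_atoms, near_atoms), is_structural_variant(ref_atoms, near_atoms))
print(backbone_rmsd(ref_atoms, far_atoms), is_structural_variant(ref_atoms, far_atoms))
```

Because RMSD depends on how the structures are superposed, the value returned here is only meaningful after the superposition step that this sketch takes as given.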
The active sites described herein can be combined with any polypeptide scaffold. Thus, a de novo polypeptide or protein can be designed having the active site described herein.
The methods of the invention also encompass the use of three-dimensional molecular structures of fragments and variants of GLYAT and candidate polypeptides. By “fragment” is intended a portion of the polynucleotide or a portion of the amino acid sequence and hence polypeptide encoded thereby. In general, three-dimensional molecular structures of polypeptides are determined with the entire polypeptide sequence because tertiary structures of the polypeptide can comprise interactions between amino acid residues that are distantly located within the primary structure of the polypeptide. In some embodiments, however, a molecular structure of a fragment of a polypeptide (candidate polypeptide or GLYAT polypeptide) is provided. Fragments of a polynucleotide may encode biologically active portions of GLYAT polypeptides. A biologically active fragment of a GLYAT polypeptide is one that retains glyphosate N-acetyltransferase activity or retains the ability to bind to glyphosate, acetyl CoA, or both.
A fragment of a GLYAT polynucleotide that encodes a biologically active portion of a GLYAT polypeptide will encode at least 15, 25, 30, 50, 100, 150, 200, or 250 contiguous amino acids, or up to the total number of amino acids present in a full-length GLYAT polypeptide. A biologically active portion of a GLYAT polypeptide can be prepared by isolating a portion of one of the native or variant GLYAT polynucleotides, expressing the encoded portion of the GLYAT polypeptide (e.g., by recombinant expression in vitro), and assessing the activity of the encoded portion of the GLYAT. Polynucleotides that are fragments of a GLYAT nucleotide sequence comprise at least 15, 20, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, or 1,400 contiguous nucleotides, or up to the number of nucleotides present in a full-length GLYAT polynucleotide.
Molecular structures of variant GLYAT polypeptides are provided. As used herein, a variant GLYAT polypeptide is a polypeptide having GLYAT activity that is not found in nature without human intervention. A variant can be encoded by a variant polynucleotide that comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native GLYAT polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide. For polynucleotides, conservative variants include those sequences that, because of the degeneracy of the genetic code, encode the amino acid sequence of one of the native GLYAT polypeptides. Variant polynucleotides include synthetically derived polynucleotides, such as those generated, for example, by using site-directed mutagenesis but which still encode a polypeptide having GLYAT activity. Generally, variants of a particular polynucleotide will have at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that particular polynucleotide as determined by sequence alignment programs and parameters described elsewhere herein. The mutations that will be made in the polynucleotide encoding the variant must not place the sequence out of reading frame and optimally will not create complementary regions that could produce secondary mRNA structure.
Variants of a particular native GLYAT polynucleotide (i.e., the reference polynucleotide) can also be evaluated by comparison of the percent sequence identity between the polypeptide encoded by a variant polynucleotide and the polypeptide encoded by the reference polynucleotide. Percent sequence identity between any two polypeptides can be calculated using sequence alignment programs and parameters described elsewhere herein. Where any given pair of polynucleotides is evaluated by comparison of the percent sequence identity shared by the two polypeptides they encode, the percent sequence identity between the two encoded polypeptides is at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more.
“Variant” protein is intended to mean a protein derived from the reference protein (i.e., native GLYAT polypeptide) by deletion or addition of one or more amino acids at one or more internal sites in the reference protein and/or substitution of one or more amino acids at one or more sites in the reference protein. Variant proteins encompassed by the present invention are biologically active, that is they continue to possess the desired biological activity of the reference protein, that is, glyphosate N-acetyl transferase activity or the ability to bind to glyphosate and/or acetyl coA as described herein. Biologically active variants of a GLYAT protein of the invention will have at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the amino acid sequence for the native protein as determined by sequence alignment programs and parameters described elsewhere herein. A biologically active variant of a protein may differ from that protein by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue.
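By way of illustration only, percent sequence identity between two polypeptides that have already been aligned can be computed as sketched below. The alignment step itself, and the programs and parameters used for it, are outside this sketch. The convention used here (identical positions divided by the number of alignment columns that are not gaps in both sequences) is an assumption and is only one of several conventions in use; the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical residues / columns that are not gap-gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # ignore columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1  # a == b and not both gaps, so both are residues
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


# Made-up aligned fragments, not GLYAT sequences.
print(percent_identity("ACDE", "ACDF"))      # 3 of 4 columns identical
print(percent_identity("MIEVK-", "MIDVKP"))  # 4 of 6 columns identical
```

A threshold such as "at least about 40%" or "less than about 60%" can then be applied directly to the returned value.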
The proteins may be altered in various ways including amino acid substitutions, deletions, truncations, and insertions. Methods for such manipulations are generally known in the art. For example, amino acid sequence variants and fragments of the GLYAT proteins can be prepared by mutations in the DNA. Methods for mutagenesis and polynucleotide alterations are well known in the art. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Pat. No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the protein of interest may be found in the model of Dayhoff et al. (1978) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, D.C.), herein incorporated by reference. Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be optimal.
The deletions, insertions, and substitutions of the protein sequence encompassed herein are not expected to produce radical negative changes in the characteristics of the protein. However, where the effect of a substitution, deletion, or insertion is difficult to predict in advance, one skilled in the art will appreciate that the effect may be evaluated by routine screening assays. Assays for measuring the acetylation of glyphosate are disclosed, for example, in U.S. Application Publication Nos. 2003/0083480 and 2004/0082770, U.S. Pat. No. 7,405,074, and International Application Publication Nos. WO2005/012515 and WO2002/36782, each of which is herein incorporated by reference in its entirety.
Variant polynucleotides and proteins also encompass sequences and proteins derived from a mutagenic and recombinogenic procedure such as DNA shuffling. With such a procedure, one or more different GLYAT coding sequences can be manipulated to create a new GLYAT possessing the desired properties (having GLYAT activity). In this manner, libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo. For example, using this approach, sequence motifs encoding a domain of interest may be shuffled between a first GLYAT gene and other known GLYAT genes to obtain a new gene coding for a protein with an improved property of interest, such as a decreased KM. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751; Stemmer (1994) Nature 370:389-391; Crameri et al. (1997) Nature Biotech. 15:436-438; Moore et al. (1997) J. Mol. Biol. 272:336-347; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509; Crameri et al. (1998) Nature 391:288-291; and U.S. Pat. Nos. 5,605,793 and 5,837,458.
Such gene shuffling procedures were used to identify optimized variants of GLYAT polypeptides with enhanced binding, specificity, or catalytic activities (Castle et al. (2004) Science 304:1151-1154). These optimized GLYAT polypeptides and the polynucleotides encoding them are known in the art and particularly disclosed, for example, in U.S. Application Publication Nos. 2003/0083480, 2004/0082770, and 2008/0234130 and U.S. Pat. No. 7,405,074, each of which is herein incorporated by reference in its entirety.
The GLYAT polypeptide used to generate the atomic coordinates provided herein is a GLYAT R7 variant resulting from seven rounds of DNA shuffling of a native GLYAT polypeptide (Keenan et al. (2005) Proc Natl Acad Sci USA 102:8887-8892, which is herein incorporated by reference in its entirety) for which a crystal structure was determined (Siehl et al. (2007) J Biol Chem 282:11446-11455; Protein Databank (PDB):2JDC; PDB:2JDD; each of which is herein incorporated by reference in its entirety). In some embodiments, the R7 GLYAT variant polypeptide comprises the sequence set forth in SEQ ID NO: 1. The R7 GLYAT variant exhibits an improved catalytic efficiency for glyphosate in comparison to native GLYAT polypeptides (Siehl et al. (2007) J Biol Chem 282:11446-11455, which is herein incorporated by reference in its entirety). Thus, in some embodiments, the GLYAT polypeptide for which a molecular structure is provided for comparison to the structure of a candidate polypeptide has the sequence set forth in SEQ ID NO: 1. In other embodiments, the molecular structure represents an R11 GLYAT variant from the eleventh round of DNA shuffling (Keenan et al. (2005) Proc Natl Acad Sci USA 102:8887-8892) referred to by Siehl et al. (2007) J Biol Chem 282:11446-11455. In some embodiments, the R11 GLYAT variant polypeptide has the sequence set forth in SEQ ID NO: 2.
Described methods are used to evaluate candidate polypeptides to determine if the polypeptides bind glyphosate with a higher binding affinity or greater specificity or if they exhibit greater catalytic activity than a native GLYAT polypeptide. As used herein, a “candidate polypeptide” refers to polypeptides that are being evaluated in the methods of the invention. The candidate polypeptide can be a naturally-occurring polypeptide or one that is not found in nature. Naturally-occurring candidate polypeptides may be from any organism, including but not limited to, a bacterium, fungus, animal, or human. The non-naturally occurring candidate polypeptide may have resulted from the mutagenesis or gene shuffling of a naturally-occurring sequence and may have been produced through recombinant or synthetic means.
In some embodiments, the candidate polypeptide has been shown to exhibit N-acetyltransferase activity or has sequence similarity to an N-acetyltransferase enzyme known in the art. Several families of N-acetyltransferase polypeptides are known. Such families include the GCN5 family, the p300/CBP family, the TAF250 family, the SRC family, the MOZ family, and the N-terminal acetyltransferases (NAT) family. See, for example, Kouzarides et al. (2002) The EMBO J. 19:1176-1179; Kouzarides (1999) Current Opinions in Genetics Development 79:40-48; and Polevoda et al. (2003) J. Mol. Biol. 325:595-622, each of which is herein incorporated by reference in its entirety. Another family of N-acetyltransferases includes the GCN5-related N-acetyltransferases (GNAT). See INTERPRO Acc. No. IPR000182, PFAM Accession No. PF00583, and Prosite profile PS51186. The GNAT superfamily includes aminoglycoside N-acetyltransferases, serotonin N-acetyltransferase (also known as aryl alkylamine N-acetyltransferase or AANAT), phosphinothricin acetyltransferase (PAT), glucosamine-6-phosphate N-acetyltransferase, glyphosate-N-acetyltransferase, the histone acetyltransferases, mycothiol synthase, protein N-myristoyltransferase, and the Fem family of amino acyl transferases (see Dyda et al. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:81-103, which is herein incorporated by reference in its entirety).
In some of these embodiments, the candidate polypeptide shares at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or greater sequence identity with a known N-acetyltransferase enzyme over the full-length of the polypeptide or with a fragment of the polypeptide. The candidate polypeptide and known N-acetyltransferase enzyme may share sequence similarity over at least about 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000 or more contiguous amino acids. The candidate polypeptide or the N-acetyltransferase with which a candidate polypeptide shares sequence identity may be a known member of the GCN5-related N-acetyltransferase (GNAT) superfamily of enzymes. In some embodiments, the three-dimensional molecular structure of the candidate polypeptide comprises a GNAT wedge. As used herein, a GNAT wedge comprises a V-shaped wedge formed by two central parallel beta strands splaying apart at the middle point (see β4 and β5 in
In some embodiments, the candidate polypeptide exhibits a similar primary structure to a native or variant GLYAT polypeptide. For example, the candidate polypeptide may share at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or greater sequence identity with a native GLYAT polypeptide or an optimized variant GLYAT polypeptide.
In some embodiments, the candidate polypeptide exhibits a similar primary structure to a native or variant phosphinothricin acetyltransferase (PAT) polypeptide, another enzyme capable of herbicide detoxification (De Block et al. (1987) EMBO J. 6:2513-2518). PAT polypeptides acetylate and detoxify phosphinothricin herbicides, such as glufosinate. Interestingly, GLYAT and PAT not only carry out the same acetylation reaction, but also share similar three-dimensional structures. Despite sequence divergence, the structural alignment between GLYAT PDB:2bsw (Keenan et al. (2005) Proc. Natl. Acad. Sci. USA 102(25):8887-8892) and PAT PDB:1yr0 (Berman et al. (2000) Nucleic Acids Research 28:235-242) shows that the two structures possess the same fold, with a Dali Z-score of 14.7 and an RMSD of 2.2 Å (Holm & Sander (1996) Science 273(5275):595-603). Furthermore, glyphosate and glufosinate are similar in their chemical composition and structure.
Three-dimensional molecular structures of a GLYAT polypeptide and a candidate polypeptide are described herein. As used herein, the term “molecular structure” refers to the arrangement of atoms within a particular object (e.g., polypeptide). Polypeptides can comprise a primary, a secondary, and a tertiary molecular structure. A primary structure of a polypeptide consists of the linear arrangement of its amino acid residues, which is described by the amino acid sequence of the polypeptide. The secondary structure of a polypeptide consists of local inter-residue interactions by hydrogen bonds between backbone amide and carbonyl groups. The most common secondary structures are alpha helices and beta sheets. The tertiary structure represents the folding of the polypeptide chain, combining the elements of secondary structure, linked by turns and loops imparted by non-bond interactions and disulfide bonds. A three-dimensional molecular structure refers to the three-dimensional arrangement of atoms within a particular object (e.g., the three-dimensional structure of the atoms that comprise a polypeptide, and, optionally, the atoms that comprise a substrate that interacts with the polypeptide). In reference to a polypeptide, a three-dimensional molecular structure is a representation of the tertiary structure of the polypeptide.
As used herein, a “beta-sheet” refers to two or more polypeptide chains (or beta-strands) that run alongside each other and are linked in a regular manner by hydrogen bonds between the main chain C═O and N—H groups. Therefore all hydrogen bonds in a beta-sheet are between different segments of a polypeptide. Hydrogen bonds in anti-parallel sheets are perpendicular to the chain direction and spaced evenly as pairs between strands. Hydrogen bonds in parallel sheets are slanted with respect to the chain direction and spaced evenly between strands.
As used herein, an “alpha helix” refers to the most abundant helical conformation found in globular proteins and the term is used in accordance with the standard meaning of the art. In an alpha helix, all amide protons point toward the N-terminus and all carbonyl oxygens point toward the C-terminus. Hydrogen bonds within an alpha helix also display a repeating pattern in which the backbone C═O of residue X (wherein X refers to any amino acid) hydrogen bonds to the backbone H—N of residue X+4. The alpha helix is a coiled structure characterized by 3.6 residues per turn and a translation along its axis of 1.5 Å per amino acid. Thus the pitch is 3.6×1.5 or 5.4 Å. The screw sense of alpha helices is always right-handed.
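The helix geometry stated above lends itself to a small worked example. The sketch below reproduces the pitch arithmetic (residues per turn multiplied by rise per residue) and enumerates the backbone hydrogen-bond pairing between residue X and residue X+4 for a helix of a given length. The function names and the 1-based residue numbering are illustrative assumptions.

```python
from typing import List, Tuple

RESIDUES_PER_TURN = 3.6      # from the definition above
RISE_PER_RESIDUE_A = 1.5     # angstroms along the helix axis per residue


def helix_pitch(residues_per_turn: float = RESIDUES_PER_TURN,
                rise_per_residue: float = RISE_PER_RESIDUE_A) -> float:
    """Axial advance per full turn, in angstroms."""
    return residues_per_turn * rise_per_residue


def helix_length(n_residues: int,
                 rise_per_residue: float = RISE_PER_RESIDUE_A) -> float:
    """Approximate axial length of an ideal helix of n residues, in angstroms."""
    return n_residues * rise_per_residue


def backbone_hbond_pairs(n_residues: int) -> List[Tuple[int, int]]:
    """(X, X+4) pairs: C=O of residue X bonded to H-N of residue X+4 (1-based)."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(f"pitch = {helix_pitch():.1f} angstroms")
print(f"length of a 10-residue helix = {helix_length(10):.1f} angstroms")
print(backbone_hbond_pairs(6))
```

For a six-residue segment the pairing yields residues 1 to 5 and 2 to 6; segments shorter than five residues produce no X to X+4 pairs under this idealized pattern.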
As used herein, a “loop” refers to any other conformation of amino acids (i.e. not a helix, strand or sheet). Additionally, a loop may contain hydrogen bond interactions between amino acids, including the side chains of the amino acids, but not in a repetitive, regular fashion.
A three-dimensional molecular structure of a polypeptide or a fragment thereof is most often provided through a solved structure based on X-ray diffraction data from a crystal of the polypeptide. One of skill in the art will also appreciate that, along with X-ray crystallography, three-dimensional molecular structures can also be generated using nuclear magnetic resonance (NMR) spectroscopy. Although NMR spectroscopy advantageously allows for the structure of a particular polypeptide to be determined in solution, the utility of NMR for structure determination is limited to very small proteins. Methods for structure determination using NMR can be found, for example, in Wüthrich (1986) NMR of proteins and nucleic acids, Wiley, New York; Wüthrich (1990) J Biol Chem 265:22059-22062; and Cavanagh et al. (1996) Protein NMR Spectroscopy, Academic Press, San Diego, each of which is herein incorporated by reference in its entirety.
In some embodiments, the three-dimensional molecular structures of a GLYAT polypeptide, a candidate polypeptide, or both are determined using X-ray crystallography, wherein the polypeptides are purified, crystallized, and exposed to an X-ray beam to generate diffraction data from which a three-dimensional molecular structure can be determined.
As used herein, the term “crystal” refers to any three-dimensional ordered array of molecules that diffracts X-rays. In order to generate crystals of a polypeptide or for structure determination via NMR spectroscopy, the polypeptide must be purified and concentrated. The polypeptide can be naturally or synthetically derived or produced by recombinant means. For example, a bacterial host, such as E. coli, can be used to express large quantities of the GLYAT or candidate polypeptide. The polypeptide can be purified by methods known in the art, including, but not limited to, selective precipitation, dialysis, chromatography, and/or electrophoresis. In some embodiments, the GLYAT polypeptide is purified using CoA-agarose affinity chromatography and gel filtration. Purification may be monitored by SDS-PAGE or by measuring the ability of a fraction to perform the catalytic activity. Any standard method of measuring acetyltransferase activity may be used.
For certain embodiments, it may be desirable to express the polypeptide as a fusion protein. In specific non-limiting embodiments, the fusion protein comprises a tag which facilitates purification of the GLYAT or candidate polypeptide. As referred to herein, a “tag” is any added series of amino acids which is provided in a protein at either the C-terminus, the N-terminus, or internally and which contributes to the identification or purification of the protein. Suitable tags include, but are not limited to, tags known to those skilled in the art to be useful in purification, such as a His tag, flag tag, glutathione-s-transferase, and maltose binding protein. Such tagged proteins may also be engineered to comprise a cleavage site, such as a thrombin, enterokinase or factor X cleavage site, for ease of removal of the tag before, during or after purification. Vector systems which provide a tag and a cleavage site for removal of the tag are particularly useful to make expression constructs for expression and purification of the polypeptide. A tagged polypeptide may be purified by immuno-affinity or conventional chromatography, including but not limited to, chromatography employing the following: glutathione-sepharose (Amersham-Pharmacia, Piscataway, N.J.) or an equivalent resin, nickel or cobalt-purification resins, nickel-agarose resin, anion exchange chromatography, cation exchange chromatography, hydrophobic resins, gel filtration, antibody-conjugated resin, and reverse phase chromatography. In some embodiments, after purification, at least about 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or greater of total protein is the GLYAT or candidate polypeptide or a mixture of the polypeptide and one or more substrates or modulators thereof (e.g., glyphosate, acetyl coA). The polypeptide or complexed polypeptide may be concentrated to achieve a concentration equal to or greater than about 1 mg/ml for crystallization purposes, including but not limited to about 1 mg/ml, 2 mg/ml, 3 mg/ml, 4 mg/ml, 5 mg/ml, 6 mg/ml, 7 mg/ml, 8 mg/ml, 9 mg/ml, 10 mg/ml, 15 mg/ml, 20 mg/ml, 25 mg/ml, or greater. In one embodiment, the concentration is greater than about 5 mg/ml. In some embodiments, the concentration is about 10 mg/ml.
Crystals can be grown from an aqueous solution containing the purified and concentrated GLYAT or candidate polypeptide by a variety of techniques. These techniques include batch, liquid bridge, dialysis, vapor diffusion, and hanging drop methods (McPherson (1982) John Wiley, New York; McPherson (1990) Eur. J. Biochem. 189:1-23; Webber (1991) Adv. Protein Chem. 41:1-36, each of which is herein incorporated by reference in its entirety). Seeding of the crystals in some instances may be required to obtain X-ray quality crystals. Standard micro and/or macro seeding of crystals may therefore be used. In general, crystals are grown by adding precipitants to the concentrated solution of the polypeptide. The precipitants are added at a concentration just below that necessary to precipitate the protein. Water is removed by controlled evaporation to produce precipitating conditions, which are maintained until crystal growth ceases.
In some embodiments, the GLYAT or candidate polypeptide is crystallized via hanging drop vapor diffusion against a crystallization solution. In some embodiments, the crystallization solution comprises sodium acetate, ammonium sulfate, and polyethylene glycol. In some of these embodiments, the concentration of sodium acetate within the crystallization solution ranges from about 50 mM to about 200 mM, including but not limited to about 50 mM, 60 mM, 70 mM, 80 mM, 90 mM, 100 mM, 125 mM, 150 mM, 175 mM, and 200 mM. In these embodiments, the pH of the sodium acetate can range from about 3.5 to about 6.0, including but not limited to about 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, and 6.0. In particular embodiments, the crystallization solution comprises 100 mM sodium acetate at a pH of about 4.6. In certain embodiments, the concentration of ammonium sulfate within the crystallization solution ranges from about 150 mM to about 300 mM, including but not limited to, about 150 mM, 175 mM, 200 mM, 225 mM, 250 mM, 275 mM, and 300 mM. In some embodiments, the crystallization solution comprises PEG4000 at a concentration ranging from about 15% to about 40%, including but not limited to about 15%, 20%, 25%, 30%, 35%, and 40%. In certain embodiments, the concentration of PEG4000 in the crystallization solution ranges from about 20% to about 25%. In particular embodiments, the crystallization solution comprises about 100 mM sodium acetate at a pH of about 4.6, 150 mM to about 300 mM ammonium sulfate, and about 20% to about 25% PEG4000.
To collect diffraction data from the crystals of the GLYAT polypeptide or candidate polypeptide, the crystals may be flash-frozen in the crystallization solution employed for the growth of said crystals. In some embodiments, the crystals are flash-frozen in a buffer wherein the precipitant concentration is higher than in the crystallization buffer. If the precipitant is not a sufficient cryoprotectant (i.e., a glass is not formed upon flash-freezing), cryoprotectants (e.g., glycerol, ethylene glycol, low molecular weight PEGs, alcohols, etc.) may be added to the solution in order to achieve glass formation upon flash-freezing, provided that the cryoprotectant is compatible with preserving the integrity of the crystals. In some embodiments, the cryoprotectant solution comprises sodium acetate, glycerol, and polyethylene glycol. In some of these embodiments, the concentration of sodium acetate within the cryoprotectant solution ranges from about 50 mM to about 200 mM, including but not limited to about 50 mM, 60 mM, 70 mM, 80 mM, 90 mM, 100 mM, 125 mM, 150 mM, 175 mM, and 200 mM. In these embodiments, the pH of the sodium acetate can range from about 3.5 to about 6.0, including but not limited to about 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, and 6.0. In particular embodiments, the cryoprotectant solution comprises about 100 mM sodium acetate at a pH of about 4.6. In some embodiments, the cryoprotectant solution comprises PEG4000 at a concentration ranging from about 15% to about 40%, including but not limited to about 15%, 20%, 25%, 30%, 35%, and 40%. In certain embodiments, the concentration of PEG4000 in the cryoprotectant solution is about 20%. The cryoprotectant solution can comprise glycerol at a concentration ranging from about 10% to about 30%, including but not limited to about 10%, 15%, 20%, 25%, and 30%.
In particular embodiments, the cryoprotectant solution comprises about 100 mM sodium acetate at a pH of about 4.6, about 20% PEG4000, and about 20% glycerol.
In those embodiments wherein a molecular structure of the GLYAT or candidate polypeptide in complex with substrate(s) is desired, the substrate(s) can be added to the crystallization solution and the cryoprotectant solution. One of skill in the art will appreciate that the substrate(s) should be included at a concentration that is at, near or above the concentration required for saturation of the substrate binding site of the enzyme. As used herein, a “substrate” refers to a molecule that is capable of binding to the enzyme and being acted upon by the enzyme. The term substrate comprises metabolites, cofactors, coenzymes, and prosthetic groups (e.g., heme) that are required for enzymatic catalysis. Thus, in some embodiments, acetyl CoA is added to the crystallization and cryoprotectant solution. In some of these embodiments, the concentration of acetyl CoA in the crystallization and cryoprotectant solution ranges from about 0.1 mM to about 20 mM, including but not limited to about 0.1 mM, 0.2 mM, 0.3 mM, 0.4 mM, 0.5 mM, 1 mM, 2 mM, 3 mM, 4 mM, 5 mM, 6 mM, 7 mM, 8 mM, 9 mM, 10 mM, 11 mM, 12 mM, 13 mM, 14 mM, 15 mM, 16 mM, 17 mM, 18 mM, 19 mM, or 20 mM. In certain embodiments, the concentration of acetyl CoA in the crystallization and cryoprotectant solutions is about 2 mM.
In some embodiments, glyphosate is added to the crystallization and cryoprotectant solution. In some of these embodiments, the concentration of glyphosate in the crystallization and cryoprotectant solution ranges from about 2 mM to about 50 mM, including, but not limited to about 2 mM, 5 mM, 10 mM, 15 mM, 20 mM, 25 mM, 30 mM, 35 mM, 40 mM, 45 mM and 50 mM. In certain embodiments, the concentration of glyphosate in the crystallization and cryoprotectant solution is about 20 mM.
In particular embodiments, both glyphosate and acetyl CoA are added to the crystallization and cryoprotectant solutions and the three-dimensional molecular structures of the GLYAT polypeptide and candidate polypeptide are determined in complex with both glyphosate and acetyl CoA. In some of these embodiments, the concentration of glyphosate is about 20 mM and the concentration of acetyl CoA is about 2 mM in the crystallization and cryoprotectant solutions.
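By way of non-limiting illustration, the volume of a concentrated substrate stock needed to reach the stated final concentrations can be computed from the dilution relation C1·V1 = C2·V2. The following is a minimal sketch; the stock concentrations (200 mM glyphosate, 50 mM acetyl CoA), the 1000 µl final volume, and the function name are hypothetical and are not taken from the present disclosure.

```python
def stock_volume_ul(final_conc_mm, stock_conc_mm, final_volume_ul):
    """Volume of stock (in microliters) giving final_conc_mm in final_volume_ul.

    Uses C1 * V1 = C2 * V2; all concentrations are in mM.
    """
    if stock_conc_mm <= 0 or final_volume_ul <= 0:
        raise ValueError("stock concentration and final volume must be positive")
    if final_conc_mm > stock_conc_mm:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc_mm * final_volume_ul / stock_conc_mm


# Hypothetical stocks: 200 mM glyphosate and 50 mM acetyl CoA, 1000 ul final volume.
glyphosate_ul = stock_volume_ul(20.0, 200.0, 1000.0)   # about 20 mM final
acetyl_coa_ul = stock_volume_ul(2.0, 50.0, 1000.0)     # about 2 mM final
print(glyphosate_ul, acetyl_coa_ul)
```

The remaining volume would be made up with the other components of the crystallization or cryoprotectant solution.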
As used herein, the term “glyphosate” refers to the molecule whose chemical structure is depicted in
The flash-frozen crystals are maintained at a temperature of less than about −110° C. in some embodiments and in other embodiments, less than about −150° C. during the collection of the crystallographic data by X-ray diffraction. The diffraction data is generally obtained by placing a crystal in an X-ray beam. The incident X-rays interact with the electron cloud of the molecules that make up the crystal, resulting in X-ray scatter. The combination of X-ray scatter with the lattice of the crystal gives rise to non-uniformity of the scatter; areas of high intensity are called diffracted X-rays. The angle at which diffracted beams emerge from the crystal can be computed by treating diffraction as if it were reflection from sets of equivalent, parallel planes of atoms in a crystal (Bragg's Law). The most obvious sets of planes in a crystal lattice are those that are parallel to the faces of the unit cell. These and other sets of planes can be drawn through the lattice points. Each set of planes is identified by three indices, hkl. The h index gives the number of parts into which the a edge of the unit cell is cut, the k index gives the number of parts into which the b edge of the unit cell is cut, and the l index gives the number of parts into which the c edge of the unit cell is cut by the set of hkl planes.
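By way of illustration only, Bragg's Law (nλ = 2d sin θ) can be evaluated numerically as sketched below. The interplanar spacing formula shown applies only to a unit cell whose angles are all 90°, which is an assumption made here for brevity; the function names, the 1.6 Å spacing, and the Cu Kα wavelength of about 1.5418 Å are illustrative choices rather than parameters of any particular embodiment.

```python
import math


def d_spacing_orthogonal(h, k, l, a, b, c):
    """Interplanar spacing (same units as a, b, c) for the hkl planes.

    Valid only for a cell with all angles equal to 90 degrees (an assumption).
    """
    if h == 0 and k == 0 and l == 0:
        raise ValueError("hkl = 000 does not define a set of planes")
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)


def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Angle theta in degrees satisfying order * wavelength = 2 * d * sin(theta)."""
    sine = order * wavelength / (2.0 * d_spacing)
    if not 0.0 < sine <= 1.0:
        raise ValueError("no diffraction is possible for this spacing and wavelength")
    return math.degrees(math.asin(sine))


CU_K_ALPHA = 1.5418  # angstroms; assumed laboratory source wavelength
theta = bragg_angle_deg(1.6, CU_K_ALPHA)
print(f"theta for 1.6 A planes: {theta:.2f} degrees")
```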
When a detector is placed in the path of the diffracted X-rays, in effect cutting into the sphere of diffraction, a series of spots, or reflections, are recorded to produce a “still” diffraction pattern. Each reflection is the result of X-rays reflecting off one set of parallel planes, and is characterized by an intensity, which is related to the distribution of molecules in the unit cell, and hkl indices, which correspond to the parallel planes from which the beam producing that spot was reflected. If the crystal is rotated about an axis perpendicular to the X-ray beam, a large number of reflections are recorded on the detector, resulting in a diffraction pattern.
Sources of X-rays include, but are not limited to, a rotating anode X-ray generator such as a Rigaku RU-200 or a beamline at a synchrotron light source. Suitable detectors for recording diffraction patterns include, but are not limited to, X-ray sensitive film, multiwire area detectors, image plates coated with phosphor, and CCD cameras. Typically, the detector and the X-ray beam remain stationary, so that, in order to record diffraction from different parts of the crystal's sphere of diffraction, the crystal itself is moved via an automated system of moveable circles called a goniostat.
The unit cell dimensions and space group of a crystal can be determined from its diffraction pattern. The “unit cell” is the crystal's repeating unit. The spacing of reflections is inversely proportional to the lengths of the edges of the unit cell. Therefore, if a diffraction pattern is recorded when the X-ray beam is perpendicular to a face of the unit cell, two of the unit cell dimensions may be deduced from the spacing of the reflections in the x and y directions of the detector, the crystal-to-detector distance, and the wavelength of the X-rays. Those of skill in the art will appreciate that, in order to obtain all three unit cell dimensions, the crystal must be rotated such that the X-ray beam is perpendicular to another face of the unit cell. Second, the angles of a unit cell can be determined by the angles between lines of spots on the diffraction pattern. Third, the absence of certain reflections and the repetitive nature of the diffraction pattern, which may be evident by visual inspection, indicate the internal symmetry, or space group, of the crystal. Therefore, a crystal may be characterized by its unit cell and space group, as well as by its diffraction pattern.
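As a non-limiting sketch of how a spacing can be deduced from spot position, crystal-to-detector distance, and wavelength, the following assumes a flat detector perpendicular to the incident beam, so that a spot at radial distance r from the direct-beam position corresponds to a scattering angle 2θ = arctan(r/D). The function name and the numerical values are hypothetical.

```python
import math


def d_spacing_from_spot(radius_mm, distance_mm, wavelength):
    """Spacing d (in the units of wavelength) for a spot on a flat detector.

    radius_mm   : distance of the spot from the direct-beam position
    distance_mm : crystal-to-detector distance
    Assumes the detector is flat and perpendicular to the incident beam.
    """
    if radius_mm <= 0 or distance_mm <= 0:
        raise ValueError("radius and distance must be positive")
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


# Hypothetical geometry: spot 90 mm from the beam center, detector at 150 mm.
print(round(d_spacing_from_spot(90.0, 150.0, 1.0), 3))
```

Because reflection spacing is inversely proportional to cell edge length, closely spaced spots indicate a long cell edge, and spots farther from the beam center correspond to smaller spacings.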
Once the dimensions of the unit cell are determined, the likely number of polypeptides in the asymmetric unit can be deduced from the size of the polypeptide, the density of the average protein, and the typical solvent content of a protein crystal, which is usually in the range of 30-70% of the unit cell volume.
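One conventional way to make this estimate, offered here as an illustrative sketch and not as the method of any particular embodiment, uses the Matthews coefficient Vm (unit cell volume divided by the total protein mass in the cell) together with the approximation that the solvent fraction is about 1 − 1.23/Vm. The cell volume, number of symmetry operations, and molecular weight below are hypothetical values.

```python
def matthews_coefficient(cell_volume_a3, n_sym_ops, n_copies, mol_weight_da):
    """Vm in cubic angstroms per dalton."""
    return cell_volume_a3 / (n_sym_ops * n_copies * mol_weight_da)


def solvent_fraction(vm):
    """Approximate solvent fraction, assuming an average protein density."""
    return 1.0 - 1.23 / vm


def plausible_copies(cell_volume_a3, n_sym_ops, mol_weight_da,
                     low=0.30, high=0.70, max_copies=12):
    """Copies per asymmetric unit whose solvent fraction falls within [low, high]."""
    results = []
    for n in range(1, max_copies + 1):
        vm = matthews_coefficient(cell_volume_a3, n_sym_ops, n, mol_weight_da)
        fraction = solvent_fraction(vm)
        if low <= fraction <= high:
            results.append((n, round(fraction, 3)))
    return results


# Hypothetical crystal: 200,000 cubic angstrom cell, 4 symmetry operations, 17 kDa protein.
print(plausible_copies(200000.0, 4, 17000.0))
```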
The sphere of diffraction has symmetry that depends on the internal symmetry of the crystal, which means that certain orientations of the crystal will produce the same set of reflections. Thus, a crystal with high symmetry has a more repetitive diffraction pattern, and there are fewer unique reflections that need to be recorded in order to have a complete representation of the diffraction. The goal of data collection, a dataset, is a set of consistently measured, indexed intensities for as many reflections as possible. A complete dataset is collected if at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or greater of unique reflections are recorded. In some embodiments, a complete dataset is collected using one crystal. In another embodiment, a complete dataset is collected using more than one crystal of the same type.
Once a dataset of intensities for the reflections is collected, the information is used to determine the three-dimensional structure of the molecule in the crystal. However, in the absence of a suitable molecular model, this cannot be done from a single measurement of reflection intensities because certain information, known as phase information, is lost between the three-dimensional shape of the molecule and its Fourier transform, the diffraction pattern. This phase information must be acquired by methods described below in order to perform a Fourier transform on the diffraction pattern to obtain the three-dimensional structure of the molecule in the crystal. It is the determination of phase information that in effect refocuses X-rays to produce the image of the molecule.
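The loss of phase information can be illustrated with a one-dimensional toy example. The sketch below is an assumption-laden simplification (a hypothetical eight-point "electron density" and a discrete Fourier transform, not real crystallographic data): the density is recovered exactly when both amplitudes and phases are used, but not when only the amplitudes, which are what measured intensities provide, are available.

```python
import cmath
import math


def structure_factors(density):
    """Discrete Fourier transform of a 1-D density sampled on a grid."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def density_from(factors):
    """Inverse transform; returns the real part of the reconstructed density."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]


true_density = [0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 3.0, 0.0]   # hypothetical toy density
factors = structure_factors(true_density)
amplitudes = [abs(f) for f in factors]                    # phases discarded

with_phases = density_from(factors)        # reproduces true_density
without_phases = density_from(amplitudes)  # all phases set to zero: wrong density

print([round(v, 2) for v in with_phases])
print([round(v, 2) for v in without_phases])
```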
In one approach, if the polypeptide for which the structure is to be solved forms crystals that are isomorphous, i.e., that have the same unit cell dimensions and space group as a related molecule whose structure has been determined, then the phases and/or co-ordinates for the related molecule can be combined directly with newly observed amplitudes to obtain electron density maps and, consequently, atomic co-ordinates of the polypeptide with unknown structure.
In another approach, if the polypeptide of unknown structure is related to another molecule of known three-dimensional structure, but crystallizes in a different unit cell with different symmetry, the skilled artisan may use a technique known as molecular replacement to obtain useful phases from the co-ordinates of the molecule whose structure is known (M. G. Rossmann, ed. “The Molecular Replacement Methods,” Sci. Rev. J. No. 13, Gordon & Breach, New York, N.Y. (1972); Eaton Lattman, “Use of Rotation and Translation Functions,” in H. W. Wyckoff, C. H. W. Hirs, and S. N. Timasheff, eds., Methods in Enzymology, 115: 55-77 (1985)). For an example of the application of molecular replacement, see Rice & Steitz (1994) EMBO J. 13:1514-24. Specifically, molecular replacement is a method of calculating initial phases for a new crystal of a polypeptide or polypeptide co-complex whose structure coordinates are unknown by orienting and positioning a related polypeptide whose structure coordinates are known within the unit cell of the new crystal so as to best account for the observed diffraction pattern of the new crystal. To enable this, the related molecule must have a similar three-dimensional structure. Briefly, the principle behind the method of molecular replacement is as follows. The three-dimensional structure of the known molecule is positioned within the unit cell of the new crystal by finding the orientation and position that provides the best agreement between observed diffraction amplitudes and those calculated from the co-ordinates of the positioned polypeptide. From this modeling, approximate phases for the unknown crystal can be derived. Once the orientation of a test molecule is known, the position of the molecule must be found using a translational search. X-PLOR (Brunger et al. (1987) Science 235:458-460), CNS (Crystallography & NMR System; Brunger et al. (1998) Acta Cryst. Sect. D 54: 905-921), and AMORE: an Automatic Package for Molecular Replacement (Navaza, J. (1994) Acta Cryst. Sect. A, 50: 157-163) are computer programs that can execute rotation and translation function searches. Once the known structure has been positioned in the unit cell of the unknown molecules, phases for the observed diffraction data can be calculated from the atomic co-ordinates of the structurally related atoms of the known molecules. By using the calculated phases and X-ray diffraction data for the unknown molecule, the skilled artisan can generate an electron density map and/or atomic co-ordinates of the GLYAT polypeptide or candidate polypeptide.
In general, the success of molecular replacement for solving structures depends on the fraction of the structures that are related and their degree of identity. For example, if about 50% or more of the structure shows a root mean square (RMS) deviation between corresponding atoms in the range of about 2 Å or less, the known structure can be successfully used to solve the unknown structure.
The term “root mean square deviation” means the square root of the arithmetic mean of the squares of the deviations from the mean. It is a way to express the deviation or variation from a trend or object. For example, the “root mean square deviation” can define the variation in the backbone of a polypeptide from the relevant portion of the backbone of a GLYAT polypeptide or a portion thereof as defined by the structure coordinates described herein.
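For illustration, the RMS deviation between corresponding atoms of two structures can be computed as sketched below. This minimal example assumes the two coordinate sets list the same atoms in the same order and have already been superimposed; it does not itself perform an optimal superposition. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates.

    Assumes equal-length lists with atoms in corresponding order and
    structures that are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone atom positions, in angstroms.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
candidate = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (2.0, 1.4, 0.3)]
print(round(rmsd(reference, candidate), 3))
```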
A third method of phase determination is multi-wavelength anomalous dispersion or MAD. In this method, X-ray diffraction data are collected at several different wavelengths from a single crystal containing at least one heavy atom with absorption edges near the energy of incoming X-ray radiation. The resonance between X-rays and electron orbitals leads to differences in X-ray scattering that permits the locations of the heavy atoms to be identified, which in turn provides phase information for a crystal of a polypeptide. A detailed discussion of MAD analysis can be found in Hendrickson (1985) Trans. Am. Crystallogr. Assoc., 21:11; Hendrickson et al. (1990) EMBO J. 9:1665; and Hendrickson (1991) Science 4:91.
A fourth method of determining phase information is single wavelength anomalous dispersion or SAD. In this technique, X-ray diffraction data are collected at a single wavelength from a single native or heavy-atom derivative crystal, and phase information is extracted using anomalous scattering information from atoms such as sulfur or chlorine in the native crystal or from the heavy atoms in the heavy-atom derivative crystal. A detailed discussion of SAD analysis can be found in Brodersen et al. (2000) Acta Cryst. D56:431-441.
A fifth method of determining phase information is single isomorphous replacement with anomalous scattering or SIRAS. This technique combines isomorphous replacement and anomalous scattering techniques to provide phase information for a crystal of a polypeptide. X-ray diffraction data are collected at a single wavelength, usually from a single heavy-atom derivative crystal. Phase information obtained only from the location of the heavy atoms in a single heavy-atom derivative crystal leads to an ambiguity in the phase angle, which is resolved using anomalous scattering from the heavy atoms. Phase information is therefore extracted from both the location of the heavy atoms and from anomalous scattering of the heavy atoms. A detailed discussion of SIRAS analysis can be found in North (1965) Acta Cryst. 18:212-216; Matthews (1966) Acta Cryst. 20:82-86.
To generate a heavy atom derivative of a polypeptide, the crystals of the polypeptide may be soaked in heavy-atoms. As used herein, heavy atom derivative or derivatization refers to the method of producing a chemically modified form of a protein or protein complex crystal wherein said protein is specifically bound to a heavy atom within the crystal. In practice, a crystal is soaked in a solution containing heavy metal atoms or salts, or organometallic compounds (e.g., lead chloride, gold cyanide, thimerosal, lead acetate, uranyl acetate, mercury chloride, gold chloride) which can diffuse through the crystal and bind specifically to the protein. The location(s) of the bound heavy metal atom(s) or salts can be determined by X-ray diffraction analysis of the soaked crystal. This information is used to generate phase information which is used to construct the three-dimensional structure of the crystallized polypeptide.
In another approach, if no crystals are available for the candidate polypeptide, but it is homologous to another molecule whose three-dimensional structure is known, the skilled artisan may use a process known as homology modeling to produce a three-dimensional model of the candidate polypeptide. Accordingly, information concerning the crystals and/or atomic co-ordinates of one molecule can greatly facilitate the determination of the structures of related molecules.
As used herein, the term “homology modeling” refers to the practice of deriving models for three-dimensional structures of macromolecules from existing three-dimensional structures for their homologues. In general, the procedure may comprise one or more of the following steps: aligning the amino acid sequence of an unknown molecule against the amino acid sequence of a molecule whose structure has previously been determined; identifying structurally conserved and structurally variable regions; generating atomic co-ordinates for core (structurally conserved) residues of the unknown structure from those of the known structure(s); generating conformations for the other (structurally variable) residues in the unknown structure; building side chain conformations; and refining structure through energy minimization and molecular dynamics, and/or evaluating the unknown structure. Homology models are obtained using computer programs that make it possible to alter the identity of residues at positions where the sequence of the molecule of interest is not the same as that of the molecule of known structure. For example, homology modeling was used to generate the R11 and YVII revertant mutant described elsewhere herein (see Experimental section).
Once phase information is obtained, it is combined with the diffraction data to produce an electron density map, an image of the electron clouds that surround the molecules in the unit cell. For basic concepts and procedures of collecting, analyzing, and utilizing X-ray diffraction data for the construction of electron densities see, for example, Campbell et al. (1984) Biological Spectroscopy, The Benjamin/Cummings Publishing Co., Inc., (Menlo Park, Calif.); Cantor et al. (1980) Biophysical Chemistry, Part II: Techniques for the study of biological structure and function, W. H. Freeman and Co., San Francisco, Calif.; A. T. Brunger (1993) X-PLOR Version 3.1: A system for X-ray crystallography and NMR, Yale Univ. Pr., (New Haven, Conn.); M. M. Woolfson (1997) An Introduction to X-ray Crystallography, Cambridge Univ. Pr., (Cambridge, UK); J. Drenth (1999) Principles of Protein X-ray Crystallography (Springer Advanced Texts in Chemistry), Springer Verlag; Berlin; Tsirelson et al. (1996) Electron Density and Bonding in Crystals: Principles, Theory and X-ray Diffraction Experiments in Solid State Physics and Chemistry, Inst. of Physics Pub.; U.S. Pat. No. 5,942,428; U.S. Pat. No. 6,037,117; U.S. Pat. No. 5,200,910 and U.S. Pat. No. 5,365,456 (“Method for Modeling the Electron Density of a Crystal”).
The higher the resolution of the data, the more distinguishable are the features of the electron density map, e.g., amino acid side chains and the positions of carbonyl oxygen atoms in the peptide backbones, because atoms that are closer together are resolvable. In certain embodiments, the protein crystals and protein-substrate complex crystals of the GLYAT polypeptide or candidate polypeptide diffract to a high resolution limit. As used herein, the term “resolution” in relation to electron density is a measure of the resolvability in the electron density map of a molecule. In X-ray crystallography, resolution is the highest resolvable peak in the diffraction pattern. Resolution is expressed in terms of the lowest resolvable distance between two atoms, measured in angstroms (Å). In some embodiments, the maximal resolution of crystals of the GLYAT polypeptide or candidate polypeptide, alone or complexed with one or more substrates (e.g., glyphosate), is less than or equal to about 3.5 Å, including, but not limited to about 3.5 Å, 3.4 Å, 3.3 Å, 3.2 Å, 3.1 Å, 3.0 Å, 2.9 Å, 2.8 Å, 2.7 Å, 2.6 Å, 2.5 Å, 2.4 Å, 2.3 Å, 2.2 Å, 2.1 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, or less than 1.0 Å. In particular embodiments, the polypeptide or polypeptide-substrate complex crystal has a resolution limit of about 1.6 Å.
The electron density maps generated from the diffraction and phase data are used to establish the positions of the individual atoms within a single polypeptide, which are expressed as atomic coordinates. As used herein, the term “atomic coordinates” refers to mathematical co-ordinates (represented as “X,” “Y” and “Z” values) that describe the positions of atoms in a crystal of a polypeptide with respect to a chosen crystallographic origin. As used herein, the term “crystallographic origin” refers to a reference point in the crystal unit cell with respect to the crystallographic symmetry operation. These atomic coordinates can be used to generate a three-dimensional representation of the molecular structure of the polypeptide.
A model of the macromolecule is then built into the electron density map with the aid of a computer, using as a guide all available information, such as the polypeptide sequence and the established rules of molecular structure and stereochemistry. Interpreting the electron density map is a process of finding the chemically realistic conformation that fits the map precisely. The atomic co-ordinates are entered into one or more computer programs for molecular modeling, as known in the art. By way of illustration, a list of computer programs useful for viewing or manipulating three-dimensional structures includes: Midas (University of California, San Francisco); MidasPlus (University of California, San Francisco); MOIL (University of Illinois); Yummie (Yale University); Sybyl (Tripos, Inc.); Insight/Discover (Biosym Technologies); MacroModel (Columbia University); Quanta (Molecular Simulations, Inc.); Cerius (Molecular Simulations, Inc.); Alchemy (Tripos, Inc.); LabVision (Tripos, Inc.); Rasmol (Glaxo Research and Development); Ribbon (University of Alabama); NAOMI (Oxford University); Explorer Eyechem (Silicon Graphics, Inc.); Univision (Cray Research); Molscript (Uppsala University); Chem-3D (Cambridge Scientific); Chain (Baylor College of Medicine); O (Uppsala University); GRASP (Columbia University); X-Plor (Molecular Simulations, Inc.; Yale University); Spartan (Wavefunction, Inc.); Catalyst (Molecular Simulations, Inc.); Molcadd (Tripos, Inc.); VMD (University of Illinois/Beckman Institute); Sculpt (Interactive Simulations, Inc.); Procheck (Brookhaven National Laboratory); DGEOM (QCPE); REVIEW (Brunel University); Modeller (Birkbeck College, University of London); Xmol (Minnesota Supercomputing Center); Protein Expert (Cambridge Scientific); HyperChem (Hypercube); MD Display (University of Washington); PKB (National Center for Biotechnology Information, NIH); ChemX (Chemical Design, Ltd.); Cameleon (Oxford Molecular, Inc.); and Iditis (Oxford Molecular, Inc.).
After a model is generated, the structure is refined. Refinement is the process of minimizing the function Φ, which is the difference between observed and calculated intensity values (measured by an R-factor), and which is a function of the position, temperature factor, and occupancy of each non-hydrogen atom in the model. This usually involves alternate cycles of real space refinement, i.e., calculation of electron density maps and model building, and reciprocal space refinement, i.e., computational attempts to improve the agreement between the original intensity data and intensity data generated from each successive model. Refinement ends when the function Φ converges on a minimum wherein the model fits the electron density map and is stereochemically and conformationally reasonable. During refinement, ordered solvent molecules are added to the structure.
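As a hedged illustration of how agreement between observed and calculated data is commonly scored, the sketch below computes a conventional crystallographic R-factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, over structure factor amplitudes. The use of amplitudes rather than intensities, the absence of a scale factor, and the numerical values are assumptions made for this example only.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor over paired structure factor amplitudes.

    Assumes f_calc is already on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical amplitudes for three reflections.
observed = [10.0, 20.0, 30.0]
calculated = [12.0, 18.0, 30.0]
print(round(r_factor(observed, calculated), 4))
```

A lower value indicates closer agreement between the model and the data, so successive refinement cycles are expected to drive this quantity downward until it converges.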
While Cartesian coordinates are important and convenient representations of the three-dimensional molecular structure of a polypeptide, those of skill in the art will readily recognize that other representations of the structure are also useful. Therefore, the three-dimensional molecular structure of a polypeptide, as discussed herein, includes not only the Cartesian coordinate representation, but also all alternative representations of the three-dimensional distribution of atoms. For example, atomic coordinates may be represented as a Z-matrix, wherein a first atom of the protein is chosen, a second atom is placed at a defined distance from the first atom, a third atom is placed at a defined distance from the second atom so that it makes a defined angle with the first atom. Each subsequent atom is placed at a defined distance from a previously placed atom with a specified angle with respect to the third atom, and at a specified torsion angle with respect to a fourth atom. Atomic coordinates may also be represented as a Patterson function, wherein all interatomic vectors are drawn and are then placed with their tails at the origin. This representation is particularly useful for locating heavy atoms in a unit cell. In addition, atomic coordinates may be represented as a series of vectors having magnitude and direction and drawn from a chosen origin to each atom in the polypeptide structure. Furthermore, the positions of atoms in a three-dimensional structure may be represented as fractions of the unit cell (fractional coordinates), or in spherical polar coordinates.
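By way of example, conversion between Cartesian and fractional coordinates may be sketched as follows. The sketch assumes a commonly used convention in which the a edge lies along the x axis and the b edge lies in the xy plane; the function names and cell parameters are hypothetical, and other conventions are equally valid.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Upper-triangular matrix mapping fractional to Cartesian coordinates.

    Cell edges in angstroms, angles in degrees.
    Assumed convention: a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(frac, m):
    """Multiply the orthogonalization matrix by a fractional coordinate."""
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def cart_to_frac(cart, m):
    """Invert the upper-triangular system by back-substitution."""
    w = cart[2] / m[2][2]
    v = (cart[1] - m[1][2] * w) / m[1][1]
    u = (cart[0] - m[0][1] * v - m[0][2] * w) / m[0][0]
    return (u, v, w)


# Hypothetical orthorhombic cell: the point at the cell center.
cell = orthogonalization_matrix(10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
print([round(v, 3) for v in cart_to_frac((5.0, 10.0, 15.0), cell)])
```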
Additional information, such as thermal parameters, which measure the motion of each atom in the structure, chain identifiers, which identify the particular chain of a multi-chain protein or protein co-complex in which an atom is located, and connectivity information, which indicates to which atoms a particular atom is bonded, is also useful for representing a three-dimensional molecular structure.
The three-dimensional molecular structures for the GLYAT R7 variant polypeptide were determined with the GLYAT variant in complex with oxidized CoA (a binary complex) and in complex with acetyl CoA and 3PG (ternary complex) (Siehl et al. (2007) J Biol Chem 282:11446-11455). The atomic coordinates and structural information for the binary and ternary complexes can be found in the Protein Data Bank (Berman et al. (2000) Nucleic Acids Research 28, 235-242; see also, the web page at the URL rcsb.org/pdb/) with the accession numbers PDB ID: 2JDC and PDB ID: 2JDD, respectively, which are herein incorporated by reference in their entireties (Siehl et al. (2007) J Biol Chem 282:11446-11455). The GLYAT R7 variant exhibits enhanced catalytic activity for glyphosate over the native GLYAT polypeptide. The optimized GLYAT polypeptide was generated through iterative DNA shuffling of a native GLYAT polypeptide.
As will be apparent to those of ordinary skill in the art, the atomic structures presented herein are independent of their orientation, and the atomic co-ordinates identified herein merely represent one possible orientation of a particular GLYAT polypeptide. The atomic coordinates are a relative set of points that define a shape in three dimensions. Thus, it is possible that a different set of coordinates could define a similar or identical shape. Therefore, slight variations in the individual coordinates will have little effect on overall shape. It is apparent, therefore, that the atomic co-ordinates identified herein may be mathematically rotated, translated, scaled, or a combination thereof, without changing the relative positions of atoms or features of the respective structure. The variations in coordinates discussed may be generated because of mathematical manipulations of the structure coordinates. For example, the structure coordinates could be manipulated by crystallographic permutations of the structure coordinates, fractionalization of the structure coordinates, integer additions or subtractions to sets of the structure coordinates, inversion of the structure coordinates or any combination of the above.
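The independence of a structure from its orientation can be checked numerically. In the following minimal sketch, a hypothetical set of three atoms is rotated about the z axis and then translated, and the interatomic distances are shown to be unchanged; the coordinates, rotation angle, and shift are arbitrary illustrative values.

```python
import math


def rotate_about_z(point, angle_deg):
    """Rotate an (x, y, z) point about the z axis by angle_deg degrees."""
    t = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t), z)


def translate(point, shift):
    """Add a shift vector to a point."""
    return tuple(p + s for p, s in zip(point, shift))


def pairwise_distances(points):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.3)]   # hypothetical coordinates
moved = [translate(rotate_about_z(p, 37.0), (10.0, -4.0, 2.5)) for p in atoms]

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
print(all(abs(a - b) < 1e-9 for a, b in zip(before, after)))
```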
Alternatively, modifications in the crystal structure due to mutations, additions, substitutions and/or deletions of amino acids, or other changes in any of the components that make up the crystal could also account for variations in the structure coordinates. If such variations are within an acceptable standard of error as compared to the original coordinates, the resulting three-dimensional shape is considered to be the same. Thus, in one aspect of the present invention, any molecule or molecular complex that has an RMSD of conserved residue backbone atoms (N, Calpha, C, O) of less than about 4 Å, 2 Å, 1.5 Å, 1 Å, or 0.5 Å when superimposed on the relevant backbone atoms described by the coordinates listed in any one of Tables 1-10 is considered identical.
Using the methods of the invention, candidate polypeptides are evaluated for the potential of having an improved enzymatic activity in comparison to native GLYAT enzymes based on three-dimensional structural similarities with an optimized GLYAT. Enzymatic activity can be characterized using the conventional kinetic parameters kcat, KM, and kcat/KM. The catalytic constant, kcat, can be thought of as a measure of the maximum rate of acetylation, particularly at high substrate concentrations; KM is a measure of the affinity of an enzyme for its substrate (e.g., glyphosate) and cofactor (e.g., acetyl CoA); and kcat/KM is a measure of catalytic efficiency that takes both substrate affinity and catalytic rate into account. kcat/KM is particularly important in the situation where the concentration of a substrate is at least partially rate-limiting. In general, an enzyme with a higher kcat or kcat/KM is a more efficient catalyst than another enzyme with a lower kcat or kcat/KM. An enzyme with a lower KM binds its substrate with a higher affinity and is a more efficient catalyst than another enzyme with a higher KM. Thus, to determine whether one GLYAT is more effective than another, one can compare kinetic parameters for the two enzymes. The relative importance of kcat, kcat/KM and KM will vary depending upon the context in which the GLYAT will be expected to function, e.g., the anticipated effective concentration of glyphosate relative to the KM for glyphosate.
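The comparison of kinetic parameters can be illustrated with the standard Michaelis-Menten rate expression, v/[E] = kcat·[S]/(KM + [S]), which is assumed here as the kinetic model. The parameter values for the "native" and "variant" enzymes below are hypothetical and are chosen only to show how kcat/KM governs the rate when substrate is limiting.

```python
def rate_per_enzyme(kcat, km, substrate):
    """Turnover rate per enzyme molecule (units of kcat) at a substrate concentration.

    km and substrate must share the same concentration units (e.g., mM).
    """
    return kcat * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """kcat/KM, e.g., in mM^-1 min^-1 when kcat is in min^-1 and km in mM."""
    return kcat / km


def fold_improvement(variant, native):
    """Ratio of catalytic efficiencies for two (kcat, km) pairs."""
    return catalytic_efficiency(*variant) / catalytic_efficiency(*native)


native = (5.0, 1.0)      # hypothetical: kcat = 5 min^-1, KM = 1.0 mM
variant = (1000.0, 0.2)  # hypothetical: kcat = 1000 min^-1, KM = 0.2 mM

print(fold_improvement(variant, native))
# At low substrate (0.01 mM) the rate ratio approaches the kcat/KM ratio.
print(rate_per_enzyme(*variant, 0.01) / rate_per_enzyme(*native, 0.01))
```

At substrate concentrations well above KM the rate ratio instead approaches the ratio of kcat values, which is why the relative importance of each parameter depends on the expected glyphosate concentration.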
Thus, the GLYAT polypeptide used to evaluate the candidate polypeptide or the candidate polypeptide itself may have a higher affinity, and thus, a lower KM, for glyphosate than native GLYAT enzymes. For example, in some embodiments, the KM of the GLYAT polypeptide or candidate polypeptide is less than about 1 mM, including but not limited to, about 0.9 mM, 0.8 mM, 0.7 mM, 0.6 mM, 0.5 mM, 0.4 mM, 0.3 mM, 0.2 mM, 0.1 mM, 0.05 mM, or less.
The GLYAT polypeptide or candidate polypeptide may have a higher kcat for a substrate (e.g., glyphosate) than native GLYAT polypeptides. For example, in some embodiments, the GLYAT polypeptide or candidate polypeptide has a kcat of at least about 20 min−1, including but not limited to, about 50 min−1, 100 min−1, 200 min−1, 500 min−1, 1000 min−1, 1100 min−1, 1200 min−1, 1250 min−1, 1300 min−1, 1400 min−1, 1500 min−1, 1600 min−1, 1700 min−1, 1800 min−1, 1900 min−1, 2000 min−1 or higher. GLYAT polypeptides or the candidate polypeptides may have a higher kcat/KM for a substrate (e.g., glyphosate) than native GLYAT enzymes. In some embodiments, the GLYAT polypeptide or candidate polypeptide has a kcat/KM of at least about 100 mM−1 min−1, 500 mM−1 min−1, 1000 mM−1 min−1, 2000 mM−1 min−1, 3000 mM−1 min−1, 4000 mM−1 min−1, 5000 mM−1 min−1, 6000 mM−1 min−1, 7000 mM−1 min−1, or 8000 mM−1 min−1, or higher. The activity of GLYAT enzymes is affected by, for example, pH and salt concentration; appropriate assay methods and conditions are known in the art (see, e.g., WO2005012515, which is herein incorporated by reference in its entirety). Such improved enzymes identified using the presently disclosed methods may find particular use in methods of growing a crop in a field where the use of a particular herbicide or combination of herbicides and/or other agricultural chemicals would result in damage to the plant if the enzymatic activity (i.e., kcat, KM, or kcat/KM) were lower.
In some embodiments, the GLYAT polypeptide for which a molecular structure is provided for comparison to a candidate polypeptide or the candidate polypeptide itself exhibits a greater specificity for glyphosate than native GLYAT polypeptides. As used herein, “specificity” refers to the preference of a polypeptide to bind and/or catalyze one substrate over another. For example, a polypeptide with a greater specificity for glyphosate over other potential GLYAT substrates binds to glyphosate with an affinity that is at least two times greater than its affinity for another substrate (e.g., D-AP3). In some embodiments, the affinity, kcat, and/or kcat/KM is about 2 times, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 25, about 30, about 40, about 50, about 100, about 200, about 500, about 1000, or greater times that of the native GLYAT polypeptide for glyphosate over another substrate (e.g., D-AP3). In those embodiments wherein the affinity is greater, the KM of the GLYAT polypeptide or candidate polypeptide for glyphosate is equivalently lower than the KM of the polypeptide for the other substrate.
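As a non-limiting sketch, the fold preference for one substrate over another can be expressed as a ratio of kinetic parameters. Following the framing above, in which a lower KM is treated as a higher affinity, the affinity preference is taken here as KM(other)/KM(preferred). The function names and all numerical values are hypothetical and serve only to show the arithmetic.

```python
def affinity_fold(km_preferred, km_other):
    """Fold preference in apparent affinity, using KM as the affinity proxy.

    A value of 2.0 means the KM for the preferred substrate is half the KM
    for the other substrate.
    """
    return km_other / km_preferred


def efficiency_fold(kcat_preferred, km_preferred, kcat_other, km_other):
    """Fold preference in catalytic efficiency (ratio of the two kcat/KM values)."""
    return (kcat_preferred / km_preferred) / (kcat_other / km_other)


# Hypothetical values (KM in mM, kcat in min^-1); not measured GLYAT data.
km_glyphosate, kcat_glyphosate = 0.05, 1000.0
km_other, kcat_other = 0.5, 200.0

fold_affinity = affinity_fold(km_glyphosate, km_other)
fold_efficiency = efficiency_fold(kcat_glyphosate, km_glyphosate, kcat_other, km_other)

print(f"affinity preference: {fold_affinity:.1f}-fold")
print(f"kcat/KM preference: {fold_efficiency:.1f}-fold")
print("meets the two-fold affinity criterion:", fold_affinity >= 2.0)
```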
In some embodiments, the GLYAT polypeptide for which the molecular structure is constructed and/or the candidate polypeptide exhibits a greater specificity for glyphosate than native GLYAT polypeptides. In certain embodiments, the GLYAT polypeptide or candidate polypeptide is able to bind compounds with at least five main chain atoms with a higher affinity than native GLYAT polypeptides. Kinetic data have demonstrated that optimizing GLYAT for activity with glyphosate shifted the binding preference from ligands with a main-chain length of four atoms in the wild-type enzyme to ligands with a main-chain length of five atoms (Siehl et al. (2007) J Biol Chem 282:11446-11455). For example, the R7 and R11 variants of GLYAT have a higher binding affinity and higher catalytic activity on compounds with five main chain atoms (e.g., glyphosate) than native GLYAT polypeptides, which exhibit a preference for smaller compounds with three to four main chain atoms (e.g., D-AP3). Thus, in some embodiments, the GLYAT polypeptide or candidate polypeptide binds compounds with at least five main chain atoms with an affinity that is at least about 2-fold greater than native GLYAT polypeptides, including but not limited to at least about 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 100-fold, or greater.
The analysis of the molecular structure of the GLYAT R7 variant polypeptide complexed with acetyl CoA and glyphosate provided herein has provided the identity and location of the residues important for the binding of substrates to GLYAT polypeptides. Importantly, the analysis has provided a molecular basis for the enhanced affinity and specificity exhibited by the GLYAT variant polypeptides over that of the native GLYAT polypeptide.
The atomic coordinates of the GLYAT R7 variant polypeptide that comprise the substrate binding cavity are presented in Table 1, wherein the GLYAT R7 variant polypeptide is bound to glyphosate and acetyl CoA. Table 2 provides the atomic coordinates of the substrate binding cavity of GLYAT R11 variant polypeptide when bound to glyphosate and acetyl CoA. As used herein, a “substrate binding cavity” refers to the atoms of a polypeptide that directly contact (e.g., through hydrogen bonds, van der Waals interactions) the substrate (e.g., glyphosate) or are within about 4 Å of the substrate (e.g., glyphosate). A “substrate binding cavity” can also include residues that contribute to the structure or flexibility of the residues directly contacting or within 4 Å of the substrate. In some embodiments, the substrate binding cavity comprises at least the atomic coordinates of Table 1.
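For illustration, the distance criterion in the definition of a substrate binding cavity (polypeptide atoms within about 4 Å of the substrate) can be sketched as a minimum-distance filter. The atom labels and coordinates below are hypothetical placeholders, not entries from Table 1 or Table 2, and the function name is an assumption made for this sketch.

```python
import math


def atoms_near_substrate(protein_atoms, substrate_atoms, cutoff=4.0):
    """Return (label, minimum distance) for protein atoms within `cutoff` Å.

    protein_atoms: list of (label, (x, y, z)) tuples, coordinates in Å.
    substrate_atoms: list of (x, y, z) tuples for the bound substrate.
    """
    selected = []
    for label, xyz in protein_atoms:
        # Minimum distance from this protein atom to any substrate atom.
        d_min = min(math.dist(xyz, s_xyz) for s_xyz in substrate_atoms)
        if d_min <= cutoff:
            selected.append((label, d_min))
    return selected


# Hypothetical coordinates for demonstration only.
protein = [
    ("residue_A:atom_1", (0.0, 0.0, 0.0)),
    ("residue_B:atom_1", (3.0, 0.0, 0.0)),
    ("residue_C:atom_1", (10.0, 0.0, 0.0)),
]
substrate = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

for label, distance in atoms_near_substrate(protein, substrate):
    print(f"{label}: {distance:.2f} Å")
```

The same minimum-distance calculation corresponds to the kind of atom-pair distances reported for loop and β-hairpin residues relative to glyphosate.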
aThe data are derived from a modeled structure based on PDB: 2JDD, in which 3PG was replaced by glyphosate (FIG. 1). The structural model underwent a series of energy minimizations with CHARMm: on newly added hydrogens (CONJ, 500 cycles), on hydrogens and glyphosate (500 cycles), on non-backbone atoms (200 cycles), and on the whole system (200 cycles). The amino acid atom is the specific atom of the amino acid, as identified in Protein Data Bank file 2JDD;
bX, Y, and Z are the three-dimensional coordinates specifying the distance in Angstroms of the amino acid atom relative to the center of mass of the crystal defined by the PDB file 2JDD;
cAtoms of glyphosate are defined in FIG. 2A.
aThe atom naming convention is the same as in Table 1.
According to the methods of the invention, a candidate polypeptide is evaluated for its potential to associate with glyphosate with a higher binding affinity, higher binding specificity, or both when compared to a native GLYAT polypeptide. In these embodiments, a three-dimensional molecular structure of at least a substrate binding cavity of a GLYAT polypeptide is provided. The three-dimensional molecular structure is determined with the GLYAT polypeptide bound to glyphosate and an acetyl donor, such as acetyl CoA. As used herein, the terms “bind,” “binding,” “bound,” “bond,” or “bonded,” when used in reference to the association of atoms, molecules, or chemical groups, refer to any physical contact or association of two or more atoms, molecules, or chemical groups. Such contacts and associations include covalent and non-covalent types of interactions.
The three-dimensional molecular structure of the substrate binding cavity can comprise at least the atomic coordinates of Table 1. In other embodiments, the substrate binding cavity comprises at least the atomic coordinates of Table 2. Alternatively, the substrate binding cavity can comprise a structural variant of the substrate binding cavity defined by the atomic coordinates of Table 1 or Table 2. As used herein, a “structural variant” comprises a three-dimensional molecular structure that is similar to another three-dimensional molecular structure. In some embodiments, the structural variant comprises a root mean square deviation from the back-bone atoms of the amino acids of Table 1 or Table 2 of not more than about 4 Å, including but not limited to about 3.5 Å, 3 Å, 2.5 Å, 2 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å, and 0.1 Å. In some of these embodiments, the structural variant substrate binding cavity comprises a root mean square deviation from the back-bone atoms of the amino acids of Table 1 or Table 2 of not more than about 2.0 Å.
Two loops (loop20 and loop130, which is more specifically described as a β-hairpin) cover the bound substrate from opposite sides and join together at their tip points, creating the substrate binding cavity (
aThe atom naming convention is the same as in Table 1.
bThe minimum distance in Angstroms between the listed pairs of atoms in loop20 and glyphosate.
aThe atom naming convention is the same as in Table 1.
bThe minimum distance in Angstroms between the listed pairs of atoms in loop20 and glyphosate.
The substrate-binding β-hairpin comprises residues 130-138 (FDTPPVGPH of the GLYAT R7 variant). The substrate-binding β-hairpin connects strands 6 and 7, with the four middle residues (TPPV) forming a typical type VIa β-turn (Richardson (1981) Adv Protein Chem 34:167-339). As described elsewhere herein, the two consecutive prolines Pro133 and Pro134 reduce the flexibility of the β-turn, with Pro133 adopting a trans- and Pro134 a cis-conformation. The β-hairpin covers glyphosate's phosphono group and harbors the putative catalytic base His138 (see
As described elsewhere herein, amino acid substitutions I132T and I135V, introduced by gene shuffling, had a significant impact on β-hairpin stability by reducing hydrophobic packing strength among the paired side chains (see
In some embodiments, the substrate binding cavity further comprises the full atomic coordinates of the substrate-binding β-hairpin (residues 130-138) defined by the atomic coordinates provided in Table 5 in addition to the atomic coordinates provided in Table 1, Table 3, or both or a structural variant thereof. In other embodiments, the substrate binding cavity further comprises the full atomic coordinates of the substrate-binding β-hairpin defined by the atomic coordinates provided in Table 6 in addition to the atomic coordinates provided in Table 2, Table 4, or both or a structural variant thereof. The minimum distances between β-hairpin residues and glyphosate are also shown in Tables 5 and 6.
aThe atom naming convention is the same as in Table 1.
bThe minimum distance in Angstroms between the listed pairs of atoms in beta-hairpin and glyphosate.
aThe atom naming convention is the same as in Table 1.
bThe minimum distance in Angstroms between the listed pairs of atoms in beta-hairpin and glyphosate.
Without being bound by any theory or mechanism of action, the mutated residues of the β-hairpin of the optimized GLYAT variants contribute to its reduced stability and greater flexibility, which might contribute to an acceleration of the opening of the active site and determine substrate specificity. In addition, the phenol of wild-type GLYAT residue Y130 hydrogen bonds with the side chain of Asn109. The R7 GLYAT variant polypeptide has a Y130F mutation and, without being bound by any theory or mechanism of action, we believe that the absence of this hydrogen bond might allow the optimized GLYAT variant to more easily adjust the β-hairpin conformation to accommodate a new substrate (e.g., glyphosate).
In any of these embodiments, a structural variant of the substrate binding cavity can be used for comparison to a three-dimensional molecular structure of a candidate polypeptide comprising the provided atomic coordinates in Table 1, Table 1 and Table 3, Table 1 and Table 5, Tables 1, 3, and 5, Table 2, Table 2 and 4, Table 2 and 6, or Tables 2, 4, and 6, wherein the structural variant comprises a root mean square deviation from the back-bone atoms of the amino acids for which the atomic coordinates are provided of not more than about 4 Å, and in some embodiments, not more than about 2 Å, including but not limited to about 4 Å, 3.5 Å, 3 Å, 2.5 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å, and 0.1 Å.
The three-dimensional molecular structures of the GLYAT polypeptide and the candidate polypeptide are compared to determine if the candidate polypeptide comprises the substrate binding cavity of the GLYAT polypeptide (comprising the atomic coordinates of Table 1, Table 1 and Table 3, Table 1 and Table 5, Tables 1, 3, and 5, Table 2, Table 2 and 4, Table 2 and 6, or Tables 2, 4, and 6). A candidate polypeptide is considered to comprise the substrate binding cavity of the GLYAT polypeptide if the candidate polypeptide comprises a region wherein the back-bone atoms of the amino acids of this region have no more than about 4 Å root mean square deviation from the backbone atoms of the amino acids provided in Table 1, and optionally Table 3, and Table 5, including but not limited to about 4 Å, 3.5 Å, 3 Å, 2.5 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å, and 0.1 Å. In other embodiments, a candidate polypeptide is considered to comprise the substrate binding cavity of the GLYAT polypeptide if the candidate polypeptide comprises a region wherein the back-bone atoms of the amino acids of this region have no more than about 4 Å root mean square deviation from the backbone atoms of the amino acids provided in Table 2, and optionally Table 4, and Table 6, including but not limited to about 4 Å, 3.5 Å, 3 Å, 2.5 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å, and 0.1 Å. In some embodiments, the two molecular structures are considered the same if the root mean square deviation between the back-bone atoms of the amino acids of this region is not more than about 2 Å. Any method known in the art can be used to compare the two three-dimensional molecular structures to determine if the candidate polypeptide comprises the optimized substrate binding cavity.
Such analyses may be carried out in current software applications, such as the Molecular Similarity application of QUANTA (Molecular Simulations Inc., San Diego, Calif.) and as described in the accompanying User's Guide. The Molecular Similarity application permits comparisons between different structures, different conformations of the same structure, and different parts of the same structure. The procedure used in Molecular Similarity to compare structures is divided into four steps: 1) load the structures to be compared; 2) define the atom equivalences in these structures; 3) perform a fitting operation; and 4) analyze the results. Each structure is identified by a name. One structure is identified as the target (i.e., the fixed structure); all remaining structures are working structures (i.e., moving structures). Since atom equivalency within QUANTA is defined by user input, for the purpose of this invention we will define equivalent atoms as protein backbone atoms (N, Cα, C and O) for all conserved residues between the two structures being compared. Many other structural comparison tools automatically identify equivalent atoms (usually the alpha carbons of equivalent residues). Since the geometrical distance between the alpha carbons of any two residues in a 3D structure does not directly reflect the position of the residues in the corresponding primary 1D sequence, the identified equivalent residues of two proteins can be non-consecutive, not the same residue number, or even not in the same sequential order. Widely available software packages of this kind include, but are not limited to, Dali (Holm & Sander (1993) J Mol Biol. 233(1):123-138), SSM (Krissinel & Henrick (2004) Acta Cryst. D60:2256-2268), VAST (Gibrat et al. (1996) Curr Opin Struct Biol 6(3):377-385), and CE (Shindyalov & Bourne (1998) Protein Engineering 11(9):739-747). We will also consider only rigid fitting operations.
When a rigid fitting method is used, the working structure is translated and rotated to obtain an optimum fit with the target structure. The fitting operation uses an algorithm that computes the optimum translation and rotation to be applied to the moving structure, such that the root mean square difference of the fit over the specified pairs of equivalent atoms is an absolute minimum. This number, given in angstroms, is reported by QUANTA and others.
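A rigid fitting operation of this kind can be sketched, for illustration only, with the Kabsch algorithm, a standard least-squares superposition method. It is not asserted that QUANTA or any other named package uses this particular algorithm. The sketch assumes that the pairs of equivalent atoms have already been defined, so that row i of the moving structure corresponds to row i of the target; the function names, the toy coordinates, and the 2 Å default cutoff in the helper are assumptions chosen for the example.

```python
import numpy as np


def kabsch_rmsd(moving, target):
    """Minimum RMSD (Å) after optimal rigid translation and rotation.

    moving, target: (N, 3) arrays of paired equivalent atoms, in Å.
    """
    p = np.asarray(moving, dtype=float)
    q = np.asarray(target, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("both inputs must be (N, 3) arrays of paired atoms")

    # Optimal translation: superimpose the two centroids.
    p_centered = p - p.mean(axis=0)
    q_centered = q - q.mean(axis=0)

    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    covariance = p_centered.T @ q_centered
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (a reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    fitted = p_centered @ rotation.T
    squared = np.sum((fitted - q_centered) ** 2, axis=1)
    return float(np.sqrt(squared.mean()))


def within_cutoff(rmsd, cutoff=2.0):
    """True if the fitted RMSD does not exceed the chosen cutoff (Å)."""
    return rmsd <= cutoff


# Toy target coordinates; the moving copy is rotated about z and shifted.
target_xyz = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
angle = 0.5
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moving_xyz = target_xyz @ rot_z.T + np.array([5.0, -2.0, 1.0])

rmsd = kabsch_rmsd(moving_xyz, target_xyz)
print(f"RMSD after fitting: {rmsd:.6f} Å; within cutoff: {within_cutoff(rmsd)}")
```

Because only a proper rotation is permitted, a mirror image of a structure does not superimpose to zero RMSD, which is the behavior expected of a rigid fit.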
In embodiments, the present subject matter is directed to an electronic representation comprising the atomic coordinates of any glyphosate N-acetyltransferase (GLYAT) or variant thereof described herein. In a preferred embodiment, an electronic representation comprises the atomic coordinates of a glyphosate N-acetyltransferase (GLYAT) polypeptide crystal. In another preferred embodiment, an electronic representation comprises the atomic coordinates found in Tables 18 or 19.
In another embodiment, the present subject matter is directed to a data array comprising the atomic coordinates of a glyphosate N-acetyltransferase (GLYAT) polypeptide crystal said atomic coordinates comprising, a) a three-dimensional representation of at least one of a substrate binding cavity comprising atomic coordinates described herein; and b) a variant of the three-dimensional representation of part (a), wherein said variant comprises a root mean square deviation from the back-bone atoms of said amino acids of not more than 1.9 Å.
In another embodiment, the present subject matter is directed to an electronic representation comprising the atomic coordinates of a glyphosate N-acetyltransferase (GLYAT) polypeptide crystal said atomic coordinates comprising, a) a three-dimensional representation of at least one of a substrate binding cavity comprising atomic coordinates described herein; and b) a variant of the three-dimensional representation of part (a), wherein said variant comprises a root mean square deviation from the back-bone atoms of said amino acids of not more than 1.9 Å.
It is to be noted that the candidate polypeptide can be considered to comprise the GLYAT substrate binding cavity of Table 1, and in some embodiments, Table 3, Table 5, or both, or the GLYAT substrate binding cavity of Table 2, and in some embodiments, Table 4, Table 6, or both, even if the particular residue numbers of the GLYAT polypeptide and candidate polypeptide are dissimilar, so long as the atomic coordinates of the amino acid atoms that contact glyphosate are the same (or wherein the back-bone atoms of the amino acids of this region have no more than about 4 Å root mean square deviation from the backbone atoms of the amino acids provided in Table 1, Table 1 and Table 3, Table 1 and Table 5, Tables 1, 3, and 5, Table 2, Table 2 and 4, Table 2 and 6, or Tables 2, 4, and 6, as discussed above). For example, the leucine residue at position 20 in the substrate binding cavity of the GLYAT R7 variant polypeptide listed in Table 1 can correspond to a leucine residue in the substrate binding cavity of the candidate polypeptide that is not at position 20 in the amino acid sequence of the candidate polypeptide. One of skill in the art will appreciate that the two molecular structures can still be considered the same or similar so long as the three-dimensional molecular structure of the candidate polypeptide comprises the atomic coordinates within Table 1, Table 1 and Table 3, Table 1 and Table 5, Tables 1, 3, and 5, Table 2, Table 2 and 4, Table 2 and 6, or Tables 2, 4, and 6 (or a variation thereof), regardless of the positioning of a given residue within the polypeptide chain.
In some embodiments, the methods of the invention further comprise altering the primary structure of the candidate polypeptide to maximize a similarity or relationship between the three-dimensional molecular structures of the candidate polypeptide and the substrate binding cavity of the GLYAT polypeptide (comprising the atomic coordinates of Table 1, Table 1 and Table 3, Table 1 and Table 5, Tables 1, 3, and 5, Table 2, Table 2 and 4, Table 2 and 6, or Tables 2, 4, and 6). Any method known in the art can be used to alter the primary structure of the candidate polypeptide, including any mutagenic or recombinogenic methods described elsewhere herein. One of skill in the art will appreciate that mutations introduced outside of the substrate binding cavity may influence the secondary or tertiary structure of the polypeptide and indirectly alter the three-dimensional structure of the substrate binding cavity. Candidate polypeptides, particularly those whose primary structure has been modified to provide a better fit with the substrate binding cavity of the GLYAT polypeptide, can be produced and assayed for the ability to bind to glyphosate with a higher binding affinity or specificity when compared to a native GLYAT polypeptide using any method known in the art. In this way, the methods of the invention provide for the identification of additional optimized GLYAT polypeptides that exhibit enhanced affinity or specificity for glyphosate over native GLYAT polypeptides.
As used herein, the term “maximize” includes enhance, increase, improve and the like. Thus, the term is not limited to a highest measure but is meant to also describe incremental enhancements, improvements and the like.
In some embodiments of the methods of the invention, the candidate polypeptide is evaluated for its potential to have N-acetyltransferase activity with a higher catalytic rate (kcat) for a substrate when compared to a native GLYAT polypeptide. In these embodiments, a three-dimensional molecular structure of at least a GNAT wedge joining region of a GLYAT polypeptide is provided and compared with the three-dimensional molecular structure of a candidate polypeptide to determine if the candidate polypeptide has the potential to have N-acetyltransferase activity with a higher kcat for a substrate when compared to a native GLYAT polypeptide. The molecular structure is determined from a GLYAT polypeptide bound to glyphosate and an acetyl donor (e.g., AcCoA). GLYAT polypeptides comprise the classic GNAT wedge shape that comprises a V-shaped wedge formed by two central parallel beta strands splaying apart at the middle point (for example, see beta strands β4 and β5 of GLYAT in
Beyond substrate binding, two other residues, Tyr118 and Met75, are essential to catalysis. Tyr118 is about 3.6 Å from AcCoA S1P and is in position to serve as the general acid protonating the thiolate anion of CoA (Siehl et al. (2007) J Biol Chem 282:11446-11455). A characteristic feature of GLYAT, the β-bulge at strand 4, formed by residues Gly74 and Met75, orients the amide of Met75 to the reaction center, forming a hydrogen bond to the carbonyl of the AcCoA thioester (
The wedge also contributes two residues that recognize glyphosate through their side chains (Arg73 and Arg111). Atomic coordinates found within about 4 Å of the bound AcCoA, where the two beta strands meet, are considered part of the wedge joining region. In some embodiments, the GNAT wedge joining region comprises the atomic coordinates provided in Table 7 or Table 8.
aThe naming convention of amino acid atoms and all the atomic coordinates is the same as Table 1 and the structure model used here is the same as that in Table 1.
aThe naming convention of amino acid atoms and all the atomic coordinates is the same as Table 1 and the structure model used here is the same as that in Table 1.
In some embodiments, the three-dimensional molecular structure of the GNAT wedge joining region can be described as comprising the backbone atomic coordinates and the inter-strand C-alpha atom distance of Table 9, which are found in the GLYAT R7 variant polypeptide, and the GNAT wedge joining region further comprises the atomic coordinates of Table 9, in addition to those of Table 7. In other embodiments, the three-dimensional molecular structure of the GNAT wedge joining region can be described as comprising the backbone atomic coordinates and the inter-strand C-alpha atom distance of Table 10, which are found in the GLYAT R11 variant polypeptide, and the GNAT wedge joining region further comprises the atomic coordinates of Table 10, in addition to those of Table 8.
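As an illustrative sketch, the inter-strand C-alpha distance used to describe the wedge joining region can be computed directly from the Cα coordinates of paired residues on the two strands. The residue numbers, pairings, and coordinates below are hypothetical and are not taken from Table 9 or Table 10; they are chosen only to show a pair of strands that splay apart.

```python
import math


def interstrand_ca_distances(strand_a, strand_b, residue_pairs):
    """Distance (Å) between C-alpha atoms of paired residues on two strands.

    strand_a, strand_b: dicts mapping residue number -> (x, y, z) of its C-alpha.
    residue_pairs: list of (residue_on_a, residue_on_b) tuples.
    """
    return {
        (res_a, res_b): math.dist(strand_a[res_a], strand_b[res_b])
        for res_a, res_b in residue_pairs
    }


# Hypothetical C-alpha coordinates for two parallel strands that diverge.
strand_one = {1: (0.0, 0.0, 0.0), 2: (3.3, 0.0, 0.0), 3: (6.6, 0.0, 0.0)}
strand_two = {11: (0.0, 4.8, 0.0), 12: (3.3, 5.5, 0.0), 13: (6.6, 7.0, 0.0)}
pairs = [(1, 11), (2, 12), (3, 13)]

distances = interstrand_ca_distances(strand_one, strand_two, pairs)
for (res_a, res_b), d in distances.items():
    print(f"CA {res_a} to CA {res_b}: {d:.2f} Å")
```

An increasing distance along the paired positions is the signature of two strands splaying apart into a wedge.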
aThe amino acid atom is the specific atom of the amino acid, as identified in Protein Data Bank file 2JDD;
bX, Y, and Z are the three-dimensional coordinates specifying the distance in Angstroms of the amino acid atom relative to the center of mass of the crystal.
cThe distance is the interstrand (β4/β5) distance of the two corresponding C-alpha atoms.
aThe amino acid atom is the specific atom of the amino acid, as identified in Protein Data Bank file 2JDD;
bX, Y, and Z are the three-dimensional coordinates specifying the distance in Angstroms of the amino acid atom relative to the center of mass of the crystal.
cThe distance is the interstrand (β4/β5) distance of the two corresponding C-alpha atoms.
Alternatively, the GNAT wedge joining region can comprise a structural variant of the GNAT wedge joining region defined by the atomic coordinates of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10, wherein the structural variant comprises a root mean square deviation from the back-bone atoms of the amino acids of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10 of not more than about 4 Å, including but not limited to about 3.5 Å, 3 Å, 2.5 Å, 2 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å, and 0.1 Å. In some of these embodiments, the variant GNAT wedge joining region comprises a root mean square deviation from the back-bone atoms of the amino acids of the structure defined by the atomic coordinates of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10 of not more than about 2.0 Å.
The analysis described elsewhere herein (see Experimental Example 1) describes two independent structural inter-subdomain motion modes within the GLYAT polypeptide involving the GNAT wedge, wherein the wedge joining region serves as a hinge for both the observed wedge opening and wedge twisting motions. Without being bound by any theory or mechanism of action, it is believed that these motions play a role in controlling the access of AcCoA, determining bound AcCoA's conformation, facilitating the egress of CoA, and facilitating the binding of glyphosate and that the mutations in the wedge joining region found in the optimized GLYAT variants contribute to the enhanced catalytic activity (and perhaps the enhanced glyphosate binding affinity and specificity) associated with these optimized variants.
The three-dimensional molecular structure of the GLYAT wedge joining region is compared to the provided three-dimensional molecular structure of a candidate polypeptide to determine if the structure of the candidate polypeptide comprises the wedge joining region of the GLYAT polypeptide (comprising the atomic coordinates of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10). In some of these embodiments, the candidate polypeptide is known to comprise a GNAT wedge or is suspected of comprising a GNAT wedge based on sequence similarity to protein members of the GNAT superfamily (see Dyda et al. (2000) Annu Rev. Biophys. Biomol. Struct. 29:81-103, which is herein incorporated by reference in its entirety). A candidate polypeptide can be suspected of comprising a GNAT wedge if the candidate polypeptide exhibits at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or higher sequence similarity to a member of the GNAT superfamily of N-acetyltransferases. In some of these embodiments, the candidate polypeptide has been shown to exhibit N-acetyltransferase activity or is suspected of having N-acetyltransferase activity (based on sequence similarity with other N-acetyltransferases). The candidate polypeptide can be suspected of having N-acetyltransferase activity if the candidate polypeptide exhibits at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or higher sequence similarity to a known N-acetyltransferase. In certain embodiments, the candidate polypeptide comprises a GLYAT polypeptide and the substrate comprises glyphosate.
A candidate polypeptide is considered to comprise the GNAT wedge joining region of the GLYAT polypeptide if the candidate polypeptide comprises a region wherein the back-bone atoms of the amino acids of this region have no more than about 4 Å root mean square deviation from the backbone atoms of the amino acids provided in Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10, including but not limited to about 4 Å, 3.5 Å, 3 Å, 2.5 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å, and 0.1 Å. In some embodiments, the two molecular structures are considered the same if the root mean square deviation between the back-bone atoms of the amino acids of this region is no more than about 2 Å. Any method known in the art can be used to compare the two three-dimensional molecular structures to determine if the candidate polypeptide comprises the GNAT wedge joining region, including those described elsewhere herein.
It is to be noted that the candidate polypeptide can be considered to comprise the GNAT wedge joining region of the GLYAT polypeptide (comprising the atomic coordinates of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10) even if the particular residue numbers of the GLYAT polypeptide and candidate polypeptide are dissimilar, as long as the atomic coordinates of the amino acid atoms are the same (or wherein the back-bone atoms of the amino acids of this region have no more than about 4 Å root mean square deviation from the backbone atoms of the amino acids provided in Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10, as discussed above). For example, the arginine residue at position 73 in the GNAT wedge joining region of the GLYAT R7 variant polypeptide listed in Table 9 can correspond to an arginine residue in the substrate binding cavity of the candidate polypeptide that is not at the 73rd position in the amino acid sequence of the candidate polypeptide. One of skill in the art will appreciate that the two molecular structures can still be considered the same or similar as long as the three-dimensional molecular structure of the candidate polypeptide comprises the atomic coordinates within Table 9 (or a variation thereof), regardless of the positioning of a given residue within the polypeptide chain.
In some embodiments, the methods of the invention further comprise altering the primary structure of the candidate polypeptide to maximize a similarity or relationship between the three-dimensional molecular structures of the candidate polypeptide and the GNAT wedge joining region of the GLYAT polypeptide (comprising the atomic coordinates of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10). Any method known in the art can be used to alter the primary structure of the candidate polypeptide, including those described elsewhere herein. Candidate polypeptides whose primary structure has been modified to provide a better fit with the GNAT wedge joining region of the GLYAT polypeptide can be tested, using any method known in the art, for the ability to acetylate a substrate at a higher catalytic rate when compared to a native GLYAT polypeptide. In these embodiments, the catalytic rate will be determined under optimal conditions (e.g., non-limiting substrate). In this way, the methods of the invention provide for the identification of N-acetyltransferases that exhibit enhanced catalytic activity over native GLYAT polypeptides.
The methods can further comprise producing the candidate polypeptide having the GNAT wedge joining region described herein (comprising the atomic coordinates of Table 7, Tables 7 and 9, Table 8, or Tables 8 and 10). The candidate polypeptide can be synthesized using any method known in the art. The catalytic rate of the candidate polypeptide against a substrate (e.g., glyphosate) can then be assayed to determine if the candidate polypeptide has improved catalytic activity when compared to native GLYAT.
The presently disclosed subject matter further provides methods for evaluating the potential of a variant GLYAT polypeptide to associate with glyphosate with a higher binding affinity when compared to a native GLYAT polypeptide, higher binding specificity when compared to a native GLYAT polypeptide, or a combination thereof through the provision of a three-dimensional molecular structure of a variant GLYAT polypeptide. As described elsewhere herein, structural analysis of the altered amino acid residues between the optimized R11 and R7 variants compared with the native GLYAT identified three residue substitution trends associated with improved functionality; (1) increased positive charge through surface residue substitution, (2) expansion of the substrate binding cavity and (3) relaxation of the protein's interior packing density through downsizing amino acid substitution.
There are a total of 21 amino acid substitutions from the native GLYAT to the R7 variant, and 12 more from the R7 to R11 (
Of the interior substitutions, only 4, Y31F-V114A-I132T-I135V, are at the active site and they are all downsizing changes, i.e. residues with larger side-chain are replaced by relatively smaller ones (Table 14). V114A makes a direct contact with the pantetheine motif of AcCoA. I132T and I135V are located at the β-hairpin and interact with glyphosate's phosphono group. Y31F directly contacts the substrate carboxyl group through a van der Waals attraction in R7 and/or a hydrogen bond in the native GLYAT. These four substitutions effectively increase the size of the substrate binding-site. As described earlier (Siehl et al. (2007) J Biol Chem 282:11446-11455), the substrate most active with native GLYAT is D-AP3 (
Besides the four substitutions at the active site, other interior substitutions show the same downsizing trend, totaling 7 from the native to R7 (Y31F, T33S, T89S, V114A, Y130F, I132T and I135V, Table 14) and 6 more substitutions from the R7 to R11 (I19V, L36T, Y45F, I53V, M75V and I91V, Table 16). As a consequence, the overall molecular weight of R7 was 90 units smaller, 16,600 Da (R7) vs. 16,690 Da (native). These downsizing substitutions systematically created numerous small cavities, as with T33S and M75V, or abolished some internal hydrogen bonds, such as Y45F and Y130F, in the protein core, relaxing the protein's packing density. It is well documented that structural flexibility is inversely related to packing density (Halle (2002) Proc. Natl. Acad. Sci. USA 99:1274-1279). Mutagenesis and theoretical approaches have shown that introducing new interior cavities in some instances may decrease a protein's thermal stability (Matsumura et al. (1988) Nature, 334, 406-410; Eriksson et al (1992) Science, 255, 178-183; Xu et al. (1998) Protein Sci. 7(1):158-177). On the other hand, in some instances, filling cavities can inhibit the motion of functionally important regions of a protein, thereby diminishing its catalytic activity (Ogata et al., (1996). Nat. Struct. Biol., 3, 178-187). Thus, the greater flexibility of optimized GLYATs is important for their improved functionality.
The GLYAT variant's structural characteristics in the absence of both substrate and cofactor AcCoA can be studied by a molecular dynamics simulation of an unliganded apo-enzyme. Without the bound ligands, the protein undergoes a large and hinge-like subdomain motion along the V-shaped wedge, and consequently the binding cavities for both substrate and cofactor are wide open. The binding site openness can be measured by calculating the average wedge angle and by measuring an inter-loop distance of the substrate binding loops, the β-hairpin and loop20. As used herein, a “wedge angle” is defined by the formula α+β−180°, wherein α comprises the angle formed by the Cα carbons in the following amino acid residues: alanine at position 76, leucine at position 72 and cysteine at position 108; and wherein β comprises the angle formed by the Cα carbons in the following amino acid residues: leucine at position 72, cysteine at position 108, and arginine at position 111 (see
As used herein, a “molecular dynamics simulation” refers to a simulation method devoted to the calculation of the time dependent behavior of a molecular system in order to investigate the structure, dynamics and thermodynamics of molecular systems by solving the equation of motion for a molecule. This equation of motion provides information about the time dependence and magnitude of fluctuations in both positions and velocities of a given molecule. The direct output of molecular dynamics simulations is a set of “snapshots” (coordinates and velocities) taken at equal time intervals, or sampling intervals. Depending on the desired level of accuracy, the equation of motion to be solved may be the classical (Newtonian) equation of motion, a stochastic equation of motion, a Brownian equation of motion, or even a combination (Becker et al. (2001) eds. Computational Biochemistry and Biophysics New York). There are a number of ways to implement molecular dynamics simulations and examples of suitable simulation packages include, but are not limited to, CHARMM (Brooks et al. (1983) J Comp. Chem. 4:187-217), AMBER ((2005) J. Computat. Chem. 26:1668-1688), GROMACS (van der Spoel et al. (2005) J Comp. Chem. 26:1701-1718), TINKER (Ponder et al. (1987) J. Comput. Chem. 8:1016-1024), NAMD (Phillips et al. (2005) J. Comput. Chem. 26:1781-1802) and LAMMPS (Plimpton (1995) J. Comp. Phys. 117:1-19). Any method known in the art for performing a molecular dynamics simulation can be used, including the methods described elsewhere herein (see Experimental section). For example, simulations with the CHARMM 27 force field (MacKerell et al. (2004) Journal of Computational Chemistry 25:1400-1415) or GROMACS simulations with the OPLS-AA/L force field (Jorgensen et al. (1996) J. Am. Chem. Soc. 118: 11225-11236; Kaminski et al. (2001) J. Phys. Chem. 105:6474-6487) can be performed.
The sampling interval (that is, the duration of the molecular dynamics trajectory) is determined according to the time scale of the protein motion to be sampled. In some embodiments of the presently disclosed methods, the sampling interval of the molecular dynamics simulation is about 0.1, 1, 2, 4, 6, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500 nanoseconds or greater. In some of these embodiments, the molecular dynamics simulation occurs over an interval of about 10 nanoseconds. The average wedge angle of the GNAT wedge of the variant GLYAT polypeptide is determined over the specified sampling interval. In certain embodiments, a maximal wedge angle over an entire sampling interval of a molecular dynamics simulation of at least about 41°, including but not limited to about 42°, 43°, 44°, 45°, 46°, 47°, 48°, 49°, 50°, 51°, 52°, 53°, 54°, 55° or greater, indicates that the variant GLYAT polypeptide associates with glyphosate with a higher binding affinity, higher binding specificity or both when compared to a native GLYAT polypeptide.
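By way of non-limiting illustration, the wedge angle defined above (α+β−180°) and its average and maximum over a set of snapshots can be sketched as follows. This is a minimal sketch under stated assumptions: each angle is taken at the middle-listed Cα atom (Leu72 for α, Cys108 for β), which is an interpretation of the definition rather than something stated expressly; the function names are hypothetical; and the coordinates shown are invented for illustration and are not taken from any simulation described herein.

```python
import math

def angle_deg(p1, vertex, p2):
    """Angle in degrees at `vertex` formed by the points p1-vertex-p2."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard against rounding
    return math.degrees(math.acos(cos_t))

def wedge_angle(ala76, leu72, cys108, arg111):
    """Wedge angle alpha + beta - 180 from four C-alpha positions.

    Assumption: alpha is measured at Leu72 and beta at Cys108.
    """
    alpha = angle_deg(ala76, leu72, cys108)
    beta = angle_deg(leu72, cys108, arg111)
    return alpha + beta - 180.0

def summarize_wedge(frames, threshold=41.0):
    """Average and maximal wedge angle over snapshots, plus a threshold check."""
    angles = [wedge_angle(*frame) for frame in frames]
    return {
        "average": sum(angles) / len(angles),
        "maximum": max(angles),
        "meets_threshold": max(angles) >= threshold,
    }

# Hypothetical snapshots: (Ala76, Leu72, Cys108, Arg111) C-alpha coordinates.
closed = ((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
opened = ((-1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0))
print(summarize_wedge([closed, opened]))
```

In this toy geometry the two arms of the wedge are parallel in the first snapshot (wedge angle of 0°) and splayed apart in the second (wedge angle of 90°), so the summary reports an average of 45° and a maximum of 90°.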
The following terms are used to describe the sequence relationships between two or more polynucleotides or polypeptides: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, and, (d) “percentage of sequence identity.”
(a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.
(b) As used herein, “comparison window” makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two polynucleotides. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence a gap penalty is typically introduced and is subtracted from the number of matches.
Methods of alignment of sequences for comparison are well known in the art. Thus, the determination of percent sequence identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller (1988) CABIOS 4: 11-17; the local alignment algorithm of Smith et al., (1981) Adv. Appl. Math. 2:482; the global alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453; the search-for-local alignment method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85: 2444-2448; the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87: 2264-2268, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877.
Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys Inc., 9685 Scranton Road, San Diego, Calif., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. (1988) Gene 73: 237-244; Higgins et al. (1989) CABIOS 5: 151-153; Corpet et al. (1988) Nucleic Acids Res. 16:10881-90; Huang et al. (1992) CABIOS 8: 155-65; and Pearson et al. (1994) Meth. Mol. Biol. 24: 307-331. The ALIGN program is based on the algorithm of Myers and Miller (1988) supra. A PAM 120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used with the ALIGN program when comparing amino acid sequences. The BLAST programs of Altschul et al (1990) J. Mol. Biol. 215:403 are based on the algorithm of Karlin and Altschul (1990) supra. BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the invention. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to a protein or polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra.
When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used. BLAST software is publicly available on the NCBI website. Alignment may also be performed manually by inspection.
In some embodiments of the present methods, some steps, preferably the determining step, can be implemented by a machine whereas the evaluation or evaluating step is conducted by a person. Computer programs disclosed herein or known in the art for comparing three-dimensional molecular structures are suitable for the present methods. More specifically, the one or more steps are implemented by a machine-readable program code on a machine-readable medium and configured for execution by a machine such as a computer. General purpose machines may be used with the programs described herein or other suitable programs for executing one or more steps of the presently described methods. However, preferred embodiments are implemented in one or more computer programs executing on programmable systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The program is executed on the processor to perform the functions described herein.
Each such program may be implemented in any desired computer language (including machine, assembly, high level procedural, object oriented programming languages, or the like) to communicate with a computer system. In any case, the language may be a compiled or interpreted language. The computer program will typically be stored on a storage media or device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
As used herein, the phrase “computer-readable storage medium” refers to any medium or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine readable medium includes machine readable storage media (read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices); machine readable transmission media (electrical, optical, acoustical or other form of propagated signals, e.g., carrier waves, infrared signals, digital signals, etc.); floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 using the following parameters: % identity and % similarity for a nucleotide sequence using GAP Weight of 50 and Length Weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using GAP Weight of 8 and Length Weight of 2, and the BLOSUM62 scoring matrix; or any equivalent program thereof. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.
GAP uses the algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453, to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps. It allows for the provision of a gap creation penalty and a gap extension penalty in units of matched bases. GAP must make a profit of gap creation penalty number of matches for each gap it inserts. If a gap extension penalty greater than zero is chosen, GAP must, in addition, make a profit for each gap inserted of the length of the gap times the gap extension penalty. Default gap creation penalty values and gap extension penalty values in Version 10 of the GCG Wisconsin Genetics Software Package for protein sequences are 8 and 2, respectively. For nucleotide sequences the default gap creation penalty is 50 while the default gap extension penalty is 3. The gap creation and gap extension penalties can be expressed as an integer selected from the group of integers consisting of from 0 to 200. Thus, for example, the gap creation and gap extension penalties can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 or greater.
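By way of non-limiting illustration, a global alignment score with separate gap creation and gap extension penalties can be sketched as follows. This is not the GAP program itself; it is a minimal sketch of a Needleman-Wunsch style dynamic program with affine gaps in which a gap of length L costs the gap creation penalty plus L times the gap extension penalty. The function name, the simple match and mismatch scores (used here in place of a scoring matrix), and the penalization of end gaps are assumptions made for illustration.

```python
NEG_INF = float("-inf")

def affine_gap_score(seq_a, seq_b, gap_creation=8, gap_extension=2,
                     match=5, mismatch=-4):
    """Best global alignment score with affine gap penalties.

    Assumptions: a gap of length L costs gap_creation + L * gap_extension,
    end gaps are penalized, and match/mismatch scores stand in for a
    scoring matrix such as BLOSUM62.
    """
    n, m = len(seq_a), len(seq_b)
    first = gap_creation + gap_extension  # cost of the first gap position
    # M: last column pairs two residues; X: residue of seq_a against a gap;
    # Y: residue of seq_b against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_creation + i * gap_extension)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_creation + j * gap_extension)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - first,
                          X[i - 1][j] - gap_extension,
                          Y[i - 1][j] - first)
            Y[i][j] = max(M[i][j - 1] - first,
                          Y[i][j - 1] - gap_extension,
                          X[i][j - 1] - first)
    return max(M[n][m], X[n][m], Y[n][m])

# Six matches (6 x 5) minus one gap of length two (8 + 2 x 2).
print(affine_gap_score("AACCGGTT", "AACCTT"))
```

With the illustrative scores above, inserting the two-residue gap is worthwhile because the matches it enables outweigh the gap creation penalty plus the two extension penalties, which mirrors the "profit" requirement described for GAP.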
GAP presents one member of the family of best alignments. There may be many members of this family, but no other member has a better quality. GAP displays four figures of merit for alignments: Quality, Ratio, Identity, and Similarity. The Quality is the metric maximized in order to align the sequences. Ratio is the quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the symbols that actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for a pair of symbols is greater than or equal to 0.50, the similarity threshold. The scoring matrix used in Version 10 of the GCG Wisconsin Genetics Software Package is BLOSUM62 (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915).
(c) As used herein, “sequence identity” or “identity” in the context of two polynucleotides or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have “sequence similarity” or “similarity”. Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).
(d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
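By way of non-limiting illustration, the calculation of percentage of sequence identity described in (d) can be sketched as follows. This is a minimal sketch assuming that the two sequences have already been optimally aligned over the comparison window, that gaps are written as “-”, and that the total number of positions in the window includes gapped positions; the function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an aligned comparison window.

    Assumptions: inputs are equal-length, pre-aligned strings; gapped
    positions count toward the total but never count as matches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)

# Hypothetical aligned sequences: 6 matched positions out of 8.
print(percent_identity("ACGTAC-T", "ACGTTCGT"))
```

Other conventions for the denominator exist (for example, excluding gapped positions or using the length of the shorter sequence), and those would give different percentages for the same alignment.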
It is to be noted that the term “a” or “an” entity refers to one or more of that entity; for example, “a polypeptide” is understood to represent one or more polypeptides. As such the terms “a” (or “an”), “one or more,” and “at least one” can be used interchangeably herein.
Throughout this specification and the claims, the words “comprise,” “comprises,” and “comprising” are used in a non-exclusive sense, except where the context requires otherwise.
As used herein, the term “about,” when referring to a value is meant to encompass variations of, in some embodiments ±50%, in some embodiments ±40%, in some embodiments ±30%, in some embodiments ±20%, in some embodiments ±10%, in some embodiments ±5%, in some embodiments ±1%, in some embodiments ±0.5%, and in some embodiments ±0.1% from the specified amount, as such variations are appropriate to perform the disclosed methods or employ the disclosed compositions.
All publications and patent applications mentioned in the specification are indicative of the level of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.
The following examples are offered by way of illustration and not by way of limitation.
Optimized variants of glyphosate N-acetyltransferase (GLYAT) from B. licheniformis efficiently catalyze the acetylation of glyphosate, a broad-spectrum and non-selective herbicide, and confer resistance in transgenic plants. Structural modeling and molecular dynamics (MD) simulations were performed on the native enzyme, 7th (R7) and 11th (R11) round variants from DNA shuffling experiments (Keenan et al. (2005) Proc Natl Acad Sci USA 102(25):8887-8892), and a revertant form of R7 in which all four active site substitutions were changed back to the wild type form (YVII). Structural analysis revealed that the efficiency enhancement of the shuffling variants coincided with interior bulky residues being mutated to smaller ones. Substitutions that exemplify that trend in evolving native GLYAT to R7 include Y31F, T33S, T89S, V114A, I132T, Y130F and I135V; and from R7 to R11, I19V, L36T, Y45F, I53V, M75V, I91V. MD simulations showed that the more optimized GLYAT roughly had a larger amplitude of fluctuation and inter-subdomain motion, supporting the hypothesis that the interior downsizing mutations reduced the enzyme's core packing strength, resulting in more flexibility. Two major substrate binding elements, loop20 connecting the α1 and α2 helices and the β-hairpin connecting the β6 and β7 strands, were the most flexible. In the absence of ligand, loop20 and the β-hairpin drift more than 16 Å apart from their closed form when bound to ligand. The β-hairpin, containing a type VIa β-turn and two downsizing mutations I132T and I135V, apparently plays a role in regulating the active site conformation and determining substrate specificity. The Principal Component Analysis of an MD trajectory identified two novel, independent inter-subdomain motion modes involving the signature V-shaped wedge: wedge opening and wedge twisting.
These long range motions might be a unique feature of the GCN5-related N-acetyltransferase (GNAT) superfamily fold and could be useful in understanding GNAT's structure-function relationship.
X-ray crystal structures of R7 GLYAT (from the 7th round of gene shuffling) complexed with AcCoA and 3-phosphoglycerate (3PG), a competitive inhibitor with respect to glyphosate, revealed the active site architecture. See PDB:2JDD for the atomic coordinates and structure factors of the X-ray crystal structure of the ternary complex of R7 GLYAT with AcCoA and 3PG and PDB:2JDC for the atomic coordinates and structure factors of the X-ray crystal structure of the binary complex of R7 GLYAT with oxidized CoA and sulfate bound in the glyphosate binding pocket. See Tables 11 and 12 for the atoms of the R7 GLYAT variant polypeptide and of AcCoA that contact 3PG (i.e., the substrate binding cavity) and the residues of R7 that contact AcCoA, respectively.
aThe amino acid atom is the specific atom of the amino acid, as identified in Protein Data Bank file 2JDD;
bX, Y, and Z are the three-dimensional coordinates specifying the distance in Angstroms of the amino acid atom relative to the center of mass of the crystal;
cAtoms of 3PG or AcCoA are defined in PDB:2JDD and FIG. 2.
aThe name convention and structure are the same as in Table 11.
In the ternary complex, 3PG sits on a platform defined by the pseudo-β sheet of the two splaying β4 and β5 strands and the pantetheine moiety of the cofactor, with the main-chain of 3PG perpendicular to the β-strands. The inhibitor is covered by two tip-joining loops, loop20 connecting α1/α2 and loop130 (or β-hairpin) spanning β6/β7. Surprisingly, the 21 amino-acid differences between the R7 and wild-type GLYAT are almost evenly distributed across the entire structure; none of the 3PG ligation residues (L20, Arg21, Gly74, Arg73, Arg111, and His138) are altered; and only four amino acid differences are in the perimeter of the active site, with Y31F, I132T, and I135V near 3PG and V114A close to AcCoA (Siehl et al. (2007) J Biol Chem 282:11446-11455). On the other hand, it has been documented that mutations distal to the active site can affect protein functions such as drug resistance (Perryman et al. (2004) Protein Sci 13:1108-1123), allosteric regulation (Taly et al. (2006) Proc. Natl. Acad. Sci. USA 103(45):16965-16970; Berendsen & Hayward (2000) Curr Opin Struct Biol 10(2):165-169), and ligand binding specificity (Ma et al. (2005) Biophysical Journal 89:1183-1193), often through long range correlated motion or conformational changes (Ma et al. (2002) Protein Sci 11:184-197). Thus, investigating GLYAT's dynamic characteristics and conformational flexibility is crucial to understanding the mechanism of its functional evolution and to further facilitating the development of new herbicide tolerance genes. Provided herein is a structural modeling and/or molecular dynamics (MD) study on the 7th round (R7), the 11th round (R11), YVII, and wild type GLYAT in various ligation states. YVII is a revertant mutant in which the four substitutions near the active site of R7 (Y31, V114, I132 and I135) were mutated back to wild-type.
In fully liganded complex MD simulations, glyphosate, 3PG, or D-AP3 were modeled separately to examine the intimate details of the interaction between ligand and the enzymatic active site. To verify the findings, some simulations were carried out on two independent platforms, CHARMm 31b1 with the CHARMM 27 force field and Gromacs with the OPLS-AA force field. All the simulations were performed in explicit solvent for multiple nanoseconds. This study characterized a novel open conformation, a transition mechanism between an open and closed active site, and inter-subdomain hinge motions around the wedge, and showed that the activity enhancement resulting from shuffling correlated with decreased protein core packing density or increased structural fluctuation. This is the first major simulation study applied to a member of the GNAT superfamily.
Analysis of Shuffling Changes through Structure Modeling:
Structure models of R11 and native GLYAT with bound ligands were built based on the crystal structure of R7 GLYAT complexed with AcCoA and 3PG (Siehl et al. (2007) J Biol Chem 282:11446-11455). After a series of energy minimizations under various constraints, the resulting models were similar to the R7 structure with RMSDs of <0.9 Å over all Cα atoms. MD simulations in explicit solvent were applied to further relax any outstanding strains. Harmonic constraints on heavy atoms in the protein were applied for the first 300 ps, followed by free simulation for the next >500 ps. In the presence of ligands, the models remained stable over the course of the simulations and the trajectory RMSDs of heavy atoms over the initial structures were comparable to those observed in R7 GLYAT, suggesting that the models were reasonably accurate.
The complete atomic coordinates of the GLYAT R7 variant bound to acetyl coA and glyphosate can be found in Table 18, whereas the complete atomic coordinates of the GLYAT R11 variant bound to acetyl coA and glyphosate are provided in Table 19.
Between the native GLYAT and the R7 variant, there are a total of 21 amino acid substitutions (
Regarding the 11 interior mutations, four of them were simply isomer switches between Leu and Ile (I15L, L26I, L97I, and L145I) that are unlikely to alter catalytic efficiency in a significant way. Strikingly, the other 7 buried or partially buried substitutions all showed a clear trend that the larger residues of the native protein were replaced by smaller ones in R7: Y31F, T33S, T89S, V114A, Y130F, I132T, and I135V (Table 14). As a consequence, the overall molecular weight of R7 was 90 units smaller, 16,600 Da (R7) vs. 16,690 Da (native). Of these downsizing substitutions, Y31F, V114A, I132T and I135V are at the active site. V114A makes direct contact with the pantetheine motif of AcCoA. I132T and I135V are located at the glyphosate binding β-hairpin while Y31F directly contacts the substrate through either a hydrogen bond in the native or a van der Waals attraction in R7. These four substitutions effectively increase the size of the enzyme's substrate binding site. As described earlier (Siehl et al. (2007) J Biol Chem 282: 11446-11455), the substrate most active with native GLYAT is D-AP3 (
A total of 12 more substitutions were observed between R7 and R11 with only four mutations (E14D, G38S, Q67K and K119R) on the surface and eight mutations (I19V, L36T, Y45F, I53V, M75V, I91V, L105M and L106I) being fully or partially buried in the liganded structure (
Gene shuffling has reshaped the protein surface properties such as increasing the net positive charge and altering the dipole. It also directly increased the volume of the substrate binding site to accommodate the larger glyphosate. Other systematically downsizing substitutions created numerous small cavities and/or abolished some internal hydrogen bonds in the protein core. Structural flexibility is inversely related to protein packing density (Halle (2002) Proc. Natl. Acad. Sci. USA 99:1274-1279). On the other hand, filling cavities can inhibit the motion of functionally important regions of a protein, thereby diminishing its catalytic activity (Ogata et al., (1996). Nat. Struct. Biol., 3, 178-187). Thus, the greater flexibility of optimized GLYATs may be needed for its functional improvement.
The improvement of GLYAT catalytic efficiency by gene shuffling was contributed in part through an enhancement of substrate recognition, as the glyphosate KM decreased from 1.27 mM for native GLYAT, to 0.24 mM for R7, and to 0.055 mM for R11 (Siehl et al. (2007) J Biol Chem 282:11446-11455). The crystal structures in complex with ligands showed that the glyphosate binding site is located near the center of the enzyme and buried by the two binding loops, loop20 and loop130, or β-hairpin (
To gain insights into the conformational transition of GLYAT's active site, molecular dynamics simulations were performed for the apoenzyme. The 3PG structure (PDB:2JDD) was used as the starting coordinates with all the crystal waters kept, but ligands deleted. The empty space left by the removal of the ligands was filled with waters and brought to equilibrium by >200 ps MD simulations with protein heavy atoms under harmonic constraints. A ˜3 ns MD simulation of the R7 GLYAT variant was first run using CHARMm with the CHARMM 27 force field and TIP3P waters. The simulation produced a stable trajectory and, most significantly, the two binding loops started opening up at ˜200 ps. To confirm the findings, simulations with GROMACS were carried out with the OPLS-AA force field and SPC waters for up to ˜11 ns, including a ˜1 ns equilibration phase. The results from the two methods were very similar, consistent with a recent literature report that most of the major conformational dynamics behaviors detected with MD are force field independent (Rueda et al. (2007) Proc. Natl. Acad. Sci. USA 104(3):796-801). In comparing the trajectories between 1.8 and 3.0 ns, we noticed that the CHARMm simulation produced relatively larger fluctuations and underwent a faster conformational evolution. For CHARMm and Gromacs, respectively, the RMSF of all the protein heavy atoms were 1.01±0.52 and 0.89±0.45 Å, while the average RMSD of heavy atoms compared to their initial structures were 2.68±0.18 and 2.06±0.13 Å. Due to the longer simulation periods enabled by its higher computing speed, only the Gromacs results are reported herein. R11 and YVII GLYATs in the absence of ligand were also simulated (Table 17).
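By way of non-limiting illustration, a root mean square fluctuation (RMSF) summary of the kind reported above (mean ± standard deviation over atoms) can be sketched as follows. This is a minimal sketch assuming that every frame of the trajectory has already been superimposed on a common reference and lists the same atoms in the same order; the function names and the two-atom toy trajectory are hypothetical and do not reproduce the analysis tools actually used.

```python
import math

def rmsf_per_atom(frames):
    """Per-atom RMSF from a list of frames, each a list of (x, y, z) tuples.

    Assumption: frames are pre-aligned, so fluctuations reflect internal
    motion rather than overall rotation or translation.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

def mean_and_std(values):
    """Mean and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

# Toy trajectory: atom 0 is fixed, atom 1 oscillates by 1 unit along x.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
fluctuations = rmsf_per_atom(trajectory)
print(fluctuations, mean_and_std(fluctuations))
```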
All three trajectory RMSDs of heavy atoms to the initial structures were stabilized after ˜400 ps and the overall values in the 10 ns production phase were less than 3.3, 2.5, and 2.2 Å for R11, R7, and YVII, respectively (
An overlay of the α-carbon traces of snapshots of the open and closed conformations of R7 GLYAT shows that the β hairpin and loop20 underwent the biggest conformational changes (
Principal Component Analysis (PCA) of MD trajectory is an efficient way to filter high frequency motion and capture low frequency but highly correlated motions that often have biological significance (Kitao & Go (1999) Curr. Opin. Struc. Biol. 9:164-169; Ota & Agard (2001) Protein Sci 10(7):1403-1414). Covariance matrices were built from backbone atoms of 7,000 frames (<7 ns). The resultant eigenvalues showed that the first two eigenvectors predominated. Their projected motions are delineated in
To probe the stability of the wedge over the MD simulation, we defined a dihedral angle with four Cα atoms, Ala76, Leu72, Cys108 and Arg111, (
As hinge-like, broad-range motions are usually determined by a protein's overall structure (Sinha and Nussinov (2001) Proc. Natl. Acad. Sci. USA 98:3139-3144), GLYAT's inter-subdomain motions involving wedge opening and twisting were apparently a feature of its unique topology. In the GLYAT structure, the most stable elements were helix α3 and the surrounding seven-stranded β sheet, which is split by the wedge at one end. The first four strands (β1-β4 in subdomain I) wrap against helix α3, while strands β5-β7 in subdomain II interact with α3 only at the wedge-joining end. At the other end, helix α4 acts like a spring inserted between the subdomains, enabling the inter-subdomain movements. Conceivably, this inter-subdomain motion involving the well-conserved structural elements plays a role in controlling the access of AcCoA, determining the conformation of bound AcCoA, and facilitating the egress of CoA.
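A wedge-twisting angle of the kind monitored here is a dihedral defined by four Cα positions. The following is a minimal sketch of one common way to compute a signed dihedral from four points. The helper names and the toy coordinates are hypothetical, and the sign convention is simply the one this code produces, which may differ from that of the analysis tools actually used.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    axis = tuple(x / length for x in b1)
    # Components of b0 and b2 perpendicular to the central axis
    v = sub(b0, tuple(dot(b0, axis) * x for x in axis))
    w = sub(b2, tuple(dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))

# Toy coordinates standing in for four C-alpha atoms (hypothetical values)
ca_1, ca_2, ca_3, ca_4 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
print(dihedral_deg(ca_1, ca_2, ca_3, ca_4))
```

Evaluating such a function on every saved frame yields a time series whose mean and standard deviation summarize the wedge geometry over a trajectory.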
The motion associated with the active site conformational change is enacted by the β-hairpin and loop20, the least conserved motifs in the GNAT family. The β-hairpin, comprising residues 130 to 138 (FDTPPVGPH in R7), connects β6 and β7, with the four middle residues (TPPV) forming a typical type VIa β-turn (Richardson (1981) Adv. Protein Chem. 34:167-339). The two consecutive prolines, Pro133 and Pro134, reduce its flexibility, with Pro133 adopting a trans- and Pro134 a cis-conformation. Such structural motifs are often associated with molecular recognition and function, including type VI β-turns in HIV-1IIIB (Tugarinov et al. (1999) Nat. Struct. Biol. 6(4): 331-335), Bowman-Birk proteinase inhibitor (Brauer et al. (2002) Biochemistry 41(34):10608-10615), and disulfide oxidoreductase (DsbA) (Charbonnier et al. (1999) Protein Sci 8:96-105). Here, the β-hairpin covers glyphosate's phosphono group and also harbors the putative catalytic base His138 (
The MD simulations also suggested that the reduced stability of the β-hairpin in optimized GLYAT variants might be responsible for accelerating the active site opening. In the crystal structure of the R7-3PG complex, both the β-hairpin and loop20 cover 3PG and make direct van der Waals contacts through their tip regions, including the side chains of Val135 with Arg21 and Pro134 with Gln24. The aliphatic side chain of Arg111 and the β-hairpin also align with each other. The interloop van der Waals contacts of YVII GLYAT were well maintained, whereas these same contacts were lost quickly as a consequence of a large conformational adjustment of the β-hairpin in the R7 and R11 simulations. Indeed, revertant mutations at the β-hairpin of R7 significantly elevated the KM for glyphosate, by 3.2- and 6.4-fold for T132I and V135I, respectively, reflecting the fact that the enhanced β-hairpin flexibility partially enables optimized GLYAT variants to better associate with glyphosate. In summary, the more optimized GLYAT apparently showed a larger amplitude of fluctuation and inter-subdomain motion in the simulation, associated with and probably a consequence of the selection of an ensemble of downsizing substitutions.
The partially or fully liganded simulations were carried out with the CHARMm 27 force field. The ligand topology and parameters of AcCoA, glyphosate and D-AP3 were generated by InsightII (Accelrys, San Diego). The partial charge values were calculated with vcharge (
(1) Binary complex of R7+AcCoA: The recognition mode of the cofactor in all the known structures is extremely similar despite high divergence in their primary sequences and, in fact, the GNAT fold seems to have been optimized around the binding of the phosphopantetheine motif (Dyda et al. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:81-103). The pantetheine arm and β4 form a pseudo β-sheet, and the interacting inter-strand hydrogen bonds were well preserved in the simulation. In R7 GLYAT, the average bond length spanning N4P of AcCoA and C═O of Gly75 was 2.91±0.18 Å and that spanning C═O of AcCoA and the amide N of Thr77 was 2.91±0.14 Å. The pyrophosphate moiety of AcCoA also maintained stable interactions with the protein, but its 3′ phosphate and ribosyl groups were solvent accessible and fluctuated widely. The kinetic mechanism of well-studied GNAT family members was shown to be ordered, with a preference for AcCoA binding first to the free enzyme, followed by the binding of acceptor substrates (Vetting et al. (2005) Protein Sci 12:1954-1959; De Angelis et al. (1998) J. Biol. Chem. 273:3045-3050), suggesting a structural role of the cofactor in organizing the active site (Dyda et al. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:81-103). With AcCoA bound in the wedge, the overall fluctuations across the entire protein core decreased but the glyphosate binding loops remained mobile. The flexibility of the β-bulge was reduced, apparently due to interaction with the acetyl carbonyl group of AcCoA. Regarding the subdomain motion, the angles of the V-shaped wedge opening and twisting were 36.3±2.8° and −14.6±6.8°, significantly different from the unliganded values (45.9±5.1° and −9.2±6.2°) but similar to those of the fully liganded crystal structure (33.04° and −16.34°). Conceivably, AcCoA binding severely restricts inter-subdomain fluctuation around the wedge.
(2) Ternary complexes of R7+AcCoA+glyphosate and YVII+AcCoA+D-AP3: The initial conformations of the substrates were modeled as follows. The atoms of glyphosate were mapped onto the corresponding positions of 3PG, since the two molecules have a similar main chain structure. D-AP3, a primary amine, has a shorter main chain and branched structure. Its phosphono and carboxyl groups were placed in the equivalent positions of 3PG and its amine was directed toward the acetyl of AcCoA. When the docked complex structures were carefully relaxed with energy minimizations in the presence of crystal waters, the initial substrate conformations were well retained. During the subsequent simulations, glyphosate remained in its initial conformation as did the phosphono and carboxyl groups of D-AP3. However, the D-AP3 amine started to sway away from AcCoA after ˜1.5 ns, resulting in an unproductive conformation. Compared with the binding site of glyphosate in R7, the D-AP3 binding site of YVII exhibited much less fluctuation and was more compact. As a consequence, the average trajectory RMSDs against the X-ray structure of backbone atoms were significantly different. The RMSD of D-AP3+AcCoA+YVII was 0.8±0.13 Å, much smaller than the 1.15±0.15 Å observed for glyphosate+AcCoA+R7. The higher stability of D-AP3+AcCoA+YVII apparently resulted from (a) the smaller and more rigid D-AP3 structure, (b) the hydrogen bond of the Y31 phenol to D-AP3's carboxyl, and (c) the increased hydrophobic packing of I132 and I135 in YVII compared to T132 and V135 in R7. These findings again demonstrate the effect of downsizing substitutions in increasing the protein flexibility.
Although glyphosate shares many similar features with 3PG, a difference in their binding mode was observed. During the simulation, the glyphosate structure adjusted at ˜100 ps, responding to the absence of an equivalent of the intramolecular hydrogen bond seen with 3PG between the 2-hydroxyl and a phosphate oxygen (
The starting coordinates of the complex of R7 GLYAT from the seventh round of gene shuffling with bound 3-phosphoglycerate (3PG) and AcCoA were taken from the X-ray structure, PDB:2JDD at 1.60-Å resolution. The initial structural coordinates of other GLYAT variants were constructed using InsightII's MODELER module but without invoking its auto energy minimization procedure (Accelrys, San Diego) and/or CHARMm IC facility (Brooks et al. (1983) J. Comput. Chem. 4:187-217). The in silico mutations based on R7-GLYAT included (1) F31Y, A114V, T132I and V135I for YVII GLYAT; (2) E14D, I19V, L36T, G38S, Y45F, I53V, Q67K, M75V, I91V, L105M, L106I and K119R for the R11-GLYAT; and (3) L15I, V19I, V132I, I26L, F31Y, S33T, R37G, G47R, Q58E, Q65E, Q67E, Q68E, S89T, K82R, I97L, R101K, A114V, K119E, F130Y, T132I, R144K and I145L for the native GLYAT, respectively (
Molecular dynamics (MD) simulations of all the liganded and unliganded systems (see Table 17 for a list of all the MD simulations that were performed) were carried out for >2,000 picoseconds (ps) with CHARMm 31b1 while, as a comparison, GROMACS 3.3.1 was also employed for the unliganded systems R7-GLYAT, YVII-GLYAT, and R11-GLYAT for longer simulation times (˜11,000 ps). For the CHARMm simulations, the residue topology and parameter files as generated by CHARMm 27 (MacKerell et al. (2004) Journal of Computational Chemistry 25:1400-1415) were used for protein atoms and ligands. The Verlet leap-frog algorithm was used to integrate the equations of motion with a time step of 2.0 fs. The SHAKE algorithm was used to constrain the bonds containing hydrogen to their equilibrium lengths. Electrostatic interactions were treated with a cutoff switch of 14 Å. A harmonic constraint with a force constant of 10 kcal·mol−1·Å−2 was applied to heavy atoms in the heating phase, from 240 to 300 K for ˜200 ps. The constraints were then applied only to non-water heavy atoms in the equilibration phase, which lasted >600 ps. Finally, all the constraints were released for the production phase at 300 K. For the GROMACS simulations, an OPLS-AA/L all-atom force field (Jorgensen et al. (1996) J. Am. Chem. Soc. 118:11225-11236; Kaminski et al. (2001) J. Phys. Chem. 105:6474-6487) was used and the NPT ensemble was computed at 300 K using the Berendsen thermostat. Electrostatics were treated with the particle mesh Ewald method with a short-range cut-off of 10 Å. The time step for integration was 2 fs, calculated with the leap-frog algorithm. The LINCS algorithm was used to constrain bond lengths. Each system was subjected to a 600-ps dynamics run with the protein restrained at 4.8 kcal·mol−1·Å−2 on all heavy atoms, followed by a 10 ns free simulation. All of the simulations were performed on a Linux cluster.
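As a quick check on the scale of these runs, the number of integration steps implied by a 2 fs time step can be worked out directly. The sketch below is arithmetic only. The function and dictionary names are hypothetical, and the phase durations are the restrained 600 ps run and the 10 ns production phase described for the GROMACS simulations.

```python
FS_PER_PS = 1000  # femtoseconds per picosecond

def n_steps(duration_ps, timestep_fs=2.0):
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps * FS_PER_PS / timestep_fs)

# Phase lengths in picoseconds (restrained run, then free production run)
phases_ps = {
    "restrained equilibration": 600,
    "free production": 10_000,
}

for name, duration in phases_ps.items():
    print(f"{name}: {n_steps(duration):,} steps")
```

At this time step the restrained phase corresponds to 300,000 steps and the production phase to 5,000,000 steps per system.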
Covariance analysis and principal component analysis (PCA, Tai et al. (2001) Biophys. J. 81:715-724) were performed on trajectories computed by either CHARMm or GROMACS to reduce the data complexity. The average displacements of backbone atoms over the trajectories were used as covariance variables. The covariance matrix and eigenvector analysis were obtained by applying the g_covar program of the GROMACS package. To capture the large-amplitude, low-frequency, dominant motions, the trajectories were projected onto the top two eigenvectors. All the graphs were prepared with PyMOL (http://pymol.sourceforge.net/), InsightII, and Gnuplot (http://www.gnuplot.info/).
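The idea behind this analysis can be sketched in a few lines. The example below is not g_covar and makes several simplifying assumptions: each frame is a flattened vector of already-fitted coordinates, the covariance is divided by the number of frames, and only the dominant eigenvector is found, by plain power iteration. All function names and the toy data are hypothetical.

```python
import math

def covariance_matrix(frames):
    """Covariance of flattened coordinate vectors; also returns centered frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
        for i in range(dim)
    ]
    return cov, centered

def dominant_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector by power iteration.

    Assumes the starting vector is not orthogonal to the dominant eigenvector.
    """
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    mv = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
    eigenvalue = sum(v * m for v, m in zip(vec, mv))
    return eigenvalue, vec

def project(centered, vec):
    """Projection of each centered frame onto an eigenvector."""
    return [sum(c * v for c, v in zip(row, vec)) for row in centered]

# Toy data: four "frames" of a two-coordinate system moving mostly along x
frames = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]
cov, centered = covariance_matrix(frames)
eigenvalue, vec = dominant_eigenpair(cov)
print(eigenvalue, vec)
print(project(centered, vec))
```

For a real trajectory the vectors have three entries per backbone atom, and a full eigendecomposition routine would be used to obtain the second eigenvector as well.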
a ResI: The residue IDs in the structure.
b ResN: The residue names: the common amino acid residues in three-letter representation; GLF representing glyphosate; ACO representing acetyl coenzyme A; and HOH representing water.
c AtomI: The atom IDs in the structure.
d AtomN: The atom names.
e X, Y, Z: The atom coordinates along the X, Y, and Z axes, in angstroms.
f ElemN: The corresponding element symbol for each atom.
g SegN: The segment names in the complex: Pro representing the peptide, LIG representing the bound ligands, and WAT representing the surrounding waters.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2010/041154 | 7/7/2010 | WO | 00 | 7/20/2012 |
Number | Date | Country | |
---|---|---|---|
61223613 | Jul 2009 | US |