1. Field of the Invention
The present invention relates generally to the fields of enzymology and X-ray crystallography. More specifically, the present invention relates to identification of the structure of dicamba monooxygenase (DMO) and methods for providing variants thereof, including variants with altered enzymatic activity.
2. Description of the Related Art
Dicamba monooxygenase (DMO) catalyzes the degradation of the herbicide dicamba (3,6-dichloro-o-anisic acid; also termed 3,6-dichloro-2-methoxybenzoic acid) to non-herbicidal 3,6-dichlorosalicylic acid (3,6-DCSA; DCSA) (Herman et al., 2005; GenBank accession AY786443; encoded sequence shown in SEQ ID NO:1). Expression of DMO in transgenic plants confers herbicide tolerance (U.S. Pat. No. 7,022,896).
The wild-type bacterial oxygenase gene (isolated from Pseudomonas maltophilia) encodes a 37 kDa protein composed of 339 amino acids that is similar to other Rieske non-heme iron oxygenases that function as monooxygenases (Chakraborty et al., 2005; Gibson and Parales, 2000; Wackett, 2002). In its active form the enzyme comprises a homo-oligomer of three monomers, or a homotrimer, of which the monomers are termed molecules “a”, “b”, and “c”. Activity of DMO typically requires two auxiliary proteins, a reductase and a ferredoxin, for shuttling electrons from NADH and/or NADPH to dicamba (U.S. Pat. No. 7,022,896; Herman et al., 2005). However, dicamba tolerance in transgenic plants has been demonstrated through transformation with DMO alone, indicating that a plant's endogenous reductases and ferredoxins may substitute in shuttling the electrons. The three-dimensional structure of DMO, including identification of the domains important to function and the nature of the interaction with dicamba, has not previously been determined. There is, therefore, a great need in the art for such information, as it could allow, for the first time, targeted development of variant molecules exhibiting altered or even enhanced dicamba degrading activity. Furthermore, identification of other proteins with the same structural properties described herein could be used to create dicamba binding or degrading activity.
The present invention provides a crystallized dicamba monooxygenase polypeptide comprising a sequence at least 85% identical to any of SEQ ID NO:1, SEQ ID NO:2, or SEQ ID NO:3. The invention further provides, in one embodiment, a molecule comprising a binding surface for dicamba that binds to dicamba with a KD or KM of 0.1-500 μM, wherein the molecule does not comprise an amino acid sequence of any of SEQ ID NOs:1-3. In another embodiment, the invention provides a molecule comprising a binding surface for dicamba wherein the KD or KM for dicamba is 0.1-100 μM. In certain embodiments, the molecule may be further defined as a polypeptide.
In another aspect, the invention provides an isolated polypeptide comprising dicamba monooxygenase (DMO) activity, wherein the polypeptide comprises a sequence selected from the group consisting of: a) a polypeptide sequence that when in crystalline form comprises a space group of P32; b) a polypeptide sequence that when in crystalline form comprises a binding site for a substrate, the binding site defined as comprising the characteristics of: (i) a volume of 175-500 Å³, (ii) electrostatically accommodative of a negatively charged carboxylate, (iii) accommodative of at least one chlorine moiety if present in the substrate, (iv) accommodative of a planar aromatic ring in the substrate, and (v) displays a distance from an iron atom that activates oxygen in the polypeptide to a carbon of the methoxy group of the substrate, sufficient for catalysis, of about 2.5 Å to about 7 Å; c) a polypeptide sequence that when in crystalline form comprises a unit-cell parameter of a=79-81 Å, b=79-81 Å, and c=158-162 Å; d) a polypeptide sequence that folds to produce a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, or a macromolecular structure that exhibits a root-mean-square difference (rmsd) in α-carbon positions of less than 2.0 Å with the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, when superimposed on the corresponding backbone atoms described by the structure coordinates of amino acid residues comprising the polypeptide, when 70% or more of the total macromolecular structure α-carbon atoms are used in the superimposition; e) a polypeptide sequence that folds to produce a three-dimensional macromolecular structure that has the same tertiary and quaternary fold as that characterized by the α-carbon coordinates for the structure represented in any of Tables 1-5, and 25-26; f) a polypeptide sequence comprising substantially all of the amino acid residues corresponding to H51, A316, L318, C49, P55, V308, F53, D47, A54, L73, I48, I301, Y307, H86, R304, C320, A300, V297, N84, E322, P50, R52, C68, Y70, L95, P315, P31, T30, L46, I313, G87, D321, D29, N154, G89, S94, M317, H71, D157, G72, V296, V298, D58, D153, R314, and R98 of SEQ ID NO:2 or SEQ ID NO:3; g) a polypeptide sequence comprising a Rieske center domain, further defined as comprising a polypeptide sequence that folds to produce a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 2-124 of SEQ ID NO:2 or SEQ ID NO:3, or a macromolecular structure that exhibits a root-mean-square difference (rmsd) in α-carbon positions of less than 2.0 Å with the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 2-124 of SEQ ID NO:2 or SEQ ID NO:3 when superimposed on the corresponding backbone atoms described by the structure coordinates of amino acid residues comprising the polypeptide, when 70% or more of the macromolecular structure α-carbon atoms corresponding to amino acid residues 2-124 of SEQ ID NO:2 or SEQ ID NO:3 are used in the superimposition; and h) a polypeptide sequence comprising a DMO catalytic domain, further defined as comprising a polypeptide sequence that folds to produce a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 125-343 of SEQ ID NO:2 or SEQ ID NO:3, or a macromolecular structure that exhibits a root-mean-square difference (rmsd) in α-carbon positions of less than 2.0 Å with the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 125-343 of SEQ ID NO:2 or SEQ ID NO:3 when superimposed on the corresponding backbone atoms described by the structure coordinates of amino acid residues comprising the polypeptide, when 70% or more of the macromolecular structure α-carbon atoms corresponding to amino acid residues 125-343 of SEQ ID NO:2 or SEQ ID NO:3 are used in the superimposition; wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3.
In particular embodiments, the invention further provides the isolated polypeptide comprising dicamba monooxygenase (DMO) activity, wherein the polypeptide comprises a DMO enzyme having the sequence domain: -W-X1-X2-X3-X4-L- (SEQ ID NO:152), in which X1 is Q, F, or H; X2 is A, D, F, I, R, T, V, W, Y, C, E, G, L, M, Q, or S; X3 is Q, G, I, V, A, C, D, H, L, M, N, R, S, T, or E; and X4 is A, C, G, or S. In other embodiments, the isolated polypeptide comprises a DMO enzyme having the sequence domain: -N-X1-Q-, in which X1 is A, L, C, F, F, I, N, Q, S, V, W, Y, M, or T. In yet other embodiments, the isolated polypeptide comprises a DMO enzyme having the sequence domain: -W-X1-D-, in which X1 is N, K, A, C, E, I, L, S, T, W, Y, H, or M. In still further embodiments, the isolated polypeptide exhibits an increased level of DMO activity relative to the activity of a wild-type DMO. In particular embodiments, the isolated polypeptide comprises a DMO enzyme having the sequence domain: X1-X2-G-X3-H (SEQ ID NO:153), in which X1 is S, H, or T; X2 is R, Q, S, T, F, H, N, V, W, Y, C, I, K, L, or M; and X3 is T, Q, or M. In particular embodiments, the isolated polypeptide comprises a substitution in residue X2, numbered according to the numbering of SEQ ID NO:2 or SEQ ID NO:3, selected from the group consisting of: R248C, R248I, R248K, R248L, and R248M. In yet other embodiments, the isolated polypeptide comprises a DMO enzyme comprising one or more substitution(s) in residues numbered according to the numbering of SEQ ID NO:2 or SEQ ID NO:3, selected from the group consisting of: A169M, N218H, N218M, G266S, L282I, A287C, A287E, A287M, A287S, and Q288E.
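By way of illustration only, the sequence domain -W-X1-X2-X3-X4-L- recited above may be expressed as a regular expression over one-letter amino acid codes. The following is a minimal sketch, not a method taken from the specification: the function name `find_motif` and the test sequences are hypothetical, and only the residue sets for X1 to X4 are copied from the text.

```python
import re

# Allowed residues at each variable position, copied from the text above.
X1 = "QFH"
X2 = "ADFIRTVWYCEGLMQS"
X3 = "QGIVACDHLMNRSTE"
X4 = "ACGS"

# -W-X1-X2-X3-X4-L- as a regular expression. The lookahead lets overlapping
# occurrences be reported (X2 may itself be W, so matches can overlap).
MOTIF = re.compile(f"(?=W[{X1}][{X2}][{X3}][{X4}]L)")


def find_motif(sequence: str) -> list[int]:
    """Return the 1-based start position of every motif occurrence."""
    return [m.start() + 1 for m in MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical test sequences, not taken from any SEQ ID NO.
    print(find_motif("AAWQAQALKK"))
    print(find_motif("AAWPAQALKK"))
```

The same pattern-building approach could be applied to the shorter domains (-N-X1-Q-, -W-X1-D-, and X1-X2-G-X3-H) by substituting the corresponding residue sets.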
In certain embodiments, the isolated polypeptide comprising dicamba monooxygenase (DMO) activity, wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3, comprises the secondary structural elements of Table 6 or Table 8. The isolated polypeptide may also be defined as comprising a polypeptide sequence that when in crystalline form comprises unit-cell parameters α=β=90° and γ=120°, wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3. In certain embodiments, the isolated polypeptide may further be defined as comprising one monomer per asymmetric unit. The isolated polypeptide may also further be defined as a crystal.
In other embodiments, the isolated polypeptide may be defined as comprising a polypeptide sequence that when in crystalline form diffracts X-rays for a determination of atomic coordinates at a resolution higher than 3.2 Å. In particular embodiments, the isolated polypeptide may be defined as comprising a polypeptide sequence that when in crystalline form diffracts X-rays for a determination of atomic coordinates at a resolution higher than 3.0 Å, or about 2.65 Å, or about 1.9 Å. In each of these embodiments, the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3.
In certain embodiments, the present invention also includes the isolated polypeptide comprising dicamba monooxygenase (DMO) activity as described above, wherein the presence of free iron enhances binding to dicamba and wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3. The isolated polypeptide comprising dicamba monooxygenase (DMO) activity, wherein the polypeptide comprises a sequence selected from the group consisting of: a) a polypeptide sequence that when in crystalline form comprises a space group of P32; b) a polypeptide sequence that when in crystalline form comprises a binding site for a substrate, the binding site defined as comprising the characteristics of: (i) a volume of 175-500 Å³, (ii) electrostatically accommodative of a negatively charged carboxylate, (iii) accommodative of at least one chlorine moiety if present in the substrate, (iv) accommodative of a planar aromatic ring in the substrate, and (v) displays a distance from an iron atom that activates oxygen in the polypeptide to a carbon of the methoxy group of the substrate, sufficient for catalysis, of about 2.5 Å to about 7 Å; c) a polypeptide sequence that when in crystalline form comprises a unit-cell parameter of a=79-81 Å, b=79-81 Å, and c=158-162 Å; d) a polypeptide sequence that folds to produce a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, or a macromolecular structure that exhibits a root-mean-square difference (rmsd) in α-carbon positions of less than 2.0 Å with the atomic structure coordinates of any of Tables 1-5, and 25-26, when superimposed on the corresponding backbone atoms described by the structure coordinates of amino acid residues comprising the polypeptide, when 70% or more of the total macromolecular structure α-carbon atoms are used in the superimposition; e) a polypeptide sequence that folds to produce a three-dimensional macromolecular structure that has the same tertiary and quaternary fold as that characterized by the α-carbon coordinates for the structure represented in any of Tables 1-5, and 25-26; f) a polypeptide sequence comprising substantially all of the amino acid residues corresponding to H51, A316, L318, C49, P55, V308, F53, D47, A54, L73, I48, I301, Y307, H86, R304, C320, A300, V297, N84, E322, P50, R52, C68, Y70, L95, P315, P31, T30, L46, I313, G87, D321, D29, N154, G89, S94, M317, H71, D157, G72, V296, V298, D58, D153, R314, and R98 of SEQ ID NO:2 or SEQ ID NO:3; g) a polypeptide sequence comprising a Rieske center domain, further defined as comprising a polypeptide sequence that folds to produce a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 2-124 of SEQ ID NO:2 or SEQ ID NO:3, or a macromolecular structure that exhibits a root-mean-square difference (rmsd) in α-carbon positions of less than 2.0 Å with the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 2-124 of SEQ ID NO:2 or SEQ ID NO:3 when superimposed on the corresponding backbone atoms described by the structure coordinates of amino acid residues comprising the polypeptide, when 70% or more of the macromolecular structure α-carbon atoms corresponding to amino acid residues 2-124 of SEQ ID NO:2 or SEQ ID NO:3 are used in the superimposition; and h) a polypeptide sequence comprising a DMO catalytic domain, further defined as comprising a polypeptide sequence that folds to produce a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 125-343 of SEQ ID NO:2 or SEQ ID NO:3, or a macromolecular structure that exhibits a root-mean-square difference (rmsd) in α-carbon positions of less than 2.0 Å with the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, corresponding to amino acid residues 125-343 of SEQ ID NO:2 or SEQ ID NO:3 when superimposed on the corresponding backbone atoms described by the structure coordinates of amino acid residues comprising the polypeptide, when 70% or more of the macromolecular structure α-carbon atoms corresponding to amino acid residues 125-343 of SEQ ID NO:2 or SEQ ID NO:3 are used in the superimposition; wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3, may further be defined as a folded polypeptide bound to a non-heme iron ion and comprising a Rieske center domain. The isolated polypeptide may also be further defined as a folded polypeptide bound to dicamba. In particular embodiments, the polypeptide comprises an amino acid sequence with from about 20% to about 99% sequence identity to the polypeptide sequence of any of SEQ ID NOs:1-3. Alternatively, the isolated polypeptide comprising dicamba monooxygenase (DMO) activity may comprise an amino acid sequence with less than about 95%, less than about 85%, less than about 65%, or less than about 45% identity to any of SEQ ID NOs:1-3.
The invention further relates to an isolated polypeptide comprising dicamba monooxygenase (DMO) activity, wherein the polypeptide comprises a C-terminal domain for donating an electron to a Rieske center, and further comprises an electron transport path from a Rieske center to a catalytic site having a conserved surface with a macromolecular structure formed by the amino acid residues N154, D157, H160, H165, and D294, corresponding to SEQ ID NO:2 or SEQ ID NO:3, or conservative substitutions thereof, and wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3. In certain embodiments, the isolated polypeptide, wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3, comprises a polypeptide wherein the distance from iron FE2 to His71 ND1 is 2.57 ű0.2-0.3 Å; the distance from His71 NE2 to Asp157 OD1 is 3.00 ű0.2-0.3 Å; the distance from Asp157 OD1 to His160 ND1 is 2.80 ű0.2-0.3 Å; and the distance from His160 NE2 to Fe is 2.43 ű0.2-0.3 Å.
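As a non-limiting illustration, the four interatomic distances recited above may be checked against a candidate structure as sketched below. This is a minimal sketch under stated assumptions: the dictionary keys, the function names, and the choice of the wider 0.3 Å bound as the default tolerance are illustrative and are not prescribed by the specification.

```python
import math

# Reference distances in Å, as recited in the text above.
# The tuple keys are illustrative labels only.
REFERENCE_DISTANCES = {
    ("FE2", "His71 ND1"): 2.57,
    ("His71 NE2", "Asp157 OD1"): 3.00,
    ("Asp157 OD1", "His160 ND1"): 2.80,
    ("His160 NE2", "Fe"): 2.43,
}


def atom_distance(p, q):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(p, q)


def path_within_tolerance(measured, tolerance=0.3):
    """True if every measured distance lies within `tolerance` Å of its reference.

    `measured` maps the same keys as REFERENCE_DISTANCES to distances in Å.
    The default of 0.3 Å is an assumption (upper end of the recited ±0.2-0.3 Å).
    """
    return all(
        abs(measured[pair] - reference) <= tolerance
        for pair, reference in REFERENCE_DISTANCES.items()
    )
```

In use, `atom_distance` would be applied to coordinates read from a structure file to populate the `measured` mapping before calling `path_within_tolerance`.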
The invention further provides an isolated polypeptide wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3, further comprising a subunit interface region having a conserved surface with a macromolecular structure formed by amino acid residues V325, E322, D321, C320, L318, M317, A316, P315, R314, I313, V308, Y307, R304, I301, A300, V297, V296, V164, Y163, H160, G159, D157, N154, D153, R98, L95, S94, G89, G87, H86, P85, N84, L73, G72, H71, Y70, P69, C68, Q67, D58, P55, A54, F53, R52, H51, P50, I48, D47, L46, P31, T30, and D29, corresponding to SEQ ID NO:2 or SEQ ID NO:3, or conservative substitutions thereof. The invention also relates to an isolated polypeptide, wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3, and further comprises a motif of residues H51a:R52:F53a:Y70a:H71a:H86a:H160c:Y163c:R304c:Y307c:A316c:L318c numbered corresponding to SEQ ID NO:2 or SEQ ID NO:3. The isolated polypeptide may further be defined as a homotrimer. A plant cell comprising a polypeptide comprising DMO activity, wherein the polypeptide does not comprise the amino acid sequence of any of SEQ ID NOs:1-3, is also an embodiment of the invention.
In another aspect, the invention relates to a method for determining the three-dimensional structure of a crystallized DMO polypeptide to a resolution of about 3.0 Å or better comprising: (a) obtaining a crystal comprising a sequence at least 85% identical to any of SEQ ID NO:1, SEQ ID NO:2, or SEQ ID NO:3; and (b) analyzing the crystal to determine the three-dimensional structure of crystallized DMO. In particular embodiments, the analyzing comprises subjecting the crystal to diffraction analysis or spectrophotometric analysis.
In still another aspect, the invention provides a computer readable data storage medium encoded with computer readable data comprising atomic structural coordinates representing the three-dimensional structure of crystallized DMO or a dicamba binding domain thereof. In particular embodiments, the computer readable data comprises atomic structural coordinates representing: (a) a dicamba binding domain defined by structural coordinates of one or more residues according to any of Tables 1-5, and 25-26, selected from the group consisting of L155, D157, L158, H160, A161, H165, R166, A169, Q170, D172, A173, A216, W217, N218, I220, N230, I232, A233, V234, S247, R248, G249, T250, H251, G266, S267, L282, Q286, A287, Q288, A289, and V291 numbered corresponding to SEQ ID NO:2, or conservative substitutions thereof; (b) an interface domain defined by structure coordinates of one or more residues according to any of Tables 1-5, and 25-26, selected from the group consisting of V325, E322, D321, C320, L318, M317, A316, P315, R314, I313, V308, Y307, R304, I301, A300, V297, V296, V164, Y163, H160, G159, D157, N154, D153, R98, L95, S94, G89, G87, H86, P85, N84, L73, G72, H71, Y70, P69, C68, Q67, D58, P55, A54, F53, R52, H51, P50, I48, D47, L46, P31, T30, and D29, numbered corresponding to SEQ ID NO:2 or SEQ ID NO:3, or conservative substitutions thereof; (c) an electron transport path from a Rieske center to a catalytic site defined by structure coordinates of one or more residues according to any of Tables 1-5, and 25-26, selected from the group consisting of N154, D157, H160, H165, and D294, numbered corresponding to SEQ ID NO:2 or SEQ ID NO:3, or conservative substitutions thereof; (d) a C-terminal domain defined by structure coordinates of one or more residues according to any of Tables 1-5, and 25-26, selected from the group consisting of A323, A324, V325, R326, V327, S328, R329, E330, I331, E332, K333, L334, E335, Q336, L337, E338, A339, and A340 numbered corresponding to SEQ ID NO:2 or SEQ ID NO:3; or (e) a domain of any of (a)-(d) exhibiting a root mean square deviation of amino acid residues, comprising α-carbon backbone atoms, of less than 2 Å with the atomic structure coordinates of any of Tables 1-5, and 25-26, when superimposed on the backbone atoms described by the structure coordinates of said amino acids when 70% or more of the macromolecular structure α-carbon atoms are used in the superimposition. In yet other particular embodiments, the computer readable data storage medium comprises the structural coordinates of any of Tables 1-5, and 25-26. A computer programmed to produce a three-dimensional representation of the data comprised on the computer readable data storage medium is also an aspect of the invention.
In accordance with the invention, compositions and methods are provided for engineering molecules with dicamba binding activity. In specific embodiments, the molecules comprise a variant of dicamba monooxygenase (DMO) that can be engineered based on the identification of the dicamba monooxygenase crystal structure and residues important for DMO function as described herein. Such variants may be defined in the presence or absence of a non-heme iron cofactor as well as of the substrate, dicamba. In one aspect, the invention comprises a crystallized DMO polypeptide, wherein the crystal comprises a space group of P32 with unit cell parameters of about a=79-81 Å, b=79-81 Å, and c=158-162 Å; for instance a=80.06 Å, b=80.06 Å, and c=160.16 Å, or a=80.56 Å, b=80.56 Å, and c=159.16 Å; and about α=β=90° and about γ=120°, and wherein the crystal comprises a polypeptide with a primary sequence that does not comprise SEQ ID NOs:1-3.
In another aspect, the invention comprises an isolated polypeptide with DMO activity, wherein the polypeptide sequence does not comprise SEQ ID NOs:1-3, and which when in crystalline form comprises a space group of P32 with unit cell parameters of about a=79-81 Å, b=79-81 Å, and c=158-162 Å; for instance a=80.06 Å, b=80.06 Å, and c=160.16 Å, or a=80.56 Å, b=80.56 Å, and c=159.16 Å; and about α=β=90° and about γ=120°. The asymmetric unit may be a monomer. Three monomers form a symmetric (trimer) unit, and when in crystalline form the three Rieske (Fe2S2) clusters of the symmetric unit may be defined as arranged about 50 Å apart in an approximately equilateral triangle. The trimer can form a lattice with a high solvent content of about 51%. The invention also relates to such an isolated polypeptide comprising an amino acid sequence with, for example, from about 20% to about 99% sequence identity with SEQ ID NOs:1-3, as determined, for instance, by BLAST (Altschul et al., 1990) or another alignment method as described herein.
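For illustration, the unit cell parameters recited above may be checked against the stated ranges, and the unit cell volume computed from the general triclinic formula, which for α=β=90° and γ=120° reduces to V = a·b·c·sin(γ). The sketch below is offered under those assumptions only; the function names are hypothetical, and the 1° angular tolerance used to interpret "about" is an assumption rather than a value taken from the specification.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume in Å^3 from cell edges (Å) and angles (degrees)."""
    ca, cb, cg = (
        math.cos(math.radians(angle)) for angle in (alpha_deg, beta_deg, gamma_deg)
    )
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def in_recited_ranges(a, b, c, alpha_deg, beta_deg, gamma_deg, angle_tol=1.0):
    """Check a=79-81 Å, b=79-81 Å, c=158-162 Å, α=β≈90°, γ≈120°.

    angle_tol (degrees) is an assumed reading of "about".
    """
    return (
        79.0 <= a <= 81.0
        and 79.0 <= b <= 81.0
        and 158.0 <= c <= 162.0
        and abs(alpha_deg - 90.0) <= angle_tol
        and abs(beta_deg - 90.0) <= angle_tol
        and abs(gamma_deg - 120.0) <= angle_tol
    )


if __name__ == "__main__":
    # The two example cells given in the text.
    for cell in [(80.06, 80.06, 160.16, 90, 90, 120), (80.56, 80.56, 159.16, 90, 90, 120)]:
        print(in_recited_ranges(*cell), round(unit_cell_volume(*cell)))
```

For the first example cell, the computed volume is on the order of 8.9 × 10^5 Å³.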
The invention further relates to a molecule, such as a polypeptide, displaying dicamba binding activity, as well as one also displaying DMO activity, for instance, as determined by a measurable KD (see, for example, Copeland (2000)); or KM (see, for example, Copeland (2000); Cleland (1990); and Johnson (1992)) for dicamba of about 0.1-500 μM under physiological conditions (e.g. pH, ionic strength, and temperature) found in plants and in terrestrial and aquatic environments. In specific embodiments, the KD or KM for dicamba may be about 0.1-100 μM.
A polypeptide or other molecule provided by the invention may also be defined as comprising a three-dimensional macromolecular structure characterized by the atomic structure coordinates of peptide backbone atoms of any of Tables 1-5, and 25-26, or a macromolecular structure exhibiting a root-mean-square difference (rmsd) in α-carbon positions over the length of each of the three polypeptides that make up the asymmetric unit of less than 1.5 Å with the atomic structure coordinates of Tables 1-5, and 25-26, when superimposed on the corresponding backbone atoms described by the structural coordinates of amino acid residues comprising the polypeptide, when at least 70% of the total macromolecular structure α-carbon atoms are used in the superimposition, and wherein the polypeptide does not comprise an amino acid sequence of SEQ ID NOs:1-3.
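The root-mean-square difference criterion described above may be illustrated as follows. This is a minimal sketch that assumes the two sets of α-carbon coordinates have already been superimposed by a separate procedure (for example, a least-squares fit, which is not shown); the function name `ca_rmsd` and the dictionary layout keyed by residue number are hypothetical.

```python
import math


def ca_rmsd(reference, model, min_fraction=0.70):
    """RMSD in Å over α-carbons present in both structures.

    reference, model: dict mapping residue number -> (x, y, z) of the α-carbon,
    assumed to be already superimposed in a common frame.
    Raises ValueError if fewer than `min_fraction` of the reference α-carbons
    are available, mirroring the "at least 70%" condition in the text.
    """
    shared = sorted(set(reference) & set(model))
    if not shared or len(shared) < min_fraction * len(reference):
        raise ValueError("too few corresponding α-carbon atoms for comparison")
    squared = sum(math.dist(reference[r], model[r]) ** 2 for r in shared)
    return math.sqrt(squared / len(shared))


if __name__ == "__main__":
    # Hypothetical coordinates, for demonstration only.
    ref = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.0, 0.0, 0.0), 4: (3.0, 0.0, 0.0)}
    mod = {1: (0.0, 0.0, 1.0), 2: (1.0, 0.0, 1.0), 3: (2.0, 0.0, 1.0)}
    print(ca_rmsd(ref, mod) < 1.5)
```

The returned value would then be compared against the recited threshold (for example, less than 1.5 Å or less than 2.0 Å).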
A macromolecular structure provided by the invention may also be defined as comprising one or more of: (i) a path for the donation of an electron(s) to a Rieske (Fe2S2) center; (ii) a macromolecular structure defining a Rieske center; (iii) a macromolecular structure defining an electron transport path from the Rieske center to a substrate binding (catalytic) site; (iv) a macromolecular structure defining a substrate binding site; (v) a macromolecular structure defining a subunit interface region; and (vi) a macromolecular structure defining a C-terminal region corresponding to residues 323-340 of SEQ ID NO:3. The invention also relates to macromolecular structures of (i) to (vi) exhibiting a root-mean-square difference (rmsd) in α-carbon positions over the length of each of the three polypeptides that make up the asymmetric unit of less than 1.5 Å or less than 2.0 Å with the atomic structure coordinates of Tables 1-5, and 25-26, when superimposed on the corresponding backbone atoms corresponding to each of portions (i) to (vi) of the structure described by the structural coordinates of amino acid residues comprising the polypeptide, when at least 70% of the total macromolecular structure α-carbon atoms defining the given structure of (i)-(vi) are used in the superimposition.
A conserved pocket or surface of a macromolecular structure, such as a polypeptide, may be defined as the space or surface in or on which a molecule of interest, for example a dicamba molecule, or other structure, such as a polypeptide, can interact due to its shape-complementary properties. The “fit” may be spatial, as that of a “lock and key”, and may also address properties such as those described below for conservative amino acid replacement (i.e. physico-chemical structure). A conserved pocket allows for the correct positioning and orientation of a ligand or substrate for the desired binding and for the possible enzymatic activity associated with the macromolecular structure, while also possessing the appropriate electrostatic potential, e.g. proper charge(s), and/or binding property, e.g. ability to form hydrogen bond(s), for instance as donor or acceptor. Thus a conserved pocket or surface is a space with the proper shape (spatial arrangement of atoms) as well as physico-chemical properties to accept, for instance, a dicamba molecule or other substrate, with a given range of specificity and affinity, and such a space may be designed. A conserved space or surface may be identified by use of available modeling software, such as Molsoft ICM (Molsoft LLC, La Jolla, Calif.). The concept of a conserved surface or complementary space has been discussed in the art (e.g. Fersht, 1985; Dennis et al., 2002; Silberstein et al., 2003; Morris et al., 2005).
Modifications may be made to the polypeptide sequence of a protein such as the sequences provided herein while retaining enzymatic activity. The following is a discussion based upon changing the amino acids of a protein to create a similar, or even an improved, modified polypeptide and corresponding coding sequences. It is known, for example, that certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence, and, of course, its underlying DNA coding sequence, and nevertheless obtain a protein with like properties. It is thus contemplated that various changes may be made in the DMO peptide sequences described herein, and corresponding DNA coding sequences, without appreciable loss of their three-dimensional structure, or their biological utility or activity.
In making such changes, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte et al., 1982). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics (Kyte et al., 1982). These are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (−0.4); threonine (−0.7); serine (−0.8); tryptophan (−0.9); tyrosine (−1.3); proline (−1.6); histidine (−3.2); glutamate (−3.5); glutamine (−3.5); aspartate (−3.5); asparagine (−3.5); lysine (−3.9); and arginine (−4.5).
It is known in the art that amino acids may be substituted by other amino acids having a similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still yield a biologically functionally equivalent protein. In making such changes, the substitution of amino acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
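The preference tiers described above may be illustrated with a short lookup. The sketch below is illustrative only: the hydropathic indices are those of Kyte et al. (1982) as listed herein, while the one-letter keys, the function name `preference_tier`, and the returned labels are conveniences assumed for this example.

```python
# Hydropathic indices (Kyte et al., 1982), keyed by one-letter amino acid code.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def preference_tier(original, replacement):
    """Classify a substitution by the absolute difference in hydropathic index.

    Returns the narrowest tier that applies, or None if the difference
    exceeds 2 index units.
    """
    delta = abs(HYDROPATHY[original] - HYDROPATHY[replacement])
    if delta <= 0.5:
        return "even more particularly preferred"
    if delta <= 1.0:
        return "particularly preferred"
    if delta <= 2.0:
        return "preferred"
    return None


if __name__ == "__main__":
    for pair in [("I", "V"), ("L", "I"), ("L", "M"), ("I", "R")]:
        print(pair, preference_tier(*pair))
```

The hydropathic index is only one of several criteria discussed herein, so a result of None does not by itself exclude a substitution.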
It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Pat. No. 4,554,101 states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein. As detailed in U.S. Pat. No. 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0±1); glutamate (+3.0±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (−0.4); proline (−0.5±1); alanine (−0.5); histidine (−0.5); cysteine (−1.0); methionine (−1.3); valine (−1.5); leucine (−1.8); isoleucine (−1.8); tyrosine (−2.3); phenylalanine (−2.5); tryptophan (−3.4). It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent protein. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those which are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred. Exemplary substitutions which take these and various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.
Conservative amino acid substitutions may thus be identified based on physico-chemical properties, function, and Van der Waals volumes. Physico-chemical properties may include chemical structure, charge, polarity, hydrophobicity, surface properties, volume, presence of an aromatic ring, and hydrogen bonding potential, among other properties. Several classifications of the amino acids have been proposed (e.g. Taylor, 1986; Livingstone et al., 1993; Mocz, 1995; and Stanfel, 1996). A hierarchical classification of the twenty natural amino acids has also been described (May, 1999). Descriptions of amino acid surfaces are known (e.g. Chothia, 1975); amino acid volumes are discussed by Zamyatin (1972); and hydrophobicity of amino acid residues is discussed in Karplus (1997). Amino acid substitutions at enzymatic active sites have also been described (e.g. Gutteridge and Thornton, 2005). Binding pocket shape has also been discussed (e.g. Morris et al., 2005). Binding pockets may also be defined based on other properties, such as charge (e.g. Bate et al., 2004). These teachings may be utilized to identify amino acid substitutions that result in altered (e.g. improved) enzymatic function.
It is also understood that a polypeptide sequence sharing a degree of primary amino acid sequence identity with a polypeptide that displays a similar function, such as dicamba binding activity or the presence of a Rieske domain, would possess a common structural fold, and vice versa. The term “fold” refers to the three-dimensional arrangement of secondary structural elements (i.e., helices and β-sheets) in a protein. Moreover, a “folded polypeptide” refers to a polypeptide sequence, or linear sequence of amino acids, that possesses a fold. Generally, at least about 30% primary amino acid sequence identity within the region of the binding/catalytic domain would be necessary to specify such a structural fold (e.g. Todd et al., 2001), although proteins with the same function (i.e. type of chemical transformation) are known that nonetheless display even less than 25% primary amino acid sequence identity when given catalytic site regions are compared. Specific structural folds and substrate binding surfaces may be predicted based on amino acid sequence similarity (e.g. Lichtarge and Sowa, 2002). In certain embodiments, a polypeptide comprising a structural fold in common with DMO comprises an amino acid sequence with from about 20% to about 99% sequence identity to the polypeptide sequence of any of SEQ ID NOs:1-3. In particular embodiments, the level of sequence identity with any of SEQ ID NOs:1-3 may be less than 95%, less than 85%, less than 65%, less than 45%, or about 30% to about 99%.
An “interface region” may be defined as that region that describes the surface between two adjacent molecules, such as neighboring polypeptide chains or subunits, wherein molecules (e.g. amino acids) interact, for example within their Van der Waals radii. The interface region in the DMO trimer bridges the two iron centers, is functionally important for contact between subunits in the homo-oligomer (trimer), and is critical for the catalytic cycle. Interfaces in proteins can be described or classified (e.g. Ofran et al., 2003; Tsuchiya et al., 2006; Russell et al., 2004).
The invention also relates to a macromolecular structure defining a path for donation of an electron to a Rieske (Fe2S2) center as an electron is transferred from the electron donor (e.g. ferredoxin) to the Rieske center. In specific embodiments, the macromolecular structure may be defined as a polypeptide and may be a DMO.
The invention also further relates to a polypeptide comprising a macromolecular structure defining a Rieske center domain, comprising substantially all of the amino acid residues 2-124 of a polypeptide sequence numbered corresponding to SEQ ID NO:2 or SEQ ID NO:3, in which the C49, H51, C68, and H71 residues of one monomer, or ones corresponding to them, participate in the formation of the Rieske cluster in DMO.
In yet another embodiment, the invention relates to a DMO comprising a macromolecular structure defining an electron transport path from the Rieske center of one monomer to a non-heme iron at the substrate binding (catalytic) site in a second monomer that comprises the trimeric DMO asymmetric unit, comprising substantially all of the following amino acid residues: H71 in one monomer; and in the second monomer: D157, the residues which chelate the non-heme iron: H160, H165, and D294, and a residue which plays a role in such chelation, N154. This structure is shown for instance in
The invention also relates to a molecule, such as a DMO polypeptide, comprising a substrate (dicamba) binding site which comprises the following characteristics: (i) a volume of 175-500 Å3; (ii) electrostatically accommodative of a negatively charged carboxylate; (iii) accommodative of at least one chlorine moiety if present in the substrate; (iv) accommodative of a planar aromatic ring in a substrate; and (v) displays a distance from an iron atom that activates oxygen in the DMO polypeptide to a carbon of the methoxy group of the substrate, sufficient for catalysis, of about 2.5 Å to about 7 Å.
The substrate binding site/catalytic domain may also be defined as comprising a macromolecular structure defining a substrate binding pocket, within 4 Å of the bound substrate/product, and comprising substantially all of the amino acid residues L155, D157, L158, H160, A161, H165, R166, A169, Q170, D172, A173, S200, A216, W217, N218, I220, N230, I232, A233, V234, S247, G249, H251, S267, L282, W285, Q286, A287, Q288, A289, L290, and V291. Additional residues are within a 5-6 Å sphere around the bound substrate/product, including S200, L202, M203, and F206. The active site and the dicamba/DCSA substrate/product binding site nearly overlap. The catalytic domain extends between about residues corresponding to those numbered 125-343 from the N-terminus of a full length DMO polypeptide, for instance as numbered in SEQ ID NO:2 or SEQ ID NO:3.
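A distance-based pocket definition of the kind described above can be sketched as follows. This is an illustrative example only: the coordinates are invented toy values and are not taken from any of the coordinate tables, the residue label "far_residue" is hypothetical, and the function name is an assumption made for this sketch.

```python
import math


def residues_within(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted labels of residues with any atom within `cutoff` Å
    of any ligand atom."""
    close = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            close.add(label)
    return sorted(close)


# Toy coordinates in Å, invented purely for illustration.
protein_atoms = [
    ("H251", (1.0, 0.0, 0.0)),
    ("N230", (0.0, 3.5, 0.0)),
    ("F206", (0.0, 0.0, 5.5)),
    ("far_residue", (20.0, 20.0, 20.0)),  # hypothetical distant residue
]
ligand_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]

print(residues_within(protein_atoms, ligand_atoms, cutoff=4.0))
print(residues_within(protein_atoms, ligand_atoms, cutoff=6.0))
```

With real data, the protein and ligand atom lists would be read from the relevant coordinate table, and the cutoff would be set to 4 Å for the pocket itself or to 5-6 Å for the surrounding sphere.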
The invention further relates to a DMO comprising a macromolecular structure defining a subunit interface region, also termed an “interaction domain”, comprising substantially all of the amino acid residues V325, E322, D321, C320, L318, M317, A316, P315, R314, I313, V308, Y307, R304, I301, A300, V297, V296, E293, R166, V164, Y163, H160, G159, D157, N154, D153, R98, L95, S94, G89, G87, H86, P85, N84, L73, G72, H71, Y70, P69, C68, Q67, D58, P55, A54, F53, R52, H51, P50, I48, D47, L46, P31, T30, and D29, numbered corresponding to SEQ ID NO:3.
The invention also relates to a macromolecular structure defining a dicamba binding site, comprising substantially all of the amino acid residues L155, D157, L158, H160, A161, H165, R166, A169, Q170, D172, A173, A216, W217, N218, I220, N230, I232, A233, V234, S247, G249, H251, S267, L282, Q286, A287, Q288, A289, and V291 numbered corresponding to SEQ ID NO:3, wherein the atomic structure coordinates for these residues are as listed in Table 4.
The invention further relates to a macromolecular structure defining a DCSA binding site, comprising substantially all of the amino acid residues L155, D157, L158, H160, A161, H165, R166, A169, Q170, D172, A173, A216, W217, N218, I220, N230, I232, A233, V234, S247, G249, H251, S267, L282, Q286, A287, Q288, A289, and V291 numbered corresponding to SEQ ID NO:3, wherein the atomic structure coordinates for these residues are as listed in Table 5.
The invention further relates to an isolated polypeptide comprising dicamba monooxygenase (DMO) activity, wherein the polypeptide comprises a DMO enzyme having the following sequence domain near residue W285 of the primary peptide sequence numbered for instance according to SEQ ID NO:3: -W-X1-X2-X3-X4-L- (SEQ ID NO:152), in which X1 is Q, F, or H; X2 is A, D, F, I, R, T, V, W, Y, C, E, G, L, M, Q, or S; X3 is Q, G, I, V, A, C, D, H, L, M, N, R, S, T, or E; and X4 is A, C, G, or S. The isolated polypeptide may comprise a DMO enzyme having the following sequence domain near residue A169: -N-X1-Q-, in which X1 is A, L, C, F, I, N, Q, S, V, W, Y, M, or T. In yet other embodiments, the isolated polypeptide comprises a DMO enzyme comprising a sequence domain near residue N218: -W-X1-D- in which X1 is N, K, A, C, E, I, L, S, T, W, Y, H, or M.
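The -W-X1-X2-X3-X4-L- sequence domain can be expressed as a regular expression, as in the following minimal sketch. The pattern encodes the residue choices stated above for X1 through X4. The wild-type fragment "WQAQAL" used in the example is inferred from the residue labels W285, Q286, A287, Q288, A289, and L290 given in this description; the function name is hypothetical.

```python
import re

# Character classes follow the X1-X4 residue choices stated in the text.
MOTIF_W285 = re.compile(
    r"W"
    r"[QFH]"               # X1
    r"[ADFIRTVWYCEGLMQS]"  # X2
    r"[QGIVACDHLMNRSTE]"   # X3
    r"[ACGS]"              # X4
    r"L"
)


def has_w285_motif(fragment):
    """Return True if the fragment contains the W-X1-X2-X3-X4-L domain."""
    return MOTIF_W285.search(fragment) is not None


print(has_w285_motif("WQAQAL"))  # fragment inferred from W285..L290 labels
print(has_w285_motif("WQCQAL"))  # same fragment with an A287C change
print(has_w285_motif("WPAQAL"))  # proline at X1 is not among the listed choices
```

The same approach could be applied to the -N-X1-Q- and -W-X1-D- domains by substituting the corresponding residue lists.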
The invention also relates to an isolated polypeptide exhibiting an increased level of DMO activity relative to the activity of a wild type DMO when measured under identical, or substantially identical, conditions. For instance, the isolated polypeptide may exhibit at least 105%, 110%, 120%, 130%, 140%, 150%, or more of the activity of a wild type DMO enzyme when measured under identical, or substantially identical, conditions. Thus, for instance, the isolated polypeptide may comprise a DMO enzyme having a sequence domain near residue R248 which comprises: X1-X2-G-X3-H (SEQ ID NO:153) in which X1 is S, H, or T; X2 is R, Q, S, T, F, H, N, V, W, Y, C, I, K, L, or M; and X3 is T, Q, or M. The isolated polypeptide may also, or alternatively, comprise a substitution in residue X2, numbered according to the numbering of SEQ ID NO:2 or SEQ ID NO:3, and selected from the group consisting of: R248C, R248I, R248K, R248L, and R248M. Or, the isolated polypeptide may comprise a DMO enzyme comprising one or more substitution(s) in residues numbered according to the numbering of SEQ ID NO:2 or SEQ ID NO:3, selected from the group consisting of: A169M, N218H, N218M, G266S, L282I, A287C, A287E, A287M, A287S, and Q288E.
The invention further relates to a macromolecular structure defining a DCSA binding site, comprising substantially all of the amino acid residues L155, D157, L158, H160, A161, H165, R166, A169, Q170, D172, A173, A216, W217, N218, I220, N230, I232, A233, V234, S247, G249, H251, S267, L282, W285, Q286, A287, Q288, A289, L290, and V291 numbered corresponding to SEQ ID NO:3, wherein the atomic structure coordinates for these residues are as listed in Tables 25-26.
The invention still further relates to a plant cell exhibiting tolerance to the herbicidal effect of dicamba comprising a polypeptide with DMO activity, wherein the polypeptide sequence does not comprise SEQ ID NO:1, SEQ ID NO:2, or SEQ ID NO:3, and in which the polypeptide, when in crystalline form, comprises a space group of P32 with unit cell parameters of about a=79-81 Å, b=79-81 Å, and c=158-162 Å, for instance a=80.06 Å, b=80.06 Å, and c=160.16 Å, or a=80.56 Å, b=80.56 Å, and c=159.16 Å; and about α=β=90° and about γ=120°.
In another aspect, the invention relates to a method for determining the three dimensional structure of a crystallized DMO polypeptide. Methods for obtaining the polypeptide in crystalline form are disclosed, as are methods for analysis by diffraction and spectrophotometric methods, including analysis of the resulting diffraction and spectrophotometric data. Such analysis results in sets of atomic coordinates that define the three dimensional structure of the active DMO structure and of a crystal lattice comprising the structure, and are found, for instance, in Tables 1-5, and 25-26. Such structures have been determined for DMO in the presence and absence of Fe2+, as well as in the presence or absence of the substrate (dicamba) and the product (DCSA).
In yet another aspect, the invention relates to computer readable storage media comprising atomic structural coordinates of any of Tables 1-5, and 25-26, or a subset of one of these tables, representing for instance the three dimensional structure of crystallized DMO, or the dicamba binding domain thereof, and to a computer programmed to produce a three dimensional representation of the data comprised on such a computer readable storage medium.
Methods for identifying the potential for an agent to bind to the substrate binding pocket of DMO are also related to the invention. Such an agent may be an inhibitor of DMO activity, acting for instance as a synergist or potentiator for dicamba's herbicidal activity, or may also demonstrate herbicidal activity. Such methods may be carried out by one of skill in the art by computer-based modeling of the three dimensional structure of such an agent in the presence of a three dimensional model of DMO structure, and analyzing the ability of such an agent to bind to DMO, and also in certain embodiments to undergo catalysis by DMO.
The invention further relates to methods for utilizing the physico-chemical characteristics and three dimensional structure of the substrate binding pocket of DMO to identify molecules useful for dicamba degradation, water purification, degradation of other xenobiotics, or identification of analogous structures, including polypeptides that are functional homologs of DMO, from closely or otherwise related organisms, or are obtained via mutagenesis.
The following examples are included to illustrate embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent techniques discovered by the inventor to function well in the practice of the invention. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
The DMO X-ray crystal structure was solved by multiwavelength anomalous dispersion (MAD) methods. Four basic types of DMO structures were obtained first: (a) DMO with the non-heme iron site disordered and unoccupied (“DMO Crystal 3”, “DMO Crystal 4”); (b) DMO with the non-heme iron site ordered and occupied (“DMO Crystal 5”); (c) DMO with the non-heme iron site ordered and occupied and dicamba bound (“DMO Crystal 6”); and (d) DMO with DCSA bound (“DMO Crystal 7”). Subsequently, refined crystal structures of (e) DMO co-crystallized with dicamba and with cobalt at the non-heme iron site (“DMO Crystal 11”) and (f) DMO co-crystallized with DCSA and with cobalt at the non-heme iron site (“DMO Crystal 12”) were also determined. The DMO structures lacking ordered and occupied non-heme iron sites are DMO Crystal 3, with Rwork=30.8% and Rfree=35.0% for 20-3.0 Å data, and DMO Crystal 4, with Rwork=31.3% and Rfree=36.6% for 20-3.2 Å data. The DMO structure with the occupied non-heme iron site is DMO Crystal 5, which has Rwork=33.4% and Rfree=37.3% for 20-2.65 Å data. The DMO structure with the occupied non-heme iron sites and dicamba bound is DMO Crystal 6, which has Rwork=31.6% and Rfree=35.7% for 20-2.7 Å data. Detailed refinement information for these crystals is listed in Tables 11-13. The DMO Crystal 5 structure, which contains an occupied and ordered non-heme iron site in all three molecules of the crystallographic asymmetric unit, is the DMO structure used for several further structural comparisons and assessments. Residue names, atom names, and connectivity conventions used in these protein structure determinations as described below follow Protein Data Bank (PDB) standards (deposit.rcsb.org/het_dictionary.txt; Berman et al., 2000). The connectivity information for dicamba (residue name “DIC”) and for DCSA (residue name “DCS”) is listed in
DMO possesses a unique Rieske non-heme iron oxygenase (“RO”; Gibson & Parales, 2000) structure that is in some aspects similar to, yet distinct from, other RO enzymes of known structure. Detailed descriptions of the DMO quaternary structure, Rieske domain, catalytic domain, the electron transfer pathway, substrate binding pocket, interaction domain, and C-terminal helix region are provided below (Sections A-G).
A. Quaternary Structure of DMO is Compositionally Like Other RO α Subunit Structures
The three-fold symmetric arrangement of DMO oxygenase molecules in the crystallographic asymmetric unit, as shown in
Moreover in the DMO-non-heme iron structure (
The DMO structure is also similar in composition to other RO α subunit structures (Ferraro et al., 2005). Other RO α subunit structures possess a Rieske Fe2S2 cluster domain and a mononuclear iron-containing catalytic domain, and the DMO-iron soak structure possesses these elements as well. In DMO, the Rieske domain is found from residue 2-124 and contains the 501-Fe2S2 cluster. The catalytic domain extends from residue 125-343, and contains the 502-non-heme Fe (Numbering of iron atoms as per PDB format). In the DMO-non-heme iron site structure (DMO Crystal 5), residues 176-195 and 235-246 are disordered, and not visible in the structure. The structure for molecule A of DMO Crystal 5, with the Rieske and catalytic domains clearly visible, is displayed in
It is noteworthy that DMO is significantly smaller than the α subunits of other structurally characterized RO enzymes. This DMO construct for crystallography contained 349 amino acids, and contains 343 if the C-terminal hexa-His tag is excluded. The α-subunits of other RO enzymes with Rieske domains are significantly larger than DMO. The naphthalene 1,2-dioxygenase from Rhodococcus sp. (PDB entry 2B24; SEQ ID NO:20) α-subunit contains 470 amino acids; the naphthalene 1,2-dioxygenase from Pseudomonas sp. (PDB entry 1NDO; SEQ ID NO:22) α-subunit has 449 amino acids; the nitrobenzene dioxygenase from Comamonas sp. (PDB entry 2BMO; SEQ ID NO:17) α-subunit contains 447 amino acids; and the biphenyl dioxygenase from Rhodococcus sp. (PDB entry 1ULI; SEQ ID NO:21) α-subunit has 460 amino acids.
B. Rieske Domain
The DMO Rieske domain (residues 2-124) from DMO Crystal 5 is displayed in
The domain alignment results and a review of
C. Catalytic Domain
The structure of the catalytic domain in DMO Crystal 5, which extends from residues 125-343, is displayed in
A consideration of the alignment results immediately suggests that there are both key structural commonalities and differences between the catalytic domains of DMO and NDO-R. Clearly, 106 Cα atoms in the two aligned domains have close spatial positions relative to one another; however, this represents only 48% of the residues in the DMO catalytic domain. A close examination of
The non-heme iron (Fe) ions in the aligned domains are also in a similar location.
All in all, while the catalytic domains of DMO and NDO-R possess a common 7-stranded antiparallel β-sheet, and have good to reasonable spatial overlaps in five helical regions and in the location of the non-heme iron binding sites, less than 50% of the Cα's in the DMO catalytic domain align well with those in NDO-R.
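The "less than 50%" figure follows directly from the numbers stated in this section, as the following short worked example shows. It assumes only the catalytic domain boundaries (residues 125-343) and the count of 106 closely aligned Cα atoms given above; the variable names are chosen for illustration.

```python
# Catalytic domain boundaries and aligned Cα count as stated in the text.
catalytic_start, catalytic_end = 125, 343
aligned_ca = 106

# Inclusive residue count for the catalytic domain.
n_residues = catalytic_end - catalytic_start + 1
fraction_aligned = aligned_ca / n_residues

print(n_residues)
print(f"{fraction_aligned:.0%}")
```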
D. Electron Transfer
In the DMO-Fe+2 soak crystal structure, DMO Crystal 5, the DMO Rieske domain of one subunit is ˜12 Å away from the non-heme iron site in another subunit (
E. Dicamba and DCSA Binding
The DMO Crystal 6 structure reveals that dicamba binds in the same, chemically relevant orientation in molecules “b” and “c”. Dicamba binds in a cavity under Ile232, and it is oriented by three key hydrogen bonds involving Asn230 and His251 with dicamba (
Dicamba binds in a pocket directly below the Ile232 side chain, and this pocket is further bounded above by Asn218, Ile220, Ser247, and Asn230. On the sides this pocket is bounded by Leu158, Asp172, Leu282, Gly249, Ser267, and His251. Below dicamba, this cavity is bordered by Ala289, the non-heme iron, the residues which chelate it (His160, His165, Asp294) and Asn154, Ala169, Ala287, Tyr263, and Leu155.
DCSA, or 3,6-dichlorosalicylic acid, binds to DMO in a nearly identical manner as does dicamba, and the DMO-DCSA interaction in molecule A of the structure can be viewed in
The “Crystal 11” and “Crystal 12” structures (Tables 25-26) indicate that dicamba and DCSA bind above W285, which is opposite I232 in the active site cavity. These two residues provide key hydrophobic contacts to the dicamba/DCSA ring. Residues H251, N230, Q286, and W285 provide key polar interactions and in almost all cases H251, W285 and N230 engage in hydrogen bonds with substrate or product and provide stabilizing polar interactions to the carboxylate moiety of dicamba/DCSA. In a number of the structures specific hydrogen bonds are predicted/observed for N230, H251 and W285: (a) Q286 NE2 to DCSA or dicamba O1 (possible H bond depending on rotamer); (b) W285 NE1 to DCSA or dicamba O1; (c) H251 NE2 to O1; and (d) N230 ND2 to O2, where O1 and O2 are in the carboxylate moiety of dicamba and DCSA. The non-heme Co+2 ion binds a water molecule or an oxygen (O2) molecule in the 2.05 Å resolution DMO-Co-DCSA and DMO-Co-dicamba structures. Since Co2+ binds to the non-heme iron site in DMO, the above observations are also valid for the DMO-Fe2+-dicamba and the DMO-Fe2+-DCSA structures.
F. Interaction Domain
Interaction domains in proteins, those that provide a surface for other entities (e.g. proteins or polynucleotides) to dock in certain spaces, are conserved domains that contain functionally similar residues in many cases. In addition, this conservation is usually coincident with the chemical functionality and properties of the residues. For instance, a common substitution comprises leucine for isoleucine and vice versa. Similarly, phenylalanine and tyrosine may substitute for one another in that they have much of the same aromatic character and nearly the same steric volume while tyrosine provides the added possibility of a polar interaction with its hydroxyl group. In this context, conservative amino acid substitution is defined as the replacement of an amino acid with another amino acid of similar physico-chemical properties, e.g. chemical structure, charge, polarity, hydrophobicity, surface, volume, presence of aromatic ring, hydrogen bonding potential. There are several classifications of the amino acids that can be found in the literature, e.g. Taylor, 1986; Livingstone et al., 1993; Mocz, 1995; and Stanfel, 1996. Hierarchical classification of the twenty natural amino acids has also been described (May, 1999). Example descriptions of the surfaces of amino acids can be found in Chothia (1975). Example descriptions of amino acid volumes can be found in Zamyatin (1972), and hydrophobicity descriptions in Karplus (1997). While one may infer conserved sequence homology among a group of proteins and even identify the conserved secondary motifs, in the absence of structural data it is not easy to identify the specific function of these residues and domains and describe them in detail. The functional nature of these residues is more subtle than that of the more commonly identified amino acid motifs, for example those that are likely to participate in binding the Rieske center in the protein.
Thus below are described a handful of key residues which are largely responsible for the subunit to subunit interactions that bring together the Rieske domain of one monomer with the non-heme iron domain of another monomer.
The trimer structure of the DMO crystal reveals that the interface between subunits is at the Rieske center of one subunit to the non-heme iron of another. The residues that make up this region are important for maintaining surface contact for oligomeric structure, productive electron transport and ultimately for catalysis.
G. C-Terminal Helix Region
The C-terminal helix of DMO is defined by hydrophobic residues around positions 323-340, AAVRVSREIEKLEQLEAA (SEQ ID NO:24). Mutation of many of these residues results in loss of enzymatic activity, and this region apparently interacts with helper proteins such as ferredoxin.
A protein sample of DMO (hereafter referred to as DMO or DMOw; SEQ ID NO:3) was prepared to 10-20 mg/mL in 30 mM Tris-pH 8.0 buffer, 10 mM NaCl, 0.1 mM EDTA, and trace amount of PMSF. Protein crystals suitable for X-ray diffraction studies were obtained using the vapor diffusion crystallization method (McPherson, 1982). Deep red, diamond-shaped crystals were obtained using a precipitant solution of 17% PEG 6000 and 100 mM sodium citrate-pH 6.0 in the reservoir, and suspending over this reservoir a droplet that contained equal volumes of the protein solution and the reservoir solution.
An initial 2.8 Å X-ray diffraction data set was obtained on a cryo-cooled DMO crystal using a Rigaku protein crystallography data collection system (Rigaku Americas Corporation, The Woodlands, Tex.). The data were collected on an R-AXIS IV++ imaging plate detector, with Cu Kα X-rays produced by a MicroMax-007 X-ray generator operating at 40 kV and 20 mA; beam collimation was provided by HiRes2 Confocal Optics using a beam path that was evacuated with He gas. Cryo-cooling to approximately −170° C. was provided by an X-stream 2000 system. Prior to data collection, the DMO crystal was transferred to a cryo-solution that was 22% PEG 6000, 0.1M citrate buffer-pH 6.0, 22% glycerol, 1 mM NaN3, and then plunge-cooled in liquid nitrogen. This crystal was then transferred to the R-AXIS IV++ goniostat for data collection using cryo-cooled tongs.
The initial 2.8 Å DMO crystal data were processed using the HKL2000 package (Otwinowski & Minor, 1997). Extensive analyses revealed that the DMO crystal belongs to the trigonal crystal system, and is space group P31 or P32. A summary of the initial data collection statistics is listed in Table 11 under the header ‘DMO Crystal 1’. Assuming three molecules of DMO per crystallographic asymmetric unit results in a Matthews parameter of 2.52 Å3/Da (Matthews, 1968) and a crystal solvent content of 51%, which are reasonable values and consistent with the high-quality diffraction displayed using the X-ray diffraction unit.
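The Matthews parameter and solvent content estimate can be illustrated with the following minimal sketch. Several inputs are assumptions made for this example rather than values reported for DMO Crystal 1: the cell dimensions are the example values a=b=80.06 Å and c=160.16 Å given elsewhere in this specification, the monomer mass is taken as approximately 39 kDa, and the solvent fraction uses the common approximation 1 − 1.23/V_M. Space groups P31 and P32 each have three symmetry operators, so three molecules per asymmetric unit correspond to nine molecules per unit cell. The function names are hypothetical.

```python
import math


def trigonal_cell_volume(a, c):
    """Unit cell volume (Å^3) for a = b, α = β = 90°, γ = 120°."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_parameter(cell_volume, mass_da, n_symops, n_per_asu):
    """Crystal volume per unit of protein mass (Å^3/Da)."""
    return cell_volume / (mass_da * n_symops * n_per_asu)


def solvent_fraction(vm):
    """Approximate solvent fraction; 1.23 is the usual assumed constant."""
    return 1.0 - 1.23 / vm


# Assumed inputs (see lead-in): example cell dimensions and ~39 kDa monomer.
volume = trigonal_cell_volume(80.06, 160.16)
vm = matthews_parameter(volume, 39000.0, n_symops=3, n_per_asu=3)
solvent = solvent_fraction(vm)

print(f"V_M = {vm:.2f} Å^3/Da, solvent = {solvent:.0%}")
```

Under these assumptions the result is close to the values quoted above; small differences are expected because the exact cell dimensions and construct mass used for the reported figures are not restated here.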
A. Efforts to Solve the Structure by Molecular Replacement Phasing Methods.
DMO has been classified as a member of the Rieske non-heme iron family of oxygenases (Chakraborty et al., 2005). Sequence comparison analysis of the DMO sequence with sequences of known protein structures from the Protein Data Bank (PDB; www.rcsb.org) revealed some similarity in the N-terminal Rieske domain portion of DMO with the following Rieske-containing dioxygenases, which are listed in order of decreasing similarity: naphthalene 1,2-dioxygenase from Rhodococcus sp. (PDB entry 2B24; SEQ ID NO:20; Gakhar et al., 2005); nitrobenzene dioxygenase from Comamonas sp. (PDB entry 2BMO; SEQ ID NO:17; Friemann et al., 2005); biphenyl dioxygenase from Rhodococcus sp. (PDB entry 1ULI; SEQ ID NO:21; Furusawa et al., 2004); and naphthalene 1,2-dioxygenase from Pseudomonas sp. (PDB entry 1EG9; SEQ ID NO:22; Kauppi et al., 1998; Carredano et al., 2000). Using this information, molecular replacement (MR) phasing calculations were conducted using DMO X-ray data and all or portions, most notably the Rieske domain portions, of each of the PDB entries 2B24, 2BMO, 1ULI and 1EG9 as phasing models. This MR work was performed using the Phaser program (McCoy et al., 2005), and, to a lesser extent, the AMoRe program (Navaza, 1994). No promising MR solutions were obtained.
B. Solving the Structure Using Single Anomalous Dispersion Phasing Methods and High-Redundancy Cu (λ=1.5418 Å) and Cr (λ=2.29 Å) Anode X-Ray Data Sets.
As the DMO crystal structure could not be solved using MR methods, phases for crystallographic structure solution had to be sought by other methods. Crystallographic phasing was pursued using the method of single wavelength anomalous dispersion (SAD), taking advantage of the anomalous scattering from the sulfur (S) and iron (Fe) atoms in DMO. Data from more than one wavelength were obtained, allowing for multiple wavelength anomalous dispersion (MAD) analysis (e.g. Hendrickson, 1991). Each DMO monomer contains 16 sulfur atoms (seven from Cys residues, seven from Met residues, and two from the Rieske Fe2S2 cluster), and, maximally, three iron atoms (two in the Rieske Fe2S2 cluster and one from the non-heme Fe site). Moreover, since each DMO crystallographic asymmetric unit contains three monomers, each asymmetric unit can contain up to 48 sulfurs and nine iron sites for phasing. Prior sulfur SAD phasing of 44 kDa glucose isomerase, which contains nine sulfurs, and 33 kDa xylanase, which contains five sulfurs (Ramagopal et al., 2003), suggested that sulfur SAD phasing may be a plausible strategy for 39 kDa DMO, which contains 16 sulfurs. Moreover, the successful structure determination of a 129-residue Rieske iron-sulfur protein fragment using the anomalous signal from iron (Iwata et al., 1996) indicated that the Fe atoms in the Fe2S2 cluster were useful for structure solution phasing.
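The anomalous scatterer counts quoted above can be tallied as in the following short worked example. It uses only the per-monomer counts stated in the preceding paragraph; the dictionary keys and variable names are chosen for illustration.

```python
# Per-monomer counts as stated in the text.
sulfur_per_monomer = {"Cys": 7, "Met": 7, "Rieske Fe2S2 cluster": 2}
iron_per_monomer = {"Rieske Fe2S2 cluster": 2, "non-heme Fe site": 1}
monomers_per_asu = 3

sulfur_monomer_total = sum(sulfur_per_monomer.values())
iron_monomer_total = sum(iron_per_monomer.values())

# Maximum number of sites available for phasing per asymmetric unit.
sulfur_asu_total = sulfur_monomer_total * monomers_per_asu
iron_asu_total = iron_monomer_total * monomers_per_asu

print(sulfur_monomer_total, sulfur_asu_total, iron_asu_total)
```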
A 3.2 Å resolution, high-redundancy X-ray data set for sulfur SAD phasing was collected from a DMO crystal, prepared as previously described, at the Rigaku Americas Corporation headquarters (The Woodlands, Tex.). The X-ray data collection system that was employed used a MicroMax-007HF X-ray generator, Cr Varimax HR collimating optics, an R-AXIS-IV++ detector, an X-stream 2000 low temperature device, and a chromium (Cr) anode, which produces a 2.29 Å wavelength X-ray. The data collection statistics for this data set are listed in Table 11 under the header “DMO Crystal 2”. In addition, a 3.0 Å resolution, high-redundancy, X-ray data set for Fe SAD phasing was also collected. The X-ray data collection system used to obtain these data contained a MicroMax-007HF X-ray generator, Cu Varimax HR collimating optics, a Saturn 944 CCD detector, an X-stream 2000 low temperature device, and a copper (Cu) anode, which produces a 1.5418 Å wavelength X-ray. The data collection statistics for these data are listed in Table 11 under the header “DMO Crystal 3”.
Sulfur SAD phasing calculations using 20-3.2 Å DMO Crystal 2 data were conducted with the program SOLVE (Terwilliger & Berendzen, 1999), and subsequent density modification calculations were conducted with its partner program RESOLVE (Terwilliger, 1999). Calculations were performed in both space group P31 and P32. No encouraging phasing solutions and density maps resulted from this work, indicating that the crystal structure of DMO is distinct from that of other known Rieske non-heme iron oxygenase family members in this regard.
Iron SAD phasing calculations were next conducted using the 20-3 Å DMO Crystal 3 data and the SOLVE and RESOLVE programs, in both space groups P31 and P32. SOLVE calculations in both space groups produced three ‘Fe’ sites, but only the electron density map resulting from the P32 work had the features of a promising protein structure map, with clear protein-solvent boundaries and peptide paths. This map evaluation work, and all subsequent crystallographic map work, was done with program O (Jones et al., 1991), using a Linux workstation with stereo-graphics capabilities.
Further evaluation of this 20-3 Å P32 map from Fe SAD phasing revealed several noteworthy features. Each ‘Fe’ site clearly represented an individual Fe2S2 cluster in the structure, and an equilateral triangle arrangement of these sites, with sides of ˜50 Å in length, was located in the map. Similar intermolecular Fe2S2 cluster arrangements have been observed in other known dioxygenase crystal structures, such as the naphthalene 1,2-dioxygenase structures from Pseudomonas sp. (PDB entries 1NDO and 1EG9; SEQ ID NO:22) and from Rhodococcus sp. (PDB entries 2B24 and 2B1X; SEQ ID NO:20). From this realization, and by using the Rieske cluster in 2B1X as a guide, the peptide stretches that covalently interact with the Rieske cluster, residues 42-58 and 68-82, as well as the Rieske cluster itself, denoted 501, were built into the density map. Once the three Rieske clusters in the DMO asymmetric unit were defined, 20-3 Å program SOLVE phasing was performed using six Fe atoms (instead of the initial three Fe atoms), followed by program RESOLVE density modification. The resulting Fourier map was significantly improved over the initial one, with better peptide path definition and side chain clarity. From this density map, and by concentrating on building DMO structure into just one DMO molecule in the asymmetric unit, a preliminary DMO structure containing 193 (residues 1-130; 146-155; 272-324) of the 349 total residues and the Fe2S2 cluster was built.
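The residue count for the preliminary model can be checked from the stated ranges, as in the following minimal sketch. It assumes inclusive residue ranges, as is conventional, and uses only the ranges and the 349-residue construct length given above; the function name is hypothetical.

```python
def count_residues(ranges):
    """Total number of residues covered by inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


# Ranges built into the preliminary model, as stated in the text.
preliminary_ranges = [(1, 130), (146, 155), (272, 324)]
construct_length = 349

n_built = count_residues(preliminary_ranges)
fraction_built = n_built / construct_length

print(n_built)
print(f"{fraction_built:.0%}")
```

The same helper can be reused to track model completeness as further residue ranges are fitted into improved density maps.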
To improve the quality of the DMO density map for further map-fitting, the coordinates of the three Fe2S2 clusters were included in subsequent 20-3 Å resolution RESOLVE density modification work to allow non-crystallographic symmetry (NCS) averaging to be performed; from these Fe2S2 coordinates, the program could define the 3-fold NCS symmetry axis relating the individual DMO monomers in the DMO trimer, and use this axis for density averaging. Using NCS-averaging improved the Fourier map quality. In addition, calculating 20-3.2 Å difference anomalous Fourier maps using the Cr anode X-ray data and these phases yielded strong (>3.5σ) peaks at the location of well-defined sulfur atoms (found in Cys and Met residues, and in the Fe2S2 clusters) in the map. With these enhancements, further map fitting allowed 83% of the DMO structure to be built: residues 2-157; 196-234; 247-269; 278-343; and the 501-Fe. The Phaser MR program was then used to locate the remaining two DMO molecules in the asymmetric unit, and this was followed by 20-3 Å crystallographic refinement using the CNX program (Accelrys, Inc., San Diego, Calif.). The CNX program is based on the once widely used program X-PLOR (Brünger, 1992a). This refined DMO structure has an Rwork=30.8% and an Rfree=35.0% for 20-3 Å data. 90% of the X-ray data were used to calculate the Rwork and 10% of the X-ray data were used to calculate the Rfree (Brünger, 1992b). The complete refinement statistics are listed in Table 12 under ‘DMO Crystal 3’. Ribbon-style renditions of the initial DMO trimer structure are displayed in
C. Pursuing a DMO Structure with the Non-Heme Iron Site Occupied and Dicamba or DCSA Bound.
When the structure of DMO Crystal 3 was solved, conspicuously absent from the structure was the iron (Fe) ion that must bind to the non-heme or catalytic iron site. In all known Rieske non-heme iron oxygenase (RO) structures to date, electron transfer in the RO α3 or α3β3 quaternary unit involves a flow of electrons from the Fe2S2 Rieske center in one subunit to the mononuclear iron, ˜12 Å away, in a neighboring subunit (Ferraro et al., 2006). This iron site is chelated by two histidines and a single aspartic acid, and it is at this site that molecular oxygen and an aromatic substrate react to produce the oxidized product (Ferraro et al., 2006). Although the invention is not bound by any particular mechanism for electron transport and catalysis, it is believed that electron transfer and catalysis involving dicamba likely occur by a similar route in DMO, and it was decided to obtain a crystal structure with the non-heme iron site occupied.
As a first step toward getting the non-heme iron site occupied in the DMO crystals, crystallizations were performed as before, but with 5 mM Fe+2 and 5 mM Fe+3 included in the droplets, as well as 5 mM of other isostructural divalent ions, like Mn+2, Co+2, Ni+2, Cu+2 and Zn+2. Crystals resulted only from droplets including Co+2, Ni+2 and Mn+2. X-ray data were collected on a DMO crystal grown in 5 mM Ni+2 and soaked in a cryo-solution, similar in composition to the one noted previously, but containing 5 mM Ni+2 for 1.7 hours prior to cryo-cooling. The data collection summary and structure solution statistics for this crystal are listed in Table 12 under “DMO Crystal 4”. The X-ray data from DMO Crystal 4 were collected employing an X-ray synchrotron (SER-CAT 22-BM beamline; Argonne National Laboratory, Argonne, Ill.) at a wavelength of 1.000 Å, using a Mar225 CCD detector (Mar USA, Evanston, Ill.).
The refined structure for DMO Crystal 4 revealed no evidence of a non-heme iron site being occupied within ˜12 Å of the Fe2S2 clusters in the DMO trimer, as was the case in the refined structure of DMO Crystal 3 (
The difficulty in obtaining a DMO crystal structure with the non-heme iron site occupied led to a review of all aspects of the DMO structure determination process, and to a review of the RO crystallization literature. This review suggested that use of citrate buffer in both the crystallizations and crystal cryo-solutions could be the root cause. Citric acid is known to bind a variety of divalent metal ions, including ions of magnesium, calcium, manganese, iron, cobalt, nickel, copper, and zinc (Dawson et al., 1986). Additionally, other ROs that crystallize in the pH 5.5-6.5 range and have solved structures with the non-heme iron site occupied, such as naphthalene 1,2-dioxygenase from Pseudomonas sp. (Kauppi et al., 1998; PDB entry 1NDO; SEQ ID NO:22) and nitrobenzene dioxygenase (PDB entry 2BMO; SEQ ID NO:17), were crystallized using MES or HEPES buffers, which have low to negligible metal ion binding properties (Good & Izawa, 1968).
Growth of DMO crystals using either MES or acetate buffers in the crystallization initially yielded crystals of lesser size and visual quality than those grown using citrate buffer. However, a crystal structure of DMO with the non-heme iron site occupied was pursued by first equilibrating DMO crystals in a cryo-solution containing MES buffer and then equilibrating the DMO crystals in a cryo-solution containing MES and Fe+2. DMO crystals, grown by means already described, were soaked in a cryo-solution of 23% PEG 4K, 23% glycerol, and 0.1 M MES-pH 6.0 buffer for ˜16 hours and were then transferred to a cryo-solution that was 20.7% PEG 4K, 20.7% glycerol, 0.09 M MES-pH 6.0, and ˜10 mM FeSO4 for ˜9 hours. X-ray data were collected on an Fe+2-soaked DMO crystal using the Cu-anode X-ray data collection system at a wavelength of 1.5418 Å, and in a manner previously described in this example. These data were processed and the structure solved by previously noted methods. The data collection and refinement statistics for this crystal are listed in Table 12 under “DMO Crystal 5”, and the active site region of this crystal structure is displayed in
Evaluation of the ‘DMO Crystal 5’ X-ray data indicated success in achieving Fe+2-binding at the non-heme iron site of the crystal. As has been noted previously, iron has a significant anomalous scattering signal at the 1.5418 Å wavelength used in the data collection, and anomalous difference Fourier maps calculated using these data revealed a strong (>3.5σ) peak at the Fe+2 site position in all three molecules in the crystallographic asymmetric unit. This is strong evidence of non-heme iron binding. The structural results in ‘DMO Crystal 4’ and ‘DMO Crystal 5’ indicate that in the absence of an ion to fill the non-heme iron site, the peptide from residues 157-162 adopts an extended conformation (
D. Crystal Structure of DMO with Substrate or Product.
A 2.7 Å resolution crystal structure of DMO was next obtained, with the non-heme iron and substrate binding sites occupied in two molecules of the crystallographic asymmetric unit. The dicamba and Fe+2 ions were introduced into pre-formed protein crystals by soaking. The crystals were grown by previously noted methods using 16-20 (w/v) % PEG 6,000 and 0.1 M sodium acetate buffer, pH 6.0, as the precipitating agent. Crystals were then transferred to a stabilization solution that contained 20.4 (w/v) % PEG 6,000, 20.4% glycerol, 0.09 M HEPES-pH 7 buffer, 1.25 mM dicamba, and ˜10 mM FeSO4. The crystal used for X-ray data collection was stored in this solution for 30 hours (1.25 days) prior to cryo-cooling. Data collection and structure solution statistics for this crystal are listed in Table 13 below under “DMO Crystal 6”. Atomic coordinates are given in Table 4. The structure solution statistics for the crystal bound to DCSA are listed in Table 13 below under “DMO Crystal 7”. Atomic coordinates are given in Table 5. Both data sets for DMO Crystal 6 and DMO Crystal 7 were collected on the SER-CAT 22-ID beamline at the Advanced Photon Source synchrotron (Argonne National Laboratory, Argonne, Ill.). The X-ray wavelength employed was 1.000 Å, and these data were collected using a Mar300 CCD detector (Mar USA, Evanston, Ill.).
The DMO-dicamba crystal structure (“DMO Crystal 6” crystal structure), was solved by the molecular replacement (MR) method using all the crystal Fobs X-ray data from 35.9-2.70 Å resolution with |Fobs|/σ|Fobs|>2.0, the coordinates for DMO molecule “a” from the DMO Crystal 5 coordinates as the search model, and the Phaser program to perform the MR phasing. Evaluation of the initial 2Fo-Fc density maps revealed that molecule “a” of the trimer contained no evidence for non-heme iron binding and that the peptide stretch from residues a159-a175 was disordered. However, this map also revealed evidence of non-heme iron binding and indications of dicamba binding in molecules “b” and “c”. As a consequence of this observation, the first modeled DMO segment of molecule “a” was trimmed from a2-a175 to a2-a158. After two additional rounds of map-fitting and refinement, a refined structure was obtained which revealed clear, well-defined electron density for dicamba in DMO molecules “b” and “c” of the asymmetric unit.
It was also possible to obtain a 2.8 Å resolution crystal structure of DMO with clear electron density for the non-heme iron site occupied and DCSA bound in two molecules of the crystallographic asymmetric unit. DCSA is an acronym for 3,6-dichlorosalicylic acid, the product resulting when DMO dealkylates dicamba. The DCSA and Fe+2 ions were introduced into pre-formed protein crystals by soaking. The crystals were grown by previously noted methods using 16-20 (w/v) % PEG 6,000 and 0.1 M sodium acetate buffer-pH 6.0 as the precipitating agent. The crystals were then transferred to a stabilization solution that contained 20.4 (w/v) % PEG 6,000, 20.4% glycerol, 0.09 M HEPES-pH 7 buffer, 1.25 mM DCSA, and ˜10 mM FeSO4. The crystal used for X-ray data collection was stored in this solution for 30 hours (1.25 days) prior to cryo-cooling. The data collection and structure solution statistics for this crystal are listed in Table 13 above, under “DMO Crystal 7”.
The DMO-DCSA “DMO Crystal 7” structure was solved by the MR method using the crystal Fobs X-ray data file, the coordinates for DMO molecule “a” from the DMO Crystal 6 coordinates, and the Phaser program. The Phaser search model used DCSA rather than dicamba, and this was prepared by removing the methoxy carbon from the dicamba coordinates. Evaluation of the initial 2Fo-Fc density maps revealed that molecule ‘b’ contained no evidence for non-heme iron binding and that the peptide stretch from residues b159-b175 was disordered. However, this map also revealed evidence of non-heme iron binding and indications of DCSA binding in molecules ‘a’ and ‘c’. The density for DCSA binding was strongest in molecule ‘a’, and was only partially complete in molecule ‘c’. As a consequence of this observation, in the second round of refinement DMO molecule ‘b’ contained only residues b2-b158 for the first protein segment, and no longer included the non-heme iron ion or DCSA. After one additional round of refinement, the ‘DMO Crystal 7’ structure resulted.
ICM-Pro from MolSoft (ver. 3.4-7h; MolSoft LLC, La Jolla, Calif.) was used to probe the interaction domains in the DMO Crystal 5 structure. Default settings were used for the identification of the contact regions, and the results were examined by hand to identify those residues on the protein subunit interface.
While a DMO monomer alone will likely perform a single turnover, for full catalysis interaction with other subunits is necessary. In addition helper proteins are required such as an electron donor to the Rieske center (e.g. ferredoxin) and a reductase to shuttle electrons from NADH or NADPH. The interface (i.e. “interaction domain”) between subunits is described by and includes these 52 amino acids numbered from the N-terminus of a monomer: V325, E322, D321, C320, L318, M317, A316, P315, R314, I313, V308, Y307, R304, I301, A300, V297, V296, V164, Y163, H160, G159, D157, N154, D153, R98, L95, S94, G89, G87, H86, P85, N84, L73, G72, H71, Y70, P69, C68, Q67, D58, P55, A54, F53, R52, H51, P50, I48, D47, L46, P31, T30, and D29. All of these cross subunit contacts are described below with the most significant of these contacts used as the anchors for this discussion.
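As a bookkeeping aid, the residue list above can be parsed and grouped into contiguous sequence segments. The following is a minimal sketch only; the helper names `parse_residues` and `contiguous_runs` are illustrative assumptions and are not part of the ICM-Pro analysis described here.

```python
import re

# Interface residues as listed in the text (one-letter code + residue number).
INTERFACE = (
    "V325, E322, D321, C320, L318, M317, A316, P315, R314, I313, V308, Y307, "
    "R304, I301, A300, V297, V296, V164, Y163, H160, G159, D157, N154, D153, "
    "R98, L95, S94, G89, G87, H86, P85, N84, L73, G72, H71, Y70, P69, C68, "
    "Q67, D58, P55, A54, F53, R52, H51, P50, I48, D47, L46, P31, T30, D29"
)


def parse_residues(text):
    """Return (amino_acid, position) tuples sorted by sequence position."""
    pairs = [(aa, int(num)) for aa, num in re.findall(r"([A-Z])(\d+)", text)]
    return sorted(pairs, key=lambda pair: pair[1])


def contiguous_runs(positions):
    """Collapse sorted positions into (start, end) runs of consecutive numbers."""
    runs = []
    start = prev = positions[0]
    for pos in positions[1:]:
        if pos != prev + 1:
            runs.append((start, prev))
            start = pos
        prev = pos
    runs.append((start, prev))
    return runs


residues = parse_residues(INTERFACE)
positions = [pos for _, pos in residues]
runs = contiguous_runs(positions)

print(len(residues), "interface residues")
print("segments:", runs)
```

Grouping by run makes it easier to see which stretches of the monomer sequence contribute to the interface, for example the N-terminal segments around the Rieske cluster versus the C-terminal segment.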
Key interface residues include H51a:R52a:F53a:Y70a:H71a:H86a:H160c:Y163c:R304c:Y307c:A316c:L318c. These residues account for over 85% of the interface contacts (i.e. non-redundant contacts; possibly even more of the total contacts, since some interact with the same residues). These identified residues are thought to make up an interaction domain motif. The functional role of these residues in interactions between the subunits has not previously been described. Conserved residues in the alignments (
Some of the key interactions mediated by the residues of the interaction domain involve residue H51, which is involved in the Rieske cluster (2.2 Å to the Fe cluster, subunit a) and participates in a bifurcated hydrogen bond with the E322 side chain in an adjoining subunit (subunit c). The H-bond (defined from donor to acceptor) interactions are between H51 NE2 and E322 OE1 (2.5 Å) and OE2 (2.7 Å), respectively. H51 also has Van der Waals contacts (≤3.8 Å) with P315, A316, and L318. This is a key residue for Rieske center binding and electron transport and participates in the interface region.
The R52 side chain is adjacent to H51, a member of the Rieske cluster. The main chain NH appears to have a hydrogen bond (3.2 Å) with the Rieske center sulfur. All of the N atoms of the R52 side chain are within 3.4 to 3.8 Å of the E322 side chain oxygen atom OE1 and up to 4.6 Å away from OE2. The side chain has various contacts of less than 4.3 Å with adjacent subunit residues D153, P315, M317, and V325.
The F53 side chain of subunit a inserts into a hydrophobic cavity defined by the following subunit c residues: P315, R314, I313, V308, Y307, I301, and the main chain of R304. These hydrophobic contacts range from 3.8 to 4.3 Å. This is a significant grouping of hydrophobic contacts which is likely a key anchor to this portion of the interface. This residue is not conserved in other oxygenases of this type, which typically utilize much smaller residues in this position (G, L). Its proximity to H51 (<4.0 Å in the same subunit) and to the non-heme iron suggests that it could play a role in excluding solvent from this side of the non-heme iron site.
Y70 is less than 6.0 Å away from the non-heme iron of the adjoining subunit and less than 4.0 Å from the Rieske center. The side chain is involved in Van der Waals contacts/hydrophobic interactions of 4.0 Å or less with the subunit c side chains of H160, I301, V297, and V164 and the main chain of Y163. The Y70 (OH) has a polar contact with N154 OD1 (or ND1) (2.5 Å) from subunit c. N154 ND1 is only 3.3 Å away from the non-heme iron ion and provides a possible ligand interaction. The V297 CG1 side chain atom has Van der Waals contacts with A54a CB (3.8 Å). A54a engages in Van der Waals contacts with I301 (4.3 Å). These residues form a series of hydrophobic contacts along an interface helix.
The H71 side chain NE2 in subunit “a” engages in a hydrogen bond with the OD1 of D157 (2.8 Å), in subunit “c”, which is in turn hydrogen bonded to H160 ND1 through D157 OD1 (2.8 Å). This is an important interaction analogous to that in NDO (Kauppi, 1998;
H86 is 6.1 Å from the Rieske center and its ND1 and CE1 atoms have Van der Waals contacts of 3.8-4.4 Å with the underside of the side chain of D321 of the adjoining subunit. H86 side chain atoms also have cross-subunit Van der Waals/hydrophobic interactions of less than 4.3 Å with G159, Y163, C320 and L318. Few other amino acid residues would fit in place of G159 in this structure.
The H160 residue is important to binding the non-heme iron. It is also on the interface between the subunits and its side chain methylene group is flanked by Y70 and H71 of the neighboring subunit and V297 of the same subunit. These three residues are highly conserved among all oxygenases and this pocket of Y70a, H71a and H160c is likely critical to the function of this enzyme.
Y163 is a highly solvent-exposed residue in subunit “c”. It forms contacts with subunit a through hydrophobic and Van der Waals interactions (4 Å or less) with 8 residues: Q67, C68, P69, Y70 (main chain), G72, H71 (main chain), P85, and H86; most from the same face of the aromatic residue side chain. It appears to shield H71 and possibly C68 (Rieske center cysteine) from solvent, although it interacts with the main chain of this residue. Interestingly, it is juxtaposed with the Y70 residue in subunit a, and both are likely key interactions for keeping the surfaces together and shielding the non-heme iron center. Y163 appears to have a conserved function (i.e. hydrophobic aromatic), while Y70 appears to be universally conserved in the most closely related oxygenases (although these have limited overall homology, i.e. about 37%). This is a significant residue that may be part of a motif. Structural information thus provides insight into its function (e.g. as a key interaction residue and for solvent exclusion from the Rieske site).
The subunit “c” R304 side chain and the subunit “a” D47 side chain engage in a polar interaction (salt bridge or hydrogen bond); the distances between adjacent side chain atoms range from 3.0 to 3.7 Å. The NH1 N of R304 appears to be in position to form a hydrogen bond of 3.0 Å in length with the carbonyl O of D47. The I48 side chain has multiple contacts (4.0 and 5.0 Å) with the R304 side chain. The P31, P55 and A54 side chains and residues have van der Waals contacts with R304 (3.2-4.7 Å). R304 is part of the F53 cluster of residues.
The subunit “c” Y307 residue is partially solvent exposed and has numerous van der Waals contacts (4.3 Å or less in length) with subunit a residues: D29, T30 (CG2 side chain), P31 (CD ring carbon), L46, I48, and R98 are among them. L46 and I48 (3.2 Å) appear to be the most significant of these hydrophobic interactions. It may also participate in a polar interaction with the side chain of R98 (NH1), which is only 3.9 Å away from the Y307 (OH).
The subunit “c” A316 (involved in the H51 set of residues described above) side chain lies along a hydrophobic surface coil near the subunit a Rieske cluster. The CB has hydrophobic contacts with the L95 side chain CD2 (3.7 Å), the carbonyl O of S94 (3.6 Å), and the carbonyl O of P50 (3.2 Å). Also, A316 N engages in a hydrogen bond with P50 (2.8 Å). It also has a Van der Waals interaction of 4.3 Å with the F53 ring.
The L318 (subunit “c”) side chain inserts into a shallow cavity in subunit a, defined by the side chains of H51, H71, L73, N84, H86, and L95. All of these hydrophobic contacts range from 3.5 to 4.3 Å in distance. This cluster of 6 contact residues is likely very important to the protein-protein interaction of the subunits. It also appears that the L318 hydrophobic cluster shields the Rieske center from solvent. A neighboring residue, C320, has a polar contact across the subunits (3.4 Å with N84) to the outside of this cluster at the end of the C terminal helix.
Table 14 below shows the interaction residues described above along with the contact area and exposed area. The “Contact area” is the area of the residue that is in contact with another residue or metal ion. The “Exposed area” is the area of the residue that is exposed to solvent. The stated “percent” is the ratio of contact/exposed×100. These contacts were determined using the ICM-Pro program with the algorithm developed by Abagyan and Totrov (1997). The bolded portion of the table corresponds to the most variable region among the alignments of
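Residue | Number | Contact area | Exposed area | Percent |        |
M       | 317    | 14.3         | 104.9        | 14      | x      | x
A       | 316    | 65.9         | 92.2         | 71      | x      | x
P       | 315    | 30.7         | 97.4         | 31      | K      | P
R       | 314    | 5.6          | 182.1        | 3       | L      | V
I       | 313    | 21.9         | 101.3        | 22      | L      | M
V       | 308    | 33.9         | 59.7         | 57      | x      | V
Y       | 307    | 71.6         | 135.1        | 53      | x      | x
R       | 304    | 72.6         | 146.7        | 49      | Q(N)   | Q(I)
I       | 301    | 61.5         | 88.1         | 70      | Q      | Q
A       | 300    | 17.9         | 38.1         | 47      | R      | A
V       | 297    | 32.4         | 73.2         | 44      | M, I   | M, I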
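The “percent” column can be recomputed from the contact and exposed areas as a consistency check. The sketch below uses only the rows listed above; the function name `percent_contact` is an illustrative assumption, and because the tabulated areas are themselves rounded, agreement is only expected to within about one percentage point.

```python
# (residue, number, contact area, exposed area, listed percent) from the rows above.
ROWS = [
    ("M", 317, 14.3, 104.9, 14),
    ("A", 316, 65.9, 92.2, 71),
    ("P", 315, 30.7, 97.4, 31),
    ("R", 314, 5.6, 182.1, 3),
    ("I", 313, 21.9, 101.3, 22),
    ("V", 308, 33.9, 59.7, 57),
    ("Y", 307, 71.6, 135.1, 53),
    ("R", 304, 72.6, 146.7, 49),
    ("I", 301, 61.5, 88.1, 70),
    ("A", 300, 17.9, 38.1, 47),
    ("V", 297, 32.4, 73.2, 44),
]


def percent_contact(contact, exposed):
    """Ratio of contact area to exposed area, expressed as a percentage."""
    return 100.0 * contact / exposed


for aa, num, contact, exposed, listed in ROWS:
    computed = percent_contact(contact, exposed)
    print(f"{aa}{num}: computed {computed:.1f}%, listed {listed}%")
```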
Five additional residues are involved in subunit interactions: A300, V296, G89, G87, and D58. The A300c CB has van der Waals contacts of 3.4-4.1 Å with P55 ring carbon atoms. P55 is near the Rieske center and is part of the R304 interactions. The C320c SG interacts with a neighboring molecule near the subunit a Rieske cluster. It has van der Waals contacts of less than 4.3 Å with N84 side chain ND2, and the carbonyl oxygens of G89 and G87. The side chain nitrogen ND2 of N84 has van der Waals contacts of 3.9 Å with the C320c SG. These interactions were described in part in the description of the L318 cluster (see above). The V296c CG1 has long van der Waals contacts with the D58 and P55 on the order of 5 Å or so.
A number of residues are conserved in these sequences as described above and their conservation allows the definition of an interface structure, domain or motif. Numbering is based on the structure file residue numbers.
A. Active Site Modeling Using Structure with a Bound Non-Heme Iron
DMO Crystal 5 structure coordinates were analyzed to define the structure of the active site and the interactions of DMO with dicamba. The PDB file was loaded into Molsoft ICM-Pro, version 3.4 (MolSoft LLC, La Jolla, Calif.), and converted to a Molsoft object. Hydrogen atoms were added and optimized, and the resulting structure was defined as a docking receptor in Molsoft, with default parameters, and used to identify potential binding pockets. The largest pocket (volume 443 Å³; area 479 Å²) is in the vicinity of the non-heme iron ion. This is thought to be the dicamba binding pocket, as required by the chemical constraints for dicamba demethylation. The pocket is formed by residues L155, D157, L158, H160, A161, H165, R166, A169, Q170, D172, A173, A216, W217, N218, I220, N230, I232, A233, V234, S247, G249, H251, S267, L282, W285, Q286, A287, Q288, A289, V291 (as shown in
A receptor map was calculated using default Molsoft parameters and dicamba docking was performed. The five lowest energy conformations of docked dicamba are shown in
The dicamba binding pocket of DMO (DdmC) was identified using Molsoft ICM-Pro (version 3.4) and the residues forming the pocket were mapped onto the primary sequence (e.g.
B. Active Site Modeling Using Structure with a Bound Non-Heme Iron and Bound Dicamba in the Crystal (Crystal 6)
In order to obtain a structure of the DMO bound to dicamba, a molecular structure with dicamba present in subunits b and c of the DMO trimer was constructed. As above, a data file in PDB (Protein Data Bank) format with atomic coordinates for DMO with bound dicamba (e.g. data of Table 4) was loaded into Molsoft ICM-Pro (version 3.4) and converted to a Molsoft object (hydrogen atoms were added and optimized). The dicamba contact residues in the binding pocket, with the orientation elucidated from “Crystal structure 6”, were calculated using default Molsoft parameters. Corresponding residues in toluene sulfonate monooxygenase and vanillate monooxygenase (also known as vanillate O-demethylase) were identified from sequence alignment for the purpose of further engineering of these oxygenases (e.g. Tables 16, 23).
The resulting structure indicates that the list of residues predicted to form the dicamba binding pocket, by modeling described above, are contained in the pocket identified based on the actual X-ray structure of DMO with dicamba. The list of predicted residues is as follows; underlined residues were identified to be within 4 Å and/or to participate in binding or to be in contact with the dicamba molecule (see
Many of the DMO active site residues forming the binding pocket and those interacting with the substrate (i.e. within a 4 Å distance) as identified by the three dimensional structure are not readily identifiable by primary amino acid sequence alignment, for instance as shown in
C. Active Site Modeling with DCSA Bound
As can be seen in
The electron transfer distances in the DMO Crystal 5 coordinates are listed in Table 19 below. The electron transfer path from a Rieske center to the non-heme iron center with an activated oxygen, ultimately resulting in the oxidation of the dicamba to DCSA, is described using the atomic coordinates of Crystal 5 for the three active sites in the entire trimer, with approximate atomic distances, and with “A”, “B” or “C” denoting the monomer in which the given residue is found.
The entire set of residues that forms the extended electron transport chain is H71, N154, D157, H160, H165 and D294, numbered corresponding to SEQ ID NO:2 or SEQ ID NO:3, or conservative substitutions thereof. These residues constitute a motif, and the distances and arrangements above constitute a necessary element of the functional catalytic enzyme. On average the distance for Fes FE2 to His71 ND1 is 2.57±0.15 Å; the distance for His71 NE2 to Asp157 OD1 is 3.00±0.20 Å; the distance for Asp157 OD1 to His160 ND1 is 2.80 Å; and the distance for His160 NE2 to Fe is 2.43±0.12 Å. These distances may vary by about 0.2-0.3 Å.
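The average distances above can be used as a simple screen when inspecting another structure or model for a comparable path. The following is a minimal sketch under stated assumptions: the step labels, the function `check_path`, and the choice of 0.3 Å (the upper end of the variation stated above) as a single tolerance are illustrative, and the "measured" values in the usage example are made up for demonstration only.

```python
# Average distances (in Å) quoted in the text for each step of the path.
PATH = [
    (("Rieske FE2", "His71 ND1"), 2.57),
    (("His71 NE2", "Asp157 OD1"), 3.00),
    (("Asp157 OD1", "His160 ND1"), 2.80),
    (("His160 NE2", "non-heme Fe"), 2.43),
]

TOLERANCE = 0.3  # assumed cutoff: upper end of the "about 0.2-0.3 Å" variation


def check_path(measured, tolerance=TOLERANCE):
    """Return the steps whose measured distance deviates beyond the tolerance.

    `measured` maps (atom_1, atom_2) step labels to distances in Å.
    A step missing from `measured` is also reported.
    """
    failures = []
    for step, reference in PATH:
        value = measured.get(step)
        if value is None or abs(value - reference) > tolerance + 1e-9:
            failures.append(step)
    return failures


# Hypothetical measurements for illustration (not taken from any table).
example = {
    ("Rieske FE2", "His71 ND1"): 2.60,
    ("His71 NE2", "Asp157 OD1"): 3.55,  # deliberately outside the tolerance
    ("Asp157 OD1", "His160 ND1"): 2.75,
    ("His160 NE2", "non-heme Fe"): 2.40,
}
print(check_path(example))
```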
The closest potential homologs of DdmC were identified by Blast search (Altschul et al., 1990), selected for chemical similarity of substrate and reaction, and aligned (Molsoft ICM-Pro, version 3.4; Molsoft LLC, La Jolla, Calif.). Identity and similarity tables for the sequences used are shown below. These were aligned using the ZEGA algorithm inside ICM-Pro for multiple sequence alignments. Based on the alignment, several regions for degenerate oligonucleotide tail (‘DOT’;
The sets of DOT primers (SEQ ID NOs:46-151) introducing the amino acid combinations indicated below were designed and used to introduce mutations into the ddmC gene by means of terminated PCR on the template (ddmC gene with His tag and two changes, T2S+I123L (SEQ ID NO:23) or V4L+L281I (SEQ ID NO:42), in pMV4 vector (Modular Genetics, Cambridge, Mass.)). Resulting PCR products were treated with DpnI to remove the parental template molecules, self-annealed and transformed into chemically competent E. coli Top10 F′ (Invitrogen, Carlsbad, Calif.). Standard DNA cloning methods were utilized (e.g. Sambrook et al., 1989). The individual colonies were grown in liquid culture. DNA was isolated by a standard miniprep procedure and used to transform chemically competent E. coli (e.g. BL21(DE3)).
An LC-MS/MS screen for oxygenase activity was used to detect DCSA. The method comprises a two stage process made up of a liquid cell culture assay coupled to an LC-MS/MS detection screen. In the liquid cell culture stage, the gene of interest, i.e. ddmC or a variant thereof, was cloned under the control of a promoter (e.g. T7 promoter, pET vector) and transformed into an E. coli host cell. The transformed E. coli cells harbouring the gene of interest were then grown in LB/carbenicillin media containing 200 to 500 μM Dicamba (30 hrs, 37° C., shaking at 450 RPM). Cells were spun down, and the supernatant was filtered and diluted tenfold with the inclusion of 8 μM salicylic acid as an internal standard. Samples of the supernatant, and/or optionally the cell pellet (or lysate thereof), from this first stage were analyzed for DCSA levels (i.e. DMO activity level) by LC-MS/MS.
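One common way to use an internal standard such as the salicylic acid added here is to normalize the analyte signal to the internal standard signal before comparing samples. The sketch below illustrates that general idea only; the text does not specify how the DCSA signal was quantified, so the function names and the peak areas used are hypothetical.

```python
def response_ratio(analyte_area, internal_standard_area):
    """Analyte peak area normalized to the internal standard peak area."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / internal_standard_area


def relative_activity(variant_ratio, wild_type_ratio):
    """Variant signal expressed as a percentage of the wild-type signal."""
    if wild_type_ratio <= 0:
        raise ValueError("wild-type response ratio must be positive")
    return 100.0 * variant_ratio / wild_type_ratio


# Hypothetical peak areas (arbitrary units) for illustration only.
wild_type = response_ratio(analyte_area=50_000, internal_standard_area=10_000)
variant = response_ratio(analyte_area=25_000, internal_standard_area=10_000)
print(f"variant activity: {relative_activity(variant, wild_type):.0f}% of wild type")
```

Because every sample receives the same internal standard concentration and the same tenfold dilution, those factors cancel in the ratio between a variant and the wild type.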
This method was used to rapidly screen and provide feedback regarding enzymatic activity for use in protein design and engineering procedures, or to screen libraries of genes (e.g. bacterial genes) for activity toward dicamba or other similar substrates. While many oxygenases such as DMO are multi-component systems requiring other helper enzymes for activity, it was observed that components in E. coli may substitute for these helper enzymes or functions. Thus, transformation of E. coli with a single “oxygenase” gene from a multi-component system nevertheless results in measurable activity for the gene alone, even without other components being co-transformed into the same E. coli cell. Activity is observed because, in E. coli cells, surrogates with homology to the original helper enzymes (e.g. ferredoxin and reductase) may be utilized. Additionally, E. coli can take up substrate (e.g. dicamba) and excrete the product (e.g. DCSA) into the media supernatant, allowing for the speed and simplicity of this cell-based screen. No lysis of cells is required. Alternatively, an HPLC-based assay for DMO activity may be utilized. Promising variants were used as templates for additional rounds of DOT and/or other mutagenesis methods in an iterative manner.
Table 20 illustrates an identity and similarity table calculated using, for instance, NEEDLE a pairwise global alignment program (GAP, based on the Needleman-Wunsch global alignment algorithm to find the optimum alignment (including gaps) of two sequences when considering their entire length), and aligned as shown in
A. Non-Heme Iron and Electron Transport Variants
The DMO crystallographic data so far have revealed that His160, His165, and Asp294 are involved in chelating the non-heme iron ion, and that Asn154 plays an ancillary role in the non-heme iron chelation, and this can be seen in
Based on the DMO three dimensional structure, five amino acid mutants interfering with electron transport and non-heme iron coordination were initially suggested: N154A, D157N, H160N, H165N, D294N. Five pairs of GeneDirect primers (SEQ ID NO:32-41; Table 22) introducing the individual mutations were designed and used as PCR primers on the template, a ddmC gene with two changes, T2S and I123L (SEQ ID NO:23), cloned in the pMV4 vector (Modular Genetics, Inc. Cambridge, Mass.). The resulting PCR products were treated with DpnI to remove the parental template molecules, self-annealed and transformed into chemically competent E. coli Top10 F′ (Invitrogen, Carlsbad, Calif.). The individual colonies were grown in liquid culture and DNA was isolated by a standard miniprep procedure and used to transform chemically competent E. coli BL21(DE3). The E. coli culture was grown in LB/carbenicillin media containing 500 μM Dicamba (30 hrs, 37° C., 450 RPM). Cells were spun down, and the media was filtered and diluted tenfold with 8 μM salicylic acid as an internal standard. The samples were frozen and DCSA levels (i.e. DMO activity level) were determined by LC-MS analysis. Results are shown in Table 21, demonstrating that these residues participate in electron transport and non-heme iron coordination.
These data clearly confirm the importance of residues N154, D157, H160, H165, and D294 to the functioning of DMO. Mutating any of the residues implicated in non-heme iron binding (H160, H165, and D294) leads to an inactive enzyme relative to the wild type (WT) enzyme, and mutating N154, which also appears to play a role in iron chelation, yields an enzyme with only 2% or less activity relative to the WT. Moreover, mutating D157, the aspartate residue responsible for electron transfer between DMO's protein subunits, to Asn results in an enzyme with only 23% activity relative to the WT. This indicates that N157, which is iso-structural to D157, retains some minor electron transfer capability relative to the WT enzyme. Additional variants and their activities are also described in Example 12.
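The substitution labels used above (e.g. N154A, D157N) encode the wild-type residue, the position, and the replacement residue. The following minimal sketch shows how such labels can be parsed and applied to a sequence while checking that the stated wild-type residue is actually present. The helper names are illustrative, and `toy_sequence` builds a placeholder string, not the DMO sequence of the SEQ ID NOs.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'N154A' into (wild_type, position, replacement)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutation(sequence, label):
    """Return the sequence with the substitution applied (1-based numbering)."""
    wild_type, position, replacement = parse_mutation(label)
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"{label}: expected {wild_type} at {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


def toy_sequence(length=300):
    """Placeholder poly-Gly sequence carrying the five residues named above."""
    residues = ["G"] * length
    for position, aa in {154: "N", 157: "D", 160: "H", 165: "H", 294: "D"}.items():
        residues[position - 1] = aa
    return "".join(residues)


sequence = toy_sequence()
for label in ["N154A", "D157N", "H160N", "H165N", "D294N"]:
    mutant = apply_mutation(sequence, label)
    _, position, _ = parse_mutation(label)
    print(label, sequence[position - 1], "->", mutant[position - 1])
```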
B. C-Terminal Helix Variants
The C-terminal helix of the DdmC protein is defined by the following residues (AAVRVSREIEKLEQLEAA crystal structure residue numbers 323-340; SEQ ID NO:24;
Additional changes were also made in the IEKLEQLE (SEQ ID NO:25) region (SEQ ID NOs:26-31) which includes this residue; some examples are shown in Table 23. None of these mutants showed detectable activity in an in vivo screen. However, some residues in this region may be changed while retaining DMO activity.
C. Interface Residue Variants
The F53 side chain of subunit a inserts into a hydrophobic cavity as mentioned above. This residue is not highly conserved in other oxygenases of this type. Site-directed mutagenesis and activity assays indicate that residue F53 can be altered, for instance to histidine and to leucine, which are functionally equivalent to phenylalanine for hydrophobic interactions, and retain some activity. Additional variants are described in Examples 12-13.
Polypeptides encoded by genes for toluene sulfonate mono-oxygenases (“TolOs” and the like) and vanillate O-demethylases (“VanOs” and the like; Table 24; SEQ ID NOs:4-22) were aligned with DMO (SEQ ID NO:1) to evaluate available oxygenase enzymes with the highest known degree of identity or similarity to DMO (e.g. Table 20,
Rhodopseudomonas palustris
Xanthomonas campestris pv. vesicatoria str. 85-10
Pseudomonas syringae pv.
Comamonas testosteroni
Pseudomonas maltophilia
Pseudomonas fluorescens Pf-5
Bradyrhizobium sp. BTAi1
Acinetobacter sp. ADP
Pseudomonas sp. vanA gene
Pseudomonas sp.
Ralstonia eutropha JMP134
Pseudoalteromonas atlantica
Comamonas testosteroni
Comamonas testosteroni
Burkholderia cepacia
Comamonas sp.; strain: Js765
Rhodococcus sp.
Rhodococcus sp
Pseudomonas sp
Ralstonia solanacearum strain
Additional DMO crystal soaking and structure determination has yielded refined structures with Co+2 in the non-heme iron site at higher resolution, and coordinates as shown in Table 25 (“DMO Crystal 11”), with Rwork=27.0% and Rfree=30.2% for 20-2.05 Å data, and Table 26 (“DMO Crystal 12”), with Rwork=24.5% and Rfree=27.9% for 20-2.05 Å data. “Crystal 11” represents the refined structure with cobalt and dicamba, while “Crystal 12” represents the refined structure with cobalt and DCSA. The data collection and refinement statistics for DMO Crystal 11, and DMO Crystal 12, are listed in Table 27.
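For reference, Rwork and Rfree values such as those quoted above are conventionally computed as R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, with Rfree evaluated on a subset of reflections withheld from refinement (10% in the earlier refinement described in this example). The following is a minimal sketch of that conventional definition; it omits the scale factor between observed and calculated amplitudes, does not reproduce the internals of the CNX program, and uses made-up amplitudes.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor over paired amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.10, seed=0):
    """Randomly assign reflection indices to a working set and a free set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    return sorted(indices[n_free:]), sorted(indices[:n_free])


# Made-up amplitudes for illustration only.
f_obs = [10.0, 20.0]
f_calc = [9.0, 22.0]
print(f"R = {r_factor(f_obs, f_calc):.3f}")

work, free = split_work_free(100)
print(len(work), "working reflections;", len(free), "free reflections")
```

Rfree is obtained by applying `r_factor` only to the free-set reflections, which are never used to adjust the model, while Rwork uses the remainder.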
These structures (Crystals 11 and 12) have Co+2 in the non-heme iron site instead of Fe+2. The Co+2, dicamba, and DCSA were introduced into pre-formed protein crystals by soaking methods similar to those used to obtain the DMO-Fe+2-dicamba and DMO-Fe+2-DCSA structures, which have already been described. The crystals were grown by previously noted methods. The DMO-Co+2-dicamba crystals were soaked in stabilization solutions containing 10 mM CoCl2 and 1.25 mM dicamba for 24 hours prior to cryo-cooling. The DMO-Co+2-DCSA crystals were soaked in stabilization solutions containing 10 mM CoCl2 and 1.25 mM DCSA for 24 hours prior to cryo-cooling. The data and refinement statistics for the DMO-Co+2-dicamba crystal structure are listed under “DMO Crystal 11” in Table 27 and those for DMO-Co+2-DCSA are listed under “DMO Crystal 12” in Table 27. Data sets for DMO Crystal 11 and DMO Crystal 12 were collected on the SER-CAT 22-ID beamline at the Advanced Photon Source synchrotron, (Argonne National Laboratory, Argonne, Ill.). The X-ray wavelength employed was 1.000 Å, and these data were collected using a Mar300 CCD detector (Mar USA, Evanston, Ill.).
Sets of saturation mutagenesis primers (using NNS degenerate triplets) were designed for 31 residues at the active site and involved in the electron transfer chain, as indicated in Table 28, and were used to introduce mutations into the ddmC gene by means of terminated PCR on the template (the ddmC gene with a His tag and two changes, T2S+I123L (SEQ ID NO:23), in the pMV4 vector (Modular Genetics, Cambridge, Mass.)). The resulting PCR products were treated with DpnI to remove the parental template molecules, self-annealed, and transformed into chemically competent E. coli Top10 F′ (Invitrogen, Carlsbad, Calif.). Standard DNA cloning methods were utilized (e.g. Sambrook et al., 1989). Individual colonies were grown in liquid culture. DNA was isolated by a standard miniprep procedure and used to transform chemically competent E. coli (e.g. BL21(DE3)). The E. coli culture was grown in LB/carbenicillin media containing 200 to 500 μM dicamba (30 hrs, 37° C., 450 RPM). Cells were spun down, and the supernatant was filtered and diluted tenfold with 8 μM salicylic acid as an internal standard. The DCSA levels (i.e. DMO activity level) were determined by LC-MS analysis as described above (e.g. Example 8).
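To illustrate what an NNS degenerate triplet encodes, the following minimal sketch expands NNS (N = A, C, G, or T; S = C or G) against the standard genetic code and counts the resulting codons, amino acids, and stop codons. It assumes the standard genetic code applies to the expression host; the helper names are illustrative and are not part of the disclosed method.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with each position running over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate symbols used in an NNS triplet.
DEGENERATE = {"N": "ACGT", "S": "CG"}


def expand(degenerate_codon: str) -> list[str]:
    """List every concrete codon matched by a degenerate triplet."""
    choices = (DEGENERATE.get(base, base) for base in degenerate_codon)
    return ["".join(codon) for codon in product(*choices)]


nns_codons = expand("NNS")
encoded = Counter(CODON_TABLE[codon] for codon in nns_codons)

n_codons = len(nns_codons)                              # distinct codons in the mix
n_amino_acids = sum(1 for aa in encoded if aa != "*")  # distinct amino acids encoded
n_stops = encoded["*"]                                  # stop codons in the mix

print(n_codons, n_amino_acids, n_stops)
```

An NNS triplet thus spans 32 codons that together encode all 20 amino acids while admitting only one stop codon (TAG), which is why a single degenerate primer set per residue suffices for saturation mutagenesis.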
Determination of the activity of mutants at specific residues, as shown in Table 28, indicated that certain residues tolerated changes while others did not. Interestingly, while N154 is outside of the 5 Å sphere for the substrate based on the structural determination, it is within the chelating sphere for the non-heme iron. N154 is thought to play a role in metal binding and possibly in modulating the activation of oxygen. Substitution at L158 resulted in loss of activity. H160 could not be changed to any other residue tried while retaining >2% activity as compared to the wild type enzyme. H160 is thus a key Fe ligand and electron transfer residue required for activity. Likewise, substitution at H165, another key Fe binding residue, also resulted in loss of activity in all cases except one, in which substitution with M yielded very low activity. Substitution at I232, a hydrophobic contact to substrate/product in the active site, resulted in retention of appreciable activity only when the conservative substitution to Val was made. Substitution at G249 resulted in loss of most activity. Substitution by a larger residue at G249 appears to be sterically unfavorable and likely interferes with hydrogen bonding to the substrate and/or product by neighboring residues. No good substitutions were found for T250, H251, Y263, and F265. All of these residues play a role in forming the three dimensional inner and outer sphere of the active site, and H251 contributes a key polar contact, in this case a hydrogen bond. S267 and L282 were also inflexible toward substitution, i.e. showed a complete or nearly complete loss of activity following almost all substitutions. Refined crystal structure information indicated that W285 is a residue for substrate and product binding. It engages in Van der Waals contact with substrate/product at the active site. The saturation mutagenesis results confirm this, in that no substitution for W285 was observed to retain any appreciable activity.
L290 was also generally intolerant of substitutions, with only one variant, comprising the conservative substitution I, retaining >50% activity as compared to wild type.
In contrast, certain residues tolerated a degree of substitution while retaining moderate activity or even demonstrating increased enzymatic activity. Thus, for instance, substitution at A169, N218, S247, R248, L282, G266, A287, or Q288 could yield a variant enzyme with >50% of wild type activity, while substitution(s) at A169, N218, R248, G266, L282, A287, or Q288, resulted, in at least some instances, in variant enzymes with activity increased above the control level. In particular, R248 and A287 showed a high flexibility for substitution while retaining activity or even showing increased activity. Although most substitutions at G266, a residue in the outer part of the carboxylate binding pocket, resulted in loss of activity, G266S was more active than the control.
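The activity categories used in the preceding discussion (activity increased above the control, >50% of wild type activity, and essentially no activity at or below 2%) can be sketched as a simple binning of relative activity values. The thresholds are taken from the text above; the function name, the category labels, and the example values are hypothetical and are not data from Table 28.

```python
def classify_variant(relative_activity_pct: float) -> str:
    """Bin a variant by its activity expressed as a percentage of the control."""
    if relative_activity_pct > 100.0:
        return "increased"   # activity above the control level
    if relative_activity_pct > 50.0:
        return "retained"    # more than 50% of wild type activity
    if relative_activity_pct > 2.0:
        return "reduced"     # measurable, but 50% or less of wild type
    return "inactive"        # 2% or less of wild type activity


# Hypothetical relative activities, for illustration only (not measured values).
example_variants = {
    "variant_1": 130.0,
    "variant_2": 75.0,
    "variant_3": 10.0,
    "variant_4": 1.0,
}

categories = {name: classify_variant(pct) for name, pct in example_variants.items()}
for name, category in categories.items():
    print(name, category)
```

The comparisons are strict, matching the ">50%" and ">2%" wording used in the text, so a variant at exactly 50% of wild type activity falls into the reduced bin.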
Additional single substitution variants of DMO at residues corresponding to interaction (e.g. subunit interface) domains and the ferredoxin binding site were also prepared and assayed by the LC/MS screen for their effect on enzymatic activity (see also Examples 8A, 8B). In most cases shown in Table 29, one conservative and one neutral substitution were tried at a given residue. None of the tested variants displayed enzymatic activity greater than the wild type, and many substitutions resulted in >50% loss of activity.
Value | +/− | (n)
---|---|---
— | 1.22 | 4
— | 1.6 | 4
— | 1.36 | 4
— | 0.98 | 4
— | 0.09 | 4
— | 1.59 | 4
— | 4.99 | 4
— | 3.54 | 4
— | 0.8 | 4
— | 0.07 | 4
— | 0.08 | 4
— | 0.07 | 4
— | 0.03 | 4
— | 0.06 | 4
— | 0.03 | 2
— | 3.84 | 8
— | 2.61 | 8
— | 0.27 | 4
— | 0.38 | 4
— | 1.01 | 4
— | 0.79 | 4
— | 2.34 | 4
— | 0.15 | 4
— | 1.11 | 4
— | 0.26 | 4
— | 1.79 | 4
— | 3.41 | 4
— | 1.42 | 4
— | 2.64 | 4
— | 3.06 | 4
— | 1.5 | 4
— | 1.15 | 4
— | 3.35 | 4
— | 0.78 | 4
— | 0.68 | 4
— | 1.06 | 4
— | 0.37 | 4
— | 2.12 | 4
— | 0.04 | 4
— | 0.65 | 4
— | 0.36 | 4
— | 0.54 | 4
— | 0.16 | 4
Table 30 lists DMO activity data (by LC/MS screen) for 58 variants relative to ddmC_M—6his (the wild type ddmC sequence with an N-terminal His tag; SEQ ID NO:154), while Table 31 lists DMO activity data (by LC/MS screen) for 1685 variants relative to ddmC_RLE6his (the wild type ddmC sequence with T2S and I123L mutations (SEQ ID NO:23), and a C-terminal 6×His tag). The numbering of residues is based on the numbering found in SEQ ID NO:2 or SEQ ID NO:3.
The references listed below are incorporated herein by reference to the extent that they supplement, explain, provide a background for, or teach methodology, techniques, and/or compositions employed herein.
This application claims the priority of U.S. provisional application Ser. No. 60/884,854 filed Jan. 12, 2007, and U.S. provisional application Ser. No. 60/939,278, filed May 21, 2007, the entire disclosures of which are incorporated herein by reference.
Number | Date | Country
---|---|---
60884854 | Jan 2007 | US
60939278 | May 2007 | US