Cell wall proteins play important roles in regulating cell wall extensibility, which in turn controls cell enlargement. Among cell wall proteins studied to date, expansins are unique in their ability to induce immediate cell wall extension in vitro and cell expansion in vivo. Expansins are extracellular proteins that promote plant cell wall enlargement, evidently by disrupting noncovalent bonding between cellulose microfibrils and matrix polymers (McQueen-Mason, S., et al., (1994) Proc. Natl. Acad. Sci. USA 91:6574-6578; McQueen-Mason, S., et al., (1992) Plant Cell 4:1425-1433).
Since their first isolation from cucumber hypocotyls, expansin proteins have been identified in many plant species and organs on the basis of activity assays and immunoblotting. Examples include tomato leaves, oat coleoptiles, maize roots, rice internodes, tobacco cell cultures, and various fruits. The original sequencing of cucumber expansin cDNAs has impacted our understanding of expansins in several respects. First, expansin genes have now been identified in many other plant species, and they appear to be restricted largely to the plant kingdom. Second, expansins comprise a large multigene family in plant species. For example, in Arabidopsis, 31 expansin genes have been identified. Third, studies of expression and localization of expansin mRNA are providing new insights and hypotheses concerning the developmental roles of specific expansin genes. Fourth, sequence comparisons have led to the discovery that another group of proteins, known previously as group-1 grass pollen allergens, has expansin activity. These pollen-specific proteins are closely related to a group of sequences known primarily from expressed sequence tag (EST) databases. These EST sequences, together with the group-1 pollen allergens, have now been classified as β-expansins, whereas the original group of expansins is now classified as α-expansins. The α-expansins are described in U.S. Pat. Nos. 5,959,082 and 5,990,283 to Cosgrove et al., which are herein incorporated by reference. β-expansins, in general, are the subject of a previously filed U.S. patent application Ser. No. 09/071,252 filed May 1, 1998. Although these two expansin families have only about 20% amino acid identity, they are similar in size, they share a number of conserved motifs, and they have similar wall-loosening activities.
To date, most studies have focused on α-expansins, and limited work has been done on β-expansins. A soybean cytokinin-induced gene known as CIM1 is now classified as a β-expansin, but the biological function of the CIM1 protein is uncertain. The maize group-1 pollen allergen, Zea m1, has wall-loosening activity with high specificity for grass cell walls. This β-expansin is hypothesized to aid fertilization by loosening the cell walls of the stigma and style, thereby facilitating penetration of the pollen tube. Many other β-expansin sequences are found in the rice EST databases, and most of these sequences come from cDNA libraries made from young seedlings and other plant materials that do not contain pollen. Thus, their biological functions clearly differ from those of the group-1 pollen allergens. These so-called vegetative β-expansins are hypothesized to function in cell enlargement and other processes where wall loosening is required. It is notable that the rice EST collection contains at least 75 entries representing at least 10 distinct β-expansin genes. In contrast, only a single Arabidopsis EST is classified as a β-expansin (although a total of five β-expansin genes are found in the Arabidopsis genome). The disparity in the number of β-expansin entries in the rice and Arabidopsis EST collections, together with the specificity of Zea m1 activity for grass walls, leads to the proposal that β-expansins have evolved specialized functions in conjunction with the evolution of the grass cell wall, which has a distinctive set of matrix polysaccharides and structural proteins compared with other land plants. If this is true, one would expect to find an abundance of β-expansin homologs in other grasses, with expression in many tissues besides pollen.
Recently, Group 2 and Group 3 allergens (designated group 2/3 allergens, also termed HED2 proteins) have also been shown to have expansin activity. Although these allergens from grass pollen have been studied for many years by immunologists concerned with how they elicit hay fever and related allergic responses in humans, the native activity and biological roles of these proteins have not been examined. Group 2 and group 3 grass pollen allergens are distinguished by pI and immuno-cross-reactivity, but accumulating sequence information indicates that they belong to the same protein family. Genes for group 2/3 allergens encode a signal peptide and a mature protein with statistically significant sequence similarity (up to 42% identity) to domain 2 of expansins, with the greatest similarity to the group-1 allergen sub-class of β-expansins.
Of the two families of expansins, α and β, the group 2/3 allergens are closest in sequence to the subset of β-expansins known to immunologists as the grass pollen group 1 allergens.
Once identified, however, proteins with expansin activity, including β-expansins, α-expansins, and group 2/3 allergens or HED proteins, all of which are capable of inducing cell wall extension, have utility not only in the engineered extension of cell walls in living plants but foreseeably also in commercial applications that can exploit their chemical reactivity. Expansins can disrupt noncovalent associations of cellulose, and as such have particular utility in the paper recycling industry. Paper recycling is a growing concern and will prove more important as the nation's landfill sites become scarcer and more expensive. Paper derives its mechanical strength from hydrogen bonding between paper fibers, which are composed primarily of cellulose. During paper recycling, the hydrogen bonding between paper fibers is disrupted by chemical and mechanical means prior to re-forming new paper products. Proteins which cause cell expansion are thus intrinsically well suited to paper recycling, especially when the proteins are nontoxic and otherwise innocuous, and when the proteins can break down paper products which are resistant to other chemical and enzymatic means of degradation. Use of proteins of this type could thus expand the range of recyclable papers.
Other modes of application of expansins include the production of virgin paper. Pulp for virgin paper is made by disrupting the bonding between plant fibers. For the reasons identified above, expansins are useful in the production of paper pulp from plant tissues. Use of expansins can substitute for harsher chemicals now in use and thereby reduce the financial and environmental costs associated with disposing of these harsh chemicals. The use of expansins can also result in higher quality plant fibers because they would be less degraded than fibers currently obtained by harsher treatments.
Still other modes of application include the production of ethanol. One of the major limitations and costs associated with ethanol production from cellulose is the conversion of cellulose to simple fermentable sugars. Because of the crystalline structure of cellulose, its enzymatic conversion to sugars takes a considerable amount of time and requires large quantities of cellulase enzymes, which are expensive. Likewise, for the production of chemically modified cellulose derivatives, cellulose must be made accessible to reactive chemical agents; this usually requires high temperatures, high pressures, and harsh chemical conditions. Furthermore, the efficient digestion of straws, hay, and other plant materials by ruminants and other animals is limited by the accessibility of cellulose to the digestive enzymes in the animals' gut. Expansin proteins, particularly group 2/3 allergens, have been shown to make cellulose more easily degraded by cellulase enzymes.
Thus, a continuing need remains for the identification, characterization, and optimization of expansins—proteins which can be characterized as catalysts of the extension of plant cell walls and the weakening of the hydrogen bonds in the pure cellulose.
The invention relates to the crystal structure and activities of β-expansins and grass pollen allergens, to the identification of key regions essential for maximal activity, and to the identification of sequence motifs which correlate with activity.
According to the invention, the β-expansin structure has been delineated to identify regions critical for activity. For example, the β-expansin molecule consists of two domains closely packed and aligned to form a long shallow groove with the potential to bind a glycan backbone. The first domain comprises residues 19-140, which form a protein fold; the second domain comprises residues 147-245 and is composed of eight β-strands assembled into two anti-parallel sheets. Essential residues include the surface aromatic residues W194 and Y160, which are in line with W26 and Y27. From these data one can extrapolate to identify essential regions of conservation and so develop modified expansins with improved properties, efficiencies and the like.
It is well known in the art of protein chemistry that crystallizing a protein is a difficult process. In fact, it is now evident that protein crystallization is the main hurdle in protein structure determination. Many references describe the difficulties associated with growing protein crystals, for example Kierzek, A. M. and Zielenkiewicz, P., (2001), Biophysical Chemistry, 91:1-20, Models of protein crystal growth, and Wiencek, J. M. (1999) Annu. Rev. Biomed. Eng., 1:505-534, New Strategies for crystal growth. It is commonly held that crystallization of protein molecules from solution is the major obstacle in the process of determining protein structures. The reasons for this are many: proteins are complex molecules, and the delicate balance involving specific and non-specific interactions with other protein molecules and small molecules in solution is difficult to predict.
Each protein crystallizes under a unique set of conditions which cannot be predicted in advance. Simply supersaturating the protein to bring it out of solution may not work; the result would, in most cases, be an amorphous precipitate. Many precipitating agents are used; common ones are various salts and polyethylene glycols, but others are known. In addition, additives such as metals and detergents can be added to modulate the behavior of the protein in solution. Many kits are available (e.g. from Hampton Research) which attempt to cover as many parameters in crystallization space as possible, but in many cases these provide only a starting point for optimizing crystalline precipitates and crystals which are unsuitable for diffraction analysis. Successful crystallization is aided by knowledge of the protein's behavior in terms of solubility, dependence on metal ions for correct folding or activity, interactions with other molecules, and any other information that is available. Even so, crystallization of proteins is often regarded as a time-consuming process, whereby subsequent experiments build on observations of past trials.
In cases where protein crystals are obtained, these are not necessarily always suitable for diffraction analysis; they may be limited in resolution, and it may subsequently be difficult to improve them to the point at which they will diffract to the resolution required for analysis. Limited resolution in a crystal can be due to several things. It may be due to intrinsic mobility of the protein within the crystal, which can be difficult to overcome, even with other crystal forms. It may be due to high solvent content within the crystal, which consequently results in weak scattering. Alternatively, it could be due to defects within the crystal lattice which mean that the diffracted x-rays will not be completely in phase from unit to unit within the lattice. Any one of these or a combination of these could mean that the crystals are not suitable for structure determination.
Some proteins never crystallize, and after a reasonable attempt it is necessary to examine the protein itself and consider whether it is possible to make individual domains, different N- or C-terminal truncations, or point mutations. It is often hard to predict how a protein could be re-engineered in such a manner as to improve crystallizability. Our understanding of crystallization mechanisms is still incomplete, and the factors of protein structure which are involved in crystallization are poorly understood.
A mathematical operation termed a Fourier transform relates the diffraction pattern observed from a crystal to the molecular structure of the protein comprising the crystal. A Fourier transform may be considered to be a summation of sine and cosine waves, each with a defined amplitude and phase. Thus, in theory, it is possible to calculate the electron density associated with a protein structure by carrying out an inverse Fourier transform on the diffraction data. This, however, requires amplitude and phase information to be extracted from the diffraction data. Amplitude information may be obtained by analyzing the intensities of the spots within a diffraction pattern. Current technologies for generating x-rays and recording diffraction data lead to loss of all phase information. This “phase information” must be in some way recovered, and the loss of this information represents the “crystallographic phase problem”. The phase information necessary for carrying out the inverse Fourier transform can be obtained via a variety of methods. If a structure of the protein already exists, a set of theoretical amplitudes and phases may be calculated using the protein model, and the theoretical phases may then be combined with the experimentally derived amplitudes. An electron density map may then be calculated and the protein structure observed.
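The relationship described above can be illustrated with a minimal one-dimensional sketch. It is not part of the disclosed method: the two-atom "unit cell", the atom weights, and the function names `structure_factors` and `density` are all hypothetical and chosen only for illustration. The sketch computes complex structure factors from known atom positions, then compares the density recovered by an inverse Fourier sum when the phases are retained with the density obtained when only the amplitudes are kept.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """Return {h: F(h)} for a 1D unit cell.

    atoms is a list of (fractional_position, scattering_weight) pairs;
    both values are illustrative assumptions, not measured data.
    """
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }

def density(factors, n_points):
    """Inverse Fourier sum rho(x) = sum_h F(h) exp(-2*pi*i*h*x) on a regular grid."""
    grid = []
    for k in range(n_points):
        x = k / n_points
        rho = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
        grid.append(rho.real)
    return grid

# Hypothetical two-atom cell: positions 0.20 and 0.55, arbitrary weights.
atoms = [(0.20, 6.0), (0.55, 8.0)]
F = structure_factors(atoms, h_max=10)

# Amplitudes and phases together recover the atom positions.
rho_true = density(F, 100)

# Discarding the phases (keeping only |F|) mimics what a detector records.
amplitudes_only = {h: abs(f) for h, f in F.items()}
rho_no_phase = density(amplitudes_only, 100)

peak_true = max(range(100), key=lambda k: rho_true[k])
peak_no_phase = max(range(100), key=lambda k: rho_no_phase[k])
```

In this toy case the strongest peak of `rho_true` falls on the grid point of the heavier atom, whereas the amplitude-only synthesis places its strongest peak at the origin, which is one way of seeing why the phases must be recovered before a meaningful map can be calculated.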
If there is no known structure of the protein then alternative methods for obtaining phases must be explored. One method is multiple isomorphous replacement (MIR). This relies on soaking “heavy atom” (e.g. platinum, uranium, mercury) compounds into the crystals and observing how their incorporation into the crystals modifies the spot intensities observed in the diffraction pattern. This method relies on the heavy atoms being incorporated into the protein at a finite number of defined sites. It is a pre-requisite of an isomorphous replacement experiment that the heavy atom soaked crystals remain isomorphous. That is, there should be no appreciable alterations in the physical characteristics of the protein crystal (e.g. perturbations to crystallographic cell dimensions, or significant loss of resolution). Perturbations to the physical properties of the crystal are termed non-isomorphisms and prevent this type of experiment from being successfully completed. Successful isomorphous incorporation of heavy atoms into a protein crystal results in the intensities of the spots within the diffraction pattern obtained from the crystal being modified, as compared to the data collected from an identical, unsoaked (native) crystal. The diffraction data obtained from a successful isomorphous replacement experiment are termed a “derivative” dataset. By mathematically analyzing the “native” and “derivative” datasets it is possible to extract preliminary phase information from the datasets. This phase information, when combined with the experimentally obtained amplitudes from the native dataset, enables an electron density map of the unknown protein molecule to be calculated using the Fourier transform method.
An alternative method for obtaining phase information for a protein of unknown structure is to perform a multi-wavelength anomalous dispersion (MAD) experiment. This relies on the absorption of X-rays by electrons at certain characteristic X-ray wavelengths. Different elements have different characteristic absorption edges. Anomalous scattering by atoms within a protein will modify the diffraction pattern obtained from the protein crystal. Thus if a protein contains atoms which are capable of anomalous scattering a diffraction dataset (anomalous dataset) may be collected at an X-ray wavelength at which this anomalous scattering is maximal. By altering the X-ray wavelength to a value at which there is no anomalous scattering a native dataset may then be collected. Similarly to the MIR case, by mathematically processing the anomalous and native datasets the phase information necessary for the calculation of an electron density map may be determined. The most usual way to introduce anomalous scatterers into a protein is to replace the sulphur containing methionine amino acid residues with selenium containing seleno-methionine residues. This is done by generating recombinant protein that is isolated from cells grown on growth media that contain seleno-methionine. Selenium is capable of anomalously scattering X-rays and may thus be used for a MAD experiment. Further methods for phase determination such as single isomorphous replacement (SIR), single isomorphous replacement anomalous scattering (SIRAS) and direct methods exist, but the principles behind them are similar to MIR and MAD.
The final method generally available for the calculation of the phases necessary for the determination of an unknown protein structure is molecular replacement. This method relies upon the assumption that proteins with similar amino acid sequences (primary sequences) will have a similar fold and three-dimensional structure (tertiary structure). Proteins related by amino acid sequence are termed homologous proteins. If an X-ray diffraction dataset has been collected from a crystal whose protein structure is not known, but a structure has been determined for a homologous protein, then molecular replacement can be attempted. Molecular replacement is a mathematical process that attempts to correlate the dataset obtained from a new protein crystal with the theoretical diffraction pattern calculated for a protein of known structure. If the correlation is sufficiently high some phase information can be extracted from the known protein structure and combined with the amplitudes obtained from the new protein dataset. This enables calculation of a preliminary electron density map for the protein of unknown structure.
If an electron density map has been calculated for a protein of unknown structure then the amino acids comprising the protein must be fitted into the electron density for the protein. This is normally done manually, although high resolution data may enable automatic model building. The process of model building and fitting the amino acids to the electron density can be both a time consuming and laborious process. Once the amino acids have been fitted to the electron density it is necessary to refine the structure. Refinement attempts to maximize the correlation between the experimentally calculated electron density and the electron density calculated from the protein model built. Refinement also attempts to optimize the geometry and disposition of the atoms and amino acids within the user-constructed model of the protein structure. Sometimes manual re-building of the structure will be required to release the structure from local energetic minima. There are now several software packages available that enable an experimentalist to carry out refinement of a protein structure. There are certain geometry and correlation diagnostics that are used to monitor the progress of a refinement. These diagnostic parameters are monitored and rebuilding/refinement continued until the experimenter is satisfied that the structure has been adequately refined.
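By way of illustration only, one agreement diagnostic widely used in crystallographic refinement is the R-factor, which compares observed structure-factor amplitudes with those calculated from the current model. The text above does not name the specific diagnostics intended, so the choice of the R-factor, the function name `r_factor`, and the sample amplitudes below are assumptions made for this sketch.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor.

    R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), taken over matching reflections.
    Lower values indicate closer agreement between model and data.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(abs(f) for f in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator

# Hypothetical amplitudes for three reflections (not measured data).
observed = [10.0, 20.0, 30.0]
calculated = [12.0, 18.0, 30.0]
r_value = r_factor(observed, calculated)
```

A refinement loop would typically recompute such a value after each cycle of rebuilding and stop once it no longer improves; that stopping rule is a common practice and is not specified by the source.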
The present invention relates to the crystal structure of EXPB1 (Genbank accession AA045608; PDB accession 2HCZ), which allows the binding location of the polysaccharides to the compound and its activities to be investigated and determined.
Thus in one aspect, the invention provides a three dimensional structure of EXPB1 set out in
According to the invention, EXPB1 contains two domains (residues 19-140 [D1] and 147-245 [D2]) connected by a short linker (residues 141-146) and aligned end to end so as to make a closely-packed irregular cylinder ˜66 Å long and 26 Å in diameter (
The two EXPB1 domains pack close to one another, making contact via H-bonds and salt bridges between basic residues (K65 and R137) in D1 and acidic residues (E217 and D171) in D2. These residues are highly conserved in the EXPB family (see annotated sequence logo in
The two EXPB1 domains align so as to form a long, shallow groove with highly conserved polar and aromatic residues suitably positioned to bind a twisted polysaccharide chain of 10 xylose residues (
Residues that could bind a polysaccharide by van der Waals interactions with the sugar rings include W26, Y27, G40, and G44 from D1 as well as Y160 and W194 from D2. Conserved residues that might stabilize polysaccharide binding by H-bonding include T25, D37, D95 and D107 in D1 and N157, S193 and R199 in D2.
In general aspects, the present invention is concerned with the provision of an EXPB1 structure and its use in modeling the interaction of molecular structures, e.g. potential and existing substrates, inhibitors, analogs, or fragments of such compounds, with this EXPB1 structure.
These and other aspects and embodiments of the present invention are discussed below. The above aspects of the invention, both singly and in combination, all contribute to features of the invention, which are advantageous.
The invention comprises in one paragraph a computer-based method for the analysis of the interaction of a molecular structure with an EXPB1 structure, which comprises: providing a structure comprising a three-dimensional representation of EXPB1 or a portion thereof, which representation comprises all or a portion of the coordinates of any one of figures represented in
The method of the invention further comprises the steps of obtaining or synthesizing a compound which has said molecular structure; and contacting said compound with EXPB1 protein to determine the ability of said compound to interact with the EXPB1.
The method also includes obtaining or synthesizing a compound which has said molecular structure; forming a complex of an EXPB1 substrate protein and said compound; and analyzing said complex by X-ray crystallography to determine the ability of said compound to interact with the EXPB1 substrate.
The method further comprises the steps of: obtaining or synthesizing a compound which has said molecular structure; and determining or predicting how said compound interacts with an EXPB1 substrate; and modifying the compound structure so as to alter the interaction between it and the substrate. The invention also includes a compound having the modified structure identified using the method and which has expansin activity.
A method of obtaining a structure of a target EXPB1 protein of unknown structure, the method comprises the steps of: providing a crystal of said target EXPB1 protein, obtaining an X-ray diffraction pattern of said crystal, calculating a three-dimensional atomic coordinate structure of said target, by modeling the structure of said target EXPB1 protein of unknown structure on the active site structure of any one of
The invention also includes methods where the molecular structure to be fitted is in the form of a model of a pharmacophore including but not limited to: (a) a wire-frame model; (b) a chicken-wire model; (c) a ball-and-stick model; (d) a space-filling model; (e) a stick-model; (f) a ribbon model; (g) a snake model; (h) an arrow and cylinder model; (i) an electron density map; (j) a molecular surface model.
The invention also includes a computer-based method for the analysis of molecular structures which comprises: (a) providing the coordinates of at least two atoms of an EXPB1 structure as defined in
A computer-based method of protein design comprising: (a) providing the coordinates of at least two atoms of an EXPB1 structure as defined in any one of
A method for identifying a candidate modulator of EXPB1 comprising the steps of: (a) employing a three-dimensional structure of EXPB1, at least one sub-domain thereof, or a plurality of atoms thereof, to characterize at least one EXPB1 binding cavity, the three-dimensional structure being defined by
The invention also contemplates a method for determining the structure of a protein, which method comprises: providing the co-ordinates per
A method for determining the structure of a compound bound to EXPB1 protein, said method comprising: providing a crystal of EXPB1 protein; soaking the crystal with the compound to form a complex; and determining the structure of the complex by employing the data of any one of
A method for determining the structure of a compound bound to EXPB1 protein, said method comprising: mixing EXPB1 protein with the compound; crystallizing an EXPB1 protein-compound complex; and determining the structure of the complex by employing the data of any one of Tables 1 or
A method for modifying the structure of a compound in order to alter its metabolism by an EXPB1, which method comprises: fitting a starting compound to one or more coordinates of at least one amino acid residue of the ligand-binding region of the EXPB1; modifying the starting compound structure so as to increase or decrease its interaction with the ligand-binding region.
A method for modifying the structure of a compound in order to alter its metabolism by an EXPB1, which method comprises: fitting a starting compound to one or more coordinates of at least one amino acid residue of the binding region of the EXPB1; modifying the starting compound structure so as to increase or decrease its interaction with the binding region.
A method for modifying the structure of a compound in order to alter its, or another compound's, metabolism by an EXPB1, which method comprises: fitting a starting compound to one or more coordinates of at least one amino acid residue of the peripheral binding region of the EXPB1; modifying the starting compound structure so as to increase or decrease its interaction with the peripheral binding region; wherein said peripheral binding region is defined as the EXPB1 residues numbered as: W26, Y27, G40, G44, Y160, and W194.
A method of obtaining a representation of the three dimensional structure of a crystal of EXPB1, which method comprises providing the data of any one of PDB accession #2HCZ or
A computer system, intended to generate structures and/or perform optimization of compounds which interact with EXPB1, EXPB1 homologues or analogues, complexes of EXPB1 with compounds, or complexes of EXPB1 homologues or analogues with compounds, the system containing computer-readable data comprising one or more of: (a) EXPB1 co-ordinate data of any one of PDB accession #2HCZ, of
A computer system according to paragraph comprising: (i) a computer-readable data storage medium comprising data storage material encoded with said computer-readable data; (ii) a working memory for storing instructions for processing said computer-readable data; and (iii) a central-processing unit coupled to said working memory and to said computer-readable data storage medium for processing said computer-readable data and thereby generating structures and/or performing rational compound design.
A computer system comprising a display coupled to said central-processing unit for displaying said structures.
A method of providing data for generating structures and/or performing optimization of compounds which interact with EXPB1, EXPB1 homologues or analogues; complexes of EXPB1 with compounds, or complexes of EXPB1 homologues or analogues with compounds, the method comprising: (i) establishing communication with a remote device containing (a) computer-readable data comprising atomic coordinate data of any one of Tables 1, or
A computer-readable storage medium, comprising a data storage material encoded with computer readable data, wherein the data are defined by all or a portion of the structure coordinates of the EXPB1 protein of any one of PDB accession #2HCZ or
The present invention provides a crystal of EXPB1 having cell dimensions of about a=113.7 Å, b=45.2 Å and c=70.3 Å, with angles α=90.0°, β=124.6°, and γ=90.0°. Unit cell variability of 5% may be observed in all dimensions.
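As a worked illustration of these cell parameters, the following sketch computes the unit cell volume from the general triclinic formula and checks whether a set of observed cell lengths falls within the stated 5% variability. The function names, the sample observed values, and the decision to apply the 5% tolerance to the three cell lengths only are assumptions made for this example.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms; lengths in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

def within_tolerance(observed, reference, fraction=0.05):
    """True if every observed length is within the given fraction of its reference."""
    return all(abs(o - r) <= fraction * r for o, r in zip(observed, reference))

# Cell parameters as stated in the text: a, b, c, alpha, beta, gamma.
EXPB1_CELL = (113.7, 45.2, 70.3, 90.0, 124.6, 90.0)
volume = cell_volume(*EXPB1_CELL)

# Hypothetical cell lengths from another crystal, for comparison only.
similar_cell = within_tolerance((115.0, 45.0, 71.0), EXPB1_CELL[:3])
different_cell = within_tolerance((125.0, 45.0, 71.0), EXPB1_CELL[:3])
```

Because α and γ are 90°, the general formula reduces here to the monoclinic expression V = a·b·c·sin β.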
Substrates include plant cell walls, or components thereof. Alternatively the ligand could be a compound whose interaction with EXPB1 is unknown.
Such crystals may be obtained using the methods described in the accompanying examples.
The EXPB1 may optionally comprise a tag, such as a C-terminal polyhistidine tag to allow for recovery and purification of the protein.
The methodology used to provide an EXPB1 crystal illustrated herein may be used generally to provide an EXPB1 crystal resolvable at a resolution of at least 3.0 Å and preferably at least 2.8 Å. The invention thus further provides an EXPB1 crystal having a resolution of at least 3.0 Å, preferably at least 2.8 Å. The proteins may be wild-type proteins or variants thereof, which are modified to promote crystal formation, for example by N-terminal truncations and/or deletion of loop regions, which prevent crystal formation.
In a further aspect, the invention provides a method for making an EXPB1 protein crystal, particularly of an EXPB1 protein comprising the core sequence of EXPB1 (as defined above) or a variant thereof, which method comprises growing a crystal by vapor diffusion using a reservoir buffer that contains 0.05-0.2 M HEPES pH 7.0-7.8, 2.5-10% IPA, 0-20% PEG 4000, 0-0.3 M sodium chloride, 0-10% PEG 400, 0-10% glycerol, preferably 0.1 M HEPES pH 7.2, 5% IPA, 10% PEG 4000. The vapor diffusion is performed by placing an aliquot of the solution on a cover slip as a hanging drop above a well containing the reservoir buffer. The concentration of the protein solution used was 0.3-0.7 mM.
Crystals of the invention also include crystals of EXPB1 mutants, chimeras, homologues in the expansin family (e.g. α-expansins, β-expansins, group 2/3 allergens, etc) and alleles.
A mutant is an EXPB1 protein characterized by the replacement or deletion of at least one amino acid from the wild type EXPB1. Such a mutant may be prepared for example by site-specific mutagenesis, or incorporation of natural or unnatural amino acids.
The present invention contemplates “mutants” wherein a “mutant” refers to a polypeptide which is obtained by replacing at least one amino acid residue in a native or synthetic EXPB1 with a different amino acid residue and/or by adding and/or deleting amino acid residues within the native polypeptide or at the N- and/or C-terminus of a polypeptide corresponding to EXPB1, and which has substantially the same three-dimensional structure as EXPB1 from which it is derived. By having substantially the same three-dimensional structure is meant having a set of atomic structure co-ordinates that have a root mean square deviation (r.m.s.d.) of less than or equal to about 2.0 Å (preferably less than 1.55 or 1.5 Å, more preferably less than 1.0 Å, and most preferably less than 0.5 Å) when superimposed with the atomic structure co-ordinates of the EXPB1 from which the mutant is derived when at least about 50% to 100% of the Cα atoms of the EXPB1 are included in the superposition. A mutant may have, but need not have, enzymatic or catalytic activity.
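The r.m.s.d. criterion above can be sketched as follows. This is a minimal illustration that assumes the two sets of Cα coordinates are already paired residue by residue and already optimally superimposed; the superposition step itself is not shown. The function names and the sample coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates, in angstroms.

    Assumes the two coordinate sets have already been superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def substantially_same_structure(coords_a, coords_b, cutoff=2.0):
    """Apply the 'less than or equal to about 2.0 angstrom' criterion from the text."""
    return rmsd(coords_a, coords_b) <= cutoff

# Hypothetical C-alpha positions for a wild-type and a mutant fragment.
wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mutant = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
deviation = rmsd(wild_type, mutant)
```

The stricter preferred cutoffs mentioned in the text (1.5 Å, 1.0 Å, 0.5 Å) can be tested by passing a different `cutoff` value.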
To produce homologues or mutants, amino acids present in the said protein can be replaced by other amino acids having similar properties, for example hydrophobicity, hydrophobic moment, antigenicity, propensity to form or break α-helical or β-sheet structures, and so on. Substitutional variants of a protein are those in which at least one amino acid in the protein sequence has been removed and a different residue inserted in its place. Amino acid substitutions are typically of single residues but may be clustered depending on functional constraints e.g. at a crystal contact. Preferably amino acid substitutions will comprise conservative amino acid substitutions. Insertional amino acid variants are those in which one or more amino acids are introduced. This can be amino-terminal and/or carboxy-terminal fusion as well as intrasequence. Examples of amino-terminal and/or carboxy-terminal fusions are affinity tags, MBP tag, and epitope tags.
Amino acid substitutions, deletions and additions which do not significantly interfere with the three-dimensional structure of the EXPB1 will depend, in part, on the region of the EXPB1 where the substitution, addition or deletion occurs. In highly variable regions of the molecule, non-conservative substitutions as well as conservative substitutions may be tolerated without significantly disrupting the three-dimensional structure of the molecule. In highly conserved regions, or regions containing significant secondary structure, conservative amino acid substitutions are preferred.
Conservative amino acid substitutions are well-known in the art, and include substitutions made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the amino acid residues involved. For example, negatively charged amino acids include aspartic acid and glutamic acid; positively charged amino acids include lysine and arginine; amino acids with uncharged polar head groups having similar hydrophilicity values include the following: leucine, isoleucine, valine; glycine, alanine; asparagine, glutamine; serine, threonine; phenylalanine, tyrosine. Other conservative amino acid substitutions are well known in the art.
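The grouping described above can be sketched as a simple lookup. The following is a minimal illustration only: the group membership is copied from the preceding paragraph, while the function names (`is_conservative`, `classify_substitutions`) and the one-letter encoding are assumptions introduced here, not part of the specification.

```python
# Conservative substitution groups as listed in the paragraph above
# (one-letter amino acid codes). Names and structure are illustrative.
CONSERVATIVE_GROUPS = [
    {"D", "E"},       # negatively charged: aspartic acid, glutamic acid
    {"K", "R"},       # positively charged: lysine, arginine
    {"L", "I", "V"},  # leucine, isoleucine, valine
    {"G", "A"},       # glycine, alanine
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(native, replacement):
    """Return True if two different residues fall in the same group."""
    a, b = native.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(native_seq, mutant_seq):
    """List (position, native, mutant, conservative?) for each changed residue.

    Assumes the two sequences have equal length, i.e. substitutions only,
    with no insertions or deletions. Positions are 1-based.
    """
    if len(native_seq) != len(mutant_seq):
        raise ValueError("sequences must have the same length")
    changes = []
    for pos, (a, b) in enumerate(zip(native_seq, mutant_seq), start=1):
        if a != b:
            changes.append((pos, a, b, is_conservative(a, b)))
    return changes
```

Other groupings known in the art could be substituted for `CONSERVATIVE_GROUPS` without changing the rest of the sketch.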
In some instances, it may be particularly advantageous or convenient to substitute, delete and/or add amino acid residues in order to provide convenient cloning sites in the cDNA encoding the polypeptide, to aid in purification of the polypeptide, etc. Such substitutions, deletions and/or additions which do not substantially alter the three-dimensional structure of EXPB1 will be apparent to those having skill in the art.
It should be noted that the mutants contemplated herein need not exhibit enzymatic activity. Indeed, amino acid substitutions, additions or deletions that interfere with the catalytic activity of the EXPB1 but which do not significantly alter the three-dimensional structure of the catalytic region are specifically contemplated by the invention. Such crystalline polypeptides, or the atomic structure co-ordinates obtained therefrom, can be used to identify compounds that bind to the protein.
The residues for mutation could easily be identified by those skilled in the art and these mutations can be introduced by site-directed mutagenesis e.g. using a Stratagene QuikChange™ Site-Directed Mutagenesis Kit or cassette mutagenesis methods (see e.g. Ausubel et al., eds., Current Protocols in Molecular Biology, John Wiley & Sons, Inc., New York, and Sambrook et al., Molecular Cloning: a Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., (1989)).
The present invention contemplates “alleles” wherein allele is used for two or more alternative forms of a gene resulting in different gene products and thus different phenotypes. An allele contains nucleotide changes that have been shown to affect transcription, splicing, translation, post-transcriptional or post-translational modifications or result in at least one amino acid change. These different alleles are particularly important in EXPB1s, as some may confer different cell wall expansion properties on the phenotype. Alleles often differ by only one or two amino acids.
To the extent that the present invention relates to EXPB1-ligand complexes and mutant, homologue, analogue, allelic form, species variant proteins of EXPB1, crystals of such proteins may be formed. The skilled person would recognize that the conditions provided herein for crystallizing EXPB1 may be used to form such crystals. Alternatively, the skilled person would use the conditions as a basis for identifying modified conditions for forming the crystals.
Thus the aspects of the invention relating to crystals of EXPB1 may be extended to crystals of mutants of EXPB1, including mutants that correspond to homologues, allelic forms, and species variants.
(iii) Crystallization of EXPB1
To produce crystals of EXPB1 protein the final protein is, conveniently, concentrated to 10-60, e.g. 20-40 mg/ml in 10-100 mM potassium phosphate with high salt (e.g. 500 mM NaCl or KCl), optionally also with about 1 mM EDTA and/or about 2 mM dithiothreitol, by using concentration devices which are commercially available. Crystallization of the protein is set up by the 0.5-2 μl hanging or sitting drop methods and the protein is crystallized by vapor diffusion at 5-25° C. against a range of vapor diffusion buffer compositions. It is customary to use a 1:1 ratio of protein solution and vapor diffusion buffer in the hanging drop, and this has been used herein unless stated to the contrary.
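As a worked illustration of the 1:1 mixing convention, the starting concentration of each component in the drop is its stock concentration scaled by its volume fraction. The function name and example values below are illustrative assumptions, and the arithmetic deliberately ignores the subsequent equilibration by vapor diffusion.

```python
def initial_drop_concentration(stock_conc, component_volume, other_volume):
    """Concentration of a component immediately after mixing the drop.

    stock_conc       -- concentration in the solution that carries the component
    component_volume -- volume of that solution added to the drop
    other_volume     -- volume of the other solution added to the drop
    Units only need to be consistent (e.g. mg/ml with microlitres).
    """
    total = component_volume + other_volume
    if total <= 0:
        raise ValueError("total drop volume must be positive")
    return stock_conc * component_volume / total


# Hypothetical 1 + 1 microlitre drop: protein at 40 mg/ml in 500 mM NaCl.
protein_start = initial_drop_concentration(40.0, 1.0, 1.0)  # mg/ml in the drop
salt_start = initial_drop_concentration(500.0, 1.0, 1.0)    # mM in the drop
```

With a 1:1 ratio every component therefore starts at half its stock concentration, whichever of the two solutions carries it.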
Typically the vapor diffusion buffer comprises 0-27.5%, preferably 2.5-27.5% PEG 1K-20 K, preferably 1-8K or PEG 2000MME-5000MME, preferably PEG 2000 MME, or 0-10% Jeffamine M-600 and/or 5-20%, e.g. 10-20% propanol or 15-20% ethanol or about 15%-30%, e.g. about 15% 2-methyl-2,4-pentanediol (MPD), optionally with 0.01 M-1.6 M salt or salts and/or 0-0.15, e.g. 0-0.1 M of a solution buffer and/or 0-35%, such as 0-15%, glycerol and/or 0-35% PEG300-400; but preferably: 10-25% PEG 1K-8K or PEG 2000MME or 0-10% Jeffamine M-600 and/or 5-15%, e.g. 10-15%, propanol or ethanol, optionally with 0.1 M-0.2 M salt or salts and/or 0-0.15, e.g. 0-0.1 M solution buffer and/or PEG400, but more preferably: 15-20% PEG 3350 or PEG 4000 or PEG 2000MME or 0-10% Jeffamine M-600 or 5-15%, e.g. 10-15% propanol or ethanol, optionally with 0.1 M-0.2 M salt or salts and/or 0-0.15 M solution buffer.
Alternatively the vapor diffusion buffer may be 0.1 M HEPES pH 7.5, 0.2-0.3 M potassium chloride, 1-5% MPD, 7-14.0% PEG 3350 or PEG 4000, 25-50 mM calcium chloride; more specifically, 0.1 M HEPES pH 7.5, 0.20-0.30 M KCl, 10-14% PEG 4000, 5% MPD, 25 mM calcium chloride.
The salt may be an alkali metal (particularly lithium, sodium and potassium), alkaline earth metal (e.g. magnesium or calcium), ammonium, ferric, ferrous or transition metal salt (e.g. zinc) of a halide (e.g. bromide, chloride or fluoride), acetate, formate, nitrate, sulfate, tartrate, citrate or phosphate. This includes sodium fluoride, potassium fluoride, ammonium fluoride, ammonium acetate, lithium acetate, magnesium acetate, sodium acetate, potassium acetate, calcium acetate, zinc acetate, ammonium chloride, lithium chloride, magnesium chloride, potassium chloride, sodium chloride, potassium bromide, magnesium formate, sodium formate, potassium formate, ammonium formate, ammonium nitrate, lithium nitrate, potassium nitrate, sodium nitrate, ammonium sulfate, potassium sulfate, lithium sulfate, sodium sulfate, di-sodium tartrate, potassium sodium tartrate, di-ammonium tartrate, potassium dihydrogen phosphate, tri-sodium citrate, tri-potassium citrate, zinc acetate, ferric chloride, calcium chloride, magnesium nitrate, magnesium sulfate, sodium dihydrogen phosphate, di-sodium hydrogen phosphate, di-potassium hydrogen phosphate, ammonium dihydrogen phosphate, di-ammonium hydrogen phosphate, tri-lithium citrate, nickel chloride, ammonium iodide, di-ammonium hydrogen citrate.
Solution buffers if present include, for example, Hepes, Tris, imidazole, cacodylate, tri-sodium citrate/citric acid, tri-sodium citrate/HCl, acetic acid/sodium acetate, phosphate-citrate, sodium potassium phosphate, 2-(N-morpholino)-ethane sulphonic acid/NaOH (MES), CHES or bis-trispropane. The pH range is desirably maintained at pH 4.2-8.5, preferably 4.7-8.5. Solution buffers if present can also include, for example, bicine, bis-tris, CAPS, MOPS, ADA which allow the pH to be maintained in the range 5.8-11.
Crystals may be prepared using Hampton Research screening kits, polyethylene glycol (PEG)/ion screens, PEG grid, ammonium sulphate grid, PEG/ammonium sulphate grid or the like. Crystallization may also be performed in the presence of an inhibitor of EXPB1, e.g. fluvoxamine or 2-phenyl imidazole. EXPB1 crystallization may also be performed in the presence of one or more inhibitors e.g. ketoconazole, metyrapone, fluconazole or triadimefon and/or in the presence of one or more substrate(s) e.g. testosterone or progesterone.
Additives can be added to a crystallization condition identified to influence crystallization. Additive Screens are to be used during the optimization of preliminary crystallization conditions where the presence of additives may assist in the crystallization of the sample and the additives may improve the quality of the crystal e.g. Hampton Research additive screens which use glycerol, polyols and other protein stabilizing agents in protein crystallization (R. Sousa. Acta. Cryst. (1995) D51, 271-277) or divalent cations (Trakhanov, S. and Quiocho, F. A. Protein Science (1995) 4, 9, 1914-1919).
In addition, detergents may be added to a crystallization condition to improve the crystallization behavior e.g. the ionic, non-ionic and zwitterionic detergents found in the Hampton Research detergent screens (McPherson, A., et al., The effects of neutral detergents on the crystallization of soluble proteins, J. Crystal Growth (1986) 76, 547-553).
Alternatively, the vapor diffusion buffer typically comprises 0-27.5% PEG 1K-20 K, preferably 1-8K or PEG 2000MME-5000MME, preferably PEG 2000 MME, or 0-10% Jeffamine M-600 and/or 1-20%, e.g. 1-20% propanol or 15-20% ethanol or about 1%-30%, e.g. about 2-25% 2-methyl-2,4-pentanediol (MPD), optionally with 0.01 M-1.6 M salt or salts and/or 0-0.15 M, e.g. 0-0.1 M, of a solution buffer and/or 0-35%, such as 0-15%, glycerol and/or 0-35% PEG300-400; but preferably: 0-27.5%, preferably 2.5-27.5% PEG 1K-20 K, most preferably 5-20% PEG 4K or PEG 2000MME-5000MME, preferably PEG 2000 MME, and 1-20% alcohol, e.g. 1-20% propanol e.g. iso-propanol or 2-25% 2-methyl-2,4-pentanediol (MPD), optionally with 0.01 M-1.6 M salt or salts and/or 0-0.15 M, e.g. 0-0.1 M, of a solution buffer and/or 0-35%, such as 0-15%, glycerol and/or 0-35% PEG300-400.
In a further aspect, the invention also provides a crystal of EXPB1 having the three dimensional atomic coordinates of PDB accession #2HCZ, the description herein, table 1, and/or
Protein structure similarity is routinely expressed and measured by the root mean square deviation (r.m.s.d.), which measures the difference in positioning in space between two sets of atoms. The r.m.s.d. measures distance between equivalent atoms after their optimal superposition. The r.m.s.d. can be calculated over all atoms, over residue backbone atoms (i.e. the nitrogen-carbon-carbon backbone atoms of the protein amino acid residues), main chain atoms only (i.e. the nitrogen-carbon-oxygen-carbon backbone atoms of the protein amino acid residues), side chain atoms only or more usually over C-α atoms only. For the purposes of this invention, the r.m.s.d. can be calculated over any of these, using any of the methods outlined below.
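For reference, the conventional definition of the r.m.s.d. over N paired atoms can be written as below. The symbols are introduced here for illustration only: x_i and y_i denote the positions of the i-th pair of equivalent atoms in the two structures after optimal superposition.

```latex
\[
\mathrm{r.m.s.d.} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl\lVert \mathbf{x}_i - \mathbf{y}_i \bigr\rVert^{2}}
\]
```

The choice of atom set (all atoms, backbone atoms, or C-α atoms only) changes N and the pairing, but not the form of the expression.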
Thus the coordinates disclosed herein provide a measure of atomic location in Angstroms, given to 3 decimal places. The coordinates are a relative set of positions that define a shape in three dimensions, but the skilled person would understand that an entirely different set of coordinates having a different origin and/or axes could define a similar or identical shape. Furthermore, the skilled person would understand that varying the relative atomic positions of the atoms of the structure so that the root mean square deviation of the residue backbone atoms (i.e. the nitrogen-carbon-carbon backbone atoms of the protein amino acid residues) is less than 2.0 Å, preferably less than 1.55 or 1.5 Å, more preferably less than 1.0 Å, more preferably less than 0.5 Å, more preferably less than 0.3 Å, such as less than 0.25 Å, or less than 0.2 Å, and most preferably less than 0.1 Å, when superimposed on the coordinates provided in PDB accession #2HCZ for the residue backbone atoms, will generally result in a structure which is substantially the same as the structures disclosed herein in terms of both its structural characteristics and usefulness for structure-based analysis of EXPB1-interactivity molecular structures.
A further rmsd value of less than 1.0 Å which is preferred is a value of less than 0.6 Å, and rmsd values of less than 0.5 Å which are preferred are values of less than 0.45 Å, preferably less than 0.35 Å.
Unless explicitly set out to the contrary, or otherwise clear from the context, reference throughout the present specification to the use of all or selected coordinates disclosed herein does not exclude the use of additional coordinates.
Methods of comparing protein structures are discussed in Methods in Enzymology, vol 115, pg 397-420. The necessary least-squares algebra to calculate r.m.s.d. has been given by Rossmann and Argos (J. Biol. Chem., vol 250, pp 7525 (1975)), although faster methods have been described by Kabsch (Acta Crystallogr., Section A, A32, 922 (1976); Acta Cryst. A34, 827-828 (1978)), Hendrickson (Acta Crystallogr., Section A, A35, 158 (1979)), McLachlan (J. Mol. Biol., vol 128, pp 49 (1979)) and Kearsley (Acta Crystallogr., Section A, A45, 208 (1989)). Some algorithms use an iterative procedure in which one molecule is moved relative to the other, such as that described by Ferro and Hermans (Acta Crystallographica, A33, 345-347 (1977)). Other methods, e.g. Kabsch's algorithm, locate the best fit directly.
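A direct-fit calculation of the kind attributed to Kabsch can be sketched as follows. This is a minimal illustration that assumes NumPy is available and that the two coordinate sets are already paired atom for atom; the function name `kabsch_rmsd` is hypothetical, and the code is not the implementation used by any of the programs named in this specification.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """r.m.s.d. between paired N x 3 coordinate sets after optimal superposition.

    P and Q must list equivalent atoms in the same order (e.g. C-alpha atoms).
    The best proper rotation is found directly from a singular value
    decomposition, without iterative refinement.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translational component by centring both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

The returned value can be compared against thresholds such as the 2.0 Å, 1.5 Å, 1.0 Å and 0.5 Å limits recited herein.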
Programs for determining rmsd include MNYFIT (part of a collection of programs called COMPOSER, Sutcliffe, M. J., Haneef, I., Carney, D. and Blundell, T. L. (1987) Protein Engineering, 1, 377-384), MAPS (Lu, G. An Approach for Multiple Alignment of Protein Structures (1998, in manuscript and on http://bioinfol.mbfys.lu.se/TOP/maps.html)).
It is usual to consider C-alpha atoms and the rmsd can then be calculated using programs such as LSQKAB (Collaborative Computational Project 4. The CCP4 Suite: Programs for Protein Crystallography, Acta Crystallographica, D50, (1994), 760-763), QUANTA (Jones et al., Acta Crystallographica A47 (1991), 110-119 and commercially available from Accelrys, San Diego, Calif.), Insight (commercially available from Accelrys, San Diego, Calif.), Sybyl® (commercially available from Tripos, Inc., St Louis), O (Jones et al., Acta Crystallographica, A47, (1991), 110-119), and other coordinate fitting programs.
In, for example, the programs LSQKAB and O, the user can define the residues in the two proteins that are to be paired for the purpose of the calculation. Alternatively, the pairing of residues can be determined by generating a sequence alignment of the two proteins; programs for sequence alignment are discussed in more detail in Section F. The atomic coordinates can then be superimposed according to this alignment and an r.m.s.d. value calculated. The program Sequoia (C. M. Bruns, I. Hubatsch, M. Ridderstrom, B. Mannervik, and J. A. Tainer (1999) Human Glutathione Transferase A4-4 Crystal Structures and Mutagenesis Reveal the Basis of High Catalytic Efficiency with Toxic Lipid Peroxidation Products, Journal of Molecular Biology 288(3): 427-439) performs the alignment of homologous protein sequences, and the superposition of homologous protein atomic coordinates. Alternatively, the program Astex-KFIT (published in WO2004/038015) can be used. Once aligned, the r.m.s.d. can be calculated using programs detailed above. For proteins with identical, or highly similar, sequences, the structural alignment can be done manually or automatically as outlined above. Another approach would be to generate a superposition of protein atomic coordinates without considering the sequence.
It is more normal when comparing significantly different sets of coordinates to calculate the rmsd value over C-α atoms only. It is particularly useful when analyzing side chain movement to calculate the rmsd over all atoms and this can be done using LSQKAB and other programs.
Those of skill in the art will appreciate that in many applications of the invention, it is not necessary to utilize all the coordinates disclosed herein, but merely a portion of them. For example, as described below, in methods of modeling candidate compounds with EXPB1, selected coordinates of EXPB1 may be used.
By “selected coordinates” it is meant for example at least 5, preferably at least 10, more preferably at least 50 and even more preferably at least 100, for example at least 500 or at least 1000 atoms of the EXPB1 structure. Likewise, the other applications of the invention described herein, including homology modeling and structure solution, and data storage and computer assisted manipulation of the coordinates, may also utilize all or a portion of the coordinates (i.e. selected coordinates). The selected coordinates may include or may consist of atoms found in the EXPB1 binding pocket, as described herein below.
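Selecting a portion of the coordinates can be illustrated with a short parser for fixed-column PDB ATOM records. This is a sketch under stated assumptions: the column positions follow the standard PDB text format, the function names are hypothetical, and the example residue set simply reuses residue numbers mentioned elsewhere in this description, on the assumption that they match the numbering in the coordinate file.

```python
def parse_atom_line(line):
    """Parse one PDB ATOM/HETATM record; return None for other records."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "xyz": (float(line[30:38]),       # x, columns 31-38
                float(line[38:46]),       # y, columns 39-46
                float(line[46:54])),      # z, columns 47-54
    }


def select_ca(lines, residue_numbers):
    """Return (res_seq, res_name, xyz) for C-alpha atoms of chosen residues."""
    wanted = set(residue_numbers)
    selected = []
    for line in lines:
        atom = parse_atom_line(line)
        if atom and atom["name"] == "CA" and atom["res_seq"] in wanted:
            selected.append((atom["res_seq"], atom["res_name"], atom["xyz"]))
    return selected


# Illustrative residue set (numbering assumed to match the coordinate file).
EXAMPLE_RESIDUES = {25, 26, 27, 37, 40, 44, 95, 107, 157, 160, 193, 194, 199}
```

A call such as `select_ca(open(path), EXAMPLE_RESIDUES)`, with `path` pointing at a locally saved coordinate file, would return the selected C-α positions for use in a later superposition or modeling step.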
EXPB1 contains two domains (residues 19-140 [D1] and 147-245 [D2]) connected by a short linker (residues 141-146) and aligned end to end so as to make a closely-packed irregular cylinder ˜66 Å long and 26 Å in diameter (
Based on its electron density, our model of this N-linked glycan consists of a (1→4)-linked backbone of GlcNac1GlcNac2Man3 with two Man residues and a Xyl residue attached to Man3 and a Fuc residue linked to GlcNac1 (
Residues 1-3 in the leader sequence were not modeled due to insufficient electron density, but N-terminal sequencing and mass spectrometry indicate their presence (24). The 24-aa signal peptide at the N-terminus, predicted from the EXPB1 cDNA, was absent and was presumably excised during ER processing prior to secretion. No other post-translational modifications, bound metals or ligands were evident from the crystal structure.
The two EXPB1 domains pack close to one another, making contact via H-bonds and salt bridges between basic residues (K65 and R137) in D1 and acidic residues (E217 and D171) in D2. These residues are highly conserved in the EXPB family (see annotated sequence logo in
Structure of Domain 1. Residues 19-140 form an irregular ovoid with rough dimensions of 35×30×24 Å. The protein fold is dominated by a six-stranded β-barrel flanked by short loops and α-helices (
Previous analysis (2, 3) indicated that D1 has distant sequence similarity to members of glycoside hydrolase family 45 (GH45), whose members have been characterized as inverting endo-β-(1→4)-D-glucanases (2, 3, 32, 33). Superposition of D1 with a GH45 protein (PDB #4ENG) using the secondary structure matching algorithm in CCP4 (34) gives good overlap of the two structures for 84 residues (60%) of the peptide backbone of D1 (
The GH45 enzyme is substantially larger than D1 (210 residues versus 121) and the “extra” structure in the GH45 enzyme is composed largely of loop regions and α-helices forming a large ridge and subtending structure lacking in D1 (
In addition to partial conservation of the protein fold, D1 has noteworthy, but incomplete, conservation of the catalytic site identified in GH45 enzymes (
What is missing in EXPB1 is a residue corresponding to D10*, the catalytic base required for glucan hydrolysis by GH45 enzymes (35). As indicated in
Inspection of the EXPB1 structure revealed another acidic residue, D95, which is close to D107 (the carboxylate groups are 8.5 Å away). D95 is highly conserved in group-1 allergens, as well as in β-expansins in general (
Enzymatic activity. Because of the structural similarity between D1 and GH45 and the configuration of D95/D37, we tested the ability of EXPB1 to hydrolyze the major polysaccharides of the cell wall. Even with 48-h incubations, we did not detect hydrolytic activity by EXPB1 (
Taking another tack, we tested two GH45 enzymes (32, 36) and a nonenzymatic GH45-related protein named “swollenin” (37) for their abilities to catalyze cell wall extension. For these experiments, heat-inactivated walls from cucumber hypocotyls and wheat coleoptiles were clamped in tension in an extensometer and changes in length were monitored upon addition of protein. We observed only small traces of wall extension activity for the GH45 enzymes and for swollenin. Thus, these related proteins lack significant expansin-type activity, at least with the cell walls tested here.
We conclude that, despite the structural similarity of D1 to GH45, EXPB1 does not induce wall extension via wall polysaccharide hydrolysis.
Structure of Domain 2 (D2). Residues 147-245 of EXPB1 make up a second domain (D2) composed of eight β strands assembled into two antiparallel β sheets (
D1 and D2 form a long potential polysaccharide-binding site. The two EXPB1 domains align so as to form a long, shallow groove with highly conserved polar and aromatic residues suitably positioned to bind a twisted polysaccharide chain of 10 xylose residues (
Residues that could bind a polysaccharide by van der Waals interactions with the sugar rings include W26, Y27, G40, and G44 from D1 as well as Y160 and W194 from D2. Conserved residues that might stabilize polysaccharide binding by H-bonding include T25, D37, D95 and D107 in D1 and N157, S193 and R199 in D2.
The use of chimeric proteins to achieve desired properties is now common in the scientific literature. Active site chimeras are also described: for example, Swairjo et al (Biochemistry (1998) 37:10928-10936) made loop chimeras of HIV-1 and HIV-2 protease to try to understand determinants of inhibitor-binding specificity.
Of particular relevance are cases where the active site is modified so as to provide a surrogate system to obtain structural information. Thus Ikuta et al (J Biol Chem (2001) 276:27548-27554) modified the active site of cdk2, for which they could obtain structural data, to resemble that of cdk4, for which no X-ray structure is currently available. In this way they were able to obtain protein/ligand structures from the chimeric protein which were useful in cdk4 inhibitor design. In a similar way, based on comparison of primary sequences of highly related isoforms the active site of the EXPB1 protein could be modified to resemble those isoforms. Protein structures or protein/ligand structures of the chimeric proteins could be used in structure-based alteration of the metabolism of compounds which are substrates of that related EXPB1 isoform.
Aspects of the present invention therefore relate to modification of EXPB1 proteins such that the active sites mimic those of related isoforms. For example, from a knowledge of the structure and residues of the active site of the maize EXPB1 structure contained herein, a person skilled in the art could modify an EXPB1 protein such that the active site mimicked that of maize EXPB1. This protein could then be used to obtain information on compound binding through the determination of protein/ligand complex structures using the chimeric EXPB1 protein.
For example, in one aspect the present invention provides a chimeric protein having a binding cavity which provides a substrate specificity substantially identical to that of EXPB1 protein, wherein the chimeric protein binding cavity is lined by a plurality of atoms which correspond to selected EXPB1 atoms lining the EXPB1 binding cavity, and the relative positions of the plurality of atoms corresponding to the relative positions, as defined herein.
The invention also provides a means for homology modeling of other proteins (referred to below as target EXPB1 proteins). By “homology modeling”, it is meant the prediction of related EXPB1 structures based either on X-ray crystallographic data or computer-assisted de novo prediction of structure, based upon manipulation of the coordinate data derivable herein or selected portions thereof.
“Homology modeling” extends to target EXPB1 proteins which are analogues or homologues of the EXPB1 protein whose structure has been determined in the accompanying examples. It also extends to mutants of the EXPB1 protein itself.
The term “homologous regions” describes amino acid residues in two sequences that are identical or have similar (e.g. aliphatic, aromatic, polar, negatively charged, or positively charged) side-chain chemical groups. Identical and similar residues in homologous regions are sometimes described as being respectively “invariant” and “conserved” by those skilled in the art.
In general, the method involves comparing the amino acid sequences of the EXPB1 protein with a target EXPB1 protein by aligning the amino acid sequences. Amino acids in the sequences are then compared and groups of amino acids that are homologous (conveniently referred to as “corresponding regions”) are grouped together. This method detects conserved regions of the polypeptides and accounts for amino acid insertions or deletions as seen in
Homology between amino acid sequences can be determined using commercially available algorithms. The programs BLAST, gapped BLAST, BLASTN, PSI-BLAST and BLAST2 (provided by the National Center for Biotechnology Information) are widely used in the art for this purpose, and can align homologous regions of two amino acid sequences. These may be used with default parameters to determine the degree of homology between the amino acid sequence of the protein and other target EXPB1 proteins which are to be modeled.
Analogues are defined as proteins with similar three-dimensional structures and/or functions with little evidence of a common ancestor at a sequence level.
Homologues are defined as proteins with evidence of a common ancestor, i.e. likely to be the result of evolutionary divergence and are divided into remote, medium and close sub-divisions based on the degree (usually expressed as a percentage) of sequence identity.
A homologue is defined here as a protein with at least 15% sequence identity or which has at least one functional domain, which is characteristic of EXPB1. This includes polymorphic forms of EXPB1.
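The sequence-identity criterion can be illustrated with a small calculation over a pre-computed pairwise alignment. This is a sketch only: alignment programs differ in how they choose the denominator, and the convention used here (columns in which both sequences have a residue) is an assumption, as are the function names. Only the 15% threshold is taken from the definition above, and the functional-domain alternative is not modeled.

```python
GAP = "-"


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == GAP or b == GAP:
            continue  # skip insertions and deletions
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold=15.0):
    """True if the pair satisfies the 'at least 15% sequence identity' test."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACDE-FG", "ACDQWFG")` compares six gap-free columns, five of which are identical.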
There are two types of homologue: orthologues and paralogues. Orthologues are defined as homologous genes in different organisms, i.e. the genes share a common ancestor coincident with the speciation event that generated them. Paralogues are defined as homologous genes in the same organism derived from a gene/chromosome/genome duplication, i.e. the common ancestor of the genes occurred since the last speciation event.
The homologues could also be polymorphic forms of EXPB1 such as alleles or mutants as described in section (A).
Once the amino acid sequences of the polypeptides with known and unknown structures are aligned, the structures of the conserved amino acids in a computer representation of the polypeptide with known structure are transferred to the corresponding amino acids of the polypeptide whose structure is unknown. For example, a tyrosine in the amino acid sequence of known structure may be replaced by a phenylalanine, the corresponding homologous amino acid in the amino acid sequence of unknown structure.
The structures of amino acids located in non-conserved regions may be assigned manually by using standard peptide geometries or by molecular simulation techniques, such as molecular dynamics. The final step in the process is accomplished by refining the entire structure using molecular dynamics and/or energy minimization.
Homology modeling as such is a technique that is well known to those skilled in the art (see e.g. Greer, Science, Vol. 228:1055 (1985), and Blundell et al., Eur. J. Biochem, Vol. 172:513 (1988)). The techniques described in these references, as well as other homology modeling techniques, generally available in the art, may be used in performing the present invention.
Thus the invention provides a method of homology modeling comprising the steps of: (a) aligning a representation of an amino acid sequence of a target EXPB1 protein of unknown three-dimensional structure with the amino acid sequence of the EXPB1 herein to match homologous regions of the amino acid sequences; (b) modeling the structure of the matched homologous regions of said target EXPB1 of unknown structure on the corresponding regions of the EXPB1 structure as obtained as described above and/or that of any one of Tables 1-4 or selected coordinates thereof; and (c) determining a conformation (e.g. so that favorable interactions are formed within the target EXPB1 of unknown structure and/or so that a low energy conformation is formed) for said target EXPB1 of unknown structure which substantially preserves the structure of said matched homologous regions. Preferably one or all of steps (a) to (c) are performed by computer modeling.
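Steps (a) and (b) above can be sketched as a coordinate transfer driven by an alignment. The code below is a deliberately simplified illustration: it assumes the alignment has already been computed, represents each residue by a single coordinate (for instance its C-α position), and leaves unmatched target residues unassigned for later treatment in step (c). All names are hypothetical.

```python
GAP = "-"


def transfer_coordinates(template_aln, target_aln, template_coords):
    """Copy template coordinates onto aligned target residues.

    template_aln, target_aln -- aligned sequences of equal length, '-' for gaps
    template_coords          -- one coordinate per template residue, in order
    Returns a list with one entry per target residue: the coordinate of the
    matched template residue, or None where the target residue is an insertion
    that still has to be built (e.g. by loop modeling and refinement).
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    n_template = sum(1 for c in template_aln if c != GAP)
    if n_template != len(template_coords):
        raise ValueError("need exactly one coordinate per template residue")

    modeled = []
    t_idx = 0  # index of the current template residue
    for t_res, q_res in zip(template_aln, target_aln):
        if q_res != GAP:
            # Matched column: inherit the template position; otherwise leave open.
            modeled.append(template_coords[t_idx] if t_res != GAP else None)
        if t_res != GAP:
            t_idx += 1
    return modeled
```

Entries left as `None` correspond to the non-conserved regions whose conformation would be determined in step (c).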
The aspects of the invention described herein which utilize the EXPB1 structure in silico may be equally applied to homologue models of EXPB1 obtained by the above aspect of the invention, and this application forms a further aspect of the present invention. Thus having determined a conformation of an EXPB1 by the method described above, such a conformation may be used in a computer-based method of rational drug design as described herein.
The atomic coordinate data of EXPB1 can also be used to solve the crystal structure of other target EXPB1 proteins including other crystal forms of EXPB1, mutants, co-complexes of EXPB1, where X-ray diffraction data or NMR spectroscopic data of these target EXPB1 proteins has been generated and requires interpretation in order to provide a structure.
In the case of EXPB1, this protein may crystallize in more than one crystal form. The data, as provided by this invention, are particularly useful to solve the structure of those other crystal forms of EXPB1. It may also be used to solve the structure of EXPB1 mutants, EXPB1 co-complexes, or of the crystalline form of any other protein with significant amino acid sequence homology to any functional domain of EXPB1.
In the case of other target EXPB1 proteins, particularly the maize EXPB1 proteins referred to in Section E above, the present invention allows the structures of such targets to be obtained more readily where raw X-ray diffraction data is generated.
Thus, where X-ray crystallographic or NMR spectroscopic data is provided for a target EXPB1 of unknown three-dimensional structure, the atomic coordinate data derived herein, may be used to interpret that data to provide a likely structure for the other EXPB1 by techniques which are well known in the art, e.g. phasing in the case of X-ray crystallography and assisting peak assignments in NMR spectra.
One method that may be employed for these purposes is molecular replacement. In this method, the unknown crystal structure, whether it is another crystal form of EXPB1, an EXPB1 mutant, an EXPB1 chimera or an EXPB1 co-complex, or the crystal of a target EXPB1 protein with amino acid sequence homology to any functional domain of EXPB1, may be determined using the EXPB1 structure coordinates. This method will provide an accurate structural form for the unknown crystal more quickly and efficiently than attempting to determine such information ab initio.
Examples of computer programs known in the art for performing molecular replacement are CNX (Brunger A. T.; Adams P. D.; Rice L. M., Current Opinion in Structural Biology, Volume 8, Issue 5, October 1998, Pages 606-611; also commercially available from Accelrys, San Diego, Calif.), MOLREP (A. Vagin, A. Teplyakov, MOLREP: an automated program for molecular replacement, J. Appl. Cryst. (1997) 30, 1022-1025, part of the CCP4 suite) or AMoRe (Navaza, J. (1994). AMoRe: an automated package for molecular replacement. Acta Cryst. A50, 157-163).
In another aspect, the present invention provides systems, particularly a computer system, the systems containing one of (a) EXPB1 co-ordinate data herein, said data defining the three-dimensional structure of EXPB1 or at least selected coordinates thereof; (b) atomic coordinate data of a target EXPB1 protein generated by homology modeling of the target based on the coordinate data herein, (c) atomic coordinate data of a target EXPB1 protein generated by interpreting X-ray crystallographic data or NMR data by reference to the co-ordinate data herein; or (d) structure factor data derivable from the atomic coordinate data of (b) or (c).
For example the computer system may comprise: (i) a computer-readable data storage medium comprising data storage material encoded with the computer-readable data; (ii) a working memory for storing instructions for processing said computer-readable data; and (iii) a central-processing unit coupled to said working memory and to said computer-readable data storage medium for processing said computer-readable data and thereby generating structures and/or performing rational compound design. The computer system may further comprise a display coupled to said central-processing unit for displaying said structures.
The invention also provides such systems containing atomic coordinate data of target EXPB1 proteins wherein such data has been generated according to the methods of the invention described herein based on the starting data provided herein or selected coordinates thereof.
Such data is useful for a number of purposes, including the generation of structures to analyze the mechanisms of action of EXPB1 proteins and/or to perform rational drug design of compounds, which interact with EXPB1.
In a further aspect, the present invention provides computer readable media with at least one of (a) EXPB1 co-ordinate data herein, said data defining the three-dimensional structure of EXPB1 or at least selected coordinates thereof; (b) atomic coordinate data of a target EXPB1 protein generated by homology modeling of the target based on the coordinate data herein, (c) atomic coordinate data of a target EXPB1 protein generated by interpreting X-ray crystallographic data or NMR data by reference to the co-ordinate data; or (d) structure factor data derivable from the atomic coordinate data of (b) or (c).
In another aspect, the invention provides a computer-readable storage medium, comprising a data storage material encoded with computer readable data, wherein the data are defined by all or a portion (e.g. selected coordinates as defined herein) of the structure coordinates of EXPB1 herein, or a homologue of said EXPB1, wherein said homologue comprises backbone atoms that have a root mean square deviation from the Cα or backbone atoms (nitrogen-carbonα-carbon) of less than 2 Å, preferably less than 1.55 or 1.5 Å, more preferably less than 1.0 Å (e.g. less than 0.6 Å), and most preferably less than 0.5 Å (e.g. less than 0.45 Å such as less than 0.35 Å).
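The root mean square deviation criterion above can be illustrated with a minimal sketch. It assumes that the two structures have already been superposed and that atoms are paired by index; the function names and the coordinates are hypothetical and are not taken from the EXPB1 data herein.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation (in Å) between two equal-length lists of (x, y, z) points.

    Assumption: the structures are already superposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def within_threshold(coords_a, coords_b, cutoff=2.0):
    """True if the RMSD is below the cutoff (2 Å is the broadest limit stated above)."""
    return rmsd(coords_a, coords_b) < cutoff

# Hypothetical Cα positions in Å, for illustration only.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
candidate = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]

print(round(rmsd(reference, candidate), 3))
print(within_threshold(reference, candidate, cutoff=0.5))
```

In practice a superposition step (for example a least-squares fit of the backbone atoms) would precede the calculation; that step is omitted here for brevity.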
As used herein, “computer readable media” refers to any medium or media, which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media such as floppy discs, hard disc storage medium and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
By providing such computer readable media, the atomic coordinate data of the invention can be routinely accessed to model EXPB1s or selected coordinates thereof. For example, RASMOL (Sayle et al., TIBS, Vol. 20, (1995), 374) is a publicly available computer software package, which allows access and analysis of atomic coordinate data for structure determination and/or rational drug design.
As used herein, “a computer system” refers to the hardware means, software means and data storage means used to analyze the atomic coordinate data of the invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means and data storage means. Desirably a monitor is provided to visualize structure data. The data storage means may be RAM or means for accessing computer readable media of the invention. Examples of such systems are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix based, Windows NT or IBM OS/2 operating systems.
The invention also provides a computer-readable data storage medium comprising a data storage material encoded with a first set of computer-readable data comprising the EXPB1 coordinates herein or selected coordinates thereof; which, when combined with a second set of machine readable data comprising an X-ray diffraction pattern of a molecule or molecular complex of unknown structure, using a machine programmed with the instructions for using said first set of data and said second set of data, can determine at least a portion of the electron density corresponding to the second set of machine readable data.
The crystal structures obtained according to the present invention, as well as the structures of target EXPB1 proteins obtained in accordance with the methods described herein, may be used in several ways for chemical compound design.
In the case where a molecule is bound by an EXPB1, information on the binding orientation of the compound in the binding pocket can be determined by co-crystallization, soaking or computational docking. This will guide specific modifications to the chemical structure designed to mediate or control the interaction of the compound with the protein. Such modifications can be designed with an aim to increase the enhancement of activity by EXPB1 or to increase the active life of the compound and so improve its enzymatic activity.
The crystal structure could also be useful to understand EXPB1-cellulose (substrate) interactions. The crystal structure of the present invention complexed to such a modulator or other compound (either in vitro or in silico) may also allow rational modifications either to modify the modulator such that it either increases or decreases activity, or to modify the EXPB1 such that it could bind better and so displace the modulator.
EXPB1s, like all expansins, display significant polymorphic variations dependent on the plant species. This can manifest itself in adverse reactions from some uses. By using the crystal structures of the present invention to map the relevant mutation with respect to the binding mode of EXPB1, chemical modifications could also be made to the expansin to avoid interactions with the variable region of the protein. This could ensure more consistent polysaccharide binding and cell wall extension from EXPB1 for such segments of the population and avoid unwanted deleterious effects.
Some compounds may be converted by EXPB1s into active metabolites. In the case of such compounds, a greater understanding of how such compounds are converted by an EXPB1 will allow modification of the compound so that it can be converted at a different rate. For example, increasing the rate of conversion may allow a more rapid delivery of a desired wall loosening effect, whereas decreasing the rate of conversion may allow for higher sustained activity.
Thus, the determination of the three-dimensional structure of EXPB1 provides a basis for the design of new compounds, which interact with EXPB1 in novel ways. For example, knowing the three-dimensional structure of EXPB1, computer modeling programs may be used to design different molecules expected to interact with possible or confirmed active sites, such as binding sites or other structural or functional features of EXPB1.
In one approach, the structure of a compound bound to an EXPB1 may be determined by experiment. This will provide a starting point in the analysis of the compound bound to EXPB1, thus providing those of skill in the art with a detailed insight as to how that particular compound interacts with EXPB1 and the mechanism by which it is metabolized.
Many of the techniques and approaches to structure-based compound design described above rely at some stage on X-ray analysis to identify the binding position of a ligand in a ligand-protein complex. A common way of doing this is to perform X-ray crystallography on the complex, produce a difference Fourier electron density map, and associate a particular pattern of electron density with the ligand. However, in order to produce the map (as explained e.g. by Blundell et al., in Protein Crystallography, Academic Press, New York, London and San Francisco, (1976)), it is necessary to know beforehand the protein 3D structure (or at least the protein structure factors). Therefore, determination of the EXPB1 structure also allows difference Fourier electron density maps of EXPB1-compound complexes to be produced and the binding position of the compound to be determined, and hence may greatly assist the process of rational drug design.
Accordingly, the invention provides a method for determining the structure of a compound bound to EXPB1, said method comprising: providing a crystal of EXPB1 according to the invention; soaking the crystal with said compound; and determining the structure of said EXPB1-compound complex by employing the data described herein.
Alternatively, the EXPB1 and compound may be co-crystallized. Thus the invention provides a method for determining the structure of a compound bound to EXPB1, said method comprising: mixing the protein with the compound(s); crystallizing the protein-compound(s) complex; and determining the structure of said EXPB1-compound(s) complex by reference to the EXPB1 structural data herein.
The analysis of such structures may employ (i) X-ray crystallographic diffraction data from the complex and (ii) a three-dimensional structure of EXPB1, or at least selected coordinates thereof, to generate a difference Fourier electron density map of the complex, the three-dimensional structure being defined by atomic coordinate data provided herein. The difference Fourier electron density map may then be analyzed.
Therefore, such complexes can be crystallized and analyzed using X-ray diffraction methods, e.g. according to the approach described by Greer et al., J. of Medicinal Chemistry, Vol. 37, (1994), 1035-1054, and difference Fourier electron density maps can be calculated based on X-ray diffraction patterns of soaked or co-crystallized EXPB1 and the resolved structure of uncomplexed EXPB1. These maps can then be analyzed e.g. to determine whether and where a particular compound binds to EXPB1 and/or changes the conformation of EXPB1.
Electron density maps can be calculated using programs such as those from the CCP4 computing package (Collaborative Computational Project 4. The CCP4 Suite: Programs for Protein Crystallography, Acta Crystallographica, D50, (1994), 760-763). For map visualization and model building, programs such as “O” (Jones et al., Acta Crystallographica, A47, (1991), 110-119) can be used.
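The principle of a difference Fourier synthesis, in which observed amplitudes from the complex are combined with amplitudes and phases calculated from the known protein model, can be sketched with a one-dimensional toy. This is an illustration only and is not how the CCP4 programs are implemented; the point atoms, weights, fractional positions, grid size and function names are all hypothetical.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """F(h) for point atoms on a 1D unit cell; atoms are (fractional_x, weight) pairs."""
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }

def difference_map(f_obs_amplitudes, f_calc, n_grid=100):
    """Sample rho(x) = sum_h (|Fobs| - |Fcalc|) exp(i*phi_calc) exp(-2*pi*i*h*x)."""
    density = []
    for i in range(n_grid):
        x = i / n_grid
        total = 0j
        for h, fc in f_calc.items():
            # Only the phase of the calculated (protein-only) model is used.
            coeff = (f_obs_amplitudes[h] - abs(fc)) * cmath.exp(1j * cmath.phase(fc))
            total += coeff * cmath.exp(-2j * math.pi * h * x)
        density.append(total.real)
    return density

# Hypothetical "protein" and "ligand" point atoms on a 1D cell.
protein = [(0.10, 6.0), (0.35, 8.0), (0.60, 7.0), (0.80, 6.0)]
ligand = [(0.47, 3.0)]

f_calc = structure_factors(protein, h_max=10)
# "Observed" amplitudes come from the complex; phases are treated as unknown.
f_obs = {h: abs(f) for h, f in structure_factors(protein + ligand, h_max=10).items()}
density = difference_map(f_obs, f_calc)
```

When the observed amplitudes equal the calculated ones, every coefficient vanishes and the map is flat. With a bound compound present, positive features in the map are the candidates to inspect for the ligand, although this toy makes no guarantee about where the strongest feature falls.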
In addition, in accordance with this invention, EXPB1 mutants may be crystallized in co-complex with known EXPB1 substrates or inhibitors or novel compounds. The crystal structures of a series of such complexes may then be solved by molecular replacement and compared with that of the EXPB1 structure disclosed herein. Potential sites for modification within the various binding sites of the protein may thus be identified. This information provides an additional tool for determining the most efficient binding interactions, for example, increased hydrophobic interactions, between EXPB1 and a chemical entity or compound.
For example there are alleles of EXPB1, which differ from the native EXPB1 by only 1-2 amino acid substitutions, and yet individuals who express these allelic variants may exhibit different binding affinities or activities. The metabolism of enzymatic agents used in the hydrolysis of cellulose or plant cell wall extension applications can be investigated using the structure provided here and the agents then altered using the methods described herein.
This information may thus be used to optimize known classes of EXPB1 enhanced enzymes (e.g. cellulases), substrates or enhancers, and more importantly, to design and synthesize novel classes of compounds with modified or enhanced EXPB1 activity.
Although the invention will facilitate the determination of actual crystal structures comprising an EXPB1 and a compound, which interacts with the EXPB1, current computational techniques provide a powerful alternative to the need to generate such crystals and generate and analyze diffraction data. Accordingly, a particularly preferred aspect of the invention relates to in silico methods directed to the analysis and development of compounds which interact with EXPB1 structures of the present invention.
Determination of the three-dimensional structure of EXPB1 provides important information about the binding sites of EXPB1, particularly when comparisons are made with similar expansins, and grass pollen allergens. This information may then be used for rational design and modification of EXPB1 substrates and inhibitors, e.g. by computational techniques which identify possible binding ligands for the binding sites, by enabling linked-fragment approaches to drug design, and by enabling the identification and location of bound ligands (e.g. including those ligands mentioned herein above) using X-ray crystallographic analysis. These techniques are discussed in more detail below.
Thus as a result of the determination of the EXPB1 three-dimensional structure, more purely computational techniques for chemical compound design may also be used to design structures whose interaction with EXPB1 is better understood (for an overview of these techniques see e.g. Walters et al., Drug Discovery Today, Vol. 3, No. 4, (1998), 160-178; Abagyan, R.; Totrov, M. Curr. Opin. Chem. Biol. 2001, 5:375-382). For example, automated ligand-receptor docking programs (discussed e.g. by Jones et al. in Current Opinion in Biotechnology, Vol. 6, (1995), 652-656 and Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Proteins 2002, 47:409-443), which require accurate information on the atomic coordinates of target receptors, may be used.
The aspects of the invention described herein which utilize the EXPB1 structure in silico may be equally applied to both the EXPB1 structure disclosed herein and the models of target EXPB1 proteins obtained by other aspects of the invention. Thus having determined a conformation of an EXPB1 by the method described above, such a conformation may be used in a computer-based method of rational drug design as described herein. In addition the availability of the structure of the EXPB1 will allow the generation of highly predictive models for virtual library screening or compound design.
Accordingly, the invention provides a computer-based method for the analysis of the interaction of a molecular structure with an EXPB1 structure of the invention, which comprises: providing the structure of an EXPB1 of the invention; providing a molecular structure to be fitted to said EXPB1 structure; and fitting the molecular structure to the EXPB1 structure.
In an alternative aspect, the method of the invention may utilize the coordinates of atoms of interest of the EXPB1 binding region, which are in the vicinity of a putative molecular structure, for example within 10-25 Å of the catalytic regions or within 5-10 Å of a compound bound, in order to model the pocket in which the structure binds. These coordinates may be used to define a space, which is then analyzed “in silico”. Thus the invention provides a computer-based method for the analysis of molecular structures which comprises: providing the coordinates of at least two atoms of an EXPB1 structure of the invention (“selected coordinates”); providing the structure of a molecular structure to be fitted to said coordinates; and fitting the structure to the selected coordinates of the EXPB1.
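Selecting the atoms that lie within a given distance of a bound compound can be sketched as follows. This is a minimal illustration; the atom labels, coordinates and function name are hypothetical, and the default cutoff simply reflects the upper end of the 5-10 Å range mentioned above.

```python
import math

def atoms_near_compound(protein_atoms, compound_atoms, cutoff=10.0):
    """Return the (label, position) protein atoms lying within `cutoff` Å of any compound atom."""
    selected = []
    for label, pos in protein_atoms:
        if any(math.dist(pos, c_pos) <= cutoff for c_pos in compound_atoms):
            selected.append((label, pos))
    return selected

# Hypothetical coordinates in Å, for illustration only.
protein_atoms = [
    ("atom_1", (0.0, 0.0, 0.0)),
    ("atom_2", (6.0, 0.0, 0.0)),
    ("atom_3", (30.0, 0.0, 0.0)),
]
compound_atoms = [(2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]

pocket = atoms_near_compound(protein_atoms, compound_atoms, cutoff=5.0)
print([label for label, _ in pocket])
```

The resulting subset plays the role of the “selected coordinates” to which a molecular structure is subsequently fitted.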
In practice, it will be desirable to model a sufficient number of atoms of the EXPB1 as defined herein, which represent a binding groove, e.g. the atoms of the residues identified in residues G129-N157, which also preferably maintains the binding motifs TWYG, GGACG, HFD. Thus, in this embodiment of the invention, there will preferably be provided the coordinates of at least 5, preferably at least 10, more preferably at least 50 and even more preferably at least 100, e.g. at least 500 such as at least 1000, selected atoms of the EXPB1 structure.
Although every different compound metabolized by EXPB1 may interact with different parts of the binding pocket of the protein, the structure of this EXPB1 allows the identification of a number of particular sites which are likely to be involved in many of the interactions of EXPB1 with a candidate compound. The residues are set out in
In order to provide a three-dimensional structure of compounds to be fitted to an EXPB1 structure of the invention, the compound structure may be modeled in three dimensions using commercially available software for this purpose or, if its crystal structure is available, the coordinates of the structure may be used to provide a representation of the compound for fitting to an EXPB1 structure of the invention.
The binding pockets of EXPB1 molecules are of a size which can accommodate more than one ligand. Indeed, some interactions may occur as a result of interaction of the compounds within the binding pocket of the same EXPB1. In any event, the findings of the present invention may be used to examine or predict the interaction of two or more separate molecular structures within the EXPB1 binding pocket of the invention.
Thus the invention provides a computer-based method for the analysis of the interaction of two molecular structures within an EXPB1 binding pocket structure, which comprises: providing the EXPB1 structure; providing a first molecular structure; fitting the first molecular structure to said EXPB1 structure; providing a second molecular structure; and fitting the second molecular structure to a different part of said EXPB1 structure.
Optionally the method of analysis further comprises providing a third molecular structure and also fitting that structure to the EXPB1 structure. Indeed, further molecular structures may be provided and fitted in the same way.
In one aspect, one or more of the molecular structures may be fitted to one or more of the polysaccharide binding area, residues G129 through N157 of the EXPB1 binding groove mentioned above, and one or more of the other molecular structures may be fitted to coordinates of amino acids from another part of the EXPB1 binding pocket, such as another part of the ligand-binding region.
Following the fitting of the molecular structures, a person of skill in the art may seek to use molecular modeling to determine to what extent the structures interact with each other (e.g. by hydrogen bonding, other non-covalent interactions, or by reaction to provide a covalent bond between parts of the structures) or to what extent the interaction of one structure with EXPB1 is altered by the presence of another structure.
The person of skill in the art may use in silico modeling methods to alter one or more of the structures in order to design new structures which interact in different ways with EXPB1, so as to speed up or slow down their metabolism, as the case may be.
Newly designed structures may be synthesized and their interaction with EXPB1 may be determined or predicted, including how the newly designed structure is metabolized by said EXPB1 structure. This process may be iterated so as to further alter the interaction between the structure and the EXPB1.
By “fitting”, it is meant determining by automatic, or semi-automatic means, interactions between at least one atom of a molecular structure and at least one atom of an EXPB1 structure of the invention, and calculating the extent to which such an interaction is stable. Interactions include attraction and repulsion, brought about by charge, steric considerations and the like. Various computer-based methods for fitting are described further herein.
More specifically, the interaction of a compound or compounds with EXPB1 can be examined through the use of computer modeling using a docking program such as GOLD (Jones et al., J. Mol. Biol., 245, 43-53 (1995), Jones et al., J. Mol. Biol., 267:727-748 (1997)), GRAMM (Vakser, I. A., Proteins, Suppl., 1:226-230 (1997)), DOCK (Kuntz et al, J. Mol. Biol. 1982, 161:269-288, Makino et al, J. Comput. Chem. 1997, 18:1812-1825), AUTODOCK (Goodsell et al, Proteins 1990, 8:195-202, Morris et al, J. Comput. Chem. 1998, 19:1639-1662.), FlexX, (Rarey et al, J. Mol. Biol. 1996, 261:470-489) or ICM (Abagyan et al, J. Comput. Chem. 1994, 15:488-506). This procedure can include computer fitting of compounds to EXPB1 to ascertain how well the shape and the chemical structure of the compound will bind to the EXPB1.
Also computer-assisted, manual examination of the active site structure of EXPB1 may be performed. Programs such as GRID (Goodford, J. Med. Chem., 28, (1985), 849-857), a program that determines probable interaction sites between molecules with various functional groups and the polysaccharide binding surface, may also be used to analyze the active site to predict, for example, the types of modifications which will alter the rate of conformational change or cell wall extension for a compound or plant cell type.
Computer programs can be employed to estimate the attraction, repulsion, and steric hindrance of the two binding partners (i.e. the EXPB1 and a compound).
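As a rough illustration of how attraction, repulsion and steric hindrance can be estimated numerically, the sketch below sums a Lennard-Jones 12-6 term and a Coulomb term over all protein-compound atom pairs. It is a toy scoring function and not the method used by any of the docking programs named above; the parameter values (`epsilon`, `sigma`, a dielectric constant of 1), the coordinates and the partial charges are hypothetical.

```python
import math

def pair_energy(r, q1, q2, epsilon=0.2, sigma=3.5, coulomb_k=332.0):
    """Toy energy (kcal/mol) for two atoms r Å apart carrying partial charges q1 and q2.

    The r**-12 term models steric repulsion, the r**-6 term dispersion attraction,
    and the Coulomb term charge-charge attraction or repulsion (dielectric of 1 assumed).
    """
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = coulomb_k * q1 * q2 / r
    return lj + coulomb

def interaction_score(protein_atoms, compound_atoms):
    """Sum pair energies over all protein-compound atom pairs; lower is more favorable."""
    return sum(
        pair_energy(math.dist(p_pos, c_pos), p_q, c_q)
        for p_pos, p_q in protein_atoms
        for c_pos, c_q in compound_atoms
    )

# Hypothetical binding-site atoms and two candidate compounds: (position in Å, partial charge).
site = [((0.0, 0.0, 0.0), -0.5), ((4.0, 0.0, 0.0), 0.0)]
compound_pos = [((0.0, 4.0, 0.0), 0.5)]
compound_neg = [((0.0, 4.0, 0.0), -0.5)]

print(interaction_score(site, compound_pos) < interaction_score(site, compound_neg))
```

A fitting procedure would evaluate such a score over many positions and orientations of the compound and retain the most favorable ones.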
If more than one EXPB1 active site is characterized and a plurality of respective smaller compounds are designed or selected, a compound may be formed by linking the respective small compounds into a larger compound, which maintains the relative positions and orientations of the respective compounds at the active sites. The larger compound may be formed as a real molecule or by computer modeling.
Detailed structural information can then be obtained about the binding of the compound to EXPB1, and in the light of this information adjustments can be made to the structure or functionality of the compound, e.g. to alter its interaction with EXPB1. The above steps may be repeated and re-repeated as necessary.
As indicated above, molecular structures, which may be fitted to the EXPB1 structure of the invention, include compounds under development as potential enzymatic agents. The agents may be fitted in order to determine how the action of EXPB1 modifies the agent and to provide a basis for modeling candidate agents, which are metabolized at a different rate by an EXPB1.
Molecular structures, which may be used in the present invention, will usually be compounds under development for pharmaceutical use. Generally such compounds will be organic molecules, which are typically from about 100 to 2000 Da, more preferably from about 100 to 1000 Da in molecular weight. Such compounds include peptides and derivatives thereof, and the like. In principle, any compound under development in the field of enzymology can be used in the present invention in order to facilitate its development or to allow further design to improve its properties.
(iii) Analysis of Compounds in Binding Pocket Regions
Our finding of a long grooved binding region allows the analysis and design methods described in the preceding subsections to be focused on compounds which interact with one or more of the residues which make up this area.
Thus in one embodiment, the present invention provides a method for modifying the structure of a compound (polysaccharide) in order to alter its binding to EXPB1 or hydrolysis when bound to EXPB1, which method comprises: fitting a starting compound to one or more coordinates of at least one amino acid residue of the ligand-binding region of the EXPB1; modifying the starting compound structure so as to increase or decrease its interaction with the ligand-binding region.
In another embodiment, the present invention provides a method for modifying the structure of a compound in order to alter its metabolism by an EXPB1, which method comprises: fitting a starting compound to one or more coordinates of at least one amino acid residue of the ligand-binding region of the EXPB1; modifying the starting compound structure so as to increase or decrease its interaction with the ligand-binding region; wherein said ligand-binding region is defined as including at least one, such as at least two, for example such as at least five, preferably at least ten of the EXPB1 residues in the binding groove.
In another embodiment, the invention provides a method for modifying the structure of a compound in order to alter its binding properties to EXPB1 or cell wall extension when bound, which method comprises: fitting a starting compound to one or more coordinates of at least one amino acid residue of the binding region of the EXPB1; modifying the starting compound structure so as to increase or decrease its interaction with the binding region.
Desirably, in the above aspects of the invention, coordinates from at least two, preferably at least five, and more preferably at least ten amino acid residues of the EXPB1 will be used.
For the avoidance of doubt, the term “modifying” is used as defined in the preceding subsection, and once such a compound has been developed it may be synthesized and tested also as described above.
(viii) Compounds of the Invention.
Where a potential modified compound has been developed by fitting a starting compound to the EXPB1 structure of the invention and predicting from this a modified compound with an altered rate of metabolism (including a slower, faster or zero rate), the invention further includes the step of synthesizing the modified compound and testing it in an in vivo or in vitro biological system in order to determine its activity and/or the rate at which it is metabolized.
The method comprises: (a) providing EXPB1 under conditions where, in the absence of modulator, the EXPB1 is able to metabolize known substrates; (b) providing the compound; and (c) determining the extent to which the compound is metabolized in the presence of EXPB1 or (d) determining the extent to which the compound inhibits metabolism of a known substrate of EXPB1.
More preferably, in the latter steps the compound is contacted with EXPB1 under conditions to determine its function.
For example, in the contacting step above the compound is contacted with EXPB1, typically in the presence of a buffer and substrate, to determine the ability of said compound to inhibit EXPB1 or to be metabolized by EXPB1. So, for example, an assay mixture for EXPB1 may be produced which comprises the compound, substrate and buffer.
In another aspect, the invention includes a compound, which is identified by the methods of the invention described above.
Following identification of such a compound, it may be manufactured and/or used in the preparation, i.e. manufacture or formulation, of a composition such as an enzymatic composition used in ethanol production, paper recycling or other plant cell extension industrial applications.
Thus, the present invention extends in various aspects not only to a compound as provided by the invention, but also to formulations including acceptable excipients, vehicles or carriers, and optionally other ingredients.
The above-described processes of the invention may be iterated in that the modified compound may itself be the basis for further compound design.
By “optimizing the structure” we mean e.g. adding molecular scaffolding, adding or varying functional groups, or connecting the molecule with other molecules (e.g. using a fragment linking approach) such that the chemical structure of the modulator molecule is changed while its original modulating functionality is maintained or enhanced. Such optimization is regularly undertaken during chemical compound development programs to e.g. enhance potency, promote pharmacological acceptability, increase chemical stability etc. of lead compounds.
Modifications will be those conventional in the art known to the skilled medicinal chemist, and will include, for example, substitutions or removal of groups containing residues which interact with the amino acid side chain groups of an EXPB1 structure of the invention. For example, the replacements may include the addition or removal of groups in order to decrease or increase the charge of a group in a test compound, the replacement of a charged group with a group of the opposite charge, or the replacement of a hydrophobic group with a hydrophilic group or vice versa. It will be understood that these are only examples of the type of substitutions considered by medicinal chemists in the development of new pharmaceutical compounds and other modifications may be made, depending upon the nature of the starting compound and its activity.
Expansins are small extracellular proteins that promote turgor-driven extension of plant cell walls. EXPB1 (also called Zea m 1) is a member of the β-expansin subfamily known in the allergen literature as group-1 grass pollen allergens. EXPB1 induces extension and stress relaxation of grass cell walls. To help elucidate expansin's mechanism of wall loosening, we determined the structure of EXPB1 by X-ray crystallography to 2.75 Å resolution. EXPB1 consists of two domains closely packed and aligned so as to form a long, shallow groove with potential to bind a glycan backbone of ˜10 sugar residues.
The structure of EXPB1 domain 1 resembles that of family-45 glucoside hydrolase (GH45), with conservation of most of the residues in the catalytic site. However, EXPB1 lacks a second aspartate that serves as the catalytic base required for hydrolytic activity in GH45 enzymes. Domain 2 of EXPB1 is an immunoglobulin-like β-sandwich with aromatic and polar residues that form a potential surface for polysaccharide binding in line with the glycan binding cleft of domain 1. EXPB1 binds to maize cell walls, most strongly to xylans, causing swelling of the cell wall. Tests for hydrolytic activity by EXPB1 with various wall polysaccharides proved negative. Moreover, GH45 enzymes and a GH45-related protein called “swollenin”, lacked wall extension activity comparable to that of expansins. We propose a model of expansin action in which EXPB1 facilitates the local movement and stress relaxation of arabinoxylan-cellulose networks within the wall by noncovalent rearrangement of its target.
Prior to maturation, plant cells typically experience a period of prolonged cell enlargement, often resulting in a >10³-fold increase in volume. The impressive height of trees, some exceeding 100 m, depends on such enlargement, which entails massive vacuolar expansion and irreversible yielding of the cellulosic cell wall. In physical terms, the rate-limiting process for cell enlargement resides within the cell wall, which must be loosened so as to allow wall stress relaxation and consequent water uptake for vacuole enlargement and stretching of the wall (1, 2). Currently, the only plant proteins shown to cause cell wall relaxation are expansins (3, 4), although xyloglucan endotransglucosylase, pectate lyase, cellulase and other enzymes participate in cell wall restructuring during cell growth (5-8).
Expansins were originally discovered in a “fishing expedition” for catalysts of cell wall extension (9, 10). When walls are clamped in tension and incubated in acidic buffer, these proteins rapidly induce wall extension and enhance wall stress relaxation. Their biological role in promoting cell enlargement is amply supported by in-vitro and in-vivo experiments, as well as by studies of gene expression, gene silencing, and ectopic expression (3, 11-13). In addition to cell enlargement, expansins are also implicated in other developmental processes where wall loosening occurs, such as in fruit softening, organ abscission, seed germination, and pollen tube invasion of the grass stigma (14-17).
Two expansin families with wall-loosening activity have been identified, named α-expansins (EXPA) and β-expansins (EXPB); both are found in all groups of land plants, from mosses to flowering plants (3, 18). Although they have only ˜20% amino acid identity, EXPA and EXPB proteins are of similar size (˜27 kD), their sequences align well with one another and they contain a number of conserved residues and characteristic motifs distributed throughout the length of the protein. EXPA and EXPB appear to act on different cell wall components, but their native targets have not yet been well defined.
A subset of β-expansins is known in the immunological literature as group-1 grass pollen allergens (19-21). These β-expansins are abundantly and specifically expressed in grass pollen, causing hay fever and seasonal asthma in an estimated 200-400 million humans (22, 23). The extraordinary abundance of group-1 allergens—comprising up to 4% of the protein extracted from grass pollen (24)—is unique (as far as we know) in the world of expansins, which are typically found in very low abundance and tightly bound to the cell wall. The abundance of group-1 allergens in grass pollen bespeaks a unique biological role, namely to loosen the cell walls of the grass stigma and style, thereby aiding pollen tube penetration and assisting delivery of its two sperm cells to the ovule, where a double fertilization occurs, forming the diploid zygote and the triploid endosperm. Seed development follows, and because cereal grasses provide the largest food source for humanity (e.g. rice, maize, wheat, and barley, to name but a few), the importance of these events for human welfare is hard to overestimate.
Other genes in the β-expansin family are expressed in a variety of other tissues in the plant body and in general lack the specific allergenic epitopes characteristic of group-1 allergens (24, 25). These so-called “vegetative β-expansins” are thought to have cell wall loosening activity and substrate specificity similar to the group-1 allergens, but these inferences have yet to be demonstrated experimentally.
The mechanism by which expansins loosen cell walls has not yet been worked out in molecular detail. Plant cell walls consist of a scaffold of long cellulose microfibrils ˜4 nm in diameter, embedded in a matrix of cellulose-binding glycans, such as xyloglucan and arabinoxylan, and gel-forming pectic polysaccharides (
Most of the biochemical work on expansins to date has focused on α-expansins, which do not hydrolyze the major structural polysaccharides of the wall and indeed are devoid of every enzyme activity assayed to date (28). Our current model proposes that α-expansins disrupt the polysaccharide complexes that link cellulose microfibrils together. The pollen β-expansins (group-1 allergens) have a marked loosening action on cell walls from grasses, but not from dicots, whereas the reverse is true for α-expansins; therefore it seems that the two forms of expansin target different components of the cell wall (21, 24). Grass cell walls are notable for containing relatively small amounts of xyloglucan and pectin, which are replaced with β-(1→3),(1→4)-D-glucan and glucuronoarabinoxylan (29)—two potential targets of β-expansins in their wall-loosening activity.
Sequence analysis suggests that expansins consist of two domains (2, 3). The putative N-terminal domain (D1) has distant sequence similarity (˜20% identity) to the catalytic domain of family-45 glycoside hydrolases (GH45; http://afmb.cnrs-mrs.fr/CAZY/). Despite this resemblance, α-expansins do not hydrolyze wall polysaccharides and so the sequence similarity is enigmatic. The C-terminal domain (D2) has sequence similarity (from 35% to <10% identity) to another class of allergens, the group-2/3 grass pollen allergens, whose biological function is unknown (30).
In this study we present the crystal structure of a native β-expansin purified from maize pollen. In the allergen field it is designated Zea m 1 isoform d, whereas by expansin nomenclature it is called EXPB1 (GenBank accession AAO45608). The allergen name “Zea m 1” encompasses a group of at least four pollen proteins (EXPB1, EXPB9, EXPB10, EXPB11) in two rather divergent sequence classes (24). EXPB1 is the most abundant of the maize group-1 allergens. We also test EXPB1 for binding and activity on cell walls. At the end we discuss a molecular model of expansin action that is consistent with its structure and known biophysical and biochemical activities.
EXPB1 has two closely-packed domains. Native EXPB1 was purified from maize pollen and crystallized in 15% (w/v) polyethylene glycol 4000 with 0.1 or 0.2 M ammonium sulfate. Two crystals were analyzed, yielding X-ray diffraction patterns consistent with the monoclinic C2 space group. EXPB1 structure was solved and refined to 2.75 Å resolution (see Methods) with a crystallographic R-factor of 0.233 and an R-free of 0.291 (Table 1).
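For readers less familiar with the statistics quoted above, the conventional crystallographic R-factor is the summed absolute difference between observed and calculated structure-factor amplitudes divided by the summed observed amplitudes; R-free is the same quantity computed over a held-out test set of reflections. The following is a minimal sketch only. It assumes the two amplitude lists are already on a common scale, and the function name and example amplitudes are hypothetical, not data from this study.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_obs and f_calc are paired amplitudes on the same scale.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, for illustration only.
example = r_factor([100.0, 50.0], [90.0, 55.0])
print(f"R = {example:.3f}")
```

Applying the same function separately to the working and test reflections would give R and R-free, respectively.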
EXPB1 contains two domains (residues 19-140 [D1] and 147-245 [D2]) connected by a short linker (residues 141-146) and aligned end to end so as to make a closely-packed irregular cylinder ˜66 Å long and 26 Å in diameter (
Based on its electron density, our model of this N-linked glycan consists of a β-(1→4)-linked backbone of GlcNac1GlcNac2Man3 with two Man residues and a Xyl residue attached to Man3 and a Fuc residue linked to GlcNac1. Such so-called paucimannosidic-type N-linked glycans are characteristically processed in the Golgi and in post-Golgi steps (31).
Residues 1-3 in the leader sequence were not modeled due to insufficient electron density, but N-terminal sequencing and mass spectrometry indicate their presence (24). The 24-aa signal peptide at the N-terminus, predicted from the EXPB1 cDNA, was absent and was presumably excised during ER processing prior to secretion. No other post-translational modifications, bound metals or ligands were evident from the crystal structure.
The two EXPB1 domains pack close to one another, making contact via H-bonds and salt bridges between basic residues (K65 and R137) in D1 and acidic residues (E217 and D171) in D2. These residues are highly conserved in the EXPB family (see annotated sequence logo in
Structure of Domain 1. Residues 19-140 form an irregular ovoid with rough dimensions of 35×30×24 Å. The protein fold is dominated by a six-stranded β-barrel flanked by short loops and α-helices (
Previous analysis (2, 3) indicated that D1 has distant sequence similarity to members of glycoside hydrolase family 45 (GH45), whose members have been characterized as inverting endo-β-(1→4)-D-glucanases (2, 3, 32, 33). Superposition of D1 with a GH45 enzyme (PDB #4ENG) using the secondary structure matching algorithm in CCP4 (34) gives good overlap of the two structures for 84 residues (60%) of the peptide backbone of D1 (
The GH45 enzyme is substantially larger than D1 (210 residues versus 121) and the “extra” structure in the GH45 enzyme is composed largely of loop regions and α-helices forming a large ridge and subtending structure lacking in D1 (
In addition to partial conservation of the protein fold, D1 has noteworthy, but incomplete, conservation of the catalytic site identified in GH45 enzymes (
What is missing in EXPB1 is a residue corresponding to D10*, the catalytic base required for glucan hydrolysis by GH45 enzymes (35). As indicated in
Inspection of the EXPB1 structure revealed another acidic residue, D95, which is close to D107 (the carboxylate groups are 8.5 Å away). D95 is highly conserved in group-1 allergens, as well as in β-expansins in general (
Enzymatic activity. Because of the structural similarity between D1 and GH45 and the configuration of D95/D37, we tested the ability of EXPB1 to hydrolyze the major polysaccharides of the cell wall. Even with 48-h incubations, we did not detect hydrolytic activity by EXPB1 (
Taking another tack, we tested two GH45 enzymes (32, 36) and a nonenzymatic GH45-related protein named “swollenin” (37) for their abilities to catalyze cell wall extension. For these experiments, heat-inactivated walls from cucumber hypocotyls and wheat coleoptiles were clamped in tension in an extensometer and changes in length were monitored upon addition of protein. We observed only small traces of wall extension activity for the GH45 enzymes and for swollenin. Thus, these related proteins lack significant expansin-type activity, at least with the cell walls tested here.
We conclude that, despite the structural similarity of D1 to GH45, EXPB1 does not induce wall extension via wall polysaccharide hydrolysis.
Structure of Domain 2 (D2). Residues 147-245 of EXPB1 make up a second domain (D2) composed of eight β strands assembled into two antiparallel β sheets (
D1 and D2 form a long potential polysaccharide-binding site. The two EXPB1 domains align so as to form a long, shallow groove with highly conserved polar and aromatic residues suitably positioned to bind a twisted polysaccharide chain of 10 xylose residues (
Residues that could bind a polysaccharide by van der Waals interactions with the sugar rings include W26, Y27, G40, and G44 from D1 as well as Y160 and W194 from D2. Conserved residues that might stabilize polysaccharide binding by H-bonding include T25, D37, D95 and D107 in D1 and N157, S193 and R199 in D2.
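As a bookkeeping aid, the residues listed above can be checked against the domain boundaries stated earlier (D1, residues 19-140; linker, 141-146; D2, 147-245). The sketch below is illustrative only; the function and variable names are hypothetical, and residues outside these three ranges are simply reported as such.

```python
# Domain boundaries as stated in the text (inclusive residue numbers).
REGIONS = [("D1", 19, 140), ("linker", 141, 146), ("D2", 147, 245)]


def region_of(residue_number):
    """Return the region name for a residue number, per the stated boundaries."""
    for name, first, last in REGIONS:
        if first <= residue_number <= last:
            return name
    return "outside D1/linker/D2"


def residue_number(label):
    """Parse a label such as 'W194' into its integer position."""
    return int(label[1:])


# Candidate groove residues as listed in the text.
groove_d1 = ["W26", "Y27", "G40", "G44", "T25", "D37", "D95", "D107"]
groove_d2 = ["Y160", "W194", "N157", "S193", "R199"]

for label in groove_d1 + groove_d2:
    print(label, region_of(residue_number(label)))
```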
The openness of the long groove may enable EXPB1 to bind polysaccharides that are part of a bulky cell wall complex, such as on the surface of cellulose; that openness may also be important for binding branched glycans such as arabinoxylan which itself binds to the surface of cellulose microfibrils. Because EXPB1 binds preferentially to xylans (see below), we have modeled an arabinoxylan, characteristic of grass cell walls, bound to the long groove of EXPB1 (
A second conserved surface in D2 is far removed from D1 (arrows in
Binding. EXPB1 bound to isolated maize cell wall (
With the molecular structure of EXPB1 in hand, we can examine previous inferences about expansin structure and its mechanism of cell wall loosening, but first the use of the group-1 pollen allergen for this study merits comment. Unlike other forms of expansin, which are found in very low abundance and have low solubility, the group-1 allergens are produced in copious amounts by grass pollen, from which they are readily extracted, purified, and concentrated to high levels without precipitation. Moreover, grasses produce abundant pollen, with maize being an especially liberal donor. In contrast to recombinant forms, use of the native protein ensures correct processing and post-translational modifications. We note that expression of active expansins in various recombinant systems has proved problematic, due to improper folding, aggregation and hyperglycosylation (M. Shieh and D. J. Cosgrove, unpublished data). Other forms of β-expansin (e.g. the vegetative homologs) require harsh conditions to extract them from plant tissues (38), resulting in denatured protein; in soybean cultures an EXPB accumulates in the medium, but in a degraded and inactive form (39). EXPA proteins have been purified from various plant tissues, but in our experience they are difficult to concentrate to levels suitable for crystallization.
The high solubility and abundance of the group-1 allergens thus commends them for crystallization studies, but it should be noted that some of their biochemical properties may be specialized for their unique biological role in grass pollination. A case in point is their atypical pH dependence (maximum activity at pH 5.5; (24)), which is shifted to less acidic values than that found for other expansins. Likewise, their high solubility seems to be exceptional. Nevertheless, the general features of EXPB1 structure should prove to be common to the whole expansin family.
EXPB1 is composed of two domains. Although D1 structurally resembles GH45 and indeed has conserved much of the GH45 catalytic site, it lacks the second Asp residue—the catalytic base—required for hydrolytic activity in GH45 enzymes (33, 35). Thus, expansin's lack of wall polysaccharide hydrolytic activity, documented here for EXPB1 and in previous work for EXPA (28, 40), can be understood in structural terms as due to the lack of the required catalytic base. Furthermore, our finding that bona fide GH45 enzymes lack expansin's wall extension activity lends additional support to the conclusion that expansin does not loosen the cell wall by polysaccharide hydrolysis.
D2 as binding module? We previously speculated that D2 may be a carbohydrate-binding module (CBM) (2, 4). This notion gains indirect support from the structure of D2, in which two surface aromatic residues (W194, Y160) are in line with two aromatic residues (W26, Y27) in D1, forming part of an extended, open, and highly conserved surface in EXPB1. D2 has an immunoglobulin-like fold. Proteins with this fold form a large superfamily of β-sandwich proteins implicated in binding interactions, but lacking in enzymatic activity (41). At least 16 of the currently recognized CBM families in the Carbohydrate-Active Enzymes (CAZY) database (http://afmb.cnrs-mrs.fr/CAZY/) have a β-sandwich fold. However, the specific fold topology of D2 does not match any of these CBM folds and D2 lacks a bound metal atom, found in nearly all of the β-sandwich CBMs (42).
Nevertheless, from the structure of EXPB1 we expect that D2 aids glycan binding, particularly via the two surface aromatic residues W194 and Y160, aided by polar residues S193, R199, C156 and N157. These potential sugar-binding residues do not correspond to those inferred from a homology model of Lol p 1, a group-1 allergen from rye grass (43). In this model, which was based on the structure of Phl p 2, a group-2 allergen (30, 44), the authors identified two potential polysaccharide binding surfaces, one of which corresponds to the buried D2 face contacting D1.
It is notable that endoglucanases are most often found in nature as modular enzymes, coupled to a CBM via a long, highly glycosylated linker. Crystallization of intact GH45 enzymes with their CBMs has not yet been achieved, probably because the two domains do not maintain a fixed spatial relationship to each other. This difficulty of crystallization is a common experience with many CBM-coupled enzymes, and so successful crystallization of the two-domain EXPB1 is notable in this regard. In EXPB1 the linker is very short and the multiple contacts between D1 and D2 enable close coupling of the two domains, which may function as a single unit in binding the cell wall.
Expansins as cysteine proteases? A controversial hypothesis has been proposed that group-1 allergens are papain-related cysteine proteinases, with conservation of papain's active site residues C25, H159 and N175 (the “catalytic triad”) (45, 46). According to this hypothesis, C73 in EXPB1 should correspond to papain's C25. However, from the structure of EXPB1 we see that C73 participates in a disulfide bond conserved with GH45 enzymes, is relatively inaccessible, and is nowhere near the conserved surface. Moreover, the residues claimed to correspond to papain's H159 and N175 are dispersed in D2, remote from C73 and are not conserved in expansins. We conclude that the resemblance to papain suggested by Grobe et al. (45, 46) is not supported by our crystallographic model of EXPB1.
The conserved surface of EXPB1 does contain two Cys residues (C58, C156), but their environment does not resemble that of papain's active site. C58, which is conserved in about half of the EXPB family, is relatively inaccessible, being mostly buried underneath Y27 at the bottom of the extended groove. C156 is not conserved in the EXPB family, but is usually replaced by serine. Experimental assays failed to detect proteinase activity in native EXPB1 (47). Moreover, the group-1 allergens are noted for their remarkable stability, which is also the case for EXPB1. We deem it likely that recombinant expression of EXPB in Pichia induced a host protease that accounted for the protein instability observed by Grobe et al. (45, 46). In fact, such host proteinase induction has been reported upon recombinant expression of a group-1 allergen (48).
Comparison with vegetative β-expansins and with α-expansins. EXPB1 is a member of the group-1 grass pollen allergens, which comprise a subset of the larger EXPB family. The EXPB family is notably larger in grasses than in other groups of land plants, and part of this expansion involved the unique evolution and radiation of the pollen allergen class of EXPBs, which are encoded by multiple genes (49). For instance, we classified 5 of the 19 EXPB genes in the rice genome as group-1 allergens (49). Multiple EXPB genes of the pollen allergen class may account in part for the numerous group-1 “isoallergens” found in grass pollen (19, 20, 50, 51).
There are minor conserved differences between the allergen class and the remaining “vegetative” EXPBs. These are so slight that we expect the structural features of EXPB1 are characteristic of the vegetative EXPBs, with one exception: the N-terminal extension in EXPB1 contains a motif (VPPGPNITT) that is consistently found, with only minor variation, in group-1 grass pollen allergens, but not in other EXPBs. This motif contains one or more hydroxyprolines and a glycosylated asparagine, features common to the pollen allergen class of EXPB (52). The function of this N-terminal extension is unknown, but it may play a role in protein recognition, transport, packaging and processing by the pollen secretory apparatus. Additionally, the glycosylated extension may contribute to the exceptional solubility of the group-1 allergens (other expansins characterized to date have very low solubility) or may interact with other components of the cell wall. While this motif is a unique hallmark of the group-1 allergens, many EXPB proteins lack an N-terminal extension altogether, and so it is not an essential part of expansin function. However, an N-terminal extension with similar post-translational modifications was found as part of an EXPB expressed in soybean cell cultures (39).
The good sequence alignment and conservation of motifs between the EXPB and EXPA families make it likely that EXPA proteins will have the same three-dimensional structure as reported here for EXPB1. There are two notable regions where EXPA and EXPB differ. EXPA has an additional stretch of ˜12 amino acids in the region corresponding to E99/P100 in EXPB1. E99 and P100 are part of a loop between β strands IV and V in D1; these residues form part of the upraised flank to the left of the long groove identified in
A second difference is that EXPAs lack a segment corresponding to G120-H127 in EXPB1. This segment, which contains few conserved residues, forms α-helix c and constitutes part of the surface of the pointed end of D1. This surface is remote from the conserved regions we have identified, and so is unlikely to affect activity.
Allergenic epitopes. Allergies to grass pollen are widespread, afflicting an estimated 200-400 million people, and numerous studies have concluded that the group-1 allergens are the most important allergenic components of grass pollen (23, 23, 54, 55). Maize EXPB1 and its orthologs in turf grasses share common epitopes, as judged by antibody cross reactivity, with the predominant epitopes found in the protein portion of the molecule and the glycosyl residues being of secondary antigenic significance (52, 56, 57). The dominant group-1 allergenic epitopes, which have been identified by epitope mapping studies, can be readily located on the surface of EXPB1. For instance, the 15-residue c98 epitope identified by Ball et al. (58) includes D107 in the conserved catalytic site of EXPB1, but also includes residues that are exposed on the opposite side of the protein. “Site D” identified by Hiller et al. (59) overlaps part of the extended conserved groove of D1 containing the motif TWYG28 (
In view of the sequence conservation within the EXPB family, as well as within the entire expansin superfamily, it is surprising that the dominant antigenic epitopes of the group-1 allergens are not shared by vegetative EXPBs or by EXPA members. Nevertheless, this seems to be the case because antibodies raised against the group-1 allergens do not recognize other forms of expansin. This is indeed fortunate, for otherwise persons with strong allergies to grass pollen would also be allergic to fresh fruits, vegetables, grains and other plant tissues that express members of this large gene family that is ubiquitous in plants.
A molecular model of wall loosening by expansins. Expansin action may be summarized as follows: the protein binds one or more wall polysaccharides and within seconds induces wall stress relaxation followed by wall extension, without hydrolysis of the wall polymers. There is no requirement for ATP or other source of chemical energy, and the wall continues to extend so long as the wall bears sufficient tension and expansin is present (that is, expansin acts catalytically, not stoichiometrically).
In the case of EXPB1, we imagine that stress relaxation begins when it binds a taut arabinoxylan tethered to a cellulose microfibril, causing local release of the arabinoxylan from the cellulose surface. Movement of the β-expansin along the arabinoxylan-cellulose junction would enable it to unzip the hydrogen bonds between the polysaccharides, relaxing the taut tether and allowing turgor-driven displacement of cellulose and arabinoxylan, which may then reassociate in a relaxed state to restore wall strength. During this movement, the two expansin domains might shift in a hinge-like manner, binding and letting go of the arabinoxylan independently of each other, leading to an inchworm-like movement along the polysaccharide. We estimate that as little as a 10° shift in angle between domains could cause a one-residue dislocation of the polysaccharide along the binding surface.
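The text does not say how the 10° estimate was obtained. One back-of-envelope way to see that it is plausible is to treat a domain as a rigid lever rotating about the inter-domain hinge and compute the arc traced by its far end. The sketch below rests on two assumptions that are not taken from this study: a lever arm of about 33 Å (half the ~66 Å length of the molecule) and a rise of about 5 Å per sugar residue along a β-(1→4)-linked backbone. Both values and all names are illustrative.

```python
import math


def arc_displacement(lever_arm_angstrom, hinge_angle_deg):
    """Arc length (in Å) swept by the end of a rigid lever: s = r * theta."""
    return lever_arm_angstrom * math.radians(hinge_angle_deg)


LEVER_ARM = 33.0     # Å; assumed, half of the ~66 Å molecular length
RESIDUE_RISE = 5.0   # Å per sugar residue; assumed, not from this study

shift = arc_displacement(LEVER_ARM, 10.0)
print(f"displacement: {shift:.1f} Å, "
      f"about {shift / RESIDUE_RISE:.1f} residue(s)")
```

Under these assumptions a 10° rotation displaces the distal end of a domain by roughly the length of one sugar residue, which is in line with the estimate stated above.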
To assess the feasibility of such inter-domain movement, we estimated the buried surface area between the two domains, using CCP4. The value is 589 Å², which is indicative of a weak inter-domain interaction (61) and is consistent with domain movements as imagined above. A potential source of energy for these movements is the mechanical strain energy stored by the taut polysaccharide in a turgor-stretched cell wall. In this model, expansin acts as a molecular device that uses the strain energy stored in a taut cellulose-binding glycan to help dissociate the glycan from the surface of cellulose.
Protein Purification, Crystallization and Data Collection. Native Zea m 1 was extracted from pollen of field-grown maize plants at 4° C. in 0.125 M sodium carbonate and then purified to electrophoretic homogeneity in the presence of 5 mM dithiothreitol using two chromatographic steps as described (24). With this method four Zea m 1 isoforms were readily distinguished and we used the most abundant isoform, Zea m 1d (=EXPB1), for crystallization and activity assays. For the binding experiments, EXPB1 was further purified by HPLC on a reverse phase column (Discovery C8, 15 cm×4.6 mm i.d., 5 μm, Supelco) pre-equilibrated with 10% acetonitrile containing 0.1% trifluoroacetic acid. Bound protein was eluted with a linear gradient of 22 to 90% acetonitrile in the same solution over 20 min at a flow rate of 1 mL min⁻¹, at 25° C. We confirmed wall extension activity of EXPB1 purified in this way.
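The elution gradient described above is a simple linear ramp, so the solvent composition at any time follows from linear interpolation between the stated endpoints (22% and 90% acetonitrile over 20 min). A minimal sketch, with a hypothetical function name:

```python
def percent_acetonitrile(t_min, start=22.0, end=90.0, duration_min=20.0):
    """Acetonitrile percentage at time t_min during a linear gradient."""
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("time is outside the gradient window")
    return start + (end - start) * t_min / duration_min


# Composition at the midpoint of the 20-min gradient.
print(percent_acetonitrile(10.0))
```

At the stated flow rate of 1 mL per minute, elapsed minutes and eluted millilitres are numerically equal, so the same function also gives composition as a function of elution volume.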
Crystals were grown at 21° C. for 9 days using EXPB1 at 10.5 mg/mL in 100 mM Na acetate, pH 4.6, in 5-μL hanging drops, with addition of 5-μL precipitant (15% (w/v) polyethylene glycol 4000 with 0.1 or 0.2 M ammonium sulfate) and with 1-mL reservoir volume. Two crystals were analyzed, yielding diffraction patterns consistent with the monoclinic C2 space group. Crystal 1 had unit cell dimensions of a=113.7 Å, b=45.2 Å, and c=70.3 Å, with angles α=90.0°, β=124.6°, and γ=90.0°; crystal 2 had unit cell dimensions of a=112.6 Å, b=44.4 Å, and c=69.6 Å, with angles α=90.0°, β=124.4°, and γ=90.0°.
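As a plausibility check on the cell dimensions above, the volume of a monoclinic unit cell is V = a·b·c·sin β, and dividing by the protein mass per cell gives the Matthews coefficient. The sketch below makes several assumptions that should be read as such: four asymmetric units per cell for space group C2, one monomer per asymmetric unit (as noted elsewhere in the text), and a molecular mass of roughly 27 kDa (the approximate expansin size quoted earlier, ignoring the glycan). Function and variable names are hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic Å for a monoclinic cell: a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, molecules_per_cell, mass_da):
    """Crystal volume per unit protein mass (cubic Å per Da)."""
    return cell_volume / (molecules_per_cell * mass_da)


Z = 4              # assumed: C2 with one monomer per asymmetric unit
MASS_DA = 27000.0  # assumed: approximate mass of the polypeptide

cells = {
    "crystal 1": (113.7, 45.2, 70.3, 124.6),
    "crystal 2": (112.6, 44.4, 69.6, 124.4),
}
results = {}
for name, (a, b, c, beta) in cells.items():
    volume = monoclinic_cell_volume(a, b, c, beta)
    results[name] = (volume, matthews_coefficient(volume, Z, MASS_DA))
    print(f"{name}: V = {volume:.0f} Å^3, Vm = {results[name][1]:.2f} Å^3/Da")
```

Under these assumptions both crystals give values within the range commonly observed for protein crystals, consistent with a single monomer in the asymmetric unit.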
Data were collected using a RIGAKU RU200 rotating anode X-ray generator with CuKα radiation, operating at 5 kW of power (50 kV, 100 mA) (Molecular Structure Corporation, The Woodlands, Tex.). Three-degree oscillation frames, each exposed for 120 minutes, were collected on an R-AXIS IV detector. The two crystals were used to obtain a 93% complete dataset. The DENZO and SCALEPACK software suite (62) was used for data processing.
Structure Solution and Refinement. Our final model of EXPB1 structure was based on the native crystal data set and was solved by molecular replacement calculations using the program AMoRe (63) with the structure of Phl p 1 (PDB entry code 1N10), which has 58% amino acid identity with EXPB1 over 240 residues. EXPB1 has four more residues at its C-terminus. The best molecular replacement solution in AMoRe was obtained by deleting the first 13 residues of the N-terminus (attempts that included this stretch did not yield a solution), by including all the side-chains for the rest of the protein (attempts with just the backbone atoms did not yield a good solution either) and by including all the available data to 2.75 Å. The correlation coefficient and the R-factor for the best solution were 55.1 and 51.0, respectively. The next best solution had an inferior correlation coefficient and R-factor of 49.3 and 53.9, enabling us to proceed with further refinement and model building with confidence. For further refinement details and comparison with the 1N10 structure, see supplemental text, published on the PNAS web site. Coordinates and structure factors of the structure have been deposited in the Protein Data Bank (PDB code 2HCZ; (64)). A summary of the refinement results is given in Table 1 (on PNAS web site).
Polysaccharide Hydrolysis. Two mg of dye-coupled insoluble polysaccharides (AZCL-polysaccharides, Megazyme, Wicklow, Ireland) were suspended in 100 μL buffer (50 mM sodium acetate, pH 4.5, with 1 mM NaN3 and 10 mM dithiothreitol) and incubated with shaking at 30° C. for 48 h with or without 30 μg of EXPB1. At the end of the incubation, 300 μL of 2.5% Trizma base was added to each tube to stop the reaction, the suspension was centrifuged, and the absorbance (590 nm) of the supernatant was measured.
Binding. Cell walls were collected from maize silks, cleaned by phenol/acetic acid/water washes (65) and lyophilized. EXPB1 was purified on a CM-Sepharose Fast Flow (Amersham Biosciences) column in an LP system (Bio-Rad) (24). EXPB1 (10 μg) was incubated with 1 mg cell wall in 400 μL of 50 mM sodium acetate, pH 5.5, for 1 h at 25° C. with agitation. After incubation, protein remaining in the supernatant was analyzed by SDS-PAGE (12% polyacrylamide) and stained with SYPRO Ruby protein gel stain (Bio-Rad).
Commercial polysaccharides dissolved in 20 mM sodium acetate, pH 4.5 (200 μg; oat spelts xylan (Sigma), birch wood xylan (Fluka), barley β-glucan (Sigma, G-6513), konjac glucomannan (Megazyme) and tamarind xyloglucan (Megazyme)) were applied to nitrocellulose membrane disks (ca. 7 mm diameter, Protran, BA83, pore size 0.2 μm, Whatman). The disks were dried at 80° C. overnight. The coated disks were incubated with blocking reagent (Roche) dissolved in 0.1 M maleic acid buffer for 1 h at room temperature to reduce nonspecific binding of EXPB1. After blocking, the disks were washed with 20 mM Na acetate 5 times for 3 min each, then incubated with EXPB1 (20 μg per tube; purified by reverse-phase chromatography; see above) in 400 μL of 20 mM sodium acetate, pH 5.5, at 25° C. for 1 h. After the incubation, the supernatant (unbound protein) was analyzed by reverse-phase HPLC (above). The amount of EXPB1 bound to the coated nitrocellulose membrane disks was calculated from the reduction in the amount of unbound protein.
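The depletion calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact procedure: it assumes the HPLC peak area is proportional to the amount of protein and that a control incubation (for example, a disk without polysaccharide) defines the peak area for fully unbound protein. The function name and the peak areas are hypothetical.

```python
def bound_protein_ug(added_ug, control_peak_area, sample_peak_area):
    """Estimate bound protein from depletion of the unbound fraction.

    bound = added * (1 - sample_area / control_area), assuming peak area
    scales linearly with the amount of protein in the supernatant.
    """
    if control_peak_area <= 0:
        raise ValueError("control peak area must be positive")
    unbound_fraction = sample_peak_area / control_peak_area
    return added_ug * (1.0 - unbound_fraction)


# Hypothetical peak areas for a 20-μg incubation.
print(bound_protein_ug(20.0, 1000.0, 750.0))
```

Because the bound amount is obtained by difference, small binding values are sensitive to error in the two peak areas, which is one reason a matched control is needed.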
Acknowledgments. This work was supported by DOE Grant FG02-84ER13179 and NIH Grant 5R01GM60397 to DJC. We thank: Dr. Greg Farber for inestimable advice and assistance with growing the EXPB1 crystals; Dr. Javier Sampedro for useful discussions; Daniel M. Durachko, Edward Wagner and Dr. Hemant Yennawar for expert technical assistance; Dr. Colin Mitchison for gift of the swollenin sample; Dr. Inez Munoz for gift of the TrCel45 sample; Dr. Jan-Christer Janson for gift of the MeCel45 sample.
Structure Solution and Refinement. After several cycles of rigid body refinement the maps still looked noisy. To improve this, density modification was performed by using the program CNS (1). Solvent density modification and density truncation features were used. The resulting maps gradually helped in modeling regions of the missing N-terminal residues and the loop between residues 29 and 38. The side chains that were different in Phl p 1 compared to EXPB1 could also be corrected, and the four extra residues at the C terminus could be located. The polysaccharide covalently linked to Asn-10 was modeled as shown in
Comparison with Phl p 1 (PDB ID code 1N10). Compared to the 2.9-Å structure of 1N10, which has a dimer in the asymmetric unit, EXPB1, with a monomer in the asymmetric unit, is solved to a better resolution (2.75 Å). The loop consisting of residues 29-38 is not resolved in 1N10 but has good electron density in EXPB1. This is an important loop because it contains D37, a potential candidate for the catalytic base. The first 15 residues at the N-terminal extension are oriented entirely differently in the two structures (which explains why molecular replacement succeeded only when they were omitted). The N-terminal strand in 1N10 extends out away from the protein and interacts with a second monomer. Because the recombinantly produced Phl p 1 used to solve the 1N10 structure was not native protein, the processing of the N-terminal extension appears to be atypical (the hydroxylation of prolines is lacking, and the glycosylation pattern at N10 is probably different and was resolved to only one GlcNac).
When the Cα carbon atoms of both D1 and D2 are superimposed for the two proteins, the rmsd is 1.84 Å. Superposition of the D1s alone (excluding the first 15 residues at the N terminus) reveals a good overlap (rmsd of 0.88 Å) whereas the D2s overlap poorly (rmsd of 1.82 Å). Comparison of the overlapping structures shows that the Cα of W194 (D2), which is a crucial part of the putative binding groove, is displaced by 4 Å in 1N10 and its side chain is rotated and displaced by almost 12 Å. However, the tryptophan ring continues to stay in the same plane as the other residues at the base of the groove, and hence a sugar could still bind in a fashion similar to that for EXPB1.
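The rmsd values quoted above are root-mean-square deviations over paired Cα positions after superposition. The sketch below shows only that final step. It assumes the two coordinate sets are already superimposed and paired residue by residue; the superposition itself is not reproduced here, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3-D coordinates (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical, already-superimposed Cα positions.
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```

Computing the value separately over the D1 and D2 residue ranges, as done in the text, shows which domain contributes most to the overall deviation.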
Expansins have many conserved domains, as shown in
The mature protein is ˜25-27 kDa and consists of two domains, an amino-terminal domain of ˜120 amino acid residues (green in structure) with structural and sequence similarity to family-45 endoglucanases (EG45-like domain) and a carboxy-terminal domain of ˜98 amino acid residues (cyan in structure) that is hypothesized to function as a polysaccharide-binding domain (this is not experimentally established).
This application claims priority under 35 U.S.C. § 119 of a provisional application Ser. No. 60/822,716 filed Aug. 17, 2006, which application is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60822716 | Aug 2006 | US