The present invention relates to a method for performing restrained dynamics docking of one or several substrates having an allosteric or synergistic effect on enzymes that have a multispecific and flexible active site. It also concerns a method for determining the 3D structure of active sites that are flexible and can adapt to different substrates, as is the case for multispecific enzymes such as cytochrome P450.
Today, various computer graphics systems make it possible to generate molecular models of large molecules such as proteins from the PDB structural data obtained by X-ray crystallography and NMR. Examples include MODELLER, COMPOSER and MATCHMAKER (Tripos), as well as 3D graphical environments for molecular modeling such as SYBYL (Tripos) or INSIGHT II (Accelrys).
Substrates as well as inhibitors or agonists often act by binding to particular regions of an enzyme or receptor referred to as the active site. In industry, the purpose of these 3D models is to assess the main features of the molecules that are involved in binding to the active site, so that new molecules that fit the active site can be designed.
Biological interactions are not possible without flexibility and motion. One of the principal tools in the theoretical study of motion in biological molecules is the method of molecular dynamics simulations (MD). This computational method calculates the time dependent behavior of a molecular system (Karplus and McCammon, 2002). MD simulations have provided detailed information on the fluctuations and conformational changes of proteins and nucleic acids. These methods are now routinely used to investigate the structure, dynamics and thermodynamics of biological molecules and their complexes. They are also used in the determination of structures from x-ray crystallography and from NMR experiments. The molecular dynamics simulations can be used to recreate the successive events in the binding process of a molecule, and thermodynamic parameters implicated in such process can therefore be derived, which is of great interest in the design of active molecules.
Nevertheless, the methods proposed in the art are based on a relatively low level of calculation involving few parameters. They rely only on the energy of the molecule constrained to a fixed geometry, or on the interaction energy between the molecule and the active site frozen in a fixed geometry.
Consequently, there is a need for a model replicating in silico the natural process of molecular interactions.
The method according to the invention provides both minimizations and molecular dynamics calculations. More specifically, it provides a new approach that is more appropriate to flexible structures, hereafter referred to as “restrained dynamics docking” or “soft-restrained dynamics docking”. This technique employs constrained dynamics simulations, where the only constraints are active site-substrate distances.
For example, to explain and predict drug metabolism in organisms, in which the cytochrome P450 (CYP) superfamily of haem-thiolate enzymes plays a central role, it is of great interest to have a molecular picture of the binding sites responsible for the biotransformation. The efficiency of the prediction is then directly related to the molecular precision of the model, whose resolution must reach the atomic level if the model is to be exploited for further docking studies.
In mammals, hepatic cytochrome P450s constitute the major enzymes involved in the metabolism of exogenous compounds. Among them, isozymes of the CYP3 family (such as CYP3A1 and 3A2 in rat, and CYP3A4, CYP3A5, CYP3A7 and CYP3A43 in human) are known to metabolize the majority of drugs in clinical use. These are multispecific enzymes, able to metabolize a large variety of structurally diverse chemicals or substrates including steroids and linear or cyclized peptides (Delaforge et al. 1997, Delaforge et al. 2001, Aninat et al. 2001), generally fairly lipophilic, within a broad range of molecular sizes from testosterone (Mw 288) to cyclosporin A (Mw 1203).
The inventory of known substrates for CYP 3A contains a large variety of different molecules having apparently no common structural factors. Actually it can be estimated that more than five hundred utilized drugs can be recognized and metabolized by CYP 3A (Guengerich 1995, Wrighton et al. 2000, Lewis 2001). Closer inspection of the precise transformations catalyzed by CYP 3A indicates that there is an important regio- and stereo-selectivity for each substrate. The active site can accommodate relatively rigid substrates such as aflatoxin derivatives or steroids, that are oxidized almost exclusively at a precise position. Thus CYP 3A4 catalyzes the testosterone oxidation exclusively at the 6β position, whereas CYP 3A7 oxidizes dehydroepiandrosterone (DHEA) or its 3 sulfate conjugate exclusively on the 16α position (see
The recognized substrates can be of endogenous origin, such as steroids, or can be drugs or compounds found in food. For example, grapefruit juice contains bergamottin derivatives having specific CYP 3A inhibitory activities (Schmiedlin-Ren et al. 1997). Linear peptides (Delaforge et al. 2001, Hosea et al. 2000) or cyclized peptides (Delaforge et al. 1997) containing from 2 amino acids (called diketopiperazine, Delaforge et al. 2001, Aninat et al. 2001) to 11 amino acids (e.g. cyclosporin) are also recognized.
Following this wide range of substrate recognition, a tentative subclassification was established, leading to a multi-site hypothesis (Hosea et al. 2000, Ekins et al. 2003) consisting of at least 2 or 3 binding zones in the active site. This hypothesis rests on the fact that CYP 3A often shows atypical hyperbolic kinetic constants and is thus unable to reach saturation. In addition, the presence in the active site of a second substrate of a different molecular nature leads to either no modification or increased metabolism of both substrates. Such allosteric effects have been clearly described in the case of simultaneous metabolism of steroids such as testosterone and α-naphthoflavone.
Consequently, any molecular model that correctly describes the multiple substrate specificity (taking into account large variations in molecular size and chemical structure) and the substrate cooperativity effects within the active site (when two or more drugs interact) is of considerable scientific and industrial interest. Such a molecular model must be able to rationalize the binding of the diverse known substrates, and the orientations of the molecules in the binding site that account for their known positions of metabolism (such as N-demethylations, benzylic hydroxylations etc.). CYP3A4 is considered to be the main hepatic form and is found in a wide variety of human organs such as intestine, brain or skin. CYP3A5 is also present in liver and is the major 3A form present in the kidney. The 3A5 isoform is subject to genetic polymorphism. CYP3A7 is the major 3A isoform present in the foetus, whereas CYP3A43 is mainly located in adult prostate or testis. These isoforms share amino acid identities higher than 70% (Westlind-Johnsson et al. 2003, Gellner et al. 2001, Koch et al. 2002). It is currently accepted that CYP3A4 is the most active isoform for classical P450 3A substrates, whereas recent data (Williams et al. 2002) demonstrate equal or slightly reduced activity for CYP3A5 and a significantly lower metabolic capability for CYP3A7 as compared to CYP3A4. Additionally, differences have been observed in terms of the oxidative regioselectivity of CYP3A7 compared to other isoforms. As an example, CYP3A7 extensively metabolizes DHEA, and especially its sulfate conjugate derivative, whereas CYP3A4 is a poor metabolizer. The oxidation by CYP3A7 occurs mostly at the 16α position of DHEA. In contrast, CYP3A7 metabolizes testosterone at both the 6β and 16α positions, whereas CYP3A4 or 3A5 metabolize it almost exclusively at the 6β position (Inoue et al. 2000).
In contrast to the P450 3A subfamily, other P450 isoforms have a more rigid active site, as suggested by the narrow range of recognized substrates or inhibitors. These P450 isoforms generally recognize a small number of substrates or inhibitors that have in common the same shape (e.g. P450 1A isoforms), the same charge (e.g. CYP 2B, 2C or 2D isoforms), or the same chemical nature, such as steroids (e.g. CYP19 or CYP21 isoforms) or lipids (e.g. CYP 4 family).
As no high-resolution 3D structure of CYP3A is publicly available today, owing either to continuing difficulties in crystallizing intrinsic membrane proteins or to an unusual conformational flexibility that would explain how CYP3A can accommodate various substrates, it is necessary to rebuild a 3D model structure integrating the known biochemical data on CYP3A and the structural data on other members of the CYP superfamily. X-ray crystallographic determinations of several bacterial P450 enzymes in the 1990s (see Table 1 for a summary of structural data) have stimulated numerous attempts at modeling microsomal P450s such as human CYP3A4. Chapter 6 of the book “Guide to Cytochromes P450: structure and function” by David F. V. Lewis reviews the current status of structural and modeling investigations of the P450 family (Lewis 2001). This review was, however, written just before the release of the first mammalian P450 structure (2C5), which is still the only mammalian template available.
Table 1: the eight X-ray crystal structures of P450s available in 2002: six bacterial, one fungal (P450 nor), one mammalian (CYP2C5). P450cam, P450terp, P450eryF and P450nor belong to the class I P450 enzymes, whereas P450BM3 belongs to the class II enzymes, like the microsomal enzymes CYP2C5 and 3A. The P450BM3 structure is therefore a priori more relevant to rebuilding a structural model of CYP3A, but since the CYP2C5 X-ray structure was released, it has become obvious that the structural homology between the other bacterial enzymes and microsomal enzymes is better than expected from the poor homology of primary structure (<25% identity). The relevance of using class I and class II structures together for rebuilding models of class II P450s is therefore no longer in question. In the two examples described in the present invention, the structural model of human CYP3A4 was rebuilt using the first six structures listed above, with no preference in the structural alignment, and the structural model of human CYP3A7 was rebuilt using four structures among those listed above, again with no preference in the structural alignment, i.e. P450BM3, P450 EryF, P450 2C5 and CYP51, one of the last published structural sets. CYP119 was not incorporated into the modeling process.
All the proposed models of CYP3A4 obtained by homology modeling are thus so far based on bacterial crystal structure templates: the first was proposed by Ferenczy and Morris and used the X-ray structure of bacterial P450cam, as unique template structure (Ferenczy and Morris 1989). Another model was built later by David F. V. Lewis, using also a unique template structure, the P450BM3 structure, which was supposed to be more relevant as a template since this P450 was the only one class II enzyme with known three-dimensional structure (Lewis et al. 1996). A third model, based on a multiple structure template, was built by Szklarz and Halpert, using the four first X-ray crystal structures available P450cam, P450terp, P450eryF, and P450BM3. This four-bacterial template approach strategy is closer to our rebuilding strategy, but was still missing some relevance in the absence of a mammalian template. In our hands, the incorporation of the mammalian 2C5 crystal structure into rebuilding steps of models of cytochrome P450 3A proved to be decisive. Inclusion of 2C5 crystal structure had indeed a profound effect on the structural alignment with the five non-mammalian structures, resulting in a different topology of the active site and a marked divergence between the model and each individual template. The advantage of our multiple-template approach resides essentially in the availability of a final template that can be used to rebuild various mammalian cytochromes P450. Up to now there is no available crystal structure or structural model of human CYP3A5, CYP 3A7, CYP3A43 or other mammalian CYP3A.
More recently, two new P450 crystal structures emerged in the literature (Table 1). The first is CYP51 (PDB code 1e9x), from Mycobacterium tuberculosis, which catalyzes the oxidative removal of the 14α-methyl group from sterol precursors in sterol biosynthesis in yeast and fungi (ergosterol), plants (phytosterol) and mammals (cholesterol), and is of interest for its potential in the design of antifungal agents (Podust et al. 2001). The second is CYP119 (PDB code 1f4t), from the thermophilic archaeon Sulfolobus solfataricus, the first P450 identified in Archaea, which is of interest for understanding the enhanced thermal stability of the structure, especially in the region of the active site (Yano et al. 2000). Those two structures have been shown to exhibit the typical bacterial P450 fold, with some exceptions in the topology. They have not been included as structural templates in the modeling steps of the CYP3A4 model described in example 1. The names of newly discovered P450s follow the now accepted nomenclature of David R. Nelson (Nelson 1999).
The protein databank (Brookhaven Protein Databank, http://www.rcsb.org/pdb/) currently indicates that there are 76 separate crystal structures available for the eight crystallized P450s, plus 7 crystal structures on hold (Sep. 1, 2002), the majority of which contain either bound substrates or inhibitors. Table 1 provides the relevant information about the structural templates used for human CYP3A model rebuilding. The idea behind homology modeling is that proteins belonging to the same functional class and showing a strong sequence identity adopt a similar fold (reviewed in Hilbert et al. 1993). Known analogous structures are then used to generate a template or parent structure for the unknown protein to be modeled. The reliability of the various methods employed depends mostly on the number of experimental 3D structures that can be aligned. For pairs of distantly related proteins (with residue identity of about 20%), the regions having the same fold will represent less than half of each molecule and the regions where the folds differ will predominate, so the divergence of sequence must be compensated by a higher number of homologous proteins to align (Chothia and Lesk 1986). Below 50% sequence identity, the deviation in structurally non-conserved regions becomes significant, and loop regions are difficult to predict. It is generally accepted that below 20% sequence identity the prediction becomes hazardous, and fold assignment methods are best replaced by ab initio methods, which ideally attempt to predict the native structure from the primary sequence of the protein to be modeled alone. However, the models produced so far had the correct fold for only a few small protein domains (Sanchez et al. 2000).
The strategy of model rebuilding in the P450 family is strongly driven by the low degree of homology between bacterial and mammalian cytochrome P450s (Table 2).
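The identity thresholds quoted above (about 50% and 20%) presuppose a way of measuring identity on an alignment. As a minimal sketch, the Python function below computes percent identity over the columns of a pairwise alignment in which neither sequence has a gap. The function name, the gap character and the choice of denominator are illustrative assumptions; the LALIGN figures of Table 2 may follow a different convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length. The denominator (non-gap columns only) is an
    assumption made for this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

For instance, `percent_identity("ACDE-F", "ACDQGF")` compares five gap-free columns, four of which are identical.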
Table 2: Sequence identities between the various crystallized cytochrome P450s and human CYP3A4 and CYP3A7 using BLOSUM 62 matrix (source LALIGN, http://www.infobiogen.fr/services/analyseq/cgi-bin/lfastap_in.pl, algorithm of Huang and Miller LALIGN that finds the best local alignments between two sequences, version 2.1u03 April 2000, published in Adv. Appl. Math. 1991, 12: 373-381). The P450 BM3 structure, Swissprot code name CPXB_BACME, corresponds to the structure of a fusion protein of P450 and a reductase domain, so that it displays twice the number of residues.
Our global scheme, the steps of which are described hereafter, is founded on a combination of methods developed in the literature for different purposes in protein structure determination studies. The principle of the primary steps, up to the generation of a correct alignment of P450 primary sequences, is described in Jean et al. 1997. The last steps are summarized in Loiseau 2002.
Therefore, in a first object, the invention relates to a method for designing a 3-dimensional (3-D) model of a protein belonging to a family for which the 3-D representation of at least three family members has already been experimentally obtained, [said 3-D representation presenting similarities], comprising the steps of:
a. identification of common structural blocks (CSBs) among said members of said family,
b. alignment of the amino-acids primary sequence of said family members according to said structural similarities, represented by said CSBs, in order to obtain a first alignment,
c. alignment of said protein as compared on said first alignment, in order to obtain a second alignment, wherein:
d. definition of the 3-D structure of CSBs of said protein, according to the 3-D structure of the CSBs of said family members,
e. definition of the global constraints (distance and angular constraints) derived from the comparisons of the structural templates in CSBs, and definition of the local constraints (distance and angular constraints) for the atoms of residues that are not structurally determined after step d. (that are not in the CSBs),
f. selection of rotamers,
g. determination of a family of 3-D model structures of said protein, taking into account said 3-D structure of CSBs obtained in step d., said global and local constraints defined in step e., and said rotamers defined in step f.,
h. optimization of said family of 3-D models obtained in step g., by
i discarding structures that present topological defects, and
ii recalculating 3-D structures by taking electrostatic forces into account, and performing the method again from step c. downward, with modifications in the alignment between the primary sequence of said protein and said first alignment, when the obtained model structures do not satisfactorily account for known mutations having biological effects.
In the present invention, the term “backbone atoms” refers to the C, N, Cα, and O atoms of a protein that are common to all amino acid building blocks or involved in the peptide linkage. When the protein structure is described as a trajectory in internal coordinates such as α, τ angles, or is a low-resolution crystallographic structure, backbone atoms stand only for Cα atoms of each residue.
In the present invention, the term “similarities” is used in the search for structural fragments conserved between the template proteins, that is fragments that have similar local trajectories in the backbone internal coordinate space. Two protein fragments have “similar” local trajectories when they are matched according to two adjustable parameters, the mesh and the margin (Jean et al. 1997).
In the present invention, the term “common structural blocks (CSB)” defines the protein fragments of equal length that are found to be similar between all the template proteins in the internal coordinate representation.
In the present invention, the term “first alignment” refers to the alignment imposed by the CSBs, that is the structural alignment between template proteins defined by CSB sequences. This alignment is totally independent of the primary sequence of the template proteins.
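The notion of “similar” local trajectories can be illustrated with a deliberately simplified sketch. It assumes that each residue of a fragment is described by an (α, τ) pair in degrees, and that two equal-length fragments match when every angular difference, expressed in units of the mesh, stays within the margin. This reading of the two parameters is an assumption made for illustration; the actual matching rule is the one given in Jean et al. 1997, and the function names and default values below are hypothetical.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def fragments_similar(frag_a, frag_b, mesh: float = 10.0, margin: float = 2.0) -> bool:
    """Compare two fragments given as lists of (alpha, tau) tuples in degrees.

    Assumed rule (not taken from the source): the fragments are similar
    when, for every residue, each angular difference divided by the mesh
    does not exceed the margin.
    """
    if len(frag_a) != len(frag_b):
        return False  # CSBs are fragments of equal length
    for (alpha_a, tau_a), (alpha_b, tau_b) in zip(frag_a, frag_b):
        for x, y in ((alpha_a, alpha_b), (tau_a, tau_b)):
            if angular_difference(x, y) / mesh > margin:
                return False
    return True
```

Under this assumed rule, a fragment would qualify as a CSB only if it is pairwise similar across all template proteins.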
In the present invention, the term “out-of-block regions” designates all other protein fragments located out of and between the CSBs, i.e. that are not structurally conserved in the internal coordinate space. There is no information of sequence alignment for these regions (see in
In the present invention, the term “global constraints” refers to geometric constraints that are assigned to atoms of residues from CSBs, and that can be derived by computing all distance or angle information available within CSBs or between CSBs.
In the present invention, the term “local constraints” refers to loose structural constraints that are assigned to residues of out-of-block regions, in order to restrict their backbone conformation to allowed regions of the Ramachandran diagram.
In the present invention, the term “rotamers” defines the low energy side-chain conformations of residues. The use of a library of rotamers allows a structure to be determined or modeled with the most likely side-chain conformations, saving time and producing a structure that is more likely to be correct.
For identification of CSBs between all selected 3D structures:
CSBs define the common local folds found to be similar in the template proteins, and are used as building blocks to set up the fold of the model (results in Loiseau 2002). The non-conserved regions, which can be parts of secondary structures or non-structured regions such as loops, will be rebuilt with no initial structural information.
For multiple alignment of crystalline P450s, on the basis of CSBs determination:
Once the structurally conserved elements are identified, a first structural alignment between the template proteins is derived. The following step involves the localization of these elements in the target sequence. Pairwise sequence comparisons between the selected crystal structures and CYP3A (Table 2) show low sequence identity, so that online multiple alignment tools such as CLUSTALW or PHD (Heidelberg) fail to produce a clear-cut alignment. Instead, local alignment tools, such as that described in Jean et al. 1997, were used to match the CSB profile to the target sequence: a matrix is slid along the sequence and a similarity score (based on a standard matrix such as BLOSUM62) is calculated for each position. Online multiple alignment tools such as CLUSTALW 1.8 can then be used to assess accuracy.
The target sequence of human cytochrome P450 3A is thus aligned against the multiple alignment obtained from the CSBs. This produces the key sequence alignment which allows the generation of the template structure used for the rebuilding of the various CYP3A models. The following steps involve:
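The sliding of a CSB profile along the target sequence can be sketched as follows. This is a minimal illustration, not the tool of Jean et al. 1997: the profile is represented as one string of template residues per CSB column, and a simple match/mismatch score stands in for BLOSUM62 so that the example stays self-contained. The names `slide_profile` and `toy_score` are hypothetical.

```python
def toy_score(a: str, b: str) -> float:
    """Stand-in for a substitution matrix such as BLOSUM62 (assumption)."""
    return 1.0 if a == b else -1.0


def slide_profile(profile, target: str, score=toy_score):
    """Slide a CSB profile along a target sequence.

    profile: list of strings; each string holds the residues observed at
             one CSB column across the template proteins.
    Returns (best_offset, best_score), or None if the target is shorter
    than the profile.
    """
    width = len(profile)
    best = None
    for offset in range(len(target) - width + 1):
        total = 0.0
        for col, residues in enumerate(profile):
            t = target[offset + col]
            # Average the score of the target residue against each template.
            total += sum(score(t, r) for r in residues) / len(residues)
        if best is None or total > best[1]:
            best = (offset, total)
    return best
```

With a real substitution matrix, `score` would simply look up the matrix entry for the residue pair; the sliding logic is unchanged.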
1) Generation of distance and dihedral angles constraints.
2) Selection of rotamers for side chains in CSBs.
3) Calculation of a set of structures using DYANA software. Loops are rebuilt between CSBs.
4) Structure optimization under XPLOR software (Brünger 1992).
In a preferred embodiment, said 3-D representation of family members has been obtained by crystallography or NMR.
The alignment of said common structural blocks in steps b. and c. can be performed by use of the GOK software as described in Jean et al., 1997.
In addition, step d. is preferably performed according to the following rules:
The definition of rebuilding global constraints in step e. is performed by using all available intra- and inter-CSB geometrical information (distances and angles) derived from the comparisons of the structural templates, each geometric constraint being defined as an interval. On the other hand, the definition of local constraints for out-of-block residues is performed by analysis of the allowed regions in the Ramachandran diagram.
Furthermore, distances and angles defining global constraints are preferably selected in step e. by the following rules:
The distance of 8 Å is chosen in order to drastically reduce the total number of constraints to take into account in the computation, and to avoid excessively constraining the model.
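The derivation of interval-type distance constraints from several templates can be sketched as below. The sketch assumes that each template supplies coordinates for the same CSB atoms in the same aligned order, and that a pair of atoms is retained only when its distance is below the 8 Å cutoff in every template; this selection rule and the function name are assumptions made for illustration, since the exact selection rules are not reproduced here.

```python
from itertools import combinations
from math import dist


def distance_intervals(templates, cutoff: float = 8.0):
    """Derive (lower, upper) distance intervals from aligned templates.

    templates: list of coordinate lists, one per template protein; each
               list holds (x, y, z) tuples for the same atoms in the same
               aligned order.
    Returns {(i, j): (lower, upper)} for atom pairs whose distance is
    below `cutoff` (in angstroms) in every template (assumed rule).
    """
    n_atoms = len(templates[0])
    if any(len(t) != n_atoms for t in templates):
        raise ValueError("all templates must list the same aligned atoms")
    constraints = {}
    for i, j in combinations(range(n_atoms), 2):
        distances = [dist(t[i], t[j]) for t in templates]
        if max(distances) < cutoff:
            # The interval spans the values observed across the templates.
            constraints[(i, j)] = (min(distances), max(distances))
    return constraints
```

The number of candidate pairs grows quadratically with the number of atoms, which is why a distance cutoff sharply reduces the constraint set.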
Angular constraints are preferably selected in step e. by the following rule:
To practice the method of the invention, rotamers in step f. can be selected from the couples according to the tables of Dunbrack and Karplus and step g. can be performed with the DYANA software, as described in Güntert et al, 1997.
In addition, the optimization in step h. comprises the use of the X-Plor software, as described in A. T. Brünger, X-PLOR, version 3.1.
The method according to the invention is particularly applicable to a cytochrome P450 subfamily 3A comprising mammal and human cytochromes P450 3A.
In a preferred embodiment, said mammal cytochrome P450 3A is selected from the group comprising CYP3A6 (SEQ ID No14), CYP3A12 (SEQ ID No16), CYP3A29 (SEQ ID No17) and CYP3A13 (SEQ ID No18).
In another preferred embodiment, said human cytochrome P450 subfamily 3A is selected from the group comprising CYP3A4 (SEQ ID No11), CYP3A7 (SEQ ID No15), CYP3A5 (SEQ ID No12) and CYP3A43 (SEQ ID No13).
The method is applicable as well to human cytochrome of the subfamily P450 3A4, wherein said family members that are used for performing said first alignment for designing a 3-D model of CYP3A4 are chosen from Nor (SEQ ID No 1), Ery F (SEQ ID No 2), terp (SEQ ID No 3), Cam (SEQ ID No 4), BM3 (SEQ ID No 5) and 2C5 (SEQ ID No 6).
The method is applicable as well to human cytochrome of the subfamily 3A7, wherein family members that are used for performing said first alignment for designing a 3-D model of CYP3A7 are chosen from Ery F (SEQ ID No 2), BM3 (SEQ ID No 5), CYP51 (SEQ ID No 8) and 2C5 (SEQ ID No 6).
The method is applicable as well to other mammalian cytochrome P450 3A isoforms.
In a second object, the invention is directed to 3-D structure model of a protein, obtained by the method as described above.
In a preferred embodiment, the protein is a cytochrome P450 subfamily 3A comprising mammal and human cytochromes P450 3A.
In another preferred embodiment, the protein is selected from the group comprising CYP3A6 (SEQ ID No14), CYP3A12 (SEQ ID No16), CYP3A29 (SEQ ID No17) and CYP3A13 (SEQ ID No 18).
In still another preferred embodiment, the protein is a human cytochrome P450 subfamily 3A selected from the group comprising CYP3A4 (SEQ ID No 11), CYP3A7 (SEQ ID No 15), CYP3A5 (SEQ ID No 12) and CYP3A43 (SEQ ID No13).
In still another preferred embodiment, the protein is a human cytochrome P450 3A4 or 3A7.
Regarding the rebuilt P450 3A4 model, the main residues involved in the recognition of the substrate are C97; R104; F101; F107; F247; F303 and C376.
More specifically, C97 and C376 are found in positions compatible with the formation of a disulfide bridge allowing limited or enhanced flexibility of the corresponding protein domains, while R104 is involved in the capture of the substrate close to the entrance site and accompanies it to the active site. F303 is involved in the recognition of the substrate in the active site. F107, F247 and F303 are involved in the recognition at the modulation site responsible for positive regulation. A role for F303 in the active site has already been suggested by the studies of Domanski et al. 1998 in the SRS 4 region (mutants I300, F303, A304, and T308).
Features of this model comprise the 3-D atomic coordinates of Table 3.
Table 3
In a preferred embodiment, the residues C97; R104; F101; F107; F247; F303 and C376 are involved in the CYP 3A4 for the recognition and uptake of the substrate at the entry site, and its binding into the active site having the 3-D atomic coordinates of Table 3.
Regarding the P450 3A7 model, features comprise the 3-D atomic coordinates of Table 4.
Table 4
In a preferred embodiment, the residues Q79; F102; R105; R106; F108; F248; F304 and E374 are involved in the CYP 3A7 for the recognition and uptake of the substrate at the entry site, and its binding into the active site having the 3-D atomic coordinates of Table 4.
In a third object, the invention contemplates a method for designing a protein, biological functions of which are altered, comprising:
a) obtaining a 3-D model of said protein by the method as depicted above,
b) analyzing said model of step a., and determining the amino-acids that are putatively involved in the biological functions of said protein,
c) changing said amino-acids by mutating the corresponding nucleotides on the nucleic acid sequence coding for said protein, in order to obtain a mutated protein having altered properties.
In the present invention, the term “altered properties” means that the generated protein is altered in its enzymatic properties, such as the substrate recognition, the movements associated to the entrance or the exit of the substrate, the multiple binding at the active site, the allosteric behaviour, the electron transfer, the coupling to the P450 reductase.
In another object, the invention relates to a computer-assisted method for performing restrained dynamics docking of a substrate on an enzyme, a 3-D structure of which is available, comprising the steps:
j. determining a force field, and independently simulating the presence of said enzyme in said force field,
k. minimizing the potential energy (Ep) linked to said force field of said 3-D structure, wherein the spatial position of some atoms of said enzyme is fixed, and wherein the other atoms are mobile, by allowing mobility of the mobile atoms, by
i. simulating an increase in temperature (in order to give kinetic energy),
ii. and minimizing the potential energy by re-specifying the temperature as 0 Kelvin (K)
l. optionally repeating step k in order to obtain other Ep minima, wherein said Ep minima are such that the structure of the protein remains folded,
m. minimizing Ep in said force field of said 3-D structure, wherein all the atoms of the protein are mobile, by
i. simulating an increase in temperature (in order to give kinetic energy), and
ii. minimizing the potential energy by re-specifying the temperature as 0 Kelvin (K)
n. simulating, at 0 K the presence of said substrate next to said enzyme,
o. optionally generating a molecular dynamics simulation on said substrate and enzyme (simulating an increase in temperature, in order to allow mobility of the atoms)
p. generating some constraints to said substrate, in order to impose that it has interaction with said enzyme,
q. generating a molecular dynamics simulation on said substrate and enzyme, with said constraints imposed in step p.,
r. optionally, generating a molecular dynamics simulation on said substrate and enzyme without said constraints of step p.
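The heat-then-minimize cycle of steps k. to m. can be illustrated on a toy system. The sketch below uses a one-dimensional double-well potential in arbitrary units, not a protein force field: “heating” is modeled by giving the particle an initial velocity and integrating with velocity Verlet, and “re-specifying the temperature as 0 K” is modeled by discarding the kinetic energy and following the force downhill (steepest descent). The potential, the parameter values and the function names are all assumptions chosen for illustration.

```python
def potential(x: float) -> float:
    """Toy double-well potential with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def force(x: float) -> float:
    """Force = -dU/dx for the toy potential."""
    return -4.0 * x * (x * x - 1.0)


def heat(x: float, velocity: float, steps: int = 200,
         dt: float = 0.01, mass: float = 1.0) -> float:
    """Short dynamics run (velocity Verlet) started with kinetic energy."""
    f = force(x)
    for _ in range(steps):
        x += velocity * dt + 0.5 * (f / mass) * dt * dt
        f_new = force(x)
        velocity += 0.5 * (f + f_new) / mass * dt
        f = f_new
    return x


def quench(x: float, step: float = 0.01, tol: float = 1e-8,
           max_iter: int = 100000) -> float:
    """Minimization with no kinetic energy: steepest descent on U."""
    for _ in range(max_iter):
        f = force(x)
        if abs(f) < tol:
            break
        x += step * f
    return x


def heat_and_quench(x0: float, velocity: float) -> float:
    """One cycle: heat the system, then minimize its potential energy."""
    return quench(heat(x0, velocity))
```

With little kinetic energy the particle relaxes back into its starting well; with enough kinetic energy to cross the barrier, the dynamics phase can deliver it to either well before quenching. This mirrors the purpose of repeating step k. to obtain other Ep minima.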
In the present invention, the term “restrained dynamics docking” means a procedure by which the docking of the substrate is simulated using molecular dynamics (MD) simulations under constraints that are specified by the user.
In the present invention, the term “soft-restrained dynamics docking” refers to a restrained dynamics docking in which the substrate-protein distance constraints are loose, with force field parameters associated with the constraints as low as 1 or 2 kcal/mol.
In the present invention, the term “constraints” when applied to substrate docking refers to a distance imposed between atoms of the protein, generally from the active site (such as atoms of the heme group), and atoms of the substrate. These distance restraints are defined as intervals, where the distance range is large enough to allow the free movement of the substrate within the active site.
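A distance restraint defined as an interval is commonly implemented as a flat-bottomed penalty: no energy is added while the distance lies inside the interval, and a harmonic term is added outside it. The sketch below assumes this functional form and a force constant expressed per squared angstrom; neither is stated in the present description, and the function name is hypothetical.

```python
def restraint_energy(distance: float, lower: float, upper: float,
                     k: float = 1.0) -> float:
    """Flat-bottomed harmonic restraint (assumed functional form).

    distance, lower, upper: in angstroms.
    k: force constant, taken here in kcal/mol per squared angstrom; a low
       value (about 1 to 2) corresponds to a "soft" restraint.
    Returns 0 inside [lower, upper] and a quadratic penalty outside.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if distance < lower:
        return k * (lower - distance) ** 2
    if distance > upper:
        return k * (distance - upper) ** 2
    return 0.0
```

Because the penalty vanishes inside the interval, the substrate moves freely there and is only pulled back gently once it drifts outside.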
In a preferred embodiment of this method for performing restrained dynamics docking, said fixed atoms in step k. are the backbone atoms N-Cα-CO in the first minimization step and only Cα in subsequent minimization steps.
In another preferred embodiment of this method, kinetic energy is simulated by temperature increase to about 100 K for about 5-20 ns.
The force field in step j. comprises forces linked to:
a. the distance between atoms,
b. the angles of valence,
c. the dihedral angles,
d. the deformation with regard to planar geometry,
e. the electrostatic field,
f. the Van der Waals forces,
g. hydrogen bonds.
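Three of the terms listed above can be written out in the form used by many classical force fields. The sketch below shows a harmonic bond term, a 12-6 Lennard-Jones term for the Van der Waals forces and a Coulomb term for the electrostatic field. These functional forms, the parameter names and the approximate value of the Coulomb conversion constant are assumptions of a generic force field, not the specific parameterization used with the present method.

```python
# Approximate conversion constant, kcal*angstrom/(mol*e^2), so that charges
# in units of e and distances in angstroms give energies in kcal/mol.
COULOMB_CONSTANT = 332.06


def bond_energy(r: float, r0: float, k: float) -> float:
    """Harmonic penalty for a bond length r deviating from its reference r0."""
    return k * (r - r0) ** 2


def lennard_jones(r: float, epsilon: float, r_min: float) -> float:
    """12-6 Lennard-Jones energy; the minimum is -epsilon at r = r_min."""
    ratio = (r_min / r) ** 6
    return epsilon * (ratio * ratio - 2.0 * ratio)


def coulomb(r: float, q1: float, q2: float, dielectric: float = 1.0) -> float:
    """Electrostatic energy between two point charges at distance r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
```

The valence-angle, dihedral and planarity terms follow the same pattern, each penalizing deviation of one internal coordinate from its reference value.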
The constraints in step p. are attraction constraints that force said substrate into the active site, wherein said constraints do not prejudge the exact spatial conformation of the substrate in the active site. These constraints are final distance constraints between some atoms of said substrate and some atoms of amino-acids present in said active site.
In the present invention, the term “final distance constraints”, when applied to substrate docking, means distances imposed between atoms from the heme group (such as the iron atom), and atoms of the substrate. These distance constraints are defined as intervals, and are related to the final position of the substrate in the vicinity of the heme group before metabolization.
Preferably, step o. is performed with a simulated temperature of between about 15 and 50 K, step q. is performed with a simulated temperature of between about 15 and 50 K, and step r. is performed with a simulated temperature of between about 200 and 350 K.
This method is particularly suited for multispecific proteins such as a cytochrome P450 of subfamily 3A, comprising mammalian and human cytochromes.
The cytochrome can be cytochrome P450 3A4 or any other P450 from the 3A subfamily, and said structure can be the structure obtained by the method of the invention described above, in particular the model structures whose atomic coordinates are listed in Tables 3 and 4 for CYP3A4 and CYP3A7.
The substrate can be a small organic compound whose size can range for example from MW 288 (testosterone) to MW 1203 (cyclosporine A).
In a preferred embodiment said substrate is testosterone.
In another object, the invention is aimed at a computer-assisted method for performing restrained dynamics docking of at least two substrates on an enzyme, a 3-D structure of which is available, consisting of performing the steps j, k, l, m, n, o, p, q and r depicted above with a first substrate and repeating said steps with a second substrate when the first substrate reaches an unconstrained state after molecular dynamics simulations.
The first and second substrates can be the same molecule or different molecules.
The first and second substrates can display either allosteric or synergistic effect.
This method can be practiced with substrates that are inhibitors (competitive, uncompetitive, noncompetitive) or that display an inhibitor-based mechanism. It can also be practiced with an agonist and with any molecule interfering with the biological function of the protein.
In preferred embodiments:
In another embodiment, this method also embraces a successive repeat of the steps j, k, l, m, n, o, p, q and r depicted above with a 3rd, 4th or 5th substrate, some of them being the same or different molecules.
In this method for performing restrained dynamics docking, said fixed atoms in step k. are the backbone atoms N—Cα-CO in the first minimization step and only Cα in subsequent minimization steps.
In addition, kinetic energy is simulated by temperature increase to about 100 K for about 5-20 ns.
The force field in step j. comprises preferably forces linked to
a. the distance between atoms,
b. the angles of valence,
c. the dihedral angles,
d. the deformation with regard to planar geometry,
e. the electrostatic field,
f. the Van der Waals forces,
g. hydrogen bonds.
The constraints in step p. are preferably attraction constraints that force said substrate into the active site, and said constraints do not prejudge the exact spatial conformation of the substrate in the active site. These constraints are final distance constraints between some atoms of said substrate and some atoms of amino acids present in said active site.
Preferably, step o. is performed with a simulated temperature of between about 15 and 50 K, step q. is performed with a simulated temperature of between about 15 and 50 K, and step r. is performed with a simulated temperature of between about 200 and 350 K.
This method is particularly suited for multispecific proteins such as a cytochrome P450. The cytochrome can be cytochrome P450 3A4, or any other P450 of the 3A subfamily, and said structure can be the structure obtained by the method of the invention described above, in particular the model structures whose atomic coordinates are listed in Tables 3 and 4 for CYP3A4 and CYP3A7.
In a preferred embodiment:
The invention is also directed to the use of the method for designing a 3-D model of a protein and to the computer-assisted method for performing restrained dynamics docking as mentioned above for screening, designing or identifying natural, unnatural substrates or substrate analogs, as well as inhibitors, activators or modulators of said enzyme.
Another object of the invention is the use of these methods for determining the effect of a first substrate on a second substrate, which can also be applied to pharmaceutical products.
The invention contemplates the use of these methods for determining the effect of a first bound testosterone molecule on the access of a second testosterone molecule, as well as for determining the mutual effect of a testosterone molecule and an alpha-naphthoflavone (αNF) molecule.
The invention is also directed to:
The distance and angular constraints derived from CSBs common to the crystallized cytochromes P450 used as structural templates are applied to conserved atoms of CSBs of the target protein. The DYANA software (Güntert et al. 1997) allows the whole structure of the target protein to be rebuilt directly from its primary sequence, taking these geometric constraints into account. Out-of-block residues are rebuilt ab initio by selecting the most favorable solutions in terms of minimal global potential energy. As examples, Tables 3 and 4 display the atomic coordinates of structural models obtained by applying the DYANA calculation to the target protein sequences CYP3A4 and CYP3A7 respectively.
Sequences:
SEQ ID No1: P450 Nor, crystal structure 1rom
SEQ ID No2: P450 Ery F, crystal structure 1oxa
SEQ ID No3: P450 Terp, crystal structure 1cpt
SEQ ID No4: P450 Cam, crystal structure 3cpp
SEQ ID No5: P450 BM3, crystal structure 2hpd
The sequence corresponding to the PDB structure includes 471 residues. For more clarity in
SEQ ID No6: P450 2C5, crystal structure 1dt6
Cyp2C5 from Oryctolagus cuniculus (Rabbit), with membrane spanning residues 3-21 deleted and a 4 residue histidine tag at the C-Terminus containing additional internal mutations.
SEQ ID No7: P450 2C5 rabbit
Sequence corresponding to the non-mutated CYP 2C5 gene from Oryctolagus cuniculus (Rabbit), consistently with SwissProt CPC5_RABIT P00179.
SEQ ID No 8: CYP51, crystal structure 1e9x
Cyp51 from Mycobacterium tuberculosis, with a 4 residue histidine tag at the C-Terminus.
SEQ ID No9: CYP3A1 rat
SEQ ID No10: CYP3A3 human
Cytochrome P-450, a possible variant of CYP3A4, inducible by glucocorticoids in human liver.
SEQ ID No11: CYP3A4 human
Numbering starts at Ala 1 (first residue Met is not included, consistently with SwissProt CP34_HUMAN P08684)
SEQ ID No12: CYP3A5 human
SEQ ID No13: CYP3A43 human
SEQ ID No14: CYP3A6 rabbit
SEQ ID No15: CYP3A7 human
SEQ ID No16: CYP3A12 dog
SEQ ID No17: CYP3A29 pig
SEQ ID No18: CYP3A13 mouse
Sequence numbering is indicated for each enzyme of the structural template and for the human 3A4 and 3A7 isozymes, as examples given in the present invention. This alignment is first based on the structural alignment of bacterial P450s and rabbit P450 2C5 derived from GOK analysis. Human P450 3A sequences were then aligned with in-house tools that locate the CSBs on the target sequence. The alignment shown outside the CSBs is not relevant, as there is no structural information available in these regions. The CSB sequences are indicated by bold uppercase characters and are highlighted in grey. Amino acids strictly conserved between CYP3A and 2C5, or between CYP3A and all the sequences of crystal structures, are highlighted in black.
Panel 4A In the CYP3A4 active site, the docked testosterone molecule is oriented so that the A steroid ring (carrying in position 3 a carbonyl function with an oxygen atom symbolized by a large ball) is close to the heminic iron. This supports the propensity of CYP3A4 to metabolize testosterone at the 6β position, as indicated by the black solid arrow.
Panel 4B In the CYP3A7 active site, the docked testosterone molecule is oriented so that the D steroid ring (carrying in position 17 a hydroxyl function with an oxygen atom symbolized by a large ball) is close to the heminic iron. This supports the propensity of CYP3A7 to metabolize testosterone at the 16α position, as indicated by the black solid arrow.
Material
The coordinates of the six P450 crystal structures, P450cam (3cpp), P450terp (1cpt), P450BM-3 (2hpd), P450eryF (1oxa), P450nor (1rom) and P450 2C5 (1dt6), were retrieved from the Brookhaven Protein Data Bank. The structural alignment and the determination of the conserved regions were carried out using the GOK software (Jean et al. 1997) running on a Silicon Graphics Octane workstation. Structures were built using the DYANA (Güntert et al. 1997) and X-PLOR (Brünger 1992) software. Docking studies were performed with SYBYL 6.6 (Tripos Inc.) and the TRIPOS force field. The structures were analyzed using Procheck-NMR (Laskowski et al. 1993) and visualized under SYBYL 6.6 (Tripos Inc.).
Common Structural Blocks (CSB) Determination.
The first key point of this homology modeling study is the identification of the structural elements (hereafter designated as CSBs, for Common Structural Blocks) conserved among the family of cytochromes P450 of known 3D structure, and the localization of these elements in the target sequence. These two tasks are performed using the GOK software (Jean et al. 1997), and are described in detail in a forthcoming article (Minoletti et al., Proteins, Structure, Function and Genetics, 2002). In brief, the basic idea of CSB identification by GOK is to use an internal coordinate representation, (α,τ) in our case (another representation of the φ, ψ and ω angles), and to search for fragments in the six template proteins having similar local trajectories in the internal coordinate space. GOK provides two adjustable parameters (the α-mesh and the α-margin) that define the tolerance on the comparison of the trajectories. These parameters were adjusted recursively to values ranging from 15 to 30° (α-mesh) and 1 to 3 (α-margin in mesh units). The quality of the match was evaluated using two multiple-way rmsd measures calculated in Cartesian coordinate space: mp-rms (the mean of all pairwise rms deviations) and s-rms (the mean of the deviations calculated with respect to a mean structure obtained from the average internal coordinates). For the different CSBs, the mp-rms value ranged between 0.3 and 4.9 Å on average, and the s-rms value between 0.04 and 2.4 Å.
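As a minimal sketch of the two quality measures defined above, the following Python computes mp-rms as the mean of all pairwise rms deviations and s-rms as the mean deviation from an averaged structure. It assumes that the fragments are already superimposed and of equal length, and it averages Cartesian coordinates to obtain the mean structure, which is a simplification of the internal-coordinate averaging described above. The function names are illustrative and are not part of GOK.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))

def mp_rms(fragments):
    """Mean of the rms deviations over all pairs of fragments."""
    pairs = list(combinations(fragments, 2))
    if not pairs:
        raise ValueError("at least two fragments are required")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)

def s_rms(fragments):
    """Mean rms deviation of each fragment from the averaged structure.

    Assumption: the mean structure is a plain Cartesian average of the
    pre-superimposed fragments (not an average of internal coordinates).
    """
    n = len(fragments)
    mean = [tuple(sum(f[i][k] for f in fragments) / n for k in range(3))
            for i in range(len(fragments[0]))]
    return sum(rmsd(f, mean) for f in fragments) / n
```

With several template fragments, mp-rms grows with the spread between any two templates, whereas s-rms measures how far each template lies from the consensus, which may explain why the reported s-rms values are smaller than the mp-rms values.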
CYP3A4 Sequence Alignment and Evaluation of the Profile
The multiple sequence alignment derived from the CSB identification was then used to build a similarity profile. The profile is defined as a position-specific scoring table created from aligned gap-free segments such as CSBs (Jean et al. 1997). The alignment then consists of a search for the best match (i.e. the best score) between a CSB of structurally defined sequences (i.e. defined independently of the nature of the aligned residues) and several other sequences that are well aligned and exhibit a high sequence identity. In the P450 3A subfamily, many proteins exhibit high sequence identity. We extended our profile search program to take this information into account, i.e. to align the profile with a pre-defined multiple alignment of the sequences of the cytochrome P450 3A subfamily members (Gotoh 1992; Nelson et al. 1996). The similarity score was calculated using the BLOSUM62 matrix (Henikoff and Henikoff 1992). The in-house tool SmartConsAlign (Atelier de Bio-informatique, Université Paris VI), described in Jean et al. 1997, moves the consensus matrix along the multiple sequence alignment of the P450 3A family and computes a similarity score for each position. The best alignment found of CYP3A4 on CSBs is shown in
Once the alignment is completed, the 3D model rebuilding process can incorporate the atom Cartesian coordinates of the template structures only for amino acids located in structurally conserved regions (i.e. the CSBs). The coordinates of any of the template structures can be used for determining the final template. In each CSB, amino acid positions have been renumbered according to the sequence of human P450 3A4. At a given position, when residues are identical between all the template structures and the target sequence, the 3D coordinates of the reference residues are assigned directly to the modeled (target) residue. When residues differ, only the coordinates of the backbone atoms (Cα) are assigned, and sometimes those of Cβ when it exists. Side chains are rebuilt from libraries giving the most probable rotamers for each amino acid (see below). In some cases, it was possible to superimpose the positions of side-chain carbon atoms up to ranks γ and δ along the side chain, thus explicitly defining a unique rotamer.
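The profile search described above can be illustrated by a minimal sketch that builds a position-specific scoring table from gap-free segments and slides it along a single target sequence. Several points are assumptions made for illustration only: the toy substitution function stands in for BLOSUM62, the target is a single sequence rather than a multiple alignment of the 3A subfamily, and the function names are hypothetical and do not reproduce SmartConsAlign.

```python
def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix such as BLOSUM62."""
    return 4 if a == b else -1

def build_profile(segments, score=toy_score, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Position-specific scoring table from aligned, gap-free, equal-length segments."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must be gap-free and of equal length")
    # For each column, the score of a residue is its mean score against
    # the residues observed at that column in the template segments.
    return [{aa: sum(score(aa, s[i]) for s in segments) / len(segments)
             for aa in alphabet}
            for i in range(length)]

def best_match(profile, target):
    """Slide the profile along the target; return (offset, score) of the best window."""
    width = len(profile)
    if len(target) < width:
        raise ValueError("target shorter than profile")
    scored = [(sum(col[target[off + i]] for i, col in enumerate(profile)), off)
              for off in range(len(target) - width + 1)]
    score, off = max(scored, key=lambda t: t[0])  # first window wins on ties
    return off, score
```

For example, a profile built from the two hypothetical segments `FGAG` and `FGSG` places the block at offset 3 of the hypothetical target `MKLFGAGPR`.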
For amino acids located outside the CSBs (structurally variable zones that include generally loops), the rebuilding is more complex, and can be done only after rebuilding of structurally conserved zones. In the multiple structural alignment (
Constraints Derivation and Rebuilding
A strategy inspired by the techniques commonly used to build structures from NMR data (Patard et al. 1996) is applied. The main idea is to express all available information obtained from the comparison of the templates in terms of geometrical constraints (distances and angles). Each constraint is defined as an interval (for a given pair of atoms, this is the average of the six atom-atom distances found in the template structures ± the standard deviation), similarly to the strategy developed by Havel and Snow (Havel and Snow 1991). However, the number of constraints corresponding to all atom-atom distances, for example, would be prohibitive for a protein of the size of the P450 (around 1,000,000 inter-residue distances if we consider 250 conserved residues and an average of four atoms per residue). Previous NMR studies (Patard et al. 1996) have shown that local constraints are sufficient to allow a correct reconstruction of a structure. This drastically reduces the number of constraints needed, and increases the flexibility of the model. In addition, similarly to what is done in protein structure determination by NMR, we can build a family of structures instead of a single model. This allows an easier analysis of the well-predicted and less well-predicted regions. This is also an advantage for the analysis of the side-chain positions, particularly in anticipation of a substrate docking study. Finally, the loops are passively reconstructed with the rest of the structure. The only specific information we have introduced in variable regions was to guide all their residues to an allowed region of the Ramachandran diagram. Indeed, analysis of well-defined structures shows that nearly all residues, including those of the loops, should belong to an allowed region. The lower the proportion of residues found outside the allowed Ramachandran regions, the better the structure is. This quality criterion has been applied to derive the model described herein.
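The interval definition given above (average of the template distances ± the standard deviation) can be sketched as follows. This is an illustration under stated assumptions: each template is represented as a hypothetical dictionary mapping an atom label to its Cartesian coordinates, and the sample standard deviation is used, since the text does not specify which estimator was applied.

```python
import math
from statistics import mean, stdev

def distance_interval(templates, atom_i, atom_j):
    """Return (lower, upper) bounds for the atom_i/atom_j distance.

    templates: list of dicts mapping an atom label to (x, y, z), one per
    template structure (hypothetical data layout). Bounds are the mean
    distance minus/plus the sample standard deviation over the templates.
    """
    if len(templates) < 2:
        raise ValueError("at least two templates are needed for a standard deviation")
    distances = [math.dist(t[atom_i], t[atom_j]) for t in templates]
    m, s = mean(distances), stdev(distances)
    return m - s, m + s
```

A pair of atoms whose distance is well conserved across the templates yields a narrow interval and therefore a tight constraint, while a variable pair yields a loose one.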
Accordingly, we retained for model rebuilding all the distance and angle intervals corresponding to the following principles:
The total number of distance constraints was, in these conditions, equal to 58506. Similarly, angular constraints were calculated in each building block. A CSB is indeed defined as a conserved trajectory in the φ,ψ coordinates space (or α,τ). Thus, dihedral angles φ and ψ of all residues located in CSBs can be defined as constraints, given by the average values of corresponding φ,ψ angles in the six templates±the standard deviation. To these backbone dihedral angles, can be added the side chains torsion angles χ1, χ2 whenever possible, as determined by the rotamer selection. The total number of dihedral angle constraints was, in these conditions, equal to 761.
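Because dihedral angles are periodic, a plain arithmetic average can be misleading when template values straddle ±180°. The sketch below shows one possible way to form a mean ± standard deviation interval for such angles using a circular mean. This is a substituted technique chosen for illustration and is not stated to be the averaging used here; the function names are hypothetical.

```python
import math

def wrap(angle_deg):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def circular_mean(angles_deg):
    """Mean direction of a set of angles, in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def dihedral_interval(angles_deg):
    """Return (lower, upper) = circular mean -/+ sample standard deviation.

    Deviations are measured along the shortest arc from the mean, so the
    interval may wrap through +/-180 degrees (lower > upper in that case).
    """
    if len(angles_deg) < 2:
        raise ValueError("at least two angles are required")
    m = circular_mean(angles_deg)
    dev = [wrap(a - m) for a in angles_deg]
    sd = math.sqrt(sum(d * d for d in dev) / (len(dev) - 1))
    return wrap(m - sd), wrap(m + sd)
```

For two template values of 170° and −170°, an arithmetic average would give 0°, whereas the circular treatment centres the interval on 180°.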
Rotamer Selection
In proteins, the preferential orientation of the side chain (60°, −60°, 180°) depends on the local conformation of the residue, and thus on the nature of the secondary structure in which the residue is involved. According to the rotamer library built by Karplus and co-workers (Dunbrack and Karplus 1993), a specific rotamer can be associated with a given (φ, ψ) couple in the Ramachandran diagram for each type of residue. These tables have been used to determine the most probable rotamer for each residue located in a CSB, except when conserved atoms in the side chain unambiguously assign a rotamer (χ1, χ2). The selected (χ1, χ2) couples were included in the above-mentioned set of 761 dihedral angle constraints.
Structure Calculation and Optimization
We used a procedure similar to structure calculation starting from NMR constraints. A first set of structures was calculated using the DYANA software (Güntert et al. 1997) and the 58506 distance and 761 angular constraints. Families of structures are generated. The energy of each structure is minimized with the vtfmin procedure in DYANA.
Due to the size and the amount of loops in the molecule, some structures presented topological defects and were discarded. The others were further optimized by using the X-PLOR software. A set of constraints was added at this stage in order to guide the loop residues to the nearest allowed region in the Ramachandran diagram. The topology and parameter files of CHARMM22 were used. The electrostatic term was turned off.
The DYANA software is unable to deal with disconnected objects. A new residue type was thus added to the standard amino acid library to take into account the presence of the heme. This residue was obtained by combining the heme with a cysteine and was inserted at position 441 in the sequence of the protein (
Description of the CYP3A4 Model
We rebuilt a model of the protein lacking its first 50 residues (N-terminal domain). This segment is highly hydrophobic, and is supposed to form the anchor of the protein in the membrane. There is no structural information about this putative transmembrane domain, and this segment was thus not incorporated into the modeling process or into the final model. Such a “free” segment (with no constraints) would perturb the convergence of the computation or the stability of the whole rebuilt structure.
The stereochemical quality (backbone and side chain conformation) of the various structures optimized under XPLOR was checked with PROCHECK (Laskowski et al. 1993). The Ramachandran plot shows that our six-template approach generated converging models, possessing the same fold. The lowest energy models had 73% of their non-glycine and non-proline residues with φ, ψ conformation in the most favoured regions of the Ramachandran plot (core region), 20% in additional allowed regions, and 5% in the generously allowed regions. Only 2.3% (9 residues) had their φ, ψ conformation in disallowed regions (
When compared to the CYP2C5 crystal structure, the CYP3A4 model exhibits a better 3D similarity in the global fold than expected, since this structure accounts for only one of the six templates. This shows that in this approach there is no “averaging” effect, i.e. the mammalian structure had a decisive influence over the five bacterial (and fungal) templates. Our final fold of CYP3A4 is very consistent with a mammalian one, despite the fact that it has been rebuilt mostly by using the structural information contained in non-mammalian cytochromes P450.
The active site is delimited by the six substrate recognition sites (SRS) that were first identified and described by Gotoh (Gotoh 1992) from the only structure available in the early 1990s (P450cam), and that are today commonly accepted for depicting substrate recognition by various cytochromes P450 (especially from family 2, but extended to other P450 families). These sites are associated with the active site and are located in the less conserved regions of the CYPs, thus possibly accounting for the varied substrate specificity among P450s. When comparing our various optimized structures, it is found that SRS1 (100-125, includes helix B′), SRS2 (205-218, includes the C-terminus of helix F), and SRS3 (237-249, includes the N-terminal part of helix G) are located in less-defined regions, with significant variability in spatial position (flexibility). These regions also correspond to parts of the sequence that are less well aligned. In contrast, SRS4 (295-320, central part of helix I), SRS5 (363-380, C-terminus of helix K and β-sheet β1-4) and SRS6 (470-490, β-sheets β4-1 and β4-2) are well-defined fragments of the structures. The SRS4 and SRS5 segments in particular are correlated to regions in the sequence that are unequivocally aligned.
The only model structure of CYP3A4 that has been described in the literature and that we can use for structural comparison is that of Szklarz and Halpert, derived from a multiple-template approach (four bacterial templates) (Szklarz and Halpert 1997). Roughly the same secondary structures are identified, but we found divergences in SRS location between their model and those derived from the present approach. SRS4 and SRS5 match well, but SRS2 is shifted (divergence in the position of helix F along the sequence), while SRS1 (helix B′), SRS3 (helix G) and SRS6 (sheet β4) are more notably displaced. The loops connecting the secondary structures of these SRS disagree significantly. These differences are likely to result from a wrong alignment with the crystal P450 structures in the model of Szklarz and Halpert.
The model rebuilding of CYP3A7 was performed according to the techniques described above in Example 1 for CYP3A4, except that we used a restricted set of four template structures, still including the mammalian CYP2C5, in order to test the robustness of the modeling approach. Only the differences in input data and the results relevant to CYP3A7 are pointed out below.
Material
The coordinates of the four P450 crystal structures: P450BM-3 (2hpd), P450eryF (1oxa), P450 51-like from Mycobacterium tuberculosis (1e9x) and P450 2C5 (1dt6) were retrieved from the Brookhaven Protein data bank and used as initial template for GOK analysis.
Common Structural Blocks (CSB) Determination.
The GOK parameters were adjusted recursively to values ranging from 10 to 30° (α-mesh) and 1 to 3 (α-margin in mesh units). Occasionally, the α-mesh value was pushed up to 60° to refine some locally structured loops (DE loop, HI loop) or short helices (such as J′). 27 CSBs were identified. New CSBs were detected: block 7* (between blocks 6 and 7A), block 7B* (between 7B and 8) and block 7C (between 7B* and 8). For the different CSBs, the mp-rms value ranged between 0.12 and 4.57 Å on average.
The best alignment found of CYP3A7 on CSBs is shown in
Constraints Derivation and Rebuilding
With a larger cutoff (12 Å), we obtained around 73000 distance constraints, and 900 dihedral constraints.
The residue covalently linked to the heme group is at position 442 in the sequence of the protein (
Description of the CYP3A7 Model
The four-template approach generated converging models, possessing the same fold. The PROCHECK analysis for structure quality assessment of the lowest energy models showed 74.4% of their non-glycine and non-proline residues with φ, ψ conformation in the most favoured regions of the Ramachandran plot (core region), 18.2% in additional allowed regions, and 4.7% in the generously allowed regions. 2.7% (11 residues) had their φ, ψ conformation in disallowed regions. The total number of residues in the model is 459, of which 407 are non-glycine and non-proline residues; the number of residues in the native sequence is 503.
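As a worked check of the figures above, the percentages can be recomputed from per-region residue counts. Only the 11 disallowed residues and the total of 407 non-glycine, non-proline residues are stated in the text; the other three counts below are back-calculated from the reported percentages and should be read as assumptions, as should the function name.

```python
def ramachandran_summary(counts):
    """Convert per-region residue counts into percentages rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to summarize")
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Assumed counts: only "disallowed" (11) and the total (407) come from the text.
cyp3a7_counts = {"core": 303, "allowed": 74, "generous": 19, "disallowed": 11}
cyp3a7_percent = ramachandran_summary(cyp3a7_counts)
```

With these counts the four regions sum to 407 residues and reproduce the reported 74.4%, 18.2%, 4.7% and 2.7%.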
A closer inspection of the structure, and after the results of dynamics docking experiments (see below), revealed that several hydrogen bonds can hinder the main access to the active site. Thus, key residues that are likely to be involved in the recognition and admission of the substrate are Q79; F102; R105; R106; F108; F248; F304 and E374, and additionally C98 and C377 (
Our aim in this example was to obtain the different positions of the known substrates of CYP3A in the active site, consistent with the oxidation sites and biochemical differences among the CYP3A isoforms. Considering that the heme-binding site is deeply buried in the protein structure, and that the selection and the pathway of the substrates within the enzyme structure are therefore strongly dependent on the various possibilities of structure opening, we implemented a special approach more appropriate to flexible structures, hereafter referred to as “restrained dynamics docking” or “soft-restrained dynamics docking”. This technique employs constrained molecular dynamics simulations, where the only constraints are heme-substrate distances. The successive steps are:
Conversion of the XPLOR PDB File into a SYBYL-Compatible PDB File
The structures optimized with XPLOR (PDB format) are visualized with the SYBYL 6.6 software (Tripos Inc.), which requires a conversion of the file (atom type correction) to make it compatible with, and usable in, the constrained dynamics performed with SYBYL.
Stabilization of the P450 3A4 Model Generated Under XPLOR
Then, we define aggregate No1 (in the SYBYL sense) with all the N-Cα-CO atoms of the peptide backbone of the protein. The structure is relaxed with a dynamics run of 10 ns at 100 K followed by a minimization of 100 steps. Aggregate No1 is then deleted.
We define aggregate No2, consisting of the protein Cα atoms only. The protein relaxation is repeated with a dynamics run of 10 ns at 100 K and a minimization of 100 steps. Aggregate No2 is then deleted.
The whole protein is then relaxed with a first dynamics run of 1 ns at 100 K, followed by a dynamics run of 1 ns at 200 K and a dynamics run of 1 ns at 300 K. We finish with a minimization of 100 steps.
Restrained Dynamics Docking of the Substrate (Example: Testosterone)
We define aggregate No3, consisting of all atoms outside a sphere of 20 Å around the Cα of the residues constituting the heart of the B′ loop. We also add the heminic iron to this aggregate.
The substrate is placed inside the protein, at around 30 Å from the heminic iron and next to the SRS1 and SRS5 sites. The substrate is placed so that the constraints between the heminic iron and the substrate backbone pass between SRS1, SRS5 and SRS3. Thus, for testosterone docking, we establish 4 distance constraints (lower limit 3 Å, upper limit 10 Å) between the heminic iron and the C3, C8, C10 and C13 carbons, with a constraint of 2 kcal/Å on the entire structure so as to avoid favouring the approach of one part of the substrate over another.
We first perform a dynamics run without constraints on the entire system at 20 K for 2 ns to stabilize the system, then we perform a dynamics run under constraints at 20 K for 5 ns. We observe that the substrate worms its way between SRS1, SRS3 and SRS5 to reach a position in the vicinity of the heminic iron. We finish with a dynamics run without constraints at 300 K to relax the system, and we carry out a minimization of 1000 steps.
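The interval form of the four heme-substrate distance constraints described above can be illustrated with a flat-bottom penalty: no energy is added while the distance lies inside the interval, and a restoring term appears only outside it. The harmonic form outside the bounds and the interpretation of the constraint strength as a force constant in kcal·mol⁻¹·Å⁻² are assumptions made for this sketch; the exact functional form applied by SYBYL is not given here, and the function names are hypothetical.

```python
def restraint_energy(d, lower=3.0, upper=10.0, k=2.0):
    """Flat-bottom penalty for one distance d (in angstroms).

    Zero inside [lower, upper]; harmonic outside (assumed form).
    Defaults follow the testosterone example: 3 and 10 angstroms, strength 2.
    """
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0

def total_restraint_energy(distances, **kwargs):
    """Sum the penalty over several iron-carbon distances (e.g. C3, C8, C10, C13)."""
    return sum(restraint_energy(d, **kwargs) for d in distances)
```

Under these assumptions, a substrate atom starting about 30 Å from the iron is pulled toward the heme, while any position between 3 and 10 Å is penalty-free, so the final orientation of the substrate is left to the force field rather than imposed by the constraint.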
Results
We found that the testosterone molecule is positioned in the vicinity of the heminic iron in such a way that the C6 of testosterone is 4.9 Å from the iron, which is compatible with the hydroxylation of this compound to give 6β-hydroxy-testosterone (
Minimizations and dynamics with the SYBYL software are performed with the Tripos force field using the following parameters: a distance-dependent dielectric function with a dielectric constant of 1, the POWELL minimization method, a minimum gradient of 0.05 kcal.mol−1.Å−1, electrostatic charges calculated according to the Gasteiger-Hückel method, and a non-bonded (NB) cutoff of 8.0 Å. The energetic diagram of dynamic docking of testosterone is shown in
Interest of this Docking Strategy:
Most P450 isozymes recognize only one substrate (for specific catalysis in a metabolic pathway), or a very limited number of substrates, all chemically closely related. In contrast, CYP3A isozymes are known to recognize a wide range of substrates, and are also capable of multiple binding in the active site, with up to three molecules in the vicinity of the heme, according to the model developed by Hosea et al. 2000. Multiple pharmacophoric behavior (Ekins et al. 2003), as well as allosteric or synergistic effects, characterize the members of this P450 subfamily.
The docking strategy described above can easily be extended to different binding and metabolism scenarios.
For example, the docking of two or three testosterone molecules, or of two testosterone molecules and one alpha-naphthoflavone (αNF) molecule, can be simulated in the following manner:
Of course, not only substrates but also inhibitors can be docked. The docking procedure above can help to measure the potential inhibitory power of a molecule, for example a compound comprising an imidazole group. A first step would include a standard constrained dynamics docking of the potential inhibitor, followed by a free MD simulation (constraints are released when the inhibitor is in the active site), or by a specifically constrained MD simulation where the imidazole group is confined in the vicinity of the heminic iron by using an additional Fe-imidazole distance constraint. In a following step, a second substrate is dynamically docked under constraints from the exterior, and one can determine under what conditions the second molecule can displace the first one from its binding position. The strength of the additional constraint can serve as a measure of the inhibitory potential.
Correspondingly, the exit pathway of the metabolites can be explored by simulating the exit of the molecule bound to the active site, using either a free MD simulation (if the chemical nature of the transformed molecule makes the bound state energetically unstable), or inverted constraints, i.e. soft distance constraints (between an external point and the bound molecule) that help to expel the metabolite. Additionally, the best exit pathway can be deduced from the most favorable energy profiles.
Aninat, C., Hayashi, Y., Andre, F., and Delaforge, M. 2001. Molecular requirements for inhibition of cytochrome p450 activities by roquefortine. Chem Res Toxicol. 14: 1259-1265.
Brünger, A. T. 1992. X-PLOR Version 3.1. A system for X-ray crystallography and NMR. Yale University Press, New Haven, Conn., USA.
Chothia, C., and Lesk, A. M. 1986. The relation between the divergence of sequence and structure in proteins. Embo J 5: 823-826.
Cupp-Vickery, J. R., and Poulos, T. L. 1995. Structure of cytochrome P450eryF involved in erythromycin biosynthesis. Nature Struct. Biology 2: 144-153.
Delaforge, M., André, F., Jaouen, M., Dolgos, H., Benech, H., Gomis, J. M., Noël, J. P., Cavelier, F., Verducci, J., Aubagnac, J. L., and Liebermann, B. 1997. Metabolism of tentoxin by hepatic cytochrome P-450 3A isozymes. Eur J Biochem. 250: 150-157.
Delaforge, M., Bouillé, G., Jaouen, M., Jankowski, C.K., Lamouroux, C., Bensoussan, C. 2001. Recognition and oxidative metabolism of cyclodipeptides by hepatic cytochrome P450. Peptides 22: 557-565.
Domanski, T. L., Liu, J., Harlow, G. R., and Halpert, J. R. 1998. Analysis of four residues within substrate recognition site 4 of human cytochrome P450 3A4: role in steroid hydroxylase activity and alpha-naphthoflavone stimulation. Arch. Biochem. Biophys. 350: 223-232.
Dunbrack, R. L. J., and Karplus, M. 1993. Backbone-dependent rotamer library for proteins- Application to side chain prediction. J. Mol. Biol. 230: 543-574.
Ekins, S., Stresser, D. M., and Williams, J. A. 2003. In vitro and pharmacophore insights into CYP3A enzymes. Trends Pharmacol Sci. 24: 161-166.
Ferenczy, G., and Morris, G. 1989. The active site of cytochrome P450 nifedipine oxidation model building study. J. Mol. Graph 7: 206-211.
Gellner, K., Eiselt, R., Hustert, E., Arnold, H., Koch, I., Haberl, M., Deglmann, C. J., Burk, O., Buntefuss, D., Escher, S., Bishop, C., Koebe, H. G., Brinkmann, U., Klenk, H. P., Kleine, K., Meyer, U. A., and Wojnowski, L. 2001. Genomic organization of the human CYP3A locus: identification of a new inducible CYP3A gene. Pharmacogenetics. 11: 111-121.
Gotoh, O. 1992. Substrate Recognition Sites in Cytochrome-P450 Family-2 (CYP2) Proteins Inferred from Comparative Analyses of Amino Acid and Coding Nucleotide Sequences. Journal of Biological Chemistry 267: 83-90.
Guengerich, F. P. 1995. Human cytochrome P450 enzymes. In “Cytochrome P450: structure, mechanism and biochemistry”, P. R. Ortiz de Montellano Ed., Plenum Press, pp. 537-574, New York.
Güntert, P., Mumenthaler, C., and Wüthrich, K. 1997. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273: 283-298.
Hasemann, C. A., Ravichandran, K. G., Peterson, J. A., and Deisenhofer, J. 1994. Crystal Structure and Refinement of Cytochrome P450(Terp) at 2.3 Å Resolution. J. Mol. Biol. 236: 1169-1185.
Havel, T. F., and Snow, M. E. 1991. A new method for building protein conformations from sequence alignments with homologues of known structure. J Mol Biol 217: 1-7.
Henikoff, S., and Henikoff, J. G. 1992. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89: 10915-10919.
Hilbert, M., Bohm, G., and Jaenicke, R. 1993. Structural relationships of homologous proteins as a fundamental principle in homology modeling. Proteins 17: 138-151.
Hosea, N. A., Miller, G. P., and Guengerich, F. P. 2000. Elucidation of distinct ligand binding sites for cytochrome P450 3A4. Biochemistry 39: 5929-5939.
Inoue, E., Takahashi, Y., Imai, Y., and Kamataki, T. 2000. Development of bacterial expression system with high yield of CYP3A7, a human fetus-specific form of cytochrome P450. Biochem Biophys Res Commun. 269: 623-627.
Jean, P., Pothier, J., Dansette, P. M., Mansuy, D., and Viari, A. 1997. Automated multiple analysis of protein structures: application to homology modeling of cytochromes P450. Proteins 28: 388-404.
Karplus, M. and McCammon, J. A. 2002. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9: 646-652.
Koch, I., Weil, R., Wolbold, R., Brockmöller, J., Hustert, E., Burk, O., Nuessler, A., Neuhaus, P., Eichelbaum, M., Zanger, U., and Wojnowski, L. 2002. Interindividual variability and tissue-specificity in the expression of cytochrome P450 3A mRNA. Drug Metab Dispos. 30: 1108-1114.
Laskowski, R. A., MacArthur, M., Moss, D. S., and Thornton, J. M. 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26: 283-291.
Lewis, D. F. V. 2001. Guide to cytochrome P450 structure and function. Taylor & Francis, New York.
Lewis, D. F. V., Eddershaw, P. J., Goldfarb, P. S., and Tarbit, M. H. 1996. Molecular modelling of CYP3A4 from an alignment with CYP102: Identification of key interactions between putative active site residues and CYP3A-specific chemicals. Xenobiotica 26: 1067-1086.
Loiseau, N. 2002. Conception d'analogues structuraux d'un cyclopeptide modèle: étude du mode de reconnaissance moléculaire par trois systèmes enzymatiques membranaires. Université Paris XI, Orsay.
Nelson, D. R. 1999. Cytochrome P450 and the individuality of species. Arch Biochem Biophys 369: 1-10.
Nelson, D. R., Koymans, L., Kamataki, T., Stegeman, J. J., Feyereisen, R., Waxman, D. J., Waterman, M. R., Gotoh, O., Coon, M. J., Estabrook, R. W., et al. 1996. P450 superfamily: Update on new sequences, gene mapping, accession numbers and nomenclature. Pharmacogenetics 6: 1-42.
Park, S. Y., Shimizu, H., Adachi, S., Nakagawa, A., Tanaka, I., Nakahara, K., Shoun, H., Obayashi, E., Nakamura, H., Iizuka, T., et al. 1997. Crystal structure of nitric oxide reductase from denitrifying fungus Fusarium oxysporum. Nature Struct. Biology 4: 827-832.
Patard, L., Stoven, V., Gharib, B., Bontems, F., Lallemand, J. Y., and De Reggi, M. 1996. What function for human lithostathine? Structural investigations by three-dimensional structure modeling and high-resolution NMR spectroscopy. Protein Eng 9: 949-957.
Podust, L. M., Poulos, T. L., and Waterman, M. R. 2001. Crystal structure of cytochrome P450 14alpha-sterol demethylase (CYP51) from Mycobacterium tuberculosis in complex with azole inhibitors. Proc Natl Acad Sci USA 98: 3068-3073.
Poulos, T. L., Finzel, B. C., Gunsalus, I. C., Wagner, G. C., and Kraut, J. 1985. The 2.6 Å crystal structure of Pseudomonas putida cytochrome P450cam. J. Biol. Chem 260: 16122-16130.
Raag, R., and Poulos, T. L. 1989. Crystal structure of the carbon monoxide-substrate-cytochrome P-450cam ternary complex. Biochemistry 28: 7586-7592.
Ravichandran, K. G., Boddupalli, S. S., Hasemann, C. A., Peterson, J. A., and Deisenhofer, J. 1993. Crystal Structure of Hemoprotein Domain of P450BM-3, a Prototype for Microsomal P450's. Science 261: 731-736.
Sanchez, R., Pieper, U., Melo, F., Eswar, N., Marti-Renom, M. A., Madhusudhan, M. S., Mirkovic, N., and Sali, A. 2000. Protein structure modeling for structural genomics. Nat Struct Biol 7 Suppl: 986-990.
Schmiedlin-Ren, P., Edwards, D. J., Fitzsimmons, M. E., He, K., Lown, K. S., Woster, P. M., Rahman, A., Thummel, K. E., Fisher, J. M., Hollenberg, P. F., and Watkins, P. B. 1997. Mechanisms of enhanced oral availability of CYP3A4 substrates by grapefruit constituents. Decreased enterocyte CYP3A4 concentration and mechanism-based inactivation by furanocoumarins. Drug Metab Dispos. 25: 1228-1233.
Szklarz, G. D., and Halpert, J. R. 1997. Molecular modeling of cytochrome P450 3A4. J Comput Aid Molec Design 11: 265-272.
Westlind-Johnsson, A., Malmebo, S., Johansson, A., Otter, C., Andersson, T. B., Johansson, I., Edwards, R. J., Boobis, A. R., and Ingelman-Sundberg, M. 2003. Comparative analysis of CYP3A expression in human liver suggests only a minor role for CYP3A5 in drug metabolism. Drug Metab Dispos. 31: 755-761.
Williams, P. A., Cosme, J., Sridhar, V., Johnson, E. F., and McRee, D. E. 2000. Mammalian microsomal cytochrome P450 monooxygenase: structural adaptations for membrane binding and functional diversity. Molecular Cell 5: 121-131.
Williams, J. A., Ring, B. J., Cantrell, V. E., Jones, D. R., Eckstein, J., Ruterbories, K., Hamman, M. A., Hall, S. D., and Wrighton, S. A. 2002. Comparative metabolic capabilities of CYP3A4, CYP3A5, and CYP3A7. Drug Metab Dispos. 30: 883-891.
Wrighton, S. A., Schuetz, E. G., Thummel, K. E., Shen, D. D., Korzekwa, K. R., and Watkins, P. B. 2000. The human CYP3A subfamily: practical considerations. Drug Metab Rev. 32: 339-361.
Yano, J. K., Koo, L. S., Schuller, D. J., Li, H., Ortiz de Montellano, P. R., and Poulos, T. L. 2000. Crystal structure of a thermophilic cytochrome P450 from the archaeon Sulfolobus solfataricus. J Biol Chem 275: 31086-31092.
This application is a national phase application of International Application Number PCT/IB2003/005134, filed Oct. 28, 2003, and claims the benefit of U.S. Provisional Application No. 60/421,569, filed on Oct. 28, 2002, the contents of both of which are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB03/05134 | 10/28/2003 | WO | | 4/28/2005 |

Number | Date | Country |
---|---|---|
60421569 | Oct 2002 | US |