1. Field of the Invention
The present invention relates to computer-implemented methods and systems of determining the affinity between polypeptide amino acid residues and one or more molecular fragments. The invention further provides methods and systems of using the affinity values to aid in drug design.
2. Related Art
The action of a particular drug is believed to be due to the interaction of that drug with a particular molecular target, such as a protein, nucleic acid, or other molecule found in the biological system. Typical protein drug targets include enzymes and receptors. Thomas, G., “Medicinal Chemistry—An Introduction” (John Wiley & Sons, Ltd., New York, 2001). The binding of the drug to the active or other sites (allosteric sites) of an enzyme usually has the effect of preventing the normal operation of that enzyme. Similarly, drugs act on receptors by binding to or near to a specific receptor that may either activate the receptor or prevent the binding of the receptor's normal substrate to that receptor. Ultimately, both of these actions can result in a physiological response that may have a therapeutic effect. The drug's effectiveness will depend on the stability of the drug-enzyme or drug-receptor complex and the number of sites occupied by the drug. Other targets for drug action include nucleic acids and other naturally occurring molecules. Id.
To rationally develop a drug lead, therefore, it is desirable to know the binding site on the target molecule (e.g., enzyme, receptor or nucleic acid). One approach to determining protein binding sites is protein mapping, where molecular probes, such as small organic molecules or functional groups are placed around the protein surface to determine the most favorable binding positions (Dennis et al., PNAS 99:4290-4295 (2002)). Experimental approaches to protein mapping include x-ray crystallography and NMR methods. Id. Both of these approaches have shown that probes, even those generally unrelated to any natural substrate of the protein, bind to only a limited number of positions on the protein. Generally, a pocket of the active site tends to form a consensus site that binds many ligands, regardless of their sizes and polarities. Id.
Because of the experimental difficulties associated with co-crystallizing proteins and probes, or the use of NMR to determine binding sites, a number of methods have been developed to perform mapping computationally rather than experimentally (e.g, the drug design program GRID (Goodford, P. J., J. Med. Chem. 28:849-875 (1985) and the multiple copy simultaneous search (MCSS) strategy (Miranker, A. & Karplus, M., Proteins Struct. Funct. Genet. 11:29-34 (1991)).
The major problem with these approaches, however, is that they result in many energy minima along the surface of the protein, making it difficult to determine which of the minima is actually relevant (Dennis et al., PNAS 99:4290-4295 (2002)).
In one approach, a mapping algorithm was developed that uses an empirical free energy function to determine the active sites of egg white lysozyme and thermolysin. Id. This study reported that the sites identified by computational mapping agreed with those identified experimentally.
The MCSS tool, discussed above, uses a fragment-based computational approach to identify binding sites and, as discussed above, is essentially based on an energy minimization approach, providing fragment states corresponding to various local minima of the fragment-protein interaction potential energy field.
Improved computational methods are necessary to provide accurate, quantitative estimates of the free energy of binding of molecular fragments to potential binding sites so that ligands can be designed for these sites. Such a computational method should yield values which can be used to determine the affinity of particular fragment-residue pairs, so that the results of particular simulations can be compared, the degree of convergence of a particular simulation can be determined and binding sites and key fragments can be identified.
Recognizing the tremendous need for accurate determinations of the interaction energies between fragments and the amino acid residues of polypeptide molecules, the present inventors have developed methods and systems of determining the affinity values of fragment-residue pairs.
Accordingly, the present invention provides methods and systems of determining the affinity between polypeptide amino acid residues and one or more molecular fragments. The invention includes conducting a computer simulation of (i) a polypeptide and (ii) at least one molecular fragment, wherein at least one interaction energy is calculated between the polypeptide and at least one molecular fragment, wherein each calculated interaction energy is associated with a position of said at least one molecular fragment; and assigning an affinity value to at least one fragment and residue pair when said fragment is in the vicinity of the residue, wherein said affinity value is a measure of the free energy of interaction between the polypeptide and the fragment; wherein the above calculations are conducted for each molecular fragment present in the computer simulation.
The present invention further provides methods and systems of using the affinity values of the present invention to, for example, determine the degree of convergence of a particular simulation, compare the results of multiple computer-implemented simulations, identify protein binding sites, and help determine the key fragments to use in constructing ligands for a given polypeptide.
Terms are used herein as generally used in the art, unless otherwise defined herein.
In one aspect, the present invention provides methods and systems of determining the affinity between polypeptide amino acid residues and one or more molecular fragments. In one embodiment, the present invention includes conducting a computer simulation of (i) a polypeptide, and (ii) at least one molecular fragment, wherein at least one interaction energy is calculated between said polypeptide and said at least one molecular fragment, wherein each calculated interaction energy is associated with a position of said at least one molecular fragment; and assigning an affinity value to at least one fragment and residue pair when said fragment is in the vicinity of the residue, wherein said affinity value is a measure of the free energy of interaction between the polypeptide and the fragment; wherein the interaction energy and the affinity value is determined for each molecular fragment present in the computer simulation.
As used herein, the term “polypeptide” encompasses a molecule comprised of amino acid molecules linked by peptide bonds, and includes all such molecules, regardless of the number of amino acids in the molecule. The term polypeptide, as used herein, also includes molecules which include other moieties in addition to amino acids, such as glycosylated polypeptides, e.g., antibodies. The term polypeptide, as used herein, also includes protein molecules which consist of more than one chain of amino acids linked by peptide bonds; the multiple chains may be covalently bonded to each other by means of disulfide sidechain bonds.
“Fragments,” as the term is used herein, includes molecules or molecular fragments (e.g., radicals) that can be used to model one or more interaction with a macromolecule, such as the interactions of carbonyls, hydroxyls, amides, hydrocarbons, and the like. Examples of useful fragments include:
Preferably, the fragments selected are representative of chemical features that have proven useful in the design of pharmaceuticals or other bioactive chemicals. Additional fragments will be readily apparent to one skilled in the art.
In one aspect of the methods of the present invention, a computer simulation of a polypeptide and at least one molecular fragment is conducted, wherein at least one interaction energy is calculated between the polypeptide and the at least one molecular fragment, wherein each calculated interaction energy is associated with a position of the molecular fragment. For example, in one aspect of the present invention, the calculation of interaction energy between the polypeptide and each molecular fragment, uses a Monte Carlo method to explore the protein/fragment confirmation space.
In one embodiment, the three dimensional structure of a target protein, usually obtainable experimentally from x-ray crystallography, is known and the basic interactions between the protein and the small fragments (e.g., average molecular weight of approximately 150) are computed. This computation can be carried out by Monte Carlo (MC) modeling and analysis (usually implemented in software) for a large collection of organic fragments with diverse physico-chemical properties. The number of fragments can be in the hundreds to thousands. For this purpose, Locus Pharmaceuticals, Inc., Blue Bell, Pa., developed the Locus Monte Carlo (LMC) code. For each rigid fragment instance, a set of attributes are saved, including:
(x,y,z), q=(q1, q2, q3, q4), fragment-protein energy,
where (x,y,z) are the coordinates of the fragment's center of mass, and q is the quaternion characterizing its orientation.
This MC data for the different fragments is analyzed for identifying potential binding sites using the methods of the present invention. These tools are based on the postulate that a binding site must be a high affinity region for a diverse collection of fragments. In one aspect, experimental binding site data (e.g., co-crystal X-ray data and residue mutational analysis), if available, is used to determine the final site within which the leads are designed.
In another aspect, the actual thermodynamic fragment distributions around the protein, i.e., distributions consistent with thermal fluctuations at physiological temperatures are calculated. Information on the thermodynamic distribution is essential for computing free energies of binding, which is the basic biologically relevant quantity for quantifying the binding affinity of a ligand. The MCSS approach, by contrast, is essentially based on an energy minimization approach, providing fragment states corresponding to various local minima of the fragment-protein interaction potential energy field. Such a procedure is computationally more expeditious than computing the actual physical distributions, but is unable to provide information on entropic effects, essential for free energy estimates.
For computing the thermodynamic distributions, MC simulation packages often make use of a Metropolis Monte Carlo approach (Metropolis, N., et al., J. Chem. Physics 21:1087-1092 (1953)) for sampling from a grand-canonical ensemble of states (Adams, D. J., Molecular Physics 29:307-311 (1975); Mezei, M., Molecular Physics 61:565-582 (1987)). In addition to exchanging just energy with a surrounding thermal bath, as in the case of a canonical ensemble, the system described by a grand-canonical ensemble may exchange particles or fragments as well. The energy cost associated with inserting/deleting a fragment from the system is controlled by its chemical potential. By varying this chemical potential, so-called simulated annealing of the chemical potential, one may vary the average number of fragments in the simulation system. It can be shown that measuring the values of the chemical potential at which fragments leave various sites on the protein provides an estimate of the free energy of binding for the different binding modes over the protein surface.
The practicality of the simulated annealing procedure for estimating binding affinities was demonstrated by Guarnieri and Mezei for differentiating hydration propensities of different DNA grooves (Guarnieri, F. and Mezei, M., J. Am. Chem. Soc. 118:8493-8494 (1996)). These results were obtained with the Metropolis Monte Carlo (MMC) code developed by the group of Mezei, Mount Sinai School of Medicine, N.Y. For these simulations, the system was composed of a molecule fraction of DNA surrounded by a varying number of interacting water molecules. In one embodiment, the LMC algorithm carries out a similar calculation for all fragments with respect to the target protein.
Accordingly, in an embodiment of the present invention, a macromolecule is analyzed for potential binding sites. For example, this analysis can be accomplished by (1) positioning an instance of a computer representation of a molecule or molecular fragment at a plurality of sampling sites of the macromolecule; (2) selecting a value of B, wherein B=μ′/kT+ln <N>, where μ′ is the excess chemical potential, k is Boltzmann's constant, T is the absolute temperature, and <N> is the mean number of molecules of the molecule or molecular fragment; (3)repositioning the instances of the molecule or molecular fragment; (4)accepting or rejecting each instance of the repositioned molecule or molecular fragment based on the Metropolis sampling criteria using the computed binding energy compared to the selected value of B; (5) repeating steps (1) through (4) at a lesser value of B; and outputting a list of unrejected instances of the molecule or molecular fragment; wherein the molecule or molecular fragment is an organic fragment.
In an embodiment, the sampling sites comprise an unbiased sampling of sites of the macromolecule.
In another aspect of the invention, a Monte Carlo based computer simulation is conducted which excludes the fragment-fragment interactions. It has been acknowledged that considering fragment-fragment interactions may be detrimental to the interpretation of the simulation results for all fragments but water. Indeed, due to the high dilution of the solute molecules in actual biochemical relevant conditions, considering interactions between non-water fragments may not be realistic. Furthermore, the drug leads assembled by LCD usually are composed of only one fragment of each type. Fragment-fragment interactions in a simulation may thus lead to detrimental correlation effects.
Therefore, in an embodiment, the interaction between a given fragment and a protein is analyzed by sampling the fragment states from a thermodynamically relevant Grand-Canonical distribution. The underlying sampling algorithm is a weighted Metropolis Monte Carlo approach, described herein as weighted Grand-Canonical Metropolis Monte Carlo (WGCMMC) sampling. The weighting procedure is implemented by subdividing space with an orthogonal, equidistant grid. Each grid cell x is assigned a local, numerical chemical potential field value Bnum,(x), which is adapted iteratively to ensure an approximately uniform numerical sampling of fragment states around the protein. Bnum is related to the thermodynamic cost of inserting or removing a fragment, and its local value defines the weight for each sampled fragment state.
Once the Bnum field has sufficiently converged, and the Markov chain associated to the MC sampling has equilibrated, the Markov chain can be sampled periodically at successive decorrelated states. Positions, orientations, potential energies and statistical weights for all fragment states are saved. Binding modes are then identified and corresponding binding free energies estimated.
This approach makes use of a Grand-Canonical Metropolis Monte Carlo algorithm for sampling fragments around the target protein. This sampling data can then be directly used for estimating the free energy of binding for different binding modes of the fragment on the protein surface. This approach distinguishes itself from the MMC process, in that it removes fragment-fragment interactions.
It turns out that the standard Monte Carlo approach has difficulty in handling simulations where fragment-fragment interactions are removed. Indeed, the absence of fragment-fragment interactions can lead to a broad range of fragment densities between the high and low affinity binding sites on the protein. This results from the possible overlap of fragments. The standard Metropolis Monte Carlo scheme used in conventional approaches has trouble resolving this dynamical range in densities. This problem is overcome with a weighted Monte Carlo scheme.
Therefore, in an embodiment, a linear Monte Carlo approach (i.e., without fragment-fragment interactions) is used to calculate the protein-fragment interaction energies. This scheme can be described as “linear” in reference to the linear properties of the Liousville equation in the absence of fragment-fragment interactions. Liousville's equation describes the time evolution of the system away from equilibrium.
First, the derivation of the grand canonical distribution for WGCMMC is presented below.
The potential energy of the system composed of N fragments is denoted U(Γ, N). In general, U includes both contributions from fragment-protein and fragment-fragment interactions. The configuration of the system is characterized by
Γ=(Y1,Y2, . . . ,YN), (1)
where Yi=(xi,Ωi) stands for the position xi and orientation Ωi of fragment i.
In the grand canonical ensemble, the probability that the system has N fragments in configuration Γ is given by
with the normalization factor given by the grand partition function
Here V is the volume of the system, φ is the volume of orientational space, and B is related to the excess chemical potential μex, i.e. the energy cost in units of β−1=KT for a particle to leave the system:
B=βμex+log{overscore (N)}, (4)
where N is the average number of fragments in the system.
Assuming no fragment-fragment interactions, the potential energy U of the system becomes:
where E(Yi) is the energy of interaction of the ith fragment with the protein.
The Grand Partition Function can then be written as
The probability P(N) for having N fragments in the system is then given by
As expected, this is simply the Poisson distribution with parameter Z. In particular, the average number of fragments in the system is given by
which thus scales exponentially with B.
In fact, more generally, the probability P(n,ΔV) of finding n fragments in any given subvolume ΔV of configuration space is given by a Poisson distribution:
Finally, the single fragment density is given by
which again scales exponentially with respect to B.
Equation (12) for the single fragment density shows the large dynamical range that may result from the exponential dependence of this quantity with respect to the single fragment-protein potential energy E(Y). This dependence results from the possible overlap of the non-interacting fragments. This was not an issue in the presence of fragment-fragment interactions, as an upper bound to the fragment density was then set by the tightest possible packing of the molecules.
The underlying method developed for WGCMMC to enable the accurate resolution of the above-mentioned dynamical range in densities is presented here.
For numerical purposes, instead of considering a constant B value, one may consider a field Bnum(Y) in the single particle configuration space Y. This field represents the energy cost for a particle to leave the system specifically from position Y. In this case, the density of states in the grand canonical ensemble (2) is given by
with the normalization factor (grand partition function) now given by
A similar derivation as the one used for obtaining Eq. (12) leads to the corresponding single fragment density:
Equation (16) shows that through the field Bnum(Y), the amplitude of the density in each position Y of the single particle configuration space can be calculated. Thus, by iteratively adapting Bnum(Y) during the convergence phase of the Metropolis MC simulation, one may obtain appropriate sampling in all regions of interest. This is achieved by taking
Bnum(Y)≅min (βE(Y)+const, Bmax), (17)
leading to similar numerical densities of fragment instances in various regions of space. An upper bound Bmax is set on Bnum to avoid unnecessary sampling in strongly unfavorable positions, i.e., essentially for configurations leading to steric clashes. In practice, the field Bnum(Y) is chosen to be independent of the fragment orientation, and to be piece-wise constant on a 3-D grid in x-space (translational-space).
Making use of the exponential dependence in B of the density, one can infer the physical fragment density fgc(Y) at any B=B0=constant value from the simulation results for a given numerical Bnum(Y) field. Assume that one has a sampling {Γi=(Y1, . . . ,YN
where wj is the weight assigned to the fragment state Yj, and defined by
The scan over the B schedule carried out in an MMC-type annihilation simulation is thus replaced in a WGCMMC run by a simulation for a single Bnum(Y) field.
The following addresses how the WGCMMC data can be handled and analyzed.
The starting point for the data interpretation is the relation linking the WGCMMC data to the association constant Ka characterizing the binding of the considered fragment to a given region on the protein. As a reminder, this relation is rederived here.
The association constant Ka characterizes the equilibrium of the binding process
F+P⇄FP, (20)
and is defined by
where [P], [F], and [FP] are respectively the concentrations of protein alone, fragment alone, and of a particular protein-fragment complex (binding mode). The association constant is indeed the basic biologically relevant quantity.
In the case of the LMC system, consider a single protein in a volume V. For the sake of the following discussion, take V to be large, although for the actual MC simulation this need not be the case. The protein concentration is therefore given by [P]=1/V. Furthermore, note that n is the average number of fragments in the binding volume ΔVb (in general a volume with limits both in translational and orientational space), and N is the average total number of fragments in the system, so that [F](N−n)/V and [FP]=n/V. The association constant can thus be written
having invoked the thermodynamic limit of large volume V, so that n<<N (N/V Π const, for V Π∞). The values n and N can be obtained from the fragment density (12):
having again invoked the assumption of the high protein dilution, so that the total system volume V is much larger than the effective region of interaction between the fragment and the protein and thus one may consider E(Y)≅0 in deriving Eq. (24). The association constant now becomes:
On the basis of Eq. (25) one can also write the association constant in terms of the free energy of binding ΔA
Ka=V exp (−βΔA). (26)
where ΔA=AFP−AF, with AFP and AF the free energies of the fragment-protein complex and of the fragment alone respectively:
The critical value Bc that is associated to the binding volume ΔVb can be defined as the value for which the average number of fragments in the binding site is 1. From Eq. (23) follows:
and from (25), (26) and (29) one finally obtains:
Thus, a low Bc value reflects a high affinity binding mode, and inversely a higher Bc value reflects a lower affinity mode.
The critical value Bc can be computed from the WGCMMC data using definition (29), as well as Eqs (18) and (19):
Equations (30), (31) and (32) provide the basic relations for how the WGCMMC data is to be interpreted.
Following the computer simulation which calculates the interaction energy between a polypeptide and at least one molecular fragment, an affinity value is assigned to at least one fragment and residue pair when the fragment is in the vicinity of the residue. A fragment is defined as being in the vicinity of a residue when at least one pair of fragment-residue atoms (i,j) is within a predetermined threshold distance, wherein said threshold distance is based on the sum of the Van der Waals radii of said fragment-residue atoms. In an embodiment of the present invention, the predetermined threshold distance is defined as follows:
rij<α(RVdW,i+RVdW,j) (32)
wherein rij is the distance between the two atoms, RVdW is the Van der Waals radius and α is a numerical parameter. In an embodiment, α is between about 0.5 and about 2.0, and preferably is about 1.2. In one aspect the Van der Waals radius is about half the Lennard-Jones parameter from a molecular mechanics force-field. In an aspect of the present invention, the molecular mechanics force field is selected from the group consisting of AMBER, GROMOS, CHARMM, Xplor, Discover, MMFFF and Tripos. AMBER is a particularly preferred force field.
As discussed above, the affinity value that is assigned to any particular fragment-residue pair is a measure of the free energy of interaction between the polypeptide and fragment, thus, both enthalpic and entropic contributions are included.
In an aspect of the present invention, when a Monte Carlo based computer simulation is used which includes fragment-fragment interactions, and a simulated annealing of chemical potential is conducted, the affinity value comprises B-critical. B critical is defined as the minimum B value for which a particular fragment is persistently observed in the vicinity of a residue, wherein B=μ′/kT+ln <N>, where: ′ is the excess chemical potential, k is the Boltzmann's constant, T is the absolute temperature, and <N> is the mean number of molecules of the molecular fragment. In an embodiment, a particular type of fragment is persistently observed in the vicinity of a residue when the average number of fragments in the vicinity of the residue is between 0.8 and 1.0. In another aspect of the present invention, a particular type of fragment is persistently observed in the vicinity of a residue when the average number of fragments in the vicinity is greater than or equal to 0.9.
In another embodiment, with respect to linear Monte Carlo, B-critical is also used to determine the fragment/residue affinity. Accordingly, the binding affinity of a fragment for different regions on the protein surface can be estimated by assigning a critical B, to each fragment-residue pair. These Bc values are obtained from the WGCMMC data by applying relation (32), where the volume ΔVb is approximated for each residue on the basis of a proximity criteria.
The volume defined on the basis of the proximity criteria may only be an estimate of a binding mode volume. The corresponding Bc values must therefore be used as estimates of free energy of binding. Nonetheless, comparing sets of Bc values for different fragments has proven valuable to help identify protein binding sites as follows: A binding site is identified as a set of residues with low Bc values (high affinity) for multiple fragments with diverse physico-chemical properties. This approach is based on the assumption that diverse interactions in a localized region are the necessary condition for ensuring the specificity of a binding site. Preferably, this numerical localization of the binding site is complemented by experimental binding information such as co-crystal X-ray data and mutational analysis.
Improved estimates for the binding mode volumes ΔVb, compared to the above described residue-based proximity criteria, provide more accurate estimates of free energy of binding using Eq. (32). Such improved binding mode volume estimates are determined and represented by clustering sampled fragment states belonging to the same potential energy well. For this purpose the potential energies saved for the sampled fragment states are used.
In another aspect of the present invention, following the computer simulation of the polypeptide and at least one fragment and the assignment of affinity values to specific fragment/residue pairs, a binding analysis profile is outputted that comprises a matrix of affinity values for each fragment-residue pair.
In an embodiment of the present invention, numerous separate computer simulations are conducted on a particular polypeptide, wherein in each simulation a different fragment type interacts with the protein. For example, a simulation of polypeptide A is conducted with fragment X, wherein interaction energies are calculated, and affinity values assigned to fragment/residue pairs as described above. A computer simulation of polypeptide A is then conducted with fragment Y, wherein interaction energies are calculated, and affinity values assigned to fragment/residue pairs as described above, etc.
When separate simulations are conducted for a given polypeptide, a separate affinity value matrix can be generated for each fragment type. In this way the output can enable a ranking of the residues with respect to average fragment-binding ability for a given residue. A matrix of affinity values can be generated which is averaged over fragments types, and the polypeptide surface is coded according to average fragment binding affinity. For example, residues with highest fragment binding affinity values are a different color from the residues with the lowest affinity values. In this way, potential binding sites on the macromolecule can be identified. For example, the output can be visual, displaying the residues on the macromolecule's surface in different colors across the visible light spectrum from red (highest average residue-fragment affinity) to blue (lowest average residue-fragment affinity). In such an output, the potential binding sites with higher probabilities of being actual binding sites will appear as groups of residues colored red or closer to red in the spectrum.
The residue-fragment affinity can also be used to measure the degree of convergence of a simulation. For example, a matrix of B-critical values for each residue-molecular fragment pair can be created (an “affinity profile”). Convergence is considered to have occurred when the affinity profile remains constant (or stops changing) within a predetermined threshold range.
The residue-fragment affinity can also be used to measure the degree of difference between simulations. For example, the affinity profile of two or more simulations can be compared, and the absolute and statistical measures of their variance can be calculated.
The residue-fragment affinity can also be used to identify key fragments which can be used to design ligands (i.e., drug candidates). For one or more selected residues, molecular fragments can be ranked according to affinity value. For example, for a selected residue, the molecular fragments can be listed in ascending or descending order of residue-fragment affinity. Similarly, in an embodiment, the invention enables the display of a table of residues for each fragment that highlights the regions on the protein for which the fragment has the highest affinity.
The present invention is described in further detail in the following non-limiting examples.
The following data in Table 1 was generated from a simulation conducted according to the methods of the present invention on the protein Caspase-3. Amino acids are listed on the left hand side, while the fragments are listed at the top. The binding affinities associated with the fragment-residue pairs are listed.
The following data in Table 2 was generated from a simulation conducted according to the methods of the present invention on the protein Caspase-8. Amino acids are listed on the left hand side, while the fragments are listed at the top. The binding affinities associated with the fragment-residue pairs are listed.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in detail can be made therein without departing from the spirit and scope of the invention. Thus the present invention should not be limited by any of the above-described exemplary embodiments.
All references and publications referred to herein are hereby incorporated by reference in their entirety.