1. Field of the Invention
The invention described herein relates to drug discovery, and in particular relates to the evaluation of candidate molecular fragments.
2. Related Art
Stated broadly, the primary technical issue faced by many pharmaceutical companies is that of discovering or creating one or more molecules that bind to a specific protein in an appropriate manner. In particular, a molecule or molecules must be found that bind to the protein at a specific location, in a particular orientation, and in a manner that satisfies thermodynamic requirements. One approach to creating such a molecule is to attack the problem at the fragment level: the molecule is engineered one fragment at a time, and each candidate fragment must generally be evaluated individually.
To achieve this, a given candidate fragment must be characterized. In particular, the fragment's three-dimensional structure and charge distribution must be determined. In addition, thermodynamic properties must be considered, for example, the solvation energy of the fragment. Moreover, given that the candidate fragment is in fact only a part of what may become a larger molecule, it is necessary to determine where, on the fragment, additional fragments may be attached and how feasible such attachments are.
Currently, there is no method to answer these questions precisely and comprehensively. Therefore, a method is needed to prepare a fragment, i.e., collect data related to a candidate fragment that facilitates the evaluation of that fragment as a possible building block of a larger molecule.
The invention described herein is a method for characterizing a molecular fragment so as to collect data related to the fragment. This data allows evaluation of the fragment for drug discovery purposes. Starting with a two-dimensional model of the candidate fragment, an initial three-dimensional model of the fragment is derived. Conformers of the fragment are identified. The conformers are then grouped, or clustered, and a representative conformer is selected from each cluster. An ab initio or semi-empirical electronic calculation is then performed on one or more of these selected conformers to characterize the geometry and charge distribution of the conformer. Each atom in a selected conformer is assigned a category, or type. The selected conformer is analyzed to determine if it is structurally symmetric. If so, the three-dimensional model of the fragment is adjusted to reflect the symmetry. The size of the fragment is calculated to allow analysis as to how the fragment physically fits with the protein and/or other fragments. The solvation energy of the fragment is also calculated. The free energy curve for the fragment is calculated. Derivatization points for the fragment are determined; a score is then assigned to each derivatization point, reflecting the ease or difficulty in bonding at the derivatization points. The fragment is assigned a name and categorized. The candidate fragment and its characterizing data derived in the above process can then be stored in a database.
Further embodiments, features, and advantages of the present invention, as well as the operation of the various embodiments of the present invention, are described below with reference to the accompanying drawings.
Embodiments of the present invention are now described with reference to the figures, where like reference numbers indicate identical or functionally similar elements. Also in the figures, the leftmost digit of each reference number corresponds to the figure in which the reference number is first used.
While specific configurations and arrangements are discussed, it should be understood that this is done for illustrative purposes only. A person skilled in the relevant art will recognize that other configurations and arrangements can be used without departing from the spirit and scope of the invention. It will be apparent to a person skilled in the relevant art that this invention can also be employed in a variety of other systems and applications.
I. Overview
The invention described herein is a method for obtaining information about a fragment, wherein the information allows subsequent evaluation of the fragment as a candidate for use in creating a drug.
In step 120, the structure of a candidate fragment is determined, along with information regarding the charges at various points in the structure, and derivatization points of the fragment at which other fragments can be attached.
In step 125, the interaction between the fragment and the protein is simulated. Conceptually, this simulation can entail the analysis of a system comprising an instance of the protein and numerous instances of the fragment. An evaporation process is then simulated, such that fragments that have not bound to the protein are evaporated or otherwise lost from the system. After a phase transition, what remains are fragments “bound” to the protein, which reveals particular binding sites on the protein. It is also necessary to determine the free energy of the fragment with respect to the protein. This determination is made in step 130, which is discussed in greater detail in U.S. patent application Ser. 10/784,708, filed Dec. 31, 2003, and incorporated herein by reference in its entirety.
Given that information has been collected regarding the fragment in relation to the protein in the above steps, in step 135 an evaluation of the fragment is performed. This represents a determination as to whether to proceed with the fragment to the synthesis stage. If the evaluation is favorable, the process continues at step 140. Here, a molecule can be engineered incorporating the evaluated fragment. Step 140 includes, for example, determination of the appropriate bond angles and lengths, as well as the necessary torsions in the molecular structure. Step 140 further provides information as to whether actual synthesis of the molecule is practical.
If so, the molecule may actually be synthesized in step 145. Independent of whether or not the molecule is synthesized, the information gained from the above steps can be compiled and organized for future reference. This takes place in step 150. This compilation of the results of the preceding analysis represents a characterization of the fragment. This characterization can then be stored in step 155. This characterization can be stored electronically, for example, in a database format using a commercially available database package. The process concludes at step 160.
An embodiment of fragment preparation, step 120 above, is illustrated generally in
In step 240, each atom of a given conformer is assigned to a particular type that is based on a variety of factors, including the element of the atom, its bonds, and the structures to which the atom is bonded. In step 245, the conformer is symmetrized. Here, a determination is made as to whether a fragment should be symmetrical, given its known molecular structure. If so, a determination is made as to whether corresponding bond lengths (i.e., those lengths that should be equal if symmetry is presumed) are in fact equal in the existing model of the fragment as derived in the above steps. If not, the corresponding bond lengths of the fragment model are adjusted so as to achieve this presumed symmetry. Likewise, a determination is made as to whether corresponding bond angles are equal in the existing model. If not, the bond angles of the fragment model are adjusted to achieve this presumed symmetry.
In step 250, the size of a fragment is calculated for purposes of geometric analysis. A measure of the size of the fragment is denoted here as the fragment-fragment cutoff. This provides information that allows analysis of whether a particular fragment will fit in a particular location, in light of the topologies of the protein and/or other neighboring fragments. The fragment-fragment cutoff can also be used by an energy evaluation algorithm as a measure of when to include or exclude a fragment or atoms of the fragment in an energy evaluation step. In step 255, the solvation energy of the fragment is calculated. In step 260, the B-shift for the fragment is calculated. As will be described in greater detail below, this allows for expedited computation of the free energy curve of the fragment.
In step 265, the derivatization points of the fragment are determined, and a score is assigned to each derivatization point. The score indicates the ease or difficulty of bonding another structure to that derivatization point. In step 270, the fragment is assigned to a category and assigned a particular name. In step 275, all the information derived above for the given fragment conformer is stored. Such information can be stored electronically in a database, for example. The process concludes at step 280.
II. Processing, Fragment Preparation
As described above, in particular embodiments, the first step in the fragment preparation process is to receive a two-dimensional model of a fragment. The next step is to derive an initial three-dimensional model of the fragment on the basis of the received two-dimensional model. This derivation is illustrated in more detail in
Once an initial three-dimensional model of the fragment is constructed, conformers of the fragment can be identified, a process that begins with step 410 of
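By way of illustration only, the grouping of conformers into clusters and the selection of a representative conformer from each cluster can be sketched as follows. This sketch assumes that a pairwise RMSD matrix and a relative energy for each conformer have already been computed, and it uses a simple greedy (leader) clustering with a hypothetical RMSD `threshold`; it is not necessarily the clustering method of the embodiment described above.

```python
from typing import List, Sequence


def cluster_conformers(rmsd: Sequence[Sequence[float]],
                       energies: Sequence[float],
                       threshold: float) -> List[List[int]]:
    """Greedy leader clustering of conformers (illustrative assumption).

    rmsd[i][j] is the RMSD between conformers i and j (assumed symmetric).
    Conformers are visited from lowest to highest energy; each joins the
    first cluster whose leader lies within `threshold`, otherwise it
    starts a new cluster.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    clusters: List[List[int]] = []
    for i in order:
        for members in clusters:
            if rmsd[i][members[0]] <= threshold:
                members.append(i)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            clusters.append([i])
    return clusters


def representatives(clusters: List[List[int]]) -> List[int]:
    # Members were added in energy order, so each cluster's leader
    # (its first member) is its lowest-energy conformer.
    return [members[0] for members in clusters]
```

Choosing the lowest-energy member as the representative is one reasonable convention; a cluster centroid could be used instead.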
A selected conformer can then be prepared for an ab initio or semi-empirical calculation. The ab initio or semi-empirical calculation and analysis is illustrated in greater detail in
Each atom in the fragment under analysis is then assigned to a particular atom type. The process for this classification is illustrated in greater detail in
One scheme under which atoms can be typed is illustrated in the following table. For each element, the type's name is given, followed by the definition of the type.
Atoms not fitting any of these categories can be flagged for later analysis.
The above table is meant to be exemplary only; other classification schemes can also be used instead of, or in conjunction with, the above scheme without departing from the spirit or scope of the invention.
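A rule-based typing step of this kind can be sketched as a lookup keyed on the element, the number of bonded neighbours, and an aromaticity flag, with unmatched atoms flagged for later analysis. The type names and rules below are hypothetical placeholders chosen for illustration; they do not reproduce the typing table referred to above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: (element, bonded neighbours, aromatic?) -> type name.
# Illustrative only; not the typing scheme of the table above.
TYPE_RULES: Dict[Tuple[str, int, bool], str] = {
    ("C", 4, False): "C_sp3",
    ("C", 3, False): "C_sp2",
    ("C", 3, True): "C_ar",
    ("C", 2, False): "C_sp",
    ("N", 3, False): "N_amine",
    ("N", 2, True): "N_ar",
    ("O", 2, False): "O_ether_or_hydroxyl",
    ("O", 1, False): "O_carbonyl",
    ("H", 1, False): "H",
}


def assign_atom_types(elements: List[str],
                      bonds: List[Tuple[int, int]],
                      aromatic: List[bool]) -> List[Optional[str]]:
    """Return a type for each atom, or None when no rule matches."""
    degree = [0] * len(elements)
    for a, b in bonds:  # count bonded neighbours (explicit hydrogens assumed)
        degree[a] += 1
        degree[b] += 1
    return [TYPE_RULES.get((el, degree[i], aromatic[i]))
            for i, el in enumerate(elements)]


def flag_untyped(types: List[Optional[str]]) -> List[int]:
    # Indices of atoms that fit no category and need later analysis.
    return [i for i, t in enumerate(types) if t is None]
```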
At this point, a three-dimensional model of the fragment has been derived and refined. Some fragments can be further refined by determining whether or not the fragment should be symmetric. If so, the bond lengths, bond angles, and partial charges of the atoms of the three-dimensional model can be adjusted to achieve symmetry. This process is illustrated in greater detail in
At step 745, corresponding bond angles are compared. In step 750, a determination is made as to whether the difference between two corresponding bond angles exceeds a predetermined threshold value. Here, the difference in bond angles is referred to as “differenceA”, while the threshold is denoted “thresholdA.” Again, if differenceA exceeds thresholdA, then significant asymmetry is present, and the fragment can be evaluated offline in step 755. Otherwise, the process continues at step 760, where a determination is made as to whether differenceA exceeds zero. If so, the process continues at step 765, where the corresponding bond angles are adjusted. In an embodiment of the invention, corresponding bond angles are adjusted by averaging: the average bond angle is substituted for each of the corresponding angles. If differenceA does not exceed zero in step 760, there is no need to adjust the bond angles, and the process of comparing bond angles concludes at step 770.
At step 775, corresponding partial charges are compared. In step 780, a determination is made as to whether the difference between two corresponding partial charges exceeds a predetermined threshold value for partial charges. Again, if this difference exceeds the threshold, then significant asymmetry is present, and the fragment can be evaluated offline in step 785. Otherwise, the process continues at step 790, where a determination is made as to whether the difference exceeds zero. If so, the process continues at step 795, where the corresponding partial charges are adjusted. If the difference does not exceed zero in step 790, there is no need to adjust the partial charges, and the process concludes at step 798.
In an embodiment of the invention, the partial charges can be averaged, and the average partial charge substituted for each of the corresponding partial charges. If, however, the difference between the corresponding partial charges does not exceed zero, there is no point in adjusting them.
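The compare, flag, and average logic described for bond lengths, bond angles, and partial charges can be sketched with a single routine applied separately to each kind of value, each with its own threshold. In this sketch the index groups of symmetry-equivalent values, the threshold values, and the `AsymmetryError` exception are assumptions introduced for illustration; a group with more than two members is handled by comparing its largest and smallest values.

```python
from typing import List, Sequence


class AsymmetryError(Exception):
    """Raised when corresponding values differ by more than the threshold
    (hypothetical stand-in for routing the fragment to offline evaluation)."""


def symmetrize(values: List[float],
               equivalent_groups: Sequence[Sequence[int]],
               threshold: float) -> List[float]:
    """Average values that should be equal under the presumed symmetry.

    `values` holds one kind of quantity (bond lengths, bond angles, or
    partial charges); `equivalent_groups` lists index sets that should be
    equal, e.g. the two C-O bond lengths of a carboxylate group.
    """
    result = list(values)
    for group in equivalent_groups:
        members = [values[i] for i in group]
        difference = max(members) - min(members)
        if difference > threshold:
            # Significant asymmetry: leave the fragment for offline evaluation.
            raise AsymmetryError(f"group {list(group)} differs by {difference:.4f}")
        if difference > 0:
            # Small asymmetry: substitute the average for each member.
            mean = sum(members) / len(members)
            for i in group:
                result[i] = mean
    return result
```

If the difference is exactly zero, the values are returned unchanged, mirroring the "no need to adjust" branch above.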
An example of a symmetrical fragment is illustrated in
Another symmetrical molecule is illustrated in
Another determination that can be made in this invention is the fragment-fragment cutoff. The fragment-fragment cutoff represents the size of a fragment, and this size is used as a unit of distance for analytical purposes. If one fragment is more than a certain number of these units away from another fragment, the interaction between the two fragments can be ignored for modeling purposes. Fragments may also attach themselves to a protein in layers. Any fragment outside the innermost layer of fragments (i.e., outside the monolayer) can be disregarded for modeling purposes, since it is the fragments in the monolayer that may represent fragments of interest. The monolayer can be characterized by the distance of each fragment from the protein, measured in units of the fragment-fragment cutoff.
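The use of the fragment-fragment cutoff as a distance unit can be sketched as follows. The size measure used here (the largest interatomic distance in the fragment), the hypothetical `n_units` parameter, and the one-cutoff monolayer criterion are all assumptions for illustration; the specific size calculation of the embodiment is described with reference to the figures.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def fragment_cutoff(coords: Sequence[Point]) -> float:
    """Assumed size measure: the largest interatomic distance in the fragment."""
    return max((math.dist(a, b) for a, b in combinations(coords, 2)), default=0.0)


def min_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Closest approach between any atom of `a` and any atom of `b`.
    return min(math.dist(p, q) for p in a for q in b)


def interacts(frag_a: Sequence[Point], frag_b: Sequence[Point],
              cutoff: float, n_units: float = 2.0) -> bool:
    """False when the fragments are more than n_units cutoffs apart
    (n_units is a hypothetical parameter), so the pair can be ignored."""
    return min_distance(frag_a, frag_b) <= n_units * cutoff


def monolayer(fragments: List[Sequence[Point]], protein: Sequence[Point],
              cutoff: float) -> List[int]:
    """Indices of fragments within one cutoff of any protein atom."""
    return [i for i, f in enumerate(fragments)
            if min_distance(f, protein) <= cutoff]
```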
The determination of a fragment-fragment cutoff is illustrated in greater detail in
In addition to the fragment-fragment cutoff, it is also useful to calculate the solvation energy of a fragment. Conceptually, the solvation energy for a fragment refers to the energy required to break its interaction with a solvent, along with any energy recovered if and when the fragment bonds to the protein. There are several ways to calculate solvation energy. One is to use a continuum solvent model, for example the generalized Born/surface area (GB/SA) model, which is often used to calculate the free energy of solvation of small fragment molecules. Another is to use MacroModel (Maestro), a commercially available product (Schrödinger, LLC, Portland, Oreg.). Other models that can be used to calculate solvation energy include the TIP3P and TIP4P water models and the Poisson-Boltzmann model.
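As an illustration of the GB/SA approach, the sketch below evaluates the generalized Born polarization energy using the smoothing function of Still and co-workers, plus a nonpolar term proportional to the solvent-accessible surface area. It assumes that the effective Born radii and the surface area are supplied by some other routine; the surface-tension coefficient `gamma` and the dielectric constants are illustrative defaults, not values taken from this description.

```python
import math
from typing import Sequence, Tuple

COULOMB_KCAL = 332.06  # Coulomb constant, approx. kcal*Angstrom/(mol*e^2)


def f_gb(r: float, ri: float, rj: float) -> float:
    """Still et al. smoothing function; reduces to the Born radius when i == j."""
    return math.sqrt(r * r + ri * rj * math.exp(-r * r / (4.0 * ri * rj)))


def gb_polarization(coords: Sequence[Tuple[float, float, float]],
                    charges: Sequence[float],
                    born_radii: Sequence[float],
                    eps_in: float = 1.0, eps_out: float = 78.5) -> float:
    """Polar solvation energy (kcal/mol); the double sum over all i, j
    includes the self terms, and the factor 1/2 corrects the double count."""
    prefactor = -0.5 * COULOMB_KCAL * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    for i in range(len(charges)):
        for j in range(len(charges)):
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total


def gbsa_solvation(coords, charges, born_radii, sasa: float,
                   gamma: float = 0.005) -> float:
    """GB polar term plus gamma * SASA (SASA in Angstrom^2; gamma illustrative)."""
    return gb_polarization(coords, charges, born_radii) + gamma * sasa
```

For a single ion, the expression reduces to the classical Born equation, which is a convenient sanity check.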
The invention also includes a process for generating a free energy curve for a fragment. The process of simulating a fragment against a given protein can be viewed conceptually as a system that includes an instance of the protein molecule and a plurality of instances of a fragment. In the simulation, free fragments are allowed to leave the system, lowering the total number of fragments in the system. Eventually, the system reaches a lower free energy ΔG, because free fragments have left the system in a process akin to evaporation. What remains in the system at this point is the protein molecule along with whatever fragments have bound to the protein. A free energy curve represents the change in the number of fragments in the system as the free energy decreases, given the loss of the free fragments.
A free energy curve is illustrated in
Also shown in
An energy offset can be calculated from the Nfrag curve to aid in determining the free energy schedule of the simulation between the protein and the fragment. The energy offset indicates when the transition point of the protein-fragment free energy curve is approaching.
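The description does not spell out how the energy offset is extracted from the Nfrag curve. One plausible approach, shown here purely as an assumption, is to take the midpoint of the interval over which the sampled number of fragments changes most steeply with the free energy, i.e. the location of the evaporation transition.

```python
from typing import Sequence


def energy_offset(free_energy: Sequence[float], n_frag: Sequence[float]) -> float:
    """Estimate the transition on a sampled Nfrag curve (assumed method).

    free_energy: sampled free-energy values (distinct, in schedule order).
    n_frag:      number of fragments remaining at each sampled value.
    Returns the midpoint of the interval with the steepest change in Nfrag.
    """
    best_slope = -1.0
    best_mid = free_energy[0]
    for k in range(1, len(free_energy)):
        step = abs(free_energy[k] - free_energy[k - 1])
        slope = abs(n_frag[k] - n_frag[k - 1]) / step
        if slope > best_slope:
            best_slope = slope
            best_mid = 0.5 * (free_energy[k] + free_energy[k - 1])
    return best_mid
```

A smoothed derivative or a sigmoid fit would be more robust on noisy simulation data; the simple finite difference is used here only to keep the sketch short.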
In the protein-fragment simulation, the free energy is changed in relatively large increments prior to the energy offset. As the free energy approaches the energy offset, the increments at which the free energy changes become smaller. Changing the free energy in large increments away from the transition saves valuable computational time.
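Such a schedule can be sketched as a decreasing sequence of free-energy values that uses a coarse step until it comes within a window of the energy offset and a fine step thereafter. The parameters `coarse_step`, `fine_step`, and `window` are hypothetical; the sketch also clamps the last coarse step so that it never jumps past the start of the fine-step window.

```python
from typing import List


def free_energy_schedule(start: float, stop: float, offset: float,
                         coarse_step: float, fine_step: float,
                         window: float) -> List[float]:
    """Decreasing free-energy schedule from `start` down to `stop`.

    Coarse steps are taken while the current value is more than `window`
    above the energy offset; fine steps are taken from there on.
    Both step sizes are assumed to be positive.
    """
    values = [start]
    current = start
    while current > stop:
        if current - offset > window:
            # Far from the transition: large increment, but do not skip the window.
            nxt = max(current - coarse_step, offset + window)
        else:
            # Near (or past) the transition: small increment.
            nxt = current - fine_step
        current = max(nxt, stop)
        values.append(current)
    return values
```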
It is also useful to determine the derivatization points on a fragment. A derivatization point represents a point on a fragment where additional atoms or structures can be bonded. This information is useful for purposes of determining what molecules can be generated by building on the fragment. Moreover, it is also useful to determine how easy or difficult it is to synthesize or modify a molecule at a given derivatization point. This process is illustrated in
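A simple way to enumerate candidate derivatization points is to treat every hydrogen bonded to a heavy atom as a position where a substituent could replace it, and to score the point by the type of the heavy atom carrying that hydrogen. Both the hydrogen-replacement rule and the score table below are hypothetical illustrations (higher scores meaning easier derivatization); they are not the scoring scheme of the embodiment.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical ease-of-derivatization scores (1 = difficult, 5 = easy),
# keyed by an illustrative type name of the heavy atom bearing the hydrogen.
DEFAULT_SCORES: Dict[str, int] = {
    "N_amine": 5,
    "O_ether_or_hydroxyl": 4,
    "C_ar": 3,
    "C_sp2": 2,
    "C_sp3": 1,
}


def derivatization_points(elements: Sequence[str],
                          bonds: Sequence[Tuple[int, int]],
                          atom_types: Sequence[str],
                          scores: Dict[str, int] = DEFAULT_SCORES,
                          default_score: int = 1) -> List[Tuple[int, int, int]]:
    """Return (hydrogen index, parent atom index, score), easiest first."""
    points = []
    for a, b in bonds:
        # Check both orientations of the bond for an H on a heavy atom.
        for h, parent in ((a, b), (b, a)):
            if elements[h] == "H" and elements[parent] != "H":
                score = scores.get(atom_types[parent], default_score)
                points.append((h, parent, score))
    # sorted() is stable, so ties keep their original bond order.
    return sorted(points, key=lambda p: -p[2])
```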
Once all the above has been performed, it can be useful to assign a name to the fragment and/or assign the fragment to a category. The name can be the one used by the International Union of Pure and Applied Chemistry (IUPAC) or the common name. Generally, the name is unique for every conformer.
Categorization of the fragments can be performed for purposes of organizing the data accumulated with respect to a given protein. There are a number of categorization schemes that can be used. Under one scheme, a fragment can be categorized as a scaffold, indicating that the fragment can be used as a frame on which a larger molecule can be constructed, or as a linker, indicating that the fragment can be used to link two or more other molecular structures. Under a second scheme, a fragment can be categorized as a hydrophobe, a hydrogen bond acceptor, or a hydrogen bond donor; these categories are not mutually exclusive. In yet a third scheme, categories can be substructure based: a fragment can, for example, have a benzene core, a biphenyl core, or a diphenyl ether core.
Finally, all of the above information can be stored in a database. Existing, commercially-available databases can be used for this process. The stored information can include, for example, one-, two-, or three-dimensional structural information as derived above.
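As one possible realization, the characterization data can be kept in a small relational schema in which a fragment may belong to several categories at once, reflecting the fact that the categories are not mutually exclusive. The sketch uses Python's built-in `sqlite3` module; the table and column names are hypothetical, and a production system would store far more (structures, charges, free energy curves).

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fragment (
    name TEXT PRIMARY KEY,
    cutoff REAL,
    solvation_energy REAL
);
CREATE TABLE IF NOT EXISTS fragment_category (
    name TEXT REFERENCES fragment(name),
    category TEXT,
    PRIMARY KEY (name, category)
);
CREATE TABLE IF NOT EXISTS derivatization_point (
    name TEXT REFERENCES fragment(name),
    atom_index INTEGER,
    score REAL,
    PRIMARY KEY (name, atom_index)
);
"""


def store_fragment(conn: sqlite3.Connection, name: str, cutoff: float,
                   solvation_energy: float, categories: Iterable[str],
                   points: Iterable[Tuple[int, float]]) -> None:
    """Insert or update one fragment and its categories and scored points."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT OR REPLACE INTO fragment VALUES (?, ?, ?)",
                     (name, cutoff, solvation_energy))
        conn.executemany("INSERT OR IGNORE INTO fragment_category VALUES (?, ?)",
                         [(name, c) for c in categories])
        conn.executemany("INSERT OR REPLACE INTO derivatization_point VALUES (?, ?, ?)",
                         [(name, i, s) for i, s in points])


def fragments_in_category(conn: sqlite3.Connection, category: str) -> List[str]:
    rows = conn.execute(
        "SELECT name FROM fragment_category WHERE category = ? ORDER BY name",
        (category,))
    return [row[0] for row in rows]
```

Usage: open a connection with `sqlite3.connect(...)`, create the tables once with `conn.executescript(SCHEMA)`, then call `store_fragment` for each prepared fragment.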
III. Computing Environment
Some or all of the present invention may be implemented using software and may be implemented in conjunction with a computing system or other processing system. An example of such a computer system 1200 is shown in
Computer system 1200 also includes a main memory 1208, preferably random access memory (RAM), and may also include a secondary memory 1210. The secondary memory 1210 may include, for example, a hard disk drive 1212 and/or a removable storage drive 1214, representing a magnetic tape drive, an optical disk drive, etc. The removable storage drive 1214 reads from and/or writes to a removable storage unit 1218 in a well-known manner. Removable storage unit 1218 represents a magnetic tape, optical disk, or other storage medium that is read by and written to by removable storage drive 1214. As will be appreciated, the removable storage unit 1218 can include a computer usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 1210 may include other means for allowing computer programs or other instructions to be loaded into computer system 1200. Such means may include, for example, a removable storage unit 1222 and an interface 1220. Examples of such means include a removable memory chip (such as an EPROM or PROM) and associated socket, or other removable storage units 1222 and interfaces 1220 that allow software and data to be transferred from the removable storage unit 1222 to computer system 1200.
Computer system 1200 may also include one or more communications interfaces, such as network interface 1224. Network interface 1224 allows software and data to be transferred between computer system 1200 and external devices. Examples of network interface 1224 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via network interface 1224 are in the form of signals 1228 which may be electronic, electromagnetic, optical or other signals capable of being received by network interface 1224. These signals 1228 are provided to network interface 1224 via a communications path (i.e., channel) 1226. This channel 1226 carries signals 1228 and may be implemented using wire or cable, fiber optics, an RF link and other communications channels.
In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as removable storage units 1218 and 1222, a hard disk installed in hard disk drive 1212, and signals 1228. These computer program products are means for providing software to computer system 1200.
Computer programs (also called computer control logic) are stored in main memory 1208 and/or secondary memory 1210. Computer programs may also be received via network interface 1224. Such computer programs, when executed, enable the computer system 1200 to implement the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 1204 to implement the present invention. Accordingly, such computer programs represent controllers of the computer system 1200. Where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 1200 using removable storage drive 1214, hard drive 1212, or network interface 1224.
IV. Conclusion
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in detail can be made therein without departing from the spirit and scope of the invention. Thus the present invention should not be limited by any of the above-described exemplary embodiments.