The present invention relates to a drug screening method for determining a novel target-based drug. More specifically, the present invention relates to a drug screening method for determining a novel target-based drug by finding an optimal pharmacophore of a drug through numerical inversions of a quantitative structure-(drug)performance relationships (QSPR) model of a test compound group, and through molecular dynamics simulation on complexes each being composed of a target molecule and a candidate compound.
For many years, drug discovery has been dominated by structure-based methods that focus on development and analysis of compounds themselves. In recent years, with advances in technology, in silico prediction technology such as target-based drug design and ligand-based virtual screening has been widely used during the initial stage of research and development of a novel drug. The in silico prediction technology is a method of virtually predicting active compounds against a specific biological target with high accuracy by using a computer.
The U.S. Food and Drug Administration (FDA) encourages pharmaceutical companies to use in silico computer modeling technology for evaluation of drug efficacy and side effects. The use of an appropriate in silico prediction model can significantly reduce the clinical tests required in a traditional drug approval process. In fact, the FDA estimates that a 10% improvement in the accuracy of identifying targets for a test compound as a novel drug can reduce the cost of drug discovery by hundreds of millions of dollars to 1 billion dollars.
Therefore, leading global pharmaceutical companies are developing computer-based prediction systems that use experimental data from metabolic screening and absorption screening before entering the preclinical study stage. For example, Novartis, a multinational pharmaceutical company, is known to have developed a computer program called ToxCheck in 2003 to solve toxicity-related problems. However, it is difficult to determine the technological level of the company because the program is not commercially available. In addition, the recent establishment of a large number of institutions and entities that evaluate the interactions and toxicity of drugs demonstrates the need for developing prediction models.
In addition, in order to create a prediction model using a computer, investments for collection and accumulation of necessary experimental data are being made mainly in the United States, Germany, the United Kingdom, and Japan.
The present invention has been made due to the above-described need, and an objective of the present invention is to provide a target-based drug screening method capable of significantly reducing the cost of drug discovery and avoiding synthesis of false compounds and wasteful pharmaceutical tests of the false compounds by using numerical inversions of a quantitative structure-(drug)performance relationships model of a test compound group against a target molecule, and by performing molecular dynamics simulation on complexes each being composed of a drug candidate and a target molecule during discovery of a novel drug.
In order to accomplish the above objective, according to one aspect of the present invention, there is provided a drug screening method for determining a novel target-based drug by using a numerical inversion of a quantitative structure-(drug)performance relationships (QSPR) model and molecular dynamics simulation, the method including: modeling a molecular structure of a test compound group against a target molecule; obtaining a quantitative structure-(drug)performance relationships (QSPR) model between the molecular structure and the performance of the test compound group; acquiring an optimal pharmacophore of a novel target-based drug through a numerical inversion of the QSPR model; and screening a group of drug candidates having a molecular structure similar to the optimal pharmacophore to determine the novel target-based drug.
In addition, the modeling of the molecular structure of the test compound group may include a compound selection process of selecting the test compound group, a data collection process of collecting chemical and biological experimental data of the test compound group, and a molecular structure modeling process of optimizing the molecular structure by modeling the molecular structure of the test compound group on the basis of the experimental data.
In addition, obtaining the quantitative structure-(drug)performance relationships (QSPR) model may include a calculation process of calculating molecular descriptors from the molecular structure and a QSPR modeling process of modeling the quantitative structure-(drug)performance relationships (QSPR) model by using the molecular descriptors.
In addition, in the quantitative structure-(drug)performance relationships (QSPR) model, the performance may include one or more drug performances selected from among biological activity, inhibitory activity, lipophilicity, toxicity, metabolic stability, and blood-brain barrier permeability.
In addition, the QSPR modeling process may select a part of the molecular descriptors by using a genetic algorithm and then model the quantitative structure-(drug)performance relationships by using the selected molecular descriptors.
In addition, in the acquiring of the optimal pharmacophore of the novel drug, the optimal pharmacophore of the novel drug may be obtained through the numerical inversion performed according to Expression 1 or Expression 2.
$$
\begin{aligned}
x^{*} = \arg\max\ & \log \hat{k}_w \\
\text{s.t.}\quad & \log \hat{k}_w = C\hat{t} \\
& \hat{t} = Px \\
& \hat{t}^{T} S_t^{-1} t \le c_1 \\
& \lVert P\hat{t} - x \rVert \le c_2
\end{aligned}
\qquad \text{(Expression 1)}
$$
In Expressions 1 and 2:
$x$: a vector of molecular descriptors of a novel drug
$x^{*}$: a vector of optimal molecular descriptors of a novel drug candidate calculated through mathematical programming based on the above mathematical expression
$C$: output variable loading matrix of partial least squares (PLS)
$t$: a PLS score vector of input variables (where the input variables are the molecular descriptors $x$)
$P$: PLS loading matrix
$\hat{\ }$ (circumflex): value predicted by the PLS model
$S_t$: sample covariance matrix of $t$
$c_1$, $c_2$: appropriate constants
$\log k_{w,\mathrm{ref}}$: lipophilicity value set by the user (Expression 2)
$\log k_{i,\mathrm{ref}}$: activity value set by the user (Expression 2)
In addition, the screening of the group of drug candidates may include a process of rating each of the drug candidates according to the Euclidean distance between the optimal pharmacophore of the novel drug and the molecular structure of each of the drug candidates, and a process of selecting, from among the drug candidates in the candidate group, the drug candidates that are rated equal to or higher than a predetermined level.
In addition, the method may further include a step of verifying the novel target-based drug, which is performed after the screening of the group of drug candidates. The verifying may be performed through molecular dynamics simulation of complexes each being composed of one of the selected drug candidates and the target molecule.
The drug screening method according to the present invention can be applied to all kinds of drugs and target molecules, can be applied to the performances of all kinds of drugs, and can significantly reduce computations for in silico experiments and save time and manpower for drug discovery.
Hereinafter, the present invention will be described in detail. Since the following description is provided for detailed description of one embodiment of the present invention, the scope of the present invention as defined by the appended claims is not limited to the embodiment although definite or limiting terms and expressions are used in the description. In describing one embodiment of the present invention, well-known functions or constructions will not be described in detail when they may obscure the gist of the present invention.
Referring to
The molecular structure modeling step S10 is to model the molecular structures of compounds in a test compound group to be tested against a target molecule. Examples of the target molecule include a protein, an enzyme, DNA, and RNA. The test compound refers to a compound that inhibits the activity of the target molecule or alters the target molecule.
Specifically, referring to
The compound selection process S11 is to select a group of test compounds. In this process, test compounds that can inhibit the activity of the target molecule or alter the target molecule are selected.
The data reception process S12 is to receive biological and chemical experimental data of the test compounds. The data includes the boiling point, freezing point, polarity, solubility, reactivity, toxicity, selectivity, and the like of each test compound.
In addition, the molecular structure modeling process S13 is to model the molecular structure of the test compound group on the basis of the experimental data and optimize the molecular structure by using quantum chemistry.
The quantitative structure-(drug)performance relationships model creation step S20 is to obtain relationships between the structures and the performances of the test compound group.
Specifically, referring to
The molecular descriptor calculation process S21 is to calculate 4000 or more molecular descriptors on the basis of the molecular structure.
In addition, the quantitative structure-(drug)performance relationships (QSPR) modeling process S22 is to obtain a QSPR model on the basis of the molecular descriptors.
Specifically, in the QSPR modeling process S22, a genetic algorithm (GA) is applied to the molecular descriptors resulting from the calculation process S21 to select a part of the molecular descriptors, and then the quantitative structure-(drug)performance relationships is modeled by using the selected molecular descriptors.
Here, the genetic algorithm (GA) is a widely used optimization algorithm inspired by natural selection and the Darwinian evolution of genes in biological systems, and it has been successfully applied to various tasks such as data mining and optimization.
In the present invention, QSPRs are modeled by using molecular descriptors calculated theoretically, rather than using 2-dimensional molecular descriptors that are often used in conventional quantitative structure-activity relationships (QSAR). Therefore, a more accurate description of the molecular structure of a target-based drug is possible.
In addition, in the quantitative structure-(drug)performance relationships (QSPR), the performance may include one or more performances selected from among biological activity, inhibitory activity, lipophilicity, toxicity, metabolic stability and blood-brain barrier permeability. The relationships between each of the various performances and the molecular structure are obtained and used in the subsequent step.
In the present invention, since nearly 4000 molecular descriptors are used to model the structure-performance relationships, the selection of the molecular descriptors that have the greatest impact on the prediction of drug activity and the regression modeling can be performed simultaneously.
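As a non-limiting illustration of the descriptor selection described above, the following minimal sketch evolves a 0/1 mask over descriptor columns with a genetic algorithm. The fitness function used here (summed absolute Pearson correlation with the performance minus a size penalty) is a simplified stand-in chosen for brevity, not the PLS-based fitness of the embodiment. The function names, parameter values, and toy data are assumptions introduced only for this example.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def fitness(mask, columns, y, penalty=0.1):
    """Stand-in fitness: summed |r| of the selected descriptors minus a size penalty."""
    chosen = [col for bit, col in zip(mask, columns) if bit]
    return sum(abs(pearson_r(col, y)) for col in chosen) - penalty * len(chosen)

def ga_select(columns, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve a 0/1 mask over descriptor columns; the best mask is always kept (elitism)."""
    rng = random.Random(seed)
    n = len(columns)  # requires at least two descriptors for one-point crossover
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, columns, y), reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = [pop[0][:]]                  # elitism: copy the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(m, columns, y))

# Hypothetical toy data: 6 descriptor columns for 8 compounds (not from the example).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
columns = [
    [2.0 * v for v in y],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [8.0 - v for v in y],
    [0.5] * 8,
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    [1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0],
]
best_mask = ga_select(columns, y)
selected = [j for j, bit in enumerate(best_mask) if bit]
```

In practice the fitness would be replaced by a cross-validated error of the regression model built on the selected descriptors, so that selection and regression modeling are carried out together.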
In addition, the optimal pharmacophore acquisition step S30 is to obtain the optimal pharmacophore of a novel drug through numerical inversions of the QSPRs.
Specifically, the optimal pharmacophore acquisition step S30 is to obtain the optimal pharmacophore which maximizes the performance (e.g., lipophilicity) of a drug through the numerical inversion which is performed according to Expression 1.
$$
\begin{aligned}
x^{*} = \arg\max\ & \log \hat{k}_w \\
\text{s.t.}\quad & \log \hat{k}_w = C\hat{t} \\
& \hat{t} = Px \\
& \hat{t}^{T} S_t^{-1} t \le c_1 \\
& \lVert P\hat{t} - x \rVert \le c_2
\end{aligned}
\qquad \text{(Expression 1)}
$$
$x$: a vector of molecular descriptors of a novel drug
$x^{*}$: a vector of optimal molecular descriptors of a novel drug candidate calculated through mathematical programming based on Expression 1
$C$: output variable loading matrix of partial least squares (PLS)
$t$: a PLS score vector of input variables (where the input variables are the molecular descriptors $x$)
$P$: PLS loading matrix
$\hat{\ }$ (circumflex): value predicted by the PLS model
$S_t$: sample covariance matrix of $t$
$c_1$, $c_2$: appropriate constants
In addition, in the optimal pharmacophore acquisition step S30, the optimal pharmacophore having the performance designated by the user (for example, lipophilicity (log kw) or activity (log ki)) can be obtained through the numerical inversion performed according to Expression 2.
$x$: a vector of molecular descriptors of a novel drug
$x^{*}$: a vector of optimal molecular descriptors of a novel drug calculated through mathematical programming based on Expression 2
$C$: output variable loading matrix of partial least squares (PLS)
$t$: a PLS score vector of input variables (where the input variables are the molecular descriptors $x$)
$P$: PLS loading matrix
$\hat{\ }$ (circumflex): value predicted by the PLS model
$S_t$: sample covariance matrix of $t$
$c_1$, $c_2$: appropriate constants
$\log k_{w,\mathrm{ref}}$: lipophilicity value set by the user
$\log k_{i,\mathrm{ref}}$: activity value set by the user
In addition, in the optimal pharmacophore acquisition step S30, the optimal pharmacophore having the performance designated by the user can be obtained through the numerical inversion using various objective functions such as Expression 1 or Expression 2.
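As a non-limiting illustration of the numerical inversion according to Expression 1, the following minimal sketch solves a simplified special case in closed form. It assumes that the PLS scores are uncorrelated, so that the sample covariance matrix of t is diagonal, and that the descriptor vector is reconstructed exactly from the scores as x = P t, so that the residual constraint with c2 is met with zero residual. Under these assumptions, maximizing the linear objective C t inside the ellipsoid bounded by c1 has an analytic solution. The function name and all numerical values are hypothetical; a general-purpose constrained optimizer would be needed when these assumptions do not hold.

```python
import math

def invert_pls(C, P, s_t, c1):
    """Maximize C.t subject to sum(t_a**2 / s_a) <= c1, then map back to x = P t.

    C   : y-loadings, one value per latent variable
    P   : loading matrix as a list of rows (descriptors x latent variables)
    s_t : variances of the scores (diagonal of the score covariance matrix)
    c1  : bound on the Hotelling-type statistic of the scores
    C must not be all zeros, otherwise the scale factor is undefined.
    """
    # Lagrange condition gives t proportional to S_t C, scaled to the boundary.
    scale = math.sqrt(c1 / sum(s * c * c for s, c in zip(s_t, C)))
    t_opt = [scale * s * c for s, c in zip(s_t, C)]
    x_opt = [sum(p * t for p, t in zip(row, t_opt)) for row in P]
    y_opt = sum(c * t for c, t in zip(C, t_opt))
    return x_opt, t_opt, y_opt

# Hypothetical two-component model over three descriptors.
C = [0.8, 0.3]
P = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
s_t = [4.0, 1.0]
c1 = 9.0

x_opt, t_opt, y_opt = invert_pls(C, P, s_t, c1)
t2 = sum(t * t / s for t, s in zip(t_opt, s_t))  # equals c1 at the optimum
```

Because the objective is linear, the optimum lies on the boundary of the score constraint, which is why the constant c1 directly controls how far the optimal descriptor vector may move away from the training compounds.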
The drug candidate group screening step S40 is to select drug candidates having a molecular structure similar to the optimal pharmacophore of the novel drug.
Specifically, referring to
In addition, the novel drug candidate group rating process S41 is a process of rating each of the drug candidates in the novel drug candidate group according to the Euclidean distance between the optimal pharmacophore of the novel drug and the molecular structure of each of the drug candidates. A candidate with a shorter Euclidean distance is given a higher rating.
In addition, the novel drug candidate group selection process S42 is a process of selecting drug candidates that are rated equal to or higher than a predetermined level from among all of the candidates in the novel drug candidate group.
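As a non-limiting illustration of the rating and selection processes, the following minimal sketch ranks candidates by the Euclidean distance between their descriptor vectors and the optimal descriptor vector, and then keeps those within a distance threshold. The candidate names, the descriptor values, and the threshold are hypothetical, and the sketch assumes that all descriptor vectors have already been scaled in the same way.

```python
import math

def rate_candidates(optimal, candidates):
    """Return (name, distance) pairs sorted by ascending Euclidean distance."""
    rated = [(name, math.dist(optimal, vec)) for name, vec in candidates.items()]
    return sorted(rated, key=lambda pair: pair[1])

def select_candidates(rated, max_distance):
    """Keep the candidates whose distance does not exceed the threshold."""
    return [name for name, distance in rated if distance <= max_distance]

# Hypothetical two-descriptor example.
optimal = [0.0, 0.0]
candidates = {
    "cand_A": [3.0, 4.0],
    "cand_B": [1.0, 0.0],
    "cand_C": [0.0, 2.0],
}

rated = rate_candidates(optimal, candidates)
selected = select_candidates(rated, max_distance=2.5)
```

A shorter distance corresponds to a higher rating, so the predetermined level of the selection process is expressed here as an upper bound on the distance.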
On the other hand, referring to
In addition, the novel target-based drug verification step S50 is a step of verifying complexes each being composed of one of the drug candidates for the novel target-based drug, which are selected through the screening step, and the target molecule through molecular dynamics simulation.
Specifically, in the novel target-based drug verification step S50, the optimum candidate for a novel drug against the target molecule can be selected by verifying the drug candidates by performing molecular dynamics simulation on the complexes each being composed of one of the selected drug candidates and the target molecule.
As described above, by using the molecular dynamics simulation, the present invention examines various structural changes (conformational ensembles) of the drug candidates and the target molecule rather than only their fixed molecular structures.
The present invention will be described in greater detail below with reference to examples and experiments. However, the examples and experiments are provided only to describe the present invention in greater detail, and the scope of the present invention is not limited thereto.
The present invention was applied to designing of sulfonamide derivatives inhibiting CA IX (i.e., target molecule). In the present example, lipophilicity was set as the performance of a drug, and quantitative structure-performance relationships (QSPR) were modeled using partial least squares (PLS).
Liquid chromatography-mass spectrometry (LC-MS) was used to determine the lipophilicity (log kw) values of 14 sulfonamide isomers which were pre-synthesized (Table 1).
The molecular structures were optimized with the PM3 semi-empirical quantum mechanical method, and nearly 4000 molecular descriptors were calculated for each optimized structure. The data were divided into a training data set and a test data set. Next, a genetic algorithm combined with the partial least squares method was used for descriptor selection. Thus, a quantitative structure-performance relationships (QSPR) model was established on the basis of four molecular descriptors (Table 2).
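As a non-limiting illustration of the partial least squares modeling used in this example, the following minimal sketch fits a PLS model with a single latent variable and a single output on a training set and applies it to a test compound. The number of latent variables actually used is not stated here, so the single component is an assumption, as are the function names, the names w and p for the weight and loading vectors, and the toy data.

```python
def pls1_one_component(X, y):
    """Fit a one-latent-variable PLS1 model on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores of the training compounds along that direction.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t)
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
    c = sum(yc[i] * t[i] for i in range(n)) / tt                         # y loading
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "p": p, "c": c}

def predict(model, x):
    """Predict the performance of one compound from its descriptor vector."""
    t = sum((xj - mj) * wj for xj, mj, wj in zip(x, model["x_mean"], model["w"]))
    return model["y_mean"] + model["c"] * t

# Hypothetical training and test data with two descriptors per compound.
train_X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
train_y = [3.0, 5.0, 7.0, 9.0]
test_x = [5.0, 10.0]

model = pls1_one_component(train_X, train_y)
predicted = predict(model, test_x)
```

Comparing such predictions with the measured values of the test data set gives an estimate of how well the model generalizes to compounds that were not used for fitting.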
As shown in
The developed QSPR model is inverted through numerical optimization under the constraints described below.
$$
\begin{aligned}
x^{*} = \arg\max\ & \log \hat{k}_w \\
\text{s.t.}\quad & \log \hat{k}_w = C\hat{t} \\
& \hat{t} = Px \\
& \hat{t}^{T} S_t^{-1} t \le c_1 \\
& \lVert P\hat{t} - x \rVert \le c_2
\end{aligned}
$$
$x$: a vector of molecular descriptors of a novel drug
$x^{*}$: a vector of molecular descriptors of a novel drug calculated through mathematical programming based on the above expression
$C$: output variable (i.e., lipophilicity) loading matrix of partial least squares (PLS)
$t$: a PLS score vector of input variables (where the input variables are the molecular descriptors $x$)
$P$: PLS loading matrix
$\hat{\ }$ (circumflex): value predicted by the PLS model
$S_t$: sample covariance matrix of $t$
$c_1$, $c_2$: appropriate constants
The optimum molecular descriptors were obtained through this method, and the optimum molecular descriptors were compared against a database of previously generated drug candidates. Each of the drug candidates was then rated according to the Euclidean distance. This procedure yielded 11 drug candidates for a maximum log kw value of 3.965 (Table 3).
In order to verify the derived candidate compounds for a target-based drug, as shown in
The molecular dynamics simulation showed that the Zn2+ ion was coordinated with three histidine residues at the active site. This was checked to ensure that the function of the CA IX enzyme was not impaired in the simulation.
The stability and flexibility of the enzyme were analyzed by calculating the root mean square deviation (RMSD) and the root mean square fluctuation (RMSF). The hydrogen bonds and hydrophobic and hydrophilic interactions were also evaluated.
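As a non-limiting illustration of these two quantities, the following minimal sketch computes the RMSD between a frame and a reference conformation and the per-atom RMSF over a trajectory. It assumes that all frames have already been superimposed on the reference, so no fitting step is included, and the coordinates are hypothetical.

```python
import math

def rmsd(frame, reference):
    """RMSD between two conformations given as lists of (x, y, z) coordinates."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(squared / len(reference))

def rmsf(trajectory):
    """Per-atom RMSF: fluctuation of each atom around its time-averaged position."""
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result

# Hypothetical trajectory: two frames of a two-atom system (coordinates in nm).
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]

deviation = rmsd(trajectory[1], trajectory[0])
fluctuation = rmsf(trajectory)
```

The RMSD summarizes how far a whole structure has drifted from the reference at a given time, whereas the RMSF is resolved per atom or residue and therefore indicates which regions are flexible.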
As shown in
This discrepancy is due to the small size of the 9FK sulfonamide. Since the screened compounds are bulky, the enzyme must adapt its conformation so that it can accommodate the compound within its active site.
The root mean square fluctuation (RMSF) was calculated for all the complexes to evaluate the conformational ensemble of CA IX as shown in
The results showed that the enzyme was the most stable in the CA IX-9FK complex. All the other complexes showed similar RMSF profiles. However, in most cases, higher RMSF values were exhibited in two typical regions: (i) the flexible N-terminus (residues 9 to 20) and (ii) flexible loops (residues 230 to 240).
The hydrogen bonding network found in the CA IX structure was analyzed to determine the cause of the difference in RMSF value. To this end, the percentage of hydrogen bonds that are formed was calculated through simulation (Table 4). It was found that a hydrogen bond was formed in each of the W9-H68, A133-R136, and R136-G139 bond pairs.
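As a non-limiting illustration of such a percentage, the following minimal sketch counts the fraction of simulation frames in which a given donor-acceptor pair satisfies a geometric hydrogen bond criterion. The criterion itself (a distance cutoff of 0.35 nm and a minimum donor-hydrogen-acceptor angle of 150 degrees) is an assumption made for this example, as are the per-frame values.

```python
def hbond_percentage(distances_nm, angles_deg, max_dist=0.35, min_angle=150.0):
    """Percentage of frames in which both geometric criteria are satisfied.

    distances_nm : donor-acceptor distance in each frame (nm)
    angles_deg   : donor-hydrogen-acceptor angle in each frame (degrees)
    The default cutoffs are illustrative assumptions, not values from the example.
    """
    formed = sum(
        1
        for d, a in zip(distances_nm, angles_deg)
        if d <= max_dist and a >= min_angle
    )
    return 100.0 * formed / len(distances_nm)

# Hypothetical per-frame geometry of one residue pair.
distances = [0.30, 0.40, 0.28, 0.33]
angles = [160.0, 170.0, 120.0, 155.0]

occupancy = hbond_percentage(distances, angles)
```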
The stability of the designed sulfonamide derivatives at the active site of the CA IX has a significant impact on the interaction. For this reason, changes in the conformation and flexibility of enzymes are considered. Their flexibility was observed by analyzing the root mean square deviation (RMSD) value when overlaying the conformations first in 22.06 ns. The results showed that all of the ligands were quite stable at the active site.
The interactions between CA IX and the designed sulfonamide derivatives and between CA IX and 9FK were analyzed by calculating the percentage of ligand-amino acid residue interactions (Table 5) and then calculating the average number of atoms and residues within 0.35 nm of the ligand throughout the simulation (Table 6).
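As a non-limiting illustration of the averaged contact counts, the following minimal sketch counts, for each frame, the protein atoms and the distinct residues that lie within 0.35 nm of any ligand atom, and then averages both counts over the frames. The data layout, the residue labels, and the coordinates are hypothetical.

```python
import math

def contacts_in_frame(ligand_atoms, protein_atoms, cutoff=0.35):
    """Count protein atoms and distinct residues within `cutoff` nm of the ligand.

    ligand_atoms  : list of (x, y, z) coordinates
    protein_atoms : list of (residue_label, (x, y, z)) pairs
    """
    close_atoms = 0
    close_residues = set()
    for residue, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            close_atoms += 1
            close_residues.add(residue)
    return close_atoms, len(close_residues)

def average_contacts(frames, cutoff=0.35):
    """Average the per-frame atom and residue counts over all frames."""
    counts = [contacts_in_frame(lig, prot, cutoff) for lig, prot in frames]
    n = len(counts)
    return sum(a for a, _ in counts) / n, sum(r for _, r in counts) / n

# Hypothetical two-frame trajectory with one ligand atom and three protein atoms.
frames = [
    ([(0.0, 0.0, 0.0)],
     [("RES_A", (0.3, 0.0, 0.0)), ("RES_A", (0.2, 0.0, 0.0)), ("RES_B", (1.0, 0.0, 0.0))]),
    ([(0.0, 0.0, 0.0)],
     [("RES_A", (0.3, 0.0, 0.0)), ("RES_A", (0.5, 0.0, 0.0)), ("RES_B", (0.1, 0.0, 0.0))]),
]

avg_atoms, avg_residues = average_contacts(frames)
```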
The interactions of 9FK determined by crystallographic studies were maintained throughout the simulation. The C65 ligand showed very similar characteristics except for its interaction with residue E106. The C66 complex, in contrast, differs the most from the CA IX-9FK complex: only two of the seven residues that interact with the ligand were conserved. Stronger hydrophobic and hydrophilic interactions were observed in the simulations of the other complexes.
As noted above, according to the example, 14 novel drug candidates were proposed through an inverse quantitative structure-performance relationships analysis, and molecular dynamics simulations were performed on 11 of the 14 candidates. As a result, all 11 candidates were found to be more suitable for the inhibition of CA IX than the sulfonamide used as the baseline compound.
In addition, all hydrophobic and hydrophilic interactions between substitution groups and active sites were carefully analyzed. Since crystallization of CA IX-ligand complexes is very difficult due to the complex membrane binding structures of enzymes, such an analysis can provide insight into and guidance for future synthesis.
According to the analysis results, two compounds, C59 and C34, are particularly promising for actual synthesis for the in vitro and in vivo experiments to be performed in the subsequent step. The two compounds are (i) (5-chloro-4-methyl-2-sulfamoyl-phenyl) (1E)-4-chloro-5-hydroxy-N-(4-methylanilino)pentaneimidothioate and (ii) (5-chloro-4-methyl-2-sulfamoyl-phenyl) (1E)-N-(4-methyl-2-nitro-anilino)hexamidothioate.
As described above, the target-based drug screening method according to the present invention can contribute to the rapid and efficient discovery of novel target-based drugs. That is, the simplicity and rapid computation of the inverse quantitative structure-performance relationships analysis and the molecular dynamics simulation significantly reduce the cost of drug discovery and make it possible to avoid the synthesis of false compounds and wasteful pharmaceutical tests of the false compounds.
The present invention can be applied to an initial research stage for drug development because it is possible to significantly reduce investment for discovery of target-based drug and to avoid synthesis and wasteful pharmaceutical tests of false compounds.
Number | Date | Country | Kind |
---|---|---|---|
10-2017-0085981 | Jul 2017 | KR | national |
This is a U.S. National Stage Application, filed under 35 U.S.C. 371, of International Patent Application No. PCT/KR2017/007269, filed on Jul. 6, 2017, which claims priority to Korean Patent Application No. 10-2017-0085981 filed on Jul. 6, 2017, contents of both of which are incorporated herein by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2017/007269 | 7/6/2017 | WO | 00 |