The present invention relates to compound safety prediction devices, compound safety prediction programs, and compound safety prediction methods.
There are several tens of millions of kinds of compounds used in chemicals, pharmaceuticals, or the like, and the compounds have various structures. Because a compound may have a harmful effect on the ecosystem and environment, it is extremely important to predict various safety properties of the compound, such as degradability, toxicity, or the like. Hence, in various fields, such as the chemical industry, the pharmaceutical industry, or the like, there are studies to develop a compound safety prediction device for predicting various safety properties of the compound.
If the accuracy of the compound safety prediction is low, the compound has a possibility of harming humans and the environment, and thus, in order to put the safety prediction device into practical use, it is essential to realize an extremely high reliability for the prediction of the compound safety.
As a compound safety prediction device for predicting the safety of the compound, a safety evaluation system has been proposed which includes a means for learning and analyzing descriptors effective for specific evaluation of a cosmetic material from descriptors computed using information related to cosmetic materials, and a means for searching for an evaluation model effective for the specific evaluation using the analyzed descriptors, and acquiring a prediction value of irritancy, sensitization, or repeated-dose toxicity of the cosmetic material (refer to Patent Document 1, for example).
In addition, as another compound safety prediction technique, a compound safety evaluation method has been proposed which computes a similarity between a molecule of a general chemical substance with unknown teratogenicity and all pharmaceutical molecules with known teratogenicity prestored in a database, and computes a similarity of a chemical structure that is provided by scores of pharmaceutical safety evaluation related to the molecule of the general chemical substance in a descending order of the scores of the similarity (refer to Patent Document 2, for example).
Patent Document 1: Japanese Patent No. 5512077
Patent Document 2: Japanese Laid-Open Patent Publication No. 2007-153767
However, the technique of Patent Document 1 is limited to prediction of the irritancy, sensitization, or repeated-dose toxicity of the cosmetic material, and there is a problem in that the safety of the compound cannot be predicted with a high accuracy depending on the kind of compound, such as a new compound different from conventional compounds, or the like.
In addition, in the technique of Patent Document 2, there is a problem in that the compound safety evaluation is troublesome to perform and inconvenient for a user, because the similarity must be computed with respect to all of the pharmaceutical molecules registered in the database and the reference must be made to the safety data of the similar molecules.
One object of one aspect of the present invention is to provide a compound safety prediction device capable of evaluating the safety of the compound with a high accuracy while improving convenience for the user.
The present invention includes the following configurations.
[1] A compound safety prediction device including:
[2] The compound safety prediction device according to [1] above, wherein the output unit outputs
[3] The compound safety prediction device according to [1] above, further including:
[4] The compound safety prediction device according to [3] above, wherein the output unit outputs
[5] The compound safety prediction device according to [4] above, wherein the output unit, when the confidence score of the prediction is low, outputs
[6] The compound safety prediction device according to any one of [1] to [5] above, wherein the safety prediction unit includes:
[7] The compound safety prediction device according to [6] above, wherein the feature value computation unit computes the feature value of the molecule using one or more of a fingerprint based on the structural formula of the molecule, a physical property value computed by quantum chemical computation, based on the structural formula of the molecule, a physical property value obtained by a quantitative structure activity relationship between the structural formula and the physical property value of the molecule, and a prediction value obtained by a trained model that has learned the relationship between the structural formula and the physical property value of the molecule.
[8] The compound safety prediction device according to any one of [1] to [7] above, wherein the similar molecule data searching unit includes:
[9] A compound safety prediction program which causes a computer to execute the steps of:
[10] A compound safety prediction method including the steps of:
According to one aspect of the compound safety prediction device, the compound safety prediction program, and the compound safety prediction method according to the present invention, the safety of a compound can be appropriately evaluated by quantifying the confidence score of the prediction of the safety of a molecule, and in a case where the confidence score is high, the safety of the compound can be evaluated easily, quickly, and with a high accuracy by adopting the prediction result as it is. Hence, according to one aspect of the compound safety prediction device, the compound safety prediction program, and the compound safety prediction method according to the present invention, it is possible to evaluate the safety of the compound with a high accuracy, while improving the convenience for the user.
Hereinafter, embodiments of the present invention will be described in detail. In order to facilitate understanding of the description, the same constituent elements are designated by the same reference numerals in each of the drawings, and a redundant description thereof will be omitted. In addition, in the present specification, “to” indicating a numerical range, includes numerical values appearing before and after the “to”, as a lower limit value and an upper limit value of the range, unless indicated otherwise.
A compound safety prediction device according to a first embodiment of the present invention will be described.
The safety prediction device 1A outputs a prediction result of a molecule safety evaluation and a confidence score of the prediction obtained by the safety prediction unit 20, and safety evaluation data obtained by the similar molecule data searching unit 30. Accordingly, a device user (user) can consider whether or not to adopt the prediction result as it is when the confidence score is high, and whether or not to adopt either the prediction result or the safety evaluation data when the confidence score is low. Hence, the safety prediction device 1A quantifies the confidence score and outputs the quantified confidence score, so that the user can judge the safety of the compound, based on at least one of the prediction result of the molecule safety evaluation obtained by the safety prediction unit 20 and the safety evaluation data obtained by the similar molecule data searching unit 30. Thus, the safety prediction device 1A can improve the convenience for the user, and improve the accuracy of the compound safety evaluation.
The term output includes displaying on a screen, generating sound, or the like, as will be described later.
The high confidence score and the low confidence score referred to here are the same as the high confidence score and the low confidence score which will be described later, respectively, and a threshold value for determining whether the confidence score is high or low can be set appropriately according to a type of molecule to be evaluated for safety. For example, when the threshold value is set to 50%, the confidence score is determined to be high if the confidence score is greater than or equal to the threshold value, and is determined to be low if the confidence score is less than the threshold value.
The safety is an index indicating a magnitude of a load of a compound on humans and the environment, and examples of the safety include biodegradability, bioaccumulation, mutagenicity, acute toxicity, acute immobilization, growth inhibition, repeated-dose toxicity, or the like.
The input unit 10 inputs a structural formula of one or more molecules for which the safety is to be evaluated.
SMILES or the like can be used as the structural formula. The SMILES is a character string representing a molecular structure of a compound. An example of a table describing the structural formulas (SMILES) is illustrated in
The input unit 10 may check the input structural formula of the molecule for a description error. When the user inputs the structural formula, there is a possibility of making an erroneous input. By checking the input, the input unit 10 can detect a structural formula that includes a description error.
The input unit 10 may determine whether or not the input structural formula of the molecule includes a description error, by checking whether or not the input structural formula of the molecule can be converted into a molecule (Mol) object, using RDKit or the like, which is available in a software distribution such as Anaconda (registered trademark) distributed by Anaconda, Inc. of the United States, for example. In a case where the structural formula is the SMILES, MolFromSmiles included in the RDKit is used to read the character string of the SMILES as the structural formula of the molecule. In a case where the SMILES is converted into the Mol object and the Mol object is normally created, it can be determined that there is no description error in the input structural formula of the molecule. On the other hand, in a case where the SMILES cannot be converted into the Mol object and the Mol object is not created, it can be determined that the input structural formula of the molecule includes a description error.
The input unit 10 may separately create a table including the structural formulas having no description error and a table including the structural formulas having a description error, and output the tables by the output unit 80 which will be described later. Thus, even in a case where the user fails to input the structural formula correctly and the structural formula includes an error, the safety prediction device 1A can predict a safety evaluation without abnormally ending the prediction.
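As a non-limiting illustration of the check described above, the following sketch separates input SMILES strings into a list having no description error and a list having a description error, without stopping on the first bad input. The names `split_by_description_error` and `toy_parse` are hypothetical and do not appear in the embodiment. The parser is passed in as an argument and is only assumed to return None on failure, which is how MolFromSmiles of the RDKit behaves; `toy_parse` is a stand-in that merely checks for balanced parentheses so that the sketch runs without the RDKit.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def split_by_description_error(
    smiles_list: Iterable[str],
    parse: Callable[[str], Optional[object]],
) -> Tuple[List[str], List[str]]:
    """Return (no_error, with_error) lists of structural formulas.

    `parse` must return None when a string cannot be converted into a
    molecule object (assumption: same contract as RDKit's MolFromSmiles).
    """
    no_error: List[str] = []
    with_error: List[str] = []
    for smiles in smiles_list:
        if parse(smiles) is None:
            with_error.append(smiles)   # conversion failed: description error
        else:
            no_error.append(smiles)     # Mol object created normally
    return no_error, with_error


def toy_parse(smiles: str) -> Optional[str]:
    """Stand-in parser for illustration only: rejects empty strings and
    unbalanced parentheses. It is NOT a real SMILES parser."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return smiles if smiles and depth == 0 else None


no_error, with_error = split_by_description_error(
    ["CCO", "CC(C", "c1ccccc1"], toy_parse
)
```

With the RDKit installed, `Chem.MolFromSmiles` (after `from rdkit import Chem`) could be passed as `parse` in place of the stand-in.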
As illustrated in
The feature value computation unit 21 computes a feature value based on the structural formula of the molecule.
The feature value can be obtained based on the structural formula of the molecule having no description error. Fingerprints based on the structural formulas of the molecules can be used as the feature value, such as fingerprints corresponding to Extended Connectivity Fingerprints (ECFP) computed using Morgan fingerprints (circular fingerprints) implemented in the RDKit, other fingerprints such as AtomPair, or the like. The feature value may be a physical property, such as an octanol/water partition coefficient (logP) or the like, representing a liposolubility of the molecule. The fingerprint may express the presence and the absence of a partial structure by 1 and 0, respectively, express a number of partial structures, or express a ratio of partial structures obtained by dividing the number of partial structures by a number of constituent atoms.
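The three fingerprint expressions mentioned above (presence or absence, number, and ratio) can be sketched as follows, for example. This is a minimal illustration under assumptions: the function name `encode_fingerprint`, the partial structure keys, and the counts are made up for the example, and the counting of partial structures itself (performed by the RDKit or the like in the embodiment) is taken as already done.

```python
from typing import Dict, List, Tuple


def encode_fingerprint(
    counts: Dict[str, int],
    keys: List[str],
    n_atoms: int,
) -> Tuple[List[int], List[int], List[float]]:
    """Encode partial-structure counts in three ways over a fixed key order.

    Returns (bit vector, count vector, ratio vector), where the ratio is
    the count divided by the number of constituent atoms.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    count_vec = [counts.get(key, 0) for key in keys]
    bit_vec = [1 if c > 0 else 0 for c in count_vec]   # presence = 1, absence = 0
    ratio_vec = [c / n_atoms for c in count_vec]        # count / number of atoms
    return bit_vec, count_vec, ratio_vec


# Hypothetical partial structures and counts for one molecule.
bits, counts_vec, ratios = encode_fingerprint(
    {"frag_a": 2, "frag_b": 1}, ["frag_a", "frag_b", "frag_c"], n_atoms=4
)
```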
The feature value may be computed using one or more of a physical property value computed by quantum chemical computation based on the structural formula of the molecule, a physical property value obtained by a quantitative structure activity relationship between the structural formula and the physical property value of the molecule, and a prediction value obtained by a trained model that has learned the relationship between the structural formula and the physical property value of the molecule. Examples of the physical property value computed by the quantum chemical computation include HOMO, LUMO, electric charge, refractive index, frequency, or the like. The structure activity relationship refers to a correlation between the chemical structural feature (or a physicochemical constant) of a substance, and the biological activity (for example, the degradability, accumulation, various toxicity endpoints, or the like).
In addition, the feature value may be a physical property measurable by an experiment, such as a melting point, a viscosity, a specific surface area, or the like.
The predictor 22 predicts the safety evaluation of the molecule based on the feature value computed by the feature value computation unit 21, and computes the confidence score of the prediction.
For example, a biochemical oxygen demand (BOD) or the like can be used as an index for evaluating the safety of the molecule. In a case where the BOD is greater than or equal to a predetermined value (for example, 60%), the safety of the molecule can be evaluated as being good.
The confidence score of the prediction can be computed using the characteristic prediction model 70. The predictor 22 inputs the feature value computed by the feature value computation unit 21, as an explanatory variable, to the characteristic prediction model 70, and outputs a classification probability P(OK) that indicates a classification result “OK”, where OK indicates good. The predictor 22 computes a confidence score (unit: %) of the prediction, with respect to a classification probability P(OK) for which the classification result is “OK”, using the following formula (1).
(In the formula (1), P(OK) indicates a classification probability for which the classification result is “OK”.)
The confidence score of the prediction takes a value from 0% to 100%, and the percentage of correct answers of the prediction result becomes higher as the confidence score of the prediction becomes closer to 100%. For this reason, the user can easily judge, from the confidence score of the prediction, whether or not the prediction result is reliable.
The confidence score of the prediction corresponds to the classification probability as in the formula (1) described above, and the confidence score of the prediction varies according to a magnitude of the classification probability.
The threshold values for determining the high confidence score and the low confidence score can be set appropriately according to the type of molecule to be evaluated for safety, and are preferably 50%, respectively, for example.
The predictor 22 can create a table of the prediction result of the safety evaluation of the molecule, including the structural formula of each molecule, the prediction result, and the confidence score of the prediction. An example of the table describing the prediction result of the safety evaluation of the molecule is illustrated in
As illustrated in
The feature value computation unit 21 may create a table of the prediction result of the safety evaluation of the molecule, including the prediction result of the safety evaluation of the molecule and the confidence score of the prediction, as illustrated in
As illustrated in
The similarity evaluation unit 31 computes and evaluates a similarity between the structural formula of the molecule input from the input unit 10, and the structural formulas of the plurality of evaluated molecules stored in a safety evaluation database 33. The similarity evaluation unit 31 may use the SMILES for the structural formula of the molecule.
The safety evaluation database 33 stores safety evaluation data of the evaluated molecules evaluated in the past.
The similarity can be obtained by computing a Tanimoto coefficient, using BulkTanimotoSimilarity implemented in the RDKit. The similarity may be a Dice coefficient, a cosine (cos) similarity, or the like.
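For reference, the three similarity measures named above can be written out for binary fingerprints as follows. This is a sketch under an assumption: each fingerprint is represented as the set of its on-bit positions, which is an illustrative choice and not the data structure used by the RDKit. The function names are hypothetical.

```python
import math
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: |A and B| / |A or B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient: 2 |A and B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity for binary vectors: |A and B| / sqrt(|A| |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Two hypothetical fingerprints sharing two of their four on-bits.
fp_query = {1, 2, 3, 4}
fp_other = {3, 4, 5, 6}
```

For these two fingerprints, the Tanimoto coefficient is 2/6, and the Dice coefficient and the cosine similarity are both 0.5, which shows that the measures rank pairs similarly but are not numerically interchangeable.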
The similarity evaluation unit 31 can vary the number of safety evaluation data of similar molecules to be acquired as appropriate according to the purpose, ease of use, or the like, among the safety evaluation data stored in the safety evaluation database 33, and may acquire a predetermined number of data corresponding to top similarities (for example, top 20 data), as the safety evaluation data of the similar molecule (similar molecule data).
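The acquisition of a predetermined number of top similarities can be sketched as follows, for example. The function name `top_similar`, the dictionary layout of the database, and the set-of-on-bits fingerprint representation are assumptions made for illustration; the default of 20 follows the example given above.

```python
import heapq
from typing import Dict, List, Set, Tuple


def top_similar(
    query: Set[int],
    database: Dict[str, Set[int]],
    n: int = 20,
) -> List[Tuple[str, float]]:
    """Return up to n (identifier, similarity) pairs, most similar first."""

    def tanimoto(a: Set[int], b: Set[int]) -> float:
        union = len(a | b)
        return len(a & b) / union if union else 0.0

    scored = ((name, tanimoto(query, fp)) for name, fp in database.items())
    # nlargest returns the results already sorted in descending order.
    return heapq.nlargest(n, scored, key=lambda item: item[1])


# Hypothetical evaluated molecules keyed by an identifier.
evaluated = {"A": {1, 2, 3}, "B": {1, 2}, "C": {9}}
result = top_similar({1, 2, 3}, evaluated, n=2)
```

Changing `n` corresponds to varying the number of similar molecule data to be acquired according to the purpose or ease of use.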
As illustrated in
The determination by the Chemical Substances Control Law refers to a determination by the “Act on the Evaluation of Chemical Substances and Regulation of Their Manufacture, etc.”.
The residual change substance refers to a change substance remaining after a biodegradability test performed in conformance with the Chemical Substances Control Law or the like.
As illustrated in
As illustrated in
The similarity evaluation unit 31 displays information on the molecule to be evaluated, and information related to the similar molecules, together in the table including the safety evaluation data of the similar molecule, so that the molecule to be evaluated and the similar molecules can be compared visually, and thus, the user can easily determine the safety evaluation data of the similar molecule to be referred to among the similar molecules.
The similarity evaluation unit 31 may create a table including the safety evaluation data of the similar molecule, such as that illustrated in
The data searching unit 32 acquires the safety evaluation data of the similar molecule having a high similarity.
As illustrated in
The integration unit 40 may output the integrated file from the output unit 80 which will be described later. In this case, the user can easily ascertain together the information related to the molecule to be evaluated and the information related to the safety evaluation of the similar molecules, which are included in the integrated file.
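One possible shape of the integrated data can be sketched as follows, for example. The function name `integrate`, the record keys, and the values are hypothetical; the sketch only assumes that the prediction results and the similar molecule data can both be keyed by the structural formula (SMILES) of the molecule to be evaluated.

```python
from typing import Any, Dict, List


def integrate(
    predictions: List[Dict[str, Any]],
    similar: Dict[str, List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Attach similar-molecule data to each prediction record by SMILES."""
    integrated = []
    for row in predictions:
        merged = dict(row)  # copy so the input table is left unchanged
        merged["similar_molecules"] = similar.get(row["smiles"], [])
        integrated.append(merged)
    return integrated


# Hypothetical tables for two molecules to be evaluated.
predictions = [
    {"smiles": "CCO", "prediction": "OK", "confidence_percent": 80.0},
    {"smiles": "CCN", "prediction": "NG", "confidence_percent": 30.0},
]
similar = {"CCO": [{"smiles": "CCCO", "similarity": 0.6}]}
integrated = integrate(predictions, similar)
```

Keeping one record per molecule to be evaluated lets the user see the prediction result, the confidence score, and the similar molecule data side by side.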
The storage unit 50 stores, as training data, associated data in which the structural formula of the molecule of the compound, the safety evaluation, the feature value of the compound, the characteristics of the compound, or the like are associated with one another. An example of the training data table is illustrated in
The storage unit 50 may update the associated data by inputting the structural formula (for example, the SMILES or the like) of the molecule of the compound, the feature value of the compound, the characteristics of the compound, or the like to the associated data.
The model training unit 60 trains a model by utilizing the associated data stored in the storage unit 50 as the training data.
More particularly, the model training unit 60 uses the structural formula (for example, the SMILES or the like) of the molecule of the compound and the feature value of the compound, stored in the storage unit 50, as an explanatory variable, and uses the characteristics of the compound to be predicted as the objective variable. Thus, the model training unit 60 trains the model for specifying the correspondence relationship between the feature value of the compound and the characteristics of the compound, and generates a trained model (characteristic prediction model 70). The model training unit 60 trains the model, so that the correspondence relationship approaches the correspondence relationship of the training data by machine learning.
The machine learning applied to the model is preferably a supervised learning algorithm. Examples of the supervised learning include linear regression, logistic regression, random forest, boosting, support vector machine (SVM), neural network, or the like. The neural network may use deep learning in which a multi-layer neural network is formed of more than three layers. Examples of the type of the neural network include a convolutional neural network (CNN), a recurrent neural network (RNN), a general regression neural network, or the like. In addition, the model may be expressed by a mathematical expression, such as a function or the like.
More particularly, the model may be a machine learning model constructed using Anaconda (registered trademark) or the like, which is software distributed by Anaconda, Inc. of the United States.
Anaconda (registered trademark) includes library groups used by machine learning, such as scikit-learn or the like, and the model training unit 60 may perform machine learning using one or more of such libraries.
Further, the model training unit 60 may perform relearning on the trained model by using, from the safety evaluation data newly stored in the storage unit 50, the structural formula (for example, the SMILES or the like) of the molecule of the compound and the feature value of the compound as explanatory variables, and the characteristics of the compound as objective variables.
The first acquisition unit 61 acquires the training data, which includes a table listing the structural formulas (for example, the SMILES) of the molecules of the compounds, and a table listing the characteristics of the compounds.
The training data can be stored in a file having a format, such as CSV, Excel which is spreadsheet software, or the like, for example.
The second acquisition unit 62 acquires the molecular structure of one molecule, from the training data acquired by the first acquisition unit 61.
The molecular structure of one molecule is preferably the SMILES of one molecule.
The function unit 63 computes the feature value, based on the molecular structure of one molecule acquired by the second acquisition unit 62. Because the feature value can be computed in a manner similar to the feature value computation unit 21, a detailed description thereof will be omitted.
The determination unit 64 determines whether or not the feature values of all molecules included in the training data are computed.
The model 65 is trained by the model training unit 60, using the structural formula of the molecule of the compound and the feature value of the compound stored in the storage unit 50 as explanatory variables, and the characteristics of the compound stored in the storage unit 50 as objective variables.
The storage unit 66 stores the trained model generated by the model training unit 60 which trains the model 65.
As illustrated in
The high and low levels of the confidence score of the prediction may be appropriately set according to the predetermined value of the classification probability. In a case where the predetermined value of the classification probability is 0.50, the high confidence score of the prediction refers to a case where the confidence score of the prediction is greater than or equal to 50%, for example, and the low confidence score of the prediction refers to a case where the confidence score of the prediction is less than 50%, for example.
The output unit 80 outputs the prediction result of the safety evaluation of the molecule, the confidence score of the prediction, and the safety evaluation data of the similar molecule obtained by the integration unit 40. That is, the output unit 80 outputs the integrated file.
The output may utilize any method capable of notifying the user, and includes displaying on a screen, generating sound, or the like.
In addition, the output unit 80 may output a table of structural formulas (for example, the SMILES) having no description error, and a table of structural formulas having a description error, which are created by the input unit 10. Moreover, the output unit 80 may output a table of the prediction results of the safety evaluation of the molecules, including the prediction results of the safety evaluation of the molecules and the confidence scores of the prediction, which is created by the safety prediction unit 20, or may output the safety evaluation data of the similar molecule, including information related to the similar molecules, which is created by the similarity evaluation unit 31. Further, the output unit 80 may refer to the integrated file, and output the safety evaluation data of the similar molecule when the confidence score of the prediction of the safety evaluation of the molecule is low.
The output unit 80 may output a message related to the prediction result of the safety evaluation of the molecule and the confidence score of the prediction in a case where the confidence score of the prediction of the safety evaluation of the molecule is high (high confidence score), and output a message related to the prediction result of the safety evaluation of the molecule, the confidence score of the prediction, and the safety evaluation data of the similar molecule in a case where the confidence score of the prediction of the safety evaluation of the molecule is low (low confidence score).
For example, the contents of the message may be “The prediction result of the safety evaluation of the molecule is high, and the confidence score of the prediction is greater than or equal to 50%.” or the like in the case where the confidence score of the prediction is high, and may be “The prediction result of the safety evaluation of the molecule is low, and the confidence score of the prediction is less than 50%.” or the like in the case where the confidence score of the prediction is low.
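The switching of the output contents according to the confidence score can be sketched as follows, for example. The function name `build_output` and the dictionary keys are hypothetical, the confidence score is taken as an already computed percentage, and the threshold of 50% follows the example given above.

```python
from typing import Any, Dict, List


def build_output(
    prediction: str,
    confidence: float,
    similar_data: List[Dict[str, Any]],
    threshold: float = 50.0,
) -> Dict[str, Any]:
    """Select what to output depending on the confidence score (unit: %)."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError("confidence must be between 0 and 100")
    is_high = confidence >= threshold  # high if greater than or equal to threshold
    output: Dict[str, Any] = {
        "prediction": prediction,
        "confidence_percent": confidence,
        "confidence_level": "high" if is_high else "low",
    }
    if not is_high:
        # Low confidence: also present the similar-molecule data for reference.
        output["similar_molecule_data"] = similar_data
    return output


reference = [{"smiles": "CCCO", "similarity": 0.6}]
high_case = build_output("OK", 80.0, reference)
low_case = build_output("OK", 30.0, reference)
```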
A compound safety prediction program according to the present embodiment (hereinafter, simply referred to as a “safety prediction program”) can use a program having the following configuration.
That is, the safety prediction program according to the present embodiment may use a program which causes a computer to execute at least the steps of:
Next, a compound safety prediction method (hereinafter, simply referred to as a “safety prediction method”) applied to the safety prediction device according to the present embodiment will be described. The safety prediction method applied to the safety prediction device according to the present embodiment is a method for predicting the safety evaluation of the compound using the safety prediction device 1A having the configuration illustrated in
Next, a training method of the characteristic prediction model 70 used in the safety prediction method will be described. As described above, because the model 65 constructed by the model training unit 60 is applied to the characteristic prediction model 70, the training method of the characteristic prediction model 70 will be described as the training method of the model 65.
In the model training method, the first acquisition unit 61 of the safety prediction device 1A acquires the training data (training data acquisition process: step S11).
The training data includes a table listing the structural formulas (for example, the SMILES) of the molecules of the compounds, a table listing the characteristics of the compounds, or the like.
Next, the second acquisition unit 62 of the safety prediction device 1A acquires the structural formula of one molecule from the training data (acquisition process to acquire the structural formula of one molecule: step S12).
The structural formula of one molecule may be the SMILES of one molecule.
Next, the function unit 63 of the safety prediction device 1A computes the feature value by utilizing a library group included in Anaconda (registered trademark), such as scikit-learn, RDKit, or the like, using the structural formula of one molecule acquired by the second acquisition unit 62 (feature value computation process: step S13).
Next, the determination unit 64 of the safety prediction device 1A determines whether or not the feature values of all of the molecules included in the training data are computed (determination process to determine the feature values of all of the molecules: step S14).
If the feature values of all of the molecules are not yet computed (step S14: No), the process returns to the acquisition process to acquire the structural formula of one molecule (step S12), and the structural formula of a remaining molecule for which the feature value is not computed is acquired.
If the feature values of all of the molecules are computed (step S14: Yes), the model training unit 60 trains the model using training data in which the explanatory variables including the feature values of all of the molecules and the objective variables including the characteristics of all of the molecules are associated with one another, and constructs the model 65 (training process: step S15).
The model training unit 60 trains the model so that the output coincides with the objective variable associated with the explanatory variable, according to the input of the explanatory variable included in the training data.
Next, the safety prediction device 1A stores the model constructed by the model training unit 60 in the storage unit 66 (storing process: step S16).
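The flow of steps S11 to S16 can be sketched as follows, for example. This is a control-flow illustration only, under stated assumptions: `train_and_store`, `toy_featurize`, and `toy_fit` are hypothetical names, the feature computation is a character count rather than a fingerprint, and the "model" is a majority-label stand-in rather than any of the supervised learning algorithms named in the embodiment. The trained object is stored by serialization with pickle, which is one possible choice.

```python
import io
import pickle
from collections import Counter
from typing import Any, BinaryIO, Callable, List, Sequence, Tuple


def train_and_store(
    rows: Sequence[Tuple[str, Any]],
    featurize: Callable[[str], List[float]],
    fit: Callable[[List[List[float]], List[Any]], Any],
    out: BinaryIO,
) -> Any:
    """rows: (structural formula, characteristic) pairs acquired in S11."""
    features: List[List[float]] = []
    targets: List[Any] = []
    for smiles, characteristic in rows:       # S12: acquire one molecule
        features.append(featurize(smiles))    # S13: compute its feature value
        targets.append(characteristic)
    # S14: the loop ends once the feature values of all molecules are computed.
    model = fit(features, targets)            # S15: train on (explanatory, objective)
    pickle.dump(model, out)                   # S16: store the trained model
    return model


def toy_featurize(smiles: str) -> List[float]:
    """Stand-in feature value: character counts, not a real fingerprint."""
    return [smiles.count("C"), smiles.count("O"), smiles.count("N")]


def toy_fit(features: List[List[float]], targets: List[Any]) -> dict:
    """Stand-in 'model': remembers the most frequent label."""
    return {
        "majority_label": Counter(targets).most_common(1)[0][0],
        "n_samples": len(targets),
    }


rows = [("CCO", "OK"), ("c1ccccc1", "NG"), ("CC(=O)O", "OK")]
buffer = io.BytesIO()
model = train_and_store(rows, toy_featurize, toy_fit, buffer)
restored = pickle.loads(buffer.getvalue())
```

With scikit-learn installed, a callable such as `lambda X, y: RandomForestClassifier().fit(X, y)` could be passed as `fit`, since `fit` of scikit-learn estimators returns the fitted estimator.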
Next, the safety prediction method applied to the safety prediction device according to the present embodiment will be described.
Next, the safety prediction unit 20 of the safety prediction device 1A checks the input structural formula for a description error (checking process: step S22).
The details of the checking process (step S22) will be described later. The checking process (step S22) may be omitted.
Next, the safety prediction unit 20 of the safety prediction device 1A predicts the safety evaluation of the molecules and computes the confidence score of the prediction, and acquires a table of the prediction result of the safety evaluation of the molecules, including the prediction of the safety evaluation of the molecule and the confidence score of the prediction (computation process to predict the safety evaluation of the molecule and compute the confidence score of the prediction: step S23).
The details of the computation process (step S23) to predict the safety evaluation of the molecule and compute the confidence score of the prediction will be described later.
The similar molecule data searching unit 30 of the safety prediction device 1A searches and acquires the safety evaluation data of the similar molecule similar to the molecule to be evaluated for safety (searching process to search and acquire the safety evaluation data of the similar molecule: step S24).
The details of the searching process (step S24) to search and acquire the safety evaluation data of the similar molecule will be described later.
Next, the integration unit 40 of the safety prediction device 1A obtains the integrated data by integrating the prediction result of the safety evaluation of the molecule and the confidence score of the prediction obtained in the computation process (step S23) to predict the safety evaluation of the molecule and compute the confidence score of the prediction, and the safety evaluation data of the similar molecule acquired in the searching process (step S24) to search and acquire the safety evaluation data of the similar molecule (integration process: step S25).
The details of the integration process (step S25) will be described later.
Next, the output unit 80 of the safety prediction device 1A outputs the integrated data integrated by the integration unit 40 (output process: step S26).
The safety prediction device 1A may output, from the output unit 80, the prediction result and the confidence score of the prediction of the integrated data by making a display or the like in the case where the confidence score of the prediction is high, and output the safety evaluation data of the similar molecule in addition to the prediction result and the confidence score of the prediction of the integrated data by making a display or the like in the case where the confidence score of the safety prediction is low.
The computation process (step S23) to predict the safety evaluation of the molecule and compute the confidence score of the prediction may be performed simultaneously with the searching process (step S24) to search and acquire the safety evaluation data of the similar molecule, or may be performed after the searching process (step S24) to search and acquire the safety evaluation data of the similar molecule.
Next, the checking process (step S22) in
The SMILES illustrated in
Next, the safety prediction unit 20 of the safety prediction device 1A acquires the structural formula of one molecule from among the input structural formulas of all of the molecules to be evaluated (acquisition process to acquire the structural formula of one molecule: step S222).
Next, the safety prediction unit 20 of the safety prediction device 1A checks for a description error in the structural formula of one molecule (description error checking process: step S223).
Next, the safety prediction unit 20 of the safety prediction device 1A determines whether or not the structural formulas of all of the molecules have been checked for a description error (description error determination process: step S224).
If the description error check has not been performed for all of the molecules (step S224: No), the structural formula of a molecule that has not yet been checked is acquired (step S222).
If the description error check has been performed for all of the molecules (step S224: Yes), the safety prediction unit 20 of the safety prediction device 1A outputs the table of the structural formulas having no description error to a file (output process to output the table of structural formulas having no description error: step S225).
Next, the safety prediction unit 20 of the safety prediction device 1A outputs the table of the structural formulas having the description error to a file (output process to output the table of structural formulas having the description error: step S226).
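The loop of steps S222 to S226 can be sketched as follows. This is only an illustration under stated assumptions: the function `has_description_error` is a hypothetical, very rough syntactic check (balanced parentheses, balanced square brackets, paired ring-closure digits) that stands in for a real SMILES parser, and the two tables are returned as lists instead of being written to files.

```python
def has_description_error(smiles):
    """Rough syntactic check; a hypothetical stand-in for a real SMILES parser."""
    if not smiles:
        return True
    depth = 0
    in_bracket = False
    open_rings = set()
    for ch in smiles:
        if in_bracket:
            # Inside [...] digits are isotopes, charges or H counts,
            # not ring closures, so only the closing bracket matters.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return True
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return True
        elif ch.isdigit():
            # A ring-closure digit opens a ring on first use and closes it on second use.
            open_rings.symmetric_difference_update({ch})
        elif ch.isspace():
            return True
    return depth != 0 or in_bracket or bool(open_rings)


def check_structural_formulas(smiles_list):
    """Steps S222 to S226: split the inputs into error-free and erroneous tables."""
    valid, invalid = [], []
    for smiles in smiles_list:              # S222: take one molecule at a time
        if has_description_error(smiles):   # S223: check it
            invalid.append(smiles)
        else:
            valid.append(smiles)
    return valid, invalid                   # tables for S225 and S226
```

For example, `check_structural_formulas(["CCO", "CC(C"])` places `"CCO"` in the error-free table and `"CC(C"`, which has an unclosed branch, in the error table.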
Next, the computation process (step S23) to predict the safety evaluation of the molecule and compute the confidence score of the prediction in
Next, the safety prediction unit 20 of the safety prediction device 1A acquires the table of the structural formula having no description error (structural formula acquisition process: step S232).
Next, the safety prediction unit 20 of the safety prediction device 1A acquires the structural formula of one molecule among the structural formulas of all of the molecules described in the table of the structural formulas having no description error (acquisition process to acquire the structural formula of one molecule: step S233).
Next, the safety prediction unit 20 of the safety prediction device 1A generates a feature value of one molecule (generation process to generate the feature value of one molecule: step S234).
Next, the safety prediction unit 20 of the safety prediction device 1A predicts the safety evaluation of one molecule and computes the confidence score of the prediction (computation process to predict the safety evaluation of the molecule and compute the confidence score of the prediction: step S235).
Next, the safety prediction unit 20 of the safety prediction device 1A determines whether or not the prediction of the safety evaluation and the computation of the confidence score of the prediction are performed for all of the molecules (determination process to determine the prediction of the safety evaluation and the computation of the confidence score of the prediction for all of the molecules: step S236).
If the prediction of the safety evaluation and the computation of the confidence score of the prediction are not performed for all of the molecules (step S236: No), the structural formula of a molecule for which the prediction and the computation are not yet performed is acquired (step S233).
If the prediction of the safety evaluation and the computation of the confidence score of the prediction are performed for all of the molecules (step S236: Yes), the table of the prediction results of the safety evaluation of the molecules, including the prediction of the safety evaluation for all of the molecules and the confidence score of the prediction of all the molecules is output to a file (output process to output the table of prediction results of the safety evaluation of the molecules: step S237).
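The per-molecule loop of steps S233 to S237 can be sketched as follows. Everything model-related here is an assumption made for illustration: the features are crude counts read from the SMILES string, the weights of the toy linear model are arbitrary, and the confidence score is defined as the distance of the predicted probability from the decision boundary, scaled to the range 0 to 1. The source does not define how the feature value or the confidence score is computed.

```python
import math

# Arbitrary weights of a toy linear model; not taken from the source.
WEIGHTS = {"length": -0.15, "n_oxygen": 0.8, "n_chlorine": -1.2}
BIAS = 0.5


def generate_feature_value(smiles):
    """Step S234 stand-in: crude counts read directly from the SMILES string."""
    return {
        "length": len(smiles),
        "n_oxygen": smiles.count("O") + smiles.count("o"),
        "n_chlorine": smiles.count("Cl"),
    }


def predict_with_confidence(features):
    """Step S235 stand-in: logistic score mapped to a label and a confidence score."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    p_ok = 1.0 / (1.0 + math.exp(-z))
    label = "OK" if p_ok >= 0.5 else "NG"
    # Assumed definition: 0 at the decision boundary, 1 when fully certain.
    confidence = abs(p_ok - 0.5) * 2.0
    return label, confidence


def predict_all(valid_smiles):
    """Steps S233 to S237: process every error-free molecule and build the result table."""
    table = []
    for smiles in valid_smiles:
        label, confidence = predict_with_confidence(generate_feature_value(smiles))
        table.append({"smiles": smiles, "prediction": label, "confidence": confidence})
    return table
```

With this definition, a confidence score below 0.5 corresponds to a predicted probability between 0.25 and 0.75, which is one way to make a "less than 50%" test meaningful for a binary label.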
Next, the searching process (step S24) to search and acquire the safety evaluation data of the similar molecule in
Next, the similar molecule data searching unit 30 of the safety prediction device 1A acquires a table of structural formulas having no description error (acquisition process to acquire the table of structural formulas: step S242).
Next, the similar molecule data searching unit 30 of the safety prediction device 1A acquires the structural formula of one molecule among all of the molecules described in the table of structural formulas having no description error (acquisition process to acquire the structural formula of one molecule: step S243).
Next, the similar molecule data searching unit 30 of the safety prediction device 1A computes the similarity between the acquired one molecule and all of the molecules in the safety evaluation database (similarity computation process: step S244).
Next, the similar molecule data searching unit 30 of the safety prediction device 1A acquires a predetermined number of safety evaluation data corresponding to top similarities, among the similarities of all of the molecules computed in the similarity computation process (step S244) (acquisition process to acquire the predetermined number of safety evaluation data: step S245).
Next, the similar molecule data searching unit 30 of the safety prediction device 1A determines whether or not the similar molecules are searched, for all of the molecules described in the table of structural formulas having no description error (determination process to determine the searched similar molecules of all of the molecules: step S246).
If the similar molecules of all of the molecules are not searched (step S246: No), the structural formula of a molecule for which the similar molecules are not yet searched is acquired (step S243).
If the similar molecules of all of the molecules are searched (step S246: Yes), a table of the safety evaluation data of each of the similar molecules is output for all of the molecules (step S247).
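The similarity computation process (step S244) and the acquisition of the top-ranked safety evaluation data (step S245) can be sketched as follows. The fingerprint used here, a set of character bigrams of the SMILES string, is a crude stand-in chosen only so that the example runs without a chemistry library; the source does not state which similarity measure is used, and the Tanimoto coefficient is an assumption. The record fields `smiles` and `evaluation` are hypothetical.

```python
def fingerprint(smiles, n=2):
    """Set of character n-grams; a crude stand-in for a chemical fingerprint."""
    return {smiles[i:i + n] for i in range(max(len(smiles) - n + 1, 1))}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def search_similar(query, database, top_k=3):
    """Steps S244 and S245: score every database record, keep the top_k records."""
    query_fp = fingerprint(query)
    scored = [
        (tanimoto(query_fp, fingerprint(record["smiles"])), record)
        for record in database
    ]
    # Sort by similarity only, highest first, then keep the predetermined number.
    best = sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
    return [dict(record, similarity=score) for score, record in best]
```

Calling `search_similar` once per molecule in the error-free table, and collecting the results, corresponds to the loop closed by step S246 and the table output in step S247.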
Next, the integration process (step S25) in
Next, the integration unit 40 of the safety prediction device 1A acquires, from the similar molecule data searching unit 30, the table of the safety evaluation data of the similar molecules for all of the molecules obtained in the searching process (step S24) to search and acquire the safety evaluation data of the similar molecule (acquisition process to acquire the safety evaluation data of the similar molecule: step S252).
Next, the integration unit 40 of the safety prediction device 1A integrates the table of the prediction results of the safety evaluation of the molecules and the table of the safety evaluation data of the similar molecules for all of the molecules into a single table to create an integrated file (table integration process: step S253).
Next, the output unit 80 of the safety prediction device 1A outputs the integrated file as illustrated in
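The table integration process (step S253) amounts to joining the two tables on the molecule. The sketch below assumes that both tables are lists of dictionaries keyed by a `smiles` field and that the integrated file is written as CSV with one line per pair of molecule and similar molecule; the field names and the file layout are hypothetical and are not taken from the source.

```python
import csv
import io


def integrate_tables(prediction_table, similar_table):
    """Step S253 stand-in: join the two tables on the SMILES string."""
    similar_by_smiles = {row["smiles"]: row["similar"] for row in similar_table}
    return [
        dict(row, similar=similar_by_smiles.get(row["smiles"], []))
        for row in prediction_table
    ]


def write_integrated_file(integrated, stream):
    """Write one CSV line per (molecule, similar molecule) pair."""
    writer = csv.writer(stream)
    writer.writerow(
        ["smiles", "prediction", "confidence", "similar_smiles", "similar_evaluation"]
    )
    for row in integrated:
        # A molecule without any similar molecule still gets one line.
        for hit in row["similar"] or [{"smiles": "", "evaluation": ""}]:
            writer.writerow(
                [row["smiles"], row["prediction"], row["confidence"],
                 hit["smiles"], hit["evaluation"]]
            )


# Usage: write to an in-memory buffer instead of a file on disk.
predictions = [{"smiles": "CCO", "prediction": "OK", "confidence": 0.9}]
similars = [{"smiles": "CCO", "similar": [{"smiles": "CCCO", "evaluation": "OK"}]}]
buffer = io.StringIO()
write_integrated_file(integrate_tables(predictions, similars), buffer)
```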
The safety prediction device 1A according to the present embodiment includes the input unit 10, the safety prediction unit 20, the similar molecule data searching unit 30, and the output unit 80. In the safety prediction device 1A, the safety prediction unit 20 predicts the safety evaluation of the molecule and computes the confidence score of the prediction, and the similar molecule data searching unit 30 acquires the safety evaluation data of the similar molecule. The safety prediction device 1A can appropriately provide the user with the prediction result of the safety evaluation of the compound, by quantifying and outputting the confidence score of the prediction of the safety evaluation of the molecule. In the case where the confidence score of the prediction is high, the user can quickly and easily evaluate the safety of the compound with a high accuracy, by adopting the prediction result as it is. In the case where the confidence score of the prediction is low, the user can quickly and easily evaluate the safety of the compound with a high accuracy, by considering which of the prediction result and the safety evaluation data is to be adopted. Accordingly, the safety prediction device 1A can perform the safety evaluation of the compound with a high accuracy, while improving the convenience for the user.
The output unit 80 of the safety prediction device 1A can output a message related to the prediction result of the safety evaluation of the molecule and the confidence score of the prediction when the confidence score of the prediction is high, and output a message related to the prediction result of the safety evaluation of the molecule, the confidence score of the prediction, and the safety evaluation data when the confidence score of the prediction is low. The user can accurately determine the evaluation contents of the safety of the compound, by checking the contents of the output message. For this reason, the safety prediction device 1A can appropriately perform the safety evaluation of the compound with a high accuracy, while further improving the convenience for the user.
In the safety prediction device 1A, the safety prediction unit 20 can include the feature value computation unit 21 and the predictor 22. Hence, in the safety prediction device 1A, the feature value computation unit 21 can compute the feature value based on the structural formula of the molecule, and the predictor 22 can predict the safety of the molecule based on the computed feature value. For this reason, the safety prediction device 1A can perform the safety evaluation of the compound with an even higher accuracy.
The feature value computation unit 21 of the safety prediction device 1A can compute the feature value of the molecule by inputting the structural formula of the molecule to the characteristic prediction model 70. The safety prediction unit 20 can predict the safety evaluation of the molecule and the confidence score of the prediction with a high accuracy in a simple manner, from the structural formula of the molecule, and can reduce the load and the time required for the computation. Accordingly, the safety prediction device 1A can conveniently predict the safety evaluation of the compound with a high accuracy, and at a low computation cost.
In the safety prediction device 1A, the similar molecule data searching unit 30 may include the similarity evaluation unit 31 and the data searching unit 32. Thus, in the safety prediction device 1A, the similarity evaluation unit 31 can evaluate the similarity between the input molecules and the plurality of molecules described in the safety evaluation database 33, and the data searching unit 32 can acquire the safety evaluation data of the similar molecule having a high similarity. Accordingly, the safety prediction device 1A can perform the safety evaluation of the compound with an even higher accuracy.
The safety prediction device 1A may include the output unit 80. Hence, the safety prediction device 1A can visually present the information related to the prediction result of the safety evaluation of the predicted compound, and the information related to the similar molecule data, with respect to the user, thereby enabling the user to easily ascertain the information related to the compound.
As described above, because the safety prediction device 1A can conveniently predict the safety evaluation of the compound with a high accuracy, and at a low computation cost, the safety of the compound used in materials, pharmaceuticals, or the like utilized in chemical industry, pharmaceutical industry, or the like, for example, can be predicted with a high accuracy, and thus the safety prediction device 1A is suitable for safely performing the research and development, product manufacturing, or the like.
In addition, the safety prediction device 1A can be effectively used for an evaluation test on the biodegradability, bioaccumulation, mutagenicity, fish acute toxicity, Daphnia magna acute immobilization, algal growth inhibition, mammal repeat dose toxicity, or the like. Examples of the evaluation test of the mutagenicity include a reverse mutation test (Ames test), a chromosomal aberration test, or the like. Examples of the evaluation test of the fish acute toxicity include a measurement of a median lethal concentration (LC50) by “Fish Acute Toxicity Test—JIS K 0102.71—”, or the like. Examples of the evaluation test of the Daphnia magna acute immobilization include a measurement of a half maximal effective concentration (EC50), or the like. Examples of the evaluation test of the algal growth inhibition include a measurement of a 50% inhibitory concentration (EC50), or the like. Examples of the evaluation test of the mammal repeat dose toxicity include a measurement of a no observed adverse effect level (NOAEL), or the like.
The safety prediction device according to a second embodiment of the present invention will be described.
The verification unit 110 verifies a validity of the prediction result of the safety evaluation of the molecule, by determining a coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data.
In the case where the confidence score of the prediction is low, the verification unit 110 determines the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule. In a case where the prediction result of the safety evaluation of the molecule coincides with the safety evaluation data of the similar molecule, the verification unit 110 verifies that the prediction result is valid (OK low confidence score) although the confidence score of the prediction is low. In a case where the prediction result of the safety evaluation of the molecule does not coincide with the safety evaluation data of the similar molecule, the verification unit 110 determines that the confidence score of the prediction is low and verifies that the prediction result is not valid (NG low confidence score). The verification unit 110 refers to the safety evaluation data of the similar molecule only when the confidence score of the prediction is low, thereby reducing the frequency of use of the safety evaluation data of the similar molecule, and improving the convenience for the user.
For example, in the case where the ID in
In a case where the safety evaluation data of a predetermined number (for example, 11) or more of the similar molecules, among the safety evaluation data of the plurality of (for example, 20) similar molecules, coincide with the prediction result and the coincidence level is high, the verification unit 110 may determine that the safety evaluation of the molecule to be predicted is OK and the molecule is easily degradable, and regard the molecule as having the OK low confidence score. In this case, the prediction result of the safety evaluation of the molecule to be predicted is OK, indicating easily degradable, and the safety evaluation obtained by making a reference to the safety evaluation data of the similar molecules is also OK, indicating easily degradable, so that the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecules coincide. For this reason, the verification unit 110 can verify that the prediction result of the safety evaluation of the molecule is valid.
On the other hand, in a case where the safety evaluation data of fewer than the predetermined number (for example, 11) of the similar molecules, among the safety evaluation data of the plurality of (for example, 20) similar molecules, coincide with the prediction result and the coincidence level is low, the verification unit 110 may determine that the molecule to be predicted is not easily degradable, and regard the molecule as having the NG low confidence score. In this case, the prediction result of the safety evaluation of the molecule to be predicted is OK, indicating easily degradable, but the safety evaluation obtained by making a reference to the safety evaluation data of the similar molecules is NG, indicating not easily degradable, so that the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecules do not coincide. For this reason, the verification unit 110 can verify that the prediction result of the safety evaluation of the molecule is not valid.
When determining the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule, the verification unit 110 may determine the coincidence level from a sum of the similarities of the similar molecules, or from a sum of values obtained by multiplying the similarities of the similar molecules by weights, in place of determining the coincidence level from a majority decision on the safety evaluation data of the similar molecules. The weights may be the same value or different values for each of the similar molecules.
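The two ways of determining the coincidence level can be compared in a short sketch. The majority decision follows the example of 11 or more out of 20 similar molecules. For the similarity-based variant, the source only states that a sum of similarities or of weighted similarities is used; expressing the result as the agreeing share of the total weighted similarity is an assumption made here, as are the field names `evaluation` and `similarity`.

```python
def coincidence_by_majority(prediction, similar, required=11):
    """True when at least `required` similar molecules share the predicted label."""
    agreeing = sum(1 for hit in similar if hit["evaluation"] == prediction)
    return agreeing >= required


def coincidence_by_weighted_similarity(prediction, similar, weights=None):
    """Share of the (weighted) similarity sum that agrees with the prediction."""
    if weights is None:
        weights = [1.0] * len(similar)  # same weight for every similar molecule
    total = sum(w * hit["similarity"] for w, hit in zip(weights, similar))
    agreeing = sum(
        w * hit["similarity"]
        for w, hit in zip(weights, similar)
        if hit["evaluation"] == prediction
    )
    return agreeing / total if total else 0.0


# Usage: 12 weakly similar molecules agree, 8 strongly similar molecules disagree.
similar = (
    [{"evaluation": "OK", "similarity": 0.5}] * 12
    + [{"evaluation": "NG", "similarity": 0.9}] * 8
)
majority_ok = coincidence_by_majority("OK", similar)
weighted_share = coincidence_by_weighted_similarity("OK", similar)
```

In this usage example the majority decision accepts the prediction, while less than half of the similarity sum supports it, which shows why the choice between the two criteria can change the verification result.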
In the present embodiment, the output unit 80 may output a message indicating that the prediction result of the safety evaluation of the molecule coincides with the safety evaluation data of the similar molecule in the case where the confidence score of the prediction is low and the coincidence level is high, and output a message indicating that the prediction result of the safety evaluation of the molecule does not coincide with the safety evaluation data of the similar molecule in the case where both the confidence score of the prediction and the coincidence level are low.
For example, the contents of the message may be “The confidence score of the prediction is less than 50%, but the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule is high.” or the like in the case where the confidence score of the prediction is low and the coincidence level is high, and may be “The confidence score of the prediction is less than 50%, and the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule is also low.” or the like in the case where the confidence score of the prediction and the coincidence level are low.
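The selection between the two example messages can be sketched as a small function. The message texts are the ones quoted above; the function name, the representation of the confidence score as a fraction, and the choice to return `None` when the confidence score is not low are assumptions for illustration.

```python
def build_message(confidence, coincidence_is_high, threshold=0.5):
    """Return the low-confidence message, or None when the confidence is not low."""
    if confidence >= threshold:
        return None  # assumed: no coincidence message is needed in this case
    limit = f"{threshold:.0%}"  # 0.5 is rendered as "50%"
    subject = (
        "the coincidence level between the prediction result of the safety "
        "evaluation of the molecule and the safety evaluation data of the "
        "similar molecule"
    )
    if coincidence_is_high:
        return (
            f"The confidence score of the prediction is less than {limit}, "
            f"but {subject} is high."
        )
    return (
        f"The confidence score of the prediction is less than {limit}, "
        f"and {subject} is also low."
    )
```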
Next, a safety prediction method applied to the safety prediction device according to the present embodiment will be described. The safety prediction method applied to the safety prediction device according to the present embodiment is a method for predicting the safety of the compound using the safety prediction device 1B having a configuration illustrated in
The safety prediction method applied to the safety prediction device 1B according to the present embodiment will be described.
Next, the safety prediction unit 20 of the safety prediction device 1B checks the input structural formula for a description error (checking process: step S32).
The checking process (step S32) is the same as the checking process (step S22) of the safety prediction method according to the first embodiment illustrated in
Next, the safety prediction unit 20 of the safety prediction device 1B predicts the safety evaluation of the molecule and computes the confidence score of the prediction, and acquires a table of the prediction result of the safety evaluation of the molecule, including the prediction of the safety evaluation of the molecule and the confidence score of the prediction (computation process to predict the safety evaluation of the molecule and compute the confidence score of the prediction: step S33).
The computation process (step S33) to predict the safety evaluation of the molecule and compute the confidence score of the prediction is the same as the computation process (step S23) to predict the safety evaluation of the molecule and compute the confidence score of the prediction of the safety prediction method according to the first embodiment illustrated in
Next, the similar molecule data searching unit 30 of the safety prediction device 1B searches and acquires the safety evaluation data of the similar molecule similar to the molecule to be evaluated for safety (searching process to search and acquire the safety evaluation data of the similar molecule: step S34).
The searching process (step S34) to search and acquire the similar molecules is the same as the searching process (step S24) to search and acquire the similar molecules in the safety prediction method according to the first embodiment illustrated in
Next, the verification unit 110 of the safety prediction device 1B determines whether or not the confidence score of the prediction is greater than or equal to 50% (determination process to determine the confidence score of the prediction: step S35) after the computation process (step S33) to predict the safety evaluation of the molecule and compute the confidence score of the prediction.
In the determination process (step S35) to determine the confidence score of the prediction, if the confidence score of the prediction is greater than or equal to 50% (step S35: Yes), the output unit 80 of the safety prediction device 1B outputs the prediction result of the safety evaluation of the molecule (output process to output the prediction result: step S36).
If the confidence score of the prediction is less than 50% (step S35: No), the verification unit 110 of the safety prediction device 1B determines whether or not the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule is high (coincidence level determination process: step S37) after the searching process (step S34) to search and acquire the safety evaluation data of the similar molecule.
If the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule is high (step S37: Yes), the verification unit 110 of the safety prediction device 1B verifies that the prediction result of the safety evaluation of the molecule is valid (OK low confidence score) although the confidence score of the prediction is low, and the output unit 80 outputs the table of the prediction result of the safety evaluation of the molecule (output process to output the table of the prediction result of the safety evaluation of the molecule: step S36).
If the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule is low (step S37: No), the verification unit 110 of the safety prediction device 1B determines that the confidence score of the prediction is low and verifies that the prediction result of the safety evaluation of the molecule is not valid (NG low confidence score). The integration unit 40 of the safety prediction device 1B integrates the table of the prediction result of the safety evaluation of the molecule obtained in the computation process (step S33) to predict the safety evaluation of the molecule and compute the confidence score of the prediction, and the safety evaluation data of the similar molecule obtained in the searching process (step S34) to search and acquire the safety evaluation data of the similar molecule, to obtain integrated data (integration process: step S38).
The integration process (step S38) is the same as the integration process (step S25) of the safety prediction method according to the first embodiment illustrated in
Next, the output unit 80 of the safety prediction device 1B outputs the integrated data (refer to
In the safety prediction method according to the present embodiment, the computation process (step S33) to predict the safety evaluation of the molecule and compute the confidence score of the prediction may be performed simultaneously with the searching process (step S34) to search and acquire the safety evaluation data of the similar molecule, or after the searching process (step S34) to search and acquire the safety evaluation data of the similar molecule.
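The branching from step S35 through step S38 can be summarized in one function. This is a sketch under stated assumptions: the confidence score is a fraction compared against 0.5, the coincidence level is judged by a simple majority of the similar molecules (which gives 11 for 20 similar molecules, as in the example above), and the row and record field names are hypothetical.

```python
def verify_and_select_output(row, similar, threshold=0.5, required=None):
    """Return (status, data to output) following the branches of steps S35 to S38."""
    if row["confidence"] >= threshold:          # S35: Yes
        return "high confidence", row           # S36: output the prediction result

    if required is None:
        required = len(similar) // 2 + 1        # assumed: simple majority
    agreeing = sum(1 for hit in similar if hit["evaluation"] == row["prediction"])

    if agreeing >= required:                    # S37: Yes
        return "OK low confidence", row         # S36: prediction verified as valid

    integrated = dict(row, similar=similar)     # S38: integrate both data sets
    return "NG low confidence", integrated      # integrated data is then output


# Usage: a low-confidence "OK" prediction checked against three similar molecules.
row = {"smiles": "CCO", "prediction": "OK", "confidence": 0.3}
agree = [{"evaluation": "OK"}, {"evaluation": "OK"}, {"evaluation": "NG"}]
disagree = [{"evaluation": "NG"}] * 3
status_agree, data_agree = verify_and_select_output(row, agree)
status_disagree, data_disagree = verify_and_select_output(row, disagree)
```

The similar-molecule data is consulted only on the low-confidence branch, which mirrors the statement that the verification unit 110 refers to that data only when the confidence score of the prediction is low.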
The safety prediction device 1B according to the present embodiment includes the verification unit 110, in addition to the configuration of the safety prediction device 1A according to the first embodiment described above, and the verification unit 110 verifies a validity of the prediction result of the safety evaluation of the molecule, by determining a coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data. Thus, even when the confidence score of the prediction is low, the safety prediction device 1B can determine the coincidence level between the prediction result of the safety evaluation of the molecule and the safety evaluation data by making a reference to the safety evaluation data of the similar molecule, so that the safety evaluation of the compound can be performed with a high accuracy even for the compound for which the safety evaluation is difficult to predict. Accordingly, the safety prediction device 1B can perform the safety evaluation of the compound with an even higher accuracy, while further improving the convenience for the user.
The output unit 80 of the safety prediction device 1B can output a message related to the prediction result of the safety evaluation of the molecule and the confidence score of the prediction in the case where the confidence score of the prediction is high, and output a message related to the prediction result of the safety evaluation of the molecule, the confidence score of the prediction, and the safety evaluation data in the case where the confidence score of the prediction is low. Similar to the safety prediction device 1A, the user can accurately determine the evaluation of the safety of the compound, by checking the contents of the output message. Hence, the safety prediction device 1B can also appropriately perform the safety evaluation of the compound with a high accuracy, while further improving the convenience for the user.
In the safety prediction device 1B, the output unit 80 can output a message indicating that the prediction result of the safety evaluation of the molecule coincides with the safety evaluation data in a case where the confidence score of the prediction is low and the coincidence level is high, and can output a message indicating that the prediction result of the safety evaluation of the molecule does not coincide with the safety evaluation data in a case where the confidence score and the coincidence level are low. Thus, the safety prediction device 1B can provide the user with the message content indicating the coincidence level between the prediction result of the safety evaluation of the predicted compound and the safety evaluation data. The user can more accurately judge the safety evaluation of the compound by checking the contents of the output message. As a result, the safety prediction device 1B can more appropriately perform the safety evaluation of the compound, in particular, the compound for which the safety evaluation is difficult to predict, with a high accuracy, while further improving the convenience for the user.
Because the safety prediction device 1B can conveniently predict the safety evaluation of the compound with a high accuracy and at a low computation cost, similar to the safety prediction device 1A, the safety of the compound used in materials, pharmaceuticals, or the like utilized in chemical industry, pharmaceutical industry, or the like, for example, can be predicted with a high accuracy, and thus the safety prediction device 1B is suitable for safely performing the research and development, product manufacturing, or the like.
Further, similar to the safety prediction device 1A, the safety prediction device 1B can be effectively used for an evaluation test on the biodegradability, bioaccumulation, mutagenicity, fish acute toxicity, Daphnia magna acute immobilization, algal growth inhibition, mammal repeat dose toxicity, or the like.
Next, an example of a hardware configuration of the safety prediction devices 1A and 1B will be described.
The CPU 101 controls an overall operation of the safety prediction devices 1A and 1B, and performs various information processing. The CPU 101 executes a safety prediction program stored in the ROM 103 or the auxiliary storage device 107, to control display operations of a measurement recording screen and an analyzing screen.
The RAM 102 is used as a work area of the CPU 101, and may include a non-volatile RAM that stores main control parameters and information.
The ROM 103 stores a basic input output program or the like. The safety prediction program may be stored in the ROM 103.
The input device 104 is a keyboard, a mouse, an operation button, a touchscreen panel, or the like.
The output device 105 is a monitor display or the like. The output device 105 displays the prediction result or the like, and the screen is updated according to an input-output operation via the input device 104 or the communication module 106.
The communication module 106 is a data transmission and reception device, such as a network card or the like, and functions as a communication interface that acquires information from an external data recording server or the like, and outputs analyzed information to another electronic device.
The auxiliary storage device 107 is a storage device, such as a Solid State Drive (SSD), a Hard Disk Drive (HDD), or the like, and stores various data, files, or the like required for the operation of the safety prediction devices 1A and 1B, for example.
The functions of the safety prediction devices 1A and 1B illustrated in
The safety prediction program is stored in the storage device included in the computer, for example. A part or all of the safety prediction program may be transmitted via a transmission medium, such as a communication line or the like, and received and recorded (including being installed) by the communication module 106 or the like included in the computer. Further, the safety prediction program may be configured to be recorded (including being installed) in the computer from a state where a part or all of the safety prediction program is stored in a portable storage medium, such as a CD-ROM, a DVD-ROM, a flash memory, or the like.
The program executed by the information processor has a module configuration including each of the processing units of the safety prediction devices 1A and 1B described above, and the CPU 101 reads and executes the program as appropriate, thereby generating the processing units described above on the memory, such as the RAM 102 or the like.
The safety prediction devices 1A and 1B may be configured as a system in which a plurality of information processors are communicably connected to one another, and each of the processing units described above may be distributed among the plurality of information processors. Alternatively, the safety prediction devices 1A and 1B may be a virtual machine operating on a cloud system.
Although the present invention is described with reference to the embodiments, the present invention is not limited to the embodiments described above. The embodiments can be implemented in various other forms, and various combinations, omissions, substitutions, modifications, or the like can be made without departing from the scope of the present invention. The embodiments and modifications thereof are included in the scope and spirit of the present invention, and are also included in the scope of the present invention described in the claims and equivalents thereof.
This application is based upon and claims priority to Japanese Patent Application No. 2021-144755, filed on Sep. 6, 2021, before the Japan Patent Office, and the entire contents of which are incorporated herein by reference.
Number | Date | Country | Kind |
---|---|---|---|
2021-144755 | Sep 2021 | JP | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/032725 | 8/31/2022 | WO | |