The disclosed technique relates to a storage medium, an estimation device, and an estimation method.
Conventionally, a concerned event has been estimated using a machine learning model in which machine learning has been executed using past cases as training data. For example, a system that calculates similarities between drugs and estimates side effects of a given drug has been proposed. This system includes a similarity calculation device and a side effect determination device. The similarity calculation device obtains data related to drug sets from a plurality of open data sources, generates resource description framework (RDF) triples, and stores an RDF graph of the RDF triples. The similarity calculation device generates feature vectors for each drug, based on the RDF triples, and calculates similarities of each drug to all other drugs by comparing the feature vectors. The side effect determination device estimates side effects of a given drug, based on the similarities between the drugs.
Patent Document 1: Japanese Laid-open Patent Publication No. 2016-212853.
According to an aspect of the embodiments, a non-transitory computer-readable storage medium stores an estimation program that causes at least one computer to execute a process, the process including: inputting training data that includes a vector of graph data, a vector of an ontology, and a label; and training a machine learning model based on a loss function acquired from the label and from a value obtained by merging a value of an activation function acquired with the vector of the graph data and a value of the activation function acquired with the vector of the ontology.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
As in the conventional technique described above, there are cases where the accuracy of estimating side effects is insufficient when only similarities between medications (drugs) obtained by comparing feature vectors are used. This is because, for example, even patients who are administered the same medication sometimes experience different side effect outcomes when the patients suffer from different diseases. The situation described above can arise not only when estimating side effects from similarities between medications, but also when estimating any event using a machine learning model trained with past cases as training data.
In one aspect, the disclosed technique aims to train a machine learning model so as to improve the accuracy of event estimation.
In one aspect, a machine learning model may be trained so as to improve the accuracy of event estimation.
Hereinafter, examples of embodiments according to the disclosed technique will be described with reference to the drawings. Note that, in each of the following embodiments, a case where the disclosed technique is applied to the estimation of an unexpected effect (hereinafter referred to as “side effect”) in the administration of medications will be described as an example.
First, before describing the details of the embodiments, consider the case of using feature vectors that combine ontology with past case data, taking into account that side effects sometimes cannot be estimated with high accuracy only by comparing the similarities between medications, as in the conventional technique. Case data is assumed to include information such as attributes of patients, medications that were administered, and diseases the patients suffer from. An ontology is a systematization of background knowledge in a field of interest; in the case of the present embodiments, for example, information such as the similarities and relationships between diseases and the similarities between medications and the ingredients contained therein is organized in a tree structure format or the like. There is a possibility that similar side effects will arise, for example, when diseases are similar or when medications containing the same ingredient are administered. Thus, it is considered that such a possibility can be estimated by using a feature vector that includes information on the ontology as a feature.
However, it is sometimes difficult to generate feature vectors by arranging features indicating the case data and features indicating the ontology. For example, although it is possible to arrange ingredients contained in medications as features, it is difficult to use relationships between diseases organized in a tree structure format as features.
Thus, the following method is conceivable. This method converts the case data into graph data constituted by nodes and edges coupling between the nodes and merges the tree-structured ontology into this graph data. The method then calculates embedding vectors expressing each node from the graph data that combines the case data and the ontology, and trains a machine learning model using feature vectors generated from these embedding vectors as training data. However, in the case of this method, the information on the case data and the information on the ontology included in the feature vector are handled without distinction, and the information on the ontology sometimes fails to be appropriately reflected in the estimation of the event (here, the side effect). Thus, each of the following embodiments ensures that the information on the ontology is appropriately reflected in machine learning of a machine learning model. Hereinafter, each embodiment will be described in detail.
A machine learning system according to a first embodiment includes a machine learning device 10 and an estimation device 30. First, the machine learning device 10 will be described. As illustrated in
Similarly, the disease ontology is also tree structure information including nodes indicating diseases (circles with disease names written inside), nodes indicating background knowledge (ellipses with background knowledge written inside), and edges (arrows) coupling between related nodes. For example, when a disease called alcohol ingestion is classified as a mental disease, the node indicating alcohol ingestion and the node indicating mental disease are coupled by an edge, and related information such as “classification” is attached to the edge.
The machine learning device 10 functionally includes a graph generation unit 12, an embedding vector calculation unit 14, a training data generation unit 16, and a machine learning unit 18, as illustrated in
The graph generation unit 12 acquires the machine learning case data input to the machine learning device 10 and generates graph data constituted by nodes and edges coupling between the nodes, from the acquired machine learning case data. For example, as illustrated in
In addition, the graph generation unit 12 generates graph data in which the ontology is coupled to the case graph data based on the machine learning case data. Specifically, the graph generation unit 12 couples the case graph data and the ontology by sharing matching nodes between the case graph data and the ontology. For example, the graph generation unit 12 searches the medication ontology and the disease ontology for nodes that match the nodes indicating “medications” and “diseases” included in the case graph data and extracts the nodes found by the search and the portions coupled to these nodes. Then, the graph generation unit 12 couples the portions extracted from the ontology to the case graph data so as to superimpose the matching nodes indicating “medications” or “diseases”, as in the portion indicated by the dashed line in
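As an illustrative sketch only, and not the disclosed implementation, the following Python code uses the networkx library to couple hypothetical case graph data and ontology fragments by sharing matching nodes; all node names and relation labels are invented for the example. nx.compose superimposes nodes with matching names, which plays the role of superimposing the matching "medication" and "disease" nodes described above.

```python
import networkx as nx

# Case graph data: a patient case coupled to its medication and disease.
case = nx.Graph()
case.add_edge("case:001", "medication:A", relation="administered")
case.add_edge("case:001", "disease:X", relation="suffers_from")

# Medication ontology fragment: medication A contains ingredient P.
med_onto = nx.Graph()
med_onto.add_edge("medication:A", "ingredient:P", relation="contains")

# Disease ontology fragment: disease X is classified as a mental disease.
dis_onto = nx.Graph()
dis_onto.add_edge("disease:X", "mental disease", relation="classification")

# compose() superimposes nodes with matching names, so the shared
# "medication:A" and "disease:X" nodes couple the graphs into one
# overall graph.
overall = nx.compose(nx.compose(case, med_onto), dis_onto)
print(sorted(overall.nodes()))
```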
The embedding vector calculation unit 14 calculates embedding vectors representing each node included in the overall graph data, based on the overall graph data. Specifically, the embedding vector calculation unit 14 calculates the embedding vectors by mapping each of the nodes and edges included in the overall graph data to an n-dimensional vector space. More specifically, as illustrated in the upper diagram of
First, as illustrated in the middle diagram of the figure, the embedding vector calculation unit 14 places each of the nodes and edges included in the overall graph data in the n-dimensional vector space as initial value vectors. Then, the embedding vector calculation unit 14 calculates the embedding vector of each node included in the overall graph data by optimizing the placement of each vector so as to represent the coupling relationships between the nodes.
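The following is a minimal toy sketch, in Python with numpy, of this kind of placement optimization. The update rule (plain gradient steps that pull coupled nodes together) and the node names are assumptions for illustration; an actual implementation would use an established graph embedding method such as TransE or node2vec.

```python
import numpy as np

def embed(nodes, edges, n=8, steps=100, lr=0.05):
    """Toy embedding: coupled nodes are pulled together in the vector space.

    A repulsion term between non-coupled nodes (omitted here for brevity)
    would normally prevent all vectors from collapsing together.
    """
    rng = np.random.default_rng(0)
    vec = {v: rng.normal(size=n) for v in nodes}  # initial value vectors
    for _ in range(steps):
        for u, w in edges:
            diff = vec[u] - vec[w]
            vec[u] -= lr * diff  # optimize the placement so that coupled
            vec[w] += lr * diff  # nodes end up close to each other
    return vec

# Hypothetical overall-graph fragment matching the example above.
vectors = embed(
    ["case:001", "medication:A", "disease:X", "ingredient:P"],
    [("case:001", "medication:A"), ("case:001", "disease:X"),
     ("medication:A", "ingredient:P")],
)
```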
The training data generation unit 16 uses the embedding vectors calculated by the embedding vector calculation unit 14 and correct answer labels generated from information on side effects to generate training data to be used for machine learning of the machine learning model. Specifically, for each node with “ID” included in the overall graph data, the training data generation unit 16 generates features by concatenating the vector values of the embedding vectors calculated for each node coupled to each node with “ID”. Then, based on the information on side effects, the training data generation unit 16 generates a correct answer label indicating “TRUE” when the concerned side effect has been caused, and a correct answer label indicating “FALSE” when the concerned side effect has not been caused, and generates training data by adding the generated correct answer labels to the features.
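A minimal sketch of this feature and label generation follows; the node names, embedding values, and the side-effect record are hypothetical, and the embedding vectors stand in for those calculated by the embedding vector calculation unit 14.

```python
import numpy as np

# Hypothetical embedding vectors for the nodes coupled to one case node.
vectors = {
    "medication:A": np.ones(8),
    "disease:X": np.zeros(8),
    "ingredient:P": np.full(8, 0.5),
}

def make_training_example(coupled_nodes, vectors, side_effect_occurred):
    # Concatenate the embedding vectors of the coupled nodes into one
    # feature vector.
    features = np.concatenate([vectors[v] for v in coupled_nodes])
    # Correct answer label: TRUE -> 1.0, FALSE -> 0.0.
    label = 1.0 if side_effect_occurred else 0.0
    return features, label

x, y = make_training_example(
    ["medication:A", "disease:X", "ingredient:P"], vectors,
    side_effect_occurred=True)
```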
The machine learning unit 18 uses the training data generated by the training data generation unit 16 to update the parameters of a machine learning model 20, for example, constituted by a neural network or the like. Here,
The machine learning unit 18 updates the parameters of the machine learning model 20 having the network configuration described above so as to minimize the value LOSS of the loss function indicated below.

LOSS = g(Label, Output)

Output = f4(T, O1, O2, f1(T), f2(O1), f3(O2))
The loss function of A and B is denoted by g(A, B) and is, for example, a function that works out the sum-of-squares error, the cross-entropy error, or the like. The function that returns 1 when the correct answer label is TRUE and 0 when the correct answer label is FALSE is denoted by Label. The output value when the features of the training data are input to the machine learning model 20 is denoted by Output. A vector made up of the case data features among the features included in the training data is denoted by T. A vector made up of the medication features among the features included in the training data is denoted by O1. A vector made up of the disease features among the features included in the training data is denoted by O2. The activation function corresponding to the first hidden layer is denoted by f1, the activation function corresponding to the second hidden layer by f2, and the activation function corresponding to the third hidden layer by f3. These activation functions are, for example, rectified linear units (ReLUs). That is, the value of the activation function calculated only with the embedding vectors of the nodes of the case graph data in the input training data is denoted by f1(T). In addition, the value of the activation function calculated only with the embedding vectors of the nodes of the medication ontology in the input training data is denoted by f2(O1). Likewise, the value of the activation function calculated only with the embedding vectors of the nodes of the disease ontology in the input training data is denoted by f3(O2). The activation function corresponding to the fourth hidden layer is denoted by f4 and is, for example, a sigmoid function. That is, the value obtained by applying this activation function to the vector that merges all of the features with the outputs of the first to third hidden layers is denoted by f4(T, O1, O2, f1(T), f2(O1), f3(O2)).
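The following sketch expresses this network configuration and loss in PyTorch. The layer widths, the choice of binary cross-entropy for g, and the batch contents are assumptions for illustration; the document specifies only the grouping of the inputs and the kinds of activation functions.

```python
import torch
import torch.nn as nn

class SideEffectModel(nn.Module):
    def __init__(self, dim_t, dim_o1, dim_o2, hidden=64):
        super().__init__()
        # First to third hidden layers: each receives only one group of
        # features (case data T, medication ontology O1, disease ontology O2).
        self.f1 = nn.Sequential(nn.Linear(dim_t, hidden), nn.ReLU())
        self.f2 = nn.Sequential(nn.Linear(dim_o1, hidden), nn.ReLU())
        self.f3 = nn.Sequential(nn.Linear(dim_o2, hidden), nn.ReLU())
        # Fourth hidden layer: applied to the merged vector of all features
        # and the outputs of the first to third hidden layers.
        merged_dim = dim_t + dim_o1 + dim_o2 + 3 * hidden
        self.f4 = nn.Sequential(nn.Linear(merged_dim, 1), nn.Sigmoid())

    def forward(self, t, o1, o2):
        merged = torch.cat(
            [t, o1, o2, self.f1(t), self.f2(o1), self.f3(o2)], dim=-1)
        return self.f4(merged)  # Output = f4(T, O1, O2, f1(T), f2(O1), f3(O2))

# Hypothetical dimensions and batch of four training examples.
model = SideEffectModel(dim_t=16, dim_o1=8, dim_o2=8)
t, o1, o2 = torch.randn(4, 16), torch.randn(4, 8), torch.randn(4, 8)
label = torch.tensor([[1.0], [0.0], [1.0], [0.0]])  # TRUE -> 1, FALSE -> 0
loss = nn.BCELoss()(model(t, o1, o2), label)        # LOSS = g(Label, Output)
loss.backward()
```

Because f1 to f3 each see only their own group of embedding vectors, the case data information and each ontology's information propagate through the network as separate groups before being merged, which is the point of this configuration.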
In cases such as when the value LOSS of the loss function described above is equal to or lower than a predetermined threshold value, when the difference from the previously calculated LOSS is equal to or lower than a predetermined value, or when the number of iterations of machine learning has reached a predetermined number, the machine learning unit 18 concludes that the value LOSS of the loss function has been minimized. When concluding that the value LOSS of the loss function has been minimized, the machine learning unit 18 ends the machine learning and outputs the machine learning model 20, including information on the network configuration and the values of the parameters at the time point when the machine learning ended.
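A minimal sketch of these stopping criteria follows; the threshold, tolerance, iteration limit, and the train_one_step() helper are all hypothetical stand-ins for one parameter update that returns the current LOSS.

```python
def train_until_minimized(train_one_step, threshold=0.01, tol=1e-6,
                          max_iters=10_000):
    prev_loss = float("inf")
    for _ in range(max_iters):            # predetermined number of iterations
        loss = train_one_step()
        if loss <= threshold:             # LOSS at or below the threshold
            break
        if abs(prev_loss - loss) <= tol:  # little change from previous LOSS
            break
        prev_loss = loss
    return loss                           # concluded as minimized
```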
Next, the estimation device 30 will be described. As illustrated in
The estimation device 30 functionally includes a graph generation unit 32, an embedding vector calculation unit 34, and an estimation unit 36, as illustrated in
The graph generation unit 32 is similar to the graph generation unit 12 of the machine learning device 10, except that the data from which the graph data is generated is the estimation object case data instead of the machine learning case data. In addition, the embedding vector calculation unit 34 is also similar to the embedding vector calculation unit 14 of the machine learning device 10.
For each node with “ID” included in the overall graph data generated by the graph generation unit 32, the estimation unit 36 generates features by concatenating the vector values of the embedding vectors calculated by the embedding vector calculation unit 34 for each node coupled to each node with “ID”. The features to be generated include each of the case data features, the medication features, and the disease features, similar to the features included in the training data generated by the training data generation unit 16 of the machine learning device 10. By inputting the generated features to the machine learning model 20, the estimation unit 36 outputs an estimation result indicating whether or not the concerned side effect is to occur for the estimation object case data. For example, as illustrated in
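A minimal sketch of this estimation step, assuming the SideEffectModel sketch above has already been trained; the 0.5 decision threshold is an assumption, since the document states only that the output indicates whether or not the side effect is to occur.

```python
import torch

def estimate_side_effect(model, t, o1, o2, threshold=0.5):
    model.eval()
    with torch.no_grad():
        output = model(t, o1, o2)  # value in (0, 1) from the sigmoid f4
    # Read the output as the estimation result for the concerned side effect.
    return bool(output.item() >= threshold)
```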
The machine learning device 10 can be implemented by a computer 40 illustrated in
The storage unit 43 can be implemented by a hard disk drive (HDD), a solid state drive (SSD), a flash memory, or the like. The storage unit 43 as a storage medium stores a machine learning program 50 for causing the computer 40 to function as the machine learning device 10. The machine learning program 50 has a graph generation process 52, an embedding vector calculation process 54, a training data generation process 56, and a machine learning process 58.
The CPU 41 reads the machine learning program 50 from the storage unit 43 to load the read machine learning program 50 into the memory 42 and sequentially executes the processes included in the machine learning program 50. The CPU 41 operates as the graph generation unit 12 by executing the graph generation process 52, as the embedding vector calculation unit 14 by executing the embedding vector calculation process 54, as the training data generation unit 16 by executing the training data generation process 56, and as the machine learning unit 18 by executing the machine learning process 58. Thus, the computer 40 that has executed the machine learning program 50 functions as the machine learning device 10.
The estimation device 30 can be implemented by, for example, a computer 60 illustrated in
The storage unit 63 can be implemented by an HDD, an SSD, a flash memory, or the like. The storage unit 63 as a storage medium stores an estimation program 70 for causing the computer 60 to function as the estimation device 30. The estimation program 70 has a graph generation process 72, an embedding vector calculation process 74, and an estimation process 76. In addition, the storage unit 63 includes an information storage area 80 in which information constituting the machine learning model 20 that has undergone machine learning is stored.
The CPU 61 reads the estimation program 70 from the storage unit 63 to load the read estimation program 70 into the memory 62 and sequentially executes the processes included in the estimation program 70. The CPU 61 operates as the graph generation unit 32 by executing the graph generation process 72, as the embedding vector calculation unit 34 by executing the embedding vector calculation process 74, and as the estimation unit 36 by executing the estimation process 76. In addition, the CPU 61 reads the machine learning model 20 from the information storage area 80. Thus, the computer 60 that has executed the estimation program 70 functions as the estimation device 30.
Note that the functions implemented by each of the machine learning program 50 and the estimation program 70 can also be implemented by, for example, a semiconductor integrated circuit, in more detail, an application specific integrated circuit (ASIC) or the like.
Next, the operation of the machine learning system according to the first embodiment will be described. First, when the machine learning case data and the ontology are input to the machine learning device 10, the machine learning device 10 executes the machine learning processing illustrated in
First, the machine learning processing will be described. In step S10, the graph generation unit 12 acquires the machine learning case data input to the machine learning device 10 and generates the case graph data constituted by nodes and edges coupling between the nodes from the acquired machine learning case data.
Next, in step S12, the graph generation unit 12 searches the medication ontology and the disease ontology for nodes that match the nodes indicating “medications” and “diseases” included in the case graph data and extracts the nodes found by the search and the portions coupled to these nodes. Then, the graph generation unit 12 couples the portions extracted from the ontology to the case graph data so as to superimpose the matching nodes indicating “medications” or “diseases” and generates the overall graph data.
Next, in step S14, the embedding vector calculation unit 14 places each of the nodes and edges included in the overall graph data in an n-dimensional vector space as an initial value vector. Then, the embedding vector calculation unit 14 calculates the embedding vector of each node included in the overall graph data, by optimizing the placement of each vector so as to represent the coupling relationship between the nodes. Therefore, the embedding vector of each node of the case graph data and the embedding vector of each node of the ontology are calculated.
Next, in step S16, for each node with “ID” included in the overall graph data, the training data generation unit 16 generates features by concatenating the vector values of the embedding vectors calculated for each node coupled to each node with “ID”. Then, the training data generation unit 16 generates the correct answer labels for the concerned side effect, based on the information on the side effect, and adds the generated correct answer labels to the features to generate the training data.
Next, in step S18, the machine learning unit 18 uses the training data generated in above step S16 to update the parameters of the machine learning model 20 so as to minimize the value LOSS of the loss function described above. When concluding that the value LOSS of the loss function has been minimized, the machine learning unit 18 ends the machine learning and outputs the machine learning model 20 including information on the network configuration and the values of the parameters at the time point when the machine learning ended, which completes the machine learning processing.
Next, the estimation processing will be described. In this processing, the graph generation unit 32 generates the overall graph data from the estimation object case data and the ontology, and the embedding vector calculation unit 34 calculates the embedding vectors of each node included in the overall graph data. Then, the estimation unit 36 generates the features from the calculated embedding vectors, inputs the features to the machine learning model 20, and outputs the estimation result indicating whether or not the concerned side effect is to occur, which completes the estimation processing.
As described above, according to the machine learning system of the first embodiment, the machine learning device accepts input of training data including the embedding vectors of the case graph data, the embedding vectors of the ontology, and the correct answer labels. The machine learning device then executes machine learning of the machine learning model based on the loss function. The value of the loss function is calculated from the correct answer labels and from values obtained by merging the values of the activation function calculated only with the embedding vectors of the case graph data in the input training data and the values of the activation function calculated only with the embedding vectors of the ontology. This allows the machine learning device according to the first embodiment to train a machine learning model in which the information on the case data and the information on the ontology are each grouped and propagated. Therefore, the machine learning device according to the first embodiment may train the machine learning model in a manner that appropriately reflects the information on the ontology, so as to improve the accuracy of event estimation.
In addition, according to the machine learning system of the first embodiment, the estimation device uses the machine learning model that has undergone the machine learning described above, together with the embedding vectors calculated from the estimation object case data and the ontology, to estimate an event for the estimation object case. This may improve the accuracy of event estimation.
Next, a second embodiment will be described. Note that, in a machine learning system according to the second embodiment, similar parts to those of the machine learning system according to the first embodiment are designated by the same reference signs and detailed description thereof will be omitted.
A machine learning system according to the second embodiment includes a machine learning device 210 and an estimation device 230. First, the machine learning device 210 will be described. The machine learning device 210 functionally includes a graph generation unit 12, an embedding vector calculation unit 214, a training data generation unit 16, and a machine learning unit 18, as illustrated in
The embedding vector calculation unit 214 first calculates the embedding vectors of the nodes of the ontology in the overall graph data in which the ontology is coupled to the case graph data. Then, using the calculated embedding vectors of the ontology as initial values, the embedding vector calculation unit 214 calculates the embedding vectors of the nodes of the case graph data, for example, as illustrated in
Since the ontology is a systematization of background knowledge, the embedding vectors of the ontology accurately reflect the meaning of the coupling between nodes. Since embedding vectors can be calculated with higher accuracy when more appropriate initial values are given, the embedding vectors of the case graph data can be calculated with higher accuracy by using the embedding vectors of the ontology as initial values.
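A minimal sketch of this two-stage calculation follows, extending the toy embed() sketch from the first embodiment with an init argument; the node names are hypothetical, and a real implementation would use an established graph embedding method.

```python
import numpy as np

def embed(nodes, edges, n=8, steps=100, lr=0.05, init=None):
    rng = np.random.default_rng(0)
    init = init or {}
    # Nodes with previously calculated embeddings keep them as initial
    # values; all other nodes start from random placements.
    vec = {v: (init[v].copy() if v in init else rng.normal(size=n))
           for v in nodes}
    for _ in range(steps):
        for u, w in edges:
            diff = vec[u] - vec[w]
            vec[u] -= lr * diff
            vec[w] += lr * diff
    return vec

# Stage 1: embed only the ontology portion of the overall graph data.
onto_vec = embed(["medication:A", "ingredient:P"],
                 [("medication:A", "ingredient:P")])
# Stage 2: embed the whole graph, reusing the ontology embeddings as
# initial values for the shared nodes.
overall_vec = embed(
    ["case:001", "medication:A", "disease:X", "ingredient:P"],
    [("case:001", "medication:A"), ("case:001", "disease:X"),
     ("medication:A", "ingredient:P")],
    init=onto_vec)
```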
The estimation device 230 functionally includes a graph generation unit 32, an embedding vector calculation unit 234, and an estimation unit 36, as illustrated in
The machine learning device 210 can be implemented by a computer 40 illustrated in
A CPU 41 reads the machine learning program 250 from the storage unit 43 to load the read machine learning program 250 into a memory 42 and sequentially executes the processes included in the machine learning program 250. The CPU 41 operates as the embedding vector calculation unit 214 illustrated in
The estimation device 230 can be implemented by, for example, a computer 60 illustrated in
The CPU 61 reads the estimation program 270 from the storage unit 63 to load the read estimation program 270 into a memory 62 and sequentially executes the processes included in the estimation program 270. The CPU 61 operates as the embedding vector calculation unit 234 illustrated in
Note that the functions implemented by each of the machine learning program 250 and the estimation program 270 can also be implemented by, for example, a semiconductor integrated circuit, in more detail, an ASIC or the like.
As for the operation of the machine learning system according to the second embodiment, only the embedding vector calculation procedure in step S14 of the machine learning processing illustrated in
As described above, according to the machine learning system of the second embodiment, the machine learning device first calculates the embedding vectors of the ontology and, with these calculated embedding vectors as initial values, calculates the embedding vectors of the case graph data. This allows calculation of the embedding vectors with high accuracy, such that the machine learning model may be trained so as to improve the accuracy of event estimation. In addition, the accuracy of event estimation may be improved in the estimation device according to the second embodiment.
Note that, in the above second embodiment, the case where all embedding vectors of the nodes included in the ontology are used as features has been described, but the embodiments are not limited to this. After calculating the embedding vectors by a procedure similar to the procedure in the second embodiment, the medication features and disease features may be generated from the embedding vectors of nodes common between the case graph data and the ontology. That is, in the example in
In addition, in each of the above embodiments, an example in which the disclosed technique is applied to the case of estimating side effects of the administration of a medication to a patient has been described, but the disclosed technique can also be applied to the estimation of other events. For example, application to the case of estimating an event that occurs when mixing a plurality of chemical substances is possible. In this case, the case data can include information such as the chemical substances to be mixed and the mixing conditions (temperature, catalyst, and the like); information on chemical substances with similar properties, such as the melting points of substances A and B being the same, can be used as the ontology; and events that occur during mixing can be treated as correct answer labels.
In addition, in each of the above embodiments, the case of using two types of ontology has been described, but one type of ontology may be used, or three or more types of ontology may be used. In this case, the hidden layers of the machine learning model can be provided in correspondence to each type of ontology to be used.
In addition, in each of the above embodiments, the case where the machine learning device and the estimation device are configured by separate computers has been described, but the machine learning device and the estimation device may be configured by one computer.
In addition, while a mode in which the machine learning program and the estimation program are stored (installed) in the storage unit in advance has been described in each of the above embodiments, the embodiments are not limited to this. The program according to the disclosed technique can also be provided in a form stored in a storage medium such as a compact disc read only memory (CD-ROM), a digital versatile disc read only memory (DVD-ROM), or a universal serial bus (USB) memory.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2020/041077 filed on Nov. 2, 2020 and designated the U.S., the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2020/041077 | Nov 2020 | WO |
| Child | 18302084 | | US |