The present invention relates to a storage medium, a machine learning method, and a machine learning device.
A knowledge graph embedding technique has been known. For example, in the knowledge graph, knowledge is expressed as a triad, a so-called triple, such as “for s (subject), a value (object) of r (predicate) is o”. There is a case where s and o are referred to as entities, and r is referred to as a relation. Transformation for embedding each of the triple elements (s, r, and o) as a vector in a feature space is acquired by performing machine learning. A model generated through machine learning in this way is used for inference such as link prediction for predicting a triple having an unknown relationship, as an example.
Patent Document 1: Japanese Laid-open Patent Publication No. 2019-125364
Patent Document 2: Japanese National Publication of International Patent Application No. 2016-532942
According to an aspect of the embodiments, a non-transitory computer-readable storage medium storing a machine learning program that causes at least one computer to execute a process is provided, the process including: classifying a plurality of entities included in a graph structure that indicates a relationship between the plurality of entities to generate a first group and a second group; specifying a first entity positioned in a connection portion of the graph structure between the first group and the second group; and training a machine learning model by inputting first training data that indicates a relationship between the first entity and a second entity of the plurality of entities into the machine learning model in priority to a plurality of pieces of training data, other than the first training data, that indicate the relationship between the plurality of entities.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
In the knowledge graph embedding technique described above, even though the effect on convergence of a model parameter is not the same for all entities, all the entities are treated in the same way when machine learning is performed. Because some entities prolong the convergence of the model parameter, this causes a processing delay in machine learning.
In one aspect, an object of the present invention is to provide a machine learning program, a machine learning method, and a machine learning device that can realize acceleration of machine learning related to graph embedding.
It is possible to realize acceleration of machine learning related to graph embedding.
Hereinafter, a machine learning program, a machine learning method, and a machine learning device according to the present application will be described with reference to the attached drawings. Note that these embodiments do not limit the disclosed technology. Furthermore, the embodiments may be combined as appropriate as long as no contradiction arises in the processing content.
As illustrated in
The server device 10 is an example of a computer that provides the machine learning service described above. The server device 10 may correspond to an example of a machine learning device. As an embodiment, the server device 10 can be implemented by installing a machine learning program that realizes a function corresponding to the machine learning service described above on an arbitrary computer. For example, the server device 10 can be implemented as a server that provides the machine learning service described above on-premises. In addition, the server device 10 may provide the machine learning service described above as a cloud service by being implemented as a software as a service (SaaS) type application.
The client terminal 30 is an example of a computer that receives the provision of the machine learning service described above. For example, the client terminal 30 corresponds to a desktop type computer such as a personal computer or the like. This is merely an example, and the client terminal 30 may be any computer such as a laptop type computer, a mobile terminal device, or a wearable terminal.
A knowledge graph is given as an example of a graph to be embedded. For example, in the knowledge graph, knowledge is expressed as a triad, a so-called triple, such as “for s (subject), a value (object) of r (predicate) is o”. There is a case where s and o are referred to as entities, and r is referred to as a relation. Transformation for embedding each of the triple elements (s, r, and o) as a vector in a feature space is acquired by performing machine learning. A model generated through machine learning in this way can be used for inference such as link prediction for predicting a triple having an unknown relationship, as an example.
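For concreteness, the following is a minimal sketch of how such triples could be held in memory; the entity and relation names used here are purely illustrative assumptions, not data from the embodiment.

```python
# Each triple (s, r, o) reads "for s, the value of r is o".
# The names below are hypothetical examples.
triples = [
    ("e1", "r", "e8"),
    ("e2", "r", "e8"),
]

# s and o are entities; r is a relation.
entities = {s for s, _, _ in triples} | {o for _, _, o in triples}
relations = {r for _, r, _ in triples}
```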
As described in the field of background art above, in the knowledge graph embedding technique described above, even though the effect on convergence of a model parameter is not the same for all entities, all the entities are treated in the same way when machine learning is performed. Therefore, acceleration of machine learning is limited because some entities prolong the convergence of the model parameter.
Therefore, as one aspect, the machine learning service according to the present embodiment adopts a problem-solving approach that assigns priority among the entities used when machine learning is performed. Such an approach is based on the technical insight that the effect on the convergence of the model parameter differs according to whether or not an entity is positioned in a connection portion between modules that appear in a network having a graph structure indicating a relationship between the entities.
That is, as described below, various studies have found that the network described above has a module structure. Merely as an example, modularity is often found in networks investigated with “Stack Overflow”. The network may be divided into some modules, and in particular, such a module may be referred to as a community in a social network. As another example, as is clear from the following Paper 1, functional modules that can be considered separately in a biochemically collective manner exist in actual metabolic networks. [Paper 1] Ravasz E, Somera A L, Mongru D A, Oltvai Z N, Barabasi A L. “Hierarchical organization of modularity in metabolic networks.” Science. 2002 Aug 30;297(5586):1551-5.
Moreover, the modularity of the modules that appear in the network also has an aspect of appearing in a correlation matrix. In
For example, the network described above can be generated from the correlation matrix between the entities. Here, merely as an example, an example will be described where it is assumed that entities whose correlation coefficient is equal to or more than a predetermined threshold, for example, 0.7, have a relation, and a network having a graph structure in which each entity is expressed as a node and each relation is expressed as an edge is generated.
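The following is a minimal sketch of such network generation, assuming the correlation data is given as a correlation matrix between entities; the matrix values and entity names are illustrative assumptions.

```python
import numpy as np
import networkx as nx

# Illustrative correlation matrix between four entities (values are assumptions).
entities = ["e1", "e2", "e3", "e4"]
corr = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.75, 0.2],
    [0.8, 0.75, 1.0, 0.15],
    [0.1, 0.2, 0.15, 1.0],
])
THRESHOLD = 0.7  # pairs correlated at or above this value are assumed to have a relation

G = nx.Graph()
G.add_nodes_from(entities)
for i in range(len(entities)):
    for j in range(i + 1, len(entities)):
        if corr[i, j] >= THRESHOLD:
            # each entity becomes a node and each relation becomes an edge
            G.add_edge(entities[i], entities[j], weight=corr[i, j])
```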
Such a network can be classified into a plurality of modules as described above. Merely as an example, by applying spectral clustering to the node included in the network, the node can be classified into a cluster corresponding to the module.
Entity: ei is an element of E1 (i = 1, . . . , 6) ... module_1
Entity: ej is an element of E2 (j = 7, 8, 9) ... module_2
In the example illustrated in
Under such a condition in which the mediating entities and the in-module entities are identified, a plurality of pieces of training data expressed by the triple (s, r, o) is distinguished according to whether or not the element s or o of the triple includes the mediating entity.
For example, while the training data in which the element s or o of the triple includes the mediating entity is identified as “first training data”, the training data in which the element s or o of the triple does not include the mediating entity is identified as “other training data”.
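A minimal sketch of this distinction is shown below, assuming the set of mediating entities has already been specified; the entity names and triples are illustrative.

```python
# Assumed inputs: `triples` is the training data as (s, r, o) tuples, and
# `mediating` is the set of mediating entities specified beforehand.
mediating = {"e1", "e2", "e8"}   # illustrative
triples = [("e8", "r", "e1"), ("e8", "r", "e2"), ("e3", "r", "e4"), ("e5", "r", "e6")]

# "First training data": either s or o is a mediating entity.
first_training_data = [(s, r, o) for s, r, o in triples
                       if s in mediating or o in mediating]
# "Other training data": everything else.
other_training_data = [t for t in triples if t not in first_training_data]
```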
As an example of such first training data, t1 (e8, r, e1) and t2 (e8, r, e2) are given. When the parameter of the model is updated on the basis of these t1 and t2, that is, on the basis of e8, e1, and e2, this affects almost all the entities in E1 and E2. Furthermore, when the parameter of the model is corrected on the basis of other training data that includes the in-module entities of E1 and E2 but does not include the mediating entities such as e8, e1, or e2, the parameter of the model accordingly needs to be corrected on the basis of t1 and t2 again.
From these, even if training based on the entities in the module_1 and the module_2 converges, it is still considered that the cost of the triples t1 and t2 of the mediating entities does not decrease and that the number of necessary training iterations is larger than that of the in-module entities.
More specifically, since the number of in-module entities is larger than that of the mediating entities, the cost is calculated with a large number of triples in a single epoch. Therefore, embedded vectors that satisfy these triples at the same time are trained. On the other hand, since the number of triples of the mediating entities is smaller than that of the in-module entities, there are few opportunities for cost calculation in one epoch. From these, there is a high possibility that, for a triple including the mediating entity, it takes a long time for the parameter of the model to converge, or training ends at a high cost.
From the above, the machine learning service according to the present embodiment prioritizes the first training data, which includes the mediating entity in the triple, over the other training data, which does not include the mediating entity in the triple, in terms of the execution order of machine learning or the change rate of the model parameter.
That is, training regarding a global relationship across the plurality of modules is performed first. This stabilizes, in advance, the expression of the embedded vector of an influencer that affects the expression of the embedded vectors of other entities. Then, training regarding a local relationship closed within a single module follows. Entities other than the influencer, for example, the in-module entities, are highly independent. Therefore, even if the expression of the embedded vector of an entity of another module is trained, the effect on the entities of the module is small. Accordingly, under a situation in which the expression of the embedded vector of the mediating entity corresponding to the influencer is stable, the expression of the embedded vectors of the entities of the module can be settled with only minor corrections.
Therefore, according to the machine learning service according to the present embodiment, since the convergence of the model parameter can be accelerated, it is possible to accelerate machine learning related to graph embedding.
Next, a functional configuration of the server device 10 according to the present embodiment will be described. In
The communication interface unit 11 corresponds to an example of a communication control unit that controls communication with another device, for example, the client terminal 30.
Merely as an example, the communication interface unit 11 is realized by a network interface card such as a LAN card. For example, the communication interface unit 11 receives a request for performing machine learning from the client terminal 30 or outputs the machine learning model generated as a result of machine learning to the client terminal 30.
The storage unit 13 is a functional unit that stores data used for various programs, including the operating system (OS) executed by the control unit 15 and the machine learning program described above.
As an embodiment, the storage unit 13 is realized by an auxiliary storage device of the server device 10. For example, a hard disk drive (HDD), an optical disc, a solid state drive (SSD), or the like corresponds to the auxiliary storage device. Additionally, a flash memory such as an erasable programmable read only memory (EPROM) may correspond to the auxiliary storage device.
As an example of the data used for the programs executed by the control unit 15, the storage unit 13 stores correlation data 13A, training data 13L, and model data 13M. In addition to the correlation data 13A, the training data 13L, and the model data 13M, the storage unit 13 can store various other types of data, such as test data used for a test of a trained model and account information of a user who receives provision of the machine learning service described above.
The correlation data 13A is data indicating a correlation of entities. Merely as an example, as the correlation data 13A, data associated with a correlation coefficient between the entities for each combination of the entities or the like can be adopted.
The training data 13L is data used for machine learning related to graph embedding. As an example of such training data 13L, the storage unit 13 stores a plurality of pieces of training data expressed by the triple (s, r, o).
The model data 13M is data related to the machine learning model. For example, in a case where the machine learning model is a neural network, the model data 13M may include the layer structure of the model, such as the neurons and synapses of each of the layers forming the model, including an input layer, a hidden layer, and an output layer, as well as parameters of the model such as a weight and a bias of each layer. Note that, at a stage before model training is performed, a parameter initially set by a random number is stored as an example of the parameter of the model, while at a stage after the model training is performed, a trained parameter is saved.
The control unit 15 is a processing unit that performs overall control of the server device 10. As one embodiment, the control unit 15 is realized by a hardware processor such as a central processing unit (CPU) or a micro-processing unit (MPU). While the CPU and the MPU are exemplified as examples of the processor here, the control unit 15 may be implemented by any processor, regardless of whether it is a general-purpose type or a specialized type. In addition, the control unit 15 may be realized by hard wired logic such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
By developing the machine learning program described above in a memory (not illustrated), for example, in a work area of a random access memory (RAM), the control unit 15 virtually realizes the following processing units. As illustrated in
The reception unit 15A is a processing unit that receives an execution request of machine learning described above. As an embodiment, the reception unit 15A can receive designation of a set of data used for machine learning described above, for example, the correlation data 13A, the training data 13L, and the model data 13M. As described above, some or all of the datasets used for machine learning do not necessarily need to be data stored in the storage unit 13. For example, the reception unit 15A can receive some or all of the datasets saved in the client terminal 30 or an external device (not illustrated), for example, a file server or the like. Then, the reception unit 15A reads the set of data designated from the client terminal 30, for example, the correlation data 13A, the training data 13L, and the model data 13M, from the storage unit 13 to a predetermined storage region, for example, a work area that can be referred to by the control unit 15.
The generation unit 15B is a processing unit that generates a network having a graph structure indicating a relationship between the entities. As an embodiment, the generation unit 15B can generate the network from the correlation matrix between the entities included in the correlation data 13A. For example, the generation unit 15B generates the network having the graph structure in which each entity is expressed as a node and each relation is expressed as an edge, assuming that entities whose correlation coefficient in the correlation matrix included in the correlation data 13A is equal to or more than a predetermined threshold, for example, 0.7, have a relation.
In the example illustrated in
The classification unit 15C is a processing unit that classifies the node included in the network into a plurality of modules. The “module” here corresponds to an example of a group. As an example, by applying spectral clustering on the node included in the network generated by the generation unit 15B, the classification unit 15C can classify the node into a cluster corresponding to the module. In a case where spectral clustering is applied in this way, the classification unit 15C can use a correlation coefficient between entities corresponding to nodes at both ends of an edge, as a similarity, to set a weight given to each edge included in the network. For example, in a case where the node corresponding to each of the entities e1 to e9 included in the network illustrated in
Note that, here, a case is illustrated where each entity is independent between the two modules. However, each entity does not necessarily need to be completely independent. For example, in a case where the conditions that the intersection of E1 and E2 is Es, |Es| << |E1|, and |Es| << |E2| are satisfied, the existence of the entities Es overlapping between the two modules may be allowed.
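A minimal sketch of this classification step is shown below, assuming the network G built in the earlier sketch with correlation coefficients stored as edge weights; the number of modules (two here) is an assumption, not something the embodiment fixes.

```python
import networkx as nx
from sklearn.cluster import SpectralClustering

# The weighted adjacency matrix doubles as a similarity matrix: the weight of
# each edge is the correlation coefficient between the entities at its ends.
similarity = nx.to_numpy_array(G, weight="weight")

clustering = SpectralClustering(
    n_clusters=2,              # assumed number of modules
    affinity="precomputed",    # use the similarity matrix directly
    assign_labels="kmeans",
    random_state=0,
).fit(similarity)

# node -> module label (e.g., 0 for module_1, 1 for module_2)
modules = dict(zip(G.nodes(), clustering.labels_))
```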
The specification unit 15D is a processing unit that specifies the first entity positioned in the connection portion of the graph structure between the modules. The “module” here corresponds to an example of a group. As an embodiment, the specification unit 15D searches for an edge connecting between the modules that are generated according to the classification of the clustering by the classification unit 15C, from the network generated by the generation unit 15B. The entities corresponding to the nodes at both ends of the edge hit in such a search are specified as the mediating entities. For example, in the example illustrated in
Here, the plurality of modules is not necessarily connected by one edge.
In preparation for such a case, an upper limit of the number of concatenations of the edges for connecting between the modules can be set as a search condition at the time of searching for the edge. For example, when the upper limit of the number of concatenations is set to “2”, edges for connecting the two modules module_m and module_m+1 with two concatenations, that is, two edges indicated by thick lines in
Note that, here, an example has been described where the upper limit of the number of concatenations of the edges for connecting between the modules is set. However, it is also possible to set an initial value of the number of concatenations to “0” and to search for the edge connecting between the modules while incrementing the number of concatenations until a predetermined number of mediating entities is obtained or the number of concatenations reaches the upper limit.
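The search described above could look like the following sketch, which uses the upper limit on the number of concatenated edges as a cutoff and, optionally, widens the search until enough mediating entities are found; `G` and `modules` are assumed from the earlier sketches, and the function names are hypothetical.

```python
import networkx as nx
from itertools import combinations

def mediating_entities(G, modules, max_hops=1):
    """Entities on paths of at most `max_hops` edges that connect different modules."""
    found = set()
    for u, v in combinations(G.nodes(), 2):
        if modules[u] == modules[v]:
            continue  # both nodes are in the same module; not a connection portion
        for path in nx.all_simple_paths(G, u, v, cutoff=max_hops):
            found.update(path)  # every node on the connecting path mediates the modules
    return found

def mediating_entities_incremental(G, modules, target_count, upper_limit):
    """Widen the number of concatenations until enough mediating entities are found."""
    found = set()
    for hops in range(1, upper_limit + 1):
        found = mediating_entities(G, modules, max_hops=hops)
        if len(found) >= target_count:
            break
    return found
```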
As described above, the mediating entity specified by the specification unit 15D is saved as a first entity 15E in the storage region that can be referred to by the execution unit 15F.
The execution unit 15F is a processing unit that performs machine learning. As an embodiment, the execution unit 15F prioritizes the first training data, which includes the mediating entity in the triple, over the other training data, which does not include the mediating entity in the triple, in terms of the execution order of machine learning or the change rate of the model parameter.
Hereinafter, merely as an example, an example will be described where the execution order of machine learning of the first training data is prioritized. In this case, the execution unit 15F extracts the first training data including the mediating entity specified by the specification unit 15D from among the training data included in the training data 13L. Besides, the execution unit 15F repeats the following processing the number of times corresponding to the number of pieces of the first training data for each epoch until a predetermined end condition is satisfied. In other words, the execution unit 15F inputs the first training data into a model developed on the work area (not illustrated) according to the model data 13M. As a result, a score Φ of a triple of the first training data is output from the model.
Here, various models illustrated in
All of the examples of the model illustrated in
[Paper 3] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning, pages 809-816.
[Paper 4] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In The 3rd International Conference on Learning Representations.
[Paper 5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787-2795.
[Paper 6] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In The Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 1112-1119.
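Among the models cited above, TransE (Paper 5) is perhaps the simplest to illustrate: a triple (s, r, o) is scored by the distance between s + r and o, and the parameters are updated with a margin-based ranking cost against corrupted (negative-sampled) triples. The following numpy sketch is an illustrative reimplementation under those assumptions, not the embodiment's exact procedure; the embedding dimension, margin, learning rate, and entity names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, margin, lr = 50, 1.0, 0.01

# Randomly initialized embeddings (the entity/relation names are illustrative).
entities = {e: rng.normal(size=dim) for e in ["e1", "e2", "e3", "e8"]}
relations = {"r": rng.normal(size=dim)}

def score(s, r, o):
    # TransE: a triple scores well when s + r is close to o.
    return np.linalg.norm(entities[s] + relations[r] - entities[o])

def update(pos, neg):
    """One SGD step on the margin-based cost for a positive/corrupted triple pair."""
    (s, r, o), (sn, rn, on) = pos, neg
    loss = margin + score(s, r, o) - score(sn, rn, on)
    if loss <= 0:
        return 0.0                      # the margin is already satisfied
    d_pos = entities[s] + relations[r] - entities[o]
    d_neg = entities[sn] + relations[rn] - entities[on]
    g_pos = d_pos / (np.linalg.norm(d_pos) + 1e-12)
    g_neg = d_neg / (np.linalg.norm(d_neg) + 1e-12)
    entities[s] -= lr * g_pos
    relations[r] -= lr * g_pos
    entities[o] += lr * g_pos
    entities[sn] += lr * g_neg
    relations[rn] += lr * g_neg
    entities[on] -= lr * g_neg
    return loss

# The first training data (triples containing a mediating entity) is fed first;
# the corrupted triple ("e8", "r", "e3") is a negative sample, not training data.
update(("e8", "r", "e1"), ("e8", "r", "e3"))
```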
Thereafter, in a case where the score Φ of the triple is obtained for each piece of the first training data, the execution unit 15F updates the parameter of the model. Merely as an example, in a case where “TransE” of the models illustrated in
In this case, after machine learning of the first training data ends, the execution unit 15F extracts the other training data that does not include the mediating entity specified by the specification unit 15D from among the training data included in the training data 13L. Besides, the execution unit 15F repeats the following processing the number of times corresponding to the number of pieces of the other training data for each epoch until a predetermined end condition is satisfied. In other words, the execution unit 15F inputs the other training data into the model developed on the work area (not illustrated) according to the model data 13M after machine learning of the first training data ends. As a result, a score Φ of a triple of the other training data is output from the model.
Thereafter, in a case where the score Φ of the triple is obtained for each piece of the other training data, the execution unit 15F updates the parameter of the model. Merely as an example, in a case where "TransE" of the models illustrated in
As illustrated in
Subsequently, the generation unit 15B generates the network having the graph structure indicating the relationship between the entities on the basis of the correlation matrix between the entities included in the correlation data 13A (step S102).
Then, by applying spectral clustering on the node included in the network generated in step S102, the classification unit 15C classifies the node into the cluster corresponding to the module (step S103).
Subsequently, the specification unit 15D specifies the mediating entity positioned in the connection portion of the graph structure between the modules generated according to the classification of the clustering in step S103, from the network generated in step S102 (step S104).
Thereafter, the execution unit 15F extracts the first training data including the mediating entity specified in step S104, from among the training data included in the training data 13L (step S105). Besides, the execution unit 15F repeats the processing from step S106 to step S108 below until a predetermined end condition is satisfied. Moreover, the execution unit 15F repeats the processing in step S106 below for the number of times corresponding to the number of pieces of the first training data for each epoch.
In other words, the execution unit 15F inputs the first training data into the model developed on the work area (not illustrated) according to the model data 13M (step S106). As a result, a score Φ of a triple of the first training data is output from the model.
Thereafter, in a case where the score Φ of the triple is obtained for each piece of the first training data, the execution unit 15F calculates a cost on the basis of the scores Φ of all the triples (step S107). On the basis of the cost calculated in this way, after calculating the parameter by, for example, optimizing the log likelihood, the execution unit 15F updates the parameter of the model included in the model data 13M to the parameter obtained through the calculation (step S108).
Then, after executing steps S106 to S108 described above until a predetermined end condition is satisfied and machine learning of the first training data ends, the execution unit 15F executes the following processing. In other words, as illustrated in
Besides, the execution unit 15F repeats the processing from step S110 to step S112 below until a predetermined end condition is satisfied. Moreover, the execution unit 15F repeats the processing in step S110 below the number of times corresponding to the number of pieces of the other training data for each epoch.
In other words, the execution unit 15F inputs the other training data into the model developed on the work area (not illustrated) according to the model data 13M after machine learning of the first training data ends (step S110). As a result, a score Φ of a triple of the other training data is output from the model.
Then, in a case where the score Φ of the triple is obtained for each piece of the other training data, the execution unit 15F calculates a cost on the basis of the scores Φ of all the triples (step S111). On the basis of the cost calculated in this way, after calculating the parameter by, for example, optimizing the log likelihood, the execution unit 15F updates the parameter of the model included in the model data 13M to the parameter obtained through the calculation (step S112).
Thereafter, after step S110 to step S112 described above are repeatedly executed until the predetermined end condition is satisfied, machine learning of the other training data ends, and the entire processing ends.
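Putting steps S101 to S112 together, the overall flow might be organized as in the sketch below; the helper functions are hypothetical stand-ins for the processing units described above (generation, classification, specification, and execution), and the fixed epoch counts stand in for the predetermined end condition.

```python
def run_machine_learning(correlation_data, training_data, model, epochs=100):
    G = generate_network(correlation_data)          # generation unit      (S102)
    modules = classify_into_modules(G)              # classification unit  (S103)
    mediators = mediating_entities(G, modules)      # specification unit   (S104)

    first = [t for t in training_data
             if t[0] in mediators or t[2] in mediators]              # S105
    others = [t for t in training_data if t not in first]            # S109

    for _ in range(epochs):                         # S106-S108: first training data
        for triple in first:
            train_step(model, triple)
    for _ in range(epochs):                         # S110-S112: the other training data
        for triple in others:
            train_step(model, triple)
    return model
```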
As described above, the machine learning service according to the present embodiment prioritizes machine learning of the training data whose triple includes the mediating entity positioned in the connection portion between the modules that appear in the network with the graph structure indicating the relationship between the entities, over machine learning of the other training data. Therefore, according to the machine learning service according to the present embodiment, since the convergence of the model parameter can be accelerated, it is possible to accelerate machine learning related to graph embedding.
While the embodiment relating to the disclosed device has been described above, the present invention may be carried out in a variety of different modes in addition to the embodiment described above. Thus, hereinafter, another embodiment included in the present invention will be described.
In the first embodiment described above, an example has been described where the execution order of machine learning of the first training data is prioritized. However, the present invention is not limited to this. For example, the execution unit 15F can collectively perform machine learning of the first training data and the other training data. In this case, when updating the parameter using the first training data that includes the mediating entity in the triple, it is sufficient for the execution unit 15F to make the change rate of the parameter larger than in a case where the parameter is updated using the other training data that does not include the mediating entity in the triple.
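A minimal sketch of this alternative, in which the change rate is expressed as a per-triple learning rate; the boost factor is an assumption.

```python
BASE_LR = 0.01   # learning rate for the other training data
BOOST = 5.0      # assumed factor by which the change rate is enlarged

def learning_rate_for(triple, mediating):
    s, _, o = triple
    # Triples containing a mediating entity move the parameters more per update.
    return BASE_LR * BOOST if (s in mediating or o in mediating) else BASE_LR
```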
In the first embodiment described above, merely as an example, an example has been described where the network data is generated using the correlation data 13A that is prepared separately from the training data 13L. However, the correlation data can also be generated from the training data 13L. For example, in a case where a triple that includes a pair of entities corresponding to a combination of the entities included in the training data 13L as s and o exists in the training data 13L, that is, in a case where a relation exists for the combination, the correlation coefficient is set to "1". On the other hand, in a case where a triple including the pair of entities corresponding to the combination as s and o does not exist in the training data 13L, that is, in a case where no relation exists for the combination, the correlation coefficient is set to "0". As a result, the correlation data can be generated.
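A minimal sketch of deriving such binary correlation data directly from the training triples could look like the following; treating the relation as undirected is an assumption of this sketch.

```python
import numpy as np

def correlation_from_triples(triples):
    """Correlation coefficient 1 if some triple links the pair as s and o, else 0."""
    names = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {e: i for i, e in enumerate(names)}
    corr = np.zeros((len(names), len(names)))
    for s, _, o in triples:
        corr[index[s], index[o]] = 1.0
        corr[index[o], index[s]] = 1.0   # treated as undirected here (an assumption)
    np.fill_diagonal(corr, 1.0)
    return names, corr
```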
In the first embodiment described above, as an example of the graph structure of the network, an undirected graph is described. However, the machine learning processing illustrated in
As an example, the server device 10 generates an edge between the nodes corresponding to a combination of entities in a case where at least one of all the types of relations exists for the combination, while the server device 10 does not generate an edge between the nodes corresponding to the combination in a case where none of the types of relations exists. The mediating entity can be specified from the modules obtained by clustering the nodes included in the network generated in this way.
As another example, the server device 10 generates the network for each type of the relation and performs clustering on the network for each type of the relation. Besides, the server device 10 can specify the mediating entity on the basis of the module obtained as a clustering result of a relation having the highest modularity from among the results of clustering for each type of the relation. In this case, the server device 10 can evaluate a degree of the modularity of each relation according to Newman Modularity or the like.
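The per-relation variant could be sketched as follows; `cluster` stands in for the spectral clustering step shown earlier (a hypothetical helper returning a node-to-module mapping for every node), and Newman modularity is computed with networkx.

```python
import networkx as nx
from collections import defaultdict

def best_modules_per_relation(triples, cluster):
    """Cluster one network per relation type and keep the most modular result."""
    by_relation = defaultdict(list)
    for s, r, o in triples:
        by_relation[r].append((s, o))

    best_relation, best_q, best_labels = None, -1.0, None
    for r, pairs in by_relation.items():
        G = nx.Graph()
        G.add_edges_from(pairs)
        labels = cluster(G)                                  # node -> module label
        communities = defaultdict(set)
        for node, label in labels.items():
            communities[label].add(node)
        q = nx.algorithms.community.modularity(G, list(communities.values()))
        if q > best_q:
            best_relation, best_q, best_labels = r, q, labels
    return best_relation, best_q, best_labels
```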
In the first embodiment described above, an example has been described where machine learning of the other training data is performed sequentially for the number of pieces of the other training data.
However, the present invention is not limited to this. As described above, the other training data includes only the in-module entities, and an in-module entity has a sufficiently smaller effect on the expression of the vector of an entity in a different module than the mediating entity does. For example, the other training data may include second training data indicating a relationship between entities in a first group and third training data indicating a relationship between entities in a second group. Accordingly, the execution unit 15F can perform machine learning of the machine learning model by inputting the second training data and the third training data into the machine learning model in parallel, as shown in the sketch following this paragraph. For example, in the example in
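A minimal sketch of this parallel variant, assuming the module-local triples of different modules are largely independent; `train_module` is a hypothetical routine that trains embeddings for one module's triples.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

def split_by_module(triples, modules):
    """Group triples whose s and o fall inside the same module."""
    parts = defaultdict(list)
    for s, r, o in triples:
        if modules.get(s) == modules.get(o):
            parts[modules[s]].append((s, r, o))
    return parts

def train_in_parallel(triples, modules, train_module):
    parts = split_by_module(triples, modules)
    # Second training data (e.g., module_1) and third training data (e.g., module_2)
    # are trained in parallel because their entities barely affect each other.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(train_module, parts.values()))
```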
Furthermore, individual components of each of the illustrated devices are not necessarily physically configured as illustrated in the drawings.
In other words, specific modes of distribution and integration of the individual devices are not restricted to those illustrated, and all or some of the devices may be configured by being functionally or physically distributed and integrated in any unit depending on various loads, usage status, and the like. For example, the reception unit 15A, the generation unit 15B, the classification unit 15C, the specification unit 15D, or the execution unit 15F may be connected via a network as the external device of the server device 10. Furthermore, each of the reception unit 15A, the generation unit 15B, the classification unit 15C, the specification unit 15D, or the execution unit 15F is included in another device, connected to the network, and collaborates together so that the functions of the server device 10 described above may be realized.
In addition, various types of processing described in the embodiment above may be realized by executing a program prepared in advance by a computer such as a personal computer or a workstation. Therefore, hereinafter, an example of a computer that executes the machine learning program that has a function similar to the first embodiment described above and the present embodiment will be described with reference to
As illustrated in
Under such an environment, the CPU 150 reads the machine learning program 170a from the HDD 170 and then loads the machine learning program 170a into the RAM 180. As a result, the machine learning program 170a functions as a machine learning process 180a as illustrated in
Note that the machine learning program 170a described above does not necessarily have to be stored in the HDD 170 or the ROM 160 from the beginning. For example, the machine learning program 170a is stored in a “portable physical medium” such as a flexible disk, which is what is called an FD, a compact disc (CD)-ROM, a digital versatile disk (DVD), a magneto-optical disk, or an integrated circuit (IC) card to be inserted in the computer 100. Then, the computer 100 may obtain and execute the machine learning program 170a from those portable physical media. Furthermore, the machine learning program 170a may be stored in another computer, a server device, or the like connected to the computer 100 via a public line, the Internet, a local area network (LAN), a wide area network (WAN), or the like, and the computer 100 may obtain and execute the machine learning program 170a from them.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2020/008992 filed on Mar. 3, 2020 and designated the U.S., the entire contents of which are incorporated herein by reference.
Related application data: Parent application PCT/JP2020/008992, filed March 2020 (US); child application No. 17897290 (US).