The embodiments discussed herein are related to machine learning based on a knowledge graph.
A knowledge graph (KG) is embedded in a vector space so that the nodes (entities) and links (relations) in the knowledge graph are represented by vectors. Such a vector representation is also called an embedded representation. The knowledge graph is also an example of knowledge data composed of an ontology, which is knowledge at a generalized level (classes) and has a hierarchical structure, and instances, which are knowledge at a concrete-example level and have a graph structure.
Machine learning using such a KG vector representation has been used to express the relationships between entities as vectors. For example, the machine learning is performed so that the vectors (vh, vr, vt) corresponding to a triple (subject, predicate, object)=(h: starting point, r: relation, t: end point), which is a set of three elements included in the given KG, satisfy “vh+vr=vt”, and the vectors of entities and the vectors of relations are updated. By using the vectors generated by such machine learning, link prediction, relation extraction, class prediction, and the like are performed.
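As a minimal sketch of such an update (an illustration only, not the embodiments themselves; the embedding dimension, learning rate, and margin below are arbitrary assumptions), the “vh+vr=vt” objective can be approached as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr, margin = 50, 0.01, 1.0  # assumed hyperparameters

# randomly initialized vectors for h (starting point), r (relation), t (end point)
vh, vr, vt = (rng.normal(size=dim) for _ in range(3))

def score(vh, vr, vt):
    # translation-based distance: small when vh + vr is close to vt
    return np.linalg.norm(vh + vr - vt)

# one update step pulling vh + vr toward vt when the triple fits poorly
if score(vh, vr, vt) > margin:
    g = lr * (vh + vr - vt) / score(vh, vr, vt)
    vh, vr, vt = vh - g, vr - g, vt + g
```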
For example, link prediction is an operation that predicts an entity having a relationship by using known entities and links; for example, the vector “end point” is predicted by inputting the vector “starting point” and the vector “relation” into a model. Relation extraction is an operation that predicts, from two entities, the relationship between them; for example, the vector “relation” is predicted by inputting the vector “starting point” and the vector “end point” into a model. Class prediction is an operation that predicts, from two entities, a class to which they belong; for example, a vector “class” is predicted by inputting the vector “starting point” and the vector “end point” into a model.
In recent years, machine learning methods that introduce constraints based on the entailment relation between relations into the embedding calculations (vector calculations) have become known as a way to increase the accuracy of models. Specifically, when there is a relation q between given entities e1 and e2 and there is always a relation r (r entails q), each vector is updated so that the score of a triple (e1, q, e2) is higher than the score of a triple (e1, r, e2). A conventional technology is described in Boyang Ding et al., “Improving Knowledge Graph Embedding Using Simple Constraints”, ACL 2018, for example.
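By way of a non-limiting illustration, such a constraint can be written as a hinge loss. The following sketch assumes a translation-based score (the negative distance, so that higher is better) and an arbitrary margin gamma; neither choice is mandated by the cited technique:

```python
import numpy as np

def triple_score(ve1, vr, ve2):
    # higher is better: the negative translation distance
    return -np.linalg.norm(ve1 + vr - ve2)

def entailment_loss(ve1, vq, vr, ve2, gamma=0.1):
    # hinge loss enforcing score(e1, q, e2) >= score(e1, r, e2) + gamma
    # for a specific relation q and a relation r that entails q
    return max(0.0, gamma + triple_score(ve1, vr, ve2) - triple_score(ve1, vq, ve2))
```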
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein a program that causes a computer to execute a process. The process includes specifying a first triple and a second triple included in a knowledge graph. The process further includes determining which of the first triple and the second triple is associated with more specific information on the basis of at least one of a first comparison and a second comparison. The first comparison is a comparison between a first relation between first two entities included in the first triple and a second relation between second two entities included in the second triple according to an occurrence status of each of a plurality of relations between a plurality of entities in a specific set of classes included in the knowledge graph. The second comparison is a comparison between a first entity connected to any one of the first two entities and a second entity connected to any one of the second two entities. When it is determined by the determining that the first triple is associated with the more specific information, the process includes generating vectors representing elements of the first triple and vectors representing elements of the second triple by machine learning based on a constraint that a difference in the vectors representing the elements of the first triple is smaller than a difference in the vectors representing the elements of the second triple.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
However, the above conventional technology does not always provide a highly accurate vector representation acquired using a model.
For example, the above conventional technology gives higher scores to a triple including a relation (target) with a low abstraction level, but the accuracy of vector representations may decrease for unorganized knowledge graphs. For example, targets may be commonly used by different classes of entities; in such cases, the above conventional technology may fail to find a relation that entails the target and may not be able to apply the constraint accurately, which reduces training accuracy. Furthermore, the above conventional technology deals with the abstraction level of relations but not with the abstraction level of entities, so the accuracy of the generated models is not always as high as expected.
Preferred embodiments will be explained with reference to accompanying drawings. These examples do not limit the invention. The examples can be appropriately combined without causing a contradiction.
The knowledge graph is an example of knowledge data having ontologies and instances.
The instances of the knowledge graph are knowledge at a concrete-example level and have a graph structure. For example, an entity “Hanako” is connected to an entity “Kawasaki” by a relation “residence”, and an entity “Jiro” is connected to the entity “Hanako” by a relation “friend”. An entity “Ichiro” is connected to the entity “Kawasaki” by a relation “birthplace”, and is connected to the entity “Jiro” by a relation “brother”. The entity “Kawasaki” belongs to the class “Place”, and therefore has a relation “type” between the class “Place” and the entity “Kawasaki”. Similarly, the entities “Hanako”, “Jiro”, and “Ichiro” belong to the class “Person” and are each connected to the class “Person” by a relation “type”.
In a reference technology using a technology called TransE, which is a kind of translation-based model, machine learning is performed so that the vectors (vh, vr, vt) corresponding to a triple (h, r, t), which is a set of three elements included in a given knowledge graph, satisfy “vh+vr=vt”, and the vectors of entities and the vectors of relations are updated. In this case, the reference technology introduces constraints using the entailment relation between relations into the vector calculations (embedding calculations).
This reference technology performs machine learning by giving higher scores to a triple including a relation and a target with a low abstraction level; however, in an unorganized knowledge graph, targets may be commonly used by different classes of entities. In such a case, the reference technology may fail to find a relation that entails the target, and the constraints cannot be applied properly.
Furthermore, the reference technology deals with the abstraction level of relations, but is not able to deal with the abstraction level of entities. For example, it may be desirable to focus more on the fact that a baseball player and a basketball player are friends than on the fact that one person and another person are friends.
Therefore, in the example, ontology-based constraints are introduced in machine learning based on a knowledge graph to improve the accuracy of vector calculation. Specifically, as illustrated in
First, the information processing apparatus 10 using the first method is described.
As illustrated in
The storage unit 12 stores therein various data, computer programs to be executed by the control unit 20, and the like. For example, the storage unit 12 stores therein a knowledge graph 13 and a model 14. Note that the storage unit 12 can also store therein intermediate data and the like generated while the control unit 20 performs processing.
The knowledge graph 13 is an example of a knowledge base including ontologies and instances.
The class “Person” includes an entity “Ichiro”, an entity “Jiro”, an entity “Hanako”, and an entity “Saburo”. The class “Company” includes an entity “A Corp.”, an entity “B Corp.”, and an entity “C Corp.”. The class “SportsClub” includes an entity “A Team”.
The entity “Ichiro” is connected to the entity “A Corp.” by a relation “affiliation”, and is connected to the entity “A Corp.” by a relation “member”. The entity “Jiro” is connected to the entity “B Corp.” by a relation “affiliation”, and is connected to the entity “B Corp.” by a relation “member”. The entity “Hanako” is connected to the entity “C Corp.” by a relation “affiliation”. The entity “Saburo” is connected to the entity “A Team” by a relation “member”.
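For illustration only, this portion of the knowledge graph 13 can be written down as a list of triples together with the class membership given by the “type” relations; the Python representation below is an assumption made for the sketches that follow, not part of the embodiment:

```python
# instance triples (subject, relation, object) of the knowledge graph 13
triples = [
    ("Ichiro", "affiliation", "A Corp."), ("Ichiro", "member", "A Corp."),
    ("Jiro", "affiliation", "B Corp."), ("Jiro", "member", "B Corp."),
    ("Hanako", "affiliation", "C Corp."),
    ("Saburo", "member", "A Team"),
]

# class membership given by the "type" relations of the ontology
entity_class = {
    "Ichiro": "Person", "Jiro": "Person", "Hanako": "Person", "Saburo": "Person",
    "A Corp.": "Company", "B Corp.": "Company", "C Corp.": "Company",
    "A Team": "SportsClub",
}
```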
The model 14 is a model used for machine learning of vector representations. For example, the model 14 is a translation-based model for complementing the knowledge graph, and obtains vectors of continuous values representing entities and relations.
The control unit 20 is a processing unit that controls the entire information processing apparatus 10, and includes an acquisition unit 21, a determination unit 22, a generation unit 23, and a prediction unit 24.
The acquisition unit 21 acquires the knowledge graph 13 and stores the knowledge graph 13 in the storage unit 12. For example, the acquisition unit 21 acquires the knowledge graph 13 from a designated acquisition destination, or acquires the knowledge graph 13 transmitted from an administrator terminal or the like.
When determining the entailment relation of each class in the knowledge graph, the determination unit 22 performs the entailment determination while limiting its range to each class pair. Specifically, the determination unit 22 specifies a first triple and a second triple included in the knowledge graph. Then, the determination unit 22 compares a first relation between first two entities included in the first triple with a second relation between second two entities included in the second triple according to the occurrence status of each of a plurality of relations between a plurality of entities in a specific set of classes included in the knowledge graph 13. Then, the determination unit 22 determines which of the first triple and the second triple is associated with more specific information.
For example, the determination unit 22 enumerates the triples belonging to a certain class pair and, for every relation included in the enumerated triples, enumerates the subject-object pairs having that relation. Then, for each pair of relations, the determination unit 22 determines that an entailment relation holds when the subject-object combinations of one relation include all the subject-object combinations of the other relation.
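A minimal sketch of this determination, reusing the triples and entity_class data from the sketch above (the helper name and return format are our own assumptions):

```python
from collections import defaultdict

def entailments_per_class_pair(triples, entity_class):
    """Return (class_pair, r, q) where, within the class pair, every
    subject-object pair having relation q also has relation r."""
    pairs = defaultdict(lambda: defaultdict(set))
    for s, rel, o in triples:
        pairs[(entity_class[s], entity_class[o])][rel].add((s, o))
    found = []
    for class_pair, by_rel in pairs.items():
        for r in by_rel:
            for q in by_rel:
                if r != q and by_rel[q] <= by_rel[r]:
                    found.append((class_pair, r, q))
    return found

# With the data above this yields [(('Person', 'Company'), 'affiliation', 'member')]:
# within (Person, Company), every subject-object pair of "member" also has
# "affiliation". The triple (Saburo, member, A Team) belongs to the class pair
# (Person, SportsClub) and therefore does not block the determination, which is
# the point of limiting the check to a class pair.
```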
When
When it is determined that the first triple is associated with the more specific information, the generation unit 23 performs machine learning based on the knowledge graph 13 and the model 14 under the constraint that the difference in vectors representing elements of the first triple is smaller than the difference in vectors representing elements of the second triple, and generates vectors of entities and vectors of relations.
For example, when there is a relation q between given entities e1 and e2 (e1 belongs to C1 and e2 belongs to C2) belonging to a class pair (C1, C2) and there is always a relation r (r entails q), the generation unit 23 updates each vector so that the score of a triple (e1, q, e2) is higher than the score of a triple (e1, r, e2).
For example, for the triple (Ichiro, affiliation, A Corp.) and the triple (Ichiro, member, A Corp.), the generation unit 23 updates vectors so that “v(Ichiro)+v(member) is closer to v(A Corp.) than v(Ichiro)+v(affiliation)”. Similarly, for the triple (Jiro, affiliation, B Corp.) and the triple (Jiro, member, B Corp.), the generation unit 23 updates vectors so that “v(Jiro)+v(member) is closer to v(B Corp.) than v(Jiro)+v(affiliation)”.
In this way, for each triple of each class pair, the generation unit 23 performs machine learning based on the knowledge graph 13 and the model 14 to generate vectors of entities and vectors of relations. Various methods, such as a gradient method, can be used as the machine learning method.
The prediction unit 24 performs link prediction, relation extraction, class prediction, and the like by using the model 14 and the like. Specifically, the prediction unit 24 predicts the vector (end point) by inputting the vector (starting point) and the vector (relation) to the model 14. The prediction unit 24 also predicts the vector (relation) by inputting the vector (starting point) and the vector (end point) to the model 14.
For example, when predicting an entity connected to the entity “Ichiro” by a relation “brotherOf”, the prediction unit 24 inputs the vector “v(Ichiro)” of the entity “Ichiro” and the vector “v(brotherOf)” of the relation “brotherOf” to the model 14. Then, the prediction unit 24 acquires, as a prediction result, a result that is output by the execution of a vector operation “v(Ichiro)+v(brotherOf)” and the like by the model 14. Then, the prediction unit 24 stores the prediction result in the storage unit 12, displays the prediction result on a display and the like, or transmits the prediction result to the administrator terminal.
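As a sketch of such a prediction (the nearest-neighbor retrieval below is an illustrative assumption, not the only possible implementation, and the entity vectors are hypothetical):

```python
import numpy as np

def predict_end_point(entity_vecs, relation_vecs, head, relation):
    """Link prediction: rank entities by distance to v(head) + v(relation)."""
    query = entity_vecs[head] + relation_vecs[relation]
    ranked = sorted(entity_vecs, key=lambda e: np.linalg.norm(query - entity_vecs[e]))
    return [e for e in ranked if e != head]

# usage with vectors obtained from the trained model 14 (hypothetical data):
# predict_end_point(entity_vecs, relation_vecs, "Ichiro", "brotherOf")[0]
```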
Subsequently, the generation unit 23 acquires the triple (e1, r, e2) from the knowledge graph (S104), and determines whether the vector magnitude “∥e1+r−e2∥” of the triple is greater than a threshold value (Margin) (S105).
When “∥e1+r−e2∥” is greater than the threshold value (Yes at S105), the generation unit 23 updates the vectors of “e1, r, e2” so that the vector difference (e1+r−e2) is closer to 0 (S106).
After S106 is performed or when “∥e1+r−e2∥” is equal to or less than the threshold value (No at S105), the generation unit 23 acquires the relation q that entails the relation r or is entailed by the relation r (S107).
When the relation r entails the relation q (Yes at S108), the generation unit 23 updates the vectors of “e1, r, e2” so that the vector difference (e1+r−e2) is greater than the vector difference (e1+q−e2) (S109).
On the other hand, when the relation r does not entail the relation q (No at S108), the generation unit 23 updates the vectors of “e1, r, e2” so that the vector difference (e1+r−e2) is smaller than the vector difference (e1+q−e2) (S110).
Subsequently, the generation unit 23 terminates the process when there is no vector to be updated or when the process has been repeated a prescribed number of times (Yes at S111). When there is a vector to be updated or when the number of times of execution is less than the prescribed number (No at S111), the generation unit 23 repeats S104 and subsequent steps.
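The flow from S104 to S111 can be sketched as follows. Only the Yes branch of S108 (r entails q) is shown, the opposite branch being symmetric; the learning rate, margins, epoch count, and the entailed_by lookup table are assumptions:

```python
import numpy as np

def train_first_method(triples, entailed_by, ent_vecs, rel_vecs,
                       lr=0.01, margin=1.0, gamma=0.1, epochs=100):
    """Sketch of S104 to S111; entailed_by[r] lists relations q that r entails."""
    eps = 1e-9
    for _ in range(epochs):                            # prescribed repetitions (S111)
        for e1, r, e2 in triples:                      # S104
            d = ent_vecs[e1] + rel_vecs[r] - ent_vecs[e2]
            if np.linalg.norm(d) > margin:             # S105
                g = lr * d / (np.linalg.norm(d) + eps)
                ent_vecs[e1] -= g; rel_vecs[r] -= g; ent_vecs[e2] += g  # S106
            for q in entailed_by.get(r, []):           # S107, Yes at S108
                d = ent_vecs[e1] + rel_vecs[r] - ent_vecs[e2]
                dq = ent_vecs[e1] + rel_vecs[q] - ent_vecs[e2]
                if np.linalg.norm(d) < np.linalg.norm(dq) + gamma:
                    # S109: the general relation r should fit the triple
                    # less tightly than the more specific relation q
                    rel_vecs[r] += lr * d / (np.linalg.norm(d) + eps)
                    rel_vecs[q] -= lr * dq / (np.linalg.norm(dq) + eps)
    return ent_vecs, rel_vecs
```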
As described above, by determining the entailment of relations for each class pair, the information processing apparatus 10 according to the first example can appropriately identify entailment relations between the relationships of entities, even for relations with a low abstraction level that are used across a plurality of class pairs, and can reflect those entailment relations in machine learning. As a result, the information processing apparatus 10 can generate highly accurate vector representations.
Next, in the second example, the second method using a class hierarchy is described. The functional configuration of the information processing apparatus 10 according to the second example is the same as in the first example, so a detailed description thereof is omitted. The information processing apparatus 10 according to the second example applies constraints using the class hierarchy during machine learning of vector representations.
Specifically, the determination unit 22 compares a first entity connected to any one of the first two entities with a second entity connected to any one of the second two entities and determines which of the first triple and the second triple is associated with more specific information.
When it is determined that the first triple is associated with the more specific information, the generation unit 23 performs machine learning based on the knowledge graph 13 and the model 14 under the constraint that the difference in vectors representing elements of the first triple is smaller than the difference in vectors representing elements of the second triple. Specifically, when a class C′1 is a subconcept of a class C1 and a class C′2 is a subconcept of a class C2, the generation unit 23 updates vectors of entities (e1, e2) belonging to (C1, C2) and entities (e1′, e2′) belonging to (C′1, C′2) so that the score of a triple (e1′, r, e2′) is higher than the score of the triple (e1, r, e2).
The second example is described in detail with reference to
The knowledge graph also includes an entity “Taro”, an entity “Ichiro”, an entity “Hanako”, and an entity “Jiro” as instances. The entity “Taro” and the entity “Ichiro” belong to the class “Person” and have a relation “friend”. The entity “Hanako” belongs to the class “Teacher”, the entity “Jiro” belongs to the class “Doctor”, and the two have a relation “friend”.
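Using this instance as data, the upper-lower determination between triples can be sketched as follows; the helper names and the subclass_of mapping (class to superclass) are assumptions about how the ontology might be represented:

```python
def is_subconcept(sub, sup, subclass_of):
    """True if class sub equals sup or reaches sup via subclass links."""
    while sub is not None:
        if sub == sup:
            return True
        sub = subclass_of.get(sub)
    return False

def is_upper_triple(t_upper, t_lower, entity_class, subclass_of):
    """(e1, r, e2) is an upper triple of (e1', r, e2') when the classes of the
    lower triple's entities are subconcepts of the corresponding upper classes."""
    (e1, r, e2), (e1p, rp, e2p) = t_upper, t_lower
    return (r == rp
            and is_subconcept(entity_class[e1p], entity_class[e1], subclass_of)
            and is_subconcept(entity_class[e2p], entity_class[e2], subclass_of))

# e.g. with entity_class = {"Taro": "Person", "Ichiro": "Person",
#                           "Hanako": "Teacher", "Jiro": "Doctor"}
# and subclass_of = {"Teacher": "Person", "Doctor": "Person"},
# ("Taro", "friend", "Ichiro") is an upper triple of ("Hanako", "friend", "Jiro").
```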
In the case of
The generation unit 23 acquires a triple t (e1, r, e2) from the knowledge graph (S203), and determines whether the vector magnitude “∥e1+r−e2∥” of the triple t is greater than the threshold value (Margin) (S204).
When “∥e1+r−e2∥” is greater than the threshold value (Yes at S204), the generation unit 23 updates the vectors of “e1, r, e2” so that the vector difference (e1+r−e2) is closer to 0 (S205).
After S205 is performed or when “∥e1+r−e2∥” is equal to or less than the threshold value (No at S204), the generation unit 23 acquires a triple t′ (e1′, r, e2′) having an upper-lower relation with the triple t from the knowledge graph (S206).
When the triple t′ is an upper triple of the triple t (Yes at S207), the generation unit 23 updates the vectors of “e1, e2, e1′, e2′, r” so that the vector difference (e1′+r−e2′) is greater than the vector difference (e1+r−e2) (S208).
On the other hand, when the triple t′ is a lower triple of the triple t (No at S207), the generation unit 23 updates the vectors of “e1, e2, e1′, e2′, r” so that the vector difference (e1′+r−e2′) is smaller than the vector difference (e1+r−e2) (S209).
Subsequently, the generation unit 23 terminates the process when there is no vector to be updated or when the process has been repeated a prescribed number of times (Yes at S210). When there is a vector to be updated or when the number of times of execution is less than the prescribed number (No at S210), the generation unit 23 repeats S203 and subsequent steps.
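The update at S208 can be sketched as follows under the same assumptions as before; for simplicity the shared relation vector r is left unchanged here, although the embodiment also lists it among the updated vectors:

```python
import numpy as np

def hierarchy_update(ent_vecs, rel_vecs, t_lower, t_upper,
                     lr=0.01, gamma=0.1, eps=1e-9):
    """S208: make the upper triple's vector difference exceed the lower's."""
    (e1, r, e2), (e1u, _, e2u) = t_lower, t_upper
    d_low = ent_vecs[e1] + rel_vecs[r] - ent_vecs[e2]
    d_up = ent_vecs[e1u] + rel_vecs[r] - ent_vecs[e2u]
    if np.linalg.norm(d_up) < np.linalg.norm(d_low) + gamma:
        g_low = lr * d_low / (np.linalg.norm(d_low) + eps)
        g_up = lr * d_up / (np.linalg.norm(d_up) + eps)
        ent_vecs[e1] -= g_low; ent_vecs[e2] += g_low    # tighten the lower triple
        ent_vecs[e1u] += g_up; ent_vecs[e2u] -= g_up    # loosen the upper triple
```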
As described above, the information processing apparatus 10 according to the second example can generate highly accurate vector representations by performing machine learning that places emphasis on the more specific relationship between entities even when the entities are connected by the same relation.
Although the examples of the present invention have been described so far, the present invention may be carried out in various different forms in addition to the examples described above.
The knowledge graphs, entity examples, class examples, relation examples, numerical value examples, threshold values, display examples, and the like used in the above examples are merely examples, and can be changed as desired. The first method described in the first example and the second method described in the second example can also be used in combination.
In each of the above examples, an example of performing machine learning using TransE has been described; however, the present invention is not limited thereto, and other machine learning models can be employed. Therefore, flowcharts of the first example and the second example for the case where a generic model is used are described below.
Specifically, S301 to S304 in
When the score function (f(e1, r, e2)) is greater than the threshold value (Yes at S305), the generation unit 23 updates the vectors of “e1, r, e2” so that the score function (f(e1, r, e2)) is closer to 0 (S306).
After S306 is performed or when the score function (f(e1, r, e2)) is equal to or less than the threshold value (No at S305), the generation unit 23 acquires the relation q that entails the relation r or is entailed by the relation r (S307).
When the relation r entails the relation q (Yes at S308), the generation unit 23 updates the vectors of “e1, r, e2” so that the score function (f(e1, r, e2)) is greater than a score function (f(e1, q, e2)) (S309).
On the other hand, when the relation r does not entail the relation q (No at S308), the generation unit 23 updates the vectors of “e1, r, e2” so that the score function (f(e1, r, e2)) is smaller than the score function (f(e1, q, e2)) (S310).
Subsequently, the generation unit 23 terminates the process when there is no vector to be updated or when the process has been repeated a prescribed number of times (Yes at S311). When there is a vector to be updated or when the number of times of execution is less than the prescribed number (No at S311), the generation unit 23 repeats S304 and subsequent steps.
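As an illustration of this genericity, the score function f can be any penalty that approaches 0 for a well-fitted triple. The DistMult-style alternative below is our own example, not one prescribed by the embodiment; the sign convention is chosen so that lower is better, matching the flowcharts:

```python
import numpy as np

def f_transe(ve1, vr, ve2):
    # translation-based penalty: approaches 0 when ve1 + vr = ve2
    return np.linalg.norm(ve1 + vr - ve2)

def f_distmult(ve1, vr, ve2):
    # DistMult-style penalty: small when the trilinear product is large
    return -np.sum(ve1 * vr * ve2)

def constraint_holds(f, ve1, vq, vr, ve2, gamma=0.1):
    """Generic counterpart of the updates at S309 and S310: the more specific
    relation q should incur a smaller penalty than the general relation r."""
    return f(ve1, vq, ve2) + gamma <= f(ve1, vr, ve2)
```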
Specifically, S401 to S403 in
When the score function (f(e1, r, e2)) is greater than the threshold value (Yes at S404), the generation unit 23 updates the vectors of “e1, r, e2” so that the score function (f(e1, r, e2)) is closer to 0 (S405).
After S405 is performed or when the score function (f(e1, r, e2)) is equal to or less than the threshold value (No at S404), the generation unit 23 acquires the triple t′ (e1′, r, e2′) having an upper-lower relation with the triple t from the knowledge graph (S406).
When the triple t′ is an upper triple of the triple t (Yes at S407), the generation unit 23 updates the vectors of “e1, e2, e1′, e2′, r” so that the score function (f(e1′, r, e2′)) is greater than the score function (f(e1, r, e2)) (S408).
On the other hand, when the triple t′ is a lower triple of the triple t (No at S407), the generation unit 23 updates the vectors of “e1, e2, e1′, e2′, r” so that the score function (f(e1′, r, e2′)) is smaller than the score function (f(e1, r, e2)) (S409).
Subsequently, the generation unit 23 terminates the process when there is no vector to be updated or when the process has been repeated a prescribed number of times (Yes at S410). When there is a vector to be updated or when the number of times of execution is less than the prescribed number (No at S410), the generation unit 23 repeats S403 and subsequent steps.
As described above, the information processing apparatus 10 can apply the above first method and second method to widely used machine learning models, thus improving versatility.
The processing procedures, control procedures, specific names, and information including various data and parameters illustrated in the above documents and drawings may be changed as desired, unless otherwise noted.
Furthermore, each component of each apparatus illustrated in the drawings is functionally conceptual and does not necessarily have to be physically configured as illustrated in the drawings. That is, specific forms of distribution and integration of the apparatuses are not limited to those illustrated in the drawings. In other words, all or some of the apparatuses can be functionally or physically distributed and integrated in desired units according to various loads, usage conditions, and the like.
Moreover, each processing function performed by each apparatus can be implemented in whole or in part by a CPU and a computer program that is analyzed and executed by the CPU, or as hardware using wired logic.
The communication device 10a is a network interface card or the like, and communicates with other devices. The HDD 10b stores therein computer programs and DBs that operate the functions illustrated in
The processor 10d reads, from the HDD 10b or the like, a computer program for executing the same process as that of each processing unit illustrated in
In this way, the information processing apparatus 10 operates as an information processing apparatus that performs the machine learning method by reading and executing the computer programs. The information processing apparatus 10 can also read the above computer programs from a recording medium with a medium reading device and execute the read computer programs, thereby implementing the same functions as in the above examples. Note that the computer programs referred to in the examples are not limited to being executed by the information processing apparatus 10. For example, the present invention can be applied in the same way even when another computer or server executes the computer programs, or when they execute the computer programs in cooperation with each other.
The computer programs can be distributed via a network such as the Internet. The computer programs can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, a magneto-optical disk (MO), or a digital versatile disc (DVD), and executed by being read from the recording medium by a computer.
According to the embodiments, a highly accurate vector representation can be generated.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation of International Application No. PCT/JP2020/029718, filed on Aug. 3, 2020, the entire contents of which are incorporated herein by reference.
Parent application: PCT/JP2020/029718, filed August 2020 (US). Child application: 18093342 (US).