The present invention relates to a word vector changing device, method and program that, given a set of pairs of a word and a vector representing a concept of the word, converts the vectors of words such that the distance between the vectors of words in a semantically distant word pair present in a dictionary becomes greater than before, the distance between the vectors of words in a semantically close word pair present in the dictionary becomes smaller than before, and the distance between the vectors of words in a word pair that is not present in the dictionary is changed as little as possible.
A concept base is a set of pairs of a word and a vector representing the concept of the word; approaches to generating one include those described in Non-Patent Literatures 1 and 2.
These approaches both generate the vectors of words using a corpus as input, producing an arrangement in which the vectors of semantically close words lie close to each other. Their generation algorithms are based on the distributional hypothesis that the concept of a word can be estimated from the occurrence pattern of the words surrounding it (its marginal distribution) within a corpus.
Using a concept base generated by these approaches, a distance representing similarity between texts can be calculated. For a given text, a vector for the text is generated by combining the vectors of words in the text (e.g., by determining a barycenter of the word vectors). The distance between the corresponding text vectors is calculated as the distance between the texts.
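As an illustration only (not part of the cited literature), the following Python sketch computes a text vector as the barycenter of the word vectors and an inter-text distance as the Euclidean distance between text vectors; the two-dimensional concept base shown is hypothetical.

```python
import numpy as np

def text_vector(words, concept_base):
    """Barycenter (mean) of the vectors of the words that appear in the concept base."""
    vectors = [concept_base[w] for w in words if w in concept_base]
    return np.mean(vectors, axis=0)

def text_distance(words_a, words_b, concept_base):
    """Euclidean distance between the two text vectors."""
    return np.linalg.norm(text_vector(words_a, concept_base)
                          - text_vector(words_b, concept_base))

# Hypothetical two-dimensional concept base, for illustration only.
concept_base = {
    "kangae": np.array([0.2, 0.9]),
    "yoi":    np.array([0.8, 0.3]),
    "warui":  np.array([0.7, 0.2]),
}
print(text_distance(["kangae", "yoi"], ["kangae", "warui"], concept_base))
```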
Non-Patent Literature 1: Bessho Katsuji, Uchiyama Toshiro, Uchiyama Tadasu, Kataoka Ryoji and Oku Masahiro, “A Method for Generating Corpus-concept-base Based on Co-occurrences between Words and Semantic Attributes,” Transactions of the Information Processing Society of Japan, Vol. 49, No. 12, pp. 3997-4006, December 2008.
Non-Patent Literature 2: Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean, “Efficient estimation of word representations in vector space,” ICLR, 2013.
A concept base generated by the foregoing conventional approaches has the following issues.
Issue 1)
Since the two words in a pair of antonyms (e.g., yoi (good), warui (bad)) have similar marginal distributions, there is an issue of their vectors being close to each other. This leads to an inappropriate distance relationship between text vectors.
For example, with respect to the word “yoi (good)”, its antonym “warui (bad)” will have a smaller vector-to-vector distance to it than the synonym “yoroshii (all right)” does. Consequently, for text A below, text B will have a smaller vector-to-vector distance to text A than text C does, even though text C is closer in meaning to text A than text B is.
Text A: Kono kangae wa yoi (this idea is good).
Text B: Kono kangae wa warui (this idea is bad).
Text C: Kono kangae wa yoroshii (this idea is all right).
Issue 2)
For a pair of words, when neither of the words is a hypernym, a hyponym, or a synonym of the other and they have a common hypernym, the pair is called a pair of co-hyponyms. For the word pair “yakyu (baseball), sakka (soccer)”, neither word is a hypernym, a hyponym, or a synonym of the other and they have the common hypernym “supotsu (sport)”, so they constitute a pair of co-hyponyms.
Since the two words in a pair of co-hyponyms (e.g., yakyu (baseball), sakka (soccer)) have similar marginal distributions, there is an issue of their vectors being close to each other. This leads to an inappropriate distance relationship between text vectors.
For example, with respect to the word “yakyu (baseball)”, its co-hyponym “sakka (soccer)” will have a smaller vector-to-vector distance to it than the hyponym “kusayakyu (sandlot baseball)” does. Consequently, for text A below, text B will have a smaller vector-to-vector distance to text A than text C does, even though text C is closer in meaning to text A than text B is.
Text A: Yakyu wo miru (watch baseball).
Text B: Sakka wo miru (watch soccer).
Text C: Kusayakyu wo miru (watch sandlot baseball).
Issue 3)
A pair of synonyms (e.g., yakyu, besuboru (transliteration of “baseball”)) has the issue that the vectors of the words are distant from each other in some cases. This leads to an inappropriate distance relationship between text vectors.
For example, with respect to the word “yakyu (baseball)”, its synonym “besuboru” will have a larger vector-to-vector distance to it than the hyponym “kusayakyu (sandlot baseball)” does. Consequently, for text A below, text C will have a smaller vector-to-vector distance to text A than text B does, even though text B is closer in meaning to text A than text C is.
Text A: Yakyu wo miru (watch baseball).
Text B: Besuboru wo miru (watch besuboru).
Text C: Kusayakyu wo miru (watch sandlot baseball).
The present invention seeks to solve these issues and makes the arrangement of word vectors reflect semantic closeness between words by converting the vectors of words in a concept base that has already been generated. More specifically, it performs conversion such that the distance between the vectors of words in a semantically distant word pair which is present in a dictionary, such as antonyms and co-hyponyms, becomes greater than before, and that the distance between the vectors of words in a semantically close word pair present in the dictionary, such as synonyms, becomes smaller than before. However, performing only this conversion results in an unreasonable increase or decrease in the distance between the vectors of word pairs that are not present in the dictionary, making their arrangement inappropriate. Thus, it is necessary to convert the vectors of words so that word pairs present in the dictionary are given appropriate distances while the distances of word pairs that are not present in the dictionary are changed as little as possible. An object of the present invention is to arrange all words so that the distance of a given word pair will be appropriate.
To attain the object, a word vector changing device according to the present invention includes: a concept base which is a set of pairs of a word and a vector representing a concept of the word; and a conversion means configured: with a dictionary which is a set of semantically distant or close word pairs as input, when a word pair C being a pair of given words A, B in the concept base is present in the dictionary, to associate with the word pair C a magnitude D of a difference vector between a difference vector V′ between a converted vector of the word A and a converted vector of the word B, and a vector kV determined by multiplying a difference vector V between the vector of the word A in the concept base and the vector of the word B in the concept base by a scalar value k; when the word pair C is not present in the dictionary, to associate the magnitude D of the difference vector between the difference vector V′ and the difference vector V with the word pair C; and to convert the vector of a given word in the concept base such that a total sum of the magnitude D corresponding to every word pair C is minimized.
The conversion means of the word vector changing device according to the present invention sets the scalar value k to a value equal to or greater than 1 when the word pair C is a semantically distant word pair in the dictionary, and sets the scalar value k to a value equal to or greater than 0 and equal to or smaller than 1 when the word pair C is a semantically close word pair in the dictionary.
A word vector changing method according to the present invention is a method in a word vector changing device including a concept base which is a set of pairs of a word and a vector representing a concept of the word, the method including: with a dictionary which is a set of semantically distant or close word pairs as input, when a word pair C being a pair of given words A, B in the concept base is present in the dictionary, associating, by conversion means, with the word pair C a magnitude D of a difference vector between a difference vector V′ between a converted vector of the word A and a converted vector of the word B, and a vector kV determined by multiplying a difference vector V between the vector of the word A in the concept base and the vector of the word B in the concept base by a scalar value k; when the word pair C is not present in the dictionary, associating the magnitude D of the difference vector between the difference vector V′ and the difference vector V with the word pair C; and converting the vector of a given word in the concept base such that a total sum of the magnitude D corresponding to every word pair C is minimized.
A program according to the present invention is a program for causing a computer to function as the conversion means of the word vector changing device according to the present invention.
With the present invention, the difference vector V′ after conversion is approximately equal to kV, which is k times the difference vector V before conversion. For a semantically distant word pair present in the dictionary, such as antonyms and co-hyponyms, setting k to k>1 results in their vector-to-vector distance after conversion being greater than that before conversion. For a semantically close word pair present in the dictionary, such as synonyms, setting k to k<1 results in their vector-to-vector distance after conversion being smaller than that before conversion. For a word pair that is not present in the dictionary, their vector-to-vector distance after conversion is changed little from that before conversion. In this manner, a converted concept base can be generated in which all the words are arranged so that any given word pair has an appropriate distance.
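As a small numerical check (with a made-up difference vector), when the converted difference vector V′ is approximately kV, the converted vector-to-vector distance is approximately k times the original one:

```python
import numpy as np

V = np.array([0.6, -0.2, 0.3])   # hypothetical difference vector between two word vectors

for k in (2.0, 0.5, 1.0):        # distant pair, close pair, pair not in the dictionary
    V_prime = k * V              # the conversion aims at V' ≈ kV
    # ||V'|| = k * ||V||: greater for k > 1, smaller for k < 1, unchanged for k = 1
    print(k, np.linalg.norm(V), np.linalg.norm(V_prime))
```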
A certain concept base has a property that the difference vectors between the vectors of words from word pairs that are in the same relationship are approximately the same. That is, when the vector of word x is represented as Ux,
Ua−Ub≈Uc−Ud [Formula 1]
holds for a word pair (a, b) and a word pair (c, d) which are in the same relationship. For example, a word pair (otoko (man), onna (woman)) and a word pair (oji (uncle), oba (aunt)) are in the same relationship, where
Uotoko−Uonna≈Uoji−Uoba [Formula 2]
holds.
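This property can be checked numerically; the sketch below uses hypothetical three-dimensional vectors in place of a real concept base and measures how close the two difference vectors are.

```python
import numpy as np

# Hypothetical vectors; in practice these come from a concept base generated as described above.
U = {
    "otoko": np.array([0.9, 0.1, 0.4]),
    "onna":  np.array([0.2, 0.8, 0.4]),
    "oji":   np.array([0.8, 0.2, 0.9]),
    "oba":   np.array([0.1, 0.9, 0.9]),
}

diff_man_woman  = U["otoko"] - U["onna"]
diff_uncle_aunt = U["oji"] - U["oba"]
# A small value indicates U_otoko - U_onna ≈ U_oji - U_oba.
print(np.linalg.norm(diff_man_woman - diff_uncle_aunt))
```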
In the present invention, when the converted vector of word x is represented as Ux′, for a word pair (a, b) and a word pair (c, d) that are in the same relationship,
Ua−Ub≈Uc−Ud [Formula 3]
holds, while
Ua′−Ub′≈k(Ua−Ub), Uc′−Ud′≈k(Uc−Ud) [Formula 4]
holds (where k depends on the word pair). For all of the word pairs that are in the same relationship, the magnitudes of their corresponding difference vectors before conversion are approximately the same. Moreover, the word pairs that are in the same relationship are all semantically distant, all semantically close, or all neither semantically distant nor close. The value k is usually defined depending on the magnitude of the difference vector before conversion. Accordingly, the same value of k can be set for all of the word pairs that are in the same relationship. Thus, for a word pair (a, b) and a word pair (c, d) that are in the same relationship,
Ua′−Ub′≈Uc′−Ud′ [Formula 5]
holds. Thus, the present invention also has the effect of being able to maintain the property that the difference vectors between the vectors of words from word pairs that are in the same relationship are approximately the same, as much as possible even after conversion.
Use of a converted concept base for the calculation of an inter-text distance provides a more appropriate distance relationship between text vectors.
In the case of Issue 1), for the word “yoi (good)”, its antonym “warui (bad)” will have a greater vector-to-vector distance to it than the synonym “yoroshii (all right)” does, so text C will have a smaller vector-to-vector distance to text A than text B does.
In the case of Issue 2), for the word “yakyu (baseball)”, its co-hyponym “sakka (soccer)” will have a greater vector-to-vector distance to it than the hyponym “kusayakyu (sandlot baseball)” does, so text C will have a smaller vector-to-vector distance to text A than text B does.
In the case of Issue 3), for the word “yakyu (baseball)”, its synonym “besuboru (transliteration of “baseball”)” will have a smaller vector-to-vector distance to it than the hyponym “kusayakyu (sandlot baseball)” does, so text B will have a smaller vector-to-vector distance to text A than text C does.
Embodiments of the present invention are described below in conjunction with the drawings.
<Configuration of a Word Vector Changing Device According to an Embodiment of the Present Invention>
The word vector changing device 100 includes a concept base 22 which is a set of pairs of a word and a vector representing a concept of the word, and conversion means 30 configured: with a dictionary 24 which is a set of semantically distant or close word pairs as input, when a word pair C being a pair of given words A, B in the concept base 22 is present in the dictionary 24, to associate with the word pair C a magnitude D of a difference vector between a difference vector V′ between a converted vector of the word A and a converted vector of the word B, and a vector kV determined by multiplying a difference vector V between the vector of the word A in the concept base 22 and the vector of the word B in the concept base 22 by a scalar value k; when the word pair C is not present in the dictionary 24, to associate the magnitude D of the difference vector between the difference vector V′ and the difference vector V with the word pair C; and to convert the vector of a given word in the concept base 22 such that a total sum of the magnitude D corresponding to every word pair C is minimized, and to generate a converted concept base 32.
No word is registered more than once in the concept base 22.
The vector of each word is an n-dimensional vector and the vectors of semantically close words are arranged close to each other.
Only content words, such as nouns, verbs and adjectives, may be registered in the concept base 22. It is also possible to register words in the end-form (shushi-kei) in the concept base 22 and search the concept base 22 with the end-form of a word.
As an example, the dictionary 24 can be configured such that its records are divided into record groups for antonyms, co-hyponyms, and synonyms, respectively.
The dictionary 24 is not limited to the example above, but may be formed from a group of records each consisting of a basis word and a list of semantically distant words for that basis word and a group of records each consisting of a basis word and a list of semantically close words for that basis word. In that case, a pair of a basis word and each word in its list of semantically distant words constitutes a semantically distant word pair, and a pair of a basis word and each word in its list of semantically close words constitutes a semantically close word pair.
The dictionary 24 is typically configured such that, in records of similar types (such as being semantically distant or semantically close) in the dictionary 24, when there is a record for basis word A that has word B in its word list, there will be a record for basis word B that has word A in its word list.
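As a sketch of one possible in-memory form of such a dictionary (the record contents below are hypothetical and abbreviated; a complete dictionary would also hold the reverse records as just described), a mapping from a basis word to its word lists allows the relation of a word pair to be looked up:

```python
# Hypothetical, abbreviated dictionary records: basis word -> lists of distant and close words.
dictionary = {
    "yoi":   {"distant": ["warui"], "close": ["yoroshii"]},
    "warui": {"distant": ["yoi"],   "close": []},
    "yakyu": {"distant": ["sakka"], "close": ["besuboru"]},
    "sakka": {"distant": ["yakyu"], "close": []},
}

def pair_relation(word_a, word_b, dictionary):
    """Return 'distant', 'close', or None for the (unordered) word pair."""
    record = dictionary.get(word_a)
    if record is not None:
        if word_b in record["distant"]:
            return "distant"
        if word_b in record["close"]:
            return "close"
    return None
```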
Conversion processing performed by the conversion means 30 is formulated as follows.
A list of words in the concept base 22 is defined as:
W1, W2, . . . , Wm. [Formula 6]
A vector of Wp in the concept base 22 is defined as:
τp=(τp1, τp2, . . . , τpn). [Formula 7]
and a vector of Wp in the converted concept base 32 is defined as:
ωp=(ωp1, ωp2, . . . , ωpn). [Formula 8]
The value τpq is a constant and ωpq is a variable.
For a word pair Wi,Wj in the concept base 22, a scalar value k by which a difference vector τi−τj between τi and τj is multiplied generally depends on {i,j} and is represented as k{i,j}.
The conversion means 30 determines (ωpq) that minimizes the objective function F below.
∥◯∥ [Formula 9]
represents the L2 norm.
Alternatively, (ωpq) that minimizes the following objective function F is determined.
Alternatively, F(i,j) is set as some other magnitude of:
(ωi−ωj)−k{i,j}(τi−τj) [Formula 12],
and (ωpq) that minimizes the objective function F which is the total sum of F(i,j) is determined.
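The objective functions themselves appear only as display formulas in the original; one natural reading, sketched below, takes F(i,j) as the L2 norm (or squared L2 norm) of (ωi−ωj)−k{i,j}(τi−τj) and sums it over all word pairs. The helper k_value and the matrix layout are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def objective_F(omega, tau, k_value, squared=False):
    """Total sum over word pairs of F(i, j) = ||(omega_i - omega_j) - k_{i,j}(tau_i - tau_j)||.

    omega, tau    : (m, n) arrays of converted and original word vectors.
    k_value(i, j) : scalar k_{i,j} for the pair (hypothetical helper).
    squared=True gives the squared-norm variant of F(i, j).
    """
    m = tau.shape[0]
    total = 0.0
    for i, j in combinations(range(m), 2):
        residual = (omega[i] - omega[j]) - k_value(i, j) * (tau[i] - tau[j])
        norm = np.linalg.norm(residual)
        total += norm * norm if squared else norm
    return total
```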
The value k{i,j} is set as follows.
The value k{i,j} is defined as k{i,j}≥1 when the word pair Wi,Wj is a semantically distant word pair in the dictionary 24, is defined as 0≤k{i,j}≤1 when the word pair Wi,Wj is a semantically close word pair in the dictionary 24, and is defined as k{i,j}=1 when the word pair Wi,Wj is not present in the dictionary 24. While it is defined as k{i,j}≥1 when the word pair Wi,Wj is a semantically distant word pair in the dictionary 24, it may be k{i,j}>1 instead. Likewise, while it is defined as 0≤k{i,j}≤1 when the word pair Wi,Wj is a semantically close word pair in the dictionary 24, it may be 0≤k{i,j}<1 instead.
When the word pair Wi,Wj is a semantically distant word pair in the dictionary 24, k{i,j} may be a constant that does not depend on {i,j}. It is also possible to define a constant α>0, with
k{i,j}=(∥τi−τj∥+α)/∥τi−τj∥. [Formula 13]
It is also possible to define a constant β greater than the maximum of:
∥τi−τj∥ [Formula 14]
with
k{i,j}=β/∥τi−τj∥. [Formula 15]
It is also possible to make k{i,j} smaller as
∥τi−τj∥ [Formula 16]
is greater so that the converted distance is prevented from being too large for a word pair originally having a large distance.
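A sketch of these two options for a semantically distant pair (Formula 13 with an additive margin α, Formula 15 with an absolute target β; the parameter names and the assumption of distinct vectors are mine):

```python
import numpy as np

def k_distant_additive(tau_i, tau_j, alpha):
    """Formula 13: stretch the original distance by a fixed margin alpha > 0."""
    d = np.linalg.norm(tau_i - tau_j)   # assumed non-zero for a pair of distinct words
    return (d + alpha) / d

def k_distant_absolute(tau_i, tau_j, beta):
    """Formula 15: stretch the original distance toward an absolute value beta,
    where beta exceeds the largest original distance."""
    d = np.linalg.norm(tau_i - tau_j)
    return beta / d
```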
When the word pair Wi,Wj is a semantically close word pair in the dictionary 24, k{i,j} may be a constant that does not depend on {i,j}. It is also possible to define a constant α>0, and when
∥τi−τj∥≥α [Formula 17]
then
k{i,j}=(∥τi−τj∥−α)/∥τi−τj∥, [Formula 18]
and when
∥τi−τj∥<α [Formula 19]
then
k{i,j}=0 [Formula 20]
It is also possible to define a constant β≥0 equal to or smaller than the minimum of:
∥τi−τj∥, [Formula 21]
with
k{i,j}=β/∥τi−τj∥. [Formula 22]
It is also possible to make k{i,j} greater as
∥τi−τj∥ [Formula 23]
is smaller so that the converted distance is prevented from being too small for a word pair originally having a small distance.
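The corresponding options for a semantically close pair (Formulas 18, 20, and 22), again as a sketch with my own parameter names:

```python
import numpy as np

def k_close_subtractive(tau_i, tau_j, alpha):
    """Formulas 18 and 20: shrink the original distance by a margin alpha > 0, clipping at zero."""
    d = np.linalg.norm(tau_i - tau_j)   # assumed non-zero for a pair of distinct words
    return (d - alpha) / d if d >= alpha else 0.0

def k_close_absolute(tau_i, tau_j, beta):
    """Formula 22: shrink the original distance toward an absolute value beta,
    where beta >= 0 is no larger than the smallest original distance."""
    d = np.linalg.norm(tau_i - tau_j)
    return beta / d
```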
The value (ωpq) that minimizes the objective function F is determined with stochastic gradient descent, for example. (Other optimization approaches may be used.) The stochastic gradient descent employs the algorithm:
The update portion of the (m,n) matrix (ωpq) in the above algorithm is performed, for example, by AdaGrad in the following manner.
Before starting the algorithm, an initial value of the matrix (ωpq) is set for example as:
ωpq:=τpq. [Formula 25]
An (m,n) matrix (rpq) is prepared. Before starting the algorithm, an initial value of the matrix (rpq) is set for example as:
rpq:=ϵ (a constant). [Formula 26]
In the update portion of the (m,n) matrix (ωpq) in the above algorithm, for given
1≤p≤m, 1≤q≤n [Formula 27]
an update is made as:
This update will be called update (1).
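The update formula itself is a display equation in the original; a standard AdaGrad step is one plausible realization of update (1), sketched below for a single pair {i, j}. The learning rate eta and the gradient helper grad_F_ij are my own assumptions.

```python
import numpy as np

def adagrad_update(omega, r, i, j, grad_F_ij, eta=0.1):
    """One AdaGrad step of update (1) for the pair {i, j}.

    omega : (m, n) matrix of converted vectors (the variables omega_pq).
    r     : (m, n) matrix of accumulated squared gradients, initialized to a small constant epsilon.
    grad_F_ij(p) : gradient of F(i, j) with respect to row p of omega (hypothetical helper).
    Rows other than i and j have zero gradient, so only these two rows are touched.
    """
    for p in (i, j):
        g = grad_F_ij(p)
        r[p] += g * g                        # accumulate squared gradient elementwise
        omega[p] -= eta * g / np.sqrt(r[p])  # scaled gradient step
    return omega, r
```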
When the objective function F is Formula 10, calculation of:
is done as follows.
1.1) When p=i and F(i,j)≠0,
1.2) When p=j and F(i,j)≠0,
1.3) Otherwise,
When the objective function F is Formula 11, calculation of:
is done as follows.
2.1) When p=i
2.2) When p=j,
2.3) Otherwise,
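The partial derivatives in cases 1.1 to 2.3 are likewise display formulas in the original; for the two readings of F(i,j) sketched earlier they reduce to the residual vector with a sign that depends on whether p equals i or j, and to zero otherwise. The sketch below states that reading explicitly (my assumption, consistent with the case split above):

```python
import numpy as np

def grad_F_ij(omega, tau, k_ij, i, j, p, squared=True):
    """Gradient of F(i, j) with respect to omega_p under the two readings of the objective.

    squared=True : F(i, j) = ||residual||**2 (cases 2.1 to 2.3).
    squared=False: F(i, j) = ||residual|| (cases 1.1 to 1.3; undefined when the residual is zero,
                   which is why the text adds the condition F(i, j) != 0).
    """
    residual = (omega[i] - omega[j]) - k_ij * (tau[i] - tau[j])
    if p != i and p != j:
        return np.zeros_like(residual)           # cases 1.3 / 2.3: omega_p does not appear in F(i, j)
    sign = 1.0 if p == i else -1.0               # cases x.1 versus x.2
    if squared:
        return sign * 2.0 * residual
    norm = np.linalg.norm(residual)
    return sign * residual / norm if norm > 0 else np.zeros_like(residual)
```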
<Operation of the Word Vector Changing Device According to an Embodiment of the Present Invention>
S1)
It is determined whether to execute a turn consisting of processing at S2 to S6.
When the turn has been executed a predetermined number of times, it is determined to end and the conversion processing routine is terminated.
When the turn has not been executed the predetermined number of times, it is determined to execute it and the flow proceeds to S2.
Even when the turn has not been executed the predetermined number of times, if the present matrix (ωpq) is the same as the matrix (ωpq) at the immediately preceding S1 (i.e., it has converged), it may be determined to end and the conversion processing routine may be terminated.
When the conversion processing routine is terminated, a set of pairs of each word Wp and its word vector ωp in the concept base 22 is output as the converted concept base 32.
S2)
A list of words in the concept base 22 is represented as W1, W2, . . . , Wm. One word Wx is selected from those of W1, W2, . . . , Wm that have not yet been selected at S2 and is set as the target word X, and then the flow proceeds to S3. If there is no unselected word, the current turn is terminated and the flow proceeds to S1.
S3)
In the corresponding word list(s) of the record (or possibly multiple records) in the dictionary 24 whose basis word is Wx, one word Wy is selected from those words that are present in the concept base 22, that have not yet been selected at S3, and whose pair (as a set) with the word Wx has not been subjected to processing at S4 or S6 of the current turn; Wy is set as the target word Y, after which the flow proceeds to S4. If there is no such word, the flow proceeds to S5.
S4)
Setting i=x, j=y when x<y, and i=y, j=x when y<x, the update (1) is performed on {i,j}. In doing so,
∥τi−τj∥ [Formula 37]
may be determined, and based on
∥τi−τj∥, [Formula 38]
an appropriate k{i,j} may be determined, after which the update (1) may be performed. The flow then proceeds to S3.
S5)
Among W1, W2, . . . , Wm, one word Wz is selected from those words that are not Wx, that are not any Wy selected at S3, that have not yet been selected at S5, and whose pair (as a set) with Wx has not been subjected to processing at S4 or S6 of the current turn; Wz is set as the target word Z, after which the flow proceeds to S6. If there is no such word, the flow proceeds to S2.
In order to reduce the computational complexity of S6, W1, W2, . . . , Wm excluding Wx may be sorted in ascending order of their distance to Wx at the start of the conversion processing routine, and a word that satisfies the above condition may be selected from among the top G words (in the order of sorting in some cases). As another alternative, among W1, W2, . . . , Wm excluding Wx, a word that satisfies the above condition may be selected from those words whose distance to Wx is equal to or smaller than (or less than) a threshold (in some cases, in the ascending order of distance as sorted).
S6)
Setting i=x, j=z when x<z, and i=z, j=x when z<x, the update (1) is performed on {i,j}. The flow proceeds to S5.
Before performing the conversion processing routine, it is also possible to determine, for each Wx selected at S2, a list of a pair of Wy to be selected at S3 and a distance:
∥τx−τy∥, [Formula 39]
or a pair of Wz to be selected at S5 and a distance:
∥τx−τz∥, [Formula 40]
and based on the distance
∥τx−τy∥, [Formula 41]
to determine the scalar value k corresponding to each word pair (Wx,Wy). Then, in the conversion processing routine, selections at S3 and S5 may be made in accordance with the orders in the list and reference may be made to the previously determined scalar value k instead of calculating it.
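Putting the steps together, one possible reading of the S1 to S6 routine is sketched below; the dictionary format, compute_k, and update (an AdaGrad-style realization of update (1)) are the hypothetical helpers sketched above, and the convergence check and the S5 candidate pruning are simplified away.

```python
import numpy as np

def conversion_routine(words, tau, dictionary, compute_k, update, turns=10):
    """Sketch of the S1 to S6 turn structure (one reading, not the literal claimed procedure).

    words : list [W1, ..., Wm]; tau : (m, n) matrix of original vectors.
    dictionary[w] -> {"distant": [...], "close": [...]} as sketched earlier.
    compute_k(i, j) -> scalar k_{i,j}; update(omega, r, i, j, k_ij) performs update (1) and returns (omega, r).
    """
    m, n = tau.shape
    index = {w: p for p, w in enumerate(words)}
    omega = tau.copy()                  # initial value omega_pq := tau_pq
    r = np.full((m, n), 1e-8)           # AdaGrad accumulator, initialized to a small constant epsilon
    for _ in range(turns):                                          # S1: execute a fixed number of turns
        done = set()                    # pairs already processed at S4 or S6 during this turn
        for x, wx in enumerate(words):                              # S2: target word X
            record = dictionary.get(wx, {"distant": [], "close": []})
            for wy in record["distant"] + record["close"]:          # S3: words listed for basis word Wx
                if wy not in index or index[wy] == x:
                    continue
                i, j = sorted((x, index[wy]))
                if (i, j) in done:
                    continue
                omega, r = update(omega, r, i, j, compute_k(i, j))  # S4: update (1) with the pair's k
                done.add((i, j))
            for z in range(m):                                      # S5: remaining words (unpruned here)
                if z == x:
                    continue
                i, j = sorted((x, z))
                if (i, j) in done:
                    continue
                omega, r = update(omega, r, i, j, 1.0)              # S6: pair not in the dictionary, k = 1
                done.add((i, j))
    return omega
```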
The processing described thus far can be constructed into a program, which can be installed from a communications line or a recording medium and executed by means such as a CPU.
It is noted that the present invention is not limited to the above embodiments but various modifications and applications thereof are possible within the scope of the claims.
The present invention is applicable to a word vector changing technique for, given a set of pairs of a word and a vector representing the concept of the word, converting the vectors of the words so that the distance of a given word pair will be appropriate.
22 concept base
24 dictionary
30 conversion means
32 converted concept base
100 word vector changing device
Priority Application: 2018-076253, Apr 2018, JP (national)
Filing Document: PCT/JP2019/015025, filed 4/4/2019 (WO)