Embodiments described herein relate generally to a contextual analysis device, which performs contextual analysis, and to a contextual analysis method.
In natural language processing, contextual analysis such as anaphora resolution, coreference resolution, and dialog processing is an important task for correctly understanding a document. It is known that the use of procedural knowledge, such as the notion of a script by Schank and the notion of a frame by Fillmore, is effective in contextual analysis. However, manually-created procedural knowledge is limited in coverage. In that regard, attempts have been made to acquire such procedural knowledge automatically from documents.
For example, a method has been proposed in which a sequence of mutually-related predicates (hereinafter called an "event sequence") is treated as procedural knowledge, and event sequences acquired from an arbitrary group of documents are used as that procedural knowledge.
However, event sequences acquired in the conventional manner lack accuracy as procedural knowledge. Hence, when contextual analysis is performed using such event sequences, sufficient accuracy is sometimes not achieved. That situation needs to be improved.
According to an embodiment, a contextual analysis device includes a predicted-sequence generator, a probability predictor, and an analytical processor. The predicted-sequence generator is configured to generate, from a target document for analysis, a predicted sequence in which some elements of a sequence having a plurality of elements arranged therein are obtained by prediction. Each element is a combination of a predicate having a common argument, word sense identification information for identifying the word sense of the predicate, and case classification information indicating a type of the common argument. The probability predictor is configured to predict an occurrence probability of the predicted sequence based on a probability of appearance of a sequence that is acquired in advance from an arbitrary group of documents and that matches the predicted sequence. The analytical processor is configured to perform contextual analysis with respect to the target document for analysis by using the predicted occurrence probability of the predicted sequence.
An exemplary embodiment of a contextual analysis device and a contextual analysis method is described below with reference to the accompanying drawings. The embodiment described below is an example of application to a device that particularly performs anaphora resolution as contextual analysis.
Anaphora refers to a phenomenon in which a particular linguistic expression indicates the same content or the same entity as a preceding expression in the document. When an anaphoric relationship is expressed, instead of repeating the same word, either a pronoun is used or the word at the later position is omitted. The former is called pronoun anaphora, while the latter is called zero anaphora. In pronoun anaphora, anaphora resolution means predicting the target indicated by the pronoun. Similarly, in zero anaphora, anaphora resolution means complementing the nominal that has been omitted (i.e., complementing the zero pronoun). Anaphora includes intra-sentential anaphora, in which the anaphor such as a pronoun or a zero pronoun indicates a target within the same sentence, and inter-sentential anaphora, in which the target indicated by the anaphor is present in a different sentence. Generally, anaphora resolution of inter-sentential anaphora is a more difficult task than anaphora resolution of intra-sentential anaphora. Anaphora appears frequently in documents and provides significant clues that facilitate understanding of meaning and context. For that reason, anaphora resolution is a valuable technology in natural language processing.
While performing such anaphora resolution, the use of procedural knowledge proves effective. That is because procedural knowledge can be used as one of the indicators in evaluating the accuracy of anaphora resolution. As a method of automatically acquiring such procedural knowledge, a method is known in which an event sequence, which is a sequence of predicates having a common argument, is acquired from an arbitrary group of documents. This is based on the hypothesis that predicates having a common argument are in some kind of relationship with each other. Herein, the common argument is called an anchor.
Herein, regarding an event sequence that is acquired by implementing the conventional method, a specific example is given with reference to example sentences illustrated in
In the example illustrated in
In the conventional method, a predicate is extracted from each of a plurality of sentences that include the anchor. Then, with each pair of an extracted predicate and case classification information (hereinafter called a "case type"), which indicates the type of the case filled by the anchor in that sentence, serving as an element, a sequence in which a plurality of elements is arranged in order of appearance of the predicates is acquired as an event sequence. From the example sentences illustrated in
However, in an event sequence acquired by the conventional method, the same predicate used with different word senses is not distinguished according to word sense. That leads to a lack of accuracy as far as procedural knowledge is concerned. For a polysemous predicate, the meaning sometimes changes significantly depending on the cases of the predicate. However, in the conventional method, even when the predicate is used with different word senses, it is not distinguished according to word sense. Hence, there are times when a case example of an event sequence that should not be identified gets identified. For example, in the example sentences illustrated in
In that regard, in the embodiment, a new type of event sequence is proposed in which each element constituting the event sequence has attached thereto not only a predicate and the case classification information but also word sense identification information that enables identification of the word sense of that predicate. In this new-type event sequence, because the word sense identification information is attached to each element, it becomes possible to avoid ambiguity in the word sense of the corresponding predicate. That enables enhancement in accuracy as procedural knowledge. Thus, when this new-type event sequence is used in anaphora resolution, it becomes possible to enhance the accuracy of anaphora resolution.
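For illustration only, the following minimal Python sketch contrasts a conventional element with a new-type element; the class and field names are hypothetical and are not part of the embodiment.

```python
from typing import NamedTuple

class ConventionalElement(NamedTuple):
    predicate: str   # surface form of the predicate
    case: str        # case classification information ("case type") of the anchor

class NewTypeElement(NamedTuple):
    predicate: str   # surface form of the predicate
    sense_id: str    # word sense identification information, e.g. a case frame label
    case: str        # case classification information of the anchor

# The same predicate used with two different word senses collapses into
# indistinguishable elements in the conventional representation, but stays
# distinct in the new-type representation.
conventional = [ConventionalElement("dou", "ga"), ConventionalElement("dou", "wo")]
new_type = [NewTypeElement("dou", "dou2", "ga"), NewTypeElement("dou", "dou3", "wo")]
print(conventional)
print(new_type)
```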
In the embodiment, in order to identify the word sense of a predicate, a "case frame" is used as an example. In a case frame, the cases that can be taken by a predicate and the restrictions on the values of those cases are written for each category of predicate usage. For example, there exists data of case frames called "Kyoto University Case Frames" (Daisuke Kawahara and Sadao Kurohashi, Case Frame Compilation from the Web using High-Performance Computing, The Information Processing Society of Japan: Natural Language Processing Research Meeting 171-12, pp. 67-73, 2006.), and it is possible to use those case frames.
In
In the case of using the Kyoto University Case Frames, labels such as "dou2" (v2) and "dou3" (v3), which represent the word senses of a predicate, can be used as the word sense identification information attached to each element of the new-type event sequence. In an event sequence in which the elements have the word sense identification information attached thereto, different word sense identification information is attached to elements whose predicates have different word senses. Hence, it becomes possible to avoid event sequence mix-up caused by the polysemy of predicates. That enables enhancement in accuracy as procedural knowledge.
Regarding an event sequence acquired from an arbitrary group of documents, the probability of appearance can be obtained using a known statistical tool and can be used as one of the indicators in evaluating the accuracy of anaphora resolution. In the conventional method, in order to obtain the probability of appearance of an event sequence, point-wise mutual information (PMI) of pairs of elements constituting the event sequence is mainly used. However, in the conventional method of using PMI of pairs of elements, it is difficult to accurately obtain the probability of appearance of the event sequence that is effective as procedural knowledge.
In that regard, in order to obtain the frequency of appearance or the probability of appearance of an event sequence; for example, a number of probability models that have been devised in the field of language models are used. For example, the n-gram model in which the order of elements is taken into account, the trigger model in which the order of elements is not taken into account, and the skip model in which it is allowed to have combinations of elements that are not adjacent to each other are used. Such probability models have the characteristic of being able to handle the probability with respect to sequences having arbitrary lengths. Moreover, in order to deal with unknown event sequences, it is possible to perform smoothing that has been developed in the field of language models.
Given below is the explanation of a specific example of a contextual analysis device according to the embodiment.
The operations performed in the contextual analysis device 100 are broadly divided into three operations, namely, "an event sequence model building operation", "an anaphora resolution learning operation", and "an anaphora resolution predicting operation". In the event sequence model building operation, an event sequence model D2 is generated from an arbitrary document group D1 using the case frame predictor 1 and the event sequence model builder 2. In the anaphora resolution learning operation, training-purpose case example data D4 is generated from an anaphora-tagged document group D3 and the event sequence model D2 using the case frame predictor 1 and the machine-learning case example generator 3, and then an anaphora resolution learning model D5 is generated from the training-purpose case example data D4 using the anaphora resolution trainer 4. In the anaphora resolution predicting operation, prediction-purpose case example data D7 is generated from an analysis target document D6 and the event sequence model D2 using the case frame predictor 1 and the machine-learning case example generator 3, and then an anaphora resolution prediction result D8 is generated from the prediction-purpose case example data D7 and the anaphora resolution learning model D5 using the anaphora resolution predictor 5.
In the embodiment, for ease of explanation, it is assumed that a binary classifier is used as the technique of machine learning. However, instead of using a binary classifier, it is possible to implement any other known method such as ranking learning as the technique of machine learning.
Firstly, the explanation is given about a brief overview of the three operations mentioned above. At the time of performing the event sequence model building operation in the contextual analysis device 100, the arbitrary document group D1 is input to the case frame predictor 1. Thus, the case frame predictor 1 receives the arbitrary document group D1; predicts, with respect to each predicate included in the arbitrary document group D1, a case frame to which that predicate belongs; and outputs case-frame-information-attached document group D1′ in which case frame information representing a brief overview of the top-k candidate case frames is attached to each predicate. Meanwhile, the detailed explanation of a specific example of the case frame predictor 1 is given later.
Subsequently, the event sequence model builder 2 receives the case-frame-information-attached document group D1′ and acquires a group of event sequences from the case-frame-information-attached document group D1′. Then, with respect to the group of event sequences, the event sequence model builder 2 performs frequency counting and probability calculation and eventually outputs the event sequence model D2. Herein, the event sequence model D2 represents the probability of appearance of each sub-sequence included in the group of event sequences. As a result of using the event sequence model D2, it becomes possible to decide on the probability value of an arbitrary sub-sequence. This feature is used in the anaphora resolution learning operation (described later) and the anaphora resolution predicting operation (described later) as a clue for predicting the antecedent probability in anaphora resolution. Meanwhile, the explanation of a specific example of the event sequence model builder 2 is given later in detail.
At the time of performing the anaphora resolution learning operation in the contextual analysis device 100, the anaphora-tagged document group D3 is input to the case frame predictor 1.
Upon receiving the anaphora-tagged document group D3, in an identical manner to receiving the arbitrary document group D1, the case frame predictor 1 predicts, with respect to each predicate included in the anaphora-tagged document group D3, a case frame to which that predicate belongs; and outputs case frame information and anaphora-tagged document group D3′ in which case frame information representing a brief overview of the top-k candidate case frames is attached to each predicate.
Then, the machine-learning case example generator 3 receives the case frame information and the anaphora-tagged document group D3′, and generates the training-purpose case example data D4 from the case frame information and the anaphora-tagged document group D3′ using the event sequence model D2 generated by the event sequence model builder 2. Meanwhile, the detailed explanation of a specific example of the machine-learning case example generator 3 is given later.
Subsequently, the anaphora resolution trainer 4 performs training for machine learning with the training-purpose case example data D4 as the input, and generates the anaphora resolution learning model D5 as the learning result. Meanwhile, in the embodiment, it is assumed that a binary classifier is used as the anaphora resolution trainer 4. Since machine learning using a binary classifier is a known technology, the detailed explanation is not given herein.
In the case of performing the anaphora resolution predicting operation in the contextual analysis device 100, the analysis target document D6 is input to the case frame predictor 1. The analysis target document D6 represents target application data for anaphora resolution. Upon receiving the analysis target document D6, in an identical manner to receiving the arbitrary document group D1 or the anaphora-tagged document group D3, the case frame predictor 1 predicts, with respect to each predicate included in the analysis target document D6, a case frame to which that predicate belongs; and outputs case-frame-information-attached analysis target document D6′ in which case frame information representing a brief overview of the top-k candidate case frames is attached to each predicate.
Then, the machine-learning case example generator 3 receives the case-frame-information-attached analysis target document D6′, and generates the prediction-purpose case example data D7 from the case-frame-information-attached analysis target document D6′ using the event sequence model D2 generated by the event sequence model builder 2.
Subsequently, with the prediction-purpose case example data D7 as the input, the anaphora resolution predictor 5 performs machine learning using the anaphora resolution learning model D5 generated by the anaphora resolution trainer 4; and generates the anaphora resolution prediction result D8 as a result. Generally, this output serves as the output of the application. Meanwhile, in the embodiment, it is assumed that a binary classifier is used as the anaphora resolution predictor 5, and the detailed explanation is not given herein.
Given below is the explanation of a specific example of the case frame predictor 1.
The event noun-to-predicate converter 11 performs an operation of replacing the event nouns included in the pre-case-frame-prediction document D11, which has been input, with predicate expressions. This operation is performed for the purpose of increasing the number of case examples of predicates. In the embodiment, the event sequence model builder 2 generates the event sequence model D2, and the machine-learning case example generator 3 generates the training-purpose case example data D4 and the prediction-purpose case example data D7 using the event sequence model D2. Here, the greater the number of case examples of predicates, the better the performance of the event sequence model D2 becomes. Hence, it becomes possible to generate more suitable training-purpose case example data D4 and more suitable prediction-purpose case example data D7, and to enhance the accuracy of machine learning. Thus, by using the event noun-to-predicate converter 11 to replace the event nouns with predicate expressions, it becomes possible to enhance the accuracy of machine learning.
For example, when the pre-case-frame-prediction document D11 is written in Japanese, the event noun-to-predicate converter 11 performs an operation of replacing event nouns in the sentences with verb expressions formed by adding "suru" (to do). More particularly, when the event noun "nichibeikoushou" (Japan-U.S. negotiations) is present in the pre-case-frame-prediction document D11, it is replaced with the phrase "nichibei ga koushou suru" (Japan and the U.S. negotiate). In order to perform such an operation, it is necessary to determine whether or not the concerned noun is an event noun and what the arguments of the event noun are. Generally, this is a difficult operation to perform. However, there exists a corpus such as the NAIST text corpus (http://cl.naist.jp/nldata/corpus/) in which annotations are given about the relationship between event nouns and their arguments. Using such a corpus, it becomes possible to perform the abovementioned operation easily with the use of the annotations. In the example of "nichibeikoushou" (Japan-U.S. negotiations), the annotation indicates that "koushou" (negotiations) is an event noun, and the "ga" case argument of "koushou" (negotiations) is "nichibei" (Japan-U.S.).
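For illustration only, the following minimal Python sketch shows how an event noun could be rewritten into a predicate expression when an annotation such as that of the NAIST text corpus supplies the event noun and its arguments; the function name and the annotation layout are hypothetical, not part of the embodiment.

```python
def event_noun_to_predicate(noun, arguments):
    """Rewrite an event noun into a predicate expression using its annotated arguments."""
    # e.g. "koushou" with {"ga": "nichibei"} -> "nichibei ga koushou suru"
    parts = [f"{argument} {case}" for case, argument in arguments.items()]
    parts.append(f"{noun} suru")
    return " ".join(parts)

print(event_noun_to_predicate("koushou", {"ga": "nichibei"}))
# -> "nichibei ga koushou suru"
```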
Meanwhile, the event noun-to-predicate converter 11 is an optional feature that is used as may be necessary. In the case of not using the event noun-to-predicate converter 11, the pre-case-frame-prediction document D11 is input without modification to the case frame parser 12.
The case frame parser 12 detects, from the pre-case-frame-prediction document D11, predicates including the predicates obtained by the event noun-to-predicate converter 11 by converting event nouns, and then predicts the case frames to which the detected predicates belong. As far as the Japanese language is concerned, a tool such as KNP (http://nlp.ist.i.kyoto-u.ac.jp/index.php?KNP) has been released that has the function of predicting the case frames to which the predicates in sentences belong. KNP is a Japanese syntax/case analysis system that makes use of the Kyoto University Case Frames mentioned above. In the embodiment, it is assumed that the case frame parser 12 implements an algorithm identical to that of KNP. Meanwhile, since the case frames predicted by the case frame parser 12 represent only a prediction result, a single case frame is not necessarily determined uniquely with respect to a single predicate. In that regard, with respect to a single predicate, the case frame parser 12 predicts the top-k candidate case frames and attaches case frame information, which represents a brief overview of the top-k candidate case frames, as an annotation to the predicate. Herein, "k" is a positive integer and, for example, k=5 is set.
The result of having the case frame information, which represents a brief overview of the top-k candidate case frames, attached as the annotation to each predicate detected from the pre-case-frame-prediction document D11 is the post-case-frame-prediction document D12. Moreover, the post-case-frame-prediction document D12 serves as the output of the case frame predictor 1.
Given below is the explanation of a specific example of the event sequence model builder 2.
The event sequence acquiring unit 21 acquires a group of event sequences from the case-frame-information-attached document group D1′. As described above, in each event sequence of the group of event sequences acquired by the event sequence acquiring unit 21, each element has attached thereto, in addition to the contents of a conventional event sequence element, the word sense identification information that enables identification of the word sense of the predicate. That is, from the case-frame-information-attached document group D1′, the event sequence acquiring unit 21 detects a plurality of predicates having a common argument (the anchor). Then, with respect to each detected predicate, the event sequence acquiring unit 21 obtains, as an element, a combination of the predicate, the word sense identification information, and the case classification information. Subsequently, the event sequence acquiring unit 21 arranges the elements obtained for the predicates in order of appearance of the predicates in the case-frame-information-attached document group D1′, and obtains an event sequence. Herein, of the case frame information given as the annotation in the case-frame-information-attached document group D1′, the labels enabling identification of the word senses of the predicates are used as the word sense identification information of the elements of the event sequence. For example, in the example of English language, the labels v1, v3, and v7 included in the case frame information illustrated in
Regarding the method by which the event sequence acquiring unit 21 acquires the group of event sequences from the case-frame-information-attached document group D1′, it is possible to implement a method in which a coreference-tag anchor is used or a method in which a surface anchor is used.
Firstly, the explanation is given about the method in which the group of event sequences is acquired using a coreference-tag anchor. In this method, the premise is that the case-frame-information-attached document group D1′ that is input to the event sequence acquiring unit 21 has coreference tags attached thereto. Herein, the coreference tags may be attached from the beginning to the arbitrary document group D1 input to the case frame predictor 1, or the coreference tags may be attached to the case-frame-information-attached document group D1′ after it is obtained from the arbitrary document group D1 but before it is input to the event sequence model builder 2.
Given below is the explanation about the coreference tags.
Given below is the explanation of an anchor. As described above, an anchor is a common argument shared among a plurality of predicates. In the case of using coreference tags, a coreference cluster having a size of two or more is searched for, and the group of nouns included in that coreference cluster is treated as the anchor. As a result of identifying the anchor using coreference tags, it becomes possible to eliminate an inconvenience in which a group of nouns matching on the surface but differing in substance are treated as the anchor or to eliminate an inconvenience in which a group of nouns matching in substance but differing only on the surface are not treated as the anchor.
In the case of acquiring an event sequence using the coreference-tag anchor, the event sequence acquiring unit 21 firstly picks the group of nouns from the coreference cluster and treats the group of nouns as the anchor. Then, from the case-frame-information-attached document group D1′, the event sequence acquiring unit 21 detects the predicates of a plurality of sentences in which the anchor is present, identifies the type of the case of the slot in which the anchor is placed in each sentence, and obtains the case classification information. Subsequently, from the case frame information attached as the annotation to each detected predicate in the case-frame-information-attached document group D1′, the event sequence acquiring unit 21 refers to the labels that enable identification of the word sense of that predicate and obtains the word sense identification information of the predicate. Then, with respect to each of the plurality of predicates detected from the case-frame-information-attached document group D1′, the event sequence acquiring unit 21 obtains, as an element, a combination of the predicate, the word sense identification information, and the case classification information. Subsequently, the event sequence acquiring unit 21 arranges the elements in order of appearance of the predicates in the case-frame-information-attached document group D1′ and obtains an event sequence. Meanwhile, in the embodiment, as described above, the case frame information of the top-k candidates is attached to a single predicate. For that reason, a plurality of sets of word sense identification information is obtained with respect to a single predicate. Hence, for each element constituting the event sequence, a plurality of combination candidates (element candidates) differing only in the word sense identification information is present.
The event sequence acquiring unit 21 performs the operations described above with respect to all coreference clusters, and obtains a group of event sequences that represents the set of anchor-by-anchor event sequences.
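For illustration only, a minimal sketch of this acquisition procedure is given below; the data structures (an index from nouns to predicate occurrences) are hypothetical simplifications of the annotations carried by the case-frame-information-attached document group D1′ and are not part of the embodiment.

```python
from typing import NamedTuple

class PredicateOccurrence(NamedTuple):
    position: int            # order of appearance in the document
    surface: str             # surface form of the predicate
    sense_candidates: tuple  # top-k word sense labels taken from the case frame information
    case_of_anchor: str      # case slot in which the anchor appears

def acquire_event_sequence(cluster_nouns, predicate_index):
    """Build one event sequence for the anchor given by one coreference cluster.

    predicate_index maps each noun to the predicate occurrences whose arguments it fills.
    """
    occurrences = []
    for noun in cluster_nouns:
        occurrences.extend(predicate_index.get(noun, []))
    occurrences.sort(key=lambda occ: occ.position)
    # Each element keeps all of its sense candidates; they are expanded later, when counting.
    return [(occ.surface, occ.sense_candidates, occ.case_of_anchor) for occ in occurrences]

index = {
    "criminal": [PredicateOccurrence(0, "rob", ("v1",), "nominative"),
                 PredicateOccurrence(1, "arrest", ("v3", "v7"), "accusative")],
}
print(acquire_event_sequence({"criminal"}, index))
```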
Given below is the explanation of a method of acquiring an event sequence using a surface anchor. In this method, there is no assumption that the case-frame-information-attached document group D1′ that is input to the event sequence acquiring unit 21 has coreference tags attached thereto. Instead, it is considered that, in the case-frame-information-attached document group D1′ that is input to the event sequence acquiring unit 21, the nouns matching on the surface have a coreference relationship. For example, in the example of English sentences illustrated in
With respect to each event sequence acquired by the event sequence acquiring unit 21, the event sub-sequence counter 22 counts the frequency of appearance of each sub-sequence in that event sequence. A sub-sequence is a partial set of N elements from among the elements included in the event sequence, and forms a part of the event sequence. Thus, a single event sequence includes a plurality of sub-sequences according to the combinations of N elements. Herein, "N" represents the length of a sub-sequence (the number of elements constituting a sub-sequence), and is set to a suitable number from the perspective of treating the sub-sequences as procedural knowledge.
With respect to a sub-sequence that includes the leading element of the event sequence, it is possible to use <s>, which represents an empty element, in one or more positions anterior to that sub-sequence so that the sub-sequence has N elements including the elements <s>. With that, it becomes possible to express that the leading element of the event sequence appears at the start of the event sequence. Similarly, with respect to a sub-sequence that includes the last element of the event sequence, it is possible to use <s> in one or more positions posterior to that sub-sequence so that the sub-sequence has N elements including the elements <s>. With that, it becomes possible to express that the last element of the event sequence appears at the end of the event sequence.
Meanwhile, in the embodiment, the configuration is such that the group of event sequences is acquired from the case-frame-information-attached document group D1′ without limiting the number of elements, and subsets of N elements are picked from each event sequence. However, alternatively, at the time of acquiring the group of event sequences from the case-frame-information-attached document group D1′, it is possible to impose a limitation that each event sequence includes only N elements. In that case, the event sequences acquired from the case-frame-information-attached document group D1′ themselves serve as the sub-sequences. In other words, when the event sequences are acquired without any limit on the number of elements, the sub-sequences picked from those event sequences are equivalent to the event sequences that are acquired under a limitation on the number of elements.
As far as the methods of obtaining sub-sequences from an event sequence are concerned, one method is to obtain subsets of N adjacent elements of the event sequence, while the other method is to obtain subsets of N elements without imposing the restriction that the elements need to be adjacent. The model for counting the frequency of appearance of the sub-sequences obtained according to the latter method is particularly called the skip model. Since the skip model allows combinations of non-adjacent elements, it offers the merit of being able to deal with sentences in which there is a temporary break in context due to, for example, interrupts.
With respect to each event sequence acquired by the event sequence acquiring unit 21, the event sub-sequence counter 22 picks all sub-sequences having the length N. Then, for each type of sub-sequences, the event sub-sequence counter 22 counts the frequency of appearance. That is, from among the group of sub-sequences that represents the set of all sub-sequences picked from an event sequence, the event sub-sequence counter 22 counts the frequency at which the sub-sequences having the same arrangement of elements appear. When counting of the frequency of appearance of the sub-sequences is performed for all event sequences, the event sub-sequence counter 22 outputs a frequency list that contains the frequency of appearance for each sub-sequence.
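For illustration only, the following minimal Python sketch shows both ways of picking length-N sub-sequences described above, with <s> padding for the leading and trailing elements; the function names are hypothetical and are not part of the embodiment.

```python
from itertools import combinations

def adjacent_subsequences(sequence, n):
    # Pad both ends with "<s>" so that sub-sequences containing the first or last
    # element of the event sequence can still have length N.
    padded = ["<s>"] * (n - 1) + list(sequence) + ["<s>"] * (n - 1)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def skip_subsequences(sequence, n):
    # Skip model: any combination of N elements with their order preserved,
    # adjacency not required.
    return [tuple(c) for c in combinations(sequence, n)]

events = ["A", "B", "C", "D"]
print(adjacent_subsequences(events, 2))
print(skip_subsequences(events, 2))
```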
However, as described above, each element constituting an event sequence has a plurality of element candidates differing only in the word sense identification information. For that reason, the frequency of appearance of sub-sequences needs to be counted for each combination of element candidates. In order to obtain the frequency of appearance for each combination of element candidates with respect to a single sub-sequence; for example, a value obtained by dividing the number of counts of the frequency of appearance of the sub-sequence by the number of combinations of element candidates can be treated as the frequency of appearance of each combination of element candidates. That is, with respect to each element constituting the sub-sequence, all combinations available upon selecting a single element candidate are obtained as sequences, and the value obtained by dividing the number of counts of the frequency of appearance of the sub-sequence by the number of obtained sequences is treated as the frequency of appearance of each sequence. For example, assume that a sub-sequence A-B includes an element A and an element B; assume that the element A has element candidates a1 and a2; and assume that the element B has element candidates b1 and b2. In this case, the sub-sequence A-B is expanded into four sequences, namely, a1-b1, a2-b1, a1-b2, and a2-b2. Then, the value obtained by dividing the number of counts of the sub-sequence A-B by 4 is treated as the frequency of appearance of each of the sequences a1-b1, a2-b1, a1-b2, and a2-b2. Thus, if the number of counts of the frequency of appearance of the sub-sequence A-B is one, then the frequency of appearance of each of the sequences a1-b1, a2-b1, a1-b2, and a2-b2 is equal to 0.25.
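For illustration only, the following minimal sketch reproduces the fractional counting described above for the sub-sequence A-B; the data layout is hypothetical.

```python
from collections import defaultdict
from itertools import product

def count_expanded(subsequence_with_candidates, counts):
    # Each position holds the tuple of its element candidates; one observation of
    # the sub-sequence is divided evenly over all expanded candidate combinations.
    expansions = list(product(*subsequence_with_candidates))
    share = 1.0 / len(expansions)
    for expanded in expansions:
        counts[expanded] += share

counts = defaultdict(float)
# Sub-sequence A-B, where A has candidates a1/a2 and B has candidates b1/b2.
count_expanded([("a1", "a2"), ("b1", "b2")], counts)
print(dict(counts))   # a1-b1, a2-b1, a1-b2, a2-b2 each receive 0.25
```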
The probability model building unit 23 refers to the frequency list output by the event sub-sequence counter 22, and builds a probability model (the event sequence model D2). Regarding the method by which the probability model building unit 23 builds a probability model, there is the method of using the n-gram model, or the method of using the trigger model in which the order of elements is not taken into account.
Firstly, the explanation is given about the method of building a probability model using the n-gram model. When target sequences for probability calculation are expressed as {x1, x2, . . . , xn} and the frequency of appearance of the sequences is expressed as c(•); then an equation for calculating the probability using the n-gram model is given below as Equation (1).
p(xn|xn-1, . . . , x1) = c(x1, . . . , xn)/c(x1, . . . , xn-1)    (1)
In the case of building a probability model using the n-gram model, the probability model building unit 23 performs calculation according to Equation (1) with respect to all sequences for which the frequency of appearance is written in the frequency list output by the event sub-sequence counter 22, and calculates the probability of appearance for each sequence. Then, the probability model building unit 23 outputs a probability list in which the calculation results are compiled. Moreover, as an optional operation, it is also possible to perform any existing smoothing operation.
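For illustration only, the following minimal sketch computes Equation (1) under the assumption that the frequency list is held as a mapping from element tuples to (possibly fractional) counts and also contains the counts of the (N−1)-element prefixes; that assumption is an implementation choice, not part of the embodiment.

```python
def ngram_probability(sequence, frequency):
    """p(xn | x1, ..., xn-1) = c(x1, ..., xn) / c(x1, ..., xn-1), as in Equation (1)."""
    numerator = frequency.get(tuple(sequence), 0.0)
    denominator = frequency.get(tuple(sequence[:-1]), 0.0)
    return numerator / denominator if denominator > 0 else 0.0

frequency = {("a1",): 1.0, ("a1", "b1"): 0.25}
print(ngram_probability(["a1", "b1"], frequency))   # 0.25
```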
Given below is the explanation about the method of building a probability model using the trigger model. When target sequences for probability calculation are expressed as {x1, x2, . . . , xn} and the frequency of appearance of the sequences is expressed as c(•), an equation for calculating the probability using the trigger model is given as Equation (2), which represents the sum of point-wise mutual information (PMI).
In Equation (2), "ln" represents the natural logarithm, and the values of p(xi|xj) and p(xj|xi) are obtained from the bigram model: p(x2|x1)=c(x1, x2)/c(x1).
In the case of building a probability model using the trigger model, the probability model building unit 23 performs calculations according to Equation (2) with respect to all sequences for which the frequency of appearance is written in the frequency list output by the event sub-sequence counter 22, and calculates the probability of appearance for each sequence. Then, the probability model building unit 23 outputs a probability list in which the calculation results are compiled. Moreover, as an optional operation, it is also possible to perform any existing smoothing operation. Furthermore, if the length N is set equal to two, then the calculation of the sum (in Equation (2), the calculation involving "Σ") becomes redundant, thereby making Equation (2) equivalent to the conventional calculation using PMI.
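The body of Equation (2) is not reproduced in the text above. For illustration only, the following minimal sketch computes one plausible form that is consistent with the surrounding description, namely a sum of pairwise PMI terms derived from bigram statistics; this is an assumption for illustration, not the embodiment's formula.

```python
import math

def bigram_probability(x1, x2, frequency):
    denominator = frequency.get((x1,), 0.0)
    return frequency.get((x1, x2), 0.0) / denominator if denominator > 0 else 0.0

def trigger_score(sequence, frequency, total_unigrams):
    """Sum of pairwise PMI terms over the elements of the sequence (order not required)."""
    score = 0.0
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence)):
            xi, xj = sequence[i], sequence[j]
            p_xj = frequency.get((xj,), 0.0) / total_unigrams
            p_xj_given_xi = bigram_probability(xi, xj, frequency)
            if p_xj > 0 and p_xj_given_xi > 0:
                score += math.log(p_xj_given_xi / p_xj)   # PMI(xi, xj)
    return score

frequency = {("a",): 4.0, ("b",): 2.0, ("a", "b"): 1.0}
print(trigger_score(["a", "b"], frequency, total_unigrams=6.0))
```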
Given below is the explanation of a specific example of the machine-learning case example generator 3.
The pair generating unit 31 generates pairs of an anaphor candidate and an antecedent candidate using the case frame information and the anaphora-tagged document group D3′ or using the case-frame-information-attached analysis target document D6′. When the learning operation for anaphora resolution is to be performed, in order to eventually obtain the training-purpose case example data D4, the pair generating unit 31 generates a positive example pair as well as a negative example pair using the case frame information and the anaphora-tagged document group D3′. Herein, a positive example pair represents a pair that actually has an anaphoric relationship, while a negative example pair represents a pair that does not have an anaphoric relationship. Meanwhile, the positive example pair and the negative example pair can be distinguished using anaphora tags.
Explained below with reference to
The pair generating unit 31 generates pairs of all combinations of anaphor candidates and antecedent candidates. However, any antecedent candidate paired with an anaphor candidate needs to be present in the preceding context as compared to that anaphor candidate. From the English sentences illustrated in
Meanwhile, when the prediction operation for anaphora resolution is to be performed, the pair generating unit 31 generates pairs of an anaphor candidate and an antecedent candidate using the case-frame-information-attached analysis target document D6′. In this case, since the case-frame-information-attached analysis target document D6′ does not have anaphora tags attached thereto, the pair generating unit 31 needs to find the antecedent candidates and the anaphor candidates in the sentences by some means. If the case-frame-information-attached analysis target document D6′ is in English, it is possible, for example, to perform part-of-speech analysis with respect to the document and treat the words determined to be pronouns as anaphor candidates and all other nouns as antecedent candidates. If the case-frame-information-attached analysis target document D6′ is in Japanese, it is possible, for example, to perform predicate argument structure analysis with respect to the document, detect the group of predicates, treat the requisite case slots of the predicates that are not filled as anaphor candidates, and treat the nouns present in the context preceding those anaphor candidates as antecedent candidates. Upon finding the antecedent candidates and the anaphor candidates in the abovementioned manner, the pair generating unit 31 obtains a group of pairs of an anaphor candidate and an antecedent candidate in an identical manner to obtaining the group of pairs in the case in which the learning operation for anaphora resolution is to be performed. However, herein, it is not required to attach positive example labels and negative example labels.
With respect to each pair of an anaphor candidate and an antecedent candidate, the predicted-sequence generating unit 32 predicts a case frame to which the predicate belongs in the sentence in which the anaphor candidate is replaced with the antecedent candidate, as well as extracts the predicates in the preceding context with the antecedent candidate serving as the anchor, and generates the event sequence described above. In the event sequence generated by the predicted-sequence generating unit 32, a combination of the predicate of the sentence in which the anaphor candidate is replaced with the antecedent candidate, the word sense identification information, and the case classification information is the last element of the sequence, and that last element is obtained by means of prediction. Hence, the sequence is called a predicted sequence to differentiate it from the event sequences acquired from the arbitrary document group D1.
Given below is the detailed explanation of a specific example of the operations performed by the predicted-sequence generating unit 32. Herein, the predicted-sequence generating unit 32 performs the operations with respect to each pair of an anaphor candidate and an antecedent candidate generated by the pair generating unit 31.
Firstly, with respect to the predicates of the sentences to which the anaphor candidate belongs, the predicted-sequence generating unit 32 assigns not the anaphor candidate but the antecedent candidate as the argument, and then predicts the case frame for the predicates. This operation is performed using an existing case frame parser. However, the case frame parser used herein needs to predict the case frame using the same algorithm as the algorithm of the case frame parser 12 of the case frame predictor 1. Consequently, with respect to a single predicate, case frames of the top-k candidates are obtained. Herein, the case frame of the top-1 candidate is used.
Then, from the case frame information and the anaphora-tagged document group D3′ or from the case-frame-information-attached analysis target document D6′, the predicted-sequence generating unit 32 detects a group of nouns that are present in the preceding context as compared to the antecedent candidate and that have a coreference relationship with the antecedent candidate. The determination of the coreference relationship is either performed using a coreference analyzer, or the nouns matching on the surface are treated to have coreference. The group of nouns obtained in this manner serves as the anchor.
Subsequently, from the case frame information and the anaphora-tagged document group D3′ or from the case-frame-information-attached analysis target document D6′, the predicted-sequence generating unit 32 detects the predicates of the sentences to which the anchor belongs and generates a predicted sequence in an identical manner to the method implemented by the event sequence acquiring unit 21. However, the length of the predicted sequence is set to N, in concert with the length of the sub-sequences in the event sequence model. That is, as the predicted sequence, a sequence is generated in which the element corresponding to the predicate of the sentence to which the anaphor candidate belongs is connected after the elements corresponding to the N−1 predicates detected in the preceding context. The predicted-sequence generating unit 32 performs this operation with respect to all pairs of an anaphor candidate and an antecedent candidate generated by the pair generating unit 31, and generates a predicted sequence corresponding to each pair.
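For illustration only, the following minimal sketch assembles a predicted sequence of length N in the manner just described; the element representation is hypothetical.

```python
def build_predicted_sequence(context_elements, predicted_element, n):
    # The N-1 most recent elements from the preceding context are followed by the
    # element predicted for the sentence containing the anaphor candidate.
    return tuple(context_elements[-(n - 1):]) + (predicted_element,)

context = [("rob", "v1", "nominative"), ("chase", "v2", "accusative")]
print(build_predicted_sequence(context, ("arrest", "v3", "accusative"), n=3))
```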
The probability predicting unit 33 collates each predicted sequence, which is generated by the predicted-sequence generating unit 32, with the event sequence model D2, and predicts the occurrence probability of each predicted sequence. More particularly, the probability predicting unit 33 searches the event sequence model D2 for the sub-sequence matching a predicted sequence, and treats the probability of appearance of that sub-sequence as the occurrence probability of the predicted sequence. The occurrence probability of a predicted sequence represents the probability (likelihood) that the pair of an anaphor candidate and an antecedent candidate used in generating the predicted sequence has a coreference relationship. Meanwhile, if no sub-sequence in the event sequence model D2 is found to match a predicted sequence, then the occurrence probability of that predicted sequence is set to zero. Moreover, if a smoothing operation has been performed while generating the event sequence model D2, then the occurrence of a case in which no sub-sequence matching a predicted sequence is found can be reduced.
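For illustration only, the following minimal sketch collates a predicted sequence with the event sequence model, here assumed to be a mapping from element tuples to probabilities of appearance; the names and values are hypothetical.

```python
def predict_occurrence_probability(predicted_sequence, event_sequence_model):
    # If no matching sub-sequence is found in the model, the occurrence probability
    # is zero; smoothing during model building reduces how often this happens.
    return event_sequence_model.get(tuple(predicted_sequence), 0.0)

model = {(("rob", "v1", "nominative"), ("arrest", "v3", "accusative")): 0.012}
print(predict_occurrence_probability([("rob", "v1", "nominative"), ("arrest", "v3", "accusative")], model))
print(predict_occurrence_probability([("rob", "v1", "nominative"), ("flee", "v2", "nominative")], model))   # 0.0
```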
The feature vector generating unit 34 treats the pairs of an anaphor candidate and an antecedent candidate, which are generated by the pair generating unit 31, as case examples and, with respect to each case example, generates a feature vector in which the occurrence probability of the predicted sequence generated by the predicted-sequence generating unit 32 is added as one of the elements (one of the features). Thus, in addition to using a standard group of features that is generally used as the elements of a feature vector representing the pair of an anaphor candidate and an antecedent candidate, that is, in addition to using a group of features illustrated in
In the case in which the prediction operation for anaphora resolution is to be performed, the feature vector generated by the feature vector generating unit 34 becomes the prediction-purpose case example data D7 that is the final output of the machine-learning case example generator 3. Moreover, in the case of performing the learning operation for anaphora resolution, when the positive example label or the negative example label, which has been attached to the pair of an anaphor candidate and an antecedent candidate, is added to the feature vector generated by the feature vector generating unit 34, the result becomes the training-purpose case example data D4 that is the final output of the machine-learning case example generator 3.
The training-purpose case example data D4 that is output from the machine-learning case example generator 3 is input to the anaphora resolution trainer 4. Then, using the training-purpose case example data D4, the anaphora resolution trainer 4 performs machine learning with a binary classifier and generates the anaphora resolution learning model D5 serving as the learning result. Moreover, the prediction-purpose case example data D7 that is output from the machine-learning case example generator 3 is input to the anaphora resolution predictor 5. Then, using the anaphora resolution learning model D5 generated by the anaphora resolution trainer 4 and the prediction-purpose case example data D7, the anaphora resolution predictor 5 performs prediction with the binary classifier and outputs the anaphora resolution prediction result D8.
The training for machine learning as performed by the anaphora resolution trainer 4 indicates the operation of obtaining the weight vector W using the training-purpose case example data D4. That is, the anaphora resolution trainer 4 is provided with, as the training-purpose case example data D4, the feature vector X of a case example and a positive example label or a negative example label indicating the result of threshold value comparison of the score value y of the case example, and obtains the weight vector W using the provided information. The weight vector W becomes the anaphora resolution learning model D5.
The machine learning performed by the anaphora resolution predictor 5 includes calculating the score value y of the case example using the weight vector W provided as the anaphora resolution learning model D5 and using the feature vector X provided as the prediction-purpose case example data D7; comparing the score value y with a threshold value; and outputting the anaphora resolution prediction result D8 that indicates whether or not the case example is correct.
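For illustration only, the following minimal sketch shows the score calculation and threshold comparison described above for a linear binary classifier; the weight values and the threshold are hypothetical.

```python
def score(weights, features):
    # y = W . X
    return sum(w * x for w, x in zip(weights, features))

def predict(weights, features, threshold=0.0):
    # The case example (pair) is judged to be correct when the score exceeds the threshold.
    return score(weights, features) > threshold

W = [0.8, -0.3, 1.5]    # weight vector learned by the trainer (anaphora resolution learning model D5)
X = [1.0, 0.0, 0.012]   # feature vector including the predicted-sequence occurrence probability
print(score(W, X), predict(W, X))
```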
As described above in detail with reference to specific examples, in the contextual analysis device 100 according to the embodiment, anaphora resolution is performed using a new-type event sequence, that is, a sequence of elements each of which includes not only the predicate and the case classification information but also the word sense identification information that enables identification of the word sense of the predicate. For that reason, it becomes possible to perform anaphora resolution with high accuracy.
Moreover, in the contextual analysis device 100 according to the embodiment, an event sequence is acquired that is a sequence of elements having a plurality of element candidates differing only in the word sense identification information; the frequency of appearance of the event sequence is calculated for each combination of element candidates; and the probability of appearance of the event sequence is calculated for each combination of element candidates. Hence, during case frame prediction, it becomes possible to avoid the cutoff phenomenon that occurs when only the topmost word sense identification information is used. That enables achieving enhancement in the accuracy of anaphora resolution.
Furthermore, in the contextual analysis device 100 according to the embodiment, in the case in which the probability of appearance of an event sequence is calculated using the n-gram model, it becomes possible to obtain the probability of appearance of the event sequence by taking into account an effective number of elements as procedural knowledge. That enables achieving further enhancement in the accuracy of the event sequence as procedural knowledge.
Moreover, in the contextual analysis device 100 according to the embodiment, in the case in which the probability of appearance of an event sequence is calculated using the trigger model, it also becomes possible to deal with a change in the order of appearance of elements. Hence, for example, even with respect to a document in which transposition has occurred, it becomes possible to obtain the probability of appearance of an event sequence that serves as effective procedural knowledge.
Furthermore, in the contextual analysis device 100 according to the embodiment, at the time of obtaining sub-sequences from an event sequence, it is allowed to have combinations of non-adjacent elements in a sequence. As a result, even with respect to sentences in which there is a temporary break in context due to interrupts, it becomes possible to obtain sub-sequences that serve as effective procedural knowledge.
Moreover, in the contextual analysis device 100 according to the embodiment, at the time of acquiring an event sequence from the arbitrary document group D1, the anchor is identified using coreference tags. As a result, it becomes possible to eliminate an inconvenience in which a group of nouns matching on the surface but differing in substance are treated as the anchor or to eliminate an inconvenience in which a group of nouns matching in substance but differing only on the surface are not treated as the anchor.
Each of the abovementioned functions of the contextual analysis device 100 according to the embodiment can be implemented by, for example, executing predetermined computer programs in the contextual analysis device 100. In that case, for example, as illustrated in
The computer programs executed in the contextual analysis device 100 according to the embodiment are recorded as installable or executable files in a computer-readable recording medium such as a compact disk read only memory (CD-ROM), a flexible disk (FD), a compact disk recordable (CD-R), or a digital versatile disk (DVD); and are provided as a computer program product.
Alternatively, the computer programs executed in the contextual analysis device 100 according to the embodiment can be stored in a downloadable manner on a computer connected to a network such as the Internet or can be distributed over a network such as the Internet.
Still alternatively, the computer programs executed in the contextual analysis device 100 according to the embodiment can be stored in advance in the ROM 102.
Meanwhile, the computer programs executed in the contextual analysis device 100 according to the embodiment contain a module for each processing unit (the case frame predictor 1, the event sequence model builder 2, the machine-learning case example generator 3, the anaphora resolution trainer 4, and the anaphora resolution predictor 5). As far as the actual hardware is concerned, for example, the CPU 101 (a processor) reads the computer programs from the memory medium and runs them so that the computer programs are loaded in a main memory device. As a result, each constituent element is generated in the main memory device. Meanwhile, in the contextual analysis device 100 according to the embodiment, some or all of the operations described above can be implemented using dedicated hardware such as an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
In the contextual analysis device 100 described above, the event sequence model building operation, the anaphora resolution learning operation, and the anaphora resolution predicting operation are all performed. However, alternatively, the contextual analysis device 100 can be configured to perform only the anaphora resolution predicting operation. In that case, the event sequence model building operation and the anaphora resolution learning operation are performed in an external device. Then, along with receiving input of the analysis target document D6, the contextual analysis device 100 receives input of the event sequence model D2 and the anaphora resolution learning model D5 from the external device, and then performs anaphora resolution with respect to the analysis target document D6.
Still alternatively, the contextual analysis device 100 can be configured to perform only the anaphora resolution learning operation and the anaphora resolution predicting operation. In that case, the event sequence model building operation is performed in an external device. Then, along with receiving input of the anaphora-tagged document group D3 and the analysis target document D6, the contextual analysis device 100 receives input of the event sequence model D2 from the external device; and generates the anaphora resolution learning model D5 and performs anaphora resolution with respect to the analysis target document D6.
Herein, the contextual analysis device 100 is configured to perform anaphora resolution in particular as the contextual analysis. Alternatively, for example, the contextual analysis device 100 can be configured to perform contextual analysis other than anaphora resolution, such as coreference resolution or dialog processing. Even with a configuration that performs contextual analysis other than anaphora resolution, if a new-type event sequence is used in which each element includes the word sense identification information that enables identification of the word sense of the predicate, it becomes possible to enhance the accuracy of contextual analysis.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
This application is a continuation of International Application No. PCT/JP2012/066182, filed on Jun. 25, 2012, the entire contents of which are incorporated herein by reference.