The present invention relates to a device for extracting contradictory expressions from a huge amount of texts and, more specifically, to a device for extracting, with high reliability, pairs of mutually contradicting expressions from a huge amount of texts.
If contradictory expressions can be detected in texts, the results may be used for various purposes. By way of example, if mutually contradictory expressions can be detected in a large body of text, it will be possible to notify the author by marking such expressions. In the process of peer review of texts prepared by others, it will be possible to verify their logical consistency. If texts prepared by different authors are to be compared, it will be possible to confirm differences between their assertions.
For example, many Japanese web pages claim that “agaricus prevents cancer”, and this has been generally accepted by many Japanese. If one searches articles on the Web using “agaricus”, “cancer” and “promotes” as keywords, however, one can find reports claiming that “agaricus promotes cancer in rats.” Some of these reports point to a study authorized by the Ministry of Health, Labour and Welfare reporting that a commercial product containing agaricus promoted cancer in rats. The existence of such reports contradicts the assertion that agaricus is effective in preventing cancer, and encourages anyone interested in agaricus to study the subject further.
At the time of a disaster, a huge amount of information is gathered on blogs, mini-blogs, social media and the like on networks. Such information is very useful for enabling effective evacuation or timely aid delivery. It is noted, however, that such information often includes ungrounded pieces of information or false rumors, and it is not easy to distinguish these at a glance from correct pieces of information. Therefore, such pieces of information are not only useless for good decision-making but also harmful in that they hinder the proper circulation of information and may spread damage or delay recovery. If it is possible to analyze pieces of information on the network and to extract and present contradictory pieces of information to a user, it will help the user sort out reliable and unreliable pieces of information. As a result, chaos at the time of an emergency could be settled early.
The above examples suggest that recognizing contradictory information on a certain subject can eventually guide users, through further search, to the true facts. This applies not only to knowledge of facts but also to the non-factual information that occupies most of our daily lives. By way of example, consider discussions on the TPP (Trans-Pacific Partnership). There is a big controversy over whether Japan should join the TPP. Quite serious but contradictory claims are plentiful, such as “the TPP will wipe out Japan's agricultural businesses” and “the TPP will strengthen Japan's agricultural businesses.” These are assertions or predictions that can be verified or disputed only after the underlying decision is made: joining or refusing the TPP.
Furthermore, after reading different texts including contradictory assertions, one will notice that each of them is supported by a convincing theory with no obvious defect. For example, we find the claims “Exports of Japan's agricultural products will increase thanks to the TPP” and “A large amount of low-priced agricultural products will be imported to Japan due to the TPP.” One of these predictions may just happen to come true for unexpected reasons, such as fluctuations in the yen exchange rate. We must survey the theories supporting contradictory predictions, conduct balanced decision-making, and prepare countermeasures for expected problems after examining multiple viewpoints. Contradiction recognition should be useful for selecting the documents to be surveyed in such situations.
Non-patent literature 1 listed above describes a study on such recognition of contradictory expressions. The study described in Non-patent literature 1 is directed to the recognition of contradiction between sentences or in a document as a whole. In order to determine contradictory expressions with higher efficiency, however, a technique for recognizing contradictory expressions in smaller units is necessary. Such a technique will enable more efficient and more accurate recognition of contradictions between sentences or in a document as a whole.
Further, though the examples described above are limited to Japanese, the problem itself is common to all languages.
Therefore, an object of the present invention is to provide a device for collecting contradictory expressions capable of efficiently collecting contradictory expressions in a unit smaller than a whole sentence.
Another object of the present invention is to provide a language-independent device for collecting contradictory expressions capable of efficiently collecting contradictory expressions in a unit smaller than a whole sentence.
According to a first aspect, the present invention provides a device for collecting contradictory expressions used connected to entailment relation storage means for storing entailment relations of words and to a first storage device storing a plurality of binary pattern pairs. A binary pattern pair includes two binary patterns, and each binary pattern includes a unary pattern as a sub pattern. The device for collecting contradictory expressions includes: first classifying means for extracting, by machine learning using as training data binary pattern pairs selected from the binary pattern pairs stored in the first storage device, mutually contradictory binary pattern pairs from the plurality of binary pattern pairs stored in the first storage device; deriving means for applying, to each of the binary pattern pairs extracted by the first classifying means, an entailment relation stored in the entailment relation storage means, thereby rewriting one binary pattern and deriving a new binary pattern pair; training data expanding means for extracting, from the new binary pattern pairs derived by the deriving means, binary pattern pairs highly likely to consist of mutually contradictory binary patterns and adding them to the training data, thereby expanding the training data; and second classifying means for classifying, by machine learning using the training data expanded by the training data expanding means, given binary pattern pairs into binary pattern pairs which are mutually contradictory and those which are not.
Preferably, the device for collecting contradictory expressions is used further connected to polarity storage means for storing polarities of unary patterns. The first classifying means includes: first pattern pair extracting means for extracting, from the first storage device, using the polarities of unary patterns stored in the polarity storage means, binary pattern pairs each including a pair of unary patterns having mutually opposite polarities; and machine learning means for learning, by machine learning using as training data a plurality of binary pattern pairs each having a label indicating whether or not it consists of mutually contradictory binary patterns, a function of selecting a binary pattern pair consisting of mutually contradictory patterns, and for selecting and outputting binary pattern pairs consisting of mutually contradictory binary patterns from the plurality of binary pattern pairs stored in the first storage device.
More preferably, the first classifying means outputs each binary pattern pair with a score indicating the likelihood that the pair consists of mutually contradictory binary patterns; and the training data expanding means includes: score calculating means for calculating, for each group of binary pattern pairs derived from one of the binary pattern pairs extracted by the first classifying means, the ratio of binary pattern pairs in the group having scores equal to or higher than a predetermined threshold value, as a score of each binary pattern pair included in the group; score establishing means for establishing the score of each of the newly derived binary pattern pairs by allocating to it the highest of the scores calculated for it by the score calculating means; and adding means for selecting, from the newly derived binary pattern pairs, a prescribed number of binary pattern pairs having the top scores established by the score establishing means and adding them to the training data.
More preferably, the adding means excludes, at the time of addition to the training data, those of the newly derived binary pattern pairs which are already included in the set of binary pattern pairs extracted by the first classifying means.
Either the first or the second classifying means may include classifying means based on machine learning, such as classifying means based on a Support Vector Machine (SVM).
According to a second aspect, the present invention provides a computer program executed on a computer connected to entailment relation storage means for storing entailment relations of words and to a first storage device storing a plurality of binary pattern pairs. A binary pattern pair includes two binary patterns, and each binary pattern includes a unary pattern as a sub pattern. The computer program causes the computer to operate as: first classifying means for extracting, by machine learning using as training data binary pattern pairs selected from the binary pattern pairs stored in the first storage device, mutually contradictory binary pattern pairs from the plurality of binary pattern pairs stored in the first storage device; deriving means for applying, to each of the binary pattern pairs extracted by the first classifying means, an entailment relation stored in the entailment relation storage means, thereby rewriting one binary pattern and deriving a new binary pattern pair; training data expanding means for extracting, from the new binary pattern pairs derived by the deriving means, binary pattern pairs highly likely to consist of mutually contradictory binary patterns and adding them to the training data, thereby expanding the training data; and second classifying means for classifying, by machine learning using the training data expanded by the training data expanding means, given binary pattern pairs into binary pattern pairs which are mutually contradictory and those which are not.
In the following description and in the drawings, the same components are denoted by the same reference characters. Therefore, detailed description thereof will not be repeated.
[Configurations]
The device in accordance with the embodiment described in the following collects pairs of mutually contradicting patterns, such as “X promotes Y” and “X prevents Y”, or “X will expel Y” and “X will reinforce Y.” Each pattern has two variable elements, X and Y; in the following, such a pattern will be referred to as a “binary pattern” as it has two variable elements. By collecting such binary pattern pairs, we can easily build a system that recognizes texts of contradictory expressions such as “agaricus prevents cancer” and “agaricus promotes cancer” as described above.
Further, the embodiment described in the following utilizes the excitatory/inhibitory nature of a pattern (these two natures will be generally referred to as the “polarity” of an expression), proposed in Non-patent literature 2. In accordance with the proposal of Non-patent literature 2, the polarity of an expression (a sub-pattern representing a predicate including one variable element, that is, the “(verb) Y” portion of “X (verb) Y”, such as “promotes Y” or “prevents Y”; each of these will be referred to as a “unary pattern” as it has one variable element) is classified into three categories: excitatory, neutral and inhibitory. “Excitatory” means that a function, effect, object or role of the variable included in the pattern is invoked or reinforced; examples are “cause” in “cause Y” and “increase” in “increase Y.” In contrast, “inhibitory” means that a function, effect, object or role of the variable included in the pattern is stopped or weakened; examples are “prevent Y,” “diminish Y” and the like. “Neutral” represents an expression that is neither excitatory nor inhibitory. For example, the expression “close to Y” is neutral.
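As an illustration, the polarity lookup can be pictured as a simple dictionary. The following Python sketch is ours, not the embodiment's: the entries, the English patterns and the function names are assumptions, while the actual dictionary is a much larger, hand-built Japanese one, as described later.

```python
# Hypothetical polarity dictionary for unary patterns (illustrative entries only).
POLARITY = {
    "cause Y": "excitatory",
    "increase Y": "excitatory",
    "promote Y": "excitatory",
    "prevent Y": "inhibitory",
    "diminish Y": "inhibitory",
    "close to Y": "neutral",
}

def polarity(unary_pattern):
    """Return the polarity of a unary pattern, defaulting to neutral."""
    return POLARITY.get(unary_pattern, "neutral")

def opposite_polarity(p, q):
    """True if two unary patterns have mutually opposite polarities."""
    return {polarity(p), polarity(q)} == {"excitatory", "inhibitory"}
```

For instance, “promote Y” and “prevent Y” would be flagged as an opposite-polarity pair, while “close to Y” paired with “prevent Y” would not.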
Referring to
Web question-answering system 30 includes: a contradiction pattern pair collecting device 40 collecting a huge amount of documents from Web pages on the Internet 32 and collecting therefrom binary pattern pairs as mutually contradictory expressions; a contradiction pattern pair storage device 42 storing the contradictory expressions collected by contradiction pattern pair collecting device 40; and a contradictory expression presenting system 44 receiving an input of a question sentence from PC 34, detecting mutually contradicting expressions as answers to the question sentence from documents on the Web by using the mutually contradicting expressions stored in contradiction pattern pair storage device 42, and generating and returning to PC 34 the source text of a Web screen image presenting these portions symmetrically in a highlighted manner. Contradictory expression presenting system 44 includes a Web server and a program executing system for a prescribed programming language, both not shown. Receiving a request designating a certain program and a question sentence from PC 34, the Web server passes the designated question sentence to the designated program. The program analyzes the received question sentence, searches for and reads expressions including answer candidates to the question sentence from documents on the Web, and classifies these expressions into those mutually contradictory and those not, using a contradictory expression classifier trained with the contradictory expressions stored in contradiction pattern pair storage device 42 as training data. The program further adds highlighting to the portions of the detected expressions, thereby generates HTML source texts symmetrically displaying the mutually contradictory expressions in comparison with each other, and transmits the generated source texts back to PC 34.
Contradiction pattern pair classifying unit 68 has a two-stage configuration. The first stage of contradiction pattern pair classifying unit 68 includes: a first-stage contradiction pattern pair classifying unit 80 including a classifier for classifying the huge amount of binary pattern pairs stored in candidate pattern pair storage device 60 into a first type of pattern pairs, each having a pair of unary patterns with the same element and opposite polarities, and the rest into a second type of pattern pairs; a contradiction pattern pair intermediate storage device 82 storing the first type of pattern pairs classified by first-stage contradiction pattern pair classifying unit 80; a non-contradiction pattern pair intermediate storage device 84 storing the second type of pattern pairs classified by first-stage contradiction pattern pair classifying unit 80; an opposite polarity pair storage device 102 storing opposite polarity pairs, that is, pattern pairs having unary pattern portions of opposite polarities, generated internally by first-stage contradiction pattern pair classifying unit 80; and a training data storage device 108 storing training data for the learning of first-stage contradiction pattern pair classifying unit 80, generated internally by first-stage contradiction pattern pair classifying unit 80. The data stored in contradiction pattern pair storage device 42, opposite polarity pair storage device 102 and training data storage device 108 will be the inputs to a second-stage contradiction pattern pair classifying unit 86, as will be described later.
Here, the first type of pattern pair refers to a pair of patterns such as “promote Y” and “prevent Y”, that is, a pair of unary patterns having the common portion “Y” and mutually opposite polarities.
The second stage of contradiction pattern pair collecting device 40 includes a second-stage contradiction pattern pair classifying unit 86 that performs re-learning of the classifier using the contradiction pattern pairs stored in contradiction pattern pair intermediate storage device 82 and the entailment relations stored in entailment relation storage device 64, again classifies the candidate pattern pairs stored in candidate pattern pair storage device 60 into contradiction patterns and non-contradiction patterns using the re-learned classifier, and stores the contradiction patterns in contradiction pattern pair storage device 42 and the non-contradiction patterns in non-contradiction pattern pair storage device 66, respectively.
First-stage contradiction pattern pair classifying unit 80 further includes: a training data generating unit 106 performing, under an operator's control, a process for extracting pattern pairs for generating training data for SVM 104 from candidate pattern pairs stored in candidate pattern pair storage device 60 and appending necessary tags, and storing the results in training data storage device 108; and an SVM training unit 110 for training SVM 104 using the training data stored in training data storage device 108.
SVM training unit 110 generates feature vectors for the training of SVM 104 from the training data stored in training data storage device 108. In the present embodiment, two types of elements are mainly used as elements of the feature vectors: features of the surface structure obtained from the pattern contents themselves, and features related to the lexicon. The table below lists the features used in the present embodiment; features not belonging to the two types mentioned above are listed as “others.” These features are commonly used by both SVM 104 and an SVM in second-stage contradiction pattern pair classifying unit 86, which will be described later. It is naturally understood that the selection of features is not limited to those listed in Table 1.
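Since Table 1 is not reproduced in this excerpt, the following Python sketch merely illustrates the idea of combining the two feature groups; every feature and name here is a made-up assumption of ours, not a feature from Table 1.

```python
def surface_features(pair):
    """Toy surface features obtained from the pattern strings themselves."""
    a, b = pair
    ta, tb = a.split(), b.split()
    return {
        "same_token_count": int(len(ta) == len(tb)),  # patterns of equal length?
        "shared_tokens": len(set(ta) & set(tb)),      # tokens common to both
    }

def lexical_features(pair, polarity_of):
    """Toy lexicon-related features: the polarity of each pattern."""
    a, b = pair
    return {"pol_a": polarity_of.get(a, "neutral"),
            "pol_b": polarity_of.get(b, "neutral")}

def feature_vector(pair, polarity_of):
    """Concatenate the two feature groups into one mapping for the SVM."""
    fv = surface_features(pair)
    fv.update(lexical_features(pair, polarity_of))
    return fv
```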
Second-stage contradiction pattern pair classifying unit 86 further includes: an SVM 142 classifying the candidate pattern pairs stored in candidate pattern pair storage device 60 into contradiction pattern pairs and non-contradiction pattern pairs and storing the contradiction pattern pairs in contradiction pattern pair storage device 42 and the non-contradiction pattern pairs in non-contradiction pattern pair storage device 66, respectively; and an SVM training unit 140 for training SVM 142 using the expanded training data stored in expanded training data storage device 138. Specifically, SVM 142 is trained using the training data originally obtained and stored in training data storage device 108 as well as the contradiction pattern pairs added by additional contradiction pattern pair deriving unit 130, scoring unit 134 and training data expanding unit 136. It has been confirmed through experiments that the accuracy of classification by SVM 142 having such a configuration is higher than that of the first-stage SVM 104. The results of the experiments will be discussed later.
Referring to
Additional contradiction pattern pair deriving unit 130 includes a contradiction pattern pair candidate generating unit 164 for generating new contradiction pattern pairs by reading contradiction pattern pairs 162 from contradiction pattern pair intermediate storage device 82, applying entailment relations 160 read from entailment relation storage device 64 to one of the patterns of each pattern pair, and thereby rewriting it. The logical constraint for this expansion is as follows.
If a pattern p entails a pattern q and pattern q contradicts a third pattern r, then pattern p must also contradict r. For example, since “X causes Y” (pattern p) entails “X promotes Y” (pattern q) and pattern q contradicts “X prevents Y” (pattern r), we conclude that pattern p contradicts pattern r. Here, the contradiction pattern pair <q, r> consisting of patterns q and r is called a source pattern pair, and the contradiction pattern pair <p, r> consisting of patterns p and r is called an expanded pattern pair.
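This constraint can be sketched as a small derivation routine. The Python below is an illustration of ours (the function and variable names are assumptions), not the embodiment's implementation:

```python
def expand_contradictions(contradiction_pairs, entailments):
    """Derive expanded contradiction pair candidates from entailment relations:
    if p entails q and <q, r> is a contradiction pair, then <p, r> becomes
    a candidate contradiction pair (an "expanded pattern pair")."""
    expanded = set()
    for p, q in entailments:              # each (p, q) means: p entails q
        for q2, r in contradiction_pairs:  # each <q2, r> is a source pair
            if q == q2:
                expanded.add((p, r))
    return expanded
```

With the source pair <“X promotes Y”, “X prevents Y”> and the entailment “X causes Y” entails “X promotes Y”, the routine derives the expanded pair <“X causes Y”, “X prevents Y”>.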
Additional contradiction pattern pair storage device 132 stores candidate groups 180, 182 and 184 consisting of candidates (candidate pairs) of contradiction pattern pairs generated by contradiction pattern pair candidate generating unit 164.
Scoring unit 134 includes: a candidate pair determining unit 200 determining, for each of the candidate groups 180, 182, 184 and the like, whether or not the score at the time of classification by SVM 104 (see
The sub score CDPsub(q, r) for a source contradiction pattern pair <q, r> is defined as the ratio of expanded pattern pairs derived from <q, r> whose score exceeds a threshold value α:

CDPsub(q, r) = |{<p, r> ∈ Ex(q, r) : Sc(p, r) > α}| / |Ex(q, r)|
Here, Ex(q, r) is the set of expanded pattern pairs derived from the source pair <q, r>, and Sc is the score given by SVM 104 to a pattern pair. In the experiments described later, we set α = 0.46. This value was selected such that the pattern pairs for which SVM 104 gives a score over α correspond to the top 5% of the outputs of SVM 104.
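The way such a threshold can be read off the score distribution may be sketched as follows; this is our illustration only (the embodiment simply fixes α = 0.46 for its particular distribution of SVM scores).

```python
def choose_alpha(svm_scores, top_fraction=0.05):
    """Pick a threshold alpha so that the scores above it make up roughly
    the top `top_fraction` of the classifier's outputs."""
    ranked = sorted(svm_scores, reverse=True)      # highest scores first
    k = max(1, int(len(ranked) * top_fraction))    # size of the top slice
    return ranked[k - 1]                           # smallest score in the slice
```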
Training data expanding unit 136 includes a score establishing unit 218 that establishes, in response to the allocation of the sub score CDPsub to every candidate pair included in candidate groups 180, 182, 184 and the like by scoring unit 134, the value of the score CDP for each candidate pair in accordance with the equation below.
CDP(p, r) = max_{<q, r> ∈ Source(p, r)} CDPsub(q, r)

Here, Source(p, r) is the set of source pattern pairs from which the expanded pattern pair <p, r> was derived.
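A minimal Python sketch of these two scores, under our reading that the score Sc applies to each expanded pair (the function names are ours):

```python
def cdp_sub(expanded, svm_score, alpha):
    """CDPsub of one source pair: the fraction of its expanded pairs whose
    first-stage SVM score exceeds alpha."""
    if not expanded:
        return 0.0
    hits = sum(1 for e in expanded
               if svm_score.get(e, float("-inf")) > alpha)
    return hits / len(expanded)

def cdp(sources_of_candidate, cdp_sub_scores):
    """CDP of an expanded candidate pair: the maximum CDPsub over all of the
    source pairs that derived it (identical candidates may come from
    several distinct sources)."""
    return max(cdp_sub_scores[s] for s in sources_of_candidate)
```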
Among the candidate pairs stored with their scores in additional contradiction pattern pair storage device 132, identical contradiction pattern pairs separately derived from a plurality of source contradiction pattern pairs may exist. Since the contradiction pattern pairs from which they are derived differ, these pattern pairs generally have different sub scores. If such candidate pairs exist, score establishing unit 218 gives, as the score CDP of the candidate pair, the maximum value of the sub scores CDPsub calculated for the candidate pair by sub score calculating unit 202 in accordance with the equation above.
Training data expanding unit 136 further includes: a top candidate extracting unit 220 sorting the candidate pairs in descending order of CDP and extracting the top N candidate pairs; a candidate merging unit 222 merging the candidate pairs extracted by top candidate extracting unit 220 with the training data stored in training data storage device 108 and outputting new training data; and a negative cleaning unit 224 performing a negative cleaning process for removing, from the training data output from candidate merging unit 222, pattern pairs conflicting with the newly added candidate pairs.
After the CDP of each candidate pair is established by score establishing unit 218, top candidate extracting unit 220 extracts only those candidate pairs which are not in the set of contradiction pattern pairs stored in contradiction pattern pair intermediate storage device 82, and outputs the top N thereof to candidate merging unit 222. In other words, top candidate extracting unit 220 removes from the objects of addition those candidate pairs which are already stored in contradiction pattern pair intermediate storage device 82.
The process by negative cleaning unit 224 is necessary for maintaining the consistency of the training data. Here, of the pattern pairs obtained through classification by SVM 104, those conflicting with the pattern pairs added by candidate merging unit 222 are removed. The pair of content words of a pattern pair is considered the strongest ground for judging whether the two patterns contradict each other. Therefore, of the pattern pairs obtained at the beginning and serving as negative samples, those having a content word or words in common with any of the newly added contradiction pattern pairs are removed.
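A hedged sketch of such a cleaning step, under our interpretation that negative training pairs sharing content words with newly added positives are dropped; the content-word extraction here is a toy stand-in (the embodiment presumably relies on morphological analysis of Japanese):

```python
def content_words(pattern):
    """Toy content-word extraction: drop the variable slots X and Y."""
    return {w for w in pattern.split() if w not in ("X", "Y")}

def negative_cleaning(negatives, added_positives):
    """Remove negative training pairs sharing a content word with any newly
    added contradiction pair, so the training data stay consistent."""
    added = [content_words(a) | content_words(b) for a, b in added_positives]
    return [(a, b) for a, b in negatives
            if not any((content_words(a) | content_words(b)) & words
                       for words in added)]
```

For example, with the new positive <“X causes Y”, “X prevents Y”>, a negative pair containing “prevents” would be removed, while an unrelated negative pair would survive.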
The process above described as pseudo-code is as follows.
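The pseudo-code itself does not appear in this excerpt. As a stand-in, the following Python sketch strings the steps together under our assumptions (all names are illustrative): derive expanded pairs from the entailment relations, score each source group, establish CDP per candidate, keep the top N candidates not found in the first stage, and merge them into the positive training data; negative cleaning of the negative samples would then follow.

```python
def expand_training_data(positives, stage1_pairs, svm_score,
                         entailments, alpha, n):
    # 1. Derive expanded candidates, grouped by the source pair they came from.
    groups = {}
    for p, q in entailments:                       # p entails q
        for src in stage1_pairs:                   # src = <q2, r>
            if src[0] == q:
                groups.setdefault(src, set()).add((p, src[1]))
    # 2. Sub score of each source group: fraction of members scoring above alpha.
    sub = {src: sum(svm_score.get(e, float("-inf")) > alpha
                    for e in exp) / len(exp)
           for src, exp in groups.items()}
    # 3. CDP of a candidate: the maximum sub score over its source groups.
    cdp = {}
    for src, exp in groups.items():
        for cand in exp:
            cdp[cand] = max(cdp.get(cand, 0.0), sub[src])
    # 4. Keep the top-n candidates not already extracted in the first stage.
    fresh = [c for c in sorted(cdp, key=cdp.get, reverse=True)
             if c not in stage1_pairs][:n]
    # 5. Merge with the original positive training data.
    return positives + fresh
```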
[Operation]
Contradiction pattern pair collecting device 40 having the above-described configuration operates in the following manner. Referring to
In the present embodiment, polarity dictionary storage device 62 stores a dictionary manually prepared in advance. The entailment relations stored in entailment relation storage device 64 may be manually prepared, or may be prepared using a classifier trained by machine learning with manually prepared training data.
Referring to
On the other hand, an operator extracts candidate pattern pairs to be used as training data from candidate pattern pair storage device 60 using training data generating unit 106, and adds tags indicating whether or not each candidate pattern pair consists of mutually contradicting patterns, thereby generating training data. The training data are stored in training data storage device 108. SVM training unit 110 generates feature vectors for the learning of SVM 104 from the training data stored in training data storage device 108, and conducts the learning of SVM 104. Here again, training data prepared manually beforehand are used for the learning of SVM 104. It is noted, however, that the data need not be prepared manually in a direct manner; data classified and labeled by a trained classifier may be used as the training data. Further, a method of generating training data that does not require any manual determination may be used.
In accordance with the result of learning, SVM 104 classifies each of the candidate pattern pairs having mutually opposite polarities stored in opposite polarity pair storage device 102 into contradiction pattern pairs and non-contradiction pattern pairs, and stores them in contradiction pattern pair intermediate storage device 82 and non-contradiction pattern pair intermediate storage device 84, respectively. Here, SVM 104 gives an SVM score to each output pattern pair: the score is high if the pattern pair is highly likely to be a contradiction pattern pair, and low otherwise.
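The two-way routing by score can be sketched as follows; `score_fn` is a stand-in of ours for the trained SVM's decision function, and the names are assumptions, not the embodiment's:

```python
def classify_pairs(pairs, score_fn, threshold=0.0):
    """Route each candidate pair to the contradiction or non-contradiction
    store according to its score, keeping the score with the pair."""
    contradiction_store, non_contradiction_store = [], []
    for pair in pairs:
        s = score_fn(pair)
        if s >= threshold:
            contradiction_store.append((pair, s))
        else:
            non_contradiction_store.append((pair, s))
    return contradiction_store, non_contradiction_store
```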
Referring to
When generation of additional contradiction pattern pairs by additional contradiction pattern pair deriving unit 130 is completed, scoring unit 134 calculates CDP of each contradiction pattern pair. Referring to
Score establishing unit 218 of training data expanding unit 136 establishes the CDP of each candidate pair by allocating, to each candidate pair derived from a plurality of contradiction pattern pairs among the additional contradiction pattern pairs stored in additional contradiction pattern pair storage device 132, the maximum CDPsub allocated to that candidate pair, and allocating, to the other candidate pairs, their CDPsub as the CDP. Top candidate extracting unit 220 extracts, from the candidate pairs stored in additional contradiction pattern pair storage device 132, those not included in the set of contradiction pattern pairs stored in contradiction pattern pair intermediate storage device 82, and of these, outputs the top N to candidate merging unit 222 of training data expanding unit 136.
Candidate merging unit 222 merges the candidate pairs output from top candidate extracting unit 220 with those stored in training data storage device 108, and outputs results to negative cleaning unit 224.
From the training data output from candidate merging unit 222, negative cleaning unit 224 removes those conflicting with the newly added candidate pairs, and stores the remaining training data in expanded training data storage device 138.
Again referring to
The accuracy of the contradiction pattern pairs thus obtained in contradiction pattern pair storage device 42 was evaluated in the experiments described in the following, which confirmed that the performance was clearly improved over the prior art.
[Experiment 1]
In the embodiment above, only the candidate pattern pairs having opposite polarities are used when training data are extracted by opposite polarity pair extracting unit 100 shown in
In the experiments, the binary patterns and their co-occurring noun pairs were extracted from 600 million Japanese Web pages dependency-parsed with KNP (Reference 1 listed below). We restricted the patterns to the 3.9 million most frequent patterns (of the form “X-[case particle] Y-[case particle] predicate”, such as “X-ga Y-ni aru” (“X is in Y”)) that do not contain any negation, number, symbol or punctuation character. Based on the observation that patterns in meaningful contradiction pattern pairs tend to share co-occurring noun pairs, we used as inputs to the classifiers the set Pall of 792 million pattern pairs in which both patterns share three co-occurring noun pairs.
Further, considering that unary patterns with opposite polarities have a higher chance of being contradictory, a set Popp of binary pattern pairs containing unary patterns with opposite polarities was selected from the set Pall by opposite polarity pair extracting unit 100. Polarity dictionary storage device 62 used here stored 6,470 unary patterns whose polarities were hand-labeled. Of these, 4,882 were labeled excitatory and 1,558 inhibitory.
The set Popp contained 8 million binary pattern pairs, roughly 38% of which were true contradiction pairs; these were input to SVM 104 (see
The labels produced via training data generating unit 106 were determined by majority vote of three human operators. As a result, the training data stored in training data storage device 108 included 796 unary pattern pairs, of which 238 were labeled as contradiction pairs and 558 as non-contradiction pairs. These unary pattern pairs were selected from among pairs with high distributional similarity, regardless of whether their polarities are opposite or not.
We then extracted from the set Pall 256,000 pattern pairs containing a contradictory unary pattern pair, and 5.2 million pattern pairs containing a non-contradictory unary pattern pair. These were used as positive and negative training data, respectively.
The composition of the training data to be stored in training data storage device 108 was determined beforehand using development data. For this determination, 1,000 manually labeled samples were used. Twenty different classifiers were trained using from 6,250 to 50,000 positive samples (4 sets) and from 12,500 to 200,000 negative samples (5 sets), doubling the amounts at each step. The resulting optimal training data set consisted of 12,500 positive samples and 100,000 negative samples, which were used in the experiments.
To train the SVM, TinySVM (see Reference 2 listed below) with a polynomial kernel of degree 2 was used. This setting showed the best performance in preliminary experiments.
With this setting, an experiment was conducted to examine the effect of restricting the input patterns in opposite polarity pair storage device 102 to pattern pairs having opposite polarities. For the experiment, a test set of 2,000 manually labeled samples, together with 250 samples from the top scores of the set Pall manually labeled by majority vote of three operators, was used as input to SVM 104, and the top 2 million pattern pairs of both the Popp and Pall sets were classified, with the results indicated by precision curves.
The precision curve of
[Experiment 2]
In Experiment 2 also, the development set and the test set described above were used. For this purpose, we asked three human operators to label 3,000 binary pattern pairs as contradiction pattern pairs or non-contradiction pattern pairs. The 3,000 pattern pairs were randomly selected from the set Popp; of these, 1,000 were used as the development set and 2,000 as the test set. In the labeling by the three operators, the label of each pattern pair was determined by majority vote. The development set was the same as the 1,000 manually labeled samples used in Experiment 1 for determining the composition of the training data to be stored in training data storage device 108.
As a definition of “contradiction,” we used the notion of incompatibility (that is, two statements are extremely unlikely to be true simultaneously) proposed in an article listed below as Reference 3. Therefore, we can say that pattern pairs such as “X causes Y” and “X prevents Y” are contradictory if the above condition holds for any noun pair, in the semantic classes of these patterns, that can instantiate the patterns' variables.
In the experiment, the following three results of classification were compared. Results are as shown in
Referring to
With the same precision of 80%, BASE and PROP-SCORE acquired only 285,000 and 636,000 contradiction pattern pairs, respectively. This implies that the two-stage method of extracting contradiction pattern pairs in accordance with the embodiment can more than double the number of correctly extracted contradiction pattern pairs and increase their variety, and that using the score CDP when adding candidate pairs to the training data in the second stage enables extraction of a larger number of contradiction pattern pairs with higher precision than using the score of SVM 104 of the first stage.
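The comparison at a fixed precision of 80% presupposes measuring, for each ranked output list, how many pattern pairs can be acquired while precision stays at or above the target. A minimal sketch of that measurement follows; the helper name and the 0/1 correctness-label input are assumptions for illustration.

```python
def pairs_at_precision(ranked_labels, target=0.8):
    """Largest prefix of a ranked list (1 = correctly extracted pair,
    0 = incorrect) whose precision is at least `target`."""
    best, correct = 0, 0
    for i, is_correct in enumerate(ranked_labels, start=1):
        correct += is_correct
        if correct / i >= target:
            best = i
    return best

print(pairs_at_precision([1, 1, 1, 0, 1]))  # -> 5 (precision 4/5 = 0.8)
```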
[Computer Implementation]
The contradiction pattern pair collecting device 40 in accordance with the above-described embodiment can be implemented by computer hardware and a computer program running on the computer hardware.
Referring to
Referring to
The computer program causing computer system 530 to function as various functional units of contradiction pattern pair collecting device 40 in accordance with the above-described embodiment is stored in a DVD 562 or a removable memory 564 loaded to DVD drive 550 or memory port 552, and transferred to hard disk 554. Alternatively, the program may be transmitted to computer 540 through the Internet 32 and stored in hard disk 554. The program is loaded to RAM 560 at the time of execution. The program may be directly loaded to RAM 560 from removable memory 564, or through the Internet 32.
The program includes a sequence of instructions consisting of a plurality of instructions causing computer 540 to function as various functional units of contradiction pattern pair collecting device 40 in accordance with the embodiment above. Some of the basic functions necessary to cause computer 540 to operate in this manner may be statically linked at the time of creating the program or dynamically linked at the time of executing the program, provided by the operating system running on computer 540, by a third-party program, or by various programming toolkits or program libraries (for example, a computer program library for SVM) installed in computer 540. Therefore, the program itself may not include all functions necessary to realize the system and method of the present embodiment at the time of circulation. The program may include only the instructions that call appropriate functions or appropriate program tools in the programming toolkits or in the program libraries in a controlled manner to attain a desired result and thereby realize the functions of the system described above. Naturally, the program itself may have all necessary functions statically linked so that it can operate without any other resources.
The embodiments as have been described here are mere examples and should not be interpreted as restrictive. The scope of the present invention is determined by each of the claims with appropriate consideration of the written description of the embodiments and embraces modifications within the meaning of, and equivalent to, the language of the claims.
By way of example, an SVM is used as a classifier in the embodiment above. The present invention, however, is not limited to such an embodiment. For instance, a Naive Bayes classifier, or a classifier trained by supervised learning with a maximum entropy model, may be used.
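As an illustration of substituting another classifier, a minimal multinomial Naive Bayes over pattern features might look as follows. The class, the feature representation, and the toy data are assumptions for illustration only, not the embodiment's implementation.

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace smoothing, as a
    hypothetical stand-in for the SVM classifier of the embodiment."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        counts = {c: defaultdict(int) for c in self.classes}
        totals = {c: 0 for c in self.classes}
        for feats, label in zip(X, y):
            for f in feats:
                counts[label][f] += 1
                totals[label] += 1
        self.vocab = {f for feats in X for f in feats}
        self.loglik = {
            c: {f: math.log((counts[c][f] + 1) / (totals[c] + len(self.vocab)))
                for f in self.vocab}
            for c in self.classes}
        return self

    def predict(self, feats):
        def score(c):
            return self.prior[c] + sum(
                self.loglik[c][f] for f in feats if f in self.vocab)
        return max(self.classes, key=score)

# Toy pattern-pair features and labels (hypothetical)
nb = NaiveBayes().fit(
    [["causes", "prevents"], ["causes", "promotes"], ["prevents", "cures"]],
    ["contradiction", "non-contradiction", "contradiction"])
print(nb.predict(["prevents"]))  # -> contradiction
```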
The present invention can be used for collecting mutually contradictory expressions from everyday language. Further, the present invention can be used, for example, when the press, publishers, general companies or individuals offer information, to verify the correctness of contents by finding contradictory expressions, or to verify the logic of information so as to prevent confusion of readers due to the use of contradictory expressions related to one and the same object. Particularly when it is difficult to verify reliability and a huge amount of information circulates in a short period of time, such as at the time of a disaster, the present invention can be used to help the press, administrative organizations and individuals to choose good pieces of information and to behave appropriately.
Number | Date | Country | Kind |
---|---|---|---|
2013-210793 | Oct 2013 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2014/076730 | 10/6/2014 | WO | 00 |