The present invention relates to a technique for generating, from information in a huge amount of documents, information consisting of chained causalities promoting decision making taking every risk and chance into consideration, in a scope of coverage exceeding the extent of information in the documents.
In modern society, where actions are complicated, large in scale and global in their influence, the ability to predict the future is essential. Preparation based on such prediction would prevent an “unexpected contingency.” Such prediction is essential for making well-balanced, appropriate decisions in politics, economics and various other situations of everyday life.
Given the formidable amount of ever-changing information that must be considered, accurate prediction is difficult if we rely on the limited knowledge and imagination of an individual or an organization. Though prediction of limited events from a limited scope of data is in practical use, as in weather forecasting, no one has ever conceived of applying such an idea to social movements, except for the concept disclosed in Patent Literature 1 below.
A computer is a powerful tool reinforcing one's ability. By way of example, computers have decided superiority in memorizing information over an individual's ability. Further, highly advanced techniques of natural language processing have been developed. Question-answering systems making full use of such information storage ability and the natural language processing techniques have been realized, and now they can provide, with high accuracy, correct answers to questions formulated in natural languages.
Such prior art techniques, however, cannot answer “what-if” type questions, though they can answer so-called “factoid” type questions such as “what is XX?” The system disclosed in Patent Literature 1 proposes a solution to this problem: it uses computer power to predict events that could happen in the future, considering every risk and chance. Practical application of such a device would help people make better decisions. According to Patent Literature 1, information referred to as a “social scenario,” consisting of chained causalities, is generated for such prediction.
The causality phrase pair as used herein refers to a set of a phrase (cause phrase) describing some event or action as a cause, and a phrase (result phrase) describing its resultant event or action.
Referring to
Social scenario generating device 76 includes: a causality phrase pair collecting device 90 for collecting causality phrase pairs from the documents stored in WEB archive 74; causality phrase pair DB 92 storing the causality phrase pairs collected by causality phrase pair collecting device 90 in such a manner that any phrase pair can be accessed and retrieved at least by using its cause phrase as a key; a social scenario generating unit 94 generating a large number of social scenarios by successively linking, among the large number of causality phrase pairs stored in causality phrase pair DB 92, a certain causality phrase pair with another causality phrase pair having as a cause phrase the result phrase of the certain phrase pair; a social scenario DB 34 storing the social scenarios generated by social scenario generating unit 94; and social scenario output unit 36 responsive to a question from a user for extracting social scenarios 38 appropriate as answers from social scenario DB 34, and ranking and outputting the same. In the chained causality, even if a result phrase of a causality phrase pair in the former half of the chain and a cause phrase in the latter half of the causality have different character sequences, these phrases are chained provided that they have semantic consistency (in Patent Literature 1, this consistency is referred to as “causal consistency” as it means semantic consistency regarding causality).
Causality phrase pair collecting device 90 collects a huge number of causality phrase pairs from WEB archive 74 and stores them in causality phrase pair DB 92. Social scenario generating unit 94 generates a social scenario having causality chain, by repetitively linking, among causality phrase pairs stored in causality phrase pair DB 92, a certain causality phrase pair to another causality phrase pair having a cause phrase that can be linked to (i.e., has causal consistency with) the result phrase of the certain phrase pair. Generally, there would be a plurality of causality phrase pairs that have cause phrases having causal consistency with the result phrase of a single causality phrase pair. Therefore, the number of social scenarios increases exponentially as the number of links of causal phrase pairs becomes larger. These social scenarios are stored in social scenario DB 34. When a user poses some question to social scenario output unit 36, social scenario output unit 36 generates a cause phrase from the contents of the question, retrieves social scenarios 38 having the cause phrase as a start point from social scenario DB 34, and presents them to the user. At this time, the presented social scenarios are scored based on a relation with the question, and presented to the user in descending order of scores.
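The chaining described above, and the growth in the number of scenarios as more links are followed, can be illustrated with a minimal sketch. It is not the implementation of Patent Literature 1: the phrases "A" to "G" are placeholders, causal consistency is simplified to exact string match, and the names `chain_scenarios` and `max_links` are hypothetical.

```python
from collections import defaultdict

def chain_scenarios(pairs, start, max_links):
    """Enumerate chains of causality phrase pairs starting at `start`.

    Assumption: two pairs are linkable only when the result phrase of one
    is identical to the cause phrase of the other (a stand-in for the
    causal-consistency test described in the text).
    """
    by_cause = defaultdict(list)
    for cause, result in pairs:
        by_cause[cause].append(result)

    scenarios = []

    def extend(chain):
        # Candidate next phrases; phrases already in the chain are skipped to avoid loops.
        nexts = [r for r in by_cause[chain[-1]] if r not in chain]
        if len(chain) - 1 == max_links or not nexts:
            if len(chain) > 1:
                scenarios.append(chain)
            return
        for r in nexts:
            extend(chain + [r])

    extend([start])
    return scenarios

# Placeholder causality phrase pairs (cause phrase, result phrase).
pairs = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("B", "E"),
    ("C", "F"), ("C", "G"),
]
one_link = chain_scenarios(pairs, "A", 1)
two_links = chain_scenarios(pairs, "A", 2)
print(len(one_link), len(two_links))  # 2 4
```

With two linkable pairs per result phrase, each additional link doubles the number of scenarios in this toy data, which is the exponential growth noted above.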
PTL 1: JP2015-121897A
NPL 1: Gergely Palla, Imre Derenyi, Illes Farkas, and Tamas Vicsek, 2005, Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814-818.
By the technique described in Patent Literature 1 above, we can obtain a huge number of social scenarios. Very useful social scenarios are often contained therein. When the number of documents stored in WEB archive 74 increases, however, the number of social scenarios we obtain increases. Therefore, there arises a problem that the social scenarios tend to include practically impossible or meaningless scenarios.
By way of example, referring to
The technique described in Patent Literature 1 has the effect that results of inference not expected by humans can be obtained as it provides a huge number of social scenarios. On the other hand, it also provides a huge number of irrelevant scenarios such as described above, posing a serious problem when using the social scenarios.
In the embodiments of the present application, in order to emphasize the characteristic that we predict future from a certain question, we use the term “future scenario” in place of “social scenario.”
Therefore, an object of the present invention is to provide a future scenario generating device and method that can generate a huge number of appropriate future scenarios, as well as to provide a computer program for that purpose.
According to a first aspect, the present invention provides a future scenario generating device, including: phrase pair storage means for storing a large number of causality phrase pairs; causality network building means for building a causality network, by linking, among the phrases stored in the phrase pair storage unit, phrases connectable as causality; community detecting means for detecting a community in the causality network built by the causality network building means; initial phrase selecting means for selecting any phrase as an initial phrase; and future scenario generating means for generating a future scenario, by linking, using the initial phrase selected by the initial phrase selecting means as a start point, causality pairs belonging to the same community as the initial phrase until a predetermined end condition is satisfied.
Preferably, the community detecting means includes means for detecting a community in the causality network applying clique percolation method (CPM) on the causality network.
More preferably, the means for detecting a community uses, in k-clique detection while executing CPM, an integer k selected from the range of k=3 to 6.
More preferably, the initial phrase selecting means includes means for selecting, based on a question sentence input by a user, a phrase having causal consistency with a main part of the question sentence as the initial phrase.
According to a second aspect, the present invention provides a future scenario generating method, including the steps of: a computer storing a large number of causality phrase pairs in phrase pair storage means; a computer generating a causality network by linking, among the phrases stored in the phrase pair storage unit, phrases connectable as causality; a computer detecting a community in the causality network; a computer selecting any phrase stored in the storage means as an initial phrase; and a computer generating a future scenario, by linking, using the initial phrase as a start point, phrases connectable as causality and belonging to the same community as the initial phrase in the causality network until a predetermined end condition is satisfied.
According to a third aspect, the present invention provides a computer program causing a computer to function as all means of any of the future scenario generating devices described above.
In the following description and in the drawings, the same components are denoted by the same reference characters. Therefore, detailed description thereof will not be repeated.
Though the following description is directed to Japanese, future scenario generating devices similar to those of the embodiments of the present invention can be obtained for languages other than Japanese by using methods similar to those of the embodiments disclosed in the present specification and by considering the characteristics of the language of interest.
Let us consider the reason why such a situation as shown in
As a solution to this problem, one conceivable approach is to narrow down social scenarios by focusing on the text forming the causality phrases. Such an approach, however, may depend too much on intuition, or may not allow selection on a clear criterion because the criterion is arbitrary. Therefore, it is preferable to find a method of narrowing down social scenarios that focuses on a factor other than the text.
In this regard, the inventors of the present invention noted a technique for finding an undetected lower level unit forming a network in SNS (Social Networking Service) described in Non-Patent Literature 1, which is directed to a technical field not related to the generation of future scenarios for which the present invention is aiming. This technique is mainly used for finding a community in a network in SNS, or for classifying proteins focusing on structural similarities among various proteins. Referring to
Therefore, in each of the embodiments below, in a causality network 190 shown in
As described above, by using the community detecting technique for SNS, when future scenarios are generated by starting from a certain causality phrase pair and following causality relations, the possibility of causality phrases unrelated to the cause phrase of the original causality phrase pair slipping into future scenarios becomes lower, and the possibility that only meaningful future scenarios result becomes higher. Further, since this method is not directly based on the text forming the causality phrase pairs, it can be effectively applied no matter what language is used in the source documents, without any modification to the method.
[Configuration]
Referring to
Future scenario generating device 272 includes: causality phrase pair collecting device 90 and causality phrase pair DB 92 similar to those shown in
In a chain of phrases in causality network building device 290, as in Patent Literature 1, even when the result phrase of a causality phrase pair in the former half of the chain and the cause phrase of the causality phrase pair of the latter half do not have the same character sequences, these phrases are linked if they have causal consistency.
When such causality link is to be formed, it is easy to determine a link if the result phrase of certain causality phrase pair is identical with the cause phrase of another causality phrase pair. Actually, however, there is such a relation between phrases that can establish a link between two causality phrase pairs, even though the phrases have different character sequences. If such a relation is overlooked, the scope of generated scenarios could be too narrow. Therefore, when considering a result phrase of a certain causality phrase pair and a cause phrase of another causality phrase pair that can be a linking part of two causality phrase pairs, it is important to find a relation that allows identification of these two phrases as substantially the same, even if they do not have identical character sequences.
In the present embodiment, as in Patent Literature 1, even when the phrases are not identical in character sequences, the phrases are linked if they have causal consistency. As mentioned in Patent Literature 1, the causal consistency is a new idea encompassing paraphrasing and entailment, which cannot be realized by the conventional natural language processing techniques only. For any two causality phrase pairs, causality network building device 290 evaluates causal consistency between the result phrase of one pair and the cause phrase of the other, and links causality phrase pairs having causal consistency.
In determining whether or not they have causal consistency, various criteria are used. First, among phrases forming the causality network, phrases having the same noun and having predicate templates representing the structure of the phrase of the same polarity are regarded as synonymous phrases having causal consistency.
Predicate templates are classified as simple predicate templates or complex predicate templates. In Japanese, a combination connecting one particle with one predicate (example: <o, taberu> (eat XX)) will be referred to as a “simple predicate template.” Examples may include “ga shinko suru (something proceeds),” “o fusegu (prevent something),” “ga kengen suru (something emerges).” Here, in a sentence, a subject, an object or the like of the predicate is positioned immediately preceding the particle forming a simple predicate template. In Japanese, a combination connecting a particle “の” (pronounced ‘no’), a noun, and a simple predicate template will be referred to as a “complex predicate template.” In the present embodiment, only the documents in Japanese will be discussed and hence, definitions as above are used. The definitions of simple predicate template and complex predicate template naturally differ language by language.
Polarity is a concept introduced to represent the characteristics of a predicate template. In the present embodiment, three polarities, that is, excitatory, inhibitory and neutral, are used. Excitatory refers to the polarity of a predicate template that describes an event exhibiting or promoting a function, effect or the like of an object indicated by a subject, object or the like of the noun positioned immediately preceding the particle at the head of the predicate template in a sentence. Inhibitory refers to the polarity of a predicate template that describes an event that prevents exhibition of a function, effect or the like of an object. Neutral refers to the polarity of a predicate template to which the definition of neither excitatory nor inhibitory applies.
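The first criterion (same noun, predicate templates of the same polarity) can be sketched as below. This is only an illustration under stated assumptions: the `Phrase` type, the `POLARITY` lexicon and its polarity assignments, and the noun "ondanka" are hypothetical and not taken from the embodiment, which does not specify how polarities are stored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phrase:
    noun: str       # noun immediately preceding the particle
    template: str   # simple predicate template, e.g. "o fusegu"

# Hypothetical polarity lexicon; the assignments are assumed for illustration.
POLARITY = {
    "ga shinko suru": "excitatory",   # something proceeds
    "ga kengen suru": "excitatory",   # something emerges
    "o fusegu": "inhibitory",         # prevent something
}

def polarity(template):
    """Templates missing from the lexicon are treated as neutral."""
    return POLARITY.get(template, "neutral")

def causally_consistent(a, b):
    """Same noun and same template polarity: the first criterion only."""
    return a.noun == b.noun and polarity(a.template) == polarity(b.template)

a = Phrase("ondanka", "ga shinko suru")
b = Phrase("ondanka", "ga kengen suru")
c = Phrase("ondanka", "o fusegu")
print(causally_consistent(a, b), causally_consistent(a, c))  # True False
```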
Further, in the present embodiment, phrase pairs having the same nouns and the same evaluation polarities of the entire phrases, and phrase pairs having the same nouns and having templates that appear in similar contexts in a huge amount of documents, are regarded as synonymous phrases having causal consistency. Whether or not the contexts of appearance are similar is determined by calculating in advance the distributional similarities of templates.
The method of joining phrases is not limited to the above-described use of causal consistency between the two phrases. For example, causal consistency may be established if there is a semantic relation bridging a certain phrase to another phrase, though there is no causal consistency between the two phrases. Assume, for example, that one phrase is “sunlight is blocked” and the other phrase is “photosynthesis is prevented.” Here, it is possible to consider a phrase “sunlight is necessary for photosynthesis” as a link bridging the two. Then, using this phrase as an intermediary, a phrase of “sunlight is blocked” and a cause phrase of “photosynthesis is prevented” may be linked.
Future scenario generating device 272 further includes: a community detecting device 294 for detecting a community in a causality network stored in causality network DB 292, by the method described in Non-Patent Literature 1, and for forming and outputting a new network (referred to as a causality community) by adding an identifier of a community to phrases corresponding to nodes forming the community; a causality community DB 296 storing the causality community; a future scenario generating unit 298 for generating a huge number of future scenarios by tracking only the phrases belonging to the same community as that of the main part of the question received by question input unit 280, from the nodes corresponding to various phrases stored in causality community DB 296; a future scenario DB 300 for storing the future scenarios generated by future scenario generating unit 298; and a future scenario output unit 302 for ranking the future scenarios stored in future scenario DB 300 and outputting the results as answers to the question.
In the present embodiment, a causality community is detected by a method referred to as the clique percolation method (CPM). According to CPM, complete sub-graphs each consisting of k nodes (k is a positive integer), called “k-cliques,” are extracted from a network (graph), and a community is detected by connecting them. A complete sub-graph as used herein refers to a sub-graph consisting of k nodes all connected to each other by edges. For example, if k=2, a k-clique consists of two nodes and one edge connecting these nodes. A 3-clique is a graph consisting of three nodes connected by three edges, having a triangular shape.
Two k-cliques are adjacent to each other when they share k−1 nodes. A community as used herein is a sub-graph comprised of a set of k-cliques reachable to each other via several adjacent k-cliques. This definition means that two communities can share a node or nodes, as described above.
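The definitions above can be made concrete with a small sketch of clique percolation. It is a brute-force illustration suitable only for toy graphs, not the optimized algorithm of Non-Patent Literature 1 or of the distributed CPM programs; the function name `k_clique_communities` and the example edge list are assumptions made for this example.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Clique percolation on an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops play no role in cliques
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by checking all k-node subsets (toy graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: two k-cliques are adjacent when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of mutually reachable k-cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())

# Triangles {1,2,3} and {2,3,4} share two nodes; triangle {4,5,6} shares only node 4.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
communities = k_clique_communities(edges, 3)
print(sorted(sorted(c) for c in communities))  # [[1, 2, 3, 4], [4, 5, 6]]
```

Node 4 appears in both detected communities, which illustrates that communities defined this way can overlap.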
Basic algorithms for community detection are disclosed in Non-Patent Literature 1 and its appended document. The appended document is available at http://nature.com/nature/journal/v435/n7043/suppinfo/nature03607.html
Further, CPM is implemented as various programs. For example, some are distributed at the following URLs:
http://www.cfinder.org
https://github.com/aaronmcdaid/MaximalCliques
In addition to the above, many algorithms for realizing CPM are provided, and various studies for increasing the speed of processing have been published.
It is noted that if the value of k is small, a community of a huge size close to the size of the whole network would be formed, and detection of a community becomes meaningless. On the other hand, if the value of k is large, each community becomes too small and diversity in generating future scenarios would be lost. In the method of community detection described in Non-Patent Literature 1, the preferable value of k is from 3 to 6. Therefore, in order to generate a sufficient number of future scenarios that are semantically appropriate, it is similarly desirable to select the value k from the range of k=3 to 6. It goes without saying that use of CPM is possible with the value outside of this range, and depending on the state of causality network, the value k may be selected outside of this range. In the present embodiment, k=4.
Community detection by CPM is performed on a so-called undirected graph, whereas the causality network in accordance with the present embodiment is a directed graph. In detecting a community in the present embodiment, the causality network is regarded as an undirected graph, to apply CPM.
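One simple way to regard the directed causality network as an undirected graph is to drop the direction of every link, as in the sketch below. The embodiment does not state how the conversion is done, so the function `to_undirected` and its handling of reciprocal links and self-loops are assumptions.

```python
def to_undirected(directed_edges):
    """Drop edge direction; a cause->result link and its reverse collapse into one edge."""
    return {
        frozenset((cause, result))
        for cause, result in directed_edges
        if cause != result  # self-loops are ignored
    }

# Placeholder directed links (cause phrase, result phrase).
directed = [("a", "b"), ("b", "a"), ("b", "c")]
undirected = to_undirected(directed)
print(len(undirected))  # 2
```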
For detecting a community, various methods are available other than CPM. Algorithms known for community detection include the following examples. Some of these are applicable not only to undirected graphs but also to directed graphs. Some allow one node belonging to a plurality of communities, while others do not. Any of the following examples may be used for detecting a community in a network (graph) in accordance with the present embodiment.
Hierarchical clustering
Girvan-Newman algorithm
Modularity maximization
Statistical inference
Referring to
In future scenario candidate generating unit 314, the process of generating each future scenario from the initial phrase is terminated when a prescribed condition is satisfied. For example, when the number of phrases linked from the initial phrase reaches a prescribed number, generation of the future scenario may be finished. Alternatively, the process may be terminated when a phrase to be linked can no longer be found in the community.
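The two end conditions mentioned above (a prescribed number of phrases is reached, or no further linkable phrase exists in the community) can be sketched as follows. The data layout (`links` as a mapping from a phrase to its result phrases, `community_of` as a mapping from a phrase to a set of community identifiers), the function name and the placeholder phrases are all assumptions made for illustration.

```python
def generate_future_scenarios(links, community_of, initial, max_phrases):
    """Follow causality links from `initial`, staying inside one community at a time."""
    scenarios = []

    def extend(chain, cid):
        # Only phrases in community `cid` that are not yet in the chain are candidates.
        candidates = [
            p for p in links.get(chain[-1], [])
            if cid in community_of.get(p, set()) and p not in chain
        ]
        # End conditions: prescribed length reached, or nothing left to link.
        if len(chain) >= max_phrases or not candidates:
            if len(chain) > 1:
                scenarios.append(chain)
            return
        for p in candidates:
            extend(chain + [p], cid)

    # A phrase may belong to several communities; each is explored separately.
    for cid in sorted(community_of.get(initial, set())):
        extend([initial], cid)
    return scenarios

# Placeholder network: "x1" and "x2" belong to a different community than "p0".
links = {"p0": ["p1", "x1"], "p1": ["p2"], "p2": ["p3"], "x1": ["x2"]}
community_of = {"p0": {0}, "p1": {0}, "p2": {0}, "p3": {0}, "x1": {1}, "x2": {1}}

short = generate_future_scenarios(links, community_of, "p0", 3)
full = generate_future_scenarios(links, community_of, "p0", 10)
print(short)  # [['p0', 'p1', 'p2']]
print(full)   # [['p0', 'p1', 'p2', 'p3']]
```

The link from "p0" to "x1" is never followed because "x1" carries a different community identifier.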
[Operation]
Referring to
Causality network building device 290 searches the causality phrase pairs stored in causality phrase pair DB 92 for any two pairs where the result phrase of one pair and the cause phrase of the other pair have causal consistency with each other, links these pairs, and, by repeating this operation for every causality phrase pair, generates a causality network and stores it in causality network DB 292.
Community detecting device 294 detects communities, in the causality network described by the information stored in causality network DB 292, using CPM realizing the method described in Non-Patent Literature 1, generates information describing a new causality network (causality community) by adding, to each causality phrase pair, an identifier indicating the community to which it belongs, and stores the results to causality community DB 296.
Referring to
Future scenario ranking unit 316 stores the generated future scenarios. When generation of all future scenarios ends, future scenario ranking unit 316 ranks these in accordance with prescribed scores, generates a future scenario display image having scenarios of higher ranks arranged near the root and allowing tracking of each causality, and displays it on a display device, not shown. The display device is controlled such that a requested future scenario is displayed in response to a user's instruction.
As described above, according to the present embodiment, a causality network is built from the causality phrase pairs, and from the causality network, communities are detected. Future scenario candidates are generated only from the phrases belonging to the same community. Therefore, there is little possibility that phrases belonging to different communities are erroneously mixed in a generated future scenario, and hence, we can obtain only semantically consistent future scenarios. Detection of a community is determined only by the topology of the network. The text forming the causality phrase pairs is not used. Therefore, this method can provide useful future scenarios regardless of the languages in which the causality phrase pairs are described.
In the first embodiment above, as shown in
Referring to
Future scenario generating device 330 is different from future scenario generating device 272 in that it does not include causality network DB 292, community detecting device 294 or causality community DB 296 of
The second embodiment is characterized in that in place of newly building causality community DB 296 from causality network DB 292, causality network DB 340 is updated and thereby information similar to that of causality community DB 296 is obtained. Except for this point, the configuration and operation are the same as the configuration and operation of the first embodiment.
In the first embodiment, causality community DB 296 is built from causality network DB 292. In the second embodiment, causality network DB 340 is updated by the community identifier, so that causality network DB 340 has the information similar to that of causality community DB 296. The present invention, however, is not limited to such embodiments. In the third embodiment, the causality network DB itself is unchanged, and a list of communities and causality phrase pairs belonging to the communities (referred to as a community list) is saved as a separate file.
Referring to
Different from future scenario generating device 272 shown in
In the third embodiment, community detecting device 370 detects communities of a causality network based on the information stored in causality network DB 292, and stores, for each community, a community list in community list storage unit 372.
Future scenario generating unit 374 reads the community list stored in community list storage unit 372, and for each community, reads causality phrase pairs forming the community from causality network DB 292, generates a future scenario and outputs it to future scenario DB 300.
According to the third embodiment, every time community detecting device 370 generates a community list related to a certain community and outputs it to community list storage unit 372, future scenario generating unit 374 can perform processing related to that community. Specifically, community detection by community detecting device 370 and generation of future scenarios by future scenario generating unit 374 can be executed simultaneously in parallel.
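The parallel operation described above resembles a producer-consumer arrangement, which can be sketched with a queue standing in for community list storage unit 372. The sketch makes several assumptions: the community lists are precomputed placeholders rather than the output of real community detection, scenario generation is reduced to joining the phrases of a list, and the function names are hypothetical.

```python
import queue
import threading

def detect_communities(precomputed_lists, out_q):
    """Stand-in for community detection: emits one community list at a time."""
    for community_id, members in precomputed_lists:
        out_q.put((community_id, members))
    out_q.put(None)  # sentinel: no more communities

def generate_scenarios(in_q, results):
    """Processes each community list as soon as it becomes available."""
    while True:
        item = in_q.get()
        if item is None:
            break
        community_id, members = item
        # Stand-in for scenario generation: join the phrases in list order.
        results.append((community_id, " -> ".join(members)))

community_lists = [(0, ["p0", "p1", "p2"]), (1, ["x1", "x2"])]
q = queue.Queue()
results = []
producer = threading.Thread(target=detect_communities, args=(community_lists, q))
consumer = threading.Thread(target=generate_scenarios, args=(q, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)  # [(0, 'p0 -> p1 -> p2'), (1, 'x1 -> x2')]
```

Because the consumer starts work on the first community list before the producer has finished, detection and generation overlap in time.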
Except for this point, the configuration and operation of future scenario generating device 360 are the same as those of future scenario generating device 272 shown in
The future scenario generating systems in accordance with the embodiments above all build a causality network after receiving a question and generate future scenarios from the built causality network. The present invention, however, is not limited to such embodiments.
For example, for each phrase stored beforehand in causality phrase pair DB 92, an overall causality network may be built so that a causality network having that phrase as a start point can be obtained, and community detection may be done in advance. In this case, when a question is input, at first, determination is made as to which community a phrase semantically the same as the question belongs to, and future scenarios may be generated only from the phrases belonging to this community.
By further advancing this approach, all future scenarios may be generated in advance and stored in future scenario DB 300. Each future scenario is adapted to store an identifier of the community to which the phrase used for generating the future scenario belongs. When a question is input, a future scenario having as a start point a phrase having the same meaning as the question, and having the same community identifier, is searched and output. In this manner, a future scenario appropriate for the question can be selected and displayed.
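Retrieval of pre-generated future scenarios by start phrase and community identifier can be sketched as a simple keyed index. The storage layout, the function names `index_scenarios` and `lookup`, and the placeholder phrases are assumptions; matching a question to a phrase of the same meaning is reduced here to supplying the start phrase directly.

```python
from collections import defaultdict

def index_scenarios(scenarios):
    """Index stored scenarios by (start phrase, community identifier)."""
    index = defaultdict(list)
    for community_id, chain in scenarios:
        index[(chain[0], community_id)].append(chain)
    return index

def lookup(index, community_of, start_phrase):
    """Return stored scenarios that start at `start_phrase` within its own communities."""
    hits = []
    for community_id in sorted(community_of.get(start_phrase, set())):
        hits.extend(index.get((start_phrase, community_id), []))
    return hits

# Placeholder scenarios generated in advance, each tagged with a community identifier.
stored = [
    (0, ["p0", "p1", "p2"]),
    (1, ["p0", "x1"]),
    (0, ["p1", "p2"]),
]
community_of = {"p0": {0}}
index = index_scenarios(stored)
print(lookup(index, community_of, "p0"))  # [['p0', 'p1', 'p2']]
```

The scenario tagged with community 1 is not returned because the start phrase is assigned only to community 0 in this toy data.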
[Computer Implementation]
The system in accordance with the embodiments above can be implemented by computer hardware and computer programs executed on the computer hardware.
Referring to
Referring to
The computer program causing computer system 930 to function as each of the functioning sections of the system in accordance with each of the embodiments above is stored in a DVD 962 or a removable memory 964 loaded to DVD drive 950 or to memory port 952, and transferred to hard disk 954. Alternatively, the program may be transmitted to computer 940 through a network, not shown, and stored in hard disk 954. At the time of execution, the program is loaded to RAM 960. The program may be directly loaded from DVD 962, removable memory 964 or through a network to RAM 960.
The program includes a plurality of instructions to cause computer 940 to operate as functioning sections of the system in accordance with each of the embodiments above. Some of the basic functions necessary to realize the operation are provided by the operating system (OS) running on computer 940, by a third party program, or by a module of various programming tool kits installed in computer 940. Therefore, the program may not necessarily include all of the functions necessary to realize the system and method of the present embodiment. The program has only to include instructions to realize the functions of the above-described system by calling appropriate functions or appropriate program tools in a program tool kit in a manner controlled to attain desired results. The operation of computer system 930 is well known and, therefore, description thereof will not be given here.
The embodiments as have been described here are mere examples and should not be interpreted as restrictive. The scope of the present invention is determined by each of the claims with appropriate consideration of the written description of the embodiments and embraces modifications within the meaning of, and equivalent to, the languages in the claims.
The present invention is applicable to provide decision making services considering risks and chances such as question-answering services; risk assessment services; auxiliary services for marketing research; and prediction of market trend in every industry, as well as to manufacturing of devices for that purpose.
34 social scenario DB
36 social scenario output unit
38 social scenario
60 social scenario generating system
70 the Internet
74 WEB archive
76 social scenario generating device
90 causality phrase pair collecting device
92 causality phrase pair DB
94 social scenario generating unit
190 causality network
250, 252, 254 community
270, 320, 350 future scenario generating system
272, 330, 360 future scenario generating device
290 causality network building device
292, 340 causality network DB
294, 342, 370 community detecting device
296 causality community DB
298, 374 future scenario generating unit
300 future scenario DB
302 future scenario output unit
310 causality sub-network reading unit
312 communitywise causality sub-network DB
314 future scenario candidate generating unit
316 future scenario ranking unit
372 community list storage unit
Number | Date | Country | Kind |
---|---|---|---|
2015-159376 | Aug 2015 | JP | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/072362 | 7/29/2016 | WO | 00 |