This application is a U.S. 371 Application of International Patent Application No. PCT/JP2019/018268, filed on 7 May 2019, which application claims priority to and the benefit of JP Application No. 2018-090637, filed on 9 May 2018, the disclosures of which are hereby incorporated herein by reference in their entireties.
The present invention relates to a dialogue data generation device, a dialogue data generation method, and a program and, in particular, to a dialogue data generation device, a dialogue data generation method, and a program for generating a question sentence in a dialogue system.
A dialogue system for holding a dialogue with a user is roughly divided into two types: a task-oriented dialogue system and a non-task-oriented dialogue system.
The task-oriented dialogue system is one in which a specific task is achieved through a dialogue with the system. For example, the task-oriented dialogue system is used in flight reservation systems or weather information guidance systems (NPL 1). Such systems generally have a structure called a frame (constituted by slots, each including a slot name and a slot value), and a dialogue proceeds on the basis of the frame.
Because the task-oriented dialogue system has such a structure, it can elicit information from the other party by generating a question sentence that asks about an unfilled slot.
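By way of illustration, a frame of this kind can be pictured as a set of slots, some filled and some not. The following is a minimal Python sketch under that assumption; the slot names and question texts are hypothetical and are not taken from NPL 1.

```python
# A minimal, hypothetical frame for a flight reservation task.
# Slot names, values, and question texts are illustrative only.
frame = {
    "departure_city": None,   # unfilled slot -> ask about it
    "arrival_city": "Tokyo",  # filled slot
    "departure_date": None,   # unfilled slot -> ask about it
}

questions = {
    "departure_city": "Where will you depart from?",
    "arrival_city": "Where would you like to fly to?",
    "departure_date": "When would you like to leave?",
}

def next_question(frame, questions):
    # The system asks about the first unfilled slot it finds.
    for slot, value in frame.items():
        if value is None:
            return questions[slot]
    return None  # all slots filled; the task can be completed

print(next_question(frame, questions))  # -> "Where will you depart from?"
```

In this way, the frame itself tells the system what to ask next, which is exactly the structural cue that the non-task-oriented case described below lacks.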
On the other hand, the non-task-oriented dialogue system handles a dialogue that is not aimed at a specific task, and the content of the dialogue is a so-called chat.
[NPL 1] Ryuichiro Higashinaka, Katsuhito Sudoh, Mikio Nakano, “Incorporating Discourse Features into Confidence Scoring of Intention Recognition Results in Spoken Dialogue Systems”, Speech Communication, Volume 48, Issues 3-4, 2006, pp. 417-436.
However, unlike the task-oriented dialogue system, the non-task-oriented dialogue system does not have a structure such as a slot because a chat covers various topics, and it is not obvious what type of interrogative should be used to ask a question.
For this reason, there has been a problem that the non-task-oriented dialogue system has difficulty in generating a question sentence that delves deeply into the utterance of the other party.
In order to solve such a problem, the conventional non-task-oriented dialogue system realizes a dialogue with a user through a rule-based method or a machine learning method.
However, there has been a problem that, in the rule-based method, rules are written by hand, and it is therefore necessary to create a large number of rules manually in order to delve deeply into a wide range of dialogues.
Further, in the machine learning method, data such as an utterance that is a question sentence corresponding to the utterance of the other party does not readily exist, and it is difficult to prepare a sufficient amount of data for learning. That is, there has been a problem that it is difficult to prepare a corpus (learning data) for machine learning aimed at generating a question sentence.
In summary, there has been a problem that the conventional non-task-oriented dialogue system cannot realize, at a low cost, a dialogue system that delves deeply into a dialogue, and therefore cannot make the interaction between the dialogue system and a user smooth.
The present invention has been made in view of the above circumstances and has an object of providing a dialogue data generation device, a dialogue data generation method, and a program that make it possible to generate, at a low cost, dialogue data for generating a question sentence that delves deeply into a conversation.
A dialogue data generation device according to the present invention includes: an input unit that receives an input of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence; and a dialogue data generation unit that generates, for each of the plurality of pieces of data received by the input unit, a set of the first utterance sentence of the data and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data is a question sentence using an interrogative.
Further, a dialogue data generation method according to the present invention includes: receiving an input of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence by an input unit; and generating, for each of the plurality of pieces of data received by the input unit, a set of the first utterance sentence of the data and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data is a question sentence using an interrogative by a dialogue data generation unit.
According to the dialogue data generation device and the dialogue data generation method of the present invention, the input unit receives an input of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence.
Then, the dialogue data generation unit generates, for each of the plurality of pieces of data received by the input unit, a set of the first utterance sentence of the data and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data is a question sentence using an interrogative.
As described above, for each of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence, a set of the first utterance sentence and the second utterance sentence of the data is generated as dialogue data when the second utterance sentence of the data is a question sentence using an interrogative. As a result, it is possible to generate, at a low cost, dialogue data for generating a question sentence that delves deeply into a conversation.
Further, the dialogue data generation unit of the dialogue data generation device according to the present invention can generate, for each of the plurality of pieces of data received by the input unit, a set of the first utterance sentence of the data and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data contains an interrogative related to a tense, a place, a subject, a target, a reason, a method, a degree, or a state and is a question sentence.
Further, the dialogue data generation device according to the present invention can further include: a question generation model learning unit that learns, for each of a plurality of pieces of dialogue data obtained by the dialogue data generation unit, a neural network for generating a sentence from an input sentence so that the second utterance sentence contained in the dialogue data is output when the first utterance sentence contained in the dialogue data is input.
The dialogue data generation device according to the present invention can further include: a question sentence generation unit that inputs a received utterance sentence to the neural network and handles an output of the neural network as a question sentence corresponding to the utterance sentence.
A program according to the present invention is a program for causing a computer to function as each of the units of the above dialogue data generation device.
According to the dialogue data generation device, the dialogue data generation method, and the program of the present invention, it is possible to generate, at a low cost, dialogue data for generating a question sentence that delves deeply into a conversation.
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
The configuration of a dialogue data generation device 10 according to the embodiment of the present invention will be described with reference to
The dialogue data generation device 10 is constituted by a computer including a CPU, a RAM, and a ROM storing a program for performing a model learning processing routine and a question sentence generation processing routine that will be described later, and is functionally configured as follows.
As shown in
The input unit 100 receives the input of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence.
The plurality of pieces of data are obtained by extracting and collecting in advance, from a chat system, a social networking service (SNS) in which utterance sentences are posted, or the like, sets of a first utterance sentence given by the first user, a second utterance sentence made by the second user in response to the first utterance sentence, and a third utterance sentence made by the first user in response to the second utterance sentence.
For example, as shown in
Note that the plurality of pieces of data may be input to the input unit 100 from a chat system, an SNS, or the like that is open to the public on the Internet, by a device or the like that automatically collects the data.
Then, when receiving the input of such a plurality of pieces of data each including a set of a first utterance sentence, a second utterance sentence, and a third utterance sentence, the input unit 100 transfers the plurality of pieces of data to the dialogue data generation unit 110.
For each of a plurality of pieces of data received by the input unit 100, the dialogue data generation unit 110 generates a set of the first utterance sentence of the data and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data contains an interrogative related to a tense, a place, a subject, a target, a reason, a method, a degree, or a state and is a question sentence.
Specifically, for each of a plurality of pieces of data, the dialogue data generation unit 110 first determines whether the second utterance sentence of the data is a question sentence and contains an interrogative used to ask a question about so-called 5W1H (When, Where, Who, What, Why, How).
That is, the dialogue data generation unit 110 determines whether the second utterance sentence is a 5W1H question, rather than merely determining whether it is a simple question sentence (such as an utterance ending with “?”).
This is because, in a chat dialogue, it is necessary to keep the talk going as long as possible; if the second utterance sentence were a question answerable by Yes/No, the dialogue would end as soon as the other party answered.
Therefore, in order to continue a dialogue with a question that delves deeply into the content of an utterance, dialogue data is created from questions based on the 5W1H. Note that, instead of the 5W1H, the second utterance sentence may only need to contain any interrogative, such as “Whom” or “Whose”, that asks a question not answerable by Yes/No.
For example, for each of a plurality of pieces of data shown in
Further, for example, the second utterance sentence “Friday is a day for a drinking party, right?” of the bottom data in
Further, since the third utterance sentence right after the second utterance sentence is a response made by the first user to the second utterance sentence, the second utterance sentence that is a question sentence using an interrogative is presumed to have high quality as a question sentence that delves deeply into the content of an utterance.
Further, in order to determine whether the second utterance sentence is a question sentence asking a question based on the 5W1H, the dialogue data generation unit 110 may use a determination device (classifier) that has been trained in advance.
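As a rough illustration of the filtering described above, the following Python sketch keeps a (first utterance sentence, second utterance sentence) pair only when the second utterance sentence ends with “?” and contains a non-Yes/No interrogative. The interrogative list, the “?”-based question test, and the example sentences are simplifying assumptions, not the exact criterion or determination device used by the dialogue data generation unit 110.

```python
# Hypothetical 5W1H-style filter: keep (first, second) pairs only when the
# second utterance is a question that contains a non-Yes/No interrogative.
INTERROGATIVES = {"when", "where", "who", "whom", "whose", "what", "why", "how"}

def is_5w1h_question(utterance: str) -> bool:
    # Simplified assumption: treat an utterance ending with "?" as a question.
    if not utterance.strip().endswith("?"):
        return False
    words = {w.strip(",.?!").lower() for w in utterance.split()}
    return bool(words & INTERROGATIVES)

def build_dialogue_data(triples):
    # triples: list of (first, second, third) utterance sentences
    return [(first, second) for first, second, third in triples
            if is_5w1h_question(second)]

triples = [
    ("I went out for dinner yesterday.", "What did you eat?", "I had ramen."),
    ("I went out for dinner yesterday.",
     "Friday is a day for a drinking party, right?", "Yes."),
]
print(build_dialogue_data(triples))
# -> [('I went out for dinner yesterday.', 'What did you eat?')]
```

Note that the second example ends with “?” but contains no interrogative, so it is excluded even though it is a question, matching the treatment of Yes/No questions described above.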
Then, the dialogue data generation unit 110 transfers a plurality of pieces of generated dialogue data to the question generation model learning unit 120.
The question generation model learning unit 120 learns, for each of a plurality of pieces of dialogue data, a neural network for generating a sentence from an input sentence so that a second utterance sentence contained in the dialogue data is output when a first utterance sentence contained in the dialogue data is input.
Specifically, the question generation model learning unit 120 learns, for each of a plurality of pieces of dialogue data generated by the dialogue data generation unit 110, a question generation model that is a neural network for generating a sentence from an input sentence so that a question sentence that is a response to an utterance sentence becomes the second utterance sentence of the dialogue data when the first utterance sentence of the dialogue data is input as an utterance sentence (
For example, the question generation model learning unit 120 learns the question generation model within an encoder-decoder framework. That is, the question generation model learning unit 120 learns the question generation model using an encoder and a decoder so that the first utterance sentence of the dialogue data is input as an utterance sentence (input) and the second utterance sentence of the dialogue data becomes a question sentence (output) (for example, Reference Literature 1).
[Reference Literature 1] Oriol Vinyals, Quoc Le, “A Neural Conversational Model”, [online], 2015, Internet <URL:https://arxiv.org/abs/1506.05869>.
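The following is a minimal sketch of encoder-decoder learning of this kind, written with PyTorch and a toy GRU sequence-to-sequence model. The vocabulary, layer sizes, tokenization, and training loop are assumptions for illustration and do not reproduce the exact model of Reference Literature 1.

```python
import torch
import torch.nn as nn

# Toy vocabulary; in practice this would be built from the dialogue data.
vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2, "i": 3, "went": 4, "out": 5,
         "for": 6, "dinner": 7, "what": 8, "did": 9, "you": 10, "eat": 11, "?": 12}

def encode(tokens):
    return torch.tensor([[vocab[t] for t in tokens]])

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.emb(src))               # encode the utterance sentence
        dec_out, _ = self.decoder(self.emb(tgt_in), state)   # decode the question sentence
        return self.out(dec_out)

# One dialogue-data pair: first utterance sentence -> second utterance sentence.
src = encode(["i", "went", "out", "for", "dinner"])
tgt = encode(["<bos>", "what", "did", "you", "eat", "?", "<eos>"])

model = Seq2Seq(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):  # in practice, iterate over all generated dialogue data
    logits = model(src, tgt[:, :-1])
    loss = loss_fn(logits.reshape(-1, len(vocab)), tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```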
Then, the question generation model learning unit 120 stores a learned question generation model in the question generation model storage unit 130.
The question generation model storage unit 130 stores a learned question generation model.
The input unit 140 receives the input of an utterance sentence from a dialogue system, a user, or the like and transfers the received utterance sentence to the question sentence generation unit 150.
The question sentence generation unit 150 inputs a received utterance sentence to a neural network and handles the output of the neural network as a question sentence corresponding to the utterance sentence.
Specifically, the question sentence generation unit 150 first receives a question generation model from the question generation model storage unit 130.
Next, upon receiving an utterance sentence from the input unit 140, the question sentence generation unit 150 inputs the utterance sentence to the received question generation model to generate a question sentence using an interrogative.
Here, by using the encoder and the decoder, the question sentence generation unit 150 can generate a question sentence using an interrogative even if the utterance sentence does not correspond to the first utterance sentence of any generated dialogue data.
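Continuing the training sketch above (and reusing its model, encode, and vocab), greedy decoding of a question sentence for a received utterance sentence could be sketched as follows; the <bos>/<eos> handling and the maximum output length are assumed conventions.

```python
import torch  # continues the training sketch above

def generate_question(model, src, vocab, max_len=20):
    # Greedy decoding: feed the model's own predictions back in, token by token.
    inv_vocab = {i: t for t, i in vocab.items()}
    tokens = [vocab["<bos>"]]
    for _ in range(max_len):
        tgt_in = torch.tensor([tokens])
        logits = model(src, tgt_in)
        next_id = int(logits[0, -1].argmax())
        if next_id == vocab["<eos>"]:
            break
        tokens.append(next_id)
    return " ".join(inv_vocab[i] for i in tokens[1:])

# The received utterance sentence need not match any first utterance sentence
# seen during learning; the encoder-decoder still produces an output sequence.
print(generate_question(model, encode(["i", "went", "out", "for", "dinner"]), vocab))
```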
Then, the question sentence generation unit 150 transfers the generated question sentence to the output unit 160.
The output unit 160 outputs the generated question sentence. For example, the output unit 160 outputs, as the utterance of a dialogue system, the question sentence to a user holding a dialogue with the dialogue system through a method such as displaying the question sentence.
Upon receiving data by the input unit 100, the dialogue data generation device 10 performs the model learning processing routine shown in
First, in step S100, the input unit 100 receives the input of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence.
In step S110, the dialogue data generation unit 110 selects the first data from among the plurality of pieces of data received in step S100.
In step S120, the dialogue data generation unit 110 determines whether the second utterance sentence of the data is a question sentence containing an interrogative related to a tense, a place, a subject, a target, a reason, a method, a degree, or a state.
When the second utterance sentence of the selected data is not a question sentence containing an interrogative (NO in step S120), the processing proceeds to step S140.
On the other hand, when the second utterance sentence of the selected data is a question sentence containing an interrogative (YES in step S120), the dialogue data generation unit 110 generates a set of the first utterance sentence of the data and the second utterance sentence of the data as dialogue data in step S130.
In step S140, the dialogue data generation unit 110 determines whether all the plurality of pieces of received data have been subjected to the above processing.
When all the plurality of pieces of data have not been subjected to the processing (NO in step S140), the dialogue data generation unit 110 selects the next data in step S150.
On the other hand, when all the plurality of pieces of data have been subjected to the processing (YES in step S140), in step S160, the question generation model learning unit 120 learns, for each of a plurality of pieces of generated dialogue data, a neural network for generating a sentence from an input sentence so that the second utterance sentence contained in the dialogue data is output when the first utterance sentence contained in the dialogue data is input.
In step S170, the question generation model learning unit 120 stores the learned neural network in the question generation model storage unit 130.
Upon receiving an utterance sentence by the input unit 140, the dialogue data generation device 10 performs the question sentence generation processing routine shown in
First, in step S200, the input unit 140 receives the input of an utterance sentence from a dialogue system, a user, or the like.
In step S210, the question sentence generation unit 150 acquires a neural network from the question generation model storage unit 130.
In step S220, the question sentence generation unit 150 inputs the received utterance sentence to the neural network and handles the output of the neural network as a question sentence that is a response to the utterance sentence.
In step S230, the output unit 160 outputs the generated question sentence.
As described above, for each of a plurality of pieces of data each including a set of a first utterance sentence that is a sentence uttered by a first user, a second utterance sentence that is a sentence uttered by a second user and is a response to the first utterance sentence, and a third utterance sentence that is a sentence uttered by the first user and is a response to the second utterance sentence, the dialogue data generation device according to the embodiment of the present invention handles the set of the first utterance sentence and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data is a question sentence and neither a positive sentence nor a negative sentence is made as a response to the second utterance sentence. As a result, it is possible to generate, at a low cost, dialogue data for generating a question sentence that delves deeply into a conversation.
Note that the present invention is not limited to the above embodiment but various modifications or applications are made possible without departing from the spirit of the present invention.
In the above embodiment, the question sentence generation unit 150 generates a question sentence using a neural network learned on the basis of a plurality of pieces of dialogue data. However, the question sentence generation unit 150 may generate a question sentence using the plurality of pieces of generated dialogue data directly, instead of the neural network.
For example, when a received utterance sentence is the same as, or most similar to, the first utterance sentence of one of the plurality of pieces of generated dialogue data, the question sentence generation unit 150 may output the second utterance sentence of that dialogue data as the question sentence.
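A minimal sketch of such a retrieval-based variant is shown below; the character-level similarity measure from Python's standard library and the example pairs are assumptions chosen for illustration, and any sentence similarity measure could be substituted.

```python
from difflib import SequenceMatcher

# dialogue_data: (first utterance, second utterance) pairs generated as above.
dialogue_data = [
    ("I went out for dinner yesterday.", "What did you eat?"),
    ("I watched a movie last weekend.", "Why did you choose that movie?"),
]

def retrieve_question(utterance: str, dialogue_data):
    # Return the second utterance of the pair whose first utterance is most
    # similar to the received utterance (string similarity as a stand-in).
    best_pair = max(dialogue_data,
                    key=lambda pair: SequenceMatcher(None, utterance, pair[0]).ratio())
    return best_pair[1]

print(retrieve_question("I went out for dinner today.", dialogue_data))
# -> "What did you eat?"
```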
Further, a plurality of pieces of dialogue data may be used as scenarios for chat dialogues.
Further, the above embodiment describes, but is not limited to, a case in which a third utterance sentence is used only from the viewpoint of whether it is a response made by the person who made the first utterance sentence in response to the second utterance sentence, and in which data including a set of a first utterance sentence, a second utterance sentence, and a third utterance sentence collected from a chat system, a social networking service (SNS) in which utterance sentences are posted, or the like is input.
Alternatively, a determination may be made as to whether the third utterance sentence is appropriate as a response to the second utterance sentence, and only data in which the third utterance sentence is determined to be appropriate may be input as data including a set of a first utterance sentence, a second utterance sentence, and a third utterance sentence.
Thus, it is possible to enhance the quality, as a question sentence that delves deeply into an utterance, of the second utterance sentence included in the generated dialogue data.
Note that, for the determination as to whether the third utterance sentence is appropriate as a response to the second utterance sentence, it is possible to use a method in which the input of a result confirmed by visual inspection is received, a method in which it is automatically determined whether the relationship between the second utterance sentence and the third utterance sentence is a response relationship, or the like.
Further, although the present specification describes an embodiment in which the program is installed in advance, the program may also be provided in a state of being stored in a computer-readable recording medium.
10 dialogue data generation device
100 input unit
110 dialogue data generation unit
120 question generation model learning unit
130 question generation model storage unit
140 input unit
150 question sentence generation unit
160 output unit
Number | Date | Country | Kind |
---|---|---|---|
2018-090637 | May 9, 2018 | JP | national
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/018268 | 5/7/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/216316 | 11/14/2019 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
8275803 | Brown | Sep 2012 | B2 |
8332394 | Fan | Dec 2012 | B2 |
8515764 | Nakano | Aug 2013 | B2 |
8768925 | Brown | Jul 2014 | B2 |
9703861 | Brown | Jul 2017 | B2 |
10319379 | Ikeno | Jun 2019 | B2 |
10514881 | Nagasaka | Dec 2019 | B2 |
10614106 | Kelsey | Apr 2020 | B2 |
10762305 | Liu | Sep 2020 | B2 |
10936664 | Abe | Mar 2021 | B2 |
10991366 | Kang | Apr 2021 | B2 |
11087757 | Ikeno | Aug 2021 | B2 |
11256658 | Kruengkrai | Feb 2022 | B2 |
11734520 | Sugiyama | Aug 2023 | B2 |
Entry |
---|
Higashinaka et al. (2006) “Incorporating Discourse Features into Confidence Scoring of Intention Recognition Results in Spoken Dialogue Systems,” Speech Communication, vol. 48, No. 3-4, pp. 417-436. |
Katayama et al. (2018) “Question Generation to Deepen Your Talk,” Journal of the Japan Association of Artificial Intelligence, pp. 1-3. |
Number | Date | Country | |
---|---|---|---|
20210342553 A1 | Nov 2021 | US |