The present invention relates to a learning device, an estimation device, a learning method, an estimation method, a learning program, and an estimation program.
There is a conventional technology of a conversation system that generates a speech responding to a speech of a user and achieves smooth interaction between the user and the system. In such a conversation system, quick response is an important element, and for example, there is a technology of randomly generating a quick response (for example, see Patent Literature 1).
However, the conventional technology cannot generate a natural quick response as a listener. For example, it is limited in its ability to speak at an appropriate timing, and the content of the speech it produces is far from a natural quick response.
The present invention has been made in view of the above, and an object thereof is to provide a learning device, an estimation device, a learning method, an estimation method, a learning program, and an estimation program capable of generating a more natural quick response as a listener.
In order to solve the above-described problems and achieve the object, a learning device of the present invention includes: an acquisition unit that acquires speech data of a speaker, information on the speaker, conversation data of a listener, information on the listener, and a classification label of a quick response included in the conversation data of the listener; and a creation unit that creates a learned model of estimating a type of the quick response of the listener to a conversation of the speaker using the information acquired by the acquisition unit with the classification label of the quick response as correct answer data.
In addition, an estimation device includes: an acquisition unit that acquires speech data of a speaker, information on the speaker, conversation data of a listener, and information on the listener; and an estimation unit that inputs the information acquired by the acquisition unit as input data to a learned model of estimating a type of a quick response of the listener to a conversation of the speaker and estimates the type of the quick response of the listener to the conversation of the speaker.
According to the present invention, it is possible to generate a more natural quick response as a listener.
Hereinafter, embodiments of a learning device, an estimation device, a learning method, an estimation method, a learning program, and an estimation program according to the present application will be described in detail with reference to the drawings. Moreover, the present invention is not limited to the embodiment described below.
[Configuration of Learning Device]
The communication processing unit 11 is implemented by a network interface card (NIC) or the like, and controls communication via an electric communication line such as a local area network (LAN) or the Internet.
The input unit 12 is implemented by using an input device such as a keyboard or a mouse and inputs various types of instruction information such as processing start to the control unit 14 in response to an input operation by an operator. The output unit 13 is implemented by a display device such as a liquid crystal display.
The storage unit 15 stores data and programs necessary for various types of processing by the control unit 14, and includes a learned model storage unit 15a. For example, the storage unit 15 is a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk or an optical disk.
The learned model storage unit 15a stores a learned model learned by the creation unit 14b to be described later. For example, the learned model storage unit 15a stores, as a learned model, a classifier for estimating a type of a quick response of a listener to a conversation of a speaker.
The control unit 14 includes an internal memory for storing a program defining various processing procedures and the like and required data, and executes various types of processing using the program and the data. For example, the control unit 14 includes a learning data acquisition unit 14a and a creation unit 14b. Here, the control unit 14 is an electronic circuit such as a central processing unit (CPU) or a micro processing unit (MPU), or an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
The learning data acquisition unit 14a acquires the speech data of the speaker and the information on the speaker, the conversation data of the listener and the information on the listener, and the classification label of the quick response included in the conversation data of the listener. For example, the learning data acquisition unit 14a acquires any one or more of the expression, the motion, and the voice of the speaker as the information on the speaker, and acquires any one or more of the expression, the motion, and the voice of the listener as the information on the listener. The learning data acquisition unit 14a may acquire, for example, image data of the face of the speaker or of the entire speaker, or may acquire information such as the expression “smile” or the motion “absence” as the information on the expressions and the motions of the speaker and the listener.
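The items acquired by the learning data acquisition unit 14a can be pictured as one record per sample. The following is a minimal sketch only: the class names `ParticipantInfo` and `LearningSample`, the field names, and the example values (including the label "admiration") are assumptions made for illustration and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParticipantInfo:
    """Information on one participant; any one or more fields may be set."""
    expression: Optional[str] = None  # e.g. "smile"
    motion: Optional[str] = None      # hypothetical motion descriptor
    voice: Optional[str] = None       # hypothetical coarse voice descriptor

    def has_any(self) -> bool:
        # True when at least one of expression, motion, and voice is present.
        return any(v is not None for v in (self.expression, self.motion, self.voice))


@dataclass(frozen=True)
class LearningSample:
    """One learning sample: the inputs plus the label used as correct answer data."""
    speaker_speech: str
    speaker_info: ParticipantInfo
    listener_conversation: str
    listener_info: ParticipantInfo
    quick_response_label: str

    def validate(self) -> None:
        if not self.speaker_info.has_any() or not self.listener_info.has_any():
            raise ValueError("each participant needs an expression, a motion, or a voice")
        if not self.quick_response_label:
            raise ValueError("a classification label is required for learning")


sample = LearningSample(
    speaker_speech="I finally passed the exam.",
    speaker_info=ParticipantInfo(expression="smile"),
    listener_conversation="great",
    listener_info=ParticipantInfo(expression="smile", voice="high"),
    quick_response_label="admiration",  # hypothetical label name
)
sample.validate()
```

Keeping the speaker and the listener in separate `ParticipantInfo` fields makes it harder to pass the speaker's information where the listener's is expected.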
The creation unit 14b uses the information acquired by the learning data acquisition unit 14a to create a learned model of estimating a type of a quick response of the listener to the conversation of the speaker using the classification label of the quick response as correct answer data. That is, the creation unit 14b creates a learned model that estimates, from the speech of both the speaker and the listener, the type of the quick response included in the conversation data of the listener. The creation unit 14b may use any learning method suited to the model. Here, the quick response included in the conversation data of the listener is, for example, a speech such as “yes, yes, yes”, “yeah, yeah”, “good”, “I see”, “true”, “yeah”, “yes”, “oh”, “ooh”, “hmm”, “great”, or “what” included in the conversation data. Thereafter, the creation unit 14b stores the created learned model in the learned model storage unit 15a.
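Because the embodiment leaves the learning method open, the creation of a learned model can be illustrated with any supervised classifier. The sketch below assumes a naive Bayes classifier with Laplace smoothing, chosen here only for brevity; the function names, the feature scheme, the training data, and the label names "acknowledgement" and "surprise" are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def featurize(sample):
    """Flatten one sample into string features (hypothetical feature scheme)."""
    feats = [f"spk_word={w}" for w in sample["speaker_speech"].lower().split()]
    feats += [f"lis_word={w}" for w in sample["listener_conversation"].lower().split()]
    feats += [f"spk_{k}={v}" for k, v in sorted(sample["speaker_info"].items())]
    feats += [f"lis_{k}={v}" for k, v in sorted(sample["listener_info"].items())]
    return feats


def create_learned_model(samples, labels):
    """Count features per classification label (the correct answer data)."""
    label_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for sample, label in zip(samples, labels):
        feature_counts[label].update(featurize(sample))
    vocabulary = {f for counts in feature_counts.values() for f in counts}
    return {"label_counts": label_counts,
            "feature_counts": feature_counts,
            "vocabulary": vocabulary}


def estimate_type(model, sample):
    """Return the label with the highest Laplace-smoothed log-probability."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    # Features never seen during learning are ignored.
    known = [f for f in featurize(sample) if f in model["vocabulary"]]
    scores = {}
    for label, n in model["label_counts"].items():
        counts = model["feature_counts"][label]
        denom = sum(counts.values()) + vocab_size
        scores[label] = math.log(n / total) + sum(
            math.log((counts[f] + 1) / denom) for f in known)
    return max(scores, key=scores.get)


def make(speech, spk_expr, reply, lis_info):
    return {"speaker_speech": speech, "speaker_info": {"expression": spk_expr},
            "listener_conversation": reply, "listener_info": lis_info}


train = [
    make("the meeting moved to friday", "neutral", "i see", {"motion": "nod"}),
    make("the report is due tomorrow", "neutral", "yes", {"motion": "nod"}),
    make("i won the lottery", "smile", "what", {"expression": "surprised"}),
    make("she quit her job today", "serious", "oh", {"expression": "surprised"}),
]
labels = ["acknowledgement", "acknowledgement", "surprise", "surprise"]
model = create_learned_model(train, labels)

query = make("i won a prize", "smile", "what", {"expression": "surprised"})
predicted = estimate_type(model, query)
```

With this toy data, `predicted` is "surprise", since the query shares its listener utterance and expressions only with the samples carrying that label. A real system would need far more data and richer features, for example acoustic or image features in place of the string descriptors used here.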
Here, processing of creating a learned model will be described with reference to
[Configuration of Estimation Device]
The communication processing unit 21 is implemented by an NIC or the like, and controls communication via a telecommunication line such as a LAN or the Internet. The input unit 22 is implemented by using an input device such as a keyboard or a mouse and inputs various types of instruction information such as processing start to the control unit 24 in response to an input operation by an operator. The output unit 23 is implemented by a display device such as a liquid crystal display.
The storage unit 25 stores data and programs necessary for various types of processing by the control unit 24, and includes a learned model storage unit 25a. For example, the storage unit 25 is a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk or an optical disk.
The learned model storage unit 25a stores a learned model learned by the creation unit 14b. For example, the learned model storage unit 25a stores, as a learned model, a classifier for estimating a type of a quick response of a listener to a conversation of a speaker.
The control unit 24 includes an internal memory for storing a program defining various processing procedures and the like and required data, and executes various types of processing using the program and the data. For example, the control unit 24 includes an input data acquisition unit 24a and an estimation unit 24b. Here, the control unit 24 is an electronic circuit such as a central processing unit (CPU) or a micro processing unit (MPU), or an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
The input data acquisition unit 24a acquires the speech data of the speaker and the information on the speaker, and the conversation data of the listener and the information on the listener. For example, the input data acquisition unit 24a acquires any one or more of the expression, the motion, and the voice of the speaker as the information on the speaker, and acquires any one or more of the expression, the motion, and the voice of the listener as the information on the listener.
The estimation unit 24b inputs the information acquired by the input data acquisition unit 24a as input data to a learned model of estimating a type of a quick response of the listener to the conversation of the speaker, and estimates the type of the quick response of the listener with respect to the conversation of the speaker. Then, the estimation unit 24b outputs the classified type of quick response.
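The flow from input data to an output type can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the stored model is assumed to be a small linear scorer serialized as JSON, and the weights, feature names, type names, and function names are all hypothetical.

```python
import json
import math

# Hypothetical learned model as it might be read from a model store:
# one bias and one set of feature weights per quick response type.
STORED_MODEL_JSON = json.dumps({
    "types": {
        "acknowledgement": {
            "bias": 0.0,
            "weights": {"lis_word=yes": 1.5, "lis_motion=nod": 1.0},
        },
        "surprise": {
            "bias": 0.0,
            "weights": {"lis_word=what": 1.5,
                        "lis_expression=surprised": 1.2,
                        "spk_expression=smile": 0.3},
        },
    }
})


def build_input_features(speaker_speech, speaker_info, listener_conversation, listener_info):
    """Combine the four acquired items into one feature list."""
    feats = [f"spk_word={w}" for w in speaker_speech.lower().split()]
    feats += [f"lis_word={w}" for w in listener_conversation.lower().split()]
    feats += [f"spk_{k}={v}" for k, v in sorted(speaker_info.items())]
    feats += [f"lis_{k}={v}" for k, v in sorted(listener_info.items())]
    return feats


def estimate_quick_response_type(model, features):
    """Score every type, normalize with a softmax, and return the best type."""
    scores = {
        name: params["bias"] + sum(params["weights"].get(f, 0.0) for f in features)
        for name, params in model["types"].items()
    }
    top = max(scores.values())  # subtracted for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    norm = sum(exps.values())
    probabilities = {name: e / norm for name, e in exps.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


model = json.loads(STORED_MODEL_JSON)
features = build_input_features(
    "I won the lottery", {"expression": "smile"},
    "what", {"expression": "surprised"},
)
estimated_type, probabilities = estimate_quick_response_type(model, features)
```

Returning the per-type probabilities alongside the chosen type lets a caller decide how to treat low-confidence estimates, which is a design choice of this sketch rather than something the embodiment specifies.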
Here, processing of estimating a type of a quick response of a listener to a conversation of a speaker will be described with reference to
For example, as illustrated in
As a result, the estimation device 20 systematically classifies quick responses, that is, speeches that take a wide variety of forms, which can help improve mutual understanding in communication and the accuracy of dialogue analysis. For example, many quick responses consist of the same syllables yet carry different meanings, and nuances differ greatly across languages and cultures, which often causes misunderstanding. By systematizing and classifying the quick responses spoken by the listener, the estimation device 20 can clarify the emotion and intention of the person who utters the quick response. Furthermore, for example, in a system in which the estimation device 20 classifies and displays quick responses in real time, the speaker can accurately understand the emotion and intention of the listener. Likewise, in dialogue analysis, classification of quick responses by the estimation device 20 makes it possible to grasp more clearly intentions and changes in state of mind that are not expressed in the speech itself.
[Processing Procedure by Learning Device]
Next, an example of a processing procedure of processing executed by the learning device 10 will be described with reference to
As illustrated in
The creation unit 14b uses the information acquired by the learning data acquisition unit 14a to create a learned model of classifying a quick response of the listener to the conversation of the speaker using the classification label of the quick response as correct answer data (step S104). Thereafter, the creation unit 14b stores the created learned model in the learned model storage unit 15a (step S105).
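Steps S104 and S105 amount to creating a model and then persisting it so that an estimation device can read it later. The sketch below shows one possible round trip through a JSON file. The model is deliberately reduced to a lookup table from listener utterance to its most frequent label, a stand-in for whatever classifier is actually learned; the file name, function names, and labels are assumptions.

```python
import json
import tempfile
from collections import Counter, defaultdict
from pathlib import Path


def create_learned_model(pairs):
    """Stand-in for step S104: map each utterance to its most frequent label."""
    counts = defaultdict(Counter)
    for utterance, label in pairs:
        counts[utterance][label] += 1
    return {u: c.most_common(1)[0][0] for u, c in counts.items()}


def store_learned_model(model, path):
    """Stand-in for step S105: write the model to the model store."""
    path.write_text(json.dumps(model, ensure_ascii=False, indent=2), encoding="utf-8")


def load_learned_model(path):
    """Read the model back, as an estimation device would before estimating."""
    return json.loads(path.read_text(encoding="utf-8"))


pairs = [
    ("i see", "acknowledgement"),
    ("what", "surprise"),
    ("what", "surprise"),
    ("what", "acknowledgement"),
]
model = create_learned_model(pairs)

with tempfile.TemporaryDirectory() as directory:
    model_path = Path(directory) / "learned_model.json"  # hypothetical file name
    store_learned_model(model, model_path)
    restored = load_learned_model(model_path)
```

A plain-text format such as JSON keeps the stored model inspectable and independent of the process that created it, which suits a setup in which learning and estimation run on separate devices.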
[Processing Procedure by Estimation Device]
Next, an example of a processing procedure of processing executed by the estimation device 20 will be described with reference to
As illustrated in
[Effects of Embodiment]
As described above, the learning device 10 according to the embodiment acquires speech data of a speaker and information on the speaker, conversation data of a listener and information on the listener, and a classification label of a quick response included in the conversation data of the listener, and creates a learned model of estimating a type of the quick response of the listener to a conversation of the speaker using the acquired information with the classification label of the quick response as correct answer data. For this reason, the learning device 10 learns the classification of the content of the quick response spoken by the listener with respect to the speech of the speaker, so that the content of the quick response can be classified appropriately and the classification can be used to generate an appropriate quick response.
The estimation device 20 acquires speech data of a speaker, information on the speaker, conversation data of a listener, and information on the listener, inputs the acquired information as input data to a learned model of estimating a type of a quick response of the listener to a conversation of the speaker, and estimates the type of the quick response of the listener to the conversation of the speaker. Therefore, the estimation device 20 can appropriately classify the content of the quick response and, by helping to generate an appropriate quick response, contributes to generating a more natural quick response as a listener.
Each component of each device illustrated according to the above embodiments is functionally conceptual and does not necessarily have to be physically configured as illustrated. That is, a specific form of distribution and integration of each device is not limited to the illustrated form, and all or a part thereof can be functionally or physically distributed and integrated in any unit according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed in each device can be implemented by a CPU and a program analyzed and executed by the CPU, or can be implemented as hardware by wired logic.
Furthermore, among the processing described in the above embodiments, all or a part of the processing described as being automatically performed can be manually performed, or all or a part of the processing described as being manually performed can be automatically performed by a known method. In addition, the processing procedures, the control procedures, the specific names, and the information including various data and parameters illustrated in the above document and drawings can be arbitrarily changed unless otherwise specified.
In addition, it is also possible to create a program in which the processing to be executed by the learning device 10 or the estimation device 20 described in the embodiment described above is described in a language that can be executed by a computer. In this case, the computer executes the program, and thus the effects similar to those of the above embodiments can be obtained. Furthermore, the program may be recorded in a computer-readable recording medium, and the program recorded in the recording medium may be read and executed by the computer to implement processing similar to that of the above embodiments.
As exemplified in
Here, as illustrated in
In addition, various data described in the above embodiments is stored as program data in, for example, the memory 1010 and the hard disk drive 1031. The CPU 1020 then reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1031 to the RAM 1012 as necessary, and executes various processing procedures.
Note that the program module 1093 and the program data 1094 related to the program are not limited to being stored in the hard disk drive 1031, and may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive or the like. Alternatively, the program module 1093 and the program data 1094 related to the program may be stored in another computer connected via a network (such as a local area network (LAN) or a wide area network (WAN)) and read by the CPU 1020 via the network interface 1070.
Although the embodiments to which the invention made by the present inventor is applied have been described above, the present invention is not limited by the description and drawings constituting a part of the disclosure of the present invention according to the present embodiments. In other words, other embodiments, examples, operational techniques, and the like made by those skilled in the art or the like on the basis of the present embodiment are all included in the scope of the present invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2022/007726 | 2/24/2022 | WO | |