This application is based on and claims priority under 35 U.S.C. § 119 to Japanese Patent Application 2022-024904, filed on Feb. 21, 2022, the entire content of which is incorporated herein by reference.
This disclosure relates to a dialogue system and a dialogue unit.
For example, JP 2017-49427A (Reference 1) discloses a dialogue control apparatus that has a dialogue with a user. The dialogue control apparatus estimates the user's emotions in order to conduct a dialogue on a topic the user prefers.
An automaton is a well-known technique for representing the state of a dialogue. Specifically, a dialogue scenario is created in advance, and the scenario advances according to the transitions described by the automaton. In this case, the dialogue control apparatus conducts the dialogue according to those transitions.
In the above case, when the user brings up an unexpected topic outside the scenario, the dialogue control apparatus cannot respond. It is difficult to create, in advance, a scenario that anticipates every possible state so as to avoid such a situation.
According to an aspect of this disclosure, a dialogue system includes: a storage device; and an execution device, in which scenario data is stored in the storage device, and the scenario data is data defining a response sentence corresponding to a state and a transition condition for transition to a different state. The execution device executes a text data generation process, a determination process, a scenario response process, a chat process, a storage process, and a return process, the text data generation process is a process of converting a voice of a user into text data using an output signal of a microphone as input, the determination process is a process of determining whether the transition condition is satisfied based on the text data and the scenario data, the scenario response process is a process of operating a speaker so as to make a response based on a response sentence defined in a state of a transition destination according to the transition condition based on the scenario data when it is determined that the transition condition is satisfied, the chat process is a process of operating the speaker so as to make a response different from the response based on the response sentence defined in the scenario data when it is determined that the transition condition is not satisfied, the storage process is a process of storing and maintaining a state before execution of the chat process in the storage device when the chat process is to be executed, and the return process is a process of returning to the stored and maintained state when the chat process ends.
According to another aspect of this disclosure, a dialogue unit is a dialogue unit included in the above dialogue system.
The foregoing and additional features and characteristics of this disclosure will become more apparent from the following detailed description considered with reference to the accompanying drawings, wherein:
(a) and (b) of
Hereinafter, an embodiment will be described with reference to the drawings.
The control device 20 operates the display unit 12 to control an image displayed on the display unit 12. To control the image, the control device 20 refers to RGB image data Drgb output by an RGB camera 30. The RGB camera 30 is oriented toward the direction in which the user is assumed to be located. The RGB image data Drgb includes luminance data for the three primary colors red, green, and blue. The control device 20 also refers to infrared image data Dir output by an infrared camera 32 to control the image. The infrared camera 32 is likewise oriented toward the direction in which the user is assumed to be located. In addition, the control device 20 refers to a sound signal Ss output by a microphone 34 to control the image. The microphone 34 is provided to sense sound generated by the user.
The control device 20 operates a speaker 36 to output a sound signal in accordance with an action in the agent image 14.
The control device 20 includes a PU 22, a storage device 24, and a communication device 26. The PU 22 is a software processing device including at least one of a CPU, a GPU, a TPU, and the like. The storage device 24 stores scenario data 24b. The scenario data 24b includes a finite automaton.
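To make the idea of scenario data concrete, the following Python sketch models a finite automaton in which each state number holds a response sentence and a set of transition conditions. It is a minimal illustration under stated assumptions, not the format of the scenario data 24b: the `State` class, the example states, and the choice of "a word in the user's utterance matches a keyword" as the transition condition are all hypothetical. The `find_transition` function plays the role of the determination process (S20).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class State:
    """One state of a scenario automaton (hypothetical layout)."""
    utterance: str  # response sentence output when this state is entered
    # Transition conditions, assumed here to be keyword -> destination state number
    transitions: Dict[str, int] = field(default_factory=dict)


# Hypothetical scenario data: state number -> State
SCENARIO: Dict[int, State] = {
    0: State("Would you like to book a table?", {"yes": 1}),
    1: State("What time would you like?", {"seven": 2, "eight": 2}),
    2: State("Your table is booked. Thank you.", {}),
}


def find_transition(scenario: Dict[int, State], state_no: int,
                    words: List[str]) -> Optional[int]:
    """Determination process (cf. S20): return the destination state number
    if any word satisfies a transition condition of the current state,
    otherwise None."""
    for word in words:
        dest = scenario[state_no].transitions.get(word.lower())
        if dest is not None:
            return dest
    return None
```

For example, `find_transition(SCENARIO, 0, ["Yes", "please"])` yields `1`, whereas an off-scenario utterance such as `["tell", "me", "the", "news"]` in state 1 yields `None`, which is the case the chat process is meant to handle.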
As illustrated in
Referring back to
The back-end unit 50 processes data transmitted from the dialogue unit 10, among other tasks. The back-end unit 50 includes a PU 52, a storage device 54, and a communication device 56. The PU 52 is a software processing device including at least one of a CPU, a GPU, a TPU, and the like.
(a) and (b) of
In a series of processes illustrated in (a) of
When it is determined that the utterance is detected (S10: YES), the PU 22 converts the analog sound signal Ss into digital sound data Ds (S12). Then, the PU 22 operates the communication device 26 to transmit the sound data Ds to the back-end unit 50 (S14). At this time, the PU 22 transmits, together with the sound data Ds, a request to convert the sound data Ds into text data.
When the process of S14 is executed, the PU 52 of the back-end unit 50 determines that a text generation request is present as illustrated in (b) of
Next, the PU 52 decomposes the text data into words by morphological analysis (S46). Then, the PU 52 operates the communication device 56 to transmit the text data decomposed into words to the dialogue unit 10 (S48).
On the other hand, as illustrated in (a) of
When it is determined that the transition condition is satisfied (S20: YES), the PU 22 causes a transition of the state to a transition destination associated with the transition condition (S22). Then, the PU 22 operates the speaker 36 to execute an utterance process according to the utterance content defined by the state number of the transition destination based on the scenario data 24b (S24). That is, the PU 22 causes the speaker 36 to output a sound signal corresponding to the utterance content.
On the other hand, when it is determined that the transition condition is not satisfied (S20: NO), the PU 22 stores data indicating the current state number in the storage device 24 as transition source data 24c (S26). In addition, the PU 22 sets the flag F to “1”. Then, the PU 22 operates the communication device 26 to transmit the text data indicating the user's utterance content, received in the process of S16, to the back-end unit 50 (S28). At this time, the PU 22 transmits, together with the text data, a request to generate a chat corresponding to the text data.
When the process of S28 is executed, the PU 52 of the back-end unit 50 determines that a chat generation request is present as illustrated in (b) of
The PU 52 operates the communication device 56 to transmit the chat text data to the dialogue unit 10 (S56). The PU 52 temporarily ends a series of processes illustrated in (b) of
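The back-end unit receives two kinds of requests from the dialogue unit: a text generation request (sent in S14) and a chat generation request (sent in S28). The following Python sketch shows one way such requests could be dispatched. It assumes a dictionary-based message format and two injected callables, `speech_to_text` and `chat_mapping`; these names and the message keys are hypothetical, and a plain whitespace split stands in for the morphological analysis of S46, which a real system would perform with a proper analyzer.

```python
from typing import Callable, Dict


def make_backend(speech_to_text: Callable[[bytes], str],
                 chat_mapping: Callable[[str], str]) -> Callable[[Dict], Dict]:
    """Build a request handler for the back-end unit (hypothetical message format)."""

    def handle(request: Dict) -> Dict:
        kind = request.get("type")
        if kind == "text_generation":
            # Convert the sound data Ds into text (cf. S44), then split it into
            # words (cf. S46). The whitespace split is only a stand-in for
            # morphological analysis.
            text = speech_to_text(request["sound_data"])
            return {"type": "text", "words": text.split()}  # sent back as in S48
        if kind == "chat_generation":
            # Feed the user's text to the chat generation mapping (cf. S54)
            reply = chat_mapping(request["text"])
            return {"type": "chat", "text": reply}  # sent back as in S56
        raise ValueError(f"unknown request type: {kind!r}")

    return handle
```

Injecting the two callables keeps the dispatcher independent of whichever speech recognition service or chat generation mapping is actually deployed, which mirrors the embodiment's point that these heavy computations live outside the dialogue unit 10.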
On the other hand, as illustrated in (a) of
When the PU 22 determines that the chat ending condition is satisfied (S34: YES), the PU 22 substitutes “0” into the flag F (S36). The PU 22 temporarily ends the series of processes illustrated in (a) of
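The control flow of the dialogue unit around S20 to S36 can be summarized in a short state machine. The Python sketch below is an illustration under assumptions, not the embodiment's code: the class and method names are hypothetical, the back-end round trip is replaced by an injected `chat` callable, the chat ending condition is an injected `chat_ended` callable, and it is assumed that every utterance is routed to the chat process while the flag F is “1”.

```python
from typing import Callable, Dict, List, Optional, Tuple

# state number -> (response sentence, {keyword: destination state number})
Scenario = Dict[int, Tuple[str, Dict[str, int]]]


class DialogueUnit:
    """Hypothetical sketch of the S20 to S36 flow on the dialogue unit."""

    def __init__(self, scenario: Scenario,
                 chat: Callable[[str], str],
                 chat_ended: Callable[[List[str], str], bool]) -> None:
        self.scenario = scenario
        self.chat = chat                # stands in for the back-end round trip (S28 to S30)
        self.chat_ended = chat_ended    # chat ending condition (S34), assumed signature
        self.state: Optional[int] = 0   # current state number; None while chatting
        self.flag_f = 0                 # 1 while the chat process is active
        self.transition_source: Optional[int] = None  # role of transition source data 24c

    def respond(self, words: List[str]) -> str:
        """Return the sentence the speaker would utter for one user utterance."""
        if self.flag_f == 0:
            _, transitions = self.scenario[self.state]
            for w in words:
                dest = transitions.get(w.lower())
                if dest is not None:
                    # S20: YES -> transition (S22) and scenario utterance (S24)
                    self.state = dest
                    return self.scenario[dest][0]
            # S20: NO -> store the current state and set flag F (S26)
            self.transition_source = self.state
            self.state = None
            self.flag_f = 1
        # Chat process (assumed: every utterance comes here while flag F is 1)
        reply = self.chat(" ".join(words))
        if self.chat_ended(words, reply):
            # Return process: restore the stored state and clear flag F (S36)
            self.state = self.transition_source
            self.flag_f = 0
        return reply
```

Because the stored state number is restored when the chat ends, the next utterance is again evaluated against the transition conditions of the state that was active before the digression, so the scenario itself never needs states for off-topic remarks.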
Here, functions and effects according to the present embodiment will be described.
In
Then, according to an utterance content defined by the state number 1 in the scenario data 24b, the dialogue system utters “What time would you like?” In the example illustrated in
In the example illustrated in
In this manner, when the transition condition in the scenario data 24b is not satisfied, the PU 22 stores the current state number in the storage device 24 as the transition source data 24c, and the conversation with the user continues using the chat generation mapping. Therefore, even when the user's conversation deviates from the scenario defined by the scenario data 24b, the dialogue system can cope with the situation and later return to the stored state.
According to the present embodiment described above, functions and effects are further obtained as follows.
(1) A trained model is used as the chat generation mapping. Accordingly, the chat process can be implemented without relying on a scenario-type dialogue process.
(2) The chat text data is generated by the back-end unit 50. Accordingly, a calculation load of the dialogue unit 10 can be reduced as compared with a case in which the dialogue unit 10 generates the chat text data.
(3) The sound data Ds, obtained by converting the user's voice into digital data, is converted into text data in the back-end unit 50. Accordingly, the calculation load of the dialogue unit 10 can be reduced as compared with a case in which the dialogue unit 10 performs the conversion into text data. In addition, compared with that case, a highly accurate external service for converting the sound data Ds into text data can be used.
Correspondence between matters in the above embodiment and matters described in the section of “Solution to Problem” is as follows. Hereinafter, the correspondence is shown for each number in the solution described in the section of “Solution to Problem”. [1] A storage device corresponds to the storage devices 24 and 54. An execution device corresponds to the PUs 22 and 52. A text data generation process corresponds to the process of S44. A determination process corresponds to the process of S20. A scenario response process corresponds to the process of S24. A chat process corresponds to processes of S28 to S32 and processes of S50 to S56. A storage process corresponds to the process of S26. A return process corresponds to the process of S36 when it is determined to be YES in the process of S34. [3] Chat generation mapping data corresponds to the chat generation mapping data 54c. [4, 6] A first storage device corresponds to the storage device 24. A second storage device corresponds to the storage device 54. A first execution device corresponds to the PU 22. A second execution device corresponds to the PU 52. A first communication device corresponds to the communication device 26. A second communication device corresponds to the communication device 56. A chat text data calculation process corresponds to the process of S54. A response sentence transmission process corresponds to the process of S56. A response sentence reception process corresponds to the process of S30. [5] A text data transmission process corresponds to the process of S48. A text data reception process corresponds to the process of S16.
The present embodiment may be modified and implemented as follows. The present embodiment and the following modifications can be implemented in combination with each other as long as they do not technically contradict each other.
In the above embodiment, an example is described in which the chat generation mapping data 54c defining the chat generation mapping is trained data based on machine learning, but this disclosure is not limited thereto. For example, the chat generation mapping data 54c may define a scenario-type chatbot or the like. Even in this case, when the chatbot or the like is an external service provided via the network 40, the scenario data 24b in the dialogue unit 10 can be prevented from becoming complicated.
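Since the dialogue unit only needs a reply for a given text, the chat generation mapping can sit behind a small interface, so either a trained model or a scenario-type chatbot could serve it. The following Python sketch illustrates this with a hypothetical `ChatGenerator` protocol and a keyword-rule chatbot; neither the interface nor the rules come from the disclosure, and a trained model would simply be another class providing the same `reply` method.

```python
from typing import Dict, Protocol


class ChatGenerator(Protocol):
    """Anything that maps the user's text to a chat reply (hypothetical interface)."""

    def reply(self, text: str) -> str:
        ...


class KeywordChatbot:
    """Scenario-type chatbot: fixed keyword rules with a fallback reply."""

    def __init__(self, rules: Dict[str, str], fallback: str) -> None:
        self.rules = rules        # keyword -> canned reply (illustrative values only)
        self.fallback = fallback  # reply used when no keyword matches

    def reply(self, text: str) -> str:
        lowered = text.lower()
        for keyword, response in self.rules.items():
            if keyword in lowered:
                return response
        return self.fallback


def chat_process(generator: ChatGenerator, text: str) -> str:
    """The caller depends only on the interface, not on how replies are produced."""
    return generator.reply(text)
```

With `bot = KeywordChatbot({"weather": "It looks sunny today."}, "I see.")`, the call `chat_process(bot, "How is the Weather?")` returns the weather reply and any other input falls back to `"I see."`.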
In (a) and (b) of
The chat process is not limited to the process executed by the back-end unit 50. For example, the chat process may be implemented by the PU 22 alone by storing the chat generation mapping data 54c in the storage device 24.
In the processes illustrated in (a) and (b) of
The text data generation process is not limited to the process executed by the back-end unit 50. For example, the text data generation process may be implemented by the PU 22 alone by storing the text generation mapping data 54b in the storage device 24.
The scenario data is not limited to data including actions of the agent. For example, the scenario data may include utterance contents such as response sentences without including actions of the agent.
A display device is not limited to a device including the display unit 12. For example, holography may be used. In addition, for example, a head-up display or the like may be used.
It is not essential that the dialogue unit includes the display device.
It is not essential that the dialogue system includes the back-end unit 50.
The execution device is not limited to a device that executes a software process such as a CPU, a GPU, and a TPU. For example, the execution device may include a dedicated hardware circuit such as an ASIC that executes a hardware process on at least a part of data which is subjected to the software process in the above embodiment. That is, the execution device may have any one of the following configurations (a) to (c). (a) A processing device that executes all of the above processes according to a program, and a program storage device that stores the program are provided. (b) A processing device that executes a part of the above processes according to a program, a program storage device, and a dedicated hardware circuit that executes the remaining processes are provided. (c) A dedicated hardware circuit that executes all of the above processes is provided. Here, a plurality of software execution devices including the processing device and the program storage device and a plurality of dedicated hardware circuits may be provided.
Hereinafter, a method for solving the problems of the related art and functions and effects thereof will be described.
1. A dialogue system includes: a storage device; and an execution device, in which scenario data is stored in the storage device, and the scenario data is data defining a response sentence corresponding to a state and a transition condition for transition to a different state. The execution device executes a text data generation process, a determination process, a scenario response process, a chat process, a storage process, and a return process, the text data generation process is a process of converting a voice of a user into text data using an output signal of a microphone as input, the determination process is a process of determining whether the transition condition is satisfied based on the text data and the scenario data, the scenario response process is a process of operating a speaker so as to make a response based on a response sentence defined in a state of a transition destination according to the transition condition based on the scenario data when it is determined that the transition condition is satisfied, the chat process is a process of operating the speaker so as to make a response different from the response based on the response sentence defined in the scenario data when it is determined that the transition condition is not satisfied, the storage process is a process of storing and maintaining a state before execution of the chat process in the storage device when the chat process is to be executed, and the return process is a process of returning to the stored and maintained state when the chat process ends.
In the above configuration, when the text data based on the voice of the user satisfies the transition condition, a response is made in the state reached according to the transition condition. In contrast, when the transition condition is not satisfied, the process proceeds to the chat process, which makes a response different from the response based on the response sentence defined in the scenario data. Therefore, it is possible to respond to the user's topic while preventing the scenario data from becoming complicated.
2. In the dialogue system according to 1, the return process is executed when the response of the chat process ends or when the user utters a predetermined word related to an end of a chat.
In the above configuration, when an utterance process is executed, it is possible to determine whether a chat ending condition is satisfied.
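The two triggers for the return process named in item 2 can be expressed as a single predicate. The Python sketch below is hypothetical: the disclosure does not list the predetermined words, so `END_WORDS` is an invented placeholder, and the `end_after_response` switch is an assumed way of selecting the "response of the chat process ends" variant.

```python
from typing import FrozenSet, Iterable

# Hypothetical predetermined words that end a chat; the disclosure gives none.
END_WORDS: FrozenSet[str] = frozenset({"anyway", "enough", "back"})


def chat_ending_condition(user_words: Iterable[str],
                          response_finished: bool,
                          end_after_response: bool = False,
                          end_words: FrozenSet[str] = END_WORDS) -> bool:
    """Return True when the return process should run.

    With end_after_response=True, the chat ends once its response is complete;
    in either mode it also ends when the user utters a predetermined word.
    """
    if end_after_response and response_finished:
        return True
    return any(w.lower() in end_words for w in user_words)
```

Evaluating this predicate right after each chat utterance matches the placement of the ending check (S34) after the chat response in the embodiment.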
3. In the dialogue system according to 1, chat generation mapping data is stored in the storage device, the chat generation mapping data is trained data defining chat generation mapping that outputs a response sentence with respect to input, and the chat process includes a process of inputting, to the chat generation mapping, data corresponding to the text data when it is determined that the transition condition is not satisfied, so as to obtain output of the chat generation mapping.
In the above configuration, chat generation mapping defined by trained data is used. Accordingly, the chat process can be implemented without relying on a scenario-type dialogue process.
4. The dialogue system according to 3 further includes: a dialogue unit; and a back-end unit, in which the storage device includes a first storage device and a second storage device, the execution device includes a first execution device and a second execution device, the dialogue unit includes the first storage device, the first execution device, and a first communication device, the back-end unit includes the second storage device, the second execution device, and a second communication device, the scenario data is stored in the first storage device, the chat generation mapping data is stored in the second storage device, the chat process includes a chat text data calculation process, a response sentence transmission process, and a response sentence reception process, the chat text data calculation process is executed by the second execution device and is a process of calculating output of the chat generation mapping corresponding to the text data when it is determined that the transition condition is not satisfied, the response sentence transmission process is a process of transmitting a response sentence corresponding to the output of the chat generation mapping by the second execution device operating the second communication device, and the response sentence reception process is a process of receiving the response sentence corresponding to the output by the first execution device operating the first communication device.
In the above configuration, the chat text data calculation process is executed outside the dialogue unit, so that a calculation load of the first execution device can be reduced as compared with a case in which the first execution device executes the chat text data calculation process.
5. In the dialogue system according to 4, the first execution device executes a text data reception process, the second execution device executes a text data transmission process, the text data generation process is executed by the second execution device, the text data transmission process is a process of transmitting the text data generated by the text data generation process to the dialogue unit by the second execution device operating the second communication device, and the text data reception process includes a process of receiving the text data by the first execution device operating the first communication device.
In the above configuration, the text data generation process is executed outside the dialogue unit, so that the calculation load of the first execution device can be reduced as compared with a case in which the first execution device executes the text data generation process.
6. A dialogue unit, which is the dialogue unit included in the dialogue system according to 4 or 5.
The principles, preferred embodiment and mode of operation of the present invention have been described in the foregoing specification. However, the invention which is intended to be protected is not to be construed as limited to the particular embodiments disclosed. Further, the embodiments described herein are to be regarded as illustrative rather than restrictive. Variations and changes may be made by others, and equivalents employed, without departing from the spirit of the present invention. Accordingly, it is expressly intended that all such variations, changes and equivalents which fall within the spirit and scope of the present invention as defined in the claims, be embraced thereby.
Number | Date | Country | Kind |
---|---|---|---|
2022-024904 | Feb 2022 | JP | national |