The present disclosure relates to a response processing apparatus, a response processing method, and a response processing program. In particular, the present disclosure relates to a response process for a user who uses a plurality of information devices.
With improvements in network technology, opportunities for a user to use a plurality of information devices are increasing. In view of the situation as described above, a technology for smoothly using a plurality of information devices has been proposed.
For example, in a system in which a plurality of client devices are connected via a network, a technology has been proposed in which a device that integrally controls the system is arranged so that a process of the entire system can be performed effectively.
Patent Literature 1: Japanese Laid-open Patent Publication No. 7-4882
According to the conventional technology as described above, the device that integrally controls the system receives a processing request directed to each of the information devices and performs a process corresponding to the functions of each of the information devices, so that it is possible to effectively perform a process of the entire system.
However, the conventional technology is not always able to improve usability for a user. Specifically, the conventional technology merely determines whether each of the information devices is able to accept a processing request; when each of the information devices accepts and performs a request of the user, for example, it is not always possible to perform the process in a mode that meets a demand of the user.
To cope with this, in the present disclosure, a response processing apparatus, a response processing method, and a response processing program that are able to improve usability for a user are proposed.
To solve the problem described above, a response processing apparatus includes: an acquisition unit that acquires, from a user, input information that is information used as a trigger to cause an information device to generate a response; a selection unit that selects an information device that generates a response corresponding to the input information from among a plurality of information devices; and an output control unit that controls output of a response that corresponds to the input information and that is generated by the selected information device.
Embodiments of the present disclosure will be described in detail below based on the drawings. Meanwhile, in each of the embodiments below, the same components are denoted by the same reference symbols, and repeated explanation will be omitted.
The present disclosure will be described in sequence of items below.
1-1. Example of response process according to first embodiment
1-2. Configuration of response processing system according to first embodiment
1-3. Flow of response process according to first embodiment
1-4. Modification of first embodiment
2-1. Example of response process according to second embodiment
2-2. Modification of second embodiment
1-1. Example of Response Process According to First Embodiment
One example of a response process according to a first embodiment of the present disclosure will be described with reference to
The response processing apparatus 100 is one example of a response processing apparatus according to the present disclosure. The response processing apparatus 100 is what is called an Internet of Things (IoT) device, and performs various kinds of information processing in cooperation with an external device, such as a cloud server. For example, the response processing apparatus 100 is an apparatus that dialogues with a user, and performs various kinds of information processing, such as voice recognition or response generation for a user. The voice recognition, a voice response process, and the like performed by the response processing apparatus 100 may be referred to as an agent function. Further, the response processing apparatus 100 may be referred to as an agent device.
In the first embodiment, an example in which the response processing apparatus 100 is what is called a smart speaker is illustrated. Meanwhile, the response processing apparatus 100 may include not only a speaker unit that performs voice output, but also a display unit (a liquid crystal display or the like) that outputs a video or the like. Further, the response processing apparatus 100 may be a smartphone, a tablet terminal, or the like. In this case, the smartphone or the tablet terminal functions as the response processing apparatus 100 according to the present disclosure by executing a program (application) for implementing the response process according to the present disclosure.
Furthermore, the response processing apparatus 100 may be a wearable device, such as a watch-type terminal or a glasses-type terminal, other than the smartphone or the tablet terminal. Moreover, the response processing apparatus 100 may be implemented by various smart devices that have information processing functions. For example, the response processing apparatus 100 may be a smart household appliance, such as a television, an air conditioner, or a refrigerator, a smart vehicle, such as a car, or an autonomous robot, such as a drone, a pet-like robot, or a humanoid robot.
In the example in
Further, in the example illustrated in
As in the example in
For example, an activation word for starting to use the agent function is set in each of the agent devices. Therefore, when using the plurality of agent devices, the user needs to speak the activation word corresponding to each of the agent devices.
Further, upon receiving a query from the user, for example, each of the agent devices accesses a different service and obtains an answer. Specifically, when receiving a query about weather information from the user, each of the agent devices accesses a different weather information service and obtains a different answer. Therefore, it is difficult for the user to determine whether the agent device that answers the query will provide the information (for example, ultraviolet information, pollen information, or the like) that the user wants to know. Furthermore, a certain agent device may have difficulty in accessing a service that can answer the query issued by the user, and may thus fail to generate an answer. If an appropriate answer is not obtained, the user needs to take the time and effort to speak the same query to a different agent device.
Moreover, each of the agent devices has various different kinds of performance and functions in terms of, for example, whether it is possible to output an image or whether it is possible to output a voice. With an increase in the number of agent devices, it becomes more difficult for the user to memorize the performance and the functions, so that it becomes difficult to make full use of the performance and the functions of the agent devices. Furthermore, when the agent devices are updated and new functions are added, for example, a large burden is imposed on the user to check the updates one by one, so that the added functions may remain unused.
To cope with this, the response processing apparatus 100 according to the present disclosure solves the problems as described above by a response process as described below.
Specifically, the response processing apparatus 100 functions as a front end device of the plurality of agent devices, and collectively handles communication with the user. For example, the response processing apparatus 100 analyzes a content of a query received from the user and selects an agent device that generates a response. As one example, the response processing apparatus 100 refers to functions and performance of a plurality of cooperating agent devices, and selects an agent device that is expected to generate a most appropriate answer to the query from the user. With this configuration, the response processing apparatus 100 is able to improve a degree of accuracy at which a response desired by the user is generated. Furthermore, the response processing apparatus 100 determines a mode of output of the response that is generated by the selected agent device. For example, the response processing apparatus 100 receives the generated response (for example, voice data), detects a position of the user, and transmits the response to a different agent device that is installed at a position closest to the user. Further, the response processing apparatus 100 causes the agent device, which has received the response, to output the response. With this configuration, the response processing apparatus 100 is able to output the response from the agent device that is located at the position closest to the user, so that it is possible to appropriately send the information to the user.
In this manner, the response processing apparatus 100 operates as the front end device of the plurality of agent devices, and controls generation and output of the response, to thereby improve usability for the user. One example of the response process according to the first embodiment of the present disclosure will be described below in sequence with reference to
In the example illustrated in
Furthermore, the response processing apparatus 100 receives, in advance, a setting of an activation word for activating the subject apparatus from the user. For example, it is assumed that the response processing apparatus 100 has received a voice input of “hello” as the activation word.
In this case, if a voice A01 of “hello” is received from the user, the response processing apparatus 100 activates a response process of the subject apparatus (Step S1). Further, the response processing apparatus 100 activates each of the cooperating terminals 10 by using the voice A01 as a trigger.
Specifically, the response processing apparatus 100 converts information indicating that “the voice A01 is input from the user” to information corresponding to the activation word of the terminal 10A, and transmits converted information A02 to the terminal 10A (Step S2). The information corresponding to the activation word of the terminal 10A may be voice data for actually activating the terminal 10A or a script (program) for activating the terminal 10A. For example, the response processing apparatus 100 transmits the information A02 to the terminal 10A by using a home network, such as Wi-Fi (registered trademark), or wireless communication, such as Bluetooth (registered trademark).
In other words, the user is able to activate the terminal 10A in a linked manner by only inputting the voice A01 that is the activation word to the response processing apparatus 100.
Similarly, the response processing apparatus 100 converts the information indicating that “the voice A01 is input from the user” to information corresponding to the activation word of the terminal 10B, and transmits converted information A03 to the terminal 10B (Step S3). Furthermore, the response processing apparatus 100 converts the information indicating that “the voice A01 is input from the user” to information corresponding to the activation word of the terminal 10C, and transmits converted information A04 to the terminal 10C (Step S4).
In this manner, when recognizing the activation word of the subject apparatus, the response processing apparatus 100 activates each of the cooperating terminal 10A, the cooperating terminal 10B, and the cooperating terminal 10C. With this configuration, the user is able to activate all of the devices installed in his/her home without speaking the activation words to all of the devices with which the user is going to dialogue. Meanwhile, the response processing apparatus 100 may receive, from the user, a setting for individually specifying the terminal 10 that is not to be activated in cooperation with the subject apparatus. With this configuration, the user is able to distinguish between the terminal 10 that is activated in a cooperative manner and the terminal 10 that is not activated in a cooperative manner.
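As a non-limiting illustration, the activation fan-out of Steps S1 to S4 may be sketched as follows. The class and method names (Terminal, send, cooperative) are hypothetical stand-ins, and transmission is simulated by printing rather than by actual Wi-Fi or Bluetooth communication.

```python
from dataclasses import dataclass

@dataclass
class Terminal:
    terminal_id: str
    activation_payload: str   # voice data or a script that activates this terminal
    cooperative: bool = True  # the user may individually exclude a terminal

    def send(self, payload: str) -> None:
        # Stand-in for transmission over a home network (Wi-Fi) or Bluetooth.
        print(f"-> terminal {self.terminal_id}: {payload}")

ACTIVATION_WORD = "hello"

def on_voice_input(voice: str, terminals: list[Terminal]) -> None:
    if voice != ACTIVATION_WORD:
        return
    print("subject apparatus activated")           # Step S1
    for t in terminals:                            # Steps S2 to S4
        if t.cooperative:
            # Convert the single input into each terminal's own activation word.
            t.send(t.activation_payload)

on_voice_input("hello", [
    Terminal("10A", "activation word for 10A"),
    Terminal("10B", "activation word for 10B"),
    Terminal("10C", "activation word for 10C", cooperative=False),  # excluded by setting
])
```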
One example in which the response processing apparatus 100 outputs a response to a query received from the user will be described below with reference to
In the example in
The response processing apparatus 100 starts the response process by using the voice A11 as input information (Step S11). Specifically, the response processing apparatus 100 acquires the voice A11, and analyzes the query that is issued by the user and that is included in the voice A11 through an automatic speech recognition (ASR) process or a natural language understanding (NLU) process. For example, if the voice A11 includes a purpose of the query from the user, the response processing apparatus 100 recognizes the purpose of the query as the input information and starts a process of outputting, as the response, an answer corresponding to the purpose of the query.
In the example illustrated in
If the response processing apparatus 100 is not able to perform “a search for a recipe for a dish XXX”, e.g., if it is difficult to access a service that performs “a search for a recipe” or if it is difficult to detect “a recipe for a dish XXX”, the response processing apparatus 100 transmits the query issued by the user to the cooperating terminal 10A or the like. In other words, the response processing apparatus 100 transmits “a search for a recipe for a dish XXX” that is the purpose of the query issued by the user to the terminal 10A or the like, and causes the terminal 10A or the like to perform the search in place of the subject apparatus.
For example, the response processing apparatus 100 transmits information A12 indicating “a search for a recipe for a dish XXX” that is the purpose of the query issued by the user to the terminal 10A (Step S12). At this time, the response processing apparatus 100 converts the voice A11 received from the user to the information A12 indicating “a search for a recipe for a dish XXX” in accordance with the audio API or the like of the terminal 10A, for example. In other words, the response processing apparatus 100 converts the voice A11 to the information A12 that is in a certain format that can be recognized by the terminal 10A, and thereafter transmits the information A12 to the terminal 10A. This produces the same situation as if the terminal 10A had received a speech of “let me know a recipe for a dish XXX” from the user.
Similarly, the response processing apparatus 100 converts the voice A11 to information A13 in a certain format that can be recognized by the terminal 10B, and thereafter transmits the information A13 to the terminal 10B (Step S13). Furthermore, the response processing apparatus 100 converts the voice A11 to information A14 in a certain format that can be recognized by the terminal 10C, and thereafter transmits the information A14 to the terminal 10C (Step S14). In other words, the response processing apparatus 100 selects the terminal 10A, the terminal 10B, and the terminal 10C as the agent devices that generate responses to the query issued by the user.
Thereafter, the response processing apparatus 100 receives a reply indicating whether it is possible to generate an answer to the query issued by the user from the terminal 10A, the terminal 10B, or the terminal 10C. In the example in
Upon receiving the reply from the terminal 10B, the response processing apparatus 100 selects the terminal 10B as the agent device that generates the response to be output to the user. In this case, the response processing apparatus 100 outputs a voice for notifying the user of the terminal 10B that serves as an output destination. For example, the response processing apparatus 100 outputs a voice A15 including a content of “the terminal 10B performs output” to the user. Subsequently, the response processing apparatus 100 causes the terminal 10B to output, by voice, a recipe that the terminal 10B has retrieved.
Meanwhile, the response processing apparatus 100 may acquire, from the terminal 10B, voice data that is to be output by the terminal 10B, by using the audio API common to the terminal 10B. In this case, the response processing apparatus 100 itself may output voice instead of causing the terminal 10B to output voice. With this configuration, the user is able to obtain information by voice output performed by the response processing apparatus 100 even if the information is actually retrieved by the terminal 10B, so that the user is able to perform a dialogue without regard to the terminal 10B and the like other than the response processing apparatus 100.
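A minimal sketch of the delegation in Steps S12 to S15 is given below, assuming a hypothetical AgentTerminal class whose convert and can_generate methods stand in for the per-terminal format conversion and for the reply on whether an answer can be generated; the keyword matching is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class AgentTerminal:
    terminal_id: str
    functions: set[str]

    def convert(self, purpose: str) -> str:
        # Convert the purpose into a format this terminal can recognize
        # (in practice, a payload for the terminal's audio API or the like).
        return f"[{self.terminal_id}] {purpose}"

    def can_generate(self, payload: str) -> bool:
        # Reply indicating whether this terminal can generate an answer.
        return "recipe" in payload and "recipe search" in self.functions

terminals = [
    AgentTerminal("10A", {"music playback"}),
    AgentTerminal("10B", {"recipe search", "weather"}),
    AgentTerminal("10C", {"translation"}),
]

purpose = "a search for a recipe for a dish XXX"   # result of ASR / NLU
responders = [t for t in terminals if t.can_generate(t.convert(purpose))]
if responders:
    chosen = responders[0]                          # here, the terminal 10B
    print(f"the terminal {chosen.terminal_id} performs output")   # cf. voice A15
```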
Next, a process performed by the response processing apparatus 100 to specify the terminal 10 that is caused to generate a response to a query issued by the user will be described with reference to
Similarly to
In the example in
The response processing apparatus 100 refers to the database DB01 and recognizes that the terminal 10A and the terminal 10C do not have a function to search for a recipe and that the terminal 10B has the function to search for a recipe. In this case, the response processing apparatus 100 selects the terminal 10B and transmits information A17 indicating the purpose of the user to the terminal 10B without transmitting the information to the terminal 10A and the terminal 10C (Step S17).
The terminal 10B transmits, to the response processing apparatus 100, a reply indicating that it is possible to generate a response to the information A17 (Step S18). Thereafter, similarly to the example in
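One possible shape of this database-driven selection (Steps S16 to S18), assuming DB01 is held as a simple mapping from terminal ID to registered functions, is sketched below.

```python
# Hypothetical in-memory form of the database DB01.
DB01 = {
    "10A": {"weather", "music playback"},
    "10B": {"weather", "recipe search"},
    "10C": {"weather", "translation"},
}

def select_by_function(required: str) -> list[str]:
    # Transmit the purpose only to terminals whose registered functions
    # cover it, instead of querying every cooperating terminal.
    return [tid for tid, funcs in DB01.items() if required in funcs]

print(select_by_function("recipe search"))  # ['10B'] -> information A17 is sent here only
```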
Next, one example in which the response processing apparatus 100 outputs a response based on a context of the user will be described with reference to
Similarly to
In the example in
Then, the response processing apparatus 100 determines a mode of output of the response in accordance with the acquired context. For example, if the terminal 10B is located in the kitchen, the response processing apparatus 100 determines that the response generated by the terminal 10B is to be output by the terminal 10B.
Further, because the user is occupied with a task, the response processing apparatus 100 causes the terminal 10B to output the response both by voice and by an image so that the user does not have to stop what he/she is doing. For example, the terminal 10B activates a projector function and projects an image on a wall of the kitchen so that the generated response (recipe information) is output on the wall. Furthermore, the terminal 10B outputs the generated response by voice. Moreover, similarly to the example in
In this manner, the response processing apparatus 100 may determine the mode of output of the response in accordance with the context of the user. For example, the response processing apparatus 100 causes the terminal 10 that is located near the user to output the response, or selects a type of information (a voice, an image, a video, or the like) of the response in accordance with the situation of the user. With this configuration, the response processing apparatus 100 is able to implement a dialogue system with good usability.
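The context-dependent determination described above may be sketched as follows; the context fields and decision rules (same-room matching, voice plus image when the user's hands are busy) are illustrative assumptions rather than limitations.

```python
from dataclasses import dataclass

@dataclass
class Context:
    user_room: str    # e.g., "kitchen"
    hands_busy: bool  # the user is occupied with a task such as cooking

def decide_output_mode(ctx: Context,
                       terminal_rooms: dict[str, str]) -> tuple[str, set[str]]:
    # Prefer the terminal installed in the same room as the user; fall back
    # to the subject apparatus ("100") when no terminal is nearby.
    terminal = next((tid for tid, room in terminal_rooms.items()
                     if room == ctx.user_room), "100")
    # When the user's hands are busy, output both by voice and by image so
    # that the user does not have to stop what he/she is doing.
    modes = {"voice", "image"} if ctx.hands_busy else {"voice"}
    return terminal, modes

print(decide_output_mode(Context("kitchen", True),
                         {"10A": "living room", "10B": "kitchen"}))
# ('10B', {'voice', 'image'})
```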
Next, one example of a case in which the response processing apparatus 100 acquires a plurality of responses will be described with reference to
In the example in
In other words, similarly to
The terminal 10A transmits, as a response corresponding to the information A26, retrieved weather information to the response processing apparatus 100 (Step S35). Similarly, the terminal 10B transmits, as a response corresponding to the information A27, retrieved weather information to the response processing apparatus 100 (Step S36). Similarly, the terminal 10C transmits, as a response corresponding to the information A28, retrieved weather information to the response processing apparatus 100 (Step S37).
As described above, the terminal 10A, the terminal 10B, and the terminal 10C acquire information from different services, and therefore, transmit different kinds of information to the response processing apparatus 100 even though all pieces of the information are the weather information. In other words, the response processing apparatus 100 acquires a different response (weather information) from each of the terminal 10A, the terminal 10B, and the terminal 10C.
For example, a database DB02 illustrated in
The response processing apparatus 100 refers to the database DB02 and determines which of the responses is to be output to the user. For example, the response processing apparatus 100 may output the weather information acquired by the terminal 10C, which includes a larger number of pieces of information, such as the “precipitation probability”, the “ultraviolet information”, and the “pollen information”, among the pieces of acquired weather information. Alternatively, the response processing apparatus 100 may output the weather information acquired by the terminal 10A or the terminal 10B, for which screen display is available, among the pieces of acquired weather information. The response processing apparatus 100 outputs a voice A29 including the weather information that is determined to be output, or a screen included in the weather information. Alternatively, the response processing apparatus 100 may cause the terminal 10A or the like to output the weather information.
In this manner, the response processing apparatus 100 may acquire a plurality of responses and determine a response to be output to the user among the acquired responses. For example, the response processing apparatus 100 may determine a response that is to be actually output to the user in accordance with information amounts, qualities, or the like of the acquired responses. With this configuration, the response processing apparatus 100 is able to select and output an appropriate response from among the plurality of responses, so that it is possible to easily realize a response process as desired by the user. Further, the response processing apparatus 100 may generate a response by appropriately integrating or combining pieces of information that are acquired from a plurality of sources. Specifically, the response processing apparatus 100 may combine parts of image information and voice information acquired from the different terminals 10, or edit and combine parts of a plurality of pieces of voice information.
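As a sketch of this comparison, the responses may be scored by the number of information items they carry, or alternatively merged; the item sets below are illustrative stand-ins for the contents of the database DB02.

```python
responses = {
    "10A": {"weather", "temperature"},
    "10B": {"weather", "precipitation probability"},
    "10C": {"weather", "precipitation probability",
            "ultraviolet information", "pollen information"},
}

# Criterion 1: output the response carrying the largest number of items.
best = max(responses, key=lambda tid: len(responses[tid]))
print(best)  # '10C'

# Criterion 2: integrate items acquired from the plurality of sources
# into a single combined response.
combined = set().union(*responses.values())
print(sorted(combined))
```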
Next, one example in which the response processing apparatus 100 performs the response process in cooperation with the terminals 10 in various modes will be described with reference to
In the example illustrated in
In the example in
In other words, similarly to
The terminal 10A transmits retrieved weather information to the response processing apparatus 100 (Step S46). In contrast, the terminal 10D sends, as an output to the response processing apparatus 100, information indicating that movement corresponding to the retrieved weather information is to be performed (Step S47). For example, the terminal 10D sends, to the response processing apparatus 100, information indicating that movement representing pleasure is to be performed if the retrieved weather information indicates fine weather. Further, the terminal 10E sends, to the response processing apparatus 100, information indicating that the retrieved weather information may be output by voice (Step S48). Meanwhile, the terminal 10F may transmit information indicating that a response to the weather information is not available or return error information indicating that the purpose of the user is not understandable.
The response processing apparatus 100 determines a mode of output of the response to the user, on the basis of the information transmitted from each of the terminals 10. For example, the response processing apparatus 100 outputs a voice A32 indicating the weather information by the subject apparatus. Furthermore, the response processing apparatus 100 causes the terminal 10A to output the weather information with screen display. Moreover, the response processing apparatus 100 causes the terminal 10D to output movement representing pleasure. Furthermore, the response processing apparatus 100 causes the terminal 10E to output voice indicating the weather information.
In this manner, the response processing apparatus 100 may output responses in different modes by using the characteristics of the respective terminals 10, instead of always causing the single terminal 10 to output only a single piece of weather information. With this configuration, the user is able to check various responses output by the various terminals 10 by performing only a single dialogue with the response processing apparatus 100.
As illustrated in
As described above, the response processing apparatus 100 serves as the front end that controls the plurality of terminals 10, so that the user is able to obtain information acquired by the plurality of terminals 10 and responses to be output by performing a dialogue with only the response processing apparatus 100. With this configuration, the response processing apparatus 100 is able to improve the usability for the user.
1-2. Configuration of Response Processing System According to First Embodiment
A configuration of the response processing apparatus 100 and the like according to the first embodiment as described above will be described below with reference to
As illustrated in
The terminal 10 is an information processing terminal that is used by the user. The terminal 10 is what is called an agent device and performs a dialogue with the user or generates a response to a voice or a movement provided by the user. The terminal 10 may include all or a part of the components included in the response processing apparatus 100 to be described later.
The external server 200 is a service server that provides various services. For example, the external server 200 provides weather information, traffic information, or the like in accordance with a request from the terminal 10 or the response processing apparatus 100.
The response processing apparatus 100 is an information processing terminal that performs the response process according to the present disclosure. As illustrated in
The sensor 20 is a device for detecting various kinds of information. The sensor 20 includes, for example, a voice input sensor 20A that collects a voice spoken by the user. The voice input sensor 20A is, for example, a microphone. Further, the sensor 20 includes, for example, an image input sensor 20B. The image input sensor 20B is, for example, a camera for capturing an image of the user or an image of a situation in the home of the user.
Furthermore, the sensor 20 may include a touch sensor, an acceleration sensor, a gyro sensor, or the like that detects that the user has touched the response processing apparatus 100. Moreover, the sensor 20 may include a sensor that detects a current location of the response processing apparatus 100. For example, the sensor 20 may receive a radio wave transmitted from a global positioning system (GPS) satellite and detect location information (for example, a latitude and a longitude) indicating a current location of the response processing apparatus 100 on the basis of the received radio wave.
Furthermore, the sensor 20 may include a radio wave sensor that detects a radio wave emitted by an external apparatus, an electromagnetic sensor that detects an electromagnetic wave, or the like. Moreover, the sensor 20 may detect an environment in which the response processing apparatus 100 is placed. Specifically, the sensor 20 may include an illuminance sensor that detects illuminance around the response processing apparatus 100, a humidity sensor that detects humidity around the response processing apparatus 100, and a geomagnetic sensor that detects a magnetic field at a position at which the response processing apparatus 100 is located.
Furthermore, the sensor 20 need not always be arranged inside the response processing apparatus 100. For example, the sensor 20 may be installed outside the response processing apparatus 100 as long as it is possible to transmit information that is sensed using communication or the like to the response processing apparatus 100.
The input unit 21 is a device for receiving various kinds of operation from the user. For example, the input unit 21 is implemented by a keyboard, a mouse, a touch panel, or the like.
The communication unit 22 is implemented by, for example, a network interface card (NIC) or the like. The communication unit 22 is connected to the network N in a wired or wireless manner, and transmits and receives information to and from the terminal 10, the external server 200, and the like via the network N.
The storage unit 30 is implemented by, for example, a semiconductor memory element, such as a random access memory (RAM) or a flash memory, or a storage device, such as a hard disk or an optical disk. The storage unit 30 includes a user information table 31, a terminal information table 32, and a function table 33. Each of the data tables will be described in sequence below.
The user information table 31 stores therein information on a user who uses the response processing apparatus 100.
The “user ID” indicates identification information for identifying a user. The “user attribute information” indicates various kinds of information on the user that are registered by the user at the time of use of the response processing apparatus 100. In the example illustrated in
The “history information” indicates a use history of the response processing apparatus 100 by the user. In the example illustrated in
In other words, in the example illustrated in
Meanwhile, the “history information” illustrated in
The “user ID” corresponds to the same item as illustrated in
As illustrated in
Next, the terminal information table 32 will be described. The terminal information table 32 stores therein information on the terminal 10 that cooperates with the response processing apparatus 100.
The “terminal ID” indicates identification information for identifying the terminal 10. Meanwhile, in the specification, it is assumed that the same reference symbols are assigned to the terminal ID and the terminal 10. For example, the terminal 10 that is identified by a terminal ID of “10A” indicates the “terminal 10A”.
The “input information” indicates information on a file format or the like at the time of input of information to the terminal 10. The “voice input” indicates information on an input format in which a voice is input to the terminal 10, or the like. The “input system” indicates, for example, information on an input system of a voice transmitted from the response processing apparatus 100. The “corresponding format” indicates a format of data (a voice, an image, or the like) that can be processed by the terminal 10. In the example illustrated in
The “function” indicates a function included in the terminal 10. In the example illustrated in
The “output format” indicates a data format that can be output by the terminal 10. In the example illustrated in
The “installation position” indicates a position at which the terminal 10 is installed. Meanwhile, in the example in
In other words, in the example illustrated in
Next, the function table 33 will be described. The function table 33 stores therein detailed information on each of the functions of the terminal 10.
The “function ID” indicates identification information for identifying a function. The “terminal ID” corresponds to the same item as illustrated in
The “output format” indicates a format in which information that is received by each of the terminals 10 from the connected service can be output. For example, the output format is a voice, an image, or the like. The “average replay time” indicates a time that is taken to replay the information that is received by each of the terminals 10 from the connected service. The “content” indicates a content that can be acquired by each of the terminals 10 from the external service or the like. The “selection history of the user” indicates a history of selection of a certain terminal 10 and a frequency of the selection performed by the user who uses a certain function.
In other words, in the example illustrated in
Meanwhile, in
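One possible in-memory shape for a row of the function table 33, with the field names inferred from the column description above (all identifiers are hypothetical), is:

```python
from dataclasses import dataclass

@dataclass
class FunctionRecord:
    function_id: str
    terminal_id: str
    output_formats: set[str]        # e.g., {"voice", "image"}
    average_replay_time_sec: float  # time taken to replay the received information
    content: str                    # content obtainable from the connected service
    selection_count: int = 0        # the user's selection history / frequency

function_table = [
    FunctionRecord("F01", "10A", {"voice"}, 12.0, "weather (service P)", 3),
    FunctionRecord("F01", "10B", {"voice", "image"}, 8.0, "weather (service Q)", 7),
]

# Example: the terminal that the user selects most often for function F01.
preferred = max((r for r in function_table if r.function_id == "F01"),
                key=lambda r: r.selection_count)
print(preferred.terminal_id)  # '10B'
```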
Referring back to
The acquisition unit 40 is a processing unit that acquires various kinds of information. As illustrated in
The detection unit 41 detects various kinds of information via the sensor 20. For example, the detection unit 41 detects a voice spoken by the user via the voice input sensor 20A that is one example of the sensor 20. Further, the detection unit 41 may detect, via the image input sensor 20B, the acceleration sensor, the infrared sensor, or the like, various kinds of information on the user, such as facial information on the user, the orientation, inclination, motion, or moving speed of the body of the user, or a movement of the user. In other words, the detection unit 41 may detect, as a context, various physical amounts, such as location information, acceleration, temperature, gravity, rotation (angular velocity), illuminance, geomagnetism, pressure, adjacency, humidity, or a rotation vector.
The registration unit 42 receives registration from the user via the input unit 21. For example, the registration unit 42 receives registration of a user profile (attribute information) from the user via the touch panel or the keyboard.
Further, the registration unit 42 may receive registration of a schedule or the like of the user. For example, the registration unit 42 receives registration of a schedule from the user by using an application function that is incorporated in the response processing apparatus 100.
The receiving unit 43 receives various kinds of information. For example, if the attribute information or the schedule information on the user is registered in an external service or the like instead of the response processing apparatus 100, the receiving unit 43 receives the attribute information, the schedule, or the like on the user from the external server 200.
Further, the receiving unit 43 may receive a context related to communication. For example, the receiving unit 43 may receive, as the context, a connection condition between the response processing apparatus 100 and various devices (a server on the network, a household appliance in the home, or the like). The connection condition with the various devices may be information indicating whether mutual communication is established, a communication standard used for the communication, or the like, for example.
The acquisition unit 40 controls each of the processing units as described above and acquires various kinds of information. For example, the acquisition unit 40 acquires, from the user, input information that is information used as a trigger to cause the terminal 10 to generate a response.
For example, the acquisition unit 40 acquires voice information spoken by the user as the input information. Specifically, the acquisition unit 40 acquires a speech of the user, such as “let me know weather”, and acquires, as the input information, a certain purpose included in the speech.
Alternatively, the acquisition unit 40 may acquire, as the input information, detection information on a detected behavior of the user. The detection information is information that is detected by the detection unit 41 via the sensor 20. Specifically, the detection information is a behavior of the user, such as information indicating that the user has viewed the camera of the response processing apparatus 100 or information indicating that the user has moved from a certain room to an entrance in the home, which may be used as a trigger to cause the response processing apparatus 100 to generate a response.
Further, the acquisition unit 40 may acquire information on various contexts. The context is information indicating various situations in which the response processing apparatus 100 generates a response. Meanwhile, the context includes “information indicating a situation of the user”, such as the behavior information indicating that the user has viewed the response processing apparatus 100, and therefore, the context may be used as the input information.
For example, the acquisition unit 40 may acquire, as the context, the attribute information on the user that is registered in advance by the user. Specifically, the acquisition unit 40 acquires information, such as gender, an age, or a domicile, on the user. Furthermore, the acquisition unit 40 may acquire, as the attribute information, information indicating characteristics of the user, such as visual impairment of the user. Moreover, the acquisition unit 40 may acquire, as the context, information on a hobby, a preference, or the like of the user on the basis of the use history or the like of the response processing apparatus 100.
Furthermore, the acquisition unit 40 may acquire, as the context, location information indicating the location of the user. The location information may be information indicating a position, such as specific longitude and latitude, or information indicating a room in which the user is present in the home. For example, the location information may be information indicating a location of the user, such as whether the user is in a living room, a bedroom, or a child room in the home. Alternatively, the location information may be information on a specific place indicating an outing place in which the user is present. Furthermore, the information indicating the outing place in which the user is present may include information indicating a situation about whether the user is on a train, whether the user is driving a vehicle, or whether the user is in a school or an office. The acquisition unit 40 may acquire the information as described above by, for example, performing mutual communication with a mobile terminal, such as a smartphone, carried by the user.
Moreover, the acquisition unit 40 may acquire, as the context, estimation information on an estimated behavior or emotion of the user.
For example, the acquisition unit 40 acquires, as the context, behavior prediction information that is information estimated from a behavior of the user and that is information indicating a predicted future behavior of the user. Specifically, the acquisition unit 40 acquires behavior prediction information indicating that “the user is going out” as information that is estimated from a behavior indicating that the user has moved from a certain room to an entrance in the home. For example, if the acquisition unit 40 acquires the behavior prediction information indicating that “the user is going out”, the acquisition unit 40 acquires a context that is tagged with “outing” on the basis of the information.
Furthermore, the acquisition unit 40 may acquire, as the behavior of the user, schedule information that is registered in advance by the user. Specifically, the acquisition unit 40 acquires schedule information that is registered with a scheduled time within a predetermined period from a time at which the user provides a voice (for example, within 1 day or the like). With this configuration, the acquisition unit 40 is able to estimate information or the like indicating that the user is going out at a certain time.
Moreover, the acquisition unit 40 may estimate a situation or an emotion of the user by detecting a moving speed of the user captured by the sensor 20, a location of the user, a speech speed of the user, or the like. For example, the acquisition unit 40 may estimate a situation or an emotion indicating that “the user is in a hurry” when a speech speed that is faster than the normal speech speed of the user is observed. For example, the response processing apparatus 100 is able to perform adjustment to output a shortened response if the context indicating that the user is in a hurry as compared to a normal state is acquired.
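A minimal sketch of this speech-speed heuristic, assuming speed is measured in words (or morae) per second and that a fixed ratio against the user's normal speed marks a hurry (the threshold is an arbitrary illustrative value):

```python
def estimate_hurry(current_speed: float, normal_speed: float,
                   ratio: float = 1.2) -> bool:
    # A speech noticeably faster than the user's normal speed suggests that
    # the user is in a hurry, so the apparatus may output a shortened response.
    return current_speed > ratio * normal_speed

print(estimate_hurry(current_speed=6.5, normal_speed=5.0))  # True
```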
Meanwhile, the contexts as described above are mere examples, and any kind of information indicating a situation in which the user and the response processing apparatus 100 are present may be used as the context. For example, the acquisition unit 40 may acquire, as the context, various physical amounts, such as location information, acceleration, temperature, gravity, rotation (angular velocity), illuminance, geomagnetism, pressure, adjacency, humidity, or rotation vector, of the response processing apparatus 100 acquired via the sensor 20. Furthermore, the acquisition unit 40 may acquire, as the context, a connection condition (for example, information on establishment of communication, or a communication standard being used) with various devices by using a built-in communication function.
Moreover, the context may include information on a dialogue between the user and a different user or between the user and the response processing apparatus 100. For example, the context may include dialogue context information indicating a context of a dialogue made by the user, a domain of the dialogue (weather, news, train operation information, or the like), a purpose of a speech of the user, the attribute information, or the like.
Furthermore, the context may include date/time information indicating a date and time at which the dialogue is performed. Specifically, the date/time information is information indicating a date, a time, a day, a public holiday characteristic (Christmas or the like), a time of day (morning, daytime, night, midnight), or the like.
Moreover, the acquisition unit 40 may acquire, as the context, various kinds of information indicating a situation of the user, such as information on a specific household task performed by the user, information on a content of a television program being viewed or what the user is eating, or information indicating that the user is making a conversation with a specific person.
Furthermore, by mutual communication with a household appliance (an IoT device or the like) that is installed in the home, the acquisition unit 40 may acquire information on which household appliance is activated (for example, whether the power is turned on or off) and on a type of a process that is performed by a certain household appliance.
Moreover, by mutual communication with an external service, the acquisition unit 40 may acquire, as the context, traffic conditions, weather information, or the like in the living area of the user. The acquisition unit 40 stores each piece of acquired information in the user information table 31 or the like. Furthermore, the acquisition unit 40 may refer to the user information table 31 or the terminal information table 32 and appropriately acquire information needed for a process.
Next, the selection unit 50 will be described. As illustrated in
Meanwhile, the selection unit 50 may select the terminal 10 that generates the response corresponding to the input information from among the plurality of terminals 10 when determining that it is difficult for the response processing apparatus 100 to generate a response to the input information. In other words, the selection unit 50 may cause the subject apparatus to generate the response if the subject apparatus is able to generate the response. With this configuration, the selection unit 50 is able to promptly cope with a dialogue that can be processed by the subject apparatus.
Furthermore, the selection unit 50 may determine whether each of the terminals 10 is able to generate the response corresponding to the input information, and select, as the terminal 10 that generates the response corresponding to the input information, the certain terminal 10 other than the terminal 10 that is determined as being not able to generate the response corresponding to the input information. In other words, the selection unit 50 may refer to the terminal information table 32 or the function table 33, and select the terminal 10 that is expected to be able to generate the response. With this configuration, the selection unit 50 is able to save time and effort to randomly transmit a request to all of the terminals 10.
Moreover, the selection unit 50 may select the plurality of terminals 10 as the terminals 10 that generate the responses corresponding to the input information. In other words, the selection unit 50 may select the plurality of terminals 10 that are able to generate the responses, instead of selecting only the single terminal 10 as the terminal 10 that generates the response. With this configuration, the selection unit 50 is able to diversify the responses that are generated for the query from the user.
The selection unit 50 converts the input information to a mode that can be recognized by each of the selected terminals 10, and transmits the converted input information to the plurality of terminals 10. For example, as illustrated in
For example, the selection unit 50 may transmit an analysis result of a speech of the user by using the API of each of the terminals 10. Furthermore, the selection unit 50 may transmit an analysis result of the speech of the user by using a different method if the API used by the terminal 10 is not available or if the API is unknown.
For example, the selection unit 50 may transmit the input information to the terminal 10 by actually replaying the speech of the user by voice if the terminal 10 is not able to receive information by communication but is only able to receive analog voice input.
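The fallback between API transmission and analog voice replay may be sketched as follows; transmit_request, tts, and play_through_speaker are hypothetical placeholders, with text-to-speech synthesis reduced to a byte encoding for the sake of a runnable example.

```python
def tts(text: str) -> bytes:
    # Placeholder for actual text-to-speech synthesis.
    return text.encode()

def play_through_speaker(audio: bytes) -> None:
    print(f"replaying {len(audio)} bytes of synthesized speech")

def transmit_request(terminal, purpose: str) -> None:
    api = getattr(terminal, "api", None)
    if api is not None:
        api.execute(purpose)              # structured transmission via the API
    else:
        # The terminal accepts only analog voice input, so actually replay
        # the user's request as audible speech.
        play_through_speaker(tts(purpose))

class LegacyTerminal:                     # a terminal with no usable API
    pass

transmit_request(LegacyTerminal(), "let me know a recipe for a dish XXX")
```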
The request analysis unit 51 performs a meaning understanding process on the information acquired by the acquisition unit 40. Specifically, the request analysis unit 51 performs the automatic speech recognition (ASR) process or the natural language understanding (NLU) process on voice information or the like acquired by the acquisition unit 40. For example, the request analysis unit 51 decomposes the acquired voice into morphemes through the ASR or the NLU, or determines a purpose or an attribute that is included in each of the morphemes as elements.
Meanwhile, if the purpose of the user is not understandable as a result of analysis of the input information, the request analysis unit 51 may send a notice of this fact to the output control unit 55. For example, if the analysis result includes information that cannot be estimated from the speech of the user, the request analysis unit 51 sends the content to the output control unit 55. In this case, the output control unit 55 may generate a response that requests the user to speak again clearly about the unclear information.
The state estimation unit 52 estimates a state of the user on the basis of the context acquired by the acquisition unit 40. The selection unit 50 may select the terminal 10 on the basis of the information that is estimated by the state estimation unit 52. For example, if the state estimation unit 52 estimates that the user is present near an entrance at a timing at which the user speaks, the selection unit 50 may preferentially select the terminal 10 that is installed near the entrance as the terminal 10 that generates a response.
The output control unit 55 controls output of the response that corresponds to the input information and that is generated by the terminal 10 selected by the selection unit 50.
For example, if the plurality of terminals 10 generate the responses, the output control unit 55 determines a response to be output to the user on the basis of information on comparison among the responses generated by the plurality of terminals 10.
As one example, the output control unit 55 determines a response to be output to the user on the basis of an information amount or a type of each of the responses generated by the plurality of terminals 10.
For example, as illustrated in
Meanwhile, the output control unit 55 may determine a response to be output among the responses, by using, as a determination criterion, the fact that reliability of source information is high (for example, the service is used by a large number of users, or the like) or the fact that the service is more preferred by the user.
Meanwhile, the output control unit 55 may generate the response to be output to the user by combining the plurality of responses that are generated by the plurality of terminals 10.
For example, as illustrated in
Meanwhile, the output control unit 55 may flexibly combine pieces of information by selecting information so as to match a predetermined replay time for example, instead of combining all kinds of acquired weather information.
Furthermore, the output control unit 55 may determine a mode of output of the response generated by the selected terminal 10, on the basis of the context.
As one example, the output control unit 55 may determine a type of a response to be output to the user or an output destination where the response is to be output, in accordance with the attribute information on the user. For example, when the weather information is to be output, and if the attribute of the user is a “child”, the output control unit 55 may select “image output” by which the weather can be recognized at a glance, instead of voice output that may include a word or the like that may be hard to understand. In this case, the output control unit 55 may select the terminal 10 that performs output in accordance with the type (image information) of the information to be output. Specifically, the output control unit 55 selects the terminal 10 that can perform image display, and causes the terminal 10 to output the response.
Furthermore, if the attribute of the user includes “visual impairment”, the output control unit 55 may give priority to voice output instead of image output. Moreover, if it is expected that the user may have difficulty in understanding the response of the terminal 10, the output control unit 55 may add a voice to be output to the user. For example, it is assumed that the user issues a request to “reduce room temperature” to the response processing apparatus 100. In this case, the response processing apparatus 100 transmits the request of the user to the terminal 10 that is an air conditioner, and causes the terminal 10 to provide a response to the user. In this case, the terminal 10 provides a response of “reduction of set room temperature”. At this time, if the attribute of the user includes “visual impairment”, the user is not able to understand an operating condition even when the user views the air conditioner, and therefore, the response processing apparatus 100 outputs specific information that can easily be understood by the user, such as “temperature of the air conditioner is reduced by two degrees”, by voice. With this configuration, the user is able to perceive, via the response processing apparatus 100, operation that is performed by the terminal 10. Furthermore, if the attribute of the user includes “auditory impairment”, the response processing apparatus 100 may perform various kinds of output in accordance with the user, such as displaying, on a screen, of a reaction sound that occurs at the time of operation of the terminal 10.
Moreover, the output control unit 55 may determine the terminal 10 that outputs the response to the user, on the basis of a positional relationship between the user and at least one of the terminals 10.
For example, the output control unit 55 may cause the terminal 10, which is located closest to the user at the time the response is generated, to output the response. For example, if the position of the terminal 10 that has generated the response is distant from the position of the user, and if a different terminal 10 is located near the user, the output control unit 55 acquires the response from the terminal 10 that has generated the response and transmits the acquired response to the terminal 10 that is located near the user. Then, the output control unit 55 causes the different terminal 10 located near the user to output the response. With this configuration, the user is able to more accurately perceive the response.
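A sketch of this routing, assuming the installation positions are known as two-dimensional coordinates and that plain Euclidean distance is an adequate proxy for "near the user":

```python
import math

def nearest_terminal(user_pos: tuple[float, float],
                     terminal_pos: dict[str, tuple[float, float]]) -> str:
    # The terminal installed closest to the user outputs the response.
    return min(terminal_pos,
               key=lambda tid: math.dist(user_pos, terminal_pos[tid]))

positions = {"10A": (0.0, 0.0), "10B": (5.0, 1.0), "10C": (9.0, 4.0)}
print(nearest_terminal((6.0, 2.0), positions))  # '10B' outputs the response
```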
Moreover, if the estimation information on an estimated behavior or an estimated emotion of the user is acquired as the context, the output control unit 55 may determine a type of a response to be output to the user, a mode of the response, or the terminal 10 that outputs the response, on the basis of the estimation information.
For example, if the user speaks that “let me know today's weather” at a speech speed that is faster than usual, the output control unit 55 estimates that the user is “in a hurry” as compared to a normal state. In this case, the output control unit 55 outputs a response that can convey the weather information to the user in the shortest replay time among the responses obtained from the plurality of terminals 10, for example. Alternatively, the output control unit 55 may change the mode of output such that the weather information is output at a faster speed than usual.
Furthermore, if the user speaks that “let me know today's weather” while moving from a living room to an entrance, the output control unit 55 estimates a behavior that the user is moving to the entrance. In this case, the output control unit 55 may cause the terminal 10 that is installed at the entrance to output the weather information such that the information can easily be perceived by the user.
One example of a relationship between the behavior of the user and the response will be described below with use of an example of the behavior of the user as illustrated in
Furthermore, in the example in
When outputting the generated response, the output control unit 55 converts the response to an output format corresponding to each of the terminals 10, for example, and transmits the converted information to the terminal 10. For example, the output control unit 55 converts a text response included in the response to voice data corresponding to the terminal 10 that serves as the output destination. Alternatively, the output control unit 55 converts a response including image information that is generated or acquired by any of the terminals 10 to image data corresponding to the terminal 10 that serves as the output destination.
The output unit 60 is a mechanism for outputting various kinds of information. For example, the output unit 60 is a speaker or a display. For example, if the output control unit 55 outputs a response, the output unit 60 outputs, to the user by voice, a name or the like of the terminal 10 that serves as the output destination. Further, the output unit 60 may output image data on the display. Furthermore, if the response processing apparatus 100 generates a response by the subject apparatus, the output unit 60 outputs the generated response by voice, by image, or the like. Meanwhile, the output unit 60 may output the response in various modes, such as by performing character recognition on the generated voice data and displaying the characters on the display.
1-3. Flow of Response Process According to First Embodiment
Next, a flow of the response process according to the first embodiment will be described with reference to
As illustrated in
In contrast, if the input information is received (Step S101; Yes), the response processing apparatus 100 analyzes the input information (Step S102). Specifically, the response processing apparatus 100 analyzes the input information and acquires a purpose of the user, an attribute of the speech, or the like that is included in the input information.
Subsequently, the response processing apparatus 100 determines whether it is possible to execute the request of the user by the subject apparatus (Step S103). If it is possible to execute the request of the user by the subject apparatus (Step S103; Yes), the response processing apparatus 100 further determines whether it is possible to execute the request of the user by the cooperating terminal 10 (Step S104).
If it is possible to cope with the request of the user by the cooperating terminal 10 (Step S104; Yes), or if it is difficult to execute the request of the user by the subject apparatus at Step S103 (Step S103; No), the response processing apparatus 100 selects the terminal 10 to which the request is to be transmitted (Step S105). As described above, the response processing apparatus 100 may select the single terminal 10 or select the plurality of terminals 10.
At this time, the response processing apparatus 100 determines whether the terminal 10 serving as a transmission destination has an API for transmitting the request (Step S106). If the terminal 10 does not have the API (Step S106; No), the response processing apparatus 100 transmits the request in a certain mode corresponding to the terminal 10 (Step S107). For example, the response processing apparatus 100 converts text representing the request to an analog voice, and outputs the converted voice to the terminal 10 in order to transmit (send) the request of the user. In contrast, if the terminal 10 has the API (Step S106; Yes), the response processing apparatus 100 gives an instruction to execute the request by the API (Step S108).
Thereafter, the response processing apparatus 100 acquires an execution result obtained by executing a process on the request of the user by each of the terminals 10 (Step S109). For example, the response processing apparatus 100 acquires an execution result of a search process or the like corresponding to a query spoken by the user from each of the terminals 10.
Meanwhile, at Step S104, if it is difficult to execute the request of the user by the cooperating terminal 10 (Step S104; No), the response processing apparatus 100 performs a process of responding to the request by the subject apparatus (Step S110).
The response processing apparatus 100 that has acquired the execution result determines a mode of output of the response to the user (Step S111). For example, the response processing apparatus 100 determines a response to be output or determines the terminal 10 serving as an output destination that outputs the response, in accordance with the context or the like of the user.
The response processing apparatus 100 causes the terminal 10 serving as the output destination to output the response in the mode that is determined at Step S111 (Step S112). Alternatively, the response processing apparatus 100 outputs the response from the subject apparatus.
The response processing apparatus 100 that has output the response to the user determines whether a dialogue process with the user is terminated (Step S113). Specifically, the response processing apparatus 100 determines whether a single session related to the dialogue with the user is terminated.
If the dialogue process is not terminated (Step S113; No), the response processing apparatus 100 returns the process to Step S101 and continues the dialogue process. In contrast, if it is determined that the dialogue process is terminated (Step S113; Yes), the response processing apparatus 100 terminates the process.
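The flow of Steps S101 to S113 can be condensed into the following runnable toy sketch; the Terminal class, the skill sets, and the naive selection and output policies are simplifying assumptions, not the disclosed implementation.

```python
# Toy rendering of the response flow (step numbers noted in comments).
class Terminal:
    def __init__(self, name, skills, has_api):
        self.name, self.skills, self.has_api = name, skills, has_api

    def can_execute(self, request):
        return request in self.skills

    def execute(self, request):
        # S106-S109: use the API if one exists, otherwise fall back to voice.
        how = "API" if self.has_api else "voice"
        return f"{self.name} handled '{request}' via {how}"

def respond(request, self_skills, terminals):
    self_ok = request in self_skills                              # S103
    capable = [t for t in terminals if t.can_execute(request)]    # S104
    if self_ok and not capable:
        results = [f"subject apparatus handled '{request}'"]      # S110
    else:
        selected = capable or terminals                           # S105
        results = [t.execute(request) for t in selected]          # S106-S109
    return results[0]                                             # S111-S112

terminals = [Terminal("terminal_10A", {"weather"}, has_api=True),
             Terminal("terminal_10B", {"recipe"}, has_api=False)]
print(respond("weather", {"alarm"}, terminals))
```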
1-4. Modification of First Embodiment
The response process according to the first embodiment as described above may include various modifications. A modification of the first embodiment will be described below.
For example, the response processing apparatus 100 may periodically update the information that is stored in the terminal information table 32 or the function table 33. For example, the functions of the terminal 10 may be expanded via the network in some cases. Specifically, the terminal 10 having a "translation" function may be updated to support a language that it previously did not support.
In this case, the response processing apparatus 100 receives information indicating that the update has been performed from the cooperating terminal 10, and updates the information that is stored in the terminal information table 32 or the function table 33 on the basis of the received information. With this configuration, the user is able to enjoy the latest functions without being aware of updates to the functions of each of the terminals 10.
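A minimal sketch of such an update notification handler might look as follows; the table layout and the message format are assumptions made for illustration.

```python
# Function table keyed by terminal; values map a function name to the
# languages (or options) it supports.
function_table = {
    "terminal_10A": {"translation": {"en", "ja"}},
}

def on_update_notification(message: dict) -> None:
    """Merge newly supported functions reported by a cooperating terminal."""
    entry = function_table.setdefault(message["terminal"], {})
    for function, options in message["functions"].items():
        entry.setdefault(function, set()).update(options)

# Example: terminal 10A reports that it now also handles German translation.
on_update_notification({"terminal": "terminal_10A",
                        "functions": {"translation": {"de"}}})
print(function_table)  # the 'translation' entry now also contains 'de'
```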
Furthermore, the response processing apparatus 100 may periodically transmit the activation word to each of the terminals 10, and check whether each of the terminals 10 normally operates.
Moreover, the response processing apparatus 100 may prevent the cooperating terminal 10 from giving a voice reply or the like. For example, if the user speaks to the response processing apparatus 100, the terminal 10 located nearby also detects the speech. In this case, the terminal 10 may provide a voice reply before the response processing apparatus 100 generates a response. Therefore, the response processing apparatus 100 may control a reply process to prevent the terminal 10 from giving a reply in advance of the subject apparatus.
Furthermore, when transmitting the request of the user to the plurality of terminals 10, the response processing apparatus 100 may simultaneously transmit the request to the plurality of terminals 10 by, for example, separating the voice bands to be used. With this configuration, the response processing apparatus 100 is able to promptly send the request of the user to the plurality of terminals 10. Moreover, when transmitting the request, the response processing apparatus 100 may refrain from using a voice in the audible range as long as the terminal 10 serving as the transmission destination is able to perform the process. Furthermore, the response processing apparatus 100 may detect the frequency of a surrounding noise or a human voice, select a TTS voice whose frequency is different from the frequency of the noise, and output the voice.
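For the last point, one conceivable selection rule is sketched below; the voice list, the fundamental frequencies, and the peak-detection stub are illustrative assumptions.

```python
# Hypothetical TTS voices and their fundamental frequencies in Hz.
TTS_VOICES = {"low": 110.0, "mid": 220.0, "high": 440.0}

def dominant_noise_hz(samples):
    """Stand-in for a real spectral peak detector (e.g. an FFT peak picker)."""
    return max(samples) if samples else 0.0

def pick_voice(noise_samples):
    """Choose the voice whose fundamental is farthest from the noise peak."""
    noise = dominant_noise_hz(noise_samples)
    return max(TTS_VOICES, key=lambda name: abs(TTS_VOICES[name] - noise))

print(pick_voice([200.0, 230.0, 215.0]))  # noise near 230 Hz -> 'high'
```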
Moreover, the response processing apparatus 100 may acquire, from the user, a reaction to a certain response that has been output in the past, and determine a type of a response to be output to the user, a mode of the response, or an output destination that outputs the response. In other words, the response processing apparatus 100 may perform a learning process on the basis of the reaction of the user.
For example, when a certain response is output to the user who has issued a query about a certain kind of information, the response processing apparatus 100 may receive a reaction, such as “let me know other information”, from the user. In this case, the response processing apparatus 100 determines that the information that has been output in the past is not information that is desired by the user. In contrast, if the user accepts subsequently output information, the response processing apparatus 100 determines that the information is information desired by the user.
In this case, when receiving the same query from the user next time or later, the response processing apparatus 100 may preferentially select the terminal 10 that is able to generate a response as desired by the user. Furthermore, for example, if the user tends to request a certain terminal 10 to perform output (if the frequency with which the user specifies the specific terminal 10 as the output destination is statistically high), the response processing apparatus 100 may perform adjustment such that the certain terminal 10 preferentially outputs the response. In this manner, the response processing apparatus 100 is able to perform a response process that more appropriately copes with the request of the user, by performing learning on the basis of the instructions or the operation history of the user.
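The following is a minimal sketch of this kind of preference learning; the reaction labels and the counting scheme are hypothetical simplifications.

```python
from collections import Counter

# How often the user accepted a response generated by each terminal.
accept_counts: Counter = Counter()

def record_reaction(terminal_id: str, reaction: str) -> None:
    """Count acceptances; 'let me know other information' would not count."""
    if reaction == "accepted":
        accept_counts[terminal_id] += 1

def preferred_terminal(candidates):
    """Among capable terminals, pick the one accepted most often so far."""
    return max(candidates, key=lambda t: accept_counts[t])

record_reaction("terminal_10A", "accepted")
record_reaction("terminal_10A", "accepted")
record_reaction("terminal_10B", "accepted")
print(preferred_terminal(["terminal_10A", "terminal_10B"]))  # terminal_10A
```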
Moreover, for example, if biometric authentication (face authentication, fingerprint authentication, or the like) of the user, rather than the activation word, is needed to activate the terminal 10, the response processing apparatus 100 may output a voice that notifies the user of this fact. Furthermore, the response processing apparatus 100 may notify the user of the position of or information on the terminal 10 that is not activated, and request the user to activate the terminal 10.
Moreover, the response processing apparatus 100 may select the plurality of terminals 10 as output destinations. In this case, the response processing apparatus 100 may change the output destination in accordance with the type of information to be output as the response, such that, for example, a voice is output by the terminal 10A and an image is output by the terminal 10B. Furthermore, the response processing apparatus 100 may flexibly determine the output destination by, for example, simultaneously displaying information on both a projector capable of displaying the information on a relatively large screen and a smart speaker with a monitor capable of displaying the information in a small size. Moreover, the response processing apparatus 100 may perform an output process in accordance with the surrounding context by, for example, displaying the information on the projector in a dark surrounding environment, and displaying the information on a smart television in a bright surrounding environment.
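One way to express such context-dependent routing is sketched below; the device capabilities and the brightness threshold are assumptions for illustration only.

```python
# Hypothetical device catalog: modality each device outputs, and whether an
# image device is preferred in a dark room (projector) or a bright one (TV).
DEVICES = {
    "projector":     {"modality": "image", "prefers_dark": True},
    "smart_tv":      {"modality": "image", "prefers_dark": False},
    "smart_speaker": {"modality": "voice", "prefers_dark": None},
}

def route_output(parts: dict, ambient_lux: float) -> dict:
    """Assign each response part (voice/image) to a suitable device."""
    dark = ambient_lux < 50  # assumed threshold for a 'dark' environment
    routing = {}
    for modality in parts:
        for name, spec in DEVICES.items():
            if spec["modality"] != modality:
                continue
            if modality == "image" and spec["prefers_dark"] != dark:
                continue
            routing[modality] = name
            break
    return routing

print(route_output({"voice": "It is sunny.", "image": "weather_map.png"}, 20.0))
# -> {'voice': 'smart_speaker', 'image': 'projector'}
```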
Furthermore, when transmitting the request of the user to each of the terminals 10, the response processing apparatus 100 may transmit the input information (voice or the like) received from the user as it is, without extracting the purpose of the user. Moreover, in this case, the response processing apparatus 100 may perform character recognition on the voice of the user and convert it to a text indicating the request. For example, it is assumed that the user has issued a request of "let me show an almanac of a next month" to the response processing apparatus 100. In this case, for example, if it is determined that each of the terminals 10 may have difficulty in recognizing the "almanac", the response processing apparatus 100 may refer to a synonym dictionary or a thesaurus and convert the request of the user to a mode that can be recognized by each of the terminals 10. For example, the response processing apparatus 100 may convert the request of the user to "let me show a calendar of a next month" and transmit the converted information to the terminal 10A. Furthermore, the response processing apparatus 100 may convert the request of the user to "let me show a schedule of a next month" and transmit the converted information to the terminal 10B. In this manner, the response processing apparatus 100 may perform, as the front end device, various adjustment processes such that the request is smoothly issued to each of the terminals 10. With this configuration, the user is able to issue a request without regard to the phrases that can be recognized by each of the terminals 10.
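A minimal sketch of this per-terminal rephrasing follows; the synonym table and terminal vocabularies are hypothetical stand-ins for a real synonym dictionary or thesaurus.

```python
SYNONYMS = {"almanac": ["calendar", "schedule"]}
VOCABULARY = {
    "terminal_10A": {"show", "calendar", "next", "month"},
    "terminal_10B": {"show", "schedule", "next", "month"},
}

def rephrase(request: str, terminal_id: str) -> str:
    """Replace words a terminal cannot recognize with a synonym it knows."""
    known = VOCABULARY[terminal_id]
    words = []
    for word in request.split():
        if word in known or word not in SYNONYMS:
            words.append(word)
        else:
            words.append(next((s for s in SYNONYMS[word] if s in known), word))
    return " ".join(words)

print(rephrase("show almanac of next month", "terminal_10A"))  # 'calendar'
print(rephrase("show almanac of next month", "terminal_10B"))  # 'schedule'
```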
Moreover, the response processing apparatus 100 may set a priority or the like for the terminals 10 that provide a response to the request of the user. In this case, the response processing apparatus 100 preferentially selects the terminal 10 with a higher priority as the terminal 10 that generates the response. For example, with regard to an external service that is used when a certain function is executed, the response processing apparatus 100 may set a higher priority to the terminal 10 that uses a non-chargeable service. With this configuration, the response processing apparatus 100 is able to prevent the user from being unexpectedly charged for use of a service or the like.
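Expressed as code, such priority-based selection might look like this minimal sketch; the priority values and the free/chargeable flag are assumptions.

```python
TERMINALS = [
    {"name": "terminal_10A", "service_is_free": True},
    {"name": "terminal_10B", "service_is_free": False},
]

def priority(terminal: dict) -> int:
    # Higher priority for terminals backed by a non-chargeable service.
    return 2 if terminal["service_is_free"] else 1

def select_responder(capable_terminals):
    """Pick the capable terminal with the highest priority."""
    return max(capable_terminals, key=priority)

print(select_responder(TERMINALS)["name"])  # terminal_10A
```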
2-1. Example of Response Process According to Second Embodiment
A second embodiment will be described below. In the second embodiment, an example will be described in which an information device that behaves as the response processing apparatus according to the present disclosure is changed in accordance with a situation of the user. Meanwhile, in the second embodiment, reference symbols, such as a response processing apparatus 100A and a response processing apparatus 100B, are assigned for distinguishing the apparatuses, but functional configurations of the response processing apparatus 100A and the response processing apparatus 100B are the same as that of the response processing apparatus 100 described in the first embodiment. Further, in the description below, the response processing apparatus 100A and the response processing apparatus 100B will be collectively referred to as the “response processing apparatus 100” when they need not be distinguished from each other.
In the example in
At this time, the response processing apparatus 100A transfers the function as the response processing apparatus according to the present disclosure to the response processing apparatus 100B that is one example of the terminal 10 located in the home. For example, the response processing apparatus 100A performs the transfer of the function of the response processing apparatus according to the present disclosure in accordance with a database DB11 as illustrated in
In the example in
Thereafter, the user inputs a request including a content of “let me know today's weather” to the response processing apparatus 100B that newly functions as the response processing apparatus according to the present disclosure. The response processing apparatus 100B acquires a voice A41 including the content of “let me know today's weather” (Step S52). If the response processing apparatus 100B determines that the purpose of the user is to “search for today's weather information” through the ASR process or the NLU process on the voice A41, the response processing apparatus 100B transmits the purpose of the user to each of the terminals 10.
The response processing apparatus 100B transmits, to the terminal 10A, information in a certain format corresponding to the terminal 10A (Step S53). Further, the response processing apparatus 100B transmits, to the terminal 10C, information in a certain format corresponding to the terminal 10C (Step S54).
The terminal 10A transmits, as a response, the retrieved weather information to the response processing apparatus 100B (Step S55). Similarly, the terminal 10C transmits, as a response, the retrieved weather information to the response processing apparatus 100B (Step S56).
Thereafter, the response processing apparatus 100B outputs the response to the user in accordance with the configuration included in the subject apparatus. For example, if the response processing apparatus 100B is an apparatus that does not output Japanese by voice, the response processing apparatus 100B outputs the response by giving an expression indicating fine weather (a joyful emotional expression or the like). In other words, the response processing apparatus 100B converts the output mode in accordance with the configuration included in the subject apparatus, and performs output to the user. Meanwhile, similarly to the first embodiment, the response processing apparatus 100B may cause the terminal 10A or the terminal 10C to output the response.
In this manner, the response processing apparatus 100A and the response processing apparatus 100B may transfer the functions as the response processing apparatus according to the present disclosure to the terminal 10 or the like. The terminal 10 to which the functions as the response processing apparatus according to the present disclosure are transferred subsequently behaves as the response processing apparatus according to the present disclosure. With this configuration, the user is able to perform the response process according to the present disclosure using the alternative terminal 10 even if the user loses sight of the response processing apparatus or the user leaves the response processing apparatus in a different place. In other words, the response processing apparatus according to the present disclosure is not limited to a specific apparatus, but may be any of the cooperating terminals 10.
2-2. Modification of Second Embodiment
The response processing apparatus 100 may set information, such as a priority, with respect to the terminal 10 that serves as a transfer destination. In this case, the response processing apparatus 100 may preferentially transfer the functions as the response processing apparatus according to the present disclosure to the terminal 10 with a higher priority. For example, the response processing apparatus 100 may set a higher priority to the terminal 10 that has higher information processing performance or the terminal 10 that has a larger number of functions.
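A minimal sketch of such a transfer-destination choice follows; scoring by processing performance and function count mirrors the examples above, while the weights and fields are assumptions.

```python
candidates = [
    {"name": "terminal_10A", "cpu_score": 3, "num_functions": 5},
    {"name": "terminal_10B", "cpu_score": 8, "num_functions": 12},
]

def transfer_priority(terminal: dict) -> float:
    # Assumed weighting of performance versus number of functions.
    return terminal["cpu_score"] + 0.5 * terminal["num_functions"]

new_front_end = max(candidates, key=transfer_priority)
print(f"transfer the agent role to {new_front_end['name']}")  # terminal_10B
```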
A third embodiment will be described below. In the third embodiment, an example is described in which the response processing apparatus 100 responds to a request of the user by combining functions included in each of the terminals 10.
In the example in
The response processing apparatus 100 refers to the database DB21 and recognizes that a "photograph", a "text", and a "video" are to be collected as collection data in order to fulfill the request of a "diary". Further, the response processing apparatus 100 recognizes that each piece of data is to be collected from a photograph application of the "terminal 10A", dialogue data accumulated by the "terminal 10C", video data uploaded to the network via the "terminal 10B", or the like.
Furthermore, the response processing apparatus 100 transmits, to each of the terminals 10, a request to acquire each piece of data needed to write the diary (Step S62, Step S63, and Step S64). Moreover, the response processing apparatus 100 acquires the data transmitted from each of the terminals 10 (Step S65, Step S66, and Step S67).
Furthermore, the response processing apparatus 100 responds to the request of the user by combining the pieces of acquired data. Specifically, the response processing apparatus 100 stores a diary of the day of the user by combining images that are captured within a predetermined time (for example, 24 hours), dialogues with the user, videos, and the like. The response processing apparatus 100 outputs a voice A52 including a content of “certainly” if the request is completed.
In this manner, the response processing apparatus 100 may respond to the request of the user by combining pieces of data that can be collected by each of the terminals 10. With this configuration, the response processing apparatus 100 is able to respond to a complicated request that is issued by the user and that can hardly be executed by a single apparatus. For example, it is assumed that the response processing apparatus 100 receives a request to “make a travel plan” from the user. In this case, the response processing apparatus 100 causes each of the terminals 10 to perform a process of “searching for tourist site information”, a process of “making a reservation of transportation”, a process of “making a reservation of accommodation”, and the like. Then, the response processing apparatus 100 responds to the request of the user by combining the pieces of information as described above. In this manner, the response processing apparatus 100 is able to accurately respond to the request of the user by combining a best process or an executable function of each of the terminals 10. Meanwhile, the response processing apparatus 100 may hold, in the subject apparatus, the information that is stored in the database DB21, or may access an external server or the like that holds information to realize the request and acquire the information every time a request of the user is received.
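The combine-and-respond pattern can be sketched as a decompose/dispatch/merge loop; the task map mirrors the "diary" example, and the collection stub is a hypothetical stand-in for Steps S62 to S67.

```python
# Which terminal supplies which piece of the 'diary' request (from DB21).
TASK_MAP = {
    "photograph": "terminal_10A",
    "text":       "terminal_10C",
    "video":      "terminal_10B",
}

def collect(terminal: str, kind: str) -> str:
    """Stand-in for requesting and receiving one piece of data (S62-S67)."""
    return f"{kind} from {terminal}"

def fulfill_diary_request() -> dict:
    pieces = {kind: collect(term, kind) for kind, term in TASK_MAP.items()}
    # Combine the collected pieces into a single response (the 'diary').
    return {"diary": pieces}

print(fulfill_diary_request())
```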
A fourth embodiment will be described below. In the fourth embodiment, an example will be described in which the response processing apparatus 100 outputs a content of a request to a different user with respect to a request for which each of the terminals 10 is not able to provide a response.
In the example in
The terminal 10A provides a reply indicating that “a recipe of a dish YYY” is not retrieved with respect to the transmitted request (Step S74). Similarly, the terminal 10B provides a reply indicating that “a recipe of a dish YYY” is not retrieved (Step S75).
In this case, the response processing apparatus 100 controls a camera or the like included in the subject apparatus, the terminal 10A, or the terminal 10B, and attempts to detect the context of a different user who is present nearby. In the example in the figure, a second user (Mr./Ms. ZZZ) is detected in the vicinity of the terminal 10B.
If the response processing apparatus 100 detects the second user, the response processing apparatus 100 outputs a voice A62 including a content of “Mr./Ms. ZZZ is present nearby, and a request will be sent to Mr./Ms. ZZZ” to the user. Meanwhile, if the user does not want to allow the second user to hear about the request, the user may input this fact to the response processing apparatus 100.
After outputting the voice A62, the response processing apparatus 100 causes the neighboring terminal 10B to output a voice A63 including a content of “Mr./Ms. ZZZ, please input a voice if you know a recipe for a dish YYY” (Step S76). If the second user knows “a recipe for a dish YYY”, the second user inputs a voice to, for example, the terminal 10B (Step S77). Alternatively, the second user inputs information indicating that he/she does not know “a recipe for a dish YYY” to the terminal 10B.
The response processing apparatus 100 outputs the content returned from the second user to the user. In other words, if each of the terminals 10 does not generate a response, the response processing apparatus 100 outputs a query to the second user or the like and acquires a response to the request.
In this manner, if the response processing apparatus 100 determines that it is difficult for any of the response processing apparatus 100 and the plurality of terminals 10 to generate the response corresponding to the input information, the response processing apparatus 100 acquires the context of a different user other than the subject user. Further, the response processing apparatus 100 determines an output destination of an output related to the input information on the basis of the context of the different user. The output related to the input information is, for example, a voice indicating that the agent device is not able to generate the response corresponding to the input information, a voice indicating a request to a different user for a reply to the input information, or the like. Specifically, the response processing apparatus 100 determines the terminal 10B as the output destination of the voice A63 described above.
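The fourth-embodiment fallback can be summarized by the toy sketch below; the camera-based detection result is stubbed, and all names are illustrative.

```python
def detect_nearby_user(device: str):
    """Stand-in for camera-based context detection; returns a user or None."""
    sightings = {"terminal_10B": "Mr./Ms. ZZZ"}  # assumed detection result
    return sightings.get(device)

def fallback_to_human(request: str, devices):
    """If no device can answer, route the query to a nearby second user."""
    for device in devices:
        user = detect_nearby_user(device)
        if user:
            print(f"{user} is present nearby, and a request will be sent.")
            return device, f"{user}, please input a voice if you know {request}"
    return None, None

device, prompt = fallback_to_human("a recipe for a dish YYY",
                                   ["terminal_10A", "terminal_10B"])
print(device, "->", prompt)
```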
As described above, even if a request that the dialogue system is not able to solve is issued, the response processing apparatus 100 is able to detect a different user who is expected to solve the request, by controlling the plurality of terminals 10 and acquiring the context of the different user. With this configuration, the response processing apparatus 100 is able to improve the possibility that a certain response is output to the user, even for a rare query that can hardly be solved by the dialogue system alone or a query that is not recognizable by the agent device. Meanwhile, the response processing apparatus 100 may detect, instead of the different user, an object (for example, a specific tool or a book) that may serve as a reply to the request, even if the object does not cooperate with the response processing apparatus 100. In this case, the response processing apparatus 100 may transmit a content described in the specific tool or the book to the user, or provide the location of the specific tool or the book to the user.
The processes performed in each of the embodiments as described above may be performed in various different modes other than the embodiments as described above.
In each of the embodiments as described above, the examples have been described in which the response processing apparatus 100 is what is called a smart speaker and performs the process in a stand-alone manner. However, the response processing apparatus 100 may perform the response process according to the present disclosure in cooperation with a server apparatus (what is called a cloud server or the like) that is connected via the network.
In this case, the response processing apparatus 100 acquires a voice or a context that is input to a terminal, such as a smart speaker, generates a response on the basis of the acquired information, and transmits the generated response to the terminal. In this case, the terminal, such as the smart speaker, functions as an interface that mainly executes the dialogue process with the user, such as a process of collecting a speech of the user, a process of transmitting the collected speech to the server apparatus, and a process of outputting a reply that is transmitted from the server.
Furthermore, the response processing apparatus according to the present disclosure may be realized in the form of an IC chip or the like that is incorporated in the response processing apparatus 100.
Moreover, in each of the processes described in each of the embodiments as described above, all or part of a process described as being performed automatically may also be performed manually. Alternatively, all or part of a process described as being performed manually may also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various kinds of data and parameters illustrated in the above-described document and drawings may be arbitrarily changed unless otherwise specified. For example, various kinds of information illustrated in each of the drawings are not limited to information as illustrated in the drawings.
Furthermore, the components of the apparatuses illustrated in the drawings are functionally conceptual and do not necessarily have to be physically configured in the manner as illustrated in the drawings. In other words, specific forms of distribution and integration of the apparatuses are not limited to those illustrated in the drawings, and all or part of the apparatuses may be functionally or physically distributed or integrated in arbitrary units depending on various loads or use conditions. For example, the request analysis unit 51 and the state estimation unit 52 may be integrated with each other.
Moreover, the embodiments and the modifications as described above may be combined appropriately as long as the processes do not conflict with each other.
Furthermore, the effects described in this specification are merely examples and are not limitative; other effects may be achieved.
As described above, the response processing apparatus according to the present disclosure (the response processing apparatus 100 according to one embodiment) includes an acquisition unit (the acquisition unit 40 according to one embodiment), a selection unit (the selection unit 50 according to one embodiment), and an output control unit (the output control unit 55 according to one embodiment). The acquisition unit acquires, from a user, input information that is information used as a trigger to cause an information device (the terminal 10 according to one embodiment) to generate a response. The selection unit selects an information device that generates a response corresponding to the input information from among a plurality of information devices. The output control unit controls output of a response that corresponds to the input information and that is generated by the selected information device.
In this manner, the response processing apparatus according to the present disclosure behaves as a front end device for the plurality of information devices, selects the information device that generates the response, and controls the output. With this configuration, when the user uses the plurality of information devices, the response processing apparatus saves the user the time and effort of performing a dialogue with each of the information devices, so that it is possible to improve usability for the user.
Furthermore, the acquisition unit acquires, as the input information, voice information that is spoken by the user. With this configuration, the response processing apparatus is able to perform an appropriate dialogue in accordance with a situation of the user during communication with the user via the voice.
Moreover, the acquisition unit acquires, as the input information, detection information on a detected behavior of the user. With this configuration, the response processing apparatus is able to generate an appropriate response in accordance with the behavior of the user even if the user does not speak.
Furthermore, when determining that the response processing apparatus is not able to generate a response corresponding to the input information, the selection unit selects an information device that generates a response corresponding to the input information from among the plurality of information devices. With this configuration, the response processing apparatus transmits, to the information devices, only a request to which the subject apparatus is not able to respond, so that it is possible to prevent unnecessary communication from being performed, and it is possible to reduce a communication amount and a communication load.
Moreover, the selection unit determines whether each of the information devices is able to generate a response corresponding to the input information, and selects, as an information device that generates a response corresponding to the input information, an information device other than an information device that is determined as not being able to generate a response corresponding to the input information. With this configuration, the response processing apparatus is able to selectively transmit the request to only the information device that is able to cope with the request, so that it is possible to reduce a communication amount and a communication load.
Furthermore, the selection unit selects a plurality of information devices as information devices that generate responses corresponding to the input information. The output control unit determines a response to be output to the user on the basis of information on comparison among the responses generated by the plurality of information devices. With this configuration, the response processing apparatus is able to prepare a plurality of responses to the request of the user, so that it becomes easy to output a response that meets the request of the user.
Moreover, the output control unit determines a response to be output to the user on the basis of an information amount or a type of each of the responses generated by the plurality of information devices. With this configuration, the response processing apparatus is able to select and output a response with a large information amount from among a plurality of responses, so that it becomes easy to output a response that meets the request of the user.
Furthermore, the selection unit selects a plurality of information devices as information devices that generate responses corresponding to the input information. The output control unit synthesizes a plurality of responses generated by the plurality of information devices and generates a response to be output to the user. With this configuration, the response processing apparatus is able to select and generate a response from among pieces of information acquired by the plurality of devices, so that it is possible to issue an accurate response as desired by the user.
Moreover, the selection unit converts the input information to a mode that is recognizable by each of the selected information devices, and transmits the converted input information to the plurality of information devices. With this configuration, the response processing apparatus is able to promptly transmit the request of the user even to the plurality of information devices that have various APIs or input systems.
Furthermore, the acquisition unit acquires the context of the user. The output control unit determines a mode of output of the response generated by the selected information device on the basis of the context. With this configuration, the response processing apparatus is able to flexibly output a response in accordance with the context of the user, so that it is possible to more effectively provide an agent function, such as a dialogue, to the user.
Moreover, the acquisition unit acquires, as the context, attribute information on the user, which is registered in advance by the user. With this configuration, the response processing apparatus is able to generate a response in accordance with the characteristics of each user, such as old age, childhood, or visual impairment.
Furthermore, the output control unit determines a type of a response to be output to the user or an output destination that outputs the response, in accordance with the attribute information on the user. With this configuration, the response processing apparatus is able to select an appropriate output, such as a voice or an image, in accordance with the attribute of the user.
Moreover, the acquisition unit acquires, as the context, location information indicating a location of the user. With this configuration, the response processing apparatus is able to perform a response process with high usability, such as output of a response at a position at which the user is present.
Furthermore, the output control unit determines an information device that outputs a response to the user on the basis of a positional relationship between the user and at least one of the information devices. With this configuration, the response processing apparatus is able to flexibly perform output in accordance with the position of the user, such as output of the response from the information device located at a position close to the user.
Moreover, the acquisition unit acquires, as the context, estimation information on an estimated behavior or an estimated emotion of the user. With this configuration, the response processing apparatus is able to output a response that is suitable for a behavior to be performed by the user.
Furthermore, the output control unit determines a type of a response to be output to the user, a mode of the response, or an information device that outputs the response, on the basis of the estimation information. With this configuration, for example, if it is estimated that the user is in more of a hurry than in a normal state, the response processing apparatus is able to cope with the output more flexibly, such as by outputting a response with a shortened playback time.
Moreover, when determining that any of the response processing apparatus and the plurality of information devices is not able to generate a response corresponding to the input information, the acquisition unit acquires a context of a different user other than the user. The output control unit determines an output destination of an output related to the input information on the basis of the context of the different user. With this configuration, for example, the response processing apparatus is able to send, to a different user, a query about a request that the agent device is not able to cope with, so that it is possible to improve the possibility of responding to the request of the user.
Furthermore, the acquisition unit acquires, from the user, a reaction to a response that has been output in the past. The output control unit determines a type of a response to be output to the user, a mode of the response, or the output destination that outputs the response, on the basis of the reaction acquired from the user. With this configuration, the response processing apparatus is able to reflect a learning result of the past reaction of the user in an output, so that it is possible to more accurately respond to the request of the user.
The information devices, such as the response processing apparatus 100, the terminal 10, and the external server 200, according to each of the embodiments as described above are realized by a computer 1000 configured as illustrated in
The CPU 1100 operates based on a program that is stored in the ROM 1300 or the HDD 1400, and controls each of the units. For example, the CPU 1100 loads a program stored in the ROM 1300 or the HDD 1400 onto the RAM 1200, and performs a process corresponding to various programs.
The ROM 1300 stores therein a boot program, such as a Basic Input Output System (BIOS), which is executed by the CPU 1100 at the time of activation of the computer 1000, a program that depends on the hardware of the computer 1000, and the like.
The HDD 1400 is a computer-readable recording medium that permanently records thereon a program executed by the CPU 1100 and data or the like used by the program. Specifically, the HDD 1400 is a recording medium that records thereon a response processing program according to the present disclosure, which is one example of program data 1450.
The communication interface 1500 is an interface that allows the computer 1000 to connect to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from a different device and transmits data generated by the CPU 1100 to the different device via the communication interface 1500.
The input output interface 1600 is an interface for connecting an input output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device, such as a keyboard or a mouse, via the input output interface 1600. Further, the CPU 1100 transmits data to an output device, such as a display, a speaker, or a printer, via the input output interface 1600. Furthermore, the input output interface 1600 may function as a media interface that reads a program or the like that is stored in a predetermined recording medium (media). Examples of the media include an optical recording medium, such as a digital versatile disk (DVD) or a phase change rewritable disk (PD), a magneto optical recording medium, such as a magneto-optical (MO) disk, a tape medium, a magnetic recording medium, and a semiconductor memory.
For example, if the computer 1000 functions as the response processing apparatus 100 according to the first embodiment, the CPU 1100 of the computer 1000 executes the response processing program loaded on the RAM 1200, and implements the functions of the acquisition unit 40 and the like. Further, the HDD 1400 stores therein the response processing program according to the present disclosure and the data stored in the storage unit 30. Meanwhile, the CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program data 1450, but, as another example, the program may be acquired from a different apparatus via the external network 1550.
Additionally, the present technology may also be configured as below.
(1)
A response processing apparatus comprising:
an acquisition unit that acquires, from a user, input information that is information used as a trigger to cause an information device to generate a response;
a selection unit that selects an information device that generates a response corresponding to the input information from among a plurality of information devices; and
an output control unit that controls output of a response that corresponds to the input information and that is generated by the selected information device.
(2)
The response processing apparatus according to (1), wherein the acquisition unit acquires, as the input information, voice information that is spoken by the user.
(3)
The response processing apparatus according to (1) or (2), wherein the acquisition unit acquires, as the input information, detection information on a detected behavior of the user.
(4)
The response processing apparatus according to any one of (1) to (3), wherein when determining that the response processing apparatus is not able to generate a response corresponding to the input information, the selection unit selects an information device that generates a response corresponding to the input information from among the plurality of information devices.
(5)
The response processing apparatus according to (4), wherein the selection unit determines whether each of the information devices is able to generate a response corresponding to the input information, and selects, as an information device that generates a response corresponding to the input information, an information device other than an information device that is determined as not being able to generate a response corresponding to the input information.
(6)
The response processing apparatus according to (4) or (5), wherein
the selection unit selects a plurality of information devices as information devices that generate responses corresponding to the input information, and
the output control unit determines a response to be output to the user on the basis of information on comparison among the responses generated by the plurality of information devices.
(7)
The response processing apparatus according to (6), wherein the output control unit determines a response to be output to the user on the basis of one of an information amount and a type of each of the responses generated by the plurality of information devices.
(8)
The response processing apparatus according to any one of (4) to (7), wherein
the selection unit selects a plurality of information devices as information devices that generate responses corresponding to the input information, and
the output control unit synthesizes a plurality of responses generated by the plurality of information devices and generates a response to be output to the user.
(9)
The response processing apparatus according to any one of (1) to (8), wherein the selection unit converts the input information to a mode that is recognizable by each of the selected information devices, and transmits the converted input information to the plurality of information devices.
(10)
The response processing apparatus according to any one of (1) to (9), wherein
the acquisition unit acquires a context of the user, and
the output control unit determines a mode of output of a response generated by the selected information device on the basis of the context.
(11)
The response processing apparatus according to (10), wherein the acquisition unit acquires, as the context, attribute information on the user, the attribute information being registered in advance by the user.
(12)
The response processing apparatus according to (11), wherein the output control unit determines one of a type of a response to be output to the user and an output destination that outputs the response, in accordance with the attribute information on the user.
(13)
The response processing apparatus according to any one of (10) to (12), wherein the acquisition unit acquires, as the context, location information indicating a location of the user.
(14)
The response processing apparatus according to (13), wherein the output control unit determines an information device that outputs a response to the user on the basis of a positional relationship between the user and at least one of the information devices.
(15)
The response processing apparatus according to any one of (10) to (14), wherein the acquisition unit acquires, as the context, estimation information on one of an estimated behavior and an estimated emotion of the user.
(16)
The response processing apparatus according to (15), wherein the output control unit determines one of a type of a response to be output to the user, a mode of the response, and an information device that outputs the response, on the basis of the estimation information.
(17)
The response processing apparatus according to any one of (10) to (16), wherein
when determining that any of the response processing apparatus and the plurality of information devices is not able to generate a response corresponding to the input information, the acquisition unit acquires a context of a different user other than the user, and
the output control unit determines an output destination of an output related to the input information on the basis of the context of the different user.
(18)
The response processing apparatus according to any one of (1) to (17), wherein
the acquisition unit acquires, from the user, a reaction to a response that has been output in the past, and
the output control unit determines one of a type of a response to be output to the user, a mode of the response, and an output destination that outputs the response, on the basis of the reaction acquired from the user.
(19)
A response processing method performed by a computer, the response processing method comprising:
acquiring, from a user, input information that is information used as a trigger to cause an information device to generate a response;
selecting an information device that generates a response corresponding to the input information from among a plurality of information devices; and
controlling output of a response that corresponds to the input information and that is generated by the selected information device.
(20)
A response processing program that causes a computer to function as:
an acquisition unit that acquires, from a user, input information that is information used as a trigger to cause an information device to generate a response;
a selection unit that selects an information device that generates a response corresponding to the input information from among a plurality of information devices; and
an output control unit that controls output of a response that corresponds to the input information and that is generated by the selected information device.
1 response processing system
10 terminal
100 response processing apparatus
20 sensor
20A voice input sensor
20B image input sensor
21 input unit
22 communication unit
30 storage unit
31 user information table
32 terminal information table
33 function table
40 acquisition unit
41 detection unit
42 registration unit
43 receiving unit
50 selection unit
51 request analysis unit
52 state estimation unit
55 output control unit
60 output unit
200 external server
Number | Date | Country | Kind |
---|---|---|---|
2018-230404 | Dec 2018 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/040156 | 10/11/2019 | WO | 00 |