The present application claims priority from Japanese patent application JP 2019-032335 filed on Feb. 26, 2019, the content of which is hereby incorporated by reference into this application.
The present invention relates to a response apparatus and a response method for responding to a user.
JP-2005-258820-A discloses a feeling guidance apparatus for enabling an agent to establish communication influential to a person even if a person's mental state is negative. This feeling guidance apparatus includes mentality detection means detecting a mental state of a person by using at least one of a biological information detection sensor and a person's state detection sensor; situation detection means detecting a situation in which the person is put; and mental state determination means determining whether or not the person's mental state is a state in which the person feels unpleasant on the basis of the person's mental state detected by the mentality detection means, the situation in which the person is put detected by the situation detection means, and duration time of the situation in which the person is put. In a case in which the mental state determination means determines that the person's mental state is the state in which the person feels unpleasant, an agent establishes communication in conformity to the person's mental state.
However, with the conventional technique described above, it is impossible to estimate the target to which a user expresses a feeling; as a result, there is a case in which the agent sends an inappropriate response to the user and does not contribute to inducing an action of the user.
An object of the present invention is to achieve an improvement in accuracy for a response to a user.
According to one aspect of the invention disclosed in the present application, a response apparatus includes a processor that executes a program, and a storage device that stores the program, and is connected to an acquisition device that acquires biological data and a display device that displays an image. The processor executes a target identification process that identifies a feeling expression target of a user using the response apparatus on the basis of the biological data on the user acquired by the acquisition device, a feeling identification process that identifies a feeling of the user on the basis of facial image data on the user, a determination process that determines a feeling indicated by the image displayed on the display device on the basis of the feeling expression target identified by the target identification process and the feeling of the user identified by the feeling identification process, and a generation process that generates image data indicating the feeling determined by the determination process to output the image data to the display device.
According to a typical embodiment of the present invention, it is possible to achieve an improvement in accuracy for a response to a user. Objects, configurations, and advantages other than those described above will be readily apparent from the description of embodiments given below.
In this way, in the present embodiment, identifying the target to which the user 101 expresses a feeling enables the interactive robot 102 to send an appropriate response to the user 101 and thereby contribute to inducing the user 101 to exhibit spontaneous behavior.
The microphone 202 is used to input a voice received at the front face 200a of the response apparatus 200. The display device 203 displays an agent 230 that personifies the interactive robot 102. The agent 230 is a facial image (including a facial video) displayed on the display device 203. The speaker 204 outputs a voice of a speech of the agent 230 or other voices.
The processor 301 controls the response apparatus 200. The storage device 302 serves as a work area of the processor 301. Furthermore, the storage device 302 serves as either a non-transitory or transitory recording medium that stores various programs and data (including a facial image of a target). Examples of the storage device 302 include a Read Only Memory (ROM), a Random Access Memory (RAM), a Hard Disk Drive (HDD), and a flash memory.
The drive circuit 303 controls a driving mechanism of the response apparatus 200 to be driven in response to a command from the processor 301, thereby moving the interactive robot 102. The communication IF 304 is connected to a network to transmit and receive data. The sensor 305 detects a physical phenomenon and a physical state of the target. Examples of the sensor 305 include a range sensor that measures a distance to the target and an infrared ray sensor that detects whether or not the target is present.
The input device 306 is a button or a touch panel touched by the target to input data to the response apparatus 200 through the input device 306. The camera 201, the microphone 202, the sensor 305, and the input device 306 are generically referred to as an “acquisition device 310” that acquires information associated with the target such as biological data. In addition, the communication IF 304, the display device 203, and the speaker 204 are generically referred to as an “output device 320” that outputs information to the target.
It is noted that the drive circuit 303, the acquisition device 310, and the output device 320 may be provided outside of the response apparatus 200, for example, provided in the interactive robot 102 communicably connected to the response apparatus 200 via the network.
In a case in which the user feeling 402 is the joy 421, the sadness 422, or the surprise 424, the response feeling of the agent 230 displayed by the interactive robot 102 is "joy," "sadness," or "surprise," respectively, irrespective of whether the target 401 is the user 101, the interactive robot 102, or the third party 103. In other words, the interactive robot 102 expresses a feeling as if the agent 230 sympathizes with the user 101 as a facial expression of the agent 230.
In a case in which the user feeling 402 is the anger 423 and the target 401 is the third party 103, the response feeling of the agent 230 displayed by the interactive robot 102 is also “anger.” In contrast, in a case in which the user feeling 402 is the anger 423 and the target 401 is the user 101 or the interactive robot 102, the response feeling of the agent 230 displayed by the interactive robot 102 is “sadness.” In a case in which the user feeling 402 is the anger 423, the user 101 is a male, in particular, and the target 401 is the user 101 himself, the response feeling of the agent 230 displayed by the interactive robot 102 is not “sadness” but “anger.”
The feeling response model 104 is a model reflective of statistical results.
[Target Identification Process Based on Biological Data on User 101]
The target identification section 901 executes a target identification process for identifying the target 401 to which the feeling of the user 101 is expressed (hereinafter, referred to as “feeling expression target 401”) on the basis of the biological data, acquired by the acquisition device 310, regarding the user 101 using the response apparatus 200. The user 101 is a person whose facial image data is registered in the storage device 302 of the response apparatus 200. It is assumed that the facial image data is facial image data captured by the camera 201 of the response apparatus 200. A user name (which is not necessarily a real name) and voice data on the user name besides the facial image data may be registered in the storage device 302.
The biological data includes image data on the face of the user 101, image data on the hand of the user 101, and voice data on a speech of the user 101. The image data is assumed as data captured by the camera 201 installed on the front of the interactive robot 102 in a case in which the interactive robot 102 faces the user 101.
Specifically, in a case, for example, in which the biological data is the facial image data on the user 101, the target identification section 901 identifies the feeling expression target 401 of the user 101 by identifying the face direction 1001 of the user 101 on the basis of the facial image data on the user 101. For example, the target identification section 901 extracts three feature points indicating inner corners of both eyes and a tip of the nose, and identifies the face direction 1001 of the user 101 from a relative position relation among the three feature points. The target identification section 901 then calculates a certainty factor per target 401 on the basis of the face direction 1001.
In a case in which the face direction 1001 is, for example, a front direction, the target identification section 901 determines that the user 101 is looking at the agent 230 of the interactive robot 102. Therefore, the target identification section 901 calculates 100% as a certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102, and calculates 0% as a certainty factor that the feeling expression target 401 of the user 101 is the third party 103. The target identification section 901 calculates both certainty factors such that a total of the factors is 100%.
On the other hand, as the face direction 1001 deviates more greatly from the front direction in a horizontal direction, the target identification section 901 determines that a probability that the third party 103 is present in the face direction 1001 is higher. Therefore, as the face direction 1001 deviates more greatly from the front direction in the horizontal direction, the target identification section 901 sets lower the certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102 and sets higher the certainty factor that the feeling expression target 401 of the user 101 is the third party 103. The target identification section 901 then identifies the interactive robot 102 or the third party 103 at the higher certainty factor as the feeling expression target 401 of the user 101. It is noted that both certainty factors of 50% indicate that the target identification section 901 is unable to identify the target 401.
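The following is a minimal sketch, in Python, of how such a certainty factor calculation could look, taking the horizontal deviation of the face direction 1001 from the front as input. The linear mapping and the 90-degree saturation angle are illustrative assumptions and are not specified by the embodiment.

```python
# A sketch of the face-direction-based certainty calculation: the front
# direction yields 100% for the interactive robot, larger horizontal deviation
# raises the third-party certainty, and the two factors always total 100%.

def certainty_from_face_direction(yaw_deg: float, max_deg: float = 90.0):
    """Return (robot_certainty, third_party_certainty) summing to 100%."""
    deviation = min(abs(yaw_deg), max_deg)       # horizontal deviation from the front
    third_party = 100.0 * deviation / max_deg    # grows with the deviation
    robot = 100.0 - third_party                  # the two factors total 100%
    return robot, third_party

def identify_target(robot: float, third_party: float):
    """Pick the target with the higher certainty; 50%/50% means unidentifiable."""
    if robot == third_party:                     # both 50% -> cannot identify
        return None
    return "interactive_robot" if robot > third_party else "third_party"

print(certainty_from_face_direction(0.0))                       # (100.0, 0.0)
print(identify_target(*certainty_from_face_direction(30.0)))    # interactive_robot
```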
It is noted that the target identification section 901 may determine whether the third party 103 is present from a detection result by the infrared ray sensor that is one example of the sensor 305. For example, only in a case in which the infrared ray sensor detects the presence of a person other than the user 101, the target identification section 901 may calculate the certainty factor that the feeling expression target 401 of the user 101 is the third party 103.
Furthermore, in a case in which the infrared ray sensor is used and the infrared ray sensor does not detect the presence of a person other than the user 101, the probability that the user 101 does not pay attention to anyone is higher as the face direction 1001 deviates more greatly from the front direction. In this case, as the face direction 1001 deviates more greatly from the front direction, the target identification section 901 may set lower the certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102 and set higher the certainty factor that the feeling expression target 401 of the user 101 is the third party 103. Also in this case, the target identification section 901 similarly calculates both certainty factors such that the total of the factors is 100%. The target identification section 901 then identifies the interactive robot 102 or the third party 103 at the higher certainty factor as the feeling expression target 401 of the user 101. It is noted that both certainty factors of 50% indicate that the target identification section 901 is unable to identify the target 401.
Moreover, in a case in which the biological data is the facial image data on the user 101, the target identification section 901 may identify the feeling expression target 401 of the user 101 by identifying the line-of-sight direction 1002 of the user 101 on the basis of the facial image data on the user 101. The target identification section 901 may identify the line-of-sight direction 1002 of the user 101 from image data on the eye (which may be any of the right and left eyes) of the user 101.
A central position 1102a of the iris in a case in which the line-of-sight direction 1002 of the left eye is the front direction is assumed, for example, as an intermediate point between the inner corner 1101 and the tail 1103 of the left eye. In this case, the distance d between the inner corner 1101 and the central position 1102a of the iris is assumed as a distance da. In the case of d=da, the target identification section 901 determines that the line-of-sight direction 1002 is the front direction and calculates 100% as the certainty factor that the feeling expression target 401 of the user is the interactive robot 102, and calculates 0% as the certainty factor that the feeling expression target 401 of the user 101 is the third party 103. The target identification section 901 calculates both certainty factors such that a total of the factors is 100%.
When the user 101 turns the user's eyes on the right side of the front, the central position 1102a of the iris moves rightward (the central position 1102 of the iris after movement is assumed as 1102b). In this case, the distance d is db (<da). Likewise, when the user 101 turns the user's eyes on the left side of the front, the central position 1102a of the iris moves leftward (the central position 1102 of the iris after movement is assumed as 1102c). In this case, the distance d is dc (>da).
In this way, the target identification section 901 determines that the line-of-sight direction 1002 of the user 101 deviates rightward from the front when the distance d is smaller than da, and deviates leftward from the front when the distance d is larger than da. Therefore, the target identification section 901 determines that the probability that the third party 103 is present in the line-of-sight direction 1002 is higher as the line-of-sight direction 1002 of the user 101 deviates more greatly from the front in the horizontal direction.
Therefore, the target identification section 901 sets lower the certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102 and higher the certainty factor that the feeling expression target 401 of the user 101 is the third party 103 as the distance d deviates more greatly from the distance da. In this case, the target identification section 901 calculates both the certainty factors such that a total of the factors is 100%. The target identification section 901 then identifies the interactive robot 102 or the third party 103 at the higher certainty factor as the feeling expression target 401 of the user 101. It is noted that both certainty factors of 50% indicate that the target identification section 901 is unable to identify the target 401.
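The following is a minimal sketch, in Python, of the corresponding calculation for the line-of-sight direction 1002, taking the measured distance d and the front-gaze reference distance da as inputs. The normalization by a maximum deviation d_max is an illustrative assumption.

```python
# A sketch of the line-of-sight-based certainty calculation: d == da means a
# front gaze at the agent 230; a larger |d - da| lowers the certainty for the
# interactive robot and raises the certainty for the third party.

def certainty_from_gaze(d: float, da: float, d_max: float):
    """Return (robot_certainty, third_party_certainty) from the iris position."""
    deviation = min(abs(d - da), d_max)        # how far the iris is from the front-gaze position
    third_party = 100.0 * deviation / d_max    # larger deviation -> more likely a third party
    robot = 100.0 - third_party                # the factors always total 100%
    return robot, third_party

print(certainty_from_gaze(d=4.0, da=4.0, d_max=2.0))   # (100.0, 0.0)  front gaze
print(certainty_from_gaze(d=5.5, da=4.0, d_max=2.0))   # (25.0, 75.0)  gaze turned aside
```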
It is noted that the target identification section 901 may determine whether the third party 103 is present from the detection result by the infrared ray sensor that is one example of the sensor 305. For example, only in the case in which the infrared ray sensor detects the presence of a person other than the user 101, the target identification section 901 may calculate the certainty factor that the feeling expression target 401 of the user 101 is the third party 103.
Furthermore, in the case in which the infrared ray sensor is used and the infrared ray sensor does not detect the presence of a person other than the user 101, the probability that the user 101 does not pay attention to anyone is higher as the line-of-sight direction 1002 of the user 101 deviates more greatly from the front direction. In this case, as the line-of-sight direction 1002 deviates more greatly from the front direction, the target identification section 901 may set lower the certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102 and set higher the certainty factor that the feeling expression target 401 of the user 101 is the third party 103. Also in this case, the target identification section 901 similarly calculates both certainty factors such that the total of the factors is 100%. The target identification section 901 then identifies the interactive robot 102 or the third party 103 at the higher certainty factor as the feeling expression target 401 of the user 101. It is noted that both certainty factors of 50% indicate that the target identification section 901 is unable to identify the target 401.
Moreover, in a case in which the biological data is the image data on the hand of the user 101, the target identification section 901 may identify the feeling expression target 401 of the user 101 by identifying the finger pointing direction 1003 of the user 101 on the basis of the image data on the hand of the user 101. Specifically, the target identification section 901, for example, acquires the image data on the hand of the user 101 with the ToF camera that is one example of the camera 201, and identifies the finger pointing direction 1003 of, for example, a forefinger using a learning model of deep learning. The target identification section 901 then calculates the certainty factor per target 401 on the basis of the finger pointing direction 1003.
As a result, in a case in which the finger pointing direction 1003 is the front direction, the target identification section 901 determines that the user 101 is pointing a finger at the agent 230 of the interactive robot 102. Therefore, the target identification section 901 calculates 100% as the certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102, and calculates 0% as the certainty factor that the feeling expression target 401 of the user 101 is the third party 103. The target identification section 901 calculates both certainty factors such that the total of the factors is 100%.
In contrast, as the finger pointing direction 1003 deviates more greatly from the front direction, the target identification section 901 determines that the probability that the third party 103 is present in the finger pointing direction 1003 is higher. Therefore, as the finger pointing direction 1003 deviates more greatly from the front direction in the horizontal direction, the target identification section 901 sets lower the certainty factor that the feeling expression target 401 of the user 101 is the interactive robot 102 and sets higher the certainty factor that the feeling expression target 401 of the user 101 is the third party 103. The target identification section 901 then identifies the interactive robot 102 or the third party 103 at the higher certainty factor as the feeling expression target 401 of the user 101. It is noted that both certainty factors of 50% indicate that the target identification section 901 is unable to identify the target 401.
It is noted that the target identification section 901 may determine whether the third party 103 is present from the detection result by the infrared ray sensor that is one example of the sensor 305. For example, only in the case in which the infrared ray sensor detects the presence of a person other than the user 101, the target identification section 901 may calculate the certainty factor that the feeling expression target 401 of the user 101 is the third party 103.
Furthermore, in a case in which the biological data is the voice data, the target identification section 901 may identify the feeling expression target 401 of the user 101 on the basis of voice recognition. Specifically, the target identification section 901 determines first, for example, whether or not the acquired voice data is voice data on the user 101 by the voice recognition on the basis of the voice data on the user 101 registered in advance.
In a case of determining that the acquired voice data is the voice data from the user 101 and a recognition result of the voice data from the user 101 includes the first person such as "I," "my," and "me" as indicated in the voice 1004, the target identification section 901 may identify the feeling expression target 401 of the user 101 as the user 101 himself or herself.
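The following is a minimal sketch, in Python, of the first-person check on a speech recognition result, assuming, as the passage suggests, that first-person words indicate that the user 101 refers to himself or herself. The word list and the plain string matching are illustrative assumptions.

```python
# A sketch of the voice-based check: if the recognized speech contains a
# first-person word, the user is assumed to refer to himself or herself.

FIRST_PERSON = {"i", "my", "me", "myself"}

def target_from_speech(recognized_text: str):
    """Return 'user' if the recognized speech is in the first person, else None."""
    words = {w.strip(".,!?").lower() for w in recognized_text.split()}
    if words & FIRST_PERSON:          # e.g. "I am so angry at myself"
        return "user"
    return None                       # defer to the other identification processes

print(target_from_speech("I am so angry at myself"))   # user
```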
[Target Identification Process Based on Interaction with User 101 (1)]
Furthermore, the target identification section 901 may identify the target 401 by an interaction with the user 101. Specifically, the target identification section 901 identifies the feeling expression target 401 of the user 101 on the basis of, for example, a change in the user feeling 402. In this case, the interactive robot 102 captures an image of the facial expression of the user 101 with the camera 201 and identifies the user feeling 402 by the user feeling identification section 902. The interactive robot 102 causes the generation section 904 to generate facial image data on the agent 230 that expresses the user feeling 402 identified by the user feeling identification section 902, to output the facial image data to the display device 203, and to display a facial image of the agent 230 that expresses the user feeling 402 on the display device 203.
In this case, the user feeling identification section 902 calculates a feeling intensity per user feeling 402. The feeling intensity indicates a likelihood of the user feeling 402 estimated from the facial expression of the user 101. The user feeling identification section 902 may calculate the feeling intensity by applying a facial action coding system (FACS) to be described later. Furthermore, the user feeling identification section 902 may use a convolutional neural network trained by deep learning on a learning data set of facial image data and correct answer labels of the user feeling 402. In this case, the user feeling identification section 902 inputs the facial image data on the user 101 into the convolutional neural network, and may determine an output value from the convolutional neural network (for example, an output value from a softmax function) as the feeling intensity.
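The following is a minimal sketch, in Python with PyTorch as one possible framework, of treating the softmax output of a convolutional neural network as the feeling intensity per user feeling 402. The network architecture, the input size, and the class order are illustrative assumptions; the embodiment only requires a convolutional neural network trained on facial image data with correct answer labels.

```python
# A sketch of using softmax outputs of a small CNN as feeling intensities.

import torch
import torch.nn as nn

FEELINGS = ["joy", "sadness", "anger", "surprise"]

class FeelingCNN(nn.Module):
    def __init__(self, num_classes: int = len(FEELINGS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = FeelingCNN().eval()
face = torch.randn(1, 3, 64, 64)                     # stand-in for facial image data on the user 101
with torch.no_grad():
    intensities = torch.softmax(model(face), dim=1)  # softmax outputs used as feeling intensities
print(dict(zip(FEELINGS, intensities[0].tolist())))
```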
In a case in which the feeling intensity of the user feeling 402 that is the anger 423 continues to be higher than those of the other user feelings 402 and the anger 423 then changes to another user feeling 402, the user feeling identification section 902 calculates a positive negative degree as an evaluation value that indicates the change in the user feeling 402. The positive negative degree is an index value that indicates the positiveness (affirmative degree, activeness) and the negativeness (negative degree, inactiveness) of the user feeling 402, and is a difference between an amount of change J of the feeling intensity of the joy 421 that represents the positiveness and an amount of change S of the feeling intensity of the sadness 422 that represents the negativeness. The user feeling 402 is more positive as the positive negative degree is larger, and is more negative as the positive negative degree is smaller.
More specifically, in a case in which an absolute value of the positive negative degree is equal to or greater than a threshold and the positive negative degree is a positive value, the target identification section 901 determines that the user feeling 402 is in a positive state in which the user feeling 402 changes from the anger 423 to the joy 421.
Conversely, in a case in which the absolute value of the positive negative degree is equal to or greater than the threshold and the positive negative degree is a negative value, the target identification section 901 determines that the user feeling 402 is in a negative state in which the user feeling 402 changes from the anger 423 to the sadness 422. It is noted that, in a case in which the absolute value of the positive negative degree is less than the threshold, the target identification section 901 determines that the anger 423 that is the user feeling 402 continues in a state in which the feeling intensity 1201 of the anger 423 is higher than those of the other user feelings 402.
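The following is a minimal sketch, in Python, of the positive negative degree and the resulting state decision. The threshold value is an illustrative assumption.

```python
# A sketch of the positive negative degree: the difference between the change
# in joy intensity and the change in sadness intensity, compared with a threshold.

def positive_negative_degree(delta_joy: float, delta_sadness: float) -> float:
    """Change in joy intensity minus change in sadness intensity."""
    return delta_joy - delta_sadness

def classify_change(pn: float, threshold: float = 0.3) -> str:
    if abs(pn) < threshold:
        return "anger_continues"              # the anger 423 is judged to continue
    return "positive" if pn > 0 else "negative"

pn = positive_negative_degree(delta_joy=0.5, delta_sadness=0.1)
print(pn, classify_change(pn))                # 0.4 positive -> the user feeling changed toward joy
```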
Conversely, in the case in which the user reaction 1301 is negative, the target identification section 901 determines that the target 401 is the user 101 or the interactive robot 102. In this case, the target identification section 901 executes a target identification process based on a dialog.
The target identification section 901 identifies the target 401 as either the user 101 or the interactive robot 102 by a dialog with the user 101. Specifically, the target identification section 901 presents, for example, a character string that urges the user 101 to reply to the interactive robot 102, by voice output or on the display device 203. The target identification section 901 determines that the target 401 is the interactive robot 102 in a case of recognizing, by the voice recognition, that the user 101 does not reply or that a content of a voice from the user 101 is that the user 101 denies the dialog with the interactive robot 102. In contrast, the target identification section 901 identifies the target 401 as the user 101 in a case of recognizing that the content of the voice from the user 101 is that the user 101 affirms the dialog with the interactive robot 102.
[Target Identification Process Based on Interaction with User 101 (2)]
Furthermore, the target identification section 901 identifies the feeling expression target 401 of the user 101 as either the user 101 or the interactive robot 102 on the basis of data indicative of a user reaction to a finger pointing image acquired by the acquisition device 310 as a result of display of the finger pointing image indicating finger pointing at either the user 101 or the interactive robot 102 on the display device 203.
Specifically, the generation section 904 generates, for example, facial image data on the agent 230 indicating finger pointing at the user 101 or facial image data on the agent 230 indicating finger pointing at the interactive robot 102 (or agent 230) itself as a gesture of the interactive robot 102, and displays a facial image of the agent 230 on the display device 203 of the interactive robot 102.
As a result of displaying the facial image of the agent 230 and causing the acquisition device 310 to acquire the facial expression or voice of the user 101 as data indicating the user reaction, the target identification section 901 identifies whether the user reaction is agreement (an action indicating a nod or a voice meaning the agreement) or disagreement (an action of shaking the user's head or a voice meaning the disagreement).
The target identification section 901 identifies the target 401 as the user 101 if the content of the gesture 1401 of the interactive robot 102 is that the facial image of the agent 230 is indicative of finger pointing at the interactive robot 102 (or agent 230) itself and the user reaction 1402 indicates disagreement. The target identification section 901 identifies the target 401 as the interactive robot 102 if the content of the gesture 1401 of the interactive robot 102 is that the facial image of the agent 230 is indicative of finger pointing at the interactive robot 102 (or agent 230) itself and the user reaction 1402 indicates agreement.
An example in which the facial image of the agent 230 indicative of finger pointing at the user 101 or the interactive robot 102 (or agent 230) is used as the gesture 1401 of the interactive robot 102 has been described above. Alternatively, the target identification section 901 may control the interactive robot 102 to strike a pose of pointing a finger at the user 101 or the interactive robot 102 (or agent 230) itself as the gesture 1401 of the interactive robot 102 by moving an arm and a finger of the interactive robot 102 by drive control from the drive circuit 303.
It is noted that the target identification section 901 may execute any one of the “Target Identification Processes Based on Interaction with User 101 (1) and (2)” in a case in which the target identification section 901 is unable to identify the target 401 by performing “Target Identification Process Based on Biological Data on User 101.” Alternatively, the target identification section 901 may execute any one of the “Target Identification Processes Based on Interaction with User 101 (1) and (2)” independently of “Target Identification Process Based on Biological Data on User 101.”
The user feeling identification section 902 executes a feeling identification process for identifying the user feeling 402 on the basis of the facial image data on the user 101. Specifically, the user feeling identification section 902, for example, acquires the facial image data on the user 101 with the camera 201, and extracts many feature points, for example, 64 feature points from the facial image data. The user feeling identification section 902 identifies the user feeling 402 by a combination of the 64 feature points and changes thereof.
The user feeling identification section 902 calculates the feeling intensities for each of a plurality of calculation target AU numbers 1701 per user feeling 402. The user feeling identification section 902 calculates statistics of the plurality of calculated feeling intensities per user feeling 402. The statistics are, for example, at least one of an average value, a maximum value, a minimum value, and a median value of the plurality of calculated feeling intensities. The user feeling identification section 902 identifies, from among the user feelings 402, the user feeling 402 having the maximum statistic among the statistics of the feeling intensities calculated for the user feelings 402, and outputs the identified user feeling 402 to the determination section 903.
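The following is a minimal sketch, in Python, of aggregating feeling intensities over the calculation target AU numbers per user feeling 402 and selecting the user feeling 402 with the maximum statistic. The AU groupings and intensity values are placeholders, and the average value is used as the statistic here, although the maximum, minimum, or median could be used instead.

```python
# A sketch of the FACS-based identification: average the intensities of the
# action units (AUs) associated with each feeling and pick the largest.

from statistics import mean

# Hypothetical mapping of user feelings to the AU numbers used for calculation.
AU_GROUPS = {
    "joy":      [6, 12],
    "sadness":  [1, 4, 15],
    "anger":    [4, 5, 7, 23],
    "surprise": [1, 2, 5, 26],
}

def identify_user_feeling(au_intensity: dict) -> str:
    """au_intensity maps an AU number to its measured intensity (0.0-1.0)."""
    stats = {
        feeling: mean(au_intensity.get(au, 0.0) for au in aus)
        for feeling, aus in AU_GROUPS.items()
    }
    return max(stats, key=stats.get)          # the user feeling with the maximum statistic

print(identify_user_feeling({6: 0.8, 12: 0.9, 4: 0.1}))   # joy
```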
The determination section 903 executes a determination process for determining the response feeling of the agent 230 indicated by a facial image displayed on the display device 203 on the basis of the feeling expression target 401 identified by the target identification section 901 and the user feeling 402 identified by the user feeling identification section 902. Specifically, the determination section 903, for example, refers to the feeling response model 104, and determines the response feeling of the agent 230 corresponding to the feeling expression target 401 identified by the target identification section 901 and the user feeling 402 identified by the user feeling identification section 902.
Furthermore, the determination section 903 may determine the response feeling of the agent 230 indicated by the facial image of the agent 230 displayed on the display device 203 on the basis of the gender of the user 101. In a case in which the gender of the user 101 is registered in advance in the storage device 302 by the user 101 using the input device 306, the determination section 903 may determine the response feeling of the agent 230 in response to the gender of the user 101.
For example, in a case in which the gender is not applied, the target 401 is the user 101, and the user feeling 402 is the anger 423, the determination section 903 determines the response feeling of the agent 230 as “sadness.” In a case in which the gender is applied, the gender of the user 101 is a male, the target 401 is the user 101, and the user feeling 402 is the anger 423, the determination section 903 determines the response feeling of the agent 230 as “anger.”
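The following is a minimal sketch, in Python, of the feeling response model 104 as a lookup that follows the rules described above: the joy 421, the sadness 422, and the surprise 424 are mirrored; the anger 423 toward the third party 103 is mirrored; and the anger 423 toward the user 101 or the interactive robot 102 is answered with sadness, except that a male user angry at himself is answered with anger. The function signature is an illustrative assumption.

```python
# A sketch of the feeling response model 104 described in the embodiment.

from typing import Optional

def response_feeling(user_feeling: str, target: str, gender: Optional[str] = None) -> str:
    if user_feeling in ("joy", "sadness", "surprise"):
        return user_feeling                      # the agent 230 sympathizes with the user 101
    if user_feeling == "anger":
        if target == "third_party":
            return "anger"                       # anger at the third party 103 is mirrored
        if target == "user" and gender == "male":
            return "anger"                       # gender-dependent exception
        return "sadness"                         # anger at the user 101 or the robot itself
    raise ValueError(f"unknown user feeling: {user_feeling}")

print(response_feeling("anger", "user"))                  # sadness
print(response_feeling("anger", "user", gender="male"))   # anger
```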
Moreover, the determination section 903 may use a convolutional neural network trained by deep learning on a learning data set of facial image data and correct answer labels. In this case, the determination section 903 inputs the facial image data 1501 on the user 101 to the convolutional neural network, and uses an output value from the convolutional neural network as a determination result of the gender.
The generation section 904 executes a generation process for generating the facial image data on the agent 230 indicating the response feeling determined by the determination section 903 and outputting the facial image data to the display device 203. An example of facial images of the agent 230 is depicted in
In contrast, in a case in which the response apparatus 200 has not been able to identify the target 401 (Step S2002: No), the response apparatus 200 executes either “Target Identification Process Based on Interaction with User 101 (1)” or “Target Identification Process Based on Interaction with User 101 (2)” described above (Step S2003). In a case in which the response apparatus 200 has been able to identify the target 401 (Step S2004: Yes), the process goes to Step S1902.
In contrast, in a case in which the response apparatus 200 has not been able to identify the target 401 (Step S2004: No), the response apparatus 200 executes the target identification process based on dialog described above (Step S2005). The process then goes to Step S1902. In a case in which the response apparatus 200 executes the “Target Identification Process Based on Interaction with User 101 (2)” in Step S2003, the target 401 is identified. Therefore, the process goes to Step S1902 without executing Steps S2004 and S2005.
Furthermore, in the case of acquiring, for example, the facial image data 1501 on the user 101 by the acquisition device 310, the response apparatus 200 identifies the line-of-sight direction 1002 of the user 101 (Step S2102). In this case, the response apparatus 200 calculates the certainty factor per target 401 from the identified line-of-sight direction 1002 of the user 101 and identifies the target 401 on the basis of the certainty factor (Step S2106). The process then goes to Step S2002.
Moreover, in the case of acquiring, for example, the image data on the hand of the user 101 by the acquisition device 310, the response apparatus 200 identifies the finger pointing direction 1003 of the user 101 (Step S2103). In this case, the response apparatus 200 calculates the certainty factor per target 401 from the identified finger pointing direction 1003 of the user 101 and identifies the target 401 on the basis of the certainty factor (Step S2107). The process then goes to Step S2002.
Furthermore, in the case of acquiring the voice data by the acquisition device 310, the response apparatus 200 identifies that the acquired voice data is the voice data from the user 101 on the basis of voice recognition associated with the voice data on the user 101 registered in advance (Step S2104). In this case, the response apparatus 200 identifies a content of the speech on the basis of the voice recognition result of the identified voice data from the user 101 and identifies the target 401 from the content of the speech (Step S2108). The process then goes to Step S2002.
<Target Identification Process Based on Interaction with User 101>
In contrast, in a case in which the user feeling 402 is the anger 423 (Step S2202: Yes), the response apparatus 200 generates the facial image data on the user feeling 402 (anger 423) and displays the facial image 230a of the agent 230 indicating the “anger” on the display device 203 by the generation section 904 (Step S2203). The response apparatus 200 then calculates the positive negative degree by the target identification section 901 (Step S2204). The response apparatus 200 determines whether or not the absolute value of the positive negative degree is equal to or greater than the threshold by the target identification section 901 (Step S2205).
In a case in which the absolute value of the positive negative degree is not equal to or greater than the threshold (Step S2205: No), the response apparatus 200 determines, by the target identification section 901, that the anger 423 that is the user feeling 402 indicating the maximum feeling intensity continues, and the process returns to Step S2204.
In contrast, in a case in which the absolute value of the positive negative degree is equal to or greater than the threshold (Step S2205: Yes), the response apparatus 200 determines that the user feeling 402 has changed from the anger 423 to the joy 421 or the sadness 422, and determines whether or not the user feeling 402 is positive by the target identification section 901 (Step S2206). Specifically, the response apparatus 200 determines, for example, that the user feeling 402 is positive if the positive negative degree takes a positive value, and that the user feeling 402 is negative if the positive negative degree takes a negative value by the target identification section 901.
In a case in which the user feeling 402 is positive (Step S2206: Yes), the response apparatus 200 refers to the first target identification table and identifies the target 401 as the third party 103 by the target identification section 901.
In a case in which the response apparatus 200 has not detected the face of the user 101 (Step S2301: No), the process goes to Step S2004 without identifying the target 401. In contrast, in a case in which the response apparatus 200 has detected the face of the user 101 (Step S2301: Yes), the response apparatus 200 generates the facial image data on the agent 230 indicating finger pointing at the user 101 and displays the facial image of the agent 230 indicating finger pointing at the user 101 on the display device 203 (Step S2302).
Next, the response apparatus 200 determines whether or not the user 101 has agreed on the basis of the biological data acquired from the acquisition device 310 by the target identification section 901 (Step S2303). Specifically, the response apparatus 200 determines whether or not the user reaction 1402 indicates the agreement (an action indicating a nod or a voice meaning the agreement).
In a case in which the user 101 has agreed (Step S2303: Yes), then the response apparatus 200 identifies the target 401 as the user 101 by the target identification section 901 (Step S2304), and the process goes to Step S2004.
In a case in which the user 101 has not agreed (Step S2303: No), the response apparatus 200 determines whether or not the user 101 has disagreed on the basis of the biological data acquired from the acquisition device 310 by the target identification section 901 similarly to Step S2303 (Step S2305). Specifically, the response apparatus 200 determines whether or not the user reaction 1402 indicates the disagreement (an action of shaking the user's head or a voice meaning the disagreement).
In a case in which the user 101 has not disagreed (Step S2305: No), the process goes to Step S2004 without identifying the target 401. In a case in which the user 101 has disagreed (Step S2305: Yes), the response apparatus 200 generates the facial image data on the agent 230 indicating finger pointing at the agent 230 itself and displays the facial image of the agent 230 indicating finger pointing at the agent 230 itself on the display device 203 by the target identification section 901 (Step S2306). The response apparatus 200 then determines whether the user 101 has agreed on the basis of the biological data acquired from the acquisition device 310 by the target identification section 901 similarly to Step S2303 (Step S2307).
In a case in which the user 101 has agreed (Step S2307: Yes), then the response apparatus 200 identifies the target 401 as the interactive robot 102 by the target identification section 901 (Step S2308), and the process goes to Step S2004.
In a case in which the user 101 has not agreed (Step S2307: No), then the response apparatus 200 determines whether or not the user 101 has disagreed on the basis of the biological data acquired from the acquisition device 310 by the target identification section 901 similarly to Step S2303 (Step S2309).
In a case in which the user 101 has not disagreed (Step S2309: No), the process goes to Step S2004 without identifying the target 401. In a case in which the user 101 has disagreed (Step S2309: Yes), then the response apparatus 200 identifies the target 401 as the third party 103 by the target identification section 901 (Step S2310), and the process goes to Step S2004.
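The following is a minimal sketch, in Python, of the flow of Steps S2302 to S2310 described above. The callables for displaying the pointing image and reading the user reaction are hypothetical stand-ins for the generation section 904 and the acquisition device 310.

```python
# A sketch of the finger-pointing dialog flow: point at the user first, then at
# the agent itself, and decide the target from the agreement or disagreement.

def identify_target_by_pointing(show_pointing_image, get_user_reaction):
    """show_pointing_image(target) displays the agent 230 pointing at `target`;
    get_user_reaction() returns 'agree', 'disagree', or None."""
    show_pointing_image("user")                  # Step S2302: point at the user 101
    reaction = get_user_reaction()               # Steps S2303/S2305
    if reaction == "agree":
        return "user"                            # Step S2304
    if reaction != "disagree":
        return None                              # target not identified
    show_pointing_image("agent")                 # Step S2306: point at the agent 230 itself
    reaction = get_user_reaction()               # Steps S2307/S2309
    if reaction == "agree":
        return "interactive_robot"               # Step S2308
    if reaction == "disagree":
        return "third_party"                     # Step S2310
    return None

# Example with canned reactions: the user disagrees twice -> third party.
reactions = iter(["disagree", "disagree"])
print(identify_target_by_pointing(lambda t: None, lambda: next(reactions)))
```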
(1) In this way, the response apparatus 200 in the present embodiment identifies the feeling expression target 401 of the user 101; identifies the user feeling 402; determines the feeling indicated by the facial image of the agent 230 on the basis of the target 401 and the user feeling 402; and generates facial image data on the agent 230 indicating the determined feeling and displays the facial image of the agent 230 on the display device 203. It is thereby possible to achieve an improvement in accuracy for a response to the user 101.
(2) Furthermore, in (1), the response apparatus 200 may identify the feeling expression target 401 of the user 101 by identifying the face direction 1001 of the user 101 from the facial image data 1501 on the user 101. It is thereby possible to estimate a companion faced by the user 101 as the feeling expression target 401 of the user 101.
(3) Moreover, in (1), the response apparatus 200 may identify the feeling expression target 401 of the user 101 by identifying the line-of-sight direction 1002 of the user 101 from the facial image data 1501 on the user 101. It is thereby possible to estimate a companion to which the user 101 turns the user's eyes as the feeling expression target 401 of the user 101.
(4) Furthermore, in (1), the response apparatus 200 may identify the feeling expression target 401 of the user 101 by identifying the finger pointing direction 1003 of the user 101 from the image data on the hand of the user 101. It is thereby possible to estimate a companion at which the user 101 is pointing a finger as the feeling expression target 401 of the user 101.
(5) Moreover, in (1), the response apparatus 200 may identify the feeling expression target 401 of the user 101 on the basis of the voice data on the user 101. It is thereby possible to estimate a companion to which the user 101 is talking as the feeling expression target 401 of the user 101.
(6) Furthermore, in (1), the response apparatus 200 may identify the feeling expression target 401 of the user 101 on the basis of the change in the user feeling 402. It is thereby possible to identify the feeling expression target 401 of the user 101 as the third party 103 if the user feeling 402 after the change is positive.
(7) Moreover, in (6), the response apparatus 200 may calculate the positive negative degree that indicates the change in the user feeling 402, and identify the feeling expression target 401 of the user 101 on the basis of the positive negative degree. It is thereby possible to digitize the change in the user feeling 402 and, therefore, achieve an improvement in target identification accuracy.
(8) Furthermore, in (7), the response apparatus 200 may identify the feeling expression target 401 of the user 101 as the third party 103 in a case in which the user feeling 402 before the change is the anger 423 and the user feeling 402 after the change in the positive negative degree is positive. It is thereby possible to identify the feeling expression target 401 of the user 101 as the third party 103 in the case in which the user feeling 402 is the anger 423 and the user reaction 1301 is positive when the interactive robot 102 imitates the user feeling 402 (anger 423).
(9) Moreover, in (1), the response apparatus 200 may identify the feeling expression target 401 of the user 101 as either the user 101 or the interactive robot 102 on the basis of the user reaction 1402 acquired by the acquisition device 310 as a result of display of the facial image of the agent 230 indicating finger pointing at the user 101 or the agent 230 itself on the display device 203. It is thereby possible to identify the feeling expression target 401 of the user 101 by a dialog between the user 101 and the interactive robot 102.
(10) Furthermore, in (1), the response apparatus 200 may determine the feeling indicated by the facial image of the agent 230 displayed on the display device 203 on the basis of the gender of the user 101. It is thereby possible to determine the feeling indicated by the facial image of the agent 230 in the light of a difference in gender.
While the feeling is expressed with the image of only the face of the agent 230 in the embodiment described above, the image is not limited to the facial image but may be an image of a humanoid robot, and the feeling such as the anger, the surprise, the sadness, or the joy may be expressed by a motion or an action of the humanoid robot.
The present invention is not limited to the embodiment described above but encompasses various modifications and equivalent configurations within the meaning of the accompanying claims. For example, the embodiment described above has been described in detail in order to facilitate understanding of the present invention, and the present invention is not always limited to an embodiment having all the described configurations. Furthermore, part of the configurations of a certain embodiment may be replaced by configurations of another embodiment. Moreover, the configurations of another embodiment may be added to the configurations of the certain embodiment. Further, for part of the configurations of each embodiment, addition, deletion, or replacement of other configurations may be made.
Moreover, part of or all of the configurations, the functions, the processing sections, processing means, and the like described above may be realized by hardware by being designed, for example, as an integrated circuit, or may be realized by software by causing the processor to interpret and execute programs that realize the functions.
Information in programs, tables, files, and the like for realizing the functions can be stored in a storage device such as a memory, a hard disk, or a solid state drive (SSD), or in a recording medium such as an integrated circuit (IC) card, a secure digital (SD) card, or a digital versatile disc (DVD).
Furthermore, only the control lines and information lines considered to be necessary for the description are illustrated, and not all the control lines and information lines necessary for implementation are necessarily illustrated. In practice, it may be considered that almost all the configurations are mutually connected.