The present disclosure relates to an information processing device, an information processing method, and a program.
In recent years, voice input systems using voice recognition technology have been used. When a user performs an input by voice, it may be difficult to recognize the voice due to noise around the user (that is, sound other than the voice being input). For example, in a case in which the noise around the user is large, the voice of the user may not be recognized unless the user speaks with a louder voice. Here, in a case in which a source of noise is a device whose volume can be adjusted by the user, such as a television or a speaker, the voice recognition can be performed with a high degree of accuracy if the user manipulates the device and lowers the volume.
For the volume adjustment, there is a technique of automatically adjusting the volume of the sound output from the same device as the device to which the user's voice is input. For example, a television receiver that detects the user's voice and performs automatic volume adjustment so that a conversation can be smoothly performed even in a case in which a plurality of users are wearing headphones is disclosed in Patent Literature 1.
Patent Literature 1: JP 2008-72558A
However, since the device that performs the voice recognition and the source of the sound around the user are not necessarily the same device, sufficient voice recognition accuracy is unlikely to be obtained even when the technology related to the volume adjustment mentioned above is applied to the voice recognition technology.
In this regard, the present disclosure proposes an information processing device, an information processing method, and a program which are novel and improved and capable of improving the voice recognition accuracy even in a case in which there are other sound sources around the user.
According to the present disclosure, there is provided an information processing device including: a state detecting unit configured to detect a state of another device which can be a source of noise; and a state control unit configured to control the state of the other device on a basis of a detection result for the state of the other device and speech prediction of a user.
In addition, according to the present disclosure, there is provided an information processing method including: detecting a state of another device which can be a source of noise; and controlling, by a processor, the state of the other device on a basis of a detection result for the state of the other device and speech prediction of a user.
In addition, according to the present disclosure, there is provided a program causing a computer to implement: a function of detecting a state of another device which can be a source of noise; and a function of controlling the state of the other device on a basis of a detection result for the state of the other device and speech prediction of a user.
As described above, according to the present disclosure, it is possible to improve the voice recognition accuracy even in a case in which there are other sound sources around the user.
Note that the effects described above are not necessarily limitative. With or in the place of the above effects, there may be achieved any one of the effects described in this specification or other effects that may be grasped from this specification.
Hereinafter, (a) preferred embodiment(s) of the present disclosure will be described in detail with reference to the appended drawings. Note that, in this specification and the appended drawings, structural elements that have substantially the same function and structure are denoted with the same reference numerals, and repeated explanation of these structural elements is omitted.
Note that, in this description and the drawings, structural elements that have substantially the same function and structure are sometimes distinguished from each other using different alphabets after the same reference sign. However, when there is no need in particular to distinguish elements that have substantially the same function and structure, the same reference sign alone is attached.
Further, the description will proceed in the following order.
First, an overview of a first embodiment of the present disclosure will be described with reference to
An external appearance of the information processing device 1 is not particularly limited and may be, for example, a columnar shape as illustrated in
As illustrated in the scene T1 of
Here, as illustrated in the scene T1 of
In this regard, in the voice recognition system according to the first embodiment of the present disclosure, it is possible to improve the voice recognition accuracy by controlling states of peripheral devices related to an output of noise in the voice recognition on the basis of speech prediction for the user.
Specifically, as illustrated in a scene T2 of
As described above, in the voice recognition system according to the present embodiment, for example, if speech of the user is predicted, it is possible to improve the voice recognition accuracy by performing control such that the volume of the device related to the output of the noise is reduced.
The overview of the voice recognition system according to the present disclosure has been described above. Further, a shape of the information processing device 1 is not limited to a cylindrical shape illustrated in
As described with reference to
The peripheral device 7 connected to the information processing device 1 via the communication network 9 is a device that is placed near the information processing device 1 and outputs a sound. The peripheral device 7 may include, for example, a device having a function of outputting a sound such as music or voice such as the television receiver 71 or the audio device 72 (for example, a speaker, a mini-component system, or the like) as illustrated in
The peripheral device 7 may transmit capability information indicating what the peripheral device 7 can do and state information indicating the state of the peripheral device 7 to the information processing device 1 via the communication network 9. The capability information may include, for example, information such as operations which can be performed by the peripheral device 7 (for example, a sound output, blast, ventilation, and the like), states which the peripheral device 7 can enter, or the types of state information which the peripheral device 7 can (or cannot) transmit. Further, the state information may include information related to the current state of the peripheral device 7, such as a volume level, an operation mode (for example, a standby mode, a silent mode, or a common mode), a state (ON/OFF) of a power switch, or a setting value related to other operations. Further, the peripheral device 7 may transmit the requested capability information or state information upon receiving a transmission request for the capability information or the state information from the information processing device 1.
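The capability information and state information described above can be pictured, for illustration only, as simple records. The field names below are hypothetical and are not defined in the disclosure:

```python
from dataclasses import dataclass

# Hypothetical sketch of the information a peripheral device might report.
@dataclass
class CapabilityInfo:
    operations: list      # e.g. ["sound_output", "blast", "ventilation"]
    controllable: list    # aspects controllable via communication

@dataclass
class StateInfo:
    volume_level: int = 0
    operation_mode: str = "common"  # "standby" / "silent" / "common"
    power_on: bool = True

# Example: a television receiver reporting its capabilities and state.
tv_caps = CapabilityInfo(operations=["sound_output"],
                         controllable=["volume", "operation_mode", "power"])
tv_state = StateInfo(volume_level=20, operation_mode="common", power_on=True)
```

In practice such records would be serialized and exchanged over the communication network 9, but the disclosure does not fix a wire format.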
Further, the peripheral device 7 receives a state control signal from the information processing device 1 via the communication network 9, and the state of the peripheral device 7 is controlled. The state of the peripheral device 7 controlled by the information processing device 1 may include, for example, the volume level, the operation mode, the power ON/OFF, and the like.
Further, a distance between the information processing device 1 and the peripheral device 7 is, for example, within a range that the sound reaches, and the sound output from the peripheral device 7 is collected through a microphone of the information processing device 1 and may serve as noise in the voice recognition by the information processing device 1. In the following description, the sound output from the peripheral device 7 is also referred to as noise without distinguishing music, voice, driving sounds, and the like from one another. Further, the peripheral device 7 is also referred to as another device which can be a source of noise or a device related to the output of the noise.
The communication network 9 is a wired or wireless transmission path of information transmitted from a device or a system connected to the communication network 9. In the present embodiment, since the distance between the information processing device 1 and the peripheral device 7 connected to the communication network 9 is within the range that the sound reaches as described above, for example, the communication network 9 may be various kinds of local area networks (LANs) including Ethernet (registered trademark). Further, the communication network 9 is not limited to a LAN, and the communication network 9 may include a public network such as the Internet, a telephone network, or a satellite communication network, a wide area network (WAN), or the like. Further, the communication network 9 may include a dedicated network such as an Internet protocol-virtual private network (IP-VPN).
The configuration of the voice recognition system according to the present embodiment has been described above. Next, a configuration example of the information processing device 1 included in the voice recognition system according to the present embodiment will be described with reference to
As illustrated in
The control unit 10 controls the components of the information processing device 1. Further, as illustrated in
The speech predicting unit 101 performs the speech prediction for the user (for example, predicts that the user is about to speak). Further, in a case in which the speech of the user is predicted, the speech predicting unit 101 may give a notification indicating that the speech of the user is predicted to the voice recognizing unit 102, the state detecting unit 104, and the state control unit 105. The speech prediction for the user by the speech predicting unit 101 can be performed in various methods.
For example, the speech predicting unit 101 may predict the speech of the user in a case in which the voice recognizing unit 102 to be described later detects a predetermined activation word (for example, “hello agent” or the like) from the voice of the user collected by the sound collecting unit 12 to be described later. Further, the speech predicting unit 101 may predict the speech of the user in a case in which it is detected that a voice input button (not illustrated) disposed in the information processing device 1 is pushed by the user. Further, the speech predicting unit 101 may predict the speech of the user in a case in which it is detected that the user is waving her or his hand on the basis of data obtained by the camera 14 and the range sensor 15 to be described later. Further, the speech predicting unit 101 may predict the speech of the user in a case in which it is detected that the user claps her or his hands on the basis of data obtained by the camera 14 or the range sensor 15 to be described later or a sound collected by the sound collecting unit 12. Further, the speech prediction for the user by the speech predicting unit 101 is not limited to the above examples, and the speech of the user may be predicted in various methods.
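The several prediction triggers above (activation word, voice input button, hand wave, hand clap) can be combined disjunctively. The following sketch assumes each detector yields a boolean; the parameter names are hypothetical:

```python
def predict_speech(activation_word_detected=False,
                   voice_button_pushed=False,
                   hand_wave_detected=False,
                   hand_clap_detected=False):
    """Speech is predicted if any one of the triggers fires."""
    return any([activation_word_detected, voice_button_pushed,
                hand_wave_detected, hand_clap_detected])
```

Other triggers could be added to the list without changing the structure, consistent with the statement that the speech prediction is not limited to the above examples.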
The voice recognizing unit 102 recognizes the voice of the user collected by the sound collecting unit 12 to be described later, converts the voice into a character string, and acquires speech text. Further, the voice recognizing unit 102 can also identify the person who is speaking on the basis of a feature of the collected voice and estimate the source of the voice, that is, the direction of the speaker.
Further, in a case in which a predetermined activation word is included in the acquired speech text, the voice recognizing unit 102 gives a notification indicating that the activation word has been detected to the speech predicting unit 101. Further, the voice recognizing unit 102 may distinguish the activation word from other speech text so that the activation word can be detected more reliably even in the presence of noise.
The semantic analyzing unit 103 performs semantic analysis on the speech text acquired by the voice recognizing unit 102 using a natural language process or the like. A result of the semantic analysis by the semantic analyzing unit 103 is provided to the output control unit 106.
The state detecting unit 104 detects the state of the peripheral device 7 (other devices) which can be the source of the noise and provides a detection result to the state control unit 105. For example, the state detecting unit 104 detects the state of the peripheral device 7 on the basis of sound collection. The detection of the state of the peripheral device 7 based on the sound collection may be performed, for example, by specifying a magnitude (a sound pressure level) of ambient sound (noise around the information processing device 1) collected by the sound collecting unit 12. Further, the state detecting unit 104 may provide the magnitude of the specified ambient sound to the state control unit 105 as the detection result.
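Specifying the magnitude (sound pressure level) of the collected ambient sound could, for example, be done from the RMS of the audio samples. This is only one plausible method, not the one prescribed by the disclosure:

```python
import math

def sound_pressure_level_db(samples, ref=1.0):
    """Estimate the magnitude of ambient sound as a level in decibels
    relative to `ref`, from raw audio samples (illustrative sketch)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12) / ref)
```

The resulting level would then be provided to the state control unit 105 as the detection result.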
Further, the state detecting unit 104 may detect the state of the peripheral device 7 on the basis of communication. The detection of the state of the peripheral device 7 based on the communication may be performed, for example, such that the communication unit 11 to be described later is controlled such that a transmission request for the capability information and the state information is transmitted to the peripheral device 7, and the capability information and the state information are acquired from the peripheral device 7 via the communication unit 11. Further, the state detecting unit 104 may provide the capability information and the state information to the state control unit 105 as the detection result.
The state control unit 105 controls the state of the peripheral device 7 (other device) on the basis of the detection result by the state detecting unit 104 and the speech prediction for the user by the speech predicting unit 101. For example, in a case in which the speech predicting unit 101 predicts the speech of the user, and the magnitude of the ambient sound serving as the detection result by the state detecting unit 104 is larger than a predetermined threshold value, the state control unit 105 may control the state of the peripheral device 7 such that the noise output from the peripheral device 7 is further reduced.
Further, the state control of the peripheral device 7 by the state control unit 105 may be performed in various methods. Further, a method of controlling the state of the peripheral device 7 by the state control unit 105 may be decided on the basis of the capability information of the peripheral device 7 acquired via the communication unit 11 or from the storage unit 17.
For example, in a case in which the peripheral device 7 is determined to be a device whose volume level can be controlled via communication on the basis of the capability information of the peripheral device 7, the state control unit 105 may control the state of the peripheral device 7 such that the volume level of the peripheral device 7 is decreased or increased. In this case, for example, the state control unit 105 may generate a control signal for causing the volume level of the peripheral device 7 to be decreased or increased and control the communication unit 11 such that the control signal is transmitted to the peripheral device 7.
Further, in a case in which the peripheral device 7 is determined to be a device whose operation mode can be controlled via communication on the basis of the capability information of the peripheral device 7, the state control unit 105 may control the state of the peripheral device 7 by causing the operation mode of the peripheral device 7 to be changed. In this case, for example, the state control unit 105 may generate a control signal for causing the operation mode to be changed to an operation mode in which the noise output from the peripheral device 7 is further decreased and control the communication unit 11 such that the control signal is transmitted to the peripheral device 7. Further, for example, in a case in which the peripheral device 7 operates in one of three operation modes, that is the standby mode, the silent mode, and the common mode, the noise output by the peripheral device 7 may increase in the order of the standby mode, the silent mode, and the common mode.
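The ordering of the three operation modes by emitted noise (standby quieter than silent, silent quieter than common) suggests a simple rule for choosing a quieter mode. The sketch below assumes exactly these three modes:

```python
# Operation modes ordered from quietest to loudest, as described above.
MODE_NOISE_ORDER = ["standby", "silent", "common"]

def quieter_mode(current):
    """Return the next quieter operation mode, or the current mode if it
    is already the quietest."""
    i = MODE_NOISE_ORDER.index(current)
    return MODE_NOISE_ORDER[max(i - 1, 0)]
```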
Further, in a case in which a setting value related to an operation of the peripheral device 7 is determined to be controlled via communication on the basis of the capability information of the peripheral device 7, the state control unit 105 may control the state of the peripheral device 7 by causing the setting value related to the operation of the peripheral device 7 to be changed. The setting value related to the operation of the peripheral device 7 may include, for example, a strength of an air volume, the number of revolutions, power consumption, and the like. In this case, for example, the state control unit 105 may generate a control signal for causing the setting value related to the operation of the peripheral device 7 to be changed to a setting value in which the noise output from the peripheral device 7 is further decreased and control the communication unit 11 such that the control signal is transmitted to the peripheral device 7.
Further, in a case in which the peripheral device 7 is determined to be a device in which ON/OFF of the power supply can be controlled via communication on the basis of the capability information of the peripheral device 7, the state control unit 105 may generate a control signal for causing the power supply of the peripheral device 7 to be changed to ON or OFF and control the communication unit 11 such that the control signal is transmitted to the peripheral device 7. Further, the state control unit 105 may determine whether or not the peripheral device 7 may be powered off on the basis of the capability information of the peripheral device 7 or the like. For example, in a case in which it is determined that the peripheral device 7 should not be powered off, the state control unit 105 may perform the state control of the peripheral device 7 using another state control method described above. Further, the state control unit 105 may preferentially perform the state control of the peripheral device 7 using another state control method described above rather than the control of the power supply. With this configuration, since the peripheral device 7 is controlled without completely stopping its operation, the user is unlikely to feel discomfort or inconvenience due to the stopping of the peripheral device 7.
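The preference for control methods that do not completely stop the device, with power OFF as a last resort, can be sketched as a priority list over the capability information. The method names and their exact order here are assumptions for illustration:

```python
def choose_control_method(controllable):
    """Pick a state-control method from a device's capability information,
    preferring methods that do not fully stop the device; power control
    comes last (assumed preference order)."""
    preference = ["volume", "operation_mode", "setting_value", "power"]
    for method in preference:
        if method in controllable:
            return method
    return None  # no controllable aspect: the device cannot be controlled
```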
Further, the state control unit 105 may control the state of the peripheral device 7 such that the noise output from the peripheral device 7 is further reduced after causing the state information of the peripheral device acquired from the state detecting unit 104 to be stored in the storage unit 17. Further, in a case in which the speech of the user ends, the state control unit 105 may control the state of the peripheral device 7 on the basis of the state information of the peripheral device 7 stored in the storage unit 17 such that the state of the peripheral device 7 returns to the state at a time point at which the state of the peripheral device 7 is stored in the storage unit 17. The state control example of the peripheral device will be described in detail later with reference to
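Storing the state information before reducing the noise and restoring it after the speech ends might look as follows, with an in-memory dictionary standing in for the storage unit 17. All names and the chosen quiet volume are illustrative:

```python
saved = {}  # stands in for the storage unit 17

def reduce_noise(device_id, state, quiet_volume=5):
    """Store the device's current state, then lower its volume."""
    saved[device_id] = dict(state)  # store the state first
    state["volume_level"] = min(state["volume_level"], quiet_volume)
    return state

def restore(device_id, state):
    """Return the device to the state stored before the control."""
    state.update(saved.pop(device_id))
    return state
```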
The output control unit 106 controls a response to the speech of the user or an output related to an operation required by the user in accordance with the semantic analysis result provided from the semantic analyzing unit 103. For example, in a case in which the speech of the user is to obtain “tomorrow's weather,” the output control unit 106 acquires information related to “tomorrow's weather” from a weather forecast server on a network, and controls the speaker 13, the projecting unit 16, or the light emitting unit 18 such that the acquired information is output.
The communication unit 11 performs reception and transmission of data with an external device. For example, the communication unit 11 is connected to the communication network 9 and performs transmission to the peripheral device 7 or reception from the peripheral device 7. For example, the communication unit 11 transmits the transmission request for the capability information and the state information to the peripheral device 7. Further, the communication unit 11 receives the capability information and the state information from the peripheral device 7. The communication unit 11 also transmits the control signal generated by the state control unit 105 to the peripheral device 7. Further, the communication unit 11 is connected to a predetermined server (not illustrated) via the communication network 9 or another communication network, and receives information necessary for the output control by the output control unit 106.
The sound collecting unit 12 has a function of collecting the ambient sound and outputting the collected sound to the control unit 10 as an audio signal. Further, for example, the sound collecting unit 12 may be implemented by one or more microphones.
The speaker 13 has a function of converting the audio signal into a voice and outputting the voice under the control of the output control unit 106.
The camera 14 has a function of imaging a surrounding area with an imaging lens installed in the information processing device 1 and outputting the captured image to the control unit 10. Further, the camera 14 may be implemented by, for example, a 360 degree camera, a wide angle camera, or the like.
The range sensor 15 has a function of measuring a distance between the information processing device 1 and the user or a person around the user. The range sensor 15 is implemented by, for example, an optical sensor (a sensor that measures a distance to an object on the basis of phase difference information at a light emission/reception timing).
The projecting unit 16 is an example of a display device and has a function of performing display by projecting (enlarging) an image on a wall or a screen.
The storage unit 17 stores a program or a parameter causing each component of the information processing device 1 to function. Further, the storage unit 17 also stores information related to the peripheral device 7. For example, the information related to the peripheral device 7 may include information for establishing a connection with the peripheral device 7 connected to the communication network 9, the capability information, the state information, and the like.
The light emitting unit 18 is implemented by a light emitting element such as an LED and can perform full lighting, partial lighting, blinking, lighting position control, and the like. For example, under the control of the control unit 10, the light emitting unit 18 can make it appear as if the line of sight of the information processing device 1 faces in the direction of the speaker by lighting a part thereof facing the direction of the speaker recognized by the voice recognizing unit 102.
The configuration of the information processing device 1 according to the present embodiment has been specifically described above. Further, the configuration of the information processing device 1 illustrated in
Next, an operation example of the information processing device 1 according to the present embodiment will be described with reference to
In a case in which the speech of the user is predicted (YES in step S110), the information processing device 1 measures the ambient sound (S120). For example, the state detecting unit 104 may measure the ambient sound by specifying the magnitude of the ambient sound on the basis of the audio signal collected by the sound collecting unit 12.
Then, the state control unit 105 determines whether or not the ambient sound measured in step S120 is large (S130). For example, the state control unit 105 may perform the determination in step S130 by comparing the ambient sound measured in step S120 with a predetermined threshold value.
In a case in which the ambient sound is determined to be large (YES in step S130), the state control unit 105 causes the state information of the peripheral device 7 acquired on the basis of the communication from the peripheral device 7 through the state detecting unit 104 to be stored in the storage unit 17 (step S140).
Then, the state control unit 105 controls the state of the peripheral device 7 (S150). For example, the state control unit 105 may generate a control signal for causing the volume level to be decreased by a predetermined value for all the peripheral devices 7 whose state can be controlled and cause the communication unit 11 to transmit the control signal. Further, the state control unit 105 may generate a control signal for reducing the ambient sound (noise) for each of the peripheral devices 7 on the basis of the capability information and the state information of the peripheral device 7 acquired in step S140 and cause the communication unit 11 to transmit the control signal.
Then, the information processing device 1 receives a voice input of the user and performs a voice recognition process (S160). In step S170, the control unit 10 determines whether the speech of the user has ended, for example, by determining that the speech has ended in a case in which a non-speech period continues for a predetermined time (for example, 10 seconds) or more, and the voice recognition process of step S160 continues until the speech ends.
In a case in which the speech of the user is determined to end (YES in S170), the semantic analyzing unit 103 performs a semantic analysis process on the basis of the recognition result (speech text) of the voice recognizing unit 102, and the output control unit 106 controls the projecting unit 16 and the light emitting unit 18 in accordance with the semantic analysis result (S180).
Finally, the state control unit 105 performs the state control such that the state of the peripheral device 7 returns to the state at a time point of step S140 on the basis of the state information of the peripheral device 7 stored in the storage unit 17 in step S140 (S190). For example, the state control unit 105 may generate a control signal of causing the state of the peripheral device 7 to be changed to the state of the peripheral device 7 at a time point of step S140 for each of the peripheral devices 7 and cause the communication unit 11 to transmit the generated control signal.
The series of processes described above (S110 to S190) may be repeated, for example, each time a series of processes ends.
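The series of steps S110 to S190 can be condensed into a single pass, with the device interactions injected as callables. This is a structural sketch only, and every name below is illustrative:

```python
def run_once(speech_predicted, measure_ambient, threshold,
             save_states, control_devices, recognize, respond,
             restore_states):
    """One pass over the flow S110 to S190 described above."""
    if not speech_predicted:              # S110
        return None
    if measure_ambient() > threshold:     # S120, S130
        save_states()                     # S140
        control_devices()                 # S150
    text = recognize()                    # S160, S170
    respond(text)                         # S180
    restore_states()                      # S190
    return text
```

Repeating this pass each time it completes corresponds to the repetition of the series of processes noted above.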
As described above, according to the first embodiment of the present disclosure, in a case in which the speech of the user is predicted, the magnitude of the ambient sound (noise) of the information processing device 1 is measured, and in a case in which the ambient sound is large, the state of the peripheral device 7 which can be the source of the noise is controlled such that the ambient sound is reduced. With this configuration, even in a case in which there are other sound sources around the user, it is possible to improve the voice recognition accuracy when the user speaks. Further, in a case in which the information processing device 1 outputs a voice-based response corresponding to the speech of the user, the ambient sound is reduced, so that the user can more easily hear the voice-based response output from the information processing device 1.
The first embodiment of the present disclosure has been described above. Several modified examples of the first embodiment of the present disclosure will be described below. Further, each of the modified examples to be described below may be applied alone to the first embodiment of the present disclosure or may be applied in combination to the first embodiment of the present disclosure. Further, each modified example may be applied instead of the configuration described in the first embodiment of the present disclosure or may be additionally applied to the configuration described in the first embodiment of the present disclosure.
In the above operation example, the example in which the state control process of the peripheral device 7 for reducing the noise in step S150 illustrated in
With this operation, it is possible to repeat the state control process for the peripheral device until the ambient sound is sufficiently reduced, and thus the accuracy of the voice recognition process and the semantic analysis process of step S160 and subsequent steps is further improved.
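Repeating the state control until the ambient sound is sufficiently reduced is, in effect, a bounded loop. The sketch below adds a safety cap on the number of rounds, which is an assumption not stated above:

```python
def control_until_quiet(measure_ambient, control_step, threshold,
                        max_rounds=5):
    """Repeat the peripheral-device state control until the measured
    ambient sound falls to the threshold or below (bounded sketch)."""
    rounds = 0
    while measure_ambient() > threshold and rounds < max_rounds:
        control_step()
        rounds += 1
    return rounds
```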
Further, a method of controlling the state of the peripheral device 7 twice or more in order to reduce the noise is not limited to the above example. For example, in order to reduce the noise, the state control unit 105 may control the state of the peripheral device 7 twice or more on the basis of the voice recognition or the semantic analysis result based on the speech of the user.
For example, in step S160, the state control of the peripheral device 7 may be performed again in a case in which the voice recognition process fails (the speech text is unable to be acquired) even though the user is speaking. Further, for example, the speech of the user may be detected on the basis of a motion of the mouth of the user included in an image acquired by the camera 14.
Further, in step S180, in a case in which the semantic analysis from speech text fails (the semantic analysis result is unable to be obtained), the state control of the peripheral device 7 may be performed again.
In the above example, the state control unit 105 acquires the state information of the peripheral device 7 and causes the state information to be stored in the storage unit 17, but the present embodiment is not limited to this example. As a second modified example, the state control unit 105 may cause a parameter in the control signal related to the state control to be stored instead of the state information of the peripheral device 7.
For example, in step S150 of
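The second modified example, in which only the control parameter is stored and the state is restored by applying its inverse, can be sketched as follows. The decrement-based volume control is an illustrative choice:

```python
applied = {}  # stores only the control parameter, not the full state

def lower_volume(device_id, state, decrement):
    """Lower the volume and remember only the decrement that was applied."""
    applied[device_id] = decrement
    state["volume_level"] -= decrement
    return state

def restore_volume(device_id, state):
    """Undo the control by applying the inverse of the stored parameter."""
    state["volume_level"] += applied.pop(device_id)
    return state
```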
Thus, the first embodiment of the present disclosure and the modified examples have been described. Next, a second embodiment of the present disclosure will be described. In the first embodiment described above, all the peripheral devices 7 which can be controlled by the information processing device 1 are set as the control target on the basis of the magnitude of the ambient sound. On the other hand, an information processing device according to the second embodiment extracts the peripheral device 7 serving as the control target on the basis of the state information of each of the peripheral devices 7 obtained via communication and controls the state of the extracted peripheral device 7.
Similarly to the state detecting unit 104 described in the first embodiment, the state detecting unit 204 according to the present embodiment detects the state of the peripheral device 7 (other devices) which can be the source of the noise. For example, similarly to the state detecting unit 104, the state detecting unit 204 detects the state of the peripheral device 7 on the basis of communication and acquires the capability information and the state information of the peripheral device 7 through the communication unit 11. Further, the state detecting unit 204 may provide the capability information and the state information to the state control unit 205 as the detection result.
Further, the state detecting unit 204 according to the present embodiment may or may not have the function of detecting the state of the peripheral device 7 on the basis of the sound collection as described in the first embodiment.
Similarly to the state control unit 105 described in the first embodiment, the state control unit 205 according to the present embodiment controls the state of the peripheral device 7 (other devices) on the basis of the detection result by the state detecting unit 204 and the speech prediction for the user by the speech predicting unit 101. Unlike the state control unit 105 according to the first embodiment, the state control unit 205 according to the present embodiment has a function of extracting the peripheral device 7 whose state is controlled from a plurality of peripheral devices 7 on the basis of the state of the peripheral device 7. For example, in a case in which the speech predicting unit 101 predicts the speech of the user, the state control unit 205 according to the present embodiment extracts the peripheral device 7 satisfying a predetermined condition based on the state information of the peripheral device 7, and controls the state of the extracted peripheral device 7.
For example, the predetermined condition based on the state information of the peripheral device 7 may be a condition that a current volume level is a predetermined threshold value or more. Further, the predetermined condition based on the state information of the peripheral device 7 may be a condition that the operation mode of the peripheral device 7 is a predetermined operation mode. Further, the predetermined condition based on the state information of the peripheral device 7 may be a condition that a magnitude of a predetermined setting value related to the operation of the peripheral device 7 is a predetermined threshold value or more.
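For illustration only, the extraction based on the state information described above can be sketched as follows; the field names, thresholds, and operation modes in this sketch are assumptions for the example and are not part of the disclosure.

```python
# Hypothetical sketch of the second-embodiment extraction: peripheral
# devices are filtered by conditions on their reported state information.
# All field names and threshold values below are illustrative assumptions.

VOLUME_THRESHOLD = 5          # assumed threshold for a "loud" volume level
NOISY_MODES = {"spin_dry"}    # assumed operation modes that emit large noise
SETTING_THRESHOLD = 3         # assumed threshold for a noise-related setting

def satisfies_condition(state: dict) -> bool:
    """Return True if the device state meets any noise-related condition."""
    if state.get("volume_level", 0) >= VOLUME_THRESHOLD:
        return True
    if state.get("operation_mode") in NOISY_MODES:
        return True
    if state.get("noise_setting", 0) >= SETTING_THRESHOLD:
        return True
    return False

def extract_control_targets(states: dict) -> list:
    """Extract the IDs of the devices whose state satisfies a condition."""
    return [dev for dev, st in states.items() if satisfies_condition(st)]
```

In this sketch, a device is extracted as a control target as soon as any one of the three conditions holds, which mirrors the "or" relationship of the conditions described above.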
With this configuration, for example, it is possible to control the state of the peripheral device 7 which is outputting a larger noise preferentially or efficiently. Further, since only the state of the peripheral device 7 which may be outputting a larger noise is changed, and the state of the peripheral device 7 which may be outputting a smaller noise is not changed, there is an effect in that the user is unlikely to have an uncomfortable feeling.
Further, the state control unit 205 according to the present embodiment may cause the state information of the peripheral device 7 extracted as described above to be stored in the storage unit 17.
Further, since the other functions of the state control unit 205 according to the present embodiment (the state control method and the decision of the state control method of the peripheral device 7) are similar to those of the state control unit 105 described in the first embodiment, description thereof is omitted.
The configuration example of the information processing device 2 according to the second embodiment of the present disclosure has been described above. Next, an operation example of the information processing device 2 according to the present embodiment will be described with reference to
In a case in which the speech of the user is predicted (YES in step S210), the state detecting unit 204 transmits the transmission request for the capability information and the state information to the peripheral devices 7, and receives the capability information and the state information from the peripheral devices 7 (S220).
Then, the state control unit 205 extracts the peripheral device 7 satisfying the condition based on the state information among the peripheral devices 7 (S230). The condition based on the state information may be, for example, any one of the conditions described above. Further, the state control unit 205 causes the state information of the extracted peripheral device 7 to be stored in the storage unit 17 (S240).
Then, the state control unit 205 controls the states of the extracted peripheral devices 7 (S250). For example, the state control unit 205 may generate a control signal for reducing the ambient sound (noise) for each of the extracted peripheral devices 7 on the basis of the capability information and the state information of the peripheral devices 7 received in step S220 and cause the communication unit 11 to transmit the control signal.
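The sequence of steps S220 to S250 described above can be outlined as follows; the device interface (`request_state()`, `send_control()`) is a hypothetical one introduced for this sketch, not an interface defined by the disclosure.

```python
# Minimal sketch of the control flow S220-S250, assuming each peripheral
# device exposes request_state() and send_control() style methods
# (hypothetical names, introduced only for illustration).

def on_speech_predicted(devices, storage, satisfies_condition):
    # S220: request and receive capability/state information from each device
    states = {dev.name: dev.request_state() for dev in devices}

    # S230: extract the devices whose state satisfies the condition
    targets = [dev for dev in devices if satisfies_condition(states[dev.name])]

    # S240: store the state of the extracted devices (e.g. so that the
    # original state can be restored after the speech ends)
    for dev in targets:
        storage[dev.name] = states[dev.name]

    # S250: transmit a control signal that reduces the noise of each target
    for dev in targets:
        dev.send_control({"volume_level": 0})
    return targets
```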
A subsequent process of steps S260 to S290 illustrated in
As described above, according to the second embodiment of the present disclosure, if the speech of the user is predicted, the state information of the peripheral device 7 around the information processing device 2 is acquired, and the state control is performed such that the noise output from the peripheral device 7 extracted on the basis of the state information is reduced. With this configuration, even in a case in which there are other sound sources around the user, it is possible to improve the voice recognition accuracy when the user speaks. Further, according to the second embodiment of the present disclosure, the peripheral device 7 whose state is changed is extracted, and the state control is performed, and thus there is an effect in that the user is unlikely to have an uncomfortable feeling.
Further, in the above example, the example in which the state detecting unit 204 does not use the function of detecting the state of the peripheral device 7 on the basis of sound collection as described in the first embodiment has been described, but the present embodiment is not limited to this example. For example, the state detecting unit 204 may measure the ambient sound using the state detection function based on sound collection, and in a case in which the ambient sound is determined to be large, the state control unit 205 may extract the peripheral device 7 whose state is to be changed and perform the state control.
Further, it is also possible to apply each of the modified examples described in the first embodiment to the second embodiment.
The first embodiment and the second embodiment of the present disclosure have been described above. Next, a third embodiment of the present disclosure will be described. The information processing device in accordance with the third embodiment further controls the state of the peripheral device 7 on the basis of a position of the peripheral device 7.
The control unit 30 according to the present embodiment controls respective components of the information processing device 3. Further, the control unit 30 according to the present embodiment functions as a speech predicting unit 301, a voice recognizing unit 302, a semantic analyzing unit 103, a state detecting unit 204, a state control unit 305, and an output control unit 106 as illustrated in
Similarly to the speech predicting unit 101 described in the first embodiment, the speech predicting unit 301 according to the present embodiment performs the speech prediction for the user. In addition to the function of the speech predicting unit 101, the speech predicting unit 301 according to the present embodiment has a function of giving a notification indicating that the speech of the user is predicted to a user position acquiring unit 308 in a case in which the speech of the user is predicted.
Similarly to the voice recognizing unit 102 described in the first embodiment, the voice recognizing unit 302 according to the present embodiment recognizes the voice of the user, converts the voice of the user into a character string, and acquires a speech text. The voice recognizing unit 302 according to the present embodiment is different from the voice recognizing unit 102 described in the first embodiment in that the voice recognizing unit 302 receives and recognizes the voice of the user which the sound source separating unit 309 described below separates from the sound acquired by the sound collecting unit 12. With this configuration, it is possible to further improve the voice recognition accuracy.
Similarly to the state control unit 105 described in the first embodiment, the state control unit 305 controls the state of the peripheral device 7 (other devices) on the basis of the detection result by the state detecting unit 204 and the speech prediction for the user by the speech predicting unit 301. The state control unit 305 according to the present embodiment has a function of controlling the state of the peripheral device 7 on the basis of the position of the peripheral device 7 in addition to the function of the state control unit 105 according to the first embodiment. For example, information of the position related to the peripheral device 7 may be stored in a storage unit 37 to be described later.
For example, in a case in which the speech predicting unit 301 predicts the speech of the user, the state control unit 305 according to the present embodiment extracts the peripheral devices 7 satisfying a predetermined condition on the basis of the position of the peripheral device 7, and controls the states of the extracted peripheral devices 7. Several examples will be described below as examples in which the state control unit 305 extracts the peripheral device 7 on the basis of the position of the peripheral device 7 and controls the state of the extracted peripheral device 7.
For example, the state control unit 305 may extract the peripheral device 7 located in a noise occurrence region specified on the basis of the sound collection, and control the state of the extracted peripheral device 7. The information of the noise occurrence region may be provided from an acoustic field analyzing unit 307 to be described later, and the state control unit 305 may associate the information of the noise occurrence region with the information of the position of the peripheral device 7 and extract the peripheral device 7 located within the noise occurrence region.
With this configuration, for example, it is possible to control the state of the peripheral device 7 which is outputting a larger noise preferentially or efficiently. Further, since only the state of the peripheral device 7 which is outputting a larger noise is changed, and the state of the peripheral device 7 which is outputting a smaller noise is not changed, there is an effect in that the user is unlikely to have an uncomfortable feeling.
Further, the state control unit 305 may control the state of the peripheral device 7 on the basis of the position of the peripheral device 7 and the position of the user. The position of the user may be provided from the user position acquiring unit 308 to be described later to the state control unit 305.
For example, as illustrated in
With this configuration, it is possible to efficiently control the state of the peripheral device 7 such that the noise output by the peripheral device 7 located in substantially the same direction as the position of the user with reference to the position of the sound collecting unit 12 is reduced. Sound that travels toward the sound collecting unit 12 from substantially the same direction as the position of the user is more difficult for the sound source separating unit 309 to be described later to separate from the voice of the user than sound arriving from other directions. Therefore, with this configuration, the sound source separation accuracy is improved, and the voice recognition accuracy is consequently improved as well.
Further, as illustrated in
With this configuration, it is possible to efficiently reduce the noise output from the peripheral device 7 close to the user, and it becomes easier for the user to speak.
Further, the method of extracting the peripheral device 7 on the basis of the position of the peripheral device 7 by the state control unit 305 is not limited to the above example. For example, the state control unit 305 may extract the peripheral device 7 located near the sound collecting unit 12 and control the state of the extracted peripheral device 7. Further, the state control unit 305 may extract the peripheral device 7 using a combination of the above-described extraction methods.
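The two position-based criteria described above (direction alignment with the user as seen from the sound collecting unit, and proximity to the user) can be sketched with simple 2-D geometry; the angular tolerance and distance radius in this sketch are illustrative assumptions.

```python
import math

# Sketch of the position-based extraction in the third embodiment.
# Positions are 2-D (x, y) tuples; the 20-degree tolerance and 1.5 m
# radius are assumptions chosen for illustration.

def angle_from(origin, point):
    """Bearing of `point` as seen from `origin`, in radians."""
    return math.atan2(point[1] - origin[1], point[0] - origin[0])

def in_same_direction(mic_pos, user_pos, device_pos,
                      tolerance=math.radians(20)):
    """True if the device lies in substantially the same direction as the
    user, with reference to the position of the sound collecting unit."""
    diff = abs(angle_from(mic_pos, user_pos) - angle_from(mic_pos, device_pos))
    diff = min(diff, 2 * math.pi - diff)   # wrap around the circle
    return diff <= tolerance

def near_user(user_pos, device_pos, radius=1.5):
    """True if the device is within `radius` (assumed metres) of the user."""
    return math.dist(user_pos, device_pos) <= radius
```

A combined extraction, as mentioned above, could simply take the union or intersection of the devices selected by these two predicates.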
The acoustic field analyzing unit 307 analyzes the acoustic field (a space or a region in which sound waves exist) around the information processing device 3 on the basis of the voice collected by the sound collecting unit 12. For example, the acoustic field analyzing unit 307 analyzes the acoustic field on the basis of the voice acquired from each of a plurality of microphones installed in the sound collecting unit 12. The analysis result for the acoustic field may be provided to the sound source separating unit 309. Further, the acoustic field analyzing unit 307 specifies a direction having a high sound pressure level with reference to the sound collecting unit 12, and provides a region included in a predetermined angle range centered on the direction to the state control unit 305 as the noise occurrence region.
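One way the noise occurrence region described above might be derived is sketched below; the input format (sound pressure level per direction) and the 30-degree half-width are assumptions for the example.

```python
import math

# Sketch of how the acoustic field analyzing unit 307 might specify a noise
# occurrence region: take the direction with the highest sound pressure
# level, with reference to the sound collecting unit, and widen it into a
# fixed angle range centred on that direction.

def noise_occurrence_region(levels_by_direction,
                            half_width=math.radians(30)):
    """levels_by_direction: {direction_in_radians: sound_pressure_level}.
    Returns (lo, hi) angular bounds centred on the loudest direction."""
    loudest = max(levels_by_direction, key=levels_by_direction.get)
    return (loudest - half_width, loudest + half_width)
```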
The user position acquiring unit 308 acquires the position of the user on the basis of the data acquired from the camera 14 and the range sensor 15. For example, the user position acquiring unit 308 may detect the user from the image acquired by the camera 14 using a face detection technique, a face recognition technique, or the like, associate the detected user with the data acquired from the range sensor 15, and acquire the position of the user. The user position acquiring unit 308 provides the acquired user position to the state control unit 305 and the sound source separating unit 309.
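The association of camera and range-sensor data described above can be sketched as follows, under the assumption that face detection yields a bearing and the range sensor yields a distance along that bearing; both inputs and the 2-D model are simplifications for illustration.

```python
import math

# Hypothetical sketch of the user position acquiring unit 308: a face
# detected in the camera image gives a bearing, the range sensor gives a
# distance along that bearing, and the two are combined into a 2-D position
# relative to the sensor.

def user_position(face_bearing_rad, range_m, sensor_pos=(0.0, 0.0)):
    """Convert a bearing (from face detection) and a distance (from the
    range sensor) into an (x, y) position relative to the sensor."""
    x = sensor_pos[0] + range_m * math.cos(face_bearing_rad)
    y = sensor_pos[1] + range_m * math.sin(face_bearing_rad)
    return (x, y)
```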
The sound source separating unit 309 acquires the voice of the user by separating it from the collected sound on the basis of the acoustic field analysis result by the acoustic field analyzing unit 307 and the position of the user. For example, the sound source separating unit 309 may separate the voice of the user from the noise on the basis of a beam forming method. The voice of the user separated by the sound source separating unit 309 is provided to the voice recognizing unit 302.
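A delay-and-sum beamformer is one common realisation of the beam forming method mentioned above; the sketch below shows the idea for a two-microphone array, with the microphone spacing, sample rate, and angle convention all being illustrative assumptions rather than parameters of the disclosure.

```python
import math

# Sketch of a delay-and-sum beamformer: samples from the second microphone
# are delayed so that sound arriving from the steered direction adds
# coherently, while sound from other directions is attenuated.

def delay_and_sum(mic1, mic2, angle_rad, spacing_m=0.1, fs=16000, c=343.0):
    """Steer a two-microphone array toward `angle_rad` (measured from the
    array axis) and return the averaged signal. mic1/mic2 are equal-length
    lists of samples."""
    delay_s = spacing_m * math.cos(angle_rad) / c   # inter-microphone delay
    shift = round(delay_s * fs)                     # delay in whole samples
    out = []
    for n in range(len(mic1)):
        m2 = mic2[n - shift] if 0 <= n - shift < len(mic2) else 0.0
        out.append(0.5 * (mic1[n] + m2))
    return out
```

For a source at broadside (90 degrees from the array axis) the inter-microphone delay is zero, so the two channels are simply averaged.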
Similarly to the storage unit 17 described in the first embodiment, the storage unit 37 stores a program or a parameter causing the respective components of the information processing device 3 to function. In addition to the information stored in the storage unit 17, the storage unit 37 stores map information of an area around the information processing device 3. Further, in addition to the information stored in the storage unit 17, the storage unit 37 further stores information of the position of the peripheral device 7 as the information related to the peripheral device 7. Further, for example, the information of the position of the peripheral device 7 stored in the storage unit 37 may be information related to a relative position with reference to the information processing device 3 or information of the position of the peripheral device 7 in the map information of the area around the information processing device 3.
Further, the map information related to the area around the information processing device 3 may be input to the information processing device 3 by the user or may be acquired by the information processing device 3 on the basis of information of the camera 14, the range sensor 15, or the like. Further, the information of the position of the peripheral device 7 may be input to the information processing device 3 by the user or may be acquired from the peripheral device 7.
The configuration example of the information processing device 3 according to the third embodiment of the present disclosure has been described above. Next, an operation example of the information processing device 3 according to the present embodiment will be described with reference to
In a case in which the speech of the user is predicted (YES in step S310), the user position acquiring unit 308 acquires the position of the user (S315). Then, the state detecting unit 204 transmits the transmission request for the capability information and the state information to the peripheral device 7, and receives the capability information and the state information from the peripheral device 7 (S320). Further, the state control unit 305 acquires the position of the peripheral device 7 from the storage unit 37 (S325).
Then, the state control unit 305 extracts the peripheral device 7 satisfying a condition based on the acquired position of the peripheral device 7 (S330). The state control unit 305 may extract the peripheral device 7 on the basis of the position of the peripheral device 7 or the position of the peripheral device 7 and the position of the user by any of the methods described above.
Then, the state control unit 305 causes the state information of the extracted peripheral device 7 to be stored in the storage unit 37 (S340). Further, the state control unit 305 controls the state of the extracted peripheral device 7 (S350). For example, the state control unit 305 generates a control signal for reducing the ambient sound (noise) for each of the extracted peripheral devices 7 on the basis of the capability information and the state information of the peripheral device 7 received in step S320 and causes the communication unit 11 to transmit the control signal.
Since a subsequent process of steps S360 to S390 illustrated in
As described above, according to the third embodiment of the present disclosure, if the speech of the user is predicted, the state information and the position of the peripheral device 7 around the information processing device 3 are acquired, and the state control is performed such that the noise output from the peripheral device 7 extracted on the basis of the position is reduced. With this configuration, even in a case in which there are other sound sources around the user, it is possible to improve the voice recognition accuracy when the user speaks. Further, according to the third embodiment of the present disclosure, the peripheral device 7 whose state is changed is extracted on the basis of the position of the peripheral device, and the state control is performed, and thus the state control of the peripheral device 7 can be performed more efficiently.
Further, in the above example, the example of extracting the control target on the basis of the position of the peripheral device 7 has been described, but the extraction of the control target may be performed in combination with the extraction of the control target based on the state of the peripheral device 7 described in the second embodiment.
Further, a control amount (for example, the amount by which the volume level is decreased) may be dynamically set on the basis of the position of the peripheral device 7. For example, the control amount may be set such that the volume level of the peripheral device 7 closer to the user is decreased more. The setting of the control amount based on the position of the peripheral device 7 described above may be performed in combination with the extraction of the control target based on the position of the peripheral device 7.
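A dynamically set control amount of the kind described above can be sketched as a simple falloff with distance; the linear falloff, the 5-step maximum reduction, and the 5 m range are assumptions for the example.

```python
import math

# Sketch of a distance-dependent control amount: the closer a peripheral
# device is to the user, the more volume steps are taken away. The linear
# falloff and the constants below are illustrative assumptions.

def volume_reduction(device_pos, user_pos, max_steps=5, max_range_m=5.0):
    """Number of volume steps to decrease, growing as the device gets
    closer to the user (0 beyond max_range_m)."""
    d = math.dist(device_pos, user_pos)
    if d >= max_range_m:
        return 0
    return round(max_steps * (1.0 - d / max_range_m))
```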
Further, it is also possible to apply each of the modified examples described in the first embodiment to the third embodiment.
The embodiments of the present disclosure have been described above. The information processing described above, such as the speech prediction process, the state detection process, the state control process, the voice recognition process, and the semantic analysis process, is implemented through cooperation between software and the hardware of the information processing devices 1 to 3. A hardware configuration example of an information processing device 1000 will be described as an example of the hardware configuration of the information processing devices 1 to 3 according to the embodiments of the present disclosure.
The CPU 1001 functions as an operation processing device and a control device and controls an overall operation of the information processing device 1000 in accordance with various kinds of programs. Further, the CPU 1001 may be a microprocessor. The ROM 1002 stores a program, an operation parameter, and the like which are used by the CPU 1001. The RAM 1003 temporarily stores a program to be used in the execution of the CPU 1001, a parameter that appropriately changes in the execution thereof, or the like. These components are connected to one another via a host bus including a CPU bus or the like. The functions of the control unit 10, the control unit 20, and the control unit 30 are mainly implemented by the CPU 1001, the ROM 1002, and the RAM 1003 in cooperation with software.
The input device 1004 includes an input device for inputting information, such as a mouse, a keyboard, a touch panel, a button, a microphone, a switch, or a lever, and an input control circuit for generating an input signal on the basis of an input by the user and outputting the input signal to the CPU 1001. By operating the input device 1004, the user of the information processing device 1000 can input various kinds of data to the information processing device 1000 or give an instruction to perform a processing operation.
The output device 1005 includes a display device such as, for example, a liquid crystal display (LCD) device, an OLED device, a see-through display, or a lamp. Further, the output device 1005 includes an audio output device such as a speaker and a headphone. For example, the display device displays a captured image, a generated image, or the like, while the audio output device converts voice data or the like into a voice and outputs the voice. For example, the output device 1005 corresponds to the speaker 13, the projecting unit 16, and the light emitting unit 18 described above with reference to
The storage device 1006 is a device for storing data. The storage device 1006 may include a storage medium, a recording device for recording data in a storage medium, a reading device for reading data from a storage medium, a deleting device for deleting data recorded in a storage medium, and the like. The storage device 1006 stores a program executed by the CPU 1001 and various kinds of data. The storage device 1006 corresponds to the storage unit 17 described above with reference to
The imaging device 1007 includes an imaging optical system, such as a photographing lens and a zoom lens for collecting light, and a signal converting element such as a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS). The imaging optical system collects light emitted from a subject and forms a subject image on the signal converting element, and the signal converting element converts the formed subject image into an electric image signal. The imaging device 1007 corresponds to the camera 14 described above with reference to
The communication device 1008 is, for example, a communication interface constituted by a communication device or the like for establishing a connection with a communication network. Further, the communication device 1008 may include a communication device compatible with a wireless local area network (LAN), a communication device compatible with long term evolution (LTE), a wire communication device performing wired communication, or a Bluetooth (registered trademark) communication device. The communication device 1008 corresponds to the communication unit 11 described above with reference to
As described above, according to the embodiment of the present disclosure, it is possible to improve the voice recognition accuracy even in a case in which there are other sound sources around the user.
The preferred embodiment(s) of the present disclosure has/have been described above with reference to the accompanying drawings, whilst the present disclosure is not limited to the above examples. A person skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.
For example, respective steps in the above embodiments need not necessarily be processed chronologically in accordance with the order described in the flowchart diagram. For example, respective steps in the processes of the above embodiments may be processed in an order different from the order described in the flowchart diagram or may be processed in parallel. For example, in the third embodiment, the example in which the peripheral devices satisfying a predetermined condition are extracted after the state of the peripheral device is detected (acquired) has been described, but the peripheral devices satisfying a predetermined condition may be extracted before the state of the peripheral device is detected.
Further, according to the above embodiments, it is also possible to provide a computer program causing hardware such as the CPU 1001, the ROM 1002, and the RAM 1003 to perform the functions similar to those of the information processing devices 1 to 3 described above. Further, a recording medium having the computer program recorded therein is also provided.
Further, the effects described in this specification are merely illustrative or exemplified effects, and are not limitative. That is, with or in the place of the above effects, the technology according to the present disclosure may achieve other effects that are clear to those skilled in the art from the description of this specification.
Additionally, the present technology may also be configured as below.
(1)
An information processing device including:
a state detecting unit configured to detect a state of another device which can be a source of noise; and
a state control unit configured to control the state of the other device on a basis of a detection result for the state of the other device and speech prediction of a user.
(2)
The information processing device according to (1), in which the state detecting unit detects the state of the other device on a basis of sound collection.
(3)
The information processing device according to (1) or (2), in which the state detecting unit detects the state of the other device on a basis of communication.
(4)
The information processing device according to any one of (1) to (3), in which the state control unit causes a volume level of the other device to be decreased.
(5)
The information processing device according to any one of (1) to (4), in which the state control unit causes an operation mode of the other device to be changed.
(6)
The information processing device according to any one of (1) to (5), in which the state control unit controls the state of the other device extracted from a plurality of the other devices on a basis of the state of the other device.
(7)
The information processing device according to any one of (1) to (6), in which the state control unit controls the state of the other device further on a basis of a position of the other device.
(8)
The information processing device according to (7), in which the state control unit controls a state of another device located within a noise occurrence region specified on a basis of sound collection.
(9)
The information processing device according to (7) or (8), in which the state control unit controls the state of the other device further on a basis of a position of the user.
(10)
The information processing device according to (9), in which the state control unit controls a state of another device located in substantially a same direction as the position of the user with reference to a position of the sound collecting unit.
(11)
The information processing device according to (10), further including: a sound source separating unit configured to acquire a voice of the user by separating the voice of the user from the voice acquired by the sound collecting unit.
(12)
The information processing device according to any one of (9) to (11), in which the state control unit controls a state of another device located near the position of the user.
(13)
The information processing device according to any one of (1) to (12), in which the state control unit controls the state of the other device further on a basis of a voice recognition result based on speech of the user.
(14)
The information processing device according to any one of (1) to (13), in which the state control unit controls the state of the other device further on a basis of a semantic analysis result based on speech of the user.
(15)
An information processing method including:
detecting a state of another device which can be a source of noise; and
controlling, by a processor, the state of the other device on a basis of a detection result for the state of the other device and speech prediction of a user.
(16)
A program causing a computer to implement:
a function of detecting a state of another device which can be a source of noise; and
a function of controlling the state of the other device on a basis of a detection result for the state of the other device and speech prediction of a user.
Number | Date | Country | Kind
---|---|---|---
2016-019193 | Feb 2016 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2016/087190 | 12/14/2016 | WO | 00