This application claims priority to Chinese Application No. 201611242997.7, filed on Dec. 29, 2016, which is incorporated herein by reference.
The present disclosure relates to the technical field of voice interaction, and in particular to a de-reverberation control method and device of sound producing equipment.
With the development of intelligent technology, many manufacturers have started to consider providing a voice recognition function in intelligent products. For example, computers, mobile phones, home appliances and other products are required to support wireless connection, remote control, voice interaction, and so on.
However, when a user performs voice interaction with an intelligent product, the sound made by the user is collected by the product after being reflected by the room, and thus reverberation is generated. Since the reverberation contains a signal similar to the correct signal and interferes considerably with the extraction of voice information and voice features, de-reverberation is desired. Existing de-reverberation solutions are poorly suited to a scenario where the user interacts with the intelligent product: they either have a low de-reverberation degree, which leaves a large reverberation residue, or have a high de-reverberation degree, which attenuates the user's voice. Accordingly, the recognition accuracy of a voice command may be severely reduced, so that the product fails to respond to the user's command in time, leading to a poor interaction experience.
The disclosure is intended to provide a de-reverberation control method and device of sound producing equipment, for solving the problem of low recognition accuracy of a voice command and poor interaction experience in the current products.
To this end, the technical solutions of the disclosure are implemented as follows.
According to an aspect, the disclosure provides a de-reverberation control method of sound producing equipment, which includes that:
when a piece of equipment performs audio playing, a voice signal from a user is collected in real time;
a relative position of the user with respect to the equipment and acoustic parameters of a room environment in which the user and the equipment are located, are acquired;
according to one or more of the relative position and the acoustic parameters, a corresponding microphone in the equipment is selected, and a corresponding voice enhancement mode is called to perform de-reverberation; and
a voice command word from the user is acquired, and the equipment is controlled to perform a function corresponding to the voice command, as a response to the user.
According to another aspect, the disclosure provides a de-reverberation control device of sound producing equipment, which includes:
a voice collector, which is arranged to, when the equipment performs audio playing, collect the voice signal from the user in real time;
a factor acquiring unit, which is arranged to acquire the relative position of the user with respect to the equipment and the acoustic parameters of the room environment in which the equipment is located;
a de-reverberation performing unit, which is arranged to, according to one or more of the relative position and the acoustic parameters, select the corresponding microphone in the equipment, and call the corresponding voice enhancement mode to perform the de-reverberation; and
a command executing unit, which is arranged to acquire the voice command word from the user, and control the equipment to perform the corresponding function, as a response to the user.
By means of the technical solutions of the disclosure, when the voice enhancement mode is adjusted based on the relative position of the user with respect to the equipment, the user's voice can be better enhanced or protected while the de-reverberation is performed, and voice recognition accuracy can be improved; when the de-reverberation is performed based on the acoustic parameters associated with the user and the equipment, different voice enhancement modes can be adopted according to changes in the acoustic environment indicated by the acoustic parameters to ensure an appropriate de-reverberation degree, thereby solving the problem of large reverberation residue or attenuation of the user's voice in the current solution, and achieving higher recognition accuracy. It can be understood that when the de-reverberation is performed based on both user information and environment information, the voice recognition accuracy can be further improved.
To make the aim, the technical solutions and the advantages of the disclosure clearer, implementation modes of the disclosure are further elaborated below in combination with the accompanying drawings.
An embodiment of the disclosure provides a de-reverberation control method of sound producing equipment. As shown in
In S101, when a piece of equipment performs audio playing, a voice signal from a user is collected in real time.
In S102, a relative position of the user with respect to the equipment and acoustic parameters of a room environment in which the user and the equipment are located, are acquired.
In the embodiment, when a factor (also called a reference quantity) for controlling de-reverberation is selected, two basic factors are considered, namely a user-related quantity and a space-related quantity, and a comprehensive factor containing both user information and space information is derived from them.
For example, a direction and a distance of the user relative to the equipment are acquired as the relative position, which is the user-related quantity. The acoustic parameters may belong to either the basic factors or the comprehensive factor. For example, the reverberation time (T60, T30, T20 or the like) of the room environment is a space-related quantity. The direct-to-reverberant ratio of the user's voice (the ratio of direct sound to reverberant sound in the user's voice collected by the equipment) and the intelligibility (e.g. C50), which the equipment calculates from the user's voice collected with its built-in microphone array, are associated with both the user and the space, and belong to the comprehensive factor.
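As an illustration of how the comprehensive factors named above might be computed, the following minimal sketch estimates C50 and a direct-to-reverberant ratio as early-to-late energy ratios of a room impulse response. It assumes that an impulse response is already available as a list of samples and that its largest sample marks the direct sound; the function names, the 2 ms direct-sound window and the synthetic response are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def _energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def early_to_late_ratio_db(impulse_response, sample_rate, split_ms):
    """Energy ratio in dB between the part of the response within split_ms
    of the direct-sound peak and everything after it."""
    # Assumption: the largest absolute sample marks the direct sound.
    peak = max(range(len(impulse_response)),
               key=lambda i: abs(impulse_response[i]))
    split = peak + int(round(sample_rate * split_ms / 1000.0))
    early = _energy(impulse_response[peak:split])
    late = _energy(impulse_response[split:])
    if late == 0.0:
        return math.inf
    if early == 0.0:
        return -math.inf
    return 10.0 * math.log10(early / late)


def clarity_c50(impulse_response, sample_rate):
    """C50: energy in the first 50 ms versus the remaining energy."""
    return early_to_late_ratio_db(impulse_response, sample_rate, 50.0)


def direct_to_reverberant_ratio(impulse_response, sample_rate, direct_ms=2.0):
    """DRR using a hypothetical direct-sound window of direct_ms."""
    return early_to_late_ratio_db(impulse_response, sample_rate, direct_ms)


# Synthetic response sampled at 1 kHz: direct sound, early reflections, tail.
response = [1.0] + [0.0] * 9 + [0.5] * 4 + [0.0] * 36 + [0.1] * 100
print(round(clarity_c50(response, 1000), 2))
print(round(direct_to_reverberant_ratio(response, 1000), 2))
```

In this synthetic case the early reflections raise C50 above the direct-to-reverberant ratio, because they count as useful early energy for C50 but as reverberant energy for the ratio.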
In S103, according to one or more of the relative position and the acoustic parameters, a corresponding microphone in the equipment is selected, and a corresponding voice enhancement mode is called to perform de-reverberation.
In S104, a voice command word from the user is acquired, and the equipment is controlled to perform a function corresponding to the voice command, as a response to the user.
From the above, by means of the technical solutions of the disclosure, when the voice enhancement mode is adjusted based on the relative position of the user with respect to the equipment, the user's voice can be better enhanced or protected while the de-reverberation is performed, and the voice recognition accuracy can be improved. When the de-reverberation is performed based on the acoustic parameters associated with the user and the equipment, different voice enhancement modes can be adopted according to changes in the acoustic environment indicated by the acoustic parameters to ensure an appropriate de-reverberation degree. Therefore, the problem of large reverberation residue or attenuation of the user's voice in the current solution may be solved, and thus a higher recognition accuracy may be obtained. It can be understood that when the de-reverberation is performed based on both user information and environment information, the voice recognition accuracy can be further improved.
In another embodiment based on the embodiment shown in
In this way, in keeping with the characteristics of a voice interaction scenario between the user and the equipment, detection of the wake-up word is taken to mean that the user has a new requirement; the equipment is then controlled to stop the current audio playing and to wait for a new command from the user. This not only contributes to further improving the recognition accuracy of the new command, but also conforms to usage habits in the voice interaction scenario, thereby improving the interaction experience.
The action of controlling the audio playing and S102 are performed at the same time, which shortens the response time so that the equipment responds to the user more promptly.
Furthermore, in S104, the command word includes commands of controlling built-in functions of the equipment. For example, the command word may include the command of controlling the play volume of a speaker of the equipment, the command of controlling the equipment to move, the command of controlling an application program installed in the equipment, and the like.
Since the command words are far more numerous than the wake-up words and their content is more complex, a cloud processing mode is adopted for the command word in this embodiment in order to reduce the equipment load and improve the recognition accuracy. After the equipment stops the audio playing, the voice signal sent by the user after the wake-up word is collected. The voice signal is transmitted to a cloud server; the cloud server performs feature matching on the voice signal and, when the feature matching succeeds, acquires the command word from the voice signal. The command word returned by the cloud server is received, and the equipment is controlled to perform the corresponding function according to the command word, so as to respond to the user accordingly.
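The cloud processing flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the cloud server is replaced by a local stand-in function, the voice signal is represented by a text string rather than audio, and all names (`respond_to_command`, `fake_cloud_recognizer`, `volume_up`) are hypothetical and do not come from the disclosure.

```python
def respond_to_command(voice_signal, recognize_in_cloud, handlers):
    """Send the voice signal collected after the wake-up word for recognition
    and run the function registered for the returned command word.

    Returns the executed command word, or None when feature matching fails
    or the returned word has no registered function.
    """
    # Stands in for transmitting the signal and receiving the command word.
    command_word = recognize_in_cloud(voice_signal)
    handler = handlers.get(command_word) if command_word is not None else None
    if handler is None:
        return None
    handler()
    return command_word


# Hypothetical stand-in for the feature matching done by the cloud server.
def fake_cloud_recognizer(voice_signal):
    known = {"turn up the volume": "volume_up", "move forward": "move_forward"}
    return known.get(voice_signal)


# Hypothetical equipment state and built-in function.
state = {"volume": 5}


def volume_up():
    state["volume"] += 1


handlers = {"volume_up": volume_up}
executed = respond_to_command("turn up the volume", fake_cloud_recognizer, handlers)
print(executed, state["volume"])
```

Keeping the recognizer as a parameter means the same control logic could be exercised with a real network client or with a local stub, which is one possible design choice rather than something the embodiment prescribes.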
In another embodiment of the disclosure, how to perform the de-reverberation based on the user-related quantity and the space-related quantity is described in detail. For other content of the solution, reference may be made to the other embodiments.
The sound producing equipment in each embodiment of the disclosure is sound producing equipment having a microphone array. The microphone array is used to collect the user's voice and perform de-reverberation. In a process of performing de-reverberation according to the basic factor or the comprehensive factor, the microphones selected differ according to product requirements and usage scenarios. Either all the microphones in the microphone array or only a part of them may be selected. For example, if the user is nearby and the voice is loud and clear, so that using only a part of the microphones achieves the same effect as using all of them, there is no need to use all the microphones. If the user is far away, the voice is weak and the reverberation is heavy, all the microphones need to be used for processing.
For a scenario where multiple factors are required to perform de-reverberation, in the present embodiment, priorities are respectively set for the factors included in the relative position and the acoustic parameters. The de-reverberation is performed based on the factors one by one, from the highest priority to the lowest priority. Alternatively, the de-reverberation is performed based only on those factors that have a priority higher than a predetermined level. Adopting the processing mode based on the priorities can not only provide a targeted voice enhancement mode according to different scenarios to achieve a better de-reverberation effect, but can also reduce calculation complexity and shorten the response time. It should be noted that de-reverberation may also be performed based on all the factors without considering the priorities.
For example, the priority of the relative position is set to be higher than the priority of the acoustic parameter, and within the relative position the priority of the direction is set to be higher than the priority of the distance. During the de-reverberation, the direction is adopted first, then the distance, and finally the acoustic parameter. Alternatively, a level value is set for the priority of each factor, together with a level threshold. For example, if the level value of the relative position is 5, the level value of the acoustic parameter is 3, and the level threshold is 4, and the rule is to adopt only the factors whose level value is higher than the threshold, the de-reverberation is performed using only the relative position. It can be understood that multiple priority levels can be respectively set for the factors in the acoustic parameters, and a processing mode similar to the above is adopted.
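The two priority-based processing modes can be sketched as follows. The level values 5 and 3 and the threshold 4 are taken from the example above; the function names and the dictionary representation are assumptions made for illustration only.

```python
def order_by_priority(levels):
    """Return factor names ordered from highest to lowest priority level."""
    return sorted(levels, key=lambda name: levels[name], reverse=True)


def factors_above_threshold(levels, threshold):
    """Return only the factors whose level value exceeds the threshold."""
    return [name for name in order_by_priority(levels)
            if levels[name] > threshold]


# Level values from the example in the text; the names are illustrative.
levels = {"relative_position": 5, "acoustic_parameters": 3}
print(order_by_priority(levels))
print(factors_above_threshold(levels, 4))
```

With these values the first call lists both factors with the relative position first, while the second keeps only the relative position, matching the example in which de-reverberation uses the relative position alone.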
In the present embodiment, the de-reverberation may be performed in the following implementations.
A First Implementation
According to the direction of the user relative to the equipment, the corresponding microphone in the equipment is selected, and the voice direction enhanced by the voice enhancement mode is adjusted to perform the de-reverberation.
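One possible way to select microphones according to the user's direction is sketched below. The sketch assumes a circular array whose microphone angles are known in degrees and simply picks the microphones angularly closest to the user; the array geometry, the function names and the choice of three microphones are hypothetical, and the disclosure does not prescribe this particular selection rule.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle in degrees between two directions."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def select_microphones(user_direction_deg, mic_angles_deg, count):
    """Return the indices of the `count` microphones closest to the user."""
    ranked = sorted(
        range(len(mic_angles_deg)),
        key=lambda i: angular_distance(user_direction_deg, mic_angles_deg[i]),
    )
    return sorted(ranked[:count])


# Hypothetical six-microphone circular array, one microphone every 60 degrees.
mic_angles = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]
user_direction = 350.0

selected = select_microphones(user_direction, mic_angles, 3)
# The enhanced voice direction is steered toward the user.
steering_direction = user_direction
print(selected, steering_direction)
```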
A Second Implementation
When the distance of the user relative to the equipment is less than a first distance threshold, a de-reverberation degree and a voice amplification function in the voice enhancement mode are reduced to a first enhancement level. When the distance of the user relative to the equipment is greater than a second distance threshold, the de-reverberation degree and the voice amplification function in the voice enhancement mode are increased to a second enhancement level. When the distance of the user relative to the equipment is greater than the first distance threshold and less than the second distance threshold, the de-reverberation degree and the voice amplification function in the voice enhancement mode are adjusted to be between the first enhancement level and the second enhancement level.
That is, when the user is close to the equipment, the de-reverberation degree and the amplification degree of the user's voice are reduced. When the user is far away from the equipment, the de-reverberation degree and the amplification degree of the user's voice are increased.
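The distance rule can be sketched as a piecewise mapping from distance to enhancement level. The text only requires the level to lie between the first and second enhancement levels when the distance lies between the two thresholds; the linear interpolation used here, the handling of distances exactly equal to a threshold, and all numeric values are assumptions for illustration.

```python
def enhancement_level(distance, first_threshold, second_threshold,
                      first_level, second_level):
    """Map the user's distance to an enhancement level.

    Below the first threshold the first (lower) level is used, above the
    second threshold the second (higher) level is used, and in between the
    level is interpolated linearly (an assumption, not from the disclosure).
    """
    if not first_threshold < second_threshold:
        raise ValueError("first_threshold must be less than second_threshold")
    if distance <= first_threshold:
        return first_level
    if distance >= second_threshold:
        return second_level
    fraction = (distance - first_threshold) / (second_threshold - first_threshold)
    return first_level + fraction * (second_level - first_level)


# Hypothetical thresholds in metres and levels on a 0..1 scale.
for d in (0.5, 2.0, 5.0):
    print(d, round(enhancement_level(d, 1.0, 3.0, 0.2, 0.8), 2))
```

The same shape of mapping could be reused for the acoustic parameters, with the reverberation degree in place of the distance.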
A Third Implementation
When a reverberation degree in the room environment indicated by the acoustic parameters is greater than a first reverberation threshold, the de-reverberation degree in the voice enhancement mode is increased to a first degree. When the reverberation degree in the room environment indicated by the acoustic parameters is less than a second reverberation threshold, the de-reverberation degree in the voice enhancement mode is reduced to a second degree. When the reverberation degree in the room environment indicated by the acoustic parameters is less than the first reverberation threshold and greater than the second reverberation threshold, the de-reverberation degree in the voice enhancement mode is adjusted to be between the first degree and the second degree.
That is, when the reverberation degree in the room environment is higher, the de-reverberation degree is increased. When the reverberation degree in the room environment is lower, the de-reverberation degree is reduced.
Only the operations of the voice enhancement mode that are closely related to the solution are described above; the mode may include further operations, for example equalization processing of the voice signal.
The specific values of the reverberation threshold and the reverberation degree are not strictly limited here, but can vary in a specific range.
Another embodiment of the disclosure provides a de-reverberation control device 200 of sound producing equipment. As shown in
The voice collector 201 is arranged to, when the equipment performs audio playing, collect the voice signal from the user in real time. The voice collector can be implemented by the microphone array in the equipment.
The factor acquiring unit 202 is arranged to acquire the relative position of the user with respect to the equipment and the acoustic parameters of the room environment in which the equipment is located.
The de-reverberation performing unit 203 is arranged to, according to one or more of the relative position and the acoustic parameters, select the corresponding microphone in the equipment, and call the corresponding voice enhancement mode to perform the de-reverberation.
The command executing unit 204 is arranged to acquire the voice command word from the user, and control the equipment to perform the corresponding function, as a response to the user.
Based on the embodiment shown in
The de-reverberation performing unit 203 is arranged to respectively set priorities for the factors included in the relative position and the acoustic parameters, and either perform the de-reverberation based on the factors one by one, from the highest priority to the lowest priority, or perform the de-reverberation based only on those factors that have a priority higher than the predetermined level.
The de-reverberation performing unit 203 is specifically arranged to perform at least one of the following three actions:
according to the direction of the user relative to the equipment, select the corresponding microphone in the equipment, and adjust the voice direction enhanced by the voice enhancement mode to perform the de-reverberation; or
when the distance of the user relative to the equipment is less than the first distance threshold, reduce the de-reverberation degree and the voice amplification function in the voice enhancement mode to the first enhancement level; when the distance of the user relative to the equipment is greater than the second distance threshold, increase the de-reverberation degree and the voice amplification function in the voice enhancement mode to the second enhancement level; when the distance of the user relative to the equipment is greater than the first distance threshold and less than the second distance threshold, adjust the de-reverberation degree and the voice amplification function in the voice enhancement mode to be between the first enhancement level and the second enhancement level; or
when the reverberation degree in the room environment indicated by the acoustic parameters is greater than the first reverberation threshold, increase the de-reverberation degree in the voice enhancement mode to the first degree; when the reverberation degree in the room environment indicated by the acoustic parameters is less than the second reverberation threshold, reduce the de-reverberation degree in the voice enhancement mode to the second degree; when the reverberation degree in the room environment indicated by the acoustic parameters is less than the first reverberation threshold and greater than the second reverberation threshold, adjust the de-reverberation degree in the voice enhancement mode to be between the first degree and the second degree.
The command executing unit 204 is specifically arranged to collect the voice signal sent by the user after the wake-up word, transmit the voice signal to the cloud server, which performs feature matching on the voice signal and acquires the command word from the voice signal when the feature matching succeeds, receive the command word returned by the cloud server, and control the equipment to perform the corresponding function according to the command word.
The de-reverberation control device 200 of sound producing equipment is provided in the sound producing equipment. The sound producing equipment includes, but is not limited to, intelligent portable terminals and intelligent household electrical appliances. The intelligent portable terminals at least include a smart watch, a smart phone or a smart speaker. The intelligent household electrical appliances at least include a smart television, a smart air-conditioner or a smart recharge socket.
For the specific working mode of each unit in the device embodiment, reference can be made to the related content of the embodiments of the disclosure, which is not repeated here.
For example, the voice collector may be a microphone or a microphone array. The factor acquiring unit may be implemented by a range finder, such as an infrared range finder or a laser range finder; a direction finder, such as a radio direction finder; and a processor. The de-reverberation performing unit and the command executing unit may be implemented in a processor. The device may further include a transceiver arranged to transmit and receive signals.
From the above, by means of the technical solutions of the disclosure, when the voice enhancement mode is adjusted based on the relative position of the user with respect to the equipment, the user's voice can be better enhanced or protected while the de-reverberation is performed, and the voice recognition accuracy can be improved. When the de-reverberation is performed based on the acoustic parameters associated with the user and the equipment, different voice enhancement modes can be adopted according to changes in the acoustic environment indicated by the acoustic parameters to ensure an appropriate de-reverberation degree, thereby solving the problem of large reverberation residue or attenuation of the user's voice in the current solution, and achieving higher recognition accuracy. It can be understood that when the de-reverberation is performed based on both user information and environment information, the voice recognition accuracy can be further improved.
Those of ordinary skill in the art can understand that all or a part of the steps of the above embodiments can be performed by using a computer program flow. The computer program can be stored in a computer-readable storage medium. The computer program, when executed on a corresponding hardware platform (such as a system, installation, equipment or device), performs one of or a combination of the steps in the method.
Optionally, all or a part of steps of the above embodiments can also be performed by using an integrated circuit. These steps may be respectively made into integrated circuit modules. Alternatively, multiple modules or steps may be made into a single integrated circuit module.
The devices/function modules/function units in the above embodiment can be realized by using a general computing device. The devices/function modules/function units can be either integrated on a single computing device, or distributed on a network composed of multiple computing devices.
When the devices/function modules/function units in the above embodiment are realized in the form of a software function module and sold or used as an independent product, they can be stored in a computer-readable storage medium. The computer-readable storage medium may be a ROM, a magnetic disk or a compact disk.
The above is only the preferred embodiment of the disclosure and not intended to limit the disclosure. Any modifications, equivalent replacements, improvements and the like within the spirit and principle of the disclosure shall fall within the scope of protection of the disclosure.
Number | Date | Country | Kind |
---|---|---|---|
2016 1 1242997 | Dec 2016 | CN | national |
Number | Date | Country | |
---|---|---|---|
20180190308 A1 | Jul 2018 | US |