The disclosure relates in general to a communication apparatus and voice processing method therefor.
Users of communication devices often change how loudly they speak during a phone call, depending on their surroundings. For example, a user speaks loudly in a noisy environment and speaks in a low voice in a situation that calls for whispering. However, this self-adjustment of loudness by the speaker does not necessarily improve the sound quality experienced at the far end.
The disclosure provides embodiments of a communication apparatus and voice processing method therefor.
According to one embodiment of the disclosure, a voice processing method is provided, for use in a communication apparatus. The embodiment includes the following steps. A near-end audio signal is received by at least one microphone of the communication apparatus. Voice energy data and noise energy data are generated by performing voice activity detection on the near-end audio signal. An amount of noise is obtained by performing noise energy calculation with the noise energy data. It is determined whether the amount of noise exceeds a first noise amount threshold. If the amount of noise exceeds the first noise amount threshold, a sidetone mode of the communication apparatus is enabled to produce a sidetone signal according to the voice energy data and to play the sidetone signal through a speaker of the communication apparatus. A noise suppression mode is enabled to produce a far-end audio signal according to the voice energy data, and the far-end audio signal is transmitted by a communication module of the communication apparatus.
According to another embodiment of the disclosure, a communication apparatus is provided. An embodiment of the communication apparatus includes at least one microphone, an audio processing unit, a speaker, and a communication module. The at least one microphone is for receiving a near-end audio signal. The audio processing unit is operative to: perform voice activity detection on the near-end audio signal to generate voice energy data and noise energy data; perform noise energy calculation with the noise energy data to obtain an amount of noise; determine whether the amount of noise exceeds a first noise amount threshold; enable a sidetone mode to produce a sidetone signal according to the voice energy data when the amount of noise exceeds the first noise amount threshold; and enable a noise suppression mode to produce a far-end audio signal according to the voice energy data. The speaker is for playing the sidetone signal. The communication module is for transmitting the far-end audio signal.
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments.
Embodiments of a communication apparatus and voice processing method therefor are provided as follows.
Referring to
When a user uses a communication device as shown in
In one embodiment, the communication apparatus 1 can implement an embodiment of a voice processing method as shown in
Embodiments of
In the above embodiment, playing the sidetone signal in step S250 indicates that the user at the side of the communication apparatus 1 is speaking loudly, so as to remind the user to lower his or her voice. In another embodiment according to
In another embodiment according to
In step S260, the noise suppression mode is enabled to generate the far-end audio signal so that the far end receives audio with reduced noise. Further, step S260 can be performed before or after step S250 or S245; the order of the steps is not limited to that of the above embodiments.
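The threshold decision described in the summary can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `ModeDecision` and `decide_modes` are hypothetical, the disclosure gives no numeric threshold so the caller supplies one, and the sketch assumes that the noise suppression mode is enabled regardless of the outcome of the noise comparison.

```python
from dataclasses import dataclass


@dataclass
class ModeDecision:
    """Hypothetical container for the modes chosen for the current audio."""
    sidetone_enabled: bool
    noise_suppression_enabled: bool


def decide_modes(noise_amount: float, first_noise_threshold: float) -> ModeDecision:
    """Pick the modes from the estimated amount of noise.

    The sidetone mode is enabled only when the amount of noise exceeds the
    first noise amount threshold. Noise suppression is assumed here to be
    enabled on both branches; the threshold value itself is an assumption
    supplied by the caller.
    """
    sidetone = noise_amount > first_noise_threshold
    return ModeDecision(sidetone_enabled=sidetone, noise_suppression_enabled=True)
```

Because the two modes are decided independently in this sketch, the order in which they are applied does not matter, which is consistent with the statement that the step order is not limited.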
In addition, to prevent the far end from hearing echo during a call, echo cancellation can be performed on the near-end audio signal before voice activity detection is performed, for example, before step S220 or within step S220.
Referring to
In one embodiment, the criterion for the whisper mode in step S320 includes, for example: whether the amount of voice is less than a voice amount threshold; and whether the amount of noise is less than a second noise threshold, wherein if the amount of voice is less than the voice amount threshold and the amount of noise is less than the second noise threshold, then the criterion for the whisper mode is satisfied. The criterion for the whisper mode is not limited to this example; any other criterion by which it can be determined whether the amount of voice and the amount of noise indicate that the user is whispering can be used as the criterion for the whisper mode. Further, in another embodiment, the first noise amount threshold can be greater than the second noise threshold.
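The example criterion above amounts to a conjunction of two comparisons. The following is a minimal sketch; the function name `is_whisper` and its parameter names are hypothetical, and the threshold values are left to the caller because the disclosure does not specify them.

```python
def is_whisper(
    voice_amount: float,
    noise_amount: float,
    voice_threshold: float,
    second_noise_threshold: float,
) -> bool:
    """Return True when the example whisper criterion is satisfied.

    Both conditions must hold: the amount of voice is below the voice
    amount threshold, and the amount of noise is below the second noise
    threshold. All threshold values are assumptions of the caller.
    """
    return voice_amount < voice_threshold and noise_amount < second_noise_threshold
```

A quiet voice in a noisy place fails the second comparison, so it is not treated as whispering under this example criterion.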
In step S330, the communication apparatus 1 can apply filtering, designed according to the nonlinear characteristics of human hearing, to generate the boosted audio signal based on the voice energy data.
Moreover, steps S220-S250, S260, S310-S330 can be implemented by the audio processing unit 110. The audio processing unit 110 can be disposed in the communication apparatus 1, as shown in
Referring to
The voice estimation module 420 can obtain a voice signal from the digital audio signal Sa according to the detection result signal Sc, and thus obtain the amount of voice. In such a way, the voice activity detection module 410 can be regarded as generating the voice energy data. In other words, for the voice estimation module 420, receiving the digital audio signal Sa and the detection result signal Sc is the same as receiving the voice energy data.
The noise estimation module 430 can also obtain a noise signal from the digital audio signal Sa according to the detection result signal Sc, and thus obtain the amount of noise. In such a way, the voice activity detection module 410 can be regarded as generating the noise energy data. In other words, for the noise estimation module 430, receiving the digital audio signal Sa and the detection result signal Sc is the same as receiving the noise energy data.
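One way to picture how the two estimation modules use the same inputs is sketched below. It is an illustration under stated assumptions, not the disclosed design: the digital audio signal Sa is assumed to be split into frames, the detection result signal Sc is assumed to be one boolean per frame (True for voice), and the "amount" is assumed to be the mean of per-frame mean-square energies. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame; an empty frame counts as zero."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def estimate_amounts(
    sa_frames: Sequence[Sequence[float]],
    sc_flags: Sequence[bool],
) -> Tuple[float, float]:
    """Return (amount of voice, amount of noise).

    sa_frames stands in for the digital audio signal Sa and sc_flags for the
    detection result signal Sc (assumed: True marks a voice frame).
    """
    if len(sa_frames) != len(sc_flags):
        raise ValueError("Sa frames and Sc flags must have the same length")

    voice: List[float] = []
    noise: List[float] = []
    for frame, is_voice in zip(sa_frames, sc_flags):
        # Route each frame's energy to the voice or the noise estimate.
        (voice if is_voice else noise).append(frame_energy(frame))

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else 0.0

    return mean(voice), mean(noise)
```

In this sketch the pair (Sa, Sc) is all that either estimate needs, which mirrors the statement that receiving Sa and Sc is the same as receiving the voice energy data or the noise energy data.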
Further, every module in
In other embodiments, the voice estimation module 420 and the noise estimation module 430 can further employ a smoothing technique to prevent the estimates of the amount of voice and the amount of noise from being affected by short, rapid changes or errors, and to prevent the determination in step S240 or S310 from being unstable or erroneous. For instance, the noise energy can be defined by Ne=α*Ne_c+(1−α)*Ne_p, wherein 0<α<1, and Ne_c and Ne_p represent the current (present) noise energy value and the previous noise energy value, respectively. As such, with α set to an appropriate value, Ne can be used in place of Ne_c to smooth out rapid changes in the current noise energy.
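The smoothing formula Ne=α*Ne_c+(1−α)*Ne_p can be sketched as follows. The function names are hypothetical, and the sketch assumes a recursive form in which Ne_p is the previously smoothed value, with an assumed initial value of 0.0; the text only calls Ne_p the previous noise energy value.

```python
from typing import Iterable, List


def smooth_energy(ne_c: float, ne_p: float, alpha: float) -> float:
    """Compute Ne = alpha * Ne_c + (1 - alpha) * Ne_p, with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return alpha * ne_c + (1.0 - alpha) * ne_p


def smooth_sequence(values: Iterable[float], alpha: float, initial: float = 0.0) -> List[float]:
    """Smooth a run of noise energy values.

    Assumption: each smoothed output is fed back as Ne_p for the next
    value, starting from `initial`.
    """
    smoothed: List[float] = []
    ne_p = initial
    for ne_c in values:
        ne_p = smooth_energy(ne_c, ne_p, alpha)
        smoothed.append(ne_p)
    return smoothed
```

A smaller α weights the previous value more heavily, so a single spike in Ne_c moves Ne less; a larger α tracks the current value more closely.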
The embodiments of the voice processing method are not limited by the manner of the voice activity detection as illustrated in
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments. It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims and their equivalents.
Number | Date | Country |
---|---|---|
20140236590 A1 | Aug 2014 | US |