1. Field of the Invention
The invention relates to noise cancellation, and more particularly to noise cancellation in Internet communication devices.
2. Description of the Related Art
Because the cost of traditional circuit-switched telephony is high, Internet phones are frequently used to make domestic long-distance and international calls. Consequently, Internet communication devices, such as VoIP devices and Instant Messengers, have become popular. Skype, MSN Messenger, Yahoo Messenger, Google Talker, and AOL Messenger are examples of Instant Messenger software applications for Internet communication. Increased use of Internet communication devices demands improved audio quality from them. One of the greatest obstacles to audio quality in Internet communication devices is noise.
Noise from computer fans, typing, and mouse movement is often picked up by the microphone of an Internet communication device connected to the computer. Internet communication devices comprising noise suppression modules can typically cancel most stationary noise, but only up to a certain level, so as not to degrade voice quality excessively. As a result, considerable residual noise remains even after noise suppression. In addition, conventional noise suppression modules cannot eliminate non-stationary noise.
Because the noise of each party is independent, when multiple parties are in a VoIP conference, the total noise level is the sum of the noise of each party. Automatic gain control modules connected to Internet communication devices may further amplify the noise. Thus, a method for handling noise, particularly non-stationary noise, in Internet communication devices to improve their audio quality is desirable.
The invention provides an Internet communication device. An exemplary embodiment of the Internet communication device plays a remote audio signal received through a network and transmits an audio signal to a remote user to complete the communication. The Internet communication device comprises a line-in speech detection module and a line-in channel control module. The line-in speech detection module detects whether or not the remote audio signal is speech to generate a remote speech detection result. The line-in channel control module then attenuates the remote audio signal if the remote speech detection result indicates that the remote audio signal is not speech. Noise is thus removed from the remote audio signal.
A method for controlling noise of an Internet communication device is also provided. The Internet communication device outputs a remote audio signal received from a network and transmits an audio signal to a remote user through the network to complete a conversation. Whether or not the remote audio signal is speech is first detected to generate a remote speech detection result. The remote audio signal is then attenuated if the remote speech detection result indicates that the remote audio signal is not speech. Noise is thus removed from the remote audio signal.
A detailed description is given in the following embodiments with reference to the accompanying drawings.
The invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
The Internet communication device 100 is connected to the personal computer 108 via an interface 110, such as a USB interface, an analog audio interface, or a software API interface if the Internet communication device 100 is a software speakerphone module. After the Internet communication device 100 receives the remote audio signal through the interface 110, the remote audio signal is processed by line-in signal path modules of the Internet communication device 100 before being output by a loudspeaker 122. The line-in signal path is shown in the lower half of
The line echo cancellation module 112 removes the echo caused by the network or line from the remote audio signal. The line-in noise suppression module 114 then removes some stationary noise from the remote audio signal. Only part of the stationary noise, however, can be eliminated because the remote audio is attenuated in conjunction with the elimination of the stationary noise. In addition, non-stationary noise cannot be removed by the line-in noise suppression module 114. Thus, two modules, the line-in speech detection module 102 and the line-in channel control module 104, are added to the Internet communication device 100 to cancel the residual noise and non-stationary noise carried by the remote audio signal.
The line-in speech detection module 102 first detects whether or not the remote audio signal is real speech. If the remote audio signal is real speech, a remote speech detection result with a value of 1 is generated. Otherwise, a remote speech detection result with a value of 0 is generated. The remote speech detection result is delivered to the line-in channel control module 104. If the remote speech detection result indicates that the remote audio signal is not speech, the line-in channel control module 104 attenuates the remote audio signal. For example, the line-in channel control module 104 mutes a non-speech remote audio signal. Thus, all noise including non-stationary noise is removed from the remote audio signal. The line-in automatic gain control module 116 then adjusts the signal level of the remote audio signal to an appropriate level. After being further converted to an analog signal and amplified by power amplifier 120, the remote audio signal is output by loudspeaker 122, allowing the user to hear the remote audio signal with no noise.
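The channel control step described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the line-in channel control module 104: the function name `channel_control`, the per-sample list representation, and the `attenuation` parameter are assumptions introduced for the example. An attenuation factor of 0.0 corresponds to muting.

```python
def channel_control(samples, speech_flags, attenuation=0.0):
    """Attenuate samples whose speech detection result is 0.

    samples:      list of remote audio samples (hypothetical representation)
    speech_flags: list of remote speech detection results, 1 = speech, 0 = not speech
    attenuation:  gain applied to non-speech samples; 0.0 mutes them (assumed default)
    """
    output = []
    for sample, flag in zip(samples, speech_flags):
        if flag == 1:
            # Speech is passed through unchanged.
            output.append(sample)
        else:
            # Non-speech is attenuated (muted when attenuation is 0.0).
            output.append(sample * attenuation)
    return output
```

For example, calling `channel_control([0.5, 0.4, 0.1], [1, 0, 1])` passes the first and third samples through and mutes the second.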
The microphone 130 receives an audio signal from a user. The audio signal is then processed by line-out signal path modules of Internet communication device 100 before transmission via interface 110 to a network. The line-out signal path is shown in the upper half of
Ps(n)=αs·Ps(n−1)+(1−αs)·L(n)·L(n); and (1)
Pl(n)=αl·Pl(n−1)+(1−αl)·L(n)·L(n); (2)
wherein L(n) is the remote audio signal, αs is a predetermined short-term smoothing parameter, αl is a predetermined long-term smoothing parameter, and n is a sample index. The short-term smoothing parameter αs and the long-term smoothing parameter αl are chosen such that (1−αl) is at least one order of magnitude less than (1−αs), so that the short-term power Ps(n) is updated faster than the long-term power Pl(n).
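The recursions (1) and (2) can be illustrated with a short Python sketch. The function name `track_powers`, the zero initial powers, and the example values αs = 0.9 and αl = 0.99 are assumptions chosen only so that (1−αl) is one order of magnitude less than (1−αs); the text does not specify particular values.

```python
def track_powers(samples, alpha_s=0.9, alpha_l=0.99):
    """Return a list of (Ps(n), Pl(n)) pairs following equations (1) and (2).

    alpha_s and alpha_l are illustrative values, not values from the text.
    Both powers are assumed to start at zero.
    """
    ps = 0.0  # short-term power Ps(n-1)
    pl = 0.0  # long-term power Pl(n-1)
    history = []
    for x in samples:
        instantaneous = x * x  # L(n) * L(n)
        ps = alpha_s * ps + (1.0 - alpha_s) * instantaneous  # equation (1)
        pl = alpha_l * pl + (1.0 - alpha_l) * instantaneous  # equation (2)
        history.append((ps, pl))
    return history
```

With these example values, a sudden onset of signal raises Ps(n) roughly ten times faster per sample than Pl(n), which is the behavior the comparators rely on.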
The noise estimation module 206 derives a noise power estimate Pn(n) from a noise estimate N(m) of the remote audio signal. The frequency domain noise estimate N(m) is obtained from the line-in noise suppression module 114 of
Pn(n)=Q([2n/M]); (4)
wherein k is a frame index, M is a frame size for frequency domain processing, and the function [x] denotes the integer closest to x.
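The index mapping of equation (4) can be sketched in Python as follows. Here `q` is a hypothetical list holding the per-frame values Q(k), whose definition is not reproduced in this text. Rounding ties upward and clamping the frame index to the available range are assumptions made only to keep the sketch runnable.

```python
import math


def nearest_int(x):
    # Integer closest to x. Ties round up, which is an assumption:
    # the text does not say how ties are resolved.
    return math.floor(x + 0.5)


def noise_power_at_sample(q, n, frame_size):
    """Return Pn(n) = Q([2n/M]) for sample index n and frame size M.

    q is a hypothetical list of per-frame values Q(k).
    """
    k = nearest_int(2 * n / frame_size)
    # Clamp to the valid range (assumption, not stated in the text).
    k = min(max(k, 0), len(q) - 1)
    return q[k]
```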
After the short-term power Ps(n), the long-term power Pl(n), and the noise power estimate Pn(n) are obtained, they are delivered to the comparators 208 and 210. The comparator 208 compares the difference between the short-term and the long-term powers Ps(n) and Pl(n) with a first threshold T1(n) to generate a first comparison result C1(n). The comparator 210 compares the difference between the long-term power Pl(n) and the noise power estimate Pn(n) with a second threshold T2(n) to generate a second comparison result C2(n). The first comparison result C1(n) and the second comparison result C2(n) are determined according to the following algorithms:
wherein the function |x| denotes the absolute value of x, and log(x) denotes the base-10 logarithm of x.
If the first comparison result C1(n) indicates that the short-term power Ps(n) is much greater than the long-term power Pl(n), and the second comparison result C2(n) indicates that the long-term power Pl(n) is much greater than the noise power estimate Pn(n), both the first comparison result C1(n) and the second comparison result C2(n) are true, and the detector module 212 enables a detector output D(n) to trigger the harmonic detection module 214. Thus, the detector output D(n) is determined according to the following algorithm:
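The exact comparator and detector equations are not reproduced in this text, so the following Python sketch is only a hedged illustration of the logic described above, not the claimed algorithm. It assumes that the power differences are taken in decibels using the base-10 logarithm, that signed differences are compared, and that the thresholds are fixed example values; the names `to_db` and `detector_output` and the 6 dB defaults are hypothetical.

```python
import math


def to_db(power, floor=1e-12):
    # Convert a power value to decibels; the floor avoids log10(0)
    # and is an assumption of this sketch.
    return 10.0 * math.log10(max(power, floor))


def detector_output(ps, pl, pn, t1_db=6.0, t2_db=6.0):
    """Return True when both comparison results are true.

    ps: short-term power Ps(n)
    pl: long-term power Pl(n)
    pn: noise power estimate Pn(n)
    t1_db, t2_db: illustrative thresholds standing in for T1(n) and T2(n)
    """
    c1 = (to_db(ps) - to_db(pl)) > t1_db  # short-term well above long-term
    c2 = (to_db(pl) - to_db(pn)) > t2_db  # long-term well above noise estimate
    return c1 and c2
```

In this sketch the harmonic detection module would be triggered only when `detector_output` returns `True`.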
When triggered by the detector output D(n), the harmonic detection module 214 performs harmonic analysis on the remote audio signal L(n) to detect whether or not the remote audio signal L(n) consists of real speech. If the remote audio signal L(n) comprises speech, the harmonic detection module 214 generates a remote speech detection result S(n) with the value “1”, indicating the existence of speech. Thus, the line-in channel control module 104 of
The speech period control module 304 then generates the speech period signal G(n) to control the attenuation of the remote audio signal L(n) according to the detection frequency V(n) and the remote speech detection result S(n). If the detection frequency V(n) is greater than a frequency threshold B, the speech period control module 304 extends the speech period; if the detection frequency V(n) is less than the frequency threshold B, the speech period is shortened. As a result, during a conversation between two Internet communication devices, the remote audio signal L(n) is not repeatedly muted for short periods at a high rate, which eliminates harsh, potentially ear-damaging sound in the remote audio signal L(n). The attenuation control module 306 then mutes the remote audio signal L(n) according to the speech period signal G(n) to obtain the remote audio signal L′(n). The speech period signal G(n) is determined according to the following algorithms:
wherein m is a frame index, and M is a frame size for frequency domain processing.
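Because the equations for the speech period signal are not reproduced in this text, the following Python sketch shows only one possible reading of the speech period control described above. It assumes a simple hold-time scheme: after each detected speech frame the gate stays open for a longer hold when the detection frequency exceeds the threshold B and for a shorter hold otherwise. The function name `speech_period_gate` and the `long_hold` and `short_hold` values are hypothetical.

```python
def speech_period_gate(speech_flags, detection_freq, threshold_b,
                       long_hold=4, short_hold=1):
    """Return a list of gate values, 1 = pass the signal, 0 = mute.

    speech_flags:   remote speech detection results S, 1 or 0 per frame
    detection_freq: detection frequency V per frame
    threshold_b:    frequency threshold B
    long_hold, short_hold: illustrative hold lengths in frames (assumptions)
    """
    gate = []
    remaining = 0  # frames the gate stays open after the last speech frame
    for s, v in zip(speech_flags, detection_freq):
        if s == 1:
            # Extend the speech period when speech is detected often,
            # shorten it otherwise.
            remaining = long_hold if v > threshold_b else short_hold
            gate.append(1)
        elif remaining > 0:
            remaining -= 1
            gate.append(1)
        else:
            gate.append(0)
    return gate
```

The design intent of such a hold time is that brief gaps between words do not cause the signal to be muted and unmuted in rapid succession.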
The comparator 402 determines whether a difference between a power Px(m) of the audio signal and a stationary noise estimate power Pn(m) of the audio signal is greater than a third threshold Tx(m) to obtain a third comparison result Cf(m). If the third comparison result Cf(m) is true, the power Px(m) of the audio signal is much larger than the stationary noise estimate power Pn(m), and the audio signal may comprise speech. Thus, the pitch detection module 404 is triggered to perform pitch detection on the audio signal X(m) to generate a pitch detection signal Dx(m). If the pitch detection is positive, the audio signal is confirmed to comprise speech. In one embodiment, the pitch detection module 404 performs pitch detection based on the method provided by D. Huang et al. in “Speech pitch detection in noisy environment using multi-rate adaptive lossless FIR filters”, ISCAS'04, 22-26 May 2004, or the method provided by L. Hui et al. in “A Pitch Detection Algorithm Based on AMDF and ACF”, ICASSP'06, 14-19 May 2006.
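To illustrate what a pitch check does, the following Python sketch uses a plain normalized autocorrelation test. This is a deliberately simplified stand-in and is not either of the cited methods. The function name `has_pitch`, the 8000 Hz sample rate, the 80 to 400 Hz pitch range, and the 0.5 decision threshold are all assumptions made for the example.

```python
import math


def has_pitch(frame, sample_rate=8000, fmin=80.0, fmax=400.0, threshold=0.5):
    """Return True if the frame shows a strong periodicity in the pitch range.

    A basic normalized autocorrelation test, used here only as an
    illustration; all parameter values are assumptions.
    """
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False  # silence has no pitch
    min_lag = int(sample_rate / fmax)               # shortest pitch period
    max_lag = min(int(sample_rate / fmin), n - 1)   # longest pitch period
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        best = max(best, corr / energy)
    return best > threshold
```

A periodic signal such as a 200 Hz tone produces a normalized autocorrelation peak near 1 at a lag of one period, whereas broadband noise does not produce such a peak.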
If both the pitch detection signal Dx(m) and the remote detection signal Vf(m) are true, a conversation between Internet communication devices is underway, and the detector module 408 enables the speech detection result Sx(n). Thus, the automatic gain control module 138 of
wherein Sx(m) is the frequency-domain speech detection result, Sx(n) is the time-domain speech detection result, and the function [x] denotes the integer closest to x.
The invention provides a method for controlling noise of an Internet communication device. A line-in speech detection module is added to detect the speech of a remote audio signal sent by a far-end talker, and the remote audio signal is muted by a line-in channel control module if the remote audio signal is not speech. A microphone speech detection module is added to detect the speech of an audio signal received from a near-end talker, and the audio signal is not amplified if the audio signal is not speech. Thus, the noise including non-stationary noise is eliminated from the remote audio signal and the audio signal, and the audio quality of the Internet communication device is improved.
While the invention has been described by way of example and in terms of a preferred embodiment, it is to be understood that the invention is not limited thereto. On the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | |
---|---|---|---|
20080147393 A1 | Jun 2008 | US |