This invention relates in general to hands-free communication devices, and more specifically to a method and apparatus for echo suppression within such devices.
An echo control system in a hands-free communication device attenuates a signal path between the microphone and the speaker to reduce the echoes experienced by a far-end user. However, due to inherent nonlinearities, acoustic echo cancellers used in such systems only provide between 25 dB and 30 dB of attenuation in the signal path. This attenuation may be insufficient and may allow residual echoes to be reflected back to the far-end when only a far-end user is actively producing audio signals. Therefore, the introduction of additional attenuation into the signal path during far-end only activity is necessary.
In addition to attenuation, many systems will insert simulated background or comfort noise using parameters generated from speech compression algorithms. The near-end hands-free communication device extracts parameters from current background noise and transmits these parameters to the far-end hands-free communication device across a narrow-band channel. The far-end hands-free communication device then reconstructs the noise from the parameters as it receives them. However, speech compression algorithms require additional and relatively complex processing and therefore increase overall system costs for the creation of bandwidth-efficient parameters.
Background noise can alternatively be simulated using an echo suppressor that locally generates what is known as comfort noise that closely approximates the background noise. The comfort noise is output simultaneously with the audio signal transmitted over the hands-free communications device to replace the background audio signal. This eliminates the need for bandwidth efficiency as the parameters are locally generated and used. However, one problem with such an echo suppressor is that parameters must be extracted from the current frame of background noise that also contains the echo. Another problem with such an echo suppressor is that it is necessary to span arbitrarily long periods of time without updating the parameters. This can cause undesirable clicking noises if done improperly.
An echo suppressor that uses an infinite impulse response (IIR) filter for comfort noise generation is known. Such an echo suppressor, which uses linear predictive coding (LPC) and synthesis codebooks, eliminates the problem of extracting parameters from a frame of background noise containing the echo. However, this type of echo suppressor requires complex LPC and therefore has large memory and computational requirements.
Therefore, what is needed is an echo suppressor for use in a hands-free communications device that provides high quality echo cancellation while maintaining low memory and computational requirements.
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to further illustrate various embodiments and to explain various principles and advantages all in accordance with the present invention.
In overview form the present disclosure concerns hands-free communications devices, and more specifically a method and apparatus for echo suppression within such devices. More particularly, various inventive concepts and principles that improve the performance and reduce the complexity and processing resources required by such methods and apparatus are discussed. The echo suppression systems and methods of particular interest are those that produce a simulated background noise signal, or comfort noise signal, to overwrite echoes. The echo suppression system and comfort noise generator therein are contemplated for use in wireless communications devices such as cellular phones but could be used in any communications device capable of operating in a hands-free or speakerphone mode in which echo suppression is desired.
The instant disclosure is provided to further explain in an enabling fashion the best modes of making and using various embodiments in accordance with the present invention. The disclosure is further offered to enhance an understanding and appreciation for the inventive principles and advantages thereof, rather than to limit in any manner the invention. The invention is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued. It is further understood that the use of relational terms, if any, such as first and second, top and bottom, and the like are used solely to distinguish one from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Much of the inventive functionality and many of the inventive principles are best implemented with or in software programs or instructions and integrated circuits. It is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions or programs and integrated circuits with minimal experimentation. Therefore, in the interest of brevity and minimization of any risk of obscuring the principles and concepts according to the present invention, further discussion of such software and integrated circuits, if any, will be limited to the essentials with respect to the principles and concepts employed by the preferred embodiments.
Referring to the figures and specifically to
The AEC 102 is for receiving a digital audio, or digital input, signal 110, for removing estimated echoes from the digital input signal 110, and for outputting a modified digital audio signal 112 (hereinafter referred to as modified input signal 112). The digital input signal 110 (hereinafter referred to as input signal 110) is an inbound signal from a telephone near-end or local microphone (not shown) that has been converted into digital samples by a conventional A/D converter (not shown) and grouped into digital input signal frames (hereinafter referred to as signal frames) in a manner well known in the art. Each signal frame includes a predetermined number of samples such as, for example, 80 samples. The input signal 110 can be a digital background noise signal that includes digital background noise signal frames or a digital voice signal that includes digital voice signal frames.
The AEC 102 includes an adaptive filter 114 that receives a far-end receive signal (receive signal) 116 and produces an estimated echo signal 118 based on the receive signal 116. Preferably, the AEC 102 uses an adder 120 to subtract the estimated echo signal 118 from the input signal 110 to therefore produce and output the modified input signal 112.
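By way of illustration only, the subtraction performed at the adder 120 can be sketched as follows. The description does not state how the adaptive filter 114 adapts, so a normalized least-mean-squares (NLMS) update is swapped in here purely as an assumption; the class name `EchoCanceller` and the parameters `taps`, `step`, and `eps` are likewise hypothetical.

```python
class EchoCanceller:
    """Sketch of an AEC: adaptive FIR echo estimate subtracted from the input.

    The NLMS update rule is an assumption; the description only says the
    filter is adaptive.
    """

    def __init__(self, taps=8, step=0.5, eps=1e-6):
        self.weights = [0.0] * taps   # adaptive filter coefficients
        self.history = [0.0] * taps   # most recent receive samples, newest first
        self.step = step              # hypothetical adaptation step size
        self.eps = eps                # avoids division by zero on silence

    def process(self, input_sample, receive_sample):
        """Return one sample of the modified input signal."""
        self.history = [receive_sample] + self.history[:-1]
        # Estimated echo: receive signal filtered by the current weights.
        estimated_echo = sum(w * x for w, x in zip(self.weights, self.history))
        # Adder: subtract the estimated echo from the input signal.
        modified = input_sample - estimated_echo
        # NLMS update (assumed): step normalized by the receive-signal energy.
        energy = sum(x * x for x in self.history) + self.eps
        scale = self.step * modified / energy
        self.weights = [w + scale * x for w, x in zip(self.weights, self.history)]
        return modified
```

In this sketch the residual left in the returned sample is what the soft switch 106 and the comfort noise generator 104 are later relied upon to mask.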
The echo suppression controller 108 is for receiving the input signal 110 and the receive signal 116 and for controlling the soft switch 106 based upon the input signal 110 and the receive signal 116 according to well known algorithms. The echo suppression controller 108 could also receive the modified input signal 112 from the AEC 102 and control the soft switch 106 based upon the input signal 110, the receive signal 116, and the modified input signal 112. Specifically, during periods of only far-end audio activity, the echo suppression controller 108 instructs the soft switch 106 to output what will be referred to as a comfort noise signal, a simulated background noise signal, or, more generally, a simulated signal, 122 produced by the comfort noise generator 104 as a transmit output signal 124. During periods of only near-end or local audio activity, the echo suppression controller 108 instructs the soft switch 106 to output the modified input signal 112 as the transmit output signal 124. The receive signal 116 is a signal transmitted to a near-end speaker (not shown) after being converted into an audio signal by a conventional D/A converter (not shown). The receive signal 116 may also undergo additional processing before being converted to an analog signal and being output to the near-end speaker.
The soft switch 106, which is of the type known in the art and can be implemented in a number of conventional ways, is for switching between the modified input signal 112 and the comfort noise signal 122 produced by the comfort noise generator 104 to output the transmit output signal 124. The soft switch 106 is able to gradually switch between the modified input signal 112 and the comfort noise signal 122 to avoid audible clicks or abrupt cut-offs in the transmit output signal 124 that are noticeable to the far-end user. The soft switch 106 preferably includes an adder 126 and first and second variable-gain attenuators 128, 130 that are coupled to or in communication with the echo suppression controller 108. Based upon the levels of the input signal 110 and the receive signal 116, the echo suppression controller 108 determines an attenuation factor α and outputs signals representative of the attenuation factor α and an inverse attenuation factor 1-α to the first and second variable-gain attenuators 128 and 130, respectively.
The first variable-gain attenuator 128 attenuates the modified input signal 112 based on the attenuation factor α to produce a first attenuated signal 132. The second variable-gain attenuator 130 attenuates the comfort noise signal 122 based on the inverse attenuation factor 1-α to produce a second attenuated signal 134. The adder 126 combines the first attenuated signal 132 and the second attenuated signal 134 to form the transmit output signal 124. Therefore, during periods of only far-end audio activity when the input signal 110 at the near-end contains an echo signal, the echo suppression controller 108 determines the value of the attenuation factor α to be zero, so that the soft switch 106 outputs the comfort noise signal 122. During periods of only near-end or local audio activity when the input signal 110 is a digital voice signal, the echo suppression controller 108 determines the value of the attenuation factor α to be one, so that the soft switch 106 outputs the modified input signal 112. The soft switch 106 may gradually switch between the comfort noise signal 122 and the modified input signal 112 as the attenuation factor α approaches zero or one, causing the inverse attenuation factor 1-α to approach one or zero. For example, when the near-end user is talking, the input signal 110 is a digital voice signal. The modified input signal 112 is therefore multiplied by one, the comfort noise signal 122 is multiplied by zero and the transmit output signal 124 thus includes only the modified input signal 112. When the near-end user stops talking and the far-end user is talking, the input signal 110 includes echo, and the attenuation factor α may ramp from one to zero, thus causing the inverse attenuation factor 1-α to ramp from zero to one.
As the attenuation factor α ramps from one to zero, the transmit output signal 124 changes from including only the modified input signal 112 to including both the modified input signal 112 and the comfort noise signal 122 in complementary proportions, to ultimately include only the comfort noise signal 122 when there is only far-end audio activity.
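The mixing performed by the soft switch 106 can be sketched as follows, by way of example only. The function names `soft_switch` and `ramp_alpha` and the ramp increment `step` are hypothetical; the description does not specify how quickly the attenuation factor α ramps.

```python
def soft_switch(modified_input, comfort_noise, alpha):
    """Form the transmit output as alpha * modified + (1 - alpha) * comfort."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(modified_input) != len(comfort_noise):
        raise ValueError("frames must have the same length")
    # First attenuator scales the modified input, second scales the comfort
    # noise, and the adder sums the two attenuated signals sample by sample.
    return [alpha * m + (1.0 - alpha) * c
            for m, c in zip(modified_input, comfort_noise)]


def ramp_alpha(alpha, target, step=0.25):
    """Move alpha toward target by at most `step` (assumed ramp rate)."""
    if alpha < target:
        return min(alpha + step, target)
    return max(alpha - step, target)
```

Calling `ramp_alpha` once per frame with a target of zero moves the output from the modified input signal to the comfort noise signal over several frames rather than in a single abrupt step.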
It should be noted that the method of controlling the soft switch 106 using the echo suppression controller 108 could be performed by any number of algorithms or hardware implementations in addition to the implementation of the soft switch 106.
As mentioned above, however, the AEC 102 may not provide sufficient attenuation for the echo suppression system 100 and therefore may not cancel a sufficient amount of echo in the input signal 110. If the receive signal 116 has not been sufficiently attenuated, the far-end user may experience reflected residual echoes during periods of talk where only the far-end user is actively generating audio signals. Therefore, as will now be discussed, the soft switch 106 provides the necessary additional attenuation and inserts comfort noise from the comfort noise generator 104.
Referring to
The determination of the FIR filter coefficients is much simpler and therefore requires less memory and computational power than the linear predictive coding used in conventional echo suppression systems such as those including IIR filters. As shown specifically in
The coefficient updater 138 is for receiving a signal frame of the input signal 110 and for generating and outputting an updated set of filter coefficients 142 to the FIR filter 136. The coefficient updater 138 may include a speech detector 144 (
The random number generator 140 is of the type known in the art and can be implemented in a number of conventional ways. For example, the random number generator 140 may be a 16 bit linear feedback shift register that provides a white noise signal 146 to the FIR filter 136. Using the updated set of filter coefficients 142 provided by the coefficient updater 138, the FIR filter 136 produces the comfort noise signal 122 by shaping the noise signal 146 to correspond to the input signal 110.
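As a non-limiting sketch, a 16 bit linear feedback shift register driving an FIR filter might look as follows. The tap positions (16, 14, 13, 11), the seed `0xACE1`, the mapping of each output bit to +1 or -1, and the class name `ComfortNoiseSource` are all assumptions made for illustration; the description states only that a 16 bit linear feedback shift register may be used.

```python
class ComfortNoiseSource:
    """Sketch: LFSR white noise shaped by an FIR filter."""

    def __init__(self, coefficients, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("an all-zero LFSR state never changes")
        self.coefficients = list(coefficients)
        self.state = seed & 0xFFFF
        self.history = [0.0] * len(self.coefficients)  # FIR delay line

    def set_coefficients(self, coefficients):
        """Install an updated coefficient set of the same length."""
        if len(coefficients) != len(self.coefficients):
            raise ValueError("coefficient count must not change")
        self.coefficients = list(coefficients)

    def _next_white(self):
        # Fibonacci LFSR step with assumed taps 16, 14, 13, 11.
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        # Map the feedback bit to a two-level white noise sample.
        return 1.0 if bit else -1.0

    def generate(self, count):
        """Return `count` samples of shaped noise."""
        out = []
        for _ in range(count):
            self.history = [self._next_white()] + self.history[:-1]
            out.append(sum(h * x for h, x in
                           zip(self.coefficients, self.history)))
        return out
```

For example, `ComfortNoiseSource(coefficients).generate(80)` would yield one 80-sample frame of noise whose spectrum follows the supplied coefficients.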
Referring to
The correlator 150 is for receiving the current set of filter coefficients 154 from the buffer 148 and for correlating the signal frame of the input signal 110 with the current set of filter coefficients 154 to obtain a best fit subframe 156 of the signal frame. Cross correlation values xx[k] equal the sum over n from 0 to 49 of the product y[n+k] h[n], where y[n] is the signal frame of the input signal 110, h[n] is the current set of filter coefficients 154, and k is the position of the 50 coefficients within the possible 80 data samples and ranges from 0 to 29. This calculation is performed once per frame, or once per 80 samples, to obtain the best fit subframe 156, or in other words the best 50 consecutive samples, from the signal frame of the input signal 110. Of the 30 possible positions of the 50 consecutive samples, the best position and thus the best fit subframe 156 is the position which provides the largest cross correlation value xx[k]. As mentioned above, the coefficient updater 138 may include the speech detector 144 for detecting the levels of the signal frame. If the levels of the signal frame are within the threshold, the speech detector 144 instructs the correlator 150 to correlate the signal frame with the current set of filter coefficients 154 to obtain the best fit subframe 156.
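The search for the best fit subframe 156 described above can be sketched as follows, by way of example. The function name `best_fit_subframe` is hypothetical, and the number of candidate positions is taken as the frame length minus the coefficient count so that an 80-sample frame and 50 coefficients give positions 0 through 29.

```python
def best_fit_subframe(frame, coefficients):
    """Return (k, subframe) maximizing xx[k] = sum_n frame[n + k] * h[n]."""
    taps = len(coefficients)
    positions = len(frame) - taps  # 30 for 80 samples and 50 coefficients
    if positions <= 0:
        raise ValueError("frame must be longer than the coefficient set")
    best_k = 0
    best_value = None
    for k in range(positions):
        # Cross correlation of the coefficients with the frame at offset k.
        value = sum(frame[n + k] * coefficients[n] for n in range(taps))
        if best_value is None or value > best_value:
            best_k, best_value = k, value
    # The best fit subframe is the run of consecutive samples at that offset.
    return best_k, frame[best_k:best_k + taps]
```

Ties are resolved in favor of the earliest position in this sketch; the description does not say how ties are handled.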
The correlator 150 is also for outputting the best fit subframe 156, which may then be shaped by a window 158 before it is input into the integrator 152. The window 158 is a spectral estimate enhancement window and is preferably a Hanning window, but could also be a Hamming window, a Blackman window or another similar window that serves to smooth or shape the spectrum of the best fit subframe 156 as is well known in the art. It is beneficial that the best fit subframe 156 be smoothed in the window 158 to provide the best spectral estimate.
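A minimal sketch of shaping the best fit subframe 156 with a Hanning window follows. The symmetric form of the window, with zero-valued end points, is assumed here, and the function names are hypothetical.

```python
import math


def hann_window(length):
    """Symmetric Hann (Hanning) window of the given length."""
    if length < 2:
        raise ValueError("length must be at least 2")
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(subframe):
    """Taper the subframe toward zero at both ends."""
    window = hann_window(len(subframe))
    return [w * x for w, x in zip(window, subframe)]
```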
The integrator 152 is for combining the best fit subframe 156 of the signal frame of the input signal 110 with the current set of filter coefficients 154 received from the buffer 148 to produce the updated set of filter coefficients 142. Alternatively, the integrator 152 may combine the best fit subframe 156 with a linear combination of previous sets of filter coefficients to produce the updated set of filter coefficients 142. The integrator 152 is also for outputting the updated set of filter coefficients 142 to the FIR filter 136 and to the buffer 148 to replace the current set of filter coefficients 154.
The integrator 152 is known in the art and can be implemented in a number of conventional ways. For example, the integrator 152 may include an adder 160 and first and second attenuators 162, 164 in communication with the adder 160. The first attenuator 162 attenuates the best fit subframe 156 of the signal frame based on a predetermined attenuation factor γ. The second attenuator 164 attenuates the current set of filter coefficients 154 by a second predetermined attenuation factor that is the inverse of the first predetermined attenuation factor, or 1-γ. The outputs from the first and second attenuators 162 and 164 are input to the adder 160 where the outputs are combined to produce the updated set of filter coefficients 142. The updated set of filter coefficients 142 is then output to the FIR filter 136 and to the buffer 148 to replace the current set of filter coefficients 154.
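The combination performed by the integrator 152 can be sketched as a weighted sum, by way of example. The function name `integrate_coefficients` and the default value of `gamma` are hypothetical; the description states only that γ is predetermined.

```python
def integrate_coefficients(subframe, current, gamma=0.1):
    """Return gamma * subframe + (1 - gamma) * current, element by element."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if len(subframe) != len(current):
        raise ValueError("subframe and coefficient set must match in length")
    # First attenuator scales the subframe by gamma, the second scales the
    # current coefficients by 1 - gamma, and the adder sums the two.
    return [gamma * s + (1.0 - gamma) * c for s, c in zip(subframe, current)]
```

A small γ makes the coefficients track the background noise slowly, while a large γ lets a single frame dominate; the appropriate trade-off is a design choice not fixed by this sketch.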
Referring to
The echo suppression controller 108 detects the levels of the signal frames of the input signal 110. If the echo suppression controller 108 determines that the input signal 110 is a digital voice signal (the near-end user is speaking), the echo suppression controller 108 instructs the soft switch 106 to output the modified input signal 112 as the transmit output signal 124 as described above. The soft switch 106 performs this selective switching by attenuating the modified input signal 112 by the attenuation factor α determined by the echo suppression controller 108, attenuating the comfort noise signal 122 by the inverse attenuation factor 1-α, and combining the attenuated signals 132, 134. As discussed above, the soft switch 106 may gradually switch between the comfort noise signal 122 and the modified input signal 112.
The comfort noise generator 104 receives the input signal 110 and approximates the input signal 110 using the random number generator 140 and the FIR filter 136. The speech detector 144 may detect the levels of the signal frames to determine if the levels of the signal frames are within a predetermined threshold. If not, the FIR filter 136 uses an initial set of coefficients or the set of coefficients it used to produce the previous comfort noise signal 122. If the levels of the signal frames are within the predetermined threshold, the input signal 110 is a digital background noise signal and the speech detector 144 instructs the correlator 150 to correlate a current set of filter coefficients 154 with the signal frame to determine the best fit subframe 156 of the input signal 110. The best fit subframe 156 may be conditioned by the window 158 and then combined with the current set of coefficients 154 to produce an updated set of coefficients 142 in an integrator using first and second attenuators 162, 164 and the adder 160.
The updated set of filter coefficients 142 is output to the FIR filter 136, which uses these coefficients to condition the white noise signal 146 produced by the random number generator 140 to produce the comfort noise signal 122 that is output to the soft switch 106. Therefore, if the levels of the signal frames are within a threshold such as, for example, 0.5 dB of a continuously measured noise floor, the current set of coefficients 154 is replaced with the updated set of coefficients 142. The echo suppression controller 108, rather than the comfort noise generator 104, determines whether the comfort noise signal 122 is output as the transmit output signal 124. Therefore, the comfort noise generator 104 always produces the comfort noise signal 122. However, if the input signal 110 is not a digital background noise signal, the coefficients are not updated.
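The decision whether to update the coefficients can be sketched as a level comparison, by way of example only. The description does not say how the frame level or the noise floor is measured, so the mean-square level in dB, the one-sided comparison, and the function names below are assumptions.

```python
import math


def frame_level_db(frame):
    """Mean-square level of a frame in dB (assumed level measure)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))  # floor avoids log of zero


def is_background_noise(frame, noise_floor_db, threshold_db=0.5):
    """True when the frame is no more than threshold_db above the floor."""
    return frame_level_db(frame) - noise_floor_db <= threshold_db
```

When `is_background_noise` returns false, the FIR filter 136 would simply continue with its existing coefficients, consistent with the statement that the coefficients are not updated unless the input signal 110 is a digital background noise signal.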
As discussed above, the soft switch 106 produces a transmit output signal 124 by outputting either the comfort noise signal 122 output by the FIR filter 136 of the comfort noise generator 104 or the modified input signal 112, depending upon the levels of the input signal 110 as determined by the echo suppression controller 108. Therefore, the comfort noise signal 122 is only output as the transmit output signal 124, and effectively the comfort noise generator 104 is only used, when the far-end user is speaking. The receive signal 116 is then converted to an analog signal and output to the near-end speaker. Therefore, when a far-end user is actively producing audio signals and the near-end user is not actively producing audio signals, the comfort noise generator 104 outputs the comfort noise signal 122 that the far-end user hears. The comfort noise signal 122 consequently replaces the residual echo.
In summary, the echo suppression system 100 provides additional attenuation in the signal path between the speaker and the microphone of a hands-free communication device to reduce the echoes experienced by a far-end user through use of a comfort noise generator 104. Specifically, the comfort noise generator 104 is able to replace the missing background noise signal by generating the comfort noise signal 122 shown in
The processes discussed above and the inventive principles thereof are intended to and will alleviate insufficient attenuation problems caused by prior art echo suppression systems. In addition, the comfort noise generator of the present invention will enhance echo suppression while advantageously having lower memory and computational requirements than prior art comfort noise generators.
This disclosure is intended to explain how to fashion and use various embodiments in accordance with the invention rather than to limit the true, intended, and fair scope and spirit thereof. The invention is defined solely by the appended claims, as they may be amended during the pendency of this application for patent, and all equivalents thereof. The foregoing description is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications or variations are possible in light of the above teachings. The embodiment(s) was chosen and described to provide the best illustration of the principles of the invention and its practical application, and to enable one of ordinary skill in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the invention as determined by the appended claims, as may be amended during the pendency of this application for patent, and all equivalents thereof, when interpreted in accordance with the breadth to which they are fairly, legally, and equitably entitled.
Number | Date | Country |
---|---|---|
20040204934 A1 | Oct 2004 | US |