This application claims the benefit of priority from European Patent Application No. 06 009692.2, filed May 10, 2006, which is incorporated by reference.
1. Technical Field
This disclosure relates to echo reduction. In particular, this disclosure relates to echo reduction and suppression of residual echo signals in communication systems.
2. Related Art
Echo reduction or suppression may be needed in communication systems, such as hands-free sets and speech recognition systems. Communication systems may include a microphone that detects a desired signal, such as a speech signal from a user. The microphone may also detect undesirable signals, such as echoes produced by a loudspeaker.
Echoes are repetitions of a sound produced by reflection of that sound. Such signals may be detected by the near-end microphone and re-transmitted back to the remote party.
Multi-channel systems may have performance problems when individual channels have a correlation, which may occur when multiple microphones detect the speech from a speaker. When the individual channels are correlated, the adaptive filters may not converge to the desired impulse response. This may occur because portions of signals output by one loudspeaker may be compensated by a filter that processes the output of a different loudspeaker.
Optimization of echo compensation filters may depend upon the position of a speaker. Movement of the speaker may require recalculation of the filter coefficients. Filter convergence problems may result from the non-uniqueness of the adaptation calculations. Some approaches may delay the signals in individual channels and may introduce a non-linearity in the channel paths. These methods may introduce audible artifacts that reduce the quality of the speech signal. A need exists for an echo reduction system to reduce echoes in a multi-channel environment.
A multi-channel echo compensation system may receive first and second audio input signals from a first channel and a second channel of a multi-channel source device. A de-correlation processor may process the first and second audio input signals and generate first and second de-correlated audio signals. Loudspeakers may transmit the first and second de-correlated audio signals. A microphone may receive both a desired signal and an undesired signal. Adaptive echo compensation filters may filter the received signal on a channel-by-channel basis to remove the undesired signal and generate an echo compensated signal.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The system may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like-referenced numerals designate corresponding parts throughout the different views.
The de-correlation processor 120 may provide the de-correlated output signals x̃1(n) and x̃2(n) to devices that convert electric signals into sound. In
The echo compensation filters 244 and 246 may include hardware and/or software, and may include a digital signal processor (DSP). The DSP may execute instructions that delay an input signal one or more additional times, track frequency components of a signal, filter a signal, and/or attenuate or boost an amplitude of a signal. Alternatively, the echo compensation filters 244 and 246 or DSP may be implemented as discrete logic or circuitry, a mix of discrete logic and integrated logic, or may be implemented through multiple processors or software programs.
The coefficients for the first and second channel echo compensation filters 244 and 246 may be dynamically adjusted separately for each channel. This may improve the quality of the microphone output signal y(n) and may improve the intelligibility of a speech signal detected by the microphone 134.
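The per-channel coefficient adjustment can be sketched as follows. The text does not name an adaptation algorithm, so this sketch assumes a normalized least-mean-squares (NLMS) update; the function name `two_channel_nlms`, the step size `mu`, and the filter length `num_taps` are illustrative assumptions. Here `y` is the microphone signal and `e` is the echo compensated signal.

```python
import numpy as np

def two_channel_nlms(x1, x2, y, num_taps=64, mu=0.5, eps=1e-8):
    """Two-channel echo compensation with an assumed NLMS update.

    x1, x2: loudspeaker (reference) signals of the two channels
    y:      microphone signal
    Returns the echo compensated signal e and both coefficient vectors.
    """
    h1 = np.zeros(num_taps)      # first channel filter coefficients
    h2 = np.zeros(num_taps)      # second channel filter coefficients
    buf1 = np.zeros(num_taps)    # most recent samples of x1, newest first
    buf2 = np.zeros(num_taps)    # most recent samples of x2, newest first
    e = np.zeros(len(y))
    for n in range(len(y)):
        buf1 = np.roll(buf1, 1)
        buf1[0] = x1[n]
        buf2 = np.roll(buf2, 1)
        buf2[0] = x2[n]
        # Sum of the two per-channel echo estimates
        d_hat = h1 @ buf1 + h2 @ buf2
        e[n] = y[n] - d_hat
        # Joint normalization by the energy in both reference buffers
        norm = buf1 @ buf1 + buf2 @ buf2 + eps
        h1 += mu * e[n] * buf1 / norm
        h2 += mu * e[n] * buf2 / norm
    return e, h1, h2
```

With uncorrelated reference signals, the two coefficient vectors in this sketch approach the two echo paths. With strongly correlated reference signals, the error may still become small while the individual filters settle on coefficients that do not match the actual echo paths.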
The de-correlation processor 120 may de-correlate the first and second channel acoustic input signals x1(n) and x2(n) based on the performance or the adaptation state of the echo compensation filters 244 and 246. If the controller 124 determines that the echo compensation filters 244 and 246 are not adequately adapting, the controller 124 may control the de-correlation processor 120 to de-correlate the first and second channel acoustic input signals x1(n) and x2(n). If the controller 124 determines that the echo compensation filters 244 and 246 are adapting at an acceptable rate, the controller 124 may instruct the de-correlation processor 120 to maintain or reduce an amount of de-correlation.
The quality and reliability of the echo compensated microphone signal e(n) may be enhanced by de-correlating the first and second channel acoustic input signals x1(n) and x2(n) dynamically in response to the adaptation state of the echo compensation filters 244 and 246. Artifacts that may have been introduced by de-correlation may be reduced to ensure intelligibility of echo compensated speech signals e(n). The echo compensated speech signals e(n) may be transmitted to a remote party 182.
A first summing or combining circuit 252 may sum the estimated output signals d̂1(n) and d̂2(n) to produce a summed estimated output signal d̂(n). A second summing or combining circuit 156 shown in
The controller 124 may analyze the first and second channel acoustic input signals x1(n) and x2(n) to determine a correlation between the signals. If the correlation is below a predetermined threshold, the controller 124 may inhibit or deactivate the de-correlation processor 120. If the de-correlation processor 120 is deactivated, it may be programmed to function in a “pass-through” mode so that the de-correlated output signals x̃1(n) and x̃2(n) may be identical to the respective first and second acoustic input signals x1(n) and x2(n). If the correlation exceeds the predetermined threshold, the de-correlation processor may continue or may begin to de-correlate the respective acoustic input signals x1(n) and x2(n).
The controller 124 may receive information regarding the adaptation state of the first and second channel echo compensation filters 244 and 246 through first and second communication signals 266 and 268. The controller 124 may also receive the echo compensated signal e(n). The controller 124 may determine the adaptation performance of the echo compensation filters 244 and 246 based on a system distance measurement. The controller 124 may control the amount of de-correlation provided by the de-correlation processor 120 based on the adaptation performance of the first and second channel echo compensation filters 244 and 246.
The degree of correlation between the first and second channel acoustic input signals x1(n) and x2(n) may be calculated using the short time correlation of the signals. The degree of correlation may also be determined based on the short time coherence of the signals. De-correlating may be performed if the short time correlation or short time coherence, or its respective mean value, exceeds a pre-determined threshold value. Such threshold values may range, for example, between about 0.96 and about 0.99.
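A threshold test on the short time correlation might be sketched as below. The normalized correlation formula, the frame-based processing, and the names `short_time_correlation` and `should_decorrelate` are assumptions made for illustration; only the threshold range of about 0.96 to about 0.99 comes from the text.

```python
import numpy as np

def short_time_correlation(x1, x2, eps=1e-12):
    """Magnitude of the normalized correlation of two equal-length frames.

    Returns a value between 0 (uncorrelated) and 1 (fully correlated).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    num = abs(np.dot(x1, x2))
    den = np.sqrt(np.dot(x1, x1) * np.dot(x2, x2)) + eps
    return num / den

def should_decorrelate(x1, x2, threshold=0.98):
    """True if the frame correlation exceeds the (assumed) threshold."""
    return short_time_correlation(x1, x2) > threshold
```

A frame and a scaled copy of itself give a value near 1 and would trigger de-correlation, while two independent noise frames give a value well below the threshold.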
The mean short time coherence may be calculated by averaging over frequency and time after performing a discrete Fourier transformation in a sub-band μ according to the following equations:
with the Fourier spectra X1,2(Ωμ,n) for the μ-th sub-band, which has a center frequency Ωμ, at the discrete time point (sampling instant) n. The symbol < > indicates smoothing in time, e.g., by a first order infinite impulse response filter, and the asterisk indicates the complex conjugate. The number of the nodes or sampling points of the discrete Fourier transform (DFT) spectra is given by NDFT, and λ may be an arbitrary time constant, which may range between about 0.0 and about 0.99. The value of the short time coherence may be a suitable measure or control parameter for controlling the amount of de-correlation.
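One form consistent with the symbols defined above is sketched below. It assumes the magnitude-squared definition of coherence and a recursive average over time governed by λ. The symbol C̄(n) for the mean short time coherence is introduced here for illustration only, and C(Ωμ,n) denotes the coherence in the μ-th sub-band.

```latex
\bar{C}(n) = \lambda\,\bar{C}(n-1)
  + (1-\lambda)\,\frac{1}{N_{\mathrm{DFT}}/2 + 1}
    \sum_{\mu=0}^{N_{\mathrm{DFT}}/2} C(\Omega_\mu, n)

C(\Omega_\mu, n) =
  \frac{\left|\left\langle X_1(\Omega_\mu,n)\,X_2^{*}(\Omega_\mu,n)\right\rangle\right|^{2}}
       {\left\langle\left|X_1(\Omega_\mu,n)\right|^{2}\right\rangle\,
        \left\langle\left|X_2(\Omega_\mu,n)\right|^{2}\right\rangle}
```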
Speech uttered by the speaker 176 may be detected by the microphone 134. The undesirable de-correlated output signals x̃1(n) and x̃2(n) reproduced by the loudspeakers 130 and 132 may also be detected by the microphone 134. The first and second channel echo compensation filters 244 and 246 may generate an echo-compensated signal e(n) representing speech uttered by the speaker 176. A telephone 424 or other communication device may receive the echo-compensated signal e(n). The multi-channel echo compensation system 110 may remove the signals generated by the loudspeakers 130 and 132.
A first channel de-correlation circuit 430 and a second channel de-correlation circuit 442 may de-correlate the first and second channel acoustic input signals x1(n) and x2(n). The first and second channel de-correlation circuits 430 and 442 may provide the de-correlated signals to the loudspeakers 130 and 132. Each channel may include one or more loudspeakers, but each loudspeaker may output signals from only one channel.
The second channel de-correlation circuit 442 may include a second time-varying filter 544, such as an all-pass filter, and may also include a second non-linear processing circuit 546, such as a half-wave rectifier. The second time-varying filter 544 may include a finite impulse response delay filter, which may have multiple filter coefficients. The controller 124 may control calculation of the second filter coefficients β2(n) of the all-pass filter 544 through a second control signal 545. The second filter coefficients β2(n) may be based on the performance of the second channel echo compensation filter 246 and on the value of the system distance D(n). The second non-linear processing circuit or half-wave rectifier 546 may receive second filtered signals x′2(n) from an output of the all-pass filter 544. A second summing circuit 548 may add or combine the output of the all-pass filter 544 with the output of the half-wave rectifier 546 to provide the second channel de-correlated output signal x̃2(n).
The first and second de-correlation circuits 430 and 442 may de-correlate the first and second channel acoustic input signals x1(n) and x2(n) using the all-pass filters 534 and 544 and the half-wave rectifiers 536 and 546, respectively. The first and second filtered signals x′1,2(n) received by the respective half-wave rectifiers 536 and 546 may be processed according to the following equation:
where the tilde may denote the de-correlated signals, and α1 and α2 may be arbitrary parameters representing a degree of the non-linearity controlled by the controller 124. In some applications, after initiation of de-correlation by the all-pass filters 534 and 544 and the half-wave rectifiers 536 and 546, the values of α1 and α2 may be reduced after a predetermined period of time, e.g., after about a few seconds. This may reduce perceivable audio artifacts that may be caused by de-correlation.
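Adding the all-pass output to an α-scaled, half-wave rectified copy of itself, as the summing circuits described above do, would give a relation of the following form. This is a sketch inferred from that description, and the exact scaling convention is an assumption.

```latex
\tilde{x}_{1,2}(n) =
\begin{cases}
  (1 + \alpha_{1,2})\, x'_{1,2}(n), & x'_{1,2}(n) > 0 \\
  x'_{1,2}(n), & \text{otherwise}
\end{cases}
```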
The parameters α1 and α2 may be different for the respective first and second channel acoustic input signals x1(n) and x2(n). The time-varying filters 534 and 544 may provide a delay in the signal path. The non-linear processing circuits 536 and 546 may provide non-linearity in the signal path of the first and the second acoustic input signals x1(n) and x2(n) according to the following equations:
where the tilde denotes the de-correlated signals, and α1 and α2 may be arbitrary parameters representing the degree of the non-linearity.
The de-correlated output signals x̃1(n) and x̃2(n) may be robust in terms of convergence. The time-varying filtering may be performed by the all-pass filters 534 and 544 according to the following equation:
x′(n)=−β(n)x(n)+x(n−1)+β(n)x′(n−1)
where β is a time-varying parameter, n is the discrete time index, x is an audio signal of one channel, and the prime denotes the filtered audio signal. The coefficient β may be a different value for each channel and may be varied slowly in time with β∈[−0.1, 0.1].
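The recursion above can be evaluated sample by sample, as in the following sketch. The function name `time_varying_allpass` and the zero initial state are assumptions; `beta` may be a scalar or a per-sample array.

```python
import numpy as np

def time_varying_allpass(x, beta):
    """First-order all-pass: x'(n) = -beta(n) x(n) + x(n-1) + beta(n) x'(n-1).

    x:    input samples of one channel
    beta: scalar or array of per-sample coefficients, e.g. within [-0.1, 0.1]
    The filter state is assumed to be zero before the first sample.
    """
    x = np.asarray(x, dtype=float)
    beta = np.broadcast_to(np.asarray(beta, dtype=float), x.shape)
    y = np.zeros_like(x)
    x_prev = 0.0   # x(n-1)
    y_prev = 0.0   # x'(n-1)
    for n in range(len(x)):
        y[n] = -beta[n] * x[n] + x_prev + beta[n] * y_prev
        x_prev = x[n]
        y_prev = y[n]
    return y
```

With β equal to zero the sketch reduces to a one-sample delay. With a constant non-zero β the impulse response has unit energy, as expected of an all-pass filter.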
The mean short time coherence of the first and second acoustic signals x1(n) and x2(n) may be calculated (Act 624) by averaging over frequency and time according to the following equations:
where X1,2(Ωμ,n) are the Fourier spectra for the μ-th sub-band, which has a center frequency Ωμ, at the discrete time point (sampling instant) n. The symbol < > may indicate smoothing in time, for example, by a first order infinite impulse response filter, and the asterisk may indicate the complex conjugate. The number of the nodes or sampling points of the discrete Fourier transform (DFT) spectra may be given by NDFT. The term C(Ωμ,n) may be given by the ratio of the root mean square of the cross periodogram (that is, of the complex short-time cross power density spectrum) to the product of the auto periodograms. The time constant λ may range from about 0.9 to about 0.99.
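The averaging over frequency and time might be computed as in the sketch below. It assumes the common magnitude-squared form of coherence, a Hann window, and first-order recursive smoothing. The names `mean_short_time_coherence`, `hop`, and `gamma` (the smoothing constant for the < > operation) are hypothetical; `lam` plays the role of λ.

```python
import numpy as np

def mean_short_time_coherence(x1, x2, n_dft=256, hop=128,
                              lam=0.9, gamma=0.8, eps=1e-12):
    """Mean short time coherence per frame, averaged over frequency and time."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    window = np.hanning(n_dft)
    n_bins = n_dft // 2 + 1
    s12 = np.zeros(n_bins, dtype=complex)  # smoothed cross periodogram
    s11 = np.zeros(n_bins)                 # smoothed auto periodogram, channel 1
    s22 = np.zeros(n_bins)                 # smoothed auto periodogram, channel 2
    c_mean = 0.0
    history = []
    for start in range(0, len(x1) - n_dft + 1, hop):
        X1 = np.fft.rfft(window * x1[start:start + n_dft])
        X2 = np.fft.rfft(window * x2[start:start + n_dft])
        # First-order IIR smoothing in time (the < > operation)
        s12 = gamma * s12 + (1.0 - gamma) * X1 * np.conj(X2)
        s11 = gamma * s11 + (1.0 - gamma) * np.abs(X1) ** 2
        s22 = gamma * s22 + (1.0 - gamma) * np.abs(X2) ** 2
        coherence = np.abs(s12) ** 2 / (s11 * s22 + eps)
        # Average over sub-bands, then recursively over time with lam
        c_mean = lam * c_mean + (1.0 - lam) * np.mean(coherence)
        history.append(c_mean)
    return np.array(history)
```

The last element of the returned array could then be compared against the threshold of Act 626.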
If the amount of correlation between the signals is not above a predetermined threshold (Act 626), the first and second de-correlation circuits 430 and 442 may be deactivated (Act 628) or the degree of de-correlation may be reduced. The parameter β(n) may remain substantially constant over multiple sampling periods. For example, β(n) may be about 0.1 over a period of about one second. The parameter β(n) may assume a value of about −0.1 through linear interpolation over about 200 sampling periods. Such modeling may result in imperceptible artifacts in the first and second channel loudspeaker output signals.
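A β(n) trajectory of this kind might be generated as follows. The sampling rate `fs`, the periodic alternation between +0.1 and −0.1, and the name `beta_schedule` are assumptions; the hold time of about one second and the ramp of about 200 sampling periods follow the example above.

```python
import numpy as np

def beta_schedule(num_samples, fs=8000, hold_s=1.0, ramp=200, beta_max=0.1):
    """Piecewise beta(n): hold +beta_max, ramp down, hold -beta_max, ramp up."""
    hold = int(round(hold_s * fs))
    high = np.full(hold, beta_max)
    ramp_down = np.linspace(beta_max, -beta_max, ramp, endpoint=False)
    low = np.full(hold, -beta_max)
    ramp_up = np.linspace(-beta_max, beta_max, ramp, endpoint=False)
    period = np.concatenate([high, ramp_down, low, ramp_up])
    repeats = -(-num_samples // len(period))  # ceiling division
    return np.tile(period, repeats)[:num_samples]
```

The resulting array can be passed as the per-sample coefficient of a first-order all-pass filter, one schedule per channel.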
The amount of de-correlation may need to be increased. The time-varying filtering may be complemented by non-linear processing of the filtered signals x′1,2(n). Previous all-pass filtering may have obtained a minimum convergence velocity for the overall adaptation of the first and second echo compensation filters. Non-linear processing may be performed according to the following equation:
where the tilde denotes the de-correlated signals, and α is an arbitrary parameter representing the degree of the non-linearity.
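In code, a half-wave rectifier non-linearity of this kind might look like the sketch below. It assumes the de-correlated signal is the filtered signal plus α times its half-wave rectified version; the name `half_wave_decorrelate` is hypothetical.

```python
import numpy as np

def half_wave_decorrelate(x_filtered, alpha):
    """Add an alpha-scaled half-wave rectified copy to the filtered signal.

    Positive samples are scaled by (1 + alpha); other samples pass unchanged.
    """
    x = np.asarray(x_filtered, dtype=float)
    return x + alpha * np.maximum(x, 0.0)
```

Setting `alpha` to zero leaves the signal unchanged, so the degree of non-linearity can be reduced smoothly toward a pass-through.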
The non-linear processing or all-pass processing (Act 640) may initially be performed using a value for α of about 0.7. The degree of the non-linearity α may be adapted. The system distance D(n) may be periodically calculated (Act 644) according to the following equation:
where N is the length of the first and second channel echo compensation filters 244 and 246, NT is a pre-determined number of sampling times, and ĥ1(n) and ĥ2(n) are the filter coefficients of the respective echo compensation filters.
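One estimate consistent with these definitions uses the energy in the last NT coefficients of each echo compensation filter, extrapolated to the full filter length N and expressed in decibels. The form below is an assumption offered as a sketch, including the indexing of the coefficients ĥ1,k(n) and ĥ2,k(n) by tap position k.

```latex
D(n) = 10 \log_{10}\!\left(
  \frac{N}{N_T} \sum_{i=1}^{N_T}
  \left[ \hat{h}_{1,N-i}^{2}(n) + \hat{h}_{2,N-i}^{2}(n) \right]
\right)
```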
The strength or amount of de-correlation using non-linear processing and/or time-varying filtering (parameter β) may depend on the adaptation state or performance of the echo compensation filters. The system distance may measure the performance of the first and second channel echo compensation filters 244 and 246. By controlling α(n) based on the system distance D(n), artifacts in the processed audio signals may be minimized. A mapping of the system distance D(n) to a value for α(n) for the non-linear processing may be performed using a table or other structure. The system distance D(n) may be recalculated after the parameter β has been varied for about one second. The parameter α(n) of the non-linear processing (e.g., half-wave rectification) may be set according to the following criteria:
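A table-based mapping of this kind might look like the following sketch. The breakpoints in decibels and the smaller α values are illustrative assumptions, not values stated in the text; only the largest value of about 0.7 is taken from the description. The name `alpha_from_system_distance` is hypothetical.

```python
import bisect

# Hypothetical breakpoints for D(n) in dB, in ascending order.
D_BREAKPOINTS_DB = [-30.0, -25.0, -20.0, -15.0]
# One alpha per interval; a larger system distance maps to a larger alpha.
ALPHA_VALUES = [0.1, 0.2, 0.3, 0.5, 0.7]

def alpha_from_system_distance(d_db):
    """Look up alpha(n) for a system distance D(n) given in dB."""
    index = bisect.bisect_left(D_BREAKPOINTS_DB, d_db)
    return ALPHA_VALUES[index]
```

In this sketch a poorly adapted filter pair (large D(n)) receives strong non-linearity, and the non-linearity is reduced step by step as the filters converge.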
If D(n) exceeds a predetermined value, the controller 124 may control the de-correlation circuits 430 and 442 to minimally de-correlate the acoustic input signals x1(n) and x2(n). Alternatively, the controller 124 may deactivate the de-correlation circuits 430 and 442. The measured distance D(n) may fall below the predetermined value due to changes in the LRM system 140, and the controller 124 may reactivate the de-correlation circuits 430 and 442. The system distance D(n) and the filter coefficients of the time-varying filters 534 and 544 may not necessarily be calculated for each sampling instant, and may be calculated, for example, about once per second.
The amount of de-correlation may be varied after a predetermined period of time. Non-linearity may be reduced to avoid generating audible artifacts. If the LRM environment changes, the value of α(n) may be modified in response, thus providing adequate echo compensation. Further de-correlating may become less important at the near end if audio signals provided by the remote party do not show an increased correlation. If echo compensation fails to sufficiently enhance the quality of the microphone signal due to an abrupt movement of the speaker, de-correlating may be re-activated or enforced. Non-linear parameters may also be adjusted (Act 650). If additional samples are available (Act 656), the next sample may be processed (
The overall quality of the echo compensated signal e(n) may be enhanced by using one or more directional microphones to provide a plurality of microphone signals. An optional beam-forming circuit 182 (
The logic, circuitry, and processing described above may be encoded in a computer-readable medium such as a CD ROM, disk, flash memory, RAM or ROM, an electromagnetic signal, or other machine-readable medium as instructions for execution by a processor. Alternatively or additionally, the logic may be implemented as analog or digital logic using hardware, such as one or more integrated circuits (including amplifiers, adders, delays, and filters), or one or more processors executing amplification, adding, delaying, and filtering instructions; or in software in an application programming interface (API) or in a Dynamic Link Library (DLL), functions available in a shared memory or defined as local or remote procedure calls; or as a combination of hardware and software.
The logic may be represented in (e.g., stored on or in) a computer-readable medium, machine-readable medium, propagated-signal medium, and/or signal-bearing medium. The media may comprise any device that contains, stores, communicates, propagates, or transports executable instructions for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may selectively be, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared signal or a semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium includes: a magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM,” a Read-Only Memory “ROM,” an Erasable Programmable Read-Only Memory (i.e., EPROM) or Flash memory, or an optical fiber. A machine-readable medium may also include a tangible medium upon which executable instructions are printed, as the logic may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
The systems may include additional or different logic and may be implemented in many different ways. A controller may be implemented as a microprocessor, microcontroller, application specific integrated circuit (ASIC), discrete logic, or a combination of other types of circuits or logic. Similarly, memories may be DRAM, SRAM, Flash, or other types of memory. Parameters (e.g., conditions and thresholds) and other data structures may be separately stored and managed, may be incorporated into a single memory or database, or may be logically and physically organized in many different ways. Programs and instruction sets may be parts of a single program, separate programs, or distributed across several memories and processors. The systems may be included in a wide variety of electronic devices, including a cellular phone, a headset, a hands-free set, a speakerphone, communication interface, or an infotainment system.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
06009692 | May 2006 | EP | regional |
Number | Date | Country | |
---|---|---|---|
20080031469 A1 | Feb 2008 | US |