1. Priority Claim
This application claims the benefit of priority to EP 06016029.8, filed Aug. 1, 2006, which is incorporated herein by reference.
2. Technical Field
The present inventions relate to a dereverberation system for use in a signal processing apparatus and, more particularly, to a dereverberation system that may be used in a loudspeaker-room-microphone environment.
3. Related Art
Signal processing systems are used in many applications. One set of applications includes speech signal processing/recognition, where the signal processing system may be used to enhance the intelligibility of the speech signals. Another application is the enhancement of the quality of signals transmitted and/or received in a communication system. Signal processing in these systems may be used for noise reduction as well as echo compensation.
A microphone may be used in a reverberant environment. A microphone used in such an environment may detect both audio signals that arrive directly from an audio source and delayed reflections of those signals. The signals received at the microphone may be smeared over time as a result of the environmental acoustics that generate a reverberant response. Reverberation at a microphone is noticeable in an office, a vehicle, dealer cabin, or other enclosed space, and may reduce the intelligibility of the desired microphone signal, such as a target speech signal.
One method of dereverberating a microphone signal is deconvolution. In this method, the microphone signal is inverse filtered using an estimate for the acoustic channel response. Accurate dereverberation depends on an accurate estimate of the acoustic channel response, which can be difficult to ascertain.
Another method of dereverberating the microphone signal processes the direct-path speech signal using pitch enhancement or predictive coding. This method is a multi-channel approach that averages multiple microphone signals to reduce the reverberation contribution to the processed signal. However, implementation of this multi-channel approach may be expensive and require many hardware components. Although various dereverberation systems have been contemplated, the contemplated systems suffer from various deficiencies.
A system used in a loudspeaker-room-microphone environment includes a microphone signal partitioner that divides a signal from a microphone into one or more divided portions. A reverberation energy estimator estimates reverberation energy in some of the divided portions of the microphone signal based on a loudspeaker signal. The estimated reverberation energy is processed to generate a dereverberated output signal.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
The microphone 120 receives direct signals and indirect reverberant signals that may interfere with the intelligibility of desirable direct signals. Audio provided by the output of loudspeaker 115 may include a portion 126 that is directly communicated through environment 105 and a portion 130 that is indirectly communicated through environment 105 as a result of reflections within the environment 105. Speech signals provided by speaker 125 may include a portion 135 that is directly communicated through environment 105 and a portion 140 that travels through environment 105 due to reflections in the environment 105.
The signal processing system 110 processes the signal from microphone 120 to generate a processed output signal 145. The output signal 145 includes the desired direct audio 135 provided by speaker 125 while suppressing the reverberant signals that may be generated in environment 105. In
The signal from microphone 120 is converted to a digital format by an analog-to-digital converter, within or separate from the signal processing system 110, and is provided to a microphone signal partitioner 155. The microphone signal partitioner 155 is adapted to divide the signal from the microphone into one or more divided portions. The microphone signal may be transformed into the frequency domain and subsequently divided into time frames. Alternatively, the microphone signal may be divided into time frames and subsequently transformed for processing in the frequency domain. In either case, the overall processing may be performed in the frequency domain. Processing may also occur after filtering the microphone signal using filter banks to divide the microphone signal into sub-band signals. As a further alternative, the overall processing may be performed in the time domain with the microphone signal divided into individual time frames.
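The frame-then-transform variant described above can be sketched as follows. This is a minimal illustration rather than the partitioner 155 itself: the function names, the Hann window, the frame length, and the hop size are all assumptions chosen for the example, and a naive DFT stands in for a fast transform to keep the sketch free of dependencies. The hop size plays the role of a subsampling rate between successive frames.

```python
import cmath
import math


def split_into_frames(signal, frame_len, hop):
    """Return overlapping frames of frame_len samples, advanced by hop samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hann(n_points):
    """Periodic Hann window (an assumed choice; the text does not name a window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_points)
            for i in range(n_points)]


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), for illustration)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * mu * t / n) for t in range(n))
            for mu in range(n)]


def partition(signal, frame_len=8, hop=4):
    """Return Y[k][mu]: frame index k, frequency-bin index mu."""
    window = hann(frame_len)
    return [dft([w * x for w, x in zip(window, frame)])
            for frame in split_into_frames(signal, frame_len, hop)]
```

With a 16-sample input, a frame length of 8, and a hop of 4, the sketch yields three frames of eight bins each.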
The signal processing system 110 may estimate the reverberation energy in at least some frames or sub-bands provided by the microphone signal partitioner 155. Reverberation energy may correspond to the squared magnitude of the unwanted reverberant signal portion present in the microphone signal. The estimated reverberation energy may be subtracted from the signal provided by microphone 120 to remove at least some part of the unwanted reverberant signal portion. Processing for dereverberation is not necessary for all of the divided portions. In some instances, the first frames/portions almost exclusively exhibit the desired speech signal without the corresponding reverberation signal portion.
The signal processing system 110 may include a reverberation energy estimator 160 to estimate the reverberation energy in at least some of the divided portions of the microphone signal. The estimated reverberation energy may be calculated, at least in part, on the basis of at least one loudspeaker signal 150. The reverberation energy estimator 160 may use the loudspeaker signal 150 to estimate the impulse response of the loudspeaker-room-microphone system in environment 105. In
There are several ways of using the loudspeaker signal 150 to estimate the reverberation energy. Given the loudspeaker signal 150 on the one hand and the microphone signal on the other, the impulse response of the loudspeaker-room-microphone environment 105 may be determined. From the impulse response, the reverberation energy may be derived. The unwanted reverberant signal portion of a microphone signal may be represented as
$$r(n) = \sum_{i=D_t}^{\infty} x_c(n-i)\, h(i)$$
with the discrete time index $n$, the loudspeaker signal $x_c(n)$ and the impulse response of the loudspeaker-room-microphone system $h(n)$. The sum starts with some discrete time value $D_t$ indicating the beginning of the portions of the acoustic signal detected at microphone 120 that cause the reverberation. For a time interval up to $D_t$, the microphone signal is dominated by the energy of the desired signal.
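The reverberant portion described above, a convolution restricted to the part of the impulse response from the start index onward, can be sketched as follows. The function name and the finite-length impulse response are assumptions made for illustration; the source signal is treated as zero before the first sample.

```python
def reverberant_portion(x, h, d_t):
    """Late-reverberation part: r(n) = sum over i >= d_t of h(i) * x(n - i).

    x   -- source signal samples
    h   -- finite impulse response estimate (assumed truncated)
    d_t -- first tap index considered part of the reverberation
    """
    r = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(d_t, len(h)):
            if n - i >= 0:          # samples before n = 0 are taken as zero
                acc += h[i] * x[n - i]
        r.append(acc)
    return r
```

For a unit impulse as input, the output reproduces the impulse response with its first `d_t` taps zeroed, which is the portion treated as reverberation.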
One way of estimating the impulse response of environment 105 is by incorporating an adaptive filter in the reverberation energy estimator 160. The adaptive filter filters the microphone signal so that the automatic adaptation of the filter coefficients results in filter coefficients that model the impulse response. The impulse response of the loudspeaker-room-microphone system is determined from the adapted filter coefficients of the adaptive filter.
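One possible adaptive filter for this purpose is a normalized least-mean-squares (NLMS) filter; the text does not name a specific adaptation algorithm, so NLMS is an assumption here. The sketch also assumes the common echo-canceller arrangement in which the filter is driven by the loudspeaker signal and adapted to minimize the error against the microphone signal. The function names, step size, and the 4-tap test "room" are hypothetical.

```python
import random


def convolve(x, h):
    """Causal FIR filtering: y(n) = sum_i h(i) * x(n - i)."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def nlms_identify(x, y, num_taps, step=0.5, eps=1e-8):
    """Adapt FIR coefficients w so that filtering x with w approximates y."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                     # recent input samples, newest first
    for x_n, y_n in zip(x, y):
        buf = [x_n] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))
        err = y_n - y_hat                      # residual after echo estimate
        norm = sum(b * b for b in buf) + eps   # input power normalization
        w = [wi + step * err * bi / norm for wi, bi in zip(w, buf)]
    return w


# Hypothetical noise-free 4-tap "room", used only to exercise the sketch.
rng = random.Random(0)
loudspeaker = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_h = [1.0, 0.5, 0.25, 0.125]
microphone = convolve(loudspeaker, true_h)
estimated_h = nlms_identify(loudspeaker, microphone, num_taps=4)
```

In this noise-free setting the adapted coefficients approach the simulated impulse response; with local speech or noise present, adaptation control would be needed, which the sketch omits.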
The signal processing system 110 uses a microphone signal filter 165 that is adapted to apply a filter to one or more of the divided portions of the microphone signal on the basis of the estimated reverberation energy as determined by the reverberation energy estimator 160. The filtered signal is provided at output 145. The output signal at 145 includes the desired direct audio 135 provided by speaker 125 while the reverberant signals produced in environment 105 are suppressed.
The filtering of the microphone signal by filter 165 on the basis of the estimated reverberation energy of the at least some of the frames or sub-bands may be performed through a Wiener filter. Other filters such as a magnitude filter may also be used. When filtering the microphone signal in the frequency domain, the microphone signal may be Fourier transformed to obtain Fourier transformed signals $Y_\mu(k)$, where $k$ and $\mu$ denote the frame number and the index of the frequency bin, respectively. The microphone signal may be Fourier transformed before it is divided into frames or it may be divided into frames followed by Fourier transformations of each frame. At least some of the Fourier transformed signals $Y_\mu(k)$ may be filtered using the Wiener filter $W_\mu(k) = 1 - |\hat{R}_\mu(k)|^2 / |Y_\mu(k)|^2$ to obtain filtered signals $\hat{X}_\mu(k)$ according to $\hat{X}_\mu(k) = W_\mu(k)\, Y_\mu(k)$, where $|\hat{R}_\mu(k)|^2$ denotes the estimated reverberation energy of the at least some of the frames.
The signal processing may be performed for sub-band signals obtained from the microphone signal using a filter bank in the microphone signal partitioner 155. In such instances, the microphone signal is filtered by the filter bank to obtain sub-band signals $Y_\mu(k)$, where $k$ and $\mu$ denote the time index of the sub-sampled microphone signal that is filtered by the filter bank and the index of the sub-band, respectively. At least some of the sub-band signals $Y_\mu(k)$ are filtered using the Wiener filter $W_\mu(k) = 1 - |\hat{R}_\mu(k)|^2 / |Y_\mu(k)|^2$ to obtain filtered signals $\hat{X}_\mu(k)$ according to $\hat{X}_\mu(k) = W_\mu(k)\, Y_\mu(k)$, where $|\hat{R}_\mu(k)|^2$ denotes the estimated reverberation energy of at least some of the sub-bands.
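The Wiener-type weighting above can be sketched per frequency bin or sub-band as follows. The function names are hypothetical, and the clamping of the gain to a range between a floor and 1 is an assumption added here to keep the gain non-negative when the reverberation estimate exceeds the microphone energy; the text itself does not specify such a limit.

```python
def wiener_gain(y_energy, r_energy, floor=0.0):
    """W = 1 - |R|^2 / |Y|^2, clamped to [floor, 1] (clamping is an assumption)."""
    if y_energy <= 0.0:
        return floor                    # no signal energy: apply the floor gain
    return min(1.0, max(floor, 1.0 - r_energy / y_energy))


def dereverberate_frame(y_bins, r_energies, floor=0.0):
    """Scale each complex bin Y by its real-valued gain W to obtain X_hat."""
    return [wiener_gain(abs(y) ** 2, r, floor) * y
            for y, r in zip(y_bins, r_energies)]
```

Because the gain is real-valued, only the magnitude of each bin is changed and its phase is left as received.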
The reverberation energy $|\hat{R}_\mu(k)|^2$ of at least some of the frames or sub-bands may be estimated according to the following formula:
$$|\hat{R}_\mu(k)|^2 = |Y_\mu(k-D)|^2 A_\mu \exp(-\gamma_\mu D) + |\hat{R}_\mu(k-1)|^2 \exp(-\gamma_\mu)$$
where $D$ is a predetermined delay, $A_\mu$ is an amplitude representing the ratio of direct-path energy to reverberation energy, $\gamma_\mu$ is a parameter determined on the basis of at least one loudspeaker signal, $k$ denotes the frame number or the time index of the sub-sampled microphone signal, and $\mu$ denotes the index of the frequency bin or the index of the sub-band, respectively. The predetermined delay takes into account that the initial part of the microphone signal is dominated by the direct acoustic path, i.e., a significant reverberant signal portion is present only after some delay $D$, e.g., $D \approx 30$ ms. The predetermined amplitude $A_\mu$ may, in principle, be estimated when the position of the speaker 125 relative to the loudspeaker 115 is known. This position may be estimated by beamforming of microphone signals obtained from a microphone array or in other ways. $A_\mu$ can be chosen as a real value in the range from about 0.1 to about 0.5.
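The recursion above can be sketched for a single frequency bin or sub-band as follows. The function name is hypothetical, and two boundary conditions are assumptions of the sketch: the estimate starts at zero, and the microphone energy before the first frame is taken as zero.

```python
import math


def reverb_energy_track(y_energy, delay, amplitude, gamma):
    """Recursive estimate |R(k)|^2 for one bin.

    y_energy  -- list of |Y(k)|^2 values, one per frame
    delay     -- D, in frames
    amplitude -- A, direct-to-reverberant weighting
    gamma     -- exponential decay parameter per frame
    """
    direct_weight = amplitude * math.exp(-gamma * delay)
    decay = math.exp(-gamma)
    r_prev = 0.0                                   # assumed initial estimate
    track = []
    for k in range(len(y_energy)):
        delayed = y_energy[k - delay] if k >= delay else 0.0
        r_prev = delayed * direct_weight + r_prev * decay
        track.append(r_prev)
    return track
```

Feeding a single unit-energy frame followed by silence produces an estimate that stays at zero for `delay` frames and then follows `amplitude * exp(-gamma * k)`, which matches the exponential decay the recursion is meant to track.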
The parameter $\gamma_\mu$ may be determined in the time domain (e.g., to obtain a quick estimate), in the frequency domain, or in the sub-band frequency domain. When determined in the frequency domain or the sub-band frequency domain, it may be averaged over frequencies or sub-bands in order to obtain an averaged parameter that does not substantially depend on the frequency.
If an adaptive filter is used in the reverberation energy estimator 160, the parameter $\gamma_\mu$ may be determined from the filter coefficients. The reverberation time $T_{60}$, which may correspond to the time it takes for the reverberation signal to decay by about 60 dB, may be estimated from the energy decay curve $\mathrm{EDC}_L(n)$ given by the filter coefficients $\hat{h}_L(n)$ of the adaptive filter. This corresponds to the following equation:
$$\mathrm{EDC}_L(n) = 10 \log_{10}\left(\sum_{i=n}^{L_L-1} \hat{h}_L^2(i)\right)$$
where $L_L$ denotes the length of the adaptive filter. The slope of $\mathrm{EDC}_L(n)$ is related (inversely proportional) to the reverberation time $T_{60}$.
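An energy decay curve of this kind can be computed by summing the squared filter coefficients from each index to the end of the filter. The sketch below is illustrative: the function name is hypothetical, and the normalization to 0 dB at the first index is an assumption added so that dB thresholds can be read relative to the total energy. Normalization shifts the curve but does not change its slope.

```python
import math


def energy_decay_curve_db(h):
    """Remaining energy of h from index n onward, in dB relative to the total."""
    remaining = 0.0
    tail = []
    for coeff in reversed(h):          # accumulate energy from the end backward
        remaining += coeff * coeff
        tail.append(remaining)
    tail.reverse()
    total = tail[0]
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in tail]
```

For two equal coefficients, the curve starts at 0 dB and drops by about 3 dB at the second index, since half of the energy remains.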
The parameter $\gamma_\mu$ may be calculated as a function of the reverberation time and a subsampling rate $R_S$ by which the microphone signal is sub-sampled by the microphone signal partitioner 155 for frame or sub-band processing. For example, one equation that may be used is $\gamma_\mu = 6 \ln(10)\, R_S / (T_{60}\, f_S)$, where $f_S$ denotes the sampling rate of the microphone signal in the time domain.
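As a worked example of this relation, the sketch below evaluates the decay parameter for hypothetical values (a reverberation time of 0.3 s, a subsampling rate of 64, and a sampling rate of 8000 Hz); none of these numbers come from the text, and the function name is an assumption.

```python
import math


def decay_parameter(t60_seconds, subsampling_rate, sample_rate_hz):
    """gamma = 6 * ln(10) * R_S / (T60 * f_S), the decay per sub-sampled frame."""
    return 6.0 * math.log(10.0) * subsampling_rate / (t60_seconds * sample_rate_hz)


# Hypothetical operating point, chosen only for illustration.
gamma = decay_parameter(t60_seconds=0.3, subsampling_rate=64, sample_rate_hz=8000)
frames_in_t60 = 0.3 * 8000 / 64            # number of frames spanning T60
energy_after_t60 = math.exp(-gamma * frames_in_t60)
```

With these values the parameter is roughly 0.37 per frame, and the modeled energy after the reverberation time has fallen to one millionth of its starting value, i.e., by 60 dB.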
At 220, the process receives a loudspeaker signal. The loudspeaker signal may be used to derive or estimate a reverberation compensation signal at 225. The estimated reverberation compensation signal is applied to at least some of the Fourier transformed microphone signals at 230. Application of the estimated reverberation compensation signal results in a dereverberated microphone output signal that includes the desired target microphone signal, while suppressing reverberant portions of the microphone signal. The dereverberated microphone output signal is provided to a target application at 235. The target application may include a transceiver of a vehicle communications system, a speech recognition system, or a process or system that enhances a microphone signal.
System 300 includes an echo canceller 325 that is adapted to receive microphone signals at 330 and a loudspeaker signal at 335. The echo canceller 325 includes an adaptive filter 340 and a combiner 345. A dereverberation processor 350 receives the filter coefficients of filter 340. Once the coefficients of filter 340 converge, they correspond to the impulse response of the loudspeaker-room-microphone environment 305 and may be used by the dereverberation processor 350 to filter reverberant components from the signal at 355. The dereverberated signal at 360 includes the desired target signals received by microphone 315, while unwanted reverberant signals are suppressed.
An adaptive echo canceller may estimate the impulse response of the loudspeaker-room-microphone environment. The filter coefficients of the echo canceller are automatically adjusted to model the impulse response of the loudspeaker-room-microphone system.
Due to the effect of reverberation, the acoustic spectrum detected by the microphone is smeared over time. The smearing of the energy of the microphone signal can be modeled by
$$|Y_\mu(k)|^2 = \sum_{i=0}^{\infty} |X_{c,\mu}(k-i)|^2\, G_\mu(i)$$
where $X_{c,\mu}$ is the Fourier transformed signal emitted by the speaking person (clean speech signal) and $G_\mu$ models the energy decay of the impulse response of the loudspeaker-room-microphone system in the frequency domain. In this example, it may be assumed that the wanted signal $X_\mu(k)$ and the reverberation signal portion $R_\mu(k)$ are uncorrelated. The energy decay comprises a first part corresponding to a number of initial frames, $D$ frames, that exhibit no significant reverberation and a further part contributing to the reverberation signal portion:
$$|Y_\mu(k)|^2 = \underbrace{\sum_{i=0}^{D-1} |X_{c,\mu}(k-i)|^2\, G_\mu(i)}_{|X_\mu(k)|^2} + \underbrace{\sum_{i=D}^{\infty} |X_{c,\mu}(k-i)|^2\, G_\mu(i)}_{|R_\mu(k)|^2}$$
The reverberation energy may be used in a spectral subtraction filtering of the microphone signal. Assuming an exponential decay of the reverberation energy for $k>0$ (and $G_\mu(k)=1$ for $k=0$), the reverberation energy may be represented as follows:
$$|R_\mu(k)|^2 = \sum_{i=D}^{\infty} |X_{c,\mu}(k-i)|^2 A_\mu \exp(-\gamma_\mu i)$$
where $D$ is a fixed delay in frames and $\gamma_\mu$ denotes the exponential decay parameter, which depends on room parameters such as the room size and absorption characteristics. The parameter $A_\mu$ denotes the ratio of the energy of the direct acoustic path from a source to the microphone to the reverberation energy, which may depend on the position of a sound source or a speaking person relative to the microphone.
Approximating the energy of the speaking person's clean speech signal $|X_{c,\mu}(k-D)|^2$ by the energy of the reverberated microphone signal $|Y_\mu(k-D)|^2$, an estimate for the reverberation energy can be calculated by the recursive formula
$$|\hat{R}_\mu(k)|^2 = |Y_\mu(k-D)|^2 A_\mu \exp(-\gamma_\mu D) + |\hat{R}_\mu(k-1)|^2 \exp(-\gamma_\mu).$$
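As a sketch of why a recursion of this form follows from an exponential decay, assume (as an illustration consistent with the description, not a quotation of it) that the reverberation energy is a sum over clean-speech frames delayed by at least $D$ frames, each weighted by $A_\mu \exp(-\gamma_\mu i)$. Splitting off the first term and re-indexing the rest with $j = i - 1$ gives:

$$
\begin{aligned}
|R_\mu(k)|^2 &= \sum_{i=D}^{\infty} |X_{c,\mu}(k-i)|^2 A_\mu e^{-\gamma_\mu i} \\
&= |X_{c,\mu}(k-D)|^2 A_\mu e^{-\gamma_\mu D} + \sum_{i=D+1}^{\infty} |X_{c,\mu}(k-i)|^2 A_\mu e^{-\gamma_\mu i} \\
&= |X_{c,\mu}(k-D)|^2 A_\mu e^{-\gamma_\mu D} + e^{-\gamma_\mu} \sum_{j=D}^{\infty} |X_{c,\mu}(k-1-j)|^2 A_\mu e^{-\gamma_\mu j} \\
&= |X_{c,\mu}(k-D)|^2 A_\mu e^{-\gamma_\mu D} + e^{-\gamma_\mu}\, |R_\mu(k-1)|^2 .
\end{aligned}
$$

Replacing the unobservable clean-speech energy with the observed microphone energy, and the true reverberation energy with its estimate, then yields the recursive estimate.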
The exponential decay parameter $\gamma_\mu$ may be determined in the time domain. The echo canceller may be used to filter the microphone signal. The reverberation time $T_{60}$, which may correspond to the time that is needed for the reverberation to decay by about 60 dB, is estimated from the energy decay curve $\mathrm{EDC}_L(n)$ given by the filter coefficients $\hat{h}_L(n)$ of the echo canceller after convergence of the adaptation algorithm:
$$\mathrm{EDC}_L(n) = 10 \log_{10}\left(\sum_{i=n}^{L_L-1} \hat{h}_L^2(i)\right)$$
where $L_L$ denotes the length of the adaptive filter used in the echo canceller. The $\mathrm{EDC}_L(n)$ value may represent the total amount of signal energy remaining in the reverberation impulse response at time $n$.
The slope of $\mathrm{EDC}_L(n)$ may be estimated. An upper and a lower threshold for a range of values of $\mathrm{EDC}_L(n)$, e.g., $E_{\max} = -20$ dB and $E_{\min} = -40$ dB, may be chosen. The discrete time indices $n_1$ and $n_2$ are determined for which $\mathrm{EDC}_L(n)$ exhibits values closest to $E_{\max}$ and $E_{\min}$, respectively. The reverberation time $T_{60}$ may be determined by extrapolation of the slope of $\mathrm{EDC}_L(n)$:
$$T_{60} = \frac{60\ \mathrm{dB}}{E_{\max} - E_{\min}} \cdot \frac{n_2 - n_1}{f_S}$$
where $f_S$ denotes the sampling rate of the digital microphone signal in the time domain. Given an exponential decay of the reverberation energy $\exp(-\gamma_\mu k)$, an energy decrease by a factor of $10^{-6}$ (after the reverberation time $T_{60}$) implies $\exp(-\gamma_\mu T_{60} f_S / R_S) = 10^{-6}$, where $R_S$ denotes a subsampling rate of the processed microphone signal (divided into frames or sub-blocks) due to the frame based or sub-block based processing. Accordingly, the exponential decay parameter is given by
$$\gamma_\mu = \frac{6 \ln(10)\, R_S}{T_{60}\, f_S}.$$
From the foregoing, an estimate of the reverberation energy based on the impulse response may be determined.
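The threshold-based extrapolation of the reverberation time can be sketched as follows. The function names are hypothetical, the decay curve is assumed to be supplied in dB relative to its starting value, and the synthetic curve used to exercise the sketch (a straight line falling 0.05 dB per sample at 8000 Hz) is invented for illustration.

```python
def closest_index(edc_db, target_db):
    """Index n at which the decay curve is closest to target_db."""
    return min(range(len(edc_db)), key=lambda n: abs(edc_db[n] - target_db))


def estimate_t60(edc_db, sample_rate_hz, e_max=-20.0, e_min=-40.0):
    """Extrapolate the decay between two thresholds to a full 60 dB drop."""
    n1 = closest_index(edc_db, e_max)
    n2 = closest_index(edc_db, e_min)
    if n2 <= n1:
        raise ValueError("decay curve does not span the chosen thresholds")
    seconds_between = (n2 - n1) / sample_rate_hz
    return 60.0 / (e_max - e_min) * seconds_between


# Synthetic, perfectly linear decay curve (hypothetical data).
synthetic_edc = [-0.05 * n for n in range(1000)]
t60 = estimate_t60(synthetic_edc, sample_rate_hz=8000)
```

On the synthetic curve the two thresholds are reached 400 samples apart, so the 20 dB span takes 0.05 s and the extrapolated 60 dB decay time is 0.15 s.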
The estimated reverberation energy is used at 430 for spectral subtraction to obtain a dereverberated microphone signal. The spectral subtraction may be performed using a Wiener filter. The microphone signal in each frame $Y_\mu(k)$ may include two uncorrelated parts (other contributions, such as ambient noise, are neglected here for simplicity): the wanted signal $X_\mu(k)$ and the reverberant signal contribution $R_\mu(k)$, where $k$ and $\mu$ denote the frame number and the index of the frequency bin, respectively.
In the spectral subtraction operation used to achieve the dereverberated microphone signal $\hat{X}_\mu(k)$, the amplitudes of the microphone signal in each frame $Y_\mu(k)$ are scaled with real valued coefficients $W_\mu(k)$: $\hat{X}_\mu(k) = W_\mu(k)\, Y_\mu(k)$. A Wiener filter may be used having the coefficients $W_\mu(k)$:
$$W_\mu(k) = 1 - \frac{|\hat{R}_\mu(k)|^2}{|Y_\mu(k)|^2}$$
with $|\hat{R}_\mu(k)|^2$ being determined by the above recursion formula.
The signal $\hat{X}_\mu(k)$ is output at 435. The output signal may represent an enhanced microphone signal that is to be transmitted to a remote communication party in a hands-free telephony system. It may also represent an enhanced speech input for a speech recognition or voice control system.
The system configuration shown in
By filtering the beamformed microphone signal with filter 540, an estimate for the reverberant signal portion of the beamformed microphone signal and an estimate of the reverberation energy may be obtained. The output of the filter 540 and the beamformed microphone signal are provided to the input of a spectral subtractor 545.
In this example, and in some of the examples described, it has been assumed that the impulse response of the loudspeaker-to-microphone environment is similar to the one from the speaking person. Consequently, if a plurality of loudspeakers are present, the loudspeaker signal corresponding to the loudspeaker that is closest to the speaking person may be chosen for an estimate of the impulse response of the loudspeaker-room-microphone system ĥ.
To obtain a reliable estimate for the reverberation energy, the acoustic time delays may be matched to the subset of filter coefficients $\tilde{h}$ chosen for estimating the reverberant portion of the beamformed microphone signal. Thus, the subset of filter coefficients $\tilde{h}$ may be delayed by $D_h$ samples using, for example, delay line 545. In addition, the estimated reverberation energy may be adjusted by some factor $b_h$.
The parameters Dh and bh may be optimized if the actual position of the person uttering speech signals relative to the microphones 510 is known. In
The described systems may be implemented in software, hardware, or a combination of software and hardware. One example of the platform on which the signal processing systems may be implemented is shown in
In
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
06016029 | Aug 2006 | EP | regional |
Number | Date | Country | |
---|---|---|---|
20080292108 A1 | Nov 2008 | US |