The present invention relates to the dereverberation of audio signals. More particularly, the present invention relates to a method and a device for estimating the reverberations in audio signals, in particular non-stationary audio signals such as speech.
It is well known that a signal, such as an acoustic signal, may contain reverberations or echoes from various surfaces. In a room, for example, an acoustic signal (such as speech or music) is reflected by the walls, the ceiling and the floor. A microphone present in the room will therefore receive the acoustic signal as a combination of a direct signal (received directly from the source) and an indirect signal (received via reflecting surfaces). This indirect signal is referred to as the reverberations part of the received signal.
Many attempts have been made to separate the desired (that is, direct) signal from its reverberations. The paper “A New Method Based on Spectral Subtraction for Speech Dereverberation” by K. Lebart, J. M. Boucher and P. N. Denbigh, Acta Acustica, Vol. 87, pages 359-366 (2001), for example, discloses a method for the suppression of late room reverberation from speech signals based on spectral subtraction. In this known method, the frequency spectrum of the reverberations part of the received signal is estimated using the delayed frequency spectrum of the received signal and a (first) parameter that is indicative of the decay of the reverberations part over time. The frequency spectrum of the reverberations part may then be used to estimate, by spectral subtraction, the frequency spectrum of the direct part.
This known method works well for signals of which the amplitudes of the direct part and reverberations part are similar or, in other words, of which the energy content of the direct part is (much) smaller than the energy content of the reverberations part. However, when the amplitude (and hence the energy content) of the direct signal is significantly larger than the amplitude (and energy content) of the reverberations signal, the known method introduces errors which result in signal distortion.
It is therefore an object of the present invention to overcome these and other problems of the Prior Art and to provide a method of and a device for estimating the reverberations part of an acoustic signal that takes any difference in energy contents of the direct part and the reverberations part into account.
Accordingly, the present invention provides a method of estimating the reverberations in a signal comprising a direct part and a reverberations part, the method comprising the step of providing an estimate of the frequency spectrum of the reverberations part using a first parameter, a second parameter and the frequency spectrum of the signal, wherein the first parameter is indicative of the decay of the reverberations part over time, and wherein the second parameter is indicative of the amplitude of the direct part relative to the reverberations part of the signal.
By providing a second parameter that is indicative of the amplitude of the direct part relative to the reverberations part, the relative amplitudes of these two signal parts are taken into account. As a result, it is possible to compensate for any difference in amplitudes and thereby to obtain a more accurate estimate.
It will be understood that the frequency spectrum of the direct part, and hence the direct signal part itself, may be obtained from the estimated frequency spectrum of the reverberations part using well-known spectral subtraction techniques.
In a preferred embodiment, the second parameter is inversely proportional to the early-to-late ratio of the signal. The early-to-late ratio is a ratio that indicates the amplitude of the direct (early) part relative to the reverberations (late) part of the signal.
In an advantageous embodiment, the step of providing an estimate of the frequency spectrum of the reverberations part involves using a previous estimate of the frequency spectrum of the reverberations part. In this way, the estimate is determined using a previous value and an update term, the update term preferably comprising the frequency spectrum of the signal.
In a preferred embodiment, therefore, the estimate of the frequency spectrum of the reverberations part is equal to the first parameter times a previous absolute value of the frequency spectrum, minus a third parameter times the difference of the previous absolute value of the frequency spectrum and a previous estimate of the frequency spectrum of the reverberations signal, wherein the third parameter is equal to the first parameter minus the second parameter.
In a practical embodiment, the method of the present invention comprises the further steps of:
The frames may overlap partially, so that some signal values are used more than once.
Although a single value of the second parameter can be determined for a situation in which the distance between the source and the microphone (and hence the relative amplitude of the direct signal) does not change, it is preferred that the second parameter is determined for each frame separately. In this way, a more accurate determination of the second parameter is made possible, in particular when movement is involved. In accordance with the present invention, the second parameter is preferably determined using only the signal and derivatives thereof.
It is further preferred that for each time segment of the signal, the immediately preceding time segment is used to determine the frequency spectrum of the signal itself and the estimated frequency spectrum of the direct part that are used for the estimation of the frequency spectrum of the reverberations part in the current time segment.
In order to obtain an even better estimate it is preferred that the frequency spectrum of the signal is determined, for each frame, per frequency band, and that the second parameter, the estimated frequency spectrum of the direct signal and the estimated frequency spectrum of the reverberations part are also determined per frequency band. In this way, separate estimates can be made for individual frequency bands. The choice of frequency bands may be dictated by the particular signal.
As stated above, the frequency spectrum of the reverberations part is estimated using the frequency spectrum of a previous frame, the first parameter, the second parameter, and the estimated frequency spectrum of the reverberations part of a previous frame (for the first frame, the estimate of the previous frame may be assumed to have a predetermined value, for example zero). The method may comprise the further step of dereverberating the signal using the estimate of the frequency spectrum of the reverberations part, preferably using spectral subtraction and reconstruction of the dereverberated signal on the basis of the spectrum resulting from the subtraction.
The present invention also provides a computer program product for carrying out the method defined above. The computer program product may comprise a carrier, such as a CD, DVD, or a floppy disc, on which a computer program is stored in electronic or optical form. The computer program specifies the method steps to be carried out by a general purpose computer or a special purpose computer.
The present invention additionally provides a device for carrying out the method as defined above, as well as an audio system, such as a speech recognition system comprising such a device.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
The exemplary signal shown in
The direct signal part d is received via the direct signal path, that is, without being reflected, while the indirect or reverberations part r is received via walls and other reflecting surfaces. As can be seen, the amplitude yi of the reverberations part r decays exponentially. This is also illustrated in
Expressed in words, the energy decay curve EDC yields, for a given sample index i, ten times the logarithm of the energy of the remainder of the signal. It can be seen that in the third section III containing the EDC of the reverberations part r, the EDC is approximately linear (it is noted that the values of the EDC shown are negative as all values of yi in the present example are much smaller than one). As the EDC is logarithmic, this linear decay of the EDC represents an exponential decay of the signal y. The slope of the EDC in the section III may be represented by a parameter αr, where the subscript r refers to the reverberations.
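The verbal definition above can be illustrated with a short sketch. It assumes a base-10 logarithm (so that the result is in dB) and a finite list of samples; the function name `energy_decay_curve` and the synthetic test signal are illustrative only and do not appear in the original.

```python
import math

def energy_decay_curve(y):
    """Return, for every index i, 10*log10 of the energy of y[i:] (the remainder)."""
    edc = [0.0] * len(y)
    remaining = 0.0
    # Walk backwards so that the energy of the remainder accumulates in one pass.
    for i in range(len(y) - 1, -1, -1):
        remaining += y[i] * y[i]
        edc[i] = 10.0 * math.log10(remaining) if remaining > 0.0 else float("-inf")
    return edc

# Assumed test signal: an exponentially decaying tail y[i] = 0.5 * 0.9**i.
tail = [0.5 * 0.9 ** i for i in range(200)]
edc = energy_decay_curve(tail)
slope_db_per_sample = edc[11] - edc[10]  # constant in the decaying region
```

For such an exponential tail the curve falls by a fixed number of dB per sample, which is the linear behaviour described for section III.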
It can further be seen in
The signal y of
According to the Prior Art mentioned above this may be accomplished as follows. The signal y is divided into frames that each contain a number of samples of the digitized signal. Each frame may, for example, contain 128 or 256 samples. Zeroes may be added (so-called “zero padding”) to arrive at a suitable number of samples per frame. The frames typically, but not necessarily, overlap partially; the term “block” is used to refer to the “new” samples of each frame.
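The framing described above might be sketched as follows. The function name `split_into_frames` and the choice to zero-pad only the final, incomplete frame are assumptions made for illustration; the text does not prescribe where the padding is applied.

```python
def split_into_frames(samples, frame_len, block_size):
    """Split samples into frames of frame_len that advance by block_size new samples."""
    if not 0 < block_size <= frame_len:
        raise ValueError("block_size must be between 1 and frame_len")
    frames = []
    for start in range(0, len(samples), block_size):
        frame = list(samples[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))  # zero padding of a short frame
        frames.append(frame)
    return frames

# Frames of 4 samples with a block size of 2, that is, 50% overlap.
frames = split_into_frames(list(range(1, 11)), frame_len=4, block_size=2)
# frames[0] is [1, 2, 3, 4] and frames[1] is [3, 4, 5, 6]: two samples are reused.
```

With a block size equal to the frame length the frames do not overlap, which corresponds to the "not necessarily" case mentioned above.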
Advantageously a window, for example a Hamming window known per se, is applied to alleviate the introduction of artifacts. For each frame the frequency spectrum Y is determined using the well-known Fast Fourier Transform (FFT). For each frame κ an estimate $\hat{R}$ of the reverberations frequency spectrum R is determined by:
$$\hat{R}(\kappa B, m) = \alpha_r \cdot |Y((\kappa-1)B, m)| \tag{2}$$
where B is the block size (that is, the number of new samples in each frame), m is the frequency, the vertical bars indicate the absolute value, and αr is the parameter indicating the decay speed of the reverberations part r. Mathematically, αr may be defined as
where Fs is the sampling frequency and
ln being the natural logarithm (3ln10 is approximately equal to 6.9) and T60 being the reverberation time, that is, the length of time after which the signal level has dropped 60 dB (decibel) relative to the initial signal level.
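As a rough numerical illustration, the sketch below derives a per-block decay factor from T60 and applies equation (2). The defining formulas for αr are not reproduced in this text, so the form used here is an assumption: a 60 dB amplitude drop over T60 corresponds to a decay rate of 3·ln(10)/T60 per second, and one block of B samples lasts B/Fs seconds. Whether αr is meant as the amplitude factor or as its square (the power factor) is also an assumption, which is why both are offered. All function names are illustrative.

```python
import math

def decay_rate(t60):
    """Rate (1/s) for which exp(-rate * t60) equals 10**-3, i.e. a 60 dB amplitude drop."""
    return 3.0 * math.log(10.0) / t60

def block_decay_factor(t60, block_size, fs, power=False):
    """Attenuation over one block of block_size samples (assumed form, see lead-in)."""
    exponent = decay_rate(t60) * block_size / fs
    return math.exp(-2.0 * exponent) if power else math.exp(-exponent)

def prior_art_estimate(alpha_r, prev_magnitudes):
    """Equation (2): scale the previous frame's magnitude spectrum by alpha_r."""
    return [alpha_r * mag for mag in prev_magnitudes]

# Example values (assumed): T60 = 0.5 s, B = 128 samples, Fs = 8 kHz.
alpha_r = block_decay_factor(t60=0.5, block_size=128, fs=8000.0)
r_hat = prior_art_estimate(alpha_r, [1.0, 0.5, 0.25])
```

A shorter reverberation time or a larger block size gives a smaller factor, so less of the previous frame is attributed to reverberation.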
The estimate $\hat{R}$ of the reverberations frequency spectrum may be used to determine a gain function:
where λ is the so-called spectral floor, a value which ensures that any severe distortion of the dereverberated signal is avoided. A typical value of λ is 0.1, although other values may also be used. The frequency spectrum Y of the original signal is multiplied by this gain factor G(κB, m) to yield the frequency spectrum D of the direct (dereverberated) signal d.
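A gain function with a spectral floor might look like the sketch below. The gain equation itself is not reproduced in this text, so the form used here, magnitude subtraction normalised by the magnitude and limited from below by λ, is an assumption chosen to agree with equation (9). The name `spectral_gain` and the small constant `eps` are illustrative.

```python
def spectral_gain(magnitudes, reverb_estimate, floor=0.1, eps=1e-12):
    """Per-bin gain max((|Y| - R_hat) / |Y|, floor); eps avoids division by zero."""
    gains = []
    for mag, rev in zip(magnitudes, reverb_estimate):
        raw = (mag - rev) / (mag + eps)
        gains.append(max(raw, floor))  # the floor limits how far a bin is attenuated
    return gains

# Three bins: mild reverberation, strong reverberation, over-estimated reverberation.
gains = spectral_gain([1.0, 1.0, 0.2], [0.25, 0.95, 0.5])
```

In the second and third bins the raw gain falls below the floor (it is even negative in the third), so the floor value is used instead.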
Although this known method is very effective, it leads to signal distortions when the direct part d of the signal y has a large amplitude (energy) relative to the reverberations part r or, in other words, when the signal shows a large step in region II of
where k is the sample number separating regions II and III in
When the Early-to-Late Ratio ELR is small (for example smaller than 0 dB, using the definition above), the energy content of the direct part d is small compared to the reverberations part r of the signal y, and the Prior Art method discussed above can effectively be used to dereverberate the signal y. However, when the ELR is large (for example larger than 0 dB, or larger than 5 dB, using the definition above), the known method introduces distortions as the step in region II of the EDC (
Accordingly, the present invention uses an improved estimation method in which the relative energy contents of the direct signal part d and the reverberations signal part r are taken into account.
Starting from equation (2) above, the present invention proposes to correct the estimation of the frequency spectrum R of the reverberations part r by subtracting a correction term γ.C(κ), where γ is a factor dependent on the ELR, and where C is a function of the frame number κ (that is, time), and possibly also of the block size B and the frequency m:
$$\hat{R}(\kappa B, m) = \alpha_r \cdot |Y((\kappa-1)B, m)| - \gamma \cdot C((\kappa-1)B, m) \tag{7}$$
The present invention further proposes to use the estimate $\hat{D}$ of the direct part frequency spectrum D as the function C:
$$\hat{R}(\kappa B, m) = \alpha_r \cdot |Y((\kappa-1)B, m)| - \gamma \cdot \hat{D}((\kappa-1)B, m) \tag{8}$$
It will be understood that there may be other functions C that have the required properties.
Since the estimate $\hat{D}$ of the direct part frequency spectrum D can be expressed as:
$$\hat{D}(\kappa B, m) = |Y(\kappa B, m)| - \hat{R}(\kappa B, m) \tag{9}$$
equation (8) may be rewritten as:
$$\hat{R}(\kappa B, m) = (\alpha_r - \gamma) \cdot |Y((\kappa-1)B, m)| + \gamma \cdot \hat{R}((\kappa-1)B, m) \tag{10}$$
Introducing a parameter $\beta(\kappa B) = \alpha_r - \gamma$, with $0 \le \beta(\kappa B) \le \alpha_r$, equation (10) can be written as:
$$\hat{R}(\kappa B, m) = \beta(\kappa B) \cdot |Y((\kappa-1)B, m)| + (\alpha_r - \beta(\kappa B)) \cdot \hat{R}((\kappa-1)B, m) \tag{11}$$
Using equation (9), equation (11) may also be expressed as:
$$\hat{R}(\kappa B, m) = \beta(\kappa B) \cdot \hat{D}((\kappa-1)B, m) + \alpha_r \cdot \hat{R}((\kappa-1)B, m) \tag{12}$$
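The step from equation (11) to equation (12) can be written out by regrouping the terms and applying equation (9) to the previous frame. The lines below only restate that step with the symbols already defined:

$$
\begin{aligned}
\hat{R}(\kappa B, m) &= \beta(\kappa B)\,|Y((\kappa-1)B, m)| + (\alpha_r - \beta(\kappa B))\,\hat{R}((\kappa-1)B, m) \\
&= \beta(\kappa B)\,\bigl(|Y((\kappa-1)B, m)| - \hat{R}((\kappa-1)B, m)\bigr) + \alpha_r\,\hat{R}((\kappa-1)B, m) \\
&= \beta(\kappa B)\,\hat{D}((\kappa-1)B, m) + \alpha_r\,\hat{R}((\kappa-1)B, m)
\end{aligned}
$$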
It can be shown that:
in other words, the (second) parameter β(κB) is inversely proportional to the Early-to-Late Ratio ELR. It is further noted that both β and ELR are functions of time (that is, the frame index κ times the block size B), and that β (and ELR) may also depend on the frequency (or sub-band) m: β(κB,m).
It can be seen from equation (11) that the estimate $\hat{R}(\kappa B, m)$ according to the present invention combines both the (absolute value of the) frequency spectrum Y of the previous frame and the previous estimate, while taking the ELR into account. It can further be seen from equations (11) and (12) that for a large ELR, β(κB) is small and the estimate $\hat{R}(\kappa B, m)$ is effectively based on the previous estimate, suppressing the influence of the spectrum Y. For a small ELR, β(κB) is “large” and approximately equal to αr, and $\hat{R}(\kappa B, m)$ is then approximately equal to $\alpha_r \cdot |Y((\kappa-1)B, m)|$, as in the Prior Art.
It can therefore be seen that the method of the present invention is consistent with the Prior Art in case the Early-to-Late Ratio ELR is small, while providing an important improvement when ELR is large.
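The two limiting cases, and the equivalence of equations (11) and (12), can be checked numerically with a small sketch. The function names and the example values are illustrative assumptions.

```python
def estimate_reverb(prev_mag, prev_est, alpha_r, beta):
    """Equation (11) for one frame, evaluated per frequency bin."""
    return [beta * y + (alpha_r - beta) * r for y, r in zip(prev_mag, prev_est)]

def estimate_reverb_via_direct(prev_mag, prev_est, alpha_r, beta):
    """Equation (12), with the direct-part estimate taken from equation (9)."""
    return [beta * (y - r) + alpha_r * r for y, r in zip(prev_mag, prev_est)]

prev_mag = [1.0, 0.8, 0.3]    # |Y| of the previous frame (assumed values)
prev_est = [0.2, 0.1, 0.05]   # previous reverberation estimate (assumed values)
alpha_r = 0.8

# Small ELR: beta equals alpha_r, which reduces to the Prior Art estimate of equation (2).
small_elr = estimate_reverb(prev_mag, prev_est, alpha_r, beta=alpha_r)
# Large ELR: beta equals 0, so only the decayed previous estimate remains.
large_elr = estimate_reverb(prev_mag, prev_est, alpha_r, beta=0.0)
```

Intermediate values of β blend the two behaviours, which is how the relative energy of the direct part enters the estimate.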
It is noted that the method of the present invention may be carried out per sub-band, that is per frequency m, or independent of the frequency, using a single term for all frequencies.
The method of the present invention may be implemented in software or in hardware. An exemplary hardware implementation is shown in
In the example shown, the first delay element 11 receives the absolute value (that is, the magnitude) |Y(κB,m)| of the frequency spectrum Y and outputs the delayed absolute value |Y((κ−1)B,m)|. In the preferred embodiment, the delay Δ is equal to one frame. In the amplifier 12, this delayed absolute value is multiplied by the (second) parameter β and fed to the combination element 13 which is preferably constituted by an adder.
The combination element 13 also receives the output signal of the second amplifier 15 and outputs the estimate $\hat{R}(\kappa B, m)$. This estimate is received by the second delay element 14, which outputs the delayed estimate $\hat{R}((\kappa-1)B, m)$ to the second amplifier 15. This delayed estimate is multiplied by the factor $(\alpha_r - \beta)$ in the amplifier 15 and fed to the combination element 13. As can be seen, the filter section 10 produces the same result as equation (11) above.
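A software counterpart of this filter section could keep the two delay elements as state, as in the sketch below. The class name `ReverbEstimator`, the zero initial state and the fixed β are assumptions for illustration.

```python
class ReverbEstimator:
    """Recursive filter with two one-frame delays, following equation (11)."""

    def __init__(self, num_bins, alpha_r, beta):
        self.alpha_r = alpha_r
        self.beta = beta
        self.prev_mag = [0.0] * num_bins  # state of the first delay element
        self.prev_est = [0.0] * num_bins  # state of the second delay element

    def step(self, magnitudes):
        """Return the estimate for the current frame, then store the new state."""
        est = [self.beta * y + (self.alpha_r - self.beta) * r
               for y, r in zip(self.prev_mag, self.prev_est)]
        self.prev_mag = list(magnitudes)
        self.prev_est = est
        return est

# One bin, a single burst followed by silence.
est = ReverbEstimator(num_bins=1, alpha_r=0.5, beta=0.25)
outputs = [est.step([m]) for m in (1.0, 0.0, 0.0, 0.0)]
# outputs == [[0.0], [0.25], [0.0625], [0.015625]]
```

The burst first appears in the estimate one frame later, scaled by β, and then decays through the feedback path by the factor αr − β per frame.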
The parameter β (or β(κB)) may be predetermined. For example, a fixed value of 0.1 or 0.2 could be used for a situation, provided an estimate of the ELR for that situation is known. It is preferred, however, to estimate β(κB) for each signal. Of course β(κB) can be estimated on the basis of the Early-to-Late Ratio, using formula (13) above.
However, starting from an initial value of β(κB) (which may be a predetermined value, for example zero), an update may be provided using the formula:
$$\beta(\kappa B) = \beta((\kappa-1)B) + f(|Y(\kappa B, m)|, \hat{R}(\kappa B, m), \lambda) \tag{14}$$
where the function f( ) is an update function and the parameter λ is the spectral floor mentioned above. An example of an update function which uses the absolute value |Y(κB,m)| and the estimated spectrum $\hat{R}(\kappa B, m)$ is:
where ε is an auxiliary parameter having a (very small) value to prevent division by zero, and where μ is a non-negative parameter controlling the speed and accuracy of the update of β(κB).
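The update of equation (14) might be organised as in the sketch below, shown for a single frequency bin. The example update function of the original is not reproduced in this text, so `placeholder_update` is a purely hypothetical stand-in and not the function of the present document; it only shows where a step size μ and a small constant to prevent division by zero would enter. Limiting β to the range from 0 to αr follows the constraint stated with equation (11).

```python
def update_beta(beta_prev, mag, est, floor, alpha_r, update_fn):
    """Equation (14) for one bin, limited to the range 0 <= beta <= alpha_r."""
    beta = beta_prev + update_fn(mag, est, floor)
    return min(max(beta, 0.0), alpha_r)

def placeholder_update(mag, est, floor, mu=0.05, eps=1e-9):
    """Hypothetical stand-in for f(); NOT the update function of the source.

    It lowers beta when the estimate exceeds (1 - floor) * |Y| and raises it otherwise.
    """
    return mu * ((1.0 - floor) * mag - est) / (mag + eps)

# The estimate (1.5) exceeds the magnitude (1.0), so beta is reduced slightly.
beta = update_beta(0.2, mag=1.0, est=1.5, floor=0.1, alpha_r=0.8,
                   update_fn=placeholder_update)
```

Passing the update function as an argument keeps the recursion of equation (14) separate from the particular choice of f( ).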
A preferred embodiment of a device for dereverberating a signal is schematically shown in
The FFT unit 20 receives the (digital) signal y(k) and performs a well-known Fast Fourier Transform on a frame of signal samples. It will be understood that an A/D (analog/digital) converter may be present if the original signal is analog. The (complex) spectrum Y(m) produced by the FFT unit 20 is fed to the decomposition unit 30 which decomposes the complex spectrum into a phase part φ and a magnitude part ρ. This magnitude ρ is equal to the absolute value |Y(κB,m)|, where κ is the frame index (frame number), B is the block size and m is the frequency, as before. The phase part φ is fed directly to the spectrum reconstruction unit 80, while the magnitude part ρ is fed to the filter unit 10, the parameter unit 40, the gain unit 50 and the multiplication unit 70.
The filter unit 10, which may be identical to the filter unit 10 of
The gain unit 50 produces a gain factor G(κB,m), for example in accordance with equation (5). This gain factor is fed to the multiplier 70, where it is multiplied with the absolute value spectrum |Y(κB,m)| to produce the dereverberated spectral magnitude ρ′. The reconstruction unit 80 reconstructs the dereverberated spectrum $|\hat{D}(\kappa B, m)|$ on the basis of φ and ρ′. This spectrum $|\hat{D}(\kappa B, m)|$ is then converted into the dereverberated time signal $\hat{d}(k)$ by the IFFT (Inverse Fast Fourier Transform) unit 90.
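The signal flow through these units can be sketched for a single frame as follows. This is an illustration under several assumptions: a plain discrete Fourier transform stands in for the FFT and IFFT units, windowing and overlap handling are omitted, β is passed in rather than produced by a parameter unit, and the gain uses an assumed magnitude-subtraction form with a floor. The function names are illustrative.

```python
import cmath

def dft(frame):
    """Plain discrete Fourier transform (stands in for the FFT unit)."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Inverse transform; the real part is returned as the time signal."""
    n = len(spectrum)
    return [(sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n)
                 for m in range(n)) / n).real
            for k in range(n)]

def dereverberate_frame(frame, prev_mag, prev_est, alpha_r, beta, floor=0.1, eps=1e-12):
    """Process one frame; also returns the state needed for the next frame."""
    spectrum = dft(frame)
    mags = [abs(c) for c in spectrum]               # magnitude part (rho)
    phases = [cmath.phase(c) for c in spectrum]     # phase part (phi), left unchanged
    est = [beta * y + (alpha_r - beta) * r          # filter unit, equation (11)
           for y, r in zip(prev_mag, prev_est)]
    gains = [max((y - r) / (y + eps), floor)        # gain unit (assumed form)
             for y, r in zip(mags, est)]
    new_mags = [g * y for g, y in zip(gains, mags)]  # multiplication unit
    rebuilt = [cmath.rect(rho, phi) for rho, phi in zip(new_mags, phases)]
    return idft(rebuilt), mags, est

frame = [1.0, 0.5, -0.25, 0.0]
zeros = [0.0] * len(frame)
# With an all-zero history the estimate is zero and the frame passes nearly unchanged.
out, mags, est = dereverberate_frame(frame, zeros, zeros, alpha_r=0.8, beta=0.2)
```

For a sequence of frames, the returned `mags` and `est` would be passed in as `prev_mag` and `prev_est` for the next call.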
An audio system incorporating the device 1 of
The present invention is based upon the insight that the energy content ratio of the direct signal part and the reverberations signal part has to be taken into account when dereverberating a signal. By introducing a parameter related to this energy content ratio, a better dereverberation which introduces less signal distortion is achieved.
The present invention could be summarized as a method of estimating the reverberations in a signal comprising a direct part and a reverberations part, the method comprising the steps of estimating the spectrum of the reverberations part using a first parameter and the spectrum of the signal, and correcting the estimated spectrum using a correction term involving a second parameter, wherein the second parameter is indicative of the amplitude of the direct part relative to the reverberations part of the signal.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words “comprise(s)” and “comprising” are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
The term computer program product should be understood to include any physical realization, e.g. an article of manufacture, of a collection of commands enabling a processor—generic or special purpose—, after a series of loading steps to get the commands into the processor, to execute any of the characteristic functions of an invention. In particular the computer program product may be realized as program code, processor adapted code derived from this program code, or any intermediate translation of this program code, on a carrier such as e.g. a disk or other plug-in component, present in a memory, temporarily present on a network connection—wired or wireless—, or program code on paper. Apart from program code, invention characteristic data required for the program may also be embodied as a computer program product.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
04103509.8 | Jul 2004 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2005/052377 | 7/18/2005 | WO | 00 | 1/18/2007 |