The present invention relates to a method for noise suppression, wherein noisy input signals in a multiple input audio processing device are subjected to adaptations and summed.
The present invention also relates to an audio processing device comprising multiple noisy inputs, an adaptation device coupled to the multiple noisy inputs, a summing device coupled to the adaptation device and an audio processor; and to a communication device having an audio processing device.
Such a method and device are known from U.S. Pat. No. 5,602,962. The known device is a speech processing arrangement having two or more inputs connected to microphones and a summing device for summing the processed input signals. The digitized input signals supply a combination of speech and noise signals to an adaptation device in the form of controllable multipliers, which provide a weighting with respective weight factors. An evaluation processor evaluates the microphone input signals and constantly adapts the weight factors or frequency domain coefficients to increase the signal to noise ratio of the summed signal. In the case of a time-variant, non-stationary noise statistic, where the noise standard deviations are not approximately time-independent, the respective weight factors are constantly recomputed and reset, whereafter their effect on the input signals is calculated and the summed signal computed. This alone leads to a very considerable number of calculations to be made by the evaluation processor. In particular when Fast Fourier Transform (FFT) calculations are made for each input signal—wherein, in addition, the spectrum range of each input signal is subdivided into several sections, each section generally containing a complex number having a real part and an imaginary part, both to be calculated separately—the number of necessary real-time calculations rises enormously. This puts the required calculation power beyond the feasible limits of present-day low-cost processors.
Therefore it is an object of the present invention to provide a method, an audio processing device and a communication device capable of performing noise evaluation in a multiple input device without excessive amounts of calculation and high-speed processing being necessary therefor.
Thereto the method according to the invention is characterized in that noise frequency components of the noisy input signals in the summed input signals are estimated based on individually kept noise frequency components and on said adaptations.
Accordingly the audio processing device according to the invention is characterized in that the audio processor which is coupled to the adaptation device and the summing device is equipped to estimate individual noise frequency components of the noisy input signals.
It is an advantage of the method and audio processing device according to the present invention that the number of simultaneously necessary calculations can be reduced, since the noise frequency components of all the noisy input signals can be estimated from the summing output signal and the individual adaptations. This technique combines adaptive, so-called beamforming with individualized noise determination, and is in particular meant for noise suppression applications in audio processing devices or communication devices and systems. With reduced calculating-power requirements, applications can now more easily be implemented wherever noisy and reverberant speech is enhanced using multiple audio signals or microphones. Examples are found in audio broadcast systems, audio and/or video conferencing systems, speech enhancement, such as in telephone systems, including mobile telephone systems, and in speech recognition systems, speaker authentication systems, speech coders and the like.
Advantageously another embodiment of the method according to the invention is characterized in that the adaptations concern filtering or weighting of the noisy input signals.
When the adaptations concern filtering, the noisy inputs are filtered, such as with Finite Impulse Response (FIR) filters. In that case one speaks of a Filtered Sum Beamformer (FSB), whereas in a Weighted Sum Beamformer (WSB) the filters are replaced by real gains or attenuations.
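A minimal sketch of the two beamformer variants may illustrate the distinction (Python with NumPy, purely illustrative and not part of the patented implementation; the filter taps and gains are placeholder values, not adapted coefficients):

```python
import numpy as np

def fsb_output(inputs, fir_filters):
    """Filtered Sum Beamformer (FSB): each input u_m is passed through its own
    FIR filter before the filtered signals are summed."""
    return sum(np.convolve(u, h, mode="same") for u, h in zip(inputs, fir_filters))

def wsb_output(inputs, gains):
    """Weighted Sum Beamformer (WSB): the FIR filters reduce to real gains."""
    return sum(g * u for g, u in zip(gains, inputs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, n = 3, 1024
    inputs = [rng.standard_normal(n) for _ in range(M)]             # noisy inputs u_1..u_M
    fir_filters = [rng.standard_normal(8) / 8.0 for _ in range(M)]  # placeholder FIR taps
    gains = [1.0 / M] * M                                           # placeholder real gains
    y_fsb = fsb_output(inputs, fir_filters)
    y_wsb = wsb_output(inputs, gains)
```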
A further embodiment of the method according to the invention is characterized in that each estimated noise frequency component is related to a previous estimate of said noise frequency component and to a correction term which is dependent on the adaptations made on the noisy input signals.
Advantageously, for every input signal separately, the latest estimate of the respective input noise component in each frequency section or bin of the frequency spectrum is temporarily stored for later use by a recursive update relation, which yields an updated and accurately available noise component.
A still further embodiment of the method according to the invention is characterized in that the estimation of the noise frequency components of the respective input signals in the summed input signals can be made dependent on detection of an audio signal in the relevant input signal.
In this embodiment the estimation is made dependent on the detection of an audio signal, such as a speech signal. If speech is detected, the estimation of the noise frequency components is based on the previous, not updated noise frequency component. If no speech is detected and only noise is present in the relevant input signal, the estimation of the noise frequency components is based on an updated previous noise frequency component.
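As an illustration of this gating, the following sketch (Python/NumPy, not the patented implementation) holds the per-bin noise estimate while speech is present and otherwise blends in the newly measured spectrum; the simple first-order smoother and the value of alpha are assumptions made for the example:

```python
import numpy as np

def gated_noise_update(prev_estimate, input_spectrum, speech_detected, alpha=0.9):
    """Per-bin noise estimate update gated by a speech detector.

    prev_estimate and input_spectrum are arrays over the frequency bins; when
    speech is detected the previous estimate is kept unchanged, otherwise it
    is recursively updated towards the measured input spectrum."""
    if speech_detected:
        return prev_estimate
    return alpha * prev_estimate + (1.0 - alpha) * input_spectrum
```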
A following embodiment of the method according to the invention is characterized in that the method uses spectral-subtraction-like techniques to suppress noise.
Spectral subtraction is preferably used where noise reduction is contemplated, such as in speech-related applications.
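For reference, a generic textbook-style spectral subtraction step (Python/NumPy; the spectral floor and frame handling are assumptions for the example, not the patent's specific implementation) could look as follows:

```python
import numpy as np

def spectral_subtraction(noisy_frame, noise_magnitude, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from a noisy frame.

    noise_magnitude must have the same length as the rfft of the frame
    (len(noisy_frame) // 2 + 1). The noisy phase is reused and a spectral
    floor keeps the resulting magnitudes non-negative."""
    spectrum = np.fft.rfft(noisy_frame)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    cleaned = np.maximum(magnitude - noise_magnitude, floor * magnitude)
    return np.fft.irfft(cleaned * np.exp(1j * phase), n=len(noisy_frame))
```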
At present the method, audio processing device and communication device according to the invention will be elucidated further together with their additional advantages while reference is being made to the appended drawing, wherein similar components are being referred to by means of the same reference numerals. In the drawing:
FIGS. 3a and 3b show noise estimator diagrams to be implemented in the audio processor for application in the audio processing device according to the invention, with and without speech detection respectively; and
One could estimate the stationary noise magnitude spectra at the inputs of the adaptive beamformer, and calculate the (non-stationary) noise magnitude spectrum at the summing device output using the current beamformer coefficient values. This, however, is costly due to the M expensive spectral transformations required, one for each beamformer input signal u1, u2, . . . , uM.
FIGS. 3a and 3b show respective noise estimator diagrams to be implemented in the generally programmable audio processor 5 for application in the present multi-input audio processing device 2, with and without speech detection respectively.
If the audio processing device 2 is provided with an audio or speech detector having a switch 7, the noise spectrum estimator 6 derives the estimated noise spectrum P̂N(k;lB) therefrom in a way to be explained later. Z−1 represents a Z-transform delay element. So it can be derived that if no speech is detected the update takes place in accordance with the following relation, wherein k denotes the frequency bin and lB the current signal block of length B:
P̂N(k;lB) = NS{(1−α)[Pin(k;lB) − P̂N(k;lB−1)]}
where α is a memory parameter and NS is a function which represents the behavior of the noise spectrum estimator 6.
It further holds that:
P̂N(k;lB) = Σm=1…M |Fm(k;lB)| P̂N,m(k;lB)
wherein Fm(k;lB) denotes the current beamformer coefficient for input signal um and P̂N,m(k;lB) the noise spectrum estimate kept for that input,
and that:
P̂N,m(k;lB) = max[P̂N,m(k;lB−1) + δ(k;lB) μ(k;lB) |Fm(k;lB)|, c]
for all k, with m=1 . . . M, μ(k;lB) being the adaptation step size. So there are no updates smaller than c (c being a small non-negative constant), and for each input signal um a previous estimate of the actual spectrum P̂N,m(k;lB) is stored in the delay element Z−1 for later use. Herewith every branch output signal provides information about the noise characteristics of every individual input signal without excessive frequency transformation calculations being necessary. In the down position of the switch 7, in case speech is being detected, the noise spectrum estimator 6 still provides the latest actual noise estimate for noise suppression purposes.
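The two relations above can be sketched as follows (Python/NumPy, illustrative only; P_N and P_N_m mirror the reconstructed notation, and the beamformer magnitudes |Fm|, the step size μ and the update term δ are simply passed in as arrays rather than produced by an adaptive algorithm):

```python
import numpy as np

def update_input_noise_estimates(P_N_m_prev, delta, mu, F_mag, c=1e-6):
    """Per-input recursion: P_N_m(k;lB) = max[P_N_m(k;lB-1) + delta*mu*|F_m|, c]."""
    return np.maximum(P_N_m_prev + delta * mu * F_mag, c)

def output_noise_estimate(P_N_m, F_mag):
    """Summed-signal estimate: P_N(k;lB) = sum over m of |F_m(k;lB)| * P_N_m(k;lB)."""
    return np.sum(F_mag * P_N_m, axis=0)

if __name__ == "__main__":
    M, K = 3, 65                                 # M inputs, K frequency bins
    rng = np.random.default_rng(1)
    P_N_m = rng.uniform(0.1, 1.0, (M, K))        # previous per-input noise estimates
    F_mag = rng.uniform(0.5, 1.5, (M, K))        # |F_m(k;lB)| from the beamformer
    delta = rng.uniform(-0.01, 0.01, K)          # estimation update term delta(k;lB)
    mu = np.ones(K)                              # adaptation step size (here mu = 1)
    P_N_m = update_input_noise_estimates(P_N_m, delta, mu, F_mag)
    P_N = output_noise_estimate(P_N_m, F_mag)    # noise spectrum of the summed signal
```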
FIG. 3b depicts the situation in case no speech detector is present. The embodiment of FIG. 3b then tracks the measured input spectrum Pin(k;lB) by means of a smoothed spectrum Ps(k;lB), which is updated according to:
Ps(k;lB) = α(lB) Ps(k;lB−1) + (1−α(lB)) Pin(k;lB)
for all k. The memory parameter α(lB) is chosen according to:
α(lB) = αup if Pin(k;lB) ≥ Ps(k;lB), else α(lB) = αdown
Here αup is a constant corresponding to a long memory (0 << αup < 1) and αdown is a constant corresponding to a short memory (0 < αdown << 1). Thus the recursion favors 'going down' over 'going up', so that in effect a minimum is tracked. Generally the step size μ(k;lB) is chosen in the FSB case according to:
and in the WSB case such that:
which may reduce to μ=1 if certain adaptive algorithms are used which have the property that the denominators of the two above expressions equal 1, such as disclosed in EP-A-0954850. The estimation update term δ(k;lB) is chosen according to: if Ps(k;lB) ≥ P̂N(k;lB−1), then (condition is true)
δ(k;lB) = {q(lB) − 1} P̂N(k;lB−1); q(lB+1) = q(lB) × INCFACTOR
else (condition is not true)
δ(k;lB) = Ps(k;lB) − P̂N(k;lB−1); q(lB+1) = INITVAL
Herein, at a sampling rate of 8 kHz with data blocks of B=128 samples, one can take INCFACTOR=1.0004 and INITVAL=1.00025. With this mechanism P̂N(k;lB) is only effectively increased when the measured spectrum Ps(k;lB) is larger for a sufficiently long period of time, i.e. in situations wherein the noise has really changed to a larger noise power.
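A compact sketch of this FIG. 3b mechanism follows (Python/NumPy, illustrative only; the per-bin handling, the values of αup and αdown, and the name P_N for the reconstructed noise-estimate symbol are assumptions, while INCFACTOR and INITVAL follow the example values above):

```python
import numpy as np

INCFACTOR = 1.0004   # example value from the text (8 kHz, B = 128)
INITVAL = 1.00025    # example value from the text

def track_minimum(Ps_prev, P_in, alpha_up=0.995, alpha_down=0.1):
    """Asymmetric recursive smoothing of the input spectrum: a long memory is
    used when the input is not below the smoothed value and a short memory
    otherwise, so Ps effectively tracks a minimum."""
    alpha = np.where(P_in >= Ps_prev, alpha_up, alpha_down)
    return alpha * Ps_prev + (1.0 - alpha) * P_in

def update_term(Ps, P_N_prev, q):
    """Per-bin choice of the estimation update term delta and the next q:
    while Ps stays at or above the previous noise estimate, delta grows only
    through the slowly increasing factor q; otherwise the estimate may drop
    directly towards Ps and q is reset to INITVAL."""
    condition = Ps >= P_N_prev
    delta = np.where(condition, (q - 1.0) * P_N_prev, Ps - P_N_prev)
    q_next = np.where(condition, q * INCFACTOR, INITVAL)
    return delta, q_next
```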
Whilst the above has been described with reference to essentially preferred embodiments and best possible modes, it will be understood that these embodiments are by no means to be construed as limiting examples of the devices concerned, because various modifications, features and combinations of features falling within the scope of the appended claims are now within reach of the skilled person.
Foreign Application Priority Data:
Number | Date | Country | Kind
00201879 | May 2000 | EP | regional
References Cited, U.S. Patent Documents:
Number | Name | Date | Kind
5,574,824 | Slyh et al. | Nov 1996 | A
5,602,962 | Kellermann | Feb 1997 | A
6,339,758 | Kanazawa et al. | Jan 2002 | B1
Publication:
Number | Date | Country
20020013695 A1 | Jan 2002 | US