This invention relates to the field of digital signal processing, and in particular to a method of reducing noise in a signal that may contain speech, for example in telephony, when operating in high noise environments.
Noise cancellation is a crucial feature for acoustic echo cancellers when operating in high noise environments, such as the mobile telephone environment. For example, the ambient noise in an automotive environment is higher than that in other environments due to engine and road noise. Due to the elevated noise level the voice signal can become unintelligible. Under these conditions noise reduction can significantly improve the voice quality of a call.
The most common and effective method for noise reduction is spectral subtraction, as described in S. F. Boll: “Suppression of Acoustic Noise in Speech Using Spectral Subtraction”, IEEE Trans. on Acoust., Speech and Sig. Proc., 27, 1979, pp. 113-120, the contents of which are herein incorporated by reference. However, spectral subtraction requires a transform (FFT or DCT are commonly used) to separate speech and background noise in a spectral transform domain. The noise spectrum is subtracted in each spectrum sub-band so that clean speech can be preserved. These transforms require a lot of computation power and are therefore costly to implement.
In the present invention, noise subtraction is done purely in the time domain so no transforms are required. The invention solves the problem of how to reduce background noise while minimizing the speech distortion. The method can also be applied to any spectral subtraction method where the inventive method can be applied to each sub-spectrum.
The noise reduction method includes accurate noise level measurement both when speech is dominant and when it is not present, and achieves noise reduction without deteriorating the incoming speech. The inventive method can also be applied to any spectral subtraction method where the same implementation can be applied to each individual spectrum sub-band.
In one aspect the invention provides a method of reducing noise in an input signal that may contain speech, comprising obtaining a noise level estimate signal; comparing the level of said input signal with said noise level estimate signal to determine whether speech is dominant; and applying less aggressive noise reduction to said input signal when speech is dominant than when only noise is present.
In a preferred embodiment the noise estimate signal is obtained by accumulating the magnitude of the incoming signal over a predetermined number of samples to obtain an updated noise level signal; comparing the updated noise level signal with an incremented previous noise level estimate signal; and if the updated noise level signal is larger than the incremented previous noise level signal, using the updated noise level as the current noise level signal, and if the updated noise level signal is smaller than the incremented previous level estimate signal, decreasing the noise level signal with a large step, whereby the noise level estimate signal has a slow ramp-up speed and a fast ramp-down speed.
In another aspect the invention provides a method of reducing noise in an incoming signal, comprising deriving an estimate of the noise level; detecting the level of the incoming signal; comparing the level of the incoming signal with the estimate of the noise level to determine whether speech is dominant; and applying an appropriate level of noise reduction based on said comparison.
The invention also provides a noise reduction circuit for an input signal that may contain speech, comprising a noise level detector block for producing a noise level estimate output signal; a level detector block for producing a signal level output signal; a parameter selector block for detecting the presence of dominant speech in said input signal based on outputs of said level detector block and said noise level detector block, and setting different noise reduction parameters depending on whether dominant speech is present or not; and a noise reduction block deriving a noise reduced output signal from one or more of the incoming signal, the signal level output signal, and the noise level estimate signal using parameters selected by said parameter selection block.
The invention is particularly applicable to acoustic echo cancellers, where it serves as an extremely low MIPS (million instructions per second) noise reduction algorithm. This algorithm provides simple and effective noise reduction without relying on spectral subtraction and hence removes the need for compute-intensive transforms. It finds particular utility in an acoustic echo canceller chip.
The invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:
Details of the noise level detector will first be described with reference to
The updated noise level signal 104 is the accumulated result of the input signal magnitude |Yin| over 128 samples. Its output is limited to the range from 0 to a saturation value determined by the number of bits used, which for a 16-bit representation of the input signal would be 32767.
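The accumulation step can be sketched as follows. This is a minimal illustration, not the circuit itself: the function name `accumulate_block` and the choice to clamp the running total after every addition are assumptions, while the block size of 128 and the 16-bit limit of 32767 come from the text.

```python
BLOCK_SIZE = 128    # samples per accumulation window (from the text)
SATURATION = 32767  # 16-bit positive full scale (from the text)


def accumulate_block(samples, saturation=SATURATION):
    """Sum |y| over one block, clamping the total to [0, saturation].

    Hypothetical helper: the text only states that the accumulated
    result is limited to this range, not where the clamp is applied.
    """
    total = 0
    for y in samples:
        # Magnitudes are non-negative, so only the upper limit can be hit.
        total = min(total + abs(y), saturation)
    return total


if __name__ == "__main__":
    quiet = [3, -2, 4, -1] * (BLOCK_SIZE // 4)
    loud = [1000] * BLOCK_SIZE
    print(accumulate_block(quiet))  # sum of magnitudes, well below the limit
    print(accumulate_block(loud))   # clamps at the saturation value
```

Because every term added is non-negative, clamping inside the loop gives the same result as clamping once at the end; the per-step form only mirrors how a fixed-width accumulator would behave.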
On the rising edge of the comparator 101, when the output of the counter 102 reaches the set threshold value of 128, the updated noise level output from the memory 105 is compared with a new prescaled noise level, which is the previous prescaled noise level incremented by a small amount. The noise level is scaled by multiplier 112, which has a recommended multiplier factor of η=1.002.
If the newly updated noise level is larger than the new incremented prescaled noise level, then the new incremented prescaled noise level is used as the current prescaled noise level. If the updated noise level is smaller than the new prescaled noise level, then the current noise level is decreased with a large step (0.75*noise_level+0.25*new_calculated_value).
This ensures that the prescaled noise level has slow ramp-up speed and fast ramp-down speed. The objective is to maintain a noise level estimate that will not be affected by incoming speech signals. This ensures that the noise level always traces low level noise during a speech active period.
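The asymmetric update can be illustrated with a short sketch. The function name `update_prescaled_level` is hypothetical, and it is assumed here that `noise_level` in the large-step formula is the previous prescaled level and `new_calculated_value` is the freshly accumulated level. The text does not say what happens when the two compared values are equal; this sketch treats equality as the ramp-down case.

```python
ETA = 1.002  # recommended multiplier factor from the text


def update_prescaled_level(previous, updated, eta=ETA):
    """Return the new prescaled noise level after one 128-sample block.

    previous: prescaled noise level from the last block
    updated:  accumulated |Yin| for the current block
    """
    incremented = previous * eta
    if updated > incremented:
        # Slow ramp-up: rise by at most the small multiplicative step.
        return incremented
    # Fast ramp-down: move a quarter of the way toward the new value.
    return 0.75 * previous + 0.25 * updated


if __name__ == "__main__":
    level = 1000.0
    for block_level in (5000, 5000, 200, 200):  # two loud blocks, two quiet
        level = update_prescaled_level(level, block_level)
        print(round(level, 3))
```

With a purely multiplicative increment as sketched here, a level of exactly zero would never rise again, so a non-zero starting value is assumed.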
The final noise level estimate is a scaled version of the prescaled noise level (the recommended scale is 0.026) plus an offset. The offset typically varies from 3 to 7 depending on the codec being used for the digital conversion of the speech signal. It also compensates for any rounding inaccuracy when the noise level is very small.
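As a small worked example of this last step, the sketch below applies the recommended scale and an offset to a prescaled level. The function name `final_noise_estimate` is hypothetical, and the default offset of 5 is only an assumed mid-range pick from the stated 3 to 7.

```python
SCALE = 0.026  # recommended scale from the text


def final_noise_estimate(prescaled, scale=SCALE, offset=5):
    """Scale the prescaled noise level and add a codec-dependent offset.

    offset=5 is an illustrative assumption; the text gives a range of 3 to 7.
    """
    return scale * prescaled + offset


if __name__ == "__main__":
    # A prescaled level of zero still yields the offset, which keeps the
    # estimate from collapsing when the noise level is very small.
    print(final_noise_estimate(0))
    print(final_noise_estimate(1000))
```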
The noise reduction unit shown in
The signal level detection block tries to find the instantaneous peak level of the signal. It operates as follows:
The parameter selection block 202 compares the level of Yin with the noise level 115 scaled by a factor γ, which should be around 2 or 3. If the level of Yin is larger, speech is dominant and less noise reduction should be applied, with parameters α and β set to α2 and β2. The recommended values for α2 and β2 are α2=0.5 and β2=0.25. Otherwise, if the level of Yin is smaller than the scaled noise level, only noise is present and more aggressive noise reduction should be applied, with parameters α and β set to α1 and β1. The recommended values for α1 and β1 are α1=1 and β1=0.0625. For better subjective speech quality, a soft parameter switch should be used when α and β switch from the speech-period values α2 and β2 to the non-speech-period values α1 and β1.
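The selection logic can be sketched as a single comparison. The function name `select_parameters` is hypothetical, γ=2 is one of the two values the text suggests, and the soft switch between the two parameter sets is deliberately left out because the text does not specify how it is performed.

```python
ALPHA_1, BETA_1 = 1.0, 0.0625  # noise only: aggressive reduction
ALPHA_2, BETA_2 = 0.5, 0.25    # speech dominant: gentler reduction


def select_parameters(signal_level, noise_level, gamma=2.0):
    """Return (alpha, beta) for the current signal and noise levels.

    Hard switch only; the soft transition recommended in the text
    is not modeled in this sketch.
    """
    if signal_level > gamma * noise_level:
        return ALPHA_2, BETA_2
    return ALPHA_1, BETA_1


if __name__ == "__main__":
    print(select_parameters(300, 100))  # level above gamma * noise
    print(select_parameters(150, 100))  # level below gamma * noise
```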
The last block 203 is the output selection block. This generates the noise reduced output signal. This signal comes from one of four different values determined by three switch gate selectors 211, 212, and 213 controlled in turn by three comparators 214, 215, and 216. The output selection block functions as follows:
When |Yin|>4α(Noise Level), the comparator 214 output is low and the switch gate 211 is set at selection 0. This indicates a strong speech signal and the output 220 takes Yin as bypass.
If 2α(Noise Level)<|Yin|<4α(Noise Level), the comparator 214 output is high and comparator 215 output is low. The switch gate 211 is set at selection 1 and the switch gate 212 is set at selection 0. The output 220 is sign(Yin){|Yin|−0.5α(Noise Level)}.
If {α(Noise Level)+β|Yin|}<|Yin|<2α(Noise Level), the outputs of comparators 214 and 215 are high and comparator 216 output is low. Both switch gates 211 and 212 are set at selection 1 and the switch gate 213 is set at selection 0. The output 220 is sign(Yin){|Yin|−α(Noise Level)}.
If |Yin|<{α(Noise Level)+β|Yin|}, the outputs of all comparators (214, 215, and 216) are high and all switch gates (211, 212, and 213) are set at selection 1. The output 220 is βYin, which means that the signal will never be reduced below that level.
In this way more aggressive noise reduction is applied when dominant speech is absent.
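The four output cases can be written as a cascade of comparisons. This is a minimal sketch: the function name `noise_reduce_sample` is hypothetical, and because the text does not say which branch applies when |Yin| equals a threshold exactly, equality is assumed here to fall into the lower, more strongly reduced branch.

```python
import math


def noise_reduce_sample(y, noise_level, alpha, beta):
    """Return the noise reduced output for one input sample y."""
    mag = abs(y)
    scaled_noise = alpha * noise_level
    if mag > 4 * scaled_noise:
        # Strong speech: bypass the input unchanged.
        return y
    sign = math.copysign(1.0, y)
    if mag > 2 * scaled_noise:
        # Subtract half of the scaled noise level.
        return sign * (mag - 0.5 * scaled_noise)
    if mag > scaled_noise + beta * mag:
        # Subtract the full scaled noise level.
        return sign * (mag - scaled_noise)
    # Floor: attenuate by beta instead of subtracting further.
    return beta * y


if __name__ == "__main__":
    # Noise-only parameters from the text: alpha1=1, beta1=0.0625.
    for sample in (500, -300, 150, 104):
        print(sample, noise_reduce_sample(sample, 100, 1.0, 0.0625))
```

The third condition is equivalent to |Yin|−α(Noise Level) > β|Yin|, so the subtraction branch is only taken when its result stays above the βYin floor.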
The described method offers a simple low cost implementation of a noise reduction unit and provides a simple and effective noise level estimator for speech signals, particularly in echo canceller integrated circuits.
This application claims the benefit under 35 USC 119(e) of U.S. provisional application No. 60/707,123, the contents of which are herein incorporated by reference.
Number | Date | Country
---|---|---
60707123 | Aug 2005 | US