In hearing devices, such as hearing aids, background noise is detrimental to the intelligibility of speech sounds. Most modern hearing devices address this issue by introducing noise reduction processing into the microphone output signal paths. The aim is to increase the signal-to-noise ratio (SNR) available to the listener, and hence to improve clarity and ease of listening for the hearing device wearer.
The success of noise reduction processing often depends greatly on forming an appropriate reference signal with which to estimate the noise, because the reference signal is used to optimise an adaptive filter that aims to eliminate the noise, ideally leaving only the target signal. However, such reference estimates are often inaccurate because most known techniques, such as Voice Activity Detection, are susceptible to errors. In turn, these inaccuracies lead to inappropriate filtering and degradation in the output quality of the processed sound (target distortion), particularly at the low SNRs where noise reduction is most needed.
There remains a need for improved noise reduction methods and systems.
In a first aspect the present invention provides a noise reduction method for reducing unwanted sounds in signals received from an arrangement of microphones, including the steps of: sensing sound sources distributed around a specified target direction by way of an arrangement of microphones to produce left and right microphone output signals; determining the magnitude or power of the left and right microphone signals; and attenuating the signals based on the difference of the magnitudes or powers, or of values derived from the magnitudes or powers, of the left and right microphone signals.
The method may further include the steps of: determining the sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals, wherein the step of attenuating the signals may be further based on a comparison of the difference of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals with the sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals.
The step of attenuating the signal may be based on the ratio of the difference of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals to the sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals.
The step of attenuating may be based on one minus the ratio.
The step of attenuating may be based on a transformation of the ratio.
The step of attenuating may be based on one minus the transformation of the ratio.
The difference of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals may be time-averaged.
The sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals may be time-averaged.
The step of time-averaging may include asymmetric rise and fall times.
The step of attenuating may be frequency specific.
The step of attenuating may include determining the attenuation of low frequencies from other frequency bands.
The step of attenuating may include determining the attenuation of selected frequencies based on the magnitude or power of the difference between the left and right microphone signals or a value derived from the magnitude or power of the difference between the left and right microphone signals.
The selected frequencies may be low frequencies.
The attenuation may be scaled by a function.
Unwanted reduction of target output level in high noise levels may be eliminated through an estimator of the amount of noise being eliminated.
An estimator of the amount of noise being eliminated over a frequency range of interest may be derived from the maximum attenuation applied across that range.
In a second aspect the present invention provides a system for reducing unwanted sounds in signals received from an arrangement of microphones, including: sensing means for sensing sound sources distributed around a specified target direction by way of an arrangement of microphones to produce left and right microphone output signals; determination means for determining the magnitude or power of the left and right microphone signals; and attenuation means for attenuating the signals based on the difference of the magnitudes or powers, or of values derived from the magnitudes or powers, of the left and right microphone signals.
The determination means may be further arranged to determine the sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals; and the attenuation means may be further arranged to attenuate the signals based on a comparison of the difference of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals with the sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals.
The attenuation means may be arranged to attenuate the signals based on the ratio of the difference of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals to the sum of the magnitudes or powers or values derived from the magnitudes or powers of the left and right microphone signals.
The attenuation means may be arranged to attenuate the signals based on one minus the ratio.
The attenuation means may be arranged to attenuate the signals based on a transformation of the ratio.
The attenuation means may be arranged to attenuate the signals based on one minus the transformation of the ratio.
In some embodiments, this signal processing technique reduces interference levels in spatially distributed sensor arrays, such as the microphone outputs available in bilateral hearing aids, when the desired target signal arrives from a different direction to those of interfering noise sources. In the field of hearing, this technique can be applied to reduce the effect of noise in devices such as hearing aids, hearing protectors and cochlear implants.
Embodiments of the invention provide an improved and efficient scheme for the removal of noise present in microphone output signals without the need for complex and error-prone estimates of reference signals.
Some embodiments may be used in an acoustic system with at least one microphone located at each side of the head producing microphone output signals, a signal processing path to produce an output signal, and means to present this output signal to the auditory system.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings.
The following description of an embodiment is presented for microphone output signals from the left and right sides of the head. The desired sound source to be attended to is presumed to arrive from a specific direction, referred to as the target direction. In the preferred embodiments, multiband frequency analysis is employed, using for example a Fourier Transform, with left and right channel signals XL(k) and XR(k) respectively, where k denotes the kth frequency channel.
Referring to the accompanying drawings, a first embodiment in the form of a noise reduction system 100 will now be described.
The outputs from detection means in the form of the left 101 and right 102 microphones are transformed into multichannel signals using an analysis filter bank block, 103 and 104, for example using a Fourier Transform to produce left and right signals XL(k) and XR(k) respectively.
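By way of illustration only, the analysis stage can be sketched as follows in Python (the frame length, hop size and window are illustrative choices not specified above; numpy's real FFT stands in for the analysis filter bank):

    import numpy as np

    def analysis_filter_bank(x, frame_len=128, hop=64):
        """Split a time-domain microphone signal into windowed FFT frames.

        Returns an array of shape (num_frames, frame_len // 2 + 1) whose
        row m holds the complex channel signals X(k) for frame m.
        """
        window = np.hanning(frame_len)
        frames = []
        for start in range(0, len(x) - frame_len + 1, hop):
            frames.append(np.fft.rfft(x[start:start + frame_len] * window))
        return np.array(frames)

    # xl, xr: time-domain samples from the left (101) and right (102) microphones
    # XL = analysis_filter_bank(xl)   # XL[m, k] is channel k of frame m
    # XR = analysis_filter_bank(xr)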
The method then proceeds in the following manner:
1. Measure the left and right microphone powers (in each frequency band). The power in each channel of the left and right signals is independently determined by way of determination means 105 and 106.
2. Calculate PDIF, the difference of the microphone powers (assumed to contain the difference between the left- and right-ear noise, and little target, because the target component largely cancels). The absolute value of PDIF is calculated at 107; that is to say, PDIF always has a positive value.
3. Calculate PSUM, the sum of the microphone powers (which contains twice the target power plus the left and right noise components).
4. Time average PDIF and PSUM (optionally with asymmetric rise and fall times) by accumulating these values over time using integration processes 108 and 110, respectively.
5. Calculate the "attenuation" u(k) at 111, which equates to 1−(PDIF/PSUM) and is an estimate of how much the microphone power needs to be scaled back to better approximate the target-only component. Optionally the ratio (PDIF/PSUM) may be modified by a scaling function prior to subtracting it from one.
6. Alter the strength of noise reduction by applying a mapping function that translates the "attenuation" into a set of filter weights W(k). In the preferred embodiment the mapping function takes the form of raising the "attenuation" to a fixed power, with a default value of 2.6. The value of the fixed power coefficient may be application dependent and user selectable.
7. For low frequencies there remains the problem that the head provides little attenuation between the ears, which leaves much of the noise in that region. To address this, the very low frequencies are scaled down by an additional factor that is determined from other frequency regions, such as a power-weighted average, or alternatively the maximum, of the attenuation applied to the frequencies in the 500-4000 Hz range (a sketch of steps 1-7 follows this list).
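The following Python sketch gathers steps 1-7 for a single frame. It is illustrative only: the smoothing coefficients, the direct use of the maximum speech-band weight as the low-frequency scaling factor, and the frame parameters (fs, nfft) are assumptions, while the default exponent of 2.6 and the 500-4000 Hz range are taken from the description above.

    import numpy as np

    def system100_weights(XL, XR, state, alpha_rise=0.9, alpha_fall=0.5,
                          S=2.6, fs=16000, nfft=128):
        """One frame of the weight computation (steps 1-7), sketched.

        XL, XR : complex spectra of the current left/right frame.
        state  : dict carrying the smoothed PDIF and PSUM between frames.
        alpha_rise, alpha_fall : illustrative asymmetric smoothing coefficients.
        """
        # Steps 1-3: per-channel powers, their absolute difference and their sum.
        PL = np.abs(XL) ** 2
        PR = np.abs(XR) ** 2
        PDIF = np.abs(PR - PL)
        PSUM = PR + PL

        # Step 4: leaky integration with asymmetric rise/fall behaviour.
        def smooth(new, old):
            alpha = np.where(new < old, alpha_fall, alpha_rise)
            return alpha * old + (1.0 - alpha) * new

        state['PDIF'] = smooth(PDIF, state.get('PDIF', PDIF))
        state['PSUM'] = smooth(PSUM, state.get('PSUM', PSUM))

        # Step 5: attenuation u(k) = 1 - PDIF/PSUM.
        u = 1.0 - state['PDIF'] / np.maximum(state['PSUM'], 1e-12)

        # Step 6: map attenuation to filter weights by a fixed power (default 2.6).
        W = np.clip(u, 0.0, 1.0) ** S

        # Step 7: scale the very low frequencies by the maximum attenuation
        # applied in the 500-4000 Hz range, where the head shadow is effective.
        freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
        speech = (freqs >= 500) & (freqs <= 4000)
        W[freqs < 500] *= W[speech].max()
        return W

In this sketch the weights W(k) are then available to be applied to the combined signal, as described in the following paragraphs.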
At 112 the left and right signals XL(k) and XR(k) are added together. The filter weights W(k) determined at 111 are applied to the combined signal by programmable filter 113 to yield the output signal Z(k).
A broadband time-domain signal is optionally created using a synthesis filter bank, 120, for example using an inverse Fourier Transform, and may benefit from further processing such as adjustment of spectral content or time-domain smoothing depending on the application, as will be evident to those skilled in the art.
In the method described above the left and right signals are added together to produce a monaural signal before the channel weight is applied. This provides an additional SNR gain at the expense of the loss of left and right directional cues. An alternative is to apply the weight to the left and right signals separately to retain directional information. Intermediate to those options, in an alternative implementation, ipsilateral and contralateral signals may be weighted unequally before addition to achieve the desired trade-off between additive SNR gain and directional cue retention. Such additive weighting may be fixed, or dynamically determined, for example from the channel attenuation.
The following formulae are applied in the method conducted by system 100.
The power in each channel for signals from microphones located on the left and right sides of the head is calculated as follows:
PL(k) = XL(k) × XL*(k)   Eq. 1

PR(k) = XR(k) × XR*(k)   Eq. 2
Eq. 1 and Eq. 2 describe the situation for which the target direction corresponds to the direction in which the head is orientated. Optionally the target direction can be altered by filtering the left and right microphone signals. Although the target direction can be specified by the user, it should be obvious to those skilled in the art that an automated process can also be used.
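The steering filter itself is not specified here; purely as an illustrative assumption, one common approach is to time-align the two ear signals for the desired direction with a per-bin phase rotation, for example:

    import numpy as np

    def steer_target(XL, XR, delay_samples, nfft=128):
        """Illustrative steering filter: advance the lagging (right) ear by the
        interaural delay expected for the desired target direction, so that a
        source from that direction appears as if it arrived from straight ahead.
        The choice of a pure delay is an assumption, not part of the description."""
        k = np.arange(XL.shape[-1])                      # rfft bin indices
        phase = np.exp(2j * np.pi * k * delay_samples / nfft)
        return XL, XR * phase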
PDIF is calculated as follows:
PDIF(k) = |PR(k) − PL(k)|   Eq. 3
PSUM is calculated as follows:
PSUM(k) = PR(k) + PL(k)   Eq. 4
The time-averaged values of PDIF and PSUM are determined in the preferred embodiment using leaky integration with asymmetric rise (τr) and fall (τf) times as follows:
if (PDIF(k) < P̄DIF(k))
    P̄DIF(k) = αf P̄DIF(k) + (1 − αf) PDIF(k)
else
    P̄DIF(k) = αr P̄DIF(k) + (1 − αr) PDIF(k)   Eq. 5

if (PSUM(k) < P̄SUM(k))
    P̄SUM(k) = αf P̄SUM(k) + (1 − αf) PSUM(k)
else
    P̄SUM(k) = αr P̄SUM(k) + (1 − αr) PSUM(k)   Eq. 6

where P̄DIF(k) and P̄SUM(k) denote the time-averaged values, and αr and αf are smoothing coefficients set by the rise (τr) and fall (τf) times respectively (the exact mapping from time constant to coefficient is implementation dependent).
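As an illustrative sketch (the mapping from time constants to per-frame coefficients is not specified above; the exponential form below is a conventional choice), the asymmetric smoother of Eq. 5 and Eq. 6 may be realised as:

    import numpy as np

    def leaky_coeffs(tau_rise, tau_fall, frame_period):
        """Convert rise/fall time constants (seconds) into per-frame leak
        coefficients; the exponential mapping is assumed, not stated above."""
        return np.exp(-frame_period / tau_rise), np.exp(-frame_period / tau_fall)

    def smooth_asymmetric(new, old, alpha_rise, alpha_fall):
        """One update of the asymmetric leaky integrator: a falling input is
        tracked with the fall coefficient, otherwise the rise coefficient."""
        alpha = alpha_fall if new < old else alpha_rise
        return alpha * old + (1.0 - alpha) * new

    # Example: 10 ms frames, 20 ms rise time, 200 ms fall time.
    # a_r, a_f = leaky_coeffs(0.020, 0.200, 0.010)
    # p_bar = smooth_asymmetric(p_new, p_bar, a_r, a_f)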
Alternative time-averaging methods can be used.
The level of attenuation is calculated as follows:
u(k) = 1 − (P̄DIF(k)/P̄SUM(k))   Eq. 7

Optionally, the ratio (P̄DIF/P̄SUM) may be modified by a scaling function prior to subtracting it from one.

w(k) = u(k)^S   Eq. 8

where S is the fixed power coefficient described in step 6 above (default value 2.6). Alternative methods to produce the desired strength of noise reduction w(k) from the ratio (P̄DIF/P̄SUM) may be used.
The channel weighting values W(k) are applied to the combined channel signals XL(k) and XR(k), to produce the channel output signal:
Z(k)=W(k)(XL(k)+XR(k)) Eq. 9
Alternatively, the desired retention of directional information can be achieved by retaining partial independence of the left and right ear signals to produce a stereophonic output:
ZL(k)=W(k)(XL(k)×Yipsi+XR(k)×Ycontra)
ZR(k)=W(k)(XL(k)×Ycontra+XR(k)×Yipsi) Eq. 10
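A minimal sketch of the stereophonic alternative of Eq. 10 follows; the ipsilateral and contralateral gains shown are illustrative values only, the description requiring merely that they trade SNR gain against cue retention:

    import numpy as np

    def stereo_output(XL, XR, W, y_ipsi=0.7, y_contra=0.3):
        """Apply the channel weights W(k) while retaining partial left/right
        independence, as in Eq. 10. The gains y_ipsi and y_contra are assumed
        example values; they may be fixed or determined dynamically."""
        ZL = W * (XL * y_ipsi + XR * y_contra)
        ZR = W * (XL * y_contra + XR * y_ipsi)
        return ZL, ZR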
Further noise reduction and improved output signal quality are derived from an estimator of how much noise is being removed in the frequencies most important to voiced speech intelligibility, between 500 Hz and 4 kHz. In the preferred embodiment that estimator is calculated as the largest of the attenuation values applied in the 500-4000 Hz speech range:
Wmax = maxk(W(k))   Eq. 11
Wmax is used in the preferred embodiment to determine additional attenuation to be applied to frequency channels below a few hundred Hertz, for which the head is an ineffective barrier. In addition it is used to adjust a slowly varying AGC that minimises the target level reduction that otherwise increases as noise levels rise relative to the target. Alternative metrics to Wmax, such as the power-weighted average of the attenuation applied to the frequency channels in the 500-4000 Hz speech range, may be used in a similar manner.
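The two candidate estimators mentioned above can be sketched as follows (how the chosen estimator then drives the low-frequency attenuation and the slow AGC is not detailed here, so only the estimators themselves are shown; the use of PSUM as the weighting term in the average is an assumption):

    import numpy as np

    def noise_removal_estimators(W, PSUM, freqs):
        """Estimate how much noise is being removed in the 500-4000 Hz speech
        range: the maximum weight (Eq. 11) and, as an alternative metric,
        a power-weighted average of the weights."""
        band = (freqs >= 500) & (freqs <= 4000)
        w_max = W[band].max()                                    # Eq. 11
        w_avg = np.sum(W[band] * PSUM[band]) / np.sum(PSUM[band])
        return w_max, w_avg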
It will be evident to those skilled in the art that although the example implementation is described in terms of a target direction that is normal to the microphone configuration, i.e. in the “look direction” of a listener wearing a microphone at each ear, the desired target direction can be altered by filtering the left and right ear inputs prior to application of the noise reduction.
In the embodiment described above the power of the microphone signals was determined and then a degree of attenuation in the form of filter weights was calculated based on the power values. Similarly, in other embodiments the magnitude of the signals may be determined. The degree of attenuation may be calculated based on the magnitude values. In other embodiments, the degree of attenuation may be calculated based on values derived from the magnitude or power values.
In a variation to the embodiment described above there may be provided an option to make the attenuation dependent on phase as well as amplitude (power or magnitude), rather than on amplitude alone. In practice, this option is used only in low frequency regions, where power or magnitude differences between the ears can be too small to be effective. In the low frequency bands using this approach, not only are the powers of the left and right signals required, but the left and right signals must also be subtracted and the power of their difference (as opposed to the difference of their powers) calculated.
Referring again to the drawings, a second embodiment in the form of a noise reduction system 200 will now be described.
The method then proceeds in the following manner:
1. As described in steps 1-3 for system 100, calculate the values of PSUM and PDIF from the left and right power values determined by way of power determination means 205 and 206, and absolute value determination means 207.
2. Subtract the left and right signals, XL(k) and XR(k), and calculate VDIF, the power of the complex vector difference using determination means 208.
3. Calculate the preliminary attenuation a(k) values at 209 using PDIF, PSUM, and optionally VDIF. In the preferred embodiment high frequency bands are processed only using PDIF and PSUM according to: a(k)=1−(PDIF/PSUM), and attenuation for low frequency bands incorporates an additional factor dependent on VDIF according to:
a(k) = 1 − (PSUM×(PDIF+VDIF) − (PDIF×VDIF))/(PSUM×PSUM).
4. Optionally alter the strength of the preliminary attenuation to produce the attenuation by applying a mapping function. The mapping function need be neither linear nor time-invariant. In the preferred embodiment, the mapping function is a frequency dependent threshold function that inhibits attenuation above threshold.
5. Time average the attenuation by accumulating its values over time using integration process 208.
6. Optionally alter the strength of the time-averaged attenuation using a further mapping function to produce attenuation values u[k] using for example a power function with a fixed coefficient. The value of the fixed power coefficient is application dependent, and may be user selectable. In the preferred embodiment, the mapping function is unity for low frequency bands that incorporate VDIF dependence, and equal to 2 otherwise.
The introduction of VDIF dependence for low frequencies in system 200 eliminates the need for the additional attenuation factor described in system 100 for very low frequencies. The output weights W[k] determined in system 200 can be used to scale the left and right signals XL(k) and XR(k) in the same manner as described for system 100.
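A sketch of the preliminary attenuation computation for system 200 is given below; the low/high band split, the frame format and the numerical guard value are illustrative assumptions, while the formulae follow steps 2 and 3 above (Eq. 13 and Eq. 14 below, noting that VDIF as defined in Eq. 12 is real-valued, so Re(VDIF) = VDIF):

    import numpy as np

    def system200_attenuation(XL, XR, low_band):
        """Preliminary attenuation a(k) for system 200. `low_band` is a boolean
        mask selecting bins below the chosen low/high boundary (e.g. 1000 Hz).
        Mapping, time-averaging and the final power coefficient are applied
        afterwards, as described in steps 4-6."""
        PL = np.abs(XL) ** 2
        PR = np.abs(XR) ** 2
        PDIF = np.abs(PR - PL)
        PSUM = np.maximum(PR + PL, 1e-12)        # guard against division by zero
        VDIF = np.abs(XL - XR) ** 2              # power of the vector difference

        a_high = 1.0 - PDIF / PSUM
        a_low = 1.0 - (PSUM * (PDIF + VDIF) - PDIF * VDIF) / (PSUM * PSUM)
        return np.where(low_band, a_low, a_high)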
The following formulae are applied in the method conducted by system 200:
PL(k) is calculated according to Eq. 1
PR(k) is calculated according to Eq. 2
PDIF is calculated according to Eq. 3.
PSUM is calculated according to Eq. 4.
VDIF is the power of the vector difference between left and right signals, calculated as:
VDIF(k) = (XL(k) − XR(k)) × (XL(k) − XR(k))*   Eq. 12
For high frequency bands the preliminary level of attenuation is calculated as follows:
a(k) = 1 − (PDIF/PSUM)   Eq. 13
Note that, in contrast to Eq. 7, PDIF and PSUM have not been smoothed.
For low frequency bands, the preliminary attenuation is determined according to:
a(k) = 1 − (PSUM×(PDIF + Re(VDIF)) − (PDIF×Re(VDIF)))/(PSUM×PSUM)   Eq. 14
Where Re(VDIF) is the real part of the complex power VDIF.
The time-averaged value of a[k] is determined in the preferred embodiment using frequency-dependent leaky integration as follows:
ā[k] = λ[k]·ā[k] + (1 − λ[k])·a[k]   Eq. 15

where ā[k] denotes the time-averaged attenuation and λ[k] is a frequency-dependent leak coefficient.
Alternative time-averaging methods can be used.
The time-averaged level of attenuation in the preferred embodiment described in System 200 is further modified by raising a[k] to a fixed frequency-dependent power coefficient as follows:
w(k) = ā[k]^S(k)   Eq. 16

where S(k) is the frequency-dependent power coefficient (unity for the low frequency bands that incorporate VDIF dependence, and 2 otherwise, in the preferred embodiment).
Alternative methods to produce the desired strength of noise reduction w(k) may be used.
It will be clear to those skilled in the art that alternative measures that exhibit phase-dependence between left and right signals may be used instead of VDIF to enhance performance in the low frequency bands.
In various embodiments, the boundary between high and low frequencies is dependent upon the particular application. The boundary between high and low frequencies may vary in the range between 500 Hz and 2500 Hz. In the detailed embodiment described above, a value of 1000 Hz may be used.
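For illustration, the corresponding band split might be derived as follows (sample rate and FFT size are assumed values):

    import numpy as np

    def low_band_mask(nfft=128, fs=16000, boundary_hz=1000):
        """Boolean mask marking the low-frequency bins below the chosen
        boundary; 1000 Hz is used here, adjustable within 500-2500 Hz."""
        freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
        return freqs < boundary_hz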
Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.
Finally, it is to be appreciated that various alterations or additions may be made to the parts previously described without departing from the spirit or ambit of the present invention.
The present invention relates to a noise reduction method and to systems configured to carry out the method. Embodiments of the invention represent improvements upon, or alternatives to, methods or systems described in applicant's international patent application no PCT/AU2011/001476, published as WO2012/065217, the contents of which are hereby incorporated by reference.