1. Field of the Invention
The present invention relates in one embodiment to a dual microphone end firing array. A standard headset consists of an ear cup with a microphone pickup arm. In this invention embodiment two microphones are attached to the ear cup and are configured as an end firing array. The end firing array suppresses unwanted sounds using an adaptive spectral method and spectral subtraction. According to a second embodiment, automatic calibration of an end-firing microphone array is provided.
2. Background for Arrays
Current end-firing implementations create directional nulls in the directivity pattern of the microphone array. In a reverberant environment, however, the noise may not arrive from a single direction. Accordingly, what is needed is an effective and simple system and method to improve the pickup of desired audio signals.
In a system with 2 or more microphones, which has subsequent signal processing, a proper balance of the microphones may be required for the subsequent signal processing to perform within design parameters. Existing solutions to the balance problem are: a) careful selection of matched microphones; or b) manual calibration by injection of a diagnostic tone, measurement, and persistent storage of the compensation coefficient. In contrast, a present invention embodiment features an automatic and continuous calibration of an unmatched pair of microphones arranged in a known configuration to be used with an input source in a known location, e.g. a pair of headphones.
A standard headset consists of one (or two) ear cups and a microphone pickup arm. The arm positions a microphone in front of the user's mouth and picks up the user's voice and background noise. Placing the microphone close to the wearer's mouth allows the user's voice to be heard over most background sounds. The drawback is that these headsets can be annoying to wear.
Reference Signals
In this application we propose a method to replace the microphone arm with microphones placed on the ear cup of the headset. To reduce the background noise and improve the near field voice pickup we use an end firing dual microphone array. The microphones are configured to create two cardioid arrays. The null of the rear facing cardioid is positioned to point in the direction of the desired signal and the front cardioid's null points in the opposite direction. The rear facing cardioid signal is used as a reference signal to determine any similarities with the front cardioid. We then subtract any similarities, knowing that the front facing cardioid is the only signal that contains direct speech. We use a frequency based adaptive method to estimate these similarities, with the adaptation updating only when there is no direct speech detected. For residual suppression we use spectral subtraction. Spectral subtraction is also used when speech is detected to remove background noise.
End Firing Algorithm
End firing arrays have been used in Bluetooth headsets and hearing aids to pick up the user's voice while helping to suppress the background noise. In this section we review some of the current methods. One method can be described as a null-forming scheme where a null in the directivity pattern is steered in the direction of the noise source. Another method, however, is to create a noise reference signal from the array which is then subtracted from the desired signal. We begin by discussing the cardioid array, as this is the array we use.
Cardioid
Two omnidirectional microphones can be used to create a directional microphone by adding a delay τ to one of them. For example, in
If we let the delay
τ=d/c
where c is the speed of sound and d is the distance between the microphones, then we get a cardioid directivity pattern.
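By way of illustration only, the following sketch evaluates the directivity of a delay-and-subtract pair for a plane wave. It assumes the output is formed as m1(t) − m2(t − τ) with τ = d/c; the function name, the 2 cm spacing, the 1 kHz test frequency, and the value of c are hypothetical choices and are not taken from the specification.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_and_subtract_response(theta, freq_hz, spacing_m, delay_s):
    """Magnitude response of y(t) = m1(t) - m2(t - delay) to a plane wave.

    theta is the arrival angle in radians measured from the array axis
    (0 = sound arriving from the front, reaching microphone 1 first).
    """
    omega = 2.0 * math.pi * freq_hz
    # Acoustic travel time from microphone 1 to microphone 2 for this angle.
    travel = spacing_m * math.cos(theta) / SPEED_OF_SOUND
    return abs(1.0 - cmath.exp(-1j * omega * (delay_s + travel)))


spacing = 0.02                     # 2 cm, hypothetical
tau = spacing / SPEED_OF_SOUND     # the delay tau = d / c
front = delay_and_subtract_response(0.0, 1000.0, spacing, tau)
side = delay_and_subtract_response(math.pi / 2, 1000.0, spacing, tau)
rear = delay_and_subtract_response(math.pi, 1000.0, spacing, tau)
```

Under these assumptions the response falls from its maximum at the front, through an intermediate value at the side, to a null at the rear, which is the cardioid shape.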
We saw that we can steer a null to a certain direction by adjusting the delay. In a digital system this would require us to implement a fractional interpolator. We can avoid this if we use two omnidirectional microphones to form front and back dual microphone end firing arrays, see
y(t)=c1(t)−b*c2(t) [2.4]
then for b=1 we get
If we let
then the expressions for the front and rear cardioid signals are
where k is the wave number k=ω/c.
In
for b, where μ is a small parameter and Σ(cr²) is the averaged smoothed power of cr. In the above method we have a single parameter b that we can change to reduce the ambient noise. If we band pass the signals, creating say 8 bands, then we can use 8 coefficients b0, b1, . . . , b7 and the LMS algorithm [2.7] for each band to adjust bi and improve the suppression. This method can be used to create nulls in each of the bands.
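As a non-limiting sketch of how the single coefficient b might be adapted, the code below applies a conventional normalized LMS step to y(n) = cf(n) − b·cr(n). The exact update of the specification is not reproduced here; the step size, the smoothing constant, and the synthetic test signal are assumptions made for illustration.

```python
import random


def adapt_null_coefficient(front, rear, mu=0.01, smooth=0.99, eps=1e-12):
    """Adapt b in y(n) = cf(n) - b * cr(n) with a normalized LMS step.

    mu <= (1 - smooth) keeps the effective step at or below 1 (assumed rule).
    """
    b = 0.0
    power = eps                     # running (smoothed) power of cr
    output = []
    for cf, cr in zip(front, rear):
        y = cf - b * cr             # array output for the current b
        power = smooth * power + (1.0 - smooth) * cr * cr
        b += mu * y * cr / (power + eps)
        output.append(y)
    return b, output


# Synthetic ambient noise: the front cardioid sees half the rear amplitude,
# so the coefficient that cancels it is b = 0.5.
rng = random.Random(0)
rear_signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
front_signal = [0.5 * r for r in rear_signal]
b_final, residual = adapt_null_coefficient(front_signal, rear_signal)
```

For the band-split variant, one such coefficient would be kept per band and the same update run on each band-passed pair.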
In
All microphones have some residual noise, and equalizing the cardioid end firing array amplifies this residual noise. The closer the microphones are placed together, the greater the amplification needed. If we double the spacing between the microphones from 1 cm to 2 cm, for example, we boost the cardioid signal by about 6 dB, requiring less equalization gain. But because the delay τ is, for convenience, determined by the sample rate, the microphone separation is τ*c, where c is the speed of sound.
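The two relations in the preceding paragraph can be checked numerically. The sketch below assumes a delay of exactly one sample, a 16 kHz sample rate, c = 343 m/s, and a 500 Hz test frequency; all of these are illustrative values, and the on-axis gain formula 2|sin(kd)| follows from the delay-and-subtract cardioid with τ = d/c.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed


def spacing_for_delay(delay_samples, sample_rate_hz):
    """Microphone separation d = tau * c for a delay of whole samples."""
    return delay_samples / sample_rate_hz * SPEED_OF_SOUND


def on_axis_gain(freq_hz, spacing_m):
    """On-axis magnitude of the cardioid m1(t) - m2(t - d/c): 2*|sin(k*d)|."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND   # wave number k = omega / c
    return 2.0 * abs(math.sin(k * spacing_m))


d_one_sample = spacing_for_delay(1, 16000)   # roughly 2.1 cm
boost_db = 20.0 * math.log10(on_axis_gain(500.0, 0.02) / on_axis_gain(500.0, 0.01))
```

At low frequencies the gain is nearly proportional to the spacing, so doubling the spacing gives a boost close to 6 dB.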
Up until now we have assumed that the microphones have ideal, flat responses. This is far from true, and methods are sometimes needed to compensate for the variability. Microphone manufacturers normally specify sensitivity at 1 kHz and provide an envelope of variability for frequencies above and below this value. The variability can exceed 10 dB at some frequencies, and this degrades the suppression achieved by the cardioids. In a later section we shall describe a method to match the microphones in a number of different bands and the method used to suppress unwanted noises.
To achieve the foregoing, the present invention provides an effective and simple system. According to one embodiment, the energy of the rear and front cardioid is used to determine if the adaptive filter should be updated. A polyphase filter bank separates the front and rear cardioid signals into spectral bands. The rear signal is used as a reference and is spectrally subtracted from the desired signal in an adaptive manner. We also keep a history of the reference signal so we can cancel reflected noise sounds up to the length of this history. This provides an improvement over existing devices that use an end firing array to steer a null in the direction of the sound, for example in a hearing aid or Bluetooth headset. This implementation uses the rear reference signal to quickly suppress the unwanted noise by spectrally suppressing the unwanted sounds.
According to a second embodiment, Automatic Calibration of an End-firing Microphone Array is provided. This embodiment features an automatic and continuous calibration of an unmatched pair of microphones arranged in a known configuration to be used with an input source in a known location, e.g. a pair of headphones. The benefits include:
a) careful selection of matched microphones is not required
b) manual calibration at the point of production (factory) is not required
c) manual calibration by the end user (customer) is not required
d) persistent storage of the compensation coefficient is not required
Applications include consumer electronics and industrial electronics. These and other features and advantages of the present invention are described below with reference to the drawings.
Reference will now be made in detail to preferred embodiments of the invention. Examples of the preferred embodiments are illustrated in the accompanying drawings. While the invention will be described in conjunction with these preferred embodiments, it will be understood that it is not intended to limit the invention to such preferred embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
Current end-firing implementations create directional nulls in the directivity pattern of the microphone array. In a reverberant environment the noise may not arrive from a single direction. We use the energy of the rear and front cardioid to determine if the adaptive filter should be updated. A polyphase filter bank separates the front and rear cardioid signals into spectral bands. The rear signal is used as a reference and is spectrally subtracted from the desired signal in an adaptive manner. We also keep a history of the reference signal so we can cancel reflected noise sounds up to the length of this history. In short, in the first embodiment we provide an improved system and method using the rear signal as a reference and a spectral implementation.
To reduce the background noise and improve the near field voice pickup we use an end firing dual microphone array. The microphones are configured to create two cardioid arrays. The null of the rear facing cardioid is positioned to point in the direction of the desired signal and the front cardioid's null points in the opposite direction.
The rear facing cardioid signal is used as a reference signal to determine any similarities with the front cardioid. We then subtract any similarities, knowing that the front facing cardioid is the only signal that contains direct speech. We use a frequency based adaptive method to estimate these similarities, with the adaptation updating only when there is no direct speech detected. For residual suppression we use spectral subtraction. Spectral subtraction is also used when speech is detected to remove background noise.
In a previous section we described how to create nulls in the directivity pattern using two cardioids; we also showed how to do this for different frequency bands by band passing the cardioid signals. When the user is in an enclosed environment the noise is reflected and its reverberant energy can be high, causing the noise to persist and come from multiple directions. In this section we describe a different method where we do not try to steer a null but instead use the rear cardioid as a reference signal which we subtract from the front cardioid. We do this adaptively using a sub band spectral method. Each of the spectral bands has a history which is used to try to suppress reflected sounds and reverberant tails.
When the front facing cardioid points towards someone talking (the user) their speech will be in the null of the rear facing cardioid array. Therefore the rear array will pick up ambient noise and reflected user speech. The front facing array picks up user speech, reflected user speech and ambient noise. The rear facing array signal can be used to reduce the ambient noise and reflected speech in the front facing signal to improve speech intelligibility. In this case we are not trying to create a null in the direction of the noise source but are instead using the rear facing end firing signal as a reference signal which we wish to subtract from the front facing signal.
Adaptive End Firing Algorithm
In
UpdateSwitch Signal Detect
The signal detect routine uses the magnitude of the front cardioid signal to calculate Xs and Xf where
g=(Xs<|x(n)|)?G0s:G1s; [4.1]
and
Xs=g*Xs+(1−g)*|x(n)| [4.2]
and for Xf
g=(Xf<|x(n)|)?G0f:G1f; [4.3]
and
Xf=g*Xf+(1−g)*|x(n)| [4.4]
where G0f≦G1f=G0s≦G1s, so the signal Xf adapts to variations in |x(n)| more quickly than Xs. So when Xs≦Xf there is a signal, see
NoiseFloor=MIN(Xf,NoiseFloor)*(1+ε) [4.5]
where ε>0 is some small number used to keep the noise floor from freezing at a particular value, see
VAD=(Xf>MAX(Xs*(1+ε1), MAX(NoiseFloor*(1+ε2), MAGNITUDETHRESHOLD))) [4.6]
where ε1 and ε2 are small positive numbers and MAGNITUDETHRESHOLD is the minimum signal magnitude. We also use these signals to determine when the signal is background noise (DBGN),
DBGN=(Xf<NoiseFloor*(1+ε3)) [4.7]
where ε3 is some small positive number, see
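The signal detect relations [4.1] to [4.7] can be sketched as a small per-sample routine. The class name, all numeric constants, and the choice to seed the trackers with the first sample magnitude (so that the noise floor does not start at zero) are assumptions for illustration, not values taken from the specification.

```python
class SignalDetect:
    """Slow/fast envelope trackers, noise floor, VAD and background-noise flag."""

    def __init__(self, first_mag, g0f=0.9, g1f=0.99, g0s=0.99, g1s=0.999,
                 eps=1e-4, eps1=0.1, eps2=0.5, eps3=0.2, threshold=1e-3):
        # Hypothetical constants satisfying G0f <= G1f = G0s <= G1s.
        self.g0f, self.g1f, self.g0s, self.g1s = g0f, g1f, g0s, g1s
        self.eps, self.eps1, self.eps2, self.eps3 = eps, eps1, eps2, eps3
        self.threshold = threshold
        self.xs = self.xf = self.noise_floor = first_mag

    def update(self, x):
        mag = abs(x)
        g = self.g0s if self.xs < mag else self.g1s          # [4.1]
        self.xs = g * self.xs + (1.0 - g) * mag              # [4.2]
        g = self.g0f if self.xf < mag else self.g1f          # [4.3]
        self.xf = g * self.xf + (1.0 - g) * mag              # [4.4]
        # The (1 + eps) factor lets the floor creep upward so it cannot freeze.
        self.noise_floor = min(self.xf, self.noise_floor) * (1.0 + self.eps)  # [4.5]
        vad = self.xf > max(self.xs * (1.0 + self.eps1),
                            max(self.noise_floor * (1.0 + self.eps2),
                                self.threshold))             # [4.6]
        dbgn = self.xf < self.noise_floor * (1.0 + self.eps3)  # [4.7]
        return vad, dbgn


# Steady low-level input followed by a loud burst.
samples = [0.01] * 2000 + [1.0] * 200
detector = SignalDetect(first_mag=0.01)
flags = [detector.update(s) for s in samples]
```

With these assumed constants the steady segment is flagged as background noise, and the first sample of the burst raises the VAD flag because the fast tracker overtakes the slow one.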
UpdateSwitch Adaptive Filter Switch
We begin by calculating the energy of the rear and front cardioid to determine whether the sound is in front or behind. The front signal's energy contains the user's speech. Let Ef(m) and Er(m) be the energy of the front and rear at frame m so
We then smooth these energies
SmR=λSmR+(1−λ)Er(m) [4.10]
SmF=λSmF+(1−λ)Ef(m) [4.11]
So when SmR and SmF are similar, both contain ambient noise and there can be little or no user speech. For local speech we estimate that the front energy must be greater than 105% of the rear energy. In
SW=(SmF*G<SmR)?1:0; [4.12]
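A minimal sketch of the adaptive filter switch follows. It assumes the frame energy is the sum of squared samples and that G is the reciprocal of the 105% ratio mentioned above, so that SW=1 (adaptation allowed) whenever the smoothed front energy is below 105% of the smoothed rear energy; the class name and the value of λ are hypothetical.

```python
def frame_energy(frame):
    """Energy of one frame, taken here as the sum of squared samples (assumed)."""
    return sum(s * s for s in frame)


class UpdateSwitch:
    def __init__(self, lam=0.9, speech_ratio=1.05):
        self.lam = lam
        self.g = 1.0 / speech_ratio    # assumed relation between G and 105%
        self.sm_f = 0.0
        self.sm_r = 0.0

    def update(self, front_frame, rear_frame):
        self.sm_r = self.lam * self.sm_r + (1.0 - self.lam) * frame_energy(rear_frame)   # [4.10]
        self.sm_f = self.lam * self.sm_f + (1.0 - self.lam) * frame_energy(front_frame)  # [4.11]
        return 1 if self.sm_f * self.g < self.sm_r else 0                                # [4.12]


switch = UpdateSwitch()
noise = [0.1] * 64
# Ambient noise only: front and rear carry the same energy.
sw_noise = [switch.update(noise, noise) for _ in range(20)]
# Local speech: the front frame has twice the amplitude of the rear frame.
sw_speech = switch.update([0.2] * 64, noise)
```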
Analysis Filter Bank
The whitened cardioid signals are fed into a polyphase filter bank, creating two spectral sets of data. We whiten the signal first using
w(n)=x(n)−λx(n−1) [5.1]
to help decorrelate it. This helps the LMS algorithm to converge. After the synthesis reconstruction filter we do the inverse, that is
y(n)=λy(n−1)+w(n) [5.2]
to remove this whitening and get the correct time domain signal. The filter bank has been designed to have 16 bands in the Nyquist interval for a sample rate of 16 kHz. In
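The whitening filter [5.1] and its inverse [5.2] can be illustrated as follows. The coefficient value 0.9 and the zero initial state are assumptions; the sketch only shows that the second filter undoes the first.

```python
def whiten(x, lam=0.9):
    """First-order whitening: w(n) = x(n) - lam * x(n-1)."""
    w, prev = [], 0.0
    for sample in x:
        w.append(sample - lam * prev)
        prev = sample
    return w


def dewhiten(w, lam=0.9):
    """Inverse filter: y(n) = lam * y(n-1) + w(n)."""
    y, prev = [], 0.0
    for sample in w:
        prev = lam * prev + sample
        y.append(prev)
    return y


signal = [1.0, -0.5, 0.25, 2.0, 0.0, -1.5]
restored = dewhiten(whiten(signal))
```

In the described system the band processing and synthesis filter bank sit between the two filters; they are applied back to back here only to show the round trip.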
Let h0(n) be the prototype filter so its z transform is
H0(z)=Σ_{n=0}^{N−1} h0(n) z^{−n}
where N is the length of the filter. To create band pass filters at the frequencies 2πm/M for 0≦m<M we spectrally shift h0(n) to create hk(n)
hk(n)=h0(n) W_M^{kn} [5.4]
where k=0, 1, . . . , M−1, M is the number of bands, and
W_M=e^{j2π/M}
Taking the z transform of this filter we get
If we now let n=q*M+m, where
we can express Eq 4.3 as
Which we can write as
Thus we can implement the filter bank using polyphase filtering and an FFT. The matrix in the above expression is in a Winograd form.
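The equivalence between the modulated band pass filters and the polyphase structure can be checked numerically. The sketch below assumes W_M = e^{j2π/M}, uses a short made-up prototype filter, omits decimation, and evaluates the length-M transform as a plain sum (which an FFT routine would compute more efficiently); none of these choices is taken from the specification.

```python
import cmath
import math


def direct_bank(x, h0, M, n):
    """Band outputs at time n from M modulated filters hk(l) = h0(l) * W^(k*l)."""
    out = []
    for k in range(M):
        acc = 0j
        for l, tap in enumerate(h0):
            if n - l >= 0:
                acc += tap * cmath.exp(2j * math.pi * k * l / M) * x[n - l]
        out.append(acc)
    return out


def polyphase_bank(x, h0, M, n):
    """Same outputs from M polyphase branches followed by one length-M transform."""
    branches = []
    for m in range(M):
        acc = 0.0
        for l in range(m, len(h0), M):       # taps h0(q*M + m)
            if n - l >= 0:
                acc += h0[l] * x[n - l]
        branches.append(acc)
    # Because W^(k*q*M) = 1, band k is sum over m of W^(k*m) * branch m.
    return [sum(branches[m] * cmath.exp(2j * math.pi * k * m / M) for m in range(M))
            for k in range(M)]


M = 4
prototype = [0.1, 0.3, 0.6, 0.9, 0.9, 0.6, 0.3, 0.1]   # hypothetical h0(n), N = 8
x = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
max_diff = max(abs(a - b)
               for n in range(len(x))
               for a, b in zip(direct_bank(x, prototype, M, n),
                               polyphase_bank(x, prototype, M, n)))
```

The direct form costs M full-length filters per output, while the polyphase form shares a single pass over the prototype taps across all bands.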
Adaptive Filter
We only want to update the adaptive coefficients when we detect ambient noise or when the rear signal is dominant; otherwise we might adapt the filters to subtract the user's speech. We therefore freeze the adaptation if we detect local speech, as determined by the adaptive switch. If we let F(k)=Fr(k)+iFi(k) and R(k)=Rr(k)+iRi(k) be the spectral band values for the front and rear cardioids then the estimated error is
where C(k) are the complex coefficients, which are updated using the normalized LMS method
where ε(k) can vary as a function of the band number and
Err(k)=F(k)−E(k). [5.9]
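For one spectral band, the adaptive subtraction might be sketched as below. This is a conventional complex normalized LMS canceller with a short history of rear band values; the estimate E(k) is assumed to be the sum of coefficient times rear value over that history, and the class name, history length, step size, and test data are hypothetical.

```python
import random


class BandCanceller:
    """Complex normalized LMS canceller for a single spectral band."""

    def __init__(self, history=2, mu=0.5, eps=1e-8):
        self.coeffs = [0j] * history
        self.rear_hist = [0j] * history
        self.mu = mu
        self.eps = eps

    def process(self, front, rear, adapt):
        # Newest rear band value first; the oldest one drops off the end.
        self.rear_hist = [rear] + self.rear_hist[:-1]
        estimate = sum(c * r for c, r in zip(self.coeffs, self.rear_hist))
        err = front - estimate                      # Err(k) = F(k) - E(k)
        if adapt:                                   # frozen when local speech is detected
            power = sum(abs(r) ** 2 for r in self.rear_hist) + self.eps
            step = self.mu * err / power
            self.coeffs = [c + step * r.conjugate()
                           for c, r in zip(self.coeffs, self.rear_hist)]
        return err


# Synthetic band data: the front band is a two-tap mix of the rear band.
rng = random.Random(1)
true_coeffs = [0.6 + 0.2j, -0.3j]
canceller = BandCanceller()
previous_rear = 0j
last_err = 0j
for _ in range(500):
    rear = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    front = true_coeffs[0] * rear + true_coeffs[1] * previous_rear
    last_err = canceller.process(front, rear, adapt=True)
    previous_rear = rear
frozen_before = list(canceller.coeffs)
canceller.process(1.0 + 0j, 0.5 + 0j, adapt=False)
```

The history is what allows delayed, reflected copies of the noise to be cancelled, up to the number of frames stored.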
In
Residual Error Suppression
We use the method of spectral subtraction to subtract the estimated rear ambient noise from the front array signal. We use two different noise floor estimates, Ns[band] and Ne[band]. Ns is used when the LMS subtraction has been active and no user speech has been detected. The other estimate, Ne, is used when the speech counter is greater than zero. This counter is set to its maximum every time speech is detected and decreased each time it is not. The counter thus determines a minimum speech interval, but within that interval the signal may still contain speech pauses. We measure the noise floor and update it for every band during a speech pause, when the BackGroundNoise flag is true. We therefore have the following two cases:
if (BackGroundNoise and SW){Ns[band]=αNs[band]+(1−α)|Err[band]|^2;} otherwise
if (BackGroundNoise){Ne[band]=αNe[band]+(1−α)|Err[band]|^2;}.
To subtract this estimate from the bands we use spectral subtraction. If E(k) is the energy of spectral band k we define
We now smooth these gains using
SmGS(k)=γSmGS(k)+(1−γ)gS(k) [5.12]
And
SmGF(k)=γSmGF(k)+(1−γ)gF(k) [5.13]
where 0<γ<1. We then adjust the spectral band k using
Error(k)=SmGS(k)Error(k)
Or
Error(k)=SmGF(k)Error(k)
We also initialize these gains to typical values to reduce possible artifacts.
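The residual suppression steps can be sketched as three small functions. The noise floor update and gain smoothing follow the relations given above; the gain rule itself is an assumed, conventional spectral subtraction gain with a floor, standing in for definitions not reproduced here, and α, γ, the over-subtraction factor, and the floor are hypothetical values.

```python
import math


def update_noise_floor(noise, err_band, alpha=0.95):
    """N[band] = alpha * N[band] + (1 - alpha) * |Err[band]|^2."""
    return alpha * noise + (1.0 - alpha) * abs(err_band) ** 2


def subtraction_gain(energy, noise, over=1.0, g_min=0.1):
    """Assumed amplitude gain sqrt((E - over*N) / E), floored at g_min."""
    if energy <= 0.0:
        return g_min
    return max(g_min, math.sqrt(max(energy - over * noise, 0.0) / energy))


def smooth_gain(sm_gain, gain, gamma=0.8):
    """SmG(k) = gamma * SmG(k) + (1 - gamma) * g(k)."""
    return gamma * sm_gain + (1.0 - gamma) * gain


noise_estimate = update_noise_floor(1.0, 3.0 + 4.0j)   # |Err|^2 = 25
gain_clean = subtraction_gain(4.0, 1.0)                # band well above the noise
gain_noisy = subtraction_gain(1.0, 2.0)                # band below the noise
smoothed = smooth_gain(1.0, gain_noisy)
suppressed_band = smoothed * (0.2 + 0.1j)              # Error(k) = SmG(k) * Error(k)
```

The floor limits how far a band can be attenuated, and the smoothing slows gain changes from frame to frame; starting the smoothed gain at a typical value (1.0 here) corresponds to the initialization mentioned above.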
According to a second embodiment, an apparatus and method for performing automatic and continuous calibration of an unmatched pair of microphones arranged in a known configuration and with an input source (human speaker, hereafter “talker”) in a known location is provided. The amplitudes of the signals from the 2 microphones are continuously monitored. The talker is in a known location relative to the microphone pair, so the expected amplitude difference between the signals at the 2 microphones can be pre-determined, and compensated for. The talker is differentiated from input signals in other locations by applying simple heuristic metrics to the input pair. A compensating gain coefficient is derived from the relative amplitudes of the 2 microphone signals, and averaged over the long term. The averaged compensating gain is applied to one of the microphone signals to provide balanced input from the talker.
Even if the mechanism for distinguishing the talker from other input sources is fooled by some non-well-formed input signal, the long term averaging of the compensating gain coefficient will keep the system from following the errant input too quickly, and will keep the system tending towards nominal and correct operation, as the normal input conditions are likely to occur more frequently than the abnormal conditions.
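One possible shape of the continuous calibration is sketched below. The heuristics that identify the talker are not detailed in this description, so the level threshold and plausibility window used here are purely hypothetical stand-ins, as are the class name, the expected amplitude ratio, and the averaging constant.

```python
class MicBalance:
    """Long-term averaged compensating gain for the second microphone."""

    def __init__(self, expected_ratio=1.0, beta=0.999,
                 min_level=1e-3, max_mismatch=4.0):
        self.expected_ratio = expected_ratio   # expected level1/level2 for the talker
        self.beta = beta                       # long-term averaging constant
        self.min_level = min_level             # hypothetical talker heuristic
        self.max_mismatch = max_mismatch       # hypothetical plausibility window
        self.gain = 1.0

    def update(self, level1, level2):
        """level1, level2: short-term amplitude estimates of the two microphones."""
        if level1 > self.min_level and level2 > self.min_level:
            needed = level1 / (self.expected_ratio * level2)
            if 1.0 / self.max_mismatch < needed < self.max_mismatch:
                self.gain = self.beta * self.gain + (1.0 - self.beta) * needed
        return self.gain


balance = MicBalance()
# Microphone 2 is 20% less sensitive than microphone 1.
for _ in range(10000):
    balance.update(1.0, 0.8)
settled = balance.gain
after_outlier = balance.update(1.0, 0.01)   # implausible ratio, ignored
after_quiet = balance.update(1e-5, 1e-5)    # too quiet to be the talker, ignored
```

Because the averaging constant is close to one, a brief run of misclassified input moves the gain only slightly, which reflects the robustness argument made above.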
Several advantages are provided by the novel system:
The continuous, long term compensation for mismatched microphones provides:
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2014/043045 | 6/18/2014 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/205141 | 12/24/2014 | WO | A |
Number | Date | Country |
---|---|---|
20160142815 A1 | May 2016 | US |
Number | Date | Country |
---|---|---|
61836652 | Jun 2013 | US |