The present invention relates to the field of adaptive noise reduction control with a microphone array, particularly to a method and a device for noise reduction control using a microphone array.
Wireless mobile communication technologies and devices are widely used in daily life and work, freeing communication from constraints of place and time and offering great convenience. However, because communication is no longer confined to a particular place or time, the communication environment may be complex and variable, and may include noisy environments in which noise severely degrades the quality of speech communication. Speech enhancement technologies for suppressing noise therefore play a significant role in modern communication.
Common speech enhancement technologies include single-microphone spectral subtraction, also called single-channel spectral subtraction, such as that disclosed in patent document 1 (CN1684143A) and patent document 2 (CN101477800A). This technology has the following defects. Firstly, only steady-state noise can be suppressed; non-stationary noise, such as surrounding talk in a supermarket, is not significantly suppressed. Secondly, at low SNR (signal-to-noise ratio), noise energy cannot be estimated accurately, so speech is damaged. Finally, this technology takes a long time to estimate noise energy, so noise reduction takes effect only some time after the noise begins.
Patent document 3 provides a better speech enhancement technology using a microphone array consisting of two or more microphones, in which the noise received by one microphone is used by an adaptive filter to cancel the noise component in the signal received by the other microphone while the speech component is maintained. In practice, however, the signals received by both microphones contain speech components, so speech may be damaged while noise is reduced. A critical difficulty of this technology is therefore how to control the convergence and filtering of the adaptive filter so that speech in one microphone is not cancelled by speech in the other while noise is still effectively suppressed.
In patent document 4, the microphone array is given directivity by placing the microphones at specific locations, while in patent document 3 a directional microphone is used, which has different energy responses to signals from different directions; the signal direction is determined by comparing energy differences in order to control noise elimination. However, this method has great limitations. Firstly, it imposes strict requirements on the microphones: the microphones must be well matched, or the directional microphone must be carefully designed to have significant directivity. Secondly, in a high-noise environment the speech state cannot be determined accurately, so the noise reduction process of the adaptive filter cannot be controlled accurately, and speech may be damaged while noise is reduced.
Patent document 1: China patent of invention publication CN1684143
Patent document 2: China patent of invention publication CN101477800
Patent document 3: China patent of invention publication CN101466055
Patent document 4: China patent of invention publication CN101466056
In view of the above problems in the prior art, one object of the present invention is to accurately determine the speech state with a microphone array consisting of two or more microphones, thereby effectively controlling an adaptive filter to eliminate noise, enhancing the SNR while protecting speech quality.
In order to solve the above-mentioned technical problem, the present invention provides an adaptive noise reduction control method using a microphone array comprising steps of:
S1: collecting, by the microphone array, acoustic signals;
S2: determining incidence angles of all acoustic signals of the microphone array;
S3: conducting statistics on signal components according to incidence angles;
S4: controlling an adaptive filter according to statistical result.
Further, said step of determining incidence angles of acoustic signals comprises:
S201: conducting frequency domain transformation or sub-band transformation on the acoustic signals;
S202: calculating phase differences of various frequency bins or sub-bands of the microphone array signals and calculating relative time delays of the frequency bins or sub-bands of the microphone array signals from the phase differences;
S203: calculating incidence angles of the microphone array signals according to the relative time delays of the frequency bins or sub-bands.
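Steps S202 and S203 can be sketched for a two-microphone array as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the microphone spacing, the speed of sound, the far-field geometry in which 0 degrees is perpendicular to the line joining the two microphones, and the function names `bin_delay` and `delay_to_angle` are all hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def bin_delay(x1, x2, freq_hz):
    """Relative time delay (s) of one frequency bin, from the phase difference
    between the complex spectrum values x1 and x2 of the two microphones."""
    # phase(x1 * conj(x2)) equals phase(x1) - phase(x2), wrapped to (-pi, pi]
    dphi = cmath.phase(x1 * x2.conjugate())
    return dphi / (2.0 * math.pi * freq_hz)


def delay_to_angle(delay_s, spacing_m, c=SPEED_OF_SOUND):
    """Incidence angle in degrees, assuming delay = spacing * sin(angle) / c."""
    s = delay_s * c / spacing_m
    s = max(-1.0, min(1.0, s))  # clamp values pushed out of range by noise
    return math.degrees(math.asin(s))
```

Because the phase difference is only known modulo 2π, the delay obtained this way is unambiguous only for frequencies below roughly c/(2·spacing).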
In step S4, the adaptive filter is updated quickly when only noise is present, and is updated slowly when target signals are present.
Preferably, a control parameter α is used to control an update rate of the adaptive filter, wherein a value of α is determined by a ratio of a noise component in the statistical result; the smaller α is, the slower the adaptive filter is updated; when α is 0, the acoustic signal consists entirely of target speech and the adaptive filter is not updated; in contrast, when α is 1, the acoustic signal consists entirely of noise and the adaptive filter is updated at the fastest speed.
Preferably, after step S2, the method further comprises: setting an angle transition range, dividing the entire space into several areas according to the amount of target speech signals, calculating a parameter β according to the area in which said incidence angle is located, and taking β*α as the control parameter of the adaptive filter.
Further, an entire space is divided into a protection area, a transition area and a suppression area, wherein, β=0 for incidence angles within the protection area; 0<β<1 for incidence angles within the transition area, and β=1 for incidence angles within the suppression area.
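The three areas and the resulting parameter β can be illustrated with a short sketch. The text fixes only β=0 in the protection area, 0<β<1 in the transition area and β=1 in the suppression area; the linear ramp, the 15 degree transition width, the symmetric layout and the function names below are assumptions made purely for illustration.

```python
def beta_for_angle(angle_deg, protect_deg=45.0, transition_deg=15.0):
    """Return beta for an incidence angle (degrees, 0 = target direction)."""
    a = abs(angle_deg)
    if a <= protect_deg:
        return 0.0  # protection area
    if a >= protect_deg + transition_deg:
        return 1.0  # suppression area
    # transition area: grows towards 1 as the angle nears the suppression area
    return (a - protect_deg) / transition_deg


def control_parameter(alpha, angle_deg):
    """Control parameter beta * alpha of the adaptive filter."""
    return beta_for_angle(angle_deg) * alpha
```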
Said step of converting acoustic signals into frequency domain further comprises:
S2011: separating acoustic signals into individual frames;
S2012: windowing each frame of signal after framing;
S2013: DFT converting windowed data into frequency domain.
Further, in step S2011, an acoustic signal si (i=1,2) is subjected to framing, with N sampling points in each frame, or a frame size of 10 ms to 32 ms; the mth frame of signal is denoted di(m,n), wherein 0≦n<N, 0≦m; there are M overlapping sampling points between two adjacent frames, with L=N−M sampling points of new data for each frame; the mth frame of data is di(m,n)=si(m*L+n).
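The framing rule di(m,n)=si(m*L+n) can be sketched as follows. The function name `split_frames` and the choice to drop a trailing partial frame are assumptions; the default values N=256 and M=128 are those of the embodiment described later.

```python
def split_frames(s, N=256, M=128):
    """Split the sample sequence s into overlapping frames.

    Frame m holds d(m, n) = s[m*L + n] for 0 <= n < N, with L = N - M new
    samples per frame. A trailing partial frame is dropped (assumption).
    """
    if not 0 <= M < N:
        raise ValueError("overlap M must satisfy 0 <= M < N")
    L = N - M
    num_frames = 0 if len(s) < N else 1 + (len(s) - N) // L
    return [s[m * L : m * L + N] for m in range(num_frames)]
```

With N=4 and M=2, for example, the first two samples of each frame repeat the last two samples of the previous frame.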
In another aspect, the present invention also provides a noise reduction control device using a microphone array, comprising: a microphone array for collecting acoustic signals; a filtering control unit for determining incidence angles of all acoustic signals of the microphone array, conducting statistics on signal components according to the incidence angles, and then controlling the adaptive filter according to the statistical result of the signal components; and an adaptive filter for filtering noise.
Said filtering control unit comprises: a DFT unit for discrete Fourier transforming acoustic signals into frequency domain; a signal delay estimation unit for calculating phase differences between various frequency bins or sub-bands of the microphone array signals and calculating relative time delays of the frequency bins or sub-bands of the microphone array signals from the phase differences; a signal direction estimation unit for calculating incidence angles of the microphone array signals based on the relative time delays of the frequency bins or sub-bands; a signal component statistics unit for implementing statistics on components of the target signal according to said incidence angles and distinguishing the signals to find out a target signal component and a noise component.
Further, the DFT unit comprises: a framing unit for framing or separating the acoustic signals into individual frames; a windowing unit for windowing each frame of signal after framing; a DFT converting unit for DFT converting windowed data into frequency domain.
Further, preferably, the microphone array in the technical solution proposed in the present invention is completely comprised of omnidirectional microphones, or comprised of omnidirectional microphones and monodirectional microphones or completely comprised of monodirectional microphones.
By applying the above technology, space orientation information of the sound may be obtained directly with the microphone array to take full advantage of the orientation information to control update filtering of the adaptive filter more accurately, allowing protecting speech well while effectively reducing noises. In addition, this technology doesn't need energy information of signals, and it doesn't impose strict requirements on consistency of the two microphones, and would not be influenced by energy variation.
The above-mentioned features and technical advantages of the present invention will become clearer and more apparent through the following description of embodiments with reference to the accompanying drawings.
FIG. 6a is a graph showing a waveform of speech signals with noise before noise reduction according to an embodiment of the present invention;
FIG. 6b is a graph showing a waveform of speech signals after noise reduction according to an embodiment of the present invention;
The present invention will be described in more detail below by way of specific embodiments with reference to drawings.
According to prior-art noise reduction technologies for a microphone array, taking an array of two microphones as an example, noise reduction is typically implemented by applying an adaptive filter to the acoustic signals collected by the two microphones, which are regarded as the noisy speech signal s1 and the reference signal s2, respectively. First, the reference signal s2 is input into the adaptive filter, which outputs the noise signal s3; subtracting s3 from the noisy speech signal s1 yields the signal y, and y is fed back to the adaptive filter to update the filter weights. When y has large energy, the adaptive filter is updated quickly so that s3 continuously approaches s1, and the energy of y, the difference between s1 and s3, becomes smaller and smaller. When s3=s1, y has the least energy and the adaptive filter stops updating, thereby suppressing the noise in s1 by means of s2.
When s1 and s2 received by the microphone array contain only noise signals, the adaptive filter suppresses noise very well. However, when s1 and s2 contain speech signals, the adaptive filter, in driving the energy of y (obtained by subtracting s3 from s1) to a minimum, may also cancel the speech signals, hence damaging speech. Therefore, in order not to suppress speech, the present invention provides a method for controlling the update and filtering of the adaptive filter by means of the sound incidence direction, which prevents the adaptive filter from damaging speech when speech occurs.
The scheme of a microphone array illustrated in
In the embodiments shown in the above
The filtering control unit includes a DFT unit, a signal delay estimation unit, a signal direction estimation unit and a signal component statistics unit. The DFT unit conducts a discrete Fourier transform on the two signals to transform each of them into the frequency domain. The transformed signals are input into the signal delay estimation unit, which calculates the phase difference of each frequency bin or sub-band of the two signals and then calculates the relative time delay of each frequency bin or sub-band from the phase differences. Assuming the target speech signal is incident from the 0 degree direction, the signal direction estimation unit converts the relative time delay of each frequency bin or sub-band into an incidence angle, so that target speech components within the angle of protection and noise components outside the angle of protection may be distinguished according to their incidence angles. The signal component statistics unit evaluates the components of target speech signals whose incidence angles lie within the angle of protection and calculates the control parameter α (0≦α≦1).
The more noise components there are whose incidence angles are outside the angle of protection, the larger the control parameter α is, and the faster the adaptive filter is updated. When all received signals are noise components outside the angle of protection, α=1 and the adaptive filter updates at its fastest in this noise section, hence suppressing noise signals.
In contrast, the more target signal components there are within the angle of protection, the smaller α is, and the slower the adaptive filter is updated. When all signals are target speech components, α=0 and the adaptive filter stops updating its weights in this speech section, so that the speech in the desired speech signal s1 is not cancelled and the target speech is effectively protected from damage.
In
In
The DFT unit conducts a discrete Fourier transform on signals s1 and s2. Firstly, si (i=1, 2) is separated into individual frames with N sampling points per frame, or a frame size of 10 ms to 32 ms, and the mth frame of signal is denoted di(m,n), wherein 0≦n<N, 0≦m. There is an overlap of M (M=128 to 192) sampling points between two adjacent frames, that is, the first M sampling points of the current frame are the last M sampling points of the previous frame, and there are only L=N−M sampling points of new data in each frame. Therefore the mth frame of data is di(m,n)=si(m*L+n). In this embodiment, the frame size is N=256, i.e., 32 ms, with overlap M=128, i.e., an overlap of 50%. After framing, each frame of signal is windowed with a window function win(n), and the windowed data is gi(m,n)=win(n)*di(m,n). A Hamming window, a Hanning window or the like may be selected as the window function; in this embodiment, the Hanning window is selected.
The windowed data is DFT converted into the frequency domain:

$$G_i(m,k)\,e^{j\varphi_i(m,k)} = \sum_{n=0}^{N-1} g_i(m,n)\,e^{-j 2\pi n k / N}$$

where k indicates a frequency bin, Gi(m,k) is the amplitude, and φi(m,k) is the phase.
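The windowing and transform of one frame can be sketched as follows. This is an illustration under assumptions: the symmetric Hann formula is one common definition and may differ from the window formula of the embodiment, only bins 0 to N/2 are computed on the assumption that the input is real, and the function names are hypothetical. A direct O(N²) sum is used for clarity rather than a fast transform.

```python
import cmath
import math


def hann_window(N):
    """Symmetric Hann (Hanning) window of length N >= 2 (assumed formula)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def dft_amp_phase(frame, window):
    """Window one frame and return (amplitudes, phases) for bins 0..N/2."""
    N = len(frame)
    g = [w * x for w, x in zip(window, frame)]  # g(m, n) = win(n) * d(m, n)
    amps, phases = [], []
    for k in range(N // 2 + 1):
        X = sum(g[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
        amps.append(abs(X))           # G(m, k)
        phases.append(cmath.phase(X))  # phi(m, k), in (-pi, pi]
    return amps, phases
```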
The signal delay estimation unit calculates relative time delay of two signals:
The signal direction estimation unit obtains the range of incidence angles based on a comparison between relative time delay ΔT(m,k) of signals and the time delay ΔT(±45°) of the angle of protection (±45°):
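The comparison can be sketched as follows, working directly on delays so that no angle needs to be computed. The relation ΔT(θ)=d·sin(θ)/c, the microphone spacing d, the speed of sound c and the function names are assumptions for illustration; they presume a far-field source and a 0 degree direction perpendicular to the line joining the two microphones.

```python
import math


def protection_delay(spacing_m, angle_deg=45.0, c=343.0):
    """Delay (s) corresponding to the edge of the angle of protection."""
    return spacing_m * math.sin(math.radians(angle_deg)) / c


def inside_protection(delay_s, spacing_m, angle_deg=45.0, c=343.0):
    """True if a bin's relative delay lies within the angle of protection."""
    return abs(delay_s) <= protection_delay(spacing_m, angle_deg, c)
```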
The signal component statistics unit conducts statistics on the signal components within the angle of protection based on ΔT(m,k), and then evaluates the control parameter α for updating the adaptive filter. α is a number between 0 and 1 determined by the number of frequency components within the angle of protection. When the number of frequency components within the angle of protection is 0, α=1; when the number of frequency components outside the angle of protection is 0, α=0.
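One way to obtain such an α is the fraction of frequency components that fall outside the angle of protection. The text fixes only the two end points (α=1 and α=0); the linear fraction in between, the handling of an empty frame and the function name `control_alpha` are assumptions.

```python
def control_alpha(delays, limit):
    """Fraction of frequency components outside the angle of protection.

    delays: relative time delay of each frequency bin of one frame (s)
    limit:  delay corresponding to the edge of the angle of protection (s)
    """
    if not delays:
        return 0.0  # assumption: with no components, do not adapt
    outside = sum(1 for dt in delays if abs(dt) > limit)
    return outside / len(delays)
```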
As for the time domain adaptive filter, in this embodiment it is an FIR (finite impulse response) filter of length P (P≥1). The weight of the filter is w=[w(0), w(1), . . . , w(P−1)], and the output of the filter is

s3(n)=w(0)*s2(n)+w(1)*s2(n−1)+ . . . +w(P−1)*s2(n−P+1)
The counteracted signal y(n) is obtained by subtracting s3(n) from s1(n): y(n)=s1(n)−s3(n).
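The filtering and subtraction can be sketched as follows. The function names are hypothetical, and samples of s2 before the start of the signal are taken as zero, which is an assumption.

```python
def fir_output(w, s2, n):
    """s3(n) = w(0)*s2(n) + w(1)*s2(n-1) + ... + w(P-1)*s2(n-P+1)."""
    return sum(w[p] * s2[n - p] for p in range(len(w)) if n - p >= 0)


def cancel(s1, s2, w):
    """y(n) = s1(n) - s3(n) for every sample of s1."""
    return [s1[n] - fir_output(w, s2, n) for n in range(len(s1))]
```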
y(n) is fed back to the adaptive filter for updating the weight of the filter:
n)=
The update rate μ is controlled by the parameter α. When α=1, i.e., s1(n) and s2(n) contain only noise components, the adaptive filter converges quickly, which makes s3(n) identical to s1(n); the counteracted signal y(n) therefore has minimum energy, and the noise is eliminated. When α=0, i.e., s1(n) and s2(n) contain only target speech components, the adaptive filter stops updating, so that its output s3(n) does not converge to s1(n); s3(n) and s1(n) remain different, the speech components are not cancelled when s3(n) is subtracted from s1(n), and the speech components are maintained in the output y(n). When 0<α<1, i.e., the signals collected by the microphones contain both speech components and noise components, the update rate of the adaptive filter is controlled according to the amounts of speech and noise components, so as to maintain the speech components while eliminating noise.
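The effect of α on the update can be illustrated with a short sketch. The normalized LMS update used here is a stand-in chosen for illustration and is not taken from this description; the relation μ=μmax·α, the values of `P`, `mu_max` and `eps`, and the function name `alpha_nlms` are likewise assumptions. One value of α is supplied per sample; a per-frame α would simply be held constant over the samples of its frame.

```python
def alpha_nlms(s1, s2, alphas, P=8, mu_max=0.5, eps=1e-8):
    """Cancel s2-correlated noise from s1 with an alpha-controlled NLMS filter.

    Returns (y, w): the output samples y(n) = s1(n) - s3(n) and final weights.
    """
    w = [0.0] * P
    buf = [0.0] * P  # buf[p] holds s2(n - p)
    y = []
    for n in range(len(s1)):
        buf = [s2[n]] + buf[:-1]
        s3 = sum(wp * xp for wp, xp in zip(w, buf))  # FIR output s3(n)
        e = s1[n] - s3                                # counteracted signal y(n)
        y.append(e)
        mu = mu_max * alphas[n]  # alpha = 0 freezes the weights
        if mu > 0.0:
            norm = sum(x * x for x in buf) + eps
            w = [wp + mu * e * xp / norm for wp, xp in zip(w, buf)]
    return y, w
```

With α=1 throughout a noise-only signal the weights converge and the residual shrinks; with α=0 the weights stay at their previous values, so the filter does not learn to cancel speech.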
FIGS. 6a and 6b show waveforms of speech signals with noise before the noise reduction processing of the present invention, and of speech signals with noise reduced after the noise reduction processing of the embodiment of the invention, respectively. As shown in
In addition, in the above-mentioned embodiment, the entire signal collection space is divided into two areas: a protection area and a suppression area. In a further case, a transition area may be added and a parameter β (0≦β≦1) obtained: β=0 for signal incidence angles within the protection area; 0<β<1 within the transition area, with β larger the closer the angle is to the suppression area; and β=1 in the suppression area. β*α is then used as the control parameter of the adaptive filter. This can make the control parameter of the adaptive filter more accurate, thereby enhancing noise reduction of speech.
According to an embodiment, the time domain adaptive filter is controlled by the control parameter α for noise reduction, however it is not limited to a time domain adaptive filter, it is also possible to control a frequency domain (sub-band) adaptive filter by the control parameter α for noise reduction. The difference between a time domain case and a frequency domain case is that: in a time domain case, the signal component statistics unit obtains a control parameter α by counting target signals or calculating a ratio of target signals to noise; in a frequency domain case, the signal component statistics unit obtains control parameters α of N frequency bins or sub-bands by evaluating incidence angles of each frequency bin or sub-band.
The frequency domain (sub-band) adaptive filter conducts update control over each frequency bin or sub-band respectively after signal component statistics according to characteristics of frequency bins or sub-bands. The incidence angle of each frequency bin or sub-band is converted into the control parameter αi of the adaptive filter (i representing frequency bin or sub-band). The larger the incidence angle is, the more the speech of this frequency bin or sub-band deviates from the target speech that is in 0 degree direction, and thus the larger αi is, and the more quickly this frequency bin or sub-band is updated. When the incidence angle of the ith frequency bin or sub-band is in the 0 degree direction within the angle of protection, αi=0, the sub-band adaptive filter does not update to protect the target speech component of this sub-band. When the incidence angle of the ith frequency bin or sub-band is outside of the angle of protection, and it deviates most from the target speech in the 0 degree direction, αi=1, the sub-band adaptive filter updates the most quickly to suppress the noise component in this sub-band.
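A per-bin mapping from incidence angle to αi can be sketched as follows. The text fixes only that αi=0 within the angle of protection, that αi grows with the incidence angle, and that αi=1 at the largest deviation; the linear ramp, the 90 degree maximum and the function name `alpha_for_bin` are assumptions.

```python
def alpha_for_bin(angle_deg, protect_deg=45.0, max_deg=90.0):
    """Control parameter alpha_i of one frequency bin or sub-band."""
    a = min(abs(angle_deg), max_deg)
    if a <= protect_deg:
        return 0.0  # target direction: do not update this bin
    # grows linearly from 0 at the protection edge to 1 at max_deg
    return (a - protect_deg) / (max_deg - protect_deg)
```

Each bin or sub-band of the filter would then be updated with its own αi, for example `[alpha_for_bin(a) for a in angles]` for a list of per-bin angles.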
By controlling frequency domain (sub-band) adaptive filters for noise reduction, the control parameter αi for each frequency bin or sub-band may be obtained and update of each frequency bin or sub-band of frequency adaptive filter is controlled independently, resulting in more significant noise reduction effect.
Again, in this embodiment, a transition area may be added to obtain a parameter β (0≦β≦1), generating a new parameter αi*β, wherein β=0 for signal incidence angles within the protection area; 0<β<1 within the transition area, with β larger the closer the angle is to the suppression area; and β=1 in the suppression area. αi*β is used as the control parameter of the adaptive filter. This can also make the control parameter of the adaptive filter more accurate, thereby enhancing noise reduction for speech.
Still further, in a case where a transition area is added, a parameter βi (0≦βi≦1) may be calculated for each frequency bin or sub-band, wherein βi=0 for signal incidence angles within the protection area; 0<βi<1 within the transition area, with βi larger the closer the angle is to the suppression area; and βi=1 in the suppression area. A new control parameter αi*βi is generated and used as the control parameter of the adaptive filter. This further improves the accuracy of the control parameter of the adaptive filter, thereby further enhancing the effect of noise reduction for speech.
While the protection area set in the above-mentioned embodiment is −45° to 45°, it may be adjusted in practice according to the user's actual position and demands. The positions of the two microphones relative to the user are not limited to those shown in
Furthermore, it is noted that since no energy information of signals is required during the noise reduction process according to this application, there is no strict requirement on the consistency of the two microphones; variation in the energy of acoustic signals has no influence, and there is no strict requirement on the directivity of the microphones. Therefore, as compared with prior-art microphone noise reduction technologies, the present invention is easier to realize. Although the above-mentioned embodiments of the present invention employ microphone arrays consisting entirely of omnidirectional microphones, microphone arrays consisting of omnidirectional and monodirectional microphones, or microphone arrays consisting entirely of monodirectional microphones, may also be used.
Under the above teachings of the present invention, those skilled in the art may make various modifications and changes on the basis of the above-mentioned embodiments, which all lie in the protection scope of the present invention. Those skilled in the art will understand that the above specific description is only for the purpose of better explaining the present invention and the scope of the present invention is defined by the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
200910265426.9 | Dec 2009 | CN | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2010/079814 | 12/15/2010 | WO | 00 | 4/3/2012 |