1. Field of the Invention
The present invention relates to voice signal processing, and more particularly to a method and system for eliminating noise mixed in voice signals.
2. Description of Related Art
Generally, the NLMS (Normalized Least Mean Squares) algorithm is a well-known adaptive filter algorithm for eliminating noise mixed in a voice signal.
There are a number of ways to implement an NLMS algorithm. However, some involve extensive computation, while others reduce the magnitude of the filtered voice or introduce echo.
Thus, there is a need for efficient techniques to enhance the voice quality in a device that is implemented with an NLMS adaptive filter.
This section is for the purpose of summarizing some aspects of the present invention and to briefly introduce some preferred embodiments. Simplifications or omissions in this section as well as in the abstract or the title of this description may be made to avoid obscuring the purpose of this section, the abstract and the title. Such simplifications or omissions are not intended to limit the scope of the present invention.
In general, the present invention pertains to an improved method of implementing an adaptive filter. According to one aspect of the present invention, an adaptive filter technique is disclosed. The operation of an adaptive filter in accordance with the present invention involves the following operations:
gathering frames of the voice signal s(n) mixed with noise and frames of the reference noise u(n);
inputting a current frame of the reference noise signal u(n) to the adaptive filter;
estimating a noise value according to coefficients w(i) of the adaptive filter and the current frame of the reference noise signal u(n);
subtracting the estimated noise from the current frame of the voice signal s(n) mixed with noise to get one frame of pure voice signal e(n) by a subtracter;
providing the current frame of pure voice signal e(n) to the adaptive filter;
calculating an adaptive step size μ and estimating a voice probability contained in the current frame of reference noise, wherein μ is a constant, or μ=c/En, where c is a constant and En is an estimated energy of the current frame of the reference noise signal;
adjusting the adaptive step size μ according to the voice probability contained in the current frame of reference noise, wherein the higher the probability is, the smaller the adaptive step size μ becomes;
refreshing the adaptive filter coefficient according to the current frame of the pure voice signal, the current frame of reference noise signal, and the adjusted adaptive step size μ; and
finally, returning to the operation of filtering a next frame of voice mixed with noise according to the refreshed adaptive filter coefficient until the voice signal mixed with noise is completely processed.
One of the objects, features, and advantages of the present invention is to provide an adaptive filter that may be used to minimize noise in audio/voice signals.
Other objects, features, and advantages of the present invention will become apparent upon examining the following detailed description of an embodiment thereof, taken in conjunction with the attached drawings.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
The detailed description of the present invention is presented largely in terms of procedures, steps, logic blocks, processing, or other symbolic representations that directly or indirectly resemble the operations of devices or systems contemplated in the present invention. These descriptions and representations are typically used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in process flowcharts or diagrams or the use of sequence numbers representing one or more embodiments of the invention do not inherently indicate any particular order nor imply any limitations in the invention.
FD-NLMS is one example of the NLMS algorithm, implemented in the frequency domain, whose main purpose is to reduce the computation in the NLMS adaptive filter by substituting frequency-domain multiplication for time-domain convolution. The detailed theory of FD-NLMS may be found in Adaptive Filter Theory, 4th Edition, by Simon Haykin.
As an implementation, the FD-NLMS algorithm comprises the following operations:
1) Inputting frames of reference noise samples, each frame has N samples; combining a current frame of reference noise samples v[0], . . . , v[N−1] with a previous frame of reference noise samples v′[0], . . . , v′[N−1] into a vector V with a size of 2N, V={v′[0], . . . , v′[N−1], v[0], . . . , v[N−1]};
2) Calculating the FFT of the vector V to produce a vector U, wherein U=FFT(V)={u[0], u[1], . . . , u[2N−1]}, FFT refers to the Fast Fourier Transform, and the size of U is 2N;
3) Multiplying the vector U by a vector FW, which is obtained from operation 7); calculating the IFFT of the product to produce a vector Y′ with a size of 2N, wherein Y′ is a real vector (the imaginary part is 0); discarding the first N values of Y′ to produce a vector Y, wherein Y={y[0], . . . , y[N−1]} and its size is N;
4) Inputting frames of voice-with-noise samples S={s[0], . . . , s[N−1]}, each frame having N samples; subtracting the vector Y from the vector S to obtain the pure voice E, wherein E={e[0], . . . , e[N−1]}={s[0]−y[0], . . . , s[N−1]−y[N−1]};
5) Inserting N zero values at the front of the vector E to form a vector E′, E′={0, . . . , 0, e[0], . . . , e[N−1]}; calculating the FFT of E′ to get a vector F, where F=FFT(E′);
6) Conjugating the vector U to produce a vector UH, UH=Ū; multiplying the vector UH by the vector F to produce a vector G′; calculating the IFFT of G′ to get a vector H, H=IFFT(G′), wherein H is a real vector with a size of 2N (the imaginary part is 0); setting the last N values of H to 0, and then calculating the FFT of H to get a vector G, wherein G is a complex vector with a size of 2N;
7) Refreshing the vector FW according to FW=FW+μG, wherein μ is a constant, or μ=c/En, where c is a constant and En is an estimated value of the present energy of the signal; FW represents the coefficients of the adaptive filter in the FFT domain, and the FW calculated at this time is used as the value at the next time. The estimation method is not described in detail here; details may be found in Adaptive Filter Theory, 4th Edition, by Simon Haykin;
8) Returning to operation 1) until the input voice is finished.
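Operations 1) to 8) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the helper names fft, ifft and fd_nlms_frame are hypothetical, the constant step size μ=0.05 and the 4-tap noise path h are arbitrary test values, the products in operations 3) and 6) are taken element by element, and the toy input contains noise only (no voice), so the output should decay toward zero as FW converges.

```python
import cmath
import random

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. No 1/n scaling."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT with the 1/n scaling applied once at the end."""
    n = len(x)
    return [val / n for val in fft(x, inverse=True)]

def fd_nlms_frame(prev_v, v, s, FW, mu):
    """Operations 1) to 7) for one frame of N samples; returns (E, new FW)."""
    N = len(v)
    U = fft(prev_v + v)                                    # 1), 2): V has size 2N
    y_full = ifft([a * b for a, b in zip(U, FW)])          # 3)
    Y = [val.real for val in y_full[N:]]                   # discard the first N values
    E = [s[n] - Y[n] for n in range(N)]                    # 4)
    F = fft([0.0] * N + E)                                 # 5)
    H = ifft([a.conjugate() * b for a, b in zip(U, F)])    # 6)
    G = fft([val.real for val in H[:N]] + [0.0] * N)       # last N values of H set to 0
    new_FW = [fw + mu * g for fw, g in zip(FW, G)]         # 7)
    return E, new_FW

# Toy run (all values are assumptions): reference noise u reaches the voice
# microphone through the path h; there is no voice in s.
rng = random.Random(0)
N = 4
h = [0.5, -0.3, 0.2, 0.1]
total = 600 * N
u = [rng.uniform(-1.0, 1.0) for _ in range(total)]
s = [sum(h[i] * u[n - i] for i in range(N) if n - i >= 0) for n in range(total)]

FW = [0j] * (2 * N)
prev_v = [0.0] * N
for start in range(0, total, N):                           # 8): loop over frames
    v = u[start:start + N]
    E, FW = fd_nlms_frame(prev_v, v, s[start:start + N], FW, mu=0.05)
    prev_v = v

w_est = [val.real for val in ifft(FW)[:N]]                 # time-domain coefficients
```

After the loop, w_est should be close to h and the last frame E close to zero. Keeping only the last N outputs in operation 3) and zeroing the last N values of H in operation 6) is what keeps the circular FFT products equivalent to a linear convolution and correlation.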
In the method above, the pure voice is calculated in the time domain using the following equation:
$$e[n] = s[n] - \sum_{i=0}^{N-1} w[i]\,u[n-i]$$
wherein s[n] stands for the value of the voice with noise at time point n; u[n] stands for the value of the reference noise at time point n; w[i] stands for the coefficients of the adaptive filter; N represents the order of the adaptive filter; and
$$\sum_{i=0}^{N-1} w[i]\,u[n-i]$$
represents the estimated noise value mixed in the voice s[n] at time point n. It can be observed that the noise may be completely eliminated from s[n] if the reference noise u[n] has no voice component. However, if the reference noise u[n] contains a strong voice component, the estimated noise value
may contain a portion or all of the voice in the current frame or a previous frame. As a result, the pure voice e[n] output by the adaptive filter may be weakened, or echo may even be introduced.
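The effect described above can be illustrated with a small sketch of the time-domain relation e[n] = s[n] − Σ w[i]·u[n−i], with the sum running over i = 0 to N−1. The function names, the 2-tap path h, the sample values and the 0.5 leak factor are all hypothetical; the filter is simply assumed to already match the noise path.

```python
def estimate_noise(w, u, n):
    """Estimated noise at time n: sum of w[i] * u[n - i]; u is taken as 0 before time 0."""
    return sum(w[i] * u[n - i] for i in range(len(w)) if n - i >= 0)

def pure_voice(s, u, w):
    """e[n] = s[n] - estimated noise, for every sample of s."""
    return [s[n] - estimate_noise(w, u, n) for n in range(len(s))]

# Hypothetical toy data: the noise reaches the voice microphone through a
# 2-tap path h, and the filter coefficients are assumed to equal h.
h = [0.5, 0.25]
voice = [1.0, -1.0, 2.0, 0.0]
u = [0.4, -0.2, 0.1, 0.3]                       # reference noise without voice
s = [v + estimate_noise(h, u, n) for n, v in enumerate(voice)]

e_clean = pure_voice(s, u, h)                   # recovers the voice

# Same filter, but half of the voice leaks into the reference noise.
u_leak = [x + 0.5 * v for x, v in zip(u, voice)]
e_leak = pure_voice(s, u_leak, h)               # first sample is about 0.75, not 1.0
```

With a voice-free reference the voice is recovered; with the leaking reference the filter also subtracts a scaled, delayed copy of the voice, which is the weakening and echo noted above.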
An adaptive filter technique is introduced. Depending on the actual implementation, the adaptive filter technique may be implemented in software or hardware. According to one embodiment of the present invention, an adaptive filter in accordance with the present invention may be advantageously used in mobile communication fields. For example, a mobile phone is provided with a pair of microphones A, B. The microphone A is positioned away from a voice source and near a noise source (e.g., a speaker). The microphone B is positioned away from the noise source and near the voice source. In this condition, a double-track voice signal can be recorded. However, because the mobile phone is small and the noise source is present, the microphones A and B both produce signals containing noise, which adversely affects the quality of the double-track voice signal. It should be noted that one adjacent voice source and noise source are taken only as an example. In reality, various voice sources and noise sources coexist, so the influence of the noise sources on the voice signals could be more serious. With one embodiment of the present invention, the influence of the noise sources on the voice signals may be minimized.
In order to minimize or eliminate the noise mixed in the voice signals, and improve the quality of the output voice, an adaptive filter method and system according to one embodiment of the present invention may be applied. It is assumed that the signal recorded by the microphone A is regarded as a voice signal with noise, and the signal recorded by the microphone B is regarded as a reference noise.
In operation, the adaptive filter system 300 performs the following operations:
gathering frames of the voice signal s(n) mixed with noise and frames of the reference noise u(n);
inputting a current frame of the reference noise signal u(n) to the adaptive filter 302;
estimating a noise value according to the adaptive filter coefficient w(i) and the current frame of the reference noise signal u(n);
subtracting the estimated noise from the current frame of the voice signal s(n) mixed with noise to get one frame of pure voice signal e(n) by the subtracter 306;
providing the current frame of pure voice signal e(n) to the adaptive filter 302;
calculating an adaptive step size μ and estimating a voice probability contained in the current frame of reference noise, wherein μ is a constant, or μ=c/En, where c is a constant and En is an estimated energy of the current frame of the reference noise signal;
adjusting the adaptive step size μ according to the voice probability contained in the current frame of reference noise, wherein the higher the probability is, the smaller the adaptive step size μ becomes;
refreshing the adaptive filter coefficient according to the current frame of the pure voice signal, the current frame of reference noise signal, and the adjusted adaptive step size μ; and
finally, returning to the operation of filtering a next frame of voice mixed with noise according to the refreshed adaptive filter coefficient until the voice signal mixed with noise is completely processed. Each frame of a signal comprises N signal samples.
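The operations listed above might be sketched as the following frame loop. Several details are assumptions made only for illustration: the voice probability estimator is not specified here, so it is passed in as a hypothetical callable voice_prob returning a value in [0, 1]; the adjustment rule μ·(1−p) is one simple way to make μ shrink as the probability rises; and the function and parameter names are invented.

```python
import random

def process_frames(s, u, num_taps, frame_len, c, voice_prob):
    """Frame-wise noise canceller with a voice-probability-scaled step size.

    voice_prob(frame) is a caller-supplied, hypothetical estimator in [0, 1];
    mu * (1 - p) is an assumed adjustment rule, not one given in the text.
    """
    w = [0.0] * num_taps
    out = []
    for start in range(0, len(s) - frame_len + 1, frame_len):
        frame_u = u[start:start + frame_len]
        # Estimate the noise with the current coefficients and subtract it.
        e = []
        for n in range(start, start + frame_len):
            y = sum(w[i] * u[n - i] for i in range(num_taps) if n - i >= 0)
            e.append(s[n] - y)
        # mu = c / En, then reduced as the voice probability rises.
        energy = sum(x * x for x in frame_u) + 1e-12
        mu = (c / energy) * (1.0 - voice_prob(frame_u))
        # Refresh the coefficients from this frame's pure voice and reference.
        for k, n in enumerate(range(start, start + frame_len)):
            for i in range(num_taps):
                if n - i >= 0:
                    w[i] += mu * e[k] * u[n - i]
        out.extend(e)
    return out, w

# Toy run (assumed values): s holds only noise that passed through path h.
rng = random.Random(1)
h = [0.5, 0.25]
u = [rng.uniform(-1.0, 1.0) for _ in range(64 * 50)]
s = [sum(h[i] * u[n - i] for i in range(2) if n - i >= 0) for n in range(len(u))]

_, w_adapt = process_frames(s, u, 2, 64, 0.5, lambda frame: 0.0)   # no voice detected
_, w_frozen = process_frames(s, u, 2, 64, 0.5, lambda frame: 1.0)  # voice always detected
```

With a probability of 0 the coefficients should approach h, while a probability of 1 drives μ to zero and leaves the coefficients unchanged, so a reference frame that carries voice cannot pull the filter toward cancelling the voice.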
In one embodiment, the refreshing operation of the adaptive filter coefficient may be performed according to the following equations:
$$W[k+1] = W[k] + \mu \sum_{i=0}^{L-1} U(kL+i)\, e[kL+i]$$
$$U(kL+i) = \{u(kL+i),\ u(kL+i-1),\ \ldots,\ u(kL+i-N+1)\}$$
wherein W[k+1] represents the adaptive filter coefficients at frame k+1; μ is the adjusted adaptive step size; L indicates that the adaptive filter coefficients are refreshed once every L reference noise samples, where L is an integer (e.g., a multiple of N); u(kL+i) stands for the reference noise value at time point kL+i; and e[kL+i] stands for the pure voice value at time point kL+i.
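As a worked example of a block update of this kind, the sketch below assumes the standard block form W[k+1] = W[k] + μ·Σ U(kL+i)·e[kL+i], summed over i = 0 to L−1, with μ being the adjusted step size. The function name block_update and all numbers are hypothetical, and samples before time 0 are taken as 0.

```python
def block_update(w, u, e, k, L, mu):
    """Return W[k+1] = W[k] + mu * sum over i in [0, L) of U(kL+i) * e[kL+i].

    U(n) = [u[n], u[n-1], ..., u[n-N+1]] with N = len(w); samples before
    time 0 are taken as 0 (an assumption for the start of the signal).
    """
    N = len(w)
    new_w = list(w)
    for i in range(L):
        n = k * L + i
        for j in range(N):
            if n - j >= 0:
                new_w[j] += mu * u[n - j] * e[n]
    return new_w

# Hypothetical numbers: N = 2 taps, L = 2 samples per update, frame k = 1.
# U(2) = [3, 2] with e[2] = 1 and U(3) = [4, 3] with e[3] = -1, so the sum
# is [-1, -1] and the result is approximately [-0.1, -0.1].
w1 = block_update([0.0, 0.0], u=[1.0, 2.0, 3.0, 4.0],
                  e=[0.0, 0.0, 1.0, -1.0], k=1, L=2, mu=0.1)
```

Accumulating over L samples before touching the coefficients means the filter stays fixed while a frame is being filtered, which matches the frame-by-frame flow of the operations.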
The adaptive filter operation as described may be applied to the NLMS adaptive filter algorithm, for example in its frequency-domain form (FD-NLMS), as follows:
1) Inputting frames of reference noise samples, each frame having N samples; combining a current frame of reference noise samples v[0], . . . , v[N−1] with a previous frame of reference noise samples v′[0], . . . , v′[N−1] into a vector V with a size of 2N, V={v′[0], . . . , v′[N−1], v[0], . . . , v[N−1]};
2) Calculating the FFT of the vector V to produce a vector U, wherein U=FFT(V)={u[0], u[1], . . . , u[2N−1]}, FFT refers to the Fast Fourier Transform, and the size of U is 2N;
3) Multiplying the vector U by a vector FW, which is obtained from operation 8); calculating the IFFT of the product to produce a vector Y′ with a size of 2N, wherein Y′ is a real vector (the imaginary part is 0); discarding the first N values of Y′ to produce a vector Y, wherein Y={y[0], . . . , y[N−1]} and its size is N;
4) Inputting frames of voice-with-noise samples S={s[0], . . . , s[N−1]}, each frame having N samples; subtracting the vector Y from the vector S to obtain the pure voice E, wherein E={e[0], . . . , e[N−1]}={s[0]−y[0], . . . , s[N−1]−y[N−1]};
5) Inserting N zero values at the front of the vector E to form a vector E′, E′={0, . . . , 0, e[0], . . . , e[N−1]}; calculating the FFT of E′ to get a vector F, F=FFT(E′);
6) Conjugating the vector U to produce a vector UH, UH=Ū; multiplying the vector UH by the vector F to produce a vector G′; calculating the IFFT of G′ to get a vector H, H=IFFT(G′), wherein H is a real vector with a size of 2N (the imaginary part is 0); setting the last N values of H to 0, and then calculating the FFT of H to get a vector G, wherein G is a complex vector with a size of 2N;
7) Calculating an adaptive step size μ and estimating a voice probability contained in the current frame of reference noise according to the vector U in operation 2), wherein μ is a constant, or μ=c/En, where c is a constant and En is an estimated energy of the current frame of the reference noise signal; adjusting the adaptive step size μ according to the voice probability contained in the current frame of reference noise, wherein the higher the probability is, the smaller the adaptive step size μ becomes;
8) Refreshing the vector FW according to FW=FW+μG, wherein μ is the adjusted adaptive step size obtained in operation 7);
9) Returning to the operation 1) until the input voice is finished.
According to one embodiment, the calculating operation of the adaptive step size μ, the estimating operation of the voice probability and the adjusting operation of the adaptive step size μ are performed in a step size controller.
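A step size controller working from the vector U might look like the sketch below. It is an assumption-laden illustration: the function name, the default c, and the linear rule μ·(1−p) are hypothetical; the voice probability is supplied by the caller because its estimator is not specified here; and En is taken as the time-domain energy of V, obtained from U through Parseval's relation for an unnormalized FFT.

```python
def step_size_from_spectrum(U, voice_prob, c=0.5, eps=1e-12):
    """Return an adjusted step size computed from the spectrum U of the reference noise.

    En is estimated as sum(|U[k]|^2) / len(U), which equals the energy of the
    time-domain vector V for an unnormalized FFT (Parseval's relation).
    The scaling by (1 - p) is an assumed rule: higher probability, smaller mu.
    """
    energy = sum(abs(x) ** 2 for x in U) / len(U) + eps
    p = min(max(voice_prob, 0.0), 1.0)       # clamp to [0, 1]
    return (c / energy) * (1.0 - p)

# The FFT of V = [1, 0, 0, 0] is [1, 1, 1, 1], so En is 1 in this toy case.
U_example = [1 + 0j] * 4
mu_no_voice = step_size_from_spectrum(U_example, 0.0)
mu_some_voice = step_size_from_spectrum(U_example, 0.25)
mu_all_voice = step_size_from_spectrum(U_example, 1.0)
```

The three results decrease as the probability rises, reaching zero when the reference frame is judged to be entirely voice; any other monotonically decreasing rule would fit the description equally well.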
The present invention has been described in sufficient detail with a certain degree of particularity. It is understood by those skilled in the art that the present disclosure of embodiments has been made by way of example only and that numerous changes in the arrangement and combination of parts may be resorted to without departing from the spirit and scope of the invention as claimed. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description of embodiments.