The present invention relates to an adaptive filter that generates pseudo echo by computing a received speech signal using a tap coefficient, and also relates to an echo canceller that eliminates echo using pseudo echo generated in this adaptive filter.
With advances in communication modes and hardware, microphones are used in an increasing number of situations. In addition to the desired sound/voice, a microphone generally picks up sound/voice output from a speaker and indirect waves (echo) reflected from the walls of a room. If these components are output as they are from a speaker, the speaking party's own voice is heard with a delay, which is very annoying. Furthermore, with a hands-free telephone, a specific frequency component of the echo may be amplified and produce howling (an oscillation phenomenon).
An echo canceller is a system for eliminating this echo. That is, an echo canceller updates the tap coefficient of an adaptive filter to be equal to the transfer function from the speaker to the microphone, and subtracts the signal (hereinafter “pseudo echo”) obtained by processing an input signal using this tap coefficient, from the speech signal collected by the microphone. By this means, an echo component is suppressed in the signal (hereinafter “error signal”) output from the echo canceller.
The adaptive filter generates pseudo echo by filtering the input speech with tap coefficient wn[i] at discrete time n. The adaptive filter also updates tap coefficient wn[i] according to a predetermined adaptation algorithm. Here, n is the sample time index and i is a parameter (index) indicating the tap position of the adaptive filter.
The update of tap coefficient wn[i] is generally expressed by the recurrence in equation 1 below. In equation 1, wn[i] is the tap coefficient before adaptation, wn+1[i] is the tap coefficient after adaptation, μ is the step size for adjusting the rate of convergence of the tap coefficient, and Δwn[i] is the update coefficient, whose form depends on the type of adaptation algorithm.
(Equation 1)
$$w_{n+1}[i] = w_n[i] + \mu \cdot \Delta w_n[i] \qquad [1]$$
Generally, an adaptive filter is defined as a filter that performs filtering processing on a signal that is also input to a system with an unknown transfer function, and that updates the tap coefficient used in the filtering processing so that the signal after filtering becomes equal to the system output signal output from that system.
Adaptive estimation in the adaptive filter operates properly in a single-talk state, in which only a speech signal from the far-end speaking party is present.
However, conventional adaptation algorithms do not take into account the influence of speech from the near-end speaking party (that is, additive noise). Consequently, in a "double talk" state, in which a speech signal from the near-end speaking party (the "near-end signal") and a speech signal from the far-end speaking party (the "far-end signal") overlap, updating the adaptive filter risks increasing its adaptation error, which can lead to saturation of tap coefficient wn[i].
To combat this problem, the prior art provides a double-talk detection circuit to detect whether or not there is a double-talk state, and, when a double-talk state is found, stops the adaptive estimation operation of the adaptive filter or makes the step size, μ, of the adaptation algorithm small (see, for example, patent literature 1, patent literature 2 and patent literature 3).
With the prior-art technique of providing a double-talk detection circuit, it takes time to find an optimal threshold for determining whether a double-talk state is present. Furthermore, it is generally difficult to detect a double-talk state accurately, so the prior art may fail to maintain good convergence behavior of the adaptive filter.
It is therefore an object of the present invention to provide an adaptive filter and an echo canceller that can prevent the adaptation error from increasing in a double-talk state without providing a double-talk detection circuit.
An adaptive filter according to the present invention has: a filter section that receives input signal x[n], which is equal to a system input signal entered in a system with an unknown transfer function, and that outputs output signal y[n] by performing filtering processing on input signal x[n]; and a tap coefficient setting section that updates tap coefficient wn[i] used in the filtering processing so that it becomes equal to the transfer function of the system, based on input signal x[n] and error signal e[n], which represents the difference between system output signal d[n] output from the system and output signal y[n], where n is a sample time index and i is a parameter representing a tap position of the adaptive filter. In this adaptive filter, the tap coefficient setting section: generates updated tap coefficient wn+1[i] by adding the product of step size 2μ and update coefficient Δwn[i] to tap coefficient wn[i]; generates update coefficient Δwn[i] by dividing a numerator term by a normalized denominator term; generates the numerator term by multiplying input signal x[n−i] by error signal e[n]; and generates the normalized denominator term by one of: adding positive constant c to an average of the sum of the square of input signal x[n] and the square of error signal e[n]; adding positive constant c to the square root of an average of the product of the square of input signal x[n] and the square of error signal e[n]; and adding positive constant c to an average of the absolute value of the product of input signal x[n] and error signal e[n].
An echo canceller according to the present invention has: the above adaptive filter; and a subtractor that generates error signal e[n] by subtracting output signal y[n] that is output from the adaptive filter, from system output signal d[n].
According to the present invention, it is possible to maintain an optimal rate of convergence automatically by controlling the rate of convergence based on the power of a far-end signal and the power of a near-end signal, and consequently achieve a good and stable convergence behavior. Furthermore, with the present invention, it is possible to slow down the rate of convergence in a double-talk state and prevent the adaptation error from increasing.
Now, embodiments of the present invention will be described below in detail with reference to the accompanying drawings.
As shown in
In a receiving circuit (not shown), digital received speech signal x[n] is acquired by performing processing such as demodulation and decoding on a signal transmitted from the apparatus of the communicating party. Digital received speech signal x[n] is the input signal to adaptive filter 108.
D/A converter 101 converts digital received speech signal x[n] into an analog speech signal. The analog received speech signal is amplified in power amplifier 102 and output as speech from speaker 103.
The analog transmission speech signal input to microphone 104 is amplified in microphone amplifier 105 and input to A/D converter 106. Microphone 104 receives input speech s[n] from the near-end speaking party together with speech g[n] played from speaker 103, which arrives as echo. The input speech from the near-end speaking party is equivalent to additive noise that affects the convergence of adaptive filter 108.
A/D converter 106 converts the analog transmission speech signal into a digital transmission speech signal d[n].
Echo canceller 107 updates tap coefficient wn[i] of adaptive filter 108 so that it becomes equal to the impulse response of the transfer function from D/A converter 101 to A/D converter 106. Because power amplifier 102 and microphone amplifier 105 generally have a flat frequency response, wn[i] becomes equal to the impulse response of the transfer function from speaker 103 to microphone 104. Echo canceller 107 then subtracts pseudo echo y[n], obtained by processing digital received speech signal x[n] using tap coefficient wn[i], from digital transmission speech signal d[n], and outputs error signal e[n], in which the echo is suppressed.
Adaptive filter 108 generates pseudo echo y[n] by operating on digital received speech signal x[n] at discrete time n using tap coefficient wn[i]. Furthermore, adaptive filter 108 updates tap coefficient wn[i] using a predetermined adaptation algorithm, based on digital received speech signal x[n] and error signal e[n] output from subtractor 109. Note that i is a parameter (index) representing the tap position of the adaptive filter.
Subtractor 109 subtracts pseudo echo y[n] from digital transmission speech signal d[n] output from A/D converter 106, and acquires error signal e[n]. Error signal e[n] is subjected to processing such as encoding and modulation in a transmission circuit (not shown), and is transmitted to an apparatus of a communicating party.
Next, the inner configuration of adaptive filter 108 will be described with reference to drawings. In the following description, a digital received speech signal x[n] is the input signal and pseudo echo y[n] is the output signal.
Filter section 201 is, for example, an FIR filter and generates output signal y[n] by computing (that is, by filtering) input signal x[n] using tap coefficient wn[i].
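As a minimal sketch of the filtering performed by filter section 201, assuming it is an FIR filter computing y[n] as the sum over i of wn[i]·x[n−i], the output for one sample can be written as follows. The function name `fir_output` and the newest-first layout of the input history are illustrative assumptions, not part of the original description.

```python
from typing import Sequence


def fir_output(w: Sequence[float], x_hist: Sequence[float]) -> float:
    """Pseudo echo y[n] = sum_i w_n[i] * x[n - i] (hypothetical helper).

    x_hist[i] holds x[n - i], so x_hist[0] is the newest input sample.
    """
    if len(w) != len(x_hist):
        raise ValueError("tap vector and input history must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x_hist))


# Usage: a 3-tap filter applied to the history x[n]=1.0, x[n-1]=0.5, x[n-2]=-2.0
y = fir_output([0.5, 0.25, 0.1], [1.0, 0.5, -2.0])  # approximately 0.425
```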
Tap coefficient setting section 202 sets tap coefficient wn[i] based on a predetermined adaptation algorithm using input signal x[n] and error signal e[n], and outputs tap coefficient wn[i] to filter section 201. A feature of the present invention lies in this adaptation algorithm computation circuit in tap coefficient setting section 202.
The following describes the adaptive filter of the present invention for the case where an NLMS (Normalized Least Mean Square) algorithm is used as the adaptation algorithm. The NLMS algorithm performs time-domain processing and carries out a computation every time a new signal sample value is received as input, so that the tap coefficient gradually converges to an optimal value.
When an NLMS algorithm is used, the update recurrence of tap coefficient wn[i] according to the present invention is represented by one of equations 2 to 4 below. In equations 2 to 4, wn[i] is the tap coefficient before adaptation (that is, at time n), wn+1[i] is the tap coefficient after adaptation (that is, at time n+1), 2μ is the step size for adjusting the rate of convergence of the tap coefficient, x[n] is the input signal, e[n] is the error signal, and c is a small positive constant that prevents the denominator of the fraction from becoming zero. Furthermore, the symbol “
The short-time average is determined either by a moving average over a finite number of past samples or by integration processing using a forgetting factor. Assuming that the data length for averaging is N, the moving average can be calculated using an N-tap FIR filter whose tap coefficients are all 1/N. Integration processing using forgetting factor α is defined by equation 5 below, which assumes that the short-time average of input signal f[n] is to be found.
(Equation 5)
f[n]
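Both ways of forming the short-time average can be sketched as small stateful helpers. The moving average follows the N-tap, all-1/N FIR description directly. Equation 5 is not legible in this text, so the forgetting-factor recursion used here, avg[n] = α·avg[n−1] + (1−α)·f[n], is an assumed (common) form; the class names are hypothetical.

```python
from collections import deque


class MovingAverage:
    """Short-time average as an N-tap FIR filter whose taps are all 1/N."""

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("length must be >= 1")
        self.length = length
        self.buf = deque([0.0] * length, maxlen=length)
        self.total = 0.0

    def update(self, value: float) -> float:
        self.total += value - self.buf[0]  # drop the oldest sample, add the newest
        self.buf.append(value)             # maxlen discards the oldest entry
        return self.total / self.length


class ForgettingAverage:
    """Recursive short-time average with forgetting factor alpha.

    Assumed form: avg[n] = alpha * avg[n-1] + (1 - alpha) * f[n].
    """

    def __init__(self, alpha: float, initial: float = 0.0) -> None:
        if not 0.0 <= alpha < 1.0:
            raise ValueError("alpha must satisfy 0 <= alpha < 1")
        self.alpha = alpha
        self.value = initial

    def update(self, sample: float) -> float:
        self.value = self.alpha * self.value + (1.0 - self.alpha) * sample
        return self.value
```

A forgetting factor close to 1 behaves like a long averaging window; the recursive form needs only one stored value instead of N past samples.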
Equations 2 to 4 all differ from a conventional NLMS algorithm in that normalization processing is performed using the power of error signal e[n] in addition to the power of input signal x[n].
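For reference, the three update rules can be written out from the wording of the summary of the invention. The overline used here for the short-time average is an assumed notation, since the definition of the original symbol is truncated above; the fraction corresponds to update coefficient Δwn[i].

$$
\begin{aligned}
w_{n+1}[i] &= w_n[i] + 2\mu\,\frac{x[n-i]\,e[n]}{\overline{x^2[n] + e^2[n]} + c} \qquad [2]\\
w_{n+1}[i] &= w_n[i] + 2\mu\,\frac{x[n-i]\,e[n]}{\sqrt{\overline{x^2[n]\,e^2[n]}} + c} \qquad [3]\\
w_{n+1}[i] &= w_n[i] + 2\mu\,\frac{x[n-i]\,e[n]}{\overline{\lvert x[n]\,e[n]\rvert} + c} \qquad [4]
\end{aligned}
$$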
Equation 2 executes normalization processing based on the average value of the sum of the power of input signal x[n] and the power of error signal e[n]. Equation 3 executes normalization processing based on the square root of the average value of the product of the power of input signal x[n] and the power of error signal e[n]. The processing of equation 2 and the processing of equation 3 provide substantially the same effect.
To avoid the square-root computation of equation 3, equation 4 executes normalization processing based on the average value of the absolute value of the product of input signal x[n] and error signal e[n]. Because the product of input signal x[n] and error signal e[n] already represents a power level, no square root needs to be calculated, and equation 4 is simpler to process than equation 3.
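The three normalizers can be compared in a short Python sketch of one tap-update step. It assumes a forgetting-factor form for the short-time average with an initial value of 1.0 (equation 5 is not legible above), and the `simulate` helper runs a single-talk case against a hypothetical 8-tap echo path driven by white Gaussian noise. All names and parameter values are illustrative, not taken from the original.

```python
import math
import random
from typing import List, Sequence


class NormalizedTapUpdater:
    """Sketch of the tap update of equations 2-4.

    mode selects the normalized denominator term:
      "eq2": avg(x[n]^2 + e[n]^2) + c
      "eq3": sqrt(avg(x[n]^2 * e[n]^2)) + c
      "eq4": avg(|x[n] * e[n]|) + c
    The short-time average uses an assumed forgetting-factor recursion
    avg = alpha * avg + (1 - alpha) * value, started at initial_avg (assumed).
    """

    def __init__(self, num_taps: int, mu: float, c: float, alpha: float,
                 mode: str = "eq2", initial_avg: float = 1.0) -> None:
        if mode not in ("eq2", "eq3", "eq4"):
            raise ValueError(f"unknown mode: {mode}")
        self.w: List[float] = [0.0] * num_taps
        self.mu, self.c, self.alpha, self.mode = mu, c, alpha, mode
        self.avg = initial_avg

    def _instantaneous(self, x_n: float, e_n: float) -> float:
        if self.mode == "eq2":
            return x_n * x_n + e_n * e_n
        if self.mode == "eq3":
            return (x_n * e_n) ** 2
        return abs(x_n * e_n)

    def denominator(self) -> float:
        base = math.sqrt(self.avg) if self.mode == "eq3" else self.avg
        return base + self.c

    def update(self, x_hist: Sequence[float], e_n: float) -> None:
        """x_hist[i] holds x[n - i]; e_n is the error e[n] = d[n] - y[n]."""
        inst = self._instantaneous(x_hist[0], e_n)
        self.avg = self.alpha * self.avg + (1.0 - self.alpha) * inst
        scale = 2.0 * self.mu * e_n / self.denominator()
        for i, x_i in enumerate(x_hist):
            # w_{n+1}[i] = w_n[i] + 2*mu * x[n-i] * e[n] / denominator
            self.w[i] += scale * x_i


def misalignment_db(h: Sequence[float], w: Sequence[float]) -> float:
    """Normalized tap error 10*log10(sum (h-w)^2 / sum h^2), floored at -300 dB."""
    err = sum((a - b) ** 2 for a, b in zip(h, w))
    ref = sum(a * a for a in h)
    return 10.0 * math.log10(max(err / ref, 1e-30))


def simulate(mode: str = "eq2", n_samples: int = 3000, seed: int = 0) -> float:
    """Single-talk run against a hypothetical echo path; returns misalignment in dB."""
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.2, 0.1, -0.05, 0.02, 0.0, 0.01]
    upd = NormalizedTapUpdater(len(h), mu=0.01, c=1e-3, alpha=0.99, mode=mode)
    x_hist = [0.0] * len(h)
    for _ in range(n_samples):
        x_hist = [rng.gauss(0.0, 1.0)] + x_hist[:-1]    # x[n], x[n-1], ...
        d = sum(a * b for a, b in zip(h, x_hist))        # echo only, no near-end speech
        y = sum(a * b for a, b in zip(upd.w, x_hist))    # pseudo echo
        upd.update(x_hist, d - y)                        # e[n] = d[n] - y[n]
    return misalignment_db(h, upd.w)
```

In this sketch only the equation 2 variant is exercised in the single-talk run. For equations 3 and 4 the denominator itself shrinks as e[n] shrinks, so their steady-state behavior depends more strongly on the choice of c and μ; the sketch makes no claim to reproduce the simulation results reported for the invention.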
In
In
In
Next, the adaptive filter of the present invention will be described for the case where an FLMS (Fast Least Mean Square or Frequency domain Least Mean Square) algorithm is used as the adaptation algorithm. The FLMS algorithm performs block processing in the frequency domain using the discrete Fourier transform at regular intervals of several samples, so that the tap coefficient gradually converges to an optimal value. The FLMS algorithm may also be referred to as an FBLMS (Fast Block Least Mean Square or Frequency domain Block Least Mean Square) algorithm.
When the FLMS algorithm is used, the update recurrence of the tap coefficient according to the present invention can be represented by equations 6 and 7 below. In equations 6 and 7, Wn[k] is the discrete Fourier transform of tap coefficient wn[i], Wn+L[k] is the discrete Fourier transform of tap coefficient wn+L[i], μ is the step size for adjusting the rate of convergence of the tap coefficient, X[k] is the discrete Fourier transform of input signal x[n], E[k] is the discrete Fourier transform of error signal e[n], c is a small positive constant that prevents the denominator of the fraction from becoming zero, and p is a small positive constant that prevents the numerator from becoming a significantly low value. Also, k represents frequency, and L represents the cycle at which coefficient updating processing is performed; with L=1, the calculation is performed every time a new signal sample value is received as input. Also, the symbol “
In equation 6 and equation 7, unlike a conventional FLMS algorithm, normalization processing is performed using discrete Fourier transform E[k] of error signal e[n] in addition to discrete Fourier transform X[k] of input signal x[n].
In equation 6 and equation 7, X[k] is determined from following equation 8 and E[k] is determined from following equation 9. In equation 8 and equation 9, N is the tap length of an adaptive filter, and DFT is the discrete Fourier transform.
Time domain tap coefficient wn+L[i] can be found by performing an inverse discrete Fourier transform on frequency domain discrete Fourier transform Wn+L[k] of the tap coefficient updated by equation 6 and equation 7, and by extracting its real part. In equation 10, IDFT represents the inverse discrete Fourier transform. Also, REAL is the processing of extracting the real part of the parameter. Normally, the fast Fourier transform algorithm is used for DFT/IDFT calculation.
Equation 10 yields 2N tap coefficients, wn+L[0] to wn+L[2N−1]; of these, wn+L[N] to wn+L[2N−1] are discarded and wn+L[0] to wn+L[N−1] are used as the tap coefficients.
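The recovery of the time-domain taps (inverse DFT, real part, first N of the 2N values) can be illustrated with a minimal sketch. For self-containment it uses a naive O(M²) DFT rather than the fast Fourier transform mentioned above, and the usage example assumes the N taps are zero-padded to length 2N before the forward transform; the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(v: Sequence[complex]) -> List[complex]:
    """Naive O(M^2) discrete Fourier transform (an FFT would be used in practice)."""
    m = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m))
            for k in range(m)]


def idft(spec: Sequence[complex]) -> List[complex]:
    """Naive inverse DFT, including the 1/M scaling."""
    m = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / m) for k in range(m)) / m
            for t in range(m)]


def time_domain_taps(W: Sequence[complex], num_taps: int) -> List[float]:
    """IDFT the 2N-point spectrum, keep the real part, and keep only the first N taps."""
    if len(W) != 2 * num_taps:
        raise ValueError("expected a spectrum of length 2 * num_taps")
    return [z.real for z in idft(W)[:num_taps]]


# Usage: 4 taps, zero-padded to 8 points, transformed and recovered.
taps = [0.5, -0.25, 0.125, 0.0]
recovered = time_domain_taps(dft(taps + [0.0] * 4), num_taps=4)
```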
Equation 6 applies the normalization processing of equation 4 using the absolute value of the product of input signal x[n] and error signal e[n] to the FLMS algorithm to perform block processing in the frequency domain, and allows normalization processing in the frequency domain using the absolute value of the product of discrete Fourier transform X[k] of input signal x[n] and discrete Fourier transform E[k] of error signal e[n].
Equation 7 adds to equation 6 processing for improving the rate of convergence from the initial state when the system starts, and executes normalization processing using the result of dividing the absolute value of the average value of the product of output signal y[n] of the adaptive filter and error signal e[n] by the absolute value of the product of discrete Fourier transform X[k] of input signal x[n] and discrete Fourier transform E[k] of error signal e[n]. Note that the average value of the product of y[n] and e[n] needs to be calculated through the integration processing of equation 5 above using a forgetting factor, and its initial value must be nonzero; it should be set to approximately the average power of input signal x[n].
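Equations 6 and 7 themselves are not reproduced in this text, so the following per-bin sketch is hypothetical. It assumes the gradient term conj(X[k])·E[k] of a conventional FLMS algorithm, normalizes it by |X[k]E[k]| + c as described for equation 6, and, for the equation 7 variant, multiplies the step by (|average of y[n]e[n]| + p), reading p as the constant that keeps the numerator from becoming too small. The function name and parameters are illustrative.

```python
from typing import List, Optional, Sequence


def flms_block_update(W: Sequence[complex], X: Sequence[complex],
                      E: Sequence[complex], mu: float, c: float,
                      ye_avg: Optional[float] = None,
                      p: float = 1e-6) -> List[complex]:
    """One frequency-domain block update (hypothetical form of equations 6 and 7).

    Assumed per-bin rule (numerator conj(X[k]) * E[k] as in a conventional FLMS):
      eq 6: W[k] + mu * conj(X[k]) * E[k] / (|X[k] * E[k]| + c)
      eq 7: the same step multiplied by (|ye_avg| + p), where ye_avg is the
            short-time average of y[n] * e[n]
    """
    updated = []
    for W_k, X_k, E_k in zip(W, X, E):
        step = mu / (abs(X_k * E_k) + c)   # normalization by |X[k] E[k]|
        if ye_avg is not None:             # equation 7 variant
            step *= abs(ye_avg) + p
        updated.append(W_k + step * X_k.conjugate() * E_k)
    return updated
```

Under this assumed form, with c close to zero the equation 6 update has magnitude close to μ in every frequency bin regardless of signal level, while the equation 7 factor enlarges the step when |average of y[n]e[n]| is large, that is, early in adaptation.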
The product of y[n] and e[n] can be represented as following equation 11. The symbol “*” in equation 11 is an operator to represent convolution.
In equation 11, s[n] and x[n] come from different signal sources and are uncorrelated, so that, by making the averaging interval long enough, the average of the product of s[n] and x[n]*wn[i], that is, the correlation between s[n] and x[n]*wn[i], becomes nearly zero. The absolute value of the average value of the product of y[n] and e[n] can therefore be represented by equation 12 below.
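The averaging argument can be checked numerically with a small sketch. It assumes unit-gain amplifiers so that d[n] = g[n] + s[n], white Gaussian signals for x[n] and the near-end speech s[n], a hypothetical 3-tap echo path, and a hypothetical half-converged filter; the function name `correlation_averages` is illustrative.

```python
import random
from typing import Tuple


def correlation_averages(n: int = 20000, seed: int = 2) -> Tuple[float, float, float]:
    """Return the sample averages of y*e, y*s and y*(g - y).

    Hypothetical model: g[n] = (x*h)[n] is the echo, y[n] = (x*w)[n] is the pseudo
    echo, s[n] is near-end speech independent of x[n], and e[n] = g[n] + s[n] - y[n].
    """
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.2]          # hypothetical echo path
    w = [0.5 * a for a in h]      # hypothetical, half-converged adaptive filter
    x_hist = [0.0] * len(h)
    sum_ye = sum_ys = sum_yg = 0.0
    for _ in range(n):
        x_hist = [rng.gauss(0.0, 1.0)] + x_hist[:-1]
        s = rng.gauss(0.0, 1.0)   # near-end speech, uncorrelated with x
        g = sum(a * b for a, b in zip(h, x_hist))
        y = sum(a * b for a, b in zip(w, x_hist))
        e = g + s - y             # e[n] = d[n] - y[n] with d[n] = g[n] + s[n]
        sum_ye += y * e
        sum_ys += y * s
        sum_yg += y * (g - y)
    return sum_ye / n, sum_ys / n, sum_yg / n


avg_ye, avg_ys, avg_yg = correlation_averages()
```

With a long averaging interval the y·s term is close to zero, so the average of y[n]e[n] is dominated by the average of y[n](g[n] − y[n]), which stays positive while the filter is only partly converged.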
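(Equation 12)
$$\left|\overline{y[n]\,e[n]}\right| \approx \left|\overline{y[n]\,\bigl(g[n]-y[n]\bigr)}\right| \qquad [12]$$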
Until the adaptive filter has converged completely, that is, as long as g[n]≠y[n], the estimation error with respect to g[n], (g[n]−y[n]), is correlated with y[n], and the absolute value of the average of the product of y[n] and e[n] shown in equation 12 is positive.
The estimation error tends to be large immediately after the adaptive filter starts operating; accordingly, the value of equation 12 is large at that point, but it approaches zero as the adaptation operation advances toward convergence. Consequently, in the adaptive filter coefficient updating processing based on equation 7, step size μ is effectively large immediately after the start, when the estimation error is significant, so the rate of convergence is accelerated. This processing not only improves the rate of convergence right after the adaptive filter starts operating, but also accelerates convergence when the impulse response of the system under identification (the audio system) changes suddenly and the level of error signal e[n] increases as a result.
As obvious from
In
In
The results of a computer simulation carried out to verify the effectiveness of the adaptive filter of the present invention described above are shown below. The system configuration used in the simulation is the same as in
Changes over time in estimation error D[n] of the adaptive filter with respect to tap coefficient h[i] of the identification target, given by equation 14, are shown in
As shown in
By contrast with this, as clear from the comparison of
Also, as shown in
As shown in
Thus, while a conventional NLMS algorithm executes normalization processing using input signal power alone, with the NLMS algorithm of the adaptive filter of the present invention, normalization processing is carried out using both input signal power and error signal power.
When the power of additive noise added to the desired signal increases, the error signal power also increases. According to the present invention, by also using the error signal power in normalization processing, it is possible to automatically slow down the rate of convergence of the adaptation algorithm when the additive noise power is large, and to prevent deterioration of the convergence behavior due to the influence of noise.
In an echo canceller, when a double-talk state arises, the adaptation error of the adaptive filter increases and, along with it, the error signal power increases. Accordingly, by performing normalization processing using the error signal power as in the present invention, it is possible to slow down the rate of convergence of the adaptive filter when a double-talk state arises and to prevent the adaptation error from increasing rapidly due to double talk.
Furthermore, when the double-talk state ends and the power of error signal e[n] falls, the rate of convergence automatically increases again.
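The self-regulating step size of equation 2 can be illustrated by tracking the effective step 2μ/(average of x²+e² + c) with and without near-end speech. This is a sketch under a hypothetical signal model: x[n] is unit-power white noise, e[n] is a small residual echo plus near-end speech modelled as white noise, and the short-time average uses an assumed forgetting-factor form started at 1.0. The function name is illustrative.

```python
import random


def mean_effective_step(near_end_std: float, mu: float = 0.01, c: float = 1e-3,
                        alpha: float = 0.99, n: int = 3000, seed: int = 1) -> float:
    """Average of the equation-2 effective step 2*mu / (avg(x^2 + e^2) + c)
    over the second half of a run.

    Hypothetical signal model: x[n] is unit-power white noise; e[n] is a small
    residual echo (std 0.01) plus near-end speech modelled as white noise with
    standard deviation near_end_std (0.0 means single talk).
    """
    rng = random.Random(seed)
    avg = 1.0  # assumed initial value of the short-time average
    total = 0.0
    for k in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 0.01) + rng.gauss(0.0, near_end_std)
        # assumed forgetting-factor average of x^2 + e^2
        avg = alpha * avg + (1.0 - alpha) * (x * x + e * e)
        if k >= n // 2:
            total += 2.0 * mu / (avg + c)
    return total / (n - n // 2)


single_talk = mean_effective_step(near_end_std=0.0)
double_talk = mean_effective_step(near_end_std=1.0)
```

Here near-end speech with the same power as x[n] roughly doubles the denominator, so double_talk comes out at roughly half of single_talk; once the near-end speech stops, the average decays and the step returns to its single-talk level without any detection threshold.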
Furthermore, the processing method according to the present invention presumes neither stationarity of the input signal, nor characteristics such as whiteness (white noise) or a normal distribution, nor particular correlation characteristics, so that good convergence behavior can be achieved even when a speech signal, for which these presumptions rarely hold, is received as input. The only presumption on which the present invention is based is that, as shown in
In the processing of equation 7 and equation 9 above, processing of a high rate of convergence is carried out based on a presumption related to the correlation characteristics of an input signal, but the simulation result of
Thus, according to the present invention, an optimal rate of convergence is maintained automatically based on far-end signal power and near-end signal power, so that it is possible to achieve a good and stable convergence behavior.
In comparison to a conventional technique of controlling the rate of convergence using a double-talk detection circuit, the present invention provides advantages of making the processing of setting a threshold for determining whether or not there is a double-talk state unnecessary and also making it unnecessary to keep setting an optimal value every time the operation environment changes.
Although an echo canceller having an adaptive filter has been described with the present embodiment, the adaptive filter of the present invention is by no means limited to an echo canceller and is applicable to other devices and apparatuses as well.
The disclosure of Japanese Patent Application No. 2008-292330, filed on Nov. 14, 2008, including the specification, drawings, and abstract is incorporated herein by reference in its entirety.
The present invention is suitable for use in an echo canceller, a howling canceller, and so on in general communication systems (wireless telephones, wired telephones, intercoms, TV conference systems).
Number | Date | Country | Kind |
---|---|---|---|
2008-292330 | Nov 2008 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2009/006017 | 11/11/2009 | WO | 00 | 7/13/2011 |