This invention relates to a telephone employing adaptive filters for echo canceling and noise reduction and, in particular, to an adaptive filter that adapts quickly even in low signal to noise conditions.
As used herein, “telephone” is a generic term for a communication device that utilizes, directly or indirectly, a dial tone from a licensed service provider. As such, “telephone” includes desk telephones (see
There are two kinds of echo in a telephone system: acoustic echo between an earphone or a loudspeaker and a microphone, and electrical echo generated in the switched network for routing a call between stations. In a handset, acoustic echo is typically not much of a problem. In speaker phones, where several people huddle around a microphone and loudspeaker, acoustic feedback is much more of a problem. Hybrid circuits (two-wire to four-wire transformers) located at terminal exchanges or in remote subscriber stages of a fixed network are the principal sources of electrical echo.
One way to reduce echo is to program the frequency response of a filter to match the frequency content of an echo. A filter typically used is a finite impulse response (FIR) filter having programmable coefficients. The echo is subtracted from the echo bearing signal at the microphone. This technique can reduce echo as much as 30 dB, depending upon the coefficient adaptation algorithm. Additional means using non-linear techniques are typically added to further reduce an echo. Approximating a solution for an adaptive filter is like trying clothes on a squirming child: the input signal keeps changing. At one extreme, sudden and/or large changes can upset the approximation process and make the process diverge rather than converge. At the other extreme, a low echo to noise ratio can cause instability.
A robust filter for echo cancellation is known in the art; see U.S. Pat. No. 6,377,682 (Benesty et al.), the entire contents of which are incorporated by reference herein. As used in the patent, “robust” means “insensitivity to small deviations of the real distribution from the assumed model distribution.” A more functional or practical definition is that robust means insensitivity to outside disturbing influences, such as near-end talk or noise.
Convergence relates to a process for approximating an answer. In high school, one is taught how to calculate the roots of a quadratic equation f(x)=0 from the coefficients of the terms on the left side of the equation. This is not the only way to solve the problem. One can simply substitute a value (a guess) for x in the equation and calculate an answer. The guess is modified depending upon the difference (the error) between the calculated answer and zero. The error could be as large numerically as the guess. Thus, some fraction of the error is typically used to adjust the guess. Hopefully, successive guesses come closer and closer to a root. This is convergence. Calculations stop when the size of the error becomes arbitrarily small. For a human being, this approach is time consuming and boring. For a computer, this approach is extremely useful and applicable to many situations other than solving quadratic equations.
A simple fraction is a linear error function. If the fraction is small, convergence is slow. Fast convergence is desired to avoid double talk (both parties talking) or other errors during adaptation. If the fraction is large, successive calculations could diverge rather than converge. The Benesty et al. patent discloses that robustness is obtained by using a non-linear function of the error to determine successive approximations of coefficients for modeling the echo path.
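The guess-and-correct process described above, and the effect of the size of the fraction, can be illustrated with a minimal sketch. The function name `iterate_root`, the example quadratic x²−2, the tolerance, and the three fractions are assumptions chosen for illustration only; they do not come from the Benesty et al. patent or from this description.

```python
def iterate_root(f, guess, fraction, tol=1e-9, max_iter=10_000):
    """Refine a guess for f(x) = 0 by subtracting a fraction of the error.

    Returns (final guess, iterations used, converged flag).
    """
    x = guess
    for n in range(max_iter):
        error = f(x)              # difference between the calculated answer and zero
        if abs(error) < tol:      # error is small enough: stop
            return x, n, True
        if abs(x) > 1e6:          # runaway guess: the process is diverging
            return x, n, False
        x -= fraction * error     # adjust the guess by a fraction of the error
    return x, max_iter, False


def quadratic(x):
    # Hypothetical example equation: x^2 - 2 = 0, positive root sqrt(2).
    return x * x - 2.0


slow = iterate_root(quadratic, 1.0, 0.05)  # small fraction: converges slowly
fast = iterate_root(quadratic, 1.0, 0.3)   # larger fraction: converges faster
bad = iterate_root(quadratic, 1.0, 1.5)    # too large: successive guesses diverge
```

In this sketch the same starting guess reaches the root with either of the smaller fractions, only in a different number of iterations, whereas the largest fraction drives the guess away from the root.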
The Benesty et al. patent relies on a Fast Recursive Least Squares (FRLS) algorithm for adapting a programmable FIR (finite impulse response) filter. Other algorithms are known in the art, such as normalized Least Mean Squares (nLMS). It is also known in the art to vary the step size of an nLMS filter; see S. Makino, Y. Kaneda, and N. Koizumi, “Exponentially Weighted Stepsize NLMS Adaptive Filter Based on the Statistics of a Room Impulse Response,” IEEE Trans. on Speech and Audio Processing, Vol. 1, No. 1, January 1993, pages 101–108.
A digital signal processor (DSP) can be programmed according to any one of the available algorithms. There are at least two problems associated with implementing an algorithm on a DSP. A first problem is that the implementation may be unique to a particular processor. This is undesirable because it ties the implementation to the availability of a single semiconductor device. A second problem is that the implementation may not be efficient.
“Efficiency” in a programming sense is the number of instructions required to perform a function. Fewer instructions are more efficient than many instructions. In languages other than machine (assembly) language, a line of code may involve hundreds of instructions. As used herein, “efficiency” relates to machine language instructions, not lines of code, because it is the number of instructions that can be executed per unit time that determines how long it takes to perform an operation or some function.
Stability is also affected by the range and resolution of the DSP. Poor resolution in a fixed point DSP (too few bits) can cause bad echo cancellation. For example, resolution and range are conflicting requirements in a fixed-point implementation. A solution is to use the MAC (Multiply/ACcumulate) function available in some DSPs. Some commercially available DSPs include two or more MAC units. Stability is also affected by the ability of the cancellation algorithm to operate in noise and double-talk.
In view of the foregoing, it is therefore an object of the invention to provide an efficient adaptive filter that is stable during noise and double talk, yet has fast convergence to an echo cancellation solution.
Another object of the invention is to provide an efficient method for adapting a programmable filter.
A further object of the invention is to provide an efficient and robust adaptive filter for noise reduction that is relatively machine independent; i.e. not tied to a single processor.
Another object of the invention is to provide a robust adaptive filter that is stable when the echo is nearly the same as near end noise.
The foregoing objects are achieved in this invention in which an adaptive filter is programmed with an algorithm based on a normalized Least Mean Squares (nLMS) algorithm that adapts each sample time. The algorithm is modified to be more efficient in a variety of DSPs by computing multiple errors, one per sample, before updating coefficients. The update equation utilizes the multiple errors to achieve adaptation at a similar performance to known nLMS algorithms that adapt each sample time but without the instability that is observed in low echo-to-near-end-noise ratio (ENR) input conditions. Varying the relaxation step size prevents divergence. The DSP utilizes one or more MAC units.
A more complete understanding of the invention can be obtained by considering the following detailed description in conjunction with the accompanying drawings, in which:
Those of skill in the art recognize that, once an analog signal is converted to digital form, all subsequent operations can take place in one or more suitably programmed microprocessors. Reference to “signal”, for example, does not necessarily mean a hardware implementation or an analog signal. Data in memory, even a single bit, can be a signal. In other words, a block diagram can be interpreted as hardware, software, e.g. a flow chart, or a mixture of hardware and software. Programming a microprocessor is well within the ability of those of ordinary skill in the art, either individually or in groups.
This invention finds use in many applications where the electronics is essentially the same but the external appearance of the device may vary.
The various forms of telephone can all benefit from the invention.
A cellular telephone includes both audio frequency and radio frequency circuits. Duplexer 55 couples antenna 56 to receive processor 57. Duplexer 55 couples antenna 56 to power amplifier 58 and isolates receive processor 57 from the power amplifier during transmission. Transmit processor 59 modulates a radio frequency signal with an audio signal from circuit 54. In non-cellular applications, such as speakerphones, there are no radio frequency circuits and signal processor 54 may be simplified somewhat. Problems of echo cancellation and noise remain and are handled in audio processor 60. It is audio processor 60 that is modified to include the invention. How that modification takes place is more easily understood by considering the echo canceling and noise reduction portions of an audio processor in more detail.
A new voice signal entering microphone input 62 may or may not be accompanied by a signal from speaker output 68. The signals from input 62 are digitized in A/D converter 71 and coupled to summation circuit 72. There is, as yet, no signal from echo canceling circuit 73 and the data proceeds to non-linear filter 74, which is initially set to minimum suppression.
The output from non-linear filter 74 is coupled to summation circuit 76, where comfort noise 75 is optionally added to the signal. The signal is then converted back to analog form by D/A converter 77, amplified in amplifier 78, and coupled to line output 64. Data from the four VAD circuits is supplied to control 80, which uses the data for allocating sub-bands, echo elimination, double talk detection, and other functions. Control circuit 40 (
In accordance with the invention, a normalized Least Mean Squares (nLMS) algorithm, which adapts each sample time, is modified to compute multiple errors, one per sample, before updating coefficients. Multiple error update has been found to provide similar performance to standard nLMS adapting each sample time but with instability during low ENR conditions. The invention requires robustness to maintain stability. Several other aspects of the invention are described below: (1) Exponential Step Size Weighting, (2) Multiple Error Update, (3) Scaling Robustness for Stability, and (4) Scale Factor.
Implementation
The following definitions are used in the calculation of the coefficient update:
The vector of past inputs is given by the following equation.
$$x_k = x(k) = [x(k),\, x(k-1),\, \ldots,\, x(k-L)]^T$$
$x_{k-1}$ refers to the past input vector x(k−1). L is the length of the FIR filter used to estimate the echo and k is the sample index.
The coefficient estimate vector (tap coefficients) is given by the following equation.
$$\hat{h}_k = \hat{h}(k) = [\hat{h}_1(k),\, \hat{h}_2(k),\, \ldots,\, \hat{h}_L(k)]^T$$
The equations for the dual-error nLMS adaptive filtering algorithm are as follows. $e_k = y_k - x_k^T \hat{h}_k$ gives the current error estimate for the current input; $p_k = x_k^T x_k + \delta$ is the regularized power, where $\delta$ is the regularization parameter for the power normalization calculation (the value 0.001 has been used); and $\varepsilon_k = e_k / p_k$ is the estimated error normalized by the power estimate. The coefficient estimate, $\hat{h}_k$, is updated using $\hat{h}_{k+1} = \hat{h}_{k-1} + \mu x_k \varepsilon_k + \mu x_{k-1} \varepsilon_{k-1}$, where $\mu$ is the relaxation step size.
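The dual-error equations above can be sketched in a few lines. This is an illustrative sketch only, not the DSP implementation: the function name `dual_error_nlms`, the use of exactly L past inputs per vector (so that the input vector and the L coefficients have the same length), the zero padding before the first sample, the step size of 0.2, and the noise-free four-tap test path are all assumptions made for the example.

```python
import random


def dual_error_nlms(x, y, L, mu=0.2, delta=0.001):
    """Adapt an L-tap FIR estimate h, updating coefficients every two samples."""
    h = [0.0] * L
    errors = []
    held = None  # (input vector, normalized error) saved from the first sample of a pair

    for k in range(len(x)):
        # Most recent L inputs, newest first, zero-padded before the first sample.
        xk = [x[k - i] if k - i >= 0 else 0.0 for i in range(L)]
        ek = y[k] - sum(xi * hi for xi, hi in zip(xk, h))  # e_k = y_k - x_k^T h
        pk = sum(xi * xi for xi in xk) + delta             # regularized power
        eps = ek / pk                                      # normalized error
        errors.append(ek)
        if held is None:
            held = (xk, eps)  # first error of the pair: store it, no update yet
        else:
            x_prev, eps_prev = held
            # h <- h + mu * x_k * eps_k + mu * x_{k-1} * eps_{k-1}
            h = [hi + mu * (a * eps + b * eps_prev)
                 for hi, a, b in zip(h, xk, x_prev)]
            held = None
    return h, errors


# Hypothetical noise-free echo path used only to exercise the sketch.
rng = random.Random(0)
true_h = [0.5, -0.3, 0.1, 0.05]
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]
y = [sum(true_h[i] * x[k - i] for i in range(4) if k - i >= 0)
     for k in range(len(x))]
h, errors = dual_error_nlms(x, y, L=4)
```

Both errors of a pair are computed against the same coefficient set, and the coefficients change only once per pair, which is the property that the cycle counts discussed below rely upon.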
A single MAC architecture will compute each error in a single cycle per filter tap. A dual MAC architecture will compute both errors in a single cycle per tap. The update equation can be similarly computed in two to four cycles per tap based on the number of MAC units, the resources to store the normalized errors as local operands for zero cycle fetching, and the ability to fetch operands and store results in parallel with the MAC unit operations. For example, this gives a total of 2.5 cycles per tap for a TMSC54xx processor (single MAC), 1.5 cycles per tap for a TMSC55xx processor (dual MAC), and 1.25 cycles per tap for a generic four MAC processor. Efficiency approaches one cycle per tap as the number of MACs increases.
The TMSC54xx and TMSC55xx processors calculate least mean square in a single machine instruction, which allows the error calculation and the coefficient update to be computed in two cycles per tap. Because the current error is being calculated as the coefficients are being updated, the previous error is used during calculation. Using the previous error also requires dual access memory rather than the single access memory for the dual error update. Dual error update does not require special memory, delayed errors, or a special LMS instruction, which is not available in many architectures. Thus, the invention can be used in many other architectures.
The step size, μ, controls the convergence and stability of the algorithm. Modifications of the basic multiple error algorithm are needed to control stability while maintaining a fast convergence to the error minimum. The following sections describe how the standard nLMS algorithm has been modified to an algorithm in accordance with the invention.
Exponential Step Size Weighting
For an adaptive filter, the impulse response envelope is well modeled by a decaying exponential curve; see S. Makino, Y. Kaneda, and N. Koizumi, “Exponentially Weighted Stepsize NLMS Adaptive Filter Based on the Statistics of a Room Impulse Response,” IEEE Trans. on Speech and Audio Processing, Vol. 1, No. 1, January 1993. This a priori information is incorporated into the step size used for each coefficient update, allowing improved tracking and convergence. The network adaptive filter does not require exponential step size weighting.
More than one stepsize is used. The coefficient vector, ĥk, is partitioned into a block of taps starting from tap zero and the remaining taps are partitioned into N equal length contiguous blocks. In one embodiment of the invention, N=8. Each block coefficient uses a different stepsize, μi in the update. Initially, the stepsize is zero over the initial taps that correspond to a fixed delay. The remaining blocks of coefficients use step sizes calculated as follows.
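The partitioning described above can be sketched as follows. The step size formula of the invention is not reproduced here; the geometric decay used in this sketch, and the names `block_step_sizes`, `mu0`, and `ratio`, are hypothetical stand-ins chosen only to show a zero step size over the fixed delay followed by N equal length blocks, each with its own step size.

```python
def block_step_sizes(num_taps, delay_taps, num_blocks=8, mu0=0.5, ratio=0.8):
    """Return one step size per tap.

    The first `delay_taps` taps (the fixed delay) get a step size of zero.
    The remaining taps are split into `num_blocks` equal contiguous blocks.
    The geometric decay mu0 * ratio**i is an assumed placeholder profile.
    """
    remaining = num_taps - delay_taps
    if remaining <= 0 or remaining % num_blocks != 0:
        raise ValueError("remaining taps must divide evenly into the blocks")
    block_len = remaining // num_blocks
    steps = [0.0] * delay_taps
    for i in range(num_blocks):
        steps.extend([mu0 * ratio ** i] * block_len)
    return steps


# Example: 80 taps, of which the first 16 model a fixed delay.
steps = block_step_sizes(80, 16)
```

The resulting list plays the role of the diagonal of a per-tap step size matrix, with one entry per coefficient.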
In the presence of certain types of inputs (for example narrow-band signals), the coefficients may drift from optimum values and grow slowly, eventually exceeding permissible word length. This is an inherent problem of the LMS algorithm; see E. Ifeachor and B. Jervis, Digital Signal Processing: A Practical Approach, Addison-Wesley, 1993, p. 556. The problem is overcome by introducing coefficient leakage, which gently nudges each value toward zero. The leakage update equation, using exponential steps that vary over the set of coefficients, is as follows.
$$\hat{h}_k = \zeta\,\hat{h}_k + A\,x_k\,\varepsilon_k$$
where
A=diag(0..0,μ0,S
For $\mu_{i,j}$, i is the index of the step size to use in this position and j is the tap number of this position. $\zeta$ is a term in the range $[1-2^{-20},\ 1-2^{-28}]$ that ensures that the drift is contained and also introduces a bias in the normalized error term $\varepsilon_k$.
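A minimal sketch of the leakage update follows, treating A as a diagonal matrix so that each tap has its own step size. The function name `leaky_update`, the three-tap vectors, and the particular step sizes are assumptions for illustration; the leakage term defaults to 1−2⁻²⁰, one end of the range stated above.

```python
def leaky_update(h, x_vec, eps, steps, zeta=1.0 - 2.0 ** -20):
    """One coefficient update with leakage: h <- zeta*h + A*x*eps.

    `steps` holds the diagonal of A, one step size per tap.
    """
    return [zeta * hi + mu_j * xj * eps
            for hi, mu_j, xj in zip(h, steps, x_vec)]


h = [1.0, -2.0, 0.5]
steps = [0.0, 0.5, 0.25]  # hypothetical per-tap step sizes (diagonal of A)

# With no input and no error, leakage alone pulls every coefficient toward zero.
idle = leaky_update(h, [0.0, 0.0, 0.0], 0.0, steps)

# With input and error present, the leak and the adaptation term are combined.
driven = leaky_update(h, [1.0, 1.0, 1.0], 0.1, steps)
```

Because the leakage term is so close to one, a single update changes a coefficient very little; the effect accumulates over many samples.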
Multiple Error Update for DSP Acceleration
The single MAC calculation for one error per coefficient update, over one sample time, k, to update the FIR filter coefficients, and calculate the next error is:
B: $h_i + x_i\,\varepsilon_{k-1} + x_{i-1}\,\varepsilon_k \rightarrow h_i$
A is computed first in 1 or 2 cycles per tap depending on the number of
For a TMSC54xx (single MAC unit, single temporary register) the B calculation is:
A and B together take five cycles every two samples on a C54xx processor. The total computation for each tap update for the C54xx processor is now: (2+3)/2=2.5 cycles/tap. Only single-port memory is required. Other single-
The computation of B using a dual-MAC processor is as follows:
Near-end signals will disturb adaptation of the coefficients even to the point of adding echo or distorting the signal. A double talk detector is used to prevent adaptation during periods of near-end input. The double talk detector works on frame boundaries and does not turn off adaptation between boundaries, so adaptation can continue during double talk for up to one frame time of thirty-two samples. The rest of the echo canceller should use a small step size in order to prevent divergence from the previously converged set of coefficients when this kind of double talk adaptation takes place.
Near-end background noise limits the amount of convergence that can be achieved by the algorithm. A small step size can guarantee convergence but at the cost of a larger misalignment of the coefficients and a slow convergence rate. A large step size gives a higher convergence rate but only in low-noise conditions. The stability limits discussed under Stability and Convergence show that the multiple error algorithm will have a lower upper bound for stability.
Robustness scaling works by using a large step size at initialization when the errors are large. As error diminishes a smaller step size is used. An increase of error after convergence is due either to double talk or a change in the echo path. The invention uses the following strategy to maintain a converged state, while allowing adaptation to a changing echo path:
Step 1 assumes the filter will be converging from zero. Large errors can be expected. The scale will only change at the ξ-limited τ rate until the scale eventually gets below the error limit and approaches the error mean. At this point, the filter is converged and scale is small. An error larger than the low error limit signifies double talk or echo path change. This strategy assumes double talk in an interval given by the τ constant. The scale will be increased after this interval, if either the double talk detector does not disable adaptation or the error decreases (double talk goes away).
Scale factor, Φk, affects the convergence rate during divergence. It is initialized to 0.1 and decreases as the filter taps converge to the room model. κ is the limiting factor for scale update, currently set to 1.1. Convergence is assumed when scaled |ek| is less than 90% (for the current κ value of 1.1) of the current scale.
The scale factor is updated using an exponential window given by the robustness time constant τ. An update increment of 1.8 times the last scale value is added to the window during divergence. Thus, the scale will grow but delayed by the time constant, τ. Small errors as compared to κ (i.e. during convergence) will add the increment |ek|/β. In one embodiment of the invention, β had the value 0.607. Thus, scale during convergence will follow the error energy biased by the value 1/β.
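The behavior described in the two preceding paragraphs can be sketched as a one-line exponential window. This is a sketch under stated assumptions, not the update equation of the invention: the function name `update_scale`, the window weight `alpha` (standing in for the effect of the time constant τ), and the test |e| < κΦ used to separate small errors from large ones are all assumed for illustration. The increments themselves (1.8 times the last scale, and |e|/β) and the values κ=1.1 and β=0.607 are taken from the text.

```python
def update_scale(phi, e, alpha, kappa=1.1, beta=0.607):
    """Return the next scale value from the current scale `phi` and error `e`.

    alpha is an assumed window weight in (0, 1); larger values change the
    scale more slowly.
    """
    if abs(e) < kappa * phi:       # assumed test for a "small" error
        increment = abs(e) / beta  # converging: follow the error, biased by 1/beta
    else:
        increment = 1.8 * phi      # diverging: 1.8 times the last scale value
    return alpha * phi + (1.0 - alpha) * increment


phi0 = 0.1                                       # initial scale given in the text
shrunk = update_scale(phi0, 0.01, alpha=0.99)    # small error: scale decreases
grown = update_scale(phi0, 1.0, alpha=0.99)      # large error: scale grows slowly
```

With κ=1.1 and β=0.607, the two increments nearly agree at the assumed boundary, since 1.1/0.607 is approximately 1.8.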
Initial scale, Φ0, should be set to the rms value of the input signal. This is accomplished by letting the scale adapt during a period before echo cancellation is enabled. The adapted value of Φ provides a better starting point than using a fixed value of Φ, which is used only at process initialization.
The implementation is as follows.
Scale Factor
The update equation is modified by a scale factor, Φk, that is recalculated each sample as follows.
Alternatively, $\Phi_{k+1} = \Phi_k$ can be used, which assumes the current scale should be used during the next adaptation interval. The first method is more stable than the second method and is preferred.
Otherwise,
can be used instead, which assumes that the scale should follow the size of the adaptation error, with β being a bias that accounts for the distribution of the error data. Gansler et al. (ibid.) relates eκ to β but this overcomplicates tuning the algorithm. β can be tuned to give good long term estimates of |eκ| in converged conditions.
The α used depends upon whether the loop is diverging or converging. If
the loop is converging and α=αf. Otherwise, the loop is diverging and α=αr. This differs from the Benesty et al. patent in which only one rate is used for tracking error. The convergence rate αf is set to give fast convergence on echo path change. The divergence rate αr is set to delay divergence by the expected length of possible double talk detection errors. The adaptive filter will quickly track to a convergence condition. αr determines how long to prevent the track away from the current condition to a new one, such as required by echo path changes. The rates are tuned for each application.
Error Update Calculation Smoothing
The robust error, ek, is used in the coefficient update calculation, based on the scale factor, as given by the following.
e′k=Cκsign(ek)
This replaces the error value used in the algorithm of the prior section. Thus, e′k is the value of the error used in the update algorithm that limits the amount of divergence from a convergent condition by means of the time constant and κ, the error magnitude limit.
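One common way to limit an error by a scale and a magnitude limit is to clip it, preserving its sign. The sketch below shows that technique as an assumption; it is not presented as the exact limiter of the invention, whose equation is given above. The function name `limit_error` and the example values are hypothetical.

```python
def limit_error(e, phi, kappa=1.1):
    """Clip the error magnitude at kappa * phi, keeping the sign of e.

    Assumed Huber-style limiter: errors inside the limit pass unchanged.
    """
    limit = kappa * phi
    if abs(e) <= limit:
        return e
    return limit if e > 0 else -limit


small = limit_error(0.05, 0.1)     # inside the limit: passed through unchanged
large_pos = limit_error(1.0, 0.1)  # clipped to +kappa * phi
large_neg = limit_error(-1.0, 0.1) # clipped to -kappa * phi
```

A limiter of this kind bounds how far a single large error, such as one caused by double talk, can move the coefficients in one update.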
Operation
Adaptation should be disabled when no echo is present and during double talk; i.e. when there is no signal to train on such that the filter will train to the background noise of the room, or when the filter will train to the near-end source. Cancellation occurs in all modes when the filter is in a convergent state. When adaptation is disabled, the echo path may change over time and the estimate will diverge. Thus, leakage should be used to unlearn (clear) the model in a time dependent fashion when adaptation is not being requested.
Quantization errors can accumulate in the coefficients as they are updated. Leakage prevents accumulation of errors.
Background noise affects the achievable cancellation performance and, at a certain point, can cause instability. Decreasing the step size slows tracking and convergence but increases the time during which adaptation can take place in the presence of noise. Tuning the relaxation step size and the exponential envelope parameters for the expected echo environment is essential. This environment includes the amount (length of time and strength) of double talk adaptation that may occur. Robust step size control, as described in the next section, is used to keep the algorithm stable in a double talk environment.
Stability and Convergence
Mean square error analysis of the LMS, and multiple error LMS, gives the following result for the stability limit (the step size limits for guaranteed convergence) of each algorithm; see S. Douglas, “Analysis of the Multiple-Error and Block Least-Mean-Square Adaptive Algorithms”, IEEE Transactions on Circuits and Systems—II: Analog and Digital Signal Processing, Vol. 42, No. 2, February 1995
where μ is the step size, N is the number of errors used in the update, and R is the data correlation matrix summed over the input delay vector of length N at time k
This bound assumes that the input delay vectors are independent and the inputs simple, which is not really true. The bound is actually much stricter in practice, especially for correlated Gaussian input conditions. However, the given bound implies that as N increases the stability limit decreases by approximately 4*N. Normalization effectively removes the effect of tr[R] from the right-hand side. tr[R] is the sum of the diagonal terms of R (the trace of the matrix). Thus the limit is
Normalized least mean square (N=1) requires a step size less than 0.67 and dual update (N=2) requires a step size less than 0.28, for non-zero near-end white noise. This suggests that the multiple error algorithm diverges at a lower near-end noise level than the nLMS algorithm, which has been observed in simulation of the invention. Statistical robustness techniques are used to maintain stability by dynamically scaling the step size to the stable range.
The invention thus provides a robust adaptive filter for noise reduction and an efficient method for adapting a programmable filter. Comparisons with other algorithms (single error update LMS and Fast Affine Projection (FAP)) show that, depending upon host processor, the invention uses 7.1–10.2 MIPS (million instructions per second), whereas single error update LMS uses 9.1–18.0 MIPS and FAP uses 12.2–20.4 MIPS. An adaptive filter constructed in accordance with the invention is relatively machine independent and is stable at low signal to noise ratios.
Having thus described the invention, it will be apparent to those of skill in the art that various modifications can be made within the scope of the invention. For example, circuits 72 and 83 (
Number | Name | Date | Kind |
---|---|---|---|
5384853 | Kinoshita et al. | Jan 1995 | A |
5410595 | Park et al. | Apr 1995 | A |
5949894 | Nelson et al. | Sep 1999 | A |
6223194 | Koike | Apr 2001 | B1 |
6377682 | Benesty et al. | Apr 2002 | B1 |
20030219113 | Bershad et al. | Nov 2003 | A1 |
Number | Date | Country | |
---|---|---|---|
20050152534 A1 | Jul 2005 | US |