1. Technical Field
This invention relates to a method for multiple channel acoustic echo cancellation (AEC), applicable to systems that derive a multi-channel spatialised signal from a monophonic signal, each channel of which is applied to a respective member of an array of loudspeakers at differing gains to give the percept or audible illusion of directionality. This class of spatialised signal will be termed here as steered mono. A steered mono system uses two or more gain elements to represent the spatialisation, which is mapped to a panning processor to generate corresponding loudspeaker outputs. In the embodiments to be described, a two-channel stereophonic signal is used, with two loudspeakers—a system known as “stereo from steered mono” (SSM), but the principles of the invention can be applied to systems with more than two channels. The invention has application in teleconferencing systems where each talker's voice is artificially given spatial positioning for the benefit of the listener.
2. Related Art
For comfortable speech communication in a teleconference system that uses a loudspeaker and microphone, as opposed to a headset, a method of acoustic echo cancellation (AEC) is required. For monophonic systems the topology shown in
Existing solutions to the stereo acoustic echo cancellation problem generally assume the system arrangement shown in
e(t)→0 (1)
With existing adaptive filter processes it is not possible to achieve a convergent set of filters such that
h1=ĥ1 and h2=ĥ2 (2)
Instead, a convergent solution such as the following is obtained
h1*g1+h2*g2=ĥ1*g1+ĥ2*g2 (3)
where * is the convolution operator. Note that Equation (3) satisfies Equation (1), but that Equation (2) is not a unique solution for Equation (3), so the values for h1 and h2 cannot be derived from this result.
If the filters g1 or g2 change, possibly due to the talker moving, the equality in Equation (3) no longer holds (unless Equation (2) is also met). Thus, the echo canceller no longer produces a convergent solution and the echo heard by the talker rises in level.
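The non-uniqueness described above can be illustrated numerically. The following is a minimal sketch only, assuming that the filters g1 and g2 reduce to scalar gains (as in the steered mono case) and using made-up three-tap echo paths; the helper name `combined` and all numerical values are hypothetical.

```python
def combined(h1, h2, g1, g2):
    """Aggregate echo path h1*g1 + h2*g2 for scalar gains g1 and g2."""
    return [g1 * a + g2 * b for a, b in zip(h1, h2)]

# True echo paths (illustrative values, not taken from the specification).
h1 = [1.0, 0.5, 0.25]
h2 = [0.5, -0.25, 0.0]

# Wrong estimates: an error d is added to h1 and subtracted from h2.
# With equal gains the two errors cancel, so Equation (3) still holds.
d = [0.25, -0.5, 0.125]
h1_est = [a + x for a, x in zip(h1, d)]
h2_est = [b - x for b, x in zip(h2, d)]

old_gains = (0.5, 0.5)    # gains in force while the filters converged
new_gains = (0.75, 0.25)  # gains after the talker position changes

def residual(gains):
    """True aggregate path minus estimated aggregate path."""
    true_path = combined(h1, h2, *gains)
    est_path = combined(h1_est, h2_est, *gains)
    return [t - e for t, e in zip(true_path, est_path)]

residual_old = residual(old_gains)  # zero: the echo is cancelled
residual_new = residual(new_gains)  # non-zero: the echo reappears
print(residual_old, residual_new)
```

In this sketch the estimates satisfy Equation (3) for the first gain setting without satisfying Equation (2), and the residual becomes non-zero as soon as the gains change.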
Various solutions to this problem have been proposed that either manipulate the loudspeaker signals, x1 and x2, or use the properties of the signals x1 and x2. The aim of these solutions is to make use of the cross-correlation properties of the two signals as it can be shown that a solution to Equation (2) exists when the two signals are sufficiently decorrelated. However, as the signals x1 and x2 are inherently highly correlated in a teleconferencing system, techniques that exploit the small decorrelated features in the signals have poor performance in anything but ideal conditions.
It has been proposed to add a small amount of independent white noise to the signals x1 and x2. It is shown that this significantly aids the convergence of the solution to that in Equation (2) by introducing some signal de-correlation. However, although adding noise in this manner does improve the convergence, the noise has to be added at such a level that it is undesirably audible.
According to an exemplary embodiment of the invention, there is provided a method of acoustic echo cancellation for a multiple channel steered spatialised signal, the steered spatialised signal being generated from a signal input modified according to respective spatialisation gain functions to generate a plurality of audio channels, the echo cancellation process using a combined spatialisation and echo path estimate, the estimate being derived from the gain functions applied to the respective channels, whereby when the gain functions applied in the respective channels are changed, an estimate of the echo is generated, the estimate being based on a previous estimate of the echo path and on the gain functions, the echo path estimates being used to generate an echo cancellation signal.
According to another aspect, there is provided apparatus for acoustic echo cancellation in a multiple channel steered spatialised audio system, the spatialised audio system comprising
signal input means for receiving an audio signal,
a plurality of audio output means for generating acoustic signals derived from the audio signal;
control means associated with the audio output means for generating gain control functions controlling the audio output means such that a spatialised version of the audio signal is generated by the said plurality of audio output means;
audio input means for detecting acoustic signals;
signal output means for transmitting a signal derived from the acoustic signals detected by the audio input means;
echo path estimation means comprising detection means for identifying changes in the gain control functions in the respective control means, and estimation means for generating an estimate of the echo path between the acoustic output means and the acoustic input means, the estimate being based on a previous estimate of the echo path and on the gain control functions detected by the detection means,
echo cancellation signal generation means for generating an echo cancellation signal derived from the spatialised audio signals generated by the control means and the estimates derived by the echo path estimation means, and
signal combination means for applying the echo cancellation signal to the signal generated by the audio input means.
This exemplary embodiment of the invention is an adaptation of the monophonic LMS process. It avoids multiple updates to two or more echo path estimates, such as ĥ1 and ĥ2, and reduces the number of filter operations, such as ĥ1*x1 and ĥ2*x2, required when compared with existing stereo echo cancellation processes. Additionally, this embodiment of the invention uses the spatialisation parameters in the adaptive process, unlike existing stereo echo cancellation processes. The LMS update is modified to take into account the spatialisation parameters that are used to update the aggregated echo path estimate each time the spatialisation changes. After the Nth spatialisation change (where N is the number of channels in the system), the aggregated echo path estimate converges towards the aggregate echo path for future changes in spatialisation. Prior to the Nth spatialisation change the process converges to a local solution for the aggregated echo path estimate, so that some echo signal reduction is still given in the learning stage of the process.
The learning stage can be made part of a set-up phase prior to use of the system for live traffic. For example, the required number of spatialisation changes can be achieved by operating the monophonic LMS process for each channel in turn, by setting the gains of the other channels to zero.
An embodiment of the invention will now be described, by way of example, with reference to the Figures in which
The monophonic system illustrated in
In the general case shown in
h1*g1+h2*g2=ĥ1*g1+ĥ2*g2 (3)
but this does not necessarily imply that Equation (2) also holds:
h1=ĥ1 and h2=ĥ2 (2)
If the spatialisation, and hence the functions g1,g2 are changed, it will be seen that the adaptive filters must be reset to correspond to the new spatialisation.
In a steered system such as that illustrated in
The operation of the adaptive processor 24, 25 will now be described, with reference to
Following an initialisation step 101 in which notional values for the gain functions g1,g2 are set, the process runs on an iterative loop for each sampling period n as follows.
Firstly, values k1, k2, k3 are set (step 102). These identify the last three sampling periods at which the spatialisation values g1,g2 changed. If the spatialisation gain values g1,g2 have not changed since the previous sample n−1, the values of k1, k2, k3 are the same as for the previous sample. However, if the values have changed, then k3 is set to the previous value of k2, k2 is set to the previous value of k1, and k1 is set to n−1.
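The bookkeeping of step 102 may be sketched as follows. This is an illustrative sketch only: the function name `update_change_indices`, the notional starting values of zero, and the example gain sequence are assumptions and are not taken from the specification.

```python
def update_change_indices(k, n, gains_now, gains_prev):
    """Return the updated (k1, k2, k3) for sampling period n.

    k          -- tuple (k1, k2, k3) from the previous sampling period
    gains_now  -- (g1, g2) at sample n
    gains_prev -- (g1, g2) at sample n-1
    """
    k1, k2, k3 = k
    if gains_now != gains_prev:
        # Spatialisation changed: shift the history and record n-1.
        return (n - 1, k1, k2)
    # No change: keep the values from the previous sample.
    return (k1, k2, k3)

# Hypothetical gain sequence with changes at n = 3 and n = 7.
gains = [(1.0, 0.0)] * 3 + [(0.5, 0.5)] * 4 + [(0.0, 1.0)] * 2

k = (0, 0, 0)  # assumed notional starting values
history = []
for n in range(1, len(gains)):
    k = update_change_indices(k, n, gains[n], gains[n - 1])
    history.append(k)

print(history[-1])
```

Between changes the tuple is carried forward unchanged, so the matrix of step 103 only needs recomputing when the tuple differs from that of the previous sample.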
The estimated gain function is then determined (step 103). This is the matrix
If the spatialisation values g1,g2 are unchanged, this matrix is also unchanged and does not need to be recalculated. The inverse of this matrix is then determined.
Again, if the spatialisation values, g1,g2 are unchanged, this matrix is also unchanged and does not need to be recalculated.
Next, (step 104), if r=n−k1+1 is less than the number of terms L in the estimated echo path vector ĥ (in other words, if the number of samples r elapsed since the last spatialisation change is less than L), one term in the estimated echo path vector ĥn-1 is amended as follows
(ĥn-1 is the specific instance of the estimated echo path function ĥ from the previous iteration). All other terms ĥn-1(0) . . . ĥn-1(r−1) and ĥn-1(r+1). . .ĥn-1(L−1) remain unchanged.
The error cancellation signal snT ĥn-1 (where sn is the vector representing the last L samples of the input signal s(n)) is then generated using the revised estimated echo path vector ĥn-1 (step 105) and subtracted from the signal y(n) to generate the output signal e(n).
The estimated echo path vector ĥn-1 is then adapted in response to the echo signal e(n) (step 106) for use in the next iteration.
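$\hat{h}_n = \hat{h}_{n-1} + \mu\, s_n\, \varepsilon(n)$
where $\varepsilon(n) = e(n) / (s_n^T s_n)$ is the error normalised by the input signal energy, and μ is the step size.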
The process is stable provided that the spatialisation changes on a longer timescale than the period L, and that 0≤μ<2.
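Steps 105 and 106 may be sketched as a single function. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `nlms_step`, the small `guard` term that protects against division by zero, the three-tap echo path and the random test input are all hypothetical, and the normalisation is the standard normalised LMS form in which the error is divided by the input energy.

```python
import random

def nlms_step(h_est, s_vec, y, mu=0.5, guard=1e-12):
    """One iteration of steps 105 and 106.

    h_est -- current aggregate echo path estimate (length L)
    s_vec -- last L input samples, most recent first
    y     -- current microphone sample y(n)
    Returns the output sample e(n) and the updated estimate.
    """
    # Step 105: echo cancellation signal and output signal e(n).
    echo_est = sum(h * s for h, s in zip(h_est, s_vec))
    e = y - echo_est
    # Step 106: normalise the error by the input energy, then adapt.
    energy = sum(s * s for s in s_vec) + guard  # guard is an assumption
    norm_err = e / energy
    new_h = [h + mu * s * norm_err for h, s in zip(h_est, s_vec)]
    return e, new_h

# Demonstration with a fixed spatialisation and a noise-free echo.
rng = random.Random(0)
h_true = [0.5, -0.25, 0.125]  # hypothetical aggregate echo path
L = len(h_true)
h_est = [0.0] * L
s_vec = [0.0] * L
for _ in range(2000):
    s_vec = [rng.uniform(-1.0, 1.0)] + s_vec[:-1]
    y = sum(h * s for h, s in zip(h_true, s_vec))
    e, h_est = nlms_step(h_est, s_vec, y)

print(h_est, e)
```

Each call costs on the order of 2L multiplications (L for the filter output and L for the update), which matches the complexity stated for steps 105 and 106.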
The computational complexity of steps 105-106 in the above process is the same as the normalised LMS process which is of the order 2L. The number of computations is of the order of two multiplications and one division for the matrix inversion used in step 103. As this is only performed once after each change in spatialisation it adds little to the complexity of the process for large L. Step 104 is only calculated in the first L samples after a spatialisation change and is insignificant for large L. Thus, when the process shown above is used for acoustic echo cancellation with a steered mono system, for which it is likely that L>100, the process has a complexity of approximately 2L.
A mathematical description follows. This will start from the system shown in
As shown in
Let the input to the spatialisation block at sample time n be represented by a column vector sn=[s(n) s(n−1) . . . s(n−(L−1))]T, the input to the listener end microphone by yn=[y(n) y(n−1) . . . y(n−(L−1))]T and the two loudspeaker-to-microphone echo paths be length L column vectors h1 and h2 (which incorporate the loudspeaker and microphone impulse responses), then
yn=g1(n)Snh1+g2(n)Snh2 (4)
where the spatialisation is represented as the gain values g1(n) and g2(n) which are constant over the sample periods n−(L−1) . . . n, and Sn=[sn. . . sn−(L−1)]T. (This is a “Toeplitz” matrix, that is, a symmetrical matrix of order L×L, having the terms of sn in the first row and the first column, the terms of sn-1 in the second row and column, and so on). It can be shown that h1 and h2 cannot be solved from Equation (4).
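One way to see why Equation (4) alone is insufficient is offered here as an explanatory sketch; it is not part of the original derivation. Because g1(n) and g2(n) are scalars that are constant over the window, Equation (4) may be factorised as

```latex
y_n = S_n \bigl( g_1(n)\, h_1 + g_2(n)\, h_2 \bigr)
```

so that a single spatialisation setting provides L equations in the 2L unknown coefficients of h1 and h2. Only the combination g1(n)h1 + g2(n)h2 is observable from this one set of observations.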
However, now consider using a second set of input and output observations at sample time n+a where a≥L and
g1(n+a)=g1(n+a−1)=. . .=g1(n+1)≠g1(n)
g2(n+a)=g2(n+a−1)=. . .=g2(n+1)≠g2(n) (5)
in other words the functions g1 and g2 have changed between sample time n and sample time n+1, but then remained unchanged between time n+1 and sample time n+a.
Hence,
where IL is the L×L identity matrix,
and ⊗ is the Kronecker product. (The Kronecker product of two matrices A and B is given by multiplying matrix B separately by each individual term in matrix A and forming a new matrix, whose order is the product of the orders of the original two matrices, with the resulting terms.)
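The Kronecker product as defined above can be sketched in a few lines. The function name `kron`, the example gain matrix and the choice L=2 are illustrative assumptions only.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Each term of `a` is replaced by that term multiplied by the whole
    of matrix `b`.
    """
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

# Hypothetical 2x2 gain matrix and the 2x2 identity matrix (L = 2).
G = [[1.0, 0.0],
     [0.5, 0.5]]
I2 = [[1, 0],
      [0, 1]]

GI = kron(G, I2)
for row in GI:
    print(row)
```

The result is a 4×4 matrix in which each gain value is spread along the diagonal of its own 2×2 block.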
The solution to Equation (6) is
and using Kronecker product identities
Thus, a solution for h1 and h2 exists if the signal s is persistently exciting (i.e. it has a full spectral content) and the matrix Gn,n+a is non-singular, that is, it has an inverse matrix. The non-singular condition for Gn,n+a is met if the spatialisation values at sample times n and n+a are different and not scalar multiples of each other (i.e. g1(n)/g2(n)≠g1(n+a)/g2(n+a)). Ideally the values should be sufficiently different such that the solution of Equation (9) is well conditioned.
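The non-singular condition can be checked directly for the two-channel case. The sketch below assumes that the gain matrix has one row per spatialisation setting, containing the pair (g1, g2) for that setting; the function names, the tolerance and the gain values are hypothetical.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m, tol=1e-9):
    """Inverse of a 2x2 matrix, or ValueError if it is (nearly) singular."""
    d = det2(m)
    if abs(d) < tol:
        raise ValueError("gain matrix is singular")
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

# Two distinct spatialisation settings: the matrix is invertible.
G_good = [[0.75, 0.25],
          [0.25, 0.75]]
G_good_inv = inverse2(G_good)

# The second setting is a scalar multiple of the first, so
# g1(n)/g2(n) equals g1(n+a)/g2(n+a) and no inverse exists.
G_bad = [[0.5, 0.25],
         [1.0, 0.5]]
try:
    inverse2(G_bad)
    bad_is_singular = False
except ValueError:
    bad_is_singular = True

print(G_good_inv, bad_is_singular)
```

A determinant that is small but non-zero indicates settings that are nearly scalar multiples of each other, in which case the solution is poorly conditioned.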
Having established that a solution exists the adaptive process for the solution is now derived from the LMS process. The normalised LMS (NLMS) process is used to perform monophonic echo cancellation as discussed with reference to
$e(n) = y(n) - s_n^T \hat{h}_{n-1}$ (10)
$\varepsilon(n) = e(n) / (s_n^T s_n)$ (11)
$\hat{h}_n = \hat{h}_{n-1} + \mu\, s_n\, \varepsilon(n)$ (12)
where e(n) is the echo signal, μ is the step size parameter and ĥn is the echo path estimate at the nth sample instance. The single channel normalised LMS equations above can be modified for the steered mono case by using a single aggregate echo path estimate and redefining ĥn-1 as
where ĥ1(t) and ĥ2 (t) are functions representing the two echo path estimates at sample interval n. Likewise define h as the combination of h1(t) and h2(t) in a form equivalent to that shown in Equation (13)
The task is then to use and update ĥ such that the normalised LMS updates of Equations 10, 11 and 12 are used for the echo cancellation, rather than using two echo path estimates explicitly. If the values of g1(n) and g2(n) are constant for all n, then the updates in Equations 10, 11 and 12 can be used unchanged to determine an estimate of h, as h is constant over time. However, if g1(n) and g2(n) change over time then this solution cannot be used, as a change in h is not taken into account in the LMS updates of Equations 10, 11 and 12.
Consider three sample epochs i, i−a and i−b where
b>>L>a (15)
and
g1(i)=g1(i−1)=. . .=g1(i−a)≠g1(i−a−1)=g1(i−a−2)=. . .=g1(i−b)
g1(i−b)≠g1(i−b−1)=g1(i−b−2)=. . .=g1(i−∞) (16)
and likewise for g2(n), i.e. values of g1(n) and g2(n) change only on the epochs i−a and i−b.
Consider the value of the jth coefficient in the combined echo path at the epochs i−a−1 and i−b−1 (i.e. just prior to the spatialisation changes) which from Equations 14 and 16 is given by
hi-b-1(j)=h1(j)g1(i−b−1)+h2(j)g2(i−b−1) (17)
hi-a-1(j)=h1(j)g1(i−a−1)+h2(j)g2(i−a−1) (18)
Equations 17 and 18 can be expressed as
and thus
using the definition of G from (7).
Further consider the value of the jth coefficient in the combined echo path at the epoch i which, from (14) and (16) is given by
If the elements of G−1 are defined by a variable γ such that
then from (19), (21) and (22)
hi(j)=(γ00hi-b-1(j)+γ01hi-a-1(j))g1(i)+(γ10hi-b-1(j)+γ11hi-a-1(j))g2(i) (23)
This equation is the additional update required for the normalised LMS update of Equations (10), (11) and (12). Note that from (21) only one coefficient in h need be updated in each sample period to take account of a spatialisation change.
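The remapping of a single coefficient according to Equation (23) may be sketched as follows. This is a sketch under stated assumptions: G is taken to have the gains at epoch i−b−1 in its first row and the gains at epoch i−a−1 in its second row, consistent with Equations 17 and 18, and γ is taken to be its inverse. The function name `remap_coefficient` and all numerical values are hypothetical.

```python
def remap_coefficient(h_b, h_a, g_b, g_a, g_new):
    """Apply Equation (23) to the jth coefficient.

    h_b, h_a -- jth combined echo path coefficient at epochs
                i-b-1 and i-a-1 respectively
    g_b, g_a -- (g1, g2) gain pairs in force at those epochs
    g_new    -- (g1(i), g2(i)), the gains at the current epoch
    """
    # Inverse of the assumed G = [[g_b[0], g_b[1]], [g_a[0], g_a[1]]].
    det = g_b[0] * g_a[1] - g_b[1] * g_a[0]
    y00, y01 = g_a[1] / det, -g_b[1] / det
    y10, y11 = -g_a[0] / det, g_b[0] / det
    # Implied per-channel coefficients h1(j) and h2(j).
    h1_j = y00 * h_b + y01 * h_a
    h2_j = y10 * h_b + y11 * h_a
    # Recombine with the new gains.
    return h1_j * g_new[0] + h2_j * g_new[1]

# Hypothetical true coefficients, used only to build the inputs.
h1_j_true, h2_j_true = 1.0, -0.5
g_b, g_a, g_new = (1.0, 0.0), (0.5, 0.5), (0.25, 0.75)
h_b = h1_j_true * g_b[0] + h2_j_true * g_b[1]  # Equation (17)
h_a = h1_j_true * g_a[0] + h2_j_true * g_a[1]  # Equation (18)

h_i = remap_coefficient(h_b, h_a, g_b, g_a, g_new)
expected = h1_j_true * g_new[0] + h2_j_true * g_new[1]
print(h_i, expected)
```

In this example the remapped coefficient equals the value that the true combined echo path takes under the new gains, although the individual paths h1 and h2 are never stored by the process.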
The process can be extended to a system that has more than two channels, by making a small modification to the process. Specifically for an N-channel system the previous N+1 changes in the spatialisation position are recorded in variables kN+1, . . . , k1 from the least to most recent respectively. The matrix G is generalised as
The step 104 may be generalised as
The steps 105 and 106 remain unchanged.
To demonstrate the described process both the stereo normalised least mean square process according to the invention and the normalised least mean square process were simulated using the configuration shown in
The performance of the process according to the invention can also be observed for speech signals in
The process described uses the normalised least mean square adaptive filter to form the update of the combined echo path estimate. However, any current or future adaptive process that updates an estimate of an unknown filter on a sample-by-sample basis can be used in place of the described normalised least mean square algorithm. The only modification required is to replace the process step 106 with another filter update. Suitable existing examples are fast affine projection, least mean squares or recursive least squares adaptive filters.
Number | Date | Country | Kind |
---|---|---|---|
99304040 | May 1999 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/GB00/01904 | 5/18/2000 | WO | 00 | 10/25/2001 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO00/72567 | 11/30/2000 | WO | A |