1. Field of the Invention
Embodiments of the present invention generally relate to a method and apparatus for noise and echo cancellation in a two-microphone system subject to cross-talk.
2. Description of the Related Art
In the presence of cross-talk, noise leakage and echo interference are common on both the primary and the reference channel inputs. There is therefore a need to remove interfering noise and echo from an acoustic system with two microphone inputs that suffers from cross-talk.
Embodiments of the present invention relate to a method and apparatus for joint noise and echo cancellation of a two-microphone system subject to cross-talk. The method includes retrieving the primary microphone signal, the reference microphone signal and the reference echo signal, utilizing the retrieved primary microphone signal, reference microphone signal and reference echo signal to estimate the cross-talk and echo in the reference channel and the noise leakage and echo in the primary channel, estimating the primary output by removing the noise leakage and the echo estimate from the primary channel, estimating the reference output by removing the cross-talk and echo estimate from the reference channel, when an echo is detected in the reference echo signal, adapting filters H13 and H23 by NLMS, when the estimated primary output includes speech, adapting filters H12 and H21 by de-correlation, when neither echo nor speech is detected, adapting filter H12 by NLMS, obtaining the primary output and the reference output by post-filtering the estimated primary output and the estimated reference output, respectively, and utilizing the primary output to extract speech from a two-microphone system subject to cross-talk, noise and echo.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
Described herein is a method and apparatus for joint noise and echo cancellation in a multi-microphone setup, which includes an assumed mixing model for mixtures of speech, noise and echo. In addition, a de-mixing algorithm is included to invert the mixing model. The algorithm may use four filters to estimate the mixing filters, together with a Voice Activity Detector (VAD) that is used to obtain references for each filter adaptation. Thus, in one embodiment, a model-based algorithm is utilized that simultaneously models the cross-talk, noise leakage and echo paths and adaptively removes noise and echo from the primary microphone channel. In one embodiment, it is assumed that a clean reference of the echo is available, usually from the downlink.
Therefore, the method and apparatus may combine the adaptive problems of a two-microphone noise canceller and echo reduction into one algorithm. In one embodiment, two Voice Activity Detectors (VADs) are used to identify the presence of noise, speech and echo, and the algorithm uses different adaptation strategies depending on which of these activities is present. Furthermore, the noise reduction is robust to the presence of cross-talk between the two microphones.
As a result, the noise cancellation performance is strong, even for non-stationary noise such as babble; the integrated noise and echo cancellation design reduces potential interaction issues between the noise adaptation and the echo cancellation; the echo cancellation performs well in the presence of noise; and the implementation is possible in both the time domain and the frequency domain.
Hence, the algorithm performs well at separating speech from a mixture input that includes echo and noise in the presence of cross-talk. The algorithm adds an echo reference input from the downlink signal to remove far-end echo from the primary channel input. To build the algorithm, an environmental mixing model is utilized for cases in which the mixtures include speech, noise and echo.
The mixing model may make some assumptions. One is unity gain for the direct paths. The other concerns the relation between the primary-echo channel and the reference-echo channel: echo from the downlink signal influences the primary and reference channel inputs, but the echo reference may not be affected in the opposite direction. Next, a de-mixing algorithm based on the mixing model is developed. Since the algorithm may utilize four filters that are to be adapted, a filter adaptation method may be implemented.
where H12(z) is an FIR filter modeling the noise leakage from the reference channel to the primary channel, H21(z) is the filter modeling the speech leakage from the primary channel to the reference channel, H13(z) models the echo reference leakage into the primary channel and H23(z) models the echo reference leakage that flows into the reference channel.
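The mixing model described by these four filters may be illustrated by a short simulation. The following is a minimal sketch only, assuming causal FIR filters, unity-gain direct paths and an echo reference that is unaffected by the microphones; the function name `mix`, the signal names and all filter coefficients are hypothetical values chosen for illustration and are not taken from the specification.

```python
import numpy as np

def mix(s, n, e, h12, h21, h13, h23):
    """Simulate the assumed mixing model with unity-gain direct paths."""
    L = len(s)
    conv = lambda h, x: np.convolve(h, x)[:L]  # causal FIR filtering, truncated to L
    x1 = s + conv(h12, n) + conv(h13, e)  # primary: speech + noise leakage + echo
    x2 = conv(h21, s) + n + conv(h23, e)  # reference: speech cross-talk + noise + echo
    x3 = e.copy()                         # echo reference passes through unchanged
    return x1, x2, x3

# Hypothetical sources and filter coefficients (illustrative values only).
rng = np.random.default_rng(0)
s, n, e = rng.standard_normal((3, 1000))
h12 = np.array([0.5, 0.2])
h21 = np.array([0.3, 0.1])
h13 = np.array([0.4, 0.1])
h23 = np.array([0.2, 0.05])
x1, x2, x3 = mix(s, n, e, h12, h21, h13, h23)
```

Here `x1`, `x2` and `x3` stand for the primary microphone input, the reference microphone input and the echo reference input, respectively.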
From the mixing model, the Cross-talk Resistant Adaptive Noise and Echo Canceller (CTR-ANEC) de-mixing algorithm can be developed. By a filter inversion operation, each source on the primary and reference channels may be separated. Eq. (2) represents the de-mixing system in matrix form. The echo reference input may pass through the mixing and de-mixing systems unchanged.
Thus, the de-mixing algorithm may be implemented in a feed-forward fashion.
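The inversion may be illustrated per frequency bin. The following is a minimal sketch under stated assumptions: the mixing matrix is taken to be [[1, H12, H13], [H21, 1, H23], [0, 0, 1]], the filters are known exactly rather than adapted, and the remaining factor 1/(1 - H12·H21) is applied as a post-filter after the feed-forward stage. The function name `demix` and all coefficient values are hypothetical, and the post-filter form is one plausible reading of the description rather than a formula stated in the specification.

```python
import numpy as np

def demix(X1, X2, X3, H12, H21, H13, H23):
    """Per-bin inversion of the assumed 3x3 mixing matrix."""
    U1 = X1 - H13 * X3              # primary with the echo estimate removed
    U2 = X2 - H23 * X3              # reference with the echo estimate removed
    Y1 = U1 - H12 * U2              # feed-forward stage: remove noise leakage
    Y2 = U2 - H21 * U1              # feed-forward stage: remove speech cross-talk
    post = 1.0 / (1.0 - H12 * H21)  # assumed post-filter completing the inversion
    return post * Y1, post * Y2

# Hypothetical sources and filters, mixed directly in the frequency domain.
N = 256
rng = np.random.default_rng(1)
S, Nn, E = np.fft.rfft(rng.standard_normal((3, N)), axis=1)
H12, H21, H13, H23 = (
    np.fft.rfft(h, N) for h in ([0.5, 0.2], [0.3, 0.1], [0.4, 0.1], [0.2, 0.05])
)
X1 = S + H12 * Nn + H13 * E
X2 = H21 * S + Nn + H23 * E
X3 = E
S_hat, N_hat = demix(X1, X2, X3, H12, H21, H13, H23)
```

With exact filters, `S_hat` and `N_hat` reproduce the speech and noise spectra; in practice the filters are estimates, so the separation is only as good as the adaptation.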
In one embodiment, NLMS is utilized because it is convenient to implement. The two channel VAD outputs, from the primary and echo channel inputs, are also used to control filter adaptation. The primary channel VAD may be active during the time interval when speech input is present on the primary channel. Likewise, the echo channel VAD may be active during the time when echo input is detected.
The cross-talk and noise leakage filters H12(z) and H21(z) may be estimated using a de-correlation filter adaptation method based on steepest descent. More specifically, filter H12(z) may be adapted while there is no speech input on the primary channel. Similarly, filter H21(z) may be adapted while speech input is present on the primary channel.
$$h_{12}^{k+1} = h_{12}^{k} + \mu_{12}\, x_1(k)$$
$$h_{21}^{k+1} = h_{21}^{k} + \mu_{21}\, x_2(k)$$
The step sizes for each filter are given as follows.
The echo filters H13(z) and H23(z) may be estimated by the Normalized Least Mean Squares (NLMS) algorithm. The stereo VAD outputs are used to select which filters are adapted and their adaptation scheme.
The echo filters H13(z) and H23(z) are updated in the time domain by the following equations,
and the step sizes for each filter are updated by the following equations.
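For illustration, a time-domain NLMS update for one echo filter may be sketched as follows. This is the textbook NLMS recursion, not the specific update or step-size equations of this embodiment: a fixed step size `mu` is assumed in place of the adaptive step sizes mentioned above, and the function name `nlms_identify`, the regularization constant `eps` and the filter coefficients are hypothetical.

```python
import numpy as np

def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Estimate an FIR filter h such that d(k) ~ sum_i h[i] * x(k - i)."""
    h = np.zeros(taps)
    buf = np.zeros(taps)                # most recent input samples, newest first
    for k in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[k]
        err = d[k] - h @ buf            # residual after removing the echo estimate
        h = h + mu * err * buf / (buf @ buf + eps)  # normalized step (fixed mu assumed)
    return h

# Hypothetical echo-only case: the primary input contains only filtered echo.
rng = np.random.default_rng(2)
echo_ref = rng.standard_normal(5000)
h13_true = np.array([0.4, 0.1, -0.05])
primary = np.convolve(h13_true, echo_ref)[: len(echo_ref)]
h13_hat = nlms_identify(echo_ref, primary, taps=3)
```

The same routine could be applied to the reference microphone input to estimate H23(z). Normalizing by the input energy keeps the update stable when the level of the echo reference varies.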
Since different filter adaptation methods, de-correlation and NLMS, may be used within the proposed algorithm, the VAD outputs from the primary and echo reference channel inputs play an important role in the filter adaptation scheme. The two channel VAD outputs may be used to decide which filter should be adapted for given primary and echo reference inputs.
The VAD outputs allow for a number of filter adaptation scenarios; however, only the practical cases are selected. Table 1 shows the filter adaptation scheme for adapting the filters in the CTR-ANEC. Some cases may not occur in the real world. For example, speech-only and echo-only cases may not be expected on the primary channel input.
Table 1 includes the filter adaptation for the case of double-talk with noise on the primary input. According to Table 1, all four filters are adapted in the CTR-ANEC in this case. In a real-world implementation, the four filters may not be adapted simultaneously. Instead, in one embodiment, two filters, Ĥ12(z) and Ĥ21(z), are adapted first and then frozen. Next, the remaining two filters, Ĥ13(z) and Ĥ23(z), are adapted with the first two filters frozen.
If neither echo nor speech is detected, then filter H12 is adapted by NLMS. The method proceeds to obtain the primary output and the reference output by post-filtering the estimated primary output and the estimated reference output, respectively. The primary output and the reference output are used to cancel the echo and noise of a two-microphone system subject to cross-talk.
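The VAD-driven selection of filters may be sketched as a simple dispatch. This is an illustrative reading of the adaptation rules stated in this description (echo detected: H13 and H23 by NLMS; speech detected: H12 and H21 by de-correlation; neither: H12 by NLMS), not a reproduction of Table 1. The function name `select_adaptation` and the string labels are hypothetical.

```python
def select_adaptation(echo_detected: bool, speech_detected: bool) -> dict:
    """Map the two VAD decisions to the filters to adapt and the method to use."""
    plan = {}
    if speech_detected:
        # Cross-talk and noise leakage filters, adapted by de-correlation.
        plan["H12"] = "decorrelation"
        plan["H21"] = "decorrelation"
    if echo_detected:
        # Echo path filters, adapted by NLMS.
        plan["H13"] = "NLMS"
        plan["H23"] = "NLMS"
    if not echo_detected and not speech_detected:
        # Noise-only input: only the noise leakage filter is adapted.
        plan["H12"] = "NLMS"
    return plan

for echo, speech in [(False, False), (False, True), (True, False), (True, True)]:
    print(echo, speech, select_adaptation(echo, speech))
```

In the double-talk case the returned dictionary lists H12 and H21 before H13 and H23, which matches the staged order in which the first pair may be adapted and frozen before the second pair is adapted.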
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims benefit of U.S. provisional patent application Ser. No. 61/414,943 filed Nov. 18, 2010, which is herein incorporated by reference.
Number | Date | Country
---|---|---
61/414,943 | Nov. 18, 2010 | US