The present application claims priority to Chinese patent application No. 201410857165.0, filed on Dec. 30, 2014, and entitled “METHOD AND APPARATUS FOR REDUCING DISTORTION ECHO”, and the entire disclosure of which is incorporated herein by reference.
The present disclosure generally relates to the field of echo cancellation, and more particularly, to a method and apparatus for reducing distortion echo.
In audio systems, echo interference cannot be avoided because of signal reflection paths. In audio communications, echoes are generally classified into electric echoes and acoustic echoes. Electric echoes are caused by signal reflection resulting from impedance mismatch. Acoustic echoes generally arise in the following scenario. At a receiving side, voice played by a loudspeaker is picked up by a voice receiving device and then sent back to the talker side. Acoustic echoes include direct echo and indirect echo. Voice that comes from the loudspeaker and is received directly by the voice receiving device is called the direct echo. Voice that comes from the loudspeaker, is reflected one or more times along different paths (for example, by buildings or objects in buildings), and is then received by the voice receiving device is called the indirect echo. After channel delay, the echoes are sent to the talker side and heard by the talker, which interferes with the audio at the talker side, reduces audio clarity, and degrades audio communication quality.
In the 1960s, to eliminate the influence of echoes on audio communications, Sondhi at Bell Labs proposed a self-adaptive filtering method to realize echo cancellation.
In echo cancellation technologies, because acoustic echoes are characterized by multiple paths, long delay, slow attenuation, time variation and nonlinearity, acoustic echo cancellation (AEC) places strict requirements on the performance of the self-adaptive filter 4. For a handheld device, which is seriously nonlinear, the requirements on the self-adaptive filter 4 may be even stricter. As a handheld device is relatively small, the micro speaker therein is much smaller than a normal speaker. To satisfy the volume requirement in hands-free communication, the micro speaker generally works in a nonlinear region, which results in more serious audio distortion. In this situation, the self-adaptive filter 4 may provide very little echo loss and work unsteadily, and it may provide no echo loss at all when facing a transient signal. Therefore, a method and apparatus for reducing distortion echo are required, which can steadily provide echo loss with high amplitude in a situation where a speaker has relatively serious distortion.
In embodiments of the present disclosure, a method and apparatus for reducing distortion echo are provided, which can steadily provide echo loss with high amplitude under a situation that a speaker has relatively serious distortion.
In an embodiment of the present disclosure, a method for reducing distortion echo is provided, including: performing K-path amplification to a downlink reference signal x(t) to obtain K-path preprocessed signals, where K is a positive integer; performing a pre-distortion process to the K-path preprocessed signals, respectively, to obtain K-path pre-distorted signals rk(t), where k=1, 2, . . . , K; performing a filtering process to the downlink reference signal x(t) and each of the K-path pre-distorted signals using a self-adaptive filter which corresponds to the downlink reference signal x(t) and the corresponding pre-distorted signal, to obtain (K+1) path filtering signals; calculating differences between a target signal d(t) and each of the (K+1) path filtering signals to obtain (K+1) path error signals; performing a minimum-value fusion process to the (K+1) path error signals to obtain a residual signal e(t); and outputting the residual signal e(t) as a final self-adaptive echo cancellation.
Optionally, a pre-distortion mapping function used in the pre-distortion process meets the following equation:
rk(t)=fk(pk(t)),
where rk(t) is a kth-path pre-distorted signal, pk(t) is a kth-path preprocessed signal, fk(x)≠cx, fk(x)≠c, c is a constant, and k=1, 2, . . . , K.
Optionally, to control an input range and an output range of the pre-distortion mapping function by normalization, the pre-distortion mapping function meets the following equation:
$$r_k(t) = x_{max}\, f_k\!\left(\frac{p_k(t)}{x_{max}}\right),$$
where xmax is a maximum amplitude of the downlink reference signal x(t), −1<=fk(x)<=1, and k=1, 2, . . . , K.
Optionally, when an echo path model in the self-adaptive filter is a time-domain model hk,t, the error signal is represented as follows:
$$e_0(t) = d(t) - y_0(t), \qquad y_0(t) = h_{0,t} \otimes x(t),$$
$$e_k(t) = d(t) - y_k(t), \qquad y_k(t) = h_{k,t} \otimes r_k(t),$$
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), h0,t is an Mth-order Finite Impulse Response (FIR) filter at time point t, h0,t=[h0,t (1), h0,t (2), . . . , h0,t (M)]T, ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), hk,t is the Mth-order FIR filter of the kth path at time point t, hk,t=[hk,t (1), hk,t (2), . . . , hk,t (M)]T, k=1, 2, . . . , K, superscript T is a transposition symbol, ⊗ is a convolution symbol, t is a time index, M is the filter order and is chosen to suit the echo path model, M is within a range from 0.01*fs to fs, and fs is the sampling frequency.
Optionally, when the echo path model in the self-adaptive filter is a time-domain model hk,t, the echo path model corresponding to each path pre-distorted signal is updated to:
hk,t+1=hk,t+Δhk,t,
where Δhk,t is the updated item of the coefficient of the self-adaptive filter, Δhk,t is an Mth-order vector, M is a positive integer, and k=1, 2, . . . , K.
Optionally, when the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the error signals e0(t), e1(t), . . . , eK(t) are represented by:
$$e_0(t) = d(t) - y_0(t),$$
$$[y_0(t-(N-M)+1),\ y_0(t-(N-M)+2),\ \ldots,\ y_0(t)]^T = [0_{(N-M)\times M}\ \ I_{(N-M)\times(N-M)}]\,F^{-}\,[H_{0,t}\cdot R_{0,t}],$$
$$e_k(t) = d(t) - y_k(t),$$
$$[y_k(t-(N-M)+1),\ y_k(t-(N-M)+2),\ \ldots,\ y_k(t)]^T = [0_{(N-M)\times M}\ \ I_{(N-M)\times(N-M)}]\,F^{-}\,[H_{k,t}\cdot R_{k,t}],$$
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), superscript T is a transposition symbol, 0(N−M)×M is a zero matrix having (N−M) rows and M columns, I(N−M)×(N−M) is an (N−M)-by-(N−M) identity matrix, F is a discrete Fourier transform matrix, F− is an inverse discrete Fourier transform matrix, · is a dot product symbol, H0,t is an N-point vector at time point t, R0,t=F[x(t−N+1), x(t−N+2), . . . , x(t)]T, Hk,t is the N-point vector of the kth path at time point t, Rk,t=F[rk(t−N+1), rk(t−N+2), . . . , rk(t)]T, t is a time index, N is the length of a signal frame, M is an order number, M is within a range from 0.01*fs to fs, fs is the sampling frequency, and k=1, 2, . . . , K.
Optionally, when the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the echo path model corresponding to each path pre-distorted signal is updated to:
Hk,t+1=Hk,t+ΔHk,t,
where ΔHk,t is the updated item of the coefficient of the self-adaptive filter, ΔHk,t is an Nth-order vector, N is a positive integer, and k=1, 2, . . . , K.
Optionally, performing a minimum-value fusion process to the (K+1) path error signals to obtain a residual signal e(t) may include: mapping the (K+1) path error signals to corresponding mapping signals using a reversible space mapping method; calculating metric values of the corresponding mapping signals using a predetermined minimum-value metric function; searching the minimum metric value among the calculated metric values; and mapping the mapping signal which corresponds to the minimum metric value back to a space where the (K+1) path error signals stay, to obtain the residual signal e(t).
In an embodiment of the present disclosure, an apparatus for reducing distortion echo is provided, including: an amplification unit, configured to perform K-path amplification to a downlink reference signal x(t) to obtain K-path preprocessed signals, where K is a positive integer; a pre-distortion processing unit, configured to perform a pre-distortion process to the K-path preprocessed signals, respectively, to obtain K-path pre-distorted signals rk(t), where k=1, 2, . . . , K; a filtering unit, configured to perform a filtering process to the downlink reference signal x(t) and each of the K-path pre-distorted signals using a self-adaptive filter which corresponds to the downlink reference signal x(t) and the corresponding pre-distorted signal, to obtain (K+1) path filtering signals; a difference calculation unit, configured to calculate differences between a target signal d(t) and each of the (K+1) path filtering signals to obtain (K+1) path error signals; a fusion unit, configured to perform a minimum-value fusion process to the (K+1) path error signals to obtain a residual signal e(t); and an output unit, configured to output the residual signal e(t) as a final self-adaptive echo cancellation.
Optionally, a pre-distortion mapping function used in the pre-distortion process meets the following equation:
rk(t)=fk(pk(t)),
where rk(t) is a kth-path pre-distorted signal, pk(t) is a kth-path preprocessed signal, fk(x)≠cx, fk(x)≠c, c is a constant, and k=1, 2, . . . , K.
Optionally, to control an input range and an output range of the pre-distortion mapping function by normalization, the pre-distortion mapping function meets the following equation:
$$r_k(t) = x_{max}\, f_k\!\left(\frac{p_k(t)}{x_{max}}\right),$$
where xmax is a maximum amplitude of the downlink reference signal x(t), −1<=fk(x)<=1, and k=1, 2, . . . , K.
Optionally, when the echo path model in the self-adaptive filter is a time-domain model hk,t, the error signal is represented as follows:
$$e_0(t) = d(t) - y_0(t), \qquad y_0(t) = h_{0,t} \otimes x(t),$$
$$e_k(t) = d(t) - y_k(t), \qquad y_k(t) = h_{k,t} \otimes r_k(t),$$
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), h0,t is an Mth-order Finite Impulse Response (FIR) filter at time point t, h0,t=[h0,t (1), h0,t (2), . . . , h0,t (M)]T, ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), hk,t is the Mth-order FIR filter of the kth path at time point t, hk,t=[hk,t (1), hk,t (2), . . . , hk,t (M)]T, k=1, 2, . . . , K, superscript T is a transposition symbol, ⊗ is a convolution symbol, t is a time index, M is the filter order and is chosen to suit the echo path model, M is within a range from 0.01*fs to fs, and fs is the sampling frequency.
Optionally, when the echo path model in the self-adaptive filter is a time-domain model hk,t, the echo path model corresponding to each path pre-distorted signal is updated to:
hk,t+1=hk,t+Δhk,t,
where Δhk,t is the updated item of the coefficient of the self-adaptive filter, Δhk,t is an Mth-order vector, M is a positive integer, and k=1, 2, . . . , K.
Optionally, when the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the error signals e0(t), e1(t), . . . , eK(t) are represented by:
$$e_0(t) = d(t) - y_0(t),$$
$$[y_0(t-(N-M)+1),\ y_0(t-(N-M)+2),\ \ldots,\ y_0(t)]^T = [0_{(N-M)\times M}\ \ I_{(N-M)\times(N-M)}]\,F^{-}\,[H_{0,t}\cdot R_{0,t}],$$
$$e_k(t) = d(t) - y_k(t),$$
$$[y_k(t-(N-M)+1),\ y_k(t-(N-M)+2),\ \ldots,\ y_k(t)]^T = [0_{(N-M)\times M}\ \ I_{(N-M)\times(N-M)}]\,F^{-}\,[H_{k,t}\cdot R_{k,t}],$$
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), superscript T is a transposition symbol, 0(N−M)×M is a zero matrix having (N−M) rows and M columns, I(N−M)×(N−M) is an (N−M)-by-(N−M) identity matrix, F is a discrete Fourier transform matrix, F− is an inverse discrete Fourier transform matrix, · is a dot product symbol, H0,t is an N-point vector at time point t, R0,t=F[x(t−N+1), x(t−N+2), . . . , x(t)]T, Hk,t is the N-point vector of the kth path at time point t, Rk,t=F[rk(t−N+1), rk(t−N+2), . . . , rk(t)]T, t is a time index, N is the length of a signal frame, M is an order number, M is within a range from 0.01*fs to fs, fs is the sampling frequency, and k=1, 2, . . . , K.
Optionally, when the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the echo path model corresponding to each path pre-distorted signal is updated to:
Hk,t+1=Hk,t+ΔHk,t,
where ΔHk,t is the updated item of the coefficient of the self-adaptive filter, ΔHk,t is an Nth-order vector, N is a positive integer, and k=1, 2, . . . , K.
Optionally, the fusion unit may include: a mapping sub-unit, configured to map the (K+1) path error signals to corresponding mapping signals using a reversible space mapping method; a metric value obtaining sub-unit, configured to calculate metric values of the corresponding mapping signals using a predetermined minimum-value metric function; a searching sub-unit, configured to search the minimum metric value among the calculated metric values; and a residual signal obtaining sub-unit, configured to map the mapping signal which corresponds to the minimum metric value back to a space where the (K+1) path error signals stay, to obtain the residual signal e(t).
From above, in embodiments of the present disclosure, a method and apparatus for reducing distortion echo are provided. K-path amplification and pre-distortion process are performed to the downlink reference signal to obtain K-path pre-distorted signals. Afterwards, filtering is performed using the self-adaptive filters which correspond to the downlink reference signal x(t) and the K-path pre-distorted signals to obtain the filtering signals. Error signals are obtained by calculating differences between the target signal and each of the filtering signals. The minimum-value fusion process is performed to the error signals to obtain the residual signal which is then output as the final self-adaptive echo cancellation. In embodiments of the present disclosure, the residual signal is relatively small as the minimum-value fusion process is performed to the error signals. That is to say, echo loss is relatively great. Therefore, the method may provide echo loss with high amplitude under a situation that a speaker has relatively serious distortion.
In order to clarify solutions of embodiments of the present disclosure or related art, accompanying drawings of the present disclosure or the related art will be described briefly. Obviously, the drawings are just examples and do not limit the scope of the disclosure, and other drawings may be obtained by a person skilled in the art based on these drawings without creative work.
Embodiments of present disclosure will be described clearly in detail in conjunction with accompanying drawings. The embodiments below are only described for example, and there are many other possible embodiments. Based on the embodiments below, all the other embodiments obtained by those skilled in the art without any creative efforts should belong to the scope of the present disclosure.
In embodiments of the present disclosure, a method and apparatus for reducing distortion echo are provided, which can steadily provide echo loss with high amplitude under a situation that a speaker has relatively serious distortion.
In S11, K-path amplification is performed to a downlink reference signal x(t) to obtain K-path preprocessed signals, where K is a positive integer.
In some embodiments, performing the K-path amplification may include: adjusting the amplitude of the downlink reference signal x(t) using gains g1, g2, . . . , gK to obtain the K-path preprocessed signals p1(t), p2(t), . . . , pK(t). The equation used in this step is as follows:
$$p_k(t) = g_k\, x(t), \quad k = 1, 2, \ldots, K,$$
where 0<=g1, g2, . . . , gK<=1.
It should be noted that each gain is not greater than 1, so that the amplified downlink reference signal does not produce additional amplitude overflow (clipping) distortion in a digital system.
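As an illustration of step S11, the following minimal sketch assumes that each preprocessed path is simply the reference signal scaled by its gain, p_k(t) = g_k·x(t). The function name `amplify_paths` and the sample values are hypothetical and are not part of the disclosure.

```python
def amplify_paths(x, gains):
    """Return K scaled copies of x, assuming p_k(t) = g_k * x(t)."""
    for g in gains:
        # Gains above 1 could push samples past full scale and clip.
        if not 0.0 <= g <= 1.0:
            raise ValueError("each gain must lie in [0, 1]")
    return [[g * sample for sample in x] for g in gains]


x = [0.5, -1.0, 0.25]                         # downlink reference samples (made up)
paths = amplify_paths(x, gains=[1.0, 0.5])    # K = 2 preprocessed signals
```

Rejecting gains outside [0, 1] mirrors the constraint stated above; a real system might clamp instead of raising an error.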
In S12, a pre-distortion process is performed to the K-path preprocessed signals, respectively, to obtain K-path pre-distorted signals rk(t), where k=1, 2, . . . , K.
A pre-distortion mapping function used in the pre-distortion process meets the following equation:
rk(t)=fk(pk(t)),
where rk(t) is a kth-path pre-distorted signal, pk(t) is a kth-path preprocessed signal, fk(x)≠cx, fk(x)≠c, c is a constant, and k=1, 2, . . . , K.
The pre-distortion mapping function aims to generate a pre-distorted signal related to the downlink reference signal x(t) based on the downlink reference signal x(t), to simulate distortion of a speaker.
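Step S12 can be sketched as follows, assuming a sign-preserving power-law mapping f(x) = sign(x)·|x|^γ, which is one of the example forms mentioned in this disclosure. The helper names `power_law` and `predistort` are hypothetical, and the exponents 0.1 and 0.2 are taken from the test configuration described in this disclosure.

```python
import math


def power_law(gamma):
    """Build f(x) = sign(x) * |x|**gamma, a nonlinear (non-constant) mapping."""
    def f(x):
        return math.copysign(abs(x) ** gamma, x)
    return f


def predistort(paths, mappings):
    """Apply the k-th mapping to the k-th preprocessed path: r_k(t) = f_k(p_k(t))."""
    return [[f(s) for s in p] for p, f in zip(paths, mappings)]


f1, f2 = power_law(0.1), power_law(0.2)
r = predistort([[0.5, -0.5], [0.5, -0.5]], [f1, f2])   # two pre-distorted paths
```

Because the exponent is below 1, small amplitudes are boosted relative to large ones, which is one plausible way to imitate a loudspeaker driven into its nonlinear region.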
In S13, a filtering process is performed to the downlink reference signal x(t) and each of the K-path pre-distorted signals using a self-adaptive filter which corresponds to the downlink reference signal x(t) and the corresponding pre-distorted signal, to obtain (K+1) path filtering signals.
In S14, differences between a target signal d(t) and each of the (K+1) path filtering signals are calculated to obtain (K+1) path error signals.
In S15, a minimum-value fusion process is performed to the (K+1) path error signals to obtain a residual signal e(t).
In S16, the residual signal e(t) is output as a final self-adaptive echo cancellation.
From above, in the method for reducing distortion echo, K-path amplification and pre-distortion process are performed to the downlink reference signal to obtain K-path pre-distorted signals. Afterwards, filtering is performed using the self-adaptive filters which correspond to the downlink reference signal x(t) and the K-path pre-distorted signals to obtain the filtering signals. Error signals are obtained by calculating differences between the target signal and each of the filtering signals. The minimum-value fusion process is performed to the error signals to obtain the residual signal which is then output as the final self-adaptive echo cancellation. In embodiments of the present disclosure, the residual signal is relatively small as the minimum-value fusion process is performed to the error signals. That is to say, echo loss is relatively great. Therefore, the method may provide echo loss with high amplitude under a situation that a speaker has relatively serious distortion.
In some embodiments, to facilitate the design and use of the pre-distortion mapping function, an input range and an output range of the pre-distortion mapping function are controlled by normalization and the pre-distortion mapping function may meet the following equation:
$$r_k(t) = x_{max}\, f_k\!\left(\frac{p_k(t)}{x_{max}}\right),$$
where xmax is a maximum amplitude of the downlink reference signal x(t), −1<=fk(x)<=1, and k=1, 2, . . . , K.
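The normalization can be illustrated with a short sketch. It assumes the normalized mapping takes the form r_k(t) = xmax·f_k(p_k(t)/xmax), so that f_k only ever sees inputs in [−1, 1] and its output is scaled back to the signal range. The function name `normalized_predistort` and the 16-bit full-scale value are assumptions made for the example.

```python
import math


def normalized_predistort(p, f, x_max):
    """Scale samples into [-1, 1], apply f, then scale back (assumed form)."""
    if x_max <= 0:
        raise ValueError("x_max must be positive")
    return [x_max * f(s / x_max) for s in p]


def soft(u):
    """Example mapping with |soft(u)| <= 1 whenever |u| <= 1."""
    return math.copysign(abs(u) ** 0.5, u)


# Full scale of a 16-bit system is assumed here purely for illustration.
r = normalized_predistort([8192.0, -8192.0], soft, x_max=32768.0)
```

With this arrangement the same mapping function can be reused unchanged across systems with different word lengths.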
In some embodiments, the pre-distortion mapping function may be selected from but not limited to the following equations:
$$f_k(x) = |x|^{\gamma} + c,$$
$$f_k(x) = \mathrm{sign}(x)\,|x|^{\gamma} + c,$$
$$f_k(x) = \sin(cx),$$
$$f_k(x) = \tan(cx),$$
or any combination of the above equations, for example,
fk(x)=a1|x|γ
In some embodiments, the pre-distortion mapping function may be a function, for example,
where c, c1, c2, c3, c4, γ, γ1, γ2, a1, a2, a3, a4, x1 and x2 are real constants, and sign(x) is a function that extracts the sign of a real number.
Multiple pre-distortion mapping functions are required to obtain the pre-distorted signals for the following reasons. The distortion of the speaker is complicated and time-varying, and a single type of distortion process can hardly approximate the distortion component in an echo signal effectively. Therefore, in embodiments of the present disclosure, a multi-path distortion process is performed to obtain various distortion results, such that the minimum-value fusion process can select its input from abundant candidates.
In some embodiments, an echo path model in the self-adaptive filter may be a time-domain model or a frequency-domain model. The self-adaptive filter and the error signal which correspond to the downlink reference signal x(t) are described in detail below.
When the echo path model in the self-adaptive filter is a time-domain model hk,t, the error signal may be as follows:
$$e_0(t) = d(t) - y_0(t), \qquad y_0(t) = h_{0,t} \otimes x(t),$$
$$e_k(t) = d(t) - y_k(t), \qquad y_k(t) = h_{k,t} \otimes r_k(t),$$
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), h0,t is an Mth-order Finite Impulse Response (FIR) filter at time point t, h0,t=[h0,t (1), h0,t (2), . . . , h0,t (M)]T, ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), hk,t is the Mth-order FIR filter of the kth path at time point t, hk,t=[hk,t (1), hk,t (2), . . . , hk,t (M)]T, k=1, 2, . . . , K, superscript T is a transposition symbol, ⊗ is a convolution symbol, t is a time index, M is the filter order and is chosen to suit the echo path model, M is within a range from 0.01*fs to fs, and fs is the sampling frequency.
A self-adaptive filtering algorithm is used to calculate the residual signal e(t) to obtain an update item of a coefficient of the self-adaptive filter. When the echo path model in the self-adaptive filter is a time-domain model hk,t, the calculation can be realized by any time-domain self-adaptive filtering algorithm, such as Least Mean Square (LMS), Normalized Least Mean Square (NLMS), Affine Projection (AP), Fast Affine Projection (FAP), Least Square (LS) or Recursive Least Square (RLS).
For example, the NLMS algorithm is used to realize the calculation:
where ε is a micro positive real number which prevents zero division error, μh,0 and μh,k are update step size, 0<μh,0, μh,k<2, k=1, 2, . . . , K, and t is a time index.
The echo path model corresponding to each path pre-distorted signal is updated to:
hk,t+1=hk,t+Δhk,t,
where Δhk,t is the updated item of the coefficient of the self-adaptive filter, Δhk,t is an Mth-order vector, M is a positive integer, and k=1, 2, . . . , K.
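The time-domain update can be sketched with a textbook NLMS step. This is an illustration under stated assumptions, not necessarily the exact update of the disclosure: it assumes each path is adapted from its own error signal e_k(t), and the names `nlms_step`, `x_buf` and `true_h` are hypothetical.

```python
import random


def nlms_step(h, x_buf, d, mu=0.5, eps=1e-8):
    """One NLMS iteration for an M-tap FIR echo-path model.

    h      current coefficients h_t (length M)
    x_buf  the M most recent input samples, newest first
    d      target sample d(t)
    Returns (e, h_next) where h_next = h + delta_h.
    """
    y = sum(hi * xi for hi, xi in zip(h, x_buf))         # filtering signal y(t)
    e = d - y                                            # error e(t) = d(t) - y(t)
    energy = sum(xi * xi for xi in x_buf) + eps          # eps avoids division by zero
    delta_h = [mu * e * xi / energy for xi in x_buf]     # normalized update term
    h_next = [hi + dhi for hi, dhi in zip(h, delta_h)]   # h_{t+1} = h_t + delta_h_t
    return e, h_next


# Identify a made-up 2-tap echo path from noise-free data.
rng = random.Random(0)
true_h = [0.5, -0.25]
h = [0.0, 0.0]
x_buf = [0.0, 0.0]
for _ in range(500):
    x_buf = [rng.uniform(-1.0, 1.0)] + x_buf[:-1]
    d = sum(a * b for a, b in zip(true_h, x_buf))
    e, h = nlms_step(h, x_buf, d)
```

The step size is kept inside the range 0<μ<2 stated above; in this noise-free toy case the coefficients settle on the made-up echo path.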
Calculation is performed to the downlink reference signal x(t) and the K-path pre-distorted signals r1(t), . . . , rK(t) using a self-adaptive filtering algorithm to obtain the error signals e0(t), e1(t), . . . , eK(t). When the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the error signals e0(t), e1(t), . . . , eK(t) may be represented by:
$$e_0(t) = d(t) - y_0(t),$$
$$[y_0(t-(N-M)+1),\ y_0(t-(N-M)+2),\ \ldots,\ y_0(t)]^T = [0_{(N-M)\times M}\ \ I_{(N-M)\times(N-M)}]\,F^{-}\,[H_{0,t}\cdot R_{0,t}],$$
$$e_k(t) = d(t) - y_k(t),$$
$$[y_k(t-(N-M)+1),\ y_k(t-(N-M)+2),\ \ldots,\ y_k(t)]^T = [0_{(N-M)\times M}\ \ I_{(N-M)\times(N-M)}]\,F^{-}\,[H_{k,t}\cdot R_{k,t}],$$
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), superscript T is a transposition symbol, 0(N−M)×M is a zero matrix having (N−M) rows and M columns, I(N−M)×(N−M) is an (N−M)-by-(N−M) identity matrix, F− is an inverse discrete Fourier transform matrix, · is a dot product symbol, H0,t is an N-point vector at time point t, R0,t=F[x(t−N+1), x(t−N+2), . . . , x(t)]T, Hk,t is the N-point vector of the kth path at time point t, Rk,t=F[rk(t−N+1), rk(t−N+2), . . . , rk(t)]T, t is a time index, N is the length of a signal frame, M is an order number, M is within a range from 0.01*fs to fs, fs is the sampling frequency, F is a discrete Fourier transform matrix, k=1, 2, . . . , K, and K is a positive integer.
A self-adaptive filtering algorithm is used to calculate the residual signal e(t) to obtain an update item of a coefficient of the self-adaptive filter. When the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the calculation can be realized by any frequency-domain self-adaptive filtering algorithm, such as Frequency Domain Adaptive Filter (FDAF), Multi-Delay Adaptive Filter (MDAF), or Windowing Frequency Domain Adaptive Filter (WDAF).
For example, the FDAF algorithm is used to realize the calculation:
where ε is a micro positive real number which prevents zero division error, * is a conjugation symbol, μH,0 and μH,k are update step size, 0<μH,0, μH,k<2, k=1, 2, . . . , K.
E0,t=F[e0(t−N+1),e0(t−N+2), . . . ,e0(t)]T,
R0,t=F[x(t−N+1),x(t−N+2), . . . ,x(t)]T,
Ek,t=F[ek(t−N+1),ek(t−N+2), . . . ,ek(t)]T,
Rk,t=F[rk(t−N+1),rk(t−N+2), . . . ,rk(t)]T,
$E[|R_{0,t}|^2]$ is the expected value of the energy spectrum of R0,t, which is obtained by an autoregressive method as follows:
$$E[|R_{0,t}|^2] = \eta\, E[|R_{0,t-1}|^2] + (1-\eta)\,|R_{0,t}|^2,$$ and
$E[|R_{k,t}|^2]$ is the expected value of the energy spectrum of Rk,t, which is obtained by an autoregressive method as follows:
$$E[|R_{k,t}|^2] = \eta\, E[|R_{k,t-1}|^2] + (1-\eta)\,|R_{k,t}|^2,$$
where η is an update factor, 0<η<1, k=1, 2, . . . , K.
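The autoregressive energy estimate can be sketched per frequency bin as follows. The second function shows one common form of a normalized frequency-domain update term, μ·R*·E/(E[|R|²]+ε); that form is an assumption made for illustration and is not confirmed by the text, and both function names are hypothetical.

```python
def smooth_energy(prev_energy, spectrum, eta=0.9):
    """Per-bin recursion: E_t = eta * E_{t-1} + (1 - eta) * |R_t|^2."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must satisfy 0 < eta < 1")
    return [eta * p + (1.0 - eta) * abs(r) ** 2
            for p, r in zip(prev_energy, spectrum)]


def normalized_update(error_spec, ref_spec, energy, mu=0.5, eps=1e-8):
    """Assumed per-bin update term: mu * conj(R) * E / (energy + eps)."""
    return [mu * r.conjugate() * e / (p + eps)
            for e, r, p in zip(error_spec, ref_spec, energy)]


# Two bins with made-up values.
energy = smooth_energy([1.0, 0.0], [3 + 4j, 1j], eta=0.5)
delta_H = normalized_update([1 + 0j], [2 + 0j], [4.0], mu=1.0, eps=0.0)
```

A value of η close to 1 gives a smoother but slower-reacting energy estimate, which in turn makes the normalized step less sensitive to a single loud frame.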
The echo path model is updated to:
Hk,t+1=Hk,t+ΔHk,t,
where ΔHk,t is the updated item of the coefficient of the self-adaptive filter, ΔHk,t is an Nth-order vector, N is a positive integer, and k=1, 2, . . . , K.
In S21, the (K+1) path error signals are mapped to corresponding mapping signals using a reversible space mapping method.
In S22, metric values of the corresponding mapping signals are calculated using a predetermined minimum-value metric function.
In S23, the minimum metric value is searched among the calculated metric values.
In S24, the mapping signal which corresponds to the minimum metric value is mapped back to a space where the (K+1) path error signals stay, to obtain the residual signal e(t).
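Steps S21 to S24 can be sketched in their simplest form by taking the identity as the reversible mapping and using a short-time amplitude metric, assumed here to be the sum of absolute values over a section of L samples. The function name `fuse_min_time` and the sample data are hypothetical.

```python
def fuse_min_time(errors, L):
    """Fuse (K+1) equal-length error signals section by section.

    For each section of L samples, the path with the smallest short-time
    amplitude (sum of absolute values, an assumed metric) is kept.
    """
    n = len(errors[0])
    residual = []
    for start in range(0, n, L):
        # S21: identity mapping, so each mapping signal is the section itself.
        sections = [e[start:start + L] for e in errors]
        # S22: one metric value per path.
        metrics = [sum(abs(s) for s in sec) for sec in sections]
        # S23: index of the minimum metric value.
        best = min(range(len(sections)), key=metrics.__getitem__)
        # S24: the inverse of the identity mapping is again the section.
        residual.extend(sections[best])
    return residual


e0 = [1.0, 1.0, 0.1, 0.1]
e1 = [0.5, 0.5, 2.0, 2.0]
e = fuse_min_time([e0, e1], L=2)
```

In this toy case the second path wins the first section and the first path wins the second, so the fused residual is smaller than either path alone.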
As parameters of the self-adaptive filter are different, residual signals corresponding to the (K+1) path error signals e0(t), e1(t), . . . , eK(t) may be least at different time points or in different spaces. In the minimum-value fusion process, the (K+1) path error signals e0(t), e1(t), . . . , eK(t) are mapped to the mapping signal S0,t, S1,t, . . . , SK,t using a space mapping method. The metric values ν0, ν1, . . . , νK of the corresponding mapping signals S0,t, S1,t, . . . , SK,t are calculated using the predetermined minimum-value metric function, and the minimum metric value νk
In some embodiments, the minimum-value metric function aims to calculate a short-time amplitude, such as
where νk is the minimum metric value, t is a time index, and k=1, 2, . . . , K.
In some embodiments, the minimum-value metric function aims to calculate short-time power, such as
where νk is the minimum metric value, t is a time index, and k=1, 2, . . . , K.
In the above equations, L is a positive integer which represents a short-time section and is within a range from 0.001*fs to fs, and fs is sampling frequency.
When the mapping signal Sk
In some embodiments, adjacent short-time sections in one path may be overlapped partially, to facilitate performing a smoothing process to two ends of the sections.
In some embodiments, a relatively effective minimum-value fusion process may be a frequency-domain transformation as follows:
Sk,t=TF([ek(t−L+1), . . . ,ek(t)]),
where Sk,t is the mapping signal, TF represents the frequency-domain transformation, L is a positive integer which represents a short-time section and is within a range from 0.001*fs to fs, fs is sampling frequency, and k=1, 2, . . . , K.
The frequency-domain transformation TF may be Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Karhunen-Loeve (KL), Modified Discrete Cosine Transform (MDCT), etc. The frequency-domain transformation TF is reversible, and its reversible transformation is TF−.
The mapping signal Sk,t obtained by the frequency-domain transformation is an LF-point vector. LF may differ between mappings. For example, for the DFT, DCT or KL transformation, LF=L, while for the MDCT transformation, LF=L/2. The minimum-value metric function may be the modulus of the mapping signal Sk,t[l], l=1, 2, . . . , LF: fmin(x)=|x|, or may be a weighted combination of the absolute values of the real part and the imaginary part: fmin(x)=λreal|real(x)|γ
The metric values of the mapping signals Sk,t[l], l=1, 2, . . . , LF are obtained using the minimum-value metric function based on the following equation:
νk,l=fmin(Sk,t[l]),
where νk,l is the minimum metric value, integer index l=1, 2, . . . , LF, and k=1, 2, . . . , K.
The mapping signals Sk,t[l], l=1, 2, . . . , LF are fused to fused signals St[l]=Sk
fmin(Sk
Afterward, the fused signals St[l]=Sk
[e(t−L+1), . . . ,e(t)]=TF−(St).
In some embodiments, adjacent short-time sections in one path may be overlapped partially, to facilitate performing a smoothing process to two ends of the sections.
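A frequency-domain variant of the fusion can be sketched with a plain DFT as the reversible mapping TF and the modulus fmin(x)=|x| as the metric, selecting the smallest-modulus candidate independently in each bin. The direct O(L²) transform and the names `dft`, `idft` and `fuse_min_dft` are choices made for a self-contained example. Overlap between adjacent sections is omitted.

```python
import cmath


def dft(v):
    """Direct discrete Fourier transform of one short-time section."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse transform matching dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fuse_min_dft(sections):
    """Per-bin minimum-modulus fusion of (K+1) real sections of equal length."""
    spectra = [dft(s) for s in sections]                    # mapping signals
    fused = [min((spec[l] for spec in spectra), key=abs)    # smallest |S_k[l]| wins
             for l in range(len(sections[0]))]
    # For real inputs the fused spectrum stays (numerically) conjugate-symmetric,
    # so the imaginary part of the inverse transform is dropped.
    return [value.real for value in idft(fused)]


a = [1.0, 0.0, -1.0, 0.0]    # energy only in bins 1 and 3
b = [0.5, 0.5, 0.5, 0.5]     # energy only in bin 0
e = fuse_min_dft([a, b])
```

The two made-up sections occupy disjoint bins, so the bin-wise minimum is close to zero everywhere. This shows why fusing in a transform domain can remove more than choosing one whole path per section.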
In above embodiments, K is a positive integer.
The following test was performed to verify that echo loss with high amplitude can be steadily provided in a situation where a speaker has relatively serious distortion.
Two path pre-distorted signals are selected to be processed, each path having a gain of 1, and the pre-distortion mapping functions are:
$$f_1(x) = \mathrm{sign}(x)\,|x|^{0.1},$$ and
$$f_2(x) = \mathrm{sign}(x)\,|x|^{0.2}.$$
DCT mapping is used as the space mapping in the minimum-value fusion process. The short-time section is L=160, the minimum-value metric function calculates an absolute value, and the self-adaptive filter uses FDAF. Comparing the processing result of the embodiment with the result obtained by an existing method shows that the processed signal obtained in the embodiment is evidently smaller than the processed signal obtained by an existing acoustic echo cancellation (AEC) method, and the amplitude of echo loss is increased by at least 5 dB.
Accordingly, in an embodiment, an apparatus for reducing distortion echo is provided.
The amplification unit 31 is configured to perform K-path amplification to a downlink reference signal x(t) to obtain K-path preprocessed signals, where K is a positive integer.
In some embodiments, performing the K-path amplification may include: adjusting the amplitude of the downlink reference signal x(t) using gains g1, g2, . . . , gK to obtain the K-path preprocessed signals p1(t), p2(t), . . . , pK(t). The equation used in this step is as follows:
$$p_k(t) = g_k\, x(t), \quad k = 1, 2, \ldots, K,$$
where 0<=g1, g2, . . . , gK<=1.
It should be noted that each gain is not greater than 1, so that the amplified downlink reference signal does not produce additional amplitude overflow (clipping) distortion in a digital system.
The pre-distortion processing unit 32 is configured to perform a pre-distortion process to the K-path preprocessed signals, respectively, to obtain K-path pre-distorted signals rk(t), where k=1, 2, . . . , K.
A pre-distortion mapping function used in the pre-distortion process meets the following equation:
rk(t)=fk(pk(t)),
where rk(t) is a kth-path pre-distorted signal, pk(t) is a kth-path preprocessed signal, fk(x)≠cx, fk(x)≠c, c is a constant, and k=1, 2, . . . , K.
The pre-distortion mapping function aims to generate a pre-distorted signal related to the downlink reference signal x(t) based on the downlink reference signal x(t), to simulate distortion of a speaker.
The filtering unit 33 is configured to perform a filtering process to the downlink reference signal x(t) and each of the K-path pre-distorted signals using a self-adaptive filter which corresponds to the downlink reference signal x(t) and the corresponding pre-distorted signal, to obtain (K+1) path filtering signals.
The difference calculation unit 34 is configured to calculate differences between a target signal d(t) and each of the (K+1) path filtering signals to obtain (K+1) path error signals.
The fusion unit 35 is configured to perform a minimum-value fusion process to the (K+1) path error signals to obtain a residual signal e(t).
The output unit 36 is configured to output the residual signal e(t) as a final self-adaptive echo cancellation.
From above, by the apparatus for reducing distortion echo, K-path amplification and pre-distortion process are performed to the downlink reference signal to obtain K-path pre-distorted signals. Afterwards, filtering is performed using the self-adaptive filters which correspond to the downlink reference signal x(t) and the K-path pre-distorted signals to obtain the filtering signals. Error signals are obtained by calculating differences between the target signal and each of the filtering signals. The minimum-value fusion process is performed to the error signals to obtain the residual signal which is then output as the final self-adaptive echo cancellation. In embodiments of the present disclosure, the residual signal is relatively small as the minimum-value fusion process is performed to the error signals. That is to say, echo loss is relatively great. Therefore, the apparatus may provide echo loss with high amplitude under a situation that a speaker has relatively serious distortion.
In some embodiments, to facilitate the design and use of the pre-distortion mapping function, an input range and an output range of the pre-distortion mapping function are controlled by normalization and the pre-distortion mapping function may meet the following equation:
where xmax is the maximum amplitude of the downlink reference signal x(t), −1≤fk(x)≤1, and k=1, 2, . . . , K.
In some embodiments, the pre-distortion mapping function may be selected from but not limited to the following equations:
fk(x)=|x|^γ+c,
fk(x)=sign(x)|x|^γ+c,
fk(x)=sin(cx),
fk(x)=tan(cx),
or any combination of the above equations, for example,
fk(x)=a1|x|γ
In some embodiments, the pre-distortion mapping function may be a function, for example,
where c, c1, c2, c3, c4, γ, γ1, γ2, a1, a2, a3, a4, x1 and x2 are real constants, and sign(x) is a function that extracts the sign of a real number.
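The candidate mapping functions listed above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the default constants, and the choice to normalize the preprocessed signal by dividing by xmax are all assumptions made for the example. The defaults are picked so that each output stays within [−1, 1] for inputs in [−1, 1].

```python
import numpy as np

def power_map(u, gamma=0.5, c=0.0):
    """fk(x) = |x|^gamma + c (hypothetical default constants)."""
    return np.abs(u) ** gamma + c

def signed_power_map(u, gamma=0.5, c=0.0):
    """fk(x) = sign(x) |x|^gamma + c (hypothetical default constants)."""
    return np.sign(u) * np.abs(u) ** gamma + c

def sine_map(u, c=np.pi / 2):
    """fk(x) = sin(c x)."""
    return np.sin(c * u)

def tan_map(u, c=np.pi / 4):
    """fk(x) = tan(c x)."""
    return np.tan(c * u)

def predistort(p, x_max, maps):
    """Apply each mapping to the preprocessed signal p.

    Assumption: the input is normalized by x_max before mapping.
    Returns an array of shape (K, len(p)), one row per path.
    """
    u = np.asarray(p, dtype=float) / x_max
    return np.stack([f(u) for f in maps])

p = np.array([-1.0, -0.25, 0.0, 0.25, 1.0])
r = predistort(p, x_max=1.0,
               maps=[power_map, signed_power_map, sine_map, tan_map])
```

Each row of `r` is one pre-distorted signal rk(t); using several different mappings gives the later fusion step more candidates to choose from.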
Multi-path pre-distortion mapping functions are required for obtaining the pre-distorted signals for the following reasons. The distortion of the speaker is complicated and time-varying, so a single type of distortion process can hardly approximate the distortion component in an echo signal effectively. Therefore, in embodiments of the present disclosure, a multi-path distortion process is performed to obtain various distortion process results, such that the input of the minimum-value fusion process can be selected from abundant candidates.
In some embodiments, an echo path model in the self-adaptive filter may be a time-domain model or a frequency-domain model. The self-adaptive filter and the error signal which correspond to the downlink reference signal x(t) are described in detail below.
When the echo path model in the self-adaptive filter is a time-domain model hk,t, the error signal may be as follows:
e0(t)=d(t)−y0(t),
y0(t)=x(t)⊗h0,t,
ek(t)=d(t)−yk(t),
yk(t)=rk(t)⊗hk,t,
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), h0,t is an Mth-order Finite Impulse Response (FIR) filter at time point t, h0,t=[h0,t (1), h0,t (2), . . . , h0,t (M)]T, ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), hk,t is the Mth-order FIR filter of the kth path at time point t, hk,t=[hk,t (1), hk,t (2), . . . , hk,t (M)]T, k=1, 2, . . . , K, superscript T is a transposition symbol, ⊗ is a convolution symbol, t is a time index, M is the filter order, which is chosen to suit the echo path model and is within a range from 0.01*fs to fs, and fs is the sampling frequency.
A self-adaptive filtering algorithm is applied to the residual signal e(t) to obtain an update item of a coefficient of the self-adaptive filter. When the echo path model in the self-adaptive filter is a time-domain model hk,t, the calculation can be realized by any time-domain self-adaptive filtering algorithm, such as LMS, NLMS, AP, FAP, LS or RLS.
The echo path model corresponding to each path pre-distorted signal is updated to:
hk,t+1=hk,t+Δhk,t,
where Δhk,t is the updated item of the coefficient of the self-adaptive filter, Δhk,t is an Mth-order vector, M is a positive integer, and k=1, 2, . . . , K.
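As an illustration of the update hk,t+1=hk,t+Δhk,t, the following sketch runs an NLMS filter over a single path. It is a simplified example under stated assumptions: the function name `nlms_path`, the step size `mu`, and the regularization `eps` are hypothetical, and the update here is driven by the path's own error ek(t), whereas the text derives the update item from the residual signal e(t).

```python
import numpy as np

def nlms_path(ref, d, M=4, mu=0.5, eps=1e-8):
    """Run an M-tap NLMS filter on one path.

    ref : reference for this path (x(t) or a pre-distorted rk(t))
    d   : target signal d(t)
    Returns the error signal ek(t) and the final coefficients hk.
    """
    ref = np.asarray(ref, dtype=float)
    d = np.asarray(d, dtype=float)
    h = np.zeros(M)        # echo path model hk,t
    buf = np.zeros(M)      # last M reference samples, newest first
    e = np.zeros(len(d))
    for t in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = ref[t]
        y = h @ buf                        # filtering signal yk(t)
        e[t] = d[t] - y                    # ek(t) = d(t) - yk(t)
        delta_h = mu * e[t] * buf / (buf @ buf + eps)  # update item
        h = h + delta_h                    # hk,t+1 = hk,t + delta
    return e, h

# Toy check: identify a known 3-tap echo path from noise-free data.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h_true = np.array([0.5, -0.3, 0.1])
d = np.convolve(x, h_true)[: len(x)]
e0, h0 = nlms_path(x, d, M=4)
```

In the apparatus, one such filter would run for the downlink reference signal and one for each of the K pre-distorted signals, giving the (K+1) error signals that are fused afterwards.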
Calculation is performed to the downlink reference signal x(t) and the K-path pre-distorted signals r1(t), . . . , rK(t) using a self-adaptive filtering algorithm to obtain the error signals e0(t), e1(t), . . . , eK(t). When the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the error signals e0(t), e1(t), . . . , eK(t) may be represented by:
e0(t)=d(t)−y0(t),
[y0(t−(N−M)+1), y0(t−(N−M)+2), . . . , y0(t)]T=[0(N−M)×M  I(N−M)×(N−M)] F^−1 [H0,t·R0,t],
ek(t)=d(t)−yk(t),
[yk(t−(N−M)+1), yk(t−(N−M)+2), . . . , yk(t)]T=[0(N−M)×M  I(N−M)×(N−M)] F^−1 [Hk,t·Rk,t],
where e0(t) is the error signal corresponding to the downlink reference signal x(t), d(t) is the target signal, y0(t) is the filtering signal corresponding to the downlink reference signal x(t), ek(t) is the error signal corresponding to the kth-path pre-distorted signal rk(t), yk(t) is the filtering signal corresponding to the kth-path pre-distorted signal rk(t), superscript T is a transposition symbol, 0(N−M)×M is a zero matrix having (N−M) rows and M columns, I(N−M)×(N−M) is an (N−M)-by-(N−M) identity matrix, F− (that is, F^−1) is the inverse discrete Fourier transform matrix, · is an element-wise (dot) product symbol, H0,t is an N-point vector at time point t, R0,t=F[x(t−N+1), x(t−N+2), . . . , x(t)]T, Hk,t is the N-point vector of the kth path at time point t, Rk,t=F[rk(t−N+1), rk(t−N+2), . . . , rk(t)]T, t is a time index, N is the length of a signal frame, M is the filter order and is within a range from 0.01*fs to fs, fs is the sampling frequency, F is the discrete Fourier transform matrix, k=1, 2, . . . , K, and K is a positive integer.
A self-adaptive filtering algorithm is used to calculate the residual signal e(t) to obtain an update item of a coefficient of the self-adaptive filter. When the echo path model in the self-adaptive filter is a frequency-domain model Hk,t, the calculation can be realized by any frequency-domain self-adaptive filtering algorithm, such as FDAF, MDAF or WDAF.
The echo path model is updated to:
Hk,t+1=Hk,t+ΔHk,t,
where ΔHk,t is the updated item of the coefficient of the self-adaptive filter, ΔHk,t is an Nth-order vector, N is a positive integer, and k=1, 2, . . . , K.
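The frequency-domain filtering equations above keep only the last (N−M) samples of the inverse transform of Hk,t·Rk,t. A minimal sketch of that step for one frame is given below; the function name `fd_filter_block` and the toy signals are assumptions for illustration, and the coefficient update ΔHk,t is not shown.

```python
import numpy as np

def fd_filter_block(H, ref_block, M):
    """Filter one N-sample frame in the frequency domain.

    H         : N-point frequency-domain model (Hk,t)
    ref_block : last N reference samples [r(t-N+1), ..., r(t)]
    M         : filter order
    Returns the last N-M output samples, which is what the
    selection matrix [0  I] keeps after the inverse DFT.
    """
    R = np.fft.fft(ref_block)            # Rk,t = F [r(...)]
    y_full = np.fft.ifft(H * R).real     # inverse DFT of H . R
    return y_full[M:]                    # discard the first M samples

# Toy check against direct time-domain convolution.
rng = np.random.default_rng(1)
N, M = 16, 4
h = np.array([0.5, -0.3, 0.1, 0.0])      # M-tap echo path
x_block = rng.standard_normal(N)
H = np.fft.fft(h, N)                     # zero-padded to N points
y_fd = fd_filter_block(H, x_block, M)
y_td = np.convolve(x_block, h)[M:N]      # same samples, time domain
```

The first M output samples of the circular convolution are contaminated by wrap-around, which is why only the trailing (N−M) samples are retained.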
The mapping sub-unit 41 is configured to map the (K+1) path error signals to corresponding mapping signals using a reversible space mapping method.
The metric value obtaining sub-unit 42 is configured to calculate metric values of the corresponding mapping signals using a predetermined minimum-value metric function.
The searching sub-unit 43 is configured to search the minimum metric value among the calculated metric values.
The residual signal obtaining sub-unit 44 is configured to map the mapping signal which corresponds to the minimum metric value back to a space where the (K+1) path error signals stay, to obtain the residual signal e(t).
As the parameters of the self-adaptive filters differ, each of the (K+1) path error signals e0(t), e1(t), . . . , eK(t) may be the smallest at different time points or in different spaces. In the minimum-value fusion process, the (K+1) path error signals e0(t), e1(t), . . . , eK(t) are mapped to the mapping signals S0,t, S1,t, . . . , SK,t using a space mapping method. The metric values ν0, ν1, . . . , νK of the corresponding mapping signals S0,t, S1,t, . . . , SK,t are calculated using the predetermined minimum-value metric function, and the minimum metric value νk
In some embodiments, the minimum-value fusion process may be relatively simple. In some embodiments, the space mapping is framing of short-time signals, such as
Sk,t=[ek(t−L+1),ek(t−L+2), . . . ,ek(t)],
where Sk,t is the mapping signal, t is a time index, and k=1, 2, . . . , K.
In some embodiments, the minimum-value metric function aims to calculate a short-time amplitude, such as
where νk is the minimum metric value, t is a time index, and k=1, 2, . . . , K.
In some embodiments, the minimum-value metric function aims to calculate short-time power, such as
where νk is the minimum metric value, t is a time index, and k=1, 2, . . . , K.
In the above equations, L is a positive integer which represents a short-time section and is within a range from 0.001*fs to fs, and fs is sampling frequency.
When the mapping signal Sk
In some embodiments, adjacent short-time sections in one path may be overlapped partially, to facilitate performing a smoothing process to two ends of the sections.
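The time-domain fusion described above (frame, measure, pick the minimum, copy back) can be sketched as follows. The exact metric equations are not reproduced in the text, so the two metrics used here, a sum of absolute values for short-time amplitude and a sum of squares for short-time power, are assumed forms; the function name and the use of non-overlapping frames are likewise assumptions.

```python
import numpy as np

def fuse_time_domain(errors, L, metric="power"):
    """Minimum-value fusion with short-time framing.

    errors : array of shape (K+1, T), one row per path error signal
    L      : frame length in samples (T must be a multiple of L)
    Returns the residual signal e(t) and the path chosen per frame.
    """
    errors = np.asarray(errors, dtype=float)
    n_paths, T = errors.shape
    if T % L != 0:
        raise ValueError("T must be a multiple of L in this sketch")
    e = np.zeros(T)
    chosen = []
    for start in range(0, T, L):
        frames = errors[:, start:start + L]       # mapping signals Sk,t
        if metric == "power":
            v = np.sum(frames ** 2, axis=1)       # assumed short-time power
        else:
            v = np.sum(np.abs(frames), axis=1)    # assumed short-time amplitude
        k_min = int(np.argmin(v))                 # search the minimum metric
        e[start:start + L] = frames[k_min]        # map back: copy the frame
        chosen.append(k_min)
    return e, chosen

errs = np.array([[1.0, 1.0, 0.1, 0.1],
                 [0.2, 0.2, 3.0, 3.0]])
e_td, picks = fuse_time_domain(errs, L=2)
```

With framing as the space mapping, the inverse mapping is trivial: the selected frame is written straight into the residual signal.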
In some embodiments, a relatively effective minimum-value fusion process may use a frequency-domain transformation as the space mapping, as follows:
Sk,t=TF([ek(t−L+1), . . . ,ek(t)]),
where Sk,t is the mapping signal, TF represents the frequency-domain transformation, L is a positive integer which represents a short-time section and is within a range from 0.001*fs to fs, fs is sampling frequency, and k=1, 2, . . . , K.
The frequency-domain transformation TF may be a Discrete Fourier Transform (DFT), a Discrete Cosine Transform (DCT), a Karhunen-Loève (KL) transform, a Modified Discrete Cosine Transform (MDCT), etc. The frequency-domain transformation TF is reversible, and its inverse transformation is TF− (that is, TF^−1).
The mapping signal Sk,t obtained by the frequency-domain transformation is an LF-point vector. LF may differ between mappings. For example, for the DFT, DCT or KL transformation, LF=L, while for the MDCT transformation, LF=L/2. The minimum-value metric function may be the modulus of the mapping signal Sk,t[l], l=1, 2, . . . , LF: fmin(x)=|x|, or may be a weighted combination of the absolute values of the real part and the imaginary part of x, fmin(x)=λreal|real(x)|γ
The metric values of the mapping signals Sk,t[l], l=1, 2, . . . , LF are obtained using the minimum-value metric function based on the following equation:
νk,l=fmin(Sk,t[l]),
where νk,l is the minimum metric value, integer index l=1, 2, . . . , LF, and k=1, 2, . . . , K.
The mapping signals Sk,t[l], l=1, 2, . . . , LF are fused to fused signals St[l]=Sk
fmin(Sk
Afterward, the fused signals St[l]=Sk
[e(t−L+1), . . . , e(t)]=TF^−1(St).
In some embodiments, adjacent short-time sections in one path may be overlapped partially, to facilitate performing a smoothing process to two ends of the sections.
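The frequency-domain fusion can be sketched in the same way: transform each path's frame, keep for every bin l the value from the path with the smallest metric fmin(x)=|x|, and apply the inverse transformation. This is a sketch under assumptions: it uses NumPy's real-input DFT (`rfft`/`irfft`), which stores only the non-redundant bins of a real frame rather than all LF=L points, it uses non-overlapping frames, and the function name is hypothetical.

```python
import numpy as np

def fuse_frequency_domain(errors, L):
    """Per-bin minimum-value fusion using the DFT as the space mapping.

    errors : array of shape (K+1, T), one row per path error signal
    L      : frame length in samples (T must be a multiple of L)
    Returns the residual signal e(t).
    """
    errors = np.asarray(errors, dtype=float)
    n_paths, T = errors.shape
    if T % L != 0:
        raise ValueError("T must be a multiple of L in this sketch")
    e = np.zeros(T)
    for start in range(0, T, L):
        S = np.fft.rfft(errors[:, start:start + L], axis=1)  # Sk,t[l]
        v = np.abs(S)                          # metric fmin(x) = |x|
        k_min = np.argmin(v, axis=0)           # best path for each bin
        S_fused = S[k_min, np.arange(S.shape[1])]
        e[start:start + L] = np.fft.irfft(S_fused, n=L)      # map back
    return e

# Two paths whose errors occupy different bins: the fusion keeps,
# in each bin, the path that is (nearly) zero there.
n = np.arange(8)
errs = np.stack([np.cos(2 * np.pi * 1 * n / 8),
                 np.cos(2 * np.pi * 2 * n / 8)])
e_fd = fuse_frequency_domain(errs, L=8)
```

Because the choice is made per bin rather than per frame, the fused residual can have less energy than any single path's error, which is the sense in which this variant is described as more effective.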
In above embodiments, K is a positive integer.
For the detailed working principles of the components of the apparatus, reference may be made to the corresponding parts of the method described in the above-mentioned embodiments.
In the present disclosure, the various embodiments are described in a progressive way, with each embodiment focusing on its differences from the other embodiments. For the same or similar parts, the embodiments may be referred to one another.
Although the present disclosure has been disclosed above with reference to preferred embodiments thereof, it should be understood that the disclosure is presented by way of example only, and not limitation. Those skilled in the art can modify and vary the embodiments without departing from the spirit and scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
2014 1 0857165 | Dec 2014 | CN | national |
Number | Date | Country | |
---|---|---|---|
20160189700 A1 | Jun 2016 | US |