This application claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2019-0011091 filed on Jan. 29, 2019, in the Korean Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.
Example embodiments of the inventive concepts described herein relate to a residual echo estimator configured to estimate a residual echo based on time correlation, a non-transitory computer-readable medium storing a program code configured to estimate a residual echo, and/or an application processor.
In an electronic device including a speaker and a microphone, as a voice of a far-end speaker is output through the speaker, the voice of the far-end speaker may be input to the microphone. This acoustic echo may make it difficult to recognize a voice of a near-end speaker at a far end. Accordingly, it may be desirable to cancel the acoustic echo due to the coupling between the speaker and the microphone positioned adjacent to each other. Also, as demand for electronic devices supporting voice recognition increases, it may be more important to effectively cancel the acoustic echo.
Example embodiments of the inventive concepts provide a residual echo estimator configured to estimate a residual echo based on time correlation, a non-transitory computer-readable medium storing a program code configured to estimate a residual echo, and an application processor.
According to an example embodiment, a residual echo estimator is configured to estimate a residual echo of a microphone signal. In some example embodiments, the residual echo estimator includes processing circuitry configured to, estimate a magnitude of the residual echo at a current frame based on a magnitude of a linear echo of a reference signal at the current frame and a magnitude of the linear echo of the reference signal at a past frame, and update weights to apply to the magnitude of the linear echo at the current frame and the magnitude of the linear echo at the past frame.
According to an example embodiment, a non-transitory computer-readable medium stores a program code executable by a processor. In some example embodiments, the program code is executable by the processor to, estimate a magnitude of a residual echo of a microphone signal at a current frame based on a magnitude of a linear echo of a reference signal at the current frame and a magnitude of the linear echo of the reference signal at a past frame, update weights to apply to the magnitude of the linear echo at the current frame and the magnitude of the linear echo at the past frame, and calculate a suppression gain based on the magnitude of the residual echo and a magnitude of an output signal obtained by canceling the linear echo from a microphone signal, the output signal being multiplied by the suppression gain to generate a final output signal.
According to an example embodiment, an application processor includes an audio processor; and a non-transitory computer-readable medium configured to store a program code executable by the audio processor to, generate an output signal by canceling a linear echo of a reference signal from a microphone signal input to a microphone as the reference signal is output from a speaker, the linear echo being determined based on a transfer path between the speaker and the microphone, and estimate a magnitude of a residual echo at a current frame based on one or more of a magnitude of the linear echo at the current frame and a magnitude of the linear echo at a past frame.
The above and other objects and features of the inventive concepts will become apparent by describing in detail example embodiments thereof with reference to the accompanying drawings.
Referring to
The electronic device 10 according to an example embodiment of the inventive concepts may receive the reference signal from the electronic device 20 and may output the reference signal through a speaker 11. The electronic device 10 may receive ambient sound (e.g., a voice of a near-end speaker or a noise) of the electronic device 10 through a microphone 12. However, as the reference signal is output through the speaker 11, an echo (or an echo signal) with respect to the reference signal may be input to the microphone 12 of the electronic device 10. Accordingly, the electronic device 10 may cancel or suppress an echo included in a microphone signal for inhibiting (or, alternatively, preventing) a voice of the far-end speaker from being again played to the far-end speaker through a speaker 21 of the electronic device 20.
The electronic device 10 may include an application processor 100. The application processor 100 may include an audio processor 110 and a memory 120.
The electronic device 10 may include an acoustic echo cancellation (AEC) system 1000 implemented using processing circuitry such as hardware including logic circuits; a hardware/software combination such as a processor executing software; or a combination thereof. For example, the processing circuitry more specifically may include, but is not limited to, a central processing unit (CPU), an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a System-on-Chip (SoC), a programmable logic unit, a microprocessor, application-specific integrated circuit (ASIC), etc.
For example, the application processor 100 may include an acoustic echo cancellation (AEC) system 1000 implemented by using a hardware component(s), circuit(s), or module(s) for canceling the echo associated with the reference signal. For another example, the application processor 100 may execute or implement the AEC system 1000 that is executable by using a software component(s), a functional block(s), or a module(s).
In the case where all or a part of the AEC system 1000 is implemented by software, the audio processor 110 may execute a program code stored in the memory 120 to execute or implement the AEC system 1000. The audio processor 110 may include at least one core (processing unit) that may read and execute a command(s), an algorithm(s), or a function(s) included in the program code from the memory 120 for executing the AEC system 1000.
The memory 120 may be a non-transitory computer-readable medium that stores the program code for executing the AEC system 1000. The memory 120 may be a random access memory (RAM), a flash memory, a read only memory (ROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a register, a hard disk drive, a removable disk, a CD-ROM, or any type of storage medium. As illustrated in
The AEC system 1000 may include a linear echo canceler LEC and a residual echo suppressor RES. The linear echo canceler LEC may receive the reference signal. The linear echo canceler LEC may estimate (calculate) a linear echo X′(κ, m) of the reference signal linearly based on a transfer path between the speaker 11 and the microphone 12. “κ (kappa)” is a frame index associated with a time, and “m” is a frequency band. For example, the linear echo canceler LEC may execute a convolution operation for the reference signal and a transfer function that is modeled in advance to indicate the transfer path. The linear echo canceler LEC may include an adaptive filter. The linear echo canceler LEC may cancel the linear echo X′(κ, m) from a microphone signal by using the adaptive filter and may generate an output signal E(κ, m) as a result of the cancellation. A coefficient of the adaptive filter may be continually updated so that the linear echo X′(κ, m) is matched with an actual echo of the microphone signal.
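The cancellation and coefficient update described above may be illustrated, for a single frequency band, by the following minimal sketch. It assumes a complex NLMS filter over the most recent reference frames; the function name `lec_step`, the tap count, and the parameters `mu` and `eps` are hypothetical and are not taken from the example embodiments, which state only that an adaptive filter is used.

```python
import numpy as np

def lec_step(h, x_hist, y, mu=0.5, eps=1e-8):
    """One frame of a per-band linear echo canceler (illustrative NLMS sketch).

    h      : complex filter taps, shape (L,)                (assumed structure)
    x_hist : reference X(kappa..kappa-L+1, m), shape (L,)   (newest first)
    y      : microphone signal Y(kappa, m), complex scalar
    Returns (linear echo estimate X', output E, updated taps).
    """
    x_lin = np.dot(h, x_hist)              # estimated linear echo X'(kappa, m)
    e = y - x_lin                          # output signal E(kappa, m)
    norm = np.vdot(x_hist, x_hist).real    # reference energy ||x||^2
    # Normalized gradient step that drives X' toward the actual echo.
    h_new = h + mu * e * np.conj(x_hist) / (norm + eps)
    return x_lin, e, h_new
```

In this sketch the filter is updated on every frame; a practical canceler may freeze or slow the update under some conditions, which is outside the scope of the sketch.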
Since the linear echo canceler LEC estimates the linear echo X′(κ, m) of the reference signal linearly, a residual echo may exist in the output signal E(κ, m). For example, causes of nonlinear distortion of an echo, that is, harmonic distortion (HD), may include a nonlinear characteristic of the speaker 11 itself, a vibration generated as the speaker 11 and the microphone 12 are adjacent to each other, nonlinear characteristics of components of the electronic device 10, etc.
The residual echo suppressor RES may suppress a residual echo that is not canceled by the linear echo canceler LEC. The residual echo suppressor RES may generate a suppression gain G(κ, m), may multiply the suppression gain G(κ, m) and the output signal E(κ, m) together, and may generate an output signal R(κ, m). The residual echo suppressor RES may adaptively update the suppression gain G(κ, m) so that the residual echo included in the output signal R(κ, m) is reduced to the greatest extent possible. The electronic device 10 may provide the echo-canceled microphone signal to the electronic device 20, and the far-end speaker may hear the microphone signal, from which the echo is removed to the greatest extent possible, through the speaker 21.
Referring to
The AEC system 1000 may include first and second Fourier transform (FT) modules 1100 and 1200, a linear echo canceler 1300, a residual echo suppressor 1400, and an inverse Fourier transform (IFT) module 1500. A transform module may also be called a “converter” or a “transformer”. As discussed above, the AEC system 1000 may be implemented by execution of a program code to estimate a magnitude of a residual echo, and the executed program code may transform the processor into a special purpose computer to perform the functions of a component(s), a function(s), a block(s), or a module(s) of the AEC system 1000.
The first Fourier transform module 1100 may receive a microphone signal and may transform (convert) the microphone signal from a time domain to a frequency domain. For example, the first Fourier transform module 1100 may perform a transformation operation based on modulated complex lapped transform (MCLT), Fourier transform (FT), short-time Fourier transform (STFT), discrete Fourier transform (DFT), fast Fourier transform (FFT), etc. and may output a microphone signal Y(κ, m). The second Fourier transform module 1200 may receive the reference signal and may transform the reference signal from a time domain to a frequency domain. The second Fourier transform module 1200 may perform a transformation operation in a way similar to the first Fourier transform module 1100 and may output a reference signal X(κ, m).
The linear echo canceler 1300 may execute a convolution operation on a reference signal and a transfer function that is modeled in advance to indicate a path between the speaker 11 and the microphone 12 and may estimate a linear echo X′(κ, m) of the reference signal. The linear echo canceler 1300 may provide the linear echo X′(κ, m) and the output signal E(κ, m), which is obtained by canceling the linear echo X′(κ, m) from the microphone signal Y(κ, m), to the residual echo suppressor 1400.
The residual echo suppressor 1400 may receive the microphone signal Y(κ, m), the linear echo X′(κ, m), an estimated residual echo, and the output signal E(κ, m). The residual echo suppressor 1400 according to an example embodiment of the inventive concepts may calculate the suppression gain G(κ, m) for suppressing a residual echo by using the microphone signal Y(κ, m), the linear echo X′(κ, m), and the estimated residual echo. The residual echo suppressor 1400 may generate the output signal R(κ, m) by multiplying the suppression gain G(κ, m) and the output signal E(κ, m) together.
The inverse Fourier transform module 1500 may receive the output signal R(κ, m) from the residual echo suppressor 1400. The inverse Fourier transform module 1500 may transform the output signal R(κ, m) from a frequency domain to a time domain. For example, the inverse Fourier transform module 1500 may perform a transformation operation based on inverse MCLT, inverse FT, inverse STFT, inverse DFT, inverse FFT, etc.
Referring to
The residual echo estimator 1410a may receive the linear echo X′(κ, m) and linear echoes X′(κ−t, m) at (or, alternatively, in) past frames stored in the memory 120. Unlike the illustration of
In Equation 1, “i” is a fundamental frequency band, “M” is the number of sub-bands, “j” is a harmonic, “H” is the number of harmonics to be considered, “2K+1” is a length of a harmonic search window that is indexed by “k” (the Latin letter, as distinct from the frame index “κ”), “m” is a frequency bin index, “WR(i, j, k)” are weights that are applied to a magnitude of the linear echo X′(κ, m) at a current frame, “T” is the number of past frames for performing estimation by using “t” as an index, and “WT1(t, m)” are weights that are applied to magnitudes of the linear echoes X′(κ−t, m) at past frames. The weights WR(i, j, k) and WT1(t, m) may be adaptively updated by the residual echo estimator 1410a.
According to an example embodiment of the inventive concepts, the residual echo estimator 1410a may estimate the magnitude of the residual echo {circumflex over (D)}r(κ, m) at a current frame by using magnitudes of the linear echoes X′(κ−t, m) at past frames as well as a magnitude of the linear echo X′(κ, m) at the current frame. A time correlation of the linear echo X′(κ, m) may be used to calculate the magnitude of the residual echo {circumflex over (D)}r(κ, m). Accordingly, echo return loss enhancement (ERLE) of the AEC system 1000 may be improved.
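As a minimal sketch of how the time correlation may enter the estimate, the following example forms the residual echo magnitude in each band as a weighted sum of the linear echo magnitudes at the current frame and at T past frames. This is a simplification made for illustration: the harmonic term weighted by WR(i, j, k) is replaced by a single per-band weight on the current frame, and the function name and array layout are hypothetical.

```python
import numpy as np

def estimate_residual_echo(lin_mag_hist, w_t1):
    """Weighted sum over frames of linear echo magnitudes (illustrative).

    lin_mag_hist : shape (T+1, M); row 0 is |X'(kappa, m)|, row t is |X'(kappa-t, m)|
    w_t1         : shape (T+1, M); per-frame, per-band weights (assumed layout)
    Returns the estimated residual echo magnitude per band, shape (M,).
    """
    lin_mag_hist = np.asarray(lin_mag_hist, dtype=float)
    w_t1 = np.asarray(w_t1, dtype=float)
    if lin_mag_hist.shape != w_t1.shape:
        raise ValueError("history and weights must have the same shape")
    # Sum over the frame axis so each band keeps its own estimate.
    return np.sum(w_t1 * lin_mag_hist, axis=0)
```

For example, with T = 1 and M = 2, a history of [[1, 2], [3, 4]] and weights of [[0.5, 0.5], [0.1, 0.2]] give an estimate of [0.8, 1.8].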
The magnitudes of the linear echoes X′(κ−t, m) at the past frames may be stored in the memory 120. As described above, the memory 120 may be any type of storage medium. The memory 120 may be registers in the application processor 100, a tightly coupled memory (TCM), or a static random access memory (SRAM). The memory 120 may be a dynamic random access memory (DRAM) outside the application processor 100. The magnitude of the linear echo X′(κ, m) at the current frame may be stored in the memory 120. Both the magnitudes of the linear echoes X′(κ−t, m) at the past frames and the magnitude of the linear echo X′(κ, m) at the current frame may be used to estimate the magnitude of the residual echo {circumflex over (D)}r(κ+1, m).
The noise estimator 1420 may estimate a magnitude of a noise N(κ, m) by using a magnitude of the output signal E(κ, m) of the linear echo canceler 1300. For example, the noise estimator 1420 may calculate the noise N(κ, m) based on minimum statistics.
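A simplified sketch of a minimum-statistics style noise estimator is given below. It tracks, per band, the minimum of a smoothed magnitude over a sliding window of recent frames. The class name, the window length, and the smoothing factor are assumptions made for illustration, and the bias compensation used by full minimum-statistics methods is omitted.

```python
from collections import deque

import numpy as np

class MinStatNoiseEstimator:
    """Sliding-window minimum of a smoothed magnitude (illustrative sketch)."""

    def __init__(self, num_bins, window=50, alpha=0.9):
        self.alpha = alpha                    # smoothing factor (assumed value)
        self.smoothed = np.zeros(num_bins)
        self.history = deque(maxlen=window)   # last `window` smoothed frames
        self.initialized = False

    def update(self, e_mag):
        """Feed |E(kappa, m)| for one frame and return the noise estimate N(kappa, m)."""
        e_mag = np.asarray(e_mag, dtype=float)
        if not self.initialized:
            self.smoothed = e_mag.copy()
            self.initialized = True
        else:
            self.smoothed = self.alpha * self.smoothed + (1.0 - self.alpha) * e_mag
        self.history.append(self.smoothed.copy())
        # Speech and echo bursts raise the smoothed level only briefly,
        # so the window minimum follows the stationary noise floor.
        return np.min(np.stack(self.history), axis=0)
```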
The first to third smoothing calculators 1431 to 1433 may respectively execute smoothing operations on the magnitude of the residual echo {circumflex over (D)}r(κ, m), the magnitude of the output signal E(κ, m) of the linear echo canceler 1300, and the magnitude of the noise N(κ, m). The first to third smoothing calculators 1431 to 1433 may apply a recursive average to the magnitude of the residual echo {circumflex over (D)}r(κ, m), the magnitude of the output signal E(κ, m) of the linear echo canceler 1300, and the magnitude of the noise N(κ, m). For example, the smoothing calculators 1431 to 1433 may calculate smoothed magnitudes of a residual echo, an output signal of the linear echo canceler 1300, and a noise using Equation 2 to Equation 4, respectively.
In Equation 2 to Equation 4, “α1”, “α2”, and “α3” may be variables between 0 and 1, and may be the same as or different from each other.
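The recursive average may be sketched as a first-order smoother, as below. The exact form of Equation 2 to Equation 4 is assumed here (in particular, which term the smoothing variable multiplies), and the function name is hypothetical.

```python
import numpy as np

def smooth(prev_smoothed, magnitude, alpha):
    """First-order recursive average (assumed form of the smoothing step).

    prev_smoothed : smoothed magnitude at frame kappa-1, per band
    magnitude     : instantaneous magnitude at frame kappa, per band
    alpha         : smoothing variable between 0 and 1
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    prev_smoothed = np.asarray(prev_smoothed, dtype=float)
    magnitude = np.asarray(magnitude, dtype=float)
    # A larger alpha keeps more of the past and reacts more slowly.
    return alpha * prev_smoothed + (1.0 - alpha) * magnitude
```

The same function may be applied separately, with its own smoothing variable, to the residual echo, the output signal of the linear echo canceler 1300, and the noise.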
The gain calculator 1440 may calculate the suppression gain G(κ, m), which is to be multiplied with the output signal E(κ, m) of the linear echo canceler 1300, by using the smoothed magnitudes of the residual echo, the output signal of the linear echo canceler 1300, and the noise. The gain calculator 1440 may calculate the suppression gain G(κ, m) by using various operation techniques. For example, the gain calculator 1440 may calculate the suppression gain G(κ, m) using Equation 5 based on spectral subtraction.
In Equation 5, “β” is a variable between 0 and 1.
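One common form of a spectral subtraction gain is sketched below for illustration only; it is not asserted to be Equation 5. The over-subtraction factor `over` and the gain floor `floor` are hypothetical parameters introduced for the sketch, the role of “β” is not reproduced, and the noise magnitude is ignored for brevity.

```python
import numpy as np

def spectral_subtraction_gain(e_bar, d_bar, over=1.0, floor=0.05, eps=1e-12):
    """Generic spectral subtraction gain (illustrative, not Equation 5).

    e_bar : smoothed magnitude of the output signal E, per band
    d_bar : smoothed magnitude of the estimated residual echo, per band
    over  : over-subtraction factor (assumption)
    floor : minimum gain that limits musical-noise artifacts (assumption)
    """
    e_bar = np.asarray(e_bar, dtype=float)
    d_bar = np.asarray(d_bar, dtype=float)
    gain = (e_bar - over * d_bar) / (e_bar + eps)
    # Keep the gain within [floor, 1] so that a band is never amplified.
    return np.clip(gain, floor, 1.0)
```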
For another example, the suppression gain G(κ, m) may be calculated by Equation 6 based on a Wiener filter.
In Equation 6, ζ(κ, m) may be calculated by Equation 7.
In Equation 7, “α4” is a variable between 0 and 1, and u(⋅) is a unit step function, and γ(κ, m) may be calculated by Equation 8.
In Equation 8, “α5” is a variable between 0 and 1.
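A Wiener-type gain driven by a decision-directed ratio estimate may be sketched as follows. This follows the widely used textbook formulation and is only assumed to resemble Equation 6 to Equation 8; the function name, the argument names, and the default value of `alpha` are hypothetical.

```python
import numpy as np

def wiener_gain(e_mag, d_mag, prev_gain, prev_gamma, alpha=0.98, eps=1e-12):
    """Wiener gain with a decision-directed a priori ratio (illustrative).

    e_mag      : magnitude of the output signal E(kappa, m), per band
    d_mag      : magnitude of the estimated residual echo, per band
    prev_gain  : gain G(kappa-1, m) from the previous frame
    prev_gamma : a posteriori ratio from the previous frame
    Returns (gain, gamma) for the current frame.
    """
    e_mag = np.asarray(e_mag, dtype=float)
    d_mag = np.asarray(d_mag, dtype=float)
    prev_gain = np.asarray(prev_gain, dtype=float)
    prev_gamma = np.asarray(prev_gamma, dtype=float)
    gamma = e_mag ** 2 / (d_mag ** 2 + eps)      # a posteriori ratio
    # max(gamma - 1, 0) plays the role of (gamma - 1) gated by a unit step.
    zeta = alpha * prev_gain ** 2 * prev_gamma \
        + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
    gain = zeta / (1.0 + zeta)                   # Wiener gain, always below 1
    return gain, gamma
```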
For another example, the gain calculator 1440 may calculate the suppression gain G(κ, m) using Equation 9 based on a minimum mean square error short-time spectral amplitude (MMSE-STSA) estimator.
In Equation 9, Γ(⋅) is a gamma function, v(κ, m) may be calculated by Equation 10, and M(⋅;⋅;⋅) may be a confluent hypergeometric function.
The multiplier 1450 may generate the output signal R(κ, m) by multiplying the suppression gain G(κ, m) and the output signal E(κ, m) of the linear echo canceler 1300 together.
Referring to
The residual echo estimator 1410b may receive the linear echo X′(κ, m) and the linear echoes X′(κ−t, m) at past frames stored in the memory 120. Unlike illustration of
WT2(t, m) are weights that are applied to the magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at the past frames. The weights WT2(t, m) may be adaptively updated by the residual echo estimator 1410b. According to another embodiment of the inventive concept, the residual echo estimator 1410b may estimate a magnitude of the residual echo at the current frame {circumflex over (D)}r (κ, m) by using magnitudes of the linear echoes X′(κ−t, m) at the past frames, magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at the past frames, as well as a magnitude of the linear echo X′(κ, m) at the current frame. A time correlation of the linear echo X′(κ, m) and a time correlation of the residual echo {circumflex over (D)}r(κ, m) may be used to calculate the magnitude of the residual echo {circumflex over (D)}r(κ, m). Accordingly, the degree to which the ERLE is improved by the residual echo estimator 1410b may be greater than the degree to which the ERLE is improved by the residual echo estimator 1410a of
The magnitudes of the residual echoes {circumflex over (D)}r(κ−t, m) at the past frames may be stored in the memory 120. The magnitude of the residual echo {circumflex over (D)}r (κ, m) at the current frame may also be stored in the memory 120. The magnitudes of the residual echoes {circumflex over (D)}r(κ−t, m) at the past frames, the magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame, the magnitudes of the linear echoes X′(κ−t, m) at the past frames, and the magnitude of the linear echo X′(κ, m) at the current frame may be used to estimate a magnitude of the residual echo {circumflex over (D)}r(κ+1, m) at a next frame.
Referring to
The residual echo estimator 1410c may receive the linear echo X′(κ, m) and the linear echoes X′(κ−t, m) at past frames stored in the memory 120. The residual echo estimator 1410c may further receive the microphone signal Y(κ, m) and microphone signals Y(κ−t, m) at past frames stored in the memory 120. The residual echo estimator 1410c may estimate a magnitude of the residual echo {circumflex over (D)}r(κ, m) at a current frame by using a magnitude of the linear echo X′(κ, m) at the current frame, magnitudes of the linear echoes X′(κ−t, m) at the past frames, a magnitude of the microphone signal Y(κ, m) at the current frame, and magnitudes of the microphone signals Y(κ−t, m) at the past frames. For example, the magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame may be calculated by Equation 12.
WT3(t, m) are weights that are applied to the magnitude of the microphone signal Y(κ, m) at the current frame and the magnitudes of the microphone signals Y(κ−t, m) at the past frames. The weights WT3(t, m) may be adaptively updated by the residual echo estimator 1410c. According to another embodiment of the inventive concept, the residual echo estimator 1410c may estimate the magnitude of the residual echo {circumflex over (D)}r (κ, m) at the current frame by using the magnitudes of the linear echoes X′(κ−t, m) at the past frames, the magnitude of the microphone signal Y(κ, m) at the current frame, and the magnitudes of the microphone signals Y(κ−t, m) at the past frames, as well as the magnitude of the linear echo X′(κ, m) at the current frame. A time correlation of the linear echo X′(κ, m) and a time correlation of the microphone signal Y(κ, m) may be used to calculate the magnitude of the residual echo {circumflex over (D)}r(κ, m). Accordingly, the degree to which the ERLE is improved by the residual echo estimator 1410c may be greater than the degree to which the ERLE is improved by the residual echo estimator 1410a of
The magnitudes of the microphone signals Y(κ−t, m) at the past frames may be stored in the memory 120. The magnitude of the microphone signal Y(κ, m) at the current frame may be stored in the memory 120. The magnitudes of the microphone signals Y(κ−t, m) at the past frames, the magnitude of the microphone signal Y(κ, m) at the current frame, the magnitudes of the linear echoes X′(κ−t, m) at the past frames, and the magnitude of the linear echo X′(κ, m) at the current frame may be used to estimate a magnitude of the residual echo {circumflex over (D)}r(κ+1, m) at a next frame.
Referring to
The residual echo estimator 1410d may receive the linear echo X′(κ, m) and the linear echoes X′(κ−t, m) at past frames stored in the memory 120. The residual echo estimator 1410d may further receive the residual echoes {circumflex over (D)}r(κ−t, m) at the past frames stored in the memory 120, the microphone signal Y(κ, m) at the current frame, and the microphone signals Y(κ−t, m) at the past frames stored in the memory 120. The residual echo estimator 1410d may estimate a magnitude of the residual echo {circumflex over (D)}r (κ, m) at the current frame by using a magnitude of the linear echo X′(κ, m) at the current frame, magnitudes of the linear echoes X′(κ−t, m) at the past frames, magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at the past frames, a magnitude of the microphone signal Y(κ, m) at the current frame, and magnitudes of the microphone signals Y(κ−t, m) at the past frames. For example, the magnitude of the residual echo {circumflex over (D)}r (κ, m) at the current frame may be calculated by Equation 13.
Weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) may be adaptively updated by the residual echo estimator 1410d. According to another embodiment of the inventive concept, the residual echo estimator 1410d may estimate the magnitude of the residual echo {circumflex over (D)}r (κ, m) at the current frame by using the magnitudes of the linear echoes X′(κ−t, m) at the past frames, the magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at the past frames, the magnitude of the microphone signal Y(κ, m) at the current frame, and the magnitudes of the microphone signals Y(κ−t, m) at the past frames, as well as the magnitude of the linear echo X′(κ, m) at the current frame. A time correlation of the linear echo X′(κ, m), a time correlation of the residual echo {circumflex over (D)}r (κ, m), and a time correlation of the microphone signal Y(κ, m) may be used to calculate a magnitude of the residual echo {circumflex over (D)}r (κ, m). Accordingly, the degree to which the ERLE is improved by the residual echo estimator 1410d may be greater than the degree to which the ERLE is improved by the residual echo estimators 1410a to 1410c of
The magnitudes of the linear echoes X′(κ−t, m) at the past frames may be stored in the memory 120. The magnitude of the linear echo X′(κ, m) at the current frame may be stored in the memory 120. The magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at the past frames may be stored in the memory 120. The magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame may be stored in the memory 120. The magnitudes of the microphone signals Y(κ−t, m) at the past frames, the magnitude of the microphone signal Y(κ, m) at the current frame, the magnitudes of the linear echoes X′(κ−t, m) at the past frames, the magnitude of the linear echo X′(κ, m) at the current frame, the magnitudes of the residual echoes {circumflex over (D)}r(κ−t, m) at the past frames, and the magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame may be used to estimate a magnitude of the residual echo {circumflex over (D)}r(κ+1, m) at a next frame.
In an example embodiment, each of the residual echo estimators 1410a to 1410d may be a hardware accelerator (i.e., a processor) implemented by using a hardware component(s), a circuit(s), or a module(s). The hardware accelerator may execute the AEC system 1000 together with the audio processor 110 described with reference to
Referring to
The memory 120 may be registers, a TCM, or an SRAM positioned inside the application processor 100. The memory 200 may be a DRAM positioned outside the application processor 100. The magnitude of the linear echo X′(κ, m) at the current frame, the magnitude of the microphone signal Y(κ, m), and the magnitude of the residual echo {circumflex over (D)}r (κ, m), which are described with reference to
The magnitude of the linear echo X′(κ, m) stored in the memory 200, the magnitude of the microphone signal Y(κ, m), and the magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame may be used to estimate the magnitude of the residual echo {circumflex over (D)}r (κ+1, m) at a next frame and may correspond to magnitudes at a past frame. The memory 200 may transmit the stored magnitudes to the application processor 100 as the magnitudes of the linear echoes X′(κ−t, m), the magnitudes of the microphone signals Y(κ−t, m), and the magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at past frames.
Also, the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) that are applied to the linear echoes X′(κ, m) and X′(κ−t, m) at the current and past frames, the magnitudes of the microphone signals Y(κ, m), and the magnitudes of the residual echoes {circumflex over (D)}r(κ−t, m) may be further stored in the memory 200 as well as the memory 120. The memory 200 may transmit the stored weights to the application processor 100. The weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) may be adaptively updated.
Referring to
The double talk detector 1600 may detect whether a double talk occurs, by using a reference signal and a microphone signal. The double talk may indicate a situation where a voice of the far-end speaker and a voice of the near-end speaker are together input to the electronic device 10. For example, the double talk detector 1600 may detect a double talk by comparing a magnitude of the reference signal with a magnitude of the microphone signal or based on a correlation between the reference signal and the microphone signal.
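The magnitude comparison mentioned above may be sketched as follows. The sketch assumes that the echo path attenuates the reference signal, so that a microphone level well above a fraction of the reference level suggests near-end speech; the function name and the threshold `ratio_threshold` are hypothetical.

```python
import numpy as np

def detect_double_talk(ref_mag, mic_mag, ratio_threshold=0.5, eps=1e-12):
    """Level-comparison double talk detector for one frame (illustrative).

    ref_mag : magnitudes of the reference signal X(kappa, m), per band
    mic_mag : magnitudes of the microphone signal Y(kappa, m), per band
    Returns True when the microphone level exceeds what echo alone would
    be expected to produce under the assumed attenuation.
    """
    ref_level = np.sqrt(np.mean(np.square(np.asarray(ref_mag, dtype=float))))
    mic_level = np.sqrt(np.mean(np.square(np.asarray(mic_mag, dtype=float))))
    return bool(mic_level > ratio_threshold * ref_level + eps)
```

A correlation-based detector may be substituted without changing how the detection result is used by the residual echo estimator.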
In an example embodiment, when the double talk is detected by the double talk detector 1600, an estimation method of the residual echo estimators 1410b to 1410d may be changed. For example, magnitudes of the residual echo {circumflex over (D)}r (κ, m) at a current frame, which are respectively estimated by the residual echo estimators 1410b to 1410d, may be respectively calculated by Equation 14 to Equation 16.
In Equation 14 to Equation 16, g(⋅) is a function associated with the double talk. When the double talk is detected, the magnitudes of the microphone signals Y(κ, m) and Y(κ−t, m) at the current and past frames and the magnitudes of the residual echoes {circumflex over (D)}r (κ−t, m) at the past frames may not be used (may be deactivated), and may not be applied to the magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame. When the double talk is not detected, as described above, the residual echo estimators 1410a to 1410d may estimate the magnitude of the residual echo {circumflex over (D)}r(κ, m) at the current frame by Equation 1 and Equation 11 to Equation 13, respectively.
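The deactivation described above may be sketched with a gate that is 0 when the double talk is detected and 1 otherwise. The binary form of the gate, the omission of the harmonic term, and all names below are assumptions made for illustration; the actual form of g(⋅) is not reproduced.

```python
import numpy as np

def estimate_with_double_talk_gate(lin_hist, res_hist, mic_hist,
                                   w_t1, w_t2, w_t3, double_talk):
    """Residual echo estimate with double-talk gating (illustrative sketch).

    lin_hist : linear echo magnitudes, current and past frames, shape (T+1, M)
    res_hist : past residual echo magnitudes, shape (T, M)
    mic_hist : microphone magnitudes, current and past frames, shape (T+1, M)
    w_t1, w_t2, w_t3 : weights with the same shapes as the matching history
    double_talk : bool result of the double talk detector
    """
    gate = 0.0 if double_talk else 1.0   # assumed binary form of g(.)
    lin_term = np.sum(np.asarray(w_t1) * np.asarray(lin_hist), axis=0)
    res_term = np.sum(np.asarray(w_t2) * np.asarray(res_hist), axis=0)
    mic_term = np.sum(np.asarray(w_t3) * np.asarray(mic_hist), axis=0)
    # Near-end speech contaminates the microphone and residual histories,
    # so only the linear echo term is kept while double talk is present.
    return lin_term + gate * (res_term + mic_term)
```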
Referring to
In operation S120, the residual echo estimator 1410 may update at least a part of weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) selectively used in the estimation of operation S110 so that a difference between the magnitude of the output signal E(κ, m) of the linear echo canceler 1300 and the magnitude of the residual echo {circumflex over (D)}r(κ, m) is minimized. Equation 17 indicates the above difference.
$$\xi(\kappa, m) = E(\kappa, m) - \hat{D}_r(\kappa, m) \qquad \text{[Equation 17]}$$
The residual echo estimator 1410 may update at least a part of the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) selectively used in the estimation of operation S110 by using the difference of Equation 17. The residual echo estimator 1410 may use a normalized least mean square (NLMS) algorithm, a recursive least square (RLS) algorithm, a Kalman filter, etc. for optimizing the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m). For example, in the case where the NLMS is used, the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) may be optimized as represented by Equation 18. In Equation 18, weights may be updated in the direction of an arrow, and P1, P2, and P3 may be smoothed powers of the linear echo X′(κ, m), the residual echo {circumflex over (D)}r(κ, m), and the microphone signal Y(κ, m), respectively.
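An NLMS update of the temporal weights for a single band may be sketched as follows. For brevity the sketch normalizes by the instantaneous power of the input magnitudes rather than by a smoothed power, and it updates one weight vector only; the function name and the step size `mu` are hypothetical.

```python
import numpy as np

def nlms_update(w, x_mag, e_mag, mu=0.1, eps=1e-8):
    """One NLMS step on the weights of one band (illustrative sketch).

    w     : weights over frames, shape (T+1,)
    x_mag : magnitudes being weighted, e.g. |X'(kappa-t, m)| for t = 0..T
    e_mag : magnitude of the output signal E(kappa, m), scalar
    Returns (updated weights, error xi).
    """
    w = np.asarray(w, dtype=float)
    x_mag = np.asarray(x_mag, dtype=float)
    d_hat = float(np.dot(w, x_mag))       # estimated residual echo magnitude
    xi = e_mag - d_hat                    # difference between |E| and the estimate
    power = float(np.dot(x_mag, x_mag))   # instantaneous power (assumption)
    w_new = w + mu * xi * x_mag / (power + eps)
    return w_new, xi
```

Replacing `power` with a recursively smoothed power makes the step size less sensitive to frame-to-frame fluctuations.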
For another example, in the case where the RLS is used, the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) may be optimized as represented by Equation 19 to Equation 21. In Equation 19, weights may be updated in the direction of an arrow.
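The standard exponentially weighted RLS recursion may be sketched as follows for one band. It is only assumed to correspond to Equation 19 to Equation 21; the class name, the forgetting factor `lam`, and the initialization constant `delta` are hypothetical.

```python
import numpy as np

class RlsWeights:
    """Textbook RLS recursion for one band's weights (illustrative sketch)."""

    def __init__(self, n, lam=0.99, delta=100.0):
        self.w = np.zeros(n)          # weights to be adapted
        self.P = delta * np.eye(n)    # inverse correlation matrix estimate
        self.lam = lam                # forgetting factor (assumed value)

    def update(self, x_mag, e_mag):
        """Update the weights from input magnitudes and target |E(kappa, m)|."""
        x = np.asarray(x_mag, dtype=float)
        xi = e_mag - float(self.w @ x)           # a priori error
        Px = self.P @ x
        k = Px / (self.lam + float(x @ Px))      # gain vector
        self.w = self.w + k * xi
        # P is symmetric, so x^T P equals (P x)^T.
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return xi
```

Compared with the NLMS, this recursion typically converges in fewer frames at the cost of a matrix update per band per frame.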
In an example embodiment, the residual echo estimator 1410 may execute an artificial neural network that updates the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) used so that a difference between the magnitude of the output signal E(κ, m) of the linear echo canceler 1300 and the magnitude of the residual echo {circumflex over (D)}r(κ, m) is reduced (or, alternatively, minimized). The artificial neural network may learn the weights WR(i, j, k), WT1(t, m), WT2(t, m), and WT3(t, m) based on the above adaptive algorithm so that the above difference is minimized. Because a magnitude of the output signal E(κ, m) of the linear echo canceler 1300 indicates a magnitude of an actual residual echo, an ideal value of the difference may be, for example, “0”.
In operation S130, the first to third smoothing calculators 1431 to 1433 may respectively execute smoothing operations on the magnitude of the residual echo {circumflex over (D)}r (κ, m), the magnitude of the output signal E(κ, m) of the linear echo canceler 1300, and the magnitude of the noise N(κ, m).
In operation S140, the gain calculator 1440 may calculate the suppression gain G(κ, m), which is multiplied with the output signal E(κ, m) of the linear echo canceler 1300, by using the smoothed magnitudes of the residual echo, the output signal of the linear echo canceler 1300, and the noise.
In operation S150, the multiplier 1450 may generate the output signal R(κ, m) by multiplying the output signal E(κ, m) of the linear echo canceler 1300 and the suppression gain G(κ, m) together.
Referring to
In the case where a first mode is selected, in operation S112a, the residual echo estimator 1410 may estimate a magnitude of the residual echo {circumflex over (D)}r(κ, m) by using only a magnitude of the linear echo X′(κ, m) at (or, alternatively, of) a current frame. The first mode may be a default mode, in which a time correlation of a linear echo, a time correlation of a residual echo, and a time correlation of a microphone signal may not be used to estimate a magnitude of the residual echo {circumflex over (D)}r(κ, m).
In the case where a second mode is selected, in operation S113a, the residual echo estimator 1410 may estimate the magnitude of the residual echo {circumflex over (D)}r (κ, m) by further using magnitudes of the linear echoes X′(κ−t, m) at past frames in addition to the magnitude of the linear echo X′(κ, m) at the current frame (refer to the residual echo estimator 1410a of
In the case where a third mode is selected, in operation S114a, the residual echo estimator 1410 may estimate the magnitude of the residual echo {circumflex over (D)}r(κ, m) by further using the magnitudes of the linear echoes X′(κ−t, m) at the past frames and magnitudes of the residual echoes {circumflex over (D)}r(κ−t, m) at the past frames in addition to the magnitude of the linear echo X′(κ, m) at the current frame (refer to the residual echo estimator 1410b of
In the case where a fourth mode is selected, in operation S115a, the residual echo estimator 1410 may estimate the magnitude of the residual echo D̂r(κ, m) by further using the magnitudes of the linear echoes X′(κ−t, m) at the past frames and magnitudes of the microphone signals Y(κ, m) and Y(κ−t, m) at the current and past frames in addition to the magnitude of the linear echo X′(κ, m) at the current frame (refer to the residual echo estimator 1410c of
In the case where a fifth mode is selected, in operation S116a, the residual echo estimator 1410 may estimate the magnitude of the residual echo D̂r(κ, m) by further using the magnitudes of the linear echoes X′(κ−t, m) at the past frames, the magnitudes of the residual echoes D̂r(κ−t, m) at the past frames, and the magnitudes of the microphone signals Y(κ, m) and Y(κ−t, m) at the current and past frames in addition to the magnitude of the linear echo X′(κ, m) of the current frame (refer to the residual echo estimator 1410d of
However, the amount of operations that are necessary to update weights and estimate a magnitude of a residual echo when one of the second to fifth modes is selected may increase compared with the amount of operations when the first mode is selected. Accordingly, the user may appropriately select any one of the first to fifth modes in consideration of the ERLE and a hardware resource of the electronic device 10.
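The inputs that each of the first to fifth modes draws on can be summarized as a per-mode input vector. The sketch below only illustrates which magnitudes are gathered in each mode, as described above. The function names `build_inputs` and `estimate`, the history-list layout, and the number of past frames `taps` are assumptions, and the plain weighted sum stands in for whatever weighting the estimator actually applies.

```python
def build_inputs(mode, x_hist, d_hist, y_hist, taps=2):
    """Collect the magnitudes used by a given mode for one frequency bin.

    Assumed layout (hypothetical):
      x_hist[0] is X'(k, m) and x_hist[t] is X'(k-t, m)
      d_hist[t-1] is the residual echo estimate at frame k-t (t >= 1)
      y_hist[0] is Y(k, m) and y_hist[t] is Y(k-t, m)
    """
    if mode not in (1, 2, 3, 4, 5):
        raise ValueError("mode must be between 1 and 5")
    inputs = [x_hist[0]]                  # all modes: current linear echo
    if mode >= 2:
        inputs += x_hist[1:taps + 1]      # modes 2-5: past linear echoes
    if mode in (3, 5):
        inputs += d_hist[:taps]           # modes 3, 5: past residual echoes
    if mode in (4, 5):
        inputs += y_hist[:taps + 1]       # modes 4, 5: current and past mic
    return inputs


def estimate(weights, inputs):
    """Weighted sum of the selected magnitudes (assumed estimator form)."""
    if len(weights) != len(inputs):
        raise ValueError("one weight is required per input")
    return sum(w * v for w, v in zip(weights, inputs))
```

The length of the input vector grows from one value in the first mode to `3 * taps + 2` values in the fifth mode, which reflects the increase in operations noted above.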
In operation S111b, the residual echo suppressor 1400 may select an AEC mode based on an echo return loss enhancement (ERLE) at a past frame. The residual echo suppressor 1400 may estimate a magnitude of a residual echo at a past frame according to a mode previously selected among the first to fifth modes and may evaluate the ERLE. The residual echo suppressor 1400 may compare the ERLE with a reference value and may change the previously selected mode to another mode. For example, when the ERLE does not reach the reference value, the residual echo suppressor 1400 may select any one of the second to fifth modes in which a higher ERLE may be expected. For another example, when the ERLE is considerably greater than the reference value, the residual echo suppressor 1400 may select any one of the second to fifth modes for reducing the amount of operations. Operation S112b to operation S116b are substantially the same as operation S112a to operation S116a of
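The ERLE-based selection of operation S111b can be sketched as follows. This is a hedged illustration only: the ERLE is computed with the commonly used ratio of microphone power to residual power in decibels, and the stepping policy (move to the next higher mode when the ERLE falls short, move to the next lower mode when the ERLE exceeds the reference by `margin_db`) is an assumption, as is the premise that a lower-numbered mode needs fewer operations. The names `erle_db`, `select_mode`, `reference_db`, and `margin_db` are hypothetical.

```python
import math


def erle_db(mic_power, residual_power, eps=1e-12):
    """ERLE in dB as 10*log10(mic power / residual power); eps avoids log(0)."""
    return 10.0 * math.log10((mic_power + eps) / (residual_power + eps))


def select_mode(prev_mode, erle, reference_db, margin_db=6.0):
    """Pick the mode for the next frame from the ERLE at a past frame.

    Assumed policy (not taken from the source):
      ERLE below the reference          -> step to the next higher mode
      ERLE above reference + margin_db  -> step to the next lower mode
      otherwise                         -> keep the previous mode
    """
    if erle < reference_db:
        return min(5, prev_mode + 1)
    if erle > reference_db + margin_db:
        return max(1, prev_mode - 1)
    return prev_mode
```

For instance, `select_mode(2, erle_db(1.0, 0.1), reference_db=20.0)` moves from the second mode to the third, because a power ratio of 10 corresponds to roughly 10 dB, which is below the 20 dB reference.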
Referring to
The speaker 2001 and the microphone 2002 may be included to process voice information of the electronic device 2000. The speaker 2001 and the microphone 2002 may be the speaker 11 and the microphone 12 described with reference to
The electronic device 2000 may communicate with an external system or an external device (e.g., the electronic device 20 of
The electronic device 2000 may include storage 2004. The storage 2004 may correspond to the memory 200 described with reference to
A display 2005 of the electronic device 2000 may provide an interface to the user under control of the application processor 2100. For example, the user may activate the above-described AEC system 1000 through the display 2005. In detail, the user may activate the AEC system 1000 and may select one of the modes (the first to fifth modes of
The application processor 2100 may control overall operations of the electronic device 2000. The application processor 2100 may be implemented in the form of a system-on-chip (SoC). The application processor 2100 may include an audio processor 2110, a main processor 2120, a memory 2130, and an input/output interface 2140. The audio processor 2110 may be the audio processor 110 described above, and the memory 2130 may be the memory 120 described above.
The main processor 2120 may control overall operations of the application processor 2100 independently of the audio processor 2110. The main processor 2120 may load a variety of software (e.g., an application program, an operating system, and a device driver), which the application processor 2100 supports, onto the memory 2130 and may execute the loaded software. The main processor 2120 may include one or more central processing units or one or more graphics processing units, that is, one or more cores.
Software, a program, or a program code executable by the main processor 2120 may be loaded onto the memory 2130. For example, the program or the program code may include a kernel, middleware, an application programming interface (API), and application programs AP1 to AP4. At least a portion of the kernel, the middleware, and the API may be called an “OS”. The kernel may manage a resource (e.g., the memory 2130 or the storage 2004) used to execute operations or functions by other programs (e.g., the middleware, the API, and the application programs AP1 to AP4). The middleware may perform a mediation role for exchanging data between the API or the application programs AP1 to AP4 and the kernel. The API may be an interface that the application programs AP1 to AP4 use to control a function provided by the kernel or the middleware.
The number of the application programs AP1 to AP4 is not limited to the illustration of
The input/output interface 2140 may perform an interface operation for data exchange between the application processor 2100 and any other components of the electronic device 2000. For example, data stored in the memory 2130 may be transmitted or backed up through the input/output interface 2140.
According to an example embodiment of the inventive concepts, a time correlation of a linear echo, a time correlation of a residual echo, and a time correlation of a microphone signal may be selectively used to calculate a magnitude of a residual echo. Accordingly, an ERLE of an AEC system may be further improved.
While example embodiments of the inventive concepts have been described with reference to some example embodiments thereof, it will be apparent to those of ordinary skill in the art that various changes and modifications may be made thereto without departing from the spirit and scope of the inventive concepts as set forth in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2019-0011091 | Jan 2019 | KR | national |
Number | Date | Country |
---|---|---|
20200243104 A1 | Jul 2020 | US |