The present disclosure generally relates to systems and methods for modular echo cancellation, and specifically to systems and methods for providing modular echo cancellation in a vehicle.
All examples and features mentioned below can be combined in any technically possible way.
According to an aspect, an audio system includes: a head unit comprising at least a first processor, the head unit being configured to generate a plurality of program content signals, one of the plurality of program content signals being a phone program content signal being received from a phone, wherein the plurality of program content signals are transduced by an acoustic transducer into an acoustic signal within a vehicle cabin; a microphone disposed within the vehicle cabin such that the microphone receives the acoustic signal and produces a microphone signal comprising a plurality of echo signals, each echo signal of the plurality of echo signals being a component of the microphone signal correlated to at least one program content signal of the plurality of program content signals; a multichannel echo-cancellation unit being implemented by a second processor, the multichannel echo-cancellation unit being configured to receive a plurality of reference signals, each of the plurality of reference signals being correlated to at least one of the plurality of program content signals, and the microphone signal, and to minimize the plurality of echo signals, according to the plurality of reference signals, to produce an estimated voice signal, and to provide the estimated voice signal to the head unit.
In an example, the multichannel echo-cancellation unit comprises a multichannel echo-cancellation filter configured to provide an estimate of the plurality of echo signals, the estimate of the plurality of echo signals being subtracted from the microphone signal to produce the estimated voice signal, wherein an estimated phone program content echo signal, being correlated to the phone program content signal, is added to the estimated voice signal, such that the estimated voice signal and the estimated phone program content echo signal are provided to the head unit.
In an example, the audio system further includes a post filter configured to receive the estimated voice signal and to suppress at least one residual component correlated to at least one of the plurality of program content signals to produce an echo-suppressed estimated voice signal.
In an example, the estimated phone program content echo signal is added to the echo-suppressed estimated voice signal.
In an example, the post filter is configured to receive the estimated voice signal and the estimated phone program content echo signal and to output the echo-suppressed estimated voice signal and the estimated phone program content echo signal, wherein the estimated phone program content echo signal remains unsuppressed.
In an example, the post filter is configured to output the estimated phone program content echo signal unsuppressed by excluding the estimated phone program content echo signal from a spectral mismatch summation.
In an example, the plurality of reference signals comprises the plurality of program content signals.
According to another aspect, a multichannel echo cancellation unit being implemented on a first processor, includes: at least one program content input to receive a plurality of reference signals, each of the plurality of reference signals being correlated to at least one of a plurality of program content signals output from a head unit including a second processor, one of the plurality of program content signals being a phone program content signal; a microphone input to receive a microphone signal comprising a plurality of echo signals, each echo signal of the plurality of echo signals being a component of the microphone signal correlated to at least one program content signal of the plurality of program content signals; an echo canceler being configured to minimize the plurality of echo signals, according to the plurality of reference signals, to produce an estimated voice signal and to provide the estimated voice signal to the head unit.
In an example, the echo canceler comprises a multichannel echo-cancellation filter configured to provide an estimate of the plurality of echo signals, the estimate of the plurality of echo signals being subtracted from the microphone signal to produce the estimated voice signal, wherein an estimated phone program content echo signal, being correlated to the phone program content signal, is added to the estimated voice signal, such that the estimated voice signal and the estimated phone program content echo signal are provided to the head unit.
In an example, the multichannel echo cancellation unit further includes a post filter configured to receive the estimated voice signal and to suppress at least one residual component correlated to the plurality of program content signals to produce an echo-suppressed estimated voice signal.
In an example, the estimated phone program content echo signal is added to the echo-suppressed estimated voice signal.
In an example, the post filter is configured to receive the estimated voice signal and the estimated phone program content echo signal and to output the echo-suppressed estimated voice signal and the estimated phone program content echo signal, wherein the estimated phone program content echo signal remains unsuppressed.
In an example, the post filter is configured to output the estimated phone program content echo signal unsuppressed by excluding the estimated phone program content echo signal from a spectral mismatch summation.
According to another aspect, a method for performing multichannel echo cancellation includes: receiving, at a first processor, a plurality of reference signals, each of the plurality of reference signals being correlated to at least one of a plurality of program content signals output from a head unit including a second processor, one of the plurality of program content signals being a phone program content signal; receiving a microphone signal comprising a plurality of echo signals, each echo signal of the plurality of echo signals being a component of the microphone signal correlated to at least one program content signal of the plurality of program content signals; minimizing, with an echo canceler defined by the first processor, the plurality of echo signals, according to the plurality of reference signals, to produce an estimated voice signal; and providing the estimated voice signal to the head unit.
In an example, the step of minimizing the plurality of echo signals comprises: generating, with a multichannel echo-cancellation filter being defined by the first processor, an estimate of the plurality of echo signals, the estimate of the plurality of echo signals being subtracted from the microphone signal to produce the estimated voice signal.
In an example, the method further includes: adding an estimated phone program content echo signal, being correlated to the phone program content signal, to the estimated voice signal, such that the estimated voice signal and the estimated phone program content echo signal are provided to the head unit.
In an example, the method further includes: receiving the estimated voice signal at a post filter, the post filter being implemented by the first processor; and applying a suppression, with the post filter, to at least one residual component correlated to the plurality of program content signals to produce an echo-suppressed estimated voice signal.
In an example, the estimated phone program content echo signal is added to the echo-suppressed estimated voice signal.
In an example, the method further includes: receiving the estimated phone program content echo signal at the post filter; outputting, from the post filter, the estimated phone program content echo signal unsuppressed.
In an example, the post filter is configured to output the estimated phone program content echo signal unsuppressed by excluding the estimated phone program content echo signal from a spectral mismatch summation.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and the drawings, and from the claims.
Vehicle head units typically include multiple subsystems for supplying program content signals, such as music, navigation, and handsfree phone signals, to an amplifier unit, which (often together with some associated processing) amplifies the program content signals for transduction into an acoustic signal by a speaker within the vehicle cabin. During a call utilizing the handsfree phone subsystem, a microphone positioned within the vehicle cabin will receive the user's voice signal, to be sent to the handsfree phone subsystem, where it is routed to the mobile device. If the speakers, however, are playing the program content signals in the vehicle cabin during the call, the microphone signal will include components correlated to the program content signals, as a result of receiving the acoustic program signals in the cabin. Such a component is generally known as an echo signal, and it degrades the quality of the voice signal at the microphone.
In order to cancel the echo signal, an echo cancellation system may be included at the handsfree phone subsystem. But in order to cancel the echo of signals besides the phone signal echo, reference signals from the amplifier unit must be sent to the handsfree phone subsystem. Given the typically high number of channels at the amplifier unit, this may require an additional expensive bus for sending the program content reference signals from the amplifier unit to the handsfree phone subsystem. In addition, the time delay associated with sending signals over such a bus could introduce a significant delay that degrades the performance of the echo cancellation. Accordingly, there exists a need in the art for a modular echo cancellation unit that can introduce echo cancellation to the microphone signal at the amplifier unit, or at some other location convenient for receiving the reference signals.
Various examples disclosed herein are directed to a modular echo-cancellation subsystem that may cancel the echo signals related to the program content signals received from the head unit. There is shown in
The program content signals u(n) may be analog or digital signals and may be provided as compressed and/or packetized streams, and additional information may be received as part of such a stream, such as instructions, commands, or parameters from another system for control and/or configuration of the processing component(s), such as the multichannel echo cancellation unit 112, or other components.
The head unit 102 may be implemented by a processor, or collection of processors, together with a non-transitory storage medium configured to store program code that, when executed by the processor(s), performs the various functions necessary to define the various subsystems of the head unit 102.
Amplifier unit 104 may include an audio presentation processing subsystem 114, a multichannel echo cancellation unit 112, and an amplifier 116. Broadly speaking, the audio presentation processing subsystem 114 may provide various audio processing operations on the received program content signals u(n), such as mixing and loudspeaker routing, to be transduced by one or more acoustic transducer(s) 118. This functionality is, generally, implemented in
The presentation processing subsystem 114 may be implemented by a processor, or collection of processors, together with a non-transitory storage medium configured to store program code that, when executed by the processor(s), performs the various functions of presentation processing subsystem 114. Generally, the presentation processing subsystem 114 is implemented on a processor(s) distinct from the processor(s) that implement the head unit 102.
Amplifier 116 may amplify the output of the audio presentation processing subsystem 114, driving acoustic transducer 118 to produce an acoustic signal. The amplifier 116 may be implemented by the same processor(s) that defines the audio presentation processing subsystem 114 or by a separate processor(s). In an alternate example, the amplifier 116 may be implemented by hardware or a combination of hardware and firmware.
It should be understood that, although the multichannel echo cancellation unit 112 is shown implemented in the amplifier unit 104, in various alternative examples, the multichannel echo cancellation unit 112 may be implemented in a processor or combination of processors distinct from the amplifier 116 or the audio-presentation processing subsystem 114. Indeed, as long as the multichannel echo canceler receives the program content channels u(n) as reference signals, the multichannel echo cancellation unit 112 may be located on a dedicated processor, or elsewhere. As such, the multichannel echo cancellation unit 112, as described herein, is completely modular, and may thus be included in any suitable processor.
The acoustic signal output by acoustic transducer 118 may, undesirably, be picked up by one or more microphone(s) 120. Generally, any aspect of the acoustic production of the acoustic transducer(s) 118 input to microphone(s) 120 is referred to herein as echo.
Multichannel echo cancellation unit 112 generally functions to remove any aspects of echo from the microphone signal, using the program content (e.g., phone signal up(n), announcement signal ua(n), entertainment audio signal ue(n), etc.) as reference signals, so that a microphone signal including only an estimated user's voice signal ŝ(n) (and noise that is uncorrelated with the echo) is provided back to the handsfree phone subsystem 106 of the head unit 102. The multichannel echo cancellation unit 112 thus provides multichannel echo canceling (i.e., several channels of program content u(n)) of the microphone signal y(n). In various examples, the multichannel echo cancellation unit 112 may artificially add an estimate of the echo dp(n) of the phone signal up(n) back to the output estimated voice signal ŝ(n) to be canceled by an echo canceler provided in the handsfree phone subsystem 106. As will be described in more detail below, it should be understood that, in various examples, the reference signals received by the multichannel echo cancellation unit 112 are not necessarily the program content signals u(n) output by head unit 102. Rather, some additional audio processing may be applied, e.g., by audio presentation processing 114, to program content signals u(n) before the signals are sent to multichannel echo cancellation unit 112 as reference signals.
The audio presentation processing subsystem 114 and the multichannel echo cancellation unit 112 are shown in greater detail in
The echo canceler 200 may include an adaptive algorithm to update the echo-cancellation filters 204, at intervals, to improve the estimated echo signal d̂(n). Over time, the adaptive algorithm causes the echo-cancellation filters 204 to converge on satisfactory parameters that produce a sufficiently accurate estimated echo signal d̂(n). Generally, the adaptive algorithm updates the echo-cancellation filters 204 during times when the user is not speaking, but in some examples the adaptive algorithm may make updates at any time. When the user speaks, such is deemed “double talk,” and the microphone(s) 120 picks up both the acoustic echo signal d(n) and the acoustic voice signal s(n). Double talk may be detected by double talk detector 208, according to any suitable method.
The echo-cancellation filters 204 may apply a set of filter coefficients to the content signal 202 to produce the estimated echo signal d̂(n). The adaptive algorithm may use any of various techniques to determine the filter coefficients and to update, or change, the filter coefficients to improve performance of the echo-cancellation filters 204. Such adaptive algorithms, whether operating on an active filter or a background filter, may include, for example, a least mean squares (LMS) algorithm, a normalized least mean squares (NLMS) algorithm, a recursive least squares (RLS) algorithm, or any combination or variation of these or other algorithms. The echo-cancellation filters 204, as adapted by the adaptive algorithm, converge to apply an estimated transfer function ĥ(n), which is representative of the echo path between acoustic transducer(s) 118 and microphone(s) 120, to the output of acoustic transducer(s) 118.
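By way of non-limiting illustration, the adaptation described above can be sketched with a jointly normalized multichannel NLMS update, which is one of the algorithm families named in the preceding paragraph. The function name `nlms_multichannel` and the parameters `taps`, `mu`, and `eps` are assumptions chosen for this sketch and do not appear in the disclosure; the sketch also omits double talk gating and any background filter.

```python
import numpy as np

def nlms_multichannel(refs, mic, taps=16, mu=0.5, eps=1e-8):
    """Illustrative multichannel NLMS echo canceler (hypothetical helper).

    refs: array of shape (channels, samples), the reference signals.
    mic:  array of shape (samples,), the microphone signal y(n).
    Returns the residual e(n) and the per-channel echo estimates.
    """
    refs = np.asarray(refs, dtype=float)
    mic = np.asarray(mic, dtype=float)
    n_ch, n = refs.shape
    w = np.zeros((n_ch, taps))      # one adaptive FIR filter per reference
    x = np.zeros((n_ch, taps))      # most recent `taps` samples of each reference
    e = np.zeros(n)
    d_hat = np.zeros((n_ch, n))
    for t in range(n):
        x = np.roll(x, 1, axis=1)   # shift the delay lines by one sample
        x[:, 0] = refs[:, t]
        d_hat[:, t] = np.sum(w * x, axis=1)       # per-channel echo estimate
        e[t] = mic[t] - d_hat[:, t].sum()         # subtract the summed estimate
        # Normalized update: step size scaled by the total reference power.
        w += mu * e[t] * x / (np.sum(x * x) + eps)
    return e, d_hat
```

With white, mutually independent references and no near-end speech, the residual returned by this sketch decays toward zero as the filters converge; correlated references generally converge more slowly.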
Generally speaking, as shown in
It should be understood that the number of adaptive echo-cancellation filters 204 will be dependent, generally, on the number of reference signals received. Thus, if the program content signals u(n) are used as reference signals, a number of echo-cancellation filters 204 equal to the number of program content signals u(n) may be implemented, each echo-cancellation filter 204 being respectively associated with one of program content signals u(n); whereas, if the soundstage rendering output b(n) is used, some number N of echo-cancellation filters 204 may be implemented, each echo-cancellation filter 204 being respectively associated with one of N soundstage rendering outputs b(n). It should also be understood that, in some examples, fewer adaptive echo-cancellation filters 204 than, e.g., program content signals u(n) or soundstage rendering outputs b(n), may be used. For example, fewer echo-cancellation filters 204 may be used if certain program content signals u(n), such as a set of woofer left, twiddler left, and tweeter left program content signals u(n), are summed together and provided as a reference signal to a single echo-cancellation filter 204, or if only a subset of reference signals needs to be used to achieve effective echo cancellation.
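The reduction in filter count described above can be sketched as follows. The helper `group_references` and the channel names used in the usage note are hypothetical and serve only to illustrate summing several program content signals into a single reference signal.

```python
import numpy as np

def group_references(signals, groups):
    """Sum named channels into one reference signal per group (hypothetical helper).

    signals: dict mapping a channel name to an array of samples.
    groups:  list of lists of channel names; each inner list becomes one reference.
    Returns an array of shape (len(groups), samples).
    """
    return np.stack(
        [np.sum([signals[name] for name in names], axis=0) for names in groups]
    )
```

For instance, passing `[["woofer_left", "twiddler_left", "tweeter_left"], ["woofer_right"]]` as `groups` would yield two reference signals, and therefore two echo-cancellation filters, from four program content signals.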
In addition to estimating the echo path(s) h(n), estimated transfer function ĥ(n) may represent an estimate of any processing disposed between the location from which the reference signals (e.g., program content signals u(n)) are taken and echo canceler 200. Thus, where, as shown in
In addition, as shown in
While the echo canceler 200 cancels linear aspects of the microphone signal y(n) correlated to the program content channels, rapid changes and/or non-linearities in the echo path prevent the echo canceler 200 from providing a precise estimated echo signal d̂(n), and a residual echo will thus remain in the residual signal e(n). The post filter subsystem 210 thus operates to suppress the residual echo component with spectral filtering to produce an improved estimated voice signal ŝ(n). Such post filters are generally known in the art; however, a brief description of one example will be provided below.
The post filter subsystem 210 comprises a post filter 212 and a coefficient calculator 214. The post filter 212 suppresses residual echo in the residual signal (from the echo canceler 200) by, in some examples, reducing the spectral content of the residual signal e(n) by an amount related to the likely ratio of the residual echo signal power relative to the total signal power (e.g., speech and residual echo), by frequency bin. In one example, the post filter 212 may multiply each frequency bin (represented by index “k”) of the residual signal e(n) by a filter coefficient Hpf (k), calculated by coefficient calculator 214, according to the following example equation:
where ΔHi(k) is a spectral mismatch, See(k) is the power spectral density of the residual signal, and Su
The spectral mismatch ΔHi(k) represents the spectral mismatch between the actual echo path and the acoustic echo canceler 200. The actual echo path is, for example, the entire path taken by the program content signal u(n) from where it is provided to the echo canceler 200, through the soundstage rendering 206, the acoustic transducer(s) 118, the acoustic environment, and through the microphone(s) 120. The actual echo path may further include processing by the microphone(s) 120 or other supporting components, such as array processing, for example. The spectral mismatch ΔHi(k) may be calculated as a ratio of the cross-power spectral density of program content signal u(n) on the i-th content channel 202 and the residual signal e(n), Su
In some examples, the power spectral densities used may be time-averaged or otherwise smoothed or low pass filtered to prevent sudden changes (e.g., rapid or significant changes) in the calculated spectral mismatch.
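A minimal sketch of a coefficient calculation for uncorrelated reference signals follows. The exact equations are not reproduced in the text above, so the form used here is an assumption consistent with the surrounding description: the per-channel spectral mismatch is taken as the ratio of the cross-power spectral density to the reference power spectral density, and the gain is one minus the estimated ratio of residual echo power to total residual power, limited to a floor. The names `smooth_psd`, `postfilter_gain`, `alpha`, `h_min`, and `eps` are hypothetical.

```python
import numpy as np

def smooth_psd(prev, new, alpha=0.9):
    """First-order recursive (low-pass) averaging of a PSD estimate."""
    return alpha * prev + (1.0 - alpha) * new

def postfilter_gain(S_ue, S_uu, S_ee, h_min=0.1, eps=1e-12):
    """Per-bin post filter gain under an assumed equation form.

    S_ue: (channels, bins) cross-PSD of each reference with the residual e(n).
    S_uu: (channels, bins) PSD of each reference.
    S_ee: (bins,) PSD of the residual signal.
    """
    dH = S_ue / (S_uu + eps)                          # spectral mismatch per channel
    residual_echo = np.sum(np.abs(dH) ** 2 * S_uu, axis=0)
    gain = 1.0 - residual_echo / (S_ee + eps)         # assumed form, not the source equation
    return np.clip(gain, h_min, 1.0)                  # h_min is a hypothetical gain floor
```

In this sketch, a bin whose residual consists entirely of echo receives the floor gain, while a bin with no measurable correlation to any reference passes at unity gain.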
It should be understood that Eqs. 1 and 2 are generally related to the case in which reference signals are uncorrelated. If the reference signals are not necessarily uncorrelated (e.g., a left and right channel pair share some common content), the coefficient calculator 214 may calculate the filter coefficient Hpf(k) according to the following equation:
where ΔHH represents the Hermitian of ΔH, which is the complex conjugate transpose of ΔH, and where ΔH is given by:
\[ \Delta H = S_{uu}^{-1}\, S_{ue} \qquad (4) \]
Suu is the matrix of power spectral densities and cross power spectral densities of the program content channels. ΔH is the vector containing the spectral mismatch of all channels, and Sue is the vector containing the cross power spectral densities of each reference channel with the error signal.
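The matrix relationship of Eq. 4 can be sketched per frequency bin as shown below. The function names and array layout are assumptions for illustration; in particular, how the quadratic form enters the filter coefficient Hpf(k) is not reproduced here. A linear solve is used in place of an explicit matrix inverse, which is a numerical choice of this sketch rather than a requirement of the disclosure.

```python
import numpy as np

def spectral_mismatch(S_uu, S_ue):
    """Solve S_uu * dH = S_ue independently in each frequency bin.

    S_uu: (bins, channels, channels) PSD and cross-PSD matrix of the references.
    S_ue: (bins, channels) cross-PSD of each reference with the error signal.
    Returns dH of shape (bins, channels).
    """
    return np.linalg.solve(S_uu, S_ue[..., None])[..., 0]

def mismatch_quadratic_form(S_uu, dH):
    """Real part of dH^H S_uu dH in each frequency bin (Hermitian quadratic form)."""
    return np.real(np.einsum("bi,bij,bj->b", dH.conj(), S_uu, dH))
```

When the off-diagonal entries of Suu are zero, the solve reduces to the per-channel ratio used for uncorrelated reference signals.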
Although the above equations have been provided for a post filter 212 configured to suppress residual echo from multiple content channels 202, in alternate examples, the post filter 212 may be configured to suppress the residual echo from only one content channel 202.
In various examples, the post filter 212 may be configured to operate in the frequency domain or the time domain. Accordingly, use of the term “filter coefficient” is not intended to limit the post filter 212 to operation in the time domain. The terms “filter coefficients,” or other comparable terms, may refer to any set of values applied to or incorporated into a filter to cause a desired response or a desired transfer function. In certain examples, the post filter 212 may be a digital frequency domain filter that operates on a digital version of the estimated voice signal to multiply signal content within a number of individual frequency bins, by distinct values generally less than or equal to unity. The set of distinct values may be deemed filter coefficients.
Both the echo canceler 200 and the post filter subsystem 210 may be configured to calculate the echo-cancellation filter 204 coefficients and the post filter 212 coefficients, respectively, only during periods when a double talk condition is not detected, e.g., by a double talk detector 208. As described above, when a user is speaking within the acoustic environment of the audio system 100, the microphone signal y(n) includes a component that is the user's speech. In this case, the combined signal y(n) is not representative of only the echo from the acoustic transducers 118, and the residual signal e(n) is not representative of the residual echo, e.g., the mismatch of the echo canceler 200 relative to the actual echo path, because the user is speaking. Accordingly, the double talk detector 208 operates to indicate when double talk is detected. New coefficients may not be calculated during this period, and the coefficients in effect at the start of, or just prior to, the user talking may be used while the user is talking. The double talk detector 208 may be any suitable system, component, algorithm, or combination thereof.
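The coefficient freeze described above can be sketched as a simple gate. The function `gated_coefficients` and its arguments are hypothetical; the double talk flags are assumed to come from any suitable detector, which this sketch does not implement.

```python
def gated_coefficients(proposed_per_frame, double_talk_flags, initial):
    """Return the coefficients in effect for each frame (hypothetical helper).

    While double talk is flagged, the most recent coefficients computed
    before the user began speaking are held instead of being updated.
    """
    held = initial
    used = []
    for proposed, double_talk in zip(proposed_per_frame, double_talk_flags):
        if not double_talk:
            held = proposed      # adopt the new coefficients only without double talk
        used.append(held)
    return used
```

The same gate may be applied to the echo-cancellation filter coefficients and to the post filter coefficients.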
The amplifier unit 104, described in connection with
However, as described above, many handsfree phone subsystems will also perform some degree of echo cancellation with respect to echo signals correlated to the phone signal up(n). Thus, if an echo signal is not found to be present, some handsfree phone subsystems may register an error, interpreting the lack of echo to be indicative of a larger malfunction, such as a malfunctioning microphone. Accordingly, it is advantageous to spoof the phone echo signal dp(n) and provide it to the handsfree phone subsystem 106.
This may be accomplished in one of several ways. For example, in a first method, the estimated phone echo signal d̂p(n), as calculated, e.g., by the echo cancellation filter 204b (that is, the echo cancellation filter 204 receiving the phone signal up(n) as a reference signal), may be included in the coefficient calculation, summed as part of the estimated echo signal d̂(n), and subtracted from the microphone signal y(n) (as described below), but then added to the output signal at, at least, one of two locations, as shown in
As shown in
Alternatively, as shown in
(Here, i∈−{p} represents excluding the content channel 202b, which includes the phone program content signal up(n), from the sum.) The post filter 212 thus filters the residual signal e(n), without filtering the component of the residual signal correlated to the phone program content signal up(n). Stated differently, the post filter 212 will pass the estimated phone echo signal d̂p(n) through, unfiltered, while spectral mismatches in the remaining components of the residual signal are filtered as normal, again resulting in the estimated speech ŝ(n) and estimated phone echo signal d̂p(n) at the output of multichannel echo cancellation unit 112.
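Excluding the phone channel from the spectral mismatch summation can be sketched as follows. The helper name and the `exclude` argument are assumptions for illustration, and the summand follows the same assumed per-channel form of mismatch magnitude squared times reference power.

```python
import numpy as np

def residual_echo_excluding(dH, S_uu, exclude):
    """Sum |dH_i|^2 * S_uu_i over channels i, skipping the indices in `exclude`.

    dH:      (channels, bins) spectral mismatch per channel.
    S_uu:    (channels, bins) PSD of each reference.
    exclude: collection of channel indices, e.g. the phone channel p.
    """
    keep = [i for i in range(dH.shape[0]) if i not in exclude]
    return np.sum(np.abs(dH[keep]) ** 2 * S_uu[keep], axis=0)
```

Because the phone channel contributes nothing to the sum, the resulting filter coefficient does not attenuate residual components attributed to that channel.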
It should be understood that Eq. 5 is generally related to the case in which reference signals are uncorrelated. If the reference signals are not necessarily uncorrelated (e.g., a left and right channel pair share some common content), the coefficient calculator 214 may calculate the filter coefficient Hpf(k) according to the following equation:
In Equation (6), the variables denoted with a tilde exclude the terms corresponding to the phone signal. ΔH̃ is ΔH with the phone channel spectral mismatch ΔHphone excluded. Similarly, S̃uu is Suu with the phone channel PSD and cross PSDs removed, i.e., with one fewer row and one fewer column.
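Forming the tilde quantities amounts to deleting the phone channel's entry, row, and column, which can be sketched as below. The function `drop_channel` and the index `p` are hypothetical names used only for this illustration.

```python
import numpy as np

def drop_channel(S_uu, dH, p):
    """Remove channel p from the PSD matrix and the spectral mismatch vector.

    S_uu: (..., channels, channels) matrix; one row and one column are removed.
    dH:   (..., channels) vector; one entry is removed.
    """
    S_tilde = np.delete(np.delete(S_uu, p, axis=-2), p, axis=-1)
    dH_tilde = np.delete(dH, p, axis=-1)
    return S_tilde, dH_tilde
```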
In another example, as shown in
In another example, shown in
The example described in connection with
The above examples of FIGS. 2A-2D thus depict methods of providing the estimated phone echo signal d̂p(n) at the output of the multichannel echo cancellation unit 112, where it may be canceled by the echo canceler of the handsfree phone subsystem 106.
It should be understood that, in this disclosure, a capital letter used as an identifier or as a subscript represents any number of the structure or signal with which the subscript or identifier is used. Thus, acoustic transducer 118N represents the notion that any number of acoustic transducers 118 may be implemented in various examples. Indeed, in some examples, only one acoustic transducer may be implemented. Likewise, soundstage rendering output signal bN(n) represents the notion that any number of soundstage rendering output signals b(n) may be used. It should be understood that the same letter used for different signals or structures, e.g., soundstage rendering output bN(n) and echo signals d̂N(n), represents the general case in which there exists the same number of a particular signal or structure. Thus, in the general case, there will be the same number of soundstage rendering outputs bN(n) and echo signals d̂N(n). The general case, however, should not be deemed limiting. A person of ordinary skill in the art will understand, in conjunction with a review of this disclosure, that, in certain examples, a different number of such signals or structures may be used.
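The add-back approach can be sketched in the time domain as follows, with the post filter omitted for brevity. The function name and argument layout are assumptions; the per-channel echo estimates are taken to be available from the echo-cancellation filters.

```python
import numpy as np

def cancel_and_restore_phone_echo(mic, d_hat, phone_index):
    """Cancel all estimated echoes, then re-insert only the phone echo estimate.

    mic:         (samples,) microphone signal y(n).
    d_hat:       (channels, samples) per-reference echo estimates.
    phone_index: row of d_hat that corresponds to the phone reference.
    """
    e = mic - d_hat.sum(axis=0)       # residual after multichannel cancellation
    return e + d_hat[phone_index]     # output carries the voice plus the phone echo estimate
```

The downstream handsfree phone subsystem then sees a phone echo that it can cancel with its own echo canceler, while the echoes of the remaining program content have already been removed.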
The functionality described herein, or portions thereof, and its various modifications (hereinafter “the functions”) can be implemented, at least in part, via a computer program product, e.g., a computer program tangibly embodied in an information carrier, such as one or more non-transitory machine-readable media or storage device, for execution by, or to control the operation of, one or more data processing apparatus, e.g., a programmable processor, a computer, multiple computers, and/or programmable logic components.
A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a network.
Actions associated with implementing all or part of the functions can be performed by one or more programmable processors executing one or more computer programs to perform the functions. All or part of the functions can be implemented as special purpose logic circuitry, e.g., an FPGA and/or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. Components of a computer include a processor for executing instructions and one or more memory devices for storing instructions and data.
While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, and/or methods, if such features, systems, articles, materials, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.
Number | Date | Country | |
---|---|---|---|
20200395030 A1 | Dec 2020 | US |