The present invention pertains, among other things, to systems, methods and techniques for audio signal processing and has particular applicability to reduction of echoes in an audio signal.
The existence of echo is a frequent problem in audio systems. One example of an audio subsystem 10 in which echo arises is shown in
Unfortunately, it frequently is the case that some portion of the audio signal 12 that is played through speaker 14 reaches microphone 16, typically with some modifications, which are represented in
In order to address this issue, the signal x(n) 19 conventionally is processed by a digital echo canceler 20, which attempts to remove the echo noise. For this purpose, in the current disclosure: r(n) is used to denote the echo reference signal 22 (which typically is a digitized version of the received signal 12 that is provided to the speaker 14), x(n) 19 (as noted above) is a digitized version of the signal received by microphone 16, and y(n) is the echo cancellation (EC) digital output signal 24. Conventionally, all three of such signals are at the same sampling rate R, and the relationship between x(n) and r(n) is:
x(n)=r(n)*f(n)+d(n)
where * denotes the convolution operation and d(n) is a digitized version of the near-end target signal (i.e., a digitized version of the microphone input signal 18 that would be present in the absence of echo noise). Ideally, echo canceler 20 outputs y(n)=d(n). For this purpose, an estimate of the impulse response f(n), i.e., $\hat{f}(n)$, n=0, . . . , L−1 (where L is the chosen echo reference length), typically is generated. In conventional EC algorithms, Least-Mean-Square (LMS) or Normalized-Least-Mean-Square (NLMS) algorithms are used to continuously update the impulse response estimate, $\hat{f}(n)$, at each of the time samples at the original sampling rate R. Then, in certain conventional subsystems 10, the echo canceler 20 is implemented such that:
$$y(n) = x(n) - r(n) * \hat{f}(n) = x(n) - \sum_{\tau=0}^{L-1} \hat{f}(\tau)\, r(n-\tau) \qquad \text{(Eq. 1)}$$
Such systems can be considered to employ a full-band EC algorithm.
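By way of illustration only, the full-band approach of Eq. 1 can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `nlms_echo_cancel`, the step size `mu`, the regularizer `eps` and the three-tap toy echo path are all assumptions introduced here, and the update shown is the textbook NLMS rule rather than anything specific to subsystem 10.

```python
import random

def nlms_echo_cancel(x, r, L, mu=0.5, eps=1e-8):
    """Full-band echo cancellation (Eq. 1) with an NLMS coefficient update.

    x  : microphone samples x(n)
    r  : echo reference samples r(n), same length and rate as x
    L  : echo reference length (number of taps in the estimate)
    mu : NLMS step size (assumed value; stable for 0 < mu < 2)
    eps: small regularizer to avoid division by zero (assumption)
    """
    f_hat = [0.0] * L   # current impulse response estimate
    hist = [0.0] * L    # hist[tau] holds r(n - tau)
    y = []
    for n in range(len(x)):
        hist = [r[n]] + hist[:-1]
        echo = sum(f_hat[t] * hist[t] for t in range(L))
        e = x[n] - echo                       # y(n) in Eq. 1
        y.append(e)
        power = sum(v * v for v in hist) + eps
        f_hat = [f_hat[t] + mu * e * hist[t] / power for t in range(L)]
    return y, f_hat

# Toy check: a made-up 3-tap echo path and no near-end signal (d(n) = 0).
rng = random.Random(0)
true_f = [0.5, -0.3, 0.1]
r = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
x = [sum(true_f[t] * r[n - t] for t in range(3) if n - t >= 0)
     for n in range(len(r))]
y, f_hat = nlms_echo_cancel(x, r, L=3)
```

Each output sample costs L multiply-accumulates for the convolution plus a comparable amount for the update, all at the full rate R, which is the cost that the sub-band approaches discussed next attempt to restructure.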
Alternatively, as shown in
The down-sampled signals, which can be denoted as $x_m^D(n)$ and $r_m^D(n)$ for m=1, . . . , M, respectively, now at the sampling rate R/D, are then fed into the corresponding sub-band's echo cancellation module 34m, labeled EC-m in
In certain conventional sub-band implementations, to further save on computational resources, the down-sampling operations 32 are combined into the decomposition module 30, and the up-sampling operations 36 are combined into the re-synthesis module 40. However, for either such implementation, it has been widely reported that increased down-sampling, while resulting in less computational complexity, also diminishes echo-reduction performance.
Conventional sub-band echo cancellation systems typically have faster convergence and better steady-state echo suppression performance than full-band systems. However, such improvements over traditional full-band echo cancellation are provided at the cost of a significant increase in computational (or system) complexity.
Among other benefits, the present invention provides systems, methods and techniques that can reduce such complexity. According to certain approaches of the present invention, sub-band decomposition of x(n) is performed at a different rate than sub-band decomposition of r(n), e.g., by using different down-sampling rates. In certain approaches, x(n) is processed at one sampling rate and r(n) is processed at one or more different (preferably lower) rate(s). In either event, by properly constructing each sub-band's echo canceler, such different rates can be used to effectively reduce the echo reference length L and hence can help to: (1) reduce the echo canceler's computational complexity, (2) speed up the echo canceler's convergence stage, and (3) stabilize the echo canceler's adaptive-learning and echo-reduction performance.
One particular embodiment of the invention is directed to a method of reducing echo in an audio signal. According to this method, an input signal, an estimate of a system-characterizing function, and a reference signal, each at a corresponding sample rate and each divided into a plurality of sub-bands, are obtained. Such sub-bands are separately processed, such that for a given sub-band the estimate of the system-characterizing function and the reference signal are processed to generate an echo-estimation signal and then such echo-estimation signal is subtracted from the input signal to provide an echo-corrected signal for that given sub-band. The echo-corrected signals from different ones of the sub-bands are then combined to provide a final output signal. One feature of this method is that the echo-estimation signal is generated using a processing sample rate that is lower than the sample rate for the input signal.
Another embodiment is directed to a system for reducing echo in an audio signal, which includes: (a) a number of echo-cancellation modules, each such echo-cancellation module including: (i) an echo-estimation module that inputs an estimate of a system-characterizing function at a first sample rate and a reference signal at a second sample rate and that, processing at a third sample rate, outputs an echo estimate signal at a fourth sample rate, and (ii) a subtractor that subtracts the echo estimate signal from an input signal, also at the fourth sample rate, to produce an echo-canceled sub-band signal at the fourth sample rate; and (b) a synthesis module that synthesizes the echo-canceled sub-band signals from the echo-cancellation modules to produce a final output signal. In the system, the third sample rate is lower than the fourth sample rate.
The foregoing summary is intended merely to provide a brief description of certain aspects of the invention. A more complete understanding of the invention can be obtained by referring to the claims and the following detailed description of the preferred embodiments in connection with the accompanying figures.
In the following disclosure, the invention is described with reference to the accompanying drawings. However, it should be understood that the drawings merely depict certain representative and/or exemplary embodiments and features of the present invention and are not intended to limit the scope of the invention in any manner. The following is a brief description of each of the accompanying drawings.
The following discussion concerns, among other things, improved systems, methods and techniques for performing audio signal echo cancellation. As used herein, the term “cancellation” does not necessarily refer to complete cancellation. Although complete cancellation often is the preferred goal, some amount of echo ultimately might remain. Instead, expressions referring to echo cancellation herein are better understood as reducing echo to some tolerable level, often subject to other trade-offs.
$$y_m^D(n) = x_m^D(n) - \sum_{\tau=0}^{L-1} \hat{f}_m(\tau)\, r_m(nD-\tau) \qquad \text{(Eq. 2)}$$
where $\hat{f}_m$ is the mth sub-band decomposition of $\hat{f}$.
However, for each sub-band m, because it is known that $\hat{f}_m$ and $r_m$ are more band-limited than x, the present inventors have discovered that it is possible to effectively down-sample these two signals by a factor of $D_m$ (typically greater than D, resulting in a lower effective sample rate) and still achieve the same echo estimates as $\sum_{\tau=0}^{L-1} \hat{f}_m(\tau)\, r_m(nD-\tau)$. The choice of the effective down-sampling rate, $D_m$, preferably is only limited by the condition that no (or limited) frequency aliasing happens during such down-sampling process. Therefore, $D_m$ generally can be even larger than D, which is usually chosen to be smaller than the (band-pass) Nyquist down-sampling rate, in order to allow better echo-reduction performance. Considering such effective down-sampling:
$$y_m^D(n) = x_m^D(n) - \sum_{\tau=0}^{L/D_m-1} \tilde{f}_m(\tau)\, r_m(nD-\tau D_m) \qquad \text{(Eq. 3)}$$
where $\tilde{f}_m$ is the $D_m$-rate down-sampled version of $\hat{f}_m$. In the preferred embodiments, a direct estimate is made of $\tilde{f}_m$, rather than $\hat{f}_m$. That is, rather than generating and then down-sampling $\hat{f}_m$, the system finite impulse response function (or other type of system response function in other embodiments) preferably initially is generated at the lower sampling rate ($R/D_m$), i.e., $\tilde{f}_m$. Also, it is noted that in Equation 3, and in system 100, $r_m(n)$ is not actually down-sampled but instead is just effectively down-sampled as a result of the processing performed in the corresponding echo-cancellation module 134m. That is, while $r_m(n)$ remains at a sampling rate of R, the processing (and, more specifically, the convolution processing) is performed within echo-cancellation module 134m at a processing sample rate of $R/D_m$, i.e., using only every $D_m$-th sample of $r_m(n)$. Generally speaking, the full-rate (R sample rate) version of $r_m(n)$ is retained in order to avoid timing mismatches that otherwise would occur as a result of $D_m$ being different than D (e.g., so that the starting point of any particular convolution can be chosen arbitrarily).
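The effective down-sampling described above can be sketched in code. The sketch assumes that the echo estimate for sub-band m is formed as the sum over τ of f̃m(τ)·rm(nD − τ·Dm), which is one reading of the statement that only every Dm-th sample of the full-rate rm(n) is used. The function name `subband_echo_corrected`, the treatment of out-of-range samples as zero, and the toy values are assumptions made here for illustration.

```python
def subband_echo_corrected(x_mD, r_m, f_tilde_m, D, D_m):
    """Echo-corrected sub-band output with an effectively down-sampled reference.

    x_mD      : sub-band input, already down-sampled by D (rate R/D)
    r_m       : sub-band reference, kept at the full rate R
    f_tilde_m : response estimate at rate R/D_m (about L/D_m taps)
    """
    y = []
    for n, x_n in enumerate(x_mD):
        echo = 0.0
        for tau, coeff in enumerate(f_tilde_m):
            idx = n * D - tau * D_m          # step through r_m by D_m samples
            if 0 <= idx < len(r_m):          # samples outside the record count as zero
                echo += coeff * r_m[idx]
        y.append(x_n - echo)
    return y

# Toy values (hypothetical): D = 2, D_m = 4, two-tap estimate, ramp reference.
r_m = [float(i) for i in range(16)]
x_mD = [0.0] * 8
y = subband_echo_corrected(x_mD, r_m, [1.0, 0.5], D=2, D_m=4)
```

Because rm(n) stays at the full rate, the starting index nD can fall on any sample, while the inner loop touches only about L/Dm reference samples per output.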
In some cases, e.g., as discussed in greater detail below, it will be possible to actually down-sample $r_m(n)$, at least to some extent, without having such mismatches. However, even without any down-sampling of $r_m(n)$, the echo reference length of a given echo cancellation module 134m is reduced from L or L/D to $L/D_m$, thereby providing the benefits mentioned above.
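A rough operation count illustrates the effect of the shorter echo reference length. The figures below are hypothetical (R, L, M, D and a single Dm shared by all sub-bands are chosen only for the arithmetic), and the count covers only the multiply-accumulates of the echo-estimate convolutions, ignoring the adaptive update and the decomposition and re-synthesis filter banks.

```python
def conv_macs_per_second(taps, output_rate, bands=1):
    """Multiply-accumulates per second for `bands` convolutions of `taps` taps."""
    return taps * output_rate * bands

# Hypothetical example values, not taken from the disclosure.
R, L, M, D, D_m = 16000, 1024, 32, 16, 32

full_band = conv_macs_per_second(L, R)                # Eq. 1 at rate R
sub_band = conv_macs_per_second(L, R // D, M)         # L taps per sub-band at rate R/D
reduced = conv_macs_per_second(L // D_m, R // D, M)   # L/D_m taps per sub-band

print(full_band, sub_band, reduced)
```

With these assumed numbers, the sub-band convolutions with L taps cost more than the full-band convolution, and shortening each sub-band's echo reference to L/Dm taps lowers that cost by the factor Dm.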
Also, it should be noted that due to the commutative property of convolution, in alternate embodiments of the invention, $r_m(n)$ actually is down-sampled by $D_m$, or originally obtained at the sampling rate of $R/D_m$, and $\hat{f}_m(n)$ is estimated and retained within the corresponding echo cancellation module 134m at the full rate R (i.e., $\hat{f}_m(n)$ is just effectively down-sampled, instead of $r_m(n)$). Still further, it is possible to just effectively (rather than actually) down-sample both $r_m(n)$ and $\hat{f}_m(n)$. Any such implementation will result in the same reduction in the echo reference length or, equivalently, in the amount of processing required to be performed by the echo cancellation modules 134m. However, actual down-sampling of at least one of such signals can further reduce processing requirements and, therefore, is preferred. For ease of discussion only, the present disclosure mainly assumes an embodiment in which $\hat{f}_m(n)$ is actually down-sampled by $D_m$ (or in which $\tilde{f}_m$ is initially estimated at a rate that is lower by a factor of $D_m$), while $r_m(n)$ is maintained at the full rate R. However, no loss of generality is intended.
If the $D_m$ values (or, equivalently, the effective sampling rates of $\hat{f}_m$ and $r_m$) are properly chosen, such that there is a non-trivial common factor (denoted by $D_r$) for {$D_m$, m=1, . . . , M}, as well as for D, such a down-sampling rate $D_r$ can be applied at the sub-band decomposition module 130B for r(n) (similar to what is done in sub-band decomposition module 130A for x(n)), in order to further reduce computational complexity. In such a case, appropriate indexing changes are made to Equation 3 above.
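As a small illustration, the largest usable common factor Dr is the greatest common divisor of D and all of the per-band rates. The helper name `common_reference_downsampling` and the example rates are hypothetical. After the reference is pre-down-sampled by Dr, one plausible form of the indexing change is to step through it with D/Dr and Dm/Dr in place of D and Dm; that reading is an assumption, not a statement of the disclosed equations.

```python
from functools import reduce
from math import gcd

def common_reference_downsampling(D, D_m_list):
    """Largest D_r that divides D and every per-band rate D_m."""
    return reduce(gcd, D_m_list, D)

# Hypothetical rates for a four-band example.
D = 16
D_m_list = [32, 16, 24, 8]
D_r = common_reference_downsampling(D, D_m_list)

# Index steps to use on the pre-down-sampled reference (assumed form).
steps = [(D // D_r, d // D_r) for d in D_m_list]
```

A result of 1 means there is no non-trivial common factor, in which case the reference would stay at its original rate in module 130B.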
In the preferred embodiments:
By choosing {$D_m$, m=1, . . . , M}, it is possible to control the computational complexity balance/trade-off between the sub-band echo-cancellation modules and the sub-band decomposition module of r(n). For instance, a higher $D_m$ can allow for a shorter echo reference in the corresponding echo cancellation module 134m but might reduce the possibility of down-sampling at the sub-band decomposition module 130B for r(n).
With M=32, and without providing any guard-band, the $D_m$ values that preferably can be used for each of the different sub-bands are shown as white cells (while the $D_m$ values that preferably cannot be used for each of the different sub-bands are shown as black cells) in
In a sub-band echo-cancellation system, any frequency aliasing that happens during down-sampling of the echo reference will cause degradation of the echo-reduction performance of the whole EC system. Therefore, in conventional sub-band based EC systems, there generally is no way to avoid frequency aliasing in some or all the sub-bands unless D is chosen to be 1, which would make the system's computational complexity prohibitive when M is non-trivial. In contrast, with a sub-band EC system 100 according to the present invention, it is possible to effectively down-sample the echo reference at each sub-band's EC module 134m, without causing any frequency-aliasing or other performance degradation. Thus, even while avoiding (or limiting) performance degradation, significant savings in computational complexity can be achieved, particularly when M is large.
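One way to enumerate alias-free choices of Dm is the textbook integer-band criterion for band-pass sampling. The sketch assumes real-valued, ideally band-limited sub-bands of equal width, with sub-band m occupying the normalized band [(m−1)/(2M), m/(2M)] and no guard band. Down-sampling by Dm is then treated as alias-free when that band lies entirely inside one interval [j/(2Dm), (j+1)/(2Dm)]. The function names are hypothetical, and the disclosure's own table of usable rates is not reproduced here and may follow a different criterion.

```python
def alias_free(k, D_m, M):
    """True if 0-indexed band k of M uniform bands survives down-sampling by D_m.

    Uses integer arithmetic: the band [k/(2M), (k+1)/(2M)] must fit inside
    [j/(2*D_m), (j+1)/(2*D_m)] with j = floor(k * D_m / M).
    """
    j = (k * D_m) // M
    return (k + 1) * D_m <= (j + 1) * M

def allowed_rates(M, candidates):
    """Map each sub-band m = 1..M to the candidate rates that pass the test."""
    return {k + 1: [d for d in candidates if alias_free(k, d, M)]
            for k in range(M)}

allowed = allowed_rates(32, range(1, 33))
```

Under this assumed criterion, with M=32 a rate of 32 passes for every sub-band, whereas a rate of 24 passes only for some of them (for example, sub-bands 1 and 4 but not 2 or 3).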
The preceding discussion mainly is focused on one particular exemplary embodiment, e.g., in order to better and/or more clearly illustrate some of the conceptual underpinnings, of the present invention. A more generalized depiction of an echo-cancellation system 200, according to the preferred embodiments of the present invention, is shown in
Similar to system 100, system 200 includes M echo-cancellation processing modules 234m (although only a single one is shown in detail in
In the following discussion, a somewhat different notation is used, as compared to that used above. Each of the signals shown in
In the previous section, it was usually assumed that all signals initially have a full sample rate of R. However, in the present, more-generalized embodiments, no such assumption is made (although the concept of there being an underlying common sample rate of R, with all of the actual sample rates being an integer sub-rate of R is still useful). Instead, for example, the input signal x might initially be sampled (or otherwise input) at a lower rate. Similarly, the full sample rate R might be used only for the output signal, or even not at all, within the audio subsystem of which echo-cancellation system 200 is a part.
As in the previously discussed exemplary embodiment, system 200 also is a sub-band EC system, having a separate echo-cancellation processing module 234m for each sub-band m. Although only a single such module 234m is shown in detail in
Each echo-cancellation processing module 234m includes an echo estimation module 236m that inputs the mth sub-band of a reference signal 222 (i.e., $r_m$), having a sample rate of Rrm. In the exemplary embodiment discussed above, Rrm typically will be R, but, e.g., as noted above, $r_m$ previously might have been down-sampled by $D_r$, or might have been initially input at a different sampling rate. Module 236m also inputs the mth sub-band of an impulse response estimate 223 ($\hat{f}_m$), having a sampling rate of Rfm. In the exemplary embodiment discussed above, Rfm typically will be $R/D_m$, either as a result of down-sampling or initially input at such rate, but instead might be at a different sampling rate, such as R. Preferably, at least one of $r_m$ and $\hat{f}_m$ is at a lower sampling rate, as discussed above. In the current embodiments, as in system 100 discussed above, $\hat{f}_m$ is generated by system response estimation module 225 in a conventional manner, e.g., using a Least-Mean-Square (LMS) or Normalized-Least-Mean-Square (NLMS) algorithm, and thereby updated continuously.
In any event, echo estimation module 236m generates an estimate of the echo (e.g., received at the microphone 16) based on these two input signals ($r_m$ 222 and $\hat{f}_m$ 223). In the preferred embodiments, the main (or even sole) processing performed by each echo estimation module 236m is a convolution between $r_m$ 222 and $\hat{f}_m$ 223. At least some of such processing (e.g., at least the convolution processing) is performed at a sample rate of RPm. Typically, at least two of the sample rates Rrm, Rfm and RPm are different from each other, so one of the signals $r_m$ 222 or $\hat{f}_m$ 223 is indexed differently (e.g., less frequently, with more skipped samples) than the other. For example, in the exemplary embodiment described above, Rfm=RPm<Rrm, so $r_m$ is indexed during such processing with more sample skips.
The mth sub-band output echo estimate 237 (Em) of echo estimation module 236m, preferably is at the same sample rate (Rx) as the mth sub-band input signal 221 (xm). Such mth sub-band output echo estimate 237 (Em) is subtracted from the mth sub-band input signal 221 (xm) in subtractor 238 to provide the mth sub-band echo-corrected signal 239m (ym), also at the sample rate Rx. All of such sub-band echo-corrected signals 239m are then resynthesized into the final output signal 242 (y at a sample rate of Ry) in sub-band resynthesis module 240, which can also include any desired re-sampling (e.g., up-sampling, particularly if x had been down-sampled).
As indicated above, one of the advantages of the present invention is that different sampling rates can be used for the various signals and processing throughout the system 200. For instance, for the reasons noted above, it usually is preferable for all or at least a portion of the processing performed in some or all of the echo estimation modules 236m to be at sample rate(s) RPm that are different than (preferably lower than) the rate Rx of the input signal 221 (xm), even after taking into account any down-sampling of input signal 221.
Another advantage of the present invention is that the processing sample rates (RPm) of the echo estimation modules 236m (for the different sub-bands m) can be different from each other. Generally speaking, it is preferable that the sample rates of the individual signals are selected appropriately such that: (1) aliasing is avoided or at least limited to an acceptable level; (2) the echo estimation signal 237 has the same sampling rate as the input signal 221; and (3) sufficient samples are available to perform the echo estimation processing in the corresponding module 236m. As noted in connection with the exemplary embodiment discussed above, this can be achieved by using the full sample rate R for the reference signal 222 or the impulse response estimate 223 and using a subrate R/N1 for the other such signal, together with a second subrate R/N2 for the input signal 221, where N1 and N2 are integers that are greater than or equal to 1. However, other appropriate rate selections are available and will be apparent to those of ordinary skill in the art based on the present teachings.
In the foregoing embodiments, echo is estimated based on a reference signal and an estimated impulse response. However, in alternate embodiments, echo may be estimated based on the reference signal and any other system-characterizing function, such as a frequency-based transfer function or a function that describes the system's response to any input other than an impulse.
Generally speaking, except where clearly indicated otherwise, all of the systems, methods, functionality and techniques described herein can be practiced with the use of one or more programmable general-purpose computing devices. Such devices (e.g., including any of the electronic devices mentioned herein) typically will include, for example, at least some of the following components coupled to each other, e.g., via a common bus: (1) one or more central processing units (CPUs); (2) read-only memory (ROM); (3) random access memory (RAM); (4) other integrated or attached storage devices; (5) input/output software and circuitry for interfacing with other devices (e.g., using a hardwired connection, such as a serial port, a parallel port, a USB connection or a FireWire connection, or using a wireless protocol, such as radio-frequency identification (RFID), any other near-field communication (NFC) protocol, Bluetooth or an 802.11 protocol); (6) software and circuitry for connecting to one or more networks, e.g., using a hardwired connection such as an Ethernet card or a wireless protocol, such as code division multiple access (CDMA), global system for mobile communications (GSM), Bluetooth, an 802.11 protocol, or any other cellular-based or non-cellular-based system, which networks, in turn, in many embodiments of the invention, connect to the Internet or to any other networks; (7) a display (such as a cathode ray tube display, a liquid crystal display, an organic light-emitting display, a polymeric light-emitting display or any other thin-film display); (8) other output devices (such as one or more speakers, a headphone set, a laser or other light projector and/or a printer); (9) one or more input devices (such as a mouse, one or more physical switches or variable controls, a touchpad, tablet, touch-sensitive display or other pointing device, a keyboard, a keypad, a microphone and/or a camera or scanner); (10) a mass storage unit (such as a hard disk drive or a solid-state drive); (11) a real-time clock; (12) a removable storage read/write device (such as a flash drive, any other portable drive that utilizes semiconductor memory, a magnetic disk, a magnetic tape, an opto-magnetic disk, an optical disk, or the like); and/or (13) a modem (e.g., for sending faxes or for connecting to the Internet or to any other computer network). In operation, the process steps to implement the above methods and functionality, to the extent performed by such a general-purpose computer, typically initially are stored in mass storage (e.g., a hard disk or solid-state drive), are downloaded into RAM, and then are executed by the CPU out of RAM. However, in some cases the process steps initially are stored in RAM or ROM and/or are directly executed out of mass storage.
Suitable general-purpose programmable devices for use in implementing the present invention may be obtained from various vendors. In the various embodiments, different types of devices are used depending upon the size and complexity of the tasks. Such devices can include, e.g., mainframe computers, multiprocessor computers, one or more server boxes, workstations, personal (e.g., desktop, laptop, tablet or slate) computers and/or even smaller computers, such as personal digital assistants (PDAs), wireless telephones (e.g., smartphones) or any other programmable appliance or device, whether stand-alone, hard-wired into a network or wirelessly connected to a network.
In addition, although general-purpose programmable devices can be used in the systems described above, in alternate embodiments one or more special-purpose processors or computers instead (or in addition) are used. In general, it should be noted that, except as expressly noted otherwise, any of the functionality described above can be implemented by a general-purpose processor executing software and/or firmware, by dedicated (e.g., logic-based) hardware, or any combination of these approaches, with the particular implementation being selected based on known engineering tradeoffs. More specifically, where any process and/or functionality described above is implemented in a fixed, predetermined and/or logical manner, it can be accomplished by a processor executing programming (e.g., software or firmware), an appropriate arrangement of logic components (hardware), or any combination of the two, as will be readily appreciated by those skilled in the art. In other words, it is well-understood how to convert logical and/or arithmetic operations into instructions for performing such operations within a processor and/or into logic gate configurations for performing such operations; in fact, compilers typically are available for both kinds of conversions.
It should be understood that the present invention also relates to machine-readable tangible (or non-transitory) media on which are stored software or firmware program instructions (i.e., computer-executable process instructions) for performing the methods and functionality of this invention. Such media include, by way of example, magnetic disks, magnetic tape, optically readable media such as CDs and DVDs, or semiconductor memory such as various types of memory cards, USB flash memory devices, solid-state drives, etc. In each case, the medium may take the form of a portable item such as a miniature disk drive or a small disk, diskette, cassette, cartridge, card, stick etc., or it may take the form of a relatively larger or less-mobile item such as a hard disk drive, ROM or RAM provided in a computer or other device. As used herein, unless clearly noted otherwise, references to computer-executable process steps stored on a computer-readable or machine-readable medium are intended to encompass situations in which such process steps are stored on a single medium, as well as situations in which such process steps are stored across multiple media.
The foregoing description primarily emphasizes electronic computers and devices. However, it should be understood that any other computing or other type of device instead may be used, such as a device utilizing any combination of electronic, optical, biological and chemical processing that is capable of performing basic logical and/or arithmetic operations.
In addition, where the present disclosure refers to a processor, computer, server, server device, computer-readable medium or other storage device, client device, or any other kind of apparatus or device, such references should be understood as encompassing the use of plural such processors, computers, servers, server devices, computer-readable media or other storage devices, client devices, or any other such apparatuses or devices, except to the extent clearly indicated otherwise. For instance, a server generally can (and often will) be implemented using a single device or a cluster of server devices (either local or geographically dispersed), e.g., with appropriate load balancing. Similarly, a server device and a client device often will cooperate in executing the process steps of a complete method, e.g., with each such device having its own storage device(s) storing a portion of such process steps and its own processor(s) executing those process steps.
As used herein, the term “coupled”, or any other form of the word, is intended to mean either directly connected or connected through one or more other elements or processing blocks, e.g., for the purpose of preprocessing. In the drawings and/or the discussions of them, where individual steps, modules or processing blocks are shown and/or discussed as being directly connected to each other, such connections should be understood as couplings, which may include additional elements and/or processing blocks. Unless expressly and specifically stated herein to the contrary, references to a signal herein mean any processed or unprocessed version of the signal. That is, specific processing steps discussed and/or claimed herein are not intended to be exclusive; rather, intermediate processing may be performed between any two processing steps expressly discussed or claimed herein.
As used herein, the term “attached”, or any other form of the word, without further modification, is intended to mean directly attached, attached through one or more other intermediate elements or components, or integrally formed together. In the drawings and/or the discussion, where two individual components or elements are shown and/or discussed as being directly attached to each other, such attachments should be understood as being merely exemplary, and in alternate embodiments the attachment instead may include additional components or elements between such two components. Similarly, method steps discussed and/or claimed herein are not intended to be exclusive; rather, intermediate steps may be performed between any two steps expressly discussed or claimed herein.
In the preceding discussion, the terms “operators”, “operations”, “functions” and similar terms refer to process steps or hardware components, depending upon the particular implementation/embodiment.
Unless clearly indicated to the contrary, words such as “optimal”, “optimize”, “maximize”, “minimize”, “best”, as well as similar words and other words and suffixes denoting comparison, in the above discussion are not used in their absolute sense. Instead, such terms ordinarily are intended to be understood in light of any other potential constraints, such as user-specified constraints and objectives, as well as cost and processing or manufacturing constraints.
In the above discussion, certain processes and/or methods are explained by breaking them down into functions or steps listed in a particular order. However, it should be noted that in each such case, except to the extent clearly indicated to the contrary or mandated by practical considerations (such as where the results from one function or step are necessary to perform another), the indicated order is not critical but, instead, that the described functions and steps can be reordered and/or two or more of such steps can be performed concurrently.
References herein to a “criterion”, “multiple criteria”, “condition”, “conditions” or similar words which are intended to trigger, limit, filter or otherwise affect processing steps, other actions, the subjects of processing steps or actions, or any other activity or data, are intended to mean “one or more”, irrespective of whether the singular or the plural form has been used. For instance, any criterion or condition can include any combination (e.g., Boolean combination) of actions, events and/or occurrences (i.e., a multi-part criterion or condition).
Similarly, in the discussion above, functionality sometimes is ascribed to a particular module or component. However, functionality generally may be redistributed as desired among any different modules or components, in some cases completely obviating the need for a particular component or module and/or requiring the addition of new components or modules. The precise distribution of functionality preferably is made according to known engineering tradeoffs, with reference to the specific embodiment of the invention, as will be understood by those skilled in the art.
In the discussions above, the words “include”, “includes”, “including”, and all other forms of the word should not be understood as limiting, but rather any specific items following such words should be understood as being merely exemplary.
Several different embodiments of the present invention are described above and in the documents incorporated by reference herein, with each such embodiment described as including certain features. However, it is intended that the features described in connection with the discussion of any single embodiment are not limited to that embodiment but may be included and/or arranged in various combinations in any of the other embodiments as well, as will be understood by those skilled in the art.
Thus, although the present invention has been described in detail with regard to the exemplary embodiments thereof and accompanying drawings, it should be apparent to those skilled in the art that various adaptations and modifications of the present invention may be accomplished without departing from the intent and the scope of the invention. Accordingly, the invention is not limited to the precise embodiments shown in the drawings and described above. Rather, it is intended that all such variations not departing from the intent of the invention are to be considered as within the scope thereof as limited solely by the claims appended hereto.
Relation | Number | Date | Country
---|---|---|---
Parent | 15704235 | Sep 2017 | US
Child | 16161216 | | US