This patent document contains material subject to copyright protection. The copyright owner has no objection to the reproduction of this patent document or any related materials in the files of the United States Patent and Trademark Office, but otherwise reserves all copyrights whatsoever.
1. Field of the Invention
This invention relates to communication systems, and, more particularly, to echo reduction in digital communication systems.
2. Background
Digital communications have become ubiquitous. In a typical digital communication system, users may connect to each other via a communication network such as the Internet or the like and may exchange information such as audio (e.g., speech) or video data in real time. Signals, including audio signals, may be transmitted between nodes of the communication network from one user to one or more other users.
Each user may have a user device with which to communicate with one or more other devices via the network. Each device has an audio input means, e.g., a microphone or the like, and some means of audio output, e.g., a speaker or the like. For devices involved in a conversation, sounds picked up by the microphone of a device are converted from analog to digital form and then sent to other devices in current communication with the device.
When a loudspeaker on a device plays sound, and a microphone on the same device captures this sound a fraction of a second later, the captured sound is referred to as an echo. Left untreated, such echoes can be disturbing to remote users in a voice call, who, because of the echoes, hear themselves back. Echo cancellers serve the purpose of removing such echoes.
Consider a typical situation, with reference to
There are two broad techniques for removing echoes: echo cancellation, which subtracts an estimate of the echo signal from the microphone signal; and echo suppression, which suppresses the microphone signal over time and frequency, depending on how much echo is present. This invention relates to both methods, and we will use the term echo cancellation to mean either method.
A critical component in many echo cancellers is a delay estimation module, which estimates the time it takes from sending the far end signal to the loudspeaker until it comes back as an echo in the near end signal from the microphone. Echo cancellers normally use this estimate to delay the far end signal by the same amount before it goes into the actual subtraction or suppression module, thus aligning the far end and echo signals in time.
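By way of non-limiting illustration, the alignment step described above may be sketched in Python as follows. The function names, the fixed `echo_gain`, and the single-tap echo model are assumptions made for this sketch only; practical echo cancellers typically adapt a filter rather than rely on a known gain.

```python
def delay_signal(signal, delay_samples):
    """Return `signal` delayed by `delay_samples`, zero-padded at the start.

    The output has the same length as the input (hypothetical helper).
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    pad = min(delay_samples, len(signal))
    return [0.0] * pad + list(signal[:len(signal) - pad])


def subtract_aligned_echo(near_end, far_end, delay_samples, echo_gain):
    """Subtract a delayed, scaled copy of the far end signal from the near end.

    Assumes a single-tap echo path with known gain, purely for illustration.
    """
    aligned = delay_signal(far_end, delay_samples)
    return [n - echo_gain * f for n, f in zip(near_end, aligned)]


far_end = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
# Simulated near end signal: the far end signal 2 samples late at half level.
near_end = [0.5 * x for x in delay_signal(far_end, 2)]
residual = subtract_aligned_echo(near_end, far_end, delay_samples=2, echo_gain=0.5)
misaligned = subtract_aligned_echo(near_end, far_end, delay_samples=0, echo_gain=0.5)
```

In this sketch `residual` is all zeros because the delay matches the simulated echo path, whereas `misaligned` still contains echo, which is why the accuracy of the delay estimate matters.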
The delay estimation module is especially important in software-based echo cancellers, where buffers may have unknown and potentially time-varying delay. Such buffers exist in the incoming and outgoing streams, both in hardware and in the Operating System (OS), as shown in
State of the art methods typically estimate the delay by trying to correlate the far end signal with the near end signal. In an ideal situation the far end signal would contain speech from the remote side and the near end signal would contain the echo of that speech, and the correlation measure would contain a single sharp peak at a correlation lag equal to the delay. Unfortunately, however, this approach suffers from the dependency on the speech signal. The near end signal may contain other signals besides the echo, such as strong background noise or a loud near end voice signal. Furthermore, the echo signal may be distorted by imperfections in the system or by overloading the microphone. Still further, speech signals often have a strong periodic character. In light of these effects, the correlation measure is often noisy, has a broad peak and contains spurious peaks, making correlation-based estimates unreliable. Much effort has gone into finding heuristics to improve the robustness of correlation-based estimates, with limited success.
Another problem with this correlation-based approach is that the delay estimate cannot be updated when the far end signal is silent. As a result the delay estimate may be off by the time the remote side starts speaking. This wrong estimate can prevent the echo canceller from removing echo until the delay estimate is again accurate.
Yet a further weakness of the correlation-based method is its computational complexity. A delay estimation module often accounts for a large part of an echo canceller's CPU usage.
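By way of non-limiting illustration, the conventional correlation-based estimate discussed above may be sketched as a brute-force search over candidate lags. The function name and the test signals are assumptions made for this sketch; it is not taken from any particular echo canceller.

```python
def estimate_delay_by_correlation(far_end, near_end, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    Brute force: the cost grows with len(far_end) * (max_lag + 1).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Number of far end samples that still overlap the near end signal.
        usable = max(min(len(far_end), len(near_end) - lag), 0)
        score = sum(far_end[i] * near_end[i + lag] for i in range(usable))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


far_end = [0.0, 1.0, -2.0, 3.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Idealized echo: the far end signal 3 samples late at half level, nothing else.
near_end = [0.0] * 3 + [0.5 * x for x in far_end[:7]]
clean_lag = estimate_delay_by_correlation(far_end, near_end, max_lag=5)
```

The nested loop makes the computational cost visible: every candidate lag requires a pass over the signal. The idealized input also shows the dependency on the signal content, since a silent `far_end` would give the same score at every lag.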
It is desirable to have an echo cancellation technique that is robust, reliable, and not computationally expensive.
Other objects, features, and characteristics of the present invention as well as the methods of operation and functions of the related elements of structure, and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification. None of the drawings are to scale unless specifically stated otherwise.
As used herein, unless used otherwise, the following terms or abbreviations have the following meanings:
VoIP means Voice over IP.
A “mechanism” refers to any device(s), process(es), routine(s), service(s), module(s), or combination thereof. A mechanism may be implemented in hardware, software, firmware, using a special-purpose device, or any combination thereof. A mechanism may be integrated into a single device or it may be distributed over multiple devices. The various components of a mechanism may be co-located or distributed. The mechanism may be formed from other mechanisms. In general, as used herein, the term “mechanism” may thus be considered to be shorthand for the term device(s) and/or process(es) and/or service(s).
As shown in the drawing in
The microphone 306 picks up the far end signal (FES) with the added marker signal (M) (possibly along with any other sounds including speech, noise, etc.). The marker generator and detector mechanism 302 may then detect the marker signal (M) in the return near end signal, thereby determining or estimating a delay associated with that signal. The estimated delay 308 may be used by the echo canceller mechanism 310 to remove the echo from the near end signal. The near end signal with the echo canceled by the echo canceller mechanism 310 is then sent to the far end.
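By way of non-limiting illustration, marker detection and the resulting delay estimate may be sketched as follows. The marker shape (a Hann-windowed 20 kHz tone burst), the matched-filter style detector, the signal levels, and all names are assumptions made for this sketch; the description does not prescribe a particular detector.

```python
import math

SAMPLE_RATE_HZ = 48_000      # assumed playout/capture rate
MARKER_FREQ_HZ = 20_000.0    # assumed marker frequency
MARKER_LEN = 96              # 2 ms at 48 kHz (assumed)


def make_marker():
    """Hann-windowed tone burst used as a hypothetical marker signal."""
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / (MARKER_LEN - 1)))
        * math.sin(2.0 * math.pi * MARKER_FREQ_HZ * n / SAMPLE_RATE_HZ)
        for n in range(MARKER_LEN)
    ]


def find_marker(signal, marker):
    """Return the sample offset where `marker` best matches `signal`."""
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(signal) - len(marker) + 1):
        score = sum(m * signal[pos + k] for k, m in enumerate(marker))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


def tone(freq_hz, amplitude, length):
    """Plain sine wave standing in for speech in this sketch."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(length)
    ]


marker = make_marker()
length, marker_start, true_delay = 1200, 100, 480   # 480 samples = 10 ms

# Far end signal (FES) with the marker signal (M) added at a known position.
playout = tone(300.0, 0.5, length)
for k, m in enumerate(marker):
    playout[marker_start + k] += 0.1 * m

# Simulated microphone signal: delayed, attenuated playout plus near end sound.
near_sound = tone(200.0, 0.3, length)
mic = [
    (0.6 * playout[n - true_delay] if n >= true_delay else 0.0) + near_sound[n]
    for n in range(length)
]

estimated_delay = find_marker(mic, marker) - marker_start
estimated_delay_ms = 1000.0 * estimated_delay / SAMPLE_RATE_HZ
```

Because the detector looks only for the marker, the low-frequency content standing in for far end and near end speech has little influence on the estimate in this sketch.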
It should be appreciated that the echo canceller mechanism 310 may not remove all of the echo in the near end signal, and that, if successful, some or all of the echo may be removed.
Although shown as a single box 302 in the drawings, it should be understood that the marker generator mechanism and marker detector mechanism may be separate components. However, the marker detector mechanism will need to know details of markers added to the far end signal in order to be able to detect and distinguish them.
Sampling Frequency
In order to play out and capture a marker signal at frequencies of around 20 kHz, the sampling frequency must be at least twice as high as the marker signal frequency. In practice that means a sampling frequency of 44.1 kHz or 48 kHz, or higher (a typical speech sampling frequency for telephony is 8 kHz, with higher quality speech being provided at a sampling frequency of 16 kHz or higher). However, this does not mean that the echo canceller and other processing must also operate at such a high sampling frequency. Only the audio input and output, and marker processing need operate at the higher frequency, while the rest of the processing can be done at a lower sampling frequency. That is, the marker generation and detection can, and preferably does, operate at a higher sampling frequency than the echo cancellation.
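By way of non-limiting illustration, the sampling frequency requirement stated above may be expressed as a small check. The list of candidate rates and the function names are assumptions made for this sketch.

```python
STANDARD_RATES_HZ = (8_000, 16_000, 32_000, 44_100, 48_000)  # illustrative list


def can_carry(sample_rate_hz, signal_freq_hz):
    """True if the sampling frequency is at least twice the signal frequency."""
    return sample_rate_hz >= 2 * signal_freq_hz


def lowest_rate_for(signal_freq_hz, rates=STANDARD_RATES_HZ):
    """Return the lowest listed rate that can carry the signal, or None."""
    for rate in sorted(rates):
        if can_carry(rate, signal_freq_hz):
            return rate
    return None


# Rate needed for audio input/output and marker processing at about 20 kHz.
marker_io_rate = lowest_rate_for(20_000)
# A 16 kHz speech path cannot carry the marker and does not need to.
speech_path_carries_marker = can_carry(16_000, 20_000)
```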
In the exemplary embodiment shown in
Although the arriving far end signal is shown in this embodiment as 16 kHz, those of ordinary skill in the art will realize and appreciate, upon reading this description, that different/other frequencies may be used. For example, the sample frequency of the arriving far end signal may be, without limitation, 8 kHz, 12 kHz, 16 kHz, 24 kHz, or 32 kHz.
The resampling (from 16 to 48 kHz and then back) may introduce delays. In the exemplary embodiment shown in
Running the echo canceller and other processing at a lower sampling frequency reduces their computational complexity.
Those of ordinary skill in the art will realize and appreciate, upon reading this description, that when a frequency is specified herein, the mechanism will operate at substantially that frequency, within acceptable limits.
Fall Back to Conventional Delay Estimation
For reasons outside the software's control, the ultrasonic markers may not always be present in the echo. For instance the loudspeaker or microphone may be unable to produce or capture the marker's high frequencies. Or the audio chain in the operating system or hardware may contain a low-pass filter that removes the markers. In these cases the device may fall back to conventional delay estimation. A practical approach is to start with the conventional estimation, and have it be overruled (and perhaps turned off) when markers are being detected in the echo.
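By way of non-limiting illustration, the fall-back policy described above may be sketched as a selector that starts with the conventional estimate and lets a recent marker-based estimate overrule it. The class name, the method names, and the 2000 ms staleness limit are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelaySelector:
    """Prefer marker-based delay estimates while markers are being detected."""

    marker_timeout_ms: float = 2000.0          # assumed staleness limit
    last_marker_delay_ms: Optional[float] = None
    last_marker_time_ms: Optional[float] = None

    def on_marker_detected(self, now_ms: float, delay_ms: float) -> None:
        """Record the most recent marker-based delay estimate."""
        self.last_marker_delay_ms = delay_ms
        self.last_marker_time_ms = now_ms

    def markers_active(self, now_ms: float) -> bool:
        """True if a marker was detected within the timeout."""
        return (
            self.last_marker_time_ms is not None
            and now_ms - self.last_marker_time_ms <= self.marker_timeout_ms
        )

    def select(self, now_ms: float, conventional_delay_ms: float) -> float:
        """The marker estimate overrules the conventional one when it is fresh."""
        if self.markers_active(now_ms):
            return self.last_marker_delay_ms
        return conventional_delay_ms
```

A caller could also use `markers_active` to decide whether to turn the conventional estimator off while markers are being detected.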
Microphone Overload
In systems that lack analogue microphone gain control, or lack an API or the like to set the analogue microphone gain, the echo can overload the microphone, meaning that the output from the microphone or microphone electronic circuitry is distorted. This distortion makes the echo less similar, and thus less correlated, to the far end signal, thereby deteriorating a conventional delay estimate. This easily happens in a hands-free setup (speakerphone mode), where the loudspeaker is much closer to the microphone than to the near end user. The near end user may turn up the speaker volume so high that the echo reaches the microphone at a much higher level than the near end user's voice. If the analogue microphone gain is tuned to pick up the near end user, then the louder echo will overload the microphone. The invention solves this problem because the marker signal does not have to reach the near end user's ear, so that it may be played at a lower level and thereby avoid overloading the microphone. This helps with accurately detecting markers.
Marker Interval
The time interval at which markers are generated (referred to as the marker interval) determines how fast the delay estimate can follow changes in the true delay. The shorter the marker interval, the faster it can follow. However, if the marker interval becomes shorter than the known range of the true delay, uncertainty arises about which marker is being detected. For example, if markers are generated every 500 ms, and the delay range is from 100 to 1000 ms, then a detection 200 ms after generating a marker is followed by another detection at 700 ms, and the detector cannot tell whether the true delay is 200 ms or 700 ms. In order to resolve this ambiguity, consecutive markers may differ such that the detector can tell them apart.
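By way of non-limiting illustration, the ambiguity in the 500 ms example above may be reproduced by listing every delay that is consistent with one detection of an unlabeled marker. The function name and the time origin are assumptions made for this sketch.

```python
def candidate_delays(detect_time_ms, emit_times_ms, min_delay_ms, max_delay_ms):
    """List every delay consistent with one detection of an unlabeled marker."""
    return [
        detect_time_ms - t
        for t in emit_times_ms
        if min_delay_ms <= detect_time_ms - t <= max_delay_ms
    ]


# Markers every 500 ms, delay range 100 to 1000 ms: a detection at t = 700 ms
# fits the marker sent at 0 ms (delay 700) and the one sent at 500 ms (delay 200).
ambiguous = candidate_delays(700, [0, 500, 1000], 100, 1000)

# With a 1000 ms interval the same detection has a single explanation.
unambiguous = candidate_delays(700, [0, 1000, 2000], 100, 1000)
```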
Another reason for using variable markers is to be immune from nearby devices that use the same type of markers. Any detected markers that differ from markers that were recently played out must come from a different device and are simply ignored.
When the markers are not all the same then the marker generator and detector mechanism 302 preferably tracks markers that have been generated and looks for those markers in the return signal. Markers may be tracked by the mechanism by being stored in a table or buffer or the like. It should be appreciated that the mechanism 302 need only store enough recent markers to cover the expected maximum delay range for the selected marker interval.
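By way of non-limiting illustration, tracking of distinguishable markers may be sketched with a bounded buffer that holds only as many recent markers as can still be in flight. The class name, the use of integer marker identifiers, and the capacity formula are assumptions made for this sketch.

```python
import math
from collections import deque


class MarkerTracker:
    """Remember recently played markers so detections can be matched to them."""

    def __init__(self, marker_interval_ms, max_delay_ms):
        # Keep only enough markers to cover the maximum delay range.
        capacity = math.ceil(max_delay_ms / marker_interval_ms) + 1
        self._recent = deque(maxlen=capacity)   # (marker_id, emit_time_ms)

    def on_marker_played(self, marker_id, emit_time_ms):
        """Record a marker that was added to the far end signal."""
        self._recent.append((marker_id, emit_time_ms))

    def on_marker_detected(self, marker_id, detect_time_ms):
        """Return the delay in ms, or None for an unknown marker.

        Unknown markers are assumed to come from another device and are ignored.
        """
        for known_id, emit_time_ms in reversed(self._recent):
            if known_id == marker_id:
                return detect_time_ms - emit_time_ms
        return None


tracker = MarkerTracker(marker_interval_ms=500, max_delay_ms=1000)
for marker_id, emit_time_ms in [(1, 0), (2, 500), (3, 1000)]:
    tracker.on_marker_played(marker_id, emit_time_ms)
```

With identifiers attached, a detection at 700 ms resolves to a delay of 200 ms for marker 2 and 700 ms for marker 1, so the two cases are no longer confused.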
Randomized Marker Interval
Two nearby devices may still interfere with each other if they happen to play their markers at exactly the same time. The remedy is to (randomly) vary the interval between consecutive markers, so that each marker is played after a different interval than the one before. As a result, most of the time markers from nearby devices will not overlap in time. As should be appreciated, an occasionally lost or missed marker poses no problem.
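By way of non-limiting illustration, a randomized marker schedule may be sketched as follows. The base interval, the jitter range, the uniform distribution, and the seeds are assumptions made for this sketch.

```python
import random


def marker_times_ms(count, base_interval_ms, jitter_ms, rng):
    """Return `count` playout times with randomly varied spacing."""
    times, now = [], 0.0
    for _ in range(count):
        # Each interval is drawn independently around the base interval.
        now += base_interval_ms + rng.uniform(-jitter_ms, jitter_ms)
        times.append(now)
    return times


# Two devices with independent random generators (seeds chosen arbitrarily).
device_a = marker_times_ms(5, 500.0, 100.0, random.Random(1))
device_b = marker_times_ms(5, 500.0, 100.0, random.Random(2))
```

Even if both devices start at the same instant, their schedules drift apart, so any overlap between markers is occasional rather than persistent.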
Generating Versus Storing
Some embodiments hereof may trade memory for computational processing by generating markers only once, and storing them. Markers are then simply read from memory when needed.
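By way of non-limiting illustration, generating each marker once and reading it from memory thereafter may be sketched with a cache. The mapping from marker identifier to tone frequency, the marker length, and the function name are hypothetical choices made for this sketch.

```python
import math
from functools import lru_cache

SAMPLE_RATE_HZ = 48_000   # assumed
MARKER_LEN = 96           # assumed marker length in samples


@lru_cache(maxsize=None)
def get_marker(marker_id):
    """Generate the marker for `marker_id` once; later calls read the cache.

    The mapping from id to tone frequency is a hypothetical choice.
    """
    freq_hz = 19_000.0 + 250.0 * marker_id
    return tuple(
        math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(MARKER_LEN)
    )
```

Returning a tuple keeps the stored marker immutable, so a caller cannot accidentally modify the cached copy.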
Choosing a Marker Signal
Since a marker signal is preferably inaudible to humans, a particular implementation may freely optimize the marker signal for its purpose of delay estimation. While any marker signals may be used, desirable (though not required) properties of a marker signal are:
Those of ordinary skill in the art will realize and appreciate, upon reading this description, that most of these requirements overlap with those of ultrasonic inter-device communications. In that area, a known and proven method uses Orthogonal Frequency-Division Multiplexing (OFDM) to encode a message into a signal, often in combination with Forward Error Correction (FEC) for robustness, e.g., as described in Matsuoka, Hosei, Yusuke Nakashima, and Takeshi Yoshimura. “Acoustic communication with OFDM signal embedded in Audio,” Audio Engineering Society Conference: 29th International Conference: Audio for Mobile and Handheld Devices, Audio Engineering Society, 2006. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that this approach also works well for the purpose of this invention.
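By way of non-limiting illustration, a much simplified OFDM-style marker may be sketched by mapping each bit of a marker identifier to the sign of one subcarrier. The symbol length, the subcarrier frequencies, the 8-bit identifier, and the function names are assumptions made for this sketch. It assumes the detector already knows the symbol boundaries, and it omits FEC, a cyclic prefix, and any compensation for the phase shift of a real echo path.

```python
import math

SAMPLE_RATE_HZ = 48_000
SYMBOL_LEN = 480        # 10 ms symbol, hence 100 Hz subcarrier spacing
FIRST_BIN = 190         # 19.0 kHz; 8 subcarriers reach 19.7 kHz
NUM_BITS = 8


def encode_marker(marker_id):
    """Map each bit of `marker_id` to the sign of one subcarrier and sum them."""
    signs = [1.0 if (marker_id >> k) & 1 else -1.0 for k in range(NUM_BITS)]
    return [
        sum(
            s * math.cos(2.0 * math.pi * (FIRST_BIN + k) * n / SYMBOL_LEN)
            for k, s in enumerate(signs)
        ) / NUM_BITS
        for n in range(SYMBOL_LEN)
    ]


def decode_marker(symbol):
    """Recover the id by projecting the symbol onto each subcarrier."""
    marker_id = 0
    for k in range(NUM_BITS):
        projection = sum(
            x * math.cos(2.0 * math.pi * (FIRST_BIN + k) * n / SYMBOL_LEN)
            for n, x in enumerate(symbol)
        )
        # A positive projection means the subcarrier carried a 1 bit.
        if projection > 0.0:
            marker_id |= 1 << k
    return marker_id


symbol = encode_marker(37)
# Add a 300 Hz tone (bin 3) standing in for audible content in the same frame.
noisy = [
    x + 0.5 * math.sin(2.0 * math.pi * 3 * n / SYMBOL_LEN)
    for n, x in enumerate(symbol)
]
decoded = decode_marker(noisy)
```

Because every subcarrier completes a whole number of cycles within the symbol, the subcarriers are mutually orthogonal over the symbol, and each bit can be read back independently of the others and of the low-frequency tone.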
Exemplary operation of aspects of embodiments hereof is described with reference to the flowchart in
The near end signal (with any echo and marker) is resampled (if necessary) at a lower rate (at 412). The delay (determined at 410) is then used to cancel the echo in the near-end signal (at 414).
In some embodiments the devices operate, at least in part, in a framework such as described in: (a) U.S. patent application Ser. No. 14/311,291, filed Jun. 21, 2014, titled “Unified And Consistent Multimodal Communication Framework,” and/or (b) U.S. patent application Ser. No. 14/536,590, filed Nov. 8, 2014, titled “Voice In A Unified And Consistent Multimodal Communication Framework,” the entire contents of both of which are hereby fully incorporated herein by reference for all purposes.
Real Time
Those of ordinary skill in the art will realize and understand, upon reading this description, that, as used herein, the term “real time” means near real time or sufficiently real time. It should be appreciated that there are inherent delays in network-based communication (e.g., based on network traffic and distances), and these delays may cause delays in data reaching various components. Inherent delays in the system do not change the real-time nature of the data. In some cases, the term “real-time data” may refer to data obtained in sufficient time to make the data useful for its intended purpose.
Although the term “real time” may be used here, it should be appreciated that the system is not limited by this term or by how much time is actually taken. In some cases, real time computation may refer to an online computation, i.e., a computation that produces its answer(s) as data arrive, and generally keeps up with continuously arriving data. The term “online” computation is compared to an “offline” or “batch” computation.
As used in this description, the term “portion” means some or all. So, for example, “A portion of X” may include some of “X” or all of “X”. In the context of a conversation, the term “portion” means some or all of the conversation.
As used herein, including in the claims, the phrase “at least some” means “one or more,” and includes the case of only one. Thus, e.g., the phrase “at least some ABCs” means “one or more ABCs”, and includes the case of only one ABC.
As used herein, including in the claims, the phrase “based on” means “based in part on” or “based, at least in part, on,” and is not exclusive. Thus, e.g., the phrase “based on factor X” means “based in part on factor X” or “based, at least in part, on factor X.” Unless specifically stated by use of the word “only”, the phrase “based on X” does not mean “based only on X.”
As used herein, including in the claims, the phrase “using” means “using at least,” and is not exclusive. Thus, e.g., the phrase “using X” means “using at least X.” Unless specifically stated by use of the word “only”, the phrase “using X” does not mean “using only X.”
In general, as used herein, including in the claims, unless the word “only” is specifically used in a phrase, it should not be read into that phrase.
As used herein, including in the claims, the phrase “distinct” means “at least partially distinct.” Unless specifically stated, distinct does not mean fully distinct. Thus, e.g., the phrase, “X is distinct from Y” means that “X is at least partially distinct from Y,” and does not mean that “X is fully distinct from Y.” Thus, as used herein, including in the claims, the phrase “X is distinct from Y” means that X differs from Y in at least some way.
As used herein, including in the claims, a list may include only one item, and, unless otherwise stated, a list of multiple items need not be ordered in any particular manner. A list may include duplicate items. For example, as used herein, the phrase “a list of XYZs” may include one or more “XYZs”.
It should be appreciated that the words “first” and “second” in the description and claims are used to distinguish or identify, and not to show a serial or numerical limitation. Similarly, letter or numerical labels (such as “(a)”, “(b)”, and the like) are used to help distinguish and/or identify, and not to show any serial or numerical limitation or ordering.
No ordering is implied by any of the labeled boxes in any of the flow diagrams unless specifically shown and stated. When disconnected boxes are shown in a diagram the activities associated with those boxes may be performed in any order, including fully or partially in parallel.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
This application is related to and claims priority from U.S. Provisional Patent Application No. 62/091,661, titled “Delay Estimation for Echo Cancellation Using Ultrasonic Markers,” filed Dec. 15, 2014, the entire contents of which are hereby fully incorporated herein by reference for all purposes.
Number | Date | Country
---|---|---
62091661 | Dec 2014 | US