1. Field of the Invention
The present invention relates to telecommunications over a computer network; in particular, the present invention relates to the quality of audio and video communication over a computer network.
2. Discussion of the Related Art
Echo cancellation has been an active area of research in telecommunications for some time. In standard telephone networks, there are generally two sources of echoes: hybrid echo and acoustic echo. Hybrid echoes result from the electrical properties of a telephone network. Acoustic echoes arise when signals (e.g., voice communication) originating at one end of a communication channel arrive at a recipient at the other end of the communication channel, and are then retransmitted back to the originator. For instance, two people (say, persons A and B) may be speaking to each other over a voice channel (e.g., a standard telephone connection or a Voice-over-Internet-Protocol (VoIP) connection). When person A speaks, person B listens to person A's speech through person B's speakers. If person B's microphone is sufficiently sensitive or close to the speakers, some of this speech may be picked up by the microphone and transmitted back to person A. Person A perceives this as an echo of his or her own speech, which can be awkward and distracting. The problem is aggravated when a “hands-free” device (e.g., a speakerphone), or a personal computer with a microphone and speaker setup, is used for the communication. In such a system, the speakers are usually not immediately next to the listener's ears, thus necessitating an amplification in output volume. This amplified volume makes it easier for the listener to hear the other party's voice, but also makes it easier for the microphone to pick up, and hence to retransmit, the signal back to the originating party.
Existing echo-canceling systems generally depend on what is referred to as an “altruistic” algorithm. In such an algorithm, each party endeavors to prevent the other party from hearing echoes. Such an algorithm works by analyzing the signal that arrives at a communication device (e.g., a telephone or a personal computer) and is actuated as sound through its speaker. The algorithm tries to “subtract” the retransmitted portion of the received signal from the signal that is transmitted to the other party, so as to cancel the echoes of the received voice that the other party would otherwise hear. This processing requires an amount of work that is proportional to the so-called “echo path delay” (i.e., the amount of time between the arrival of a signal at one party's speaker and the arrival of the echo of that signal at the microphone). For a typical application, the echo path delay is usually on the order of milliseconds, or even less. One common algorithm for echo cancellation in such an application is the least mean squares (LMS) filter, or one of its variants, such as the normalized least mean squares (NLMS) filter. There are other adaptive algorithms that estimate the error of a signal based only on observable signals. However, for various reasons, processing using such an algorithm at the site of the echo may be either impossible or impractical.
The present invention provides a method for using an intermediate server to process the communication between two parties, so as to eliminate echoes between them. According to one embodiment of the present invention, the server performs echo cancellation in a network-based voice communication system serving many conversations. For each conversation, the server allocates two echo cancellation modules, one for each communicating client program of the conversation, with each echo cancellation module (“current echo cancellation module”) including (a) a communication interface for communicating with a client program associated with the current echo cancellation module; (b) a first buffer for storing audio data received from the client program for transmission to a second echo cancellation module; (c) a second buffer for storing audio data received from the second echo cancellation module for transmitting to the associated client program of the current echo cancellation module; and (d) a set of filters using the audio data in both the first buffer and the second buffer to cancel echoes in the audio data in the second buffer. The communication interface of each echo cancellation module may be a logical communication interface communicating with a client program over a computer network.
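By way of illustration only, the per-conversation allocation described above might be sketched as follows. The class name `EchoCancellationModule`, the function `allocate_conversation`, and the field names are hypothetical and do not appear in this specification. The set of filters of element (d) is omitted, and the logical communication interface of element (a) is reduced to a single method.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

Frame = List[int]  # one block of audio samples (assumed representation)


@dataclass
class EchoCancellationModule:
    """One module per client program; all names here are illustrative."""
    client_id: str
    # (b) audio received from this module's client, bound for the peer module
    first_buffer: Deque[Frame] = field(default_factory=deque)
    # (c) audio received from the peer module, bound for this module's client
    second_buffer: Deque[Frame] = field(default_factory=deque)
    # The peer is excluded from repr/eq so the mutual reference cannot recurse.
    peer: Optional["EchoCancellationModule"] = field(
        default=None, repr=False, compare=False)

    def receive_from_client(self, frame: Frame) -> None:
        """(a) stand-in for the communication interface: accept client audio."""
        self.first_buffer.append(frame)
        if self.peer is not None:
            self.peer.second_buffer.append(frame)


def allocate_conversation(
        client_a: str, client_b: str
) -> Tuple[EchoCancellationModule, EchoCancellationModule]:
    """Allocate two modules for one conversation and wire them together."""
    a = EchoCancellationModule(client_a)
    b = EchoCancellationModule(client_b)
    a.peer, b.peer = b, a
    return a, b
```

In this sketch each frame is recorded in the sender's first buffer and the receiver's second buffer, so that a filter attached to either module would have both streams available.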
According to one embodiment of the present invention, the set of filters provided on the server may include a filter implementing a method for double-talk detection. The method for double-talk detection may be any one of many methods, such as the Geigel algorithm, the “Microphone-echo cross-correlation” algorithm or the “Fast Normalized Cross-correlation” algorithm. In one embodiment, a filter implementing an echo cancellation method is suspended when the double-talk detection method detects double-talk.
The present invention allows the use of any one of many echo cancellation methods, such as the “Normalized Least-mean Squares” algorithm and the “Normalized Least-mean Squares algorithm with Pre-Whitening.” In one implementation, the echo cancellation filter may have between 4,000 and 32,000 taps. Optionally, a high-pass filter may be provided to eliminate frequency components less than 300 Hz.
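As a non-limiting illustration, a textbook form of the normalized least-mean squares update may be sketched as below. This is a minimal sketch and not necessarily the filter of any embodiment: the function name `nlms_cancel` and the parameters `taps`, `mu` and `eps` are assumptions, pre-whitening is not shown, and the default of 8 taps is chosen only to keep the example small. Assuming a sample rate of 8,000 samples per second, a filter of 4,000 taps would span 0.5 seconds of audio and a filter of 32,000 taps would span 4 seconds.

```python
from typing import List, Sequence


def nlms_cancel(far_end: Sequence[float], mic: Sequence[float],
                taps: int = 8, mu: float = 0.5,
                eps: float = 1e-8) -> List[float]:
    """Return the residual e[n] = mic[n] - w . x[n] for each sample.

    far_end: reference signal sent to the loudspeaker.
    mic:     microphone signal that may contain an echo of far_end.
    """
    w = [0.0] * taps   # adaptive estimate of the echo path
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]                           # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, x))     # predicted echo
        e = d - y                                    # echo-reduced output
        norm = sum(xi * xi for xi in x) + eps        # input power (normalizer)
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual
```

The per-sample cost of this loop grows linearly with `taps`, which is why the number of taps, and hence the echo path delay to be covered, dominates the processing load.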
The set of filters on the server may be implemented in software modules. The server may be one of multiple servers, together handling a large number of associated client programs supporting many conversations.
The present invention is better understood upon consideration of the detailed description below and the accompanying drawings.
The present invention provides a method which uses an intermediate server to process video or audio communication between two parties in order to eliminate echoes between them.
If both parties use, for example, the Adobe Flash software, the video or audio data would arrive at intermediate server 101 in the Adobe Flash video format. The present invention is not limited by any particular audio or video data format. That is, if other software is used, the video or audio data may be in a format that is specific or proprietary to the transmitting software. In that situation, according to one embodiment of the present invention, the received video or audio data may be transformed (or transcoded) into a representation that is compatible with, or convenient for, the echo cancellation algorithm. One such format may be pulse-code modulation (PCM). Under the PCM format, analog audio data is sampled at a regular rate (e.g., 8 kHz, or 8,000 samples per second, which is typical for an audio communication application), and each sample is given a value within a certain range (e.g., a typical range may be a 16-bit range, or from −32,768 to 32,767).
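For illustration, the PCM representation described above may be sketched as follows, assuming that the decoded audio is available as floating-point samples nominally in the range −1.0 to 1.0. The names `to_pcm16` and `sine_tone` are hypothetical, and the scaling by 32,767 is one common convention rather than a requirement of this specification.

```python
import math
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000               # 8,000 samples per second
PCM_MIN, PCM_MAX = -32768, 32767    # signed 16-bit range


def to_pcm16(samples: Sequence[float]) -> List[int]:
    """Quantize floating-point samples to signed 16-bit PCM values."""
    out = []
    for s in samples:
        v = int(round(s * 32767))
        out.append(max(PCM_MIN, min(PCM_MAX, v)))   # clip out-of-range input
    return out


def sine_tone(freq_hz: float, duration_s: float,
              amplitude: float = 0.5) -> List[float]:
    """Generate a test tone sampled at SAMPLE_RATE_HZ."""
    n = int(round(SAMPLE_RATE_HZ * duration_s))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
            for i in range(n)]
```

At this rate, 10 milliseconds of audio corresponds to 80 samples, each stored as one 16-bit integer.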
As shown in
Initially, context 201 accumulates audio data coming from person B (received through context 201's “rx in” port) for a time period. The accumulated data may be buffered internally and simultaneously transmitted to person A without modification by context 201. When audio data is received at context 201's “tx in” port (i.e., when person A speaks), context 201 may modify such tx data before sending it through the “tx out” port to context 202 and hence to a speaker system at person B's location. The decision as to whether or not to modify the incoming tx data may be based on a determination as to whether or not person A is currently speaking. If person A is determined to be speaking, context 201 generally sends the tx audio data unmodified to context 202. However, when context 201 determines that person A is not speaking, and yet receives audio data from person A, such audio data may include an echo of person B's speech, and therefore should be canceled.
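The flow of data through a context, as just described, may be sketched as follows. This is a simplified illustration only: the class `Context`, its methods, and the two injected callables `near_end_speaking` and `cancel_echo` are hypothetical names, and the callables stand in for whatever detection and cancellation methods an embodiment actually uses.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Frame = List[int]  # one block of PCM samples (assumed representation)


class Context:
    """Illustrative context with "rx in" and "tx in" entry points."""

    def __init__(self,
                 near_end_speaking: Callable[[Frame, Sequence[Frame]], bool],
                 cancel_echo: Callable[[Frame, Sequence[Frame]], Frame],
                 history: int = 50) -> None:
        self.near_end_speaking = near_end_speaking
        self.cancel_echo = cancel_echo
        # Recently received far-end frames, kept for later echo cancellation.
        self.rx_history: Deque[Frame] = deque(maxlen=history)

    def rx_in(self, frame: Frame) -> Frame:
        """Buffer far-end audio and pass it to the local party unmodified."""
        self.rx_history.append(frame)
        return frame

    def tx_in(self, frame: Frame) -> Frame:
        """Forward local audio, cancelling echo when the local party is silent."""
        history = list(self.rx_history)
        if self.near_end_speaking(frame, history):
            return frame                          # genuine speech: send as-is
        return self.cancel_echo(frame, history)   # likely echo: cancel it
```

Injecting the two callables keeps the routing logic independent of the particular detection and cancellation algorithms.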
Any one of many known double-talk detection (DTD) algorithms may be used to implement DTD module 302. For example, the Geigel algorithm is known and used in conventional telephone networks. The Geigel algorithm performs well in situations where the echo path is known and the delay is more or less constant (e.g., in a telephone network with a fixed line delay). However, the Geigel algorithm performs poorly in situations involving unpredictable or variable-length echo paths. As DTD is an area of active research, making DTD module 302 pluggable (i.e., in such a modular form that it can be replaced easily with a recompilation or with a command-line switch) allows echo cancellation process 300 to take advantage of ongoing developments in this field. Other suitable DTD algorithms that may be used to implement DTD module 302 include the “Microphone-echo cross-correlation” algorithm and the “Fast Normalized Cross-correlation” algorithm.
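As an illustrative sketch, the Geigel test is commonly stated as follows: double-talk is declared when the magnitude of the microphone sample exceeds a threshold times the largest magnitude among the most recent far-end samples. The class name `GeigelDetector`, the window of 256 samples, and the threshold of 0.5 are assumptions chosen for the example, and the hold time used in many practical detectors is omitted. The type alias `DoubleTalkDetector` is likewise hypothetical and merely indicates one way a pluggable detector might be expressed.

```python
from collections import deque
from typing import Callable, Deque

# Any callable with this shape could be substituted for the detector below.
DoubleTalkDetector = Callable[[float, float], bool]


class GeigelDetector:
    """Geigel-style detector comparing the microphone level to recent far-end peaks."""

    def __init__(self, window: int = 256, threshold: float = 0.5) -> None:
        self.threshold = threshold
        # Magnitudes of the most recent far-end samples.
        self.recent_far: Deque[float] = deque(maxlen=window)

    def update(self, far_sample: float, mic_sample: float) -> bool:
        """Return True when double-talk is declared for this sample pair."""
        self.recent_far.append(abs(far_sample))
        return abs(mic_sample) > self.threshold * max(self.recent_far)


def should_adapt(detector: DoubleTalkDetector,
                 far_sample: float, mic_sample: float) -> bool:
    """Filter adaptation is suspended while double-talk is detected."""
    return not detector(far_sample, mic_sample)
```

Because the comparison uses only a fixed window of recent far-end samples, an echo that arrives later than the window is misread as near-end speech, which is consistent with the weakness noted above for variable or long echo paths.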
Once it is determined that echo cancellation should take place, the context again uses its buffered samples received through the “rx in” port. First, optional filtering on the “tx in” audio data may be performed. For instance, as a result of limitations in the conventional telephone network, telephone users are accustomed to the absence of frequencies in the transmitted speech below 300 Hz in voice communications. Such optional filtering (not shown in
The complexity of an implementation of the NLMS or NLMS-PW algorithm is generally proportional to the echo path delay, as previously mentioned. For a conventional application (e.g., a conventional telephone system), the echo path delay may only be a few milliseconds. For the server-based approach (e.g., system 100 illustrated in
The above detailed description is provided to illustrate the specific embodiments of the present invention and is not intended to be limiting. Many variations and modifications within the scope of the present invention are possible. The present invention is set forth in the following claims.
The present application is related to and claims priority of U.S. provisional patent application (‘Provisional Patent Application’), entitled “System And Method For Echo Reduction In Audio And Video Telecommunications Over A Network,” Ser. No. 61/420,248, filed on Dec. 6, 2010. The Provisional Patent Application is hereby incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
61420248 | Dec 2010 | US