An embodiment of the invention relates to automatic gain control techniques applied to an uplink speech signal within a communications device such as a smart phone or a cellular phone. Other embodiments are also described.
In the field of mobile communications using devices such as smart phones and cellular phones, there are many audio signal processing operations that can impact how well a far-end user hears a conversation with a mobile phone user. For instance, there is active noise cancellation, which is an operation that estimates or detects background noise, and then adds an appropriate anti-noise signal to an “uplink” speech signal of the near-end user, before transmitting the uplink signal to the far-end user's device during a call. This helps reduce the amount of the near-end user's background noise that might be heard by the far-end user.
Another problem that often appears during a call is that of acoustic echo. A downlink speech signal contains the far-end user's speech. This speech may be playing through either a loudspeaker (speakerphone mode) or an earpiece speaker of the near-end user's device, and is inadvertently picked up by the primary microphone. This may be due to acoustic leakage within the near-end user's device or, especially in speakerphone mode, it may be due to reverberations from external objects that are near the loudspeaker. An echo cancellation process takes samples of the far-end user's speech from the downlink signal and uses them to reduce the amount of the far-end user's speech that has been inadvertently picked up by the near-end user's microphone, thus reducing the likelihood that the far-end user will hear an echo of his own voice during the call.
Some users of a mobile phone tend to speak softly, whether intentionally or not, while others speak loudly. The dynamic range of the speech signal in a mobile device, however, is limited (for practical reasons). In addition, it is generally accepted that one would prefer a fairly steady volume during a conversation with another person. A process known as automatic gain control (AGC) will even out large amplitude variations in the uplink speech signal, by automatically reducing a gain that is applied to the speech signal if the signal is strong, and raising the gain when the signal is weak. In other words, AGC continuously adapts its gain to the strength of its input signal during a call. It may be used separately for both uplink and downlink signals.
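The gain adaptation described above can be illustrated with a minimal sketch. It assumes a frame-based AGC that moves its gain a fraction of the way toward the value that would bring the frame's RMS level to a target. The class name `SimpleAgc` and the parameters `target_rms`, `step`, `min_gain`, and `max_gain` are hypothetical and are not taken from this disclosure.

```python
import math


class SimpleAgc:
    """Illustrative frame-based AGC (hypothetical, not the disclosed design)."""

    def __init__(self, target_rms=0.1, step=0.1, min_gain=0.1, max_gain=10.0):
        self.target_rms = target_rms  # desired output level (assumed value)
        self.step = step              # fraction of the gain error corrected per frame
        self.min_gain = min_gain
        self.max_gain = max_gain
        self.gain = 1.0

    def process(self, frame):
        """Update the gain from the frame's RMS level, then apply it."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > 0.0:
            desired = self.target_rms / rms
            # Strong input -> desired gain below current gain -> gain is reduced.
            # Weak input -> desired gain above current gain -> gain is raised.
            self.gain += self.step * (desired - self.gain)
            self.gain = min(self.max_gain, max(self.min_gain, self.gain))
        return [self.gain * x for x in frame]


loud_agc = SimpleAgc()
loud_agc.process([0.5] * 160)    # strong frame: gain drops below 1.0

quiet_agc = SimpleAgc()
quiet_agc.process([0.01] * 160)  # weak frame: gain rises above 1.0
```

A practical AGC would typically use separate attack and release rates and a smoothed level estimate; the single `step` parameter here is a simplification.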
To further enhance acoustic experience for the far-end user, AGC of an uplink signal in the near-end user's device is controlled so that its gain is “frozen” during time intervals (also referred to as frames) where the near-end user is not speaking and there is apparent silence at the near-end user side of the conversation. Once speech resumes, a decision is made to unfreeze the AGC, thereby allowing it to resume its adaptation of the gain during a speech frame. This is done in order to avoid undesired gain changes or noise amplification during silence frames, which the far-end user might find strange as he hears strongly varying background noise levels during silence frames. A voice activity detector (VAD) circuit or algorithm is used to determine whether a given frame of the uplink signal is a speech frame or a non-speech (silence) frame, and then on that basis a decision is made as to whether the AGC gain updating for the uplink signal should be frozen or not.
In accordance with an embodiment of the invention, decisions on whether or not to freeze the AGC gain updating for the uplink signal are made based on the possibility of far-end user speech echo being present in the uplink signal. Thus, a method for performing a call between a near-end user and a far-end user may include the following operations (performed during the call by the near-end user's communications device). A downlink speech signal is received from the far-end user's communications device. An AGC process is performed to update a gain applied to an uplink speech signal, and the gain-updated uplink signal is transmitted to the far-end user's device. A frame in the downlink signal that contains speech is detected, and in response the updating of the gain during a frame in the uplink signal is frozen.
In a further aspect of the invention, the method continues with detecting a subsequent frame in the downlink signal that contains no speech; in response, the updating of the gain is unfrozen during a subsequent frame in the uplink signal.
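The freeze and unfreeze behavior summarized above may be sketched as follows. This is a simplified illustration only: the function name `run_agc_with_downlink_freeze`, the per-frame level inputs, and the gain update rule are assumptions, and no delay between the downlink detection and the uplink freeze is modeled.

```python
def run_agc_with_downlink_freeze(uplink_levels, downlink_has_speech,
                                 target=0.1, step=0.5):
    """Return the gain applied to each uplink frame.

    uplink_levels: per-frame RMS levels of the uplink signal (hypothetical input).
    downlink_has_speech: per-frame booleans from a downlink speech detector.
    """
    gain = 1.0
    gains = []
    for level, far_end_talking in zip(uplink_levels, downlink_has_speech):
        # Freeze gain updating whenever the downlink frame contains speech,
        # since the uplink frame may then contain far-end echo.
        frozen = far_end_talking
        if not frozen and level > 0.0:
            gain += step * (target / level - gain)
        gains.append(gain)
    return gains


# Near-end speech, then two frames of weak residual echo, then speech again.
levels = [0.2, 0.01, 0.01, 0.2]
flags = [False, True, True, False]
gains = run_agc_with_downlink_freeze(levels, flags)
```

In this sketch the gain holds its value across the two frames flagged as downlink speech, so the weak echo-only frames do not pull the gain upward, and adaptation resumes on the last frame.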
The above summary does not include an exhaustive list of all aspects of the present invention. It is contemplated that the invention includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations have particular advantages not specifically recited in the above summary.
The embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment of the invention in this disclosure are not necessarily to the same embodiment, and they mean at least one.
Several embodiments of the invention with reference to the appended drawings are now explained. While numerous details are set forth, it is understood that some embodiments of the invention may be practiced without these details. In other instances, well-known circuits, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Turning now to
The user-level functions of the device are implemented under control of an applications processor 4 that has been programmed in accordance with instructions (code and data) stored in memory 5, e.g. microelectronic, non-volatile random access memory. The processor and memory are generically used here to refer to any suitable combination of programmable data processing components and data storage that can implement the operations needed for the various functions of the device described here. An operating system may be stored in the memory 5, along with application programs to perform specific functions of the device (when they are being run or executed by the processor 4). In particular, there is a telephony application that (when launched, unsuspended, or brought to foreground) enables the near-end user to “dial” a telephone number or address of a communications device of the far-end user to initiate a call using, for instance, a cellular protocol, and then to “hang-up” the call when finished.
For wireless telephony, several options are available in the device depicted in
The applications processor 4, while running the telephony application program, may conduct the call by enabling the transfer of uplink and downlink digital audio signals (also referred to here as voice or speech signals) between the applications processor 4 or the baseband processor 20 on the network side, and any user-selected combination of acoustic transducers on the acoustic side. The downlink signal carries speech of the far-end user during a call, while the uplink signal contains speech of the near-end user that has been picked up by the primary microphone. The acoustic transducers include an earpiece speaker 12, a loudspeaker (speakerphone) 14, one or more microphones 16 including a primary microphone that is intended primarily to pick up the near-end user's speech, and a wired headset 18 with a built-in microphone. The analog-digital conversion interface between these acoustic transducers and the digital downlink and uplink signals is accomplished by an analog codec 9. The latter may also provide coding and decoding functions for preparing any data that is to be transmitted out of the device 2 through a connector 10, and data that is received into the device 2 through the connector 10. This may be a conventional docking connector, used to perform a docking function that synchronizes the user's personal data stored in the memory 5 with the user's personal data stored in memory of an external computing system, such as a desktop computer or a laptop computer.
Still referring to
The downlink signal path receives a downlink digital signal from either the baseband processor 20 or the applications processor 4 (originating as either a cellular network signal or a WLAN packet sequence) through the digital audio bus interface 30. The signal is buffered and is then subjected to various functions (also referred to here as a chain or sequence of functions), including some in downlink processing block 26 and perhaps others in downlink processing block 29. Each of these may be viewed as an audio signal processor. For instance, processing blocks 26, 29 may include one or more of the following: a side tone mixer, a noise suppressor, a voice equalizer, an automatic gain control unit, and a compressor or limiter. The downlink signal as a data stream or sequence is modified by each of these blocks, as it progresses through the signal path shown, until arriving at the digital audio bus interface 31, which transfers the data stream to the analog codec 9 (for playback through the speaker 12, 14, or headset 18).
The uplink signal path of the processor 21 passes through a chain of several audio signal processors, including uplink processing block 24, acoustic echo canceller (EC) 23, and automatic gain control (AGC) block 32. The uplink processing block 24 may include at least one of the following: an equalizer, a compander or expander, and another uplink signal enhancement or noise reduction function. After passing through the AGC block 32, the uplink data sequence is passed to the digital audio bus interface 30, which in turn transfers the data sequence to the baseband processor 20 for speech coding and channel coding, or to the applications processor 4 for Internet packetization (prior to being transmitted to the far-end user's device).
The signal processor 21 also includes a voice activity detector (VAD) 27. The VAD 27 has an input through which it obtains the downlink speech data sequence and then analyzes it, looking for time intervals or frames that contain speech (which is that of the far-end user during the call). For instance, the VAD 27 may classify each frame of the downlink sequence that it has analyzed as one that either has speech or does not have speech, i.e. a silence or pause segment of the far-end user's speech. The VAD 27 may provide, at its output, an identification of this time interval or frame together with its classification as speech or non-speech.
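As one illustration of the per-frame classification described above, a simple energy threshold may stand in for the VAD. The disclosure does not specify how the VAD reaches its decision, so the energy test, the function name `classify_frames`, and the values of `frame_len` and `energy_threshold` are all assumptions made for this sketch.

```python
def classify_frames(samples, frame_len=160, energy_threshold=1e-3):
    """Split samples into frames and label each as speech or non-speech.

    Returns a list of (frame_index, has_speech) pairs. A mean-energy
    threshold is an assumed stand-in for a real voice activity detector.
    """
    results = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        results.append((start // frame_len, energy > energy_threshold))
    return results


# One silent frame followed by one high-energy frame.
downlink = [0.0] * 160 + [0.5] * 160
labels = classify_frames(downlink)
```

Each returned pair corresponds to the identification of a frame together with its classification. Practical detectors usually combine several features (for example spectral shape and periodicity) and apply hangover logic, none of which is modeled here.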
Still referring to
In one embodiment, the decision to freeze (and then unfreeze) is made by a gain update controller 28. The controller 28 may receive from the VAD 27 an identification of a frame that has just been identified as a downlink speech frame. Next, following a predetermined time delay or frame delay in the uplink signal (in response to the indication from the VAD 27), the controller causes the gain updating of the AGC 32 to be frozen during the next incoming frame to the AGC 32. This is depicted in the diagram of
In one embodiment, the predetermined delay may be estimated or set in advance, by determining the elapsed time or equivalent number of frames, for sending a given downlink frame through the following path: starting with the VAD 27, then through the downlink signal processing block 29, then through the analog codec 9 and out of a speaker (e.g., earpiece speaker 12 or loudspeaker 14), then reverberating or leaking into the microphone 16, then through the uplink processing block 24, then through the echo canceller 23, and then arriving at the AGC block 32.
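The delay estimate described above amounts to summing the latency of each stage along the echo path and converting the total into a number of frames. The following sketch shows that arithmetic. The function name `delay_in_frames`, the 20 ms frame duration, and every latency value are hypothetical placeholders, not measurements from any device.

```python
import math


def delay_in_frames(stage_latencies_ms, frame_ms=20.0):
    """Convert the summed echo-path latency into a whole number of frames."""
    total_ms = sum(stage_latencies_ms.values())
    # Rounding up is a design choice made for this sketch; rounding to the
    # nearest frame would be another reasonable option.
    return math.ceil(total_ms / frame_ms)


# Hypothetical per-stage latencies in milliseconds, following the path
# from the VAD output to the AGC input.
echo_path_ms = {
    "downlink_processing": 5,
    "codec_and_speaker": 10,
    "acoustic_path": 3,
    "uplink_processing": 5,
    "echo_canceller": 10,
}
frames = delay_in_frames(echo_path_ms)
```

With these placeholder values the total is 33 ms, which rounds up to two 20 ms frames.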
If the VAD 27 indicates that it has detected a non-speech (NS) frame, then in response, and optionally after waiting out the predetermined time interval or frame delay in the uplink signal, the gain updating is unfrozen for the next incoming frame to the AGC block 32. The sequence in
While the block diagram of
The following additional process operations may be performed during the call:
waiting a predetermined delay (a given time interval or a given number of one or more frames) in response to detecting the frame in the downlink signal, before freezing the updating of the gain (the gain update controller 28 may be programmed at the factory with this delay or it may be dynamically updated during in-the-field use of the device 2);
detecting a subsequent frame in the downlink signal that contains no speech (e.g., by the VAD 27) and in response unfreezing the updating of the gain during a subsequent frame in the uplink signal (VAD 27 indicates the detection to the gain update controller 28 which then responds by allowing gain updates to be applied to the subsequent frame); and
waiting a predetermined delay in response to detecting the subsequent frame in the downlink signal, before unfreezing the updating of the gain (the gain update controller 28 may use the same delay as it used before it froze the gain updating).
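The delayed freeze and unfreeze operations listed above can be sketched with a short delay line. This is one possible arrangement, assuming the same delay is used for freezing and unfreezing and that one downlink decision arrives per uplink frame. The class name `GainUpdateController` and its `step` method are illustrative and are not an interface defined by this disclosure.

```python
from collections import deque


class GainUpdateController:
    """Delays downlink speech decisions before applying them to the AGC."""

    def __init__(self, delay_frames):
        # Pre-fill with "no speech" so that gain updating is allowed until
        # the first delayed decision emerges from the delay line.
        self._pending = deque([False] * delay_frames)

    def step(self, downlink_has_speech):
        """Feed the current downlink decision.

        Returns True if AGC gain updating should be frozen for the current
        uplink frame, or False if gain updating is allowed.
        """
        self._pending.append(downlink_has_speech)
        return self._pending.popleft()


controller = GainUpdateController(delay_frames=2)
vad_decisions = [True, True, False, False, False]
freeze_flags = [controller.step(v) for v in vad_decisions]
```

Here the two downlink speech frames cause a freeze two frames later, and the freeze is released two frames after the downlink returns to non-speech. A controller that updates its delay during in-the-field use would need to resize the delay line, which this sketch does not attempt.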
As explained above, an embodiment of the invention may be a machine-readable medium (such as microelectronic memory) having stored thereon instructions, which program one or more data processing components (generically referred to here as a “processor”) to perform the digital domain operations described above including filtering, mixing, adding, subtracting, comparisons, and decision making. In other embodiments, some of these operations might be performed in the analog domain, or by specific hardware components that contain hardwired logic (e.g., dedicated digital filter blocks). Those operations might alternatively be performed by any combination of programmed data processing components and fixed hardwired circuit components.
While certain embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that the invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. For example, although the block diagram of