The present invention relates generally to processing speech signals and, more specifically, to adjusting gain of speech signals for enhancing voice quality.
Cellular phones and networks employ speech codecs to reduce the data rate in order to make efficient use of the bandwidth resources in the radio interface. In a mobile-to-mobile call, the PCM (pulse code modulation) speech signal is first encoded into a lower-rate bit stream by the speech codec of mobile A, transmitted over the network, and then decoded back into a PCM signal in the speech codec of mobile B. Speech codecs are also used in Internet-based transmission in conjunction with IP (Internet Protocol) phones. As with cellular phones, the reduced data rate achieved by speech codecs allows for more throughput, that is, more telephone conversations, over a given transmission medium.
With the increased reliance on wireless communications, voice quality has become an important consideration in wireless systems. Various measures have been adopted over the years to improve voice quality including, for example, improving the speech codecs used in the networks and using tandem-free networks. Various signal processing techniques for enhancing voice quality are also well known and pervasive throughout the networks, e.g., acoustic echo control, noise compensation, noise reduction, and automatic gain control. As is well known, these techniques typically use some form of noise estimation and subsequent gain adjustment/modification to improve the speech signal quality. However, conventional gain modification arrangements are limited in accuracy and effectiveness.
For example,
More specifically,
Referring to
As shown for increments of approximately 10 or more (reference 211), a saturation effect occurs: the calculated index frequently exceeds the table length of 31, in which case it must be limited to 31. Moreover, the output signal may be limited by other mechanisms in the decoder. Consequently, saturation occurs and the output increment falls below 3.4 dB, as shown by reference 212.
The relationship between output levels and index increments in fixed codebook gain, as shown in
The shortcomings of prior arrangements for modifying gain in a speech signal are overcome according to the principles of the invention by changing the gain parameter in the speech signal in a variable and cyclical manner over time. In effect, the change in the amount of gain applied to the signal is dispersed over time, so that the output level of the signal can be changed gradually to better match actual signal conditions.
In one illustrative embodiment of the invention, the speech signal is encoded as a bit stream and the speech signal is transported in frames with each frame being further sub-divided into sub-frames. The gain parameter, e.g., fixed codebook gain, is modified in the speech signal in a variable and cyclical manner over a plurality of sub-frames so that gain is temporally dispersed over a plurality of sub-frames.
The gain dispersion technique according to the principles of the invention can be advantageously used in conjunction with voice quality enhancement functions including, but not limited to, noise compensation, noise reduction, acoustic echo control, and automatic gain control. According to the principles of the invention, enhancement of the speech signal can be accurately adapted to actual signal conditions because the quantization level of the gain can be set with a resolution fine enough to closely match the smallest perceivable differences in the speech signal, e.g., the smallest perceivable sound level (loudness) difference. By contrast, the prior art arrangements only allow for coarse quantization (adjustment), e.g., in steps of approximately 3.5 dB.
A more complete understanding of the present invention may be obtained from consideration of the following detailed description of the invention in conjunction with the drawing, with like elements referenced with like reference numerals, in which:
In previously filed U.S. patent application Ser. No. 10/449,288, which is incorporated by reference as if set forth fully herein, I recognized problems associated with prior voice quality enhancement techniques and developed an improved method based on direct processing of the bit stream in the network using a subset of decoded parameters from the speech signal to modify selected parameters in the speech signal. Referring now to
As shown, near-end noise is compensated by enhancing the far-end signal. More specifically, decoder 103 regenerates near-end PCM signal y by decoding near-end bit stream 100. In a conventional manner, noise estimator 104 derives near-end noise level NY from PCM signal y and provides it as input to noise compensation unit 102. Noise-adaptive gain control (NGC) computation unit 106 receives near-end noise level NY as input and computes gain GN based on it, and gain quantization unit 108 then quantizes gain GN in the steps provided by the fixed codebook gain table. In other words, gain GN is expressed as an index increment, denoted IINC, with respect to the fixed codebook gain table. Index computation unit 110 computes a new fixed codebook gain index, which is then used to modify far-end bit stream 101. Specifically, index computation unit 110 computes the modified fixed codebook gain index IM by adding the index increment IINC to the original fixed codebook gain index IO, which decoder 131 decodes and extracts from the far-end speech bit stream. The sum is further limited to the fixed codebook gain table length LT to ensure a valid table index, i.e., IM=min(IO+IINC, LT). The far-end bit stream 101 is then modified in bit stream processor 120 by replacing the original fixed codebook gain index of the far-end speech signal (bit stream 101) with the modified fixed codebook gain index IM, to generate modified far-end bit stream 105.
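The conventional index computation in unit 110 reduces to a single clamped addition. The following is a minimal Python sketch of IM=min(IO+IINC, LT); the function and parameter names are hypothetical, and LT = 31 is taken from the table length cited for the saturation effect. Only the upper limit stated in the text is applied; how a negative increment would be bounded is not addressed here.

```python
def modified_gain_index(i_o: int, i_inc: int, table_len: int = 31) -> int:
    """Hypothetical helper: I_M = min(I_O + I_INC, L_T).

    i_o       -- original fixed codebook gain index I_O from the far-end bit stream
    i_inc     -- index increment I_INC derived from the quantized gain G_N
    table_len -- table length L_T (31 in the saturation example)
    """
    # Clamp to the table length so the result is a valid table index.
    return min(i_o + i_inc, table_len)


if __name__ == "__main__":
    # A moderate increment passes through unchanged.
    print(modified_gain_index(10, 2))
    # A large increment saturates at the table limit.
    print(modified_gain_index(25, 10))
```

The second call illustrates why large increments saturate: every increment that pushes IO+IINC past LT produces the same index, so further gain is lost.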
As shown in
According to the principles of the invention, gain dispersion unit 312, responsive to the quantized gain increment (via the table index increment IINC) and the corresponding remainder R, generates a time-dispersed index increment ĨINC. More specifically, the time-dispersed index increment ĨINC is the sum of the base index increment IINC and a cyclical increment ΔINC, i.e., ĨINC=IINC+ΔINC. The cyclical increment ΔINC is typically a time-varying integer, either 1 or 0, determined by remainder R. Time-dispersed index increment ĨINC and its cyclical component ΔINC are described in further detail below; in general, the closer remainder R is to zero, the less frequently ΔINC takes on the value 1, and the closer remainder R is to one, the more frequently ΔINC takes on the value 1. Index computation unit 110 (as in
The far-end bit stream 101 is then modified in bit stream processor 120 by replacing the original fixed codebook gain index of the far-end speech signal (bit stream 101) with the modified fixed codebook gain index IM, to generate modified far-end bit stream 305.
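The split of the desired gain into a base index increment IINC and a remainder R can be sketched as follows. This sketch assumes, for illustration only, that every fixed codebook gain index step changes the output level by a uniform amount of about 3.5 dB (the step between the 7.0 dB and 10.5 dB levels used in the sub-frame example); actual gain tables and decoder saturation make this only approximate. The function name `quantize_gain` and the `step_db` parameter are hypothetical.

```python
import math


def quantize_gain(g_n_db: float, step_db: float = 3.5) -> tuple[int, float]:
    """Hypothetical helper: split desired gain G_N (dB) into (I_INC, R).

    ASSUMPTION: one gain-table index step changes the output level by a
    uniform step_db; this is an illustrative simplification.
    """
    steps = g_n_db / step_db        # desired gain in (fractional) index steps
    i_inc = math.floor(steps)       # base index increment I_INC
    remainder = steps - i_inc       # remainder R, in [0, 1)
    return i_inc, remainder


if __name__ == "__main__":
    print(quantize_gain(8.75))
```

With the default step, a desired gain of 8.75 dB gives IINC = 2 and R = 0.5, i.e., half-way between the 7.0 dB and 10.5 dB output levels; the remainder is what the gain dispersion unit uses to decide how often ΔINC = 1.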
Referring now to
Continuing with the example described in
For illustration purposes, consider sub-frames 460-463 as representing one cycle in which the fixed codebook gain is maintained at index increment 2 (corresponding output level 7.0 dB) for three (3) sub-frames and raised to index increment 3 (corresponding output level 10.5 dB) for one (1) sub-frame. Because the fixed codebook gain index is raised by one additional index in just one (1) of the four (4) sub-frames of the 20 ms frame, the far-end output level increases by approximately 0.9 dB, to 7.9 dB (from level 450 to 451). Suppose this pattern (1 sub-frame at gain index increment 3 and 3 sub-frames at gain index increment 2) is repeated for several frames, so that the far-end output level remains at 7.9 dB (level 451). At approximately 80 ms (the 5th frame), the pattern changes so that the gain index is kept at index increment 2 for 2 sub-frames and at index increment 3 for 2 sub-frames, raising the output level by another 0.9 dB to 8.8 dB (from level 451 to 452). This pattern continues for a few more cycles, maintaining the output level at 8.8 dB, until the pattern changes again to 3 sub-frames at gain index increment 3 and 1 sub-frame at gain index increment 2, and so on.
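As an arithmetic check on these figures, a simple average, in dB, of the per-sub-frame output levels over one four-sub-frame cycle reproduces the quoted values. This averaging model is an assumption introduced here only to show how the 0.9 dB steps arise; the text does not state how the output level is measured. With k of the four sub-frames at index increment 3 (10.5 dB) and the rest at index increment 2 (7.0 dB):

```latex
\begin{aligned}
\bar{L}(k) &= \frac{(4-k)\,L_{2} + k\,L_{3}}{4},
  \qquad L_{2} = 7.0~\mathrm{dB},\; L_{3} = 10.5~\mathrm{dB},\\
\bar{L}(1) &= \frac{3(7.0) + 10.5}{4} = 7.875 \approx 7.9~\mathrm{dB},\\
\bar{L}(2) &= \frac{2(7.0) + 2(10.5)}{4} = 8.75 \approx 8.8~\mathrm{dB},\\
\bar{L}(3) &= \frac{7.0 + 3(10.5)}{4} = 9.625 \approx 9.6~\mathrm{dB},\\
\bar{L}(k+1) - \bar{L}(k) &= \frac{L_{3} - L_{2}}{4} = 0.875 \approx 0.9~\mathrm{dB}.
\end{aligned}
```

Under this model, each additional raised sub-frame per cycle contributes one quarter of the 3.5 dB index step.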
For comparison purposes, consider that the pattern of the gain index increment in
Incrementing the fixed codebook gain index in a cyclical manner according to the principles of the invention effectively disperses the gain index increment over time. That is, by dispersing the increment in fixed codebook gain index over many sub-frames in a cyclical manner, a more gradual adjustment in the far-end output level can be achieved. In the example shown in
As shown, the routine starts at step 502. In step 504, the parameters COUNTER, CYCLE, and CYCLE_PERIOD are initialized. In this example, COUNTER and CYCLE are set to a value of 0, while CYCLE_PERIOD is set to a value of 4 continuing with the example shown in
In step 506, the index increment IINC and the remainder R are loaded as inputs to the algorithm. In step 508, the variable CYCLE is determined by taking COUNTER modulo CYCLE_PERIOD, i.e., the remainder after dividing COUNTER by CYCLE_PERIOD. For example, with CYCLE_PERIOD=4 and a COUNTER sequence of 0-1-2-3-4-5-6-7-8-9-..., the resulting CYCLE sequence is 0-1-2-3-0-1-2-3-0-1-.... In step 510, the variable CYCLE is compared with zero (0). If CYCLE equals zero, step 520 is performed next; otherwise, step 512 is performed next. In step 520, the remainder R, one of the input variables of the algorithm, is compared with 0.25. If remainder R is greater than 0.25, step 526 is performed next; otherwise, step 516 is performed next. In step 526, the time-dispersed index increment ĨINC is computed by adding one to the base increment IINC, while in step 516, the time-dispersed index increment ĨINC is set equal to the base increment IINC. Steps 512, 514, 522, and 524 are carried out similarly to steps 510 and 520 and, for the sake of brevity, are not described again in detail. After the time-dispersed index increment ĨINC has been set in either step 516 or step 526, the variable COUNTER is incremented by one in step 518. The next sub-frame is then processed by loading the new inputs in step 506.
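The flowchart can be sketched in Python as a small stateful object called once per sub-frame. Only the CYCLE = 0 branch (steps 510, 520, 516, 526) is spelled out above. The thresholds used here for the branches handled by steps 512/522 and 514/524 (R > 0.5 when CYCLE = 1, R > 0.75 when CYCLE = 2, and no extra increment when CYCLE = 3) are assumptions, chosen so that the number of raised sub-frames per cycle grows with R. The class and method names are hypothetical.

```python
class GainDisperser:
    """Sketch of the per-sub-frame flowchart (steps 504-526).

    ASSUMPTION: only the CYCLE == 0 test (R > 0.25) is given in the text;
    the CYCLE == 1 and CYCLE == 2 thresholds below are illustrative guesses
    for steps 512/522 and 514/524. CYCLE == 3 always uses the base increment.
    """

    # Hypothetical mapping: CYCLE value -> remainder threshold.
    THRESHOLDS = {0: 0.25, 1: 0.5, 2: 0.75}

    def __init__(self, cycle_period: int = 4) -> None:
        self.counter = 0                    # COUNTER, initialized in step 504
        self.cycle_period = cycle_period    # CYCLE_PERIOD, initialized in step 504

    def next_increment(self, i_inc: int, remainder: float) -> int:
        """Return the time-dispersed increment I~_INC = I_INC + delta_INC."""
        cycle = self.counter % self.cycle_period       # step 508
        threshold = self.THRESHOLDS.get(cycle)
        if threshold is not None and remainder > threshold:
            result = i_inc + 1                         # step 526 (delta_INC = 1)
        else:
            result = i_inc                             # step 516 (delta_INC = 0)
        self.counter += 1                              # step 518
        return result


if __name__ == "__main__":
    disperser = GainDisperser()
    print([disperser.next_increment(2, 0.6) for _ in range(8)])
```

With IINC = 2 and R = 0.6, eight consecutive calls yield 3, 3, 2, 2, 3, 3, 2, 2, i.e., two of every four sub-frames use increment 3. Keeping COUNTER inside the object mirrors the flowchart, where IINC and R may change from one sub-frame to the next while the cycle position carries over.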
To relate
In general, the foregoing embodiments are merely illustrative of the principles of the invention. Those skilled in the art will be able to devise numerous arrangements and modifications, which, although not explicitly shown or described herein, nevertheless embody those principles that are within the scope of the invention. For example, the invention was described in the context of certain illustrative embodiments. While various examples were also given for possible modifications or variations to the disclosed embodiments, it is contemplated that other modifications and arrangements will also be apparent to those skilled in the art in view of the teachings herein. Accordingly, the embodiments shown and described herein are only meant to be illustrative and not limiting in any manner.
Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. Thus, for example, it will be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures, including any functional blocks labeled as “processors”, may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the FIGS. are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
Software modules, or simply modules which are implied to be software, may be represented herein as any combination of flowchart elements or other elements indicating performance of process steps and/or textual description. Such modules may be executed by hardware that is expressly or implicitly shown.
In the claims hereof any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements which performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicant thus regards any means which can provide those functionalities as equivalent to those shown herein. Finally, the scope of the invention is limited only by the claims appended hereto.