The invention relates to electronic devices, and more particularly to speech encoding, transmission, storage, and decoding/synthesis methods and circuitry.
Commercial digital speech systems and telephony, including wireless and packetized networks, continually demand increased speech coding quality and compression. This has led to ITU-standardized methods such as G.729 and G.729 Annex A for encoding/decoding speech using a conjugate-structure algebraic code-excited linear-prediction (CS-ACELP) method. Further, standard G.729 Annex B provides additional compression for silence frames and is to be used with G.729 and G.729 Annex A. In particular, Annex B provides a voice activity detector (VAD), discontinuous transmission, and a comfort noise generator to reduce the transmission bit rate during silence periods, such as pauses during speaking.
G.729 and G.729 Annex A use 10 ms frames, and the Annex B VAD makes a voice activity decision every frame to decide the type of frame encoding; see
The present invention identifies a problem with G.729 Annex B SID LSF vector quantization.
Preferred embodiment encoding and decoding have advantages including fixes of the problem of G.729 Annex B SID LSF vector quantization.
The drawings are heuristic for clarity.
1. Overview
The preferred embodiment systems adjust functions of G.729 Annex B to overcome the SID LSF vector quantization overflow problem identified by the invention. In particular, for SID frames a rapid spectral change may cause the current frame LSF vector to diverge from the LSF predictor vectors derived from prior frames, so that the error LSF vector (the difference of the current and predictor vectors) is large and not close to any of the codebook (quantized) vectors. In this case the G.729 Annex B quantization routine fails, and essentially random codebook indices (which may fall outside of the codebook range) arise, which can lead to memory corruption. The following sections describe the pertinent Annex B code and the preferred embodiments' adjusted code.
The preferred embodiment systems may include digital signal processors (DSPs), general purpose programmable processors, application specific circuitry, or systems on a chip such as both a DSP and a RISC processor on the same chip, with the RISC processor as controller and the preferred embodiment encoding and decoding functions as stored programs. Codebooks would be stored in memory at both the encoder and decoder, and a stored program may be in an onboard or external ROM, flash EEPROM, or ferroelectric RAM for a DSP or programmable processor. Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms. The encoded speech can be packetized and transmitted over networks such as the Internet.
2. G.729 Annex B Problem
To explain the preferred embodiments, first consider the G.729 Annex B quantization module QsidLSF.c which includes the quantization functions lsfq_noise, Qnt_e, New_ML_search_1, and New_ML_search_2. Basically, Qnt_e employs a two-stage vector quantization with delayed decision quantization in which the first stage outputs a few (typically 4) candidate codebook (quantized) vectors to the second stage and the second stage performs a full quantization. Multiple (typically 2) moving average predictors are used to predict the current frame LSF vector, and the prediction error (“errlsf”) is the target vector for the quantization.
The lsfq_noise function takes as input the current (two-frame average) lsp vector plus the prior (four-frame) lsf vectors to generate predictors and output the quantized lsp vector plus codebook and predictor indices. In particular, lsfq_noise calls Qnt_e which, in turn, calls the two codebook search functions New_ML_search_1 and New_ML_search_2 for the two quantizations. As described below, an overflow problem arises in the search functions New_ML_search_1 and New_ML_search_2. Note that “lsf[ ]” is the current (two-frame) lsf vector; “freq_prev[ ][ ]” are the previous frames' lsf vectors used to make the predictors; “errlsf[ ]” is a one-dimensional array of the prediction errors (differences of lsf[ ] and the moving average predictors), so errlsf is the quantization target; and “sum[ ]” is a list of the distances between errlsf and the codebook quantized vectors, so the K minimal entries of sum[ ] should correspond to K quantization candidates.
The called quantization function Qnt_e( ) carries out this two-stage, delayed-decision quantization.
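The sketch below illustrates how the target vectors errlsf[ ] and the distance table sum[ ] described above are formed; the dimensions (M_LSF, J, MQ), the function name, and the plain C arithmetic are illustrative stand-ins for the Annex B fixed-point code, and the placement of this computation is a simplification rather than a verbatim excerpt of Qnt_e.

```c
#define M_LSF  10                 /* LSF vector dimension (illustrative)      */
#define J       2                 /* number of MA predictor modes             */
#define MQ     32                 /* first-stage codebook size (illustrative) */
#define MAX_16 ((short)0x7fff)    /* 16-bit saturation value                  */

/* Build the quantization targets and the distance table searched by the
 * select-candidate loops shown below.  errlsf[] holds the J prediction-error
 * vectors back to back; sum[] holds, for every (predictor p, codebook m)
 * pair, the distance of codebook vector m to target vector p.               */
static void build_targets(const short lsf[M_LSF],
                          const short pred[J][M_LSF],
                          const short cb1[MQ][M_LSF],
                          short errlsf[J * M_LSF],
                          short sum[J * MQ])
{
    for (int p = 0; p < J; p++)
        for (int i = 0; i < M_LSF; i++)
            errlsf[p * M_LSF + i] = (short)(lsf[i] - pred[p][i]);

    for (int p = 0; p < J; p++)
        for (int m = 0; m < MQ; m++) {
            long d = 0;
            for (int i = 0; i < M_LSF; i++) {
                long e = (long)errlsf[p * M_LSF + i] - cb1[m][i];
                d += e * e;            /* a weighted distance in the real code */
            }
            /* The Annex B fixed-point accumulation saturates, so a large
             * spectral mismatch drives every entry to MAX_16.               */
            sum[p * MQ + m] = (d > MAX_16) ? MAX_16 : (short)d;
        }
}
```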
The called New_ML_search_1 and New_ML_search_2 functions do the two stages of codebook searching. The searches essentially find K candidate (delayed decision) quantized vectors as the K codebook vectors closest to one of the J “errlsf” component vectors, where the “errlsf” component vectors are the J errors (differences) of the current lsf vector from the J moving average predictor vectors. The K output pairs min_indx_p[ ]=p and min_indx_m[ ]=m are the errlsf predictor mode and corresponding quantized vector codebook index, respectively, for the candidate quantized vectors.
The invention recognizes a problem in the foregoing select-candidate routines, which contain nested for loops of the following form.
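The loops are sketched here from the description that follows rather than quoted from the Annex B source; sub( ) is shown as a plain C stand-in for the saturating basic op, and K, J, MQ, and MAX_16 are illustrative definitions.

```c
#define K       4                 /* delayed-decision candidates (illustrative) */
#define J       2                 /* MA predictor modes                          */
#define MQ     32                 /* codebook vectors per stage                  */
#define MAX_16 ((short)0x7fff)    /* 16-bit saturation value                     */

/* Stand-in for the G.729 basic op sub(): saturating 16-bit subtraction. */
static short sub(short a, short b)
{
    long d = (long)a - (long)b;
    if (d >  32767) d =  32767;
    if (d < -32768) d = -32768;
    return (short)d;
}

/* Keep the K (p, m) pairs whose distances sum[p*MQ+m] are smallest,
 * sorted so that min[0] is the best candidate.                        */
static void select_candidates(const short sum[J * MQ],
                              short min_indx_p[K], short min_indx_m[K])
{
    short min[K];
    int p, m, q, k;

    for (q = 0; q < K; q++)
        min[q] = MAX_16;                     /* "no candidate yet"      */

    for (p = 0; p < J; p++) {
        for (m = 0; m < MQ; m++) {
            for (q = 0; q < K; q++) {
                if (sub(sum[p * MQ + m], min[q]) < 0) {
                    /* insert (p, m) as the q-th best candidate and push
                     * the poorer candidates down the list              */
                    for (k = K - 1; k > q; k--) {
                        min[k]        = min[k - 1];
                        min_indx_p[k] = min_indx_p[k - 1];
                        min_indx_m[k] = min_indx_m[k - 1];
                    }
                    min[q]        = sum[p * MQ + m];
                    min_indx_p[q] = (short)p;
                    min_indx_m[q] = (short)m;
                    break;
                }
            }
        }
    }
    /* If every sum[] entry has saturated to MAX_16, no comparison above
     * succeeds and min_indx_p[]/min_indx_m[] are never written: they keep
     * whatever values happen to be in memory.                            */
}
```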
These nested for loops perform the codebook searches with K the number of candidate codebook quantized vectors, J the number of predictor modes (variable p), MQ the number of quantized vectors in the codebook (variable m), and sum[p*MQ+m] the one-dimensional array of distances between the errlsf vector and the codebook quantized vectors. In particular, the invention recognizes the problem occurring when the errlsf component vectors are not near any of the codebook quantized vectors. In this case sum[p*MQ+m] for all p and m will equal MAX_16 (overflow), so the condition if(sub(sum[p*MQ+m], min[q])<0) is never true. With the if condition never true, the p and m values are never assigned and are essentially random. But such random values need not lie within the allowed ranges, and memory corruption arises. The preferred embodiments fix this problem.
3. First Preferred Embodiment
In particular, first preferred embodiments include default assignments of p and m in the select candidate searches:
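A minimal sketch of this adjustment, expressed as a change to the initialization in the select-candidate loop sketch above (the particular defaults, predictor mode 0 and codebook index q, are an assumption; the embodiment only requires values inside the allowed ranges):

```c
/* First preferred embodiment (sketch): give every candidate slot a legal
 * default before the search, so that an all-saturated distance table
 * still yields in-range predictor and codebook indices.                 */
for (q = 0; q < K; q++) {
    min[q]        = MAX_16;
    min_indx_p[q] = 0;            /* default predictor mode, in range    */
    min_indx_m[q] = (short)q;     /* default codebook index, in range    */
}
/* ... the nested p/m/q search loops then run unchanged ... */
```

The same initialization is applied in both New_ML_search_1 and New_ML_search_2.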
4. Second Preferred Embodiment Quantization
The second preferred embodiment is analogous to the first preferred embodiment but randomly picks (within the possible range) default values for the assignments of p and m in the overflow case for both search functions, as sketched below.
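A sketch in the same style; the C library rand( ) used here is only one way to pick the defaults, and any pseudo-random choice within the predictor and codebook ranges serves.

```c
#include <stdlib.h>   /* rand(); any in-range pseudo-random source works */

/* Second preferred embodiment (sketch): seed each candidate slot with a
 * randomly chosen, but always legal, predictor mode and codebook index
 * before the search in both New_ML_search_1 and New_ML_search_2.        */
for (q = 0; q < K; q++) {
    min[q]        = MAX_16;
    min_indx_p[q] = (short)(rand() % J);    /* random legal mode   */
    min_indx_m[q] = (short)(rand() % MQ);   /* random legal index  */
}
/* ... the nested p/m/q search loops then run unchanged ... */
```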
5. Third Preferred Embodiment Quantizations
Third preferred embodiments include an overflow flag to solve the overflow problem of G.729 Annex B; the overflow flag indicates an overflow in either the first or second quantization stage. Upon overflow the third preferred embodiments suppress the generation of the SID frame(s) and the encoder continues to produce the same output as before the overflow; this persists until the overflow condition ends. See
The Annex B module Dtx.c function Cod_cng computes DTX (discontinuous transmission), encodes SID (silence insertion descriptor) frames, and computes the CNG (comfort noise generator) excitation update.
The preferred embodiment's adjusted Cod_cng, with some needed definitions, is sketched below.
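In the sketch, the routine names and signatures are illustrative placeholders for the corresponding Annex B functions, not their actual interfaces; only the control-flow change of suppressing the SID frame on overflow is shown.

```c
/* Placeholders standing in for the corresponding Annex B routines;
 * the names and signatures here are illustrative only.               */
int  quantize_sid_lsf(void);         /* returns nonzero on LSF quantizer overflow */
void write_sid_frame(short *bits);   /* packs and emits the SID frame             */
void update_cng_excitation(void);    /* CNG excitation update                     */

/* Third preferred embodiment (sketch): on quantizer overflow the SID
 * frame is simply not generated, so the encoder keeps producing the
 * same output as before the overflow until the condition ends.  The
 * CNG excitation update still runs so the comfort-noise state stays
 * consistent at encoder and decoder.                                  */
void cod_cng_sketch(int sid_frame_due, short *bitstream)
{
    if (sid_frame_due) {
        if (!quantize_sid_lsf())
            write_sid_frame(bitstream);
        /* else: suppress this SID frame; the previously transmitted
         * SID parameters remain in effect at the decoder             */
    }
    update_cng_excitation();
}
```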
6. Fourth Preferred Embodiment
Fourth preferred embodiments also use an overflow indication. But rather than suppressing the SID frames at overflow as with the third preferred embodiments, the fourth preferred embodiments suppress only the spectral portion (the LSFs) of SID frames at overflow. The fourth preferred embodiments still produce the SID frame amplitude portion, and for such SID frames the spectral portion can be filled in with previous values (or, alternatively, computed some other way); see
The preferred embodiment adjustments to the quantization functions are sketched below.
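In the sketch, prev_q_lsf, quantize_lsf_sketch, and the dimension M_LSF are illustrative names rather than the Annex B identifiers; only the handling of the overflow case is shown.

```c
#define M_LSF 10                       /* LSF vector dimension (illustrative) */

/* Placeholder for the lsfq_noise-style quantization; returns nonzero
 * when the candidate search overflows.                                */
int quantize_lsf_sketch(const short lsf[M_LSF], short q_lsf[M_LSF]);

static short prev_q_lsf[M_LSF];        /* last successfully quantized LSFs */

/* Fourth preferred embodiment (sketch): the SID frame is still built and
 * its amplitude portion is produced as usual, but on quantizer overflow
 * the spectral portion is filled with the previous quantized LSF values
 * instead of the failed quantization result.                            */
void sid_spectrum_sketch(const short lsf[M_LSF], short q_lsf[M_LSF])
{
    if (quantize_lsf_sketch(lsf, q_lsf)) {
        for (int i = 0; i < M_LSF; i++)      /* overflow: reuse prior LSFs   */
            q_lsf[i] = prev_q_lsf[i];
    } else {
        for (int i = 0; i < M_LSF; i++)      /* success: remember for later  */
            prev_q_lsf[i] = q_lsf[i];
    }
}
```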
7. Fifth Preferred Embodiments
The fifth preferred embodiments also use an overflow indicator, together with memory to store the parameters of a prior SID frame and modifications of the quantization and search functions. With an overflow flagged, the encoder simply repeats the prior (stored) SID frame parameters for transmission, and the decoder updates essentially only through its filter interpolation. Partial listings of the preferred embodiment quantization and Cod_cng behavior, highlighting the changes, are sketched below.
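In the sketch, the sid_params_t layout and the helper name quantize_sid_params are illustrative assumptions; only the repeat-on-overflow behavior is shown.

```c
/* Illustrative container for the parameters of one SID frame: the
 * switched-predictor mode, the two LSF codebook indices, and the
 * gain index.                                                        */
typedef struct {
    short lsf_mode;      /* MA predictor mode index     */
    short lsf_idx1;      /* first-stage codebook index  */
    short lsf_idx2;      /* second-stage codebook index */
    short gain_idx;      /* SID gain (energy) index     */
} sid_params_t;

/* Placeholder for the adjusted quantization path; returns nonzero on
 * overflow.                                                           */
int quantize_sid_params(sid_params_t *out);

static sid_params_t prev_sid;          /* last SID frame actually sent */

/* Fifth preferred embodiment (sketch): on overflow the encoder simply
 * repeats the stored parameters of the prior SID frame; the decoder
 * then evolves only through its normal LSF interpolation.            */
void encode_sid_sketch(sid_params_t *out)
{
    sid_params_t cur;

    if (quantize_sid_params(&cur)) {
        *out = prev_sid;               /* repeat the stored SID frame   */
    } else {
        *out = cur;
        prev_sid = cur;                /* remember for a later overflow */
    }
}
```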
8. Further Preferred Embodiments
Further preferred embodiments adjust the G.729 Annex B functions to handle SID LSF vector quantization overflow by ignoring the LSF vector predictors and directly quantizing the current LSF vector. Indeed, the overflow problem occurs when the predictors differ significantly from the current vector, so ignoring the predictors during overflow should be an improvement.
Preferred embodiments provide two ways to implement the direct quantization of the current LSF vector. First, if the overflow arises at the first stage of quantization (that is, in the New_ML_search_1 function), then use the current LSF vector as the target vector for a two-stage vector quantization with 5 bits for the first stage and 4 bits for the second stage. Conversely, if the overflow does not arise until the second Annex B stage, then use the current LSF vector as the target vector of a one-stage 7-bit quantization.
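A sketch of the resulting selection; the codebook names and the search helper are illustrative placeholders, the bit allocations follow the description above, and the second stage is assumed to quantize the first-stage residual.

```c
#define M_LSF 10   /* LSF vector dimension (illustrative) */

/* Illustrative placeholder: returns the index of the codebook vector
 * closest to the target among 'size' entries.                        */
int search_codebook(const short target[M_LSF],
                    const short (*cb)[M_LSF], int size);

extern const short cb_direct_5bit[32][M_LSF];   /* 5-bit direct codebook   */
extern const short cb_direct_4bit[16][M_LSF];   /* 4-bit residual codebook */
extern const short cb_direct_7bit[128][M_LSF];  /* 7-bit direct codebook   */

/* Further preferred embodiments (sketch): on overflow, ignore the MA
 * predictors and quantize the current LSF vector directly.  Overflow
 * already at the first search stage selects a 5-bit plus 4-bit
 * two-stage direct quantization; overflow only at the second stage
 * selects a single-stage 7-bit direct quantization.                   */
void quantize_lsf_fallback(const short lsf[M_LSF],
                           int overflow_stage1, int overflow_stage2,
                           short *idx1, short *idx2)
{
    if (overflow_stage1) {
        short res[M_LSF];
        *idx1 = (short)search_codebook(lsf, cb_direct_5bit, 32);
        for (int i = 0; i < M_LSF; i++)          /* residual for stage two */
            res[i] = (short)(lsf[i] - cb_direct_5bit[*idx1][i]);
        *idx2 = (short)search_codebook(res, cb_direct_4bit, 16);
    } else if (overflow_stage2) {
        *idx1 = (short)search_codebook(lsf, cb_direct_7bit, 128);
        *idx2 = -1;                              /* second index unused    */
    }
    /* otherwise the normal predictive quantization result is kept */
}
```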
9. Modifications
The preferred embodiments may be modified in various ways while retaining the features of identifying the overflow problem and providing a fix for it.
For example, G.729 Annex B uses the same perceptual weighting function, Get_wegt(lsf, weighting), for both normal speech and noise, even though this G.729B function is designed only for voice. A new weighting function could be developed, drawing on research on noise perception, that is resistant to overflow and also improves signal reproduction quality.
This application claims priority from provisional application Ser. No. 60/350,274, filed Nov. 2, 2001. The following patent applications disclose related subject matter: Ser. No. 09/699,366, filed Oct. 31, 2000, now U.S. Pat. No. 6,807,525, and Ser. No. 09/871,779, filed Jun. 1, 2001, now U.S. Pat. No. 7,031,916. These referenced applications have a common assignee with the present application.
Number | Name | Date | Kind
---|---|---|---
5233660 | Chen | Aug 1993 | A
6381570 | Li et al. | Apr 2002 | B2
6697776 | Fayad et al. | Feb 2004 | B1
6711537 | Beaucoup | Mar 2004 | B1
6807525 | Li et al. | Oct 2004 | B1
7031916 | Li et al. | Apr 2006 | B2
Number | Date | Country
---|---|---
20030135363 A1 | Jul 2003 | US

Number | Date | Country
---|---|---
60350274 | Nov 2001 | US