The present invention relates to combined innovation codebook devices and corresponding methods for use in a Code-Excited Linear Prediction (CELP) coder and decoder.
The CELP model is widely used to encode sound signals, for example speech, at low bit rates. In CELP, the sound signal is modelled as an excitation processed through a time-varying synthesis filter. Although the time-varying synthesis filter may take many forms, a linear recursive all-pole filter is often used. The inverse of this time-varying synthesis filter, which is thus a linear all-zero non-recursive filter, is called the “Short-Term Prediction” (STP) filter since it comprises coefficients calculated in such a manner as to minimize a prediction error between a sample s[i] of the sound signal and a weighted sum of previous samples s[i-1], s[i-2], ..., s[i-m] of the sound signal, where m is the order of the filter. Another name frequently used for the STP filter is the “Linear Prediction” (LP) filter.
If a residual of the prediction error from the LP filter is applied as the input of the time-varying synthesis filter with proper initial state, the output of the synthesis filter is the original sound signal, such as speech. At low bit rates, it is not possible to transmit an exact prediction error residual. Accordingly, the prediction error residual is encoded to form an approximation referred to as the excitation. In traditional CELP coders, the excitation is encoded as the sum of two contributions; the first contribution is produced from a so-called adaptive codebook and the second contribution is produced from a so-called innovation or fixed codebook. The adaptive codebook is essentially a block of samples from the past excitation with proper gain. The innovation or fixed codebook is populated with codevectors having the task of encoding the prediction error residual from the LP filter and adaptive codebook.
The innovation or fixed codebook can be designed using many structures and constraints. However, in modern speech coding systems, the Algebraic Code-Excited Linear Prediction (ACELP) model is often used. ACELP is well known to those of ordinary skill in the art of speech coding and, accordingly, will not be described in detail in the present specification. In summary, the codevectors in an ACELP innovation codebook each contain few non-zero pulses which can be seen as belonging to different interleaved tracks of pulse positions. The number of tracks and non-zero pulses per track usually depend on the bit rate of the ACELP innovation codebook. The task of an ACELP coder is to search the pulse positions and signs to minimize an error criterion. In ACELP, this search is performed using an analysis-by-synthesis procedure in which the error criterion is calculated not in the excitation domain but rather in the synthesis domain, i.e. after a given ACELP codevector has been filtered through the time-varying synthesis filter. Efficient ACELP search algorithms have been proposed to allow fast search even with very large ACELP innovation codebooks.
Although very efficient for encoding speech at low bit rates, ACELP codebooks may not gain in quality as quickly as other approaches, such as transform coding and vector quantization, when the ACELP codebook size is increased. When measured in dB/bit/sample, the gain obtained at higher bit rates (e.g. bit rates higher than 16 kbit/s) by using more non-zero pulses per track in an ACELP innovation codebook is not as large as the gain (in dB/bit/sample) of transform coding and vector quantization. This can be seen when considering that ACELP essentially encodes the sound signal as a sum of delayed and scaled impulse responses of the synthesis filter. At lower bit rates (e.g. bit rates lower than 12 kbit/s), the ACELP technique quickly captures the essential components of the excitation. At higher bit rates, however, higher granularity and, in particular, better control over how the additional bits are spent across the different frequency components of the signal are useful.
Therefore, there is a need for an innovation codebook structure better adapted for use at higher bit rates.
In the appended drawings:
According to non-limitative exemplary aspects, the present disclosure relates to:
a combined innovation codebook coding method, comprising: pre-quantizing a first, adaptive-codebook excitation residual, the pre-quantizing being performed in transform-domain; and searching a CELP innovation-codebook in response to a second excitation residual produced from the first, adaptive-codebook excitation residual;
a combined innovation codebook decoding method comprising: de-quantizing pre-quantized coding parameters into a first innovation excitation contribution, wherein de-quantizing the pre-quantized coding parameters comprises calculating an inverse transform of the coding parameters; and applying CELP innovation-codebook parameters to a CELP innovation-codebook structure to produce a second innovation excitation contribution;
a combined innovation codebook coding device, comprising: a pre-quantizer of a first, adaptive-codebook excitation residual, the pre-quantizer operating in transform-domain; and a CELP innovation-codebook module responsive to a second excitation residual produced from the first, adaptive-codebook excitation residual;
a CELP coder comprising the above-mentioned combined innovation codebook coding device;
a combined innovation codebook comprising: a de-quantizer of pre-quantized coding parameters into a first innovation excitation contribution, the de-quantizer comprising an inverse transform calculator responsive to the coding parameters; and a CELP innovation-codebook structure responsive to CELP innovation-codebook parameters to produce a second innovation excitation contribution; and
a CELP decoder comprising the above described combined innovation codebook.
The foregoing and other features of the combined innovation codebook devices and corresponding methods will become more apparent upon reading of the following non-restrictive description of illustrative embodiments thereof, given by way of example only with reference to the accompanying drawings.
Referring to the decoder 200 of
More specifically,
Prior to further describing the decoder 200 of
Linear Prediction Filtering
Referring to
where $a_i$ represent the linear prediction coefficients (LP coefficients) with $a_0 = 1$, and $M$ is the number of linear prediction coefficients (order of LP analysis). The LP coefficients $a_i$ are determined in an LP analyzer (not shown) of the ACELP coder 300.
The LP filter 301 produces at its output a LP residual 303.
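The LP filtering step can be illustrated with a minimal sketch. It assumes the conventional all-zero form A(z) = a_0 + a_1 z^-1 + ... + a_M z^-M with a_0 = 1 and a zero initial filter state; the function name `lp_residual` and the first-order coefficients are hypothetical and chosen only so that the result is easy to check by hand.

```python
def lp_residual(s, a):
    """Filter the signal s through A(z) = sum_{i=0}^{M} a[i] z^-i.

    Assumes a[0] == 1 and a zero initial state (samples before s[0] are 0).
    """
    M = len(a) - 1
    r = []
    for n in range(len(s)):
        acc = 0.0
        for i in range(M + 1):
            if n - i >= 0:  # ignore samples before the start of the signal
                acc += a[i] * s[n - i]
        r.append(acc)
    return r


# Hypothetical first-order example: s[n] = 0.5 * s[n-1] is perfectly
# predicted by a = [1, -0.5], so only the first residual sample is non-zero.
s = [1.0, 0.5, 0.25, 0.125]
a = [1.0, -0.5]
residual = lp_residual(s, a)
```

In this toy case the residual carries all of its energy in the first sample, which is the behaviour expected when the predictor matches the signal exactly.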
Adaptive-Codebook Search
The LP residual signal 303 from the LP filter 301 is used in an adaptive-codebook search module 304 of the ACELP coder 300 to find an adaptive-codebook contribution 305. The adaptive-codebook search module 304 also produces the pitch parameters 320 transmitted to the decoder 200 (
The ACELP coder 300 also comprises a combined innovation codebook coding device including a first coding stage 306 operating in the transform-domain and referred to as pre-quantizer, and a second coding stage 307 operating in the time-domain and using, for example, ACELP. As illustrated in
As described hereinabove, the pre-quantizer 306 may use, for example, a DCT as a frequency representation of the sound signal and an Algebraic Vector Quantizer (AVQ) to quantize and encode the frequency-domain coefficients of the DCT. The pre-quantizer 306 is used more as a pre-conditioning stage than as a first-stage quantizer, especially at lower bit rates. More specifically, using the pre-quantizer 306, the ACELP innovation-codebook search module 311 (second coding stage 307) is applied to a second excitation residual 312 (
Production of the Pitch Residual Signal 313
The ACELP coder 300 comprises a subtractor 314 for subtracting the adaptive-codebook contribution 305 from the LP residual signal 303 to produce the above-mentioned first, adaptive-codebook excitation residual 313 that is inputted to the pre-quantizer 306. The adaptive-codebook excitation residual $r_1[n]$ is given by
$$r_1[n] = r[n] - g_p\,v[n]$$
where $r[n]$ is the LP residual, $g_p$ is the adaptive-codebook gain, and $v[n]$ is the adaptive-codebook excitation (usually interpolated past excitation).
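The subtraction performed by the subtractor 314 can be sketched as follows. The function name `adaptive_codebook_residual` and the sample values are hypothetical; the sketch only restates the relation between the LP residual, the adaptive-codebook gain and the adaptive-codebook excitation for one subframe.

```python
def adaptive_codebook_residual(r, v, g_p):
    """Return r1[n] = r[n] - g_p * v[n] for one subframe."""
    if len(r) != len(v):
        raise ValueError("r and v must cover the same subframe")
    return [r_n - g_p * v_n for r_n, v_n in zip(r, v)]


# Hypothetical 4-sample subframe.
r = [1.0, -2.0, 0.5, 0.0]   # LP residual
v = [2.0, -2.0, 1.0, 4.0]   # adaptive-codebook excitation
r1 = adaptive_codebook_residual(r, v, g_p=0.5)
```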
Pre-Quantizing
Operation of the pre-quantizer 306 will now be described with reference to
Pre-Emphasis Filtering
In a given subframe aligned with the subframe of the ACELP innovation-codebook search in the second coding stage 307, the first, adaptive-codebook excitation residual 313 (
$$F(z) = \frac{1}{1 - \alpha z^{-1}}$$
which corresponds to the difference equation
$$y[n] = x[n] + \alpha\,y[n-1]$$
where x[n] is the first, adaptive-codebook excitation residual 313 inputted to the pre-emphasis filter F(z) 308, y[n] is the pre-emphasized, first adaptive-codebook excitation residual, and coefficient α controls the level of pre-emphasis. In this non-limitative example, if the value of α is set between 0 and 1, the pre-emphasis filter F(z) 308 has a larger gain at lower frequencies and a smaller gain at higher frequencies, which produces a pre-emphasized, first adaptive-codebook excitation residual y[n] with amplified lower frequencies. In other words, the pre-emphasis filter F(z) 308 applies a spectral tilt to the first, adaptive-codebook excitation residual 313 to enhance the lower frequencies of this residual.
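As an illustration of this spectral tilt, the following sketch implements the difference equation of the pre-emphasis filter and evaluates the magnitude of F(z) on the unit circle at zero frequency and at the Nyquist frequency. The value α = 0.5, the zero initial state and the helper names `pre_emphasis` and `gain` are assumptions made for the example only.

```python
import cmath
import math


def pre_emphasis(x, alpha, y_prev=0.0):
    """Apply y[n] = x[n] + alpha * y[n-1] (zero initial state by default)."""
    y = []
    for x_n in x:
        y_prev = x_n + alpha * y_prev
        y.append(y_prev)
    return y


def gain(alpha, omega):
    """Magnitude of F(z) = 1 / (1 - alpha * z^-1) at z = exp(j * omega)."""
    return 1.0 / abs(1.0 - alpha * cmath.exp(-1j * omega))


alpha = 0.5  # hypothetical value between 0 and 1
impulse_response = pre_emphasis([1.0, 0.0, 0.0, 0.0], alpha)
dc_gain = gain(alpha, 0.0)           # 1 / (1 - alpha)
nyquist_gain = gain(alpha, math.pi)  # 1 / (1 + alpha)
```

With α between 0 and 1, the gain at zero frequency, 1/(1 − α), always exceeds the gain at the Nyquist frequency, 1/(1 + α), which is the low-frequency enhancement described above.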
DCT Calculation
A calculator 309 applies, for example, a DCT to the pre-emphasized first, adaptive-codebook excitation residual y[n] from the pre-emphasis filter F(z) 308 using, for example, a rectangular non-overlapping window. In this non-limitative example, DCT-II is used, which is defined as
Algebraic Vector Quantizing (AVQ)
A quantizer, for example the AVQ 310, quantizes and codes the frequency-domain DCT coefficients Y[k] (DCT-transformed, pre-emphasized first adaptive-codebook excitation residual) from the calculator 309. An example of AVQ implementation can be found in U.S. Pat. No. 7,106,228. The quantized and coded frequency-domain DCT coefficients 315 from the AVQ 310 are transmitted as pre-quantized parameters to the decoder (
Depending on the bit rate, a target signal-to-noise ratio (SNR) for the AVQ 310 (AVQ_SNR (
Producing Excitation Residual Signal 312
Inverse DCT Calculation
To obtain the excitation residual signal 312 for the second coding stage 307 (ACELP innovation-codebook search in this example; other CELP structures could also be used), the AVQ-quantized DCT coefficients 315 from the AVQ 310 are inverse DCT transformed in calculator 316.
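The forward transform of the calculator 309 and the inverse transform of the calculator 316 can be sketched as a DCT-II pair applied to one subframe with a rectangular, non-overlapping window. The orthonormal scaling used here is an assumption, since other normalizations of the DCT-II are equally valid, and the names `dct_ii` and `idct_ii` are hypothetical.

```python
import math


def _scale(k, N):
    """Orthonormal DCT-II scaling factor (assumed normalization)."""
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def dct_ii(y):
    """Y[k] = c(k) * sum_n y[n] * cos(pi * (n + 1/2) * k / N)."""
    N = len(y)
    return [
        _scale(k, N)
        * sum(y[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        for k in range(N)
    ]


def idct_ii(Y):
    """y[n] = sum_k c(k) * Y[k] * cos(pi * (n + 1/2) * k / N)."""
    N = len(Y)
    return [
        sum(_scale(k, N) * Y[k] * math.cos(math.pi * (n + 0.5) * k / N)
            for k in range(N))
        for n in range(N)
    ]


y = [1.0, -0.5, 0.25, 2.0]  # hypothetical pre-emphasized subframe
Y = dct_ii(y)
y_back = idct_ii(Y)         # equals y up to rounding when nothing is quantized
```

With this scaling the transform preserves energy, so a quantization error introduced on the coefficients has the same energy once transformed back to the time domain.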
De-Emphasis Filtering
Then the inverse-DCT-transformed coefficients 315 are processed through a de-emphasis filter 1/F(z) 317 to obtain a time-domain contribution 318 from the pre-quantizer 306. The de-emphasis filter 1/F(z) 317 has the inverse transfer function of the pre-emphasis filter F(z) 308. In the non-limitative example for the pre-emphasis filter F(z) 308 given hereinabove, the difference equation of the de-emphasis filter $1/F(z) = 1 - \alpha z^{-1}$ is given by:
$$y[n] = x[n] - \alpha\,x[n-1]$$
where, in the case of the de-emphasis filter, x[n] is the pre-emphasized quantized excitation residual (from calculator 316), y[n] is the de-emphasized quantized excitation residual (time-domain contribution 318), and coefficient α has been defined hereinabove.
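The inverse relationship between the two filters can be checked with a short sketch. It assumes α = 0.5 and zero initial states for both filters within the subframe; the function names are hypothetical.

```python
def pre_emphasis(x, alpha):
    """F(z): y[n] = x[n] + alpha * y[n-1], zero initial state."""
    y, y_prev = [], 0.0
    for x_n in x:
        y_prev = x_n + alpha * y_prev
        y.append(y_prev)
    return y


def de_emphasis(x, alpha):
    """1/F(z): y[n] = x[n] - alpha * x[n-1], zero initial state."""
    y, x_prev = [], 0.0
    for x_n in x:
        y.append(x_n - alpha * x_prev)
        x_prev = x_n
    return y


alpha = 0.5                      # hypothetical value
signal = [1.0, -0.5, 0.25, 2.0]  # hypothetical subframe
emphasized = pre_emphasis(signal, alpha)
restored = de_emphasis(emphasized, alpha)
```

Without quantization between the two filters, `restored` equals `signal`; in the coder, the difference between them is the quantization error of the pre-quantizer as seen in the time domain.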
Subtraction to Produce the Second Excitation Residual
Finally, a subtractor 319 subtracts the de-emphasized excitation residual y[n] (time-domain contribution 318) from the first, adaptive-codebook excitation residual 313 obtained following the adaptive-codebook search in the current subframe to yield the second excitation residual 312.
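The complete pre-quantizer chain can be sketched end to end under several stated assumptions. First, a uniform scalar quantizer stands in for the AVQ 310, which is not reproduced here. Second, the DCT-II uses orthonormal scaling. Third, the second excitation residual is taken as what remains of the first, adaptive-codebook excitation residual once the time-domain contribution 318 is removed, which is what the summation of contributions at the decoder implies. All function names, α = 0.5 and the step size are hypothetical.

```python
import math


def pre_emphasis(x, alpha):
    y, prev = [], 0.0
    for x_n in x:
        prev = x_n + alpha * prev       # y[n] = x[n] + alpha * y[n-1]
        y.append(prev)
    return y


def de_emphasis(x, alpha):
    y, prev = [], 0.0
    for x_n in x:
        y.append(x_n - alpha * prev)    # y[n] = x[n] - alpha * x[n-1]
        prev = x_n
    return y


def _basis(N, k, n):
    # Orthonormal DCT-II basis function (assumed normalization).
    return math.sqrt((1.0 if k == 0 else 2.0) / N) * math.cos(
        math.pi * (n + 0.5) * k / N)


def dct_ii(y):
    N = len(y)
    return [sum(y[n] * _basis(N, k, n) for n in range(N)) for k in range(N)]


def idct_ii(Y):
    N = len(Y)
    return [sum(Y[k] * _basis(N, k, n) for k in range(N)) for n in range(N)]


def pre_quantize(r1, alpha, step):
    """Return (quantized coefficients, time-domain contribution, second residual)."""
    Y = dct_ii(pre_emphasis(r1, alpha))
    Y_q = [step * round(c / step) for c in Y]        # stand-in for the AVQ
    contribution = de_emphasis(idct_ii(Y_q), alpha)  # time-domain contribution
    r2 = [a - b for a, b in zip(r1, contribution)]   # left for the second stage
    return Y_q, contribution, r2


r1 = [1.0, -0.5, 0.25, 2.0, -1.0, 0.75, 0.0, -0.25]  # hypothetical subframe
Y_q, contribution, r2 = pre_quantize(r1, alpha=0.5, step=0.25)
energy_r1 = sum(u * u for u in r1)
energy_r2 = sum(u * u for u in r2)
```

In this sketch a finer step leaves less energy in `r2` for the second coding stage, while a coarser step leaves more, which loosely mirrors the trade-off between the two stages as the bit allocation changes.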
ACELP Innovation-Codebook Search
The second excitation residual 312 is encoded by the ACELP innovation-codebook search module 311 in the second coding stage 307. Innovation-codebook search in an ACELP coder is believed to be otherwise well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification. The ACELP innovation-codebook parameters 333 at the output of the ACELP innovation-codebook search calculator 311 are transmitted as ACELP parameters to the decoder (
Operation of the Combined Innovation Codebook 201
Referring back to the decoder 200 of
AVQ Decoding
First of all, the transform-domain decoder 204, AVQ in this example, receives decoded pre-quantized coding parameters, for example formed by the AVQ-quantized DCT coefficients 315 (which may include the AVQ global gain) from the AVQ 310 of
Inverse DCT Calculating
The inverse DCT calculator 204 then applies an inverse transform, for example the inverse DCT, to the de-quantized and scaled parameters Y′[k] from the AVQ decoder. Inverse DCT-II is used in this non-limitative example, defined as
De-Emphasis Filtering (1/F(z))
The AVQ-decoded and inverse DCT-transformed parameters y′[n] from the decoder/calculator 204 are then processed through the de-emphasis filter 1/F(z) 205 to produce a first stage innovation excitation contribution 208 from the de-quantizer 202.
ACELP Parameters Decoding
Coding in the ACELP innovation-codebook search calculator 311 of
Addition of Excitation Contributions
Finally, the decoder 200 comprises an adder 210 to sum the adaptive codebook contribution 113, the excitation contribution 208 from the de-quantizer 202 and the ACELP innovation-codebook excitation contribution 209 to form a total excitation signal 211.
Synthesis Filtering
The excitation signal 211 is processed through an LP synthesis filter 212 to recover the sound signal 213.
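The last two decoder steps, the addition of the excitation contributions and the synthesis filtering, can be sketched as follows. The sketch assumes that each contribution has already been scaled by its gain, that the synthesis filter is the all-pole inverse 1/A(z) of an LP filter A(z) with a_0 = 1, and that the filter starts from a zero state. The function names and the sample values are hypothetical.

```python
def total_excitation(adaptive, pre_quantized, innovation):
    """Sum the three excitation contributions sample by sample."""
    return [a + q + c for a, q, c in zip(adaptive, pre_quantized, innovation)]


def lp_synthesis(e, a):
    """All-pole filter 1/A(z): s[n] = e[n] - sum_{i=1}^{M} a[i] * s[n-i].

    Assumes a[0] == 1 and a zero initial state.
    """
    s = []
    for n in range(len(e)):
        acc = e[n]
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * s[n - i]
        s.append(acc)
    return s


# Hypothetical contributions for a 4-sample subframe.
excitation = total_excitation(
    [0.5, 0.0, 0.0, 0.0],    # adaptive-codebook contribution
    [0.25, 0.0, 0.0, 0.0],   # contribution from the de-quantizer
    [0.25, 0.0, 0.0, 0.0],   # ACELP innovation-codebook contribution
)
sound = lp_synthesis(excitation, [1.0, -0.5])
```

Here the summed excitation is a unit impulse, so `sound` is simply the impulse response of the hypothetical first-order synthesis filter.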
Referring to
However, the higher the bit rate, the more bits are used, in proportion, by the pre-quantizer 306 in the first coding stage, which results in a total coding noise being shaped more and more to follow the spectral envelope of the weighted LP filter.
Although the present invention has been described in the foregoing description in relation to illustrative embodiments thereof, these embodiments can be modified at will within the scope of the appended claims without departing from the scope and nature of the present invention.
The present application claims priority to U.S. Provisional Application Ser. No. 61/324,191, entitled “Flexible And Scalable Combined Innovation Codebook For Use In CELP Coder And Decoder”, filed on Apr. 14, 2010. The specification of the above-identified application is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
6134518 | Cohen et al. | Oct 2000 | A |
6662154 | Mittal et al. | Dec 2003 | B2 |
7106228 | Bessette et al. | Sep 2006 | B2 |
7430329 | Sarna | Sep 2008 | B1 |
7996233 | Oshikiri | Aug 2011 | B2 |
8306827 | Yamanashi et al. | Nov 2012 | B2 |
8515767 | Reznik | Aug 2013 | B2 |
8892448 | Vos et al. | Nov 2014 | B2 |
8892449 | Lecomte et al. | Nov 2014 | B2 |
20020103638 | Gao | Aug 2002 | A1 |
20030097258 | Thyssen | May 2003 | A1 |
20050096903 | Mittal et al. | May 2005 | A1 |
20050240398 | Chen et al. | Oct 2005 | A1 |
20060247926 | Rousseau | Nov 2006 | A1 |
20060271356 | Vos | Nov 2006 | A1 |
20080120118 | Choo et al. | May 2008 | A1 |
20080126085 | Morii | May 2008 | A1 |
20090182558 | Su et al. | Jul 2009 | A1 |
20090240491 | Reznik | Sep 2009 | A1 |
20100017198 | Yamanashi et al. | Jan 2010 | A1 |
20100324917 | Shlomot et al. | Dec 2010 | A1 |
20100332221 | Yamanashi et al. | Dec 2010 | A1 |
20110010169 | Vasilache et al. | Jan 2011 | A1 |
Number | Date | Country |
---|---|---|
2001 255 422 | Dec 2001 | AU |
2 347 735 | May 2000 | CA |
0 665 530 | Aug 1995 | EP |
2 292 466 | Oct 2009 | FR |
2 223 555 | May 2003 | RU |
2009059333 | May 2009 | WO |
WO 2009113316 | Sep 2009 | WO |
Entry |
---|
Yang et al., “Transform-Based CELP Vocoders with Low-Delay Low-Complexity and Variable-Rate Features”, IEICE Trans. Inf. & Syst., vol. E85-D, No. 6, Jun. 2002, pp. 1003-1014. |
Bessette et al., “Proposed CE for extending the LPD Mode in USAC”, 94th MPEG Meeting; Oct. 11, 2010 - Oct. 15, 2010; Guangzhou; Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11, No. M18481, Oct. 8, 2010, XP030047071, 4 pages. |
Number | Date | Country |
---|---|---|
20120089389 A1 | Apr 2012 | US |
Number | Date | Country |
---|---|---|
61324191 | Apr 2010 | US |