The present invention relates to a method for coding or decoding speech signal sampled values.
In the standard for coding audiovisual objects according to MPEG-4, in ISO/IEC 14496-3 FCD, Subpart 2, parametric coders are specified, in particular the HVXC (Harmonic Vector Excitation Coding) coder, for coding speech at extremely low bitrates. In order to generate the LPC coefficients, the spectral envelopes of the speech signal, and the unvoiced segments, this standard contains a plurality of tables that are present in floating-point format.
In Subpart 3 of this standard, the CELP (Code Excited Linear Prediction) coder for coding speech at medium to low bitrates is described. For generating the LPC coefficients and the gain values, this standard contains a plurality of tables that are present in floating-point format.
For coding such speech signals, the method of “analysis through synthesis” is often used (ANT Nachrichtentechnische Berichte, Heft 5, November 1988, pages 93 to 105). In the mentioned speech coding methods, values are stored in code books, i.e., in the tables, the values being used for the generation of the signal parameters and thus for the coefficients of the speech synthesis filter. The values stored in the code books are read out via an index control unit.
Through quantization, the values in the code books are limited in their precision so that the code book entries can be represented with a finite word length. In this way, they can be transferred to digital signal processors with integer arithmetic without violating the quality requirements prescribed by standards, in particular ISO/IEC 14496-3. In contrast to the present invention, in the mentioned working versions of the standards the values for the code books are present in unquantized form, in floating-point format, and can be processed directly only using very expensive and memory-intensive methods. Despite the limited precision of the table values, the present invention is to achieve an equal subjective quality after the speech decoding. Using the measures of the present invention, a simple, standards-conforming transfer of the code to various computing platforms is possible without influencing the subjective quality of the coder. Since reduced word lengths are used, a considerable saving of memory capacity, in particular in the form of ROMs, is possible. The present invention can be used with various speech signal coding methods, for example for HVXC coders/decoders or CELP coders/decoders.
Before discussing the actual quantization, a speech decoder is first presented in which the inventive quantization is used.
In the HVXC speech decoder according to
Parallel to this calculation, and as a function of the voiced/unvoiced decision, either the vectors for the spectral envelope (voiced frame, AM code books 9 (CbAm) and 10 (CbAm4)) or the vectors for the stochastic excitation signal (unvoiced frame, CELP code books 11 (CbCelp) and 12 (CbCelp4)) are read. The spectral envelopes and the excitation signal are regenerated using the inverse vector quantizers 13 and 14. After the harmonic synthesis for voiced frames (module 15), the speech data are filtered in the LPC synthesis filter. The output data from the voiced synthesis filter (module 7) and from the unvoiced synthesis filter (module 8) are subsequently added, yielding the reconstructed speech signal for a frame of 20 ms.
As explained above, code book values in floating-point form are not suitable for fixed-point DSPs, because the required word lengths would be too large (memory requirement, internal word lengths and arithmetic, ROM). The table values for the code books, which were previously obtained by analysis from the speech signal sampled values, are therefore converted into a quantized form that yields equivalent speech quality. The word lengths required for the individual table values are determined in various hearing tests.
The quantization takes place to a word length that is determined in various tests. In the following representation, this word length is designated in general as wordlength and is expressed in bits. A signed whole number having wordlength bits covers the value range from $-2^{\text{wordlength}-1}$ to $2^{\text{wordlength}-1}-1$. The code books are quantized in the manner shown below. The starting point is the code books defined in the “Study on ISO/IEC 14496-3 FCD, Subpart 3.” For this document, the code book cb is defined as follows: $cb = \{a_0, a_1, \ldots, a_n, \ldots, a_m\}$ with $0 \le n \le m$ and $a_n \in \mathbb{R}$. For the quantization of the individual elements, the following steps are required:
1.) Determination of the Value Range of the Code Books
In order to obtain a well-matched quantization, the elements of each code book are scaled in such a manner that the available value range is exploited as completely as possible. For this purpose, the value range of the elements is located between $-1$ and $1 - 2^{-(\text{wordlength}-1)}$.
In order to achieve this, the largest positive element and the most negative element (max_pos and max_neg, respectively) of each code book are determined. These result from
$\text{max\_pos} = \max(\{a_n \in cb \mid a_n \ge 0\})$ or $\text{max\_neg} = \min(\{a_n \in cb \mid a_n < 0\})$, with $0 \le n \le m$
As a function of the magnitude of max_pos or max_neg, the following steps result:
(a) $\text{max\_pos} > 1 - 2^{-(\text{wordlength}-1)}$ or $\text{max\_neg} \le -1$
max_pos and max_neg are multiplied by ½. If the result still satisfies the condition set under (a), then the process is repeated until the condition no longer holds. The number of multiplications by ½ is counted and is stored in the variable scale.
(b) $\text{max\_pos} \le 1 - 2^{-(\text{wordlength}-1)}$ or $\text{max\_neg} \ge -1$
max_pos and max_neg are multiplied by 2. If the result still satisfies the condition set under (b), then the process is repeated until the condition no longer holds. The number of multiplications by 2 is counted and is stored in the variable scale.
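Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure as specified: the function names `extremes` and `find_scale` are hypothetical, the count is kept as one signed exponent (negative for halving as in case (a), positive for doubling as in case (b)), the value −1 itself is treated as in range, and the doubling loop stops one step before the extremes would leave the range so that every scaled entry stays within bounds.

```python
def extremes(cb):
    """Return (max_pos, max_neg); 0.0 stands in when a sign is absent."""
    max_pos = max((a for a in cb if a >= 0), default=0.0)
    max_neg = min((a for a in cb if a < 0), default=0.0)
    return max_pos, max_neg


def find_scale(cb, wordlength):
    """Return a signed exponent s so that 2**s * a stays in range for all a."""
    upper = 1.0 - 2.0 ** -(wordlength - 1)
    max_pos, max_neg = extremes(cb)
    if max_pos == 0.0 and max_neg == 0.0:
        return 0  # all-zero code book: nothing to scale
    scale = 0
    # Case (a): halve while an extreme lies outside the target range.
    while max_pos > upper or max_neg < -1.0:
        max_pos, max_neg, scale = max_pos / 2, max_neg / 2, scale - 1
    # Case (b): double while the doubled extremes would still fit.
    while 2 * max_pos <= upper and 2 * max_neg >= -1.0:
        max_pos, max_neg, scale = 2 * max_pos, 2 * max_neg, scale + 1
    return scale


print(find_scale([3.0, -0.5], 16))  # -2
print(find_scale([0.1, -0.2], 16))  # 2
```

At most one of the two loops does any work for a given code book: if halving was needed, doubling the result would leave the range again.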
2.) Scaling of the Elements of cb to the Range Between $-1$ and $1 - 2^{-(\text{wordlength}-1)}$
As a function of the decision made under 1.), the scaling of all code book entries to the cited range takes place:
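(a) $b_n = 2^{-\text{scale}}\, a_n \;\forall\, a_n \in cb$, with $0 \le n \le m$
(b) $b_n = 2^{\text{scale}}\, a_n \;\forall\, a_n \in cb$, with $0 \le n \le m$.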
After this step, the entries of each code book are located in the following range of values:
$-1 \le b_n \le 1 - 2^{-(\text{wordlength}-1)}$, with $0 \le n \le m$.
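The scaling of step 2 amounts to multiplying every entry by a power of two. A minimal sketch follows, assuming a hypothetical helper `scale_codebook` and a signed exponent (negative when the entries had to be halved, positive when they could be doubled); `math.ldexp(x, i)` computes `x * 2**i` exactly for values that stay in the normal floating-point range.

```python
import math


def scale_codebook(cb, scale):
    """Return b_n = 2**scale * a_n for every entry of the code book."""
    return [math.ldexp(a, scale) for a in cb]


wordlength = 16
upper = 1.0 - 2.0 ** -(wordlength - 1)

# Example code book whose largest entry, 3.0, needs two halvings.
b = scale_codebook([3.0, -0.5], -2)
print(b)  # [0.75, -0.125]
print(all(-1.0 <= x <= upper for x in b))  # True
```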
3.) Scaling to Wordlength Bits
For the scaling to the required value range, multiplication by $2^{\text{wordlength}-1}$ takes place. In this way, the values of code books c result as $c_n = 2^{\text{wordlength}-1}\, b_n$, with $0 \le n \le m$.
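As an illustration of step 3, the following sketch multiplies already scaled entries by 2^(wordlength−1). The helper name `to_wordlength` is hypothetical, and the results are still floating-point values at this stage; rounding and truncation follow in the later steps.

```python
import math


def to_wordlength(b, wordlength):
    """Multiply each scaled entry by 2**(wordlength - 1)."""
    return [math.ldexp(x, wordlength - 1) for x in b]


c = to_wordlength([0.75, -0.125, -1.0], 16)
print(c)  # [24576.0, -4096.0, -32768.0]
```

The upper end of the scaled range, 1 − 2^(−15) for a word length of 16, maps to 32767, the largest value of a signed 16-bit whole number.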
4.) Rounding
Before the decimal places are truncated, rounding of the determined entries takes place. For this purpose, depending on the sign +0.5 or −0.5 is added. This takes place in the following form:
$c_n \ge 0$: $d_n = c_n + 0.5$
$c_n < 0$: $d_n = c_n - 0.5$
Here care is to be taken not to exceed the maximum permissible value range. This is located in the range as indicated under 2.).
5.) Truncation of the Decimal Places
The final quantization takes place through the truncation of the decimal places. The quantized values are obtained in this way.
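Steps 4 and 5 together round each value half away from zero. A minimal sketch, assuming a hypothetical helper `round_and_truncate`; the final clamp is an added safeguard for the range warning in step 4 and is not taken from the text.

```python
import math


def round_and_truncate(c, wordlength):
    """Add +/-0.5 depending on sign, drop the fraction, clamp to the word range."""
    lo = -(1 << (wordlength - 1))
    hi = (1 << (wordlength - 1)) - 1
    out = []
    for x in c:
        d = x + 0.5 if x >= 0 else x - 0.5  # step 4: rounding offset
        q = math.trunc(d)                   # step 5: truncate the decimal places
        out.append(max(lo, min(hi, q)))     # safeguard (assumption): saturate
    return out


print(round_and_truncate([13107.2, -26214.6, 32767.0, -32768.0], 16))
# [13107, -26215, 32767, -32768]
```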
Trials have shown that setting the variable wordlength to 16 yields a speech quality indistinguishable from the original.
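For a word length of 16, the numerical effect of the five steps can be checked with a short end-to-end sketch. The names `quantize` and `dequantize` are hypothetical, the exponent is kept signed, and −1 is treated as in range (assumptions). The check only bounds the reconstruction error of the table values at half a quantization step; it says nothing by itself about subjective speech quality, which the trials assess.

```python
import math


def quantize(cb, wordlength=16):
    """Steps 1 to 5 in compact form; returns (integer entries, signed exponent)."""
    upper = 1.0 - 2.0 ** -(wordlength - 1)
    hi_val = max(max(cb, default=0.0), 0.0)
    lo_val = min(min(cb, default=0.0), 0.0)
    scale = 0
    if hi_val or lo_val:
        while hi_val > upper or lo_val < -1.0:
            hi_val, lo_val, scale = hi_val / 2, lo_val / 2, scale - 1
        while 2 * hi_val <= upper and 2 * lo_val >= -1.0:
            hi_val, lo_val, scale = 2 * hi_val, 2 * lo_val, scale + 1
    shift = scale + wordlength - 1
    # Scale, add +/-0.5 according to sign, then truncate.
    q = [math.trunc(math.ldexp(a, shift) + math.copysign(0.5, a)) for a in cb]
    return q, scale


def dequantize(q, scale, wordlength=16):
    """Map integer entries back to the original value range."""
    return [math.ldexp(float(v), -(scale + wordlength - 1)) for v in q]


cb = [0.1, -0.2, 0.05]
q, scale = quantize(cb)
err = max(abs(a - r) for a, r in zip(cb, dequantize(q, scale)))
print(q, scale)          # [13107, -26214, 6554] 2
print(err <= 2.0 ** -18)  # True: at most half a quantization step
```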
A further construction of the present invention is explained in connection with
There, the block diagram of a CELP decoder is shown. First, the elements for decoding a frame are read from a transmitted bitstream, as before. These include the LPC indices, the excitation parameters (lag and shape index), and the amplitude indices (gain indices). These parameters (elements) are supplied to decoder inputs 17 to 21. The excitation parameters are made up of the parameters for adaptive code book (lag) 22 for the generation of periodic signal components (voiced) and the parameters for fixed code books (shape index) 23a . . . 23n.
The entries of fixed code books 23a . . . 23n and of adaptive code book 22 are each multiplied by a scaling factor (gain) via gain decoder 24. This scaling factor is reconstructed with the aid of the gain indices present at the input 21 and the gain VQ (vector quantization) tables stored in code books 25. The finally valid excitation vector is composed from the sum of the fixed and the adaptive code book vector.
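The composition of the excitation vector can be sketched as a gain-weighted sum. This is an illustrative sketch only: the function name `excitation` and its parameters are hypothetical, and the code book vectors and decoded gains are assumed to be available already as plain lists of numbers.

```python
def excitation(adaptive_vec, fixed_vecs, gain_adaptive, gains_fixed):
    """Sum the gain-scaled adaptive vector and the gain-scaled fixed vectors."""
    n = len(adaptive_vec)
    if len(fixed_vecs) != len(gains_fixed) or any(len(v) != n for v in fixed_vecs):
        raise ValueError("vector lengths or gain count do not match")
    out = [gain_adaptive * x for x in adaptive_vec]
    for g, vec in zip(gains_fixed, fixed_vecs):
        for i, x in enumerate(vec):
            out[i] += g * x
    return out


print(excitation([1.0, 0.0, -1.0], [[0.0, 2.0, 0.0]], 0.5, [0.25]))
# [0.5, 0.5, -0.5]
```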
With the use of vector quantizer VQ, the LPC indices represent the vector-quantized LSP (Line Spectral Pairs) parameters. The vectors of the first and second stage of the inverse vector quantization of the LSP parameters are obtained by reading out the LSP-VQ table values, which are stored in code books 26. The final reconstruction of the LPC parameters takes place in LPC parameter decoder 27. Inside each frame, interpolation between the LSP parameters of the past and of the current frame takes place for each subframe (module 28). The LSP parameters, converted into LPC parameters, enter into LPC synthesis filter 29 as coefficients. The reconstruction of the speech data takes place there through filtering of the excitation signal. In order to improve the speech quality, the reconstructed speech signal can be additionally filtered in a post-filter 30.
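The subframe interpolation and the synthesis filtering can be illustrated with the following sketch. Several points are assumptions rather than taken from the text: the function names are hypothetical, the interpolation is taken to be linear with weight (subframe + 1) / n_subframes, the filter is written as 1/A(z) with A(z) = 1 + Σ a[k] z^−(k+1), and the conversion from LSP to LPC parameters is omitted.

```python
def interpolate_lsp(prev_lsp, curr_lsp, subframe, n_subframes):
    """Blend previous and current frame LSPs linearly (weights are an assumption)."""
    w = (subframe + 1) / n_subframes
    return [(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)]


def lpc_synthesis(excitation, a, state=None):
    """All-pole filter: y[n] = x[n] - sum_k a[k] * y[n-1-k].

    Returns the output samples and the filter memory for the next call.
    """
    mem = list(state) if state is not None else [0.0] * len(a)  # mem[0] is y[n-1]
    out = []
    for x in excitation:
        y = x - sum(ak * yk for ak, yk in zip(a, mem))
        out.append(y)
        mem = ([y] + mem)[:len(a)]
    return out, mem


print(interpolate_lsp([0.0, 1.0], [1.0, 2.0], 1, 4))  # [0.5, 1.5]
y, mem = lpc_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5])
print(y)  # [1.0, 0.5, 0.25, 0.125]
```

Returning the filter memory lets consecutive subframes be filtered without a discontinuity at the boundary.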
The LSP VQ table values, as well as the gain VQ table values for code books 25 and 26, which were previously obtained by analysis from the speech signal sampled values, are normally present in a floating-point representation, which, as explained above, is not suitable for fixed-point DSP processing. For the same reasons as in the case of the HVXC decoder (
The above exemplary embodiments of the present invention have been explained on the basis of speech decoders. Of course, the present invention can also be used in corresponding coders (encoders) that use code books. There as well, the code book entries can be previously quantized for the preparation of speech signals for transmission. Examples of such encoders whose code book entries can be previously quantized are described in European Published Patent Application No. 0 545 386, U.S. Pat. No. 5,208,862, U.S. Pat. No. 5,487,128, U.S. Pat. No. 5,199,076, or U.S. Pat. No. 5,261,027.
Number | Date | Country | Kind |
---|---|---|---|
198 45 888 | Oct 1998 | DE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/DE99/02633 | 8/21/1999 | WO | 00 | 11/6/2001 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO00/21076 | 4/13/2000 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5199076 | Taniguchi et al. | Mar 1993 | A |
5208862 | Ozawa | May 1993 | A |
5257215 | Poon | Oct 1993 | A |
5261027 | Taniguchi et al. | Nov 1993 | A |
5307441 | Tzeng | Apr 1994 | A |
5313554 | Ketchum | May 1994 | A |
5487128 | Ozawa | Jan 1996 | A |
5570454 | Liu | Oct 1996 | A |
5581652 | Abe et al. | Dec 1996 | A |
5646618 | Walsh | Jul 1997 | A |
5666370 | Ganesan et al. | Sep 1997 | A |
5719992 | Shoham | Feb 1998 | A |
5734789 | Swaminathan et al. | Mar 1998 | A |
5797121 | Fette et al. | Aug 1998 | A |
5806034 | Naylor et al. | Sep 1998 | A |
5889891 | Gersho et al. | Mar 1999 | A |
5983174 | Wong et al. | Nov 1999 | A |
6233550 | Gersho et al. | May 2001 | B1 |
Number | Date | Country |
---|---|---|
0 545 386 | Jun 1993 | EP |
WO96 17465 | Jun 1996 | WO |