This invention generally relates to coding in communication systems and more specifically to reusing codebooks in parameter quantization of signals.
Speech and audio coding algorithms have a wide variety of applications in communication, multimedia and storage systems. The development of coding algorithms is driven by the need to save transmission and storage capacity while maintaining high quality of the synthesized signal. The complexity of a coder is limited by the processing power of the application platform. In some applications, e.g., voice storage, the encoder may be highly complex, while the decoder should be as simple as possible.
In a typical speech coder, the input speech signal is processed in segments called frames. Usually the frame length is 10-30 ms, and a look-ahead segment of 5-15 ms of the subsequent frame is also available. The frame may further be divided into a number of sub-frames. For every frame, the encoder 10a in
Most current speech coders include a linear prediction (LP) filter, for which an excitation signal is generated. The LP filter typically has an all-pole structure described by
H(z)=1/A(z)=1/(1+a1z^−1+a2z^−2+ . . . +apz^−p),
where a1, a2, . . . , ap are LP coefficients. The order p of the LP filter is usually 8-12. The input speech signal is processed in frames. For each speech frame, the encoder determines the LP coefficients using, e.g., the Levinson-Durbin algorithm. The line spectrum frequency (LSF) representation is employed for quantization of the coefficients, because LSFs have good quantization properties. For intermediate sub-frames, the coefficients are linearly interpolated using the LSF representation.
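As a concrete illustration of the LP analysis step, the Levinson-Durbin recursion can be sketched as follows (a minimal NumPy sketch; the function name and sign conventions are illustrative, not taken from any particular coder):

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: autocorrelation r[0..p] -> LP coefficients.

    Returns a = [1, a1, ..., ap] for A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction error energy
    for i in range(1, p + 1):
        # reflection coefficient from the current-order forward predictor
        k = -np.dot(a[:i], r[i:0:-1]) / err
        # order update: a_new[j] = a[j] + k * a[i-j], j = 1..i
        a[1:i + 1] += k * a[i - 1::-1]
        err *= (1.0 - k * k)
    return a
```

For the AR(1)-like autocorrelation sequence r = [1, 0.9, 0.81] and p = 2, the recursion yields A(z) = 1 − 0.9z^−1, with the second coefficient vanishing.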
In order to define the LSFs, an inverse LP filter A(z) polynomial is used to construct two polynomials as described by K. K. Paliwal and B. S. Atal, in “Efficient Vector Quantization of LPC Parameters at 24 bits/frame”, Proceedings of ICASSP-91, pp. 661-664, as follows:
P(z)=A(z)+z^−(p+1)A(z^−1), and
Q(z)=A(z)−z^−(p+1)A(z^−1).
The roots of the polynomials P(z) and Q(z) are called LSFs. The polynomials P(z) and Q(z) have the following properties: 1) all zeros (roots) of the polynomials are on the unit circle, 2) the zeros of P(z) and Q(z) are interlaced with each other. More specifically, the following relationship is always satisfied:
0=ω0<ω1<ω2< . . . <ωp−1<ωp<ωp+1=π.
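The construction above can be checked numerically. The sketch below (assuming NumPy; a general-purpose root finder is used for clarity, whereas practical coders typically search the roots of Chebyshev-transformed polynomials) builds P(z) and Q(z) from the coefficients of A(z) and returns the sorted interior root angles, i.e., the LSFs:

```python
import numpy as np

def lsf_from_lpc(a):
    """LSFs from LP coefficients a = [1, a1, ..., ap] of A(z)."""
    a_ext = np.concatenate([a, [0.0]])    # pad for the z^-(p+1) term
    # coefficient arrays (highest power first) of z^(p+1)*P(z), z^(p+1)*Q(z)
    P = a_ext + a_ext[::-1]               # A(z) + z^-(p+1) A(z^-1)
    Q = a_ext - a_ext[::-1]               # A(z) - z^-(p+1) A(z^-1)
    # all roots lie on the unit circle; keep angles strictly inside (0, pi)
    ang = np.angle(np.concatenate([np.roots(P), np.roots(Q)]))
    return np.sort(ang[(ang > 1e-9) & (ang < np.pi - 1e-9)])
```

For the first-order example A(z) = 1 − 0.9z^−1 the single transmitted LSF equals arccos(0.9), with the trivial roots at 0 and π discarded, as stated above.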
The ascending ordering guarantees filter stability, which is often required in signal coding applications. It is noted that the first and last parameters are always zero and π, respectively, so that only p values have to be transmitted, as described by N. Sugamura and N. Farvardin, in “Quantizer Design in LSP Speech Analysis and Synthesis”, Proceedings of ICASSP-88, pp. 398-401.
In speech coders an efficient representation is needed for storing LSF information. The most efficient way to quantize the LSF parameters is to use vector quantization (VQ), often together with prediction, as described, for example, by A. McCree and J. C. De Martin, “A 1.7 kb/s MELP Coder with Improved Analysis and Quantization”, in Proceedings of ICASSP-98, pp. 593-596. Usually predicted values are estimated based on the previously decoded output values, e.g., in the case of an autoregressive predictor (AR-predictor), or based on previously quantized values, e.g., in the case of a moving average predictor (MA-predictor), as follows:
qLSFk=mLSFk+CBk+A1(qLSFk−1−mLSFk−1)+ . . . +Am(qLSFk−m−mLSFk−m), or
qLSFk=mLSFk+CBk+B1CBk−1+ . . . +BnCBk−n,
where the Aj and Bi are predictor matrices and m and n are the orders of the AR- and MA-predictors, respectively; mLSFk is the mean LSF, qLSFk is the quantized LSF and CBk is the codebook vector for frame k. State-of-the-art quantization uses several switched predictors, in which case the predictor selection is transmitted with one or more bits. This is efficient, since spending a bit on predictor selection often gains more than making the codebooks larger. Predictor selection is especially attractive in space-constrained cases: adding a bit to a codebook doubles the codebook stage size, whereas a new diagonal predictor requires only p values (commonly 10).
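A switched-prediction quantizer of the kind described can be sketched as follows (illustrative NumPy code; the first-order diagonal MA predictor and the squared-error search are simplifying assumptions, not the method of any particular coder):

```python
import numpy as np

def quantize_switched(lsf, mean_lsf, predictors, codebooks, history):
    """Try every predictor; keep the one giving the smallest error.

    predictors: list of diagonal MA coefficient vectors (p values each)
    codebooks:  list of (size, p) codebook matrices, one per predictor
    history:    previous frame's codebook vector (MA order n = 1 here)
    Returns (selector, codebook index, quantized LSF vector).
    """
    best = None
    for s, (b, cb) in enumerate(zip(predictors, codebooks)):
        pred = mean_lsf + b * history              # diagonal MA prediction
        resid = lsf - pred
        idx = int(np.argmin(np.sum((cb - resid) ** 2, axis=1)))
        err = float(np.sum((cb[idx] - resid) ** 2))
        if best is None or err < best[0]:
            best = (err, s, idx, pred + cb[idx])
    return best[1], best[2], best[3]
```

The selector index is what the one or more predictor-selection bits would encode; the codebook index is the usual VQ index.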
Codebooks are optimized for each predictor separately and stored, e.g., in ROM. If several predictors and/or large codebooks are used, a lot of storage memory is required. By using smaller or fewer codebooks, the memory consumption can be reduced, but at the expense of reduced compression performance. Giving each quantizer stage its own optimized codebooks likewise requires a lot of storage space. It is therefore highly desirable to find an efficient solution that avoids the need for a large storage space.
The object of the present invention is to provide a methodology for reusing codebooks for a multistage vector quantization of parameter quantizers of signals.
According to a first aspect of the invention, a method of reusing codebooks for a multistage vector quantization of parameter quantizers for a signal, comprises the steps of: training multistage vector quantization codebooks for all predictor and non-predictor modes of the parameter quantizers; analyzing the trained codebooks for different stages of the vector quantization and optionally analyzing corresponding training data used for the training and identifying similar codebooks corresponding to different predictor and non-predictor modes out of the all predictor and non-predictor modes for the different stages based on the analyzing using a predetermined criterion; combining the training data corresponding to N codebooks selected from the similar codebooks based on a further predetermined criterion; and training the N codebooks using the combined training data thus generating a new common codebook to be used instead of the N codebooks for the multistage vector quantization of the parameter quantizers for the signal, wherein N is an integer of at least a value of two.
According further to the first aspect of the invention, the step of the training multistage vector quantization codebooks may include training predictors corresponding to the all predictor modes of the parameter quantizers.
According further still to the first aspect of the invention, the steps of the analyzing, the combining and the training may be repeated until a pre-selected level of memory space savings is reached.
According yet further still to the first aspect of the invention, the N codebooks may have the same size.
Yet still further according to the first aspect of the invention, the identifying similar codebooks using the predetermined criterion may be based on evaluating a variance of related parameters, and optionally on evaluating the variance of training vectors or code vectors, corresponding to the similar codebooks.
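As an illustration of such a variance-based criterion, one hypothetical sketch flags two codebooks as similar when the mean variances of their code vectors agree within a relative tolerance (the tolerance value and the use of a scalar mean variance are assumptions made for brevity):

```python
import numpy as np

def similar_codebooks(codebooks, tol=0.1):
    """Return index pairs of codebooks whose code-vector variances are close.

    codebooks: list of (size, p) arrays; tol is a relative threshold.
    """
    var = [float(np.var(cb, axis=0).mean()) for cb in codebooks]
    pairs = []
    for i in range(len(codebooks)):
        for j in range(i + 1, len(codebooks)):
            # relative comparison of the per-codebook mean variances
            if abs(var[i] - var[j]) <= tol * max(var[i], var[j]):
                pairs.append((i, j))
    return pairs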
Further according to the first aspect of the invention, the step of the analyzing the trained codebooks may include evaluating at least one related parameter for an original codebook out of the trained codebooks for one predictor mode of the all predictor modes, and then evaluating at least one related parameter using a different trained codebook out of the trained codebooks for a different predictor mode of the all predictor modes in place of the original trained codebook, using identical data for both evaluations.
Still yet further according to the first aspect of the invention, the step of the combining the training data may include combining the training data for the original codebook and the different codebook if the predetermined criterion is met.
Further according to the first aspect of the invention, the parameter quantizers may contain both vector and scalar parameters.
Further still according to the first aspect of the invention, the training of the N codebooks using the combined training data may be performed using a pre-selected algorithm, optionally a generalized Lloyd algorithm.
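A minimal sketch of the generalized Lloyd algorithm mentioned above, alternating nearest-neighbour partitioning with centroid updates (the deterministic initialization and the trivial empty-cell handling are simplifications of a production trainer):

```python
import numpy as np

def generalized_lloyd(train, size, iters=20):
    """Train a codebook of `size` vectors from a (num_vectors, p) matrix."""
    # spread the initial code vectors over the training set
    cb = train[np.linspace(0, len(train) - 1, size).astype(int)].copy()
    for _ in range(iters):
        # nearest-neighbour partition of the training data
        dist = ((train[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        idx = dist.argmin(axis=1)
        # centroid update; an empty cell keeps its previous code vector
        for c in range(size):
            if np.any(idx == c):
                cb[c] = train[idx == c].mean(axis=0)
    return cb
```

Run on two well-separated clusters, the code vectors converge to the cluster means, which is the behaviour the distortion-minimizing training relies on.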
Still further according to the first aspect of the invention, all steps may be performed by an encoder of a communication system, and the encoder optionally may be a part of a mobile device which is optionally a mobile phone. Further, the encoder may be capable of storing the common codebooks and may be capable of generating an encoded quantized signal from the signal by using and reusing the common codebook for the multistage vector quantization of the parameter quantizers for the signal.
According to a second aspect of the invention, an encoder capable of reusing codebooks for a multistage vector quantization of parameter quantizers for a signal, comprises: a means for training multistage vector quantization codebooks for all predictor and non-predictor modes of the parameter quantizers; an analyzing block, for analyzing the trained codebooks for different stages of the vector quantization and optionally analyzing corresponding training data used for the training and identifying similar codebooks corresponding to different predictor and non-predictor modes out of the all predictor and non-predictor modes for the different stages based on the analyzing using a predetermined criterion; and a combining block, for combining the training data corresponding to N codebooks selected from the similar codebooks based on a further predetermined criterion; and means for training the N codebooks using the combined training data thus generating a new common codebook to be used instead of the N codebooks for the multistage vector quantization of the parameter quantizers for the signal, wherein N is an integer of at least a value of two.
According further to the second aspect of the invention, the training multistage vector quantization codebooks may also include training predictors corresponding to the all predictor modes of the parameter quantizers.
According still further to the second aspect of the invention, the analyzing the trained codebooks, the combining the training data and the training the N codebooks may be repeated until a pre-selected level of memory space savings is reached.
According further still to the second aspect of the invention, the N codebooks may have the same size.
Yet still further according to the second aspect of the invention, the identifying similar codebooks using the predetermined criterion may be based on evaluating a variance of related parameters, and optionally on evaluating the variance of training vectors corresponding to the similar codebooks.
Further according to the second aspect of the invention, analyzing the trained codebooks may include evaluating at least one related parameter for an original codebook out of the trained codebooks for one predictor mode of the all predictor modes, and then evaluating at least one related parameter using a different trained codebook out of the trained codebooks for a different predictor mode of the all predictor modes in place of the original trained codebook, using identical data for both evaluations. Further, the combining of the training data may include combining the training data for the original codebook and the different codebook if the predetermined criterion is met.
Still further according to the second aspect of the invention, the parameter quantizers may contain both vector and scalar parameters.
Yet still further according to the second aspect of the invention, the encoder may be a part of a communication system or a part of a mobile device which is optionally a mobile phone.
Still yet further according to the second aspect of the invention, the means for training the multistage vector quantization codebooks and the means for training the N codebooks using the combined training data may be incorporated in one block.
Further still according to the second aspect of the invention, the encoder may further comprise: a memory, for storing the common codebook; and a coding module, capable of retrieving the common codebook from the memory for generating an encoded quantized signal from the signal by using and reusing the common codebook for the multistage vector quantization of the parameter quantizers for the signal.
According to a third aspect of the invention, a computer program product may comprise: a computer readable storage structure embodying computer program code thereon for execution by a computer processor with the computer program code characterized in that it may include instructions for performing the steps of the method of the first aspect of the invention indicated as being performed by any component or a combination of components of the encoder.
Quantization performance remains good while the codebook sizes can be reduced significantly. The result is a smaller encoder and decoder. The size is especially important in embedded applications such as mobile phones.
For a better understanding of the nature and objects of the present invention, reference is made to the following detailed description taken in conjunction with the following drawings, in which:
The present invention provides a new methodology for reusing codebooks for a multistage vector quantization of parameter quantizers of signals. According to the present invention, said parameter quantizers can contain both vector and scalar parameters. Prior art multistage vector quantization is done in such a way that each stage has its own optimized codebooks. Therefore the prior art codebooks use quite a lot of memory storage space. Using the same codebook stages several times, according to the present invention, reduces the memory usage, while the codebook structure maintains good quality by using optimized codebooks for the most important (first) stages of the quantization. The number of codebooks is reduced by reusing the same codebooks in the refining stages. Additionally, according to the present invention, using many predictors is space-wise efficient since they need only a few coefficients instead of larger codebooks.
In a practical implementation the codebook design/training has to be carefully implemented. Best results are obtained when the first stages in all multistage quantizers are optimal for predicted data (i.e., the first stage should have a unique codebook). This is important since in many multistage quantizers the first stages take out most of the error energy (i.e. the codebooks are designed so that the first stage codebooks have most variance and consequently most resolving power).
One possibility (first scenario) among many others, according to the present invention is to combine codebooks as follows:
Another possibility (second scenario) among many others is to combine codebooks with the following method:
After performing the algorithms for combining codebooks described above, the actual vector quantization reusing the same codebooks is performed in exactly the same way as the usual switched prediction (see, e.g., A. McCree and J. C. De Martin, “A 1.7 kb/s MELP Coder with Improved Analysis and Quantization”, in Proceedings of ICASSP-98, pp. 593-596), and, e.g., as summarized below:
In addition to standard operating blocks such as a coding module 22 (e.g., for encoding and quantizing signal parameters of an input signal 36) and a memory 20 (e.g., for storing codebooks, training data, predictors, etc.), the encoder 10 contains an additional block, a codebook reusing module 26, for implementing training, analyzing and combining functions for reusing the codebooks for the multistage vector quantization of the parameter quantizers, according to the present invention. It is noted that, in an alternative implementation of the present invention, the codebook reusing module 26 can be located outside of the encoder 10. For example, the codebooks can be trained off-line on a PC and only the trained codebooks are then stored in the memory 20.
A training block 12 of the codebook reusing module 26 is for training multistage vector quantization codebooks for all predictor and non-predictor modes of said parameter quantizers (see step 1 in the first and second scenarios above). This function of the block 12 can be alternatively performed by a similar training block in the standard coding module 22. The training block 12 can also be used for re-training of similar codebooks (see step 5 in the first and second scenarios above) as discussed below in detail.
An analyzing/evaluating block 14 of the codebook reusing module 26 is for analyzing/evaluating the trained codebooks for different stages of the vector quantization (e.g., step 2 in the first scenario above and steps 2 and 3 in the second scenario above) and optionally analyzing/evaluating corresponding training data used for said training (e.g., step 2 in the first scenario above), and identifying similar codebooks corresponding to different predictor and non-predictor modes out of said all predictor and non-predictor modes for said different stages based on said analyzing/evaluating using a predetermined criterion.
A combining block 16 of the codebook reusing module 26 is for combining the training data corresponding to N codebooks selected from said similar codebooks based on a further predetermined criterion. After completing this operation, the process moves to the training block 12, described above, for training the N codebooks using said combined training data, thus generating a new common codebook which is used instead of said N codebooks for said multistage vector quantization of said parameter quantizers for said signals, wherein N is an integer of at least a value of two.
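The combine-and-retrain step performed by blocks 16 and 12 can be sketched as follows (illustrative code; `train_fn` is a placeholder for any codebook trainer, e.g., a generalized Lloyd routine, and is an assumption of this sketch):

```python
import numpy as np

def merge_codebooks(training_sets, train_fn, size):
    """Pool the training data of N similar codebooks and train one
    common codebook of `size` vectors to replace all N of them.

    training_sets: list of (num_vectors, p) arrays, one per codebook.
    """
    combined = np.vstack(training_sets)   # combining (block 16)
    return train_fn(combined, size)       # re-training (block 12)
```

The returned common codebook is then stored once and referenced from every stage that previously held its own copy.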
Dotted arrow lines in the codebook reusing module 26 indicate logical directions of the process. Lines 28, 30 and 32 facilitate the exchange of information between the blocks 12, 14 and 16 and the memory 20. Similarly, a line 34 is used for communicating between the memory 20 and the coding module 22. For example, the training block 12 retrieves said N codebooks and said combined training data from the memory 20 (where they are stored after completion of the combining procedure by the combining block 16 as described above); after completing the training procedure, the training block 12 sends the new common codebook to the memory 20 for storage and for further use by the coding module 22 for encoding and quantizing signal parameters of an input signal 36 as mentioned above. A UI (user interface) signal 24 is used for sending appropriate commands to the codebook reusing module 26 regarding all or selected steps of the first and second scenarios described above.
The flow chart of
In a next step 46, the resulting codebooks and the used training data are analyzed (e.g., by the analyzing/evaluating block 14 of
In a next step 48, the training data, corresponding to N codebooks selected from said identified codebooks with similar behavior based on the further predetermined criterion, are combined (e.g., using the combining block 16 of
The flow chart of
The following example further demonstrates the present invention. In a very low bit rate coder there are four modes: no audio, unvoiced, mixed voiced and fully voiced. All but the no-audio segments require LSF parameter quantization. In all cases switched prediction is used. In the unvoiced and mixed voiced cases a two-predictor model is used. In the fully voiced case four different predictors are used. The bit allocation, modes and codebook reuse (UCB = Unique CodeBook, CCB = Common CodeBook) can be seen in Table 1 (results are generated using the first scenario described above).
As can be seen, 57% space savings are obtained. Without codebook reuse, the memory usage would have been 10*2*(9*2^5+26*2^4)=14080 bytes with 16-bit coefficients; now it is only 10*2*(4*2^5+11*2^4)=6080 bytes. Five common codebooks have replaced 25 unique codebooks.
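The arithmetic behind these figures can be verified directly (10-dimensional LSF vectors and 2-byte coefficients, as in the example; the split into codebooks of 32 and 16 vectors follows the two bracketed sums above):

```python
dim, bytes_per_coef = 10, 2
# without reuse: 9 codebooks of 2^5 vectors plus 26 codebooks of 2^4 vectors
before = dim * bytes_per_coef * (9 * 2**5 + 26 * 2**4)
# with reuse: 4 codebooks of 2^5 vectors plus 11 codebooks of 2^4 vectors
after = dim * bytes_per_coef * (4 * 2**5 + 11 * 2**4)
saving = round(100 * (before - after) / before)
print(before, after, saving)
```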