The invention relates to low rate speech coding in communication and data processing systems, and more particularly to spectrum quantization of voice signals.
Digital speech processing is extensively used in communication systems, telephony, digital answering machines, low rate videoconferencing, etc. Low rate speech coding is typically based on parametric modeling of the speech signal. The speech encoder computes representative parameters of the speech signal, quantizes them into products, and places them into the data stream, which may be sent over a digital communication link or saved in a digital storage medium. A decoder uses those speech parameters to produce the synthesized speech.
Almost all known speech compression algorithms for bit rates less than or equal to 8000 bits per second are based on linear prediction. Typically, linear prediction coefficients (LPC) are transmitted as linear spectral frequencies (LSF) (sometimes called “linear spectral parameters (LSP)” or “linear spectral pairs (LSP)”). Depending on the bit rate provided by the speech compression algorithm, the LSF are updated once per 10-30 ms. Usually a 10th order linear prediction filter is used, which means that the LSF are represented by a 10-dimensional vector.
Then the current LSF vector and the set of predicted LSF vectors enter the vector quantizer unit 120. The vector quantizer unit 120 determines the best codebook index (or set of indices) and the best predictor number to provide the best approximation of the current LSF vector in the sense of some distortion measure. All indices computed by the vector quantizer enter the indices encoder unit 130, where they are transformed into the codeword corresponding to the current LSF vector.
This codeword is sent along with other speech parameters into a data link transmission medium or a digital memory. Also, the codebook indices and predictor index enter the LSF reconstruction unit 140. Another input of the reconstruction unit is the set of predicted LSF vectors. In the LSF reconstruction unit 140 the quantized LSF vector is reconstructed. This vector is then saved in the buffer unit 150 to be used for predicting subsequent LSF vectors.
Early quantizers used a single non-structured code and compared the source vector to each entry in the codebook (referred to as “full search quantizers”). The performance of vector quantization depends on the size of the codebook used, and to obtain better results, larger codebooks have to be used. On the other hand, storage and processing complexities also increase with increasing codebook size. To overcome this problem, suboptimal vector quantization procedures have been proposed that use multiple structured codebooks. One of the most widely used procedures is multistage vector quantization (MSVQ). In MSVQ a sequence of vector quantizers (VQ) is used. The input of the next VQ is the quantization error vector of the previous VQ.
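The residual chain described above can be sketched as follows. This is a minimal illustration, not an implementation from the source: the function names `msvq_encode` and `msvq_decode`, the squared Euclidean distortion, and the toy codebooks in `STAGES` are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v (full search of one stage)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def msvq_encode(stages, source):
    """Quantize `source` stage by stage; each stage sees the previous error."""
    residual = list(source)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        # The quantization error becomes the input of the next stage.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(stages, indices):
    """Reconstruct the vector as the sum of the selected stage codewords."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical two-stage codebooks for 2-dimensional vectors.
STAGES = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
]
```

With these toy codebooks, `msvq_encode(STAGES, [4.9, 4.2])` selects one index per stage, and `msvq_decode` returns the sum of the two selected codewords, which is why the set of reachable output vectors is a direct sum of the stage codebooks.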
An improvement on MSVQ is M-best or delayed decision MSVQ, which is described in (W. P. LeBlanc, B. Bhatacharya, S. A. Mahmood and V. Cuperman, “Efficient search and design procedures for robust multistage VQ of LPC Parameters for 4 kb/s speech coding” IEEE Transactions on speech and audio processing. Vol. 1, No. 4, Oct. 1993, pp. 373-385). The M-best MSVQ achieves better quantization results by keeping from stage to stage a few candidates (M candidates). The final decision for each stage is made only when the last quantization stage is performed. The more candidates that are kept, the higher the quantization gain that may be achieved and the greater the computational complexity.
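A delayed-decision search of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the procedure of the cited paper: the name `m_best_msvq`, the squared Euclidean distortion, and the one-dimensional toy codebooks are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def m_best_msvq(stages, source, m):
    """Keep the m best partial paths after every stage; decide at the end.

    Returns (indices, reconstruction) of the best complete path.
    """
    # Each path is (tuple of stage indices, reconstruction so far).
    paths = [((), [0.0] * len(source))]
    for codebook in stages:
        extended = []
        for indices, recon in paths:
            for i, codeword in enumerate(codebook):
                new_recon = [r + c for r, c in zip(recon, codeword)]
                err = sq_dist(source, new_recon)
                extended.append((err, indices + (i,), new_recon))
        # Survivors: the m paths with the smallest accumulated error.
        extended.sort(key=lambda item: item[0])
        paths = [(idx, rec) for _, idx, rec in extended[:m]]
    return paths[0]


# Hypothetical 1-dimensional stage codebooks.
STAGE_1 = [[0.0], [4.0]]
STAGE_2 = [[-0.5], [3.0]]
```

For the source vector `[2.6]`, a greedy search (`m=1`) commits to the first-stage codeword `4.0` and ends at `3.5`, while `m=2` also keeps the candidate `0.0` and ends at `3.0`, which is closer. The price is that the second stage is searched once per surviving candidate.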
The unit having the greatest impact on the performance of the quantizer is the vector quantization unit. Typically, an LSF vector is split into subvectors (usually 1 to 3 subvectors). A vector quantization procedure is then applied to each subvector. To improve the quantization accuracy, it is necessary to increase the dimensions of the subvectors and the corresponding codebook sizes. However, this leads to increasing the computational load needed for full search quantization. To decrease computational complexity, a multistage M-best quantization procedure is used.
The block diagram of a two-stage M-best quantizer is shown in
The common property of these suboptimal vector quantizers is that they reduce computational complexity by replacing an optimal large size non-structured codebook with a direct sum of small structured codebooks.
A reduced complexity vector quantizer is described. According to one embodiment of the invention, a multistage vector list quantizer comprises a first stage quantizer to select candidate first stage codewords from a plurality of first stage codewords, a reference table memory storing a set of second stage codewords for each first stage codeword, and a second stage codebook constructor to generate a reduced complexity second stage codebook that is the union of sets corresponding to the candidate first stage codewords selected by the first stage quantizer.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details. In other instances, well-known structures and techniques have not been shown in detail in order not to obscure the invention.
The technique used by the searching unit 401 to select codewords from the non-structured codebook C 402, and thereby dynamically form the reduced complexity codebook for the current input source vector, depends on the implementation. However, whatever technique is used will perform fewer comparisons than comparing the source vector with every codeword in the codebook C. In particular, assume the codebook C includes L codewords. The searching unit identifies a subset of the L codewords without comparing the current source vector to each of the L codewords. The reduced complexity codebook is then used by the quantizer 405 to quantize the source vector. As such, the source vector is quantized with a subset of the codewords from the original non-structured codebook C, rather than a direct sum of small structured codebooks as used in MSVQ techniques. In addition, the system of
This list enters second-stage reduced complexity codebook constructor 330. The second-stage reduced complexity codebook constructor 330 is coupled to reference table memory unit 340. For each index of a codeword from first stage codebook C1, the reference table memory unit 340 keeps a precomputed set of P indices of second stage codewords from C. The second stage codebook C2 is dynamically constructed by selecting codewords from C based on this table. In particular, let C2(j) denote the subset of C corresponding to the xi
The second-stage reduced complexity codebook enters second-stage quantizer 320. The second-stage quantizer selects the best codeword (the one closest to the source vector) from among the codewords of the reduced complexity codebook. The index of this codeword is the output of quantizer 300.
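The two-stage flow through the first-stage quantizer, the codebook constructor with its reference table, and the second-stage quantizer can be sketched as follows. The function name `mslq_encode`, the squared Euclidean distortion, and the toy codebooks and table are assumptions for illustration only; they are not taken from the figures.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mslq_encode(c1, c, ref_table, source, m):
    """Two-stage list quantization.

    c1        -- first-stage codebook
    c         -- full non-structured codebook
    ref_table -- ref_table[j] lists indices into c for codeword j of c1
    m         -- number of first-stage candidates kept
    Returns (index into c, first-stage candidates, reduced codebook indices).
    """
    # First stage: keep the m first-stage codewords closest to the source.
    ranked = sorted(range(len(c1)), key=lambda j: sq_dist(c1[j], source))
    candidates = ranked[:m]
    # Constructor: union of the precomputed index sets of the candidates.
    reduced = sorted(set().union(*(ref_table[j] for j in candidates)))
    # Second stage: search only the reduced complexity codebook.
    best = min(reduced, key=lambda k: sq_dist(c[k], source))
    return best, candidates, reduced


# Hypothetical 1-dimensional codebooks and reference table (P = 4).
C = [[float(k)] for k in range(8)]
C1 = [[1.0], [3.5], [6.0]]
REF_TABLE = [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

For the source `[3.9]` with `m=2`, the second stage compares against 6 codewords of `C` instead of all 8, and the output is still an index into the original codebook `C`.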
Thus, the searching unit of
Table 2. MSE and Complexity of Some List Quantization Schemes for 16-Codeword 2-Dimensional Quantizers
The complexity κ2 of the multistage list quantizer shown in
where L1 and L2 are the sizes of the first-stage and second-stage codebooks, M is the number of candidates kept after the first stage, and C2(ji) denotes the second-stage codebook corresponding to codeword ji of the first-stage codebook. The total number of codewords is, in general, less than L1L2. Note that the value of κ2 depends on the list of candidates (j1, . . . , jM) chosen by the first-stage quantizer. This means that the complexity of this scheme is a random variable, but it is upper bounded by the right side of inequality (2).
For example, consider a (5,5,2)-scheme. FIG. 6 and Table 1 show that, depending on the 2 words chosen by the first-stage quantizer, the second-stage quantizer will search for the best codeword among 8 or 9 candidates. For instance, if the first-stage quantizer chooses the pair {a, b} as its list, the number of candidates is 9; if the pair {a, c} is chosen, the number of candidates is 8. Taking into account that the first-stage quantizer computes the error 5 times, the total complexity of the (5,5,2)-scheme is estimated as 13.49.
Complexities of different 16-word 2-dimensional quantizers are given in Table 2. Note that the (5,7,1) and (5,5,2) methods provide the same quantization quality as a prior art full search quantizer and require fewer computations. At the same time, conventional two-stage M-best quantizers cannot provide this quality level, irrespective of the computational complexity. In general, the computational load may be reduced by a factor of 4 to 5 for 4-5 dimensional codebooks of size equal to or greater than 512 codewords.
The MSLQ, in a two-stage embodiment, may use two codebooks: an RQC (rough quantization codebook) and an FQC (fine quantization codebook). The MSLQ can also store reference table information that lists, for each RQC entry, the indices of a predetermined number of FQC entries surrounding that RQC vector. MSLQ 300 can implement the following steps. First, use the RQC to quantize the input vector and select a predetermined number of candidates. Then construct a second-stage codebook as the union of the FQC subsets that the reference table associates with the selected candidates. Among the second-stage codebook entries, choose the one closest to the input vector in the sense of a predetermined distortion measure, and use its FQC index as the codeword.
This method may be used for more than two quantization stages. For this purpose, a sequence of codebooks of increasing size has to be constructed. For each previous-stage codeword, the indices of a predetermined number of next-stage codewords surrounding that codeword are kept in the reference table. Quantization starts with list quantization using the smallest codebook. Then, using the reference table(s), the second-stage codebook is constructed as the union of the sets corresponding to the candidates chosen at the first stage, and so on. The final quantization result is one of the entries of the largest codebook. Its index is the codeword corresponding to the current LSF vector.
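One possible way to precompute the reference table is sketched below. The text does not define "surrounding" precisely, so this sketch assumes it means the P FQC entries nearest to each RQC entry under squared Euclidean distance; the function name `build_reference_table` and the toy codebooks are likewise hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_reference_table(rqc, fqc, p):
    """For each RQC entry, store the indices of its p nearest FQC entries.

    Assumption: 'surrounding' is interpreted as 'nearest in squared
    Euclidean distance'. The table is computed once, offline.
    """
    table = []
    for r in rqc:
        order = sorted(range(len(fqc)), key=lambda k: sq_dist(fqc[k], r))
        table.append(sorted(order[:p]))
    return table


# Hypothetical 1-dimensional codebooks.
RQC = [[1.0], [6.0]]
FQC = [[float(k)] for k in range(8)]
```

Because the table is built offline, its construction cost does not count toward the per-vector quantization complexity; only the table lookup and the reduced search do.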
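The extension to more than two stages can be sketched as a loop over codebooks of increasing size. The function name `multistage_list_quantize`, the distortion measure, and the three toy codebooks with their reference tables are assumptions for illustration; the source does not prescribe these values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multistage_list_quantize(codebooks, ref_tables, source, m):
    """codebooks[0] is the smallest codebook, codebooks[-1] the largest.

    ref_tables[s][j] lists the indices into codebooks[s + 1] kept for
    codeword j of codebooks[s]. Returns an index into the largest codebook.
    """
    active = list(range(len(codebooks[0])))
    last = len(codebooks) - 1
    for s, book in enumerate(codebooks):
        ranked = sorted(active, key=lambda k: sq_dist(book[k], source))
        if s == last:
            return ranked[0]
        # Keep m candidates, then form the next stage's reduced codebook
        # as the union of their reference-table entries.
        keep = ranked[:m]
        active = sorted(set().union(*(ref_tables[s][j] for j in keep)))


# Hypothetical 1-dimensional codebooks of sizes 2, 4 and 8.
BOOKS = [
    [[2.0], [6.0]],
    [[1.0], [3.0], [5.0], [7.0]],
    [[0.5], [1.5], [2.5], [3.5], [4.5], [5.5], [6.5], [7.5]],
]
TABLES = [
    [[0, 1], [2, 3]],
    [[0, 1, 2], [1, 2, 3], [3, 4, 5], [5, 6, 7]],
]
```

For the source `[3.9]` with `m=2`, the last stage searches 5 of the 8 entries of the largest codebook and returns index 3, the entry `3.5`, which is also what a full search of that codebook would return for this input.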
An alternative embodiment of vector quantization utilizing MSLQ shown in
Further processing of error vectors is performed by two independent branches. These branches differ from one another in the parameters of the splitting means and in the codebooks used for subvector quantization. In general, any number of processing branches may be used in another embodiment of the present invention. Those vectors that enter first splitting means 730 are split into a predetermined number of subvectors of smaller dimension. In this embodiment the input vectors are split into 2 subvectors each. Then each subvector is quantized by a corresponding MSLQ unit 740, 750. Similar processing occurs in second splitting means 735 and MSLQ units 760 and 770. Each of the MSLQ units may have its own set of codebooks, different from the codebooks used by the other MSLQ units. The outputs of the MSLQ units are sets of quantized subvectors along with the corresponding codebook indices. This information enters the select best candidate unit 780, where a final decision about the best candidate is made. The output of the quantizer contains the index of the best candidate and the indices of the 4 codebooks calculated in MSLQ units 740, 750, 760, 770.
The split-vector modification of the MSLQ of
As indicated above, the codebook (or set of candidates) used by the first-stage quantizer 710 includes 2 parts: a standard part and an adaptively varying part. The varying part is represented by the set of predicted LSF vectors. Variable length codewords are assigned to the candidates, because predicted LSF vectors usually are chosen more frequently than the standard LSF vectors. To satisfy this requirement, variable size codebooks are used for the second-stage (SMSLQ) quantization.
The advantage of MSLQ quantization over prior art MSVQ quantization is that MSLQ achieves the same quality as an exhaustive search over the FQC codebook, whereas the set of MSVQ-quantized vectors is a direct sum of the stage codebooks. The non-structured FQC codebook provides significantly better quantization accuracy than the structured codebooks used in the traditional multistage M-best quantization procedure.
The performance of this embodiment can be compared with the performance of other LSF coding schemes using a weighted Euclidean distance measure, which is widely used in speech coding. This weighted distance (WD) d(f,f′) between the input vector f=(f1, . . . , fp) and the quantized vector f′=(f1′, . . . , fp′) is given by
where p is the number of elements in f and wj is the weight assigned to the j-th frequency. In this example p=10, and the weighting coefficients wj used in the G.723 standard are applied. These weights are given by
w1=1/(f2−f1),
wj=1/min(fj−fj−1, fj+1−fj), j=2, . . . , 9,
w10=1/(f10−f9).
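The weights above can be computed directly from the input LSF vector, as in the following sketch. The weight formulas follow the text (generalized from p=10 to any p of at least 2). The form of the distance itself is an assumption, since the defining equation is not reproduced here: the sketch uses a weighted sum of squared differences, a common form of weighted Euclidean distance. The function names are hypothetical.

```python
def lsf_weights(f):
    """Weights w_j from the spacing of neighbouring frequencies in f.

    f must be strictly increasing and contain at least two elements.
    """
    p = len(f)
    w = [0.0] * p
    w[0] = 1.0 / (f[1] - f[0])
    for j in range(1, p - 1):
        w[j] = 1.0 / min(f[j] - f[j - 1], f[j + 1] - f[j])
    w[p - 1] = 1.0 / (f[p - 1] - f[p - 2])
    return w


def weighted_distance(f, fq):
    """Assumed form: sum over j of w_j * (f_j - fq_j)^2, with the weights
    taken from the unquantized input vector f."""
    w = lsf_weights(f)
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, f, fq))
```

Closely spaced frequencies receive large weights, so quantization errors near spectral peaks are penalized more heavily than errors where the frequencies are far apart.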
In one embodiment of the present invention the following parameters of the quantizer of
Denote by M the number of candidates chosen by the first-stage quantizer. The switch unit forwards to the first splitting means those error vectors which correspond to the predicted LSF vector (if the predicted LSF vector is selected as one of the candidates), and it forwards the remaining error vectors to the second splitting means. Both splitting means split the input 10-dimensional vectors into a pair of 5-dimensional vectors. Denote by L1, L2, L3 and L4 the sizes of the codebooks used in the MSLQ 1, . . . , MSLQ 4 units. These codebooks are also found using the LBG technique. The parameters of the MSLQ units may be chosen in such a way that the quantization precision is the same as for a full-search quantization. To achieve a better tradeoff between the number of bits and the quantization accuracy, a variable-length encoding of candidate indices and different sizes L1, . . . , L4 are used. To meet the fixed total number of bits constraint, a larger codebook is used for those candidates whose codeword length is shorter. An example of bit allocation is shown in FIG. 8.
The simulation results for different bit rates and bit allocations are shown in Table 3 for fixed rate LSF quantizers with bit rates of 15 . . . 22 b/frame. The quantization accuracy is characterized by the average weighted distortion (AWD). The AWD for the FS-1016 standard scalar 34 bits/frame quantizer and for the 24 bits/frame vector-split ITU G.723 standard quantizer are given for comparison.
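The fixed-rate constraint can be illustrated with simple arithmetic: whatever bits the variable-length candidate codeword does not use remain available for the two subvector codebooks. The numbers and the even split below are assumptions for illustration only and are not the allocation of FIG. 8; the function name `codebook_sizes` is hypothetical.

```python
def codebook_sizes(total_bits, candidate_code_len):
    """Split the bits left after the candidate codeword between the two
    5-dimensional subvector codebooks (assumed: as evenly as possible).

    Returns the two codebook sizes as numbers of codewords.
    """
    remaining = total_bits - candidate_code_len
    if remaining < 2:
        raise ValueError("not enough bits left for two subvector codebooks")
    bits_a = (remaining + 1) // 2
    bits_b = remaining // 2
    return 2 ** bits_a, 2 ** bits_b
```

For a hypothetical 20 b/frame budget, a frequently chosen candidate with a 2-bit codeword leaves 18 bits, giving two 512-entry codebooks, whereas a candidate with a 4-bit codeword leaves 16 bits, giving two 256-entry codebooks. Every frame therefore uses the same total number of bits.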
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.
This application claims the benefit of U.S. Provisional Patent Application No. 60/157,647, entitled “Method And Apparatus For A Linear Spectral Frequency Audio Compression,” filed Oct. 4, 1999.
Number | Date | Country | |
---|---|---|---|
60157647 | Oct 1999 | US |