The present invention relates to an audio/speech encoding apparatus, an audio/speech decoding apparatus, and audio/speech encoding and decoding methods using vector quantization.
In audio and speech coding, there are mainly two types of coding approaches: Transform Coding and Linear Prediction Coding.
Transform coding transforms the signal from the time domain to the spectral domain, for example using the Discrete Fourier Transform (DFT) or the Modified Discrete Cosine Transform (MDCT). The spectral coefficients are then quantized and encoded. During quantization or encoding, a psychoacoustic model is normally applied to determine the perceptual importance of the spectral coefficients, which are then quantized or encoded according to that importance. Popular transform codecs include MPEG MP3, MPEG AAC [1] and Dolby AC3. Transform coding is effective for music and general audio signals. A simple framework of a transform codec is shown in
In the encoder illustrated in
Psychoacoustic model analysis is done on the frequency domain signal S(f) to derive the masking curve (103). Quantization is applied on the frequency domain signal S(f) (102) according to the masking curve derived from the psychoacoustic model analysis to ensure that the quantization noise is inaudible.
The quantization parameters are multiplexed (104) and transmitted to the decoder side.
In the decoder illustrated in
The decoded frequency domain signal {tilde over (S)}(f) is transformed back to time domain, to reconstruct the decoded time domain signal {tilde over (S)}(n) using frequency to time transformation method (107), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
On the other hand, linear prediction coding exploits the predictable nature of speech signals in the time domain and obtains the residual/excitation signal by applying linear prediction to the input speech signal. For speech signals, especially voiced regions, which have a resonant effect and a high degree of similarity over time shifts that are multiples of their pitch periods, this modelling produces a very efficient representation of the sound. After the linear prediction, the residual/excitation signal is mainly encoded by one of two methods, TCX or CELP.
In TCX [2], the residual/excitation signal is transformed and encoded efficiently in the frequency domain. Some popular TCX codecs are 3GPP AMR-WB+, MPEG USAC. A simple framework of TCX codec is shown in
In the encoder illustrated in
The residual signal Sr(n) is transformed to frequency domain signal Sr(f) using time to frequency transformation method (205), such as Discrete Fourier Transform (DFT) or Modified Discrete Cosine Transform (MDCT).
Quantization is applied on Sr(f) (206) and quantization parameters are multiplexed (207) and transmitted to the decoder side.
In the decoder illustrated in
The quantization parameters are dequantized to reconstruct the decoded frequency domain residual signal {tilde over (S)}r(f) (210).
The decoded frequency domain residual signal {tilde over (S)}r(f) is transformed back to time domain, to reconstruct the decoded time domain residual signal {tilde over (S)}r(n) using frequency to time transformation method (211), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
With the dequantized LPC parameters from the dequantization module (209), the decoded time domain residual signal {tilde over (S)}r(n) is processed by LPC synthesis filter (212) to obtain the decoded time domain signal {tilde over (S)}(n).
In CELP coding, the residual/excitation signal is quantized using a predetermined codebook. To further enhance the sound quality, it is common to transform the difference signal between the original signal and the LPC synthesized signal to the frequency domain and encode it further. Popular CELP codecs include ITU-T G.729.1 [3] and ITU-T G.718 [4]. A simple framework of hierarchical coding (layered coding, embedded coding) of CELP and transform coding is shown in
In the encoder illustrated in
The prediction error signal Se(n) is transformed into frequency domain signal Se(f) using time to frequency transformation method (303), such as Discrete Fourier Transform (DFT) or Modified Discrete Cosine Transform (MDCT).
Quantization is applied on Se(f) (304) and quantization parameters are multiplexed (305) and transmitted to the decoder side.
In the decoder illustrated in
The quantization parameters are dequantized to reconstruct the decoded frequency domain residual signal {tilde over (S)}e(f) (308).
The decoded frequency domain residual signal {tilde over (S)}e(f) is transformed back to time domain, to reconstruct the decoded time domain residual signal {tilde over (S)}e(n) using frequency to time transformation method (309), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
With the CELP parameters, the CELP decoder reconstructs the synthesized signal Ssyn(n) (307). The decoded time domain signal {tilde over (S)}(n) is then reconstructed by summing the CELP synthesized signal Ssyn(n) and the decoded prediction error signal {tilde over (S)}e(n).
The transform coding, and the transform coding part of linear prediction coding, are normally performed using some quantization method.
One such vector quantization method is split multi-rate lattice VQ, also called algebraic VQ (AVQ) [5]. In AMR-WB+ [6], split multi-rate lattice VQ is used to quantize the LPC residual in the TCX domain (as shown in
Split multi-rate lattice VQ is a vector quantization method based on lattice quantizers. Specifically, for the split multi-rate lattice VQ used in AMR-WB+ [6], the spectrum is quantized in blocks of 8 spectral coefficients using vector codebooks composed of subsets of the Gosset lattice, referred to as the RE8 lattice (see [5]).
All points of a given lattice can be generated from the so-called square generator matrix G of the lattice, as c=s·G, where s is a row vector with integer values and c is the generated lattice point.
To form a vector codebook at a given rate, only lattice points inside a sphere (in 8 dimensions) of a given radius are taken. Multi-rate codebooks can thus be formed by taking subsets of lattice points inside spheres of different radii.
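As an illustration of c=s·G and of forming nested codebooks from points inside spheres of different radii, the following Python sketch generates RE8 points and collects those within a given squared radius. The generator matrix G used here is an assumption (the text above does not list it), as are the function names and the deliberately small search range for s.

```python
from itertools import product

# Assumed generator matrix for RE8 (rows are basis vectors); not given in the text.
G = [
    [4, 0, 0, 0, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0, 0, 0],
    [2, 0, 2, 0, 0, 0, 0, 0],
    [2, 0, 0, 2, 0, 0, 0, 0],
    [2, 0, 0, 0, 2, 0, 0, 0],
    [2, 0, 0, 0, 0, 2, 0, 0],
    [2, 0, 0, 0, 0, 0, 2, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

def lattice_point(s):
    """Return c = s * G for an integer row vector s of length 8."""
    return tuple(sum(s[i] * G[i][j] for i in range(8)) for j in range(8))

def is_re8(c):
    """RE8 membership: all components share one parity and sum to a multiple of 4."""
    same_parity = len({x % 2 for x in c}) == 1
    return same_parity and sum(c) % 4 == 0

def points_within(radius_sq, coeff_range=(-1, 0, 1)):
    """Collect generated points with squared norm <= radius_sq.

    Only a limited set of integer vectors s is searched, so the result is a
    subset of the true codebook; it is enough to show the nesting idea.
    """
    found = set()
    for s in product(coeff_range, repeat=8):
        c = lattice_point(s)
        if sum(x * x for x in c) <= radius_sq:
            found.add(c)
    return found

small = points_within(8)    # smaller sphere
large = points_within(16)   # larger sphere contains the smaller codebook
```

Because the smaller sphere lies inside the larger one, every point of `small` is also in `large`, which is the sense in which multi-rate codebooks are subsets of one lattice.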
A simple framework which utilizes the split multi-rate vector quantization in TCX codec is illustrated in
In the encoder illustrated in
The residual signal Sr(n) is transformed to frequency domain signal Sr(f) using time to frequency transformation method (405), such as Discrete Fourier Transform (DFT) or Modified Discrete Cosine Transform (MDCT).
Split multi-rate lattice vector quantization method is applied on Sr(f) (406) and quantization parameters are multiplexed (407) and transmitted to the decoder side.
In the decoder illustrated in
The quantization parameters are dequantized by split multi-rate lattice vector dequantization method to reconstruct the decoded frequency domain residual signal {tilde over (S)}r(f) (410).
The decoded frequency domain residual signal {tilde over (S)}r(f) is transformed back to time domain, to reconstruct the decoded time domain residual signal {tilde over (S)}r(n) using frequency to time transformation method (411), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
With the dequantized LPC parameters from the dequantization module (409), the decoded time domain residual signal {tilde over (S)}r(n) is processed by LPC synthesis filter (412) to obtain the decoded time domain signal {tilde over (S)}(n).
Each codebook consists of a number of code vectors. The code vector index in the codebook is represented by a number of bits, which is derived by Equation 1:
$N_{bits} = \log_2(N_{cv})$ (Equation 1)
where
$N_{bits}$ is the number of bits consumed by the code vector index
$N_{cv}$ is the number of code vectors in the codebook
The codebook Q0 contains only one vector, the null vector, which means that the quantized value of the vector is 0. Therefore, no bits are required for the code vector index.
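The relation between codebook size and index bits can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and rounding up to a whole number of bits for sizes that are not powers of two is an assumption not stated in the text.

```python
def index_bits(num_code_vectors):
    """Bits for a code vector index, i.e. ceil(log2(N_cv)), using integers only."""
    if num_code_vectors < 1:
        raise ValueError("a codebook needs at least one code vector")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (num_code_vectors - 1).bit_length()

q0_bits = index_bits(1)        # a single (null) vector needs no index bits
example_bits = index_bits(256)  # 256 code vectors need 8 bits
```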
There are three sets of quantization parameters for split multi-rate lattice VQ: the index of the global gain, the codebook indications, and the code vector indices. The bitstream is normally formed in one of two ways. The first method is illustrated in
In
In
When few bits are available, or when the spectrum to be quantized concentrates its energy in a certain frequency band, many vectors are quantized as 0 (null vectors). This results in many null vectors in the decoded spectrum; in other words, the spectrum is very sparse.
In the prior art, the codebook indications and code vector indices are directly converted to binary numbers to form the bit stream.
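Therefore, the total bits consumption for all the vectors can be calculated in the following manner:
$Bits_{total} = Bits_{gain} + \sum_{i=1}^{N} Bits_{cb}(i) + \sum_{i=1}^{N} Bits_{cv}(i)$ (Equation 2)
where
$Bits_{total}$ is the total bits consumption
$Bits_{gain}$ is the bits consumption for the quantization index of the global gain
$Bits_{cb}(i)$ is the bits consumption for the codebook indication of the i-th vector
$Bits_{cv}(i)$ is the bits consumption for the code vector index of the i-th vector
N is the total number of vectors in the whole spectrum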
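The total described above (gain bits plus codebook indication bits plus code vector index bits over all N vectors) can be sketched in Python as follows. The per-codebook costs are assumptions made only for illustration: a unary-style indication in which Q0 costs 1 bit and Qn (n≥2) costs n bits, and a code vector index of 4n bits for Qn. The function names and the 7-bit gain in the usage line are likewise hypothetical.

```python
def codebook_indication_bits(n):
    """Assumed unary-style indication: Q0 costs 1 bit, Qn (n >= 2) costs n bits."""
    if n == 0:
        return 1
    if n < 2:
        raise ValueError("Q1 is not part of this assumed scheme")
    return n

def code_vector_index_bits(n):
    """Assumed index size: 4*n bits for Qn, hence 0 bits for the null vector Q0."""
    return 4 * n

def total_bits(gain_bits, codebooks):
    """Gain bits + sum of codebook indication bits + sum of code vector index bits."""
    cb = sum(codebook_indication_bits(n) for n in codebooks)
    cv = sum(code_vector_index_bits(n) for n in codebooks)
    return gain_bits + cb + cv

# Four vectors quantized with Q2, Q0, Q0 and Q3, plus a hypothetical 7-bit gain index.
example_total = total_bits(7, [2, 0, 0, 3])
```

Under these assumptions every null vector still costs one indication bit, which is the overhead the later sections aim to reduce.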
The sparseness of the spectrum is not exploited to save bits; in other words, some bits are wasted to indicate the null vectors.
In this invention, an efficient method is introduced that converts the AVQ codebook indications for null vectors to a more efficient index by exploiting the sparseness of the signal spectrum.
Because Q0 indicates a null vector and all other codebooks indicate non-null vectors, the spectral sparseness information can be obtained by analyzing the codebook indications of all the vectors. This step is named spectral cluster analysis, and the detailed process is illustrated below:
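$Threshold = Bits_{null} = Bits_{indication} + Bits_{index}$ (Equation 3)
where
$Bits_{null}$ is the bits consumption for the null vectors region
$Bits_{indication}$ is the bits consumption to indicate the null vectors region
$Bits_{index}$ is the bits consumption to quantize the index of the ending vector of the null vectors region
Threshold is the threshold to judge the null vectors region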
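The search for a null vectors region can be sketched as a scan for a run of consecutive Q0 indications that is longer than Threshold. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, codebooks are given as integers n for Qn, and picking the longest qualifying run is a choice made here for illustration rather than something the text specifies.

```python
def find_null_region(codebooks, threshold):
    """Return (index_start, index_end) of the longest run of Q0 entries whose
    length exceeds threshold, or None if no run qualifies."""
    best = None
    run_start = None
    # The trailing None acts as a sentinel that closes a run ending at the last vector.
    for i, n in enumerate(list(codebooks) + [None]):
        if n == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length > threshold and (best is None or length > best[1] - best[0] + 1):
                best = (run_start, i - 1)
            run_start = None
    return best

# 22 vectors: a run of 12 null vectors at indices 3..14, plus shorter runs later.
example = [2, 3, 2] + [0] * 12 + [2, 0, 0, 3, 2, 0, 2]
region = find_null_region(example, threshold=8)
```

Runs that do not exceed the threshold are left as ordinary Q0 indications, since replacing them would cost more bits than it saves.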
An example is illustrated in
For the conventional method, the parameters to be transmitted are:
1) Quantization index of the global gain
2) Codebook indications for all the vectors
3) Code vector indices for all the vectors
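The total bits consumption for encoding all the parameters is found as follows (it is assumed that enough bits are available to encode the parameters for all the vectors):
$Bits_{total} = Bits_{gain} + \sum_{i=1}^{N} Bits_{cb}(i) + \sum_{i=1}^{N} Bits_{cv}(i)$ (Equation 4)
where
$Bits_{total}$ is the total bits consumption
$Bits_{gain}$ is the bits consumption for the quantization index of the global gain
$Bits_{cb}(i)$ is the bits consumption for the codebook indication of the i-th vector
$Bits_{cv}(i)$ is the bits consumption for the code vector index of the i-th vector
N is the total number of vectors in the whole spectrum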
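Since the null vectors are quantized by Q0, one bit is consumed for each null vector.
Then,
$Bits_{original} = Bits_{gain} + \sum_{i \notin [Index\_start,\ Index\_end]} \left( Bits_{cb}(i) + Bits_{cv}(i) \right) + (Index\_end - Index\_start + 1)$ (Equation 5)
where
$Bits_{original}$ is the total bits consumption for the conventional method
$Bits_{gain}$ is the bits consumption for the quantization index of the global gain
$Bits_{cb}(i)$ is the bits consumption for the codebook indication of the i-th vector
$Bits_{cv}(i)$ is the bits consumption for the code vector index of the i-th vector
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region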
For the method proposed in this invention, the parameters to be transmitted are:
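The total bits consumption for encoding all the parameters is as follows (it is assumed that enough bits are available to encode the parameters for all the vectors):
$Bits_{new} = Bits_{gain} + \sum_{i \notin [Index\_start,\ Index\_end]} \left( Bits_{cb}(i) + Bits_{cv}(i) \right) + Bits_{indication} + Bits_{Index}$ (Equation 6)
where
$Bits_{new}$ is the total bits consumption for the proposed method in this invention
$Bits_{gain}$ is the bits consumption for the quantization index of the global gain
$Bits_{cb}(i)$ is the bits consumption for the codebook indication of the i-th vector
$Bits_{cv}(i)$ is the bits consumption for the code vector index of the i-th vector
$Bits_{indication}$ is the bits consumption to indicate the null vectors region
$Bits_{Index}$ is the bits consumption to quantize the index of the ending vector of the null vectors region
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region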
By applying the invented method, it is possible to save some bits. The bits saving achieved by the method proposed in this invention is calculated as follows:
$Bits_{save} = (Index\_end - Index\_start + 1) - Bits_{indication} - Bits_{Index}$ (Equation 7)
where
$Bits_{save}$ is the bits saving achieved by the proposed method in this invention
$Bits_{indication}$ is the bits consumption to indicate the null vectors region
$Bits_{Index}$ is the bits consumption to quantize the index of the ending vector of the null vectors region
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
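The bits saving formula above is simple enough to evaluate directly. The Python sketch below uses a hypothetical function name and, as example inputs, a region spanning indices 3 to 14 with 6 indication bits and 2 index bits; these numbers are chosen only for illustration.

```python
def bits_saved(index_start, index_end, bits_indication, bits_index):
    """Bits_save = (Index_end - Index_start + 1) - Bits_indication - Bits_Index."""
    num_null = index_end - index_start + 1  # one Q0 bit per null vector is replaced
    return num_null - bits_indication - bits_index

# 12 null vectors cost 12 bits conventionally and 6 + 2 = 8 bits as a region.
saving = bits_saved(3, 14, bits_indication=6, bits_index=2)
```

A region of exactly 8 null vectors would give a saving of 0 with these inputs, which is why the region must be strictly longer than the threshold.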
In step 2) of the spectral cluster analysis, it is checked whether the number of vectors in the null vectors region is larger than Threshold:
$Num_{null} = Index\_end - Index\_start + 1 > Threshold$ (Equation 8)
where
Threshold is the threshold to judge the null vectors region
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
$Num_{null}$ is the number of null vectors in the null vectors region
Threshold is determined by equation 3.
From equation 3 and equation 8, the following conclusion can be drawn:
$(Index\_end - Index\_start + 1) > (Bits_{indication} + Bits_{Index})$ (Equation 9)
where
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
$Bits_{indication}$ is the bits consumption to indicate the null vectors region
$Bits_{Index}$ is the bits consumption to quantize the index of the ending vector of the null vectors region
Therefore, bits saving is achieved by the proposed method in this invention ($Bits_{save} > 0$).
The main principle of the invention is described in this section with the aid of
In the encoder illustrated in
Psychoacoustic model analysis is done on the frequency domain signal S(f) to derive the masking curve (1002). Split multi-rate lattice vector quantization is applied on the frequency domain signal S(f) according to the masking curve derived from the psychoacoustic model analysis to ensure that the quantization noise is inaudible (1003).
The split multi-rate lattice vector quantization has three sets of quantization parameters: the quantization index of the global gain, the codebook indications, and the code vector indices.
The codebook indications are sent for spectral clusters analysis (1004). The spectral sparseness information is extracted by the spectral clusters analysis, and it is used to convert the codebook indications to another set of codebook indications (1005).
The global gain index, the code vector indices and the new codebook indications are multiplexed (1006) and transmitted to the decoder side.
In the decoder illustrated in
The new codebook indications are used to decode the original codebook indications (1008). The global gain index, the code vector indices and the original codebook indications are dequantized by the split multi-rate lattice vector dequantization method (1009) to reconstruct the decoded frequency domain signal {tilde over (S)}(f).
The decoded frequency domain signal {tilde over (S)}(f) is transformed back to time domain, to reconstruct the decoded time domain signal {tilde over (S)}(n) using frequency to time transformation method (1010), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
The proposed implementation method of spectral clusters analysis and codebook indications encoder is illustrated in
In
In this method, there are 5 steps, and each step is illustrated with figures. In this illustration, suppose that there are in total 22 vectors and the vector index starts from 0 and ends at 21.
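$Threshold = Bits_{null} = Bits_{indication} + Bits_{Index}$ (Equation 10)
where
$Bits_{null}$ is the bits consumption for the null vectors region
$Bits_{indication}$ is the bits consumption to indicate the null vectors region
$Bits_{Index}$ is the bits consumption to quantize Index_end
In this example, 6 bits and 2 bits are assigned to $Bits_{indication}$ and $Bits_{Index}$ respectively, so Threshold is 8.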
In
In
From these two tables, it can be seen that the null vectors region is indicated by the indication originally assigned to the Q6 codebook. A 2-bit codebook is used to quantize the possible values of Index_end. Therefore, the total bits consumption for the null vectors region is 8. The codebooks Qn (n≥6) use the indication of Qn+1, which means that their bits consumption is one bit higher than with the original indication.
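The effect of this remapping on indication lengths can be sketched in Python. The original table is not reproduced here, so the sketch assumes a unary-style indication in which Q0 costs 1 bit and Qn (n≥2) costs n bits, which is consistent with the 6-bit Q6 indication mentioned above; the function and constant names are hypothetical.

```python
def original_indication_bits(n):
    """Assumed original table: Q0 -> 1 bit, Qn (n >= 2) -> n bits."""
    return 1 if n == 0 else n

def new_indication_bits(n):
    """Table with the null-vectors-region indication placed at the Q6 slot.

    Codebooks below Q6 are unchanged; Qn (n >= 6) moves to the indication
    of Qn+1 and therefore costs one more bit.
    """
    return original_indication_bits(n) + (1 if n >= 6 else 0)

NULL_REGION_INDICATION_BITS = original_indication_bits(6)  # reuses the Q6 indication
INDEX_END_BITS = 2                                         # 2-bit codebook for Index_end
NULL_REGION_TOTAL_BITS = NULL_REGION_INDICATION_BITS + INDEX_END_BITS
```

The design trades one extra bit on rarely used high-order codebooks for a compact representation of long null runs.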
In order to quantize Index_end using a 2-bit codebook, the representative values are determined adaptively according to the range of the possible values of Index_end. The range of the possible values of Index_end is split into 4 portions. Each portion is represented by one representative value. The step (number of null vectors) of each portion is determined by the equation below:
$cb\_step = \lfloor (Max - Min + 1)/4 \rfloor = \lfloor (21 - 11 + 1)/4 \rfloor = 2$ (Equation 11)
where
cb_step means the average number of values in each portion
Max is the maximum possible value of Index_end
Min is the minimum possible value of Index_end
The representative value is determined by the equation below:
$\widehat{Index\_end} = Index\_start + Threshold + cv \cdot cb\_step, \quad cv \in \{0, 1, 2, 3\}$ (Equation 12)
where
Index_start is the index of the starting vector of the null vectors region
$\widehat{Index\_end}$ is the representative (quantized) value of Index_end, the index of the ending vector of the null vectors region
Threshold is the threshold to judge the null vectors region
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
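The adaptive 2-bit quantization of Index_end can be sketched in Python as follows. Several points are assumptions made for illustration: the representative of each portion is taken as its minimum value (Min + cv·cb_step, with Min = Index_start + Threshold and Max = total number of vectors − 1), the encoder picks the largest representative not exceeding the true Index_end, and cb_step is kept at 1 or more when the range is shorter than 4 values. The function name is hypothetical.

```python
def quantize_index_end(index_start, index_end, threshold, total_vectors, levels=4):
    """Return (cv, representative, cb_step) for a forward-searching sketch."""
    min_end = index_start + threshold   # smallest Index_end that forms a region
    max_end = total_vectors - 1         # region may extend to the last vector
    if not min_end <= index_end <= max_end:
        raise ValueError("index_end is outside the possible range")
    # floor((Max - Min + 1) / levels), kept at 1 or more (assumption for short ranges)
    cb_step = max(1, (max_end - min_end + 1) // levels)
    # Largest code vector whose representative does not exceed the true Index_end,
    # so that no non-null vector is ever declared null.
    cv = min((index_end - min_end) // cb_step, levels - 1)
    representative = min_end + cv * cb_step
    return cv, representative, cb_step

# 22 vectors, region starting at index 3, Threshold 8: Min = 11, Max = 21, cb_step = 2.
cv, representative, cb_step = quantize_index_end(3, 14, 8, 22)
```

With this choice, any null vectors between the representative and the true Index_end would still be sent as ordinary Q0 indications, so the quantization error costs bits but never corrupts the spectrum.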
In this example, the total bits consumption to encode all the codebook indications by original method is:
where
Bitscb
Bitscb
N is the total number of vectors in the whole spectrum
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
In this example, the total bits consumption to encode all the codebook indications by the invented method is:
where
Bitscb
Bitscb
N is the total number of vectors in the whole spectrum
Bitsindication is the bits consumption to indicate the null vectors region
Bits
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
The bits saving achieved by the method proposed in this invention is calculated as follows:
where Bitscb
Bitscb
Bitsindication is the bits consumption to indicate the null vectors region
Bits
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
The step (number of null vectors) of each portion is determined by the equation below:
cb_step=(Max−Min+1)/4=(21−11+1)/4=2.75 (Equation 16)
where cb_step means the average number of values in each portion
Max is the maximum possible value of Index_end
Min is the minimum possible value of Index_end
The value of Index_end which is represented by the code vector is determined by the equation below:
cv ∈ {0, 1, 2, 3}
where Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
Threshold is the threshold to judge the null vectors region
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
In this example, the total bits consumption to encode all the codebook indications by original method is:
where
Bitscb
Bitscb
N is the total number of vectors in the whole spectrum
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
In this example, the total bits consumption to encode all the codebook indications by the proposed method is:
where
Bitscb
Bitscb
N is the total number of vectors in the whole spectrum
Bitsindication is the bits consumption to indicate the null vectors region
Bits
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
The bits saving achieved by the method proposed in this invention is calculated as follows:
where Bitscb
Bitscb
Bitsindication is the bits consumption to indicate the null vectors region
Bits
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
The methods to determine the code vectors are not limited to the examples given above. Those who are skilled in the art will be able to modify and adapt other methods without deviating from the spirit of the invention.
In this embodiment, by performing spectral analysis on the split multi-rate vector quantized spectrum, the spectrum is split into a null vectors region and a non-null vectors region.
For the null vectors region, instead of transmitting a Q0 indication for every null vector, an indication of the null vectors region and the quantized value of the index of the ending vector (denoted as the ending index) of the null vectors region are transmitted.
The indication of the null vectors region uses one of the codebook indications that is not used frequently. The codebook that originally used this indication is indicated by another indication.
The ending index is quantized by an adaptively designed codebook. All the possible values of the ending index are split into a few portions, and the length of each portion is adaptively determined according to the total number of possible values of the ending index. Each portion is represented by one of the representative values in the codebook.
Therefore, bits saving is achieved by applying the inventive method to consecutive null vectors.
Furthermore, in this embodiment, the value of the ending index is quantized by a codebook whose number of representative values is denoted as N. The range of the possible values of the ending index is split into N portions. The minimum value in each portion is selected as the representative value of the portion.
Therefore, there is also an advantage that the bits consumption for the codebook of the ending index is fixed, while the representative values are adaptively determined according to the range of the possible values of the ending index, so that the ending index can be quantized efficiently for different scenarios.
Furthermore, as shown in
In this case, the indication of the null vectors region uses one of the codebook indications that is not used frequently, and one more bit is used to indicate whether it denotes the null vectors region or the original codebook.
Therefore, there is an advantage that only one codebook indication is affected while all other codebook indications remain the same. If the indication is chosen appropriately (that is, it is not used very frequently as a codebook indication), more bits can be saved.
When the null vectors region is in the lower frequency range, the starting index (the index of the starting vector in the null vectors region) is quantized instead of the ending index. The bit stream is reversed, so that the ending index is known on the decoder side. It is preferable to compare the bits saving of quantizing the starting index with that of quantizing the ending index, so that the method which saves more bits can be used.
As shown in
where
cb_step means the average number of values in each portion
Max is the maximum possible value of Index_end
Min is the minimum possible value of Index_end
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
Threshold is the threshold to decide whether a null vectors portion is the null vectors region
The representative value is determined by the equation below:
cv ∈ {0, 1, 2, 3}
where
Index_start is the index of the starting vector of the null vectors region
Threshold is the threshold to judge the null vectors region
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
Because the Cb_step is very large, the difference between the neighbouring values of
For some conditions, the error between the quantized value and the real value of Index_end is large too. In this example,
Errorfs=Index_end−
where,
Index_end is the index of the ending vector of the null vectors region
Errorfs is the quantization error of the Index_end
Therefore, a method which quantizes the starting index instead of the ending index is proposed, and the series of codebook indications will be reversed to notify the value of Index_end to the decoder.
For the example in
$Threshold = Bits_{null} = Bits_{indication} + Bits_{Index} = 7 + 2 = 9$
$Max = Index\_end - Threshold = 3$ (Equation 28)
where,
cb_step means the average number of values in each portion
Max is the maximum possible value of Index_start
Min is the minimum possible value of Index_start
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
Threshold is the threshold to decide whether a null vectors portion is the null vectors region
$Bits_{null}$ is the bits consumption for the null vectors region
$Bits_{indication}$ is the bits consumption to indicate the null vectors region; in this example, 7 bits are consumed
$Bits_{Index}$ is the bits consumption to quantize the index of the starting vector of the null vectors region
The cb_step and the representative values of Index_start,
$cb\_step = \lfloor (Max - Min + 1)/4 \rfloor = \lfloor (3 - 0 + 1)/4 \rfloor = 1$ (Equation 29)
cv ∈ {0, 1, 2, 3}
where,
Index_end is the index of the ending vector of the null vectors region
Threshold is the threshold to judge the null vectors region
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
cb_step=(Max−Min+1)/4=(3−0+1)/4=1 (Equation 32)
cv ∈ {0, 1, 2, 3}
where
Index_end is the index of the ending vector of the null vectors region
Threshold is the threshold to judge the null vectors region
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
From equation 31 and equation 34, it can be seen that the
The Cb_step and the representative values of Index_start,
Errorbs=Index_start−
where,
Index_start is the index of the starting vector of the null vectors region
Errorbs is the quantization error of the Index_start
The method in embodiment 1 is named forward searching, as it determines cb_step from Index_start and the total number of vectors. The method in this embodiment is named backward searching, as it determines cb_step from Index_end.
Although one more bit is consumed to indicate the backward searching method (9 bits for the indication of backward searching, 8 bits for the indication of forward searching), the backward searching method still saves one more bit overall compared with the forward searching method.
Bitssave
where,
Bitssave
Errorfs is the quantization error of the Index_end in forward searching
Errorbs is the quantization error of the Index_start in backward searching
In
In the codebook table for inventive method, the forward searching indication is not changed. And the backward searching is indicated by adding one 0 in front of the forward searching. This indication would not be misinterpreted as Q0+forward searching (0+111110) as it is not possible to have a null vector before the null vectors region.
In the decoder side, there are 3 steps to reconstruct the list of the codebook indications.
In this embodiment, when the null vectors region is in the lower frequency range, the starting index (the index of the starting vector in the null vectors region) is quantized instead of the ending index. The bit stream is reversed, so that the ending index is known on the decoder side. It is preferable to compare the bits saving of quantizing the starting index with that of quantizing the ending index, so that the method which saves more bits can be used. Therefore, a larger bits saving can be achieved.
In embodiment 2, the reverse operation requires more computational power. In this embodiment, a method which requires no reversal of the list of the codebook indications is proposed.
For the backward searching method, cb_step is calculated by the following equation:
$cb\_step = \lfloor (Index\_end - 8)/4 \rfloor$ (Equation 37)
where
Index_end is the index of the ending vector of the null vectors region
cb_step is the number of values in each portion
The number of null vectors in the null vectors region is calculated by the following equation:
$no\_null = 10 + cv \cdot cb\_step$ (Equation 38)
cv ∈ {0, 1, 2, 3}
where
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
no_null is the number of null vectors in the null vector region
From equations 37 and 38, the following equation can be derived:
$Index\_end - Index\_start + 1 = 10 + cv \cdot \lfloor (Index\_end - 8)/4 \rfloor$ (Equation 39)
Here, if 'Index_end−8' is a multiple of 4, then equation (39) is modified to equation (43) in a few steps:
where
cv is the code vector to represent the value of Index_end
cb_step is the number of values in each portion
no_null is the number of null vectors in the null vector region
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
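The intermediate steps are not reproduced above. One way to carry them out, assuming that Index_end−8 is a multiple of 4 so that the floor in equation 39 can be dropped, is sketched below; the final line may differ in form from the original equation 43, but it exhibits the same coefficient cv/(4−cv).

```latex
\begin{align}
Index\_end - Index\_start + 1 &= 10 + cv \cdot \frac{Index\_end - 8}{4} \\
4\,Index\_end - 4\,Index\_start + 4 &= 40 + cv \cdot Index\_end - 8\,cv \\
(4 - cv)\,Index\_end &= 4\,Index\_start + 36 - 8\,cv \\
no\_null = Index\_end - Index\_start + 1 &= 10 + \frac{cv}{4 - cv}\,\left(Index\_start + 1\right)
\end{align}
```

As a check, cv=2 and Index_start=1 give no_null=12 and Index_end=12, which agrees with equations 37 and 38 (cb_step=1, no_null=10+2·1).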
From equation 43, it is possible to design the values of cv/(4-cv) so that number of null vectors can be derived from the value of Index_start.
The set of coefficients can be defined as
where
cv is the code vector to represent the value of Index_end
as an example.
In this embodiment, instead of reversing the bit stream, the number of null vectors is quantized as a scalar multiplied by the value of the starting index. It is preferable to train the scalars beforehand, with each scalar represented by one of the code vectors in the codebook. This embodiment has the advantage that bit stream reversal is avoided and complexity is reduced.
In this embodiment, it is possible to reduce the bits consumption according to the range of the possible values of the Index_end.
Min=Index_start+Threshold (Equation 45)
where
Min is the minimum possible value of Index_end
Index_start is the index of the starting vector of the null vectors region
Index_end is the index of the ending vector of the null vectors region
Threshold is the threshold to decide whether a null vectors portion is the null vectors region
The maximum possible value of the Index_end, denoted as Max, is:
Max=Total_num_of_vectors−1 (Equation 46)
where
Max is the maximum possible value of Index_end
Total_num_of_vectors is the total number of vectors in the spectrum
Then the range of the possible values of the Index_end is from Min to Max.
If Length is defined as the total number of possible values of Index_end (that is, Max−Min+1), there are 4 different cases according to the value of Length:
Case 1 (Length=1): No bit is required to indicate the value of Index_end as there is only one possibility. Total bits consumption=6
Case 2 (Length=2): One bit is required to indicate the value of Index_end as there are only two possibilities. Total bits consumption=6+1=7
Case 3 (Length=3): Two bits are required to indicate the value of Index_end as there are three possibilities. Total bits consumption=6+2=8
Case 4 (Length>3): The values of Index_end are quantized by a 2-bit codebook (which has 4 representative values). All the possible values of Index_end are split into 4 portions.
Each portion is represented by one representative value. Total bits consumption=6+2=8
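The case analysis above maps directly onto a small function. In this Python sketch the function names are hypothetical, the 6-bit region indication is taken from the totals quoted above, and Min and Max follow equations 45 and 46.

```python
def index_end_bits(index_start, threshold, total_vectors):
    """Bits used for Index_end, chosen from the number of possible values (Length)."""
    min_end = index_start + threshold   # Equation 45
    max_end = total_vectors - 1         # Equation 46
    length = max_end - min_end + 1
    if length < 1:
        raise ValueError("no null vectors region can start at this index")
    if length == 1:
        return 0   # only one possibility, nothing to transmit
    if length == 2:
        return 1   # two possibilities
    return 2       # three possibilities, or a 2-bit codebook with 4 representatives

def null_region_bits(index_start, threshold, total_vectors, indication_bits=6):
    """Total bits for the region: indication plus the adaptive Index_end bits."""
    return indication_bits + index_end_bits(index_start, threshold, total_vectors)

# 22 vectors and Threshold 8: a region starting at index 13 can only end at index 21.
late_region_bits = null_region_bits(13, 8, 22)
```

Regions that start late in the spectrum therefore cost fewer bits, because the decoder can infer most or all of Index_end from Index_start alone.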
In this embodiment, the number of bits used to represent the code vectors is decided adaptively according to the number of possible values of the ending index. For example, if there is only one possible value, no bit is required to indicate the number of null vectors. This embodiment has the advantage that more bits can be saved.
For the indication method of the null vectors region in embodiment 1, each codebook indication for Qn (n≥6) consumes one more bit than in the conventional method. If the input signal has M vectors which are quantized by Qn (n≥6) and has no null vectors region, then M more bits are wasted on the codebook indications compared with the conventional method.
In this embodiment, a more efficient indication method for the null vectors region is proposed.
As shown in
In
Case 1: No vector uses a codebook Qn (n≥6) and no null vectors region exists when index<=Total_num_of_vectors-Threshold
Table 1 is used and no indication is required to indicate the indication table
Case 2: A null vectors region exists when index<=Total_num_of_vectors-Threshold
Table 2 is used and the indication is done on the first vector whose codebook is higher than Q5. It is preferable to ensure that the bits saving achieved by the null vectors representation is larger than the bits increment caused by vectors which use a codebook Qn (n≥6)
Case 3: No null vectors region exists, but some vectors use a codebook higher than Q5 when index<=Total_num_of_vectors-Threshold
Table 1 is used and the indication is done on the first vector whose codebook is higher than Q5
For the null vectors region indication in this embodiment, two indication tables are utilized. For the frames which have no null vectors region, conventional indication table is utilized.
For the frames which have null vectors region, the null vectors region indication table is utilized. One bit is consumed to indicate which table is utilized when necessary. In this embodiment, the bits waste to indicate the higher codebooks for the frames which have no null vectors region is limited to 1 bit.
For the frames which have the null vectors region up to the last vector, a specific indication is used, so that the errors in the number of null vectors caused by the Cb_step can be avoided.
The indication table is shown in the
In this embodiment, for the frames which have the null vectors region up to the last vector, a specific indication is used, so that the quantization error of the ending index can be avoided. Therefore, there is an advantage that more bits can be saved for the frames which have the null vectors region up to the last vector.
The feature of this embodiment is that the invented methods are applied in a TCX codec.
The proposed idea is illustrated in
In the encoder illustrated in
The residual signal Sr(n) is transformed into frequency domain signal Sr(f) using time to frequency transformation method (2505), such as Discrete Fourier Transform (DFT) or Modified Discrete Cosine Transform (MDCT).
Split multi-rate lattice vector quantization is applied on the frequency domain signal Sr(f) (2506).
The split multi-rate lattice vector quantization has three sets of quantization parameters: the quantization index of the global gain, the codebook indications and the code vector indices.
The codebook indications are sent for spectral clusters analysis (2507). The spectral sparseness information is extracted by the spectral clusters analysis and is used to convert the codebook indications to another set of codebook indications (2508).
The global gain index, the code vector indices and the new codebook indications are multiplexed (2509) and transmitted to the decoder side.
In the decoder illustrated in
The new codebook indications are used to decode the original codebook indications (2511). The global gain index, the code vector indices and the original codebook indications are dequantized by the split multi-rate lattice vector dequantization method (2512) to reconstruct the decoded frequency domain signal {tilde over (S)}r(f).
The decoded frequency domain residual signal {tilde over (S)}r(f) is transformed back to time domain, to reconstruct the decoded time domain residual signal {tilde over (S)}r(n) using frequency to time transformation method (2513), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
With the dequantized LPC parameters from the dequantization module (2514), the decoded time domain residual signal {tilde over (S)}r(n) is processed by LPC synthesis filter (2515) to obtain the decoded time domain signal {tilde over (S)}(n).
The feature of this embodiment is that the spectral cluster analysis method is applied in hierarchical coding (layered coding, embedded coding) of CELP and transform coding.
In the encoder illustrated in
The prediction error signal Se(n) is transformed into frequency domain signal Se(f) using time to frequency transformation method (2603), such as Discrete Fourier Transform (DFT) or Modified Discrete Cosine Transform (MDCT).
Split multi-rate lattice vector quantization is applied on the frequency domain signal Se(f) (2604).
The split multi-rate lattice vector quantization has three sets of quantization parameters: the quantization index of the global gain, the codebook indications and the code vector indices.
The codebook indications are sent for spectral clusters analysis (2605). The spectral sparseness information is extracted by the spectral clusters analysis and is used to convert the codebook indications to another set of codebook indications (2606).
The global gain index, the code vector indices and the new codebook indications are multiplexed (2607) and transmitted to the decoder side.
In the decoder illustrated in
The new codebook indications are used to decode the original codebook indications (2609). The global gain index, the code vector indices and the original codebook indications are dequantized by the split multi-rate lattice vector dequantization method (2610) to reconstruct the decoded frequency domain signal {tilde over (S)}e(f).
The decoded frequency domain residual signal {tilde over (S)}e(f) is transformed back to time domain, to reconstruct the decoded time domain residual signal {tilde over (S)}e(n) using frequency to time transformation method (2611), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
With the CELP parameters, the CELP decoder reconstructs the synthesized signal Ssyn(n) (2612). The decoded time domain signal {tilde over (S)}(n) is then reconstructed by summing the CELP synthesized signal Ssyn(n) and the decoded prediction error signal {tilde over (S)}e(n).
In this embodiment, as shown in
The encoding and decoding process is almost the same as in embodiment 8, except that the index of the global gain, or the global gain itself, from the split multi-rate lattice vector quantization is sent to the adaptive gain quantization block (2706). Instead of directly quantizing the global gain, the adaptive gain quantization method exploits the relationship between the synthesized signal and the coding error signal which is quantized by the split multi-rate lattice vector quantization, so that the global gain can be quantized more efficiently in a smaller range.
There are two methods to implement the AVQ gain quantization:
Method 1
Step 1: Search for the maximum absolute value syn_max of the synthesized signal Ssyn(f)
Step 2: Compute the ratio AVQ_gain/syn_max
Step 3: Quantize the ratio AVQ_gain/syn_max in a narrowed-down range (it is preferable to train the narrowed-down range using different signal sequences beforehand)
Method 2
Step 1: Search for the maximum absolute value syn_max of the synthesized signal Ssyn(f)
Step 2: Quantize AVQ_gain; suppose the resulting index is Index1
Step 3: Quantize syn_max; suppose the resulting index is Index2
Step 4: Transmit Index2-Index1 in a narrowed-down range
(it is preferable to train the narrowed-down range using different signal sequences beforehand)
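The two methods above can be sketched as follows. This is a minimal illustration, not the embodiment's quantizer: it assumes a uniform scalar quantizer with at least 1 bit, a non-zero syn_max, and a decoder that recomputes syn_max from its own synthesized signal. All function names and the range parameters `lo` and `hi` are hypothetical; in practice the narrowed-down range would be trained beforehand as stated above.

```python
def quantize_uniform(value, lo, hi, bits):
    """Uniform scalar quantizer over [lo, hi]; returns an index in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)  # values outside the range are clamped
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize_uniform(index, lo, hi, bits):
    """Inverse of quantize_uniform: map an index back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + index * (hi - lo) / levels


def syn_max_of(syn_spectrum):
    """Step 1 of both methods: maximum absolute value of the synthesized signal."""
    return max(abs(x) for x in syn_spectrum)


def method1_encode(avq_gain, syn_spectrum, lo, hi, bits):
    """Method 1: quantize the ratio AVQ_gain/syn_max in the narrowed-down range."""
    return quantize_uniform(avq_gain / syn_max_of(syn_spectrum), lo, hi, bits)


def method1_decode(index, syn_spectrum, lo, hi, bits):
    """Recover the gain from the ratio index and the locally computed syn_max."""
    return dequantize_uniform(index, lo, hi, bits) * syn_max_of(syn_spectrum)


def method2_encode(avq_gain, syn_spectrum, lo, hi, bits):
    """Method 2: quantize both values and return the difference Index2 - Index1."""
    index1 = quantize_uniform(avq_gain, lo, hi, bits)
    index2 = quantize_uniform(syn_max_of(syn_spectrum), lo, hi, bits)
    return index2 - index1


def method2_decode(diff, syn_spectrum, lo, hi, bits):
    """Recompute Index2 from the synthesized signal, then recover Index1 and the gain."""
    index2 = quantize_uniform(syn_max_of(syn_spectrum), lo, hi, bits)
    return dequantize_uniform(index2 - diff, lo, hi, bits)
```

In Method 2 only the index difference is transmitted, so the saving depends on how close the two indices typically are, which is what the trained narrowed-down range is meant to capture.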
If the CELP core codec has different bit rates, it is preferable to design different narrowed-down ranges for the different bit rates of the CELP coder. As shown in
In this embodiment, an adaptive global gain quantization method is introduced. The method consists of steps:
Because the searching range of the gain is narrowed down, fewer bits are required for the gain quantization.
The feature of this embodiment is that the bits saved by the spectral cluster analysis method are utilized to improve the gain accuracy for the quantized vectors.
The encoding and decoding process is almost the same as in embodiment 1, except that the bits saved from the proposed method in embodiment 1 are used to improve the gain accuracy by applying the adaptive vector gain correction on the global gain (2906).
The adaptive vector gain correction is designed to correct the gain according to the number of bits saved by the spectral clusters analysis method. If only a few bits are saved, the spectrum is split into a smaller number of sub bands; if many bits are saved, the spectrum is split into a larger number of sub bands. In either case, one gain correction factor is computed for each sub band. The gain correction factor for the sub band which has the coefficients indexed from M to N can be computed by the equation below:
where
S(f) are the input spectral coefficients to the split multi-rate VQ
Snorm(f) are the output spectral coefficients from the split multi-rate VQ
M is the starting index of the coefficients in the target sub band
N is the last index of the coefficients in the target sub band
Gainoriginal is the original global gain
Gainnew is the new gain derived for the target subband
Gaincorrection is the derived correction factor for the target subband
The gain correction factors are multiplexed (2907) and transmitted to decoder side.
In the decoder side, the gain correction factors are used to correct the decoded spectrum {tilde over (S)}(f) (2911) according to the equation below:
{tilde over (S)}′(f)=
where
{tilde over (S)}(f) are the decoded spectral coefficients from the split multi-rate VQ
{tilde over (S)}′(f) are the gain corrected spectral coefficients
Gaincorrection is the derived correction factor for the target subband
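The per-sub-band gain correction can be sketched as follows. The formulas in this sketch are assumptions and are not taken from the embodiment: Gainnew is computed as a least-squares fit of Snorm(f) to S(f) over the coefficients M to N, Gaincorrection is taken as Gainnew/Gainoriginal, and the decoder is assumed to multiply each decoded coefficient by the factor of its sub band. The rule mapping saved bits to a number of sub bands, and all function names, are likewise hypothetical.

```python
def num_subbands(bits_saved, bits_per_factor, max_bands):
    """More saved bits allow more sub bands (one correction factor per band)."""
    return max(0, min(max_bands, bits_saved // bits_per_factor))


def band_edges(length, bands):
    """Split indices 0..length-1 into `bands` contiguous ranges (M, N), N inclusive."""
    return [(b * length // bands, (b + 1) * length // bands - 1) for b in range(bands)]


def gain_correction(s, s_norm, m, n, gain_original):
    """Assumed least-squares gain for coefficients M..N, relative to the global gain."""
    num = sum(s[f] * s_norm[f] for f in range(m, n + 1))
    den = sum(s_norm[f] ** 2 for f in range(m, n + 1))
    if den == 0.0:
        return 1.0  # all-zero band: leave the global gain unchanged
    gain_new = num / den
    return gain_new / gain_original


def apply_correction(decoded, edges, factors):
    """Decoder side: scale each decoded coefficient by its sub band's factor."""
    out = list(decoded)
    for (m, n), g in zip(edges, factors):
        for f in range(m, n + 1):
            out[f] *= g
    return out
```

In this sketch the decoded spectrum is taken to be Gainoriginal multiplied by Snorm(f), so a correction factor of 1.0 leaves a sub band unchanged.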
The gain corrected spectrum {tilde over (S)}′(f) is transformed back to time domain, to reconstruct the decoded time domain signal {tilde over (S)}(n) using frequency to time transformation method (2912), such as Inverse Discrete Fourier Transform (IDFT) or Inverse Modified Discrete Cosine Transform (IMDCT).
In this embodiment, the bits saved by the spectral cluster analysis are utilized to give a finer resolution to the global gain by dividing the spectrum into smaller bands and assigning a ‘gain correction factor’ to each band. By utilizing the saved bits to transmit the gain correction factors, the quantization performance, and therefore the sound quality, can be improved.
The spectral cluster analysis method can be applied to the encoding of stereo or multi-channel signals. For example, the invented method is applied to the encoding of side-signals, and the saved bits are used in principal-signal coding. This would bring a subjective quality improvement because the principal-signal is perceptually more important than the side-signal.
Furthermore, the spectral cluster analysis (SCA) method can be applied to a codec which encodes spectral coefficients on a plural-frame basis (or plural-subframe basis). In this application, the bits saved by SCA can be accumulated and utilized to encode spectral coefficients or some other parameters in the next coding stage.
Furthermore, the bits saved by the spectral cluster analysis can be utilized in FEC (Frame Erasure Concealment), so that the sound quality can be retained in frame loss scenarios.
Although all of the embodiments above are explained using split multi-rate lattice vector quantization, this invention is not limited to the use of split multi-rate lattice vector quantization, and it can be applied to other spectral coefficient coding methods. Those who are skilled in the art will be able to modify and adapt this invention without deviating from the spirit of the invention.
Also, although the decoding apparatus of the above embodiments performs processing using encoded information outputted from the encoding apparatus of the above embodiments, the present invention is not limited to this, and, even if encoded information is not transmitted from the encoding apparatus, the decoding apparatus can perform processing as long as this encoded data contains necessary parameters and data.
Also, the encoding apparatus and decoding apparatus according to the present invention can be mounted on a communication terminal apparatus and base station apparatus in a mobile communication system, so that it is possible to provide a communication terminal apparatus, base station apparatus and mobile communication system having the same operational effects as above.
Although example cases have been described with the above embodiments where the present invention is implemented by hardware, the present invention can be implemented by software in cooperation with hardware.
Also, the present invention is applicable even to a case where a signal processing program is operated after being recorded or written in a mechanically readable recording medium such as a memory, disk, tape, CD, and DVD, so that it is possible to provide the same operations and effects as in the present embodiments.
Furthermore, each function block employed in the description of each of the aforementioned embodiments may typically be implemented as an LSI constituted by an integrated circuit. These may be individual chips or partially or totally contained on a single chip. “LSI” is adopted here but this may also be referred to as “IC,” “system LSI,” “super LSI,” or “ultra LSI” depending on differing extents of integration.
Further, the method of circuit integration is not limited to LSI's, and implementation using dedicated circuitry or general purpose processors is also possible. After LSI manufacture, utilization of an FPGA (Field Programmable Gate Array) or a reconfigurable processor where connections and settings of circuit cells in an LSI can be reconfigured is also possible.
Further, if integrated circuit technology emerges to replace LSI's as a result of the advancement of semiconductor technology or another derivative technology, it is naturally also possible to carry out function block integration using this technology. Application of biotechnology is also possible.
The disclosure of Japanese Patent Application No. 2010-154232, filed on Jul. 6, 2010, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.
The encoding apparatus, decoding apparatus and encoding and decoding methods according to the present invention are applicable to a wireless communication terminal apparatus, base station apparatus in a mobile communication system, tele-conference terminal apparatus, video conference terminal apparatus and voice over internet protocol (VoIP) terminal apparatus.
Number | Date | Country | Kind |
---|---|---|---|
2010-154232 | Jul 2010 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/003884 | 7/6/2011 | WO | 00 | 12/27/2012 |