This invention relates to methods of coding and decoding low-bit rate acoustic signals in the mobile communication system and Internet wherein acoustic signals, such as speech signals and music signals, are encoded and transmitted, and also relates to acoustic parameter coding and decoding methods and devices applied thereto, and programs for conducting these methods by a computer.
In the fields of digital mobile communication and speech storage, speech coding devices that compress and encode speech information with high efficiency have been used in order to make effective use of radio waves and storage media. To express high-quality speech signals even at a low bit rate, these devices employ a model suited to representing speech signals. One system in wide practical use at bit rates of 4 kbit/s to 8 kbit/s is the CELP (Code Excited Linear Prediction) system. CELP is disclosed in M. R. Schroeder and B. S. Atal: “Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates”, Proc. ICASSP-85, 25.1.1, pp. 937–940, 1985.
The CELP type speech coding system is based on a speech synthesis model corresponding to the human vocal tract mechanism: the speech signal is synthesized by a filter, expressed by linear predictive coefficients that indicate the vocal tract characteristics, and an excitation signal that drives the filter. More particularly, a digitized speech signal is divided into frames of a certain length (about 5 ms to 50 ms), linear prediction of the speech signal is carried out for every frame, and the prediction residual (excitation signal) is encoded by using an adaptive code vector formed of a known waveform and a fixed code vector. The adaptive code vector is stored in an adaptive codebook as a vector that expresses a driving sound source signal generated in the past, and is used for expressing periodic components of the speech signal. The fixed code vector is stored in a fixed codebook as a vector prepared in advance and having a predetermined number of waveforms, and is mainly used for expressing aperiodic components that cannot be expressed by the adaptive codebook. As the vector stored in the fixed codebook, a vector formed of a random noise sequence and a vector expressed by a combination of several pulses are used.
As a representative example of the fixed codebooks that express the fixed code vectors by the combination of several pulses, there is an algebraic fixed codebook. More specific contents of the algebraic fixed codebook are shown in “ITU-T Recommendation G. 729” and the like.
In the conventional speech coding system, the linear predictive coefficients of the speech are converted into parameters such as partial autocorrelation (PARCOR) coefficients or line spectrum pairs (LSP, also called line spectrum frequencies), quantized and converted into digital codes, and then stored or transmitted. The details of these methods are described, for example, in “Digital Speech Processing” (Tokai University Press) by Sadaoki Furui.
One method of coding the LSP parameter, as part of the coding of the linear predictive coefficients, works as follows. The quantized parameter of the current frame is expressed by a weighted vector, in which the code vectors outputted from the vector codebook in the current frame and in one or more past frames are each multiplied by a weighting coefficient selected from a weighting coefficient codebook and added together, or by a vector in which a mean vector of the LSP parameter over the entire speech signal, found in advance, is added to this weighted vector. The code vector to be outputted by the vector codebook and the set of weighting coefficients to be outputted by the weighting coefficient codebook are selected such that the quantization distortion, that is, the distortion of the quantized parameter with respect to the LSP parameter found from the input speech, becomes minimum or small enough. They are then outputted as codes of the LSP parameter.
This is generally called a weighted vector quantization, or supposing that the weighting coefficients are considered as the predictive coefficients from the past, it is called a moving average (MA: Moving Average) prediction vector quantization.
On the decoding side, the code vector of the current frame and the past code vectors, obtained from the received vector code, are multiplied by the weighting coefficients obtained from the received weighting coefficient code and added together; the resulting vector, or a vector in which the mean vector of the LSP parameter over the entire speech signal, found in advance, is further added, is outputted as the quantized vector of the current frame.
As a vector codebook that outputs the code vector in each frame, there can be structured a basic one-stage vector quantizer, a split vector quantizer wherein dimensions of the vector are divided, a multi stage vector quantizer having two or more stages, or a multi-stage and split vector quantizer in which the multi stage vector quantizer and the split vector quantizer are combined.
In the aforementioned conventional LSP parameter encoder and decoder, it was not always possible to output vectors such that the parameter synthesized for a silent interval or a stationary noise interval changes smoothly, because such intervals span a large number of frames and, in addition, the coding and decoding processes are configured in multiple stages. The reasons are as follows. The vector codebook used for coding is normally found by learning, but the speech used for learning did not contain a sufficient amount of silent or stationary noise intervals, so vectors corresponding to these intervals were not always sufficiently reflected in the learning; and when the number of bits given to the quantizer was small, it was impossible to design a codebook including sufficient quantized vectors corresponding to non-voice intervals.
In these LSP parameter encoder and decoder, upon coding at the time of actual communication, the quantization performance during the non-voice interval could not be fully exhibited, and a deterioration of the quality as the reproduced sound was inevitable. Also, these problems occurred not only in the coding of the acoustic parameter equivalent to the linear predictive coefficient expressing a spectrum envelope of the speech signal, but also in the similar coding with respect to a music signal.
The present invention has been made in view of the foregoing points, and an object of the invention is to provide acoustic parameter coding and decoding methods and devices, wherein outputting the vectors equivalent to the silent interval and the stationary noise interval is facilitated so that the deterioration of the quality is scarce at these intervals in the conventional coding and decoding of the acoustic parameter equivalent to the linear predictive coefficient expressing a spectrum envelope of the acoustic signal, and also to provide acoustic signal coding and decoding methods and devices using the aforementioned methods and devices, and a program for conducting these methods by a computer.
The present invention is mainly characterized in that, in coding and decoding of an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope of an acoustic signal, that is, a parameter such as an LSP parameter, α parameter, PARCOR parameter or the like (hereinafter simply referred to as an acoustic parameter), an acoustic parameter vector showing a substantially flat spectrum envelope corresponding to a silent interval or stationary noise interval, which cannot ordinarily be obtained by learning of the codebook, is added to a vector codebook so as to be selectable. The present invention differs from the prior art in that a vector including a component of the acoustic parameter vector showing the substantially flat spectrum envelope is obtained in advance by calculation and stored as one of the vectors of the vector codebook, and in that this code vector can also be outputted in a multi-stage quantization configuration and a split vector quantization configuration.
An acoustic parameter coding method according to the present invention comprises:
(a) a step of calculating an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope characteristic of an acoustic signal for every frame of a predetermined length of time;
(b) a step of multiplying a code vector outputted in at least one frame in the closest past selected from a vector codebook for storing a plurality of code vectors in correspondence with an index representing the code vectors and a code vector selected in a current frame respectively with a set of weighting coefficients selected from a coefficient codebook for storing one or more sets of weighting coefficients in correspondence with an index representing the weighting coefficients, wherein multiplied results are added to generate a weighted vector and a vector including a component of the weighted vector is found as a candidate of a quantized acoustic parameter with respect to the acoustic parameter of the current frame; and
(c) a step of determining the code vector of the vector codebook and the set of the weighting coefficients of the coefficient codebook by using a criterion such that a distortion of the candidate of the quantized acoustic parameter with respect to the calculated acoustic parameter becomes a minimum, wherein an index showing the determined code vector and the determined set of the weighting coefficients are determined and outputted as a quantized code of the acoustic parameter; and
the vector codebook includes a vector having a component of an acoustic parameter vector showing the aforementioned substantially flat spectrum envelope as one of the stored code vectors.
An acoustic parameter decoding method according to the present invention comprises:
(a) a step of outputting a code vector corresponding to an index expressed by a code inputted for every frame and a set of weighting coefficients from a vector codebook, which stores a plurality of code vectors of an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope characteristic of an acoustic signal in correspondence with an index representing the code vectors, and a coefficient codebook, which stores one or more sets of weighting coefficients in correspondence with an index representing the sets; and
(b) a step of multiplying the code vector outputted from the vector codebook in at least one frame of the closest past and a code vector outputted from the vector codebook in a current frame respectively with the outputted set of the weighting coefficients, and adding multiplied results together to thereby generate a weighted vector, wherein a vector including a component of the weighted vector is outputted as a decoded quantized vector of the current frame; and
the vector codebook includes a vector having a component of an acoustic parameter vector showing a substantially flat spectrum envelope as one of the code vectors stored therein.
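The decoding steps (a) and (b) above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the class name `LspDecoder`, the zero-filled initial history, and the optional mean vector `y_ave` (which appears only later in the description) are assumptions made for the example.

```python
from collections import deque


class LspDecoder:
    """Hypothetical decoder sketch: weighted sum of current and past code vectors."""

    def __init__(self, vector_codebook, coefficient_codebook, m, y_ave):
        self.vector_codebook = vector_codebook
        self.coefficient_codebook = coefficient_codebook  # each entry: (w0, ..., wm)
        self.y_ave = list(y_ave)
        p = len(y_ave)
        # Assumption: the past code vectors start out as zero vectors.
        self.history = deque([[0.0] * p for _ in range(m)], maxlen=m)

    def decode(self, ix, iw):
        x = self.vector_codebook[ix]        # step (a): look up the code vector
        w = self.coefficient_codebook[iw]   # step (a): look up the weight set
        vectors = [x] + list(self.history)  # x(n), x(n-1), ..., x(n-m)
        # step (b): multiply by the weights and add the results together
        y = [self.y_ave[i] + sum(wj * v[i] for wj, v in zip(w, vectors))
             for i in range(len(x))]
        self.history.appendleft(list(x))    # x(n) becomes x(n-1) for the next frame
        return y
```

Calling `decode` twice with the same indexes shows the moving-average behaviour: the second output includes the contribution of the first frame's code vector.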
An acoustic parameter coding device according to the present invention comprises:
parameter calculating means for analyzing an input acoustic signal for every frame and calculating an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope characteristic of the acoustic signal;
a vector codebook for storing a plurality of code vectors in correspondence with an index representing the vectors;
a coefficient codebook for storing one or more sets of weighting coefficients in correspondence with an index representing the coefficients;
quantized parameter generating means for multiplying a code vector with respect to a current frame outputted from the vector codebook and a code vector outputted in at least one frame of the closest past respectively with the set of the weighting coefficients selected from the coefficient codebook, the quantized parameter generating means adding results together to thereby generate a weighted vector, the quantized parameter generating means outputting a vector including a component of the generated weighted vector as a candidate of a quantized acoustic parameter with respect to the acoustic parameter in the current frame;
a distortion computing part for computing a distortion of the quantized acoustic parameter with respect to the acoustic parameter calculated at the parameter calculating means; and
a codebook search controlling part for determining the code vector of the vector codebook and the set of the weighting coefficients of the coefficient codebook by using a criterion such that the distortion becomes small, the codebook search controlling part outputting indexes respectively representing the determined code vector and the set of the weighting coefficients as codes of the acoustic parameter; and
the vector codebook includes a vector having a component of an acoustic parameter vector showing a substantially flat spectrum envelope.
An acoustic parameter decoding device according to the present invention is configured to comprise:
a vector codebook for storing a plurality of code vectors of an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope characteristic of an acoustic signal in correspondence with an index representing the code vectors,
a coefficient codebook for storing one or more sets of weighting coefficients in correspondence with an index representing the weighting coefficients, and
quantized parameter generating means for outputting one code vector from the vector codebook in correspondence with an index showing a code inputted for every frame, to thereby output a set of weighting coefficients from the coefficient codebook, the quantized parameter generating means multiplying the code vector outputted in a current frame and a code vector outputted in at least one frame of the closest past respectively with the set of the weighting coefficients outputted in the current frame, the quantized parameter generating means adding multiplied results together to thereby generate a weighted vector and outputting a vector including a component of the generated weighted vector as a decoded quantized acoustic parameter of the current frame; and
the vector codebook stores a vector including a component of an acoustic parameter showing a substantially flat spectrum envelope as one of the code vectors.
An acoustic signal coding device for encoding an input acoustic signal according to the present invention is configured to comprise:
means for encoding a spectrum characteristic of an input acoustic signal by using the aforementioned acoustic parameter coding method;
an adaptive codebook for holding adaptive code vectors showing periodic components of the input acoustic signal therein;
a fixed codebook for storing a plurality of fixed vectors therein;
filtering means for inputting as an excitation signal a sound source vector generated based on the adaptive code vector from the adaptive codebook and the fixed vector from the fixed codebook, the filtering means synthesizing a synthesized acoustic signal by using a filter coefficient based on the quantized acoustic parameter; and
means for determining an adaptive code vector and a fixed code vector respectively selected from the adaptive codebook and the fixed codebook such that a distortion of the synthesized acoustic signal with respect to the input acoustic signal becomes small, the means outputting an adaptive code and a fixed code respectively corresponding to the determined adaptive code vector and the fixed vector.
An acoustic signal decoding device for decoding an input code and outputting an acoustic signal according to the present invention is configured to comprise:
means for decoding an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope characteristic from an inputted code by using the aforementioned acoustic parameter decoding method;
a fixed codebook for storing a plurality of fixed vectors therein;
an adaptive codebook for holding adaptive code vectors showing periodic components of a synthesized acoustic signal therein;
means for taking out a corresponding fixed vector from the fixed codebook and taking out a corresponding adaptive code vector from the adaptive codebook by an inputted adaptive code and an inputted fixed code, the means synthesizing the vectors and generating an excitation vector; and
filtering means for setting a filter coefficient based on the acoustic parameter and reproducing an acoustic signal by the excitation vector.
An acoustic signal coding method for encoding an input acoustic signal according to the present invention comprises:
(A) a step of encoding a spectrum characteristic of an input acoustic signal by using the aforementioned acoustic parameter coding method;
(B) a step of using as an excitation signal a sound source vector generated based on an adaptive code vector from an adaptive codebook for holding adaptive code vectors showing periodic components of an input acoustic signal therein and a fixed vector from a fixed codebook for storing a plurality of fixed vectors therein, and carrying out a synthesis filter process by a filter coefficient based on the quantized acoustic parameter to thereby generate a synthesized acoustic signal; and
(C) a step of determining an adaptive code vector and a fixed vector selected from the fixed codebook and the adaptive codebook such that a distortion of the synthesized acoustic signal with respect to the input acoustic signal becomes small, and outputting an adaptive code and a fixed code respectively corresponding to the determined adaptive code vector and the fixed vector.
An acoustic signal decoding method for decoding input codes and outputting an acoustic signal according to the present invention comprises:
(A) a step of decoding an acoustic parameter equivalent to a linear predictive coefficient showing a spectrum envelope characteristic from inputted codes by using the aforementioned acoustic parameter decoding method;
(B) a step of taking out an adaptive code vector from an adaptive codebook for holding therein adaptive code vectors showing periodic components of an input acoustic signal by an inputted adaptive code and an inputted fixed code, taking out a corresponding fixed vector from a fixed codebook for storing a plurality of fixed vectors therein, and synthesizing the adaptive code vector and the fixed vector to thereby generate an excitation vector; and
(C) a step of carrying out a synthesis filter process of the excitation vector by using a filter coefficient based on the acoustic parameter, and reproducing a synthesized acoustic signal.
The aforementioned invention can be provided in the form of a program that can be executed by a computer.
According to the present invention, in the weighted vector quantizer (or, MA prediction vector quantizer), since a vector including a component of an acoustic parameter vector showing a substantially flat spectrum is found and stored as the code vector of the vector codebook, a quantized vector equivalent to the corresponding silent interval or the stationary noise interval can be outputted.
Also, according to another embodiment of the invention, when a multi-stage vector codebook is used as the vector codebook of the acoustic parameter coding device and decoding device, a vector including a component of an acoustic parameter vector showing a substantially flat spectrum envelope is stored in the codebook of one stage, and a zero vector is stored in the codebooks of the other stages. Accordingly, an acoustic parameter equivalent to the corresponding silent interval or stationary noise interval can be outputted.
It is not always necessary to store the zero vector. In the case of not storing the zero vector, when the vector including the component of the acoustic parameter vector showing the substantially flat spectrum envelope from a codebook of one stage is selected, it will suffice that the vector including the component of the acoustic parameter vector showing the substantially flat spectrum envelope is outputted as a candidate of the code vector of the current frame.
Also, in the case that the vector codebook is formed of a split vector codebook, there are used a plurality of split vectors in which dimensions of vectors including a component of an acoustic parameter vector showing a substantially flat spectrum envelope are divided, and by divisionally storing these split vectors one by one in a plurality of split vector codebooks, respectively, when searching in the respective split vector codebooks, the respective split vectors are selected, and a vector by integrating these split vectors can be outputted as a quantized vector equivalent to the corresponding silent interval or the stationary noise interval.
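The split arrangement described above can be illustrated with a short sketch. The helper names `split_vector` and `join_split_vectors` and the choice of sub-vector sizes are hypothetical; the point is only that the flat-spectrum vector is divided by dimension, each part is stored in its own split codebook, and integrating the selected parts recovers the full vector.

```python
def split_vector(v, sizes):
    """Divide vector v into consecutive sub-vectors with the given sizes."""
    if sum(sizes) != len(v):
        raise ValueError("sub-vector sizes must add up to the vector dimension")
    parts, start = [], 0
    for size in sizes:
        parts.append(list(v[start:start + size]))
        start += size
    return parts


def join_split_vectors(parts):
    """Integrate the selected split vectors back into one full vector."""
    return [value for part in parts for value in part]
```

For example, a five-dimensional vector split with sizes `[2, 3]` yields one entry for a two-dimensional split codebook and one for a three-dimensional split codebook.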
Furthermore, the vector quantizer may have a multi-stage and split quantization configuration; by combining the techniques of the aforementioned multi-stage vector quantization configuration and split vector quantization configuration, a quantized vector equivalent to the acoustic parameter corresponding to the silent interval or the stationary noise interval can be outputted.
In the case that the codebook is structured as the multi-stage configuration, in correspondence with respective code vectors of the codebook at the first stage, scaling coefficients respectively corresponding to the codebooks on and after the second stage are provided as the scaling coefficient codebook. The scaling coefficients corresponding to the code vector selected at the codebook of the first stage are read out from the respective scaling coefficient codebooks, and multiplied with code vectors respectively selected from the codebook of the second stage, so that the coding with much smaller distortion of the quantization can be achieved.
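A two-stage version of this scaling arrangement might look like the sketch below. The function name `scaled_two_stage_vector` and the use of a single scalar per first-stage code vector are assumptions for illustration; the text only states that the scaling coefficient is read out according to the first-stage selection and multiplied with the second-stage code vector.

```python
def scaled_two_stage_vector(i1, i2, stage1, stage2, scaling):
    """Combine a first-stage vector with a scaled second-stage vector.

    scaling[i1] is the (hypothetical, scalar) scaling coefficient tied to
    the first-stage code vector with index i1.
    """
    s = scaling[i1]
    return [a + s * b for a, b in zip(stage1[i1], stage2[i2])]
```

Because the scale depends on the first-stage choice, the same second-stage codebook can cover both coarse and fine corrections.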
As described above, the acoustic parameter coding and decoding methods and the devices in which the quality deterioration is scarce in the aforementioned interval, that is, the object of the invention, can be provided.
In the acoustic signal coding device of the invention, in the quantization of the linear predictive coefficient, any one of the aforementioned parameter coding devices is used in an acoustic parameter area equivalent to the linear predictive coefficient. According to this configuration, the same operation and effects as those of the aforementioned one can be obtained.
In the acoustic signal decoding device of the invention, in decoding of the linear predictive coefficient, any one of the aforementioned parameter decoding devices is used in the acoustic parameter area equivalent to the linear predictive coefficient. According to this configuration, the same operation and effects as those of the aforementioned one can be obtained.
Next, embodiments of the invention will be explained with reference to the drawings.
f(n)=(f1(n), f2(n), . . . , fp(n)) (1)
Here, the integer n indicates a certain frame number n, and hereinafter, the frame of this number is referred to as a frame n.
The codebook 14 is provided with a vector codebook 14A, which stores n code vectors representing LSP parameter vectors found by learning, and a coefficient codebook 14B, which stores a set of K weighting coefficients. In response to an index Ix(n) specifying the code vector and an index Iw(n) specifying the weighting coefficient code, a corresponding code vector x(n) and a set of weighting coefficients (w0, w1, . . . , wm) are outputted. The quantized parameter generating part 15 is formed of m buffer parts 15B1, . . . , 15Bm connected in series, m+1 multipliers 15A0, 15A1, . . . , 15Am, a register 15C, and a vector adder 15D. The code vector x(n) of the current frame n, selected as one of the candidates from the vector codebook 14A, and the code vectors x(n−1), . . . , x(n−m) determined for the past frames n−1, . . . , n−m are respectively multiplied by the selected weighting coefficients w0, . . . , wm at the multipliers 15A0, . . . , 15Am, and the results of the multiplications are added together at the adder 15D. Further, a mean vector yave of the LSP parameter over the entire speech signal, found in advance, is supplied to the adder 15D from the register 15C. In this way, the adder 15D generates a candidate of the quantized vector, that is, a candidate y(n) of the LSP parameter. As the mean vector yave, a mean vector of the voice part may be used, or a zero vector may be used as described later.
When the code vector x(n) selected from the vector codebook 14A with respect to the current frame n is substituted as
x(n)=(x1(n), x2(n), . . . , xp(n)) (2)
and, similarly, the code vector determined one frame before is denoted x(n−1), the code vector determined two frames before is denoted x(n−2), and the code vector determined m frames before is denoted x(n−m), a quantized vector candidate of the current frame, that is,
y(n)=(y1(n), y2(n), . . . , yp(n)) (3)
is expressed as follows:
$$y(n) = w_0\,x(n) + \sum_{j=1}^{m} w_j\,x(n-j) + y_{ave} \tag{4}$$
Here, the larger the value of m, the better the quantization efficiency. However, the effect of a code error then extends over the following m frames, and when coded and stored speech is reproduced from the middle, it is necessary to go back m frames into the past. Therefore, m is selected appropriately as the occasion demands. For speech communication with a frame length of 20 ms, the value of m is sufficient if it is 6 or more, and even the value 1 to 3 may suffice. The number m is also called the order of the moving average prediction.
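The computation of the candidate y(n) in equation (4) can be sketched as follows. The function name `ma_predict` and the argument layout (past vectors ordered from x(n−1) to x(n−m)) are assumptions made for this illustration.

```python
def ma_predict(x_current, x_past, weights, y_ave):
    """Equation (4): y(n) = w0*x(n) + sum_{j=1..m} wj*x(n-j) + y_ave.

    x_past[0] is x(n-1), x_past[1] is x(n-2), and so on (m vectors).
    weights is the selected set (w0, w1, ..., wm).
    """
    if len(weights) != len(x_past) + 1:
        raise ValueError("need m+1 weights for m past code vectors")
    vectors = [x_current] + list(x_past)
    y = list(y_ave)                      # start from the mean vector
    for w, x in zip(weights, vectors):   # accumulate each weighted code vector
        for i in range(len(y)):
            y[i] += w * x[i]
    return y
```

With m = 1, weights (0.5, 0.5), x(n) = (1, 2), x(n−1) = (3, 4) and yave = (0, 1), the candidate is (2, 4).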
The candidate y(n) of the quantization obtained as described above is sent to the distortion computing part 16, and the quantization distortion with respect to the LSP parameter f(n) calculated at the LSP parameter calculating part 13 is computed. The distortion d is defined by the weighted Euclidean distance as follows.
$$d = \sum_{i=1}^{p} r_i \left(f_i(n) - y_i(n)\right)^2 \tag{5}$$
Incidentally, ri (i=1, . . . , p) are weighting coefficients found from the LSP parameter f(n); performance is excellent when they are set so as to emphasize the formant frequencies of the spectrum and their vicinity.
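The weighted Euclidean distance of equation (5) is straightforward to compute. In the sketch below the weights r are simply passed in; how they are derived from f(n) is not specified here, so no derivation is attempted. The function name `weighted_distortion` is hypothetical.

```python
def weighted_distortion(f, y, r):
    """Equation (5): d = sum_i r_i * (f_i(n) - y_i(n))**2."""
    if not (len(f) == len(y) == len(r)):
        raise ValueError("f, y and r must have the same dimension p")
    return sum(ri * (fi - yi) ** 2 for fi, yi, ri in zip(f, y, r))
```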
In the codebook search control part 17, the pairs of indexes Ix(n) and Iw(n) given to the codebook 14 are sequentially changed, and the calculation of the distortion d by equation (5) is repeated for each pair of indexes. In this way, the pair consisting of a code vector of the vector codebook 14A and a set of weighting coefficients of the coefficient codebook 14B that makes the distortion d outputted from the distortion computing part 16 the smallest, or small enough, is searched for, and the corresponding indexes Ix(n) and Iw(n) are sent out from a terminal T2 as the codes of the input LSP parameter. The codes Ix(n) and Iw(n) sent out from the terminal T2 are sent to a decoder via a transmission channel, or stored in a memory.
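The search over index pairs can be sketched as an exhaustive loop. This is only one possible strategy, under the assumption that every pair (Ix, Iw) is tried and the first pair with the smallest distortion wins; the function name `search_codebooks` and its arguments are hypothetical.

```python
def search_codebooks(f, r, vector_codebook, coefficient_codebook, x_past, y_ave):
    """Try every (Ix, Iw) pair and return (Ix, Iw, d) with the smallest d."""
    best = None
    for ix, x in enumerate(vector_codebook):
        for iw, w in enumerate(coefficient_codebook):
            vectors = [x] + list(x_past)  # x(n), x(n-1), ..., x(n-m)
            # Candidate y(n) as in equation (4)
            y = [y_ave[i] + sum(wj * v[i] for wj, v in zip(w, vectors))
                 for i in range(len(f))]
            # Weighted distortion as in equation (5)
            d = sum(ri * (fi - yi) ** 2 for fi, yi, ri in zip(f, y, r))
            if best is None or d < best[0]:
                best = (d, ix, iw)
    return best[1], best[2], best[0]
```

A practical encoder could stop early once d is "small enough", as the text allows, instead of scanning every pair.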
When the output code vector x(n) of the current frame is determined, the code vectors x(n−j), j=1, . . . , m−1 in the buffer part 15Bj of the past frame (n−j) are sequentially sent to the next buffer part 15Bj+1, and the code vector x(n) of the current frame n is inputted into the buffer 15B1.
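The chain of buffer parts 15B1, . . . , 15Bm behaves like a fixed-length shift register. A minimal sketch using `collections.deque` is shown below; the helper names and the zero-filled initial state are assumptions for the example.

```python
from collections import deque


def make_history(m, p):
    """Create m buffers of dimension p (assumed to start as zero vectors)."""
    return deque([[0.0] * p for _ in range(m)], maxlen=m)


def push_code_vector(history, x_current):
    """Shift every stored vector one buffer along and store x(n) at the front.

    With maxlen set, appendleft discards the oldest vector x(n-m) from the
    other end, so history[0] is always the most recent past frame.
    """
    history.appendleft(list(x_current))
```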
The invention is characterized in that, as one of the code vectors stored in the vector codebook 14A used in the coding of the LSP parameter by the weighted vector quantization or the moving average vector quantization described above, the LSP parameter vector F corresponding to the silent interval or stationary noise interval is stored in case the mean vector yave is zero, and a vector C0 found by subtracting yave from the LSP parameter vector F is stored in case yave is not zero. Namely, in case yave is not zero, the LSP parameter vector corresponding to the silent interval or the stationary noise interval is written as:
F=(F1,F2, . . . , Fp) (6)
and the code vector C0 to be stored in the vector codebook 14A is
$$C_0 = F - y_{ave} \tag{7}$$
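In the coding by the moving average prediction at the silent interval or the stationary noise interval, when C0 is selected consecutively so that x(n−j)=C0 for j=0, . . . , m, substituting into equation (4) gives the quantized vector y(n) as follows:
$$y(n) = w_0\,C_0 + \sum_{j=1}^{m} w_j\,C_0 + y_{ave} = \Big(\sum_{j=0}^{m} w_j\Big) C_0 + y_{ave}$$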
Here, supposing that the sum of the weighting coefficients from w0 to wm is 1 or the value close thereto, y(n) can be outputted as the quantized vector F found from the LSP parameter at the silent interval or the vector close thereto, so that the coding performance at the silent interval or the stationary noise interval can be improved. By the configuration as described above, the vector including the component of the vector F is stored as one of the code vectors in the vector codebook 14A. As the code vector including the component of the vector F, in case the quantized parameter generating part 15 generates the quantized vector y(n) including the component of the mean vector yave, the one found by subtracting the mean vector yave from the vector F is used, and in case quantized parameter generating part 15 generates the quantized vector y(n) that does not include the component of the mean vector yave, the vector F itself is used.
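The effect of selecting C0 in every frame can be checked numerically. The sketch below assumes that C0 = F − yave has filled the current frame and all m past frames; the function name `steady_state_output` and the sample values are hypothetical.

```python
def steady_state_output(c0, weights, y_ave):
    """Output y(n) when x(n-j) = C0 for every j = 0..m.

    Every term of the weighted sum then uses the same vector, so
    y(n) = (w0 + ... + wm) * C0 + y_ave.
    """
    total = sum(weights)
    return [total * c + a for c, a in zip(c0, y_ave)]
```

When the weights sum to 1, the output equals C0 + yave, which is F itself; when the sum is only close to 1, the output is correspondingly close to F.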
In the present invention, also in the decoding device, as in the coding device shown in
In case the mean vector yave is not added at the adder 15D in
In
There are several methods for finding the vector C0. As one of them, since the spectrum envelope of the input acoustic signal normally becomes flat at the silent interval or the stationary noise interval, in the case of a p-dimensional LSP parameter vector F, for example, the range 0 to π is divided into p+1 equal parts, and the p values at substantially equal intervals, that is, π/(1+p), 2π/(1+p), . . . , pπ/(1+p), may be used as the LSP parameter vector. Alternatively, C0 can be found by C0=F−yave from the actual LSP parameter vector F at the silent interval and the stationary noise interval. Or, the LSP parameter obtained when white noise or Hoth noise is inputted may be used as the parameter vector F, to find C0=F−yave. Incidentally, in general, the mean vector yave of the LSP parameter over the entire speech signal is found as the mean vector of all of the vectors for learning when the code vector x of the vector codebook 41 is learned.
The following Table 1 shows examples of the ten-dimensional vectors C0, yave, and F wherein the LSP parameters at the silent interval or the stationary noise interval are normalized between 0 and π when p=10 dimensional LSP parameters are used as the acoustic parameters.
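The first method, equally spaced LSP values between 0 and π, can be sketched as follows. The function names `flat_lsp_vector` and `silence_code_vector` are hypothetical, and yave is assumed to be supplied from codebook training.

```python
import math


def flat_lsp_vector(p):
    """Return F = (pi/(p+1), 2*pi/(p+1), ..., p*pi/(p+1))."""
    return [k * math.pi / (p + 1) for k in range(1, p + 1)]


def silence_code_vector(f_flat, y_ave):
    """Return C0 = F - y_ave, the vector to be stored in the codebook."""
    return [f - a for f, a in zip(f_flat, y_ave)]
```

For p = 10 the elements of F are spaced π/11 apart, which corresponds to a substantially flat spectrum envelope.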
The vector F is the example of the code vector of the LSP parameter representing the silent interval and the stationary noise interval written into the codebook according to the present invention. Values of the elements of this vector are increased at substantially constant interval, and this means that the frequency spectrum is substantially flat.
Firstly, when the index Ix(n) specifying the code vector is inputted, the index Ix(n) is analyzed at a code analysis part 43, to thereby obtain an index Ix(n)1 specifying the code vector at the first stage and an index Ix(n)2 specifying the code vector at the second stage. Then, i-th and i′-th code vectors x1i and x2i′ respectively corresponding to the indexes Ix(n)1 and Ix(n)2 of the respective stages are read out from the first-stage codebook 41 and the second-stage codebook 42, and the code vectors are added together at an adding part 44, to thereby output the added result as a code vector x(n).
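The two-stage lookup can be sketched as below. How the code analysis part 43 separates Ix(n) into the two stage indexes is not stated, so the packing Ix = Ix1 × (second-stage size) + Ix2 used here is purely an assumption, as are the function names.

```python
def split_index(ix, second_stage_size):
    """Assumed packing: ix = ix1 * second_stage_size + ix2."""
    return divmod(ix, second_stage_size)


def two_stage_code_vector(ix, stage1, stage2):
    """Read one vector from each stage and add them, as at the adding part 44."""
    ix1, ix2 = split_index(ix, len(stage2))
    return [a + b for a, b in zip(stage1[ix1], stage2[ix2])]
```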
In the case of the two-stage structure vector codebook, the code vector search is carried out by using only the first-stage codebook 41 for a predetermined number of candidate code vectors sequentially starting from the one having the smallest quantization distortion. This search is conducted by a combination with the set of the weighting coefficients of the coefficients codebook 14B shown in
When the code vector is searched by prioritizing the first-stage codebook 41 as described above, the code vector C0 (or F) is prestored as one of the code vectors in the first-stage codebook 41 of the multi-stage vector codebook 4A, and the zero vector z is prestored as one of the code vectors in the second-stage codebook 42. Accordingly, when the code vector C0 is selected from the codebook 41, the zero vector z is selected from the codebook 42. As a result, the present invention achieves a structure in which the code vector C0 corresponding to the silent interval or the stationary noise interval can be outputted from the adder 44 as the output of the codebook 4A. Alternatively, the zero vector z need not be stored; in that case, when the code vector C0 is selected from the codebook 41, the selection and addition from the codebook 42 are not conducted.
In case the search is conducted for all of the combinations of the respective code vectors in the first-stage codebook 41 and the respective code vectors in the second-stage codebook 42, the code vector C0 and the zero vector z may be stored in either of the codebooks as long as they are stored in separate codebooks from each other. It is highly likely that the code vector C0 and the zero vector z are selected at the same time in the silent interval or the stationary noise interval, but they are not always selected simultaneously because of computing errors and the like. In the codebooks of the respective stages, the code vector C0 and the zero vector z are candidates for selection in the same way as the other code vectors.
The zero vector may not be stored in the second-stage codebook 42. In this case, if the vector C0 is selected from the first-stage codebook 41, the selection of the code vector from the second-stage codebook 42 is not conducted, and it will suffice that the code C0 of the codebook 41 is outputted as it is from the adder 44.
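The first-stage-prioritized search can be illustrated with the following sketch. It is a simplification under stated assumptions: the distortion is a plain squared error against a target code vector, whereas in the description the search is combined with the sets of weighting coefficients; all names and codebook contents are hypothetical.

```python
def distortion(target, vector):
    """Squared error between two vectors (an assumed distortion measure)."""
    return sum((t - v) ** 2 for t, v in zip(target, vector))


def search_two_stage(target, codebook1, codebook2, n_candidates):
    """Preselect first-stage candidates, then search all pairs with stage two."""
    ranked = sorted(range(len(codebook1)),
                    key=lambda i: distortion(target, codebook1[i]))
    best = None
    for i in ranked[:n_candidates]:
        for j, x2 in enumerate(codebook2):
            combined = [a + b for a, b in zip(codebook1[i], x2)]
            d = distortion(target, combined)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


C0 = [0.05, -0.02]                       # invented silent-interval vector
codebook1 = [[0.3, 0.1], C0, [-0.2, 0.2]]
codebook2 = [[0.0, 0.0], [0.04, 0.04], [-0.04, 0.03]]  # entry 0 is the zero vector z
i, j = search_two_stage(C0, codebook1, codebook2, n_candidates=2)
```

When the target equals C0, the pair formed by C0 and the zero vector z gives zero distortion, so it is the pair returned.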
By forming the codebook 4A by the multi stage codebook as shown in
Firstly, when the index Ix(n) specifying the code vector is inputted, the index Ix(n) is analyzed at the code analysis part 43, so that the index Ix(n)1 specifying the code vector of the first stage and the index Ix(n)2 specifying the code vector of the second stage are obtained. The code vector x1i corresponding to Ix(n)1 is read out from the first-stage codebook 41. Also, the scaling coefficient si corresponding to the read index Ix(n)1 is read out from the scaling coefficient codebook 45. Next, the code vector x2i′ corresponding to Ix(n)2 is read out from the second-stage codebook 42, and in a multiplier 46, the code vector x2i′ from the second-stage codebook 42 is multiplied by the scaling coefficient si. The vector obtained by the multiplication and the code vector x1i from the first-stage codebook 41 are added together at the adding part 44, and the added result is outputted as the code vector x(n) from the codebook 4A.
Also, in this embodiment, upon searching the code vector, firstly only the first-stage codebook 41 is used to search a predetermined number of the candidate code vectors sequentially starting from the one having the smallest quantization distortion. Then, regarding combinations of the respective candidate code vectors and the respective code vectors of the second-stage codebook 42, the combination having the smallest quantization distortion is searched. In this case, with respect to the multi-stage vector codebook 4A with the scaling coefficients, the vector C0 is prestored as one code vector in the first-stage codebook 41, and the zero vector z is prestored as one of the code vectors in the second-stage codebook 42 as well. Similarly to the case in
As described above, the code vector corresponding to the silent interval or the stationary noise interval can be outputted. Although it is highly likely that the code vector C0 and the zero vector z are selected at the same time in the silent interval or the stationary noise interval, they are not always selected simultaneously because of computing errors and the like. In the codebooks of the respective stages, the code vector C0 and the zero vector z are candidates for selection in the same way as the other code vectors. As in the embodiment of
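The scaled two-stage read-out can be sketched as below. The stage indexes are passed in directly, and the codebook and scaling values are invented for the example; only the structure (one scaling coefficient per first-stage vector, applied to the second-stage vector before the addition) follows the description.

```python
def decode_scaled_two_stage(i, j, codebook1, codebook2, scales):
    """Return x1[i] + scales[i] * x2[j], element by element."""
    s = scales[i]  # scaling coefficient tied to the first-stage index
    return [a + s * b for a, b in zip(codebook1[i], codebook2[j])]


# Invented example data.
codebook1 = [[0.2, 0.4], [0.6, 0.8]]
scales = [0.5, 2.0]
codebook2 = [[0.0, 0.0], [0.1, -0.1]]    # entry 0 is the zero vector z
x = decode_scaled_two_stage(1, 1, codebook1, codebook2, scales)
unchanged = decode_scaled_two_stage(0, 0, codebook1, codebook2, scales)
```

Because the zero vector stays zero under any scaling coefficient, selecting it leaves the first-stage vector unchanged.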
The codebook 4A includes a low-order vector codebook 41L storing N pieces of low-order code vectors xL1, . . . , xLN, and a high-order vector codebook 41H storing N′ pieces of high-order code vectors xH1, . . . , xHN′. Supposing the output code vector is x(n), of the p orders, orders 1 to k are defined as the low order and orders k+1 to p are defined as the high order, so that the low-order and high-order codebooks 41L and 41H are respectively formed of vectors having the corresponding numbers of dimensions. Namely, the i-th vector of the low-order codebook 41L is expressed by:
$$x_{Li} = (x_{Li1}, x_{Li2}, \ldots, x_{Lik}) \tag{9}$$
and i′-th vector of the high-order vector codebook 41H is expressed by:
$$x_{Hi'} = (x_{Hi'k+1}, x_{Hi'k+2}, \ldots, x_{Hi'p}) \tag{10}$$
The inputted index Ix(n) is divided into Ix(n)L and Ix(n)H, and corresponding to these Ix(n)L and Ix(n)H, the low-order and high-order split vectors xLi and xHi′ are respectively selected from the respective codebooks 41L and 41H, and these split vectors xLi and xHi′ are integrated at an integrating part 47, to thereby generate the output code vector x(n). In other words, the code vector x(n) outputted from the integrating part 47 is expressed by:
$$x(n) = (x_{Li1}, x_{Li2}, \ldots, x_{Lik} \mid x_{Hi'k+1}, x_{Hi'k+2}, \ldots, x_{Hi'p}) \tag{11}$$
In this embodiment, a low-order vector C0L of the vector C0 is stored as one of the vectors of the low-order codebook 41L, and a high-order vector C0H of the vector C0 is stored as one of the vectors of the high-order codebook 41H. As described above, there is achieved a structure which can output the following as the code vector in case of corresponding to the silent interval or the stationary noise interval:
$$C_0 = (C_{0L} \mid C_{0H}) \tag{12}$$
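The integration of split vectors can be illustrated as a simple concatenation. The sketch below assumes p=4 and k=2 and uses invented codebook contents; the function name is hypothetical.

```python
def decode_split(i_low, i_high, codebook_low, codebook_high):
    """Concatenate a low-order split vector and a high-order split vector."""
    return list(codebook_low[i_low]) + list(codebook_high[i_high])


# Invented example with p = 4 and k = 2.
C0 = [0.05, -0.02, 0.01, 0.03]
codebook_low = [[0.1, 0.2], C0[:2]]      # entry 1 is C0L
codebook_high = [C0[2:], [0.3, 0.4]]     # entry 0 is C0H
silent = decode_split(1, 0, codebook_low, codebook_high)
mixed = decode_split(1, 1, codebook_low, codebook_high)
```

Here `silent` reproduces C0 as in equation (12), while `mixed` shows C0L combined with another high-order vector.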
Furthermore, depending on the case, the vector may be outputted as a combination of C0L and the other high-order vector, or a combination of the other low-order vector and C0H. If the split vector codebooks 41L and 41H are provided as shown in
The first-stage codebook 41 stores N pieces of code vectors x11, . . . , x1N, a second-stage low-order codebook 42L stores N′ pieces of low-order code vectors x2L1, . . . , x2LN′, and a second-stage high-order codebook 42H stores N″ pieces of high-order code vectors x2H1, . . . , x2HN″.
In a code analysis part 431, the inputted index Ix(n) is analyzed into an index Ix(n)1 specifying the first-stage code vector, and an index Ix(n)2 specifying the second-stage code vector. Then, the i-th code vector x1i corresponding to the first-stage index Ix(n)1 is read out from the first-stage codebook 41. Also, the second-stage index Ix(n)2 is analyzed into Ix(n)2L and Ix(n)2H, and by Ix(n)2L and Ix(n)2H, the respective i′-th and i″-th split vectors x2Li′ and x2Hi″ of the second-stage low-order split vector codebook 42L and the second-stage high-order split vector codebook 42H are selected, and these selected split vectors are integrated at the integrating part 47, to thereby generate the second-stage code vector x2i′i″. At the adding part 44, the first-stage code vector x1i and the second-stage integrated vector x2i′i″ are added together, to be outputted as the code vector x(n).
In this embodiment, as in the embodiments of
In case they are not stored, the selection and addition from the codebooks 42L and 42H are not carried out at the time of selecting the vector C0.
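A minimal sketch of this multi-stage and split read-out follows. It assumes that the vectors in question are split zero vectors held in the second-stage codebooks 42L and 42H; the optional `c0_index` argument, which models the case where they are not stored, and all codebook contents are hypothetical.

```python
def decode_multistage_split(i, j_low, j_high, cb1, cb2_low, cb2_high, c0_index=None):
    """Add the first-stage vector to the concatenated second-stage split vectors."""
    x1 = list(cb1[i])
    if c0_index is not None and i == c0_index:
        # No zero vectors stored: skip second-stage selection and addition.
        return x1
    x2 = list(cb2_low[j_low]) + list(cb2_high[j_high])
    return [a + b for a, b in zip(x1, x2)]


# Invented example data (p = 4, k = 2).
C0 = [0.05, -0.02, 0.01, 0.03]
cb1 = [[0.2, 0.4, 0.6, 0.8], C0]
cb2_low = [[0.0, 0.0], [0.01, -0.01]]    # entry 0 plays the role of a split zero vector
cb2_high = [[0.0, 0.0], [0.02, 0.02]]
with_zero = decode_multistage_split(1, 0, 0, cb1, cb2_low, cb2_high)
skipped = decode_multistage_split(1, 1, 1, cb1, cb2_low, cb2_high, c0_index=1)
```

Both variants output C0 unchanged, either by adding zero vectors or by omitting the second stage.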
At an analysis part 431, the inputted index Ix(n) is analyzed into the index Ix(n)1 specifying the first-stage code vector and the index Ix(n)2 specifying the second-stage code vector. Firstly, the code vector x1i corresponding to the index Ix(n)1 is obtained from the first-stage codebook 41. Also, in correspondence with the index Ix(n)1, a low-order scaling coefficient SLi and a high-order scaling coefficient SHi are respectively read out from the low-order scaling coefficient codebook 45L and the high-order scaling coefficient codebook 45H. Then, the index Ix(n)2 is analyzed into an index Ix(n)2L and an index Ix(n)2H at an analysis part 432, and respective split vectors x2Li′ and x2Hi″ of the second-stage low-order split vector codebook 42L and the second-stage high-order split vector codebook 42H are selected by these indexes Ix(n)2L and Ix(n)2H. These selected split vectors are multiplied by the low-order and high-order scaling coefficients SLi and SHi at multipliers 46L and 46H, and the obtained multiplied vectors are integrated at an integrating part 47, to thereby generate a second-stage code vector x2i′i″. The first-stage code vector x1i and the second-stage integrated vector x2i′i″ are added together at the adder 44, and the added result is outputted as the code vector x(n).
In the multi-stage and split vector codebook 4A with scaling coefficients of the embodiment, the vector C0 is stored as one of the code vectors in the first-stage codebook 41, and the split zero vectors ZL and ZH are respectively stored as the split vectors in the low-order split vector codebook 42L and the high-order split vector codebook 42H of the second-stage split vector codebook as well. Accordingly, there is achieved a configuration that outputs the code vector corresponding to the silent interval or the stationary noise interval. The number of the stages of the codebook may be three or more. In this case, two or more stages subsequent to the second stage can be respectively formed of the split vector codebooks. Also, in either case, the number of the split vector codebooks per stage is not limited.
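The read-out with separate low-order and high-order scaling coefficients can be sketched as follows. The function name and all numeric values are invented for illustration, and the stage indexes are passed in already separated.

```python
def decode_scaled_multistage_split(i, j_low, j_high, cb1, cb2_low, cb2_high,
                                   s_low, s_high):
    """x(n) = x1[i] + (s_low[i] * x2L[j_low] | s_high[i] * x2H[j_high])."""
    low = [s_low[i] * v for v in cb2_low[j_low]]
    high = [s_high[i] * v for v in cb2_high[j_high]]
    return [a + b for a, b in zip(cb1[i], low + high)]


# Invented example data (p = 4, k = 2).
cb1 = [[0.2, 0.4, 0.6, 0.8]]
s_low, s_high = [2.0], [0.5]
cb2_low = [[0.0, 0.0], [0.1, -0.1]]      # entry 0 is the split zero vector ZL
cb2_high = [[0.0, 0.0], [0.2, 0.2]]      # entry 0 is the split zero vector ZH
x = decode_scaled_multistage_split(0, 1, 1, cb1, cb2_low, cb2_high, s_low, s_high)
same = decode_scaled_multistage_split(0, 0, 0, cb1, cb2_low, cb2_high, s_low, s_high)
```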
At the code analysis part 43, the inputted index Ix(n) is analyzed into the index Ix(n)1 specifying the first-stage code vector and the index Ix(n)2 specifying the second-stage code vector. Respective i-th and i′-th split vectors x1Li and x1Hi′ of the first-stage low-order split vector codebook 41L and the first-stage high-order split vector codebook 41H are selected as vectors corresponding to the first-stage index Ix(n)1, and the selected vectors are integrated at an integrating part 471, to thereby generate a first-stage integrated vector x1ii′.
Also, similarly to the first stage, regarding the second-stage index Ix(n)2, respective i″-th and i′″-th split vectors x2Li″ and x2Hi′″ of the second-stage low-order split vector codebook 42L and the second-stage high-order split vector codebook 42H are selected, and the selected vectors are integrated at an integrating part 472, to thereby generate a second-stage integrated vector x2i″i′″. At the adding part 44, the first-stage integrated vector x1ii′ and the second-stage integrated vector x2i″i′″ are added together, and the added result is outputted as the code vector x(n).
In this embodiment, similarly to the configuration of the split vector codebook of
A speech signal 101 is converted into an electric signal by an input device 102, and outputted to an A/D converter 103. The A/D converter 103 converts the (analog) signal outputted from the input device 102 into a digital signal, and outputs it to a speech coding device 104. The speech coding device 104 encodes the digital speech signal outputted from the A/D converter 103 by using a speech coding method, described later, and outputs the encoded information to an RF modulator 105. The RF modulator 105 converts the speech encoded information outputted from the speech coding device 104 into a signal to be sent out on a propagation medium, such as a radio wave, and outputs the signal to a transmitting antenna 106. The transmitting antenna 106 transmits the signal outputted from the RF modulator 105 as the radio wave (RF signal) 107. The foregoing is the configuration and operations of the speech signal transmission device.
The transmitted radio wave (RF signal) 108 is received by a receiving antenna 109, and outputted to an RF demodulator 110.
Incidentally, the radio wave (RF signal) 108 in the figure is the radio wave (RF signal) 107 as seen from the receiving side, and if there is no attenuation of the signal or superposition of noise in the propagation channel, the radio wave 108 is exactly the same as the radio wave (RF signal) 107. The RF demodulator 110 demodulates the speech encoded information from the RF signal outputted from the receiving antenna 109, and outputs the same to a speech decoding device 111. The speech decoding device 111 decodes the speech signal from the speech encoded information by using the speech decoding method, described later, and outputs the same to a D/A converter 112. The D/A converter 112 converts the digital speech signal outputted from the speech decoding device 111 into an analog electric signal and outputs it to an output device 113. The output device 113 converts the electric signal into vibration of air, and outputs it as a sound wave 114 that can be heard by the human ear. The foregoing is the configuration and operations of the speech signal receiving device.
By having at least one of the aforementioned speech signal transmission device and receiving device, a base station and mobile terminal device in the mobile communication system can be structured.
The aforementioned speech signal transmission device is characterized in the speech coding device 104.
An input speech signal constitutes the signal outputted from the A/D converter 103 in
The linear predictive coefficient (LPC) outputted from the LPC analysis part 201 is converted into the LSP parameter at the LSP parameter calculating part 13, and the obtained LSP parameter is encoded at the parameter coding part 10 as explained with reference to
The adder 204 calculates an error signal ε between the aforementioned Xin and the aforementioned synthesized signal, and outputs the same to a perceptual weighting part 211. The perceptual weighting part 211 conducts the perceptual weighting with respect to the error signal ε outputted from the adder 204, and calculates a distortion of the synthesized signal with respect to Xin in a perceptually weighted domain, to thereby output it to the parameter determining part 212. The parameter determining part 212 determines the signals that should be generated by an adaptive codebook 205, a fixed codebook 207 and a quantized gain generating part 206 such that the coding distortion outputted from the perceptual weighting part 211 becomes a minimum. Incidentally, the coding performance can be further improved by determining the signals generated by the aforementioned three means not only by minimizing the coding distortion outputted from the perceptual weighting part 211, but also by additionally minimizing another coding distortion computed by using the aforementioned Xin.
The adaptive codebook 205 buffers the sound source signal of the preceding frame n−1, which was outputted from the adder 210 in the past when the distortion was minimized, cuts out a vector from the position specified by the adaptive vector code A outputted from the parameter determining part 212, and repeatedly concatenates it until it reaches the length of one frame, thereby generating the adaptive vector including a desired periodic component and outputting the same to a multiplier 208. The fixed codebook 207 stores a plurality of fixed vectors, each having the length of one frame, in correspondence with the fixed vector codes, and outputs the fixed vector having the form specified by the fixed vector code F outputted from the parameter determining part 212 to a multiplier 209.
The quantized gain generating part 206 provides the multipliers 208 and 209 with a quantized adaptive vector gain gA with respect to the adaptive vector and a quantized fixed vector gain gF with respect to the fixed vector, respectively, both of which are specified by a gain code G outputted from the parameter determining part 212. In the multiplier 208, the adaptive vector outputted from the adaptive codebook 205 is multiplied by the quantized adaptive vector gain gA outputted from the quantized gain generating part 206, and the multiplied result is outputted to the adder 210. In the multiplier 209, the fixed vector outputted from the fixed codebook 207 is multiplied by the quantized fixed vector gain gF outputted from the quantized gain generating part 206, and the multiplied result is outputted to the adder 210.
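The generation of the adaptive vector by cutting out and repeating a past segment can be sketched as follows. It is assumed here, for illustration only, that the adaptive vector code A maps to a lag counted back from the end of the buffered sound source signal; the description does not specify this mapping, and the function name is hypothetical.

```python
def adaptive_vector(past_excitation, lag, frame_len):
    """Cut the last `lag` samples of the buffer and repeat them to one frame length."""
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and the buffer length")
    segment = past_excitation[-lag:]
    repeats = -(-frame_len // lag)       # ceiling division
    return (segment * repeats)[:frame_len]


past = [1.0, 2.0, 3.0, 4.0, 5.0]
v = adaptive_vector(past, lag=2, frame_len=5)   # [4.0, 5.0, 4.0, 5.0, 4.0]
```

The repetition is what gives the adaptive vector its periodic component, with the period equal to the lag.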
In the adder 210, the adaptive vector and the fixed vector after multiplying with the gains are added together, and the added result is outputted to the synthesis filter 203 and the adaptive codebook 205. Finally, in the multiplexing part 213, the code L indicating the quantized LPC is inputted from the LPC quantization part 202; the adaptive vector code A indicating the adaptive vector, the fixed vector code F indicating the fixed vector, and the gain code G indicating the quantized gains are inputted from the parameter determining part 212; and these codes are multiplexed to be outputted as the encoded information to the transmission path.
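The gain multiplication and addition that form the sound source signal, and its feedback into the adaptive codebook, can be sketched as below. The buffer update policy (keeping the most recent samples in a fixed-length list) is an assumption made for the example, as are the function names and values.

```python
def build_excitation(adaptive_vec, fixed_vec, g_a, g_f):
    """Return g_a * adaptive vector + g_f * fixed vector, sample by sample."""
    return [g_a * a + g_f * f for a, f in zip(adaptive_vec, fixed_vec)]


def update_adaptive_buffer(buffer, excitation):
    """Append the new excitation and keep the buffer at its original length."""
    return (list(buffer) + list(excitation))[-len(buffer):]


# Invented example data.
adaptive_vec = [1.0, -1.0, 0.5]
fixed_vec = [0.0, 2.0, 0.0]
excitation = build_excitation(adaptive_vec, fixed_vec, g_a=0.5, g_f=0.25)
buffer = update_adaptive_buffer([9.0, 9.0, 9.0, 9.0], excitation)
```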
The adaptive codebook 1305 takes out an adaptive vector from a position specified by the adaptive vector code A outputted from the demultiplexing part 1301, and outputs the same to a multiplier 1308. The fixed codebook 1307 generates a fixed vector specified by the fixed vector code F outputted from the demultiplexing part 1301, and outputs the same to a multiplier 1309. The quantized gain generating part 1306 decodes the adaptive vector gain gA and the fixed vector gain gF, which are specified by the gain code G outputted from the demultiplexing part 1301, and outputs them to the multipliers 1308 and 1309, respectively. In the multiplier 1308, the adaptive vector is multiplied by the aforementioned adaptive vector gain gA, and the multiplied result is outputted to an adder 1310. In the multiplier 1309, the fixed vector is multiplied by the aforementioned fixed vector gain gF, and the multiplied result is outputted to the adder 1310. In the adder 1310, the adaptive vector and the fixed vector, which are outputted from the multipliers 1308 and 1309 after being multiplied by the gains, are added together, and the added result is outputted to the synthesis filter 1303. In the synthesis filter 1303, with the vector outputted from the adder 1310 serving as a driving sound source signal, the filter synthesis is conducted by using a filter coefficient decoded by the LPC decoding part 1302, and the synthesized signal is outputted to a postprocessing part 1304. The postprocessing part 1304 conducts a process for improving a subjective quality of the speech, such as formant emphasis or pitch emphasis, or conducts a process for improving a subjective quality of the stationary noise, and thereafter outputs the result as a final decoded speech signal.
In the figure, the multiplexed encoded information outputted from the RF demodulator 110 is separated by a demultiplexing part 1301 into individual codes L, A, F and G. The separated LPC code L is given to an LPC decoding part 1302; the separated adaptive vector code A is given to an adaptive codebook 1305; the separated gain code G is given to a quantized gain generating part 1306; and the separated fixed vector code F is given to a fixed codebook 1307. The LPC decoding part 1302 is formed of a decoding part 1302A configured in the same manner as that of
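The filter synthesis step can be illustrated with a direct-form all-pole filter. The sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is an assumption made for the sketch, not a convention stated in this description, and the function name is hypothetical.

```python
def synthesis_filter(excitation, a, memory=None):
    """All-pole synthesis: s[n] = e[n] - sum_k a[k-1] * s[n-k], k = 1..p."""
    p = len(a)
    history = list(memory) if memory is not None else [0.0] * p  # history[0] = s[n-1]
    output = []
    for e in excitation:
        s = e - sum(ak * hk for ak, hk in zip(a, history))
        output.append(s)
        history = ([s] + history)[:p]    # shift the new sample into the filter state
    return output


decayed = synthesis_filter([1.0, 0.0, 0.0], [-0.5])         # one-pole example
passthrough = synthesis_filter([1.0, -2.0, 3.0], [0.0, 0.0])  # flat envelope
```

With all predictive coefficients equal to zero, the filter passes the driving signal through unchanged, which corresponds to a flat spectrum envelope.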
Although the LSP parameter is used as the parameter equivalent to the linear predictive coefficient indicating the spectrum envelope in the aforementioned description, other parameters, such as α parameter, PARCOR coefficient and the like, can be used. In the case of using these parameters, since the spectrum envelope also becomes flat in the silent interval or the stationary noise interval, the computation of the parameter at these intervals can be conducted easily, and in the case of p-order α parameter, for example, it will suffice that 0-order is 1.0 and 1- to p-order is 0.0. Even in the case of using other acoustic parameters, a vector of the acoustic parameter determined to indicate substantially flat spectrum envelope will suffice. Incidentally, the LSP parameter is practical since the quantization efficiency thereof is good.
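As a numerical check of the α parameter example, the following sketch evaluates the magnitude of the synthesis filter response for a coefficient vector whose 0-order element is 1.0 and whose remaining elements are 0.0. The polynomial form A(z) as a sum of alpha[k] z^-k and the function name are assumptions made for illustration.

```python
import cmath
import math


def lpc_magnitude(alpha, omega):
    """Return |1 / A(e^{j*omega})| with A(z) = sum_k alpha[k] * z^(-k)."""
    a = sum(ak * cmath.exp(-1j * omega * k) for k, ak in enumerate(alpha))
    return 1.0 / abs(a)


p = 10
flat_alpha = [1.0] + [0.0] * p           # 0-order is 1.0, orders 1..p are 0.0
magnitudes = [lpc_magnitude(flat_alpha, w) for w in (0.0, 1.0, 2.0, math.pi)]
```

Every value in `magnitudes` equals 1.0, that is, the envelope is the same at all frequencies.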
In the foregoing description, in the case that the vector codebook is structured as the multi-stage configuration, the vector C0 may be expressed as the sum of two vectors, for example, C0=C01+C02, and C01 and C02 may be stored in the codebooks of different stages from each other.
Furthermore, the present invention is applied not only to coding and decoding of the speech signal, but also to coding and decoding of general acoustic signal, such as a music signal.
Also, the device of the invention can carry out coding and decoding of the acoustic signal by running the program by the computer.
The computer which carries out the present invention is formed of a modem 410 connected to a communication network; an input and output interface 420 for inputting and outputting the acoustic signal; a buffer memory 430 for temporarily storing a digital acoustic signal or the acoustic signal; a random access memory (RAM) 440 for carrying out the coding and decoding processes therein; a central processing unit (CPU) 450 for controlling the input and output of the data and program execution; a hard disk 460 in which the coding and decoding program is stored; and a drive 470 for driving a record medium 470M. These components are connected by a common bus 480.
As the record medium 470M, there can be used any kinds of record media, such as a compact disc CD, a digital video disc DVD, a magneto-optical disk MO, a memory card, and the like. In the hard disk 460, there is stored the program in which the coding method and the decoding method conducted in the acoustic signal coding device and decoding device of
In the case of encoding the input acoustic signal, CPU 450 loads an acoustic signal coding program from the hard disk 460 into RAM 440; the acoustic signal imported into the buffer memory 430 is encoded by conducting the process per frame in RAM 440 in accordance with the coding program; and the obtained code is sent out as the encoded acoustic signal data via the modem 410, for example, to the communication network. Alternatively, the data is temporarily saved in the hard disk 460. Or, the data is written on the record medium 470M by the record medium drive 470.
In the case of decoding the input encoded acoustic signal, CPU 450 loads a decoding program from the hard disk 460 into RAM 440. Then, the acoustic code data is downloaded to the buffer memory 430 via the modem 410 from the communication network, or loaded to the buffer memory 430 from the record medium 470M by the drive 470. CPU 450 processes the acoustic code data per frame in RAM 440 in accordance with the decoding program, and obtained acoustic signal data is outputted from the input and output interface 420.
As described above, according to the present invention, in coding wherein the parameter equivalent to the linear predictive coefficient is quantized by the weighted sum of the code vector of the current frame and the code vectors outputted in the past, or by the vector obtained by adding the mean vector found in advance to this sum, the parameter vector corresponding to the silent interval or the stationary noise interval, or the vector obtained by subtracting the aforementioned mean vector from that parameter vector, is stored in the vector codebook so that it can be selected as the code vector and its code can be outputted. Therefore, there can be provided coding and decoding methods and devices in which quality deterioration in these intervals is small.
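The effect summarized above can be illustrated with a small sketch of the weighted-sum reconstruction. The weights, the mean vector and the assumption that the weights sum to 1 are all hypothetical choices made for the example; they are not values given in this description.

```python
def reconstruct(current, past, weights, y_ave):
    """y(n) = w[0]*x(n) + sum_j w[j]*x(n-j) + y_ave, where past[0] is x(n-1)."""
    vectors = [current] + list(past)
    if len(vectors) != len(weights):
        raise ValueError("one weight is needed per code vector")
    return [sum(w * v[d] for w, v in zip(weights, vectors)) + y_ave[d]
            for d in range(len(y_ave))]


# Invented two-dimensional example.
F = [0.3, 0.6]                            # parameter vector for the silent interval
y_ave = [0.25, 0.7]                       # hypothetical mean vector
C0 = [f - y for f, y in zip(F, y_ave)]    # stored code vector C0 = F - y_ave
weights = [0.5, 0.3, 0.2]                 # assumed to sum to 1
y = reconstruct(C0, [C0, C0], weights, y_ave)
```

Under these assumptions, selecting C0 in the current frame and in the preceding frames reproduces F at the output, so the flat-envelope parameter is represented without residual error.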
Number | Date | Country | Kind |
---|---|---|---|
2000-359311 | Nov 2000 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP01/10332 | 11/27/2001 | WO | 00 | 5/27/2003 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO02/43052 | 5/30/2002 | WO | A |
Number | Date | Country | |
---|---|---|---|
20040023677 A1 | Feb 2004 | US |