The present invention relates to a method and apparatus for encoding a speech signal.
In order to increase the compression efficiency of a speech signal, linear prediction, an adaptive codebook, and a fixed codebook search technique may be used.
An object of the present invention is to minimize spectrum quantization error in encoding a speech signal.
The object of the present invention can be achieved by providing a method of encoding a speech signal including extracting candidates which may be used as an optimal spectrum vector with respect to a speech signal according to first best information.
In another aspect of the present invention, there is provided a method of encoding a speech signal including extracting candidates which may be used as an optimal adaptive codebook with respect to a speech signal according to second best information.
In another aspect of the present invention, there is provided a method of encoding a speech signal including extracting candidates which may be used as an optimal fixed codebook with respect to a speech signal according to third best information.
According to the embodiments of the present invention, a method of encoding a speech signal based on best information extracts candidates for each optimal coding parameter and determines the optimal coding parameters through a search process that combines the coding parameter candidates. It is thereby possible to obtain an optimal parameter that minimizes quantization error, as compared to the step-by-step optimization scheme, and to improve the quality of the synthesized speech signal. In addition, the present invention is compatible with various conventional speech encoding technologies.
According to the present invention, there is provided a method of encoding a speech signal, the method including acquiring a linear prediction filter coefficient of a current frame from an input signal using linear prediction, acquiring a quantized spectrum candidate vector of the current frame corresponding to the linear prediction filter coefficient of the current frame based on first best information, and interpolating the quantized spectrum candidate vector of the current frame and a quantized spectrum vector of a previous frame.
The first best information may be information about the number of codebook indexes extracted in frame units.
The acquiring of the quantized spectrum candidate vector may include transforming the linear prediction filter coefficient of the current frame into a spectrum vector of the current frame, calculating error between the spectrum vector of the current frame and a codebook of the current frame, and extracting codebook indexes of the current frame in consideration of the error and the first best information.
The method may further include calculating error between the spectrum vector and codebook of the current frame and aligning the quantized code vectors or codebook indexes in ascending order of error.
The codebook indexes of the current frame may be extracted in ascending order of error between the spectrum vector and codebook of the current frame.
The quantized code vectors corresponding to the codebook indexes may be quantized immittance spectral frequency candidate vectors of the current frame.
According to the present invention, there is provided an apparatus for encoding a speech signal, the apparatus including a linear prediction analyzer 200 configured to acquire a linear prediction filter coefficient of a current frame from an input signal using linear prediction, and a quantization unit 210 configured to acquire a quantized spectrum candidate vector of the current frame corresponding to the linear prediction filter coefficient of the current frame based on first best information and to interpolate the quantized spectrum candidate vector of the current frame and a quantized spectrum vector of a previous frame.
The first best information may be information about the number of codebook indexes extracted in frame units.
The quantization unit 210 configured to acquire the quantized spectrum frequency candidate vector may transform the linear prediction filter coefficient of the current frame into a spectrum vector of the current frame, measure error between the spectrum vector of the current frame and a codebook of the current frame, and extract codebook indexes in consideration of the error and the first best information, and the codebook of the current frame may include quantized code vectors and codebook indexes corresponding to the quantized code vectors.
The quantization unit 210 may calculate error between the spectrum vector and codebook of the current frame and align the quantized code vectors or the codebook indexes in ascending order of error.
The codebook indexes of the current frame may be extracted in ascending order of error between the spectrum vector and codebook of the current frame.
The quantized code vectors corresponding to the codebook indexes may be quantized immittance spectral frequency candidate vectors of the current frame.
An analysis-by-synthesis method refers to a method of comparing a signal synthesized via a speech encoder with the original input signal and determining an optimal coding parameter of the speech encoder. That is, the mean square error is measured not at the excitation signal generation step but at the synthesis step, thereby determining the optimal coding parameter. This method may be called a closed-loop search method.
Referring to the corresponding figure, the excitation signal generator 100 may obtain a residual signal according to long-term prediction and finally model the remaining uncorrelated component with a fixed codebook. In this case, an algebraic codebook, which is a method of encoding pulse positions of fixed magnitude within a subframe, may be used. The bit rate may be varied according to the number of pulses, and codebook memory can be conserved.
The long-term synthesis filter 110 serves to generate long-term correlation, which is physically associated with a pitch excitation signal. The long-term synthesis filter 110 may be implemented using a delay value D and a gain value g_p acquired through long-term prediction or pitch analysis, for example, as shown in Equation 1.
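For reference, the long-term (pitch) synthesis filter of a typical CELP coder, built from the delay value D and the gain value g_p, is commonly written in the following form, which is assumed here purely for illustration:

```latex
\frac{1}{P(z)} = \frac{1}{1 - g_p\, z^{-D}}
```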
The short-term synthesis filter 120 models short-term correlation within an input signal. The short-term synthesis filter 120 may be implemented using a linear prediction filter coefficient acquired via linear prediction, for example, as shown in Equation 2.
In Equation 2, a_i denotes the i-th linear prediction filter coefficient and p denotes the filter order. The linear prediction filter coefficients may be acquired by minimizing the linear prediction error, for example using the covariance method, the autocorrelation method, a lattice filter, or the Levinson-Durbin algorithm.
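As an illustrative, non-normative sketch of the autocorrelation method with the Levinson-Durbin recursion mentioned above, the fragment below assumes the common all-pole convention 1/A(z) = 1/(1 − Σ_{i=1}^{p} a_i z^{−i}); the function and variable names are illustrative only:

```python
import numpy as np

def lpc_autocorr(frame: np.ndarray, order: int) -> np.ndarray:
    """Estimate LP coefficients a_1..a_p from one (windowed) analysis frame using
    the autocorrelation method and the Levinson-Durbin recursion, assuming the
    convention A(z) = 1 - sum_i a_i z^{-i}."""
    # Autocorrelation lags r(0)..r(p) of the analysis frame.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)            # a[0] unused; a[1..p] hold the coefficients
    err = r[0] if r[0] > 0 else 1e-9   # prediction-error energy
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        updated = a.copy()
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)           # error energy decreases at each stage
    return a[1:]                       # [a_1, ..., a_p]
```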
The weighting filter 130 may shape the quantization noise according to the energy level of the input signal. For example, it may allow relatively more noise in the formant regions of the input signal, where the noise is masked, and reduce the noise in regions with relatively low energy. The generally used weighting filter is expressed by Equation 3, and γ1 = 0.94 and γ2 = 0.6 are used in the case of the ITU-T G.729 codec.
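The perceptual weighting filter commonly used in CELP coders, including ITU-T G.729, is derived from the linear prediction polynomial A(z); the standard form below is assumed here for illustration, with the γ1 = 0.94 and γ2 = 0.6 values stated above:

```latex
W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}
     = \frac{1 - \sum_{i=1}^{p} a_i\,\gamma_1^{\,i}\, z^{-i}}
            {1 - \sum_{i=1}^{p} a_i\,\gamma_2^{\,i}\, z^{-i}}
```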
The analysis-by-synthesis method may perform a closed-loop search to minimize the error between the original input signal s(n) and the synthesis signal ŝ(n) so as to acquire optimal coding parameters. The coding parameters may include an index of a fixed codebook, a delay value and gain value of an adaptive codebook, and a linear prediction filter coefficient.
The analysis-by-synthesis method may be implemented using various coding methods, depending on how the excitation signal is modeled. Hereinafter, a CELP-type speech encoder will be described as a method of modeling the excitation signal. However, the present invention is not limited thereto, and the same concept is applicable to a multi-pulse excitation method and an Algebraic CELP (ACELP) method.
Referring to the corresponding figure, a speech encoder divides an excitation signal into an adaptive codebook and a fixed codebook and analyzes the codebooks in order to model the excitation signal corresponding to the residual signal of the linear prediction analysis. Modeling may be performed as shown in Equation 4.
u(n) = ĝ_p·v(n) + ĝ_c·ĉ(n), for n = 0, …, N_s − 1    Equation 4
The excitation signal u(n) may be expressed by the adaptive codebook v(n), the adaptive codebook gain value ĝ_p, the fixed codebook ĉ(n), and the fixed codebook gain value ĝ_c.
Referring to the corresponding figure, the adaptive codebook and the fixed codebook are searched for as follows. First, a delay value and gain value of an adaptive codebook corresponding to a pitch may be obtained by minimizing the mean square error (MSE) between the zero-state response (ZSR) of the weighting synthesis filter 310 driven by the adaptive codebook 320 and the target signal of the adaptive codebook. The adaptive codebook 320 may be generated by a long-term synthesis filter. The long-term synthesis filter may use an optimal delay value and gain value for minimizing the error between the signal passing through the long-term synthesis filter and the target signal of the adaptive codebook. For example, the optimal delay value may be obtained as shown in Equation 6.
Here, the delay k that maximizes Equation 6 is selected, and L denotes the length of one subframe. The gain value of the long-term synthesis filter is obtained by substituting the delay value D obtained from Equation 6 into Equation 7.
Through the above process, the adaptive codebook gain value g_p, the delay value D corresponding to the pitch, and the adaptive codebook v(n) are finally obtained.
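The search described by Equations 6 and 7 can be sketched as follows, assuming the standard CELP criterion: the delay k maximizing the normalized correlation between the adaptive-codebook target x(n) and the filtered past excitation y_k(n) is selected, and the gain is the ratio of the cross-correlation to the energy of y_k(n). All names, and the pitch-repetition handling for short delays, are illustrative assumptions rather than the claimed method itself:

```python
import numpy as np

def adaptive_codebook_search(x, past_exc, h, delay_range):
    """Sketch of a standard adaptive-codebook (pitch) search: for each candidate
    delay k, build v(n) from the past excitation, filter it with the weighted
    synthesis impulse response h to get y_k(n), and pick the delay maximizing
    (x . y_k)^2 / (y_k . y_k); the gain is g_p = (x . y_k) / (y_k . y_k)."""
    L = len(x)
    best_delay, best_gain, best_score = None, 0.0, -np.inf
    for k in delay_range:
        start = len(past_exc) - k
        # For delays shorter than the subframe, repeat the available pitch cycle.
        v = past_exc[start:start + L] if k >= L else np.resize(past_exc[start:], L)
        y = np.convolve(v, h)[:L]              # zero-state response of the weighted filter
        energy = float(np.dot(y, y))
        if energy <= 0.0:
            continue
        corr = float(np.dot(x, y))
        score = corr * corr / energy           # assumed Equation 6-style criterion
        if score > best_score:
            best_delay, best_gain, best_score = k, corr / energy, score
    return best_delay, best_gain               # D and g_p
```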
The fixed codebook 330 models the component remaining after the adaptive codebook contribution is removed from the excitation signal. The fixed codebook 330 may be searched for by a process of minimizing the error between the weighted input signal and the weighted synthesis signal. The target signal of the fixed codebook may be updated to a signal in which the ZSR of the adaptive codebook 320 is removed from the input signal subjected to the weighting filter 300. For example, the target signal of the fixed codebook may be expressed as shown in Equation 8.
c(n) = s_w(n) − g_p·v(n)    Equation 8
In Equation 8, c(n) denotes the target signal of the fixed codebook, s_w(n) denotes the input signal to which the weighting filter 300 is applied, and g_p·v(n) denotes the ZSR of the adaptive codebook 320. v(n) denotes the adaptive codebook generated using the long-term synthesis filter.
The fixed codebook 330 may be searched for by minimizing Equation 9 in a process of minimizing error between the fixed codebook and the target signal of the fixed codebook.
In Equation 9, H denotes a lower triangular Toeplitz convolution matrix generated from the impulse response h(n) of the weighted short-term synthesis filter; its main diagonal is h(0) and its lower diagonals are h(1), …, h(L−1). The numerator of Equation 9 is calculated by Equation 10, where N_p is the number of pulses in the fixed codebook and s_i denotes the sign of the i-th pulse.
A denominator of Equation 9 is calculated by Equation 11.
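A toy sketch of the search implied by Equations 9 to 11 is given below, assuming the usual algebraic-codebook criterion: maximize (dᵀc)² / (cᵀΦc), where d = Hᵀx is the backward-filtered target and Φ = HᵀH, with one signed pulse per position track and the pulse sign taken from the sign of d. The track layout and all names are illustrative assumptions; a practical coder would use a far less exhaustive search:

```python
import numpy as np
from itertools import product

def algebraic_codebook_search(d, phi, tracks):
    """Toy exhaustive search over one pulse per track: maximize
    (d^T c)^2 / (c^T Phi c) for a code vector c of +/-1 pulses.
    d     : backward-filtered target, d = H^T x (length L, numpy array)
    phi   : impulse-response correlation matrix, Phi = H^T H (L x L numpy array)
    tracks: list of allowed position sets, one pulse chosen per track."""
    signs = np.where(d >= 0.0, 1.0, -1.0)      # pulse signs follow sign(d)
    best_q, best_positions = -np.inf, None
    for positions in product(*tracks):
        num = sum(signs[p] * d[p] for p in positions) ** 2
        den = sum(signs[p] * signs[q] * phi[p, q]
                  for p in positions for q in positions)
        if den > 0.0 and num / den > best_q:
            best_q, best_positions = num / den, positions
    return best_positions, best_q
```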
The coding parameters of the speech encoder may be obtained by a step-by-step estimation method of first searching for an optimal adaptive codebook and then searching for a fixed codebook.
Referring to the corresponding figure, the quantization unit 210 may acquire a quantized spectrum candidate vector corresponding to the linear prediction filter coefficient acquired by the linear prediction analyzer 200 (S410). The quantized spectrum candidate vector may be acquired using first best information, as described below.
Referring to the corresponding figure, the quantization unit 210 may first transform the linear prediction filter coefficient of the current frame into a spectrum vector of the current frame so that quantization can be performed in the spectrum frequency domain.
In the process of mapping the spectrum vector of the current frame to a codebook of the current frame and performing quantization, the spectrum vector may be divided into a number of subvectors and codebooks corresponding to the subvectors may be found. A multi-stage vector quantizer may also be used; however, the present invention is not limited thereto.
The spectrum vector of the current frame transformed for quantization may be used without change. Alternatively, a residual spectrum vector of the current frame may be quantized. The residual spectrum vector of the current frame may be generated using the spectrum vector of the current frame and a prediction vector of the current frame. The prediction vector of the current frame may be derived from the quantized spectrum vector of the previous frame. For example, the residual spectrum vector of the current frame may be derived as shown in Equation 12.
r(n) = z(n) − p(n), where p(n) = (1/3)·r̂(n−1)    Equation 12
In Equation 12, r(n) denotes the residual spectrum vector of the current frame, z(n) denotes a vector in which the average value of each order is removed from the spectrum vector of the current frame, p(n) denotes the prediction vector of the current frame, and r̂(n−1) denotes the quantized spectrum vector of the previous frame.
The quantization unit 210 may calculate error between the spectrum vector of the current frame and a codebook of the current frame (S520). The codebook of the current frame means a codebook used for spectrum vector quantization. The codebook of the current frame may include quantized code vectors and codebook indexes corresponding to the quantized code vectors. The quantization unit 210 may calculate error between the spectrum vector and the codebook of the current frame and align the quantized code vectors or codebook indexes in ascending order of error.
Codebook indexes may be extracted in consideration of the error calculated in S520 and the first best information (S530). The first best information may mean information about the number of codebook indexes extracted in frame units. The first best information may be a value predetermined by the encoder. Codebook indexes (or quantized code vectors) may be extracted in ascending order of error between the spectrum vector and the codebook of the current frame according to the first best information.
The quantized spectrum candidate vectors corresponding to the extracted codebook indexes may be acquired (S540). That is, the quantized code vectors corresponding to the extracted codebook indexes may be used as the quantized spectrum candidate vector of the current frame. Accordingly, the first best information may indicate information about the number of quantized spectrum candidate vectors acquired in frame units. One quantized spectrum candidate vector or a plurality of quantized spectrum candidate vectors may be acquired according to the first best information.
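A minimal, non-normative sketch of steps S520 to S540, assuming a single-stage codebook and a simple squared-error distortion measure (names are illustrative), might look as follows:

```python
import numpy as np

def extract_spectrum_candidates(spectrum_vec, codebook, first_best):
    """Compute the error between the spectrum vector of the current frame and
    every quantized code vector, sort the codebook indexes in ascending order
    of error, and keep `first_best` of them as quantized spectrum candidates."""
    errors = np.sum((codebook - spectrum_vec) ** 2, axis=1)  # error per code vector
    order = np.argsort(errors)                               # ascending order of error
    candidate_indexes = order[:first_best]                   # first best information = N
    candidate_vectors = codebook[candidate_indexes]          # candidate vectors of the frame
    return candidate_indexes, candidate_vectors
```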
The quantized spectrum candidate vector of the current frame acquired in S410 may be used as a quantized spectrum candidate vector for any subframe within the current frame. In this case, the quantization unit 210 may interpolate the quantized spectrum candidate vector (S420). The quantized spectrum candidate vectors for the remaining subframes within the current frame may be acquired through interpolation. Hereinafter, the quantized spectrum candidate vectors acquired on a per-subframe basis within the current frame are referred to as a quantized spectrum candidate vector set. In this case, the first best information may indicate information about the number of quantized spectrum candidate vector sets acquired in frame units. Accordingly, one or a plurality of quantized spectrum candidate vector sets may be acquired with respect to the current frame according to the first best information.
For example, the quantized spectrum candidate vector of the current frame acquired in S410 may be used as a quantized spectrum candidate vector of a subframe in which a center of gravity of a window is located. In this case, the quantized spectrum candidate vectors for the remaining subframes may be acquired through linear interpolation between the quantized spectrum candidate vector of the current frame extracted in S410 and the quantized spectrum vector of the previous frame. If the current frame includes four subframes, the quantized spectrum candidate vectors corresponding to the subframes may be generated as shown in Equation 13.
q[0] = 0.75·q_end,p + 0.25·q_end
q[1] = 0.5·q_end,p + 0.5·q_end
q[2] = 0.25·q_end,p + 0.75·q_end
q[3] = q_end    Equation 13
In Equation 13, q_end,p denotes the quantized spectrum vector corresponding to the last subframe of the previous frame and q_end denotes the quantized spectrum candidate vector corresponding to the last subframe of the current frame.
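The interpolation of Equation 13 can be sketched as follows for a frame of four subframes; the function name and the array representation are illustrative:

```python
import numpy as np

def interpolate_subframes(q_end_prev, q_end_curr):
    """Equation 13: derive the quantized spectrum (candidate) vectors of the four
    subframes of the current frame by linear interpolation between the vector of
    the last subframe of the previous frame and that of the current frame."""
    weights = [0.25, 0.5, 0.75, 1.0]   # weight of the current-frame vector per subframe
    return [(1.0 - w) * np.asarray(q_end_prev) + w * np.asarray(q_end_curr)
            for w in weights]
```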
The quantization unit 210 acquires a linear prediction filter coefficient corresponding to the interpolated quantized spectrum candidate vector. The interpolated quantized spectrum candidate vector may be transformed onto a linear prediction domain, which may be used to calculate a linear prediction filter and a weighting filter for each subframe.
The psychological weighting filter 280 may generate a weighted input signal from the input signal (S430). The weighting filter may be generated from Equation 3 using the linear prediction filter coefficient acquired from the interpolated quantized spectrum candidate vector.
The adaptive codebook 230 may acquire an adaptive codebook with respect to the weighted input signal (S440). The adaptive codebook may be obtained by the long-term synthesis filter. The long-term synthesis filter may use an optimal delay value and gain value for minimizing the error between the target signal of the adaptive codebook and the signal passing through the long-term synthesis filter. The delay value and gain value, that is, the coding parameters of the adaptive codebook, may be extracted with respect to the quantized spectrum candidate vector according to the first best information. The delay value and gain value are shown in Equations 6 and 7. In addition, the fixed codebook 240 searches for the fixed codebook with respect to the target signal of the fixed codebook (S450). The target signal of the fixed codebook and the process of searching for the fixed codebook are shown in Equations 8 and 9, respectively. Similarly, the fixed codebook may be acquired with respect to the quantized immittance spectral frequency candidate vector or the quantized immittance spectral frequency candidate vector set according to the first best information.
The adder 250 multiplies the adaptive codebook acquired in S440 and the fixed codebook found in S450 by their respective gain values and adds the results so as to generate an excitation signal (S460). The synthesis filter 260 may perform synthesis filtering, using a linear prediction filter coefficient acquired from the interpolated quantized spectrum candidate vector, on the excitation signal output from the adder 250 so as to generate a synthesis signal (S470). If a weighting filter is applied to the synthesis filter 260, a weighted synthesis signal may be generated. The error minimization unit 290 may acquire a coding parameter for minimizing the error between the input signal (or the weighted input signal) and the synthesis signal (or the weighted synthesis signal) (S480). The coding parameter may include a linear prediction filter coefficient, a delay value and gain value of an adaptive codebook, and an index and gain value of a fixed codebook. For example, the coding parameter for minimizing the error may be acquired using Equation 14.
In Equation 14, s_w(n) denotes the weighted input signal and ŝ_w^(i)(n) denotes the weighted synthesis signal according to the i-th coding parameter.
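Based on these definitions, the selection criterion can be illustrated by the usual mean-square-error form, assumed here purely for illustration:

```latex
i^{*} = \arg\min_{i} \sum_{n} \bigl( s_w(n) - \hat{s}_w^{(i)}(n) \bigr)^{2}
```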
Referring to the corresponding figure, the quantization unit 210 may acquire a quantized immittance spectral frequency vector (quantized spectrum vector) corresponding to the linear prediction filter coefficient (S610). Hereinafter, a method of acquiring the quantized spectrum vector will be described.
The quantization unit 210 may transform a linear prediction filter coefficient of a current frame into a spectrum vector of the current frame in order to quantize the linear prediction filter coefficient in the spectrum frequency domain. This transformation process is the same as that described above.
The quantization unit 210 may measure error between the spectrum vector of the current frame and the codebook of the current frame. The codebook of the current frame may mean a codebook used for spectrum vector quantization. The codebook of the current frame includes quantized code vectors and indexes allocated to the quantized code vectors. The quantization unit 210 may measure error between the spectrum vector and codebook of the current frame, align the quantized code vectors or the codebook indexes in ascending order of error, and store the quantized code vectors or the codebook indexes.
The codebook index (or the quantized code vector) for minimizing error between the spectrum vector and the codebook of the current frame may be extracted. The quantized code vector corresponding to the codebook index may be used as the quantized spectrum vector of the current frame.
The quantized spectrum vector of the current frame may be used as a quantized spectrum vector for any subframe within the current frame. In this case, the quantization unit 210 may interpolate the quantized spectrum vector (S620). Interpolation may be performed in the same manner as described above with reference to Equation 13.
The psychological weighting filter 280 may generate a weighted input signal from the input signal (S630). The weighting filter may be expressed by Equation 3 using the linear prediction filter coefficient from the interpolated quantized spectrum vector.
The adaptive codebook 230 may acquire an adaptive codebook candidate in consideration of the second best information with respect to the weighted input signal (S640). The second best information may be information about the number of adaptive codebooks acquired in frame units. Alternatively, the second best information may indicate the number of coding parameters of the adaptive codebook acquired in frame units. The coding parameters of the adaptive codebook may include a delay value and gain value of the adaptive codebook. The adaptive codebook candidate may indicate an adaptive codebook acquired according to the second best information.
First, the adaptive codebook 230 may acquire a delay value and a gain value corresponding to the error between the target signal of the adaptive codebook and the signal passing through the long-term synthesis filter. The delay values and gain values may be aligned in ascending order of error and then stored. The delay value and the gain value may be extracted in ascending order of error between the target signal of the adaptive codebook and the signal passing through the long-term synthesis filter. The extracted delay value and gain value may be used as the delay value and gain value of the adaptive codebook candidate.
The long-term synthesis filter candidate may be obtained using the extracted delay value and gain value. By applying the long-term synthesis filter candidate to the input signal or the weighted input signal, the adaptive codebook candidate may be acquired.
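A possible sketch of this N-best adaptive-codebook candidate extraction, reusing the same assumed search criterion as before and keeping the `second_best` (delay, gain) pairs with the smallest error, is given below; all names are illustrative:

```python
import numpy as np

def nbest_adaptive_codebook(target, past_exc, h, delays, second_best):
    """Rank candidate delays in ascending order of residual error between the
    adaptive-codebook target and the gain-scaled, filtered past excitation,
    and keep `second_best` (delay, gain) candidates."""
    L = len(target)
    ranked = []
    for d in delays:
        start = len(past_exc) - d
        v = past_exc[start:start + L] if d >= L else np.resize(past_exc[start:], L)
        y = np.convolve(v, h)[:L]
        energy = float(np.dot(y, y))
        gain = float(np.dot(target, y)) / energy if energy > 0.0 else 0.0
        err = float(np.sum((target - gain * y) ** 2))
        ranked.append((err, int(d), gain))
    ranked.sort(key=lambda item: item[0])          # ascending order of error
    return [(d, g) for _, d, g in ranked[:second_best]]
```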
The fixed codebook 240 may search for a fixed codebook with respect to a target signal of the fixed codebook (S650). The target signal of the fixed codebook and the process of searching for the fixed codebook are shown in Equations 8 and 9, respectively. The target signal of the fixed codebook may indicate a signal in which the ZSR of the adaptive codebook candidate is removed from the input signal subjected to the weighting filter 300. Accordingly, a fixed codebook may be searched for with respect to each adaptive codebook candidate according to the second best information.
The adder 250 multiplies the adaptive codebook acquired in S640 and the fixed codebook found in S650 by their respective gain values and adds the results so as to generate an excitation signal (S660). The synthesis filter 260 may perform synthesis filtering, using a linear prediction filter coefficient acquired from the interpolated quantized spectrum vector, on the excitation signal output from the adder 250 so as to generate a synthesis signal (S670). If a weighting filter is applied to the synthesis filter 260, a weighted synthesis signal may be generated. The error minimization unit 290 may acquire a coding parameter for minimizing the error between the input signal (or the weighted input signal) and the synthesis signal (or the weighted synthesis signal) (S680). The coding parameter may include a linear prediction filter coefficient, a delay value and gain value of an adaptive codebook, and an index and gain value of a fixed codebook. For example, the coding parameter for minimizing the error is shown in Equation 14, and thus a description thereof will be omitted.
Referring to the corresponding figure, the quantization unit 210 may acquire a quantized spectrum vector corresponding to the linear prediction filter coefficient (S710). The method of acquiring the quantized spectrum vector is the same as that described above.
The quantized spectrum vector of the current frame may be used as a quantized immittance spectral frequency vector for any one of the subframes within the current frame. In this case, the quantization unit 210 may interpolate the quantized spectrum vector (S720). The quantized immittance spectral frequency vectors for the remaining subframes within the current frame may be acquired through interpolation. The interpolation method is the same as that described above with reference to Equation 13.
The quantization unit 210 may acquire a linear prediction filter coefficient corresponding to the interpolated quantized spectrum vector. The interpolated quantized spectrum vector may be transformed onto a linear prediction domain, which may be used to calculate a linear prediction filter and a weighting filter for each subframe.
The psychological weighting filter 280 may generate a weighted input signal from the input signal (S730). The weighting filter may be expressed by Equation 3 using the linear prediction filter coefficient from the interpolated quantized spectrum vector.
The adaptive codebook 230 may acquire an adaptive codebook with respect to the weighted input signal (S740). The adaptive codebook may be obtained by a long-term synthesis filter. The long-term synthesis filter may use an optimal delay value and gain value for minimizing error between a target signal of the adaptive codebook and a signal passing through the long-term synthesis filter. The method of acquiring the delay value and the gain value is described with reference to Equations 6 and 7.
The fixed codebook 240 may search for a fixed codebook candidate with respect to the target signal of the fixed codebook based on third best information (S750). The third best information may indicate information about the number of coding parameters of the fixed codebook extracted in frame units. The coding parameter of the fixed codebook may include an index and gain value of the fixed codebook. The target signal of the fixed codebook is shown in Equation 8.
The fixed codebook 330 may calculate error between the target signal of the fixed codebook and the fixed codebook. The index and gain value of the fixed codebook may be aligned and stored in ascending order of error between the target signal of the fixed codebook and the fixed codebook.
The index and gain value of the fixed codebook may be extracted in ascending order of error between the target signal of the fixed codebook and the fixed codebook according to the third best information. The extracted index and gain value of the fixed codebook may be used as the index and gain value of the fixed codebook candidate.
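Analogously, a sketch of the N-best fixed-codebook candidate extraction is given below, assuming the candidate code vectors are filtered by the weighted synthesis impulse response and ranked by residual error against the fixed-codebook target; the names are illustrative:

```python
import numpy as np

def nbest_fixed_codebook(target, code_vectors, h, third_best):
    """Rank fixed-codebook entries in ascending order of error against the
    fixed-codebook target and keep `third_best` (index, gain) candidates."""
    L = len(target)
    ranked = []
    for idx, c in enumerate(code_vectors):
        z = np.convolve(c, h)[:L]                  # filtered code vector
        energy = float(np.dot(z, z))
        gain = float(np.dot(target, z)) / energy if energy > 0.0 else 0.0
        err = float(np.sum((target - gain * z) ** 2))
        ranked.append((err, idx, gain))
    ranked.sort(key=lambda item: item[0])          # ascending order of error
    return [(idx, g) for _, idx, g in ranked[:third_best]]
```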
The adder 250 multiplies the adaptive codebook acquired in S740 and the fixed codebook candidate found in S750 by their respective gain values and adds the results so as to generate an excitation signal (S760). The synthesis filter 260 may perform synthesis filtering, using a linear prediction filter coefficient acquired from the interpolated quantized spectrum vector, on the excitation signal output from the adder 250 so as to generate a synthesis signal (S770). If a weighting filter is applied to the synthesis filter 260, a weighted synthesis signal may be generated. The error minimization unit 290 may acquire a coding parameter for minimizing the error between the input signal (or the weighted input signal) and the synthesis signal (or the weighted synthesis signal) (S780). The coding parameter may include a linear prediction filter coefficient, a delay value and gain value of an adaptive codebook, and an index and gain value of a fixed codebook. For example, the coding parameter for minimizing the error is shown in Equation 14, and thus a description thereof will be omitted.
In addition, the input signal may be quantized by a combination of the first best information, the second best information and the third best information.
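As a purely hypothetical sketch of how the three kinds of best information could be combined, the fragment below enumerates every combination of candidates and keeps the one minimizing the Equation-14-style error; `acb_candidates`, `fcb_candidates` and `synthesize` are placeholder callables introduced for illustration, not functions defined by this disclosure:

```python
import numpy as np

def joint_candidate_search(s_w, spectrum_candidates, acb_candidates, fcb_candidates, synthesize):
    """Evaluate every combination of a quantized-spectrum candidate (first best
    information), an adaptive-codebook candidate (second best information) and a
    fixed-codebook candidate (third best information), and return the combination
    whose weighted synthesis signal is closest to the weighted input s_w."""
    best_err, best_params = np.inf, None
    for q in spectrum_candidates:
        for delay, g_p in acb_candidates(q):
            for idx, g_c in fcb_candidates(q, delay, g_p):
                s_hat = synthesize(q, delay, g_p, idx, g_c)  # weighted synthesis signal
                err = float(np.sum((s_w - s_hat) ** 2))
                if err < best_err:
                    best_err, best_params = err, (q, delay, g_p, idx, g_c)
    return best_params, best_err
```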
The present invention may be used for speech signal encoding.
This application is a U.S. National Phase Application under 35 U.S.C. §371 of International Application PCT/KR2010/008848, filed on Dec. 10, 2010, which claims the benefit of U.S. Provisional Application No. 61/285,184, filed on Dec. 10, 2009, U.S. Provisional Application No. 61/295,165, filed on Jan. 15, 2010, U.S. Provisional Application No. 61/321,883, filed on Apr. 8, 2010, and U.S. Provisional Application No. 61/348,225, filed on May 25, 2010, the entire contents of which are hereby incorporated by reference.