The present invention relates to an audio coding apparatus and an audio decoding apparatus, and, for example, to an audio coding apparatus and audio decoding apparatus that employ hierarchical coding (code-excited linear prediction (CELP) and transform coding).
With respect to audio coding, there are two main types of coding schemes, namely transform coding and linear prediction coding.
Transform coding involves a signal conversion from the time domain to the frequency domain, as in the discrete Fourier transform (DFT), the modified discrete cosine transform (MDCT), and/or the like. Spectral coefficients derived through this conversion are quantized and coded. In the process of quantization or coding, a psychoacoustic model is ordinarily applied to determine the perceptual significance of the spectral coefficients, and the spectral coefficients are quantized or coded in accordance with their perceptual significance. MPEG MP3, MPEG AAC (see Non-Patent Literature 1), Dolby AC3, and the like, are widely used transform codecs. Transform coding is effective for music, as well as for audio signals in general. A simple configuration of a transform codec is shown in the figure.
With respect to the encoder shown in the figure, input signal S(n) is converted into frequency domain signal S(f) using a method of converting from the time domain to the frequency domain, such as discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), and/or the like.
A psychoacoustic model analysis is performed on frequency domain signal S(f), and a masking curve is derived (103). Frequency domain signal S(f) is quantized (102) in accordance with the masking curve derived through the psychoacoustic model analysis, thereby making quantization noise inaudible.
A quantized parameter is multiplexed (104) and sent to the decoder side.
With respect to the decoder shown in the figure, the quantized parameter is dequantized, and decoded spectral coefficient S˜(f) is reconfigured.
Decoded spectral coefficient S˜(f) is converted back to the time domain using a method of converting (107) from the frequency domain to the time domain, such as inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT), and/or the like, and decoded signal S˜(n) is reconfigured.
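By way of a non-normative illustration, the transform-coding path described above can be sketched in a few lines of Python. The DCT from scipy stands in for the MDCT, and a single uniform quantizer step replaces the psychoacoustic-model-driven bit allocation; the function names and the step value are assumptions made only for this sketch.

```python
import numpy as np
from scipy.fft import dct, idct  # DCT stands in for the MDCT used in real transform codecs

def encode_frame(s_n, step=0.05):
    """Toy transform-codec encoder: T/F transform, then quantize the spectral coefficients."""
    S_f = dct(s_n, type=2, norm="ortho")        # time domain -> frequency domain
    # A real codec derives a masking curve from a psychoacoustic model here and varies the
    # quantizer step per band; a single uniform step is used as a placeholder assumption.
    return np.round(S_f / step).astype(int)     # quantized parameters sent to the decoder

def decode_frame(q, step=0.05):
    """Toy transform-codec decoder: dequantize, then F/T transform back to the time domain."""
    return idct(q * step, type=2, norm="ortho")

frame = np.sin(2 * np.pi * 440 * np.arange(320) / 16000)   # 20 ms of a 440 Hz tone at 16 kHz
print(np.max(np.abs(frame - decode_frame(encode_frame(frame)))))   # small reconstruction error
```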
On the other hand, linear predictive coding derives a residual signal (excitation signal) by applying linear prediction to an input audio signal, exploiting the predictability of audio signals in the time domain. For voiced regions, whose waveform repeats across time shifts of one pitch period, this model is an extremely efficient representation. After linear prediction, the residual signal is typically coded by one of two methods, namely TCX and CELP.
With respect to TCX (see Non-Patent Literature 2), the residual signal is converted to the frequency domain and coded there. One widely used TCX codec is 3GPP AMR-WB+. A simple configuration of a TCX codec is shown in the figure.
With respect to the encoder shown in the figure, LPC analysis is performed on the input signal, the resulting LPC parameters are quantized, and residual signal Sr(n) is derived by filtering the input signal with the corresponding LPC inverse filter.
Residual signal Sr(n) is converted into residual signal spectral coefficient Sr(f) (205) using a method of converting from the time domain to the frequency domain, such as discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), and/or the like.
Residual signal spectral coefficient Sr(f) is quantized (206), and a quantized parameter is multiplexed (207) and sent to the decoder side.
With respect to the decoder shown in the figure, decoding proceeds as follows.
The quantized parameter is dequantized, and decoded residual signal spectral coefficient Sr˜(f) is reconfigured (210).
Decoded residual signal spectral coefficient Sr˜(f) is converted back to the time domain using a method of converting (211) from the frequency domain to the time domain, such as inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT), and/or the like, and decoded residual signal Sr˜(n) is reconfigured.
Based on the dequantized LPC parameter from dequantization section (209), decoded residual signal Sr˜(n) is processed with LPC synthesis filter (212) to obtain decoded signal S˜(n).
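The TCX-style path can likewise be sketched as follows. The fixed first-order LPC filter, the DCT stand-in for the transform, and the uniform quantizer are illustrative assumptions only, since a real TCX codec derives, quantizes, and transmits its LPC parameters per frame.

```python
import numpy as np
from scipy.fft import dct, idct
from scipy.signal import lfilter

a_lpc = np.array([1.0, -0.9])   # assumed (already quantized) LPC coefficients, A(z) = 1 - 0.9 z^-1

def tcx_encode(s_n, step=0.05):
    sr_n = lfilter(a_lpc, [1.0], s_n)                # LPC analysis (inverse) filter -> residual Sr(n)
    Sr_f = dct(sr_n, type=2, norm="ortho")           # residual to the frequency domain
    return np.round(Sr_f / step).astype(int)         # quantized residual spectrum

def tcx_decode(q, step=0.05):
    sr_dec_n = idct(q * step, type=2, norm="ortho")  # decoded residual Sr~(n)
    return lfilter([1.0], a_lpc, sr_dec_n)           # LPC synthesis filter 1/A(z) -> S~(n)
```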
In CELP coding, the residual signal is quantized using a predetermined codebook. In order to further enhance the sound quality, the difference signal between the original signal and the LPC synthesis signal is typically converted to the frequency domain and further encoded. Examples of coding of such a configuration include ITU-T G.729.1 (see Non-Patent Literature 3) and ITU-T G.718 (see Non-Patent Literature 4). A simple configuration of hierarchical coding (embedded coding), which uses CELP in its core layer together with a transform coding layer, is shown in the figure.
With respect to the encoder shown in the figure, the input signal is coded with CELP, a synthesized signal is reconfigured through CELP local decoding, and error signal Se(n), the difference between the input signal and the synthesized signal, is derived.
Error signal Se(n) is converted into error signal spectral coefficient Se(f) through a method of converting (303) from the time domain to the frequency domain, such as discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), and/or the like.
Se(f) is quantized (304), and a quantized parameter is multiplexed (305) and sent to the decoder side.
With respect to the decoder shown in the figure, decoding proceeds as follows.
The quantized parameter is dequantized, and decoded error signal spectral coefficient Se˜(f) is reconfigured (308).
Decoded error signal spectral coefficient Se˜(f) is converted back to the time domain using a method of converting (309) from the frequency domain to the time domain, such as inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT), and/or the like, and decoded error signal Se˜(n) is reconfigured.
Based on CELP coded parameters, the CELP decoder reconfigures synthesized signal Ssyn(n) (307), and reconfigures decoded signal S˜(n) by adding CELP synthesized signal Ssyn(n) and decoded error signal Se˜(n).
Transform coding is ordinarily carried out using vector quantization.
Due to bit constraints, it is usually impossible to finely quantize all spectral coefficients. Spectral coefficients are often loosely quantized, where only a portion of the spectral coefficients are quantized.
By way of example, several types of vector quantization methods are used in G.718 for spectral coefficient quantization: split multi-rate lattice VQ (SMLVQ) (see Non-Patent Literature 5), factorial pulse coding (FPC), and band selective shape-gain coding (BS-SGC). Each vector quantization method is used in one of the transform coding layers. Due to bit constraints, only some of the spectral coefficients are selected and quantized at each layer.
An example of this is shown in the figure.
When the number of usable bits is limited, it may not always be possible to quantize all spectral coefficients in the transform coding layers, thus resulting in numerous zero spectral coefficients in the decoded spectral coefficients. Under more adverse conditions, a spectral gap occurs in the decoded spectral coefficients.
Due to the spectral gap in the decoded signal spectral coefficients, the decoded signal is perceived as a dull and muffled sound. In other words, the sound quality drops.
An object of the present invention is to provide an audio coding apparatus and audio decoding apparatus that are capable of mitigating sound quality degradation.
With the present invention, a spectral gap caused by loose quantization is closed.
As shown in the figure, the spectral gap of the transform coding layer is closed using a spectrum obtained by shaping the synthesized signal spectral coefficients from the CELP core layer.
Details of a spectral envelope shaping process are presented below.
First, a process of an audio coding apparatus will be presented. (1) Decoded error signal spectral coefficient Se˜(f) of the transform coding layer is reconfigured. (2) Decoded signal spectral coefficient S˜(f) is reconfigured by adding synthesized signal spectral coefficient Ssyn(f) from the CELP core layer and decoded error signal spectral coefficient Se˜(f) from the transform coding layer, as given by the equation below.
[1]
S˜(f) = Se˜(f) + Ssyn(f)   (Equation 1)
where Se˜(f) is the decoded error signal spectral coefficient, Ssyn(f) is the synthesized signal spectral coefficient from the CELP core layer, and S˜(f) is the decoded signal spectral coefficient.
(3) Decoded signal spectral coefficient S˜(f) and input signal spectral coefficient S(f) are both divided into a plurality of subbands. (4) For each subband, the energy of input signal spectral coefficient S(f) corresponding to zero decoded error signal spectral coefficient Se˜(f) is calculated as indicated by the equation below. The term “zero decoded error signal spectral coefficient” refers to a decoded error signal spectral coefficient whose spectral coefficient value is zero.
[2]
Eorg_i = Σ S(f)²   (Equation 2)
where the summation runs over the frequencies f in the ith subband for which Se˜(f) = 0, and Eorg_i is the energy of the input signal spectral coefficients corresponding to zero decoded error signal spectral coefficients in the ith subband.
(5) For each subband, the energy of decoded signal spectral coefficient S˜(f) corresponding to zero decoded error signal spectral coefficient Se˜(f) is calculated as indicated by the equation below.
[3]
Edec_i = Σ S˜(f)²   (Equation 3)
where the summation runs over the frequencies f in the ith subband for which Se˜(f) = 0, and Edec_i is the energy of the decoded signal spectral coefficients corresponding to zero decoded error signal spectral coefficients in the ith subband.
(6) For each band, an energy ratio such as that given by the equation below is determined.
[4]
Gi = Eorg_i / Edec_i   (Equation 4)
where Gi is the energy ratio with respect to the ith subband, Eorg_i is the energy calculated in Equation 2, and Edec_i is the energy calculated in Equation 3.
(7) The energy ratio is quantized and sent to the audio decoding apparatus side.
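A minimal sketch of steps (1) through (6), assuming the subband boundaries are given as index arrays and omitting the final quantization of the energy ratios; the array and function names are illustrative only.

```python
import numpy as np

def envelope_ratios(S_f, Se_dec_f, Ssyn_f, sb_start, sb_end):
    """Per-subband energy ratios G_i, following steps (1)-(6) (Equations 1-4)."""
    S_dec_f = Se_dec_f + Ssyn_f                       # Equation 1: decoded signal spectrum
    G = []
    for i in range(len(sb_start)):
        band = slice(sb_start[i], sb_end[i] + 1)      # sb_end[i] taken as inclusive
        gap = Se_dec_f[band] == 0                     # zero decoded error coefficients
        e_org = np.sum(S_f[band][gap] ** 2)           # Equation 2
        e_dec = np.sum(S_dec_f[band][gap] ** 2)       # Equation 3
        G.append(e_org / e_dec if e_dec > 0 else 1.0) # Equation 4, guarded against empty bands
    return np.array(G)                                # step (7), quantization, is omitted here
```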
Next, a process of an audio decoding apparatus will be presented. (1) The energy ratio is dequantized. (2) The synthesized signal spectral coefficient from the CELP core layer is shaped in accordance with a spectral envelope shaping parameter derived from the decoded energy ratio. (3) The spectral-envelope-shaped spectrum is used to close the spectral gap of the transform coding layer as indicated in the equation below.
[5]
if Se˜(f) = 0,
Se˜(f) = Ssyn(f) * (√(Gi˜) − 1)
fε[sb_start[i], sb_end[i]]   (Equation 5)
where Se˜(f) is the decoded error signal spectral coefficient, Ssyn(f) is the synthesized signal spectral coefficient from the CELP core layer, Gi˜ is the decoded energy ratio with respect to subband i, sb_start[i] is the minimum frequency of subband i, and sb_end[i] is the maximum frequency of subband i.
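A corresponding decoder-side sketch of Equation 5, again with assumed subband index arrays; only bins whose decoded error coefficient is zero are overwritten.

```python
import numpy as np

def fill_spectral_gaps(Se_dec_f, Ssyn_f, G_dec, sb_start, sb_end):
    """Equation 5: where Se~(f) = 0, substitute the shaped CELP synthesized spectrum."""
    Se_post = Se_dec_f.copy()
    for i in range(len(sb_start)):
        lo, hi = sb_start[i], sb_end[i] + 1           # subband i, sb_end[i] taken as inclusive
        gap = Se_dec_f[lo:hi] == 0                    # spectral-gap bins
        seg = Se_post[lo:hi]                          # view into the output spectrum
        seg[gap] = Ssyn_f[lo:hi][gap] * (np.sqrt(G_dec[i]) - 1.0)
    return Se_post
```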
With the present invention, by closing the spectral gap in the spectrum, dull and muffled sounds in the decoded signal may be prevented, thereby mitigating sound quality degradation.
Embodiments of the present invention are described in detail below with reference to the drawings. With respect to the various embodiments, like elements are designated with like numerals, while omitting redundant descriptions thereof.
With respect to the audio coding apparatus shown in the figure, the input signal is coded by a CELP coding section, and a CELP coded parameter is generated.
CELP local decoding section 602 reconfigures a synthesized signal using a CELP coded parameter. Multiplexing section 609 multiplexes the CELP coded parameter, and sends it to an audio decoding apparatus.
Subtractor 610 derives error signal Se(n) (the difference signal between the input signal and the synthesized signal) by subtracting the synthesized signal from the input signal.
T/F transform sections 603 and 604 convert the synthesized signal and error signal Se(n) into a synthesized signal spectral coefficient and error signal spectral coefficient Se(f) using a method of converting from the time domain to the frequency domain, e.g., discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), and/or the like.
Vector quantization section 605 carries out vector quantization on error signal spectral coefficient Se(f), and generates a vector quantized parameter.
Multiplexing section 609 multiplexes the vector quantized parameter and sends it to the audio decoding apparatus.
At the same time, vector dequantization section 606 dequantizes the vector quantized parameter, and reconfigures decoded error signal spectral coefficient Se˜(f).
Spectral envelope extraction section 607 extracts spectral envelope shaping parameter {Gi} from the synthesized signal spectral coefficient, the error signal spectral coefficient, and the decoded error signal spectral coefficient.
Quantization section 608 quantizes spectral envelope shaping parameter {Gi}. Multiplexing section 609 multiplexes the quantized parameter, and sends it to the audio decoding apparatus.
As shown in the figure, spectral envelope extraction section 607 operates as follows.
First, adder 708 adds synthesized signal spectral coefficient Ssyn(f) and error signal spectral coefficient Se(f) to form input signal spectral coefficient S(f). Adder 707 adds synthesized signal spectral coefficient Ssyn(f) and decoded error signal spectral coefficient Se˜(f) to form decoded signal spectral coefficient S˜(f).
Next, band division sections 702 and 701 divide input signal spectral coefficient S(f) and decoded signal spectral coefficient S˜(f) into a plurality of subbands.
Next, spectral coefficient division sections 704 and 703 reference the decoded error signal spectral coefficient and classify the input signal spectral coefficient and the decoded signal spectral coefficient, respectively, into two classes. Taking the input signal spectral coefficient first: with respect to each subband, spectral coefficient division section 704 classifies an input signal spectral coefficient at a frequency for which the decoded error signal spectral coefficient value is zero as a zero input signal spectral coefficient, and an input signal spectral coefficient at a frequency for which the decoded error signal spectral coefficient value is not zero as a non-zero input signal spectral coefficient. Spectral coefficient division section 703 applies a similar classification, based on the decoded error signal spectral coefficient, to the decoded signal spectral coefficient to obtain zero decoded signal spectral coefficients and non-zero decoded signal spectral coefficients.
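What spectral coefficient division sections 703 and 704 do can be expressed compactly with boolean masks; this is only a sketch, and the function name is an assumption.

```python
import numpy as np

def split_zero_nonzero(X_f, Se_dec_f):
    """Split the coefficients of X_f according to whether Se~(f) is zero at each bin."""
    gap_mask = (Se_dec_f == 0)                 # bins with a zero decoded error coefficient
    return X_f[gap_mask], X_f[~gap_mask]       # (zero class, non-zero class)
```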
An example of this classification is shown in the figure.
Subband energy computation sections 706 and 705 calculate, for each subband, the energy of zero input signal spectral coefficient S″i(f) and the energy of zero decoded signal spectral coefficient S″i˜(f), as indicated by the equations below.
[6]
E″org_i = Σ S″i(f)²   (Equation 6)
where the summation runs over the zero input signal spectral coefficients of the ith subband, E″org_i is the energy of the zero input signal spectral coefficients with respect to the ith subband, and S″i(f) is the zero input signal spectral coefficient with respect to the ith subband.
[7]
E″dec_i = Σ S″i˜(f)²   (Equation 7)
where the summation runs over the zero decoded signal spectral coefficients of the ith subband, E″dec_i is the energy of the zero decoded signal spectral coefficients with respect to the ith subband, and S″i˜(f) is the zero decoded signal spectral coefficient with respect to the ith subband.
The ratio between the above-mentioned two energies is calculated as follows.
[8]
Gi = E″org_i / E″dec_i   (Equation 8)
where Gi is the spectral envelope shaping parameter with respect to the ith subband, E″org_i is the energy of the zero input signal spectral coefficients with respect to the ith subband, and E″dec_i is the energy of the zero decoded signal spectral coefficients with respect to the ith subband.
This {Gi} is outputted as a spectral envelope shaping parameter from divider 707.
With respect to the audio decoding apparatus shown in the figure, the received bit stream is demultiplexed, and the CELP coded parameter, the vector quantized parameter, and the quantized parameter for the spectral envelope shaping parameter are obtained.
By means of the CELP coded parameter, CELP decoding section 902 reconfigures synthesized signal Ssyn(n).
T/F transform section 903 converts synthesized signal Ssyn(n) into synthesized signal spectral coefficient Ssyn(f) using a method of converting from the time domain to the frequency domain, e.g., discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), and/or the like.
Vector dequantization section 904 dequantizes the vector quantized parameter, and reconfigures decoded error signal spectral coefficient Se˜(f).
Dequantization section 905 dequantizes the quantized parameter intended for the spectral envelope shaping parameter, and reconfigures decoded spectral envelope shaping parameter {Gi˜}.
Spectral envelope shaping section 906 closes the spectral gap of the decoded error signal spectral coefficient by means of decoded spectral envelope shaping parameter {Gi˜}, synthesized signal spectral coefficient Ssyn(f), and decoded error signal spectral coefficient Se˜(f) to generate post-processing error signal spectral coefficient Spost-e˜(f).
F/T transform section 907 converts post-processing error signal spectral coefficient Spost-e˜(f) back to the time domain using a method of converting from the frequency domain to the time domain, such as inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT), and/or the like, and reconfigures decoded error signal Se˜(n).
Adder 908 reconfigures decoded signal S˜(n) by adding synthesized signal Ssyn(n) and decoded error signal Se˜(n).
As shown in the figure, spectral envelope shaping section 906 operates as follows.
Band division section 1001 divides synthesized signal spectral coefficient Ssyn(f) into a plurality of subbands.
Next, as shown in the figure, spectral coefficient division section 1002 references decoded error signal spectral coefficient Se˜(f) and classifies the synthesized signal spectral coefficients into zero synthesized signal spectral coefficients (those for which Se˜(f) = 0) and non-zero synthesized signal spectral coefficients.
Spectral envelope shaping parameter generation section 1003 processes decoded spectral envelope shaping parameter Gi˜, and calculates an appropriate spectral envelope shaping parameter. One such method is presented through the equation below.
[9]
Pi = √(Gi˜) − 1   (Equation 9)
where Pi is the derived spectral envelope shaping parameter, and Gi˜ is the decoded spectral envelope shaping parameter of the ith subband.
Then, as indicated by the following equations, the synthesized signal spectral coefficients from the CELP layer are shaped by multiplier 1004 in accordance with the spectral envelope shaping parameter, and a post-processing error signal spectrum is generated by adder 1005.
[10]
if Se˜(f) = 0,
Spost-e˜(f) = Ssyn(f) * Pi   (Equation 10)
[11]
if Se˜(f) ≠ 0,
Spost-e˜(f) = Se˜(f)
fε[sb_start[i], sb_end[i]]   (Equation 11)
where Se˜(f) is the decoded error signal spectral coefficient, Ssyn(f) is the synthesized signal spectral coefficient from the CELP layer, Pi is the derived spectral envelope shaping parameter, Spost-e˜(f) is the post-processing error signal spectral coefficient, sb_start[i] is the minimum frequency of subband i, and sb_end[i] is the maximum frequency of subband i.
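A sketch of how sections 906 through 908 might fit together, with the IDCT standing in for the actual F/T transform and all function and argument names assumed for illustration.

```python
import numpy as np
from scipy.fft import idct

def decode_upper_layer(Se_dec_f, Ssyn_f, ssyn_n, G_dec, sb_start, sb_end):
    """Sections 906-908 as a sketch: shaping (Eq. 9-11), F/T transform, addition to CELP output."""
    Spost = Se_dec_f.copy()                           # Equation 11: coded bins pass through unchanged
    for i in range(len(sb_start)):
        lo, hi = sb_start[i], sb_end[i] + 1
        p_i = np.sqrt(G_dec[i]) - 1.0                 # Equation 9: shaping parameter P_i
        gap = Se_dec_f[lo:hi] == 0
        seg = Spost[lo:hi]
        seg[gap] = Ssyn_f[lo:hi][gap] * p_i           # Equation 10: fill the gap bins
    se_dec_n = idct(Spost, type=2, norm="ortho")      # back to the time domain (IDCT as a stand-in)
    return ssyn_n + se_dec_n                          # decoded signal S~(n)
```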
<Variation>
With respect to the coding section, after at least one of the zero input signal spectral coefficient and the zero decoded signal spectral coefficient has been classified, and, with respect to the decoding section, after the zero synthesized signal spectral coefficient has been classified, band division may be performed taking these classification results into account. This enables subbands to be determined efficiently.
The present invention may be applied to a configuration where the number of bits available for spectral envelope shaping parameter quantization is variable from frame to frame. By way of example, this may include cases where a variable bit rate coding scheme is employed, or where the number of bits used for quantization by vector quantization section 605 varies from frame to frame.
In quantizing spectral envelope shaping parameters, quantization may be performed in order from the higher frequency bands to the lower frequency bands. This is because, with respect to low frequency bands, CELP is able to code audio signals extremely efficiently through linear prediction modeling. Accordingly, when CELP is employed in the core layer, it is perceptually more important to close the spectral gaps in the high frequency bands.
If the number of bits available for spectral envelope shaping parameter quantization falls short, spectral envelope shaping parameters having a large Gi value (Gi > 1) or a small Gi value (Gi < 1) may be selected, and only the selected spectral envelope shaping parameters may be quantized and sent to the decoder side. In other words, spectral envelope shaping parameters are quantized only for subbands in which there is a large difference between the energy of the zero input signal spectral coefficients and the energy of the zero decoded signal spectral coefficients. Since information of the subbands that yield the greater perceptual improvement is thereby selected and quantized, sound quality may be improved. In this case, a flag indicating which subbands were selected is sent.
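One way such a selection could look in code; the deviation-from-one criterion, the per-parameter bit costs, and the greedy loop are assumptions used only to illustrate spending bits on the subbands with the largest energy mismatch.

```python
import numpy as np

def select_subbands(G, bits_available, bits_per_param=4, bits_per_flag=1):
    """Greedily pick the subbands whose energy ratio deviates most from 1 (G_i >> 1 or G_i << 1)."""
    G = np.asarray(G, dtype=float)
    deviation = np.abs(np.log(np.maximum(G, 1e-12)))   # distance of G_i from 1 on a log scale
    selected = []
    for i in np.argsort(deviation)[::-1]:              # largest mismatch first
        if bits_available < bits_per_param + bits_per_flag:
            break
        selected.append(int(i))                        # send a flag for subband i plus its parameter
        bits_available -= bits_per_param + bits_per_flag
    return selected
```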
In quantizing spectral envelope shaping parameters, quantization may be performed with a bound provided so that the spectral envelope shaping parameter decoded after quantization does not exceed the value of the spectral envelope shaping parameter subject to quantization. Consequently, the post-processing error signal spectral coefficient that closes the spectral gap may be prevented from becoming unnecessarily large, and sound quality may be improved.
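A sketch of this bounded quantization with an assumed scalar codebook; the only requirement carried over from the text is that the decoded value never exceeds the value being quantized.

```python
import numpy as np

def quantize_bounded(g, codebook):
    """Pick the largest codebook entry that does not exceed g, so the decoded value never overshoots."""
    codebook = np.sort(np.asarray(codebook, dtype=float))
    allowed = codebook[codebook <= g]
    value = allowed[-1] if allowed.size else codebook[0]          # fall back to the smallest level
    return int(np.searchsorted(codebook, value)), float(value)    # (index sent, decoded value)
```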
In the case of a configuration where coding is performed at a low bit rate, coding accuracy is sometimes insufficient even for bands where there is no spectral gap (i.e., bands coded at a transform coding layer), resulting in a large coding error relative to the input signal spectral coefficient. Under such conditions, it is possible to improve sound quality by applying spectral envelope shaping to bands where there is no spectral gap, just like it is applied to bands where there is a spectral gap. Furthermore, in this case, greater sound quality improving effects are attained when spectral envelope shaping is carried out with respect to bands in which there is no spectral gap, separately from bands in which there is a spectral gap.
A configuration of a spectral envelope extraction section according to the present embodiment is shown in the figure.
A configuration of a spectral envelope shaping section of the present embodiment is shown in the figure.
As shown in the figure, a spectral envelope shaping parameter for bands in which there is no spectral gap is derived from the decoded spectral envelope shaping parameter as indicated by the equation below.
[12]
P′i = √(G′i˜) − 1   (Equation 12)
where P′i is the derived spectral envelope shaping parameter, and G′i˜ is the decoded spectral envelope shaping parameter of the ith subband.
Adder 1204 adds the synthesized signal spectral coefficient and the decoded error signal spectral coefficient to form the decoded signal spectral coefficient as indicated by the equation below.
[13]
S˜(f) = Se˜(f) + Ssyn(f)   (Equation 13)
where Se˜(f) is the decoded error signal spectral coefficient, S˜(f) is the decoded signal spectral coefficient, and Ssyn(f) is the synthesized signal spectral coefficient from the CELP layer.
As indicated by the following equations, the decoded signal spectral coefficients are shaped for each subband in accordance with the spectral envelope shaping parameters by means of band division section 1001, spectral coefficient division section 1002, multipliers 1004-1 and 1004-2, and adders 1005-1 and 1005-2, and the post-processing error signal spectrum is generated.
[14]
if Se˜(f) = 0,
Spost-e˜(f) = S˜(f) * Pi   (Equation 14)
[15]
if Se˜(f) ≠ 0,
Spost-e˜(f) = Se˜(f) + S˜(f) * P′i
fε[sb_start[i], sb_end[i]]   (Equation 15)
where Se˜(f) is the decoded error signal spectral coefficient, S˜(f) is the decoded signal spectral coefficient, Pi is the spectral envelope shaping parameter for a band in which there is a spectral gap, P′i is the spectral envelope shaping parameter for a band in which there is no spectral gap, Spost-e˜(f) is the post-processing error signal spectral coefficient, sb_start[i] is the minimum frequency of subband i, and sb_end[i] is the maximum frequency of subband i.
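A sketch of Equations 12 through 15, assuming separate decoded parameter arrays for the gap bands (G_dec) and the non-gap bands (Gp_dec); these names and the subband index arrays are illustrative only.

```python
import numpy as np

def shape_with_nongap_bands(Se_dec_f, Ssyn_f, G_dec, Gp_dec, sb_start, sb_end):
    """Embodiment 2 sketch: Equations 12-15 with separate parameters for gap and non-gap bins."""
    S_dec_f = Se_dec_f + Ssyn_f                        # Equation 13: decoded signal spectrum
    Spost = Se_dec_f.copy()
    for i in range(len(sb_start)):
        lo, hi = sb_start[i], sb_end[i] + 1
        p_i = np.sqrt(G_dec[i]) - 1.0                  # gap-band parameter (as in Equation 9)
        pp_i = np.sqrt(Gp_dec[i]) - 1.0                # Equation 12: non-gap-band parameter P'_i
        gap = Se_dec_f[lo:hi] == 0
        seg = Spost[lo:hi]
        seg[gap] = S_dec_f[lo:hi][gap] * p_i           # Equation 14
        seg[~gap] = Se_dec_f[lo:hi][~gap] + S_dec_f[lo:hi][~gap] * pp_i   # Equation 15
    return Spost
```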
<Variation>
In the case of a low-bit-rate configuration, a single spectral envelope shaping parameter, used in common for all bands in which there is no spectral gap, may be sent instead of per-band parameters. The spectral envelope shaping parameter in this case may be calculated as indicated by the equation below.
[16]
G′ = E′org / E′dec   (Equation 16)
where G′ is the spectral envelope shaping parameter used in common for the bands in which there is no spectral gap, E′org is the energy of the non-zero input signal spectral coefficients with respect to all subbands, and E′dec is the energy of the non-zero decoded signal spectral coefficients with respect to all subbands.
At the audio decoding apparatus, the spectral envelope shaping parameter is used as indicated by the equation below.
[17]
P′i = √(G′˜) − 1   (Equation 17)
where P′i is the derived spectral envelope shaping parameter, and G′˜ is the decoded spectral envelope shaping parameter for the non-zero synthesized signal spectral coefficients.
One important factor in maintaining the sound quality of the input signal is to maintain an energy balance between different frequency bands. Accordingly, it is extremely important that the energy balance between a band that has a spectral gap in the decoded signal and a band that does not be maintained so as to resemble the input signal. What follows is a description of an embodiment capable of maintaining the energy balance between a band that has a spectral gap and a band that does not.
The energy of the non-zero input signal spectral coefficients over all subbands is calculated as indicated by the equation below.
[18]
E′org = Σ_{i=0,…,Nsb−1} Σ S′i(f)²   (Equation 18)
where the inner summation runs over the Nnonzero[i] non-zero coefficients of the ith subband, E′org is the energy of the non-zero input signal spectral coefficients with respect to all subbands, S′i(f) is the non-zero input signal spectral coefficient with respect to the ith subband, Nsb is the total number of subbands, and Nnonzero[i] is the number of non-zero decoded signal spectral coefficients with respect to the ith subband.
The energy of the non-zero decoded signal spectral coefficients over all subbands is calculated as indicated by the equation below.
[19]
E′dec = Σ_{i=0,…,Nsb−1} Σ S′i˜(f)²   (Equation 19)
where the inner summation runs over the Nnonzero[i] non-zero coefficients of the ith subband, E′dec is the energy of the non-zero decoded signal spectral coefficients with respect to all subbands, S′i˜(f) is the non-zero decoded signal spectral coefficient with respect to the ith subband, Nsb is the total number of subbands, and Nnonzero[i] is the number of non-zero decoded signal spectral coefficients with respect to the ith subband.
Energy ratio computation sections 1310 and 1309 calculate an energy ratio relative to the input signal spectral coefficient and an energy ratio relative to the decoded signal spectral coefficient, respectively, according to the equations below.
[20]
Rorg_i = E″org_i / E′org   (Equation 20)
where Rorg_i is the energy ratio for the input signal with respect to the ith subband, E″org_i is the energy of the zero input signal spectral coefficients with respect to the ith subband, and E′org is the energy of the non-zero input signal spectral coefficients with respect to all subbands.
[21]
Rdec_i = E″dec_i / E′dec   (Equation 21)
where Rdec_i is the energy ratio for the decoded signal with respect to the ith subband, E″dec_i is the energy of the zero decoded signal spectral coefficients with respect to the ith subband, and E′dec is the energy of the non-zero decoded signal spectral coefficients with respect to all subbands.
At divider 707, a spectral envelope shaping parameter is computed as indicated by the following equation.
[22]
Gi = Rorg_i / Rdec_i   (Equation 22)
where Gi is the spectral envelope shaping parameter with respect to the ith subband, Rorg_i is the energy ratio for the input signal with respect to the ith subband, and Rdec_i is the energy ratio for the decoded signal with respect to the ith subband.
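A sketch of the Embodiment 3 extraction, i.e. Equations 18 through 22 computed from the same inputs as before; the small epsilon guards against empty non-gap sets are an added assumption.

```python
import numpy as np

def envelope_ratios_balanced(S_f, Se_dec_f, Ssyn_f, sb_start, sb_end):
    """Embodiment 3 sketch: subband energies normalized by the total non-gap energy (Eq. 18-22)."""
    S_dec_f = Se_dec_f + Ssyn_f
    gap = Se_dec_f == 0
    e_org_total = np.sum(S_f[~gap] ** 2) + 1e-12       # Equation 18 (epsilon guard is an assumption)
    e_dec_total = np.sum(S_dec_f[~gap] ** 2) + 1e-12   # Equation 19
    G = []
    for i in range(len(sb_start)):
        band = slice(sb_start[i], sb_end[i] + 1)
        g = gap[band]
        r_org = np.sum(S_f[band][g] ** 2) / e_org_total        # Equation 20
        r_dec = np.sum(S_dec_f[band][g] ** 2) / e_dec_total    # Equation 21
        G.append(r_org / r_dec if r_dec > 0 else 1.0)          # Equation 22, guarded
    return np.array(G)
```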
In the case of a configuration where coding is performed at a low bit rate, coding accuracy is sometimes insufficient even for bands where there is no spectral gap (i.e., bands coded at a transform coding layer), resulting in a large coding error relative to the input signal spectral coefficient. Under such conditions, it is possible to improve sound quality by applying spectral envelope shaping to bands where there is no spectral gap, just like it is applied to bands where there is a spectral gap. The present embodiment is one where this idea has been applied to Embodiment 3.
In the present embodiment, the spectral envelope shaping parameter used at the audio decoding apparatus for a band in which there is a spectral gap is derived as indicated by the equation below.
[23]
Pi = √(Gi˜ / G′˜) − 1   (Equation 23)
where Pi is the obtained spectral envelope shaping parameter, Gi˜ is the decoded energy ratio with respect to the ith subband, and G′˜ is the decoded energy ratio with respect to the non-zero spectral coefficients.
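A one-line sketch of Equation 23, with a worked value to make the scaling concrete; the numbers are illustrative only.

```python
import numpy as np

def gap_shaping_parameter(G_i_dec, G_prime_dec):
    """Equation 23: gap-band shaping parameter expressed relative to the non-gap energy ratio."""
    return np.sqrt(G_i_dec / G_prime_dec) - 1.0

# For a decoded subband ratio of 4.0 and a global non-gap ratio of 1.0, the gap bins of that
# subband are filled with the CELP spectrum scaled by a factor of 1.0.
print(gap_shaping_parameter(4.0, 1.0))   # -> 1.0
```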
Embodiments 1 through 4 of the present invention have been described above.
For these embodiments, the apparatuses were referred to as audio coding apparatuses/audio decoding apparatuses, but the term “audio” as used herein refers to audio in a broad sense. Specifically, an input signal with respect to an audio coding apparatus and a decoded signal with respect to an audio decoding apparatus may include any kind of signal, e.g., an audio signal, a music signal, or an acoustic signal including both of the above, and so forth.
The embodiments above have been described taking as examples cases where the present invention is configured with hardware. However, the present invention may also be realized through software in cooperation with hardware.
The functional blocks used in the descriptions for the embodiments above are typically realized as LSIs, which are integrated circuits. These may be individual chips, or some or all of them may be integrated into a single chip. Although the term LSI is used above, depending on the level of integration, they may also be referred to as IC, system LSI, super LSI, or ultra LSI.
The method of circuit integration is by no means limited to LSI, and may instead be realized through dedicated circuits or general-purpose processors. Field programmable gate arrays (FPGAs), which are programmable after LSI fabrication, or reconfigurable processors, whose connections and settings of circuit cells inside the LSI are reconfigurable, may also be used.
Furthermore, should there arise a technique for circuit integration that replaces LSI due to advancements in semiconductor technology or through other derivative techniques, such a technique may naturally be employed to integrate functional blocks. Applications of biotechnology, and/or the like, are conceivable possibilities.
The disclosure of the specification, drawings, and abstract included in Japanese Patent Application No. 2010-234088, filed on Oct. 18, 2010, is incorporated herein by reference in its entirety.
The present invention is applicable to wireless communications terminal apparatuses, base station apparatuses, teleconference terminal apparatuses, video conference terminal apparatuses, voice over Internet Protocol (VoIP) terminal apparatuses, and/or the like, of mobile communications systems.