The present invention relates to an encoding apparatus, decoding apparatus, and encoding and decoding methods adopting a principal component analysis transformation.
In conventional speech communication systems, monaural speech signals are transmitted under the constraint of a limited transmission band. As communication networks have become broadband, users' expectations of speech communication have risen from mere intelligibility to stereo imaging and naturalness, and a trend toward delivering stereo speech has emerged. A coding scheme that transmits stereo speech efficiently is therefore desired.
To achieve the above goal, encoding methods using PCA (Principal Component Analysis) have been studied as a method of encoding a stereo signal (i.e. two channels) or a plurality of channels (see Non-Patent Literature 1 and Non-Patent Literature 2). In an encoding method using PCA, an input signal is transformed by PCA (PCA-transformation) and each transformed signal is encoded independently. PCA transformation refers to linear transformation that achieves energy concentration in an input signal according to the distribution of eigenvalues obtained from the co-variance matrix of the input signal.
For example, a PCA-transformed stereo signal is transformed into a principal signal corresponding to principal components of the stereo signal (e.g. audio signal components or dominant speech components), and a secondary signal corresponding to the rest of the components other than the principal signal of the stereo signal. That is, the energy of the stereo signal is concentrated on the principal signal. By this means, with an encoding method using PCA, it is possible to remove the redundancy in an input signal by encoding signals in which energy is concentrated, so that it is possible to improve the efficiency of coding. Also, the principal signal and the secondary signal of a stereo signal are mutually uncorrelated, so that it is possible to further remove the redundancy in an input signal.
Here, v1 and v2 refer to the PCA transformation parameters to use to transform left signal L(n) and right signal R(n) into primary signal P(n) and secondary signal A(n). Encoding section 12 and encoding section 13 encode primary signal P(n) and secondary signal A(n) independently (e.g. scalar quantization or vector quantization), and output encoded data of primary signal P(n) and encoded data of secondary signal A(n) to multiplexing section 15. Also, quantizing section 14 quantizes PCA transformation parameters v1 and v2 obtained in PCA transformation section 11, and generates quantized codes of the PCA transformation parameters. Multiplexing section 15 multiplexes the encoded data of primary signal P(n), the encoded data of secondary signal A(n) and the quantized codes of the PCA transformation parameters, and generates bit streams.
Upon decoding a stereo signal in a decoding apparatus shown in
Also, according to speech communication systems, in speech data communication on IP networks, speech coding providing a scalable configuration is demanded to realize traffic control on networks and multicast communication. A scalable configuration refers to a configuration in which the receiving side can decode speech data even from partial encoded data. As a speech encoding technique providing a scalable configuration, scalable encoding (layer encoding) techniques integrating a plurality of encoding techniques in a layered manner have been studied. In scalable encoding techniques, the transmitting side performs layered coding processing of input speech signals and transmits encoded data layered in a plurality of encoded layers.
Also, in speech communication systems, there is a demand to compress speech signals at a low bit rate and transmit the results for efficient use of radio resources. Under a low bit rate constraint, when stereo signal coding is performed using the above PCA, it is difficult to encode both the primary signal and the secondary signal in high quality. Consequently, it is necessary to adequately allocate limited bits to the primary signal and the secondary signal. For example, Non-Patent Literature 1 and Non-Patent Literature 2 disclose a bit allocation method in stereo signal coding using PCA.
Non-Patent Literature 1 discloses a method of applying parametric coding to a secondary signal in stereo signal coding processing. That is, in a primary signal and a secondary signal, the secondary signal is represented as a parameter (parametric coding parameter) based on the difference between the characteristic of primary signal encoded data and the characteristic of the secondary signal. By applying parametric coding to the secondary signal, the redundancy of the secondary signal is removed, which decreases the bit rate of the secondary signal. By this means, primary signal encoded data and parametric coding parameter (secondary signal) with a low bit rate are allocated to limited bits.
Non-Patent Literature 2 discloses a bit allocation method of adaptively allocating bits according to the energy of each of a plurality of channels obtained by applying PCA transformation to an input signal. For example, in stereo signal coding processing, bits are adaptively allocated according to the energy of each of a primary signal and a secondary signal obtained by applying PCA transformation to a stereo signal (i.e. two channels). By this means, it is possible to preferentially transmit the channel of higher energy among a plurality of channels after PCA transformation. Also, under a low bit rate constraint, it is possible to discard the channel of lower energy among a plurality of channels forming a stereo signal. This transmission method is referred to as “channel scalability transmission method.”
Non-Patent Literature
However, in scalable coding systems using a scalable coding technique for stereo signals, if the above bit allocation method is adopted, the amount of information (the number of bits) of bit allocation information to be reported from the encoding apparatus to the decoding apparatus increases, and therefore the efficiency of coding degrades.
To be more specific, if the bit allocation method disclosed in Non-Patent Literature 1 is applied to a scalable coding system, a parametric coding parameter based on a principal signal subjected to scalable coding needs to be updated in each coding layer of scalable coding. Also, this parametric coding parameter requires a predetermined number of bits in each coding layer. That is, the encoding apparatus needs to report, to the decoding apparatus, bit allocation information indicating the amount of information (number of bits) of the parametric coding parameter that varies between coding layers, and therefore the efficiency of coding degrades.
Also, if the bit allocation method disclosed in Non-Patent Literature 2 is applied to a scalable coding system, the number of bits allocated to the primary signal and secondary signal of a stereo signal varies between coding layers. Consequently, the encoding apparatus needs to report, to the decoding apparatus, bit allocation information indicating the number of bits allocated to the primary signal and the secondary signal, and therefore the efficiency of coding degrades.
Thus, in a scalable coding system, when bits are allocated to the primary signal and secondary signal obtained by applying PCA transformation to a stereo signal, it is necessary to report bit allocation information of a predetermined number of bits in every coding layer, which increases the amount of bit allocation information to be reported to the decoding apparatus.
It is therefore an object of the present invention to provide an encoding apparatus, decoding apparatus, and encoding and decoding methods for minimizing the amount of bit allocation information and generating stereo signals of high quality upon using a scalable coding technique for stereo signals.
The encoding apparatus of the present invention employs a configuration having: a transformation section that performs principal component analysis transformation of a first channel signal and a second channel signal of an input stereo signal, to generate a first layer primary signal and a first layer secondary signal; an m-th layer selecting section that compares importance of an m-th layer primary signal (where m is a natural number equal to or greater than 1 and equal to or less than M) and importance of an m-th layer secondary signal in a first layer to an M-th layer (where M is a natural number equal to or greater than 2), and selects a signal of higher importance; an m-th layer encoding section that encodes the signal selected in the m-th layer selecting section, to generate m-th layer encoded data in the first layer to the M-th layer; an m-th layer decoding section that decodes the m-th encoded data to generate an m-th layer decoded signal in the first layer to an (M−1)-th layer; a subtracting section that generates a signal obtained by subtracting the m-th layer decoded signal from the signal selected in the m-th layer selecting section, and a signal that is not selected in the m-th layer selecting section, as an (m+1)-th layer primary signal and an (m+1)-th layer secondary signal, in the first layer to the (M−1)-th layer; and a transmitting section that transmits encoded data of the first layer to the M-th layer and signal information indicating signals selected in selecting sections in the first layer to the M-th layer.
According to the present invention, upon using a scalable coding technique for stereo signals, the encoding apparatus encodes only the signal of the higher importance between two signals of a primary signal and a secondary signal obtained by applying PCA transformation to a stereo signal in each coding layer, so that it is possible to minimize the amount of bit allocation information while the decoding side can generate stereo signals of high quality.
Now, embodiments of the present invention will be explained using the accompanying drawings.
(Embodiment 1)
In encoding apparatus 100 shown in
Adaptive residue encoding sections 102-1 to 102-M each adaptively select one of the two signals based on the importance of the primary signal and the importance of the secondary signal in the corresponding coding layer, and encode the selected signal (i.e. adaptive residue encoding). To be more specific, in the first layer to the M-th layer, adaptive residue encoding section 102-m (m is a natural number equal to or greater than 1 and equal to or less than M) compares the importance of the m-th layer primary signal and the importance of the m-th layer secondary signal, selects the signal of the higher importance and generates m-th layer encoded data (bit sequence) by encoding the selected signal. Also, in the first layer to the (M−1)-th layer, adaptive residue encoding section 102-m generates a residual signal obtained by subtracting a decoded signal of encoded data from the selected signal, and the other signal than the selected signal, as the (m+1)-th layer primary signal and the (m+1)-th layer secondary signal, respectively. Also, in the first layer to the M-th layer, adaptive residue encoding section 102-m generates an indicator representing signal information to indicate an encoded signal (primary signal or secondary signal). For example, if a signal indicated by the indicator is a primary signal, an encoded signal is the m-th layer primary signal, and, if a signal indicated by the indicator is a secondary signal, an encoded signal is the m-th layer secondary signal. That is, an indicator is generated as bit allocation information to indicate a signal allocated to the bit sequence for encoded data set in each coding layer.
For example, adaptive residue encoding section 102-1, which supports the lowest layer (i.e. first layer), applies adaptive residue encoding processing to first layer primary signal P1(n) and first layer secondary signal A1(n) received as input from PCA transformation section 101, and generates first layer encoded data C1. Also, adaptive residue encoding section 102-1 generates a residual signal obtained by subtracting a decoded signal of encoded data C1 from the encoded signal (the selected signal) in the input signals (first layer primary signal P1(n) and first layer secondary signal A1(n)) and generates the other signal (i.e. the signal that is not selected) than the encoded signal (i.e. the selected signal) in the input signals (first layer primary signal P1(n) and first layer secondary signal A1(n)), as second layer primary signal P^2(n) and second layer secondary signal A^2(n). Also, adaptive residue encoding section 102-1 generates indicator F1 indicating a signal encoded in the first layer (i.e. first layer primary signal P1(n) or first layer secondary signal A1(n)). Then, adaptive residue encoding section 102-1 outputs second layer primary signal P^2(n) and second layer secondary signal A^2(n) to adaptive residue encoding section 102-2 supporting the next coding layer (i.e. a second layer), and outputs indicator F1 and encoded data C1 to multiplexing section 104.
Similarly, adaptive residue encoding section 102-2 receives second layer primary signal P^2(n) and second layer secondary signal A^2(n) as input from adaptive residue encoding section 102-1. Then, in the same way as in adaptive residue encoding section 102-1, adaptive residue encoding section 102-2 generates second layer encoded data C2, third layer primary signal P^3(n), third layer secondary signal A^3(n) and indicator F2. Then, adaptive residue encoding section 102-2 outputs third layer primary signal P^3(n) and third layer secondary signal A^3(n) to adaptive residue encoding section 102-3 supporting the next coding layer (i.e. a third layer), and outputs indicator F2 and encoded data C2 to multiplexing section 104. The same applies to adaptive residue encoding sections 102-3 to 102-M. Here, adaptive residue encoding section 102-M supporting the highest layer (i.e. M-th layer) does not output coding residual signals as the primary signal and secondary signal of the next coding layer. That is, only in the first layer to the (M−1)-th layer, that is, only adaptive residue encoding sections 102-1 to 102-(M−1) generate a coding residual signal obtained by subtracting a decoded signal of encoded data from a selected signal, and a signal that is not selected, as the (m+1)-th layer primary signal and the (m+1)-th layer secondary signal, respectively.
Quantizing section 103 quantizes PCA transformation parameters v1 and v2 received as input from PCA transformation section 101, and generates quantized codes of the PCA transformation parameters. Then, quantizing section 103 outputs the quantized codes of PCA transformation parameters to multiplexing section 104.
Multiplexing section 104 multiplexes encoded data Cm and indicators Fm individually received as input from adaptive residue encoding sections 102-1 to 102-M, and the quantized codes received as input from quantizing section 103, and generates bit streams. The resulting bit streams are transmitted to decoding apparatus 200 (
Eigenvector calculating section 1012 calculates a co-variance matrix eigenvector using the co-variance matrix received as input from co-variance matrix calculating section 1011. Here, the elements of the eigenvector calculated in eigenvector calculating section 1012 are PCA transformation parameters v1 and v2. Then, eigenvector calculating section 1012 outputs the calculated eigenvector (PCA transformation parameters) to PCA transformation matrix forming section 1013 and quantizing section 103 shown in
PCA transformation matrix forming section 1013 forms a PCA transformation matrix using the eigenvector received as input from eigenvector calculating section 1012, and outputs the formed PCA transformation matrix to transformation section 1014.
Transformation section 1014 transforms left signal L(n) and right signal R(n) of a stereo signal into first layer primary signal P1(n) and first layer secondary signal A1(n), using the PCA transformation matrix received as input from PCA transformation matrix forming section 1013. Here, P1(n)=P(n) and A1(n)=A(n).
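The chain of covariance matrix, eigenvector, transformation matrix and transformation described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the rotation form of the transformation matrix built from v1 and v2 is an assumption, since the corresponding equations are not reproduced in this text.

```python
import math

def pca_parameters(left, right):
    """Return (v1, v2): the unit eigenvector of the 2x2 covariance matrix
    of (left, right) that belongs to the largest eigenvalue."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    c_ll = sum((x - mean_l) ** 2 for x in left) / n
    c_rr = sum((y - mean_r) ** 2 for y in right) / n
    c_lr = sum((x - mean_l) * (y - mean_r) for x, y in zip(left, right)) / n
    # Closed form for a symmetric 2x2 matrix: angle of the principal axis.
    theta = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    return math.cos(theta), math.sin(theta)

def pca_transform(left, right, v1, v2):
    """Assumed rotation form: P = v1*L + v2*R, A = -v2*L + v1*R."""
    primary = [v1 * l + v2 * r for l, r in zip(left, right)]
    secondary = [-v2 * l + v1 * r for l, r in zip(left, right)]
    return primary, secondary

def inverse_pca_transform(primary, secondary, v1, v2):
    """Inverse of the assumed rotation (its transpose)."""
    left = [v1 * p - v2 * a for p, a in zip(primary, secondary)]
    right = [v2 * p + v1 * a for p, a in zip(primary, secondary)]
    return left, right
```

For a fully correlated pair such as R(n) = 0.5 L(n), this sketch places all of the energy in the primary signal and leaves the secondary signal at approximately zero, which is the energy concentration that the transformation is meant to provide.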
Next, as an example of adaptive residue encoding processing in adaptive residue encoding sections 102-1 to 102-M, the configuration inside adaptive residue encoding section 102-m supporting the m-th layer will be explained using
In adaptive residue encoding section 102-m shown in
In m-th layer primary signal P^m(n) and m-th layer secondary signal A^m(n) received as input, encoding section 1022-m encodes a signal indicated by indicator Fm received as input from selecting section 1021-m, that is, a signal selected in selecting section 1021-m, to generate m-th layer encoded data Cm. To be more specific, encoding section 1022-m encodes m-th layer primary signal P^m(n) when the signal indicated by indicator Fm is the primary signal, or encodes m-th layer secondary signal A^m(n) when the signal indicated by indicator Fm is the secondary signal. Then, encoding section 1022-m outputs generated m-th layer encoded data Cm to decoding section 1023-m and multiplexing section 104 shown in
Decoding section 1023-m specifies encoded data Cm received as input from encoding section 1022-m based on indicator Fm received as input from selecting section 1021-m and generates an m-th layer decoded signal by decoding encoded data Cm. Here, decoding section 1023-m makes a decoded signal of the other signal than the signal indicated by indicator Fm “0.” Then, in m-th layer decoded signals generated, decoding section 1023-m outputs the decoded signal of the primary signal to subtractor 1024-m and the decoded signal of the secondary signal to subtractor 1025-m. To be more specific, when the signal indicated by indicator Fm is the primary signal, decoding section 1023-m decodes m-th layer primary signal P^m(n) using m-th layer encoded data Cm. Then, decoding section 1023-m outputs decoded signal P{tilde over ( )}m(n) of the primary signal to subtractor 1024-m while outputting “0” to subtractor 1025-m as decoded signal A{tilde over ( )}m(n) of the secondary signal. By contrast with this, when the signal indicated by indicator Fm is the secondary signal, decoding section 1023-m decodes m-th layer secondary signal A^m(n) using encoded data Cm. Then, decoding section 1023-m outputs decoded signal A{tilde over ( )}m(n) of the secondary signal to subtractor 1025-m while outputting “0” to subtractor 1024-m as decoded signal P{tilde over ( )}m(n) of the primary signal.
Subtractor 1024-m generates, as (m+1)-th layer primary signal P^m+1(n), a coding residual signal obtained by subtracting decoded signal P{tilde over ( )}m(n) of the primary signal received as input from decoding section 1023-m, from m-th layer primary signal P^m(n) of an input signal. Then, subtractor 1024-m outputs (m+1)-th layer primary signal P^m+1(n) to adaptive residue encoding section 102-(m+1) supporting the (m+1)-th layer, which is the next coding layer.
Subtractor 1025-m generates, as (m+1)-th layer secondary signal A^m+1(n), a coding residual signal obtained by subtracting decoded signal A{tilde over ( )}m(n) of the secondary signal received as input from decoding section 1023-m, from m-th layer secondary signal A^m(n) of an input signal. Then, subtractor 1025-m outputs (m+1)-th layer secondary signal A^m+1(n) to adaptive residue encoding section 102-(m+1).
For example, when the primary signal is selected in selecting section 1021-m, subtractor 1024-m generates, as (m+1)-th layer primary signal P^m+1(n), a coding residual signal obtained by subtracting a decoded signal of encoded data Cm from m-th layer primary signal P^m(n). Also, subtractor 1025-m generates m-th layer secondary signal A^m(n) as (m+1)-th layer secondary signal A^m+1(n). In contrast, when the secondary signal is selected in selecting section 1021-m, subtractor 1025-m generates, as (m+1)-th layer secondary signal A^m+1(n), a coding residual signal obtained by subtracting a decoded signal of encoded data Cm from m-th layer secondary signal A^m(n). Also, subtractor 1024-m generates m-th layer primary signal P^m(n) as (m+1)-th layer primary signal P^m+1(n).
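The processing of one coding layer described above (encoding the selected signal, locally decoding it, setting the decoded signal of the other signal to "0," and subtracting in both subtractors) might look like the following sketch. The uniform scalar quantizer with step size `step` is only a stand-in for the actual codec of each layer, and the indicator values 0 and 1 and all function names are assumptions made for illustration.

```python
PRIMARY, SECONDARY = 0, 1  # assumed one-bit values of indicator Fm

def adaptive_residue_layer(primary, secondary, indicator, step):
    """One coding layer: returns (codes, next_primary, next_secondary)."""
    selected = primary if indicator == PRIMARY else secondary
    # Encoding section: stand-in uniform scalar quantizer (hypothetical codec).
    codes = [round(x / step) for x in selected]
    # Local decoding section: reconstruct the selected signal.
    decoded = [c * step for c in codes]
    zeros = [0.0] * len(selected)
    # The decoded signal of the signal that was not selected is "0".
    decoded_p = decoded if indicator == PRIMARY else zeros
    decoded_a = decoded if indicator == SECONDARY else zeros
    # Subtractors: coding residual for the selected signal,
    # unchanged pass-through for the other signal.
    next_primary = [p - d for p, d in zip(primary, decoded_p)]
    next_secondary = [a - d for a, d in zip(secondary, decoded_a)]
    return codes, next_primary, next_secondary
```

Calling this function once per layer, and feeding `next_primary` and `next_secondary` into the next call, gives the layered structure; in the M-th layer the two returned residual signals would simply be ignored.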
Next, the configuration inside selecting section 1021-m will be explained using
In selecting section 1021-m shown in
Energy calculating section 1202-m calculates energy EA^m of m-th layer secondary signal A^m(n) according to equation 4. Then, energy calculating section 1202-m outputs calculated energy EA^m to comparison section 1203-m.
Comparison section 1203-m compares energy EP^m received as input from energy calculating section 1201-m and energy EA^m received as input from energy calculating section 1202-m. Then, comparison section 1203-m selects the signal of the higher energy (i.e. primary signal or secondary signal) as a signal to encode in the m-th layer. For example, when energy EP^m is equal to or higher than energy EA^m, comparison section 1203-m selects the primary signal (i.e. m-th layer primary signal P^m(n)) as the signal to encode in the m-th layer. By contrast, when energy EP^m is lower than energy EA^m, comparison section 1203-m selects the secondary signal (i.e. m-th layer secondary signal A^m(n)) as the signal to encode in the m-th layer. Then, comparison section 1203-m generates indicator Fm indicating the selected signal, that is, the signal (primary signal or secondary signal) encoded in the m-th layer.
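The comparison rule above can be illustrated with a short sketch. It assumes that the energy of a frame is the sum of squared samples (the energy equations themselves are not reproduced in this text) and that the indicator takes the value 0 for the primary signal and 1 for the secondary signal; both are assumptions, as are the function names.

```python
PRIMARY, SECONDARY = 0, 1  # assumed one-bit values of indicator Fm

def frame_energy(signal):
    """Assumed energy measure: sum of squared samples over the frame."""
    return sum(x * x for x in signal)

def select_signal(primary, secondary):
    """Return the indicator of the signal to encode in this layer.
    The primary signal wins a tie, as in the rule described above."""
    e_p = frame_energy(primary)
    e_a = frame_energy(secondary)
    return PRIMARY if e_p >= e_a else SECONDARY
```

Because the rule depends only on the current layer's input signals, it can be re-evaluated in every layer on the residual signals handed up from the layer below.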
As described above, encoding apparatus 100 according to the present embodiment encodes only one of the primary signal and the secondary signal every coding layer. Therefore, the amount of information (the number of bits) of an indicator, which is bit allocation information in each coding layer, requires only one bit to distinguish between the primary signal and the secondary signal.
Also, selecting section 1021-m described above may calculate the energy of a primary signal and secondary signal in the logarithmic domain. Also, selecting section 1021-m may use left signal L(n) and right signal R(n) to calculate the energy of the primary signal and the secondary signal, and, for example, may use the energy of left signal L(n) and right signal R(n). Also, selecting section 1021-m may calculate the energy of the primary signal and the secondary signal taking into account masking.
Next, decoding apparatus 200 shown in
Decoding sections 202-1 to 202-M each decode encoded data received as input from demultiplexing section 201, based on indicator Fm received as input from demultiplexing section 201. For example, when the signal indicated by indicator Fm is the primary signal, decoding section 202-m decodes the primary signal using encoded data Cm. Then, decoding section 202-m outputs decoded signal P{tilde over ( )}m(n) to adder 203. In contrast, when the signal indicated by indicator Fm is the secondary signal, decoding section 202-m decodes the secondary signal using encoded data Cm. Then, decoding section 202-m outputs decoded signal A{tilde over ( )}m(n) to adder 204. Also, decoding section 202-m outputs “0” to adder 203 or adder 204 as a decoded signal of the other signal than the signal indicated by indicator Fm.
Adder 203 adds decoded signals P{tilde over ( )}m(n) received as input from decoding sections 202-1 to 202-M. Then, adder 203 outputs decoded primary signal P{tilde over ( )}(n), which is obtained by adding decoded signals of all coding layers (the first layer to the M-th layer), to inverse PCA transformation section 206.
Adder 204 adds decoded signals A{tilde over ( )}m(n) received as input from decoding sections 202-1 to 202-M. Then, adder 204 outputs decoded secondary signal A{tilde over ( )}(n), which is obtained by adding decoded signals of all coding layers (the first layer to the M-th layer), to inverse PCA transformation section 206.
Also, depending on, for example, the communication path condition, part of the bit streams may be discarded. For example, if the bit streams include only encoded data up to the m-th layer (m<M), only the decoding sections of the first to m-th layers operate, and adders 203 and 204 add the decoded signals of these coding layers to obtain decoded primary signal P{tilde over ( )}(n) and decoded secondary signal A{tilde over ( )}(n), which are outputted to inverse PCA transformation section 206.
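The accumulation performed by the decoding sections and the two adders, including the case where only the lower layers are received, can be sketched as follows. The sketch assumes the same stand-in scalar quantizer as a per-layer codec (codes multiplied by a step size) and an indicator value of 0 for the primary signal and 1 for the secondary signal; these, and the function name, are illustrative assumptions.

```python
def decode_layers(stream, steps, frame_len):
    """Sum the decoded signal of every received layer into the decoded
    primary signal or the decoded secondary signal.

    stream: list of (indicator, codes) pairs, lowest layer first; it may
            hold fewer layers than were encoded.
    steps:  assumed quantizer step size of each layer.
    """
    primary = [0.0] * frame_len
    secondary = [0.0] * frame_len
    # zip stops at the last received layer, so a truncated stream is
    # decoded from the layers that are present.
    for (indicator, codes), step in zip(stream, steps):
        target = primary if indicator == 0 else secondary
        for n, code in enumerate(codes):
            target[n] += code * step  # the other signal receives "0"
    return primary, secondary
```

With a three-layer stream, passing only the first element of `stream` yields the coarse first layer reconstruction, and each further layer refines whichever signal that layer's indicator names.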
Dequantizing section 205 dequantizes quantized codes received as input from demultiplexing section 201 and outputs resulting PCA transformation parameters v{tilde over ( )}1 and v{tilde over ( )}2 to inverse PCA transformation section 206.
Inverse PCA transformation section 206 receives decoded primary signal P{tilde over ( )}(n) as input from adder 203, receives decoded secondary signal A{tilde over ( )}(n) as input from adder 204 and receives PCA transformation parameters v{tilde over ( )}1 and v{tilde over ( )}2 as input from dequantizing section 205. According to equation 2, inverse PCA transformation section 206 applies inverse PCA transformation to decoded primary signal P{tilde over ( )}(n) and decoded secondary signal A{tilde over ( )}(n) using PCA transformation parameters v{tilde over ( )}1 and v{tilde over ( )}2, and obtains left signal L{tilde over ( )}(n) and right signal R{tilde over ( )}(n) of a stereo signal.
Thus, according to the present embodiment, encoding apparatus 100 (
Also, in scalable coding, coding residual signals in a lower coding layer are received as the input primary signal and secondary signal in each coding layer. Consequently, the energy of input signals in each coding layer changes depending on the coding result in a lower coding layer. Therefore, encoding apparatus 100 (
(Embodiment 2)
Although adaptive residue coding processing is applied to the primary signal and the secondary signal in the first layer, which is the lowest layer, in Embodiment 1, in the present embodiment band division coding processing is applied to the primary signal in the first layer, further dividing the first layer into layers and performing coding in units of divided frequency bands.
As a method of scalable coding in division frequency band units, studies are underway on, for example, a method of realizing scalable coding by dividing an input signal into a plurality of bands and performing coding in divided band signal units (e.g. see US Patent Application Publication No. 2008/004883, specification), and a method of realizing scalable coding by performing coding in subband units on MDCT coefficients in coding after layer 4 of ITU-T recommendation G.729.1 (i.e. TDAC (Time-Domain Aliasing Cancellation)), and transmitting encoded data preferentially from the subband of the highest energy (see ITU-T recommendation G.729.1 (2006)).
In scalable coding based on band division coding, when an encoded error signal (coding residual signal) of a band signal of the coding target in a lower layer is large, the influence given from the coding residual signal to perceptual decoding quality is larger than the influence given from a band signal of the coding target in a higher layer to perceptual decoding quality.
Therefore, in a coding layer of the band division coding target, the present embodiment adaptively decides whether or not to encode the coding residual signal in a lower layer than each coding layer.
In encoding apparatus 500 shown in
Band division encoding section 501 divides primary signal P1(n) received as input from PCA transformation section 101 into a plurality of bands, and encodes divided band unit signals in a layered manner. Here, when band division encoding section 501 performs coding from the first layer to the L-th layer (L is a natural number equal to or greater than 2), adaptive residue encoding sections 102-2 to 102-M perform coding after the (L+1)-th layer in order. Then, band division encoding section 501 outputs encoded data CS including encoded data generated in each of coding layers up to the L-th layer, and indicator FS including the decision result generated in each of bands (subbands) dividing the first layer coding target band, to multiplexing section 104. Further, band division encoding section 501 outputs a coding residual signal encoded to adaptive residue encoding section 102-2 as input signal P^2(n) of adaptive residue encoding section 102-2.
In band division encoding section 501 shown in
Subband dividing section 552 divides first band signal S1 received as input from band dividing section 551, into a plurality of subband signals S1,sb (sb=1, 2, ..., Nsb, where Nsb represents the number of subband divisions). Then, subband dividing section 552 outputs divided subband signals S1,sb to evaluating section 556 and residue calculating section 557.
Encoding section 553 encodes first band signal S1 received as input from band dividing section 551 at a coding bit rate set in advance, and generates first layer encoded data. Then, encoding section 553 outputs generated first layer encoded data to decoding section 554 and multiplexing section 104 (
Decoding section 554 decodes the first layer encoded data received as input from encoding section 553 and generates first layer decoded signal S{tilde over ( )}1. Then, decoding section 554 outputs generated first layer decoded signal S{tilde over ( )}1 to subband dividing section 555.
Similar to subband dividing section 552, subband dividing section 555 divides first layer decoded signal S{tilde over ( )}1 received as input from decoding section 554, into a plurality of subband signals S{tilde over ( )}1,sb. Then, subband dividing section 555 outputs divided subband signals S{tilde over ( )}1,sb to evaluating section 556 and residue calculating section 557.
Evaluating section 556 decides whether or not the residue energy in each subband is lower than a predetermined threshold, using subband signals S1,sb received as input from subband dividing section 552 and subband signals S{tilde over ( )}1,sb received as input from subband dividing section 555. To be more specific, first, evaluating section 556 calculates the evaluation value related to coding performance in each subband of the first layer, using subband signals S1,sb and subband signals S{tilde over ( )}1,sb. For example, evaluating section 556 uses the SNR (Signal to Noise Ratio) for the coding residual signal in each subband, as an evaluation value. To be more specific, evaluating section 556 calculates SNRsb in the sb-th subband according to equation 5. Here, assume that the number of samples of a subband signal in the sb-th subband is P1,sb.
Further, evaluating section 556 decides whether or not the residue energy is lower than a predetermined threshold, based on the calculated evaluation value (SNR) related to coding performance in each subband. To be more specific, evaluating section 556 compares SNRsb of each subband with predetermined threshold SNRthr, and generates decision result F1,sb in the sb-th subband as follows.
F1,sb=1 if SNRsb<SNRthr
F1,sb=0 else
That is, evaluating section 556 provides “1” as decision result F1,sb when the evaluation value (SNR) in each subband is lower than a predetermined threshold (i.e. when the residue energy is higher than a predetermined threshold), or provides “0” as decision result F1,sb when the evaluation value (SNR) is equal to or higher than a predetermined threshold (i.e. when the residue energy is equal to or lower than a predetermined threshold). Here, evaluating section 556 may set SNRthr in advance, set SNRthr based on the characteristic of the input signal, or set SNRthr every subband. Then, evaluating section 556 outputs decision result F1,sb in each subband to residue calculating section 557 and multiplexing section 104 (
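The per-subband decision can be sketched as below. Since equation 5 is not reproduced in this text, the sketch assumes a conventional SNR definition in decibels, 10·log10 of the subband signal energy over the energy of the coding residual; this definition, the handling of zero energies, and the function names are assumptions.

```python
import math

def subband_snr_db(original, decoded):
    """Assumed SNR of one subband in dB: signal energy over residual energy."""
    signal = sum(s * s for s in original)
    noise = sum((s - d) ** 2 for s, d in zip(original, decoded))
    if noise == 0.0:
        return math.inf   # perfect reconstruction
    if signal == 0.0:
        return -math.inf  # residual only
    return 10.0 * math.log10(signal / noise)

def decide_subbands(original_subbands, decoded_subbands, snr_threshold_db):
    """Return decision results F1,sb: 1 if SNRsb < SNRthr, else 0."""
    return [
        1 if subband_snr_db(orig, dec) < snr_threshold_db else 0
        for orig, dec in zip(original_subbands, decoded_subbands)
    ]
```

A per-subband threshold, which the text allows, could be supported by passing a list of thresholds and comparing each subband against its own entry.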
Residue calculating section 557 calculates the coding residue signal in each subband based on decision result F1,sb received as input from evaluating section 556. To be more specific, in the sb-th subband in which decision result F1,sb is “1,” residue calculating section 557 calculates a coding residual signal in the sb-th subband by subtracting subband signals S{tilde over ( )}1,sb, received as input from subband dividing section 555, from subband signals S1,sb received as input from subband dividing section 552. By contrast, in the sb-th subband in which decision result F1,sb is “0,” residue calculating section 557 does not calculate a coding residual signal. Then, residue calculating section 557 outputs coding residual signal Sr1 of the entire first band including a coding residual signal only in subbands in which decision result F1,sb is “1,” to signal forming section 558.
Signal forming section 558 forms signal S′1 by adding coding residual signal Sr1 received as input from residue calculating section 557 and signal S″1 received as input from band dividing section 551. That is, in the frequency band of first layer primary signal P1(n), signal S′1 has coding residual signal Sr1 in the first band and signal S″1 in the frequency band different from the first band. Then, signal forming section 558 outputs generated signal S′1 to components (not shown) related to second layer coding processing.
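As a rough sketch of this signal forming step, the addition can be pictured as placing the residual in the first band and the remaining signal in the rest of the band. The sketch assumes a frequency-domain representation in which the first band occupies the lowest bins; the function name `form_next_layer_input` and the sample values are hypothetical.

```python
import numpy as np

def form_next_layer_input(sr1, s_rest, first_band_len):
    """Form the next-layer input by adding the first-band residual and the remaining-band signal.

    Both inputs are zero-padded to the full band before the addition,
    assuming the first band occupies the lowest `first_band_len` bins.
    """
    total_len = first_band_len + len(s_rest)
    sr1_full = np.zeros(total_len)
    sr1_full[:first_band_len] = sr1        # residual lives inside the first band
    rest_full = np.zeros(total_len)
    rest_full[first_band_len:] = s_rest    # remaining signal lives outside the first band
    return sr1_full + rest_full

sr1 = np.array([0.5, -0.5, 0.0, 0.0])      # residual with one flagged subband
s_rest = np.array([0.2, 0.1])              # signal outside the first band
s1_prime = form_next_layer_input(sr1, s_rest, first_band_len=4)
```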
Also, band division encoding section 501 uses signal S′1 outputted from signal forming section 558, as an input signal to the second layer. Then, in the second layer, similar to the first layer, band division encoding section 501 divides the input signal into a second band signal of the second layer coding target and a signal different from the second band signal, and encodes the second band signal at a coding bit rate set in advance. Also, band division encoding section 501 uses the signal different from the second band signal, as an input signal in the third layer. Here, band division encoding section 501 uses a frequency band including part of the first band, as the second band. Therefore, band division encoding section 501 preferentially encodes a frequency band signal corresponding to part of the first band in the second band signal. To be more specific, band division encoding section 501 preferentially encodes coding residual signals in part or all of subbands in which subband decision result F1,sb is “1.” The same applies to a third layer or later. Then, band division encoding section 501 outputs, to multiplexing section 104, encoded data CS including encoded data in all coding layers and indicator FS including decision result F1,sb in each subband of the first band.
Next, signal S′1 formed in signal forming section 558 is shown in
By this means, among subbands of the first band, band division encoding section 501 outputs coding residual signals of subbands in which the residue energy is higher than a threshold, to a higher layer as an input signal. Therefore, among coding residual signals obtained in a lower layer, band division encoding section 501 can adaptively select only signals of higher residue energy (i.e. signals of higher importance) as coding residual signals to encode in a higher layer.
Next, the decoding apparatus according to the present embodiment will be explained.
In decoding apparatus 600 shown in
In band division decoding section 601 shown in
Based on decision result F1,sb received as input from demultiplexing section 201, residual signal separating section 652 separates second layer decoded signal S̃′1 received as input from components (not shown) related to second layer decoding processing (i.e. a signal decoded in the second layer to the L-th layer), into decoded residual signal S̃r1 of the first band and decoded signal S̃″1 of the frequency band different from the first band. Then, residual signal separating section 652 outputs decoded residual signal S̃r1 of the first band to band decoded signal forming section 653 and decoded signal S̃″1 of the frequency band different from the first band to decoded signal forming section 654.
Based on decision result F1,sb received as input from demultiplexing section 201, band decoded signal forming section 653 forms the first band decoded signal by adding decoded signal S̃1 received as input from decoding section 651 and decoded residual signal S̃r1 received as input from residual signal separating section 652. To be more specific, band decoded signal forming section 653 adds decoded signal S̃1 and the decoded signals of subbands in which decision result F1,sb is “1” in decoded residual signal S̃r1. Then, band decoded signal forming section 653 outputs the formed first band decoded signal to decoded signal forming section 654.
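The separation and first-band reconstruction on the decoding side can be illustrated with the following sketch. It assumes frequency-domain signals, a first band located in the lowest bins, and equal-width subbands; none of these layout details is stated in this passage, and the names `separate_residual` and `form_first_band` are hypothetical.

```python
import numpy as np

def separate_residual(s_upper_dec, first_band_len):
    """Split the upper-layer decoded signal into the first-band residual and the remaining band."""
    return s_upper_dec[:first_band_len], s_upper_dec[first_band_len:]

def form_first_band(s1_dec, sr1_dec, flags, subband_len):
    """Add the decoded residual to the first-layer decoded signal only in flagged subbands."""
    out = np.array(s1_dec, dtype=float)
    for sb, flag in enumerate(flags):
        if flag == 1:
            lo, hi = sb * subband_len, (sb + 1) * subband_len
            out[lo:hi] += sr1_dec[lo:hi]
    return out

s1_dec = np.array([0.5, -0.5, 2.0, 2.0])                        # first-layer decoded first band
s_upper_dec = np.array([0.25, -0.25, 0.125, 0.125, 1.0, 1.0])   # decoded in the second and later layers
flags = [1, 0]                                                  # decision results from the bit stream

sr1_dec, rest_dec = separate_residual(s_upper_dec, first_band_len=4)
first_band = form_first_band(s1_dec, sr1_dec, flags, subband_len=2)
```

Only the first subband is refined here; the values in the unflagged subband are left as decoded in the first layer.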
Decoded signal forming section 654 forms decoded signal P̃1(n) using the first band decoded signal received as input from band decoded signal forming section 653 and decoded signal S̃″1 of the frequency band different from the first band received as input from residual signal separating section 652. Then, decoded signal forming section 654 outputs formed decoded signal P̃1(n) to adder 203 (
Thus, according to the present embodiment, encoding apparatus 500 (
Also, according to the present embodiment, among subbands of the first band of the first layer coding target, only subbands in which the evaluation value (SNR) is less than a predetermined threshold, that is, only subbands in which the residue energy is higher than a predetermined amount, are used as a coding target signal in a higher layer. That is, only signals of the subbands of higher energy in each coding layer (i.e. signals of the subbands of higher perceptual importance) are received as input in a higher layer. Therefore, in each coding layer in band division encoding section 501, encoding apparatus 500 adaptively encodes signals of higher residue energy (i.e. a signal of higher importance) according to a coding result in a lower layer, so that decoding apparatus 600 (
Also, according to the present embodiment, the coding target signal in each coding layer may be a time domain signal or a frequency domain signal (e.g. coefficients after MDCT transform).
Also, a case has been described above with the present embodiment where band division coding processing is applied to a lower coding layer than a coding layer to which adaptive residue coding processing is applied. However, according to the present invention, a coding layer to which band division coding processing is applied is not limited to a lower coding layer than a coding layer to which adaptive residue coding processing is applied. For example, an encoding apparatus may apply band division coding processing to a coding layer in the middle of a plurality of coding layers to which adaptive residue coding processing is applied.
Also, a case has been described above with the present embodiment where band division coding processing is applied to a PCA-transformed primary signal. However, according to the present invention, a signal to which band division coding processing is applied is not limited to a PCA-transformed primary signal. For example, an encoding apparatus may apply band division coding processing to a coding residual signal in a coding layer in the middle of a plurality of coding layers to which adaptive residue coding processing is applied, or to an arbitrary input signal different from a PCA-transformed signal. Also, an encoding apparatus may apply band division coding processing alone, without combining band division coding processing and adaptive residue coding processing.
Also, a case has been described above with the present embodiment where, in a band division encoding section, a frequency band set in advance from a lower band to a predetermined band in an input signal, is used as the coding target frequency band in each coding layer. However, according to the present invention, it is possible to adaptively set, for example, a frequency band based on the characteristic of an input signal as the coding target frequency band in each coding layer.
Also, a case has been described above with the present embodiment where an encoding apparatus determines whether or not to calculate the coding residual signal in each subband of the first band based on decision result F1,sb. However, according to the present invention, it is equally possible to calculate coding residual signals in all subbands of the first band, regardless of decision result F1,sb.
Embodiments of the present invention have been described above.
Also, cases have been described above with embodiments where signal energy is used as an index of signal importance. However, according to the present invention, the signal importance is not limited to the signal energy, and, for example, signal's SNR (Signal to Noise Ratio) may be used. The configuration inside selecting section 3021-m of adaptive residue encoding section 102-m in a case where the SNR is used as an index of signal importance, will be explained using the block diagram of
Similarly, encoding section 3206-m, decoding section 3207-m, subtractor 3208-m and inverse PCA transformation section 3209-m generate output stereo signals (left signal L^m2(n) and right signal R^m2(n)) in decoding apparatus 200 in a case where m-th layer secondary signal A^m(n) is encoded (i.e. where selecting section 3021-m selects the secondary signal). Then, measurement value calculating section 3210-m calculates quantitative measurement value M2 (i.e. SNR) using left signal L^m2(n) and right signal R^m2(n) (equation 7).
Comparison section 3211-m compares quantitative measurement value M1 and quantitative measurement value M2, selects the signal of the higher quantitative measurement value (i.e. primary signal or secondary signal) as the signal to be encoded, and outputs indicator Fm to indicate the selected signal. That is, selecting section 3021-m internally generates both the output stereo signal that decoding apparatus 200 would obtain upon encoding the primary signal and the output stereo signal that decoding apparatus 200 would obtain upon encoding the secondary signal. By this means, selecting section 3021-m can calculate the SNR in decoding apparatus 200 as a quantitative measurement value. Therefore, selecting section 3021-m selects the signal of the higher SNR in decoding apparatus 200, so that, similar to the above embodiments, it is possible to minimize the amount of bit allocation information to report and improve the efficiency of coding. Here, the quantitative measurement value to indicate signal importance is not limited to the SNR calculated in equations 6 and 7, and it is equally possible to use, for example, an MNR (Mask to Noise Ratio). For example, when an MNR is used as stereo signal importance, it is possible to obtain the MNR through processing including psychoacoustic modeling of left signal L(n) and right signal R(n) in the stereo signal.
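The SNR-based selection can be sketched roughly as below. This is a single-layer simplification, not the exact processing of selecting section 3021-m: a uniform quantizer stands in for the encoding and decoding pair, the signal that is not encoded is replaced by zeros, and the stereo SNR formula is an assumed form because equations 6 and 7 are not reproduced in this passage. All function names and the indicator values (0 for primary, 1 for secondary) are hypothetical.

```python
import numpy as np

def pca_angle(left, right):
    """Rotation angle of the principal eigenvector of the 2x2 covariance matrix."""
    cov = np.cov(np.vstack([left, right]))
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, np.argmax(eigvals)]
    return np.arctan2(v[1], v[0])

def pca_forward(left, right, theta):
    c, s = np.cos(theta), np.sin(theta)
    return c * left + s * right, -s * left + c * right

def pca_inverse(primary, secondary, theta):
    c, s = np.cos(theta), np.sin(theta)
    return c * primary - s * secondary, s * primary + c * secondary

def quantize(x, step=0.1):
    """Hypothetical stand-in for the encoding section followed by the decoding section."""
    return step * np.round(x / step)

def stereo_snr_db(left, right, left_hat, right_hat, eps=1e-12):
    """Assumed SNR: stereo input energy over stereo reconstruction-error energy, in dB."""
    signal = np.sum(left ** 2 + right ** 2)
    noise = np.sum((left - left_hat) ** 2 + (right - right_hat) ** 2)
    return 10.0 * np.log10((signal + eps) / (noise + eps))

def select_signal(left, right, theta):
    """Return (indicator, M1, M2); indicator 0 selects the primary, 1 the secondary signal."""
    primary, secondary = pca_forward(left, right, theta)
    zeros = np.zeros_like(primary)
    m1 = stereo_snr_db(left, right, *pca_inverse(quantize(primary), zeros, theta))
    m2 = stereo_snr_db(left, right, *pca_inverse(zeros, quantize(secondary), theta))
    return (0 if m1 >= m2 else 1), m1, m2

# Hypothetical, strongly correlated stereo frame.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
left = np.sin(t)
right = 0.8 * np.sin(t) + 0.05 * np.cos(3.0 * t)
indicator, m1, m2 = select_signal(left, right, pca_angle(left, right))
```

For this strongly correlated frame the primary signal carries almost all of the energy, so encoding it yields the higher SNR and the indicator selects the primary signal.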
Also, cases have been described above with embodiments where the present invention is applied to time domain stereo signals. However, the present invention is not limited to time domain signals, but is applicable to stereo signals in other domains. For example, it is possible to apply the present invention to stereo signals in the MDCT (Modified Discrete Cosine Transform) domain or LPC (Linear Prediction Coefficient) residual signals obtained by applying an LPC analysis to stereo signals. Also, the present invention is applicable to LPC residual signals in the MDCT domain.
Also, in a case where the encoding apparatus according to the present invention divides an input signal band into a plurality of subbands, the present invention is applicable to subband signals, each of which is the signal of each subband of the input signal. For example, left signal L(n) and right signal R(n) of a stereo signal of an input signal are divided into K subbands to obtain subband signals Lk(n) (k=1 to K) of left signal L(n) and subband signals Rk(n) (k=1 to K) of right signal R(n).
For example, in a stereo signal, a case will be explained with
In
Band dividing section 306 divides LPC residual signal Le(f) in the MDCT domain of the left signal into a plurality of subbands (K subbands in this case), and generates subband signals Le1(f) to LeK(f) of left signal Le(f).
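A minimal sketch of such band division is shown below, assuming contiguous subbands of near-equal width; the actual subband boundaries are not specified in this passage, and the function name `divide_into_subbands` is hypothetical.

```python
import numpy as np

def divide_into_subbands(coeffs, num_subbands):
    """Split a vector of MDCT-domain coefficients into K contiguous subbands."""
    return np.array_split(np.asarray(coeffs, dtype=float), num_subbands)

le = np.arange(10, dtype=float)            # stand-in for MDCT-domain residual coefficients
subbands = divide_into_subbands(le, num_subbands=4)
```

Concatenating the subbands in order restores the original coefficient vector, which is what a band combining step on the decoding side would rely on.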
Similarly, analyzing section 307, quantizing section 308, dequantizing section 309, inverse filter 310, T/F section 311 and band dividing section 312 generate subband signals Re1(f) to ReK(f) of right signal Re(f), by applying, to right signal R(n), the same sequence of processing as performed from LPC analyzing section 301 through band dividing section 306.
Here, for example, a case will be explained where the present invention is applied only to subband signal Le1(f) and subband signal Re1(f) among subband signals Le1(f) to LeK(f) of left signal Le(f) and subband signals Re1(f) to ReK(f) of right signal Re(f). As shown in
In contrast, demultiplexing section 401 of the decoding apparatus shown in
Dequantizing section 451 shown in
In contrast, dequantizing section 455, band combining section 456, F/T section 457 and synthesis filter 458 generate right signal R̃(n) by applying the same processing as in dequantizing section 451, band combining section 452, F/T section 453 and synthesis filter 454, to quantized code IqR and subband signals Re1(f) to ReK(f) of right signal Re(f).
Thus, by transforming an LPC residual signal of a stereo signal into the MDCT domain, dividing the MDCT-domain signal into a plurality of subbands and applying PCA transformation or adaptive residue coding to the divided band signals, it is possible to perform efficient coding suitable to each subband.
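To illustrate applying PCA transformation to each subband independently, the following sketch computes a separate 2x2 transform per subband pair. It assumes the matrix is computed without mean removal and that the subband signals are already available as arrays; the names `pca_transform` and `pca_per_subband` and the random test data are hypothetical.

```python
import numpy as np

def pca_transform(left, right):
    """PCA of one stereo subband pair: returns (primary, secondary, transform matrix)."""
    x = np.vstack([left, right])
    cov = x @ x.T / x.shape[1]                 # 2x2 matrix, no mean removal (assumption)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    w = eigvecs[:, ::-1].T                     # rows: principal, then secondary eigenvector
    primary, secondary = w @ x
    return primary, secondary, w

def pca_per_subband(left_bands, right_bands):
    """Apply an independent PCA transformation to every subband pair."""
    return [pca_transform(l, r) for l, r in zip(left_bands, right_bands)]

# Hypothetical correlated stereo coefficients, split into 4 subbands.
rng = np.random.default_rng(0)
base = rng.standard_normal(64)
left = base
right = 0.9 * base + 0.1 * rng.standard_normal(64)
left_bands = np.array_split(left, 4)
right_bands = np.array_split(right, 4)
results = pca_per_subband(left_bands, right_bands)
```

Because each subband gets its own eigenvectors, the energy concentration in the primary signal can follow the stereo characteristics of that subband rather than those of the full band.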
Also, cases have been described above with embodiments where, when a stereo signal is PCA-transformed, PCA transformation parameters before quantization (i.e. elements of the co-variance matrix eigenvector calculated from a stereo signal) are used. However, according to the present invention, it is equally possible to use quantized PCA transformation parameters for the PCA transformation.
Also, cases have been described above with embodiments where adaptive residue coding processing is performed in coding layers from the first layer to the M-th layer. However, according to the present invention, it is possible to omit adaptive residue coding processing in the first layer, which is the lowest layer. For example, the primary signal is more important information than the secondary signal in the first layer, so that the encoding apparatus can omit adaptive residue coding processing in the first layer and always select the primary signal. In this case, the encoding apparatus transmits indicators in the second layer to the M-th layer. That is, the indicator in the first layer need not be transmitted, so that it is possible to reduce bit allocation information by one bit. Also, a case is possible where the encoding apparatus encodes both the primary signal and the secondary signal in the first layer and the present invention is applied to the second and later coding layers.
Also, according to the present invention, it is equally possible to omit adaptive residue coding processing from the first layer, which is the lowest layer, up to a predetermined coding layer. For example, in the first layer to the (i−1)-th layer (i is a natural number equal to or greater than 2), the encoding apparatus may omit adaptive residue coding processing and always select the primary signal. That is, the present invention is applicable to the i-th layer to the M-th layer in the encoding apparatus. Also, a case is possible where the encoding apparatus encodes both the primary signal and the secondary signal in the first layer to the (i−1)-th layer and the present invention is applied in the i-th layer to the M-th layer.
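The effect on the indicator overhead can be worked out with a small sketch. It assumes one indicator bit per coding layer in which adaptive residue coding is applied, which is consistent with the one-bit saving mentioned above; the function name `indicator_bits` is hypothetical.

```python
def indicator_bits(num_layers, first_adaptive_layer=1):
    """Indicator bits per frame when layers i..M use adaptive residue coding (1 bit each, assumed)."""
    if not 1 <= first_adaptive_layer <= num_layers:
        raise ValueError("first_adaptive_layer must lie between 1 and num_layers")
    return num_layers - first_adaptive_layer + 1

bits_all_layers = indicator_bits(4)        # adaptive coding in layers 1 to 4
bits_skip_first = indicator_bits(4, 2)     # layer 1 always selects the primary signal
bits_from_third = indicator_bits(4, 3)     # layers 1 and 2 always select the primary signal
```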
Also, cases have been described above with embodiments where adaptive residue coding processing is performed in coding layers from the first layer to the M-th layer. However, the present invention is applicable to at least one arbitrary coding layer among the first layer to the M-th layer.
Also, PCA transformation may be referred to as KLT (Karhunen-Loève Transform).
Also, example cases have been described with the above embodiments where the decoding apparatus according to the above embodiments receives and processes bit streams transmitted from the encoding apparatus according to the above embodiments. However, the present invention is not limited to this; it suffices that the bit streams received and processed in the decoding apparatus according to the above embodiments are transmitted from an encoding apparatus capable of generating bit streams that the decoding apparatus can process.
Also, the above explanation is an example of the best mode for carrying out the present invention, and the scope of the present invention is not limited to this. The present invention is applicable to any system as long as the system includes an encoding apparatus and a decoding apparatus.
Also, for example, as a speech encoding apparatus and a speech decoding apparatus, the encoding apparatus and the decoding apparatus according to the present invention can be mounted on a communication terminal apparatus and base station apparatus in a mobile communication system, so that it is possible to provide a communication terminal apparatus, base station apparatus and mobile communication system having the same operational effects as above.
Although example cases have been described with the above embodiments where the present invention is implemented with hardware, the present invention can be implemented with software. For example, by describing the algorithm according to the present invention in a programming language, storing this program in a memory and running this program by an information processing section, it is possible to realize the same function as the encoding apparatus according to the present invention.
Furthermore, each function block employed in the description of each of the aforementioned embodiments may typically be implemented as an LSI constituted by an integrated circuit. These may be individual chips or partially or totally contained on a single chip.
“LSI” is adopted here but this may also be referred to as “IC,” “system LSI,” “super LSI,” or “ultra LSI” depending on differing extents of integration.
Further, the method of circuit integration is not limited to LSIs, and implementation using dedicated circuitry or general purpose processors is also possible. After LSI manufacture, utilization of an FPGA (Field Programmable Gate Array) or a reconfigurable processor where connections and settings of circuit cells in an LSI can be reconfigured is also possible.
Further, if integrated circuit technology emerges to replace LSIs as a result of the advancement of semiconductor technology or another derivative technology, it is naturally also possible to carry out function block integration using this technology. Application of biotechnology is also possible.
The disclosures of Japanese Patent Application No. 2008-143863, filed on May 30, 2008, and Japanese Patent Application No. 2008-160954, filed on Jun. 19, 2008, including the specifications, drawings and abstracts, are incorporated herein by reference in their entireties.
Industrial Applicability
The encoding apparatus and the decoding apparatus according to the present invention are suitable for use in, for example, mobile phones, IP telephones and television conference systems.
Number | Date | Country | Kind |
---|---|---|---|
2008-143863 | May 2008 | JP | national |
2008-160954 | Jun 2008 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2009/002384 | 5/29/2009 | WO | 00 | 11/2/2010 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2009/144953 | 12/3/2009 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5483534 | Ohki et al. | Jan 1996 | A |
6571227 | Agrafiotis et al. | May 2003 | B1 |
7599835 | Moriya et al. | Oct 2009 | B2 |
7751572 | Villemoes et al. | Jul 2010 | B2 |
8036904 | Myburg et al. | Oct 2011 | B2 |
8218775 | Norvell et al. | Jul 2012 | B2 |
20050091051 | Moriya et al. | Apr 2005 | A1 |
20050141721 | Aarts et al. | Jun 2005 | A1 |
20050213522 | Aarts et al. | Sep 2005 | A1 |
20060195314 | Taleb et al. | Aug 2006 | A1 |
20060233379 | Villemoes et al. | Oct 2006 | A1 |
20080004883 | Vilermo et al. | Jan 2008 | A1 |
20080195397 | Myburg et al. | Aug 2008 | A1 |
20090083045 | Briand et al. | Mar 2009 | A1 |
20090271184 | Goto et al. | Oct 2009 | A1 |
20090279598 | Moriya et al. | Nov 2009 | A1 |
20100235171 | Takagi et al. | Sep 2010 | A1 |
20100322429 | Norvell et al. | Dec 2010 | A1 |
20100332239 | Kim et al. | Dec 2010 | A1 |
20110004466 | Morii | Jan 2011 | A1 |
20110046946 | Liu et al. | Feb 2011 | A1 |
20110125495 | Morii et al. | May 2011 | A1 |
20120063604 | Myburg et al. | Mar 2012 | A1 |
Number | Date | Country |
---|---|---|
2002-223455 | Aug 2002 | JP |
2005-522721 | Jul 2005 | JP |
2005522722 | Jul 2005 | JP |
2006103581 | Oct 2006 | WO |
2007104883 | Sep 2007 | WO |
Entry |
---|
Dai Yang, Hongmei Ai, Chris Kyriakakis and C.-C. Jay Kuo, “High-fidelity multichannel audio coding with Karhunen-Loève Transform”, IEEE Transactions on Speech and Audio Processing, Jul. 2003, vol. 11, No. 4, pp. 365-380. |
Manuel Briand, David Virette and Nadine Martin, “Parametric coding of stereo audio based on principal component analysis”, Proc. of the 9th Int. Conference, Sep. 18-20, 2006, pp. 291-296. |
Robbert G. van der Waal et al., “Subband coding of stereophonic digital audio signals”, Proc. of ICASSP '91, Apr. 14, 1991, vol. 5, pp. 3601-3604. |
“Series G: Transmission Systems and Media, Digital Systems and Networks; Digital Terminal Equipments - Coding of analogue signals by methods other than PCM”, ITU-T Recommendation G.729.1, pp. 1-91. |
U.S. Appl. No. 12/990,697 to Toshiyuki Morii et al., which was filed Nov. 2, 2010. |
Number | Date | Country | |
---|---|---|---|
20110046946 A1 | Feb 2011 | US |