The present invention relates to a coding apparatus and a coding method used for a communication system that encodes and transmits a signal.
Compression/coding techniques are often used when transmitting a speech signal and/or a sound signal in a packet communication system, such as Internet communication or a mobile communication system, to improve the transmission efficiency of the speech signal and/or the sound signal. Beyond simply encoding the speech signal and/or the sound signal at a low bit rate, there is also a growing demand for techniques that encode a wider-band speech signal and/or sound signal, and for techniques that encode and decode with a small amount of processing calculation without degrading sound quality.
Various techniques for satisfying such demands are being developed to reduce the amount of processing calculation without causing quality degradation of a decoded signal. For example, according to a technique disclosed in PTL 1, the amount of processing calculation in pitch period search (adaptive codebook search) is reduced in a code excited linear prediction (CELP) type coding apparatus. More specifically, the coding apparatus sparsifies the update of an adaptive codebook. In a processing method for the sparsification, in the case where the amplitude of a sample does not exceed a given threshold, the value of the sample is replaced with zero (0). In this way, processing (more specifically, multiplication processing) on a portion in which the value of the sample is 0 is omitted at the time of the pitch period search, whereby the amount of calculation is reduced. PTL 1 also discloses a configuration in which the threshold is set to be adaptively variable for each process. PTL 1 also discloses a configuration in which: samples are ranked in descending order of absolute values of samples; and the values of samples other than a desired number of samples from the top in the ranking are replaced with zero (0).
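The threshold rule described above for PTL 1 can be illustrated with a short Python sketch. The function names `sparsify_by_threshold` and `count_nonzero` are hypothetical and are not taken from PTL 1. The sketch shows only the rule "replace a sample with zero when its amplitude does not exceed the threshold".

```python
from typing import List, Sequence


def sparsify_by_threshold(samples: Sequence[float], threshold: float) -> List[float]:
    """Replace every sample whose magnitude does not exceed `threshold` with zero."""
    return [x if abs(x) > threshold else 0.0 for x in samples]


def count_nonzero(samples: Sequence[float]) -> int:
    """Number of samples that would still need a multiplication in a correlation."""
    return sum(1 for x in samples if x != 0.0)


if __name__ == "__main__":
    subframe = [0.1, -0.8, 0.05, 0.4, -0.2]
    print(sparsify_by_threshold(subframe, 0.3))  # [0.0, -0.8, 0.0, 0.4, 0.0]
```

Because a threshold keeps every sample that exceeds it, the number of surviving samples (and therefore the number of multiplications in a later correlation) depends on the signal itself: a subframe made of uniformly loud samples keeps all of them.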
PTL 2 discloses a technique concerning a reduction in the amount of calculation in correlation processing in a frequency domain. According to this technique, when a position at which a low-band spectrum similar to a high-band spectrum appears is specified through correlation analysis, a high-band spectrum whose amplitude value is small is replaced with zero. In this way, part of the processing necessary for the correlation analysis is omitted, whereby the amount of calculation is reduced.
PTL 1 discloses, for example, a configuration in which the coding apparatus adaptively changes, for each process (subframe process), the threshold for selecting the samples to be sparsified (samples whose value is replaced with zero (0)) at the time of the pitch period search. With this method, however, although the average amount of processing calculation over an entire frame can be reduced in some cases, subframes in which the amount of calculation can be reduced coexist with subframes in which it cannot, so the amount of processing calculation is not necessarily reduced in frame-based processing. In other words, this method cannot guarantee a reduction in the amount of processing calculation in the worst case (that is, in the frame in which the amount of processing calculation is largest). Accordingly, the amount of processing calculation needs to be significantly reduced also in subframe-based processing, without causing quality degradation of a decoded signal. Similarly, when correlation processing in a frequency domain is performed as in PTL 2, the amount of processing calculation needs to be significantly reduced also in subband-based processing within one frame, without causing quality degradation of a decoded signal.
An object of the present invention is to provide a coding apparatus and a coding method that can reliably reduce the amount of subframe-based processing calculation or the amount of subband-based processing calculation (reduce the amount of processing calculation in the worst case) without causing quality degradation of a decoded signal when a correlation operation such as pitch period search is performed at the time of input signal coding.
A coding apparatus according to an aspect of the present invention includes: an acquisition section that acquires transform coefficients whose frequency band is divided between a low-band part and a high-band part; a division section that divides one frequency band of the low-band part and high-band part of the transform coefficients into a plurality of subbands; a setting section that sets a degree of importance for each of the subbands; a changing section that changes, to zero, amplitude values of a predetermined number of transform coefficients of the plurality of transform coefficients included in each of the subbands, in accordance with the set degree of importance; and a calculation section that calculates a correlation between the changed transform coefficients in the one frequency band and the transform coefficients in the other frequency band.
A coding method according to an aspect of the present invention includes: acquiring transform coefficients whose frequency band is divided between a low-band part and a high-band part; dividing one frequency band of the low-band part and the high-band part of the transform coefficients into a plurality of subbands; setting a degree of importance for each of the subbands; changing, to zero, amplitude values of a predetermined number of transform coefficients of the transform coefficients included in each of the subbands, in accordance with the set degree of importance; and calculating a correlation between the changed transform coefficients in the one frequency band and the transform coefficients in the other frequency band.
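The processing recited above can be pictured with the following Python sketch. It assumes that the high-band part is the "one frequency band" divided into subbands, that the degree of importance of each subband is ranked by subband energy (mirroring the subframe-energy rule of Embodiment 1), that `budgets_by_rank` is a hypothetical table giving how many coefficients each importance rank keeps, and that the correlation is a squared normalized cross-correlation evaluated at every start position in the low band. All names are hypothetical; this is an illustration under those assumptions, not the claimed apparatus itself.

```python
from typing import Dict, List, Sequence, Tuple


def sparsify_subbands(high: Sequence[float], band_len: int,
                      budgets_by_rank: Sequence[int]) -> List[float]:
    """Rank subbands of `high` by energy and, in each one, keep only the
    budgets_by_rank[rank] largest-magnitude coefficients (rank 0 = most important)."""
    bands = [list(high[i:i + band_len]) for i in range(0, len(high), band_len)]
    energies = [sum(c * c for c in band) for band in bands]
    # Higher energy means higher importance; ties are broken by subband index.
    order = sorted(range(len(bands)), key=lambda j: (-energies[j], j))
    rank_of: Dict[int, int] = {j: r for r, j in enumerate(order)}
    out: List[float] = []
    for j, band in enumerate(bands):
        budget = budgets_by_rank[rank_of[j]]
        keep = set(sorted(range(len(band)), key=lambda i: (-abs(band[i]), i))[:budget])
        out.extend(c if i in keep else 0.0 for i, c in enumerate(band))
    return out


def best_lag(low: Sequence[float], target: Sequence[float]) -> Tuple[int, float]:
    """Return (start index in `low`, score) maximizing the squared normalized
    correlation with `target`; zeroed target coefficients are skipped entirely."""
    nonzero = [n for n, v in enumerate(target) if v != 0.0]
    best: Tuple[int, float] = (0, float("-inf"))
    for lag in range(len(low) - len(target) + 1):
        corr = sum(target[n] * low[lag + n] for n in nonzero)
        energy = sum(low[lag + n] ** 2 for n in nonzero)
        score = corr * corr / energy if energy > 0.0 else 0.0
        if score > best[1]:
            best = (lag, score)
    return best


if __name__ == "__main__":
    high = sparsify_subbands([1.0, 2.0, 0.1, 0.05], band_len=2, budgets_by_rank=[2, 1])
    low = [0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
    print(high, best_lag(low, high))
```

Ranking the subbands once per frame and fixing a budget per rank bounds the number of multiplications in `best_lag` for every frame, which is the worst-case property the summary above aims at.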
According to the present invention, when a correlation operation is performed on an input signal, samples (transform coefficients) used for the correlation operation are adaptively adjusted for each process, whereby the amount of processing calculation can be remarkably reduced while quality degradation of an output signal is suppressed. The degree of importance of each subframe (the degree of importance of each subband) is determined in advance over an entire frame, and the number of samples (or transform coefficients) used for the correlation operation is determined for each subframe (each subband) in accordance with each degree of importance, whereby a reduction in the amount of processing calculation in the worst case can be guaranteed.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. A speech coding apparatus and a speech decoding apparatus will be described as an example of the coding apparatus and decoding apparatus according to the present invention.
Coding apparatus 101 divides an input signal into blocks of N samples (N=1, 2, . . . ) each and encodes the input signal in frame units, with one frame including N samples. The input signal to be encoded is expressed as xn (n=0, . . . , N−1) in this case. Symbol n represents an (n+1)-th signal element of the input signal divided into blocks of N samples. Coding apparatus 101 transmits encoded input information (coding information) to decoding apparatus 103 via transmission path 102.
Decoding apparatus 103 receives the coding information transmitted from coding apparatus 101 via transmission path 102, decodes the coding information and obtains an output signal.
Subframe energy calculation section 201 receives an input signal. Subframe energy calculation section 201 first divides the received input signal into subframes. Hereinafter, a configuration will be described in which input signal Xn (n=0, . . . , N−1, that is, N samples) is divided into, for example, Ns subframes (subframe index k=0 to Ns−1).
Then, subframe energy calculation section 201 calculates subframe energy Ek (k=0, . . . , Ns−1) for each divided subframe according to expression 1. Then, subframe energy calculation section 201 outputs calculated subframe energy Ek to degree-of-importance determining section 202. Here, it is assumed that startk and endk in expression 1 indicate the leading sample index and the tail-end sample index, respectively, of a subframe whose subframe index is k.
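Expression 1 itself is not reproduced in this text. Assuming it is the usual energy definition, that is, the sum of squared samples from startk to endk inclusive, and assuming equal-length subframes, the calculation by subframe energy calculation section 201 could be sketched as follows (function names are hypothetical):

```python
from typing import List, Sequence, Tuple


def subframe_bounds(n_samples: int, n_subframes: int) -> List[Tuple[int, int]]:
    """(start_k, end_k) inclusive index pairs for Ns equal-length subframes."""
    if n_subframes <= 0 or n_samples % n_subframes != 0:
        raise ValueError("this sketch assumes N is a positive multiple of Ns")
    length = n_samples // n_subframes
    return [(k * length, (k + 1) * length - 1) for k in range(n_subframes)]


def subframe_energies(x: Sequence[float], n_subframes: int) -> List[float]:
    """E_k = sum of x[n]**2 for n = start_k .. end_k (assumed form of expression 1)."""
    return [sum(x[n] * x[n] for n in range(start, end + 1))
            for start, end in subframe_bounds(len(x), n_subframes)]


if __name__ == "__main__":
    frame = [1.0, -1.0, 0.5, 0.5, 2.0, 0.0, 0.1, -0.1]
    print(subframe_energies(frame, 4))
```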
Degree-of-importance determining section 202 receives subframe energy Ek (k=0, . . . , Ns−1) from subframe energy calculation section 201 and sets the degree of importance of each subframe on the basis of the subframe energy. More specifically, degree-of-importance determining section 202 sets a higher degree of importance to a subframe whose subframe energy is larger. Hereinafter, the degree of importance set to each subframe is referred to as degree-of-importance information and is represented by Ik (k=0, . . . , Ns−1); a smaller value of Ik indicates a higher degree of importance. For example, degree-of-importance determining section 202 sorts the subframe energies Ek of the received subframes in descending order and assigns the degrees of importance in that order, starting with the highest degree of importance (the smallest value of Ik) for the subframe whose subframe energy is largest.
For example, in the case where subframe energies Ek satisfy a relation of expression 2, degree-of-importance determining section 202 sets the degree of importance (degree-of-importance information Ik) of each subframe (a processing unit of CELP coding) as shown in expression 3.
$$E_0 \geq E_2 \geq E_1 \geq E_3 \qquad \text{(Expression 2)}$$

$$I_0 = 1, \quad I_1 = 3, \quad I_2 = 2, \quad I_3 = 4 \qquad \text{(Expression 3)}$$
That is, degree-of-importance determining section 202 sets a higher degree of importance (degree-of-importance information Ik having a smaller value) to a subframe whose subframe energy Ek is larger. Here, the respective pieces of degree-of-importance information Ik of the subframes within one frame are different from one another in expression 3. Namely, degree-of-importance determining section 202 sets the degrees of importance such that the respective pieces of degree-of-importance information Ik of the subframes within one frame are always different from one another.
Then, degree-of-importance determining section 202 outputs the set degree-of-importance information Ik (k=0, . . . , Ns−1) to CELP coding section 203. Expression 2 and expression 3 describe an example in which the number of subframes is 4, but the present invention does not limit the number of subframes and is similarly applicable to other numbers of subframes. Furthermore, expression 3 shows only an example setting of degree-of-importance information Ik, and the present invention is similarly applicable when other values are used.
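The ranking rule of degree-of-importance determining section 202 can be sketched as follows. Breaking ties by subframe index is an assumption added so that the values of Ik within one frame are always distinct, as the text requires; the function name is hypothetical. With energies ordered as in expression 2, the sketch reproduces expression 3.

```python
from typing import List, Sequence


def importance_from_energy(energies: Sequence[float]) -> List[int]:
    """Degree-of-importance information I_k: 1 for the subframe with the largest
    energy, 2 for the next, and so on (a smaller value means more important)."""
    # Sort subframe indices by descending energy; ties fall back to index order
    # so that every I_k within the frame is distinct.
    order = sorted(range(len(energies)), key=lambda k: (-energies[k], k))
    importance = [0] * len(energies)
    for rank, k in enumerate(order, start=1):
        importance[k] = rank
    return importance


if __name__ == "__main__":
    # Energies ordered as in expression 2: E0 >= E2 >= E1 >= E3.
    print(importance_from_energy([4.0, 1.0, 2.0, 0.5]))  # [1, 3, 2, 4]
```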
CELP coding section 203 receives the input signal, and receives degree-of-importance information Ik (k=0, . . . , Ns−1) from degree-of-importance determining section 202. CELP coding section 203 encodes the input signal using the received degree-of-importance information. Hereinafter, details of coding processing by CELP coding section 203 will be described.
Pre-processing section 301 performs, on input signal xn, high-pass filter processing for removing a DC component, and waveform shaping processing or pre-emphasis processing for improving the performance of subsequent coding processing. Pre-processing section 301 outputs the resulting input signal Xn (n=0, . . . , N−1) to perceptual weighting section 302 and LPC analysis section 304.
Perceptual weighting section 302 performs perceptual weighting on input signal Xn outputted from pre-processing section 301, using quantized LPCs outputted from LPC quantization section 305, and generates perceptually-weighted input signal WXn (n=0, . . . , N−1). Then, perceptual weighting section 302 outputs perceptually-weighted input signal WXn to sparsification processing section 303.
Sparsification processing section 303 performs sparsification processing on perceptually-weighted input signal WXn received from perceptual weighting section 302, using degree-of-importance information Ik (k=0, . . . , Ns−1) received from degree-of-importance determining section 202.
Sparsification processing section 303 performs the sparsification processing on received perceptually-weighted input signal WXn on the basis of the received degree-of-importance information Ik (k=0, . . . , Ns−1). Here, as an example of the sparsification processing, processing of: selecting a predetermined number of samples in descending order from the largest absolute value of amplitude; and changing the values of the other samples to 0 is performed on perceptually-weighted input signal WXn. In this example, the predetermined number is adaptively determined on the basis of degree-of-importance information Ik (k=0, . . . , Ns−1). A setting example of the predetermined number when degree-of-importance information Ik (k=0, . . . , Ns−1) is as shown in expression 3 is shown in expression 4 given below. Here, it is assumed that the predetermined number is represented by Tk (k=0, . . . , Ns−1), and expression 4 shows an example case where the number Ns of subframes is 4.
$$T_0 = 12, \quad T_1 = 6, \quad T_2 = 10, \quad T_3 = 8 \qquad \text{(Expression 4)}$$
In the case of expression 4, for the first subframe (subframe index k=0), sparsification processing section 303 performs the following processing on perceptually-weighted input signal WXn (n=start0 to end0): selecting a predetermined number (T0=12) of samples in descending order from the largest absolute value of amplitude, and setting the values of the samples other than the selected ones to 0. Similarly, for the second subframe (subframe index k=1), sparsification processing section 303 performs, on perceptually-weighted input signal WXn (n=start1 to end1), processing of selecting a predetermined number (T1=6) of samples in descending order from the largest absolute value of amplitude and setting the values of the remaining samples to 0. Sparsification processing section 303 performs similar processing for the third subframe (subframe index k=2) and the fourth subframe (subframe index k=3).
That is, sparsification processing section 303 sets a larger predetermined number Tk to a subframe whose value of degree-of-importance information Ik is smaller (a subframe whose degree of importance is higher). In other words, sparsification processing section 303 sets a smaller number of samples whose amplitude value is changed to zero to a subframe whose value of degree-of-importance information Ik is smaller (a subframe whose degree of importance is higher). Furthermore, in each subframe, sparsification processing section 303 changes, to zero, the amplitude values of a predetermined number of samples (that is, the number of samples within one subframe minus Tk) having the smaller amplitude values, among the plurality of samples constituting the input signal.
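A minimal sketch of the per-subframe sparsification, assuming equal-length subframes and taking the predetermined numbers Tk directly as input (how Tk follows from Ik is left to a table such as expression 4). Names are hypothetical, and breaking ties in absolute amplitude by sample position is an assumption.

```python
from typing import List, Sequence


def keep_top_t(subframe: Sequence[float], t: int) -> List[float]:
    """Keep the t samples with the largest |amplitude| and set the rest to 0."""
    ranked = sorted(range(len(subframe)), key=lambda i: (-abs(subframe[i]), i))
    keep = set(ranked[:t])
    return [subframe[i] if i in keep else 0.0 for i in range(len(subframe))]


def sparsify_frame(wx: Sequence[float], subframe_len: int,
                   budgets: Sequence[int]) -> List[float]:
    """Apply keep_top_t to each subframe k with its own budget T_k."""
    out: List[float] = []
    for k, t in enumerate(budgets):
        out.extend(keep_top_t(wx[k * subframe_len:(k + 1) * subframe_len], t))
    return out


if __name__ == "__main__":
    wx = [0.1, -0.9, 0.3, 0.2, 0.5, -0.4, 0.0, 0.6]
    print(sparsify_frame(wx, subframe_len=4, budgets=[2, 1]))
    # [0.0, -0.9, 0.3, 0.0, 0.0, 0.0, 0.0, 0.6]
```

With the values of expression 4, the call would pass `budgets=[12, 6, 10, 8]`; the frame then contains at most 36 nonzero samples regardless of the signal, which is the bounded worst case the text relies on.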
Then, sparsification processing section 303 outputs the input signal after the sparsification processing (sparsified perceptually-weighted input signal SWXn) to adding section 313.
LPC analysis section 304 performs linear predictive analysis using input signal Xn outputted from pre-processing section 301 and outputs the analysis result (linear prediction coefficients: LPCs) to LPC quantization section 305.
LPC quantization section 305 performs quantization processing on the linear prediction coefficients (LPCs) outputted from LPC analysis section 304 and outputs the obtained quantized LPCs to perceptual weighting section 302 and perceptual weighting synthesis filter 312. Furthermore, LPC quantization section 305 outputs a code (L) representing the quantized LPCs to multiplexing section 315.
Adaptive excitation codebook 306 stores, in a buffer, the excitation previously outputted from adding section 311, extracts samples corresponding to one frame from the past excitation specified by a signal outputted from parameter determining section 314 (to be described later) as an adaptive excitation vector, and outputs the vector to multiplying section 309.
Quantization gain generation section 307 outputs a quantization adaptive excitation gain and a quantization fixed excitation gain specified by a signal outputted from parameter determining section 314 to multiplying section 309 and multiplying section 310 respectively.
Fixed excitation codebook 308 outputs a pulse excitation vector having a shape specified by a signal outputted from parameter determining section 314 to multiplying section 310 as a fixed excitation vector. Fixed excitation codebook 308 may output a vector obtained by multiplying the pulse excitation vector by a spreading vector to multiplying section 310 as the fixed excitation vector.
Multiplying section 309 multiplies the adaptive excitation vector outputted from adaptive excitation codebook 306 by the quantization adaptive excitation gain outputted from quantization gain generation section 307, and outputs the adaptive excitation vector multiplied by the gain to adding section 311. Furthermore, multiplying section 310 multiplies the fixed excitation vector outputted from fixed excitation codebook 308 by the quantization fixed excitation gain outputted from quantization gain generation section 307, and outputs the fixed excitation vector multiplied by the gain to adding section 311.
Adding section 311 performs vector addition on the adaptive excitation vector multiplied by the gain outputted from multiplying section 309 and the fixed excitation vector multiplied by the gain outputted from multiplying section 310 and outputs excitation, which is the addition result, to perceptual weighting synthesis filter 312 and adaptive excitation codebook 306. The excitation outputted to adaptive excitation codebook 306 is stored in the buffer of adaptive excitation codebook 306.
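The excitation generation and buffer update can be sketched as below. Extracting the adaptive excitation vector by a pitch lag, with periodic repetition when the lag is shorter than the vector length, is common CELP practice assumed here rather than something this text specifies; all names are hypothetical.

```python
from typing import List, Sequence


def adaptive_vector(past: Sequence[float], lag: int, length: int) -> List[float]:
    """Take `length` samples starting `lag` samples back in the past excitation,
    repeating the segment periodically when lag < length (assumed convention)."""
    if not 0 < lag <= len(past):
        raise ValueError("lag must be in 1..len(past)")
    start = len(past) - lag
    return [past[start + (n % lag)] for n in range(length)]


def build_excitation(adaptive: Sequence[float], fixed: Sequence[float],
                     gain_a: float, gain_f: float) -> List[float]:
    """Adding section: gain-scaled adaptive vector plus gain-scaled fixed vector."""
    return [gain_a * a + gain_f * f for a, f in zip(adaptive, fixed)]


def update_buffer(buffer: List[float], excitation: Sequence[float], max_len: int) -> None:
    """Append the new excitation and keep only the most recent max_len samples."""
    buffer.extend(excitation)
    if len(buffer) > max_len:
        del buffer[:len(buffer) - max_len]


if __name__ == "__main__":
    past = [1.0, 2.0, 3.0, 4.0, 5.0]
    v = adaptive_vector(past, lag=2, length=3)            # [4.0, 5.0, 4.0]
    exc = build_excitation(v, [1.0, 0.0, 0.0], 0.5, 2.0)  # [4.0, 2.5, 2.0]
    update_buffer(past, exc, max_len=5)
    print(past)                                           # [4.0, 5.0, 4.0, 2.5, 2.0]
```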
Perceptual weighting synthesis filter 312 performs filter synthesis on the excitation outputted from adding section 311, using filter coefficients based on the quantized LPCs outputted from LPC quantization section 305, thus generates synthesized signal HPn (n=0, . . . , N−1), and outputs synthesized signal HPn to adding section 313.
Adding section 313 inverts the polarity of synthesized signal HPn outputted from perceptual weighting synthesis filter 312, adds the synthesized signal with the inverted polarity to sparsified perceptually-weighted input signal SWXn outputted from sparsification processing section 303, thus calculates an error signal, and outputs the error signal to parameter determining section 314.
Parameter determining section 314 selects an adaptive excitation vector, a fixed excitation vector, and a quantization gain that minimize coding distortion of the error signal outputted from adding section 313, from adaptive excitation codebook 306, fixed excitation codebook 308, and quantization gain generation section 307 respectively, and outputs an adaptive excitation vector code (A), a fixed excitation vector code (F), and a quantization gain code (G) showing the selection results to multiplexing section 315.
Here, details of processing by adding section 313 and parameter determining section 314 will be described. Coding apparatus 101 obtains a correlation between: the input signal that has been subjected to particular processing (such as the pre-processing and the perceptual weighting processing); and the synthesized signal generated using the codebooks (adaptive excitation codebook 306 and fixed excitation codebook 308) and the filter coefficients based on the quantized LPCs, and thus encodes the input signal. More specifically, parameter determining section 314 searches for synthesized signal HPn (namely, indexes (codes (A), (F), and (G))) whose error (coding distortion) with sparsified perceptually-weighted input signal SWXn is minimum. At this time, the error is calculated in the following manner.
Normally, error Dk between the two signals (synthesized signal HPn and sparsified perceptually-weighted input signal SWXn) is calculated as shown in expression 5.
In expression 5, the first term is the energy of sparsified perceptually-weighted input signal SWXn, which is constant. Therefore, the second term needs to be maximized in order to minimize error Dk in expression 5. Here, in the present invention, sparsification processing section 303 limits the samples targeted for calculation of the second term in expression 5, using degree-of-importance information Ik (k=0, . . . , Ns−1) outputted from degree-of-importance determining section 202.
More specifically, sparsification processing section 303 selects, for each subframe k, predetermined number Tk (set in accordance with degree-of-importance information Ik) of samples in descending order of absolute value of amplitude (in order from the largest absolute value of amplitude). As a result, the second term in expression 5 is calculated for only the selected samples. That is, adding section 313 calculates a correlation between: an input signal in each subframe, the input signal including a predetermined number of samples whose amplitude value is changed to zero, of a plurality of samples constituting the input signal; and a synthesized signal.
For example, in the case where degree-of-importance information Ik has values shown in expression 3, as shown in expression 4, for the first subframe (subframe index k=0), sparsification processing section 303 selects “12” (T0=12) samples whose absolute value of amplitude is large (the top 12 samples in the ranking of absolute value of amplitude). Similarly, for the second subframe (subframe index k=1), sparsification processing section 303 selects “6” (T1=6) samples whose absolute value of amplitude is large (the top 6 samples in the ranking of absolute value of amplitude). Also for the third subframe (subframe index k=2) and the fourth subframe (subframe index k=3), sparsification processing section 303 performs similar processing.
In this way, sparsification processing section 303 adaptively adjusts the number of samples targeted for calculation of the second term in expression 5, among the subframes within one frame. At this time, the values of the unselected samples are changed to zero (0), and hence parameter determining section 314 can omit multiplication processing of the second term in expression 5 for the unselected samples, so that the amount of processing calculation of expression 5 can be remarkably reduced. Furthermore, sparsification processing section 303 adjusts the number of selected samples for all the subframes within one frame, and hence the amount of processing calculation can be reduced for all the subframes, so that a reduction in the amount of processing calculation in the worst case can be guaranteed.
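Expression 5 is not reproduced in this text. Assuming it has the usual CELP form for an optimally scaled synthesized signal, Dk = Σ SWXn² − (Σ SWXn·HPn)² / Σ HPn², and following the statement that the second term is calculated only for the selected samples, the error evaluation could be sketched as follows. Both sums of the second term run over the nonzero positions of the sparsified target, so the number of multiplications per candidate is bounded by the number of selected samples. Names are hypothetical.

```python
from typing import Sequence, Tuple


def sparse_error(target: Sequence[float], synth: Sequence[float]) -> Tuple[float, int]:
    """Return (D, number of selected samples) for one candidate synthesized signal.

    Assumed form: D = sum(x^2) - (sum_sel x*y)^2 / sum_sel y^2, where x is the
    sparsified target and 'sel' are the positions where x is nonzero.
    """
    selected = [n for n, v in enumerate(target) if v != 0.0]
    # First term; zeroed samples add nothing, so this equals the full energy.
    energy = sum(target[n] ** 2 for n in selected)
    corr = sum(target[n] * synth[n] for n in selected)
    synth_energy = sum(synth[n] ** 2 for n in selected)
    second = corr * corr / synth_energy if synth_energy > 0.0 else 0.0
    return energy - second, len(selected)


def best_candidate(target: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Index of the candidate whose error D is smallest (second term largest)."""
    return min(range(len(candidates)), key=lambda i: sparse_error(target, candidates[i])[0])


if __name__ == "__main__":
    target = [0.0, 2.0, 0.0, -1.0]  # sparsified SWX for one subframe
    candidates = [[1.0, 1.0, 1.0, 1.0], [5.0, 1.0, 5.0, -0.5]]
    print(best_candidate(target, candidates))  # 1
```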
Multiplexing section 315 multiplexes: the code (L) representing the quantized LPCs outputted from LPC quantization section 305; and the adaptive excitation vector code (A), the fixed excitation vector code (F), and the quantization gain code (G) outputted from parameter determining section 314, and outputs the multiplexing result as coding information to transmission path 102.
Hereinabove, the processing by CELP coding section 203 has been described.
Hereinabove, the processing by coding apparatus 101 has been described.
Next, an internal configuration of decoding apparatus 103 will be described.
Demultiplexing section 401 demultiplexes the coding information received via transmission path 102 into individual codes ((L), (A), (G), and (F)). The demultiplexed LPC code (L) is outputted to LPC decoding section 402. The demultiplexed adaptive excitation vector code (A) is outputted to adaptive excitation codebook 403. The demultiplexed quantization gain code (G) is outputted to quantization gain generation section 404. The demultiplexed fixed excitation vector code (F) is outputted to fixed excitation codebook 405.
LPC decoding section 402 decodes the quantized LPCs from the code (L) outputted from demultiplexing section 401, and outputs the decoded quantized LPCs to synthesis filter 409.
Adaptive excitation codebook 403 extracts samples corresponding to one frame from past excitation specified by the adaptive excitation vector code (A) outputted from demultiplexing section 401, as adaptive excitation vectors, and outputs the samples to multiplying section 406.
Quantization gain generation section 404 decodes the quantization adaptive excitation gain and the quantization fixed excitation gain specified by the quantization gain code (G) outputted from demultiplexing section 401, outputs the quantization adaptive excitation gain to multiplying section 406, and outputs the quantization fixed excitation gain to multiplying section 407.
Fixed excitation codebook 405 generates a fixed excitation vector specified by the fixed excitation vector code (F) outputted from demultiplexing section 401, and outputs the fixed excitation vector to multiplying section 407.
Multiplying section 406 multiplies the adaptive excitation vector outputted from adaptive excitation codebook 403 by the quantization adaptive excitation gain outputted from quantization gain generation section 404, and outputs the adaptive excitation vector multiplied by the gain to adding section 408. On the other hand, multiplying section 407 multiplies the fixed excitation vector outputted from fixed excitation codebook 405 by the quantization fixed excitation gain outputted from quantization gain generation section 404, and outputs the fixed excitation vector multiplied by the gain to adding section 408.
Adding section 408 adds up the adaptive excitation vector multiplied by the gain outputted from multiplying section 406 and the fixed excitation vector multiplied by the gain outputted from multiplying section 407, generates excitation, and outputs the excitation to synthesis filter 409 and adaptive excitation codebook 403.
Synthesis filter 409 performs filter synthesis of the excitation outputted from adding section 408, using the filter coefficients based on the quantized LPCs decoded by LPC decoding section 402, and outputs the synthesized signal to post-processing section 410.
Post-processing section 410 performs processing of improving the subjective quality of speech such as formant emphasis and pitch emphasis, processing of improving the subjective quality of static noise, and the like on the signal outputted from synthesis filter 409, and outputs the processed signal as an output signal.
Hereinabove, the processing by decoding apparatus 103 has been described.
Thus, according to the present embodiment, the coding apparatus that adopts the CELP type coding method first calculates the subframe energy of each subframe over the entire frame. Subsequently, the coding apparatus sets the degree of importance of each subframe in accordance with the calculated subframe energy. Then, at the time of the pitch period search in each subframe, the coding apparatus selects a predetermined number (set in accordance with the degree of importance) of samples whose absolute value of amplitude is large, performs the error calculation on only the selected samples, and determines the optimal pitch period. This configuration can guarantee a significant reduction in the amount of processing calculation over the entire frame.
The coding apparatus does not use the same number of samples for the correlation calculation (distance calculation) in every subframe at the time of the pitch period search; instead, it adaptively varies the number of samples in accordance with the degree of importance of each subframe. More specifically, the coding apparatus can perform the pitch period search with high accuracy on subframes whose subframe energy is large and which are perceptually important (subframes whose degree of importance is high). On the other hand, the coding apparatus can perform the pitch period search with lower accuracy on subframes whose subframe energy is small and which have little influence on perception (subframes whose degree of importance is low), whereby the amount of processing calculation can be significantly reduced. In this way, significant quality degradation of the decoded signal can be suppressed.
In the present embodiment, description has been given of an example configuration in which degree-of-importance determining section 202 (
In the present embodiment, sparsification processing section 303 (
In the present embodiment, description has been given of the case where the sparsification processing is performed on the input signal (here, perceptually-weighted input signal WXn, which yields sparsified perceptually-weighted input signal SWXn). In the present invention, effects similar to those of the above-mentioned embodiment can be obtained even if the sparsification processing is performed not on the input signal but on the synthesized signal (here, synthesized signal HPn) whose correlation with the input signal is calculated. Namely, in each subframe, the coding apparatus may change, to zero, the amplitude values of a predetermined number of samples among the plurality of samples constituting at least one of the input signal and the synthesized signal, in accordance with the degree of importance set to that subframe, and may then calculate the correlation between the input signal and the synthesized signal. Furthermore, the present invention is similarly applicable to a configuration in which, for both the input signal and the synthesized signal in each subframe, the coding apparatus changes, to zero, the amplitude values of a predetermined number of the samples constituting each signal, and calculates the correlation between the input signal and the synthesized signal.
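When both signals are sparsified, only positions at which both are nonzero contribute to the correlation, so the products elsewhere can be skipped without changing the result. A minimal sketch, with hypothetical names and the same top-T selection rule described for the input signal applied to each signal:

```python
from typing import List, Sequence, Tuple


def keep_top_t(signal: Sequence[float], t: int) -> List[float]:
    """Keep the t largest-|amplitude| samples; set the others to 0."""
    ranked = sorted(range(len(signal)), key=lambda i: (-abs(signal[i]), i))
    keep = set(ranked[:t])
    return [signal[i] if i in keep else 0.0 for i in range(len(signal))]


def sparse_correlation(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Dot product of a and b over positions where both are nonzero.

    Returns (correlation, number of multiplications performed)."""
    both = [n for n in range(len(a)) if a[n] != 0.0 and b[n] != 0.0]
    return sum(a[n] * b[n] for n in both), len(both)


if __name__ == "__main__":
    x = keep_top_t([0.5, 1.0, -0.2, 2.0], 2)  # [0.0, 1.0, 0.0, 2.0]
    y = keep_top_t([3.0, 4.0, 0.5, 0.1], 2)   # [3.0, 4.0, 0.0, 0.0]
    print(sparse_correlation(x, y))           # (4.0, 1)
```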
In the present embodiment, description has been given of the case where the sparsification processing is performed on perceptually-weighted input signal WXn. The present invention is similarly applicable to the case where the pre-processing by pre-processing section 301 and the perceptual weighting processing by perceptual weighting section 302 are not performed on the input signal. In this case, sparsification processing section 303 performs the sparsification processing on input signal Xn.
In the present embodiment, an example configuration in which CELP coding section 203 adopts the CELP type coding method has been described. The present invention is not limited to this configuration, and is similarly applicable to coding methods other than the CELP type coding method. In another example configuration, the present invention is applied to a signal correlation operation between frames when coding parameters in a current frame are calculated using an encoded signal in a past frame without performing LPC analysis.
In Embodiment 1, the correlation analysis processing in the time domain has been described. In comparison, in the present embodiment, correlation analysis processing in a frequency domain will be described.
Coding apparatus 501 mainly includes an input terminal, down-sampling section 601, low-band signal coding section 602, low-band signal decoding section 603, delaying section 604, high-band signal coding section 605, multiplexing section 606, and an output terminal.
A digitized speech signal or a digitized music signal is inputted to the input terminal.
Down-sampling section 601 down-samples the input signal received via the input terminal and generates a signal having a low sampling rate. Down-sampling section 601 outputs the down-sampled signal to low-band signal coding section 602.
Low-band signal coding section 602 encodes the down-sampled signal received from down-sampling section 601. Low-band signal coding section 602 outputs the obtained coding code to low-band signal decoding section 603 and multiplexing section 606 (multiplexer).
Low-band signal decoding section 603 generates a decoded low-band signal using the coding code received from low-band signal coding section 602. Low-band signal decoding section 603 outputs the generated decoded low-band signal to high-band signal coding section 605.
Delaying section 604 gives a delay having a predetermined length to the input signal received via the input terminal, and outputs the delayed input signal to high-band signal coding section 605.
High-band signal coding section 605 encodes a high-band part of the input signal received from delaying section 604, using the decoded low-band signal received from low-band signal decoding section 603. High-band signal coding section 605 outputs the generated coding code to multiplexing section 606.
Multiplexing section 606 multiplexes the coding code received from low-band signal coding section 602 and the coding code received from high-band signal coding section 605 and outputs the multiplexing result as coding information via the output terminal.
The decoded low-band signal is inputted from low-band signal decoding section 603 to high-band signal coding section 605.
Frequency domain transform section 701 performs frequency transform on the decoded low-band signal received via the input terminal, and calculates decoded low-band spectrum X1k.
Frequency domain transform section 702 performs frequency transform on the input signal received via the input terminal, and calculates input spectrum X2k.
Here, discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT), or the like is applied as the frequency transform by frequency domain transform sections 701 and 702. Hereinafter, a spectrum may also be referred to as transform coefficients. Frequency domain transform section 702 thus acquires input spectrum X2k, whose frequency band (transform coefficients) can be divided into a high-band part and a low-band part. Frequency domain transform section 701, in turn, acquires decoded low-band spectrum X1k, which corresponds to the low-band part of the spectrum of the input signal (input spectrum).
Subband energy calculation section 703 receives the input spectrum from frequency domain transform section 702. Subband energy calculation section 703 first divides the high-band part of the received input spectrum into a plurality of subbands. Hereinafter, description will be given of, for example, a configuration in which high-band part X2k (k=0, . . . , K−1; that is, K transform coefficients) of the input spectrum is divided into NM subbands (subband index m=0 to NM−1).
Subband energy calculation section 703 calculates, for each divided subband, subband energy Em (m=0, . . . , NM−1) of high-band part X2k of the input spectrum according to expression 6. Then, subband energy calculation section 703 outputs calculated subband energy Em to degree-of-importance determining section 704. In expression 6, startm and endm indicate the transform coefficient index of the lowest frequency and the transform coefficient index of the highest frequency, respectively, of the subband whose subband index is m.
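For illustration only, the subband energy calculation can be sketched in Python as below. Expression 6 is not reproduced in this text, so the sketch assumes the usual definition of subband energy as the sum of squared transform coefficients from startm to endm (both inclusive). The function names and the equal-width helper `equal_bounds` are hypothetical; actual subband boundaries need not be equal.

```python
def subband_energies(high_band, bounds):
    """Return subband energy E_m for each subband m.

    high_band: transform coefficients X2_k of the high-band part.
    bounds: list of (start_m, end_m) index pairs, both inclusive.
    Assumes expression 6 is the sum of squared coefficients in the subband.
    """
    return [sum(x * x for x in high_band[start:end + 1]) for start, end in bounds]


def equal_bounds(num_coeffs, num_subbands):
    """Hypothetical helper: split K coefficients into NM equal-width subbands."""
    width = num_coeffs // num_subbands
    return [(m * width, (m + 1) * width - 1) for m in range(num_subbands)]


# Eight coefficients split into two subbands of four coefficients each.
print(subband_energies([1, -2, 3, 0, 0.5, 0.5, -1, 2], equal_bounds(8, 2)))  # [14, 5.5]
```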
Degree-of-importance determining section 704 receives subband energy Em (m=0, . . . , NM−1) from subband energy calculation section 703. Degree-of-importance determining section 704 sets the degree of importance of each subband. For example, degree-of-importance determining section 704 sets the degree of importance of each subband on the basis of the subband energy. More specifically, degree-of-importance determining section 704 sets a higher degree of importance for a subband whose subband energy is larger. Hereinafter, the degree of importance set to each subband is referred to as degree-of-importance information. Hereinafter, the degree-of-importance information is represented by Im (m=0, . . . , NM−1), and it is assumed that Im having a smaller value indicates a higher degree of importance. For example, degree-of-importance determining section 704 sorts respective received subband energies Em of subbands in descending order, and sets a higher degree of importance (that is, degree-of-importance information Im having a smaller value) in order from a subband corresponding to the leading subband energy after the sorting (a subband whose subband energy is largest).
For example, in the case where subband energies Em satisfy the relation of expression 7, degree-of-importance determining section 704 sets the degree of importance (degree-of-importance information Im) of each subband as shown in expression 8.
E0≧E2≧E1≧E3 (Expression 7)
I0=1, I1=3, I2=2, I3=4 (Expression 8)
That is, degree-of-importance determining section 704 sets a higher degree of importance (degree-of-importance information Im having a smaller value) for a subband whose subband energy Em is larger. Here, the respective pieces of degree-of-importance information Im of the subbands are different from one another in expression 8. Namely, degree-of-importance determining section 704 sets the degrees of importance such that the respective pieces of degree-of-importance information Im of the subbands are always different from one another.
Degree-of-importance determining section 704 then outputs the set degree-of-importance information Im (m=0, . . . , NM−1) to sparsification processing section 705. Expressions 7 and 8 describe an example in which the number of subbands is four, but the present invention does not limit the number of subbands and is similarly applicable to any other number of subbands. Furthermore, expression 8 is merely an example setting of degree-of-importance information Im, and the present invention is similarly applicable to settings that use values other than those in expression 8.
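A minimal sketch of the ranking performed by degree-of-importance determining section 704 is given below. It assigns Im=1 to the subband with the largest energy, Im=2 to the next, and so on. Breaking ties by subband index is an assumption made here so that the values are always distinct, as required above; the function name is hypothetical.

```python
def importance_ranks(energies):
    """Return degree-of-importance information I_m (1 = most important).

    Subbands are sorted by descending energy; equal energies are ordered by
    subband index (an assumption) so that every I_m is distinct.
    """
    order = sorted(range(len(energies)), key=lambda m: (-energies[m], m))
    ranks = [0] * len(energies)
    for position, m in enumerate(order, start=1):
        ranks[m] = position
    return ranks


# Energies ordered as in expression 7 (E0 >= E2 >= E1 >= E3) reproduce expression 8.
print(importance_ranks([40.0, 10.0, 25.0, 5.0]))  # [1, 3, 2, 4]
```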
Sparsification processing section 705 performs sparsification processing on high-band part X2k of the input spectrum received from frequency domain transform section 702, using degree-of-importance information Im (m=0, . . . , NM−1) received from degree-of-importance determining section 704. For example, sparsification processing section 705 performs sparsification processing that changes to zero the amplitude values of a predetermined number of the transform coefficients (transform coefficient indexes startm to endm) constituting high-band part X2k of the input spectrum in each subband m. Details of the sparsification processing are described below.
Sparsification processing section 705 performs the sparsification processing on high-band part X2k of the received input spectrum in subband units, on the basis of the received degree-of-importance information Im (m=0, . . . , NM−1). Here, as an example of the sparsification processing, the following is performed on high-band part X2k of the input spectrum: a predetermined number of transform coefficients are selected in descending order of absolute amplitude, and the values of the remaining transform coefficients are changed to 0. In this example, the predetermined number is determined adaptively on the basis of degree-of-importance information Im (m=0, . . . , NM−1). Expression 9 below shows a setting example of the predetermined number when degree-of-importance information Im (m=0, . . . , NM−1) is as shown in expression 8. Here, the predetermined number is represented by Tm (m=0, . . . , NM−1), and expression 9 shows an example in which the number NM of subbands is 4.
T0=12, T1=6, T2=10, T3=8 (Expression 9)
In the case of expression 9, for the first subband (subband index m=0), sparsification processing section 705 selects, from high-band part X2k (k=start0 to end0) of the input spectrum, the predetermined number (T0=12) of transform coefficients in descending order of absolute amplitude, and sets (changes) the values of the transform coefficients other than the selected ones to 0. Similarly, for the second subband (subband index m=1), sparsification processing section 705 selects, from high-band part X2k (k=start1 to end1) of the input spectrum, the predetermined number (T1=6) of transform coefficients in descending order of absolute amplitude, and sets (changes) the values of the other transform coefficients to 0. Sparsification processing section 705 performs similar processing for the third subband (subband index m=2) and the fourth subband (subband index m=3).
That is, sparsification processing section 705 sets a larger predetermined number Tm for a subband whose degree-of-importance information Im has a smaller value (a subband whose degree of importance is higher). In other words, for a subband whose degree of importance is higher, sparsification processing section 705 changes the amplitude values of fewer transform coefficients to zero. In each subband, sparsification processing section 705 sets (changes) to zero the amplitude values of the transform coefficients with the smallest amplitudes, their number being the number of transform coefficients in the subband minus Tm.
Then, sparsification processing section 705 outputs high-band part X2k of the input spectrum after the sparsification processing (high-band part SX2k of sparsified input spectrum) to correlation analysis section 706.
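The per-subband sparsification by sparsification processing section 705 can be sketched as follows. The sketch assumes inclusive (startm, endm) index pairs and takes the predetermined numbers Tm as an input list; how Tm is derived from Im (for example, the values of expression 9) is left to the caller. The function names are hypothetical.

```python
def sparsify_subband(coeffs, keep):
    """Keep the `keep` coefficients with the largest |amplitude|; zero the rest."""
    if keep >= len(coeffs):
        return list(coeffs)
    # Indices of the `keep` coefficients with the largest absolute amplitude.
    top = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
    kept = set(top)
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


def sparsify_high_band(high_band, bounds, keep_counts):
    """Apply sparsification subband by subband, keeping T_m coefficients in subband m.

    bounds: list of (start_m, end_m) pairs, both inclusive.
    keep_counts: predetermined numbers T_m, one per subband.
    """
    out = list(high_band)
    for (start, end), keep in zip(bounds, keep_counts):
        out[start:end + 1] = sparsify_subband(high_band[start:end + 1], keep)
    return out


x2 = [0.1, -3.0, 2.0, 0.5, 1.0, -0.2, 0.3, 4.0]
print(sparsify_high_band(x2, [(0, 3), (4, 7)], [2, 1]))
# [0.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```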
Correlation analysis section 706 analyzes, in subband units, the correlation between decoded low-band spectrum X1k (corresponding to the low-band part of the input spectrum) received from frequency domain transform section 701 and high-band part SX2k of the sparsified input spectrum received from sparsification processing section 705, and obtains the amount of shift d at which the correlation value is maximum. Correlation analysis section 706 then outputs the amount of shift d of each subband to multiplexing section 606.
In expression 10, d represents the amount of shift, Dmin represents the minimum value of the search range for the amount of shift, Dmax represents the maximum value of the search range for the amount of shift, and Corm(d) represents the correlation value at amount of shift d in the mth subband.
Correlation analysis section 706 obtains the amount of shift dmax at which the correlation value is maximum on the basis of correlation value Corm(d) calculated according to expression 10, encodes the obtained amount of shift dmax as the amount of shift of the mth subband, and outputs the resultant coding code to multiplexing section 606.
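The amount-of-shift search of correlation analysis section 706 can be sketched as below. Because expression 10 is not reproduced in this text, the sketch assumes a plain inner product Corm(d) between the sparsified subband and the low-band segment starting at index d, searched over d = Dmin to Dmax; a practical implementation may normalize the correlation. What the sketch illustrates is that only the nonzero coefficients of the sparsified subband are visited, so the zeroed coefficients cost no multiplications. The function name is hypothetical.

```python
def best_shift(sparse_subband, low_band, d_min, d_max):
    """Return the shift d (dmax in the text) that maximizes the assumed correlation.

    sparse_subband: sparsified high-band subband SX2 (zeros where coefficients were dropped).
    low_band: decoded low-band spectrum X1.
    Assumed correlation: Cor(d) = sum_i sparse_subband[i] * low_band[d + i].
    """
    # Visit only the coefficients that survived sparsification.
    nonzero = [(i, a) for i, a in enumerate(sparse_subband) if a != 0.0]
    width = len(sparse_subband)
    best_d, best_cor = None, float("-inf")
    for d in range(d_min, d_max + 1):
        if d < 0 or d + width > len(low_band):
            continue  # candidate segment would fall outside the low band
        cor = sum(a * low_band[d + i] for i, a in nonzero)
        if cor > best_cor:
            best_d, best_cor = d, cor
    return best_d


print(best_shift([0.0, 3.0], [0, 0, 1, 2, 0, 0, 5, 0], 0, 6))  # 5
```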
In this way, in the present embodiment, sparsification processing section 705 reduces the amount of processing calculation at the time of the calculation of expression 10, using degree-of-importance information Im (m=0, . . . , NM−1) outputted from degree-of-importance determining section 704.
More specifically, sparsification processing section 705 selects, for each subband m, predetermined number Tm (set in accordance with degree-of-importance information Im) of transform coefficients in descending order of absolute value of amplitude (in order from the largest absolute value of amplitude). As a result, the processing in expression 10 is performed on only the selected transform coefficients. That is, correlation analysis section 706 calculates a correlation between: a high-band part of an input spectrum in each subband, the high-band part of the input spectrum including a predetermined number of transform coefficients whose amplitude value is changed to zero, in a plurality of subbands constituting the high-band part of the input spectrum; and a decoded low-band spectrum.
For example, in the case where degree-of-importance information Im has values indicated in expression 8, as shown in expression 9, for the first subband (subband index m=0), sparsification processing section 705 selects “12” (T0=12) transform coefficients whose absolute value of amplitude is large (the top 12 transform coefficients in the ranking of absolute value of amplitude). Similarly, for the second subband (subband index m=1), sparsification processing section 705 selects “6” (T1=6) transform coefficients whose absolute value of amplitude is large (the top 6 transform coefficients in the ranking of absolute value of amplitude). Also for the third subband (subband index m=2) and the fourth subband (subband index m=3), sparsification processing section 705 performs similar processing.
In this way, sparsification processing section 705 adaptively adjusts the number of transform coefficients targeted for calculation of the correlation value in expression 10, among the subbands within the frame. At this time, the values of the unselected transform coefficients are changed to zero (0), and hence correlation analysis section 706 can omit part of the processing in expression 10, so that the amount of processing calculation of expression 10 can be remarkably reduced. Furthermore, sparsification processing section 705 adjusts the number of selected transform coefficients among all the subbands within one frame, and hence the amount of processing calculation can be reduced for all the subbands, so that the amount of processing calculation in the worst case can be remarkably reduced.
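The worst-case saving can be illustrated with simple arithmetic. Under an assumed inner-product correlation, each subband costs (number of coefficients visited) × (number of candidate shifts) multiplications per frame. The sketch uses the values Tm of expression 9 together with a hypothetical subband width of 16 coefficients and a hypothetical 64 candidate shifts; neither of these two figures comes from the text.

```python
def multiplications_per_frame(coeffs_per_subband, num_shifts):
    """Multiplications for one frame under the assumed inner-product correlation:
    every visited coefficient is multiplied once per candidate shift."""
    return sum(count * num_shifts for count in coeffs_per_subband)


widths = [16, 16, 16, 16]  # hypothetical subband widths (not from the text)
keep = [12, 6, 10, 8]      # predetermined numbers Tm of expression 9
shifts = 64                # hypothetical number of shifts, Dmax - Dmin + 1

full = multiplications_per_frame(widths, shifts)   # 64 * 64 = 4096
sparse = multiplications_per_frame(keep, shifts)   # 36 * 64 = 2304
print(full, sparse, sparse / full)                 # 4096 2304 0.5625
```

Because Tm is fixed per subband and does not depend on the signal, the bound of 2304 multiplications (56.25% of the unsparsified count under these assumptions) holds for every frame, which is what limits the worst case.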
Hereinabove, the processing by coding apparatus 501 according to the present embodiment has been described.
Next, processing by a decoding apparatus according to the present embodiment will be described.
Decoding apparatus 801 mainly includes an input terminal, demultiplexing section 901, low-band signal decoding section 902, up-sampling section 903, high-band signal decoding section 904, adding section 905, and an output terminal.
Coding information is inputted to the input terminal. Demultiplexing section 901 demultiplexes the coding information received via the input terminal into a coding code for low-band signal decoding section 902 and a coding code for high-band signal decoding section 904.
The coding code for low-band signal decoding section 902 is the coding code of the down-sampled signal encoded by low-band signal coding section 602.
Low-band signal decoding section 902 generates a decoded low-band signal using the coding code obtained by demultiplexing section 901, and outputs the generated decoded low-band signal to up-sampling section 903 and high-band signal decoding section 904.
Up-sampling section 903 up-samples (increases the sampling frequency of) the decoded low-band signal received from low-band signal decoding section 902, and generates a signal having a high sampling rate. Up-sampling section 903 outputs the up-sampled signal to adding section 905.
High-band signal decoding section 904 receives the coding code demultiplexed by demultiplexing section 901 and the decoded low-band signal generated by low-band signal decoding section 902. High-band signal decoding section 904 performs decoding processing (to be described later), generates a decoded high-band signal, and outputs the generated decoded high-band signal to adding section 905.
Adding section 905 adds up the up-sampled decoded low-band signal received from up-sampling section 903 and the decoded high-band signal received from high-band signal decoding section 904, generates an output signal, and outputs the output signal to the output terminal.
The decoded low-band signal is inputted from low-band signal decoding section 902 to high-band signal decoding section 904.
Furthermore, the coding code is inputted from demultiplexing section 901 to high-band signal decoding section 904.
Frequency domain transform section 1001 performs frequency transform on the decoded low-band signal received via the input terminal, and calculates decoded low-band spectrum X1(k). Discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT), or the like is applied as the frequency transform by frequency domain transform section 1001. Frequency domain transform section 1001 outputs calculated decoded low-band spectrum X1(k) to high-band spectrum generation section 1002.
High-band spectrum generation section 1002 refers to the amount of shift of each subband indicated by the coding code received via the input terminal, copies the spectrum indicated by the amount of shift from the decoded low-band spectrum received from frequency domain transform section 1001 into the high-band part, and thereby generates a decoded high-band spectrum. This copy processing is performed for each subband. High-band spectrum generation section 1002 outputs the generated decoded high-band spectrum to time domain transform section 1003.
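The copy processing of high-band spectrum generation section 1002 can be sketched as follows. The sketch assumes that the decoded amount of shift of each subband is the start index of the low-band segment to be copied and that the subband widths are known to the decoder. The function name is hypothetical.

```python
def generate_high_band(low_band, shifts, widths):
    """Build the decoded high-band spectrum subband by subband.

    low_band: decoded low-band spectrum X1(k).
    shifts[m]: decoded amount of shift of subband m, taken here (assumption)
               as the start index of the low-band segment to copy.
    widths[m]: number of coefficients in high-band subband m.
    """
    high_band = []
    for d, width in zip(shifts, widths):
        # Copy the low-band segment indicated by the amount of shift.
        high_band.extend(low_band[d:d + width])
    return high_band


print(generate_high_band([1, 2, 3, 4, 5, 6], [0, 3], [2, 3]))  # [1, 2, 4, 5, 6]
```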
Time domain transform section 1003 transforms the decoded high-band spectrum received from high-band spectrum generation section 1002 into a time-domain signal, and outputs the time-domain signal via the output terminal. At this time, time domain transform section 1003 performs appropriate processing such as windowing and overlap-add, thereby avoiding the discontinuity that would otherwise occur between frames.
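The windowing and overlap-add can be illustrated generically. The window shape and the amount of overlap are not specified here, so the sketch assumes time-domain frames already produced by the inverse transform, a periodic Hann window, and 50% overlap, a combination whose overlapping windows sum to one so that a constant signal is reconstructed without seams. With MDCT, the window would instead be chosen to satisfy the time-domain aliasing cancellation condition. The function names are hypothetical.

```python
import math


def periodic_hann(n):
    """Periodic Hann window of even length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(frames, hop):
    """Window each frame and add it into the output at multiples of `hop`."""
    n = len(frames[0])
    window = periodic_hann(n)
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for f, frame in enumerate(frames):
        for i in range(n):
            out[f * hop + i] += window[i] * frame[i]
    return out


# Four constant frames of length 8 with hop 4 (50% overlap):
# every sample covered by two frames comes out as 1.0 (up to rounding).
signal = overlap_add([[1.0] * 8 for _ in range(4)], 4)
```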
Hereinabove, the processing by decoding apparatus 801 according to the present embodiment has been described.
Thus, according to the present embodiment, the coding apparatus first acquires transform coefficients (a spectrum) whose frequency band is divided into a low-band part and a high-band part. The coding apparatus then divides one of the two parts (in the present embodiment, the high-band part) into a plurality of subbands and sets the degree of importance of each subband. Then, in accordance with the set degree of importance, the coding apparatus changes to zero the amplitude values of a predetermined number of the transform coefficients included in each subband. Finally, the coding apparatus calculates the correlation between the transform coefficients in the low-band part and the changed transform coefficients in the high-band part. Because the number of coefficients changed to zero is set for every subband, this configuration guarantees a significant reduction in the amount of processing calculation over the entire frequency band (for all of the plurality of subbands).
The coding apparatus does not select the transform coefficients targeted for the correlation calculation (amount-of-shift calculation) uniformly across all subbands, but adaptively varies them in accordance with the degree of importance of each subband. More specifically, the coding apparatus can perform the amount-of-shift search with high accuracy on subbands whose subband energy is large and which are perceptually important (subbands whose degree of importance is high), and with lower accuracy on subbands whose subband energy is small and which have little influence on perception (subbands whose degree of importance is low). The amount of processing calculation can therefore be significantly reduced while significant quality degradation of the decoded signal is suppressed.
In Embodiment 2, the configuration in which the sparsification processing is performed on high-band part X2k of the input spectrum has been described. In the present embodiment, the configuration in which the sparsification processing is performed on decoded low-band spectrum X1k (that is, the low-band part of the input spectrum) will be described.
Subband energy calculation section 703a first divides the decoded low-band spectrum received from frequency domain transform section 701 into a plurality of subbands. Hereinafter, description will be given of, for example, a configuration in which decoded low-band spectrum X1k (k=0, . . . , K−1; that is, K transform coefficients) is divided into NJ subbands (subband index j=0 to NJ−1).
Subband energy calculation section 703a calculates, for each divided subband, subband energy Ej (j=0, . . . , NJ−1) of decoded low-band spectrum X1k according to expression 11. Then, subband energy calculation section 703a outputs calculated subband energy Ej to degree-of-importance determining section 704a. In expression 11, NJ indicates the number of subbands of the decoded low-band spectrum, and STARTj and ENDj indicate the transform coefficient index of the lowest frequency and the transform coefficient index of the highest frequency, respectively, of the subband whose subband index is j.
Degree-of-importance determining section 704a receives subband energy Ej (j=0, . . . , NJ−1) from subband energy calculation section 703a. Similarly to Embodiment 2 (degree-of-importance determining section 704), degree-of-importance determining section 704a sets degree-of-importance information Ij of each subband on the basis of the subband energy.
Similarly to Embodiment 2 (sparsification processing section 705), sparsification processing section 705a performs sparsification processing on decoded low-band spectrum X1k received from frequency domain transform section 701 using degree-of-importance information Ij (j=0, . . . , NJ−1) received from degree-of-importance determining section 704a. For example, sparsification processing section 705a performs sparsification processing that changes to zero the amplitude values of a predetermined number of the transform coefficients (transform coefficient indexes STARTj to ENDj) constituting decoded low-band spectrum X1k in each subband j, and generates decoded low-band spectrum SX1k after the sparsification processing. Sparsification processing section 705a outputs decoded low-band spectrum SX1k after the sparsification processing to correlation analysis section 706a.
Correlation analysis section 706a analyzes the correlation between decoded low-band spectrum SX1k after the sparsification processing, received from sparsification processing section 705a, and high-band part X2k of the input spectrum, received from frequency domain transform section 702, and obtains amount of shift d at which the correlation value is maximum. Correlation analysis section 706a performs the correlation analysis in units of the subbands obtained by dividing the high-band part of the input spectrum, and obtains, for each of these subbands, the amount of shift d at which the correlation value is maximum. Correlation analysis section 706a outputs the amount of shift d of each subband of the high-band part of the input spectrum to multiplexing section 606.
In expression 12, NM represents the number of subbands of the high-band part of the input spectrum, startm and endm represent the transform coefficient index of the lowest frequency and the transform coefficient index of the highest frequency, respectively, of the subband whose subband index is m (m=0, . . . , NM−1), d represents the amount of shift, Dmin represents the minimum value of the search range for the amount of shift, Dmax represents the maximum value of the search range for the amount of shift, and Corm(d) represents the correlation value at amount of shift d in the mth subband.
Correlation analysis section 706a obtains the amount of shift dmax at which the correlation value is maximum on the basis of correlation value Corm(d) calculated as described above, encodes the obtained amount of shift dmax as the amount of shift of the mth subband, and outputs the resultant coding code to multiplexing section 606.
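In the present embodiment the zeros lie in the low-band spectrum, so the search for each high-band subband can visit only the nonzero positions of SX1k that fall inside each candidate segment. Because expression 12 is not reproduced in this text, the sketch below assumes an inner-product correlation and takes d as the start index of the low-band segment; the function name is hypothetical. The nonzero positions of SX1k are collected once, and for each shift only those inside the candidate segment are multiplied.

```python
import bisect


def best_shift_sparse_low(high_subband, sparse_low, d_min, d_max):
    """Shift search for one high-band subband against a sparsified low band.

    Assumed correlation: Cor(d) = sum_i high_subband[i] * sparse_low[d + i].
    Only nonzero low-band positions are visited.
    """
    nz = [j for j, v in enumerate(sparse_low) if v != 0.0]  # ascending indices
    width = len(high_subband)
    best_d, best_cor = None, float("-inf")
    for d in range(max(d_min, 0), min(d_max, len(sparse_low) - width) + 1):
        # Nonzero positions j with d <= j < d + width.
        lo = bisect.bisect_left(nz, d)
        hi = bisect.bisect_left(nz, d + width)
        cor = sum(high_subband[j - d] * sparse_low[j] for j in nz[lo:hi])
        if cor > best_cor:
            best_d, best_cor = d, cor
    return best_d


print(best_shift_sparse_low([2.0, 1.0], [0, 0, 1, 0, 0, 0, 5, 0], 0, 10))  # 6
```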
In this way, in the present embodiment, sparsification processing section 705a reduces the amount of processing calculation at the time of the calculation of expression 12, using degree-of-importance information Ij (j=0, . . . , NJ−1) outputted from degree-of-importance determining section 704a.
Thus, according to the present embodiment, the coding apparatus first acquires transform coefficients (a spectrum) whose frequency band is divided into a low-band part and a high-band part. The coding apparatus then divides one of the two parts (in the present embodiment, the low-band part) into a plurality of subbands and sets the degree of importance of each subband. Then, in accordance with the set degree of importance, the coding apparatus changes to zero the amplitude values of a predetermined number of the transform coefficients included in each subband. Finally, the coding apparatus calculates the correlation between the transform coefficients in the high-band part and the changed transform coefficients in the low-band part. Because the number of coefficients changed to zero is set for every subband, this configuration guarantees a significant reduction in the amount of processing calculation over the entire frequency band (for all of the plurality of subbands).
The coding apparatus does not select the transform coefficients targeted for the correlation calculation (amount-of-shift calculation) uniformly across all subbands, but adaptively varies them in accordance with the degree of importance of each subband. More specifically, the coding apparatus can perform the amount-of-shift search with high accuracy on subbands whose subband energy is large and which are perceptually important (subbands whose degree of importance is high), and with lower accuracy on subbands whose subband energy is small and which have little influence on perception (subbands whose degree of importance is low). The amount of processing calculation can therefore be significantly reduced while significant quality degradation of the decoded signal is suppressed.
In Embodiments 2 and 3, description has been given of an example configuration in which the degree-of-importance determining section determines the degree-of-importance information on the basis of the subband energy calculated by the subband energy calculation section. The present invention is not limited to this configuration and is similarly applicable to a configuration in which the degree of importance is determined on the basis of information other than the subband energy. In another example configuration, the degree of transform coefficient variation (for example, spectral flatness measure (SFM)) of each subband is calculated, and a higher degree of importance is set for a subband whose SFM value is larger. As a matter of course, the degree of importance may be determined on the basis of information other than the SFM value.
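The spectral flatness measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum; it approaches 1 for a flat subband and 0 for a peaky one. The sketch below computes it per subband and ranks the subbands so that a larger SFM gives a higher degree of importance (a smaller Im), as stated above. The small constant `eps` that keeps the logarithm finite for zero coefficients, and the function names, are assumptions.

```python
import math


def spectral_flatness(coeffs, eps=1e-12):
    """SFM: geometric mean / arithmetic mean of the powers |X_k|^2 (+ eps)."""
    powers = [c * c + eps for c in coeffs]
    geo = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arith = sum(powers) / len(powers)
    return geo / arith


def importance_from_sfm(subbands):
    """I_m = 1 for the subband with the largest SFM, 2 for the next, and so on."""
    sfm = [spectral_flatness(s) for s in subbands]
    order = sorted(range(len(sfm)), key=lambda m: (-sfm[m], m))
    ranks = [0] * len(sfm)
    for position, m in enumerate(order, start=1):
        ranks[m] = position
    return ranks


# A peaky subband and a flat subband: the flat one is ranked first.
print(importance_from_sfm([[4.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]))  # [2, 1]
```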
In Embodiments 2 and 3, the sparsification processing section fixedly determines the predetermined number of samples targeted for the correlation value calculation on the basis of the degree-of-importance information determined by the degree-of-importance determining section. The present invention is not limited to this configuration. For example, when the subband energies of high-ranked subbands are extremely close to each other, the degree-of-importance determining section may set the degree-of-importance information using fractional values such as (1.0, 2.5, 2.5, 4.0) instead of simply using the integer values (1, 2, 3, 4). That is, the degree-of-importance information may be set more finely in accordance with the differences in subband energy among the subbands. In this example configuration, the sparsification processing section sets predetermined numbers (of transform coefficients) such as (12, 8, 8, 6) on the basis of that degree-of-importance information. By determining the predetermined numbers with more flexible weighting (degree of importance) that follows the subband energy distribution over the plurality of subbands, the sparsification processing section can reduce the amount of processing calculation still more efficiently than in the above-mentioned embodiments. The predetermined numbers of transform coefficients can be determined by preparing a plurality of pattern sets in advance, or they can be determined dynamically on the basis of the degree-of-importance information.
Both configurations presuppose that the patterns are prepared, or the predetermined numbers are dynamically determined, such that the amount of processing calculation is reduced by at least a given amount for all of the plurality of subbands.
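The fractional degree-of-importance setting and the calculation-reduction condition can be sketched as follows. The closeness criterion (a relative tolerance `rel_tol` between consecutive energies in sorted order, with tied subbands sharing the average of their ranks) and the reading of the reduction condition as a frame-level fraction of zeroed coefficients are assumptions; the function names are hypothetical.

```python
def fractional_importance(energies, rel_tol=0.01):
    """Rank subbands by energy (1 = largest). Consecutive subbands in sorted order
    whose energies differ by at most rel_tol (relative) share their average rank."""
    order = sorted(range(len(energies)), key=lambda m: -energies[m])
    ranks = [0.0] * len(energies)
    i = 0
    while i < len(order):
        j = i
        # Extend the tie group while the next energy is "extremely close".
        while (j + 1 < len(order) and
               energies[order[j]] - energies[order[j + 1]] <= rel_tol * energies[order[j]]):
            j += 1
        avg = (i + 1 + j + 1) / 2.0  # average of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def meets_budget(keep_counts, widths, min_reduction):
    """True if the counts zero out at least `min_reduction` of all coefficients."""
    kept = sum(keep_counts)
    total = sum(widths)
    return (total - kept) / total >= min_reduction


print(fractional_importance([100.0, 50.0, 49.9, 10.0]))  # [1.0, 2.5, 2.5, 4.0]
print(meets_budget([12, 8, 8, 6], [16, 16, 16, 16], 0.4))  # True
```

With the counts (12, 8, 8, 6) over four hypothetical 16-coefficient subbands, 30 of 64 coefficients (about 47%) are changed to zero.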
Hereinabove, the embodiments of the present invention have been described.
The coding apparatus and the coding method according to the present invention are not limited to the above-mentioned embodiments, and can be variously changed and implemented.
It is assumed that the decoding apparatus in each of the above-mentioned embodiments performs processing using the coding information transmitted from the coding apparatus in each of the above-mentioned embodiments. The present invention is not limited to this case. Coding information does not have to be the coding information transmitted from the coding apparatus in each of the above-mentioned embodiments. As long as coding information contains necessary parameters and data, the processing can be performed.
The present invention is also applicable to cases where a signal processing program is recorded on a machine-readable recording medium such as a memory, disk, tape, CD, or DVD and is executed, and operations and effects similar to those of each of the above-mentioned embodiments can be obtained in such cases.
Also, although cases have been described with the above embodiments as examples where the present invention is configured by hardware, the present invention can also be implemented by software in concert with hardware.
Each function block employed in the description of the aforementioned embodiments may typically be implemented as an LSI constituted by an integrated circuit. These may be individual chips or partially or totally contained on a single chip. “LSI” is adopted here but this may also be referred to as “IC,” “system LSI,” “super LSI,” or “ultra LSI” depending on differing extents of integration.
Further, the method of circuit integration is not limited to LSI, and implementation using dedicated circuitry or general purpose processors is also possible. After LSI manufacture, utilization of a programmable FPGA (Field Programmable Gate Array) or a reconfigurable processor where connections and settings of circuit cells within an LSI can be reconfigured is also possible.
Further, if integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology or another derivative technology, it is naturally also possible to integrate the function blocks using that technology. Application of biotechnology is also possible.
The disclosure of Japanese Patent Application No. 2011-229616, filed on Oct. 19, 2011, including the specification, drawings, and abstract, is incorporated herein by reference in its entirety.
The present invention can efficiently reduce the amount of calculation when a correlation operation is performed on an input signal, and is applicable to, for example, a packet communication system, a mobile communication system, and the like.
Number | Date | Country | Kind |
---|---|---|---|
2011-229616 | Oct 2011 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/006423 | 10/5/2012 | WO | 00 | 4/1/2014 |