The present invention relates to a speech coding apparatus and a speech coding method and more particularly relates to a speech coding apparatus and a speech coding method capable of deleting redundant inter-channel parameters.
Generally, stereo and multi-channel speech coding methods fall into two categories.
The first encodes each channel signal individually, and it can be applied easily to stereo or multi-channel speech signals. However, since this method does not remove inter-channel redundancy, the overall coding bit rate is proportional to the number of channels, which results in a high bit rate.
The second encodes a stereo or multi-channel speech signal parametrically. Its basic principle is as follows. First, the coding side down-mixes or transforms the input signal into a signal with fewer channels than (or the same number of channels as) the input signal. Next, the coding side encodes the down-mixed or transformed signal using a conventional speech coding method. In parallel, the coding side calculates inter-channel parameters representing the inter-channel relationships of the original signal, encodes them, and transmits them to the decoding side so that the decoding side can generate a stereo image or a multi-channel image. Since the inter-channel parameters can be encoded with fewer bits than the speech signal itself, this method makes a lower bit rate possible.
Parametric stereo and multi-channel coding systems widely use principal component analysis (PCA) (Non-Patent Literature 1), binaural cue coding (BCC) (Non-Patent Literature 2), inter-channel prediction (ICP) (Non-Patent Literature 3), and intensity stereo (IS) (Non-Patent Literature 4). Each of these methods generates certain inter-channel parameters and transmits them to the decoding side. For example, BCC generates the inter-channel level difference (ICLD), inter-channel time difference (ICTD), and inter-channel coherence (ICC) as inter-channel parameters. Likewise, ICP, IS, and PCA generate an inter-channel prediction coefficient, an energy scale coefficient, and a rotation angle, respectively.
Since BCC, ICP, IS, and PCA require highly precise inter-channel parameters, it is common to calculate and encode the inter-channel parameters on a subband basis.
{Xi
{Yi
{Pi
The following will be explained assuming that down-mixing is performed.
At the coding side illustrated in
Coding section 102 encodes down-mixed signal {yi
Multiplexing section 104 multiplexes coding parameters of down-mixed signals {yi
At the decoding side illustrated in
Decoding section 202 performs decoding processing using the coding parameters of the down-mixed signals, and generates decoded down-mixed signals {y{tilde over ( )}i
Decoding section 203 (inter-channel parameter decoding section) performs decoding processing using the coding parameters of the inter-channel parameters, and generates decoded inter-channel parameters {P{tilde over ( )}i
Inter-channel parameter applying section 204 up-mixes decoded down-mixed signals {y{tilde over ( )}i
Non-Patent Literature 1 describes a codec based on a principal component analysis (PCA) in the frequency domain.
{Lsb(f)}: left signals divided into a plurality of subbands
{Rsb(f)}: right signals divided into a plurality of subbands
{Pcsb(f)}: principal-component signals calculated for each subband by principal component analysis
{Asb(f)}: ambient signals calculated for each subband by principal component analysis
{θsb}: rotation angles calculated for each subband by principal component analysis
{PcARsb}: energy ratios of the principal-component signals to the ambient signals, calculated for each subband
At a coding side illustrated in
The principal component analysis transform is performed according to the following equation.
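(Equation 2)
\[
\begin{aligned}
Pc_{sb}(f) &= L_{sb}(f)\cos\theta_{sb} + R_{sb}(f)\sin\theta_{sb} \\
A_{sb}(f) &= R_{sb}(f)\cos\theta_{sb} - L_{sb}(f)\sin\theta_{sb}
\end{aligned}
\qquad [2]
\]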
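As an illustration of Equation 2, the following is a minimal Python sketch of the rotation applied to one subband. It assumes real-valued spectral coefficients and a rotation angle that has already been determined; the function name `pca_forward` and its argument names are hypothetical and do not appear in the source.

```python
import math


def pca_forward(left, right, theta):
    """Rotate one subband's left/right coefficients by theta (Equation 2).

    Assumption: `left` and `right` are equal-length lists of real-valued
    spectral coefficients for a single subband, and `theta` is that
    subband's rotation angle in radians.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Pc_sb(f) = L_sb(f) * cos(theta) + R_sb(f) * sin(theta)
    principal = [l * c + r * s for l, r in zip(left, right)]
    # A_sb(f) = R_sb(f) * cos(theta) - L_sb(f) * sin(theta)
    ambient = [r * c - l * s for l, r in zip(left, right)]
    return principal, ambient
```

For identical left and right coefficients, an angle of π/4 places all of the energy in the principal-component signal and leaves the ambient signal at approximately zero.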
Monaural coding section 303 encodes principal-component signals {Pcsb(f)}.
Coding section 302 (rotation angle coding section) encodes rotation angles {θsb}.
Ambient signals {Asb(f)} are not regarded as important and are therefore not encoded directly. Instead, energy parameter extracting section 304 calculates energy ratios {PcARsb} of the principal-component signals to the ambient signals, and coding section 305 (energy ratio coding section) encodes the energy ratios {PcARsb} to generate energy ratio coding parameters. The energy ratios {PcARsb} are calculated according to the following equation.
Multiplexing section 306 multiplexes coding parameters of principal-component signals {Pcsb(f)}, rotation angles {θsb}, and energy ratios {PcARsb}, and transmits a bit stream to a decoding side.
At the decoding side illustrated in
Decoding section 402 (rotation angle decoding section) decodes the coding parameters of the rotation angles and outputs the decoded rotation angles {θ{tilde over ( )}i
Monaural decoding section 403 decodes the coding parameters of the principal-component signals, generates and then outputs decoded principal-component signals {P{tilde over ( )}csb(f)} to principal component combining section 406 and ambient signal combining section 405.
Decoding section 404 (energy ratio decoding section) decodes the coding parameters of the energy ratios and generates decoded energy ratios {P{tilde over ( )}cARsb} of the principal-component signals to the ambient signals.
By scaling the decoded principal-component signals {P{tilde over ( )}csb(f)} by the decoded energy ratios, ambient signal combining section 405 generates decoded ambient signals {A{tilde over ( )}sb(f)}.
Principal component combining section 406 inversely transforms decoded principal-component signals {P{tilde over ( )}csb(f)} and decoded ambient signals {A{tilde over ( )}sb(f)} by decoded rotation angles {θ{tilde over ( )}i
(Equation 4)
\[
\begin{aligned}
\tilde{L}_{sb}(f) &= \tilde{P}c_{sb}(f)\cos\tilde{\theta}_{sb} - \tilde{A}_{sb}(f)\sin\tilde{\theta}_{sb} \\
\tilde{R}_{sb}(f) &= \tilde{P}c_{sb}(f)\sin\tilde{\theta}_{sb} + \tilde{A}_{sb}(f)\cos\tilde{\theta}_{sb}
\end{aligned}
\qquad [4]
\]
When the ambient signals are not encoded, the inverse transformation is performed according to the following equation.
(Equation 5)
\[
\begin{aligned}
\tilde{L}_{sb}(f) &= \tilde{P}c_{sb}(f)\cos\tilde{\theta}_{sb} \\
\tilde{R}_{sb}(f) &= \tilde{P}c_{sb}(f)\sin\tilde{\theta}_{sb}
\end{aligned}
\qquad [5]
\]
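The two inverse transformations (Equations 4 and 5) can be sketched in Python as follows. This is only an illustration under the assumption of real-valued coefficients; the function name `pca_inverse` and the optional `ambient` argument are hypothetical names chosen for this sketch.

```python
import math


def pca_inverse(principal, theta, ambient=None):
    """Rebuild one subband's left/right coefficients from decoded values.

    Assumption: `principal` (and `ambient`, if given) are lists of real-valued
    decoded coefficients for one subband; `theta` is the decoded rotation
    angle in radians.
    """
    c, s = math.cos(theta), math.sin(theta)
    if ambient is None:
        # Equation 5: ambient signals were not encoded.
        left = [p * c for p in principal]
        right = [p * s for p in principal]
        return left, right
    # Equation 4: ambient signals are available.
    left = [p * c - a * s for p, a in zip(principal, ambient)]
    right = [p * s + a * c for p, a in zip(principal, ambient)]
    return left, right
```

When the ambient signal is supplied unchanged, this inverse undoes the rotation of Equation 2 exactly (up to floating-point rounding); without it, only the principal-component contribution is reconstructed.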
Irrespective of coding quality or signal-level sizes of down-mixed signals {yi
As an example, consider a case where, at an extremely low bit rate, the down-mixed signals of one or more subbands are not encoded. In such subbands the inter-channel parameters are not needed to generate multi-channel speech signals, so encoding them wastes bits.
This case is described below using the above codec based on principal component analysis in the frequency domain as an example.
Assume that input signals L(n) and R(n) can be represented as L(n)=S(n)+C(n) and R(n)=S(n)+B(n), where S(n) is the main source signal and C(n) and B(n) are ambient noise components.
In the frequency domain, L(f)=S(f)+C(f) and R(f)=S(f)+B(f) hold true. In a subband where S(f) is not strong, the ambient noise is dominant; that is, C(f) is dominant in L(f) and B(f) is dominant in R(f). Such subbands contribute so little to the whole spectrum that their signals are not encoded at a low bit rate. Coding the rotation angles of these subbands is therefore essentially unnecessary. For this reason, the conventional art, which always encodes the rotation angles of all subbands, wastes the bits allocated to the rotation angles of these subbands.
Referring to
It is therefore an object of the present invention to provide a speech coding apparatus and a speech coding method capable of deleting the redundant inter-channel parameters.
In the first aspect of the present invention, before encoding and transmitting inter-channel parameters, a coding apparatus analyzes the signal characteristics of each subband signal and checks whether or not the inter-channel parameters need to be transmitted. The coding apparatus then selects the inter-channel parameters that need not be transmitted and deletes them from the coding targets.
By this means, unnecessary inter-channel parameters are deleted from the coding targets and are not encoded, which improves coding efficiency without wasting bits.
In the second aspect of the present invention, redundant parameters are selected by a closed loop method. A local decoding section is introduced at the coding side, and the redundant parameters are selected by analyzing the signal coding quality. Specifically, the energy or amplitude of the decoded down-mixed signals generated by the local decoding section is analyzed, and a subband with small energy or amplitude is regarded as a subband having a redundant inter-channel parameter. Because only the inter-channel parameters of such subbands are deleted from the coding targets, a decrease in sound quality is avoided.
By this means, the subbands having redundant (unimportant) inter-channel parameters can be selected using the local decoding section.
In the third aspect of the present invention, the redundant parameters are selected by an open loop method, that is, by analyzing the characteristics of the transformed or down-mixed original signals.
This aspect therefore does not require a local decoding section and is useful under conditions where a local decoding section cannot be used. The absence of the local decoding section also reduces the amount of calculation.
In the fourth aspect of the present invention, the decoding side analyzes the decoded transformed or down-mixed signals and selects the subbands that have no inter-channel parameter. Flag signals reporting to the decoding section that a specific subband does not include an inter-channel parameter are therefore not required.
By this means, no additional information representing the flag signals is needed, which improves the coding efficiency.
The fifth aspect of the present invention uses the bits saved by applying the present invention in order to encode certain more important signals (for example, the coding parameters of the principal-component signals, and the coding parameters of the transformed or down-mixed signals).
Thus, realization of more precise bit allocation can improve the coding efficiency.
In the sixth aspect of the present invention, the decoding side predicts non-existent inter-channel parameters from the parameters of adjacent subbands, the parameters of the previous frame, or both. The predicted values are used in inverse transformation or up-mixing.
By this means, it is possible to predict non-existent inter-channel parameters and to maintain spatial images.
The seventh aspect of the present invention applies the present invention to scalable coding. In each layer, before encoding and transmitting inter-channel parameters, the coding apparatus analyzes the characteristics of the transformed or down-mixed signals for each subband and checks whether or not the inter-channel parameters need to be transmitted. The coding apparatus then selects the inter-channel parameters that need not be transmitted and deletes them from the coding targets. In a layer where the inter-channel parameters are needed to generate the input signals, the coding apparatus transmits the inter-channel parameters.
By this means, since the coding apparatus transmits the inter-channel parameters only in the case of the layer requiring the inter-channel parameters, it is possible to realize precise bit allocation.
Embodiments of the present invention will now be described with reference to the accompanying drawings.
The present embodiment will be described referring to
Local monaural decoding section 603 generates decoded principal-component signals such that a coding side can confirm the coding quality of the principal-component signals.
Through analysis of the coding quality of the decoded principal-component signals, redundant parameter deleting section 604 selects redundant parameters and deletes these parameters from coding targets.
The coding processing according to the present embodiment will be described referring to
As illustrated in
Zero-value inserting section 804 analyzes the decoded principal-component signals, selects the subbands without a rotation angle, and inserts a zero value into each of those subbands so that the inverse transformation can be performed smoothly.
The decoding processing according to the present embodiment will be described referring to
As illustrated in
The present invention can be applied to encoding of the energy ratios of principal-component signals to ambient signals.
The present embodiment will be described referring to
{Xi
{Yi
{Pi
{X{tilde over ( )}i
{Y{tilde over ( )}i
{P{tilde over ( )}i
The present embodiment deletes redundant parameters in multi-channel speech coding.
In speech coding apparatus 1000, inter-channel parameter generating section 1001 transforms or down-mixes input signals {xi
Coding section 1002 encodes the transformed or down-mixed signals {yi
Local decoding section 1003 generates signals transformed or down-mixed after decoding, such that the coding side can identify coding quality of the transformed or down-mixed signals.
By analyzing the coding quality of the transformed or down-mixed signals, deleting section 1004 selects redundant parameters and deletes these parameters from coding targets.
Coding section 1005 (inter-channel parameter coding section) encodes the remaining inter-channel parameters {P′i
Multiplexing section 1006 multiplexes coding parameters of {yi
The coding processing according to the present embodiment will be described referring to
As illustrated in
There are many methods, such as the following two, to determine whether or not the decoded subband signals are sufficiently weak. However, the present invention is not limited to the following methods.
<Method 1> Case Where Signal Energy of Subband is Extremely Lower than Adjacent Subbands
For each subband, this method calculates energy {Esb} and the energy ratios of the subband to its adjacent subbands, and then compares the energy ratios with a predetermined value Eth (Eth<1). When both energy ratios are smaller than Eth, the subband signal is regarded as weak. For example, two energy ratios E2/E1 and E2/E3 are calculated for the second subband. If E2/E1<Eth and E2/E3<Eth hold true, the signal of the second subband is regarded as weak, and the inter-channel parameter of the second subband is regarded as a redundant parameter.
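The neighbor-ratio test of Method 1 can be sketched in Python as follows. The function name `weak_by_neighbors` is hypothetical, and the handling of the first and last subbands (which have only one neighbor) is an assumption of this sketch, since the description only covers a subband with two neighbors.

```python
def weak_by_neighbors(energies, e_th):
    """Flag subbands whose energy is far below that of adjacent subbands.

    `energies` holds one non-negative energy value E_sb per subband, and
    `e_th` is the threshold Eth (Eth < 1). A subband is flagged when
    E_sb / E_neighbor < Eth holds for every existing neighbor.
    Assumption: edge subbands are tested against their single neighbor.
    """
    if not 0 < e_th < 1:
        raise ValueError("e_th must satisfy 0 < e_th < 1")
    weak = []
    for sb, e in enumerate(energies):
        neighbors = [energies[j] for j in (sb - 1, sb + 1)
                     if 0 <= j < len(energies)]
        # e < e_th * n is the ratio test e / n < e_th without dividing,
        # so a zero-energy neighbor never marks the subband as weak.
        weak.append(bool(neighbors) and all(e < e_th * n for n in neighbors))
    return weak
```

With energies `[10.0, 0.5, 8.0, 7.0]` and Eth = 0.1, only the second subband satisfies both ratio conditions, so only its inter-channel parameter would be treated as redundant.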
<Method 2> Case Where Subband Signal is Close to or Lower than Masking Curve
For each subband, this method calculates energy {Esb} and masking curve level {Msb}, and then compares the subband energy with the masking curve level. A further threshold Mth (Mth>0) can be defined for this comparison. When the subband energy is smaller than or close to the masking curve, that is, when Esb<Msb+Mth holds true, the subband signal is regarded as weak. For example, subband energy E2 is compared with masking curve level M2. If E2<M2+Mth holds true, the signal of the second subband is regarded as weak, and the inter-channel parameter of the second subband is regarded as a redundant parameter.
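The masking-curve test of Method 2 can be sketched in the same way. How the masking curve is computed is not covered here; the sketch assumes that the energies and masking levels are already available in the same units (for example, both in dB), and the function name `weak_by_masking` is hypothetical.

```python
def weak_by_masking(energies, masking_levels, m_th):
    """Flag subbands whose energy is below or close to the masking curve.

    `energies` holds E_sb and `masking_levels` holds M_sb, one value per
    subband and in the same units (assumption). `m_th` is the margin Mth
    (Mth > 0). A subband is flagged when E_sb < M_sb + Mth.
    """
    if m_th <= 0:
        raise ValueError("m_th must be positive")
    if len(energies) != len(masking_levels):
        raise ValueError("one masking level is required per subband")
    return [e < m + m_th for e, m in zip(energies, masking_levels)]
```

For example, with energies `[40.0, 12.0, 35.0]`, masking levels `[20.0, 10.0, 15.0]`, and Mth = 3.0, only the second subband (12.0 < 13.0) is regarded as weak.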
In speech decoding apparatus 1200, demultiplexing section 1201 demultiplexes the bit stream.
Decoding section 1202 decodes coding parameters of {yi
Decoding section 1203 (inter-channel parameter decoding section) decodes coding parameters of {P′i
Zero-value inserting section 1204 analyzes the decoded spectra of the transformed or down-mixed signals, selects the subband without an inter-channel parameter, and inserts a zero value in the subband so that inverse transformation or up-mixing can be performed smoothly.
By using spatial information represented by the decoded inter-channel parameters {P{tilde over ( )}i
The decoding processing according to the present embodiment will be described referring to
As illustrated in
As described above, before encoding and transmitting inter-channel parameters, the present embodiment analyzes the signal characteristics of the transformed or down-mixed signal in each subband and checks whether or not the inter-channel parameters need to be transmitted. The inter-channel parameters that need not be transmitted are then selected and deleted from the coding targets.
Therefore, according to the present embodiment, deleting unnecessary inter-channel parameters from the coding targets prevents them from being encoded and hence improves coding efficiency.
Also, according to the present invention, the redundant parameters are selected by a closed loop method. That is, the redundant parameters are selected by analyzing the coding quality of the signals generated by the local decoding section at the coding side.
Thus, according to the present embodiment, the subbands including redundant (unimportant) inter-channel parameters can be identified using the local decoding section, and a decrease in sound quality is avoided.
Also, according to the present invention, the decoding side decodes and analyzes the transformed or down-mixed signals and thereby selects the subbands in which no inter-channel parameter exists. Therefore, a flag signal reporting to the decoding section that no inter-channel parameter exists in a specific subband is not required.
As mentioned above, according to the present embodiment, no additional information representing flag signals is needed, which improves the coding efficiency.
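The flag-free arrangement can be illustrated with a short Python sketch. It assumes that the coding side and the decoding side both derive the same per-subband weak/not-weak decision from the decoded transformed or down-mixed signals, so that only the remaining parameters need to be transmitted. The function names `delete_redundant` and `insert_zeros` are hypothetical.

```python
def delete_redundant(params, weak):
    """Coding side: keep only the parameters of subbands not flagged weak."""
    if len(params) != len(weak):
        raise ValueError("one weak/not-weak decision is required per subband")
    return [p for p, w in zip(params, weak) if not w]


def insert_zeros(remaining, weak):
    """Decoding side: restore one value per subband, zero where deleted.

    `weak` must be the same decision list the coding side used; in this
    sketch it is assumed to be re-derived from the decoded signals rather
    than transmitted.
    """
    if len(remaining) != weak.count(False):
        raise ValueError("parameter count does not match the decision list")
    values = iter(remaining)
    restored = []
    for w in weak:
        # A deleted parameter is replaced by zero; others are read in order.
        restored.append(0.0 if w else next(values))
    return restored
```

Because the positions of the deleted parameters follow from the shared decision list, the transmitted data contain only the remaining parameter values and no per-subband flags.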
The present embodiment will be described referring to
In the present embodiment, the decoding side predicts a non-existent inter-channel parameter from the parameters of adjacent subbands, the parameters of the previous frame, or both. The predicted value is used in performing inverse transformation or up-mixing.
In speech decoding apparatus 1400, missing parameter predicting section 1404 does not insert a zero value for a non-existent inter-channel parameter; instead, it predicts the parameter using the parameters of the adjacent subbands or the parameters of the previous frame.
The decoding processing according to the present embodiment will be described referring to
There are many other methods to predict non-existent inter-channel parameters.
For example, a non-existent inter-channel parameter can be interpolated from the parameters of the adjacent subbands, as in the following equation.
Alternatively, a non-existent inter-channel parameter can be predicted from the parameters of the previous frame, as in the following equation. This method is effective when the spatial image is stable over time.
(Equation 7)
{tilde over (P)}i
As described above, according to the present embodiment, the decoding side predicts the non-existent inter-channel parameter from the parameters of the adjacent subbands, the parameters of the previous frame, or both. The predicted value is used in performing inverse transformation or up-mixing.
By this means, it is possible to predict the non-existent inter-channel parameters to maintain spatial images.
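One possible prediction rule is sketched below in Python. The specific formulas are assumptions made for illustration only and are not taken from the equations of this description: a missing parameter is set to the mean of the available adjacent-subband parameters, and if no neighbor is available, the value from the previous frame is reused. The function name `predict_missing` and its arguments are hypothetical.

```python
def predict_missing(params, missing, prev_frame=None):
    """Fill in non-existent inter-channel parameters at the decoding side.

    `params` holds one value per subband (entries at missing positions are
    ignored), `missing` marks the subbands without a transmitted parameter,
    and `prev_frame` optionally holds the previous frame's parameters.
    Assumed rule: mean of available neighbors, else previous frame, else 0.
    """
    predicted = list(params)
    for sb, is_missing in enumerate(missing):
        if not is_missing:
            continue
        neighbors = [params[j] for j in (sb - 1, sb + 1)
                     if 0 <= j < len(params) and not missing[j]]
        if neighbors:
            # Interpolate from the adjacent subbands of the current frame.
            predicted[sb] = sum(neighbors) / len(neighbors)
        elif prev_frame is not None:
            # Fall back to the same subband of the previous frame.
            predicted[sb] = prev_frame[sb]
        else:
            predicted[sb] = 0.0
    return predicted
```

Other rules, such as a weighted combination of neighboring subbands and the previous frame, would fit the same interface.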
The present embodiment will be described referring to
{Xi
{Yi
{Pi
{X{tilde over ( )}i
{Y{tilde over ( )}i
{P{tilde over ( )}i
In the present embodiment, redundant parameters are selected by an open loop method. By analyzing the characteristics of the transformed or down-mixed original signal, the present embodiment selects the redundant inter-channel parameters and deletes them from the coding targets.
In speech coding apparatus 1600, inter-channel parameter generating section 1601 transforms or down-mixes input signal {xi
Coding section 1602 encodes the transformed or down-mixed signal {yi
Signal analyzing section 1603 selects the redundant parameters by analyzing the signal characteristics of the transformed or down-mixed signal {yi
Redundant parameter deleting section 1604 selects the redundant parameters and deletes the parameters from the coding targets.
Coding section 1605 (inter-channel parameter coding section) encodes remaining inter-channel parameters {P′i
Multiplexing section 1606 multiplexes coding parameters of {yi
The coding processing according to the present embodiment will be described referring to
As illustrated in
There are many methods, such as the following two, to determine whether or not the subband signals are sufficiently weak. However, the present invention is not limited to the following methods.
<Method 1> Case Where Signal Energy of Subband is Extremely Lower than Adjacent Subbands
For each subband, this method calculates energy {Esb} and the energy ratios of the subband to its adjacent subbands, and then compares the energy ratios with a predetermined value Eth (Eth<1). When both energy ratios are smaller than Eth, the subband signal is regarded as weak. For example, two energy ratios E2/E1 and E2/E3 are calculated for the second subband. If E2/E1<Eth and E2/E3<Eth hold true, the signal of the second subband is regarded as weak, and the inter-channel parameter of the second subband is regarded as a redundant parameter.
<Method 2> Case Where Subband Signal is Close to or Lower than Masking Curve
For each subband, this method calculates energy {Esb} and masking curve level {Msb}, and then compares the subband energy with the masking curve level. A further threshold Mth (Mth>0) can be defined for this comparison. When the subband energy is smaller than or close to the masking curve, that is, when Esb<Msb+Mth holds true, the subband signal is regarded as weak. For example, when subband energy E2 is compared with masking curve level M2 and E2<M2+Mth holds true, the signal of the second subband is regarded as weak, and the inter-channel parameter of the second subband is regarded as a redundant parameter.
In speech decoding apparatus 1800, demultiplexing section 1801 demultiplexes the bit stream.
Decoding section 1802 decodes coding parameters of {yi
Decoding section 1803 (inter-channel parameter decoding section) decodes coding parameters of {P′i
Zero-value inserting section 1804 analyzes the decoded spectrum of the transformed or down-mixed signal, selects the subband without an inter-channel parameter, and inserts a zero value in the subband so that inverse transformation or up-mixing can be performed smoothly.
By using spatial information represented by decoded inter-channel parameters {P{tilde over ( )}i
The decoding processing according to the present embodiment will be described referring to
As illustrated in
According to the present invention, the redundant parameters are selected by an open loop method, that is, by analyzing the characteristics of the transformed or down-mixed original signals.
Therefore, the present embodiment does not require a local decoding section and is useful under conditions where a local decoding section cannot be used. The absence of the local decoding section also reduces the amount of calculation.
The present embodiment will be described referring to
{Xi
{Yi
{Pi
{X{tilde over ( )}i
{Y{tilde over ( )}i
{P{tilde over ( )}i
The present embodiment deletes redundant parameters in a scalable codec.
In speech coding apparatus 2000, inter-channel parameter generating section 2001 transforms or down-mixes input signals {xi
Scalable coding section 2002 encodes the transformed or down-mixed signals {yi
Scalable local decoding section 2003 generates decoded signals of layers, such that the coding side can identify coding quality of the transformed or down-mixed signals.
By analyzing the coding quality of the transformed or down-mixed signal, scalable redundant parameter deleting section 2004 selects redundant parameters and deletes these parameters from coding targets.
Coding section 2005 (inter-channel parameter coding section) encodes the remaining inter-channel parameters {P′i
Multiplexing section 2006 multiplexes the coding parameters of {yi
The coding processing according to the present embodiment will be described referring to
As illustrated in
On the other hand, in layer 2, the decoded signal of the second subband is not weak, and hence the inter-channel parameter must be encoded in order to prevent possible deterioration of sound quality. Therefore, the inter-channel parameter of the second subband is first encoded in layer 2.
There are many methods, such as the following two, to determine whether or not the subband signal is extremely weak. However, the present invention is not limited to the following methods.
<Method 1> Case Where Signal Energy of Subband is Extremely Lower than Adjacent Subbands
For each subband, this method calculates energy {Esb} and the energy ratios of the subband to its adjacent subbands, and then compares the energy ratios with a predetermined value Eth (Eth<1). When both energy ratios are smaller than Eth, the subband signal is regarded as weak. For example, two energy ratios E2/E1 and E2/E3 are calculated for the second subband. If E2/E1<Eth and E2/E3<Eth hold true, the signal of the second subband is regarded as weak, and the inter-channel parameter of the second subband is regarded as a redundant parameter.
<Method 2> Case Where Subband Signal is Close to or Lower than Masking Curve
For each subband, this method calculates energy {Esb} and masking curve level {Msb}, and then compares the subband energy with the masking curve level. A further threshold Mth (Mth>0) can be defined for this comparison. When the subband energy is smaller than or close to the masking curve, that is, when Esb<Msb+Mth holds true, the subband signal is regarded as weak. For example, when subband energy E2 is compared with masking curve level M2 and E2<M2+Mth holds true, the signal of the second subband is regarded as weak, and the inter-channel parameter of the second subband is regarded as a redundant parameter.
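The layer-by-layer decision can be sketched in Python as follows. The sketch assumes that a weak/not-weak decision per subband is already available for every layer, and that a parameter, once transmitted in a lower layer, is not transmitted again in a higher layer; that second point is an assumption of this illustration. The function name `first_transmission_layer` is hypothetical.

```python
def first_transmission_layer(weak_per_layer):
    """Return, per subband, the first layer that encodes its parameter.

    `weak_per_layer[k][sb]` is True when the decoded signal of subband `sb`
    is regarded as weak in layer k + 1. The result holds a 1-based layer
    number per subband, or None if the subband is weak in every layer.
    Assumption: a parameter sent in a lower layer is not sent again.
    """
    if not weak_per_layer:
        return []
    first = [None] * len(weak_per_layer[0])
    for layer, weak in enumerate(weak_per_layer, start=1):
        for sb, is_weak in enumerate(weak):
            if first[sb] is None and not is_weak:
                first[sb] = layer
    return first
```

With decisions `[[False, True, False], [False, False, False]]`, the second subband is weak in layer 1 and not weak in layer 2, so its parameter is first encoded in layer 2 while the others are encoded in layer 1.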
In speech decoding apparatus 2200, demultiplexing section 2201 demultiplexes the bit stream in each layer.
Scalable decoding section 2202 decodes coding parameters of {yi
Decoding section 2203 (inter-channel parameter decoding section) decodes coding parameters of {P′i
In each layer, zero-value inserting section 2204 analyzes the decoded spectrum of the transformed or down-mixed signal, selects the subband without an inter-channel parameter, and inserts a zero value in the subband so that inverse transformation or up-mixing can be performed smoothly.
By using spatial information represented by inter-channel parameters {P{tilde over ( )}i
The decoding processing according to the present embodiment will be described referring to
As illustrated in
On the other hand, since the decoded signal of the second subband is not weak in layer 2, it is necessary to encode the inter-channel parameter of the second subband.
To maintain consistency with the coding side, the decoding side uses the same method as the coding side to determine whether or not the inter-channel parameters are encoded.
As described above, in each layer of scalable coding, before encoding and transmitting inter-channel parameters, the present embodiment analyzes the characteristics of the transformed or down-mixed signals for each subband and checks whether or not the inter-channel parameters need to be transmitted. The inter-channel parameters that need not be transmitted are then selected and deleted from the coding targets. Meanwhile, in a layer that requires the inter-channel parameters to generate the input signals, the inter-channel parameters are transmitted.
Therefore, the present invention can realize precise bit allocation so as to transmit the inter-channel parameter only for the layer requiring the inter-channel parameter.
The disclosure of Japanese Patent Application No. 2009-298321, filed on Dec. 28, 2009, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.
The present invention is suitable for a communication apparatus performing speech coding, a communication apparatus performing speech decoding, and particularly a wireless communication apparatus.
Number | Date | Country | Kind |
---|---|---|---|
2009-298321 | Dec 2009 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2010/007553 | 12/27/2010 | WO | 00 | 6/22/2012 |