The present invention relates to a speech processing apparatus, a speech processing method and a program and, more particularly, relates to a speech processing apparatus, a speech processing method and a program which, when multichannel audio signals are downmixed and coded, prevent delay and an increase in the computation amount upon decoding of the audio signals.
A coding apparatus which codes multichannel audio signals can achieve highly efficient coding by exploiting the relationship between channels. Examples of such coding include intensity coding, M/S stereo coding and spatial coding. A coding apparatus which performs spatial coding downmixes an n-channel audio signal into an m-channel (m < n) audio signal and codes it, computes spatial parameters representing the inter-channel relationship during downmixing, and transmits the spatial parameters together with the coded data. A decoding apparatus which receives the spatial parameters and the coded data decodes the coded data, and uses the spatial parameters to restore the original n-channel audio signal from the decoded m-channel audio signal.
This spatial coding is known as “binaural cue coding”. The spatial parameters (hereinafter referred to as “BC parameters”) include, for example, the ILD (Inter-channel Level Difference), the IPD (Inter-channel Phase Difference) and the ICC (Inter-channel Correlation). The ILD indicates the ratio between the magnitudes of the channel signals, the IPD indicates the phase difference between the channels, and the ICC indicates the correlation between the channels.
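The three BC parameters can be illustrated on one frame of complex subband samples. The sketch below is an assumption for illustration only: it uses common textbook definitions (ILD as an energy ratio in dB, IPD as the angle of the summed cross-spectrum, ICC as the normalized magnitude of that cross-spectrum), since this text does not give the exact formulas, and the function name `bc_parameters` is hypothetical.

```python
import cmath
import math


def bc_parameters(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians, ICC) for one frame of complex subband samples.

    Textbook-style definitions assumed for illustration; eps avoids division by zero.
    """
    e_left = sum(abs(v) ** 2 for v in left)
    e_right = sum(abs(v) ** 2 for v in right)
    # Cross-spectrum summed over the frame: its angle is the phase difference.
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    ild_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    ipd = cmath.phase(cross)
    icc = abs(cross) / math.sqrt((e_left + eps) * (e_right + eps))
    return ild_db, ipd, icc
```

For identical channels the sketch returns an ILD of 0 dB, an IPD of 0 and an ICC of 1; attenuating or phase-shifting one channel by a constant changes only the ILD or the IPD.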
In the following, n = 2 and m = 1 for ease of description. That is, the audio signal to be coded is a stereo audio signal (hereinafter referred to as a “stereo signal”), and the coded data obtained as a result of coding is coded data of a monaural audio signal (hereinafter referred to as a “monaural signal”).
A coding apparatus 10 in
More specifically, the channel downmix unit 11 of the coding apparatus 10 downmixes the stereo signal input as the coding target into the monaural signal XM, and supplies the monaural signal XM to the spatial parameter detection unit 12 and the audio signal coding unit 13.
The spatial parameter detection unit 12 detects the BC parameters based on the monaural signal XM supplied from the channel downmix unit 11 and the stereo signal input as the coding target, and supplies the BC parameters to the multiplexing unit 14.
The audio signal coding unit 13 codes the monaural signal supplied from the channel downmix unit 11, and supplies the resulting coded data to the multiplexing unit 14.
The multiplexing unit 14 multiplexes the coded data supplied from the audio signal coding unit 13 and the BC parameters supplied from the spatial parameter detection unit 12, and outputs the result.
In addition, the audio signal coding unit 13 in
The audio signal coding unit 13 in
The MDCT unit 21 performs MDCT of the monaural signal supplied from the channel downmix unit 11, transforming the monaural signal, a time domain signal, into MDCT coefficients, which are frequency domain coefficients. The MDCT unit 21 supplies the resulting MDCT coefficients to the spectrum quantization unit 22 as frequency spectrum coefficients.
The spectrum quantization unit 22 quantizes the frequency spectrum coefficients supplied from the MDCT unit 21, and supplies the quantized coefficients to the entropy coding unit 23. The spectrum quantization unit 22 also supplies quantization information, that is, information about this quantization, to the multiplexing unit 24. The quantization information includes, for example, a scale factor and quantization bit information.
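As a hedged illustration of how a scale factor can serve as quantization information, the following sketch assumes a uniform quantizer whose step size is derived from one scale factor per band as 2^(sf/4), an AAC-like convention chosen here only for illustration; the text does not specify the quantizer, and the names `step_size`, `quantize_band` and `dequantize_band` are hypothetical.

```python
def step_size(scale_factor):
    # Hypothetical mapping: one scale factor step changes the step size by about 1.5 dB.
    return 2.0 ** (scale_factor / 4.0)


def quantize_band(coeffs, scale_factor):
    """Coder side: map real spectrum coefficients to integer levels."""
    step = step_size(scale_factor)
    return [int(round(c / step)) for c in coeffs]


def dequantize_band(levels, scale_factor):
    """Decoder side: rebuild approximate coefficients from levels and the scale factor."""
    step = step_size(scale_factor)
    return [q * step for q in levels]
```

Only the integer levels go to entropy coding; the scale factor travels as side information, and the reconstruction error of each coefficient is at most half a step.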
The entropy coding unit 23 losslessly compresses the quantized frequency spectrum coefficients supplied from the spectrum quantization unit 22 by entropy coding such as Huffman coding or arithmetic coding, and supplies the resulting data to the multiplexing unit 24.
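Huffman coding, one of the entropy coding methods named above, can be sketched in a few lines. This is a generic textbook construction built from the symbol statistics of a single block, not the codebook of any particular codec; the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code suffix so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is a complete symbol
            out.append(inverse[current])
            current = ""
    return out
```

Quantized spectra are dominated by small levels, so the most frequent level (typically 0) receives the shortest codeword, and decoding recovers the levels exactly.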
The multiplexing unit 24 multiplexes the data supplied from the entropy coding unit 23 and the quantization information supplied from the spectrum quantization unit 22, and supplies resulting data to the multiplexing unit 14 (
In addition, the audio signal coding unit 13 in
The audio signal coding unit 13 in
The analysis filter bank 31 includes, for example, a QMF (Quadrature Mirror Filter) bank or a PQF (Polyphase Quadrature Filter) bank. The analysis filter bank 31 divides the monaural signal supplied from the channel downmix unit 11 into N frequency subbands, and supplies the resulting N subband signals to the MDCT units 32-1 to 32-N.
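A practical QMF or PQF bank uses long prototype filters, but the band-splitting principle can be seen with the shortest two-band QMF pair, the Haar filters (h1[n] = (-1)^n h0[n]). The sketch below is an illustration under that assumption (N = 2, Haar filters in block form), not the filter bank intended by the text; it splits a signal into decimated low and high subbands and recombines them exactly.

```python
import math


def haar_analysis(x):
    """Two-band analysis: Haar lowpass/highpass filtering with decimation by 2."""
    assert len(x) % 2 == 0
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high


def haar_synthesis(low, high):
    """Two-band synthesis: upsample, filter and add; inverts haar_analysis exactly."""
    s = 1.0 / math.sqrt(2.0)
    y = []
    for lo, hi in zip(low, high):
        y.append(s * (lo + hi))
        y.append(s * (lo - hi))
    return y
```

A constant input produces an all-zero high subband, and synthesis restores the input sample for sample.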
The MDCT units 32-1 to 32-N each perform MDCT of the subband signal supplied from the analysis filter bank 31, transforming the subband signal, a time domain signal, into MDCT coefficients, which are frequency domain coefficients. The MDCT units 32-1 to 32-N each supply the MDCT coefficients of their subband signal to the spectrum quantization unit 33 as frequency spectrum coefficients.
The spectrum quantization unit 33 quantizes each of the N frequency spectrum coefficients supplied from the MDCT units 32-1 to 32-N, and supplies the N frequency spectrum coefficients to the entropy coding unit 34. Further, the spectrum quantization unit 33 supplies quantization information about this quantization, to the multiplexing unit 35.
The entropy coding unit 34 performs entropy coding such as Huffman coding or arithmetic coding of each of the quantized N frequency spectrum coefficients supplied from the spectrum quantization unit 33, and losslessly compresses the N frequency spectrum coefficients. The entropy coding unit 34 supplies N items of data obtained as a result of entropy coding, to the multiplexing unit 35.
The multiplexing unit 35 multiplexes the N items of data supplied from the entropy coding unit 34 and the quantization information supplied from the spectrum quantization unit 33, and supplies resulting data to the multiplexing unit 14 (
A decoding apparatus 40 in
More specifically, the inverse multiplexing unit 41 of the decoding apparatus 40 inversely multiplexes the multiplexed coded data supplied from the coding apparatus 10 in
The audio signal decoding unit 42 decodes the coded data supplied from the inverse multiplexing unit 41, and supplies the resulting monaural signal XM which is a time domain signal, to the stereo signal generation unit 44.
The generation parameter calculation unit 43 uses the BC parameters supplied from the inverse multiplexing unit 41 to calculate generation parameters, that is, parameters for generating a stereo signal from the monaural signal obtained by decoding the multiplexed coded data. The generation parameter calculation unit 43 supplies these generation parameters to the stereo signal generation unit 44.
The stereo signal generation unit 44 generates the left audio signal XL and the right audio signal XR from the monaural signal XM supplied from the audio signal decoding unit 42 using the generation parameters supplied from the generation parameter calculation unit 43. The stereo signal generation unit 44 outputs the left audio signal XL and the right audio signal XR as stereo signals.
In addition, the audio signal decoding unit 42 in
The audio signal decoding unit 42 in
The inverse multiplexing unit 51 inversely multiplexes the coded data supplied from the inverse multiplexing unit 41 in
The entropy decoding unit 52 performs entropy decoding such as Huffman decoding or arithmetic decoding of the frequency spectrum coefficient supplied from the inverse multiplexing unit 51, and restores the quantized frequency spectrum coefficient. The entropy decoding unit 52 supplies this frequency spectrum coefficient to the spectrum inverse quantization unit 53.
The spectrum inverse quantization unit 53 inversely quantizes the quantized frequency spectrum coefficients supplied from the entropy decoding unit 52 based on the quantization information supplied from the inverse multiplexing unit 51, and restores the frequency spectrum coefficients. The spectrum inverse quantization unit 53 supplies the frequency spectrum coefficients to the IMDCT (Inverse Modified Discrete Cosine Transform) unit 54.
The IMDCT unit 54 performs IMDCT of the frequency spectrum coefficient supplied from the spectrum inverse quantization unit 53, and transforms the frequency spectrum coefficient into the monaural signal XM which is a time domain signal. The IMDCT unit 54 supplies this monaural signal XM to the stereo signal generation unit 44 (
In addition, the audio signal decoding unit 42 in
The audio signal decoding unit 42 in
The inverse multiplexing unit 61 inversely multiplexes the coded data supplied from the inverse multiplexing unit 41 in
The entropy decoding unit 62 performs entropy decoding, such as Huffman decoding or arithmetic decoding, of the frequency spectrum coefficients of the N subband signals supplied from the inverse multiplexing unit 61, and supplies the decoded coefficients to the spectrum inverse quantization unit 63.
The spectrum inverse quantization unit 63 inversely quantizes each of the entropy-decoded frequency spectrum coefficients of the N subband signals supplied from the entropy decoding unit 62, based on the quantization information supplied from the inverse multiplexing unit 61, thereby restoring the frequency spectrum coefficients of the N subband signals. The spectrum inverse quantization unit 63 supplies the restored frequency spectrum coefficients of the N subband signals to the IMDCT units 64-1 to 64-N, one subband to each unit.
The IMDCT units 64-1 to 64-N each perform IMDCT of the frequency spectrum coefficient supplied from the spectrum inverse quantization unit 63, and transform the frequency spectrum coefficient into a subband signal which is a time domain signal. The IMDCT units 64-1 to 64-N each supply the subband signal obtained as a result of transform, to the synthesis filter bank 65.
The synthesis filter bank 65 includes, for example, an inverse PQF bank or an inverse QMF bank. The synthesis filter bank 65 synthesizes the N subband signals supplied from the IMDCT units 64-1 to 64-N, and supplies the resulting signal to the stereo signal generation unit 44 (
The stereo signal generation unit 44 in
The reverb signal generation unit 71 generates a signal XD which is uncorrelated with this monaural signal XM using the monaural signal XM supplied from the audio signal decoding unit 42 in
In addition, for the reverb signal generation unit 71, a feedback delay network (FDN) is used in some cases (see, for example, Patent Document 1).
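For readers unfamiliar with FDNs, the following is a minimal sketch of a four-line feedback delay network with a scaled Hadamard feedback matrix. The delay lengths, feedback gain and output taps are illustrative assumptions, not values from Patent Document 1, and the function name `fdn` is hypothetical. The sketch also makes the delay issue concrete: with no direct path, no output can appear before the shortest delay line has been traversed.

```python
def fdn(x, delays=(7, 11, 13, 17), gain=0.7):
    """Four-line feedback delay network (illustrative parameters)."""
    # Hadamard matrix scaled by 1/2 is orthogonal; gain < 1 makes the loop decay.
    h = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    a = [[gain * 0.5 * v for v in row] for row in h]
    lines = [[0.0] * d for d in delays]  # circular buffers, one per delay line
    pos = [0] * len(delays)
    out_taps = (1.0, -1.0, 1.0, -1.0)  # hypothetical output mixing weights
    y = []
    for sample in x:
        # The value at the read position was written exactly d samples ago.
        s = [lines[i][pos[i]] for i in range(len(delays))]
        y.append(sum(c * v for c, v in zip(out_taps, s)))
        # Mix the line outputs through the feedback matrix and add the input.
        for i in range(len(delays)):
            fb = sum(a[i][j] * s[j] for j in range(len(delays)))
            lines[i][pos[i]] = sample + fb
            pos[i] = (pos[i] + 1) % delays[i]
    return y
```

Fed a unit impulse, this network outputs zeros for the first seven samples, and its impulse response then decays geometrically because the feedback matrix has norm 0.7.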
The reverb signal generation unit 71 supplies the generated signal XD to the stereo synthesis unit 72.
The stereo synthesis unit 72 synthesizes the monaural signal XM supplied from the audio signal decoding unit 42 in
The stereo signal generation unit 44 in
In addition, when the stereo signal generation unit 44 in
More specifically, for example, the spatial parameter detection unit 12 has two analysis filter banks. In the spatial parameter detection unit 12, one analysis filter bank divides the stereo signal into frequency subbands, and the other divides the monaural signal from the channel downmix unit 11 into frequency subbands. The spatial parameter detection unit 12 detects BC parameters for each subband based on the resulting subband signals of the stereo signal and of the monaural signal. Further, the generation parameter calculation unit 43 in
The analysis filter bank 81 includes, for example, a QMF (Quadrature Mirror Filter) bank. The analysis filter bank 81 divides the monaural signal XM supplied from the audio signal decoding unit 42 in
The subband stereo signal generation units 82-1 to 82-P each include a reverb signal generation unit and a stereo synthesis unit. Since the subband stereo signal generation units 82-1 to 82-P all have the same configuration, only the subband stereo signal generation unit 82-B is described.
The subband stereo signal generation unit 82-B includes a reverb signal generation unit 91 and a stereo synthesis unit 92. The reverb signal generation unit 91 uses the subband signal XmB of the monaural signal supplied from the analysis filter bank 81 to generate a signal XDB which is uncorrelated with this subband signal XmB, and supplies the signal XDB to the stereo synthesis unit 92.
The stereo synthesis unit 92 synthesizes the subband signal XmB supplied from the analysis filter bank 81 and the signal XDB supplied from the reverb signal generation unit 91 using the generation parameters of the subband signal XmB supplied from the generation parameter calculation unit 43 in
The synthesis filter bank 83 synthesizes, for each channel, the left and right subband signals supplied from the subband stereo signal generation units 82-1 to 82-P, and outputs the resulting left audio signal XL and right audio signal XR as stereo signals.
In addition, the configuration of the stereo signal generation unit 44 in
Further, a coding apparatus which performs intensity coding mixes the frequency spectrum coefficients of the channels of the input stereo signal at frequencies at or above a predetermined frequency, and generates the frequency spectrum coefficients of a monaural signal. The coding apparatus outputs, as the coding result, the frequency spectrum coefficients of this monaural signal and the level ratio between the frequency spectrum coefficients of the channels.
More specifically, the coding apparatus which performs intensity coding performs MDCT of the stereo signal and, among the resulting frequency spectrum coefficients of the channels, mixes the coefficients at frequencies at or above the predetermined frequency so that the channels share them. The coding apparatus then quantizes and entropy-codes the shared frequency spectrum coefficients, and multiplexes the resulting data and the quantization information as coded data. Furthermore, the coding apparatus finds the level ratio between the frequency spectrum coefficients of the channels, and multiplexes and outputs the level ratio together with the coded data.
Conversely, a decoding apparatus which performs intensity decoding inversely multiplexes the coded data on which the inter-channel level ratio is multiplexed, entropy-decodes the resulting data, and inversely quantizes it based on the quantization information. The decoding apparatus then restores the frequency spectrum coefficients of each channel from the frequency spectrum coefficients obtained by inverse quantization and the inter-channel level ratio multiplexed on the coded data. Finally, the decoding apparatus performs IMDCT of the restored frequency spectrum coefficients of each channel, and obtains a stereo signal whose components at or above the predetermined frequency are reconstructed in this way.
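The intensity coding and decoding steps described above can be sketched on already-computed MDCT spectra. This is a simplified illustration under stated assumptions: the mix is a plain average, a single amplitude level ratio is sent for the whole band above a hypothetical `crossover` index, quantization and entropy coding are omitted, and the function names are hypothetical.

```python
import math


def intensity_encode(spec_l, spec_r, crossover):
    """Share the band at/above `crossover` between channels and keep one level ratio."""
    high_l, high_r = spec_l[crossover:], spec_r[crossover:]
    shared = [(a + b) / 2.0 for a, b in zip(high_l, high_r)]
    e_l = sum(a * a for a in high_l)
    e_r = sum(b * b for b in high_r)
    ratio = math.sqrt(e_l / e_r)  # assumes the right high band is not silent
    return spec_l[:crossover], spec_r[:crossover], shared, ratio


def intensity_decode(low_l, low_r, shared, ratio):
    """Rebuild both spectra; in the high band the channels differ only by a gain."""
    # Gains satisfy g_l / g_r == ratio and (g_l + g_r) / 2 == 1, which is exact
    # when both high bands are positively scaled copies of one spectral shape.
    g_l = 2.0 * ratio / (1.0 + ratio)
    g_r = 2.0 / (1.0 + ratio)
    return (low_l + [g_l * m for m in shared],
            low_r + [g_r * m for m in shared])
```

When the two high bands have different shapes, for example energy in different bins, the decoded channels share one shape and keep only the level ratio, which is the loss of stereophonic detail discussed next.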
Although intensity coding is commonly used to improve coding efficiency, the high-band frequency spectrum coefficients of the stereo signal are coded as a monaural signal and represented only by the inter-channel level difference, so part of the original stereophonic effect is lost.
As described above, the decoding apparatus 40, which decodes conventional spatially coded data, uses the monaural signal XM, a time domain signal, to generate the signal XD or the signals XD1 to XDP, which are uncorrelated with the monaural signal XM and are used to generate the stereo signal.
Therefore, the reverb signal generation unit 71 that generates the signal XD, and the analysis filter bank 81 and the reverb signal generation units 91 of the subband stereo signal generation units 82-1 to 82-P that generate the signals XD1 to XDP, introduce delay and increase the algorithmic delay of the decoding apparatus 40. This is a problem when low delay is important, for example, when the decoding apparatus 40 must respond immediately or is used for real-time communication.
Further, the filter computation in the reverb signal generation unit 71, and in the analysis filter bank 81 and the reverb signal generation units 91 of the subband stereo signal generation units 82-1 to 82-P, increases both the computation amount and the required buffer capacity.
In light of such a situation, the present invention makes it possible to prevent delay and an increase in the computation amount upon decoding of audio signals when multichannel audio signals are downmixed and coded.
A speech processing apparatus according to an aspect of the present invention includes: an acquisition unit which acquires frequency domain coefficients of speech signals of fewer channels than a plurality of channels, the speech signals of the fewer channels being generated from speech signals of the plurality of channels which are speech time domain signals, and a parameter representing a relationship between the plurality of channels; a first transform unit which transforms the frequency domain coefficients acquired by the acquisition unit into first time domain signals; a second transform unit which transforms the frequency domain coefficients acquired by the acquisition unit into second time domain signals; and a synthesis unit which generates the speech signals of the plurality of channels by synthesizing the first time domain signals and the second time domain signals using the parameter, wherein a base of the transform performed by the first transform unit and a base of the transform performed by the second transform unit are orthogonal.
A speech processing method and a program according to an aspect of the present invention correspond to the speech processing apparatus according to an aspect of the present invention.
According to an aspect of the present invention, frequency domain coefficients of speech signals of fewer channels than a plurality of channels, generated from speech signals of the plurality of channels which are speech time domain signals, and a parameter representing a relationship between the plurality of channels are acquired; the acquired frequency domain coefficients are transformed into first time domain signals and into second time domain signals; and the speech signals of the plurality of channels are generated by synthesizing the first time domain signals and the second time domain signals using the parameter. A base of the transform into the first time domain signals and a base of the transform into the second time domain signals are orthogonal.
The speech processing apparatus according to an aspect of the present invention may be an independent apparatus or an internal block forming part of a single apparatus.
According to an aspect of the present invention, it is possible to prevent delay and an increase in the computation amount upon decoding of audio signals when multichannel audio signals are downmixed and coded.
[Configuration Example of Speech Processing Apparatus According to First Embodiment]
The same configuration illustrated in
The configuration of the speech processing apparatus 100 in
The speech processing apparatus 100 decodes, for example, coded data spatially coded by a coding apparatus 10 in
More specifically, the inverse multiplexing unit 101 (acquisition unit) of the speech processing apparatus 100 corresponds to the inverse multiplexing unit 41 in
Further, the inverse multiplexing unit 101 inversely multiplexes the coded data to obtain the quantized and entropy-coded frequency spectrum coefficients and the quantization information. The inverse multiplexing unit 101 supplies the quantized and entropy-coded frequency spectrum coefficients to the entropy decoding unit 52, supplies the quantization information to the spectrum inverse quantization unit 53, and supplies the BC parameters to the generation parameter calculation unit 104.
The uncorrelated frequency-time transform unit 102 generates the monaural signal XM and the signal XD′ which are two uncorrelated time domain signals, from the frequency spectrum coefficient of the monaural signal XM obtained as a result of inverse quantization by the spectrum inverse quantization unit 53. Further, the uncorrelated frequency-time transform unit 102 supplies the monaural signal XM and the signal XD′ to the stereo synthesis unit 103. This uncorrelated frequency-time transform unit 102 will be described in detail with reference to
The stereo synthesis unit 103 (synthesis unit) synthesizes the monaural signal XM and the signal XD′ supplied from the uncorrelated frequency-time transform unit 102, using generation parameters supplied from the generation parameter calculation unit 104. Further, the stereo synthesis unit 103 outputs a left audio signal XL and a right audio signal XR obtained as a result of synthesis as stereo signals. This stereo synthesis unit 103 will be described in detail with reference to
The generation parameter calculation unit 104 interpolates the BC parameters of predetermined frames supplied from the inverse multiplexing unit 101 to calculate the BC parameters of each frame. The generation parameter calculation unit 104 then generates the generation parameters using the BC parameters of the current frame to be processed, and supplies them to the stereo synthesis unit 103.
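The interpolation method is not specified here. A minimal sketch, assuming that a parameter value is transmitted only for some anchor frames and linearly interpolated in between, with the nearest anchor value held at the edges; the name `interpolate_parameters` is hypothetical, and each BC parameter would be interpolated separately (a phase parameter such as the IPD would additionally need wrap-around handling, not shown).

```python
def interpolate_parameters(anchors, num_frames):
    """Per-frame values from {frame_index: value} anchors by linear interpolation."""
    keys = sorted(anchors)
    values = []
    for f in range(num_frames):
        if f <= keys[0]:
            values.append(anchors[keys[0]])  # hold the first anchor value
        elif f >= keys[-1]:
            values.append(anchors[keys[-1]])  # hold the last anchor value
        else:
            # Surrounding anchors, blended by distance to each.
            right = next(k for k in keys if k >= f)
            left = max(k for k in keys if k <= f)
            if right == left:
                values.append(anchors[left])
            else:
                t = (f - left) / (right - left)
                values.append((1.0 - t) * anchors[left] + t * anchors[right])
    return values
```

For example, anchors at frames 0, 4 and 6 with values 0.0, 2.0 and 1.0 give a value ramp across frames 1 to 3 and a hold after frame 6.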
[Detailed Configuration Example of Uncorrelated Frequency-Time Transform Unit]
The uncorrelated frequency-time transform unit 102 in
The IMDCT unit 54 (first transform unit) in
The IMDST (Inverse Modified Discrete Sine Transform) unit 111 (second transform unit) performs IMDST of the frequency spectrum coefficients of the monaural signal XM supplied from the spectrum inverse quantization unit 53, and supplies the resulting signal XD′, a time domain signal (second time domain signal), to the stereo synthesis unit 103 (
As described above, the transform performed by the IMDCT unit 54 is an inverse cosine transform and the transform performed by the IMDST unit 111 is an inverse sine transform, so the bases of the two transforms are orthogonal. Consequently, the monaural signal XM and the signal XD′ can be regarded as substantially uncorrelated with each other.
MDCT, IMDCT and IMDST are defined by the following equations (1) to (3).
In equations (1) to (3), x(n) is a time domain signal, w(n) is a transform window, w′(n) is an inverse transform window and y(n) is an inversely transformed signal. Further, Xc(k) is an MDCT coefficient, and Xs(k) is an MDST coefficient.
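Equations (1) to (3) are not reproduced in this text, so the following sketch uses one common textbook form of the transforms as an assumption: for a 2N-sample frame, Xc(k) = Σ x(n) cos[π/N (n + 1/2 + N/2)(k + 1/2)], and the inverse transforms use the same cosine or sine kernel with a 2/N factor, paired with a sine window applied before MDCT and after the inverse transform. The check illustrates the property the argument relies on: within one frame, every IMDCT basis vector is orthogonal to every IMDST basis vector, so the two outputs of the same spectrum have zero inner product before windowing and overlap-add. All function names are hypothetical.

```python
import math


def mdct(frame):
    """MDCT: 2N (already windowed) samples -> N coefficients Xc(k)."""
    n_coef = len(frame) // 2
    n0 = 0.5 + n_coef / 2.0
    return [sum(frame[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for n in range(2 * n_coef))
            for k in range(n_coef)]


def _inverse(coeffs, kernel):
    """Shared inverse kernel (cos for IMDCT, sin for IMDST); 2/N suits a sine window."""
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2.0
    return [2.0 / n_coef * sum(coeffs[k] * kernel(math.pi / n_coef * (n + n0) * (k + 0.5))
                               for k in range(n_coef))
            for n in range(2 * n_coef)]


def imdct(coeffs):
    return _inverse(coeffs, math.cos)


def imdst(coeffs):
    return _inverse(coeffs, math.sin)


def sine_window(length):
    # Satisfies w(n)^2 + w(n + N)^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]


def overlap_add_roundtrip(x, n_coef):
    """50%-overlap framing, MDCT -> IMDCT, windowing and overlap-add (TDAC)."""
    w = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(x) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_coef, n_coef):
        frame = [w[n] * padded[start + n] for n in range(2 * n_coef)]
        y = imdct(mdct(frame))
        for n in range(2 * n_coef):
            out[start + n] += w[n] * y[n]
    return out[n_coef:n_coef + len(x)]
```

The round trip shows that the IMDCT path alone reconstructs the signal through time domain aliasing cancellation, while the zero inner product within a frame reflects the orthogonal bases; across overlapped, windowed frames the two outputs are only substantially, not exactly, uncorrelated. The sketch assumes the input length is a multiple of N.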
[Detailed Configuration Example of Uncorrelated Frequency-Time Transform Unit]
The same configuration illustrated in
The configuration of the uncorrelated frequency-time transform unit 102 in
The spectrum inversion unit 121 of the uncorrelated frequency-time transform unit 102 in
The IMDCT unit 122 performs IMDCT of the frequency spectrum coefficients supplied from the spectrum inversion unit 121 to obtain a time domain signal, and supplies this time domain signal to the sign inversion unit 123.
The sign inversion unit 123 inverts the sign of the odd-numbered samples of the time domain signal supplied from the IMDCT unit 122, and obtains the signal XD′.
Meanwhile, when Xs(k) is replaced with Xs(N−k−1) in equation (3), which defines IMDST, and N is a multiple of 4, equation (3) can be rewritten as the following equation (4).
Hence, the signal obtained by performing IMDST of the frequency spectrum coefficients from the spectrum inverse quantization unit 53 is the same signal XD′ as the signal obtained by reversing the frequency order of those coefficients, performing IMDCT, and inverting the sign of the odd-numbered samples. That is, the IMDST unit 111 in
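The equivalence can be checked numerically. The sketch below uses one common textbook form of the IMDCT and IMDST (an assumption, since equations (1) to (4) are not reproduced here), with N taken as the number of coefficients per frame; `imdst_via_imdct` is a hypothetical name that mirrors the spectrum inversion unit, the IMDCT unit and the sign inversion unit in that order.

```python
import math


def _inverse(coeffs, kernel):
    """Textbook-style inverse kernel: kernel = math.cos (IMDCT) or math.sin (IMDST)."""
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2.0
    return [2.0 / n_coef * sum(coeffs[k] * kernel(math.pi / n_coef * (n + n0) * (k + 0.5))
                               for k in range(n_coef))
            for n in range(2 * n_coef)]


def imdct(coeffs):
    return _inverse(coeffs, math.cos)


def imdst(coeffs):
    return _inverse(coeffs, math.sin)


def imdst_via_imdct(coeffs):
    """Spectrum inversion -> IMDCT -> sign inversion of odd-numbered samples."""
    reversed_spectrum = coeffs[::-1]  # frequencies in inverse order
    y = imdct(reversed_spectrum)  # reuse of an existing IMDCT
    return [-v if n % 2 else v for n, v in enumerate(y)]
```

With this kernel, reversing k turns cos into (−1)^n sin when N is a multiple of 4, so the two paths agree sample for sample; when N ≡ 2 (mod 4) the same construction yields −XD′, which is why the condition on N matters.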
The sign inversion unit 123 supplies the obtained signal XD′ to the stereo synthesis unit 103 in
As described above, the uncorrelated frequency-time transform unit 102 in
[Detailed Configuration Example of Stereo Synthesis Unit]
The stereo synthesis unit 103 in
The multiplier 141 multiplies the monaural signal XM supplied from the uncorrelated frequency-time transform unit 102 by a coefficient h11, one of the generation parameters supplied from the generation parameter calculation unit 104, and supplies the resulting product h11×XM to the adder 145.
The multiplier 142 multiplies the monaural signal XM supplied from the uncorrelated frequency-time transform unit 102 by a coefficient h21, one of the generation parameters supplied from the generation parameter calculation unit 104, and supplies the resulting product h21×XM to the adder 146.
The multiplier 143 multiplies the signal XD′ supplied from the uncorrelated frequency-time transform unit 102 by a coefficient h12, one of the generation parameters supplied from the generation parameter calculation unit 104, and supplies the resulting product h12×XD′ to the adder 145.
The multiplier 144 multiplies the signal XD′ supplied from the uncorrelated frequency-time transform unit 102 by a coefficient h22, one of the generation parameters supplied from the generation parameter calculation unit 104, and supplies the resulting product h22×XD′ to the adder 146.
The adder 145 adds the multiplication value h11×XM supplied from the multiplier 141 and the multiplication value h12×XD′ supplied from the multiplier 143, and outputs a resulting addition value as the left audio signal XL.
The adder 146 adds the product h21×XM supplied from the multiplier 142 and the product h22×XD′ supplied from the multiplier 144, and outputs the resulting sum as the right audio signal XR.
As described above, the stereo synthesis unit 103 performs the weighted addition of the following equation (5) using the generation parameters, treating the monaural signal XM, the signal XD′, the left audio signal XL and the right audio signal XR as vectors as illustrated in
[Equation 5]
$$
\begin{aligned}
X_L &= h_{11} \cdot X_M + h_{12} \cdot X_{D'} \\
X_R &= h_{21} \cdot X_M + h_{22} \cdot X_{D'}
\end{aligned}
\tag{5}
$$
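Equation (5) amounts to a per-sample 2×2 mixing of XM and XD′. The sketch below applies it directly; because equation (6) is not reproduced here, the coefficient values used with it are hypothetical rather than derived from gL, gR, θL and θR, and the names `stereo_synthesis` and `normalized_correlation` are likewise hypothetical.

```python
import math


def stereo_synthesis(xm, xd, h11, h12, h21, h22):
    """Equation (5): XL = h11*XM + h12*XD', XR = h21*XM + h22*XD', sample by sample."""
    xl = [h11 * m + h12 * d for m, d in zip(xm, xd)]
    xr = [h21 * m + h22 * d for m, d in zip(xm, xd)]
    return xl, xr


def normalized_correlation(a, b):
    """Inner product of two signals divided by the product of their norms."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
```

If XM and XD′ are orthogonal with equal energy and each coefficient row has unit norm (h11² + h12² = h21² + h22² = 1), the correlation between XL and XR equals h11·h21 + h12·h22; for h = (0.8, 0.6, 0.8, −0.6) that is 0.28. This is how the generation parameters can steer inter-channel coherence without a separate decorrelation filter.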
The coefficients h11, h12, h21 and h22 are given by the following equation (6).
In equation 6, an angle θL is an angle formed between the vector of the left audio signal XL and the vector of the monaural signal XM, and an angle θR is an angle formed between the vector of the right audio signal XR and the vector of the monaural signal XM.
Meanwhile, the coefficients h11, h12, h21 and h22 are calculated as generation parameters by the generation parameter calculation unit 104. More specifically, the generation parameter calculation unit 104 calculates gL, gR, θL and θR from the BC parameters, and calculates the coefficients h11, h12, h21 and h22 from gL, gR, θL and θR as generation parameters. In addition, details of a method of calculating gL, gR, θL and θR from BC parameters are disclosed in, for example, Japanese Patent Application Laid-Open No. 2006-325162.
Alternatively, gL, gR, θL and θR themselves, or compression-coded versions of them, may be transmitted as the BC parameters. The coefficients h11, h12, h21 and h22 may also be transmitted directly as the BC parameters, with or without compression coding.
[Description of Processing of Speech Processing Apparatus]
In step S11 in
In step S12, the entropy decoding unit 52 performs entropy decoding, such as Huffman decoding or arithmetic decoding, of the frequency spectrum coefficients supplied from the inverse multiplexing unit 101, and restores the quantized frequency spectrum coefficients. The entropy decoding unit 52 supplies the frequency spectrum coefficients to the spectrum inverse quantization unit 53.
In step S13, the spectrum inverse quantization unit 53 inversely quantizes the quantized frequency spectrum coefficients supplied from the entropy decoding unit 52 based on the quantization information supplied from the inverse multiplexing unit 101, and restores the frequency spectrum coefficients. Further, the spectrum inverse quantization unit 53 supplies the frequency spectrum coefficients to the uncorrelated frequency-time transform unit 102.
In step S14, the uncorrelated frequency-time transform unit 102 generates the monaural signal XM and the signal XD′ which are two uncorrelated time domain signals from the frequency spectrum coefficient of the monaural signal XM obtained as a result of inverse quantization by the spectrum inverse quantization unit 53. Further, the uncorrelated frequency-time transform unit 102 supplies the monaural signal XM and the signal XD′ to the stereo synthesis unit 103.
In step S15, the stereo synthesis unit 103 synthesizes the monaural signal XM and the signal XD′ supplied from the uncorrelated frequency-time transform unit 102 using the generation parameters supplied from the generation parameter calculation unit 104.
In step S16, the generation parameter calculation unit 104 interpolates the BC parameter of a predetermined frame supplied from the inverse multiplexing unit 101, and calculates the BC parameter of each frame.
In step S17, the generation parameter calculation unit 104 generates the coefficients h11, h12, h21 and h22 as generation parameters using the BC parameter of a current processing target frame, and supplies the generation parameters to the stereo synthesis unit 103.
In step S18, the stereo synthesis unit 103 synthesizes the monaural signal XM and the signal XD′ supplied from the uncorrelated frequency-time transform unit 102 using the generation parameters supplied from the generation parameter calculation unit 104, and generates a stereo signal. Further, the stereo synthesis unit 103 outputs the stereo signal, and processing ends.
As described above, the speech processing apparatus 100 generates the monaural signal XM and the signal XD′ by applying to the frequency spectrum coefficients of the monaural signal XM two types of transform whose bases are orthogonal. That is, the speech processing apparatus 100 can generate the signal XD′ directly from the frequency spectrum coefficients of the monaural signal XM. Consequently, the speech processing apparatus 100 can prevent delay caused by a reverb signal generation unit 71 in
Further, the IMDCT unit 54 of the conventional decoding apparatus 40 can be reused as part of the uncorrelated frequency-time transform unit 102, which minimizes the addition of new functions and prevents an increase in circuit scale and required resources.
[Configuration Example of Speech Processing Apparatus According to Second Embodiment]
The same configuration illustrated in
The configuration of a speech processing apparatus 200 in
The speech processing apparatus 200 decodes, for example, coded data for which the same spatial coding as in a coding apparatus 10 in
More specifically, the band division unit 201 (division unit) of the speech processing apparatus 200 divides the frequency spectrum coefficient obtained by a spectrum inverse quantization unit 53, into two groups of high band frequency spectrum coefficients and low band frequency spectrum coefficients according to frequencies. Further, the band division unit 201 supplies the low band frequency spectrum coefficients to the IMDCT unit 202, and supplies the high band frequency spectrum coefficients to an uncorrelated frequency-time transform unit 102.
The IMDCT unit 202 (third transform unit) performs IMDCT of the low band frequency spectrum coefficients supplied from the band division unit 201, and obtains a monaural signal XMlow (third time domain signal) which is a low band time domain signal. The IMDCT unit 202 supplies the low band monaural signal XMlow to the adder 203 as a low band left audio signal, and to the adder 204 as the low band right audio signal.
The adder 203 receives a high band left audio signal XLHigh. The uncorrelated frequency-time transform unit 102 and the stereo synthesis unit 103 obtain this signal by processing the high band frequency spectrum coefficients output from the band division unit 201. The adder 203 adds the high band left audio signal XLHigh to the low band monaural signal XMlow, which the IMDCT unit 202 supplies as the low band left audio signal, and generates an entire frequency band left audio signal XL.
The adder 204 receives a high band right audio signal XRHigh. The uncorrelated frequency-time transform unit 102 and the stereo synthesis unit 103 obtain this signal by processing the high band frequency spectrum coefficients output from the band division unit 201. The adder 204 adds the high band right audio signal XRHigh to the low band monaural signal XMlow, which the IMDCT unit 202 supplies as the low band right audio signal, and generates an entire frequency band right audio signal XR.
[Description of Processing of Speech Processing Apparatus]
Steps S31 to S33 in
In step S34, the band division unit 201 divides the frequency spectrum coefficients obtained by the spectrum inverse quantization unit 53 into two groups according to frequency: high band frequency spectrum coefficients and low band frequency spectrum coefficients. Further, the band division unit 201 supplies the low band frequency spectrum coefficients to the IMDCT unit 202, and supplies the high band frequency spectrum coefficients to the uncorrelated frequency-time transform unit 102.
In step S35, the IMDCT unit 202 performs IMDCT of the low band frequency spectrum coefficients supplied from the band division unit 201, and obtains the monaural signal XMlow which is a low band time domain signal. The IMDCT unit 202 supplies the low band monaural signal XMlow to the adder 203 as the low band left audio signal, and to the adder 204 as the low band right audio signal.
In step S36, the uncorrelated frequency-time transform unit 102, the stereo synthesis unit 103 and the generation parameter calculation unit 104 perform stereo signal generation processing on the high band frequency spectrum coefficients supplied from the band division unit 201. More specifically, these units perform the processing in steps S14 to S18 in
In step S37, the adder 203 adds the low band monaural signal XMlow, which the IMDCT unit 202 supplies as the low band left audio signal, to the high band left audio signal XLHigh supplied from the stereo synthesis unit 103, and generates the entire frequency band left audio signal XL. Further, the adder 203 outputs this entire frequency band left audio signal XL.
In step S38, the adder 204 adds the low band monaural signal XMlow, which the IMDCT unit 202 supplies as the low band right audio signal, to the high band right audio signal XRHigh supplied from the stereo synthesis unit 103, and generates the entire frequency band right audio signal XR. Further, the adder 204 outputs this entire frequency band right audio signal XR.
As described above, the speech processing apparatus 200 decodes the coded data of the entire frequency band monaural signal XM, but stereo-codes only the high band. Consequently, it is possible to prevent the sound from becoming unnatural, which could happen if the low band of the monaural signal XM were also stereo-coded.
In addition, in the speech processing apparatus 200 the band division unit 201 divides the frequency spectrum coefficients into high band and low band frequency spectrum coefficients. However, the band division unit 201 may instead divide them into the frequency spectrum coefficients of a predetermined frequency band and those of the other frequency bands. That is, whether or not stereo coding is performed may be selected according to whether a frequency band is the predetermined frequency band or one of the other frequency bands, instead of according to whether it is the low band or the high band.
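The predetermined-band variant only changes the selection rule of the band division unit 201. The Python sketch below uses a hypothetical bin range from start_bin up to (but not including) stop_bin. It is an illustration under that assumption, not the source's implementation. The high band case corresponds to setting stop_bin to the number of bins.

```python
def select_band(coeffs, start_bin, stop_bin):
    """Hypothetical band selector for the predetermined-band variant.

    Bins in [start_bin, stop_bin) would go to pseudo-stereo generation and
    all other bins to the plain IMDCT path. Both outputs keep len(coeffs)
    bins and are zero-filled outside their own region.
    """
    if not 0 <= start_bin <= stop_bin <= len(coeffs):
        raise ValueError("band must lie inside the spectrum")
    in_band = [x if start_bin <= k < stop_bin else 0.0
               for k, x in enumerate(coeffs)]
    others = [0.0 if start_bin <= k < stop_bin else x
              for k, x in enumerate(coeffs)]
    return in_band, others


print(select_band([1.0, 2.0, 3.0, 4.0, 5.0], 1, 3))
# ([0.0, 2.0, 3.0, 0.0, 0.0], [1.0, 0.0, 0.0, 4.0, 5.0])
```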
[Configuration Example of Speech Processing Apparatus According to Third Embodiment]
The same configuration illustrated in
A configuration of a speech processing apparatus 300 in
The speech processing apparatus 300 in
More specifically, the inverse multiplexing unit 301 of the speech processing apparatus 300 corresponds to the inverse multiplexing unit 41 in
Furthermore, the inverse multiplexing unit 301 inversely multiplexes the coded data, and obtains quantized and entropy-coded frequency spectrum coefficients of N subband signals and quantization information. The inverse multiplexing unit 301 supplies the quantized and entropy-coded frequency spectrum coefficients of the N subband signals to the entropy decoding unit 62, and supplies the quantization information to the spectrum inverse quantization unit 63.
The IMDCT units 304-1 to 304-(N-1) (third transform unit) and the stereo coding unit 305 each receive the frequency spectrum coefficients of one of the N subband signals restored by the spectrum inverse quantization unit 63.
The IMDCT units 304-1 to 304-(N-1) each perform IMDCT of the input frequency spectrum coefficients, and transform them into a subband signal XMi (i = 1, 2, ..., N-1) of the monaural signal XM, which is a time domain signal. The IMDCT units 304-1 to 304-(N-1) each supply the subband signal XMi to the synthesis filter bank 306 as both a left audio signal XLi and a right audio signal XRi.
The stereo coding unit 305 includes an uncorrelated frequency-time transform unit 102 and a stereo synthesis unit 103 in
The synthesis filter bank 306 (addition unit) includes a left synthesis filter bank for synthesizing a subband signal of a left audio signal, and a right synthesis filter bank for synthesizing a subband signal of a right audio signal. The left synthesis filter bank of the synthesis filter bank 306 synthesizes left subband signals XL1 to XLN-1 from the IMDCT units 304-1 to 304-(N-1), and the left subband signal XLA from the stereo coding unit 305. Further, the left synthesis filter bank outputs the entire frequency band left audio signal XL obtained as a result of synthesis.
Furthermore, the right synthesis filter bank of the synthesis filter bank 306 synthesizes right subband signals XR1 to XRN-1 from the IMDCT units 304-1 to 304-(N-1), and the right subband signal XRA from the stereo coding unit 305. Still further, the right synthesis filter bank outputs the entire frequency band right audio signal XR obtained as a result of synthesis.
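In the third embodiment, N-1 subbands are transformed to monaural signals and duplicated to both channels, while one predetermined subband goes through the stereo coding unit 305. The Python sketch below shows this routing. mono_transform and stereo_generator are hypothetical callables standing in for the IMDCT units 304-i and the stereo coding unit 305. combine_bands is only a placeholder for one side of the synthesis filter bank 306. An actual synthesis filter bank would upsample and filter each subband before summing, whereas the sketch assumes the subband signals are already at the output rate and simply sums them.

```python
def route_subbands(subband_spectra, stereo_index, mono_transform,
                   stereo_generator):
    """Route N subband spectra as in the third embodiment (hypothetical).

    mono_transform(spec) -> time signal, standing in for IMDCT units 304-i.
    stereo_generator(spec) -> (left, right), standing in for unit 305.
    Returns per-subband left and right signals for the synthesis stage.
    """
    left_bands, right_bands = [], []
    for i, spec in enumerate(subband_spectra):
        if i == stereo_index:
            xl, xr = stereo_generator(spec)
        else:
            xm = mono_transform(spec)
            xl, xr = xm, list(xm)  # same monaural subband on both channels
        left_bands.append(xl)
        right_bands.append(xr)
    return left_bands, right_bands


def combine_bands(bands):
    """Placeholder for one side of synthesis filter bank 306: plain sum."""
    return [sum(samples) for samples in zip(*bands)]


# Toy stand-ins: identity "transform" and a fixed left/right weighting.
spectra = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
lb, rb = route_subbands(
    spectra, stereo_index=2,
    mono_transform=lambda s: list(s),
    stereo_generator=lambda s: ([x * 1.5 for x in s], [x * 0.5 for x in s]))
print(combine_bands(lb), combine_bands(rb))  # [11.5, 15.0] [6.5, 9.0]
```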
In addition, although the speech processing apparatus 300 in
[Description of Processing of Speech Processing Apparatus]
In step S51 in
In step S52, the entropy decoding unit 62 entropy-decodes the frequency spectrum coefficients of the N subband signals supplied from the inverse multiplexing unit 301, and supplies the decoded frequency spectrum coefficients to the spectrum inverse quantization unit 63.
In step S53, the spectrum inverse quantization unit 63 inversely quantizes the entropy-decoded frequency spectrum coefficients of the N subband signals supplied from the entropy decoding unit 62, based on the quantization information supplied from the inverse multiplexing unit 301. Further, the spectrum inverse quantization unit 63 supplies the resulting restored frequency spectrum coefficients of the N subband signals, one subband each, to the IMDCT units 304-1 to 304-(N-1) and the stereo coding unit 305.
In step S54, the IMDCT units 304-1 to 304-(N-1) each perform IMDCT of the frequency spectrum coefficients supplied from the spectrum inverse quantization unit 63. Further, the IMDCT units 304-1 to 304-(N-1) each supply the resulting subband signal XMi (i = 1, 2, ..., N-1) of the monaural signal to the synthesis filter bank 306 as both the subband signal XLi of the left audio signal and the subband signal XRi of the right audio signal.
In step S55, the stereo coding unit 305 performs stereo signal generation processing of the frequency spectrum coefficient of a predetermined subband signal supplied from the spectrum inverse quantization unit 63, using the generation parameters supplied from the generation parameter calculation unit 104. Further, the stereo coding unit 305 supplies the resulting subband signal XLA of the left audio signal and subband signal XRA of the right audio signal which are time domain signals, to the synthesis filter bank 306.
In step S56, the left synthesis filter bank of the synthesis filter bank 306 synthesizes all subband signals of left audio signals supplied from the IMDCT units 304-1 to 304-(N-1) and the stereo coding unit 305, and generates the entire frequency band left audio signal XL. Further, the left synthesis filter bank outputs this entire frequency band left audio signal XL.
In step S57, the right synthesis filter bank of the synthesis filter bank 306 synthesizes all subband signals of right audio signals supplied from the IMDCT units 304-1 to 304-(N-1) and the stereo coding unit 305, and generates the entire frequency band right audio signal XR. Further, the right synthesis filter bank outputs this entire frequency band right audio signal XR.
[Configuration Example of Speech Processing Apparatus According to Fourth Embodiment]
The same configuration illustrated in
The configuration of a speech processing apparatus 400 in
The speech processing apparatus 400 decodes coded data that has been intensity-coded and on which, for frequencies equal to or higher than an intensity start frequency Fis, a BC parameter is multiplexed instead of the conventional level ratio of inter-channel frequency spectrum coefficients. That is, the coded data decoded by the speech processing apparatus 400 is generated by a coding apparatus which, for example, downmixes a coding target stereo signal to a monaural signal XM. The coding apparatus then extracts the components at frequencies equal to or higher than the intensity start frequency Fis from the resulting monaural signal XM and from the coding target stereo signal, using a high-pass filter, for example, and detects the BC parameter from these components.
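The coder-side step can be sketched in Python as follows: a downmix followed by BC parameter detection above Fis. The sketch assumes a downmix of (L + R) / 2 and real-valued spectra. It uses common textbook definitions, which may differ from the exact ones used in the source: ILD as the energy ratio in dB and ICC as the normalized cross-correlation over bins at or above a given start bin. IPD is omitted because the sketch works on real-valued coefficients. The function names and the eps guard against division by zero are hypothetical. A practical coder would usually compute these values per parameter band rather than over one range.

```python
import math


def downmix(left, right):
    """Assumed monaural downmix XM = (L + R) / 2."""
    return [0.5 * (a + b) for a, b in zip(left, right)]


def bc_parameters(left_spec, right_spec, start_bin, eps=1e-12):
    """Estimate (ILD in dB, ICC) over bins >= start_bin (common definitions)."""
    l = left_spec[start_bin:]
    r = right_spec[start_bin:]
    e_l = sum(x * x for x in l)
    e_r = sum(x * x for x in r)
    cross = sum(a * b for a, b in zip(l, r))
    ild_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))
    icc = cross / math.sqrt((e_l + eps) * (e_r + eps))
    return ild_db, icc


left = [0.0, 0.0, 2.0, 4.0]
right = [0.0, 0.0, 1.0, 2.0]
print(downmix(left, right))                   # [0.0, 0.0, 1.5, 3.0]
print(bc_parameters(left, right, start_bin=2))
```

In this example the right channel is the left channel scaled by one half above the start bin. The estimate is therefore about 6.02 dB of level difference with an ICC of about 1.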
The spectrum separation unit 401 (separation unit) of the speech processing apparatus 400 obtains frequency spectrum coefficients restored by a spectrum inverse quantization unit 53. The spectrum separation unit 401 separates this frequency spectrum coefficient into a frequency spectrum coefficient of a stereo signal at a frequency lower than the intensity start frequency Fis and a frequency spectrum coefficient of a monaural signal XMhigh at a frequency equal to or more than the intensity start frequency Fis. The spectrum separation unit 401 supplies the frequency spectrum coefficient of the left audio signal XLlow of the stereo signal at a frequency lower than the intensity start frequency Fis, to the IMDCT unit 402, and supplies the frequency spectrum coefficient of the right audio signal XRlow to the IMDCT unit 403. Further, the spectrum separation unit 401 supplies the frequency spectrum coefficient of the monaural signal XMhigh to an uncorrelated frequency-time transform unit 102.
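The spectrum separation unit 401 must first map the intensity start frequency Fis to an MDCT bin index. The Python sketch below assumes that, for sampling frequency fs and N bins, bin k covers the range from k·fs/(2N) to (k+1)·fs/(2N). Under that assumption, the first bin at or above Fis is ceil(Fis·2N/fs). This bin convention and the function names are assumptions, not taken from the source. One possible use of split_at_bin: apply it to the left and right spectra to keep their parts below Fis, and to the monaural spectrum to keep its part at or above Fis.

```python
import math


def intensity_start_bin(fis_hz, sample_rate_hz, n_bins):
    """First bin whose lower edge is at or above Fis (assumed bin layout)."""
    bin_width = sample_rate_hz / (2.0 * n_bins)
    return min(n_bins, max(0, math.ceil(fis_hz / bin_width)))


def split_at_bin(coeffs, k_is):
    """Zero-filled split into bins below k_is and bins at or above k_is."""
    below = [x if k < k_is else 0.0 for k, x in enumerate(coeffs)]
    above = [x if k >= k_is else 0.0 for k, x in enumerate(coeffs)]
    return below, above


print(intensity_start_bin(6000.0, 48000.0, 1024))  # 256
print(intensity_start_bin(6010.0, 48000.0, 1024))  # 257
```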
The IMDCT unit 402 (third transform unit) performs IMDCT of the frequency spectrum coefficient of the left audio signal XLlow supplied from the spectrum separation unit 401, and supplies the resulting left audio signal XLlow to the adder 404.
The IMDCT unit 403 (third transform unit) performs IMDCT of the frequency spectrum coefficient of the right audio signal XRlow supplied from the spectrum separation unit 401, and supplies the resulting right audio signal XRlow to the adder 405.
The adder 404 (addition unit) adds the left audio signal XLhigh, which is generated by the stereo synthesis unit 103 and is a time domain signal at frequencies equal to or higher than the intensity start frequency Fis, and the left audio signal XLlow supplied from the IMDCT unit 402. The adder 404 outputs the resulting audio signal as the entire frequency band left audio signal XL.
The adder 405 (addition unit) adds the right audio signal XRhigh, which is generated by the stereo synthesis unit 103 and is a time domain signal at frequencies equal to or higher than the intensity start frequency Fis, and the right audio signal XRlow supplied from the IMDCT unit 403. The adder 405 outputs the resulting audio signal as the entire frequency band right audio signal XR.
As described above, the speech processing apparatus 400 stereo-codes the component at frequencies equal to or higher than the intensity start frequency Fis, which intensity coding reduced to a monaural signal. It does so using the BC parameter multiplexed on the intensity-coded data. Consequently, it can restore the stereophonic effect of that component better than an intensity decoding apparatus which performs stereo coding using the conventional level ratio of inter-channel frequency spectrum coefficients.
[Description of Processing of Speech Processing Apparatus]
Processing in steps S71 to S73 in
In step S74, the spectrum separation unit 401 separates the frequency spectrum coefficients restored by the spectrum inverse quantization unit 53 into frequency spectrum coefficients of stereo signals at a frequency lower than the intensity start frequency Fis and the frequency spectrum coefficient of the monaural signal XMhigh at a frequency equal to or more than the intensity start frequency Fis. The spectrum separation unit 401 supplies the frequency spectrum coefficient of the left audio signal XLlow of the stereo signal at a frequency lower than the intensity start frequency Fis, to the IMDCT unit 402, and the frequency spectrum coefficient of the right audio signal XRlow to the IMDCT unit 403. Further, the spectrum separation unit 401 supplies the frequency spectrum coefficient of the monaural signal XMhigh to the uncorrelated frequency-time transform unit 102.
In step S75, the IMDCT unit 402 performs IMDCT of the frequency spectrum coefficient of the left audio signal XLlow supplied from the spectrum separation unit 401. Further, the IMDCT unit 402 supplies the resulting left audio signal XLlow to the adder 404.
In step S76, the IMDCT unit 403 performs IMDCT of the frequency spectrum coefficient of the right audio signal XRlow supplied from the spectrum separation unit 401. Further, the IMDCT unit 403 supplies the resulting right audio signal XRlow to the adder 405.
In step S77, the uncorrelated frequency-time transform unit 102, the stereo synthesis unit 103 and the generation parameter calculation unit 104 perform stereo signal generation processing of the frequency spectrum coefficient of the monaural signal XMhigh from the spectrum separation unit 401. The resulting left audio signal XLhigh which is a time domain signal is supplied to the adder 404, and the right audio signal XRhigh is supplied to the adder 405.
In step S78, the adder 404 adds the left audio signal XLlow at a frequency lower than the intensity start frequency Fis from the IMDCT unit 402 and the left audio signal XLhigh at a frequency equal to or more than the intensity start frequency Fis from the stereo synthesis unit 103, and generates the entire frequency band left audio signal XL. Further, the adder 404 outputs this left audio signal XL.
In step S79, the adder 405 adds the right audio signal XRlow at a frequency lower than the intensity start frequency Fis from the IMDCT unit 403 and the right audio signal XRhigh at a frequency equal to or more than the intensity start frequency Fis from the stereo synthesis unit 103, and generates the entire frequency band right audio signal XR. Further, the adder 405 outputs this right audio signal XR.
In addition, in the above description the speech processing apparatuses 100, 200, 300 and 400 decode coded data which was time-frequency transformed by MDCT, and therefore perform IMDCT as the frequency-time transform. When decoding coded data which was time-frequency transformed by MDST, IMDST is performed as the frequency-time transform instead.
Further, in the above description the uncorrelated frequency-time transform unit 102 uses IMDCT and IMDST, whose bases are orthogonal to each other. However, other lapped orthogonal transforms, such as sine transforms or cosine transforms, may be used.
[Description of Computer to which Present Invention is Applied]
Next, the series of processing described above can be executed by hardware or by software. When the series of processing is executed by software, a program constituting the software is installed on, for example, a general-purpose computer.
The program can be recorded in advance in a memory unit 508 or a ROM (Read Only Memory) 502 which is a recording medium built in the computer.
Alternatively, the program can be stored (recorded) on a removable medium 511. The removable medium 511 can be provided as so-called package software. Examples of the removable medium 511 include a flexible disc, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a magnetic disc and a semiconductor memory.
In addition to being installed on a computer from the removable medium 511 through a drive 510, the program may be downloaded to a computer through a communication network or a broadcasting network and installed in the built-in memory unit 508. That is, the program can be transferred wirelessly from, for example, a download site to a computer through an artificial satellite for digital satellite broadcasting. It can also be transferred to a computer by wire through a network such as a LAN (Local Area Network) or the Internet.
The computer has a built-in CPU (Central Processing Unit) 501, and the CPU 501 is connected with an input/output interface 505 through a bus 504.
When a command is input through the input/output interface 505, for example by a user operating an input unit 506, the CPU 501 executes the program stored in the ROM 502 according to the command. Alternatively, the CPU 501 loads the program stored in the memory unit 508 into a RAM (Random Access Memory) 503 and executes the program.
Thus, the CPU 501 executes processing according to the above flowchart or processing executed by the configuration in the above block diagram. Further, the CPU 501 outputs this processing result from an output unit 507 through the input/output interface 505, transmits the processing result from a communication unit 509 or records the processing result in the memory unit 508.
In addition, the input unit 506 includes, for example, a keyboard, a mouse and a microphone. Further, the output unit 507 includes, for example, an LCD (Liquid Crystal Display) and speakers.
Meanwhile, in this description, the processing executed by the computer according to the program does not necessarily need to be executed in the chronological order described in the flowcharts. That is, the processing executed by the computer according to the program includes processing executed in parallel or individually (for example, parallel processing or object-based processing).
Further, the program may be processed by one computer (processor) or processed in a distributed manner by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed there.
The present invention is applicable to a pseudo stereo coding technique for audio signals.
The embodiments of the present invention are by no means limited to the above embodiments, and can be variously modified within a scope which does not deviate from the spirit of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2010-061170 | Mar 2010 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/055293 | 3/8/2011 | WO | 00 | 9/10/2012 |