This application claims the benefit of Korean Patent Application No. 10-2008-00067815, filed on Jul. 11, 2008, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field
One or more embodiments relate to a method and apparatus for encoding and decoding an audio signal and/or a speech signal, and more particularly, to a method and apparatus for encoding and decoding a multi-channel signal having a plurality of channels.
2. Description of the Related Art
In general, methods of encoding a multi-channel signal are categorized into waveform encoding and parametric encoding. In parametric encoding, a multi-channel image is formed by transmitting a spatial cue at a low bit rate. Parametric encoding is generally performed at about 40 kbps or less, with a down-mixing process being performed on a multi-channel signal. Spatial cues are extracted during the down-mixing process and are expressed in the form of inter-channel energy or level differences, inter-channel signal similarity, or inter-channel phase differences, to encode the multi-channel signal. Moving Picture Experts Group (MPEG) surround coding and binaural cue coding are representative examples of parametric encoding. However, such encoding techniques are not capable of precisely expressing reverberations, and thus, it is difficult to recover the original sounds even if the encoding bit rate is increased.
One or more embodiments include a multi-channel encoding and decoding method and apparatus capable of encoding and decoding residual signals by removing redundant information between a plurality of channels without a need for a downmixed signal.
According to one or more embodiments, there is provided a multi-channel encoding apparatus including a reference signal encoding unit to generate at least one reference signal from a plurality of channel signals in a multi-channel signal and to encode the reference signal, a phase difference encoding unit to calculate and encode respective phase differences between the plurality of channel signals and the reference signal, a gain encoding unit to calculate and encode respective gains of the plurality of channel signals, with the respective gains being ratios of respective amplitudes of the plurality of channel signals to an amplitude of the reference signal, and a residual signal encoding unit to extract and encode respective residual signals corresponding to differences between each predicted signal and each corresponding channel signal of the plurality of channel signals, where each predicted signal is predicted by respectively applying a respective calculated phase difference and a respective calculated gain to the reference signal for each corresponding channel signal of the plurality of channel signals.
According to one or more embodiments, there is provided a multi-channel decoding apparatus including a reference signal decoding unit to decode at least one reference signal, from a plurality of channel signals, for a multi-channel signal, a phase difference decoding unit to decode respective phase differences between the plurality of channel signals and the reference signal, a gain decoding unit to decode respective gains of the plurality of channel signals as ratios of respective amplitudes of the plurality of channel signals to an amplitude of the reference signal, a residual signal decoding unit to decode respective residual signals corresponding to encoder determined differences between each predicted signal and each corresponding channel signal of the plurality of channel signals, with each predicted signal being predicted by respectively applying an encoder calculated phase difference and an encoder calculated gain to the reference signal during an encoding of the multi-channel signal, and a multi-channel reconstruction unit to reconstruct the plurality of channel signals by using the respective phase differences, respective gains, and respective residual signals.
According to one or more embodiments, there is provided a multi-channel encoding method including generating and encoding at least one reference signal from a plurality of channel signals in a multi-channel signal, calculating and encoding respective phase differences between the plurality of channel signals and the reference signal, calculating and encoding respective gains of the plurality of channel signals, with the respective gains being ratios of respective amplitudes of the plurality of channel signals to an amplitude of the reference signal, and extracting and encoding respective residual signals corresponding to differences between each predicted signal and each corresponding channel signal of the plurality of channel signals, where each predicted signal is predicted by respectively applying a respective calculated phase difference and a respective calculated gain to the reference signal for each corresponding channel signal of the plurality of channel signals.
According to one or more embodiments, there is provided a multi-channel decoding method including decoding at least one reference signal, from a plurality of channel signals, for a multi-channel signal, decoding respective phase differences between the plurality of channel signals and the reference signal, decoding respective gains of the plurality of channel signals as ratios of respective amplitudes of the plurality of channel signals to an amplitude of the reference signal, decoding respective residual signals corresponding to encoder determined differences between each predicted signal and each corresponding channel signal of the plurality of channel signals, with each predicted signal being predicted by respectively applying a calculated phase difference and a calculated gain to the reference signal during an encoding of the multi-channel signal, and reconstructing the plurality of channel signals by using the respective phase differences, respective gains, and respective residual signals.
According to one or more embodiments, there is provided a computer readable recording medium having recorded thereon a computer program to control at least one processing device to implement a multi-channel encoding method, the method including generating and encoding at least one reference signal from a plurality of channel signals in a multi-channel signal, calculating and encoding respective phase differences between the plurality of channel signals and the reference signal, calculating and encoding respective gains of the plurality of channel signals, with the respective gains being ratios of respective amplitudes of the plurality of channel signals to an amplitude of the reference signal, and extracting and encoding respective residual signals corresponding to differences between each predicted signal and each corresponding channel signal of the plurality of channel signals, where each predicted signal is predicted by respectively applying a respective calculated phase difference and a respective calculated gain to the reference signal for each corresponding channel signal of the plurality of channel signals.
According to one or more embodiments, there is provided a computer readable recording medium having recorded thereon a computer program to control at least one processing device to implement a multi-channel decoding method, the method including decoding at least one reference signal, from a plurality of channel signals, for a multi-channel signal, decoding respective phase differences between the plurality of channel signals and the reference signal, decoding respective gains of the plurality of channel signals as ratios of respective amplitudes of the plurality of channel signals to an amplitude of the reference signal, decoding respective residual signals corresponding to encoder determined differences between each predicted signal and each corresponding channel signal of the plurality of channel signals, with each predicted signal being predicted by respectively applying a calculated phase difference and a calculated gain to the reference signal during an encoding of the multi-channel signal, and reconstructing the plurality of channel signals by using the respective phase differences, respective gains, and respective residual signals.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
These and/or other aspects will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the embodiments are merely described below, by referring to the figures, to explain aspects of the present description.
The pre-processing unit 100 receives a multi-channel signal having N-channel signals via input terminals IN_1 to IN_N, and generates or selects a reference signal, to be used as a reference for encoding, from the N-channel signals. The pre-processing unit 100 may generate or select the reference signal from the N-channel signals, and may do so in various ways. For example, if the multi-channel signal includes two-channel signals, the pre-processing unit 100 may select or generate the reference signal based on at least one of the matrices expressed in the below Equation 1. As another example, the pre-processing unit 100 may perform an operation of a predetermined matrix with the plurality of channel signals and may generate the reference signal such that residual signals can be minimized.
The pre-processing unit 100 may also change the reference signal in units of bark bands selected or generated from the N-channel signals, noting that alternative techniques for choosing a reference signal are equally available. In addition, if the number of channels included in the multi-channel signal increases or according to the selection of a user or a system, a plurality of reference signals may be used.
Alternatively, the multi-channel encoding apparatus may not implement or include the pre-processing unit 100.
The transformation unit 110 may generate a multi-channel spectrum by transforming the multi-channel signal from the time domain to the frequency domain so that the amplitudes and phases of the N-channel signals are expressed. For example, the transformation unit 110 may express each of the N-channel signals in the form of a complex-valued spectrum by performing a complex-valued transformation. When the complex-valued transformation is used, the transformation unit 110 calculates a real-number part and imaginary-number part by respectively performing a modified discrete cosine transformation (MDCT) and modified discrete sine transformation (MDST), for example, on the multi-channel signal in the time domain.
For example, when the multi-channel signal includes two-channel signals, e.g., a stereo signal, the transformation unit 110 may respectively transform the left signal and the right signal into x(t) and y(t) spectrums, as shown in the below Equation 2, for example.
Equation 2:
$x(t) = a_0(t)\,e^{i\varphi_0(t)}$
$y(t) = a_1(t)\,e^{i\varphi_1(t)}$
Here, x(t) denotes a spectrum being obtained by transforming the left signal (first channel signal) by the transformation unit 110, y(t) denotes a spectrum being obtained by transforming the right signal (second channel signal) by the transformation unit 110, ai(t) denotes the amplitude of an ith channel spectrum, and φi(t) denotes the phase of the ith channel spectrum.
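As a minimal sketch of how a complex-valued spectrum carries both amplitude and phase, the following Python fragment combines a real-number part and an imaginary-number part of one frequency bin into the polar form of Equation 2. The function name `to_polar` and the sample coefficient values are illustrative assumptions; the MDCT and MDST computations themselves are outside the sketch.

```python
import cmath

def to_polar(real_part, imag_part):
    """Return (amplitude, phase) of one complex spectral bin.

    real_part stands in for an MDCT coefficient and imag_part for the
    matching MDST coefficient (hypothetical sample values, not real data).
    """
    return cmath.polar(complex(real_part, imag_part))

# One bin of a left-channel spectrum x(t) = a0 * exp(i * phi0)
a0, phi0 = to_polar(3.0, 4.0)
```

Converting back with `cmath.rect(a0, phi0)` recovers the original complex bin, which is why the pair of amplitude and phase is sufficient to describe each bin.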
The reference spectrum quantization unit 120 may quantize a reference spectrum being obtained by generating or selecting the reference signal from the N-channel signals by the pre-processing unit 100 and transforming the reference signal by the transformation unit 110. If the transformation unit 110 performs complex-valued transformation using MDCT and MDST, as only an example, the reference spectrum quantization unit 120 may quantize only a reference spectrum obtained using MDCT. In addition, the reference spectrum quantization unit 120 may quantize the reference spectrum by controlling the encoded amount of bits by determining a quantization step size based on a psychoacoustic model.
The phase difference calculation unit 130 may calculate the phase differences between the respective channel spectrums and the reference spectrum. For example, the phase difference calculation unit 130 may calculate the phase differences according to the below Equation 3.
Equation 3: $\psi_i = \varphi_s(t) - \varphi_i(t)$
Here, $\psi_i$ denotes the phase difference between the ith channel spectrum and the reference spectrum, $\varphi_s(t)$ denotes the phase of the reference spectrum, and $\varphi_i(t)$ denotes the phase of the ith channel spectrum.
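The subtraction in Equation 3 can be sketched for a single frequency bin as follows. The wrapping of the result into the interval [-pi, pi) is an assumption added for illustration, since the text specifies only the subtraction; the function name `phase_difference` is likewise hypothetical.

```python
import math

def phase_difference(phi_ref, phi_ch):
    """psi_i = phi_s - phi_i (Equation 3), wrapped into [-pi, pi).

    The wrapping step is an illustrative assumption; it keeps the value
    in a bounded range so that it can later be quantized uniformly.
    """
    psi = phi_ref - phi_ch
    return (psi + math.pi) % (2.0 * math.pi) - math.pi
```

For instance, a reference phase of 3.0 rad and a channel phase of -3.0 rad differ by 6.0 rad, which wraps to 6.0 - 2*pi rad.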
The gain calculation unit 140 may calculate respective gains, as respective ratios of the amplitudes of the channel spectrums to the amplitude of the reference spectrum. For example, the gain calculation unit 140 may calculate the gains according to the below Equation 4.
Equation 4: $g_i = a_i / a_s$
Here, $g_i$ denotes the gain of the ith channel spectrum, $a_s$ denotes the amplitude of the reference spectrum, and $a_i$ denotes the amplitude of the ith channel spectrum.
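A per-bin gain, taken as the ratio of the channel amplitude to the reference amplitude as described above, might be computed as in the following sketch. The guard `eps` against a zero reference amplitude is an assumption and is not described in the text.

```python
def gain(a_ch, a_ref, eps=1e-12):
    """Ratio of the channel amplitude a_i to the reference amplitude a_s.

    eps is a hypothetical guard that avoids division by zero when the
    reference bin is silent.
    """
    return a_ch / max(a_ref, eps)
```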
The calculation of the phase differences between the respective channel spectrums and the reference spectrum, e.g., by the phase difference calculation unit 130, and the calculation of the gains of the respective channel spectrums, e.g., by the gain calculation unit 140, according to one or more embodiments, will now be described based on an assumption that the input multi-channel signal is a two channel signal, such as a left signal and a right signal, noting that alternative embodiments are equally available.
First, the pre-processing unit 100 may select the left signal as a reference signal, and the transformation unit 110 may then generate a left spectrum and a right spectrum by transforming the left signal and the right signal from the time domain to the frequency domain by using a complex-valued transformation, as shown in the below Equation 5.
Here, L denotes the left spectrum obtained by the transformation unit 110, R denotes the right spectrum obtained by the transformation unit 110, $a_k^L$ denotes the amplitude of the left spectrum, $a_k^R$ denotes the amplitude of the right spectrum, $\varphi_k^L$ denotes the phase of the left spectrum, and $\varphi_k^R$ denotes the phase of the right spectrum.
The phase difference calculation unit 130 and the gain calculation unit 140 respectively calculate phase differences and gains that lead to a minimum value shown by the below Equation 6, for example.
Here, g denotes the gain and ψ denotes the phase difference.
Then, Equation 6 may be partially differentiated with respect to the gain g and the phase difference ψ, as shown in the below Equation 7, for example.
Finally, the phase difference calculation unit 130 and the gain calculation unit 140 respectively calculate the phase difference ψ and the gain g that cause the values of Equation 7 to be zero by using the below Equation 8, for example, so that a mean squared error between the actual right signal and a predicted right signal, which is predicted by applying the gain g and the phase difference ψ to the left signal serving as the reference signal, is minimized.
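Equations 6 to 8 are not reproduced in this text, so the following Python sketch rests on an assumption: the quantity being minimized is the sum, over frequency bins k, of the squared magnitude of g·exp(iψ)·L_k − R_k. Under that assumption the standard complex least-squares solution gives the optimal coefficient c = Σ conj(L_k)·R_k / Σ |L_k|², with g = |c| and ψ = arg(c). Here ψ is taken as the phase shift applied to the reference spectrum, matching its use in Equation 10. The function name and the sample data are hypothetical.

```python
import cmath

def optimal_gain_and_phase(ref_bins, ch_bins):
    """Least-squares gain g and phase shift psi mapping ref_bins onto ch_bins.

    Minimizes sum_k |g * exp(i*psi) * ref_k - ch_k|^2 (assumed objective).
    """
    num = sum(l.conjugate() * r for l, r in zip(ref_bins, ch_bins))
    den = sum(abs(l) ** 2 for l in ref_bins)
    if den == 0.0:
        return 0.0, 0.0  # silent reference: nothing to predict from
    return cmath.polar(num / den)

# Hypothetical data: the right channel is the left channel scaled by 0.5
# and rotated by 0.3 rad, so the prediction should be exact.
left = [complex(1.0, 0.0), complex(0.0, 2.0), complex(-1.0, 1.0)]
right = [0.5 * cmath.exp(0.3j) * l for l in left]
g, psi = optimal_gain_and_phase(left, right)
```

With this synthetic input the sketch recovers a gain near 0.5 and a phase shift near 0.3 rad; with real signals the two channels are only partly correlated and a nonzero residual remains.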
The residual spectrum extraction unit 150 extracts residual spectrums corresponding to differences between the respective channel spectrums and predicted spectrums thereof, where the predicted spectrums are obtained by respectively applying the phase differences and gains of the respective channel spectrums calculated by the phase difference calculation unit 130 and the gain calculation unit 140 to the reference spectrum. For example, the residual spectrum extraction unit 150 may extract the residual spectrums according to the below Equation 9.
Equation 9: $r_i = a_i \cos\varphi_i - \hat{a}_i$
Here, $r_i$ denotes a residual spectrum corresponding to the ith channel spectrum, $a_i$ denotes the actual amplitude of the ith channel spectrum, $\varphi_i$ denotes the phase of the ith channel spectrum, and $\hat{a}_i$ denotes the real-number part of the predicted spectrum.
The real-number part $\hat{a}_i$ of the predicted spectrum may be calculated according to the below Equation 10, for example.
Equation 10: $\hat{a}_i = \mathrm{Re}\{g\,a_s \exp(i(\varphi_s + \psi))\}$
Here, g denotes the gain calculated by the gain calculation unit 140, ψ denotes the phase difference calculated by the phase difference calculation unit 130, as denotes the amplitude of the reference spectrum, and φs denotes the phase of the reference spectrum.
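The relationship between Equation 9 and Equation 10 can be sketched for one frequency bin as below. The sketch assumes the exponent in Equation 10 is imaginary, i.e., exp(i(φs+ψ)), which is what taking the real-number part of a complex-valued spectrum implies; the function names are hypothetical.

```python
import cmath
import math

def predicted_real_part(g, psi, a_ref, phi_ref):
    """Equation 10: Re{ g * a_s * exp(i * (phi_s + psi)) }."""
    return (g * a_ref * cmath.exp(1j * (phi_ref + psi))).real

def residual(a_ch, phi_ch, g, psi, a_ref, phi_ref):
    """Equation 9: real part of the channel bin minus the predicted real part."""
    return a_ch * math.cos(phi_ch) - predicted_real_part(g, psi, a_ref, phi_ref)
```

When the gain and phase difference describe the channel exactly, for example a channel bin of amplitude 1.0 and phase 0.7 predicted from a reference bin of amplitude 2.0 and phase 0.4 with g = 0.5 and ψ = 0.3, the residual is zero up to rounding.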
The phase difference quantization unit 135 may then quantize the phase differences between the respective channel spectrums and the reference spectrum, e.g., as calculated by the phase difference calculation unit 130. The phase difference quantization unit 135 may quantize the phase differences on a uniform scale, for example.
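A uniform-scale quantizer for a phase difference might look like the following sketch. The choice of 16 levels, the index range, and the function names are assumptions for illustration; the text states only that a uniform scale may be used.

```python
import math

def quantize_phase(psi, levels=16):
    """Map a phase difference in radians to an index in [0, levels)."""
    step = 2.0 * math.pi / levels
    return int(round(psi / step)) % levels

def dequantize_phase(index, levels=16):
    """Map an index back to a phase difference in [-pi, pi)."""
    angle = index * (2.0 * math.pi / levels)
    return angle - 2.0 * math.pi if angle >= math.pi else angle
```

With a uniform step the reconstruction error is bounded by half a step, here pi/16 rad, regardless of the value being quantized.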
The gain quantization unit 145 may quantize the gains of the respective channel spectrums, e.g., as calculated by the gain calculation unit 140. The gain quantization unit 145 may quantize the gains of the respective channel spectrums on either a log scale or the uniform scale, as another example.
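For the log-scale option, one possible sketch quantizes the gain in decibels with a fixed step. The 1.5 dB step, the lower bound `floor`, and the function names are hypothetical values chosen for illustration and are not taken from the text.

```python
import math

def quantize_gain_log(g, step_db=1.5, floor=1e-6):
    """Quantize a linear gain on a log (dB) scale; returns an integer index."""
    g_db = 20.0 * math.log10(max(g, floor))
    return int(round(g_db / step_db))

def dequantize_gain_log(index, step_db=1.5):
    """Recover the linear gain represented by a log-scale index."""
    return 10.0 ** (index * step_db / 20.0)
```

A log scale spends its resolution in proportion to the gain itself, so a gain of 0.5 and a gain of 0.05 are both reconstructed with a similar relative error, which a uniform scale would not provide.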
The residual spectrum quantization unit 155 may quantize the residual spectrums of the respective channel spectrums, e.g., as extracted by the residual spectrum extraction unit 150. The residual spectrum quantization unit 155 may quantize the residual spectrums by controlling an encoded amount of bits by determining quantization step size according to the psychoacoustic model, for example.
The operations of the pre-processing unit 100, the reference spectrum quantization unit 120, the phase difference calculation unit 130, the phase difference quantization unit 135, the gain calculation unit 140, the gain quantization unit 145, the residual spectrum extraction unit 150, and the residual spectrum quantization unit 155 may be performed in the units of bark bands in consideration of a critical band, for example, noting that alternative embodiments are equally available.
The prediction checking unit 160 may determine how precisely, i.e., accurately, the predicted spectrums, obtained by respectively applying the phase differences and the gains calculated by the phase difference calculation unit 130 and the gain calculation unit 140 to the reference spectrum, have been predicted with respect to the corresponding actual channel spectrums, e.g., the original spectrums.
The prediction checking unit 160 may determine the precision of the prediction by comparing the energies of the residual spectrums extracted by the residual spectrum extraction unit 150 with those of the respective actual channel spectrums, noting that alternative embodiments are equally available.
In addition, the prediction checking unit 160 may classify frames into several frame types based on the precision of the prediction and may respectively encode the residual spectrums adaptively according to the corresponding frame types. For example, the prediction checking unit 160 may classify frames into three frame types based on the precision of prediction, as shown in the below Equation 11, for example.
The frame types may be used as the context of entropy coding when the residual spectrums are encoded, for example.
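Equation 11 is not reproduced in this text, so the following sketch only illustrates the general idea of classifying a frame by the ratio of residual energy to channel energy, as described above for the prediction checking unit 160. The two thresholds (0.25 and 0.75), the numbering of the frame types, and the function name are all hypothetical.

```python
def frame_type(residual_bins, channel_bins, low=0.25, high=0.75):
    """Classify a frame into type 1, 2, or 3 from an energy ratio.

    The ratio is residual energy over actual channel energy; a small
    ratio means the prediction was accurate. Thresholds are assumptions.
    """
    e_res = sum(abs(r) ** 2 for r in residual_bins)
    e_ch = sum(abs(c) ** 2 for c in channel_bins)
    if e_ch == 0.0:
        return 1  # silent channel: treat as well predicted
    ratio = e_res / e_ch
    if ratio < low:
        return 1  # accurate prediction
    if ratio < high:
        return 2  # moderate prediction
    return 3      # poor prediction
```

The returned type could then select, for example, a different probability table when the residual spectrum is entropy coded.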
Alternatively, a multi-channel encoding apparatus according to one or more embodiments may not include or implement the prediction checking unit 160, and the reference spectrum, phase differences, gains, and residual spectrums may be encoded regardless of the precision of prediction.
For example, when the ratio of the energy of one of the predicted spectrums to the energy of the corresponding actual channel spectrum, as calculated by the prediction checking unit 160, meets a threshold, e.g., is greater than the threshold, as illustrated in Equation 11 related to the third frame type, then the multi-channel quantization unit 170 may quantize the corresponding channel spectrum and set the gain and phase difference thereof to ‘0’. In this case, the phase difference and gain of the corresponding channel spectrum would not be respectively quantized by the phase difference quantization unit 135 and the gain quantization unit 145, since the predicted spectrum of the corresponding channel spectrum, e.g., as predicted by applying the phase difference and gain calculated by the phase difference calculation unit 130 and the gain calculation unit 140 to the reference spectrum, is not accurate, and thus it may be more efficient to individually encode the corresponding channel spectrum.
The losslessly encoding unit 180 may losslessly code the reference spectrum quantized by the reference spectrum quantization unit 120, the phase differences of the respective channel spectrums quantized by the phase difference quantization unit 135, the gains of the respective channel spectrums quantized by the gain quantization unit 145, and the residual spectrums quantized by the residual spectrum quantization unit 155. However, as noted, when the ratio of the energy of the predicted spectrum of one of the channel spectrums to the energy of the actual channel spectrum thereof in a bark band meets a threshold, e.g., is greater than the threshold, then the losslessly encoding unit 180 may losslessly code the corresponding channel spectrum instead of the phase differences, gains, and residual spectrums.
The multiplexing unit 190 may multiplex the reference spectrum, phase differences, gains, and residual spectrums, which are losslessly coded by the losslessly encoding unit 180, into a bitstream and then output the bitstream via an output terminal OUT. The multiplexing unit 190 may also multiplex the corresponding channel spectrum into a bitstream, instead of the phase differences, gains, and residual spectrums, according to the result of the prediction checking unit 160.
The demultiplexing unit 200 receives an encoded bitstream via an input terminal IN, and then demultiplexes the bitstream. The bitstream may include, with respect to each of the bark bands, either a reference spectrum, the phase differences between the respective channel spectrums and the reference spectrum, gains as ratios of amplitudes of the respective channel spectrums to an amplitude of the reference spectrum, and residual spectrums, or one or more channel spectrums. Here, the reference spectrum may have been obtained by transforming a reference signal to be used as a reference for encoding from N-channel signals. The residual spectrums correspond to the differences between the respective channel spectrums and predicted spectrums thereof, where the predicted spectrums had been predicted by respectively applying the phase differences and gains of the actual channel spectrums thereof to the reference spectrum.
The losslessly decoding unit 210 may losslessly decode either the reference spectrum, phase differences, gains, and residual spectrums or the one or more channel spectrums.
The reference spectrum inverse quantization unit 220 may inversely quantize the reference spectrum that has been losslessly decoded by the losslessly decoding unit 210.
The first inverse transformation unit 225 may derive the reference signal by performing a first inverse transformation on the inversely quantized reference spectrum from the frequency domain to the time domain. An example of the first inverse transformation may include IMDCT related to a real-number part during complex-valued transformation, for example.
However, since a one-frame delay may occur in the transformation unit 230, which will be described in greater detail below, the first inverse transformation unit 225 may delay the reference signal by one frame and then supply the reference signal to the post-processing unit 270.
The transformation unit 230 may perform a second transformation on the reference signal, e.g., as inversely transformed by the first inverse transformation unit 225, from the time domain to the frequency domain. An example of the second transformation may include MDST, for example, related to an imaginary-number part during complex-valued transformation. Since the transformation unit 230 performs the second transformation on the reference signal after the first inverse transformation has been performed by the first inverse transformation unit 225, the reference signal is delayed by one frame before output from the transformation unit 230.
The phase difference inverse quantization unit 235 may inversely quantize the phase differences of the respective channel spectrums decoded by the losslessly decoding unit 210. The phase difference inverse quantization unit 235 may inversely quantize the phase differences on a uniform scale, for example.
The gain inverse quantization unit 240 may inversely quantize the gains of the respective channel spectrums decoded by the losslessly decoding unit 210. The gain inverse quantization unit 240 may inversely quantize the gains on a log scale or the uniform scale, also as an example.
The residual spectrum inverse quantization unit 245 may inversely quantize the residual spectrums of the respective channel spectrums decoded by the losslessly decoding unit 210.
The multi-channel spectrum reconstruction unit 250 reconstructs the channel spectrums by applying the phase differences inversely quantized by the phase difference inverse quantization unit 235, the gains inversely quantized by the gain inverse quantization unit 240, and the residual spectrums inversely quantized by the residual spectrum inverse quantization unit 245 to the reference spectrum. Here, the reference spectrum is inversely quantized by the reference spectrum inverse quantization unit 220 and is transformed by the transformation unit 230 so that it may be used to express all the amplitudes and phases of the respective channel signals. In other words, the multi-channel spectrum reconstruction unit 250 may reconstruct each of the channel spectrums by shifting the phase of the reference spectrum by the phase difference between the respective channel spectrum and the reference spectrum, adjusting the amplitude of the reference spectrum by the gain of the channel spectrum, and adding the corresponding residual spectrum to the phase-shifted and gain-adjusted reference spectrum.
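The three reconstruction steps described above (phase shift, gain adjustment, residual addition) can be sketched for one complex frequency bin as follows. The function name and the sample values are hypothetical, and the sketch assumes the phase difference is the angle added to the reference phase, as in the encoder-side prediction.

```python
import cmath

def reconstruct_bin(ref_bin, g, psi, residual):
    """Rebuild one channel bin from the decoded reference bin.

    ref_bin  : complex reference-spectrum bin
    g, psi   : decoded gain and phase difference for this channel
    residual : decoded residual; a real-valued residual corrects only
               the real-number part of the prediction
    """
    predicted = g * cmath.exp(1j * psi) * ref_bin  # shift phase, scale amplitude
    return predicted + residual

# Hypothetical decoded values for a single bin
ref = cmath.rect(2.0, 0.4)
channel = reconstruct_bin(ref, 0.5, 0.3, 0.1)
```

The second inverse transformation would then be applied to the reconstructed bins of each channel to return to the time domain.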
However, as noted, a one-frame delay may occur in the transformation unit 230, and thus, the multi-channel spectrum reconstruction unit 250 may start to reconstruct the channel spectrums after a one-frame delay.
The second inverse transformation unit 255 may inversely transform the respective channel spectrums reconstructed by the multi-channel spectrum reconstruction unit 250 from the frequency domain to the time domain.
When a multi-channel encoding apparatus has previously determined that a predicted spectrum of at least one channel spectrum, predicted by applying a phase difference and gain of the at least one channel spectrum to the reference spectrum, was not accurately predicted, and thus has encoded the at least one channel spectrum instead of the phase differences, gains, and residual spectrums, then the multi-channel inverse quantization unit 260 may inversely quantize the corresponding at least one channel spectrum.
The second inverse transformation unit 255 may inversely transform the channel spectrums being inversely quantized by the multi-channel inverse quantization unit 260 from the frequency domain to the time domain.
The post-processing unit 270 may perform a post-processing operation on the reference signal, as inversely transformed by the first inverse transformation unit 225 and delayed by one frame, and the multi-channel signal, as inversely transformed by the second inverse transformation unit 255, and then output the multi-channel signal via an output terminal OUT. Here, the post-processing operation may be an inverse operation of an operation performed by the pre-processing unit 100 of
Alternatively, in operation 300, the reference signal may be changed in units of bark bands from the N-channel signals.
In one or more embodiments, a multi-channel encoding method may alternatively not include or implement operation 300.
Next, multi-channel spectrums are generated by transforming the multi-channel signal from the time domain to the frequency domain so that amplitudes and phases of the respective channel signals are expressed (operation 310). For example, in operation 310, the respective channel signals may be expressed in the form of complex-valued spectrums by performing a complex-valued transformation. When a complex-valued transformation is used in operation 310, a real-number part and an imaginary-number part may be calculated by respectively performing MDCT and MDST, for example, on each of the channel signals.
For example, in operation 310, when the multi-channel signal includes two-channel signals, such as a stereo signal, a left signal and a right signal may be transformed into x(t) and y(t) spectrums, as shown in the below Equation 13, for example.
Equation 13:
$x(t) = a_0(t)\,e^{i\varphi_0(t)}$
$y(t) = a_1(t)\,e^{i\varphi_1(t)}$
Here, x(t) denotes a spectrum being obtained by transforming the left signal (first channel signal) in operation 310, y(t) denotes a spectrum being obtained by transforming the right signal (second channel signal) in operation 310, ai(t) denotes the amplitude of an ith channel spectrum, and φi(t) denotes the phase of the ith channel spectrum.
Next, the transformed reference spectrum may be quantized (operation 320). When a complex-valued transformation is performed using MDCT and MDST, for example, in operation 310, then only the reference spectrum obtained by performing MDCT may be quantized in operation 320. Alternatively, in operation 320, the reference spectrum may be quantized by controlling an encoded amount of bits by determining the quantization step size according to a psychoacoustic model.
Next, the phase differences between the respective channel spectrums and the reference spectrum may be calculated (operation 330). For example, in operation 330, the phase differences may be calculated as shown in the below Equation 14, for example.
Equation 14: $\psi_i = \varphi_s(t) - \varphi_i(t)$
Here, ψi denotes the difference between phases of the ith channel spectrum and the reference spectrum, φs(t) denotes the phase of the reference spectrum, and φi(t) denotes the phase of the ith channel spectrum.
Next, gains as ratios of amplitudes of the respective channel spectrums to the amplitude of the reference spectrum may be calculated (operation 340). For example, in operation 340, the gains may be calculated as shown in the below Equation 15.
Equation 15: $g_i = a_i / a_s$
Here, $g_i$ denotes the gain of the ith channel spectrum, $a_s$ denotes the amplitude of the reference spectrum, and $a_i$ denotes the amplitude of the ith channel spectrum.
A corresponding process for calculating the phase differences between the respective channel spectrums and the reference spectrum in operation 330 and process for calculating the gains of the respective channel spectrums in operation 340, according to one or more embodiments, will now be described based on an assumption that the input multi-channel signal includes a left signal and a right signal received via two channels, noting that alternative embodiments are equally available.
First, in operation 300, as only an example, the left signal may be selected as a reference signal in operation 300. Next, in operation 310, a left spectrum and a right spectrum are generated by respectively transforming the left and right signals from the time domain to the frequency domain by performing a complex-valued transformation, as shown in the below Equation 16, for example.
Here, L denotes the left spectrum obtained in operation 310, R denotes the right spectrum obtained in operation 310, akL denotes the amplitude of the left spectrum, akR denotes the amplitude of the right spectrum, φkL denotes the phase of the left spectrum, and φkR denotes the phase of the right spectrum.
In operations 330 and 340, the phase difference between the left and right spectrums and the gain, which is the ratio of the amplitude of the right spectrum to the amplitude of the left spectrum, may be determined so as to yield the minimum value shown in the below Equation 17, for example.
Here, g denotes the gain and ψ denotes the phase difference.
Equation 17 may be partially differentiated with respect to the gain g and the phase difference ψ, as shown in the below Equation 18, for example.
In operations 330 and 340, the phase difference ψ and the gain g that cause the values of Equation 18 to be zero may be calculated using the below Equation 19, for example, so that the mean squared error between the actual right signal and a predicted right signal, which is predicted by applying the gain g and the phase difference ψ to the left signal serving as the reference signal, is minimized.
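As one possible worked form of this minimization, the following sketch assumes that the error is measured on the real-number parts of the spectrums over the bins k of a band, which is consistent with the real-valued residual of Equations 20 and 21. It is an assumption offered for illustration and is not a reproduction of Equations 17 to 19. The amplitudes and phases of the left and right spectrums are written as $a_k^L$, $a_k^R$, $\phi_k^L$, and $\phi_k^R$, and the auxiliary symbols $x_k$, $y_k$, $r_k$, $\alpha$, and $\beta$ are introduced only for this example.

```latex
\begin{aligned}
E(g,\psi) &= \sum_k \left[ a_k^R \cos\phi_k^R - g\, a_k^L \cos\!\left(\phi_k^L + \psi\right) \right]^2 \\
x_k &= a_k^L \cos\phi_k^L, \quad y_k = a_k^L \sin\phi_k^L, \quad r_k = a_k^R \cos\phi_k^R, \quad
\alpha = g\cos\psi, \quad \beta = -g\sin\psi \\
E &= \sum_k \left( r_k - \alpha x_k - \beta y_k \right)^2 \\
\frac{\partial E}{\partial \alpha} = 0,\ \frac{\partial E}{\partial \beta} = 0
&\;\Longrightarrow\;
\begin{pmatrix} \sum_k x_k^2 & \sum_k x_k y_k \\ \sum_k x_k y_k & \sum_k y_k^2 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} \sum_k r_k x_k \\ \sum_k r_k y_k \end{pmatrix} \\
g &= \sqrt{\alpha^2 + \beta^2}, \qquad \psi = \operatorname{atan2}(-\beta,\ \alpha)
\end{aligned}
```

Because the change of variables from (g, ψ) to (α, β) is invertible whenever g is nonzero, setting the partial derivatives with respect to α and β to zero locates the same minimum as setting the partial derivatives with respect to g and ψ to zero, while reducing the problem to a linear 2×2 system.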
Next, residual spectrums corresponding to differences between the respective channel spectrums and predicted spectrums thereof may be extracted (operation 350), where the predicted spectrums are obtained by respectively applying the phase differences and the gains of the respective channel spectrums calculated in operations 330 and 340 to the reference spectrum. The residual spectrums may be extracted using the below Equation 20, for example.
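$$r_i = a_i \cos\phi_i - \hat{s}_i \qquad \text{(Equation 20)}$$
Here, r_i denotes a residual spectrum corresponding to the ith channel spectrum, a_i denotes the actual amplitude of the ith channel spectrum, φ_i denotes the phase of the ith channel spectrum, and ŝ_i denotes the real-number part of the predicted spectrum that is obtained by applying the phase difference and the gain of the ith channel spectrum to the reference spectrum.
The real-number part ŝ_i may be calculated as shown in the below Equation 21, for example.
$$\hat{s}_i = \mathrm{Re}\left\{ g\, a_s \exp\!\left( j(\phi_s + \psi) \right) \right\} \qquad \text{(Equation 21)}$$
Here, g denotes the gain of the ith channel spectrum calculated in operation 340, ψ denotes the phase difference between the ith channel spectrum and the reference spectrum, which is calculated in operation 330, a_s denotes the amplitude of the reference spectrum, φ_s denotes the phase of the reference spectrum, and j denotes the imaginary unit.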
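The prediction and residual extraction of operations 330 to 350 can be sketched as follows for a single band. This is a minimal illustration under stated assumptions: the gain and phase difference are fitted by least squares on the real-number parts, the phase difference is taken in the sign convention of Equation 21 (it is added to the reference phase), and the function names, the degenerate-band fallback, and the sample values are hypothetical.

```python
import math

def fit_gain_and_phase(ref_re, ref_im, ch_re):
    """Least-squares g and psi so that g*a_s*cos(phi_s + psi) tracks ch_re over a band."""
    sxx = sum(x * x for x in ref_re)
    syy = sum(y * y for y in ref_im)
    sxy = sum(x * y for x, y in zip(ref_re, ref_im))
    srx = sum(r * x for r, x in zip(ch_re, ref_re))
    sry = sum(r * y for r, y in zip(ch_re, ref_im))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:                    # degenerate band: no prediction (assumption)
        return 0.0, 0.0
    alpha = (srx * syy - sry * sxy) / det   # alpha = g*cos(psi)
    beta = (sry * sxx - srx * sxy) / det    # beta = -g*sin(psi)
    return math.hypot(alpha, beta), math.atan2(-beta, alpha)

def extract_residual(ref_re, ref_im, ch_re, g, psi):
    """Residual r = a_i*cos(phi_i) - Re{g*a_s*exp(j*(phi_s + psi))} for each bin."""
    c, s = g * math.cos(psi), g * math.sin(psi)
    return [r - (c * x - s * y) for r, x, y in zip(ch_re, ref_re, ref_im)]

# Made-up band: the channel is the reference scaled by 0.5 and phase-shifted by 0.3 rad.
amps = [1.0, 2.0, 1.5]
phases = [0.1, 1.2, -0.7]
ref_re = [a * math.cos(p) for a, p in zip(amps, phases)]
ref_im = [a * math.sin(p) for a, p in zip(amps, phases)]
ch_re = [0.5 * a * math.cos(p + 0.3) for a, p in zip(amps, phases)]

g, psi = fit_gain_and_phase(ref_re, ref_im, ch_re)
residual = extract_residual(ref_re, ref_im, ch_re, g, psi)
print(g, psi, residual)
```

For this synthetic band the channel is an exact scaled and rotated copy of the reference, so the fitted values recover the scale and shift and the residual is numerically negligible. Real signals would leave a nonzero residual to be encoded.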
Next, a determination may be made as to how precisely the predicted spectrums, e.g., which are obtained by respectively applying the phase differences and the gains calculated in operations 330 and 340 to the reference spectrum, approximate the corresponding actual channel spectrums (operation 355).
In operation 355, the precision of prediction of the predicted spectrums may be determined by comparing the energies of the residual spectrums extracted in operation 350 with the energies of the respective actual channel spectrums, for example, noting that alternative embodiments are equally available.
In addition, in operation 355, frames may be classified into several frame types based on the determined precision of prediction, and the residual spectrums may be encoded adaptively according to the frame types. For example, frames may be classified into three frame types based on the precision of prediction, as shown in the below Equation 22.
Further, the frame types may be used as the context of entropy coding when the residual spectrums are encoded, for example.
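As only an example of how such a classification might be arranged, the following sketch compares the residual energy with the channel energy and maps the ratio to one of three frame types. The boundary values `low` and `high`, the numbering of the frame types, and the function names are hypothetical and are not the values of Equation 22.

```python
def energy_ratio(residual, channel):
    """Ratio of residual energy to actual channel energy (smaller means better prediction)."""
    e_res = sum(r * r for r in residual)
    e_ch = sum(c * c for c in channel)
    if e_ch == 0.0:
        return 0.0 if e_res == 0.0 else float("inf")
    return e_res / e_ch

def classify_frame(ratio, low=0.25, high=0.75):
    """Map the energy ratio to one of three frame types; low and high are assumed boundaries."""
    if ratio < low:
        return 0   # prediction is precise
    if ratio < high:
        return 1   # prediction is moderately precise
    return 2       # prediction is poor

frame_types = [
    classify_frame(energy_ratio([0.1, 0.1], [1.0, 1.0])),
    classify_frame(energy_ratio([1.0, 0.0], [1.0, 1.0])),
    classify_frame(energy_ratio([1.0, 1.0], [1.0, 1.0])),
]
print(frame_types)
```

The resulting frame type could then select, for instance, which probability table an entropy coder uses for the residual spectrum.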
Next, it is determined whether the precision of prediction determined in operation 355 meets a threshold, e.g., is greater than the threshold (operation 360).
In one or more embodiments, when it is determined in operation 360 that the precision of prediction is greater than the threshold, the phase differences between the respective channel spectrums and the reference spectrum calculated in operation 330 are quantized (operation 365). In operation 365, the phase differences may be quantized on a uniform scale, for example.
Next, the gains of the respective channel spectrums calculated in operation 340 may be quantized (operation 370). In operation 370, the gains may be quantized on a log scale or the uniform scale, for example.
Next, the residual spectrums extracted in operation 350 may be quantized (operation 375). In operation 375, the residual spectrums may be quantized while controlling an encoded amount of bits by determining the quantization step size according to a psychoacoustic model.
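The uniform-scale quantization of the phase differences and the log-scale quantization of the gains can be sketched as follows. This is a minimal illustration only: the number of phase levels, the decibel step, the decibel range, and the function names are assumptions chosen for the example, and the psychoacoustically controlled quantization of the residual spectrums is not modeled.

```python
import math

def quantize_phase(psi, levels=16):
    """Uniform quantization of a phase difference to an index in [0, levels)."""
    step = 2.0 * math.pi / levels
    wrapped = (psi + math.pi) % (2.0 * math.pi) - math.pi   # wrap into [-pi, pi)
    return int(round(wrapped / step)) % levels

def dequantize_phase(index, levels=16):
    """Inverse of quantize_phase, returning a phase in [-pi, pi)."""
    psi = index * (2.0 * math.pi / levels)
    return psi - 2.0 * math.pi if psi >= math.pi else psi

def quantize_gain(g, step_db=1.5, min_db=-30.0, max_db=30.0):
    """Log-scale quantization of a gain: uniform steps in decibels."""
    g_db = 20.0 * math.log10(max(g, 1e-9))     # floor avoids log of zero
    g_db = min(max(g_db, min_db), max_db)      # clamp to the assumed range
    return int(round((g_db - min_db) / step_db))

def dequantize_gain(index, step_db=1.5, min_db=-30.0):
    """Inverse of quantize_gain."""
    return 10.0 ** ((min_db + index * step_db) / 20.0)

psi_hat = dequantize_phase(quantize_phase(0.3))
g_hat = dequantize_gain(quantize_gain(0.5))
print(psi_hat, g_hat)
```

A log scale spends the available indices evenly in decibels, which tends to match the way level differences are perceived, whereas a uniform scale suits the phase, which is periodic and has no preferred region.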
Operations 300, 320, 330, 340, 350, 365, 370 and 375 may be processed in units of Bark bands in consideration of a critical band, for example.
When it is determined in operation 360 that the precision of the prediction of the predicted spectrum is less than the threshold, for example, the channel spectrum corresponding to the predicted spectrum may be quantized and the gain and phase difference of the corresponding channel spectrum may be set to ‘0’ (operation 380). This is because the predicted spectrum that is obtained by applying the phase difference and gain of the corresponding channel spectrum, e.g., as calculated in operations 330 and 340, to the reference spectrum may be considered to not be accurately predicted, and thus it is more efficient to individually encode the corresponding channel spectrum.
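The branch taken in operations 360 and 380 can be sketched as follows. In this minimal illustration the precision of prediction is assumed to be one minus the ratio of residual energy to channel energy, and the threshold value, the dictionary layout, and the function name `select_coding_mode` are hypothetical.

```python
def select_coding_mode(channel_re, residual, gain, phase_diff, threshold=0.5):
    """Choose between predictive coding and individual coding of one channel spectrum."""
    e_ch = sum(c * c for c in channel_re)
    e_res = sum(r * r for r in residual)
    precision = 1.0 - e_res / e_ch if e_ch > 0.0 else 0.0   # assumed precision measure
    if precision > threshold:
        # Prediction is good: send the gain, the phase difference, and the residual.
        return {"mode": "predictive", "gain": gain,
                "phase_diff": phase_diff, "payload": list(residual)}
    # Prediction is poor: send the channel spectrum itself, with gain and phase set to 0.
    return {"mode": "individual", "gain": 0.0,
            "phase_diff": 0.0, "payload": list(channel_re)}

good = select_coding_mode([1.0, -2.0], [0.1, 0.1], 0.5, 0.3)
poor = select_coding_mode([1.0, -2.0], [1.0, -1.9], 0.5, 0.3)
print(good["mode"], poor["mode"])
```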
Next, either the reference spectrum quantized in operation 320, the phase differences between the respective channel spectrums and the reference spectrum quantized in operation 365, the gains of the respective channel spectrums quantized in operation 370, and the residual spectrums quantized in operation 375 are losslessly coded, or the at least one channel spectrum is losslessly coded (operation 385).
Next, any of the reference spectrum, phase differences, gains, and residual spectrums or the channel spectrums that are losslessly coded in operation 385 are multiplexed into a bitstream (operation 390).
Next, any of the reference spectrum, phase differences, gains, and residual spectrums, or the at least one channel spectrum may be losslessly decoded (operation 410).
Next, the losslessly decoded reference spectrum may be inversely quantized (operation 420).
Next, the reference signal may be generated by performing a first inverse transformation on the inversely quantized reference spectrum from the frequency domain to the time domain (operation 425). An example of the first inverse transformation may include IMDCT, for example, in which a real-number part is calculated during complex-valued transformation.
Next, it may be determined as to whether at least one channel signal in the multi-channel signal has been individually encoded, e.g., because a precision of prediction was determined to be low during the corresponding multi-channel encoding with regard to the at least one channel signal (operation 427).
When it is determined in operation 427 that at least one channel signal in the multi-channel signal has not been individually encoded, then the reference signal being inversely transformed in operation 425 may be transformed from the time domain to the frequency domain by performing a second transformation (operation 430). An example of the second transformation includes MDST, for example, related to an imaginary-number part during complex-valued transformation. However, as noted above, since the reference signal is inversely transformed using the first inverse transformation in operation 425 and then transformed again using the second transformation in operation 430, the reference signal may be delayed by one frame before output.
Next, the phase differences decoded in operation 410 may be inversely quantized (operation 435). In operation 435, the phase differences may be inversely quantized on a uniform scale, for example.
Next, the gains of the respective channel spectrums decoded in operation 410 may be inversely quantized (operation 440). In operation 440, the gains may be inversely quantized on a log scale or the uniform scale, for example.
Next, the residual spectrums of the respective channel spectrums decoded in operation 410 may be inversely quantized (operation 445).
Thereafter, the respective channel spectrums may be reconstructed by applying the phase differences as inversely quantized in operation 435, the gains as inversely quantized in operation 440, and the residual spectrums as inversely quantized in operation 445, to the reference spectrum (operation 450). Here, the reference spectrum may be inversely quantized in operation 420 and transformed in operation 430 so that it may be used to express all the amplitudes and phases of the respective N-channel signals. In other words, in operation 450, each of the respective channel spectrums may be reconstructed by shifting the phase of the reference spectrum by the phase difference between the respective channel spectrum and the reference spectrum, adjusting the amplitude of the reference spectrum according to the gain of the channel spectrum, and adding the corresponding residual spectrum to the resulting phase-shifted and gain-adjusted reference spectrum. However, a one-frame delay occurs in operation 430, and thus, operation 450 may be performed after a one-frame delay.
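The reconstruction of operation 450 can be sketched as follows for the real-number part of one channel spectrum. This is a minimal illustration under stated assumptions: `ref_re` and `ref_im` stand for the real-number and imaginary-number parts of the reference spectrum (for example, as obtained in operations 420 and 430), the phase difference is added to the reference phase as in Equation 21, and the function name and sample values are hypothetical.

```python
import math

def reconstruct_channel(ref_re, ref_im, gain, phase_diff, residual):
    """Rotate and scale the reference spectrum, then add the residual, bin by bin."""
    c = gain * math.cos(phase_diff)
    s = gain * math.sin(phase_diff)
    # Re{g * (x + j*y) * exp(j*psi)} = g*cos(psi)*x - g*sin(psi)*y
    return [c * x - s * y + r for x, y, r in zip(ref_re, ref_im, residual)]

# Made-up decoded values for one band.
ref_re = [0.9, -0.4, 1.3]
ref_im = [0.2, 1.1, -0.6]
gain, phase_diff = 0.5, 0.3
channel_re = [0.5, -0.45, 0.6]

# Encoder side, repeated here only to build a matching residual for the round trip.
predicted = reconstruct_channel(ref_re, ref_im, gain, phase_diff, [0.0, 0.0, 0.0])
residual = [c - p for c, p in zip(channel_re, predicted)]

decoded = reconstruct_channel(ref_re, ref_im, gain, phase_diff, residual)
print(decoded)
```

With an unquantized residual the decoded values match the original channel values up to floating-point rounding. With quantized parameters the match would be approximate.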
Next, the respective channel spectrums reconstructed in operation 450 may be inversely transformed from the frequency domain to the time domain (operation 460).
When it is determined in operation 427 that at least one channel signal in the multi-channel signal has been individually encoded, then the at least one channel spectrum is inversely quantized (operation 455). Here, the at least one channel spectrum has been encoded instead of the phase differences, gains, and residual spectrums because, during encoding of the at least one channel signal, the predicted spectrum calculated by the encoder, e.g., by applying the phase difference and gain of the at least one channel spectrum to the reference spectrum, was determined to not be accurately predicted.
Next, the multi-channel signal may be generated by inversely transforming either the channel spectrums reconstructed in operation 450 or the at least one channel spectrum being inversely quantized in operation 455 from the frequency domain to the time domain (operation 460).
Next, the multi-channel signal may be output by performing a post-processing operation on the reference spectrum as inversely transformed in operation 420 and the multi-channel signal as inversely transformed in operation 460, where the post-processing operation may be an inverse operation of an operation performed in operation 300 of
In one or more embodiments, signals are described as data expressed in the time domain and spectrums are described as data expressed in the frequency domain in the present disclosure, but signals are generally considered as including spectrums.
One or more embodiments may be implemented through computer readable code/instructions in/on a medium, e.g., a computer readable medium, to control at least one processing element to implement any above described embodiment. The medium can correspond to any defined, measurable, and tangible structure permitting the storing and/or transmission of the computer readable code.
The media may also include, e.g., in combination with the computer readable code, data files, data structures, and the like. Examples of the media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVDs; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of computer readable code include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter, for example. The media may also be a distributed network, so that the computer readable code is stored and executed in a distributed fashion. Still further, as only an example, the processing element could include a processor or a computer processor, and processing elements may be distributed and/or included in a single device.
While aspects of the present invention have been particularly shown and described with reference to differing embodiments thereof, it should be understood that these exemplary embodiments should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each embodiment should typically be considered as available for other similar features or aspects in the remaining embodiments.
Thus, although a few embodiments have been shown and described, with additional embodiments being equally available, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2008-0067815 | Jul 2008 | KR | national |