The present invention relates to multi-channel encoders, for example multi-channel audio encoders utilizing parametric descriptions of spatial audio. Moreover, the invention also relates to methods of processing signals, for example spatial audio, in such multi-channel encoders. Furthermore, the invention relates to decoders operable to decode signals generated by such multi-channel encoders.
Audio recording and reproduction have in recent years progressed from monaural single-channel format to dual-channel stereo format and, more recently, to multi-channel formats, for example the five-channel audio format often used in home movie systems. The introduction of Super Audio Compact Disc (SACD) and digital video disc (DVD) data carriers has resulted in such five-channel audio reproduction gaining interest. Many users presently own equipment capable of five-channel audio playback in their homes; correspondingly, five-channel audio programme content on suitable data carriers, for example the aforementioned SACD and DVD types, is becoming increasingly available. On account of this growing interest in multi-channel programme content, more efficient coding of multi-channel audio programme content is becoming an important issue, for example to provide one or more of enhanced quality, longer playing time and additional channels. This growing interest has also prompted standardization bodies such as MPEG to recognize the design of multi-channel encoders as a relevant topic.
Encoders capable of representing spatial audio information such as audio programme content by way of parametric descriptors are known. For example, in a published international PCT patent application no. PCT/IB2003/002858 (WO 2004/008805), encoding of a multi-channel audio signal including at least a first signal component (LF), a second signal component (LR) and a third signal component (RF) is described. This encoding utilizes a method comprising steps of:
(a) encoding the first and second signal components by using a first parametric encoder for generating a first encoded signal (L) and a first set of encoding parameters (P2);
(b) encoding the first encoded signal (L) and a further signal (R) by using a second parametric encoder for generating a second encoded signal (T) and a second set of encoding parameters (P1) wherein the further signal (R) is derived from at least the third signal component (RF); and
(c) representing the multi-channel audio signal at least by a resulting encoded signal (T) derived from at least the second encoded signal (T), the first set of encoding parameters (P2) and the second set of encoding parameters (P1).
Parametric descriptions of audio signals have gained interest in recent years because it has been shown that transmitting quantized parameters describing audio signals requires relatively little transmission capacity. These quantized parameters can be received and processed in decoders to regenerate audio signals that do not differ perceptually to any significant degree from the corresponding original audio signals.
A problem of significant inter-channel interference arises when output from contemporary multi-channel encoders is subsequently decoded. Such interference is especially noticeable in multi-channel encoders arranged to yield a good stereo image in conjunction with a two-channel down-mix. The present invention at least partially addresses this problem, thereby enhancing the quality of the corresponding decoded multi-channel audio.
An object of the present invention is to provide an alternative multi-channel encoder, or a processing block for use within a multi-channel encoder, that generates encoded output data capable of subsequently being decoded with reduced inter-channel interference.
According to a first aspect of the present invention, there is provided a multi-channel encoder operable to process input signals conveyed in a plurality of input channels to generate corresponding output data comprising down-mix output signals together with complementary parametric data, the encoder including:
(a) a down-mixer for down-mixing the input signals to generate the corresponding down-mix output signals; and
(b) an analyzer for processing the input signals, said analyzer being operable to generate said parametric data complementary to the down-mix output signals, said encoder being operable when generating the down-mix output signals to allow for subsequent decoding of the down-mix output signals for predicting signals of channels processed and then discarded within the encoder.
The invention is of advantage in that the output data from the encoder is susceptible to being decoded with reduced inter-channel interference, namely enabling enhanced subsequent regeneration of the input signals.
Moreover, the amount of data output from the multi-channel encoder required to represent the input signals is also potentially reduced.
Preferably, the encoder is operable to process the input signals on the basis of time/frequency tiles. More preferably, these tiles are defined either before or in the encoder during processing of the input signals.
Preferably, in the encoder, the analyzer is operable to generate at least part of the parametric data (C1,i;C2,i) by applying an optimization of at least one signal derived from a difference between one or more input signals and an estimate of said one or more input signals which can be generated from output data from the multi-channel encoder. More preferably, the optimization involves minimizing a Euclidean norm.
Preferably, the encoder has N input channels, and the analyzer is operable to process the corresponding original input signals of the N input channels to generate the parametric data for each time/frequency tile, the analyzer being operable to output M(N−M) parameters together with M down-mix output signals for representing the input signals in the output data, M and N being integers and M<N. More preferably, when the integer M is equal to two, the down-mixer is operable to generate two down-mix output signals which are suitable both for replay on two-channel stereophonic apparatus and for coding by a standard stereo coder. Such a characteristic renders the encoder and its associated output data backwards compatible with earlier replay systems, for example two-channel stereophonic replay systems.
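As a quick check of the parameter budget stated above, the following Python sketch (the function name is hypothetical) evaluates M(N−M) and confirms that for M=2 it reduces to 2N−4, the per-tile count given later for the two-channel down-mix.

```python
def parameters_per_tile(n_inputs: int, n_downmix: int) -> int:
    """Number of parametric values per time/frequency tile, M(N - M)."""
    if not 0 < n_downmix < n_inputs:
        raise ValueError("require 0 < M < N")
    return n_downmix * (n_inputs - n_downmix)


# For M = 2 the count reduces to 2N - 4.
for n in (3, 4, 6):
    print(n, parameters_per_tile(n, 2), 2 * n - 4)  # prints 3 2 2, 4 4 4, 6 8 8
```

For the three-channel example used later (N=3, M=2), only two parameters per tile are needed.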
According to a second aspect of the invention, there is provided a signal processor for inclusion in a multi-channel encoder according to the first aspect of the invention, the processor being operable to process data in the multi-channel encoder for generating its down-mix output signals and parametric data.
According to a third aspect of the invention, there is provided a method of encoding input signals in a multi-channel encoder to generate corresponding output data comprising down-mix output signals together with complementary parametric data, the method including steps of:
(a) providing the input signals to the multi-channel encoder via a plurality (N) of input channels;
(b) down-mixing the input signals to generate the corresponding (M) down-mix output signals; and
(c) processing the input signals to generate said parametric data complementary to the down-mix output signals,
wherein processing of the input signals in the multi-channel encoder involves determining the parameter data for enabling representations of the input signals to be subsequently regenerated, said down-mix signals allowing for decoding thereof for predicting content of signals of channels processed in the encoder and then discarded therein.
According to a fourth aspect of the invention, there is provided encoded output data generated according to the method of the third aspect of the invention, said output data being stored on a data carrier.
According to a fifth aspect of the invention, there is provided a decoder for decoding output data generated by an encoder according to the first aspect of the invention, the decoder comprising:
(a) processing means for receiving down-mix output signals together with parametric data from the encoder, the processing means being operable to process the parametric data to determine one or more coefficients or parameters; and
(b) computing means for calculating an approximate representation of each input signal encoded into the output data using the parameter data and also the one or more coefficients determined in step (a) for further processing to substantially regenerate representations of input signals giving rise to the output data generated by the encoder.
According to a sixth aspect of the invention, there is provided a signal processor for inclusion in a multi-channel decoder according to the fifth aspect of the invention, the signal processor being operable to assist in processing data in association with regenerating representations of input signals.
According to a seventh aspect of the invention, there is provided a method of decoding encoded data in a multi-channel decoder, said data being of a form as generated by a multi-channel encoder according to the first aspect of the invention, the method including steps of:
(a) processing down-mix output signals together with parametric data present in the encoded data, said processing utilizing the parametric data to determine one or more coefficients or parameters; and
(b) calculating an approximate representation of each input signal encoded into the encoded data using the parameter data and also the one or more coefficients determined in step (a) for further processing to substantially regenerate representations of input signals giving rise to the encoded data generated by the encoder.
It will be appreciated that features of the invention are susceptible to being combined in any combination without departing from the scope of the invention.
Embodiments of the invention will now be described, by way of example only, with reference to the following diagrams wherein:
The present invention will be described in first and second contexts. In the first context, the invention is concerned with an encoder which is operable to process original input signals to generate corresponding encoded output data capable of being subsequently decoded in a decoder to regenerate perceptually more precise representations of the original input signals than hitherto possible. In the second context, the invention is concerned with specific example embodiments of the invention.
The first context will now be considered with regard to
(a) corresponding encoded output signals at M down-mix channel outputs where M<N, for example two channel outputs OP1 and OP2 denoted by 610, 620 respectively when M=2; and
(b) one or more parametric signal outputs, for example a parametric output denoted by 600.
For output signals generated by the encoder 5 to be decoded optimally in a decoder, namely in the least-squares-error sense, it is beneficial to employ Principal Component Analysis (PCA) in the encoder 5 when generating its encoded output signals 600, 610, 620. Processing of these output signals 600, 610, 620 for best possible regeneration of signals at a decoder indicated by 10 in
The inventors have appreciated that, when a fixed down-mix is employed to generate the aforementioned M down-mix channels in the encoder 5, substantially perfect regeneration of the original input signals at the complementary decoder 10 is potentially possible only when these M down-mix channels are extended by an additional appropriate set of N−M channels conveying complementary information. Conversely, output signals of M down-mix channels generated by a fixed down-mix cannot be used to regenerate substantially perfect representations of the original input signals of N channels when information relating to such N−M channels has been at least partially discarded during encoding. However, the inventors have also appreciated that these N−M channels can at least partially be predicted when suitable processing is applied to the M down-mix channels, for example to the outputs 610, 620.
Thus, an encoder 5 configured according to the invention enables at least some information corresponding to the N−M channels to be predicted at a decoder from the M down-mix channels, thereby avoiding a need to send certain parameters from the encoder 5 to the decoder 10. Such prediction exploits signal redundancy between the signals of the N channels, as will be described in more detail later. The correspondingly compatible decoder 10 reinstates this redundancy when decoding encoded data provided from the encoder 5.
In order to further elucidate the present invention, an example embodiment of the encoder 5 illustrated in
The example embodiment of the invention pursuant to the aforementioned second context will now be described with reference to
In
The six original input signals denoted by 400 to 450 comprise: a left front audio signal 400, a left rear audio signal 410, an effects audio signal 420, a center audio signal 430, a right front audio signal 440 and a right rear audio signal 450. The effects signal 420 preferably has a bandwidth of substantially 120 Hz for use in simulating rumble, explosion and thunder effects for example. Moreover, the input signals 400, 410, 430, 440, 450 preferably correspond to 5-channel home movie sound channels.
The processing units 20, 30, 40 are preferably implemented in a manner elucidated in published European patent application no. EP 1 107 232, which is hereby incorporated by reference with regard to these units 20, 30, 40.
The processing unit 20 comprises a segment and transform unit 100, a parameter analysis unit 110, a parameter to PCA angle unit 120 and a PCA rotation unit 130. The transform unit 100 includes transformed left-front and left-rear outputs 700, 710 respectively coupled to the PCA rotation unit 130 and the parameter analysis unit 110. A first parameter set output 720 is coupled via the PCA angle unit 120 to the PCA rotation unit 130. The rotation unit 130 is operable to process the outputs 700, 710 and the first parameter set output to generate the processed output 500. Processing within the unit 20 is performed on the basis of time/frequency tiles.
Similarly, the processing unit 30 comprises a segment and transform unit 200, a parameter analysis unit 210, a parameter to PCA angle unit 220 and a PCA rotation unit 230. The transform unit 200 includes transformed effects audio and centre audio outputs 800, 810 respectively coupled to the PCA rotation unit 230 and the parameter analysis unit 210. A fourth parameter set output 820 is coupled via the PCA angle unit 220 to the PCA rotation unit 230. The rotation unit 230 is operable to process the outputs 800, 810 and the fourth parameter set output to generate the processed output 510. Processing within the unit 30 is also performed on the basis of time/frequency tiles.
Similarly, the processing unit 40 comprises a segment and transform unit 300, a parameter analysis unit 310, a parameter to PCA angle unit 320 and a PCA rotation unit 330. The transform unit 300 includes transformed right-front and right-rear outputs 900, 910 respectively coupled to the PCA rotation unit 330 and the parameter analysis unit 310. A second parameter set output 920 is coupled via the PCA angle unit 320 to the PCA rotation unit 330. The rotation unit 330 is operable to process the outputs 900, 910 and the second parameter set output to generate the processed output 520. Processing within the unit 40 is performed on the basis of time/frequency tiles.
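The PCA rotation performed per time/frequency tile by the rotation units 130, 230, 330 can be sketched in Python as follows. The sketch assumes real-valued tile samples and derives the rotation angle directly from the tile's channel energies and cross-correlation, θ = ½·atan2(2r12, r11 − r22), which is the standard two-channel principal-axis angle; the parameter-to-angle mapping of EP 1 107 232 used by the units 120, 220, 320 is not reproduced, and all function names are hypothetical.

```python
import math


def pca_angle(x1, x2):
    """Angle that rotates (x1, x2) onto its principal axis (real-valued tile)."""
    r11 = sum(a * a for a in x1)
    r22 = sum(b * b for b in x2)
    r12 = sum(a * b for a, b in zip(x1, x2))
    # Maximises the energy of cos(t)*x1 + sin(t)*x2 over t.
    return 0.5 * math.atan2(2.0 * r12, r11 - r22)


def pca_rotate(x1, x2, theta):
    """Return (principal, residual) signals for rotation angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    principal = [c * a + s * b for a, b in zip(x1, x2)]
    residual = [-s * a + c * b for a, b in zip(x1, x2)]
    return principal, residual


# Two fully correlated channels: all energy ends up in the principal signal.
left_front = [1.0, -2.0, 0.5, 3.0]
left_rear = [0.5 * v for v in left_front]
theta = pca_angle(left_front, left_rear)
principal, residual = pca_rotate(left_front, left_rear, theta)
print(max(abs(v) for v in residual) < 1e-12)  # True
```

Because the rotation is orthogonal, the total energy of the two channels is preserved; only its distribution between the principal and residual signals changes.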
The processed outputs 500, 510, 520 correspond to left, center and right processed signals respectively. Moreover, the down-mix outputs 610, 620 are susceptible to being replayed via contemporary two-channel stereo playback apparatus thereby maintaining backward compatibility with earlier stereo sound systems. The third parameter set output 600 includes additional parameter data which can be processed at a decoder, for example the decoder 10 illustrated in
Referring again to the first context of the invention with regard to
For convenience, two down-mix channels are considered, as illustrated for the encoder 15, although extension to other numbers of down-mix channels is possible. From the original input signals conveyed in N channels CH1 to CH3, the encoder 5 processes the aforesaid sub-band representations Z1[k] to ZN[k] to generate two down-mix channels L0[k] and R0[k] as provided in Equations 1 and 2 (Eq. 1 and 2):
$$L_0[k] = \sum_{i=1}^{N} \alpha_i Z_i[k] \quad \text{(Eq. 1)}$$
$$R_0[k] = \sum_{i=1}^{N} \beta_i Z_i[k] \quad \text{(Eq. 2)}$$
wherein the parameters αi and βi are preferably set as required to obtain a good stereo image in the two down-mix channels L0[k] and R0[k]. As elucidated in the foregoing, a subsequent decoder regenerating representations of the original input signals of CH1 to CH3, for example the decoder 10, is only capable of generating substantially perfect representations when the two down-mix channels L0[k] and R0[k] are supplemented with an appropriate set of parameters for substantially regenerating the N−2 missing channels. When fixed down-mixing is employed, information of the N−2 discarded channels can to some extent be predicted from the two down-mix channels L0[k] and R0[k], thereby providing a way of enhancing the accuracy with which the aforesaid representations of the original input signals of channels CH1 to CH3 are regenerated at a corresponding decoder, for example the decoder 10.
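Assuming Equations 1 and 2 form each down-mix channel as a weighted sum of the sub-band signals, L0[k] = Σi αi Zi[k] and R0[k] = Σi βi Zi[k], the down-mix of a single tile can be sketched in Python as below. The function name is hypothetical; the example weights α = (1, 0, 1) and β = (0, 1, 1) for Z = (L, R, Cs) reproduce Equations 15 and 16.

```python
def downmix(tile, alpha, beta):
    """Weighted-sum down-mix of N sub-band signals Z_i[k] into L0[k] and R0[k].

    tile:  list of N sequences, tile[i][k] = Z_i[k]
    alpha: N weights for the left down-mix channel
    beta:  N weights for the right down-mix channel
    """
    n_bins = len(tile[0])
    l0 = [sum(a * z[k] for a, z in zip(alpha, tile)) for k in range(n_bins)]
    r0 = [sum(b * z[k] for b, z in zip(beta, tile)) for k in range(n_bins)]
    return l0, r0


# Z = (L, R, Cs) with the weights implied by Equations 15 and 16.
L = [1.0, 2.0, 3.0]
R = [0.0, -1.0, 4.0]
Cs = [0.5, 0.5, 0.5]
l0, r0 = downmix([L, R, Cs], alpha=[1, 0, 1], beta=[0, 1, 1])
print(l0, r0)  # [1.5, 2.5, 3.5] [0.5, -0.5, 4.5]
```

The weights are fixed per tile and do not depend on the signal content, which is what makes the down-mix "fixed" in the sense used above.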
Where information relating to certain of the N channels, denoted as discarded channels C0,i[k], is discarded in generating the output signals 600, 610, 620, these discarded channels can be predicted from the down-mix channels L0[k] and R0[k] by applying Equation 3 (Eq. 3):
$$\hat{C}_{0,i}[k] = \tilde{C}_{1,i}\,L_0[k] + \tilde{C}_{2,i}\,R_0[k] \quad \text{(Eq. 3)}$$
wherein the parameters C̃1,i and C̃2,i are selected according to one or more optimization criteria. Preferably, the optimization criterion employed in the encoder 5 is a minimum Euclidean norm of the difference between the signal C0,i[k] and its estimate Ĉ0,i[k]. To allow processing according to Equation 3 to be employed in a decoder complementary to the encoder 5, the parameters C̃1,i and C̃2,i are preferably included in the third parameter set 600 output from the encoder 5.
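The minimum-Euclidean-norm criterion leads to a 2×2 system of normal equations for C̃1,i and C̃2,i. The sketch below solves that system in closed form for real-valued tile samples; the function name and the singularity threshold are illustrative assumptions.

```python
def prediction_coefficients(target, l0, r0):
    """Least-squares (c1, c2) minimising sum_k (target[k] - c1*l0[k] - c2*r0[k])**2."""
    # Correlation sums forming the normal equations
    # [ll lr; lr rr] [c1; c2] = [tl; tr].
    ll = sum(a * a for a in l0)
    rr = sum(b * b for b in r0)
    lr = sum(a * b for a, b in zip(l0, r0))
    tl = sum(t * a for t, a in zip(target, l0))
    tr = sum(t * b for t, b in zip(target, r0))
    det = ll * rr - lr * lr
    if abs(det) < 1e-12:
        raise ValueError("down-mix channels are linearly dependent in this tile")
    # Cramer's rule for the 2x2 system.
    c1 = (tl * rr - tr * lr) / det
    c2 = (tr * ll - tl * lr) / det
    return c1, c2


l0 = [1.0, 0.0, 2.0, -1.0]
r0 = [0.0, 1.0, 1.0, 3.0]
discarded = [0.3 * a + 0.7 * b for a, b in zip(l0, r0)]
c1, c2 = prediction_coefficients(discarded, l0, r0)
print(round(c1, 6), round(c2, 6))  # 0.3 0.7
```

A complex-valued version for complex sub-band samples would need conjugate products in each correlation sum and in the determinant; that extension is not shown.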
The inventors have appreciated that the parameters C̃1,i and C̃2,i in Equation 3 are related to parameters that are generated in the encoder 5 when minimizing the Euclidean norm of the difference between the signal Zi[k] and an estimate Ẑi[k] thereof generated at the decoder 10. The encoder 5 is preferably configured to employ these latter parameters. The square of the Euclidean norm of the difference between the original input signal Zi[k] and its estimate Ẑi[k] is then calculable in the encoder 5 by applying Equation 4 (Eq. 4):
Minimization of Equation 4 is preferably achieved by applying Equations 6 and 7 (Eq. 6 and 7):
Thus, for the parameters C1,Z
Thus, by applying processing operations as described by Equations 1 to 13 (Eq. 1 to 13) in the encoder 5, it is feasible to convert input signals corresponding to N channels, namely the input signals for CH1 to CH3 wherein N=3, into two down-mix channels together with two parameters per channel, thereby generating signals for the outputs 610, 620 and the third parameter set output 600; the two parameters for the i-th channel are C1,Z
In the encoder 5, the input signals CH1 to CH3 are processed in the segment and transform units 100, 200, 300 to yield a representation of the input signals in time/frequency tiles. Processing operations as described by Equations 1 to 13 are repeated for each of these tiles. The signals L0[k] of all frequency tiles are combined in the encoder 5 and transformed to the time domain to form a signal for the current segment, and this signal is at least partially combined with the signal pertaining to at least one preceding segment to generate the encoded output signal 610. The signals R0[k] are processed in a similar manner to the signals L0[k] to generate the encoded output signal 620.
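Combining each segment's time-domain signal with that of the preceding segment is an overlap-add operation. The sketch below assumes the segments have already been windowed and inverse-transformed, share one length, and start a fixed hop apart; these choices and the function name are illustrative assumptions rather than details taken from the encoder 5.

```python
def overlap_add(segments, hop):
    """Combine equal-length time-domain segments whose starts are `hop` samples apart."""
    if not segments:
        return []
    seg_len = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for m, seg in enumerate(segments):
        start = m * hop
        # Samples in the overlapping region are summed with the previous segment's tail.
        for j, v in enumerate(seg):
            out[start + j] += v
    return out


# Three 4-sample segments with 50 % overlap (hop = 2).
print(overlap_add([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], hop=2))
# [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 3.0, 3.0]
```

With a suitable analysis/synthesis window pair, the summed overlaps reconstruct a continuous signal across segment boundaries.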
In summary, the encoder 5, and similarly the encoder 15, which is a specific example embodiment of the invention, is operable to encode the three input signals CH1 to CH3 as two down-mix channels 610, 620, namely l0[n] and r0[n], together with 2N−4 parameters for each time/frequency tile used when processing the input signals CH1 to CH3.
Complementary to the encoder 5 illustrated in
At the decoder 10, upon receiving the outputs 600, 610, 620 from the encoder 5, for example conveyed by way of a communication network such as the Internet and/or by way of a data carrier such as a digital video disc (DVD) or similar data medium, the following processing functions are performed for each time/frequency tile:
(a) the coefficients C1,Z
(b) an approximate representation Ẑi[k] of each input signal Zi[k] is computed using Equation 14 (Eq. 14):
$$\hat{Z}_i[k] = C_{1,Z_i}\,L_0[k] + C_{2,Z_i}\,R_0[k] \quad \text{(Eq. 14)}$$
wherein L0[k] and R0[k] are the signals representing a time/frequency tile of two down-mix channels received at the decoder 10, namely the outputs 610, 620 respectively.
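Step (b) amounts to two multiply-adds per sample. The Python sketch below applies Equation 14, assumed to have the form Ẑi[k] = C1,Zi·L0[k] + C2,Zi·R0[k], to one time/frequency tile for several channels at once; the function name and the pairing of coefficients as (C1,Zi, C2,Zi) tuples are assumptions made for illustration.

```python
def reconstruct_tile(l0, r0, coeffs):
    """Apply Z_hat_i[k] = C1,Zi * L0[k] + C2,Zi * R0[k] for every regenerated channel i.

    coeffs: list of (C1,Zi, C2,Zi) pairs, one pair per channel
    """
    return [[c1 * a + c2 * b for a, b in zip(l0, r0)] for c1, c2 in coeffs]


l0 = [1.0, 2.0]
r0 = [3.0, -1.0]
print(reconstruct_tile(l0, r0, [(1.0, 0.0), (0.5, 0.5)]))  # [[1.0, 2.0], [2.0, 0.5]]
```

The same routine is run once per tile, with that tile's coefficient pairs taken from the parameter data.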
A specific example embodiment of the decoder 10 illustrated in
The signal 1700 is coupled directly and also via a decorrelator 1750 as shown to an inverse PCA unit 1800 which is operable to generate two intermediate outputs Lf, Ls which are coupled to an inverse transform and OLA unit 1900. The inverse transform unit 1900 is operable to process the intermediate outputs Lf, Ls to generate decoder outputs 2000, 2010 corresponding to the output 1500 in
Similarly, the signal 1710 is coupled directly and also via a decorrelator 1760 as shown to an inverse PCA unit 1810 which is operable to generate two intermediate outputs Cs, LFE which are coupled to an inverse transform and OLA unit 1910. The inverse transform unit 1910 is operable to process the intermediate outputs Cs, LFE to generate decoder outputs 2020, 2030 corresponding to the output 1510 in
Similarly, the signal 1720 is coupled directly and also via a decorrelator 1770 as shown to an inverse PCA unit 1820 which is operable to generate two intermediate outputs Rf, Rs which are coupled to an inverse transform and OLA unit 1920. The inverse transform unit 1920 is operable to process the intermediate outputs Rf, Rs to generate decoder outputs 2040, 2050 corresponding to the output 1520 in
The inverse PCA units 1800, 1810, 1820 require the parameter inputs 720, 820 and 920 respectively for correct operation.
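Each inverse PCA unit 1800, 1810, 1820 receives a transmitted signal both directly and via a decorrelator. A minimal Python sketch of that structure follows. It assumes real-valued tiles, models the decorrelators 1750, 1760, 1770 as a plain one-sample delay, and introduces a hypothetical `residual_gain` that scales the synthetic residual, because neither the decorrelator design nor the residual scaling is specified here; `theta` stands for whatever angle the unit derives from its parameter input.

```python
import math


def decorrelate(signal, delay=1):
    """Hypothetical placeholder decorrelator: a plain delay (illustrative only)."""
    return [0.0] * delay + list(signal[:len(signal) - delay])


def inverse_pca(principal, residual, theta):
    """Undo a rotation by theta, returning the two channel signals."""
    c, s = math.cos(theta), math.sin(theta)
    x1 = [c * p - s * r for p, r in zip(principal, residual)]
    x2 = [s * p + c * r for p, r in zip(principal, residual)]
    return x1, x2


def upmix_pair(principal, theta, residual_gain):
    """Rebuild two channels from one transmitted signal plus a synthetic residual."""
    synthetic = [residual_gain * v for v in decorrelate(principal)]
    return inverse_pca(principal, synthetic, theta)


# With no residual, both outputs are scaled copies of the principal signal.
lf, ls = upmix_pair([1.0, 0.0, -1.0], theta=math.atan(0.5), residual_gain=0.0)
print([round(b / a, 6) for a, b in zip(lf, ls) if a != 0.0])  # [0.5, 0.5]
```

A practical decorrelator would typically be an all-pass or reverberation-like filter; the delay merely keeps the example self-contained.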
Processing operations executed within the decoding processor 1610, also known as a decoder according to the invention, involve mathematical operations as described in the foregoing with reference to the decoder 10 illustrated in
It will be appreciated that embodiments of the invention described in the foregoing are susceptible to being modified without departing from the scope of the invention as defined by the accompanying claims.
For example, the encoder 5, and similarly the encoder 15, is preferably arranged to generate a good stereo image in the down-mix outputs by applying Equations 15 and 16 (Eq. 15 and 16) during processing:
$$L_0[k] = L[k] + C_s[k] \quad \text{(Eq. 15)}$$
$$R_0[k] = R[k] + C_s[k] \quad \text{(Eq. 16)}$$
In such a situation N=3, hence only two parameters per tile, as given by 2N−4, need to be transmitted from the encoder 5 to the decoder 10. Such an arrangement is of advantage in that the two parameters or coefficients C1,Z
Correspondingly, when three-or-more-channel playback is provided at the decoder 10, six parameters are computed for each tile, namely C1,L, C2,L, C1,R, C2,R, C1,Cs and C2,Cs. This computation is based on the two transmitted parameters and on information regarding the relations between these six parameters.
As an example, the coefficients C1,L and C2,R are transmitted from the encoder 5 to the decoder 10. The decoder 10 is then capable of deriving other coefficients therefrom by way of Equations 17 (Eqs. 17), namely:
$$C_{2,L} = C_{2,R} - 1, \qquad C_{1,R} = C_{1,L} - 1$$
$$C_{1,C_s} = 1 - C_{1,L}, \qquad C_{2,C_s} = 1 - C_{2,R} \quad \text{(Eqs. 17)}$$
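The relations of Equations 17 can be checked with a short derivation. It assumes that the six coefficients are least-squares predictors of L[k], R[k] and Cs[k] from L0[k] and R0[k] within a tile, so that L̂[k] = C1,L·L0[k] + C2,L·R0[k] and likewise for R̂[k] and Ĉs[k], and that L0[k] and R0[k] are linearly independent in the tile. Because least-squares prediction is linear in the predicted signal and L0[k] is predicted exactly from itself, Equations 15 and 16 give:

$$
\begin{aligned}
\hat{L}[k] + \hat{C}_s[k] &= (C_{1,L} + C_{1,C_s})\,L_0[k] + (C_{2,L} + C_{2,C_s})\,R_0[k] = L_0[k],\\
\hat{R}[k] + \hat{C}_s[k] &= (C_{1,R} + C_{1,C_s})\,L_0[k] + (C_{2,R} + C_{2,C_s})\,R_0[k] = R_0[k].
\end{aligned}
$$

Matching the coefficients of the independent signals L0[k] and R0[k]:

$$
C_{1,L} + C_{1,C_s} = 1,\quad C_{2,L} + C_{2,C_s} = 0,\quad C_{1,R} + C_{1,C_s} = 0,\quad C_{2,R} + C_{2,C_s} = 1,
$$

from which C1,Cs = 1 − C1,L, C2,Cs = 1 − C2,R, C2,L = −C2,Cs = C2,R − 1 and C1,R = −C1,Cs = C1,L − 1, which are exactly Equations 17.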
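When these six coefficients have been derived for each tile, representations L̂[k], R̂[k] and Ĉs[k] of the signals L[k], R[k] and Cs[k] processed within the encoder 5 can be regenerated within the decoder 10 by using Equation 18 (Eq. 18) in computations executed within the decoder 10:
$$\begin{bmatrix} \hat{L}[k] \\ \hat{R}[k] \\ \hat{C}_s[k] \end{bmatrix} = \begin{bmatrix} C_{1,L} & C_{2,L} \\ C_{1,R} & C_{2,R} \\ C_{1,C_s} & C_{2,C_s} \end{bmatrix} \begin{bmatrix} L_0[k] \\ R_0[k] \end{bmatrix} \quad \text{(Eq. 18)}$$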
These signals L̂[k], R̂[k] and Ĉs[k] are then transformable from the frequency domain to the time domain to generate the signals 1500 to 1520 for output from the decoder 10 for user appreciation, for example during home movie presentation.
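The three-channel decoder path can be sketched in Python as follows, assuming the transmitted pair is C1,L and C2,R as in the example above. The remaining four coefficients follow from Equations 17, and each regenerated signal is formed, as in Equation 14, as a coefficient-weighted sum of L0[k] and R0[k], which is how Equation 18 is assumed to operate. Function names are hypothetical.

```python
def six_coefficients(c1_l, c2_r):
    """Derive all six per-tile coefficients from the transmitted C1,L and C2,R (Eqs. 17)."""
    return {
        "L": (c1_l, c2_r - 1.0),         # (C1,L, C2,L)
        "R": (c1_l - 1.0, c2_r),         # (C1,R, C2,R)
        "Cs": (1.0 - c1_l, 1.0 - c2_r),  # (C1,Cs, C2,Cs)
    }


def regenerate(l0, r0, c1_l, c2_r):
    """Regenerate L_hat, R_hat and Cs_hat as C1 * L0[k] + C2 * R0[k] for each signal."""
    return {
        name: [c1 * a + c2 * b for a, b in zip(l0, r0)]
        for name, (c1, c2) in six_coefficients(c1_l, c2_r).items()
    }


l0 = [1.5, 2.5, 3.5]
r0 = [0.5, -0.5, 4.5]
out = regenerate(l0, r0, c1_l=0.8, c2_r=0.7)

# Consistency with Equations 15 and 16: L_hat + Cs_hat = L0 and R_hat + Cs_hat = R0.
err_l = max(abs(x + c - y) for x, c, y in zip(out["L"], out["Cs"], l0))
err_r = max(abs(x + c - y) for x, c, y in zip(out["R"], out["Cs"], r0))
print(err_l < 1e-12, err_r < 1e-12)  # True True
```

The consistency check holds for any choice of the two transmitted coefficients, which is why only two values per tile need to be sent.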
In the most straightforward use of the multi-channel encoders 5, 15, a standard stereo coder (namely both its encoder and its decoder) operating with M=2 is employed between the multi-channel encoder 5, 15 and the multi-channel decoder 10, 18 described in the foregoing. In other words, referring to
In the accompanying claims, numerals and other symbols included within brackets are included to assist understanding of the claims and are not intended to limit the scope of the claims in any way.
Expressions such as “comprise”, “include”, “incorporate”, “contain”, “is” and “have” are to be construed in a non-exclusive manner when interpreting the description and its associated claims, namely construed to allow for other items or components which are not explicitly defined also to be present. Reference to the singular is also to be construed to be a reference to the plural and vice versa.
Foreign application priority data:

Number | Date | Country | Kind |
---|---|---|---|
04101405 | Apr 2004 | EP | regional |
04102862 | Jun 2004 | EP | regional |

PCT information:

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2005/051040 | 3/25/2005 | WO | 00 | 10/2/2006 |

Publishing document:

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2005/098824 | 10/20/2005 | WO | A |

US patent documents cited:

Number | Name | Date | Kind |
---|---|---|---|
7193538 | Craven et al. | Mar 2007 | B2 |
7447317 | Herre et al. | Nov 2008 | B2 |
20040049379 | Thumpudi et al. | Mar 2004 | A1 |

Foreign patent documents cited:

Number | Date | Country |
---|---|---|
1107232 | Jun 2001 | EP |
WO2004008805 | Jan 2004 | WO |

Prior publication data:

Number | Date | Country |
---|---|---|
20070239442 A1 | Oct 2007 | US |