The present invention relates to decoding of audio signals and in particular to decoding of a parametric multi-channel downmix of an original multi-channel signal into a number of channels smaller than the number of channels of the original multi-channel signal.
Recent development in audio coding has made available the ability to recreate a multi-channel representation of an audio signal based on a stereo (or mono) signal and corresponding control data. These methods differ substantially from older matrix based solutions such as Dolby Prologic, since additional control data is transmitted to control the re-creation, also referred to as upmix, of the surround channels based on the transmitted mono or stereo channels.
Hence, such a parametric multi-channel audio decoder, e.g. MPEG Surround, reconstructs N channels based on M transmitted channels, where N>M, and the additional control data. The additional control data requires a significantly lower data rate than transmitting all N channels, making the coding very efficient while at the same time ensuring compatibility with both M channel devices and N channel devices.
These parametric surround coding methods usually comprise a parameterization of the surround signal based on IID (Inter channel Intensity Difference) and ICC (Inter Channel Coherence). These parameters describe power ratios and correlation between channel pairs in the upmix process. Further parameters also used in prior art comprise prediction parameters used to predict intermediate or output channels during the upmix procedure.
Two famous examples of such multi-channel coding are BCC coding and MPEG surround. In BCC encoding, a number of audio input channels are converted to a spectral representation using a DFT (Discrete Fourier Transform) based transform with overlapping windows. The resulting uniform spectrum is then divided into non-overlapping partitions. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB). Then, spatial parameters called ICLD (Inter-Channel Level Difference) and ICTD (Inter-Channel Time Difference) are estimated for each partition. The ICLD parameter describes a level difference between two channels and the ICTD parameter describes the time difference (phase shift) between two signals of different channels. The level differences and the time differences are given for each channel with respect to a common reference channel. After the derivation of these parameters, the parameters are quantized and encoded for transmission.
The individual parameters are estimated with respect to one single reference channel in BCC coding. In other parametric surround coding systems, e.g. in MPEG Surround, a tree-structured parameterization is used. This means that the parameters are no longer estimated with respect to one single common reference channel but with respect to different reference channels, which may even be combinations of channels of the original multi-channel signal. For example, for a 5.1 channel signal, parameters may be estimated between a combination of the front channels and a combination of the back channels.
Of course, backward compatibility to already established audio-standards is highly desirable also for the parametric coding schemes. For example, having a mono-downmix signal it is desirable to also provide a possibility to create a stereo-playback signal with high fidelity. This means that a monophonic downmix signal has to be upmixed into a stereo signal, making use of the additionally transmitted parameters in the best possible way.
One common problem in multi-channel coding is energy preservation in the upmix, as the human perception of the spatial position of a sound source is dominated by the loudness of the signal, i.e. by the energy contained within the signal. Therefore, utmost care must be taken in the reproduction of the signal to attribute the right loudness to each reconstructed channel, so as to avoid introducing artifacts that strongly decrease the perceptual quality of the reconstructed signal. Since the amplitudes of signals are commonly summed during the downmix, interference between the summed signals can occur; this interference is described by the correlation or coherence parameter.
When it comes to the reconstruction of a reduced number of channels (a number of channels smaller than the original number of channels of the multi-channel signal), schemes like BCC are simple to handle, since every parameter is transmitted with respect to the same single reference channel. Therefore, with knowledge of the reference channel, the most relevant level information (absolute energy measure) can easily be derived for every channel needed for the upmix. Thus, a reduced number of channels can be reconstructed without the need to reconstruct the full multi-channel signal first. The computation of the energies of the multi-channel signal is therefore easier in BCC, since it uses single variables rather than products of variables, but this is only a first step. When it comes to deriving energies and correlations of a reduced number of channels which should come as close as possible to partial downmixes of the original multi-channel signals, the level of difficulty in MPEG Surround and BCC is comparable.
In contrast thereto, a tree-based structure such as MPEG Surround uses a parameterization in which the relevant information for each individual channel is not contained in a single parameter. Therefore, in the prior art, reconstructing a reduced number of channels requires the reconstruction of the multi-channel signal followed by a downmix into the reduced number of channels in order not to violate the energy preservation requirement. This has the obvious disadvantage of extremely high computational complexity.
It is the object of the present invention to provide a concept for obtaining a reduced number of channels from a parametric multichannel signal more efficiently.
In accordance with a first aspect of the present invention, this object is achieved by a parameter calculator for deriving upmix parameters for upmixing a downmix signal into an intermediate channel representation of a multi-channel signal having more channels than the downmix signal and less channels than the multi-channel signal, the downmix signal having associated thereto multi-channel parameters describing spatial properties of the multi-channel signal, wherein the multi-channel signal includes channels not included in the intermediate channel representation and wherein the multi-channel parameters include information on the channels not included in the intermediate channel representation, the parameter calculator comprising: a parameter recalculator for deriving the upmix parameters from the multi-channel parameters using the parameters having information on channels not included in the intermediate channel representation.
In accordance with a second aspect of the present invention, this object is achieved by a channel reconstructor having a parameter reconstructor, comprising: a parameter calculator for deriving upmix parameters for upmixing a downmix signal into an intermediate channel representation of a multi-channel signal having more channels than the downmix signal and less channels than the multi-channel signal, the downmix signal having associated thereto multi-channel parameters describing spatial properties of the multi-channel signal, wherein the multi-channel signal includes channels not included in the intermediate channel representation and wherein the multi-channel parameters include information on the channels not included in the intermediate channel representation, the parameter calculator comprising: a parameter recalculator for deriving the upmix parameters from the multi-channel parameters using the parameters having information on channels not included in the intermediate channel representation; and an upmixer for deriving the intermediate channel representation using the upmix parameters and the downmix signal.
In accordance with a third aspect of the present invention, this object is achieved by a method for generating upmix parameters for upmixing a downmix signal into an intermediate channel representation of a multi-channel signal having more channels than the downmix signal and less channels than the multi-channel signal, the downmix signal having associated thereto multi-channel parameters describing spatial properties of the multi-channel signal, wherein the multi-channel signal includes channels not included in the intermediate channel representation and wherein the multi-channel parameters include information on the channels not included in the intermediate channel representation, the method comprising: deriving the upmix parameters from the multi-channel parameters using the parameters having information on channels not included in the intermediate channel representation.
In accordance with a fourth aspect of the present invention, this object is achieved by an audio receiver or audio player, the receiver or audio player having a parameter calculator for deriving upmix parameters for upmixing a downmix signal into an intermediate channel representation of a multi-channel signal having more channels than the downmix signal and less channels than the multi-channel signal, the downmix signal having associated thereto multi-channel parameters describing spatial properties of the multi-channel signal, wherein the multi-channel signal includes channels not included in the intermediate channel representation and wherein the multi-channel parameters include information on the channels not included in the intermediate channel representation, the parameter calculator comprising: a parameter recalculator for deriving the upmix parameters from the multi-channel parameters using the parameters having information on channels not included in the intermediate channel representation.
In accordance with a fifth aspect of the present invention, this object is achieved by a method of receiving or audio playing, the method having a method for generating upmix parameters for upmixing a downmix signal into an intermediate channel representation of a multi-channel signal having more channels than the downmix signal and less channels than the multi-channel signal, the downmix signal having associated thereto multi-channel parameters describing spatial properties of the multi-channel signal, wherein the multi-channel signal includes channels not included in the intermediate channel representation and wherein the multi-channel parameters include information on the channels not included in the intermediate channel representation, the method comprising: deriving the upmix parameters from the multi-channel parameters using the parameters having information on channels not included in the intermediate channel representation.
The present invention is based on the finding that an intermediate channel representation of a multi-channel signal can be reconstructed highly efficiently and with high fidelity when the upmix parameters for upmixing a transmitted downmix signal to the intermediate channel representation are derived such that they allow for an upmix using the same upmixing algorithms as within the multi-channel reconstruction. This can be achieved when a parameter recalculator is used to derive the upmix parameters, taking into account also parameters having information on channels not included in the intermediate channel representation.
In one embodiment of the present invention, a decoder is capable of reconstructing a stereo output signal from a parametric downmix of a 5-channel multi-channel signal, the parametric downmix comprising a monophonic downmix signal and associated multi-channel parameters. According to the invention, the spatial parameters are combined to derive upmix parameters for the upmix of a stereo signal, wherein the combination also takes into account multi-channel parameters not associated with the left front or the right front channel. Hence, absolute powers for the upmixed stereo channels can be derived and a coherence measure between the left and the right channel can be derived, allowing for a high-fidelity stereo reconstruction of the multi-channel signal. Moreover, an ICC parameter and a CLD parameter are derived, allowing for an upmix using already existing algorithms and implementations. Using parameters of channels not associated with the reconstructed stereo channels allows for the preservation of the energy within the signal with higher accuracy. This is of utmost importance, as uncontrolled loudness variations degrade the quality of the playback signal the most.
Generally, the application of the inventive concept allows a reconstruction of a stereo upmix from a mono downmix of a multi-channel signal without the need for an intermediate full reconstruction of the multi-channel signal, as in prior art methods. Evidently, the computational complexity on the decoder side can thus be decreased significantly. Using also multi-channel parameters associated with channels not included in the upmix (i.e. channels other than the left front and the right front channel) allows for a reconstruction that does not introduce any additional artifacts or loudness variations but instead preserves the energy of the signal perfectly. To be more specific, the ratio of the energy between the left and the right reconstructed channel is calculated from numerous available multi-channel parameters, taking into account also multi-channel parameters not associated with the left front and the right front channel. Evidently, the loudness ratio between the left and the right reconstructed (upmixed) channel is dominant with respect to the listening quality of the reconstructed stereo signal. Without using the inventive concept, a reconstruction of channels having the precisely correct energy ratio is not possible in the tree-based structures discussed within this document.
Therefore, implementing the inventive concept allows for a high-quality stereo-reproduction of a downmix of a multi-channel signal based on multi-channel parameters, which are not derived for a precise reproduction of a stereo signal.
It should be noted that the inventive concept may also be used when the number of reproduced channels is other than two, for example when a center channel shall also be reconstructed with high fidelity, as is the case in some playback environments.
A more detailed review of prior-art multi-channel encoding schemes (particularly of tree-based structures) will be given in the following to outline the high benefit of the inventive concept.
Preferred embodiments of the present invention are subsequently described by referring to the enclosed drawings, wherein:
The inventive concept will in the following be described mainly with respect to MPEG coding, but is equally applicable to other schemes based on parametric coding of multi-channel signals. That is, the embodiments described below are merely illustrative of the principles of the present invention for reduced-number-of-channels decoding for tree-structured multi-channel systems. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
As mentioned above, in some parametric surround coding systems, e.g. MPEG Surround, a tree-structured parameterization is used. Such a parameterization is sketched in
Generally, the individual channels or channel pairs are characterized with respect to each other by multi-channel parameters, such as for example a correlation parameter ICC and a level parameter CLD. Possible parameterizations will be shortly explained in the following paragraph, the resulting tree-structured decoding schemes are then illustrated in
In the example shown in the left side of
In the parameterization on the right side (5-1-5₂ parameterization) parameters are used, relating the left front channel 2 and the left surround channel 5, the right front channel 4 and the right surround channel 6 and the center channel 3 and the low-frequency enhancement channel 7. Additional parameters (CLD1 and ICC1) describe a combination of the left channels 2 and 5 with respect to a combination of the right channels 4 and 6. A further set of parameters (CLD0 and ICC0) describes the relation of a combination of the center channel 3 and the LFE-channel 7 with respect to a combination of the remaining channels.
Accordingly, OTT module 24 derives, using CLD1 and ICC1, a first channel being a combined channel of the center channel 3 and the low-frequency channel 7 and a second channel being a combination of the left front channel 2 and the right front channel 4. In the same way, OTT module 26 derives the left surround channel 5 and the right surround channel 6, using CLD2 and ICC2. OTT module 27 derives the center channel 3 and the low-frequency channel 7, using CLD4 and OTT module 28 derives the left front channel 2 and the right front channel 4, using CLD3 and ICC3. Finally, a reconstruction of the full set of channels 30 is derived from a single monophonic downmix channel 22. For the 5-1-5₂ tree structure, the general layout of the OTT module is equivalent to the 5-1-5₁ tree structure. However, the single OTT modules derive different channel combinations, the channel combinations corresponding to the parameterization outlined in
It becomes evident from
Therefore, in the parameterizations shown, individual channels cannot be simply derived using the parameters associated to the OTT-boxes in the visualization, but some or all of the remaining parameters have to be taken into account additionally.
The tree-structure of the parameterization is only a visualization for actual signal flow or processing shown in
As shown in
A solution to the aforementioned problem of obtaining stereo output from a mono downmix and parametric surround parameters, in a parameterization that does not naturally support “pruning” down to a stereo output, will in the following be derived for the general case. This is followed by two specific embodiments showing the use of the inventive concept in the parameterizations described above.
The general approach of the parameter recalculation will be outlined below. In particular, it applies to the case of computing stereo output parameters from an arbitrary number of multi-channel audio channels N. It is furthermore assumed that the audio signal is described by a subband representation, derived using a filter bank that could be real valued or complex modulated.
Let all signals considered be finite vectors of subband samples corresponding to a time-frequency tile defined by the spatial parameters, and let the subband samples of a reconstructed multi-channel audio signal $y$ be formed from subband samples of audio channels $m_1, m_2, \ldots, m_M$ and decorrelated subband samples of audio channels $d_1, d_2, \ldots, d_D$ according to a matrix upmix operation
$y = Rx$, where $x$ is the matrix whose $M+D$ rows are $m_1, \ldots, m_M, d_1, \ldots, d_D$.
All signals are regarded as row vectors. The matrix R is of size N×(M+D) and represents the combined effect of the matrices M1 and M2 of
$y_D = Dy$.
This covariance matrix can be computed by multiplication with the complex conjugate transpose to be
$y_D y_D^* = D y y^* D^* = D R x x^* R^* D^*$,
where the inner covariance matrix xx* is often known from the properties of decorrelators and the transmitted parameters.
An important special case where this holds true is $M=1$, and frequently this inner covariance matrix is then actually equal to the identity matrix of size $M+D$. As a consequence, for a stereo output where $N_D=2$, the CLD and ICC parameters can be read from
in the sense that
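Note that here and in the following, the following notation is applied. For complex vectors $x$, $y$, the complex inner product and squared norm are defined by
$\langle x, y\rangle = \sum_k x(k)\,y^*(k)$, $\|x\|^2 = \langle x, x\rangle = \sum_k |x(k)|^2$,
where the star denotes complex conjugation.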
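The general recalculation can be illustrated numerically. The following is a minimal sketch, assuming that the inner covariance matrix $xx^*$ equals the identity and that CLD and ICC are read from the resulting 2×2 covariance as the power ratio in dB and the normalized real part of the cross term. The function names and the example matrices are hypothetical and serve only as illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conj_transpose(a):
    """Return the complex conjugate transpose of a matrix."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def stereo_params_from_matrices(downmix, upmix):
    """Read CLD (dB) and ICC from D R R* D*, assuming x x* is the identity.

    downmix: 2 x N partial downmix matrix D
    upmix:   N x (M+D) upmix matrix R
    """
    dr = matmul(downmix, upmix)                 # 2 x (M+D)
    cov = matmul(dr, conj_transpose(dr))        # 2 x 2 covariance of y_D
    power_left = cov[0][0].real
    power_right = cov[1][1].real
    cld = 10.0 * math.log10(power_left / power_right)
    icc = cov[0][1].real / math.sqrt(power_left * power_right)
    return cld, icc
```

For example, with a hypothetical three-channel upmix (left, right, center) from one downmix and one decorrelator channel, `stereo_params_from_matrices([[1, 0, q], [0, 1, q]], R)` with `q = 1 / math.sqrt(2)` yields the stereo parameters without ever forming the three-channel signal.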
Subsequently, two embodiments of the present invention shall be derived for the different parameterizations (5-1-5₁ and 5-1-5₂) shown in
It is furthermore assumed that the processing of the individual audio channels is done frame wise, i.e. in discrete time portions. Thus, when talking about powers or energies contained within one channel, the term “power” or “energy” is to be understood as the energy or power contained within one frame of one specific channel.
Generally, parameters such as CLD and ICC are also valid for one single frame. Given a frame with $k$ sample values $a_i$, the energy $E$ within the frame can for example be represented by the sum of the squared magnitudes of the subband sample values within the frame:
$E = \sum_{i=1}^{k} |a_i|^2$
Channel level differences (CLD) transmitted and used for the calculation of upmix parameters for upmixing the downmix signal M into an intermediate channel representation (stereo) of the multi-channel signal are defined as follows:
$\mathrm{CLD} = 10 \log_{10}(L_0 / R_0)$,
wherein $L_0$ and $R_0$ denote the power of the signals in question within the frame for which the parameter CLD shall be derived.
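As a small worked illustration of the two quantities just introduced, the sketch below computes a frame energy as the sum of squared sample magnitudes and a level difference in dB from two frame powers. It assumes the usual logarithmic power-ratio form of the CLD; the function names are hypothetical.

```python
import math

def frame_energy(samples):
    """Energy of one frame: sum of squared magnitudes of its subband samples."""
    return sum(abs(a) ** 2 for a in samples)

def cld_db(power_first, power_second):
    """Channel level difference in dB between two frame powers (assumed form)."""
    return 10.0 * math.log10(power_first / power_second)
```

For instance, `cld_db(frame_energy(left), frame_energy(right))` gives the level difference for one frame of two channels; the samples may be real or complex.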
Therefore, for the 5-1-5₁ case, the four CLD parameters $\mathrm{CLD}_X$, $X = 0, 1, 2, 3$, can be used to obtain channel powers normalized by the power of the mono downmix channel $m$:
$L_f = (c_{1,0}\,c_{1,1}\,c_{1,3})^2$,
$R_f = (c_{1,0}\,c_{1,1}\,c_{2,3})^2$,
$C = (c_{1,0}\,c_{2,1})^2$,
$L_s = (c_{2,0}\,c_{1,2})^2$,
$R_s = (c_{2,0}\,c_{2,2})^2$.
The channel gains are defined by
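The gain equation itself is not reproduced here. As a minimal sketch, one can assume that each OTT box splits unit power between its two outputs so that the ratio of the output powers equals the linear value of the CLD. This assumption is consistent with the normalized channel powers summing to one; the function names are hypothetical.

```python
import math

def channel_gains(cld_db):
    """Return (c1, c2) with c1**2 + c2**2 == 1 and c1**2 / c2**2 == 10**(cld_db / 10).

    Assumed form of the channel gains, not quoted from the text.
    """
    ratio = 10.0 ** (cld_db / 10.0)
    c1 = math.sqrt(ratio / (1.0 + ratio))
    c2 = math.sqrt(1.0 / (1.0 + ratio))
    return c1, c2

def powers_5151(cld0, cld1, cld2, cld3):
    """Normalized channel powers for the 5-1-5_1 tree from its four CLDs (dB)."""
    c10, c20 = channel_gains(cld0)
    c11, c21 = channel_gains(cld1)
    c12, c22 = channel_gains(cld2)
    c13, c23 = channel_gains(cld3)
    return {
        "Lf": (c10 * c11 * c13) ** 2,
        "Rf": (c10 * c11 * c23) ** 2,
        "C": (c10 * c21) ** 2,
        "Ls": (c20 * c12) ** 2,
        "Rs": (c20 * c22) ** 2,
    }
```

Under this assumption the five powers always add up to the power of the mono downmix, which is what makes the later gain factor equal to one for this configuration.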
The final goal is to derive optimal stereo channels $l_0$ and $r_0$, in the sense that appropriate estimates are found for the normalized powers and correlation of the stereo channels (intermediate channel representation) formed by
$l_0 = l + qc$, with $l = G(l_f + l_s)$ such that $L = L_f + L_s$,
$r_0 = r + qc$, with $r = G(r_f + r_s)$ such that $R = R_f + R_s$,
wherein the center downmix weight is $q = 1/\sqrt{2}$. Computing powers from this assumption gives the result
$L_0 = L + q^2 C + 2\,\mathrm{Re}\langle l, qc\rangle$,
$R_0 = R + q^2 C + 2\,\mathrm{Re}\langle r, qc\rangle$.
It turns out to be most advantageous to assume that both the combined left channel $l$ and the combined right channel $r$ are uncorrelated with the center channel $c$, rather than attempting to incorporate the correlation information carried by the parameters $\mathrm{ICC}_X$, $X = 0, 1$. The cross terms then vanish, and the normalized powers of the stereo output channels are therefore estimated by
$L_0 = L + q^2 C = L_f + L_s + C/2$,
$R_0 = R + q^2 C = R_f + R_s + C/2$.
Having derived the powers of the output channels, the desired CLD parameter can easily be computed using the definition of the CLD parameter given above.
According to the inventive concept, an ICC parameter is derived to allow a stereo upmix. The correlation between the two output channels is defined by the following expression:
$p = \mathrm{Re}\langle l_0, r_0\rangle = q^2 C + \mathrm{Re}\langle l, r\rangle + q\,\mathrm{Re}\langle c, l + r\rangle$.
An attractive set of simplifying assumptions is here again that the combined left channel l and the combined right channel r are uncorrelated with the center channel c, and moreover that the surround channels are uncorrelated with the front channels. These assumptions can be expressed by
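$\mathrm{Re}\langle c, l + r\rangle = 0$,
$\mathrm{Re}\langle l, r\rangle = \mathrm{Re}\langle l_f, r_f\rangle + \mathrm{Re}\langle l_s, r_s\rangle$.
The resulting estimate for $p$ depends on the two ICC parameters $\mathrm{ICC}_X$, $X = 2, 3$, which describe normalized left/right correlations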
which can be written out as
Thus, the final correlation value depends on numerous parameters of the multi-channel parameterization, allowing for the high fidelity reconstruction of the signal. The ICC parameter is finally derived using the following formula:
According to the inventive concept, the power distribution between the reconstructed channels is reconstructed with high accuracy. However, a global power scaling applied to both channels may additionally be necessary to ensure overall energy preservation. As the relative energy distribution between the channels is vital for the spatial perception of the reconstructed signal, global scaling may deteriorate the perceptual quality of the reconstructed signal. It is to be emphasized that the global scaling is only global inside a time-frequency tile defined by the parameters. This means that wrong scalings will affect the signal locally at the scale of parameter tiles. In other words, both frequency-dependent and time-dependent gains will be applied, which lead to both spectral colorization and time modulation artifacts. A gain adjustment factor for global scaling is necessary to ensure that the stereo upmix process preserves the power of the mono downmix channel $m$.
This factor is defined by $g = \sqrt{L_0 + R_0}$, which amounts to $g = 1$ for the 5-1-5₁ configuration, since $L_0 + R_0 = L_f + R_f + C + L_s + R_s = 1$.
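The steps of this embodiment can be collected into one short routine. This is a sketch under the simplifying assumptions stated in the text (center uncorrelated with the combined left and right channels, surround uncorrelated with front). It further assumes that ICC3 relates the two front channels, that ICC2 relates the two surround channels, and that the output ICC is the correlation $p$ normalized by $\sqrt{L_0 R_0}$. The function name and argument order are hypothetical.

```python
import math

Q = 1.0 / math.sqrt(2.0)  # center downmix weight q

def stereo_params_5151(Lf, Rf, C, Ls, Rs, icc2, icc3):
    """Derive (CLD in dB, ICC, gain g) for a stereo upmix in the 5-1-5_1 case.

    Inputs are normalized channel powers and the two left/right ICCs.
    """
    # Output powers with vanishing center cross terms.
    L0 = Lf + Ls + Q * Q * C
    R0 = Rf + Rs + Q * Q * C
    cld = 10.0 * math.log10(L0 / R0)
    # Correlation: center contribution plus front and surround left/right terms.
    p = Q * Q * C + icc3 * math.sqrt(Lf * Rf) + icc2 * math.sqrt(Ls * Rs)
    icc = p / math.sqrt(L0 * R0)
    g = math.sqrt(L0 + R0)
    return cld, icc, g
```

With powers that sum to one, the returned gain is one, in line with the statement above.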
As a further embodiment, the application of the inventive concept to the 5-1-5₂ tree structure will be outlined within the following paragraphs. For the creation of a high-fidelity stereo signal, the first two CLD and ICC parameter sets corresponding to the top branches of the tree are relevant.
The two CLD parameters $\mathrm{CLD}_X$, $X = 0, 1$, are used first to obtain normalized channel powers of the combined left and right channels and the center channel
$L = (c_{1,0}\,c_{1,1})^2$,
$R = (c_{1,0}\,c_{2,1})^2$,
$C = c_{2,0}^2$,
where the channel gains are defined by
The goal is to derive the powers and correlation of the downmix channels
$l_0 = l + qc$,
$r_0 = r + qc$,
where the center downmix weight is $q = 1/\sqrt{2}$. Computing powers from this assumption gives the result
$L_0 = L + q^2 C + 2\,\mathrm{Re}\langle l, qc\rangle$,
$R_0 = R + q^2 C + 2\,\mathrm{Re}\langle r, qc\rangle$.
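An advantageous assumption here is that the ICC between the channels $l$ and $c$ and the ICC between the channels $r$ and $c$ are both the same as the given $\mathrm{ICC}_0$ between the channels $l + r$ and $c$. This assumption leads to the estimates
$\mathrm{Re}\langle l, c\rangle = \mathrm{ICC}_0 \sqrt{LC}$,
$\mathrm{Re}\langle r, c\rangle = \mathrm{ICC}_0 \sqrt{RC}$,
such that, with $2q = \sqrt{2}$, the estimates of the normalized powers become
$L_0 = L + C/2 + \sqrt{2}\,\mathrm{ICC}_0 \sqrt{LC}$,
$R_0 = R + C/2 + \sqrt{2}\,\mathrm{ICC}_0 \sqrt{RC}$.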
As in the preceding embodiment, having the power values L0 and R0, the desired CLD parameter can be derived:
Deriving the correlation and finally the ICC parameter starts from the general definition of the correlation value:
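$p = \mathrm{Re}\langle l_0, r_0\rangle = q^2 C + \mathrm{Re}\langle l, r\rangle + q\,\mathrm{Re}\langle c, l + r\rangle$.
All the necessary information is available from the parameters of the 5-1-5₂ tree structure since
$\mathrm{Re}\langle c, l + r\rangle = \mathrm{ICC}_0 \sqrt{C}\,\|l + r\|$,
$\|l + r\|^2 = L + R + 2\,\mathrm{Re}\langle l, r\rangle$,
$\mathrm{Re}\langle l, r\rangle = \mathrm{ICC}_1 \sqrt{LR}$.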
The final results can be written out as
The required gain adjustment factor g is defined by:
$g = \sqrt{L_0 + R_0}$
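The second embodiment can likewise be sketched as one routine. The channel gains are assumed to have the power-splitting form $c_1^2 / c_2^2 = 10^{\mathrm{CLD}/10}$ with $c_1^2 + c_2^2 = 1$, the output ICC is assumed to be $p$ normalized by $\sqrt{L_0 R_0}$, and the final clamp to [-1, 1] is an added practical safeguard rather than part of the derivation. All names are hypothetical.

```python
import math

Q = 1.0 / math.sqrt(2.0)  # center downmix weight q

def channel_gains(cld_db):
    """Assumed power-splitting gains (c1, c2) for one CLD value in dB."""
    ratio = 10.0 ** (cld_db / 10.0)
    return math.sqrt(ratio / (1.0 + ratio)), math.sqrt(1.0 / (1.0 + ratio))

def stereo_params_5152(cld0, icc0, cld1, icc1):
    """Derive (CLD in dB, ICC, gain g) for a stereo upmix in the 5-1-5_2 case."""
    c10, c20 = channel_gains(cld0)
    c11, c21 = channel_gains(cld1)
    L, R, C = (c10 * c11) ** 2, (c10 * c21) ** 2, c20 ** 2

    # Powers including the center cross terms estimated via ICC0.
    L0 = L + Q * Q * C + 2.0 * Q * icc0 * math.sqrt(L * C)
    R0 = R + Q * Q * C + 2.0 * Q * icc0 * math.sqrt(R * C)

    # Correlation between the two output channels.
    re_lr = icc1 * math.sqrt(L * R)
    norm_lr = math.sqrt(max(L + R + 2.0 * re_lr, 0.0))
    p = Q * Q * C + re_lr + Q * icc0 * math.sqrt(C) * norm_lr

    cld = 10.0 * math.log10(L0 / R0)
    icc = max(-1.0, min(1.0, p / math.sqrt(L0 * R0)))  # safeguard, an assumption
    g = math.sqrt(L0 + R0)
    return cld, icc, g
```

Unlike in the 5-1-5₁ case, the gain factor here generally differs from one, because the center cross terms add to the sum of the output powers whenever ICC0 is non-zero.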
It may be noted that the generated CLD and ICC parameters may further be quantized, to enable the use of lookup tables in the decoder for upmix matrix creation rather than performing the complex calculations. This further increases the efficiency of the upmix process.
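A nearest-neighbour quantizer is one simple way to map the derived parameters onto table indices. The sketch below is illustrative only: the grid values are hypothetical placeholders and are not the quantization tables of any standard.

```python
# Hypothetical quantization grids, chosen only for illustration.
HYPOTHETICAL_CLD_TABLE = [-15.0, -10.0, -6.0, -3.0, 0.0, 3.0, 6.0, 10.0, 15.0]
HYPOTHETICAL_ICC_TABLE = [-1.0, 0.0, 0.5, 0.8, 1.0]

def quantize(value, table):
    """Return (index, table value) of the grid entry closest to value."""
    index = min(range(len(table)), key=lambda i: abs(table[i] - value))
    return index, table[index]
```

The resulting index pair could then address a precomputed upmix matrix, so that no square roots or logarithms are evaluated per tile at run time.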
Generally, upmix is possible using already existing OTT modules. This has the advantage that the inventive concept can be easily implemented in already existing decoding scenarios.
Generally, the upmix matrix can be described as follows:
Therefore, having inventively derived the parameters CLD and ICC, stereo upmix of a transmitted downmix can be performed with high fidelity using standard upmix modules.
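To make the role of the two derived parameters concrete, the following sketch builds a 2×2 matrix that maps a downmix sample and a decorrelated sample to a left and a right output with a prescribed CLD and ICC. It is a simplified rotation-based construction chosen for illustration and is not claimed to be the upmix matrix of the text or of any standard. It assumes the downmix and decorrelator signals are uncorrelated and of equal power; all names are hypothetical.

```python
import math

def ott_upmix_matrix(cld_db, icc, g=1.0):
    """Return a 2x2 matrix H so that [left, right] = H @ [m, d].

    Simplified sketch: gains set the level difference, a rotation angle
    sets the correlation between the two outputs.
    """
    ratio = 10.0 ** (cld_db / 10.0)
    c1 = math.sqrt(ratio / (1.0 + ratio))
    c2 = math.sqrt(1.0 / (1.0 + ratio))
    alpha = 0.5 * math.acos(max(-1.0, min(1.0, icc)))
    return [
        [g * c1 * math.cos(alpha), g * c1 * math.sin(alpha)],
        [g * c2 * math.cos(alpha), -g * c2 * math.sin(alpha)],
    ]

def upmix(matrix, downmix, decorrelated):
    """Apply the matrix sample by sample to a downmix and a decorrelated signal."""
    left = [matrix[0][0] * m + matrix[0][1] * d for m, d in zip(downmix, decorrelated)]
    right = [matrix[1][0] * m + matrix[1][1] * d for m, d in zip(downmix, decorrelated)]
    return left, right
```

Because the rows have squared norms in the ratio given by the CLD and a normalized inner product of cos(2α) = ICC, the outputs carry the requested level difference and coherence under the stated assumptions.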
In a further embodiment of the present invention, an inventive channel reconstructor comprises a parameter calculator for deriving upmix parameters and an upmixer for deriving an intermediate channel representation using the upmix parameters and a transmitted downmix signal.
The inventive concept is again outlined in
It may be noted that the inventive concept can easily be adapted to scenarios with an upmix comprising more than two channels. The upmix is in that sense generally defined as an intermediate channel representation of the multi-channel signal, wherein the intermediate channel representation has more channels than the downmix signal and fewer channels than the multi-channel signal. One common scenario is a configuration in which an additional center channel is reconstructed.
The application of the inventive concept is again outlined in
The single CLD and ICC parameters 508 and 510 are input in the OTT module 520 to steer the upmix of the monophonic downmix signal 522. Thus, at the output of the OTT module 520, a stereo signal 524 can be provided as an intermediate channel representation of the multi-channel signal.
A bit stream can be input at the input 602 of the inventive receiver/audio player 600. The decoder 601 then decodes the bit stream and the decoded signal is output or played at the output 604 of the inventive receiver/audio player 600.
Although the inventive concept has been outlined mainly with respect to MPEG surround coding, it is of course by no means limited to the application to the specific parametric coding scenario. Because of the high flexibility of the inventive concept, it can be easily applied to other coding schemes as well, such as for example to 7.1 or 7.2 channel configurations or BCC schemes.
Although the embodiments of the present invention relating to MPEG-coding introduce some simplifying assumptions for the generation of the common CLD and ICC parameter, this is not mandatory. It is of course also possible to not introduce those simplifications.
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk, DVD or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
While the foregoing has been particularly shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made without departing from the spirit and scope thereof. It is to be understood that various changes may be made in adapting to different embodiments without departing from the broader concepts disclosed herein and comprehended by the claims that follow.
Number | Date | Country | Kind |
---|---|---|---|
0600713 | Mar 2006 | SE | national |
This application claims priority to U.S. patent application Ser. No. 60/788,911 filed Apr. 3, 2006, and Sweden patent application number 0600713-2, filed Mar. 29, 2006, which are incorporated herein in their entirety by these references made thereto.
Number | Date | Country |
---|---|---|
20070233293 A1 | Oct 2007 | US |
Number | Date | Country |
---|---|---|
60788911 | Apr 2006 | US |