This application is a continuation of copending International Application No. PCT/EP2005/011586, filed Oct. 28, 2005, which designated the United States, and was not published in English and is incorporated herein by reference in its entirety.
1. Field of the Invention
The present invention relates to multi-channel reconstruction of audio signals based on an available stereo signal and additional control data.
2. Description of Prior Art
Recent developments in audio coding have made it possible to recreate a multi-channel representation of an audio signal based on a stereo (or mono) signal and corresponding control data. These methods differ substantially from older matrix-based solutions such as Dolby Pro Logic, since additional control data is transmitted to control the re-creation, also referred to as up-mix, of the surround channels based on the transmitted mono or stereo channels.
Hence, parametric multi-channel audio decoders reconstruct N channels based on M transmitted channels, where N>M, and the additional control data. The additional control data requires a significantly lower data rate than transmitting the additional N−M channels. This makes the coding very efficient while ensuring compatibility with both M-channel and N-channel devices.
These parametric surround coding methods usually comprise a parameterisation of the surround signal based on IID (Inter-channel Intensity Difference) and ICC (Inter-Channel Coherence). These parameters describe power ratios and correlation between channel pairs in the up-mix process. Further parameters used in the prior art comprise prediction parameters, which are used to predict intermediate or output channels during the up-mix procedure.
One of the most appealing uses of prediction-based methods as described in the prior art is a system that re-creates 5.1 channels from two transmitted channels. In this configuration, a stereo signal is available at the decoder side, which is a downmix of the original 5.1 multi-channel signal. In this context it is particularly important to extract the center channel from the stereo signal as accurately as possible, since the center channel is usually downmixed to both the left and the right downmix channel. This is done by estimating two prediction coefficients describing how much of each of the two transmitted channels is used to build the center channel. These parameters are estimated for different frequency regions, similarly to the IID and ICC parameters above.
However, since the prediction parameters do not describe a power ratio of two signals but are based on waveform matching in a least-squares sense, the method is inherently sensitive to any modification of the stereo waveform after the prediction parameters have been calculated.
Further developments in audio coding over recent years have introduced High Frequency Reconstruction (HFR) methods as a very useful tool in audio codecs at low bitrates. One example is SBR (Spectral Band Replication) [WO 98/57436], which is used in MPEG standardized codecs such as MPEG-4 High Efficiency AAC. Common to these methods is that they re-create the high frequencies on the decoder side from a narrow-band signal coded by the underlying core codec and a small amount of additional guidance information. As in the parametric reconstruction of multi-channel signals from one or two channels, the amount of control data required to re-create the missing signal components (in the case of SBR, the high frequencies) is significantly smaller than the amount of data that would be required to code the entire signal with a waveform codec.
It should be understood, however, that the re-created highband signal is perceptually equal to the original highband signal, while the actual waveform differs significantly. Furthermore, when waveform coders code stereo signals at low bitrates, stereo pre-processing is commonly used, which means that the side signal of the mid/side representation of the stereo signal is limited.
When a multi-channel representation is desired based on a stereo codec signal using MPEG-4 High Efficiency AAC or any other codec utilising high frequency reconstruction techniques, these and other aspects of the codec used to code the down-mixed stereo signal must be considered.
Even further, for a recording that is available as a multi-channel audio signal, a dedicated stereo mix is commonly available that is not an automated down-mix of the multi-channel signal. This is commonly referred to as an “artistic down-mix”. Such a down-mix cannot be expressed as a linear combination of the multi-channel signals.
It is an object of the present invention to provide an improved multi-channel down-mix/encoder or up-mix/decoder concept, which results in a better quality reconstructed multi-channel output.
In accordance with a first aspect, the invention provides a multi-channel synthesizer for generating at least three output channels using an input signal having at least one base channel, the base channel being derived from the original multi-channel signal, having:
an up-mixer for up-mixing the at least one base channel based on an energy-loss introducing up-mixing rule so that the at least three output channels are obtained,
wherein the up-mixer is operative to generate the at least three output channels in response to an energy measure and at least two different up-mixing parameters so that the at least three output channels have an energy higher than an energy of a signal obtained by only using the energy-loss introducing up-mixing rule instead of an energy error, the energy error depending on the energy-loss introducing up-mixing rule, and
wherein the at least two different up-mixing parameters and the energy measure for controlling the up-mixer are included in the input signal.
In accordance with a second aspect, the invention provides an encoder for processing a multi-channel input signal, having an energy measure calculator for calculating an energy measure depending on an energy difference between a multi-channel input signal or an at least one base channel derived from the multi-channel input signal and an up-mixed signal generated by an energy-loss introducing up-mixing operation; and
an output interface for outputting the at least one base channel after being scaled by a scaling factor dependent on the energy measure or for outputting the energy measure.
In accordance with a third aspect, the invention provides a method of generating at least three output channels using an input signal having at least one base channel, the base channel being derived from the original multi-channel signal, the method including the steps of:
up-mixing the at least one base channel based on an energy-loss introducing up-mixing rule so that the at least three output channels are obtained,
wherein, in the step of upmixing, the at least three output channels are generated in response to an energy measure and at least two different up-mixing parameters so that the at least three output channels have an energy higher than an energy of a signal obtained by only using the energy-loss introducing up-mixing rule instead of an energy error, the energy error depending on the energy-loss introducing up-mixing rule, and
wherein the at least two different up-mixing parameters and the energy measure for controlling the up-mixer are included in the input signal.
In accordance with a fourth aspect, the invention provides a method of processing a multi-channel input signal, the method including the steps of:
calculating an energy measure depending on an energy difference between a multi-channel input signal or an at least one base channel derived from the multi-channel input signal and an up-mixed signal generated by an energy-loss introducing up-mixing operation; and
outputting the at least one base channel after being scaled by a scaling factor dependent on the energy measure or outputting the energy measure.
In accordance with a fifth aspect, the invention provides an encoded multi-channel information signal having at least one base channel scaled by an energy measure depending on an energy difference between a multi-channel input signal or an at least one base channel derived from the multi-channel input signal and an up-mixed signal generated by an energy-loss introducing up-mixing operation or having the energy measure or for outputting the energy measure.
In accordance with a sixth aspect, the invention provides a machine-readable medium having stored thereon an encoded multi-channel information signal having at least one base channel scaled by an energy measure depending on an energy difference between a multi-channel input signal or an at least one base channel derived from the multi-channel input signal and an up-mixed signal generated by an energy-loss introducing up-mixing operation or having the energy measure or for outputting the energy measure.
The present invention relates to the problem of waveform modification of the down-mixed multi-channel signal when prediction-based up-mix methods are used. This includes the case where the down-mixed signal is coded by a codec performing stereo pre-processing, high frequency reconstruction or other coding schemes that significantly modify the waveform. Furthermore, the invention addresses the problem that arises when predictive up-mix techniques are used for an artistic down-mix, i.e. a down-mix signal that is not automatically derived from the multi-channel signal.
The present invention comprises the following features:
The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the invention, with reference to the accompanying drawings, in which:
a a schematic diagram of a preferred multi-channel encoder;
b a flow chart of the preferred method performed by the device of
a a multi-channel encoder having a spectral band replication functionality for generating a different parameterisation compared to the device in
b a tabular illustration of frequency-selective generation and transmission of parametric data; and
a an inventive decoder illustrating the calculation of up-mix matrix coefficients;
b a detailed description of parameter calculation for the predictive up-mix;
The below-described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
It is emphasized that the subsequently described parameter calculation, application, up-mixing, down-mixing or any other actions can be performed on a frequency-band-selective basis, i.e. for subbands in a filterbank.
In order to outline the advantages of the present invention, a more detailed description of a predictive upmix as known from the prior art is given first. Assume a three-channel upmix based on two downmix channels, as outlined in
Assume the following definitions, where $X$ is a 3×L matrix containing the three signal segments $l(k)$, $r(k)$, $c(k)$, $k = 0, \ldots, L-1$, as rows.
Likewise, let the two downmixed signals $l_0(k)$, $r_0(k)$ form the rows of $X_0$. The downmix process is described by
$X_0 = DX$ (1)
where the downmix matrix is defined by
A preferred choice of downmix matrix is
$D_\alpha = \begin{pmatrix} 1 & 0 & \alpha \\ 0 & 1 & \alpha \end{pmatrix}$ (3)
which means that the left downmix signal $l_0(k)$ contains only $l(k)$ and $\alpha c(k)$, and $r_0(k)$ contains only $r(k)$ and $\alpha c(k)$. This downmix matrix is preferred since it assigns an equal amount of the center channel to the left and right downmix, and since it does not assign any of the original right channel to the left downmix or vice versa.
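As a quick illustration of equations (1) and (3), the following sketch applies $D_\alpha$ to three short real-valued segments in plain Python. The helper names, the value $\alpha = 0.5$ and the sample values are illustrative assumptions, not part of the described method.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices (rows x columns)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def downmix_matrix(alpha):
    """Preferred downmix D_alpha of equation (3): the centre goes equally to both
    downmix channels, with no left/right cross-talk."""
    return [[1.0, 0.0, alpha],
            [0.0, 1.0, alpha]]


# X holds the segments l(k), r(k), c(k) as rows (here L = 3 samples each).
X = [[1.0, 0.0, -1.0],   # l(k)
     [0.0, 1.0, 2.0],    # r(k)
     [2.0, 2.0, 0.0]]    # c(k)

X0 = matmul(downmix_matrix(0.5), X)   # equation (1): X0 = D X
print(X0)  # [[2.0, 1.0, -1.0], [1.0, 2.0, 2.0]]
```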
The upmix is defined by
$\hat{X} = C X_0$ (4)
where C is a 3×2 upmix matrix.
The predictive upmix as known from prior art relies on the idea of solving the overdetermined system
$C X_0 = X$ (5)
for C in the least squares sense. This leads to the normal equations
$C X_0 X_0^* = X X_0^*$ (6)
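Multiplying (6) from the left with $D$ gives $DCX_0X_0^* = X_0X_0^*$, which, in the generic case where $X_0X_0^* = DXX^*D^*$ is non-singular, implies
$DC = I_2$ (7)
where $I_n$ denotes the $n \times n$ identity matrix. This relation reduces the parameter space of $C$ to dimension two, since the six entries of the 3×2 matrix $C$ are subject to four linear constraints.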
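Given the above, the upmix matrix
$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{pmatrix}$
can be completely defined on the decoder side if the downmix matrix $D$ is known and two elements of the $C$ matrix are transmitted, e.g. $c_{11}$ and $c_{22}$.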
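Because $DC = I_2$ gives four linear constraints on the six entries of $C$, a decoder that knows $D_\alpha$ from (3) can complete $C$ from $c_{11}$ and $c_{22}$. Solving the constraints by hand gives $c_{31} = (1 - c_{11})/\alpha$, $c_{32} = (1 - c_{22})/\alpha$, $c_{21} = -\alpha c_{31}$ and $c_{12} = -\alpha c_{32}$. The sketch below encodes that algebra; the function names and example values are hypothetical.

```python
def complete_upmix_matrix(c11, c22, alpha):
    """Rebuild the 3x2 upmix matrix C from the transmitted entries c11, c22,
    assuming the downmix D_alpha = [[1, 0, alpha], [0, 1, alpha]] and D C = I_2."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    c31 = (1.0 - c11) / alpha   # from c11 + alpha * c31 = 1
    c32 = (1.0 - c22) / alpha   # from c22 + alpha * c32 = 1
    return [[c11, -alpha * c32],   # from c12 + alpha * c32 = 0
            [-alpha * c31, c22],   # from c21 + alpha * c31 = 0
            [c31, c32]]


def downmix_times_upmix(alpha, C):
    """Return D_alpha C as a nested list; it should equal the 2x2 identity."""
    D = [[1.0, 0.0, alpha], [0.0, 1.0, alpha]]
    return [[sum(D[i][k] * C[k][j] for k in range(3)) for j in range(2)] for i in range(2)]


C = complete_upmix_matrix(0.6, 0.7, 0.5)
print(downmix_times_upmix(0.5, C))  # the 2x2 identity, up to rounding
```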
The residual (prediction error) signals are given by
$X_r = X - \hat{X} = (I_3 - CD)X$ (8)
Multiplying from the left with $D$ yields
$DX_r = (D - DCD)X = 0$ (9)
due to (7). It follows that there is a 1×L row vector signal $x_r$ such that
$X_r = v x_r$ (10)
where $v$ is a 3×1 unit vector spanning the kernel (null space) of $D$. For instance, in the case of downmix (3), one can use
$v = \frac{1}{\sqrt{1+2\alpha^2}} \begin{pmatrix} \alpha \\ \alpha \\ -1 \end{pmatrix}$ (11)
In general, when $v = [v_l, v_r, v_c]^T$ and $\hat{X} = [\hat{l}(k), \hat{r}(k), \hat{c}(k)]^T$, this just means that, up to a weight factor, the residual signal is common to all three channels:
$l(k) = \hat{l}(k) + v_l x_r(k)$
$r(k) = \hat{r}(k) + v_r x_r(k)$
$c(k) = \hat{c}(k) + v_c x_r(k)$ (12)
Due to the orthogonality principle, the residual $x_r(k)$ is orthogonal to all three predicted signals $\hat{l}(k)$, $\hat{r}(k)$, $\hat{c}(k)$.
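The derivation (5) to (12) can be checked numerically. The sketch below uses plain Python and real-valued signals, so $^*$ is a plain transpose. It solves the normal equations (6) for random test signals and checks $DC = I_2$, $DX_r = 0$, the common-residual structure of (12) and the orthogonality of the residual to the prediction. The random Gaussian test data and the helper names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(m):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def predictive_upmix(X, D):
    """Least-squares solution of C X0 = X, i.e. C = X X0^T (X0 X0^T)^-1 from the
    normal equations (6). Returns C, the prediction X_hat = C X0 and X_r = X - X_hat."""
    X0 = matmul(D, X)
    X0t = transpose(X0)
    C = matmul(matmul(X, X0t), inv2(matmul(X0, X0t)))
    X_hat = matmul(C, X0)
    X_r = [[x - xh for x, xh in zip(row, row_hat)] for row, row_hat in zip(X, X_hat)]
    return C, X_hat, X_r


rng = random.Random(0)
alpha = 0.5
D = [[1.0, 0.0, alpha], [0.0, 1.0, alpha]]
X = [[rng.gauss(0.0, 1.0) for _ in range(256)] for _ in range(3)]  # rows l, r, c
C, X_hat, X_r = predictive_upmix(X, D)

# Residual rows are proportional to v ~ [alpha, alpha, -1]: the l and r residuals
# are equal, and each equals -alpha times the centre residual.
max_dev = max(max(abs(a - b), abs(a + alpha * c)) for a, b, c in zip(*X_r))
print(max_dev < 1e-9)  # True
```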
Problems Solved and Improvements Obtained by Preferred Embodiments of the Present Invention
Evidently the following problems arise when using prediction based up-mix according to prior art as outlined above:
Energy Compensation
As mentioned above, one of the problems with prediction-based multi-channel reconstruction is that the prediction error corresponds to an energy loss in the three reconstructed channels. Below, the theory of this energy loss and a solution as taught by preferred embodiments are outlined. First, the theoretical analysis is performed; subsequently, a preferred embodiment of the present invention based on this theory is given.
Let $E$, $\hat{E}$ and $E_r$ be the sum of the energies of the original signals in $X$, the predicted signals in $\hat{X}$ and the prediction error signals in $X_r$, respectively. From orthogonality, it follows that
$E = \hat{E} + E_r$ (13)
The total prediction gain can be defined as
but in the following it will be more convenient to consider the parameter
$\rho^2 = \hat{E}/E$ (14)
Hence, $\rho^2 \in [0,1]$ measures the total relative energy of the predictive upmix.
Given this $\rho$, it is possible to readjust each channel by applying a compensation gain, $\hat{z}_g(k) = g_z \hat{z}(k)$, such that $\|\hat{z}_g\|^2 = \|z\|^2$ for $z = l, r, c$. Specifically, by (12) and the orthogonality of $x_r$ to $\hat{z}$, the target energy is given by
$\|z\|^2 = \|\hat{z}\|^2 + v_z^2 \|x_r\|^2$ (15)
so we need to solve
$g_z^2 \|\hat{z}\|^2 = \|\hat{z}\|^2 + v_z^2 \|x_r\|^2$ (16)
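Here, since $v$ is a unit vector,
$E_r = \|x_r\|^2$, (17)
and it follows from the definition (14) of $\rho$ and (13) that
$E_r = E - \hat{E} = \frac{1-\rho^2}{\rho^2}\hat{E}$ (18)
Putting all this together, we arrive at the gain
$g_z = \sqrt{1 + v_z^2 \frac{1-\rho^2}{\rho^2} \frac{\hat{E}}{\|\hat{z}\|^2}}$ (19)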
It is evident that with this method, in addition to transmitting $\rho$, the energy distribution of the decoded channels has to be computed at the decoder. Moreover, only the energies are reconstructed correctly, while the off-diagonal correlation structure is ignored.
It is also possible to derive a gain value that preserves the total energy, while not ensuring that the energies of the individual channels are correct. A common gain for all channels, $g_z = g$, that preserves the total energy is obtained via the defining equation $g^2\hat{E} = E$. That is,
$g = 1/\rho$ (20)
By linearity, this gain can be applied in the encoder to the downmixed signals, so that no additional parameter has to be transmitted.
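The gains (19) and (20) can be sketched as follows, again in plain Python with real-valued random test signals (an assumption for illustration). The encoder computes $\rho$ from (14); the decoder-side helper `channel_gains` (a hypothetical name) needs only $\rho$, $v$ and the decoded channels. The last lines apply the common gain $g = 1/\rho$ to the downmix on the encoder side. By linearity, the predicted upmix is then scaled by the same factor.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def energy(row):
    return sum(x * x for x in row)


def channel_gains(X_hat, rho, v):
    """Per-channel compensation gains g_z of (19), computed from the decoded
    channels, the transmitted rho and the known null-space vector v."""
    E_hat = sum(energy(z) for z in X_hat)
    E_r = E_hat * (1.0 - rho ** 2) / rho ** 2          # (18)
    return [math.sqrt(1.0 + vz ** 2 * E_r / energy(z)) for vz, z in zip(v, X_hat)]


# Encoder: least-squares prediction as in (6), then rho from (14).
rng = random.Random(1)
alpha = 0.5
D = [[1.0, 0.0, alpha], [0.0, 1.0, alpha]]
X = [[rng.gauss(0.0, 1.0) for _ in range(256)] for _ in range(3)]
X0 = matmul(D, X)
C = matmul(matmul(X, transpose(X0)), inv2(matmul(X0, transpose(X0))))
X_hat = matmul(C, X0)
E = sum(energy(z) for z in X)
rho = math.sqrt(sum(energy(z) for z in X_hat) / E)

# Decoder: the per-channel gains (19) restore every channel energy.
norm = math.sqrt(1.0 + 2.0 * alpha ** 2)
v = [alpha / norm, alpha / norm, -1.0 / norm]
g = channel_gains(X_hat, rho, v)
restored = [energy([gz * x for x in z]) for gz, z in zip(g, X_hat)]

# Encoder-side alternative: scale the downmix by g = 1/rho (20); by linearity the
# predicted upmix then carries the original total energy E.
X_hat_scaled = matmul(C, [[x / rho for x in row] for row in X0])
```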
In
In an alternative implementation of the present invention the energy correction can be done on the encoder side.
A preferred example for a down-mixing matrix corresponding to equation (3) is noted below the down-mixer in
As will be outlined later on, for the present case of a down-mixer having three input channels and two output channels, at least two additional up-mix parameters $c_1$, $c_2$ are required. When the down-mixing matrix $D$ is variable or not fully known to a decoder, additional information on the down-mix used also has to be transmitted from the encoder side to the decoder side, in addition to the parameters 105 and 106.
Correlation Structure
One of the problems with the up-mix procedure described in the prior art is that it does not reconstruct the correct correlation between the re-created channels. As outlined above, the centre channel is predicted as a linear combination of the left and right down-mix channels, and the left and right channels are reconstructed by subtracting the predicted centre channel from the left and right down-mix channels. Consequently, the prediction error leaves remains of the original centre channel in the predicted left and right channels. This implies that the correlations between the three reconstructed channels are not the same as those between the three original channels.
A preferred embodiment teaches that the predicted three channels should be combined with de-correlated signals in accordance with the measured prediction error.
The basic theory for achieving the correct correlation structure is now outlined. The special structure of the residual can be used to reconstruct the full 3×3 correlation structure XX* by substituting a de-correlated signal xd for the residual in the decoder.
First, note that the normal equations (6) lead to $X_r X_0^* = 0$, so
$X_r \hat{X}^* = 0, \quad \hat{X} X_r^* = 0$ (21)
Hence, as $X = \hat{X} + X_r$,
$XX^* = \hat{X}\hat{X}^* + X_r X_r^* = \hat{X}\hat{X}^* + vv^* E_r$ (22)
where (10) and (17) were applied for the last equality.
Let $x_d$ be a signal de-correlated from all decoded signals $\hat{l}$, $\hat{r}$, $\hat{c}$, such that $\hat{X} x_d^* = 0$. The enhanced signal
$Y = \hat{X} + v x_d$ (23)
then has the correlation matrix
$YY^* = \hat{X}\hat{X}^* + vv^* \|x_d\|^2$ (24)
In order to completely reproduce the original correlation matrix (22), it suffices that
$\|x_d\|^2 = E_r$ (25)
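The sufficiency of (25) can be illustrated numerically. In the sketch below the de-correlated signal is idealised. A noise signal is projected onto the orthogonal complement of the downmix rows, and hence of every row of $\hat{X} = CX_0$, and then rescaled to energy $E_r$. This projection is an assumption made only so that $\hat{X}x_d^* = 0$ holds exactly; it is not a practical decorrelator. The resulting $YY^*$ of (23) and (24) matches $XX^*$ up to rounding. All helper names are hypothetical.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def energy(row):
    return sum(x * x for x in row)


def idealised_decorrelated(noise, X0, target_energy):
    """Hypothetical stand-in for a decorrelator: remove from `noise` its projection
    onto the downmix rows, then rescale to the requested energy."""
    coeff = matmul(inv2(matmul(X0, transpose(X0))),
                   [[sum(a * b for a, b in zip(row, noise))] for row in X0])
    xd = [n - coeff[0][0] * p - coeff[1][0] * q for n, p, q in zip(noise, X0[0], X0[1])]
    scale = math.sqrt(target_energy / energy(xd))
    return [scale * x for x in xd]


rng = random.Random(2)
alpha, L = 0.5, 256
D = [[1.0, 0.0, alpha], [0.0, 1.0, alpha]]
X = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(3)]
X0 = matmul(D, X)
C = matmul(matmul(X, transpose(X0)), inv2(matmul(X0, transpose(X0))))
X_hat = matmul(C, X0)
E_r = sum(energy(z) for z in X) - sum(energy(z) for z in X_hat)   # from (13)

norm = math.sqrt(1.0 + 2.0 * alpha ** 2)
v = [alpha / norm, alpha / norm, -1.0 / norm]
x_d = idealised_decorrelated([rng.gauss(0.0, 1.0) for _ in range(L)], X0, E_r)  # (25)
Y = [[zh + vz * xd for zh, xd in zip(z_hat, x_d)] for vz, z_hat in zip(v, X_hat)]  # (23)

R_X = matmul(X, transpose(X))
R_Y = matmul(Y, transpose(Y))
print(max(abs(a - b) for ra, rb in zip(R_X, R_Y) for a, b in zip(ra, rb)) < 1e-6)  # True
```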
If $x_d$ is obtained by de-correlating the downmixed signal, say
followed by a gain $\gamma$, then it should hold that
This gain can be computed in the encoder. However, if the more well-defined parameter $\rho^2 \in [0,1]$ from (14) is to be used, estimation of $\hat{E}$ and
has to be performed in the decoder. In light of this, a more attractive alternative is to generate $x_d$ using three decorrelators
$x_d = \gamma \cdot (d_1\{\hat{l}\} + d_2\{\hat{r}\} + d_3\{\hat{c}\})$ (26a)
since then $\|x_d\|^2 = \gamma^2 \hat{E}$, so (25) is satisfied by the choice
$\gamma = \sqrt{E_r/\hat{E}} = \frac{\sqrt{1-\rho^2}}{\rho}$
The mixing of the predictive up-mixed signals with decorrelated versions of the same is an essential feature of the present invention. In
A third preferred embodiment uses decorrelators 501, 502, 503 for the up-mixed channels. A de-correlated signal can also be generated by a de-correlator 501′, which receives, as an input signal, the down-mix channel or even all down-mix channels. Furthermore, in case of more than one down-mix channel, as shown in
Furthermore, it is outlined in connection with
In
Regarding the channel-specific down-mix-dependent parameter νz, the same remarks as outlined above with respect to
Furthermore, it is to be noted here that the
When only a part of the residual energy is to be covered by a de-correlated signal, the pre-correction only has to be partly removed by pre-scaling the signal input into the mixing box 504, 505, 506 by a ρ-dependent factor, which is, however, closer to one than the factor ρ itself. Naturally, this partly compensating pre-scaling factor will depend on the encoder-generated signal K input at 605 in
Controlling the Degree of Decorrelation
A preferred embodiment of the invention teaches that the amount of de-correlation added to the predicted up-mixed signals can be controlled from the encoder, while still maintaining the correct output energy. This is because, in a typical “interview” example with dry speech in the center channel and ambience in the left and right channels, substituting a de-correlated signal for the prediction error in the center channel may be undesirable.
According to a preferred embodiment of the present invention an alternative mixing procedure to the one outlined in
We will assume that a total-energy-preserving gain compensation (20) has been performed on the downmixed signal, so that we first obtain the decoded signal $\hat{X}/\rho$. From this, a decorrelated signal $d$ with the same total energy, $\|d\|^2 = \hat{E}/\rho^2$, is produced, for instance by use of three decorrelators as in the previous section. The total upmix is then defined according to
$Y = \frac{\kappa}{\rho}\hat{X} + \sqrt{1-\kappa^2}\, v\, d$
where $\kappa \in [\rho, 1]$ is a transmitted parameter. The choice $\kappa = 1$ corresponds to total energy preservation without decorrelated signal addition, and $\kappa = \rho$ corresponds to full 3×3 correlation structure reproduction. We have
$YY^* = \frac{\kappa^2}{\rho^2}\hat{X}\hat{X}^* + (1-\kappa^2)\frac{\hat{E}}{\rho^2}\, vv^*$ (30)
so the total energy is preserved for all $\kappa \in [\rho, 1]$, as can be seen by computing the traces (sums of diagonal values) of the matrices in (30). However, correct individual energies are only obtained for $\kappa = \rho$.
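The energy bookkeeping behind (30) can be checked on correlation matrices alone. The sketch assumes the total upmix has the form $Y = (\kappa/\rho)\hat{X} + \sqrt{1-\kappa^2}\,v\,d$, with $\|d\|^2 = \hat{E}/\rho^2$ and $d$ uncorrelated with $\hat{X}$, which reproduces both limiting cases described above. The form, the function names and the example numbers are illustrative assumptions. The trace stays at $E = \hat{E}/\rho^2$ for every $\kappa$, while the diagonal matches the original channel energies only at $\kappa = \rho$.

```python
import math


def mixed_correlation(R_hat, rho, kappa, v):
    """Correlation matrix of Y = (kappa/rho) X_hat + sqrt(1 - kappa^2) v d, given
    R_hat = X_hat X_hat^T, assuming ||d||^2 = E_hat / rho^2 and d uncorrelated with X_hat."""
    E_hat = sum(R_hat[i][i] for i in range(3))
    a = kappa ** 2 / rho ** 2
    b = (1.0 - kappa ** 2) * E_hat / rho ** 2
    return [[a * R_hat[i][j] + b * v[i] * v[j] for j in range(3)] for i in range(3)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


alpha = 0.5
norm = math.sqrt(1.0 + 2.0 * alpha ** 2)
v = [alpha / norm, alpha / norm, -1.0 / norm]
R_hat = [[2.0, 0.5, 1.0],
         [0.5, 2.0, 1.0],
         [1.0, 1.0, 3.0]]      # example predicted-signal correlation, E_hat = 7
rho = math.sqrt(0.7)           # so E = E_hat / rho^2 = 10 and E_r = 3
E_r = 3.0
R_X = [[R_hat[i][j] + E_r * v[i] * v[j] for j in range(3)] for i in range(3)]  # (22)

for kappa in (rho, 0.9, 1.0):
    R_Y = mixed_correlation(R_hat, rho, kappa, v)
    print(round(kappa, 3), round(trace(R_Y), 6))  # the total energy stays at 10
```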
The above-described embodiment of the present invention allows the system to employ a detection mechanism on the encoder side that estimates the amount of de-correlation to be added in the prediction-based up-mix. The implementation described in
This means that, for an example with three ambient signals, e.g. a classical music piece with a lot of ambience, the encoder can detect the lack of a “dry” center channel and let the decoder replace the entire prediction error with a de-correlated signal. This re-creates the ambience of the sound from the three channels in a way that would not be possible with prior-art prediction-based methods alone. Furthermore, for a signal with a dry center channel, e.g. speech in the center channel and ambient sounds in the left and right channels, the encoder detects that replacing the prediction error by a de-correlated signal is not psycho-acoustically correct. It instead lets the decoder adjust the levels of the three reconstructed channels so that the energy of the three channels is correct. Obviously, the examples above represent two extreme outcomes of the invention; the invention is not limited to these extreme cases.
Adapting the Prediction Coefficients to Modified Waveforms.
As outlined above, the prediction parameters are estimated by minimising the mean square error given the original three channels $X$ and a downmix matrix $D$. However, in many situations it cannot be relied upon that the downmixed signal can be described as a downmix matrix $D$ multiplied by a matrix $X$ describing the original multi-channel signal. One obvious example is when a so-called “artistic downmix” is used, i.e. the two-channel downmix cannot be described as a linear combination of the multi-channel signal. Another example is when the downmixed signal is coded by a perceptual audio codec that uses stereo pre-processing or other tools for improved coding efficiency. It is commonly known from the prior art that many perceptual audio codecs rely on mid/side stereo coding, where the side signal is attenuated under bitrate-constrained conditions. This yields an output with a narrower stereo image than that of the signal used for encoding.
As mentioned earlier, perceptual audio codecs employ mid/side coding for stereo coding at low bitrates. Furthermore, stereo pre-processing is commonly employed in order to reduce the energy of the side signal under bitrate-constrained conditions. This is based on the psychoacoustic notion that, for a stereo signal, a reduced stereo width is a preferable coding artefact to audible quantisation distortion and bandwidth limitation.
Hence, if stereo pre-processing is used, the down-mix equation (3) can be expressed as
where $\gamma$ is the attenuation of the side signal. As outlined earlier, the $D$ matrix needs to be known on the decoder side in order to reconstruct the three channels correctly. Hence, the present embodiment teaches that the attenuation factor should be sent to the decoder.
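The modified downmix matrix can be sketched under one common mid/side convention, assumed here rather than taken from the source: $m = (l_0 + r_0)/2$ and $s = (l_0 - r_0)/2$. The side signal is scaled by $\gamma$, and the channels are rebuilt as $l_0' = m + \gamma s$ and $r_0' = m - \gamma s$. Under that assumption, the effective downmix is $P(\gamma)D_\alpha$ with $P(\gamma) = \frac{1}{2}\begin{pmatrix}1+\gamma & 1-\gamma\\ 1-\gamma & 1+\gamma\end{pmatrix}$, so a decoder can rebuild it from the transmitted $\gamma$. Here $\gamma$ is the side attenuation, not the decorrelator gain of the previous section. The function names are hypothetical.

```python
def side_attenuation_matrix(gamma):
    """P(gamma): mid/side processing that scales the side signal by gamma (assumed
    convention m = (l+r)/2, s = (l-r)/2, l' = m + gamma*s, r' = m - gamma*s)."""
    return [[(1.0 + gamma) / 2.0, (1.0 - gamma) / 2.0],
            [(1.0 - gamma) / 2.0, (1.0 + gamma) / 2.0]]


def effective_downmix(alpha, gamma):
    """Effective 2x3 downmix P(gamma) D_alpha that the decoder has to assume."""
    P = side_attenuation_matrix(gamma)
    D = [[1.0, 0.0, alpha], [0.0, 1.0, alpha]]
    return [[sum(P[i][k] * D[k][j] for k in range(2)) for j in range(3)] for i in range(2)]


print(effective_downmix(0.5, 1.0))  # gamma = 1: D_alpha is unchanged
print(effective_downmix(0.5, 0.0))  # gamma = 0: mono, both rows identical
```

The centre column stays at $\alpha$ for every $\gamma$, because the centre enters both downmix channels equally and therefore lies entirely in the mid signal; only the left/right columns are affected by the side attenuation.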
Parameterisation for HFR Codec Signals
If the prediction-based upmix is used with High Frequency Reconstruction methods such as SBR [WO 98/57436], the prediction parameters estimated on the encoder side will not match the re-created highband signal on the decoder side. The present embodiment teaches the use of an alternative, non-waveform-based up-mix structure for re-creating three channels from two. The proposed up-mix procedure is designed to re-create the correct energy of all up-mixed channels in the case of uncorrelated noise signals.
Assume that the downmix matrix $D_\alpha$ as defined in (3) is used, and that we now define the upmix matrix $C$. The upmix is then defined by
$\hat{X} = C X_0$ (32)
Aiming only to re-create the correct energies of the up-mixed signals $l(k)$, $r(k)$ and $c(k)$, whose energies are $L$, $R$ and $C$, the up-mix matrix is chosen so that the diagonal elements of $\hat{X}\hat{X}^*$ and $XX^*$ are the same, according to:
The corresponding expression for the downmix matrix will be
Setting the diagonal elements of $\hat{X}\hat{X}^*$ equal to the diagonal elements of $XX^*$ yields three equations defining the relation between the elements of $C$ and the energies $L$, $R$ and $C$
Based on the above, an up-mix matrix can be defined. It is preferable to define an up-mix matrix that does not add the right down-mixed channel to the left up-mixed channel and vice versa. Hence, a suitable up-mix matrix may be
This gives a C matrix according to:
It can be shown that the elements of the C matrix can be re-created on the decoder side from the two transmitted parameters
from which the C matrix can be derived on the decoder side. These parameters along with the parameters output from 104 are input to selection module 1002. In one preferred embodiment, the selection module 1002 outputs the parameters from 104 if the parameters correspond to a frequency range that is coded by a wave-form codec, and outputs the parameters from 1001 if the parameters correspond to a frequency range reconstructed by HFR. The selection module 1002 also outputs information 1005 on which parameterisation is used for the different frequency ranges of the signal.
On the decoder side the module 1004 takes the transmitted parameters and directs them to the predictive up-mix 109 or the energy-based up-mix 1003 according to the above, dependent on the indication given by the parameter 1005. The energy based up-mix 1003 implements the up-mix matrix C according to equation (40).
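The band-wise switching performed by the selection module 1002 and the routing in module 1004 can be sketched as follows. The criterion used here is a band index at which the HFR range starts; that criterion, the function names and the data layout are hypothetical. The text also allows other criteria, such as the prediction error, to drive the choice.

```python
PREDICTIVE = "predictive"   # parameters from 104, decoded by the predictive up-mix 109
ENERGY = "energy"           # parameters from 1001, decoded by the energy-based up-mix 1003


def select_parameterisation(pred_params, energy_params, hfr_start_band):
    """Encoder side (module 1002): keep predictive parameters for waveform-coded bands
    and energy-based parameters for HFR bands; also return the per-band indication (1005)."""
    selected, indication = [], []
    for band, (p, e) in enumerate(zip(pred_params, energy_params)):
        use_energy = band >= hfr_start_band
        selected.append(e if use_energy else p)
        indication.append(ENERGY if use_energy else PREDICTIVE)
    return selected, indication


def route(selected, indication):
    """Decoder side (module 1004): split per-band parameters between the two up-mixes."""
    pairs = list(enumerate(zip(selected, indication)))
    to_predictive = {b: p for b, (p, m) in pairs if m == PREDICTIVE}
    to_energy = {b: p for b, (p, m) in pairs if m == ENERGY}
    return to_predictive, to_energy


pred = [(0.9, 0.8), (0.7, 0.6), (0.5, 0.4), (0.3, 0.2)]   # e.g. (c1, c2) per band
ener = [(1.0, 1.1), (1.2, 1.3), (1.4, 1.5), (1.6, 1.7)]   # energy-based pair per band
selected, indication = select_parameterisation(pred, ener, hfr_start_band=2)
to_pred, to_energy = route(selected, indication)
print(indication)  # ['predictive', 'predictive', 'energy', 'energy']
```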
The upmix matrix C as outlined in equation (40) has equal weights (δ) to obtain the estimated (decoder) signal c(k) from the two downmixed signals l0(k), r0(k). Based on the observation that the relative amount of the signal c(k) may differ in the two downmixed signals l0(k), r0(k) (i.e., C/L not equal to C/R), one could also consider the following generic upmix matrix:
In order to estimate $c(k)$, this embodiment also requires the transmission of two control parameters $c_1$ and $c_2$, which are, for example, equal to $c_1 = \alpha^2 C/(L + \alpha^2 C)$ and $c_2 = \alpha^2 C/(R + \alpha^2 C)$. A possible implementation of the upmix matrix functions $f_i$ is then given by
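For uncorrelated $l$, $r$, $c$ with energies $L$, $R$, $C$ and the downmix (3), the left downmix has energy $L + \alpha^2 C$, of which $\alpha^2 C$ stems from the centre (and likewise $R + \alpha^2 C$ on the right). The example control parameters $c_1 = \alpha^2C/(L+\alpha^2C)$ and $c_2 = \alpha^2C/(R+\alpha^2C)$ are therefore the centre shares of the two downmix channels. The short sketch below computes them. The function name and example energies are illustrative assumptions, and the functions $f_i$ are not reproduced.

```python
def centre_fractions(L, R, C, alpha):
    """Example control parameters c1 = a^2 C / (L + a^2 C) and c2 = a^2 C / (R + a^2 C):
    the share of centre energy in each downmix channel for uncorrelated inputs."""
    centre = alpha ** 2 * C
    return centre / (L + centre), centre / (R + centre)


c1, c2 = centre_fractions(L=1.0, R=3.0, C=4.0, alpha=0.5)
print(c1, c2)  # 0.5 0.25
```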
The signalling of a different parameterisation for the SBR range according to the present invention is not limited to SBR. The above-outlined parameterisation can be used in any frequency range where the prediction error of the prediction-based up-mix is deemed too large. Hence, module 1002 may output the parameters from 1001 or 104 depending on a multitude of criteria, such as the coding method of the transmitted signals, the prediction error, etc.
A preferred method for improved prediction based multi-channel reconstruction includes, at the encoder side, extracting different multi-channel parameterisations for different frequency ranges, and, at the decoder side, applying these parameterisations to the frequency ranges in order to re-construct the multi-channels.
A further preferred embodiment of the present invention includes a method for improved prediction based multi-channel reconstruction including, at the encoder side, extracting information on the down-mix process used and subsequently sending this information to a decoder, and, at the decoder side, applying an up-mix based on extracted prediction parameters and the information on the down-mix in order to reconstruct the multi-channels.
A further preferred embodiment of the present invention includes a method for improved prediction based multi-channel reconstruction, in which, at the encoder side, the energy of the down-mix signal is adjusted in accordance with a prediction error obtained for the extracted predictive up-mix parameters.
A further preferred embodiment of the present invention relates to a method for improved prediction based multi-channel reconstruction, in which, at the decoder side, an energy lost due to the prediction error is compensated for by applying a gain to the up-mixed channels.
A further embodiment of the present invention relates to a method for improved prediction based multi-channel reconstruction, in which, at the decoder side, the energy lost due to a prediction error is replaced by a de-correlated signal.
A further preferred embodiment of the present invention relates to a method for improved prediction based multi-channel reconstruction, in which, at the decoder side, a part of the energy lost due to a prediction error is replaced by a de-correlated signal, and a part of the energy lost is replaced by applying a gain to the up-mixed channels. This part of the energy lost is preferably signalled from an encoder.
A further preferred embodiment of the present invention is an apparatus for improved prediction based multi-channel reconstruction comprising means for adjusting the energy of the down-mix signal in accordance with the prediction error obtained for the extracted predictive up-mix parameters.
A further preferred embodiment of the present invention is an apparatus for improved prediction based multi-channel reconstruction comprising means for compensating for the energy loss due to the prediction error by applying a gain to the up-mixed channels.
A further preferred embodiment of the present invention is an apparatus for improved prediction based multi-channel reconstruction comprising means for replacing the energy lost due to the prediction error by a de-correlated signal.
A further preferred embodiment of the present invention is an apparatus for improved prediction based multi-channel reconstruction comprising means for replacing part of the energy lost due to the prediction error by a de-correlated signal, and part of the energy lost by applying a gain to the up-mixed channels.
A further preferred embodiment of the present invention is an encoder for improved prediction based multi-channel reconstruction including adjusting the energy of the down-mix signal in accordance with the prediction error obtained for the extracted predictive up-mix parameters.
A further preferred embodiment of the present invention is a decoder for improved prediction based multi-channel reconstruction including compensating for an energy loss due to the prediction error by applying a gain to the up-mixed channels.
A further preferred embodiment of the present invention relates to a decoder for improved prediction based multi-channel reconstruction including replacing the energy lost due to the prediction error by a de-correlated signal.
A further preferred embodiment of the present invention is a decoder for improved prediction based multi-channel reconstruction including replacing a part of the energy lost due to the prediction error by a de-correlated signal, and a part of the energy lost by applying a gain to the down-mixed channels.
Preferably, the energy measure is any measure related to an energy loss introduced by the upmixing rule. It can be an absolute measure of the upmix-introduced energy error or the energy of the upmix signal (which is normally lower in energy than the original signal), or it can be a relative measure such as a relation between the original signal energy and the upmix signal energy or a relation between the energy error and the original signal energy or even a relation between the energy error and the upmix signal energy. A relative energy measure can be used as a correction factor, but nevertheless is an energy measure since it depends on the energy error introduced into the upmix signal generated by an energy-loss introducing upmixing rule or—stated in other words—a non-energy-preserving upmixing rule.
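The absolute and relative energy measures listed above reduce to simple arithmetic. The sketch below is illustrative only. It takes the energy error to be the energy difference between the original signal and the up-mix signal, and the dictionary keys are hypothetical names.

```python
def energy_measures(e_orig: float, e_upmix: float) -> dict[str, float]:
    """Candidate energy measures for an energy-loss introducing up-mix.

    e_orig:  energy of the original signal
    e_upmix: energy of the up-mix signal (normally lower than e_orig)
    """
    if e_orig <= 0.0 or e_upmix <= 0.0:
        raise ValueError("energies must be positive")
    e_error = e_orig - e_upmix  # energy lost by the up-mix
    return {
        # absolute measures
        "energy_error": e_error,
        "upmix_energy": e_upmix,
        # relative measures (usable directly as correction factors)
        "orig_to_upmix": e_orig / e_upmix,
        "error_to_orig": e_error / e_orig,
        "error_to_upmix": e_error / e_upmix,
    }


m = energy_measures(10.0, 8.0)
print(m["energy_error"], m["orig_to_upmix"], m["error_to_orig"])  # 2.0 1.25 0.2
```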
An exemplary energy-loss introducing upmixing rule (non-energy-preserving upmixing rule) is an upmix using transmitted prediction coefficients. In case of a non-perfect prediction of a frame or of a subband of a frame, the upmix output signal is affected by a prediction error, which corresponds to an energy loss. Naturally, the prediction error varies from frame to frame. In case of an almost perfect prediction (a low prediction error), only a small compensation (by scaling or by adding a decorrelated signal) has to be done, while in case of a larger prediction error (a non-perfect prediction), more compensation has to be done. Therefore, the energy measure also varies between a value indicating no or only a small compensation and a value indicating a large compensation.
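The frame-to-frame variation of the prediction error can be reproduced with a toy example. This sketch rests on a hypothetical model chosen for illustration, not on the exact matrices of this description: a down-mix l0 = l + c, r0 = r + c; a centre prediction c_hat = c1·l0 + c2·r0 with least-squares coefficients c1, c2; and the up-mix l = l0 − c_hat, r = r0 − c_hat. Each frame is a short list of subband samples.

```python
def dot(a, b):
    """Inner product of two equally long sample lists."""
    return sum(x * y for x, y in zip(a, b))


def predictive_upmix_frame(l, r, c):
    """Predictive 2-to-3 up-mix of one frame; returns (e_orig, e_upmix, e_error).

    Hypothetical model used for illustration only:
      down-mix:    l0 = l + c, r0 = r + c
      prediction:  c_hat = c1*l0 + c2*r0, with c1, c2 chosen by least squares
      up-mix:      l_hat = l0 - c_hat, r_hat = r0 - c_hat
    """
    l0 = [a + b for a, b in zip(l, c)]
    r0 = [a + b for a, b in zip(r, c)]
    # Normal equations for minimising ||c - c1*l0 - c2*r0||^2.
    a11, a12, a22 = dot(l0, l0), dot(l0, r0), dot(r0, r0)
    b1, b2 = dot(c, l0), dot(c, r0)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("l0 and r0 are linearly dependent in this frame")
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    c_hat = [c1 * x + c2 * y for x, y in zip(l0, r0)]
    l_hat = [x - h for x, h in zip(l0, c_hat)]
    r_hat = [y - h for y, h in zip(r0, c_hat)]

    originals, upmixed = (l, r, c), (l_hat, r_hat, c_hat)
    e_orig = sum(dot(ch, ch) for ch in originals)
    e_upmix = sum(dot(ch, ch) for ch in upmixed)
    e_error = 0.0
    for och, uch in zip(originals, upmixed):
        diff = [o - u for o, u in zip(och, uch)]
        e_error += dot(diff, diff)
    return e_orig, e_upmix, e_error


# Frame 1: the centre is an exact linear combination of l0 and r0.
print(predictive_upmix_frame([2.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
# Frame 2: mutually uncorrelated unit-energy channels.
print(predictive_upmix_frame([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]))
```

In the first frame the up-mix is (up to rounding) lossless. In the second frame, one third of the original energy of 3 is lost. Because a least-squares prediction makes the error orthogonal to the up-mix, `e_orig - e_upmix` equals `e_error` in both cases.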
When the compensation is done by adding a decorrelated signal scaled depending on the energy measure, it is natural to consider the energy measure as an Inter-Channel Coherence (ICC) value. In this case, the preferably used relative energy measure ρ typically varies between 0.8 and 1.0, wherein 1.0 indicates that the upmixed signals are decorrelated as required, that no decorrelated signal has to be added, that the energy of the predictive upmix result is equal to the energy of the original signal, or, equivalently, that the prediction error is zero.
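For this sketch, ρ is assumed to be the ratio of up-mix energy to original energy; the description only requires ρ to depend on that relation. Under this assumption, the two pure compensation strategies follow directly. The first applies a gain of 1/√ρ to the up-mix. The second adds a decorrelated signal, normalised to the up-mix energy and assumed uncorrelated with it, scaled by √((1 − ρ)/ρ). The function name is hypothetical.

```python
import math


def compensation_from_rho(rho: float) -> tuple[float, float]:
    """Map the relative energy measure rho to compensation amplitudes.

    rho is assumed here to be E_upmix / E_orig (1.0 = no prediction error).
    Returns (gain, decorrelation_scale):
      gain                 amplitude gain restoring the energy by scaling alone
      decorrelation_scale  amplitude for a decorrelated signal that has the same
                           energy as the up-mix, restoring the energy by addition
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    gain = 1.0 / math.sqrt(rho)
    decorrelation_scale = math.sqrt((1.0 - rho) / rho)
    return gain, decorrelation_scale
```

At ρ = 1.0 no compensation is needed (gain 1, no decorrelated signal). At ρ = 0.8, the decorrelated signal is scaled by 0.5, so it adds a quarter of the up-mix energy, which is exactly the energy the gain 1/√0.8 would restore.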
However, the present invention is also useful in connection with other energy-loss introducing upmixing rules, i.e., rules that are not based on waveform matching but on other techniques, such as the use of codebooks, spectrum matching, or any other upmixing rule that does not ensure energy preservation.
Generally, the energy compensation can be performed before or after applying the energy-loss introducing upmixing rule. Alternatively, the energy-loss compensation can even be included in the upmixing rule, for example by altering the original matrix coefficients using the energy measure, so that a new upmixing rule is generated and used by the up-mixer. This new upmixing rule is based on the energy-loss introducing upmixing rule and the energy measure. In other words, this embodiment relates to a situation in which the energy compensation is "mixed" into the "enhanced" upmixing rule, so that the energy compensation and/or the addition of a decorrelated signal are performed by applying one or more upmixing matrices to an input vector (the one or more base channels) to obtain, after the one or more matrix operations, the output vector (the reconstructed multi-channel signal having at least three channels).
Preferably, the up-mixer device receives two base channels l0, r0 and outputs three re-constructed channels l, r and c.
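The 2-to-3 case, together with the option of folding the energy compensation into the matrix coefficients, can be sketched as a 3×2 matrix applied to the base channels l0, r0. The matrix form below is hypothetical: it assumes the down-mix l0 = l + c, r0 = r + c and a predicted centre c = c1·l0 + c2·r0. Scaling every coefficient by a compensation gain yields an "enhanced" rule whose output equals that gain applied after the original up-mix.

```python
def upmix_matrix(c1: float, c2: float, gain: float = 1.0) -> list[list[float]]:
    """3x2 up-mix matrix mapping [l0, r0] to [l, r, c] (hypothetical prediction form).

    With gain == 1.0 this is the plain, energy-loss introducing rule; any other
    gain folds an energy compensation into the coefficients themselves.
    """
    base = [
        [1.0 - c1, -c2],  # l = l0 - c_hat
        [-c1, 1.0 - c2],  # r = r0 - c_hat
        [c1, c2],         # c = c_hat = c1*l0 + c2*r0
    ]
    return [[gain * m for m in row] for row in base]


def apply_upmix(matrix, l0, r0):
    """Apply a 3x2 matrix sample by sample to the two base channels."""
    return [[row[0] * x + row[1] * y for x, y in zip(l0, r0)] for row in matrix]


l0, r0 = [1.0, 2.0], [3.0, -1.0]
plain = apply_upmix(upmix_matrix(0.25, 0.5), l0, r0)              # [l, r, c]
enhanced = apply_upmix(upmix_matrix(0.25, 0.5, gain=1.2), l0, r0)  # 1.2 * plain
```

With this form, l + c reproduces l0 and r + c reproduces r0 for any c1, c2. The energy loss therefore comes only from the imperfect centre prediction, which a single gain folded into the matrix can compensate.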
Subsequently, reference is made to
For the subsequent discussion, however, it is assumed that the energy 1202 of the base channels is the same as the energy 1200 of the original multi-channel signal.
1204 illustrates the energy of the up-mix signals, when the up-mix signals (e.g., 110, 111, 112 of
The up-mixer 1104 is operative to output output channels, which have an energy, which is higher than the energy 1204. Preferably, the up-mixer device 1104 performs a complete compensation so that the up-mix result 1100 in
Preferably, the up-mix result, the energy of which is shown at 1204, is not simply up-scaled as shown in
Number 1 of the Table in
Number 2 of
Number 3 of the Table in
Number 4 of
Number 5 of the
The number 6 embodiment in the Table in
The number 8 embodiment of
Subsequently, a preferred embodiment of the encoder is described in detail.
The encoder includes an energy measure calculator 1402 for calculating an energy measure depending on an energy difference between an energy of the multi-channel input signal 1400 or of the at least one base channel 1404 and an energy of an up-mixed signal 1406 generated by a non-energy-conserving up-mixing operation 1407.
Furthermore, the encoder includes an output interface 1408 for outputting the at least one base channel after being scaled (401, 402) by a scaling factor 403 depending on the energy measure or for outputting the energy measure itself.
In a preferred embodiment, the encoder includes a down-mixer 1410 for generating the at least one base channel 1404 from the original multi-channel signal 1400. For generating the up-mix parameters, a difference calculator 1414 and a parameter optimiser 1416 are also present. These elements are operative to find the best-matching up-mix parameters 1412. In a preferred embodiment, at least two parameters of this best-fitting set of up-mix parameters are output via the output interface as the parameter output. The difference calculator is preferably operative to perform a minimum mean square error calculation between the original multi-channel signal 1400 and the up-mix signal generated by the up-mixer for the parameters input at parameter line 1412. This parameter optimisation can be performed by several different optimisation procedures, all of which are driven by the goal of obtaining a best-matching up-mix result 1406 using a certain up-mixing matrix included in the up-mixer 1407.
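The interplay of difference calculator and parameter optimiser can be sketched for the 2-to-3 case. The sketch assumes a hypothetical predictive up-mix (down-mix l0 = l + c, r0 = r + c; up-mix c = c1·l0 + c2·r0, l = l0 − c, r = r0 − c). It shows two of the possible optimisation procedures, an exhaustive grid search and a closed-form least-squares solution. All function names are illustrative.

```python
def downmix(l, r, c):
    """Hypothetical down-mix used for illustration: l0 = l + c, r0 = r + c."""
    return [a + b for a, b in zip(l, c)], [a + b for a, b in zip(r, c)]


def total_mse(c1, c2, l, r, c):
    """Difference calculator: squared error over l, r and c for parameters (c1, c2)."""
    l0, r0 = downmix(l, r, c)
    err = 0.0
    for x0, y0, lv, rv, cv in zip(l0, r0, l, r, c):
        c_hat = c1 * x0 + c2 * y0         # predicted centre sample
        err += (lv - (x0 - c_hat)) ** 2   # left:  l = l0 - c_hat
        err += (rv - (y0 - c_hat)) ** 2   # right: r = r0 - c_hat
        err += (cv - c_hat) ** 2          # centre
    return err


def optimise_grid(l, r, c, low=0.0, high=1.0, steps=100):
    """One possible optimiser: exhaustive search over a (c1, c2) grid."""
    grid = [low + (high - low) * k / steps for k in range(steps + 1)]
    return min(((c1, c2) for c1 in grid for c2 in grid),
               key=lambda p: total_mse(p[0], p[1], l, r, c))


def optimise_closed_form(l, r, c):
    """Another optimiser: solve the 2x2 normal equations of c ~ c1*l0 + c2*r0.

    With this up-mix, the l and r errors equal minus the centre error, so the
    total squared error is three times the centre error and both optimisers
    target the same minimum.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    l0, r0 = downmix(l, r, c)
    a11, a12, a22 = dot(l0, l0), dot(l0, r0), dot(r0, r0)
    b1, b2 = dot(c, l0), dot(c, r0)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("l0 and r0 are linearly dependent")
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det
```

For three mutually uncorrelated unit-energy channels, both optimisers return parameters close to (1/3, 1/3). The grid result is limited by its 0.01 step, while the closed form reaches the exact minimum.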
The functionality of
Then, when the best-fitting parameter set, e.g., the best-fitting up-mix matrix, has been found, at least two up-mixing parameters of the parameter set generated by step 1444 are output to the output interface, as indicated by step 1446.
Furthermore, after the up-mix parameter optimisation step 1444 is complete, the energy measure can be calculated and output as indicated by step 1448. Generally, the energy measure will depend on the energy error 1210. In a preferred embodiment, the energy measure is the factor ρ which depends on the relation of the energy of the up-mix result 1406 and the energy of the original signal 1400 as shown in
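A minimal sketch of the energy measure calculation and of the encoder-side pre-compensation (scaling the base channels by a factor depending on the energy measure) might look as follows. It assumes that ρ is the ratio of up-mix energy to original energy and uses a hypothetical predictive up-mix (l0 = l + c, r0 = r + c; c = c1·l0 + c2·r0, l = l0 − c, r = r0 − c). Because that up-mix is linear, scaling the base channels by 1/√ρ scales the decoder's up-mix energy by 1/ρ. This restores the original energy without any decoder-side change.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def energy(channels):
    """Total energy of a sequence of channels (each a list of samples)."""
    return sum(_dot(ch, ch) for ch in channels)


def predictive_upmix(l0, r0, c1, c2):
    """Hypothetical 2-to-3 predictive up-mix: c = c1*l0 + c2*r0, l = l0 - c, r = r0 - c."""
    c_hat = [c1 * x + c2 * y for x, y in zip(l0, r0)]
    return ([x - h for x, h in zip(l0, c_hat)],
            [y - h for y, h in zip(r0, c_hat)],
            c_hat)


def precompensated_downmix(l, r, c, c1, c2):
    """Return the base channels scaled by 1/sqrt(rho), together with rho."""
    l0 = [a + b for a, b in zip(l, c)]  # hypothetical down-mix l0 = l + c
    r0 = [a + b for a, b in zip(r, c)]  # hypothetical down-mix r0 = r + c
    rho = energy(predictive_upmix(l0, r0, c1, c2)) / energy((l, r, c))
    if rho <= 0.0:
        raise ValueError("up-mix energy must be positive")
    scale = 1.0 / math.sqrt(rho)  # scaling factor depending on the energy measure
    return [scale * x for x in l0], [scale * y for y in r0], rho
```

For three mutually uncorrelated unit-energy channels and c1 = c2 = 1/3, ρ is 2/3. Up-mixing the scaled base channels with the unchanged parameters then yields the original energy of 3.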
Subsequently, reference is made to
When the
The encoded base channels at the output 1508 only include a low-band of the base channels 1504 in encoded form. Information on the high-band is calculated by an SBR spectral envelope calculator 1512, which is connected to an SBR information encoder 1514 for generating and outputting encoded SBR-side information at an output 1516.
The original signal 1502 is input into an energy calculator 1520, which generates, for a certain time period, the channel energies L, C, R of the original channels l, c, r. The channel energies L, C, R are input into a parameter calculator block 1522. The parameter calculator 1522 outputs two up-mix parameters c1, c2, which can, for example, be the parameters c1, c2, indicated in
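When only the channel energies L, C, R are available, the parameters can be computed in closed form under additional assumptions. This sketch assumes mutually uncorrelated channels l, c, r, a down-mix l0 = l + c, r0 = r + c, and a least-squares prediction c = c1·l0 + c2·r0; these are illustrative choices, not necessarily the parameterisation used here. The normal equations then read [[L + C, C], [C, R + C]]·[c1, c2]ᵀ = [C, C]ᵀ. With D = LR + LC + RC, this gives c1 = CR/D and c2 = CL/D. The centre prediction error CLR/D appears in l, r and c alike, which yields ρ = 1 − 3CLR/(D(L + C + R)).

```python
def params_from_energies(L: float, C: float, R: float) -> tuple[float, float, float]:
    """Up-mix parameters c1, c2 and energy measure rho from channel energies only.

    Assumes (for illustration) mutually uncorrelated channels l, c, r, the
    down-mix l0 = l + c, r0 = r + c and the prediction c_hat = c1*l0 + c2*r0.
    Then <l0,l0> = L + C, <r0,r0> = R + C and <l0,r0> = <c,l0> = <c,r0> = C.
    """
    det = L * R + L * C + R * C  # determinant of [[L + C, C], [C, R + C]]
    if det <= 0.0:
        raise ValueError("at least two channel energies must be non-zero")
    c1 = C * R / det
    c2 = C * L / det
    # The centre prediction error C*L*R/det appears in l, r and c alike.
    e_error = 3.0 * C * L * R / det
    rho = 1.0 - e_error / (L + C + R)
    return c1, c2, rho
```

With equal energies L = C = R, this gives c1 = c2 = 1/3 and ρ = 2/3. With a silent centre (C = 0), both parameters vanish and ρ = 1, because nothing has to be predicted.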
The
Preferably, however, and as described in connection with
b shows a schematic representation of a parametric representation output by selection module 1002 in
Furthermore, the present invention is also useful when parameterisations different from the predictive parameterisation as shown in
Furthermore, it is to be noted that the frequency or time-selective calculation and transmission of parameters can be signalled explicitly as shown at 1005 in
Furthermore, it is to be noted that the encoder-side calculation of one, two or even more different parameterisations, together with the encoder-side selection of which parameterisation is transmitted, can be performed with or without transmitting the energy measure. The selection is based on a decision using any information available at the encoder side, e.g., an actually used target function or signalling information used for other reasons, such as SBR processing and signalling. Even when the preferred energy correction is not performed at all, e.g., when the result of the non-energy-conserving up-mix (predictive up-mix) is not energy-corrected, or when no corresponding pre-compensation is performed on the encoder side, the preferred switching between different parameterisations is useful for obtaining a better multi-channel output quality and/or a lower bit rate.
Particularly, the preferred switching between different parameterisations depending on available encoder-side information can be used with or without the addition of a decorrelated signal completely or at least partly covering the energy error introduced by the predictive up-mix, as shown in connection with FIGS. 5 to 7. In this context, the addition of a de-correlated signal as described in connection with
Subsequently,
The calculation of the up-mix parameters is based on the equation in
The up-mix matrix in the device 1602 is set in accordance with the two transmitted up-mix parameters, as forwarded by broken line 1604, and the remaining four up-mix parameters calculated by block 1600. This up-mix matrix is then applied to the base channels input via line 1102. Depending on the implementation, an energy measure for a low-band correction is forwarded via line 1106, so that a corrected up-mix can be generated and output. When the predictive up-mix is only performed for the low-band, as, for example, implicitly signalled via line 1606, and when energy-style up-mix parameters exist on line 1108 for the high-band, this fact is signalled, for a corresponding sub-band, to the calculator 1600 and to the up-mix matrix device 1602. In the energy-style case, it is preferred to calculate the elements of up-mix matrix (40) or (41). To this end, the transmitted parameters as indicated below equation (40), or the corresponding parameters as indicated below equation (41), are used. In this embodiment, the transmitted up-mix parameters c1, c2 cannot be used directly as up-mix coefficients; instead, the up-mix coefficients of the up-mix matrix shown in equation (40) or (41) have to be calculated from the transmitted up-mix parameters c1 and c2.
For the high-band, an up-mix matrix as determined for the energy-based up-mix parameters is used for up-mixing the high-band part of the multi-channel output signals. Subsequently, the low-band part and the high-band part are combined in a low/high combiner 1608 for outputting the full-bandwidth reconstructed output channels l, r, c. As illustrated in
The preferred methods or devices or computer programs can be implemented or included in several devices.
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk or a CD having electronically readable control signals stored thereon, which can cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine-readable carrier, the program code being configured for performing at least one of the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing the inventive methods when the computer program runs on a computer.
While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
0402652-2 | Nov 2004 | SE | national |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP05/11586 | Oct 2005 | US |
Child | 11290370 | Nov 2005 | US |