The present disclosure relates to a method for performing an adaptive down-mixing and subsequent up-mixing of a multi-channel audio signal. In particular, the method is related to down-mixing and up-mixing operations that are commonly used in multi-channel audio coding or spatial audio coding.
Conventional adaptive down-mixing methods use a signal-dependent down-mixing transformation. Depending on the particular realization of the signal, the most efficient down-mixing transformation is selected from a set of available down-mixing transformations. For example, in stereo coding the down-mixing transformation can be selected from a set of two transformations: an identity transformation (so-called L/R coding) and a transformation yielding a sum (so-called M or Mid channel) and a difference (so-called S or Side channel) of the input channels.
Such a conventional coding scheme is typically referred to as M/S coding or Mid/Side coding. However, conventional M/S coding provides only a limited rate-distortion gain since the set of available transforms is limited. Moreover, since closed-loop coding is used, the associated complexity can be large.
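As a minimal sketch of the sum/difference transformation named above, the following Python fragment maps a stereo pair to Mid/Side channels and back. The function names and the 1/2 scaling are illustrative assumptions; the text only specifies a sum and a difference of the input channels.

```python
def ms_encode(left, right):
    """Map L/R samples to Mid (scaled sum) and Side (scaled difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the mapping: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical stereo samples chosen as exact binary fractions.
left = [0.5, 0.25, -0.5]
right = [0.5, 0.125, -0.25]
mid, side = ms_encode(left, right)
rec_left, rec_right = ms_decode(mid, side)
```

For strongly correlated channels the Side channel carries little energy, which is what makes the M/S alternative attractive compared with the identity (L/R) transformation.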
These drawbacks of M/S coding have been addressed by down-mixing methods where the down-mixing transformation is computed based on an interchannel covariance matrix as described in M. Briand, D. Virette and N. Martin “Parametric Coding of Stereo Audio Based on Principal Component Analysis”, Proc. of the 9th International Conference on Digital Audio Effects, Montreal, Canada, Sep. 28, 2006. However, this approach is limited to a stereo signal and cannot be adapted to a larger number of input channels. An extension of this approach to a higher number of channels is described in D. Yang, H. Ai, C. Kyriakakis, and C.-C. J. Kuo, “Progressive Syntax-Rich Coding of Multichannel Audio Sources,” EURASIP Journal on Applied Signal Processing, vol. 2003, pp. 980-992, January 2003. However, that extension does not allow generating a backward compatible downmix.
Another disadvantage associated with the usage of a fixed set of down-mixing transformations is the difficulty in finding a suitable set of down-mixing transformations for the general case. A further conventional down-mixing transformation has been proposed in G. Hotho, L. F. Villemoes and J. Breebaart “A Backward-Compatible Multichannel Audio Codec” IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, No. 1, pp. 83 to 93, January 2008. This conventional method achieves a backward compatibility by combining a matrix down-mixing transformation with prediction of the secondary channels from the primary channels. This results in a parametric coding scheme where the parameters are prediction parameters. However, this conventional approach as described by Hotho et al. is only efficient when the number of channels is low. In addition, the coding performance of this conventional down-mixing approach is suboptimal in terms of rate distortion performance.
Conventional adaptive down-mixing methods either support an arbitrary number of channels but do not preserve the spatial characteristics of the original multi-channel audio signal, which means that backward compatibility cannot be achieved, or they preserve the spatial characteristics of the original multi-channel audio signal in the generated down-mix but can only be used for multi-channel audio signals with a limited number of audio channels. Consequently, there is a need for a method and apparatus for performing an adaptive down-mixing of a multi-channel audio signal which preserves the spatial characteristics of the original multi-channel audio signal and which at the same time offers backward compatibility.
According to a first implementation of a first aspect of the present disclosure a method is provided for performing an adaptive down-mixing of a multi-channel audio signal comprising a number of input channels,
wherein a signal adaptive transformation of the input channels is performed by multiplying the input channels with a downmix block matrix comprising a fixed block for providing a set of backward compatible primary channels and a signal adaptive block for providing a set of secondary channels.
In a second possible implementation of the first implementation of the first aspect of the present disclosure a signal adaptive block of the downmix block matrix is adapted depending on an interchannel covariance of the input channels.
In a further possible third implementation of the second implementation of the method according to the first aspect of the present disclosure an auxiliary covariance matrix for the interchannel covariance of the input channels is calculated by means of an auxiliary orthonormal transform.
In a further possible fourth implementation of the third implementation of the method according to the first aspect of the present disclosure said auxiliary orthonormal transform is calculated on the basis of the fixed block as initialization of a Gram-Schmidt procedure.
In a further possible fifth implementation of the third implementation of the method according to the first aspect of the present disclosure a Karhunen-Loeve-transformation matrix is calculated for a block of the auxiliary covariance matrix.
In a further possible sixth implementation of the fifth implementation of the method according to the first aspect of the present disclosure the signal adaptive block of the downmix block matrix is calculated on the basis of the calculated Karhunen-Loeve-transformation matrix.
In a further possible seventh implementation of the first to sixth implementation of the method according to the first aspect of the present disclosure the backward compatible primary channels are encoded by a single legacy encoder to generate a backward compatible primary legacy bit stream.
In a further possible eighth implementation of the method according to the first aspect of the present disclosure each backward compatible primary channel is encoded by a legacy encoder to generate a backward compatible primary legacy bit stream.
According to a possible ninth implementation of the seventh or eighth implementation of the method according to the first aspect of the present disclosure each secondary channel is encoded by a corresponding secondary channel encoder.
In a further possible tenth implementation of the seventh or eighth implementation of the method according to the first aspect of the present disclosure the secondary channels are encoded by a common multi-channel encoder to generate a secondary bit stream for the respective secondary channel.
According to a possible eleventh implementation of the third implementation of the method according to the first aspect of the present disclosure the interchannel covariance matrix or an auxiliary covariance matrix is quantized and transmitted with the secondary channel bit stream.
In a further possible twelfth implementation of the ninth or tenth implementation of the method according to the first aspect of the present disclosure the primary bit streams are transmitted along with the secondary bit streams to remote decoders.
In a further possible thirteenth implementation of the twelfth implementation of the method according to the first aspect of the present disclosure the remote decoders comprise a single legacy decoder adapted to decode the backward compatible primary bit streams for reconstructing the primary channels.
In a further fourteenth implementation of the twelfth implementation of the method according to the first aspect of the present disclosure the remote decoders comprise a corresponding number of legacy decoders adapted to decode the backward compatible primary bit streams for reconstructing the primary channels.
In a further possible fifteenth implementation of the twelfth implementation of the method according to the first aspect of the present disclosure the remote decoders comprise secondary channel decoders adapted to decode the secondary bit streams for reconstructing the secondary channels.
In a further possible sixteenth implementation of the twelfth to fifteenth implementation of the method according to the first aspect of the present disclosure a type of a bit stream is signalled to the remote decoders.
In a further possible seventeenth implementation of the sixteenth implementation of the method according to the first aspect of the present disclosure the signalling of the type is performed by implicit signalling by means of auxiliary data transported in at least one bit stream.
In a further possible eighteenth implementation of the sixteenth implementation of the method according to the first aspect of the present disclosure the signalling of the type is performed by explicit signalling by means of a flag indicating the type of the respective bit stream.
In a further possible nineteenth implementation of the method according to the first aspect of the present disclosure the signal adaptive transformation of the number of input channels is performed by multiplying the input channels with the downmix block matrix to provide a set of backward compatible primary channels and a set of auxiliary channels.
In a further possible twentieth implementation of the nineteenth implementation of the method according to the first aspect of the present disclosure the Karhunen-Loeve-transformation KLT is applied to the set of auxiliary channels to provide the set of secondary channels.
According to a second aspect of the present disclosure a method for performing an adaptive up-mixing of received bit streams is provided,
wherein a backward compatible primary bit stream is decoded by a legacy decoder to reconstruct a corresponding primary channel, and
wherein a secondary bit stream is decoded by a secondary channel decoder to reconstruct a corresponding secondary channel,
wherein a signal adaptive inverse transformation of the decoded bit streams is performed by means of an upmix block matrix to reconstruct a multi-channel audio signal comprising a number of output channels.
In a first possible implementation of the second aspect of the present disclosure a signal adaptive block of the upmix block matrix is adapted depending on a decoded interchannel covariance of the input channels.
In a further possible second implementation of the first implementation of the method according to the second aspect of the present disclosure an auxiliary covariance matrix for the interchannel covariance of the input channels is decoded.
In a further possible third implementation of the second implementation of the method according to the second aspect of the present disclosure an auxiliary orthonormal inverse transform is calculated on the basis of the fixed block as initialization of a Gram-Schmidt procedure.
In a further possible fourth implementation of the second implementation of the method according to the second aspect of the present disclosure a Karhunen-Loeve-transformation matrix is calculated for a block of the auxiliary covariance matrix.
In a possible fifth implementation of the fourth implementation of the method according to the second aspect of the present disclosure the signal adaptive block of the upmix block matrix is calculated on the basis of the calculated Karhunen-Loeve-transformation matrix.
According to a third aspect of the present disclosure a down-mixing apparatus is provided adapted to perform an adaptive down-mixing of a multi-channel audio signal comprising a number of input channels,
said down-mixing apparatus comprising:
a signal adaptive transformation unit which is adapted to perform a signal adaptive transformation of said input channels by multiplying the input channels with a downmix block matrix comprising a fixed block to provide a set of backward compatible primary channels and comprising a signal adaptive block to provide a set of secondary channels.
Possible implementations of the apparatus according to the third aspect are adapted to perform one, some or all of the implementations according to the first aspect.
According to a fourth aspect of the present disclosure an encoding apparatus is provided comprising a down-mixing apparatus according to the third aspect of the present disclosure and comprising further
at least one legacy encoder adapted to encode the backward compatible primary channels to generate at least one backward compatible primary bit stream and comprising
at least one secondary channel encoder adapted to encode the secondary channels to generate at least one secondary bit stream.
According to a fifth aspect of the present disclosure an up-mixing apparatus is provided adapted to perform an adaptive up-mixing of decoded bit streams comprising decoded primary bit streams and decoded secondary bit streams,
said up-mixing apparatus comprising
a signal adaptive retransformation unit which is adapted to perform a signal adaptive inverse transformation of the decoded bit streams by multiplying the decoded bit streams with an upmix block matrix comprising a fixed block for the decoded primary bit streams and a signal adaptive block for the decoded secondary bit streams.
According to a sixth aspect of the present disclosure a decoding apparatus is provided comprising an up-mixing apparatus according to the fifth aspect of the present disclosure and further comprising
at least one legacy decoder adapted to decode at least one received backward compatible primary bit stream to generate at least one decoded primary bit stream supplied to said up-mixing apparatus and comprising
at least one secondary channel decoder adapted to decode at least one received secondary bit stream to generate at least one decoded secondary bit stream supplied to said up-mixing apparatus.
Possible implementations of the apparatus according to the sixth aspect are adapted to perform one, some or all of the implementations according to the second aspect.
According to a seventh aspect of the present disclosure an audio system is provided comprising
at least one encoding apparatus according to the fourth aspect of the present disclosure and
at least one decoding apparatus according to the sixth aspect of the present disclosure,
wherein said encoding apparatus and said decoding apparatus are connected to each other via a network.
According to an eighth aspect of the disclosure a computer program is provided comprising a program code for performing the method according to any of the above method aspects or their implementations, when the computer program runs on a computer, a processor, a micro controller or any other programmable device.
The aforementioned aspects and their implementations can be implemented in hardware, software or in any combination of hardware and software.
In the following possible implementations of different aspects of the present disclosure are described with reference to the enclosed figures in more detail.
As can be seen in
The down-mixing apparatus 7 comprises a signal adaptive transformation unit which is adapted to perform a signal adaptive transformation of the received input channels of the multi-channel audio signal by multiplying the input channels with a downmix block matrix comprising a fixed block to provide a set of backward compatible primary channels and comprising a signal adaptive block to provide a set of secondary channels. The down-mixing operation performed by the down-mixing apparatus 7 can yield M channels in the down-mix domain comprising two groups, i.e. a first group of N backward compatible primary channels and a group of M-N secondary channels, where 1≦N≦M and 3≦M. Typically, the provided backward compatible primary channels comprise a larger energy than the secondary channels. This can be a result of the energy concentration achieved by the down-mixing method employed by the down-mixing apparatus 7.
As can be seen in
The backward compatible primary channels are encoded by a single legacy encoder 8 as shown in
In a possible scenario the backward compatible primary channels of the downmix signal can facilitate a playout using only the N primary channels which is also called legacy playout. In this situation the backward compatible primary channels do preserve some spatial properties of the original M input channels of the multi-channel audio signal in order to render a perceptually meaningful reconstruction using the legacy N channel playout.
As can be seen in
As can be seen in
In a possible implementation each bit stream can comprise an indication of the type of the respective bit stream. A possible type for a bit stream is an MP3 bit stream according to the standard ISO/IEC 11172-3. Alternative types for bit streams are advanced audio coding (AAC) bit streams as defined in the standard ISO/IEC 14496-3, or OPUS bit streams. The primary backward compatible bit stream can be one of these legacy types. MP3 and AAC are widely deployed, so an existing legacy decoder can decode the backward compatible primary bit stream. The secondary bit stream can likewise be of a legacy type, or of a future or application-specific type.
In a possible implementation the type of the respective bit stream is signalled to the remote decoders 10, 12 of the decoding apparatus 3. In a possible embodiment the signalling of the type is performed by implicit signalling by means of auxiliary data transported in at least one bit stream. In an alternative embodiment the signalling is performed by explicit signalling by means of a flag indicating the type of the respective bit stream. In a possible embodiment it is possible to switch between a first signalling option comprising implicit signalling and a second signalling option comprising explicit signalling. In a possible implementation of the implicit signalling a flag can indicate the presence of secondary channel information in the auxiliary data of at least one backward compatible primary bit stream. The legacy decoder 10 does not check whether the flag is present and decodes only the backward compatible primary channel. For instance, the signalling of the secondary channel bit stream may be included in the auxiliary data of an AAC bit stream. Moreover, the secondary bit stream itself may also be included in the auxiliary data of an AAC bit stream. In that case, a legacy AAC decoder decodes only the backward compatible part of the bit stream and discards the auxiliary data. A non-legacy decoder according to an implementation of the present disclosure can check for the presence of such a flag and, if the flag is present in the received bit stream, reconstructs the multi-channel audio signal.
In a possible implementation of the explicit signalling, a flag can be used which indicates that the bit stream is a secondary bit stream obtained with a non-legacy secondary channel encoder 9 according to an implementation of the present disclosure. A legacy decoder of the decoding apparatus 3 is not able to decode this bit stream as it does not know how to interpret the flag. However, a decoder according to an implementation of the present disclosure is able to decode it and can decide to decode either the backward compatible part only or the complete multi-channel audio signal.
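The two signalling options can be illustrated with a small Python sketch. Everything in it is hypothetical: the `BitStream` container, the field names `explicit_type` and `aux_data`, the key `secondary_present` and the returned labels are assumptions made for illustration only and do not correspond to any real bit stream syntax.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BitStream:
    payload: bytes
    explicit_type: Optional[str] = None           # explicit signalling (hypothetical field)
    aux_data: dict = field(default_factory=dict)  # carrier for implicit signalling


def has_secondary_info(stream):
    """True if secondary channel information is signalled explicitly or implicitly."""
    if stream.explicit_type == "secondary":
        return True
    return bool(stream.aux_data.get("secondary_present", False))


def choose_decoding(stream, is_legacy_decoder, want_multichannel):
    """Model the outcome of decoding one bit stream."""
    if is_legacy_decoder:
        # A legacy decoder cannot interpret an explicitly flagged secondary stream;
        # for a primary stream it ignores the auxiliary data and decodes the primary part.
        return "undecodable" if stream.explicit_type == "secondary" else "primary_only"
    if has_secondary_info(stream) and want_multichannel:
        return "multi_channel"
    return "primary_only"


primary_with_aux = BitStream(b"", aux_data={"secondary_present": True})
secondary_stream = BitStream(b"", explicit_type="secondary")
```

A non-legacy decoder that only needs a stereo rendering, for example on headphones, can call `choose_decoding(primary_with_aux, False, False)` and still obtain the backward compatible part.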
A benefit of such a backward compatibility can be seen as follows. A mobile terminal according to an implementation of the present disclosure can decide to decode the backward compatible part to save the battery life of an integrated battery as the complexity load is lower. Moreover, depending on the rendering system, the decoder can decide which part of the bit stream to decode. For example, for rendering with a headphone, the backward compatible part of the received signal can be sufficient, while the multi-channel audio signal is decoded only when the terminal is connected for example to a docking station with a multi-channel rendering capability.
A main advantage provided by the backward compatibility provided by the audio system 1 according to the present disclosure is the possibility to decode directly the backward compatible part on a legacy decoder 10 which would not have the ability to render the multi-channel audio signal. Moreover, conventional equipment in which only a legacy decoder 10 is integrated may decode directly the backward compatible audio signal without the need to perform a transcoding operation from one coding format to another coding format. This facilitates the deployment of a new coding format and reduces the complexity for providing backward compatibility.
The backward compatible primary channels are generated in a backward compatible fashion. This means that the primary channels can be encoded using a conventional legacy audio encoder 8. For example, an existing stereo encoder can be used to encode stereo primary channels of the backward compatible downmix. Bit streams describing the backward compatible primary channels can be separated from the bit streams that enable the reconstruction of the original multi-channel audio signal. For example, the primary channels can be reconstructed by the conventional audio decoder 10 after stripping off the remaining bits from the complete bit stream. The reconstructed primary channels can be played out using a lower number of channels than the original number M of input channels. For example, a five-channel signal can be played out using stereo loudspeakers.
A practical implication of the backward compatibility of the down-mixing transformation approach used by the method according to the present disclosure is that the backward compatible primary channels are generated in a restricted way. This restriction is due to the properties of the legacy encoders 8 and due to the requirement on particular composition of the backward compatible primary channels obtained by combining the channels of the original multi-channel signal.
In a possible embodiment the backward compatible primary channels can be encoded with an audio encoder (mono, stereo or multi-channel) which provides a legacy primary bit stream for the N primary channels of the backward compatible downmix. The secondary channel encoder 9 generates another part of the bit stream which can be used by the decoding apparatus 3 to reconstruct the multi-channel audio signal. Each secondary channel can be encoded with a single channel audio encoder 9. Alternatively, a common multi-channel encoder may be used for the secondary channels. In a possible implementation this multi-channel audio encoder can use a waveform coding scheme which is adapted to faithfully encode the waveforms of the secondary channels. In a further alternative embodiment the secondary channel encoder 9 can use a parametric representation of the secondary channels. For instance, a simple coding of the energy time and frequency envelopes of the secondary channels can be employed by the secondary channel encoder 9. In that case the secondary channel decoders 12 can exploit the fact that the secondary channels are decorrelated to artificially generate the decoded secondary channels.
In the following the downmix operation is described with reference to an illustrative example. In this example the number M of input channels is M=3 and the number N of backward compatible primary channels is N=1. Accordingly, the multi-channel audio signal is formed in this example by a three-channel audio signal.
A method for performing an adaptive down-mixing of a multi-channel audio signal comprising a number M of input channels,
wherein a signal adaptive transformation of said input channels is performed by multiplying the input channels with a downmix block matrix $W^T$ comprising a fixed block $W_0$ for providing a set of $N$ backward compatible primary channels and a signal adaptive block $W_X$ for providing a set of $M-N$ secondary channels.
The samples of the three-channel input signal can be represented by a random vector $X$ with a realization $x \in \mathbb{R}^3$. The signal can be divided into blocks so that it can be viewed as stationary within each block, and for each such block an inter-channel covariance matrix $\Sigma_X = E\{XX^T\}$ can be estimated, for instance by computing a sample inter-channel covariance matrix. Without a backward compatibility constraint, the down-mixing method can achieve the maximum energy concentration in the channels of the down-mix signal. The energy concentration can be evaluated, for example, by computing a coding gain: the larger the energy concentration, the larger the coding gain. A large coding gain indicates efficient source coding and thus facilitates coding of the primary and secondary channels of the down-mix. The optimal energy-concentrating transform diagonalizes $\Sigma_X$, i.e., the covariance matrix can be decomposed as $\Sigma_X = U \Lambda U^T$, where $U$ is a unitary transform (i.e., $UU^T = I$) and $\Lambda$ is a diagonal matrix. In this case the transform $U^T$ forms the KLT matrix and yields a diagonal covariance matrix, since $\Lambda = U^T \Sigma_X U$. If the KLT matrix is used to generate the down-mix, the corresponding vector sample of the down-mix signal $Y$ is then computed as:
$$y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = U^T x = \begin{bmatrix} \vec{u}_0^T \\ \vec{u}_1^T \\ \vec{u}_2^T \end{bmatrix} x. \quad (1)$$
The estimate of the inter-channel covariance matrix $\Sigma_X$ is updated on a frame-by-frame basis, which implies that the optimal transform $U^T$ varies in time. If, for example, $y_0$ is a sample of a mono down-mix, then because $y_0 = \vec{u}_0^T x$ its relation to the original signal $X$ is not fixed in time, and the perceptual quality of the down-mix may be time-varying (in particular due to modeling errors). The vectors $\vec{u}_0^T, \ldots, \vec{u}_2^T$ form a basis of $\mathbb{R}^3$ that is optimized based on the signal statistics.
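The block-wise KLT described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the three-channel block is synthetic (a hypothetical mixing matrix applied to white noise), the block length of 1024 samples is arbitrary, and the eigenvectors are sorted by decreasing energy, which the text does not prescribe.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical mixing matrix that makes the three channels correlated.
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.8, 0.6, 0.0],
                   [0.5, 0.3, 0.2]])
x = mixing @ rng.standard_normal((3, 1024))   # one block, channels in rows

sigma_x = (x @ x.T) / x.shape[1]              # sample estimate of E{X X^T}
eigvals, u = np.linalg.eigh(sigma_x)          # sigma_x = U diag(eigvals) U^T
order = np.argsort(eigvals)[::-1]             # sort by decreasing energy
eigvals, u = eigvals[order], u[:, order]

y = u.T @ x                                   # down-mix samples y = U^T x
sigma_y = (y @ y.T) / y.shape[1]              # diagonal up to rounding errors
```

Because `sigma_x` is re-estimated for every block, `u` changes from block to block, which is the time variation of the down-mix discussed above.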
In a possible implementation to achieve a good quality of the down-mix signal one can construct a basis that contains some fixed vectors, which may be used to obtain down-mix channels with stable quality (primary channels), and some non-fixed vectors that can exploit the statistics of the signal and provide the optimal over-all energy concentration. Such a scenario is presented in
i. One can define a suitable criterion for designing a transform according to an implementation of the present disclosure. A reasonable criterion is the coding gain, which may be maximized by improving the energy concentration. If the transform is given by a matrix $W$, the inter-channel covariance matrix of the transformed signal is given by $\Sigma_Y = W \Sigma_X W^T$. In general, $W$ is not the KLT matrix and $\Sigma_Y$ is not diagonal. However, since the transform matrix $W$ is constrained to be unitary, one can use the diagonal elements of $\Sigma_Y$, denoted $\sigma_{Y_i}^2$, to compute the coding gain as the ratio of the arithmetic mean to the geometric mean of the channel variances:
$$G = \frac{\frac{1}{M}\sum_{i=0}^{M-1}\sigma_{Y_i}^2}{\left(\prod_{i=0}^{M-1}\sigma_{Y_i}^2\right)^{1/M}}. \quad (2)$$
ii. In fact, the numerator of (2) does not depend on the specific unitary transform that is used. This can be easily seen since $\mathrm{Tr}\{\Sigma_Y\} = \mathrm{Tr}\{W \Sigma_X W^T\} = \mathrm{Tr}\{W^T W \Sigma_X\} = \mathrm{Tr}\{\Sigma_X\}$. Therefore the coding gain $G$ is maximized if the denominator of (2) is minimized.
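The trace argument can be checked numerically. The sketch below assumes the usual transform coding gain, i.e. the ratio of the arithmetic mean to the geometric mean of the channel variances on the diagonal of the transformed covariance matrix. The covariance matrix and the random orthonormal transform are hypothetical test data.

```python
import numpy as np


def coding_gain(sigma_y):
    """Arithmetic mean over geometric mean of the channel variances (assumed form of G)."""
    var = np.diag(sigma_y)
    return var.mean() / np.exp(np.log(var).mean())


# Hypothetical positive definite inter-channel covariance matrix.
sigma_x = np.array([[4.0, 1.5, 0.5],
                    [1.5, 3.0, 1.0],
                    [0.5, 1.0, 2.0]])

rng = np.random.default_rng(1)
w, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # arbitrary orthonormal transform
sigma_y = w @ sigma_x @ w.T

_, u = np.linalg.eigh(sigma_x)
sigma_klt = u.T @ sigma_x @ u                      # KLT: diagonal covariance

gain_identity = coding_gain(sigma_x)
gain_random = coding_gain(sigma_y)
gain_klt = coding_gain(sigma_klt)
```

The trace, and hence the numerator, is the same for all three transforms, so the gains differ only through the product of the variances, which the KLT minimizes.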
iii. For encoding of a multichannel signal represented by a source $X$ generating samples $x \in \mathbb{R}^M$, an estimate of the inter-channel covariance matrix $\Sigma_X = E\{XX^T\}$ is available. The goal is to find a transformation matrix $W$ such that the coding gain $G$ given by equation (2) is maximized, subject to a constraint on some of the vectors in $W$. One can therefore consider an orthonormal transform
$$W = [W_0 \,|\, W_X], \quad (3)$$
where $W_0 \in \mathbb{R}^{M \times N}$ contains $N$ orthonormal vectors that are selected according to any method that results in a stable quality of the down-mix. The other block, $W_X \in \mathbb{R}^{M \times (M-N)}$, contains the $M-N$ remaining basis vectors, which are adapted to obtain the optimal energy concentration for a given covariance matrix $\Sigma_X$. The design problem is to determine the optimal $W_X$ given the constrained part of the transform specified in $W_0$.
To provide an algorithm for finding $W_X$, it is possible to introduce an auxiliary orthonormal transform $V$
$$V = [W_0 \,|\, V_X], \quad (4)$$
where $V_X \in \mathbb{R}^{M \times (M-N)}$ is chosen arbitrarily, subject to $VV^T = I$. Since the transform $V$ must be unitary, the columns of $W_0$ and $V_X$ must together form an orthonormal set. Several procedures exist that generate a $V_X$ satisfying this requirement. For instance, one of these procedures is a Gram-Schmidt procedure initialized with the basis vectors in $W_0$ and applied to arbitrary vectors in $\mathbb{R}^M$.
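A Gram-Schmidt completion of the fixed block can be sketched as follows. The function name `complete_basis`, the use of the standard unit vectors as candidate vectors and the tolerance are assumptions made for this illustration; the example block W0 (an equal-weight mono down-mix of three channels) is hypothetical.

```python
import numpy as np


def complete_basis(w0, tol=1e-10):
    """Extend the orthonormal columns of w0 (M x N) to an orthonormal M x M matrix [W0 | VX]."""
    m = w0.shape[0]
    basis = [w0[:, k] for k in range(w0.shape[1])]   # initialization with W0
    for candidate in np.eye(m):                      # candidate vectors (an assumption)
        v = candidate.copy()
        for b in basis:                              # remove components along found vectors
            v -= (b @ v) * b
        norm = np.linalg.norm(v)
        if norm > tol:                               # skip candidates already in the span
            basis.append(v / norm)
        if len(basis) == m:
            break
    return np.column_stack(basis)


# Hypothetical fixed block: equal-weight mono down-mix of M = 3 channels (N = 1).
w0 = np.ones((3, 1)) / np.sqrt(3.0)
v = complete_basis(w0)
v_x = v[:, 1:]
```

Any other set of candidate vectors that spans the complement of W0 would serve equally well, since the subsequent KLT step removes the dependence on the particular choice of VX.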
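For the down-mix $y = W^T x$, the covariance matrix of the transformed signal is
$$\Sigma_Y = W^T \Sigma_X W, \quad (5)$$
and one can use the fact that $V$ is unitary, i.e. $VV^T = I$. By introducing $V$, additional structure is imposed on the design problem. One has therefore
$$\Sigma_Y = W^T V V^T \Sigma_X V V^T W \quad (6)$$
$$= \begin{bmatrix} I & 0 \\ 0 & W_X^T V_X \end{bmatrix} V^T \Sigma_X V \begin{bmatrix} I & 0 \\ 0 & V_X^T W_X \end{bmatrix}, \quad (7)$$
where the structure with the off-diagonal zero matrices is due to the fact that the columns of $V_X$ and of $W_X$ are orthogonal to $W_0$. It can be shown that the coding gain $G$ in equation (2) is maximized if $Q = W_X^T V_X$ is chosen to be the KLT of a corresponding block matrix within $\Sigma_V = V^T \Sigma_X V$. Let $\Sigma_V$ be of the following form
$$\Sigma_V = \begin{bmatrix} [\Sigma_V]^A_{N \times N} & [\Sigma_V]^B_{N \times (M-N)} \\ [\Sigma_V]^C_{(M-N) \times N} & [\Sigma_V]^D_{(M-N) \times (M-N)} \end{bmatrix}. \quad (8)$$
Because $Q \in \mathbb{R}^{(M-N) \times (M-N)}$ is an orthonormal transform that diagonalizes $[\Sigma_V]^D_{(M-N) \times (M-N)}$, the matrix $Q$ may be found by means of a KLT performed on the block $[\Sigma_V]^D_{(M-N) \times (M-N)}$. Since $V$ and $\Sigma_X$ are known, and since $Q = W_X^T V_X$ implies $W_X^T = Q V_X^T$ when the columns of $W_X$ lie in the span of the columns of $V_X$, the optimal block $W_X$ of the transform $W$ is given by
$$W_X = (Q V_X^T)^T. \quad (9)$$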
iv. The proposed method can be implemented very efficiently as shown in
The inter-channel covariance matrix $\Sigma_X$ of the $M$-channel input signal can be obtained by estimation or transmitted as side information. The proposed method for generating the backward compatible down-mix $W^T = [W_0 \,|\, W_X]^T$, or the up-mix $W = [W_0 \,|\, W_X]$, including $N$ backward compatible primary channels from the input signal including $M$ channels comprises the following encoding steps as shown in
Obtaining an estimate of the inter-channel covariance $\Sigma_X$ in step S61.
Choosing a predefined constrained part $W_0$ of the down-mixing transformation in step S62.
Computing an arbitrary $M \times M$ transformation $V$ that includes the block $W_0$ in step S63.
Computing an auxiliary covariance matrix $V^T \Sigma_X V$ in step S64.
Computing the KLT matrix $Q$ for the block $[\Sigma_V]^D_{(M-N) \times (M-N)}$ (see eq. (8)) of the auxiliary covariance matrix in step S65.
Computing the block $W_X$ according to equation (9) in step S66.
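Steps S61 to S66 can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `adaptive_downmix_matrix`, the example block W0 (a stereo down-mix of four channels) and the randomly generated covariance matrix are hypothetical. The completion of W0 to V uses an SVD instead of a Gram-Schmidt procedure, which is admissible because step S63 allows any orthonormal completion. Q is taken as the matrix whose rows are the eigenvectors of the lower-right block, so that the adaptive block is obtained as the transpose of Q times the transpose of VX.

```python
import numpy as np


def adaptive_downmix_matrix(sigma_x, w0):
    """Return W = [W0 | WX]; the down-mix of a sample x is W.T @ x."""
    n = w0.shape[1]
    # S63: orthonormal completion of W0; the trailing left singular vectors
    # span the orthogonal complement of the columns of W0.
    u_full, _, _ = np.linalg.svd(w0, full_matrices=True)
    v_x = u_full[:, n:]
    v = np.hstack([w0, v_x])
    # S64: auxiliary covariance matrix.
    sigma_v = v.T @ sigma_x @ v
    # S65: KLT of the lower-right (M-N) x (M-N) block; rows of q are eigenvectors.
    _, eigvecs = np.linalg.eigh(sigma_v[n:, n:])
    q = eigvecs[:, ::-1].T                      # decreasing energy order
    # S66: adaptive block.
    w_x = (q @ v_x.T).T
    return np.hstack([w0, w_x])


# S61: hypothetical covariance estimate for M = 4 channels.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64))
sigma_x = a @ a.T / 64
# S62: hypothetical fixed block for N = 2 (left pair and right pair averaged).
w0 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2.0)

w = adaptive_downmix_matrix(sigma_x, w0)
sigma_y = w.T @ sigma_x @ w
```

In `sigma_y` the entry coupling the two secondary channels vanishes up to rounding errors, while the upper-left block belonging to the fixed primary channels is left unchanged by the adaptation.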
According to some implementations an encoding algorithm can be implemented as shown in
Obtaining an estimate of the inter-channel covariance ΣX in step S71.
Choosing a predefined constrained part of the down-mixing transformation W0 in step S72.
Computing an arbitrary M×M transformation V that includes the block W0 in step S73.
Generating in step S74 a set of N primary channels and a set of M−N auxiliary channels by means of the transformation obtained in Step S73.
Computing the inter-channel covariance matrix for the subspace of the auxiliary channels based on known V and ΣX in step S75.
Computing in step S76 KLT for the subspace of the auxiliary channels based on the inter-channel covariance matrix obtained in Step S75.
Transforming in step S77 the auxiliary channels computed in Step S74 by means of the KLT computed in Step S76, which yields a set of M−N secondary channels.
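The two-stage variant of steps S71 to S77 can be sketched as follows, again as a hedged illustration. The input block is synthetic, the fixed block W0 is a hypothetical stereo down-mix of four channels, and an SVD replaces the Gram-Schmidt procedure for the completion in step S73.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 2
# Hypothetical M-channel block with correlated channels (channels in rows).
x = rng.standard_normal((m, m)) @ rng.standard_normal((m, 2048))

sigma_x = x @ x.T / x.shape[1]                            # S71: covariance estimate
w0 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2.0)  # S72: fixed block
u_full, _, _ = np.linalg.svd(w0, full_matrices=True)      # S73: complete W0 to V
v_x = u_full[:, n:]
v = np.hstack([w0, v_x])

z = v.T @ x                                               # S74: first-stage transform
primary, auxiliary = z[:n], z[n:]

sigma_aux = v_x.T @ sigma_x @ v_x                         # S75: auxiliary-subspace covariance
_, eigvecs = np.linalg.eigh(sigma_aux)                    # S76: KLT of that subspace
q = eigvecs[:, ::-1].T

secondary = q @ auxiliary                                 # S77: secondary channels
sigma_sec = secondary @ secondary.T / x.shape[1]
```

The primary channels depend only on the fixed block, so their relation to the input channels stays constant from block to block, whereas `q` adapts to the signal statistics.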
According to a possible implementation the decoding method can be implemented as shown in
Obtaining in step S81 an estimate of the inter-channel covariance matrix ΣX that was transmitted as side information.
Choosing in step S82 a predefined constrained part of the down-mixing transformation W0 to be the same as the constrained part used in the down-mixing procedure.
Computing in a step S83 an inverse M×M transformation that includes the block W0
Decoding in a step S84 a bit-stream representing a set of N primary channels and M−N secondary channels and performing their reconstruction.
Computing in step S85 the inter-channel covariance matrix for the subspace of the auxiliary channels. This step S85 is possible since ΣX and the transformation obtained in Step S83 are known.
Computing in step S86 the inverse KLT for the subspace of the auxiliary channels based on the inter-channel covariance matrix obtained in Step S85.
Transforming in step S87 the secondary channels reconstructed in Step S84 by means of the inverse KLT computed in Step S86, which yields a set of M−N auxiliary channels.
Computing in step S88 an up-mix using a transformation computed in Step S83 and the reconstructed primary channels obtained in Step S83 and the reconstructed auxiliary channels obtained in Step S87.
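The decoding steps S81 to S88 can be illustrated in the same spirit for M=4 and N=2. The sketch below assumes an orthonormal V, so that the inverse of each transformation is its transpose, and it omits quantization and bit-stream parsing entirely. The covariance values, the matrix V_T and the function names are hypothetical and serve only as an example.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Hypothetical V stored row-wise (V transposed): rows 0-1 are the constrained
# part W0 chosen in S82, rows 2-3 span the auxiliary subspace.
V_T = [
    [S, S, 0.0, 0.0],
    [0.0, 0.0, S, S],
    [S, -S, 0.0, 0.0],
    [0.0, 0.0, S, -S],
]

# Hypothetical covariance estimate received as side information (S81).
SIGMA_X = [
    [2.0, 0.8, 0.5, 0.1],
    [0.8, 1.5, 0.2, 0.3],
    [0.5, 0.2, 1.8, 0.9],
    [0.1, 0.3, 0.9, 1.6],
]


def quad_form(u, sigma, w):
    """Compute u^T * sigma * w for vectors and matrices stored as lists."""
    return sum(u[i] * sigma[i][j] * w[j]
               for i in range(len(u)) for j in range(len(w)))


def aux_rotation(sigma_x, v_t):
    """S85 and S86: auxiliary covariance block and its KLT as (cos, sin)."""
    a = quad_form(v_t[2], sigma_x, v_t[2])
    b = quad_form(v_t[2], sigma_x, v_t[3])
    c = quad_form(v_t[3], sigma_x, v_t[3])
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return math.cos(theta), math.sin(theta)


def upmix(primary, secondary, sigma_x, v_t):
    """S87 and S88: undo the KLT on the secondary channels, then undo V^T."""
    cs, sn = aux_rotation(sigma_x, v_t)
    # S87: the inverse of a rotation is its transpose.
    aux = [cs * secondary[0] - sn * secondary[1],
           sn * secondary[0] + cs * secondary[1]]
    y = list(primary) + aux
    # S88: V is orthonormal, so the inverse of V^T is V itself.
    m = len(v_t)
    return [sum(v_t[k][i] * y[k] for k in range(m)) for i in range(m)]
```

Because the decoder derives the rotation from the same ΣX and the same W0 as the encoder, no transformation coefficients other than the covariance estimate need to be known in this sketch.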
The application of the method according to the present disclosure can be illustrated by a numerical example in the case of quadrophonic sound. For a play-out setup as shown in
After selecting these vectors, a first step of the encoding algorithm is completed. It is assumed that the original input channels are provided in the following order: FL, RL, FR, RR. In this example, it is further assumed that the inter-channel covariance matrix ΣX for the considered signal has the form
Since the constrained part of the transformation is known, the unconstrained part can be computed using the Gram-Schmidt procedure. The resulting down-mix can look like the one given in (11).
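The completion of a known constrained part by the Gram-Schmidt procedure can be sketched as follows. The sketch assumes that the constrained vectors are already orthonormal and uses the standard unit vectors as candidates; the function names and the example vectors are hypothetical and are not the values of the disclosure.

```python
import math


def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def complete_basis(constrained, m, tol=1e-10):
    """Extend orthonormal vectors to an orthonormal basis of R^m.

    Each unit vector is orthogonalized against the vectors collected so far;
    candidates that are (numerically) linearly dependent are skipped.
    """
    basis = [list(v) for v in constrained]
    for k in range(m):
        if len(basis) == m:
            break
        cand = [1.0 if i == k else 0.0 for i in range(m)]
        for b in basis:
            proj = dot(cand, b)
            cand = [c - proj * bi for c, bi in zip(cand, b)]
        norm = math.sqrt(dot(cand, cand))
        if norm > tol:
            basis.append([c / norm for c in cand])
    return basis


# Hypothetical constrained part: left and right sums for the order FL, RL, FR, RR.
s = 1.0 / math.sqrt(2.0)
W0_EXAMPLE = [[s, s, 0.0, 0.0], [0.0, 0.0, s, s]]
V_EXAMPLE = complete_basis(W0_EXAMPLE, 4)
```

For these example vectors the two added basis vectors are the normalized differences of the left pair and of the right pair, so the completed matrix is a sum/difference transformation. Any other orthonormal completion would serve equally well, since the subsequent KLT only depends on the subspace spanned by the added vectors.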
The covariance matrix VTΣXV can be easily computed. A 2×2 block of the covariance matrix is of form
The KLT of [ΣV]2×2D takes the form
The adapted part WX of the transformation matrix W can be computed from (9), yielding:
The final transformation for the down-mix WT takes the form:
The down-mix matrix given by (11) provides a non-adaptive down-mixing method that yields a backward compatible stereo down-mix. The performance of such a down-mix, evaluated by means of the coding gain G, is 8.0. In the considered example, the proposed down-mixing method, resulting in the backward compatible down-mixing matrix WT given by equation (15), yields a coding gain of 26.6, which is a substantial improvement compared to the non-adaptive down-mixing method. One can verify the inter-channel covariance after applying the transformation (15), which is as follows:
It can be seen from (16) that the secondary channels have been mutually decorrelated.
In a possible embodiment, in the case when the number of channels is large, the coding efficiency can be improved by using a signal-adaptive downmix based on the Karhunen-Loève transform (KLT). The method according to the present disclosure facilitates the generation of a signal-adaptive downmix that provides backward compatible downmix channels.
The method according to the present disclosure can be used, in particular, when a downmix generates a set of backward compatible primary channels and a set of secondary channels. The method can be used for coding scenarios where the number of channels is large and where the number of backward compatible primary channels is low.
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software or in any combination thereof.
The implementations can be performed using a digital storage medium, in particular a floppy disc, CD, DVD or Blu-Ray disc, a ROM, a PROM, an EPROM, an EEPROM or a Flash memory having electronically readable control signals stored thereon which cooperate or are capable of cooperating with a programmable computer system such that an embodiment of at least one of the inventive methods is performed.
A further embodiment of the present disclosure is or comprises, therefore, a computer program product with a program code stored on a machine-readable carrier, the program code being operative for performing at least one of the inventive methods when the computer program product runs on a computer.
In other words, embodiments of the inventive methods are or comprise, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer, on a processor or the like.
A further embodiment of the present disclosure is or comprises, therefore, a machine-readable digital storage medium, comprising, stored thereon, the computer program operative for performing at least one of the inventive methods when the computer program product runs on a computer, on a processor or the like.
A further embodiment of the present disclosure is or comprises, therefore, a data stream or a sequence of signals representing the computer program operative for performing at least one of the inventive methods when the computer program product runs on a computer, on a processor or the like.
A further embodiment of the present disclosure is or comprises, therefore, a computer, processor or any other programmable logic device adapted to perform at least one of the inventive methods.
A further embodiment of the present disclosure is or comprises, therefore, a computer, processor or any other programmable logic device having stored thereon the computer program operative for performing at least one of the inventive methods when the computer program product runs on the computer, processor or any other programmable logic device, e.g. an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
While the foregoing was particularly shown and described with reference to particular embodiments thereof, it is to be understood by those skilled in the art that various other changes in the form and details may be made without departing from the spirit and scope thereof. It is therefore to be understood that various changes may be made in adapting to different embodiments without departing from the broader concept disclosed herein and comprehended by the claims that follow.
This application is a continuation of International Patent Application No. PCT/EP2012/052443, filed Feb. 14, 2012, which is hereby incorporated by reference in its entirety.
Number | Date | Country |
---|---|---|
20140355767 A1 | Dec 2014 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2012/052443 | Feb 2012 | US |
Child | 14460074 | | US |