The invention disclosed herein generally relates to encoding and decoding of audio signals, and in particular to parametric reconstruction of a multichannel audio signal from a downmix signal and associated metadata.
Audio playback systems comprising multiple loudspeakers are frequently used to reproduce an audio scene represented by a multichannel audio signal, wherein the respective channels of the multichannel audio signal are played back on respective loudspeakers. The multichannel audio signal may for example have been recorded via a plurality of acoustic transducers or may have been generated by audio authoring equipment. In many situations, there are bandwidth limitations for transmitting the audio signal to the playback equipment and/or limited space for storing the audio signal in a computer memory or on a portable storage device. There exist audio coding systems for parametric coding of audio signals, so as to reduce the bandwidth or storage size needed. On an encoder side, these systems typically downmix the multichannel audio signal into a downmix signal, which typically is a mono (one channel) or a stereo (two channels) downmix, and extract side information describing the properties of the channels by means of parameters like level differences and cross-correlation. The downmix and the side information are then encoded and sent to a decoder side. On the decoder side, the multichannel audio signal is reconstructed, i.e. approximated, from the downmix under control of the parameters of the side information.
In view of the wide range of different types of devices and systems available for playback of multichannel audio content, including an emerging segment aimed at end-users in their homes, there is a need for new and alternative ways to efficiently encode multichannel audio content, so as to reduce bandwidth requirements and/or the required memory size for storage, and/or to facilitate reconstruction of the multichannel audio signal at a decoder side.
In what follows, example embodiments will be described in greater detail and with reference to the accompanying drawings, on which:
All the figures are schematic and generally only show parts which are necessary in order to elucidate the invention, whereas other parts may be omitted or merely suggested.
As used herein, an audio signal may be a pure audio signal, an audio part of an audiovisual signal or multimedia signal or any of these in combination with metadata.
As used herein, a channel is an audio signal associated with a predefined/fixed spatial position/orientation or an undefined spatial position such as “left” or “right”.
According to a first aspect, example embodiments propose audio decoding systems as well as methods and computer program products for reconstructing an audio signal. The proposed decoding systems, methods and computer program products, according to the first aspect, may generally share the same features and advantages.
According to example embodiments, there is provided a method for reconstructing an N-channel audio signal, wherein N≥3. The method comprises receiving a single-channel downmix signal, or a channel of a multichannel downmix signal carrying data for reconstruction of more audio signals, together with associated dry and wet upmix parameters; computing a first signal with a plurality of (N) channels, referred to as a dry upmix signal, as a linear mapping of the downmix signal, wherein a set of dry upmix coefficients is applied to the downmix signal as part of computing the dry upmix signal; generating an (N−1)-channel decorrelated signal based on the downmix signal; computing a further signal with a plurality of (N) channels, referred to as a wet upmix signal, as a linear mapping of the decorrelated signal, wherein a set of wet upmix coefficients is applied to the channels of the decorrelated signal as part of computing the wet upmix signal; and combining the dry and wet upmix signals to obtain a multidimensional reconstructed signal corresponding to the N-channel audio signal to be reconstructed. The method further comprises determining the set of dry upmix coefficients based on the received dry upmix parameters; populating an intermediate matrix having more elements than the number of received wet upmix parameters, based on the received wet upmix parameters and knowing that the intermediate matrix belongs to a predefined matrix class; and obtaining the set of wet upmix coefficients by multiplying the intermediate matrix by a predefined matrix, wherein the set of wet upmix coefficients corresponds to the matrix resulting from the multiplication and includes more coefficients than the number of elements in the intermediate matrix.
In this example embodiment, the number of wet upmix coefficients employed for reconstructing the N-channel audio signal is larger than the number of received wet upmix parameters. By exploiting knowledge of the predefined matrix and the predefined matrix class to obtain the wet upmix coefficients from the received wet upmix parameters, the amount of information needed to enable reconstruction of the N-channel audio signal may be reduced, allowing for a reduction of the amount of metadata transmitted together with the downmix signal from an encoder side. By reducing the amount of data needed for parametric reconstruction, the required bandwidth for transmission of a parametric representation of the N-channel audio signal, and/or the required memory size for storing such a representation, may be reduced.
The (N−1)-channel decorrelated signal serves to increase the dimensionality of the content of the reconstructed N-channel audio signal, as perceived by a listener. The channels of the (N−1)-channel decorrelated signal may have at least approximately the same spectrum as the single-channel downmix signal, or may have spectra corresponding to rescaled/normalized versions of the spectrum of the single-channel downmix signal, and may form, together with the single-channel downmix signal, N at least approximately mutually uncorrelated channels. In order to provide a faithful reconstruction of the channels of the N-channel audio signal, each of the channels of the decorrelated signal preferably has such properties that it is perceived by a listener as similar to the downmix signal. Hence, although it is possible to synthesize mutually uncorrelated signals with a given spectrum from e.g. white noise, the channels of the decorrelated signal are preferably derived by processing the downmix signal, e.g. including applying respective all-pass filters to the downmix signal or recombining portions of the downmix signal, so as to preserve as many properties as possible, especially locally stationary properties, of the downmix signal, including relatively more subtle, psycho-acoustically conditioned properties of the downmix signal, such as timbre.
Combining the wet and dry upmix signals may include adding audio content from respective channels of the wet upmix signal to audio content of the respective corresponding channels of the dry upmix signal, such as additive mixing on a per-sample or per-transform-coefficient basis.
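By way of a non-limiting illustration, the dry upmix, the wet upmix and their additive combination may be sketched as follows for one time segment. The function names, the sample values and the coefficient values are hypothetical, and the decorrelated channels are simply supplied by the caller here rather than generated by a decorrelator.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def reconstruct(y, z, dry_coeffs, wet_coeffs):
    """Combine a dry and a wet upmix additively, sample by sample.

    y          : samples of the single-channel downmix signal
    z          : N-1 decorrelated channels, each a list of samples
    dry_coeffs : N dry upmix coefficients, one per output channel
    wet_coeffs : N x (N-1) matrix of wet upmix coefficients
    """
    dry = [[c * s for s in y] for c in dry_coeffs]   # N channels
    wet = matmul(wet_coeffs, z)                      # N channels
    return [[d + w for d, w in zip(dry_ch, wet_ch)]
            for dry_ch, wet_ch in zip(dry, wet)]


# Illustrative values for N = 3 and two samples (assumed, not from the text).
y = [1.0, 2.0]
z = [[1.0, 0.0], [0.0, 1.0]]
dry_coeffs = [0.5, 0.25, 0.25]
wet_coeffs = [[1, 0], [0, 1], [-1, -1]]
reconstructed = reconstruct(y, z, dry_coeffs, wet_coeffs)
```

In this sketch a zero entry in wet_coeffs simply omits the contribution of the corresponding decorrelated channel without changing the number of coefficients.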
The predefined matrix class may be associated with known properties of at least some matrix elements which are valid for all matrices in the class, such as certain relationships between some of the matrix elements, or some matrix elements being zero. Knowledge of these properties allows for populating the intermediate matrix based on fewer wet upmix parameters than the full number of matrix elements in the intermediate matrix. The decoder side has knowledge at least of the properties of, and relationships between, the elements it needs to compute all matrix elements on the basis of the fewer wet upmix parameters.
By the dry upmix signal being a linear mapping of the downmix signal is meant that the dry upmix signal is obtained by applying a first linear transformation to the downmix signal. This first transformation takes one channel as input and provides N channels as output, and the dry upmix coefficients are coefficients defining the quantitative properties of this first linear transformation.
By the wet upmix signal being a linear mapping of the decorrelated signal is meant that the wet upmix signal is obtained by applying a second linear transformation to the decorrelated signal. This second transformation takes N−1 channels as input and provides N channels as output, and the wet upmix coefficients are coefficients defining the quantitative properties of this second linear transformation.
In an example embodiment, receiving the wet upmix parameters may include receiving N(N−1)/2 wet upmix parameters. In the present example embodiment, populating the intermediate matrix may include obtaining values for (N−1)² matrix elements based on the received N(N−1)/2 wet upmix parameters and knowing that the intermediate matrix belongs to the predefined matrix class. This may include inserting the values of the wet upmix parameters immediately as matrix elements, or processing the wet upmix parameters in a suitable manner for deriving values for the matrix elements. In the present example embodiment, the predefined matrix may include N(N−1) elements, and the set of wet upmix coefficients may include N(N−1) coefficients. For example, receiving the wet upmix parameters may include receiving no more than N(N−1)/2 independently assignable wet upmix parameters and/or the number of received wet upmix parameters may be no more than half the number of wet upmix coefficients employed for reconstructing the N-channel audio signal.
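The counts stated above may be tabulated with a short sketch such as the following, in which the function name is hypothetical and the formulas are those given in the preceding paragraph.

```python
def wet_parameter_budget(n):
    """Return (parameters, intermediate elements, coefficients) for N = n."""
    if n < 3:
        raise ValueError("the described method assumes N >= 3")
    parameters = n * (n - 1) // 2      # received wet upmix parameters
    elements = (n - 1) ** 2            # elements of the intermediate matrix
    coefficients = n * (n - 1)         # wet upmix coefficients
    return parameters, elements, coefficients


budget_n3 = wet_parameter_budget(3)
budget_n4 = wet_parameter_budget(4)
```

For every N≥3 the number of parameters is exactly half the number of coefficients and is smaller than the number of intermediate matrix elements.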
It is to be understood that omitting a contribution from a channel of the decorrelated signal when forming a channel of the wet upmix signal as a linear mapping of the channels of the decorrelated signal corresponds to applying a coefficient with the value zero to that channel, i.e. omitting a contribution from a channel does not affect the number of coefficients applied as part of the linear mapping.
In an example embodiment, populating the intermediate matrix may include employing the received wet upmix parameters as elements in the intermediate matrix. Since the received wet upmix parameters are employed as elements in the intermediate matrix without being processed any further, the complexity of the computations required for populating the intermediate matrix, and for obtaining the wet upmix coefficients, may be reduced, allowing for a computationally more efficient reconstruction of the N-channel audio signal.
In an example embodiment, receiving the dry upmix parameters may include receiving (N−1) dry upmix parameters. In the present example embodiment, the set of dry upmix coefficients may include N coefficients, and the set of dry upmix coefficients is determined based on the received (N−1) dry upmix parameters and based on a predefined relation between the coefficients in the set of dry upmix coefficients. For example, receiving the dry upmix parameters may include receiving no more than (N−1) independently assignable dry upmix parameters. For example, the downmix signal may be obtainable, according to a predefined rule, as a linear mapping of the N-channel audio signal to be reconstructed, and the predefined relation between the dry upmix coefficients may be based on the predefined rule.
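One way in which N dry upmix coefficients might be completed from N−1 received parameters is sketched below. The sketch assumes, purely for illustration, that the predefined relation requires the downmix of the dry upmix signal to reproduce the downmix signal, i.e. that the downmix coefficients multiplied by the respective dry upmix coefficients sum to one. The text above only states that some predefined relation exists, so this particular relation, like the function name, is an assumption.

```python
def complete_dry_coefficients(params, downmix_coeffs):
    """Derive N dry upmix coefficients from N-1 received parameters.

    Assumed relation (illustrative): sum(d_n * c_n) == 1, where d_n are the
    downmix coefficients and c_n the dry upmix coefficients. The received
    parameters are taken to be c_1 .. c_(N-1); the last one is solved for.
    """
    n = len(downmix_coeffs)
    if len(params) != n - 1:
        raise ValueError("expected N-1 dry upmix parameters")
    if downmix_coeffs[-1] == 0:
        raise ValueError("last downmix coefficient must be non-zero")
    partial = sum(d * c for d, c in zip(downmix_coeffs, params))
    last = (1.0 - partial) / downmix_coeffs[-1]
    return list(params) + [last]


# Illustrative values for N = 3 with a plain sum as the downmix.
dry = complete_dry_coefficients([0.5, 0.25], [1, 1, 1])
```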
In an example embodiment, the predefined matrix class may be one of: lower or upper triangular matrices, wherein known properties of all matrices in the class include predefined matrix elements being zero; symmetric matrices, wherein known properties of all matrices in the class include predefined matrix elements (on either side of the main diagonal) being equal; and products of an orthogonal matrix and a diagonal matrix, wherein known properties of all matrices in the class include known relations between predefined matrix elements. In other words, the predefined matrix class may be the class of lower triangular matrices, the class of upper triangular matrices, the class of symmetric matrices or the class of products of an orthogonal matrix and a diagonal matrix. A common property of each of the above classes is that its dimensionality is less than the full number of matrix elements.
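As an illustration of how knowledge of the matrix class allows a square matrix to be populated from fewer parameters than it has elements, the following sketch fills a lower triangular matrix and a symmetric matrix from the same parameter list. The row-by-row ordering of the parameters and the function names are assumptions made for the example; the text does not prescribe an ordering.

```python
def populate_lower_triangular(params, size):
    """Fill a size x size lower triangular matrix row by row from params."""
    if len(params) != size * (size + 1) // 2:
        raise ValueError("wrong number of parameters for this matrix size")
    values = iter(params)
    # Elements above the main diagonal are known to be zero for this class.
    return [[next(values) if col <= row else 0.0 for col in range(size)]
            for row in range(size)]


def populate_symmetric(params, size):
    """Fill a size x size symmetric matrix from its lower triangle."""
    lower = populate_lower_triangular(params, size)
    # Elements mirrored across the main diagonal are known to be equal.
    return [[lower[max(r, c)][min(r, c)] for c in range(size)]
            for r in range(size)]


# For N = 3 the intermediate matrix is 2 x 2 and N(N-1)/2 = 3 parameters
# are received (illustrative values).
params = [1.0, 2.0, 3.0]
triangular = populate_lower_triangular(params, 2)
symmetric = populate_symmetric(params, 2)
```

In both cases three parameters determine all four elements, which is the sense in which the dimensionality of the class is lower than the number of matrix elements.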
In an example embodiment, the downmix signal may be obtainable, according to a predefined rule, as a linear mapping of the N-channel audio signal to be reconstructed. In the present example embodiment, the predefined rule may define a predefined downmix operation, and the predefined matrix may be based on vectors spanning the kernel space of the predefined downmix operation. For example, the rows or columns of the predefined matrix may be vectors forming a basis, e.g. an orthonormal basis, for the kernel space of the predefined downmix operation.
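A basis for the kernel space of a single-channel downmix operation may, for example, be constructed as in the following sketch. The particular construction (one vector per channel pair followed by Gram-Schmidt orthonormalization) and the function names are assumptions chosen for illustration; any other basis of the same space would serve equally.

```python
import math


def kernel_basis(d):
    """Return N-1 vectors v with sum(d_n * v_n) == 0 for downmix coeffs d."""
    n = len(d)
    if d[-1] == 0:
        raise ValueError("this construction needs a non-zero last coefficient")
    basis = []
    for k in range(n - 1):
        v = [0.0] * n
        v[k] = float(d[-1])     # d_N in position k
        v[-1] = -float(d[k])    # -d_k in position N, so d . v = 0
        basis.append(v)
    return basis


def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization of linearly independent vectors."""
    result = []
    for v in vectors:
        w = list(v)
        for u in result:
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        result.append([a / norm for a in w])
    return result


# Illustrative downmix: plain sum of three channels.
d = [1, 1, 1]
basis = orthonormalize(kernel_basis(d))
```

The resulting vectors may be arranged as the rows or columns of the predefined matrix; each of them is mapped to zero by the downmix operation.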
In an example embodiment, receiving the single-channel downmix signal together with associated dry and wet upmix parameters may include receiving a time segment or time/frequency tile of the downmix signal together with dry and wet upmix parameters associated with that time segment or time/frequency tile. In the present example embodiment, the multidimensional reconstructed signal may correspond to a time segment or time/frequency tile of the N-channel audio signal to be reconstructed. In other words, the reconstruction of the N-channel audio signal may in at least some example embodiments be performed one time segment or time/frequency tile at a time. Audio encoding/decoding systems typically divide the time-frequency space into time/frequency tiles, e.g. by applying suitable filter banks to the input audio signals. By a time/frequency tile is generally meant a portion of the time-frequency space corresponding to a time interval/segment and a frequency sub-band.
According to example embodiments, there is provided an audio decoding system comprising a first parametric reconstruction section configured to reconstruct an N-channel audio signal based on a first single-channel downmix signal and associated dry and wet upmix parameters, wherein N≥3. The first parametric reconstruction section comprises a first decorrelating section configured to receive the first downmix signal and to output, based thereon, a first (N−1)-channel decorrelated signal. The first parametric reconstruction section also comprises a first dry upmix section configured to: receive the dry upmix parameters and the downmix signal; determine a first set of dry upmix coefficients based on the dry upmix parameters; and output a first dry upmix signal computed by mapping the first downmix signal linearly in accordance with the first set of dry upmix coefficients. In other words, the channels of the first dry upmix signal are obtained by multiplying the single-channel downmix signal by respective coefficients, which may be the dry upmix coefficients themselves, or which may be coefficients controllable via the dry upmix coefficients. The first parametric reconstruction section further comprises a first wet upmix section configured to: receive the wet upmix parameters and the first decorrelated signal; populate a first intermediate matrix having more elements than the number of received wet upmix parameters, based on the received wet upmix parameters and knowing that the first intermediate matrix belongs to a first predefined matrix class, i.e. by employing properties of certain matrix elements known to hold for all matrices in the predefined matrix class; obtain a first set of wet upmix coefficients by multiplying the first intermediate matrix by a first predefined matrix, wherein the first set of wet upmix coefficients corresponds to the matrix resulting from the multiplication and includes more coefficients than the number of elements in the first intermediate matrix; and output a first wet upmix signal computed by mapping the first decorrelated signal linearly in accordance with the first set of wet upmix coefficients, i.e. by forming linear combinations of the channels of the decorrelated signal employing the wet upmix coefficients. The first parametric reconstruction section also comprises a first combining section configured to receive the first dry upmix signal and the first wet upmix signal and to combine these signals to obtain a first multidimensional reconstructed signal corresponding to the N-channel audio signal to be reconstructed.
In an example embodiment, the audio decoding system may further comprise a second parametric reconstruction section operable independently of the first parametric reconstruction section and configured to reconstruct an N2-channel audio signal based on a second single-channel downmix signal and associated dry and wet upmix parameters, wherein N2≥2. It may for example hold that N2=2 or that N2≥3. In the present example embodiment, the second parametric reconstruction section may comprise a second decorrelating section, a second dry upmix section, a second wet upmix section and a second combining section, and the sections of the second parametric reconstruction section may be configured analogously to the corresponding sections of the first parametric reconstruction section. In the present example embodiment, the second wet upmix section may be configured to employ a second intermediate matrix belonging to a second predefined matrix class and a second predefined matrix. The second predefined matrix class and the second predefined matrix may be different than, or equal to, the first predefined matrix class and the first predefined matrix, respectively.
In an example embodiment, the audio decoding system may be adapted to reconstruct a multichannel audio signal based on a plurality of downmix channels and associated dry and wet upmix parameters. In the present example embodiment, the audio decoding system may comprise: a plurality of reconstruction sections, including parametric reconstruction sections operable to independently reconstruct respective sets of audio signal channels based on respective downmix channels and respective associated dry and wet upmix parameters; and a control section configured to receive signaling indicating a coding format of the multichannel audio signal corresponding to a partition of the channels of the multichannel audio signal into sets of channels represented by the respective downmix channels and, for at least some of the downmix channels, by respective associated dry and wet upmix parameters. In the present example embodiment, the coding format may further correspond to a set of predefined matrices for obtaining wet upmix coefficients associated with at least some of the respective sets of channels based on the respective wet upmix parameters. Optionally, the coding format may further correspond to a set of predefined matrix classes indicating how respective intermediate matrices are to be populated based on the respective sets of wet upmix parameters.
In the present example embodiment, the decoding system may be configured to reconstruct the multichannel audio signal using a first subset of the plurality of reconstruction sections, in response to the received signaling indicating a first coding format. In the present example embodiment, the decoding system may be configured to reconstruct the multichannel audio signal using a second subset of the plurality of reconstruction sections, in response to the received signaling indicating a second coding format, and at least one of the first and second subsets of the reconstruction sections may comprise the first parametric reconstruction section.
Depending on the composition of the audio content of the multichannel audio signal, the available bandwidth for transmission from an encoder side to a decoder side, the required playback quality as perceived by a listener and/or the required fidelity of the audio signal as reconstructed on a decoder side, the most appropriate coding format may differ between different applications and/or time periods. By supporting multiple coding formats for the multichannel audio signal, the audio decoding system in the present example embodiment allows an encoder side to employ a coding format more specifically suited for the current circumstances.
In an example embodiment, the plurality of reconstruction sections may include a single-channel reconstruction section operable to independently reconstruct a single audio channel based on a downmix channel in which no more than a single audio channel has been encoded. In the present example embodiment, at least one of the first and second subsets of the reconstruction sections may comprise the single-channel reconstruction section. Some channels of the multichannel audio signal may be particularly important for the overall impression of the multichannel audio signal, as perceived by a listener. By employing the single-channel reconstruction section to encode e.g. such a channel separately in its own downmix channel, while other channels are parametrically encoded together in other downmix channels, the fidelity of the multichannel audio signal as reconstructed may be increased. In some example embodiments, the audio content of one channel of the multichannel audio signal may be of a different type than the audio content of the other channels of the multichannel audio signal, and the fidelity of the multichannel audio signal as reconstructed may be increased by employing a coding format in which that channel is encoded separately in a downmix channel of its own.
In an example embodiment, the first coding format may correspond to reconstruction of the multichannel audio signal from a lower number of downmix channels than the second coding format. By employing a lower number of downmix channels, the required bandwidth for transmission from an encoder side to a decoder side may be reduced. By employing a higher number of downmix channels, the fidelity and/or the perceived audio quality of the multichannel audio signal as reconstructed may be increased.
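The selection of reconstruction sections in response to a signaled coding format may be pictured with a sketch such as the following. The format identifiers, the channel names and the two partitions are entirely hypothetical and serve only to show a partition with fewer downmix channels next to one with more.

```python
# Hypothetical coding formats: each maps to a partition of five channels
# into sets, one downmix channel per set.
CODING_FORMATS = {
    "format_1": [("ch1", "ch2", "ch3"), ("ch4", "ch5")],
    "format_2": [("ch1", "ch2", "ch3"), ("ch4",), ("ch5",)],
}


def select_sections(coding_format):
    """Return (section kind, channel set) for each downmix channel."""
    try:
        partition = CODING_FORMATS[coding_format]
    except KeyError:
        raise ValueError("unknown coding format: %r" % (coding_format,))
    sections = []
    for channels in partition:
        # A set holding a single channel needs no parametric upmix.
        kind = "single-channel" if len(channels) == 1 else "parametric"
        sections.append((kind, channels))
    return sections


sections_1 = select_sections("format_1")
sections_2 = select_sections("format_2")
```

In this sketch the first format uses two downmix channels and the second uses three, so the second trades additional bandwidth for separate coding of two of the channels.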
According to a second aspect, example embodiments propose audio encoding systems as well as methods and computer program products for encoding a multichannel audio signal. The proposed encoding systems, methods and computer program products, according to the second aspect, may generally share the same features and advantages. Moreover, advantages presented above for features of decoding systems, methods and computer program products, according to the first aspect, may generally be valid for the corresponding features of encoding systems, methods and computer program products according to the second aspect.
According to example embodiments, there is provided a method for encoding an N-channel audio signal as a single-channel downmix signal and metadata suitable for parametric reconstruction of the audio signal from the downmix signal and an (N−1)-channel decorrelated signal determined based on the downmix signal, wherein N≥3. The method comprises: receiving the audio signal; computing, according to a predefined rule, the single-channel downmix signal as a linear mapping of the audio signal; and determining a set of dry upmix coefficients in order to define a linear mapping of the downmix signal approximating the audio signal, e.g. via a minimum mean square error approximation under the assumption that only the downmix signal is available for the reconstruction. The method further comprises determining an intermediate matrix based on a difference between a covariance of the audio signal as received and a covariance of the audio signal as approximated by the linear mapping of the downmix signal, wherein the intermediate matrix when multiplied by a predefined matrix corresponds to a set of wet upmix coefficients defining a linear mapping of the decorrelated signal as part of parametric reconstruction of the audio signal, and wherein the set of wet upmix coefficients includes more coefficients than the number of elements in the intermediate matrix. The method further comprises outputting the downmix signal together with dry upmix parameters, from which the set of dry upmix coefficients is derivable, and wet upmix parameters, wherein the intermediate matrix has more elements than the number of output wet upmix parameters, and wherein the intermediate matrix is uniquely defined by the output wet upmix parameters provided that the intermediate matrix belongs to a predefined matrix class.
A parametric reconstruction copy of the audio signal at a decoder side includes, as one contribution, a dry upmix signal formed by the linear mapping of the downmix signal and, as a further contribution, a wet upmix signal formed by the linear mapping of the decorrelated signal. The set of dry upmix coefficients defines the linear mapping of the downmix signal and the set of wet upmix coefficients defines the linear mapping of the decorrelated signals. By outputting wet upmix parameters which are fewer than the number of wet upmix coefficients, and from which the wet upmix coefficients are derivable based on the predefined matrix and the predefined matrix class, the amount of information sent to a decoder side to enable reconstruction of the N-channel audio signal may be reduced. By reducing the amount of data needed for parametric reconstruction, the required bandwidth for transmission of a parametric representation of the N-channel audio signal, and/or the required memory size for storing such a representation, may be reduced.
The intermediate matrix may be determined based on the difference between the covariance of the audio signal as received and the covariance of the audio signal as approximated by the linear mapping of the downmix signal, e.g. for a covariance of the signal obtained by the linear mapping of the decorrelated signal to supplement the covariance of the audio signal as approximated by the linear mapping of the downmix signal.
In an example embodiment, determining the intermediate matrix may include determining the intermediate matrix such that a covariance of the signal obtained by the linear mapping of the decorrelated signal, defined by the set of wet upmix coefficients, approximates, or substantially coincides with, the difference between the covariance of the audio signal as received and the covariance of the audio signal as approximated by the linear mapping of the downmix signal. In other words, the intermediate matrix may be determined such that a reconstruction copy of the audio signal, obtained as a sum of a dry upmix signal formed by the linear mapping of the downmix signal and a wet upmix signal formed by the linear mapping of the decorrelated signal completely, or at least approximately, reinstates the covariance of the audio signal as received.
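The covariance difference that the wet upmix signal is intended to supply may be computed as in the following sketch, in which channels are represented as rows of samples. The function names and the sample values are hypothetical, and the dry upmix coefficients are taken as given.

```python
def covariance(a, b):
    """Return a * b^T for two signals whose channels are rows of samples."""
    return [[sum(p * q for p, q in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def missing_covariance(x, y, dry_coeffs):
    """Covariance of x minus covariance of its dry approximation."""
    dry = [[c * s for s in y] for c in dry_coeffs]
    r = covariance(x, x)
    r_dry = covariance(dry, dry)
    return [[r[i][j] - r_dry[i][j] for j in range(len(x))]
            for i in range(len(x))]


# Illustrative three-channel signal with two samples per channel; the
# downmix is the plain sum of the channels (assumed for the example).
x = [[1, 0], [0, 1], [1, 1]]
y = [2, 2]
dry_coeffs = [0.25, 0.25, 0.5]
delta_r = missing_covariance(x, y, dry_coeffs)
```

For these values the difference is non-zero only between the first two channels, and each of its columns sums to zero, i.e. it is annihilated by the summing downmix used in the example.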
In an example embodiment, outputting the wet upmix parameters may include outputting no more than N(N−1)/2 independently assignable wet upmix parameters. In the present example embodiment, the intermediate matrix may have (N−1)² matrix elements and may be uniquely defined by the output wet upmix parameters provided that the intermediate matrix belongs to the predefined matrix class. In the present example embodiment, the set of wet upmix coefficients may include N(N−1) coefficients.
In an example embodiment, the set of dry upmix coefficients may include N coefficients. In the present example embodiment, outputting the dry upmix parameters may include outputting no more than N−1 dry upmix parameters, and the set of dry upmix coefficients may be derivable from the N−1 dry upmix parameters using the predefined rule.
In an example embodiment, the determined set of dry upmix coefficients may define a linear mapping of the downmix signal corresponding to a minimum mean square error approximation of the audio signal, i.e. among the set of linear mappings of the downmix signal, the determined set of dry upmix coefficients may define the linear mapping which best approximates the audio signal in a minimum mean square sense.
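For a single-channel downmix, the minimum mean square error approximation of each channel by a scaled copy of the downmix signal has a closed form: the coefficient for a channel is the inner product of that channel with the downmix signal divided by the energy of the downmix signal. The following sketch applies this standard least squares result; the function name and sample values are hypothetical.

```python
def mmse_dry_coefficients(x, y):
    """Least squares coefficients c_n minimizing |x_n - c_n * y|^2.

    x : N channels, each a list of samples
    y : samples of the single-channel downmix signal
    """
    energy = sum(s * s for s in y)
    if energy == 0:
        raise ValueError("downmix signal has zero energy")
    return [sum(a * b for a, b in zip(channel, y)) / energy for channel in x]


# Illustrative three-channel signal whose downmix is the sum of its channels.
x = [[1, 0], [0, 1], [1, 1]]
y = [sum(samples) for samples in zip(*x)]
coeffs = mmse_dry_coefficients(x, y)
```

Because the downmix in this example is the plain sum of the channels, the resulting coefficients sum to one.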
According to example embodiments, there is provided an audio encoding system comprising a parametric encoding section configured to encode an N-channel audio signal as a single-channel downmix signal and metadata suitable for parametric reconstruction of the audio signal from the downmix signal and an (N−1)-channel decorrelated signal determined based on the downmix signal, wherein N≥3. The parametric encoding section comprises: a downmix section configured to receive the audio signal and to compute, according to a predefined rule, the single-channel downmix signal as a linear mapping of the audio signal; and a first analyzing section configured to determine a set of dry upmix coefficients in order to define a linear mapping of the downmix signal approximating the audio signal. The parametric encoding section further comprises a second analyzing section configured to determine an intermediate matrix based on a difference between a covariance of the audio signal as received and a covariance of the audio signal as approximated by the linear mapping of the downmix signal, wherein the intermediate matrix when multiplied by a predefined matrix corresponds to a set of wet upmix coefficients defining a linear mapping of the decorrelated signal as part of parametric reconstruction of the audio signal, wherein the set of wet upmix coefficients includes more coefficients than the number of elements in the intermediate matrix. The parametric encoding section is further configured to output the downmix signal together with dry upmix parameters, from which the set of dry upmix coefficients is derivable, and wet upmix parameters, wherein the intermediate matrix has more elements than the number of output wet upmix parameters, and wherein the intermediate matrix is uniquely defined by the output wet upmix parameters provided that the intermediate matrix belongs to a predefined matrix class.
In an example embodiment, the audio encoding system may be configured to provide a representation of a multichannel audio signal in the form of a plurality of downmix channels and associated dry and wet upmix parameters. In the present example embodiment, the audio encoding system may comprise: a plurality of encoding sections, including parametric encoding sections operable to independently compute respective downmix channels and respective associated upmix parameters based on respective sets of audio signal channels. In the present example embodiment, the audio encoding system may further comprise a control section configured to determine a coding format for the multichannel audio signal corresponding to a partition of the channels of the multichannel audio signal into sets of channels to be represented by the respective downmix channels and, for at least some of the downmix channels, by respective associated dry and wet upmix parameters. In the present example embodiment, the coding format may further correspond to a set of predefined rules for computing at least some of the respective downmix channels. In the present example embodiment, the audio encoding system may be configured to encode the multichannel audio signal using a first subset of the plurality of encoding sections, in response to the determined coding format being a first coding format. In the present example embodiment, the audio encoding system may be configured to encode the multichannel audio signal using a second subset of the plurality of encoding sections, in response to the determined coding format being a second coding format, and at least one of the first and second subsets of the encoding sections may comprise the first parametric encoding section. 
In the present example embodiment, the control section may for example determine the coding format based on an available bandwidth for transmitting an encoded version of the multichannel audio signal to a decoder side, based on the audio content of the channels of the multichannel audio signal and/or based on an input signal indicating a desired coding format.
In an example embodiment, the plurality of encoding sections may include a single-channel encoding section operable to independently encode no more than a single audio channel in a downmix channel, and at least one of the first and second subsets of the encoding sections may comprise the single-channel encoding section.
According to example embodiments, there is provided a computer program product comprising a computer-readable medium with instructions for performing any of the methods of the first and second aspects.
According to example embodiments, it may hold that N=3 or N=4 in any of the methods, encoding systems, decoding systems and computer program products of the first and second aspects.
Further example embodiments are defined in the dependent claims. It is noted that example embodiments include all combinations of features, even if recited in mutually different claims.
On an encoder side, which will be described with reference to
$$Y = \sum_{n=1}^{N} d_n x_n = DX, \qquad (1)$$
where $x_n$, $n=1,\ldots,N$, are the channels of the audio signal $X$ and $d_n$, $n=1,\ldots,N$, are downmix coefficients represented by a downmix matrix $D$. On a decoder side, which will be described with reference to
$$\hat{x}_n = c_n Y + \sum_{k=1}^{N-1} p_{n,k} z_k, \quad n=1,\ldots,N, \quad \text{i.e.} \quad \hat{X} = CY + PZ, \qquad (2)$$
where $c_n$, $n=1,\ldots,N$, are dry upmix coefficients represented by a dry upmix matrix $C$, $p_{n,k}$, $n=1,\ldots,N$, $k=1,\ldots,N-1$, are wet upmix coefficients represented by a wet upmix matrix $P$, and $z_k$, $k=1,\ldots,N-1$, are the channels of an $(N-1)$-channel decorrelated signal $Z$ generated based on the downmix signal $Y$. If the channels of each audio signal are represented as rows, the covariance matrix of the original audio signal $X$ may be expressed as $R = XX^T$, and the covariance matrix of the reconstructed audio signal $\hat{X}$ may be expressed as $\hat{R} = \hat{X}\hat{X}^T$. It is to be noted that if, for example, the audio signals are represented as rows comprising complex-valued transform coefficients, the real part of $XX^*$, where $X^*$ is the complex conjugate transpose of the matrix $X$, may be considered instead of $XX^T$.
In order to provide a faithful reconstruction of the original audio signal X, it may be advantageous for the reconstruction given by equation (2) to reinstate full covariance, i.e., it may be advantageous to employ dry and wet upmix matrices C and P such that
$$R = \hat{R}. \qquad (3)$$
One approach is to first find a dry upmix matrix $C$ giving the best possible “dry” upmix $\hat{X}_0 = CY$ in the least squares sense, by solving the normal equations
$$CYY^T = XY^T. \qquad (4)$$
For $\hat{X}_0 = CY$, with a matrix $C$ solving equation (4), it holds that
$$R = \hat{X}_0\hat{X}_0^T + (\hat{X}_0 - X)(\hat{X}_0 - X)^T = R_0 + \Delta R. \qquad (5)$$
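The decomposition in equation (5) can be checked numerically. The following is a minimal sketch, assuming NumPy, a single-channel downmix, random test data, and an all-ones downmix matrix; the function name `dry_upmix` and all shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def dry_upmix(X, D):
    """Least-squares dry upmix for a mono downmix (illustrative sketch).

    X: (N, T) array, channels as rows. D: (1, N) downmix matrix.
    """
    Y = D @ X                           # single-channel downmix, shape (1, T)
    # Normal equations C Y Y^T = X Y^T; Y Y^T is 1x1 for a mono downmix,
    # so solving them reduces to a division by the downmix energy.
    C = (X @ Y.T) / (Y @ Y.T)           # shape (N, 1)
    X0 = C @ Y                          # dry upmix
    R = X @ X.T                         # covariance of the original signal
    R0 = X0 @ X0.T                      # covariance of the dry upmix
    dR = (X0 - X) @ (X0 - X).T          # missing covariance
    return C, Y, R, R0, dR

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 1000))      # hypothetical 3-channel test signal
D = np.array([[1.0, 1.0, 1.0]])         # hypothetical downmix coefficients
C, Y, R, R0, dR = dry_upmix(X, D)
print(np.allclose(R, R0 + dR))
```

The identity holds because the least-squares residual is orthogonal to the downmix, so the cross terms between the dry upmix and the residual cancel.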
Assuming that the channels of the decorrelated signal $Z$ are mutually uncorrelated and all have the same energy $\|Y\|^2$ equal to that of the single-channel downmix signal $Y$, the positive semi-definite missing covariance $\Delta R$ can be factorized according to
$$\Delta R = PP^T\|Y\|^2. \qquad (6)$$
Full covariance may be reinstated according to equation (3) by employing a dry upmix matrix $C$ solving equation (4) and a wet upmix matrix $P$ solving equation (6). Equations (1) and (4) imply that $DCYY^T = YY^T$, and thereby that
$$\sum_{n=1}^{N} d_n c_n = DC = 1, \qquad (7)$$
for non-degenerate downmix matrices $D$. Equations (5) and (7) imply that $D(\hat{X}_0 - X) = DCY - Y = 0$ and
$$D\,\Delta R = 0. \qquad (8)$$
Hence, the missing covariance $\Delta R$ has rank $N-1$, and may indeed be provided by employing a decorrelated signal $Z$ with $N-1$ mutually uncorrelated channels. Equations (6) and (8) imply that $DP = 0$, so that the columns of the wet upmix matrix $P$ solving equation (6) can be constructed from vectors spanning the kernel space of the downmix matrix $D$. The computations for finding a suitable wet upmix matrix $P$ may therefore be moved to that lower-dimensional space.
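The properties stated in equations (7) and (8) can be verified on test data. The sketch below assumes NumPy, random input, and arbitrarily chosen downmix coefficients; none of these values come from the embodiments.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 2000
X = rng.standard_normal((N, T))            # hypothetical N-channel signal
D = np.array([[0.5, 1.0, 1.0, 0.7]])       # hypothetical downmix coefficients

Y = D @ X                                  # mono downmix
C = (X @ Y.T) / (Y @ Y.T)                  # dry upmix matrix from the normal equations
E = C @ Y - X                              # residual of the dry upmix
dR = E @ E.T                               # missing covariance

dc = (D @ C).item()                        # should equal 1 up to rounding
scale = np.trace(dR)                       # reference magnitude for tolerances
d_dR = D @ dR                              # should vanish up to rounding
rank = np.linalg.matrix_rank(dR, tol=1e-8 * scale)

print(dc, np.abs(d_dR).max() / scale, rank)
```

An explicit tolerance is passed to `matrix_rank` so that the numerically tiny eigenvalue along the downmix direction is treated as zero.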
Let $V$ be a matrix of size $N \times (N-1)$ containing an orthonormal basis for the kernel space of the downmix matrix $D$, i.e. a linear space of vectors $v$ with $Dv = 0$. Examples of such predefined matrices $V$ for $N=2$, $N=3$, and $N=4$, respectively, are
In the basis given by $V$, the missing covariance can be expressed as $R_v = V^T(\Delta R)V$. To find a wet upmix matrix $P$ solving equation (6) one may therefore first find a matrix $H$ by solving $R_v = HH^T$, and then obtain $P$ as $P = VH/\|Y\|$, where $\|Y\|$ is the square root of the energy of the single-channel downmix signal $Y$. Other suitable upmix matrices $P$ may be obtained as $P = VHO/\|Y\|$, where $O$ is an orthogonal matrix. Alternatively, one may rescale the missing covariance $R_v$ by the energy $\|Y\|^2$ of the single-channel downmix signal $Y$ and instead solve the equation
$$\frac{R_v}{\|Y\|^2} = H_R H_R^T, \qquad (10)$$
where $H = H_R\|Y\|$, and obtain $P$ as
$$P = VH_R. \qquad (11)$$
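The steps leading to equation (11) can be sketched end to end. The example below makes several assumptions that are not part of the description: NumPy is used, the kernel basis is taken from a singular value decomposition of the downmix matrix rather than from a predefined matrix, the factor of the rescaled missing covariance is chosen as its symmetric square root (one of several possible solutions), and the decorrelated signal is an idealized stand-in built by orthogonalizing noise against the downmix. The helper names `wet_upmix` and `ideal_decorrelated` are hypothetical.

```python
import numpy as np

def wet_upmix(X, D):
    """Dry and wet upmix matrices for a mono downmix (illustrative sketch)."""
    Y = D @ X
    energy = (Y @ Y.T).item()                  # ||Y||^2
    C = (X @ Y.T) / energy                     # dry upmix, normal equations
    E = C @ Y - X
    dR = E @ E.T                               # missing covariance
    # Assumption: orthonormal kernel basis from the SVD of D (shape (1, N)).
    _, _, Vt = np.linalg.svd(D)
    V = Vt[1:].T                               # shape (N, N-1), D @ V = 0
    Rv = V.T @ dR @ V                          # missing covariance in the basis V
    # Assumption: symmetric square root as the solution H_R of Rv/||Y||^2 = H_R H_R^T.
    w, U = np.linalg.eigh(Rv / energy)
    H_R = (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T
    return C, V @ H_R, Y

def ideal_decorrelated(Y, n_channels, rng):
    """Stand-in for a decorrelator: mutually orthogonal channels, each with
    the energy of Y and orthogonal to Y (obtained via a QR factorization)."""
    noise = rng.standard_normal((n_channels, Y.shape[1]))
    Q, _ = np.linalg.qr(np.vstack([Y, noise]).T)
    return np.linalg.norm(Y) * Q[:, 1:].T

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4000))             # hypothetical test signal
D = np.array([[1.0, 1.0, 1.0]])                # hypothetical downmix coefficients
C, P, Y = wet_upmix(X, D)
Z = ideal_decorrelated(Y, 2, rng)
X_hat = C @ Y + P @ Z                          # parametric reconstruction
print(np.allclose(X_hat @ X_hat.T, X @ X.T))
```

With an idealized decorrelated signal the reconstructed covariance matches the original one; a practical decorrelator only approximates these orthogonality properties, so the match would then be approximate.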
When the entries of $H_R$ are quantized and the desired output has a silent channel, the properties of the predefined matrix $V$ as stated above may be inconvenient. As an example, for $N=3$, a better choice for the second matrix of (9) would be
Fortunately, the requirement that the columns of the matrix $V$ are pairwise orthogonal can be dropped as long as these columns are linearly independent. The desired solution $R_v$ to $\Delta R = VR_vV^T$ is then obtained by $R_v = W^T(\Delta R)W$ with $W = V(V^TV)^{-1}$, the transpose of the pseudoinverse of $V$.
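The use of a non-orthogonal basis can be illustrated as follows. This is a minimal sketch assuming NumPy; the matrix `V` below is a hypothetical example with linearly independent, non-orthogonal columns in the kernel of an all-ones downmix matrix, and is not claimed to be the matrix referred to in the description. The function name `kernel_coordinates` is likewise hypothetical.

```python
import numpy as np

def kernel_coordinates(dR, V):
    """Express dR = V Rv V^T in a basis V whose columns need not be orthogonal."""
    W = V @ np.linalg.inv(V.T @ V)       # W^T is the pseudoinverse of V
    return W.T @ dR @ W, W

D = np.array([[1.0, 1.0, 1.0]])          # hypothetical downmix coefficients
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])             # hypothetical basis: D @ V = 0, columns not orthogonal

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 2))
dR = V @ (A @ A.T) @ V.T                 # synthetic missing covariance with D @ dR = 0
Rv, W = kernel_coordinates(dR, V)
print(np.allclose(V @ Rv @ V.T, dR))
```

Since $W^TV$ is the identity, the coordinates $R_v$ are recovered exactly for any missing covariance lying in the space spanned by the columns of $V$.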
The matrix $R_v$ is a positive semi-definite matrix of size $(N-1) \times (N-1)$ and there are several approaches to finding solutions to equation (10), leading to solutions within respective matrix classes of dimension $N(N-1)/2$, i.e. in which the matrices are uniquely defined by $N(N-1)/2$ matrix elements. Solutions may for example be obtained by employing:
It is to be understood that, in the example embodiments described with reference to
In
The total number of downmix channels employed in
According to example embodiments, the audio encoding system 400 described with reference to
According to example embodiments, the audio decoding system 200 described with reference to
Further embodiments of the present disclosure will become apparent to a person skilled in the art after studying the description above. Even though the present description and drawings disclose embodiments and examples, the disclosure is not restricted to these specific examples. Numerous modifications and variations can be made without departing from the scope of the present disclosure, which is defined by the accompanying claims. Any reference signs appearing in the claims are not to be understood as limiting their scope.
Additionally, variations to the disclosed embodiments can be understood and effected by the skilled person in practicing the disclosure, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The devices and methods disclosed hereinabove may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks between functional units referred to in the above description does not necessarily correspond to the division into physical units; to the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
This application is a continuation of U.S. patent application Ser. No. 16/842,212 filed on Apr. 7, 2020, which is a continuation of U.S. patent application Ser. No. 16/363,099 filed on Mar. 25, 2019 (now U.S. Pat. No. 10,614,825 issued on Apr. 7, 2020), which is a continuation and claims the benefit of priority to U.S. patent application Ser. No. 15/985,635 filed on May 21, 2018 (now U.S. Pat. No. 10,242,685 issued on Mar. 26, 2019), which is a divisional and claims priority to Ser. No. 15/031,130 filed on Apr. 21, 2016 (now U.S. Pat. No. 9,978,385 issued on May 22, 2018), which is the U.S. National Stage Entry of International Patent Application No. PCT/EP2014/072570 filed Oct. 21, 2014, which claims the benefit of priority to U.S. Provisional Patent Application No. 61/893,770 filed 21 Oct. 2013; U.S. Provisional Patent Application No. 61/974,544 filed 3 Apr. 2014; and U.S. Provisional Patent Application No. 62/037,693 filed 15 Aug. 2014, each of which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61893770 | Oct 2013 | US
61974544 | Apr 2014 | US
62037693 | Aug 2014 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15031130 | Apr 2016 | US
Child | 15985635 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16842212 | Apr 2020 | US
Child | 17946060 | | US
Parent | 16363099 | Mar 2019 | US
Child | 16842212 | | US
Parent | 15985635 | May 2018 | US
Child | 16363099 | | US