Reducing the number of channels is essential for achieving multichannel coding at low bitrates. For example, parametric stereo coding schemes are based on an appropriate mono downmix of the left and right input channels. The mono signal obtained in this way is encoded and transmitted by a mono codec along with side information that describes the auditory scene in parametric form. The side information usually consists of several spatial parameters per frequency sub-band. They could include for example:
However, downmix processing is prone to signal cancellation and coloration caused by inter-channel phase misalignment, which leads to undesired quality degradation. As an example, if the channels are coherent and nearly out of phase, the downmix signal is likely to show perceivable spectral bias, such as the characteristics of a comb filter.
The downmix operation can be performed in the time domain simply as a weighted sum of the left and right channels, as expressed by
m[n]=w1[n]l[n]+w2[n]r[n],
where l[n] and r[n] are the left and right channels, n is the time index, and w1[n] and w2[n] are the weights that determine the mixing. If the weights are constant over time, the downmix is called a passive downmix. Its disadvantage is that the weights do not depend on the input signal, so the quality of the obtained downmix signal is highly dependent on the input signal characteristics. Adapting the weights over time can reduce this problem to some extent.
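By way of illustration only, the cancellation problem of a passive downmix can be sketched as follows. The sketch assumes constant weights of 0.5 and a hypothetical 1 kHz test tone sampled at 48 kHz; the function names are chosen here for illustration and do not come from the description.

```python
import math


def passive_downmix(left, right, w1=0.5, w2=0.5):
    """Time-domain passive downmix m[n] = w1*l[n] + w2*r[n] with constant weights."""
    return [w1 * l + w2 * r for l, r in zip(left, right)]


def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)


# Hypothetical test tone: 1 kHz sine sampled at 48 kHz (assumed values).
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]

in_phase = passive_downmix(tone, tone)                     # r[n] = l[n]
out_of_phase = passive_downmix(tone, [-x for x in tone])   # r[n] = -l[n]

print(energy(in_phase))      # equals the energy of one input channel
print(energy(out_of_phase))  # 0.0, the downmix cancels completely
```

With identical channels the downmix keeps the energy of one channel, whereas with phase-inverted channels nothing is left for the mono codec to encode.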
However, to solve the main issues, an active downmix is usually performed in the frequency domain, for example using a Short-Time Fourier Transform (STFT). The weights can thereby be made dependent on the frequency index k and the time index n, and can fit the signal characteristics better. The downmix signal is then expressed as:
M[k,n]=W1[k,n]L[k,n]+W2[k,n]R[k,n]
where M[k,n], L[k,n] and R[k,n] are the STFT components of the downmix signal, the left channel and the right channel, respectively, at frequency index k and time index n. The weights W1[k,n] and W2[k,n] can be adaptively adjusted in time and in frequency. This adaptation aims at preserving the average energy or amplitude of the two input channels by minimizing the spectral bias caused by comb-filtering effects.
The most straightforward method for active downmixing is to equalize the energy of the downmix signal so that, for each frequency bin or sub-band, it yields the average energy of the two input channels [1]. The downmix signal as shown in
Such a straightforward solution has several shortcomings. First, the downmix signal is undefined when the two channels have phase-inverted time-frequency components of equal amplitude (ILD=0 dB and IPD=pi). This singularity results from the denominator becoming zero in this case. The output of a simple active downmix is then unpredictable. This behavior is shown in
For ILD=0 dB, the phase of the sum of the two channels is discontinuous at IPD=pi, resulting in a step of pi radians. In other conditions, the phase evolves regularly and continuously modulo 2pi.
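The singularity and the phase step can be reproduced with a small sketch for a single STFT bin. The equalization rule used below (scaling L+R so that its energy equals the average input energy) is an assumption made for this illustration, since the formula of the simple active downmix is not reproduced in the text; `equalized_sum` and `bin_pair` are hypothetical helper names.

```python
import cmath
import math


def equalized_sum(L, R):
    """Scale L+R so that |M|^2 equals the average energy (|L|^2 + |R|^2) / 2.

    Assumed form of a simple active downmix for one STFT bin.
    Returns None at the singularity L + R == 0, where the gain is undefined.
    """
    s = L + R
    if abs(s) == 0.0:
        return None
    gain = math.sqrt((abs(L) ** 2 + abs(R) ** 2) / (2.0 * abs(s) ** 2))
    return gain * s


def bin_pair(ild_db, ipd):
    """Hypothetical helper: R has unit magnitude, L is offset by ILD (dB) and IPD (rad)."""
    R = 1.0 + 0.0j
    L = 10.0 ** (ild_db / 20.0) * cmath.exp(1j * ipd)
    return L, R


eps = 0.01
below = equalized_sum(*bin_pair(0.0, math.pi - eps))
above = equalized_sum(*bin_pair(0.0, math.pi + eps))

# The phase of the downmix jumps by almost pi when the IPD crosses pi.
phase_step = abs(cmath.phase(above) - cmath.phase(below))
print(phase_step)

# Exactly phase-inverted components of equal amplitude: no defined output.
print(equalized_sum(1.0 + 0.0j, -1.0 + 0.0j))  # None
```

Close to the singularity the energy is still matched, but only because the gain becomes very large, which is the source of the instabilities discussed next.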
The second kind of problem comes from the large variance of the normalization gains needed to achieve such an energy equalization. Indeed, the normalization gains can fluctuate drastically from frame to frame and between adjacent frequency sub-bands. This leads to an unnatural coloration of the downmix signal and to block effects. The use of synthesis windows for the STFT together with the overlap-add method results in smoothed transitions between processed audio frames. However, a large change in the normalization gains between successive frames can still lead to audible transition artefacts. Moreover, this drastic equalization can also lead to audible artefacts due to aliasing from the side lobes of the frequency response of the analysis window of the block transform.
As an alternative, the active downmix can be achieved by performing a phase alignment of the two channels before computing the sum signal [2-4]. The energy equalization to be applied to the new sum signal is then limited, since the two channels are already in phase before they are summed. In [2], the phase of the left channel is used as the reference for aligning the two channels in phase. If the phase of the left channel is not well conditioned (e.g. a zero or low-level noise channel), the downmix signal is directly affected. In [3], this important issue is solved by taking as reference the phase of the sum signal before rotation. Still, the singularity problem at ILD=0 dB and IPD=pi is not treated. For this reason, [4] amends the approach by using a broadband phase difference parameter in order to improve stability in such a case. Nonetheless, none of these approaches considers the second kind of problem, which is related to instability. The phase rotation of the channels can also lead to an unnatural mixing of the input channels and can create severe instabilities and block effects, especially when large changes happen in the processing over time and frequency.
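A phase alignment that uses the left channel as phase reference can be sketched for one STFT bin as follows. This is only a minimal illustration of the principle and not the method of any of the cited references; the function name is hypothetical.

```python
import cmath


def phase_aligned_sum(L, R):
    """Rotate R onto the phase of L, then add (left channel as phase reference)."""
    if abs(L) == 0.0 or abs(R) == 0.0:
        return L + R  # nothing to align
    ipd = cmath.phase(L * R.conjugate())   # inter-channel phase difference
    return L + R * cmath.exp(1j * ipd)     # R now shares the phase of L


L, R = 1j, -1j                  # phase-inverted bin, the plain sum L + R is zero
M = phase_aligned_sum(L, R)
print(abs(L + R), abs(M))       # the aligned sum has magnitude |L| + |R|
```

The sketch also shows the weakness noted above: the output phase is taken entirely from L, so a noisy or near-zero left channel dictates the phase of the downmix.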
Finally, there are more evolved techniques like [5] and [6], which are based on the observation that signal cancellation during downmixing occurs only for time-frequency components that are coherent between the two channels. In [5], the coherent components are filtered out before summing up the incoherent parts of the input channels. In [6], the phase alignment is only computed for the coherent components before summing up the channels. Moreover, the phase alignment is regularized over time and frequency to avoid problems of stability and discontinuity. Both techniques are computationally demanding, since in [5] filter coefficients need to be identified at every frame and in [6] a covariance matrix between the channels has to be computed.
According to an embodiment, a downmixer for downmixing at least two channels of a multichannel signal having the two or more channels may have: a processor for calculating a partial downmix signal from the at least two channels; a complementary signal calculator for calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and an adder for adding the partial downmix signal and the complementary signal to obtain a downmix signal of the multichannel signal.
According to another embodiment, a method for downmixing at least two channels of a multichannel signal having the two or more channels may have the steps of: calculating a partial downmix signal from the at least two channels; calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and adding the partial downmix signal and the complementary signal to obtain a downmix signal of the multichannel signal.
According to another embodiment, a multichannel encoder may have: a parameter calculator for calculating multichannel parameters from at least two channels of a multichannel signal having the two or more than two channels, and an inventive downmixer; and an output interface for outputting or storing an encoded multichannel signal including the one or more downmix channels and/or the multichannel parameters.
According to another embodiment, a method for encoding a multichannel signal may have the steps of: calculating multichannel parameters from at least two channels of a multichannel signal having the two or more than two channels; and inventive downmixing; and outputting or storing an encoded multichannel signal including the one or more downmix channels and the multichannel parameters.
According to another embodiment, an audio processing system may have: an inventive multichannel encoder for generating an encoded multichannel signal; and a multichannel decoder for decoding the encoded multichannel signal to obtain a reconstructed audio signal.
According to another embodiment, a method of processing an audio signal may have the steps of: inventive multichannel encoding; and multichannel decoding an encoded multichannel signal to obtain a reconstructed audio signal.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method for downmixing at least two channels of a multichannel signal having the two or more channels, including: calculating a partial downmix signal from the at least two channels; calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and adding the partial downmix signal and the complementary signal to obtain a downmix signal of the multichannel signal, when said computer program is run by a computer.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method for encoding a multichannel signal, including: calculating multichannel parameters from at least two channels of a multichannel signal having the two or more than two channels; and inventive downmixing; and outputting or storing an encoded multichannel signal including the one or more downmix channels and the multichannel parameters, when said computer program is run by a computer.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method of processing an audio signal, including: inventive multichannel encoding; and multichannel decoding an encoded multichannel signal to obtain a reconstructed audio signal, when said computer program is run by a computer.
The present invention is based on the finding that a downmixer for downmixing at least two channels of a multichannel signal having the two or more channels not only performs an addition of the at least two channels for calculating a partial downmix signal from the at least two channels, but additionally comprises a complementary signal calculator for calculating a complementary signal from the multichannel signal, wherein the complementary signal is different from the partial downmix signal. Furthermore, the downmixer comprises an adder for adding the partial downmix signal and the complementary signal to obtain a downmix signal of the multichannel signal. This procedure is advantageous, since the complementary signal, being different from the partial downmix signal, fills any time domain or spectral domain holes within the downmix signal that may occur due to certain phase constellations of the at least two channels. Particularly, when the two channels are in phase, typically no problem occurs when the two channels are simply added together. When, however, the two channels are out of phase, adding these two channels together results in a signal with a very low energy, even approaching zero energy. Since, however, the complementary signal is now added to the partial downmix signal, the finally obtained downmix signal still has significant energy or at least does not show such serious energy fluctuations.
The present invention is advantageous, since it introduces a procedure for downmixing two or more channels aiming to minimize typical signal cancellation and instabilities observed in conventional downmixing.
Furthermore, embodiments are advantageous, since they represent a low-complexity procedure that has the potential to minimize the usual problems of multichannel downmixing.
Advantageous embodiments rely on a controlled energy or amplitude-equalization of the sum signal mixed with the complementary signal that is also derived from the input signals, but is different from the partial downmix signal. The energy-equalization of the sum signal is controlled for avoiding problems at the singularity point, but also to minimize significant signal impairments due to large fluctuations of the gain. Advantageously, the complementary signal is there to compensate a remaining energy loss or to compensate at least a part of this remaining energy loss.
In an embodiment, the processor is configured to calculate the partial downmix signal so that the predefined energy related or amplitude related relation between the at least two channels and the partial downmix channel is fulfilled, when the at least two channels are in phase, and so that an energy loss is created in the partial downmix signal, when the at least two channels are out of phase. In this embodiment, the complementary signal calculator is configured to calculate the complementary signal so that the energy loss of the partial downmix signal is partly or fully compensated by adding the partial downmix signal and the complementary signal together.
In an embodiment, the complementary signal calculator is configured for calculating the complementary signal so that the complementary signal has a coherence index of less than 0.7 with respect to the partial downmix signal, where a coherence index of 0.0 indicates full incoherence and a coherence index of 1.0 indicates full coherence. Thus, it is made sure that the partial downmix signal on the one hand and the complementary signal on the other hand are sufficiently different from each other.
Advantageously, the downmixing generates the sum signal of the two channels such as L+R as it is done in conventional passive or active downmixing approaches. The gains applied to this sum signal that are subsequently called W1 aim at equalizing the energy of the sum channel for either matching the average energy or the average amplitude of the input channels. However, in contrast to conventional active downmixing approaches, W1 values are limited to avoid instability problems and to avoid that the energy relations are restored based on an impaired sum signal.
A second mixing is done with the complementary signal. The complementary signal is chosen such that its energy does not vanish when L and R are out-of-phase. The weighting factors W2 compensate the energy equalization due to the limitation introduced into W1 values.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
In an embodiment, the processor 10 is configured to calculate the partial downmix signal 14 so that the predefined energy-related or amplitude-related relation between the at least two channels and the partial downmix signal is fulfilled when the at least two channels are in phase, and so that an energy loss is created in the partial downmix signal with respect to the at least two channels when the at least two channels are out of phase. Embodiments and examples for the predefined relation are that the amplitudes of the downmix signal are in a certain relation to the amplitudes of the input signals, or that the subband-wise energies, for example, of the downmix signal are in a predefined relation to the energies of the input signals. One particularly interesting relation is that the energy of the downmix signal, either over the full bandwidth or in subbands, is equal to an average energy of the two input signals or the more than two input signals. Thus, the relation can be with respect to energy, or with respect to amplitude. Furthermore, the complementary signal calculator 20 of
Generally, embodiments are based on a controlled energy or amplitude equalization of the sum signal mixed with a complementary signal that is also derived from the input channels. The energy equalization of the sum signal is controlled to avoid problems at the singularity point, but also to minimize significant signal impairments due to large fluctuations of the gain. The complementary signal is there to compensate the remaining energy loss or at least a part of it. The general form of the new downmix can be expressed as
M[k,n]=W1[k,n](L[k,n]+R[k,n])+W2[k,n]S[k,n]
where the complementary signal S[k,n] is ideally as orthogonal as possible to the sum signal, but can in practice be chosen as
S[k,n]=L[k,n]
or
S[k,n]=R[k,n]
or
S[k,n]=L[k,n]−R[k,n].
In all cases, the downmixing generates first the sum channel L+R as it is done in conventional passive and active downmixing approaches. The gain W1[k,n] aims at equalizing the energy of the sum channel for either matching the average energy or the average amplitude of the input channels. However, unlike conventional active downmixing approaches, W1[k,n] is limited to avoid instability problems and to avoid that the energy relations are restored based on an impaired sum signal.
A second mixing is done with the complementary signal. The complementary signal is chosen such that its energy does not vanish when L[k,n] and R[k,n] are out of phase. W2[k,n] compensates the energy equalization due to the limitation introduced in W1[k,n].
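The general form M[k,n]=W1[k,n](L[k,n]+R[k,n])+W2[k,n]S[k,n] with the three practical choices for S[k,n] can be sketched for a single bin as follows. The function name, the `mode` labels and the gain values used in the example are hypothetical and serve only to show how the complementary term keeps energy in the downmix when the sum vanishes.

```python
def complementary_downmix(L, R, w1, w2, mode="L"):
    """M = w1*(L+R) + w2*S for one STFT bin, with S chosen among L, R and L-R."""
    if mode == "L":
        S = L
    elif mode == "R":
        S = R
    elif mode == "L-R":
        S = L - R
    else:
        raise ValueError("mode must be 'L', 'R' or 'L-R'")
    return w1 * (L + R) + w2 * S


# Phase-inverted bin: the sum channel L+R is zero, the complementary term is not.
L, R = 1.0 + 0.0j, -1.0 + 0.0j
print(complementary_downmix(L, R, w1=0.5, w2=0.5, mode="L"))    # (0.5+0j)
print(complementary_downmix(L, R, w1=0.5, w2=0.5, mode="L-R"))  # (1+0j)
```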
As illustrated, the complementary signal calculator 20 is configured to calculate the complementary signal so that the complementary signal is different from the partial downmix signal. In quantities, it is advantageous that a coherence index of the complementary signal is less than 0.7 with respect to the partial downmix signal. In this scale, a coherence index of 0.0 shows a full incoherence and a coherence index of 1.0 shows a full coherence. Thus, a coherence index of less than 0.7 has proven to be useful so that the partial downmix signal and the complementary signal are sufficiently different from each other. However, coherence indices of less than 0.5 and even less than 0.3 are more advantageous.
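The description does not state how the coherence index is computed. One common choice, assumed here purely for illustration, is the magnitude of the normalized cross-correlation between the spectral values of the partial downmix signal and the complementary signal within a band, which yields values between 0.0 and 1.0.

```python
import math


def coherence_index(P, S):
    """Magnitude of the normalized cross-correlation of two complex spectra.

    Assumed definition: 0.0 means fully incoherent, 1.0 means fully coherent.
    """
    cross = sum(p * s.conjugate() for p, s in zip(P, S))
    energy_p = sum(abs(p) ** 2 for p in P)
    energy_s = sum(abs(s) ** 2 for s in S)
    if energy_p == 0.0 or energy_s == 0.0:
        return 0.0
    return abs(cross) / math.sqrt(energy_p * energy_s)


partial = [1 + 0j, 1 + 0j]          # hypothetical partial downmix bins
complementary = [1 + 0j, -1 + 0j]   # hypothetical complementary signal bins
print(coherence_index(partial, partial))        # 1.0
print(coherence_index(partial, complementary))  # 0.0
```

Under this assumed definition, a candidate complementary signal would be acceptable when the returned value stays below 0.7, and better when it stays below 0.5 or 0.3.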
In an embodiment illustrated in
The output of the complementary signal selector 23 is input into a weighting factor calculator 24. The weighting factor calculator additionally typically receives the two or more signals to be combined by the processor 10 and the weighting factor calculator calculates weights W2 illustrated at 26. Those weights together with the signal used and determined by the complementary signal selector 23 are input into the weighter 25, and the weighter then weights the corresponding signal output from block 23 using the weighting factors from block 26 to finally obtain the complementary signal 22.
The weighting factors can be only time-dependent, so that for a certain block or frame in time, a single weighting factor W2 is calculated. In other embodiments, however, it is advantageous to use time and frequency dependent weighting factors W2 so that, for a certain block or frame of the complementary signal, not only a single weighting factor for this time block is available, but a set of weighting factors W2 for a set of different frequency values or spectral bins of the signal generated or selected by block 23.
A corresponding embodiment for time and frequency dependent weighting factors not only for usage of the complementary signal calculator 20, but also for usage of the processor 10 is illustrated in
The time-spectrum converter 60 is configured for applying an FFT and, advantageously, an overlapping FFT, so that the sequence of spectra obtained by block 60 is related to overlapping blocks of the input channels. However, non-overlapping spectral conversion algorithms and conversions other than an FFT, such as a DCT, can be used as well.
Particularly, the processor 10 of
The complementary signal calculator 20 of
Furthermore, the processor 10 of
The adder 30 outputs the downmix signal 40. The downmix signal 40 can be used in several different ways. One way to use the downmix signal 40 is to input it into a frequency domain downmix encoder 64 illustrated in
In embodiments, the processor 10 is configured for calculating time or frequency-dependent weighting factors W1 as illustrated by block 15 in
The embodiment in
In a further embodiment, the procedure in
In an advantageous embodiment illustrated in
In the above equation, A is a real valued constant advantageously being equal to the square root of 2, but A can have other values between 0.5 and 5 as well. Depending on the application, even values different from the above-mentioned values can be used as well.
Given that
|L[k,n]+R[k,n]|≤|L[k,n]|+|R[k,n]|,
W1[k,n] and W2 [k,n] are positive and W1[k,n] is limited to
or e.g. 0.5.
The mixing gains can be computed bin-wise for each index k of the STFT as described in the previous formulas or can be computed band-wise for each non-overlapping sub-band gathering a set of indices b of the STFT. The gains are calculated based on the following equation:
Since the energy preservation during the equalization is not a hard constraint, the energy of the resulting downmix signal varies compared to the average energy of the input channels. The energy relation depends on the ILD and IPD as illustrated in
In contrast to the simple active downmixing method, which preserves a constant relation between the output energy and the average energy of the input channels, the new downmix signal does not show any singularity as illustrated in
Listening test results confirm that the new downmix method results in significantly fewer instabilities and impairments for a large range of stereo signals than conventional active downmixing.
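The difference between a bin-wise and a band-wise computation of a limited gain for the sum signal can be sketched as follows. The gain equations of the embodiment are not reproduced in the text above, so the rule used here (energy equalization of the sum signal, clipped at a hypothetical limit `w1_max`) is an assumption made only to show the band grouping; `band_edges` is likewise a hypothetical parameter listing the first bin of each non-overlapping band plus the end index.

```python
import math


def bandwise_limited_gain(L, R, band_edges, w1_max=1.0):
    """One limited equalization gain per band, expanded to one value per bin.

    Assumed rule: W1 = min(w1_max, sqrt(average input energy / sum energy)),
    with energies accumulated over all bins of the band.
    """
    gains = [0.0] * len(L)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_in = sum(abs(L[k]) ** 2 + abs(R[k]) ** 2 for k in range(lo, hi)) / 2.0
        e_sum = sum(abs(L[k] + R[k]) ** 2 for k in range(lo, hi))
        if e_sum > 0.0:
            w1 = min(w1_max, math.sqrt(e_in / e_sum))
        else:
            w1 = w1_max  # sum signal is empty, the gain stays at its limit
        for k in range(lo, hi):
            gains[k] = w1
    return gains


# Four bins in two bands: in phase in the first band, out of phase in the second.
L = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
R = [1 + 0j, 1 + 0j, -1 + 0j, -1 + 0j]
print(bandwise_limited_gain(L, R, band_edges=[0, 2, 4]))  # [0.5, 0.5, 1.0, 1.0]
```

A bin-wise computation is the special case in which every band contains exactly one bin, i.e. `band_edges = list(range(len(L) + 1))`. In the second band the limited gain cannot restore the energy, which is the part left to the complementary signal.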
Compared to the conventional technology illustrated in
M[k,n]=W1[k,n](L[k,n]+R[k,n])+W2[k,n](L[k,n]−R[k,n])
where the set of gains W1[k, n] and W2[k, n] are computed such that the energy relation between the down-mixed signal and the input channels holds in every condition.
First, the gain W1[k, n] is computed for equalizing the energy up to a given limit, where A is again a real valued number equal to or different from this value:
As a consequence, the gain W1[k, n] of the sum signal is limited to the range [0, 1] as shown in
If the two channels have an IPD greater than pi/2, W1 can no longer compensate for the loss of energy, and the compensation then comes from the gain W2. W2 is computed as one of the roots of the following quadratic equation:
The roots of the equation are given by:
One of the two roots can be then selected. For both roots, the energy relation is preserved for all conditions as shown in
Advantageously, the root with the minimum absolute value is adaptively selected for W2[k, n]. Such an adaptive selection will result in a switch from one root to another for ILD=0 dB, which once again can create a discontinuity.
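Since the quadratic equation itself is not reproduced in the text above, the following sketch derives one under stated assumptions: the complementary signal is S=L−R, the target is the average input energy (|L|²+|R|²)/2, and W1 is the energy-equalizing gain clipped at a hypothetical limit `w1_max`. Expanding |W1(L+R)+W2(L−R)|² and using Re{(L+R)(L−R)*}=|L|²−|R|² gives a quadratic in W2 whose root of minimum absolute value is then selected. All function names are hypothetical.

```python
import math


def limited_w1(L, R, w1_max=1.0):
    """Energy-equalizing gain for the sum signal, clipped at w1_max (assumed limit)."""
    target = 0.5 * (abs(L) ** 2 + abs(R) ** 2)
    e_sum = abs(L + R) ** 2
    if e_sum == 0.0:
        return w1_max
    return min(w1_max, math.sqrt(target / e_sum))


def second_gain_roots(L, R, w1):
    """Real roots W2 of |w1*(L+R) + W2*(L-R)|^2 = (|L|^2 + |R|^2) / 2."""
    e_l, e_r = abs(L) ** 2, abs(R) ** 2
    a = abs(L - R) ** 2
    b = 2.0 * w1 * (e_l - e_r)            # 2*w1*Re{(L+R) conj(L-R)}
    c = w1 ** 2 * abs(L + R) ** 2 - 0.5 * (e_l + e_r)
    if a == 0.0:
        return (0.0, 0.0)                 # L == R: the difference signal is empty
    disc = max(b * b - 4.0 * a * c, 0.0)  # guard against rounding below zero
    root = math.sqrt(disc)
    return ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a))


def downmix_min_root(L, R, w1_max=1.0):
    """Downmix one bin using the root of minimum absolute value for W2."""
    w1 = limited_w1(L, R, w1_max)
    w2 = min(second_gain_roots(L, R, w1), key=abs)
    return w1 * (L + R) + w2 * (L - R)


L, R = 1.0 + 0.0j, -1.0 + 0.0j            # ILD = 0 dB, IPD = pi
M = downmix_min_root(L, R)
print(abs(M) ** 2, 0.5 * (abs(L) ** 2 + abs(R) ** 2))  # both 1.0
```

For ILD=0 dB the linear coefficient of the quadratic is zero, so the two roots have equal magnitude and opposite sign. This is where a selection by minimum absolute value can flip between the roots, in line with the discontinuity noted above.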
In contrast to the state of the art, this approach solves the comb-filtering effect and the spectral bias of the downmix without introducing any singularity. It maintains the energy relations in all conditions, but introduces more instabilities than the advantageous embodiment.
The downmixing is given by:
In the equation for x, an alternative implementation is to use the denominator without a square root.
In this case the quadratic equation to solve is:
This time the gain W2 is not exactly taken as one of the roots of the quadratic equation but rather:
As a result, the energy relation is not preserved all the time as shown in
Although the preceding description and certain Figs. provide detailed equations, it is to be noted that advantages are already obtained even when the equations are not calculated exactly, for example when the equations are calculated but the results are subsequently modified. Particularly, the functionalities of the first weighting factor calculator 15 and the second weighting factor calculator 24 of
An inventively encoded audio signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
16197813.5 | Nov 2016 | EP | regional |
This application is a continuation of copending U.S. application Ser. No. 16/395,933, filed in April 2019, which is a continuation of copending International Application No. PCT/EP2017/077820, filed Oct. 30, 2017, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 16197813.5, filed Nov. 8, 2016, which is incorporated herein by reference in its entirety. The present invention is related to audio processing and, particularly, to the processing of multichannel audio signals comprising two or more audio channels.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16395933 | Apr 2019 | US |
Child | 16847403 | | US |
Parent | PCT/EP2017/077820 | Oct 2017 | US |
Child | 16395933 | | US |