The present invention is related to multichannel processing and, particularly, to multichannel processing providing the possibility for a mono output.
While a stereo encoded bitstream will usually be decoded to be played back on a stereo system, not all devices that are able to receive a stereo bitstream will typically be able to output a stereo signal. A possible scenario would be playback of the stereo signal on a mobile phone with only a mono speaker. With the advent of multi-channel mobile communication scenarios as supported by the emerging 3GPP IVAS standard a stereo-to-mono downmix may therefore be used that is free of additional delay and complexity-wise as efficient as possible while also providing the best possible perceptual quality beyond what is achievable with a simple passive downmix.
There are multiple ways of converting a stereo signal to a mono signal. The most direct way is a passive downmix [1] in the time domain, which generates a mid-signal by adding the left and right channels and scaling the result:
Further, more sophisticated (i.e. active) time-domain downmixing methods include energy scaling in an effort to preserve the overall energy of the signal [2][3], phase alignment to avoid cancellation effects [4], and prevention of comb-filter effects by coherence suppression [5].
Another method is to do the energy correction in a frequency-dependent manner by calculating separate weighting factors for multiple spectral bands. For instance, this is done as part of the MPEG-H format converter [6], where the downmix is performed on a hybrid QMF subband representation of the signals with additional prior phase alignment of the channels. In [7], a similar band-wise downmix (including both phase and temporal alignment) is already used for the parametric low-bitrate mode DFT Stereo, where the weighting and mixing are applied in the DFT domain.
The solution of a passive stereo-to-mono downmix in the time domain after decoding the stereo signal is not ideal, as it is well known that a purely passive downmix comes with certain shortcomings, e.g. phase cancellation effects or a general loss of energy, which can, depending on the item, severely degrade the quality.
Other active downmixing methods that are purely time-domain based mitigate some of the problems of the passive downmix but are still suboptimal due to the lack of frequency-dependent weighting.
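The cancellation effect can be illustrated with a minimal sketch. It assumes a scaling factor of 0.5 for the passive downmix and uses a hypothetical 480-sample test tone; neither value is taken from the cited references.

```python
import math

def passive_downmix(left, right):
    """Passive mid signal: sample-wise sum of both channels, scaled by 0.5 (assumed scaling)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)

n = 480  # hypothetical frame length
left = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
right_in_phase = list(left)                # identical channels
right_out_of_phase = [-v for v in left]    # phase-inverted right channel

e_in = energy(passive_downmix(left, right_in_phase))       # energy is kept
e_out = energy(passive_downmix(left, right_out_of_phase))  # energy is lost completely
```

With identical channels the mid signal keeps the energy of one channel, whereas with a phase-inverted right channel the passive mid signal is all zeros.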
With the implicit constraints for mobile communication codecs like IVAS (Immersive Voice and Audio Services) in terms of delay and complexity, having a dedicated post-processing stage like the MPEG-H format converter for applying a band-wise downmix is also not an option as the transforms to frequency domain and back which may be performed will inevitably cause an increase in both complexity and delay.
In a DFT-based stereo system as described in [8] that uses only parameter-based residual prediction to restore the stereo signal at the decoder and where the mid-signal is generated by an active downmix as described in [7], a sufficiently good mono signal is available at the decoder. However, if spectral parts of the signal rely on a coded residual signal for stereo restoration that was generated by an M/S transform, the mono signal available before the stereo upmix is not suitable anymore. In this case the mono signal will spectrally consist in part of the mid-signal from the M/S transform (residual coding part), which is equal to a passive downmix, and in part of an active downmix (residual prediction part). This mixture of two different downmixing methods leads to artifacts and energy imbalances in the signal.
According to an embodiment, an apparatus for generating an output downmix representation from an input downmix representation, wherein a portion of the input downmix representation is in accordance with a first downmixing scheme, may have: an upmixer for upmixing the portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to acquire an upmixed portion; and a downmixer for downmixing the upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme to acquire a downmixed portion representing the output downmix representation for the portion of the input downmix representation.
According to another embodiment, a multichannel decoder may have: an input interface for providing an input downmix representation and parametric data for a second portion of the input downmix representation, wherein a first portion of the input downmix representation is in accordance with a first downmixing scheme; an upmixer for upmixing the first portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to acquire an upmixed first portion and for upmixing the second portion and the parametric data using a second upmixing scheme to acquire an upmixed second portion, the second upmixing scheme being different from the first upmixing scheme; and a combiner configured to combine the upmixed first portion and the upmixed second portion to acquire a multichannel output signal.
According to another embodiment, a method for generating an output downmix representation from an input downmix representation, wherein a portion of the input downmix representation is in accordance with a first downmixing scheme, may have the steps of: upmixing the portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to acquire an upmixed portion; and downmixing the upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme to acquire a first downmixed portion representing the output downmix representation for at least the portion of the input downmix representation.
According to another embodiment, a method of multichannel decoding may have the steps of: providing an input downmix representation and parametric data for a second portion of the input downmix representation wherein a portion of the input downmix representation is in accordance with a first downmixing scheme; upmixing the input downmix representation of the first portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to acquire an upmixed first portion; upmixing the second portion and the parametric data using a second upmixing scheme to acquire an upmixed second portion, the second upmixing scheme being different from the first upmixing scheme; and combining the upmixed first portion and the upmixed second portion to acquire a multichannel output signal.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform, when said computer program is run by a computer, a method for generating an output downmix representation from an input downmix representation, wherein a portion of the input downmix representation is in accordance with a first downmixing scheme, the method having the steps of: upmixing the portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to acquire an upmixed portion; and downmixing the upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme to acquire a first downmixed portion representing the output downmix representation for at least the portion of the input downmix representation.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform, when said computer program is run by a computer, a method of multichannel decoding, the method having the steps of: providing an input downmix representation and parametric data for a second portion of the input downmix representation, wherein a first portion of the input downmix representation is in accordance with a first downmixing scheme; upmixing the input downmix representation of the first portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to acquire an upmixed first portion; upmixing the second portion and the parametric data using a second upmixing scheme to acquire an upmixed second portion, the second upmixing scheme being different from the first upmixing scheme; and combining the upmixed first portion and the upmixed second portion to acquire a multichannel output signal.
An apparatus for generating an output downmix representation from an input downmix representation, where at least a portion of the input downmix representation is in accordance with a first downmixing scheme, comprises an upmixer for upmixing at least a portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to obtain at least one upmixed portion. Furthermore, the apparatus comprises a downmixer for downmixing the at least one upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme.
In another embodiment, the portion of the input downmix representation is in accordance with the first downmixing scheme and, additionally, a second portion of the input downmix representation is in accordance with a second downmixing scheme being different from the first downmixing scheme. In this embodiment, the downmixer is configured for downmixing the upmixed portion in accordance with the second downmixing scheme or in accordance with a third downmixing scheme different from the first downmixing scheme and the second downmixing scheme to obtain the first downmixed portion. Now, the situation with respect to the downmixed portion is such that the first downmixed portion and the second portion are related and, as one could say, in the same downmix scheme domain. The first downmixed portion and the second portion, or a downmixed portion derived from the second portion, can therefore be combined by a combiner to obtain the output downmix representation comprising an output representation for the first portion and an output representation for the second portion, where the output representation for the first portion and the output representation for the second portion are based on the same downmixing scheme, i.e., are located in one and the same downmixing domain and are, therefore, “harmonized” with each other.
In a further embodiment, either the whole bandwidth or just a portion of the input downmix representation is based on a downmixing scheme relying on parameters and a residual signal or only relying on a residual signal without parameters. In such a context, the input downmix representation comprises a core signal, a residual signal or a residual signal and parameters. This signal is upmixed using the side information, i.e., using the parameters and the residual signal or using just the residual signal. The upmix comprises all the available information including the residual signal and a downmix is performed into the second downmixing scheme which is different from the first downmixing scheme, i.e., which is, advantageously, an active downmix having measures for addressing energy calculations or, in other words, a downmixing scheme that does not generate a residual signal and, advantageously, does not generate a residual signal and any parameters. Such a downmix provides a good and pleasant and high quality audio mono rendering possibility, while the core signal of the input downmix representation when used without upmixing and subsequent downmixing does not provide any pleasant and high quality audio reproduction if rendered without advantageously taking into consideration the residual signal and the parameters.
In accordance with this embodiment, the apparatus for generating an output downmix representation performs a conversion of a residual-like downmixing scheme into a non-residual like downmixing scheme. This conversion can be performed either in the full band or can also be performed in a partial band. Typically, and in advantageous embodiments, the lowband of a multichannel-encoded signal comprises a core signal, a residual signal and advantageously parameters. However, in the highband, less precision is provided in favor of a lower bit rate and, therefore, in such a highband an active downmix is sufficient without any additional side information such as residual data or parameters. In such a context, the lowband which is in the residual-downmix domain is converted into the non-residual downmix domain and the result is combined with the highband that is already in the “correct” non-residual downmix domain.
In a further embodiment, it is not required that the first portion is converted from the first downmix domain into the same downmix domain, in which the second portion is located. Instead, in further embodiments, where the first portion is in the first downmix domain and the second portion of the input representation is in the second downmix domain, both these portions are converted into another third downmix domain by upmixing the first portion in accordance with the first upmixing scheme corresponding to the first downmixing scheme. Additionally, the second portion is upmixed in accordance with the second upmixing scheme corresponding to the second downmixing scheme, and both upmixes are downmixed, advantageously by an active downmix without any residual or parametric data, into the third downmixing scheme, which is different from the first and the second downmixing schemes.
In further embodiments, more than two portions and, in particular, spectral portions or spectral bands, can be available that are in different downmix representations. By means of the present invention, where, advantageously, the upmixing and subsequent downmixing is performed in the spectral domain, individual processings for individual bands can be performed without interference from one spectral band to the other spectral band. At the output of the downmixer, all bands are in the same “downmix” domain and, therefore, a spectrum for the mono output downmix representation exists, which can be converted into a time domain representation by a spectrum-time-converter such as a synthesis filter bank, an inverse discrete Fourier transform, an inverse MDCT or any other such transform. The combination of the individual bands and the conversion into the time domain can be implemented by means of such a synthesis filter bank. In particular, it is irrelevant whether the combination is performed before the actual conversion, i.e., in the spectral domain, or after it. In the former case, the combination takes place before the spectrum-time transform, i.e., at the input into the synthesis filter bank, and only a single transform is performed to obtain a single time domain signal. In the equivalent alternative implementation, the combiner performs a spectrum-time transform for each band individually, so that the time domain output of each such individual transform represents a time domain representation but in a certain bandwidth, and the individual time domain outputs are combined in a sample-by-sample manner, advantageously subsequent to some kind of upsampling when critically sampled transforms have been implemented.
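The equivalence of the two combination orders follows from the linearity of the transform and can be checked with a small sketch. The naive inverse DFT, the 8-bin spectrum and the band border `split` are illustrative assumptions; both bands are transformed at full length here, so the upsampling step mentioned for critically sampled transforms does not arise.

```python
import cmath

def idft(spectrum):
    """Naive inverse DFT, adequate for a short illustrative spectrum."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

spectrum = [complex(k + 1, -k) for k in range(8)]  # hypothetical downmix spectrum
split = 3                                          # hypothetical band border (bin index)
low = spectrum[:split] + [0j] * (len(spectrum) - split)   # lowband, zero elsewhere
high = [0j] * split + spectrum[split:]                    # highband, zero elsewhere

# Variant 1: combine in the spectral domain, then apply a single transform.
combined_first = idft([a + b for a, b in zip(low, high)])
# Variant 2: transform each band individually, then add sample by sample.
transformed_first = [a + b for a, b in zip(idft(low), idft(high))]

max_diff = max(abs(a - b) for a, b in zip(combined_first, transformed_first))
```

Both variants yield the same time domain signal up to rounding error; the first needs only one transform.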
In a further implementation, the present invention is applied within a multichannel decoder that is operable in two different modes, i.e., in the multichannel output mode as the “normal” mode and that is also operable in a second mode such as an “exceptional mode” which is the mono output mode. This mono output mode is particularly useful when the multichannel decoder is implemented within a device which only has a mono speaker output facility such as a mobile phone having a single speaker or which is implemented in a device that is in some kind of power saving mode where, in order to save battery power or to save processing resources, only a mono output mode is provided even though the device would, basically, also have the possibility for a multichannel or a stereo output mode.
In such an implementation, the multichannel decoder comprises a first time-spectrum transform for the decoded core signal and a second time-spectrum transform facility for the decoder residual signal. Two different upmixing facilities in the spectral domain for two different spectral portions being in two different downmix domains are provided and the corresponding left channel spectral lines are combined by a combiner such as a synthesis filterbank or an IDFT block and the other channel spectral lines are combined by an additional or second synthesis filterbank or IDFT (inverse discrete Fourier transform) block.
In order to enhance such a multichannel decoder, the downmixer for downmixing the at least one upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme that is advantageously implemented as an active downmixer is provided. Additionally, in an embodiment, two switches and a controller are provided as well. The controller controls a first switch to bypass an upmixer for the highband portion and the second switch is implemented to feed the downmixer with the output of the upmixer. In such a mono output mode, the second combiner or synthesis filterbank is inactive and the upmixer for the highband is inactive as well in order to save processing power. However, in the stereo output mode, the first switch feeds the upmix for the highband and the second switch bypasses the (active) downmixer and both output synthesis filterbanks are active in order to obtain the left stereo output signal and the right output signal.
Since the mono output is calculated in the spectral domain such as the DFT domain, the generation of the mono output does not incur any additional delay compared to the generation of the stereo output, because any additional time-frequency transforms compared to the stereo processing mode are not necessary. Instead, one of the two stereo mode synthesis filterbanks is used for the mono mode as well. Furthermore, compared to the stereo output that, typically, provides an enhanced audio experience compared to the mono output, the mono processing mode saves complexity and, in particular, processing resources and, therefore, battery power in a low power mode particularly useful for a battery-powered mobile device. This is true, since the highband upmixer that is normally used in the stereo mode can be deactivated and, additionally, a second output filterbank that may also be used for the stereo output mode is deactivated as well. Instead, only a low complexity and low delay active downmix block fully operating in the spectral domain may be used as an additional processing block compared to the stereo mode. The additional processing resources that may be used by this active downmix block, however, are significantly smaller than the processing resources that are saved by deactivating the highband upmixer and the second synthesis filterbank or IDFT block.
Embodiments aim at generating a harmonized mono output signal from a mono input signal that was created by a downmix of a stereo signal where the downmix was done with different methods (e.g. active and passive) for at least two different spectral regions of the stereo signal. The harmonization is achieved by picking one downmix method as the advantageous method for the harmonized signal and transforming all spectral parts that were downmixed via different methods to the advantageous method. This is achieved by first upmixing these spectral parts using all the side parameters which may be used for the upmix to regain an LR representation in the respective spectral regions. Again using all the parameters that may be used for the advantageous downmix method, the spectral parts are converted to a mono representation by applying the advantageous method to the stereo representation. A harmonized mono output signal is generated that avoids the problems of a non-uniform downmix without additional delay and complexity.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
The apparatus illustrated in
The first portion of the downmix representation is input into the upmixer 200 that upmixes corresponding to the first downmixing scheme and the first portion is forwarded, as discussed with respect to
A mostly parametric stereo scheme as described in [8] is built around the idea of only transmitting a single downmixed channel and recreating the stereo image via side parameters. This downmix at the encoder side is done in an active manner by dynamically calculating weights for both channels in the DFT domain [7]. These weights are computed band-wise using the respective energies of the two channels and their cross-correlation. The target energy that has to be preserved by the downmix is equal to the energy of the phase-rotated mid-channel:
where L and R represent the left and right channel. Based on this target energy the weights for the channels can be computed per band b as follows:
|L| and |R| are computed for each band b as
|L+R| is computed as
and |<L, R>| is computed as the absolute value of the complex dot product
with
where i specifies the bin number inside spectral band b.
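The four band-wise quantities can be sketched as follows. The function name `band_measures` is hypothetical. The sketch assumes that |L+R| is the same root-of-summed-squares measure applied to the bin-wise sum, and that the right channel is the conjugated operand of the dot product (the choice of conjugated operand does not change the absolute value).

```python
import math

def band_measures(L, R):
    """Return (|L|, |R|, |L+R|, |<L,R>|) for the complex bins L, R of one band b."""
    abs_l = math.sqrt(sum(abs(x) ** 2 for x in L))
    abs_r = math.sqrt(sum(abs(y) ** 2 for y in R))
    # Assumption: same measure applied to the bin-wise sum of both channels.
    abs_sum = math.sqrt(sum(abs(x + y) ** 2 for x, y in zip(L, R)))
    # Complex dot product over the bins i of the band (conjugation on R is an assumption).
    dot = sum(x * y.conjugate() for x, y in zip(L, R))
    abs_dot = math.sqrt(dot.real ** 2 + dot.imag ** 2)
    return abs_l, abs_r, abs_sum, abs_dot

# Hypothetical two-bin band with identical channels.
abs_l, abs_r, abs_sum, abs_dot = band_measures([3 + 4j, 0j], [3 + 4j, 0j])
```

For identical channels the magnitude of the sum is twice the channel magnitude, and the dot product equals the band energy of one channel.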
The downmixed spectrum is obtained for each band by adding the weighted spectral bins of left and right channel:
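As a simplified illustration of an energy-preserving band-wise downmix, the following sketch does not use the separate weights for left and right from [7]. It substitutes a single common gain applied to the passive sum, chosen so that the band energy matches an assumed target: the energy of a mid-channel (L + e^{jΦ}R)/2 whose right channel has been rotated into phase with the left. The factor 1/4, the gain limit `max_gain` and the function name are assumptions.

```python
import math

def energy_matched_downmix(L, R, max_gain=4.0, eps=1e-12):
    """Single-gain stand-in for an active downmix of one band (not the weights of [7])."""
    e_l = sum(abs(x) ** 2 for x in L)
    e_r = sum(abs(y) ** 2 for y in R)
    dot = sum(x * y.conjugate() for x, y in zip(L, R))
    # Assumed target: energy of the phase-rotated mid-channel (L + e^{j*phi} R) / 2.
    target = 0.25 * (e_l + e_r + 2.0 * abs(dot))
    passive = [0.5 * (x + y) for x, y in zip(L, R)]
    e_passive = sum(abs(m) ** 2 for m in passive)
    # Limit the gain so that near-cancelling bands are not boosted without bound.
    gain = min(math.sqrt(target / max(e_passive, eps)), max_gain)
    return [gain * m for m in passive], target

# Hypothetical band: right channel equals left channel rotated by 90 degrees.
L_band = [1 + 0j, 1 + 0j]
R_band = [1j, 1j]
M_band, target = energy_matched_downmix(L_band, R_band)
e_out = sum(abs(m) ** 2 for m in M_band)
```

In this example the passive sum would have a band energy of 1, while the scaled downmix reaches the target of 2. Using equal weights for both channels is a deliberate simplification; when the channels cancel completely, a common gain cannot restore any energy.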
If all the stereo processing in such a system is entirely reliant on parameters and the described active downmix is done on the whole spectrum, a mono signal that satisfies the given quality requirements by avoiding the problems of a passive downmix is already available after the core decoding. This means that in most cases it suffices to skip all decoder stereo processing and output the signal without going into DFT domain.
However, for higher bitrates this kind of system also supports the coding of a residual signal for the lower spectral bands. The residual signal can be seen as the side-signal of an MS-transform of these lowest bands while the core signal is the complementary mid-signal, basically a passive downmix of left and right. To keep the side signal as small as possible, a compensation of the interaural level differences (ILDs) between the channels is applied to it using side gains that are computed per band.
The downmixed mid-channel is computed at the encoder side for every spectral bin i inside the residual coding spectrum as
while the complementary side channel is computed as
The residual signal is obtained by subtracting the predicted part due to an ILD between left and right:
with side gain gb of the current spectral band b given as
The full-band signal going into the core coder is a mixture of passive downmix in lower bands and active downmix in all higher bands. Listening tests have shown that there are perceptual issues when playing back such a mixed signal. A way of harmonizing the different signal parts is therefore useful.
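The encoder-side lowband processing described above can be sketched as follows. The scaling factor 0.5 for mid and side, the function name, and in particular the least-squares choice of the side gain are assumptions made for illustration; the side gain of the described system is derived from the ILD and may differ.

```python
def ms_with_residual(L, R, eps=1e-12):
    """Mid, residual and side gain for the complex bins of one lowband band b."""
    M = [0.5 * (l + r) for l, r in zip(L, R)]   # passive mid (assumed scaling 0.5)
    S = [0.5 * (l - r) for l, r in zip(L, R)]   # complementary side
    e_m = sum(abs(m) ** 2 for m in M)
    # Hypothetical side gain: real least-squares prediction of S from M.
    g = sum((s * m.conjugate()).real for s, m in zip(S, M)) / max(e_m, eps)
    res = [s - g * m for s, m in zip(S, M)]     # residual = side minus predicted part
    return M, res, g

# Hypothetical band with a pure level difference: L is twice R in every bin.
L_band = [2 + 0j, 2j]
R_band = [1 + 0j, 1j]
M_band, res_band, g_b = ms_with_residual(L_band, R_band)
```

For a pure level difference between the channels the side signal is fully predicted from the mid signal, so the residual vanishes and only the gain has to be conveyed.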
The L-R representations of the lowband signal with residual coding are thereby regained as follows:
Subsequently, the active downmix is applied as described above, only the weights are calculated from the upmixed decoded spectra L and R. The lowband is combined with the already actively downmixed highband to create a harmonized signal which is brought back to time domain via IDFT.
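The harmonization steps can be sketched end to end under stated assumptions: mid and side are scaled by 0.5 at the encoder, so that L = (1 + g)M + res and R = (1 - g)M - res; the lowband is treated as a single band; and the active downmix is a single-gain stand-in rather than the weights of [7]. All function names and the band border `split` are hypothetical.

```python
import math

def upmix_residual(M, res, g):
    """Invert M = (L+R)/2, res = (L-R)/2 - g*M for one band (assumed scaling)."""
    L = [(1 + g) * m + r for m, r in zip(M, res)]
    R = [(1 - g) * m - r for m, r in zip(M, res)]
    return L, R

def active_downmix(L, R, eps=1e-12):
    """Single-gain stand-in: scale the passive sum to the phase-aligned mid energy."""
    e_l = sum(abs(x) ** 2 for x in L)
    e_r = sum(abs(y) ** 2 for y in R)
    dot = sum(x * y.conjugate() for x, y in zip(L, R))
    target = 0.25 * (e_l + e_r + 2.0 * abs(dot))
    passive = [0.5 * (x + y) for x, y in zip(L, R)]
    e_passive = sum(abs(m) ** 2 for m in passive)
    gain = math.sqrt(target / max(e_passive, eps))
    return [gain * m for m in passive]

def harmonize(core, res_low, g_low, split):
    """Re-downmix bins below `split`; keep the already actively downmixed highband."""
    L, R = upmix_residual(core[:split], res_low, g_low)
    return active_downmix(L, R) + core[split:]

# Hypothetical decoded spectra: two lowband bins (passive mid) and two highband bins.
core = [0.5 + 0.5j, 0.5 + 0.5j, 0.3 + 0j, 0.2 + 0j]
res_low = [0.5 - 0.5j, 0.5 - 0.5j]
out = harmonize(core, res_low, 0.0, split=2)
e_low_before = sum(abs(m) ** 2 for m in core[:2])
e_low_after = sum(abs(m) ** 2 for m in out[:2])
```

The resulting spectrum would then be passed to a single IDFT. In the example the lowband energy rises from 1 to 2 while the highband bins pass through unchanged.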
Furthermore, a second combiner 420 is illustrated in
Contrary thereto, in the stereo output mode or, generally, in the multichannel output mode, the controller 700 is configured to activate, via control signal CTRL1 the first switch so that the output of the first time-to-frequency converter 100 is fed into the second upmixer 220 indicated as “upmix high” in
In block 820, the weights are applied to the upmixed signal over the whole bandwidth of the signal under consideration or only in the corresponding portion per spectral bin. To this end, block 820 receives the spectral domain (complex) signals or bins or spectral values. Subsequent to the application of the weights and, particularly, an addition of the weighted values to obtain the downmix, a conversion 840 to the time domain is performed. Depending on whether only a portion or the full band is processed in block 820, the conversion to the time domain takes place without any other portion or takes place with the other portion particularly in the context of a harmonized downmix as, for example, illustrated and discussed with respect to
As illustrated, the amplitude-related measure can be the square root over the squared magnitudes of the spectral values in a band. This is illustrated as |Lb|. Another amplitude-related measure would, for example, be the sum over the magnitudes of the spectral lines in the band without any square root or with an exponent being different from ½ such as an exponent being between 0 and 1 but excluding 0 and 1. Furthermore, the amplitude-related measure could also refer to a sum over exponentiated magnitudes of spectral lines where the exponent is different from 2. For example, using an exponent of 3 would correspond to the loudness in psychoacoustic terms. However, other exponents being greater than 1 would be useful as well.
The same is true for the amplitude-related measure calculated in block 804 or the amplitude-related measure calculated in block 806.
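The family of amplitude-related measures described above can be sketched with two exponents. The function name and the parameter names `inner_exp` and `outer_exp` are hypothetical.

```python
def amplitude_measure(band, inner_exp=2.0, outer_exp=0.5):
    """Sum of exponentiated bin magnitudes, raised to an outer exponent.

    The defaults (2 and 1/2) give the square root over the squared magnitudes;
    inner_exp=1 and outer_exp=1 give the plain sum of magnitudes.
    """
    return sum(abs(x) ** inner_exp for x in band) ** outer_exp

band = [3 + 4j, 3 + 4j]  # hypothetical two-bin band, each bin has magnitude 5
root_of_squares = amplitude_measure(band)                                 # sqrt(50)
sum_of_magnitudes = amplitude_measure(band, inner_exp=1.0, outer_exp=1.0)
cubic = amplitude_measure(band, inner_exp=3.0, outer_exp=1.0)             # exponent 3
```

The different exponent choices weight strong and weak spectral lines differently, which is why they yield different values for the same band.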
Furthermore, with respect to the cross-correlation measure calculated in block 808, the corresponding mathematical equation illustrated before also relies on a squaring of the dot products and the calculation of a square root. However, other exponents for the dot products different from 2 such as exponents equal to 3 corresponding to a loudness domain or exponents greater than 1 can be used as well. At the same time, instead of the square root, other exponents different from ½ can be used such as ⅓ or, generally, any exponent being between 0 and 1.
Furthermore, block 810 indicates the calculation of wR and wL based on the three amplitude-related measures and the cross-correlation measure. Although it has been indicated that the target energy is preserved by the downmix and is equal to the energy of the phase-rotated mid-channel, it is necessary neither for the calculation of wR and wL nor for the calculation of the actual downmix signal that such a rotation with a rotation angle is actually performed. Instead, the only thing that is highly expedient when the actual rotation with the rotation angle Φ is not performed is the calculation of the cross-correlation measure between L and R in the corresponding bands b. In the previously described embodiment, although it has been indicated that an energy of a phase-rotated mid-channel is used as the target energy, any other target energy can be used, or no phase rotation has to be performed at all. With respect to other target energies, these target energies are energies that make sure that the energy of the downmix signal generated by the downmix 300 fluctuates less, for the same signal, than the energy of a passive downmix as, for example, underlying the decoded core signal input into block 100 of
Advantageously, the time-to-spectral converters 100, 120 of
Alternatively, when the time-to-spectral conversion on the one hand and the spectral-time-conversion on the other hand are performed with, for example, a modified discrete cosine transform, an overlap processing is used as well. On the spectral-to-time conversion side, an overlap-add processing is performed so that, once again, each output time domain sample is obtained by summing corresponding time domain samples from two (or more) different IMDCT blocks.
Advantageously, the harmonization of the downmixing schemes is performed fully in the spectral domain as illustrated in
Advantageous embodiments remove artifacts and spectral loudness imbalances that stem from having different downmix methods in different spectral bands in the decoded core signal of a system as described in [8] without the additional delay and significantly higher complexity that a dedicated post-processing stage would bring about.
Embodiments provide, in an aspect, an upmix and a subsequent downmix at the decoder of one (or more) spectral or time parts of a mono signal, that was downmixed using one or more than one downmix method, in order to harmonize all spectral or time parts of the signal.
The present invention provides, in an aspect, a harmonization of a stereo-to-mono downmix at the decoder side.
In an embodiment, the output downmix is for a replay device that receives the downmix included in the output representation and feeds this downmix of the output representation into a digital to analog converter and the analog downmix signal is rendered by one or more loudspeakers included in the replay device. The replay device may be a mono device such as a mobile phone, a tablet, a digital clock, a Bluetooth speaker etc.
It is to be mentioned here that all alternatives or aspects as discussed before and all aspects as defined by independent claims in the following claims can be used individually, i.e., without any other alternative or object than the contemplated alternative, object or independent claim. However, in other embodiments, two or more of the alternatives or the aspects or the independent claims can be combined with each other and, in other embodiments, all aspects or alternatives and all independent claims can be combined with each other.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
Some example solutions are listed below.
1. Apparatus for generating an output downmix representation from an input downmix representation, wherein at least a portion of the input downmix representation is in accordance with a first downmixing scheme, the apparatus comprising: an upmixer (200) for upmixing at least the portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to obtain at least one upmixed portion; and a downmixer (300) for downmixing the at least one upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme to obtain a first downmixed portion representing the output downmix representation for at least the portion of the input downmix representation.
2. Apparatus of solution 1, wherein only the portion of the input downmix representation is in accordance with the first downmixing scheme and a second portion of the input downmix representation is in accordance with the second downmixing scheme, wherein the downmixer (300) is configured for downmixing the at least one upmixed portion in accordance with the second downmixing scheme to obtain the first downmixed portion; and further comprising a combiner (400) for combining the first downmixed portion and the second portion of the input downmix representation or a downmixed portion derived from the second portion of the input downmix representation to obtain the output downmix representation comprising a first output representation for only the portion of the input downmix representation and a second output representation for the second portion of the input downmix representation, wherein the first output representation for only the portion of the input downmix representation and the second output representation for the second portion of the input downmix representation are based on the same downmixing scheme.
3. Apparatus of solution 1 or 2, wherein the at least the portion of the input downmix representation or only the portion of the input downmix representation is a first frequency band, wherein the first downmixing scheme is a downmixing scheme relying on a residual signal, and wherein the upmixer (200) is configured to perform an upmix using the residual signal.
4. Apparatus of solution 1, 2, or 3, wherein the second downmixing scheme is a fully parametric scheme, and wherein the downmixer (300) is configured to apply the second downmixing scheme.
5. Apparatus of solution 2, 3, or 4, wherein the second portion of the input downmix representation is a second frequency band, and wherein the combiner (400) is configured to combine the first downmixed portion and the second portion of the input downmix representation to obtain the output downmix representation.
6. Apparatus of any one of the preceding solutions, further comprising an audio decoder (10) for generating a decoded core signal for at least the portion of the input downmix representation or only the portion of the input downmix representation, and a decoded residual signal for at least the portion of the input downmix representation or only the portion of the input downmix representation, wherein the upmixer (200) is configured to use, in the upmixing scheme, the decoded core signal for at least the portion of the input downmix representation or only the portion of the input downmix representation and the decoded residual signal for at least the portion of the input downmix representation or only the portion of the input downmix representation, wherein the downmixer (300) is configured for receiving the at least one upmixed portion comprising more channels than the input downmix representation.
7. Apparatus of solution 6, wherein the second portion of the input downmix representation is in accordance with the second downmixing scheme, wherein the audio decoder (10) is configured for generating a decoded core signal for the second portion of the input downmix representation and a decoded residual signal for at least the portion of the input downmix representation or only the portion of the input downmix representation only, and wherein the combiner (400) is configured to combine the first downmixed portion and the decoded core signal for the second portion of the input downmix representation.
8. Apparatus of one of the preceding solutions, further comprising: a time-to-spectrum converter (100) for converting a time domain input downmix representation of at least the portion of the input downmix representation or only the portion of the input downmix representation into a spectral domain; and a spectrum-to-time converter (400) for converting an output signal into a time domain to obtain the output downmix representation, wherein the time-to-spectrum converter (100) or the spectrum-to-time converter (400) is configured to perform an overlap and add processing or to perform a crossover processing from an earlier time block to a later time block, or further comprising an output interface (500) for outputting the output downmix representation to a rendering device or further comprising a rendering device for rendering the output downmix representation as a mono replay signal, or wherein the downmixer (300) is configured to apply, as the second downmixing scheme, an active downmixing scheme, an energy conserving downmixing scheme, or a downmixing scheme, in which a target energy of the downmix signal is in a predetermined ratio to an energy of a mid-channel derived from a first channel and a second channel, wherein at least one of the first channel and the second channel is phase rotated before being added together to form the input downmix representation.
9. Apparatus of solution 8, wherein the second portion of the input downmix representation is in accordance with the second downmixing scheme, wherein the time-to-spectrum converter (100) is configured for converting a time domain input downmix representation of the second portion of the input downmix representation into the spectral domain, or wherein the predetermined ratio indicates an equality or a deviation range being 3 dB related to a higher energy of energies of a first original channel and a second original channel.
10. Apparatus of one of the preceding solutions, wherein at least the portion of the input downmix representation is in accordance with the first downmixing scheme relying on a residual signal or on a residual signal and parametric information, wherein the upmixer (200) is configured for upmixing the input downmix representation of at least the portion of the input downmix representation using the upmixing scheme corresponding to the first downmixing scheme and using the residual signal or the residual signal and the parametric information, respectively, to obtain the at least one upmixed portion; and wherein the downmixer (300) is configured for downmixing the at least one upmixed portion in accordance with the second downmixing scheme different from the first downmixing scheme, wherein the second downmixing scheme is an active downmixing scheme or a fully parametric downmixing scheme to obtain the output downmix representation comprising at least one downmixed portion.
11. Apparatus of solution 10, further comprising an output interface (500) for outputting the output downmix representation to a rendering device or further comprising a rendering device for rendering the output downmix representation as a mono replay signal.
12. Apparatus of solution 10 or 11, wherein the downmixer (300) is configured to apply, as the active downmixing scheme, an energy conserving downmixing scheme, or a downmixing scheme, in which a target energy of the downmix signal is in a predetermined ratio to an energy of a mid-channel derived from a first channel and a second channel, wherein at least one of the first channel and the second channel is phase rotated before being added together.
13. Apparatus of solution 10, 11, or 12, wherein at least the portion of the input downmix representation comprises the full bandwidth of the input downmix representation.
14. Apparatus of one of the preceding solutions, wherein the downmixer (300) is configured to perform the second downmixing scheme, the second downmixing scheme comprising: calculating (800) a first weight for a first channel and a second weight for a second channel for a spectral band of the at least one upmixed portion, the spectral band comprising a plurality of spectral lines, and applying (820) the first weight to spectral lines of the spectral band of the first channel and applying the second weight to spectral lines of the spectral band of the second channel, and adding first weighted lines and second weighted lines to obtain downmixed spectral lines in the spectral band, and wherein the apparatus is configured to convert (840) the downmixed spectral lines to a time domain to obtain time domain samples of the output downmix representation.
15. Apparatus of solution 14, wherein the calculation of the first weight and the second weight is performed band wise using energies of the first channel and the second channel and a target energy.
16. Apparatus of solution 15, wherein the target energy is equal to an energy of a phase-rotated mid-channel or is derived from the energies of the first channel, the second channel and from a correlation value between the first channel and the second channel.
17. Apparatus of one of solutions 14 to 16, wherein calculating the first weight and the second weight comprises, for a spectral band: calculating (802) an amplitude-related measure for the first channel in the spectral band; calculating (804) an amplitude-related measure for the second channel in the spectral band; calculating (806) an amplitude-related measure for a linear combination of the first channel and the second channel in the spectral band; calculating (808) a cross-correlation measure between the first channel and the second channel in the spectral band; and calculating (810) the first weight and the second weight using the amplitude-related measure for the first channel, the amplitude-related measure for the second channel, the amplitude-related measure for the linear combination and the cross-correlation measure.
18. Apparatus of one of the preceding solutions, wherein the upmixer (200) is configured to perform the upmixing scheme, the upmixing scheme comprising: calculating first channel spectral lines for a spectral band of at least the portion of the input downmix representation or only the portion of the input downmix representation from spectral lines of the spectral band of at least the portion of the input downmix representation or only the portion of the input downmix representation using a prediction parameter for the spectral band and residual signal lines for the spectral band and a first calculation rule, and calculating second channel spectral lines for the spectral band of at least the portion of the input downmix representation or only the portion of the input downmix representation from the spectral lines of the spectral band of at least the portion of the input downmix representation or only the portion of the input downmix representation using the prediction parameter for the spectral band and the residual signal lines for the spectral band and a second calculation rule, wherein the first calculation rule is different from the second calculation rule.
19. Apparatus of solution 18, wherein the first calculation rule comprises one of an addition and a subtraction and the second calculation rule comprises the other one of the addition and the subtraction.
20. Multichannel decoder, comprising: an input interface (100, 120) for providing an input downmix representation and parametric data at least for a second portion of the input downmix representation; and the apparatus of one of the preceding solutions, wherein the multichannel decoder is configured to upmix, with the upmixer (200), the input downmix representation for at least the portion of the input downmix representation or only the portion of the input downmix representation in accordance with the upmixing scheme corresponding to the first downmixing scheme to obtain the at least one upmixed portion, and/or to upmix (220) the input downmix representation for the second portion and the parametric data using a second upmixing scheme corresponding to the second downmixing scheme to obtain an upmixed second portion, and wherein a combiner (400, 420) is configured to combine the at least one upmixed portion and the upmixed second portion to obtain a multichannel output signal.
21. Multichannel decoder of solution 20, wherein the input interface (100, 120) comprises: a first time-spectrum converter (100) for converting a first spectral representation of the at least the portion of the input downmix representation or only the portion of the input downmix representation and a second spectral representation of a second portion of the input downmix representation, the second portion of the input downmix representation comprising spectral values for higher frequencies than at least the portion of the input downmix representation or only the portion of the input downmix representation of the first spectral representation; a second time-spectrum-converter (120) for generating a spectral representation of a residual signal for the at least the portion of the input downmix representation or only the portion of the input downmix representation, wherein the upmixer (200) is configured to upmix the first spectral representation using the spectral representation of the residual signal to obtain the at least one upmixed portion in the spectral domain, wherein the downmixer (300) is configured to downmix the at least one upmixed portion to obtain the first downmixed portion in the spectral domain, and wherein the combiner (400) comprises a spectrum-time converter for combining the first downmixed portion and the spectral representation of the second portion of the input downmix representation and for converting into the time domain to obtain the output downmix representation.
22. Multichannel decoder of solution 20 or 21, further comprising: a second upmixer (220) for upmixing the second portion of the input downmix representation to obtain the upmixed second portion, wherein, in a multichannel output mode, the combiner (400) is configured to combine a first channel of the at least one upmixed portion and the first channel of the upmixed second portion and to convert into a time domain to obtain a first channel of a multichannel output, wherein the multichannel decoder further comprises a second combiner (420) configured to combine, in the multichannel output mode, a second channel of the at least one upmixed portion and a second channel of the upmixed second portion and to convert into the time domain to obtain a second channel of the multichannel output.
23. Multichannel decoder of solution 21, further comprising: a second upmixer (220) for upmixing the second portion of the input downmix representation to obtain the upmixed second portion, wherein, in a multichannel output mode, the combiner (400) is configured to combine a first channel of the at least one upmixed portion and the first channel of the upmixed second portion and to convert into a time domain to obtain a first channel of a multichannel output, wherein the multichannel decoder further comprises a second combiner (420) configured to combine, in the multichannel output mode, a second channel of the at least one upmixed portion and a second channel of the upmixed second portion and to convert into the time domain to obtain a second channel of the multichannel output, a switch (710) connected between the first time-spectrum-converter (100) and the second upmixer (220), and a controller (700), wherein the controller (700) is configured to control, in a mono output mode, the switch (710) to connect an output of the first time-spectrum-converter (100) to the combiner (400) or to bypass the second upmixer (220) and to connect an output of the upmixer (200) to an input of the downmixer (300), or to control, in the multichannel output mode, the switch (710) to connect an output of the first time-spectrum-converter (100) to an input of the second upmixer (220).
24. Multichannel decoder of one of solutions 22 or 23, further comprising a second switch (720) connected between the upmixer (200) and the downmixer (300); and a controller (700), wherein the controller (700) is configured to control, in the mono output mode, the second switch (720) to connect an output of the upmixer (200) to an input of the downmixer (300) and to control, in the multichannel output mode, the second switch (720) to connect an output of the upmixer (200) to an input of the second combiner (420) or to bypass the downmixer (300).
25. Method for generating an output downmix representation from an input downmix representation, wherein at least a portion of the input downmix representation is in accordance with a first downmixing scheme, the method comprising: upmixing the input downmix representation of at least the portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme to obtain at least one upmixed portion; and downmixing the at least one upmixed portion in accordance with a second downmixing scheme different from the first downmixing scheme to obtain a first downmixed portion representing the output downmix representation for at least the portion of the input downmix representation.
26. Method of solution 25, wherein a second portion of the input downmix representation is in accordance with a second downmixing scheme, wherein the downmixing comprises downmixing the at least one upmixed portion in accordance with the second downmixing scheme to obtain the first downmixed portion; and wherein the method further comprises combining the first downmixed portion and the second portion or a downmixed portion derived from the second portion to obtain the output downmix representation, wherein the output downmix representation for at least the portion of the input downmix representation and the output representation for the second portion are based on the same downmixing scheme.
27. Method of solution 25 or 26, wherein at least the portion of the input downmix representation is in accordance with the first downmixing scheme relying on a residual signal or on a residual signal and parametric information, wherein the upmixing comprises upmixing the input downmix representation of at least the portion of the input downmix representation using an upmixing scheme corresponding to the first downmixing scheme and using the residual signal or the residual signal and the parametric information, respectively to obtain the at least one upmixed portion; and wherein the downmixing comprises downmixing the at least one upmixed portion in accordance with the second downmixing scheme different from the first downmixing scheme, wherein the second downmixing scheme is an active downmixing scheme or a fully parametric downmixing scheme to obtain the output downmix representation for at least the portion of the input downmix representation.
28. Method of multichannel decoding, comprising: providing an input downmix representation and parametric data at least for a second portion of the input downmix representation; the method of any one of solutions 25 to 27, wherein the method comprises upmixing the input downmix representation for at least the portion of the input downmix representation or only the portion of the input downmix representation in accordance with the upmixing scheme corresponding to the first downmixing scheme to obtain the at least one upmixed portion, and/or upmixing the second portion of the input downmix representation and the parametric data using a second upmixing scheme corresponding to the second downmixing scheme to obtain an upmixed second portion, and combining the at least one upmixed portion and the upmixed second portion to obtain a multichannel output signal.
29. Computer program for performing, when running on a computer or a processor, the method of any one of solutions 25 to 28.
30. Apparatus for generating an output downmix representation from an input downmix representation, wherein a first portion of the input downmix representation is in accordance with a first downmixing scheme and a second portion of the input downmix representation is in accordance with a second downmixing scheme, the apparatus comprising: an upmixer (200) for upmixing the first portion of the input downmix representation using a first upmixing scheme corresponding to the first downmixing scheme to obtain a first upmixed portion and for upmixing the second portion of the input downmix representation using a second upmixing scheme corresponding to the second downmixing scheme to obtain a second upmixed portion; and a downmixer (300) for downmixing the first upmixed portion and the second upmixed portion in accordance with a third downmixing scheme different from the first downmixing scheme and the second downmixing scheme to obtain the output downmix representation, wherein the output representation for the first portion of the input downmix representation and the output representation for the second portion of the input downmix representation are based on the same downmixing scheme of the input downmix representation.
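Purely as a non-limiting illustration of solutions 14 to 19, the following Python sketch shows one possible way to upmix a spectral band from a core (mid) signal, a prediction parameter and a residual signal, and to downmix the resulting two channels again with band-wise weights derived from a target energy. The function names `upmix_band` and `downmix_band`, the parameter `max_weight`, the relation side = g * mid + residual, the use of equal weights for both channels, and the specific target energy (E_L + E_R + 2|c|) / 4 are assumptions made for this example only and are not taken from the description above.

```python
import math


def upmix_band(mid, residual, g):
    """Upmix one band: the first rule uses an addition, the second a subtraction.

    Assumed relation (illustrative only): side = g * mid + residual.
    """
    left, right = [], []
    for m, r in zip(mid, residual):
        side = g * m + r
        left.append(m + side)   # first calculation rule (addition)
        right.append(m - side)  # second calculation rule (subtraction)
    return left, right


def downmix_band(left, right, max_weight=4.0):
    """Downmix one band with weights chosen to meet a target energy.

    Assumed target energy: that of a mid channel formed after ideal phase
    alignment, (E_L + E_R + 2*|c|) / 4, with c the band cross-correlation.
    Both channels get the same weight here, which is a simplification.
    """
    e_l = sum(abs(x) ** 2 for x in left)    # energy of the first channel
    e_r = sum(abs(x) ** 2 for x in right)   # energy of the second channel
    # energy of the linear combination (plain sum) of both channels
    e_sum = sum(abs(l + r) ** 2 for l, r in zip(left, right))
    # magnitude of the cross-correlation between the channels
    cross = abs(sum(l * complex(r).conjugate() for l, r in zip(left, right)))
    e_target = (e_l + e_r + 2.0 * cross) / 4.0
    # hypothetical guard: limit the gain when the plain sum nearly cancels
    w = min(math.sqrt(e_target / max(e_sum, 1e-12)), max_weight)
    w1 = w2 = w
    return [w1 * l + w2 * r for l, r in zip(left, right)]


if __name__ == "__main__":
    mid = [1.0 + 0.0j, 2.0 + 0.0j]
    residual = [0.5 + 0.0j, -0.5 + 0.0j]
    left, right = upmix_band(mid, residual, g=0.5)
    print(downmix_band(left, right))
```

In this sketch, a band whose channels are out of phase would otherwise need a very large weight to reach the target energy; the cap `max_weight` is only one conceivable way to bound that gain.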
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
19170621.7 | Apr 2019 | EP | regional |
PCT/EP2019/070376 | Jul 2019 | WO | international |
This application is a continuation of copending U.S. application Ser. No. 17/501,993, filed Oct. 14, 2021, which is incorporated herein by reference in its entirety, which is a continuation of International Application No. PCT/EP2020/061233, filed Apr. 22, 2020, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 19170621.7, filed Apr. 23, 2019, and from International Application No. PCT/EP2019/070376, filed Jul. 29, 2019, both of which are incorporated herein by reference in their entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 17501993 | Oct 2021 | US |
Child | 19031912 | | US |
Parent | PCT/EP2020/061233 | Apr 2020 | WO |
Child | 17501993 | | US |