The present invention relates to multi-channel encoding and decoding. More particularly, the present invention relates to a device and a method for converting a number of audio channels into a smaller number of audio channels (encoding), and a device and a method for converting a number of audio channels into a larger number of audio channels (decoding).
Audio systems using multiple channels are well known. While conventional stereo systems use only two audio channels, modern 5.1 systems use 6 channels: left front (lf), left rear (lr), right front (rf), right rear (rr), center (co) and low frequency effect (lfe or le). The larger number of channels has caused an increase in the amount of audio data to be stored and/or transmitted. This data increase has given rise to efforts to reduce the amount of data by coding.
One of these coding techniques is known as Mid/Side (M/S) coding or Sum/Difference coding, discussed in the paper by J. D. Johnston and A. J. Ferreira: “Sum-difference stereo transform coding”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Francisco, USA, 1992, pp. II 569-572. Mid/Side coding is typically used for encoding a pair of stereo signals. Using M/S coding, an audio signal consisting of a first (e.g. left) signal l[n] and a second (e.g. right) signal r[n] is coded as a sum signal m[n] and a difference (or residual) signal s[n]:
m[n]=r[n]+l[n]
s[n]=r[n]−l[n] (1)
For (almost) identical signals l[n] and r[n] this gives a large coding gain as the corresponding difference signal s[n] is close to zero, whereas the sum signal contains practically all signal energy. Hence, in this situation the bit rate required for coding the sum and difference signals is close to the bit rate required for coding only a single channel.
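As a minimal sketch of formula (1), the following Python fragment forms the sum and difference signals of a channel pair and inverts them again. The function names `ms_encode`, `ms_decode` and `energy`, and the sample values, are illustrative assumptions and are not taken from the devices described here.

```python
def ms_encode(l, r):
    """Sum/difference coding per formula (1): m = r + l, s = r - l."""
    m = [ri + li for li, ri in zip(l, r)]
    s = [ri - li for li, ri in zip(l, r)]
    return m, s


def ms_decode(m, s):
    """Invert formula (1): l = (m - s) / 2, r = (m + s) / 2."""
    l = [(mi - si) / 2 for mi, si in zip(m, s)]
    r = [(mi + si) / 2 for mi, si in zip(m, s)]
    return l, r


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Nearly identical channels (illustrative values only).
left = [0.50, -0.25, 0.75, -1.00]
right = [0.52, -0.25, 0.74, -1.00]
mid, side = ms_encode(left, right)
```

For such nearly identical channels, `energy(side)` is a small fraction of `energy(mid)`, which is the situation in which the coding gain described above arises.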
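Alternatively, the Mid/Side coding process of formula (1) can be described by means of a rotation matrix:
$$\begin{pmatrix} m[n] \\ s[n] \end{pmatrix} = \sqrt{2} \begin{pmatrix} \cos(\pi/4) & \sin(\pi/4) \\ -\sin(\pi/4) & \cos(\pi/4) \end{pmatrix} \begin{pmatrix} l[n] \\ r[n] \end{pmatrix} \qquad (2)$$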
Here, the left and right signals have been rotated over an angle of π/4. The sum signal can be interpreted as a projection of the left and right samples onto the line l=r whereas the difference (or residual) signal can be interpreted as a projection of the left and right samples onto the line l=−r.
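This technique can be generalized by allowing rotation angles other than π/4. In order to minimize the signal power in the residual signal (i.e., to maximize the coding gain) for a wide class of input signals, the rotation angle may further be signal dependent. The following unitary rotation may be applied to a pair of channels:
$$\begin{pmatrix} m'[n] \\ s'[n] \end{pmatrix} = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} l[n] \\ r[n] \end{pmatrix} \qquad (3)$$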
where m′[n] and s′[n] represent the dominant and the residual signal respectively and the angle α is chosen to minimize the power of the residual signal, thus maximizing the power of the dominant signal. This generalized rotation technique is often referred to as Principal Component Analysis (PCA).
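The choice of the angle α can be sketched as follows, assuming the rotation convention m′ = cos α · l + sin α · r and s′ = −sin α · l + cos α · r. Under that assumption the power of the dominant signal is maximized by the closed form α = ½ · atan2(2Σlr, Σl² − Σr²), a standard result for two-channel PCA. The function names `pca_angle` and `rotate` are hypothetical.

```python
import math


def pca_angle(l, r):
    """Angle alpha that maximizes the power of the dominant signal."""
    a = sum(x * x for x in l)             # power of the first channel
    b = sum(y * y for y in r)             # power of the second channel
    c = sum(x * y for x, y in zip(l, r))  # cross term at lag zero
    return 0.5 * math.atan2(2.0 * c, a - b)


def rotate(l, r, alpha):
    """Rotate a channel pair into dominant (m') and residual (s') signals."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    dominant = [ca * x + sa * y for x, y in zip(l, r)]
    residual = [-sa * x + ca * y for x, y in zip(l, r)]
    return dominant, residual
```

For two identical channels the sketch returns α = π/4, which corresponds to the Mid/Side case. For a channel pair that differs only by a constant gain, the residual vanishes entirely.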
As the rotation of formula (3) minimizes the power of the residual signal, the residual signal is typically considered to contain little perceptually relevant information, in particular at higher frequencies. For this reason, conventional encoding systems discard the residual signals produced in the rotation of formula (3) and in similar transformations.
Although the techniques referred to above are primarily aimed at stereo signals, they may be applied to audio signals having multiple channels, such as 5.1 signals, by repeatedly reducing a pair of signals to a dominant signal that is stored and/or transmitted and a residual signal that is discarded.
Discarding the residual signals of course results in a data reduction. However, the present inventors have realized that a significant data reduction is achieved only when the residual signal contains a relatively large amount of information. Discarding the residual signal in such cases inevitably results in an undesirable perceptual distortion of the audio signal.
In decoding devices, the techniques discussed above are used to reconstruct the original signals from the encoded signals. If M/S encoding has been used, for example, both a dominant signal and a residual signal are required to reproduce the original signal pair by an inverse rotation. In Prior Art decoding devices, the residual signals are not received and therefore a synthetic residual signal is derived from each dominant signal using a decorrelator. Although this allows the original signals to be approximated, the waveform of the synthetic residual signals typically differs from the waveform of the actual residual signals. As a result, there will be a discrepancy between the decoded signals and the original signals.
It is an object of the present invention to overcome these and other problems of the Prior Art and to provide an encoding device and a decoding device which allow an improved signal quality.
Accordingly, the present invention provides an encoding device for converting a first number of input audio channels into a second number of output audio channels, where the first number is larger than the second number, the device comprising at least two conversion units, each for converting a first signal and a second signal into a third signal and a fourth signal, the third signal containing most of the signal energy of the first and second signal, and the fourth signal containing the remainder of said signal energy, which encoding device is arranged for using the third signals to produce an output signal, wherein the encoding device is further arranged for outputting a fourth signal.
By outputting at least one fourth signal, that is, an above-mentioned residual signal instead of discarding it, a significantly better reconstruction of the original signal can be produced by the decoder.
If an encoding device comprises more than two conversion units, the fourth signal is preferably output for each conversion unit, although this is not essential and the fourth signal of selected conversion units could be used to enhance the signal quality at the decoder. It is noted that the conversion units could be arranged in parallel or in series (cascade), and that the conversion units may have more than two input channels, for example three.
Although it is possible to output an entire fourth signal, that is, for the entire duration of the first and second signals, it is preferred to select time segments for which the fourth signal is to be output. More particularly, by selecting perceptually relevant time segments (for example time frames), the transmission or storage capacity necessary for transmitting or storing the fourth signal(s) is reduced while still providing a significant signal quality improvement over the Prior Art. For example, only time segments containing frequencies lower than 5 kHz could be selected, thus using a frequency dependent selection.
In a further preferred embodiment, the selection of time segments or signal parts is accomplished by substantially passing perceptually relevant parts of the fourth (that is, residual) signals, attenuating perceptually less relevant parts of the fourth signal and suppressing least relevant parts of the fourth signals. That is, the signal parts (or frames) are divided into at least three groups: those signal parts being perceptually the most relevant are passed substantially without being attenuated, those signal parts being perceptually less relevant are also passed but are attenuated, and those signal parts being perceptually least relevant are suppressed. In this way, a smoother transition between signal parts each having a different relevance is achieved, resulting in a higher signal quality.
The perceptual relevance may be determined in a number of ways, for example by using a weighting function which provides a weighting (that is, gain or attenuation) value dependent on a ratio, for example the power ratio of the fourth signal and the third signal of a conversion unit during a particular time segment.
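A weighting function of this kind might look like the sketch below. It assumes that a larger residual-to-dominant power ratio indicates greater relevance, and it uses a linear ramp between two thresholds. The thresholds `low` and `high`, the ramp shape and the name `residual_weight` are all illustrative assumptions, not values taken from this description.

```python
def residual_weight(p_residual, p_dominant, low=0.01, high=0.1):
    """Weight in [0, 1] for one time segment of the residual signal.

    0.0 suppresses the segment, 1.0 passes it unattenuated, and values
    in between attenuate it. `low` and `high` are hypothetical thresholds
    on the residual-to-dominant power ratio.
    """
    if p_dominant <= 0.0:
        # No dominant power: pass any residual that carries power at all.
        return 1.0 if p_residual > 0.0 else 0.0
    ratio = p_residual / p_dominant
    if ratio <= low:
        return 0.0   # least relevant: suppressed
    if ratio >= high:
        return 1.0   # most relevant: passed
    return (ratio - low) / (high - low)   # less relevant: attenuated
```

The three branches correspond to the three groups of signal parts described above, and the ramp avoids an abrupt switch between passing and suppressing.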
Instead of, or in addition to, selecting time and/or frequency segments of the respective channels, the channels for which the fourth signal is output may also be selected. If at least two conversion units are arranged in a cascade, preferably the conversion unit nearest to the output terminal of the encoding device is selected to output its fourth signal, while the fourth signal of one or more conversion units further away (in the signal processing direction) may be discarded. In other words, conversion units downstream (in the signal processing direction) are selected before other conversion units to output their respective fourth signal. The present inventors have realized that fourth signals produced nearest to the output terminal, that is in the last stages, of the encoding device will typically be used in the first stages of the decoding device and therefore have the greatest relevance for the quality of the decoded signal. For this reason, it is preferred that these fourth signals are transmitted while the fourth signals of conversion units having less relevance may be discarded, in particular when the available transmission capacity does not allow the transmission of all fourth signals.
This selection of conversion units may be temporary or permanent. If temporary, all conversion units may be provided with a selection unit which may pass or block the respective fourth signal in dependence on the available transmission capacity or other factors. If permanent, the selection units of certain conversion units, typically furthest from the output terminal of the device, may be omitted.
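The preference for conversion units nearest to the output terminal can be illustrated with a small selection routine. The sketch assumes that the available capacity is expressed simply as a number of residual channels that can be transmitted. The unit names and the function `select_residuals` are hypothetical.

```python
def select_residuals(stage_distance, capacity):
    """Pick the residual channels to transmit.

    stage_distance maps a unit name to the number of stages between that
    unit and the encoder output (0 = nearest to the output terminal).
    capacity is the number of residual channels that can be transmitted.
    """
    # Units nearest to the output come first; ties are broken by name
    # so that the result is deterministic.
    ranked = sorted(stage_distance, key=lambda name: (stage_distance[name], name))
    return ranked[:max(0, capacity)]


# Hypothetical three-stage cascade.
units = {"stage1_left": 2, "stage1_right": 2, "stage2": 1, "stage3": 0}
chosen = select_residuals(units, 2)
```

With capacity for two channels, the residuals of the last two stages are chosen and those of the first stage are discarded.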
The present invention also provides a decoding device for decoding audio signals which have been encoded using an encoding device as defined above. Accordingly, the present invention provides a decoding device for converting a first number of input audio channels into a second number of output audio channels, where the first number is smaller than the second number, the device comprising at least two conversion units, each for converting a first signal and a second signal into a third signal and a fourth signal, the first signal containing most of the signal energy of the third and fourth signal, and the second signal containing the remainder of said signal energy, the device further comprising at least one decorrelation unit for decorrelating a first signal so as to produce a synthetic second signal, which decoding device is further arranged for receiving at least one additional second signal.
By receiving an additional second signal (that is, the residual signal referred to as fourth signal in the encoding device), an improved quality of the decoded audio signal may be achieved, as any synthetic residual signal generated in the decoding device is typically not identical to the original residual signal.
In a preferred embodiment, the received second signal is combined with the derived synthetic second signal, such that the second signal fed to the conversion unit is a combination of the two signals. This has the advantage that the synthetic residual signal is always available, also for the time segments for which no residual signal is transmitted. For those time segments for which a residual signal is indeed transmitted, the residual signal used by the conversion unit is a combination of the transmitted residual signal and the synthetic residual signal, and will therefore only partially consist of the synthetic residual signal.
In a preferred embodiment, the decoding device is provided with attenuation units controlled by the received residual signals for attenuating the synthetic residual signals. This allows smoother transitions between selected and un-selected residual signals and avoids any switching artifacts. More particularly, this allows the amplitude of each synthetic residual signal to be controlled by the corresponding received residual signal. Accordingly, a much improved mix of the synthetic residual signal and the actual transmitted residual signal is achieved.
In the above, reference is made to M/S and PCA encoding. Alternatively, or additionally, amplitude-related encoding techniques can be used.
It is noted that the present invention relates to spatial audio coding, that is, audio coding typically involving more than two channels, as opposed to stereo coding, which involves only two channels.
The present invention further provides a method of converting a first number of input audio channels into a second number of output audio channels, where the first number is larger than the second number, the method comprising at least two steps of converting a first signal and a second signal into a third signal and a fourth signal, the third signal containing most of the signal energy of the first and second signals, and the fourth signal containing the remainder of said signal energy, and the step of using the third signals to produce an output signal, which method comprises the further step of outputting a fourth signal.
The present invention still further provides a method of converting a first number of input audio channels into a second number of output audio channels, where the first number is smaller than the second number, the method comprising at least two steps of converting a first signal and a second signal into a third signal and a fourth signal, the first signal containing most of the signal energy of the third and fourth signals, and the second signal containing the remainder of said signal energy, and the step of deriving the second signal from the first signal, which method comprises the further step of receiving an additional second signal.
The method may comprise the further step of decorrelating a first signal so as to produce the derived synthetic second signal. Preferably, the method comprises the still further step of attenuating the synthetic second signal, said step being controlled by a corresponding received second signal. Advantageously, the method may comprise the yet further steps of combining the synthetic second signal and the received second signal, and using the combined signal in the conversion step.
The present invention additionally provides a computer program product for carrying out the encoding and/or decoding methods defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier in the form of a computer readable storage medium, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the methods as defined above, may also be available for downloading from a remote server, for example via the Internet.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
The inventive arrangement 10 shown merely by way of non-limiting example in
In the example of
In Prior Art arrangements, the dominant signal m[k] is used for coding while the residual signal s[k] is discarded, the conversion unit 12 producing a dominant signal m[k] and a set of parameters (Pars) associated with the conversion. European Patent Application EP 04103168.3 filed 5 Jul. 2004, corresponding to U.S. patent application Ser. No. 10/599,564, filed Oct. 2, 2006, now U.S. Pat. No. 7,646,875, describes an encoder arrangement in which part of the residual signal s[k] is used. More particularly, in the arrangement of the earlier Application a selector is used which selects perceptually relevant parts of the residual signal while discarding perceptually irrelevant parts. Accordingly, some parts (which may be frequency representations of time frames) are either selected or discarded. European Patent Application EP 04103168.3, the entire contents of which are herewith incorporated in this document, describes the selection of parts of the residual signal in a stereo encoder and decoder. However, the selection of parts of the residual signal in a multi-channel encoding and decoding device, such as a 5.1 arrangement, is not described.
The selection according to the above-mentioned European Patent Application is schematically illustrated in
The present inventors have realized that this selection is too coarse and may cause audible switching artifacts. In particular, the quality of the decoded signals can be improved without significantly increasing the quantity of transmitted data. Accordingly, the present invention provides a selection of (parts of) the residual signal that distinguishes not only between relevant and non-relevant parts, but also identifies less relevant parts: parts that are not as relevant as the (most) relevant parts but are not irrelevant either.
Examples of a weighting function W according to the present invention are schematically shown in
In the example of
Of course other functions may be used than the ones illustrated in
It is noted that instead of power ratios other criteria can be used, such as bandwidth. For example, it can be decided to select signal parts having a frequency lower than a certain threshold frequency, irrespective of their signal power.
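A bandwidth criterion of this kind can be sketched as follows, assuming the residual is available as one frame of frequency bins with indices 0 to fft_size/2. The function name `select_low_bins` and all parameter names are illustrative assumptions.

```python
def select_low_bins(frame, sample_rate, fft_size, cutoff_hz):
    """Keep residual bins below cutoff_hz and zero all other bins.

    frame holds the bins 0 .. fft_size/2 of one transformed time segment;
    bin k is taken to lie at k * sample_rate / fft_size Hz.
    """
    bin_hz = sample_rate / fft_size
    return [v if k * bin_hz < cutoff_hz else 0.0 for k, v in enumerate(frame)]


# Illustrative frame: 16-point transform at 16 kHz, so bins are 1 kHz apart.
kept = select_low_bins([1.0] * 9, sample_rate=16000, fft_size=16, cutoff_hz=5000)
```

Here the bins at 0 to 4 kHz are retained and the bins at 5 to 8 kHz are set to zero, irrespective of their power.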
The selection and attenuation (S&A) unit 15 according to the present invention shown in
The selection and attenuation unit 15 outputs the weighted residual signal ws[k] which, together with the dominant signal m[k], may be encoded. It will be understood that the weighted residual signal ws[k] contains less information than the original residual signal s[k] and therefore reduces the bit rate required for transmission of the coded signal pair. On the other hand, the inclusion of the weighted residual signal ws[k] offers a significant improvement of the signal quality compared with Prior Art arrangements in which the residual signal is discarded. The selection and attenuation unit 15 uses a weighting function W as illustrated in
An arrangement in accordance with the present invention for use in a decoding device is schematically illustrated in
Accordingly, in the arrangement 20 of the present invention the residual signal sh[k] fed to the mixing unit 24 is a combination of the (decoded) residual signal ws[k] and an attenuated version of the synthetic residual signal. If no (transmitted) residual signal ws[k] is available, the decorrelated signal sd[k] is used, substantially without being attenuated. If a residual signal ws[k] is available, the decorrelated signal sd[k] is attenuated accordingly.
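The combination performed for one frame can be sketched as below. The text does not state how the attenuation is derived from the received residual, so the gain rule used here, g = sqrt(max(0, 1 − P_ws / P_sd)), is purely an assumption chosen so that the synthetic signal fades out as the received residual gains power. The name `combine_residual` is hypothetical.

```python
import math


def combine_residual(ws, sd):
    """Combine a received residual ws[k] with a decorrelated signal sd[k].

    Returns the combined residual sh[k] and the gain applied to sd[k].
    The gain rule is an illustrative assumption, not the patented rule.
    """
    p_ws = sum(abs(v) ** 2 for v in ws)   # power of the received residual
    p_sd = sum(abs(v) ** 2 for v in sd)   # power of the synthetic residual
    if p_sd <= 0.0:
        return list(ws), 0.0
    gain = math.sqrt(max(0.0, 1.0 - p_ws / p_sd))
    return [w + gain * d for w, d in zip(ws, sd)], gain
```

When no residual is received (all zeros), the gain is 1 and the decorrelated signal is used unattenuated. When the received residual carries at least as much power as the synthetic one, the gain is 0 and only the received residual is used.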
Encoding and decoding devices according to the present invention will be discussed below with reference to
The Prior Art encoding device 1′ is designed for encoding a six channel audio input signal, such as a so-called 5.1 signal, into a two channel audio output signal. In the example shown, the input channels are lf (left front), lr (left rear), rf (right front), rr (right rear), co (center) and le (low frequency effect). All these signals are assumed to be digital time signals and could be written as lf[n], lr[n] etc., with n being a sample number.
The audio input signals are input into segment and transform (T) units 11 which divide the signals into time segments which are then transformed, for example to the frequency domain using an FFT (fast Fourier transform). The time segments into which the time signals are divided preferably overlap partially, as is well known in the art.
The segment and transform units 11 produce transformed signals Lf, Lr, Rf, Rr, Co and Le, which are frequency domain representations of the time segments and could be written as Lf[k], Lr[k], etc. with k being a frequency index. These transformed signals are fed to 2-to-1 converters 12 which convert each pair of input signals (e.g. Lf and Lr) into a dominant signal (e.g. L) and a residual signal while producing an associated set of signal parameters (e.g. PS1). This conversion typically involves a rotation of the signals such that the dominant signal contains most of the signal energy while the residual signal contains the remainder of the signal energy.
In the Prior Art device of
The 3-to-2 conversion unit 13 converts the three input signals L, R and C into the two output signals L0 and R0, while producing an associated parameter set PS4. It is noted that the input signals L and R may respectively be identified with the first and second signals defined above, while the signals L0 and C0 may respectively be identified with the third and fourth signal defined above.
The (transform domain) signals L0 and R0 are fed to an inverse transform (T−1) and overlap-and-add (OLA) unit 14 which outputs time-domain signals l0 and r0. The inverse transform is the counterpart of the transform of the units 11 and typically is an inverse FFT. The overlap-and-add operation is substantially the inverse of the segment operation of the units 11 and adds partially overlapping time frames.
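The segment-and-transform step and its inverse with overlap-and-add can be sketched as a round trip. The sketch assumes a sine window with 50% overlap applied at both analysis and synthesis, and it uses a naive DFT in place of an FFT so that it needs only the standard library. The window choice, frame size and all function names are illustrative assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def sine_window(n):
    """Window whose square sums to one at 50% overlap."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def analyse(signal, size):
    """Split into half-overlapping windowed segments and transform each."""
    hop, win = size // 2, sine_window(size)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(size)]
        frames.append(dft(seg))
    return frames


def synthesise(frames, size):
    """Inverse transform each frame, window again, and overlap-add."""
    hop, win = size // 2, sine_window(size)
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for idx, frame in enumerate(frames):
        seg = dft(frame, inverse=True)
        for i in range(size):
            out[idx * hop + i] += seg[i].real * win[i]
    return out
```

Samples covered by two overlapping frames are reconstructed exactly up to rounding. The first and last half frame are covered by only one window and would need padding in a complete implementation.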
It can thus be seen that the Prior Art encoder 1′ converts six input audio (time) signals into two output audio (time) signals plus four sets of parameters. In each conversion unit 12 or 13, an output signal is discarded to reduce the number of signals and hence the required transmission rate.
A compatible decoding device according to the Prior Art is illustrated in
The three mixing units 24 each receive a respective parameter set PS1, PS2 and PS3 that controls the (up)mixing operation. If PCA (Principal Component Analysis) is used, a signal rotation is carried out over an angle α contained in the signal parameter sets. Other suitable parameters are, for example, the IID and ICC mentioned above. Not all of these parameters are required; the angle α may be derived from the parameters IID and ICC using:
The signals produced by the mixing units 24 are the signal pairs Lf and Lr, Rf and Rr, and Co and Le respectively. These signals are inversely transformed (T−1) by the inverse transform and overlap-and-add units 25, which perform a suitable inverse transform such as an inverse FFT and then reconstitute the time signal pairs lf and lr, rf and rr, and co and le. It can thus be seen that the Prior Art decoder 2′ converts a pair of audio input signals (l0 and r0) into six audio output signals.
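The rotation carried out in a mixing unit when PCA is used can be sketched as the inverse of the encoder rotation. The sketch assumes the encoder convention m′ = cos α · l + sin α · r, s′ = −sin α · l + cos α · r, so that the upmix applies the transposed rotation matrix. The function name `upmix` is hypothetical, and the residual passed in may be a received, synthetic or combined residual.

```python
import math


def upmix(dominant, residual, alpha):
    """Rotate a dominant/residual pair back into a channel pair.

    Applies the transpose (inverse) of the assumed encoder rotation.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    first = [ca * m - sa * s for m, s in zip(dominant, residual)]
    second = [sa * m + ca * s for m, s in zip(dominant, residual)]
    return first, second
```

With the true residual, the original channel pair is recovered exactly up to rounding. With a synthetic residual, the output pair only approximates the original waveforms, which is the limitation of the Prior Art decoder.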
A disadvantage of the known decoding device 2′ is that the output signal quality is necessarily limited. In addition, any increase in available transmission capacity does not lead to a corresponding increase in output signal quality. This is mainly due to the fact that the residual signals used by the mixing units 24 are synthetic, that is, derived from the dominant signals. The present invention, as already illustrated with reference to
The encoding device 1 according to the present invention illustrated in
Each selection and attenuation unit 15 produces a respective residual signal Ls, Rs and Cs which is output by the encoder device 1. Those skilled in the art will understand that these residual signals, as well as the parameter sets PS1, . . . , PS4, may be suitably encoded and/or quantized before being output by the encoding device.
The additional residual channel E0 produced by the 3-to-2 unit 13 may optionally be output as well. This residual channel E0 represents the prediction error of the residual channel C0 mentioned with reference to
Additional residual channels may be used if additional transmission capacity (bit budget) is available. Accordingly, the additional transmission capacity may be distributed over all additional residual channels. Some distribution preferences may be stated:
additional channels are allocated symmetrically to left-side audio channel blocks and right-side audio channel blocks (a block being, for example, a number of units associated with a channel);
additional channels are allocated first to blocks nearest to the output of the encoding device; and
the available transmission capacity is distributed over as many additional channels as possible.
In addition, the bandwidth of additional channels may be limited, for example limited to 2 kHz.
An exemplary compatible decoding device according to the present invention is shown in
As shown in
It will be understood that the decoding device 2 is not only capable of decoding signals that have been encoded with the encoding device 1 of
Embodiments of the decoding device 2 of the present invention can be envisaged in which the attenuation units 26 are omitted and the decorrelated versions of the channels L, R and C are fed directly to the combination units 27. In such embodiments, which would still be within the scope of the present invention, the use of the additional residual channels Ls, Rs and Cs would still lead to an improved signal quality compared with the Prior Art decoder 2′ shown in
The optional further residual channel e0 may be used in the 2-to-3 unit 22 as third channel, thus providing three instead of two input channels. This improves the signal quality when deriving the signals L, R and C from the (transformed) input channels L0 and R0 and the parameter set PS4, for example by adjusting the prediction of the residual channel C0.
A Prior Art 6-to-1 encoding device 1′ is shown in
A corresponding Prior Art 1-to-6 decoding device is illustrated in
The Prior Art encoding device 1′ of
As already indicated above, the selection and attenuation units 15 may be omitted, thus providing additional channels Ls, Rs and Cs that are not weighted. In some embodiments, the selection and attenuation units 16a and 16b may be omitted. However, it is preferred that all S&A units 15, 16a and 16b are present, as illustrated in
It is also possible to select residual channels from the five available residual channels, for example when the transmission capacity is insufficient. In that case, it is preferred to select and transmit residual channels that are nearest to the output terminal of the encoding device 1, that is, nearest to the transform unit 14. These residual channels are the first ones to be used in the corresponding decoding device and therefore have the greatest impact on the decoding process and the quality of the decoded signals. In the example of
A compatible 1-to-6 decoder is illustrated in
The present invention is based upon the insight that, when encoding, the residual signal may be subdivided into at least three categories: perceptually relevant, less relevant and irrelevant, and that the residual signal may be attenuated accordingly. The present invention benefits from the further insight that, when decoding, the decoded residual signal may be used to control the attenuation of a synthetic residual signal to produce a reconstructed residual signal.
The present invention may be utilized in any application involving audio coding, such as internet radio, internet streaming, electronic music distribution (EMD), solid state (e.g. MP3 or AAC) audio players, consumer audio systems, professional audio systems, etc.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words “comprise(s)” and “comprising” are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
04105527 | Nov 2004 | EP | regional |
05103079 | Apr 2005 | EP | regional |
05103443 | Apr 2005 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2005/053550 | 10/31/2005 | WO | 00 | 4/30/2007 |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2006/048817 | 5/11/2006 | WO | A |

Number | Name | Date | Kind |
---|---|---|---|
7394903 | Herre et al. | Jul 2008 | B2 |
7447629 | Breebaart | Nov 2008 | B2 |
7646875 | Schuijers et al. | Jan 2010 | B2 |
20040086130 | Eid et al. | May 2004 | A1 |
20060009225 | Herre et al. | Jan 2006 | A1 |
20080195397 | Myburg et al. | Aug 2008 | A1 |

Number | Date | Country |
---|---|---|
20090055194 A1 | Feb 2009 | US |