The present invention relates to audio channel conversion. More particularly, the present invention relates to a device and a method for converting a first number of input audio channels into a second number of output audio channels, the first number being smaller than the second number.
It is well known to convert a number of audio channels into another, larger number of audio channels. This may be done for various reasons. A first reason may be the conversion into a new format. Stereo recordings, for example, have only two channels, while modern audio systems typically have five or six channels, as in the popular “5.1” systems. Accordingly, the two stereo channels have to be converted into five or six channels in order to take full advantage of the advanced audio system. A second reason may be coding efficiency. It has been found that stereo audio signals can be encoded as single-channel audio signals combined with a parameter bit stream describing the spatial properties of the audio signal. The decoder can then reproduce the stereo audio signals with a very satisfactory degree of accuracy. In this way, substantial bit rate savings may be obtained.
There are several parameters which describe the spatial properties of audio signals. One of those parameters is the inter-channel cross-correlation, for example in stereo signals the cross-correlation between the L channel and the R channel. Another parameter is the power ratio of the channels. In so-called parametric spatial audio (en)coders these and other parameters are extracted from the original audio signal so as to produce an audio signal having a reduced number of channels, for example only a single channel, plus a set of parameters describing the spatial properties of the original audio signal. In so-called parametric spatial audio decoders the original audio signal is substantially reconstructed.
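By way of illustration only, the two parameters named above can be estimated from a pair of channels as sketched below. The function name `spatial_parameters` is hypothetical, and the sketch assumes a single estimate over the whole signal, whereas an actual encoder would presumably compute such values per time segment and per frequency band.

```python
import numpy as np

def spatial_parameters(left, right, eps=1e-12):
    """Return (power_ratio, icc) for two equal-length channels.

    power_ratio is the power of `left` divided by the power of `right`;
    icc is the normalized cross-correlation at zero lag.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    p_l = np.sum(left ** 2)
    p_r = np.sum(right ** 2)
    power_ratio = p_l / (p_r + eps)
    icc = np.sum(left * right) / (np.sqrt(p_l * p_r) + eps)
    return power_ratio, icc

# Example: the right channel is a half-amplitude copy of the left channel.
t = np.arange(1000) / 1000.0
left = 2.0 * np.sin(2 * np.pi * 5 * t)
right = np.sin(2 * np.pi * 5 * t)
ratio, icc = spatial_parameters(left, right)
```

In this example the two channels are fully correlated and differ only in level, so the correlation estimate is close to 1 and the power ratio is close to 4.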
A parametric spatial audio decoder typically comprises a number of decorrelation filters for producing sets of decorrelated auxiliary channels of each input audio channel. These decorrelated auxiliary channels are then combined with the original input channels in a so-called upmix unit to produce output channels having a desired correlation, that is, a correlation corresponding with the original audio signal. In addition to setting the correlation, the upmix unit typically also sets the power ratio of the audio channels and/or carries out other signal processing steps, such as predicting an audio channel on the basis of other channels.
The present inventors have found that the decorrelation filters introduce a time delay and a temporal “smearing” of the audio signal and that, as a result of this, there may be a temporal discrepancy between a signal part (for example the signal contained in a time frame) and its corresponding parameters: as the signal part is delayed, its parameters may be applied to another signal part, resulting in distortion of the signal. This is clearly undesirable. It is, however, not feasible to delete the decorrelation units from the decoder, as this would make it impossible to provide audio channels having a correct inter-channel correlation.
It is an object of the present invention to overcome these and other problems of the Prior Art and to provide a device and a method for converting the number of audio channels of an audio signal in which the disadvantageous effects of the decorrelation filters are significantly reduced or even eliminated.
Accordingly, the present invention provides a device for converting a first number of input audio channels into a second number of output audio channels, where the first number is smaller than the second number, the device comprising:
at least one decorrelation unit for producing a set of decorrelated auxiliary channels from an input audio channel, and
at least one upmix unit for combining channels into output audio channels, said device further comprising:
at least one pre-processing unit for pre-processing the input audio channel prior to feeding the input audio channel to the at least one decorrelation unit.
By providing a pre-processing unit for pre-processing the input audio channels prior to processing by the decorrelation units, the audio channels can be (pre-)processed before any delay or “smearing” is introduced by the decorrelation units. As a result, the correct parameters are used for this processing and any misalignment of the signal parts and the parameters is avoided.
The at least one pre-processing unit is arranged such that the pre-processing takes place before the input audio channel is fed to the decorrelation unit(s). Accordingly, the pre-processing unit is arranged between an input terminal of the device and the at least one decorrelation unit.
The set of auxiliary channels derived from a single input audio channel may consist of one, two, three or more channels. Auxiliary channels may also be derived from intermediate channels, that is channels derived from the input audio channels by signal processing other than decorrelation, for example by prediction, as may be performed in the pre-processing unit of the present invention.
The upmix unit(s) may combine the input audio channel (or channels), the decorrelated auxiliary channel (or channels) and/or any intermediate channels in a known manner. In addition to combining (that is, mixing), the upmix unit may also perform scaling. However, in accordance with the present invention the processing of the auxiliary channels and the input audio channels, other than combining, is primarily or exclusively performed in the pre-processing unit.
The pre-processing unit(s) and/or the upmix unit(s) are preferably controlled by audio parameters. These units are therefore designed to be controlled by these parameters. This provides greater flexibility and allows the pre-processing properties and/or upmix properties to be changed.
Accordingly, the pre-processing unit is preferably arranged for time-variant pre-processing. That is, the processing performed by the pre-processing units varies with time. More particularly, this processing is determined by time-varying signal parameters. The upmix unit is preferably also arranged for time-variant processing, such as time-variant decorrelation. In contrast, the decorrelation units are preferably arranged for time-invariant decorrelation.
The pre-processing unit(s) may advantageously be arranged for setting power ratios of audio channels and/or prediction. This prediction involves predicting the signals of certain audio channels on the basis of properties of other channels and prediction parameters.
It is noted that setting the correlations of the audio channels should be performed after the decorrelation units, that is, by the conventional upmix unit. All other signal processing, however, may take place in the pre-processing unit.
The present invention also provides an audio system comprising a device as defined above. The audio system may further comprise one or more audio sources, an amplifier and loudspeaker units or their equivalents.
The present invention additionally provides a method of converting a first number of input audio channels into a second number of output audio channels, where the first number is smaller than the second number, the method comprising the steps of:
producing a set of decorrelated auxiliary channels from an input audio channel, and
combining channels into output audio channels,
said method comprising the additional step of:
pre-processing the input audio channel prior to the step of producing the set of decorrelated auxiliary channels.
Preferably, audio parameters are used for controlling the combining step and the pre-processing step.
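The order of the method steps may be sketched as follows for a single input channel and two output channels. All function names are hypothetical, the pure delay is only a stand-in for an actual decorrelation filter, and the sum/difference combination is merely one simple choice of combining step.

```python
import numpy as np

def pre_process(x, gains):
    """Time-variant step: apply a per-sample gain before decorrelation."""
    return x * gains

def decorrelate(x, delay=3):
    """Time-invariant stand-in for a decorrelation filter: a pure delay."""
    y = np.zeros_like(x)
    y[delay:] = x[:len(x) - delay]
    return y

def combine(direct, aux):
    """Combine the direct and auxiliary channels into two output channels."""
    return direct + aux, direct - aux

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
gains = np.full_like(x, 0.5)

# Pre-processing takes place before the decorrelation step.
aux = decorrelate(pre_process(x, gains))
out_a, out_b = combine(x, aux)
```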
The present invention further provides a computer program product for carrying out the method as defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
The Prior Art device 1′ shown in
The number of output channels (N outputs 6) is greater than the number of input channels (M inputs 5). Exemplary values are N=6 and M=2, as when a stereo audio signal is converted into a 5.1 audio signal, or N=2 and M=1, as when a stereo signal is encoded as a mono signal plus additional information, although other values of M and N are also possible. The output channels typically have (mutual) correlations defined by parameters fed to the upmix unit 4. To produce output channels having the desired correlations, a set of mutually uncorrelated channels is derived from the input channels. To this end, decorrelation units 3 are coupled to each input 5 so as to produce sets of uncorrelated input channels. The actual number of decorrelation filters, which are well known in the art, may vary and is not limited to the number shown in the drawings.
The decorrelation units 31, . . . , 39 typically include filters having all-pass characteristics. Such filters substantially maintain the spectral envelope of the audio signal. However, the all-pass characteristics have the disadvantage of introducing a time delay. In addition, they often cause a “smearing” of the input signal, that is, the temporal envelope of the decorrelated signal is less well-defined than the temporal envelope of the original signal. Both the time delay and the “smearing” result in a discrepancy between the audio signal and the corresponding parameters: some signal parts (that is, time segments of the signal produced by decorrelation filters) reach the upmix unit later than the corresponding parameters. As a result, the wrong parameters are applied to these signal parts and the audio signal is processed incorrectly, leading to a perceptible signal distortion, for example cross-talk. It will be understood that this is highly undesirable.
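The behaviour described above can be illustrated with a Schroeder all-pass section, which is assumed here only as a convenient example of a filter with all-pass characteristics and is not necessarily the filter used in the decorrelation units. The delay length and coefficient are arbitrary illustrative values.

```python
import numpy as np

def schroeder_allpass(x, delay=7, g=0.6):
    """All-pass section: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y

# Impulse response of the filter.
impulse = np.zeros(1024)
impulse[0] = 1.0
h = schroeder_allpass(impulse)

# The magnitude response is flat, so the spectral envelope is maintained.
magnitude = np.abs(np.fft.rfft(h))

# The single input sample is spread over several output samples ("smearing").
spread = np.count_nonzero(np.abs(h) > 1e-3)
```

The flat magnitude response corresponds to the preserved spectral envelope, while the number of significant taps in the impulse response gives an impression of the temporal smearing.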
It is noted that the parameters could be delayed (e.g. by a delay unit) so as to better match the timing of the parameters and the signals. However, the upmix unit 4 also receives the un-decorrelated input signals, which have not been delayed. In addition, the “smearing” may be frequency-dependent. As a result, it is difficult to match the parameters and the corresponding signal parts.
The present invention solves this problem by processing the audio signal prior to the decorrelation. That is, a substantial part of the signal processing is performed before the audio signal is fed to the decorrelation filters. In this way, the mismatch caused by the decorrelation filters is largely avoided.
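The mismatch and its avoidance can be made concrete with a small sketch. It assumes two consecutive time segments with different gain parameters and models the decorrelation filter as a pure delay of `d` samples, which is a deliberate simplification; the helper name `delay` is hypothetical.

```python
import numpy as np

def delay(x, d):
    """Stand-in for a decorrelation filter: delay the signal by d samples."""
    y = np.zeros_like(x)
    y[d:] = x[:len(x) - d]
    return y

frame = 8
d = 3
x = np.ones(2 * frame)

# Gain parameter per sample: the first segment is kept, the second is muted.
gains = np.concatenate([np.full(frame, 1.0), np.full(frame, 0.0)])

# Processing after decorrelation: delayed samples of the first segment
# fall into the second segment and receive the wrong parameter.
post = delay(x, d) * gains

# Processing before decorrelation: each sample is scaled with its own parameter.
pre = delay(x * gains, d)

mismatched = np.count_nonzero(pre != post)
```

With a pure delay the number of samples that receive the wrong parameter equals the delay `d`; with a real decorrelation filter the affected region would also depend on the smearing.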
The device 1 according to the present invention and illustrated merely by way of non-limiting example in
The pre-processing unit 2 receives the M input channels of the audio signal through the M inputs 5. The unit 2 also receives parameters relating to the audio signal, which are indicative of desired signal properties. Using these parameters, the pre-processing unit 2 performs signal processing such as adjusting the power ratios of the audio channels and predicting some audio channels on the basis of other audio channels. As a result, power ratio adjustment and prediction are carried out without being influenced by the decorrelation filters 3, and any time mismatch between the audio signal and the parameters relating to these operations is avoided.
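One possible way of adjusting a power ratio between two channels is sketched below. It assumes that the power ratio is transmitted as a level difference in decibels and that the two gains are chosen so that the total power is unaltered; both the function name `gains_from_iid` and these conventions are assumptions made for illustration.

```python
import math

def gains_from_iid(iid_db):
    """Return gains (g1, g2) with g1**2 / g2**2 equal to the power ratio
    10**(iid_db / 10) and with g1**2 + g2**2 == 1 (total power unaltered)."""
    r = 10.0 ** (iid_db / 10.0)
    g1 = math.sqrt(r / (1.0 + r))
    g2 = math.sqrt(1.0 / (1.0 + r))
    return g1, g2

g1, g2 = gains_from_iid(6.0)
```

Because these gains depend only on the parameter and not on any decorrelated signal, they can be applied before the decorrelation filters.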
It will be understood that not all signal processing can be performed by the pre-processing unit. Setting the desired correlations of the audio channels typically requires the availability of uncorrelated channels as produced by the decorrelation filters 3. Accordingly, setting the correlations is performed by the upmix unit 4. In addition, additional signal adjustments may be made by the upmix unit 4, such as an additional adjustment of the power levels of the audio channels. In this case, the power adjustment may be carried out in both the pre-processing unit 2 and the upmix unit 4, although it is very well possible to perform this operation in only one of these units.
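Why setting a correlation requires an uncorrelated channel may be seen from the following sketch. It assumes a direct signal and an auxiliary signal of equal power that are mutually uncorrelated, and uses a rotation-type mix as one well-known way of obtaining a prescribed correlation; the function names are hypothetical, and a cosine is used as a stand-in for the output of a decorrelation filter.

```python
import numpy as np

def mix_to_correlation(direct, aux, icc):
    """Mix two equal-power, uncorrelated signals into two outputs whose
    normalized correlation equals icc (which is cos(2 * alpha))."""
    alpha = 0.5 * np.arccos(np.clip(icc, -1.0, 1.0))
    out_1 = np.cos(alpha) * direct + np.sin(alpha) * aux
    out_2 = np.cos(alpha) * direct - np.sin(alpha) * aux
    return out_1, out_2

def normalized_correlation(a, b):
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))

n = np.arange(4096)
direct = np.sin(2 * np.pi * 8 * n / 4096)
aux = np.cos(2 * np.pi * 8 * n / 4096)  # orthogonal to `direct`, equal power

out_1, out_2 = mix_to_correlation(direct, aux, icc=0.3)
measured = normalized_correlation(out_1, out_2)
```

Without the auxiliary signal the two outputs would be scaled copies of the same signal and their correlation could not be lowered below 1.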
An additional advantage of the present invention is the possibility to choose which of the units 2 and 4 is best suited for performing a certain signal processing operation. By providing two units (2 and 4) instead of a single unit (4), greater design flexibility is achieved, and the unfavorable effects of the decorrelation units can be avoided to the greatest extent possible.
In the preferred embodiments of the present invention, the pre-processing unit 2 and the upmix unit 4 are both time-variant: their signal processing properties are controlled by signal parameters which may vary in time. The decorrelation filters 3, however, are preferably time-invariant: their properties are not time-dependent and are preferably not controlled by signal parameters that vary over time. Embodiments can be envisaged in which either the pre-processing unit 2 or the upmix unit 4 is time-invariant.
In further advantageous embodiments, the processing performed by the pre-processing unit 2 and/or the upmix unit 4 is frequency-dependent: the signal processing properties of these units may be controlled by parameters which vary with frequency.
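Frequency-dependent processing may be sketched as applying a separate gain to each frequency band. The sketch below assumes a single FFT over the whole signal and hypothetical band edges expressed as FFT bin indices; a practical decoder would probably use a filter bank operating on overlapping time segments instead.

```python
import numpy as np

def apply_band_gains(x, band_edges, band_gains):
    """Scale each frequency band [lo, hi) of x, given as rfft bin indices."""
    spectrum = np.fft.rfft(x)
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), g in zip(bands, band_gains):
        spectrum[lo:hi] *= g
    return np.fft.irfft(spectrum, n=len(x))

n = np.arange(64)
low = np.sin(2 * np.pi * 4 * n / 64)    # falls in bin 4
high = np.sin(2 * np.pi * 20 * n / 64)  # falls in bin 20
x = low + high

# Keep the lower band (bins 0..9) and mute the upper band (bins 10..32).
y = apply_band_gains(x, band_edges=[0, 10, 33], band_gains=[1.0, 0.0])
```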
As mentioned above, the number of output channels (N) is greater than the number of input channels (M). For example, there may be two input channels and five or six output channels, or there may be a single input channel and two or more output channels, although other combinations are possible.
It is also possible that the number of output channels 6 is equal to the number of input channels 5 (that is, M=N), in which case the device of the present invention provides a remix of the audio channels. This may be useful to adjust certain signal properties and to enhance the audio signal.
It is noted that the audio signal may be constituted by a series of signal parts contained in consecutive time segments. Such time segments may be time frames or other units defining a time-limited signal part. Due to the decorrelation units the synchronization between the time segments and the corresponding parameters may be lost. This problem is solved by the present invention.
A merely exemplary embodiment of the device of the present invention is shown in more detail in
A (first) gain unit 21 having a gain G1 could be added between the input terminal and the first decorrelation unit 31 but has been omitted from the embodiment shown where the first gain G1 is equal to 1.
The upmix unit 4 comprises, in the example shown, three mixing units 41, 42 and 43 which mix the input channel and its three auxiliary channels to produce four output channels Lf (Left front), Ls (Left surround), Rf (Right front) and Rs (Right surround). The mixing unit 41 receives the (time-dependent) parameters IID_lr (Inter-channel Intensity Difference left-right) and ICC_lr (Inter-channel Cross-Correlation left-right), the mixing unit 42 receives the (time-dependent) parameters IID_l (Inter-channel Intensity Difference left front-left surround) and ICC_l (Inter-channel Cross-Correlation left front-left surround), while the mixing unit 43 receives the (time-dependent) parameters IID_r (Inter-channel Intensity Difference right front-right surround) and ICC_r (Inter-channel Cross-Correlation right front-right surround).
The parameters mentioned above are typically used in a so-called mixing matrix to determine the desired output signals. For example, the output signals Rf (Right front) and Rs (Right surround) may be determined by a mixing matrix M of mixing unit 43:
where the matrix M has coefficients m11 . . . m22, and where H3(G3·S)=S3 is the output signal of decorrelation unit 33. The normalized correlation coefficient ICC of the signals Rf and Rs is given by:
where $\sigma_x^2$ is the power of signal x. The intensity ratio IID is given by:
As the total power should be unaltered, it follows that:
$$\sigma_R^2 = m_{11}^2\,\sigma_R^2 + m_{12}^2\,\sigma_{S_3}^2 + m_{21}^2\,\sigma_R^2 + m_{22}^2\,\sigma_{H_3}^2 \qquad (4)$$
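For orientation, one form of Equations (1) to (3) that is consistent with the definitions given above and with Equation (4) is sketched below. This is an assumption rather than the original notation: R is taken to denote the right-channel input of mixing unit 43, R and S3 are taken to be mutually uncorrelated, and the IID is written as a plain power ratio although a logarithmic convention is equally possible. Since S3 = H3(G3·S), the power in the last term of Equation (4) presumably refers to the output of decorrelation unit 33.

```latex
\begin{align}
\begin{pmatrix} R_f \\ R_s \end{pmatrix}
  &= \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}
     \begin{pmatrix} R \\ H_3(G_3 \cdot S) \end{pmatrix} \tag{1} \\
\mathrm{ICC}
  &= \frac{\langle R_f, R_s \rangle}{\sigma_{R_f}\,\sigma_{R_s}}
   = \frac{m_{11} m_{21}\,\sigma_R^2 + m_{12} m_{22}\,\sigma_{S_3}^2}
          {\sigma_{R_f}\,\sigma_{R_s}} \tag{2} \\
\mathrm{IID}
  &= \frac{\sigma_{R_f}^2}{\sigma_{R_s}^2}
   = \frac{m_{11}^2\,\sigma_R^2 + m_{12}^2\,\sigma_{S_3}^2}
          {m_{21}^2\,\sigma_R^2 + m_{22}^2\,\sigma_{S_3}^2} \tag{3}
\end{align}
```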
It has been found that the further constraint m12=−m22 is effective. In other words, the intermediate signal (auxiliary channel) S3 contributes equal power to both signals Rf and Rs, but with opposite signs (in anti-phase). If m12=−m22 holds, the factors m12 and m22 can be moved upstream of decorrelator unit 33, for example to gain unit 23, to allow processing prior to decorrelation. Equation (1) can then be rewritten as:
Equation (1′) can be generalized using a parameter c:
For c=1 all time-variant processing of the decorrelator signal path is performed upstream of the decorrelator, while for c=G3·m12 all time-variant processing of the decorrelator signal path is performed downstream of the decorrelator. In accordance with the present invention, the parameter c will preferably have a value approximately or substantially equal to 1.
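The role of the parameter c can be checked numerically under the assumption that the generalized form reads Rf = m11·R + c·H3((G3·m12/c)·S) and Rs = m21·R − c·H3((G3·m12/c)·S), which agrees with the two limiting cases stated above. The sketch uses a Schroeder all-pass section as a stand-in for H3 and constant parameters; the function names and numerical values are hypothetical.

```python
import numpy as np

def allpass(x, delay=5, g=0.5):
    """Linear, time-invariant stand-in for the decorrelation filter H3."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y

def right_pair(r, s, m11, m21, m12, g3, c):
    """Return (Rf, Rs) with the gain g3*m12/c applied upstream of the
    decorrelator and the factor c applied downstream of it."""
    aux = allpass((g3 * m12 / c) * s)
    return m11 * r + c * aux, m21 * r - c * aux

rng = np.random.default_rng(0)
s = rng.standard_normal(256)
r = rng.standard_normal(256)
m11, m21, m12, g3 = 0.8, 0.6, 0.3, 0.9

upstream = right_pair(r, s, m11, m21, m12, g3, c=1.0)         # all gain before H3
downstream = right_pair(r, s, m11, m21, m12, g3, c=g3 * m12)  # all gain after H3
```

With constant parameters and a linear, time-invariant filter both choices of c give the same output. The choice only matters when the parameters vary in time, in which case a value of c near 1 applies them before the delay and smearing of the decorrelator.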
In the exemplary embodiment described above the upmix unit 4 sets both the cross-correlation and the intensity difference of the four output channels. This is, of course, not essential and in some embodiments the inter-channel intensity difference may be set in the pre-processing unit 2. This may be accomplished by performing all mixing operations in the pre-processing unit 2, for example directly using the input signal S.
It can be seen from
Another example of a device 1 according to the present invention is illustrated in
As can be seen from
An exemplary stereo decoder in accordance with the present invention is illustrated in
An audio system 10 according to the present invention is schematically illustrated in
Accordingly, the present invention may be used in audio amplifiers and/or systems. Such audio systems may include one or more audio sources, an amplifier and loudspeaker units or their equivalents. The audio sources may include a CD player, a DVD player, an MP3 or AAC player, a radio tuner, a hard disk, and/or other sources. The audio system may be incorporated in an entertainment center or in a computer system.
As discussed above, the present invention provides both a device and a method. The method steps are evident from
The present invention is based upon the insight that the time delay and possible “smearing” caused by the decorrelation in an audio decoder may cause temporal alignment discrepancies between the signal parameters and the corresponding signal parts. The present invention benefits from the further insight that this discrepancy can be eliminated, at least for certain signal processing operations, by carrying out these operations prior to the decorrelation.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words “comprise(s)” and “comprising” are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
04103370 | Jul 2004 | EP | regional |
05103072 | Apr 2005 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2005/052293 | 7/11/2005 | WO | 00 | 1/9/2007 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2006/008697 | 1/26/2006 | WO | A |
Number | Date | Country | |
---|---|---|---|
20080091436 A1 | Apr 2008 | US |