The present disclosure relates to the field of multichannel audio coding/decoding and in particular to parametric spatial audio coding/decoding also known as parametric multichannel audio coding/decoding.
Multichannel audio coding is based on the extraction and quantisation of a parametric representation of the spatial image of the multichannel audio signal. These spatial parameters are transmitted by an encoder together with a generated downmix signal to a decoder. At the decoder the multichannel audio signal is reconstructed from the decoded downmix signal and the received spatial parameters containing the spatial information of the multichannel audio signal. In spatial audio coding, the spatial image of the multichannel audio signal is thus captured in a compact set of spatial parameters that can be used to synthesise a high quality multichannel representation from the transmitted downmix signal. During the encoding process the spatial parameters are extracted from the multichannel audio input signal. These spatial parameters typically include level/intensity differences and measures of correlation/coherence between the audio channels and can be represented in an extremely compact way. The generated downmix signal is transmitted together with the extracted spatial parameters to the decoder and can be conveyed to the receiver using conventional audio coders. On the decoding side the transmitted downmix signal is expanded into a high quality multichannel output signal based on the received spatial parameters. Due to the reduced number of transmitted audio channels, spatial audio coding provides an extremely efficient representation of multichannel audio signals.
The generated downmix signal is transmitted by the multichannel audio encoder via a transmission channel along with the extracted spatial parameters SP to the multichannel audio decoder. In many scenarios the bandwidth of the transmission channel is very limited, allowing transmission of the downmix signal and the corresponding spatial parameters (SP) only at a very low bit rate. Accordingly, a goal of the present disclosure is to save bandwidth for the transmission of the spatial parameters without degrading the quality of the multichannel audio signal reconstructed by the multichannel audio decoder.
According to a first aspect of the present disclosure a method is provided for decoding a multichannel audio signal comprising the steps of:
receiving a downmix audio signal and an interchannel cross correlation parameter,
deriving an interchannel phase difference parameter from the received interchannel cross correlation parameter, and
calculating a decoded multichannel audio signal for the received downmix audio signal depending on the derived interchannel phase difference parameter.
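The derivation of the interchannel phase difference from the received interchannel cross correlation can be illustrated by the following minimal Python sketch, which assumes the simple rule described in the implementations below (the IPD is set to π when the received ICC is negative and to 0 otherwise); the function and parameter names are illustrative only.

```python
import math

def derive_ipd_from_icc(icc, ipd_activation_flag=True):
    # Illustrative rule: an inverted-phase relation (negative ICC) is
    # represented by an implicit IPD of pi; otherwise no phase difference
    # is synthesized. The IPD-activation flag gates the derivation.
    if ipd_activation_flag and icc < 0.0:
        return math.pi
    return 0.0
```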
In a possible implementation of the first aspect of the present disclosure the interchannel phase difference parameter is set to a value π for negative values of the received interchannel cross correlation parameter.
In a possible implementation of the first aspect of the present disclosure the interchannel phase difference parameter (IPD) is derived from the received interchannel cross correlation parameter in response to a received IPD-activation flag.
In a possible implementation of the first aspect of the present disclosure a synthesis matrix is generated for calculating the decoded multichannel audio signal by multiplying a rotation matrix with a calculated pre-matrix.
In a possible implementation of the first aspect of the present disclosure the pre-matrix is calculated on the basis of the respective received interchannel cross correlation parameter and a received channel level difference parameter.
In a possible embodiment of the first aspect of the present disclosure the rotation matrix comprises rotation angles which are calculated on the basis of a derived interchannel phase difference parameter and an overall phase difference parameter.
In an alternative embodiment the rotation matrix comprises rotation angles which are calculated on the basis of the derived interchannel phase difference parameter and a predetermined angle value.
In a possible implementation the predetermined angle value is set to a value of 0.
In a possible implementation of the first aspect of the present disclosure the overall phase difference parameter is calculated on the basis of the derived interchannel phase difference parameter and the received channel level difference parameter.
In a possible implementation of the first aspect of the present disclosure the derived interchannel phase difference parameter is smoothed before calculating the rotation matrix.
In a possible implementation of the first aspect of the present disclosure the received downmix audio signal is decorrelated by means of decorrelation filters to provide decorrelated audio signals.
In a further possible implementation of the first aspect of the present disclosure the downmix audio signals and the decorrelated audio signals are multiplied with the generated synthesis matrix to calculate the decoded multichannel audio signal.
In a possible implementation of the first aspect of the present disclosure the interchannel cross correlation parameter is received for each frequency band (b).
In a possible implementation of the first aspect of the present disclosure the IPD-activation flag is transmitted once per frame.
In a possible implementation of the first aspect of the present disclosure the IPD activation flag is transmitted for each frequency band.
In a possible implementation of the first aspect of the present disclosure a corresponding interchannel phase difference parameter is derived from the respective interchannel cross correlation parameter for each frequency band to calculate the decoded multichannel audio signal.
In a possible implementation of the first aspect of the present disclosure for calculating the decoded multichannel audio signal a synthesis matrix is generated for each frequency band by multiplying a rotation matrix with a calculated pre-matrix.
In a possible implementation of the first aspect of the present disclosure the pre-matrix is calculated for each frequency band on the basis of the respective received interchannel cross correlation parameter and a received channel level difference parameter of the frequency band.
According to a second aspect of the present disclosure a multichannel audio decoder is provided for decoding a multichannel audio signal, said multichannel audio decoder comprising:
a receiver unit for receiving a downmix audio signal and an interchannel cross correlation parameter,
a deriving unit for deriving an interchannel phase difference parameter from the received interchannel cross correlation parameter, and
a calculation unit for calculating a decoded multichannel audio signal depending on the derived interchannel phase difference parameter.
In a possible implementation of the second aspect of the present disclosure the decoded multichannel audio signal is output to at least one multichannel audio device connected to said multichannel audio decoder,
wherein the multichannel audio device comprises an acoustic transducer for each audio signal of said multichannel audio signal.
In a possible implementation the acoustic transducer is an earphone.
In a further possible implementation said acoustic transducer is formed by a loudspeaker.
In a possible implementation of the second aspect of the present disclosure a multichannel audio device connected to the multichannel audio decoder is a mobile terminal.
In an alternative implementation of the second aspect of the multichannel audio decoder the multichannel audio device connected to the multichannel audio decoder is a multichannel audio apparatus.
In further implementation forms of the second aspect of the present disclosure the multichannel decoder is adapted to perform a method according to any of the implementation forms of the first aspect.
According to a third aspect of the present disclosure a method for encoding a multichannel audio signal is provided, said method comprising the steps of:
generating a downmix audio signal for the multichannel audio signal,
extracting from the multichannel audio signal spatial parameters which comprise an interchannel cross correlation parameter and a channel level difference parameter, and
providing or adjusting an IPD-activation flag which is transmitted with the extracted spatial parameters to indicate the transmission of an implicit interchannel phase difference parameter (IPD) and to control the interchannel phase difference parameter (IPD). In dependence on the IPD-activation flag the interchannel phase difference parameter (IPD) can be derived from the interchannel cross correlation parameter, e.g. by a decoder, and can be used, e.g. by the decoder, for calculating a decoded multichannel audio signal for the transmitted downmix audio signal.
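As an illustration of this encoder-side step, the following sketch shows one way an encoder might set a per-band IPD-activation flag from the extracted ICC values; the per-band rule (flag set for negative ICC) is an assumption made for illustration, not a mandated behaviour.

```python
def ipd_activation_flags(icc_per_band):
    # Hypothetical rule: request implicit IPD synthesis only for parameter
    # bands whose extracted ICC is negative (phase-inverted content).
    return [1 if icc < 0.0 else 0 for icc in icc_per_band]
```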
According to a fourth aspect of the present disclosure a multichannel audio encoder is provided for encoding a multichannel audio signal, said multichannel audio encoder comprising:
a downmix signal generation unit for generating a downmix audio signal for the multichannel audio signal; and
a spatial parameter extraction unit for extracting from said multichannel audio signal spatial parameters comprising an interchannel cross correlation parameter and a channel level difference parameter and providing an adjustable IPD-activation flag being transmitted with the extracted spatial parameters to indicate the transmission of an implicit interchannel phase difference parameter (IPD) and to control an interchannel phase difference parameter (IPD). In dependence on the IPD-activation flag the interchannel phase difference parameter (IPD) can be derived from the interchannel cross correlation parameter, e.g. by a decoder, and can be used, e.g. by the decoder, for calculating a decoded multichannel audio signal for the transmitted downmix audio signal.
In further implementation forms of the fourth aspect of the present disclosure the multichannel audio encoder is adapted to perform a method according to any of the implementation forms of the third aspect.
Implementation forms of the first to fourth aspects may comprise stereo signals as multichannel audio signals, the stereo signal comprising a left channel signal and a right channel signal.
Possible implementations and embodiments of different aspects of the present disclosure are described in the following with reference to the enclosed figures.
In a possible implementation the multichannel input audio signal S is first processed by the spatial parameter extraction unit and the extracted spatial parameters SP are subsequently separately encoded while the generated downmix signal SD can be encoded using an audio encoder.
In a possible implementation the audio bit stream provided by the audio encoder and the bit stream provided by the spatial parameter extraction unit can be combined into a single output bit stream transmitted via the transmission channel 4 to the remote multichannel audio decoder 3. The multichannel audio decoder 3 shown in
In a possible implementation the multichannel audio decoder 3 separates by means of a bit stream de-multiplexer the received downmix signal data and the received spatial parameter data. The received downmix audio signals can be decoded by means of an audio decoder and fed into a spatial synthesis stage performing a synthesis based on the decoded spatial parameters SP. Hence, the spatial parameters SP are estimated at the encoder side and supplied to the decoder side as a function of time and frequency. Both the multichannel audio encoder 2 and the multichannel audio decoder 3 can comprise a transform or filter bank that generates individual time/frequency tiles.
In a possible implementation the multichannel audio encoder 2 can receive a multichannel audio signal S with a predetermined sample rate. The input audio signals are segmented using overlapping frames of a predetermined length. In a possible embodiment each segment is then transformed to the frequency domain by means of FFT. The frequency domain signals are divided into non-overlapping sub bands each having a predetermined band width BW around a centre frequency fc. For each frequency band b spatial parameters SP can be computed by the spatial parameter extraction unit of the multichannel audio encoder 2.
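A minimal analysis sketch along these lines is given below; the frame length, hop size, window and band edges are illustrative values and are not taken from the disclosure.

```python
import numpy as np

def analysis_bands(x1, x2, frame_len=1024, hop=512,
                   band_edges=(0, 2, 4, 8, 16, 32, 64, 128, 513)):
    # Segment both input channels (numpy arrays) into overlapping frames,
    # transform each frame with an FFT and group the spectra into
    # non-overlapping parameter bands b defined by the band edge indices.
    win = np.hanning(frame_len)
    for start in range(0, len(x1) - frame_len + 1, hop):
        X1 = np.fft.rfft(win * x1[start:start + frame_len])
        X2 = np.fft.rfft(win * x2[start:start + frame_len])
        yield [(X1[k0:k1], X2[k0:k1])
               for k0, k1 in zip(band_edges[:-1], band_edges[1:])]
```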
wherein k is the index of the frequency subband, b is the index of the parameter band, kb is the starting subband of band b, and X1 and X2 are the spectra of the two input audio channels, respectively. In this implementation the ICC parameter can take a value between −1 and +1. In an alternative implementation the spatial parameter extraction unit 2A computes the ICC parameter according to the following equation:
In an implementation the ICC parameter can take values only in the range between 0 and 1.
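The two equations referred to above are not reproduced in this text. Commonly used definitions that are consistent with the stated value ranges, given here only as an assumption, are

\[
ICC(b)=\frac{\operatorname{Re}\Bigl\{\sum_{k=k_b}^{k_{b+1}-1}X_1(k)\,X_2^{*}(k)\Bigr\}}{\sqrt{\sum_{k=k_b}^{k_{b+1}-1}\lvert X_1(k)\rvert^{2}\;\sum_{k=k_b}^{k_{b+1}-1}\lvert X_2(k)\rvert^{2}}}\in[-1,1]
\]

for the first implementation and

\[
ICC(b)=\frac{\Bigl\lvert\sum_{k=k_b}^{k_{b+1}-1}X_1(k)\,X_2^{*}(k)\Bigr\rvert}{\sqrt{\sum_{k=k_b}^{k_{b+1}-1}\lvert X_1(k)\rvert^{2}\;\sum_{k=k_b}^{k_{b+1}-1}\lvert X_2(k)\rvert^{2}}}\in[0,1]
\]

for the alternative implementation.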
In a possible implementation, the ICC parameters are extracted on the full bandwidth stereo audio signal. In that case, only one ICC parameter is transmitted for each frame and represents the correlation of the two input signals. The ICC extraction can be performed on a full band audio signal (e.g. in time domain).
In the implementation of the multichannel audio encoder 2 the spatial parameter extraction unit 2A also computes a channel level difference CLD parameter which represents the level difference between the two input audio channels. In a possible implementation the CLD parameter is calculated using the following equation:
wherein k is the index of the frequency subband, b is the index of the parameter band, kb is the starting subband of band b, and X1 and X2 are the spectra of the first and second input audio channels, respectively.
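The CLD equation itself is likewise not reproduced here; a standard definition consistent with these symbol definitions, stated as an assumption, is

\[
CLD(b)=10\,\log_{10}\frac{\sum_{k=k_b}^{k_{b+1}-1}\lvert X_1(k)\rvert^{2}}{\sum_{k=k_b}^{k_{b+1}-1}\lvert X_2(k)\rvert^{2}}\ \text{dB}.
\]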
The interchannel cross correlation parameter ICC indicates a degree of similarity between two audio signals. The interchannel cross correlation ICC is defined as the value of a normalized cross correlation function with the largest magnitude, resulting in a range of values between −1 and 1. A value of −1 means that the signals are identical but have a different sign (phase inverted). Two identical signals (ICC=1) transmitted by two transducers such as headphones are perceived by the user as a relatively compact auditory event. For noise, the width of the perceived auditory event increases as the ICC between the transducer signals decreases, until two distinct auditory events are perceived.
The interchannel level difference CLD indicates a level difference between two audio signals. The interchannel level difference is also sometimes referred to as interaural level difference, e.g. a level difference between a left and a right ear entrance signal.
For example, shadowing caused by the head results in an intensity difference at the left and right ear entrances referred to as interaural level difference ILD. For example, a signal source to the left of a listener results in a higher intensity of the acoustic signal at the left ear than at the right ear of the listening person.
It can be seen from
On the decoder side the transmitted ICC parameter can be decoded for each frequency band. If a negative ICC is present, an upmix matrix index can be added to the bit stream to select whether or not an implicit IPD synthesis is to be used by the decoder.
The downmix signal generation unit 2B generates a downmix signal SD. The transmitted downmix signal SD contains all signal components of the input audio signal S. The downmix signal generation unit 2B provides a downmix signal wherein each signal component of the input audio signal S is fully maintained. In a possible implementation a downmixing technique is employed which equalizes the downmix signal such that the power of the signal components in the downmix signal SD is approximately the same as the corresponding power in all input audio channels. In a possible implementation the input audio channels are decomposed into a number of subbands. The signals of each subband of each input channel are added and can be multiplied with a factor in a possible implementation. The subbands can be transformed back to the time domain resulting in a downmix signal SD which is transmitted by the downmix signal generation unit 2B via the transmission channel 4 to the multichannel audio decoder 3.
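A simplified sketch of such a power-equalized downmix is shown below; it operates on the full-band time signals rather than per subband and only illustrates the equalization idea.

```python
import numpy as np

def equalized_mono_downmix(x1, x2):
    # Sum the two input channels and rescale so that the power of the
    # downmix approximately matches the total power of the input channels.
    s = 0.5 * (x1 + x2)
    target_power = np.sum(x1 ** 2) + np.sum(x2 ** 2)
    actual_power = np.sum(s ** 2) + 1e-12   # guard against division by zero
    return s * np.sqrt(target_power / actual_power)
```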
Please note that the steps S31, S32, S33 as illustrated in
The extracted spatial parameters SP and in a possible implementation also the IPD flag are transmitted by the multichannel audio encoder 2 via the transmission channel 4 to the multichannel audio decoder 3 which performs in a possible implementation a decoding according to a further aspect of the present disclosure as illustrated by
In an alternative embodiment the rotation angles θ of the rotation matrix R are calculated on the basis of the derived IPD parameter and a predetermined angle value instead of an overall phase difference parameter OPD. This predetermined angle value can be set in a possible implementation to 0.
In a further implementation the derived interchannel phase difference parameter IPD derived in step S42 is smoothed (e.g. by a filter) before calculating the rotation matrix R to avoid switching artefacts.
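One simple smoothing option, given as an assumption, is a first-order recursive filter applied to the derived IPD across frames:

```python
def smooth_ipd(ipd_current, ipd_previous, alpha=0.75):
    # First-order recursive smoothing; alpha is an illustrative smoothing
    # constant controlling how quickly the IPD may change between frames.
    return alpha * ipd_previous + (1.0 - alpha) * ipd_current
```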
In step S43 the received downmix audio signal SD is first decorrelated by means of decorrelation filters to provide decorrelated audio signals D. Then the downmix audio signal SD received by the multichannel audio decoder 3 and the decorrelated audio signals D are multiplied in step S43 with the generated synthesis matrix MS to calculate the decoded multichannel audio signal S′.
In a further step S64 the synthesis matrix calculation unit 3C of the multichannel audio decoder 3 calculates an overall phase difference parameter OPDi depending on the derived interchannel phase difference parameter IPDi and the received channel level difference parameter CLDi in step S65. In a possible implementation the overall phase difference parameter OPD is calculated as follows:
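The equation used in the disclosure is not reproduced in this text. One consistent choice, derived under the assumption that the mono downmix preserves the phase of the sum of the two channels and that the CLD is expressed in dB, is

\[
OPD_i=\arg\bigl(e^{\,j\,IPD_i}+10^{\,CLD_i/20}\bigr),
\]

which is consistent with the rotation angles θ1 = OPDi and θ2 = OPDi − IPDi used further below as the phases of the first and second output channels relative to the downmix.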
In a further step S66 as shown in
In a special implementation for a stereo audio signal downmixed to a mono downmix signal SD the pre-matrix MP is given by:
The rotation matrix R is adapted by the synthesis matrix calculation unit 3C. In a special implementation of a stereo audio signal the rotation matrix R is given by:
wherein
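The matrix and the definitions referred to here are not reproduced in this text. A plausible form that is consistent with the rotation angles θ1 and θ2 introduced below, stated only as an assumption, is the diagonal phase rotation matrix

\[
R=\begin{pmatrix} e^{\,j\theta_1} & 0\\ 0 & e^{\,j\theta_2}\end{pmatrix}.
\]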
The synthesis matrix calculation unit 3C then calculates a synthesis matrix MS by multiplying the adjusted rotation matrix R with the prematrix MP as follows:
MS=R·MP
For the special implementation of a stereo audio signal the synthesis matrix MS can be calculated as follows:
The generated synthesis matrix MS is supplied by the synthesis matrix calculation unit 3C to the multiplication unit 3D which multiplies the downmix audio signal SD and the decorrelated audio signals D with the generated synthesis matrix MS to calculate the decoded multichannel audio signal S′ as shown in
In the special implementation of a stereo audio signal the decoded multichannel audio signal S′ can be calculated as follows:
In this special embodiment only one decorrelated audio signal D and the input downmix signal SD are multiplied with the synthesis matrix MS to obtain a synthesis stereo audio signal S′.
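The following sketch illustrates this stereo synthesis step, assuming the synthesis matrix MS has already been computed as a complex 2×2 matrix; the variable names are illustrative.

```python
import numpy as np

def synthesize_stereo_band(s_band, d_band, ms):
    # Multiply the downmix band signal and one decorrelated band signal
    # with the 2x2 synthesis matrix MS = R * MP to obtain the two
    # synthesized output channel band signals.
    out = ms @ np.vstack([s_band, d_band])
    return out[0], out[1]
```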
In a possible implementation of the multichannel audio decoder 3 as shown in
In a possible implementation angles θ1, θ2 of the rotation matrix R are calculated as follows by the synthesis matrix calculation unit 3C:
θ1=OPD
θ2=OPD−IPD
In an alternative implementation the angles θ1, θ2 of the rotation matrix R are set to two values which differ by the IPD:
θ1=θ
θ2=θ−IPDi
In this implementation the first angle θ1 is not a variable but a constant angle θ which is not changed during processing. The constant angle θ can be chosen in order to simplify the processing by the synthesis matrix calculation unit 3C. In a possible implementation the value for the angle θ is chosen as θ=0.
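Both variants for the rotation angles can be summarized in a short sketch; the helper below is illustrative only.

```python
def rotation_angles(ipd, opd=None, theta_const=0.0):
    # Variant 1: theta1 = OPD and theta2 = OPD - IPD.
    # Variant 2: theta1 is a constant angle (e.g. 0) and theta2 = theta - IPD.
    theta1 = opd if opd is not None else theta_const
    return theta1, theta1 - ipd
```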
In a further step S82 it is decided whether the received interchannel cross correlation parameter ICCi has a negative value and whether the IPD-flag is set. If this is the case the operation continues with step S83, shown in
In a further step S84 the interchannel phase difference parameter IPDi is set to a value of π.
In a further step S85 a synthesis matrix MS is calculated by the synthesis matrix calculation unit 3C by multiplying the rotation matrix R with the prematrix MP calculated in step S83. After the synthesis matrix MS has been calculated by the synthesis matrix calculation unit 3C, it is supplied to the multiplication unit 3D which calculates in step S86 a decoded multichannel audio signal for the received downmix audio signal SD by multiplying the downmix audio signal SD and the corresponding decorrelated audio signals D with the generated synthesis matrix MS.
If it is detected in step S82 that the provided interchannel cross correlation parameter ICC is positive, or is negative but the implicit IPD-flag is not set, the process continues with step S87. In step S87 the prematrix MP is computed based on the received interchannel cross correlation parameter ICCi. In a further step S88 the synthesis matrix MS is set to the calculated prematrix MP and supplied by the synthesis matrix calculation unit 3C to the multiplication unit 3D for calculating the decoded multichannel audio signal in step S86.
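The per-band decision flow of steps S82 to S88 can be summarized by the following sketch; compute_pre_matrix and rotation_matrix stand for the matrix calculations that are not reproduced in this text and are hypothetical helpers passed in for illustration.

```python
import numpy as np

def synthesis_matrix_for_band(icc, cld, ipd_flag,
                              compute_pre_matrix, rotation_matrix):
    # The pre-matrix MP is derived from the received ICC and CLD parameters.
    mp = compute_pre_matrix(icc, cld)
    if icc < 0.0 and ipd_flag:
        # Implicit IPD synthesis: derive IPD = pi for the inverted-phase
        # content and apply the rotation matrix R, so that MS = R * MP.
        ipd = np.pi
        return rotation_matrix(ipd, cld) @ mp
    # Otherwise the synthesis matrix is the pre-matrix alone.
    return mp
```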
The method and apparatus for encoding and decoding a multichannel audio signal can be used for any multichannel audio signal comprising a higher number of audio channels. Generally, the synthesized audio channels can be obtained by the spatial audio decoder 3 as follows:
wherein m is the channel index and x is the index of the decorrelated version of the downmix signal SD.
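The general equation is again not reproduced here; a form that is consistent with the stereo case above and with the stated indices, given as an assumption, is

\[
S'_m=\sum_{x} M_S[m,x]\,D_x ,
\]

where D0 denotes the decoded downmix signal SD and Dx for x ≥ 1 denotes its decorrelated versions.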
The multichannel audio device connected to the multichannel audio decoder 3 can also be formed by a multichannel audio apparatus (MCA) 8 as shown in
With the method and apparatus for encoding and decoding a multichannel audio signal it is possible to optimize the bandwidth occupied by the spatial parameters SP while keeping the quality of the reconstructed audio signal. The apparatus makes it possible to reproduce an inverted audio channel without introducing an artificial decorrelated signal. Furthermore, switching artefacts caused by switching from positive to negative ICC and from negative to positive ICC are reduced. An improved subjective quality for a negative ICC signal type can be achieved with a reduced bit rate based on implicit IPD synthesis.
The apparatus and method according to the present disclosure for encoding and decoding multichannel audio signals are not restricted to the above described embodiments and can comprise many variants and implementations. The entities described with respect to the multichannel audio decoder 3 and the multichannel audio encoder 2 can be implemented by hardware or software modules. Furthermore, entities can be integrated into other modules. A transmission channel 4 connecting the multichannel audio encoder 2 and the multichannel audio decoder 3 can be formed by any wireless or wired link or network. In a possible implementation a multichannel audio encoder 2 and a multichannel audio decoder 3 can be integrated on both sides in an apparatus allowing for bidirectional communication. A network connecting a multichannel audio encoder 2 with a multichannel audio decoder 3 can comprise a mobile telephone network, a data network such as the internet, a satellite network and a broadcast network such as a broadcast TV network. The multichannel audio encoder 2 and the multichannel audio decoder 3 can be integrated in different kinds of devices, in particular in a mobile multichannel audio apparatus such as a mobile phone or in a fixed multichannel audio apparatus such as a stereo or surround sound setup for a user. The improved low bit rate parametric encoding and decoding methods allow a better representation of a multichannel audio signal, in particular when the cross correlation is negative. According to an aspect of the present disclosure a negative correlation between audio channels is efficiently synthesized using an IPD parameter. In the present disclosure this IPD parameter is not transmitted but derived from other spatial parameters SP to save bandwidth, allowing a low bit rate for data transmission. In a possible implementation an implicit IPD flag is decoded and used for generating a synthesis matrix MS. With the method according to the present disclosure it is possible to better represent signals having a negative ICC without causing switching artefacts from frame to frame when a change in ICC sign occurs. The method according to the present disclosure is particularly efficient for a signal with an ICC value close to −1. The method allows a reduced bit rate for negative ICC synthesis by using implicit IPD synthesis and improves audio quality by applying IPD synthesis only to negative ICC frequency bands.
In the preceding specification, the subject matter has been described with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made without departing from the broader spirit and scope as set forth in the claims that follow. The specification and drawings are accordingly to be regarded as illustrative rather than restrictive. Other embodiments may be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein.
This application is a continuation of International Patent Application No. PCT/CN2010/077571, filed on Oct. 5, 2010, which is hereby incorporated by reference.
Related application data: parent application PCT/CN2010/077571, filed October 2010 (US); child application 13856579 (US).