The invention pertains to the field of coding a multi-channel audio signal by principal component analysis for digital audio transmission over diverse transmission networks at various bit rates. More particularly, the invention is aimed at allowing bit rate-based graduated (also known as scalable) coding so as to adapt to the constraints of the transmission network or to allow audio rendition of variable quality.
Within the framework of the coding of multi-channel audio signals, two approaches are particularly known and used.
The first, and older, approach consists in matrixing the channels of the original multi-channel signal so as to reduce the number of signals to be transmitted. By way of example, the Dolby® Pro Logic® II multi-channel audio coding method carries out the matrixing of the six channels of a 5.1 signal into two signals to be transmitted. Several types of decoding can be carried out so as to best reconstruct the six original channels.
The second approach, called parametric audio coding, is based on extracting spatialization parameters so as to reconstitute the listener's spatial perception. This approach is based mainly on a method called “Binaural Cue Coding” (BCC) which is aimed on the one hand at extracting and then coding the indices of the auditory localization and on the other hand at coding a monophonic or stereophonic signal arising from the matrixing of the original multi-channel signal.
Furthermore, an approach exists which is a hybrid of the above two approaches based on a procedure called “Principal Component Analysis” (PCA). Specifically, PCA can be seen as a dynamic matrixing of the channels of the multi-channel signal to be coded. More precisely, PCA is obtained through a rotation of the data whose angle corresponds to the spatial position of the dominant sound sources at least for the stereophonic case. This transformation is moreover considered to be the optimal decorrelation procedure which makes it possible to compact the energy of the components of a multi-component signal. An exemplary PCA-based stereophonic audio coding is disclosed in documents WO 03/085643 and WO 03/085645.
Specifically,
This encoder 109 carries out adaptive filtering of the components arising from the PCA of the original stereo signal comprising the channels L and R.
The encoder comprises rotation means 102, PCA means 104, prediction filtering means 106, subtraction means 108, multiplication means 110, addition means 112, first and second audio coding means 129a and 129b.
The rotation means 102 carry out a rotation of the channels L and R according to an angle α thus defining a principal component y and a residual component r. The angle α is determined by the PCA means 104 so that the principal component y exhibits a higher energy than that of the residual component r.
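By way of purely illustrative example, the rotation described above can be sketched as follows. The sketch assumes that the angle α is derived from the energies and cross term of the two channels (a closed form that is standard for a two-channel PCA, but not stated in the cited documents); the function names and the test signal are hypothetical.

```python
import numpy as np

def pca_angle(left, right):
    """Rotation angle that maximizes the energy of the first rotated component."""
    c_ll = np.dot(left, left)    # energy of channel L
    c_rr = np.dot(right, right)  # energy of channel R
    c_lr = np.dot(left, right)   # cross term between L and R
    return 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)

def rotate(left, right, alpha):
    """Rotate (L, R) by alpha into a principal component y and a residual r."""
    c, s = np.cos(alpha), np.sin(alpha)
    y = c * left + s * right
    r = -s * left + c * right
    return y, r

# Hypothetical test signal: a single source panned with gains 0.8 (L) and 0.6 (R).
source = np.sin(0.1 * np.arange(200))
left, right = 0.8 * source, 0.6 * source
alpha = pca_angle(left, right)
y, r = rotate(left, right, alpha)
```

For this single panned source the angle coincides with the panning direction, and the residual component carries essentially no energy.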
The multiplication means 110 multiply the residual component r by a scalar γ. The result of the multiplication rγ is added by the addition means 112 to the principal component y. The result of the addition rγ+y is introduced into the prediction filtering means 106.
The filtering parameter Fp which defines the prediction filtering means 106 is coded by the second coding means 129b to generate a coded filtering parameter Fpe.
Moreover, the result of the addition rγ+y is also coded by the first coding means 129a to generate a coded principal component ye.
Thus, the procedure consists in determining the parameters of the prediction filtering means such that these filtering means can generate an estimation of the residual component r arising from the PCA on the basis of the principal component y which has the greatest energy.
The decoder 115 comprises first and second decoding means 141a and 141b, filtering means 120, inverse rotation means 118 and addition and multiplication means 122a and 122b.
The decoder 115 then carries out the inverse operation by decoding the principal component y′e by the first decoding means 141a forming a decoded principal component y′, then by carrying out its filtering by the filtering means 120 into a filtered residual component r′ on the basis of the filtering parameters Fp.
The multiplication means 122b multiply the filtered residual component r′ with the scalar γ forming the product r′γ. The addition means 122a make it possible to subtract r′γ from the decoded principal component y′.
The inverse rotation means 118 apply the inverse rotation matrix as a function of the angle of rotation α to the signals y′ and r′ so as to generate the channels L′ and R′ of the decoded stereophonic signal.
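The chain formed by the mixing with the scalar γ, its removal at the decoder and the inverse rotation can be sketched as follows. This is a minimal sketch only: the prediction filtering is not modelled, and the residual estimate supplied to the decoder is assumed to be perfect, merely to show that the remaining operations are invertible. All function names are hypothetical.

```python
import numpy as np

def rotate(left, right, alpha):
    """Forward rotation of (L, R) into (y, r)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return c * left + s * right, -s * left + c * right

def mix(y, r, gamma):
    """Encoder side: add the scaled residual to the principal component."""
    return y + gamma * r

def decode(y_mix, r_est, gamma, alpha):
    """Decoder side: remove the scaled residual estimate, then rotate back."""
    y_dec = y_mix - gamma * r_est
    c, s = np.cos(alpha), np.sin(alpha)
    left_dec = c * y_dec - s * r_est
    right_dec = s * y_dec + c * r_est
    return left_dec, right_dec

rng = np.random.default_rng(0)
left = rng.standard_normal(64)
right = rng.standard_normal(64)
alpha, gamma = 0.3, 0.5          # arbitrary illustrative values
y, r = rotate(left, right, alpha)
# Assumption: the decoder's residual estimate equals the true residual r.
left_dec, right_dec = decode(mix(y, r, gamma), r, gamma, alpha)
```

In practice the residual estimate comes from the prediction filter, so the reconstruction is only as good as that estimate.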
However, the PCA carried out according to the prior art does not adapt to the constraints of the transmission network and does not make it possible to obtain a fine characterization of the signals to be coded.
The present invention relates to a scalable coding method of a multi-channel audio signal comprising a principal component analysis transformation of at least two channels of the said audio signal into a principal component and at least one residual sub-component by rotation defined by a transformation parameter, characterized in that it comprises the following steps:
formation of a frequency subband-based residual structure on the basis of the said at least one residual sub-component, and
definition of a coded audio signal comprising the said principal component, at least one residual structure of a frequency subband and the said transformation parameter.
Thus, the audio coding is graduated in bit rate. This offers the possibility of approaching an asymptotically perfect reconstruction of the original signals. Specifically, using a higher bit rate, the reconstructed signal can be perceptually closer to the original signal.
Advantageously, the method comprises a formation of at least one energy parameter as a function of the said at least one residual sub-component.
The said at least one energy parameter can be formed by a frequency subband-based extraction of energy difference between a decomposition of the said principal component and the said at least one residual sub-component.
As a variant, the said at least one energy parameter corresponds to a subband-based energy of the said at least one residual sub-component.
The method comprises a frequency analysis applied to the said at least one residual sub-component as a function of the said at least one energy parameter so as to form the residual structures of the frequency subbands.
Advantageously, the method comprises a determined order of transmission of the residual structures. The said determined order of transmission can be carried out according to a perceptual order of the subbands or an energy criterion.
Advantageously, the said at least one residual sub-component is a frequency residual sub-component (A(b)) carried out according to a principal component analysis in the frequency domain.
Thus, the principal component analysis in the frequency domain by frequency subbands makes it possible to obtain a finer characterization of the signals to be coded.
The principal component analysis transformation in the frequency domain comprises the following steps:
decomposing the said at least two channels of the said audio signal into a plurality of frequency subbands,
calculating the said at least one transformation parameter as a function of at least a part of the said plurality of frequency subbands,
transforming at least a part of the said plurality of frequency subbands into the said at least one frequency residual sub-component and at least one frequency principal sub-component as a function of the said at least one transformation parameter, and
forming the said principal component on the basis of the said at least one frequency principal sub-component.
Thus, the energy of the signals arising from the principal component analysis (PCA) carried out by frequency subbands is more compacted in the principal component compared with the energy of the signals arising from a PCA carried out in the time domain.
Advantageously, the said plurality of frequency subbands is defined in accordance with a perceptual scale. Thus, the coding method takes account of the frequency resolution of the human auditory system.
According to another embodiment, the method comprises a frequency subband-based analysis of the said at least one residual sub-component.
According to this other embodiment, the said frequency subband-based analysis comprises the following steps:
application of a short-term Fourier transform to the said at least one residual sub-component to form at least one frequency residual sub-component, and
filtering of the said at least one frequency residual sub-component by a frequency windowing module to obtain the residual structures of the frequency subbands.
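The two steps above can be sketched as follows, under stated assumptions: a Hann analysis window, a frame length of 256 samples with a hop of 128, rectangular frequency windows, and hypothetical band edges expressed as FFT bin indices. None of these values come from the present description.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-term Fourier transform; returns an array (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def split_into_subbands(spectrum, band_edges):
    """Rectangular frequency windowing: one masked copy of the spectrum per subband."""
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = np.zeros_like(spectrum)
        band[:, lo:hi] = spectrum[:, lo:hi]
        bands.append(band)
    return bands

# Hypothetical residual signal and band edges (bin indices, 129 bins in total).
residual = np.sin(2 * np.pi * 0.05 * np.arange(1024))
band_edges = [0, 4, 8, 16, 32, 64, 129]
spectrum = stft(residual)
bands = split_into_subbands(spectrum, band_edges)
```

Because the rectangular windows do not overlap and cover every bin, the subbands sum back to the full-band spectrum.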
Advantageously, the method comprises an analysis of correlation between the said at least two channels to determine a corresponding correlation value, and the said coded audio signal furthermore comprises the said correlation value. Thus, the correlation value can indicate any presence of reverberation in the original signal, making it possible to improve the quality of the decoding of the coded signal.
The invention is also aimed at a method of decoding a reception signal comprising a coded audio signal constructed according to any one of the above characteristics, the said decoding method comprising a transformation by inverse principal component analysis to form at least two decoded channels corresponding to the said at least two channels arising from the said original multi-channel audio signal, the method being characterized in that it comprises the decoding of at least one residual structure of a frequency subband so as to synthesize at least one decoded residual sub-component.
According to a first embodiment the decoding method comprises the following steps:
receiving the coded audio signal,
extracting a decoded principal component and at least one decoded transformation parameter,
decomposing the said decoded principal component into at least one decoded frequency principal sub-component,
transforming the said at least one decoded principal sub-component and the said at least one decoded residual sub-component into decoded frequency subbands, and
combining the said decoded frequency subbands to form the said at least two decoded channels.
According to a second embodiment the decoding method comprises the following steps:
receiving the coded audio signal,
extracting a decoded principal component and at least one decoded transformation parameter,
forming the said at least two channels decoded by the inverse principal component analysis as a function of the said at least one decoded transformation parameter, of the said decoded principal component and of the said at least one decoded residual sub-component.
The invention is also aimed at a scalable encoder of a multi-channel audio signal, comprising:
transformation means based on principal component analysis transforming at least two channels of the said audio signal into a principal component and at least one residual sub-component by rotation defined by a transformation parameter,
structure formation means for forming a frequency subband-based residual structure on the basis of the said at least one residual sub-component, and
defining means for defining a coded audio signal comprising the said principal component, at least one residual structure of a frequency subband and the said transformation parameter.
The invention is also aimed at a scalable decoder of a reception signal comprising a coded audio signal constructed according to any one of the above characteristics, the decoder comprising:
The invention is also aimed at a system comprising the encoder and the decoder according to the above characteristics.
The invention is also aimed at a computer program downloadable from a communication network and/or stored on a medium readable by computer and/or executable by a microprocessor, characterized in that it comprises program code instructions for executing the steps of the coding method according to at least one of the above characteristics, when it is executed on a computer or a microprocessor.
The invention is also aimed at a computer program downloadable from a communication network and/or stored on a medium readable by computer and/or executable by a microprocessor, characterized in that it comprises program code instructions for executing the steps of the decoding method according to at least one of the above characteristics, when it is executed on a computer or a microprocessor.
Other features and advantages of the invention will emerge on reading the description given, hereinafter, by way of nonlimiting indication, with reference to the appended drawings, in which:
In accordance with the invention,
The coding device 3 comprises an encoder 9 which on receiving a multi-channel audio signal C1, . . . , CM generates a coded audio signal SC representative of the original multi-channel audio signal C1, . . . , CM.
The encoder 9 can be connected to a transmission means 11 for transmitting the coded signal SC via the communication network 7 to the decoding device 5.
The decoding device 5 comprises a receiver 13 for receiving the coded signal SC transmitted by the coding device 3. Furthermore, the decoding device 5 comprises a decoder 15 which on receiving the coded signal SC generates a decoded audio signal C′1, . . . , C′M corresponding to the original multi-channel audio signal C1, . . . , CM.
The encoder 9 comprises principal component analysis (PCA) transformation means 28, defining means 29 and structure formation means 30.
The principal component analysis (PCA) transformation means 28 are intended to transform at least two channels L and R of the multi-channel audio signal into a principal component CP and at least one residual sub-component r by rotation defined by a transformation parameter or angle of rotation θ.
The structure formation means 30 are intended to form a frequency subband-based residual structure Sfr on the basis of the said at least one residual sub-component r.
Furthermore, the defining means 29 are intended to define a coded audio signal SC comprising the principal component CP, at least one part of the residual structure Sfr and the said at least one transformation parameter θ.
Thus, this scalable coding allows adaptation to the constraints of the transmission network 7. It also makes it possible to reconstruct a signal perceptually closer to the original signal.
The structure formation means 30 comprise frequency analysis means 31 allowing the formation of at least one energy parameter E as a function of the said at least one residual sub-component r.
As a variant, the frequency analysis means 31 allow the formation of at least one energy parameter E by a frequency subband-based extraction of energy difference between a decomposition of the principal component CP and the residual sub-component or sub-components r. Specifically, the dotted arrow shows that the energy parameter E depends on the principal component and more particularly on a frequency decomposition of the principal component CP.
Moreover, the energy parameter or parameters E can correspond to subband-based energies of the residual sub-component or sub-components r.
Thus, the frequency analysis means 31 make it possible to apply a frequency analysis to at least one residual sub-component r as a function of at least one energy parameter E so as to form a frequency subband-based residual structure Sfr.
Thus, the fine residual structure of the audio signal, over the whole of the frequency band, is composed of the residual structures of the frequency subbands thus formed. To designate the residual structure of a frequency subband, it is possible to speak of a frequency subband-based residual structure or else of a frequency band of the (global) fine residual structure.
Advantageously, this coding method adapts to the capabilities of the transmission network 7 and/or of the desired audio playback quality by virtue of the introduction of scalability in terms of coding bit rate for the residual component or ambiance.
Thus, it is possible to use a traditional monophonic audio coder (MPEG-1 Layer III or Advanced Audio Coding for example) to transmit the principal component while carrying out a flexible audio coding of the ambiance signal.
According to the coding method considered, the energy parameter E, transformation parameter θ, or filtering parameter used to generate the ambiance component r when decoding are accompanied by the fine residual structure Sfr of this ambiance signal r.
Moreover, the transmission of this residual structure Sfr can be carried out according to various determined orders of transmissions.
By way of example, the transmission of the residual structure Sfr can be carried out according to a perceptual order of the subbands or according to an energy criterion or according to a correlation of the components arising from the PCA in subbands. This ordering can also be a combination of some of these criteria.
Specifically, the order of transmission of the fine residual structure Sfr of the ambiance component (or of the ambiance components) can be put in place so as to prioritize the information to be transmitted. Certain frequency bands of the fine residual structure Sfr can be transmitted in priority. Thus, the ordering can be carried out according to frequency bands of a quantized spectral envelope. This ordering can be predefined according for example to an increasing order or according to any other order.
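By way of illustration, an ordering of the subbands followed by a selection under a bit budget can be sketched as follows. The two criteria shown (decreasing energy and increasing band index), the per-band bit costs and the budget are hypothetical; the description does not fix any of them.

```python
def transmission_order(band_energies, criterion="energy"):
    """Return subband indices in the order in which they would be transmitted."""
    n = len(band_energies)
    if criterion == "energy":
        # Highest-energy subbands first; ties keep the lower index first.
        return sorted(range(n), key=lambda b: -band_energies[b])
    if criterion == "increasing":
        return list(range(n))
    raise ValueError("unknown criterion: " + criterion)

def select_bands(order, bits_per_band, bit_budget):
    """Keep subbands in priority order until the next one would exceed the budget."""
    sent, used = [], 0
    for b in order:
        if used + bits_per_band[b] > bit_budget:
            break
        sent.append(b)
        used += bits_per_band[b]
    return sent

energies = [0.2, 3.0, 1.5, 0.1]      # hypothetical subband energies
order = transmission_order(energies)
sent = select_bands(order, bits_per_band=[40, 40, 40, 40], bit_budget=100)
```

With these values the two most energetic subbands are sent and the remaining ones are dropped, which is one way of prioritizing the information to be transmitted.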
Furthermore, the coding method can comprise an analysis of correlation between the two channels L and R to determine a corresponding correlation value c. Thus, the coded audio signal SC can also comprise this correlation value c.
It will be noted that
The decoder 15 comprises transformation means 44 based on inverse principal component analysis (PCA−1) and frequency synthesis means 45.
Thus, on receipt of a coded signal SC comprising a principal component CP, at least one part of a residual structure Sfr and at least one transformation parameter θ, the decoder 15 forms at least two decoded channels L′ and R′ corresponding to the two channels L and R arising from the original multi-channel audio signal.
Specifically, the frequency synthesis means 45 allow the decoding of the frequency subband-based residual structure Sfr so as to synthesize at least one decoded residual sub-component r′.
The transformation means 44 based on inverse principal component analysis (PCA−1) then form the two decoded channels L′ and R′ as a function of the decoded residual sub-component r′ in addition to the principal component CP and the transformation parameter θ.
The encoder 9 comprises principal component analysis transformation means 28, defining means 29 and structure formation means 30.
The principal component analysis transformation means 28 comprise rotation means 2 and PCA means 4.
The defining means 29 comprise first and second audio coding means 29a and 29b and quantizing means 29c.
Furthermore, the encoder 9 comprises prediction filtering means 6, subtraction means 8, multiplication means 10 and addition means 12.
The rotation means 2 generate a principal component y and a residual sub-component r by means of a rotation of the channels L and R according to an angle α extracted from the PCA means 4.
The multiplication means 10 multiply the residual sub-component r by a scalar γ. The scalar γ allows the mixing of the signals arising from the rotation so as to facilitate the prediction of the signal r on the basis of the signal y.
The result of the multiplication rγ is added by the addition means 12 to the principal component y. The result of the addition rγ+y is applied to the first coding means 29a to generate a coded principal component y′e.
Moreover, the result of the addition rγ+y is introduced into the prediction filtering means 6 which consist of the series association of an adaptive filter and of a reverberation filter.
The filtering parameter Fp output by the prediction filtering means 6 is applied to the second coding means 29b to generate a coded filtering parameter Fpe.
The structure formation means 30 make it possible to add to this information the fine residual structure Sfr of the residual sub-component r or ambiance arising from the principal component analysis transformation means 28. Specifically, the use of the prediction filtering means 6 to generate a signal Fp which must be decorrelated from the useful signal for prediction is not very suitable. Consequently if the decoder benefits from additional information, admittedly at a higher bit rate, then the ambiance component generated makes it possible to carry out a better conditioned inverse PCA.
The structure formation means 30 carry out a frequency subband-based analysis of the residual sub-component r.
Specifically, these structure formation means 30 comprise frequency transformation means 16 in addition to the frequency analysis means 31.
The frequency transformation means 16 make it possible (for example, by applying a short-term Fourier transform STFT to the residual sub-component r) to form at least one frequency residual sub-component r(b).
Thereafter, the frequency analysis means 31 make it possible to obtain the frequency subband-based residual structure Sfr, for example by filtering the frequency residual sub-component by means of a frequency filter bank.
Thus, the fine structure Sfr(n,b) for each frequency subband b and each analysed signal portion n can be quantized by the quantizing means 29c and transmitted by the transmission means 11 from the coding device 3 to a decoding device 5.
The decoder 15 comprises frequency synthesis means 45 and transformation means 44 based on inverse principal component analysis (PCA−1) comprising inverse rotation means 18.
Furthermore, the decoder comprises extraction means 21, filtering means 20, and addition and multiplication means 22a and 22b. The extraction means 21 comprise first and second decoding means 41a and 41b.
Thus, by virtue of the reception of the coefficients of the adaptive filter Fpe, of the angle of rotation α, of the scalar γ and of the signal y′e, the decoder 15 then carries out the inverse operation by decoding the principal component y′e by the first decoding means 41a forming a decoded principal component y′, then by carrying out its filtering by the filtering means 20 into a filtered residual component r′ on the basis of the filtering parameters Fp arising from the second decoding means 41b.
The multiplication means 22b multiply the filtered residual component r′ with the scalar γ forming the product r′γ. The addition means 22a make it possible to subtract r′γ from the decoded principal component y′.
The inverse rotation means 18 apply the inverse rotation matrix as a function of the angle of rotation α to the signals y′ and r′ so as to generate the channels L′ and R′ of the decoded stereophonic signal.
If the residual structure Sfr(n,b) of the frequency subbands of the component r has been transmitted by the encoder 9 then a signal r″ can be generated by the frequency synthesis means 45 before carrying out the inverse rotation by the inverse rotation means 18.
Thus, the two decoded channels L′ and R′ can be formed by the inverse principal component analysis as a function of the decoded transformation parameter (or angle of rotation), of the decoded principal component y′ and of the decoded residual sub-component r.
Furthermore the decoder 15 can comprise decoding frequency transformation means 54 and decoding frequency analysis means 56 making it possible to form subbands on the basis of the filtered residual component r′.
Specifically, in the case of a partial reception of the residual structure Sfr(n,b) (reception of a few frequency subbands), the frequency synthesis means 45 use the subbands arising from the synthesis r′ to supplement the subbands whose fine structure has not been received.
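The supplementing of missing subbands can be sketched as follows. The data layout (a dictionary of received subbands indexed by band number, and a list of subbands derived from the filtered residual component r′) is an assumption made for the example only.

```python
import numpy as np

def supplement_subbands(received, synthesized):
    """Use the received fine structure where available, else the synthesized subband."""
    return [received[b] if b in received else synthesized[b]
            for b in range(len(synthesized))]

# Stand-ins for the subbands derived from r' (hypothetical values).
synthesized = [np.full(4, float(b)) for b in range(3)]
# Only the fine structure of subband 1 has been received.
received = {1: np.full(4, 9.0)}
merged = supplement_subbands(received, synthesized)
```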
According to this example, the encoder 9 is intended to code a stereophonic signal which can be defined by a succession of frames n, n+1, etc. and comprising two channels Left L and Right R.
The encoder 9 comprises principal component analysis (PCA) transformation means 28, defining means 29 and structure formation means 30.
The principal component analysis (PCA) transformation means 28 comprise decomposition means 21, calculation means 23, PCA means 25 and combining means 27.
Thus, for a determined frame n, the decomposition means 21 decompose the two channels L and R of the stereophonic signal into a plurality of frequency subbands l(n,b1), . . . , l(n,bN), r(n,b1), . . . , r(n,bN).
Specifically, the decomposition means 21 comprise short-term Fourier transform means (STFT) 61a and 61b and frequency windowing means 63a and 63b making it possible to group the coefficients of the short-term Fourier transform together into subbands.
Thus, a short-term Fourier transform is applied to each of the input channels L and R. These channels expressed in the frequency domain can then be windowed in frequency by the frequency windowing means 63a and 63b according to N bands defined in accordance with a perceptual scale equivalent to the critical bands.
The calculation means 23 are intended to calculate at least one transformation parameter θ(n,bi) from among a plurality of transformation parameters θ(n,b1), . . . , θ(n,bN) as a function of at least a part of the plurality of frequency subbands.
By way of example, the calculation of the transformation parameters can be carried out by calculating a covariance matrix. The covariance matrix can then be calculated by the calculation means 23 for each signal frame n analysed and for each frequency subband bi.
Thus, eigenvalues λ1(n, bi) and λ2(n, bi) of the stereophonic signal are then estimated for each frame n and each subband bi, allowing the calculation of the transformation parameter or angle of rotation θ(n,bi).
It will be noted that it is also possible to calculate the transformation parameters solely on the basis of a covariance of the two original channels L and R.
This angle of rotation θ(n,bi) corresponds to the position of the dominant source at frame n for subband bi and so allows the rotation or transformation means 25 to carry out a frequency subband-based rotation of the data to determine a frequency principal component CP(n, bi) and a frequency residual (or ambiance) component A(n, bi). The energies of the components CP(n, bi) and A(n, bi) are proportional to the eigenvalues λ1 and λ2 such that: λ1>λ2. Consequently, the signal A(b) has a much lower energy than that of the signal CP(b).
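The processing of one subband of one frame can be sketched as follows. The sketch assumes that the covariance matrix is built from the energies of the complex subband coefficients and from the real part of their cross term, and that the angle is obtained in closed form from that matrix; these choices, like the function name and the test data, are illustrative assumptions.

```python
import numpy as np

def subband_pca(l_band, r_band):
    """PCA of one subband: returns theta, (lam1, lam2), CP and A."""
    c_ll = np.sum(np.abs(l_band) ** 2)
    c_rr = np.sum(np.abs(r_band) ** 2)
    c_lr = np.sum(np.real(l_band * np.conj(r_band)))
    cov = np.array([[c_ll, c_lr], [c_lr, c_rr]])
    lam2, lam1 = np.linalg.eigvalsh(cov)   # eigvalsh returns ascending eigenvalues
    theta = 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)
    c, s = np.cos(theta), np.sin(theta)
    cp = c * l_band + s * r_band           # frequency principal component
    a = -s * l_band + c * r_band           # frequency residual (ambiance) component
    return theta, (lam1, lam2), cp, a

# Hypothetical subband coefficients: one dominant source plus weak uncorrelated noise.
rng = np.random.default_rng(1)
src = rng.standard_normal(32) + 1j * rng.standard_normal(32)
noise_l = rng.standard_normal(32) + 1j * rng.standard_normal(32)
noise_r = rng.standard_normal(32) + 1j * rng.standard_normal(32)
l_band = 0.9 * src + 0.1 * noise_l
r_band = 0.4 * src + 0.1 * noise_r
theta, (lam1, lam2), cp, a = subband_pca(l_band, r_band)
```

In this sketch the energies of CP and A are equal to λ1 and λ2 respectively, which is consistent with the proportionality stated above.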
The combining means 27 combine the frequency principal sub-components CP(n, b1), . . . , CP(n, bN) to form a single principal component CP(n).
Specifically, these combining means 27 comprise inverse STFT means 65a and addition means 67a. The sum by the addition means 67a of these limited-band frequency components CP(n, bi) then makes it possible to obtain the full-band principal component CP(n) in the frequency domain. The inverse STFT of the component CP(n) results in a full-band temporal component.
The structure formation means 30 comprising frequency analysis means 31 make it possible to form at least one energy parameter E(n,bi) from among a set of energy parameters E(n,b1), . . . , E(n,bN) as a function of the frequency residual sub-components A(n,b1), . . . , A(n,bN) and/or frequency principal sub-components CP(n,b1), . . . , CP(n,bN).
According to a first embodiment, the energy parameters E(n,b1), . . . , E(n,bN) are formed by extracting the frequency subband-based energy differences between the frequency principal sub-components CP(n,b1), . . . , CP(n,bN) and the frequency residual sub-components A(n,b1), . . . , A(n,bN).
According to another embodiment, the energy parameters E(n,b1), . . . , E(n,bN) correspond directly to the frequency subband-based energy of the frequency residual sub-components A(n,b1), . . . , A(n,bN).
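The two ways of forming the energy parameters can be sketched as follows. Expressing the energy difference in decibels is an assumption of the sketch (the description speaks only of an energy difference), and the small constant eps is added solely to avoid division by zero.

```python
import numpy as np

def band_energy(x):
    """Energy of the coefficients of one subband."""
    return float(np.sum(np.abs(x) ** 2))

def energy_difference_db(cp_bands, a_bands, eps=1e-12):
    """First embodiment: per-subband energy difference between CP and A (in dB here)."""
    return [10.0 * np.log10((band_energy(cp) + eps) / (band_energy(a) + eps))
            for cp, a in zip(cp_bands, a_bands)]

def residual_energy(a_bands):
    """Other embodiment: per-subband energy of the residual sub-components."""
    return [band_energy(a) for a in a_bands]

# Hypothetical single-subband example: CP has energy 16, A has energy 4.
cp_bands = [2.0 * np.ones(4)]
a_bands = [np.ones(4)]
diff_db = energy_difference_db(cp_bands, a_bands)
res_e = residual_energy(a_bands)
```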
Consequently, in order to better synthesize the sound ambiance, the coded audio signal SC can advantageously comprise at least one energy parameter from among the set of energy parameters E(n,b1), . . . , E(n,bN).
Furthermore, the structure formation means 30 make it possible to apply a frequency analysis to at least one residual sub-component A(n,bi) as a function of at least one energy parameter E(n,bi) to form the frequency subband-based residual structure Sfr(n,bi).
Thus, if the capabilities of the transmission network 7 so allow or if a higher audio quality is expected, the energy parameter or parameters E(n,b1), . . . , E(n,bN) can be accompanied by at least one part of the subband-based fine structure of the residual component A(n,bi) of the signal Sfr(n,bi).
This graduated approach to the coding of the residual component A(n,bi) offers the capability of transmitting additional information so as to approach an asymptotically perfect reconstruction of the original stereophonic signal. Specifically, using a higher bit rate, the reconstructed stereophonic signal will be perceptually closer to the original stereophonic signal.
Furthermore, the encoder 9 can comprise correlation analysis means 33 for carrying out an analysis of temporal correlation between the two channels L and R so as to determine a corresponding correlation index or value c(n). Thus, the coded audio signal SC can advantageously comprise this correlation value c(n) to indicate any presence of reverberation in the original signal.
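One possible correlation measure is sketched below, assuming a normalized zero-lag cross-correlation computed over one frame. The description does not specify the measure actually used, so this choice and the function name are hypothetical.

```python
import numpy as np

def correlation_value(left, right, eps=1e-12):
    """Normalized zero-lag cross-correlation of two frames, in [-1, 1]."""
    num = np.dot(left, right)
    den = np.sqrt(np.dot(left, left) * np.dot(right, right)) + eps
    return float(num / den)

k = np.arange(100)
tone = np.sin(2 * np.pi * k / 50)         # two full periods
quadrature = np.cos(2 * np.pi * k / 50)   # orthogonal to tone over full periods
c_same = correlation_value(tone, tone)
c_opposite = correlation_value(tone, -tone)
c_orthogonal = correlation_value(tone, quadrature)
```

Identical channels give a value close to 1, while uncorrelated channels, as can occur with strong reverberation, give a value close to 0.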
The defining means 29 can comprise an audio coding means 29a for coding the principal component CP and quantizing means 29c, 29d, 29e and 29f for quantizing at least one part of the residual structure Sfr(n,bi), the transformation parameter or parameters θ(n,bi), the energy parameter or parameters E(n,bi) and the correlation value c(n) respectively.
The decoder 15 comprises transformation means 44 based on inverse principal component analysis (PCA−1) and frequency synthesis means 45.
The transformation means 44 based on inverse principal component analysis (PCA−1) comprise extraction means 41, decoding decomposition means 43, inverse transformation means 47, and decoding combining means 49.
Thus, to process the coded audio signal SC(n) on receipt, the extraction means 41 comprise monophonic decoding means 41a for extracting the decoded principal component CP′ and dequantizing means 41c, 41d, 41e and 41f for extracting the residual structure SfrQ(n,bi), the transformation parameters or angles of rotation θQ(n,bi), the energy parameters EQ(n,bi), and the correlation value cQ(n).
The decoding decomposition means 43 comprising for example STFTs 62a and filter banks 62b decompose the decoded principal component CP′ by a frequency windowing with N bands into decoded frequency principal sub-components.
Furthermore, a residual component A′(n, bi) can be synthesized by the frequency synthesis means 45 on the basis of the decoded audio stream CP′(n, bi), spectrally shaped by the dequantized energy parameters EQ(n,bi) and possibly by the residual structure SfrQ(n,bi).
Specifically, the additional information transmitted by the encoder 9 may or may not be used by the decoder 15. Thus, the residual fine structure Sfr(n,bi) of the frequency subband-based residual component A(n,bi) can be used during the frequency synthesis of the signal A′(n, bi) on the basis of the decoded and possibly filtered signal CP′.
The frequency synthesis of the signal A′(n, bi) thus employs the energy parameters EQ(n,bi) and possibly the fine structure Sfr(n,bi) of the dequantized residual component.
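The spectral shaping by the energy parameters can be sketched as follows, assuming that each dequantized energy parameter is the target energy of the residual in its subband (the variant in which E corresponds directly to the subband energy of A). The use of the fine structure is not modelled, and the names are hypothetical.

```python
import numpy as np

def synthesize_residual(cp_bands, target_energies, eps=1e-12):
    """Scale each decoded subband so that its energy matches the target energy."""
    out = []
    for band, target in zip(cp_bands, target_energies):
        gain = np.sqrt(target / (np.sum(np.abs(band) ** 2) + eps))
        out.append(gain * band)
    return out

# Hypothetical decoded subbands of the principal component and target energies.
cp_bands = [np.array([1.0, 1.0, 1.0, 1.0]), np.array([2.0, 0.0])]
target_energies = [1.0, 0.25]
a_bands = synthesize_residual(cp_bands, target_energies)
a_energies = [float(np.sum(np.abs(b) ** 2)) for b in a_bands]
```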
The decoder 15 then carries out the operation inverse to the coder since the PCA is a linear transformation. The inverse PCA is carried out by the inverse transformation means, by multiplying the signals CP′H(n, bi) and A′(n, bi) by the transpose of the rotation matrix used for encoding. This is made possible by virtue of the inverse quantization of the angles of rotation based on frequency subbands.
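The reason why the transpose suffices can be written out as follows. The sign convention of the rotation matrix M is an assumption of this sketch, and P′ stands for whichever decoded principal-side signal is fed to the inverse transformation for the subband b_i of frame n.

```latex
\begin{aligned}
\begin{pmatrix} CP(n,b_i) \\ A(n,b_i) \end{pmatrix}
 &= M(\theta)\begin{pmatrix} l(n,b_i) \\ r(n,b_i) \end{pmatrix},
 \qquad
 M(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \\[4pt]
 M(\theta)^{T} M(\theta) &= I
 \;\Longrightarrow\;
 M(\theta)^{-1} = M(\theta)^{T}, \\[4pt]
\begin{pmatrix} l'(n,b_i) \\ r'(n,b_i) \end{pmatrix}
 &= M(\theta_Q)^{T}\begin{pmatrix} P'(n,b_i) \\ A'(n,b_i) \end{pmatrix}.
\end{aligned}
```

Since a rotation matrix is orthogonal, no matrix inversion is needed at the decoder; only the dequantized angle θQ(n,bi) is required.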
It will be noted that the signals CP′H(n, bi) correspond to the principal components CP′(n, bi) decorrelated by reverberation or decorrelation filtering means 49.
Specifically, due to the decorrelation properties of the PCA, the use of a decorrelation or reverberation filter is desirable for synthesizing a decorrelated component CP′H(n, bi) of the signal CP′(n, bi) and as a consequence of the signal A′(n, bi).
The filtering means 49 comprise a filter whose impulse response h(n) is dependent on the characteristics of the original signal. Specifically, the temporal analysis of the correlation of the original signal at frame n determines the correlation value c(n) which corresponds to the choice of the filter to be used for decoding. By default, c(n) imposes the impulse response of an all-pass filter with random phase which greatly reduces the cross-correlation of the signals CP′(n, bi) and CP′H(n, bi). If the temporal analysis of the stereo signal reveals the presence of reverberation, c(n) imposes the use, for example, of Gaussian white noise of decreasing energy so as to reverberate the content of the signal CP′(n, bi).
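The reverberation variant of this filter can be sketched as follows; the default all-pass case is not shown. The sketch assumes an exponential decay for the "decreasing energy" of the noise, a unit-energy normalization, and a time-domain convolution; the decay law, the parameter names and the function names are assumptions chosen for the example.

```python
import math
import random

def decaying_noise_response(length, decay, seed=0):
    """Impulse response made of Gaussian white noise with decreasing energy.

    Assumption: the envelope is exp(-decay * k); the response is
    normalized to unit energy so that filtering preserves signal level
    on average.
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) * math.exp(-decay * k) for k in range(length)]
    norm = math.sqrt(sum(v * v for v in h))
    return [v / norm for v in h]

def convolve(signal, h):
    """Direct-form linear convolution of a signal with impulse response h."""
    out = [0.0] * (len(signal) + len(h) - 1)
    for i, x in enumerate(signal):
        for k, v in enumerate(h):
            out[i + k] += x * v
    return out
```

A fixed seed is used so that the same response can be regenerated whenever the correlation value selects this filter.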
The combining means 49 comprising inverse STFT means 71a and 71b and addition means 73a and 73b combine the decoded frequency subbands to form two decoded components L′ and R′.
This graduated approach to the coding of the residual component A(n, bi) offers the capability of transmitting additional information so as to approach a reconstruction that is very close to the original stereophonic signal.
The encoder 109 is distinguished from that of
Furthermore, it comprises three inverse STFT means 65a, 65b and 65c as well as three addition means 73a, 73b and 73c.
The PCA is then applied to a triple of signals L, C and R. The three-dimensional (3D) PCA is then carried out by a 3D rotation of the data, parametrized by the Euler angles (α,β,γ). Just as for the stereophonic case, these angles of rotation are estimated for each frequency subband on the basis of the covariance and eigenvalues of the original multi-channel signal.
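A 3D rotation parametrized by Euler angles can be sketched as follows. The description does not state which Euler convention is used; the sketch assumes the composition Rz(α)·Ry(β)·Rx(γ), and the helper names are hypothetical.

```python
import math

def euler_rotation(alpha, beta, gamma):
    """3x3 rotation matrix Rz(alpha) * Ry(beta) * Rx(gamma) (assumed convention)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]

def transpose(m):
    """Transpose of a 3x3 matrix; for a rotation this is also its inverse."""
    return [[m[j][i] for j in range(3)] for i in range(3)]

def apply(m, v):
    """Multiply a 3x3 matrix by a 3-element vector such as (L, C, R)."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
```

Applying `apply(transpose(m), apply(m, v))` returns `v` up to rounding error, which mirrors the inverse PCA used at the decoder.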
The signal CP contains the sum of the dominant sound sources and the part of the ambiance components which coincides spatially with these sources present in the original signals.
The sum of the secondary sound sources, which spectrally overlap with the dominant sources, and of the other ambiance components is distributed proportionately to the eigenvalues λ2 and λ3 in the signals A1 and A2 which have markedly less energy than the signal CP since: λ1>λ2>λ3.
Thus, the coding method applied to the stereophonic signals can be extended to the case of multi-channel signals C1, . . . , C6 of 5.1 format comprising the following channels: Left L, Centre C, Right R, Back Left (Left surround) Ls, Back Right (Right surround) Rs, and Low Frequency (Low Frequency Effect) LFE.
Specifically,
Thus, this encoder 209 makes it possible to carry out a first PCA1 of the triple 80a of signals (L, C, Ls) according to the encoder 109 of
Thus, the pair of principal components (CP1, CP2) can be considered to be a stereophonic signal (L, R) spatially coherent with the original multi-channel signal.
It is appropriate to specify that the LFE signal can be coded independently of the other signals since the discrete-nature low-frequency content of this channel is almost insensitive to the reduction in the inter-channel redundancies.
The encoding adapts to the bit rate constraints of the transmission network by transmitting a stereophonic signal coded by a stereophonic audio coder 81a accompanied by parameters quantized by quantizing means 81b to 81d, as well as quantizing means 91a to 91d defined for each frame n and each frequency subband bi.
Thus, the stereophonic audio coder 81a makes it possible to code the pair of principal components (CP1, CP2). The quantizing means 81b make it possible to quantize the Euler angles (α,β,γ) that are useful for the PCAs of each triple of signals.
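The description does not specify how the angles are quantized. Purely as an illustration, a uniform scalar quantizer over the interval [−π, π) could look as follows; the quantizer type, the bit allocation and the function names are assumptions and are not taken from the description.

```python
import math

def quantize_angle(angle, bits):
    """Map an angle in [-pi, pi] to an integer index on a uniform grid.

    Assumption: uniform scalar quantization with 2**bits levels; an angle
    that rounds up to +pi wraps to index 0, which represents -pi (the
    same rotation).
    """
    levels = 1 << bits
    step = 2.0 * math.pi / levels
    return int(round((angle + math.pi) / step)) % levels

def dequantize_angle(index, bits):
    """Recover the reconstruction angle associated with a quantizer index."""
    levels = 1 << bits
    step = 2.0 * math.pi / levels
    return index * step - math.pi
```

With this grid the reconstruction error is at most half a step, that is π / 2**bits, so the number of bits per angle trades bit rate against rotation accuracy.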
The quantizing means 81d make it possible to quantize the values c1(n) and c2(n) determining the choice of the filter to be used for each triple of signals.
Furthermore, frequency synthesis means 45 comprising filtering and frequency analysis means 83a and 83b make it possible to determine frequency subband-based parameters or energy differences Eij(n,b) (1≦i,j≦2) between the signals CP1 and A11, A12 as well as the signals CP2 and A21, A22 respectively.
As a variant, the energy parameters can correspond to the subband-based energies of the signals A11, A12 and A21, A22.
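The computation of a subband-based energy difference between a principal component and a residual signal can be sketched as follows. The sketch assumes that the difference is expressed in decibels and that each subband is given as a list of complex coefficients; the unit, the small floor value and the function names are assumptions made for the example.

```python
import math

def subband_energy(coeffs):
    """Sum of squared magnitudes of the coefficients in one subband."""
    return sum(abs(x) ** 2 for x in coeffs)

def energy_difference_db(cp_band, amb_band, floor=1e-12):
    """Energy of the principal subband relative to the residual subband, in dB.

    Assumption: a small floor avoids taking the logarithm of zero for
    silent subbands.
    """
    e_cp = max(subband_energy(cp_band), floor)
    e_amb = max(subband_energy(amb_band), floor)
    return 10.0 * math.log10(e_cp / e_amb)
```

For the variant in which the energies of the residual signals are transmitted directly, `subband_energy` alone would provide the parameter.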
The energy parameters Eij(n,b) can then be quantized by the quantizing means 81c.
Furthermore, the fine residual structures SfAij(n,b) with 1≦i,j≦2 of the four residual or ambiance signals A11, A12 and A21, A22 arising from the 3D PCAs can be quantized by the quantizing means 91a to 91d.
Just as for the coding of the stereophonic signals, at least one part of the fine structures SfAij(n,b) of the residual signals A11, A12 and A21, A22 can be transmitted as additional information using a higher bit rate and consequently a superior audio reconstruction quality.
Moreover, this computerized system can be used to execute a computer program comprising program code instructions for implementing the coding or decoding method according to the invention.
Specifically, the invention is also aimed at a computer program product downloadable from a communication network comprising program code instructions for executing the steps of the coding or decoding method according to the invention when it is executed on a computer. This computer program can be stored on a medium readable by computer and can be executable by a microprocessor.
This program can use any programming language, and be in the form of source code, object code, or code intermediate between source code and object code, such as in a partially compiled form, or in any other desirable form.
The invention is also aimed at an information medium readable by a computer, and comprising instructions of a computer program such as mentioned above.
The information medium can be any entity or device capable of storing the program. For example, the medium can comprise a storage means, such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or else a magnetic recording means, for example a diskette (floppy disc) or a hard disc.
Moreover, the information medium can be a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by radio or by other means. The program according to the invention can be in particular downloaded from a network of Internet type.
Alternatively, the information medium can be an integrated circuit into which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the method in question.
Thus, the invention allows a bit rate-scalable audio coding. This offers the capability of approaching an asymptotically perfect reconstruction of the original signals. Specifically, using a higher bit rate, the reconstructed signal will be perceptually closer to the original signal.
Furthermore the method according to the invention is graduated in terms of number of decoded channels. For example, the coding of a signal in the 5.1 format also allows decoding as a stereophonic signal so as to ensure compatibility with various playback systems.
The fields of application of the present invention are digital-audio transmissions on diverse transmission networks at various bit rates since the proposed procedure makes it possible to adapt the coding bit rate as a function of the network or of the quality desired.
Moreover, this method is generalizable to multi-channel audio coding with a larger number of signals. Specifically, the proposed procedure is by nature generalizable and applicable to numerous 2D and 3D audio formats (6.1, 7.1 formats, ambisonic, wave field synthesis, etc.).
A particular exemplary application is the compression, transmission and then playback of a multi-channel audio signal on the Internet following an order/purchase by a cybernaut (listener). This service is moreover commonly called “audio on demand”. The proposed procedure then makes it possible to encode a multi-channel signal (stereophonic or of 5.1 type) at a bit rate supported by the Internet network linking the listener to the server. Thus, the listener can listen to the sound scene decoded in the format desired on his multi-channel broadcasting system. In the case where the signal to be transmitted is of 5.1 type but the user does not possess a multi-channel playback system, the transmission can then be limited to the principal components of the starting multi-channel signal; and subsequently, the decoder delivers a signal with fewer channels such as a stereophonic signal for example.
Number | Date | Country | Kind |
---|---|---|---|
0650883 | Mar 2006 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR2007/050897 | 3/8/2007 | WO | 00 | 9/15/2008 |