This application is a §371 application from PCT/FR2010/052671 filed Dec. 10, 2010, which claims priority from French Patent Application No. 09 59547 filed Dec. 23, 2009, each of which is incorporated herein by reference in its entirety.
The invention relates to a method for encoding/decoding a digital stereo sound stream as well as a device made up of an encoder and an associated decoder. The purpose of the invention is in particular to improve a standard encoder/decoder (codec) system making it possible to encode and decode a digital stereo/audio stream.
The invention finds a particularly advantageous application in the field of codecs for the compression of stereo/audio signals, such as for example codecs of the MP3 type. However, the invention could also be used with any type of codec adapted to the encoding and the decoding of two digital sound signals.
Digital codecs of the MP3 type or of other types comprise a standard encoder which makes it possible to encode, according to a known encoding protocol, digital stereo sound signals, for example in WAVE format, in order to transform said digital stereo sound signals into encoded stereo signals. A standard decoder is also known which makes it possible to decode, according to a known decoding protocol, encoded stereo signals in order to transform them into digital stereo signals, for example in WAVE format. In general, encoding consists in compressing stereo signals, while decoding consists in decompressing compressed stereo signals.
The problem is that the transmission channel available for encoding is generally limited to N kbits/s (N generally being equal to 64 or 128). When a stereo signal formed of two audio channels, a right sound channel and a left sound channel, is encoded, it can therefore be necessary, depending on the characteristics of the codec used, to encode each audio channel of the signal at a transfer rate of approximately N/2 kbits/s.
The invention makes it possible to increase the quality of the final stereo signal without increasing the transfer rate in the transmission channel; or to preserve the quality of the final stereo signal by reducing the transfer rate in the transmission channel.
For this purpose, the device according to the invention comprises a so-called pre-processing module associated with the standard encoder acting before the encoding process, which combines the stereo signals in order to transform said stereo signals into a single combined signal. The invention also comprises a post-processing module associated with the decoder acting after decoding the compressed signal, which makes it possible to generate the two audio signals from the single combined signal generated by the pre-processing module. The function of this post-processing module is to generate two sound signals (right and left) decorrelated relative to one another from the decompressed combined signal.
Thus, in the invention, there is only one signal to be encoded (the single combined signal) instead of the two right and left signals in the traditional methods. That makes it possible either to compress the combined signal less in order to increase the quality of the final signal, or to decrease the transfer rate in the transmission channel while having the same quality as with the existing encoding methods.
Preferably, in order to enable the decoder to detect whether it is dealing with a stream encoded by the method according to the invention or a standard stream not encoded by the invention, a meta-datum is added into the data frame encoded by the encoder, which indicates whether the method according to the invention is activated or not. The location of this meta-datum in the frame encoded by the encoder can vary according to the standard encoding used.
The invention thus relates to a method for encoding and decoding a digital audio signal composed of an original right sound signal and of an original left sound signal, wherein said method comprises the following steps:
According to an implementation, in order to combine the right and left original sound signals into a single combined signal, a point-to-point weighted sum of the samples of the original right sound signal and the original left sound signal is carried out in the time domain.
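The point-to-point weighted sum can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `combine_stereo` and the default weights of 0.5 are assumptions, since the text does not state the weights.

```python
def combine_stereo(right, left, w_right=0.5, w_left=0.5):
    """Point-to-point weighted sum of two equal-length sample sequences.

    The 0.5 / 0.5 weights are an assumption made for this sketch; the
    text only says that the sum is weighted.
    """
    if len(right) != len(left):
        raise ValueError("right and left must have the same number of samples")
    # One combined sample per pair of original right/left samples.
    return [w_right * r + w_left * l for r, l in zip(right, left)]


# Hypothetical sample values, for illustration only.
sc = combine_stereo([0.2, -0.4, 1.0], [0.6, 0.0, -1.0])
```

With equal weights, content that is identical in both channels passes unchanged, while content that is in phase opposition between the channels cancels in the combined signal.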
According to an embodiment, in order to generate from the decompressed combined signal the restored right and left sound signals, the decompressed combined signal is applied to the input of a first and a second elementary block, the output signal for these blocks corresponding respectively to the restored right electric sound signal and to the restored left electric sound signal, the output signal for each block being the combination of the input signal for the block weighted by a first gain, and of the combination of the output signal for the block weighted by a second gain and of the input signals for the block delayed by a delay line.
According to an embodiment:
for the first elementary block,
s1(n)=e1(n)·g1+s1(n−D1)·g2+e1(n−D1)
e1 being the input signal for the first block corresponding to the decompressed combined signal,
s1 being the output signal for the first block corresponding to one of the restored sound signals (right or left),
g1, g2 being respectively the values of the first gain and the second gain for the first block,
n being the nth harmonic sample,
D1 being the value of the number of delay samples introduced by the delay line, and
for the second elementary block:
s2(n)=e2(n)·g3+s2(n−D2)·g4+e2(n−D2)
e2 being the input signal for the second block corresponding to the decompressed combined signal,
s2 being the output signal for the second block corresponding to the other restored sound signal (right if s1 corresponds to the left one or left if s1 corresponds to the right one),
g3, g4 being respectively the values of the first gain and the second gain for the second block,
n being the nth harmonic sample,
D2 being the value of the number of delay samples introduced by the delay line.
According to an embodiment, the gain values inside a block are opposite one another, the value of the first gain being opposite the value of the second gain.
According to an embodiment, the gain values of the first block are opposite the gain values of the second block, the value of the first gain of the first block being opposite the value of the first gain of the second block; while the value of the second gain of the first block is opposite the value of the second gain of the second block.
According to an embodiment, the gain values of the first and second elementary block have the same absolute value.
According to an embodiment, the first gain of the first block and the second gain of the second block are equal to g; while the second gain of the first block and the first gain of the second block are equal to −g.
According to an embodiment, the delay introduced by the line of the first block and the delay introduced by the line of the second block are equal to one another.
According to an embodiment, the decompressed combined signal is first filtered by means of a high-pass filter and only the filtered high frequency part is applied to the input of the elementary blocks.
According to an embodiment,
According to an embodiment, the output signals of each elementary block are filtered in gain and in phase by means of parametric filtering cells in order to modify the sound perception of these output signals.
According to an embodiment, to enable the decoder to detect whether it is dealing with an encoded stream formed of a combined signal or a standard stream, a meta-datum is added into the data frame encoded by the encoder, which indicates whether the step of combining the original right and left signals into a single combined signal is activated or not.
According to an embodiment, for each restored right and left sound signal primarily formed of a low frequency component lower than a cut-off frequency,
The invention moreover relates to a digital stream encoder used with the decoder according to the invention for the implementation of the method for encoding and decoding a digital audio signal composed of an original right sound signal and of an original left sound signal according to the invention, wherein said digital stream encoder comprises:
The invention also relates to a digital stream decoder used with the encoder according to the invention for the implementation of the method for encoding and decoding a digital audio signal composed of an original right sound signal and of an original left sound signal according to the invention, wherein said digital stream decoder comprises:
According to an embodiment, said decoder moreover comprises a module for generating treble frequencies including:
According to an embodiment, the upper and lower limits of the band-pass filter depend on the compression ratio applied by the method.
The invention will be better understood when reading the following description and examining the annexed figures. These figures are given only as an illustration but by no means a restriction of the invention. They show:
FIGS. 7a-7e: very schematic representations of the signals that can be observed when using the module for generating high frequency components in
Identical elements have the same reference throughout the figures.
In addition, the device 1 according to the invention comprises a decoder 7 according to the invention formed by a standard decoder 8 and an associated post-processing module 9. The decoder 8 could be for example a decoder of the MP3 type integrated into a digital music player or an audio decoder integrated into a digital television decoder (set top box).
When operating, a stereo signal formed by an original right sound signal SDO and an original left sound signal SGO is applied to the input of the pre-processing module 3. The original right SDO and left SGO sound signals are sampled and quantized signals. As shown in
The combined signal SC is applied to the input of the encoder 5, which compresses the signal SC according to a known compression protocol so as to obtain a compressed combined signal SCC. This signal SCC could, for example, be transmitted over any type of medium (wired, radio, or other) or saved on a digital storage medium such as a CD-ROM or a USB memory device.
Since it is enough to encode the combined signal SC, whereas both signals (right and left) of the stereo signal need to be encoded in the existing methods, the method according to the invention makes it possible to limit the stream in the available encoding channel 10, or else to reduce the compression ratio in order to improve the final sound rendering if the same transfer rate as in the existing methods is kept.
The compressed combined signal SCC is applied to the input of the decoder 8 which decompresses it, according to a known decompression protocol, so as to obtain a decompressed combined signal SCD.
The signal SCD is then applied to the input of the post-processing module 9 comprising, as shown in
For this purpose, the decorrelating module 11 is made of two elementary blocks 13.1-13.2 to the input of which the decompressed combined signal SCD is applied, the output of these blocks 13.1, 13.2 corresponding respectively to the restored right sound signal SDR and to the restored left sound signal SGR. The output signal s1 (resp. s2) of each block 13.1 (resp. 13.2) depends on the combination of the input signal e1 (resp. e2) for the block weighted with a first gain g1 (resp. g3), and of the combination of the input signals e1 (resp. e2) and of the output signal s1 (resp. s2) for the block weighted with a second gain g2 (resp. g4), delayed by a delay line 14.1 (resp. 14.2).
According to an embodiment, for each elementary block 13.1, 13.2, the input signal e1, e2 is applied to the input of a first adder 16.1, 16.2 and is applied to an input of a second adder 17.1, 17.2 after being multiplied by the first gain g1, g3. The output signal s1, s2 for the block is applied to another input of the first adder 16.1, 16.2 after being multiplied by the second gain g2, g4, the output signal of the first adder 16.1, 16.2 being applied to the input of the delay line 14.1, 14.2. The output signal for the delay line 14.1, 14.2 is applied to another input of the second adder 17.1, 17.2, the output signal for this second adder 17.1, 17.2 corresponding to the output signal s1, s2 of the block and thus to the restored right SDR or left SGR sound signal.
Thus, for the first elementary block 13.1:
s1(n)=e1(n)·g1+s1(n−D1)·g2+e1(n−D1)
e1 being the input signal for the first block 13.1 corresponding to the decompressed combined signal,
s1 being the output signal for the first block 13.1 corresponding to one of the restored sound signals (right or left),
g1, g2 being respectively the values of the first gain and the second gain of the first block 13.1,
n being the nth harmonic sample,
D1 being the value of the number of delay samples introduced by the delay line 14.1.
For the second elementary block 13.2:
s2(n)=e2(n)·g3+s2(n−D2)·g4+e2(n−D2)
e2 being the input signal for the second block 13.2 corresponding to the decompressed combined signal,
s2 being the output signal for the second block 13.2 corresponding to the other restored sound signal (right if s1 corresponds to the left one or left if s1 corresponds to the right one),
g3, g4 being respectively the values of the first gain and the second gain of the second block 13.2,
n being the nth harmonic sample,
D2 being the value of the number of delay samples introduced by the delay line 14.2.
Preferably, inside the same block 13.1 (resp. 13.2), the first gain g1 (resp. g3) and the second gain g2 (resp. g4) have values opposite one another. Each block 13.1, 13.2 then behaves as a filter of the all-pass type which does not modify the gain of the input signal e1, e2 but only the phase thereof.
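The all-pass behaviour can be checked from the difference equation given above. The following derivation is a sketch that writes D for the delay of the block and takes the z-transform of the recurrence with zero initial conditions; the condition |g| < 1, needed for the recursive part to be stable, is a standard property of such filters and is not stated in the text.

```latex
\begin{align*}
s(n) &= g_1\,e(n) + g_2\,s(n-D) + e(n-D) \\
H(z) &= \frac{S(z)}{E(z)} = \frac{g_1 + z^{-D}}{1 - g_2\,z^{-D}} \\
\text{with } g_1 = g,\ g_2 = -g:\quad
H(z) &= \frac{g + z^{-D}}{1 + g\,z^{-D}} \\
\left|H\!\left(e^{j\omega}\right)\right|^2
  &= \frac{g^2 + 2g\cos(\omega D) + 1}{1 + 2g\cos(\omega D) + g^2} = 1
\end{align*}
```

The magnitude is therefore equal to 1 at every frequency and only the phase depends on the frequency. The same computation with g replaced by −g gives the same result for a block whose first gain is −g and whose second gain is g.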
Moreover, the gains g1, g2 of the first block 13.1 and the gains g3, g4 of the second block 13.2 preferably have values opposite one another. Thus, the value of the first gain g1 of the first block 13.1 is opposite the value of the first gain g3 of the second block 13.2; while the value of the second gain g2 of the first block 13.1 is opposite the value of the second gain g4 of the second block 13.2.
Gains for the first 13.1 and the second 13.2 block which have an identical absolute value g will also preferably be chosen. Thus, preferably, the first gain g1 of the first block 13.1 and the second gain g4 of the second block 13.2 have a value g; while the second gain g2 of the first block 13.1 and the first gain g3 of the second block 13.2 have a value −g.
Preferably, the delays D1, D2 introduced by the delay line 14.1 of the first elementary block 13.1 and the delay line 14.2 of the second elementary block 13.2 are equal to one another. However, it would be possible to choose delays D1, D2 with different durations.
In an embodiment example, g=0.4 and delays D1 and D2 of 176 samples each are chosen, such values making it possible to obtain a good sound rendering.
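The two elementary blocks can be sketched directly from the recurrences given above, using the example values g = 0.4 and a delay of 176 samples. The function names and the zero initial state of the delay lines are assumptions made for this illustration.

```python
def elementary_block(e, g_first, g_second, delay):
    """s(n) = e(n)*g_first + s(n-delay)*g_second + e(n-delay).

    The delay line is assumed to start empty (all zeros).
    """
    s = []
    for n, x in enumerate(e):
        past_in = e[n - delay] if n >= delay else 0.0
        past_out = s[n - delay] if n >= delay else 0.0
        s.append(x * g_first + past_out * g_second + past_in)
    return s


def decorrelate(scd, g=0.4, delay=176):
    """Return (s1, s2) computed from the decompressed combined signal."""
    s1 = elementary_block(scd, g, -g, delay)   # first block: g1 = g, g2 = -g
    s2 = elementary_block(scd, -g, g, delay)   # second block: g3 = -g, g4 = g
    return s1, s2
```

Feeding a unit impulse into `decorrelate` gives two impulse responses whose first samples have opposite signs (g and −g), while the energy of each response stays close to that of the input, as expected from an all-pass filter.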
In an improvement of the invention represented in
The low frequency part of the signal SCD is applied to the input of a third delay line 23 and the thus-delayed low frequency part is added, if need be after weighting with a gain g7, to the output signals s1, s2 of the elementary blocks, so as to obtain restored right SDR and left SGR sound signals with an improved sound rendering. Indeed, low frequency signals are statistically highly correlated; it is therefore not advisable to decorrelate them by means of the decorrelating module 11, otherwise the overall sound perception will not seem natural to the ear. In an example, the delay D3 applied by the third delay line 23 is equal to 176 samples (at a sampling rate of 44.1 kHz).
Moreover, parametric equalization cells 25.1, 25.2 are connected to the output of each elementary block 13.1, 13.2 before addition to the delayed low frequency part. These cells 25.1, 25.2 modify the perception of the output signals s1, s2 of these blocks 13.1, 13.2 because, even if the signals s1, s2 have substantially identical levels, there are differences in the perception thereof because of the decorrelation relative to one another. Consequently, it can be useful to modify these signals from a perceptive point of view so that the general sound impression is as good as possible.
For this purpose, each equalization cell 25.1, 25.2 comprises a filter 26.1, 26.2 whose gain and phase can be adjusted according to various frequency bands of the signals s1, s2 and a gain g5, g6 which acts on the whole spectrum of the signals s1, s2. These gain and phase parameters are adapted by sound engineers in particular according to the application considered.
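The recombination of the delayed low frequency part with the outputs of the elementary blocks can be sketched as below. This is only an illustration under stated assumptions: the band-splitting filters and the per-band filters 26.1, 26.2 are left out, the broadband gains g5, g6 and the weighting gain g7 default to 1.0 (values not given in the text), and the function names are hypothetical.

```python
def delay_line(x, d):
    """Delay x by d samples, zero-padding the start and keeping the length."""
    return ([0.0] * d + list(x))[:len(x)]


def add_low_band(low, s1, s2, d3=176, g7=1.0, g5=1.0, g6=1.0):
    """Add the delayed low frequency part to the two block outputs.

    low    : low frequency part of the decompressed combined signal
    s1, s2 : outputs of the two elementary blocks (high frequency part)
    The gain defaults are assumptions made for this sketch.
    """
    delayed = delay_line(low, d3)
    sdr = [g5 * a + g7 * b for a, b in zip(s1, delayed)]
    sgr = [g6 * a + g7 * b for a, b in zip(s2, delayed)]
    return sdr, sgr
```

Because the same delayed low band is added to both outputs, the low frequencies stay fully correlated between the restored right and left signals, and only the high band is decorrelated.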
Preferably, so that the decoder 8 can detect whether it is dealing with a stream encoded by the method according to the invention or with a standard stream not encoded by the invention, a meta-datum M is added into the data frame encoded by the encoder 5, which indicates whether the method according to the invention is activated or not. This meta-datum M is of the static type, i.e. it can for example take only two different values so that, when the decoder 7 detects in the encoded frame the first value (for example 1) corresponding to the activation of the pre-processing module 3, it activates the post-processing module 9; and when the decoder 7 detects in the encoded frame the second value corresponding to the deactivation of the pre-processing module 3, it inhibits the post-processing module 9 and uses the standard decoder 8 in the traditional way for decoding the stereo signal into the two right and left channels. Indeed, in the case of the deactivation of the module 3, the signals SDO and SGO are directly applied to the input of the standard encoder 5 for a traditional encoding, then transmitted to the decoder 8, then decoded in the traditional way by the decoder 8 in order to obtain a restored left signal SGR and a restored right signal SDR.
The location of this meta-datum M in the frame 30 encoded by the encoder 5 can vary according to the standard encoding used.
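The role of the meta-datum M on the decoder side can be sketched as a simple dispatch. The names below are hypothetical, the value 0 for the deactivated case is an assumption (the text only gives 1 as an example of the first value), and the parsing of M from the frame is not shown because its location depends on the encoding standard.

```python
POST_PROCESSING_ON = 1   # example value given in the text
POST_PROCESSING_OFF = 0  # assumed second value, not given in the text


def route_decoded_frame(meta_m, decoded, post_process):
    """Choose between post-processing and plain stereo output.

    meta_m       : value of the meta-datum M read from the frame
    decoded      : list of channels produced by the standard decoder,
                   one channel (SCD) if M is on, else (right, left)
    post_process : callable mapping SCD to the pair (SDR, SGR)
    """
    if meta_m == POST_PROCESSING_ON:
        # Combined stream: rebuild two channels from the single signal.
        return post_process(decoded[0])
    # Standard stream: the decoder output already holds both channels.
    right, left = decoded
    return right, left
```

The order (right, left) of the channels in `decoded` is a convention chosen for this sketch.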
In an improvement of the invention, an analysis of the correlation between the original right SDO and left SGO sound signals is carried out in definite frequency bands so as to produce a coefficient representative of the correlation in each band.
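One possible coefficient for a given band is the normalized correlation at zero lag, sketched below. The text does not say which correlation measure is used, so this choice is an assumption, as is the function name; the inputs are assumed to be the right and left signals already restricted to one frequency band.

```python
import math


def correlation_coefficient(right_band, left_band):
    """Normalized zero-lag correlation of two equal-length band signals.

    Returns a value between -1 and 1, or 0.0 if either band is constant.
    """
    n = len(right_band)
    if n == 0 or n != len(left_band):
        raise ValueError("bands must be non-empty and of equal length")
    mean_r = sum(right_band) / n
    mean_l = sum(left_band) / n
    num = sum((r - mean_r) * (l - mean_l) for r, l in zip(right_band, left_band))
    den = math.sqrt(
        sum((r - mean_r) ** 2 for r in right_band)
        * sum((l - mean_l) ** 2 for l in left_band)
    )
    return num / den if den > 0.0 else 0.0
```

Two identical bands give a coefficient of 1, and two bands in phase opposition give −1.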
The calculated correlation coefficients are packed as meta-data into the header 30.1 of the encoded signal.
Then, the parameters g1, g2, g3, g4, D1, D2 of the elementary blocks 13.1 and 13.2 are adapted according to the received correlation values, so as to decorrelate each range of frequencies differently.
For this purpose, a table stored in a memory gives the correspondence between the parameters of each block 13.1, 13.2 (first gain g1, g3 and second gain g2, g4 and delay D1, D2 of the line 14.1, 14.2) and the received correlation ratios. The decorrelation ratio of the decorrelating module 11 is then modified by selecting in the table the parameters (g1-g4, D1, D2) corresponding to the correlation coefficient received.
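The table lookup can be sketched as follows. The table contents are placeholders invented for this illustration, except for the pair g = 0.4 and 176 samples, which is the example given earlier in the text; indexing the table by the absolute value of the coefficient and the function name are also assumptions.

```python
from bisect import bisect_left

# Each row: (upper bound of the correlation range, g, delay in samples).
# Placeholder values for illustration; only g = 0.4 with a delay of 176
# samples appears in the text.
PARAMETER_TABLE = [
    (0.25, 0.6, 176),
    (0.75, 0.4, 176),
    (1.00, 0.2, 176),
]


def select_parameters(correlation):
    """Return the block parameters for a received correlation coefficient."""
    c = min(abs(correlation), 1.0)
    bounds = [row[0] for row in PARAMETER_TABLE]
    index = min(bisect_left(bounds, c), len(PARAMETER_TABLE) - 1)
    _, g, delay = PARAMETER_TABLE[index]
    # Gains follow the preferred sign pattern: g1 = g4 = g, g2 = g3 = -g.
    return {"g1": g, "g2": -g, "g3": -g, "g4": g, "D1": delay, "D2": delay}
```

In a per-band scheme, one such lookup would be made for each frequency band whose coefficient is received in the meta-data.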
In addition, it is known that the upper cut-off frequency fC of the restored signals depends on the compression ratio T applied by the encoder 5. Indeed, for compression ratios T corresponding to a transfer rate of 128 kbits/s, there is a cut at 15 kHz for signals in MP3 encoders; while for compression ratios T corresponding to a transfer rate of 64 kbits/s, there is a cut at 10 kHz for signals. In other words, the higher the compression ratio T is, the more the high frequency component of the signals is reduced.
The invention makes it possible to regenerate the high frequency component of the right SDR or left SGR sound signals which has been suppressed because of the compression. This aspect of the invention is independent of the principle of generation of the two stereo-decompressed sound signals SDR and SGR from only one compressed signal SC.
For this purpose, the restored left SGR and right SDR sound signals, which are substantially formed of a low frequency component SBF lower than the cut-off frequency fC (see
This module 35 comprises a first band-pass filter 36 at the input of which the restored left SGR (resp. right SDR) sound signal is applied. This first filter 36 makes it possible to isolate the highest frequency part of the input signal SGR (resp. SDR) ranging between a lower limit and an upper limit. In an example, the upper limit is equal to the cut-off frequency fC, and the lower limit is equal to fC/N, N preferably being equal to 2 or 4. The isolated part Si of the restored signal obtained at the output of the band-pass filter 36 is shown in
The isolated part Si is then applied to the input of the processor 38 of a nonlinear type, which makes it possible to duplicate the isolated signal Si in frequency by generating the high-frequency harmonics at f1, f2, . . . fn of this signal Si, thereby filling the frequency spectrum in the high-frequency zone. The duplicated signal SD thus obtained at the output of the nonlinear processor 38 is shown in
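The way a nonlinear operation creates harmonics can be illustrated with a memoryless nonlinearity. The text does not say which nonlinearity the processor 38 uses; full-wave rectification is chosen here purely as an assumption, and the helper `bin_magnitude` is a hypothetical single-bin DFT used only to observe the result.

```python
import cmath
import math


def nonlinear_processor(si):
    """Full-wave rectification, one possible memoryless nonlinearity."""
    return [abs(x) for x in si]


def bin_magnitude(x, k):
    """Magnitude of DFT bin k of x, normalized by the length of x."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return abs(acc) / n


# A pure tone occupying bin k of a 256-sample frame.
n, k = 256, 8
tone = [math.sin(2 * math.pi * k * i / n) for i in range(n)]
sd = nonlinear_processor(tone)

before = bin_magnitude(tone, 2 * k)  # essentially zero: no energy at 2k
after = bin_magnitude(sd, 2 * k)     # clearly non-zero: a harmonic appeared
```

The input tone has no energy at twice its frequency, whereas the rectified signal does, which is the spectrum-filling effect described above. Rectification also produces a constant (zero-frequency) term, which a subsequent filtering stage would remove.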
Then the high frequency part of the duplicated signal SD is isolated (without the isolated part Si from which it has been obtained) in order to obtain a high frequency component SHF of the sound signal shown in
In addition, the restored left SGR (resp. right SDR) sound signal is filtered by means of a low-pass filter 41 with a cut-off frequency substantially equal to fC to keep only the low frequency component SBF of the restored signal SGR, SDR. The low frequency part SBF is then delayed with a delay D4 by means of a delay cell 42. This delay D4 is of the order of a few samples.
Then, the low frequency component SBF is added to the high frequency component SHF by means of an adder 44, in order to obtain an increased restored left SGRA (resp. right SDRA) sound signal formed of the initial low frequency component SBF of the restored sound signal and the high frequency component SHF thus generated by the method according to the invention.
Preferably, although this is not mandatory, a post-processing cell 45 modifies the form of the spectral response of the high frequency component SHF, and gains g8 and g9 are applied to the high frequency SHF and low frequency SBF components before addition by means of the adder 44.
The parameters of the filters 36, 39, 41 depend on the compression ratio T. Indeed, the filters 36, 39, 41 have limits which depend on the cut-off frequency fC. As this cut-off frequency fC depends on the compression ratio T, the limits also depend on the compression ratio T. There is thus a table 47 giving the correspondence between the compression ratio T and the associated filter parameters making it possible to generate the high frequency component of the left and right sound signals.
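The correspondence table can be sketched with the two operating points given in the text (a cut at 15 kHz for 128 kbits/s and at 10 kHz for 64 kbits/s). Keying the table by transfer rate rather than by the compression ratio T itself, the dictionary layout and the function name are assumptions made for this illustration; the filter 39 is left out because its limits are not given in the surviving text.

```python
# Cut-off frequency fC in Hz, indexed by transfer rate in kbits/s.
# Both entries come from the text.
CUTOFF_HZ_BY_RATE = {128: 15000.0, 64: 10000.0}


def filter_parameters(rate_kbits, n_div=2):
    """Return the filter limits associated with a transfer rate.

    n_div is the divisor N of the text (preferably 2 or 4), so that the
    band-pass filter 36 spans fC/N to fC.
    """
    fc = CUTOFF_HZ_BY_RATE[rate_kbits]
    return {
        "band_pass_36_hz": (fc / n_div, fc),  # lower and upper limits
        "low_pass_41_hz": fc,                 # cut-off of the low-pass filter
    }
```

For example, `filter_parameters(128)` gives a band-pass range of 7.5 kHz to 15 kHz, and `filter_parameters(64, n_div=4)` gives 2.5 kHz to 10 kHz.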
The parameters of the post-processing cell 45, of the nonlinear processor 38, the delay cell 42, and the gains g8 and g9 also depend on the compression ratio T.
The parameters of the modules for generating treble frequencies 35 which process the left sound signal SGR and the right sound signal SDR are preferably symmetrical, i.e. the module 35 which processes the left sound signal SGR has parameters with the same value as the module 35 which processes the right sound signal SDR.
Number | Date | Country | Kind |
---|---|---|---|
09 59547 | Dec 2009 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR2010/052671 | 12/10/2010 | WO | 00 | 6/25/2012 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2011/086253 | 7/21/2011 | WO | A |
Number | Date | Country | |
---|---|---|---|
20120275608 A1 | Nov 2012 | US |