The present invention relates to a method for hierarchical coding of audio data, more particularly for scalar quantization-based coding.
This coding is notably designed for the transmission and/or for the storage of digital signals such as audio frequency signals (speech, music or others).
The present invention relates more particularly to waveform coding such as PCM (for “Pulse Code Modulation”) coding, where each input sample is coded individually, without prediction. The general principle of PCM coding/decoding specified by recommendation ITU-T G.711 is as described with reference to
The PCM coder 13 comprises a quantization module QPCM 10 which receives the input signal S at its input. The quantization index IPCM at the output of the quantization module 10 is transmitted via the transmission channel 11 to the decoder 14.
The PCM decoder 14 receives at its input the indices I′PCM coming from the transmission channel, a version of IPCM that may be affected by bit errors, and carries out an inverse quantization in the inverse quantization module Q−1PCM 12 in order to obtain the decoded signal S′Mic.
The standardized ITU-T G.711 PCM coding (hereinafter referred to as G.711) compresses the amplitude of the signal with a logarithmic curve prior to uniform scalar quantization, which yields an approximately constant signal-to-noise ratio over a wide dynamic range of signals. The quantization step in the domain of the original signal is therefore proportional to the amplitude of the signal.
The successive samples of the compressed signal are quantized over 8 bits, i.e. 256 levels. In the Public Switched Telephone Network (PSTN), these 8-bit samples are transmitted at 8000 samples per second (8 kHz), giving a bit rate of 8 × 8000 = 64 kbit/s.
A quantized signal frame according to the G.711 standard is composed of quantization indices coded over 8 bits. Thus, if the inverse quantization is implemented with a table, it simply consists in using the index to read one of the 256 possible decoded values.
For reasons of implementation complexity, the logarithmic compression curve has been approximated by a segmented (piecewise-linear) curve.
Two coding laws are defined in the G.711 standard: the A law, mainly used in Europe, and the mu (μ) law, used in North America and Japan.
These coding laws apply an amplitude compression (or “companding”) to the signal. The amplitude of the signal is “compressed” with a non-linear function in the coder, sent over a transmission channel and “decompressed” with the inverse function in the decoder. The advantage of amplitude compression is that it transforms the probability distribution of the amplitude of the input audio signal into a quasi-uniform distribution, to which a uniform scalar quantization can be applied.
The amplitude compression laws are generally of the logarithmic type, which allows a signal sampled with a resolution of 16 bits (in “linear PCM” format) to be coded over 8 bits (in A law or mu law “PCM” format).
The 8 bits per sample in G.711 are allocated as follows, as shown at reference 15 in
1 sign bit S (0 for a negative value, otherwise 1), assigned the reference sgn in
3 bits to indicate the segment (reference ID-SEG in
4 bits for indicating the location on the segment, assigned the reference ID-POS in
The last 7 bits therefore constitute the coded absolute value. In the following, the case of the A law is studied first; the results are then generalized to the mu law. According to the A law of the G.711 standard, the final index is obtained by inverting every second bit, starting from the least significant bit (LSB). This coding law gives a scalar quantization precision of 12 bits (hence a quantization step of 16 for a 16-bit input) on the first two segments; the precision then decreases by 1 bit each time the segment number increases by 1.
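The index layout and the A law bit inversion can be sketched in C as follows. The helper names alaw_pack and alaw_unpack are hypothetical; the sketch uses C bit numbering (LSB = bit 0, so the sign is bit 7), whereas reference 15 numbers the positions starting from the sign bit.

```c
#include <stdint.h>

/* Hypothetical helper: builds an A law index from its three fields
 * (sign: 1 bit, 1 = positive; segment: 3 bits; position on segment: 4 bits)
 * and inverts every second bit starting from the LSB. */
uint8_t alaw_pack(int sign, int segment, int position)
{
    int raw = ((sign & 1) << 7) | ((segment & 7) << 4) | (position & 0xF);
    return (uint8_t)(raw ^ 0x55); /* 0x55 = 0101 0101: inverts bits 0, 2, 4 and 6 */
}

/* Hypothetical helper: recovers the three fields; the XOR is its own inverse. */
void alaw_unpack(uint8_t index, int *sign, int *segment, int *position)
{
    int raw = index ^ 0x55;
    *sign = (raw >> 7) & 1;
    *segment = (raw >> 4) & 7;
    *position = raw & 0xF;
}
```

For instance, a negative sample on segment 0 at position 4 gives the raw value 0x04 and, after inversion, the transmitted index 0x51.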
It can be noted that G.711 PCM quantization can be performed on a digital signal represented over 16 bits simply by comparing the amplitude of the sample to be coded with the decision thresholds of the quantizer. A dichotomic (binary) search significantly accelerates these comparisons. This solution requires a table with 256 entries to be stored; Table 1 hereinbelow presents an extract from such a table for the G.711 A law.
For example, an original sample of the signal S to be coded has an amplitude equal to −75. Consequently, this amplitude is included in the interval [−80, −65] of the line 123 (or “level” 123) of the table. The coding of this information consists in delivering a coded final index, referenced I′Mic in
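The threshold comparison with a dichotomic search can be sketched in C as below. This is an illustration under stated assumptions, not Table 1 itself: the thresholds are computed by a hypothetical helper instead of being read from a stored 256-entry table, negative samples are mapped to their one's complement (as in the ITU-T G.191 reference code), and the table rows are assumed to be numbered from the most negative level, an ordering consistent with the example of −75 falling on level 123.

```c
#include <stdint.h>

/* Lower decision threshold (16-bit scale) of A law magnitude level k, 0 <= k <= 127. */
static int alaw_lower_threshold(int k)
{
    int seg = k >> 4, m = k & 0xF;
    if (seg == 0)
        return m << 4;                     /* segment 0: step 16 starting at 0 */
    return (256 + (m << 4)) << (seg - 1);  /* segment s >= 1: step 16 << (s - 1) */
}

/* Dichotomic search for the largest level k whose threshold does not exceed mag. */
static int alaw_magnitude_level(int mag)
{
    int lo = 0, hi = 127;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (alaw_lower_threshold(mid) <= mag)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Row of the 256-entry table (assumed ordering: most negative level = row 0). */
int alaw_table_level(int16_t x)
{
    int mag = (x < 0) ? -x - 1 : x; /* one's complement (~x) for negative samples */
    int k = alaw_magnitude_level(mag);
    return (x < 0) ? 127 - k : 128 + k;
}

/* Final 8-bit index: sign bit (1 = positive), 7-bit level, even bits inverted. */
uint8_t alaw_search_encode(int16_t x)
{
    int mag = (x < 0) ? -x - 1 : x;
    int k = alaw_magnitude_level(mag);
    return (uint8_t)((((x >= 0) << 7) | k) ^ 0x55);
}
```

With this mapping, the magnitudes 64 to 79 (level k = 4) correspond exactly to the interval [−80, −65] of the example, so −75 lands on row 123.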
The signal-to-noise ratio (SNR) obtained by PCM coding is roughly constant (about 38 dB) over a wide dynamic range of signals, because the quantization step in the domain of the original signal is proportional to the amplitude of the signal. This SNR is not sufficient to make the quantization noise inaudible over the whole 0-4000 Hz band. Moreover, for low-level signals (which are coded with the first segment), the SNR is very poor.
The G.711 standard is generally considered to be of good quality for narrow-band telephony applications with terminals limiting the band to [300-3400 Hz]. However, the quality is not high enough when G.711 is used for other applications, for example for high-fidelity terminals in the band [50-4000 Hz] or for the wideband hierarchical extension of G.711 coding.
For this reason, there are methods of hierarchical coding that generate an enhancement layer from the coding noise of the G.711 coder, G.711 itself forming the so-called ‘base layer’ (or ‘core layer’). This coding noise is coded by a technique different from G.711. Such a method of hierarchical coding is described, for example, in: Y. Hiwasaki, H. Ohmuro, T. Mori, S. Kurihara and A. Kataoka, “A G.711 embedded wideband speech coding for VoIP conferences”, IEICE Trans. Inf. & Syst., Vol. E89-D, No. 9, September 2006. This type of method has the drawback of very significantly increasing the complexity of the coder, whereas PCM coding is known for its low complexity. Moreover, since the PCM coding noise is white, hence uncorrelated, it is difficult to code efficiently, because compression techniques essentially rely on exploiting the correlation of the signal to be coded.
The present invention offers a solution that improves the situation.
For this purpose, the invention provides a method for scalar quantization-based coding of the samples of a digital audio signal, the samples being coded over a pre-determined number of bits in order to obtain a binary frame of quantization indices, the coding being carried out according to an amplitude compression law, where a pre-determined number of least significant bits are not taken into account in the binary frame of quantization indices. The method is such that it comprises the following steps:
Thus, an enhancement bit stream is transmitted at the same time as the binary frame of quantization indices.
This extension bit stream is determined by taking advantage of the least significant bits that are discarded during the coding. This method therefore has the advantage of not adding complexity to the coder while providing the desired improvement in quality, since it enables the decoder to obtain a better decoding precision.
In one embodiment, the stored bits are the most significant bits amongst the bits that are not taken into account in the binary frame of quantization indices.
Not all the bits discarded when applying the logarithmic coding law are necessarily included in the extension bit stream. The extension bit stream can thus be determined according to the quality requirements and the available bit rate.
In one variant embodiment, the number of bits taken into account for determining the enhancement bit stream is a function of the bit rate available during a transmission to a decoder.
Thus, the extension bit stream may be adjusted during transmission depending on the available bit rate.
The invention is particularly well suited to the case where the scalar quantization step is a quantization of the PCM type according to a logarithmic amplitude compression coding law of the A type or of the mu type in accordance with the ITU-T G.711 standard.
The invention is also applicable to a method for decoding a binary frame of quantization indices comprising a pre-determined number of bits by an inverse quantization step and according to an amplitude compression law. The method is such that it comprises the following steps:
The decoder that receives extension bits thus improves the precision of its expansion or “decompression” by concatenating the extension bits received to those present in the quantization index frame received from the basic bit stream.
In one preferred embodiment, the method also comprises a step of adapting a rounding value according to the number of extension bits received, in order to obtain the decoded audio signal.
The reconstruction of the decoded audio signal is thus adapted to the number of bits in the extension bit stream.
The invention also relates to an audio coder comprising a module for scalar quantization of the samples of a digital audio signal, the samples being coded over a pre-determined number of bits in order to obtain a binary frame of quantization indices, the coding being applied according to an amplitude compression law, a pre-determined number of least significant bits not being taken into account in the binary frame of quantization indices. The coder according to the invention comprises:
The invention also relates to an audio decoder capable of decoding a binary frame of quantization indices comprising a pre-determined number of bits by an inverse quantization module and according to an amplitude compression law. The decoder according to the invention comprises:
Lastly, the invention is aimed at a computer program designed to be stored in a memory of a coder and/or a storage medium capable of cooperating with a drive of the coder, comprising code instructions for the implementation of the steps of the coding method according to the invention when it is executed by a processor of the coder.
Similarly, the invention is aimed at a computer program designed to be stored in a memory of a decoder and/or a storage medium capable of cooperating with a drive of the decoder, comprising code instructions for the implementation of the steps of the decoding method according to the invention when it is executed by a processor of the decoder.
Other features and advantages of the invention will become more clearly apparent upon reading the following description, presented by way of non-limiting example and with reference to the appended drawings, in which:
FIGS. 3a and 3b show the quantized values as a function of the input values for the A and mu coding laws, respectively, of the G.711 standard;
A coder 23 comprises a quantizer QPCM 20 capable of quantizing the input signal S in order to obtain a frame of quantization indices IPCM, which is transmitted over the transmission channel 21 to a decoder 24.
In one particular embodiment, this coder is of the PCM coder type and implements a coding law of the A or mu type such as is described in the G.711 standard.
The frame of quantization indices obtained therefore has the structure shown at 15 and complies with the G.711 A law or mu law frame format.
Methods for implementation of the A and mu coding laws are included in the G.711 standard. They consist in determining the final quantization index by simple operations of low complexity which avoid storing large tables of values.
Thus, the pseudo-code shown in Appendix A-10 gives an example of implementation of the A law such as described in the G.711 standard (with a linear approximation by segments of the amplitude compression law). One concrete implementation of this pseudo-code is also given by way of example in Appendix A-10. This implementation is in accordance with the recommendation ITU-T G.191 Software Tool Library (STL-2005), Chapter 13 “ITU-T Basic Operators”. This recommendation is accessible on the ITU Internet website:
http://www.itu.int/rec/T-REC-G.191-200508-I/en
It can be seen in this pseudo-code that the quantization index over 8 bits comprises the sign bit (sign), the index of the segment (exp) and the position on the segment (mant).
In a first part of this coding, the sign bit that goes at the position 0, as indicated in 15 in
The 4 bits forming the position on the segment are placed at the positions 4, 5, 6 and 7 as shown in 15.
There is always a right shift of at least 4 bits (x = shift_right(x, pos−4)), hence at least 4 bits lost; therefore, only the most significant bits (MSB) are used to form the frame of quantization indices. The minimum value of the variable “pos” for coding according to the A law is 8, so at least the 4 least significant bits are lost for every segment. This is how the amplitude compression is achieved.
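A minimal C sketch of these A law coding steps, in the spirit of the Appendix A-10 pseudo-code but not reproducing it: the names alaw_encode and msb_position are hypothetical, the input is assumed to be a 16-bit linear PCM sample, negative samples are mapped to their one's complement (~x) as in the ITU-T G.191 reference code, and the segment index called “exp” in the pseudo-code is named seg here.

```c
#include <stdint.h>

/* Index of the most significant 1 bit of v (v > 0); plays the role of "pos". */
static int msb_position(int v)
{
    int p = -1;
    while (v > 0) {
        v >>= 1;
        p++;
    }
    return p;
}

uint8_t alaw_encode(int16_t sample)
{
    int x = sample;
    int sign = 0x80;                 /* sign bit: 1 for positive values */
    int seg, mant;
    if (x < 0) {
        x = -x - 1;                  /* equals ~x in two's complement */
        sign = 0;
    }
    if (x > 255) {
        int pos = msb_position(x);   /* 8..14 */
        seg = pos - 7;               /* segments 1..7 */
        x >>= pos - 4;               /* keep the 5 MSBs: at least 4 LSBs are lost */
        mant = x & 0xF;              /* drop the leading 1, keep the position on the segment */
    } else {
        seg = 0;
        mant = x >> 4;               /* segment 0: the 4 LSBs are lost */
    }
    return (uint8_t)((sign | (seg << 4) | mant) ^ 0x55); /* invert every second bit */
}
```

For example, alaw_encode(-75) yields 0x51 and alaw_encode(1000) yields 0xFA; the full-scale values 32767 and −32768 map to 0xAA and 0x2A.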
For an input signal with a 16 bit resolution per sample (in “linear PCM” format), the smallest quantization step is 16, the 4 least significant bits being lost. Table 2 hereinafter gives the thresholds and quantization step for each segment for the G.711A law.
In the same way, the decoding can be implemented by simple operations as the pseudo-code and the ITU-T STL-2005 implementation shown in Appendix A-11 illustrate.
It can be seen in this pseudo-code that the sign (sign), the segment (exp) and the value in the segment (val) are recovered from the 8-bit index (index). A rounding value equal to 8, corresponding to half the quantization step (16 in the first two segments, then scaled with the segment), is added in order to obtain the value at the middle of the quantization interval. The amplitude compression process is thus inverted: the least significant bits discarded at the coder are not recovered but replaced by this approximation.
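A matching A law decoder can be sketched as follows, under the same assumptions (hypothetical name alaw_decode, 16-bit output). The rounding value 8 is added before the segment scaling, so it always lands in the middle of the quantization interval.

```c
#include <stdint.h>

int16_t alaw_decode(uint8_t index)
{
    int raw = index ^ 0x55;          /* undo the even-bit inversion */
    int seg = (raw >> 4) & 7;
    int mant = raw & 0xF;
    int val = (mant << 4) + 8;       /* position on segment + rounding value 8 */
    if (seg > 0)
        val = (val + 256) << (seg - 1); /* restore the implicit leading 1, then scale */
    return (int16_t)((raw & 0x80) ? val : -val); /* sign bit 1 = positive */
}
```

Decoding the index 0x51 of the −75 example gives −72, the middle of the interval [−80, −65] rounded as in G.711.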
The mu law version of G.711 is similar to the A law. The main difference is that 128 is added to the values in order to ensure that, in the first segment, bit 7 (counting from the LSB) is always equal to 1, which makes the transmission of this bit redundant and hence increases the precision of the first segment (quantization step of 8 in the first segment, compared with 16 for the A law). This also enables all the segments to be processed identically. In addition, 4 is added for the rounding (hence 128 + 4 = 132 in total), so that level 0 is among the quantized values (the A law has no level 0, its smallest values being 8 and −8). The price of this better resolution in the first segment is that all the segments are shifted by 132. Table 3 hereinafter gives the thresholds and the quantization step for each segment for the G.711 mu law.
FIGS. 3a and 3b allow the resolution of these two laws to be compared for the first 512 values.
In the same way as for the A law, a method for implementation without storing tables of values is given by an example of encoding pseudo-code according to the G.711 mu law standard shown in Appendix A-12.
In the same way as for the A law, it can be seen in this pseudo-code that there is always a right shift of at least 3 bits (x = shift_right(x, pos−4)), the minimum value of “pos” being 7 for the mu law.
Therefore, only the most significant bits (MSB) are used to form the frame of quantization indices and thus to carry out the amplitude compression step.
The minimum value of the variable “pos” for the coding according to the mu law is 7 since, as previously mentioned, in the case of the mu law the first segment is handled in the same way as the other segments. Hence, for all the segments, there are at least 3 least significant bits that are lost.
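The mu law coding steps can be sketched in C in the same way. The names ulaw_encode and msb_position are hypothetical; the input is assumed to be a 16-bit sample, negative values are mapped to ~x, and clipping the biased magnitude to 32767 is an assumption intended to mirror the clipping of the G.191 reference code. The mu law index is the one's complement of sign, segment and position.

```c
#include <stdint.h>

/* Index of the most significant 1 bit of v (v > 0); plays the role of "pos". */
static int msb_position(int v)
{
    int p = -1;
    while (v > 0) {
        v >>= 1;
        p++;
    }
    return p;
}

uint8_t ulaw_encode(int16_t sample)
{
    int x = sample;
    int sign = 0;
    if (x < 0) {
        x = -x - 1;                  /* equals ~x in two's complement */
        sign = 0x80;                 /* becomes 0 after the final inversion */
    }
    x += 132;                        /* 128 forces bit 7 to 1, plus 4 for the rounding */
    if (x > 32767)
        x = 32767;                   /* assumed clipping of the biased magnitude */
    int pos = msb_position(x);       /* 7..14 */
    int seg = pos - 7;               /* every segment handled the same way */
    int mant = (x >> (pos - 4)) & 0xF; /* at least 3 LSBs are lost */
    return (uint8_t)(~(sign | (seg << 4) | mant) & 0xFF);
}
```

Here a zero input yields 0xFF and −75 yields 0x76.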
As for the A law, the decoding can be carried out by a simple algorithm, an example of which is given in Appendix A-13.
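A corresponding mu law decoder sketch, with the hypothetical name ulaw_decode: the position on the segment is shifted by 3 (step 8 in the first segment), the constant 132 re-adds the bias 128 and the rounding value 4, and the 132 offset is removed after the segment scaling.

```c
#include <stdint.h>

int16_t ulaw_decode(uint8_t index)
{
    int raw = ~index & 0xFF;         /* undo the one's complement */
    int seg = (raw >> 4) & 7;
    int mant = raw & 0xF;
    int mag = (((mant << 3) + 132) << seg) - 132; /* bias + rounding, scale, remove offset */
    return (int16_t)((raw & 0x80) ? -mag : mag);  /* raw bit 7 set = negative */
}
```

The index 0xFF decodes to 0, illustrating the level 0 that the A law lacks.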
The coder 23 according to the invention takes advantage of the A law or mu law coding method by storing, in a memory space shown as reference 27, some of the least significant bits that have not been taken into account in the coding of the binary frame of quantization indices IPCM.
Thus, as previously mentioned, with logarithmic coding according to the A or mu law, at least 3 bits can be stored for every segment.
The number of bits lost by the coding methods according to the A or mu law increases with the number of the segment, up to 10 bits for the last segment.
The method according to the invention allows at least the most significant bits among these lost bits to be recovered.
In order to determine an enhancement bit stream with a bit rate of 16 kbit/s, i.e. 2 bits per sample at 8000 samples per second, the method according to the invention stores in memory 27 the two most significant bits among the bits discarded by the compression operation that produces the frame of quantization indices.
These bits are then used by the determination means 28 to form the enhancement (extension) bit stream IEXT. This enhancement bit stream is then transmitted via another transmission channel 25 to a decoder 24.
Thus, the decoder 24, comprising an inverse quantizer, here an inverse PCM quantizer Q−1PCM 22, receives in parallel the basic bit stream I′PCM and the enhancement bit stream I′EXT.
These streams I′PCM and I′EXT are versions of IPCM and IEXT, respectively, that may be affected by bit errors.
In the case where this enhancement bit stream is received by the reception means 29 of the decoder 24, the decoder will then have a greater precision on the location of the decoded sample in the segment. For this purpose, it concatenates the extension bits to the bits received in the basic stream I′PCM by bit concatenation means 30, and then carries out an inverse quantization in 22.
Indeed, each additional bit doubles the number of quantization levels in a segment, and doubling the number of levels increases the signal-to-noise ratio by about 6 dB. Thus, each extension bit received at the decoder increases the SNR by about 6 dB, which improves the quality of the decoded signal without significantly increasing the complexity of the coder.
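Under the usual high-resolution assumption that the quantization error is uniformly distributed over an interval of width Δ (noise power Δ²/12) and that the signal power is unchanged, halving the quantization step gives:

$$
\frac{\sigma_q^2(\Delta)}{\sigma_q^2(\Delta/2)} = \frac{\Delta^2/12}{(\Delta/2)^2/12} = 4,
\qquad
10\log_{10} 4 = 20\log_{10} 2 \approx 6.02\ \text{dB}.
$$

With n extension bits received, the expected gain is therefore about 6.02·n dB, i.e. about 12 dB for the two bits per sample of the 16 kbit/s layer.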
In the example illustrated in
It can be seen that, instead of shifting the bits all at once by “pos−4” positions to keep only the 5 most significant bits, as in conventional A law coding, a first shift of 2 positions fewer (“pos−6” positions) is applied to keep the 7 most significant bits, and the last two bits are stored in 27. Then, in a second step, a further shift of two bits yields the 5 most significant bits, of which the first, always equal to 1, is not transmitted. The other 4 bits are used for the basic bit stream.
The two stored bits are sent in the extension bit stream.
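The two-step shift can be sketched in C as follows. This is a hypothetical sketch (names alaw_encode_ext2 and msb_position are not from the appendix), under the same 16-bit input and ~x assumptions as a conventional A law coder; in segment 0 the usual 4-bit shift is split into 2 + 2 bits.

```c
#include <stdint.h>

/* Index of the most significant 1 bit of v (v > 0); plays the role of "pos". */
static int msb_position(int v)
{
    int p = -1;
    while (v > 0) {
        v >>= 1;
        p++;
    }
    return p;
}

/* Returns the basic 8-bit A law index and stores the 2 extension bits in *ext. */
uint8_t alaw_encode_ext2(int16_t sample, int *ext)
{
    int x = sample;
    int sign = 0x80;                 /* sign bit: 1 for positive values */
    int seg;
    if (x < 0) {
        x = -x - 1;                  /* equals ~x in two's complement */
        sign = 0;
    }
    if (x > 255) {
        int pos = msb_position(x);
        seg = pos - 7;
        x >>= pos - 6;               /* first part of shift: keep the 7 MSBs */
    } else {
        seg = 0;
        x >>= 2;                     /* segment 0: first part of the 4-bit shift */
    }
    *ext = x & 0x3;                  /* save the last two bits */
    x >>= 2;                         /* finish shift */
    return (uint8_t)((sign | (seg << 4) | (x & 0xF)) ^ 0x55);
}
```

The basic index is unchanged with respect to conventional A law coding (0x51 for −75), while ext now carries the two bits just below the position on the segment (2 for −75).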
As shown in
The pseudo-code enabling all of these operations to be performed at the coder for the A law is given in Appendix A-15.
It can be seen that the differences with respect to conventional G.711 coding (sections underlined and in bold in the appendix) are the two-step shift explained above and the use of the two stored bits to form and transmit the enhancement bit stream “ext”.
Similarly, for the implementation of the mu law, the corresponding pseudo-code for the coding is shown in Appendix A-16.
The same differences with the conventional coding as for the coding according to the A law are noted.
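The same two-step shift applied to the mu law can be sketched as below (hypothetical names ulaw_encode_ext2 and msb_position; 16-bit input, ~x for negative samples and clipping to 32767 are assumptions). Since the mu law loses at least 3 bits, two extension bits are always available.

```c
#include <stdint.h>

/* Index of the most significant 1 bit of v (v > 0); plays the role of "pos". */
static int msb_position(int v)
{
    int p = -1;
    while (v > 0) {
        v >>= 1;
        p++;
    }
    return p;
}

/* Returns the basic 8-bit mu law index and stores the 2 extension bits in *ext. */
uint8_t ulaw_encode_ext2(int16_t sample, int *ext)
{
    int x = sample;
    int sign = 0;
    if (x < 0) {
        x = -x - 1;                  /* equals ~x in two's complement */
        sign = 0x80;
    }
    x += 132;                        /* bias 128 + rounding 4 */
    if (x > 32767)
        x = 32767;                   /* assumed clipping */
    int pos = msb_position(x);       /* 7..14 */
    int seg = pos - 7;
    x >>= pos - 6;                   /* first part of shift */
    *ext = x & 0x3;                  /* save the last two bits */
    x >>= 2;                         /* finish shift */
    return (uint8_t)(~(sign | (seg << 4) | (x & 0xF)) & 0xFF);
}
```

For −75, the basic index stays 0x76 and the extension bits are 3.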
Similarly,
Upon reception of the enhancement bit stream I′EXT, the decoder concatenates, in 30, the extension bits thus received after the position bits of the basic stream I′PCM in order to carry out the amplitude decompression (or expansion), which is the inverse of the amplitude compression process.
Using these additional bits thus allows a greater precision in the location of the decoded sample in the segment to be obtained.
Indeed, each additional bit divides each quantization interval of the segment into two, so that the decoded value is located more precisely within the segment.
The rounding value “roundval”, which gives the value at the middle of the quantization interval, is also adapted according to the number of extension bits received.
The information on the number of extension bits received is for example given by means of an external indicator as represented by the arrow 26 in
This information could also be deduced directly by analysis of the extension bit stream.
One example of decoding taking into account these extension bits is given in Appendix A-17 by the pseudo-codes for the A law and the mu law, respectively.
The differences between conventional decoding and decoding according to the invention (sections underlined and in bold in the appendix) consist in taking the bits of the extension bit stream into account and in applying an adapted rounding value “roundval”.
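An A law decoder taking 0, 1 or 2 extension bits into account can be sketched as follows. The name alaw_decode_ext and its parameters are hypothetical: ext holds the n_ext received bits right-aligned, a single bit is placed at position 1 of the 2-bit field, and the extension bits are then put at positions 2 and 3 of the decoded value. The values of roundval (8, 4, 2) are chosen here as half of the remaining quantization step, an assumption consistent with the conventional rounding value 8.

```c
#include <stdint.h>

int16_t alaw_decode_ext(uint8_t index, int ext, int n_ext)
{
    int raw = index ^ 0x55;          /* undo the even-bit inversion */
    int seg = (raw >> 4) & 7;
    int mant = raw & 0xF;
    /* Align the received bits on a 2-bit field (a single bit goes to position 1). */
    int ext2 = (n_ext == 2) ? (ext & 0x3) : (n_ext == 1) ? ((ext & 0x1) << 1) : 0;
    int roundval = 8 >> n_ext;       /* 8, 4 or 2: half of the remaining step */
    int val = (mant << 4) + (ext2 << 2) + roundval; /* extension bits at positions 2 and 3 */
    if (seg > 0)
        val = (val + 256) << (seg - 1);
    return (int16_t)((raw & 0x80) ? val : -val);
}
```

With the −75 example (index 0x51, extension bits 2), the decoded value moves from −72 without extension to −74 with both bits.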
The coder, such as that shown in
This memory space 27 can form part of a memory block that also comprises a storage memory and/or a working memory.
The storage means can comprise a computer program comprising code instructions for the implementation of the steps of the coding method according to the invention when they are executed by the processor of the coder.
The computer program can also be stored on a storage medium readable by a drive of the coder or downloadable into the memory space of the coder.
This coder thus implements the method according to the invention for scalar quantization-based coding of the samples of a digital audio signal. The samples are coded over a pre-determined number of bits in order to obtain a binary frame of quantization indices, and the coding is carried out according to an amplitude compression law. A pre-determined number of least significant bits are not taken into account in the binary frame of quantization indices. The coding is such that it comprises the following steps:
Similarly, the decoder according to the invention comprises a processor of the DSP type not shown here and is capable of implementing the method of decoding of a binary frame of quantization indices comprising a pre-determined number of bits by an inverse quantization step according to an amplitude compression law. This method is such that it comprises the following steps:
This decoder also comprises a storage means (not shown), capable of storing a computer program comprising code instructions for the implementation of the steps of the decoding method according to the invention when they are executed by the processor of the decoder.
The computer program can also be stored on a storage medium readable by a drive of the decoder or downloadable into the memory space of the decoder.
The example shown and explained with reference to
The “ext_bits” least significant bits of the variable “ext” are sent in the enhancement bit stream.
It should be noted that the term “pos−4−ext_bits” can be negative for ext_bits > 3 in the first segments, depending on the law used (A or mu). Even under these conditions, the pseudo-code given works correctly, because shift_right(x, −v) = shift_left(x, v). In other words, when the number of least significant bits not taken into account in the frame of quantization indices is smaller than the number of bits in the extension bit stream, in particular in the first segments, the missing bits of the extension bit stream simply need to be filled with zeros. Thus, the most significant bits of the extension bit stream are the bits stored and recovered according to the invention, and the least significant bits are set to 0.
Since the number of discarded bits increases in the higher segments, padding with zeros is no longer necessary there.
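The generalization to “ext_bits” extension bits, including the zero padding obtained from a negative shift count, can be sketched for the mu law as below. The names ulaw_encode_ext, shift_right_any and msb_position are hypothetical; shift_right_any reproduces the behaviour stated above for shift_right (a negative count shifts left), since a negative shift count is undefined for the C >> operator. Values of ext_bits from 0 to about 10 are assumed.

```c
#include <stdint.h>

/* Index of the most significant 1 bit of v (v > 0); plays the role of "pos". */
static int msb_position(int v)
{
    int p = -1;
    while (v > 0) {
        v >>= 1;
        p++;
    }
    return p;
}

/* shift_right(x, -v) = shift_left(x, v), as described in the text. */
static int shift_right_any(int x, int v)
{
    return (v >= 0) ? (x >> v) : (x << -v);
}

uint8_t ulaw_encode_ext(int16_t sample, int ext_bits, int *ext)
{
    int x = sample;
    int sign = 0;
    if (x < 0) {
        x = -x - 1;                  /* equals ~x in two's complement */
        sign = 0x80;
    }
    x += 132;                        /* bias 128 + rounding 4 */
    if (x > 32767)
        x = 32767;                   /* assumed clipping */
    int pos = msb_position(x);       /* 7..14 */
    int seg = pos - 7;
    /* A negative count (low segments, large ext_bits) shifts left, padding ext with 0s. */
    x = shift_right_any(x, pos - 4 - ext_bits);
    *ext = x & ((1 << ext_bits) - 1); /* the ext_bits LSBs form the extension */
    x >>= ext_bits;                   /* finish shift */
    return (uint8_t)(~(sign | (seg << 4) | (x & 0xF)) & 0xFF);
}
```

For −75 in the first segment, only 3 bits are discarded (binary 110); with ext_bits = 4 the extension is therefore 1100, the last bit being the zero padding.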
Similarly, the invention is also applicable when the bit rate must be reduced during transmission. If the extension bit stream comprises two bits per sample, the least significant of these two bits is then no longer transmitted.
The decoder then receives only one extension bit per sample. The decoder described by way of example in the pseudo-code works correctly with this extension layer reduced to one bit per sample, provided that the received extension bit is placed at position 1 of the variable “ext”, that the bit at position 0 of “ext” is set to 0, and that the value of “roundval” is adapted accordingly.
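This bit-rate reduction can be illustrated with a mu law decoder sketch accepting 2, 1 or 0 extension bits. The name ulaw_decode_ext is hypothetical; ext holds the received bits right-aligned, the extension bits are placed at positions 1 and 2 of the decoded value, and roundval (4, 2, 1) is chosen as half of the remaining quantization step, an assumption consistent with the conventional mu law rounding value 4.

```c
#include <stdint.h>

int16_t ulaw_decode_ext(uint8_t index, int ext, int n_ext)
{
    int raw = ~index & 0xFF;         /* undo the one's complement */
    int seg = (raw >> 4) & 7;
    int mant = raw & 0xF;
    /* A single received bit goes to position 1 of "ext"; position 0 is set to 0. */
    int ext2 = (n_ext == 2) ? (ext & 0x3) : (n_ext == 1) ? ((ext & 0x1) << 1) : 0;
    int roundval = 4 >> n_ext;       /* 4, 2 or 1 */
    int mag = (((mant << 3) + (ext2 << 1) + 128 + roundval) << seg) - 132;
    return (int16_t)((raw & 0x80) ? -mag : mag);
}
```

For −75 (basic index 0x76, extension bits 3), a network element dropping the least significant extension bit forwards ext >> 1 = 1 with n_ext = 1: the decoder then outputs −74 instead of −75 with both bits, and −72 with none.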
The value of the variable “roundval” used in the examples given therefore depends on the number of bits received by the decoder and on the law used (A or mu). Table 4 hereinafter gives the value of the variable “roundval” in the various situations.
This example therefore shows another advantage of the solution presented: the bit stream of the extension layer is hierarchical (embedded), so its bit rate can be reduced in the course of the transmission.
Thus, if both bits are received by the decoder, the SNR increases by about 12 dB; if only one bit is received, it increases by about 6 dB.
Of course, this example may be generalized; for example, the encoder can send 4 bits per sample in the extension layer and the decoder can receive 4, 3, 2, 1 or 0 of these bits, the SNR gain (in dB) of the decoded signal being proportional to the number of extension bits received.
It can be observed in the pseudo-codes given that the additional complexity of the extension layer is only two operations per sample at the encoder and four operations per sample at the decoder, i.e. about 0.05 weighted million operations per second (WMOPS), which is negligible. This low complexity can be exploited in a hierarchical coding extending G.711 while allowing, for example in audio conference applications, a “conventional” low-complexity mixing of G.711 streams or of extended G.711 streams according to the invention, whereas the article by Hiwasaki et al. uses a so-called “partial mixing”, which degrades quality with respect to conventional mixing, in order to limit the complexity of mixing with scalable G.711 coding.
In an alternative embodiment, the invention is not implemented with the algorithms specified above in pseudo-code, but by pre-calculating and storing in tables, at the coder and/or at the decoder, the levels from which the extension bits are obtained. This solution however has the drawback of requiring greater memory capacity at both the coder and the decoder, for only a small gain in complexity.
x = shift_right(x, pos - 6)  /* first part of shift */
ext = and(x, 0x3)            /* save last two bits */
x = shift_right(x, 2)        /* finish shift */

x = shift_right(x, 2)
ext = and(x, 0x3)            /* save last two bits */
x = shift_right(x, 2)        /* finish shift */

x = shift_right(x, pos - 6)  /* first part of shift */
ext = and(x, 0x3)            /* save last two bits */
x = shift_right(x, 2)        /* finish shift */

ext = shift_left(and(ext, 0x03), 2)  /* put extension bits in position 2 & 3 */

ext = shift_left(and(ext, 0x03), 1)  /* put extension bits in position 1 & 2 */
Number | Date | Country | Kind |
---|---|---|---|
0756326 | Jul 2007 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR2008/051248 | 7/4/2008 | WO | 00 | 2/25/2010 |