The proposed technology relates to coding/decoding in general and specifically to improved coding and decoding of signals in a fixed-bitrate codec.
Typically, speech/audio codecs process low- and high-frequency components of an audio signal with different compression schemes. Most of the available bit-budget is consumed by the LB (Low frequency Band) coder, due to the higher sensitivity of the human auditory system at these frequencies. In addition, most of the available computational complexity is also consumed by the LB codec, e.g., analysis-by-synthesis ACELP (Algebraic Code Excited Linear Prediction). This leaves severe constraints on the complexity available to the HB (High frequency Band) codec.
Due to the above-mentioned constraints, the HB part of the signal is typically reconstructed by a parametric BWE (Band Width Extension) algorithm. This solution handles the problem of the constrained bit-budget and limited complexity, but it completely lacks scalability, which means that the quality quickly saturates and does not follow the bit-rate increase.
Variable bit-rate schemes such as entropy coding schemes present an efficient way to encode sources at a low average bit-rate. However, many applications, e.g. mobile communication channels, rely on a fixed bit-rate for the encoded signal. The number of bits consumed for a segment of a given input signal is not known before the entropy coding has been completed. One common solution is to run several iterations of the entropy coder until a good compression ratio within the fixed bit budget has been reached.
Consequently, there is a need for methods and arrangements that enable low-complexity and scalable coding of the high band part of audio signals and that enable utilizing a variable bit-rate quantization scheme within a fixed bitrate framework.
The solution of running multiple iterations for the entropy coder is a computationally complex solution, which may not fit in the context of real-time communication on a device with limited processing power.
A general object of the proposed technology is improved coding and decoding of audio signals.
A first aspect of the embodiments relates to a method for quantizing a received excitation signal in a communication system. The method includes the steps of re-shuffling the elements of an excitation signal to provide a re-shuffled excitation signal, coding the re-shuffled excitation signal, and reassigning codewords of the coded excitation signal if a number of used bits exceeds a predetermined fixed bit rate requirement to provide a quantized excitation signal.
A second aspect of the embodiments relates to a method for reconstructing an excitation signal in a communication system. The method includes the steps of entropy decoding a received quantized excitation signal, and SQ decoding the entropy decoded excitation signal to provide a reconstructed excitation signal.
A third aspect of the embodiments relates to an encoding method in a communication system. The method includes the steps of extracting a representation of a spectral envelope of an audio signal, and providing and quantizing an excitation signal based on at least the representation and the audio signal, the quantization being performed according to the previously described quantizer method. Further, the method includes the steps of providing and quantizing a gain for the audio signal based on at least the excitation signal, the provided representation and the audio signal, and finally transmitting quantization indices for at least the quantized gain and the quantized excitation signal to a decoder unit.
A fourth aspect of the embodiments relates to a decoding method in a communication system. The method includes the steps of generating a reconstructed excitation signal for an audio signal based on received quantization indices for an excitation signal. The quantization indices for the excitation signal have been provided according to the above described quantizer method. Further the method includes the steps of generating and spectrally shaping a reconstructed representation of the spectral envelope of the audio signal based on at least the generated reconstructed signal and received quantized representation of a spectral envelope of the audio signal, to provide a synthesized audio signal. Finally, the method includes the step of up-scaling the thus synthesized audio signal based on received quantization indices for a gain, to provide a decoded audio signal.
A fifth aspect of the embodiments relates to a quantizer unit for quantizing a received excitation signal in a communication system. The quantizer unit includes a re-shuffling unit configured for re-shuffling the elements of an excitation signal to provide a re-shuffled excitation signal, a coding unit configured for coding the re-shuffled excitation signal to provide a coded excitation signal, and a reassigning unit configured for reassigning codewords of the coded excitation signal.
A sixth aspect of the embodiments relates to a de-quantizer unit for reconstructing an excitation signal in a communication system. The de-quantizer unit includes an entropy-decoding unit configured for entropy decoding a received quantized excitation signal, and an SQ decoding unit configured for SQ decoding the entropy decoded excitation signal. Further, the de-quantizer unit includes an inverse re-shuffling unit configured for inversely re-shuffling the elements of the reconstructed excitation signal.
A seventh aspect of the embodiments relates to an encoder unit. The encoder unit includes a quantizer unit as described above and further an extracting unit configured for extracting a representation of a spectral envelope of an audio signal, and the quantizer unit is configured for providing and quantizing an excitation signal based on at least the representation and the audio signal. Further, the encoder includes a gain unit configured for providing and quantizing a gain based on at least the excitation signal, the provided representation, and the audio signal, and a transmitting unit configured for transmitting quantization indices for at least the quantized gain and the quantized excitation signal to a decoder unit.
An eighth aspect of the embodiments relates to a decoder unit. The decoder unit includes a de-quantizer unit for generating a reconstructed excitation signal based on received quantization indices for an excitation signal for an audio signal, and a synthesizer unit configured for generating and spectrally shaping a reconstructed representation of the spectral envelope of the audio signal based at least on the generated reconstructed excitation signal and a received quantized representation of the spectral envelope to provide a synthesized audio signal. Finally, the decoder unit includes a scaling unit configured for up-scaling the synthesized audio signal based on received quantization indices for a gain to provide a decoded audio signal.
The proposed technology also involves a user equipment and/or a base station terminal including at least one such quantizer unit, de-quantizer unit, encoder unit or decoder unit.
An advantage of the proposed technology is scalable low-complexity coding of high-band audio signals.
The embodiments of the proposed technology, together with further objects and advantages thereof, may best be understood by referring to the following description taken together with the accompanying drawings, in which:
HB: High frequency Band
LB: Low frequency Band
The proposed technology is in the area of audio coding, but is also applicable to other types of signals. It describes technology for a low-complexity adaptation of a variable bit-rate coding scheme to be used in a fixed rate audio codec. It further describes embodiments of methods and arrangements for coding and decoding the HB (High frequency Band) part of an audio signal utilizing a variable bit-rate coding scheme within a fixed-bitrate codec. Although the embodiments mainly relate to coding and decoding of high frequency band audio signals, they are equally applicable to any signal, e.g. audio or image, and any frequency range where a fixed bitrate is applied.
Throughout this description, the terms excitation, excitation signal, residual vector, and residual are used interchangeably.
The embodiments provide a lightweight and scalable structure for variable bit-rate coding in a fixed bit-rate codec, and are particularly suitable for, but not limited to, HB audio coding and frequency domain coding schemes. One key aspect of the embodiments includes jointly designed lossy and lossless compression modules, which together with codeword reassignment logic operate at a fixed bitrate. In this way, the system has the complexity and scalability advantage of SQ (Scalar Quantization) at relatively low bitrates, where SQ technology is typically not applicable.
Known methods of utilizing variable bit rate schemes within a fixed bit rate scheme include performing a quantization step multiple times until a predetermined fixed bitrate is achieved.
One main concept of the invention is the combination of an entropy coding scheme with a low-complexity adaptation to fixed bit-rate operation. Here it is first presented in the context of a time-domain audio codec and later in the context of a frequency-domain audio codec.
A high-level block-diagram of an embodiment of an audio codec in the time domain is presented in
Real-time audio coding is typically done in frames (blocks) that are compressed in an encoder and transmitted as a bitstream to a decoder over a network. The decoder reconstructs these blocks from the received bitstream and generates an output audio stream. The algorithm in the embodiments operates in the same way. A HB audio signal is typically processed in 20 ms blocks. At 16 kHz sampling frequency, this corresponds to 320 samples processed at a given time instant. However, the same method can be applied to any size blocks and for any sampling frequency.
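The frame arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `samples_per_frame` and `split_into_frames` are hypothetical and do not appear in the description, and dropping a trailing partial frame is an assumption made for brevity.

```python
def samples_per_frame(frame_ms, sample_rate_hz):
    """Number of samples in one frame of the given duration."""
    return round(frame_ms * sample_rate_hz / 1000)


def split_into_frames(signal, frame_len):
    """Yield consecutive full frames (a trailing partial frame is dropped here)."""
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        yield signal[start:start + frame_len]


# 20 ms at 16 kHz, as stated in the text
FRAME_LEN = samples_per_frame(20, 16000)
```

The same helper gives 160 samples for a 20 ms block at 8 kHz, which matches the vector length N=160 used for the down sampled excitation later in the description.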
Although the majority of the present disclosure explicitly deals with quantization in a time domain, it is equally applicable within the frequency domain, in particular within an MDCT context. A corresponding high-level block-diagram of coding/decoding in the frequency domain is illustrated in
Some of the benefits of the frequency-domain approach are: A) down- and up-sampling can be avoided (low/high frequency components of the coded vector can be directly selected), and B) it is easier to select regions with lower perceptual importance; as an example, exploiting the masking of weak tones in the presence of stronger tones requires frequency-domain processing.
In order to provide the necessary quantization indices for the excitation signal, for either the time domain scheme or the frequency domain scheme, the inventors have developed a novel quantization method and arrangement, which enables utilizing a variable bit-rate algorithm in a fixed bit-rate scheme. The same quantization method can be utilized regardless of whether the quantization takes place in a frequency domain based encoder/decoder or a time domain based encoder/decoder.
According to an aspect of the current disclosure, a novel quantizer arrangement and method for quantizing an excitation signal for a signal (audio or other) to be subsequently encoded will be described with reference to
With reference to
The quantizer method will be denoted Qe in the following description, and is given in more detail in
The re-shuffling step S301 and the coding step S302 can be performed in any order without affecting the end-result. Consequently, the coding step S302 can be applied to the received excitation signal and the elements of the coded excitation can be subsequently re-shuffled S301.
Finally, the codewords of the coded excitation signal are reassigned in step S303 if the number of used bits for the coded signal exceeds a predetermined fixed bit rate requirement. The reason for this is further explained below.
According to a further embodiment, the quantizer unit and method optionally include a unit for performing a step S304 of inversely re-shuffling the elements after the codeword reassignment in order to re-establish the original order of the elements of the excitation signal.
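The re-shuffling step S301, its inverse S304, and the claim that S301 and S302 can be swapped can be sketched as follows. This is a minimal illustration under stated assumptions: the seeded pseudo-random permutation, the uniform scalar quantizer with step 0.5, and all function names are hypothetical choices, not taken from the description.

```python
import random


def make_permutation(n, seed=0):
    """Pseudo-random permutation; a fixed seed keeps encoder and decoder in sync (assumed)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def reshuffle(x, perm):
    """S301: element i of the output is element perm[i] of the input."""
    return [x[p] for p in perm]


def inverse_reshuffle(y, perm):
    """S304: undo reshuffle() and restore the original element order."""
    x = [None] * len(y)
    for i, p in enumerate(perm):
        x[p] = y[i]
    return x


def quantize(x, step=0.5):
    """Element-wise uniform scalar quantizer (stand-in for the coding step S302)."""
    return [round(v / step) for v in x]
```

Because the quantizer acts on each element independently, quantizing and then re-shuffling gives the same result as re-shuffling and then quantizing, which is why the order of the two steps does not affect the end result.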
Since SQ schemes are generally not efficient at low bit-rates, entropy coding such as Huffman coding or similar is used for more efficient use of the available bits. The concept of the Huffman codes is that shorter codewords are assigned to symbols that occur more frequently; see Table 1 below, which presents the Huffman code for a 5 level quantizer. Each reconstruction level has a codeword attached (shorter for more probable amplitudes, which also correspond to lower amplitudes).
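As an illustration of how shorter codewords end up on the more probable (lower) amplitudes, the following sketch builds a Huffman code for a 5 level quantizer with the standard heap-based construction. The level probabilities are assumed values chosen for the example, and the resulting code is not claimed to match Table 1.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return {symbol: bitstring} for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


# Assumed probabilities for quantizer levels -2..2 (illustrative only)
LEVEL_PROBS = {0: 0.5, -1: 0.2, 1: 0.2, -2: 0.05, 2: 0.05}
CODE = huffman_code(LEVEL_PROBS)
AVERAGE_BITS = sum(LEVEL_PROBS[s] * len(CODE[s]) for s in CODE)
```

With any such table, the bit cost of a frame is the sum of the codeword lengths of its quantized elements, which is the quantity that has to be compared with the fixed bit budget.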
Since Huffman coding is a variable bit-rate algorithm, a special codeword reassignment algorithm according to the present embodiments is used to fit the HB coding into a fixed-bitrate requirement. The “Codeword reassignment” module in
Although the above description mainly deals with Huffman coding, it is equally possible to utilize any other codec, which has a variable codeword length depending on amplitude probability, preferably a codec where a shorter codeword is assigned to higher probability amplitude. It is further possible to include a step of providing a plurality of Huffman tables (or other codes) and performing a selection of an optimal or preferred table. Another possibility is to use one or more codes (Huffman or other) out of a plurality of provided codes. The main criterion for the code is that there is a correlation between amplitude probability and codeword length.
The motivation behind this procedure is that the lowest amplitudes are set to zero first, which leads to lower error in the reconstructed signal. Since the elements of the excitation vector are re-shuffled or randomly selected, extracting a sequence of elements from Group 1 and zeroing their amplitudes does not produce error localized in time (the error is spread over the entire vector). Instead of performing an actual re-shuffling of the excitation vector and then extracting elements from Group 1 in a sequence, it is possible to randomize the extraction step directly.
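The reassignment idea can be sketched as a single pass that replaces the lowest non-zero amplitudes with the zero level, whose codeword is shortest, until the frame fits the budget. This is a hedged sketch, not the exact procedure of the embodiments: the codeword lengths are assumed values, the treatment of "Group 1" as the set of elements with the smallest non-zero magnitude is an assumption, and the function names are hypothetical.

```python
# Assumed codeword lengths in bits per quantizer level (shorter for lower amplitudes)
CODE_LENGTHS = {0: 1, 1: 2, -1: 3, 2: 4, -2: 4}


def used_bits(levels, lengths=CODE_LENGTHS):
    """Total number of bits needed to entropy code the quantized levels."""
    return sum(lengths[v] for v in levels)


def reassign_codewords(levels, bit_budget, lengths=CODE_LENGTHS):
    """Zero the lowest non-zero amplitudes, in sequence, until the budget is met.

    `levels` is assumed to be the already re-shuffled vector, so the sequential
    scan spreads the introduced error over the whole frame.
    """
    out = list(levels)
    bits = used_bits(out, lengths)
    for mag in sorted({abs(v) for v in out if v != 0}):  # lowest amplitudes first
        for i, v in enumerate(out):
            if bits <= bit_budget:
                return out
            if abs(v) == mag:
                bits -= lengths[v] - lengths[0]  # saving from switching to level 0
                out[i] = 0
    return out  # all zeros if even that is the best that can be done


frame = [0, 1, -2, 0, -1, 2, 1, 0]          # 18 bits with the assumed lengths
fitted = reassign_codewords(frame, bit_budget=14)
```

Unlike the iterative approach of re-running the entropy coder, this needs one bit count and at most one scan per amplitude level.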
The excitation quantization consumes most of the available bits. It easily scales with increasing bitrate by increasing the number of reconstruction levels of the SQ.
In a corresponding manner the quantized excitation signal needs to be reconstructed in a receiving unit e.g. decoder or de-quantizer unit in a decoder, in order to enable reconstructing the original audio signal.
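The two reconstruction steps, entropy decoding followed by SQ decoding, can be sketched as below. The prefix code, the uniform reconstruction levels, and the step size are assumptions made for the example; the function names are hypothetical.

```python
# Assumed prefix code for a 5 level quantizer (illustrative only)
CODE = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}


def huffman_decode(bits, code=CODE):
    """Entropy decoding: split a bitstring into quantizer indices using a prefix code."""
    inverse = {b: s for s, b in code.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:          # prefix-free, so the first match is the codeword
            symbols.append(inverse[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return symbols


def sq_decode(indices, step):
    """SQ decoding: map indices to reconstruction levels of a uniform quantizer (assumed)."""
    return [i * step for i in indices]


indices = huffman_decode("0" + "10" + "1111" + "110")
reconstructed = sq_decode(indices, step=0.5)
```

If the encoder re-shuffled the vector and did not restore the order before transmission, the decoder would additionally apply the inverse permutation after these two steps.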
Accordingly, with reference to
With reference to
Initially, a representation of a spectral envelope of an audio signal is extracted in step S1. For a time-domain application the representation of the spectral envelope can comprise the auto-regression (AR) coefficients, and for a frequency domain application the representation of the spectral envelope can comprise a set of band gains for the audio signal. Subsequently, in step S2, an excitation signal for the audio signal is provided and quantized. The quantization is performed according to the previously described embodiments of the quantization method. Further, in step S3 a gain is provided and quantized for the audio signal based on at least the extracted excitation signal, the provided representation of the spectral envelope and the audio signal itself. Finally, in step S4, quantization indices for at least the quantized gain and the quantized excitation signal are transmitted to or provided at a decoder unit.
With reference to
With reference to
Below follows a more detailed description of the various steps and arrangements described above.
An embodiment of the HB encoder operations is illustrated in
$$e(n) = A(z)\, s_{HB}(n), \qquad (1)$$
where $A(z) = 1 + \hat{a}_1 z^{-1} + \hat{a}_2 z^{-2} + \dots + \hat{a}_M z^{-M}$ is the AR model of order $M = 10$.
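In the time domain, Equation 1 is an FIR filtering of the HB signal with the AR coefficients. The sketch below shows that operation for one frame; starting from a zero filter state is a simplifying assumption (a real codec would typically carry the filter memory across frames), and the function name is hypothetical.

```python
def lpc_residual(s, a):
    """e(n) = s(n) + sum_{k=1..M} a[k-1] * s(n-k), with zero initial state.

    `s` is the HB input frame and `a` holds the M quantized AR coefficients
    a_1..a_M of A(z) = 1 + a_1 z^-1 + ... + a_M z^-M.
    """
    order = len(a)
    e = []
    for n in range(len(s)):
        acc = s[n]
        for k in range(1, min(order, n) + 1):
            acc += a[k - 1] * s[n - k]
        e.append(acc)
    return e


# First-order toy example: a decaying exponential is whitened to an impulse
residual = lpc_residual([1.0, 0.5, 0.25], [-0.5])
```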
The excitation signal or residual is down sampled to 8 kHz, which corresponds to vectors with length N=160 samples. This down sampled excitation signal contains the frequency components 8-12 kHz of the original bandwidth of audio input s. The motivation behind this operation is to focus available bits, and accurately code perceptually more important signal components (8-12 kHz). Spectral regions above 12 kHz are typically less audible, and can easily be reconstructed without the cost of additional bits. However, it is equally applicable to perform any other degree of down sampling of parts of or the entire high frequency band spectrum of the audio input signal s.
It should be noted that this down sampling is optional and may be unnecessary if the available bit budget permits coding the entire frequency range. If, on the other hand, the bit budget is even more restricted a down sampling to an even narrower band may be desired, e.g. representing the 8-10 kHz band, or some other frequency band.
Prior to quantization, the optionally down sampled excitation signal or residual vector e′ is normalized to unit energy, according to Equation 2 below. This scaling facilitates the shape quantization operation (i.e. the quantizers do not have to capture global energy variations in the signal).
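A minimal sketch of the normalization step follows. Equation 2 is not reproduced in this text, so the sketch assumes that "unit energy" means a sum of squares equal to one; a mean-square convention would differ only by a constant factor. The function name is hypothetical.

```python
import math


def normalize_unit_energy(e):
    """Scale the residual so that the sum of its squared samples equals 1 (assumed convention)."""
    energy = sum(v * v for v in e)
    if energy == 0.0:
        return list(e)               # silent frame: nothing to scale
    scale = 1.0 / math.sqrt(energy)
    return [v * scale for v in e]


shape = normalize_unit_energy([3.0, 4.0])
```

After this step the quantizer only has to represent the shape of the residual, while the overall level is conveyed separately by the global gain.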
The actual residual quantization is performed in Qe block in
In order to calculate and transmit the appropriate energy level of the HB signal, the encoder performs the steps of synthesizing the waveform (in the same manner as in the decoder). First, the residual em with bandwidth 8-16 kHz is reconstructed from the coded 8-12 kHz residual ê through up sampling with spectrum folding. Then the waveform is synthesized by running the reconstructed excitation through an all-pole autoregressive filter to form the synthesized high frequency band signal s′HB. The energy of the synthesized waveform s′HB is adjusted to the energy of the target waveform sHB. The corresponding gain G, as defined in Equation 3, can be efficiently quantized with a 6-bit SQ in the logarithmic domain.
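The steps in this paragraph can be sketched as follows. Several points are assumptions because the text does not spell them out: up sampling with spectrum folding is shown as zero insertion by a factor of 2 without an anti-imaging filter, the gain is taken as the square root of the energy ratio (Equation 3 is not reproduced here), and the -30 dB to 33 dB range of the 6-bit logarithmic quantizer is an arbitrary illustrative choice. All function names are hypothetical.

```python
import math


def upsample_with_folding(e):
    """Insert a zero after every sample; with no anti-imaging filter the
    spectrum is mirrored into the upper half of the new band."""
    out = []
    for v in e:
        out.extend((v, 0.0))
    return out


def energy_matching_gain(target, synthesized):
    """Gain that scales the synthesized waveform to the energy of the target (assumed form)."""
    e_t = sum(v * v for v in target)
    e_s = sum(v * v for v in synthesized)
    return math.sqrt(e_t / e_s) if e_s > 0.0 else 0.0


def quantize_gain_log(gain, bits=6, min_db=-30.0, max_db=33.0):
    """Uniform SQ of 20*log10(gain); returns (index, quantized gain). Range is assumed."""
    levels = 1 << bits
    step = (max_db - min_db) / (levels - 1)
    gain_db = 20.0 * math.log10(max(gain, 1e-9))
    index = min(levels - 1, max(0, round((gain_db - min_db) / step)))
    return index, 10.0 ** ((min_db + index * step) / 20.0)


gain = energy_matching_gain([2.0, 0.0], [1.0, 0.0])
index, gain_q = quantize_gain_log(1.0)
```

Quantizing in the logarithmic domain keeps the relative error roughly constant over the whole gain range, which is why a small number of bits can be enough.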
In summary, on a frame-by-frame basis, embodiments of the encoder in the time domain quantize and transmit quantization indices for a set of AR coefficients Ia, one global gain IG, and excitation signal Ie for a received signal.
With reference to
A decoder 200 according to the present disclosure reconstructs the HB signal by extracting from the bitstream, received from the encoder unit 100, quantization indices for the global gain IG, AR coefficients Ia, and excitation vector Ie.
An embodiment of the excitation reconstruction algorithm or de-quantizer unit 400 in a decoder 200 is illustrated in
An overview of the processing steps of an embodiment of the HB decoder is shown in
$$s'_{HB}(n) = A(z)^{-1}\, e_m(n), \qquad (4)$$
Finally the waveform is up-scaled, as indicated by the dotted box, in step S30, with the received gain G (as represented by the received quantization indices IG for the gain G) to match the energy of the target HB waveform, to provide the output high frequency band part of the audio signal, as shown in Equation 5 below.
$$\hat{s}_{HB}(n) = \hat{G} \times s'_{HB}(n), \qquad (5)$$
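Equations 4 and 5 correspond to an all-pole (recursive) filtering of the reconstructed excitation followed by a scalar multiplication. The sketch below illustrates both for one frame; the zero initial filter state and the function names are assumptions made for the example.

```python
def ar_synthesis(e, a):
    """Equation 4: s'(n) = e(n) - sum_{k=1..M} a[k-1] * s'(n-k), zero initial state."""
    s = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(len(a), n) + 1):
            acc -= a[k - 1] * s[n - k]
        s.append(acc)
    return s


def upscale(s, gain):
    """Equation 5: multiply the synthesized waveform by the decoded gain."""
    return [gain * v for v in s]


# First-order toy example: an impulse excites a decaying exponential
synthesized = ar_synthesis([1.0, 0.0, 0.0], [-0.5])
decoded = upscale(synthesized, 2.0)
```

The synthesis filter is the inverse of the analysis filter A(z) of Equation 1, so with unquantized coefficients and excitation it would restore the original waveform up to the gain.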
As mentioned previously, the embodiments of the described scheme for HB coding in the time domain can also be implemented on a signal transformed to some frequency domain representation, e.g., DFT, MDCT, etc. In this case, AR envelope can be replaced by band gains that resemble the spectrum envelope, and the excitation or residual signal can be obtained after normalization with such band gains. In such an embodiment, the re-shuffling operation may be done such that perceptually less important elements will be removed first. One possible such re-shuffling would be to simply reverse the residual in frequency, since lower frequencies are generally more perceptually relevant.
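For the frequency-domain variant, the envelope and residual computation and the frequency-reversing re-shuffle can be sketched as follows. The use of RMS band gains, the band layout, and the function names are assumptions made for the example; the description only states that band gains resembling the spectrum envelope replace the AR envelope.

```python
import math


def band_gains(coeffs, band_edges):
    """RMS gain per band; band_edges [0, 2, 4] defines the bands [0, 2) and [2, 4)."""
    gains = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = coeffs[lo:hi]
        gains.append(math.sqrt(sum(c * c for c in band) / len(band)))
    return gains


def normalize_by_band_gains(coeffs, band_edges, gains):
    """Residual: each transform coefficient divided by the gain of its band."""
    residual = list(coeffs)
    for (lo, hi), g in zip(zip(band_edges[:-1], band_edges[1:]), gains):
        for i in range(lo, hi):
            residual[i] = coeffs[i] / g if g > 0.0 else 0.0
    return residual


def reshuffle_reverse(residual):
    """Reverse in frequency so that a sequential scan reaches high-frequency bins first."""
    return residual[::-1]


coeffs = [3.0, -3.0, 1.0, 1.0]
edges = [0, 2, 4]
gains = band_gains(coeffs, edges)
residual = normalize_by_band_gains(coeffs, edges, gains)
```

With the reversed order, a codeword reassignment that zeroes elements in sequence removes high-frequency bins before the perceptually more relevant low-frequency ones.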
With reference to
In a corresponding manner to the decoding method described with reference to
Processing steps in the frequency-domain encoder are illustrated in
Processing steps in the frequency-domain decoder are illustrated in
Note that
Arrangements according to the present disclosure will be described below with reference to
With reference to
With reference to
Further embodiments of a quantizer unit 300 and a de-quantizer unit 400 according to the present technology are illustrated in
As mentioned previously, the above described quantizer unit 102, 300 is beneficially implemented in an encoder unit, embodiments of which will be further described with reference to
A general embodiment of the encoder unit 100 includes a quantizer unit 102, 300 as described previously, and further includes an extracting unit 101 configured for extracting a representation of a spectral envelope of an audio signal, and the quantizer unit 300 is configured for providing and quantizing an excitation signal based on at least that representation of the spectral envelope and the audio signal. Further, the encoder 100 includes a gain unit 103 configured for providing and quantizing S3 a gain based on at least the excitation signal, the provided representation and the audio signal, and a transmitting unit 104 configured for transmitting S4 quantization indices for at least the quantized gain and the quantized excitation signal to a decoder unit.
According to
According to
As mentioned previously, the above described de-quantizer unit 201, 400 is beneficially implemented in a decoder unit 200, embodiments of which will be further described with reference to
A general embodiment of the decoder unit 200 includes a de-quantizer unit 201, 400 as described previously. Further, the de-quantizer unit 400, 201 is configured for generating a reconstructed excitation signal based on received quantization indices for the excitation signal. The decoder 200 further includes a generating unit 202 configured for generating and spectrally shaping a reconstructed representation of a spectral envelope of the audio signal based on the generated reconstructed signal and a received quantized representation of a spectral envelope of the audio signal, to provide a synthesized audio signal. In addition, the decoder 200 includes a scaling unit 203 configured for up-scaling the synthesized audio signal based on received quantization indices for a gain, to provide a decoded audio signal.
With reference to
With reference to
In the following, an example of an embodiment of a quantizer unit 300 in an encoder unit 100 will be described with reference to
The I/O unit 330 may be interconnected to the processor 310 and/or the memory 320 via an I/O bus to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
In the following, an example of an embodiment of a de-quantizer unit 400 in a decoder 200 will be described with reference to
The I/O unit 430 may be interconnected to the processor 410 and/or the memory 420 via an I/O bus to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
In the following, an example of an embodiment of an encoder unit 100 will be described with reference to
The I/O unit 130 may be interconnected to the processor 110 and/or the memory 120 via an I/O bus to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
In the following, an example of an embodiment of a decoder unit 200 will be described with reference to
Software component 103 may implement the functionality of step S30 in the embodiment described with reference to
The I/O unit 230 may be interconnected to the processor 210 and/or the memory 220 via an I/O bus to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
At least some of the steps, functions, procedures, and/or blocks described above may be implemented in software for execution by a suitable processing device, such as a microprocessor, Digital Signal Processor (DSP) and/or any suitable programmable logic device, such as a Field Programmable Gate Array (FPGA) device.
It should also be understood that it might be possible to re-use the general processing capabilities of the network nodes. For example, this may be performed by reprogramming the existing software or by adding new software components.
The software may be realized as a computer program product, which is normally carried on a computer-readable medium. The software may thus be loaded into the operating memory of a computer for execution by the processor of the computer. The computer/processor does not have to be dedicated to only execute the above-described steps, functions, procedures, and/or blocks, but may also execute other software tasks.
The technology described above is intended to be used in an audio encoder and decoder, which can be used in a mobile device (e.g. mobile phone, laptop) or a stationary PC. However, it can be equally adapted to be used in an image encoder and decoder.
The presented quantization scheme allows low-complexity scalable coding of received signals, in particular but not limited to HB audio signals. In particular, it enables an efficient and low cost utilization of variable bit rate schemes within a fixed bit rate framework. In this way, it overcomes the limitations of quantization in e.g. the conventional BWE schemes in the time domain as well as MDCT schemes in the frequency domain.
The embodiments described above are to be understood as a few illustrative examples. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present embodiments. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible. The scope of the present invention is, however, defined by the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2012/072491 | 11/13/2012 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61659605 | Jun 2012 | US |