This application is a 371 National Entry of PCT/EP2007/001730 filed 28 Feb. 2007, which claims priority to German Patent Application No. 102006022346.2 filed 12 May 2006.
The present invention relates to information signal encoding, such as audio or video encoding.
The usage of digital audio encoding in new communication networks as well as in professional audio productions for bi-directional real-time communication necessitates an algorithmically inexpensive encoding as well as a very short encoding delay. A typical scenario in which the delay of digital audio encoding becomes critical exists when direct, i.e. unencoded, and transmitted, i.e. encoded and decoded, signals are used simultaneously. Examples of this are live productions using cordless microphones and simultaneous (in-ear) monitoring, or "scattered" productions where artists play simultaneously in different studios. The tolerable overall delay in these applications is less than 10 ms. If, for example, asymmetrical subscriber lines are used for communication, the bit rate is an additional limiting factor.
The algorithmic delay of standard audio encoders, such as MPEG-1 Layer 3 (MP3), MPEG-2 AAC and MPEG-2/4 Low Delay, ranges from 20 ms to several hundred milliseconds, wherein reference is made, for example, to the article M. Lutzky, G. Schuller, M. Gayer, U. Kraemer, S. Wabnik: "A guideline to audio codec delay", presented at the 116th AES Convention, Berlin, May 2004. Voice encoders operate at lower bit rates and with less algorithmic delay, but provide merely a limited audio quality.
The above outlined gap between the standard audio encoders on the one hand and the voice encoders on the other hand is, for example, closed by a type of encoding scheme described in the article B. Edler, C. Faller and G. Schuller, "Perceptual Audio Coding Using a Time-Varying Linear Pre- and Postfilter", presented at the 109th AES Convention, Los Angeles, September 2000. According to this scheme, the signal to be encoded is filtered on the encoder side with the inverse of the masking threshold and is subsequently quantized to perform irrelevance reduction, and the quantized signal is supplied to an entropy encoding that performs redundancy reduction separately from the irrelevance reduction, while on the decoder side the quantized prefiltered signal is reconstructed and filtered in a postfilter having the masking threshold as transfer function. Such an encoding scheme, referred to as ULD (Ultra Low Delay) encoding scheme below, results in a perceptual quality comparable to standard audio encoders, such as MP3, for bit rates of approximately 80 kBit/s per channel and higher. An encoder of this type is, for example, also described in WO 2005/078703 A1.
Particularly, the ULD encoders described there use psychoacoustically controlled linear filters for shaping the quantizing noise. Due to their structure, the quantizing noise lies at the given threshold, even when no signal is present in a given frequency range. The noise remains inaudible as long as it corresponds to the psychoacoustic masking threshold. For obtaining a bit rate that is even smaller than the bit rate predetermined by this threshold, the quantizing noise has to be increased, which makes the noise audible. Particularly, the noise becomes audible in ranges without signal portions. Examples of this are very low and very high audio frequencies. Normally, there are only very low signal portions in these ranges, while the masking threshold is high. If the masking threshold is increased uniformly across the whole frequency range, the quantizing noise lies at the increased threshold even where there is no signal, so that the quantizing noise becomes audible as a spurious-sounding signal. Subband-based encoders do not have this problem, since they simply quantize subbands having signals smaller than the threshold to zero.
The above-mentioned problem, which occurs when the allowed bit rate falls below the minimum bit rate that is determined by the masking threshold and that still causes no spurious quantizing noise, is not the only one. Further, the ULD encoders described in the above references suffer from a complex procedure for obtaining a constant data rate, particularly since an iteration loop is used, which has to be passed in order to determine, per sample block, an amplification factor value effectively adjusting the quantizing step size.
According to an embodiment, an apparatus for encoding an information signal into an encoded information signal may have a means for determining a representation of a psycho-perceptibility motivated threshold, which indicates a portion of the information signal irrelevant with regard to perceptibility, by using a perceptual model; a means for filtering the information signal for normalizing the information signal with regard to the psycho-perceptibility motivated threshold, for obtaining a prefiltered signal; a means for predicting the prefiltered signal in a forward-adaptive manner to obtain a predicted signal, a prediction error for the prefiltered signal and a representation of prediction coefficients, based on which the prefiltered signal can be reconstructed; and a means for quantizing the prediction error for obtaining a quantized prediction error, wherein the encoded information signal comprises information about the representation of the psycho-perceptibility motivated threshold, the representation of the prediction coefficients and the quantized prediction error.
According to another embodiment, an apparatus for decoding an encoded information signal comprising information about a representation of a psycho-perceptibility motivated threshold, a representation of prediction coefficients and a quantized prediction error into a decoded information signal may have a means for dequantizing the quantized prediction error for obtaining a dequantized prediction error; a means for determining a predicted signal based on the prediction coefficients; a means for reconstructing a prefiltered signal based on the predicted signal and the dequantized prediction error; and a means for filtering the prefiltered signal for reconverting a normalization with regard to the psycho-perceptibility motivated threshold for obtaining the decoded information signal.
According to another embodiment, a method for encoding an information signal into an encoded information signal may have the steps of: using a perceptibility model, determining a representation of a psycho-perceptibility motivated threshold indicating a portion of the information signal irrelevant with regard to perceptibility; filtering the information signal for normalizing the information signal with regard to the psycho-perceptibility motivated threshold for obtaining a prefiltered signal; predicting the prefiltered signal in a forward-adaptive manner to obtain a predicted signal, a prediction error for the prefiltered signal and a representation of prediction coefficients, based on which the prefiltered signal can be reconstructed; and quantizing the prediction error to obtain a quantized prediction error, wherein the encoded information signal comprises information about the representation of the psycho-perceptibility motivated threshold, the representation of the prediction coefficients and the quantized prediction error.
According to another embodiment, a method for decoding an encoded information signal comprising information about a representation of a psycho-perceptibility motivated threshold, a representation of prediction coefficients and a quantized prediction error into a decoded information signal may have the steps of: dequantizing the quantized prediction error to obtain a dequantized prediction error; determining a predicted signal based on the prediction coefficients; reconstructing a prefiltered signal based on the predicted signal and the dequantized prediction error; and filtering the prefiltered signal for reconverting a normalization with regard to the psycho-perceptibility motivated threshold to obtain the decoded information signal.
Another embodiment may have a computer program with a program code for performing the inventive methods when the computer program runs on a computer.
According to another embodiment, an encoder may have an information signal input; a perceptibility threshold determiner operating according to a perceptibility model having an input coupled to the information signal input and a perceptibility threshold output; an adaptive prefilter comprising a filter input coupled to the information signal input, a filter output and an adaptation control input coupled to the perceptibility threshold output; a forward prediction coefficient determiner comprising an input coupled to the prefilter output and a prediction coefficient output; a first subtractor comprising a first input coupled to the prefilter output, a second input and an output; a clipping and quantizing stage comprising a limited and constant number of quantizing levels, an input coupled to the subtractor output, a quantizing step size control input and an output; a step size adjuster comprising an input coupled to the output of the clipping and quantizing stage and a quantizing step size output coupled to the quantizing step size control input of the clipping and quantizing stage; a dequantizing stage comprising an input coupled to the output of the clipping and quantizing stage and a dequantizer output; an adder comprising a first adder input coupled to the dequantizer output, a second adder input and an adder output; a prediction filter comprising a prediction filter input coupled to the adder output, a prediction filter output coupled to the second subtractor input as well as to the second adder input, as well as a prediction coefficient input coupled to the prediction coefficient output; and an information signal generator comprising a first input coupled to the perceptibility threshold output, a second input coupled to the prediction coefficient output, a third input coupled to the output of the clipping and quantizing stage and an output representing an encoder output.
According to another embodiment, a decoder for decoding an encoded information signal comprising information about a representation of a psycho-perceptibility motivated threshold, prediction coefficients and a quantized prediction error into a decoded information signal may have a decoder input; an extractor comprising an input coupled to the decoder input, a perceptibility threshold output, a prediction coefficient output and a quantized prediction error output; a dequantizer comprising a limited and constant number of quantizing levels, a dequantizer input coupled to the quantized prediction error output, a dequantizer output and a quantizing step size control input; a backward-adaptive step size adjuster comprising an input coupled to the quantized prediction error output and an output coupled to the quantizing step size control input; an adder comprising a first adder input coupled to the dequantizer output, a second adder input and an adder output; a prediction filter comprising a prediction filter input coupled to the adder output, a prediction filter output coupled to the second adder input, and a prediction filter coefficient input coupled to the prediction coefficient output; and an adaptive postfilter comprising a postfilter input coupled to the adder output, a postfilter output representing a decoder output, and an adaptation control input coupled to the perceptibility threshold output.
The central idea of the present invention is the finding that extremely coarse quantization, exceeding the measure determined by the masking threshold, is made possible with no or only very little quality loss by not quantizing the prefiltered signal directly, but rather a prediction error obtained by forward-adaptive prediction of the prefiltered signal. Due to the forward adaptivity, the quantizing error has no negative effect on the prediction coefficients.
According to a further embodiment, the prefiltered signal is even quantized in a nonlinear manner or even clipped, i.e. quantized via a quantizing function which maps the unquantized values of the prediction error to quantizing indices of quantizing stages and whose course is steeper below a threshold than above it. Thereby, the noise PSD, which is increased in relation to the masking threshold due to the low available bit rate, adjusts to the signal PSD, so that a violation of the masking threshold does not occur at spectral parts without signal portion, which further improves or at least maintains the listening quality despite a decreasing available bit rate.
According to a further embodiment of the present invention, the quantization is even limited by clipping, namely by quantizing to a limited and fixed number of quantizing levels or stages, respectively. By predicting the prefiltered signal via forward-adaptive prediction, the coarse quantization has no negative effect on the prediction coefficients themselves. Quantizing to a fixed number of quantizing levels inherently avoids an iteration for obtaining a constant bit rate.
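Merely as a numerical illustration of this point (the parameter values below are hypothetical and not taken from the embodiments), the payload bit rate follows directly from the fixed number of quantizing levels, without any rate-control iteration:

```python
import math

# Hypothetical example values, not prescribed by the embodiments:
sample_rate_hz = 32000                                # audio sample rate
c = 3                                                 # clipping constant
num_levels = 2 * c + 1                                # 7 allowed quantizing levels
bits_per_sample = math.ceil(math.log2(num_levels))    # fixed-length index coding: 3 bits

# Every residual sample maps to one of a fixed set of levels, so the payload
# bit rate is constant by construction (side information comes on top).
payload_bitrate = sample_rate_hz * bits_per_sample    # 96000 bit/s
print(payload_bitrate)
```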
According to a further embodiment of the present invention, a quantizing step size or stage height, respectively, between the fixed number of quantizing levels is determined in a backward-adaptive manner from previous quantizing level indices obtained by quantization, so that, on the one hand, despite a very low number of quantizing levels, the best possible quantization of the prediction error or residual signal, respectively, can be obtained without having to provide further side information to the decoder side. On the other hand, with appropriate configuration of the backward-adaptive step size adjustment, it is possible to ensure that transmission errors during transmission of the quantized residual signal to the decoder side have only a short-term effect on the decoder side.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
a/b are graphs showing, by way of example, the course of the noise spectrum in relation to the masking threshold and the signal power spectral density, for the case of the encoder according to claim 1 (graph a) and for a comparative case of an encoder with backward-adaptive prediction of the prefiltered signal and iterative, block-wise quantizing step size adjustment relative to the masking threshold (graph b), respectively;
Before embodiments of the present invention are discussed in more detail with reference to the drawings, a possible implementation of a ULD-type encoding scheme will first be discussed as a comparative example, for a better understanding of the advantages and principles of these embodiments. Based on this comparative example, the essential advantages and considerations underlying the subsequent embodiments, which have finally led to these embodiments, can be illustrated more clearly.
As has already been described in the introduction of the description, there is a need for a ULD version for lower bit rates of, for example, 64 kBit/s, with comparable perceptual quality, as well as for a simpler scheme for obtaining a constant bit rate, particularly for the intended lower bit rates. Additionally, it would be advantageous if the recovery time after a transmission error remained low or at a minimum.
For redundancy reduction of the psychoacoustically preprocessed signal, the comparison ULD encoder uses a sample-wise backward-adaptive closed-loop prediction. This means that the calculation of the prediction coefficients in encoder and decoder is based merely on past, i.e. already quantized and reconstructed, signal samples. For obtaining an adaptation to the signal or the prefiltered signal, respectively, a new set of predictor coefficients is calculated anew for every sample. This has the advantage that long predictors or prediction value determination formulas, i.e. particularly predictors having a high number of predictor coefficients, can be used, since there is no requirement to transmit the predictor coefficients from the encoder to the decoder side. On the other hand, this means that the quantized prediction error has to be transmitted to the decoder without accuracy losses in order to obtain prediction coefficients that are identical to those underlying the encoding process. Otherwise, the predicted values in the encoder and decoder would not be identical to each other, which would cause an unstable encoding process. Further, in the comparison ULD encoder, a periodic reset of the predictor both on the encoder and the decoder side is necessitated to allow selective access to the encoded bit stream as well as to stop a propagation of transmission errors. However, the periodic resets cause bit rate peaks, which present no problem for a channel with variable bit rate, but do for channels with fixed bit rate, where the bit rate peaks determine the lower limit of a constant bit rate adjustment.
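To make the dependence on lossless residual transmission concrete, the following toy sketch mimics a sample-wise backward-adaptive closed-loop predictor; the LMS update rule and all parameter values are assumptions for illustration only, not the adaptation actually used in the comparison ULD encoder. It shows that encoder and decoder arrive at identical coefficients as long as the decoder receives exactly the quantized residual indices:

```python
import numpy as np

def predict(coeffs, past):
    # Linear prediction from the most recent reconstructed samples.
    return float(np.dot(coeffs, past))

def encode_decode(x, order=8, mu=0.01, delta=0.05):
    """Toy sample-wise backward-adaptive closed-loop predictor with an LMS
    coefficient update (illustrative only). Both sides adapt from quantities
    the decoder also has: the reconstructed samples and the quantized residual."""
    enc_c, dec_c = np.zeros(order), np.zeros(order)
    enc_past, dec_past = np.zeros(order), np.zeros(order)
    decoded = np.zeros(len(x))
    for n, xn in enumerate(x):
        # encoder
        p = predict(enc_c, enc_past)
        idx = int(np.round((xn - p) / delta))          # quantized residual index (transmitted)
        rec = p + idx * delta                          # encoder-side reconstruction
        enc_c = enc_c + mu * (idx * delta) * enc_past  # update from decodable data only
        enc_past = np.concatenate(([rec], enc_past[:-1]))
        # decoder (sees only idx)
        pd = predict(dec_c, dec_past)
        recd = pd + idx * delta
        dec_c = dec_c + mu * (idx * delta) * dec_past
        dec_past = np.concatenate(([recd], dec_past[:-1]))
        decoded[n] = recd
    return decoded, enc_c, dec_c

x = np.sin(2 * np.pi * 0.01 * np.arange(2000))
_, enc_c, dec_c = encode_decode(x)
assert np.array_equal(enc_c, dec_c)   # identical coefficients on both sides
```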
As will become clear from the subsequent more detailed comparison of the ULD comparison encoding scheme with the embodiments of the present invention, these embodiments differ from the comparison encoding scheme by using a block-wise forward-adaptive prediction with a backward-adaptive quantizing step size adjustment instead of a sample-wise backward-adaptive prediction. On the one hand, this has the disadvantage that the predictors should be shorter in order to limit the amount of side information necessitated for transmitting the prediction coefficients to the decoder side, which might result in reduced encoder efficiency; on the other hand, this has the advantage that the procedure of the subsequent embodiments still functions effectively for higher quantizing errors, which result from reduced bit rates, so that the predictor on the decoder side can be used for quantizing noise shaping.
As will also become clear from the subsequent comparison, compared to the comparison ULD encoder, the bit rate is limited by limiting the range of values of the prediction remainder prior to transmission. This results in a noise shaping modified compared to the comparison ULD encoding scheme, and also leads to different and less spurious listening artifacts. Further, a constant bit rate is generated without using iterative loops. Further, a "reset" is inherently included for every sample block as a result of the block-wise forward adaptation. Additionally, in the embodiments described below, an encoding scheme is used for the prefilter coefficients and the forward prediction coefficients, which uses difference encoding with backward-adaptive quantizing step size control for an LSF (line spectral frequency) representation of the coefficients. The scheme provides block-wise access to the coefficients, generates a constant side information bit rate and is, moreover, robust against transmission errors, as will be described below.
In the following, the comparison ULD encoder and decoder structure will be described in more detail, followed by the description of embodiments of the present invention and the illustration of their advantages in the transition from higher constant bit rates to lower bit rates.
In the comparison ULD encoding scheme, the input signal of the encoder is analyzed on the encoder side by a perceptual model or listening model, respectively, for obtaining information about the perceptually irrelevant portions of the signal. This information is used to control a prefilter via time-varying filter coefficients. Thereby, the prefilter normalizes the input signal with regard to its masking threshold. The filter coefficients are calculated once for every block of 128 samples, quantized and transmitted to the decoder side as side information.
After multiplication of the prefiltered signal with an amplification factor and subtraction of the backward-adaptively predicted signal, the prediction error is quantized by a uniform quantizer, i.e. a quantizer with uniform step size. As already mentioned above, the predicted signal is obtained via a sample-wise backward-adaptive closed-loop prediction. Accordingly, no transmission of prediction coefficients to the decoder is necessitated. Subsequently, the quantized prediction residual signal is entropy-encoded. For obtaining a constant bit rate, a loop is provided which repeats the steps of multiplication, prediction, quantizing and entropy encoding several times for every block of prefiltered samples. After the iteration, the highest amplification factor of a set of predetermined amplification values is determined which still fulfills the constant bit rate condition. This amplification value is transmitted to the decoder. If, however, an amplification value smaller than one is determined, the quantizing noise is perceptible after decoding, i.e. its spectrum is shaped similarly to the masking threshold, but its overall power is higher than predetermined by the perceptual model. For portions of the input signal spectrum, the quantizing noise can even become higher than the input signal spectrum itself, which, due to the usage of a predictive encoder, generates audible artifacts in portions of the spectrum where otherwise no audible signal would be present. The effects caused by the quantizing noise represent a limiting factor when lower constant bit rates are of interest.
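The structure of such a rate-control iteration can be sketched as follows; the entropy coder is replaced by a simple stand-in length estimate and the prediction step is omitted, so this reproduces only the shape of the loop, not the exact comparison ULD encoder:

```python
import numpy as np

def code_length_bits(indices):
    """Stand-in for the entropy coder of the comparison scheme: a simple
    Exp-Golomb-like length estimate, NOT the actual ULD entropy coder."""
    mags = np.abs(indices)
    return int(np.sum(2 * np.floor(np.log2(2 * mags + 1)) + 1))

def choose_gain(prefiltered_block, candidate_gains, bit_budget, delta=1.0):
    """Pick the largest amplification factor whose coded residual still fits
    the bit budget. Prediction is omitted and the quantizer is a plain uniform
    rounder, so only the structure of the iteration is reproduced here."""
    for g in sorted(candidate_gains, reverse=True):
        indices = np.round(g * np.asarray(prefiltered_block) / delta).astype(int)
        if code_length_bits(indices) <= bit_budget:
            return g, indices
    g = min(candidate_gains)
    return g, np.round(g * np.asarray(prefiltered_block) / delta).astype(int)
```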
Continuing with the description of the comparison ULD scheme, the prefilter coefficients are merely transmitted as intraframe LSF differences, and only when the same exceed a certain limit. For avoiding transmission error propagation for an unlimited period, the system is reset from time to time. Additional techniques can be used for minimizing the perceptual degradation of the decoded signal in the case of transmission errors. This transmission scheme generates a variable side information bit rate, which is leveled out in the above-described loop by adjusting the above-mentioned amplification factor accordingly.
The entropy encoding of the quantized prediction residual signal in the case of the comparison ULD encoder comprises methods such as Golomb, Huffman or arithmetic encoding. The entropy encoding has to be reset from time to time and inherently generates a variable bit rate, which is again leveled out by the above-mentioned loop.
In the case of the comparison ULD encoding scheme, the quantized prediction residual signal in the decoder is obtained by entropy decoding, whereupon the prediction remainder and the predicted signal are added, the sum is multiplied with the inverse of the transmitted amplification factor, and the reconstructed output signal is generated therefrom via the postfilter, which has a frequency response inverse to that of the prefilter and uses the transmitted prefilter coefficients.
A comparison ULD encoder of the type just described obtains, for example, an overall encoder/decoder delay of 5.33 to 8 ms at sample frequencies of 32 kHz to 48 kHz. Without the above-mentioned loop iterations, the same generates bit rates in the range of 80 to 96 kBit/s. As described above, at lower constant bit rates the listening quality of this encoder is decreased due to the uniform increase of the noise spectrum. Additionally, due to the iterations, the effort for obtaining a constant bit rate is high. The embodiments described below overcome or minimize these disadvantages. At a constant transmission data rate, the encoding scheme of the embodiments described below causes an altered noise shaping of the quantizing error and necessitates no iteration. More precisely, in the above-discussed comparison ULD encoding scheme, in the case of a constant transmission data rate, a multiplicator is determined in an iterative process, with the help of which the signal coming from the prefilter is multiplied prior to quantizing. The quantizing noise is spectrally white, which causes a quantizing noise in the decoder that is shaped like the masking threshold but lies slightly below or slightly above the masking threshold, depending on the selected multiplicator, which can, as described above, also be interpreted as a shift of the determined masking threshold. In connection therewith, a quantizing noise results after decoding whose power in individual frequency ranges can even exceed the power of the input signal in the respective frequency range. The resulting encoding artifacts are clearly audible. The embodiments described below shape the quantizing noise such that its spectral power density is no longer spectrally white. The coarse quantizing/limiting or clipping, respectively, of the prefiltered signal rather shapes the resulting quantizing noise similarly to the spectral power density of the prefiltered signal. Thereby, the quantizing noise in the decoder is shaped such that it remains below the spectral power density of the input signal. This can be interpreted as a deformation of the determined masking threshold. The resulting encoding artifacts are less spurious than in the comparison ULD encoding scheme. Further, the subsequent embodiments necessitate no iteration process, which reduces complexity.
Since the above description of the comparison ULD encoding scheme has provided a sufficient basis for turning the attention to the advantages and considerations underlying the following embodiments, the structure of an encoder according to an embodiment of the present invention will first be described below.
The encoder of
As shown in
The prefilter or preestimation means 18 is coupled to both the masking threshold determination means 16 and the input 12 and filters the input signal for normalizing the same with regard to the masking threshold, for obtaining a prefiltered signal f(n). The prefilter means 18 is based, for example, on a linear filter and is implemented to adjust the filter coefficients in dependence on the representation of the masking threshold provided by the masking threshold determination means 16, such that the transfer function of the linear filter corresponds substantially to the inverse of the masking threshold. The adjustment of the filter coefficients can be performed block-wise, half-block-wise, such as in the case, described below, of the blocks overlapping by half in the masking threshold determination, or sample-wise, for example by interpolating, across the inter-block gaps, the filter coefficients obtained from the block-wise determined masking threshold representations or filter coefficients derived therefrom.
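A common way to obtain a time-domain filter whose transfer function approximates the inverse of a given masking threshold is to fit an all-pole model to the threshold's power spectrum via the Levinson-Durbin recursion and to use the corresponding FIR analysis filter as prefilter; the following sketch illustrates this idea under that assumption and is not necessarily the filter design used by the prefilter means 18:

```python
import numpy as np

def levinson_durbin(autocorr, order):
    """Levinson-Durbin recursion: returns coefficients a (with a[0] = 1) of A(z)
    such that 1/|A(e^jw)|^2 approximates the modeled power spectrum."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = autocorr[0]
    for i in range(1, order + 1):
        acc = autocorr[i] + np.dot(a[1:i], autocorr[i - 1:0:-1])
        k = -acc / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1.0 - k * k)
    return a

def prefilter_from_threshold(threshold_power_spectrum, order=12):
    """Derive FIR prefilter coefficients from a masking-threshold power
    spectrum (rfft-style, length N/2+1, strictly positive). Filtering the
    input with A(z) then approximates division by the threshold magnitude
    (up to an overall gain, which is ignored in this sketch)."""
    spec = np.asarray(threshold_power_spectrum, dtype=float)
    full_spec = np.concatenate([spec, spec[-2:0:-1]])     # restore spectral symmetry
    autocorr = np.real(np.fft.ifft(full_spec))            # autocorrelation of the threshold
    return levinson_durbin(autocorr, order)

# usage sketch: a = prefilter_from_threshold(thr); f = np.convolve(x_block, a)[:len(x_block)]
```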
The forward prediction means 20 is coupled to the prefilter means 18 for subjecting the samples f(n) of the prefiltered signal, which have been filtered adaptively in the time domain by using the psychoacoustic masking threshold, to a forward-adaptive prediction, for obtaining a predicted signal f̂(n), a residual signal r(n) representing a prediction error with respect to the prefiltered signal f(n), and a representation of prediction filter coefficients based on which the predicted signal can be reconstructed. Particularly, the forward-adaptive prediction means 20 is implemented to determine the representation of the prediction filter coefficients directly from the prefiltered signal f and not merely based on a subsequent quantization of the residual signal r. Although, as will be discussed in more detail below with reference to
The quantizing/clipping means 22 is coupled to the prediction means 20 for quantizing or clipping, respectively, the residual signal via a quantizing function mapping the values r(n) of the residual signal to a constant and limited number of quantizing levels, and for passing the quantized residual signal obtained in this way, in the form of the quantizing indices ic(n), to the forward-adaptive prediction means 20, as has already been mentioned.
The quantized residual signal ic(n), the representation of the prediction coefficients determined by the means 20, as well as the representation of the masking threshold determined by the means 16 make up the information provided to the decoder side via the encoded signal 14, wherein, for this purpose, the bit stream generation means 24 is provided exemplarily in
Before the more detailed structure of the encoder of
In the following, the structure of the encoder in
After the detailed structure of the encoder of
In the prediction coefficient calculation module 36, the samples f(n) of the prefiltered signal are processed in a block-wise manner, wherein the block-wise division can, for example, correspond to the block-wise division of the audio signal 12 by the perceptual model module 26, but does not have to. For every block of prefiltered samples, the coefficient calculation module 36 calculates prediction coefficients for use by the prediction filter 44. To this end, the coefficient calculation module 36 performs, for example, an LPC (linear predictive coding) analysis per block of the prefiltered signal for obtaining the prediction coefficients. The coefficient encoder 38 then encodes the prediction coefficients similarly to the coefficient encoder 30, as will be discussed in more detail below, and outputs this representation of the prediction coefficients to the bit stream generator 24 and particularly to the coefficient decoder 40, wherein the latter uses the obtained prediction coefficient representation for applying the prediction coefficients obtained in the LPC analysis by the coefficient calculation module 36 to the linear filter 44, so that the closed-loop predictor consisting of the closed loop of filter 44, delay member 46 and adder 48 generates the predicted signal f̂(n), which is then subtracted from the prefiltered signal f(n) by the subtractor 42. The linear filter 44 is, for example, a linear prediction filter of the form A(z)=Σ_{i=1..N} a_i·z^(−i) of length N, wherein the coefficient decoder 40 adjusts the values a_i in dependence on the prediction coefficients calculated by the coefficient calculation module 36, i.e. the weightings with which the previous predicted values f̂(n) plus the dequantized residual signal values are weighted and then summed for obtaining the new or current, respectively, predicted value f̂(n).
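A minimal sketch of this block-wise forward-adaptive prediction, assuming a standard autocorrelation-method LPC analysis and leaving out the coefficient encoding/decoding by the elements 38 and 40, could look as follows:

```python
import numpy as np

def lpc_coefficients(block, order):
    """Autocorrelation-method LPC for one block of the prefiltered signal
    (a standard choice; the embodiment merely requires some LPC analysis)."""
    r = np.array([np.dot(block[:len(block) - k], block[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-9 * np.eye(order), r[1:])   # predictor taps a_1 .. a_N

def closed_loop_prediction(block, a, quantize):
    """Closed loop of filter 44, delay member 46 and adder 48: the predictor
    runs on reconstructed values (previous prediction plus dequantized
    residual), exactly as the decoder will, and the subtractor 42 forms r(n)."""
    recon_hist = np.zeros(len(a))                   # most recent reconstructed value first
    residual_q = np.zeros(len(block))
    for n, f_n in enumerate(block):
        f_hat = float(np.dot(a, recon_hist))        # predicted value f^(n)
        residual_q[n] = quantize(f_n - f_hat)       # quantized/clipped residual (dequantized form)
        recon = f_hat + residual_q[n]               # adder 48
        recon_hist = np.concatenate(([recon], recon_hist[:-1]))
    return residual_q

# usage sketch with a 15-level clipping quantizer of step 0.5 (placeholder values):
# a = lpc_coefficients(f_block, order=8)
# rq = closed_loop_prediction(f_block, a, lambda e: 0.5 * float(np.clip(np.round(e / 0.5), -7, 7)))
```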
The prediction remainder r(n) obtained by the subtractor 42 is subjected to uniform quantization, i.e. quantization with a uniform quantizing step size, in the quantizer 56, wherein the step size Δ(n) is time-variable and is calculated or determined, respectively, by the step size adaption module in a backward-adaptive manner, i.e. from the quantized residual values for the previous residual values r(m), m<n. More precisely, the uniform quantizer 56 outputs a quantized residual value q(n) per residual value r(n), which can be expressed as q(n)=i(n)·Δ(n), wherein i(n) is referred to as the provisional quantizing index. The provisional quantizing index i(n) is then clipped by the limiter 58 to the set C=[−c; c], wherein c is a constant with c∈{1, 2, . . . }. Particularly, the limiter 58 is implemented such that all provisional index values i(n) with |i(n)|>c are set to either −c or c, depending on which is closer. Only the clipped or limited, respectively, index sequence ic(n) is output by the limiter 58 to the bit stream generator 24, the dequantizer 50 and the step size adaption block 54 or the delay element 62, respectively, wherein the delay member 62, as well as all other delay members in the present embodiments, delays the incoming values by one sample.
Now, the backward-adaptive step size control is realized via the step size adaption block 54 in that the same uses past index sequence values ic(n), delayed by the delay member 62, for constantly adapting the step size Δ(n), such that the range limited by the limiter 58, i.e. the range set by the "allowed" quantizing indices or the corresponding quantizing levels, respectively, is matched to the statistical probability of occurrence of the unquantized residual values r(n) in such a way that the allowed quantizing levels occur as uniformly as possible in the generated clipped quantizing index sequence ic(n). Particularly, the step size adaption module 60 calculates the current step size Δ(n), for example by using the two immediately preceding clipped quantizing indices ic(n−1) and ic(n−2) as well as the immediately previously determined step size value Δ(n−1), as Δ(n)=β·Δ(n−1)+δ(n), with β∈[0.0; 1.0), δ(n)=δ0 for |ic(n−1)+ic(n−2)|≤I and δ(n)=δ1 for |ic(n−1)+ic(n−2)|>I, wherein δ0, δ1, I and β are appropriately adjusted constants.
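Restated as a sketch (the constants β, δ0, δ1, I and the initial step size below are placeholders, not values from the embodiments), the quantizer 56, the limiter 58 and the backward-adaptive step size rule work together as follows; the decoder can run the identical rule on the received indices, so no step size needs to be transmitted:

```python
import numpy as np

def quantize_clip_adaptive(residual, c=7, delta_init=0.1,
                           beta=0.98, delta0=0.0, delta1=0.02, i_thresh=1):
    """Uniform quantizer 56 plus limiter 58 with the backward-adaptive rule
    Delta(n) = beta*Delta(n-1) + delta(n) stated above (i_thresh plays the
    role of the constant I). All constant values are illustrative placeholders."""
    delta = delta_init
    ic1 = ic2 = 0                                 # ic(n-1), ic(n-2)
    indices = np.zeros(len(residual), dtype=int)
    for n, r in enumerate(residual):
        d = delta0 if abs(ic1 + ic2) <= i_thresh else delta1
        delta = max(beta * delta + d, 1e-6)       # adapt before quantizing sample n
        i = int(np.round(r / delta))              # uniform quantizer 56
        i = max(-c, min(c, i))                    # limiter 58: clip index to [-c, c]
        indices[n] = i
        ic2, ic1 = ic1, i
    return indices
```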
As will be discussed in more detail below with reference to
The quantizing noise introduced into the quantized residual signal qc(n) is no longer white due to the clipping. Rather, its spectral form follows that of the prefiltered signal. For illustrating this, reference is briefly made to
The above description of the mode of operation of the encoder of
The following description deals with the transmission of the prefilter or prediction coefficients, respectively, calculated by the coefficient calculation modules 28 and 36 to the decoder side, i.e. particularly with an embodiment for the structure of the coefficient encoders 30 and 38.
As is shown, the coefficient encoders according to the embodiment of
An input of the LSF conversion module 102 is directly connected to the input 124. The subtractor 104 is connected, with its non-inverting input and its output, between the output of the LSF conversion module 102 and a first input of the subtractor 106, wherein a constant lc is applied to the inverting input of the subtractor 104. The subtractor 106 is connected, with its non-inverting input and its output, between the subtractor 104 and the quantizer 108, wherein its inverting input is coupled to an output of the prediction filter 120. Together with the delay member 118 and the adder 114, the prediction filter 120 forms a closed-loop predictor, in which the same are connected in series in a loop with feedback, such that the delay member 118 is connected between the output of the adder 114 and the input of the prediction filter 120, and the output of the prediction filter 120 is connected to a first input of the adder 114. The remaining structure corresponds again mainly to that of the means 22 of the encoder 10, i.e. the quantizer 108 is connected between the output of the subtractor 106 and the input of the limiter 110, whose output is in turn connected to the output 126, an input of the delay member 116 and an input of the dequantizer 112. The output of the delay member 116 is connected to an input of the step size adaption module 122, which together thus form a step size adaption block. An output of the step size adaption module 122 is connected to step size control inputs of the quantizer 108 and the dequantizer 112. The output of the dequantizer 112 is connected to the second input of the adder 114.
After the structure of the coefficient encoder has been described above, its mode of operation will be described below, wherein reference is made again to
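A rough sketch of this mode of operation is given below; the LSF conversion by the module 102 is taken as given, the prediction filter 120 is replaced by a simple first-order (difference-type) predictor, and all constants are assumed placeholder values:

```python
import numpy as np

def encode_lsf_block(lsf_values, lc=0.5, c=3, beta=0.98, delta1=0.01,
                     i_thresh=1, delta_init=0.05):
    """Coefficient encoder sketch (elements 104-122): subtract the constant lc,
    subtract a closed-loop prediction (here a simple first-order predictor
    stands in for prediction filter 120), quantize, clip and adapt the step
    size backward-adaptively. All constants are illustrative placeholders."""
    delta = delta_init
    ic1 = ic2 = 0
    prev_recon = 0.0                                  # state of delay member 118
    indices = np.zeros(len(lsf_values), dtype=int)
    for n, l in enumerate(lsf_values):
        d = 0.0 if abs(ic1 + ic2) <= i_thresh else delta1
        delta = max(beta * delta + d, 1e-6)           # step size adaption module 122
        target = l - lc                               # subtractor 104
        pred = prev_recon                             # prediction filter 120 (placeholder)
        idx = int(np.clip(np.round((target - pred) / delta), -c, c))  # 106, 108, 110
        indices[n] = idx
        prev_recon = pred + idx * delta               # dequantizer 112 + adder 114
        ic2, ic1 = ic1, idx
    return indices
```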
If the two coefficient encoders 30 and 38 are implemented in the way described in
Before results of listening tests, which have been obtained by an encoder according to
The decoder generally indicated by 200 in
After the basic structure of the decoder of
As has already been mentioned, the extractor 214 extracts the quantizing indices ic(n) representing the quantized prefilter residual signal from the encoded data stream at the input 202. In the uniform dequantizer 220, these quantizing indices are dequantized to the quantized residual values qc(n). Inherently, this dequantizing remains within the allowed quantizing levels, since the quantizing indices ic(n) have already been clipped on the encoder side. The step size adaption is performed in a backward-adaptive manner, in the same way as in the step size adaption block 54 of the encoder of
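As a short sketch (with the same placeholder constants as on the encoder side, which must match there exactly), the dequantizer 220 together with its backward-adaptive step size adaptation can be written as:

```python
import numpy as np

def dequantize_residual(indices, delta_init=0.1, beta=0.98,
                        delta0=0.0, delta1=0.02, i_thresh=1):
    """Uniform dequantizer 220 with backward-adaptive step size: the identical
    adaptation rule and constants as on the encoder side are used so that both
    track the same Delta(n) from the received indices alone. The constant
    values are illustrative placeholders."""
    delta = delta_init
    ic1 = ic2 = 0
    q = np.zeros(len(indices))
    for n, idx in enumerate(indices):
        d = delta0 if abs(ic1 + ic2) <= i_thresh else delta1
        delta = max(beta * delta + d, 1e-6)
        q[n] = idx * delta                   # dequantized residual value qc(n)
        ic2, ic1 = ic1, idx
    return q
```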
Assuming that the encoder 10 is provided with coefficient encoders 30 and 38, which are implemented as described in
The LSF residual signal indices le(n) incoming at the input 318 are dequantized by the dequantizer 308, wherein the dequantizer 308 uses the step size values Δ(n) that have been determined in a backward-adaptive manner by the step size adaption module 306 from already processed quantizing indices, namely those delayed by one sample by the delay member 302. The adder 312 adds, to the dequantized LSF residual values, the predicted signal, which the combination of delay member 304 and prediction filter 310 calculates from sums that the adder 312 has already calculated previously and which thus represent the reconstructed LSF values, still offset by the constant lc. The latter is corrected by the adder 314 by adding the value lc to the LSF values output by the adder 312. Thus, at the output of the adder 314, the reconstructed LSF values result, which are converted by the module 316 from the LSF domain back to reconstructed prediction or prefilter coefficients, respectively. Therefore, the LSF reconversion module 316 considers all line spectral frequencies, whereas the discussion of the other elements of
After providing both encoder and decoder embodiments above, listening test results will be presented below based on
For the comparison ULD encoding scheme, a backward-adaptive prediction with a length of 64 has been used in the implementation, together with a backward-adaptive Golomb encoder for entropy encoding, with a constant bit rate of 64 kBit/s. In contrast, for implementing the encoder according to
The results of the MUSHRA listening tests are shown in
The piece es01 (Suzanne Vega) is a good example for the superiority of the encoding scheme according to
The signal transients of the piece sm02 (Glockenspiel) result in a high bit rate requirement for the comparison ULD encoding scheme. At the used 64 kBit/s, the comparison ULD encoding scheme generates spurious encoding artifacts across full blocks of samples. In contrast, the encoder operating according to
In summary, the above-described embodiments result in an audio encoding scheme with low delay which uses a block-wise forward-adaptive prediction together with clipping/limiting instead of a backward-adaptive sample-wise prediction. The noise shaping differs from that of the comparison ULD encoding scheme. The listening test has shown that the above-described embodiments are superior to the backward-adaptive method of the comparison ULD encoding scheme in the case of lower bit rates. Consequently, they are a candidate for closing the bit rate gap between high-quality voice encoders and audio encoders with low delay. Overall, the above-described embodiments provide a possibility for audio encoding schemes having a very low delay of 6-8 ms at reduced bit rates, which has the following advantages compared to the comparison ULD encoder: it is more robust against high quantizing errors, has additional noise shaping abilities, is better suited for obtaining a constant bit rate, and shows a better error recovery behavior. The problem of audible quantizing noise at positions without signal, as is the case in the comparison ULD encoding scheme, is addressed by the embodiments by a modified way of increasing the quantizing noise above the masking threshold, namely by adding the signal spectrum to the masking threshold instead of uniformly increasing the masking threshold by a certain degree. In that way, there is no audible quantizing noise at positions without signal.
In other words, the above embodiments differ from the comparison ULD encoding scheme in the following way. In the comparison ULD encoding scheme, a backward-adaptive prediction is used, which means that the coefficients for the prediction filter A(z) are updated on a sample-by-sample basis from previously decoded signal values. A quantizer having a variable step size is used, wherein the step size is adapted every 128 samples by using information from the entropy encoder and is transmitted as side information to the decoder side. By this procedure, the quantizing step size is increased, which adds more white noise to the prefiltered signal and thus uniformly increases the masking threshold. If, in the comparison ULD encoding scheme, the backward-adaptive prediction is replaced with a forward-adaptive block-wise prediction, which means that the coefficients for the prediction filter A(z) are calculated once per 128 samples from the unquantized prefiltered samples and transmitted as side information, and if the quantizing step size is adapted for the 128 samples by using information from the entropy encoder and transmitted as side information to the decoder side, the quantizing step size is still increased, as in the comparison ULD encoding scheme, but the predictor update is unaffected by any quantization. The above embodiments used only a forward-adaptive block-wise prediction, wherein additionally the quantizer had merely a given number 2N+1 of quantizing stages having a fixed step size. For prefiltered signal values x(n) with amplitudes outside the quantizer range [−NΔ; NΔ], the quantized signal was limited to [−NΔ; NΔ]. This results in a quantizing noise having a PSD which is no longer white, but copies the PSD of the input signal, i.e. of the prefiltered audio signal.
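The following small numerical experiment, which uses a synthetic signal and arbitrary parameter values rather than an actual prefiltered audio signal, illustrates this statement about the noise PSD:

```python
import numpy as np

def periodogram_db(x, nfft=1024):
    """Averaged periodogram (Hann window), in dB; enough for a qualitative look."""
    segs = x[:len(x) // nfft * nfft].reshape(-1, nfft) * np.hanning(nfft)
    return 10 * np.log10(np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) + 1e-12)

rng = np.random.default_rng(0)
w = rng.standard_normal(1 << 16)
h = 0.97 ** np.arange(300)                       # one-pole-like lowpass impulse response
x = np.convolve(w, h)[:len(w)]                   # stand-in for a prefiltered signal

def quantize(sig, delta, c=None):
    idx = np.round(sig / delta)
    if c is not None:
        idx = np.clip(idx, -c, c)                # limiter: only 2c+1 levels survive
    return idx * delta

noise_fine   = quantize(x, delta=0.05) - x       # fine uniform quantizer
noise_coarse = quantize(x, delta=0.5, c=2) - x   # coarse quantizer with clipping

sig_db, fine_db, coarse_db = map(periodogram_db, (x, noise_fine, noise_coarse))
# fine_db is approximately flat (white quantizing noise), whereas coarse_db
# falls off toward high frequencies roughly like sig_db: with clipping, the
# noise PSD follows the PSD of the (prefiltered) signal instead of being white.
```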
As a conclusion, the following is to be noted with regard to the above embodiments. First, it should be noted that different possibilities exist for transmitting the information about the representation of the masking threshold, as obtained by the perceptual model module 26, within the encoder to the prefilter 34 or prediction filter 44, respectively, and to the decoder, and there particularly to the postfilter 232 and the prediction filter 226. Particularly, it should be noted that it is not necessitated that the coefficient decoders 32 and 40 within the encoder receive exactly the same information with regard to the masking threshold as is output at the output 14 of the encoder and received at the input 202 of the decoder. Rather, it is possible that, for example in a structure of the coefficient encoder 30 according to
On this occasion, the following should be noted. Above, it has been described with reference to
In the above embodiments, it has been assumed that the masking threshold is calculated in the module 26. However, it should be noted that the calculated threshold does not have to correspond exactly to the psychoacoustic masking threshold, but can represent a more or less exact estimation of the same, which might not consider all psychoacoustic effects but merely some of them. Particularly, the threshold can represent a psychoacoustically motivated threshold which has been deliberately subjected to a modification relative to an estimation of the psychoacoustic masking threshold.
Further, it should be noted that the backward-adaptive adaption of the step size in quantizing the prefilter residual signal values does not necessarily have to be present. Rather, in certain application cases, a fixed step size can be sufficient.
Further, it should be noted that the present invention is not limited to the field of audio encoding. Rather, the signal to be encoded can also be a signal used for stimulating a fingertip in a cyberspace glove, wherein the perceptual model 26 in this case considers certain tactile characteristics which the human sense of touch can no longer perceive. Another example of an information signal to be encoded would be a video signal. Particularly, the information signal to be encoded could be brightness information of a pixel or image point, respectively, wherein the perceptual model 26 could then consider different temporal, spatial and frequency-related psychovisual masking effects, i.e. a visual masking threshold.
Additionally, it should be noted that the quantizer 56 and the limiter 58 or the quantizer 108 and the limiter 110, respectively, do not have to be separate components. Rather, the mapping of the unquantized values to the quantized/clipped values could also be performed by a single mapping. On the other hand, the quantizer 56 or the quantizer 108, respectively, could also be realized by a series connection of a divider followed by a quantizer with uniform and constant step size, wherein the divider would use the step size value Δ(n) obtained from the respective step size adaption module as divisor, while the residual signal to be encoded would form the dividend. The quantizer having a constant and uniform step size could be provided as a simple rounding module, which rounds the division result to the nearest integer, whereupon the subsequent limiter would limit this integer, as described above, to an integer from the allowed set C. In the respective dequantizer, a uniform dequantization would simply be performed with Δ(n) as multiplier.
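A small sketch confirming that the two realizations described here yield identical quantizing indices:

```python
import numpy as np

def quantize_direct(r, delta, c):
    # Single mapping: uniform quantization with step delta, then clipping.
    return int(np.clip(np.round(r / delta), -c, c))

def quantize_divider_chain(r, delta, c):
    # Series connection: divider -> rounding quantizer with step 1 -> limiter.
    divided = r / delta                        # divider with Delta(n) as divisor
    rounded = np.round(divided)                # quantizer with constant, uniform step size
    return int(min(max(rounded, -c), c))       # limiter to the allowed set C

# Both realizations yield identical quantizing indices; the corresponding
# dequantizer simply multiplies the index by Delta(n) again.
for r in np.linspace(-3.0, 3.0, 25):
    assert quantize_direct(r, 0.4, 5) == quantize_divider_chain(r, 0.4, 5)
```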
Further, it should be noted that the above embodiments were restricted to applications having a constant bit rate. However, the present invention is not limited thereto, and thus the quantization by clipping of, for example, the prefiltered signal, as used in these embodiments, is only one possible alternative. Instead of clipping, a quantizing function with a nonlinear characteristic curve could be used. For illustrating this, reference is made to
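One possible nonlinear characteristic, chosen here purely for illustration, is a mu-law-style compressor followed by a uniform quantizer; its mapping from values to indices is finely resolved (steep) for small prediction errors and coarse (flat) for large ones:

```python
import numpy as np

def mu_law_quantize(r, mu=50.0, r_max=4.0, num_levels=15):
    """Nonlinear quantizing characteristic: a mu-law-style compressor followed
    by a uniform quantizer, i.e. steep for small prediction errors and flat
    for large ones. Purely an illustrative curve and placeholder parameters."""
    x = np.clip(np.asarray(r, dtype=float) / r_max, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    half = (num_levels - 1) // 2
    return np.round(compressed * half).astype(int)          # quantizing index

def mu_law_dequantize(idx, mu=50.0, r_max=4.0, num_levels=15):
    half = (num_levels - 1) // 2
    compressed = np.asarray(idx, dtype=float) / half
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu * r_max
```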
Further, the above-described embodiments can also be varied with regard to the processing of the encoded bit stream. Particularly, the bit stream generator 24 and the extractor 214, respectively, could also be omitted.
The different quantizing indices, namely the residual values of the prefiltered signal, the residual values of the prefilter coefficients and the residual values of the prediction coefficients, could also be transmitted in parallel to each other, stored, or made available for decoding in another way, separately via individual channels. On the other hand, in case a constant bit rate is not imperative, these data could also be entropy-encoded.
Particularly, the above functions in the blocks of
Particularly, it should be noted that depending on the circumstances, the inventive scheme could also be implemented in software. The implementation can be made on a digital memory medium, particularly a disc or CD with electronically readable control signals, which can cooperate with a programmable computer system such that the respective method is performed. Generally, thus, the invention consists also in a computer program product having a program code stored on a machine-readable carrier for performing the inventive method when the computer program product runs on the computer. In other words, the invention can be realized as a computer program having a program code for performing the method when the computer program runs on a computer.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Foreign Application Priority Data: 10 2006 022 346 (DE), filed May 2006.
PCT Filing Data: PCT/EP2007/001730, filed Feb. 28, 2007; 371(c) date May 15, 2009.
PCT Publication Data: WO 2007/131564 A, published Nov. 22, 2007.
U.S. Patent Documents Cited:
Number | Name | Date | Kind
4385393 | Chaure et al. | May 1983 | A |
4677671 | Galand et al. | Jun 1987 | A |
4751736 | Gupta et al. | Jun 1988 | A |
5138662 | Amano | Aug 1992 | A |
5142583 | Galand | Aug 1992 | A |
5347478 | Suzuki | Sep 1994 | A |
5699484 | Davis | Dec 1997 | A |
5781888 | Herre | Jul 1998 | A |
5926785 | Akamine | Jul 1999 | A |
5956674 | Smyth et al. | Sep 1999 | A |
6101464 | Serizawa | Aug 2000 | A |
6104996 | Yin | Aug 2000 | A |
6377915 | Sasaki | Apr 2002 | B1 |
6401062 | Murashima | Jun 2002 | B1 |
6487535 | Smyth et al. | Nov 2002 | B1 |
6675148 | Hardwick | Jan 2004 | B2 |
6778953 | Edler et al. | Aug 2004 | B1 |
6810381 | Sasaki | Oct 2004 | B1 |
6950794 | Subramaniam et al. | Sep 2005 | B1 |
7110953 | Edler et al. | Sep 2006 | B1 |
7171355 | Chen | Jan 2007 | B1 |
20010053973 | Tsuzuki | Dec 2001 | A1 |
20020147584 | Hardwick | Oct 2002 | A1 |
20020184005 | Gigi | Dec 2002 | A1 |
20030149559 | Lopez-Estrada | Aug 2003 | A1 |
20040015346 | Yasunaga | Jan 2004 | A1 |
20040093208 | Yin | May 2004 | A1 |
20040181398 | Sung | Sep 2004 | A1 |
20040184537 | Geiger et al. | Sep 2004 | A1 |
20050114126 | Geiger et al. | May 2005 | A1 |
20060147124 | Edler et al. | Jul 2006 | A1 |
20060271355 | Wang | Nov 2006 | A1 |
20060271357 | Wang et al. | Nov 2006 | A1 |
20070016402 | Schuller et al. | Jan 2007 | A1 |
20070016403 | Schuller et al. | Jan 2007 | A1 |
20070027678 | Hotho et al. | Feb 2007 | A1 |
20070043557 | Schuller et al. | Feb 2007 | A1 |
20070100639 | Den Brinker et al. | May 2007 | A1 |
20070112560 | Gerrits et al. | May 2007 | A1 |
20070124139 | Chen | May 2007 | A1 |
20080027720 | Kondo | Jan 2008 | A1 |
20080112632 | Vos et al. | May 2008 | A1 |
20090240492 | Zopf et al. | Sep 2009 | A1 |
Foreign Patent Documents Cited:
Number | Date | Country
WO 2005078704 | Aug 2005 | DE |
102004007184 | Sep 2005 | DE |
2150377 | Jun 1985 | GB |
2159377 | Nov 1985 | GB |
2144222 | Oct 2000 | RU |
WO 0063886 | Oct 2000 | WO |
WO 02082425 | Oct 2002 | WO |
WO 2005078703 | Aug 2005 | WO |
WO 2005078705 | Aug 2005 | WO |
Other Publications:
Tzeng. “Analysis-by-Synthesis Linear Predictive Speech Coding at 2.4 kbit/s” 1989. |
Edler et al. “Audio Coding Using a Psychoacoustic Pre- and Post-Filter” 2000. |
Liebchen et al. “Improved Forward-Adaptive Prediction for MPEG-4 Audio Lossless Coding” May 31, 2005. |
Vass et al. “Adaptive Forward-Backward Quantizer for Low Bit Rate High Quality Speech Coding” 1997. |
Kramer et al. “Ultra Low Delay audio coding with constant bit rate” 2004. |
Lutzky et al. “Structural analysis of low latency audio coding schemes” 2005. |
Wabnik et al. “Packet Loss Concealment in Predictive Audio Coding” 2005. |
Harma. “Evaluation of a Warped Linear Predictive Coding Scheme” 2000. |
Wylie. “apt-X100: Low-Delay, Low-Bit-Rate Subband ADPCM Digital Audio Coding” 1995. |
de Bont et al. “A High Quality Audio-Coding System at 128kb/s” 1995. |
Edler, et al. “Audio coding using a psychoacoustic pre-and post-filter.” Acoustics, Speech, and Signal Processing, 2000. ICASSP'00. Proceedings. 2000 IEEE International Conference on. vol. 2. IEEE, Jun. 2000, pp. 881-884. |
Schuller, Gerald, and Aki Hanna. “Low delay audio compression using predictive coding.” Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. vol. 2. IEEE, May 2002, pp. 1853-1856. |
Edler, Bernd, et al. “Perceptual audio coding using a time-varying linear pre-and post-filter.” Audio Engineering Society Convention 109. Audio Engineering Society, Sep. 2000, pp. 1-12. |
Russian Decision to Grant, with English Translation, in related Russian Patent Application No. 2008148961, Decision dated Jun. 9, 2010, 26 pages. |
Schuller, et al.; “Low delay audio compression using predictive coding”; May 13-17, 2002; IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, p. II-1853, XP010804256, ISBN: 0-7803-7402-9. |
Wabnik, et al.; “Reduced Bit Rate Ultra Low Delay Audio Coding”; May 20, 2006; 120th AES Convention, XP002437647. |
Wabnik, et al.; “Different Quantisation Noise Shaping Methods for Predictive Audio Coding”; May 14-19, 2006; IEEE Acoustics, Speech and Signal Processing, vol. 5. |
Wabnik, et al; “Frequency Warping in Low Delay Audio Coding”; Mar. 18-23, 2005; ICASSP, vol. 3, pp. III-181 through III-184. |
Lutzky, et al; “A guideline to audio codec delay”; May 8-11, 2004; Presented at the 116th Convention Audio Engineering Society, Convention Paper 6062, pp. 1-10. |
Schuller, et al.; “Perceptual Audio Coding Using Adaptive Pre- and Post-Filters and Lossless Compression”;Sep. 2002; IEEE Transactions on Speech and Audio Processing, vol. 10, No. 6, pp. 379-390. |
Edler, et al.; “Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter”; Sep. 22-25, 2000; AES 109th Convention. |
Prior Publication Data: US 2009/0254783 A1, Oct. 2009.