1. Field of the Invention
The present invention relates to audio coders and decoders and to audio coding in general and, in particular, to audio coding schemes allowing audio signals to be coded with a short delay time.
2. Description of Prior Art
The audio compression method best known at present is MPEG-1 Layer III. With this compression method, the sample or audio values of an audio signal are coded into a coded signal in a lossy manner. Put differently, irrelevance and redundancy of the original audio signal are reduced or ideally removed when compressing. In order to achieve this, simultaneous and temporal masking effects are identified by a psycho-acoustic model, i.e. a temporally varying masking threshold depending on the audio signal is calculated, indicating the volume above which tones of a certain frequency are perceivable to human hearing. This information in turn is used for coding the signal by quantizing the spectral values of the audio signal in a more precise or less precise manner or not at all, depending on the masking threshold, and integrating same into the coded signal.
Audio compression methods such as the MP3 format reach a limit in their applicability when audio data is to be transferred via a bit rate-limited transmission channel in compressed form and, at the same time, with as small a delay time as possible. In some applications, such as archiving audio information, the delay time does not play a role. Low-delay audio coders, which are sometimes referred to as “ultra low delay coders”, however, are necessary where time-critical audio signals are to be transmitted, such as, for example, in teleconferencing or in wireless loudspeakers or microphones. For these fields of application, the article by Schuller G. et al. “Perceptual Audio Coding using Adaptive Pre- and Post-Filters and Lossless Compression”, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 6, September 2002, pp. 379-390, suggests audio coding where the irrelevance reduction and the redundancy reduction are not performed based on a single transform, but on two separate transforms.
The principle will be discussed subsequently referring to
After filtering the audio values 906, quantization with a constant step size takes place, such as, for example, a rounding operation to the next integer. The quantizing noise caused by this is white noise. On the decoder side, the filtered signal is “retransformed” again by a parameterizable filter, the transfer function of which is set to the magnitude of the masking threshold itself. Not only is the filtered signal decoded again by this, but the quantizing noise on the decoder side is also adjusted to the form or shape of the masking threshold. In order for the quantizing noise to correspond to the masking threshold as precisely as possible, an amplification value a# applied to the filtered signal before quantizing is calculated on the coder side for each parameter set or each parameterization. In order for the retransform to be performed on the decoder side, the amplification value a and the parameterization x are transferred to the decoder as side information 910 apart from the actual main data, namely the quantized filtered audio values 912. For the redundancy reduction 914, this data, i.e. the side information 910 and the main data 912, is subjected to a loss-free compression, namely entropy coding, which is how the coded signal is obtained.
The above-mentioned article suggests a size of 128 sample values 906 as a block size. This allows a relatively short delay of 8 ms with a sampling rate of 32 kHz. With reference to the detailed implementation, the article also states that, for increasing the efficiency of the side information coding, the side information, namely the coefficients x# and a#, will only be transferred if there are sufficient changes compared to a parameter set transferred before, i.e. if the changes exceed a certain threshold value. In addition, it is described that the implementation is preferably performed such that a current parameter set is not directly applied to all the sample values belonging to the respective block, but that a linear interpolation of the filter coefficients x# is used to avoid audible artifacts. In order to perform the linear interpolation of the filter coefficients, a lattice structure is suggested for the filter to prevent instabilities from occurring. In the event that a coded signal with a controlled bit rate is desired, the article also suggests selectively multiplying or attenuating the filtered signal scaled with the time-dependent amplification factor a by a factor unequal to 1 so that audible interferences occur, but the bit rate can be reduced at sites of the audio signal which are complicated to code.
Although the audio coding scheme described in the article mentioned above already reduces the delay time for many applications to a sufficient degree, a problem in the above scheme is that, due to the requirement of having to transfer the masking threshold or transfer function of the coder-side filter, subsequently referred to as pre-filter, the transfer channel is loaded to a relatively high degree even though the filter coefficients will only be transferred when a predetermined threshold is exceeded.
Another disadvantage of the above coding scheme is that, due to the fact that the masking threshold or inverse thereof has to be made available on the decoder side by the parameter set x# to be transferred, a compromise has to be made between the lowest possible bit rate or high compression ratio on the one hand and the most precise approximation possible or parameterization of the masking threshold or inverse thereof on the other hand. Thus, it is inevitable for the quantizing noise adjusted to the masking threshold by the above audio coding scheme to exceed the masking threshold in some frequency ranges and thus result in audible audio interferences for the listener.
Another problem with the audio coding scheme according to
It is an object of the present invention to provide a more effective audio coding scheme.
In accordance with a first aspect, the present invention provides a device for coding an audio signal of a sequence of audio values into a coded signal, having: means for applying a psycho-acoustic model to a first block of audio values of the sequence of audio values and a second block of audio values of the sequence of audio values; means for calculating a version of a first parameterization of a parameterizable filter based on a result of applying the psycho-acoustic model to the first block and a version of a second parameterization of the parameterizable filter based on a result of applying the psycho-acoustic model to the second block; means for filtering a predetermined block of audio values of the sequence of audio values with the parameterizable filter using a predetermined parameterization which in a predetermined manner depends on the version of the second parameterization to obtain a block of filtered audio values corresponding to the predetermined block; means for quantizing the filtered audio values to obtain a block of quantized filtered audio values; means for forming a combination of the version of the first parameterization and the version of the second parameterization including at least a difference between the version of the first parameterization and the version of the second parameterization; and means for integrating information from which the quantized filtered audio values and a version of the first parameterization may be derived and which includes the combination into the coded signal.
In accordance with a second aspect, the present invention provides a method for coding an audio signal of a sequence of audio values into a coded signal, having the steps of: applying a psycho-acoustic model to a first block of audio values of the sequence of audio values and a second block of audio values of the sequence of audio values; calculating a version of a first parameterization of a parameterizable filter based on a result of applying the psycho-acoustic model to the first block and a version of a second parameterization of the parameterizable filter based on a result of applying the psycho-acoustic model to the second block; filtering a predetermined block of audio values of the sequence of audio values with the parameterizable filter using a predetermined parameterization which in a predetermined manner depends on the version of the second parameterization to obtain a block of filtered audio values corresponding to the predetermined block; quantizing the filtered audio values to obtain a block of quantized filtered audio values; forming a combination of the version of the first parameterization and the version of the second parameterization including at least a difference between the version of the first parameterization and the version of the second parameterization; and integrating information from which the quantized filtered audio values may be derived and which includes the combination into the coded signal.
In accordance with a third aspect, the present invention provides a device for decoding a coded signal into an audio signal, the coded signal containing information from which a block of quantized filtered audio values and a version of a first parameterization according to which a transfer function of a parameterizable filter corresponds to a first result of applying a psycho-acoustic model may be derived, and which includes a combination between a version of a second parameterization according to which a transfer function of the parameterizable filter corresponds to a second result of applying the psycho-acoustic model and the version of the first parameterization including at least a difference between the version of the first parameterization and the version of the second parameterization, having: means for deriving the version of the first parameterization from the coded signal; means for calculating a sum between the version of the first parameterization and the difference to obtain the version of the second parameterization; and means for filtering the block of quantized filtered audio values with a parameterizable filter using the version of the second parameterization such that the transfer function thereof corresponds to a result of applying the psycho-acoustic model to obtain a block of decoded audio values of the audio signal.
In accordance with a fourth aspect, the present invention provides a method for decoding a coded signal into an audio signal, wherein the coded signal contains information from which a block of quantized filtered audio values and a version of a first parameterization according to which a transfer function of a parameterizable filter corresponds to a first result of applying a psycho-acoustic model may be derived, and which includes a combination between a version of a second parameterization according to which a transfer function of the parameterizable filter corresponds to a second result of applying the psycho-acoustic model and the version of the first parameterization which includes at least a difference between the version of the first parameterization and the version of the second parameterization, having the steps of: deriving the version of the first parameterization from the coded signal; calculating a sum between the version of the first parameterization and the difference to obtain the version of the second parameterization; and filtering the block of quantized filtered audio values with a parameterizable filter using the version of the second parameterization such that the transfer function thereof corresponds to a result of applying the psycho-acoustic model to obtain a block of decoded audio values of the audio signal.
In accordance with a fifth aspect, the present invention provides a computer program having a program code for performing one of the above mentioned methods when the computer program runs on a computer.
Inventive coding of an audio signal of a sequence of audio values into a coded signal includes determining a first listening threshold for a first block of audio values of the sequence of audio values and a second listening threshold for a second block of audio values of the sequence of audio values; calculating a version of a first parameterization of a parameterizable filter such that the transfer function thereof roughly corresponds to the inverse of the magnitude of the first listening threshold and a version of a second parameterization of the parameterizable filter such that the transfer function thereof roughly corresponds to the inverse of the magnitude of the second listening threshold; filtering a predetermined block of audio values of the sequence of audio values with the parameterizable filter using a predetermined parameterization which in a predetermined manner depends on the version of the second parameterization to obtain a block of filtered audio values corresponding to the predetermined block; quantizing the filtered audio values to obtain a block of quantized filtered audio values; forming a combination of the version of the first parameterization and the version of the second parameterization including at least a difference between the version of the first parameterization and the version of the second parameterization; and integrating information from which the quantized filtered audio values and a version of the first parameterization may be derived and which includes the combination into the coded signal.
The central idea of the present invention is that a higher compression ratio may be achieved by transferring differences of successive parameterizations.
If, additionally, parameterizations are only transferred when they differ sufficiently from one another, a further finding of the present invention is that in this case, too, i.e. even though the transferred parameterization differences then never fall below the minimum difference measure, transferring the differences between two parameterizations instead of the parameterizations themselves provides a compression gain that more than compensates for the additional complexity of calculating the difference on the coder side and calculating the sum on the decoder side.
According to an embodiment of the present invention, the pure differences between successive parameterizations are transferred, whereas according to another embodiment the minimum threshold starting from which parameterizations of new nodes will be transferred is subtracted from these differences.
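The first of these embodiments, transferring pure differences between successive node parameterizations and summing them again on the decoder side, can be sketched as follows. This is only an illustrative sketch: the function names `encode_nodes` and `decode_nodes` are hypothetical, the coefficients are assumed to be already quantized to integers so that the round trip is exact, and the variant that additionally subtracts the minimum threshold is not shown.

```python
def encode_nodes(nodes):
    """Coder side: keep the first node as-is, then send only differences.

    `nodes` is a list of node parameterizations, each a list of
    (assumed integer-quantized) filter coefficients.
    """
    first = list(nodes[0])
    diffs = [
        [cur - prev for cur, prev in zip(cur_node, prev_node)]
        for prev_node, cur_node in zip(nodes, nodes[1:])
    ]
    return first, diffs


def decode_nodes(first, diffs):
    """Decoder side: rebuild each node as previous node plus difference."""
    nodes = [list(first)]
    for diff in diffs:
        nodes.append([prev + d for prev, d in zip(nodes[-1], diff)])
    return nodes


# Slowly varying coefficients yield small differences, which a
# subsequent entropy coder can typically represent with fewer bits.
nodes = [[120, -45, 10], [123, -44, 9], [119, -47, 12]]
first, diffs = encode_nodes(nodes)
restored = decode_nodes(first, diffs)
```

In this example the differences have much smaller magnitudes than the coefficients themselves, which is the effect the difference transfer relies on.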
Preferred embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
a shows a schematic diagram for illustrating the incoming audio signal, the sequence of audio values it consists of, and the operating steps of
b shows a schematic diagram for illustrating the setup of the coded signal;
a shows a diagram where an embodiment of a quantizing step function is shown;
b shows a diagram where another embodiment of a quantizing step function is shown;
The audio coder 10 of
The irrelevance reduction part 16 and the redundancy reduction part 18 are connected in series in this order between the data input 12 and the data output 14. In particular, the data input 12 is connected to a data input of the means 20 for determining a listening threshold and to a data input of the input buffer 32. A data output of the means 20 for determining a listening threshold is connected to an input of the means 24 for calculating a parameterization and to a data input of the means 22 for calculating an amplification value to pass on a listening threshold determined to same. The means 22 and 24 calculate a parameterization or amplification value based on the listening threshold and are connected to the node comparing means 26 to pass on these results to same. Depending on the result of the comparison, the node comparing means 26, as will be discussed subsequently, passes on the results calculated by the means 22 and 24 as input parameter or parameterization to the parameterizable pre-filter 30. The parameterizable pre-filter 30 is connected between a data output of the input buffer 32 and a data input of the buffer 38. The multiplier 40 is connected between a data output of the buffer 38 and the quantizer 28. The quantizer 28 passes on filtered audio values which may be multiplied or scaled, but always quantized, to the redundancy reduction part 18, more precisely to a data input of the compressor 34. The node comparing means 26 passes on information from which the input parameters passed to the parameterizable pre-filter 30 may be derived to the redundancy reduction part 18, more precisely to another data input of the compressor 34. The bit rate controller is connected to a control input of the multiplier 40 via a control connection to provide for the quantized filtered audio values, as received from the pre-filter 30, to be multiplied by the multiplier 40 by a suitable multiplicand, as will be discussed in greater detail below. 
The bit rate controller 36 is connected between a data output of the compressor 34 and the data output 14 of the audio coder 10 in order to determine the multiplicand for the multiplier 40 in a suitable manner. When each audio value passes the multiplier 40 for the first time, the multiplicand is at first set to a suitable scaling factor, such as, for example, 1. The buffer 38, however, continues storing each filtered audio value to give the bit rate controller 36, as will be described subsequently, a possibility of changing the multiplicand for another pass of a block of audio values. If such a change is not indicated by the bit rate controller 36, the buffer 38 may release the memory taken up by this block.
After the setup of the audio coder of
As can be seen from
a at 54 indicates the sequence of sample values, each sample value being illustrated by a rectangle 56. The sample values are numbered for illustration purposes, wherein for reasons of clarity in turn only some sample values of the sequence 54 are shown. As is indicated by braces above the sequence 54, 128 successive sample values each are combined to form a block according to the present embodiment, wherein the directly successive 128 sample values form the next block. Only as a precautionary measure, it is to be pointed out that the combination to form blocks could also be performed differently, exemplarily by overlapping blocks or spaced-apart blocks and blocks having another block size, although the block size of 128 in turn is preferred since it provides a good tradeoff between high audio quality on the one hand and the smallest possible delay time on the other hand.
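The block formation described above, with 128 directly successive sample values per block and no overlap, can be illustrated by a short sketch. The helper name `split_into_blocks` is hypothetical, and dropping an incomplete trailing block is an assumption made only to keep the example small; a streaming coder would instead wait for further samples.

```python
def split_into_blocks(samples, block_size=128):
    """Group directly successive samples into non-overlapping blocks."""
    # Only complete blocks are returned (assumption for this sketch).
    full = len(samples) - len(samples) % block_size
    return [samples[i:i + block_size] for i in range(0, full, block_size)]


samples = list(range(300))            # sample values numbered 0..299
blocks = split_into_blocks(samples)   # block 0: 0..127, block 1: 128..255
```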
Whereas the audio blocks combined in the means 20 in step 52 are processed in the means 20 for determining a listening threshold block by block, the incoming audio values will be buffered 54 in the input buffer 32 until the parameterizable pre-filter 30 has obtained input parameters from the node comparing means 26 to perform pre-filtering, as will be described subsequently.
As can be seen from
In a subsequent step 64, the means 24 and the means 22 calculate from the listening threshold M(f) calculated (f indicating the frequency) an amplification value a or parameter set of N parameters x(i) (i=1, . . . , N). The parameterization x(i) which the means 24 calculates in step 64 is provided for the parameterizable pre-filter 30 which is, for example, embodied in an adaptive filter structure, as is used in LPC coding (LPC=linear predictive coding). For example, s(n), n=0, . . . , 127, be the 128 audio values of the current audio block and s′(n) be the resulting filtered 128 audio values, then the filter is exemplarily embodied such that the following equation applies:
K being the filter order and a_k^t, k=1, . . . , K, being the filter coefficients, the index t illustrating that the filter coefficients change in successive audio blocks. The means 24 then calculates the parameterization a_k^t such that the transfer function H(f) of the parameterizable pre-filter 30 roughly equals the inverse of the magnitude of the masking threshold M(f), i.e. such that the following applies:
wherein the dependence on t in turn is to illustrate that the masking threshold M(f) changes for different audio blocks. When implementing the pre-filter 30 as the adaptive filter mentioned above, the filter coefficients a_k^t will be obtained as follows: the inverse discrete Fourier transform of |M(f, t)|^2 over the frequency for the block at the time t results in the target auto-correlation function r_mm^t(i). Then, the a_k^t are obtained by solving the linear equation system:
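The three equations referred to in this passage are not reproduced in the present text. As a hedged reconstruction, assuming the conventional LPC formulation suggested by the surrounding definitions (filter order K, block-dependent filter coefficients, transfer function H, masking threshold M and target auto-correlation function), they could read as follows; the exact form and sign convention of the original are not confirmed here.

```latex
% Assumed reconstruction, not taken verbatim from the source.
\begin{align}
  s'(n) &= s(n) - \sum_{k=1}^{K} a_k^{t}\, s(n-k),
        & n &= 0, \dots, 127 \\
  |H(f,t)| &\approx \frac{1}{|M(f,t)|} \\
  \sum_{k=1}^{K} a_k^{t}\, r_{mm}^{t}(|i-k|) &= r_{mm}^{t}(i),
        & i &= 1, \dots, K
\end{align}
```

Under this assumption, the pre-filter is a prediction error filter whose coefficients follow from the usual Toeplitz normal equations of the target auto-correlation function.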
In order for no instabilities to arise between the parameterizations in the linear interpolation described in greater detail below, a lattice structure is preferably used for the filter 30, wherein the filter coefficients for the lattice structure are re-parameterized to form reflection coefficients. With regard to further details as to the design of the pre-filter, the calculation of the coefficients and the re-parameterization, reference is made to the article by Schuller et al. mentioned in the introduction to the description and, in particular, to page 381, section III, which is incorporated herein by reference.
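One common way of solving such a linear equation system and obtaining reflection coefficients at the same time is the Levinson-Durbin recursion. The sketch below is an illustration under assumptions, not the method confirmed by the text: it assumes the system has the usual Toeplitz form built from the auto-correlation values r(0), ..., r(K), the function name `levinson_durbin` is hypothetical, and the sign convention of the reflection coefficients varies between references.

```python
def levinson_durbin(r, order):
    """Solve sum_k a[k] * r(|i - k|) = r(i), i = 1..order (assumed form).

    r     -- auto-correlation values r[0] .. r[order]
    order -- filter order K
    Returns (a, reflection, err): predictor coefficients a[0..K-1]
    (a[k] multiplies s(n - k - 1)), the reflection coefficients of the
    recursion, and the final prediction error power.
    """
    a = [0.0] * order
    reflection = []
    err = r[0]
    for m in range(order):
        # Correlation not yet explained by the order-m predictor.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        reflection.append(k_m)
        prev = a[:]
        a[m] = k_m
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        err *= 1.0 - k_m * k_m
    return a, reflection, err


# Auto-correlation of a first-order process, r(i) = 0.5 ** i.
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input a single coefficient suffices, so the second coefficient and the second reflection coefficient come out as zero. Reflection coefficients with magnitude below 1 correspond to a stable filter, which is the usual motivation for interpolating in this domain.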
Whereas consequently the means 24 calculates a parameterization for the parameterizable pre-filter 30 such that the transfer function thereof equals the inverse of the masking threshold, the means 22 calculates a noise power limit based on the listening threshold, namely a limit indicating which noise power the quantizer 28 is allowed to introduce into the audio signal filtered by the pre-filter 30 in order for the quantizing noise on the decoder side to be below the listening threshold M(f) or exactly equal it after post- or reverse-filtering. The means 22 calculates this noise power limit as the area below the square of the magnitude of the listening threshold M, i.e. as Σ|M(f)|^2. The means 22 calculates the amplification value a from the noise power limit by calculating the root of the fraction of the quantizing noise power divided by the noise power limit. The quantizing noise is the noise caused by the quantizer 28. The noise caused by the quantizer 28 is, as will be described below, white noise and thus frequency-independent. The quantizing noise power is the power of the quantizing noise.
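The two calculations attributed to the means 22 can be written down in a few lines. This is a minimal sketch: the function names are hypothetical, the listening threshold is assumed to be given as a list of magnitudes at discrete frequencies, and the default quantizing noise power of 1/12 is the textbook value for a uniform quantizer with step size 1, which is an assumption and not a figure stated in the text.

```python
import math


def noise_power_limit(threshold_magnitudes):
    """Area below the squared magnitude of the listening threshold."""
    return sum(abs(m) ** 2 for m in threshold_magnitudes)


def amplification_value(limit, quantizing_noise_power=1.0 / 12.0):
    """Root of (quantizing noise power / noise power limit)."""
    return math.sqrt(quantizing_noise_power / limit)


limit = noise_power_limit([1.0, 2.0, 2.0])               # 1 + 4 + 4
a = amplification_value(limit, quantizing_noise_power=1.0)
```

A larger noise power limit gives a smaller amplification value, so fewer quantizing steps are needed for the block.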
As has become evident from the above description, the means 22 also calculates the noise power limit apart from the amplification value a. Although it is possible for the node comparing means 26 to again calculate the noise power limit from the amplification value a obtained from the means 22, it is also possible for the means 22 to also transmit the noise power limit determined to the node comparing means 26 apart from the amplification value a.
After calculating the amplification value and the parameterization, the node comparing means 26 checks in step 66 whether the parameterization just calculated differs by more than a predetermined threshold from the current last parameterization passed on to the parameterizable pre-filter. If the check in step 66 has the result that the parameterization just calculated differs from the current one by more than the predetermined threshold, the filter coefficients just calculated and the amplification value just calculated or noise power limit are buffered in the node comparing means 26 for an interpolation to be discussed and the node comparing means 26 hands over to the pre-filter 30 the filter coefficients just calculated in step 68 and the amplification value just calculated in step 70. If, however, this is not the case and the parameterization just calculated does not differ from the current one by more than the predetermined threshold, the node comparing means 26 will hand over to the pre-filter 30 in step 72, instead of the parameterization just calculated, only the current node parameterization, i.e. that parameterization which last resulted in a positive result in step 66, i.e. differed from a previous node parameterization by more than a predetermined threshold. After steps 70 and 72, the process of
In the case that the parameterization just calculated does not differ from the current node parameterization and consequently the pre-filter 30 in step 72 again obtains the node parameterization already obtained for at least the last audio block, the pre-filter 30 will apply this node parameterization to all the sample values of this audio block in the FIFO 32, as will be described in greater detail below, which is how this current block is taken out of the FIFO 32 and the quantizer 28 receives a resulting audio block of pre-filtered audio values.
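The decision made by the node comparing means can be sketched as a small function. The text does not specify how the difference between two parameterizations is measured, so the largest absolute coefficient difference used here is purely an assumption, as is the function name `select_parameterization`.

```python
def select_parameterization(candidate, current_node, threshold):
    """Return (parameterization to hand over, whether a new node was set).

    Assumed difference measure: largest absolute coefficient difference.
    """
    distance = max(abs(c - n) for c, n in zip(candidate, current_node))
    if distance > threshold:
        return list(candidate), True      # new node: pass on fresh values
    return list(current_node), False      # keep the current node


node = [0.50, -0.20, 0.10]
same, changed_a = select_parameterization([0.51, -0.20, 0.10], node, 0.05)
new, changed_b = select_parameterization([0.70, -0.20, 0.10], node, 0.05)
```

Only in the second call would side information for a new node have to be written into the coded signal.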
In step 80, the parameterizable pre-filter 30 checks whether a handover of filter coefficients just calculated from the node comparing means 26 has taken place, or of older node parameterizations. The pre-filter 30 performs the check 80 until such a handover has taken place.
As soon as such a handover has taken place, the parameterizable pre-filter 30 starts processing the current audio block of audio values just in the buffer 32, i.e. that one for which the parameterization has just been calculated. In
It is assumed in
The parameterization calculated for block 1 still located in the FIFO 32, however, in contrast differed, according to the illustrative example of
At the time when the parameter set a1, x1 is passed on, only the audio values 128-255, i.e. the current audio block after the last audio block 0 processed by the pre-filter 30, are in the memory 32. After determining the handover of node parameters x1(i) in step 80, the pre-filter 30 determines the noise power limit q1 corresponding to the amplification value a1 in step 84. This may take place by the node comparing means 26 passing on this value to the pre-filter 30 or by the pre-filter 30 again calculating this value, as has been described above referring to step 64.
After that, an index j is initialized to a sample value in step 86 to point to the oldest sample value remaining in the FIFO memory 32 or the first sample value of the current audio block “block 1”, i.e. in the present example of
In step 88, the parameterizable pre-filter 30 performs the interpolation of the filter coefficients x0, x1 between the two nodes in the form of a linear interpolation to obtain the interpolated filter coefficients at the sample position j, i.e. x(tj)(i), i=1 . . . N.
After that, namely in step 90, the parameterizable pre-filter 30 performs an interpolation between the noise power limits q0 and q1 to obtain an interpolated noise power limit at the sample position j, i.e. q(tj).
In step 92, the parameterizable pre-filter 30 subsequently calculates the amplification value for the sample position j on the basis of the interpolated noise power limit and the quantizing noise power, and preferably also the interpolated filter coefficients, namely for example depending on the root of
wherein for this reference is made to the explanations of step 64 of
In step 94, the parameterizable pre-filter 30 then applies the amplification value calculated and the interpolated filter coefficients to the sample value at the sample position j to obtain a filtered sample value for this sample position, namely s′(tj).
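The per-sample processing of steps 88 to 92 can be sketched as follows, assuming nodes at sample positions 127 and 255 as in the example discussed here. The function names are hypothetical, the quantizing noise power of 1/12 is the textbook value for a unit step size and an assumption, and the optional dependence of the amplification value on the interpolated filter coefficients is left out.

```python
import math


def interpolate(v0, v1, j, j0, j1):
    """Linear interpolation between node values v0 (at j0) and v1 (at j1)."""
    w = (j - j0) / (j1 - j0)
    return v0 + w * (v1 - v0)


def per_sample_parameters(x0, x1, q0, q1, j0=127, j1=255,
                          quantizing_noise_power=1.0 / 12.0):
    """Interpolated coefficients, noise limit and gain for j0 < j <= j1."""
    result = []
    for j in range(j0 + 1, j1 + 1):
        x_j = [interpolate(c0, c1, j, j0, j1) for c0, c1 in zip(x0, x1)]
        q_j = interpolate(q0, q1, j, j0, j1)
        a_j = math.sqrt(quantizing_noise_power / q_j)
        result.append((j, x_j, q_j, a_j))
    return result


params = per_sample_parameters([0.2, -0.1], [0.6, 0.3], q0=4.0, q1=9.0)
```

The list covers the 128 sample positions of the current block; at the last position the interpolated values coincide with those of the newer node.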
In step 96, the parameterizable pre-filter 30 then checks whether the sample position j has reached the current node, i.e. node 1, in the case of
Before the further procedure when processing the filtered sample values s′ will be described referring to
The application of the amplification value in steps 94 and 100 in the pre-filter 30 is a multiplication of the audio signal or the filtered audio signal, i.e. the sample values s or the filtered sample values s′, by the amplification factor. The purpose is to set by this the quantizing noise introduced into the filtered audio signal by the quantization described in greater detail below, and which is adjusted by the reverse-filtering on the decoder side to the form of the listening threshold, as high as possible without exceeding the listening threshold. This can be exemplified by Parseval's formula according to which the square of the magnitude of a function equals the square of the magnitude of the Fourier transform. When on the decoder side the multiplication of the audio signal in the pre-filter by the amplification value is reversed again by dividing the filtered audio signal by the amplification value, the quantizing noise power is also reduced, namely by the factor a^−2, a being the amplification value. Consequently, the quantizing noise power can be set to an optimally high degree by applying the amplification value in the pre-filter 30, which is synonymous with the quantizing step size being increased and thus the number of quantizing steps to be coded being reduced, which in turn increases the compression in the subsequent redundancy reduction part.
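The effect of the amplification value on the quantizing error can be checked with a small experiment. The sketch below leaves out the pre-filter and the reverse filter and only shows the scaling, rounding and division; the function names are hypothetical, and rounding to the nearest integer stands in for the quantizer with a constant step size.

```python
def encode_sample(s, a):
    """Scale by the amplification value, then quantize with step size 1."""
    return round(a * s)


def decode_sample(q, a):
    """Reverse the scaling on the decoder side."""
    return q / a


def max_error(samples, a):
    """Largest reconstruction error over the given samples."""
    return max(abs(decode_sample(encode_sample(s, a), a) - s)
               for s in samples)


samples = [0.137 * n - 3.3 for n in range(50)]
errors = {a: max_error(samples, a) for a in (0.5, 2.0, 8.0)}
```

The rounding error is at most half a step before the division, so the error amplitude after the division is bounded by 0.5 / a, and the error power scales with a^−2.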
Put differently, the effect of the pre-filter could be considered as a normalization of the signal to its masking threshold, so that the level of the quantizing interferences or quantizing noise can be kept constant in both time and frequency. Since the audio signal is in the time domain, the quantization may thus be performed step by step with a uniform constant quantization, as will be described subsequently. In this way, ideally any possible irrelevance is removed from the audio signal and a lossless compression scheme may be used to also remove the remaining redundancy in the pre-filtered and quantized audio signal, as will be described below.
Referring to
Subsequently, the further processing of the pre-filtered signal will be described referring to
The quantized filtered sample values are referred to by σ′ in
The reason for this threshold value is that it has been observed that the filtered audio signal output by the pre-filter 30 occasionally comprises audio values adding up to very large values due to an unfavorable accumulation of harmonic waves. Furthermore, it has been observed that cutting these values, as is achieved by the quantizing step function shown in
A somewhat more specific example of the quantizing step function shown in
Another example of a possible quantizing step function would be the one shown in
As has already been described before, on the decoder side not only the quantized and filtered audio values σ′ must be available, but also the input parameters for the pre-filter 30 being the basis of filtering these values, namely the node parameterization including a hint to the pertaining amplification value. In step 114, the compressor 34 thus performs a first compression trial and thus compresses side information containing the amplification values a0 and a1 at the nodes, such as, for example, 127 and 255, and the filter coefficients x0 and x1 at the nodes and the quantized filtered sample values σ′ to a temporary coded signal. The compressor 34 thus is a losslessly operating coder, such as, for example, a Huffman or arithmetic coder with or without prediction and/or adaptation.
The memory 38 which the sampled audio values σ′ pass through serves as a buffer for a suitable block size with which the compressor 34 processes the quantized, filtered and also scaled, as has been described before, audio values σ′ output by the quantizer 28. The block size may differ from the block size of the audio blocks as are used by the means 20.
As has already been mentioned, the bit rate controller 36 has controlled the multiplier 40 by a multiplicand of 1 for the first compression trial so that the filtered audio values go unchanged from the pre-filter 30 to the quantizer 28 and from there as quantized filtered audio values to the compressor 34. The compressor 34 monitors in step 116 whether a certain compression block size, i.e. a certain number of quantized sampled audio values, has been coded into the temporary coded signal, or whether further quantized filtered audio values σ′ are to be coded into the current temporary coded signal. If the compression block size has not been reached, the compressor 34 will continue performing the current compression 114. If the compression block size, however, has been reached, the bit rate controller 36 will check in step 118 whether the bit quantity required for the compression is greater than a bit quantity dictated by a desired bit rate. If this is not the case, the bit rate controller 36 will check in step 120 whether the bit quantity required is smaller than the bit quantity dictated by the desired bit rate. If this is the case, the bit rate controller 36 will fill up the coded signal in step 122 with filler bits until the bit quantity dictated by the desired bit rate has been reached. Subsequently, the coded signal is output in step 124. As an alternative to step 122, the bit rate controller 36 could pass on the compression block of filtered audio values σ′ still stored in the memory 38 on which the last compression has been based in a form multiplied by a multiplicand greater than 1 by the multiplier 40 to the quantizer 28 for again passing steps 110-118, until the bit quantity dictated by the desired bit rate has been reached, as is indicated by a step 125 illustrated in broken lines.
If, however, the check in step 118 shows that the required bit quantity is greater than the one dictated by the desired bit rate, the bit rate controller 36 will change the multiplicand for the multiplier 40 to a factor between 0 and 1 exclusive. This is performed in step 126. After step 126, the bit rate controller 36 causes the memory 38 to again output the last compression block of filtered audio values σ′ on which the compression has been based. These values are then multiplied by the factor set in step 126 and again supplied to the quantizer 28, whereupon steps 110-118 are performed again and the signal temporarily coded up to then is discarded.
It is to be pointed out that when performing steps 110-116 again, in step 114 of course the factor used in step 126 (or step 125) is also integrated into the coded signal.
The purpose of the procedure after step 126 is to increase the effective step size of the quantizer 28 by means of the factor. This means that the resulting quantizing noise is uniformly above the masking threshold, which causes audible interference or audible noise but reduces the bit rate. If, after passing steps 110-116 again, it is again determined in step 118 that the required bit quantity is greater than the one dictated by the desired bit rate, the factor will be reduced again in step 126, etc.
If the data is finally output at step 124 as a coded signal, the next compression block will be performed from the subsequent quantized filtered audio values σ′.
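The bit rate control loop of steps 110 to 126 may be sketched as follows. This is only a minimal illustration under stated assumptions and not the implementation of the coder: zlib stands in for the lossless compressor 34, the quantizer 28 is modelled as plain rounding, the desired bit quantity is assumed to be a multiple of 8, and the names encode_block, shrink and max_trials as well as the reduction step of 0.9 are hypothetical.

```python
import struct
import zlib


def encode_block(filtered, target_bits, shrink=0.9, max_trials=64):
    """Return (factor, coded_bytes) for one compression block.

    filtered    -- filtered audio values of one compression block
    target_bits -- bit quantity dictated by the desired bit rate (assumed multiple of 8)
    shrink      -- hypothetical amount by which the factor is reduced per trial
    """
    factor = 1.0  # first compression trial: values pass unchanged
    for _ in range(max_trials):
        # Scaling before a fixed-step quantizer enlarges the effective step size.
        quantized = [max(-32768, min(32767, round(v * factor))) for v in filtered]
        # Stand-in for the lossless compression of step 114 (assumption: zlib).
        payload = zlib.compress(struct.pack(f"<{len(quantized)}h", *quantized))
        used_bits = 8 * len(payload)
        if used_bits <= target_bits:
            # Step 122: fill up with filler bits (zero bytes here).
            filler = bytes((target_bits - used_bits) // 8)
            return factor, payload + filler
        # Step 126: too many bits, so discard this trial and reduce the factor.
        factor *= shrink
    raise RuntimeError("desired bit quantity not reached within max_trials")
```

The factor is returned together with the coded bytes because, as noted above, the decoder needs it; how it is written into the coded signal is left open in this sketch.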
It is also to be pointed out that another pre-initialized value than 1 could be used as the multiplication factor, namely, for example, 1. Then, scaling would take place in any case at first, i.e. at the very top of
b illustrates again the resulting coded signal which is generally indicated by 130. The coded signal includes side information and main data therebetween. The side information includes, as has already been mentioned, information from which for special audio blocks, namely audio blocks where a significant change in the filter coefficients has resulted in the sequence of audio blocks, the value of the amplification value and the value of the filter coefficients can be derived. If necessary, the side information will include further information relating to the amplification value used for the bit rate controller. Due to the mutual dependence of the amplification value and the noise power limit q, the side information may optionally, apart from the amplification value a# to a node #, also include the noise power limit q#, or only the latter. The side information is preferably arranged within the coded signal such that the side information to filter coefficients and pertaining amplification value or pertaining noise power limit is arranged in front of the main data to the audio block of quantized filtered audio values σ′, from which these filter coefficients with pertaining amplification values or pertaining noise power limit have been derived, i.e. the side information ac, x,(i) after block −1 and the side information a1, x1(i) after block 1. Put differently, the main data, i.e. the quantized filtered audio values σ′, starting from, excluding, an audio block of the kind where a significant change in the sequence of audio blocks has resulted in the filter coefficients, up to, including, the next audio block of this kind, in
In addition, the side information regarding the amplification value or the noise power limit and the filter coefficients in each side information block 132 and 134 are not always integrated independently of each other. Rather, this side information is transferred in differences to the previous side information block. In
This kind of integrating the side information into the side information blocks 132 and 134 offers the advantage of the possibility of a higher compression rate. The reason for this is that, although the side information will, if possible, only be transferred if a sufficient change of the filter coefficients relative to the filter coefficients of a previous node has resulted, the complexity of calculating the difference on the coder side or calculating the sum on the decoder side pays off, since the resulting differences remain small in spite of the query of step 66 and are thus advantageous for entropy coding.
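The coder-side handling of the side information described above may be illustrated by the following sketch. It is an illustration under assumptions only: the tuple layout (block index, kind, amplification value, filter coefficients), the function name encode_side_info, the use of the largest coefficient difference as the measure of change, and the parameter diffs_per_absolute are hypothetical and are not taken from the description.

```python
def encode_side_info(nodes, threshold, diffs_per_absolute):
    """Turn per-block parameters into absolute ("abs") and difference ("diff") blocks.

    nodes              -- list of (amplification_value, filter_coefficients) per audio block
    threshold          -- minimum coefficient change for a node to be transferred
    diffs_per_absolute -- number of difference blocks between two absolute blocks
    """
    blocks = []
    last = None      # parameters of the most recently transferred node
    diffs_sent = 0   # difference blocks since the last absolute block
    for index, (gain, coeffs) in enumerate(nodes):
        if last is not None:
            # Assumed measure of change: largest absolute coefficient difference.
            change = max(abs(c - p) for c, p in zip(coeffs, last[1]))
            if change < threshold:
                continue  # no sufficient change: nothing is transferred
        if last is None or diffs_sent == diffs_per_absolute:
            # Self-contained block (type 132): absolute parameter values.
            blocks.append((index, "abs", gain, list(coeffs)))
            diffs_sent = 0
        else:
            # Difference block (type 134): differences to the previous node.
            deltas = [c - p for c, p in zip(coeffs, last[1])]
            blocks.append((index, "diff", gain - last[0], deltas))
            diffs_sent += 1
        last = (gain, list(coeffs))
    return blocks
```

Since slowly changing parameters yield small difference values, the "diff" entries are the ones an entropy coder can represent with few bits.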
After an embodiment of an audio coder has been described before, an embodiment of an audio decoder which is suitable for decoding the coded signal generated by the audio coder 10 of
The setup of this decoder is shown in
As is shown in
As is shown in
In step 230, the decompressor 212 monitors the decompressed signal for the occurrence of any kind of side information block, namely with absolute filter coefficients or filter coefficients differences to a previous side information block. In the example of
As soon as the side information block 132 has occurred, the decompressor 212 will calculate the parameter values at the node 1, i.e. a1, x1(i), in step 232 by adding up the difference values in the side information block 134 and the parameter values in the side information block 132. Step 232 is of course omitted if the current side information block is a self-contained side information block without differences, which, as has been described before, may exemplarily occur every second. In order for the waiting time for the decoder 210 not to be too long, side information blocks 132 where the parameter values may be derived absolutely, i.e. with no relation to another side information block, are arranged at sufficiently small distances so that the turn-on time or down time when switching on the audio decoder 210 in the case of, for example, a radio transmission or broadcast transmission is not too large. Preferably, the side information blocks 134 with the difference values are arranged in a fixed predetermined number between the side information blocks 132 so that the decoder knows when a side information block of type 132 is again to be expected in the coded signal. Alternatively, the different side information block types are indicated by corresponding flags.
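The decoder-side reconstruction of the node parameters may be sketched as follows. The sketch assumes, purely for illustration, that each side information block is available as a tuple (block index, kind, amplification value, filter coefficients) with kind being "abs" for a self-contained block and "diff" for a difference block, which corresponds to the flag variant mentioned above; the tuple layout and the name decode_side_info are hypothetical.

```python
def decode_side_info(blocks):
    """Rebuild absolute node parameters from absolute and difference blocks.

    Returns a list of (block_index, amplification_value, filter_coefficients).
    Difference blocks received before the first absolute block are skipped,
    which models a decoder that is switched on in the middle of a transmission.
    """
    nodes = []
    current = None  # last reconstructed (amplification_value, coefficients)
    for index, kind, gain, coeffs in blocks:
        if kind == "abs":
            # Self-contained block: take the values as they are (step 232 omitted).
            current = (gain, list(coeffs))
        elif current is None:
            # No reference yet: wait for the next absolute block.
            continue
        else:
            # Step 232: add the difference values to the previous parameter values.
            summed = [p + d for p, d in zip(current[1], coeffs)]
            current = (current[0] + gain, summed)
        nodes.append((index, current[0], list(current[1])))
    return nodes
```

The shorter the distance between absolute blocks, the fewer difference blocks have to be skipped after switching on, at the cost of a somewhat higher side information rate.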
As is shown in
If the post-filter 218 has not yet reached the current node with the sample position j, which it checks in step 246, it will increment the sample position index j in step 248 and start steps 238-246 again. Only when the node has been reached will it apply the amplification value and the filter coefficients of the new node to the sample value at the node, namely in step 250. The application in turn includes, like in step 218, a division by the amplification value instead of a multiplication, and filtering with a transfer function equaling the listening threshold and not the inverse of the latter. After step 250, the current audio block is decoded by an interpolation between two node parameterizations.
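The interpolation between two node parameterizations may be illustrated by the following sketch. It assumes linear interpolation, which is an assumption made here for illustration, and it places the node at the last sample of the audio block so that the new node values apply there without interpolation. Only the division by the amplification value is shown; the filtering itself is omitted. The names interpolate_nodes and undo_gain are hypothetical.

```python
def interpolate_nodes(prev, cur, block_size):
    """Yield (amplification_value, filter_coefficients) for every sample position.

    prev, cur -- (amplification_value, filter_coefficients) of the previous
                 and the current node; the current node sits on the last sample.
    """
    prev_gain, prev_coeffs = prev
    cur_gain, cur_coeffs = cur
    for j in range(block_size):
        weight = (j + 1) / block_size  # reaches exactly 1 at the node
        gain = prev_gain + weight * (cur_gain - prev_gain)
        coeffs = [p + weight * (c - p) for p, c in zip(prev_coeffs, cur_coeffs)]
        yield gain, coeffs


def undo_gain(samples, prev, cur):
    """Divide each decoded sample by the interpolated amplification value."""
    params = interpolate_nodes(prev, cur, len(samples))
    return [s / gain for s, (gain, _) in zip(samples, params)]
```

With a block of four samples and amplification values 2.0 and 4.0 at the two nodes, the interpolated values are 2.5, 3.0, 3.5 and 4.0, so only the last sample uses the value of the new node itself.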
As has already been mentioned, the noise introduced by the quantization when coding in step 110 or 112 is adjusted in both shape and magnitude to the listening threshold by the filtering and the application of an amplification value in steps 218 and 224.
It is also to be pointed out that in the case that the quantized filtered audio values have been subjected to another multiplication in step 126 due to the bit rate controller before being coded into the coded signal, this factor may also be considered in steps 218 and 224. Alternatively, the audio values obtained by the process of
With regard to
Referring to the previous description, it is pointed out again that the coding scheme illustrated above may be varied in many regards. Exemplarily, it is not necessary for a parameterization and an amplification value or a noise power limit, as were determined for a certain audio block, to be considered as directly valid for a certain audio value, like in the previous embodiment the last respective audio value of each audio block, i.e. the 128th value in this audio block so that interpolation for this audio value may be omitted. Rather, it is possible to relate these node parameter values to a node which is temporally between the sample times tn, n=0, . . . , 127, of the audio values of this audio block so that an interpolation would be necessary for each audio value. In particular, the parameterization determined for an audio block or the amplification value determined for this audio block may also be applied indirectly to another value, such as, for example, the audio value in the middle of the audio block, such as, for example, the 64th audio value in the case of the above block size of 128 audio values.
Additionally, it is pointed out that the above embodiment referred to an audio coding scheme designed for generating a coded signal with a controlled bit rate. Controlling the bit rate, however, is not necessary for every case of application. This is why the corresponding steps 116 to 122 and 126 may also be omitted.
With reference to the compression scheme mentioned referring to step 114, for reasons of completeness, reference is made to the document by Schuller et al. described in the introduction to the description and, in particular, to division IV, the contents of which with regard to the redundancy reduction by means of lossless coding is incorporated herein by reference.
In addition, the following is to be pointed out referring to the previous embodiment. Although it has been described before that the threshold value always remains constant when quantizing or even the quantizing step function always remains constant, i.e. the artifacts generated in the filtered audio signal are always quantized or cut off by rougher a quantization, which may impair the audio quality to an audible extent, it is also possible to only use these measures if the complexity of the audio signal requires this, namely if the bit rate required for coding exceeds a desired bit rate. In this case, in addition to the quantizing step functions shown in
Furthermore, some aspects of the above embodiment are of advantage, but not necessary. Exemplarily, interpolation may be omitted in the above audio coding scheme. In addition, it would be possible to transfer the parameterizations and the amplification value or the parameterizations and the noise power limit with regard to each audio block with regard to which they were calculated, and not to leave out a single one when the successive parameterizations differ by less than the predetermined measure already mentioned.
In addition, it would be possible to only apply the difference coding to the parameterizations, but not to the amplification value or the noise power limit.
In addition, it is conceivable in the above coding scheme to transfer the filter coefficients in the difference side blocks 134 in a different manner, namely, for example, in the form of the current filter coefficients minus the previously transferred filter coefficients minus the minimum threshold of step 66.
The above-described audio coding scheme consequently relates to, among other things, effectively transferring side information in an audio coder with a very small delay time. The side information having to be transferred for the decoder in order for the audio signal to be reconstructed suitably has the feature of usually changing only slowly. This is why only differences are transferred, which decreases the bit rate. In addition, they will only be transferred when there are sufficient changes. From time to time, the absolute value will be transferred in case past values were lost. Put differently, the side information from the pre-filter or the coefficients are transferred such that the post-filter in the decoder has the inverse transfer function so that the audio signal may again be reconstructed suitably. The bit rate required for this is reduced by transferring differences, but only if they have a sufficient size. These differences have smaller values and occur more frequently, which is why they require fewer bits when coding. The difference coding thus particularly pays off since the differences will also only change steadily with continually changing audio signals.
In particular, it is pointed out that, depending on the circumstances, the inventive audio coding scheme may also be implemented in software. The implementation may be on a digital storage medium, in particular on a disc or a CD having control signals which may be read out electronically, which can cooperate with a programmable computer system such that the corresponding method will be executed. In general, the invention thus also consists in a computer program product having a program code stored on a machine-readable carrier for performing the inventive method when the computer program product runs on a computer. Put differently, the invention may also be realized as a computer program having a program code for performing the method when the computer program runs on a computer.
In particular, above method steps in the blocks of the flow chart may be implemented individually or in groups of several ones together in subprogram routines. Alternatively, an implementation of an inventive device in the form of an integrated circuit is, of course, also possible where these blocks are, for example, implemented as individual circuit parts of an ASIC.
While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10 2004 007191.8 | Feb 2004 | DE | national |
This application is a continuation of copending International Application No. PCT/EP2005/001363, filed Feb. 10, 2005, which designated the United States and was not published in English, and is incorporated herein by reference in its entirety, and which claimed priority to German Patent Application No. 10 2004 007 191.8, filed on Feb. 13, 2004.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP05/01363 | Feb 2005 | US |
Child | 11460423 | Jul 2006 | US |