The present application is concerned with audio coding, and especially with noise filling in connection with audio coding.
In transform coding it is often recognized (compare [1], [2], [3]) that quantizing parts of a spectrum to zero leads to a perceptual degradation. Such parts quantized to zero are called spectrum holes. A solution to this problem presented in [1], [2], [3] and [4] is to replace zero-quantized spectral lines with noise. Sometimes, the insertion of noise is avoided below a certain frequency. In each known approach the starting frequency for noise filling is fixed, although its value differs from one approach to another.
Sometimes, FDNS (Frequency Domain Noise Shaping) is used for shaping the spectrum (including the inserted noise) and for the control of the quantization noise, as in USAC (compare [4]). FDNS is performed using the magnitude response of the LPC filter. The LPC filter coefficients are calculated using the pre-emphasized input signal.
It was noted in [1] that adding noise in the immediate neighborhood of a tonal component leads to a degradation, and accordingly, just as in [5] only long runs of zeros are filled with noise to avoid concealing non-zero quantized values by the injected surrounding noise.
In [3] it is noted that there is a problem of a compromise between the granularity of the noise filling and the size of the necessitated side information. In [1], [2], [3] and [5] one noise filling parameter per complete spectrum is transmitted. The inserted noise is spectrally shaped using LPC as in [2] or using scale factors as in [3]. It is described in [3] how to adapt scale factors to a noise filling with one noise filling level for the whole spectrum. In [3], the scale factors for bands that are completely quantized to zero are modified to avoid spectral holes and to have a correct noise level.
Even though the solutions in [1] and [5] avoid a degradation of tonal components in that they suggest not filling small spectrum holes, there is still a need to further improve the quality of an audio signal coded using noise filling, especially at very low bit-rates.
An embodiment may have an apparatus configured to perform noise filling on a spectrum of an audio signal in a manner dependent on a tonality of the audio signal, wherein the apparatus is configured to dequantize the spectrum, as derived after the noise-filling, using a spectrally varying and signal-adaptive quantization step size controlled via a linear prediction spectral envelope signaled via linear prediction coefficients in a data stream into which the spectrum is coded, or scale factors relating to scale factor bands, signaled in the data stream into which the spectrum is coded, wherein the apparatus is configured to fill a contiguous spectral zero-portion of the audio signal's spectrum with noise spectrally shaped using: a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges an absolute slope of which negatively depends on the tonality, or a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges a spectral width of which positively depends on the tonality, or a constant or unimodal function an integral of which—normalized to an integral of 1—over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality.
Another embodiment may have an apparatus configured to perform noise filling on a spectrum of an audio signal in a manner dependent on a tonality of the audio signal, wherein the apparatus is configured to dequantize the spectrum, as derived after the noise-filling, using a spectrally varying and signal-adaptive quantization step size controlled via a linear prediction spectral envelope signaled via linear prediction coefficients in a data stream into which the spectrum is coded, or scale factors relating to scale factor bands, signaled in the data stream into which the spectrum is coded, identify contiguous spectral zero-portions of the audio signal's spectrum and to apply the noise filling onto the contiguous spectral zero-portions identified, and respectively fill the contiguous spectral zero-portions of the audio signal's spectrum with noise spectrally shaped with a function set dependent on a respective contiguous spectral zero-portion's width so that the function is confined to the respective contiguous spectral zero-portion, and dependent on the tonality of the audio signal so that, if the tonality of the audio signal increases, the function gets more compact in the inner of the respective contiguous spectral zero-portion and distanced from the respective contiguous spectral zero-portion's outer edges.
Another embodiment may have an audio decoder supporting noise filling including an inventive apparatus.
According to another embodiment, a perceptual transform audio decoder may have: an inventive apparatus configured to perform noise filling on a spectrum of an audio signal; and a frequency domain noise shaper configured to subject the noise filled spectrum to spectral shaping using a spectral perceptual weighting function.
Another embodiment may have an audio encoder supporting noise filling including an inventive apparatus, the encoder being configured to use a spectrum filled with noise by the apparatus, for analysis-by-synthesis.
Another embodiment may have an audio encoder supporting noise filling, configured to quantize and code a spectrum of an audio signal into a data stream and set and code into the data stream, a spectrally global noise filling level for performing noise filling on the spectrum of the audio signal, in a manner dependent on a tonality of the audio signal, wherein the encoder is configured to, in setting and coding the spectrally global noise filling level, measure a level of the audio signal within contiguous spectral zero-portions of the spectrum, spectrally shaped dependent on the tonality of the audio signal, wherein the contiguous spectral zero-portions of the audio signal's spectrum are spectrally shaped using: a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges an absolute slope of which negatively depends on the tonality, or a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges a spectral width of which positively depends on the tonality, or a constant or unimodal function an integral of which, normalized to an integral of 1, over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality.
Another embodiment may have a method including performing noise filling on a spectrum of an audio signal in a manner dependent on a tonality of the audio signal, wherein the method includes dequantizing the spectrum, as derived after the noise-filling, using a spectrally varying and signal-adaptive quantization step size controlled via a linear prediction spectral envelope signaled via linear prediction coefficients in a data stream into which the spectrum is coded, or scale factors relating to scale factor bands, signaled in the data stream into which the spectrum is coded, wherein the method includes filling a contiguous spectral zero-portion of the audio signal's spectrum with noise spectrally shaped using: a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges an absolute slope of which negatively depends on the tonality, or a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges a spectral width of which positively depends on the tonality, or a constant or unimodal function an integral of which—normalized to an integral of 1—over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality.
Another embodiment may have a method for audio encoding supporting noise filling, the method including quantizing and coding a spectrum of an audio signal into a data stream and setting and coding into the data stream, a spectrally global noise filling level for performing noise filling on the spectrum of the audio signal, in a manner dependent on a tonality of the audio signal, wherein the setting and coding the spectrally global noise filling level includes measuring of a level of the audio signal within contiguous spectral zero-portions of the spectrum, spectrally shaped dependent on the tonality of the audio signal, wherein the contiguous spectral zero-portions of the audio signal's spectrum are spectrally shaped using: a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges an absolute slope of which negatively depends on the tonality, or a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges a spectral width of which positively depends on the tonality, or a constant or unimodal function an integral of which—normalized to an integral of 1—over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality.
Another embodiment may have a computer program including a program code for performing, when running on a computer, a method including performing noise filling on a spectrum of an audio signal in a manner dependent on a tonality of the audio signal, wherein the method includes dequantizing the spectrum, as derived after the noise-filling, using a spectrally varying and signal-adaptive quantization step size controlled via a linear prediction spectral envelope signaled via linear prediction coefficients in a data stream into which the spectrum is coded, or scale factors relating to scale factor bands, signaled in the data stream into which the spectrum is coded, wherein the method includes filling a contiguous spectral zero-portion of the audio signal's spectrum with noise spectrally shaped using: a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges an absolute slope of which negatively depends on the tonality, or a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges a spectral width of which positively depends on the tonality, or a constant or unimodal function an integral of which—normalized to an integral of 1—over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality, when said computer program is run by a computer.
Another embodiment may have a computer program including a program code for performing, when running on a computer, a method for audio encoding supporting noise filling, the method including quantizing and coding a spectrum of an audio signal into a data stream and setting and coding into the data stream, a spectrally global noise filling level for performing noise filling on the spectrum of the audio signal, in a manner dependent on a tonality of the audio signal, wherein the setting and coding the spectrally global noise filling level includes measuring of a level of the audio signal within contiguous spectral zero-portions of the spectrum, spectrally shaped dependent on the tonality of the audio signal, wherein the contiguous spectral zero-portions of the audio signal's spectrum are spectrally shaped using: a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges an absolute slope of which negatively depends on the tonality, or a function assuming a maximum in an inner of the contiguous spectral zero-portion, and including outwardly falling edges a spectral width of which positively depends on the tonality, or a constant or unimodal function an integral of which—normalized to an integral of 1—over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality, when said computer program is run by a computer.
It is a basic finding of the present application that noise filling of a spectrum of an audio signal may be improved in quality with respect to the noise filled spectrum so that the reproduction of the noise filled audio signal is less annoying, by performing the noise filling in a manner dependent on a tonality of the audio signal.
In accordance with an embodiment of the present application, a contiguous spectral zero-portion of the audio signal's spectrum is filled with noise spectrally shaped using a function assuming a maximum in an inner of the contiguous spectral zero-portion, and having outwardly falling edges an absolute slope of which negatively depends on the tonality, i.e. the slope decreases with increasing tonality. Additionally or alternatively, the function used for filling assumes a maximum in an inner of the contiguous spectral zero-portion and has outwardly falling edges, a spectral width of which positively depends on the tonality, i.e. the spectral width increases with increasing tonality. Even further, additionally or alternatively, a constant or unimodal function may be used for filling, an integral of which, normalized to an integral of 1, over outer quarters of the contiguous spectral zero-portion negatively depends on the tonality, i.e. the integral decreases with increasing tonality. By all of these measures, noise filling tends to be less detrimental for tonal parts of the audio signal, while nevertheless remaining effective for non-tonal parts of the audio signal in terms of reduction of spectrum holes. In other words, whenever the audio signal has tonal content, the noise filled into the audio signal's spectrum leaves the tonal peaks of the spectrum unaffected by keeping enough distance therefrom, whereas during temporal phases in which the audio content is non-tonal, the noise filling still matches the non-tonal character of the audio signal.
In accordance with an embodiment of the present application, contiguous spectral zero-portions of the audio signal's spectrum are identified and the zero-portions identified are filled with noise spectrally shaped with functions so that, for each contiguous spectral zero-portion, the respective function is set dependent on the respective contiguous spectral zero-portion's width and a tonality of the audio signal. For ease of implementation, the dependency may be achieved by a lookup in a look-up table of functions, or the functions may be computed analytically using a mathematical formula depending on the contiguous spectral zero-portion's width and the tonality of the audio signal. In any case, the effort for realizing the dependency is relatively minor compared to the advantages resulting from the dependency. In particular, the dependency may be such that the respective function is set dependent on the contiguous spectral zero-portion's width so that the function is confined to the respective contiguous spectral zero-portion, and dependent on the tonality of the audio signal so that, for a higher tonality of the audio signal, the function's mass becomes more compact in the inner of the respective contiguous spectral zero-portion and distanced from the respective contiguous spectral zero-portion's edges.
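The analytic variant of this dependency can be illustrated by a minimal sketch. It assumes a tonality value normalized to the range 0 to 1 and a plateau flanked by linear edges; the name `shaping_function` and the parameter `max_edge_fraction` are hypothetical and are not taken from the embodiments.

```python
def shaping_function(width, tonality, max_edge_fraction=0.5):
    """Return `width` weights: a plateau flanked by linearly falling edges.

    Assumption: `tonality` lies in [0, 1]. A higher tonality widens the
    edges (lower absolute slope), which concentrates the function's mass
    in the inner of the zero-portion. The function never extends beyond
    the zero-portion because exactly `width` weights are produced.
    """
    if width <= 0:
        return []
    tonality = min(max(tonality, 0.0), 1.0)
    # Edge width in spectral lines, limited so that both edges fit.
    edge = int(round(tonality * max_edge_fraction * width))
    edge = min(edge, (width - 1) // 2)
    weights = []
    for j in range(width):
        # Distance to the nearer border; 1 for the outermost lines.
        distance = min(j, width - 1 - j) + 1
        weights.append(min(1.0, distance / (edge + 1)))
    return weights


low_tonality = shaping_function(8, 0.0)   # constant function
high_tonality = shaping_function(8, 1.0)  # compact, distanced from the edges
```

A look-up table indexed by zero-portion width and tonality class could hold precomputed results of such a formula instead of evaluating it per frame.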
In accordance with a further embodiment, the noise spectrally shaped and filled into the contiguous spectral zero-portions is commonly scaled using a spectrally global noise filling level. In particular, the noise is scaled such that an integral over the noise in the contiguous spectral zero-portions or an integral over the functions of the contiguous spectral zero-portions corresponds to, e.g. is equal to, a global noise filling level. Advantageously, a global noise filling level is coded within existing audio codecs anyway so that no additional syntax has to be provided for such audio codecs. That is, the global noise filling level may be explicitly signaled in the data stream into which the audio signal is coded with low effort. In effect, the functions with which the contiguous spectral zero-portion's noise is spectrally shaped may be scaled such that an integral over the noise with which all contiguous spectral zero-portions are filled corresponds to the global noise filling level.
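One possible reading of this common scaling is sketched here, under the assumption that the "integral" is taken as the plain sum of the function values over all zero-portions. The helper name `scale_functions_to_level` is hypothetical, and other level measures, such as an RMS, would be equally conceivable.

```python
def scale_functions_to_level(functions, global_level):
    """Scale all shaping functions by one common gain.

    Assumption: the integral over the functions is their summed weight,
    and after scaling it shall equal `global_level`.
    """
    total = sum(sum(func) for func in functions)
    if total == 0:
        # Nothing to scale; return copies unchanged.
        return [list(func) for func in functions]
    gain = global_level / total
    return [[gain * weight for weight in func] for func in functions]


scaled = scale_functions_to_level([[1.0, 1.0], [0.5, 1.0, 0.5]], 2.0)
```

Because a single gain is applied to every zero-portion, the relative shapes selected per zero-portion are preserved.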
In accordance with an embodiment of the present application, the tonality is derived from a coding parameter using which the audio signal is coded. By this measure, no additional information needs to be transmitted within an existing audio codec. In accordance with specific embodiments, the coding parameter is an LTP (Long-Term Prediction) flag or gain, a TNS (Temporal Noise Shaping) enablement flag or gain and/or a spectrum rearrangement enablement flag.
In accordance with a further embodiment, the performance of the noise filling is confined to a high-frequency spectral portion, wherein a low-frequency starting position of the high-frequency spectral portion is set in accordance with an explicit signaling in the data stream into which the audio signal is coded. By this measure, a signal-adaptive setting of the lower bound of the high-frequency spectral portion in which the noise filling is performed is feasible. This, in turn, may increase the audio quality resulting from the noise filling. The additional side information necessitated by the explicit signaling is comparatively small.
In accordance with a further embodiment of the present application, the apparatus is configured to perform the noise filling using a spectral low-pass filter so as to counteract a spectral tilt caused by a pre-emphasis used to code the audio signal's spectrum. By this measure, the noise filling quality is increased even further, since the depth of remaining spectrum holes is further reduced. More generally speaking, noise filling in perceptual transform audio codecs may be improved by, in addition to tonality dependently spectrally shaping the noise within spectrum holes, performing the noise filling with a spectrally global tilt, rather than in a spectrally flat manner. For example, the spectrally global tilt may have a negative slope, i.e. exhibit a decrease from low to high frequencies, in order to at least partially reverse the spectral tilt caused by subjecting the noise filled spectrum to the spectral perceptual weighting function. In particular, spectral perceptual weighting functions typically tend to exhibit an increase from low to high frequencies. Accordingly, noise filled into the spectrum of perceptual transform audio coders in a spectrally flat manner would end up as a tilted noise floor in the finally reconstructed spectrum. The inventors of the present application, however, realized that this tilt in the finally reconstructed spectrum negatively affects the audio quality, because it leads to spectral holes remaining in noise-filled parts of the spectrum. Accordingly, inserting the noise with a spectrally global tilt so that the noise level decreases from low to high frequencies at least partially compensates for such a spectral tilt caused by the subsequent shaping of the noise filled spectrum using the spectral perceptual weighting function, thereby improving the audio quality.
Depending on the circumstances, a positive slope may be advantageous as well, e.g. in cases where the coded spectrum exhibits a high-pass-like character.
In accordance with an embodiment, the slope of the spectrally global tilt is varied responsive to a signaling in the data stream into which the spectrum is coded. The signaling may, for example, explicitly signal the steepness and may be adapted, at the encoding side, to the amount of spectral tilt caused by the spectral perceptual weighting function. For example, the amount of spectral tilt caused by the spectral perceptual weighting function may stem from a pre-emphasis which the audio signal is subject to before applying the LPC analysis thereon.
The noise filling may be used at audio encoding and/or audio decoding side. When used at the audio encoding side, the noise filled spectrum may be used for analysis-by-synthesis purposes.
In accordance with an embodiment, an encoder determines the global noise scaling level by taking the tonality dependency into account.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Wherever, in the following description of the figures, equal reference signs are used for the elements shown in these figures, the description brought forward with regard to one element in one figure shall be interpreted as transferable onto the element in another figure having been referenced using the same reference sign. By this measure, an extensive and repetitive description is avoided as far as possible, thereby concentrating the description of the various embodiments onto the differences among each other rather than describing all embodiments anew from the outset.
The following description starts with embodiments for an apparatus for performing noise filling on a spectrum of an audio signal, first. Second, different embodiments are presented for various audio codecs, where such a noise filling may be built-in, along with specifics which could apply in connection with a respective audio codec presented. It is noted that the noise filling described next may, in any case, be performed at the decoding side. Depending on the encoder, however, the noise filling as described next may also be performed at the encoding side such as, for example, for analysis-by-synthesis reasons. An intermediate case according to which the modified way of noise filling in accordance with the embodiments outlined below merely partially changes the way the encoder works such as, for example, in order to determine a spectrally global noise filling level, is also described below.
Beyond that, in a time-aligned manner
The apparatus of
The actual noise filling is performed by noise filler 32. The noise filler 32 receives the spectrum to which the noise filling shall be applied. This spectrum is illustrated in
Accordingly, it is the task of tonality determiner 34 to provide the noise filler 32 with an estimation of the tonality on the basis of another tonality hint 38 as will be described in more detail below. In accordance with the embodiments described later, the tonality hint 38 may be available at encoding and decoding sides anyway, by way of a respective coding parameter conveyed within the data stream of the audio codec within which apparatus 30 is, for example, used.
The tonality dependency of the noise filling generally described above with respect to
As can be seen, the absolute value of the slope of edges 58 and 60 is higher for function 50 than for function 48. The noise filler 32 fills the zero-portion 40 using function 50 for tonalities lower than those for which it uses function 48. By this measure, the noise filler 32 avoids cluttering the immediate periphery of potentially tonal spectral peaks of spectrum 34, such as, for example, peak 62. The smaller the absolute slope of edges 58 and 60 is, the further away the noise filled into zero-portion 40 is from the non-zero portions of spectrum 34 surrounding zero-portion 40.
Noise filler 32 may, for example, choose to select function 48 in case of the audio signal's tonality being τ2, and function 50 in case of the audio signal's tonality being τ1, but the description brought forward further below will reveal that noise filler 32 may discriminate more than two different states of the audio signal's tonality, i.e. may support more than two different functions 48, 50 for filling a certain contiguous spectral zero-portion and choose between those depending on the tonality via a surjective mapping from tonalities to functions.
As a minor note, the construction of functions 48 and 50, according to which they have a plateau in the inner interval 52 flanked by edges 58 and 60 so as to result in unimodal functions, is merely an example. Alternatively, bell-shaped functions may be used. The interval 52 may alternatively be defined as the interval within which the function is higher than 95% of its maximum value.
In order to explain this, see
In
In this situation, the integral of function 50 over quarters a, d is greater than the integral of function 48 over quarters a, d and accordingly, noise filler 32 uses function 50 for lower tonalities and function 48 for higher tonalities, i.e. the integral over the outer quarters of the normalized functions 50 and 48 negatively depends on the tonality.
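The outer-quarter criterion can be checked numerically with a short sketch. The helper `outer_quarter_mass` and the two example weight lists are illustrative assumptions; they are not the functions 48 and 50 of the figures.

```python
def outer_quarter_mass(weights):
    """Share of the function's total mass lying in the first and last quarter.

    Dividing by the total corresponds to normalizing the function to an
    integral of 1 before integrating over the outer quarters.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    quarter = len(weights) // 4
    outer = sum(weights[:quarter]) + sum(weights[len(weights) - quarter:])
    return outer / total


flat = [1.0] * 8                                    # constant function
compact = [0.25, 0.5, 0.75, 1.0, 1.0, 0.75, 0.5, 0.25]  # unimodal function

flat_share = outer_quarter_mass(flat)        # half of the mass is in the outer quarters
compact_share = outer_quarter_mass(compact)  # a smaller share is in the outer quarters
```

Since the normalized outer-quarter integral negatively depends on the tonality, the function with the smaller share would be the candidate for the higher tonality.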
For illustration purposes, in case of
Although the type of variation of functions 48 and 50 depending on the tonality varies, all examples of
Until now, the description of
The zero-portion filler 72 is configured to fill the identified contiguous spectral zero-portions identified by identifier 70 with noise spectrally shaped in accordance with a function as described above with respect to
In particular, the individual filling of each contiguous spectral zero-portion identified by identifier 70 may be performed by filler 72 as follows: the function is set dependent on the contiguous spectral zero-portion's width so that the function is confined to the respective contiguous spectral zero-portion, i.e. the domain of the function coincides with the contiguous spectral zero-portion's width. The setting of the function is further dependent on the tonality of the audio signal, namely in the manner outlined above with respect to
It has already been outlined above that the noise filling's dependency on the tonality may discriminate between more than only two different tonalities, such as three, four or even more than four.
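The identification of contiguous spectral zero-portions that precedes the filling can be sketched as a simple run search. The function name and the parameters `min_width` and `start_index` are hypothetical; they merely stand in for a minimum run length and for a starting frequency below which no noise is filled.

```python
def find_zero_portions(spectrum, min_width=1, start_index=0):
    """Return (left, width) pairs of contiguous runs of zero-quantized lines.

    Assumptions: `spectrum` holds quantized integer levels, runs shorter
    than `min_width` are ignored, and lines below `start_index` are never
    considered for noise filling.
    """
    portions = []
    run_start = None
    for i in range(start_index, len(spectrum)):
        if spectrum[i] == 0:
            if run_start is None:
                run_start = i  # a new zero run begins here
        elif run_start is not None:
            if i - run_start >= min_width:
                portions.append((run_start, i - run_start))
            run_start = None
    # A run may extend to the upper end of the spectrum.
    if run_start is not None and len(spectrum) - run_start >= min_width:
        portions.append((run_start, len(spectrum) - run_start))
    return portions


quantized = [3, 0, 0, 0, -2, 0, 1, 0, 0]
all_runs = find_zero_portions(quantized)
long_runs = find_zero_portions(quantized, min_width=2)
```

Each returned pair supplies the width on which the choice of the shaping function depends.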
Until now, the description of certain embodiments of the present application focused on the function's shape used to spectrally shape the noise with which certain contiguous spectral zero-portions are filled. It is advantageous, however, to control the overall level of noise added to a certain spectrum to be noise filled so as to result in a pleasant reconstruction, or to even control the level of noise introduction spectrally.
In accordance with one embodiment, the available set of functions 48, 50 for spectrally shaping the noise to be filled into the portions 90-94 all have a predefined scale which is known to encoder and decoder. A spectrally global scaling factor is signaled explicitly within the data stream into which the audio signal, i.e. the non-quantized part of the spectrum, is coded. This factor indicates, for example, the RMS or another measure for a level of noise, i.e. random or pseudorandom spectral line values, with which portions 90-94 are pre-set at the decoding side before being spectrally shaped using the tonality dependently selected functions 48, 50 as they are. How the global noise scaling factor could be determined at the encoder side is described further below. Let, for example, A be the set of indices i of spectral lines where the spectrum is quantized to zero and which belong to any of the portions 90-94, and let N denote the global noise scaling factor. The values of the spectrum shall be denoted $x_i$. Further, "random(N)" shall denote a function giving a random value of a level corresponding to level "N", and left(i) shall be a function indicating, for any zero-quantized spectral value at index i, the index of the zero-quantized value at the low-frequency end of the zero-portion to which i belongs. $F_i(j)$ with $j = 0, \dots, J_i - 1$ shall denote the function 48 or 50 assigned, depending on the tonality, to the zero-portion 90-94 starting at index i, with $J_i$ indicating the width of that zero-portion. Then, portions 90-94 are filled according to $x_i = F_{\mathrm{left}(i)}(i - \mathrm{left}(i)) \cdot \mathrm{random}(N)$ for all $i \in A$.
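The filling rule stated above can be written down as a minimal sketch. It assumes that random(N) is realized as a uniform draw from the interval from −N to N, which is only one conceivable choice, and that the zero-portions and their shaping functions are already given; all identifiers are hypothetical.

```python
import random


def fill_zero_portions(spectrum, portions, functions, noise_level, rng=None):
    """Fill each zero-portion according to x_i = F_left(i)(i - left(i)) * random(N).

    `portions` holds (left, width) pairs, `functions` the matching lists of
    shaping weights. Assumption: random(N) is a uniform value in [-N, N].
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    filled = list(spectrum)
    for (left, width), func in zip(portions, functions):
        for j in range(width):
            noise = rng.uniform(-noise_level, noise_level)
            filled[left + j] = func[j] * noise
    return filled


spectrum = [3, 0, 0, 0, -2]
filled = fill_zero_portions(spectrum, [(1, 3)], [[0.5, 1.0, 0.5]], noise_level=2.0)
```

Non-zero quantized lines are passed through untouched, and the lines adjacent to them receive the most strongly attenuated noise.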
Additionally, the filling of noise into portions 90-94 may be controlled such that the noise level decreases from low to high frequencies. This may be done by spectrally shaping the noise with which the portions are pre-set, or by spectrally shaping the arrangement of functions 48, 50, in accordance with a low-pass filter's transfer function. This may compensate for a spectral tilt caused when re-scaling/dequantizing the filled spectrum due to, for example, a pre-emphasis used in determining the spectral course of the quantization step size. Accordingly, the steepness of the decrease or the low-pass filter's transfer function may be controlled according to a degree of pre-emphasis applied. Applying the nomenclature used above, portions 90-94 may be filled according to $x_i = F_{\mathrm{left}(i)}(i - \mathrm{left}(i)) \cdot \mathrm{random}(N) \cdot \mathrm{LPF}(i)$, with LPF(i) denoting the low-pass filter's transfer function, which may be linear. Depending on the circumstances, the function LPF, which corresponds to function 15, may have a positive slope, in which case LPF is to be read as HPF.
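A linear LPF(i) of the kind mentioned above might look as follows. The sketch assumes that the tilt is parameterized by its value at the highest spectral line, relative to 1.0 at the lowest line; the name `linear_tilt` and the parameter `tilt_at_top` are hypothetical.

```python
def linear_tilt(index, num_lines, tilt_at_top=0.5):
    """Linear tilt factor: 1.0 at line 0, `tilt_at_top` at the last line.

    Assumption: `tilt_at_top` below 1.0 gives a negative slope (low-pass
    character), a value above 1.0 gives a positive slope (high-pass character).
    """
    if num_lines <= 1:
        return 1.0
    return 1.0 + (tilt_at_top - 1.0) * index / (num_lines - 1)


# Apply the tilt to already shaped noise values, line by line.
shaped_noise = [0.8, -0.4, 0.6, -0.2, 0.4]
tilted_noise = [value * linear_tilt(i, len(shaped_noise))
                for i, value in enumerate(shaped_noise)]
```

The steepness would in practice be tied to the degree of pre-emphasis, either fixed or signaled in the data stream.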
Instead of using a fixed scaling of the functions selected depending on tonality and zero-portion's width, the just outlined spectral tilt correction may directly be accounted for by using the spectral position of the respective contiguous zero-portion also as an index in looking-up or otherwise determining 80 the function to be used for spectrally shaping the noise with which the respective contiguous spectral zero-portion has to be filled. For example, a mean value of the function or its pre-scaling used for spectrally shaping the noise to be filled into a certain zero-portion 90-94 may depend on the zero-portion's 90-94 spectral position so that, over the whole bandwidth of the spectrum, the functions used for the contiguous spectral zero-portions 90-94 are pre-scaled so as to emulate a low-pass filter transfer function so as to compensate for any high pass pre-emphasis transfer function used to derive the non-zero quantized portions of the spectrum.
Having described embodiments for performing the noise filling, embodiments of audio codecs into which the noise filling outlined above may advantageously be built are presented in the following.
The spectral line-wise representation of the audio signal, i.e. the spectrogram 12, and the masking threshold enter quantizer 108 which is responsible for quantizing the spectral samples of the spectrogram 12 using a spectrally varying quantization step size which depends on the masking threshold: the larger the masking threshold, the more quantization noise is masked and the larger the quantization step size may be. In particular, the quantizer 108 informs the decoding side of the variation of the quantization step size in the form of so-called scale factors which, by way of the just-described relationship between quantization step size on the one hand and perceptual masking threshold on the other hand, are a kind of representation of the perceptual masking threshold itself. In order to find a good compromise between the amount of side information to be spent for transmitting the scale factors to the decoding side, and the granularity of adapting the quantization noise to the perceptual masking threshold, quantizer 108 sets/varies the scale factors in a spectrotemporal resolution which is lower than, or coarser than, the spectrotemporal resolution at which the quantized spectral levels describe the spectral line-wise representation of the audio signal's spectrogram 12. For example, the quantizer 108 subdivides each spectrum into scale factor bands 110 such as Bark bands, and transmits one scale factor per scale factor band 110. The temporal resolution at which the scale factors are transmitted may likewise be lower than that of the spectral levels of the spectral values of spectrogram 12.
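The effect of one step size per scale factor band can be illustrated with a small sketch. It assumes uniform rounding quantization and represents each scale factor directly by its step size; the band borders and values are made up for illustration.

```python
def quantize_with_scale_factors(spectrum, band_offsets, step_sizes):
    """Quantize each scale factor band with its own step size.

    Assumptions: `band_offsets` lists the first line of every band plus the
    end of the last band, and quantization is plain rounding of value/step.
    """
    levels = []
    for band, step in enumerate(step_sizes):
        for i in range(band_offsets[band], band_offsets[band + 1]):
            levels.append(int(round(spectrum[i] / step)))
    return levels


spectrum = [0.4, 1.2, -2.6, 0.3, 0.9, 5.2]
levels = quantize_with_scale_factors(spectrum, [0, 3, 6], [1.0, 2.0])
```

In the second band the coarser step size quantizes two lines to zero, which is how spectrum holes arise.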
Both the spectral levels of the spectral values of the spectrogram 12, as well as the scale factors 112 are transmitted to the decoding side. However, in order to improve the audio quality, the encoder 100 transmits within the data stream also a global noise level which signals to the decoding side the noise level up to which zero-quantized portions of representation 12 have to be filled with noise before rescaling, or dequantizing, the spectrum by applying the scale factors 112. This is shown in
As already denoted above, the noise filling to which the global noise level 114 refers, may be subject to a restriction in that this kind of noise filling merely refers to frequencies above some starting frequency which is indicated in
The encoder 100 of
As far as the dependency on the tonality is concerned, the encoder 100 may determine the global noise level 114, and insert same into the data stream, by associating to the zero-portions 40a to 40d the function for spectrally shaping the noise for filling the respective zero-portion. In particular, the encoder may use these functions in order to weight the original, i.e. weighted but not yet quantized, audio signal's spectral values in these portions 40a to 40d in order to determine the global noise level 114. Thereby, the global noise level 114 determined and transmitted within the data stream, leads to a noise filling at the decoding side which more closely recovers the original audio signal's spectrum.
The encoder 100 may, depending on the audio signal's content, decide on using some coding options which, in turn, may be used as tonality hints such as the tonality hint 38 shown in
Additionally or alternatively, encoder 100 may support temporal noise shaping. That is, on a per spectrum 18 basis, for example, encoder 100 may choose to subject spectrum 18 to temporal noise shaping with indicating this decision by way of a temporal noise shaping enablement flag to the decoder. The TNS enablement flag indicates whether the spectral levels of spectrum 18 form the prediction residual of a spectral, i.e. along frequency direction determined, linear prediction of the spectrum or whether the spectrum is not LP predicted. If TNS is signaled to be enabled, the data stream additionally comprises the linear prediction coefficients for spectrally linear predicting the spectrum so that the decoder may recover the spectrum using these linear prediction coefficients by applying same onto the spectrum before or after the rescaling or dequantizing. The TNS enablement flag is also a tonality hint: if the TNS enablement flag signals TNS to be switched on, e.g. on a transient, then the audio signal is very unlikely to be tonal, as the spectrum seems to be well predictable by linear prediction along frequency axis and, hence, non-stationary. Accordingly, the tonality may be determined on the basis of the TNS enablement flag such that the tonality is higher if the TNS enablement flag disables TNS, and is lower if the TNS enablement flag signals the enablement of TNS. Instead of, or in addition to, a TNS enablement flag, it may be possible to derive from the TNS filter coefficients a TNS gain indicating a degree up to which TNS is usable for predicting the spectrum, thereby also revealing a more-than-two-valued hint concerning the tonality.
Other coding parameters may also be coded within the data stream by encoder 100. For example, a spectrum rearrangement enablement flag may signal one coding option according to which the spectrum 18 is coded by rearranging the spectral levels, i.e. the quantized spectral values, spectrally, with the rearrangement prescription being additionally transmitted within the data stream so that the decoder may rearrange, or rescramble, the spectral levels so as to recover spectrum 18. If the spectrum rearrangement enablement flag is enabled, i.e. spectrum rearrangement is applied, this indicates that the audio signal is likely to be tonal, as rearrangement tends to be more rate/distortion effective in compressing the data stream if there are many tonal peaks within the spectrum. Accordingly, additionally or alternatively, the spectrum rearrangement enablement flag may be used as a tonality hint, and the tonality used for noise filling may be set to be higher if the spectrum rearrangement enablement flag is enabled, and lower if the spectrum rearrangement enablement flag is disabled.
For the sake of completeness, and also with reference to
As far as the concept of imposing a spectrally global tilt on the noise, and taking the same into account when computing the noise level parameter at the encoding side, is concerned, the encoder 100 may determine the global noise level 114, and insert same into the data stream, as follows: the encoder weights those portions of the audio signal's spectral values which are not yet quantized, but already weighted with the inverse of the perceptual weighting function, and which are spectrally co-located to zero-portions 40a to 40d, with a function which spectrally extends at least over the whole noise filling portion of the spectrum bandwidth and has a slope of opposite sign relative to the function 15 used at the decoding side for noise filling, for example, and measures the level based on the thus weighted non-quantized values.
As already described with respect to
It is noted that the noise which noise filler 30 spectrally shapes in the tonality dependent manner described above and/or subjects to a spectrally global tilt in a manner described above, may stem from a pseudorandom noise source, or may be derived from noise filler 30 on the basis of spectral copying or patching from other areas of the same spectrum or related spectrums, such as a time-aligned spectrum of another channel, or a temporally preceding spectrum. Even patching from the same spectrum may be feasible, such as copying from lower frequency areas of spectrum 18 (spectral copy-up). Irrespective of the way the noise filler 30 derives the noise, filler 30 spectrally shapes the noise for filling into contiguous spectral zero-portions 40a to 40d in the tonality dependent manner described above and/or subjects same to a spectrally global tilt in a manner described above.
For the sake of completeness only, it is shown in
Even here, the noise filler 30 may apply the tonality dependent filling of the contiguous spectral zero-portions 40a to 40d exemplarily as shown in
In accordance with the audio codec examples outlined above with respect to
The dequantizer 174 receives from the LPC-to-spectral-line converter 172 a spectral curve to be used by dequantizer 174 for reshaping the filled spectrum or, in other words, for dequantizing it. This process is sometimes called FDNS (Frequency Domain Noise Shaping). The LPC-to-spectral-line-converter 172 derives the spectral curve on the basis of the LPC information 162 in the data stream. The dequantized spectrum, or reshaped spectrum, output by dequantizer 174 is subject to an inverse transformation by inverse transformer 176 in order to recover the audio signal. Again, the sequence of reshaped spectrums may be subject by inverse transformer 176 to an inverse transformation followed by an overlap-add-process in order to perform time-domain aliasing cancellation between consecutive retransforms in case of the transformation of transformer 152 being a critically sampled lapped transform such as MDCT.
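As a rough sketch of how a spectral curve may be derived from LPC information, the following example evaluates the magnitude response of an all-pole LPC synthesis filter at the centre frequency of each spectral line. The sign convention A(z) = 1 + Σ a_k z^(-k), the line-centre frequencies and the function name are assumptions made for the illustration; which variant of the LPC filter (e.g. weighted or inverse) an actual codec evaluates is not specified here.

```python
import cmath
import math


def lpc_spectral_curve(lpc_coeffs, num_lines):
    """Return |1 / A(e^{j*omega})| at the centre of each spectral line.

    Assumed convention: A(z) = 1 + sum_k lpc_coeffs[k] * z^-(k + 1).
    """
    curve = []
    for i in range(num_lines):
        # Centre frequency of line i, mapped onto 0..pi.
        omega = math.pi * (i + 0.5) / num_lines
        a = 1.0 + sum(c * cmath.exp(-1j * omega * (k + 1))
                      for k, c in enumerate(lpc_coeffs))
        curve.append(1.0 / abs(a))
    return curve


def shape_spectrum(filled_spectrum, curve):
    """Apply the spectral curve line by line to the noise-filled spectrum."""
    return [x * g for x, g in zip(filled_spectrum, curve)]
```

With an empty coefficient list the curve is flat, so the shaping leaves the filled spectrum unchanged.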
By way of dotted lines in
Up to now, several embodiments have been described, and hereinafter specific implementation examples are presented. The details brought forward with respect to these examples shall be understood as being individually transferable onto the above embodiments to further specify same. Before that, however, it should be noted that all of the embodiments described above may be used in audio as well as speech coding. They generally refer to transform coding and use a signal adaptive concept for replacing the zeros introduced in the quantization process with spectrally shaped noise using a very small amount of side information. In the embodiments described above, the observation has been exploited that spectral holes sometimes also appear just below a noise filling starting frequency, if any such starting frequency is used, and that such spectral holes are sometimes perceptually annoying. The above embodiments using an explicit signaling of the starting frequency allow for removing the holes that bring degradation, while avoiding the insertion of noise at low frequencies wherever the insertion of noise would introduce distortions.
Moreover, some of the embodiments outlined above use a pre-emphasis controlled noise filling in order to compensate for the spectral tilt caused by the pre-emphasis. These embodiments take into account the observation that, if the LPC filter is calculated on a pre-emphasized signal, merely applying a global or average magnitude or average energy of the noise to be inserted would cause the noise shaping to introduce a spectral tilt in the inserted noise, as the FDNS at the decoding side would subject the spectrally flat inserted noise to a spectral shaping still showing the spectral tilt of the pre-emphasis. Accordingly, the latter embodiments perform the noise filling in such a manner that the spectral tilt from the pre-emphasis is taken into account and compensated.
Thus, in other words,
Further, the perceptual transform audio decoder comprises a frequency domain noise shaper 6 in form of dequantizer 132, 174, configured to subject the noise-filled spectrum to spectral shaping using a spectral perceptual weighting function. In case of
Further, the perceptual transform audio decoder comprises an inverse transformer 134, 176 configured to inversely transform the noise-filled spectrum, spectrally shaped by the frequency domain noise shaper, to obtain an inverse transform, and subject the inverse transform to an overlap-add process.
Correspondingly,
The just-applied alternative and generalizing wording used to describe
As shown in
In order to control noise filling to be performed at the decoding side so as to improve the spectrum 34, with regard to setting the level of the noise, a noise level computer 3 of the perceptual transform audio encoder may optionally be present which computes a noise level parameter by measuring a level of the perceptually weighted spectrum 4 at portions 5 co-located to zero-portions 40 of the quantized spectrum 34. The noise level parameter thus computed may also be coded in the aforementioned data stream so as to arrive at the decoder.
The perceptual transform audio decoder is shown in
The significance of filling spectrum 34 with noise 9 which exhibits a spectrally global tilt is the following: later, when the noise filled spectrum 36 is subject to the spectral shaping by frequency domain noise shaper 6, spectrum 36 will be subject to a tilted weighting function. For example, the spectrum will be amplified at the high frequencies when compared to a weighting of the low frequencies. That is, the level of spectrum 36 will be raised at higher frequencies relative to lower frequencies. This causes a spectrally global tilt with positive slope in originally spectrally flat portions of spectrum 36. Accordingly, if noise 9 were filled into spectrum 36 so as to fill the zero-portions 40 thereof in a spectrally flat manner, then the spectrum output by FDNS 6 would show within these portions 40 a noise floor which tends to increase from, for example, low to high frequencies. That is, when examining the whole spectrum, or at least the portion of the spectrum bandwidth where noise filling is performed, one would see that the noise within portions 40 has a tendency, or linear regression function, with positive slope or negative slope. As noise filling apparatus 30, however, fills spectrum 34 with noise exhibiting a spectrally global tilt of positive or negative slope, indicated as α in
“Spectrally global tilt” shall denote that the noise 9 filled into spectrum 34 has a level which tends to decrease (or increase) from low to high frequencies. For example, when placing a linear regression line through local maxima of noise 9 as filled into, for example, mutually spectrally distanced, contiguous spectral zero portions 40, the resulting linear regression line has the negative (or positive) slope α.
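The notion of a spectrally global tilt may be illustrated by fitting a linear regression line to the level of the noise across several mutually distanced zero-portions. The sketch below is an assumption-laden simplification: it uses a hypothetical geometric tilt per spectral line and fits the line to the noise level itself rather than to local maxima of an actual noise realization.

```python
def tilted_noise_levels(indices, tilt_per_line):
    """Level of the filled noise at each line index (hypothetical geometric tilt)."""
    return [tilt_per_line ** i for i in indices]


def regression_slope(xs, ys):
    """Least-squares slope of ys over xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Two spectrally distanced zero-portions, e.g. lines 10..19 and 40..59.
hole_lines = list(range(10, 20)) + list(range(40, 60))
slope = regression_slope(hole_lines, tilted_noise_levels(hole_lines, 0.98))
```

A tilt factor below one yields a negative slope, a factor above one a positive slope, and a factor of exactly one corresponds to spectrally flat noise.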
Although not mandatory, the perceptual transform audio encoder's noise level computer may account for the tilted way of filling noise into spectrum 34 by measuring the level of the perceptually weighted spectrum 4 at portions 5 in a manner weighted with a spectrally global tilt having, for example, a positive slope in case of α being negative and a negative slope if α is positive. The slope applied by the noise level computer, which is indicated as β in
Later on it will be described that it may be feasible to control a variation of the slope α of the spectrally global tilt via explicit signaling in the data stream or via implicit signaling in that, for example, the noise filling apparatus 30 deduces the steepness from, for example, the spectral perceptual weighting function itself or from a transform window length switching. By the latter deduction, for example, the slope may be adapted to the window length.
There are different manners feasible by way of which noise filling apparatus 30 causes the noise 9 to exhibit the spectrally global tilt.
As will be described in more detail below, it would be feasible to adaptively set the portion of the whole spectrum within which noise filling is performed by noise filling apparatus 30.
In connection with the embodiments outlined further below, according to which contiguous spectral zero-portions in spectrum 34, i.e. spectrum holes, are filled in a specific non-flat and tonality dependent manner, it will be explained that there are also alternatives for the multiplication 11 illustrated in
All of the embodiments described above have in common that spectrum holes are avoided and that concealing of tonal non-zero quantized lines is also avoided. In the manner described above, the energy in noisy parts of a signal may be preserved, and the adding of noise that masks tonal components is avoided.
In the specific implementations described below, the part of the side information for performing the tonality dependent noise filling does not add anything to the existing side information of the codec where the noise filling is used. All information from the data stream that is used for the reconstruction of the spectrum, regardless of the noise filling, may also be used for the shaping of the noise filling.
In accordance with an implementation example, the noise filling in noise filler 30 is performed as follows. All spectral lines above a noise filling start index that are quantized to zero are replaced with a non-zero value. This is done, for example, in a random or pseudorandom manner with spectrally constant probability density function or using patching from other spectral spectrogram locations (sources). See, for example,
The inserted noise is shaped in the following steps:
The only additional side info needed for the noise filling is the level, which is transmitted using 3 bits, for example.
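The replacement of zero-quantized lines above the noise filling start index, scaled by the single transmitted level, may be sketched as follows. The function name, the seed handling and the choice of a random sign with fixed magnitude are assumptions for the illustration; spectral shaping of the inserted noise is deliberately left out.

```python
import random


def fill_zeros_with_noise(levels, start_index, noise_level, seed=0):
    """Replace zero-quantized lines at or above start_index with +/- noise_level.

    Lines below start_index and non-zero lines are left untouched.
    """
    rng = random.Random(seed)  # pseudorandom source, reproducible for a given seed
    filled = [float(v) for v in levels]
    for i in range(start_index, len(levels)):
        if levels[i] == 0:
            # Same distribution for every line: random sign, fixed magnitude,
            # so every inserted value is guaranteed to be non-zero.
            filled[i] = noise_level * rng.choice((-1.0, 1.0))
    return filled
```

Because the same seeded generator may be run on both sides, an encoder could reproduce the decoder's filling in this sketch if desired.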
When using FDNS, there is no need to adapt it to a specific noise filling, and it shapes the noise over the complete spectrum using a smaller number of bits than the scale factors.
A spectral tilt may be introduced in the inserted noise to counteract the spectral tilt from the pre-emphasis in the LPC-based perceptual noise shaping. Since the pre-emphasis represents a gentle high-pass filter applied to the input signal, the tilt compensation may counteract this by multiplying the equivalent of the transfer function of a subtle low-pass filter onto the inserted noise spectrum. The spectral tilt of this low-pass operation is dependent on the pre-emphasis factor and, advantageously, bit-rate and bandwidth. This was discussed referring to
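One conceivable way of obtaining such a tilt compensation is sketched below. It assumes a first-order pre-emphasis of the form 1 − μ·z^(-1) and uses, as the subtle low-pass, the magnitude response of 1/(1 − c·z^(-1)) with c = strength·μ, normalized to one at zero frequency. The values of μ and strength, the mapping of lines to frequencies and the function name are purely illustrative assumptions; the dependency on bit-rate and bandwidth is not modeled.

```python
import math


def tilt_compensation_gains(num_lines, pre_emphasis=0.68, strength=0.25):
    """Per-line gains of a gentle first-order low-pass, normalized to 1 at DC.

    pre_emphasis: assumed pre-emphasis factor mu (illustrative value).
    strength:     hypothetical fraction of mu used for the compensation.
    """
    c = strength * pre_emphasis
    gains = []
    for i in range(num_lines):
        omega = math.pi * (i + 0.5) / num_lines  # centre frequency of line i
        mag = (1.0 - c) / math.sqrt(1.0 - 2.0 * c * math.cos(omega) + c * c)
        gains.append(mag)
    return gains


def apply_tilt(noise, gains):
    """Multiply the inserted noise line by line with the compensation gains."""
    return [n * g for n, g in zip(noise, gains)]
```

The gains decrease monotonically from low to high frequencies, i.e. they impose a spectrally global tilt with negative slope on the inserted noise.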
For each spectral hole, constituted from 1 or more consecutive zero-quantized spectral lines, the inserted noise may be shaped as depicted in
The transition width is dependent on the tonality of the input signal. The tonality is obtained for each time frame. In
The tonality measure of the spectrum may be based on the information available in the bitstream:
The transition width is proportional to the tonality: small for noise-like signals, large for very tonal signals.
In an embodiment, the transition width is proportional to the LTP gain if the LTP gain>0. If the LTP gain is equal to 0 and the spectrum rearrangement is enabled then the transition width for the average LTP gain is used. If the TNS is enabled then there is no transition area, but the full noise filling should be applied to all zero-quantized spectral lines. If the LTP gain is equal to 0 and the TNS and the spectrum rearrangement are disabled, a minimum transition width is used.
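The decision logic of this embodiment may be summarized by the following sketch. The proportionality constant, the minimum width and the average LTP gain are hypothetical parameters, and the precedence of the TNS flag over a positive LTP gain is an assumption, since the order of evaluation is not stated above.

```python
def transition_width(ltp_gain, tns_enabled, rearrangement_enabled,
                     width_per_unit_gain=8.0, min_width=1.0,
                     average_ltp_gain=0.5):
    """Select the transition width (in spectral lines) from bitstream hints."""
    if tns_enabled:
        # No transition area: full noise filling on all zero-quantized lines.
        return 0.0
    if ltp_gain > 0.0:
        # Width proportional to the LTP gain.
        return width_per_unit_gain * ltp_gain
    if rearrangement_enabled:
        # LTP gain is zero, but rearrangement hints at a tonal signal.
        return width_per_unit_gain * average_ltp_gain
    # No hint towards tonality at all.
    return min_width
```

For example, with the default parameters of this sketch an LTP gain of 0.5 would give a width of 4 lines, while a frame with TNS enabled would give a width of 0.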
If there is no tonality information in the bitstream a tonality measure may be calculated on the decoded signal without the noise filling. If there is no TNS information, a temporal flatness measure may be calculated on the decoded signal. If, however, TNS information is available, such a flatness measure may be derived from the TNS filter coefficients directly, e.g. by computing the filter's prediction gain.
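If the TNS filter is available in the form of reflection (PARCOR) coefficients, which is an assumption made for this sketch, the prediction gain follows from the well-known product formula G = Π 1/(1 − k_m²). The function name is hypothetical.

```python
def prediction_gain(reflection_coeffs):
    """Prediction gain of a linear predictor given its reflection coefficients."""
    gain = 1.0
    for k in reflection_coeffs:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        gain /= (1.0 - k * k)
    return gain
```

A gain close to one indicates that prediction along the frequency axis hardly helps, whereas a large gain indicates a well predictable spectrum, so the value may serve as a more-than-two-valued hint.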
In the encoder, the noise filling level may be calculated by taking the transition width into account. Several ways to determine the noise filling level from the quantized spectrum are possible. The simplest is to sum up the energy (square) of all lines of the normalized input spectrum in the noise filling region (i.e. above iStart) which were quantized to zero, then to divide this sum by the number of such lines to obtain the average energy per line, and to finally compute a quantized noise level from the square root of the average line energy. In this way, the noise level is effectively derived from the RMS of the spectral components quantized to zero. Let, for example, $A$ be the set of indices $i$ of spectral lines where the spectrum has been quantized to zero and which belong to any of the zero-portions, e.g. lie above the start frequency, and let $N$ denote the global noise scaling factor. The values of the spectrum as not yet quantized shall be denoted $y_i$. Further, $\mathrm{left}(i)$ shall be a function indicating for any zero-quantized spectral value at index $i$ the index of the zero-quantized value at the low-frequency end of the zero-portion to which $i$ belongs, and $F_i(j)$ with $j=0$ to $J_i-1$ shall denote the function assigned to, depending on the tonality, the zero-portion starting at index $i$, with $J_i$ indicating the width of that zero-portion. Then, $N$ may be determined by $N=\sqrt{\sum_{i\in A} y_i^2 \,/\, \operatorname{cardinality}(A)}$.
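The simplest variant, i.e. the RMS over all zero-quantized lines in the noise filling region, may be sketched as follows. The function and parameter names are hypothetical, and the final quantization of the level to a few bits is omitted.

```python
import math


def noise_level_rms(original, quantized, start_index):
    """RMS of the unquantized lines that were quantized to zero.

    original:    normalized, not yet quantized spectral values y_i
    quantized:   integer spectral levels after quantization
    start_index: first line of the noise filling region
    """
    energy = 0.0
    count = 0
    for i in range(start_index, len(original)):
        if quantized[i] == 0:
            energy += original[i] ** 2
            count += 1
    return math.sqrt(energy / count) if count else 0.0
```

Here the set of counted indices plays the role of the set A, and the returned value corresponds to the unquantized global noise scaling factor N.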
In the embodiment, the individual hole sizes as well as the transition width are considered. To this end, runs of consecutive zero-quantized lines are grouped into hole regions. Each normalized input spectral line in a hole region, i.e. each spectral value of the original signal at a spectral position within any contiguous spectral zero-portion, is then scaled by the transition function, as described in the previous section, and subsequently the sum of the energies of the scaled lines is calculated. Like in the previous simple embodiment, the noise filling level can then be computed from the RMS of the zero-quantized lines. Applying the above nomenclature, $N$ may be computed by $N=\sqrt{\sum_{i\in A}\left(F_{\mathrm{left}(i)}(i-\mathrm{left}(i))\cdot y_i\right)^2 \,/\, \operatorname{cardinality}(A)}$.
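A minimal sketch of this hole-aware computation is given below. The trapezoidal transition function `hole_weight` is a hypothetical stand-in for the tonality dependent function assigned to each zero-portion; the grouping of zero runs and the RMS follow the description above.

```python
import math


def find_holes(quantized, start_index):
    """Return (left, width) for each run of zero-quantized lines from start_index on."""
    holes = []
    i = start_index
    n = len(quantized)
    while i < n:
        if quantized[i] == 0:
            left = i
            while i < n and quantized[i] == 0:
                i += 1
            holes.append((left, i - left))
        else:
            i += 1
    return holes


def hole_weight(j, width, transition):
    """Hypothetical trapezoid: ramps up from both hole edges, at most 1 inside."""
    ramp = transition + 1.0
    return min(1.0, (j + 1) / ramp, (width - j) / ramp)


def noise_level_weighted(original, quantized, start_index, transition):
    """RMS of zero-quantized lines after scaling with the transition function."""
    energy = 0.0
    count = 0
    for left, width in find_holes(quantized, start_index):
        for j in range(width):
            w = hole_weight(j, width, transition)
            energy += (w * original[left + j]) ** 2
        count += width  # denominator stays the plain number of lines
    return math.sqrt(energy / count) if count else 0.0
```

With a transition width of zero, all weights equal one and the result coincides with the plain RMS of the zero-quantized lines.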
A problem with this approach, however, is that the spectral energy in small hole regions (i.e. regions with a width of much less than twice the transition width) is underestimated since, in the RMS calculation, the number of spectral lines by which the energy sum is divided is unchanged. In other words, when the quantized spectrum exhibits mostly small hole regions, the resulting noise filling level will be lower than when the spectrum is sparse and has only a few long hole regions. To ensure that in both of these cases a similar noise level is found, it is therefore advantageous to adapt the line-count used in the denominator of the RMS computation to the transition width. Most importantly, if a hole region size is smaller than twice the transition width, the number of spectral lines in that hole region is not counted as-is, i.e. as an integer number of lines, but as a fractional line-number which is less than the integer line-number. In the above formula concerning N, for example, the “cardinality(A)” would be replaced by a smaller number depending on the number of “small” zero-portions.
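The adaptation of the line-count may be sketched as follows. The particular fractional rule used here, width²/(2·transition) for holes narrower than twice the transition width, is merely one hypothetical choice that satisfies the stated requirement of being smaller than the integer line-number; the description above does not prescribe a specific formula.

```python
def effective_line_count(width, transition):
    """Line-count of one hole region for the RMS denominator.

    Holes narrower than twice the transition width are counted fractionally
    (hypothetical rule); all other holes are counted as-is.
    """
    if transition > 0 and width < 2 * transition:
        return width * width / (2.0 * transition)
    return float(width)


def effective_cardinality(hole_widths, transition):
    """Replacement for cardinality(A): sum of the effective line-counts."""
    return sum(effective_line_count(w, transition) for w in hole_widths)
```

For a transition width of 4 lines, a hole of 2 lines would count as 0.5 lines in this sketch, whereas a hole of 10 lines would count as 10 lines.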
Furthermore, the compensation of the spectral tilt in the noise filling due to the LPC-based perceptual coding should also be taken into account during the noise level calculation. More specifically, the inverse of the decoder-side noise filling tilt compensation is applied to the original unquantized spectral lines which were quantized to zero, before the noise level is computed. In the context of LPC-based coding employing pre-emphasis, this implies that higher-frequency lines are amplified slightly with respect to lower-frequency lines prior to the noise level estimation. Applying the above nomenclature, $N$ may be computed by $N=\sqrt{\sum_{i\in A}\left(F_{\mathrm{left}(i)}(i-\mathrm{left}(i))\cdot \mathrm{LPF}(i)^{-1}\cdot y_i\right)^2 \,/\, \operatorname{cardinality}(A)}$. As mentioned above, depending on the circumstances, the function LPF, which corresponds to function 15, may have a positive slope, in which case LPF would read HPF accordingly. It is briefly noted that in all above formulae using “LPF”, setting $F_{\mathrm{left}}$ to a constant function, such as to be all one, would reveal a way how to apply the concept of subjecting the noise to be filled into the spectrum 34 to a spectrally global tilt without the tonality-dependent hole filling.
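The tilt-aware level computation may be sketched for the special case just mentioned, i.e. with the transition function set to all one. The per-line gains `lpf_gains` stand for the values LPF(i) of the decoder-side tilt function and are assumed to be given; all names are hypothetical.

```python
import math


def noise_level_tilt_compensated(original, quantized, start_index, lpf_gains):
    """RMS of zero-quantized lines after applying the inverse decoder-side tilt.

    lpf_gains[i] is the gain LPF(i) the decoder multiplies onto the noise at
    line i; the encoder divides by it before measuring the level.
    """
    energy = 0.0
    count = 0
    for i in range(start_index, len(original)):
        if quantized[i] == 0:
            energy += (original[i] / lpf_gains[i]) ** 2
            count += 1
    return math.sqrt(energy / count) if count else 0.0
```

With gains that decrease towards high frequencies, the division amplifies higher-frequency lines relative to lower-frequency lines before the level is measured.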
The possible computations of N may be performed in the encoder such as, for example, in 108 or 154.
Finally, it was found that when harmonics of a very tonal, stationary signal were quantized to zero, the lines representing these harmonics lead to a relatively high or unstable (i.e. time-fluctuating) noise level. This artifact can be reduced by using in the noise level calculation the average magnitude of zero-quantized lines instead of their RMS. While this alternative approach does not guarantee that the energy of the noise filled lines in the decoder reproduces the energy of the original lines in the noise filling regions, it does ensure that spectral peaks in the noise filling regions have only limited contribution to the overall noise level, thereby reducing the risk of overestimation of the noise level.
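The difference between the two level measures may be illustrated with a short sketch in which a single strong harmonic has been quantized to zero. The function names and the example values are hypothetical.

```python
import math


def zero_quantized_values(original, quantized, start_index):
    """Collect the unquantized values of all zero-quantized lines from start_index on."""
    return [original[i] for i in range(start_index, len(original))
            if quantized[i] == 0]


def noise_level_rms(values):
    """Energy-preserving level: root of the mean squared value."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def noise_level_mean_abs(values):
    """Peak-robust level: mean absolute value."""
    return sum(abs(v) for v in values) / len(values) if values else 0.0


# One strong harmonic (10.0) among otherwise small zero-quantized lines.
values = zero_quantized_values([0.0, 1.0, 1.0, 1.0, 10.0], [0, 0, 0, 0, 0], 1)
```

For these values the mean magnitude is 3.25, whereas the RMS is dominated by the single peak and is noticeably larger, which illustrates the reduced influence of spectral peaks on the average-magnitude measure.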
Finally, it is noted that an encoder may even be configured to perform the noise filling completely in order to keep itself in line with the decoder such as, for example, for analysis by synthesis purposes.
Thus, the above embodiment, inter alia, describes a signal adaptive method for replacing the zeros introduced in the quantization process with spectrally shaped noise. A noise filling extension for an encoder and a decoder are described that fulfill the abovementioned requirements by implementing the following:
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
[1] B. G. G. F. S. G. M. M. H. P. J. H. S. W. G. S. J. H. Nikolaus Rettelbach, “Noise Filler, Noise Filling Parameter Calculator Encoded Audio Signal Representation, Methods and Computer Program”. Patent US 2011/0173012 A1.
[2] Extended Adaptive Multi-Rate-Wideband (AMR-WB+) codec, 3GPP TS 26.290 V6.3.0, 2005-2006.
[3] B. G. G. F. S. G. M. M. H. P. J. H. S. W. G. S. J. H. Nikolaus Rettelbach, “Audio encoder, audio decoder, methods for encoding and decoding an audio signal, audio stream and computer program”. Patent WO 2010/003556 A1.
[4] M. M. N. R. G. F. J. R. J. L. S. W. S. B. S. D. C. H. R. L. P. G. B. B. J. L. K. K. H. Max Neuendorf, “MPEG Unified Speech and Audio Coding—The ISO/MPEG Standard for High-Efficiency Audio Coding of all Content Types,” in 132nd AES Convention, Budapest, 2012. Also appears in the Journal of the AES, vol. 61, 2013.
[5] M. M. M. N. a. R. G. Guillaume Fuchs, “MDCT-Based Coder for Highly Adaptive Speech and Audio Coding,” in 17th European Signal Processing Conference (EUSIPCO 2009), Glasgow, 2009.
[6] H. Y. K. Y. M. T. Harada Noboru, “Coding Method, Decoding Method, Coding Device, Decoding Device, Program, and Recording Medium”. Patent WO 2012/046685 A1.
This application is a continuation of copending U.S. application Ser. No. 15/698,442, filed Sep. 7, 2017, which is a continuation of U.S. application Ser. No. 14/812,354, filed Jul. 29, 2015, which is a continuation of International Application No. PCT/EP2014/051630, filed Jan. 28, 2014, which claims priority from U.S. Application No. 61/758,209, filed Jan. 29, 2013, each of which is incorporated herein in its entirety by this reference thereto.
Number | Date | Country
---|---|---
61758209 | Jan 2013 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15698442 | Sep 2017 | US
Child | 16523588 | | US
Parent | 14812354 | Jul 2015 | US
Child | 15698442 | | US
Parent | PCT/EP2014/051630 | Jan 2014 | US
Child | 14812354 | | US