The present invention relates to audio coding/decoding and, particularly, to audio coding using Intelligent Gap Filling (IGF).
Audio coding is the domain of signal compression that deals with exploiting redundancy and irrelevancy in audio signals using psychoacoustic knowledge. Today's audio codecs typically need around 60 kbps/channel for perceptually transparent coding of almost any type of audio signal. Newer codecs are aimed at reducing the coding bitrate by exploiting spectral similarities in the signal using techniques such as bandwidth extension (BWE). A BWE scheme uses a low bitrate parameter set to represent the high frequency (HF) components of an audio signal. The HF spectrum is filled up with spectral content from low frequency (LF) regions, and the spectral shape, tilt and temporal continuity are adjusted to maintain the timbre and color of the original signal. Such BWE methods enable audio codecs to retain good quality even at low bitrates of around 24 kbps/channel.
The inventive audio coding system efficiently codes arbitrary audio signals at a wide range of bitrates. Whereas, for high bitrates, the inventive system converges to transparency, for low bitrates perceptual annoyance is minimized. Therefore, the main share of available bitrate is used to waveform code just the perceptually most relevant structure of the signal in the encoder, and the resulting spectral gaps are filled in the decoder with signal content that roughly approximates the original spectrum. A very limited bit budget is consumed to control the parameter driven so-called spectral Intelligent Gap Filling (IGF) by dedicated side information transmitted from the encoder to the decoder.
Storage or transmission of audio signals is often subject to strict bitrate constraints. In the past, coders were forced to drastically reduce the transmitted audio bandwidth when only a very low bitrate was available.
Modern audio codecs are nowadays able to code wide-band signals by using bandwidth extension (BWE) methods [1]. These algorithms rely on a parametric representation of the high-frequency content (HF), which is generated from the waveform coded low-frequency part (LF) of the decoded signal by means of transposition into the HF spectral region (“patching”) and application of a parameter driven post processing. In BWE schemes, the reconstruction of the HF spectral region above a given so-called cross-over frequency is often based on spectral patching. Typically, the HF region is composed of multiple adjacent patches and each of these patches is sourced from band-pass (BP) regions of the LF spectrum below the given cross-over frequency. State-of-the-art systems efficiently perform the patching within a filterbank representation, e.g. a Quadrature Mirror Filterbank (QMF), by copying a set of adjacent subband coefficients from a source to the target region.
Another technique found in today's audio codecs that increases compression efficiency and thereby enables extended audio bandwidth at low bitrates is the parameter driven synthetic replacement of suitable parts of the audio spectra. For example, noise-like signal portions of the original audio signal can be replaced without substantial loss of subjective quality by artificial noise generated in the decoder and scaled by side information parameters. One example is the Perceptual Noise Substitution (PNS) tool contained in MPEG-4 Advanced Audio Coding (AAC) [5].
A further provision that also enables extended audio bandwidth at low bitrates is the noise filling technique contained in MPEG-D Unified Speech and Audio Coding (USAC) [7].
Spectral gaps (zeroes) that are introduced by the dead-zone of the quantizer due to a too coarse quantization are subsequently filled with artificial noise in the decoder and scaled by a parameter-driven post-processing.
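The noise filling principle described above can be illustrated by a minimal sketch. The helper name `noise_fill`, the single `noise_level` parameter and the use of random-sign noise are assumptions made for illustration only; they are not taken from the USAC specification.

```python
import random


def noise_fill(quantized, noise_level, seed=0):
    """Replace zero-quantized spectral lines by noise of a transmitted level.

    Hypothetical helper: a real codec signals the level per band and uses
    its own noise generator.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    filled = []
    for q in quantized:
        if q == 0:
            # Spectral gap caused by the quantizer dead-zone: insert noise.
            filled.append(rng.choice((-1.0, 1.0)) * noise_level)
        else:
            # Waveform-coded line: keep as decoded.
            filled.append(float(q))
    return filled


spectrum = noise_fill([3, 0, 0, -2, 0], noise_level=0.25)
```

Only the zeroed lines are touched, so the waveform-coded lines remain unchanged.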
Another state-of-the-art system is termed Accurate Spectral Replacement (ASR) [2-4]. In addition to a waveform codec, ASR employs a dedicated signal synthesis stage which restores perceptually important sinusoidal portions of the signal at the decoder. Also, a system described in [5] relies on sinusoidal modeling in the HF region of a waveform coder to enable extended audio bandwidth having decent perceptual quality at low bitrates. All these methods involve transformation of the data into a second domain apart from the Modified Discrete Cosine Transform (MDCT) and also fairly complex analysis/synthesis stages for the preservation of HF sinusoidal components.
Furthermore, even though the typical audio core coders operate in the spectral domain, the core decoder nevertheless generates a time domain signal which is then, again, converted into a spectral domain by the filter bank 1326 functionality. This introduces additional processing delays, may introduce artifacts due to tandem processing of firstly transforming from the spectral domain into the time domain and again transforming into typically a different frequency domain and, of course, this also necessitates a substantial amount of computational complexity and thereby electric power, which is specifically an issue when the bandwidth extension technology is applied in mobile devices such as mobile phones, tablets or laptop computers, etc.
Current audio codecs perform low bitrate audio coding using BWE as an integral part of the coding scheme. However, BWE techniques are restricted to replace high frequency (HF) content only. Furthermore, they do not allow perceptually important content above a given cross-over frequency to be waveform coded. Therefore, contemporary audio codecs either lose HF detail or timbre when the BWE is implemented, since the exact alignment of the tonal harmonics of the signal is not taken into consideration in most of the systems.
Another shortcoming of the current state of the art BWE systems is the need for transformation of the audio signal into a new domain for implementation of the BWE (e.g. transform from MDCT to QMF domain). This leads to complications of synchronization, additional computational complexity and increased memory requirements.
Storage or transmission of audio signals is often subject to strict bitrate constraints. In the past, coders were forced to drastically reduce the transmitted audio bandwidth when only a very low bitrate was available. Modern audio codecs are nowadays able to code wide-band signals by using bandwidth extension (BWE) methods [1-2]. These algorithms rely on a parametric representation of the high-frequency content (HF)—which is generated from the waveform coded low-frequency part (LF) of the decoded signal by means of transposition into the HF spectral region (“patching”) and application of a parameter driven post processing.
In BWE schemes, the reconstruction of the HF spectral region above a given so-called cross-over frequency is often based on spectral patching. Other schemes that are functional to fill spectral gaps, e.g. Intelligent Gap Filling (IGF), use neighboring so-called spectral tiles to regenerate parts of audio signal HF spectra. Typically, the HF region is composed of multiple adjacent patches or tiles and each of these patches or tiles is sourced from band-pass (BP) regions of the LF spectrum below the given cross-over frequency. State-of-the-art systems efficiently perform the patching or tiling within a filterbank representation by copying a set of adjacent subband coefficients from a source to the target region. Yet, for some signal content, the assemblage of the reconstructed signal from the LF band and adjacent patches within the HF band can lead to beating, dissonance and auditory roughness.
Therefore, in [19], the concept of dissonance guard-band filtering is presented in the context of a filterbank-based BWE system. It is suggested to effectively apply a notch filter of approx. 1 Bark bandwidth at the cross-over frequency between LF and BWE-regenerated HF to avoid the possibility of dissonance and replace the spectral content with zeros or noise.
However, the proposed solution in [19] has some drawbacks: First, the strict replacement of spectral content by either zeros or noise can also impair the perceptual quality of the signal. Moreover, the proposed processing is not signal adaptive and can therefore harm perceptual quality in some cases. For example, if the signal contains transients, this can lead to pre- and post-echoes.
Second, dissonances can also occur at transitions between consecutive HF patches. The proposed solution in [19] is only functional to remedy dissonances that occur at cross-over frequency between LF and BWE-regenerated HF.
Last, as opposed to filter bank based systems like the one proposed in [19], BWE systems can also be realized in transform based implementations, e.g. using the Modified Discrete Cosine Transform (MDCT). Transforms like the MDCT are very prone to so-called warbling [20] or ringing artifacts that occur if bandpass regions of spectral coefficients are copied or spectral coefficients are set to zero as proposed in [19].
Particularly, U.S. Pat. No. 8,412,365 discloses to use, in filterbank based translation or folding, so-called guard-bands which are inserted and made of one or several subband channels set to zero. A number of filterbank channels is used as guard-bands, and a bandwidth of a guard-band should be 0.5 Bark. These dissonance guard-bands are partially reconstructed using random white noise signals, i.e., the subbands are fed with white noise instead of being zero. The guard bands are inserted irrespective of the current signal to be processed.
According to an embodiment, an apparatus for decoding an encoded audio signal including an encoded core signal and parametric data may have: a core decoder for decoding the encoded core signal to obtain a decoded core signal; an analyzer for analyzing the decoded core signal before or after performing a frequency regeneration operation to provide an analysis result; and a frequency regenerator for regenerating spectral portions not included in the decoded core signal using a spectral portion of the decoded core signal, the parametric data, and the analysis result, wherein the analyzer is configured for detecting a splitting of a peak portion in the spectral portion of the decoded core signal or in a regenerated signal at a frequency border of the decoded core signal or at a frequency border between two regenerated spectral portions generated by using the same or different spectral portions of the decoded core signal or at a maximum frequency border of the regenerated signal, and wherein the frequency regenerator is configured for changing the frequency border between the decoded core signal and the regenerated signal or the frequency border between two regenerated spectral portions generated by using the same or different spectral portions of the decoded signal or for changing the maximum frequency border so that the splitting is reduced or eliminated.
According to another embodiment, a method of decoding an encoded audio signal including an encoded core signal and parametric data may have the steps of: decoding the encoded core signal to obtain a decoded core signal; analyzing the decoded core signal before or after performing a frequency regeneration operation to provide an analysis result; and regenerating spectral portions not included in the decoded core signal using a spectral portion of the decoded core signal, the parametric data, and the analysis result wherein the analyzing includes detecting a splitting of a peak portion in the spectral portion of the decoded core signal or in a regenerated signal at a frequency border of the decoded core signal or at a frequency border between two regenerated spectral portions generated by using the same or different spectral portions of the decoded core signal or at a maximum frequency border of the regenerated signal, and wherein the regenerating includes changing the frequency border between the decoded core signal and the regenerated signal or the frequency border between two regenerated spectral portions generated by using the same or different spectral portions of the decoded signal or changing the maximum frequency border so that the splitting is reduced or eliminated.
According to another embodiment, an apparatus for decoding an encoded audio signal including an encoded core signal and parametric data may have: a core decoder for decoding the encoded core signal to obtain a decoded core signal; a frequency regenerator for regenerating spectral portions not included in the decoded core signal using a spectral portion of the decoded core signal to obtain a regenerated signal, the parametric data, and an analysis result, wherein the frequency regenerator is configured to generate a preliminary regenerated signal using parameters for the preliminary regeneration, an analyzer for analyzing the preliminary regenerated signal to detect artifact-creating signal portions as the analysis result, and wherein the frequency regenerator further includes a manipulator for manipulating the preliminary regenerated signal in order to obtain the regenerated signal or for performing a further regeneration with parameters being different from the parameters for the preliminary regeneration in order to obtain the regenerated signal in which the artifact-creating signal portions are reduced or eliminated.
According to another embodiment, a method of decoding an encoded audio signal including an encoded core signal and parametric data may have the steps of: decoding the encoded core signal to obtain a decoded core signal; regenerating spectral portions not included in the decoded core signal using a spectral portion of the decoded core signal, the parametric data, and an analysis result, analyzing the preliminary regenerated signal to detect artifact-creating signal portions as the analysis result, and wherein the regenerating further includes manipulating the preliminary regenerated signal in order to obtain the regenerated signal or performing a further regeneration with parameters being different from the parameters for the preliminary regeneration in order to obtain the regenerated signal in which the artifact-creating signal portions are reduced or eliminated.
Another embodiment may have a computer program for performing, when running on a computer or a processor, the inventive methods.
In accordance with the present invention, a decoder-side signal analysis using an analyzer is performed for analyzing the decoded core signal before or after performing a frequency regeneration operation to provide an analysis result. Then, this analysis result is used by a frequency regenerator for regenerating spectral portions not included in the decoded core signal.
Thus, in contrast to a fixed decoder-setting, where the patching or frequency tiling is performed in a fixed way, i.e., where a certain source range is taken from the core signal and certain fixed frequency borders are applied to either set the frequency between the source range and the reconstruction range or the frequency border between two adjacent frequency patches or tiles within the reconstruction range, a signal-dependent patching or tiling is performed, in which, for example, the core signal can be analyzed to find local minima in the core signal and, then, the core range is selected so that the frequency borders of the core range coincide with local minima in the core signal spectrum.
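The signal-dependent selection of a frequency border can be sketched as follows. This is a minimal illustration that assumes a magnitude spectrum indexed by spectral line; the function name `adjust_border` and the `search_radius` parameter are hypothetical and not part of the described system.

```python
def adjust_border(magnitudes, nominal, search_radius):
    """Move a nominal frequency border to the nearest local spectral minimum.

    magnitudes    : magnitude per spectral line of the decoded core signal
    nominal       : index of the fixed (signal-independent) border
    search_radius : number of lines searched on each side (assumed parameter)
    """
    lo = max(0, nominal - search_radius)
    hi = min(len(magnitudes) - 1, nominal + search_radius)
    # Smallest magnitude wins; ties are resolved in favor of the line
    # closest to the nominal border.
    return min(range(lo, hi + 1),
               key=lambda k: (magnitudes[k], abs(k - nominal)))


# The nominal border (line 4) sits on the skirt of a tonal peak at line 3;
# the adjusted border falls into the spectral valley at line 6.
spectrum = [1, 1, 5, 9, 5, 1, 0.2, 1, 4, 8, 4]
border = adjust_border(spectrum, nominal=4, search_radius=2)
```

Placing the border in a valley avoids cutting through the tonal peak, which is the situation the fixed decoder-setting cannot prevent.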
Alternatively or additionally, a signal analysis can be performed on a preliminary regenerated signal or preliminary frequency-patched or tiled signal, wherein, after the preliminary frequency regeneration procedure, the border between the core range and the reconstruction range is analyzed in order to detect any artifact-creating signal portions such as tonal portions being problematic in that they are close enough to each other to generate a beating artifact when being reconstructed. Alternatively or additionally, the borders can also be examined in such a way that a halfway-clipping of a tonal portion is detected, and this clipping of a tonal portion would also create an artifact when being reconstructed as it is. In order to avoid these artifacts, the frequency border of the reconstruction range and/or the source range and/or between two individual frequency tiles or patches in the reconstruction range can be modified by a signal manipulator in order to again perform a reconstruction with the newly set borders.
Additionally, or alternatively, the frequency regeneration is a regeneration based on the analysis result in that the frequency borders are left as they are and an elimination or at least attenuation of problematic tonal portions near the frequency borders between the source range and the reconstruction range or between two individual frequency tiles or patches within the reconstruction range is done. Such tonal portions can be close tones that would result in a beating artifact or could be halfway-clipped tonal portions.
Specifically, when a non-energy conserving transform is used such as an MDCT, a single tone does not directly map to a single spectral line. Instead, a single tone will map to a group of spectral lines with certain amplitudes depending on the phase of the tone. When a patching operation clips this tonal portion, then this will result in an artifact after reconstruction even though a perfect reconstruction is applied as in an MDCT reconstructor. This is due to the fact that the MDCT reconstructor would necessitate the complete tonal pattern for a tone in order to finally correctly reconstruct this tone. Due to the fact that a clipping has taken place before, this is not possible anymore and, therefore, a time varying warbling artifact will be created. Based on the analysis in accordance with the present invention, the frequency regenerator will avoid this situation by attenuating the complete tonal portion creating an artifact or as discussed before, by changing corresponding border frequencies or by applying both measures or by even reconstructing the clipped portion based on a certain pre-knowledge on such tonal patterns.
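The dependence of the MDCT line pattern on the phase of a tone can be checked numerically. The following is a minimal sketch assuming a direct (non-fast) MDCT with a sine window and a frame of 32 samples; the helper names are hypothetical and the frame size is chosen only to keep the example small.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT of a frame of 2N samples using a sine window."""
    two_n = len(frame)
    n = two_n // 2
    n0 = 0.5 + n / 2.0
    windowed = [x * math.sin(math.pi * (i + 0.5) / two_n)
                for i, x in enumerate(frame)]
    return [sum(windowed[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
                for i in range(two_n))
            for k in range(n)]


def tone(num_samples, omega, phase):
    """Stationary sinusoid with angular frequency omega (radians/sample)."""
    return [math.cos(omega * i + phase) for i in range(num_samples)]


def off_peak_fraction(coeffs):
    """Share of the energy that lies outside the strongest spectral line."""
    energies = [c * c for c in coeffs]
    return 1.0 - max(energies) / sum(energies)


N = 16
omega = math.pi / N * 5.5          # tone at the center frequency of line 5
spec_a = mdct(tone(2 * N, omega, 0.0))
spec_b = mdct(tone(2 * N, omega, math.pi / 2))
```

Comparing `spec_a` and `spec_b` shows that the same tone yields different groups of lines for different phases, so a border that cuts through such a group removes part of the pattern that the inverse transform would need.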
Additionally or alternatively, a cross-over filtering can be applied for spectrally cross-over filtering the decoded core signal and the first frequency tile having frequencies extending from a gap filling frequency to a first tile stop frequency or for spectrally cross-over filtering a first frequency tile and a second frequency tile.
This cross-over filtering is useful for reducing the so-called filter ringing.
The inventive approach is mainly intended to be applied within a BWE based on a transform like the MDCT. Nevertheless, the teachings of the invention are generally applicable, e.g. analogously within a Quadrature Mirror Filter bank (QMF) based system, especially if the system is critically sampled, e.g. a real-valued QMF representation.
The inventive approach is based on the observation that auditory roughness, beatings and dissonance can only take place if the signal content in spectral regions close to transition points (like the cross-over frequency or patch borders) is very tonal. Therefore, the proposed solution for the drawbacks found in the state of the art consists of a signal adaptive detection of tonal components in transition regions and the subsequent attenuation or removal of these components. The attenuation or removal of these components can be accomplished by spectral interpolation from foot to foot of such a component, or, alternatively, by zero or noise insertion. Alternatively, the spectral location of the transitions can be chosen signal adaptively such that transition artifacts are minimized.
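The removal of a tonal component by spectral interpolation from foot to foot can be sketched as follows. The sketch assumes that the peak position is already known from a detector and that it operates on magnitudes; the function name `interpolate_peak` and the simple definition of a foot as the nearest local minimum on each side are assumptions for illustration.

```python
def interpolate_peak(magnitudes, peak):
    """Remove a tonal peak by linear interpolation between its two feet.

    A foot is taken here to be the nearest local minimum on either side
    of the peak (assumed definition).
    """
    left = peak
    while left > 0 and magnitudes[left - 1] < magnitudes[left]:
        left -= 1
    right = peak
    while (right < len(magnitudes) - 1
           and magnitudes[right + 1] < magnitudes[right]):
        right += 1

    out = list(magnitudes)
    span = right - left
    # Lines strictly between the feet are replaced; the feet are kept.
    for k in range(left + 1, right):
        t = (k - left) / span
        out[k] = (1.0 - t) * magnitudes[left] + t * magnitudes[right]
    return out


flattened = interpolate_peak([1, 1, 2, 8, 2, 1, 1], peak=3)
```

In contrast to a strict replacement by zeros, the interpolated lines stay at the level of the surrounding spectrum.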
In addition, this technique can be used to reduce or even avoid filter ringing. Especially for transient-like signals, ringing is an audible and annoying artifact. Filter ringing artifacts are caused by the so-called brick-wall characteristic of a filter in the transition band (a steep transition from pass band to stop band at the cut-off frequency). Such filters can be efficiently implemented by setting one coefficient or groups of coefficients to zero in the frequency domain of a time-frequency transform. So, in the case of BWE, we propose to apply a cross-over filter at each transition frequency between patches or between core-band and first patch to reduce said ringing effect. The cross-over filter can be implemented by spectral weighting in the transform domain employing suitable gain functions.
In accordance with a further aspect of the present invention, an apparatus for decoding an encoded audio signal comprises a core decoder, a tile generator for generating one or more spectral tiles having frequencies not included in the decoded core signal using a spectral portion of the decoded core signal and a cross-over filter for spectrally cross-over filtering the decoded core signal and a first frequency tile having frequencies extending from a gap filling frequency to a first tile stop frequency or for spectrally cross-over filtering a tile and a further frequency tile, the further frequency tile having a lower border frequency being frequency-adjacent to an upper border frequency of the frequency tile.
This procedure is intended to be applied within a bandwidth extension based on a transform like the MDCT. However, the present invention is generally applicable, in particular in a bandwidth extension scenario relying on a quadrature mirror filterbank (QMF), especially if the system is critically sampled, for example when there is a real-valued QMF representation as a time-frequency conversion or as a frequency-time conversion.
The embodiment is particularly useful for transient-like signals, since for such transient-like signals, ringing is an audible and annoying artifact. Filter ringing artifacts are caused by the so-called brick-wall characteristic of a filter in the transition band, i.e., a steep transition from a pass band to a stop band at a cut-off frequency. Such filters can be efficiently implemented by setting one coefficient or groups of coefficients to zero in a frequency domain of a time-frequency transform. Therefore, the present invention relies on a cross-over filter at each transition frequency between patches/tiles or between a core band and a first patch/tile to reduce this ringing artifact. The cross-over filter is implemented by spectral weighting in the transform domain employing suitable gain functions.
The cross-over filter is signal-adaptive and consists of two filters, a fade-out filter, which is applied to the lower spectral region and a fade-in filter, which is applied to the higher spectral region. The filters can be symmetric or asymmetric depending on the specific implementation.
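The spectral weighting performed by such a cross-over filter can be sketched as follows. This is a minimal illustration assuming fixed, symmetric, linear gain functions whose sum is one, and assuming that both the lower and the higher region carry content within the transition range; the signal-adaptive selection of the gains is not modeled, and the names `crossover` and `half_width` are hypothetical.

```python
def crossover(lower, upper, border, half_width):
    """Blend two spectra around a transition frequency by spectral weighting.

    lower      : spectral lines of the lower region (core band or tile)
    upper      : spectral lines of the higher region (next tile)
    border     : index of the transition frequency
    half_width : lines on each side that belong to the transition range
    """
    start = border - half_width
    stop = border + half_width
    out = []
    for k in range(len(lower)):
        if k < start:
            out.append(lower[k])          # below the transition range
        elif k >= stop:
            out.append(upper[k])          # above the transition range
        else:
            # Fade-in gain rises linearly; fade-out gain is its complement.
            fade_in = (k - start + 0.5) / (stop - start)
            out.append((1.0 - fade_in) * lower[k] + fade_in * upper[k])
    return out


blended = crossover([1.0] * 12, [3.0] * 12, border=6, half_width=2)
```

The smooth gain slope replaces the brick-wall transition that would otherwise result from switching abruptly from one region to the next.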
In a further embodiment, a frequency tile or frequency patch is not only subjected to cross-over filtering, but the tile generator performs, before performing the cross-over filtering, a patch adaption comprising a setting of frequency borders at spectral minima and a removal or attenuation of tonal portions remaining in transition ranges around the transition frequencies.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
The core decoder 600 is implemented as an entropy (e.g. Huffman or arithmetic decoder) decoding and dequantizing stage 612 as illustrated in
Subsequently,
In case problematic tonal components have been discovered near frequency borders, a transition frequency adjuster 706 performs an adjustment of a transition frequency such as a transition frequency or cross-over frequency or gap filling start frequency between the core band and the reconstruction band or between individual frequency portions generated by one and the same source data in the reconstruction band. The output signal of block 706 is forwarded to a remover 708 of tonal components at borders. The remover is configured for removing remaining tonal components which are still there subsequent to the transition frequency adjustment by block 706. The result of the remover 708 is then forwarded to a cross-over filter 710 in order to address the filter ringing problem and the result of the cross-over filter 710 is then input into a spectral envelope shaping block 712 which performs a spectral envelope shaping in the reconstruction band.
As discussed in the context of
The detector 720 now controls a manipulator 722 for manipulating the signal, i.e., the preliminary regenerated signal. This manipulation can be done by actually processing the preliminary regenerated signal by line 723 or by newly performing a regeneration, but now with, for example, the amended transition frequencies as illustrated by line 724.
One implementation of the manipulation procedure is that the transition frequency is adjusted as illustrated at 706 in
An alternative implementation is illustrated in
A further implementation is illustrated in
Then, the spectrally adjusted signal output by block 826 is input into a frequency-time converter 828 which, additionally, receives the first spectral portions, i.e., a spectral representation of the output signal of the core decoder 600. The output of the frequency-time converter 828 can then be used for storage or for transmitting to a loudspeaker for audio rendering.
The present invention can be applied either to known frequency regeneration procedures such as illustrated in
Typically, a first spectral portion such as 306 of
The decoder further comprises a frequency regenerator 116 for regenerating a reconstructed second spectral portion having the first spectral resolution using a first spectral portion. The frequency regenerator 116 performs a tile filling operation, i.e., uses a tile or portion of the first set of first spectral portions and copies this first set of first spectral portions into the reconstruction range or reconstruction band having the second spectral portion and typically performs spectral envelope shaping or another operation as indicated by the decoded second representation output by the parametric decoder 114, i.e., by using the information on the second set of second spectral portions. The decoded first set of first spectral portions and the reconstructed second set of spectral portions as indicated at the output of the frequency regenerator 116 on line 117 is input into a spectrum-time converter 118 configured for converting the first decoded representation and the reconstructed second spectral portion into a time representation 119, the time representation having a certain high sampling rate.
The spectral analyzer/tonal mask 226 separates the output of TNS block 222 into the core band and the tonal components corresponding to the first set of first spectral portions 103 and the residual components corresponding to the second set of second spectral portions 105 of
The analysis filterbank 222 is implemented as an MDCT (modified discrete cosine transform filterbank) and the MDCT is used to transform the signal 99 into a time-frequency domain with the modified discrete cosine transform acting as the frequency analysis tool.
The spectral analyzer 226 applies a tonality mask. This tonality mask estimation stage is used to separate tonal components from the noise-like components in the signal. This allows the core coder 228 to code all tonal components with a psycho-acoustic module. The tonality mask estimation stage can be implemented in numerous different ways and is implemented similar in its functionality to the sinusoidal track estimation stage used in sine and noise-modeling for speech/audio coding [8, 9] or an HILN model based audio coder described in [10]. Advantageously, an implementation is used which is easy to implement without the need to maintain birth-death trajectories, but any other tonality or noise detector can be used as well.
The IGF module calculates the similarity that exists between a source region and a target region. The target region will be represented by the spectrum from the source region. The measure of similarity between the source and target regions is done using a cross-correlation approach. The target region is split into nTar non-overlapping frequency tiles. For every tile in the target region, nSrc source tiles are created from a fixed start frequency. These source tiles overlap by a factor between 0 and 1, where 0 means 0% overlap and 1 means 100% overlap. Each of these source tiles is correlated with the target tile at various lags to find the source tile that best matches the target tile. The best matching tile number is stored in tileNum[idx_tar], the lag at which it best correlates with the target is stored in xcorr_lag[idx_tar][idx_src] and the sign of the correlation is stored in xcorr_sign[idx_tar][idx_src]. In case the correlation is highly negative, the source tile needs to be multiplied by −1 before the tile filling process at the decoder. The IGF module also takes care of not overwriting the tonal components in the spectrum since the tonal components are preserved using the tonality mask. A band-wise energy parameter is used to store the energy of the target region enabling us to reconstruct the spectrum accurately.
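The cross-correlation based tile selection can be sketched as follows. The sketch assumes a normalized cross-correlation, a source tile spacing `hop` that encodes the overlap, and lags that shift the source tile start; the function name and all parameters are hypothetical and only the returned triple corresponds to the tile number, lag and sign mentioned above.

```python
import math
import random


def best_source_tile(spectrum, tar_start, tile_len, src_start, n_src, hop,
                     max_lag):
    """Return (tile_num, lag, sign) of the source tile best matching a target.

    Source tile j starts at src_start + j * hop; a hop smaller than
    tile_len makes neighboring source tiles overlap.
    """
    target = spectrum[tar_start:tar_start + tile_len]
    tar_energy = sum(t * t for t in target)
    best, best_score = (0, 0, 1), -1.0
    for tile_num in range(n_src):
        for lag in range(-max_lag, max_lag + 1):
            start = src_start + tile_num * hop + lag
            if start < 0 or start + tile_len > tar_start:
                continue  # source must stay inside the range below the target
            source = spectrum[start:start + tile_len]
            src_energy = sum(s * s for s in source)
            if src_energy == 0.0 or tar_energy == 0.0:
                continue
            corr = (sum(s * t for s, t in zip(source, target))
                    / math.sqrt(src_energy * tar_energy))
            if abs(corr) > best_score:
                best_score = abs(corr)
                # A negative correlation means the tile is multiplied by -1.
                best = (tile_num, lag, -1 if corr < 0 else 1)
    return best


rng = random.Random(1)
low_band = [rng.uniform(-1.0, 1.0) for _ in range(24)]
# Target tile is an inverted copy of lines 9..16 of the low band.
spectrum = low_band + [-x for x in low_band[9:17]]
match = best_source_tile(spectrum, tar_start=24, tile_len=8,
                         src_start=4, n_src=3, hop=4, max_lag=1)
```

In this constructed example the inverted copy is found in source tile 1 at a lag of one line with a negative sign.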
This method has certain advantages over the classical SBR [1] in that the harmonic grid of a multi-tone signal is preserved by the core coder while only the gaps between the sinusoids are filled with the best matching “shaped noise” from the source region. Another advantage of this system compared to ASR (Accurate Spectral Replacement) [2-4] is the absence of a signal synthesis stage which creates the important portions of the signal at the decoder. Instead, this task is taken over by the core coder, enabling the preservation of important components of the spectrum. Another advantage of the proposed system is the continuous scalability that the features offer. Just using tileNum[idx_tar] and xcorr_lag=0 for every tile is called gross granularity matching and can be used for low bitrates, while using a variable xcorr_lag for every tile enables us to match the target and source spectra better.
In addition, a tile choice stabilization technique is proposed which removes frequency domain artifacts such as trilling and musical noise.
In case of stereo channel pairs an additional joint stereo processing is applied. This is necessitated because, for a certain destination range, the signal can be a highly correlated panned sound source. In case the source regions chosen for this particular region are not well correlated, although the energies are matched for the destination regions, the spatial image can suffer due to the uncorrelated source regions. The encoder analyses each destination region energy band, typically performing a cross-correlation of the spectral values and, if a certain threshold is exceeded, sets a joint flag for this energy band. In the decoder the left and right channel energy bands are treated individually if this joint stereo flag is not set. In case the joint stereo flag is set, both the energies and the patching are performed in the joint stereo domain. The joint stereo information for the IGF regions is signaled similar to the joint stereo information for the core coding, including a flag indicating, in case of prediction, if the direction of the prediction is from downmix to residual or vice versa.
The energies can be calculated from the transmitted energies in the L/R-domain.
midNrg[k]=leftNrg[k]+rightNrg[k];
sideNrg[k]=leftNrg[k]−rightNrg[k];
with k being the frequency index in the transform domain.
Another solution is to calculate and transmit the energies directly in the joint stereo domain for bands where joint stereo is active, so no additional energy transformation is needed at the decoder side.
The source tiles are created according to the Mid/Side-Matrix:
midTile[k]=0.5·(leftTile[k]+rightTile[k])
sideTile[k]=0.5·(leftTile[k]−rightTile[k])
Energy adjustment:
midTile[k]=midTile[k]*midNrg[k];
sideTile[k]=sideTile[k]*sideNrg[k];
Joint Stereo->LR Transformation:
If no additional prediction parameter is coded:
leftTile[k]=midTile[k]+sideTile[k]
rightTile[k]=midTile[k]−sideTile[k]
If an additional prediction parameter is coded and if the signalled direction is from mid to side:
sideTile[k]=sideTile[k]−predictionCoeff·midTile[k]
leftTile[k]=midTile[k]+sideTile[k]
rightTile[k]=midTile[k]−sideTile[k]
If the signalled direction is from side to mid:
midTile1[k]=midTile[k]−predictionCoeff·sideTile[k]
leftTile[k]=midTile1[k]−sideTile[k]
rightTile[k]=midTile1[k]+sideTile[k]
This processing ensures that from the tiles used for regenerating highly correlated destination regions and panned destination regions, the resulting left and right channels still represent a correlated and panned sound source even if the source regions are not correlated, preserving the stereo image for such regions.
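The joint stereo tile processing given by the above equations can be sketched as follows. The function names and the string values used to select the prediction direction are hypothetical; the arithmetic, including the signs in the side-to-mid case, follows the equations as written above.

```python
def joint_energies(left_nrg, right_nrg):
    """Derive mid/side energies from the transmitted L/R energies."""
    mid_nrg = [l + r for l, r in zip(left_nrg, right_nrg)]
    side_nrg = [l - r for l, r in zip(left_nrg, right_nrg)]
    return mid_nrg, side_nrg

def to_mid_side(left_tile, right_tile):
    """Create the source tiles according to the Mid/Side matrix."""
    mid = [0.5 * (l + r) for l, r in zip(left_tile, right_tile)]
    side = [0.5 * (l - r) for l, r in zip(left_tile, right_tile)]
    return mid, side

def adjust_energy(tile, nrg):
    """Energy adjustment: scale every line k of the tile by nrg[k]."""
    return [t * e for t, e in zip(tile, nrg)]

def to_left_right(mid, side, prediction_coeff=None, direction=None):
    """Joint stereo to L/R transformation, with optional prediction."""
    if prediction_coeff is None:
        left = [m + s for m, s in zip(mid, side)]
        right = [m - s for m, s in zip(mid, side)]
    elif direction == "mid_to_side":
        side1 = [s - prediction_coeff * m for m, s in zip(mid, side)]
        left = [m + s for m, s in zip(mid, side1)]
        right = [m - s for m, s in zip(mid, side1)]
    elif direction == "side_to_mid":
        mid1 = [m - prediction_coeff * s for m, s in zip(mid, side)]
        # Signs as given in the equations of the description.
        left = [m - s for m, s in zip(mid1, side)]
        right = [m + s for m, s in zip(mid1, side)]
    else:
        raise ValueError("direction must be 'mid_to_side' or 'side_to_mid'")
    return left, right

left_tile, right_tile = [1.0, 3.0], [3.0, 1.0]
mid, side = to_mid_side(left_tile, right_tile)
left_out, right_out = to_left_right(mid, side)
```

Without a prediction parameter and without energy adjustment, the round trip through the Mid/Side matrix returns the original left and right tiles.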
In other words, in the bitstream, joint stereo flags are transmitted that indicate whether L/R or M/S as an example for the general joint stereo coding shall be used. In the decoder, first, the core signal is decoded as indicated by the joint stereo flags for the core bands. Second, the core signal is stored in both L/R and M/S representation. For the IGF tile filling, the source tile representation is chosen to fit the target tile representation as indicated by the joint stereo information for the IGF bands.
Temporal Noise Shaping (TNS) is a standard technique and part of AAC [11-13]. TNS can be considered as an extension of the basic scheme of a perceptual coder, inserting an optional processing step between the filterbank and the quantization stage. The main task of the TNS module is to hide the produced quantization noise in the temporal masking region of transient-like signals, and thus it leads to a more efficient coding scheme. First, TNS calculates a set of prediction coefficients using “forward prediction” in the transform domain, e.g. MDCT. These coefficients are then used for flattening the temporal envelope of the signal. As the quantization affects the TNS filtered spectrum, the quantization noise is also temporally flat. By applying the inverse TNS filtering on the decoder side, the quantization noise is shaped according to the temporal envelope of the TNS filter and therefore the quantization noise gets masked by the transient.
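The principle of prediction along the frequency axis can be sketched as follows. This is a simplified illustration only: it uses a plain autocorrelation method with a Levinson-Durbin recursion and unquantized coefficients, which is an assumption of the example and not the coefficient computation or quantization defined for AAC. All function names are hypothetical.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation of the spectral coefficients for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return a[0..order] with a[0] = 1 for the error filter A(z) = 1 + sum a[i] z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def tns_analysis(spectrum, a):
    """Encoder side: FIR prediction-error filter run over the frequency index."""
    return [sum(a[j] * spectrum[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(spectrum))]

def tns_synthesis(residual, a):
    """Decoder side: inverse (all-pole) filter run over the frequency index."""
    out = []
    for n in range(len(residual)):
        y = residual[n] - sum(a[j] * out[n - j] for j in range(1, len(a)) if n - j >= 0)
        out.append(y)
    return out

# A spectrum that oscillates over frequency, as produced by a transient in time.
spectrum = [math.cos(0.3 * k) for k in range(32)]
coeffs = levinson_durbin(autocorrelation(spectrum, 2), 2)
residual = tns_analysis(spectrum, coeffs)
restored = tns_synthesis(residual, coeffs)
```

In this sketch the residual carries less energy than the input spectrum, and the decoder-side filter restores the input spectrum from the residual.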
IGF is based on an MDCT representation. For efficient coding, long blocks of approx. 20 ms have to be used. If the signal within such a long block contains transients, audible pre- and post-echoes occur in the IGF spectral bands due to the tile filling.
This pre-echo effect is reduced by using TNS in the IGF context. Here, TNS is used as a temporal tile shaping (TTS) tool as the spectral regeneration in the decoder is performed on the TNS residual signal. The necessitated TTS prediction coefficients are calculated and applied using the full spectrum on encoder side as usual. The TNS/TTS start and stop frequencies are not affected by the IGF start frequency fIGFstart of the IGF tool. In comparison to the legacy TNS, the TTS stop frequency is increased to the stop frequency of the IGF tool, which is higher than fIGFstart. On decoder side the TNS/TTS coefficients are applied on the full spectrum again, i.e. the core spectrum plus the regenerated spectrum plus the tonal components from the tonality map (see
In legacy decoders, spectral patching on an audio signal corrupts spectral correlation at the patch borders and thereby impairs the temporal envelope of the audio signal by introducing dispersion. Hence, another benefit of performing the IGF tile filling on the residual signal is that, after application of the shaping filter, tile borders are seamlessly correlated, resulting in a more faithful temporal reproduction of the signal.
In an inventive encoder, the spectrum having undergone TNS/TTS filtering, tonality mask processing and IGF parameter estimation is devoid of any signal above the IGF start frequency except for tonal components. This sparse spectrum is now coded by the core coder using principles of arithmetic coding and predictive coding. These coded components along with the signaling bits form the bitstream of the audio.
The high resolution is defined by a line-wise coding of spectral lines such as MDCT lines, while the second resolution or low resolution is defined by, for example, calculating only a single spectral value per scale factor band, where a scale factor band covers several frequency lines. Thus, the second low resolution is, with respect to its spectral resolution, much lower than the first or high resolution defined by the line-wise coding typically applied by the core encoder such as an AAC or USAC core encoder.
Regarding scale factor or energy calculation, the situation is illustrated in
Particularly, when the core encoder is under a low bitrate condition, an additional noise-filling operation in the core band, i.e., lower in frequency than the IGF start frequency, i.e., in scale factor bands SCB1 to SCB3 can be applied in addition. In noise-filling, there exist several adjacent spectral lines which have been quantized to zero. On the decoder-side, these quantized to zero spectral values are re-synthesized and the re-synthesized spectral values are adjusted in their magnitude using a noise-filling energy such as NF2 illustrated at 308 in
The bands, for which energy information is calculated coincide with the scale factor bands. In other embodiments, an energy information value grouping is applied so that, for example, for scale factor bands 4 and 5, only a single energy information value is transmitted, but even in this embodiment, the borders of the grouped reconstruction bands coincide with borders of the scale factor bands. If different band separations are applied, then certain re-calculations or synchronization calculations may be applied, and this can make sense depending on the certain implementation.
The spectral domain encoder 106 of
In the audio encoder of
Then, at the output of block 422, a quantized spectrum is obtained corresponding to what is illustrated in
The set to zero blocks 410, 418, 422, which are provided alternatively to each other or in parallel are controlled by the spectral analyzer 424. The spectral analyzer comprises any implementation of a well-known tonality detector or comprises any different kind of detector operative for separating a spectrum into components to be encoded with a high resolution and components to be encoded with a low resolution. Other such algorithms implemented in the spectral analyzer can be a voice activity detector, a noise detector, a speech detector or any other detector deciding, depending on spectral information or associated metadata on the resolution requirements for different spectral portions.
Subsequently, reference is made to
As illustrated at 301 in
An IGF operation, i.e., a frequency tile filling operation using spectral values from other portions, can be applied in the complete spectrum. Thus, a spectral tile filling operation can not only be applied in the high band above an IGF start frequency but can also be applied in the low band. Furthermore, the noise-filling without frequency tile filling can also be applied not only below the IGF start frequency but also above the IGF start frequency. It has, however, been found that high-quality and highly efficient audio encoding can be obtained when the noise-filling operation is limited to the frequency range below the IGF start frequency and when the frequency tile filling operation is restricted to the frequency range above the IGF start frequency as illustrated in
The target tiles (TT) (having frequencies greater than the IGF start frequency) are bound to scale factor band borders of the full rate coder. Source tiles (ST), from which information is taken, i.e., for frequencies lower than the IGF start frequency, are not bound by scale factor band borders. The size of the ST should correspond to the size of the associated TT. This is illustrated using the following example. TT[0] has a length of 10 MDCT bins. This exactly corresponds to the length of two subsequent SCBs (such as 4+6). Then, all possible STs that are to be correlated with TT[0] have a length of 10 bins, too. A second target tile TT[1], adjacent to TT[0], has a length of 15 bins (SCBs having a length of 7+8). Then, the STs for TT[1] have a length of 15 bins rather than 10 bins as for TT[0].
Should the case arise that one cannot find an ST for a TT with the length of the target tile (e.g. when the length of the TT is greater than the available source range), then a correlation is not calculated and the source range is copied a number of times into this TT (the copying is done one after the other so that the frequency line for the lowest frequency of the second copy immediately follows, in frequency, the frequency line for the highest frequency of the first copy) until the target tile TT is completely filled up.
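The tile length rule and the fallback copying described above can be sketched as follows. The function names and the use of a normalized zero-lag cross-correlation as the similarity measure are assumptions of this example; the description only states that source tiles of the same length as the target tile are correlated with it.

```python
import math

def best_source_tile(spectrum, igf_start, target_start, target_len):
    """Return the start bin of the best matching ST below igf_start, or None.

    Every candidate ST has exactly the length of the TT. None is returned when
    the TT is longer than the available source range, so nothing can be correlated.
    """
    target = spectrum[target_start:target_start + target_len]
    t_norm = math.sqrt(sum(t * t for t in target))
    best_start, best_score = None, float("-inf")
    for start in range(0, igf_start - target_len + 1):
        source = spectrum[start:start + target_len]
        s_norm = math.sqrt(sum(s * s for s in source))
        den = s_norm * t_norm
        score = sum(s * t for s, t in zip(source, target)) / den if den > 0.0 else 0.0
        if score > best_score:
            best_start, best_score = start, score
    return best_start

def fill_by_repetition(source_range, target_len):
    """Copy the source range back to back until the target tile is filled."""
    repeats = -(-target_len // len(source_range))  # ceiling division
    return (source_range * repeats)[:target_len]

# Lines 0..5 form the source range; the TT at lines 6..8 is a scaled copy of lines 1..3.
spectrum = [0.1, 0.9, -0.4, 0.7, 0.2, -0.8, 1.8, -0.8, 1.4]
st_start = best_source_tile(spectrum, igf_start=6, target_start=6, target_len=3)
filled = fill_by_repetition([1.0, 2.0, 3.0], 7)
```

Here the search selects the source tile starting at bin 1, and the fallback places the second copy directly after the highest line of the first copy.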
Subsequently, reference is made to
Then, the first spectral portion of the reconstruction band such as 307 of
In this context, it is very important to evaluate the high frequency reconstruction accuracy of the present invention compared to HE-AAC. This is explained with respect to scale factor band 7 in
In an implementation, the spectral analyzer is also implemented to calculate similarities between first spectral portions and second spectral portions and to determine, based on the calculated similarities, for a second spectral portion in a reconstruction range a first spectral portion matching with the second spectral portion as far as possible. Then, in this variable source range/destination range implementation, the parametric coder will additionally introduce into the second encoded representation a matching information indicating for each destination range a matching source range. On the decoder-side, this information would then be used by a frequency tile generator 522 of
Furthermore, as illustrated in
As illustrated, the encoder operates without downsampling and the decoder operates without upsampling. In other words, the spectral domain audio coder is configured to generate a spectral representation having a Nyquist frequency defined by the sampling rate of the originally input audio signal.
Furthermore, as illustrated in
As outlined, the spectral domain audio decoder 112 is configured so that a maximum frequency represented by a spectral value in the first decoded representation is equal to a maximum frequency included in the time representation having the sampling rate wherein the spectral value for the maximum frequency in the first set of first spectral portions is zero or different from zero. Anyway, for this maximum frequency in the first set of spectral components a scale factor for the scale factor band exists, which is generated and transmitted irrespective of whether all spectral values in this scale factor band are set to zero or not as discussed in the context of
The invention is, therefore, advantageous in that, with respect to other parametric techniques to increase compression efficiency, e.g. noise substitution and noise filling (these techniques are exclusively for efficient representation of noise-like local signal content), the invention allows an accurate frequency reproduction of tonal components. To date, no state-of-the-art technique addresses the efficient parametric representation of arbitrary signal content by spectral gap filling without the restriction of a fixed a priori division into low band (LF) and high band (HF).
Embodiments of the inventive system improve on the state-of-the-art approaches and thereby provide high compression efficiency, no or only a small perceptual annoyance and full audio bandwidth even for low bitrates.
The general system consists of
A first step towards a more efficient system is to remove the need for transforming spectral data into a second transform domain different from the one of the core coder. As the majority of audio codecs, such as AAC for instance, use the MDCT as basic transform, it is useful to perform the BWE in the MDCT domain also. A second requirement for the BWE system would be the need to preserve the tonal grid whereby even HF tonal components are preserved and the quality of the coded audio is thus superior to the existing systems. To take care of both the above mentioned requirements for a BWE scheme, a new system is proposed called Intelligent Gap Filling (IGF).
The frequency regenerator 906 further comprises a calculator 914 for a missing energy in the reconstruction band, and the calculator 914 operates using the individual energy for the reconstruction band and the survive energy generated by block 912. Furthermore, the frequency regenerator 906 comprises a spectral envelope adjuster 916 for adjusting the further spectral portions in the reconstruction band based on the missing energy information and the tile energy information generated by block 918.
Reference is made to
Subsequently, a certain example with real numbers is discussed. The remaining survive energy as calculated by block 912 is, for example, five energy units and this energy is the energy of the exemplarily indicated four spectral lines in the first spectral portion 921.
Furthermore, the energy value E3 for the reconstruction band corresponding to scale factor band 6 of
Based on the missing energy divided by the tile energy tEk, a gain factor of 0.79 is calculated. Then, the raw spectral lines for the second spectral portions 922, 923 are multiplied by the calculated gain factor. Thus, only the spectral values for the second spectral portions 922, 923 are adjusted and the spectral lines for the first spectral portion 921 are not influenced by this envelope adjustment. Subsequent to multiplying the raw spectral values for the second spectral portions 922, 923, a complete reconstruction band has been calculated consisting of the first spectral portions in the reconstruction band, and consisting of spectral lines in the second spectral portions 922, 923 in the reconstruction band 920.
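The envelope adjustment for one reconstruction band can be sketched as follows. The sketch assumes that the gain is applied to spectral amplitudes and is therefore the square root of the ratio of missing energy to tile energy. The band energy of ten units and the tile energy of eight units used in the demonstration are assumed values for this example; together with the survive energy of five units they reproduce a gain of about 0.79. All function names are hypothetical.

```python
import math

def envelope_gain(band_energy, survive_energy, tile_energy):
    """Gain for the raw tile lines so that the band reaches its target energy.

    Assumption: gain = sqrt(missing energy / tile energy), since the gain
    multiplies amplitudes while the transmitted values are energies.
    """
    missing = max(band_energy - survive_energy, 0.0)
    return math.sqrt(missing / tile_energy) if tile_energy > 0.0 else 0.0

def adjust_reconstruction_band(lines, is_first_portion, gain):
    """Scale only the regenerated (second) portions; first portions stay untouched."""
    return [x if keep else gain * x for x, keep in zip(lines, is_first_portion)]

# Assumed example values: band energy 10, survive energy 5, tile energy 8.
gain = envelope_gain(10.0, 5.0, 8.0)
adjusted = adjust_reconstruction_band([2.0, 1.0], [True, False], 0.5)
```

The second function reflects that the spectral lines of the first spectral portion are not influenced by the envelope adjustment.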
The source range for generating the raw spectral data in bands 922, 923 is, with respect to frequency, below the IGF start frequency 309 and the reconstruction band 920 is above the IGF start frequency 309.
Furthermore, it is advantageous that reconstruction band borders coincide with scale factor band borders. Thus, a reconstruction band has, in one embodiment, the size of the corresponding scale factor bands of the core audio decoder, or is sized so that, when energy pairing is applied, an energy value for a reconstruction band provides the energy of two or a higher integer number of scale factor bands. Thus, when it is assumed that energy accumulation is performed for scale factor band 4, scale factor band 5 and scale factor band 6, then the lower frequency border of the reconstruction band 920 is equal to the lower border of scale factor band 4 and the higher frequency border of the reconstruction band 920 coincides with the higher border of scale factor band 6.
Subsequently,
The inverse scaling block 940 provides all first sets of first spectral portions below the IGF start frequency 309 of
Subsequently, reference is made to
The audio encoder has scale factor bands with different frequency bandwidths, i.e., with a different number of spectral values. Therefore, the parametric calculator comprises a normalizer 1012 for normalizing the energies for the different bandwidths with respect to the bandwidth of the specific reconstruction band. To this end, the normalizer 1012 receives, as inputs, an energy in the band and a number of spectral values in the band, and the normalizer 1012 then outputs a normalized energy per reconstruction/scale factor band.
Furthermore, the parametric calculator 1006a of
In case the audio encoder is performing the grouping of two or more short windows, this grouping is applied for the energy information as well. When the core encoder performs a grouping of two or more short blocks, then, for these two or more blocks, only a single set of scale factors is calculated and transmitted. On the decoder-side, the audio decoder then applies the same set of scale factors for both grouped windows.
Regarding the energy information calculation, the spectral values in the reconstruction band are accumulated over two or more short windows. In other words, this means that the spectral values in a certain reconstruction band for a short block and for the subsequent short block are accumulated together and only single energy information value is transmitted for this reconstruction band covering two short blocks. Then, on the decoder-side, the envelope adjustment discussed with respect to
The corresponding normalization is then again applied so that even though any grouping in frequency or grouping in time has been performed, the normalization easily allows that, for the energy value information calculation on the decoder-side, only the energy information value on the one hand and the amount of spectral lines in the reconstruction band or in the set of grouped reconstruction bands has to be known.
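The accumulation over grouped short windows and the subsequent normalization can be sketched as follows. The sketch assumes that normalization means dividing the accumulated energy by the number of contributing spectral values; the function names are hypothetical.

```python
def normalized_band_energy(windows, start, stop):
    """Accumulate the energy of lines start..stop-1 over all grouped windows
    and normalize it by the number of contributing spectral values."""
    energy = sum(x * x for w in windows for x in w[start:stop])
    count = len(windows) * (stop - start)
    return energy / count

def band_energy_from_normalized(normalized, num_windows, start, stop):
    """Decoder side: recover the accumulated energy from the normalized value
    and the number of spectral lines in the (grouped) reconstruction band."""
    return normalized * num_windows * (stop - start)

# Two grouped short blocks; the reconstruction band covers lines 0 and 1.
windows = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 0.0, 0.0]]
e_norm = normalized_band_energy(windows, 0, 2)
e_total = band_energy_from_normalized(e_norm, len(windows), 0, 2)
```

A single value per band is sufficient for both grouped blocks, because the decoder only needs this value and the number of spectral lines it covers.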
Furthermore, it is emphasized that an information on spectral energies, an information on individual energies or an individual energy information, an information on a survive energy or a survive energy information, an information on a tile energy or a tile energy information, or an information on a missing energy or a missing energy information may comprise not only an energy value, but also an (e.g. absolute) amplitude value, a level value or any other value, from which a final energy value can be derived. Hence, the information on an energy may e.g. comprise the energy value itself, and/or a value of a level and/or of an amplitude and/or of an absolute amplitude.
Main features of embodiments of the invention are as follows:
The embodiment is based on the MDCT that exhibits the above referenced warbling artifacts if tonal spectral areas are pruned by the unfortunate choice of cross-over frequency and/or patch margins, or tonal components get to be placed in too close vicinity at patch borders.
To overcome these problems, the new technique first detects the spectral location of the tonal components contained in the signal. Then, according to one aspect of the invention, it is attempted to adjust the transition frequencies between LF and all patches by individual shifts (within given limits) such that splitting or beating of tonal components is minimized. For that purpose, the transition frequency has to match a local spectral minimum. This step is shown in
According to another aspect of the invention, if problematic spectral content in transition regions remains, at least one of the misplaced tonal components is removed to reduce either the beating artifact at the transition frequencies or the warbling. This is done via spectral extrapolation or interpolation/filtering, as shown in
In other words,
Panel (1) of
Thus, a frequency fx1 illustrates a border frequency 1250 between the source range 1252 and a reconstruction range 1254 extending between the border frequency 1250 and a maximum frequency which is smaller than or equal to the Nyquist frequency fNyquist. On the encoder-side, it is assumed that a signal is bandwidth-limited at fx1 or, when the technology regarding intelligent gap filling is applied, it is assumed that fx1 corresponds to the gap filling start frequency 309 of
On the other hand, this procedure, in which f′x2 has been changed does not effectively address the beating problem which, therefore, is addressed by a removal of the tonal components by filtering or interpolation or any other procedures as discussed in the context of block 708 of
Another option would have been to set the transition border fx1 so that it is a little bit lower so that the tonal portion 1220a is not in the core range anymore. Then, the tonal portion 1220a has also been removed or eliminated by setting the transition frequency fx1 at a lower value.
This procedure would also have worked for addressing the issue with the problematic tonal component 1032. By setting f′x2 even higher, the spectral portion where the tonal portion 1032 is located could have been regenerated within the first patching operation 1225 and, therefore, two adjacent or neighboring tonal portions would not have occurred.
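The adjustment of a transition frequency to a local spectral minimum within given limits can be sketched as follows. The function name, the tie-breaking rule and the choice of the lowest local minimum within the allowed range are assumptions of this example.

```python
def adjust_transition(magnitudes, nominal, max_shift):
    """Shift a transition bin to a local spectral minimum within +/- max_shift.

    Among all local minima in the allowed range, the one with the lowest
    magnitude is chosen (closest to the nominal bin on ties). If no local
    minimum exists in the range, the nominal bin is kept.
    """
    lo = max(1, nominal - max_shift)
    hi = min(len(magnitudes) - 2, nominal + max_shift)
    candidates = [k for k in range(lo, hi + 1)
                  if magnitudes[k] <= magnitudes[k - 1]
                  and magnitudes[k] <= magnitudes[k + 1]]
    if not candidates:
        return nominal
    return min(candidates, key=lambda k: (magnitudes[k], abs(k - nominal)))

# The nominal transition at bin 3 would cut through a tonal peak.
magnitudes = [0.2, 0.3, 1.0, 5.0, 1.2, 0.1, 0.9, 4.0, 1.1, 0.4]
shifted = adjust_transition(magnitudes, nominal=3, max_shift=2)
```

In this example the transition moves from the tonal peak at bin 3 to the spectral valley at bin 5, so that the tonal component is not split by the transition.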
Basically, the beating problem depends on the amplitudes and the distance in frequency of adjacent tonal portions. The detector 704, 720 or, stated more generally, the analyzer 602 is configured in such a way that the lower spectral portion located in frequency below the transition frequency such as fx1, fx2, f′x2 is analyzed in order to locate any tonal component. Furthermore, the spectral range above the transition frequency is also analyzed in order to detect a tonal component. When the detection results in two tonal components, one to the left of the transition frequency with respect to frequency and one to the right (with respect to ascending frequency), then the remover of tonal components at borders illustrated at 708 in
According to another aspect of the invention, to reduce the filter ringing artifact, a cross-over filter in the frequency domain is applied to two consecutive spectral regions, i.e. between the core band and the first patch or between two patches. Advantageously, the cross-over filter is signal adaptive.
The cross-over filter consists of two filters, a fade-out filter hout, which is applied to the lower spectral region, and a fade-in filter hin, which is applied to the higher spectral region.
Each of the filters has length N.
In addition, the slope of both filters is characterized by a signal adaptive value called Xbias determining the notch characteristic of the cross-over filter, with 0≤Xbias≤N:
The basic design of the cross-over filters is constrained by the following equations:
hout(k)=hin(N−1−k),∀Xbias
hout(k)+hin(k)=1,Xbias=0
with k=0, 1, . . . , N−1 being the frequency index.
In this example, the following equation is used to create the filter hout:
The following equation describes how the filters hin and hout are then applied,
Y(kt−(N−1)+k)=LF(kt−(N−1)+k)·hout(k)+HF(kt−(N−1)+k)·hin(k), k=0,1, . . . , N−1
with Y denoting the assembled spectrum, kt being the transition frequency, LF being the low frequency content and HF being the high frequency content.
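The application of the two filters can be sketched as follows for the case Xbias=0. The raised-cosine shape used here is an assumption of this example, chosen only because it satisfies both design constraints stated above; the signal-adaptive notch controlled by Xbias is not modeled. The function names are hypothetical.

```python
import math

def crossover_filters(n):
    """Return (h_out, h_in) of length n for Xbias = 0.

    Assumed raised-cosine shape with h_out(k) = h_in(n-1-k) and
    h_out(k) + h_in(k) = 1 for all k.
    """
    if n < 2:
        raise ValueError("filter length must be at least 2")
    h_in = [0.5 * (1.0 - math.cos(math.pi * k / (n - 1))) for k in range(n)]
    h_out = [h_in[n - 1 - k] for k in range(n)]
    return h_out, h_in

def apply_crossover(lf, hf, kt, n):
    """Assemble Y from LF and HF, cross-fading over the bins kt-(n-1) .. kt.

    lf and hf are full-length spectra; both need valid content in the overlap.
    """
    h_out, h_in = crossover_filters(n)
    first = kt - (n - 1)
    y = list(lf[:first])                      # pure LF content below the overlap
    for k in range(n):
        idx = first + k
        y.append(lf[idx] * h_out[k] + hf[idx] * h_in[k])
    y.extend(hf[kt + 1:])                     # pure HF content above the overlap
    return y

h_out, h_in = crossover_filters(4)
y = apply_crossover([1.0] * 8, [1.0] * 8, kt=5, n=4)
```

With identical LF and HF content the assembled spectrum stays flat, which reflects the constraint that both filters sum to one for Xbias=0.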
Next, evidence of the benefit of this technique will be presented. The original signal in the following examples is a transient-like signal, in particular a low pass filtered version thereof, with a cut-off frequency of 22 kHz. First, this transient is band limited to 6 kHz in the transform domain. Subsequently, the bandwidth of the low pass filtered original signal is extended to 24 kHz. The bandwidth extension is accomplished through copying the LF band three times to entirely fill the frequency range that is available above 6 kHz within the transform.
The same effect, yet in a different illustration, is shown in
Subsequently,
Furthermore, a tile generator 1404 is provided for regenerating one or more spectral tiles having frequencies not included in the decoded core signal, the tiles being generated using a spectral portion of the decoded core signal. The tiles can be reconstructed second spectral portions within a reconstruction band as, for example, illustrated in the context of
Furthermore, a cross-over filter 1406 is provided for spectrally cross-over filtering the decoded core signal and a first frequency tile having frequencies extending from a gap filling frequency 309 to a first tile stop frequency or for spectrally cross-over filtering a first frequency tile 1225 and a second frequency tile 1221, the second frequency tile having a lower border frequency being frequency-adjacent to an upper border frequency of the first frequency tile 1225.
In a further implementation, the cross-over filter 1406 output signal is fed into an envelope adjuster 1408 which applies parametric spectral envelope information included in an encoded audio signal as parametric side information to finally obtain an envelope-adjusted regenerated signal. Elements 1404, 1406, 1408 can be implemented as a frequency regenerator as, for example, illustrated in
On the other hand, only the lowest 21 frequency lines of the first frequency tile 1225 are influenced by the fade-in function 1422a.
Additionally, it becomes clear from the cross-fade functions that the frequency lines between 9 and 13 are influenced, but the fade-in function actually does not influence the frequency lines between 1 and 9 and the fade-out function 1420a does not influence the frequency lines between 13 and 21. This means that only an overlap would be necessitated between frequency lines 9 and 13, and the cross-over frequency such as fx1 would be placed at frequency sample or frequency bin 11. Thus, only an overlap of two frequency bins or frequency values between the source range and the first frequency tile would be necessitated in order to implement the cross-over or cross-fade function.
Depending on the specific implementation, a higher or lower overlap can be applied and, additionally, other fading functions apart from a cosine function can be used. Furthermore, as illustrated in
As illustrated in
Furthermore, it is advantageous to make the cross-over filter characteristic signal-adaptive. Therefore, based on a signal analysis, the filter characteristic is adapted. Due to the fact that the cross-over filter is particularly useful for transient signals, it is detected whether transient signals occur. When transient signals occur, then a filter characteristic such as illustrated in
Then, based on the transient detection, or based on a tonality detection or based on any other signal characteristic detection, the cross-over filter 1406 characteristic is changed as discussed.
Although some aspects have been described in the context of an apparatus for encoding or decoding, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a Hard Disk Drive (HDD), a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
13177346 | Jul 2013 | EP | regional |
13177348 | Jul 2013 | EP | regional |
13177350 | Jul 2013 | EP | regional |
13177353 | Jul 2013 | EP | regional |
13189382 | Oct 2013 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2014/065118, filed Jul. 15, 2014, which is incorporated herein by reference in its entirety, and which claims priority from European Applications Nos. EP13177346, filed Jul. 22, 2013, EP13177350, filed Jul. 22, 2013, EP13177353, filed Jul. 22, 2013, EP13177348, filed Jul. 22, 2013, and EP13189382, filed Oct. 18, 2013, which are all incorporated herein by reference in their entirety.
Number | Name | Date | Kind |
---|---|---|---|
4757517 | Yatsuzuka | Jul 1988 | A |
5502713 | Lagerqvist et al. | Mar 1996 | A |
5619566 | Fogel | Apr 1997 | A |
5717821 | Tsutsui et al. | Feb 1998 | A |
5926788 | Nishiguchi | Jul 1999 | A |
5950153 | Ohmori et al. | Sep 1999 | A |
5978759 | Tsushima et al. | Nov 1999 | A |
6041295 | Hinderks | Mar 2000 | A |
6061555 | Bultman et al. | May 2000 | A |
6104321 | Akagiri | Aug 2000 | A |
6289308 | Lokhoff | Sep 2001 | B1 |
6424939 | Herre et al. | Jul 2002 | B1 |
6502069 | Grill et al. | Dec 2002 | B1 |
6680972 | Liljeryd et al. | Jan 2004 | B1 |
6708145 | Liljeryd | Mar 2004 | B1 |
6799164 | Araki | Sep 2004 | B1 |
6826526 | Norimatsu et al. | Nov 2004 | B1 |
6963405 | Wheel et al. | Nov 2005 | B1 |
7206740 | Thyssen et al. | Apr 2007 | B2 |
7246065 | Tanaka et al. | Jul 2007 | B2 |
7318027 | Lennon et al. | Jan 2008 | B2 |
7328161 | Oh | Feb 2008 | B2 |
7447317 | Herre et al. | Nov 2008 | B2 |
7447631 | Truman et al. | Nov 2008 | B2 |
7460990 | Mehrotra et al. | Dec 2008 | B2 |
7483758 | Liljeryd et al. | Jan 2009 | B2 |
7502743 | Thumpudi et al. | Mar 2009 | B2 |
7539612 | Thumpudi et al. | May 2009 | B2 |
7739119 | Venkatesha Rao et al. | Jun 2010 | B2 |
7756713 | Chong et al. | Jul 2010 | B2 |
7761303 | Pang et al. | Jul 2010 | B2 |
7801735 | Thumpudi et al. | Sep 2010 | B2 |
7917369 | Chen et al. | Mar 2011 | B2 |
7930171 | Chen et al. | Apr 2011 | B2 |
7945449 | Vinton et al. | May 2011 | B2 |
8078474 | Vos et al. | Dec 2011 | B2 |
8112284 | Kjorling | Feb 2012 | B2 |
8135047 | Rajendran et al. | Mar 2012 | B2 |
8214202 | Bruhn | Jul 2012 | B2 |
8255229 | Koishida et al. | Aug 2012 | B2 |
8412365 | Liljeryd et al. | Apr 2013 | B2 |
8428957 | Garudadri et al. | Apr 2013 | B2 |
8473301 | Chen et al. | Jun 2013 | B2 |
8484020 | Krishnan et al. | Jul 2013 | B2 |
8489403 | Griffin et al. | Jul 2013 | B1 |
8554569 | Chen et al. | Oct 2013 | B2 |
8655670 | Purnhagen et al. | Feb 2014 | B2 |
8892448 | Vos et al. | Nov 2014 | B2 |
9015041 | Disch et al. | Apr 2015 | B2 |
9047875 | Gao | Jun 2015 | B2 |
9111427 | Knox et al. | Aug 2015 | B2 |
9111535 | Yang | Aug 2015 | B2 |
9390717 | Yamamoto et al. | Jul 2016 | B2 |
9646624 | Disch et al. | May 2017 | B2 |
20020128839 | Lindgren et al. | Sep 2002 | A1 |
20030009327 | Nilsson et al. | Jan 2003 | A1 |
20030014136 | Wang et al. | Jan 2003 | A1 |
20030074191 | Byrnes et al. | Apr 2003 | A1 |
20030115042 | Chen et al. | Jun 2003 | A1 |
20030220800 | Budnikov et al. | Nov 2003 | A1 |
20040008615 | Oh | Jan 2004 | A1 |
20040024588 | Watson et al. | Feb 2004 | A1 |
20040028244 | Tsushima et al. | Feb 2004 | A1 |
20040054525 | Sekiguchi et al. | Mar 2004 | A1 |
20050004793 | Ojala et al. | Jan 2005 | A1 |
20050036633 | Jeon et al. | Feb 2005 | A1 |
20050074127 | Herre | Apr 2005 | A1 |
20050096917 | Kjorling | May 2005 | A1 |
20050141721 | Aarts et al. | Jun 2005 | A1 |
20050157891 | Johansen | Jul 2005 | A1 |
20050165611 | Mehrotra et al. | Jul 2005 | A1 |
20050216262 | Fejzo | Sep 2005 | A1 |
20050278171 | Suppappola et al. | Dec 2005 | A1 |
20060006103 | Sirota et al. | Jan 2006 | A1 |
20060031075 | Oh et al. | Feb 2006 | A1 |
20060095269 | Smith et al. | May 2006 | A1 |
20060122828 | Lee et al. | Jun 2006 | A1 |
20060210180 | Geiger et al. | Sep 2006 | A1 |
20060265210 | Ramakrishnan et al. | Nov 2006 | A1 |
20060282263 | Vos et al. | Dec 2006 | A1 |
20070016402 | Schuller et al. | Jan 2007 | A1 |
20070016403 | Schuller et al. | Jan 2007 | A1 |
20070016411 | Kim et al. | Jan 2007 | A1 |
20070027677 | Ouyang et al. | Feb 2007 | A1 |
20070043575 | Onuma | Feb 2007 | A1 |
20070100607 | Villemoes | May 2007 | A1 |
20070112559 | Schuijers | May 2007 | A1 |
20070129036 | Arora | Jun 2007 | A1 |
20070147518 | Bessette et al. | Jun 2007 | A1 |
20070196022 | Geiger et al. | Aug 2007 | A1 |
20070223577 | Ehara et al. | Sep 2007 | A1 |
20070282603 | Bessette | Dec 2007 | A1 |
20080027711 | Rajendran et al. | Jan 2008 | A1 |
20080027717 | Rajendran et al. | Jan 2008 | A1 |
20080040103 | Vinton et al. | Feb 2008 | A1 |
20080052066 | Oshikiri et al. | Feb 2008 | A1 |
20080208538 | Visser et al. | Aug 2008 | A1 |
20080208600 | Pang et al. | Aug 2008 | A1 |
20080262835 | Oshikiri et al. | Oct 2008 | A1 |
20080262853 | Jung et al. | Oct 2008 | A1 |
20080270125 | Choo et al. | Oct 2008 | A1 |
20080281604 | Choo et al. | Nov 2008 | A1 |
20080312758 | Koishida | Dec 2008 | A1 |
20090006103 | Koishida et al. | Jan 2009 | A1 |
20090132261 | Kjorling | May 2009 | A1 |
20090144055 | Davidson et al. | Jun 2009 | A1 |
20090144062 | Ramabadran et al. | Jun 2009 | A1 |
20090180531 | Wein et al. | Jul 2009 | A1 |
20090192789 | Lee et al. | Jul 2009 | A1 |
20090216527 | Oshikiri et al. | Aug 2009 | A1 |
20090226010 | Schnell et al. | Sep 2009 | A1 |
20090228285 | Schnell | Sep 2009 | A1 |
20090234644 | Reznik et al. | Sep 2009 | A1 |
20090263036 | Tanaka | Oct 2009 | A1 |
20090292537 | Ehara et al. | Nov 2009 | A1 |
20100023322 | Schnell et al. | Jan 2010 | A1 |
20100063808 | Gao et al. | Mar 2010 | A1 |
20100070270 | Gao | Mar 2010 | A1 |
20100177903 | Vinton et al. | Jul 2010 | A1 |
20100211399 | Liljeryd et al. | Aug 2010 | A1 |
20100211400 | Oh et al. | Aug 2010 | A1 |
20100241437 | Taleb et al. | Sep 2010 | A1 |
20100286981 | Krini et al. | Nov 2010 | A1 |
20110002266 | Gao | Jan 2011 | A1 |
20110015768 | Lim et al. | Jan 2011 | A1 |
20110046945 | Li et al. | Feb 2011 | A1 |
20110093276 | Raemoe et al. | Apr 2011 | A1 |
20110099004 | Krishnan et al. | Apr 2011 | A1 |
20110106545 | Disch et al. | May 2011 | A1 |
20110125505 | Vaillancourt et al. | May 2011 | A1 |
20110173006 | Nagel et al. | Jul 2011 | A1 |
20110173007 | Multrus et al. | Jul 2011 | A1 |
20110194712 | Potard | Aug 2011 | A1 |
20110200196 | Disch et al. | Aug 2011 | A1 |
20110202352 | Neuendorf et al. | Aug 2011 | A1 |
20110202354 | Grill et al. | Aug 2011 | A1 |
20110202358 | Neuendorf et al. | Aug 2011 | A1 |
20110235809 | Schuijers et al. | Sep 2011 | A1 |
20110238425 | Neuendorf et al. | Sep 2011 | A1 |
20110238426 | Fuchs et al. | Sep 2011 | A1 |
20110257984 | Virette et al. | Oct 2011 | A1 |
20110264454 | Ullberg et al. | Oct 2011 | A1 |
20110264457 | Oshikiri | Oct 2011 | A1 |
20110288873 | Nagel | Nov 2011 | A1 |
20110295598 | Yang et al. | Dec 2011 | A1 |
20110305352 | Villemoes et al. | Dec 2011 | A1 |
20110320212 | Tsujino et al. | Dec 2011 | A1 |
20120002818 | Heiko et al. | Jan 2012 | A1 |
20120029923 | Rajendran et al. | Feb 2012 | A1 |
20120065965 | Choo et al. | Mar 2012 | A1 |
20120095769 | Zhang et al. | Apr 2012 | A1 |
20120136670 | Ishikawa et al. | May 2012 | A1 |
20120158409 | Nagel et al. | Jun 2012 | A1 |
20120209600 | Kim et al. | Aug 2012 | A1 |
20120226505 | Lin | Sep 2012 | A1 |
20120245947 | Neuendorf et al. | Sep 2012 | A1 |
20120253797 | Geiger et al. | Oct 2012 | A1 |
20120265534 | Coorman | Oct 2012 | A1 |
20120271644 | Bessette et al. | Oct 2012 | A1 |
20120296641 | Rajendran et al. | Nov 2012 | A1 |
20130006645 | Jiang et al. | Jan 2013 | A1 |
20130035777 | Niemisto et al. | Feb 2013 | A1 |
20130051571 | Nagel et al. | Feb 2013 | A1 |
20130051574 | Yoo | Feb 2013 | A1 |
20130090933 | Villemoes et al. | Apr 2013 | A1 |
20130090934 | Nagel | Apr 2013 | A1 |
20130121411 | Robillard et al. | May 2013 | A1 |
20130124214 | Yamamoto et al. | May 2013 | A1 |
20130156112 | Suzuki et al. | Jun 2013 | A1 |
20130185085 | Tsujino et al. | Jul 2013 | A1 |
20130282383 | Hedelin et al. | Oct 2013 | A1 |
20130332176 | Setiawan et al. | Dec 2013 | A1 |
20140088973 | Gibbs et al. | Mar 2014 | A1 |
20140149126 | Soulodre | May 2014 | A1 |
20140188464 | Choo | Jul 2014 | A1 |
20140200901 | Kawashima et al. | Jul 2014 | A1 |
20140229186 | Mehrotra et al. | Aug 2014 | A1 |
20150071446 | Sun et al. | Mar 2015 | A1 |
20160035329 | Ekstrand et al. | Feb 2016 | A1 |
20160140980 | Disch et al. | May 2016 | A1 |
20160210977 | Ghido et al. | Jul 2016 | A1 |
20170116999 | Gao | Apr 2017 | A1 |
20170133023 | Disch | May 2017 | A1 |
Number | Date | Country |
---|---|---|
1114122 | Dec 1995 | CN |
1465137 | Dec 2003 | CN |
1467703 | Jan 2004 | CN |
1496559 | May 2004 | CN |
1503968 | Jun 2004 | CN |
1647154 | Jul 2005 | CN |
1659927 | Aug 2005 | CN |
1677491 | Oct 2005 | CN |
1677493 | Oct 2005 | CN |
1813286 | Aug 2006 | CN |
1864436 | Nov 2006 | CN |
1905373 | Jan 2007 | CN |
1918631 | Feb 2007 | CN |
1918632 | Feb 2007 | CN |
101006494 | Jul 2007 | CN |
101067931 | Nov 2007 | CN |
101083076 | Dec 2007 | CN |
101185124 | May 2008 | CN |
101185127 | May 2008 | CN |
101238510 | Aug 2008 | CN |
101325059 | Dec 2008 | CN |
101502122 | Aug 2009 | CN |
101521014 | Sep 2009 | CN |
101609680 | Dec 2009 | CN |
101622669 | Jan 2010 | CN |
101933086 | Dec 2010 | CN |
101939782 | Jan 2011 | CN |
101946526 | Jan 2011 | CN |
102089758 | Jun 2011 | CN |
103038819 | Apr 2013 | CN |
103165136 | Jun 2013 | CN |
103971699 | Aug 2014 | CN |
0751493 | Feb 1997 | EP |
1734511 | Dec 2006 | EP |
1446797 | May 2007 | EP |
2077551 | Mar 2011 | EP |
2830056 | Jan 2015 | EP |
2830059 | Jan 2015 | EP |
2830063 | Jan 2015 | EP |
H07336231 | Dec 1995 | JP |
2001053617 | Feb 2001 | JP |
200250967 | Feb 2002 | JP |
2002268693 | Sep 2002 | JP |
2003108197 | Apr 2003 | JP |
2003140692 | May 2003 | JP |
2004046179 | Feb 2004 | JP |
2006293400 | Oct 2006 | JP |
2006323037 | Nov 2006 | JP |
3898218 | Mar 2007 | JP |
3943127 | Jul 2007 | JP |
2007532934 | Nov 2007 | JP |
2009501358 | Jan 2009 | JP |
2010526346 | Jul 2010 | JP |
2010538318 | Dec 2010 | JP |
2011154384 | Aug 2011 | JP |
2011527447 | Oct 2011 | JP |
2012027498 | Feb 2012 | JP |
2012037582 | Feb 2012 | JP |
2013125187 | Jun 2013 | JP |
2013521538 | Jun 2013 | JP |
2013524281 | Jun 2013 | JP |
1020070118173 | Dec 2007 | KR |
20130025963 | Mar 2013 | KR |
2323469 | Apr 2008 | RU |
2325708 | May 2008 | RU |
2388068 | Apr 2010 | RU |
2422922 | Jun 2011 | RU |
2428747 | Sep 2011 | RU |
2459282 | Aug 2012 | RU |
2470385 | Dec 2012 | RU |
2477532 | Mar 2013 | RU |
2481650 | May 2013 | RU |
2482554 | May 2013 | RU |
2487427 | Jul 2013 | RU |
412719 | Nov 2000 | TW |
200537436 | Nov 2005 | TW |
200939206 | Sep 2009 | TW |
201007696 | Feb 2010 | TW |
201009812 | Mar 2010 | TW |
201034001 | Sep 2010 | TW |
201205558 | Feb 2012 | TW |
201316327 | Apr 2013 | TW |
201333933 | Aug 2013 | TW |
2005104094 | Nov 2005 | WO |
2005109240 | Nov 2005 | WO |
2006107840 | Oct 2006 | WO |
2006049204 | May 2008 | WO |
2008084427 | Jul 2008 | WO |
2010070770 | Jun 2010 | WO |
2010114123 | Oct 2010 | WO |
2010136459 | Dec 2010 | WO |
2011047887 | Apr 2011 | WO |
2011110499 | Sep 2011 | WO |
2012012414 | Jan 2012 | WO |
2012110482 | Aug 2012 | WO |
2013061530 | May 2013 | WO |
2013147666 | Oct 2013 | WO |
2013147668 | Oct 2013 | WO |
2015010949 | Jan 2015 | WO |
2013035257 | Mar 2015 | WO |
Entry |
---|
“Information technology—MPEG audio technologies—Part 3: Unified speech and audio coding”, ISO/IEC FDIS 23003-3:2011(E); ISO/IEC JTC 1/SC 29/WG 11; STD Version 2.1c2, Sep. 20, 2011, 291 pages. |
Brinker, A. et al., “An overview of the coding standard MPEG-4 audio amendments 1 and 2: HE-AAC, SSC, and HE-AAC v2”, EURASIP Journal on Audio, Speech, and Music Processing, 2009, Feb. 24, 2009, 24 pages. |
Annadana, R et al., “New Results in Low Bit Rate Speech Coding and Bandwidth Extension”, Audio Engineering Society Convention 121, Audio Engineering Society Convention Paper 6876, Oct. 5-8, 2006, pp. 1-6. |
Bosi, M et al., “ISO/IEC MPEG-2 Advanced Audio Coding”, J. Audio Eng. Soc., vol. 45, No. 10, Oct. 1997, pp. 789-814. |
Daudet, L et al., “MDCT analysis of sinusoids: exact results and applications to coding artifacts reduction”, IEEE Transactions on Speech and Audio Processing, IEEE, vol. 12, No. 3, May 2004, pp. 302-312. |
Dietz, M et al., “Spectral Band Replication, a Novel Approach in Audio Coding”, Audio Engineering Society Convention 112, Audio Engineering Society Paper 5553, May 10-13, 2002, pp. 1-8. |
Ekstrand, P., “Bandwidth Extension of Audio Signals by Spectral Band Replication”, Proc. 1st IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), Nov. 15, 2002, pp. 53-58. |
Ferreira, A.J.S et al., “Accurate Spectral Replacement”, Audio Engineering Society Convention, 118, Audio Engineering Society Convention Paper No. 6383, May 28-31, 2005, pp. 1-11. |
Geiser, B et al., “Bandwidth Extension for Hierarchical Speech and Audio Coding in ITU-T Rec. G.729.1”, IEEE Transactions on Audio, Speech and Language Processing, IEEE Service Center, vol. 15, No. 8, Nov. 2007, pp. 2496-2509. |
Herre, J et al., “Extending the MPEG-4 AAC Codec by Perceptual Noise Substitution”, Audio Engineering Society Convention 104, Audio Engineering Society Preprint, May 16-19, 1998, pp. 1-14. |
Herre, J., “Temporal Noise Shaping, Quantization and Coding Methods in Perceptual Audio Coding: A Tutorial Introduction”, Audio Engineering Society Conference: 17th International Conference: High-Quality Audio Coding, Audio Engineering Society, Aug. 1, 1999, pp. 312-325. |
ISO/IEC 13818-3:1998(E), “Information Technology—Generic Coding of Moving Pictures and Associated Audio, Part 3: Audio”, Second Edition, ISO/IEC, Apr. 15, 1998, 132 pages. |
ISO/IEC 14496-3:2001, “Information Technology—Coding of audio-visual objects—Part 3: Audio, Amendment 1: Bandwidth Extension”, ISO/IEC JTC1/SC29/WG11/N5570, ISO/IEC 14496-3:2001/FDAM 1:2003(E), Mar. 2003, 127 pages. |
ISO/IEC FDIS 23003-3:2011(E), “Information Technology—MPEG audio technologies—Part 3: Unified speech and audio coding, Final Draft”, ISO/IEC, 2010, 286 pages. |
McAulay, R et al., “Speech Analysis/ Synthesis Based on a Sinusoidal Representation”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-34, No. 4, Aug. 1986, pp. 744-754. |
Mehrotra, Sanjeev et al., “Hybrid low bitrate audio coding using adaptive gain shape vector quantization”, Multimedia Signal Processing, 2008 IEEE 10th Workshop on, IEEE, Piscataway, NJ, USA, XP031356759 ISBN: 978-1-4344-3394-4, Oct. 8, 2008, pp. 927-932. |
Nagel, F et al., “A Continuous Modulated Single Sideband Bandwidth Extension”, ICASSP International Conference on Acoustics, Speech and Signal Processing, Apr. 2010, pp. 357-360. |
Nagel, F et al., “A Harmonic Bandwidth Extension Method for Audio Codecs”, International Conference on Acoustics, Speech and Signal Processing, XP002527507, Apr. 19, 2009, pp. 145-148. |
Neuendorf, M et al., “MPEG Unified Speech and Audio Coding—The ISO/MPEG Standard for High-Efficiency Audio Coding of all Content Types”, Audio Engineering Society Convention Paper 8654, Presented at the 132nd Convention, Apr. 26-29, 2012, pp. 1-22. |
Purnhagen, H et al., “HILN—the MPEG-4 parametric audio coding tools”, Proceedings ISCAS 2000 Geneva, The 2000 IEEE International Symposium on Circuits and Systems, May 28-31, 2000, pp. 201-204. |
Sinha, D. et al., “A Novel Integrated Audio Bandwidth Extension Toolkit (ABET)”, Audio Engineering Society Convention, Paris, France, May 2006. |
Smith, J.O. et al., “PARSHL: An analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation”, Proceedings of the International Computer Music Conference, 1987. |
Zernicki, T et al., “Audio bandwidth extension by frequency scaling of sinusoidal partials”, Audio Engineering Society Convention, San Francisco, USA, Oct. 2-5, 2008. |
Number | Date | Country |
---|---|---|
20160140980 A1 | May 2016 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2014/065118 | Jul 2014 | US |
Child | 15002350 | | US |