The present invention relates to audio signal encoding and decoding and, in particular, to audio signal processing using parallel frequency domain and time domain encoder/decoder processors.
The perceptual coding of audio signals for the purpose of data reduction for efficient storage or transmission of these signals is a widely used practice. In particular when lowest bit rates are to be achieved, the employed coding leads to a reduction of audio quality that often is primarily caused by a limitation at the encoder side of the audio signal bandwidth to be transmitted. Here, typically the audio signal is low-pass filtered such that no spectral waveform content remains above a certain pre-determined cut-off frequency.
In contemporary codecs well-known methods exist for the decoder-side signal restoration through audio signal Bandwidth Extension (BWE), e.g. Spectral Band Replication (SBR) that operates in frequency domain or so-called Time Domain Bandwidth Extension (TD-BWE) being is a post-processor in speech coders that operates in time domain.
Additionally, several combined time domain/frequency domain coding concepts exist such as concepts known under the term AMR-WB+ or USAC.
All these combined time domain/coding concepts have in common that the frequency domain coder relies on bandwidth extension technologies which incur a band limitation into the input audio signal and the portion above a cross-over frequency or border frequency is encoded with a low resolution coding concept and synthesized on the decoder-side. Hence, such concepts mainly rely on a pre-processor technology on the encoder side and a corresponding post-processing functionality on the decoder-side.
Typically, the time domain encoder is selected for useful signals to be encoded in the time domain such as speech signals and the frequency domain encoder is selected for non-speech signals, music signals, etc. However, specifically for non-speech signals having prominent harmonics in the high frequency band, the known frequency domain encoders have a reduced accuracy and, therefore, a reduced audio quality due to the fact that such prominent harmonics can only be separately parametrically encoded or are eliminated at all in the encoding/decoding process.
Furthermore, concepts exist in which the time domain encoding/decoding branch additionally relies on the bandwidth extension which also parametrically encodes an upper frequency range while a lower frequency range is typically encoded using an ACELP or any other CELP related coder, for example a speech coder. This bandwidth extension functionality increases the bitrate efficiency but, on the other hand, introduces further inflexibility due to the fact that both encoding branches, i.e., the frequency domain encoding branch and the time domain encoding branch are band limited due to the bandwidth extension procedure or spectral band replication procedure operating above a certain crossover frequency substantially lower than the maximum frequency included in the input audio signal.
Relevant topics in the state-of-art comprise
The following papers and patents describe methods that are considered to constitute conventional technology for the application:
[1] M. Dietz, L. Liljeryd, K. Kjorling and O. Kunz, “Spectral Band Replication, a novel approach in audio coding,” in 112th AES Convention, Munich, Germany, 2002.
[2] S. Meltzer, R. Bohm and F. Henn, “SBR enhanced audio codecs for digital broadcasting such as “Digital Radio Mondiale” (DRM),” in 112th AES Convention, Munich, Germany, 2002.
[3] T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm,” in 112th AES Convention, Munich, Germany, 2002.
[4] MPEG-D USAC Standard.
[5] PCT/EP2014/065109.
In MPEG-D USAC, a switchable core coder is described. However, in USAC, the band-limited core is restricted to transmit a low-pass filtered signal. Therefore, certain music signals that contain prominent high frequency content e.g. full-band sweeps, triangle sounds, etc. cannot be reproduced faithfully.
According to an embodiment, an audio encoder for encoding an audio signal may have: a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor has a time frequency converter for converting the first audio signal portion into a frequency domain representation including spectral lines up to a maximum frequency of the first audio signal portion; a spectral encoder for encoding the frequency domain representation; a second encoding processor for encoding a second different audio signal portion in the time domain; a cross-processor for calculating, from the encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, so that the second encoding processing is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal; a controller configured for analyzing the audio signal and for determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal including a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion.
According to another embodiment, an audio decoder for decoding an encoded audio signal may have: a first decoding processor for decoding a first encoded audio signal portion in a frequency domain, wherein the first decoding processor has: a frequency-time converter for converting a decoded spectral representation into a time domain to acquire a decoded first audio signal portion; a second decoding processor for decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion; a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the encoded second audio signal portion following in time the first audio signal portion in the encoded audio signal; and a combiner for combining the decoded first spectral portion and the decoded second spectral portion to acquire a decoded audio signal.
According to another embodiment, a method of encoding an audio signal may have the steps of: encoding a first audio signal portion in a frequency domain, including: converting the first audio signal portion into a frequency domain representation including spectral lines up to a maximum frequency of the first audio signal portion; encoding the frequency domain representation; encoding a second different audio signal portion in the time domain; calculating, from the encoded spectral representation of the first audio signal portion, initialization data for the step of encoding the second different audio signal portion, so that the step of encoding the second different audio signal portion is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming an encoded audio signal including a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion.
According to another embodiment, a method of decoding an encoded audio signal may have the steps of: decoding a first encoded audio signal portion in a frequency domain, the first decoding processor including: converting a decoded spectral representation into a time domain to acquire a decoded first audio signal portion; decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion; calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the step of decoding the second encoded audio signal portion, so that the step of decoding the second encoded audio signal portion is initialized to decode the encoded second audio signal portion following in time the first audio signal portion in the encoded audio signal; and combining the decoded first spectral portion and the decoded second spectral portion to acquire a decoded audio signal.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method of encoding an audio signal, having the steps of: encoding a first audio signal portion in a frequency domain, including: converting the first audio signal portion into a frequency domain representation including spectral lines up to a maximum frequency of the first audio signal portion; encoding the frequency domain representation; encoding a second different audio signal portion in the time domain; calculating, from the encoded spectral representation of the first audio signal portion, initialization data for the step of encoding the second different audio signal portion, so that the step of encoding the second different audio signal portion is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming an encoded audio signal including a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion, when said computer program is run by a computer.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method of decoding an encoded audio signal, including: decoding a first encoded audio signal portion in a frequency domain, the first decoding processor including: converting a decoded spectral representation into a time domain to acquire a decoded first audio signal portion; decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion; calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the step of decoding the second encoded audio signal portion, so that the step of decoding the second encoded audio signal portion is initialized to decode the encoded second audio signal portion following in time the first audio signal portion in the encoded audio signal; and combining the decoded first spectral portion and the decoded second spectral portion to acquire a decoded audio signal, when said computer program is run by a computer.
The present invention is based on the finding that a time domain encoding/decoding processor can be combined with a frequency domain encoding/decoding processor having a gap filling functionality but this gap filling functionality for filling spectral holes is operated over the whole band of the audio signal or at least above a certain gap filling frequency. Importantly, the frequency domain encoding/decoding processor is particularly in the position to perform accurate or wave form or spectral value encoding/decoding up to the maximum frequency and not only until a crossover frequency. Furthermore, the full-band capability of the frequency domain encoder for encoding with the high resolution allows an integration of the gap filling functionality into the frequency domain encoder.
In one aspect, full band gap filling is combined with a time-domain encoding/decoding processor. In embodiments, the sampling rates in both branches are equal or the sampling rate in the time-domain encoder branch is lower than in the frequency domain branch.
In another aspect, a frequency domain encoder/decoder operating without gap filling but performing a full band core encoding/decoding is combined with a time-domain encoding processor and a cross processor is provided for continuous initialization of the time-domain encoding/decoding processor. In this aspect, the sampling rates can be as in the other aspect, or the sampling rates in the frequency domain branch are even lower than in the time-domain branch.
Hence, in accordance with the present invention by using the full-band spectral encoder/decoder processor, the problems related to the separation of the bandwidth extension on the one hand and the core coding on the other hand can be addressed and overcome by performing the bandwidth extension in the same spectral domain in which the core decoder operates. Therefore, a full rate core decoder is provided which encodes and decodes the full audio signal range. This does not require the need for a downsampler on the encoder side and an upsampler on the decoder side. Instead, the whole processing is performed in the full sampling rate or full-bandwidth domain. In order to obtain a high coding gain, the audio signal is analyzed in order to find a first set of first spectral portions which has to be encoded with a high resolution, where this first set of first spectral portions may include, in an embodiment, tonal portions of the audio signal. On the other hand, non-tonal or noisy components in the audio signal constituting a second set of second spectral portions are parametrically encoded with low spectral resolution. The encoded audio signal then only involves the first set of first spectral portions encoded in a waveform-preserving manner with a high spectral resolution and, additionally, the second set of second spectral portions encoded parametrically with a low resolution using frequency “tiles” sourced from the first set. On the decoder side, the core decoder, which is a full-band decoder, reconstructs the first set of first spectral portions in a waveform-preserving manner, i.e., without any knowledge that there is any additional frequency regeneration. However, the so generated spectrum has a lot of spectral gaps. These gaps are subsequently filled with the Intelligent Gap Filling (IGF) technology by using a frequency regeneration applying parametric data on the one hand and using a source spectral range, i.e., first spectral portions reconstructed by the full rate audio decoder on the other hand.
In further embodiments, spectral portions, which are reconstructed by noise filling only rather than bandwidth replication or frequency tile filling, constitute a third set of third spectral portions. Due to the fact that the coding concept operates in a single domain for the core coding/decoding on the one hand and the frequency regeneration on the other hand, the IGF is not only restricted to fill up a higher frequency range but can fill up lower frequency ranges, either by noise filling without frequency regeneration or by frequency regeneration using a frequency tile at a different frequency range.
Furthermore, it is emphasized that an information on spectral energies, an information on individual energies or an individual energy information, an information on a survive energy or a survive energy information, an information a tile energy or a tile energy information, or an information on a missing energy or a missing energy information may comprise not only an energy value, but also an (e.g. absolute) amplitude value, a level value or any other value, from which a final energy value can be derived. Hence, the information on an energy may e.g. comprise the energy value itself, and/or a value of a level and/or of an amplitude and/or of an absolute amplitude.
A further aspect is based on the finding that the correlation situation is not only important for the source range but is also important for the target range. Furthermore, the present invention acknowledges the situation that different correlation situations can occur in the source range and the target range. When, for example, a speech signal with high frequency noise is considered, the situation can be that the low frequency band comprising the speech signal with a small number of overtones is highly correlated in the left channel and the right channel, when the speaker is placed in the middle. The high frequency portion, however, can be strongly uncorrelated due to the fact that there might be a different high frequency noise on the left side compared to another high frequency noise or no high frequency noise on the right side. Thus, when a straightforward gap filling operation would be performed that ignores this situation, then the high frequency portion would be correlated as well, and this might generate serious spatial segregation artifacts in the reconstructed signal. In order to address this issue, parametric data for a reconstruction band or, generally, for the second set of second spectral portions which have to be reconstructed using a first set of first spectral portions is calculated to identify either a first or a second different two-channel representation for the second spectral portion or, stated differently, for the reconstruction band. On the encoder side, a two-channel identification is, therefore calculated for the second spectral portions, i.e., for the portions, for which, additionally, energy information for reconstruction bands is calculated. A frequency regenerator on the decoder side then regenerates a second spectral portion depending on a first portion of the first set of first spectral portions, i.e., the source range and parametric data for the second portion such as spectral envelope energy information or any other spectral envelope data and, additionally, dependent on the two-channel identification for the second portion, i.e., for this reconstruction band under reconsideration.
The two-channel identification is advantageously transmitted as a flag for each reconstruction band and this data is transmitted from an encoder to a decoder and the decoder then decodes the core signal as indicated by advantageously calculated flags for the core bands. Then, in an implementation, the core signal is stored in both stereo representations (e.g. left/right and mid/side) and, for the IGF frequency tile filling, the source tile representation is chosen to fit the target tile representation as indicated by the two-channel identification flags for the intelligent gap filling or reconstruction bands, i.e., for the target range.
It is emphasized that this procedure not only works for stereo signals, i.e., for a left channel and the right channel but also operates for multi-channel signals. In the case of multi-channel signals, several pairs of different channels can be processed in that way such as a left and a right channel as a first pair, a left surround channel and a right surround as the second pair and a center channel and an LFE channel as the third pair. Other pairings can be determined for higher output channel formats such as 7.1, 11.1 and so on.
A further aspect is based on the finding that the audio quality of the reconstructed signal can be improved through IGF since the whole spectrum is accessible to the core encoder so that, for example, perceptually important tonal portions in a high spectral range can still be encoded by the core coder rather than parametric substitution. Additionally, a gap filling operation using frequency tiles from a first set of first spectral portions which is, for example, a set of tonal portions typically from a lower frequency range, but also from a higher frequency range if available, is performed. For the spectral envelope adjustment on the decoder side, however, the spectral portions from the first set of spectral portions located in the reconstruction band are not further post-processed by e.g. the spectral envelope adjustment. Only the remaining spectral values in the reconstruction band which do not originate from the core decoder are to be envelope adjusted using envelope information. Advantageously, the envelope information is a full-band envelope information accounting for the energy of the first set of first spectral portions in the reconstruction band and the second set of second spectral portions in the same reconstruction band, where the latter spectral values in the second set of second spectral portions are indicated to be zero and are, therefore, not encoded by the core encoder, but are parametrically coded with low resolution energy information.
It has been found that absolute energy values, either normalized with respect to the bandwidth of the corresponding band or not normalized, are useful and very efficient in an application on the decoder side. This especially applies when gain factors have to be calculated based on a residual energy in the reconstruction band, the missing energy in the reconstruction band and frequency tile information in the reconstruction band.
Furthermore, it is advantageous that the encoded bitstream not only covers energy information for the reconstruction bands but, additionally, scale factors for scale factor bands extending up to the maximum frequency. This ensures that for each reconstruction band, for which a certain tonal portion, i.e., a first spectral portion is available, this first set of first spectral portion can actually be decoded with the right amplitude. Furthermore, in addition to the scale factor for each reconstruction band, an energy for this reconstruction band is generated in an encoder and transmitted to a decoder. Furthermore, it is advantageous that the reconstruction bands coincide with the scale factor bands or in case of energy grouping, at least the borders of a reconstruction band coincide with borders of scale factor bands.
A further implementation of this invention applies a tile whitening operation. Whitening of a spectrum removes the coarse spectral envelope information and emphasizes the spectral fine structure which is of foremost interest for evaluating tile similarity. Therefore, a frequency tile on the one hand and/or the source signal on the other hand are whitened before calculating a cross correlation measure. When only the tile is whitened using a predefined procedure, a whitening flag is transmitted indicating to the decoder that the same predefined whitening process shall be applied to the frequency tile within IGF.
Regarding the tile selection, it is advantageous to use the lag of the correlation to spectrally shift the regenerated spectrum by an integer number of transform bins. Depending on the underlying transform, the spectral shifting may involve addition corrections. In case of odd lags, the tile is additionally modulated through multiplication by an alternating temporal sequence of −1/1 to compensate for the frequency-reversed representation of every other band within the MDCT. Furthermore, the sign of the correlation result is applied when generating the frequency tile.
Furthermore, it is advantageous to use tile pruning and stabilization in order to make sure that artifacts created by fast changing source regions for the same reconstruction region or target region are avoided. To this end, a similarity analysis among the different identified source regions is performed and when a source tile is similar to other source tiles with a similarity above a threshold, then this source tile can be dropped from the set of potential source tiles since it is highly correlated with other source tiles. Furthermore, as a kind of tile selection stabilization, it is advantageous to keep the tile order from the previous frame if none of the source tiles in the current frame correlate (better than a given threshold) with the target tiles in the current frame.
A further aspect is based on the finding that an improved quality and reduced bitrate specifically for signals comprising transient portions as they occur very often in audio signals is obtained by combining the Temporal Noise Shaping (TNS) or Temporal Tile Shaping (TTS) technology with high frequency reconstruction. The TNS/TTS processing on the encoder-side being implemented by a prediction over frequency reconstructs the time envelope of the audio signal. Depending on the implementation, i.e., when the temporal noise shaping filter is determined within a frequency range not only covering the source frequency range but also the target frequency range to be reconstructed in a frequency regeneration decoder, the temporal envelope is not only applied to the core audio signal up to a gap filling start frequency, but the temporal envelope is also applied to the spectral ranges of reconstructed second spectral portions. Thus, pre-echoes or post-echoes that would occur without temporal tile shaping are reduced or eliminated. This is accomplished by applying an inverse prediction over frequency not only within the core frequency range up to a certain gap filling start frequency but also within a frequency range above the core frequency range. To this end, the frequency regeneration or frequency tile generation is performed on the decoder-side before applying a prediction over frequency. However, the prediction over frequency can either be applied before or subsequent to spectral envelope shaping depending on whether the energy information calculation has been performed on the spectral residual values subsequent to filtering or to the (full) spectral values before envelope shaping.
The TTS processing over one or more frequency tiles additionally establishes a continuity of correlation between the source range and the reconstruction range or in two adjacent reconstruction ranges or frequency tiles.
In an implementation, it is advantageous to use complex TNS/TTS filtering. Thereby, the (temporal) aliasing artifacts of a critically sampled real representation, like MDCT, are avoided. A complex TNS filter can be calculated on the encoder-side by applying not only a modified discrete cosine transform but also a modified discrete sine transform in addition to obtain a complex modified transform. Nevertheless, only the modified discrete cosine transform values, i.e., the real part of the complex transform is transmitted. On the decoder-side, however, it is possible to estimate the imaginary part of the transform using MDCT spectra of preceding or subsequent frames so that, on the decoder-side, the complex filter can be again applied in the inverse prediction over frequency and, specifically, the prediction over the border between the source range and the reconstruction range and also over the border between frequency-adjacent frequency tiles within the reconstruction range.
The inventive audio coding system efficiently codes arbitrary audio signals at a wide range of bitrates. Whereas, for high bitrates, the inventive system converges to transparency, for low bitrates perceptual annoyance is minimized. Therefore, the main share of available bitrate is used to waveform code just the perceptually most relevant structure of the signal in the encoder, and the resulting spectral gaps are filled in the decoder with signal content that roughly approximates the original spectrum. A very limited bit budget is consumed to control the parameter driven so-called spectral Intelligent Gap Filling (IGF) by dedicated side information transmitted from the encoder to the decoder.
In further embodiments, the time domain encoding/decoding processor relies on a lower sampling rate and the corresponding bandwidth extension functionality.
In further embodiments, a cross-processor is provided in order to initialize the time domain encoder/decoder with initialization data derived from the currently processed frequency domain encoder/decoder signal This allows that when the currently processed audio signal portion is processed by the frequency domain encoder, the parallel time domain encoder is initialized so that when a switch from the frequency domain encoder to a time domain encoder takes place, this time domain encoder can immediately start processing since all the initialization data relating to earlier signals are already there due to the cross-processor. This cross-processor is advantageously applied on the encoder-side and, additionally, on the decoder-side and advantageously uses a frequency-time transform which additionally performs a very efficient downsampling from the higher output or input sampling rate into the lower time domain core coder sampling rate by only selecting a certain low band portion of the domain signal together with a certain reduced transform size. Thus, a sample rate conversion from the high sampling rate to the low sampling rate is very efficiently performed and this signal obtained by the transform with the reduced transform size can then be used for initializing the time domain encoder/decoder so that the time domain encoder/decoder is ready to immediately perform time domain encoding when this situation is signaled by a controller and the immediately preceding audio signal portion was encoded in the frequency domain.
As outlined, the cross-processor embodiment may rely on gap filling in the frequency domain or not. Hence, a time- and frequency domain encoder/decoder are combined via the cross-processor, and the frequency domain encoder/decoder may rely on gap filling or not. Specifically, certain embodiments as outlined are advantageous:
These embodiments employ gap filling in the frequency domain and have the following sampling rate figures and may or may not rely on the cross-processor technology:
Input SR=8 kHz, ACELP (time domain) SR=12.8 kHz.
Input SR=16 kHz, ACELP SR=12.8 kHz.
Input SR=16 kHz, ACELP SR=16.0 kHz
Input SR=32.0 kHz, ACELP SR=16.0 kHzl
Input SR=48 kHz, ACELP SR=16 kHz
These embodiments may or may not employ gap filling in the frequency domain and have the following sampling rate figures and rely on the cross-processor technology:
TCX SR is lower than the ACELP SR (8 kHz vs. 12.8 kHz), or where TCX and ACELP run both at 16.0 kHz, and where any gap filling is not used.
Hence, advantageous embodiments of the present invention allow a seamless switching of a perceptual audio coder comprising spectral gap filling and a time domain encoder with or without bandwidth extension.
Hence, the present invention relies on methods that are not restricted to removing the high frequency content above a cut-off frequency in the frequency domain encoder from the audio signal but rather signal-adaptively removes spectral band-pass regions leaving spectral gaps in the encoder and subsequently reconstructs these spectral gaps in the decoder. Advantageously, an integrated solution such as intelligent gap filling is used that efficiently combines full-bandwidth audio coding and spectral gap filling particularly in the MDCT transform domain.
Hence, the present invention provides an improved concept for combining speech coding and a subsequent time domain bandwidth extension with a full-band wave form decoding comprising spectral gap filling into a switchable perceptual encoder/decoder.
Hence, in contrast to already existing methods, the new concept utilizes full-band audio signal wave form coding in the transform domain coder and at the same time allows a seamless switching to a speech coder advantageously followed by a time domain bandwidth extension.
Further embodiments of the present invention avoid the explained problems that occur due to a fixed band limitation. The concept enables the switchable combination of a full-band wave form coder in the frequency domain equipped with a spectral gap filling and a lower sampling rate speech coder and a time domain bandwidth extension. Such a coder is capable of wave form coding the aforementioned problematic signals providing full audio bandwidth up to the Nyquist frequency of the audio input signal. Nevertheless, seamless instant switching between both coding strategies is guaranteed particularly by the embodiments having the cross-processor. For this seamless switching, the cross-processor represents a cross connection at both encoder and decoder between the full-band capable full-rate (input sampling rate) frequency domain encoder and the low-rate ACELP coder having a lower sampling rate to properly initialize the ACELP parameters and buffers particularly within the adaptive codebook, the LPC filter or the resampling stage, when switching from the frequency domain coder such as TCX to the time domain encoder such as ACELP.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
The audio encoder of
Hence, the controller 620 makes sure that for a single audio signal portion only a time domain representation or a frequency domain representation is in the encoded signal. This can be accomplished by the controller 620 in several ways. One way would be that, for one and the same audio signal portion, both representations arrive at block 630 and the controller 620 controls the encoded signal former 630 to only introduce one of both representations into the encoded signal. Alternatively, however, the controller 620 can control an input into the first encoding processor and an input into the second encoding processor so that, based on the analysis of the corresponding signal portion, only one of both blocks 600 or 610 is activated to actually perform the full encoding operation and the other block is deactivated.
This deactivation can be a deactivation or, as illustrated with respect to, for example,
In the further specific implementation of the second encoding processor operating in the time domain, the second encoding processor comprises a downsampler 900 or sampling rate converter for converting the audio signal portion into a representation with a lower sampling rate, wherein the lower sampling rate is lower than a sampling rate at the input into the first encoding processor. This is illustrated in
In a further embodiment of the present invention the audio encoder additionally comprises, although not illustrated in
Particularly, the preprocessor comprises a transient detector 1020, and the first branch is “opened” by a resampler 1021 to e.g. 12.8 kHz, followed by a preemphasis stage 1005a, an LPC analyzer 1002a, a weighted analysis filtering stage 1022a, and an FFT/Noise estimator/Voice Activity Detection (VAD) or Pitch Search stage 1007.
The second branch is “opened” by a resampler 1004 to e.g. 12.8 kHz or 16 kHz, i.e., to the ACELP Sampling Rate, followed by a preemphasis stage 1005b, an LPC analyzer 1002b, a weighted analysis filtering stage 1022b, and a TCX LTP parameter extraction stage 1024. Block 1024 provides its output to the bitstream multiplexor. Block 1002 is connected to an LPC quantizer 1010 controlled by the ACELP/TCX decision, and the block 1010 is also connected to the bitstream multiplexor.
Other embodiments can alternatively comprise only a single branch or more branches. In an embodiment, this preprocessor comprises a prediction analyzer for determining prediction coefficients. This prediction analyzer can be implemented as an LPC (linear prediction coding) analyzer for determining LPC coefficients. However, other analyzers can be implemented as well. Furthermore, the preprocessor in the alternative embodiment may comprise a prediction coefficient quantizer, wherein this device receives prediction coefficient data from the prediction analyzer.
Advantageously, however, the LPC quantizer is not necessarily part of the preprocessor, and it is implemented as part of the main encoding routine, i.e. not part of the preprocessor.
Furthermore, the preprocessor may additionally comprise an entropy coder for generating an encoded version of the quantized prediction coefficients. It is important to note that the encoded signal former 630 or the specific implementation, i.e., the bit stream multiplexer 630 makes sure that the encoded version of the quantized prediction coefficients is included into the encoded audio signal 632. Advantageously, the LPC coefficients are not directly quantized but are converted into an ISF representation, for example, or any other representation better suited for quantization. This conversion is advantageously performed either by the determine LPC coefficients block or is performed within the block for quantizing the LPC coefficients.
Furthermore, the preprocessor may comprise a resampler for resampling an audio input signal at an input sampling rate into a lower sampling rate for the time domain encoder. When the time domain encoder is an ACELP encoder having a certain ACELP sampling rate then the down sampling is performed to advantageously either 12.8 kHz or 16 kHz. The input sampling rate can be any of a particular number of sampling rates such as 32 kHz or an even higher sampling rate. On the other hand, the sampling rate of the time domain encoder will be predetermined by certain restrictions and the resampler 1004 performs this resampling and outputs the lower sampling rate representation of the input signal. Hence, the resampler can perform a similar functionality and can even be one and the same element as the downsampler 900 illustrated in the context of
Furthermore, it is advantageous to apply a pre-emphasis in the pre-emphasis block. The pre-emphasis processing is well-known in the art of time domain encoding and is described in literature referring to the AMR-WB+ processing and the pre-emphasis is particularly configured for compensating for a spectral tilt and, therefore, allows a better calculation of LPC parameters at a given LPC order.
Furthermore, the preprocessor may additionally comprise a TCX-LTP parameter extraction for controlling an LTP post filter illustrated at 1420 in
As illustrated, the result of block 1024 is input into the encoded signal, i.e., is in the embodiment of
Hence, to summarize, common to both paths is a preprocessing operation 1000 in which commonly used signal processing operations are performed. These comprise a resampling to an ACELP sampling rate (12.8 or 16 kHz) for one parallel path and this resampling is performed. Furthermore, a TCX LTP parameter extraction illustrated at block 1006 is performed and, additionally, a pre-emphasis and a determination of LPC coefficients is performed. As outlined, the pre-emphasis compensates for the spectral tilt and, therefore, makes the calculation of LPC parameters at a given LPC order more efficient.
Subsequently, reference is made to
Based on this audio signal portion, the controller 620 addresses a frequency domain encoder simulator 621 and a time domain encoder simulator 622 in order to calculate for each encoder possibility an estimated signal to noise ratio. Subsequently, the selector 623 selects the encoder which has provided the better signal to noise ratio, naturally under the consideration of a predefined bit rate. The selector then identifies the corresponding encoder via the control output. When it is determined that the audio signal portion under consideration is to be encoded using the frequency domain encoder, the time domain encoder is set into an initialization state or in other embodiments not requiring a very instant switching in a completely deactivated state. However, when it is determined that the audio signal portion under consideration is to be encoded by the time domain encoder, the frequency domain encoder is then deactivated.
Subsequently, an advantageous implementation of the controller illustrated in
In case the TCX branch is chosen, a TCX decoder is run in each frame which outputs a signal at the ACELP sampling rate. This is used to update the memories used for the ACELP encoding path (LPC residual, Mem w0, Memory deemphasis), to enable instant switching from TCX to ACELP. The memory update is performed in each TCX path.
Alternatively, a full analysis by synthesis process can performed, i.e., both encoder simulators 621, 622 implement the actual encoding operations and the results are compared by the selector 623. Alternatively, again, a complete feed forward calculation can be done by performing a signal analysis. For example, when it is determined that the signal is a speech signal by a signal classifier the time domain encoder is selected and when it is determined that the signal is a music signal then the frequency domain encoder is selected. Other procedures in order to distinguish between both encoders based on a signal analysis of the audio signal portion under consideration can also be applied.
Advantageously, the audio encoder additionally comprises a cross-processor 700 illustrated in
Hence, the time domain encoder 610 is configured to be initialized by the initialization data in order to encode an audio signal portion following an earlier audio signal portion encoded by the frequency domain encoder 600 in an efficient manner.
In particular, the cross-processor comprises a frequency-time converter for converting a frequency domain representation into a time domain representation which can be forwarded to the time domain encoder directly or after some further processing. This converter is illustrated in
In other embodiments, such as narrow-band operating modes with 8 kHz input sampling rate, the TCX branch operates at 8 kHz, whereas ACELP still runs at 12.8 kHz. I.e. the ACELP SR is not always lower than the TCX sampling rate. For 16 kHz input sampling rate (wideband), there are also scenarios where ACELP runs at the same sampling rate as TCX, i.e. both at 16 kHz. In a super wideband mode (SWB) the input sampling rate is at 32 or 48 kHz.
The ratio of the time domain coder sampling rate or ACELP sampling rate and the frequency domain coder sampling rate or input sampling rate can be calculated and is a downsampling factor DS illustrated in
For a downsampling factor greater than one, i.e., for an actual downsampling, the block 602 has a large transform size and the IMDCT block 702 has a small transform size. As illustrated in 7B, the IMDCT block 702 therefore comprises a selector 726 for selecting the lower spectral portion of an input into the IMDCT block 702. The portion of the full-band spectrum is defined by the downsampling factor DS. For example, when the lower sampling rate is 16 kHz and the input sampling rate is 32 kHz then the downsampling factor is 2.0 and, therefore, the selector 726 selects the lower half of the full-band spectrum. When the spectrum has, for example, 1024 MDCT lines then the selector selects the lower 512 MDCT lines.
This low frequency portion of the full-band spectrum is input into a small size transform and foldout block 720, as illustrated in
Thus, a very efficient downsampling operation can be applied since the downsampling is included in the IMDCT implementation. In this context, it is emphasized that the block 702 can be implemented by an IMDCT but can also be implemented by any other transform or filterbank implementation which can be suitably sized in the actual transform kernel and other transform related operations.
For a downsampling factor lower than one, i.e., for an actual upsampling, the notation in
The block 602 has a small transform size and the IMDCT block 702 has a large transform size. As illustrated in
This frequency portion of the full-band spectrum is input into a then large size transform and foldout block 720, as illustrated in
Thus, a very efficient upsampling operation can be applied since the upsampling is included in the IMDCT implementation. In this context, it is emphasized that the block 702 can be implemented by an IMDCT but can also be implemented by any other transform or filterbank implementation which can be suitably sized in the actual transform kernel and other transform related operations.
Generally, it is outlined that a definition of a sample rate in the frequency domain needs some explanation. Spectral bands are often downsampled. Hence, the notion of an effective sampling rate or an “associated” sample or sampling rate is used. In case of a filterbank/transform the effective sample rate would be defined as Fs_eff=subbandsamplerate*num_subbands
In a further embodiment illustrated in
Furthermore, the frequency domain encoder advantageously comprises a noise shaping block 606a. The noise shaping block 606a is controlled by quantized LPC coefficients as generated by block 1010. The quantized LPC coefficients used for noise shaping 606a perform a spectral shaping of the high resolution spectral values or spectral lines directly encoded (rather than parametrically encoded) and the result of block 606a is similar to the spectrum of a signal subsequent to an LPC filtering stage operating in the time domain such as an LPC analysis filtering block 704 to be described later on. Furthermore, the result of the noise shaping block 606a is then quantized and entropy coded as indicated by block 606b. The result of block 606b corresponds to the encoded first audio signal portion or a frequency domain coded audio signal portion (together with other side information).
The cross-processor 700 comprises a spectral decoder for calculating a decoded version of the first encoded signal portion. In the embodiment of
Furthermore, the cross-processor 700 may comprise in addition or alternatively a weighted prediction coefficient analysis filtering stage 708 for filtering the decoded version and for feeding a filtered decoded version to a codebook determinator 613 indicated as “MMSE” in
The time domain encoder processor 610 comprises, as illustrated in
Furthermore, an ACELP gains/coding stage 615 is provided in series to the innovative codebook stage 614 and the result of this block is input into a codebook determinator 613 indicated as MMSE in
As illustrated, several blocks of the time domain decoder depend on previous signals and these blocks are the adaptive codebook block 612, the codebook determinator 613, the LPC synthesis filtering block 616 and the de-emphasis block 617. These blocks are provided with data from the cross-processor derived from the frequency domain encoding processor data in order to initialize these blocks for the purpose of being ready for an instant switch from the frequency domain encoder to the time domain encoder. As can also be seen from
The advantageous audio decoder in
For ACELP initialization when switching from TCX to ACELP, a cross path (consisting of a shared TCX decoder frontend but additionally providing output at the lower sampling rate and some post-processing) exists that performs the inventive ACELP initialization. Sharing the same sampling rate and filter order between TCX and ACELP in the LPCs allows for an easier and more efficient ACELP initialization.
For visualizing the switching, two switches are sketched in 14B. While the second switch 1160 downstream chooses between TCX/IGF or ACELP/TD-BWE output, the first switch 1480 either pre-updates the buffers in the resampling QMF stage downstream the ACELP path by the output of the cross path or simply passes on the ACELP output.
Subsequently, audio decoder implementations in accordance with aspects of the present invention are discussed in the context of
An audio decoder for decoding an encoded audio signal 1101 comprises a first decoding processor 1120 for decoding a first encoded audio signal portion in a frequency domain. The first decoding processor 1120 comprises a spectral decoder 1122 for decoding first spectral regions with a high spectral resolution and for synthesizing second spectral regions using a parametric representation of the second spectral regions and at least a decoded first spectral region to obtain a decoded spectral representation. The decoded spectral representation is a full-band decoded spectral representation as discussed in the context of
Furthermore, the audio decoder comprises a second decoding processor 1140 for decoding the second encoded audio signal portion in the time domain to obtain a decoded second signal portion. Furthermore, the audio decoder comprises a combiner 1160 for combining the decoded first signal portion and the decoded second signal portion to obtain a decoded audio signal. The decoded signal portions are combined in sequence which is also illustrated in
Advantageously, the second decoding processor 1140 contains a time domain bandwidth extension processor 1220 and comprises, as illustrated in
Subsequently, an advantageous implementation of the upsampler 1210 of
Further processing operations can be performed within the QMF domain in addition or instead of the bandpass filtering 1472. If no processing is performed at all, then the QMF analysis and the QMF synthesis constitute an efficient upsampler 1210.
Subsequently, the construction of the individual elements in
The full-band frequency domain decoder 1120 comprises a first decoding block 1122a for decoding the high resolution spectral coefficients and for additionally performing noise filling in the low band portion as known, for example, from the USAC technology. Furthermore, the full-band decoder comprises an IGF processor 1122b for filling the spectral holes using synthesized spectral values which have been encoded only parametrically and, therefore, encoded with a low resolution on the encoder-side. Then, in block 1122c, an inverse noise shaping is performed and the result is input into a TNS/TTS synthesis block 705 which provides, as a final output, an input to a frequency-time converter 1124, which is advantageously implemented as an inverse modified discrete cosine transform operating at the output, i.e., high sampling rate.
Furthermore, a harmonic or LTP post-filter is used which is controlled by data obtained by the TCX LTP parameter extraction block 1006 in
Several elements in
The time domain decoding processor 1140 advantageously comprises the ACELP or time domain low band decoder 1200 comprising an ACELP decoder stage 1149 for obtaining decoded gains and the innovative codebook information. Additionally, an ACELP adaptive codebook stage 1141 is provided and a subsequent ACELP post-processing stage 1142 and a final synthesis filter such as LPC synthesis filter 1143, which is again controlled by the quantized LPC coefficients 1145 obtained from the bitstream demultiplexer 1100 corresponding to the encoded signal parser 1100 in
In accordance with embodiments of the present invention, the audio decoder additionally comprises the cross-processor 1170 illustrated in
Advantageously, the cross-processor 1170 comprises an additional frequency-time converter 1171 operating at a lower sampling rate than the frequency-time converter of the first decoding processor in order to obtain a further decoded first signal portion in the time domain to be used as the initialization signal or for which any initialization data can be derived. Advantageously, this IMDCT or low sampling rate frequency-time converter is implemented as illustrated in
As illustrated in FIG., the cross-processor 1170 further comprises, alone or in addition to other elements, a delay stage 1172 for delaying the further decoded first signal portion and for feeding the delayed decoded first signal portion into a de-emphasis stage 1144 of the second decoding processor for initialization. Furthermore, the cross-processor comprises, in addition or alternatively, a pre-emphasis filter 1173 and a delay stage 1175 for filtering and delaying a further decoded first signal portion and for providing the delayed output of block 1175 into an LPC synthesis filtering stage 1143 of the ACELP decoder for the purpose of initialization.
Furthermore, the cross-processor may comprise alternatively or in addition to the other mentioned elements an LPC analysis filter 1174 for generating a prediction residual signal from the further decoded first signal portion or a pre-emphasized further decoded first signal portion and for feeding the data into a codebook synthesizer of the second decoding processor and advantageously, into the adaptive codebook stage 1141. Furthermore, the output of the frequency-time converter 1171 with the low sampling rate is also input into the QMF analysis stage 1471 of the upsampler 1210 for the purpose of initialization, i.e., when the currently decoded audio signal portion is delivered by the frequency domain full-band decoder 1120.
The advantageous audio decoder is described in the following: The waveform decoder part consists of a full-band TCX decoder path with IGF both operating at the input sampling rate of the codec. In parallel, an alternative ACELP decoder path at lower sampling rate exists that is reinforced further downstream by a TD-BWE.
For ACELP initialization when switching from TCX to ACELP, a cross path (consisting of a shared TCX decoder frontend but additionally providing output at the lower sampling rate and some post-processing) exists that performs the inventive ACELP initialization. Sharing the same sampling rate and filter order between TCX and ACELP in the LPCs allows for an easier and more efficient ACELP initialization.
For visualizing the switching, two switches are sketched in
To summarize, advantageous aspects of the invention which can be used alone or in combination relate to a combination of an ACELP and TD-BWE coder with a full-band capable TCX/IGF technology advantageously associated with using a cross signal.
A further specific feature is a cross signal path for the ACELP initialization to enable seamless switching.
A further aspect is that a short IMDCT is fed with a lower part of high-rate long MDCT coefficients to efficiently implement a sample rate conversion in the cross-path.
A further feature is an efficient realization of the cross-path partly shared with a full-band TCX/IGF in the decoder.
A further feature is the cross signal path for the QMF initialization to enable seamless switching from TCX to ACELP.
An additional feature is a cross-signal path to the QMF allowing compensating the delay gap between ACELP resampled output and a filterbank-TCX/IGF output when switching from ACELP to TCX.
A further aspect is that an LPC is provided for both the TCX and the ACELP coder at the same sampling rate and filter order, although the TCX/IGF encoder/decoder is full-band capable.
Subsequently,
Generally, the time domain decoder comprises an ACELP decoder, a subsequently connected resampler or upsampler and a time domain bandwidth extension functionality. Particularly, the ACELP decoder comprises an ACELP decoding stage for restoring gains and the innovative codebook 1149, an ACELP-adaptive codebook stage 1141, an ACELP post-processor 1142, an LPC synthesis filter 1143 controlled by quantized LPC coefficients from a bitstream demultiplexer or encoded signal parser and the subsequently connected de-emphasis stage 1144. Advantageously, the decoded time domain signal being at an ACELP sampling rate is input, alongside with control data from the bitstream, into a time domain bandwidth extension decoder 1220, which provides a high band at the outputs.
In order to upsample the de-emphasis 1144 output, an upsampler comprising the QMF analysis block 1471, and the QMF synthesis block 1473 are provided. Within the filterbank domain defined by blocks 1471 and 1473, a bandpass filter is advantageously applied. Particularly, as has been discussed before, the same functionalities can also be used which have been discussed with respect to the same reference numbers. Furthermore, the time domain bandwidth extension decoder 1220 can be implemented as illustrated in
Subsequently, further details with respect to the frequency domain encoder and decoder being full-band capable are discussed with respect to
Typically, a first spectral portion such as 306 of
The decoder further comprises a frequency regenerator 116 for regenerating a reconstructed second spectral portion having the first spectral resolution using a first spectral portion. The frequency regenerator 116 performs a tile filling operation, i.e., uses a tile or portion of the first set of first spectral portions and copies this first set of first spectral portions into the reconstruction range or reconstruction band having the second spectral portion and typically performs spectral envelope shaping or another operation as indicated by the decoded second representation output by the parametric decoder 114, i.e., by using the information on the second set of second spectral portions. The decoded first set of first spectral portions and the reconstructed second set of spectral portions as indicated at the output of the frequency regenerator 116 on line 117 is input into a spectrum-time converter 118 configured for converting the first decoded representation and the reconstructed second spectral portion into a time representation 119, the time representation having a certain high sampling rate.
The spectral analyzer/tonal mask 226 separates the output of TNS block 222 into the core band and the tonal components corresponding to the first set of first spectral portions 103 and the residual components corresponding to the second set of second spectral portions 105 of
Advantageously, the analysis filterbank 222 is implemented as an MDCT (modified discrete cosine transform filterbank) and the MDCT is used to transform the signal 99 into a time-frequency domain with the modified discrete cosine transform acting as the frequency analysis tool.
The spectral analyzer 226 advantageously applies a tonality mask. This tonality mask estimation stage is used to separate tonal components from the noise-like components in the signal. This allows the core coder 228 to code all tonal components with a psycho-acoustic module.
This method has certain advantages over the classical SBR [1] in that the harmonic grid of a multi-tone signal is preserved by the core coder while only the gaps between the sinusoids is filled with the best matching “shaped noise” from the source region.
In case of stereo channel pairs an additional joint stereo processing is applied. This is used because for a certain destination range the signal can a highly correlated panned sound source. In case the source regions chosen for this particular region are not well correlated, although the energies are matched for the destination regions, the spatial image can suffer due to the uncorrelated source regions. The encoder analyses each destination region energy band, typically performing a cross-correlation of the spectral values and if a certain threshold is exceeded, sets a joint flag for this energy band. In the decoder the left and right channel energy bands are treated individually if this joint stereo flag is not set. In case the joint stereo flag is set, both the energies and the patching are performed in the joint stereo domain. The joint stereo information for the IGF regions is signaled similar the joint stereo information for the core coding, including a flag indicating in case of prediction if the direction of the prediction is from downmix to residual or vice versa.
The energies can be calculated from the transmitted energies in the UR-domain.
midNrg[k]=leftNrg[k]+rightNrg[k]
sideNrg[k]=leftNrg[k]−rightNrg[k]
with k being the frequency index in the transform domain.
Another solution is to calculate and transmit the energies directly in the joint stereo domain for bands where joint stereo is active, so no additional energy transformation is needed at the decoder side.
The source tiles are created according to the Mid/Side-Matrix:
midTile[k]=0.5·(leftTile[k]+rightTile[k])
sideTile[k]=0.5·(leftTile[k]−rightTile[k])
Energy adjustment:
midTile[k]=midTile[k]*midNrg[k];
sideTile[k]−sideTile[k]*sideNrg[k];
Joint stereo→LR transformation:
If no additional prediction parameter is coded:
leftTile[k]=midTile[k]+sideTile[k]
rightTile[k]=midTile[k]−sideTile[k]
If an additional prediction parameter is coded and if the signalled direction is from mid to side:
sideTile[k]=sideTile[k]−predictionCoeff·midTile[k]
leftTile[k]=midTile[k]+sideTile[k]
rightTile[k]=midTile[k]−sideTile[k]
If the signalled direction is from side to mid:
midTile[k]=midTile[k]−predictionCoeff·sideTile[k]
leftTile[k]=midTile[k]−sideTile[k]
rightTile[k]=midTile[k]+sideTile[k]
This processing ensures that from the tiles used for regenerating highly correlated destination regions and panned destination regions, the resulting left and right channels still represent a correlated and panned sound source even if the source regions are not correlated, preserving the stereo image for such regions.
In other words, in the bitstream, joint stereo flags are transmitted that indicate whether L/R or M/S as an example for the general joint stereo coding shall be used. In the decoder, first, the core signal is decoded as indicated by the joint stereo flags for the core bands. Second, the core signal is stored in both L/R and M/S representation. For the IGF tile filling, the source tile representation is chosen to fit the target tile representation as indicated by the joint stereo information for the IGF bands.
Temporal Noise Shaping (TNS) is a standard technique and part of AAC. TNS can be considered as an extension of the basic scheme of a perceptual coder, inserting an optional processing step between the filterbank and the quantization stage. The main task of the TNS module is to hide the produced quantization noise in the temporal masking region of transient like signals and thus it leads to a more efficient coding scheme. First, TNS calculates a set of prediction coefficients using “forward prediction” in the transform domain, e.g. MDCT. These coefficients are then used for flattening the temporal envelope of the signal. As the quantization affects the TNS filtered spectrum, also the quantization noise is temporarily flat. By applying the invers TNS filtering on decoder side, the quantization noise is shaped according to the temporal envelope of the TNS filter and therefore the quantization noise gets masked by the transient.
IGF is based on an MDCT representation. For efficient coding, advantageously long blocks of approx. 20 ms have to be used. If the signal within such a long block contains transients, audible pre- and post-echoes occur in the IGF spectral bands due to the tile filling.
This pre-echo effect is reduced by using TNS in the IGF context. Here, TNS is used as a temporal tile shaping (TTS) tool as the spectral regeneration in the decoder is performed on the TNS residual signal. The involved TTS prediction coefficients are calculated and applied using the full spectrum on encoder side as usual. The TNS/TTS start and stop frequencies are not affected by the IGF start frequency of the IGF tool. In comparison to the legacy TNS, the TTS stop frequency is increased to the stop frequency of the IGF tool, which is higher than fIGFstart. On decoder side the TNS/TTS coefficients are applied on the full spectrum again, i.e. the core spectrum plus the regenerated spectrum plus the tonal components from the tonality mask (see
In legacy decoders, spectral patching on an audio signal corrupts spectral correlation at the patch borders and thereby impairs the temporal envelope of the audio signal by introducing dispersion. Hence, another benefit of performing the IGF tile filling on the residual signal is that, after application of the shaping filter, tile borders are seamlessly correlated, resulting in a more faithful temporal reproduction of the signal.
In an IGF encoder, the spectrum having undergone TNS/TTS filtering, tonality mask processing and IGF parameter estimation is devoid of any signal above the IGF start frequency except for tonal components. This sparse spectrum is now coded by the core coder using principles of arithmetic coding and predictive coding. These coded components along with the signaling bits form the bitstream of the audio.
Advantageously, the high resolution is defined by a line-wise coding of spectral lines such as MDCT lines, while the second resolution or low resolution is defined by, for example, calculating only a single spectral value per scale factor band, where a scale factor band covers several frequency lines. Thus, the second low resolution is, with respect to its spectral resolution, much lower than the first or high resolution defined by the line-wise coding typically applied by the core encoder such as an AAC or USAC core encoder.
Regarding scale factor or energy calculation, the situation is illustrated in
Particularly, when the core encoder is under a low bitrate condition, an additional noise-filling operation in the core band, i.e., lower in frequency than the IGF start frequency, i.e., in scale factor bands SCB1 to SCB3 can be applied in addition. In noise-filling, there exist several adjacent spectral lines which have been quantized to zero. On the decoder-side, these quantized to zero spectral values are re-synthesized and the re-synthesized spectral values are adjusted in their magnitude using a noise-filling energy such as NF2 illustrated at 308 in
Advantageously, the bands, for which energy information is calculated coincide with the scale factor bands. In other embodiments, an energy information value grouping is applied so that, for example, for scale factor bands 4 and 5, only a single energy information value is transmitted, but even in this embodiment, the borders of the grouped reconstruction bands coincide with borders of the scale factor bands. If different band separations are applied, then certain re-calculations or synchronization calculations may be applied, and this can make sense depending on the certain implementation.
Advantageously, the spectral domain encoder 106 of
In the audio encoder of
Then, at the output of block 422, a quantized spectrum is obtained corresponding to what is illustrated in
The set to zero blocks 410, 418, 422, which are provided alternatively to each other or in parallel are controlled by the spectral analyzer 424. The spectral analyzer advantageously comprises any implementation of a well-known tonality detector or comprises any different kind of detector operative for separating a spectrum into components to be encoded with a high resolution and components to be encoded with a low resolution. Other such algorithms implemented in the spectral analyzer can be a voice activity detector, a noise detector, a speech detector or any other detector deciding, depending on spectral information or associated metadata on the resolution requirements for different spectral portions.
Subsequently, reference is made to
As illustrated at 301 in
Advantageously, an IGF operation, i.e., a frequency tile filling operation using spectral values from other portions can be applied in the complete spectrum. Thus, a spectral tile filling operation can not only be applied in the high band above an IGF start frequency but can also be applied in the low band. Furthermore, the noise-filling without frequency tile filling can also be applied not only below the IGF start frequency but also above the IGF start frequency. It has, however, been found that high quality and high efficient audio encoding can be obtained when the noise-filling operation is limited to the frequency range below the IGF start frequency and when the frequency tile filling operation is restricted to the frequency range above the IGF start frequency as illustrated in
Advantageously, the target tiles (TT) (having frequencies greater than the IGF start frequency) are bound to scale factor band borders of the full rate coder. Source tiles (ST), from which information is taken, i.e., for frequencies lower than the IGF start frequency are not bound by scale factor band borders. The size of the ST should correspond to the size of the associated TT.
Subsequently, reference is made to
Then, the first spectral portion of the reconstruction band such as 307 of
Furthermore, as illustrated in
As illustrated, the encoder operates without downsampling and the decoder operates without upsampling. In other words, the spectral domain audio coder is configured to generate a spectral representation having a Nyquist frequency defined by the sampling rate of the originally input audio signal.
Furthermore, as illustrated in
As outlined, the spectral domain audio decoder 112 is configured so that a maximum frequency represented by a spectral value in the first decoded representation is equal to a maximum frequency included in the time representation having the sampling rate wherein the spectral value for the maximum frequency in the first set of first spectral portions is zero or different from zero. Anyway, for this maximum frequency in the first set of spectral components a scale factor for the scale factor band exists, which is generated and transmitted irrespective of whether all spectral values in this scale factor band are set to zero or not as discussed in the context of
The IGF is, therefore, advantageous that with respect to other parametric techniques to increase compression efficiency, e.g. noise substitution and noise filling (these techniques are exclusively for efficient representation of noise like local signal content) the IGF allows an accurate frequency reproduction of tonal components. To date, no state-of-the-art technique addresses the efficient parametric representation of arbitrary signal content by spectral gap filling without the restriction of a fixed a-priory division in low band (LF) and high band (HF).
Subsequently, further optional features of the full band frequency domain first encoding processor and the full band frequency domain decoding processor incorporating the gap-filling operation, which can be implemented separately or together are discussed and defined.
Particularly, the spectral domain decoder 112 corresponding to block 1122a is configured to output a sequence of decoded frames of spectral values, a decoded frame being the first decoded representation, wherein the frame comprises spectral values for the first set of spectral portions and zero indications for the second spectral portions. The apparatus for decoding furthermore comprises a combiner 208. The spectral values are generated by a frequency regenerator for the second set of second spectral portions, where both, the combiner and the frequency regenerator are included within block 1122b. Thus, by combining the second spectral portions and the first spectral portions a reconstructed spectral frame comprising spectral values for the first set of the first spectral portions and the second set of spectral portions are obtained and the spectrum-time converter 118 corresponding to the IMDCT block 1124 in
As outlined, the spectrum-time converter 118 or 1124 is configured to perform an inverse modified discrete cosine transform 512, 514 and further comprises an overlap-add stage 516 for overlapping and adding subsequent time domain frames
Particularly, the spectral domain audio decoder 1122a is configured to generate the first decoded representation so that the first decoded representation has a Nyquist frequency defining a sampling rate being equal to a sampling rate of the time representation generated by the spectrum-time converter 1124.
Furthermore, the decoder 1112 or 1122a is configured to generate the first decoded representation so that a first spectral portion 306 is placed with respect to frequency between two second spectral portions 307a, 307b.
In a further embodiment, a maximum frequency represented by a spectral value for the maximum frequency in the first decoded representation is equal to a maximum frequency included in the time representation generated by the spectrum-time converter, wherein the spectral value for the maximum frequency in the first representation is zero or different from zero.
Furthermore, as illustrated in
Furthermore, the spectral domain audio decoder 112 is configured to generate the first decoded representation having the first spectral portions with the frequency values being greater than the frequency being equal to a frequency in the middle of the frequency range covered by the time representation output by the spectrum-time converter 118 or 1124.
Furthermore, the spectral analyzer or full-band analyzer 604 is configured to analyze the representation generated by the time-frequency converter 602 for determining a first set of first spectral portions to be encoded with the first high spectral resolution and the different second set of second spectral portions to be encoded with a second spectral resolution which is lower than the first spectral resolution and, by means of the spectral analyzer, a first spectral portion 306 is determined, with respect to frequency, between two second spectral portions in
Particularly, the spectral analyzer is configured for analyzing the spectral representation up to a maximum analysis frequency being at least one quarter of a sampling frequency of the audio signal.
Particularly, the spectral domain audio encoder is configured to process a sequence of frames of spectral values for a quantization and entropy coding, wherein, in a frame, spectral values of the second set of second portions are set to zero, or wherein, in the frame, spectral values of the first set of first spectral portions and the second set of the second spectral portions are present and wherein, during subsequent processing, spectral values in the second set of spectral portions are set to zero as exemplarily illustrated at 410, 418, 422.
The spectral domain audio encoder is configured to generate a spectral representation having a Nyquist frequency defined by the sampling rate of the audio input signal or the first portion of the audio signal processed by the first encoding processor operating in the frequency domain.
The spectral domain audio encoder 606 is furthermore configured to provide the first encoded representation so that, for a frame of a sampled audio signal, the encoded representation comprises the first set of first spectral portions and the second set of second spectral portions, wherein the spectral values in the second set of spectral portions are encoded as zero or noise values.
The full band analyzer 604 or 102 is configured to analyze the spectral representation starting with the gap-filing start frequency 209 and ending with a maximum frequency fmax represented by a maximum frequency included in the spectral representation and a spectral portion extending from a minimum frequency up to the gap-filling start frequency 309 belongs to the first set of first spectral portions.
Particularly, the analyzer is configured to apply a tonal mask processing at least of a portion of the spectral representation so that tonal components and non-tonal components are separated from each other, wherein the first set of the first spectral portions comprises the tonal components and wherein the second set of the second spectral portions comprises the non-tonal components.
Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps may be executed by such an apparatus.
The inventive transmitted or encoded signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, and EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a non-transitory storage medium such as a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the invention method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
14178819 | Jul 2014 | EP | regional |
This application is a continuation of U.S. patent application Ser. No. 16/290,587 filed Mar. 1, 2019 which is a continuation of U.S. patent application Ser. No. 15/414,289 filed Jan. 24, 2017 which is a continuation of co-pending International Application No. PCT/EP2015/067005, filed Jul. 24, 2015, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 14178819.0, filed Jul. 28, 2014, which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
6134518 | Cohen et al. | Oct 2000 | A |
6233550 | Gersho et al. | May 2001 | B1 |
6996198 | Cvetkovic | Feb 2006 | B2 |
7388525 | Miyauchi et al. | Jun 2008 | B2 |
7873511 | Herre et al. | Jan 2011 | B2 |
8015017 | Sung et al. | Sep 2011 | B2 |
8271274 | Massaloux et al. | Sep 2012 | B2 |
9318118 | Ekstrand | Apr 2016 | B2 |
20030093278 | Malah | May 2003 | A1 |
20030233234 | Truman et al. | Dec 2003 | A1 |
20040049376 | Sperschneider et al. | Mar 2004 | A1 |
20040162720 | Jang et al. | Aug 2004 | A1 |
20050004793 | Ojala | Jan 2005 | A1 |
20050256701 | Makinen | Nov 2005 | A1 |
20070177620 | Ohmuro | Aug 2007 | A1 |
20070282603 | Bessette | Dec 2007 | A1 |
20080106445 | Unno | May 2008 | A1 |
20080228500 | Oh et al. | Sep 2008 | A1 |
20090012797 | Boehm et al. | Jan 2009 | A1 |
20090213958 | Kondo et al. | Aug 2009 | A1 |
20090240491 | Reznik | Sep 2009 | A1 |
20100114585 | Yoon et al. | May 2010 | A1 |
20100217607 | Neuendorf et al. | Aug 2010 | A1 |
20100262420 | Herre et al. | Oct 2010 | A1 |
20100324912 | Choo et al. | Dec 2010 | A1 |
20110002266 | Gao | Jan 2011 | A1 |
20110004479 | Ekstrand | Jan 2011 | A1 |
20110010168 | Yu et al. | Jan 2011 | A1 |
20110058687 | Niemisto et al. | Mar 2011 | A1 |
20110099004 | Krishnan et al. | Apr 2011 | A1 |
20110173008 | Lecomte et al. | Jul 2011 | A1 |
20110173010 | Lecomte et al. | Jul 2011 | A1 |
20110178795 | Bayer | Jul 2011 | A1 |
20110202354 | Grill | Aug 2011 | A1 |
20110202355 | Grill | Aug 2011 | A1 |
20110218797 | Mittal et al. | Sep 2011 | A1 |
20110257984 | Virette | Oct 2011 | A1 |
20110264454 | Ullberg | Oct 2011 | A1 |
20110305352 | Villemoes | Dec 2011 | A1 |
20110320196 | Choo et al. | Dec 2011 | A1 |
20110320212 | Tsujino et al. | Dec 2011 | A1 |
20120016667 | Gao | Jan 2012 | A1 |
20120016668 | Gao | Jan 2012 | A1 |
20120065965 | Choo | Mar 2012 | A1 |
20120076323 | Disch et al. | Mar 2012 | A1 |
20120146831 | Eksler | Jun 2012 | A1 |
20120158409 | Nagel et al. | Jun 2012 | A1 |
20120209600 | Kim | Aug 2012 | A1 |
20120221342 | Oshikiri | Aug 2012 | A1 |
20120245947 | Neuendorf et al. | Sep 2012 | A1 |
20120253797 | Geiger et al. | Oct 2012 | A1 |
20120271644 | Bessette et al. | Oct 2012 | A1 |
20120328124 | Kjoerling | Dec 2012 | A1 |
20130006645 | Jiang | Jan 2013 | A1 |
20130030798 | Mittal et al. | Jan 2013 | A1 |
20130030819 | Purnhagen et al. | Jan 2013 | A1 |
20130035777 | Niemisto | Feb 2013 | A1 |
20130090933 | Villemoes et al. | Apr 2013 | A1 |
20130121411 | Robillard et al. | May 2013 | A1 |
20130124215 | Lecomte et al. | May 2013 | A1 |
20130138398 | Reza | May 2013 | A1 |
20130159005 | Kikuiri et al. | Jun 2013 | A1 |
20130173259 | Mittal et al. | Jul 2013 | A1 |
20130173275 | Liu | Jul 2013 | A1 |
20130226570 | Multrus et al. | Aug 2013 | A1 |
20130253917 | Schildbach | Sep 2013 | A1 |
20130268264 | Taleb et al. | Oct 2013 | A1 |
20130282368 | Choo et al. | Oct 2013 | A1 |
20130289981 | Ragot et al. | Oct 2013 | A1 |
20130315082 | Yan et al. | Nov 2013 | A1 |
20130325486 | Oh et al. | Dec 2013 | A1 |
20140019142 | Mittal | Jan 2014 | A1 |
20140081629 | Gao | Mar 2014 | A1 |
20150154970 | Purnhagen et al. | Jun 2015 | A1 |
20150332686 | Disch et al. | Nov 2015 | A1 |
20150380007 | Atti et al. | Dec 2015 | A1 |
20160133265 | Disch | May 2016 | A1 |
20160196829 | Liu et al. | Jul 2016 | A1 |
20160232907 | Kovesi et al. | Aug 2016 | A1 |
20160240203 | Lecomte | Aug 2016 | A1 |
20160293173 | Faure | Oct 2016 | A1 |
20170133023 | Disch et al. | May 2017 | A1 |
20170133027 | Kovesi et al. | May 2017 | A1 |
20170213561 | Faure et al. | Jul 2017 | A1 |
20170256267 | Disch | Sep 2017 | A1 |
20170316788 | Norvell et al. | Nov 2017 | A1 |
Number | Date | Country |
---|---|---|
2013200680 | Feb 2013 | AU |
101199121 | Jun 2008 | CN |
101872618 | Oct 2010 | CN |
102089758 | Jun 2011 | CN |
102113051 | Jun 2011 | CN |
102177543 | Sep 2011 | CN |
0653846 | May 1995 | EP |
1575032 | Sep 2005 | EP |
1990799 | Nov 2008 | EP |
2405426 | Jan 2011 | EP |
2343700 | Jul 2011 | EP |
2613316 | Jul 2013 | EP |
2424090 | Sep 2013 | ES |
H09-319396 | Dec 1997 | JP |
2002-050967 | Feb 2002 | JP |
2005-107255 | Apr 2005 | JP |
2008-503783 | Feb 2008 | JP |
2010-526346 | Jul 2010 | JP |
2010-210680 | Sep 2010 | JP |
2012-242785 | Dec 2012 | JP |
2013-543600 | Dec 2013 | JP |
2014-505902 | Mar 2014 | JP |
2439721 | Jul 2011 | RU |
2437172 | Dec 2011 | RU |
2515704 | Aug 2012 | RU |
2577195 | May 2014 | RU |
200809771 | Feb 2008 | TW |
200828826 | Jul 2008 | TW |
I321777 | Mar 2010 | TW |
WO 2008000316 | Jan 2008 | WO |
WO 2009029037 | Mar 2009 | WO |
WO 2011048117 | Apr 2011 | WO |
WO 2012091464 | Jul 2012 | WO |
WO 2014199632 | Dec 2014 | WO |
WO 2015010948 | Jan 2015 | WO |
Entry |
---|
Niediwiecki, M., and Krzysztof Cisowski. “Smart copying—a new approach to reconstruction of audio signals.” IEEE transactions on signal processing 49.10 (2001): 2272-2282. (Year: 2001). |
Nagel, Frederik, Sascha Disch, and Stephan Wilde. “A continuous modulated single sideband bandwidth extension.” 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. (Year: 2010). |
Helmrich, Christian R., et al. “Efficient transform coding of two-channel audio signals by means of complex-valued stereo prediction.” 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (Year: 2011). |
Notice of Allowance dated May 14, 2019 issued in the parallel Korean patent application No. 10-2017-7005432 (8 pages with English translation). |
Notice of Allowance dated May 14, 2019 issued in the parallel Korean patent application No. 10-2017-7005491 (8 pages with English translation). |
Ronald E. Crochiere, et al. Optimum FIR digital filter implementations for decimation, interpolation, and narrow-band filtering. IEEE Transactions on Acoustics, Speech, and Signal Processing, Oct. 1975. |
Audio codec processing functions, Extended Adaptive Multi-Rate—Wideband (AMR-WB+) codec. 3GPP TS 26.290 version 9.0.0 Release 9, Sep. 2009. |
ISO/IEC FDIS 23003-3:2011(E), Information technology—MPEG audio technologies—Part 3: Unified speech and audio coding. ISO/IEC JTC 1/SC 29/WG 11. Sep. 20, 2011. |
Notice of Allowance dated Feb. 2, 2019 issued in U.S. Appl. No. 15/414,427 (17 pgs). |
Notice of Allowance dated Oct. 31, 2018 issued in U.S. Appl. No. 15/414,289 (17 pgs). |
Office Action dated Sep. 30, 2022 issued in the parallel U.S. Appl. No. 17/336,132. |
Oh et al., Enhanced Long-Term Predictor for United Speech and Audio Coding, 2011, IEEE, whole document (Year:2011). |
Brinker et al., An Overview of the Coding Standard MPEG-4 Audio Amendments 1 and 2: HE-AAC, SSC and HE-AACv2, 2009, Hindawi, whole document (Year: 2009). |
Office Action dated Jul. 27, 2020 issued in the parallel U.S. Appl. No. 16/286,397 (37 pages). |
Office Action dated Aug. 25, 2020 issued in the parallel Japanese patent application No. 2019-117964. |
Office Action dated Sep. 13, 2019 issued in the parallel IN patent application No. 201637041442 (7 pages). |
Dietz M. et al., “Spectral Band Replication, a novel approach in audio coding”, 112th AES Convention, Munich 2002 (8 pages). |
Meltzer S. et al., “SBR enhanced audio codecs for digital broadcasting such as ‘Digital Radio Mondiale’ (DRM)”, 112th AES Convention, Munich 2002 (4 pages). |
Ziegler T. et al., “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm”, 112th AES Convention, Munich 2002 (7 pages). |
Bosi M. et al., “ISO/IEC MPEG-2 advanced audio coding”, Journal of the Audio Engineering Society, Audio Engineering Society, vol. 45, No. 10, XP00073016, New York 1997, pp. 789-812 (24 pages). |
Office Action dated Mar. 28, 2018 issued in parallel Russian patent application No. 2017106099 (12 pages). |
Office Action dated Apr. 4, 2018 issued in U.S. Appl. No. 15/414,289. |
Office Action dated Apr. 16, 2018 issued in parallel Russian patent application No. 2017105448. |
Office action dated Apr. 24, 2018 in the parallel Japanese patent application No. 2017-504786 (10 pages). |
Office action dated Apr. 25, 2018 for the parallel Japanese patent application No. 2017-504785 (11 pages with translation). |
Neuendorf, Max, WD7 of USAC, ISO/IEC JTC1/SC29/WG11, MPEG2010/N11299, Germany, Apr. 2010, p. 6-7, 10-13, 56, 105 (9 pages). |
Office Action dated Aug. 30, 2018 issued in in the parallel Russian patent application No. 2017105448 (35 pages). |
Anonymous: WD7 of USAC, 92. MPEG Meeting; Apr. 19, 2010-Apr. 23, 2010; Dresden; (Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG 11), No. N11299, Apr. 26, 2010, XP030018547. |
International Search Report and Written Opinion dated Aug. 26, 2015 for PCT/EP2015/067005. |
International Search Report and Written Opinion dated Aug. 25, 2015 for PCT/EP2015/067003. |
Non-Final Office Action dated Jul. 25, 2018 issued in U.S. Appl. No. 15/414,427. |
Notice of Allowance dated Feb. 19, 2019 issued in U.S. Appl. No. 15/414,427. |
Bessette B., et al. “Efficient methods for high quality low bit rate wideband speech coding.” Speech Coding, 2002, IEEE Workshop Proceedings. IEEE, Oct. 2002, pp. 114-116. |
Geiser, Bernd, et al. “A qualified ITU-T G. 729EV codec candidate for hierarchical speech and audio coding.” Multimedia Signal Processing, 2006 IEEE 8th Workshop on IEEE, Oct. 2006, pp. 114-118. |
Nuendorf, Max et al. “Unified speech and audio coding scheme for high quality at low bitrates.” Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on IEEE, Apr. 2009, pp. 1-4. |
Lee, Taejin, et al. “Adaptive TCX Windowing Technology for Unified Structure MPEG-D USAC.” ETRI Journal 34.3, Jun. 2012, pp. 474-477. |
Makhoul, John et al. “High-frequency regeneration in speech coding systems.” Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'79. vol. 4. IEEE, Apr. 1979, pp. 428-431. |
Song, Jeongook et al. “Enhanced long-term predictor for Unified Speech and Audio Coding.” Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on IEEE, May 2011, pp. 505-508. |
Zernicki, Tomasz et al. “Enhanced coding of high-frequency tonal components in MPEG-D USAC through joint applications of ESBR and sinusoidal modeling.” Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on IEEE, May 2011, pp. 501-504. |
Notice of Allowance issued dated 123/2018 in U.S. Appl. No. 15/414,289. |
Hearing Notice dated Feb. 25, 2022 in the parallel Indian patent application No. 201737001634 (2 pages). |
Office Action dated Mar. 5, 2020 issued in related U.S. Appl. No. 16/286,397 (33 pages). |
Office Action dated Mar. 10, 2020 issued in the parallel Japanese patent application No. 2019-024181. |
Office Action dated Mar. 12, 2020 issued in the parallel European patent application No. 19160134.3. |
Office Action dated Aug. 8, 2023 issued in the parallel EP patent application No. 231184408.5. |
Anonymous: “WD7 of USAC”, 92. MPEG Meeting; Apr. 19, 2010-Apr. 23, 2010; Dresden; (Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11), No. N11299, Apr. 26, 2010 (Apr. 26, 2010), XP030018547. |
Bosi M et al: “ISO/IEC MPEG-2 Advanced Audio Coding”, Journal of the Audio Engineering Society, Audio Engineering Society, New York, NY, US, vol. 45, No. 10, Oct. 1, 1997 (Oct. 1, 1997 ), pp. 789-812, XP000730161. |
Office Action dated May 17, 2023 issued in the parallel CN patent application No. 202110039148.6. |
Office Action dated May 22, 20223 issued in related U.S. Appl. No. 17/336,132. |
Office Action dated May 22, 20223 issued in related U.S. Appl. No. 18/158,035. |
Number | Date | Country | |
---|---|---|---|
20220051681 A1 | Feb 2022 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16290587 | Mar 2019 | US |
Child | 17453139 | US | |
Parent | 15414289 | Jan 2017 | US |
Child | 16290587 | US | |
Parent | PCT/EP2015/067005 | Jul 2015 | US |
Child | 15414289 | US |