The invention pertains to audio signal processing, and more particularly, to adaptive filtering of decoded audio signals to reduce audible noise (e.g., pre-echo noise) due to quantization during encoding.
In accordance with many conventional audio encoding methods, audio data undergoes quantization (e.g., to compress the audio data during perceptual audio coding). For example, encoding of audio data in accordance with the formats known as AC-3 and Enhanced AC-3 (or “E-AC-3”) includes such a quantization step. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively. Dolby, Dolby Digital, and Dolby Digital Plus are trademarks of Dolby Laboratories Licensing Corporation.
Although some embodiments of the present invention are useful to filter audio content of a decoded version of an encoded bitstream having AC-3 (or E-AC-3) format, it is contemplated that other embodiments of the invention are useful to filter audio content of decoded versions of encoded bitstreams having other formats (provided that the encoding includes a quantization step).
Next, with reference to
An encoded bitstream having AC-3 format comprises one to six channels of audio content, and metadata indicative of at least one characteristic of the audio content. The audio content is audio data that has been compressed using perceptual audio coding.
In encoding of an AC-3 audio bitstream, blocks of input audio samples to be encoded undergo time-to-frequency domain transformation resulting in blocks of frequency domain data, commonly referred to as transform coefficients, frequency coefficients, or frequency components, located in uniformly spaced frequency bins. The frequency coefficient in each bin is then converted (e.g., in BFPE stage 7 of the
Typical embodiments of AC-3 (and E-AC-3) encoders (and other audio data encoders) implement a psychoacoustic model to analyze the frequency domain data on a banded basis (i.e., typically 50 nonuniform bands approximating the frequency bands of the well known psychoacoustic scale known as the Bark scale) to determine an optimal allocation of bits to each mantissa. The mantissa data is then quantized (e.g., in quantizer 6 of the
To perform AC-3 encoding of an audio program, a number, N (e.g., N=1, N=2, or N=4), of quantized mantissa values (one for each of N consecutive frequency bins) which will share the same exponent value is chosen. Each such set of N consecutive frequency bins may also (and herein will) be referred to as a frequency “band” (each band comprising N bins). Thus, one bit allocation value for each frequency band of an encoded audio program (where the bit allocation value is indicative of the number of bits of the mantissa for one bin of the band) suffices to indicate the number of bits of each mantissa of each audio sample in the band. In this context, the frequency bands of the encoded audio program are typically not the same frequency bands assumed by the psychoacoustic model which is employed to determine the number of bits of each quantized mantissa of the encoded program.
Quantizer 6 performs bit allocation and quantization based upon control data (including masking data) generated by controller 4. The masking data (determining a masking curve) is generated from the frequency domain data 3, on the basis of a psychoacoustic model (implemented by controller 4) of human hearing and aural perception. The psychoacoustic modeling takes into account the frequency-dependent thresholds of human hearing, and a psychoacoustic phenomenon referred to as masking, whereby a strong frequency component close to one or more weaker frequency components tends to mask the weaker components, rendering them inaudible to a human listener. This makes it possible to omit the weaker frequency components when encoding audio data, and thereby achieve a higher degree of compression, without adversely affecting the perceived quality of the encoded audio data (bitstream 9). The masking data comprises a masking curve value for each frequency band (determined by the psychoacoustic model) of the frequency domain audio data 3. These masking curve values represent the level of signal masked by the human ear in each frequency band. Quantizer 6 uses this information to decide how best to use the available number of data bits to represent the frequency domain data of each frequency band of the input audio signal.
Controller 4 may implement a conventional low frequency compensation process (sometimes referred to herein as “lowcomp” compensation) to generate lowcomp parameter values for correcting the masking curve values for the low frequency bands. The corrected masking curve values are used to generate the signal-to-mask ratio value for each frequency component of the frequency-domain audio data 3. Low frequency compensation is a feature of the psychoacoustic model typically implemented during AC-3 (and E-AC-3) encoding of audio data. Lowcomp compensation improves the encoding of highly tonal low-frequency components (of the input audio data to be encoded) by preferentially reducing the mask in the relevant frequency region, and in consequence allocating more bits to the code words employed to encode such components.
In AC-3 and E-AC-3 encoding, each component of the frequency-domain audio data 3 (i.e., the contents of each transform bin) has a floating point representation comprising a mantissa and an exponent. To simplify the calculation of the masking curve, the Dolby Digital family of coders uses only the exponents to derive the masking curve. Or, stated alternately, the masking curve depends on the transform coefficient exponent values but is independent of the transform coefficient mantissa values. Because the range of exponents is rather limited (generally, integer values from 0-24), the exponent values are mapped onto a PSD scale with a larger range (generally, integer values from 0-3072) for the purposes of computing the masking curve. Thus, the loudest frequency components are mapped to a PSD value of 3072, while the softest frequency-domain data components are mapped to a PSD value of 0.
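The exponent-to-PSD mapping described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `exponent_to_psd` is hypothetical, and the sketch assumes a linear mapping of 128 PSD units per exponent step, with smaller exponents denoting louder components.

```python
def exponent_to_psd(exponent: int) -> int:
    """Map a transform-coefficient exponent (0..24) onto the 0..3072 PSD scale.

    Assumption: 128 PSD units per exponent step, and exponent 0 is the
    loudest component (largest PSD value).
    """
    if not 0 <= exponent <= 24:
        raise ValueError("exponent must be in the range 0..24")
    return 3072 - (exponent << 7)


# Loudest, mid-range, and softest exponents.
psd_values = [exponent_to_psd(e) for e in (0, 12, 24)]
print(psd_values)
```

Because only the exponent enters the mapping, the resulting PSD value (and hence the masking curve) is independent of the mantissa, as stated above.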
In conventional Dolby Digital (or Dolby Digital Plus) encoding, differential exponents (i.e., the difference between consecutive exponents) are coded instead of absolute exponents. The differential exponents can only take on one of five values: 2, 1, 0, −1, and −2. If a differential exponent outside this range is found, one of the exponents being subtracted is modified so that the differential exponent (after the modification) is within the noted range (this conventional method is known as “exponent tenting” or “tenting”). Tenting stage 10 of the
Spectral domain coding systems (e.g., conventional encoders of the type described with reference to
Typically, the encoder applies the filter in the frequency domain (i.e., to frequency components generated by applying a time domain-to-frequency domain transform on the audio data to be encoded), and the inverse filter is also applied (by the decoder) in the frequency domain (i.e., during or after decoding of frequency-domain encoded audio data, but before application of a frequency domain-to-time domain transform on the decoded audio data).
Herein, we use the term “quantization noise filter” to denote a filter designed to reduce audible noise (e.g., pre-echo noise) due to quantization during encoding of audio data. Herein, it is contemplated that a quantization noise filter may be applied by an encoder (i.e., during encoding of the audio data), or in a decoder (or a post-filtering system coupled and configured to filter the output of a decoder) during or after decoding of encoded audio data.
An example of a quantization noise filter implemented in an encoder (rather than in a decoder) is described in US Patent Application Publication No. 2010/0094637 A1, published Apr. 15, 2010, and assigned to the assignee of the present invention. The named inventor of US Patent Application Publication No. 2010/0094637 A1 is the same individual as the inventor of the present invention.
It is also contemplated herein that a quantization noise filter may be applied partially by an encoder and partially by a decoder (or a post-filtering system coupled and configured to filter the output of a decoder), for example, by applying a first filter stage in the encoder and a second filter stage in the decoder (or post-filtering system) after delivery of the encoded signal to the decoder. Examples of this latter type of quantization noise filter are those applied by the conventional TNS and Gain Control methods mentioned above. This type of conventional quantization noise filtering has limitations and disadvantages, such as the need for the decoder to apply the inverse of the filter stage (“encoder filter”) applied by the encoder, which prevents use of a decoder that is not specially configured to apply the inverse of the encoder filter.
The present inventor has recognized that it would be desirable to implement a quantization noise filter in a decoder (or a post-filter coupled to a decoder), so that a decoder (or post-filter) configured to apply the quantization noise filter can perform quantization noise filtering on audio content, and so that a conventional decoder (or a conventional decoder and conventional post-filter coupled thereto) not configured to apply the quantization noise filter can decode (and optionally also perform post-filtering on) audio content without performing quantization noise filtering on the audio content. In the latter case, the conventionally decoded audio content could usefully be rendered (i.e., the resulting sound could have acceptable quality, although the sound quality might suffer from audible noise due to quantization).
In a first class of embodiments, the invention is a method including steps of decoding an encoded audio signal indicative of encoded audio content to generate a decoded audio signal indicative of a decoded version of the audio content (e.g., a decoded version of at least one audio channel of an encoded audio program), and performing adaptive quantization noise filtering on the decoded audio signal. It is assumed that the encoding performed to generate the encoded audio content included a quantization step. The quantization noise filtering is performed adaptively in the spectral domain (frequency domain), in response to data indicative of “signal to noise” values which are indicative (e.g., at least approximately indicative) of a post-quantization, signal-to-quantization noise ratio for each frequency band of at least one segment (e.g., each segment) of the encoded audio content. The signal to noise values may be denoted as SQNR[k], with k denoting the frequency band to which each signal to noise value SQNR[k] pertains. In preferred embodiments in the first class, each signal to noise value SQNR[k] is a bit allocation value equal to the number of mantissa bits of at least one encoded audio sample (e.g., each audio sample) of a frequency band of a segment of the encoded audio content. In typical embodiments, the adaptive quantization noise filtering applies relatively less quantization noise filtering to frequency components of decoded audio content (decoded versions of encoded audio samples) in frequency bands having better signal to noise ratio (i.e., post-quantization signal to quantization noise ratio), and relatively more quantization noise filtering to frequency components of the audio content in frequency bands having lower signal to noise ratio.
In some embodiments, the quantization noise filtering is performed adaptively on the decoded audio signal by determining a filter gain value (e.g., one of the α[k] values output from subsystem 23 of below-described
In the first class of embodiments, the method is typically performed by a decoder only (e.g., in a post-filtering subsystem of a decoder) or by a post-filter coupled to receive a decoder's output (indicative of a decoded version of an encoded audio signal).
In typical embodiments, the adaptive quantization noise filtering is designed to reduce audible noise (e.g., pre-echo noise) that would otherwise occur (during rendering and playback of the decoded audio content which undergoes the filtering) as a result of noise introduced to the audio content by quantization during encoding. In such embodiments, because the spectral domain adaptive filtering is applied in a decoder (or a post-filter coupled to receive the output of a decoder), it will suppress both quantization noise and audio content in the time domain (i.e., both quantization noise and audio content indicated by a transformed version of the frequency components of the filtered signal, generated by applying a frequency-to-time domain transform to the frequency components of the filtered signal). In order to mitigate the damage to the original (pre-encoded) audio content caused by the quantization noise filter, the filter is applied adaptively such that spectral bins that have better signal to quantization noise ratio after quantization have relatively less quantization noise filtering applied to them, while spectral bins with poor signal to quantization noise ratio after quantization have relatively more quantization noise filtering applied to them.
Another aspect of the invention is an audio signal processing system (e.g., a decoder or a post-filter coupled to receive the output of a decoder) which is or includes an adaptive quantization noise filter configured to perform any embodiment of the inventive method.
It is contemplated that in some embodiments, the encoded audio signal which is decoded and adaptively filtered in accordance with the invention is indicative of audio captured (e.g., at different endpoints of a teleconferencing system) during a multiparty teleconference. The decoder (or post-filter) which performs the inventive filtering may be implemented at a conferencing system endpoint.
Another aspect of the invention is a method for decoding encoded audio data, including the steps of: decoding a signal indicative of encoded audio data to generate a decoded version of the encoded audio data (e.g., a decoded version of at least one audio channel of an encoded audio program); and performing adaptive quantization noise filtering on the decoded version of the encoded audio data signal in accordance with any embodiment of the inventive adaptive quantization noise filtering method.
Other aspects of the invention include a system or device (e.g., a decoder or a processor) configured (e.g., programmed) to perform any embodiment of the inventive method, and a computer readable medium (e.g., a disc) which stores code for implementing any embodiment of the inventive method or steps thereof. For example, the inventive system can be or include a programmable general purpose processor, digital signal processor, or microprocessor, programmed with software or firmware and/or otherwise configured to perform any of a variety of operations on data, including an embodiment of the inventive method or steps thereof. Such a general purpose processor may be or include a computer system including an input device, a memory, and processing circuitry programmed (and/or otherwise configured) to perform an embodiment of the inventive method (or steps thereof) in response to data asserted thereto.
Throughout this disclosure, including in the claims, the expression performing an operation “on” a signal or data (e.g., filtering, scaling, transforming, or applying gain to, the signal or data) is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data (e.g., on a version of the signal that has undergone preliminary filtering or pre-processing prior to performance of the operation thereon).
Throughout this disclosure including in the claims, the expression “system” is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X-M inputs are received from an external source) may also be referred to as a decoder system.
Throughout this disclosure including in the claims, the term “processor” is used in a broad sense to denote a system or device programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (e.g., audio, or video or other image data). Examples of processors include a field-programmable gate array (or other configurable integrated circuit or chip set), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general purpose processor or computer, and a programmable microprocessor chip or chip set.
Throughout this disclosure including in the claims, the expressions “audio processor” and “audio processing unit” are used interchangeably, and in a broad sense, to denote a system configured to process audio data. Examples of audio processing units include, but are not limited to, encoders (e.g., transcoders), decoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools).
Throughout this disclosure including in the claims, the expression “metadata” refers to separate and different data from corresponding audio data (audio content of a bitstream which also includes metadata). Metadata is associated with audio data, and indicates at least one feature or characteristic of the audio data (e.g., what type(s) of processing have already been performed, or should be performed, on the audio data, or the trajectory of an object indicated by the audio data). The association of the metadata with the audio data is time-synchronous. Thus, present (most recently received or updated) metadata may indicate that the corresponding audio data contemporaneously has an indicated feature and/or comprises the results of an indicated type of audio data processing.
Throughout this disclosure including in the claims, the term “couples” or “coupled” is used to mean either a direct or indirect connection. Thus, if a first device couples to a second device, that connection may be through a direct connection, or through an indirect connection via other devices and connections.
Throughout this disclosure including in the claims, the following expressions have the following definitions:
speaker and loudspeaker are used synonymously to denote any sound-emitting transducer. This definition includes loudspeakers implemented as multiple transducers (e.g., woofer and tweeter);
speaker feed: an audio signal to be applied directly to a loudspeaker, or an audio signal that is to be applied to an amplifier and loudspeaker in series;
channel (or “audio channel”): a monophonic audio signal. Such a signal can typically be rendered in such a way as to be equivalent to application of the signal directly to a loudspeaker at a desired or nominal position. The desired position can be static, as is typically the case with physical loudspeakers, or dynamic;
audio program: a set of one or more audio channels (at least one speaker channel and/or at least one object channel) and optionally also associated metadata (e.g., metadata that describes a desired spatial audio presentation);
speaker channel (or “speaker-feed channel”): an audio channel that is associated with a named loudspeaker (at a desired or nominal position), or with a named speaker zone within a defined speaker configuration. A speaker channel is rendered in such a way as to be equivalent to application of the audio signal directly to the named loudspeaker (at the desired or nominal position) or to a speaker in the named speaker zone;
object channel: an audio channel indicative of sound emitted by an audio source (sometimes referred to as an audio “object”). Typically, an object channel determines a parametric audio source description (e.g., metadata indicative of the parametric audio source description is included in or provided with the object channel). The source description may determine sound emitted by the source (as a function of time), the apparent position (e.g., 3D spatial coordinates) of the source as a function of time, and optionally at least one additional parameter (e.g., apparent source size or width) characterizing the source; and
render: the process of converting an audio program into one or more speaker feeds, or the process of converting an audio program into one or more speaker feeds and converting the speaker feed(s) to sound using one or more loudspeakers. An audio channel can be trivially rendered (“at” a desired position) by applying the signal directly to a physical loudspeaker at the desired position, or one or more audio channels can be rendered using one of a variety of virtualization techniques designed to be substantially equivalent (for the listener) to such trivial rendering. In this latter case, each audio channel may be converted to one or more speaker feeds to be applied to loudspeaker(s) in known locations, which are in general different from the desired position, such that sound emitted by the loudspeaker(s) in response to the feed(s) will be perceived as emitting from the desired position. Examples of such virtualization techniques include binaural rendering via headphones (e.g., using Dolby Headphone processing which simulates up to 7.1 channels of surround sound for the headphone wearer) and wave field synthesis.
Embodiments of systems configured to implement the inventive method will be described with reference to
The adaptive post-filter of
In response to the bit allocation values (each indicative of the number of mantissa bits of at least one of the decoded frequency components Y[k], and thus indicative of (and corresponding to) a signal to quantization noise ratio, SQNR[k], for each corresponding decoded frequency component Y[k]), gain calculation subsystem 23 is configured to determine a quantization noise filter gain value, α[k], for each decoded frequency component, Y[k]. The adaptive quantization noise filter gain values α[k] determine a degree of quantization noise filtering to be applied to each decoded frequency component, Y[k].
Parsing subsystem 20 of the
Parsing subsystem 20 is coupled and configured to parse from the delivered bitstream the audio data indicative of the program's audio content (and typically also metadata corresponding to the audio data) and to assert the audio data (and typically also the metadata) to decoding subsystem 21. Parsing subsystem 20 is also coupled and configured to parse from the delivered bitstream the coefficients of the non-adaptive post-filter to be applied to a decoded version of the audio data (by subsystem 24) and to assert these filter coefficients to subsystem 24. The non-adaptive post-filter coefficients asserted to subsystem 24 may be the coefficients “b[j]” of equation (1) below (in the case that the non-adaptive post-filter is a finite impulse response (FIR) filter, so that the coefficients “a[j]” of equation (1) are all equal to zero), or they may be the coefficients “a[j]” and “b[j]” of equation (1) in the case that the non-adaptive post-filter is an infinite impulse response (IIR) filter.
In some embodiments, the delivered bitstream does not include the bit allocation values employed by filter gain calculation subsystem 23 (each indicative of the number of mantissa bits of at least one corresponding encoded audio data sample) to generate the adaptive quantization noise filter gain values, α[k]. In these embodiments, bit allocation subsystem 22 is coupled and configured to generate the bit allocation values (each of which may be the number of mantissa bits of a corresponding frequency domain audio sample in each of at least one of the frequency bands) from the bitstream's encoded audio data. In these embodiments, the bitstream's encoded audio data (or the encoded mantissas thereof) are asserted to subsystem 22 from subsystem 20, and subsystem 22 is configured to generate the bit allocation values in response thereto and to assert the generated bit allocation values to decoding subsystem 21 and filter gain calculation subsystem 23.
In some implementations of the
In implementations of the
Alternatively, the bitstream delivered to parsing subsystem 20 includes the bit allocation values (i.e., they are included as metadata indicative of the number of mantissa bits of corresponding audio data) required by filter gain calculation subsystem 23 (or by decoding subsystem 21 and filter gain calculation subsystem 23). In such alternative embodiments, bit allocation subsystem 22 is typically omitted, and parsing subsystem 20 is coupled and configured to parse the bit allocation values from the delivered bitstream and to assert the bit allocation values directly to subsystems 21 and 23.
Decoding subsystem 21 is configured to decode the encoded, frequency domain audio data of the bitstream. In typical implementations, the decoding includes steps of performing on the encoded audio data the inverse of each encoding operation (e.g., entropy coding and quantization) that had been performed (in an encoder) to generate the encoded audio data, typically using the above-mentioned bit allocation values. As a result, subsystem 21 generates (and asserts to multiplication element 25) a decoded audio signal. The decoded audio signal is indicative of a sequence of decoded frequency components Y[k], where the index k identifies the frequency band corresponding to each component Y[k], and thus the decoded audio signal will sometimes be referred to simply as the decoded frequency components Y[k].
The subsystem comprising elements 23, 24, 25, 26, and 27 (connected as shown, and which implement an embodiment of the inventive quantization noise filter) is configured to perform adaptive post-filtering on the decoded frequency components Y[k], sometimes referred to herein as the decoded spectrum, to generate:
a non-adaptively filtered audio signal indicative of a sequence of non-adaptively filtered values, Y′[k] (given by equation (1) below), for each of the frequency bands. The non-adaptively filtered signal is asserted at the output of subsystem 24; and
a quantization noise filtered audio signal indicative of a sequence of adaptively quantization noise filtered values, Z[k] (given by equation (2) below), for each of the frequency bands. The quantization noise filtered signal is asserted at the output of multiplication element 27.
Transform subsystem 31 is coupled and configured to perform a frequency-to-time domain transformation on the quantization noise filtered signal to generate a time-domain quantization noise filtered signal indicative of a sequence of audio samples z[n].
As noted, the non-adaptive post-filter applied by non-adaptive post-filter subsystem 24 to the decoded frequency components, Y[k], is typically determined by filter coefficients which are generated in the encoder, included in the bitstream delivered to the decoder, and parsed from the bitstream (and asserted to subsystem 24) by subsystem 20 of the decoder. In a typical class of implementations, the non-adaptive filter coefficients are the “a[j]” and “b[j]” coefficients of the following equation (“equation (1)”), and subsystem 24 applies the non-adaptive post-filter to generate the non-adaptively filtered components Y′[k] of the non-adaptively filtered signal such that they satisfy equation (1):
In equation (1), “M” and “0” denote feedback filter order and feedforward filter order.
In the case that the non-adaptive post-filter is a finite impulse response (FIR) filter, so that the “a[j]” coefficients of equation (1) are all equal to zero, the non-adaptive post-filter coefficients asserted (from subsystem 20) to subsystem 24 consist only of the “b[j]” coefficients of equation (1). In the case that the non-adaptive post-filter is an infinite impulse response (IIR) filter, the non-adaptive post-filter coefficients asserted (from subsystem 20) to subsystem 24 may be the “a[j]” and “b[j]” coefficients of equation (1).
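The FIR and IIR cases described above can be illustrated with the following sketch, which filters a decoded spectrum along the frequency index k. It is offered under stated assumptions only: the function name `post_filter`, the sign convention of the feedback term, and the treatment of out-of-range indices as zero are all assumptions of this sketch and are not taken from equation (1).

```python
def post_filter(y, b, a=()):
    """Filter the decoded spectrum y along the frequency index k.

    b holds the feedforward coefficients b[0], b[1], ...
    a holds the feedback coefficients a[1], a[2], ... (so a[0] in this
    sequence is the coefficient for lag 1); an empty a gives an FIR filter.

    Assumed form (not taken from the source):
        y_f[k] = sum_j b[j] * y[k - j] - sum_j a[j] * y_f[k - j]
    Terms whose index k - j is negative are treated as zero.
    """
    out = []
    for k in range(len(y)):
        acc = sum(b[j] * y[k - j] for j in range(len(b)) if k - j >= 0)
        acc -= sum(a[j - 1] * out[k - j]
                   for j in range(1, len(a) + 1) if k - j >= 0)
        out.append(acc)
    return out


# FIR case: all feedback coefficients are zero (a is empty).
fir_out = post_filter([1.0, 1.0, 1.0], b=[0.5, 0.5])

# IIR case: one feedback coefficient.
iir_out = post_filter([1.0, 0.0, 0.0], b=[1.0], a=[-0.5])
```

In the FIR case only the “b[j]” coefficients are needed, which matches the statement above that only those coefficients are asserted to subsystem 24 in that case.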
Elements 23, 26, 25, and 27 are configured to generate the final (adaptively quantization noise filtered) spectrum Z[k] for each time segment of the bitstream as an adaptively varied linear combination of the non-filtered decoded spectrum Y[k] and the non-adaptively post-filtered spectrum Y′[k] for the time segment, for all the frequency bands k. Each combination of a value (Y′[k]) of the non-adaptively filtered decoded signal, and the corresponding value (Y[k]) of the non-filtered decoded signal, is adaptively controlled by a corresponding one of the quantization noise filter gain values, α[k], which is in turn determined by a corresponding one of the above-mentioned bit allocation values. Frequency bands with coarse quantization (poor signal to quantization noise ratio) will have α[k] close to 0, while frequency bands with finer quantization (better signal to quantization noise ratio) will have α[k] close to 1.
In a typical implementation, the quantization noise filtered signal Z[k] (for each segment of the decoded audio content) is generated from the non-filtered, decoded signal Y[k] (output from subsystem 21 for the same segment of the decoded audio content), and the non-adaptively post-filtered version Y′[k] of the signal Y[k] (output from subsystem 24 for the same segment of the decoded audio content), as follows:
Z[k]=α[k]Y[k]+(1−α[k])Y′[k] (2)
where the value α[k], for each frequency band k of each time segment of the bitstream, is the adaptive quantization noise filter gain value for the decoded frequency component, Y[k], for the same band k and the same time segment of the bitstream.
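Equation (2) can be sketched per frequency band as follows. The function name `blend` and the use of plain lists for Y[k], Y′[k], and α[k] are illustrative assumptions.

```python
def blend(y, y_filtered, alpha):
    """Apply equation (2): Z[k] = alpha[k]*Y[k] + (1 - alpha[k])*Y'[k]."""
    if not (len(y) == len(y_filtered) == len(alpha)):
        raise ValueError("all inputs must have one value per frequency band")
    return [a * u + (1.0 - a) * v for u, v, a in zip(y, y_filtered, alpha)]


# alpha = 1 passes Y[k] through, alpha = 0 selects Y'[k],
# and alpha = 0.5 averages the two.
z = blend(y=[1.0, 2.0, 4.0], y_filtered=[0.0, 0.0, 0.0], alpha=[1.0, 0.5, 0.0])
```

Because the blend is computed independently for each band, a band with fine quantization can pass through unfiltered while a neighboring band with coarse quantization receives the full non-adaptive post-filter output.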
With reference to
Typically, subsystem 23 is configured to determine the quantization noise filter gain value α[k] for each decoded frequency component Y[k] from the corresponding bit allocation value (i.e., the bit allocation value for the same frequency band, k, and segment of the bitstream), by mapping the bit allocation value to the filter gain value in accordance with a predetermined non-decreasing function (typically having range from 0 to 1 inclusive) of the bit allocation value. Each of the bit allocation values is indicative of the number of mantissa bits of each of the decoded frequency components Y[k], in the relevant frequency band k and time segment of the bitstream, and thus is indicative of (and corresponds to) a signal to quantization noise ratio, SQNR[k], for each corresponding decoded frequency component Y[k].
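One possible non-decreasing mapping from a bit allocation value to a filter gain value is sketched below. The linear ramp, the function name `gain_from_bits`, and the parameter `max_bits` (with its default of 16) are hypothetical choices made for illustration; any non-decreasing function with range from 0 to 1 inclusive would fit the description above.

```python
def gain_from_bits(bits: int, max_bits: int = 16) -> float:
    """Map a bit allocation value to a gain alpha in [0, 1].

    Hypothetical mapping: a linear ramp clamped to [0, 1], so that more
    mantissa bits (better SQNR) never yield a smaller gain.
    """
    if max_bits <= 0:
        raise ValueError("max_bits must be positive")
    return min(1.0, max(0.0, bits / max_bits))


# Gains for bands with 0, 4, 8, and 16 mantissa bits.
alphas = [gain_from_bits(n) for n in (0, 4, 8, 16)]
```

With this choice, a band that received no mantissa bits gets α[k]=0 (full filtering) and a band at the assumed maximum gets α[k]=1 (no filtering).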
Each of the adaptive quantization noise filter gain values, α[k], determines a degree of quantization noise filtering applied to each decoded frequency component, Y[k], as indicated in equation (2). For example, when α[k]=0 (which occurs when the signal to quantization noise ratio, SQNR[k], has its lowest value, indicating coarse quantization in the encoder), the value Z[k]=Y′[k] is output from element 27 so that full quantization noise filtering is applied to each corresponding decoded frequency component, Y[k]. For another example, when α[k]=1 (which occurs when the signal to quantization noise ratio, SQNR[k], has its highest value, indicating fine quantization in the encoder), the value Z[k]=Y[k] is output from element 27 so that no quantization noise filtering is applied to each corresponding decoded frequency component, Y[k].
As noted, the decoder of
As shown in
Bit allocation subsystem 45 is coupled and configured to generate bit allocation values for use by coding subsystem 42 in response to the frequency components X[k]. In a typical implementation, each of the bit allocation values is the number of mantissa bits of a corresponding one of the components (frequency domain audio samples), X[k]. Subsystem 45 is coupled and configured to assert the generated bit allocation values to coding subsystem 42, decoding subsystem 44, and filter coefficient calculation subsystem 47.
In typical implementations, the encoding operations performed by coding subsystem 42 include entropy coding and quantization of the frequency domain audio samples. The quantization typically quantizes a mantissa value of each audio sample to a number of bits determined by a corresponding one of the bit allocation values from subsystem 45.
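The relationship between the bit allocation value and the resulting quantization error may be illustrated with a simple uniform quantizer. This is a sketch under stated assumptions and is not the quantizer of any particular codec: the mantissa is assumed to lie in the interval [−1, 1), the function name is hypothetical, and entropy coding is omitted.

```python
def quantize_mantissa(mantissa, bits):
    """Uniformly quantize a mantissa in [-1, 1) to the given number of bits.

    Assumption: a plain two's-complement style uniform quantizer. A band
    allocated zero bits is reconstructed as 0.0.
    """
    if bits <= 0:
        return 0.0
    levels = 2 ** (bits - 1)                   # steps per unit of amplitude
    q = round(mantissa * levels)               # nearest integer code
    q = max(-levels, min(levels - 1, q))       # clamp to representable codes
    return q / levels


coarse = quantize_mantissa(0.3, 3)   # few mantissa bits, larger error
fine = quantize_mantissa(0.3, 8)     # more mantissa bits, smaller error
```

A smaller bit allocation value yields a larger reconstruction error, which is why the bit allocation value is indicative of the signal to quantization noise ratio.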
In some implementations of the
The encoded audio data output from subsystem 42 are decoded in decoding subsystem 44 (in the same manner as they would be decoded by decoding subsystem 21 of the
In some implementations of the
An example of application of the inventive adaptive post-filter will be described with reference to
The waveforms plotted in
We next describe an example of a method for determining the non-adaptive post-filter (e.g., the filter applied by subsystem 24 of the
The solution to the above expression is the set of filter coefficients “b[j]” that satisfy the following equation (3), which is a normal equation:
As is apparent from equation (3), to determine the non-adaptive filter coefficients “b[j]” which satisfy equation (3), the inverse of the autocorrelation matrix of the quantized spectrum Y[k] is multiplied by the cross correlation matrix between the original spectrum X[k] and the quantized spectrum Y[k]. In order to account for the adaptive application of the non-adaptive filter in accordance with the invention, the autocorrelation and cross correlation matrices in equation (3) are weighted as shown in equations (4) and (5) respectively:
It should be noted that equations (4) and (5) assume real signals.
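A weighted least-squares solve of this kind may be sketched as follows for real signals. The sketch makes several assumptions that are not taken from the description: the filter is assumed to act across frequency as Y′[k] = Σ_j b[j]·Y[k−j], the weighted autocorrelation and cross correlation are assumed to take the forms R[i][j] = Σ_k w[k]·Y[k−i]·Y[k−j] and p[i] = Σ_k w[k]·X[k]·Y[k−i], and all function names and data values are hypothetical.

```python
def weighted_normal_equations(x, y, w, order):
    """Build the weighted autocorrelation matrix R and cross correlation p.

    Sums start at k = order - 1 so that every index k - i is valid.
    """
    R = [[0.0] * order for _ in range(order)]
    p = [0.0] * order
    for k in range(order - 1, len(x)):
        for i in range(order):
            p[i] += w[k] * x[k] * y[k - i]
            for j in range(order):
                R[i][j] += w[k] * y[k - i] * y[k - j]
    return R, p


def solve_linear(A, rhs):
    """Solve A b = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("autocorrelation matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))
        b[r] = s / M[r][r]
    return b


# Hypothetical data: X is an exact two-tap filtering of Y, so the solve
# should recover the taps regardless of the (positive) weights.
true_b = [0.8, 0.1]
y = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5]
x = [true_b[0] * y[k] + (true_b[1] * y[k - 1] if k > 0 else 0.0)
     for k in range(len(y))]
w = [1.0, 2.0, 1.0, 4.0, 1.0, 2.0]

R, p = weighted_normal_equations(x, y, w, order=2)
b = solve_linear(R, p)
```

In this sketch, larger weights cause the corresponding bands to contribute more to both correlation sums, so the fitted coefficients favor accuracy in those bands.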
In equations (4) and (5), the weighting value w[k] for each frequency band k is chosen as follows when the adaptive application of the non-adaptive filter in accordance with the invention is performed as described above with reference to equation (2), so that the quantization noise filtered signal Z[k] (for each segment of the decoded audio content) is generated from the non-filtered, decoded signal Y[k] (for the same segment of the decoded audio content) and the non-adaptively post-filtered version Y′[k] of the signal Y[k] (for the same segment of the decoded audio content) as: Z[k] = α[k] Y[k] + (1−α[k]) Y′[k], where the value α[k], for each frequency band k of each time segment of the bitstream, is an adaptive quantization noise filter gain value for the decoded frequency component, Y[k], for the same band k and the same time segment of the bitstream. In this case:
each filter gain value α[k] is determined from a corresponding signal to quantization noise value, SQNR[k], for the same band, by mapping the signal to quantization noise value to the filter gain value in accordance with a predetermined non-decreasing function (typically having range from 0 to 1 inclusive) of the signal to quantization noise value; and
the weighting value w[k] for the band k is determined by mapping the signal to quantization noise value, SQNR[k], for the band to the weighting value w[k] in accordance with the inverse of the predetermined non-decreasing function noted in the previous paragraph.
For example, if each filter gain value α[k] is proportional to the corresponding signal to quantization noise value SQNR[k], then the corresponding weighting value w[k] may be the inverse of the corresponding filter gain value α[k], so that w[k]·α[k] =1. Thus, relatively lower values of SQNR[k] correspond to relatively lower bit allocation values (relatively smaller numbers of mantissa bits per sample), relatively lower filter gain values α[k], and relatively larger values of w[k].
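The reciprocal relationship in this example may be sketched as follows. The function name and the small floor used to avoid division by zero when α[k]=0 are assumptions added for this sketch and are not part of the description.

```python
def weights_from_gains(alpha, floor=1e-3):
    """Return w[k] = 1 / alpha[k], so that w[k] * alpha[k] = 1.

    Assumption: gains below `floor` are raised to `floor` so that a gain
    of zero does not cause a division by zero.
    """
    return [1.0 / max(a, floor) for a in alpha]


# Hypothetical gains for three bands, from coarse to fine quantization.
alpha = [0.25, 0.5, 1.0]
w = weights_from_gains(alpha)
```

The band with the lowest gain, which corresponds to the coarsest quantization, receives the largest weight.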
As noted above, when bit allocation values (e.g., those output from subsystem 45 of the
Another aspect of the invention is a system including a decoder (or post-filter) configured to perform any embodiment of the inventive method on a decoded version of encoded audio data, and an encoder configured to generate the encoded audio data. The
The system of
The system of
Another aspect of the invention is a method (e.g., a method performed by decoder 92 of
The invention may be implemented in hardware, firmware, or software, or a combination thereof (e.g., as a programmable logic array). Unless otherwise specified, the algorithms or processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems (e.g., a computer system which implements the decoder of
Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
For example, when implemented by computer software instruction sequences, various functions and steps of embodiments of the invention may be implemented by multithreaded software instruction sequences running in suitable digital signal processing hardware, in which case the various devices, steps, and functions of the embodiments may correspond to portions of the software instructions.
Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be implemented as a computer-readable storage medium, configured with (i.e., storing) a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Numerous modifications and variations of the present invention are possible in light of the above teachings. It is to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.
This application claims the benefit of priority to U.S. Provisional Application Ser. No. 61/918,076, filed on Dec. 19, 2013, which is incorporated herein by reference in its entirety.
Number | Date | Country |
---|---|---|
20150179182 A1 | Jun 2015 | US |

Number | Date | Country |
---|---|---|
61918076 | Dec 2013 | US |