This disclosure relates to signal processing.
Despite the increased capacity of memory devices and widely available data delivery at increasingly high bandwidths, there is continued pressure to minimize the amount of data to be stored and/or transmitted. For example, audio and video data are often delivered together, and the bandwidth for audio data is often constrained by the requirements of the video portion.
Accordingly, audio data are often encoded at high compression factors, sometimes at compression factors of 30:1 or higher. Because signal distortion increases with the amount of applied compression, trade-offs may be made between the fidelity of the decoded audio data and the efficiency of storing and/or transmitting the encoded data.
Moreover, it is desirable to reduce the complexity of the encoding and decoding algorithms. Encoding additional data regarding the encoding process can simplify the decoding process, but at the cost of storing and/or transmitting additional encoded data. Although existing data encoding and decoding methods are generally satisfactory, improved methods would be desirable.
Some aspects of the subject matter described in this disclosure can be implemented in signal processing methods and devices, including encoding and decoding methods and devices. Some such methods may involve receiving a signal and analyzing the signal to determine parameter values of an N-dimensional parameter set. As used herein, the phrase “N-dimensional parameter set” refers to a parameter set wherein each parameter is indexed in N dimensions.
In some implementations, the signal may include audio data. According to some such implementations, the dimensions may correspond to channels, frequency bands, time units (e.g., blocks), etc. In some implementations, parameters of the parameter set may include correlation coefficients between individual discrete channels and a coupling channel. These correlation coefficients may be referred to herein as “alphas.” Alternatively, or additionally, parameters of the parameter set may include inter-channel correlation coefficients that indicate a correlation between pairs of individual discrete channels. Such parameters may sometimes be referred to herein as reflecting “inter-channel coherence” or “ICC.” However, the signal processing methods and devices described herein are not limited to dimensions and parameters of audio data, but instead have wide applicability.
Some implementations involve applying a first vector quantization process to two or more parameter values along a first dimension of the N-dimensional parameter set to produce a first set of quantized values. Such implementations may involve calculating two or more parameter prediction values along a second dimension of the N-dimensional parameter set based, at least in part, on one or more values of the first set of quantized values. The implementations may involve calculating prediction residual values based, at least in part, on the parameter prediction values and applying a second vector quantization process to the prediction residual values to produce a second set of quantized values.
Some such implementations may involve determining a first vector quantization index corresponding to the first set of quantized values and determining a second vector quantization index corresponding to the second set of quantized values. The first and second quantization indices may, for example, include pointers to data structure locations at which the first and second sets of quantized values, respectively, are stored.
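The two-stage process described above may be illustrated by the following minimal sketch. It assumes a parameter set indexed by frequency band and channel, an identity predictor (the prediction for the second band is simply the quantized first band) and a nearest-neighbor codebook search under squared error. The codebooks, the predictor and all function names are hypothetical and are provided only for illustration.

```python
def nearest_index(codebook, vector):
    """Return the index of the codeword with the smallest squared error."""
    def squared_error(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vector))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))

# Hypothetical codebooks: each index points at a stored set of quantized values.
FIRST_CODEBOOK = [(0.2, 0.2), (0.5, 0.5), (0.8, 0.8), (0.8, 0.4)]
RESIDUAL_CODEBOOK = [(0.0, 0.0), (0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]

def encode_two_stage(params):
    """params[band][channel] -> (first index, second index)."""
    # Stage 1: vector quantize the values along the first dimension (channels).
    first_index = nearest_index(FIRST_CODEBOOK, params[0])
    first_quantized = FIRST_CODEBOOK[first_index]
    # Predict along the second dimension (bands) from the quantized values.
    prediction = first_quantized  # assumption: identity predictor
    # Stage 2: vector quantize the prediction residual.
    residual = [p - q for p, q in zip(params[1], prediction)]
    second_index = nearest_index(RESIDUAL_CODEBOOK, residual)
    return first_index, second_index

params = [[0.78, 0.45], [0.9, 0.5]]  # two bands, two channels
indices = encode_two_stage(params)
```

In this sketch only the two indices would need to be stored or transmitted, because a decoder holding the same codebooks and predictor can rebuild both sets of values.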
Some implementations may involve calculating two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more values of one or more of (k−1) previously produced sets of quantized values, calculating prediction residual values based at least in part on the parameter prediction values along the kth dimension and applying a kth vector quantization process to the prediction residual values along the kth dimension to produce a kth set of quantized values.
Some such implementations may involve determining a maximum vector quantizer length Mk for dimension k and determining that a number of values Vk to be vector quantized exceeds Mk. Such implementations may involve determining Vk−Mk remaining values to be vector quantized and predicting, based at least in part on at least one of the Mk quantized values, Vk−Mk parameter prediction values along the kth dimension. The implementations may involve calculating (Vk−Mk) kth dimension prediction residual values and performing a vector quantization process for the (Vk−Mk) kth dimension prediction residual values to produce Vk−Mk quantized values of the kth parameter set.
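The handling of a vector that exceeds the maximum vector quantizer length may be sketched as follows, for Vk = 3 and Mk = 2. The sketch assumes that each remaining value is predicted from the last of the Mk quantized values; the codebooks, the predictor and the function names are hypothetical.

```python
def nearest(codebook, vector):
    """Return the codeword with the smallest squared error."""
    return min(codebook,
               key=lambda cw: sum((c - v) ** 2 for c, v in zip(cw, vector)))

# Hypothetical codebooks for the first Mk values and for the residuals.
HEAD_CODEBOOK = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75)]
RESIDUAL_CODEBOOK = [(-0.1,), (0.0,), (0.1,)]

def quantize_long_vector(values, max_len):
    """Quantize Vk values with a quantizer limited to max_len (Mk) values."""
    if len(values) <= max_len:
        raise ValueError("this sketch only covers the case Vk > Mk")
    # Vector quantize the first Mk values directly.
    head = list(nearest(HEAD_CODEBOOK, values[:max_len]))
    # Predict the Vk - Mk remaining values from the quantized values.
    remaining = values[max_len:]
    prediction = [head[-1]] * len(remaining)  # assumption: repeat last value
    # Vector quantize the prediction residuals of the remaining values.
    residual = [v - p for v, p in zip(remaining, prediction)]
    quantized_residual = nearest(RESIDUAL_CODEBOOK, residual)
    tail = [p + r for p, r in zip(prediction, quantized_residual)]
    return head + tail

reconstruction = quantize_long_vector([0.52, 0.47, 0.61], max_len=2)
```

A smaller Mk keeps each codebook short at the cost of additional indices, which is consistent with Mk acting as a control on the parameter bit-rate.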
According to some implementations, a method may involve receiving a signal and analyzing the signal to determine parameter values of an N-dimensional parameter set. In some implementations, the signal may include audio data. The method may involve applying a first vector quantization process to two or more parameter values along a first dimension of the N-dimensional parameter set to produce a first set of quantized values and calculating two or more parameter prediction values along a second dimension of the N-dimensional parameter set based, at least in part, on one or more values of the first set of quantized values. The method may involve calculating prediction residual values based, at least in part, on the parameter prediction values and applying a second vector quantization process to the prediction residual values to produce a second set of quantized values. A distortion metric used to design the vector quantizers, or used in the codebook search when the vector quantization processes are performed, may be a mean squared error distortion metric.
The method may involve determining a first vector quantization index corresponding to the first set of quantized values and determining a second vector quantization index corresponding to the second set of quantized values. The first and second quantization indices may comprise pointers to data structure locations at which the first and second sets of quantized values, respectively, are stored.
The method may involve calculating two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more values of one or more of (k−1) previously produced sets of quantized values, calculating prediction residual values based at least in part on the parameter prediction values along the kth dimension and applying a kth vector quantization process to the prediction residual values along the kth dimension to produce a kth set of quantized values.
The method may involve the following operations: determining a maximum vector quantizer length Mk for dimension k; determining that a number of values Vk to be vector quantized exceeds Mk; determining Vk−Mk remaining values to be vector quantized; predicting, based at least in part on at least one of the Mk quantized values, Vk−Mk parameter prediction values along the kth dimension; calculating (Vk−Mk) kth dimension prediction residual values; and performing a vector quantization process for the (Vk−Mk) kth dimension prediction residual values to produce Vk−Mk quantized values of the kth parameter set.
Determining the maximum vector quantizer length Mk may involve receiving an indication of the maximum vector quantizer length Mk from a user. The maximum vector quantizer length Mk may be a variable that controls a bit-rate for encoding parameters and may be determined based, at least in part, on an available bit-rate for parameter encoding.
The method may involve forming the parameter set into partitions of the parameter set in a signal-adaptive manner. In some implementations, the analyzing, applying and calculating processes may be applied separately on each partition of the parameter set. The forming process may vary in time.
The dimensions may include channels and/or frequency bands. The dimensions may include time blocks. The parameter values may include spatial parameter values. For example, the spatial parameter values may include correlation coefficients (“alpha values”) between individual discrete channels and a coupling channel. The prediction of an alpha value for a kth stage of the method may involve a reconstruction of an alpha value of a (k−1)th stage of the method.
The frequency bands may include coupling channel frequency bands. The alpha values may be shared across at least some adjacent time blocks. The method may involve performing a windowed calculation of alphas across at least one of time blocks or frequency bands.
The dimensions may include pairs of individual discrete channels. The parameter values may include inter-channel correlation coefficients (“ICCs”) that indicate a correlation between the pairs of individual discrete channels. The first dimension may correspond to pairs of individual discrete channels. The first vector quantization process may produce first quantized ICC values. For example, the first vector quantization may involve the following processes: quantizing a vector that includes ICCs of Mp−1 channel pairs in an Mp-channel-pair cycle, to produce quantized values of the Mp−1 ICCs; calculating a range in which the Mpth ICC lies based, at least in part, on the quantized values of the Mp−1 ICCs; and quantizing the Mpth ICC with a scalar quantizer, conditioned on the calculated range.
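The range-conditioned scalar quantization may be sketched for the simplest case of a three-channel-pair cycle (pairs A-B, B-C and A-C). The sketch assumes real-valued correlation coefficients, for which the third ICC is bounded because the 3×3 correlation matrix must be positive semidefinite. The uniform scalar quantizer, the number of levels and the function names are hypothetical.

```python
import math

def third_icc_range(icc_ab, icc_bc):
    """Range of the A-C ICC given quantized A-B and B-C ICCs (real-valued)."""
    centre = icc_ab * icc_bc
    spread = math.sqrt((1.0 - icc_ab ** 2) * (1.0 - icc_bc ** 2))
    return centre - spread, centre + spread

def quantize_in_range(value, low, high, levels=8):
    """Uniform scalar quantizer whose levels span only the allowed range."""
    if high <= low:
        return 0, low  # degenerate range: the value is fully determined
    step = (high - low) / (levels - 1)
    index = round((value - low) / step)
    index = max(0, min(levels - 1, index))
    return index, low + index * step

# Quantized ICCs of the first two pairs, e.g. outputs of a vector quantizer.
low, high = third_icc_range(0.8, 0.6)
index, icc_ac_quantized = quantize_in_range(0.7, low, high)
```

Because the quantizer levels cover only the admissible range rather than the full interval from −1 to 1, the same number of levels yields a finer step in this sketch.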
According to some alternative implementations, a method may involve receiving a signal comprising first and second vector quantization indices and performing a first inverse vector quantization operation in response to the first vector quantization index to reconstruct two or more parameter values along a first dimension of an N-dimensional parameter set. The method may involve determining two or more parameter prediction values of a second dimension of the N-dimensional parameter set based at least in part on one or more of the two or more parameter values of the first dimension of the N-dimensional parameter set, performing a second inverse vector quantization operation in response to the second vector quantization index to reconstruct two or more prediction residual values of the second dimension and combining the parameter prediction values of the second dimension with the prediction residual values of the second dimension to reconstruct two or more parameter values of the second dimension.
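The decoding operations described above may be illustrated by the following minimal sketch. It assumes that each vector quantization index is a position in a stored codebook and that the predictor is the identity (the prediction for the second dimension equals the reconstructed values of the first). The codebooks and the function name are hypothetical.

```python
# Hypothetical codebooks: each index identifies a stored set of values.
FIRST_CODEBOOK = [(0.2, 0.2), (0.5, 0.5), (0.8, 0.8), (0.8, 0.4)]
RESIDUAL_CODEBOOK = [(0.0, 0.0), (0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]

def decode_two_stage(first_index, second_index):
    """Reconstruct the parameter values of two dimensions from two indices."""
    # First inverse vector quantization: a table lookup.
    first_values = list(FIRST_CODEBOOK[first_index])
    # Predict the second-dimension values from the reconstructed values.
    prediction = first_values  # assumption: identity predictor
    # Second inverse vector quantization yields the prediction residuals.
    residual = RESIDUAL_CODEBOOK[second_index]
    # Combine prediction and residual to reconstruct the second dimension.
    second_values = [p + r for p, r in zip(prediction, residual)]
    return first_values, second_values

first_values, second_values = decode_two_stage(3, 1)
```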
The method may involve the following processes: receiving a kth vector quantization index; determining two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more previously determined parameter values of a dimension less than k of the N-dimensional parameter set; performing a kth inverse vector quantization operation in response to the kth vector quantization index to reconstruct two or more prediction residual values of the kth dimension; and combining the parameter prediction values of the kth dimension with the prediction residual values of the kth dimension to reconstruct two or more parameter values of the kth dimension.
The method may involve the following processes: receiving an indication of a maximum vector quantizer length Mk for dimension k; determining that a remaining number of parameter values Vk to be reconstructed along dimension k exceeds Mk; reconstructing the first Mk values along dimension k based, at least in part, on the kth quantization index; determining, based at least in part on the kth quantization index, Vk−Mk parameter prediction values of the kth dimension; receiving an additional vector quantization index for the kth dimension; performing an inverse vector quantization operation, in response to the additional vector quantization index for the kth dimension, to reconstruct Vk−Mk prediction residual values of the kth dimension; and combining the Vk−Mk prediction residual values of the kth dimension with the Vk−Mk parameter prediction values of the kth dimension to reconstruct the remaining Vk−Mk parameter values of the kth dimension.
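The reconstruction of Vk values with a quantizer limited to Mk values may be sketched as follows, for Vk = 3 and Mk = 2. The sketch assumes that each remaining value is predicted from the last of the first Mk reconstructed values; the codebooks, the predictor and the function name are hypothetical.

```python
# Hypothetical codebooks for the first Mk values and for the residuals.
HEAD_CODEBOOK = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75)]
RESIDUAL_CODEBOOK = [(-0.1,), (0.0,), (0.1,)]

def decode_long_vector(kth_index, additional_index):
    """Reconstruct Vk values along dimension k from two indices."""
    # Reconstruct the first Mk values from the kth quantization index.
    head = list(HEAD_CODEBOOK[kth_index])
    # Inverse quantize the Vk - Mk prediction residuals.
    residual = RESIDUAL_CODEBOOK[additional_index]
    # Predict the remaining values from the values already reconstructed.
    prediction = [head[-1]] * len(residual)  # assumption: repeat last value
    # Combine prediction and residual to obtain the remaining values.
    tail = [p + r for p, r in zip(prediction, residual)]
    return head + tail

values = decode_long_vector(1, 2)
```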
According to some implementations, the first vector quantization index may correspond to a memory location of a first set of quantized values and the second vector quantization index may correspond to a memory location of a second set of quantized values.
The method may involve receiving parameter set partition information and implementing the performing and/or the determining steps according to the parameter set partition information.
The signal may include encoded audio data. The dimensions may include channels and frequency bands. The dimensions may include time blocks. The parameter values may be spatial parameter values. For example, the spatial parameter values may comprise correlation coefficients (“alpha values”) between individual discrete channels and a coupling channel. The frequency bands may include coupling channel frequency bands. In some implementations, the prediction of an alpha value for a kth stage of the method may involve a reconstruction of an alpha value of a (k−1)th stage of the method. In some examples, the alpha values may be shared across at least some adjacent time blocks.
The dimensions may include pairs of individual discrete channels. The parameter values may include inter-channel correlation coefficients (“ICCs”) that indicate a correlation between the pairs of individual discrete channels.
According to some implementations, an apparatus may include an interface and a logic system. The logic system may include at least one of a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The apparatus may include a memory device. The interface may be an interface between the logic system and the memory device. Alternatively, or additionally, the interface may include a network interface.
The logic system may be capable of receiving a signal via the interface. The logic system may be capable of analyzing the signal to determine parameter values of an N-dimensional parameter set and of applying a first vector quantization process to two or more parameter values along a first dimension of the N-dimensional parameter set to produce a first set of quantized values. The logic system may be capable of calculating two or more parameter prediction values along a second dimension of the N-dimensional parameter set based, at least in part, on one or more values of the first set of quantized values, calculating prediction residual values based, at least in part, on the parameter prediction values and applying a second vector quantization process to the prediction residual values to produce a second set of quantized values.
The logic system may be further capable of determining a first vector quantization index corresponding to the first set of quantized values and of determining a second vector quantization index corresponding to the second set of quantized values. The first and second quantization indices may comprise pointers to data structure locations at which the first and second sets of quantized values, respectively, are stored.
The logic system may be further capable of performing the following operations: calculating two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more values of one or more of (k−1) previously produced sets of quantized values; calculating prediction residual values based at least in part on the parameter prediction values along the kth dimension; and applying a kth vector quantization process to the prediction residual values along the kth dimension to produce a kth set of quantized values.
The logic system may be further capable of performing the following operations: determining a maximum vector quantizer length Mk for dimension k; determining that a number of values Vk to be vector quantized exceeds Mk; determining Vk−Mk remaining values to be vector quantized; predicting, based at least in part on at least one of the Mk quantized values, Vk−Mk parameter prediction values along the kth dimension; calculating (Vk−Mk) kth dimension prediction residual values; and performing a vector quantization process for the (Vk−Mk) kth dimension prediction residual values to produce Vk−Mk quantized values of the kth parameter set.
According to some implementations, an apparatus may include an interface and a logic system. The logic system may include at least one of a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The apparatus may include a memory device. The interface may be an interface between the logic system and the memory device. Alternatively, or additionally, the interface may include a network interface.
The logic system may be capable of receiving a signal, via the interface, that includes first and second vector quantization indices. In some implementations, the signal may include encoded audio data. The logic system may be capable of performing a first inverse vector quantization operation in response to the first vector quantization index to reconstruct two or more parameter values along a first dimension of an N-dimensional parameter set. The logic system may be capable of determining two or more parameter prediction values of a second dimension of the N-dimensional parameter set based at least in part on one or more of the two or more parameter values of the first dimension of the N-dimensional parameter set.
The logic system may be capable of performing a second inverse vector quantization operation in response to the second vector quantization index to reconstruct two or more prediction residual values of the second dimension. The logic system may be capable of combining the parameter prediction values of the second dimension with the prediction residual values of the second dimension to reconstruct two or more parameter values of the second dimension.
The logic system also may be capable of performing the following operations: receiving, via the interface, a kth vector quantization index; determining two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more previously determined parameter values of a dimension less than k of the N-dimensional parameter set; performing a kth inverse vector quantization operation in response to the kth vector quantization index to reconstruct two or more prediction residual values of the kth dimension; and combining the parameter prediction values of the kth dimension with the prediction residual values of the kth dimension to reconstruct two or more parameter values of the kth dimension.
The logic system may be further capable of receiving an indication of a maximum vector quantizer length Mk for dimension k, of determining that a remaining number of parameter values Vk to be reconstructed along dimension k exceeds Mk and of reconstructing the first Mk values along dimension k based, at least in part, on the kth quantization index. The logic system may be capable of determining, based at least in part on the kth quantization index, Vk−Mk parameter prediction values of the kth dimension. The logic system may be capable of receiving an additional vector quantization index for the kth dimension and of performing an inverse vector quantization operation, in response to the additional vector quantization index for the kth dimension, to reconstruct Vk−Mk prediction residual values of the kth dimension. The logic system may be capable of combining the Vk−Mk prediction residual values of the kth dimension with the Vk−Mk parameter prediction values of the kth dimension to reconstruct the remaining Vk−Mk parameter values of the kth dimension.
The first vector quantization index may correspond to a memory location of a first set of quantized values. The second vector quantization index may correspond to a memory location of a second set of quantized values. The logic system may be further capable of receiving parameter set partition information and of implementing the performing and determining steps according to the parameter set partition information.
In some implementations, an apparatus may include an interface and a logic system configured for performing at least some of the other methods described herein. The logic system may include at least one of a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The apparatus may include a memory device. In some implementations, the interface may be an interface between the logic system and the memory device. Alternatively, the interface may be a network interface.
Some aspects of this disclosure may be implemented via a non-transitory medium having software stored thereon. The software may include instructions for controlling at least one apparatus to perform the following operations: receive a signal; analyze the signal to determine parameter values of an N-dimensional parameter set; apply a first vector quantization process to two or more parameter values along a first dimension of the N-dimensional parameter set to produce a first set of quantized values; calculate two or more parameter prediction values along a second dimension of the N-dimensional parameter set based, at least in part, on one or more values of the first set of quantized values; calculate prediction residual values based, at least in part, on the parameter prediction values; and apply a second vector quantization process to the prediction residual values to produce a second set of quantized values.
The software may include instructions for controlling the at least one apparatus to determine a first vector quantization index corresponding to the first set of quantized values and to determine a second vector quantization index corresponding to the second set of quantized values. The first and second quantization indices may, for example, be pointers to data structure locations at which the first and second sets of quantized values, respectively, are stored.
The software may include instructions for controlling the at least one apparatus to perform the following operations: calculate two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more values of one or more of (k−1) previously produced sets of quantized values; calculate prediction residual values based at least in part on the parameter prediction values along the kth dimension; and apply a kth vector quantization process to the prediction residual values along the kth dimension, to produce a kth set of quantized values.
The software may include instructions for controlling the at least one apparatus to do the following: determine a maximum vector quantizer length Mk for dimension k; determine that a number of values Vk to be vector quantized exceeds Mk; determine Vk−Mk remaining values to be vector quantized; predict, based at least in part on at least one of the Mk quantized values, Vk−Mk parameter prediction values along the kth dimension; calculate (Vk−Mk) kth dimension prediction residual values; and perform a vector quantization process for the (Vk−Mk) kth dimension prediction residual values to produce Vk−Mk quantized values of the kth parameter set.
Other aspects of this disclosure also may be implemented via a non-transitory medium having software stored thereon. The software may include instructions for controlling at least one apparatus to perform the following operations: receive a signal comprising first and second vector quantization indices; perform a first inverse vector quantization operation in response to the first vector quantization index to reconstruct two or more parameter values along a first dimension of an N-dimensional parameter set; determine two or more parameter prediction values of a second dimension of the N-dimensional parameter set based at least in part on one or more of the two or more parameter values of the first dimension of the N-dimensional parameter set; perform a second inverse vector quantization operation in response to the second vector quantization index to reconstruct two or more prediction residual values of the second dimension; and combine the parameter prediction values of the second dimension with the prediction residual values of the second dimension to reconstruct two or more parameter values of the second dimension. In some implementations, the signal may include encoded audio data.
The software may include instructions for controlling the at least one apparatus to perform the following operations: receive a kth vector quantization index; determine two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more previously determined parameter values of a dimension less than k of the N-dimensional parameter set; perform a kth inverse vector quantization operation in response to the kth vector quantization index to reconstruct two or more prediction residual values of the kth dimension; and combine the parameter prediction values of the kth dimension with the prediction residual values of the kth dimension to reconstruct two or more parameter values of the kth dimension.
The software may include instructions for controlling the at least one apparatus to do the following: receive an indication of a maximum vector quantizer length Mk for dimension k; determine that a remaining number of parameter values Vk to be reconstructed along dimension k exceeds Mk; reconstruct the first Mk values along dimension k based, at least in part, on the kth quantization index; determine, based at least in part on the kth quantization index, Vk−Mk parameter prediction values of the kth dimension; receive an additional vector quantization index for the kth dimension; perform an inverse vector quantization operation, in response to the additional vector quantization index for the kth dimension, to reconstruct Vk−Mk prediction residual values of the kth dimension; and combine the Vk−Mk prediction residual values of the kth dimension with the Vk−Mk parameter prediction values of the kth dimension to reconstruct the remaining Vk−Mk parameter values of the kth dimension.
In some implementations, the first vector quantization index may correspond to a memory location of a first set of quantized values and the second vector quantization index may correspond to a memory location of a second set of quantized values. The software may include instructions for controlling the at least one apparatus to receive parameter set partition information and to implement the performing and determining steps according to the parameter set partition information.
Other aspects of this disclosure also may be implemented in a non-transitory medium having software stored thereon. The software may include instructions to control one or more devices to perform at least some of the methods described herein.
Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects and advantages will become apparent from the description, the drawings and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.
Like reference numbers and designations in the various drawings indicate like elements.
The following description is directed to certain implementations for the purposes of describing some innovative aspects of this disclosure, as well as examples of contexts in which these innovative aspects may be implemented. However, the teachings herein can be applied in various different ways.
It is generally desirable to minimize the amount of data to be stored and/or transmitted. Encoding additional data may simplify the decoding process and/or provide greater functionality for the decoder, but at the cost of storing and/or transmitting additional encoded data. Therefore, there are many contexts in which efficient data encoding can provide benefit. Although the examples provided in this application are primarily described in terms of audio data, the concepts provided herein apply to other types of data, including but not limited to video data, image data, speech data, sensor signals (e.g., signals from temperature sensors, pressure sensors, gyroscopes, accelerometers), etc. Moreover, the described implementations may be embodied in various signal processing devices, including but not limited to encoders and/or decoders, which may be included in theater reproduction systems, mobile telephones, smartphones, desktop computers, hand-held or portable computers, netbooks, notebooks, smartbooks, tablets, stereo systems, televisions, set-top boxes, receivers, including but not limited to audio and audio-visual receivers, home theater systems, DVD players, digital recording devices and a variety of other devices. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.
Some audio codecs, including the AC-3 and E-AC-3 audio codecs (proprietary implementations of which are licensed as “Dolby Digital” and “Dolby Digital Plus”), employ some form of channel coupling to exploit redundancies between channels, encode data more efficiently and reduce the coding bit-rate. For example, with the AC-3 and E-AC-3 codecs, in a coupling channel frequency range beyond a specific “coupling-begin frequency,” the modified discrete cosine transform (MDCT) coefficients of the discrete channels (also referred to herein as “individual channels”) are downmixed to a mono channel, which may be referred to herein as a “composite channel” or a “coupling channel.” Some codecs may form two or more coupling channels.
The AC-3 and E-AC-3 decoders upmix the mono signal of the coupling channel into the discrete channels using scale factors based on coupling coordinates sent in the bitstream. In this manner, the decoder restores a high frequency envelope, but not the phase, of the audio data in the coupling channel frequency range of each channel.
As shown in
Various implementations described herein may mitigate these effects, at least in part. Some such implementations involve novel audio encoding and/or decoding tools. For example, some such implementations may involve efficient encoding of parameters, such as spatial parameters, that may be used in a decorrelation process that can restore phase diversity of the output channels in frequency regions encoded by channel coupling.
Some audio processing systems described herein may be configured to determine one or more types of spatial parameters of audio data. Some such spatial parameters may be correlation coefficients between individual discrete channels and a coupling channel, which also may be referred to herein as “alphas.” Alphas also may be referred to herein as “mixing ratios.” For example, if the coupling channel includes audio data for four channels, there may be four alphas, one alpha for each channel. In some such implementations, the four channels may be the left channel (“L”), the right channel (“R”), the left surround channel (“Ls”) and the right surround channel (“Rs”). In some implementations, the coupling channel may include audio data for the above-described channels and a center channel. An alpha may or may not be calculated for the center channel, depending on whether the center channel will be decorrelated. Other implementations may involve a larger or smaller number of channels.
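The determination of one alpha per channel may be illustrated by the following minimal sketch. It assumes real-valued transform coefficients for a single frequency band, a coupling channel formed as a plain average of the four channels (an actual codec downmix may differ) and a normalized correlation as the correlation coefficient. The coefficient values and names are hypothetical.

```python
import math

def correlation(x, y):
    """Normalized correlation coefficient between two coefficient sequences."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator > 0.0 else 0.0

# Hypothetical transform coefficients of one band for four discrete channels.
channels = {
    "L":  [1.0, -0.5, 0.25, 0.8],
    "R":  [0.9, -0.4, 0.30, 0.7],
    "Ls": [0.2, 0.6, -0.7, 0.1],
    "Rs": [-0.3, 0.5, -0.6, 0.2],
}

# Assumption: the coupling channel is the average of the coupled channels.
coupling = [sum(values) / len(channels) for values in zip(*channels.values())]

# One alpha per channel: correlation between the channel and the coupling channel.
alphas = {name: correlation(coeffs, coupling) for name, coeffs in channels.items()}
```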
Other spatial parameters may be inter-channel correlation coefficients that indicate a correlation between pairs of individual discrete channels. Such parameters may sometimes be referred to herein as reflecting “inter-channel coherence” or “ICC.” In the four-channel example referenced above, there may be six ICC values involved, for the L-R pair, the L-Ls pair, the L-Rs pair, the R-Ls pair, the R-Rs pair and the Ls-Rs pair.
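The count of six follows from choosing two of the four coupled channels. As a minimal illustrative sketch (the function name `icc_channel_pairs` is hypothetical and not part of any codec), the channel pairs for which ICC values could be determined may be enumerated as follows:

```python
from itertools import combinations

def icc_channel_pairs(channels):
    """Return every unordered pair of distinct channels."""
    return list(combinations(channels, 2))

# Four coupled channels give 4 * 3 / 2 = 6 pairs.
pairs = icc_channel_pairs(["L", "R", "Ls", "Rs"])
print(len(pairs))
print(pairs)
```

For M coupled channels the same enumeration yields M(M−1)/2 pairs, which is why the number of ICC values grows faster than the number of alphas as channels are added.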
In some implementations, the determination of spatial parameters by a device (such as a decoder) may involve receiving explicit spatial parameters in a bitstream. Alternatively, or additionally, a device (such as an encoder or a decoder) may be configured to determine or to estimate at least some spatial parameters. Some devices may be configured to determine mixing parameters based, at least in part, on spatial parameters.
Referring first to the left panel of
The right panel of
However, restoring the spatial relationship between individual discrete channels and a coupling channel does not guarantee the restoration of the spatial relationships between the discrete channels (represented by the ICCs). This fact is illustrated in
In the examples shown in
Because the discrete channels are ultimately reproduced and presented to listeners, proper restoration of the spatial relationships between discrete channels (the ICCs) may significantly improve the restoration of spatial characteristics of the audio data. As may be seen by the examples of
In the left panel of
Accordingly, by setting the IDC between spatially adjacent individual channels to −1, the ICC between these channels may be minimized and the spatial relationship between the channels may be closely restored when these channels are dominant. This results in an overall sound image that is perceptually approximate to the sound image of the original audio signal. Such methods may be referred to herein as “sign-flip” methods. In such methods, no knowledge of the actual ICCs is required.
Note, however, that such methods may still use the alpha parameters, and some methods may involve encoding these alpha parameters into a bitstream and transmitting the encoded parameters to a receiving device, such as a decoding device or a related device. The receiving device may use these alpha parameters, e.g., as an input to a decorrelation process. Other side information may be provided in a bitstream to a decoder, such as channel-specific scaling factors. For example, if the audio data has been encoded according to the AC-3 or E-AC-3 audio codecs, the scaling factors may be coupling coordinates or “cplcoords” that are encoded with the rest of the audio data. In alternate implementations, the ICCs may be derived at an encoder, coded and sent through a bitstream to a decoding device. Some such implementations may involve deriving the alpha parameters, if required, using the transmitted ICC parameters.
In some implementations, alphas may be transmitted at least once per frame, whereas in other implementations alphas may be transmitted as frequently as every block. In some implementations, a retransmission of alphas will occur whenever the coupling strategy changes. A retransmission of alphas generally implies a retransmission for all channels. Alphas are generally transmitted at the same frequency resolution as cplcoords and may be shared across frequency, e.g., as determined by the coupling band structure.
An encoder may calculate the alpha of a coupling band of a channel as the real part of the correlation coefficient between the complex (MDCT and MDST) transform coefficients of the channel and the complex transform coefficients of the coupling channel within the same band. This value may be averaged across the blocks over which the alphas are shared and then quantized. Further, the encoder may employ a windowed calculation of alphas, in which it applies a window across frequency (e.g., on a consecutive set of frequency coefficients) that is centered in a particular band and tapers off toward neighboring bands. The cross product of the windowed coefficients of a given channel and the similarly windowed coefficients of the coupling channel may then be calculated to derive the correlation coefficient of the band.
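The unwindowed form of this calculation may be sketched as follows. This is an illustrative example only: it assumes that each complex coefficient is formed as MDCT + j·MDST, that the correlation coefficient is normalized by the band energies, and that a silent band is assigned an alpha of zero. The function name `alpha_for_band` is hypothetical, and block averaging, windowing and quantization are omitted.

```python
import math

def alpha_for_band(chan_mdct, chan_mdst, cpl_mdct, cpl_mdst):
    """Real part of the normalized correlation between two complex bands."""
    # Assumption: complex coefficient = MDCT + j * MDST.
    x = [complex(re, im) for re, im in zip(chan_mdct, chan_mdst)]
    c = [complex(re, im) for re, im in zip(cpl_mdct, cpl_mdst)]
    cross = sum(xi * ci.conjugate() for xi, ci in zip(x, c))
    energy_x = sum(abs(xi) ** 2 for xi in x)
    energy_c = sum(abs(ci) ** 2 for ci in c)
    if energy_x == 0.0 or energy_c == 0.0:
        return 0.0  # assumption: treat a silent band as uncorrelated
    return cross.real / math.sqrt(energy_x * energy_c)

# A channel identical to the coupling channel is fully correlated.
cpl_re, cpl_im = [1.0, 2.0], [0.5, -1.0]
print(alpha_for_band(cpl_re, cpl_im, cpl_re, cpl_im))
```

Under these assumptions the result lies between −1 and 1, with values near 1 indicating that the channel closely follows the coupling channel within the band.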
Various implementations are described herein for efficiently encoding information, including but not limited to audio data. Some implementations involve exploiting the correlations between parameter values across various dimensions. In the example of audio data, some implementations may achieve relatively greater data encoding efficiencies by exploiting the correlations between parameter values across frequency bands, time intervals, channels and/or other dimensions. Some such correlations of parameters across dimensions will now be described in the context of audio data.
As shown by the peak in
According to some implementations described herein, this correlation between alphas of different channels is exploited to gain coding efficiency. In some such implementations, coding efficiency may be enhanced by the use of a vector quantizer (“VQ”) to jointly quantize alphas of coupled channels.
However,
In this example, method 500 begins with block 502, in which a signal is received. For example, a signal may be received by a logic system of an encoding device in block 502. In this implementation, block 504 involves analyzing the signal to determine parameter values of an N-dimensional parameter set.
In
In block 506 of
Block 506 also may involve determining a first vector quantization index corresponding to the first set of quantized values. The first vector quantization index may, for example, be a pointer to a data structure location in which the first set of quantized values may be stored.
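One way to picture the relationship between a set of quantized values and its vector quantization index is a nearest-codeword search. The following sketch is illustrative only: the codebook contents are made up, the function name `vq_encode` is hypothetical, and squared error is assumed as the distortion measure.

```python
def vq_encode(vector, codebook):
    """Return (index, codeword) of the codeword closest to `vector`."""
    def sq_err(codeword):
        # Assumption: squared-error distortion between vector and codeword.
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    index = min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))
    return index, codebook[index]

# Hypothetical codebook of alpha triples for three coupled channels.
codebook = [(0.9, 0.9, 0.9), (0.5, 0.5, 0.5), (0.9, 0.1, 0.5)]
index, quantized = vq_encode((0.8, 0.2, 0.4), codebook)
print(index, quantized)
```

Only the index needs to be conveyed to a device that holds the same codebook; the quantized values are recovered by looking up that location.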
Block 508 may involve calculating two or more parameter prediction values along a second dimension of the N-dimensional parameter set based, at least in part, on one or more values of the first set of quantized values. In this example, the second dimension is D2, which corresponds to frequency bands, and the parameter prediction values for frequency bands 1 through 4 of channel zero (corresponding to cells 610, 615, 620 and 625) are the quantized value of α0,0,0 or {circumflex over (α)}0,0,0. Similarly, the parameter prediction values for frequency bands 1 through 4 of channels one and two are the quantized values of α1,0,0 and α2,0,0, respectively. Therefore, in this example, the parameter prediction values correspond to the first set of quantized values. However, in alternative implementations, the parameter prediction values may be derived from, but not identical to, the first set of quantized values.
In this example, block 510 involves calculating prediction residual values based, at least in part, on the parameter prediction values. Here, each prediction residual value is the difference between the parameter value (the alpha value in this instance) for a cell and the parameter prediction value for that cell.
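The prediction and residual steps of blocks 508 and 510 may be sketched as follows for the case in which the prediction for every higher band of a channel is simply that channel's quantized band-zero value. The data layout and the function name `frequency_residuals` are assumptions made for illustration.

```python
def frequency_residuals(alphas_by_band, quantized_band0):
    """Residuals for bands 1..B-1, predicted from the quantized band-0 values.

    alphas_by_band[b][ch] is the alpha of channel ch in frequency band b;
    quantized_band0[ch] is the quantized alpha of channel ch in band 0.
    """
    return [
        [alpha - prediction for alpha, prediction in zip(band, quantized_band0)]
        for band in alphas_by_band[1:]
    ]

# Two channels, three bands (values chosen to be exact in binary).
alphas = [[0.75, 0.5], [0.5, 0.5], [1.0, 0.25]]
print(frequency_residuals(alphas, [0.75, 0.5]))
```

When alphas are correlated across frequency, the residuals cluster near zero, which is what makes a second quantization stage on the residuals efficient.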
In this implementation, block 512 involves applying a second vector quantization process to the prediction residual values to produce a second set of quantized values. Block 512 also may involve determining a second vector quantization index corresponding to the second set of quantized values. The second vector quantization index may be a pointer to a data structure location in which the second set of quantized values is, or will be, stored. The data structure may be a codebook. In some implementations, a distortion metric may be used to design the quantizers for the VQ process (or in a codebook search). For example, the distortion metric may be a mean squared error distortion metric. The VQ design process may partition a training set of vectors into clusters such that the sum of the distances of each training vector from the centroid, or average vector, of the subset containing that training vector is minimized. Here, the distance may be the distortion, as calculated by the distortion metric, incurred in approximating a training vector by the centroid of the subset to which it belongs. In other words, the centroid of the subset may be the reconstruction of the training vectors in the subset.
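A codebook design of this kind may be approximated with a generalized Lloyd (k-means style) iteration. The sketch below is one possible approach and is not asserted to be the design procedure of any particular implementation; the function name `train_codebook`, the initial codebook and the training vectors are all hypothetical, and squared error is assumed as the distortion metric.

```python
def train_codebook(training, initial_codebook, iterations=10):
    """Refine a codebook by alternating nearest-codeword assignment and centroid update."""
    codebook = [tuple(c) for c in initial_codebook]
    for _ in range(iterations):
        # Assign each training vector to its nearest codeword (squared error).
        clusters = [[] for _ in codebook]
        for vec in training:
            nearest = min(
                range(len(codebook)),
                key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
            )
            clusters[nearest].append(vec)
        # Replace each codeword with the centroid of its cluster.
        new_codebook = []
        for old, cluster in zip(codebook, clusters):
            if cluster:
                centroid = tuple(
                    sum(vec[d] for vec in cluster) / len(cluster)
                    for d in range(len(old))
                )
                new_codebook.append(centroid)
            else:
                new_codebook.append(old)  # assumption: leave an empty cell unchanged
        if new_codebook == codebook:
            break
        codebook = new_codebook
    return codebook

training = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (4.0, 5.0)]
print(train_codebook(training, [(0.0, 0.0), (4.0, 4.0)]))
```

Each resulting codeword is the centroid of the training vectors assigned to it, so it serves as the reconstruction for every vector in that subset.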
In the example shown in
The encoding process described above may be extended into any number of dimensions.
Here, block 522 involves calculating two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more values of one or more of (k−1) previously produced sets of quantized values. In this implementation, block 524 involves calculating prediction residual values based, at least in part, on the parameter prediction values along the kth dimension.
In the example shown in
Here, the parameter prediction value used for determining the prediction residual values for channel zero, frequency band zero is the quantized value of α0,0,0. The prediction residual values for cells 630, 635, 640 and 645 are determined by subtracting the quantized value of α0,0,0 from the alpha value corresponding to each cell.
In this implementation, block 526 involves applying a kth vector quantization process to the prediction residual values along the kth dimension to produce a kth set of quantized values. In the example shown in
Prediction residual values for other frequency bands and blocks may be determined in a similar fashion. Referring to
Here, block 532 involves determining a maximum vector quantizer length Mk for dimension k. In some implementations, determining the maximum vector quantizer length Mk may involve receiving an indication of the maximum vector quantizer length Mk from a user, e.g., via a user interface. Alternatively, block 532 may involve retrieving the maximum vector quantizer length Mk from a memory. In some implementations, the maximum vector quantizer length Mk may be a variable that controls a bit rate for encoding parameters. Accordingly, the maximum vector quantizer length Mk may be based, at least in part, on an available bit rate for parameter encoding. In some implementations, this bit rate may vary over time. Another reason for limiting the VQ length to a maximum of Mk would be to constrain the amount of memory required to store the VQ codebooks, i.e., the tables of reconstruction values corresponding to the VQs.
In this example, block 534 involves determining that a number of values Vk to be vector quantized exceeds Mk, and block 536 involves determining the Vk−Mk remaining values to be vector quantized. Referring to
In this implementation, block 538 involves predicting, based at least in part on at least one of the Mk quantized values, (Vk−Mk) parameter prediction values along the kth dimension. In the example shown in
Here, block 540 of
In this implementation, block 542 involves performing a vector quantization process for the (Vk−Mk) kth dimension prediction residual values to produce (Vk−Mk) quantized values of the kth parameter set. In the example of
In some implementations, block 536 may involve determining that there is only one remaining parameter value to be quantized (Vk−Mk=1). In such implementations, the parameter value may be scalar quantized.
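The handling of a maximum vector quantizer length may be sketched as follows. This is an illustrative example under stated assumptions: the quantizers are supplied by the caller, the remaining values are predicted from the last of the Mk quantized values, and the grid-rounding functions shown are simple stand-ins rather than real vector quantizers. All names are hypothetical.

```python
def encode_with_max_length(values, max_len, quantize_vector, quantize_scalar):
    """Quantize the first max_len values, then the residuals of the rest."""
    head = quantize_vector(values[:max_len])          # the Mk quantized values
    remaining = values[max_len:]                      # the Vk - Mk remaining values
    if not remaining:
        return head, []
    prediction = head[-1]  # assumption: predict from the last quantized value
    residuals = [v - prediction for v in remaining]
    if len(residuals) == 1:
        # Only one value remains, so it is scalar quantized.
        return head, [quantize_scalar(residuals[0])]
    return head, quantize_vector(residuals)

# Stand-in quantizers (NOT real VQs): round each value to a 0.25 grid.
def grid_scalar(x):
    return round(x * 4) / 4

def grid_vector(vec):
    return [grid_scalar(x) for x in vec]

print(encode_with_max_length([0.8, 0.7, 0.6, 0.3], 3, grid_vector, grid_scalar))
```

In this sketch, four values with a maximum length of three leave a single remaining value, which exercises the scalar quantization case.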
As noted above, various implementations provided herein involve providing an indication of VQ length with encoded signals. This may be necessary in cases where the VQ length is not fixed but instead is variable, for example, as a function of one or more of time, frequency, channel, etc.
As a first example, in some implementations, the VQ length may be varied to control the bit-rate and resolution for parameter encoding.
Furthermore, the VQ length could be varied based on considerations other than bit-rate. For example, signal characteristics could change over time, in response to which encoding decisions, including the VQ length for parameter encoding, may change. For instance, transients may occur at different times in different channels of an audio signal. Since typically only channels that do not have strong transients are coupled, the number and choice of channels in coupling can change from one time block to the next, depending on which of them have transients. Each time such a coupling decision changes, the alpha parameters may need to be retransmitted. An inter-channel VQ need only be of length 2 if 2 channels are in coupling, whereas it will be of length 3 if 3 channels are in coupling. Some other implementations will now be described with reference to
In some implementations, a change similar to that depicted between
Moreover, in some implementations, at least some of the processes described above with reference to
Such partitioning may be advantageous, for example, to avoid exceeding a maximum VQ length for encoding parameter values corresponding to each of the volumes 705 and 710. For example, if the maximum VQ length is 3 and there are six parameter values to encode for each unit of data along dimension three (e.g., for each frame of data), it may be advantageous to partition the array along dimension three and group the parameter values into groups of 3.
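The grouping in this example may be sketched as a simple chunking operation. The function name `partition_values` and the sample values are hypothetical; a real implementation could choose partition boundaries adaptively rather than at fixed intervals.

```python
def partition_values(values, max_vq_length):
    """Split values into consecutive groups no longer than max_vq_length."""
    return [
        values[start:start + max_vq_length]
        for start in range(0, len(values), max_vq_length)
    ]

# Six parameter values with a maximum VQ length of 3 give two groups of 3.
print(partition_values([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], 3))
```

Each group can then be quantized by a vector quantizer whose length does not exceed the maximum.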
Although
These parameter values may be quantized as described above with reference to any of
In some implementations, a quantization process (e.g., the first vector quantization process) may involve quantizing a vector that includes ICCs of Mp−1 channel pairs in an Mp-channel-pair cycle, to produce quantized values of the Mp−1 ICCs. Referring to
The quantization process also may involve calculating a range in which the Mpth ICC lies based, at least in part, on the quantized values of the Mp−1 ICCs. Referring to
Method 1000 may involve receiving signals that include data encoded according to methods described above. In this example, block 1002 of method 1000 involves receiving a signal that includes first and second vector quantization indices. The signal also may include other information, such as indications of VQ length, partitioning information, etc. In some implementations, the signal may include encoded audio data. The first and second quantization indices may, for example, include pointers to data structure locations at which the first and second sets of quantized values, respectively, are stored. The data structure locations may be locations in a codebook accessible by a decoding device, e.g., in a memory of a decoding device.
Here, block 1004 involves performing a first inverse vector quantization operation in response to the first vector quantization index to reconstruct two or more parameter values along a first dimension of an N-dimensional parameter set. In some implementations, the parameter values may be spatial parameter values. Referring to
In this example, block 1006 involves determining two or more parameter prediction values of a second dimension of the N-dimensional parameter set based, at least in part, on one or more of the two or more parameter values of the first dimension of the N-dimensional parameter set. Referring again to
In this implementation, block 1008 involves performing a second inverse vector quantization operation in response to the second vector quantization index to reconstruct two or more prediction residual values of the second dimension. In various implementations described above, these prediction residual values were vector quantized, e.g., by an encoding device. The second vector quantization index may include a pointer to a data structure location at which the vector quantized prediction residual values of the second dimension may be found.
Referring again to
These prediction residual values, not the actual parameter values, are the output of block 1008 in this example. Accordingly, block 1010 involves combining the parameter prediction values of the second dimension with the prediction residual values of the second dimension to reconstruct two or more parameter values of the second dimension. In the example shown in
As noted above, some implementations may involve forming a parameter set into partitions, e.g., in a time-varying and/or signal-adaptive manner. Therefore, in some implementations block 1002 may involve receiving other information, such as parameter set partition information. Block 1002 also may involve receiving VQ length information. The processes of method 1000 (as well as other decoding methods described herein) may be performed, at least in part, according to the parameter set partition information and/or the VQ length information.
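The decoding steps of blocks 1004 through 1010 may be sketched as follows. The example assumes one first-stage index that selects the band-zero alphas of all channels, one second-stage index per channel that selects a vector of residuals across the higher bands, and a prediction equal to the channel's reconstructed band-zero value. The codebooks and the function name `decode_alphas` are hypothetical.

```python
def decode_alphas(first_index, second_indices, channel_codebook, residual_codebook):
    """Rebuild alphas[ch][band] from first- and second-stage VQ indices."""
    base = channel_codebook[first_index]            # block 1004: inverse VQ, band 0
    decoded = []
    for ch, residual_index in enumerate(second_indices):
        prediction = base[ch]                       # block 1006: prediction for higher bands
        residuals = residual_codebook[residual_index]   # block 1008: inverse VQ of residuals
        higher_bands = [prediction + r for r in residuals]  # block 1010: combine
        decoded.append([base[ch]] + higher_bands)
    return decoded

# Hypothetical codebooks: two channels, bands 0..2.
channel_codebook = [(0.5, 0.75)]
residual_codebook = [(0.0, 0.25), (-0.25, -0.5)]
print(decode_alphas(0, [1, 0], channel_codebook, residual_codebook))
```

The decoder never receives parameter values directly; it receives only indices, and it mirrors the encoder's prediction so that adding the residuals restores the quantized parameters.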
In this implementation, block 1024 involves determining two or more parameter prediction values along a kth dimension of the N-dimensional parameter set, based at least in part on one or more previously determined parameter values of a dimension less than k. In the example shown in
In other implementations, the parameter prediction values may be based on, but not identical to, the quantized alpha values. In still other implementations, the parameter prediction values may be determined according to the first vector quantization index. For example, the parameter prediction values may be determined by performing an operation on values indicated by the first vector quantization index.
In this example, block 1026 of method 1020 involves performing a kth inverse vector quantization operation in response to the kth vector quantization index to reconstruct two or more prediction residual values of the kth dimension. In the example of
In order to reconstruct the actual parameter values, method 1020 includes a further operation: here, block 1028 involves combining the parameter prediction values of the kth dimension with the prediction residual values of the kth dimension to reconstruct two or more parameter values of the kth dimension. In the example of
In some implementations, alpha values may be shared across at least some adjacent time blocks. Accordingly, the alpha values for cells 630, 635, 640 and 645 may correspond to more than 4 time blocks. Moreover, in some implementations the dimensions may include pairs of individual discrete channels. The reconstructed parameter values may be inter-channel correlation coefficients (“ICCs”) that indicate a correlation between the pairs of individual discrete channels.
In this implementation, block 1034 involves determining that a remaining number of parameter values Vk to be reconstructed along dimension k exceeds Mk. Referring to
Here, block 1036 involves reconstructing the first Mk values along dimension k based, at least in part, on the kth quantization index. In the example shown in
In this example, block 1038 involves determining, based at least in part on the kth quantization index, Vk−Mk parameter prediction values of the kth dimension. In the example of
In block 1040, an additional vector quantization index for the kth dimension is received. In this example, the additional vector quantization index corresponds to the prediction residual values for cells 670, 675 and 680.
In block 1042, an inverse vector quantization operation is performed in response to the additional vector quantization index for the kth dimension to reconstruct Vk−Mk additional prediction residual values of the kth dimension. In this example, the inverse vector quantization operation reconstructs the prediction residual values corresponding to cells 670, 675 and 680.
Here, block 1044 involves combining the Vk−Mk prediction residual values of the kth dimension obtained in block 1042 with the Vk−Mk parameter prediction values of the kth dimension obtained in block 1038 to reconstruct the remaining Vk−Mk parameter values of the kth dimension. In the example of
The audio processing system 1100 may be configured to perform methods such as those that are described above, e.g., with reference to
In this example, an upmixer 1125 receives audio data 1110, which includes frequency domain representations of audio data of a coupling channel. The frequency domain representations are MDCT coefficients in this example.
The upmixer 1125 also receives coupling coordinates 1112 for each channel and coupling channel frequency range. In this implementation, scaling information, in the form of coupling coordinates 1112, has been computed in a Dolby Digital or Dolby Digital Plus encoder in an exponent-mantissa form. The upmixer 1125 may compute frequency coefficients for each output channel by multiplying the coupling channel frequency coefficients by the coupling coordinates for that channel.
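The scaling performed by such an upmixer may be sketched as follows. This is an illustrative example only: it assumes the coupling coordinates have already been converted from exponent-mantissa form to linear gains and that one coordinate applies to every coefficient in a coupling band. The function name `upmix` and the band layout are hypothetical.

```python
def upmix(coupling_coeffs, band_sizes, coords_by_channel):
    """Scale coupling-channel coefficients into per-channel coefficients.

    coords_by_channel[ch][b] is the linear gain applied by channel ch
    to every coefficient in coupling band b.
    """
    outputs = []
    for coords in coords_by_channel:
        channel = []
        start = 0
        for size, coord in zip(band_sizes, coords):
            band = coupling_coeffs[start:start + size]
            channel.extend(coord * x for x in band)
            start += size
        outputs.append(channel)
    return outputs

# Four coupling-channel coefficients in two bands, upmixed to two channels.
print(upmix([1.0, -2.0, 4.0, 0.5], [2, 2], [[0.5, 1.0], [2.0, 0.25]]))
```

Because every output channel is a scaled copy of the same coupling channel, the outputs share the same phase within each band, which is the loss of phase diversity that a decorrelation process is intended to address.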
In this implementation, the upmixer 1125 outputs decoupled MDCT coefficients of individual channels in the coupling channel frequency range to the decorrelator 1105. Accordingly, in this example the audio data 1120 that are input to the decorrelator 1105 include MDCT coefficients.
In the example shown in
In this example, decorrelation information 1140 is received by the decorrelator 1105. The type of decorrelation information 1140 received may vary according to the implementation. In some implementations, the decorrelation information 1140 may include explicit, decorrelator-specific control information and/or explicit information that may form the basis of such control information. The decorrelation information 1140 may, for example, include spatial parameters such as correlation coefficients between individual discrete channels and a coupling channel and/or correlation coefficients between individual discrete channels. Such explicit decorrelation information 1140 also may include explicit tonality information and/or transient information. This information may be used to determine, at least in part, decorrelation filter parameters for the decorrelator 1105.
However, in alternative implementations, no such explicit decorrelation information 1140 is received by the decorrelator 1105. According to some such implementations, the decorrelation information 1140 may include information from a bitstream of a legacy audio codec. For example, the decorrelation information 1140 may include time segmentation information that is available in a bitstream encoded according to the AC-3 audio codec or the E-AC-3 audio codec. The decorrelation information 1140 may include coupling-in-use information, block-switching information, exponent information, exponent strategy information, etc. Such information may have been received by an audio processing system in a bitstream along with audio data 1110.
In some implementations, the decorrelator 1105 (or another element of the audio processing system 1100) may determine spatial parameters, tonality information and/or transient information based on one or more attributes of the audio data. For example, the audio processing system 1100 may determine spatial parameters for frequencies in the coupling channel frequency range based on the audio data 1145a or 1145b, outside of the coupling channel frequency range. Alternatively, or additionally, the audio processing system 1100 may determine tonality information based on information from a bitstream of a legacy audio codec.
In this example, the device includes an interface system 1205. The interface system 1205 may include a network interface, such as a wireless network interface. Alternatively, or additionally, the interface system 1205 may include a universal serial bus (USB) interface or another such interface.
The device 1200 includes a logic system 1210. The logic system 1210 may include a processor, such as a general purpose single- or multi-chip processor. The logic system 1210 may include a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or combinations thereof. The logic system 1210 may be configured to control the other components of the device 1200. Although no interfaces between the components of the device 1200 are shown in
The logic system 1210 may be configured to perform various types of audio processing functionality, such as encoder and/or decoder functionality. Such encoder and/or decoder functionality may include, but is not limited to, the types of encoder and/or decoder functionality described herein. For example, the logic system 1210 may be configured to provide the vector quantization, partitioning, encoding, decoding, inverse vector quantization and/or decorrelator-related functionality described herein. In some such implementations, the logic system 1210 may be configured to operate (at least in part) according to software stored on one or more non-transitory media. The non-transitory media may include memory associated with the logic system 1210, such as random access memory (RAM) and/or read-only memory (ROM). The non-transitory media may include memory of the memory system 1215. The memory system 1215 may include one or more suitable types of non-transitory storage media, such as flash memory, a hard drive, etc.
For example, the logic system 1210 may be configured to receive frames of encoded audio data via the interface system 1205 and to decode the encoded audio data according to the methods described herein. Alternatively, or additionally, the logic system 1210 may be configured to receive frames of encoded audio data via an interface between the memory system 1215 and the logic system 1210. The logic system 1210 may be configured to control the speaker(s) 1220 according to decoded audio data. In some implementations, the logic system 1210 may be configured to encode audio data according to conventional encoding methods and/or according to encoding methods described herein. The logic system 1210 may be configured to receive such audio data via the microphone 1225, via the interface system 1205, etc.
The display system 1230 may include one or more suitable types of display, depending on the manifestation of the device 1200. For example, the display system 1230 may include a liquid crystal display, a plasma display, a bistable display, etc.
The user input system 1235 may include one or more devices configured to accept input from a user. In some implementations, the user input system 1235 may include a touch screen that overlays a display of the display system 1230. The user input system 1235 may include buttons, a keyboard, switches, etc. In some implementations, the user input system 1235 may include the microphone 1225: a user may provide voice commands for the device 1200 via the microphone 1225. The logic system 1210 may be configured for speech recognition and for controlling at least some operations of the device 1200 according to such voice commands.
The power system 1240 may include one or more suitable energy storage devices, such as a nickel-cadmium battery or a lithium-ion battery. The power system 1240 may be configured to receive power from an electrical outlet.
Various modifications to the implementations described in this disclosure may be readily apparent to those having ordinary skill in the art. The general principles defined herein may be applied to other implementations without departing from the spirit or scope of this disclosure. For example, while various implementations have been described in terms of Dolby Digital and Dolby Digital Plus, the methods described herein may be implemented in conjunction with other audio codecs. Moreover, the vector quantization and inverse quantization methods described herein are not limited to audio data applications, but have broad applicability.
For example, consider the motion vectors of a multi-view video sequence. Each motion vector may include a pair of parameters that represent the displacements in the x and y directions for a small block of an image from one video frame to the next. Further, each view may have a motion vector for each such block in the view. Since a video object could be present in multiple views, the associated motion vectors may be correlated across views. Thus each displacement parameter may be indexed by two dimensions: one dimension may indicate the view and the second dimension may indicate whether the displacement is in the x direction or the y direction. The displacements along the x and y directions (i.e., the motion vector) in a single view may first be vector quantized. The motion vectors of adjacent views may then be predicted from the motion vectors of the first view. The prediction residual values of multiple views along a single direction (x or y) may be jointly vector quantized.
The methods disclosed herein also may be applied to other signal processing applications. For example, consider a grid of electronic sensors that are configured to respond to temperature variations. Temperature is then a parameter that can be extracted from the electrical signals (possibly digitized) provided by these sensors. The temperature parameter can be indexed by the sensor number in the grid and possibly by the time of sampling. Therefore the temperature parameter may have at least two dimensions. The parameter could be extracted and compressed for storage and use at a later time, or for transmission to a processing center on a channel of restricted bandwidth. Such data compression may involve quantization of the parameters. Temperatures from multiple sensors at a given time may be jointly vector quantized. The temperature of each sensor at subsequent instants of time may be predicted from the quantized temperature of the instant already considered. The prediction residuals across time may be grouped and vector quantized again.
Thus, the claims are not intended to be limited to the implementations shown herein, but are to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein.
This application claims priority to U.S. Provisional Application No. 61/835,954, filed on 17 Jun. 2013, incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2014/042696 | 6/17/2014 | WO | 00

Number | Date | Country
---|---|---
61835954 | Jun 2013 | US