This disclosure relates to apparatus and methods for processing compression encoded signals.
A digital signal processor (DSP) is used to process digital signals, which have discrete values represented in the signal. There are typically two types of DSPs: floating-point DSPs and fixed-point DSPs. Generally, a floating-point DSP uses a certain number of bits to represent the mantissa of a signal's value and another set of bits to represent the exponent of the signal's value. For example, for a large signal, which may be quantified as 1126.4, which is 1.1 times 210, a floating point representation may be 1.1 for the mantissa and 10 for the exponent. Floating-point DSPs thus provide the ability to represent a very wide range of values, but with a precision that is limited by the number of bits used to represent the mantissa.
Unlike a floating-point DSP, a fixed-point DSP uses all of its bits to represent a signal's value. The precision of the fixed-point DSP is determined by dividing its range by the number of discrete values that can be represented by the available bits in the DSP. Thus, for example, if a DSP is to process signals having a range of 0-16 and it has three available bits, which can represent eight discrete values, then the least significant bit carries a value of two. Fixed-point DSPs can experience problems, however, with signals that are not sized well to the DSP. For example, in a 21-bit fixed point system, if the least significant bit is set to 1, the DSP can only handle signals having values up to 2,097,152, and therefore a signal with the value of 3,676,000 will not be properly processed. As another example, if the signal's value is small (e.g., 10) and changes to the signal's value are small (e.g., +/−1.4) compared to the range of the fixed-point DSP (e.g., 2,097,152), quantization noise from rounding problems may result in a degradation of signal quality because the least significant bit is larger than, or a large portion of, the changes to the signal's value. In contrast, in a floating-point DSP, the mantissa and exponent may be used to represent decimal values so that rounding errors are minimized.
Currently, floating-point DSPs are used in applications where the range of a signal's value varies. This is because the floating-point DSPs can adjust to the change in range by using exponent bits. Nevertheless, it is often desirable to use fixed-point DSPs instead, because fixed-point DSPs typically consume less power, are cheaper, and are fabricated in less chip area compared to floating-point DSPs.
Compression encoded signals include digital signals that have been compressed and encoded in a format, such as an MPEG format. Typically, these compression encoded signals are processed using floating point DSPs exclusively. It is desirable to provide fixed-point DSPs that can be used in processing compression encoded signals, without the problems typically associated with fixed-point DSPs, such as significant quantization noise or overflow.
This disclosure relates to apparatus and methods for processing compression encoded signals. Compression encoded signals are compressed signals. Certain techniques can take advantage of the compressed nature of the signal to introduce a special way of processing the signal. Once of these techniques is companding, which involves the compression and decompression of a signal. Since a compressed encoded signal is already compressed, companding processing can be manipulated to be applied directly to the compressed encoded signal. Companding techniques such as syllabic companding and block floating point are presented for processing compression encoded signals during the decoding process, using efficient fixed-point arithmetic operations. The efficient fixed-point arithmetic operations provide an advantage in terms of speed, power, and cost over using floating-point operations to achieve the same processing.
In some embodiments, a digital signal processor is provided that includes an input for receiving a subband of a compression encoded signal and a subband processor coupled to the input that is configured to process the subband of the compression encoded signal. The subband processor further includes a fixed-point companding digital signal processor that is configured to receive the subband of the compression encoded signal and process the subband of the compression encoded signal using envelope information that describes characteristics of the compression encoded signal to produce a processed compression encoded signal. The subband processor further includes an envelope generator that is configured to produce envelope information regarding the subband of the compression encoded signal to provide changes in the dynamic range of the compression encoded signal for fixed-point digital signal processing.
In one example, the fixed-point companding digital signal processor uses syllabic companding in processing the subband of the compression encoded signal. In another example, the envelope generator implements a look up table to convert from a compression encoded signal scale factor and a normalized subband sample to a scale factor that is an integer power of two and a re-normalized subband sample corresponding to the power-of-two scale factor. In yet another example, the compression encoded signal is an MPEG layer 2 (MP2) signal.
In still another example, the digital signal processor further includes a decoder that partially decodes a received compression encoded signal and provides a partially decoded signal to the subband processor that is time domain based. The compression encoded signal may be, for example, an MPEG layer 3 (MP3) signal and the partially decoded signal is an MPEG layer 2 signal.
In accordance with the disclosed subject matter, corresponding methods and software are also provided.
This disclosure relates to apparatus and methods for processing compression encoded signals. Compression encoded signals are signals that are compressed and encoded for storage and use. Certain techniques can take advantage of the compressed nature of the signal to introduce a special way of processing the signal. One of these techniques is companding, which involves the compression and decompression of a signal. Since a compressed encoded signal is already compressed, companding processing can be manipulated to be applied directly to the compressed encoded signal. Examples of compression encoded signals include MPEG, which further includes well-known formats such as MP3 and advanced audio coding (AAC), where the formats generally dictate the encoding and compression performed on the signal. These compression encoded signals are typically processed in digital signal processors (DSPs).
Generally, the DSPs processing compression encoded signals are floating point DSPs. This is because the floating-point DSPs can adjust to the change in range by using exponent bits. Nevertheless, it is desirable to use fixed-point DSPs instead, because fixed-point DSPs typically consume less power, are cheaper, and are fabricated in less chip area compared to floating-point DSPs. In this disclosure, techniques are presented for processing compression encoded signals during the decoding process using efficient fixed-point arithmetic operations. In certain embodiments, these processing techniques exploit the compressed nature of compression encoded signals to minimize quantization distortion such that it is largely inaudible, even though only low-resolution fixed-point operations are used in the processing. This allows processing on a fixed-point DSP, while maintaining signal quality.
Companding (compressing/expanding) is a technique used in transmission and sound recording to compress the dynamic range (DR) of input signals; at the output, the dynamic range is restored (expanded). The compression can be accomplished, for example, by using root-mean-square information or envelope information. For audio applications, envelope-based or root-mean-square-based companding is referred to as “syllabic” companding, as the amount of compression is roughly constant for each syllable, and usually only varies between syllables. The compression can also be accomplished via memoryless nonlinear functions; this type of companding is referred to as “instantaneous” companding, as the compression and expansion depend only on the instantaneous values of signals. Since, generally, the channel or storage medium does not modify the signal, the expansion operation is simply the inverse of the compression operation. Thus, for syllabic companding, if, for example, the input is compressed through a division by the input envelope, then the expansion is usually a multiplication by this same envelope signal. For instantaneous companding, if compression is accomplished via some invertible, nonlinear, “compressive” function with desirable properties, then expansion is accomplished by applying the inverse of the compressive function.
These techniques can be advantageously applied to compression encoded signals in some embodiments. In terms of compression encoded signals, the MPEG-1 coding standard is one of the most popular and widely used standards for efficient and perceptually lossless audio compression coding, as MPEG encoded audio achieves very high perceived audio fidelity, together with high compression rates. In operation, MPEG uses a digital filterbank to create 32 narrowband filtered versions of a digital input signal, referred to as “subbands,” each of which is downsampled by a factor of 32. The presence of a large signal in a particular subband makes noise in that subband perceptually inaudible; this phenomenon is known as “masking.” In MPEG-1 layers I and II, 64 high-precision scale factors are used to compress the dynamic range of the subband samples (normalization).
The actual value of each subband sample is given by the normalized subband sample, multiplied with the corresponding scale factor; this multiplication is referred to as “denormalization.” Processing of MPEG-encoded signals is conventionally performed by first fully decoding the input stream and then performing the desired processing. This method, which is referred to herein as “classical DSP,” ignores certain features of MPEG audio encoding. The processor is forced to process a signal with high dynamic range, and with frequency content throughout the audio band. As a result, to avoid introducing significant audible quantization distortion, these subband processors are implemented in either very high resolution fixed-point or in floating point.
In the embodiments described herein, processing is done prior to denormalization by using a syllabic companding DSP technique or a block-floating-point (BFP) technique. These techniques process the compressed input, along with corresponding input scale-factors, and yield compressed output, along with corresponding output scale factors. The resulting system-level block diagram is illustrated in
The subband processor 102 performs the desired processing on each sub-band of the compressed signal, before the de-normalization process. The processor can use an algorithm to implement the processing. The algorithm may be dependent on the type of processing that is being performed. For example, the processing can include changing the bass, treble, volume of the signal or adding reverberation to the signal. The adding of effects such as music sounding like it is in a concert hall or adjusting to other characteristics can be performed by the subband processor 102. The multipler 104 performs the de-normalization of the signal. The multipler 104 can be a simple multiplier, multiplying the compressed signal (which is large) with the corresponding envelope (which carries the information about the size of the actual signal), resulting in a decompressed sub-band signal.
The up-sampler 106 can perform discrete-time upsampling by a factor corresponding to the number of subbands. For MPEG, this factor is 32. Taking MPEG as an example, each sample at the input of up-sampler 106 results in 32 output samples. The spacing between each pair of the latter (samples) is 1/32 of the spacing between each pair of input samples. The sub-band reconstruction filter 110 processes a stream of sub-band samples so that they can be ready to be combined with the remaining sub-bands, by removing the “out-of-band” artifacts that were effectively inserted in each sub-band during the encoding process. The output collector 110 can be a digital multi-way adder. The output collector 110 combines (e.g., by means of a simple addition) the filtered sub-bands to create the final output.
The techniques described herein work best when the scale-factors correspond to the time-domain envelope of the subband samples. As such, MPEG 1-Layer II (MP2) is used to provide examples using this technique. MP2 is used for many applications, including digital-video-broadcasting (DVB) and DVD players. The companding subband processors can use few bits and simple low-bit fixed-point operations. Due to the compressed dynamic range of their input, state, and output signals, the resulting output signal to quantization distortion ratio (SNR) is always sufficiently high that the output quantization distortion is inaudible due to the masking properties of the MPEG reconstruction filterbank.
As a first example, the syllabic companding DSP technique is used to implement an all-pass reverberator prototype, described in state-space by the following equations:
x
1(n+1)=−0.8·xL(n)+0.2]·u(n)
x
i(n+1)=xi−1(n), 2≦i≦L
y(n)=1.8·xL(n)+0.8·u(n) (1)
where L=2048 and the sampling rate for the input u(n), output y(n), and states xi(n) of the prototype is fs=44.1 kHz. For this case, the technique can involve the insertion of 32 identical subband filters, each given by Eq. (1), but with L replaced by
this subband filter is referred to as the “subband-prototype.” Here, it is desirable to process samples before denormalization, so the companding DSP technique is applied to the subband-prototype. Next externally applied control signals are introduced: eu(n), ey(n), and ex
By substituting (2) in (1), with subband processors described by the state equations:
with K=64 and ex
The LUT can include a 14-bit input: the 8-bit normalized input sample, concatenated with its corresponding 6-bit scale-factor index. The LUT outputs a 4-bit integer corresponding to the base-2 logarithm of the lowest integer power of 2 greater than the scale-factor, and a new 8-bit compressed subband sample corresponding to this power-of-2 scale factor. The new 8-bit sample is used as û(n) in Eq. (2), while the power-of-2 scale factor is used as eu(n) in Eq. (2). The remaining e-controls can be chosen to correspond, at least roughly, to the envelopes of the corresponding signals in the prototype, in order to maximize the dynamic range of the subband processor, and minimize the quantization distortion.
Envelope generator 132 can be used instead of a replica DSP to provide an estimation of the intermediate envelopes that are used in companding based processing (see Eq. (3)). The envelope generator 132 obtains the remaining e-controls used by the companding DSP 130. In some embodiments, a replica DSP can be used to calculate the remaining e-controls. This could be done here as well, using 32 low-resolution fixed-point implementations of the subband-prototype. However, implementing the replica DSPs adds significant overhead, so a more efficient technique has been devised for estimating the remaining e-controls. The algorithm, shown in block diagram format in
In operation, the envelope generator detects changes in the input envelope, and can use scaling information and samples of the subband of the compression encoded signal. If the input envelope does not change by more than a pre-defined (emperically determined) amount, then the envelopes in equations (3) are assigned weighted versions of past values of the input envelope, according to the filter attributes. If the input envelope is detected to have changed by more than the pre-defined threshold, then the envelopes are assigned the value of the most recent input envelope. The envelope generator outputs this information as e-controls for the companding DSP.
The alogrithm for the design of the envelope generator of
The above discussion applies when there is no sudden change in the input, u(n), since until the system resettles after the sudden change, it cannot be viewed as above. It has been determined empirically that abrupt changes in u(n) are indicated by changes of more than a factor of 8 between consecutive values of eu(n) in Eq. (2). When no such change is detected, the subband signal can be considered to be narrowband. For the subband-prototypes, all input-state and input-output transfer functions are normalized such that their maxima are at 0 dB, so A1=1. Thus, in Eq. (2), the output envelope of the companding DSP's output, ey(n), can be approximated by eu(n−G1) and the first state's envelope, ex
The magnitude of the transfer function from the subband prototype's input, u(n), to its Kth state, xK(n), was simulated to range from −15 dB to 0 dB. Thus, when there have been no recent abrupt input envelope changes, eu(n) and ex
it is seen that x1(n+1) and y(n) are both composed of two components: one depending on the input, u(n), and the other on the Kth state, xK(n).
When there is an abrupt input envelope change, one or the other component will dominate in determining the envelopes of xi(n+1) and y(n), allowing us to use simple approximations for these envelopes. Specifically, for sudden increases in eu(n), eu(n) temporarily becomes significantly larger than ex
The above described functionality is shown in
Another way to process samples before denormalization is to apply a block floating point (BFP) technique, to provide input and output compression in addition to state-variable compression. In some embodiments, scaling signals gu(n), gy(n), and gi(n), referred to as “g-controls”, and normalized input, output and states û(n), ŷ(n), and {circumflex over (x)}i(n), such that:
û(n)=gu(n)·u(n)
ŷ(n)=gy(n)·y(n)
{circumflex over (x)}(n)=gi(n)·xi(n), 1≦i≦K (4)
Here this technique is applied to the subband prototypes of the previous subsection. In general, the BFP technique obtains an intermediate “partially compressed” state vector, {tilde over (x)}(n), and output, {tilde over (y)}(n), from the compressed input, û(n), the compressed state vector, {circumflex over (x)}(n), and the g-controls. For the subband prototypes, this is accomplished as follows:
where K=64. Eqn. (5) is not a standard state space, as it relates {tilde over (x)}(n+1) to {circumflex over (x)}(n). As in the previous subsection, a LUT can be used to convert from the compressed encoded signal's normalized subband samples and scale factors to scale factors that are integer powers of 2, along with the corresponding normalized subband samples. These are used as gu(n) and û(n) in Eq. (5). The remaining g-controls can be derived recursively by introducing “p-controls.” Since for this example, gK(n)=g1(n−K+1), we only need to derive g1(n) and gy(n−1), so we only need p1(n) and py(n). The former is obtained from {tilde over (x)}1(n):
where N is the number of bits used for compressed states, input, and output, and α is a constant “safety factor” set to be slightly less than unity. Similarly, py(n) is obtained by an equation identical to Eq. (6), but with {tilde over (y)}(n) replacing {tilde over (x)}1(n). The p-controls are used to recursively obtain g-controls:
g
1(n)=p1(n)·g1(n−1)
g
y(n)=py(n)·gy(n−1) (7)
The p-controls are also used to obtain the fully compressed {circumflex over (x)}1(n) and ŷ(n) from the partially compressed {tilde over (x)}1(n) and {tilde over (Y)}(n):
{circumflex over (x)}
1(n)=p1(n)·{tilde over (x)}1(n)
ŷ(n)=py(n)·{tilde over (y)}(n) (8)
The Kth state is simply obtained as: {circumflex over (x)}K(n)={circumflex over (x)}1(n−K+1).
The p(n) and g(n) signals in Eq. (6) are integer powers of 2, and they are stored as those powers. Thus, although Eq. (6) contains ratios and products, these can be implemented as additions and subtractions of powers of 2, and bitshifts by these powers. This can result in a simpler design.
In the above description, both syllabic companding and BFP embodiments are described. In particular, syllabic companding and BFP are applied to directly process compression encoded signals before denormalization. The proposed techniques take advantage of the compressed subband samples and scale factors already provided in the compression encoded signal. The compressed input and scale factors are used as inputs to low-resolution syllabic companding or BFP processors, and processing is thus accomplished with low-resolution fixed point arithmetic.
For the number of bits used, relatively large signal to noise ratio (SNR) is achieved over a large input dynamic range. The companding nature of the processing ensures that significant quantization distortion is only present in subbands that also simultaneously contain significant signal. This property, combined with the psychoacoustical masking properties of the MPEG reconstruction filterbank, ensures that even though the processor uses low-resolution fixed-point arithmetic, the resulting quantization distortion at the processor output is significantly reduced relative to that of the classical DSP. In one example, 8-bit systems can be used to clearly illustrate the noise reduction, relative to a classical DSP, resulting from the proposed schemes. More bits can be used in commercial applications to further reduce the resulting quantization noise. The results imply that by using companding or BFP in lieu of classical processing, fewer bits are needed to achieve inaudible quantization noise.
The range of input levels that a system can tolerate may be referred to as the system's dynamic range (DR). More specifically, if emax is the envelope of the largest-envelope input signal that a system can tolerate without overflow, while emin is the envelope of the smallest-envelope input signal for which the SNR at the output of the system is still greater than some specified minimum SNR, then the DR of the system is the ratio of emax to emin. Similarly, if a given signal has an envelope which is at most emax and at least emin, then the DR of the signal is the ratio of emax to emin. Note that if the DR of a signal is lower than that of a system, then when the signal is input to the system, provided that the signal is scaled by an appropriate constant, it will be processed with at least the minimum SNR, and will not cause overflows in the system.
In the BFP technique, only fixed-point hardware is used, but with extra scaling signals and extra operations to increase the dynamic range of the DSP. The BFP architecture allows the scaling signals to be dynamic (time-varying). The scaling-signals in the BFP technique are chosen specifically. Although most BFP architectures share a scaling signal throughout the DSP, the proposed BFP architectures of certain embodiments provide every state its own independent scaling signal.
The logarithmic number system (LNS) represents numbers using a sign bit, followed by the logarithm of the absolute value of the number. Dynamic range is increased significantly due to the compressive nature of the nonlinear logarithm function. Arithmetic operations such as addition and multiplication can take two LNS format numbers, and return the result in the LNS format. These operations are not generally implemented with standard fixed-point arithmetic units. In an LNS architecture, the DSP coefficients can be stored in the LNS format. A major advantage of LNS architectures is that multiplication and division are easily and efficiently accomplished using standard fixed-point addition and subtraction, respectively. These operations can thus be extremely efficient, and, in the absence of overflow and underflow, nearly error-free. Similarly, the computation of powers and roots is greatly simplified. However, LNS addition and subtraction is typically more complex than fixed-point addition and subtraction, and is often accomplished by resorting to lookup tables (LUTs), often including a linear interpolation algorithm.
In some embodiments, the system may be a reverberator with a delay given by a multiple of 32. The proposed techniques, though, are far more general, and can be applied to any set of subband processor prototypes. For example, the proposed techniques can be applied to a linear phase finite impulse response (FIR) filter. Additionally, a companding DSP and companding methods are further described in U.S. Pat. Nos. 7,602,320 and 6,389,445, each of which are hereby incorporated by reference herein in their entirety.
Other applications of the disclosed subject matter may include include, for example, providing the capability for users to manipulate (add effects) to the audio on their portable MPEG players in a very efficient manner. Currently, with typical portable MPEG players, the user selects an audio clip and plays it back. An MPEG decoder decodes the audio, and the user hears the audio, but does not have the option to add effects (echo, reverb, subwoofer, etc.).
The same functionality can be added to DVB (Digital Video Broadcast) players on portable devices, since the DVB standard uses the same standard (MP2) to which this technology can be applied for transmitting audio. While portable players are described for illustrative purposes, other audio players and DVB players can also benefit from this technology.
For example, in conventional devices that allow a user to manipulate audio, the typical device would first fully decode the MPEG and then process the manipulations to the audio. This requires the processor to have a high dynamic range, so it is more expensive and consumes more power (e.g., drain the batteries faster). In other conventional devices, processing could be done during the decode, but the processors are more complicated than those utilizing the technology described in this application. By using the technology described herein, the processing can be done during the decode in a very efficient manner, using the features of MPEG among other things. This can allow users to add effects, and the hardware used to give them this capability is relatively simple and inexpensive and does not cause significant additional power drain.
The techniques described herein can be readily applied to compressed encoded signals such as MP3 and AAC, which are used by a number of devices. The MP3 and AAC standard can be considered to be a layer on top of the MP2 standard, which allows the techniques described herein to be used quite readily. For example, the MP3 content can be partially decoded into MP2, and then the content can be processed using the techniques described above.
Although the description above focuses on a particular set of audio effects, the techniques described herein can be generalized and applied in a number of ways. With these generalized techniques, users can have a wide array of audio effects to choose from (for example an equalizer, a filter that cuts off bass effects etc.).
In the above, the processing was described as being user-selected. However, these techniques can also be used to add certain automatic effects to audio, for example, based on a user-selected template. For example, on car stereo equipment, a user typically can adjust bass, treble, etc. With the techniques described herein, users can make such adjustments (and many other types of manipulations) on their portable MPEG players, and the processing used to implement the user's selections can be made far more efficient, in terms of hardware cost and power consumption, by using these techniques.
The companding techniques presented in this disclosure could be advantageously applied whenever it is desirable to achieve high signal to noise ratio over a wide dynamic range, using relatively simple, fast, low-cost and low-power fixed-point arithmetic. For example, in high-speed wireless applications, where signals with wide dynamic range must be processed with some minimum required output signal to noise ratio (SNR), using companding could significantly simplify the processing, thus reducing the cost and power consumption. Such application could, for example, reduce the cost and improve the battery life of cell-phones, smart-phones, and personal digital assistants (PDAs).
The systems discussed were implemented and simulated in Matlab/Simulink with both pure-tone and speech inputs.
Starting from a signal encoded in MPEG-1 Layer II, standard open-source MPEG-1 Ccode is used to partially decode the MP2 bitstream, yielding compressed (normalized) subband samples and the corresponding scale-factors. These compressed subband samples and scale-factors are passed to MATLAB, and the direct-processing algorithms described above is implemented in MATLAB/Simulink.
For the conventional fixed-point system, two versions are implemented. In the first version, referred to as the “full-rate” version, the original, full-rate, uncoded signal is processed by a conventional fixed-point implementation of the prototype reverberator (with K=2048). In the second version, referred to as the “direct-processing” version, the subband samples are denormalized using the scale-factors, and conventional fixed-point implementations of the subband prototype reverberators were used to process the denormalized subband samples. The processed subband samples are then converted into a fully-decoded signal using a MATLAB implementation of the MPEG-1 subband synthesis algorithm.
The SNR curves of
Although the present disclosure has been described and illustrated in the foregoing example embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the disclosure may be made without departing from the spirit and scope of the disclosure, which is limited only by the claims which follow.
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 61/241,788, entitled “Apparatus and Methods for Processing Compression Encoded Signals,” filed Sep. 11, 2009, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | |
---|---|---|---|
61241788 | Sep 2009 | US |