SYSTEM AND METHOD FOR EFFICIENT COMPRESSION OF DIGITAL DATA

Abstract
A system for compressing digital data by representing a portion of it predictionally and transformationally as a block of transform coefficients, then quantizing that block selectively into a set of encoding symbols based on an indication whether the transform coefficients represent the portion as having a particular characteristic, and then by encoding the set of encoding symbols into a data bit stream. In particular, frequency may be used as the characteristic of the digital data in many applications.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The purposes and advantages of the present invention will be apparent from the following detailed description in conjunction with the appended tables and figures of drawings in which:



FIG. 1 (background art) is a block diagram showing the major elements of a typical end-to-end video system.



FIGS. 2a-e are a series of depictions of a data block undergoing processing by the quantization stage of FIG. 1.



FIG. 3 shows a Huffman table for the syntax element “total_zeros” used in the H.264 standard, as might be applied to the data represented in FIGS. 2a-e.



FIGS. 4a-e are also a series of depictions of a data block undergoing processing by the quantization stage, only here the data block includes high-frequency data.



FIGS. 5a-c are a series of depictions of a data block, specifically that of the raw block of FIG. 4a and the high-frequency block of FIG. 4b, now undergoing alternate quantization processing (e.g., by the quantization stage of FIG. 1 with minor changes in accord with the inventive compression system).



FIG. 6 is a graph presenting statistical analysis of the amount of high-frequency data blocks in typical video data being compressed into H.264 bit streams.



FIG. 7 is a block diagram depicting an example in which a current block has three causal spatial neighbors that are used for spatio-temporal correlations to amortize the overhead of an extra mode flag used by some embodiments of the present invention.



FIG. 8 is a flowchart depicting a digital data compression process that may be used by the compression system of the present invention.



FIGS. 9a-b depict data blocks having other characteristics in the digital data than high or low frequency that may be employed by alternate embodiments of the present invention.





In the various figures of the drawings, like references are used to denote like or similar elements or steps.


BEST MODE FOR CARRYING OUT THE INVENTION

A preferred embodiment of the present invention is an apparatus and method for efficient compression of digital data. As illustrated in the various drawings herein, and particularly in the views of FIG. 5 and FIG. 8, preferred embodiments of the invention are depicted by the general reference characters 100 and 200.


In the context of H.264 video compression the present inventors have observed that poor compression efficiency results for high frequency residue information because the Huffman tables for syntax elements, such as “total_zeros,” are tuned for video content where these syntax elements take smaller values. As described above, in the Background Art section, these syntax elements tend to take larger values in the high frequency case, hence requiring more bits to represent them and resulting in poor compression.


Briefly, the present inventors have devised a way to counter this by adding flexibility to the interpretation of the syntax elements “total_zeros” and “run_before.” Specifically, the inventors propose adding flexibility so that these syntax elements (or corresponding elements in other compression techniques) can be handled conventionally for low-frequency data, and so that they can alternately be represented with counts of zeros done backwards from the last coefficient for high-frequency data.



FIGS. 5a-c are a series of depictions of a data block 40, specifically that of the raw block 52 of FIG. 4a and the high-frequency block 54 of FIG. 4b, now undergoing alternate quantization processing (e.g., by the quantization stage 32 of FIG. 1 with minor changes in accord with the inventive compression system 100). FIG. 5a illustrates application of a novel linear zigzag reverse scan order 102 to produce a one-dimensional high-frequency array 104 shown in FIG. 5b. FIG. 5c shows a quadruplet sequence of entropy coding symbols 106 that describes the high-frequency block 54 and the high-frequency array 104. Furthermore, an optional mode flag 108 indicating that the linear zigzag reverse scan order 102 was used can be provided.


If the alternate quantization used here next uses the same technique that the quantization stage 32 of FIG. 1 uses for H.264 data, the syntax element “coeff_token” is 4; the values of all non-zero quantization levels including sign information are 1, 1, 3, and −2; the syntax element “total_zeros” is 1; and the syntax elements “run_before” are 0, 0, and 0. And if essentially the same (conventional) entropy coding stage 34 as in FIG. 1 is employed, the VLC encoded value for “total_zeros” here is a 3-bit codeword.


Thus, where using the conventional approach for the very same high-frequency block 54 and high-frequency array 56 produced a “total_zeros” of 11 (see e.g., FIG. 4e), which was VLC encoded into an 8-bit codeword, the inventive compression system 100 produces a “total_zeros” of 1 that can be VLC encoded into only a 3-bit codeword. We thus get bit-rate savings worth 5 bits.
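The counts quoted above can be checked with a short sketch. The 16-element array below is a hypothetical stand-in, chosen only to be consistent with the figures given in the text (four non-zero levels clustered at the high-frequency end); it does not reproduce the actual values of FIGS. 4 and 5, and the helper name `total_zeros` is illustrative. The 8-bit and 3-bit codeword lengths are taken from the text rather than computed from a VLC table.

```python
def total_zeros(scanned):
    """Count the zeros that precede the last non-zero value in a scanned array."""
    nonzero_positions = [i for i, v in enumerate(scanned) if v != 0]
    if not nonzero_positions:
        return 0
    last = nonzero_positions[-1]
    return sum(1 for v in scanned[:last] if v == 0)


# Hypothetical coefficients in forward zigzag order: eleven leading zeros,
# four non-zero levels, one trailing zero (an assumption for illustration).
forward_array = [0] * 11 + [1, 1, 3, -2] + [0]

# Assumed here: the reverse scan visits the same positions back to front.
reverse_array = forward_array[::-1]

forward_count = total_zeros(forward_array)
reverse_count = total_zeros(reverse_array)

# Codeword lengths as stated in the text for these two counts.
FORWARD_CODEWORD_BITS = 8
REVERSE_CODEWORD_BITS = 3
bits_saved = FORWARD_CODEWORD_BITS - REVERSE_CODEWORD_BITS

print(forward_count, reverse_count, bits_saved)
```

Under these assumptions the forward count is 11 and the reverse count is 1, matching the values discussed for the high-frequency block.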


In general, depending upon the data at hand (whether it is high-frequency or low-frequency), an encoding stage in accord with the inventive compression system 100 can switch between the forward counting mode (i.e., employing the conventional linear zigzag forward scan order 46) and the backward counting mode (i.e., employing the novel zigzag linear reverse scan order 102) for enhanced compression performance across a greater range of data. Notably, both counting modes require the same computational effort for quantization and for the major part of entropy encoding.


From the decoder standpoint, which of the two modes of quantization and encoding was used can be indicated by the binary valued “counting mode” mode flag 108 which indicates whether forwards or backwards counting was done. Of course, the addition of an extra mode flag itself constitutes an overhead, but it has been the inventors' observation that the inventive compression system 100 still often provides a net benefit.
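By way of illustration only, the following sketch shows how a decoder might use a binary mode flag to place a received one-dimensional array back into a 4×4 block. The function names, the flag convention (false for forward, true for reverse), and the sample block are assumptions made for this example. The forward order generated here is the classic 4×4 zigzag, and the reverse order is assumed to be that same path traversed from the last position to the first.

```python
def zigzag_order(n=4):
    """Return (row, col) pairs of an n x n block in classic zigzag order."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even anti-diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def scan(block, reverse_mode):
    """Encoder side: flatten a square block in forward or reverse zigzag order."""
    order = zigzag_order(len(block))
    if reverse_mode:
        order = order[::-1]
    return [block[r][c] for r, c in order]


def unscan(array, reverse_mode, n=4):
    """Decoder side: rebuild the block, using the mode flag to pick the order."""
    order = zigzag_order(n)
    if reverse_mode:
        order = order[::-1]
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(array, order):
        block[r][c] = value
    return block


# Hypothetical quantized block with energy near the high-frequency corner.
sample_block = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 3],
    [0, 0, -2, 0],
]

for mode_flag in (False, True):
    assert unscan(scan(sample_block, mode_flag), mode_flag) == sample_block
```

Without the flag (or some other agreed convention) the decoder could not tell which of the two placements was intended, which is the role the mode flag 108 plays.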


In theory, the mode flag 108 is optional, although it is expected by the inventors that few embodiments of the invention will not include it in at least some form. For example, using the novel zigzag linear reverse scan order 102 with no indication of this usage in an otherwise conventional compression process is one way to encrypt the digital data in the resulting data bit stream. Alternately, for some types of digital data the zigzag linear reverse scan order 102 might inherently be more efficient and its use by an encoder thus assumed by a decoder for data of the type.


Digressing slightly, before considering the burden of adding the mode flag 108, it can be helpful to appreciate the amount of high-frequency data that is actually present in typical video data. FIG. 6 is a graph presenting statistical analysis of the amount of high-frequency data blocks in typical video data being compressed into H.264 bit streams. The bar labeled “Nasa” represents video of a rocket launch; the bar labeled “Imax” represents a movie trailer with zoom of a moon walk; the bar labeled “Bus” represents a video of a city bus traveling across frame down a street; the bar labeled “Table” represents a video of a ping pong game; the bar labeled “Coastguard” represents a video of a vessel traveling across frame in a nautical setting; the bar labeled “City” represents an aerial looking-downward video of a major cityscape; and the bar labeled “BBC” represents a captured video sequence of typical British television programming. As would be expected for “movie trailer footage” with lots of quick action, rapid scene changes, and zoom-in and zoom-out special effects, the “Imax” sequence has a lot of high-frequency content. Perhaps surprising to some, however, is the quantity of high-frequency content even in the other scenes. These are of traditional block-motion video subject matter, and the values in FIG. 6 tend to undercut the traditional argument that most such subject matter has only inconsequential high-frequency content.


In summary, even for H.264 video compression with its sophisticated motion modeling, there is a significant percentage of data that is associated with high-frequency characteristics. The present inventive compression system 100 is directly applicable to such data. Similarly, even greater applicability and benefit can be expected for previous generation video compression standards such as MPEG-2 and MPEG-4, with their relatively simple motion modeling.


With its implications about the potential realizable benefits, FIG. 6 now permits a reasoned analysis of how much overhead the addition of the extra mode flag 108 might entail. In FIG. 5c the mode flag 108 is depicted as being present with the entropy coding symbols 106, implying that it is provided with each set of entropy coding symbols 106 for each data block 40. This can be the case, but in most embodiments of the inventive compression system 100 that will not desirably be so. It is the present inventors' position (a) that this extra syntax information can be amortized over larger block sizes and (b) that it can be encoded in a more efficient fashion.


With respect to (a), the mode flag 108 can be indicated at a coarser level than at a 4×4 block. For instance, a natural granularity at which the mode flag 108 can be indicated is at the granularity of the motion block, the frame (or still image), a sub-sequence of video frames, or even some other unit basis entirely (e.g., a one second block of audio data). If a video motion block size of 16×16 is chosen, for example, the mode flag 108 then can be indicated at a 16×16 block level and the same mode would be used for all sixteen 4×4 blocks inside the 16×16 block, thus amortizing the extra syntax information.
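The amortization can be expressed as simple arithmetic. The sketch below assumes, purely for illustration, that the flag costs one uncoded bit each time it is signaled and that signaling units are square; the function name is hypothetical.

```python
def flag_bits_per_coding_block(signal_block_size, coding_block_size=4):
    """Average flag overhead per coding block when one 1-bit flag is shared
    by every coding block inside a square signaling unit."""
    blocks_sharing_flag = (signal_block_size // coding_block_size) ** 2
    return 1.0 / blocks_sharing_flag


per_4x4_signaling = flag_bits_per_coding_block(4)     # one flag per 4x4 block
per_16x16_signaling = flag_bits_per_coding_block(16)  # one flag per 16x16 block

print(per_4x4_signaling, per_16x16_signaling)
```

Signaling at the 16×16 level thus reduces the uncoded overhead from one bit to one sixteenth of a bit per 4×4 block, at the cost of forcing all sixteen blocks to share one counting mode.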


With respect to (b), additional savings on the average bit-rate incurred by the mode flag 108 can be obtained by using context-based methods for predicting the likely value for the mode flag. Since typical video data exhibits high spatio-temporal correlations, the context information can be derived from values for the mode flag 108 of the spatial or temporal neighbors of the block in question. FIG. 7 is a block diagram depicting an example in which a current block 110 has three causal spatial neighbors (ABC) (neighbor blocks 112a-c) that are used for this. Such context-based prediction would be especially beneficial when fractional bit-rate methods such as arithmetic coding are used for entropy coding (e.g., as is the case for the context-based adaptive binary arithmetic coding (CABAC) mode in the H.264 video compression standard).
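One way such context-based prediction might be sketched is shown below. This is an idealized adaptive model, not the CABAC engine of H.264: the context is assumed to be the number of the three causal neighbors (A, B, C) whose mode flag is set, the probabilities are simple running counts, and the cost reported is the ideal arithmetic-coding cost of minus log2 of the estimated probability. The class and method names are hypothetical.

```python
import math


class ModeFlagModel:
    """Adaptive per-context probability estimate for a binary mode flag."""

    def __init__(self):
        # counts[context][flag]; each count starts at 1 so that no
        # probability is ever zero. Contexts 0..3 = neighbors with flag set.
        self.counts = [[1, 1] for _ in range(4)]

    @staticmethod
    def context(neighbor_flags):
        """Context index from the flags of neighbors A, B, and C."""
        return sum(1 for f in neighbor_flags if f)

    def cost_bits(self, flag, neighbor_flags):
        """Ideal cost, in bits, of coding `flag` under the current estimate."""
        zeros, ones = self.counts[self.context(neighbor_flags)]
        probability = (ones if flag else zeros) / (zeros + ones)
        return -math.log2(probability)

    def update(self, flag, neighbor_flags):
        """Record an observed flag so that later estimates adapt."""
        self.counts[self.context(neighbor_flags)][1 if flag else 0] += 1


model = ModeFlagModel()
all_reverse = (True, True, True)

cost_before = model.cost_bits(True, all_reverse)
for _ in range(6):
    model.update(True, all_reverse)  # flag repeatedly agrees with neighbors
cost_after = model.cost_bits(True, all_reverse)

print(cost_before, cost_after)
```

When the flag correlates with its neighbors, the estimated cost per flag falls well below one bit, which is the benefit a fractional bit-rate coder can realize.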



FIG. 8 is a flowchart depicting a digital data compression process 200 that may be used by the compression system 100. The compression process 200 begins in a step 202, where any set-up can optionally be performed. In a step 204 prediction is then performed. This can be entirely conventional, in the manner that the prediction stage 28 performs its task. Next, in a step 206, transformation is performed. This can also be entirely conventional, in the manner the transformation stage 30 performs its task. The digital data compression process 200 departs from convention and from prior art approaches in the following step, a step 208 that is discussed in more detail, below. Next, in a step 210, entropy encoding is performed. This can also be entirely conventional, in the manner the traditional entropy coding stage 34 performs its task. In most variations of the compression process 200 (and in most embodiments of the compression system 100), however, this will be modified to at least handle the mode flag 108 (this step is also discussed in more detail, below). And in a step 212 the compression process 200 finishes, here performing any optional wrap-up.


In FIG. 8 step 208 is shown having two major internal operations and also in expanded form. Conceptually, the quantization in step 208 includes parsing (or inferring information about) the contents of the data block 40 and then creating the entropy coding symbols 106 based on that. This point is emphasized in the expanded depiction of step 208. Here it can be seen that step 208 can include a step 214 where a determination (an analysis based on a characteristic of the digital data) is made which type of parsing to apply. Based on this determination, either a step 216 for the conventional zigzag linear forward scan order 46 or a step 218 for the novel zigzag linear reverse scan order 102 is then performed. And following this, in a step 220, position coding is performed to create the entropy coding symbols 106 (and typically to add the optional mode flag 108 to signal to a decoder which parsing approach was employed in encoding).
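The internal operations of step 208 can be sketched as follows. Several simplifications are assumed: the input is an already-scaled coefficient array in forward zigzag order, the reverse scan is taken to be that array read back to front, the determination of step 214 is made by comparing the zero counts of the two candidates (only one of many possible criteria), and the position coding is a simplified rendering of the syntax elements named in the text rather than a conforming H.264 implementation. All function and key names are hypothetical.

```python
def position_code(scanned):
    """Step 220 (simplified): derive coefficient count, levels, total_zeros,
    and run_before values from a scanned one-dimensional array."""
    positions = [i for i, v in enumerate(scanned) if v != 0]
    # Levels are listed from the last non-zero coefficient back to the first.
    levels = [scanned[i] for i in reversed(positions)]
    # Zeros located before the last non-zero coefficient.
    total_zeros = positions[-1] + 1 - len(positions) if positions else 0
    # Zeros immediately preceding each coefficient except the first one.
    run_before = [
        positions[i] - positions[i - 1] - 1
        for i in range(len(positions) - 1, 0, -1)
    ]
    return {
        "total_coeff": len(positions),
        "levels": levels,
        "total_zeros": total_zeros,
        "run_before": run_before,
    }


def quantize_step_208(forward_scanned):
    """Steps 214-220: choose a counting mode, then emit symbols plus a flag."""
    forward = position_code(forward_scanned)         # step 216 candidate
    backward = position_code(forward_scanned[::-1])  # step 218 candidate
    # Step 214 (assumed criterion): prefer the mode with the smaller count.
    use_reverse = backward["total_zeros"] < forward["total_zeros"]
    symbols = backward if use_reverse else forward
    return dict(symbols, mode_flag=1 if use_reverse else 0)


# Hypothetical arrays: one high-frequency, one low-frequency.
high_freq = [0] * 11 + [1, 1, 3, -2] + [0]
low_freq = [5, -2, 0, 1] + [0] * 12

print(quantize_step_208(high_freq))
print(quantize_step_208(low_freq))
```

For the hypothetical high-frequency array the reverse mode is selected and the zero count is 1; for the low-frequency array the conventional forward mode is retained.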


Some particular variations of the compression process 200 from how it is represented in FIG. 8 merit noting. The step 220 is depicted as being the same regardless of whether it follows step 216 or step 218. While there will, of course, be some minor difference here if the optional mode flag 108 is employed, entirely different algorithms for position coding can also be used.


Using the same position coding algorithm works well in the examples used herein, which are based on video data and the standardized forms of compression applied to it, but this should not be taken as implying a limitation or even desirability. For instance, the present invention can also be applied to audio type data, which often has multiple channels for stereo or other sophisticated effects. Using different position coding algorithms may be more efficient here, possibly by applying different ones chosen to take advantage of inherent relationships between the channels to achieve more efficient compression of the audio data.


Similarly, when the optional mode flag 108 is employed, step 210 will typically have minor differences over what would have previously been done conventionally. Additionally, however, here as well there is no particular reason that step 210 (VLC entropy encoding) has to be performed the same for the output from step 216 as for the output from step 218.


In summary, providing “counting mode flexibility” and using a mode flag 108 to indicate the particular mode used enables an efficient representation of low-frequency as well as high-frequency data, as opposed to only the low-frequency data that is the target of the present-day video compression standards. Furthermore, as noted herein repeatedly and now for the last time, video data is merely one type of data that is suitable for application of the inventive compression system 100. For example, without limitation, it is relatively easy to appreciate that suitable embodiments of the inventive compression system 100 can be beneficially applied to still image and audio data. Conceptually, images can be thought of as similar to the individual frames of raw video data in the examples presented above, and compressing audio data today (e.g., MP3) uses most of the same principles and techniques as compressing video.


Finally, it should be noted that we have used examples based on low-frequency and high-frequency digital data, since these should be ones readily appreciated by skilled practitioners in this art. The spirit of the present invention, however, has broader applicability than merely to above the diagonal 49 low-frequency data (e.g., FIG. 2b) versus below the diagonal 49 high-frequency data (e.g., FIG. 4b). From FIG. 2b and FIG. 4b it can be appreciated that frequency is a data characteristic that the present invention can employ. And from the two data blocks 40 in FIGS. 9a-b it can be further appreciated that there are other characteristics of the digital data that may be employed by embodiments of the present invention. FIG. 9a shows a low-frequency block 114 that will nonetheless be inefficiently compressed if frequency is the criterion used in step 214, and FIG. 9b shows a high-frequency block 116 that will also be inefficiently compressed if frequency is the criterion used in step 214. Accordingly, while frequency is expected to be the analysis characteristic most employed by embodiments of this invention, other analysis characteristics may additionally or alternatively be used in other embodiments.


While various embodiments have been described above, it should be understood that they have been presented by way of example only, and that the breadth and scope of the invention should not be limited by any of the above described exemplary embodiments, but should instead be defined only in accordance with the following claims and their equivalents.

Claims
  • 1. A method for compressing digital data, the method comprising: (a) representing a portion of the digital data predictionally and transformationally as a block of transform coefficients; (b) quantizing said block selectively into a set of encoding symbols based on an indication whether said transform coefficients represent said portion as having a particular characteristic; and (c) encoding said set of encoding symbols into a data bit stream.
  • 2. The method of claim 1, wherein said characteristic is frequency.
  • 3. The method of claim 1, wherein: said portion is a current portion having an adjacent prior portion within the digital data; and said (a) includes representing said current portion in said block based on prediction including a shift component and a difference component, wherein said shift component indicates a change in location of said current portion with respect to said prior portion and said difference component indicates a change in content of said current portion with respect to said prior portion.
  • 4. The method of claim 1, wherein said (a) includes transforming said portion into a domain wherein said portion is more compactly represented in said block.
  • 5. The method of claim 4, wherein said domain is a frequency domain.
  • 6. The method of claim 1, wherein said (b) includes applying a lossy compression to said block.
  • 7. The method of claim 6, wherein said lossy compression includes at least one member of the set consisting of scaling down said transform coefficients and truncating said transform coefficients to integer values.
  • 8. The method of claim 1, wherein said indication is based on a member of the set consisting of analyzing said portion of the digital data, analyzing a section of the digital data including said portion, and a determination based on an inherent nature of the digital data.
  • 9. The method of claim 1, wherein said set of encoding symbols includes quadruplets representing run, level, sign, and last wherein run corresponds to a quantity of zeros before a non-zero value, level corresponds to a magnitude of said non-zero value, sign indicates whether said non-zero value is positive or negative, and last indicates whether a current said quadruplet is last in a said set.
  • 10. The method of claim 1, wherein said (b) further includes adding a mode flag to said set of encoding symbols that indicates a technique used for said quantizing.
  • 11. The method of claim 1, wherein said (b) includes: (1) applying a reverse zigzag scan order to said transform coefficients based on said indication; and (2) otherwise applying a forward zigzag scan order to said transform coefficients.
  • 12. The method of claim 11, wherein said indication is that said portion has a high-frequency type said characteristic.
  • 13. The method of claim 11, wherein said (b) further includes adding a mode flag to said set of encoding symbols that indicates whether said reverse zigzag scan order or said forward zigzag scan order was applied.
  • 14. The method of claim 11, wherein: said portion is in a sequence of adjacent portions within the digital data that are processed the same with respect to said indication by the method into a series of said sets of encoding symbols; and said (b) further includes adding a mode flag to said series that indicates whether said reverse zigzag scan order or said forward zigzag scan order was applied.
  • 15. The method of claim 1, wherein said (c) includes applying a loss-less compression to said set of encoding symbols.
  • 16. The method of claim 1, wherein said (c) includes applying a variable length coding to said set of encoding symbols.
  • 17. The method of claim 16, wherein said variable length coding is a Huffman coding.
  • 18. A system for compressing digital data, comprising: a logic that represents a portion of the digital data predictionally and transformationally as a block of transform coefficients; a logic that quantizes said block selectively into a set of encoding symbols based on an indication whether said transform coefficients represent said portion as having a particular characteristic; and a logic that encodes said set of encoding symbols into a data bit stream.
  • 19. The system of claim 18, wherein said characteristic is frequency.
  • 20. The system of claim 18, wherein said logic that represents includes a logic that transforms said portion into a domain wherein said portion is more compactly represented in said block.
  • 21. The system of claim 20, wherein said domain is a frequency domain.
  • 22. The system of claim 18, wherein said logic that quantizes includes a logic that applies a lossy compression to said block.
  • 23. The system of claim 22, wherein said lossy compression includes at least one member of the set consisting of scaling down said transform coefficients and truncating said transform coefficients to integer values.
  • 24. The system of claim 18, further comprising a logic that analyzes a section of the digital data to determine said indication.
  • 25. The system of claim 24, wherein said section is said portion of the digital data.
  • 26. The system of claim 18, wherein said logic that quantizes adds a mode flag to said set of encoding symbols that indicates a technique used for quantizing.
  • 27. The system of claim 18, wherein said logic that quantizes includes a logic that applies a reverse zigzag scan order to said transform coefficients based on said indication and that otherwise applies a forward zigzag scan order to said transform coefficients.
  • 28. The system of claim 27, wherein said indication is that said portion has a high-frequency type said characteristic.
  • 29. The system of claim 27, wherein said logic that quantizes further includes a logic that adds a mode flag to said set of encoding symbols to indicate whether said reverse zigzag scan order or said forward zigzag scan order was applied.
  • 30. The system of claim 25, wherein said logic that quantizes further includes a logic that adds a mode flag to one in a series of said sets of encoding symbols, wherein said mode flag indicates whether said reverse zigzag scan order or said forward zigzag scan order was applied with respect to said series.
  • 31. The system of claim 18, wherein said logic that encodes includes a logic that applies a loss-less compression to said set of encoding symbols.
  • 32. The system of claim 18, wherein said logic that encodes includes a logic that applies a variable length coding to said set of encoding symbols.