1. Field of the Invention
The field of the invention is digital video data compression, in particular variable length coding. Still more specifically, the present invention relates to systems and methods for quickly determining which type of encoding to use for discrete cosine transformed, quantized pixel data in video blocks.
2. Description of the Related Art
Video data compression requires several calculations to be made repeatedly on pixel data from the video source. Some of those calculations are used to determine how to encode portions of the video data, either to provide the best compression results or simply to comply with the MPEG specification. Typically those calculations are done in sequence, which in the worst case can cause a delay 5 times longer than the best case. Because the delay can be significant, buffers must be made larger to accommodate the worst case, and performance suffers.
The MPEGx (1, 2 and 4) standards specify how digital video data should be compressed and stored. Recognizing that there exists temporal and spatial continuity within a video signal, MPEG standards have been designed to represent video data in a way that takes advantage of the continuity in the signal. Prior art compression algorithms are well understood in the art and specified in complete detail in documents available from the International Organization for Standardization (ISO). The MPEG video and audio compression data format is widely available and can be purchased from the ISO. ISO/IEC 14496-2 from ISO/SC29/WG11 published in Tokyo, Japan in March 1998 will be used as a reference in describing this invention below. The latest version of the “Coding of Moving Pictures and Audio” standards document is always available from the ISO.
While the compression standards are widely published and well known in the industry, there are diverse ways to implement the compression standards. Prior art public domain or free software-based implementations of the compression algorithms are available on the Internet. Software implementations have acceptable performance but usually require longer than real time to convert video data to the MPEG standard format. In other words, using prior art compression methods, it takes longer to compress the video data than it would to view the video on screen.
With the recent release of 8×DVD writers (meaning hardware that can write or burn a DVD 8 times as fast as DVDs are normally played), it would be convenient to have an MPEG compression system which could run 8 times faster than real time so that video data could be written directly to a DVD without intermediate storage like a large data buffer. While hardware implementations of the compression schemes can be faster than their software counterparts, there continues to be a need for better systems and methods for performing MPEG compression.
Particular elements of the compression scheme must be performed repeatedly when compressing video data. Considering that a typical video stream has 15-30 frames per second and video data is often at VGA resolution (640×480 pixels, or 307,200 pixels per frame), between about 4.6 million pixels (at 15 frames per second) and 9.2 million pixels (at 30 frames per second) must be processed and compressed per second.
MPEG processing uses groups or blocks of 8×8 pixels or 64 pixels as a basic processing unit. In order to have the fastest possible hardware for MPEG processing, it is essential to make the total processing time required for one 8×8 pixel unit as short as possible. Techniques or methods that reduce the time required to process the data contained in each 8×8 pixel image can substantially decrease the time required to complete the compression algorithms and convert the data to an MPEG standard format.
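The pixel and block rates implied by the two preceding paragraphs can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it counts one sample per pixel, ignoring any chrominance subsampling.

```python
# Throughput estimate for VGA video split into 8x8 processing units.
WIDTH, HEIGHT = 640, 480      # VGA resolution from the text
BLOCK_PIXELS = 8 * 8          # one 8x8 block = 64 pixels


def pixels_per_second(fps):
    """Pixels that must be processed each second at the given frame rate."""
    return WIDTH * HEIGHT * fps


def blocks_per_second(fps):
    """8x8 blocks that must be processed each second at the given frame rate."""
    return pixels_per_second(fps) // BLOCK_PIXELS


if __name__ == "__main__":
    for fps in (15, 30):
        print(fps, pixels_per_second(fps), blocks_per_second(fps))
```

At 15 frames per second this gives 4,608,000 pixels (72,000 blocks) per second, and at 30 frames per second 9,216,000 pixels (144,000 blocks) per second, so any per-block saving is multiplied many thousands of times each second.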
What is needed are systems and methods for speeding up the conversion of uncompressed digital video data into a compressed MPEG format in order to operate at speeds much faster than real time.
According to the MPEG video compression standard, pixel data in 8×8 groups are transformed using a Discrete Cosine Transformation (DCT). Each frame of video data includes such pixel data blocks. After quantizing the resulting values, they are compressed using a variable length code specified by the MPEG standard. The dictionary for the variable length coding scheme in the MPEG4 standard is fixed. Data that cannot be encoded using codes from the VLC dictionary is stored using one of three alternative forms.
The variable length codes indicate various states of the values in the quantized 8×8 matrix resulting from the DCT. According to one approach defined in the MPEG standard, the values in the matrix are serialized starting at the value in the upper left corner and moving back and forth in a diagonal, eventually ending on the value in the lower right corner. The discrete cosine transformation usually creates a matrix that contains many zeros and relatively few non-zero values. Instead of storing all 64 values from the sparsely populated matrix, a list of pairs of values is stored. Each pair includes a run value R and a level value L. Valid run values are 0 to 62 and indicate how many zeros appear before the level value. Valid level values in MPEG4 are −2047 to 2047.
The calculations required to determine the format in which to store the compressed data require several addition steps and three sequential table lookup steps. Depending on the results of the table lookup steps, the time to complete the calculation typically varies between 4 and 20 clock cycles. Since this calculation must be performed repeatedly, reducing the number of clock cycles needed to perform it yields a significant performance boost.
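The serialization and run-level pairing described above can be sketched as follows. This is a minimal illustration, not the procedure of the standard: the function names are hypothetical, only the diagonal back-and-forth scan is shown, and details such as any separate handling of individual coefficients or end-of-block signalling are omitted.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n matrix in diagonal zigzag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + col indexes one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                # even diagonals are walked upward
        order.extend(diag)
    return order


def run_level_pairs(block):
    """Convert a quantized square block into a list of (run, level) pairs."""
    pairs = []
    run = 0                               # zeros seen since the last non-zero value
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs                          # trailing zeros produce no pair


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[2][0] = 12, -3, 5
    print(run_level_pairs(block))
```

For the sparse block in the usage example, only three pairs are produced instead of 64 stored values, which is the saving the paragraph describes.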
The present invention overcomes the problems of the prior art with systems and methods that are performed in parallel in hardware so that the worst-case delay is minimized to speed up the compression of digital video data. Specifically, the present invention described herein performs the above-described calculation within a maximum time using special lookup tables and parallel execution of the calculation and logic. The special lookup tables along with the appropriate logic and arithmetic calculations complete the determination of which format to use in 4 clock cycles in all cases. This is particularly advantageous because it allows the buffers to be reduced in size and reduces the time required for compression.
The accompanying drawings illustrate embodiments and further features of the invention and, together with the description, serve to explain the principles of the present invention.
The present invention is now described more fully with reference to the accompanying Figures, in which several embodiments of the invention are shown. The present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather these embodiments are provided so that this disclosure will be complete and will fully convey the invention to those skilled in the art.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention. For example, the present invention will now be described in the context and with reference to MPEG compression, in particular MPEG 4. Still more particularly, the present invention will be described with reference to blocks of 8×8 pixels. However, those skilled in the art will recognize that the principles of the present invention are applicable to various other compression methods, and blocks of various sizes.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and modules presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, features, attributes, methodologies, and other aspects of the invention can be implemented as software, hardware, firmware or any combination of the three. Of course, wherever a component of the present invention is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of skill in the art of computer programming. Additionally, the present invention is in no way limited to implementation in any specific operating system or environment.
Referring now to
The invention herein described relates to an improved system and method for performing variable length coding. Specifically an improved VLC unit 108 is described in more detail below. While the present invention is described in detail below with reference to encoding according to the ISO MPEG standards, it should be understood that the principles of the present invention are applicable to other encoding methods. The present invention minimizes the time required to determine the type of encoding to use in order to store (run, level) pairs after an 8×8 matrix of video data values is transformed using a discrete cosine transform, quantized and serialized as specified in ISO MPEG standards documents and as described generally above with reference to
In MPEG4 compression, there are four ways to encode the quantized values. First, for certain (run, level) pairs which occur frequently, an encoding is specified in the MPEG standard which is unique and efficient in that it contains as few bits as possible. This is well known in the art of video data compression and is clearly explained in the ISO MPEG standard so it will not be described here except in an effort to clarify the invention.
Referring now to
The MPEG standards committee assigned specific variable length codes to the cells containing “X”s because those pairs are found most frequently in transformed video data. By using the fewest bits to encode those pairs, the video data will be highly compressed.
The variable length code table only represents 67 codes. Increasing the number of variable length codes stored in the table would necessarily increase the length of the codes used to compress the data. Instead of modifying the table to include less frequently used run-level pairs, the MPEG committee chose to implement 3 different escape modes. The MPEG specification details how the escape modes work and what follows is only a brief description. All details can be found in the pertinent ISO standards document.
In the following description of the present invention, Rmax refers to the maximum run of zeros represented as a VLC in the table shown in
Alternatively, for every run of zeros, R, there exists a level, Lmax, that is the maximum level for a given run R that is represented as a VLC. The relationship between R and Lmax is shown in the table of
In accordance with the present invention and now also referring to
The method first determines 602 whether the run-level pair can be directly encoded using the VLC codebook. If the run-level pair (R, L) can be encoded directly using a standard VLC value, the VLC is stored 604 in the compressed video stream and the next run-level pair (R, L) is checked. Specifically, the method determines in step 618 whether there are additional run-level pairs to encode and, if so, gets 620 the next run-level pair and returns to step 602. If there are no more run-level pairs, the method is complete and ends.
If the run-level pair (R, L) cannot be encoded directly, then L−Lmax is calculated 606, where Lmax is a function of R (Lmax=f(R)). Then the method determines 608 if the pair (R, L−Lmax) can be encoded directly. If the pair (R, L−Lmax) can be encoded directly using a VLC value, then the first type of escape mode (type I) is used 610 for encoding. In other words, the VLC for (R, L−Lmax) is used, but it is surrounded by an escape sequence which indicates that the level retrieved from the table must be added to Lmax to get the true level L.
If neither (R, L) nor (R, L−Lmax) can be encoded directly using a VLC, then another test is done in step 612. Rmax is calculated 611 where Rmax is a function of L (Rmax=f(L)) and if (R−(Rmax+1), L) can be encoded directly 612 using a VLC, then the second type of escape mode (type II) is used 614 for encoding. The VLC which represents (R−(Rmax+1), L) is surrounded by bits indicating the second escape mode is in use and when the run and level are retrieved from the VLC table using the second escape mode, (Rmax+1) is added to the retrieved run value before the (R, L) pair is used for decoding.
Finally, if none of (R, L), (R, L−Lmax), and (R−(Rmax+1), L) can be encoded directly using a VLC, a third escape mode (type III) is used 616 which encodes the run and level directly. In one embodiment, this is done by using the run value plus the level value plus the sign directly. This is the least efficient of the encoding methods and requires the most bits for storage. After the run-level pair is encoded using the third escape mode, the method continues in step 618 to determine whether there are more run-level pairs to encode.
In order to calculate the compressed video data stream for transformed, quantized, serialized video data, the above tests 602, 608, 612 must be performed for each (run, level) pair. Every time the third escape mode (type III) is used, the VLC table must be accessed 3 times. Calculating the VLC for a given pair can take many clock cycles in a hardware implementation, and looking up a VLC three times to encode a single (run, level) pair is time-consuming and wasteful. Additionally, in some hardware implementations, the design must be based on the worst-case delays. It is wasteful to wait for three VLC lookups for all pairs when most are known after the very first lookup.
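The sequential tests 602, 608 and 612 described above can be sketched in software as follows. This is an illustration under stated assumptions, not the MPEG4 codebook: TOY_LMAX is a hypothetical three-entry table in which, for each run R, every level from 1 to TOY_LMAX[R] is assumed to have a VLC, and all function names are invented for this example. Only the magnitude of the level is tested; sign handling is outside the sketch.

```python
# Hypothetical toy codebook, NOT the MPEG4 table: for each run R, levels
# 1..TOY_LMAX[R] are assumed to have a VLC. Runs not listed have none.
TOY_LMAX = {0: 3, 1: 2, 2: 1}


def has_vlc(run, mag):
    """True if the toy codebook has a code for (run, level magnitude)."""
    return 1 <= mag <= TOY_LMAX.get(run, 0)


def rmax_for(mag):
    """Largest run with a VLC at this magnitude, or None if there is none."""
    runs = [r for r, lmax in TOY_LMAX.items() if lmax >= mag]
    return max(runs) if runs else None


def choose_encoding(run, level):
    """Return (mode, tests), where tests counts the sequential stages reached."""
    mag = abs(level)
    # Step 602: can (R, L) be encoded directly?
    if has_vlc(run, mag):
        return "VLC", 1
    # Steps 606/608: retry with the level reduced by Lmax = f(R).
    lmax = TOY_LMAX.get(run, 0)
    if lmax and has_vlc(run, mag - lmax):
        return "ESC1", 2
    # Steps 611/612: retry with the run reduced by Rmax + 1, where Rmax = f(L).
    rmax = rmax_for(mag)
    if rmax is not None and has_vlc(run - (rmax + 1), mag):
        return "ESC2", 3
    # Step 616: encode the run and level directly.
    return "ESC3", 3


if __name__ == "__main__":
    for pair in [(0, 2), (0, -5), (4, 1), (0, 7)]:
        print(pair, choose_encoding(*pair))
```

The second element of the result shows how far down the chain a pair travels before its mode is known: pairs with a direct code finish after one test, while type II and type III pairs pass through all three stages in sequence.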
Those skilled in the art will recognize that any of the steps for the process outlined in
Returning now to
Referring now to
Once the level is 28 or higher (or more specifically, the absolute value of L is 28 or higher) there are no values of R for which there is a valid VLC. Hence, in the table of
For this invention, a lookup table is implemented in the VLC unit 108 or in memory which returns a value Rmax for a given value of L. N/A is not a valid value to store in a lookup table and for one embodiment of this invention, the value 63 is stored in the Rmax lookup table. In other words, if a level L greater than 27 is provided to the VLC unit 108 and Rmax is requested, the value 63 is returned. This value is invalid because the MPEG4 specification specifically forbids storing a (run, level) pair with R=63 in the video data stream. The invalid value is used in this invention to indicate that there is no VLC available for the given L.
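A software analogue of such an Rmax lookup table might look like the sketch below. The table contents are an assumption: TOY_LMAX is a hypothetical three-entry codebook rather than the MPEG4 table, so the sentinel appears here for magnitudes above 3 instead of above 27. The names build_rmax_table and rmax are likewise invented for illustration.

```python
NO_VLC = 63        # sentinel from the text: R=63 can never be a real run value
MAX_LEVEL = 2047   # largest level magnitude allowed in MPEG4 per the text

# Hypothetical toy codebook: for run R, levels 1..TOY_LMAX[R] have a VLC.
TOY_LMAX = {0: 3, 1: 2, 2: 1}


def build_rmax_table(lmax_by_run):
    """Precompute Rmax for every level magnitude, storing NO_VLC where none exists."""
    table = [NO_VLC] * (MAX_LEVEL + 1)    # index 0 is unused: a level is never zero
    for mag in range(1, MAX_LEVEL + 1):
        runs = [r for r, lmax in lmax_by_run.items() if lmax >= mag]
        if runs:
            table[mag] = max(runs)
    return table


RMAX_TABLE = build_rmax_table(TOY_LMAX)


def rmax(level):
    """Single-step lookup of Rmax from the level; the sign is ignored."""
    return RMAX_TABLE[abs(level)]


if __name__ == "__main__":
    for level in (1, -3, 4, 2047):
        print(level, rmax(level))
```

Because every index holds a number, the lookup never needs a special "not available" path; a consumer only has to compare the result with 63.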
Referring to
Referring now to
Lmax can be calculated (or looked up) based on run R. Rmax can be calculated from level L. Looking at the first column (L<=Lmax) and the second column (R<=Rmax), it is shown that if either of those two inequalities is true, VLC encoding can be used. In other words, there exists a valid VLC in both of those cases. Type III encoding cannot be used in these two cases because the MPEG4 standard specifically forbids encoding (R, L) pairs with the type III encoding when a VLC is available.
As shown in the third column, if L is greater than Lmax and less than or equal to 2*Lmax, escape type I encoding should preferably be used. Escape type III encoding could be used but is unnecessary and is less efficient in terms of bits used. In some cases, escape type II encoding could also be used, and in those cases the best encoding is the one that uses fewer bits in the compressed data stream.
The fourth column indicates where type II encoding is preferably used. If R is greater than Rmax and less than 2(Rmax+1), type II encoding is the preferable escape mode. However, if (Lmax<L<=2*Lmax), type I encoding is also available and the best encoding depends on the length of the VLC codes used to represent the pairs (R, L−Lmax) and (R−(Rmax+1), L).
As shown in the fifth column, if L>2*Lmax, neither VLC nor type I encoding may be used.
In the sixth column, if R>=2(Rmax+1), neither VLC nor type II encoding is available.
Lmax=0 indicates that it will be impossible to use type I encoding because no valid Lmax is available for the given value of R. VLC encoding is also not possible in this case because there is no VLC code for the given (R, L) pair.
Finally, Rmax=63 indicates that type II encoding is not available because no valid Rmax is available for the given value of L. VLC encoding is also not possible in this case because there is no VLC code for the given (R, L) pair.
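The conditions listed column by column above can be collected into one function that needs only R, L, Lmax and Rmax. The sketch below is one reading of those conditions, not a definitive implementation: the function name is hypothetical, the boundary cases L = Lmax and R = Rmax are treated as directly encodable because Lmax and Rmax are by definition the largest values that still have a VLC, and the sentinels Lmax=0 and Rmax=63 are checked explicitly.

```python
NO_LMAX = 0    # sentinel: no VLC exists for this run
NO_RMAX = 63   # sentinel: no VLC exists for this level


def available_modes(run, level, lmax, rmax):
    """Return the permissible encodings for (run, level) given Lmax and Rmax."""
    mag = abs(level)
    has_lmax = lmax != NO_LMAX
    has_rmax = rmax != NO_RMAX
    # Columns 1 and 2: a direct VLC exists, so no escape mode may be used.
    if (has_lmax and mag <= lmax) or (has_rmax and run <= rmax):
        return ["VLC"]
    modes = []
    # Column 3: Lmax < L <= 2*Lmax allows escape type I.
    if has_lmax and lmax < mag <= 2 * lmax:
        modes.append("ESC1")
    # Column 4: Rmax < R < 2(Rmax+1) allows escape type II.
    if has_rmax and rmax < run < 2 * (rmax + 1):
        modes.append("ESC2")
    # Columns 5 and 6, or a sentinel on both sides: only type III remains.
    return modes or ["ESC3"]


if __name__ == "__main__":
    print(available_modes(0, -2, lmax=3, rmax=1))
    print(available_modes(1, 3, lmax=2, rmax=0))
    print(available_modes(0, 7, lmax=3, rmax=NO_RMAX))
```

When both "ESC1" and "ESC2" are returned, the text leaves the choice to whichever resulting code is shorter; that comparison needs the actual code lengths and is not modeled here.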
In the table shown in
Referring now to
At look-up table 303, Rmax is calculated or looked up for a given value of L on line 301. The Rmax value is added to the negated value of R 300 by adder 325 and the result (Rmax−R) is compared with 0 by comparator 320. If the result is greater than or equal to zero, a TRUE signal is sent to the OR block 321 and sign line 316 is asserted TRUE.
At adder 317, the value of (Lmax−L) output by adder 314 is added to the value of Lmax from the look-up table 302. At comparator 318, in parallel to other calculations and logical blocks in one embodiment, if the value from adder 317 (Lmax+Lmax−L) is greater than or equal to zero a TRUE signal is sent on line 319, indicating that it is possible to use escape type I.
At adder 305, the value 1 is added to the value of Rmax output by the look-up table 303. At adder 306, the new value (Rmax+1) is added to itself, doubling it. At adder 307, R 300 is subtracted from the 2(Rmax+1) value. At comparator 308, the output of 307 is compared with zero. If it is greater than zero, a TRUE signal is sent on line 309 indicating that it is possible to use escape type II.
If either comparator 308 or comparator 318 outputs FALSE, a corresponding TRUE signal is sent to the AND block 310. If both 308 and 318 are false, then both send TRUE signals to block 310 and block 310 sends a TRUE signal to the OR block 311. In response, the OR block 311 outputs a TRUE signal on line 312 indicating that escape type III must be used for this pair (R 300, L 301).
From look-up table 303, the value of Rmax is compared with the number 63 at 313. If Rmax=63, a TRUE signal is sent to the OR block 311. In response, the OR block 311 sends a TRUE signal on line 312 indicating that escape type III must be used for this pair (R 300, L 301).
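The adders, comparators and logic blocks described above can be mirrored in a small software model to make the data dependencies visible. This is a sketch under assumptions: the function name and dictionary keys are hypothetical, the Lmax−L input to OR block 321 is assumed rather than taken from the passage, and the model only reproduces the raw signals. It does not attempt the downstream selection among them, which the passage does not spell out.

```python
def datapath_signals(run, level, lmax, rmax):
    """Compute the raw output lines from R, L and the two table outputs."""
    mag = abs(level)

    # Adder 314 and adder 325: differences against the table outputs.
    lmax_minus_l = lmax - mag
    rmax_minus_r = rmax - run

    # Comparator 320 feeds OR block 321; the Lmax-L input is an assumption.
    line_316 = (rmax_minus_r >= 0) or (lmax_minus_l >= 0)

    # Adder 317 and comparator 318: (Lmax + Lmax - L) >= 0 -> escape type I possible.
    line_319 = (lmax_minus_l + lmax) >= 0

    # Adders 305, 306, 307 and comparator 308: 2(Rmax+1) - R > 0 -> type II possible.
    rmax_plus_1 = rmax + 1
    line_309 = (rmax_plus_1 + rmax_plus_1 - run) > 0

    # AND block 310 sees the negated outputs of comparators 308 and 318.
    and_310 = (not line_309) and (not line_319)

    # Comparator 313 flags the sentinel Rmax = 63; OR block 311 drives line 312.
    cmp_313 = rmax == 63
    line_312 = and_310 or cmp_313

    return {"line_316": line_316, "line_319": line_319,
            "line_309": line_309, "line_312": line_312}


if __name__ == "__main__":
    print(datapath_signals(0, 2, lmax=3, rmax=1))
    print(datapath_signals(6, 1, lmax=0, rmax=2))
    print(datapath_signals(4, 7, lmax=0, rmax=63))
```

Each signal is derived from the inputs and the two table outputs alone, never from the outcome of an earlier codebook test, which is the property that lets hardware evaluate them side by side. Note that with the sentinel Rmax=63 the comparisons on lines 316 and 309 are trivially true, which is why comparator 313 tests for the sentinel separately.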
These calculations and logic sequences shown in
It will be recognized that the descriptions, calculations, and sequences described in this specification represent an embodiment of the invention and that one skilled in the art might implement the same invention in a slightly different manner but one that is in keeping with the spirit of this invention.
The present application claims priority under 35 U.S.C. § 119(e) to U.S. provisional patent application entitled “Video Processing System and Method” filed on May 7, 2004, having Ser. No. 60/568,892, which is incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60568892 | May 2004 | US