1. Field of the Invention
The present invention relates to video encoders and, more particularly, to a method and apparatus for detecting zero coefficients for various video encoding functions.
2. Description of the Background Art
The International Telecommunication Union (ITU) H.264 video coding standard is able to compress video much more efficiently than earlier video coding standards, such as ITU H.263, MPEG-2 (Moving Picture Experts Group), and MPEG-4. H.264 is also known as MPEG-4 Part 10 and Advanced Video Coding (AVC). H.264 exhibits a combination of new techniques and increased degrees of freedom in using existing techniques. Among the new techniques defined in H.264 are 4×4 and 8×8 discrete cosine transforms (DCT). Since transformed quantized coefficients are used to form the final outputs of the encoding process, and since various encoding functions (e.g., motion estimation, intra prediction, and mode selection) involve numerous coefficient calculations, it is helpful to be able to quickly determine if a block will result in all zero coefficients by using simple computations.
For example, a method for implementing 4×4 intra mode decision is to compute coefficients for each 4×4 predicted region subtracted from the original or reconstructed pixels for all nine modes. Since a macroblock has 16 4×4 blocks, the method may have to perform 16×9=144 transform and quantization steps. Once all the computations are completed, the method will then be able to select the best mode. Unfortunately, this large number of calculations is computationally expensive and may be prohibitively large for real-time systems. Accordingly, there exists a need in the art for detecting zero coefficients for various video encoding functions in a more efficient manner.
In one embodiment, the present invention discloses a method and apparatus for determining whether a block of pixels will likely contain all zero coefficients for various video encoding functions. For example, the method receives or obtains a block of pixels from an input image and computes a distortion measure for the block of pixels. The method then determines whether the block of pixels contains all zero coefficients in accordance with the distortion measure.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
A method and apparatus for implementing a video encoder are described. More specifically, the present invention discloses an implementation of an encoder, e.g., an H.264 encoder, that is capable of detecting zero coefficients (e.g., coefficients that will likely have values of zero) for various video encoding functions in a more efficient manner. A brief description of the various encoding functions performed by an H.264 encoder or an H.264-like encoder is first provided. One or more of these encoding functions (e.g., motion estimation, intra prediction, and mode selection) may benefit from a method that is capable of quickly detecting zero coefficients in a block.
The DCT module 104 transforms the difference signal from the pixel domain to the frequency domain using a DCT algorithm to produce a set of coefficients. The quantizer 106 quantizes the DCT coefficients. The entropy coder 108 codes the quantized DCT coefficients to produce a coded frame.
The inverse quantizer 110 performs the inverse operation of the quantizer 106 to recover the DCT coefficients. The inverse DCT module 112 performs the inverse operation of the DCT module 104 to produce an estimated difference signal. The estimated difference signal is added to the predicted frame by the summer 114 to produce an estimated frame, which is coupled to the deblocking filter 116. The deblocking filter deblocks the estimated frame and stores the estimated frame or reference frame in the frame memory 118. The motion compensated predictor 120 and the motion estimator 124 are coupled to the frame memory 118 and are configured to obtain one or more previously estimated frames (previously coded frames).
The motion estimator 124 also receives the source frame. The motion estimator 124 performs a motion estimation algorithm using the source frame and a previous estimated frame (i.e., reference frame) to produce motion estimation data. For example, the motion estimation data includes motion vectors and minimum SADs (sum of absolute differences) for the macroblocks of the source frame. The motion estimation data is provided to the entropy coder 108 and the motion compensated predictor 120. The entropy coder 108 codes the motion estimation data to produce coded motion data. The motion compensated predictor 120 performs a motion compensation algorithm using a previous estimated frame and the motion estimation data to produce the predicted frame, which is coupled to the intra/inter switch 122. Motion estimation and motion compensation algorithms are well known in the art.
To illustrate, the motion estimator 124 may include mode decision logic 126. The mode decision logic 126 can be configured to select a mode for each macroblock in a predictive (INTER) frame. The “mode” of a macroblock is the partitioning scheme. That is, the mode decision logic 126 selects MODE for each macroblock in a predictive frame, which is defined by values for MB_TYPE and SUB_MB_TYPE.
The above description only provides a brief view of the various complex algorithms that must be executed to provide the encoded bitstreams generated by an H.264 encoder. The increase in complexity is often a result of a desire to provide better encoding characteristics, e.g., less distortion in the encoded images while using fewer bits to transmit the encoded images. In order to achieve these improved encoding characteristics, it is often necessary to increase the overall computational overhead of an encoder. Unfortunately, the increase in computational overhead also increases the difficulty in implementing a real-time H.264 encoder. As such, the present invention provides a method that is capable of improving various encoding functions (e.g., motion estimation, intra prediction, and mode selection) by quickly detecting zero coefficients in a block.
More specifically, in the H.264/AVC video coding standard, coefficients are computed by transforming and quantizing a set of pixels known as the “residuals”. For example, the residual pixels may be obtained by subtracting two sets of 4×4 pixel regions that depend on the implementation as well as the section of the encoding process. For example, during intra mode selection, the residuals are obtained by subtracting the predicted pixels from the original or reconstructed pixels; while during motion estimation, the residuals are the difference of the reconstructed pixels from the original.
Let $R=[r_{ij}]$, for $1 \le i, j \le 4$, be the 4×4 residual pixel block. The transform of R is obtained as:
$T = A R A^{T}$, (Eq. 1)
where A is the 4×4 forward transform matrix.
The quantization of the transformed residuals $T=[t_{ij}]$, for $1 \le i, j \le 4$, is obtained as:
where
Here $\lfloor \cdot \rfloor$ is the floor operator, and Q is the quantization parameter or level. The level scale constant $M_b$ is the element $m_{ab}$ of the matrix M below, taken from row $a = 1 + (Q \% 6)$ and column $b = 1 + (i \% 2) + (j \% 2)$ of M, where % is the modulo operator. Matrix M is:
Let $M_1$ be the element of matrix M in a given row (determined by Q % 6) and column 1, $M_2$ the element in the same row and column 2, and $M_3$ the element in the same row and column 3. Then we have from M above:
$M_1 < M_2 < M_3$. (Eq. 4)
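By way of illustration only, the row and column indexing described above can be sketched in Python as follows. The numeric entries are an assumption: they are the widely published H.264 4×4 quantization multipliers, arranged here so that every row satisfies (Eq. 4). The names `LEVEL_SCALE` and `level_scale` are hypothetical and are not part of the disclosure.

```python
# Assumed values: the standard H.264 4x4 quantization multipliers, with the
# columns ordered so that M1 < M2 < M3 holds in every row (Eq. 4).
LEVEL_SCALE = [
    [5243, 8066, 13107],  # Q % 6 == 0
    [4660, 7490, 11916],  # Q % 6 == 1
    [4194, 6554, 10082],  # Q % 6 == 2
    [3647, 5825, 9362],   # Q % 6 == 3
    [3355, 5243, 8192],   # Q % 6 == 4
    [2893, 4559, 7282],   # Q % 6 == 5
]


def level_scale(q, i, j):
    """Return M_b for quantization parameter q and 1-based position (i, j)."""
    a = 1 + (q % 6)            # row of M
    b = 1 + (i % 2) + (j % 2)  # column of M
    return LEVEL_SCALE[a - 1][b - 1]


# Every row of the assumed matrix satisfies (Eq. 4).
assert all(m1 < m2 < m3 for m1, m2, m3 in LEVEL_SCALE)
```

With this indexing, the position i=j=1 selects column 3 (the largest multiplier in the row), while i=j=2 selects column 1 (the smallest).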
It should be noted that the matrix transform $T = A R A^{T}$ can be simplified into 16 vector inner products as follows:
Note that if a matrix $W=[w_{11}\ w_{12}\ \ldots\ w_{43}\ w_{44}]$ is constructed, then W is orthogonal. From (Eq. 2), after combining the transform $t_{ij}$ with $M_b$, we have the following 4×4 matrix for $|c_{ij}|$:
where $C=[|c_{ij}|]$ is the coefficient matrix and ONE is a 4×4 matrix of all 1s.
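As a non-limiting illustration, the equivalence between the matrix transform and the 16 inner products can be checked numerically. The sketch below assumes that A is the standard H.264 4×4 forward core transform matrix and that each $w_{ij}$ is the flattened outer product of rows i and j of A. The helper names and the sample residual block are hypothetical.

```python
# Assumption: A is the standard H.264 4x4 forward core transform matrix.
A = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def transform(R):
    """Compute T = A R A^T directly from the matrix definition."""
    return [[sum(A[i][k] * R[k][l] * A[j][l]
                 for k in range(4) for l in range(4))
             for j in range(4)] for i in range(4)]


def basis_vector(i, j):
    """w_ij: outer product of rows i and j of A, flattened row-major (0-based)."""
    return [A[i][k] * A[j][l] for k in range(4) for l in range(4)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Hypothetical residual block and its 16-element vector form r.
R = [[5, -3, 0, 2], [1, 4, -2, 0], [0, 1, 1, -1], [3, 0, -4, 2]]
r = [x for row in R for x in row]

T = transform(R)
T_inner = [[dot(basis_vector(i, j), r) for j in range(4)] for i in range(4)]
assert T == T_inner  # each t_ij equals the inner product of w_ij and r

# Squared L2 norms of the w_ij take only the values 16, 40 and 100,
# i.e. norms of 4, sqrt(40) and 10.
squared_norms = [[dot(basis_vector(i, j), basis_vector(i, j))
                  for j in range(4)] for i in range(4)]
```

Under these assumptions the columns of W are mutually orthogonal, and their norms fall into exactly three classes, which is what allows the bound to be reduced to three candidate values per row of M.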
In order to obtain the upper bounds of $|c_{ij}|$, the present invention explores the upper bounds of $M_b\,|w_{ij}^{T} r|$, for $1 \le i, j \le 4$, and $b = 1 + (i \% 2) + (j \% 2)$. The well-known Hölder's inequality of vector norms can be used to obtain:
$|w_{ij}^{T} r| \le \|w_{ij}\|_p \, \|r\|_q$, (Eq. 7)
where $1 \le p, q \le \infty$, $1/p + 1/q = 1$, and $\|\cdot\|_p$ is the $L_p$ norm. In one embodiment, the present invention selects values for p and q to derive an upper bound of $|w_{ij}^{T} r|$:
p=2, and q=2:
$|w_{ij}^{T} r| \le \|w_{ij}\|_2 \, \|r\|_2 = \|w_{ij}\|_2 \sqrt{D}$, where $D = \|r\|_2^2$ = Distortion. (Eq. 8)
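For illustration, (Eq. 8) can be checked numerically on a small example. In the sketch below the vectors `w` and `r` are hypothetical sample data, and the function names are illustrative only.

```python
import math


def distortion(r):
    """D = squared L2 norm of the residual vector, i.e. a sum of squares."""
    return sum(x * x for x in r)


def bound_eq8(w, r):
    """Right-hand side of (Eq. 8): ||w||_2 * sqrt(D)."""
    return math.sqrt(sum(x * x for x in w)) * math.sqrt(distortion(r))


# Hypothetical 16-element weight vector and residual vector.
w = [2, 1, -1, -2] * 4
r = [5, -3, 0, 2, 1, 4, -2, 0, 0, 1, 1, -1, 3, 0, -4, 2]

inner = sum(x * y for x, y in zip(w, r))
assert abs(inner) <= bound_eq8(w, r)
```

The point of the bound is that D requires only 16 multiply-accumulate operations on the residual, with no transform.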
From (Eq. 8), we have:
where $b = 1 + (i \% 2) + (j \% 2)$. From (Eq. 5) and (Eq. 6), one may get three variations of $M_b \|w_{ij}\|_2$, which are $10M_1$, $\sqrt{40}\,M_2$, and $4M_3$. For different values of Q % 6, these are:
For Q % 6 in {0, 2, 4}, $10M_1$ is largest, whereas for Q % 6 in {1, 3, 5}, $4M_3$ is largest. Thus, the new bound $B_1$ is:
for $1 \le i, j \le 4$. As such, the present method can detect an all zero coefficient block as:
In one embodiment, the above B1 bound can be slightly modified or simplified. For example, the bound B1 in (Eq. 11) can be modified as:
In one embodiment, the PB of Eq. 13 serves as an upper bound of |cij| for the detection of all zero coefficient blocks. In other words, the present method can detect an all zero coefficient block with PB as:
In sum, the present invention has disclosed a method for quickly determining whether a block of pixels will likely contain all zero coefficients. More specifically, by computing a distortion measure D (e.g., using Eq. 8 above) for a block of pixels (e.g., a 4×4 block, an 8×8 block, and the like), one can then easily compare the computed distortion measure D against a threshold (e.g., as defined in Eq. 14) to determine whether the block of pixels will likely contain all zero coefficients. If the computed distortion measure D is less than the defined threshold, e.g., the right side of Eq. 14, then the block will likely contain all zero coefficients. However, if the computed distortion measure D is greater than or equal to the defined threshold, then the block will likely contain some non-zero coefficients. Therefore, the present invention provides a rapid method to determine whether a block of pixels will likely contain all zero coefficients without having to perform a transform step or a quantization step for the block of pixels. This increased efficiency allows the present invention to be implemented in real-time encoding applications.
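The comparison summarized above can be sketched as follows. This is a minimal sketch under stated assumptions and is not necessarily the threshold of Eq. 14: it assumes the common H.264 reference-style quantizer, |c| = floor((M_b·|t| + f) / 2^(15 + floor(Q/6))), with a rounding offset f of one third (intra) or one sixth (inter) of the divisor, and it assumes the standard H.264 multiplier values for M. All names are hypothetical.

```python
import math

# Assumed values: standard H.264 multipliers, columns ordered M1 < M2 < M3.
LEVEL_SCALE = [
    [5243, 8066, 13107], [4660, 7490, 11916], [4194, 6554, 10082],
    [3647, 5825, 9362], [3355, 5243, 8192], [2893, 4559, 7282],
]


def zero_block_threshold(q, intra=True):
    """Distortion threshold below which every coefficient quantizes to zero."""
    m1, m2, m3 = LEVEL_SCALE[q % 6]
    # Largest of the three variations of M_b * ||w_ij||_2.
    k = max(10 * m1, math.sqrt(40) * m2, 4 * m3)
    qbits = 15 + q // 6                  # assumed quantizer shift
    step = 1 << qbits
    f = step // (3 if intra else 6)      # assumed rounding offset
    # k * sqrt(D) + f < step  <=>  D < ((step - f) / k) ** 2
    return ((step - f) / k) ** 2


def is_all_zero_block(residual, q, intra=True):
    """True if the residual block is deemed to yield all zero coefficients."""
    d = sum(x * x for row in residual for x in row)  # distortion D
    return d < zero_block_threshold(q, intra)
```

Because the test is a sufficient condition derived from an upper bound, a block that fails it may still quantize to all zeros; the test only guarantees that blocks passing it need no transform or quantization under the assumed quantizer.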
In step 210, method 200 receives or obtains a block of pixels for processing. For example, a block of 4×4 pixels can be selected for processing. It should be noted that although the present invention is described within the context of a 4×4 block of pixels, the present invention can be adapted to any block size, e.g., an 8×8 block of pixels and so on. It should be noted that the block of pixels can be selected to undergo various encoding functions (e.g., motion estimation, intra prediction, and mode selection).
In step 220, method 200 computes a distortion measure, e.g., D, for the block of pixels, e.g., using Eq. 8 as discussed above. For example, in the context of motion estimation, a residue r can be computed by subtracting a predicted block from a reconstructed block (or a reference block in a reference frame). In turn, the computed residue r can be used to compute the distortion measure D for the block of pixels, e.g., using Eq. 8 above, which essentially involves a sum-of-squares operation.
In step 230, method 200 determines whether the computed distortion measure D is less than a predefined threshold, e.g., as defined in Eq. 14. In one embodiment, a set of thresholds is provided that correlates to the number of available quantization levels or scales. For example, if there are 52 quantization levels, then a table having 52 thresholds is generated in accordance with Eq. 14 and stored. If the query is answered positively in step 230, method 200 proceeds to step 240. If the query is answered negatively, method 200 proceeds to step 250.
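One possible way to precompute such a table of 52 thresholds is sketched below. The threshold formula is an assumption derived from a reference-style H.264 quantizer with shift 15 + floor(Q/6) and a rounding offset of one third of the divisor; it stands in for Eq. 14 and is not taken from it. The multiplier values and all names are likewise assumed for illustration.

```python
import math

# Assumed values: standard H.264 multipliers, columns ordered M1 < M2 < M3.
LEVEL_SCALE = [
    [5243, 8066, 13107], [4660, 7490, 11916], [4194, 6554, 10082],
    [3647, 5825, 9362], [3355, 5243, 8192], [2893, 4559, 7282],
]


def build_threshold_table(num_levels=52, offset_divisor=3):
    """Return one distortion threshold per quantization level."""
    table = []
    for q in range(num_levels):
        m1, m2, m3 = LEVEL_SCALE[q % 6]
        k = max(10 * m1, math.sqrt(40) * m2, 4 * m3)
        step = 1 << (15 + q // 6)      # assumed quantizer divisor
        f = step // offset_divisor     # assumed rounding offset
        table.append(((step - f) / k) ** 2)
    return table


# Built once and stored; indexed by the quantization level at encode time.
THRESHOLDS = build_threshold_table()
```

With the table stored, the query of step 230 reduces to one sum of squares and one comparison against `THRESHOLDS[q]` per block.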
In step 240, method 200 will deem the block of pixels as containing all zero coefficients. In other words, an encoding function can quickly determine that this block of pixels will likely produce a block of all zero coefficients. As such, the computationally expensive steps of performing a transform operation followed by a quantization operation can be avoided for this block of pixels.
In step 250, method 200 will deem the block of pixels as containing at least one non-zero coefficient. As such, the computationally expensive steps of performing a transform operation followed by a quantization operation cannot be avoided for this block of pixels.
In step 260, method 200 determines whether there is an additional block that requires processing. If the query is answered positively, method 200 proceeds back to step 210 to receive the next block of pixels. If the query is answered negatively, method 200 ends in step 265.
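The overall flow of method 200 can be sketched as a simple loop. This is a minimal sketch, assuming the threshold for the current quantization level has already been computed and is passed in, and following the summary above in which a distortion below the threshold indicates an all zero block. The function name and sample data are hypothetical.

```python
def classify_blocks(blocks, predictions, threshold):
    """Return one flag per block: True if deemed all zero, False otherwise."""
    flags = []
    # Steps 210 and 260: take each block in turn until none remain.
    for block, pred in zip(blocks, predictions):
        # Step 220: distortion D as the sum of squared residuals.
        d = sum((s - p) ** 2
                for srow, prow in zip(block, pred)
                for s, p in zip(srow, prow))
        # Step 230: compare D against the threshold.
        # Step 240 (True): transform and quantization can be skipped.
        # Step 250 (False): the block must be transformed and quantized.
        flags.append(d < threshold)
    return flags


# Hypothetical data: a nearly flat block and a busy block, both predicted as 100.
flat = [[100, 101, 100, 99]] * 4
busy = [[100, 120, 80, 100]] * 4
pred = [[100] * 4] * 4
flags = classify_blocks([flat, busy], [pred, pred], threshold=50.0)
```

Here the flat block has D = 8 and the busy block has D = 3200, so only the first is deemed all zero at the illustrative threshold of 50.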
It should be noted that additional encoding steps can be implemented after method 200 is performed. In other words, knowing whether a block of pixels will contain all zero coefficients will expedite the various encoding functions as described with respect to
It should be noted that although not specifically specified, one or more steps of method 200 may include a storing, displaying and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the method can be stored, displayed and/or outputted to another device as required for a particular application. Furthermore, steps or blocks in
In one embodiment, the memory 303 stores processor-executable instructions and/or data that may be executed by and/or used by the processor 301 as described further below. These processor-executable instructions may comprise hardware, firmware, software, and the like, or some combination thereof. Modules having processor-executable instructions that are stored in the memory 303 may include encoding module 312. For example, the encoding module 312 is configured to perform the method 200 of
An aspect of the invention is implemented as a program product for execution by a processor. Program(s) of the program product defines functions of embodiments and can be contained on a variety of signal-bearing media (computer readable media), which include, but are not limited to: (i) information permanently stored on non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by a CD-ROM drive or a DVD drive); (ii) alterable information stored on writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or read/writable CD or read/writable DVD); or (iii) information conveyed to a computer by a communications medium, such as through a computer or telephone network, including wireless communications. The latter embodiment specifically includes information downloaded from the Internet and other networks. Such signal-bearing media, when carrying computer-readable instructions that direct functions of the invention, represent embodiments of the invention.
While the foregoing is directed to illustrative embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims the benefit of U.S. Provisional Application No. 60/863,984 filed on Nov. 2, 2006, which is herein incorporated by reference.
Number | Date | Country
---|---|---
60863984 | Nov 2006 | US