1. Field of Invention
The present invention relates to still image and motion video compression, and, more specifically to the efficient DCT coefficient coding method and apparatus that results in the saving of the computing times with higher coding efficiency.
2. Description of Related Art
Digital image and video have been adopted in an increasing number of applications, which include digital camera, scanner/printer, video telephony, videoconferencing, surveillance system, VCD (Video CD), DVD, and digital TV. In the past almost two decades, ISO and ITU have separately or jointly developed and defined some digital image and video compression standards including JPEG, MPEG-1, MPEG-2, MPEG-4, MPEG-7, H.261, H.263 and H.264. The success of development of the video compression standards fuels the wide applications. The advantage of image and video compression techniques significantly saves the storage space and transmission time without sacrificing much of the image quality.
Most ISO and ITU motion video compression standards adopt Y, Cb and Cr as the pixel elements, which are derived from the original R (Red), G (Green), and B (Blue) color components. The Y stands for the degree of “Luminance”, while the Cb and Cr represent the color difference been separated from the “Luminance”. In both still and motion picture compression algorithms, the 8x8 pixels “Block” based Y, Cb and Cr goes through the similar compression procedure individually.
There are essentially three types of picture encoding in the MPEG video compression standard. I-frame, the “Intra-coded” picture uses the block of 8×8 pixels within the frame to code itself. P-frame, the “Predictive” frame uses previous I-frame or P-frame as a reference to code the difference. B-frame, the “Bi-directional” interpolated frame uses previous I-frame or P-frame as well as the next I-frame or P-frame as references to code the pixel information. In principle, in the I-frame encoding, all “Block” with 8×8 pixels go through the same compression procedure that is similar to JPEG, the still image compression algorithm including the DCT, quantization and a VLC, the variable length encoding. While, the P-frame and B-frame have to code the difference between a target frame and the reference frames.
In most video compression standards including the MPEG 1, MPEG 2 or MPEG 4, there are six to eight syntactical layers of video streams which includes video sequence, group of pictures (GOP), picture, slice, macroblock and block layers.
This invention provides an efficient bit stream encoding method specifically for the reduction of computing time in the motion compensation as well as an efficient method of DCT coefficient coding for both still image and motion video compression.
The present invention is related to a method and apparatus of the image and video data encoding, which plays an important role in digital still image, JPEG and motion video compression, specifically in encoding the MPEG video stream. The present invention significantly reduces the computing times compared to its counterparts in the field of image and video compression.
It is to be understood that both the foregoing general description and the following detailed description are by examples, and are intended to provide further explanation of the invention as claimed.
The present invention relates specifically to the video bit stream encoding. The method and apparatus quickly encodes the block bit stream data, which results in a significant saving of the computing times.
There are in principle three types of picture encoding in the MPEG video compression standard including I-frame, the “Intra-coded” picture, P-frame, the “Predictive” picture and B-frame, the “Bi-directional” interpolated picture. I-frame encoding uses the 8×8 block of pixels within a frame to code information of itself. The P-frame or P-type macro-block encoding uses previous I-frame or P-frame as a reference to code the difference. The B-frame or B-type macro-block encoding uses previous I- or P-frame as well as the next I- or P-frame as references to code the pixel information. In most applications, since the I-frame does not use any other frame as reference and hence no need of the motion estimation, the image quality is the best of the three types of pictures, and requires least computing power in encoding. Because of the motion estimation needs to be done in both previous and next frames, bi-directional encoding, encoding the B-frame has lowest bit rate, but consumes most computing power compared to I-frame and P-frame. The lower bit rate of B-frame compared to P-frame and I-frame is contributed by the factors including: the averaging block displacement of a B-frame to either previous or next frame is less than that of the P-frame and the quantization step is larger than that in a P-frame. Therefore, the encoding of the three MPEG pictures becomes tradeoff among performance, bit rate and image quality, the resulting ranking of the three factors of the three types of picture encoding are shown as below:
JPEG image compression as shown in
A color space conversion 30 mechanism transfers each 8×8 block pixels of the R(Red), G(Green), B(Blue) components into Y(Luminance), U(Chrominance), V(Chrominance) and further shifts them to Y, Cb and Cr. JPEG compresses 8×8 block of Y, Cb, Cr 31, 32, 33 by the following procedures:
DCT 35 converts the time domain pixel values into frequency domain. After transform, the DCT “Coefficients” with a total of 64 sub-bands of frequency represent the block image data, no long represent single pixel. The 8×8 DCT coefficients form the 2-dimention array with lower frequency accumulated in the left top corner, the farer away from the left top, the higher frequency will be. Further on, the closer to the left top, the more DC frequency which dominates the more information. The more right bottom coefficient represents the higher frequency which less important in dominance of the information. Like filtering, quantization 36 of the DCT coefficient is to divide the 8×8 DCT coefficients and to round to predetermined values. Most commonly used quantization table will have larger steps for right bottom DCT coefficients and smaller steps for coefficients in more left top corner. Quantization is the only step in JPEG compression causing data loss. The larger the quantization step, the higher the compression and the more distortion the image will be.
After quantization, most DCT coefficient in the right bottom direction will be rounded to “0s” and only a few in the left top corner are still left non-zero which allows another step of said “Zig-Zag” scanning and Run-Length packing 37 which starts left top DC coefficient and following the zig-zag direction of scanning higher frequency coefficients. The Run-Length pair means the number of “Runs of continuous 0s”, and value of the following non-zero coefficient.
The Run-Length pair is sent to the so called “Variable Length Coding” 38 (VLC) which is an entropy coding method. The entropy coding is a statistical coding which uses shorter bits to represent more frequent happen patter and longer code to represent the less frequent happened pattern. The JPEG standard accepts “Huffman” coding algorithm as the entropy coding. VLC is a step of lossless compression. JPEG is a lossy compression algorithm, the JPEG picture with less than 10× compression rate has sharp image quality, 20× compression will have more or less noticeable quality degradation.
The JPEG compression procedures are reversible, which means the following the backward procedures, one can decompresses and recovers the JPEG image back to raw and uncompressed YUV (or further on RGB) pixels. The main disadvantage of JPEG compression algorithm is the input data are sub-sampled and the compression algorithm itself is a lossy algorithm caused by quantization step which might not be acceptable in some applications
The block pixel differences between a target block and the best match block are coded by going through the DCT, quantization and VCL encoding. The procedure of calculating the block MV and encoding the block pixel differences is called “Motion Compensation”. The DCT and quantization together consumes about 20% computing power. The VLC encoding consumes around 5-10%, while the motion compensation dominates about another 5%-10% of the total computing power.
The DCT, Discrete Cosine Transform consumes the high times of computing in most image and video compression standards. DCT equation is shown as below:
After the DCT transform, the more close to the left top corner AC coefficients, dominates more information. From the other hand, the closer to the right bottom, the less information the AC coefficient dominates. Therefore, the AC farer away from the DC and left top corner can be filtered out to be “0s” by quantization step without sacrificing much image quality.
If the block pixel difference range is smaller than an adaptively predetermined threshold, after the quantization with a predetermined quantization scale which is decided by the image quality and buffer, bit rate controller, then all AC coefficients are filtered out to be 0s and only the DC coefficient is left. If there is only DC left, then a very short “End of Block”, EOB, said “000”” code is assigned to represent the completeness of the block encoding.
Similar mechanism to the video compression as described above can be applied to the JPEG compression except for the differential block pixel calculation. In JPEG, each block pixels can look at left or upper row of blocks of pixels to identify whether a block has similarity or identical values to the target block and can represent the target block without running the procedures of the image compression hence can reduce the times of computing.
An optimized coding method of this invention is to apply variable code to represent the tables of DCT coefficient coding of each sub-band 71, 73 as shown in
An example as illustrated in the following more clearly describes the way of applying the variable length of code the DCT AC coefficients: AC10=3, AC11=0, AC12=−2, AC13=3, AC14=−3, since they range from −3 to 3 which is within [−3,+3], the sequence code to represent these 5 sub-band AC coefficients will be: one 1-bit code of “0” representing range of [−3,+3] followed by 5 values of 3-bit codes, 111, 100, 010, 111 and 011 representing values of 3, 0, −2, 3 and −3 resulting in a shorter code length.
After quantization, the higher frequency DCT coefficients have high possibility of being rounded to “0s”. For the block coding, there is a chance that from a certain AC coefficient, no longer non-zero coefficient, which is very common and using a short code like “0000” to represent “End Of Block” 75 can easily achieve short code length.
It will be apparent to those skills in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or the spirit of the invention. In the view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.