1. Field of Invention
The present invention relates to digital video decompression and, more specifically, to an efficient video bit stream decoding method and apparatus that reduce the time spent accessing the reference memory.
2. Description of Related Art
ISO and ITU have, separately or jointly, developed and defined a number of digital video compression standards, including MPEG-1, MPEG-2, MPEG-4, MPEG-7, H.261, H.263 and H.264. The success of these video compression standards has fueled a wide range of applications, including video telephony, surveillance systems, DVD, and digital TV. Digital image and video compression techniques significantly reduce storage space and transmission time without sacrificing much image quality.
Most ISO and ITU motion video compression standards adopt Y, Cb and Cr as the pixel elements, which are derived from the original R (Red), G (Green), and B (Blue) color components. The Y stands for the degree of “Luminance”, while Cb and Cr represent the color differences separated from the “Luminance”. In both still and motion picture compression algorithms, the Y, Cb and Cr components are each divided into “Blocks” of 8×8 pixels, and each component goes through a similar compression procedure individually.
There are essentially three types of picture encoding in the MPEG video compression standard. The I-frame, the “Intra-coded” picture, uses the blocks of 8×8 pixels within the frame to code itself. The P-frame, the “Predictive” frame, uses a previous I-frame or P-frame as a reference to code the difference. The B-frame, the “Bi-directional” interpolated frame, uses the previous I-frame or P-frame as well as the next I-frame or P-frame as references to code the pixel information. In principle, in I-frame encoding, every “Block” of 8×8 pixels goes through the same compression procedure, which is similar to JPEG, the still image compression algorithm, and includes the DCT, quantization, and VLC, the variable length coding. The P-frame and B-frame, in contrast, have to code the difference between a target frame and the reference frames.
In decompressing a P-type or B-type video frame or block of pixels, accessing the reference memory takes considerable time. Because of the limited number of I/O data pads on most semiconductor memories, accessing the memory and transferring the pixels stored in it becomes the bottleneck of most implementations.
The method and apparatus of this invention significantly speed up the procedure of reconstructing digital video frames of pixels.
The present invention relates to a method and apparatus for decoding a video data stream that speed up the procedure of reconstructing the digital video with less power consumption. The present invention significantly reduces the computing time compared to its counterparts in the field of video stream decompression.
The efficient video bit stream decoding of the present invention analyzes the complexity and quality of the compressed video stream and decides which frames of pixels can be skipped in decompression.
In the efficient video bit stream decoding of the present invention, a hierarchical analysis is applied to quickly decide whether a B-type frame can be skipped in motion compensation.
According to one embodiment of the present invention, a B-type frame or block is more likely to be skipped than a P-type frame or block in video decompression.
According to one embodiment of the present invention, when a P-type frame or block is skipped, lossless image quality is required for the majority of pixels.
According to one embodiment of the present invention, a prediction mechanism is applied to determine which frame or block can be skipped.
According to one embodiment of the present invention, within a macroblock, some blocks can be skipped in video decompression while others cannot.
According to one embodiment of the present invention, when skipping is determined, the weighted factors of the neighboring frames are updated according to which block is skipped.
According to one embodiment of the present invention, there is no need to access the reference frame memory data for a block that is to be skipped; hence, the memory bandwidth becomes available for other units to access, and the decompression engine has more time to work on other operations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and are intended to provide further explanation of the invention as claimed.
There are essentially three types of picture coding in the MPEG video compression standard as shown in
In most applications, since the I-frame does not use any other frame as a reference and hence needs no motion estimation, its image quality is the best of the three types of pictures, and it requires the least computing power in encoding. The encoding procedure of the I-frame is similar to that of a JPEG picture. Because motion estimation has to be done with reference to both the previous and the next frames, encoding a B-type frame consumes the most computing power compared to the I-frame and P-frame. The lower bit rate of the B-frame compared to the P-frame and I-frame results from factors including the following: the average block displacement of a B-frame relative to either the previous or the next frame is less than that of a P-frame, and the quantization step is larger than that in a P-frame. In most video compression standards, including MPEG, a B-type frame is not allowed to be used as a reference by other frames, so an error in a B-frame will not be propagated to other frames, and allowing a bigger error in a B-frame is more common than in a P-frame or I-frame. Encoding of the three MPEG picture types thus becomes a tradeoff among performance, bit rate and image quality; the resulting ranking of the three picture types on these three factors is shown below:
In the encoding of the differences between frames, the first step is to find the difference of the targeted frame, followed by the coding of the difference. For reasons including accuracy, performance, and coding efficiency, in some video compression standards a frame is partitioned into macroblocks of 16×16 pixels to estimate the block difference and the block movement. For each macroblock within a frame, the “best match” macroblock has to be found in the previous frame or in the next frame. The mechanism of identifying the best match macroblock is called “Motion Estimation”.
Practically, a block of pixels will not move too far away from its original position in a previous frame; therefore, searching for the best match block within an unlimited region is very time consuming and unnecessary. A limited searching range is commonly defined to limit the computing time of the “best match” block search. The computing-power-hungry motion estimation searches for the “Best Match” candidates within a searching range for each macroblock as described in
The Best Match Algorithm, BMA, is the most commonly used motion estimation algorithm in popular video compression standards like MPEG and H.26x. In most video compression systems, motion estimation consumes from ˜50% to ˜80% of the total computing power for the video compression. In the search for the best match macroblock, a searching range, for example +/−16 pixels in both the X-axis and the Y-axis, is most commonly defined. The mean absolute difference, MAD, or the sum of absolute differences, SAD, as shown below, is calculated for each position of a macroblock within the predetermined searching range, for example +/−16 pixels in the X-axis and Y-axis.

$$\mathrm{SAD}(dx, dy) = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| V_n(i, j) - V_m(i + dx,\; j + dy) \right|$$

$$\mathrm{MAD}(dx, dy) = \frac{1}{256} \sum_{i=0}^{15} \sum_{j=0}^{15} \left| V_n(i, j) - V_m(i + dx,\; j + dy) \right|$$

In the above MAD and SAD equations, Vn and Vm stand for the 16×16 pixel arrays, i and j index the 16 pixels along the X-axis and Y-axis respectively, and dx and dy are the change of position of the macroblock. The macroblock with the least MAD (or SAD) is, by the BMA definition, named the “Best match” macroblock. The calculation of the motion estimation consumes the most computing power in most video compression systems.
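As a non-limiting illustration, the best match search described above may be sketched as follows. The function names, the representation of a frame as a list of pixel rows, and the rule of ignoring candidates that fall outside the reference frame are assumptions made for this sketch only and are not taken from any standard.

```python
def sad(cur, ref, x, y, dx, dy, size=16):
    """Sum of absolute differences between the block of `cur` whose top-left
    corner is (x, y) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
    return total


def mad(cur, ref, x, y, dx, dy, size=16):
    """Mean absolute difference: the SAD divided by the number of pixels."""
    return sad(cur, ref, x, y, dx, dy, size) / (size * size)


def best_match(cur, ref, x, y, search=16, size=16):
    """Full search over +/-`search` pixels; returns ((dx, dy), least SAD)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Assumption: candidates outside the reference frame are ignored.
            if not (0 <= x + dx <= width - size and 0 <= y + dy <= height - size):
                continue
            cost = sad(cur, ref, x, y, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a searching range of +/−16 pixels, this full search evaluates up to 33×33 candidate positions per macroblock, which illustrates why motion estimation dominates the computing power of the encoder.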
Since a B-type frame needs to access the reference pixels of both the previous frame and the next frame, which requires high memory I/O bandwidth, one method of this invention for decoding the video stream is to analyze the complexity of the image of the B-type frame: if there is not much difference between the two reference frames, decoding of the corresponding B-type frame is skipped and the nearest reference frame is used to represent it instead. In some video streams, most pixels have the same displacement, caused by vibration or movement of the image capturing device, such as a video recorder. Even if most blocks within a B-type frame have a non-zero displacement, the B-type frame decoding procedure can also be skipped, and the nearest neighboring frame is used to represent the skipped B-type frame.
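As a non-limiting illustration, the frame-level decision described above may be sketched as follows. The function names, the use of a sum of absolute pixel differences as the measure of difference, the threshold, and the `decode_b_frame` callable standing in for the normal decoding path are all assumptions of this sketch; a practical decoder would likely evaluate only a sample of blocks rather than every pixel.

```python
def frame_difference(frame_a, frame_b):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(a - b)
               for row_a, row_b in zip(frame_a, frame_b)
               for a, b in zip(row_a, row_b))


def reconstruct_b_frame(prev_ref, next_ref, dist_prev, dist_next,
                        threshold, decode_b_frame):
    """Return (frame, skipped).

    If the two reference frames differ by less than `threshold`, the B-type
    frame is not decoded and a copy of the temporally nearest reference frame
    represents it. Otherwise the hypothetical `decode_b_frame` callable
    performs the normal decoding.
    """
    if frame_difference(prev_ref, next_ref) < threshold:
        nearest = prev_ref if dist_prev <= dist_next else next_ref
        return [row[:] for row in nearest], True
    return decode_b_frame(), False
```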
In some video standards, including MPEG, there are efficient methods of P-type and B-type frame compression. One instance is the “Skip Macro-block”, also named “Skip MB”, of the MPEG video compression standard, a code that an encoder can assign to a macro-block. A macro-block in MPEG is comprised of 4 Y blocks and a further 1 or 2 Cb and Cr blocks, with each block having 8×8 pixels. The requirements for applying the “Skip MB” code include:
1. All 4 Y blocks have the same motion vector of (0,0), which means no movement.
2. All DCT coefficients of all 4 Y blocks and the Cb and Cr blocks are “0”, which stands for no change of content in any pixel within the macro-block.
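As a non-limiting illustration, the two requirements listed above may be checked as follows. The function name and the data layout (motion vectors as (x, y) tuples, each block as a flat list of 64 DCT coefficients) are assumptions of this sketch.

```python
def can_use_skip_mb(y_motion_vectors, coefficient_blocks):
    """Check the two "Skip MB" requirements for one macro-block.

    y_motion_vectors: the motion vectors of the 4 Y blocks, as (x, y) tuples.
    coefficient_blocks: the DCT coefficients of the Y, Cb and Cr blocks,
    each block given as a flat list of 64 values.
    """
    # Requirement 1: every Y block has the motion vector (0, 0).
    no_movement = all(mv == (0, 0) for mv in y_motion_vectors)
    # Requirement 2: every DCT coefficient of every block is 0.
    no_change = all(c == 0 for block in coefficient_blocks for c in block)
    return no_movement and no_change
```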
As illustrated in
The video decompression of this invention accesses the pixels of only those blocks within a macro-block that are not equal to the corresponding block of the neighboring frame. For blocks whose DCT coefficients are all “0s”, only one neighboring frame is accessed instead of two. Another key of the video decompression of this invention is that macro-blocks with non-(0,0) MVs, to which the “Skip MB” code cannot be applied because of frame movement, can still be skipped if their DCT coefficients are of minimal value. The Cb and Cr blocks have a higher probability of being all 0s than the Y blocks and can therefore more frequently skip access to one neighboring frame. To add the pixels of only one neighboring frame, the weighted factor in the original video stream should be updated to “1” from 0.5/0.5 (if there is one B-type frame between two P-type or I-type frames) or from 0.33/0.67 (if there are two B-type frames between two P-type or I-type frames). Accessing only one neighboring frame for compensation in a B-type frame saves 50% of the access time and hence of the I/O bandwidth of the storage device holding the reference buffer.
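As a non-limiting illustration, the update of the weighted factor to “1” may be sketched as follows. The function name, the `fetch_prev` and `fetch_next` callables standing in for reads of the reference buffer, and the rounding rule are assumptions of this sketch.

```python
def compensate(fetch_prev, fetch_next, w_prev, w_next, single_reference=None):
    """Form the prediction of one block of a B-type frame.

    fetch_prev / fetch_next: hypothetical callables that read the block of
    the previous / next reference frame from memory as a flat list of pixels.
    single_reference: None for normal weighted prediction, or "prev" / "next"
    when only that neighboring frame is to be accessed.
    """
    if single_reference == "prev":
        return list(fetch_prev())   # weight updated to 1; next frame not read
    if single_reference == "next":
        return list(fetch_next())   # weight updated to 1; previous frame not read
    prev_block, next_block = fetch_prev(), fetch_next()
    # Weighted sum of both references, rounded to the nearest integer.
    return [int(w_prev * p + w_next * n + 0.5)
            for p, n in zip(prev_block, next_block)]
```

In the single-reference case only one of the two callables is invoked, which corresponds to the halving of reference buffer accesses described above.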
Some blocks and even macro-blocks move with the same motion vector. One method of this invention of video decompression is to predict which blocks or macro-blocks have a high probability of having the same MV and DCT coefficients that are all 0s; for these, the procedure of video block decompression can be skipped altogether and only one neighboring frame is accessed.
To further reduce the time of accessing the reference frame buffer when decompressing a B-type frame, this invention decodes the two block pixel difference planes associated with the two neighboring frames. The block of the frame with the smaller difference is selected to represent the pixel difference plane, and the block pixels of the corresponding reference frame are accessed for motion compensation.
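As a non-limiting illustration, the selection of the reference frame with the smaller difference may be sketched as follows. It is assumed here, for illustration only, that a decoded difference block is available for each of the two neighboring frames, that the sum of absolute values measures the size of a difference, and that `fetch_prev` and `fetch_next` are hypothetical callables reading the reference buffer.

```python
def select_reference(diff_prev, diff_next):
    """Pick the neighboring frame whose decoded difference block is smaller."""
    size_prev = sum(abs(v) for v in diff_prev)
    size_next = sum(abs(v) for v in diff_next)
    return "prev" if size_prev <= size_next else "next"


def reconstruct_block(diff_prev, diff_next, fetch_prev, fetch_next):
    """Reconstruct one block while reading only the selected reference frame."""
    choice = select_reference(diff_prev, diff_next)
    if choice == "prev":
        reference, diff = fetch_prev(), diff_prev
    else:
        reference, diff = fetch_next(), diff_next
    # Add the difference to the reference pixels and clamp to the 8-bit range.
    block = [max(0, min(255, r + d)) for r, d in zip(reference, diff)]
    return block, choice
```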
In some applications, many B-type frames can be skipped without much degrading the image quality. The decision to skip a B-type frame can be made by checking whether the sum of differences between two P-type (or I-type and P-type) frames is smaller than a predetermined threshold. To quickly determine whether and which B-type frame can be skipped, a calculation over some macro-blocks selected from various locations of an image can determine the error with and without decoding the B-type frame. This can also be done in a hierarchical way: the error between skipping and not skipping is calculated first and, after the locations of higher error are identified, more blocks of pixels are calculated in a second level of error estimation. For example, a B-type picture can be partitioned into 4 quadrants, in each of which the video stream of 32 macro-blocks is decoded and the error calculated. The quadrant(s) with an error higher than a predetermined threshold go through a second round of error estimation with another 16 macro-blocks within each such quadrant, to decide which quadrant(s) have high error.
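As a non-limiting illustration, the two-level error estimation of the example above may be sketched as follows. The function names, the use of the mean error per sampled macro-block, the `mb_error` callable returning the error caused by skipping one macro-block, and the rule that the frame is skipped only when no quadrant remains above the threshold are assumptions of this sketch.

```python
def mean_error(macroblocks, mb_error):
    """Average skipping error over a non-empty sample of macro-blocks."""
    return sum(mb_error(mb) for mb in macroblocks) / len(macroblocks)


def hierarchical_skip_decision(quadrants, mb_error, threshold,
                               first_n=32, second_n=16):
    """Return (skip_frame, high_error_quadrants).

    quadrants: one list of sampled macro-block identifiers per quadrant.
    mb_error: hypothetical callable giving the error between skipping and
    decoding a single macro-block.
    """
    high_error = []
    for index, mbs in enumerate(quadrants):
        first = mbs[:first_n]
        # First level: a quadrant at or below the threshold is accepted.
        if not first or mean_error(first, mb_error) <= threshold:
            continue
        # Second level: examine further macro-blocks of the same quadrant.
        second = mbs[first_n:first_n + second_n]
        if not second or mean_error(second, mb_error) > threshold:
            high_error.append(index)
    return (not high_error), high_error
```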
It is very common that the whole frame moves due to vibration or intentional movement of the video recorder. In both cases, all macro-blocks within a frame will have the same motion vector, called the frame motion vector, or FMV. When skipping the motion compensation that accesses the reference frames, one can copy the blocks of pixels of the previous or next frame at the position given by the frame motion vector to represent the blocks that skip motion compensation. It is also very common that a group of blocks of pixels moves in the same direction with the same displacement, which results in the same motion vector; some blocks that are not at the edge of an object can then skip the motion compensation that accesses two reference frames, and the corresponding blocks of pixels are copied instead to represent the current block pixels.
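As a non-limiting illustration, the detection of a frame motion vector and the copying of the displaced block may be sketched as follows. The function names, the `min_share` parameter defining how many macro-blocks must share a motion vector, and the assumption that the displaced block lies entirely inside the reference frame are introduced for this sketch only.

```python
from collections import Counter


def frame_motion_vector(motion_vectors, min_share=0.9):
    """Return the dominant motion vector if at least `min_share` of the
    macro-blocks use it, otherwise None."""
    if not motion_vectors:
        return None
    mv, count = Counter(motion_vectors).most_common(1)[0]
    return mv if count / len(motion_vectors) >= min_share else None


def copy_displaced_block(ref, x, y, mv, size=8):
    """Copy the block of `ref` displaced by `mv` from position (x, y).

    Assumption: the displaced block lies entirely inside the frame.
    """
    dx, dy = mv
    return [ref[y + dy + j][x + dx:x + dx + size] for j in range(size)]
```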
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or the spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.