This invention relates generally to digital video processing and in particular to digital video coding and compression.
To achieve the highest coding efficiency, advanced video coding (AVC) employs rate distortion optimisation (RDO) techniques to get the best coding result in terms of maximising coding quality and minimising resulting data bits. Advanced video coding includes AVC, H.264, MPEG-4 Part 10, and JVT. Further information about AVC can be found in ITU-T Rec. H.264|ISO/IEC 14496-10 AVC, “Joint Final Committee Draft (JFCD) of Joint Video Specification,” Klagenfurt, Austria, Jul. 22-26, 2002. To achieve RDO, the encoder exhaustively encodes the video using all mode combinations. Such mode combinations include different intra and inter prediction modes. Consequently, the complexity and computational load of video coding in AVC increase drastically, which makes practical applications such as video communication difficult using state-of-the-art hardware systems.
Several efforts have been reported regarding fast algorithms in motion estimation for AVC video coding. See Xiang Li and Guowei Wu, “Fast Integer Pixel Motion Estimation,” JVT-F011, 6th Meeting, Awaji Island, Japan, Dec. 5-13, 2002; Zhibo Chen, Peng Zhou, and Yun He, “Fast Integer Pel and Fractional Pel Motion Estimation for JVT,” JVT-F017, 6th Meeting, Awaji Island, Japan, Dec. 5-13, 2002; and Hye-Yeon Cheong Tourapis, Alexis Michael Tourapis and Pankaj Topiwala, “Fast Motion Estimation within the JVT Codec,” JVT-E023, 5th Meeting, Geneva, Switzerland, Oct. 9-17, 2002. However, no fast algorithm in intra prediction for AVC has been reported.
Intra coding refers to the case where only spatial redundancies within a video picture are exploited. The resulting picture is referred to as an I-picture. Traditionally, I-pictures are encoded by directly applying a transform to all macroblocks in the picture, which generates a much larger number of data bits compared to that of inter coding. To increase the efficiency of the intra coding, spatial correlation between adjacent macroblocks in a given picture is exploited in an AVC process. The macroblock of interest can be predicted from the surrounding macroblocks. The difference between the actual macroblock and its prediction is coded.
If a macroblock is encoded in intra mode, a prediction block is formed based on the previously encoded and reconstructed blocks. For the luminance (luma) components, intra prediction may be used for each 4×4 sub-block or 16×16 macroblock. There are nine prediction modes for 4×4 luma blocks and four prediction modes for 16×16 luma blocks. For the chrominance (chroma) components, four prediction modes may be applied to the two 8×8 chroma blocks (U and V). The resulting prediction mode for U and V components should be the same.
Again, AVC video coding is based on the concept of rate distortion optimisation; the encoder has to encode the intra block using all the mode combinations and choose the one that gives the best RDO. According to the structure of intra prediction in AVC, the number of mode combinations for luma and chroma blocks in a macroblock is M8×(M4×16+M16), where M8, M4 and M16 represent the number of modes for 8×8 chroma blocks, 4×4 luma blocks, and 16×16 luma blocks, respectively. Thus, for a macroblock, 592 RDO calculations must be performed before a best RDO is determined. Consequently, the complexity and computational load of the encoder is extremely high.
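The figure of 592 follows from the formula with the mode counts stated above (M8 = 4, M4 = 9, M16 = 4). A minimal arithmetic sketch; the function name is illustrative only and is not part of the disclosure:

```python
def rdo_combinations(m8: int, m4: int, m16: int) -> int:
    """Number of RDO evaluations per macroblock: M8 x (M4 x 16 + M16)."""
    # 16 is the number of 4x4 luma sub-blocks in one 16x16 macroblock.
    return m8 * (m4 * 16 + m16)


full_search = rdo_combinations(m8=4, m4=9, m16=4)
print(full_search)  # 592
```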
In accordance with one aspect of the invention, there is provided a method of AVC intra prediction to code digital video comprising a plurality of pictures. The method comprises the steps of: generating edge directional information for each intra block of a digital picture; and choosing most probable intra prediction modes for rate distortion optimisation dependent upon the generated edge directional information.
The edge directional information may be generated by applying at least one edge operator to the digital picture. The edge operator may be applied to every luminance and chrominance pixel except any pixels on the borders of the luminance and chrominance components of the digital picture. The method may further comprise the step of deciding the amplitude and angle of an edge vector for a pixel. The edge directional information may comprise an edge direction histogram calculated for all pixels in each intra block. The edge direction histogram may be for a 4×4 luma block; prediction modes may comprise 8 directional prediction modes and a DC prediction mode. The edge direction histogram may be for 16×16 luma and 8×8 chroma blocks; prediction modes may comprise 2 directional prediction modes, a plane prediction mode, and a DC prediction mode.
The edge direction histogram may sum up the amplitudes of pixels with similar directions in the block.
The method may further comprise the step of terminating an RDO mode computation and rejecting the current RDO mode if the number of non-zero coefficients in a current RDO mode computation exceeds that in a previously computed RDO mode.
The method may further comprise the step of intra coding a block of the digital picture using the chosen most probable intra prediction modes.
In accordance with a further aspect of the invention, there is provided an apparatus using AVC intra prediction to code digital video comprising a plurality of pictures. The apparatus comprises a device for generating edge directional information for each intra block of a digital picture; and a device for choosing most probable intra prediction modes for rate distortion optimisation dependent upon the generated edge directional information. Other aspects of the apparatus may be implemented in line with aspects of the above method.
Embodiments of the invention are described hereinafter with reference to the drawings, in which:
A method, an apparatus, and a computer program product for AVC intra prediction to code digital video comprising a plurality of pictures are disclosed herein. While only a small number of embodiments are set forth, it will be appreciated by those skilled in the art that numerous changes and/or substitutions may be made without departing from the scope and spirit of the invention. In other instances, details well known to those skilled in the art may be omitted so as not to obscure the invention.
The embodiments of the invention provide a fast mode decision algorithm for AVC intra prediction based on local edge directional information, which reduces the amount of calculation in intra prediction. Based on edge information in the image block to be predicted, a local edge direction histogram, an edge directional field, or any other form of edge directional information is generated for each image block. Based on this edge directional information, a mechanism is provided to choose only a small number of the most probable intra prediction modes for rate distortion optimisation calculation. That is, with the use of edge direction histograms derived from the edge map of the picture, only a small number of the most probable intra prediction modes are chosen for the RDO calculation. Therefore, the fast mode decision algorithm significantly increases the speed of intra coding. The pixels along a local edge direction are normally of similar values (both luma and chroma components). Therefore, a good prediction may be achieved if the pixels are predicted using those neighbouring pixels that are in the same direction as an edge.
Embodiments of the invention have one or more of the following features: Edge directional information in an image block (4×4, 8×8, 16×16, or any other block size) is used to guide the process of intra prediction;
Edge direction histogram may be used as the local edge directional information to guide the process of intra prediction;
Edge directional field may be used as the local edge directional information to guide the process of intra prediction;
Other forms of edge directional information in the image block may be used as the local edge directional information to guide the process of intra prediction;
One edge direction that has the strongest edge strength may be used as the best candidate for rate distortion optimisation calculation;
Two or more edge directions that have the stronger edge strength may be used as the preferred candidates for rate distortion optimisation calculation;
Early termination of the RDO mode calculation based on the number of non-zero coefficients after integer transform and zigzag scanning; and
Early termination of the RDO mode calculation based on the length of zero runs after an integer transform and zigzag scanning.
There are a number of ways to get the local edge directional information, such as edge direction histogram (see Rafael C. Gonzalez, Richard E. Woods, “Digital image processing,” Prentice Hall, 2002, p. 572), directional fields (see A. M. Bazen and S. H. Gerez, “Systematic methods for the computation of the directional fields and singular points of fingerprints,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, pp. 905-919, July 2002), etc. The fast intra-mode prediction algorithm may be implemented based on both the edge direction histogram and directional fields, and the performance of the implementation has been compared in terms of time-saving, average PSNR and bit-rate for all the sequences recommended in JVT Test Model Ad Hoc Group, Evaluation sheet for motion estimation, Draft version 4, Feb. 19, 2003. The scheme based on edge direction histogram gives better performance. Therefore, the mode decision scheme described is based on edge direction histogram.
Edge Map
To obtain edge information in the neighbourhood of an intra block to be predicted, edge operators, such as Sobel edge operators, may be applied to an intra image to generate the edge map. Each pixel in the intra image is then associated with an element in the edge map, which is the edge vector containing its edge direction and amplitude. Prior to intra prediction, edge maps are created from the original picture.
The edge operator has two convolution kernels. Each pixel in the image is convolved with both kernels. One responds to the degree of difference in the vertical direction and the other in the horizontal. The edge operator is applied to every luminance and chrominance pixel except those pixels on the borders of luminance and chrominance pictures, because the operator requires all 8 surrounding pixels and border pixels do not have them. For a pixel $p_{i,j}$ in a luminance (or chrominance) picture, the corresponding edge vector, $\vec{D}_{i,j} = \{dx_{i,j}, dy_{i,j}\}$, is defined as follows:
where $dx_{i,j}$ and $dy_{i,j}$ represent the degree of difference in the vertical and horizontal directions, respectively. Therefore, the amplitude of the edge vector can be decided by,
$$\mathrm{Amp}(\vec{D}_{i,j}) = |dx_{i,j}| + |dy_{i,j}| \qquad (2)$$
In fact the amplitude may be obtained more accurately using the square root of the sum of the squares of $dx_{i,j}$ and $dy_{i,j}$. However, in the circumstance of the fast algorithm, Equation (2) is usually used instead. The direction of the edge (in degrees) is decided by the hyper-function:
In one implementation of the algorithm, Equation (3) is not necessary, as in AVC there are only a limited number of directions that the prediction could be applied. In fact, simple thresholding techniques may be used to build up the edge direction histogram instead.
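The edge map construction described in this section can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the 3×3 Sobel kernels, their sign convention, the row/column indexing, and the arctangent form of the angle are all assumed here, since the kernel and angle equations are not reproduced in this text. The function names are hypothetical.

```python
import math


def edge_vector(img, r, c):
    """Return (dx, dy) for the interior pixel img[r][c].

    Assumption: standard 3x3 Sobel kernels. dx is the difference in the
    vertical direction (row below minus row above); dy is the difference
    in the horizontal direction (column right minus column left).
    """
    dx = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
          - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
    dy = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
          - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
    return dx, dy


def amplitude(dx, dy):
    """Equation (2): |dx| + |dy|, the cheap alternative to sqrt(dx^2 + dy^2)."""
    return abs(dx) + abs(dy)


def angle_deg(dx, dy):
    """Assumed angle definition: arctan(dy / dx) in degrees, within (-90, 90]."""
    if dx == 0:
        # Also covers dx == dy == 0, where the amplitude is zero and the
        # direction therefore does not contribute to any histogram.
        return 90.0
    return math.degrees(math.atan(dy / dx))


def edge_map(img):
    """Map each non-border pixel position to (amplitude, angle in degrees)."""
    height, width = len(img), len(img[0])
    result = {}
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            dx, dy = edge_vector(img, r, c)
            result[(r, c)] = (amplitude(dx, dy), angle_deg(dx, dy))
    return result


# A vertical step edge: intensity changes only in the horizontal direction.
picture = [[0, 0, 10, 10] for _ in range(4)]
print(edge_vector(picture, 1, 1))  # (0, 40)
print(edge_map(picture)[(1, 1)])   # (40, 90.0)
```

Border pixels are skipped because they lack the 8 neighbours the kernels need, which is why a 4×4 picture yields only 4 edge vectors here.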
Edge Direction Histogram
To reduce the number of candidate prediction modes in RDO, an edge direction histogram is calculated from all the pixels in the block by summing up the amplitudes of those pixels with similar directions in the block.
4×4 Luma Block Edge Direction Histogram
In the case of a 4×4 luma block, there are 8 directional prediction modes, as shown in
Therefore the edge direction histogram of a 4×4 luma block is decided as,
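$$\mathrm{SET}(k) = \{(i,j) \mid \mathrm{Ang}(\vec{D}_{i,j}) \in a_k\}$$
while
$$\begin{aligned}
a_0 &= (-103.3^\circ, -76.7^\circ] \\
a_1 &= (-13.3^\circ, 13.3^\circ] \\
a_3 &= (35.8^\circ, 54.2^\circ] \\
a_4 &= (-54.2^\circ, -35.8^\circ] \\
a_5 &= (-76.7^\circ, -54.2^\circ] \\
a_6 &= (-35.8^\circ, -13.3^\circ] \\
a_7 &= (54.2^\circ, 76.7^\circ] \\
a_8 &= (13.3^\circ, 35.8^\circ]
\end{aligned} \qquad (4)$$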
Note that k=1, . . . , 8 refers to 8 directional prediction modes. Note also that the angles of the directions in Equation (4) are 180° periodic.
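The thresholding that builds the 4×4 histogram can be sketched as below. The interval table is taken from Equation (4), with each interval written as (low, high]; the folding step that applies the 180° periodicity, the dictionary layout, and the function names are assumptions made for illustration.

```python
# Angle intervals (low, high] in degrees, keyed by the interval index
# used in Equation (4). Index 2 is absent because it is not directional.
BINS_4X4 = {
    0: (-103.3, -76.7),
    1: (-13.3, 13.3),
    3: (35.8, 54.2),
    4: (-54.2, -35.8),
    5: (-76.7, -54.2),
    6: (-35.8, -13.3),
    7: (54.2, 76.7),
    8: (13.3, 35.8),
}


def mode_for_angle(angle):
    """Return the interval index whose (low, high] range contains the angle."""
    # Directions are 180-degree periodic, so fold into (-103.3, 76.7].
    while angle > 76.7:
        angle -= 180.0
    while angle <= -103.3:
        angle += 180.0
    for k, (low, high) in BINS_4X4.items():
        if low < angle <= high:
            return k
    raise ValueError("angle not covered by any interval")


def edge_direction_histogram(edge_vectors):
    """Sum amplitudes per direction for one block.

    edge_vectors is an iterable of (amplitude, angle in degrees) pairs,
    one per pixel of the block.
    """
    histogram = {k: 0 for k in BINS_4X4}
    for amp, angle in edge_vectors:
        histogram[mode_for_angle(angle)] += amp
    return histogram


block = [(40, 90.0), (10, 0.0), (25, -88.0), (5, 45.0)]
histogram = edge_direction_histogram(block)
print(histogram[0], histogram[1], histogram[3])  # 65 10 5
```

The eight intervals tile (-103.3°, 76.7°] without gaps, so after folding every angle falls into exactly one cell.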
Edge Direction Histogram for 16×16 Luma and 8×8 Chroma Block
In the case of 16×16 luma and 8×8 chroma blocks, there are only two directional prediction modes, plus a plane prediction and a DC prediction mode. Therefore, the edge direction histogram for this case is based on three directions 300, i.e., horizontal, vertical and diagonal directions, as shown in
Their edge direction histogram is constructed as follows,
where k=1 refers to the horizontal prediction mode, k=2 refers to vertical prediction mode, and k=3 refers to the plane prediction mode.
Histogram Based Fast Mode Selection for Intra Prediction
As mentioned above, each cell in the edge direction histogram sums up the amplitudes of those pixels with similar directions in the block. A cell with the maximum amplitude indicates that there is a strong edge presence in that direction, and thus could be used as the direction for the best prediction mode.
4×4 Luma Block Prediction Modes
Instead of performing the 9 mode RDO for 4×4 luma block, the fast algorithm only chooses some of the directional prediction modes with a higher possibility to be the candidate modes for intra 4×4 block prediction according to the edge direction histogram.
Since the pixels along an edge direction are likely to have similar values, the best prediction mode is probably in the edge direction whose cell has the maximum amplitude, or the directions close to the maximum amplitude cell. Therefore, the histogram cell with the maximum amplitude and the two adjacent cells are considered as candidates of the best prediction mode. In consideration of the case where all the cells have similar amplitudes in the edge direction histogram, the DC mode is also chosen as the fourth candidate.
Thus, for each 4×4 luma block, only 4 mode RDO calculations may be performed instead of 9.
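The selection rule for 4×4 luma blocks can be sketched as follows. Two points are assumptions rather than statements from the text: "adjacent" is taken to mean neighbouring in angle, with the directions arranged in a circle because they are 180° periodic, and the DC mode is given index 2, the one index missing from the interval table of Equation (4). The function name is hypothetical.

```python
DC_MODE = 2  # assumed index: the only one absent from the interval table

# Directional modes sorted by the centre angle of their intervals,
# starting at -90 degrees. Treated as circular (mode 7 neighbours mode 0).
CIRCULAR_ORDER = [0, 5, 4, 6, 1, 8, 3, 7]


def candidates_4x4(histogram):
    """Return the strongest direction, its two angular neighbours, and DC."""
    best = max(CIRCULAR_ORDER, key=lambda k: histogram.get(k, 0))
    index = CIRCULAR_ORDER.index(best)
    count = len(CIRCULAR_ORDER)
    before = CIRCULAR_ORDER[(index - 1) % count]
    after = CIRCULAR_ORDER[(index + 1) % count]
    return [best, before, after, DC_MODE]


print(candidates_4x4({0: 65, 1: 10, 3: 5}))  # [0, 7, 5, 2]
```

With a strong vertical edge (cell 0), the candidates are the vertical direction, the two directions on either side of it, and DC, so 4 RDO calculations replace 9.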
16×16 Luma Block Prediction Modes
Only the histogram cell with the maximum amplitude is considered as a candidate of the best prediction mode. Similarly as above, the DC mode is also chosen as the next candidate.
Thus, for each 16×16 luma block, only 2 mode RDO calculations may be performed, instead of 4.
8×8 Chroma Block Prediction Modes
In the case of chroma blocks, there are two different histograms, one from component U and the other from V. Therefore the histogram cells with maximum amplitude from the two components are both considered as candidate modes. As before, the DC mode also takes part in the RDO calculation. Note that if the direction with the maximum amplitude is the same for the two components, there are only 2 candidate modes for RDO calculation; otherwise, there are 3.
Thus, for each 8×8 chroma block, 2 or 3 mode RDO calculations are performed, instead of 4.
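The candidate rules for 16×16 luma blocks and 8×8 chroma blocks can be sketched together. The string labels for the histogram cells and the function names are illustrative assumptions; the rules themselves (strongest cell plus DC, and for chroma the strongest cell of each of U and V plus DC) are as described above.

```python
def candidates_16x16(histogram):
    """Strongest direction of a 16x16 luma block, plus DC."""
    best = max(histogram, key=histogram.get)
    return [best, "DC"]


def candidates_chroma(histogram_u, histogram_v):
    """Strongest direction of U and of V (merged if equal), plus DC."""
    best_u = max(histogram_u, key=histogram_u.get)
    best_v = max(histogram_v, key=histogram_v.get)
    candidates = [best_u]
    if best_v != best_u:
        candidates.append(best_v)
    candidates.append("DC")
    return candidates


luma = {"horizontal": 3, "vertical": 12, "plane": 7}
u = {"horizontal": 9, "vertical": 2, "plane": 1}
v = {"horizontal": 1, "vertical": 2, "plane": 8}

print(candidates_16x16(luma))   # ['vertical', 'DC']
print(candidates_chroma(u, u))  # ['horizontal', 'DC']
print(candidates_chroma(u, v))  # ['horizontal', 'plane', 'DC']
```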
Table 1 summarises the number of candidates selected for the RDO calculation based on the edge direction histogram. As can be seen from Table 1, the encoder with the fast mode decision algorithm performs only 132 to 198 RDO calculations, which is far fewer than the 592 required by current AVC video coding.
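TABLE 1. Number of candidate modes selected for RDO calculation
Block size | Total number of modes | Number of modes selected |
---|---|---|
4×4 luma | 9 | 4 |
16×16 luma | 4 | 2 |
8×8 chroma | 4 | 2 or 3* |
*The modes selected from the two chroma blocks may be the same.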
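The range of 132 to 198 can be checked by substituting the reduced candidate counts into the same expression used earlier for the exhaustive search, M8×(M4×16+M16). A short sketch with an illustrative function name:

```python
def fast_rdo_count(chroma_modes, luma4_modes=4, luma16_modes=2):
    """RDO evaluations per macroblock with the reduced candidate sets."""
    # 16 sub-blocks of 4x4 luma per macroblock, as in the exhaustive case.
    return chroma_modes * (luma4_modes * 16 + luma16_modes)


# 2 chroma candidates when U and V agree on a direction, 3 otherwise.
print(fast_rdo_count(2), fast_rdo_count(3))  # 132 198
```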
Early Termination of Mode Computation
In the intra-prediction RDO mode computation, the most time-consuming portion lies in the context adaptive binary arithmetic coding (CABAC). Also, the number of data bits generated after CABAC is heavily dependent on the number of non-zero coefficients after integer transform and zigzag scanning. Therefore, a simple early termination scheme in mode computation is implemented, i.e., if the number of non-zero coefficients in the current RDO mode computation exceeds that in the previously computed RDO mode, an early termination of this RDO mode computation is activated and the current RDO mode is rejected.
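The early termination scheme can be sketched as a loop over candidate modes. The two callables are hypothetical placeholders for encoder internals: `count_nonzero` stands for the cheap count of non-zero coefficients after transform and zigzag scanning, and `full_rdo_cost` stands for the expensive step that includes entropy coding. The sketch also assumes that "previously computed" means the most recent mode whose computation ran to completion.

```python
def select_mode(modes, count_nonzero, full_rdo_cost):
    """Pick the lowest-cost mode, skipping modes that terminate early."""
    best_mode, best_cost = None, float("inf")
    previous_nonzero = None
    for mode in modes:
        nonzero = count_nonzero(mode)
        if previous_nonzero is not None and nonzero > previous_nonzero:
            # Early termination: reject without running entropy coding.
            continue
        cost = full_rdo_cost(mode)
        previous_nonzero = nonzero
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Toy numbers, chosen only to exercise the control flow.
nonzero_counts = {0: 6, 7: 9, 5: 4, 2: 5}
costs = {0: 120.0, 7: 150.0, 5: 100.0, 2: 110.0}
evaluated = []


def toy_cost(mode):
    evaluated.append(mode)
    return costs[mode]


result = select_mode([0, 7, 5, 2], nonzero_counts.get, toy_cost)
print(result)     # (5, 100.0)
print(evaluated)  # [0, 5]
```

In this toy run only two of the four candidates reach the expensive cost computation; the other two are rejected on their coefficient counts alone.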
AVC Intra Prediction
Computer Program Implementation
The method and apparatus of the above embodiment can be implemented on a computer system 500, schematically shown in
The computer system 500 comprises a computer module 502, input modules such as a keyboard 504 and mouse 506 and a plurality of output devices such as a display 508, and printer 510.
The computer module 502 is connected to a computer network 512 via a suitable transceiver device 514, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
The computer module 502 in the example includes a processor 518, a Random Access Memory (RAM) 520 and a Read Only Memory (ROM) 522. The computer module 502 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 524 to the display 508, and I/O interface 526 to the keyboard 504.
The components of the computer module 502 typically communicate via an interconnected bus 528 and in a manner known to the person skilled in the relevant art.
The application program is typically supplied to the user of the computer system 500 encoded on a data storage medium such as a CD-ROM or floppy disk and read utilising a corresponding data storage medium drive of a data storage device 530. The application program is read and controlled in its execution by the processor 518. Intermediate storage of program data may be accomplished using RAM 520.
In the foregoing manner, a method and an apparatus for AVC intra prediction to code digital video comprising a plurality of pictures have been disclosed. While only a small number of embodiments are set forth, it will be appreciated by those skilled in the art that numerous changes and/or substitutions may be made without departing from the scope and spirit of the invention.
Number | Date | Country | Kind |
---|---|---|---|
60451553 | Mar 2003 | US | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/SG04/00047 | 3/3/2004 | WO | 10/10/2006 |