The invention relates to a process for coding images using intra prediction mode.
In MPEG-4 AVC, intra prediction of an image block, of size 4×4, 8×8 or 16×16 pixels, is done by 1D extrapolation along predefined directions from the neighboring reconstructed pixels. This prediction is done very locally, so only surrounding information is used. Images containing certain kinds of textures or repetitions of 2D patterns cannot be optimally intra predicted.
Methods of intra coding using motion estimation have been suggested, for example in a paper of Siu-Leong Yu and Christos Chrysafis, titled “New Intra Prediction using Intra-Macroblock Motion Compensation”, document JVT-C151, JVT Meeting, May 2002, or in a paper of Satoshi Kondo, Hisao Sasai and Shinya Kadono, titled “Tree structured hybrid intra prediction”, 2004. The efficiency of such algorithms is not optimal, as the block matching relates to a same pattern.
One of the objects of the invention is to alleviate the aforesaid drawbacks.
Its subject is a process for a blockwise coding of a video image using the intra mode, comprising:
characterized in that the step of intra prediction comprises:
According to a particular embodiment, the reconstructed part taken into account depends on the position of the current block within the macroblock it belongs to.
According to a particular embodiment, the neighboring part of the current block taken into account depends on the position of the current block within the macroblock it belongs to.
According to a particular embodiment, the motion estimation implements a block matching algorithm.
According to a particular embodiment, the motion estimation is a full-pel search, a half-pel search or a quarter-pel search.
According to a particular embodiment, the motion estimation takes into account a weighting function to favor nearest pixels of the current block for the correlation.
Other characteristics and advantages of the present invention will emerge upon reading the description of different embodiments, this description being made with reference to the drawings attached in the appendix, in which:
A method called intra prediction based on motion estimation is proposed and consists in predicting the current block with an intra-image motion estimation. The intra motion estimation is done similarly to the “traditional” inter motion estimation. The most important difference is that the reference image is not another already coded image but the current decoded image itself. Only the already coded part of the image, called the rebuilt image, containing previous macroblocks and previous blocks inside the current macroblock, is used. The aim of this method is to search inside the rebuilt image the block that is the most similar to the block to be predicted. That most similar block is used as the intra prediction in the same way as a prediction block obtained with a prediction mode of the MPEG-4 AVC standard.
This new intra prediction mode operating a motion estimation can be considered as another coding mode among the existing coding modes, for example the ones of the MPEG 4 standard. The chosen mode is for example the one giving the lowest coding cost for a given quality.
The process relating to the decoding comprises a step of motion estimation between a sub-partition, already decoded and close to the current block to be decoded, and the already decoded area of the current image. The part correlated with the sub-partition makes it possible to locate the part correlated with the current block, which is the predicted block. This predicted block is added to the residue to get the current block. No motion vector is needed to find the predicted block.
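This decoder-side behaviour can be sketched as follows. This is a minimal illustration, not the patented implementation: the L-shaped template (the row above and the column to the left of the block), the list of candidate positions and all helper names are assumptions for illustration only.

```python
# Illustrative sketch: decoder-side reconstruction of a block by template
# matching over the already decoded area. No motion vector is transmitted;
# the decoder repeats the same search as the encoder.

def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def template_pixels(img, x, y, size):
    """Assumed L-shaped template: row above and column left of the block."""
    top = [img[y - 1][x + i] for i in range(size)]
    left = [img[y + j][x - 1] for j in range(size)]
    return top + left

def decode_block(img, residue, x, y, size, candidates):
    """Find the predictor by matching the template at each candidate
    position of the decoded area, then add the transmitted residue."""
    target = template_pixels(img, x, y, size)
    best_cost, best_pos = None, None
    for (cx, cy) in candidates:
        cost = sad(target, template_pixels(img, cx, cy, size))
        if best_cost is None or cost < best_cost:
            best_cost, best_pos = cost, (cx, cy)
    bx, by = best_pos
    for j in range(size):          # predicted block + residue
        for i in range(size):
            img[y + j][x + i] = img[by + j][bx + i] + residue[j][i]
```

With a periodic texture and a zero residue, the matched predictor reconstructs the block exactly, which is the situation where this mode is expected to outperform directional extrapolation.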
The algorithm is based on the idea to take a sub-partition of the rebuilt image neighboring the current block to be predicted.
The intra motion estimation has two main advantages:
The intra prediction algorithm using motion estimation processes each block, for example of size 4×4 or 8×8, of the current macroblock in zigzag order. The macroblocks of an image are processed in raster scan order. For each block, the steps of the algorithm are the following, represented in
1) Obtain the best motion vector that describes the position of sim_part, the candidate sub-partition in the rebuilt image that is the most similar to the neighbor sub-partition. That is done by scanning neigh_part over the rebuilt image and, at each position, computing the difference between the neighbor and candidate sub-partitions. That difference is computed using a similarity criterion, for example the Sum of Absolute Differences or SAD, in full-, half- and quarter-pel searches. The best motion vector is the one with the smallest difference.
2) Get the prediction block. The prediction is simply the block pred adjacent to sim_part corresponding to the motion vector described above.
3) Determine the best prediction block, from the one obtained by intra motion estimation and the others obtained by the standard MPEG-4 AVC intra prediction modes. That is done by comparing the coding costs (SSE + λ × block cost) of all the prediction blocks obtained.
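Step 3 can be sketched as a simple Lagrangian mode decision. This is a minimal sketch under the assumption that each candidate mode supplies its prediction block and its rate in bits; all names are hypothetical.

```python
# Illustrative sketch of the mode decision: keep the prediction with the
# lowest Lagrangian cost SSE + lambda * rate.

def sse(block, pred):
    """Sum of squared errors between the source block and a prediction."""
    return sum((b - p) ** 2
               for row_b, row_p in zip(block, pred)
               for b, p in zip(row_b, row_p))

def best_mode(block, candidates, lam):
    """candidates: list of (mode_name, prediction_block, rate_in_bits).
    Returns the name of the cheapest mode at Lagrange multiplier lam."""
    return min(candidates,
               key=lambda c: sse(block, c[1]) + lam * c[2])[0]
```

A mode with a slightly higher rate can still win if its prediction is much closer to the source, which is exactly the trade-off the text describes as "the lowest coding cost for a given quality".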
Area of Definition of the Rebuilt Image.
In
Full-Pel Search
To obtain the best motion vector, the neighbor sub-partition of the current block is scanned over the rebuilt image defined in the previous paragraph in order to determine the most similar sub-partition. The precision unit of this search is one pixel. That is why it is called “full-pel search”.
Definition of the neighbor sub-partition of a 4×4 block:
The neighbor sub-partition of a 4×4 block can have two different shapes depending on the position of the current block inside the macroblock. It has the shape (a) of
Definition of the neighbor sub-partition of an 8×8 block:
Once again, the neighbor sub-partition of an 8×8 block can have two different shapes depending on the position of the current block inside the macroblock. It takes the shape (a) of
Full-Pel Search Algorithm:
The neighbor sub-partition is scanned over the rebuilt area, and the criterion for choosing the best candidate sub-partition as most similar sub-partition is the SAD (Sum of Absolute Differences). Adapted from the one described in the paper of Sahn-Gyu Park, Edward J. Delp and Hoaping Yu titled “Adaptive lossless video compression using an integer wavelet transform”, ICIP, 2004, it is computed like this in case of intra 4×4 prediction:
where, according to
dec(u,v) corresponds to the causal part of the current block to predict: blc
dec(m,n) is a block of the same size as dec(u,v), displaced by the vector (u−m, v−n), in the context of motion estimation applied to the reconstructed part of the current frame,
the indices i and j scan all the pixels of the blocks dec(u,v) and dec(m,n),
SADuv(m,n) is the sum of absolute differences between the pixels contained respectively in the blocks dec(u,v) and dec(m,n).
Notice that the indices of the sums depend on the shape of the neighbor sub-partition. Here, shape (a) from
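As a concrete reading of these definitions, the SAD over an L-shaped neighbor sub-partition (shape (a): the row above plus the column to the left of each block) might be sketched as follows. The exact pixel offsets of the L shape are assumptions.

```python
# Illustrative sketch of SADuv(m,n): compare the L-shaped sub-partitions
# anchored at the current block position (u, v) and at a candidate
# position (m, n) in the reconstructed (causal) part of the frame.

def sad_uv(dec, u, v, m, n, size):
    """Sum over the row above and the column left of each block of
    |dec(u+i, v+j) - dec(m+i, n+j)|, with dec indexed as dec[row][col]."""
    total = 0
    for i in range(size):                         # row above each block
        total += abs(dec[v - 1][u + i] - dec[n - 1][m + i])
    for j in range(size):                         # column left of each block
        total += abs(dec[v + j][u - 1] - dec[n + j][m - 1])
    return total
```

The best motion vector is then the (u−m, v−n) whose candidate position minimizes this sum over the rebuilt area.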
Half-Pel Search
In full-pel search, the unit of the search grid is the pixel. It can happen that the best prediction is located between two unit positions. Such a prediction is a block constituted of interpolated pixels.
The interpolation of a half-pel is done as in the MPEG-4 AVC standard, as a function of the three nearest neighboring pixels on each side, in two directions. The interpolation algorithm is the following:
1) The half-pixels on each line containing full-pels are first horizontally interpolated from their 6 nearest horizontal neighbors as shown in
The value of the interpolated half-pixel is:
2) The other half-pixels are interpolated vertically from full- or half-pixels already interpolated during the first step as shown in
The
In order not to compute all the 3·n·m half-pixels of an image and not to test all the half-pel sub-partitions in the image with the SAD, the half-pel search is done on the fly, once per processed block. Starting from the most similar full-pel sub-partition computed before, the 8 half-pel sub-partitions around it are taken into consideration. Only the half-pixels needed by these 8 candidate sub-partitions are interpolated. Then the 9 SADs, on these 8 half-pel candidate sub-partitions and on the center full-pel most similar sub-partition, are compared and the most similar half-pel sub-partition is chosen. Its adjacent block determines the prediction block.
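The 6-tap half-pel luma filter of MPEG-4 AVC, with coefficients (1, −5, 20, 20, −5, 1), rounding and clipping to the 8-bit range, can be sketched as:

```python
# MPEG-4 AVC 6-tap half-pel interpolation: three full-pel neighbors on
# each side, weighted (1, -5, 20, 20, -5, 1), rounded by +16, scaled
# down by 32 and clipped to [0, 255].

def half_pel(p):
    """p: the 6 nearest full-pel neighbors along one direction."""
    a, b, c, d, e, f = p
    val = (a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5
    return max(0, min(255, val))
```

On a flat area the filter returns the common value unchanged, and on a sharp step it can overshoot, which is why the clip is needed.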
Quarter-Pel Search
Similarly to the previous search precision improvement from full-pel to half-pel, the search can be improved from half-pel to quarter-pel.
The quarter-pixels interpolation is done as in the MPEG-4 AVC standard with a linear interpolation of two adjacent neighbor pixels as described in
All the quarter pixels are interpolated from two adjacent neighbors.
The 8 quarter pixels around a full pixel are interpolated as in (a1)
The 8 quarter-pixels around an interpolated half-pixel (between 4 full-pixels and 4 half-pixels) are interpolated as in (a2)
The 8 quarter-pixels around an interpolated half-pixel (between 2 full-pixels and 6 half-pixels) are interpolated as in (b)
The value of the interpolated quarter-pixel is:
As before in the half-pel search, not all the quarter-pels of an image are computed. Only the quarter-pels needed by the on-the-fly computation of the 8 candidate sub-partitions surrounding the most similar half-pel sub-partition are computed.
Then the 9 SADs, on the 8 candidates and on the most similar half-pel sub-partition, are compared and the most similar quarter-pel sub-partition is chosen. Its adjacent block determines the prediction block.
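The quarter-pel interpolation and the 9-candidate refinement step can be sketched as follows. The rounded bilinear average matches MPEG-4 AVC; the `sad_at` callback is an assumed helper standing for the SAD evaluation described above.

```python
# Illustrative sketch: quarter-pel interpolation (rounded average of the
# two adjacent full- or half-pel neighbors) and the generic sub-pel
# refinement over the 8 positions surrounding the current best one.

def quarter_pel(a, b):
    """Quarter-pixel as the rounded average of its two neighbors."""
    return (a + b + 1) >> 1

def refine(center, sad_at):
    """Compare the center against its 8 surrounding sub-pel positions
    and return the position with the smallest of the 9 SADs.
    `sad_at` is an assumed callback returning the SAD at a position."""
    cx, cy = center
    candidates = [(cx + dx, cy + dy)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return min(candidates, key=sad_at)
```

The same `refine` step applies twice: once from full-pel to half-pel, then from half-pel to quarter-pel.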
Intra 4×4 and 8×8 Predictions
When both intra 4×4 and intra 8×8 prediction algorithms based on motion estimation are implemented, they can both be enabled at the same time without problem. They return respectively the 4×4 and 8×8 prediction blocks. These prediction modes can be integrated in the coding process.
The basic shape of the neighbor sub-partition is proposed on
The examples of
In the case of a 4×4 block, the neighbor sub-partition can have three different shapes, reference 1, depending on the block coding order and the position of the current block, named a, b or c, inside the macroblock, reference 2:
The target shapes have been explained for 4×4 block sizes; concerning the 8×8 block, the approach is similar, in the sense that the shapes are homologous.
Weighting Function
This type of block matching is specific because the motion estimator tries to find a sub-partition with the help of surrounding pixel blocks. In order to favor the nearest pixels during block matching, one solution consists in using a weighting function. The value of the weighting coefficients can vary according to the distance of the pixel to match from the center of the block to predict. In that case, the 4×4 and 8×8 weighting functions used are:
w8×8(i,j) = c × ρ^√((i−11.5)² + (j−11.5)²)
w4×4(i,j) = c × ρ^(2×√((i−5.5)² + (j−5.5)²))
where:
c is a normalization coefficient,
ρ=0.8,
i and j are the coefficient coordinates in the target referential, in which the center of the block to encode is (5.5, 5.5) for a 4×4 block and (11.5, 11.5) for an 8×8 block,
and the origin (0, 0) is at the top-left corner of the target.
With this function, the relation number is:
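The 4×4 weighting can be sketched as follows. Only ρ = 0.8, the block centers and the normalization coefficient c are given in the text; the exact form of the exponent is an assumption reconstructed from those parameters, and `weighted_sad` is a hypothetical helper showing how the weights would enter the matching criterion.

```python
import math

# Sketch of the 4x4 weighting function: weights decay geometrically
# (rho = 0.8) with the distance to the center (5.5, 5.5) of the block
# to encode; c is a normalization coefficient. The exponent form is an
# assumption for illustration.

RHO = 0.8

def weight_4x4(i, j, c=1.0):
    d = math.sqrt((i - 5.5) ** 2 + (j - 5.5) ** 2)
    return c * RHO ** (2 * d)

def weighted_sad(target, cand, positions, c=1.0):
    """SAD where each pixel difference is scaled by its weight, so pixels
    nearest the block to predict dominate the matching criterion."""
    return sum(weight_4x4(i, j, c) * abs(a - b)
               for (i, j), a, b in zip(positions, target, cand))
```

Pixels far from the block center thus contribute almost nothing, while the immediate causal neighbors drive the match.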
When referring to
Tests and Results
In the present configuration, the running time of the intra prediction based on motion estimation is very long. That is principally due to the search window size, which is the entire already coded image (rebuilt image) for each block of an image. The motion estimation algorithm (full block matching) is done through each position in this window. That implies that the complexity of the algorithm is O(N²), where N is the total number of pixels of the image, O denoting the order of complexity. For example, when the height and width of an image are multiplied by 2, its number of pixels is multiplied by 4 and the running time is multiplied by 4² = 16.
Computation of the complexity of the algorithm:
Predicted Image
The predicted image is made of the prediction blocks computed by the intra prediction algorithm based on motion estimation. The encoder subtracts it from the source image. Thus it obtains the difference image, also called residue image. The residue image is coded and transmitted to the decoder.
These tests were done on QCIF images (176×144 pixels) whose sources are displayed in the first row of the table.
It can be noticed that the predicted image with both intra 4×4 and intra 8×8 blocks is not equal to an image constituted of blocks taken from the intra 4×4 prediction image and from the intra 8×8 prediction image. The observed differences are due to the fact that the decoded image is generated on the fly, and changes during the encoding as a function of the intra 4×4/intra 8×8 MB decision choice.
The following visual observations can be made from the table:
First, in the “foreman” sequence, the intra prediction algorithm based on motion estimation predicts well the regular structure behind the man. But some wrong edges are detected, and some diagonal down-left to up-right edges are not well predicted. These cases are discussed below. The irregular parts of the image, the face and the jacket of the man, are less well predicted than with the MPEG-4 AVC algorithm.
Second, in the “qcif—7” sequence, the repetition of the contents of the TV screens is better predicted with the method based on motion estimation.
Third, in the “qcif—8” sequence, the intra motion estimation algorithm gives very good results compared to the standard MPEG-4 AVC intra prediction (all intra modes allowed). In this case, our algorithm finds the right position on the matrix symbols. It doesn't find the right symbol (very difficult . . . ) but the gain is great compared to MPEG-4 AVC.
As it was noticed with the prediction images, it can be seen in the residue images that the sequences “qcif—7” and “qcif—8” are visually better predicted with the algorithm based on motion estimation. In the “foreman” sequence, our algorithm visually decreases the residue in the regular structure behind the man. We will see below how much the bitstream size can be reduced with this method.
Performances
The mode performing an intra prediction based on the motion estimation is called below intra motion estimation mode. In the simulations described below, that mode replaces another intra mode of the MPEG-4 AVC standard.
The choice of which mode would be replaced was determined by simulations on sample sequences. The intra mode 5 (prediction along a vertical right axis) was on average one of the least used modes. So the intra motion estimation mode replaces mode 5 in the further simulations.
That mode substitution is done in order not to modify the software further and to produce decodable bitstreams. The bitstream is still decodable because only the transmitted residues, generated after the block predictions, have changed (compared to the original MPEG-4 AVC ones).
When receiving a block coded with the intra motion estimation based mode, the decoder believes that it decodes the intra mode 5. The decoded image is false because the decoder rebuilds the block with the original prediction mode 5. But the size of the coded bitstream corresponds to what is sought.
Of course, the coding syntax of the encoder would have to be modified according to the new prediction mode. The decoder would have to be modified too: it would implement the same intra motion estimation based algorithm. This modification of the decoder would be needed in both cases, when the intra motion estimation mode replaces the intra mode 5 and when it is added to the standard modes.
The intra prediction mode based on motion estimation was tested on different CIF images (352×288 pixels). Even more than with the QCIF sequences before, the computation time is here very long. It was first tested with intra 4×4 blocks only.
Table 1 below shows the difference of the bitstream size, in percent, between the intra 4×4 prediction based on motion estimation (substituted for prediction mode 5) and MPEG-4 AVC in the same conditions. That difference is computed with the Bjontegaard criterion. It can be seen that passing from full-pel search to half-pel search and finally to quarter-pel search gives better performance in all cases.
Coding with the intra 8×8 block size only, the same conclusion can be drawn from table 2 below. The bitstream size is reduced when the intra motion estimation based prediction is improved from full-pel precision to quarter-pel precision.
In table 3 below, showing the bitrate difference between the encoder with prediction based on motion estimation substituted for prediction mode 5 and MPEG-4 AVC (with intra 4×4 and 8×8 blocks), it can be observed that in less than half of the images the results are better than the two previous ones. In the other images, the results combining the 4×4 and 8×8 block sizes are not far from the results with only the 4×4 or 8×8 block size.
Keeping good performance when passing from the 4×4 block size to the 8×8 block size is an improvement compared to the previous methods based on the most probable mode estimation. Remember for example that the improvement of the first most probable mode estimation method was reduced on average by 75% when passing from the 4×4 to the 8×8 block size.
Conclusion on intra prediction based on motion estimation.
The results displayed in the previous paragraph show that the intra prediction based on motion estimation improves the quality of the MPEG-4 AVC intra prediction:
As examples, the following modifications can be applied to the process:
Number | Date | Country | Kind |
---|---|---|---|
06290290.3 | Feb 2006 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2007/051480 | 2/15/2007 | WO | 00 | 8/12/2008 |