This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP2010/068181, filed Nov. 25, 2010, which was published in accordance with PCT Article 21(2) on Jun. 16, 2011 in English and which claims the benefit of French patent application No. 0958824, filed Dec. 10, 2009.
The invention relates to the general domain of image coding.
More specifically, the invention relates to a method for coding a block of a sequence of images and a method for reconstructing such a block.
It is known in the prior art, in order to code a current block of a sequence of images, to predict this block from one or several reference blocks. More specifically, the current block is predicted by a prediction block equal to the reference block, to a filtered version of the reference block, or to a merging of several reference blocks. The prediction block is then extracted, for example by difference pixel by pixel, from the current block. The block of residues thus obtained is then coded in a stream of coded data.
In reference to
In reference to
In reference to
These techniques are not always efficient, notably at low and medium bitrates. In particular, the block Pr1 is not always pertinent, i.e. it does not predict the current block Bc sufficiently precisely. This results in an extra coding cost.
The purpose of the invention is to overcome at least one of the disadvantages of the prior art. For this purpose, the invention relates to a method for reconstruction of a current block of a sequence of images having the form of a stream of coded data. The method for reconstruction comprises the following steps for:
According to a particular aspect of the invention, the step of determination of the second item of motion data comprises the minimisation of a distortion equal to the sum of the first distortion weighted by a coefficient and a second distortion, the second distortion being calculated between a current template neighbouring the current block and a reference template previously reconstructed.
According to a particular characteristic, the coefficient is equal to 0.5.
According to another particular characteristic, the method for reconstruction also comprises a step of reconstruction of the coefficient from the stream of coded data.
According to a variant, the method for reconstruction also comprises a step of calculation of the coefficient according to the following formula:
$\sigma_1^2 / (\sigma_0^2 + \sigma_1^2)$
where σ0 is the standard deviation of a residue calculated between the current template and a corresponding template associated with a first prediction block,
and σ1 is the standard deviation of the residue calculated between the current template and the reference template corresponding to a second item of motion data.
The invention also relates to a method for coding a current block of a sequence of images comprising the following steps for:
According to a particular aspect of the invention, the step of determination of the second item of motion data comprises the minimisation of a distortion equal to the sum of the first distortion weighted by a coefficient and a second distortion, the second distortion being calculated between a current template neighbouring the current block and a reference template previously reconstructed.
According to a particular characteristic, the coefficient is equal to 0.5.
According to another particular characteristic, the method for coding also comprises a step of coding of the coefficient in the stream of coded data.
According to a variant, the method for coding also comprises a step of calculation of the coefficient according to the following formula:
$\sigma_1^2 / (\sigma_0^2 + \sigma_1^2)$
where σ0 is the standard deviation of a residue calculated between the current template and a corresponding template associated with the first prediction block,
and σ1 is the standard deviation of the residue calculated between the current template and the reference template corresponding to a second item of motion data.
The invention will be better understood and illustrated by means of embodiments and advantageous implementations, by no means limiting, with reference to the figures in the appendix, wherein:
An image sequence is a series of several images. Each image comprises pixels or image points, with each of which is associated at least one item of image data. An item of image data is for example an item of luminance data or an item of chrominance data.
The term “motion data” is to be understood in the widest sense. It comprises the motion vectors and possibly the reference image indexes enabling a reference image to be identified in the image sequence. It can also comprise an item of information indicating the interpolation type used to determine the prediction block. In fact, in the case where the motion vector associated with a block Bc does not have integer coordinates, the image data must be interpolated in the reference image Iref to determine the prediction block. The motion data associated with a block are generally calculated by a motion estimation method, for example by block matching. However, the invention is in no way limited by the method enabling a motion vector to be associated with a block.
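The role of interpolation for motion vectors with non-integer coordinates can be illustrated by a short Python sketch. Bilinear interpolation is used here purely as an assumption, since the description does not fix the interpolation type, and the function names `sample_bilinear` and `fetch_prediction_block` are hypothetical.

```python
import math


def sample_bilinear(image, x, y):
    """Interpolate image (a list of rows) at the fractional position (x, y).

    Bilinear weighting is an assumption made for this sketch; positions
    are expected to lie inside the image.
    """
    x0, y0 = math.floor(x), math.floor(y)
    # Clamp the right/bottom neighbours so border positions stay valid.
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def fetch_prediction_block(ref_image, x0, y0, size, mv):
    """Build the size x size prediction block pointed to by mv = (mvx, mvy)."""
    mvx, mvy = mv
    return [
        [sample_bilinear(ref_image, x0 + i + mvx, y0 + j + mvy) for i in range(size)]
        for j in range(size)
    ]


ref_image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
# Half-pixel vector: each predicted sample mixes four reference samples.
half_pel_block = fetch_prediction_block(ref_image, 0, 0, 2, (0.5, 0.5))
# Integer vector: the reference samples are copied without interpolation.
full_pel_block = fetch_prediction_block(ref_image, 0, 0, 2, (1, 1))
```

With an integer vector the interpolation weights collapse to a plain copy, which is why the interpolation type only matters for fractional vectors.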
The term “residual data” signifies data obtained after extraction of other data. The extraction is generally a subtraction pixel by pixel of prediction data from source data. However, the extraction is more general and comprises notably a weighted subtraction. The term “residual data” is synonymous with the term “residues”. A residual block is a block of pixels with which residual data is associated.
The term “transformed residual data” signifies residual data to which a transform has been applied. A DCT (Discrete Cosine Transform) is an example of such a transform described in chapter 3.4.2.2 of the book by I. E. Richardson entitled “H.264 and MPEG-4 video compression”, published by J. Wiley & Sons in September 2003. The wavelet transform described in chapter 3.4.2.3 of the book by I. E. Richardson and the Hadamard transform are other examples. Such transforms “transform” a block of image data, for example residual luminance and/or chrominance data, into a “block of transformed data” also called a “block of frequency data” or a “block of coefficients”.
The term “prediction data” signifies data used to predict other data. A prediction block is a block of pixels with which prediction data is associated. A prediction block is obtained from a block or several blocks of the same image as the image to which the block that it predicts belongs (spatial prediction or intra-image prediction), or from one block (mono-directional prediction) or several blocks (bi-directional prediction) of an image different from the image to which the block that it predicts belongs (temporal prediction or inter-image prediction).
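To make the notion of a “block of coefficients” concrete, the following Python sketch applies a separable two-dimensional DCT to a small block. It uses the textbook orthonormal DCT-II, which is an assumption of this example and not the integer transform of any particular standard; the names `dct_1d` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional list of samples."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [horizontal frequency][vertical frequency]; transpose back.
    return [list(row) for row in zip(*cols)]


# A flat 4x4 block: all of its energy ends up in the DC coefficient.
flat_block = [[8] * 4 for _ in range(4)]
coefficients = dct_2d(flat_block)
```

For the flat block, only the coefficient at position (0, 0) is non-zero, which illustrates why smooth residual blocks are cheap to code after the transform.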
The term “prediction mode” specifies the way in which the block is coded. Among the prediction modes, there is the INTRA mode that corresponds to a spatial prediction and the INTER mode that corresponds to a temporal prediction. The prediction mode may also specify the way in which the block is partitioned to be coded. Thus, the 8×8 INTER prediction mode associated with a block of size 16×16 signifies that the 16×16 block is partitioned into four 8×8 blocks and predicted by temporal prediction.
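The partitioning implied by the 8×8 INTER mode can be sketched as follows in Python. The helper `partition_block` and the raster ordering of the sub-blocks are assumptions made for illustration only.

```python
def partition_block(block, sub_size):
    """Split a square block (list of rows) into sub_size x sub_size sub-blocks.

    Sub-blocks are returned in raster order (left to right, top to bottom),
    which is an assumption of this sketch.
    """
    size = len(block)
    sub_blocks = []
    for top in range(0, size, sub_size):
        for left in range(0, size, sub_size):
            sub_blocks.append(
                [row[left:left + sub_size] for row in block[top:top + sub_size]]
            )
    return sub_blocks


# A 16x16 block whose sample values encode their own position.
block16 = [[16 * y + x for x in range(16)] for y in range(16)]
sub_blocks = partition_block(block16, 8)
```

Each of the four resulting 8×8 blocks can then be predicted with its own motion data.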
The term “reconstructed data” signifies data obtained after merging of residual data with prediction data. The merging is generally a sum pixel by pixel of prediction data to residual data. However, the merging is more general and comprises notably the weighted sum. A reconstructed block is a block of pixels with which reconstructed image data is associated.
A neighbouring block or neighbouring template of a current block is a block, respectively a template, situated in a more or less large neighbourhood of the current block but not necessarily adjacent to this current block.
The term coding is to be taken in the widest sense. The coding can possibly but not necessarily comprise the transformation and/or the quantization of image data. Likewise, the term coding is used even if the image data are not explicitly coded in binary form, i.e. even when a step of entropy coding is omitted.
In reference to
During a step 10, a first item of motion data MVc is reconstructed from the stream F of coded data. For this purpose, the coded data of the stream F representative of the first item of motion data MVc are decoded. The invention is in no way limited by the method used to reconstruct the first item of motion data MVc. According to a particular embodiment, the first item of motion data MVc is reconstructed by prediction from motion data previously reconstructed. During a step 12, a first prediction block Pr0 is identified with said first item of motion data MVc in a first reference image Ir0.
During a step 14, at least one second item of motion data MVtmp is determined by a template matching method. When several reference images are available, the template matching method is applied for example to the different reference images, and the motion data MVtmp retained, i.e. the reference image Ir1 and the associated vector, is that providing the lowest distortion on the current template Lc.
In reference to
where p designates a pixel and Lc[p] is the value of the item of image data associated with the pixel p in the current template Lc. The current template Lc is located in the causal neighbourhood of the current block Bc.
MVtmp is thus the motion data that minimises a first distortion D1 calculated between the current template Lc and a reference template Lr1 that belongs to a second reference image Ir1.
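The template matching search described above can be sketched in Python as follows. Several choices are assumptions of this example and are not fixed by the description: the distortion D1 is taken as a sum of squared differences, the template is an L-shaped strip one pixel thick above and to the left of the block, and the search is exhaustive over integer vectors. All function names are hypothetical.

```python
def template_positions(x0, y0, size):
    """Pixels of an L-shaped causal template around the block at (x0, y0)."""
    above = [(x, y0 - 1) for x in range(x0 - 1, x0 + size)]
    left = [(x0 - 1, y) for y in range(y0, y0 + size)]
    return above + left


def template_distortion(cur, ref, positions, mv):
    """D1: squared differences between Lc and the reference template at mv."""
    dx, dy = mv
    return sum((cur[y][x] - ref[y + dy][x + dx]) ** 2 for x, y in positions)


def template_matching(cur, ref, x0, y0, size, search_range):
    """Return the integer vector minimising D1 inside the search window."""
    positions = template_positions(x0, y0, size)
    candidates = [
        (dx, dy)
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
    ]
    return min(candidates, key=lambda mv: template_distortion(cur, ref, positions, mv))


# Synthetic images: the current image is the reference displaced by (1, -2).
ref_image = [[x * x + 10 * y * y for x in range(12)] for y in range(12)]
cur_image = [[(x + 1) ** 2 + 10 * (y - 2) ** 2 for x in range(12)] for y in range(12)]
mv_tmp = template_matching(cur_image, ref_image, x0=4, y0=4, size=4, search_range=2)
```

Because only previously reconstructed pixels are used, the same search can be run identically at the decoder, which is why MVtmp does not need to be transmitted.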
According to the invention, the step of determination of the second item of motion data is also a function of a distortion calculated between the first prediction block Pr0 and the second prediction block Pr1. Advantageously, the choice of the second item of motion data MVtmp is no longer only a function of the current template Lc and the reference template Lr1 that belongs to the second reference image Ir1. In fact, the first item of reconstructed motion data MVc is considered reliable. Consequently, the block Pr0 is used to guide the determination of the second item of motion data MVtmp.
For example, the second item of motion data MVtmp is determined as follows:
MVtmp is thus the item of motion data that minimises a distortion equal to the sum of the first distortion D1 and a second distortion D2 weighted by the coefficient ‘a’, the second distortion D2 being calculated between the first prediction block Pr0 and the second prediction block Pr1.
‘a’ is a weighting coefficient that enables the influence of the second distortion D2 to be weighted according to the confidence accorded to the prediction Pr0. More specifically, the confidence is representative of the pertinence of the prediction Pr0 used to predict Bc. According to a first embodiment, the value of the coefficient ‘a’ is fixed a priori. For example, ‘a’ is equal to ½.
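The effect of the weighting coefficient ‘a’ on the choice of MVtmp can be illustrated by the following Python sketch. It assumes that both distortions are sums of squared differences and that the candidate reference templates and prediction blocks have already been fetched; the function names and the structure of `candidates` are hypothetical.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two blocks given as lists of rows."""
    return sum(
        (p - q) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for p, q in zip(row_a, row_b)
    )


def select_second_motion(candidates, current_template, pr0, a=0.5):
    """Pick the candidate vector minimising D1 + a * D2.

    candidates maps each motion vector to a pair
    (reference template Lr1, prediction block Pr1).
    """
    def cost(mv):
        ref_template, pr1 = candidates[mv]
        d1 = ssd(current_template, ref_template)  # template distortion
        d2 = ssd(pr0, pr1)                        # distance between Pr0 and Pr1
        return d1 + a * d2

    return min(candidates, key=cost)


current_template = [[10, 10, 10]]
pr0 = [[20, 20], [20, 20]]
candidates = {
    # Best template match, but its block is far from Pr0.
    (1, 0): ([[10, 10, 11]], [[40, 40], [40, 40]]),
    # Slightly worse template match, block close to Pr0.
    (0, 1): ([[12, 10, 12]], [[21, 20], [20, 19]]),
}
template_only = select_second_motion(candidates, current_template, pr0, a=0.0)
guided = select_second_motion(candidates, current_template, pr0, a=0.5)
```

With a equal to 0 the search reduces to plain template matching, while with a equal to ½ the candidate whose prediction block agrees with Pr0 is preferred.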
According to another embodiment, the coefficient ‘a’ is reconstructed from the stream F. In this case the coefficient ‘a’ is coded in the stream F per image, per block or per image slice.
According to another variant, the coefficient ‘a’ is calculated according to the following formula:
$\sigma_1^2 / (\sigma_0^2 + \sigma_1^2)$
Lr0 is a template associated with the block Pr0 that occupies relative to this block the same position as the current template Lc relative to the current block Bc.
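As a worked illustration of this variant, the Python sketch below computes the coefficient from three templates given as flat lists of samples. The function name `weighting_coefficient` and the fallback value used when both variances are zero are assumptions of this example. Since both residues have the same number of samples, the ratio is the same whether the population or the sample variance is used.

```python
from statistics import pvariance


def weighting_coefficient(lc, lr0, lr1, default=0.5):
    """Return sigma1^2 / (sigma0^2 + sigma1^2).

    sigma0 is the standard deviation of the residue Lc - Lr0 and sigma1
    that of the residue Lc - Lr1. `default` is a hypothetical fallback
    used when both residues are constant.
    """
    residue0 = [c - r for c, r in zip(lc, lr0)]
    residue1 = [c - r for c, r in zip(lc, lr1)]
    var0 = pvariance(residue0)
    var1 = pvariance(residue1)
    if var0 + var1 == 0:
        return default
    return var1 / (var0 + var1)


lc = [10, 12, 14, 16]
lr0 = [9, 13, 13, 17]   # residue 1, -1, 1, -1  -> variance 1
lr1 = [8, 14, 12, 18]   # residue 2, -2, 2, -2  -> variance 4
a = weighting_coefficient(lc, lr0, lr1)
```

In this example the template Lr0 matches the current template better than Lr1 does, so the coefficient is close to 1 and the distortion between Pr0 and Pr1 weighs heavily in the search.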
During a step 16, a second prediction block Pr1 is identified with said second item of motion data MVtmp in the second reference image Ir1.
During a step 18, the current block Bc is reconstructed from the first prediction block Pr0 and the second prediction block Pr1. More specifically, the first prediction block Pr0 and the second prediction block Pr1 are merged, for example by averaging them, into a single prediction block Pr. This prediction block Pr is then merged, for example by addition pixel by pixel, with a block of residues reconstructed for the current block from the stream F.
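Step 18 can be sketched in Python as follows, assuming integer samples, an average rounded to the nearest integer and a plain pixel-by-pixel addition. The rounding rule and the function names are assumptions of this example.

```python
def merge_predictions(pr0, pr1):
    """Average Pr0 and Pr1 pixel by pixel (rounding half up is an assumption)."""
    return [
        [(p + q + 1) // 2 for p, q in zip(row0, row1)]
        for row0, row1 in zip(pr0, pr1)
    ]


def reconstruct_block(pr0, pr1, residues):
    """Merge the two predictions into Pr, then add the reconstructed residues."""
    pr = merge_predictions(pr0, pr1)
    return [
        [p + r for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(pr, residues)
    ]


pr0 = [[100, 102], [104, 106]]
pr1 = [[102, 103], [104, 101]]
residues = [[1, -2], [0, 3]]
bc = reconstruct_block(pr0, pr1, residues)
```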
According to a variant, the first and second reference images are one and the same image.
In reference to
During a step 24, at least one second item of motion data MVtmp is determined by a template matching method. When several reference images are available, the template matching method is applied for example to the different reference images, and the motion data MVtmp retained, i.e. the reference image Ir1 and the associated vector, is that providing the lowest distortion on the current template Lc.
In reference to
where p designates a pixel and Lc[p] is the value of the item of image data associated with the pixel p in the current template Lc. The current template Lc is located in the causal neighbourhood of the current block Bc.
MVtmp is thus the motion data that minimises a first distortion D1 calculated between the current template Lc and a reference template Lr1 that belongs to a second reference image Ir1.
According to the invention, the step of determination of the second item of motion data is also a function of a distortion calculated between the first prediction block Pr0 and the second prediction block Pr1. Advantageously, the choice of the second item of motion data MVtmp is no longer only a function of the current template Lc and the reference template Lr1 that belongs to the second reference image Ir1.
In fact, the first item of motion data MVc determined is considered as reliable. Consequently, the block Pr0 is used to guide the determination of the second item of motion data MVtmp.
For example, the second item of motion data MVtmp is determined as follows:
MVtmp is thus the item of motion data that minimises a distortion equal to the sum of the first distortion D1 and a second distortion D2 weighted by the coefficient ‘a’, the second distortion D2 being calculated between the first prediction block Pr0 and the second prediction block Pr1.
‘a’ is a weighting coefficient that enables the influence of the second distortion D2 to be weighted according to the confidence accorded to the prediction Pr0. More specifically, the confidence is representative of the pertinence of the prediction Pr0 used to predict Bc. According to a first embodiment, the value of the coefficient ‘a’ is fixed a priori. For example, ‘a’ is equal to ½.
According to another embodiment, the coefficient ‘a’ is coded in the stream F per image, per block or per image slice.
According to another variant, the coefficient ‘a’ is calculated according to the following formula:
$\sigma_1^2 / (\sigma_0^2 + \sigma_1^2)$
Lr0 is a template associated with the block Pr0 that occupies relative to this block the same position as the current template Lc relative to the current block Bc.
During a step 26, a second prediction block Pr1 is identified with said second item of motion data MVtmp in the second reference image Ir1.
During a step 28, the current block Bc is coded from the first prediction block Pr0 and the second prediction block Pr1. More specifically, the first prediction block Pr0 and the second prediction block Pr1 are merged, for example by averaging them, into a single prediction block Pr. The prediction block Pr is then extracted, for example by subtraction pixel by pixel, from the current block. The block of residues thus obtained is coded in the stream F. The first item of motion data MVc is also coded in the stream F either directly or by prediction. The invention is in no way limited by the method used to code the first item of motion data MVc.
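The symmetry between step 28 and the corresponding reconstruction step can be checked with a small Python round trip. The sketch assumes integer samples, a rounded average and no transform or quantization of the residues, so that the reconstruction is exact; the function names are hypothetical.

```python
def average_blocks(pr0, pr1):
    """Merge Pr0 and Pr1 by a rounded average (the rounding is an assumption)."""
    return [
        [(p + q + 1) // 2 for p, q in zip(row0, row1)]
        for row0, row1 in zip(pr0, pr1)
    ]


def compute_residues(bc, pr):
    """Coder side: extract the prediction Pr from the current block Bc."""
    return [[c - p for c, p in zip(row_c, row_p)] for row_c, row_p in zip(bc, pr)]


def add_residues(pr, residues):
    """Decoder side: merge the prediction Pr with the block of residues."""
    return [[p + r for p, r in zip(row_p, row_r)] for row_p, row_r in zip(pr, residues)]


bc = [[102, 101], [104, 107]]
pr0 = [[100, 102], [104, 106]]
pr1 = [[102, 103], [104, 101]]

pr = average_blocks(pr0, pr1)          # identical on both sides
residues = compute_residues(bc, pr)    # what the coder writes to the stream
decoded = add_residues(pr, residues)   # what the decoder rebuilds
```

The round trip only works because the coder and the decoder derive the same block Pr, which in turn requires that both derive the same MVtmp from previously reconstructed data.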
According to a particular embodiment, the first and second reference images are one and the same image.
The invention also relates to a coding device 12 described in reference
In reference to
The coding device 12 further comprises a motion estimation module 1212 capable of estimating at least one motion vector between the block Bc and a block of a reference image Ir stored in the memory 1210, this image having previously been coded then reconstructed. According to a variant, the motion estimation can be done between the current block Bc and the source reference image Ic, in which case the memory 1210 is not linked to the motion estimation module 1212. The motion estimation module 1212 is able to implement the step 20 of the coding method according to the invention. According to a method known to those skilled in the art, the motion estimation module searches a reference image Ir0 for an item of motion data, specifically a motion vector, so as to minimise an error calculated between the current block Bc and a block in the reference image Ir0 identified using the item of motion data. The coding device 12 also comprises a motion data reconstruction module 1218 that implements a template matching method. The motion data reconstruction module 1218 is able to implement step 24 of the coding method to generate an item of motion data MVtmp. The modules 1212 and 1218 can be integrated in a single component.
The motion data MVc, MVtmp determined are transmitted by the motion estimation module 1212 and by the motion data reconstruction module 1218, respectively, to a decision module 1214 able to select a coding mode for the block Bc in a predefined set of coding modes. The coding mode retained is for example that which minimizes a bitrate-distortion type criterion. However, the invention is not restricted to this selection method and the mode retained can be selected according to another criterion, for example an a priori type criterion. The coding mode selected by the decision module 1214 as well as the motion data, for example the item or items of motion data in the case of the temporal prediction mode or INTER mode, are transmitted to a prediction module 1216. The coding mode selected and, where applicable, the item or items of motion data MVc are also transmitted to the entropy coding module 1204 to be coded in the stream F. The prediction module 1216 determines the prediction block Pr from the coding mode determined by the decision module 1214 and possibly from motion data MVc, MVtmp determined by the motion estimation module 1212 and by the motion data reconstruction module 1218. The prediction module 1216 is able to implement the steps 22 and 26 of the coding method according to the invention. The step of coding 28 of the coding method according to the invention is implemented by the modules 1200, 1202 and 1204 of the coding device 12.
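A bitrate-distortion type criterion of the kind used by the decision module 1214 is commonly written as a Lagrangian cost J = D + λR. The Python sketch below assumes this form, which the description does not impose; the mode names, the numerical values and the function name `select_coding_mode` are purely illustrative.

```python
def select_coding_mode(modes, lam):
    """Return the mode minimising J = D + lam * R.

    modes maps a mode name to a pair (distortion D, rate R in bits).
    lam is the Lagrange multiplier trading rate against distortion.
    """
    return min(modes, key=lambda name: modes[name][0] + lam * modes[name][1])


# Hypothetical measurements for one block.
modes = {
    "INTRA": (120.0, 40),
    "INTER": (90.0, 70),
    "INTER_TEMPLATE": (95.0, 52),
}
low_rate_choice = select_coding_mode(modes, lam=5.0)
balanced_choice = select_coding_mode(modes, lam=1.0)
quality_choice = select_coding_mode(modes, lam=0.0)
```

A large multiplier favours cheap modes, whereas a multiplier of zero selects the mode with the lowest distortion regardless of its cost in bits.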
In reference to
The decoding device 13 also comprises a motion data reconstruction module 1310 that implements for example a template matching method. The motion data reconstruction module 1310 is able to implement step 14 of the reconstruction method to generate an item of motion data MVtmp.
The decoded data relating to the content of the images is then transmitted to a module 1302 able to carry out an inverse quantization followed by an inverse transform. The module 1302 is identical to the module 1206 of the coding device 12 having generated the coded stream F. The module 1302 is connected to a calculation module 1304 able to merge, for example by addition pixel by pixel, the block from the module 1302 and a prediction block Pr to generate a reconstructed current block Bc that is stored in a memory 1306. The decoding device 13 also comprises a prediction module 1308. The prediction module 1308 determines the prediction block Pr from the coding mode decoded for the current block by the entropy decoding module 1300 and possibly from motion data MVc, MVtmp determined by the entropy decoding module 1300 and by the motion data reconstruction module 1310, respectively. The prediction module 1308 is able to implement the steps 12 and 16 of the reconstruction method according to the invention. The step of reconstruction 18 of the reconstruction method according to the invention is implemented by the modules 1302 and 1304 of the decoding device 13.
Naturally, the invention is not limited to the embodiment examples mentioned above.
In particular, those skilled in the art may apply any variant to the stated embodiments and combine them to benefit from their various advantages. Notably the invention is in no way limited by the method used to determine a motion vector MVc, or by the manner in which this vector is coded and respectively reconstructed.
Number | Date | Country | Kind |
---|---|---|---|
09 58824 | Dec 2009 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2010/068181 | 11/25/2010 | WO | 00 | 9/4/2012 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2011/069831 | 6/16/2011 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6023298 | Hwang | Feb 2000 | A |
20020009144 | Ishihara et al. | Jan 2002 | A1 |
20070009044 | Tourapis et al. | Jan 2007 | A1 |
20080159400 | Lee et al. | Jul 2008 | A1 |
20080159401 | Lee et al. | Jul 2008 | A1 |
20090003443 | Guo | Jan 2009 | A1 |
20090067505 | Tourapis et al. | Mar 2009 | A1 |
20090180538 | Visharam et al. | Jul 2009 | A1 |
20090225847 | Min et al. | Sep 2009 | A1 |
20100208814 | Xiong et al. | Aug 2010 | A1 |
Number | Date | Country |
---|---|---|
101415122 | Apr 2009 | CN |
2001057768 | Jul 2001 | KR |
WO2007093629 | Aug 2007 | WO |
WO2009028780 | Mar 2009 | WO |
WO2009126260 | Oct 2009 | WO |
WO2011002809 | Jan 2011 | WO |
Entry |
---|
Suzuki et al., “Inter Frame Coding with Template Matching Averaging”, Image Processing, 2007. ICIP 2007, Sep. 1, 2007, pp. III-409. |
Search Report Dated Jan. 14, 2011. |
Kamp et al., “Multihypothesis prediction using decoder side motion vector derivation in inter frame video coding”, Conference on Visual Communication and Image Processing 2009, San José, California, USA, Jan. 20, 2009, pp. 1-9. |
Richardson, I., “H264 and MPEG4 Video Compression”, John Wiley & Sons Ltd., West Sussex, England, 2003, pp. 1-307. |
Number | Date | Country | |
---|---|---|---|
20120320982 A1 | Dec 2012 | US |