The invention relates to video coding and decoding. More particularly, it concerns a method of coding a sequence of images and a method of reconstruction of the sequence. It addresses the improvement of the video coding performance by keeping the same quality for a lower bit-rate.
When encoding an image of a sequence of images, it is known to first predict the image either spatially or temporally and to encode the residual signal resulting from the prediction of the image. Spatial prediction is also referred to as INTRA prediction and temporal prediction is also referred to as INTER prediction. The ITU-T Rec. H.264/ISO/IEC 14496-10 AVC video coding standard specifies three different intra prediction modes, Intra4×4, Intra8×8 and Intra16×16 that correspond to a spatial estimation of the block to be coded. These different modes can exploit different directional prediction modes in order to build the pixels of the prediction block. In Intra4×4 and Intra8×8, nine intra prediction modes are defined. Eight of these modes consist of a 1D directional extrapolation of pixels surrounding the block to be predicted. The additional prediction mode (DC mode) defines the pixels of the prediction block as the average of available surrounding pixels.
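As an illustration of the DC mode described above, the following minimal Python sketch fills a prediction block with the rounded average of the available neighbouring pixels. The function name `dc_predict`, its parameters and the mid-grey fallback value are assumptions made for this example, not an excerpt from the standard.

```python
def dc_predict(top, left, size=4):
    """Fill a size x size prediction block with the mean of the available neighbours."""
    neighbours = list(top or []) + list(left or [])
    if not neighbours:
        # Assumption: mid-grey fallback for 8-bit samples when no neighbour is available.
        dc = 128
    else:
        # Rounded integer mean of the surrounding pixels.
        dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]


block = dc_predict(top=[100, 102, 104, 106], left=[98, 100, 102, 104])
```

Every pixel of the predicted block receives the same value, which is why this mode cannot follow any 2D texture pattern.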
When the texture corresponds to mono-directional oriented structures, fitting with one of the available prediction directions, these structures can be well extrapolated with the appropriate directional 1D prediction. But in case of complex 2D patterns, H.264/AVC intra prediction modes are not able to correctly propagate and predict the signal.
The invention is aimed at alleviating at least one of the drawbacks of the prior art. One aim of the invention is to improve the principle of intra prediction by using a coder/decoder scheme based on an image summary (e.g. an epitome) of the current image, in which the image summary is indirectly used as a reference image.
Thus, the invention relates to a method of coding a sequence of images comprising for a current image the steps of:
The invention further relates to a method of reconstructing a sequence of images comprising for a current image the steps of:
The use of an image summary addresses the limitation of directional intra prediction by using 2D texture prediction. Indeed, a summary image is composed of real texture and comes only from the original image. The main purpose of a summary image is to remove redundancy within the original image and to keep the most pertinent patterns (or patches) that best represent the image texture. These patterns can provide a prediction more suitable for 2D texture since 2D patches are considered instead of oriented mono-directional interpolations.
Other characteristics and advantages of the invention will appear through the description of a non-limiting embodiment of the invention, which will be illustrated, with the help of the enclosed drawing.
The invention relates to a coding method of a sequence of images. The method of coding is described for a current image of the sequence. The method of coding according to the invention uses an image summary of the current image in order to encode it. The invention further relates to a corresponding reconstruction method.
At step 20, an image summary is created from the current image Icurr. According to a specific embodiment the image summary is an epitome. However, the invention is not limited to this kind of summary. Any kind of summary (e.g. patch dictionary) may be used provided that an image is able to be reconstructed from this summary.
At step 22, the image summary is encoded into a first stream F1. As an example, the summary is encoded in conformance with the H.264 standard using intra only coding modes. According to a variant, the image summary is encoded in conformance with the JPEG standard defined in document JPEG 2000 Part, ISO/IEC JTC1/SC 29/WG 1 Std., March 2000.
At step 24, the image summary is decoded into a decoded summary. Step 24 is the inverse of step 22.
At step 26, an intermediate image is reconstructed from the decoded summary.
At step 28, the current image Icurr is encoded using the intermediate image as reference image into a second bitstream F2. As an example, the current image is encoded in conformance with H.264. According to a variant, the current image is encoded in conformance with the MPEG2 ISO-IEC 13818 video coding Standard. Usual coding modes (inter and intra coding modes) may be used. When a block of the current image is encoded according to inter coding mode, the difference between the block and a corresponding block in the reference image, i.e. in the intermediate image reconstructed from the decoded epitome, is encoded. The corresponding block is identified in the reference image by a motion vector or may be the colocalized block in the reference image. Bidirectional prediction is also possible with two blocks of the reference image. The difference, also known as the residue, is in fact the prediction error calculated between the block and its prediction derived from the reference image. Usually, the residue is first transformed into a block of coefficients such as DCT coefficients. The coefficients are then quantized into a block of quantized coefficients. The quantized coefficients are finally encoded into a bitstream using entropy coding such as the well-known arithmetic coding, CABAC (which stands for Context-Adaptive Binary Arithmetic Coding), CAVLC (which stands for Context-Adaptive Variable-Length Coding), etc. The invention is not limited to the type of encoding used to encode the residues. The bitstream F2 is the prediction error residual bitstream. According to a variant, the first and second bitstreams are multiplexed into a single bitstream.
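The residue, transform and quantization chain of step 28 can be sketched as follows for a single block predicted from the colocalized block of the reference image. This is a simplified illustration: it uses a floating-point orthonormal DCT and a uniform quantizer rather than the integer transform and quantizer of H.264, entropy coding is omitted, and the names `dct_2d`, `encode_block` and `qstep` are assumptions made for this example.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block (floating point)."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out


def encode_block(cur, ref, qstep):
    """Residue against the reference block, then DCT, then uniform quantization."""
    n = len(cur)
    residue = [[cur[i][j] - ref[i][j] for j in range(n)] for i in range(n)]
    coeffs = dct_2d(residue)
    return [[int(round(coef / qstep)) for coef in row] for row in coeffs]


cur = [[110] * 4 for _ in range(4)]   # block of the current image
ref = [[100] * 4 for _ in range(4)]   # colocalized block of the intermediate image
levels = encode_block(cur, ref, qstep=8)
```

For this flat residue only the DC level is non-zero, which shows how a good prediction from the intermediate image leaves little information to encode in F2.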
At step 20, an epitome is created from the current image Icurr. Therefore, according to this specific embodiment, the current image Icurr is factorized, i.e. a texture epitome E and a transform map Φ are created for the current image. The epitome principle was first disclosed by Hoppe et al in the article entitled “Factoring Repeated Content Within and Among Images” published in the proceedings of ACM SIGGRAPH 2008 (ACM Transaction on Graphics, vol. 27, no. 3, pp. 1-10, 2008). The texture epitome E is constructed from pieces of texture (e.g. a set of charts) taken from the current image. The transform map Φ is an assignation map that keeps track of the correspondences between each block of the current image Icurr and a patch of the texture epitome E.
In the document entitled “Video Epitomes”, published in International Journal of Computer Vision, vol. 76, No. 2, February 2008, Cheung et al. disclose a statistical method for extracting an epitome. This approach is based on a probabilistic model that captures both the color information and certain spatial patterns.
At step 210, the epitome construction method comprises finding self-similarities within the current image Icurr. The current image is thus divided into a regular grid of blocks. For each block in the current image Icurr, one searches the set of patches in the same image with similar content. That is, for each block Bi (∈ block grid), a list Lmatch(Bi)={Mi,0, Mi,1, . . . } of matches (or matched patches) is determined that approximate Bi with a given error tolerance ε. In the current embodiment, the matching is performed with a block matching algorithm using an average Euclidean distance. Therefore, at step 210, the patches Mj,l in the current image whose distance to the block Bi is below ε are added to the list Lmatch(Bi). The distance equals, for example, the absolute value of the pixel by pixel difference between the block Bi and the patch Mj,l divided by the number of pixels in Bi. According to a variant, the distance equals the SSE (Sum of Square Errors), wherein the errors are the pixel by pixel differences between the block Bi and the patch Mj,l. An exhaustive search is performed in the entire image. Once all the match lists have been created for the set of image blocks, new lists L′match(Mj,l), indicating the set of image blocks that could be represented by a matched patch Mj,l, are built at step 220. Note that the matched blocks Mj,l found during the full search step are not necessarily aligned with the block grid of the image and thus belong to the “pixel grid” as shown in
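The self-similarity search of step 210 and the list inversion of step 220 can be sketched in Python as below. The sketch reads the distance as the average absolute pixel difference, searches exhaustively on the pixel grid, and keeps the trivial match of a block with itself since it satisfies the tolerance. These choices, like the names `build_match_lists` and `invert_match_lists`, are assumptions made for this example.

```python
def block_at(img, top, left, size):
    """Extract the size x size patch whose top-left pixel is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mean_abs_diff(a, b):
    """Sum of absolute pixel differences divided by the number of pixels."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def build_match_lists(img, size, eps):
    """Step 210: for each grid block Bi, list the patches whose distance is below eps."""
    h, w = len(img), len(img[0])
    matches = {}
    for by in range(0, h - size + 1, size):        # blocks lie on the regular block grid
        for bx in range(0, w - size + 1, size):
            block = block_at(img, by, bx, size)
            matches[(by, bx)] = [
                (py, px)
                for py in range(h - size + 1)      # patches lie on the pixel grid
                for px in range(w - size + 1)
                if mean_abs_diff(block, block_at(img, py, px, size)) < eps
            ]
    return matches


def invert_match_lists(matches):
    """Step 220: for each matched patch, list the grid blocks it could represent."""
    inverse = {}
    for block_pos, patches in matches.items():
        for patch_pos in patches:
            inverse.setdefault(patch_pos, []).append(block_pos)
    return inverse


img = [[1, 2, 1, 2],
       [3, 4, 3, 4],
       [9, 9, 9, 9],
       [9, 9, 9, 9]]
l_match = build_match_lists(img, size=2, eps=0.5)
l_prime_match = invert_match_lists(l_match)
```

In this toy image the patch at pixel position (2, 1) is not aligned with the block grid, yet it can represent both blocks of the bottom row.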
At step 240, epitome charts are constructed. To this aim, texture patches are extracted, more precisely selected, in order to construct epitome charts, the union of all the epitome charts constituting the texture epitome E. Each epitome chart represents specific regions of the image in term of texture.
Step 240 is detailed in the following.
At step 2400, an index n is set equal to 0, n is an integer.
At step 2402, a first epitome chart ECn is initialized. Several candidate matched patches can be used to initialize an epitome chart. Each epitome chart is initialized by the matched patch which is the most representative of the not yet reconstructed remaining blocks. Let Y ∈ RN×M denote the input image and let Y′ ∈ RN×M denote the image reconstructed by a candidate matched patch and the epitome charts previously constructed. To initialize a chart, the following selection criterion based on the minimization of the Mean Square Error (MSE) criterion is used:
The selected criterion takes into account the prediction errors on the whole image. This criterion allows the epitome to be extended by a texture pattern that allows the reconstruction of the largest number of blocks while minimizing the reconstruction error. In the current embodiment, a zero value is assigned to image pixels that have not yet been predicted by epitome patches when computing the image reconstruction error.
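A possible reading of this initialization criterion is sketched below: each candidate matched patch is used to reconstruct the blocks it can represent, unpredicted pixels are left at zero, and the candidate giving the lowest MSE over the whole image is selected. The function names and the dictionary layout of `candidates` are assumptions made for this example.

```python
def mse_whole_image(y, y_rec):
    """Mean square error computed over all N x M pixels of the image."""
    n, m = len(y), len(y[0])
    return sum((y[i][j] - y_rec[i][j]) ** 2
               for i in range(n) for j in range(m)) / (n * m)


def select_initial_patch(y, candidates, size, reconstructed=None):
    """candidates maps a patch position to the grid blocks that patch can represent."""
    n, m = len(y), len(y[0])
    # Pixels not yet predicted by any epitome patch are assigned a zero value.
    base = reconstructed or [[0] * m for _ in range(n)]
    best_pos, best_cost = None, float("inf")
    for (py, px), blocks in candidates.items():
        y_rec = [row[:] for row in base]
        for (by, bx) in blocks:
            for dy in range(size):
                for dx in range(size):
                    y_rec[by + dy][bx + dx] = y[py + dy][px + dx]
        cost = mse_whole_image(y, y_rec)
        if cost < best_cost:
            best_pos, best_cost = (py, px), cost
    return best_pos, best_cost


y = [[1, 2, 1, 2],
     [3, 4, 3, 4],
     [9, 9, 9, 9],
     [9, 9, 9, 9]]
candidates = {(0, 2): [(0, 0), (0, 2)], (2, 1): [(2, 0), (2, 2)]}
best_pos, best_cost = select_initial_patch(y, candidates, size=2)
```

Because unpredicted pixels count as zero, the patch that explains the high-energy bottom rows is preferred here even though both candidates reconstruct the same number of blocks.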
At step 2404, the epitome chart ECn is then progressively grown by a region from the input image, and each time the epitome chart is enlarged, one keeps track of the number of additional blocks which can be predicted in the image as depicted on
In the preferred embodiment, the λ value is set to 1000. The first term of the criterion refers to the average prediction error per pixel when the input image is reconstructed by texture information contained in the current epitome
and the increment ΔE. As in the initialization step when the image pixels are impacted neither by the current epitome Ecurr nor by the increment, a zero value is assigned to them. FCext is thus computed on the whole image and not only on the reconstructed image blocks. The second term of the criterion corresponds to a rate per pixel when constructing the epitome, which is roughly estimated as the number of pixels in the current epitome and its increment, divided by the total number of pixels in the image. After having selected the locally optimal increment ΔEopt, the current epitome chart becomes: ECn(k+1)=ECn(k)+ΔEopt. The assignation map is updated for the blocks newly reconstructed by ECn(k+1).
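The two terms described above can be combined as sketched below. Summing the distortion term and λ times the rate term is an assumption of this example, as are the names `extension_cost` and `select_increment`; only the meaning of each term and the value λ = 1000 come from the description.

```python
def extension_cost(y, y_rec, epitome_pixels, increment_pixels, lam=1000.0):
    """Distortion per pixel plus lam times the estimated rate per pixel."""
    n, m = len(y), len(y[0])
    # First term: average prediction error per pixel over the whole image,
    # with unpredicted pixels of y_rec expected to be zero.
    distortion = sum((y[i][j] - y_rec[i][j]) ** 2
                     for i in range(n) for j in range(m)) / (n * m)
    # Second term: pixels in the current epitome and its increment,
    # divided by the total number of pixels in the image.
    rate = (epitome_pixels + increment_pixels) / (n * m)
    return distortion + lam * rate


def select_increment(y, candidates, epitome_pixels, lam=1000.0):
    """candidates is a list of (increment_pixels, y_rec) pairs; returns (index, cost)."""
    costs = [extension_cost(y, y_rec, epitome_pixels, inc, lam)
             for inc, y_rec in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


y = [[10, 10], [10, 10]]
small = (1, [[10, 10], [0, 0]])    # small increment, half the image still unpredicted
large = (2, [[10, 10], [10, 10]])  # larger increment, whole image predicted
choice_high_lam = select_increment(y, [small, large], epitome_pixels=1, lam=1000.0)
choice_low_lam = select_increment(y, [small, large], epitome_pixels=1, lam=10.0)
```

With a large λ the cheaper increment wins despite its higher distortion, whereas a small λ favours the increment that reconstructs more of the image.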
Then, the current chart is extended, during the next iteration k+1, until there are no more matched patches Mj,l which overlap the current chart ECn(k) and represent other blocks. If such overlapping patches exist then the method continues at step 2404 with ECn(k+1). When the current chart cannot be extended anymore and when the whole image is not yet reconstructed by the current epitome (step 2406), the index n is incremented by 1 at step 2408 and another epitome chart is created at a new location in the image. The method thus continues with the new epitome chart at step 2402, i.e. the new chart is first initialized before its extension. The process ends when the whole image is reconstructed by the epitome (step 2406). An example of a texture epitome is given by the
Once an epitome is created for an image, an approximation of this image can be reconstructed from the texture epitome and the transform map. However, due to the error tolerance ε, there are remaining differences between the original image and the reconstructed one. For video coding applications, it is thus necessary to further encode those remaining differences. Back to
At step 24, the texture epitome E is decoded. This step is the inverse of the texture epitome coding step, entropy coding apart. As an example, if the texture epitome coding step comprises computing a residual signal from a prediction signal, DCT and quantization then the decoding step 24 comprises dequantization, inverse DCT and adding the prediction signal to the residual signal in order to get a reconstructed signal.
At step 26, an intermediate image is reconstructed from the decoded texture epitome E and from the transform map Φ.
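Step 26 can be sketched as a simple copy of epitome patches into the image according to the transform map. The sketch assumes a translation-only map stored as a dictionary from block position in the image to patch position in the epitome; this data layout and the name `reconstruct_from_epitome` are assumptions made for this example.

```python
def reconstruct_from_epitome(epitome, transform_map, height, width, size):
    """Rebuild an image by copying, for each block, its assigned epitome patch."""
    image = [[0] * width for _ in range(height)]
    for (by, bx), (ey, ex) in transform_map.items():
        for dy in range(size):
            for dx in range(size):
                image[by + dy][bx + dx] = epitome[ey + dy][ex + dx]
    return image


epitome = [[1, 2, 9],
           [3, 4, 9]]
# Block at (0, 0) uses the patch at (0, 0); block at (0, 2) uses the patch at (0, 1).
transform_map = {(0, 0): (0, 0), (0, 2): (0, 1)}
intermediate = reconstruct_from_epitome(epitome, transform_map, height=2, width=4, size=2)
```

The resulting image has the same size as the original image even though the epitome is smaller, since overlapping patches of the epitome can serve several blocks.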
An example of an intermediate image reconstructed from the epitome of
Usually, the residue is first transformed into a block of coefficients such as DCT coefficients. The coefficients are then quantized into a block of quantized coefficients. The quantized coefficients are finally encoded into a bitstream using entropy coding such as well known arithmetic coding, CABAC (which stands for Context-Adaptive Binary Arithmetic Coding), CAVLC (which stands for Context-Adaptive Variable-Length Coding), etc. The invention is not limited to the type of encoding used to encode the residues. The bitstream F2 is the prediction error residual bitstream.
According to a variant, the first and second bitstreams are multiplexed into a single bitstream.
The method of coding according to the specific embodiment of
Secondly, an intermediate image is reconstructed from the texture epitome and the assignation map. Finally the current image Icurr is encoded using the reconstructed image Irec as reference image in the sense of inter image prediction. The steps of the encoding method according to the specific embodiment are summarized as follows:
The two bitstreams F1 and F2 (the one relative to the texture epitome and assignation map of the encoded epitome and the one relative to the current image Icurr) are finally either sent to a decoder or stored on a storage medium such as a hard disk or DVD.
At step 32, an image summary is decoded from a first bitstream F1. This step is the inverse of step 22 of the coding method.
At step 34, the image summary is used to reconstruct an intermediate image. This step is identical to step 26 of the coding method.
At step 36, the current image is reconstructed using the intermediate image as reference image. When a block of the current image is encoded according to inter coding mode, the difference between the block and a corresponding block in the reference image, i.e. in the intermediate image reconstructed from the decoded epitome, is decoded. The corresponding block is identified in the reference image by a motion vector. Bidirectional prediction with two blocks of the reference image is possible. According to a variant, no motion vector is encoded and the colocalized block in the reference image is used. The difference is in fact the prediction error calculated, on the encoder side, between the block and its prediction derived from the reference image. Usually, quantized coefficients are first decoded from a second bitstream using entropy decoding such as the well-known arithmetic decoding, CABAC, CAVLC, etc. The quantized coefficients are then dequantized into a block of dequantized coefficients such as DCT coefficients. The dequantized coefficients are finally transformed, e.g. using an inverse DCT, into a block of residues. The block of residues is then added to a corresponding block in the reference image.
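The block reconstruction of step 36 can be sketched as follows, starting from already entropy-decoded quantized levels. The sketch uses a floating-point orthonormal inverse DCT and a uniform dequantizer instead of the H.264 integer tools, and clips the result to the 8-bit range; these choices and the names `idct_2d`, `reconstruct_block` and `qstep` are assumptions made for this example.

```python
import math


def idct_2d(coeffs):
    """Inverse of the orthonormal 2D DCT-II for a square block."""
    n = len(coeffs)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[x][y] = sum(
                c(u) * c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for u in range(n) for v in range(n))
    return out


def reconstruct_block(levels, ref_block, qstep):
    """Dequantize, inverse transform, then add the residues to the reference block."""
    n = len(levels)
    dequantized = [[lv * qstep for lv in row] for row in levels]
    residue = idct_2d(dequantized)
    # Assumption: samples are 8-bit, so the sum is clipped to [0, 255].
    return [[min(255, max(0, ref_block[i][j] + int(round(residue[i][j]))))
             for j in range(n)] for i in range(n)]


levels = [[5, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
ref_block = [[100] * 4 for _ in range(4)]   # block of the intermediate image
rec_block = reconstruct_block(levels, ref_block, qstep=8)
```

A single DC level is enough here to shift the whole reference block by a constant offset.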
According to a variant, the reconstruction method further comprises a step 30 of demultiplexing a bitstream into the first and the second bitstreams when the first and the second bitstream are multiplexed.
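The multiplexing variant and the demultiplexing step 30 can be illustrated with a purely hypothetical container in which the length of F1 is written as a 4-byte big-endian prefix. The description does not specify any multiplexing syntax, so this layout and the names `multiplex` and `demultiplex` are assumptions made for this example.

```python
import struct


def multiplex(f1: bytes, f2: bytes) -> bytes:
    """Concatenate F1 and F2, prefixed by the length of F1 (hypothetical layout)."""
    return struct.pack(">I", len(f1)) + f1 + f2


def demultiplex(stream: bytes):
    """Split a multiplexed stream back into the first and second bitstreams."""
    (len_f1,) = struct.unpack(">I", stream[:4])
    return stream[4:4 + len_f1], stream[4 + len_f1:]


single = multiplex(b"epitome", b"residues")
f1, f2 = demultiplex(single)
```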
According to a specific embodiment depicted on
The image reconstruction of the intermediate image is realized symmetrically at the encoder and at the decoder sides from the decoded texture epitome and assignation map in order to avoid any drift when reconstructing the current image.
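The need for this symmetry can be shown on a toy one-dimensional signal with a uniform scalar quantizer. In the sketch below, a list of three samples stands in for the intermediate image, and all names and quantization steps are assumptions made for this example.

```python
def quantize(values, qstep):
    return [round(v / qstep) for v in values]


def dequantize(levels, qstep):
    return [lv * qstep for lv in levels]


def code_residue(cur, ref, qstep):
    """Quantized prediction residue of cur against the reference ref."""
    return quantize([c - r for c, r in zip(cur, ref)], qstep)


def reconstruct(levels, ref, qstep):
    return [r + d for r, d in zip(ref, dequantize(levels, qstep))]


summary = [100, 103, 107]                               # original summary samples
decoded_summary = dequantize(quantize(summary, 4), 4)   # what the decoder obtains
current = [101, 105, 110]

# Symmetric case: the encoder predicts from the decoded summary, like the decoder.
levels = code_residue(current, decoded_summary, 2)
encoder_side = reconstruct(levels, decoded_summary, 2)
decoder_side = reconstruct(levels, decoded_summary, 2)

# Mismatched case: the encoder predicts from the original summary instead.
bad_levels = code_residue(current, summary, 2)
encoder_bad = reconstruct(bad_levels, summary, 2)
decoder_bad = reconstruct(bad_levels, decoded_summary, 2)
```

In the symmetric case both sides obtain the same reconstruction, whereas in the mismatched case the decoder output differs from what the encoder assumed, which is the drift that the symmetric reconstruction avoids.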
Compared to existing methods based on intra coding, the invention has the advantage of improving the Rate Distortion performance. The main characteristic of the invention is the use of an image summary to predict a current image to be encoded, where the image summary, e.g. the epitome, gives a reconstructed image, this reconstructed image (normal size, i.e. the same size as the original image from which the epitome is created) being used as reference image in a video encoder. Advantageously, the reconstructed image is of the same size as the image to encode. Therefore, efficient modes such as the known “skip mode” may be used to encode blocks in the current image, thus decreasing its encoding cost.
The main targeted applications of the invention are video distribution (including compression) and display technologies related to video compression.
First, the invention is not limited by the encoding method used to code the residue (i.e. the difference between a block and a corresponding block in the reference image) computed for a current image. In addition, the method is not limited to the method used for constructing the epitome, i.e. the texture epitome and the assignation map. Indeed, the method of coding according to the invention only requires an image summary for the image to be encoded, whatever the method used to create the summary.
Number | Date | Country | Kind |
---|---|---|---|
11305064.5 | Jan 2011 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP11/58474 | 5/24/2011 | WO | 00 | 10/11/2013 |