The present invention generally relates to the field of video compression and, more particularly, to the video standards of the MPEG family (MPEG-1, MPEG-2, MPEG-4) and to the video coding recommendations of the ITU-T H.26x family (H.261, H.263 and extensions). More specifically, the invention relates to a video encoding method applied to an input sequence of frames in which each frame is subdivided into blocks of arbitrary size, said method comprising, for at least a part of said blocks of the current frame, the steps of:
In the current video standards (up to the MPEG-4 video coding standard and the H.264 recommendation), the video, described in terms of one luminance channel and two chrominance channels, can be compressed thanks to two coding modes applied to each channel: the “intra” mode, exploiting in a given channel the spatial redundancy of the pixels (picture elements) within each image, and the “inter” mode, exploiting the temporal redundancy between separate images (or frames). The inter mode, relying on a motion compensation operation, allows an image to be described from one (or more) previously decoded image(s) by encoding the motion of the pixels from one (or more) image(s) to another. Usually, the current image to be coded is partitioned into independent blocks (for instance, of size 8×8 or 16×16 pixels in MPEG-4, or of size 4×4, 4×8, 8×4, 8×8, 8×16, 16×8 and 16×16 in H.264), each of them being assigned a motion vector (the three channels share such a motion description). A prediction of said image can then be constructed by displacing pixel blocks from a reference image according to the set of motion vectors associated with each block. Finally, the difference, or residual signal, between the current image to be encoded and its motion-compensated prediction can be encoded in the intra mode (with 8×8 discrete cosine transforms, or DCTs, for MPEG-4, or 4×4 DCT-like transforms for H.264 in the Main profile).
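By way of illustration of this inter coding principle, the short sketch below shows how a motion-compensated prediction and the corresponding residual can be formed for a single block; the frame contents, the 16×16 block size, the motion vector and all function names are hypothetical examples and are not taken from the cited standards.

```python
import numpy as np

def predict_block(reference, top, left, mv, block_h=16, block_w=16):
    """Displace a block from the reference frame according to a motion vector (dy, dx)."""
    dy, dx = mv
    return reference[top + dy : top + dy + block_h,
                     left + dx : left + dx + block_w]

def block_residual(current, reference, top, left, mv, block_h=16, block_w=16):
    """Residual = current block minus its motion-compensated prediction."""
    current_block = current[top : top + block_h, left : left + block_w]
    prediction = predict_block(reference, top, left, mv, block_h, block_w)
    return current_block.astype(np.int16) - prediction.astype(np.int16)

# Example with random 8-bit frames and a (1, -2) motion vector for the block at (16, 32).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = rng.integers(0, 256, (64, 64), dtype=np.uint8)
res = block_residual(cur, ref, 16, 32, (1, -2))
print(res.shape)  # (16, 16)
```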
The DCT is probably the most widely used transform, because it offers a good compression efficiency in a wide variety of coding situations, especially at medium and high bitrates. However, at low bitrates, the hybrid motion-compensated DCT structure may not be able to deliver an artifact-free sequence, for two reasons. First, the structure of the motion-compensated inter prediction grid becomes visible, producing blocking artifacts. Second, the block edges of the DCT basis functions become visible in the image, because too few coefficients survive quantization, and those too coarsely quantized, to compensate for these blocking artifacts and to reconstruct smooth objects in the image.
The document “Very low bit-rate video coding based on matching pursuits”, R. Neff and A. Zakhor, IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, Feb. 1997, pp. 158-171, describes a new motion-compensated system including a video compression algorithm based on the so-called matching pursuit (MP) algorithm, a technique developed about ten years earlier (see the document “Matching pursuits with time-frequency dictionaries”, S. G. Mallat and Z. Zhang, IEEE Transactions on Signal Processing, vol. 41, no. 12, Dec. 1993, pp. 3397-3414). Said technique provides a way to iteratively decompose any function or signal (for example, an image or a video signal) into a linear expansion of waveforms belonging to a redundant dictionary of basis functions that are well localized both in time and frequency and are called atoms. A general family of time-frequency atoms can be created by scaling, translating and modulating a single function $g(t) \in L^2(\mathbb{R})$, assumed to be real and continuously differentiable. These dictionary functions may be designated by:
$g_\gamma(t) \in G$ (G being the dictionary set),   (1)
γ (gamma) being an indexing parameter associated with each particular dictionary element (or atom). As described in the first cited document, assuming that the functions $g_\gamma(t)$ have unit norm, i.e. $\langle g_\gamma(t), g_\gamma(t) \rangle = 1$, the decomposition of a one-dimensional time signal f(t) begins by choosing γ so as to maximize the absolute value of the following inner product:
$p = \langle f(t), g_\gamma(t) \rangle$,   (2)
where p is called an expansion coefficient for the signal f(t) onto the dictionary function $g_\gamma(t)$. A residual signal R is then computed:
$R(t) = f(t) - p \cdot g_\gamma(t)$   (3)
and this residual signal is expanded in the same way as the original signal f(t). An atom is, in fact, the name given to each pair $(\gamma_k, p_k)$, where k is the rank of the iteration in the matching pursuit procedure. After a total of M stages of this iterative procedure (where each stage n yields a dictionary element specified by $\gamma_n$, an expansion coefficient $p_n$ and a residual $R_n$ which is passed on to the next stage), the original signal f(t) can be approximated by a signal $\hat{f}(t)$ which is a linear combination of the dictionary elements thus obtained. The iterative procedure is stopped when a predefined condition is met, for example when a set number of expansion coefficients has been generated or when some energy threshold on the residual is reached.
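The iterative procedure described above can be summarized by the following sketch of a plain matching pursuit loop over a generic, unit-norm dictionary; the random dictionary, the stopping parameters and the function names are illustrative assumptions and do not reproduce the implementation of the cited documents.

```python
import numpy as np

def matching_pursuit(f, dictionary, max_atoms=10, energy_threshold=1e-3):
    """Greedy MP decomposition of a 1D signal f over the rows of `dictionary`
    (assumed to have unit norm). Returns the atoms (gamma_k, p_k) and the final residual."""
    residual = f.astype(float).copy()
    atoms = []
    for _ in range(max_atoms):
        # Choose gamma maximizing |<residual, g_gamma>| (cf. equation (2))
        products = dictionary @ residual
        gamma = int(np.argmax(np.abs(products)))
        p = float(products[gamma])
        atoms.append((gamma, p))
        # Update the residual (cf. equation (3)) and test the stopping criterion
        residual = residual - p * dictionary[gamma]
        if np.dot(residual, residual) < energy_threshold:
            break
    return atoms, residual

# Toy example: an overcomplete dictionary of normalized random waveforms.
rng = np.random.default_rng(1)
D = rng.standard_normal((64, 32))
D /= np.linalg.norm(D, axis=1, keepdims=True)
signal = 2.0 * D[3] - 0.5 * D[17]
atoms, r = matching_pursuit(signal, D, max_atoms=5)
approx = sum(p * D[g] for g, p in atoms)   # linear combination of the selected atoms
print(atoms[:2], float(np.linalg.norm(signal - approx)))
```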
In the first document mentioned above, which describes a system based on said MP algorithm and performing better than DCT-based systems at low bitrates, the original images are first motion-compensated, using a tool called overlapped block-motion compensation, which avoids or reduces blocking artifacts by blending the boundaries of the predicted/displaced blocks (the edges of the blocks are therefore smoothed and the block grid is less visible). After the motion-predicted image is formed, it is subtracted from the original one in order to produce the motion residual. Said residual is then coded, using the MP algorithm extended to the discrete two-dimensional (2D) domain, with a proper choice of basis dictionary (said dictionary consists of an overcomplete collection of 2D separable Gabor functions, described below).
A residual signal f is then reconstructed by means of a linear combination of M dictionary elements:
$\hat{f} = \sum_{n=1}^{M} \hat{p}_n \, g_{\gamma_n}$
If the dictionary basis functions have unit norm, $\hat{p}_n$ is the quantized inner product $\langle \cdot, \cdot \rangle$ between the basis function $g_{\gamma_n}$ and the residual of stage n, the pairs $(\hat{p}_n, \gamma_n)$ being the atoms. In the work described by the authors of the document, no restriction is placed on the possible location of an atom in an image.
The dictionary is built from a prototype Gaussian window
$w(t) = 2^{1/4} \, e^{-\pi t^2}$
A monodimensional (1D) discrete Gabor function $g_{\vec{\alpha}}(i)$ is then defined as a scaled, modulated version of this Gaussian window, normalized by a constant $K_{\vec{\alpha}}$ chosen so that $g_{\vec{\alpha}}(i)$ is of unit norm, $\vec{\alpha} = (s, \xi, \phi)$ being a triple consisting, respectively, of a positive scale, a modulation frequency and a phase shift. If S is the set of all such triples $\vec{\alpha}$, then the 2D separable Gabor functions of the dictionary have the following form:
$G_{\vec{\alpha},\vec{\beta}}(i,j) = g_{\vec{\alpha}}(i) \, g_{\vec{\beta}}(j)$ for $i, j \in \{0, 1, \ldots, N-1\}$ and $\vec{\alpha}, \vec{\beta} \in S$   (8)
The set of available dictionary triples and associated sizes (in pixels), indicated in the document as forming the 1D basis set (or dictionary), is listed in Table 1 (not reproduced here).
To obtain this parameter set, a training set of motion residual images was decomposed using a dictionary derived from a much larger set of parameter triples. The dictionary elements that were most often matched to the training images were retained in the reduced set. The dictionary thus obtained was specifically designed so that atoms can freely match the structure of a motion residual image, since their influence is not confined to the boundaries of the block they lie in.
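To make the separable construction of equation (8) concrete, the sketch below builds 2D dictionary functions from a small, purely illustrative set of parameter triples (s, ξ, φ); the triples, the discrete sampling of the window and the normalization details are assumptions of this sketch and are not the values of Table 1.

```python
import numpy as np

def gaussian_window(t):
    """Prototype Gaussian window w(t) = 2**(1/4) * exp(-pi * t**2)."""
    return 2 ** 0.25 * np.exp(-np.pi * t ** 2)

def gabor_1d(n, s, xi, phi):
    """Assumed discretization of a 1D Gabor function of size n: a Gaussian window
    scaled by s, modulated at frequency xi with phase phi, then normalized to
    unit norm (the normalization plays the role of the constant K_alpha)."""
    i = np.arange(n) - (n - 1) / 2.0
    g = gaussian_window(i / s) * np.cos(2 * np.pi * xi * i / n + phi)
    return g / np.linalg.norm(g)

def separable_2d_dictionary(n, triples):
    """2D separable functions G(i, j) = g_alpha(i) * g_beta(j), cf. equation (8)."""
    basis_1d = [gabor_1d(n, s, xi, phi) for (s, xi, phi) in triples]
    return [np.outer(ga, gb) for ga in basis_1d for gb in basis_1d]

# Illustrative (hypothetical) parameter triples, NOT the ones of Table 1.
triples = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (4.0, 2.0, np.pi / 4), (8.0, 0.0, 0.0)]
dictionary = separable_2d_dictionary(16, triples)
print(len(dictionary), dictionary[0].shape)  # 16 functions of size 16x16
```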
However, a hybrid motion-compensated coding system using atoms that are confined within block boundaries has recently been proposed in a European patent application filed on Aug. 5, 2003 by the applicant under the number EP03300081.1 (PHFR030085).
The main interest of this previous approach is to better model the blocky structure of residual signals, to increase the dictionary diversity for the same coding cost, and to offer the possibility of alternating MP and DCT transforms, since there is no interference across block boundaries (it also avoids the need to resort to overlapped motion compensation to limit blocking artifacts). The main elements useful to understand this previous implementation are recalled hereafter.
The corresponding video encoding device implements a hybrid video coder using multiple coding engines, one of them being a matching pursuit coding engine (referenced 43 in the following description).
The encoding engine 43 carries out a coding method that comprises the following steps. First, as in most coding structures, the original frames of the input sequence are motion-compensated (each one is predicted on the basis of the previously reconstructed frame, and the motion vectors determined during said motion-compensation step are stored in view of their later transmission). Residual signals are then generated as the difference between the current frame and the associated motion-compensated prediction. Each of said residual signals is then compared with a dictionary of functions consisting of a collection of 2D separable Gabor functions, in order to generate, at each stage n of the iterative procedure, a dictionary element $g_{\gamma_n}$ specified by the indexing parameter $\gamma_n$, an expansion coefficient $p_n$ and a residual $R_n$ (obtained by subtracting $p_n \cdot g_{\gamma_n}$ from the previous residual) which is passed on to the next stage. Once the atom parameters are found, they can be coded (together with the previously determined motion vectors), the coded signals thus obtained forming the bitstream sent to the decoder.
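The data flow of such a coding engine can be pictured with the following simplified sketch (one frame, zero-motion prediction, a tiny generic dictionary); it is only meant to show how motion compensation, residual generation, MP decomposition and atom/motion-vector gathering chain together, and none of the names or simplifications come from the actual engine 43.

```python
import numpy as np

def mp_decompose(residual, dictionary, max_atoms=8):
    """Greedy MP of a 2D residual over a list of unit-norm 2D dictionary functions."""
    r = residual.astype(float).copy()
    atoms = []
    for _ in range(max_atoms):
        products = [float(np.sum(r * g)) for g in dictionary]
        gamma = int(np.argmax(np.abs(products)))
        p = products[gamma]
        atoms.append((gamma, p))
        r = r - p * dictionary[gamma]
    return atoms, r

def encode_frame(current, previous_reconstructed, dictionary):
    """One pass of the coding engine flow:
    1) motion-compensated prediction (zero motion assumed here for simplicity),
    2) residual generation,
    3) MP decomposition of the residual,
    4) 'coding' of atoms and motion vectors (here simply gathered in a dict)."""
    motion_vectors = [(0, 0)]                       # placeholder motion estimation
    prediction = previous_reconstructed.astype(float)
    residual = current.astype(float) - prediction
    atoms, _ = mp_decompose(residual, dictionary)
    return {"motion_vectors": motion_vectors, "atoms": atoms}

# Toy usage with an 8x8 frame and a dictionary of normalized random 8x8 functions.
rng = np.random.default_rng(2)
dictionary = [g / np.linalg.norm(g) for g in rng.standard_normal((20, 8, 8))]
prev = rng.integers(0, 256, (8, 8)).astype(float)
cur = prev + rng.standard_normal((8, 8)) * 5.0
coded = encode_frame(cur, prev, dictionary)
print(len(coded["atoms"]), coded["motion_vectors"])
```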
The technical solution proposed in the cited European patent application consists in confining the influence of the atoms to the boundaries of the block they lie in. This block restriction means that an atom acts on only one block at a time, confined within the motion-compensation grid, as explained below.
Assume that one wants to obtain the MP decomposition of the 2D residual in a block B of size M×N pixels after motion compensation, and denote by G|B the MP dictionary restricted to B. The elements $g_{\gamma|B}$ of G|B are the elements $g_\gamma$ of G confined to the block B, i.e. they take the values of $g_\gamma$ inside B and are zero outside B. In this case, since $g_{\gamma|B}$ vanishes outside B, an atom found in block B has no influence on the neighbouring blocks, and the MP decomposition of the residual can be carried out block by block, in the same way as described above for the unrestricted case.
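One possible reading of this block restriction is sketched below, under the assumption that “restricted to B” means “kept inside B, set to zero outside B and renormalized to unit norm”; the renormalization choice and all names are assumptions of this sketch, not necessarily those of the cited application.

```python
import numpy as np

def restrict_to_block(g, top, left, height, width):
    """Restriction g|B of a dictionary function g to the block B:
    keep the values of g inside B, set everything else to zero,
    and (assumption of this sketch) renormalize to unit norm."""
    g_b = np.zeros_like(g, dtype=float)
    g_b[top:top + height, left:left + width] = g[top:top + height, left:left + width]
    norm = np.linalg.norm(g_b)
    return g_b / norm if norm > 0 else g_b

# A dictionary function spanning the whole 32x32 residual ...
rng = np.random.default_rng(3)
g = rng.standard_normal((32, 32))
# ... restricted to the 8x16 block B whose top-left corner is (8, 16).
g_b = restrict_to_block(g, top=8, left=16, height=8, width=16)

# The restricted atom has no influence outside B: adding p * g_b to the
# reconstruction leaves every pixel of the neighbouring blocks untouched.
outside = g_b.copy()
outside[8:16, 16:32] = 0.0
print(np.count_nonzero(outside))  # 0
```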
The preferred embodiment of the encoding device described above sends a bitstream which is received by a corresponding decoding device. This decoding device implements a hybrid video decoder using multiple decoding engines, one of them being a matching pursuit decoding engine that performs the inverse operations of the matching pursuit coding engine.
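Purely as an illustration of the generic operations such a matching pursuit decoding engine has to perform (rebuild the residual as the linear combination of the decoded atoms, then add it to the motion-compensated prediction), here is a minimal sketch; the dictionary, the data layout and the function names are assumptions.

```python
import numpy as np

def decode_residual(atoms, dictionary, shape):
    """Rebuild the residual as the sum of p_n * g_gamma_n over the decoded atoms."""
    residual = np.zeros(shape, dtype=float)
    for gamma, p in atoms:
        residual += p * dictionary[gamma]
    return residual

def decode_frame(atoms, prediction, dictionary):
    """Reconstructed frame = motion-compensated prediction + decoded residual."""
    return prediction + decode_residual(atoms, dictionary, prediction.shape)

# Toy usage: two atoms referring to a dictionary of normalized 8x8 functions.
rng = np.random.default_rng(4)
dictionary = [g / np.linalg.norm(g) for g in rng.standard_normal((10, 8, 8))]
prediction = np.full((8, 8), 128.0)        # e.g. a motion-compensated block
atoms = [(3, 12.5), (7, -4.0)]             # (gamma_n, quantized p_n) pairs
reconstructed = decode_frame(atoms, prediction, dictionary)
print(reconstructed.shape)
```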
The interest of the previous approach, recalled above in detail, resides in the fact that, because a single atom cannot span several blocks, it does not have to deal with the high-frequency discontinuities at block edges. Instead, it can be adapted to block boundaries, and even to block sizes, by designing block-size dependent dictionaries. Moreover, since overlapped motion compensation is no longer mandatory to preserve the MP efficiency, classical motion compensation may be used. However, with such an approach, it is not certain that the dictionary is well adapted to the structure of the signal to be modelled when its atoms are confined in arbitrarily sized blocks.
It is therefore an object of the invention to propose a video encoding method based on the matching pursuit algorithm and solving the above-indicated adaptation problem.
To this end, the invention relates to a video encoding method such as defined in the introductory part of the description and which is moreover such that, when using said MP algorithm, a specific dictionary is available at the encoding side for each block shape.
In another implementation of the method according to the invention, when using said MP algorithm, several dictionaries are available at the encoding side, and a bitstream syntax is defined for placing, at a predetermined level, flags indicating which dictionary should be used.
It is another object of the invention to propose video encoding devices for carrying out these two implementations of the method according to the invention.
It is still another object of the invention to propose video decoding methods and devices for decoding signals coded by means of said video encoding methods and devices.
The present invention will now be described, by way of example, with reference to the accompanying drawings.
A simplified block diagram of a video encoding device implementing a matching pursuit algorithm has been described above.
The technical solution now proposed according to the invention consists in having separate dictionaries, one for each block shape (4×4, 4×8, 8×4, 8×8, 8×16, 16×8, 16×16, for example): with such a rule used by the encoder, the video decoder implicitly knows which dictionary an atom refers to. According to another implementation of the invention, the technical solution can also consist in providing several dictionaries, available at both the encoding side and the decoding side, and in defining a bitstream syntax which lets the encoder indicate to the decoder which dictionary should be used: for instance, the codeword MP_dictionary_1 tells the decoder that the next atom will refer to the first dictionary, MP_dictionary_2 tells the decoder to switch to the second dictionary, and so on, such codewords, or flags, being placed for example at the atom level, the block level, the macroblock level or the picture level.
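Both variants can be pictured with the following sketch: in the first, the dictionary is selected implicitly from the block shape; in the second, a codeword such as MP_dictionary_2 written into the bitstream tells the decoder to switch. The placeholder dictionaries, the exact spelling of the codewords and all function names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(5)

def make_dictionary(height, width, n_functions=16):
    """Placeholder dictionary: random unit-norm 2D functions of the given shape."""
    funcs = rng.standard_normal((n_functions, height, width))
    return [g / np.linalg.norm(g) for g in funcs]

# First variant: one dictionary per block shape; the decoder infers the
# dictionary from the shape of the block, so no extra signalling is needed.
SHAPE_DICTIONARIES = {shape: make_dictionary(*shape)
                      for shape in [(4, 4), (4, 8), (8, 4), (8, 8),
                                    (8, 16), (16, 8), (16, 16)]}

def dictionary_for_block(block_residual):
    """Implicit rule shared by encoder and decoder: the block shape selects the dictionary."""
    return SHAPE_DICTIONARIES[block_residual.shape]

# Second variant: several dictionaries plus explicit switching flags in the bitstream.
# MP_dictionary_1, MP_dictionary_2, ... are written as codewords at a chosen level
# (atom, block, macroblock or picture level); here they are plain strings.
def emit_dictionary_flag(bitstream, index):
    bitstream.append(f"MP_dictionary_{index}")   # tells the decoder which dictionary to use

def parse_dictionary_flag(codeword, dictionaries):
    index = int(codeword.rsplit("_", 1)[1])
    return dictionaries[index - 1]

# Toy usage of the explicit signalling.
dictionaries = [make_dictionary(8, 8), make_dictionary(8, 8, n_functions=32)]
bitstream = []
emit_dictionary_flag(bitstream, 2)               # encoder decides to switch
active = parse_dictionary_flag(bitstream[-1], dictionaries)
print(bitstream[-1], len(active))                # MP_dictionary_2 32
```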
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
03300084 | Aug. 2003 | EP | regional
03300085 | Aug. 2003 | EP | regional

PCT Information

Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/IB2004/002478 | Jul. 14, 2004 | WO | 00 | Feb. 7, 2006

PCT Publication Data

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO 2005/015501 | Feb. 17, 2005 | WO | A
Prior Publication Data

Number | Date | Country
---|---|---
US 2007/0019723 A1 | Jan. 2007 | US