The present invention generally relates to the field of data compression and, more specifically, to a method of encoding a sequence of frames which are composed of picture elements (pixels), said sequence being subdivided into successive groups of frames (GOFs) themselves subdivided into successive pairs of frames (POFs) including a previous frame A and a current frame B, said method performing a three-dimensional (3D) subband decomposition involving a filtering step applied, in said sequence considered as a 3D volume, to the spatial-temporal data which correspond to each GOF, said decomposition being applied to said GOFs together with motion estimation and compensation steps performed in each GOF on said POFs A and B and on corresponding pairs of low-frequency temporal subbands (POSs) obtained at each temporal decomposition level, this process of motion compensated temporal filtering leading in each previous frame A on the one hand to connected pixels, that are filtered along a motion trajectory corresponding to motion vectors defined by means of said motion estimation steps, and on the other hand to a residual number of so-called unconnected pixels, that are not filtered at all.
The invention also relates to a computer-readable programme code embodied in a computer-usable medium for causing a computer system to perform such an encoding method when said programme is implemented by means of a processor.
In recent years, three-dimensional (3D) subband analysis has been increasingly studied for video compression. A 3D, or (2D+t), wavelet decomposition of a sequence of frames considered as a 3D volume indeed provides a natural spatial resolution and frame rate scalability. The coefficients generated by the wavelet transform constitute a hierarchical pyramid in which the spatio-temporal relationship is defined thanks to 3D orientation trees evidencing the parent-offspring dependencies between coefficients, and the in-depth scanning of the generated coefficients in the hierarchical trees and a progressive bitplane encoding technique lead to the desired quality scalability. The practical stage for this approach is to generate motion compensated temporal subbands using a simple two-tap wavelet filter, as illustrated in
In the illustrated implementation, the input video sequence is divided into Groups of Frames (GOFs), and each GOF, itself subdivided into successive couples of frames (that are as many inputs for a so-called Motion-Compensated Temporal Filtering, or MCTF module), is first motion-compensated (MC) and then temporally filtered (TF). The resulting low frequency (L) temporal subbands of the first temporal decomposition level are further filtered (TF), and the process may stop when there are only two temporal low frequency subbands left (the root temporal subbands), each one representing a temporal approximation of the first and second halves of the GOF. In the example of
When a Haar multiresolution analysis is used for the temporal decomposition, since one motion vector field is generated between every two frames in the considered group of frames at each temporal decomposition level, the number of motion vector fields is equal to half the number of frames in the temporal subband, i.e. four at the first level of motion vector fields and two at the second one. Motion estimation (ME) and motion compensation (MC) are only performed every two frames of the input sequence, and generally in the forward way. Using these very simple filters, each low frequency temporal subband (L) represents a temporal average of the input couples of frames, whereas the high frequency one (H) contains the residual error after the MCTF step.
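The temporal decomposition described above can be sketched in a few lines. This is only an illustration under stated assumptions: motion compensation is omitted (zero motion is assumed for every pixel), frames are flat lists of pixel values, the low subband is taken as the plain average and the high subband as the plain difference (the normalization is an assumption), and the function names `haar_pair` and `decompose_gof` are hypothetical. The GOF length is assumed to be a power of two.

```python
def haar_pair(a, b):
    """Filter one pair of frames (previous frame a, current frame b).

    Assumption: zero motion, so pixel k of a is paired with pixel k of b.
    """
    low = [(x + y) / 2 for x, y in zip(a, b)]   # temporal average
    high = [y - x for x, y in zip(a, b)]        # residual error
    return low, high


def decompose_gof(frames):
    """Repeat pairwise filtering on the low subbands until two remain.

    Returns the two root low subbands, the high subbands of each level,
    and the number of motion vector fields that each level would need
    (one field per filtered pair).
    """
    lows = list(frames)
    highs = []
    fields_per_level = []
    while len(lows) > 2:
        pairs = [(lows[i], lows[i + 1]) for i in range(0, len(lows), 2)]
        fields_per_level.append(len(pairs))
        results = [haar_pair(a, b) for a, b in pairs]
        lows = [r[0] for r in results]
        highs.append([r[1] for r in results])
    return lows, highs, fields_per_level


# A GOF of eight constant frames with values 0..7, four pixels each.
gof = [[float(k)] * 4 for k in range(8)]
roots, highs, fields = decompose_gof(gof)
print(fields)       # number of motion vector fields per level
print(roots[0][0])  # average of the first half of the GOF
print(roots[1][0])  # average of the second half of the GOF
```

For a GOF of eight frames the sketch reports four pairs at the first level and two at the second, and each root subband is the average of one half of the GOF, which matches the description above.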
Unfortunately, due to the nature of the motion in the scenes and the covering/uncovering of the objects, the motion compensated temporal filtering may raise the problem of unconnected picture elements (or pixels), which are not filtered at all (or also the problem of double-connected pixels, which are filtered twice). A conventional solution for trying to solve that problem is described with reference to
For each successive pair of frames (a current frame B associated to the corresponding previous frame A), a pair of subbands, comprising a temporal low-subband L and a temporal high-subband H, is generated by filtering and decimation. As illustrated in
According to said conventional solution, for an unconnected pixel in the previous frame A (like a3 or a4 in
In case of half-pixel motion compensation, the management of the integer vectors is the same. For half-pixel vectors, the motion vector pointing to a half-pixel position in the previous frame A is truncated to point to an integer pixel in said previous frame, as indicated in
In all cases, the number of unconnected pixels represents a weakness of the 3D subband coding/decoding approaches, because it strongly affects the resulting picture quality, especially for high-motion sequences or for the final temporal decomposition levels (for which the temporal correlation is poor).
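To make the notions of unconnected and double-connected pixels concrete, the following sketch classifies the pixels of a previous frame A from an integer motion vector field. It is an illustration only: the frames are reduced to one dimension, the convention that pixel j of the current frame B refers to pixel j + vectors[j] of frame A is an assumption, and the function name `classify_reference_pixels` is hypothetical.

```python
from collections import Counter


def classify_reference_pixels(vectors, width):
    """Count how often each pixel of frame A is referenced from frame B.

    vectors[j] is the integer displacement of pixel j of frame B
    (assumed convention: it refers to pixel j + vectors[j] of frame A).
    Returns the pixels of A that are never referenced (unconnected) and
    those referenced more than once (double-connected).
    """
    hits = Counter()
    for j, d in enumerate(vectors):
        target = j + d
        if 0 <= target < width:   # ignore vectors pointing outside A
            hits[target] += 1
    unconnected = [i for i in range(width) if hits[i] == 0]
    multiple = [i for i in range(width) if hits[i] > 1]
    return unconnected, multiple


# Six pixels; pixels 2 and 3 of B are displaced by one pixel.
unconnected, multiple = classify_reference_pixels([0, 0, -1, -1, 0, 0], 6)
print(unconnected, multiple)
```

In this small example two pixels of B refer to the same pixel of A, so one pixel of A is left without any reference and would not be filtered.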
It is therefore an object of the invention to avoid such a drawback and to propose a video encoding method with an improved coding efficiency due to a reduction of the number of unconnected pixels.
To this end, the invention relates to an encoding method such as defined in the introductory part of the description and in which the motion estimation steps comprise, in view of possible half-pixel motion compensations, a truncation mechanism according to which, when a motion vector points from the current frame B to a sub-pixel position in the corresponding previous frame A, said motion vector is truncated to point to an integer pixel of said previous frame, said vector truncation mechanism depending on the neighborhood of said sub-pixel position.
The present invention will now be described, by way of example, with reference to the accompanying drawings in which:
The object of the invention is to reduce the number of unconnected pixels and therefore to improve the coding efficiency of the 3D subband approach. To this end, the principle of the invention is to modify the “systematic” vector truncation mechanism as illustrated in
In order to guarantee a perfect reconstruction, the vector association mechanism thus proposed for half-pixel motion vectors must be identical at the decoder side.
As the only common information that can be used in a symmetric way on both encoding and decoding sides is the motion vector field, because it is the only information that is fully transmitted, the proposed solution at the encoding side will therefore be associated with a vector association protocol that can be mirrored at the decoding side.
As illustrated in
This algorithm makes it possible to store in a table, “status (i,j)”, the status of each pixel of the reference frame as the pixels of the current frame are processed. The table “status (i,j)” is initialized to “unconnected” at the beginning of the processing, and the pixels of the current frame are processed in scanning order. As soon as an unconnected pixel of the reference frame becomes “connected”, its entry “status (i,j)” is updated to “connected”. At any moment, the connection state of every pixel of the reference frame is therefore known from this table.
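A status table of this kind, combined with a neighborhood-dependent truncation, might look like the sketch below. Several points are assumptions rather than the disclosed rule: the frames are reduced to one dimension, vectors are expressed in half-pixel units, the "systematic" truncation is taken to select the lower of the two neighboring integer pixels, and the neighborhood-dependent variant is taken to prefer the first neighboring integer pixel that is still unconnected. The function name `associate` is hypothetical.

```python
UNCONNECTED, CONNECTED = "unconnected", "connected"


def associate(half_pel_vectors, width, neighbor_aware=True):
    """Associate each pixel of frame B with an integer pixel of frame A.

    half_pel_vectors[j] is the displacement of pixel j of B in half-pixel
    units. Pixels of B are processed in scanning order and the status
    table of A is updated as soon as a pixel becomes connected.
    """
    status = [UNCONNECTED] * width
    targets = []
    for j, v in enumerate(half_pel_vectors):
        pos = 2 * j + v                    # position in A, half-pixel units
        lo, hi = pos // 2, (pos + 1) // 2  # equal when pos is an integer pixel
        candidates = [c for c in dict.fromkeys((lo, hi)) if 0 <= c < width]
        if not candidates:                 # vector points outside frame A
            targets.append(None)
            continue
        choice = candidates[0]             # assumed systematic truncation
        if neighbor_aware:
            for c in candidates:           # prefer a still-unconnected neighbor
                if status[c] == UNCONNECTED:
                    choice = c
                    break
        status[choice] = CONNECTED
        targets.append(choice)
    return targets, status


vectors = [1, -1, 0, 0]  # pixels 0 and 1 of B both point between pixels 0 and 1 of A
for aware in (False, True):
    targets, status = associate(vectors, 4, neighbor_aware=aware)
    print(aware, targets, status.count(UNCONNECTED))
```

Because the choice depends only on the motion vector field and on the scanning order, running the same function on the decoding side with the transmitted vectors yields the same associations, which is the symmetry the description requires.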
It is important to note that the above-given disclosure is only illustrative and that the present invention is not limited to the aforementioned implementation. Although the invention has been described mainly in the context of half-pixel motion compensation, it can be successfully applied to a motion compensation with a sub-pixel accuracy different from half-pixel accuracy. Potential associations for some cases of quarter-pixel positions are for example illustrated in
Number | Date | Country | Kind |
---|---|---|---|
02292933.5 | Nov 2002 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB03/05297 | 11/20/2003 | WO | | 5/24/2005 |