The present invention relates to an encoding method for the compression of an original video sequence divided into successive groups of frames (GOFs) and to a corresponding decoding method. It also relates to corresponding encoding and decoding devices.
The growth of the Internet and advances in multimedia technologies have enabled new applications and services. Many of them require not only coding efficiency but also enhanced functionality and flexibility in order to adapt to varying network conditions and terminal capabilities. Scalability answers these needs. Current video compression standards often use so-called hybrid solutions, based on a predictive scheme where each frame is temporally predicted from a reference frame (the prediction options being: zero-value prediction for the intra frames or I frames, forward prediction for the P frames, or bi-directional prediction for the B frames) and the resulting prediction error is spatially transformed to take advantage of spatial redundancies. From MPEG-2 to MPEG-4, standards-based scalable solutions have been proposed. They rely on the generation of a base layer, containing the lowest spatial, temporal and/or SNR resolution version of the original video sequence, and one or several enhancement layers allowing (if transmitted and decoded) a spatially, temporally and/or SNR refined reconstruction. A shortcoming of these layer-based scalability schemes, however, is their lack of coding efficiency.
A different approach has been proposed with techniques such as three-dimensional (3D) subband coding, which are able to generate embedded bitstreams. Thanks to their multi-resolution analysis structure, scalability is inherent to these schemes and does not weaken their intrinsic coding efficiency. In a 3D subband codec such as the one described, for example, in “A fully scalable 3D subband video codec”, Proceedings of the International Conference on Image Processing (ICIP2001), vol. 2, 2001, pp. 1017-1020, the embedded bitstream is fully scalable and can be decoded at any spatial and temporal resolution, and with any desired SNR quality, simply by truncation at known locations. In such a scheme, successive groups of frames (GOFs) are processed as 3D structures and spatio-temporally filtered in order to compact the energy in the low frequencies, motion compensation also being used to improve the overall coding efficiency. The 3D subband structure is depicted in
As currently implemented, a 3D subband codec applies the motion-compensated (MC) spatio-temporal analysis at the full original resolution on the encoder side. Spatial scalability is achieved by discarding the highest spatial subbands of the decomposition. However, when motion compensation is used in the 3D analysis scheme, this method does not allow a perfect reconstruction of the video sequence at lower resolution, even at very high bit rates: this phenomenon, referred to as drift in the following description, lowers the visual quality of the scalable solution compared to a direct encoding at the targeted final display size. As explained in the document “Multiscale video compression using wavelet transform and motion compensation”, P. Y. Cheng et al., Proceedings of the International Conference on Image Processing (ICIP95), vol. 1, 1995, pp. 606-609, this drift comes from the fact that the wavelet transform and motion compensation are not interchangeable: the order in which they are applied matters. Indeed, when a frame (A) is synthesized at a lower resolution (a), the following operation is applied:
where $\mathrm{DWT}_L$ denotes the resolution downsampling performed with the same wavelet filters as in the 3D analysis. In a perfectly scalable solution, one would want:
$$a = \mathrm{DWT}_L(A) \qquad (2)$$
The remaining part of expression (1) therefore corresponds to the drift. It can be noticed that the drift disappears if no MC is applied. The same holds (except at the image borders) if a single motion vector is applied to the whole frame. Yet MC is known to be unavoidable for good coding efficiency, and a unique global motion is unlikely enough that this particular case is not considered in the following paragraphs.
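The non-interchangeability of downsampling and motion compensation can be illustrated numerically. The sketch below is a deliberately simplified stand-in, not the codec's actual analysis: it assumes 1D frames, a Haar low-pass filter for $\mathrm{DWT}_L$, integer per-sample motion vectors with circular borders, and a hypothetical vector downscaling rule (one vector per sample pair, halved). All function names are illustrative. It compares "compensate at full resolution, then downsample" with "downsample, then compensate with scaled vectors"; the difference plays the role of the drift.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def dwt_low(x):
    """1D stand-in for DWT_L: Haar low-pass filtering followed by decimation by 2."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / SQRT2


def motion_compensate(frame, vectors):
    """Integer per-sample motion compensation, circular borders: out[n] = frame[n + v[n]]."""
    n = np.arange(frame.size)
    return frame[(n + np.asarray(vectors)) % frame.size]


def low_res_vectors(vectors):
    """Hypothetical downscaling of the motion field: one vector per sample pair, halved."""
    return np.asarray(vectors)[0::2] // 2


def drift(frame, vectors):
    """DWT_L(MC(frame)) minus MC applied directly at low resolution with scaled vectors."""
    full_then_down = dwt_low(motion_compensate(frame, vectors))
    down_then_mc = motion_compensate(dwt_low(frame), low_res_vectors(vectors))
    return full_then_down - down_then_mc


B = np.array([0.0, 1, 4, 9, 16, 25, 36, 49])
print(drift(B, [0, 0, 0, 0, 1, 1, 1, 1]) * SQRT2)  # non-zero: the two orders disagree
print(drift(B, np.zeros(8, dtype=int)))            # all zeros: no MC, no drift
```

With a non-uniform field containing an odd (one-sample) displacement, the low-resolution compensation cannot reproduce what was done at full resolution, while with no motion the two orders coincide, in line with the remark above that the drift vanishes when no MC is applied.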
Some authors, such as J. W. Woods et al. in the document “A resolution and frame-rate scalable subband/wavelet video coder”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 9, September 2001, pp. 1035-1044, remove this drift by other means in order to achieve good spatial scalability. However, the scheme described in said document, in addition to being quite complex, requires sending extra information in the bitstream (the drift correction needed to correctly synthesize the upper resolution), thus wasting bits. The solution described in the document “Multiscale video compression . . . ” avoids this bottleneck, but it works on a predictive scheme and cannot be transposed to the 3D subband codec.
It is therefore an object of the invention to propose a solution avoiding these drawbacks.
To this end, the invention relates to a video encoding method for the compression of an original video sequence divided into successive groups of frames (GOFs), said method comprising the steps of:
The proposed solution is remarkable in that the global structure of the decomposition tree in the 3D subband analysis is preserved and no extra information is sent to correct the drift effect (only the decomposition/reconstruction mechanism is changed). If no motion estimation/compensation is performed at full resolution, it is a low-cost solution in terms of complexity. If motion compensation is introduced in the high spatial subbands, better coding efficiency is obtained.
The invention also relates to a corresponding decoding method, comprising the steps of:
The invention also relates to an encoding device and a decoding device provided for implementing said encoding method and said decoding method respectively.
The invention will now be described in a more detailed manner, with reference to the accompanying drawings in which:
The proposed solution (i.e. drift-free spatial scalability in a motion-compensated 3D subband codec) is now explained with reference to its two main steps: (a) motion compensation at the lowest resolution, (b) encoding of the high spatial subbands.
First, in order to avoid drift at lower resolutions, motion compensation (MC) is applied at the lowest resolution level. Consequently, as illustrated in
Then, for coding the high spatial subbands, two main solutions are proposed, the first one without MC, and the second one with MC.
A) Without MC
In the first solution, the high subbands simply correspond to the high-frequency spatial subbands, in the wavelet decomposition, of the original (full-resolution) frames of the GOF. These subbands allow reconstruction at full resolution at the decoder. Indeed, the frames can be decoded at the low resolution, and these low-resolution frames correspond to the low spatial subband in the wavelet analysis of the original frames. Hence one merely has to put the low-resolution frames and the corresponding high subbands together and apply a wavelet synthesis to obtain the full-resolution frames. The remaining question is where and how to place those high subbands in order to optimize the 3D-SPIHT encoder. In an MC scheme for a 3D subband encoder, the low temporal subbands always look like one of the original frames of the GOF. As a matter of fact:
so L looks like A. Consequently, the high spatial subband of A should be placed with the low-resolution decomposition corresponding to L. This approach (reordering of the high spatial subbands in the case of forward motion compensation) is illustrated in
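The decoder-side step described above (putting a low-resolution frame and its high spatial subbands together, then applying a wavelet synthesis) can be sketched with one level of a 2D orthonormal Haar transform. This is an assumption made for brevity: the codec may use longer wavelet filters, and the function names are hypothetical. The point is only that the low spatial subband plus the three high subbands are enough to rebuild the full-resolution frame exactly.

```python
import numpy as np


def haar2d_analysis(frame):
    """One level of 2D orthonormal Haar analysis: returns LL and (LH, HL, HH)."""
    x = np.asarray(frame, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2             # low spatial subband (low-resolution frame)
    lh = (a - b + c - d) / 2             # horizontal detail
    hl = (a + b - c - d) / 2             # vertical detail
    hh = (a - b - c + d) / 2             # diagonal detail
    return ll, (lh, hl, hh)


def haar2d_synthesis(ll, highs):
    """Inverse of haar2d_analysis: low-resolution frame + high subbands -> full frame."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


# Encoder keeps the high subbands of each original frame; decoder recombines them
# with the frame decoded at low resolution.
original = np.arange(16, dtype=float).reshape(4, 4)
low_res, high_subbands = haar2d_analysis(original)
full_res = haar2d_synthesis(low_res, high_subbands)
```

Because the Haar pair is orthonormal, `full_res` equals `original` up to floating-point rounding; with lossy coding of the subbands the reconstruction would of course only be approximate.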
However, motion compensation in the 3D subband structure can be either forward or backward (it has even been shown that alternating directions improves coding efficiency). The following algorithm, in which the notations are:
makes the link between a frame GOF_index in the GOF and the spatio-temporal subband {jt;n;t} which resembles it most, depending on the Motion Estimation Direction Description Tree.
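Since the algorithm's notations are not reproduced here, the following is only a hypothetical sketch of such a frame-to-subband link for a dyadic temporal Haar tree. It rests on two assumptions: with forward MC the low temporal subband of a pair resembles the first frame (as L resembles A above), with backward MC it resembles the second frame; and the frame not represented by the low subband is associated with the high temporal subband of the same pair. The direction tree is encoded, as an assumption, as one list of "fwd"/"bwd" labels per temporal level, and the returned tuple (level, position, type) stands in for {jt;n;t}.

```python
def frame_to_subband(gof_size, directions):
    """Map each frame index of the GOF to a (level, position, 'L' or 'H') subband label.

    directions[j][n] is "fwd" or "bwd" for pair n at temporal level j + 1
    (hypothetical encoding of the Motion Estimation Direction Description Tree).
    """
    if gof_size < 2 or gof_size & (gof_size - 1):
        raise ValueError("gof_size must be a power of two >= 2")
    current = list(range(gof_size))  # frame represented by each low subband so far
    mapping = {}
    level = 0
    while len(current) > 1:
        level += 1
        next_low = []
        for n in range(len(current) // 2):
            a, b = current[2 * n], current[2 * n + 1]
            # Assumption: forward MC -> low subband resembles a; backward -> resembles b.
            low_rep, high_rep = (a, b) if directions[level - 1][n] == "fwd" else (b, a)
            mapping[high_rep] = (level, n, "H")
            next_low.append(low_rep)
        current = next_low
    mapping[current[0]] = (level, 0, "L")  # last remaining low temporal subband
    return mapping


all_forward = [["fwd"] * 4, ["fwd"] * 2, ["fwd"]]
print(frame_to_subband(8, all_forward))
```

Under these assumptions, a GOF of 2^J frames yields exactly one subband per frame, so the high spatial subbands of every original frame find a unique place in the reordered decomposition.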
The way to define the coefficients $c_{jt}$ is now described (in the Haar filter case). Let $\alpha$ be the coefficient used in the temporal 2-tap Haar filter. In the conventional 3D subband scheme, one has:
If, in the present scheme, one uses $c_{jt} = \alpha^{jt}$ for the high spatial subbands, then temporal scalability remains meaningful. Indeed:
where UpSample refers to picture upsizing using wavelet filters. For reconstruction at a lower frame rate, only the low temporal subband is synthesized:
Finally, the reconstructed frames at each temporal level will tend to look like a motion-compensated average of the “reference” original frame and a blurred version of the other one (an up-sampled version of the downsized frame), whereas the current version of the 3D subband codec does not introduce this blur. Improving spatial scalability at the expense of some blur in the temporal scalability is nevertheless a worthwhile trade-off.
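As a point of reference for the role of $\alpha$ and its powers, the sketch below implements a multi-level temporal 2-tap Haar pyramid without motion compensation, so it is not the invention's scheme. It assumes $\alpha = 1/\sqrt{2}$ (orthonormal Haar) and the convention L = α(A + B), H = α(B − A); the names are hypothetical. After jt levels each low temporal subband equals $\alpha^{jt}$ times the sum of the $2^{jt}$ frames it covers, so decoding at a reduced frame rate from the low subbands alone, followed by division by $(2\alpha)^{jt}$, gives the average of those frames.

```python
import numpy as np

ALPHA = 1 / np.sqrt(2.0)  # assumed value: orthonormal 2-tap Haar coefficient


def temporal_haar(frames, levels, alpha=ALPHA):
    """Multi-level temporal Haar analysis WITHOUT motion compensation.

    Returns the low temporal subbands of the last level and the list of
    high temporal subbands produced at each level.
    """
    low = [np.asarray(f, dtype=float) for f in frames]
    highs = []
    for _ in range(levels):
        pairs = list(zip(low[0::2], low[1::2]))
        highs.append([alpha * (b - a) for a, b in pairs])  # H = alpha * (B - A)
        low = [alpha * (a + b) for a, b in pairs]          # L = alpha * (A + B)
    return low, highs


def low_frame_rate(frames, levels, alpha=ALPHA):
    """Reduced frame rate from the low subbands only, rescaled to frame averages."""
    low, _ = temporal_haar(frames, levels, alpha)
    return [l / (2 * alpha) ** levels for l in low]


gof = [np.full((2, 2), float(k)) for k in range(4)]
half_rate = low_frame_rate(gof, 1)     # averages of frames (0, 1) and (2, 3)
quarter_rate = low_frame_rate(gof, 2)  # average of all four frames
```

Here `half_rate` holds constant frames of value 0.5 and 2.5, and `quarter_rate` a single frame of value 1.5.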
B) With MC
Since using MC in every subband does not allow a drift-free reconstruction, it is possible, as depicted in
The solution is to define:
It can be noticed that MC is only used in the high temporal subband: A is first reconstructed at full resolution from the low temporal subband, and is then used to obtain frame B with MC from H. The coefficients $c_{jt}$ are chosen as previously. Said MC at full resolution can be performed either by merely upsampling the low-resolution motion vectors (which has the advantage of introducing no additional motion vector overhead) or by refining these upsampled vectors (which costs some additional transmission bits but is more efficient in terms of texture coding).
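The two options for full-resolution MC can be sketched as follows, under stated assumptions: a dyadic resolution change (factor 2 in each dimension), integer (dy, dx) vectors, a block-based motion field, and refinement by a small full search of ±1 pixel minimizing the sum of absolute differences (SAD). The search strategy, block size and function names are hypothetical choices for illustration; the text does not specify them. Only the refinement offsets would need to be transmitted in the second option.

```python
import numpy as np


def upsample_vectors(low_field):
    """Option 1: replicate each low-resolution vector over a 2x2 block and double it.

    low_field: integer array of shape (H, W, 2) holding (dy, dx) per low-res block.
    Returns an array of shape (2H, 2W, 2) at full resolution (no extra vector bits).
    """
    field = np.asarray(low_field)
    return 2 * np.repeat(np.repeat(field, 2, axis=0), 2, axis=1)


def refine_vector(cur, ref, y, x, bs, v, radius=1):
    """Option 2: search +/- radius pixels around the upsampled vector v = (dy, dx)
    for the bs x bs block of `cur` at (y, x), minimizing SAD against `ref`."""
    block = cur[y:y + bs, x:x + bs].astype(float)
    h, w = ref.shape
    best, best_cost = (int(v[0]), int(v[1])), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + int(v[0]) + dy, x + int(v[1]) + dx
            if 0 <= ry <= h - bs and 0 <= rx <= w - bs:  # skip candidates outside the frame
                cost = np.abs(block - ref[ry:ry + bs, rx:rx + bs]).sum()
                if cost < best_cost:
                    best, best_cost = (int(v[0]) + dy, int(v[1]) + dx), cost
    return best


# The true displacement is (-3, -1); the low-resolution estimate (-1, 0) upsamples
# to (-2, 0), and the +/-1 refinement recovers the full-resolution vector.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(16, 16)).astype(float)
cur = np.roll(ref, shift=(3, 1), axis=(0, 1))  # cur[i, j] = ref[i - 3, j - 1]
v_up = upsample_vectors(np.array([[[-1, 0]]]))[0, 0]
v_refined = refine_vector(cur, ref, 8, 8, 4, v_up)
```

The example shows the trade-off stated above: upsampling alone leaves a one-pixel error here, which the refinement removes at the cost of signalling the offset.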
It must be understood that the present invention is not limited to the aforementioned embodiments, and variations and modifications may be made without departing from the spirit and scope of the invention. There are numerous ways of implementing functions of the method according to the invention by means of items of hardware or software, or both, provided that a single item of hardware or software can carry out several functions. It does not exclude that an assembly of items of hardware or software or both carry out a function, thus forming a single function without modifying the method in accordance with the invention. Said hardware or software items can be implemented in several manners, such as by means of wired electronic circuits or by means of an integrated circuit that is suitably programmed. The integrated circuit can be contained in a computer or in an encoder or decoder and comprise a set of instructions, contained, for example, in a computer programming memory or in an encoder or decoder memory and causing the computer or the decoder to carry out the different steps of the methods according to the invention. This set of instructions may be loaded into the programming memory by reading a data carrier such as, for example, a disk. A service provider can also make the set of instructions available via a communication network such as, for example, the Internet.
Number | Date | Country | Kind |
---|---|---|---|
02290155.7 | Jan 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB03/00156 | 1/20/2003 | WO |