The present invention relates to a video coding method for the compression of a bitstream corresponding to an original video sequence that has been divided into successive groups of frames (GOFs) whose size is N = 2^n, with n = 0, 1, 2, . . . , said coding method comprising the following steps, applied to each successive GOF of the sequence:
The invention also relates to a video coding device for carrying out said coding method.
Video streaming over heterogeneous networks requires a high scalability capability. This means that parts of a bitstream can be decoded without a complete decoding of the sequence and can be combined to reconstruct the initial video information at lower spatial or temporal resolutions (spatial/temporal scalability) or with a lower quality (PSNR or bitrate scalability). A convenient way to achieve all three types of scalability (spatial, temporal, PSNR) is a three-dimensional (3D, or 2D+t) subband decomposition of the input video sequence, performed after a motion compensation of said sequence.
Current standards like MPEG-4 have implemented limited scalability in a predictive DCT-based framework through additional high-cost layers. More efficient solutions based on a 3D subband decomposition followed by a hierarchical encoding of the spatio-temporal trees, performed by means of an encoding module based on the technique named Fully Scalable Zerotree (FSZ), have recently been proposed as an extension of still image coding techniques to video: the 3D or (2D+t) subband decomposition provides natural spatial resolution and frame rate scalability, while the in-depth scanning of the coefficients in the hierarchical trees and the progressive bitplane encoding technique lead to the desired quality scalability. A higher flexibility is thus obtained at a reasonable cost in terms of coding efficiency.
The ISO/IEC MPEG normalization committee launched at the 58th Meeting in Pattaya, Thailand, Dec. 3-7, 2001, a dedicated Ad Hoc Group (AHG on Exploration of Interframe Wavelet Technology in Video Coding) in order to, among other things, explore technical approaches for interframe (e.g. motion-compensated) wavelet coding and analyze them in terms of maturity, efficiency and potential for future optimization. The codec described in the document PCT/EP01/04361 (PHFR000044) is based on such an approach, illustrated in
With Haar filters used for the temporal filtering operations, motion estimation (ME) and motion compensation (MC) are only performed every two frames of the input sequence, the total number of ME/MC operations required for the whole temporal tree being roughly the same as in a predictive scheme. Using these very simple filters, the low frequency temporal subband represents a temporal average of the input couple of frames, whereas the high frequency one contains the residual error after the motion-compensated temporal filtering (MCTF) operation.
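As an informal illustration of this pairwise Haar filtering, a minimal Python sketch is given below. It deliberately omits motion compensation, and the function names, the 1/sqrt(2) normalisation and the use of NumPy arrays are assumptions made for illustration only; they are not part of the scheme described here.

```python
import numpy as np

def haar_temporal_pair(frame_a, frame_b):
    """Haar temporal filtering of one couple of frames (motion compensation omitted).

    The low-frequency subband is the temporal average of the two frames (up to
    normalisation); the high-frequency subband holds the residual between them.
    The 1/sqrt(2) scaling is one common choice, assumed here for illustration.
    """
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    low = (a + b) / np.sqrt(2.0)    # temporal average of the frame couple
    high = (b - a) / np.sqrt(2.0)   # residual error between the two frames
    return low, high

def haar_temporal_gof(frames):
    """Apply the pairwise Haar filtering to a GOF of N = 2**n frames,
    recursing on the low-frequency subbands to build the temporal tree."""
    highs = []
    while len(frames) > 1:
        lows = []
        for i in range(0, len(frames), 2):
            low, high = haar_temporal_pair(frames[i], frames[i + 1])
            lows.append(low)
            highs.append(high)
        frames = lows            # next temporal level works on the low subbands
    return frames[0], highs      # final low-pass subband + all high-pass subbands
```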
A parameter has been identified as being relevant for the MCTF module of a motion-compensated 3D subband video coding scheme: it is what is called motion estimation activation, or "ME Activation", or, in other words, the decision to perform or not ME on a couple of input frames (for the first temporal level) or subbands (for the following levels). For high motion activity sequences, it has indeed been observed that using ME, and therefore performing temporal filtering along motion trajectories, does increase the overall coding efficiency. However, this gain in coding efficiency may be lost in case of decoding at low bit-rate (one must keep in mind that the decoding bit-rate is a priori unknown in the framework of scalable coding), owing to a possibly too high overhead for motion vectors. It may therefore be more efficient, in certain circumstances, to decide not to activate ME so as to keep as much bit-rate as possible for texture coding (and decoding).
It is therefore an object of the invention to propose an encoding method avoiding the conventional solutions encountered in current MC 3D subband video coding schemes, in which ME Activation within an MCTF module is either arbitrarily chosen or derived from some information obtained a posteriori, i.e. only after having actually performed MCTF.
To this end, the invention relates to a coding method such as defined in the introductory paragraph of the description and which is moreover characterized in that said spatio-temporal analysis step also comprises a decision sub-step for activating or not the motion estimation sub-step, said decision sub-step itself comprising a motion activity pre-analysis operation based on the MPEG-7 Motion Activity descriptors and performed on the input frames or subbands to be motion compensated and temporally filtered.
According to a particularly advantageous implementation, said method is characterized in that said decision sub-step, based on the Intensity of activity attribute of the MPEG-7 Motion Activity Descriptors for all the frames or subbands of the current temporal decomposition level, comprises the following operations:
Since the ME deactivation for a specific level results in the ME deactivation for the following levels, this technical solution leads to a significant complexity reduction of the overall MCTF module, while still offering a good compression efficiency and above all a good compromise between motion vector overhead and picture quality.
It is another object of the invention to propose a coding device for carrying out such a coding method.
The present invention will now be described, by way of example, with reference to the accompanying drawings in which:
As seen above, the overall efficiency of any MC 3D subband video coding scheme depends on the specific efficiency of its MCTF module in compacting the temporal energy of the input GOF. As the parameter "ME Activation" is now known to be a major one for the success of MCTF, it is proposed, according to the invention, to derive this parameter from a dynamic Motion Activity pre-analysis of the input frames (or subbands) to be motion compensated and temporally filtered, using normative (MPEG-7) motion descriptors (see the document "Overview of the MPEG-7 Standard, version 6.0", ISO/IEC JTC1/SC29/WG11 N4509, Pattaya, Thailand, December 2001, pp. 1-93). The following description will define which descriptor is used and how it influences the choice of the above-mentioned encoding parameter.
In the 3D video coding scheme described above, ME/MC is generally arbitrarily performed on each couple of frames (or subbands) of the current temporal decomposition level. It is now proposed to either activate or deactivate ME according to the "Intensity of activity" attribute of the MPEG-7 Motion Activity Descriptors, and this for all the frames (or subbands) of the current temporal decomposition level (Intensity of activity takes its integer values within the [1, 5] range: for instance, 1 means "very low intensity" and 5 means "very high intensity"). This Intensity of activity attribute is obtained by performing ME as it would be done anyway in a conventional MCTF scheme and using statistical properties of the motion-vector magnitudes thus obtained. The quantized standard deviation of the motion-vector magnitudes is a good metric for the motion activity intensity, and the Intensity value can be derived from the standard deviation using thresholds, as in the sketch below.
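The following Python sketch gives an informal illustration of this pre-analysis. The function name, the array layout of the motion vectors and the quantization thresholds are assumptions introduced for illustration (the MPEG-7 specification defines its own thresholds); the principle, namely quantizing the standard deviation of the motion-vector magnitudes into an integer in [1, 5], follows the description above.

```python
import numpy as np

# Illustrative thresholds (in pixels) on the standard deviation of the
# motion-vector magnitudes; assumed values, the actual MPEG-7 quantization
# thresholds may differ.
ACTIVITY_THRESHOLDS = (3.9, 10.7, 17.1, 32.0)

def intensity_of_activity(motion_vectors, thresholds=ACTIVITY_THRESHOLDS):
    """Derive the MPEG-7 'Intensity of activity' attribute (integer in [1, 5])
    from the quantized standard deviation of the motion-vector magnitudes.

    motion_vectors: array-like of shape (num_blocks, 2) holding (dx, dy) per block,
    as produced by the ME pass that precedes MCTF.
    """
    mv = np.asarray(motion_vectors, dtype=np.float64)
    magnitudes = np.hypot(mv[:, 0], mv[:, 1])
    sigma = magnitudes.std()
    intensity = 1
    for t in thresholds:
        if sigma > t:
            intensity += 1
    return intensity   # 1 = "very low intensity", 5 = "very high intensity"
```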
The ME Activation is therefore obtained as now described. If ME is activated for a specific level, based on such a pre-analysis, the motion vectors are already computed and can be directly used for MCTF of that level. Conversely, if ME is deactivated, the motion vectors pre-computed for the needs of the pre-analysis are useless and can be discarded. Moreover, the ME deactivation for a specific level results in the ME deactivation for the following levels, which leads to a reduction of the complexity of the overall MCTF module, as illustrated for example in
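A sketch of the resulting per-level control flow may look as follows. The threshold test on the Intensity of activity is an assumption made for illustration (the precise operations of the decision sub-step are not reproduced here), estimate_motion and temporal_filter are hypothetical hooks standing for the codec's own ME pass and temporal filtering stage, and intensity_of_activity refers to the previous sketch.

```python
def mctf_with_me_activation(frames, num_levels, estimate_motion, temporal_filter,
                            activation_level=3):
    """Per-level ME Activation logic (illustrative sketch).

    For each temporal decomposition level, a motion-activity pre-analysis is run on
    the frames/subbands of that level; ME stays activated only while the measured
    intensity warrants it, and once deactivated it remains off for all following
    levels, so no further ME passes are performed.
    """
    me_active = True
    for level in range(num_levels):
        if me_active:
            vectors = estimate_motion(frames)            # ME pass also used by the pre-analysis
            intensity = intensity_of_activity(vectors)   # see the previous sketch
            if intensity < activation_level:
                me_active = False                        # deactivation propagates to the following levels
        if me_active:
            frames = temporal_filter(frames, vectors)    # MCTF along motion trajectories, reusing the vectors
        else:
            frames = temporal_filter(frames, None)       # purely temporal filtering, no motion compensation
    return frames
```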