The present invention relates to the field of video compression and, more particularly, to a three-dimensional (3D) video coding method for the compression of a bitstream corresponding to an original video sequence that has been divided into successive groups of frames (GOFs) the size of which is $N = 2^n$, with n being an integer, these GOFs being themselves subdivided into successive couples of frames (COFs), said coding method comprising the following steps, applied to each successive GOF of the sequence:
a) a spatio-temporal analysis step, performed with a given number of levels at most equal to n and leading to a spatio-temporal multiresolution decomposition of the current GOF into low and high frequency temporal subbands, said step itself comprising:
a motion estimation sub-step;
based on said motion estimation, a motion compensated temporal filtering sub-step, performed on each of the $2^{n-1}$ COFs of the current GOF;
a spatial analysis sub-step, performed on the subbands resulting from said temporal filtering sub-step;
b) an encoding step, said step itself comprising:
an entropy coding sub-step, performed on said low and high frequency temporal subbands resulting from the spatio-temporal analysis step and on motion vectors obtained by means of said motion estimation sub-step;
an arithmetic coding sub-step, applied to the coded sequence thus obtained and delivering an embedded coded bitstream.
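The motion estimation and motion compensated temporal filtering sub-steps can be illustrated on a single COF. The sketch below is only a toy illustration under strong simplifying assumptions: frames are 1-D lists of samples, motion is one global integer displacement found by full search on the sum of absolute differences (SAD), and motion compensation is a circular shift so that it is exactly invertible. The function names are hypothetical, and Haar lifting is used here as an example temporal filter; the description above does not prescribe a particular filter.

```python
from typing import List, Tuple

Frame = List[float]


def shift(frame: Frame, d: int) -> Frame:
    """Circularly shift a 1-D frame by d samples (toy stand-in for motion compensation)."""
    size = len(frame)
    return [frame[(i - d) % size] for i in range(size)]


def estimate_motion(ref: Frame, cur: Frame, search: int = 4) -> int:
    """Full search: return the displacement d minimizing SAD(cur, shift(ref, d))."""
    best_d, best_sad = 0, float("inf")
    for d in range(-search, search + 1):
        sad = sum(abs(c - r) for c, r in zip(cur, shift(ref, d)))
        if sad < best_sad:
            best_d, best_sad = d, sad
    return best_d


def mctf_cof(a: Frame, b: Frame, search: int = 4) -> Tuple[Frame, Frame, int]:
    """Motion compensated Haar lifting on one couple of frames (a, b)."""
    d = estimate_motion(a, b, search)
    # Predict: high-frequency frame = b minus motion compensated a.
    high = [x - y for x, y in zip(b, shift(a, d))]
    # Update: low-frequency frame = a plus half of the inverse-compensated
    # high band, i.e. the average of a and b along the motion trajectory.
    low = [x + 0.5 * y for x, y in zip(a, shift(high, -d))]
    return low, high, d


def inverse_mctf_cof(low: Frame, high: Frame, d: int) -> Tuple[Frame, Frame]:
    """Undo the two lifting steps in reverse order (perfect reconstruction)."""
    a = [x - 0.5 * y for x, y in zip(low, shift(high, -d))]
    b = [x + y for x, y in zip(high, shift(a, d))]
    return a, b


if __name__ == "__main__":
    a = [0, 0, 5, 9, 5, 0, 0, 0]
    b = shift(a, 2)  # the same content moved by two samples
    low, high, d = mctf_cof(a, b)
    print(d)     # 2
    print(high)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

When the motion is estimated correctly, the high-frequency frame is empty and the energy of the COF is compacted into the low-frequency frame, which is the effect the spatio-temporal decomposition relies on.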
The invention also relates to a corresponding video coding device for implementing said coding method.
The first standard video compression schemes were based on so-called hybrid solutions: a hybrid video encoder uses a predictive scheme in which each current frame of the input video sequence is temporally predicted from a given reference frame, and the prediction error, obtained as the difference between said current frame and its prediction, is spatially transformed (for instance with a two-dimensional DCT) in order to take advantage of spatial redundancies. A more recent approach, called 3D (or 2D+t) subband analysis, consists in processing a group of frames (GOF) as a three-dimensional structure and filtering it spatio-temporally in order to compact the energy into the low frequencies.
The introduction of a motion compensation step in such a 3D subband decomposition scheme improves the overall coding efficiency and leads to a spatio-temporal multiresolution (hierarchical) representation of the video signal by means of a subband tree. As depicted for instance in
Among the different entropy coding techniques that can be used to encode the 3D wavelet coefficients resulting from this subband decomposition, the so-called 3D-SPIHT algorithm is one of the most efficient. It is described for example in “Low bit-rate scalable video coding with 3D set partitioning in hierarchical trees (3D-SPIHT)”, B.-J. Kim, Z. Xiong and W. A. Pearlman, IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 8, December 2000, pp. 1374-1387. Its extension to support scalability is described in “A fully scalable 3D subband video codec”, V. Bottreau, M. Bénetière, B. Pesquet-Popescu and B. Felts, Proceedings of the IEEE International Conference on Image Processing (ICIP 2001), vol. 2, pp. 1017-1020, Thessaloniki, Greece, Oct. 7-10, 2001.
This 3D-SPIHT algorithm is presented in
In the literature, when the 3D-SPIHT is used, the temporal decomposition may be stopped (see
It is an object of the invention to propose a more efficient coding method in which the dependencies at this deep temporal decomposition level are removed, since these dependencies do not play a major role in the efficiency of the SPIHT approach (the benefit of exploiting inter-subband correlation appears mainly in the first steps of the decomposition).
To this end, the invention relates to a coding method as defined in the introductory part of the description, which is further characterized in that, when said temporal filtering sub-step comprises (n−1) decomposition levels, so that the final temporal decomposition level that would have led to a single low-frequency subband is omitted, the spatio-temporal analysis and encoding steps are performed according to the following rules:
(a) each current input GOF is split into two new GOFs with half the original size and half the number of COFs, said new GOFs being independent and comprising respectively the first $2^{n-1}$ frames and the last $2^{n-1}$ frames of said original input GOF;
(b) in each of these two new GOFs, a complete spatio-temporal multiresolution decomposition with (n−1) levels is performed down to the last low frequency temporal subband in order to get only one final approximation subband for each of said new GOFs;
(c) a modified 3D-SPIHT scanning is applied consecutively and independently to these two new GOFs, the spatio-temporal orientation trees used by said SPIHT scanning to define the spatio-temporal relationships inside the hierarchical pyramid of wavelet coefficients now including half the original number of subbands compared with a spatio-temporal decomposition conventionally performed on the original GOF.
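Rules (a) and (b) can be checked on the temporal subbands they produce. The sketch below assumes a plain temporal Haar filter without motion compensation and reduces each frame to a single sample; the function names are hypothetical. It contrasts the incomplete (n−1)-level decomposition of a GOF of $2^n$ frames, which leaves two approximation subbands (LL0 and LL1), with the split into two independent GOFs of $2^{n-1}$ frames, each decomposed with (n−1) levels down to a single approximation subband.

```python
from typing import List, Tuple

Subbands = Tuple[List[float], List[List[float]]]


def haar_level(frames: List[float]) -> Tuple[List[float], List[float]]:
    """One temporal Haar level applied to each couple of frames (COF)."""
    if len(frames) % 2:
        raise ValueError("a temporal level needs an even number of frames")
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        h = b - a
        high.append(h)
        low.append(a + h / 2)  # average of the couple
    return low, high


def temporal_decompose(gof: List[float], levels: int) -> Subbands:
    """Return (approximation frames, high-frequency subbands from finest to coarsest)."""
    low, highs = list(gof), []
    for _ in range(levels):
        low, high = haar_level(low)
        highs.append(high)
    return low, highs


def conventional_incomplete(gof: List[float], n: int) -> Subbands:
    """(n-1) levels on the whole 2**n-frame GOF: two approximation frames remain."""
    assert len(gof) == 2 ** n
    return temporal_decompose(gof, n - 1)


def split_and_decompose(gof: List[float], n: int) -> List[Subbands]:
    """Rules (a) and (b): two independent GOFs of 2**(n-1) frames, n-1 levels each."""
    assert len(gof) == 2 ** n
    half = len(gof) // 2
    return [temporal_decompose(gof[:half], n - 1),
            temporal_decompose(gof[half:], n - 1)]


if __name__ == "__main__":
    n = 4
    gof = [float(i) for i in range(2 ** n)]
    low, _ = conventional_incomplete(gof, n)
    print(low)                                            # [3.5, 11.5]
    print([sub[0] for sub in split_and_decompose(gof, n)])  # [[3.5], [11.5]]
```

Because a two-tap Haar filter never straddles the middle of the GOF when only (n−1) levels are used, the coefficients in this particular sketch are identical in both cases; what changes is that each half becomes a self-contained GOF with exactly one approximation subband, which is what allows the modified 3D-SPIHT scanning of rule (c) to process the two halves independently. With longer temporal filters, such as the 5/3 wavelet, the coefficients near the middle of the GOF would differ in general.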
The invention also relates to a video coding device for carrying out said method.
To this end, the invention relates to a device comprising:
a) spatio-temporal analysis means applied to each successive GOF of the sequence with a given number of levels at most equal to n and leading to a spatio-temporal multiresolution decomposition of the current GOF into low and high frequency temporal subbands, said analysis means performing:
a motion estimation sub-step;
based on said motion estimation, a motion compensated temporal filtering sub-step, performed on each of the $2^{n-1}$ COFs of the current GOF;
a spatial analysis sub-step, performed on the subbands resulting from said temporal filtering sub-step;
b) encoding means, themselves comprising:
entropy coding means, applied to said low and high frequency temporal subbands resulting from the spatio-temporal analysis step and to motion vectors obtained by means of said motion estimation sub-step;
arithmetic coding means, applied to the coded sequence thus obtained and delivering an embedded coded bitstream;
said video coding device being further characterized in that, when said temporal filtering sub-step comprises (n−1) decomposition levels and the final temporal decomposition level that would have led to a single low-frequency subband is omitted, the spatio-temporal analysis and encoding means use the following rules:
(a) each current input GOF is split into two new GOFs with half the original size and half the number of COFs, said new GOFs being independent and comprising respectively the first $2^{n-1}$ frames and the last $2^{n-1}$ frames of said original input GOF;
(b) in each of these two new GOFs, a complete spatio-temporal multiresolution decomposition with (n−1) levels is performed down to the last low frequency temporal subband in order to get only one final approximation subband for each of said new GOFs;
(c) a modified 3D-SPIHT scanning is applied consecutively and independently to these two new GOFs, the spatio-temporal orientation trees used by said SPIHT scanning to define the spatio-temporal relationships inside the hierarchical pyramid of wavelet coefficients now including half the original number of subbands compared with a spatio-temporal decomposition conventionally performed on the original GOF.
The present invention will now be described, by way of example, with reference to the accompanying drawings in which:
In order to remove dependencies between the two approximation subbands LL0 and LL1 of the incomplete temporal decomposition of
This new temporal decomposition is illustrated in
Starting from this new temporal decomposition, the original SPIHT scanning of
The technical solution thus proposed halves the number of frames per GOF for a given number of decomposition levels. This is a significant improvement over the original solution, because it halves the memory requirement at both the encoding and the decoding side. Moreover, this approach does not penalize the coding efficiency, since the modified dependencies only affect the temporal approximation subbands, which can be considered as uncorrelated.
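A rough worked example of the memory saving, assuming that the encoder (or decoder) buffers one whole GOF of 8-bit YUV 4:2:0 frames at CIF resolution (352 × 288) and ignoring motion vectors and any working memory; the helper names are hypothetical. With n = 4, i.e. three temporal decomposition levels, the conventional GOF holds 16 frames while each split GOF holds 8.

```python
def frame_bytes(width: int, height: int) -> int:
    """8-bit YUV 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


def gof_frames(levels: int, split: bool) -> int:
    """Frames buffered to run `levels` temporal decomposition levels.

    Conventional incomplete decomposition: 2**(levels + 1) frames per GOF.
    Split GOFs of the proposed method: 2**levels frames per GOF.
    """
    return 2 ** levels if split else 2 ** (levels + 1)


if __name__ == "__main__":
    levels = 3  # n - 1 with n = 4
    per_frame = frame_bytes(352, 288)  # CIF
    print(per_frame)                              # 152064
    print(gof_frames(levels, False) * per_frame)  # 2433024
    print(gof_frames(levels, True) * per_frame)   # 1216512
```

Under these assumptions the GOF buffer drops from about 2.4 MB to about 1.2 MB, and the factor of two holds for any resolution and any number of levels, since only the frame count per GOF changes.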
It may be noted that the new SPIHT scanning illustrated in
Number | Date | Country | Kind |
---|---|---|---|
02292994.7 | Dec 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB03/05465 | 11/27/2003 | WO | | 6/3/2005 |