The present invention generally relates to the field of video compression and, more specifically, to a video coding method based on a subband coding scheme applied to a sequence of two-dimensional frames, said method comprising a subband decomposition step of a current frame (F2), a motion prediction step, carried out with respect to a previous or reference frame (F1), and a coding step.
In the multimedia domain, new interactive applications such as Internet video streaming, video database browsing or multi-quality video services are becoming widespread. This recent expansion of video services over heterogeneous networks (Internet, mobile networks, In-Home Digital Networks) has raised new issues in terms of varying transport conditions (bandwidth, error rate . . . ) as well as varying consumer demands and terminal decoding capabilities (CPU, display size, application . . . ), and has led to the investigation of new algorithms for video compression, in particular methods based on subband decompositions.
In the conventional video coding algorithms, and more particularly within the frame of the MPEG-4 standard, the motion estimation between successive frames of a processed video sequence is carried out by means of the so-called block-matching algorithm (BMA): in BMA, a motion vector is assigned to a block of picture elements (pixels) in a current frame (decomposed into blocks of fixed size) by searching for a similar block within a defined area—or search window—of a reference frame, the best vector (the one that represents the shifting of the block) being found by evaluating candidates according to an error measure (the best corresponding block is the one that gives rise to a minimal error). The BMA unfortunately generates high-frequency artifacts at the block edges (blocking effects) in the motion-compensated frames. When the BMA is used in coding schemes based on a wavelet decomposition, these artifacts reduce the coding efficiency by inducing high coefficients in the wavelet decomposition of the compensated frames.
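By way of illustration only, a minimal full-search block-matching sketch in Python is given below; the block size, the search range and the sum-of-absolute-differences (SAD) criterion are assumptions chosen for the example and are not values taken from the present description.

import numpy as np

def block_matching(current, reference, block=16, search=8):
    # Exhaustive block matching: for each fixed-size block of the current
    # frame, find the displacement within a +/-search window of the
    # reference frame that minimises the SAD error measure.
    current = current.astype(np.float64)
    reference = reference.astype(np.float64)
    h, w = current.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur = current[by:by + block, bx:bx + block]
            best_err, best_dv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the reference frame
                    err = np.abs(cur - reference[y:y + block, x:x + block]).sum()
                    if err < best_err:
                        best_err, best_dv = err, (dy, dx)
            vectors[by // block, bx // block] = best_dv
    return vectors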
The object of the invention is to propose a video coding method incorporating another type of motion estimation and compensation and making it possible to improve the coding efficiency.
To this end, the invention relates to a method such as defined in the introductory paragraph of the description and which is moreover characterized in that said motion prediction step is based on a redundant decomposition of the reference frame, according to the following procedure:
The proposed technical solution is based on the implementation of a pel-recursive motion estimation algorithm using wavelets. A motion estimation in the wavelet domain has to cope with the problem of translation invariance. The redundant decomposition of the reference frame is then used to predict the motion, thus making it possible to take into account motion on the finest resolution grid. In the approximation subband, the motion is estimated by a full search algorithm applied to every pixel of the current frame. For the other subbands, a pel-recursive algorithm using the reference redundant subband for prediction is implemented. The initialization of this algorithm is made using a weighted value of the motion vectors of the spatial neighbours in the same subband and of the motion vector corresponding to the same position at the previous level of the decomposition. The algorithm allows a re-initialization at the positions failing to converge towards a good estimate. The scanning order in the subbands is also optimized in order to minimize the drift that can occur in a line-by-line scan.
More specifically, said pel-recursive motion estimation algorithm may comprise the following sub-steps:
In a particular implementation of the invention, the break test may be based on the estimation of the following ratio:
said ratio having to be greater than a given threshold ε.
Said break test may also include an additional condition put on a maximum number of iterations.
It may also be indicated that, according to an advantageous embodiment of the invention, the determination of said update vector ui is based on a computation step including the following minimizing operation: knowing that B(m) = A(m − di) − δd^T·∇A, where δd^T·∇A is the inner product of ∇A and the vector δd = d − di, to minimize the square error J:

J = (B(m) − A(m − di) + δd^T·∇A)² + λ||δd||²

with respect to δd, the term λ||δd||² being a regularization term provided in view of a trade-off between the smoothness of the resulting motion vector field, for large values of λ, and the accuracy of the motion vectors, for small values of λ, the minimum being obtained for ∂J/∂δd = 0 and leading to the update vector:
In another embodiment of the invention, the motion estimation algorithm may also comprise the following sub-steps:
The present invention will now be described, by way of example, with reference to the accompanying drawings in which:
In order to obtain flexible video coding systems, able to cope with different requirements and capabilities, scalability is the expected functionality. Progressive encoding techniques based on subband decompositions may be an answer, since they allow a fully progressive transmission. Particularly, wavelets have a high efficiency in progressively encoding images in view of a scalable representation. They offer a natural multiscale representation for still images, which can be extended to video data by means of a 3D (or 2D+t) wavelet analysis including the temporal dimension within the decomposition (3D = three-dimensional; 2D = two-dimensional; t = time). The introduction of a motion compensation step in the 3D subband decomposition scheme leads to a spatio-temporal multiresolution (hierarchical) representation of the video signal, as illustrated in
The illustrated 3D wavelet decomposition with motion compensation is applied to a group of frames (GOF), referenced F1 to F8. In this 3D subband decomposition scheme, each GOF of the input video is first motion-compensated (MC) in order to process sequences with large motion, and then temporally filtered (TF) using Haar wavelets. The dotted arrows correspond to a high-pass temporal filtering, while the other ones correspond to a low-pass temporal filtering, and three stages of decomposition are shown (L and H = first stage; LL and LH = second stage; LLL and LLH = third stage).
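To make the temporal part of this scheme concrete, a short Python sketch of the Haar temporal filtering of a GOF is given below; the motion compensation step is deliberately omitted, and the 1/sqrt(2) normalisation is an assumption, the exact filter normalisation not being specified in the present text.

import numpy as np

def haar_temporal_stage(frames):
    # One temporal filtering stage: each pair of (motion-compensated) frames
    # yields one low-pass and one high-pass temporal subband.
    lows, highs = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        lows.append((a + b) / np.sqrt(2.0))   # low-pass temporal subband
        highs.append((b - a) / np.sqrt(2.0))  # high-pass temporal subband
    return lows, highs

def temporal_decomposition(gof, stages=3):
    # Iterating the stage on the low-pass branch gives, for an 8-frame GOF,
    # the subbands H (x4), LH (x2), LLH (x1) and the approximation LLL (x1).
    subbands = []
    lows = list(gof)
    for _ in range(stages):
        lows, highs = haar_temporal_stage(lows)
        subbands.append(highs)
    subbands.append(lows)
    return subbands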
However, when performing a pel-recursive motion estimation on the coefficients of each subband, the lack of translation invariance must be taken into account. The major problem with a dyadic subband decomposition (see
For a 1D signal, a subsampling operation yields two possible choices: either odd or even samples can be taken. For a 2D signal (such as an image), this operation yields four possible decompositions. Therefore, if a signal frame is decomposed upon three levels, 64 possible decompositions, equivalent in terms of quantity of information, can be obtained. However, as previously mentioned, a motion estimation cannot be performed on them, because they are not redundant. It is then observed from
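The count of 64 decompositions follows directly from the four possible subsampling phases per level, which can be checked with a few lines of Python:

from itertools import product

# Each 2D decimation by two can keep one of four polyphase components
# (even/odd samples along x, even/odd samples along y).
phases_per_level = list(product(("even", "odd"), repeat=2))
# One independent choice per decomposition level, here three levels.
decompositions = list(product(phases_per_level, repeat=3))
print(len(phases_per_level), len(decompositions))   # prints: 4 64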
The wavelet decomposition itself is performed using the so-called lifting scheme, depicted for instance in
—even samples first:

x0[n] = f[2n]
x1[n] = f[2n+1]    (n ≥ 0)

—odd samples first:

x0[n] = f[2n+1]
x1[n] = f[2(n+1)]    (n ≥ 0)
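A minimal lifting step in Python, using the two split conventions above, is sketched below; the Haar predict/update pair is chosen purely for illustration, the actual filters of the lifting scheme used in the decomposition not being reproduced here.

import numpy as np

def lifting_step(f, odd_first=False):
    # Split the signal into its two polyphase components, then apply a
    # predict/update pair (Haar shown here): d = detail, s = approximation.
    f = np.asarray(f, dtype=float)
    if not odd_first:
        x0, x1 = f[0::2], f[1::2]      # x0[n] = f[2n],   x1[n] = f[2n+1]
    else:
        x0, x1 = f[1::2], f[2::2]      # x0[n] = f[2n+1], x1[n] = f[2(n+1)]
    n = min(len(x0), len(x1))          # keep the two branches the same length
    x0, x1 = x0[:n], x1[:n]
    d = x1 - x0                        # predict step
    s = x0 + d / 2.0                   # update step (running average)
    return s, d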
FIGS. 10 to 12 then show the “classical” wavelet decomposition obtained by applying the identities of
The problem is that lifting wavelets inherently use decimation before filtering. In
It has been seen, in relation with
The critically subsampled subbands are then used to construct the redundant subband. The number of the decomposition basis yields the position where the subband will be placed on a rectangular grid. For example, if the bits describing x are 0 1 0 (3rd level, 2nd level, 1st level), they yield the number 2, which is the reconstruction offset showing where the first sample of the subband has to be placed in the redundant subband, following the x direction. The same operation is applied in the y direction, and the samples are then placed at intervals of length: interval = 2^(decomposition level). Knowing the offset and the interval for each subsampled subband, it is possible to interleave them in order to reconstruct the redundant subband, as shown in
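A sketch of this interleaving is given below; the data structure used to carry the phase bits of each critically subsampled subband is an assumption made for the example.

import numpy as np

def interleave_redundant(subbands, level):
    # subbands: list of ((bits_y, bits_x), array) pairs, one per critically
    # subsampled version of the same subband; bits are ordered from the
    # deepest level to the first level, as in the "0 1 0 -> offset 2" example.
    interval = 2 ** level
    h, w = subbands[0][1].shape
    redundant = np.zeros((h * interval, w * interval))
    for (bits_y, bits_x), band in subbands:
        off_y = int("".join(str(b) for b in bits_y), 2)   # reconstruction offset (y)
        off_x = int("".join(str(b) for b in bits_x), 2)   # reconstruction offset (x)
        redundant[off_y::interval, off_x::interval] = band
    return redundant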
The pel-recursive motion estimation will now be described. In what follows, bold characters for vectors (d), ∇ for the gradient, and (·)^T for transposition will be used. The following algorithm is meant to compute the optic flow between two matrices A and B, which can be two successive frames of a sequence, or (as considered in the present case) two successive subbands. The aim is to estimate, for each pixel m = (m,n), the motion vector d(m):
B(m)=A(m−d) (1)
To perform a recursive description of d, it has to be assumed that d is smoothly distributed over the image plane. Basically, the method described here is the following: for each reference pixel in B, the gradient in A is computed at the estimated position (m − d) in order to find a pixel in A that is closer to the reference pixel. This yields a new position m − d. If it is the same position, the algorithm stops, else it iterates once more. Thus the algorithm converges and stops when it has reached the right position.
The iterative procedure is composed of the following steps, including the initialization, the computation of an update vector, the update of the motion vector, and a break test:
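The individual sub-steps (relations (2) to (7)) are not reproduced in the text above; the following Python sketch therefore only illustrates the general structure of the iteration, using the regularised gradient update derived further on, a deliberately crude causal initialization, and nearest-neighbour sampling, all of which are assumptions of the example.

import numpy as np

def sample(A, p):
    # Nearest-neighbour read of A at a (possibly non-integer) position p,
    # clipped to the frame boundaries.
    y = min(max(int(round(p[0])), 0), A.shape[0] - 1)
    x = min(max(int(round(p[1])), 0), A.shape[1] - 1)
    return A[y, x]

def gradient(A, p):
    # Central-difference approximation of the gradient of A at position p.
    y, x = p
    gy = (sample(A, (y + 1, x)) - sample(A, (y - 1, x))) / 2.0
    gx = (sample(A, (y, x + 1)) - sample(A, (y, x - 1))) / 2.0
    return np.array([gy, gx])

def pel_recursive(A, B, lam=100.0, max_iter=10):
    # Pel-recursive motion estimation sketch between reference A and current B.
    h, w = B.shape
    d = np.zeros((h, w, 2))
    for m in range(h):
        for n in range(w):
            di = d[m, n - 1].copy() if n > 0 else np.zeros(2)   # crude causal init
            for _ in range(max_iter):
                pos = np.array([m, n], dtype=float) - di
                E = B[m, n] - sample(A, pos)          # prediction error E(m)
                g = gradient(A, pos)                  # gradient of A at m - d_i
                u = -E * g / (lam + g @ g)            # update vector u_i
                di = di + u
                if np.abs(u).max() < 1e-3:            # break test: converged
                    break
            d[m, n] = di
    return d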
The second step (update vector computation, described by the relation (3)) can be detailed. The first order approximation of the relation (1) reads:
B(m) = A(m − di) − δd^T·∇A (8)

where ∇A is the gradient from the relation (4) and δd = d − di.
One has to minimize the square error (the term λ||δd||² will be explained later):
The minimum is obtained when:
i.e. when:
(B(m) − A(m − di) + δd^T·∇A)·∇A + λδd = 0 (10)

As δd^T·∇A is the inner product of the vectors δd and ∇A, one has δd^T·∇A = ∇A^T·δd, and the relation (10) becomes:

(∇A·∇A^T + λI)·δd = −E(m)·∇A (11)
The utility of λ can now be seen: the matrix ∇A·∇A^T is not invertible (rank 1), but ∇A·∇A^T + λI can be inverted using the following lemma:
which yields here for u=∇A and M=λI:
and finally leads to:
This is the update vector ui used in the relations (3) and (6). This regularization provides a trade-off between the smoothness of the resulting motion vector field (for large λ) and the accuracy of the motion vectors (for small λ).
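The lemma and the resulting closed form are referred to above without being reproduced; under the usual reading of this derivation (matrix inversion lemma applied with u = ∇A and M = λI, and with E(m) = B(m) − A(m − di) from relation (10)), the update vector takes the following form, given here as a reconstruction rather than as a quotation of the original formula:

\[
(M + uu^{T})^{-1} = M^{-1} - \frac{M^{-1}uu^{T}M^{-1}}{1 + u^{T}M^{-1}u}
\quad\Longrightarrow\quad
(\lambda I + \nabla A\,\nabla A^{T})^{-1}
  = \frac{1}{\lambda}\left( I - \frac{\nabla A\,\nabla A^{T}}{\lambda + \lVert \nabla A \rVert^{2}} \right),
\]
\[
u_{i} = \delta d
  = -E(\mathbf{m})\,(\lambda I + \nabla A\,\nabla A^{T})^{-1}\,\nabla A
  = -\,\frac{E(\mathbf{m})}{\lambda + \lVert \nabla A \rVert^{2}}\;\nabla A .
\]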
It has been seen, in the relation (11), that ∇A has to be computed. The gradient computation can be approximated by a discrete filtering of the image. To perform a discrete approximation of the gradient by linear filtering, the filters must satisfy some conditions that will now be recalled. Assuming that h is an FIR filter, and considering the rectangular sampling of a continuous field F(x,y), one has F[n, m] = F(nL1, mL2). The gradient of the image following the x direction is:
S being considered as a close neighbourhood of 9 pixels (the center pixel m and the eight pixels around). On the other hand, at first order, one has:
and combining the relations (14) and (15) yields:
Identifying the members of the relation (16) then yields three constraints for h:
The chosen filters, which satisfy these conditions, are the following:
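The filters actually chosen are not reproduced in the text above; purely as an illustration of a gradient computed by linear filtering over the 9-pixel neighbourhood S, a Prewitt-type pair normalised to give a unit response on a linear ramp can be written as follows (these are example filters, not necessarily those of the invention):

import numpy as np
from scipy.ndimage import correlate

# 3x3 derivative filters over the 9-pixel neighbourhood S; the /6 factor
# normalises the response so that a ramp F(x, y) = x yields exactly 1.
hx = np.array([[-1.0, 0.0, 1.0],
               [-1.0, 0.0, 1.0],
               [-1.0, 0.0, 1.0]]) / 6.0
hy = hx.T

def gradient_field(A):
    # Discrete approximation of (dA/dy, dA/dx) by linear filtering of A.
    return correlate(A, hy, mode="nearest"), correlate(A, hx, mode="nearest")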
Different improvements can be proposed for the implementation of the algorithm described above, especially in terms of motion vector accuracy and speed. The speed of the algorithm can be increased by making it converge more rapidly. The initialization step is thus crucial for the convergence speed. A good initialization has to take into account as much as possible the previously computed motion vectors. However, the assumption about the smoothness of the distribution of the motion vectors over the image plane can be untrue (for certain types of motion, for instance when objects in the plane are moving in different directions) at the boundaries of the objects. The algorithm has to be able to detect this event, and to correct the initialization by breaking the smoothness of the motion vector field. It is possible to introduce a break test at the end of the initialization step (and even after each update of the motion vector). One has to compute two test values:
E0(m) = |Bj,s(m) − Aj,s(fm)|
Ej,s(m) = |Bj,s(m) − Aj,s(fm − fdi)|,
where E0 is the error without motion (di = 0), and Ej,s is the error when the computed motion vector is taken into account. The principle is that if Ej,s is greater than E0, the motion vector has to be reinitialized to zero. However, too frequent reinitializations will prevent the algorithm from converging. When the prediction error is very small, this case could appear quite often. That is why a tolerance is introduced in this test, which is defined through the following inequality:
Ej,s > E0 + THR (17)
where THR is a threshold, to be determined according to the value of the subband coefficients.
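A small sketch of this re-initialisation test follows; the threshold value and the nearest-neighbour rounding of the displaced position are assumptions made for the example.

import numpy as np

def needs_reset(B, A, m, d_i, f=1, THR=4.0):
    # Relation (17): compare the error without motion, E0, with the error
    # obtained using the current estimate, with a tolerance THR; if True,
    # the motion vector is re-initialised to zero.
    y, x = m
    E0 = abs(B[y, x] - A[f * y, f * x])                    # error with d_i = 0
    ry = min(max(int(round(f * (y - d_i[0]))), 0), A.shape[0] - 1)
    rx = min(max(int(round(f * (x - d_i[1]))), 0), A.shape[1] - 1)
    E = abs(B[y, x] - A[ry, rx])                           # error with motion
    return E > E0 + THR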
Another improvement relates to the scanning order. The image being a table, it is scanned from the first item (i.e. m = (0,0)) to the last one (m = (xmax, ymax)), as shown in
[neighbourhood used for the odd lines]
For the even lines, the neighbourhood remains unchanged:
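The neighbourhood layouts themselves are not reproduced here; the distinct odd-line and even-line neighbourhoods, together with the drift-minimisation goal stated earlier, suggest a meander-type scan, sketched below under that assumption.

def scan_order(height, width):
    # Assumed meander scan: even lines are scanned left to right, odd lines
    # right to left, so that the causal neighbours of each pixel have just
    # been visited whatever the scan direction.
    for m in range(height):
        cols = range(width) if m % 2 == 0 else range(width - 1, -1, -1)
        for n in cols:
            yield (m, n)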
A third improvement relates to the approximation subband SB Ajmax,0 of the image, which has, by definition, very few details. Thus a pel-recursive motion estimation is sometimes not accurate enough. Small errors in the compensated approximation yield bad results for the image reconstructed from the compensated subbands. Better results are obtained if a full search is performed for each pixel of Bjmax,0 in a window of Ajmax,0, but this method is not as fast as the pel-recursive algorithm. The size of the search window is:

(2·2^jmax + 1) by (2·2^jmax + 1)
as shown in
D0 = α·dapproximation + (1 − α)·d0, (18)
where α is a constant that makes it possible to control the influence of dapproximation in the initialization. If α = 0, dapproximation has no influence at all. The same principle can be used for the other subbands SB. It is then possible to use the previously computed motion vector, at a higher level of decomposition j, in order to initialize the algorithm. If one calls dj+1 the motion vector estimated at level j+1, the new initialization vector is: D0 = α·dj+1 + (1 − α)·d0. The generic algorithm described above can be applied between two subbands. However, if it is used, the correlation that exists between the subbands of a same level of decomposition is not taken into account: the computed motion vector should be the same for each pixel in the three subbands (or four when the LL-subband is taken into account). It is then possible to decide to estimate, for a given level of decomposition j ∈ {1, 2, . . . , jmax}, for each pixel m = (m,n), the motion vector dj(m):
Bj,s(m) = Aj,s(m − dj) for s ∈ {0, . . . , 4} (19)
where:
The assumption of d being smoothly distributed over the subband planes has to be made just as previously.
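As an illustration of the full search of the approximation subband and of the weighted initialization of relation (18), a short Python sketch is given below; the absolute-difference error measure and the value of α are assumptions made for the example.

import numpy as np

def full_search_approximation(B_ll, A_ll, jmax):
    # Exhaustive per-pixel search in the approximation subband, with the
    # window size (2*2**jmax + 1) by (2*2**jmax + 1) given in the text.
    r = 2 ** jmax
    h, w = B_ll.shape
    d = np.zeros((h, w, 2), dtype=int)
    for m in range(h):
        for n in range(w):
            best = np.inf
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y, x = m - dy, n - dx          # B(m) = A(m - d) convention
                    if 0 <= y < h and 0 <= x < w:
                        err = abs(float(B_ll[m, n]) - float(A_ll[y, x]))
                        if err < best:
                            best, d[m, n] = err, (dy, dx)
    return d

def blended_init(d_coarse, d_0, alpha=0.5):
    # Relation (18): D0 = alpha * d_approximation + (1 - alpha) * d_0,
    # also reused with d_coarse = d_{j+1} for the other subbands.
    return alpha * d_coarse + (1.0 - alpha) * d_0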
As previously, the iterative procedure, carried out for each level of decomposition j ∈ {1, 2, . . . , jmax}, comprises four steps including the initialization, the computation of an update vector, the update of the motion vector and a break test, but with some slight modifications:
The iterative procedure is initialized by the mean value of the motion vector in a causal neighbourhood S,
with for example a typical neighbourhood of 4 pixels (the motion vectors are supposed to be zero out of the image boundaries):
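A possible form of this initialization is sketched below; the exact shape of the causal neighbourhood S is not reproduced in the text, so the four neighbours used here (left, upper-left, up, upper-right) are an assumption.

import numpy as np

def causal_mean_init(d, m, n):
    # Mean of the motion vectors over an assumed causal neighbourhood of
    # 4 pixels; vectors lying outside the image boundaries count as zero,
    # which amounts to always dividing by 4.
    h, w = d.shape[:2]
    acc = np.zeros(2)
    for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
        y, x = m + dy, n + dx
        if 0 <= y < h and 0 <= x < w:
            acc += d[y, x]
    return acc / 4.0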
For the update vector computation, at the first order, the relation (1) becomes, for every s:
Bj,s(m) = Aj,s(m − dj,i) − δdj^T·∇Aj,s (26)

where ∇Aj,s is the gradient from the relation (4) and δdj = dj − dj,i.
Since the aim is to minimize the square error on all the subbands at the same time:
the minimum is obtained when:
The relation (28) then becomes:
Even without regularization, the matrix
can be invertible. It is a 2 by 2 matrix, and the inverted matrix can be computed explicitly. In this case, the regularization term λI is required only to control the coherence of the motion vector field.
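Relations (27) to (29) are not reproduced in the text above; setting the derivative of the joint criterion to zero, as was done for a single subband, leads to the following reconstruction, given here as an inference from the stated derivation rather than as a quotation:

\[
J_{j}(\delta d_{j}) = \sum_{s}\bigl(B_{j,s}(\mathbf{m}) - A_{j,s}(\mathbf{m} - d_{j,i})
   + \delta d_{j}^{T}\nabla A_{j,s}\bigr)^{2} + \lambda\lVert \delta d_{j} \rVert^{2},
\]
\[
\Bigl(\sum_{s}\nabla A_{j,s}\,\nabla A_{j,s}^{T} + \lambda I\Bigr)\,\delta d_{j}
   = -\sum_{s} E_{j,s}(\mathbf{m})\,\nabla A_{j,s}.
\]

The summation over the subbands is what makes the 2 by 2 matrix on the left-hand side generically full-rank, which is consistent with the remark above that λI is only needed for the coherence of the motion vector field.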
In case this pel-recursive motion estimation algorithm should be applied to the redundant subband, the previous algorithm must be adapted. Let f be the scale factor characterizing the ratio between the size of the redundant subband and the size of the subsampled subband:

f = 2^j
where j is the decomposition level.
It is known that a redundant subband does not present the lack of translation invariance. As a result, the pel-recursive motion estimation can be performed between a redundant subband (reference frame) and a subsampled subband (current frame). The motion vectors are then (assuming that even samples have been retained for the subsampled decomposition of Bj,s) defined through:
Bj,s(m)=Aj,s(fm−fdj) (30)
The iterative procedure is then slightly different:
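The modified steps themselves are not reproduced here; the following sketch only illustrates how the prediction error of relation (30) is evaluated against the redundant reference subband, the nearest-neighbour rounding being an assumption of the example.

import numpy as np

def sample_redundant(A_red, p, f):
    # Nearest-neighbour read of the redundant reference subband at the
    # scaled position f*p, clipped to the subband boundaries.
    y = min(max(int(round(f * p[0])), 0), A_red.shape[0] - 1)
    x = min(max(int(round(f * p[1])), 0), A_red.shape[1] - 1)
    return A_red[y, x]

def prediction_error(B_sub, A_red, m, d, j):
    # Relation (30): B_{j,s}(m) = A_{j,s}(f*m - f*d), with f = 2**j the ratio
    # between the redundant and the critically subsampled subband sizes.
    f = 2 ** j
    pos = (m[0] - d[0], m[1] - d[1])
    return B_sub[m[0], m[1]] - sample_redundant(A_red, pos, f)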
Foreign application priority data:
Number | Date | Country | Kind
01401697.6 | Jun. 2001 | EP | regional
01402666.0 | Oct. 2001 | EP | regional

PCT information:
Filing Document | Filing Date | Country | Kind
PCT/IB02/02362 | 6/20/2002 | WO |