The present invention relates generally to the field of dense point matching in a video sequence. More precisely, the invention relates to a method for filtering a displacement field.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
The problem of point and path tracking is a widely studied and still open issue with implications in a broad area of computer vision and image processing. On one side, applications such as object tracking, structure from motion, motion clustering and segmentation, and scene classification, among others, may benefit from a set of point trajectories by analyzing an associated feature space. On the other side, applications related to video processing such as augmented reality, texture insertion, scene interpolation, view synthesis, video inpainting and 2D-to-3D conversion eventually require determining a dense set of trajectories or point correspondences that permits propagating large amounts of information (color, disparity, depth, position, etc.) across the sequence. Dense instantaneous motion information is well represented by optical flow fields, and points can simply be propagated through time by accumulating the motion vectors, also called displacement vectors. That is why state-of-the-art methods for dense point tracking, as described by Brox and Malik in "Object segmentation by long term analysis of point trajectories" (Proc. ECCV, 2010) or by Sundaram, Brox and Keutzer in "Dense point trajectories by GPU-accelerated large displacement optical flow" (Proc. ECCV, 2010), are built on top of optical flow and rely on such an accumulation of motion vectors. Finally, such state-of-the-art methods produce a motion field based either on a from-the-reference integration, for instance using the Euler integration disclosed by Sundaram, Brox and Keutzer in the above-cited paper, or on a to-the-reference integration as disclosed in international patent application PCT/EP13/050870 filed on Jan. 17, 2013 by the applicant.
The technical issue is how to combine both representations in order to efficiently exploit their respective benefits: a from-the-reference displacement field better represents the spatio-temporal features of a point (or pixel), while a to-the-reference displacement field offers a more accurate estimation.
The present invention provides such a solution.
The invention is directed to a method for filtering a displacement field between a first image and a second image, a displacement field comprising, for each pixel of the first (reference) image, a displacement vector to the second (current) image. The method comprises a first step of spatio-temporal filtering wherein a weighted sum of neighboring displacement vectors produces, for each pixel of the first image, a filtered displacement vector. The filtering step is remarkable in that a weight in the weighted sum is a trajectory weight, a trajectory weight being representative of a trajectory similarity. Advantageously, the first filtering step allows taking into account trajectory similarities between neighboring points.
According to an advantageous characteristic, a trajectory associated with a pixel of the first image comprises a plurality of displacement vectors from the pixel to a plurality of images. According to another advantageous characteristic, a trajectory weight comprises a distance between the trajectory of the pixel and the trajectory of a neighboring pixel.
In a first embodiment, the first step of spatio-temporal filtering comprises for each pixel of the first image:
Advantageously, in the second filtering step, the backward displacement field is used to refine the forward displacement field built by a from-the-reference integration. Advantageously, the second step is applied on the filtered from-the-reference displacement field. In a variant, the second step is applied on the from-the-reference displacement field directly.
In a variant of the second embodiment, the method comprises a second step of joint forward backward spatial filtering comprising a weighted sum of displacement vectors wherein the displacement vector belongs:
In another variant of the second embodiment, the method comprises, after the second joint forward backward spatial filtering step, a third step of selecting a displacement vector between a previously filtered displacement vector and a current filtered displacement vector. This variant advantageously produces converging displacement fields.
In a third embodiment, the method comprises, before the first spatio-temporal filtering step, a step of occlusion detection wherein the displacement vector of an occluded pixel is discarded in the first and/or second filtering steps.
In a refinement of the third embodiment, the three steps (spatio-temporal filtering, joint forward backward filtering, occlusion detection) are sequentially iterated for each displacement vector of successive second images belonging to a video sequence.
In a further refinement of the third embodiment, the steps are iterated for each inconsistent displacement vector of successive second images belonging to the video sequence. In other words, once displacement vectors have been filtered for a set of N images, the filtering is iterated only for the inconsistent displacement vectors of the same set of N images. Advantageously, in this refinement, only bad displacement vectors (those for which the inconsistency between forward and backward displacement vectors is above a threshold) are processed in a second pass.
According to another aspect, the invention is directed to a graphics processing unit comprising means for executing code instructions for performing the method previously described.
According to another aspect, the invention is directed to a computer-readable medium storing computer-executable instructions performing all the steps of the method previously described when executed on a computer.
Any characteristic or variant embodiment described for the method is compatible with the device intended to implement the disclosed method and with the computer-readable medium.
Preferred features of the present invention will now be described, by way of non-limiting example, with reference to the accompanying drawings, in which:
a illustrates motion integration strategies through Euler integration method according to prior art;
b illustrates motion integration strategies through inverse integration method according to an international patent application of the applicant;
a illustrates estimated trajectories for rotational motion;
b illustrates estimated trajectories for divergent motion;
c illustrates estimated trajectories for zero motion;
a illustrates position square error through time for rotational motion;
b illustrates position square error through time for divergent motion;
c illustrates position square error through time for zero motion;
a illustrates from-the-reference correspondence point scheme;
b illustrates to-the-reference correspondence point scheme;
In the following description, the term "motion vector" or "displacement vector" d0,N(x) denotes a data set which defines a displacement from a pixel x of a first frame I0 to a corresponding location in a second frame IN of a video sequence, where the indices 0 and N are numbers representative of the temporal frame positions in the video sequence. An elementary motion field defines a motion field between two consecutive frames IN and IN+1.
Respectively the terms “motion vector” or “displacement vector”, “elementary motion vector” or “elementary displacement vector”, “motion field” or “displacement field”, “elementary motion field” or “elementary displacement field” are indifferently used in the following description.
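For illustration purposes only, the fields defined above may be laid out in memory as sketched below in Python/NumPy. This is one plausible representation, with image size and array conventions chosen arbitrarily; it is not prescribed by the method.

```python
import numpy as np

# Illustrative only: one plausible in-memory layout for the fields above.
H, W = 480, 640                             # grid G is H x W pixels (arbitrary size)

# An elementary motion field v_{N,N+1} stores, for every pixel of I_N,
# a 2D displacement vector towards I_{N+1}: shape (H, W, 2).
v = np.zeros((H, W, 2), dtype=np.float32)   # v[y, x] = (dx, dy)

# A displacement field d_{0,N} between the reference I_0 and I_N has the
# same layout; a video sequence yields one elementary field per frame pair.
elementary_fields = [np.zeros((H, W, 2), dtype=np.float32) for _ in range(10)]
```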
A salient idea of the method for filtering a motion field, or a set of motion fields for a video sequence, is to introduce into the filtering information representative of the trajectory similarity of spatially and temporally neighboring points.
Consider a sequence of images $\{I_n\}_{n=0 \ldots N}$ where $I_n: G \to \Lambda$ is defined on the discrete rectangular grid $G$ and $\Lambda$ is the color space. Let $d_{n,m}: \Omega \to \mathbb{R}^2$ be a displacement field defined on the continuous rectangular domain $\Omega$, such that to every $x \in \Omega$ there corresponds a displacement vector $d_{n,m}(x) \in \mathbb{R}^2$ for the ordered pair of images $\{I_n, I_m\}$. Furthermore, let us call $I_0$ the reference image. We pose the following problem: given an input set of elementary optical flow fields $v_{n,n+1}: G \to \mathbb{R}^2$ defined on the grid $G$, compute the displacement vectors $d_{0,m}(x) = d_{0,m}(i,j)$ for all $m: 1 \ldots N$ and for the grid position $x = (i,j) \in G$.
This is essentially the problem of determining the position of the initial point $(i,j)$ of $I_0$ at each subsequent frame, i.e. the trajectory of $(i,j)$ from $I_0$ to $I_N$. The classical solution to this problem is to apply a simple Euler integration method, which is defined by the iteration
$$d_{0,m+1}(i,j) = d_{0,m}(i,j) + v_{m,m+1}\big((i,j) + d_{0,m}(i,j)\big) \qquad (1)$$
from which the trajectory position in $I_{m+1}$ is given by $x_{m+1} = (i,j) + d_{0,m+1}(i,j)$, and $v_{m,m+1}(\cdot)$ is possibly an interpolated value at a non-grid location. Now, is this the best way of computing each displacement vector, and hence the trajectory of $(i,j)$? In an ideal, error-free world, yes; in practice, no.
We shall see how the unavoidable optical flow estimation inaccuracies lead to errors in the estimated displacements. Let us call $d_{0,m+1}(i,j)$ the true displacement vector and $\hat{d}_{0,m+1}(i,j)$ an estimation of it; likewise, the hat notation indicates any estimated, error-prone quantity. For a given iteration of (1) we can express the estimation error $\xi_{0,m+1} = \hat{d}_{0,m+1}(i,j) - d_{0,m+1}(i,j)$ as

$$\xi_{0,m+1} = \xi_{0,m} + \delta_{m,m+1}(\hat{x}_m) + \big[v_{m,m+1}(\hat{x}_m) - v_{m,m+1}(x_m)\big] \qquad (2)$$

with $x_m = (i,j) + d_{0,m}(i,j)$, $\hat{x}_m = (i,j) + \hat{d}_{0,m}(i,j)$, and where $\delta_{m,m+1}(\cdot)$ accounts for the input optical flow estimation error, such that $\hat{v}_{m,m+1}(x) = v_{m,m+1}(x) + \delta_{m,m+1}(x)$. Here we distinguish three types of terms:
The first two terms are inherent to the processes of integration and elementary motion estimation, and thus they can be neither avoided nor neglected. On the other hand, it is interesting to analyze the motion bias term, i.e. the bracketed difference in (2). Its magnitude $B_{0,m}$ is bounded as

$$B_{0,m} = \big\|v_{m,m+1}(\hat{x}_m) - v_{m,m+1}(x_m)\big\| \le \sup_{y \in \rho(x_m)} \big\|v_{m,m+1}(y) - v_{m,m+1}(x_m)\big\| \qquad (3)$$

where $\rho(x_m)$ is a ball of radius $\|\xi_{0,m}\|$ centered at $x_m$.
Note that $\|\xi_{0,m}\|$ is in general increasing (the position estimation error inevitably grows along the sequence), and thus this bound cannot be tightened. In other words, as $\|\xi_{0,m}\|$ is not bounded, the motion bias term can be arbitrarily large, limited only by the maximum flow difference between two (possibly distant) image points. This undesirable behavior is the cause of the ubiquitous position drift observed in dense optical-flow-based tracking algorithms, independently of the flow estimation precision. What equation (3) states is that even small errors introduced by $\delta_{m,m+1}$ may lead to an unbounded drift. How to radically reduce this drift is the concern of what follows.
Surprisingly, we can dramatically reduce the drift effect by proceeding differently while integrating the input optical flow fields. Consider the following iteration for computing $d_{n,m}(i,j)$:
$$d_{n,m}(i,j) = v_{n,n+1}(i,j) + d_{n+1,m}\big((i,j) + v_{n,n+1}(i,j)\big) \qquad (4)$$
for $n = m-1, \ldots, 0$, so that one pass over the index $n$ finally gives the displacement field $d_{0,m}$. Let us discuss the differences between (1) and (4). Euler's method starts at the reference $I_0$ and performs the motion accumulation in the sense of motion, providing a sequential integration. Meanwhile, what we call inverse integration starts from the target image $I_m$ and recursively computes the displacement fields back to the reference image, in a non-causal manner. Note that in (1) a previously estimated displacement value is accumulated with an interpolation of the elementary motion field, which introduces both an error due to the noisy field $v_{m,m+1}$ itself and an error due to evaluating $v_{m,m+1}$ at a position biased by the current accumulated drift. In (4), on the other hand, an elementary flow vector is accumulated with an interpolation of a previously estimated displacement value; the difference is that in this second case the drift is limited to that introduced by $v_{n,n+1}(i,j)$.
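For illustration, the two integration strategies of equations (1) and (4) can be sketched as follows in Python/NumPy. The bilinear interpolation, the border clipping and the (x, y) indexing conventions are assumptions of the sketch, not requirements of the method.

```python
import numpy as np

def interp_field(field, pos):
    """Bilinear interpolation of an (H, W, 2) field at a float position (x, y)."""
    H, W, _ = field.shape
    x = min(max(pos[0], 0.0), W - 1.001)
    y = min(max(pos[1], 0.0), H - 1.001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * field[y0, x0] + fx * (1 - fy) * field[y0, x0 + 1]
            + (1 - fx) * fy * field[y0 + 1, x0] + fx * fy * field[y0 + 1, x0 + 1])

def euler_integration(flows, x):
    """Equation (1): sequentially accumulate v_{0,1}, ..., v_{N-1,N} from pixel x of I_0."""
    d = np.zeros(2)
    for v in flows:                        # v = v_{m,m+1}
        d = d + interp_field(v, x + d)     # flow is evaluated at the drifted position
    return d                               # d_{0,N}(x)

def inverse_integration(flows, x):
    """Equation (4): start from d_{m,m} = 0 at the target and recurse back to I_0."""
    H, W, _ = flows[0].shape
    d = np.zeros((H, W, 2))                # d_{m,m} = 0 everywhere
    for v in reversed(flows):              # n = m-1, ..., 0
        d_new = np.empty_like(d)
        for j in range(H):
            for i in range(W):
                p = np.array([i, j], float) + v[j, i]
                d_new[j, i] = v[j, i] + interp_field(d, p)
        d = d_new                          # now holds d_{n,m} for the current n
    return d[int(x[1]), int(x[0])]         # d_{0,m}(x) at grid pixel x = (i, j)
```

Note how the roles are swapped: Euler's iteration interpolates the noisy elementary flow at a drifted position, whereas the inverse recursion interpolates a previously estimated displacement field at a position given by a single elementary flow vector.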
a illustrates motion integration strategies through the Euler integration method according to prior art. The Euler integration method, also called direct integration, performs the estimation by sequentially accumulating the motion vectors in the sense of the sequence, that is to say from the first image I0 to the last image Im.
b illustrates motion integration strategies through the inverse integration method according to a method disclosed in international patent application PCT/EP13/050870 filed on Jan. 17, 2013 by the applicant. The inverse integration performs the estimation recursively in the opposite sense, from the last image to the first image.
Effectively, for $n = 0$ we have

$$\xi_{0,m} = \delta_{0,1}(i,j) + d_{1,m}\big((i,j) + \hat{v}_{0,1}(i,j)\big) + \varepsilon_{1,m}\big((i,j) + \hat{v}_{0,1}(i,j)\big) - d_{1,m}\big((i,j) + v_{0,1}(i,j)\big) \qquad (5)$$

where $\varepsilon_{1,m}$ denotes the estimation error of the field $\hat{d}_{1,m}$.
In this case, as $\delta_{0,1}(i,j)$ corresponds to the error term in the estimated optical flow $\hat{v}_{0,1}(i,j)$, we can assume that $\|\delta_{0,1}(i,j)\|$ is kept small (it is not an increasing accumulated error like $\xi_{0,m}$ in (3)), and thus for the motion bias we have

$$B_{0,m} \le \sup_{y \in \rho(x_1)} \big\|d_{1,m}(y) - d_{1,m}(x_1)\big\|$$

with $\rho(x_1)$ a ball of radius $\|\delta_{0,1}(i,j)\|$ centered at $x_1 = (i,j) + v_{0,1}(i,j)$. Assuming continuous displacement fields $d_{n+1,m}$ and a small elementary motion estimation error $\|\delta_{0,1}(i,j)\|$, $\|d_{1,m}(y) - d_{1,m}(x_1)\|$ is bounded, and so is $B_{0,m}$.
By changing the way of integrating the same input optical flows, we have attained a highly desirable property: the bias introduced at each integration step no longer diverges.
We now analyze the behavior of the two integration methods in trajectory estimation by studying the case of stationary affine motion models perturbed by zero-mean Gaussian noise. We assume elementary motion fields of the form $v_{m,m+1}(x) = Ax + b$, and estimated fields $\hat{v}_{m,m+1}(x) = v_{m,m+1}(x) + r_m$ with $r_m \sim \mathcal{N}(0, \sigma^2 I)$. The same input fields are used for estimating trajectories with both methods.
In the case of Euler's integration, the application of equation (1) is straightforward, iterating over $m = 1 \ldots N$. For the inverse integration method, equation (4) is repeated for each $m: 1 \ldots N$ and $n: m-1 \ldots 0$, so as to obtain the series of displacement fields $d_{0,m}$. We have tested three different affine models: a rotational motion, a divergent motion and the zero motion.
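The following driver, reusing the euler_integration and inverse_integration sketches above, illustrates such a simulation for the rotational model; the image size, noise level and rotation angle are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, N, sigma = 64, 64, 20, 0.05
theta = 0.05                                  # small rotation per frame
A = np.array([[np.cos(theta) - 1, -np.sin(theta)],
              [np.sin(theta),      np.cos(theta) - 1]])
c = np.array([W / 2, H / 2])                  # rotate about the image centre

ys, xs = np.mgrid[0:H, 0:W]
grid = np.stack([xs, ys], axis=-1).astype(float)   # grid[y, x] = (x, y)

# Noisy stationary elementary fields: v_hat(x) = A(x - c) + r_m, r_m ~ N(0, sigma^2 I)
flows = [np.einsum('ij,hwj->hwi', A, grid - c)
         + rng.normal(0.0, sigma, (H, W, 2)) for _ in range(N)]

x0 = np.array([48.0, 32.0])
d_euler = euler_integration(flows, x0)
d_inv = inverse_integration(flows, x0)

# Ground truth for the noise-free model: d_{0,N}(x) = ((A+I)^N - I)(x - c)
R = A + np.eye(2)
d_true = (np.linalg.matrix_power(R, N) - np.eye(2)) @ (x0 - c)
print("Euler drift  :", np.linalg.norm(d_euler - d_true))
print("Inverse drift:", np.linalg.norm(d_inv - d_true))
```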
The behavior depicted by the simulations can be predicted by analyzing the stability of each integration method with the theory of dynamical systems. For simplicity, let us consider $v_{m,m+1}(x) = Ax$ for all $m: 0 \ldots N-1$. Then the true displacement fields are $d_{0,m+1}(x) = ((A+I)^{m+1} - I)x$, and for Euler's method $\xi_{0,m+1}(x_0)|_{Euler} = (A+I)\,\xi_{0,m}(x_0)|_{Euler} + r_m$, while for the inverse integration approach $\xi_{0,m+1}(x_0)|_{Inv} = (A+I)^m r_0 + \varepsilon_{1,m+1}(x_1)|_{Inv}$. Essentially, Euler's method error equation is stable if all the eigenvalues $\lambda_i$ of $A$ lie inside the unit circle centered at $-1$ in the complex plane (i.e. $|1 + \lambda_i| < 1$), and possibly unstable (the error may diverge) otherwise. Meanwhile, the inverse approach defines a linear model with a transition matrix equal to the identity, driven by the motion estimation errors $r_m$. Though it is not an asymptotically stable system around the zero-error equilibrium point (i.e. $\|\xi_{0,m+1}(x_0)|_{Inv}\| \to 0$ does not hold), it is always stable in the sense of Lyapunov (loosely, $\|\xi_{0,m+1}(x_0)|_{Inv}\| < \epsilon$ for some $\epsilon > 0$, $\forall m$). The error depends only on the accumulation of instantaneous motion estimation errors, but shows no unstable behavior. Concretely, a divergent field ($\mathrm{Re}(\lambda_i) > 0$), a rotational field ($|1 + \lambda_i| = 1$) or the zero field ($\lambda_i = 0 \Rightarrow |1 + \lambda_i| = 1$) are not well handled by the Euler method. For the inverse method, we must emphasize that our analysis does not imply a zero error or the absence of error accumulation, but a more robust dynamic behavior. Besides, it also appears that the inverse method implicitly performs a temporal filtering of the trajectory, as observed in the figures.
Finally, in the general case of an arbitrary motion model, and thanks to the Grobman-Hartman theorem (see C. Robinson, "Dynamical Systems: Stability, Symbolic Dynamics, and Chaos", Studies in Advanced Mathematics, CRC Press, 2nd edition, 1998), we can study the behavior of both methods through the linear approximations of (1) and (4) around an equilibrium point. This may lead to the problem of analyzing time-varying linear systems, whose stability properties are not trivial to determine. However, we believe one can still obtain useful and analogous conclusions about the behavior of the error function by applying the theory of time-invariant systems.
Within the universe of dense point correspondence estimation we have distinguished two different scenarios, tightly bound to each other and to the concrete application one needs to deal with. Let us set aside for a moment our concern about highly accurate displacement field estimation, and focus on the way we represent the information. Given a reference image, say I0, we might want to determine either: the position in each subsequent image In of every point of I0 (from-the-reference fields d0,n), or, for every point of each image In, its corresponding position in I0 (to-the-reference fields dn,0).
As illustrated on
Now returning to the motion integration methods discussed above, one would ask which is the best option, not only in terms of accuracy, but also in terms of ease of implementation with regard to the reference (from or to), computational load, memory requirements and, of course, concrete application-related issues.
Thus, the from-the-reference scheme presents the following characteristics for each integration method:
Similarly, the to-the-reference scheme presents the following characteristics for each integration method:
On the other side, a trajectory-based (from-the-reference) representation of point correspondences seems more natural for capturing the spatio-temporal features of a point along the sequence, as there is a direct (unambiguous) association between points and the paths they follow. Consequently, refinement tasks such as trajectory-based filtering are easier to formulate. Meanwhile, to-the-reference fields do not directly provide such spatio-temporal information, but can be estimated efficiently and more accurately. The question is then how to combine both representations, which essentially can be formulated as how to pass from one representation to the other in order to efficiently exploit their benefits.
Considering the reference frame I0, we call forward the from-the-reference displacement fields d0,n and backward the to-the-reference displacement fields dn,0. The set of forward vectors d0,n(x) that give the position of pixel x in the frames In describes its trajectory along the sequence. On the other hand, the backward fields dn,0 have been estimated independently and carry consensual, complementary or contradictory information. Forward and backward displacement fields can advantageously be combined, in particular to detect inconsistencies and occlusions (this is widely used in stereo vision, as disclosed for example by G. Egnal and R. Wildes in "Detecting binocular half-occlusions: empirical comparisons of five approaches", PAMI, 24(8):1127-1133, 2002). In addition, one can highlight the interest of combining both approaches in a refinement step, as each one can constrain the other. In this section, both forward and backward displacement fields are combined in order to be mutually improved while taking into account the trajectory aspect.
Occlusions are detected and taken into account in the filtering process. To this end, the forward 52 (respectively, backward 53) displacement field at the reference frame I0 (respectively, In) is used to detect occlusions at frame In (respectively, I0). The occlusion detection method (called OCC by Egnal) works as follows: to detect those pixels of frame I0 that are occluded in frame In, one considers the displacement map $\tilde{d}_{n,0}(x)$ and scans the image In, identifying for each pixel, via its displacement vector, the corresponding position in frame I0. The closest pixel to this (probably non-grid) position in frame I0 is then marked as visible. At the end of this projection step, the pixels that are not marked in frame I0 are classified as occluded in frame In.
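The OCC scheme just described may be sketched as follows; rounding to the nearest grid pixel stands in for "the closest pixel to this non-grid position".

```python
import numpy as np

def detect_occlusions(d_n0, shape_I0):
    """OCC-style detection sketched from the description above: scan I_n,
    project each pixel into I_0 via its backward vector d_{n,0}, mark the
    nearest grid pixel as visible; unmarked pixels of I_0 are classified
    as occluded in I_n."""
    H0, W0 = shape_I0
    visible = np.zeros((H0, W0), dtype=bool)
    Hn, Wn, _ = d_n0.shape
    for j in range(Hn):                       # scan the image I_n
        for i in range(Wn):
            x0 = int(round(i + d_n0[j, i, 0]))
            y0 = int(round(j + d_n0[j, i, 1]))
            if 0 <= x0 < W0 and 0 <= y0 < H0:
                visible[y0, x0] = True        # projection step: mark as visible
    return ~visible                           # True where I_0 pixels are occluded in I_n
```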
Moreover, an inconsistency value is evaluated between the forward and backward displacement fields at the non-occluded pixels. It provides a way to identify unreliable vectors. After the first process iteration, the filtering is limited to the vectors whose inconsistency value is above a threshold.
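The exact inconsistency measure is not reproduced in this excerpt; a common choice, assumed in the following sketch, is the norm of the forward/backward round trip.

```python
import numpy as np

def inconsistency(d_0n, d_n0, x):
    """Round-trip consistency measure (one common choice, assumed here):
    follow the forward vector from pixel x of I_0 to I_n, come back with
    the (interpolated) backward field, and measure how far from x we land.
    interp_field is the bilinear helper from the integration sketch."""
    fwd = d_0n[x[1], x[0]]
    back = interp_field(d_n0, np.asarray(x, dtype=float) + fwd)
    return float(np.linalg.norm(fwd + back))

# After the first pass, filtering is restarted only where the value exceeds
# a threshold tau (hypothetical name): refilter = inconsistency(d_0n, d_n0, x) > tau
```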
In the third step 55, for each frame pair {I0, In}, the forward and backward displacement fields d0,n and dn,0 are jointly processed via multilateral filtering. Moreover, the "trajectory" aspect of the forward fields is taken into account in two ways. First, in addition to the generally used weights, a trajectory similarity weight is introduced, replacing the classical displacement similarity often used when two vectors are compared. Second, the 2D filtering is extended to 2D+t along the trajectories.
Each updated vector 56 results from a weighted average of neighboring forward and backward vectors at the frame pair {I0, In}, and also of forward vectors d0,m (m ∈ [n−Δ, n+Δ]) at the frame pairs {I0, Im}. The updated forward displacement vector $\tilde{d}_{0,n}(x)$ is obtained as follows:
where {x} is a spatial window centered at x and $w_{0,m}^{xy}$ is a weight that links points x and y at frame I0. Similarly, {z} is a spatial window centered at z = x + d0,n(x) and $w_{n,0}^{zy}$ is a weight that links points z and y at frame In. The weight $w_{s,t}^{uv}$ assigned to each displacement vector $d_{s,t}(y)$ is defined as:
with:
$\Gamma_{uv}$ is the Euclidean distance between locations u and v:

$$\Gamma_{uv} = \|u - v\|_2 \qquad (8)$$
The color similarity $\Phi_{uv,s}$ between pixels u and v in Is is defined as follows:
The matching cost $\Theta_{v,st}$ is:

$$\Theta_{v,st} \equiv \Theta_{s,t}\big(v, d_{s,t}(v)\big) = \sum_{c \in \{r,g,b\}} \big|I_s^c(v) - I_t^c\big(v + d_{s,t}(v)\big)\big| \qquad (9)$$
$\rho_{st}$ is a binary value that takes the occlusion detection into account as follows:
The trajectory weight refers to the similarity measurement between the trajectories that support the two currently compared forward vectors. This trajectory similarity is defined as follows:
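The defining equations of the color similarity, the occlusion flag and the trajectory similarity are not reproduced in this excerpt; the following sketch therefore assumes plausible forms for them (a product of Gaussian kernels for the combination, an average point-wise distance for the trajectory similarity) and should be read as an illustration of the listed factors, not as the claimed formula.

```python
import numpy as np

def weight(u, v, I_s, I_t, d_st, traj_u, traj_v, occluded,
           sigma_g=5.0, sigma_c=10.0, sigma_m=10.0, sigma_t=2.0):
    """Sketch of the multilateral weight w_{s,t}^{uv} for displacement
    vector d_{s,t}(v). The factors follow the list above; the combination
    and the sigma_* bandwidths are assumptions of the sketch."""
    if occluded:                                   # rho_st = 0: vector discarded
        return 0.0
    u = np.asarray(u, float); v = np.asarray(v, float)
    gamma = np.linalg.norm(u - v)                  # eq. (8): ||u - v||_2
    ui, vi = u.astype(int), v.astype(int)
    # Colour similarity Phi_{uv,s}: absolute colour difference in I_s (assumed form).
    phi = np.abs(I_s[ui[1], ui[0]] - I_s[vi[1], vi[0]]).sum()
    # Matching cost Theta_{v,st}, eq. (9): SAD between I_s(v) and I_t(v + d_st(v)).
    dv = d_st[vi[1], vi[0]]
    zi = np.clip(np.round(v + dv).astype(int), [0, 0],
                 [I_t.shape[1] - 1, I_t.shape[0] - 1])
    theta = np.abs(I_s[vi[1], vi[0]] - I_t[zi[1], zi[0]]).sum()
    # Trajectory similarity: average point-wise distance between the two
    # supporting trajectories over the shared frames (assumed form).
    tau = np.linalg.norm(np.asarray(traj_u) - np.asarray(traj_v), axis=1).mean()
    return float(np.exp(-(gamma / sigma_g) ** 2 - (phi / sigma_c) ** 2
                        - (theta / sigma_m) ** 2 - (tau / sigma_t) ** 2))
```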
Similarly, the updated backward displacement vector $\tilde{d}_{n,0}(x)$ is obtained as follows:
where {x} and {z} are windows defined respectively in frame In around x and in frame I0 around z = x + dn,0(x).
Pixels y belonging to the spatial window {x} centered at x are determined. From this information, neighboring displacement vectors d0,m(y) are determined from the temporal and spatial windows.
In a second filtering step 65, a joint filtering of backward and forward displacement vectors is performed. In a first variant, the filtered updated forward displacement vectors $\tilde{d}_{0,n}(y)$ 63 and the backward displacement vectors dn,0(y) 64 are processed to produce a filtered forward displacement vector $\tilde{d}_{0,n}(x)$ 66. In a second variant, the same vectors are processed to produce a filtered backward displacement vector $\tilde{d}_{n,0}(x)$ 66. The filtered from-the-reference displacement vectors $\tilde{d}_{0,n}(y)$ 63 are considered for pixels y belonging to the spatial window {x} centered at x, while the to-the-reference displacement vectors dn,0(y) 64 are considered for pixels y belonging to the spatial window {z} centered at z = x + d0,n(x), that is, the endpoint location in image In resulting from the from-the-reference displacement vector d0,n(x) for pixel x of I0.
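For illustration, the first variant of this joint filtering step may be sketched as follows; the window shape, the reuse of the weight sketch above, and the reversal of backward vectors in the sum are assumptions, the update equation itself not being reproduced in this excerpt.

```python
import numpy as np

def filter_forward_vector(x, d_0n, d_n0, w_fwd, w_bwd, radius=3):
    """Sketch of the joint filtering step 65 (first variant): the updated
    forward vector at pixel x of I_0 is a normalised weighted sum over
    forward vectors around x in I_0 and backward vectors around
    z = x + d_{0,n}(x) in I_n. w_fwd(u, y) and w_bwd(u, y) stand for the
    weights w_{0,n}^{uy} and w_{n,0}^{uy} (see the weight sketch above);
    letting a backward vector vote through its reversal -d_{n,0}(y) is an
    assumption of the sketch."""
    H, W, _ = d_0n.shape
    z = np.round(np.asarray(x, float) + d_0n[x[1], x[0]]).astype(int)
    acc, wsum = np.zeros(2), 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yx, yy = x[0] + dx, x[1] + dy            # neighbour y of x in I_0
            if 0 <= yx < W and 0 <= yy < H:
                w = w_fwd((x[0], x[1]), (yx, yy))
                acc += w * d_0n[yy, yx]
                wsum += w
            yx, yy = z[0] + dx, z[1] + dy            # neighbour y of z in I_n
            if 0 <= yx < W and 0 <= yy < H:
                w = w_bwd((z[0], z[1]), (yx, yy))
                acc += w * (-d_n0[yy, yx])           # reversed backward vector
                wsum += w
    return acc / wsum if wsum > 0 else d_0n[x[1], x[0]]
```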
Once the filtering steps 62, 65 have been processed, advantageously in parallel for each pixel of the current image, the spatio-temporal filtered motion field 66 is memorized. The filtered motion field is then available for the filtering of the motion field of the next frame to be processed, or for a second pass of the algorithm as disclosed in
The skilled person will also appreciate that the method can be implemented quite easily, without the need for special equipment, by devices such as PCs. According to different variants, the features described for the method are implemented in software modules or in hardware modules.
The device 7 also comprises a display device 73, such as a display screen, directly connected to the graphical card 72, notably for displaying the rendering of images computed and composed in the graphical card, for example by a video editing tool implementing the filtering according to the invention. According to a variant, the display device 73 is outside the device 7.
It is noted that the word "register" used in the description of memories 72, 76 and 77 designates, in each of the memories mentioned, a memory zone of low capacity (some binary data) as well as a memory zone of large capacity (enabling a whole programme to be stored, or all or part of the data representative of computed data or of data to be displayed).
When powered up, the microprocessor 71 loads and runs the instructions of the algorithm comprised in RAM 77.
The memory RAM 77 comprises in particular:
Algorithms implementing the steps of the method of the invention are stored in the memory GRAM 721 of the graphical card 72 associated with the device 7 implementing these steps. When powered up, and once the data 771 representative of the video sequence have been loaded into the RAM 77, the GPUs 720 of the graphical card load these data into the GRAM 721 and execute the instructions of these algorithms in the form of micro-programs called "shaders", using for example the HLSL (High Level Shader Language) or GLSL (OpenGL Shading Language) languages.
The memory GRAM 721 comprises in particular:
According to a variant, the power supply is outside the device 7.
The invention as described in the preferred embodiments is advantageously computed using a Graphics processing unit (GPU) on a graphics processing board.
The invention is therefore also preferentially implemented as software code instructions stored on a computer-readable medium such as a memory (flash, SDRAM, etc.), said instructions being read by a graphics processing unit.
The foregoing description of the embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above teaching. It is therefore intended that the scope of the invention is not limited by this detailed description, but rather by the claims appended hereto.
Number | Date | Country | Kind |
---|---|---|---|
12305266.4 | Mar 2012 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/054165 | 3/1/2013 | WO | 00 |