The invention relates to a method for detecting motion at a temporal intermediate position between previous and next images, in which a criterion function for candidate vectors is optimised, said function depending on data from both previous and next images and in which the optimising is carried out at the temporal intermediate position in non-covering and non-uncovering areas.
The basic observation from which the current invention results is that an estimator estimating motion between two successive pictures from a video sequence cannot perform well in areas where covering or uncovering occurs, as it is typical for these areas that the information occurs in only one of the two images. Block matchers, as a consequence, will always find large match errors, even for the correct vector.
It is an object of the invention to remove the above-cited problems and, in general, problems in calculating vector fields from previous and/or next images.
According to the invention this object is achieved in that the optimising is carried out at the temporal position of the next image in covering areas and at the temporal position of the previous image in uncovering areas.
Furthermore, the invention relates to an apparatus for detecting motion between previous and next images, comprising means for optimising a criterion function for candidate vectors, said function depending on data from both previous and next images, in which the optimising is carried out at the temporal intermediate position in non-covering and non-uncovering areas, wherein means for detecting covering or uncovering areas are provided and the optimising is carried out at the temporal position of the next image in covering areas and at the temporal position of the previous image in uncovering areas. Embodiments of the method and apparatus according to the invention are specified in sub-claims.
The invention will be described in more detail with reference to the attached drawings.
The invention can be applied in various technical fields. Examples are:
picture rate conversion, where an improved vector field according to the invention results in a more pleasing, artefact-free video stream. Typical devices for this application are TVs and PCs.
3D disparity analysis, where images are generated by a rolling camera or from a rolling scene.
motion-based video compression, where an improved vector field according to the invention results in higher-quality predictions and therefore in a higher compression ratio or improved picture quality. An example is compression in MPEG.
motion-based image analysis, where an improved vector field according to the invention results in a more faithful extraction of objects and hence in easier post-processing. Examples are: security camera video analysis, video special effects and traffic analysis.
picture enhancement in television techniques, e.g. avoidance of blurring of the background.
scientific applications, where an improved vector field according to the invention results in better data analysis. Examples are satellite photos of clouds and oceanography.
It is recognised that, in the case of covering, all information in the current picture is present in the previous picture, while in an area of uncovering the current picture contains all information of the previous one (locally around the uncovering area). Ergo, by modifying the match error calculation, controlled by a covering/uncovering detector, e.g. the one disclosed in WO 00/11863, from matching the current picture with the motion compensated previous picture (covering) to matching the previous picture with the motion compensated current picture (uncovering), any ambiguity in the estimator can be prevented. This is expected to yield more accurate and consistent vector fields and therefore reduced halo. This modification shall be elaborated in a later part of the specification.
As a side effect of this dynamically changing match calculation, the resulting unambiguous vector field is no longer valid for one moment in time, let alone the moment where the upconversion takes place, but this ‘validity moment’ changes depending on the area under consideration being a covering, uncovering or simple rigidly moving area. This effect can be eliminated with a second ingredient of this disclosure, described in a later part of the specification.
Now the improved motion vector calculation, using a full-search block-matching motion estimator algorithm to calculate the motion vectors, will be elucidated with reference to
In block-matching motion estimation algorithms, a displacement vector is assigned to the centre $\vec{X}$ of a block of pixels $B(\vec{X})$ in the current picture $n$ by searching for a similar block within a search area $SA(\vec{X})$, also centred at $\vec{X}$, but in the previous picture $n-1$. The centre of the similar block is shifted with respect to $\vec{X}$ over the displacement vector $\vec{D}(\vec{X},n)$. To find $\vec{D}(\vec{X},n)$, a number of candidate vectors $\vec{C}$ are evaluated by applying an error measure $\varepsilon(\vec{C},\vec{X},n)$ to quantify block similarity. More formally, $CS_{max}$ is defined as the set of candidates $\vec{C}$, describing all possible (usually integer) displacements with respect to $\vec{X}$ within the search area $SA(\vec{X})$ in the previous image:

$$CS_{max}=\{\vec{C}\mid -N\le C_x\le N,\ -M\le C_y\le M\}\qquad(1)$$

where $N$ and $M$ are constants limiting $SA(\vec{X})$. A block $B(\vec{X})$, centred at $\vec{X}$ and of size $X$ by $Y$, consisting of pixel positions $\vec{x}$ in the present picture $n$, is defined as:

$$B(\vec{X})=\{\vec{x}\mid X_x-X/2\le x\le X_x+X/2,\ X_y-Y/2\le y\le X_y+Y/2\}\qquad(2)$$
The displacement vector $\vec{D}(\vec{X},n)$ resulting from the block-matching process is the candidate vector $\vec{C}$ which yields the minimum value of an error function $\varepsilon(\vec{C},\vec{X},n)$:

$$\vec{D}(\vec{X},n)\in\{\vec{C}\in CS_{max}\mid \varepsilon(\vec{C},\vec{X},n)\le\varepsilon(\vec{V},\vec{X},n)\ \forall\,\vec{V}\in CS_{max}\}\qquad(3)$$

If, as is the common case, the vector $\vec{D}(\vec{X},n)$ with the smallest matching error is assigned to all pixel positions $\vec{x}$ in the block $B(\vec{X})$:

$$\forall\,\vec{x}\in B(\vec{X}):\ \vec{D}(\vec{x},n)=\vec{D}(\vec{X},n)\qquad(4)$$

rather than to the centre pixel only, a large reduction of the number of computations is achieved.
As an implication, consecutive blocks $B(\vec{X})$ are not overlapping. The error value for a given candidate vector $\vec{C}$ is a function of the luminance values of the pixels in the current block and those of the shifted block from a previous picture, summed over the block $B(\vec{X})$. A common choice, which we too shall use, is the sum of the absolute differences (SAD):

$$\varepsilon(\vec{C},\vec{X},n)=\sum_{\vec{x}\in B(\vec{X})}\left|F(\vec{x}-\alpha\vec{C},\,n-1)-F(\vec{x}+(1-\alpha)\vec{C},\,n)\right|\qquad(5)$$

where $\alpha$ is a constant, $0\le\alpha\le 1$, determining the temporal position between the two pictures at which the vector field has to be valid.
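The full-search matching described by equations 1, 3 and 5 can be illustrated with a minimal sketch. This is not the patent's implementation; the function and parameter names (`sad`, `full_search`, `bs` for block size) are illustrative, and $\alpha\vec{C}$ is rounded to integer pixel shifts, a simplification of the sub-pixel formulation:

```python
import numpy as np

def sad(prev, curr, x0, y0, bs, cand, alpha):
    """SAD of equation 5 for one candidate C = (cx, cy); alpha*C is rounded
    to integer pixels (a simplification of the sub-pixel formulation)."""
    cx, cy = cand
    dxp, dyp = round(alpha * cx), round(alpha * cy)   # shift into previous picture n-1
    dxn, dyn = cx - dxp, cy - dyp                     # remaining shift into current picture n
    a = prev[y0 - dyp:y0 - dyp + bs, x0 - dxp:x0 - dxp + bs]
    b = curr[y0 + dyn:y0 + dyn + bs, x0 + dxn:x0 + dxn + bs]
    return np.abs(a.astype(float) - b.astype(float)).sum()

def full_search(prev, curr, x0, y0, bs, N, M, alpha):
    """Evaluate all candidates of CSmax (equation 1) and return the
    displacement with the minimum SAD (equation 3)."""
    return min(((cx, cy) for cy in range(-M, M + 1) for cx in range(-N, N + 1)),
               key=lambda c: sad(prev, curr, x0, y0, bs, c, alpha))
```

With α=0 the match is carried out entirely at the temporal position of the previous picture, with α=1 entirely at that of the next picture; these two settings are used below for uncovering and covering respectively.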
Similarly, from
The invention is based on the insight that, in the case of e.g. covering, all pixel blocks present in and around the occlusion area in the future picture can also be found in the past picture; the past picture, however, contains extra blocks which no longer exist in the future picture, because they become covered. A natural position for the reference position α for block matching is the temporal position of the future picture, since then a correct motion vector can in principle be found for all blocks in and around the occlusion area.
The improvement proposed results from the observation that in areas where covering occurs, correct motion vectors can only be estimated with an estimator using α=1. Similarly, for uncovering, correct vectors are only found with an estimator using α=0.
Vector estimators calculating vectors for intermediate temporal instances have difficulties with both covering and uncovering, be it that their worst-case area of ambiguity is smaller than for the extreme estimators. (The worst-case ambiguity is least for the estimator applying α=0.5.)
The first step improves the block matcher for covering and uncovering situations, regardless of the value of α required for the interpolated picture, by changing equation 5 to:
$$\varepsilon_c(\vec{C},\vec{X},n)=\sum_{\vec{x}\in B(\vec{X})}\left|F(\vec{x}-\vec{C},\,n-1)-F(\vec{x},\,n)\right|\qquad(6)$$

in case of an area of covering, as indicated by the covering/uncovering detector, e.g. the one described in WO 00/11863, and to

$$\varepsilon_u(\vec{C},\vec{X},n)=\sum_{\vec{x}\in B(\vec{X})}\left|F(\vec{x},\,n-1)-F(\vec{x}+\vec{C},\,n)\right|\qquad(7)$$

in the event of uncovering.
In non-occlusion areas we take e.g. α=0.5 and project the blocks bidirectionally from both the past and future pictures to the reference block position, i.e. we compare them at the temporal position of the interpolated picture. For covering blocks we take α=1 and match a block in the future picture at its own position with a block fetched with the candidate motion vector from the past picture. In uncovering areas of the picture we take α=0, in other words we match the blocks in the past picture with the blocks from the future picture.
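The selection of α from the detector output, as described above, can be sketched as follows (a minimal illustration; the function name and region labels are assumptions, not from the patent):

```python
def alpha_for_region(region):
    """Select the temporal match position from the covering/uncovering
    detector output:
      'covering'   -> alpha = 1   (equation 6: match at the next picture)
      'uncovering' -> alpha = 0   (equation 7: match at the previous picture)
      otherwise    -> alpha = 0.5 (equation 5: bidirectional match at the
                                   interpolated position)"""
    return {"covering": 1.0, "uncovering": 0.0}.get(region, 0.5)
```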
There are a number of ways in which occlusion or, more specifically, covering and uncovering regions can be detected.
The covering/uncovering detector may use a previous vector field, or a previous calculation (iteration) of the current vector field, or use methods based on evaluation of match errors calculated between neighbouring pictures.
A preferred covering/uncovering detector looks at the signs of the velocities. If e.g. two neighbouring velocities point towards each other, we know that covering must occur. The other cases can be analysed accordingly and are summarised as follows:
$$v_U-v_D>\delta\qquad\text{(covering)}$$

$$v_U-v_D<-\delta\qquad\text{(uncovering)}$$

in which $v_U$ is the velocity above the edge $X_E$, $v_D$ is the velocity below the edge and $\delta$ is a small constant to avoid noisy detections. This method is extremely robust, since it does not rely on the specific numerical values of the vectors. One will always have a more or less correct vector value on either side of the edge, and even if the values are incorrect their difference could still be correct. Secondly, a large band around the predicted edge is "guarded", since the estimations for covering respectively uncovering behave like the classical estimation (equation 5) in foreground FG regions, i.e. outside the ambiguity triangle. This provides robustness against position deviations of the edge $X_E$.
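The sign test above can be sketched directly (a minimal illustration; the function name and the default value of δ are assumptions):

```python
def classify_edge(v_up, v_down, delta=0.5):
    """Covering/uncovering test on the velocities of the two blocks on
    either side of a velocity edge: velocities pointing towards each other
    indicate covering, velocities pointing apart indicate uncovering; a
    difference within +/- delta is treated as no occlusion (noise guard)."""
    d = v_up - v_down
    if d > delta:
        return "covering"
    if d < -delta:
        return "uncovering"
    return "none"
```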
The benefit gained from switching α to 0 or 1 is that no incorrect vectors result from the minimisation, but the price to be paid is that the vectors obtained are not always in the correct block positions. E.g. the foreground FG edge (the block position where the vector changes from the background BG to the foreground FG velocity) is found at the position XE, instead of at its real position U (see
In fact the obtained vector field is not valid at any single time moment, but at three different time moments, depending on the position in the image, hence this method is called “tritemporal motion estimator”. In general the vector field has to be retimed to the desired intermediate position (e.g. α=0.5). In the specific case of a sequence in which the foreground FG object is stationary and the background BG moves behind it, no retiming of the tritemporal vector field is necessary, since the foreground FG edge does not move (XE=U). This happens often in film material where the cameraman tracks the main subject.
As a consequence of the proposed modification, there are no ambiguities for the motion estimator, although we have sacrificed the correct temporal instance where the vector field is valid. This shall be detailed in the following sub-section.
In a second step of the algorithm, the retimer, the time error is corrected. To this end, the vector field is ‘projected’ to the desired temporal instance, i.e. projected back in time for covering:
$$\vec{D}^{*}(\vec{x},\,n-1+\alpha)=\vec{D}(\vec{x}+(1-\alpha)\vec{D}(\vec{x},n),\,n)\qquad(8)$$

and forward for uncovering:

$$\vec{D}^{*}(\vec{x},\,n-1+\alpha)=\vec{D}(\vec{x}-\alpha\vec{D}(\vec{x},n-1),\,n-1)\qquad(9)$$
This projection reintroduces ambiguous areas, i.e. areas to which no vector is assigned. The origin of this ambiguity is that with one estimate it is unknown whether the discontinuity has moved along the line ‘a’, or along the line ‘b’ in
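A one-dimensional, block-based sketch of this projection follows. It assumes velocities expressed in blocks per frame period and integer positions by rounding, and it uses a scatter formulation of equations 8 and 9 (each source block pushes its vector to the intermediate time), so that holes appear exactly in the ambiguous areas the text describes; the names are illustrative:

```python
def project(field, frac):
    """Scatter-project a 1-D block vector field to the intermediate time:
    the vector at block y lands at block y - round(frac * v).
    For covering, frac = 1 - alpha applied to the field at time n
    (equation 8); for uncovering, frac = -alpha applied to the field at
    time n-1 (equation 9). Blocks that receive no vector stay None: the
    ambiguous area reintroduced by the projection."""
    out = [None] * len(field)
    for y, v in enumerate(field):
        x = y - round(frac * v)
        if 0 <= x < len(field):
            out[x] = v
    return out
```

For example, a background moving right behind a static foreground leaves one unassigned block at the intermediate time, which the retimer must fill.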
The retimer performs the following actions. First, it determines where the retiming should occur: it looks for a velocity edge XE and marks a sufficiently broad occlusion region around this edge. Second, it calculates exactly how many blocks should be corrected, by rounding the foreground FG velocity to block precision. Third, it determines which retimer correction action should be applied.
Furthermore, the retimer determines the starting position $\vec{x}_A$ of the ambiguity area in the vector field at time $n-1+\alpha$ by projecting the foreground velocity from the vector edge $\vec{x}_E$ in the unambiguous vector field. A small refinement results if the position $\vec{x}_A$ of the edge in the intermediate vector field is not a result of shifting the estimates from one vector field, but is calculated as the weighted (with $\alpha$) average of the positions in the current and previous vector field. The retimer then fills the space between $\vec{x}_A$ and $\vec{x}_E$ with either the foreground or the background velocity, depending on the side of the foreground region (left or right of $\vec{x}_E$) and the sign of the foreground velocity. For this strategy to work, a robust foreground/background determination strategy is needed.
It turns out that there are 8 different retimer cases depending on:
The first upper part of
The retimer needs to know which velocity around an object edge is foreground and which is background. With the aid of a previous motion vector field it can be determined which object regions or velocities belong to the foreground and which to the background. A background block is a block for which both the grey-value pixels and the associated velocity vector disappear under covering. The correspondence of the vectors is used in the foreground/background detectors, since vector-based measures are judged to yield simpler, more reliable results than pixel-based measures.
In a first strategy shown in
and fetch the vector present at an intermediate position in the previous vector field (covering) in the ambiguous area:
If we need to fill in the foreground vector in the ambiguous area of the interpolation vector field, we choose between $\vec{x}_a$ and $\vec{x}_b$ the one which is most different from $\vec{D}_a(\vec{x},n)$.
A variant of this first strategy fetches the background vector from the future for uncovering:
A second strategy (see
In case the two projections yield the same (foreground) vector, we have a certain determination. If this vector equals the vector of the upper block, this vector is the foreground velocity vector and vice versa. In case the vectors are different, the method was unsuccessful, and yields an uncertain determination. A similar projection towards the future can be applied for the case of covering.
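The certainty test of this second strategy can be sketched as follows (a minimal illustration; the function name and the convention of returning `None` for an uncertain determination are assumptions):

```python
def foreground_from_projections(proj_up, proj_down, v_up, v_down):
    """If the two vectors projected from either side of the edge coincide,
    that vector is the foreground velocity (certain determination), and
    the side whose current vector equals it is the foreground side.
    Otherwise the strategy was unsuccessful: uncertain determination."""
    if proj_up != proj_down:
        return None                      # uncertain determination
    fg = proj_up
    side = "up" if fg == v_up else "down"
    return fg, side
```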
A third strategy (see
and vice versa. Care should be taken that the velocities in n−1 are the same velocities as in n, since other velocity edges can occur in the vicinity of the projection. Obviously again the principle can be applied by substituting uncovering for covering and the future for the past.
It should be noted that the strategies can be enhanced by incorporating the match errors. In case a crossing to the foreground region occurred, the match errors of the vector in that block should be low. In case we project to a background region that was erroneously allotted a foreground vector in the previous image, the errors should be higher.
In
Said image signal E is also supplied to means 2 for detecting covering/uncovering areas, the output of which is connected to the means 1 for optimising a criterion function for candidate vectors. Said means 2 for detecting covering/uncovering areas is for example disclosed in WO 00/11863.
The means 1 for optimising a criterion function for candidate vectors is provided with switching means controlled by means 2 for detecting covering/uncovering areas such, that the optimising by means 1 is carried out at the temporal intermediate position in non-covering and non-uncovering areas, whereas the optimising is carried out at the temporal position of the next image in covering areas and at the temporal position of the previous image in uncovering areas.
The previous image is shifted over a fraction α times the candidate vector and the next image is shifted over 1−α times the candidate vector and the fraction α may change within the image period. The above-mentioned criterion function is a match error which is minimised. Said match error is also a function of the fraction α.
The means for optimising the match error is arranged such that the fraction α in the matching process is controlled by a covering/uncovering detector. Preferably, the fraction α is set to 1 in case of covering and set to 0 in case of uncovering.
The means 2 for detecting covering/uncovering areas preferably decides on the fraction α in the current estimation on the basis of data in a previous image.
The image signal J is also applied to the foreground/background detector 4 the output of which controls the retimer 3. The output of the retimer 3 is connected to the means 1 for optimising the criterion function.
The retimer 3 determines a velocity edge XE in the image signal and marks an occlusion area around said edge. The retimer 3 controls the means 1 for optimising such that in said occlusion area a foreground velocity is replaced by a background velocity or vice versa, depending on whether the occlusion area is a covering or an uncovering area, on the sign of the foreground velocity, and on which side of the velocity edge XE the foreground is.
In
The outputs of the calculating means 5 and 6 are connected to the inputs of calculating means 8, which calculates a third intermediate position between $\vec{x}_a$ and $\vec{x}_b$.
The foreground/background detector of
which is most different from $\vec{v}_a$ is filled in, in case a foreground vector $\vec{v}_{FG}$ should be filled in. According to a further elaboration, the third intermediate position is $(\vec{x}_a+\vec{x}_b)/2$.
Another embodiment of a foreground/background detector is shown in
Preferably, a checking means 13 is connected to the projecting means 10 and 11, which checking means checks whether the two projections yield the same vector. If so, the identification is certain.
In
Furthermore, filling means could be connected to the testing means 20, which filling means fill the first (second) vector into those regions of the projected vector field in the environment of the discontinuity to which no vector is projected, in case a foreground vector should be filled in; the other vector is filled in, in case a background vector should be filled in.
In
Number | Date | Country | Kind |
---|---|---|---|
00201752 | May 2000 | EP | regional |
01200560 | Feb 2001 | EP | regional |
Number | Name | Date | Kind |
---|---|---|---|
4890160 | Thomas | Dec 1989 | A |
6154519 | Florent et al. | Nov 2000 | A |
6192079 | Sharma et al. | Feb 2001 | B1 |
6195389 | Rodriguez et al. | Feb 2001 | B1 |
6480615 | Sun et al. | Nov 2002 | B1 |
6594313 | Hazra et al. | Jul 2003 | B1 |
20020186889 | De Haan et al. | Dec 2002 | A1 |
Number | Date | Country |
---|---|---|
WO 0011863 | Mar 2000 | WO |
Number | Date | Country
---|---|---
20030206246 A1 | Nov 2003 | US