The invention relates to a motion estimation unit for estimating a motion vector for a group of pixels of an image of a series of images.
The invention further relates to an image processing apparatus comprising:
The invention further relates to a method of estimating a motion vector for a group of pixels of an image of a series of images.
2-D motion estimation solves the problem of finding a vector field $\vec{d}(\vec{x}, n)$, given two successive images $f(\vec{x}, n-1)$ and $f(\vec{x}, n)$, where $\vec{x}$ is the 2-D position in the image and $n$ is the image number, such that
$$f(\vec{x}, n-1) = f(\vec{x} + \vec{d}(\vec{x}, n), n) \qquad (1)$$
2-D motion estimation suffers from the following problems:
Because of the ill-posed nature of motion estimation, assumptions are required about the structure of the 2-D motion vector field. A popular approach is to assume that the motion vector is constant for a block of pixels, i.e. the model of constant motion in blocks. This approach is quite successful and is used in, for instance, MPEG encoding and scan-rate up-conversion. Typically, the dimensions of the blocks are constant for a given application, e.g. for MPEG-2 the block size is 16×16 and for scan-rate up-conversion it is 8×8. This introduces the constraint that
$$\vec{d}(\vec{x}, n) = \vec{d}(\vec{x}\,', n), \quad \forall \vec{x}\,' \in B(\vec{x}) \qquad (2)$$
where $B(\vec{x})$ is the block of pixels at position $\vec{x} = (x_0, x_1)$, i.e.
$$B(\vec{x}) = \{\vec{x}\,' \mid x'_i \,\mathrm{div}\, \beta_i = x_i \,\mathrm{div}\, \beta_i,\ i = 0, 1\} \qquad (3)$$
and $\beta_i$ are the block dimensions.
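The block membership of Equation 3 can be illustrated with integer division. The following is a minimal sketch only; the function names `block_index` and `same_block` are hypothetical and chosen for illustration.

```python
def block_index(x, beta):
    """Return the index of the block containing pixel x = (x0, x1).

    beta = (beta0, beta1) are the block dimensions; `//` plays the role of `div`.
    """
    return (x[0] // beta[0], x[1] // beta[1])


def same_block(x, x_prime, beta):
    """True if x_prime lies in B(x), i.e. both pixels share a block index."""
    return block_index(x, beta) == block_index(x_prime, beta)
```

Under the constant-motion model, all pixels for which `same_block` holds are assigned the same motion vector.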
The choice of a predetermined block size is a trade-off between spatial accuracy and robustness. For larger block sizes, motion estimation is less sensitive to noise, and the “aperture” is bigger, which reduces the “aperture problem”. Hence, larger block sizes reduce the effect of two out of three problems. However, bigger block sizes reduce the spatial accuracy, i.e. one motion vector is assigned to all pixels of the block. Because of the trade-off between spatial accuracy and robustness it has been proposed to use variable block sizes. An embodiment of the motion estimation unit of the kind described in the opening paragraph is known from U.S. Pat. No. 5,477,272. In that patent a top-down motion estimation method is described, i.e. starting with the largest blocks. The motion vectors are first computed for the highest layer, which serves as an initial estimate for the next layer, and so on. Motion vectors are calculated for all blocks, including those with the smallest possible block sizes. Hence the method is relatively expensive from a computing point of view.
It is an object of the invention to provide a motion estimation unit of the kind described in the opening paragraph which provides a motion vector field for variable sizes of groups of pixels of an image and which has a relatively low computing resource usage.
The object of the invention is achieved in that the motion estimation unit for estimating a motion vector for a group of pixels of an image of a series of images, comprises:
The motion estimation unit is designed to estimate motion vectors initially with relatively large groups of pixels, e.g. 32×32 pixels. After a motion vector has been estimated for the group, it is verified whether the motion vector is representative for the whole group of pixels. If this is not the case then the group of pixels is split into sub-groups. After splitting, motion vectors are also estimated for the sub-groups by applying the generating means, the matching means and the selecting means. If the test results in a positive result, i.e. the particular motion vector is appropriate, then the group of pixels is not split and the estimated motion vector is assigned to the pixels of the group of pixels. In this case no further motion estimation steps are required and hence no additional computer resource usage is needed.
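The split-on-demand behavior described above can be sketched as a recursive procedure. This is an illustrative sketch under stated assumptions, not the claimed implementation: the callables `estimate` and `is_representative` and the parameter `min_size` are hypothetical stand-ins for the generating, matching and selecting means and for the testing means, and square blocks split into four quadrants are assumed.

```python
def estimate_variable_blocks(x0, y0, size, estimate, is_representative, min_size=8):
    """Return a list of (x0, y0, size, vector) for one top-level square block.

    estimate(x0, y0, size)               -> motion vector for that block (hypothetical)
    is_representative(x0, y0, size, vec) -> True if no split is needed (hypothetical)
    """
    vector = estimate(x0, y0, size)
    # Stop when the vector passes the test or the block cannot be split further.
    if size <= min_size or is_representative(x0, y0, size, vector):
        return [(x0, y0, size, vector)]
    half = size // 2
    result = []
    # Otherwise estimate a vector for each of the four sub-blocks in turn.
    for dy in (0, half):
        for dx in (0, half):
            result.extend(estimate_variable_blocks(
                x0 + dx, y0 + dy, half, estimate, is_representative, min_size))
    return result
```

When the test succeeds for a 32×32 block, `estimate` is called exactly once for that block, which is where the saving in computing resources comes from.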
In an embodiment of the motion estimation unit according to the invention the particular motion vector is the first one of the motion vector candidates. Preferably the measure which is used for the test is related to the motion vector candidate which is selected as the best matching motion vector.
In an embodiment of the motion estimation unit according to the invention the group of pixels corresponds to a block of pixels and the sub-groups of pixels correspond to respective sub-blocks of pixels. The groups of pixels might form an arbitrarily shaped portion of the image, but preferably the group of pixels corresponds to a block of pixels. This is advantageous for the design of the motion estimation unit.
In an embodiment of the motion estimation unit according to the invention, the testing means are designed to test whether a first one of the sub-blocks of pixels has to be split into further sub-blocks of pixels for which respective other motion vectors have to be estimated, similar to the motion vector being estimated for the block of pixels. Splitting the images into blocks, the blocks into sub-blocks, and so on, is repeated recursively. For the various blocks and sub-blocks, motion vectors are calculated.
In an embodiment of the motion estimation unit according to the invention the matching means are arranged to calculate the match error of the motion vector which corresponds to a sum of absolute differences (SAD) between values of pixels of the block of pixels and respective further values of pixels of a further block of pixels of another image of the series of images. This match error is relatively robust and can be calculated with relatively little computing resource usage. It is common practice to evaluate the validity of a candidate motion vector $\vec{c}$ by calculating a match error $\varepsilon$. A popular criterion is the SAD, i.e.
This match error $\varepsilon$ is minimized by varying $\vec{c}$ in order to obtain the best matching motion vector $\vec{d}(\vec{x}, n)$ for the block, i.e.
As can be seen in Equation 4, the match error calculations require the computation of a number of differences of values of pixels shifted over the motion vector. If the block dimensions are doubled in both directions, the number of differences of values of pixels per block increases by a factor of four. However, the number of blocks decreases by a factor of four, so the number of calculations per image remains the same. Optionally, sub-sampling is applied for the calculation of the match errors, i.e. only a portion of the pixels of a block is used.
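A SAD match error and the selection of the best matching candidate can be sketched as follows. This is a minimal sketch under assumptions: images are 2-D lists of pixel values, the direction of the shift follows the convention of Equation 1, shifted positions outside the image are clamped to the border, and the names `sad` and `best_vector` are hypothetical.

```python
def sad(prev, cur, x0, y0, size, c):
    """Sum of absolute differences for the block at (x0, y0) and candidate c = (cx, cy).

    Following Equation 1, pixel (x, y) of the previous image is compared with
    pixel (x + cx, y + cy) of the current image. Pixels shifted outside the
    image are clamped to the border (an assumption of this sketch).
    """
    height, width = len(cur), len(cur[0])
    cx, cy = c
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            xs = min(max(x + cx, 0), width - 1)
            ys = min(max(y + cy, 0), height - 1)
            total += abs(prev[y][x] - cur[ys][xs])
    return total


def best_vector(prev, cur, x0, y0, size, candidates):
    """Select the candidate motion vector with the lowest match error."""
    return min(candidates, key=lambda c: sad(prev, cur, x0, y0, size, c))
```

The inner loops make the cost argument visible: a block of `size` × `size` pixels costs `size * size` differences per candidate, so doubling `size` quadruples the work per block while the number of blocks per image drops by the same factor.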
In an embodiment of the motion estimation unit according to the invention the measure related to the particular motion vector is based on a difference between the motion vector and a neighbor motion vector being estimated for a neighbor block of pixels in the neighborhood of the block of pixels. In this embodiment the splitting is based on the vector field inconsistency VI. That means that if the motion vectors locally differ by more than a predetermined threshold, it is assumed that these motion vectors do not belong to one and the same object in the scene being captured, i.e. represented by the series of images. In that case the block should be split in order to find the edge of the object. On the other hand, the block does not have to be split any further if the neighboring blocks of pixels have the same, or hardly distinct, motion vectors. In that case it is assumed that the blocks correspond to the same object.
In an embodiment of the motion estimation unit according to the invention the measure related to the particular motion vector is based on a difference between a first intermediate result of calculating the match error and a second intermediate result of calculating the match error, the first intermediate result corresponding to a first portion of the block of pixels and the second intermediate result corresponding to a second portion of the block of pixels. These intermediate results are also used as match errors for sub-blocks. Hence, computer resource usage is minimized.
In an embodiment of the motion estimation unit according to the invention the testing means are designed to test whether the block of pixels has to be split into the sub-groups of pixels, on the basis of a dimension of the block of pixels. The dimension of the block is thus another criterion for testing whether the block should be split. This additional criterion enables flexibility in resource usage: if a relatively large amount of computing resource usage is allowed, the splitting might be continued down to fine-grained blocks, and if relatively little computing resource usage is allowed, the splitting might stop at coarse-grained blocks. It should be noted that the granularity of blocks can also be controlled by adapting the threshold of the other criterion, i.e. the measure.
An embodiment of the motion estimation unit according to the invention comprises a merging unit for merging a set of sub-blocks of pixels into a merged block of pixels and for assigning a new motion vector to the merged block of pixels, by selecting a first one of the further motion vectors corresponding to the sub-blocks of the set of sub-blocks. Neighboring blocks are merged if they have motion vectors which are mutually equal or if the difference between their motion vectors is below a predetermined threshold. An advantage of merging is that memory reduction can be achieved for storage of motion vectors, since the number of motion vectors is reduced.
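The merging step can be sketched as follows. This is an illustrative sketch only: the function name `try_merge` is hypothetical, the difference between two vectors is taken as the sum of absolute component differences (an assumption), and each sub-block vector is compared with the first one rather than with every other vector.

```python
def try_merge(sub_vectors, threshold):
    """Return the merged block's vector, or None if the sub-blocks stay separate.

    sub_vectors: motion vectors (dx, dy) of the sub-blocks of one parent block.
    The difference measure (sum of absolute component differences) is an
    assumption of this sketch.
    """
    first = sub_vectors[0]
    for v in sub_vectors[1:]:
        if abs(v[0] - first[0]) + abs(v[1] - first[1]) > threshold:
            return None
    # The first sub-block's vector is selected for the merged block, so one
    # vector is stored instead of len(sub_vectors).
    return first
```

With `threshold` set to 0 only mutually equal vectors are merged; a larger value also merges vectors that differ slightly.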
An embodiment of the motion estimation unit according to the invention comprises an occlusion detector for controlling the testing means. An advantage of applying an occlusion detector is that object boundaries can be extracted from the occlusion map calculated by the occlusion detector. The splitting of blocks is relevant near object boundaries and less so within objects. Hence, applying an occlusion detector to control the testing means is advantageous, because computing resource usage is reduced. Optionally, the occlusion map determined for an image is used for a subsequent image of the series.
An embodiment of the motion estimation unit according to the invention is arranged to calculate normalized match errors. An advantage of applying normalized match errors is the robustness of the motion estimation. In addition, the match errors are a basis for the test whether the block of pixels has to be split. Normalization makes this test less sensitive to the content of the images.
It is a further object of the invention to provide an image processing apparatus of the kind described in the opening paragraph which provides a motion vector field for variable sizes of groups of pixels of an image and which has a relatively low computing resource usage.
This object of the invention is achieved in that the image processing apparatus comprises:
The image processing apparatus may comprise additional components, e.g. a display device for displaying the processed images. The motion compensated image processing unit might support one or more of the following types of image processing:
It is a further object of the invention to provide a method of the kind described in the opening paragraph which provides a motion vector field for variable sizes of groups of pixels of an image and which requires a relatively low computing resource usage.
This object of the invention is achieved in that the method of estimating a motion vector for a group of pixels of an image of a series of images, comprises:
Modifications of the motion estimation unit and variations thereof may correspond to modifications and variations thereof of the method and of the image processing apparatus described.
These and other aspects of the motion estimation unit, of the method and of the image processing apparatus according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein:
Corresponding reference numerals have the same meaning in all of the Figs.
On the input connector 212 of the motion estimation unit 200 a series of images is provided. The motion estimation unit 200 provides motion vectors at its output connector 214. Via the control interface 216, parameters which are related to the splitting, i.e. splitting criteria, can be provided. These parameters comprise the minimum dimensions of the blocks and thresholds for a measure which is related to the quality of the selected motion vector. Two examples of such a measure are described below. They will be referred to as “Variance of Quad-SAD”, $\mathrm{var}(\vec{\varepsilon}(\vec{c}, \vec{x}, n))$, and “Vector Field Inconsistency”, VI. A combination of measures is preferred. That means e.g. that one possible criterion for splitting a block into four smaller blocks would be:
$$VI(\vec{x}) > T_s \;\wedge\; \mathrm{var}(\vec{\varepsilon}(\vec{d}, \vec{x}, n)) > T_v \qquad (6)$$
In words: the “Vector Field Inconsistency” is higher than a first predetermined threshold $T_s$ and the “Variance of Quad-SAD” is higher than a second predetermined threshold $T_v$.
The “Vector Field Inconsistency” is related to the amount of difference between neighboring motion vectors. An example of the “Vector Field Inconsistency” is specified by means of Equation 7. In that case a particular motion vector is compared with four neighboring motion vectors. It will be clear that alternative approaches for calculating a “Vector Field Inconsistency” are possible: with more or with fewer neighboring motion vectors.
with $\beta_0^h$ and $\beta_1^h$ the block dimensions at the highest level and with the local vector average defined by Equation 8:
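As an illustration only, and not necessarily in the form of Equations 7 and 8, a vector field inconsistency can be sketched as the distance between the motion vector of a block and the average of the motion vectors of its four neighboring blocks. The function name, the skipping of neighbors outside the field and the use of the sum of absolute component differences are all assumptions of this sketch.

```python
def vector_field_inconsistency(field, row, col):
    """Distance between the vector at (row, col) and the mean of its neighbors.

    field: 2-D list of motion vectors (dx, dy), one per block.
    Neighbors outside the field are skipped; the distance measure (sum of
    absolute component differences) is an assumption of this sketch.
    """
    neighbors = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(field) and 0 <= c < len(field[0]):
            neighbors.append(field[r][c])
    if not neighbors:
        return 0.0
    avg_x = sum(v[0] for v in neighbors) / len(neighbors)
    avg_y = sum(v[1] for v in neighbors) / len(neighbors)
    dx, dy = field[row][col]
    return abs(dx - avg_x) + abs(dy - avg_y)
```

In a uniform vector field the value is zero; it grows when the block's vector departs from those of its neighbors, which is the situation expected near an object boundary.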
The “Variance of Quad-SAD” is specified by means of Equation 10, but first the Quad-SAD is specified in Equation 9. The so-called Quad-SAD, $\vec{\varepsilon}(\vec{c}, \vec{x}, n)$, corresponds to a combination of four SAD values. In other words, a block at position $\vec{x}$ is divided into four quadrants and for each quadrant of the block a SAD is calculated, i.e.
where the block at position $\vec{x}$ is split into its quadrants with positions $\vec{x}_{11}, \ldots, \vec{x}_{22}$, i.e. four equally sized smaller blocks. The Quad-SAD can be derived from the SAD values without any additional computational cost. Then the “Variance of Quad-SAD” can be calculated by e.g.:
$$\mathrm{var}(\vec{\varepsilon}(\vec{c}, \vec{x}, n)) = |\varepsilon(\vec{c}, \vec{x}_{21}, n) - \varepsilon(\vec{c}, \vec{x}_{22}, n)| + |\varepsilon(\vec{c}, \vec{x}_{11}, n) - \varepsilon(\vec{c}, \vec{x}_{21}, n)| + |\varepsilon(\vec{c}, \vec{x}_{12}, n) - \varepsilon(\vec{c}, \vec{x}_{22}, n)| \qquad (10)$$
The basic idea behind the criterion as specified in Equation 6 is that the lowest level, i.e. small block sizes, is required only near the edges in the vector field. Areas containing an edge in the vector field are characterized by a VI value above the threshold $T_s$. The presence of the edge is characterized by high SAD values for one part of the block and low values for other parts, resulting in a large variation of the SAD values within the Quad-SAD.
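The combined criterion of Equation 6 can be sketched as follows. This is a minimal sketch: the function names are hypothetical, the four quadrant SAD values are passed in as plain numbers, and the variation uses the terms of Equation 10 as printed.

```python
def quad_sad_variation(e11, e12, e21, e22):
    """Variation of the four quadrant SADs, using the terms of Equation 10 as printed."""
    return abs(e21 - e22) + abs(e11 - e21) + abs(e12 - e22)


def should_split(vi, quad_sads, t_s, t_v):
    """Equation 6: split only if both measures exceed their thresholds.

    vi:        vector field inconsistency of the block
    quad_sads: (e11, e12, e21, e22), the SAD of each quadrant
    t_s, t_v:  the predetermined thresholds
    """
    return vi > t_s and quad_sad_variation(*quad_sads) > t_v
```

For example, a block whose upper quadrants match well and whose lower quadrants match poorly, such as `(10, 10, 90, 90)`, yields a large variation, whereas four equal quadrant values yield zero and the block is kept whole regardless of its VI value.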
However, this model is only valid if there is only one motion vector appropriate for the block, i.e. when splitting of the block is not required. Hence, Equation 11 can be applied to predict the expected SAD value. When the motion estimation has converged it is expected that the vector error VE is low, e.g. ½ pixel. If the SAD value is higher than the expected SAD value the block is split up. Hence the split criterion becomes:
where $\mathrm{VAR}(\vec{x})$ is e.g. given by:
with $\vec{e}_x$ and $\vec{e}_y$ unit vectors in the x-direction and y-direction, respectively. Thus, the threshold in Equation 12 on the SAD value becomes the allowed vector error.
The motion estimation units 200, 201, 203, 205 as described in connection with the
The motion compensated image processing unit 306 requires images and motion vectors as its input.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” does not exclude the presence of elements or steps not listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware.
Number | Date | Country | Kind |
---|---|---|---|
02076439.5 | Apr 2002 | EP | regional |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB03/01090 | 3/20/2003 | WO |