This invention relates to the field of image processing, in particular to motion estimation and its matching criterion. Motion estimation and scene change detection are needed for many applications, e.g. coding and segmentation.
Motion estimation of motion vectors for motion blocks is dependent on the local motion vectors, said local motion vectors describing the movement of pixels within two frames or areas, said frames or areas being successive in time. Depending on a matching criterion (also called cost function, goodness-of-fit), e.g. the SAD (summed absolute difference) value, specific local motion vectors are chosen and are used for said motion estimation to obtain said motion vectors.
Other criteria can also be calculated for said motion estimation. Due to disturbing factors, which comprise similar picture elements, the applied matching criterion, such as said SAD, can fail, and the probability of failure is even relatively high.
Another problem involved in this subject is scene change detection, which suffers from the same problem of the reliability of the motion estimation and its matching criterion. With the method of the present invention, it is possible to conclude whether and to what extent there is a scene change between two frames.
Motion estimation techniques broadly fall into four classes: differential methods, phase correlation methods, block matching methods, and feature-based methods. Among them, block matching is probably the most widespread motion estimation technique. This is not only because the current video coding standards (like MPEG2, AVC) are block-based, but also because block matching is required by other motion estimation techniques, e.g. the phase correlation methods. The phase correlation surface only indicates the possible motion candidates and possibly helps to reduce the motion search range. Ultimately, it is the block matching technique, also called image correlation or vector assignment in the case of phase correlation motion estimation, which decides whether there really is motion and, if so, where the motion vectors are located. It is well known that the motion estimation technique has great influence on the motion estimation result.
It is an object of the present invention to provide a better motion estimation whose results are more reliable. To get better results from the motion estimation, one has to consider how to improve the potential of said results before starting said motion estimation. Moreover, the preparation of the data reliably used by the motion estimation should involve less calculation time. Also, the scene change detection between blocks has to be improved and should be more reliable and meaningful.
The present invention relates to a method for obtaining at least one global motion vector from local motion vectors, said local motion vectors describing the motion of pixels of a region within two different images, said at least one global motion vector describing the motion of said region within said two images, said image processing method comprising steps of evaluating said local motion vectors based on at least one matching criterion, wherein one matching criterion is based on a result of a statistical estimation function regarding said local motion vectors, and whereby said step of evaluating concludes the reliability of said local motion vectors for motion estimation; processing said local motion vectors, whereby said step of processing comprises steps of selecting and/or deleting said local motion vectors according to their reliability; and performing said motion estimation technique with a least number of most reliable local motion vectors obtained by said processing step, whereby said number is dependent on said motion estimation technique.
Favorably said result of the statistical estimation function is based on a result of the Cramer-Rao-Inequation.
Favorably said one matching criterion is inversely proportional to said result of the Cramer-Rao-Inequation.
Favorably said step of evaluating processes a second matching criterion which is based on a cost function calculation like gradient, energy, standard deviation or summed absolute difference.
Favorably said step of evaluating processes at least two matching criteria, said matching criteria are simultaneously applied to evaluate said local motion vectors.
Favorably said image processing method comprises steps of calculating at least one of said at least one matching criterion.
Favorably said image processing method is conducted when a previously executed comparing step states that reliable motion estimation for said region is possible.
Favorably said comparing step evaluates whether the result of the Cramer-Rao-Inequation is in a specific range, said range starting and located above a predetermined value.
Favorably said comparing step evaluates whether an inversely proportional result of the Cramer-Rao-Inequation is smaller than a result of the cost function calculation.
Favorably said comparing step is conducted when a controlling step affirms the similarity of the result of the Cramer-Rao-Inequation of said region of the current image and the result of the Cramer-Rao-Inequation of said region of the succeeding image.
Favorably said result of the Cramer-Rao-Inequation is calculated before executing the controlling step.
Favorably said image processing method comprises a multi-resolution processing step to reduce the variance of the estimation error.
Favorably said image processing method comprises an orthogonal filtering step.
Favorably said image processing method describes a scene change or the degree of scene change between two images based on the number of regions successfully allocated with said global motion vectors.
The present invention also relates to an image processing device which is operable to obtain at least one global motion vector from local motion vectors, said local motion vectors describing the motion of pixels of a region within two different images, said at least one global motion vector describing the motion of said region within said two images, said image processing device comprising an evaluation device operable to evaluate said local motion vectors based on at least one matching criterion, wherein one matching criterion is based on a result of a statistical estimation function regarding said local motion vectors, and to conclude the reliability of said local motion vectors for motion estimation; a processing device operable to process said local motion vectors, whereby said processing device comprises a selection and/or a deletion device operable to respectively select or delete said local motion vectors according to their reliability; and a motion estimation device operable to perform motion estimation with a least number of most reliable local motion vectors obtained by said processing device, whereby said number is dependent on said motion estimation.
Favorably said result of the statistical estimation function is based on a result of the Cramer-Rao-Inequation.
Favorably said one matching criterion is inversely proportional to said result of the Cramer-Rao-Inequation.
Favorably said evaluation device is operable to evaluate said local motion vectors based on a second matching criterion, said second matching criterion being based on a cost function calculation like gradient, energy, standard deviation or summed absolute difference.
Favorably said evaluation device is operable to evaluate said local motion vectors based on at least two matching criteria, said matching criteria are simultaneously applied to said local motion vectors.
Favorably said motion estimation device is operable to select affine model transformation, projective transformation or polynomial transformation for motion estimation.
Favorably said image processing device comprises a calculation device, said calculation device being operable to receive said local motion vectors, to calculate said at least one matching criterion and output said at least one matching criterion to said evaluation device.
Favorably said image processing device describes a scene change or a degree of scene change based on the number of regions successfully allocated to said global motion vectors.
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, wherein:
Throughout the description of the present invention, the description and mentioning of motion estimation or motion estimation technique(s) is always intended to be equivalent to scene change detection, since a scene change can be directly concluded from the reliable estimation of global motion vectors. The motion estimation can successfully estimate a large number of global motion vectors when there is no scene change. Conversely, in case of a scene change, fewer or no global motion vectors will be estimated by the motion estimation technique. Basically, the conclusion is formed after the global motion vectors have been calculated by the motion estimation. All technical features for said scene change detection will be described in the succeeding description of the motion estimation.
In order to reduce the failure probability of motion estimation, i.e. block matching and all other motion estimations based on said block matching, a 2nd criterion can be applied besides the 1st criterion, e.g. SAD (summed absolute difference), so that the matching becomes more meaningful, in particular when the 2nd criterion does not belong to the same category as the 1st criterion. As 2nd criterion, e.g. the gradient, energy or standard deviation calculated from the picture area in question can be selected. The motion estimation comprises parameter/model based motion estimation techniques, e.g. affine model/transformation. Said motion estimation comprises global motion estimation operable to obtain global motion vectors.
Besides the estimation technique adopted, the materials play a very important role in the estimation result. In case of motion estimation, the materials are the image contents, namely textures, edges, lines and points. If the materials are not suitable for estimation purposes, the estimation error variance will be so high that the estimation result is not useful, and this is independent of the estimation technique. The Cramer-Rao-Inequation gives the lower bound of the estimation error variance, i.e. the best case of motion estimation. In the state-of-the-art approaches, the σ²xy,min of the Cramer-Rao-Inequation or the gradient value is only used to select regions where reliable motion estimation is possible or impossible, namely the σ²xy,min parameter only plays a 1/0, or ON/OFF, role. For horizontal (x) motion estimation, the lower bound of the estimation error variance is:
and for vertical (y) motion estimation, the lower bound of the estimation error variance is:
with σ² denoting the image noise variance and I the image area in question, usually a block of size M lines by N columns.
Therefore, for any motion estimation technique, the variance of the horizontal estimation error, denoted by σ²x, will not be smaller than σ²x,min, i.e.

$$\sigma_x^2 \ge \sigma_{x,\min}^2$$
And for any motion estimation technique, the variance of the vertical estimation error, denoted by σ²y, will not be smaller than σ²y,min, i.e.

$$\sigma_y^2 \ge \sigma_{y,\min}^2$$
Because objects do not only move in horizontal and vertical direction, σ²x,min and σ²y,min have to be combined. For simplicity, it is possible to simply compute the product of σ²x,min and σ²y,min, i.e.

$$\sigma_{xy,\min}^2 = \sigma_{x,\min}^2 \cdot \sigma_{y,\min}^2 \qquad (8)$$
For practical application the reciprocal value of σ²xy,min is preferred, denoted as σ²xy,max, i.e.

$$\sigma_{xy,\max}^2 = \frac{1}{\sigma_{xy,\min}^2} \qquad (9)$$
The result of equation (8) or (9) indicates in which region a motion vector can be estimated, and in which region reliable motion estimation is impossible. From equation (8) or (9) it becomes apparent that reliable estimation of motion vectors in flat areas is not possible.
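The lower-bound computation can be sketched in code. The sketch below is an illustration under stated assumptions, not the formula of this description: the bound equations themselves are not reproduced in this text, so the commonly used Cramér-Rao form for translational motion under additive Gaussian noise is assumed, with central-difference gradients and a normalization by twice the noise variance. The function name `lower_bounds` and its return layout are hypothetical.

```python
def lower_bounds(block, noise_var):
    """Return (sx_min, sy_min, sxy_min, sxy_max) for a 2-D block of intensities.

    ASSUMPTION: the standard Cramer-Rao form for translational motion is used;
    the exact bound equations of the description are not available here.
    """
    rows, cols = len(block), len(block[0])
    sum_gx2 = sum_gy2 = sum_gxgy = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences as a simple gradient estimate (assumption).
            gx = (block[r][c + 1] - block[r][c - 1]) / 2.0
            gy = (block[r + 1][c] - block[r - 1][c]) / 2.0
            sum_gx2 += gx * gx
            sum_gy2 += gy * gy
            sum_gxgy += gx * gy
    # Divide the three sums by twice the image noise variance.
    jxx = sum_gx2 / (2.0 * noise_var)
    jyy = sum_gy2 / (2.0 * noise_var)
    jxy = sum_gxgy / (2.0 * noise_var)
    det = jxx * jyy - jxy * jxy
    if det <= 1e-12:
        # Flat area or single straight edge: no reliable estimation possible.
        return float("inf"), float("inf"), float("inf"), 0.0
    sx_min = jyy / det          # lower bound, horizontal error variance
    sy_min = jxx / det          # lower bound, vertical error variance
    sxy_min = sx_min * sy_min   # combined value, product as in equation (8)
    return sx_min, sy_min, sxy_min, 1.0 / sxy_min  # reciprocal as in (9)
```

For a flat block the determinant vanishes and the sketch reports an infinite lower bound, which matches the statement that reliable estimation in flat areas is not possible.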
The motion vector shall be estimated from a picture region whose estimation error variance is small enough, and at best reaches the lower bound of the estimation error variance, which is given by the Cramer-Rao-Inequation. In other words, motion vectors are only estimated from regions whose goodness-of-fit (GoF) is larger than the reciprocal value of the lower bound of the estimation error variance determined by the Cramer-Rao-Inequation.
Because the Cramer-Rao-Inequation is capable of pointing out where (within a picture) reliable motion estimation is impossible, it is possible to remove the spurious local motion vectors found there and to replace them by the interpolation result of their neighboring local motion vectors.
The Cramer-Rao-Inequation specifies the potential of motion estimation reliability, independent of the motion estimation technique adopted. However, it can happen that the theoretical potential of motion estimation reliability is high, i.e. the image content is suitable for motion estimation, but this potential cannot be reached due to the motion estimation or pre-processing technique adopted.
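The replacement of spurious local motion vectors can be illustrated as follows. This is a minimal sketch: the text does not specify the interpolation, so the mean of the reliable 4-connected neighbors is assumed here, and the name `replace_unreliable` and the boolean reliability grid are hypothetical.

```python
def replace_unreliable(vectors, reliable):
    """Replace unreliable local motion vectors by the mean of reliable neighbors.

    vectors:  2-D grid (list of lists) of (vx, vy) tuples, one per block.
    reliable: grid of the same shape holding True/False per block.
    ASSUMPTION: 'interpolation' is taken to be the 4-neighbor mean.
    """
    rows, cols = len(vectors), len(vectors[0])
    out = [list(row) for row in vectors]
    for r in range(rows):
        for c in range(cols):
            if reliable[r][c]:
                continue
            neighbors = [
                vectors[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols and reliable[rr][cc]
            ]
            if neighbors:
                n = len(neighbors)
                out[r][c] = (sum(v[0] for v in neighbors) / n,
                             sum(v[1] for v in neighbors) / n)
            else:
                out[r][c] = None  # nothing reliable nearby; leave it flagged
    return out
```

Only the original reliable vectors are used as interpolation sources, so an interpolated value never propagates into another interpolated value.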
As cost function, goodness-of-fit or matching criterion for motion estimation, e.g. the normalized cross-correlation coefficient (NCC), the squared error difference or the summed absolute difference (SAD) can be applied. Because of its simplicity, SAD is often adopted. As said, the most widespread motion estimation method is block matching. Parameter/model based motion estimation, e.g. affine model/transformation based motion estimation, is widely applied to estimate e.g. rotation, zooming and shearing motion, in addition to translation motion. The affine model/transformation has six unknown variables/parameters. If the motion is beyond translation, zooming, rotation and shearing, other types of transformation can be used, e.g. the projective transformation and the polynomial transformation, which may need more unknown variables/parameters than the affine transformation does.
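As an illustration of SAD-based block matching, the following minimal sketch performs an exhaustive search in a small window. The function names, the frame layout (lists of rows) and the sign convention of the returned displacement are assumptions made for this example.

```python
def sad(block_a, block_b):
    """Summed absolute difference between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_vector(prev, curr, top, left, size, search):
    """Full search: return (cost, dx, dy) with the smallest SAD.

    (dx, dy) is the offset from the block position in the current frame to the
    best matching position in the previous frame (convention assumed here).
    """
    ref = get_block(curr, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(prev) or l + size > len(prev[0]):
                continue  # candidate block would leave the frame
            cost = sad(ref, get_block(prev, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

When several candidates share the same SAD value, this sketch simply keeps the first one found, which is exactly the ambiguity that a second matching criterion is meant to resolve.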
Global motion estimation is needed for many applications, e.g. coding and segmentation. To get global motion vectors, a parameter/model based motion estimation method, e.g. the affine transformation, is applied.
The six unknown parameters of the affine transformation have to be solved using equations which are built from known quantities, e.g. results and conditions. Usually the results of local motion estimation, e.g. block matching, are applied to build the equations.
Because the affine transformation has six unknown parameters, at least six equations are required, i.e. at least three local motion vectors (each providing an x- and a y-movement) are needed. Motion estimation is an ill-posed problem; some local motion vectors can be totally wrong, and these are often called outliers in the literature. Other local motion vectors are not totally wrong, but have a relatively large tolerance. The outliers and the local motion vectors with a relatively large tolerance will cause a wrong solution of the six unknown parameters. In order to alleviate this effect, more or even many more than three motion vectors are applied for global motion estimation, so that the influence of outliers and of local motion vectors with a relatively large tolerance on global motion estimation can be reduced.
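Solving the six affine parameters from three or more local motion vectors can be sketched with ordinary least squares. The parameterization vx = a1 + a2·x + a3·y, vy = a4 + a5·x + a6·y and the names `solve3` and `fit_affine` are assumptions for this illustration; the text does not prescribe a particular solver.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for i in range(3):
        p = max(range(i, 3), key=lambda k: abs(a[k][i]))
        if abs(a[p][i]) < 1e-12:
            raise ValueError("degenerate input, e.g. collinear block positions")
        a[i], a[p] = a[p], a[i]
        for k in range(3):
            if k != i:
                f = a[k][i] / a[i][i]
                a[k] = [u - f * v for u, v in zip(a[k], a[i])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(samples):
    """Least-squares fit of (a1..a6) from samples (x, y, vx, vy).

    Assumed model: vx = a1 + a2*x + a3*y and vy = a4 + a5*x + a6*y.
    Each local motion vector contributes two equations, so at least three
    vectors are needed; more vectors average out individual errors.
    """
    if len(samples) < 3:
        raise ValueError("at least three local motion vectors are needed")
    normal = [[0.0] * 3 for _ in range(3)]
    bx, by = [0.0] * 3, [0.0] * 3
    for x, y, vx, vy in samples:
        basis = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                normal[i][j] += basis[i] * basis[j]
            bx[i] += basis[i] * vx
            by[i] += basis[i] * vy
    # The x- and y-equations decouple into two 3x3 systems.
    return solve3(normal, bx) + solve3(normal, by)
```

Plain least squares treats all vectors equally, so a single outlier can still bias the result noticeably.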
In the present invention, the Cramer-Rao-Inequation result, i.e. the result from equation (8) or (9), is applied as the 2nd matching criterion. That is, the usual matching criterion (e.g. SAD) and a 2nd criterion, preferably the result from equation (8) or (9), are combined as matching criterion. While SAD is a goodness-of-fit function summarizing the discrepancy between observed values and the values expected under the model in question, the Cramer-Rao-Inequation is based on the maximum-likelihood method and is a point estimation function which performs a statistical analysis of a spot sample to define an estimator with maximized likelihood.
When the gradient or the result of equation (8) or (9) is already calculated for selecting areas that are suitable for motion estimation, the above approach does not increase the hardware complexity or computational load significantly. On the contrary, it can even reduce the computational load, because it is only necessary to estimate motion vectors from the areas where the results of equation (8) or (9) are the same or similar. If the σ²xy,min (or σ²xy,max) values of two regions of two different frames/images differ from each other significantly, it is no longer necessary to calculate the SAD and to perform the subsequent comparison operation. If the computational load has to be reduced further, the gradient, energy or standard deviation, preferably the σ²xy,min (or σ²xy,max) value, can be used alone as matching criterion. Therefore, the σ²xy,min (or σ²xy,max) plays more than a 1/0, or ON/OFF, role. According to the state of the art, the value of the Cramer-Rao-Inequation parameter was not fully made use of. Although the Cramer-Rao-Inequation specifies the potential of motion estimation reliability for the image sequence in question, this potential can be improved by pre-processing the image sequence in question.
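The load reduction described above can be sketched as a gate in front of the SAD calculation. The relative tolerance `rel_tol`, its default value and the name `match_cost` are hypothetical; the text only states that the SAD need not be computed when the two values differ significantly.

```python
def match_cost(sigma_a, sigma_b, sad_fn, rel_tol=0.25):
    """Return the SAD cost, or None if the candidate is rejected early.

    sigma_a, sigma_b: equation (8) or (9) results of the two regions.
    sad_fn:           zero-argument callable computing the SAD on demand.
    rel_tol:          ASSUMED threshold for 'differ significantly'.
    """
    larger = max(abs(sigma_a), abs(sigma_b))
    if larger > 0 and abs(sigma_a - sigma_b) / larger > rel_tol:
        return None  # regions too dissimilar: skip SAD and comparison
    return sad_fn()
```

Passing the SAD as a callable keeps the expensive calculation lazy, so rejected candidates cost only one comparison.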
Above, a case with two matching criteria has been discussed. It is possible to combine more than two criteria as matching criterion to reduce the probability of wrong motion estimation. Thus, the more matching criteria are applied to the local motion vectors, the more reliable these local motion vectors are considered for motion estimation. The application of the Cramer-Rao-Inequation is particularly useful for parameter/model based motion estimation, e.g. the affine model based one, because model based motion estimation usually needs control points. The model-based motion estimation method, in particular for global motion estimation, will weight the local motion vectors using different factors, in particular by weighting the outliers and the local motion vectors with a relatively large tolerance using small factors, if it is calculated in advance which local motion vectors are reliable, which are outliers and which have a relatively large tolerance. The Cramer-Rao-Inequation provides an effective way to determine these weighting factors: the weighting factor is selected as inversely proportional to σ²xy,min, σ²x,min and/or σ²y,min. Besides, as the Cramer-Rao-Inequation defines, its result leads to the error tolerance value of the motion estimation in question. Therefore, this value also statistically describes the reliability of the motion estimation result. As a result, the L most reliable motion estimation results are selected for further parameter/model based motion estimation. The variable L is a number which is determined by the model in question, e.g. for the affine transformation/model the least L value is six. Because the Cramer-Rao-Inequation needs image gradients, which are affected by noise, the σ²xy,min (or σ²xy,max) can also be affected by noise. Although the summing operation Σ plays a role in reducing noise disturbance, the σ²xy,min (or σ²xy,max) is still more sensitive than e.g. the SAD. The usual pre-filtering technique, e.g. the orthogonal filtering, can reduce the noise effect, but the hardware complexity or computational load will be increased.
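The weighting and the selection of the L most reliable results can be sketched as follows. The normalization of the weights to a sum of one, the small constant guarding against division by zero, and the name `select_most_reliable` are assumptions of this example.

```python
def select_most_reliable(vectors, sigma_min, count):
    """Pick the `count` most reliable vectors and weight them.

    vectors:   list of local motion vectors (any representation).
    sigma_min: lower-bound value per vector; smaller means more reliable.
    count:     the number L required by the model in question.
    Returns a list of (vector, weight), most reliable first. Weights are
    inversely proportional to the bound and normalized to sum to 1
    (the normalization is an assumption of this sketch).
    """
    if count > len(vectors):
        raise ValueError("not enough local motion vectors for this model")
    eps = 1e-12  # avoids division by zero for an ideal bound of 0
    weighted = [(v, 1.0 / (s + eps)) for v, s in zip(vectors, sigma_min)]
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    top = weighted[:count]
    total = sum(w for _, w in top)
    return [(v, w / total) for v, w in top]
```

The resulting weights could then be used in a weighted least-squares fit of the model parameters, so that vectors with a large tolerance contribute little.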
Because multiresolution signal processing is an efficient method to extend the motion vector search range and to reduce the noise effect on motion estimation, multiresolution processing is applied to the Cramer-Rao-Inequation. Besides the noise reduction effect, the edge gradient will become larger after decimation. The multiresolution processing can, therefore, reduce the estimation error variance (cf. the calculation of the Cramer-Rao-Inequation) and again improve the motion estimation result. The decimation operation comprises low pass filtering followed by down sampling.
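The decimation operation can be illustrated with a minimal sketch. A 2×2 box average is assumed as the low pass filter and a factor of two as the down sampling ratio; the text fixes neither, and the names `decimate` and `pyramid` are hypothetical.

```python
def decimate(image):
    """Low pass filter (2x2 mean, an assumption) and down sample by two."""
    rows = len(image) // 2 * 2      # ignore a trailing odd row
    cols = len(image[0]) // 2 * 2   # ignore a trailing odd column
    return [
        [(image[r][c] + image[r][c + 1]
          + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def pyramid(image, levels):
    """Return [full resolution, half resolution, ...] with `levels` entries."""
    out = [image]
    for _ in range(levels - 1):
        out.append(decimate(out[-1]))
    return out
```

Averaging four pixels per output sample is what reduces the noise; the lower-bound values could then be evaluated on each pyramid level.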
Further the reliability of the global motion estimation which is based on the local motion vectors can be improved:
Global motion estimation is needed for many applications, e.g. coding, segmentation.
Global motion, i.e. translation, zooming, rotation and shearing, can be modelled as an affine transformation, which has six unknown variables/parameters. If the motion is beyond translation, zooming, rotation and shearing, other types of transformation can be used, e.g. the projective transformation and the polynomial transformation, which may need more unknown variables/parameters than the affine transformation does. For simplicity of discussion, only the affine transformation case is discussed in the following.
These six unknown parameters have to be solved using equations which are constructed from known quantities, e.g. results and conditions. Usually the motion estimation results are applied. The motion vectors of blocks of size M×N, e.g. 16×16, can be estimated from the actual field/frame picture and its previous field/frame picture by e.g. the block matching technique. In the following, they will be referred to as local motion vectors. From a field/frame picture a motion vector field/frame is obtained.
Because the affine transformation has six unknown parameters, at least six equations are required, i.e. at least three local motion vectors (each providing an x- and a y-movement) are needed. Motion estimation is an ill-posed problem; some local motion vectors can be totally wrong, and these are often called outliers in the literature. Other local motion vectors are not totally wrong, but have a relatively large tolerance. The outliers and the local motion vectors with a relatively large tolerance will cause a wrong solution of the six unknown parameters. In order to alleviate this effect, more or even many more than three motion vectors are applied for global motion estimation, so that the influence of outliers and of local motion vectors with a relatively large tolerance on global motion estimation can be reduced. A still better result can be achieved by weighting the local motion vectors using different factors, in particular by weighting the outliers and the local motion vectors with a relatively large tolerance using small factors, if it is known in advance which local motion vectors are reliable, which are outliers and which have a relatively large tolerance.
However, this kind of a-priori knowledge can be unreliable or even not available at all if the picture sequence in question is not examined. At best, the picture sequence in question should be examined quantitatively. Even if such a-priori knowledge is available, the outliers and the local motion vectors with a relatively large tolerance will degrade the precision of the solution of the six unknown parameters. Therefore, those local motion vectors should be selected from the vector field/frame that are "correct" with the highest probability. Concerning this matter, the Cramer-Rao-Inequation is the best solution known so far, in particular when the Cramer-Rao-Inequation result is not only considered as 1/0, or ON/OFF, but is also used with its actual value. Therefore, the nearer the calculated value of σ²xy is to σ²xy,min, the higher a local motion vector is weighted. Thus, the higher weighted local motion vectors are preferably used for the motion estimation calculation to create a motion vector.
For some applications, e.g. mode detection (interlaced or progressive, film mode), the local motion vectors are taken advantage of. Because the local motion vectors that are "incorrect" with the highest probability are removed, the mode detection result will become more reliable.
Also the scene change detection will become more reliable, since the next frame or area which is considered as the "correct" next one is compared with the current frame or area, respectively. It is possible to detect a scene change by the number of matched blocks found between two neighboring pictures. In this way a scene change can be detected gradually, e.g. from small change (the number of matched blocks is large) to complete change (the number of matched blocks is zero).
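The gradual scene change measure can be sketched as a simple ratio. The linear mapping from the number of matched blocks to a degree between 0 and 1 and the name `scene_change_degree` are assumptions; the text only fixes the two extremes.

```python
def scene_change_degree(matched_blocks, total_blocks):
    """Map the number of matched blocks to a degree of scene change.

    Returns 0.0 when every block is matched (no scene change) and 1.0 when
    no block is matched (complete scene change). The linear mapping in
    between is an assumption of this sketch.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    if not 0 <= matched_blocks <= total_blocks:
        raise ValueError("matched_blocks must lie in [0, total_blocks]")
    return 1.0 - matched_blocks / total_blocks
```

An application could compare this degree against its own threshold to decide whether a scene cut is declared.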
Moreover the interaction of similar picture blocks can be reduced:
There can be many similar picture blocks in a sequence whose SAD values are the same or similar. Thus, it can happen that motion estimation finds blocks having the same or a similar SAD value although their movements do not correspond to the true movement. For many kinds of applications, dense motion vectors are desired, which in turn requires a reduction of the block size of the motion estimation. Reducing the block size, however, will increase the probability that motion estimation finds similar blocks whose movements do not correspond to the true movement. Therefore, another matching criterion in addition to SAD is needed. Because the Cramer-Rao-Inequation result is calculated anyway to decide where reliable motion estimation is possible or impossible, the Cramer-Rao-Inequation result is applied as matching criterion besides SAD. In this way, the interaction of similar picture blocks can be reduced without increasing the hardware complexity or computational load significantly.
Now describing in detail the features of the
The LBC step comprises the steps of calculating a sum of γx-component 1, a sum of γy-component 2 and a sum of γxy-component 3; furthermore steps of dividing said sums 1 to 3 by a factor based on the image noise variance 6, respectively, whereby said steps of dividing are referenced as 4a, 4b and 4c for the γx-component 1, γy-component 2 and γxy-component 3, respectively. In this example the factor is twice the value of the image noise variance 6. Finally, a step 5 of calculating σ²xy,min outputs said value σ²xy,min 8. In step 5 the output signals 7a, 7b and 7c of the respective dividing steps 4a, 4b and 4c are combined with each other according to formula (8).
The σ²xy,min value 8 is the lower-bound value of the estimation error variance of the horizontal and vertical estimation and is transferred as matching criterion to the motion estimation step 10. In step 9, a cost function calculation, e.g. SAD, is performed. The output of step 9 is also transferred to the motion estimation step 10 as a second matching criterion. The data 11 is input into the steps 1, 2, 3, 9 and 10. In step 10, the local motion vectors, which are part of the input data 11, are evaluated and weighted by the matching criterion of the cost function calculation of step 9 and by the value 8. Depending on the motion estimation technique, a least number of motion vectors is needed to perform said motion estimation, whereby said motion vectors are to be considered as the most reliable ones. Advantageously, the motion estimation might also comprise techniques other than the affine model transformation, e.g. the projective transformation or the polynomial transformation, which are alternatively used depending on the global motion. Moreover, the cost function calculation in step 9 might also comprise calculation methods other than the SAD.
In another embodiment of the present invention only value 8 is used as a matching criterion for the motion estimation, whereby said method only comprises the LBC step and the motion estimation step 10. Again there are embodiments of the present invention using one or more matching criteria.
In another embodiment the σ²xy,min value 8 does not need to be calculated, since it was already calculated for the pretesting as described below in step 17 of
In
Not shown in
The evaluation device 29 is operable to receive and evaluate said local motion vectors 43 and eventually output the evaluated local motion vectors 45. The evaluation device 29 comprises one operation 33 wherein one matching criterion is applied to said local motion vectors 43. In other embodiments the evaluation device 29 comprises at least one more operation 34 wherein further matching criteria are applied to said local motion vectors 43. In case of a plurality of matching criteria said matching criteria are either simultaneously or subsequently applied to said local motion vectors 43.
The processing device 30 is operable to receive and process the evaluated local motion vectors 45, and eventually output the processed local motion vectors 46. The processing device 30 comprises a selection process 36 and/or a deletion process 37, wherein the evaluated local motion vectors 45 are either selected or deleted, respectively, according to their reliability for the motion estimation.
The motion estimation device 31 is operable to receive the processed local motion vectors 46, execute a motion estimation based on the input of the processed local motion vectors 46 and eventually output global motion vectors 44. The motion estimation device 31 comprises a selection unit 39 for selecting a motion estimation technique like e.g. affine model transformation 40, projective transformation 41 or polynomial transformation 42. Then based on the selected motion estimation technique the least number L of most reliable local motion vectors is determined. This number L is used in a vector selection unit 38 which selects the L most reliable local motion vectors from said processed local motion vectors 46 for the motion estimation.
The calculation device 32 is operable to receive the local motion vectors 43, and to calculate and output at least one matching criterion based on said local motion vectors 43.
The image processing device is also operable to describe a degree of scene change based on the number of regions successfully allocated to said global motion vectors as described in
| Number | Date | Country | Kind |
|---|---|---|---|
| 06124068.5 | Nov 2006 | EP | regional |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP07/09845 | 11/14/2007 | WO | 00 | 7/1/2009 |