The present invention relates to an image registration method and, more particularly, to an image registration method estimating motions between an image of interest in an image sequence and a reference image in the image sequence.
Image registration refers to a technique that estimates the transformation parameters aligning two images when one is superimposed on the other (i.e. when an image of interest is placed on a reference image). That is to say, image registration means estimating the motion between the image of interest and the reference image.
Image registration, i.e. motion estimation between the image of interest and the reference image, is one of the most basic and important operations in many kinds of image processing such as super-resolution processing, image mosaicing, three-dimensional reconstruction, stereo vision, depth estimation, image measurement and machine vision (see Non-Patent Document 1 and Non-Patent Document 2).
In order to conduct image registration, i.e. in order to estimate the motion parameters between an image of interest in the image sequence and a reference image in the image sequence, the motion is often approximated by a planar projective transformation. If only a planar region of the registration object (hereinafter also simply referred to as "object") is set as the region of interest (ROI), it is possible to obtain accurate motion parameters.
However, because there may not be a plane (planar region) in the ROI, the actual motion of the registration object often differs from the motion estimated with the planar projective transformation model. Furthermore, the estimation of the motion parameters often fails due to illumination variation, occlusion and so on.
For such problems, many methods have heretofore been proposed, such as a method using base images representing illumination variation (see Non-Patent Document 3 and Non-Patent Document 4), a method dividing the ROI into multiple regions (see Non-Patent Document 5 and Non-Patent Document 6), a method modeling the object geometry as a quadric surface (see Non-Patent Document 7), a method modeling complicated geometry such as a face (see Non-Patent Document 8), a method using motion segmentation (see Non-Patent Document 9 and Non-Patent Document 10), a method modeling the motion distribution (see Non-Patent Document 11) and a method selecting and using an appropriate region in motion estimation (see Non-Patent Document 12 and Non-Patent Document 13).
Of these methods, for example, the method selecting and using an appropriate region in motion estimation (hereinafter also simply referred to as "a region-selection-based method") disclosed in Non-Patent Document 12 obtains a residual motion weighted by the magnitude of the spatial intensity gradient (this residual motion is also referred to as "a normal flow"), and evaluates a region where the magnitude of the obtained residual motion is small as a region where the registration is performed precisely.
However, because this normal flow is easily affected by noise included in an image, there is the problem that post-processing, such as weighting and averaging the results from multiple images (see Non-Patent Document 12) or using a probability model (see Non-Patent Document 14), is necessary in order to extract a region where the registration is performed precisely.
In addition, in the method disclosed in Non-Patent Document 13, which also belongs to the region-selection-based methods, the weight of each pixel within the ROI is lowered based on the difference in pixel value between the reference image and the image transformed by the estimated motion. There is therefore the problem that the weight is lowered by illumination variation of the object, and the registration may fail.
Here, we explain the conventional image registration method using planar projective transformation. That is to say, when image registration is conducted by using the conventional image registration method, the motion between images, i.e. the motion between the input image (the image of interest) in the image sequence and the reference image in the image sequence, is estimated by using a planar projective transformation model.
In order to estimate the parameter representing this planar projective transformation model, i.e. the motion parameter of the image of interest for the reference image, we define an objective function represented by the following Expression 1.
Where, I(x) represents the input image in the image sequence, I0(x) represents the reference image, and ROI represents the region of interest. Furthermore, x=[x, y, 1]T represents a position on an image expressed in the homogeneous coordinate system, and h=[h1, h2, . . . , h8]T represents the eight parameters expressing the planar projective transformation.
W(x;h) represents the motion of the image of interest with respect to the reference image, i.e. represents the planar projective transformation.
An arbitrary region can be used as the ROI, but a rectangular region is often used. When minimizing the objective function represented by the aforementioned Expression 1, all pixel values within the ROI are used.
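The eight-parameter planar projective transformation W(x;h) described above can be sketched as follows. This is a minimal illustration, assuming the common parameterization in which the ninth (bottom-right) matrix entry is fixed to 1; the function name `warp_point` is hypothetical.

```python
def warp_point(x, y, h):
    """Apply a planar projective transformation W(x; h) with eight
    parameters h = [h1, ..., h8] to the point (x, y), assuming the
    common parameterization whose ninth entry is fixed to 1."""
    h1, h2, h3, h4, h5, h6, h7, h8 = h
    denom = h7 * x + h8 * y + 1.0       # homogeneous scale factor
    return ((h1 * x + h2 * y + h3) / denom,
            (h4 * x + h5 * y + h6) / denom)
```

With h = [1, 0, 0, 0, 1, 0, 0, 0] the transformation is the identity; setting h3 and h6 yields a pure translation.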
In order to estimate the transformation parameter h that minimizes the above Expression 1, methods such as a gradient method (see Non-Patent Document 15, Non-Patent Document 16, Non-Patent Document 17 and Non-Patent Document 18), a hyperplane intersection method that does not use iterative computation (see Non-Patent Document 19) and a fast variant of the hyperplane intersection method of Non-Patent Document 19 (see Non-Patent Document 20) can be used.
When registering images capturing an object regarded as a rigid body by the aforementioned conventional image registration methods, which just use the pixel values within the ROI and belong to the region-based methods, the registration may fail for the following factors.
That is to say, the factors that cause failure of the registration conducted by the conventional image registration methods that use planar projective transformation and belong to the region-based methods, are the following.
Failure Factor 1: The Object Shape Is Not a Plane

When capturing a plane by using a camera whose position and direction change arbitrarily in three-dimensional space, it is possible to represent the transformation between images by the planar projective transformation model. Here, we assume that the lens distortion of the camera is small enough to be ignored, or that it is corrected separately. When the object shape is not a plane, as a matter of course, it is impossible to completely describe the change of the appearance of the object by the planar projective transformation model, and the minimum value of the objective function represented by Expression 1 becomes large. For this reason, the estimation may fall into a local minimum that differs from the correct motion parameter, and the registration may fail.
Failure Factor 2: The Change of the Brightness of the Plane

When the position and the direction of a camera change with respect to a plane, both the angle at which the camera views the plane and the distance from the camera to the plane change. If the plane has a Lambertian surface, the brightness of the plane does not change with the viewing angle or the distance from the camera to the plane. However, in practice, it is rare that an object having planar geometry is a perfectly uniform diffuser, so the luminance of the object (the plane) changes with the position and direction of the camera with respect to the object (the plane). Therefore, when the luminance of the object (the plane) changes, i.e. when the brightness between images changes, the value of the objective function represented by Expression 1 changes, the minimum value of the objective function becomes large, and the registration may fail.
Failure Factor 3: The Change of the Distance from the Camera to the Plane
In the case of sequentially registering an object captured as an image sequence, when the distance from the camera to the plane changes, the registration may fail for the following causes.
Firstly, when the lens of the camera can be approximated by a pinhole lens, i.e. when an image sequence that is visually in focus can always be captured even if the distance from the camera to the plane changes, the following problem occurs as that distance gradually increases. The object appears large in the reference image, but gradually becomes smaller in the images of the sequence (i.e. the images used as input images) captured over time. In order to transform the input image to match the ROI set on the reference image and conduct the registration, it is necessary to enlarge the input image. With the enlargement, the image inevitably blurs. In the end, registering the input image against the reference image means registering images having different blurs, and the registration finally fails.
Secondly, when the lens of the camera can be approximated by a thin lens, i.e. when an image sequence that is visually in focus can be captured only while the object stays within a certain distance range, a problem similar to the one caused by the first cause occurs when the degree of defocus for the object changes.
Failure Factor 4: Illumination Variation

The objective function represented by the above Expression 1 represents the sum of squared differences of pixel values between images. For this reason, the value of the objective function of course changes with the geometric variation between images, but it also changes with a change of image brightness. The brightness of the object changes with illumination variation, and the illumination variation becomes a bigger obstacle for the registration when the change of brightness of the object differs depending on the position in the image.
Failure Factor 5: Occlusion

When an occlusion between the registration object (target object, i.e. object) and another object exists in the ROI, it becomes a big obstacle for the registration. Particularly, when the density and the contrast of the texture on the surface of the other object occluding the target object are higher than those on the surface of the target object, the motion parameter h that minimizes the objective function represented by Expression 1 is affected strongly by the position of the other object, and the registration result obtained with this motion parameter may not match the position and pose of the target object. Furthermore, a shadow of the target object itself arises from the target object geometry and the position of the light source, and this shadow can change; the problem that this shadow affects the registration result as a change of brightness of the target object also occurs.
Conventionally, many artifices have been used as countermeasures against the above-mentioned registration failure factors. The main countermeasures are the following.
Countermeasure 1: Preprocessing with the Laplacian or the Laplacian of Gaussian

Specifically, for example, it is possible to reduce the influence of the change of brightness of the target object by using the Laplacian or the Laplacian of Gaussian. Furthermore, it is possible to absorb the change of blur of the target object to a certain degree. However, Countermeasure 1 has the problem that it cannot cope with objects having non-planar geometry or with occlusion.
Countermeasure 2: Normalization of Pixel Values

It is possible to reduce the influence of the change of brightness of the target object by minimizing the objective function represented by Expression 1 after normalizing the pixel values within the ROI. When the motion is limited to translation, normalized cross correlation can be used. When the motion is a planar projective transformation, methods using normalized cross correlation have also been proposed (see Non-Patent Document 19 and Non-Patent Document 20). However, Countermeasure 2 has the problem that it cannot cope with objects having non-planar geometry or with occlusion.
Countermeasure 3: Accumulation of Motions between Adjacent Frames

Countermeasure 3 is a countermeasure that utilizes the fact that the change of the appearance of the object between adjacent frames is small. Specifically, first, the motion parameter between temporally-adjacent frames (ht,t−1) is sequentially obtained. Then, the planar projective transformation for the reference image (W(x;ht,0)) is obtained as the product of the planar projective transformations between adjacent frames (W( . . . W(W(x;ht,t−1); ht−1,t−2); . . . ; h1,0)). However, Countermeasure 3 has the problem that registration error gradually accumulates and displacement eventually occurs.
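The accumulation in Countermeasure 3 can be sketched with 3×3 homography matrices: since W(W(x;ht,t−1);ht−1,0) corresponds to the matrix product Ht−1,0·Ht,t−1, each new per-frame matrix is right-multiplied onto the accumulated matrix. The helper names below are hypothetical.

```python
def compose(ha, hb):
    """Matrix product ha*hb of two 3x3 homographies, so the result maps
    x to ha(hb(x)), i.e. W(W(x; hb); ha)."""
    return [[sum(ha[i][k] * hb[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def accumulate(per_frame):
    """Chain per-frame motions [h_{1,0}, h_{2,1}, ..., h_{t,t-1}] into
    h_{t,0}; each new motion is applied to the image first, so it is
    right-multiplied onto the accumulated matrix."""
    acc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity
    for h in per_frame:
        acc = compose(acc, h)
    return acc
```

For pure translations this reduces to summing the offsets, which gives a quick sanity check of the composition order.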
Countermeasure 4: Update of the Reference Image

A countermeasure that updates the reference image, by using an index such as the minimum value of Expression 1, before the input image changes greatly from the reference image, is also utilized. However, like Countermeasure 3 (i.e. the method accumulating motions between adjacent frames), Countermeasure 4 has the problem that registration error gradually accumulates and displacement eventually occurs.
The present invention has been developed in view of the above described circumstances, and an object of the present invention is to provide a region-selection-based image registration method for estimating the motion between an image of interest in an image sequence and a reference image in the image sequence, which can be applied to the registration of objects having any geometry and by which high-precision image registration that is robust to illumination variation and occlusion can be conducted.
The present invention relates to an image registration method for conducting a high-precision image registration between a reference image in an image sequence capturing an object and an image of interest in said image sequence. The above object of the present invention is effectively achieved by the construction that said method characterized in that a predetermined region on said reference image is set as a region of interest, when conducting said high-precision image registration, a motion parameter is estimated based on pixels of a mask image representing a region where the registration is performed precisely by a predetermined transformation within said region of interest that is set. The above object of the present invention is also effectively achieved by the construction that said mask image is generated by utilizing the similarity evaluation between images. The above object of the present invention is also effectively achieved by the construction that said predetermined transformation is a planar projective transformation, an affine transformation, a scale transformation, a rotation transformation, a parallel transformation, or a transformation by the combination of these transformations.
Further, the above object of the present invention is also effectively achieved by the construction that in the case of assuming that the registration for an image Ia(x) and an image Ib(x) is roughly conducted by transforming said image Ia(x) with a certain transformation parameter h, the following expression holds,
Ib(x)≈Ia(W(x;h)), x∈ROI
where ROI represents said region of interest. In this case, an SSD (Sum of Squared Differences) between the image Ia(W(x;h)) and said image Ib(x) for a patch that is centered around a pixel x within said ROI, is defined as the following expression,
where u=[u1,u2,1]T is a vector representing a translation between images and Patch represents said patch. When said SSD fulfills all of the following three conditions, the pixel value of said mask image corresponding to the pixel x within said ROI is set to 1, and in other cases, the pixel value of said mask image is set to 0: a condition 1: said condition 1 is that, with respect to said SSD, the sub-pixel displacement of the translation is smaller than 0.5 pixel; a condition 2: said condition 2 is that the minimum value of said SSD is small enough; a condition 3: said condition 3 is that either of the two-dimensional coefficients for the horizontal direction and the vertical direction, when conducting the parabola fitting centered around the minimum value of said SSD, is bigger than the threshold.
Further, the above object of the present invention is also effectively achieved by the construction that said high-precision image registration consists of a first step registration in which a mask image for tracking, which represents pixels without change between adjacent frames, is generated and at the same time a motion parameter between adjacent frames is estimated, and a second step registration in which a mask image for error correction, which represents pixels within said image of interest that correspond to said reference image, is generated between the image of interest transformed by said motion parameter estimated in said first step registration and said reference image, and at the same time a motion parameter of said image of interest with respect to said reference image is estimated again by using the generated mask image for error correction.
In general, the image registration method (hereinafter also simply referred to as “the motion estimation method”) can be divided into the feature-based method and the region-based method.
The image registration method of the present invention belongs to the region-based methods and can be applied to the registration of objects having any geometry as well as planar geometry. According to the image registration method of the present invention, it is possible to conduct high-precision image registration that is robust to illumination variation and occlusion.
That is to say, the image registration method of the present invention is a region-selection-based image registration method which uses mask images representing regions that can be approximated with high accuracy by the planar projective transformation model. Since the image registration method of the present invention conducts the image registration in two steps, hereinafter it is also simply referred to as "the region selection two step registration method".
In the first step of the present invention, in order to realize image registration that is robust to illumination variation and occlusion, the image registration based on motions between adjacent frames is conducted. In this case, a mask image (hereinafter this mask image is also simply referred to as "a mask image for tracking") is used simultaneously. Furthermore, in order to make the geometry within the ROI uniform, the motion estimation is conducted after matching the frame of interest (hereinafter also simply referred to as "the image of interest") with the reference frame (hereinafter also simply referred to as "the reference image") by transforming the frame of interest with the estimated motion parameter.
Then, in the second step of the present invention, in order to compensate for the accumulation error included in the motion parameter estimated in the first step, that is to say, in order to realize high-precision image registration, the motion parameter is estimated again between the frame of interest transformed by the motion parameter estimated in the first step and the reference frame; at the same time, a mask image for the re-estimated motion parameter (hereinafter this mask image is also simply referred to as "a mask image for error correction") is generated.
By conducting such a two-step registration, the image registration method of the present invention can be applied to the registration of objects having any geometry, and can realize high-precision image registration that is robust to illumination variation and occlusion.
As described in the background art, a cause of failure in the image registration conducted by the conventional region-based methods is that all pixel values within the ROI are used equally.
The point of the present invention is that, if the motion parameter is estimated by using only the pixels within the ROI whose geometry and brightness do not change with respect to the reference image (i.e. the pixels of the mask image representing the region where the registration can be conducted accurately by the planar projective transformation), instead of using all pixel values within the ROI equally as in the conventional region-based methods, the estimation is in principle not affected by variation of geometry and illumination.
Furthermore, the present invention realizes the high-precision image registration by repeatedly and alternately conducting region selection and motion parameter estimation.
Practically, when conducting the registration of the input image with respect to the reference image by using the image registration method of the present invention, the registration is conducted according to the following two steps.
The first step is a step that simultaneously estimates, by iterative computation, the mask image at time t, Qt(x), and the transformation parameter between adjacent frames, ht,t−1, that uses the mask image Qt(x).
The second step is a step that obtains the mask image at time t, Mt(x), between the input image transformed by the planar projective transformation W(W(x;ht,t−1);ht−1,0) and the reference image, and estimates the transformation parameter of the input image with respect to the reference image, ht,0, again by using the estimated Mt(x).
As described above, in the image registration method of the present invention, two kinds of mask images, i.e. Qt(x) and Mt(x), are utilized. The mask image Qt(x) represents pixels without change between adjacent frames, and hereinafter is also referred to as "the mask image for tracking". Furthermore, the mask image Mt(x) represents pixels within the input image that correspond to the reference image, and hereinafter is also referred to as "the mask image for error correction". In addition, not only the mask image for tracking but also the mask image for error correction is represented in the coordinate system on the reference image, like the ROI.
The image registration method of the present invention will be described below in detail with reference to the accompanying drawings.
As shown in
Ib(x)≈Ia(W(x;h)), x∈ROI
In this case, an SSD (Sum of Squared Differences) between the image Ia(W(x;h)) and the image Ib(x) for a patch that is centered around a pixel x within the ROI, is defined as the following Expression 3.
Where, u=[u1,u2,1]T is a vector representing the translation between images, and Patch represents the patch.
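Assuming Expression 3 is the usual sum of squared differences over the patch, shifted by the candidate translation u, it can be sketched directly as follows; the images are 2-D lists indexed [y][x], and the function name and the half-width argument are assumptions.

```python
def patch_ssd(img_a, img_b, cx, cy, u1, u2, half=1):
    """R(x, u): sum of squared differences between img_b and img_a
    shifted by the translation (u1, u2), over a square patch centered
    at (cx, cy).  Border handling is omitted for brevity."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            d = img_b[cy + dy][cx + dx] - img_a[cy + dy + u2][cx + dx + u1]
            total += d * d
    return total
```

For identical images and zero translation, the SSD is exactly zero, which matches the role of R(x,[0,0,1]T) as a residual measure.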
When all of the following three conditions are fulfilled with respect to the SSD, the pixel value of the binary mask image for tracking Q(x) corresponding to the position within the ROI, i.e. the pixel x, is set to 1, and in other cases it is set to 0. The pixel value of the binary mask image for error correction M(x) is set in the same way as that of the binary mask image for tracking Q(x).
Condition 1 is that the sub-pixel displacement about the translation is smaller than 0.5 pixel (see
Condition 2 is that there is no mismatching. That is to say, Condition 2 is that the minimum value of the SSD, R(x,[0,0,1]T), is small enough (see Non-Patent Document 21 and Non-Patent Document 22). When represented by a mathematical expression, Condition 2 can be written as the following Expression 5.
R(x,[0,0,1]T)<2Sσn2×κ1 [Expression 5]
Where, S is the area of the patch, σn2 is the variance of the normalized white noise included in the image, and κ1(≈1) is a tunable parameter. For reference's sake, in the experiments performed for the present invention as described hereinbelow, a patch where S is 9 [pixel]×9 [pixel] is used. Furthermore, the parameter σn2 differs depending on the camera used for image capturing and on settings such as gain. For example, σn2 is set to 3.5 when using a DragonFly camera (Point Grey Research Inc., a single-chip color VGA camera), and σn2 is set to 4.5 when using a VX2000 made by SONY.
Condition 3 is that texture exists. That is to say, Condition 3 is that either of the two-dimensional coefficients (the argument of R( ) is omitted; when fitting R(u)=au2+bu+c to R(−1), R(0) and R(1), a=(R(−1)+R(1))/2−R(0), b=(R(1)−R(−1))/2 and c=R(0) hold) for the horizontal direction and the vertical direction, when conducting the parabola fitting centered around the minimum value of the SSD, is bigger than the threshold. When represented by a mathematical expression, Condition 3 can be written as the following Expression 6.
Where, κ2(≈14.0) is a tunable parameter determined by experiment.
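The three conditions can be sketched along one direction from the three SSD samples R(−1), R(0) and R(1); a full implementation would evaluate the curvature (and the sub-pixel displacement) in both the horizontal and vertical directions. The default thresholds κ1≈1 and κ2≈14.0 follow the text; the function name and everything else are assumptions.

```python
def mask_pixel(r_m1, r_0, r_p1, patch_area, sigma2, k1=1.0, k2=14.0):
    """Return 1 if the pixel enters the binary mask, else 0, using the
    parabola fit R(u) = a*u^2 + b*u + c through R(-1), R(0), R(1)."""
    a = (r_m1 + r_p1) / 2.0 - r_0               # curvature of the fit
    b = (r_p1 - r_m1) / 2.0
    if a <= 0.0:
        return 0                                 # no valley: cannot localize
    sub = -b / (2.0 * a)                         # sub-pixel displacement
    cond1 = abs(sub) < 0.5                       # Condition 1
    cond2 = r_0 < 2.0 * patch_area * sigma2 * k1 # Condition 2 (Expression 5)
    cond3 = a > k2                               # Condition 3 (Expression 6)
    return 1 if (cond1 and cond2 and cond3) else 0
```

A sharp, well-centered SSD valley with a small minimum passes all three conditions; a flat or mismatched profile is rejected.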
The aim of the first step registration (hereinafter also simply referred to as "the robust registration") is to continue the tracking of the object for as long a time as possible.
As shown in
Since there are few changes of the image between adjacent frames, no large failure occurs in the first step registration. For this reason, even if the brightness and geometry of the object gradually change with respect to the reference image, it is possible to conduct a robust registration.
In short, in the first step registration, the motion parameter ht,t−1 and the mask image for tracking Qt(x) that minimize an objective function represented by the following Expression 7, are obtained.
Where, ht−1,0 is the motion parameter already obtained for the previous frame, i.e. the motion parameter of the (t−1)-th frame (the input image of time (t−1)) with respect to the reference frame (the reference image). ht,t−1 is the motion parameter between adjacent frames, i.e. between the t-th frame (the input image of time t) and the (t−1)-th frame (the input image of time (t−1)). It(x) and It−1(x) are the input images of time t and time (t−1), respectively.
Since the mask image for tracking Qt(x) is defined on the reference image, the motion parameter between adjacent frames is obtained after having matched the input image with the reference image by using the motion parameter that is already obtained and having transformed the input image.
Specifically, the motion parameter between adjacent frames and the mask image for tracking are repeatedly and alternately obtained by procedures from Step 1 to Step 4.
Step 1: An index i that represents the number of iterations is initialized, that is to say, i is set to 0 (i=0). The mask image for tracking already obtained at time (t−1), Qt−1(x), is set as the initial mask image of time t, Qt<0>(x).
Step 2: A motion parameter between adjacent frames that minimizes Expression 7, ht,t−1<i>, is obtained by using the mask image for tracking Qt<i>(x).
Step 3: First, an image to which the planar projective transformation is applied, It(W(W(x;ht,t−1<i>);ht−1,0)), is generated by using the motion parameter between adjacent frames obtained in Step 2 (ht,t−1<i>). Second, a mask image for tracking Qt<i+1>(x) is generated, by using the generation method of the mask image described in (1), between the generated image It(W(W(x;ht,t−1<i>);ht−1,0)) and the image It−1(W(x;ht−1,0)).
Step 4: It is judged whether the change of the motion parameter between adjacent frames has become less than or equal to a certain value Th. When it has, it is judged that the motion parameter between adjacent frames converged, ht,t−1<i> is output as the motion parameter between adjacent frames, and the processing of the first step registration is finished. On the other hand, when the change has not become less than or equal to the certain value, i.e. when ∥ht,t−1<i>−ht,t−1<i−1>∥≧Th holds, it is judged that the motion parameter between adjacent frames did not converge, and the processing of the first step registration returns to Step 2 after setting i+1 as i (i←i+1).
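Steps 1 to 4 above alternate motion estimation and mask generation until convergence. A control-flow sketch follows, with estimate_motion (standing in for the minimization of Expression 7) and build_mask (standing in for the mask generation of (1)) as hypothetical callbacks; the threshold and iteration cap are also assumptions.

```python
def first_step(estimate_motion, build_mask, q_prev, th=1e-4, max_iter=50):
    """Alternate motion estimation and tracking-mask generation until
    the change of the motion parameter falls below th (Step 4)."""
    q = q_prev                       # Step 1: start from the mask of time t-1
    h_prev = None
    h = None
    for _ in range(max_iter):
        h = estimate_motion(q)       # Step 2: minimize Expression 7
        q = build_mask(h)            # Step 3: regenerate the tracking mask
        if h_prev is not None:
            change = sum((a - b) ** 2 for a, b in zip(h, h_prev)) ** 0.5
            if change < th:          # Step 4: converged
                break
        h_prev = h
    return h, q
```

With a motion estimator that immediately settles on one parameter vector, the loop detects convergence on the second pass and returns.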
The aim of the second step registration (hereinafter also simply referred to as "the accumulation error correction registration") is to correct the registration error that occurred in the first step registration.
As shown in
The mask image for tracking Qt(x) represents pixels that can be used for the registration between adjacent frames. Since the change between adjacent frames is generally small, the area of the mask image for tracking Qt(x) does not become so small. On the other hand, when the input image gradually changes with respect to the reference image, the area of the mask image for error correction Mt(x) gradually becomes small.
Since the region of the mask image for error correction Mt(x) is used in the second step registration, in order to obtain stable results, it is necessary that this region be larger than a certain size. Therefore, when the region of the mask image for error correction Mt(x) is small, i.e. when its area is under a predetermined threshold, the motion parameter obtained in the first step registration is used as it is, without conducting the second step registration.
In short, in the second step registration, the motion parameter ht,0 and the mask image for error correction Mt(x) that minimize an objective function represented by the following Expression 8, are obtained.
Where, the planar projective transformation W(W(x;ht,t−1);ht−1,0) that is obtained in the first step registration, is used as an initial value W(x;ht,0<0>) of a planar projective transformation W(x;ht,0).
Specifically, the motion parameter ht,0 (i.e. the transformation parameter of the input image of time t with respect to the reference image) and the mask image for error correction are obtained by the procedures from Step 5 to Step 9. Unlike the first step registration, the second step registration does not conduct the procedures from Step 5 to Step 9 repeatedly.
Step 5: The planar projective transformation W(W(x;ht,t−1);ht−1,0) obtained in the first step registration is set as the initial value W(x;ht,0<0>) of the planar projective transformation W(x;ht,0).
Step 6: A mask image for error correction Mt<1>(x) is generated, by using the generation method of the mask image described in (1), between the input image of time t transformed by the planar projective transformation W(x;ht,0<0>), i.e. the image It(W(x;ht,0<0>)), and the reference image I0(x).
Step 7: It is judged whether the area of the mask image for error correction Mt<1>(x) generated in Step 6 is under a predetermined threshold. When it is, ht,0<0> is output as the motion parameter and the processing of the second step registration is finished; hereby, all the registration processing for the input image of time t is finished. Here, it is necessary to adjust the predetermined threshold according to the characteristics of the motion of the image sequence, the centroid position of the mask region within the ROI, the distribution of the mask region, and so on. Furthermore, in the registration experiments performed for the present invention as described below, the predetermined threshold is set to 20 [pixel]×20 [pixel].
Step 8: On the other hand, when the area of the mask image for error correction Mt<1>(x) is more than or equal to the predetermined threshold, ht,0<1> that minimizes Expression 8 is obtained by using the mask image for error correction Mt<1>(x) generated in Step 6, the obtained ht,0<1> is output as the motion parameter, and the processing of the second step registration is finished. Hereby, all the registration processing for the input image of time t is finished.
Step 9: Finally, a mask image for error correction Mt<2>(x) is generated, for confirmation, by using the generation method of the mask image described in (1) between the image It(W(x;ht,0<1>)) and the reference image I0(x).
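Steps 5 to 9 run once per frame. A sketch follows, with build_mask (the mask generation of (1)), reestimate (minimizing Expression 8) and mask_area as hypothetical callbacks, and with the 20×20 = 400 pixel threshold from the text as the default minimum area.

```python
def second_step(h_first, build_mask, reestimate, mask_area, min_area=400):
    """Error-correction step: keep the first-step motion when the mask
    is too small (Step 7), otherwise re-estimate it (Steps 8-9)."""
    m1 = build_mask(h_first)         # Steps 5-6: initial mask M_t^<1>
    if mask_area(m1) < min_area:     # Step 7: mask area below threshold
        return h_first, m1
    h = reestimate(m1)               # Step 8: minimize Expression 8
    m2 = build_mask(h)               # Step 9: mask for confirmation
    return h, m2
```

When the mask area is below the threshold the first-step motion is passed through unchanged, which mirrors the fallback described in the text.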
In the registration experiments performed for the present invention as described below, in order to estimate the initial value of the motion parameter between adjacent frames obtained in the first step registration, the hierarchical search method disclosed in Non-Patent Document 23 is used. The hierarchies are limited so that the area of the ROI within the reduced image does not become smaller than 25 [pixel]×25 [pixel].
In the image registration method of the present invention, firstly, in the first step registration, motion parameter estimation by the gradient method with iterative computation (the number of iterations is set to ig) must be repeated (the number of repetitions is set to ir) so that the motion parameter is estimated simultaneously with the mask image Qt(x) for tracking. Then, in the second step registration, the gradient method only needs to be conducted once (the number of iterations is again set to ig).
In the basic gradient method, the Hessian matrix of the input image must be computed repeatedly. For this reason, in the present invention, the Hessian matrix must be obtained (ig×ir+ig) times for each frame image. Consequently, in comparison with the normal gradient method that does not conduct the region selection, the amount of computation in the present invention increases greatly.
Incidentally, instead of computing the Hessian matrix of the input image repeatedly, a speeding-up method in which the Hessian matrix is computed only once and then reused has been proposed (see Non-Patent Document 15). By using this speeding-up method, the present invention requires only one computation of the Hessian matrix for each frame image in the first step registration, and the Hessian matrix for the second step registration can be computed in advance as preprocessing; the computation can therefore be sped up considerably.
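One common realization of such Hessian precomputation, sketched here in pure NumPy for a translation-only warp for brevity, computes the steepest-descent images and the Gauss-Newton Hessian from the template (reference) gradients once and reuses them at every iteration. The function names and the crude integer-rounded warp are illustrative assumptions, not the formulation of Non-Patent Document 15 itself.

```python
import numpy as np


def precompute_hessian(template):
    """Steepest-descent images and Gauss-Newton Hessian from the template
    gradients; computed once, reused at every iteration."""
    gy, gx = np.gradient(template.astype(float))
    sd = np.stack([gx.ravel(), gy.ravel()], axis=1)  # one row per pixel
    return sd, sd.T @ sd                             # sd, 2x2 Hessian


def estimate_translation(image, template, p0=(0.0, 0.0), iters=20):
    """Gauss-Newton estimation of a translation (tx, ty) registering
    `template` into `image`; the Hessian is never recomputed in the loop.
    The warp is a crude integer-rounded crop, for illustration only."""
    sd, H = precompute_hessian(template)
    Hinv = np.linalg.inv(H)
    p = np.array(p0, dtype=float)
    h, w = template.shape
    for _ in range(iters):
        tx, ty = int(round(p[0])), int(round(p[1]))
        warped = image[ty:ty + h, tx:tx + w].astype(float)
        err = (warped - template).ravel()
        p -= Hinv @ (sd.T @ err)   # update with the fixed, precomputed Hessian
        p = np.clip(p, 0, [image.shape[1] - w, image.shape[0] - h])
    return p
```

Because the Hessian depends only on the template, the per-iteration cost reduces to one warp, one residual, and one small linear solve, which is the source of the speedup mentioned above.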
(6) The results of Registration Experiments Performed by the Present Invention
We performed several registration experiments using real images with the image registration method of the present invention (hereinafter also simply referred to as “the present invention”), and confirmed its validity. That is to say, through the registration experiments described below, the originally superior effects of the present invention are confirmed: (a) it is applicable to the registration of objects of any geometry, not only planar geometry; (b) it is robust to illumination variation and occlusion; (c) it enables high-precision image registration.
In the registration experiments described below, a stationary single-chip color VGA camera, DragonFly (Point Grey Research Inc.), running at 30 FPS is used. In the registration, the luminance component obtained after the demosaicing processing is used.
Experiment 1 (an experiment for comparing the present invention and the conventional method)
Firstly, a registration experiment that compares the present invention with the conventional image registration method (hereinafter also simply referred to as “the conventional method”) is performed.
In Experiment 1, an image sequence consisting of 300 frames is used. The size of ROI is 200 [pixel]×200 [pixel]. Further, the object for tracking is an aerial photograph poster stuck on a fixed plane. In order to produce illumination variation and occlusion, a hand is moved over the poster: the shade of the hand corresponds to the illumination variation, and the hand itself becomes the occlusion.
In Experiment 1, the planar object (the aerial photograph poster) is fixed and is captured with a fixed camera, so the position of the object does not change; the motion parameter corresponding to the correct solution is therefore, of course, the unit matrix.
Through Experiment 1, we confirmed that, even under illumination variation and occlusion in which registration by the conventional method fails, high-precision image registration can be conducted without failure by using the present invention.
In Experiment 1, since the motion is known in advance, that is to say, since the motion parameter corresponding to the correct solution is the unit matrix, the motion estimation accuracy is evaluated by the position error caused by the estimated motion parameter (see Non-Patent Document 15).
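One natural form of such a position error, sketched here under the assumption that it is measured as the RMS displacement of ROI points under the estimated homography (the exact definition in Non-Patent Document 15 may differ), is:

```python
import numpy as np


def position_error(H_est, roi_points):
    """RMS displacement between each ROI point and its image under the
    estimated homography H_est. When the correct solution is the unit
    matrix, this displacement is exactly the estimation error."""
    pts = np.hstack([roi_points, np.ones((len(roi_points), 1))])  # homogeneous
    proj = pts @ H_est.T
    proj = proj[:, :2] / proj[:, 2:3]        # dehomogenize
    return float(np.sqrt(np.mean(np.sum((proj - roi_points) ** 2, axis=1))))
```

For instance, an estimate that is a pure translation by (3, 4) pixels yields a position error of exactly 5 pixels for every point.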
In Experiment 2, the tracking object is a globe whose diameter is about 14 cm, and an image sequence consisting of 300 frames, capturing the globe being turned slowly left and right by hand, is used. In Experiment 2, the size of ROI is 100 [pixel]×100 [pixel].
Through Experiment 2, we confirmed that, even if the tracking object has nonplanar geometry, robust tracking is possible by the region selection two step registration method of the present invention.
In Experiment 3, the tracking object is a person's face captured in a room; since the direction of the face changes under the fixed room illumination, illumination variation exists. Furthermore, in Experiment 3, the face is not only nonplanar but its geometry also changes slightly. An image sequence consisting of 600 frames capturing the face is used, and the size of ROI is 90 [pixel]×100 [pixel].
Through Experiment 3, we confirmed that, even if the tracking object is one that is not usually handled by the conventional region-based registration method, robust tracking is possible by the region selection two step registration method of the present invention.
In addition, in the embodiment of the present invention described above, when conducting the image registration, the transformation of the image is performed based on the planar projective transformation. However, the planar projective transformation referred to in the present invention includes not only the planar projective transformation itself but also the affine transformation, the scale transformation, the rotation transformation and the parallel translation as particular cases of the planar projective transformation, as well as transformations obtained by combining these.
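All of the special cases named above can be written as 3×3 planar projective transformation (homography) matrices; the following illustrative sketch constructs each, and any matrix product of them is again a planar projective transformation. A general planar projective transformation additionally has a nonzero bottom row (h31, h32), which none of these special cases uses.

```python
import numpy as np


def parallel_translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], float)


def scale(s):
    return np.array([[s, 0, 0], [0, s, 0], [0, 0, 1]], float)


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)


def affine(a11, a12, a21, a22, tx, ty):
    return np.array([[a11, a12, tx], [a21, a22, ty], [0, 0, 1]], float)
```

For example, `parallel_translation(3, 4) @ rotation(0.1)` is a rigid motion, itself a particular planar projective transformation, and applying a matrix to a homogeneous point `(x, y, 1)` realizes the warp W(x;h).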
The image registration method of the present invention is a region-selection-based image registration method that uses a mask image representing the region which can be approximated with a high degree of accuracy by the planar projective transformation model.
The image registration method of the present invention is characterized in that this mask image, representing the region where registration by the planar projective transformation is precise, is utilized when estimating the transformation parameter (the motion parameter).
Actually, when conducting registration by the image registration method of the present invention, its major feature is to perform a first step registration and a second step registration. In the first step registration, the mask image for tracking is generated and, at the same time, registration based on the motion between adjacent frames is conducted, thereby realizing registration with robustness to illumination variation and occlusion. In the second step registration, the mask image for error correction is generated between the frame of interest (the image of interest), transformed by the motion parameter estimated in the first step registration, and the reference frame (the reference image), and at the same time the motion parameter of the image of interest with respect to the reference image is estimated again by using the generated mask image for error correction; this compensates the accumulated computation error included in the motion parameter estimated in the first step registration, i.e. realizes high-precision registration.
According to the image registration method of the present invention having the above feature, since the registration between the image of interest and the reference image is conducted by performing such a two step registration, it is possible to obtain the originally superior effects of the present invention, namely to realize high-precision image registration, robust to illumination variation and occlusion, for objects of any geometry, not only planar geometry.
Number | Date | Country | Kind |
---|---|---|---|
2006-80784 | Mar 2006 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2007/057011 | 3/23/2007 | WO | 00 | 9/18/2008 |