Hand/eye calibration method using projective invariant shape descriptor of 2-dimensional image

BACKGROUND OF THE INVENTION

This application claims priority from Korean Patent Application No. 2002-72695, filed on 21 Nov. 2002, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.

1. Field of the Invention

The present invention relates to visual servoing technology, and more particularly, to a hand/eye calibration method using a projective invariant shape descriptor of a two-dimensional image.

2. Description of the Related Art

Hand/eye calibration denotes a procedure for determining the spatial transformation between a robot hand and a camera mounted on the robot hand, so that a desired image can be obtained in the robot's coordinate frame using visual servoing technology and the robot can be controlled. One of the most frequently used hand/eye calibration methods is to provide a priori motion information and to obtain the desired information from the image transformations generated based on the provided motion information. In order to easily and correctly extract transformation information of the robot hand with this method, it is very important to correctly select corresponding points between the transformed images.

SUMMARY OF THE INVENTION

The present invention provides a hand/eye calibration method which makes it possible to easily and correctly extract transformation information between a robot hand and a camera.

The present invention also provides a computer readable medium having embodied thereon a computer program for the hand/eye calibration method.

According to an aspect of the present invention, there is provided a hand/eye calibration method. The method includes (a) calculating a projective invariant shape descriptor from at least two images consecutively obtained through a camera mounted on a robot hand; (b) extracting corresponding points between the images by using the projective invariant shape descriptor; (c) calculating a rotation matrix for the corresponding points from translation of the robot; (d) calculating a translation vector for the corresponding points from translation and rotation of the robot; and (e) finding a relation between the robot hand and the camera based on the rotation matrix calculated in step (c) and the translation vector calculated in step (d).

According to another aspect of the present invention, there is provided a method of extracting corresponding points between images. The method includes (a) defining errors for a projective invariant shape descriptor for a two-dimensional image from at least two images obtained at a predetermined interval and calculating a noisy invariant; (b) calculating a threshold to be used to set corresponding points according to the noisy invariant; (c) extracting boundary data from the images and representing the extracted boundary data by subsampling N data; (d) minimizing the projective invariant shape descriptor; (e) transforming the following image into a previous image according to the minimized projective invariant shape descriptor; (f) resetting the distance between boundary data in consideration of the ratio of the distance between boundary data before the transformation to the distance between boundary data after the transformation; and (g) finding similarities between the boundary data and extracting corresponding points between the previous image and the following image.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:

FIG. 1 is a view of a robot hand on which a camera is mounted, the robot hand being controlled using visual servoing;

FIG. 2 is a view of an appearance of the camera of FIG. 1;

FIG. 3 is a view of a pin-hole camera model of the camera of FIG. 2;

FIG. 4 is a view of coordinate systems of an object plane and an image plane, and a projection result of the image plane to the object plane;

FIG. 5 is a view showing conditions to satisfy a linearity of the camera of FIG. 2;

FIG. 6 is a view presenting a relationship for a coordinate system of an object plane, a coordinate system on an image plane, and a world coordinate system of a robot hand;

FIG. 7 is a view for explaining a method of obtaining a projective invariant shape descriptor from a 2-dimensional image;

FIG. 8 is a view showing an example of a projective invariant shape descriptor calculated from the 2-dimensional shape of FIG. 7;

FIG. 9 is a flowchart of a method for hand/eye calibration according to an embodiment of the present invention; and

FIG. 10 is a detailed flowchart of a method of obtaining corresponding points between a previous image and a following image shown in FIG. 9.

DETAILED DESCRIPTION OF THE INVENTION

The present invention will now be described more fully with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. In the drawings, like reference numerals refer to like elements throughout.

FIG. 1 is a view of a robot hand 100 on which a charge-coupled device (CCD) camera 110 is mounted to adopt a visual servoing control technique. Referring to FIG. 1, at least one CCD camera 110 is mounted at an end of the robot hand 100 used as an industrial robot.

FIG. 2 is a view of an appearance of the CCD camera 110 of FIG. 1. Referring to FIG. 2, the CCD camera 110 is composed of a main body 111, a CCD array 112, and a lens 114. The lens 114 has the same function as that of the crystalline lens of a person's eye. The CCD array 112 corresponds to an image plane on which an image projected through the lens 114 of the CCD camera 110 is cast and has the same function as that of the retina of a person's eye. When the focus of the lens 114 is set to infinity, the length from the center of the lens 114 to the CCD array 112 denotes a focal length F, and the image looks different according to the focal length F. The focal length F is an important parameter necessary to estimate the distance between the CCD camera 110 and an object.

FIG. 3 is a view of a pin-hole camera model of the CCD camera 110 of FIG. 2, and FIG. 4 is a view of coordinate systems of an object plane and an image plane, and a projection result of the image plane to the object plane. Referring to FIGS. 3 and 4, a projective transformation for an image in the pin-hole camera model can be expressed as follows.
$\begin{matrix} [\begin{matrix} u \\ v \\ 1 \end{matrix}] = \frac{1}{t_{31} X + t_{32} Y + t_{33} Z + t_{34}} [\begin{matrix} t_{11} & t_{12} & t_{13} & t_{14} \\ t_{21} & t_{22} & t_{23} & t_{24} \\ t_{31} & t_{32} & t_{33} & t_{34} \end{matrix}] [\begin{matrix} X \\ Y \\ Z \\ 1 \end{matrix}] & (1) \end{matrix}$

where (u, v, 1) denotes the coordinates of a point q defined on the image plane, (X, Y, Z, 1) denotes the coordinates of a point P in an object coordinate system, and t_ij denotes the ij-th element of a transformation matrix between an object plane and the image plane.

Here, if an object lies on a two-dimensional plane, i.e., Z=0, Equation 1 is transformed as follows.
$\begin{matrix} [\begin{matrix} u \\ v \\ 1 \end{matrix}] = \frac{1}{t_{31} X + t_{32} Y + t_{34}} [\begin{matrix} t_{11} & t_{12} & t_{14} \\ t_{21} & t_{22} & t_{24} \\ t_{31} & t_{32} & t_{34} \end{matrix}] [\begin{matrix} X \\ Y \\ 1 \end{matrix}] & (2) \end{matrix}$
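As an illustration only, Equation 2 can be evaluated for a single object-plane point as in the following minimal Python sketch. The function name `project_planar` and the numerical values of the matrix are hypothetical and are not taken from the disclosure.

```python
def project_planar(T, X, Y):
    """Apply Equation 2: map an object-plane point (X, Y) to image coordinates (u, v).

    T is the 3x3 matrix [[t11, t12, t14], [t21, t22, t24], [t31, t32, t34]].
    """
    # Common denominator t31*X + t32*Y + t34 of Equation 2.
    w = T[2][0] * X + T[2][1] * Y + T[2][2]
    if w == 0:
        raise ValueError("point maps to infinity (denominator is zero)")
    u = (T[0][0] * X + T[0][1] * Y + T[0][2]) / w
    v = (T[1][0] * X + T[1][1] * Y + T[1][2]) / w
    return u, v


# Hypothetical matrix with a perspective term in its last row.
T_example = [[2.0, 0.0, 1.0],
             [0.0, 2.0, 3.0],
             [0.0, 0.5, 1.0]]
u, v = project_planar(T_example, 1.0, 2.0)
```

Because the denominator depends on X and Y, the mapping is nonlinear in the object coordinates, which is the property discussed in the following paragraphs.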

As shown in Equations 1 and 2, the process for obtaining an image is performed in a non-linear environment. However, a linearized projective transformation is often applied to a two-dimensional image obtained through the CCD camera 110 rather than a non-linear projective transformation like that of Equation 2.

FIG. 5 is a view showing conditions to obtain a linear model of the CCD camera 110 of FIG. 2. As shown in FIG. 5, if a length S from the CCD camera 110 to the object is sufficiently longer than a size S₀ of the object, the non-linear projective transformation of Equation 2 is transformed as follows.
$\begin{matrix} [\begin{matrix} u \\ v \\ 1 \end{matrix}] \approx S [\begin{matrix} t_{11} & t_{12} & t_{14} \\ t_{21} & t_{22} & t_{24} \\ t_{31} & t_{32} & t_{34} \end{matrix}] [\begin{matrix} X \\ Y \\ 1 \end{matrix}] & (3) \end{matrix}$

A Fourier descriptor is a linearized shape descriptor which satisfies Equations 1, 2, and 3. The Fourier descriptor represents an image of the object with Fourier coefficients which are obtained by a two-dimensional Fourier transformation of the image contour of a two-dimensional object. However, this method can be applied only to a case where linearity of a camera is guaranteed, that is, where the distance between the CCD camera 110 and the object is sufficiently long. Therefore, to overcome this restriction, the image obtained from the CCD camera 110 is analyzed by using a projective invariant shape descriptor I in the present invention. As a result, even in a case where the linearity of the camera is not guaranteed, that is, where the distance between the CCD camera 110 and the object is not long, the image can be analyzed correctly without being affected by noise, slant angles, or the nonlinearity of the CCD camera 110 occurring when images are obtained.

FIG. 6 is a view presenting a relationship for a coordinate system of an object plane, a coordinate system on an image plane, and a world coordinate system of a robot hand. Referring to FIG. 6, a coordinate system of the CCD camera 110 corresponds to a world coordinate system of the robot hand 100 after a rotation and a translation. Thus, a robot hand/eye calibration is a process of finding the elements, a rotation matrix R, which gives a direction of the CCD camera 110, and components of a translation vector t, which gives a location of the CCD camera 110, in the world coordinate system.

In the hand/eye calibration method according to the present invention, the pin-hole camera model is used, in which distortion of the lens 114 or misalignment of the optical axis can be ignored. The relation between the robot hand 100 of FIG. 6 and the CCD camera 110 can be expressed as follows,

X_h=RX_c+t (4)

where X_h denotes the world coordinate system of the robot hand 100, X_c denotes a coordinate system of the CCD camera 110, R denotes a rotation matrix, and t denotes a translation vector.

The relation between the CCD camera 110 and the image can be expressed as follows.
$\begin{matrix} (u - u_{0}) = \frac{f}{S_{x}} \frac{x_{c}}{z_{c}}, (v - v_{0}) = \frac{f}{S_{y}} \frac{y_{c}}{z_{c}} & (5) \end{matrix}$

where u and u₀ denote X coordinates on the image plane, and v and v₀ denote Y coordinates on the image plane. In addition, f denotes a focal length between the lens 114 and the CCD array 112, and S_x and S_y denote scale factors of the CCD camera 110. The focal length f and the scale factors S_x and S_y are characteristic values which indicate intrinsic characteristics of the CCD camera 110, and they are fixed according to the specification of the CCD camera 110.

If robot motion information already known to the user, X_p1=R_p1X_p2+t_p1, is introduced into Equation 4, the following equation is obtained.
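A minimal Python sketch of Equation 5 is given below for illustration. The function name `camera_to_pixel` and all numerical values are hypothetical; the sketch also assumes that the point lies in front of the camera (z_c > 0), a condition the text does not state explicitly.

```python
def camera_to_pixel(xc, yc, zc, f, sx, sy, u0, v0):
    """Equation 5: project a camera-frame point (xc, yc, zc) to image coordinates (u, v)."""
    if zc <= 0:
        raise ValueError("this sketch assumes the point is in front of the camera (zc > 0)")
    u = u0 + (f / sx) * (xc / zc)   # (u - u0) = (f / S_x) * (x_c / z_c)
    v = v0 + (f / sy) * (yc / zc)   # (v - v0) = (f / S_y) * (y_c / z_c)
    return u, v


# Hypothetical camera: f = 8, S_x = S_y = 0.01, so that f/S_x = f/S_y = 800.
u, v = camera_to_pixel(0.1, -0.05, 1.0, f=8.0, sx=0.01, sy=0.01, u0=320.0, v0=240.0)
```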

RX_c1+t=R_p1(RX_c2+t)+t_p1 (6)

The motion of the CCD camera 110, X_c1, can be expressed as follows by using Equation 6.
$\begin{matrix} X_{c1} = \underbrace{(R^{- 1} R_{p1} R)}_{R_{c1}} X_{c2} + \underbrace{R^{- 1} (R_{p1} t + t_{p1} - t)}_{t_{c1}} & (7) \end{matrix}$

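If the robot motion in Equation 7 includes no rotation and only a translation is considered, i.e., R_p1 is the identity matrix, the translation term of Equation 7 reduces to the following relation, from which the rotation matrix R can be obtained.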

t_c1=R⁻¹t_p1 (8)

Equation 8 can be expressed by substituting t_p1 with three motion vectors of the robot, t_p1, t_p2, and t_p3, as follows.

(t_c′1, t_c′2, t_c′3)=R⁻¹(t_p1, t_p2, t_p3) (9)
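For illustration, Equation 9 can be solved for R by arranging the three motion vectors as the columns of 3×3 matrices, so that R equals the robot-motion matrix multiplied by the inverse of the camera-motion matrix. The following Python sketch assumes noise-free vectors that are not coplanar; the function names are hypothetical, and with measured data the result would generally need to be re-orthogonalized, which is not shown.

```python
def inverse3(m):
    """Inverse of a 3x3 matrix (list of rows) via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("motion vectors are (nearly) coplanar; R is not determined")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def matmul3(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rotation_from_translations(tp, tc):
    """tp, tc: three robot and three camera translation vectors satisfying Equation 9."""
    Tp = [[tp[c][r] for c in range(3)] for r in range(3)]   # vectors stored as columns
    Tc = [[tc[c][r] for c in range(3)] for r in range(3)]
    return matmul3(Tp, inverse3(Tc))                        # R = Tp * Tc^-1


# Synthetic check: the true R is a 90-degree rotation about the z axis.
tp_example = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tc_example = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
R_est = rotation_from_translations(tp_example, tc_example)
```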

Here, the image vectors corresponding to the three motion vectors of the robot hand, t_p1, t_p2, and t_p3, are OF₁, OF₂, and OF₃, and each image vector is defined by the following Equation 10,
$\begin{matrix} {OF}_{i} = f [\begin{matrix} \frac{(u_{i} - u_{0})}{f_{x}}, & \frac{(v_{i} - v_{0})}{f_{y}}, 1 \end{matrix}] & (10) \end{matrix}$

where
$f_{x} = \frac{f}{S_{x}}$ and $f_{y} = \frac{f}{S_{y}}.$

Intrinsic parameters can be calculated as follows.

$\begin{matrix} \begin{matrix} [{OF}_{1} \cdot {OF}_{2} = 0]: & \frac{1}{f_{x}^{2}} (u_{1} - u_{0}) (u_{2} - u_{0}) + \frac{1}{f_{y}^{2}} (v_{1} - v_{0}) (v_{2} - v_{0}) + 1 = 0 \\ [{OF}_{1} \cdot {OF}_{3} = 0]: & \frac{1}{f_{x}^{2}} (u_{1} - u_{0}) (u_{3} - u_{0}) + \frac{1}{f_{y}^{2}} (v_{1} - v_{0}) (v_{3} - v_{0}) + 1 = 0 \\ [{OF}_{2} \cdot {OF}_{3} = 0]: & \frac{1}{f_{x}^{2}} (u_{2} - u_{0}) (u_{3} - u_{0}) + \frac{1}{f_{y}^{2}} (v_{2} - v_{0}) (v_{3} - v_{0}) + 1 = 0 \end{matrix} & (11) \end{matrix}$

Equation 11 can be expressed as follows,

u₀(u₂−u₃)+s₁(v₂−v₃)−s₂v₁(v₂−v₃)=u₁(u₂−u₃)
u₀(u₁−u₃)+s₁(v₁−v₃)−s₂v₂(v₁−v₃)=u₂(u₁−u₃) (12)

where
$s_{1} = v_{0} \frac{f_{x}^{2}}{f_{y}^{2}}$ and $s_{2} = \frac{f_{x}^{2}}{f_{y}^{2}}.$
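As a brief sketch of the intermediate step, which the text does not spell out, subtracting the [OF₁·OF₃=0] relation of Equation 11 from the [OF₁·OF₂=0] relation cancels the constant term:

$\frac{1}{f_{x}^{2}} (u_{1} - u_{0}) (u_{2} - u_{3}) + \frac{1}{f_{y}^{2}} (v_{1} - v_{0}) (v_{2} - v_{3}) = 0$

Multiplying by $f_{x}^{2}$, expanding, and using the definitions of s₁ and s₂ gives

$u_{1} (u_{2} - u_{3}) - u_{0} (u_{2} - u_{3}) + s_{2} v_{1} (v_{2} - v_{3}) - s_{1} (v_{2} - v_{3}) = 0$

which rearranges to the first line of Equation 12. The second line follows in the same way by subtracting the [OF₂·OF₃=0] relation from the [OF₁·OF₂=0] relation.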

Equation 12 can be expressed in a matrix form as follows.
$\begin{matrix} [\begin{matrix} (u_{2} - u_{3}) & (v_{2} - v_{3}) & - v_{1} (v_{2} - v_{3}) \\ (u_{1} - u_{3}) & (v_{1} - v_{3}) & - v_{2} (v_{1} - v_{3}) \\ (u_{2}^{'} - u_{3}^{'}) & (v_{2}^{'} - v_{3}^{'}) & - v_{1}^{'} (v_{2}^{'} - v_{3}^{'}) \\ (u_{1}^{'} - u_{3}^{'}) & (v_{1}^{'} - v_{3}^{'}) & - v_{2}^{'} (v_{1}^{'} - v_{3}^{'}) \end{matrix}] [\begin{matrix} u_{0} \\ s_{1} \\ s_{2} \end{matrix}] = [\begin{matrix} u_{1} (u_{2} - u_{3}) \\ u_{2} (u_{1} - u_{3}) \\ u_{1}^{'} (u_{2}^{'} - u_{3}^{'}) \\ u_{2}^{'} (u_{1}^{'} - u_{3}^{'}) \end{matrix}] & (13) \end{matrix}$

In consideration of rotation and translation of the robot hand 100, the translation vector t between the robot hand 100 and the CCD camera 110 can be expressed as follows,

t_c1=R⁻¹(R_p1t+t_p1−t)
t=(R_p1−I)⁻¹(Rt_c1−t_p1) (14)

where (R_p1, t_p1) denotes motion information already known by the user, and R denotes the rotation matrix, which is calculated from three translational motions of the robot hand 100 as in Equation 9. t_c1 denotes an image vector, and I in Equation 14 denotes the identity matrix, which is distinct from the projective invariant shape descriptor I calculated from a two-dimensional image.

In order to improve the precision of the hand/eye calibration, it is very important to correctly set points corresponding to coordinates which are predetermined within the field of view of the CCD camera 110. Therefore, in the present invention, corresponding points are obtained by the CCD camera 110 by using the projective invariant shape descriptor, which does not vary under nonlinear transformation. Then, the corresponding points are used as calibration targets to conduct hand/eye calibration. The projective invariant shape descriptor I, which is used as a fundamental factor of the hand/eye calibration, can be defined as follows.
$\begin{matrix} I \equiv \frac{\det (q_{5} q_{1} q_{4}) \det (q_{5} q_{2} q_{3})}{\det (q_{5} q_{1} q_{3}) \det (q_{5} q_{2} q_{4})} = \frac{\det (P_{5} P_{1} P_{4}) \det (P_{5} P_{2} P_{3})}{\det (P_{5} P_{1} P_{3}) \det (P_{5} P_{2} P_{4})} & (15) \end{matrix}$

where P denotes points of the object, q denotes corresponding points of the image as shown in FIG. 3. det(·) in Equation 15 can be defined as follows.
$\begin{matrix} \begin{matrix} \det (q_{1} q_{2} q_{3}) = f [\begin{matrix} x_{1} & x_{2} & x_{3} \\ y_{1} & y_{2} & y_{3} \\ 1 & 1 & 1 \end{matrix}] \\ \det (P_{1} P_{2} P_{3}) = f [\begin{matrix} X_{1} & X_{2} & X_{3} \\ Y_{1} & Y_{2} & Y_{3} \\ 1 & 1 & 1 \end{matrix}] = 2^{k} (Area of Δ P_{1} P_{2} P_{3}) \end{matrix} & (16) \end{matrix}$

The projective invariant shape descriptor I expressed in Equations 15 and 16 represents information which does not vary under a nonlinear transformation such as that of Equation 2, and therefore does not vary even when the images obtained by the CCD camera 110 are transformed.
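The invariance stated above can be checked numerically with the following Python sketch, which evaluates Equation 15 for five points before and after a planar projective transformation. The point coordinates and the matrix `H_example` are hypothetical. Points are written as (x, y) with an implicit third coordinate of 1; any common scalar factor on the determinants cancels in the ratio.

```python
def det3(p, q, r):
    """Determinant of the 3x3 matrix with columns (x, y, 1) for the points p, q, r."""
    return p[0] * (q[1] - r[1]) - q[0] * (p[1] - r[1]) + r[0] * (p[1] - q[1])


def projective_invariant(p1, p2, p3, p4, p5):
    """Equation 15 for five coplanar points."""
    num = det3(p5, p1, p4) * det3(p5, p2, p3)
    den = det3(p5, p1, p3) * det3(p5, p2, p4)
    if den == 0:
        raise ValueError("degenerate configuration: three of the points are collinear")
    return num / den


def apply_homography(H, p):
    """Planar projective transformation of the point p = (x, y), as in Equation 2."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical object points and transformation.
points = [(0.0, 0.0), (2.0, 0.1), (3.0, 2.0), (1.0, 3.5), (-1.0, 1.5)]
H_example = [[1.1, 0.2, 3.0], [-0.1, 0.9, 1.0], [0.01, 0.02, 1.0]]

I_object = projective_invariant(*points)
I_image = projective_invariant(*[apply_homography(H_example, p) for p in points])
```

In this sketch `I_object` and `I_image` agree up to floating-point rounding, although the individual point coordinates differ considerably.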

FIG. 7 is a view for explaining a method of obtaining a projective invariant shape descriptor from a 2-dimensional image. Referring to FIG. 7, a contour of the two-dimensional image of a drawing is extracted, and the extracted contour is divided into five similar intervals. Coordinates of the points (X₁(1), X₁(k), X₂(1), X₂(k), X₃(1), X₃(k), X₄(1), X₄(k), X₅(1), X₅(k)) which constitute each interval are obtained, and then the projective invariant shape descriptor is calculated. The points (X₁(1), X₁(k), X₂(1), X₂(k), X₃(1), X₃(k), X₄(1), X₄(k), X₅(1), X₅(k)) are input into Equation 15 recursively while being moved along the contour in steps of 1/N times the length of the contour until each point returns to its initial location.
$\begin{matrix} I (k) = \frac{\det (X_{5} X_{1} X_{4}) \det (X_{5} X_{2} X_{3})}{\det (X_{5} X_{1} X_{3}) \det (X_{5} X_{2} X_{4})} & (17) \end{matrix}$

where
$\begin{matrix} X_{1} (k) = (X (k), Y (k), 1), \\ X_{2} (k) = (X (\frac{N}{5} + k), Y (\frac{N}{5} + k), 1), \\ X_{3} (k) = (X (\frac{2 N}{5} + k), Y (\frac{2 N}{5} + k), 1), \\ X_{4} (k) = (X (\frac{3 N}{5} + k), Y (\frac{3 N}{5} + k), 1), \\ X_{5} (k) = (X (\frac{4 N}{5} + k), Y (\frac{4 N}{5} + k), 1), \end{matrix}$

1≦k≦N, and X(k) and Y(k) denote the X-axis and Y-axis coordinate functions of the contour.
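The evaluation of Equation 17 along a closed contour can be sketched in Python as follows. The sketch uses 0-based indices instead of 1≦k≦N, assumes that N is a multiple of 5 and that the indices wrap around the closed contour, and uses a hypothetical test contour (an unevenly sampled ellipse); none of these choices is prescribed by the text.

```python
import math


def det3(p, q, r):
    """Determinant of the 3x3 matrix with columns (x, y, 1) for the points p, q, r."""
    return p[0] * (q[1] - r[1]) - q[0] * (p[1] - r[1]) + r[0] * (p[1] - q[1])


def invariant_signature(contour):
    """I(k) of Equation 17 for every start index k of a closed contour."""
    n = len(contour)
    if n < 5 or n % 5 != 0:
        raise ValueError("this sketch assumes the number of contour points is a multiple of 5")
    step = n // 5
    signature = []
    for k in range(n):
        # X1(k) ... X5(k): five points spaced N/5 samples apart, wrapping around.
        x1, x2, x3, x4, x5 = (contour[(k + j * step) % n] for j in range(5))
        num = det3(x5, x1, x4) * det3(x5, x2, x3)
        den = det3(x5, x1, x3) * det3(x5, x2, x4)
        signature.append(num / den)  # assumes no three of the five points are collinear
    return signature


def sample_contour(n):
    """Hypothetical test contour: an ellipse sampled at uneven parameter steps."""
    pts = []
    for k in range(n):
        a = 2.0 * math.pi * k / n
        theta = a + 0.3 * math.sin(a)
        pts.append((3.0 * math.cos(theta), 2.0 * math.sin(theta)))
    return pts


def apply_homography(H, p):
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


contour = sample_contour(20)
H_example = [[1.1, 0.2, 3.0], [-0.1, 0.9, 1.0], [0.01, 0.02, 1.0]]
warped = [apply_homography(H_example, p) for p in contour]
sig_original = invariant_signature(contour)
sig_warped = invariant_signature(warped)
```

For this synthetic contour the two signatures coincide sample by sample up to rounding, because the same contour samples are used before and after the transformation.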

An example of a projective invariant shape descriptor calculated from the 2-dimensional shape of FIG. 7 is shown in FIG. 8. The calculation result of the projective invariant shape descriptor I of FIG. 8, i.e., the projective invariant, maintains its intrinsic value even though the shape of the image obtained by the CCD camera 110 is transformed. In addition, the projective invariant shape descriptor I is not affected by noise or slant angles. Thus, if the projective invariant is used for the hand/eye calibration, the precision of the calibration can be improved.

Corresponding points of the images obtained by the CCD camera 110 by using the projective invariant shape descriptor I can be extracted as follows.

In order to extract corresponding points of the images, errors in the projective invariant shape descriptor I have to be defined. In the present invention, the errors are defined using a Gaussian noise model. In order to use the Gaussian noise model, Equation 17 can be expressed as follows.
$\begin{matrix} I = \frac{\det (X_{5} X_{1} X_{4}) \det (X_{5} X_{2} X_{3})}{\det (X_{5} X_{1} X_{3}) \det (X_{5} X_{2} X_{4})} & (18) \end{matrix}$

where X_i=(x_i, y_i, 1)^T, or I=I(x₁, y₁, x₂, y₂, x₃, y₃, x₄, y₄, x₅, y₅).

Here, if (x_i, y_i) is true data, and (x̃_i, ỹ_i) is a noisy observation parameter, the noisy observation parameter can be expressed as follows,

x̃_i=x_i+ξ_i, ỹ_i=y_i+η_i (19)

where ξ_i and η_i are noise terms whose mean and variance are 0 and σ_i². The noise terms can be expressed as follows.
$\begin{matrix} \begin{matrix} E [ξ_{i}] = E [η_{i}] = 0, \\ V [ξ_{i}] = V [η_{i}] = σ_{i}^{2}, \\ E [ξ_{i} ξ_{j}] = \begin{cases} σ_{0}^{2} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}, \quad E [η_{i} η_{j}] = \begin{cases} σ_{0}^{2} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}, \\ E [ξ_{i} η_{j}] = 0 \end{matrix} & (20) \end{matrix}$

The noisy invariant can be expressed as follows after noisy measurements on the image are observed.

Ĩ(x̃₁, ỹ₁, x̃₂, ỹ₂, x̃₃, ỹ₃, x̃₄, ỹ₄, x̃₅, ỹ₅) (21)

In order to calculate an expected value and a variance of the noisy invariant Ĩ, the noisy invariant Ĩ can be expanded about (x₁, y₁, x₂, y₂, x₃, y₃, x₄, y₄, x₅, y₅) by using a Taylor series.
$\begin{matrix} \begin{matrix} \tilde{I} \approx I + \sum_{i = 1}^{5} [({\tilde{x}}_{i} - x_{i}) \frac{\partial \tilde{I}}{\partial {\tilde{x}}_{i}} + ({\tilde{y}}_{i} - y_{i}) \frac{\partial \tilde{I}}{\partial {\tilde{y}}_{i}}] \\ = I + \sum_{i = 1}^{5} [ξ_{i} \frac{\partial \tilde{I}}{\partial {\tilde{x}}_{i}} + η_{i} \frac{\partial \tilde{I}}{\partial {\tilde{y}}_{i}}] \end{matrix} & (22) \end{matrix}$

Here, the variance can be expressed as follows.
$\begin{matrix} E [{(\tilde{I} - I)}^{2}] = σ_{0}^{2} \sum_{i = 1}^{5} [{(\frac{\partial \tilde{I}}{\partial {\tilde{x}}_{i}})}^{2} + {(\frac{\partial \tilde{I}}{\partial {\tilde{y}}_{i}})}^{2}] & (23) \end{matrix}$

A threshold of the noisy invariant can be defined as follows.

ΔI=3×√(E[(Ĩ−I)²]) (24)
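Equations 23 and 24 can be evaluated numerically as in the following Python sketch. As an assumption of the sketch, the partial derivatives are approximated by central finite differences at the supplied coordinates rather than derived analytically; the function names, the step size `h`, and the sample points are hypothetical.

```python
import math


def det3(p, q, r):
    """Determinant of the 3x3 matrix with columns (x, y, 1) for the points p, q, r."""
    return p[0] * (q[1] - r[1]) - q[0] * (p[1] - r[1]) + r[0] * (p[1] - q[1])


def invariant(pts):
    """Equation 18 for a list of five (x, y) points."""
    p1, p2, p3, p4, p5 = pts
    return (det3(p5, p1, p4) * det3(p5, p2, p3)) / (det3(p5, p1, p3) * det3(p5, p2, p4))


def to_points(flat):
    """Regroup [x1, y1, ..., x5, y5] into five (x, y) tuples."""
    return [(flat[2 * i], flat[2 * i + 1]) for i in range(5)]


def invariance_threshold(pts, sigma0, h=1e-6):
    """Delta I of Equation 24, with the variance taken from Equation 23."""
    flat = [c for p in pts for c in p]
    grad_sq = 0.0
    for j in range(len(flat)):
        plus, minus = flat[:], flat[:]
        plus[j] += h
        minus[j] -= h
        # Central-difference estimate of the partial derivative with respect to coordinate j.
        d = (invariant(to_points(plus)) - invariant(to_points(minus))) / (2.0 * h)
        grad_sq += d * d
    variance = sigma0 ** 2 * grad_sq      # Equation 23
    return 3.0 * math.sqrt(variance)      # Equation 24


# Hypothetical points and noise levels.
points = [(0.0, 0.0), (2.0, 0.1), (3.0, 2.0), (1.0, 3.5), (-1.0, 1.5)]
delta_half = invariance_threshold(points, sigma0=0.5)
delta_one = invariance_threshold(points, sigma0=1.0)
```

Since the variance of Equation 23 is proportional to σ₀², the threshold grows linearly with the noise standard deviation.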

The corresponding points are found from the images obtained by the CCD camera 110 of the robot hand 100 by repeating calculation of the projective invariant shape descriptor, and boundary data between a previous image and a following image consecutively obtained through the CCD camera 110 can be expressed as follows.

O_k^In={X_Ik^In, Y_Ik^In}, k=1˜n^In
O_k^Mo={X_Ik^Mo, Y_Ik^Mo}, k=1˜n^Mo (25)

where n^In and n^Mo denote the numbers of points on the boundary of a scene and on the boundary of a model, respectively. The boundary data are represented by subsampling N data, and this subsampling can be expressed as follows.

q_i^In={X_τ1(i)^In,Y_τ1(i)^In},q_i^Mo={X_τ2(i)^Mo,Y_τ2(i)^Mo}, i=1˜N (26)

where
$τ_{1} (i) = \frac{n^{In}}{N} \times i, τ_{2} (i) = \frac{n^{Mo}}{N} \times i,$

and N denotes the number of points on a normalized contour.
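The subsampling of Equation 26 can be sketched in Python as follows. The text does not state how a non-integer value of τ(i) is handled; truncation to an integer index is an assumption of this sketch, as is the function name.

```python
def subsample_boundary(boundary, n_samples):
    """Pick N boundary points at indices tau(i) = (n / N) * i for i = 1..N (Equation 26)."""
    n = len(boundary)
    if not 0 < n_samples <= n:
        raise ValueError("N must be between 1 and the number of boundary points")
    samples = []
    for i in range(1, n_samples + 1):
        tau = (n * i) // n_samples        # 1-based index, truncated (assumption)
        samples.append(boundary[tau - 1])  # convert to a 0-based list index
    return samples


# Hypothetical boundary with 10 points, subsampled to N = 5.
boundary_example = [(k, k * k) for k in range(10)]
subsampled = subsample_boundary(boundary_example, 5)
```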

Then, a projective invariant shape descriptor is calculated by using q_i^In and q_i^Mo defined in Equation 26 when a value of the following Equation is minimized,
$\begin{matrix} ɛ^{2} = \sum_{i = 1}^{N} {w_{i}^{2} (q_{i}^{In} - {(C \cdot q_{i}^{Mo} + d)}^{- 1} (A q_{i}^{Mo} + b))}^{T} (q_{i}^{In} - {(C \cdot q_{i}^{Mo} + d)}^{- 1} (A q_{i}^{Mo} + b)) & (27) \end{matrix}$

where A, b, C, and d denote parameters defined by the transformation between the previous image and the following image and can be expressed as follows.
$\begin{matrix} [\begin{matrix} X_{Ij}^{2} \\ Y_{Ij}^{2} \\ 1 \end{matrix}] = {(s_{7}^{'} X_{Ij}^{1} + s_{8}^{'} Y_{Ij}^{1} + s_{9})}^{- 1} ([\begin{matrix} s_{1} & s_{2}^{'} \\ s_{4}^{'} & s_{5} \end{matrix}] [\begin{matrix} X_{Ij}^{1} \\ Y_{Ij}^{1} \end{matrix}] + [\begin{matrix} s_{3}^{'} \\ s_{6}^{'} \end{matrix}]) & (28) \\ q_{i}^{2} = {(C \cdot q_{i}^{1} + d)}^{- 1} (A q_{i}^{1} + b) & (29) \end{matrix}$

The weight w_i of Equation 27 is calculated from the variance defined in Equation 23 as follows.
$\begin{matrix} \frac{1}{w_{i}^{2}} \equiv σ_{0}^{2} \sum_{j = 1}^{5} [{(\frac{\partial {\tilde{I}}_{i}}{\partial {\tilde{x}}_{j}})}^{2} + {(\frac{\partial {\tilde{I}}_{i}}{\partial {\tilde{y}}_{j}})}^{2}] = E [{({\tilde{I}}_{i} - I_{i})}^{2}] & (30) \end{matrix}$

The projective invariant shape descriptors can be minimized by Equations 27 through 30 as follows.
$\begin{matrix} P = {(Q^{T} Q)}^{- 1} Q^{T} H & (31) \end{matrix}$

Here,
$P = {(s_{1}, s_{2}^{'}, s_{4}^{'}, s_{5}, s_{3}^{'}, s_{6}^{'}, s_{7}^{'}, s_{8}^{'})}^{T},$
$Q = [\begin{matrix} ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ w_{i} X_{Ii}^{Mo} & w_{i} Y_{Ii}^{Mo} & w_{i} & 0 & 0 & 0 & w_{i} X_{Ii}^{In} X_{Ii}^{Mo} & w_{i} X_{Ii}^{In} Y_{Ii}^{Mo} \\ 0 & 0 & 0 & w_{i} X_{Ii}^{Mo} & w_{i} Y_{Ii}^{Mo} & w_{i} & w_{i} X_{Ii}^{In} X_{Ii}^{Mo} & w_{i} X_{Ii}^{In} Y_{Ii}^{Mo} \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \end{matrix}],$
$H = {(\dots, - w_{i} X_{Ii}^{In}, - w_{i} Y_{Ii}^{In}, \dots)}^{T}.$

After the projective invariant shape descriptors are minimized by using Equation 31, the following image obtained by the CCD camera 110 is transformed into a previous image. This transformation can be expressed as follows.

q_i^Mo′=(C·q_i^Mo+d)⁻¹(A q_i^Mo+b), i=1˜N (32)

where A, b, C, and d denote the parameters defining the transformation between the previous image and the following image.
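A weighted least-squares estimate of the transformation parameters, followed by the transformation of Equation 32, can be sketched in Python as shown below. This is only one possible arrangement: the sketch fixes s₉=1, orders the unknowns as (s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈), and uses its own sign convention, all of which are assumptions and differ from the ordering of P in Equation 31. The normal equations are solved by plain Gaussian elimination, and all names and numerical values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: points are in a degenerate configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_projective(model_pts, input_pts, weights=None):
    """Least-squares fit of q_in = (A q_mo + b) / (C q_mo + 1)."""
    if weights is None:
        weights = [1.0] * len(model_pts)
    rows, rhs = [], []
    for (x, y), (xi, yi), w in zip(model_pts, input_pts, weights):
        rows.append([w * x, w * y, w, 0.0, 0.0, 0.0, -w * x * xi, -w * y * xi])
        rhs.append(w * xi)
        rows.append([0.0, 0.0, 0.0, w * x, w * y, w, -w * x * yi, -w * y * yi])
        rhs.append(w * yi)
    # Normal equations (Q^T Q) p = Q^T h.
    qtq = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    qth = [sum(r[i] * h for r, h in zip(rows, rhs)) for i in range(8)]
    return solve_linear(qtq, qth)


def transform(params, pt):
    """Apply the fitted transformation to one point (Equation 32 with d = 1)."""
    s1, s2, s3, s4, s5, s6, s7, s8 = params
    x, y = pt
    w = s7 * x + s8 * y + 1.0
    return ((s1 * x + s2 * y + s3) / w, (s4 * x + s5 * y + s6) / w)


# Synthetic check with hypothetical parameters and model points.
true_params = [1.1, 0.2, 3.0, -0.1, 0.9, 1.0, 0.01, 0.02]
model_points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0),
                (0.5, 0.2), (0.3, 0.8), (-0.5, 0.4), (0.7, -0.6)]
input_points = [transform(true_params, p) for p in model_points]
estimated = fit_projective(model_points, input_points)
```

With noise-free synthetic data the estimated parameters reproduce the true ones up to rounding; with measured boundary data the weights w_i would be supplied from Equation 30.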

After the transformation between the previous image and the following image is completed, the ratio of the distance between boundary data before the transformation to the distance between boundary data after the transformation, i.e., τ₂′(i), is calculated by using the following Equation, and then the length between data is reset by using the ratio.
$\begin{matrix} \begin{matrix} \begin{matrix} τ_{2}^{'} (i) = \frac{\langle q_{i + 1}^{{Mo}^{'}} - q_{i}^{{Mo}^{'}} \rangle}{T}, & i = 1 \sim N \end{matrix} \\ T = \sum_{i = 1}^{N} \langle q_{i + 1}^{{Mo}^{'}} - q_{i}^{{Mo}^{'}} \rangle \end{matrix} & (33) \end{matrix}$

By using the ratio τ₂′(i) calculated by Equation 33, the following image can be resampled as follows.

q_i^In′=O_τ2(i)^In, i=1˜N (34)
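The ratio of Equation 33 can be sketched in Python as follows. The sketch assumes that the angle brackets denote the Euclidean distance between neighbouring points and that the contour is closed, so that the point after the N-th point is the first point again; the function name is hypothetical. How the ratio is then used to select the resampled indices in Equation 34 is not detailed in the text and is not implemented here.

```python
import math


def segment_ratios(points):
    """Normalized distance between consecutive contour points (Equation 33)."""
    n = len(points)
    lengths = []
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]   # wrap around: closed contour
        lengths.append(math.hypot(x1 - x0, y1 - y0))
    total = sum(lengths)                                        # T in Equation 33
    if total == 0:
        raise ValueError("all points coincide")
    return [length / total for length in lengths]


# Hypothetical transformed contour: the corners of a square with side length 2.
ratios = segment_ratios([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```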

After that, in order to bring the errors between the previous image and the following image within a predetermined range, Equations 29 through 34 are repeated. The errors are expressed by errors of the corresponding points and similarities between the projective invariant shape descriptors (I_m, I_i) of the boundary data. A similarity value of the projective invariant shape descriptors (I_m, I_i) of the boundary data can be expressed as follows.
$\begin{matrix} \begin{matrix} similarity (I_{m}, I_{i}) = \frac{\sum_{k = 1}^{N} T (k)}{N}, \\ T (k) = \begin{cases} 1, & \text{if } \langle I_{m} (k) - I_{i} (k) \rangle < Δ I \\ 0, & \text{otherwise} \end{cases} \end{matrix} & (35) \end{matrix}$

If the maximum value of the similarity values is greater than a predetermined threshold, it is determined that the corresponding points of the previous image and the following image are the same as each other. The value of ΔI in Equation 35 and the predetermined threshold are selected according to an environment to which the present invention is applied and a required precision.
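The similarity of Equation 35 can be sketched in Python as shown below, taking the angle brackets as the absolute value of the difference. The text does not state over which candidates the maximum similarity is taken; searching over cyclic shifts of the starting point on the closed contour is an assumption of this sketch, and the function names are hypothetical.

```python
def similarity(i_model, i_input, delta_i):
    """Fraction of samples whose invariants differ by less than delta_i (Equation 35)."""
    if len(i_model) != len(i_input) or not i_model:
        raise ValueError("signatures must be non-empty and of equal length")
    hits = sum(1 for a, b in zip(i_model, i_input) if abs(a - b) < delta_i)
    return hits / len(i_model)


def best_cyclic_similarity(i_model, i_input, delta_i):
    """Maximum similarity over all cyclic shifts of the input signature (assumption)."""
    n = len(i_input)
    best_value, best_shift = -1.0, 0
    for shift in range(n):
        shifted = [i_input[(k + shift) % n] for k in range(n)]
        value = similarity(i_model, shifted, delta_i)
        if value > best_value:
            best_value, best_shift = value, shift
    return best_value, best_shift


# Hypothetical signatures: the input is the model signature shifted by two samples.
score = similarity([1.0, 2.0, 3.0, 4.0], [1.05, 2.5, 3.0, 3.9], 0.2)
best = best_cyclic_similarity([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 1.0, 2.0], 0.2)
```

The best value would then be compared with the predetermined threshold mentioned above to decide whether the points correspond.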

FIG. 9 is a flowchart of a method for hand/eye calibration according to a preferred embodiment of the present invention. Referring to FIG. 9, the hand/eye calibration method according to the present invention includes calculating a projective invariant shape descriptor I from a two-dimensional image (step 210). Then, corresponding points between a previous image and a following image consecutively obtained through the CCD camera 110 are extracted by using the calculated projective invariant shape descriptor I (step 220). The extracted corresponding points are used as targets to perform the hand/eye calibration.

Then, a rotation matrix R for a coordinate, i.e., the extracted corresponding points, is calculated from a translation of the robot hand 100 (step 230). Here, the rotation matrix R is calculated by Equation 8. A translation vector t for the coordinate, i.e., the extracted corresponding points, is calculated from translation and rotation of the robot hand 100 (step 240). Here, the translation vector t is calculated by Equation 14. After completion of steps 230 and 240, the hand/eye calibration, which defines a relation between the robot hand 100 and the CCD camera 110, that is, obtains a calculation result of X_h=RX_c+t, is completed (step 250). Here, X_h denotes a coordinate system of the robot hand 100, and X_c denotes a coordinate system of the CCD camera 110 mounted on the robot hand 100.
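Step 240 can be sketched in Python as follows. Because every three-dimensional rotation matrix has an eigenvalue of 1, the matrix (R_p1−I) of Equation 14 is singular for a single robot motion. The sketch therefore assumes, beyond what the text states, that at least two robot motions with non-parallel rotation axes are available and solves the stacked relations (R_pk−I)t = R t_ck − t_pk in the least-squares sense. All names and numerical values are hypothetical.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("system is singular; use rotations about non-parallel axes")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x


def mat_vec(m, v):
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def hand_eye_translation(R, motions):
    """motions: iterable of (R_p, t_p, t_c); returns the least-squares translation t."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for Rp, tp, tc in motions:
        A = [[Rp[r][c] - (1.0 if r == c else 0.0) for c in range(3)] for r in range(3)]
        Rtc = mat_vec(R, tc)
        b = [Rtc[r] - tp[r] for r in range(3)]       # right-hand side R t_c - t_p
        for i in range(3):                           # accumulate A^T A and A^T b
            for j in range(3):
                ata[i][j] += sum(A[k][i] * A[k][j] for k in range(3))
            atb[i] += sum(A[k][i] * b[k] for k in range(3))
    return solve3(ata, atb)


# Synthetic check with hypothetical values.
R_true = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t_true = [0.1, -0.2, 0.3]
robot_motions = [
    ([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]], [0.5, 0.0, 0.0]),  # about x
    ([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]], [0.0, 0.2, 0.0]),  # about y
]
R_inv = [list(col) for col in zip(*R_true)]   # inverse of a rotation is its transpose
motions = []
for Rp, tp in robot_motions:
    Rpt = mat_vec(Rp, t_true)
    tc = mat_vec(R_inv, [Rpt[k] + tp[k] - t_true[k] for k in range(3)])  # t_c of Equation 7
    motions.append((Rp, tp, tc))
t_est = hand_eye_translation(R_true, motions)
```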

FIG. 10 is a detailed flowchart of a method of obtaining corresponding points between a previous image and a following image shown in FIG. 9. In general, a calibration target which is finely organized is required to accurately perform the hand/eye calibration. Thus, as shown in FIG. 10 of the present invention, corresponding points of consecutive images obtained by the CCD camera 110 are extracted by using the projective invariant shape descriptor I which is not affected by nonlinearity or noise of the CCD camera 110, and the projective invariant shape descriptor I is used as the calibration target.

In order to accurately perform the hand/eye calibration, errors of the projective invariant shape descriptor I of the images are defined (step 2200), and a noisy invariant is calculated by analyzing the amount of noise in the images (step 2210). Then, a threshold is calculated according to the noisy invariant calculated in step 2210 (step 2220).

Boundary data of a previous image and a following image obtained by the CCD camera 110 are extracted (step 2230). The extracted boundary data is presented by subsampling N data (step 2240). Then, a projective invariant shape descriptor is minimized in accordance with Equation 31 (step 2250), and the following image is transformed into a previous image in response to the minimized projective invariant shape descriptor (step 2260). After that, the distance between boundary data is reset by using the ratio of the distance between boundary data before the transformation to the distance between boundary data after the transformation (step 2270).

After the distance between the boundary data is reset in step 2270, similarities between the boundary data of the previous image and the following image are found (step 2280), and corresponding points between the previous image and the following image are extracted by using the found similarities (step 2290).

The hand/eye calibration method according to the present invention extracts corresponding points between the previous image and the following image by using the projective invariant shape descriptor I. Thus, it is possible to accurately perform the hand/eye calibration without being affected by noise or nonlinearity of a camera.

In the present invention, the hand/eye calibration method is applied to the robot hand. However, the hand/eye calibration method can be applied to various kinds of visual servoing devices which control motions of an object by using images obtained through at least one camera.

The present invention may be embodied as computer readable code in a computer readable medium. The computer readable medium includes all kinds of recording devices in which computer readable data are stored. The computer readable medium includes, but is not limited to, ROMs, RAMs, CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves such as transmissions over the Internet. In addition, the computer readable medium may be distributed among computer systems which are connected via a network, and be stored and embodied as the computer readable code.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

	Number	Date	Country
Parent	10706936	Nov 2003	US
Child	11300435	Dec 2005	US
