The invention is a process for determining the geometry, position and orientation of one or several objects in an environment. The objective is to provide dimensional or measurement information on three-dimensional primitives (dots, straight lines, circles, cylinders, etc.) representing these objects using a projection on images acquired by one or several sensors. These dimensional data are used for the dimensional check of manufactured objects (prototype or series production), the measurement of structural deformation, and modeling of industrial environments.
There are several major families of processes to accomplish this type of measurement. Some involve direct measurement of objects in the environment with the tip of a feeler, but this method cannot always be applied and becomes very slow as soon as the environment is voluminous or cluttered, or if its shape is complicated; it is impractical when the environment is the size of a complete room. Other methods make use of range finding, in other words distances to various dots in the environment are measured without any physical contact; a laser is pointed at these dots, one at a time, and the measurement is made on the flight time or phase shift of the wave. A mirror or a mechanical system is used to continuously move the laser ray towards other dots, to enable fast measurements of the environment. However, this method is not always very precise (although the resolution is limited only by the laser scanning system) and is subject to errors when the beam touches reflecting objects; it is also necessary to keep the mechanical system stable while scanning and to guarantee the safety of any persons within the measurement volume.
Other methods include optical methods in which a camera is moved in front of the environment to be measured and takes a sequence of images. The details of the environment are identified on the different images and their position is calculated by triangulation based on their position on the different images and the known positions as the camera advances, as a function of image taking parameters of the camera. Sometimes, a network of dots is identified in the environment, these dots being illuminated by a laser or projector in a beam of rays; additional light may be added to better illuminate the surroundings around the dots network and to make it stand out from the rest of the environment. The use of an illumination means resembles range finding processes and introduces corresponding disadvantages of inaccuracy and lack of safety, that do not always compensate for the speed and ease of identification and the calculations that can frequently be carried out.
In other methods, the dots to be measured are light sources, reflecting or colored marks previously placed in the environment. These methods give good results if the marks and their positions are suitably chosen, but they are not applicable in all cases and particularly for large complicated environments; in particular, they are useful for monitoring the position of a determined object moving in the environment, rather than for measuring the environment itself.
Finally, other optical processes are based on the lack of marks in the environment and on measuring some points of interest in images. The points of interest are chosen automatically, or the operator may choose them in the case of interactive processes. Interactive processes have the advantage that they are universal or theoretically applicable to any environment, but it is not always easy to have a sufficiently large number of points of interest that are common to all images; the step in which each dot is identified on different images may be difficult; furthermore, a description of an environment by even a large number of dots is not very meaningful.
The invention consists of a process included in purely optical methods and more precisely methods that include an image analysis. This new process may include triangulation, but it is fundamentally different from previous processes, in that in this case we are interested in distinguishing details of the environment rather than drawing up a map of the dots in it. It is often easier and more useful to discern a specific element of the environment and to distinguish it from the rest, rather than to have a complete but indistinct knowledge about this environment. In the frequently encountered case of measuring industrial rooms, this consideration will be particularly important when there are a lot of different elements and obstacles of a simple shape, that are superposed and create a very complicated relief, but interpretation of the resulting representation is much easier when these elements are distinguished and when they are characterized by a few position and shape parameters.
The process has many advantages: there is no need to place specific marks in the environment; a much larger portion of the information in the images is used than if only points of interest are considered, which should give better precision of the resulting representation; the process is efficient even with a variety of diffusing or reflecting surfaces; it is applicable to a variety of volumes, possibly very large; the acquisition time is very short, a few tens of milliseconds; the process may be fully automated; the representation may be completed later by adding new entities which had been neglected earlier, or by correcting it with updates or other operations; and since it immediately supplies a correct model of the environment, it can be used immediately, whereas a map of dots needs to be interpreted.
The process is based on a system composed of five main modules defined in the following list:
The use of this process requires one or several previously calibrated video cameras (although calibration is not necessary if dot type primitives are used exclusively), in order to determine the relation between any dot on the image and the position of the associated light ray. Preliminary calibrations have already been described by different authors, for example the article by Viala, Chevillot, Guérin and Lavest: “Mise en oeuvre d'un procédé d'étalonnage précis de camera CCD—Implementation of a process for precise calibration of a CCD camera” presented at the 11th Conference on Shape Recognition and Artificial Intelligence (Clermont-Ferrand, Jan. 20 to 22, 1998). When several cameras are used, the system is said to be stereoscopic and is capable of automatically giving a three-dimensional model of the environment by searching for corresponding dots on the images and triangulation. If a single camera is used, the same result can be obtained by successive images by moving the camera by a determined distance. This distance may also be determined afterwards by calculation, if a standard meter is available in the environment.
In summary, the invention relates to a process for measuring three-dimensional objects in a three-dimensional environment, consisting of taking at least one image by at least one camera and creating a representation of the environment based on an analysis of the image, characterized in that the analysis comprises detection of discontinuities in the appearance of the image, a combination of discontinuities detected at geometric contours defined on the image by parameters, an adjustment of contours to discontinuities by varying the parameters, an estimate of the shape and position in the environment of geometric objects projecting onto the image according to the said contours, the representation showing the said objects.
The representation of the environment is added to every time that a new image is taken or when additional information is supplied. The process can also include initial estimates of the position of objects or the camera starting from information given manually or in a computer description file.
In general, the process can be carried out with many alternatives and with flexibility depending on the situation encountered. One possibility with some of the best embodiments is a correction to the position of objects by estimating positions of projections of the objects onto the images, based on the respective positions of the camera after the images have been taken, and by adjusting the estimated positions of the projection based on the measured positions of the projection on the images.
This correction is usually made during a final summary calculation in which the total representation error is estimated and then minimized; the estimate of camera parameters can also be corrected.
We will now describe a specific embodiment of the invention with reference to the following figures:
and
The modules mentioned above are referenced with marks 20 to 24 on
A representation of the environment means a measurement of geometric or dimensional characteristics of one or several objects, or of the elements or objects forming a scene or an environment. This term also relates to the measurement of the position and orientation of one or several objects.
A camera image consists of a network of dots with different shades of gray, that are converted into digital values to be stored in memory 12.
Positioning of natural contours on an image is based on “deformable” models or active contours (see the article by Kass, Witkin and Terzopoulos “Snakes: active contour models” published in the International Journal of Computer Vision, 1(4), p 321 to 331, January 1988, and Bascle's thesis at the University of Nice-Sophia Antipolis (January 1994) “Contributions et applications des modèles déformables en vision par ordinateur” (Contributions and applications of deformable models in computer vision)). They consist of numerically varying a deformable contour model starting from an initial position while calculating its energy after each deformation. This energy conventionally includes two terms, the first of which expresses the geometric regularity of the model and takes account of any physical properties, and the second of which takes account of the match between the model and the experimental image obtained. Specifically, the purpose of this processing is to regularize the model by reducing its local irregularities, usually due to noise, without departing too far from the information in the image; but it only works well on fairly simple images, which is not the case here. Furthermore, this invention proposes an improvement by describing some elements of the image by global geometric parameters. Therefore, we can say that the environment models that will be obtained will be both deformable and parametric.
The shapes of the contours in which we are interested here are simple and belong to a few preferred types that are encountered very frequently in reality; as shown on
where u is between −1 and +1 and θ is between 0 and π.
A cylinder will be defined by its contours or limbs. It will consist of two parallel segments, unless the perspective effect is considered. A suitable model is shown in
where u (path parameter) is between −1 and +1.
But if we want to take account of a perspective effect, the previous model can be enriched by parameters δθ expressing deviations in opposite directions and making the two segments converge, as shown in
replace equations 2 and 3.
Projection of a circle in space onto a two-dimensional image forms an ellipse, and
give the coordinates of dots on the ellipse, where u is a curved abscissa parameter between 0 and 2π.
The process begins by initializing the representation of the environment, usually manually, in which an operator examines one of the images on a computer screen and marks the contours to be modeled. After choosing the appropriate contour type, he chooses a sufficient number of dots on the screen to define this contour and enable a first calculation of the parameters.
These dots are marked by stars on
The next step is to fit the contour selected by the operator, or selected automatically, to the image by means of a potential function, using calculations made by the positioning module 20. In general, an improvement to a model on an image is evaluated by successive reduction of a function Pe called the potential function that includes several terms. In most cases, the energy term alone is sufficient. The image is processed by calculating the differences in digitized shades of gray of adjacent dots, to relate a high potential intensity to each dot on the image if the dot is within an area with a uniform color, and a low potential intensity if it is located in a transition or color discontinuity area. This is done for each dot on the image. If a potential image were displayed, it would show dark areas around the contours of objects, and usually a light background elsewhere. The sum of the potential of a contour is calculated over all its dots, and then a numerical analysis algorithm by reduced gradient is used to calculate potential variations as a function of the variation of contour parameters. In this case, the objective is to minimize the root mean square ε of the potential Pe along the contour C, using the following equation
where a is the model parameters vector and x, y are the abscissas and ordinates of the dots on the contour. Apart from the rate of convergence, this digital tool has the advantage that it provides an evaluation of the covariance matrix on the estimated model, denoted Δa. This information will be used by the three-dimensional reconstruction and positioning module.
A special distance given by equation
is used to calculate the potential Pe of dots on the image. This special distance has the advantages of being quadratic close to zero, in other words to the contour, and approximately constant when the Euclidian distance between dots on the image d becomes large. σ is a fixed coefficient. This distance is comparable to a weighting coefficient that attenuates the influence of remote dots in the calculation of the potential Pe.
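The two stated properties of this distance (quadratic close to zero, roughly constant far from the contour) can be illustrated with a short sketch. The function `saturating_distance` below is a hypothetical choice that has both properties; it is not taken from the source and is not claimed to be the formula of the equation referred to above. The helper `rms_potential`, the list of edge dots and the brute-force nearest-dot search are likewise assumptions made for illustration.

```python
import math

def saturating_distance(d, sigma):
    """Hypothetical distance: about (d / sigma)**2 near 0, close to 1 for d >> sigma."""
    return 1.0 - math.exp(-(d / sigma) ** 2)

def rms_potential(contour_points, edge_points, sigma):
    """Root mean square of the potential along a sampled contour.

    The potential of each contour dot is the saturating distance to the
    nearest detected edge dot (brute-force search, fine for a sketch).
    """
    squares = []
    for x, y in contour_points:
        d = min(math.hypot(x - ex, y - ey) for ex, ey in edge_points)
        squares.append(saturating_distance(d, sigma) ** 2)
    return math.sqrt(sum(squares) / len(squares))

# A contour lying on the edge has zero potential; a shifted one does not.
edge = [(float(i), 0.0) for i in range(10)]
shifted = [(x, y + 3.0) for x, y in edge]
print(rms_potential(edge, edge, sigma=2.0))
print(rms_potential(shifted, edge, sigma=2.0))
```

Because the function saturates, dots far from any edge all contribute about the same amount, so they barely influence the direction in which the contour parameters are adjusted.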
However, an additional potential term is used in addition to the previous term Pe for cylinder contours. It frequently arises that these elements are affected by lighting variations that create highly reflecting bands of brightness towards which the deformable model may converge by confusing them with contours. The use of this additional term avoids this danger; it is a conventionally very high potential term for strongly illuminated dots; the total potential thus modified becomes high close to reflecting bands, which pushes the modeled contours towards real contours of the cylinder.
Note also the influence of geometric aberrations introduced by the lenses of an objective; a straight line in space is projected onto the image as a curved segment, rather than a straight line segment. The deformable models described here cannot give a perfect approximation of this type of deformed parts, but a process for correction of geometric aberrations can be used to apply the process according to the invention to corrected images, obtained without distortion. This correction process is made for all dots on the image at the same time in advance, and the corrected image is stored in memory 12.
Geometric aberrations are composed of two terms, including one radial distortion term that moves a dot radially with respect to the optical center of the image and is expressed as a polynomial with equation
$$\delta r(r) = K_1 r^3 + K_2 r^5 + K_3 r^7 \tag{9}$$
as a function of the radial distance $r = \sqrt{x^2 + y^2}$; and a tangential distortion term that includes a tangential component and a radial component in accordance with the following equations:
The coefficients K1, K2, K3 and P1 and P2 are distortion coefficients estimated while the camera is being calibrated.
The radial distortion is estimated by a preliminary calculation of an aberration table as a function of the radial distance. For each radial distance rD from the center of a distorted calibration image, this table contains the corresponding distance rND of the same position in the undistorted image. The separation between successive values of the distances rD stored in the table is chosen such that the minimum precision Δ between the successive values of the corrected distance rND is respected. The precision of this process can be as high as one tenth of the distance between two successive dots on the image.
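A minimal sketch of such an aberration table is given below. It assumes that the distorted radius is the undistorted radius plus the shift of equation (9), that is rD = rND + δr(rND); the source does not state this sign convention. For simplicity the table is sampled at a uniform step in rND and then interpolated linearly, rather than choosing the spacing of rD from a target precision Δ as the text describes. The function names and the coefficient values are illustrative only.

```python
import bisect

def radial_shift(r, k1, k2, k3):
    """Equation (9): delta_r(r) = K1 r^3 + K2 r^5 + K3 r^7."""
    return k1 * r**3 + k2 * r**5 + k3 * r**7

def build_table(r_max, step, k1, k2, k3):
    """List of (r_D, r_ND) pairs, assuming r_D = r_ND + delta_r(r_ND).

    The list is sorted by r_D as long as the distortion is monotonic.
    """
    n = int(r_max / step) + 1
    table = []
    for i in range(n):
        r_nd = i * step
        table.append((r_nd + radial_shift(r_nd, k1, k2, k3), r_nd))
    return table

def undistort_radius(r_d, table):
    """Look up r_ND for a distorted radius r_D by linear interpolation."""
    keys = [rd for rd, _ in table]
    i = bisect.bisect_right(keys, r_d)
    if i == 0:
        return table[0][1]
    if i == len(table):
        return table[-1][1]
    (rd0, rnd0), (rd1, rnd1) = table[i - 1], table[i]
    w = (r_d - rd0) / (rd1 - rd0)
    return rnd0 + w * (rnd1 - rnd0)

# Illustrative coefficients, radii in pixels.
table = build_table(r_max=500.0, step=1.0, k1=1e-7, k2=0.0, k3=0.0)
r_d = 300.5 + radial_shift(300.5, 1e-7, 0.0, 0.0)
print(undistort_radius(r_d, table))  # close to 300.5
```

A one-dimensional table is enough here because the radial term depends only on r, which is what keeps its memory footprint small.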
It is not intended to use the same method in this invention to take account of tangential distortion, since tables giving corrections as a function of the x and y coordinates should apply to all dots on the image and would occupy too much space in memory. This is why it is recommended that an equation roots search algorithm based on equations (10) should be used, such as Powell's algorithm that is well known to a person skilled in the art, if these tangential distortions have to be taken into account.
We will now go on to describe the second module 21 of the operating system, which is a module for reconstruction and positioning that makes use of the positions of contours of objects detected previously on the images to determine the position of these objects in the environment, in other words to build up a three-dimensional representation of the environment while calculating the position of the image sensor 7 in a positioning step. The process is recurrent, in other words the images are used in sequence, the representation of the environment being added to and corrected each time to make it more precise. It is an application of the Kalman filter. This presentation describes the use of a stereoscopic sensor 7 with two cameras, but the process would be applicable to a sensor with a single camera; reconstruction and positioning can be evaluated except for a scale factor, that can be determined by inputting additional information into the system, such as a distance between two dots or the radius of a cylinder.
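The recurrent refinement can be illustrated with the simplest possible Kalman update: a single static quantity that is measured directly, with known variances. This is only a sketch of the principle. The process described here works with implicit measurement equations relating state and observation vectors, which require linearization and matrix algebra that this example does not cover; the function name and the numbers are assumptions.

```python
def kalman_update(x, p, z, r):
    """One scalar Kalman update for a static quantity observed directly.

    x, p: current estimate and its variance
    z, r: new measurement and its variance
    Returns the refined estimate and its reduced variance.
    """
    k = p / (p + r)            # gain: how much the new measurement is trusted
    x_new = x + k * (z - x)    # move the estimate towards the measurement
    p_new = (1.0 - k) * p      # the variance shrinks with every update
    return x_new, p_new

# Refine an estimate with a sequence of acquisitions.
x, p = 0.0, 1.0
for z in (2.0, 1.8, 2.1):
    x, p = kalman_update(x, p, z, r=1.0)
    print(x, p)
```

Each acquisition pulls the estimate towards the new measurement by an amount that depends on the relative uncertainties, which is why a good initialization speeds up convergence.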
The following describes the formulas that relate the vector $x_k$ of parameters of the detected object, expressed in an absolute coordinate system, to the vector $z_k^i$ of its observation coordinates in the image, for a camera with index i of the sensor that took an image at instant k. The position of the camera is denoted by a rotation matrix $R_k^i$ and a translation vector $t_k^i$ in the absolute coordinate system. Transfer formulas are denoted by the letter h.
In the case of a dot, the equations
in which $(x_k, y_k, z_k)^t = R_k^i \, (x, y, z)^t + t_k^i$ are respected, where $x_k = (x, y, z)^t$ and $z_k^i = (u, v)$.
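The change of coordinates stated above can be sketched as follows. The rigid transform is taken from the text; the projection step is an assumption, namely an ideal pinhole camera in normalized image coordinates (u = x_k / z_k, v = y_k / z_k), since the dot transfer equations themselves are not reproduced here. Function names are illustrative.

```python
def to_camera_frame(R, t, point):
    """(x_k, y_k, z_k)^t = R (x, y, z)^t + t, with R a 3x3 nested list."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )

def project_normalized(point_cam):
    """Assumed ideal pinhole projection onto the normalized image plane."""
    xk, yk, zk = point_cam
    if zk <= 0.0:
        raise ValueError("point is not in front of the camera")
    return xk / zk, yk / zk

# Camera aligned with the absolute axes, object 10 units in front of it.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 5.0)
u, v = project_normalized(to_camera_frame(R, t, (1.0, 2.0, 5.0)))
print(u, v)
```

With a calibrated camera, the normalized coordinates would still have to be converted to pixel coordinates using the intrinsic parameters obtained during calibration.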
In the case of a straight line, $x_k$ and $z_k^i$ are defined by the vectors $x_k = (x, y, z, \beta, \varphi)^t$ and $z_k^i = (u, v, \theta)$ (13), in which β and φ are the spherical coordinates of the unit vector of the straight line and θ is the angle formed by its projection onto the image; the formulas
where × is the vector product, define the conditions to be satisfied, in which $(m_k, v_k)$ are the parameters of the straight line (the coordinates of one of its dots $m_k$ and its unit vector) in accordance with the following equations:
$$m_k = R_k^i \, m + t_k^i, \qquad v_k = R_k^i \, v, \tag{15}$$
mp represents the coordinates of the projection of dot mk onto the image, mI is the middle of the segment detected on the image and vI is the unit vector of the segment in accordance with
An infinite cylinder is defined by the vector
$$x_k = (x, y, z, \beta, \varphi, r)^t, \tag{16}$$
in which x, y and z are the coordinates (denoted m) of a dot on its axis, β and φ are the spherical coordinates (denoted v) of the unit vector along its axis, and r is its radius. The equations
$$m_k = R_k^i \, m + t_k^i \quad \text{and} \quad v_k = R_k^i \, v \tag{17}$$
express the position of the axis of the cylinder in the coordinate system of camera i at time k. The coordinates of its limbs (m1, v1) and (m2, v2) , and mp1 and mp2, the projections of dots m1 and m2 of the limbs onto the image, are also calculated. The measured parameters on the image
$$(u, v, \theta, \delta\theta, d) \tag{18}$$
are used to deduce the observation vector $z_k = (u_1, v_1, \theta_1, u_2, v_2, \theta_2)$ corresponding to the mid-dots and the orientations of the two observed limbs, and the following measurement equation is obtained:
The circle is defined by a state vector conform with the following formula:
$$x_k = (x, y, z, \beta, \varphi, r)^t, \tag{20}$$
where x, y and z denote the coordinates of its center, β and φ the spherical coordinates of the unit vector along its normal and r is its radius. Furthermore, the formulas
$$m_k = R_k^i \, m + t_k^i \quad \text{and} \quad v_k = R_k^i \, v \tag{21}$$
are applicable. If observation coordinates are represented by the function
$$z_k^i = (u, v, l_1, l_2, \theta), \tag{22}$$
the following equations
where $Q = a^2 (x_k^2 + y_k^2 + z_k^2 - r^2) + 1 - 2 b x_k$, express the transfer between the state vector and observations, in which $q_0, \ldots, q_4$ are derived from conversion of parameters (22) to obtain a representation of the ellipse in implicit form such that $u^2 + q_0 v^2 + q_1 uv + q_2 u + q_3 v + q_4 = 0$.
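The conversion from the ellipse parameters (22) to an implicit form can be sketched as below. The sketch assumes that (u, v) is the center of the ellipse, that l1 and l2 are its semi-axis lengths, that θ is the angle of the first axis with the horizontal, and that the implicit form is the general conic with unit coefficient on u² and a linear term q2·u; the source does not spell out these conventions. Function names are illustrative.

```python
import math

def ellipse_to_implicit(u0, v0, l1, l2, theta):
    """Return (q0, q1, q2, q3, q4) such that, on the ellipse,
    u^2 + q0 v^2 + q1 u v + q2 u + q3 v + q4 = 0.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Conic A u^2 + B u v + C v^2 + D u + E v + F = 0 of the rotated ellipse.
    A = (c / l1) ** 2 + (s / l2) ** 2
    B = 2.0 * c * s * (1.0 / l1**2 - 1.0 / l2**2)
    C = (s / l1) ** 2 + (c / l2) ** 2
    D = -2.0 * A * u0 - B * v0
    E = -B * u0 - 2.0 * C * v0
    F = A * u0**2 + B * u0 * v0 + C * v0**2 - 1.0
    # Normalize so that the coefficient of u^2 is 1 (A is always positive).
    return C / A, B / A, D / A, E / A, F / A

def implicit_value(q, u, v):
    """Left-hand side of the implicit equation at image dot (u, v)."""
    q0, q1, q2, q3, q4 = q
    return u * u + q0 * v * v + q1 * u * v + q2 * u + q3 * v + q4

# A dot generated from the parametric form should satisfy the implicit form.
u0, v0, l1, l2, theta = 3.0, -2.0, 5.0, 2.0, 0.4
q = ellipse_to_implicit(u0, v0, l1, l2, theta)
a = 0.7  # position along the ellipse
pu = u0 + l1 * math.cos(a) * math.cos(theta) - l2 * math.sin(a) * math.sin(theta)
pv = v0 + l1 * math.cos(a) * math.sin(theta) + l2 * math.sin(a) * math.cos(theta)
print(implicit_value(q, pu, pv))  # close to 0
```

The implicit form has five coefficients, the same number of degrees of freedom as the five parameters (u, v, l1, l2, θ), so the conversion loses no information.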
We will now go on to the description of the reconstruction process in the special case of a sensor formed from two cameras fixed with respect to each other, denoted by their indexes l and r and simultaneously taking an image. For a dot, the global observation vector can be expressed by
$$z_k = (u_l, v_l, u_r, v_r, \chi_k, \beta_k, \alpha_k, t_{xk}, t_{yk}, t_{zk}) \tag{24}$$
where $u_l$, $v_l$, $u_r$ and $v_r$ are the coordinates of the dot on the two images and the other parameters are the orientation and translation vectors of the sensor in the absolute coordinate system. The dot observation function is then given by the following equation
for which the solution (which is a duplication of equation (11) for the two cameras) gives an evaluation of the state vector xk of the dot, composed of coordinates x, y and z in the absolute coordinate system.
The position of a straight line is determined by obtaining an observation vector
$$z_k = (u_l, v_l, \theta_l, u_r, v_r, \theta_r, \chi_k, \beta_k, \alpha_k, t_{xk}, t_{yk}, t_{zk})^t \tag{26}$$
and solving the following equations
in a similar way; note that the θ parameters are the angles between the horizontal and the projections of the straight line onto the images l and r. However, since straight line segments are observed rather than the straight lines themselves, the state vector for a straight line is given by the formula
$$x_k = (a, b, p, q)^t, \tag{28}$$
rather than by the coordinates of a dot on the straight line and the unit vector along this straight line. For each acquisition, the straight line estimated by the parameters of the state vector a, b, p and q is expressed in the form of a finite straight line with parameters x, y, z, β, φ and l where l denotes the length of the segment and the coordinates x, y and z denote the middle of this segment. These coordinates x, y and z are evaluated by reprojection into the image. The definition of parameters a, b, p and q is as follows:
The cylinder is also defined in the representation by the parameters a, b, p and q of its axis and by its radius, using the formula
$$x_k = (a, b, p, q, r)^t. \tag{29}$$
The observation vector is defined by the formula
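$$z_k = (u_{1l}, v_{1l}, \theta_{1l}, u_{2l}, v_{2l}, \theta_{2l}, u_{1r}, v_{1r}, \theta_{1r}, u_{2r}, v_{2r}, \theta_{2r}, \chi_k, \beta_k, \alpha_k, t_{xk}, t_{yk}, t_{zk})^t. \tag{30}$$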
The system of equations
must be solved. Finally, the state vector of a circle is defined by the following formula
$$x_k = (x, y, z, \beta, \varphi, r)^t, \tag{32}$$
and the observation vector is defined by the formula
$$z_k = (u_l, v_l, l_{1l}, l_{2l}, \theta_l, u_r, v_r, l_{1r}, l_{2r}, \theta_r, \alpha_k, \beta_k, \chi_k, t_{xk}, t_{yk}, t_{zk})^t, \tag{33}$$
and the system of equations
must be solved.
The estimated position of the object is refined for each new acquisition. When an object appears in a pair of images for the first time, this estimate is initialized by a preliminary reconstruction by triangulation. Prior art already contains descriptions of such methods. A suitable initialization makes the estimate of the position of the object converge more quickly for each new image.
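For a dot, one well-known way to perform such a preliminary triangulation is the midpoint method: each camera defines a ray through its optical center and the observed image dot, and the reconstructed position is taken halfway between the closest points of the two rays. The sketch below is an illustration of that generic technique, not the specific method used in the source; the ray origins and directions are assumed to be already expressed in a common coordinate system.

```python
def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment joining two rays o + s * d."""
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel, triangulation is undefined")
    s = (b * e - c * d) / denom   # position along the first ray
    t = (a * e - b * d) / denom   # position along the second ray
    p1 = tuple(o + s * di for o, di in zip(o1, d1))
    p2 = tuple(o + t * di for o, di in zip(o2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))

# Two cameras one unit apart looking at the same dot.
target = (0.5, 0.2, 4.0)
o_l, o_r = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d_l = tuple(ti - oi for ti, oi in zip(target, o_l))
d_r = tuple(ti - oi for ti, oi in zip(target, o_r))
print(triangulate_midpoint(o_l, d_l, o_r, d_r))  # close to (0.5, 0.2, 4.0)
```

With noisy observations the two rays do not intersect exactly, and the midpoint gives a reasonable starting value that the recurrent estimation then refines.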
Reconstruction of the three-dimensional environment requires the position of the sensor to be determined; this position is usually not known, or is known but with an insufficient precision. For each new acquisition, dots previously reconstructed in the environment are used and their observation vector is used for pre-positioning of the sensor by searching for
in other words the values χk, βk, αk, txk, tyk, tzk that give the best agreement between the representation of the environment and its image on the cameras (h close to 0) for all dots j in the model. The following equations are then solved recurrently:
$$h_p(x_k, z_k) = 0, \quad h_d(x_k, z_k) = 0, \quad h_{cy}(x_k, z_k) = 0, \quad \text{or} \quad h_c(x_k, z_k) = 0 \tag{37}$$
(one for each object already built, depending on the category of the object), in which observation vectors zk are given by the appropriate formula
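$$z_k = (u_l, v_l, u_r, v_r, x, y, z)^t, \tag{38}$$
$$z_k = (u_l, v_l, \theta_l, u_r, v_r, \theta_r, x, y, z, \beta, \varphi)^t,$$
$$z_k = (u_{1l}, v_{1l}, \theta_{1l}, u_{2l}, v_{2l}, \theta_{2l}, u_{1r}, v_{1r}, \theta_{1r}, u_{2r}, v_{2r}, \theta_{2r}, x, y, z, \beta, \varphi, r)^t$$
or
$$z_k = (u_l, v_l, l_{1l}, l_{2l}, \theta_l, u_r, v_r, l_{1r}, l_{2r}, \theta_r, x, y, z, \beta, \varphi, r)^t.$$
This is another application of the Kalman filter, in which the estimated state vector is $(\chi_k, \beta_k, \alpha_k, t_{xk}, t_{yk}, t_{zk})$. Module 22 performs this positioning.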
The identification module 23 of the system automatically identifies at least some of the contours defined in the previous calculations, each time that an image is taken. It is proposed to proceed as follows:
The last module performs a three-dimensional block calculation. This is done using module 24 when all images in the environment have been used as described and a complete representation of the environment has been produced. The calculation is carried out as follows:
The next step is to use a least squares method, minimizing a global error. A vector $x = (x_{G1} \ldots x_{Gn} \; x_{M1} \ldots x_{Mp})^T$ can be defined in which the $x_G$ values contain the parameters of all n objects of the representation and the $x_M$ values contain the parameters of the p photos $(\alpha, \beta, \chi, t_x, t_y, t_z)^T$, together with a measurement vector z that contains all observations made for each object and for each image. The adjustment made by module 24 is equivalent to minimizing an error function F(x, z, a) in which a denotes known information about the image taking means (for example intrinsic parameters, optical center, focal length, scale and distortion factors) or about the representation (for example the parameters of vector x that are assumed to be well determined or known). Weightings of the different parameters may be introduced. Therefore, this module 24 can evaluate uncertainties of the representation of the environment and can reduce them by modifying estimated image taking parameters.
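One common way to write such a global error, given here only as an assumed form since the source does not spell out F, is a weighted sum of squared residuals of the implicit measurement equations over every pair of image i and object j in which the object is observed. The visibility set V, the per-observation vectors $z_{ij}$ and the weight matrices $W_{ij}$ are notation introduced for this sketch:

$$F(x, z, a) = \sum_{(i,j) \in V} h_j(x_{Gj}, x_{Mi}, z_{ij}, a)^T \, W_{ij} \, h_j(x_{Gj}, x_{Mi}, z_{ij}, a), \qquad \hat{x} = \arg\min_{x} F(x, z, a)$$

Here $h_j$ stands for whichever transfer function matches the category of object j (dot, straight line, cylinder or circle). Under this form, treating a parameter as known simply means moving it from x into a, so that it is held fixed during the minimization.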
Some parameters can be corrected or blocked. The parameters used are u and v for a dot, θ and d (distance to the origin of the image coordinate system) for a straight line and each cylinder limb. Furthermore, the coordinates u and v of the ends of straight line and cylinder segments are also used.
The block calculation can also be used to measure the position and orientation of one or several objects using a single image and a camera. This can only be done if additional information about the objects is available; the geometric characteristics of each object must be known and injected into the block calculation. The measurement of projections of these said characteristics in a single image is sufficient to determine the position and orientation of the object. It will be necessary to make sure that a sufficient number of characteristics is available to evaluate all position and orientation parameters.
Number | Date | Country | Kind |
---|---|---|---|
00 05392 | Apr 2000 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR01/01274 | 4/26/2001 | WO | 00 | 12/26/2001 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO01/81858 | 11/1/2001 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5528194 | Ohtani et al. | Jun 1996 | A |
5537494 | Toh | Jul 1996 | A |
6249285 | Madden et al. | Jun 2001 | B1 |
Number | Date | Country |
---|---|---|
43 25 269 | Feb 1995 | DE |
2 272 515 | May 1994 | GB |
Number | Date | Country |
---|---|---|
20050063581 A1 | Mar 2005 | US |