The utility of computer vision systems in a variety of applications is widely recognized. A fundamental task in computer vision systems is determining the pose of the image capturing device (e.g., a video camera) given one or more images of known points in the world.
An exemplary application of pose estimation is vehicle localization. By tracking environmental features from a vehicle-mounted camera, it is possible to estimate changes in vehicle position and to use this information for navigation, tracking or other purposes. However, current techniques for camera-based vehicle localization that are based on known pose estimation algorithms do not make optimal use of multiple cameras present on the vehicle, because existing methods for pose estimation assume a single-perspective camera model. As a result of this limitation, ad hoc pose estimation methods are used. For example, pose may be estimated independently for each vehicle-mounted camera and the separate pose estimates subsequently combined. Thus, such methods do not generally make the best use of available information.
By way of further example, one known method for determining the pose of a calibrated perspective camera uses the images of three known points in the world in order to constrain the possible poses of the camera to up to four pairs of solutions (where no more than one solution from each pair is valid). These solutions are typically generated in accordance with the known “three-point perspective pose problem”. Though this approach can be successfully applied in many circumstances, its utility, as it stands, is limited to particular camera geometries and viewpoints. Thus, this approach is less applicable to camera models having more generalized geometries (e.g., geometries that do not adhere to a central perspective model or correspond to a single viewpoint), which have become increasingly popular tools in computer vision systems.
Therefore, there is a need in the art for a method and apparatus for determining camera pose that is substantially model-independent.
A method and apparatus for determining camera pose characterized by six degrees of freedom (e.g., for use in computer vision systems) is disclosed. In one embodiment, an image captured by the camera is received, and at least two constraints on the potential pose are enforced in accordance with known relations of the image to the camera, such that the potential pose is constrained to two remaining degrees of freedom. At least one potential pose is then determined in accordance with the remaining two degrees of freedom.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
The present invention discloses a method and apparatus for determining camera pose (e.g., for computer vision systems). Unlike conventional methods for determining pose, which rely on assumptions concerning the camera model or image capture geometry, the method and apparatus of the present invention function regardless of camera model or geometry, and thus may be implemented for determining the pose of substantially any kind of camera. For example, a camera operating in conjunction with a curved mirror (e.g., a catadioptric or a dioptric system), a multi-camera rig (e.g., a stereo camera head), and any compound camera comprising a plurality of individual sensing elements rigidly attached to each other are all potential configurations that may benefit from application of the present invention. Additionally, one or more conventional cameras moving over time can be treated as a single generalized camera as long as the pose transformation relating the camera positions from one time to another is known. In each case, the only requirement imposed by the methods and devices to be described herein is that the spatial relationships among rays corresponding to image locations be known. In other words, the present invention deals with calibrated generalized cameras, including calibrated camera configurations as undistinguished special cases.
The method 100 is initialized in step 102 and proceeds to step 104, where the method 100 receives an image (e.g., an individual frame of a sequence of scene imagery) that includes at least three known world points. In one embodiment, the image is received from a generalized camera that samples a light field or plenoptic function in some arbitrary but known fashion. For example, the image may be received from at least one of: a camera operating in conjunction with a curved mirror (e.g., a catadioptric or dioptric system), a multi-camera rig or a compound camera comprising a plurality of individual sensing elements rigidly attached to each other. The world points comprise any points in the real world that are captured in the received image and whose three-dimensional positions in the real world are known.
In step 106, the method 100 identifies the rays, in the coordinate system of the image capturing device, that project to three of the known world points. In one embodiment, the image capturing device is calibrated such that these rays can be immediately derived.
Referring back to the method 100, in step 108 at least two constraints are enforced on the potential pose of the image capturing device. In one embodiment, the first ray is constrained to pass through the first known world point, and the second ray is constrained to pass through the second known world point; enforcing these two constraints leaves the potential pose with two remaining degrees of freedom.
The method 100 then proceeds to step 110 and determines the effects that the enforced constraints have on these remaining two degrees of freedom. One of these remaining two degrees of freedom can be considered in terms of a transformation of the known world point corresponding to the third ray. The second of the two remaining degrees of freedom can be considered in terms of possible positions of the third ray.
In step 112, the method 100 uses the knowledge of the remaining two degrees of freedom to determine possible transformations between the coordinate system of the world points and the coordinate system of the rays. These transformations represent potential solutions for the pose of the image capturing device. The method 100 then ends in step 114.
The first and second world points 302₁, 302₂ define a unique axis 304 in space that passes through both the first world point and the second world point. The configuration of world points 302 may thus be rotated around the axis 304 without violating the constraints given by the first and second rays 300₁, 300₂ (in effect, this illustrates the first of the two remaining degrees of freedom discussed above).
The first and second rays 300₁, 300₂ also share a common perpendicular axis 400, which is perpendicular to both rays.
There exists a rigid motion that revolves the perpendicular axis 400 around a cylinder (the ends of the cylinder being defined by the circles 404₁ and 404₂, hereinafter referred to as “circles 404”) while satisfying the constraints imposed by the first and second rays 300₁, 300₂. In addition to the perpendicular axis 400, the cylinder defined by the circles 404 also includes the first and second world points 302₁, 302₂, which lie on the circumferences of the respective circles 404. The revolving motion of the perpendicular axis 400 affects all planes perpendicular to the perpendicular axis 400 in an equal manner. Thus, if one considers an orthographic projection along the perpendicular axis (the direction of the projection indicated by arrow 402), a cross section of the cylinder may be represented in two dimensions, as discussed below.
The rays are first brought to canonical positions: the first ray 300₁ is directed along the y-axis, the common perpendicular axis 400 of the first and second rays is directed along the z-axis, and the point at which the first ray meets the perpendicular axis is placed at the origin 700.
The known world points 302 are then transformed such that the first and second world points 302₁, 302₂ are disposed at particular positions. In one embodiment, the first world point 302₁ coincides with the origin 700. Rotation around the first world point 302₁ results in the second world point 302₂ lying on the line of intersection between the xz-plane and the z-plane of the second ray 300₂. There are typically two such possible locations for the second world point 302₂. In one embodiment, the location for the second world point 302₂ is chosen such that x ≥ 0. This determines the coordinate system for the world points 302 up to a rotation around the axis defined by the first and second world points 302₁, 302₂. In one embodiment, this rotation is chosen arbitrarily.
The projection of the second ray 300₂ is then a line through the origin, with the line equation:
y=sx (EQN. 1)
where s is the slope of the projection.
In one embodiment, the transformation must maintain the first world point 302₁ on the y-axis. Thus, the position of the first world point 302₁ is determined to be at y=m. Moreover, the projection line 800 of the axis through the first and second world points 302₁, 302₂ is positioned such that the line equation of the projection 800 is:
y=kx+m (EQN. 2)
where k is the slope of the projection 800.
In addition, the transformation maintains the second world point 302₂ on the second ray 300₂. Thus, the projection of the second ray 300₂ and the projection 800 meet at the projection, (b,c), of the second world point 302₂, and the projection (b,c) has x-coordinate b such that
m=(s−k)b (EQN. 3)
As is apparent from
where D is the projected distance between the first and second world points 302₁, 302₂.
EQN. 4 can be rewritten by defining a scalar, u, as:
u≡D/b (EQN. 5)
and observing that, by Pythagoras's Theorem, one has (k²+1)b²=D², which leads to:
u²=1+k² (EQN. 6)
Thus, according to EQNs. 2, 3 and 5, the transformation of EQN. 4 can be rewritten as:
Thus, the one-dimensional family of valid transformations is parameterized by u and k under the constraint from EQN. 6.
Referring back to
In one embodiment, the family of world points mapped onto the third ray can be determined by interpreting the transformation of EQN. 7 as a full three-dimensional transformation that maps the homogeneous coordinates X = [x y z 1]ᵀ of a world point into new homogeneous coordinates X′; doing so, one gets:
The transformed point X′ lies on a plane L=[I1 I2 I3 I4]ᵀ if and only if LᵀX′=0, or:
a1+ka2+ua3=0 (EQN. 9)
where
a1≡xI1+yI2+sDI2 (EQN. 10)
a2≡xI2−yI1−DI2 (EQN. 11)
a3≡zI3+I4 (EQN. 12)
The third ray can be represented by two planes L and L′, and if one defines a1′, a2′, a3′ analogously using L′, one has:
a1+ka2+ua3=0 (EQN. 9)
a1′+ka2′+ua3′=0 (EQN. 13)
Respectively eliminating k and u gives:
u(a2′a3−a2a3′)=(a2a1′−a2′a1) (EQN. 14)
−k(a2′a3−a2a3′)=(a3a1′−a3′a1) (EQN. 15)
Inserting EQNs. 14 and 15 into EQN. 6 gives:
(a2a1′−a2′a1)²=(a2′a3−a2a3′)²+(a3a1′−a3′a1)² (EQN. 16)
which, by the degree, is readily seen to be the definition of a quartic surface. Since the quartic surface is a union of lines, it is a ruled quartic surface.
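By way of a numerical illustration only (the code below is not part of the disclosed method; the variables b1, b2, b3 stand for the primed coefficients a1′, a2′, a3′), the closed forms of EQNs. 14 and 15 can be checked against a direct solution of the linear system formed by EQNs. 9 and 13. The constraint of EQN. 6 is imposed separately, leading to EQN. 16.

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2, a3 = rng.normal(size=3)   # coefficients of EQN. 9
b1, b2, b3 = rng.normal(size=3)   # primed coefficients a1', a2', a3' of EQN. 13

# Solve  a1 + k*a2 + u*a3 = 0  and  a1' + k*a2' + u*a3' = 0  directly for (k, u).
k_direct, u_direct = np.linalg.solve([[a2, a3], [b2, b3]], [-a1, -b1])

# Closed forms of EQNs. 14 and 15.
det = b2 * a3 - a2 * b3                    # a2'a3 - a2a3'
u_closed = (a2 * b1 - b2 * a1) / det       # EQN. 14
k_closed = -(a3 * b1 - b3 * a1) / det      # EQN. 15

assert np.allclose([k_direct, u_direct], [k_closed, u_closed])
```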
One can now insert the equation for the plane containing the circle (e.g., circle 306) into this quartic expression; the intersections between the circle and the ruled quartic surface then correspond to the roots of an octic polynomial, which is derived in detail below.
In one embodiment, the rays are represented by the points p1, p2 and p3 on the rays and by the unit direction vectors d1, d2 and d3. d1 is made parallel to the y-axis. The common perpendicular direction of the first and second rays is:
d4≡(d1×d2)/|d1×d2| (EQN. 17)
which is the direction to be made parallel to the z-axis.
The direction that will then become parallel to the x-axis is defined as:
d5≡d1×d4 (EQN. 18)
The conditions of EQN. 17 and EQN. 18 will be satisfied if the rays are rotated by the rotation:
R1≡[d5 d1 d4]ᵀ (EQN. 19)
The slope, s, of the second ray as defined by EQN. 1 will be:
s≡(d1ᵀd2)/(d5ᵀd2) (EQN. 20)
Moreover, the point at which the first ray meets the perpendicular axis is placed at the origin, such that the point is given as:
p4≡p1+αd1 (EQN. 21)
where
α≡(d1−sd5)ᵀ(p2−p1) (EQN. 22)
The transformation that brings the rays to their starting positions is hence:
where it may be assumed that this transformation has already been applied to the rays.
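By way of illustration only, the alignment of the rays described by EQNs. 17 through 22 may be sketched in Python as follows. The inputs p1, p2, d1, d2 are the points and unit direction vectors of the first and second rays; the composition of the homogeneous transformation H1 from R1 and p4 shown here is an assumption, chosen so that the point p4 is placed at the origin as described above.

```python
import numpy as np

def align_rays(p1, d1, p2, d2):
    """Rotate/translate so that d1 is along +y, the common perpendicular of the
    first two rays is along +z, and the point where ray 1 meets the perpendicular
    is the origin.  A sketch of EQNs. 17-22; the form of H1 is an assumption."""
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    d4 = np.cross(d1, d2)
    d4 /= np.linalg.norm(d4)              # EQN. 17: common perpendicular direction
    d5 = np.cross(d1, d4)                 # EQN. 18: direction that becomes the x-axis
    R1 = np.stack([d5, d1, d4])           # EQN. 19: rows d5, d1, d4
    s = (d1 @ d2) / (d5 @ d2)             # EQN. 20: slope of the projected second ray
    alpha = (d1 - s * d5) @ (p2 - p1)     # EQN. 22
    p4 = p1 + alpha * d1                  # EQN. 21: the point placed at the origin
    H1 = np.eye(4)
    H1[:3, :3] = R1
    H1[:3, 3] = -R1 @ p4                  # assumed form: x -> R1 (x - p4)
    return H1, s
```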
In step 1006, the method 1000 lines up the world points (e.g., world points 302) in their canonical positions. In one embodiment, the world points are given as q1, q2 and q3. To bring the world points to their starting positions, it may first be observed that the second ray now lies entirely in the z-plane, whose z-coordinate, e, is the z-coordinate of p2.
In addition, it may be observed that
D≡√(|q2−q1|²−e²) (EQN. 24)
Also,
d6≡[D 0 e]ᵀ/|q1−q2| (EQN. 25)
is a unit vector in the direction from the origin toward the location where the second world point is to be positioned. Before the transformation, this direction is:
d7≡(q2−q1)/|q1−q2| (EQN. 26)
Also defined is:
d8≡[0 1 0]ᵀ (EQN. 27)
and
d9≡(d7×(q3−q1))/|d7×(q3−q1)| (EQN. 28)
where d8 and d9 are unit vectors perpendicular to d6 and d7, respectively. Thus, the desired rotation becomes:
R2≡[d6 d8 (d6×d8)][d7 d9 (d7×d9)]ᵀ (EQN. 29)
and the transformation applied to the points becomes:
where in one embodiment, it is assumed that the transformation H2 has already been applied to the points.
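A corresponding sketch of step 1006, illustrative only, is given below. The value e is the z-coordinate of p2 after the rays have been aligned; the direction d7 is taken from the first world point toward the second, and the homogeneous form of H2, which places q1 at the origin, is an assumption.

```python
import numpy as np

def align_world_points(q1, q2, q3, e):
    """Bring the world points to canonical position (step 1006).
    A sketch of EQNs. 24-29; the homogeneous form of H2 is an assumption."""
    q1, q2, q3 = (np.asarray(v, dtype=float) for v in (q1, q2, q3))
    D = np.sqrt(np.linalg.norm(q2 - q1) ** 2 - e ** 2)        # EQN. 24
    d6 = np.array([D, 0.0, e]) / np.linalg.norm(q1 - q2)      # EQN. 25
    d7 = (q2 - q1) / np.linalg.norm(q2 - q1)                  # EQN. 26 (direction q1 -> q2)
    d8 = np.array([0.0, 1.0, 0.0])                            # EQN. 27
    d9 = np.cross(d7, q3 - q1)
    d9 /= np.linalg.norm(d9)                                  # EQN. 28
    R2 = (np.column_stack([d6, d8, np.cross(d6, d8)])
          @ np.column_stack([d7, d9, np.cross(d7, d9)]).T)    # EQN. 29
    H2 = np.eye(4)
    H2[:3, :3] = R2
    H2[:3, 3] = -R2 @ q1     # assumed form: q -> R2 (q - q1), so q1 maps to the origin
    return H2, D, d6
```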
In step 1008, the method 1000 computes the coefficients L and L′ of the two plane equations representing the third ray. In one embodiment, the coefficients L and L′ are computed by finding two distinct normal vectors n and n′ that are perpendicular to d3. In one embodiment, n is chosen as the vector of largest magnitude out of the two vector products d3×[1 0 0]ᵀ and d3×[0 1 0]ᵀ. n′ is then chosen such that n′≡d3×n, and the plane vectors are:
L≡[nᵀ −nᵀp3]ᵀ (EQN. 31)
and
L′≡[n′ᵀ −n′ᵀp3]ᵀ (EQN. 32)
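Step 1008 may be sketched as follows; the function name is arbitrary and the sketch is illustrative only.

```python
import numpy as np

def third_ray_planes(p3, d3):
    """Represent the third ray as the intersection of two planes (step 1008)."""
    p3, d3 = np.asarray(p3, dtype=float), np.asarray(d3, dtype=float)
    c1 = np.cross(d3, [1.0, 0.0, 0.0])
    c2 = np.cross(d3, [0.0, 1.0, 0.0])
    n = c1 if np.linalg.norm(c1) >= np.linalg.norm(c2) else c2  # larger of the two products
    n_prime = np.cross(d3, n)
    L = np.append(n, -n @ p3)                    # EQN. 31
    L_prime = np.append(n_prime, -n_prime @ p3)  # EQN. 32
    return L, L_prime
```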
The method 1000 then proceeds to step 1010 and computes the coefficients of the octic polynomial whose roots correspond to the intersections of a circle and a ruled quartic surface. In one embodiment, the octic polynomial, in z, is derived by eliminating x and y from the quartic expression of EQN. 16. After the world points have been lined up in accordance with step 1006, the circle sits in a plane defined by:
[x y z]d6=q3ᵀd6 (EQN. 33)
which gives
x=K1 (EQN. 34)
where
Moreover, the circle is the intersection between the plane and the sphere
x²+y²+z²=|q3|² (EQN. 36)
If one inserts EQN. 34 into EQN. 36, one gets:
y²=K2 (EQN. 37)
where
K2=|q3|²−z²−K1² (EQN. 38)
By subsequently inserting EQN. 34 into EQNs. 10 and 11, one gets:
a1=y K3+K4 (EQN. 39)
a2=y K5+K6 (EQN. 40)
a3=K7 (EQN. 41)
a′1=y K8+K9 (EQN. 42)
a′2=y K10+K11 (EQN. 43)
a′3=K12 (EQN. 44)
where
K3≡I2 (EQN. 45)
K4≡I1K1+sDI2 (EQN. 46)
K5≡−I1 (EQN. 47)
K6≡I2K1−DI2 (EQN. 48)
K7≡zI3+I4 (EQN. 49)
K8≡I′2 (EQN. 50)
K9≡I′1K1+sDI′2 (EQN. 51)
K10≡−I′1 (EQN. 52)
K11≡I′2K1−DI′2 (EQN. 53)
K12≡zI′3+I′4 (EQN. 54)
Applying EQNs. 39 through 44 and EQN. 38, the expressions appearing in EQN. 16 may be expanded as:
a2a′1−a′2a1=K13y+K14 (EQN. 55)
a′2a3−a2a′3=K15y+K16 (EQN. 56)
a3a′1−a′3a1=K17y+K18 (EQN. 57)
where
K13≡K5K9+K6K8−K3K11−K4K10 (EQN. 58)
K14≡K6K9−K4K11+K2(K5K8−K3K10) (EQN. 59)
K15≡K7K10−K5K12 (EQN. 60)
K16≡K7K11−K6K12 (EQN. 61)
K17≡K7K8−K3K12 (EQN. 62)
K18≡K7K9−K4K12 (EQN. 63)
Squaring the right hand sides of EQNs. 55-57, inserting into EQN. 16 and again applying EQN. 38 yields:
K19=K20y (EQN. 64)
where
K19≡K2(K13²−K15²−K17²)+K14²−K16²−K18² (EQN. 65)
K20≡2(K15K16+K17K18−K13K14) (EQN. 66)
By squaring EQN. 64 and again applying EQN. 38, one gets:
K21=0 (EQN. 67)
where
K21=K19²−K2K20² (EQN. 68)
which is an octic polynomial in z whose roots correspond to up to eight solutions.
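The assembly of the octic polynomial of EQN. 68 can be sketched with ordinary polynomial arithmetic in z. The sketch below is illustrative only: the closed form used for K1 is not taken from EQN. 35 but is instead derived by substituting EQN. 25 into EQN. 33 and solving for x as in EQN. 34, and is therefore an assumption; q3 denotes the third world point after the alignment of step 1006, and s, D and d6 are the quantities of EQNs. 20, 24 and 25.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def octic_in_z(q1, q2, q3, e, s, D, d6, L, Lp):
    """Assemble K21(z) of EQN. 68 (step 1010).  L and Lp are the plane vectors of
    EQNs. 31 and 32; their entries play the roles of I1..I4 and I'1..I'4."""
    l1, l2, l3, l4 = (float(v) for v in L)
    m1, m2, m3, m4 = (float(v) for v in Lp)
    z = Poly([0.0, 1.0])
    # EQNs. 34-35: x = K1(z), linear in z (closed form derived here, an assumption).
    K1 = Poly([float(np.linalg.norm(q1 - q2) * (q3 @ d6)) / D, -e / D])
    K2 = float(q3 @ q3) - z**2 - K1**2            # EQN. 38
    # EQNs. 45-54
    K3, K4 = l2, l1 * K1 + s * D * l2
    K5, K6 = -l1, l2 * K1 - D * l2
    K7 = l3 * z + l4
    K8, K9 = m2, m1 * K1 + s * D * m2
    K10, K11 = -m1, m2 * K1 - D * m2
    K12 = m3 * z + m4
    # EQNs. 58-63
    K13 = K5 * K9 + K6 * K8 - K3 * K11 - K4 * K10
    K14 = K6 * K9 - K4 * K11 + K2 * (K5 * K8 - K3 * K10)
    K15 = K7 * K10 - K5 * K12
    K16 = K7 * K11 - K6 * K12
    K17 = K7 * K8 - K3 * K12
    K18 = K7 * K9 - K4 * K12
    # EQNs. 65, 66 and 68
    K19 = K2 * (K13**2 - K15**2 - K17**2) + K14**2 - K16**2 - K18**2
    K20 = 2.0 * (K15 * K16 + K17 * K18 - K13 * K14)
    K21 = K19**2 - K2 * K20**2
    return K21, (K1, K2, K13, K14, K15, K16, K17, K18, K19, K20)
```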
Once the octic polynomial is established in step 1010, the method 1000 proceeds to step 1012 and extracts the roots of the octic polynomial. In one embodiment, the roots are extracted by eigen-decomposing a companion matrix.
In one embodiment, the octic polynomial is first normalized so that it may be written as:
z⁸+β7z⁷+β6z⁶+ . . . +β0 (EQN. 69)
The roots are then found as the eigenvalues of the 8×8 companion matrix:
In an alternative embodiment, the roots of the octic polynomial may be found using Sturm sequences, e.g., as discussed by D. Nister in An Efficient Solution to the Five-Point Relative Pose Problem, IEEE Conference on Computer Vision and Pattern Recognition, Volume 2, pp. 195-202, 2003.
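As an illustration of step 1012 (not necessarily the exact layout of EQN. 70), a standard companion-matrix form for the monic polynomial of EQN. 69 can be eigen-decomposed as follows; in practice, numpy.roots builds the same kind of companion matrix internally.

```python
import numpy as np

def octic_roots(coeffs):
    """Extract roots of the octic (step 1012) via a companion-matrix
    eigendecomposition.  `coeffs` are c0, c1, ..., c8 with c8 != 0
    (e.g. Polynomial.coef from the previous sketch)."""
    beta = np.asarray(coeffs, dtype=float)
    beta = beta / beta[-1]                  # normalize to the monic form of EQN. 69
    C = np.zeros((8, 8))
    C[1:, :-1] = np.eye(7)                  # ones on the sub-diagonal
    C[:, -1] = -beta[:8]                    # last column holds -beta0 ... -beta7
    roots = np.linalg.eigvals(C)
    # Only (near-)real roots yield geometrically meaningful poses; the tolerance
    # below is a heuristic.
    return roots[np.abs(roots.imag) < 1e-8].real
```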
Once the roots of the octic polynomial are extracted, the method 1000 proceeds to step 1014 and backsubstitutes with each root of the octic polynomial to get solutions for the transformation between the world point coordinate system and the ray coordinate system. In one embodiment, this is accomplished by computing, for each solution for z, x by EQN. 34 and y by EQN. 64. u and k can then be computed using EQNs. 14 and 15, respectively. The transformation defined in EQN. 7 is then uniquely determined and labeled as H3. For each solution, the transformation H4, which rotates the third world point q3 around to the correct point on the circle, is also found.
In one embodiment, d10 is defined such that:
d10≡(d6×[x y z]ᵀ)/|d6×[x y z]ᵀ| (EQN. 71)
The desired rotation is then:
R4≡[d6 d10 (d6×d10)][d6 d8 (d6×d8)]ᵀ (EQN. 72)
and the transformation applied to the world points is:
The full transformation from the coordinate system of the world points to the coordinate system of the rays is then:
H=H1⁻¹H3H4H2 (EQN. 74)
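By way of illustration, the backsubstitution of step 1014 and the composition of EQN. 74 may be sketched as follows. The explicit matrix forms used for H3 and H4 are assumptions: H3 is taken to be the rigid motion consistent with EQNs. 9 through 12 and the constraint of EQN. 6 (a rotation about the z-axis encoded by k and u, combined with a translation of (s−k)D/u along y), and H4 is taken to be the pure rotation R4 of EQN. 72.

```python
import numpy as np

def pose_from_root(z, Ks, s, D, d6, H1, H2):
    """Backsubstitute one root z (step 1014) and form H of EQN. 74.
    Ks is the tuple of polynomials returned by octic_in_z; the matrix forms of
    H3 and H4 below are assumptions consistent with the surrounding equations."""
    K1, _K2, K13, K14, K15, K16, K17, K18, K19, K20 = Ks
    x = K1(z)                                    # EQN. 34
    y = K19(z) / K20(z)                          # EQN. 64
    denom = K15(z) * y + K16(z)                  # a2'a3 - a2a3' at this root (EQN. 56)
    u = (K13(z) * y + K14(z)) / denom            # EQN. 14 via EQN. 55
    k = -(K17(z) * y + K18(z)) / denom           # EQN. 15 via EQN. 57
    # H3 (EQN. 7), assumed form: planar rotation/translation, identity in z.
    H3 = np.eye(4)
    H3[:2, :2] = np.array([[1.0, -k], [k, 1.0]]) / u
    H3[1, 3] = (s - k) * D / u
    # H4 (EQNs. 71-73), assumed to be the pure rotation R4 about the origin.
    d8 = np.array([0.0, 1.0, 0.0])               # EQN. 27
    d10 = np.cross(d6, [x, y, z])
    d10 /= np.linalg.norm(d10)                   # EQN. 71
    R4 = (np.column_stack([d6, d10, np.cross(d6, d10)])
          @ np.column_stack([d6, d8, np.cross(d6, d8)]).T)   # EQN. 72
    H4 = np.eye(4)
    H4[:3, :3] = R4
    return np.linalg.inv(H1) @ H3 @ H4 @ H2      # EQN. 74
```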
The transformation embodied in EQN. 74 is determined for each root of the octic polynomial, such that up to eight transformations are obtained. These transformations represent potential solutions for transforming from the coordinate system of the world points to the coordinate system of the rays. Thus, the potential solutions may be treated as hypotheses for testing within a hypothesize-and-test architecture that determines the pose of the image capturing device.
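One simple way to exercise such a hypothesize-and-test architecture (an illustrative scoring rule, not a requirement of the method) is to measure, for each candidate transformation, how far additional known world points fall from their corresponding rays and to retain the best-scoring candidate.

```python
import numpy as np

def best_pose(hypotheses, world_pts, ray_pts, ray_dirs):
    """Score each candidate H (world frame -> ray frame) by the distance of
    additional transformed world points from their observed rays (unit ray_dirs
    assumed) and return the best-scoring hypothesis.  Illustrative only."""
    best, best_err = None, np.inf
    for H in hypotheses:
        err = 0.0
        for q, p, d in zip(world_pts, ray_pts, ray_dirs):
            qc = (H @ np.append(q, 1.0))[:3]            # world point in the ray frame
            v = qc - p
            err += np.linalg.norm(v - (v @ d) * d)      # point-to-ray distance
        if err < best_err:
            best, best_err = H, err
    return best, best_err
```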
The method 1000 then terminates in step 1016.
Alternatively, the camera pose estimation module 1205 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC)), where the software is loaded from a storage medium (e.g., I/O devices 1206) and operated by the processor 1202 in the memory 1204 of the general purpose computing device 1200. Thus, in one embodiment, the camera pose estimation module 1205 for determining the pose of a camera or other image capturing device described herein with reference to the preceding Figures can be stored on a computer readable medium or carrier (e.g., RAM, magnetic or optical drive or diskette, and the like).
Thus, the present invention represents a significant advancement in the field of computer vision systems. A method and apparatus are provided that enable the pose of an image capturing device to be hypothesized based on a limited amount of information known about the image capturing device and an image captured by the image capturing device. The method and apparatus function independently of the model or geometry of the image capturing device, and thus may be implemented for determining the pose of substantially any kind of image capturing device.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims benefit of U.S. provisional patent application Ser. No. 60/581,868, filed Jun. 22, 2004, which is herein incorporated by reference in its entirety.
The invention was made with Government support under grant number DAAD19-01-2-0012 awarded by the United States Army. The Government has certain rights in this invention.