1. Field of the Invention
The present invention relates to a motion calculation device and a motion calculation method which calculates the motion of the host device based on captured images.
2. Description of Related Art
JP-A-2006-350897 describes a motion measurement device which measures the motion of a mobile object including image-capturing means based on an epipolar constraint relating to images captured by the image-capturing means.
JP-T-2003-515827 describes a system which predicts the motion (egomotion) of transfer means including a camera based on a homography relating to the images captured by the camera.
However, in the motion measurement device described in JP-A-2006-350897, there is a problem in that, when there are a small number of feature points corresponding to, for example, a stationary object in an image captured by an image-capturing unit, it is very difficult to calculate the motion of the host device based on the epipolar constraint relating to the feature points.
In the system described in JP-T-2003-515827, there is a problem in that, in the image captured by the image-capturing unit, when there is insufficient texture in a region of a plane, it is very difficult to calculate the motion of the host device based on the homography relating to the plane.
The invention has been finalized in consideration of the above-described situation, and an object of the invention is to provide a motion calculation device and a motion calculation method capable of stably calculating the motion of the host device.
The invention has been finalized in order to solve the above-described problems, and provides a motion calculation device. The motion calculation device includes an image-capturing unit configured to capture an image of a range including a plane and to output the captured image, an extraction unit configured to extract a region of the plane from the image, a detection unit configured to detect feature points and motion vectors of the feature points from a plurality of images captured by the image-capturing unit at a predetermined time interval, and a calculation unit configured to calculate the motion of the host device based on both of an epipolar constraint relating to the feature points and a homography relating to the region.
In the motion calculation device, the calculation unit may minimize a cost function based on the epipolar constraint and the homography to calculate the motion.
The invention provides a motion calculation method in a motion calculation device which calculates the motion of the host device. The motion calculation method includes causing an image-capturing unit to capture an image of a range including a plane and to output the captured image, causing an extraction unit to extract a region of the plane from the image, causing a detection unit to detect feature points and motion vectors of the feature points from a plurality of images captured by the image-capturing unit at a predetermined time interval, and causing a calculation unit to calculate the motion of the host device based on both of an epipolar constraint relating to the feature points and a homography relating to the region.
According to the invention, the motion calculation device calculates the motion of the host device based on the epipolar constraint relating to the feature points in the images and the homography relating to the region of the plane. Therefore, the motion calculation device can stably calculate the motion of the host device compared to a case where the motion of the host device is calculated based on either the epipolar constraint or the homography.
A first embodiment of the invention will be described in detail with reference to the drawings.
The motion calculation device 10 captures an image of a range including a plane (for example, the ground) and calculates the motion (hereinafter, referred to as a “camera motion”) of the host device based on the captured image. The motion calculation device 10 outputs the calculated camera motion to the control device 20.
The motion calculation device 10 includes an image-capturing unit 11, a tracking unit (detection unit) 12, an extraction unit 13, and a calculation unit 14. The image-capturing unit 11 includes a light-receiving element, and the light-receiving surface of the light-receiving element has a plurality of pixels. An optical image is formed on the light-receiving surface by an optical system. The image-capturing unit 11 captures an image of a range including a plane in a predetermined cycle and outputs the captured image to the tracking unit 12 and the extraction unit 13. Hereinafter, an image I captured by the image-capturing unit 11 at the time t is denoted by an image It. Hereinafter, the coordinate system with the principal point of the image-capturing unit 11 as the origin and the optical axis direction as the Z axis is referred to as a “camera coordinate system”. The direction of light incident at the coordinate (xp,yp) in the camera coordinate system is denoted by a direction vector p=[x,y,z]T (the superscript T means transposition). The pixel value of a point p constituting the image It is denoted by It(p) using the direction vector p. The pixel value may be calculated through interpolation from the pixel values of peripheral pixels. It is assumed that the image-capturing unit 11 is calibrated in advance.
The image It captured by the image-capturing unit 11 in a predetermined cycle is input to the tracking unit 12. The tracking unit 12 detects feature points from the image It using a Harris operator or the like. The tracking unit 12 also detects the motion vectors of the feature points. Hereinafter, a set of feature points extracted by the tracking unit 12 is denoted by a set St. The tracking unit 12 detects a corresponding point qk corresponding to the same position in an observed scene (for example, a subject) of the image It for a k-th feature point pk constituting a set St-1 of feature points in an image It-1 captured at the time t−1 using a Lucas-Kanade method or the like. Hereinafter, a set of feature point correspondences (pk,qk) (i.e., a set of pairs of feature point and corresponding point) is denoted by Ψt.
The tracking unit 12 excludes, from Ψt, a feature point correspondence (pk,qk) which does not follow the epipolar constraint relating to the feature points of a static background (for example, captured ground or building) using RANSAC (RANdom SAmple Consensus) or the like. The tracking unit 12 outputs the set Ψt of feature point correspondences to the calculation unit 14. The tracking unit 12 performs the above-described processing for each image It to track the feature points within the image.
The set Ψt of feature point correspondences is input from the tracking unit 12 to the calculation unit 14. A region of a plane (hereinafter, referred to as a “plane region”) in the image It is input from the extraction unit 13 to the calculation unit 14. Here, it is assumed that the plane region is expressed by a set Πt={Bi|i=1 . . . m} of a plurality of blocks obtained by dividing the image It. It is also assumed that the initial value of the set Πt is predefined. For example, when the motion calculation device 10 is provided in a vehicle, the initial value of the plane region Πt is defined by the blocks {Bi|i=1 . . . m} of a high-incidence region (for example, the second half of the image) with the ground in the image It.
The calculation unit 14 stores the vector n=[nx,ny,nz]T of a three-dimensional parameter representing a plane nxX+nyY+nzZ=1 on the X-Y-Z coordinate system with a predetermined position as the origin. Here, the initial value n0 of the vector n is predefined based on the installation orientation of the motion calculation device 10 with respect to the plane on which the mobile object is connected to the ground. For example, the initial value n0 is predefined based on the installation orientation of the motion calculation device 10 with respect to the ground on which the vehicle is connected.
The calculation unit 14 calculates the camera motion and the orientation of the plane (hereinafter, referred to as a “plane orientation”) with respect to the host device based on both of an epipolar constraint relating to the feature points and a homography relating to the plane region. Specifically, the calculation unit 14 minimizes a cost function based on both of a set Ψt of feature point correspondences and a plane region Πt to calculate a parameter μ=[ωT,tT,nT] representing the camera motion and the plane orientation. Here, the camera motion is constituted by a rotation parameter ωT=[ωx,ωy,ωz] and a translation vector tT=[tx,ty,tz]. The plane orientation is expressed by nT=[nx,ny,nz].
The cost function is expressed by Expression (1).
In Expression (1), [t]× is a cross product operator of a vector t which is expressed by a 3×3 matrix. R(ω) is a rotation matrix which is defined by the rotation parameter ω. A warp function W in the first term is a function which represents homography transformation for the plane region and is defined in accordance with the parameter μ representing the camera motion and the plane orientation. The first term is called a homography constraint term and represents an error in the homography transformation from the plane region of the image It-1 to the plane region of the image It.
Here, it is assumed that, with the motion of the motion calculation device 10, a feature point p in the plane region of the image It-1 moves to a feature point q in the plane region of the image It. In this case, the relationship shown in Expression (2) is established between the feature point p and the feature point q.
q˜(R(ω)−tnT)p (2)
In Expression (2), the symbol “˜” represents that the vectors of both sides are parallel, that is, the same except for the scale. The vector p and the vector n are expressed in the camera coordinate system at the time t−1. Meanwhile, the vector q and the vector t are expressed in the camera coordinate system at the time t.
Here, H=R(ω)−tnT which is a 3×3 matrix is taken into consideration. If the row vectors of the matrix H are denoted by h1, h2, and h3, the warp function W which represents the transformation from the feature point p=[x,y,1]T to the corresponding point q=[x′,y′,1]T is expressed by Expression (3).
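The homography H=R(ω)−tnT and the warp from p to q described above can be illustrated with a short sketch. The text does not state how R(ω) is parameterized, so the Rodrigues (axis-angle) formula used here is an assumption, and the function names rotation_from_omega, plane_homography, and warp are hypothetical.

```python
import numpy as np

def rotation_from_omega(omega):
    """Rotation matrix R(omega); the Rodrigues axis-angle form is an assumption."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def plane_homography(omega, t, n):
    """H = R(omega) - t n^T, the 3x3 matrix appearing in Expression (2)."""
    return rotation_from_omega(omega) - np.outer(t, n)

def warp(p, omega, t, n):
    """Map p = [x, y, 1]^T to q = [x', y', 1]^T by dividing H p by its third component."""
    h = plane_homography(omega, t, n) @ np.asarray(p, dtype=float)
    return h / h[2]
```

With zero rotation and zero translation the warp reduces to the identity, which is a convenient sanity check before the function is used inside an optimization loop.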
The second term in Expression (1) is called an epipolar constraint term and represents the epipolar constraint. When the feature point moves from the coordinate p to the coordinate q, Expression (4) is satisfied by the epipolar constraint.
qT[t]×R(ω)p=0 (4)
In general, the left side does not become “0” due to errors of the parameters and observation errors of the vector p and the vector q. For this reason, it is considered that the left side in Expression (4) is an algebraic error on the epipolar constraint. Thus, the second term in Expression (1) represents the sum of square errors concerning the epipolar constraint.
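As a minimal sketch of the algebraic error described above, the following code evaluates the left side of Expression (4) for one correspondence and sums its square over a set of correspondences. The helper names skew, epipolar_residual, and epipolar_cost are hypothetical, and unit weights are assumed in place of the weight coefficients of Expression (1).

```python
import numpy as np

def skew(t):
    """3x3 cross-product matrix [t]x, so that skew(t) @ a equals np.cross(t, a)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def epipolar_residual(p, q, R, t):
    """Algebraic error q^T [t]x R p, the left side of Expression (4)."""
    return float(q @ skew(t) @ R @ p)

def epipolar_cost(pairs, R, t):
    """Sum of squared algebraic errors over correspondences (unit weights assumed)."""
    return sum(epipolar_residual(p, q, R, t) ** 2 for p, q in pairs)
```

For a noise-free correspondence generated with the same motion convention as Expression (2), the residual is zero up to rounding; observation errors in p or q make it nonzero.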
The third term in Expression (1) is a regularization term in which the magnitude of the translation vector is v. Here, αp, βp, and λ in Expression (1) are weight coefficients. The third term in Expression (1) is added to Expression (1) so as to resolve the scale ambiguity of the translation vector t. For example, when the speed information of the mobile object is not obtained, v=1 may be set. In this case, the estimated distance from the motion calculation device 10 to the plane and the estimated distance from the motion calculation device 10 to each feature point have values proportional to the norm of the translation vector t.
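Expression (1) itself is not reproduced in this text. One form that is consistent with the three terms described above and with the residuals of Expressions (6) to (8) is sketched below. The symbol C for the cost function and the exact placement of the weights are assumptions, not the expression as filed.

```latex
C(\mu) = \sum_{p \in \Pi_t} \alpha_p \left\{ I_t\!\left(W(p;\mu)\right) - I_{t-1}(p) \right\}^2
       + \sum_{(p,q) \in \Psi_t} \beta_p \left\{ q^{T} [t]_{\times} R(\omega)\, p \right\}^2
       + \lambda \left( t^{T} t - v^{2} \right)^2
```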
The calculation unit 14 improves the estimation value of the parameter μ using a variate δ so as to minimize the cost function. The calculation unit 14 minimizes the value of the cost function by, for example, a Gauss-Newton method.
If the inside of each square error in Expression (1) is approximated to first order in the variate δ of the parameter μ, Expression (5) is established. Here, rw, re, and rv are the residual errors of the respective terms in Expression (5) at the parameter μ. rw is expressed by Expression (6), re is expressed by Expression (7), and rv is expressed by Expression (8).
rw=It(w(p;μ))−It-1(p) (6)
re=qT[t]×R(ω)p (7)
rv=tTt−v2 (8)
If a definition is made as expressed by Expression (9), Jacobian Jw at the feature point p is expressed by Expression (10) using the current set parameter set value μ=[ωT,tT,nT].
Je which appears in the differentiation of the second term in Expression (5) is expressed by Expression (11). Here, Expression (12) is established. Jv which appears in the differentiation of the third term in Expression (5) is expressed by Expression (13).
If the differentiation for δ in Expression (5) is set to a value “0”, Expression (14) is established.
If Expression (14) is rearranged with respect to δ, it is expressed by Expression (15).
The left side in Expression (15) is the product of a 9×9 matrix and a nine-dimensional parameter vector δ. The right side in Expression (15) is a nine-dimensional vector. As described above, Expression (15) is a linear simultaneous equation, such that a solution can be found for δ. δ gives the approximate value (μ+δ) of the optimum value of the parameter estimation value μ.
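The linear system for δ described above can be sketched generically. Expression (15) is not reproduced in this text, so the following code assumes the usual weighted Gauss-Newton normal equations, in which each residual r_i has a row Jacobian J_i and a weight w_i. The function name gauss_newton_step and its argument layout are hypothetical.

```python
import numpy as np

def gauss_newton_step(residuals, jacobians, weights):
    """Solve (sum_i w_i J_i^T J_i) delta = -(sum_i w_i r_i J_i^T) for delta.

    residuals: scalar residuals r_i evaluated at the current estimate
    jacobians: row Jacobians J_i, one per residual, all of equal length
    weights:   non-negative weights w_i (for example the weight coefficients)
    """
    dim = len(jacobians[0])
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for r, J, w in zip(residuals, jacobians, weights):
        J = np.asarray(J, dtype=float)
        A += w * np.outer(J, J)   # accumulate the left-side matrix
        b -= w * r * J            # accumulate the right-side vector
    return np.linalg.solve(A, b)
```

For the nine-parameter problem in the text, each J_i would have nine components and the accumulated matrix would be 9×9. When the residuals are exactly linear in the parameters, a single step reaches the least-squares solution; otherwise the step is repeated from the updated estimate μ+δ.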
As described above, the calculation unit 14 repeatedly updates the parameter estimation value μ until the value of the cost function ceases to improve and, when the value of the cost function ceases to improve, determines that the solution converges. When the solution converges, the calculation unit 14 outputs the parameter estimation value μ=[ωT,tT,nT] to the extraction unit 13.
The weight coefficients αp, βp, and λ may be values experimentally defined or may be defined as follows. Each term in Expression (1) is in the form of square minimization, such that it can be seen that the cost function expressed in Expression (1) is in the form of Expression (I) which represents a negative log likelihood based on the probability distribution for each of the errors rw, re, and rv.
In Expression (I), the notation p(r|A) represents the probability density of the occurrence of the observed value r under the condition A. The subscript p of the first term in Expression (I) represents the probability density function for each point p in the image. The subscripts p and q of the second term in Expression (I) represent the probability density function for each feature point correspondence (p,q).
If it is assumed that the probability distribution of each term is the normal probability distribution, Expression (I) is expressed by Expression (II).
In Expression (II), the notation A|p represents the evaluation of Expression A for the point p in the image. The notation A|(p,q) represents the evaluation of Expression A for the feature point correspondence (p,q) in the image. σw is the standard deviation of error relating to the homography transformation. σe is the standard deviation of error relating to the epipolar constraint. σv is the standard deviation of error relating to the speed. The constant term C has nothing to do with the minimization of Expression (II) and may thus be neglected.
If Expression (II) is minimized, a parameter which maximizes the likelihood of the observed value is estimated. Thus, when the noise standard deviations σw, σe, and σv are given, the weight coefficients αp, βp, and λ in Expression (1) are respectively the reciprocals of the squares of σw, σe, and σv from the viewpoint that the maximum likelihood value is estimated.
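The relationship between the noise standard deviations and the weight coefficients can be written as a one-line helper. The function name weights_from_noise is hypothetical, and the sketch assumes a single standard deviation per term rather than a per-point model.

```python
def weights_from_noise(sigma_w, sigma_e, sigma_v):
    """Return the three weights as reciprocals of the squared standard deviations."""
    return 1.0 / sigma_w ** 2, 1.0 / sigma_e ** 2, 1.0 / sigma_v ** 2
```

A term with larger assumed noise therefore contributes less to the cost function.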
As described above, the weight coefficients αp, βp, and λ are defined using the approximate model (Expression (II)) of the relationship between information concerning the luminance, gradient, and the positions of the feature points in the image and the noise standard deviations. In this way, it can be anticipated that the performance of estimating the camera motion and the plane orientation is improved compared to a case where the weight coefficients are defined experimentally.
Returning to
In Expression (1), the first term which is the only term related to the plane orientation is used as the cost function on the plane orientation. The cost function on the plane orientation is expressed by Expression (16).
The value of the cost function of Expression (16) is minimized for the parameter nB of the plane orientation of a block B. Here, as in Expression (1), the parameter nB is updated by solving the simultaneous equation (17), which is obtained by taking only the first term related to the plane orientation on each of both sides of Expression (15).
In Expression (17), J′w is a vector which has only three components concerning the parameter n of the plane orientation. δ′ is a three-dimensional vector for updating the three components related to the parameter n of the plane orientation. That is, J′w refers to the three components in Expression (10) and is expressed by Expression (18).
The left side in Expression (17) is the product of a 3×3 square matrix and a three-dimensional vector δ′. The right side in Expression (17) is a three-dimensional vector. Thus, δ′ is calculated as the solution of the linear simultaneous equation.
The extraction unit 13 calculates the difference between the parameter nB of the plane orientation of the block B and the parameter n of the plane orientation over the entire plane region. When the difference between the parameter nB of the plane orientation and the parameter n of the plane orientation is equal to or smaller than a predetermined threshold value, the extraction unit 13 determines that the block B is a block which belongs to the plane region and adds the block B to the set Πt of blocks to update the set Πt. Here, the difference is, for example, the Euclidean distance between the parameter nB and the parameter n. The extraction unit 13 outputs the updated set Πt to the calculation unit 14.
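The block membership test described above can be sketched as follows, using the Euclidean distance between nB and n as the difference measure. The function name update_plane_region, the dictionary layout of the per-block parameters, and the threshold values are assumptions made for illustration.

```python
import numpy as np

def update_plane_region(region, block_normals, n, threshold):
    """Add to `region` every block whose parameter n_B lies within `threshold` of n.

    region:        set of block ids currently in the plane region
    block_normals: dict mapping block id -> estimated plane parameter n_B
    n:             plane parameter estimated over the entire plane region
    """
    n = np.asarray(n, dtype=float)
    updated = set(region)
    for block_id, n_b in block_normals.items():
        if np.linalg.norm(np.asarray(n_b, dtype=float) - n) <= threshold:
            updated.add(block_id)
    return updated
```

The function returns a new set rather than modifying its input, so the previous plane region remains available for comparison.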
The calculation unit 14 calculates nine parameters included in the camera motion and the plane orientation based on the set Πt updated by the extraction unit 13. Here, with the stopping condition that the value of the cost function in Expression (1) is not improved, the calculation unit 14 repeatedly calculates the nine parameters included in the camera motion and the plane orientation until the stopping condition is satisfied. The calculation unit 14 outputs the finally calculated camera motion and plane orientation to the control device 20.
Here, as the number of blocks representing the plane region decreases, the value of the cost function in Expression (1) decreases. As the number of blocks representing the plane region decreases, estimation accuracy of the camera motion and the plane orientation is lowered. Thus, as the stopping condition for stopping the processing for calculating the camera motion and the plane orientation, Expression (19) which is obtained by normalizing the first term in Expression (1) with the number of blocks may be used.
The camera motion and the plane orientation are input from the calculation unit 14 of the motion calculation device 10 to the control device 20. The control device 20 performs predetermined processing based on the camera motion and the plane orientation. Here, the predetermined processing refers to, for example, processing in which the three-dimensional positions of the feature points are calculated based on the camera motion through triangulation and, when a cluster of calculated feature points is a cluster outside the plane region based on the plane orientation, the cluster is displayed on the screen (not shown) as an obstacle. For example, the predetermined processing may be processing in which an avoidance instruction is issued to the mobile object such that the mobile object avoids the cluster outside the plane region. In this case, the predetermined processing may be processing in which an avoidance instruction is issued to a vehicle including the camera system (motion calculation device 10 and control device 20) such that the vehicle avoids another vehicle which is traveling ahead.
Next, the operation procedure of the motion calculation device 10 will be described.
The calculation unit 14 initializes the estimation value μ of the camera motion and the plane orientation (Step S1). The image-capturing unit 11 captures an image of a range including a plane (Step S2). The tracking unit 12 tracks the feature point p within the image. The tracking unit 12 outputs the set Ψt of feature point correspondences to the calculation unit 14 (Step S3). The calculation unit 14 calculates the estimation value μ of the camera motion and the plane orientation (Step S4).
The extraction unit 13 extracts the plane region (Step S5). The calculation unit 14 calculates the estimation value μ of the camera motion and the plane orientation (Step S6). The calculation unit 14 determines whether or not the solution converges (Step S7). When the solution does not converge (Step S7—No), the operation of the calculation unit 14 returns to Step S5. Meanwhile, when the solution converges (Step S7—Yes), the calculation unit 14 outputs the calculated camera motion and plane orientation to the control device 20. The control device 20 performs predetermined processing (scene analysis, display, warning, control, and the like) (Step S7).
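The control flow of Steps S1 to S7 can be summarized in a short sketch. Every callable passed in (capture, track, estimate, extract_plane, converged) is a hypothetical stand-in for the corresponding unit, and the max_iters guard is an added assumption that does not appear in the described procedure.

```python
def run_frame(capture, track, estimate, extract_plane, converged,
              mu0, region0, max_iters=20):
    """Sketch of Steps S1 to S7 for one captured image; all callables are stand-ins."""
    mu = mu0                                    # Step S1: initialize the estimate
    image = capture()                           # Step S2: capture an image
    matches = track(image)                      # Step S3: feature point correspondences
    region = region0
    mu = estimate(mu, matches, region, image)   # Step S4: first estimate
    for _ in range(max_iters):
        region = extract_plane(mu, region, image)       # Step S5: update plane region
        new_mu = estimate(mu, matches, region, image)   # Step S6: re-estimate
        done = converged(mu, new_mu)                    # Step S7: convergence check
        mu = new_mu
        if done:
            break
    return mu, region
```

The returned estimate and plane region correspond to what is passed on to the control device 20 once the solution converges.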
As described above, the motion calculation device 10 includes the image-capturing unit 11 which captures the image of the range including the plane and outputs the captured image, the extraction unit 13 which extracts the region of the plane from the image, the tracking unit 12 which detects the feature points and the motion vectors of the feature points from a plurality of images captured by the image-capturing unit 11 at a predetermined time interval, and the calculation unit 14 which calculates the motion (camera motion) of the host device based on both of the epipolar constraint relating to the feature points and the homography relating to the region.
Thus, the motion calculation device can stably calculate the camera motion compared to a case where the camera motion is calculated based on either the epipolar constraint or the homography.
The calculation unit 14 minimizes the cost function based on the epipolar constraint and the homography to calculate the motion of the host device. Thus, the motion calculation device can stably calculate the camera motion based on the cost function.
A second embodiment of the invention will be described in detail with reference to the drawings. In the second embodiment, the regularization term of the cost function is different from that in the first embodiment. Hereinafter, a description will be provided of only differences from the first embodiment.
When it is assumed that the distance d between the principal point of the image-capturing unit 11 (see
The calculation unit 14 minimizes the value of the cost function using Expression (21), instead of Expression (8).
The calculation unit 14 minimizes the value of the cost function using Expression (22), instead of Expression (13).
As described above, the calculation unit 14 minimizes the cost function, which does not include the speed information v of the host device, to calculate the motion of the host device. Thus, the motion calculation device can stably calculate the camera motion without using the speed information v.
Although the embodiments of the invention have been described in detail with reference to the drawings, the specific configuration is not limited to the embodiments and may include design changes and the like without departing from the scope and spirit of the invention.
A program for implementing the above-described motion calculation device may be recorded in a computer-readable recording medium, and the program may be loaded and executed on a computer system. The term “computer system” used herein is a concept including an OS or hardware, such as peripheral devices. The “computer-readable recording medium” refers to a portable medium, such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device, such as a hard disk incorporated into the computer system. The “computer-readable recording medium” includes a medium which holds the program for a predetermined time, such as a volatile memory (RAM) in the computer system which serves as a server or a client when the program is transmitted through a network, such as the Internet, or a communication link, such as a telephone link. The program may be transmitted from the computer system which stores the program in the storage device or the like to another computer system through a transmission medium or transmission waves in the transmission medium. The “transmission medium” for transmitting the program refers to a medium having a function of transmitting information, for example, a network, such as the Internet, or a communication link (communication line), such as a telephone link. The program may implement a portion of the above-described function. The program may be a program which can implement the above-described function through a combination with a program previously stored in the computer system, that is, a so-called differential file (differential program).
While preferred embodiments of the invention have been described and illustrated above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Additions, omissions, substitutions, and other modifications can be made without departing from the spirit or scope of the present invention. Accordingly, the invention is not to be considered as being limited by the foregoing description, and is only limited by the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
P2010-135487 | Jun 2010 | JP | national |
This application claims benefit from U.S. Provisional application Ser. No. 61/295,448, filed Jan. 15, 2010, and claims priority on Japanese Patent Application No. 2010-135487, filed Jun. 14, 2010, the contents of which are incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
6137491 | Szeliski | Oct 2000 | A |
6353678 | Guo et al. | Mar 2002 | B1 |
6832001 | Kashiwagi | Dec 2004 | B1 |
7027659 | Thomas | Apr 2006 | B1 |
7149325 | Pavlidis et al. | Dec 2006 | B2 |
7193640 | Egan | Mar 2007 | B2 |
7567704 | Au et al. | Jul 2009 | B2 |
7786898 | Stein et al. | Aug 2010 | B2 |
7982662 | Shaffer | Jul 2011 | B2 |
8134479 | Suhr et al. | Mar 2012 | B2 |
8280105 | Kishikawa et al. | Oct 2012 | B2 |
8401276 | Choe et al. | Mar 2013 | B1 |
20030053659 | Pavlidis et al. | Mar 2003 | A1 |
20040222987 | Chang et al. | Nov 2004 | A1 |
20050036659 | Talmon et al. | Feb 2005 | A1 |
20050063596 | Yomdin et al. | Mar 2005 | A1 |
20050185844 | Ono et al. | Aug 2005 | A1 |
20070185946 | Basri et al. | Aug 2007 | A1 |
20080033649 | Hasegawa et al. | Feb 2008 | A1 |
20080273751 | Yuan et al. | Nov 2008 | A1 |
20090010507 | Geng | Jan 2009 | A1 |
20110090337 | Klomp et al. | Apr 2011 | A1 |
Number | Date | Country |
---|---|---|
2003-515827 | May 2003 | JP |
2006-350897 | Dec 2006 | JP |
WO 0139120 | May 2001 | WO |
Entry |
---|
Tomoaki Teshima et al., “Real Time Method to Detect the Waving Drive from a Single Camera”, The Institute of Electrical Engineers of Japan, 2009, vol. 129, No. 9, pp. 1714-1723. |
Shigeki Sugimoto et al., “Efficient Plane-Parameter Estimation Using Stereo Images”, Information Processing Society of Japan, Computer Vision and Image Media, Feb. 2007, vol. 48, No. SIGI(CVIM17), pp. 24-34. |
Japanese Office Action, Application No. 2010-135487 dated Dec. 6, 2011. |
Qifa KE et al., “Transforming Camera Geometry to A Virtual Downward-Looking Camera: Robust Ego-Motion Estimation and Ground-Layer Detection”, Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), pp. 1-8. |
Gideon P. Stein et al., “A Robust Method for Computing Vehicle Ego-motion”, In IEEE Intelligent Vehicles, 2000, pp. 1-7. |
Richard Szeliski et al., “Geometrically Constrained Structure from Motion: Points on Planes”, In European Workshop on 3D Structure from Multiple Images of Large-scale Environments (SMILE), pp. 171-186, 1998. |
Number | Date | Country | |
---|---|---|---|
20110175998 A1 | Jul 2011 | US |
Number | Date | Country | |
---|---|---|---|
61295448 | Jan 2010 | US |