The present invention relates to monocular model-based pose estimation. More specifically, the present invention relates to a method and a computer program product for the simultaneous pose and points-correspondences determination from a planar model.
Model-based pose estimation from a monocular camera consists of estimating the position and orientation of a calibrated camera with respect to a known model, under the assumption that a set of features (such as 3D model points, lines, or conics) and their image points are given. Pose estimation is a fundamental problem in computer vision, with applications in visual servoing, pattern recognition, camera calibration, and so on.
The literature on pose estimation is extensive and falls into two categories: iterative algorithms and noniterative algorithms. Noniterative algorithms, such as the algorithm of Gerald Schweighofer and Axel Pinz, and iterative algorithms, such as the method of Oberkampf et al. and the method of L. S. Davis et al., all require known correspondences, i.e., known point or line correspondences. In addition, some algorithms attempt to solve for the pose and the correspondences simultaneously; a typical example is the SoftPosit algorithm.
Existing methods such as the SoftPosit algorithm can handle the simultaneous pose and correspondences determination given a 3D model. However, SoftPosit fails in the presence of a planar model because it cannot cope with the pose redundancy, that is, the number of estimated poses increases exponentially as iterations continue.
As stated above, the applications of the existing pose estimation algorithms are limited by the requirement of known points-correspondences or of a 3D model. In view of the above-mentioned problems, particularly the pose redundancy problem in which the number of estimated poses increases exponentially as iterations proceed for a planar model, the invention discloses an effective method and software for the simultaneous pose and points-correspondences determination for a planar model.
Because the simultaneous pose and correspondences determination for a planar model suffers from the pose redundancy problem, in which the number of estimated poses increases exponentially as iterations continue, a method and a computer program product that solve this problem are of practical value.
To solve the pose redundancy problem above, the present invention is embodied in a method of the simultaneous pose and correspondences determination for a planar model, comprising:
Obtaining two possible coarse poses of a planar model by a coarse pose estimation algorithm;
Obtaining two pose candidates by the iterative extended TsPose algorithm initialized with the two possible coarse pose estimates; and
Selecting a pose as the final output from the two pose candidates based on said coarse poses.
The method described above, wherein said obtaining two pose candidates by the iterative extended TsPose algorithm further comprises:
using each of the two coarse pose estimates as an initial estimate, forming the initial correspondences matrix by the computations of the initial “tangent-image” points, updating the correspondences matrix, and estimating the objective function value;
updating the affine transformation and recovering two rotation matrices based on the updated affine transformation, selecting a new rotation matrix and a new tangent-image plane based on the coarse azimuth angle of the coarse pose in the initialization, and estimating a new translation vector and new “tangent-image” points based on the new tangent-image plane;
determining whether the convergence criterion is satisfied, if it is satisfied, then terminating the iterative searching of the extended TsPose algorithm; if not, then continuing a new iteration based on the estimated new “tangent-image” points until the convergence criterion is satisfied; and
said pose estimate including a rotation matrix, a translation vector, and/or a correspondence matrix; using said pose estimate as a pose candidate when the iterative searching is terminated.
The method described above further comprises:
When the iterative searching terminates, judging whether each said initial estimate has been used; if there exists an unused said initial estimate, using the unused said initial estimate as the initial estimate of the extended TsPose algorithm and conducting the iterative searching of the extended TsPose algorithm to obtain a pose candidate.
The method described above, wherein said convergence criterion is that the uncertainty parameter β exceeds the prescribed maximum value, wherein the uncertainty parameter β corresponds to the product of the initial β0 and an uncertainty update parameter βu; and/or,
said convergence criterion is that the objective function value is smaller than a prescribed parameter ε;
Whenever either of said convergence criteria is satisfied, the iterative searching is terminated.
The method described above, wherein selecting a pose as the final output from the two pose candidates based on said coarse poses further comprises:
When two said initial estimates are used, selecting the pose candidate corresponding to a smaller objective function value and using it as the final output.
A second embodiment of the present invention is a computer program product of the simultaneous pose and correspondences determination from a planar model, comprising:
A computer-readable medium storing a computer program implementing the method described above, and a computer-readable code implementing the method of the simultaneous pose and points-correspondences determination from a planar model; the method of the simultaneous pose and points-correspondences determination from a planar model including:
Obtaining two possible coarse poses of the planar model based on a coarse pose estimation algorithm;
Obtaining two pose candidates based on the iterative searching of the extended TsPose algorithm which is initialized by each of the two possible coarse pose estimates; and
Selecting a pose as the final output from the two pose candidates based on said coarse poses.
The computer program product described above, wherein said obtaining two pose candidates based on the iterative searching of the extended TsPose algorithm further comprises:
using each of the two coarse pose estimates as an initial estimate, forming the initial correspondences matrix based on the computations of the initial “tangent-image” points and updating the correspondences matrix, and estimating the objective function value;
updating the affine transformation and recovering two rotation matrices based on the updated affine transformation, selecting a new rotation matrix and a new tangent-image plane based on the coarse azimuth angle of the coarse pose in the initialization, and estimating a new translation vector and new “tangent-image” points based on the new tangent-image plane;
determining whether the convergence criterion is satisfied, if it is satisfied, then terminating the iterative searching of the extended TsPose algorithm; if not, then continuing the new iteration based on the estimated new “tangent-image” points until the convergence criterion is satisfied; and
said pose estimate including a rotation matrix, a translation vector, and/or a correspondence matrix; using said pose estimate as a pose candidate when the iterative searching is terminated.
The computer program product described above, wherein said method further comprises:
When the iterative searching terminates, judging whether each said initial estimate has been used; if there exists an unused said initial estimate, using the unused said initial estimate as the initial estimate of the extended TsPose algorithm and conducting the iterative searching of the extended TsPose algorithm to obtain a pose candidate.
The computer program product described above, wherein said convergence criterion is that the uncertainty parameter β exceeds the prescribed maximum value, wherein the uncertainty parameter β corresponds to the product of the initial β0 and an uncertainty update parameter βu; and/or,
said convergence criterion is that the objective function value is smaller than a prescribed parameter ε;
Whenever either of said convergence criteria is satisfied, the iterative searching is terminated.
The computer program product described above, wherein selecting a pose as the final output from the two pose candidates based on said coarse poses further comprises:
When two said initial estimates are used, selecting the pose candidate corresponding to a smaller objective function value and using it as the final accurate pose output.
Compared to the existing methods, the present invention has the advantages as follows:
the present invention discloses a method of the simultaneous pose and points-correspondences determination from a planar model, comprising the steps of:
(1) using a clustering method to obtain two possible coarse poses, based on which the space of searching is restricted and the problem of pose redundancy is solved;
(2) using the extended TsPose algorithm to estimate more accurate points-correspondences and a more accurate pose, wherein said TsPose algorithm is an algorithm for the accurate pose determination, i.e., the tangent-image sliding algorithm; and said extended TsPose algorithm is an algorithm that combines, in an iterative scheme, the Softassign algorithm for estimating the points-correspondences and the TsPose algorithm for the accurate pose determination.
In addition, the present invention can solve the pose redundancy problem, in which the number of estimated poses increases exponentially, in the simultaneous pose and points-correspondences determination from a planar model. The disclosed method is based on coplanar points and places no restrictions on the shape of the planar model, so it does not impose a hard condition. The disclosed method performs well in cluttered and occluded environments with different levels of noise and hence has practical value.
The present invention is embodied in a method of estimating two possible coarse poses from a set of model points and a set of image points with unknown correspondences. First, transform each cluster formed by a single anchor point and its c closest neighbors so that its center is at the origin and its covariance matrix is the identity matrix, then compute the polar coordinates of each cluster, and rearrange the points of each cluster in increasing order of polar angle. Next, compute the subspace distances associated with every possible pair of a model cluster and an image cluster, and retain the pair corresponding to the minimum subspace distance. Third, select the well-matched pairs whose associated subspace distances fall below a predefined threshold. Fourth, estimate the affine parameters from each selected pair, and then recover two possible coarse orientations of the model plane and the depth of the translation vector from each set of estimated affine parameters. Finally, use the k-means algorithm to generate two possible coarse poses and depths of the translation vector by clustering all estimated coarse orientations and depths.
A second embodiment of the present invention is an iterative algorithm, named the TsPose algorithm, for accurately estimating the pose of a planar model and the points-correspondences. The TsPose algorithm is initialized with one of the two coarse poses, with the initial tangent-image points computed from the initial orientation. A matrix of correspondences is computed from the current tangent-image points, the current affine transformation, and the model points, and is updated by the Sinkhorn algorithm. The affine transformation relating the model points and the image points is updated using the current matrix of correspondences. Two orientations, represented by elevation and azimuth angles, and two translation vectors are computed from the current affine transformation. The orientation and the translation vector corresponding to the smaller absolute difference between the estimated azimuth angle and the input coarse azimuth angle are retained, and the tangent-image points are updated using the retained orientation. The above-mentioned process is repeated until the objective function value is smaller than a prescribed tolerance or the maximum number of iterations is attained. Two independent trials, initialized with different coarse poses and the same parameter settings, should be performed, and the pose and the matrix of correspondences from the trial whose objective function value is smaller are retained as the output.
The present invention is also embodied in software. The software includes a computer-readable medium that stores instructions for causing a computer to perform the simultaneous pose and points-correspondences determination from a planar model, and computer-readable code that implements the method for the simultaneous pose and points-correspondences determination from a planar model.
The basic idea of the present invention consists of a two-step method for the simultaneous pose and points-correspondences determination from a planar model: in the first step, obtaining two possible coarse poses to restrict the scope of searching and therefore solve the problem of pose redundancy; in the second step, using each of the two possible coarse poses as the initialization of an iterative algorithm combining a points-correspondence finding algorithm and an accurate pose estimation algorithm, and then obtaining a more accurate pose by the iterative searching, wherein said coarse poses comprise the coarse pose angles and the coarse z-entry of the translation vector, and said accurate pose comprises pose angles and a translation vector.
The above and other aspects of the present invention will be shown and described in detail with reference to the exemplary embodiments thereof and the accompanying drawings.
The exemplary embodiments of the invention cover obtaining two possible coarse poses of the planar model based on a coarse pose estimation algorithm, obtaining two pose candidates based on the iterative searching of the extended TsPose algorithm which is initialized by each of the two possible coarse pose estimates, and selecting a pose as the final output from the two pose candidates based on said coarse poses.
The first step, a coarse pose estimation algorithm, comprises the steps of:
S101, Constructing the model clusters and the image clusters;
In the step of S101, suppose that the number of neighbor points is set to be k; computing all the distances between a single anchor model point and the remaining model points, selecting the k closest points of the anchor model point, and constructing a model cluster from the anchor model point and its k closest points; the number of points in a model cluster is k+1.
Similarly, computing all the distances between a single anchor image point and the remaining image points, selecting the k closest points of the anchor image point, and constructing an image cluster from the anchor image point and its k closest points; the number of points in an image cluster is k+1. In the exemplary embodiment, suppose that the number of model points is m and the number of image points is n; accordingly, m model clusters and n image clusters will be constructed respectively.
S102, applying transformations to the points of each model cluster and each image cluster respectively.
In the step, the said transformations applied to the points of each model and image cluster involve: centering the model points in each model cluster by subtracting their centroid and applying the whitening transformation in the formula (1-3) to the points of each model cluster; accordingly, centering the image points in each image cluster by subtracting their centroid and applying the whitening transformation in the formula (1-3) to the points of each image cluster.
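The cluster construction of S101 can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and NumPy arrays of 2D points; the function name `build_clusters` and the sample coordinates are hypothetical and not part of the disclosure.

```python
import numpy as np

def build_clusters(points, k):
    """Return an (N, k+1) index array: row i holds anchor i followed by
    the indices of its k nearest neighbours (Euclidean distance)."""
    points = np.asarray(points, dtype=float)
    # Pairwise squared distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Exclude each anchor from its own neighbour list.
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    anchors = np.arange(len(points))[:, None]
    return np.hstack([anchors, neighbours])

# Illustrative model points (made-up values).
model_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
clusters = build_clusters(model_points, k=2)
```

Applying the same function to the n image points would give the n image clusters, each with k+1 members.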
S103, rearranging the orders of the points of each model and image clusters.
In the step, computing the polar coordinates of each transformed model cluster, and rearranging the points of each model cluster in increasing order of polar angles; In the same way, computing the polar coordinates of each transformed image cluster, and rearranging the points of each image cluster in increasing order of polar angles;
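Steps S102 and S103 can be sketched for a single cluster as below. The sketch assumes the cluster points are not collinear (so the covariance is invertible), uses the symmetric inverse square root of the covariance as the whitening matrix, and measures polar angles in [0, 2π); these choices and the name `whiten_and_order` are assumptions made for illustration.

```python
import numpy as np

def whiten_and_order(cluster_points):
    """Center a cluster, whiten it, and sort its points by polar angle."""
    pts = np.asarray(cluster_points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    # Symmetric inverse square root of the covariance via eigendecomposition.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    white = centered @ inv_sqrt.T
    # Polar angle of each whitened point, mapped to [0, 2*pi).
    angles = np.mod(np.arctan2(white[:, 1], white[:, 0]), 2 * np.pi)
    order = np.argsort(angles)
    return white[order], order

# Illustrative cluster (made-up values).
cluster = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0], [1.0, 3.0]])
white, order = whiten_and_order(cluster)
```

The returned `order` maps each sorted position back to the index of the original point, which is what S106 needs when it goes back to the original coordinates.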
S104, computing the subspace distances and constructing the matrix of the subspace distances;
In the step, computing the subspace distance between the ordered model points in a model cluster and the image points of an image cluster under each of the (k+1) permutations; retaining the minimum of all the subspace distances, together with the order of the model points in the model cluster and the order of the image points in the image cluster that correspond to that minimum; and applying the procedure described above to each pair of model and image clusters to produce an m×n matrix of subspace distances, wherein each row of the matrix represents a model cluster and each column represents an image cluster;
S105, setting a threshold d for the subspace distances, and selecting the model and image clusters whose corresponding subspace distances fall below the threshold d.
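The selection in S105 amounts to thresholding the m×n matrix of subspace distances. A minimal sketch, in which the function name `select_matching_pairs`, the matrix values, and the threshold are hypothetical:

```python
import numpy as np

def select_matching_pairs(distance_matrix, threshold):
    """Return (model_cluster, image_cluster) index pairs whose subspace
    distance falls below the threshold."""
    rows, cols = np.nonzero(np.asarray(distance_matrix) < threshold)
    return list(zip(rows.tolist(), cols.tolist()))

# Rows are model clusters, columns are image clusters (made-up distances).
D = np.array([[0.05, 0.90],
              [0.80, 0.02],
              [0.70, 0.60]])
pairs = select_matching_pairs(D, threshold=0.1)
```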
Applying the steps S102-S105 to the points in the model and image clusters can produce transformed model and image points with fewer possible correspondences and therefore lead to a reduction of computational cost in finding the desired correspondences.
S106, using the original points in the model and image clusters whose indices are the same as those of the selected points in the matching pairs to estimate coarse poses;
After the matching pairs are obtained, the original points whose indices are the same as those of the points in the matching pairs are needed to estimate the associated affine transformation. In this step, using those original points to estimate the associated affine transformation, recovering two rotation matrices according to the formula (2-10), and then computing the pose angles (i.e., elevation angles and azimuth angles) based on the third columns of the two rotation matrices; meanwhile, estimating a coarse z-entry of the translation vector based on the affine transformation;
S107, using the k-means clustering method to cluster all the estimated poses to yield two possible coarse poses.
In this step, using the k-means clustering method to cluster the pose angles and the z-entries of the translation vector to yield two possible coarse pose angles and a coarse z-entry of the translation vector. The said coarse poses include two coarse pose angles and a z-entry of the translation vector. Each said coarse pose angle includes a coarse elevation angle and a coarse azimuth angle.
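The clustering in S107 can be illustrated with a small two-cluster k-means. This is a sketch under stated assumptions: each estimate is a row (elevation, azimuth, z-entry), distances are plain Euclidean on those raw values (no angle wrap-around or scaling), and the initialization is a simple deterministic choice. The function name `two_means` and the sample values are hypothetical.

```python
import numpy as np

def two_means(samples, iters=50):
    """Lloyd's k-means with two clusters on rows of `samples`."""
    x = np.asarray(samples, dtype=float)
    # Deterministic init: the first sample and the sample farthest from it.
    far = np.argmax(np.linalg.norm(x - x[0], axis=1))
    centers = np.stack([x[0], x[far]])
    for _ in range(iters):
        # Assign every sample to its nearest center.
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.stack([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Made-up estimates: (elevation deg, azimuth deg, z-entry of translation).
estimates = np.array([[30.0, 40.0, 10.0],
                      [31.0, 42.0, 10.2],
                      [29.0, 39.0, 9.9],
                      [30.0, -140.0, 10.1],
                      [32.0, -138.0, 10.0]])
coarse_poses, labels = two_means(estimates)
```

Each row of `coarse_poses` then plays the role of one coarse pose handed to the second step.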
An Exemplary Embodiment of the Coarse Pose Estimation Algorithm
In the embodiment, the transformation applied to the model clusters and the image clusters can be summarized as three steps as follows:
(1) Subtracting the respective centroid from each model point in the model clusters, and subtracting the respective centroid from each image point in the image clusters. Let the model points be denoted by $\{p_i\}$ and the image points by $\{q_i\}$; the transformation relating $\{p_i\}$ and $\{q_i\}$ is
$$q_i = A p_i + t \tag{1-1}$$
where $A$ is a 2×2 matrix and $t$ is a 2×1 vector. Let the centroid of a model cluster be $m_p$ and the centroid of an image cluster be $m_q$; then
$$\hat{q}_i = A \hat{p}_i \tag{1-2}$$
where $\hat{q}_i = q_i - m_q$ and $\hat{p}_i = p_i - m_p$. Therefore, after step (1) is applied, the model points of the model cluster and the image points of the image cluster are related by a linear transformation.
(2) Applying the “whitening transformation” to the points $\hat{q}_i$ and $\hat{p}_i$, i.e.,
$$p'_i = \Sigma_p^{-1/2}\,\hat{p}_i, \qquad q'_i = \Sigma_q^{-1/2}\,\hat{q}_i \tag{1-3}$$
where $\Sigma_p$ represents the covariance of the model points $\hat{p}_i$, and $\Sigma_q$ represents the covariance of the image points $\hat{q}_i$.
(3) Let the transformation relating the transformed model points and the transformed image points be the matrix $\bar{A}$; from (1-2) and (1-3),
$$\bar{A} = \Sigma_q^{-1/2}\, A\, \Sigma_p^{1/2}$$
The polar coordinates of $p'_i$ and $q'_i$ are then computed so that the points of each cluster can be ordered by polar angle.
After steps (1)-(3) are applied, the transformed model points and image points are related by
$$q'_i = R\, p'_i$$
where $q'_i=(u_i, v_i)^T$ is an image point, $p'_i=(x_{wi}, y_{wi})^T$ is a model point, and $R$ is a 2×2 rotation matrix.
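The reason the whitened clusters differ only by a rotation can be made explicit with a short derivation. The symbols $\Sigma_p$ and $\Sigma_q$ for the covariances of the centered model and image points, and the assumption that both covariances are nonsingular, are introduced here for illustration.

```latex
\begin{aligned}
p'_i &= \Sigma_p^{-1/2}\,\hat{p}_i, \qquad
q'_i = \Sigma_q^{-1/2}\,\hat{q}_i, \qquad
\hat{q}_i = A\,\hat{p}_i \\
\Rightarrow\quad q'_i &= \bar{A}\,p'_i, \qquad
\bar{A} = \Sigma_q^{-1/2}\,A\,\Sigma_p^{1/2} \\
I = \operatorname{Cov}(q') &= \bar{A}\,\operatorname{Cov}(p')\,\bar{A}^{T}
  = \bar{A}\,\bar{A}^{T}
\end{aligned}
```

Hence $\bar{A}$ is orthogonal. It is a proper rotation when $\det A > 0$, since $\det\bar{A}$ has the same sign as $\det A$; otherwise it includes a reflection.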
In the embodiment, suppose that there are k points in a model cluster that are arranged in increasing order of polar angle. The distance between two subspaces $S_1$ and $S_2$ is
$$\mathrm{dist}(S_1,S_2)=\|P_1-P_2\|_2 \tag{1-5}$$
where $P_1$ is the projection matrix of the subspace $S_1$, $P_2$ is the projection matrix of the subspace $S_2$, and the norm of a matrix is the 2-norm, i.e., the largest singular value of the matrix. The projection matrix can be computed by existing standard methods, so the details of its computation are omitted here.
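The distance (1-5) can be sketched numerically as follows. The sketch assumes each ordered, whitened cluster is stacked as a matrix with one point per row and that the subspace in question is the column space of that matrix; it also assumes the permutations of S104 are cyclic shifts of the ordered image points. These interpretations, and the function names, are assumptions made for illustration.

```python
import numpy as np

def projector(X):
    """Orthogonal projector onto the column space of X (reduced QR)."""
    Q, _ = np.linalg.qr(X)
    return Q @ Q.T

def subspace_distance(X1, X2):
    """dist(S1, S2) = ||P1 - P2||_2, the largest singular value."""
    return np.linalg.norm(projector(X1) - projector(X2), 2)

def min_distance_over_shifts(model_pts, image_pts):
    """Try every cyclic shift of the image points; keep the best one."""
    best = (np.inf, 0)
    for shift in range(len(image_pts)):
        d = subspace_distance(model_pts, np.roll(image_pts, -shift, axis=0))
        if d < best[0]:
            best = (d, shift)
    return best

# Made-up cluster, rotated by 0.7 rad and started from a different point.
model = np.array([[1.0, 0.2], [0.3, 1.1], [-0.9, 0.4], [-0.2, -1.3]])
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s], [s, c]])
image = np.roll(model @ R.T, 2, axis=0)
dist, shift = min_distance_over_shifts(model, image)
```

When the image cluster is a rotated copy of the model cluster, the two point matrices span the same column space at the correct shift, so the distance is zero up to rounding.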
The second step, a method of accurately estimating the pose of a planar model, comprises the steps of:
S201, normalizing the given image points;
In the step, dividing each given image point by its norm, i.e., the normalization, to produce a set of unit vectors referred to as “bundles”.
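The normalization of S201 can be sketched as below, assuming the image points are given in calibrated (normalized) image coordinates (u, v) and are lifted to the viewing ray (u, v, 1) before normalization. That lifting and the name `to_bundles` are assumptions for illustration.

```python
import numpy as np

def to_bundles(image_points):
    """Turn (u, v) image points into unit viewing directions ("bundles")."""
    pts = np.asarray(image_points, dtype=float)
    # ASSUMPTION: calibrated coordinates, so the viewing ray is (u, v, 1).
    rays = np.hstack([pts, np.ones((len(pts), 1))])
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

bundles = to_bundles([[0.0, 0.0], [3.0, 4.0]])
```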
S202, setting the convergence criterion;
In the step, setting the said convergence criterion includes: setting the initial uncertainty parameter β0, the maximum uncertainty parameter βF, the uncertainty update parameter βu, and the initial match matrix M. Here, the update rule for β is βk+1=βk×βu, and the iteration terminates if βk+1>βF.
Alternatively, convergence can be judged by the value of the cost function. When the right correspondences are found, the value of the cost function (2-7) should theoretically be zero. One can therefore set a small constant and terminate the iterations of the method of the second step when the value of the cost function (2-7) falls below it.
It should be pointed out that the iterations of the method of the second step are terminated as soon as either of the two convergence criteria is satisfied.
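The two stopping rules can be sketched together as follows. The function names and all numeric values are illustrative placeholders; the disclosure does not prescribe particular settings.

```python
def annealing_schedule(beta0, beta_update, beta_max):
    """List the beta values used before beta exceeds beta_max."""
    betas = []
    beta = beta0
    while beta <= beta_max:
        betas.append(beta)
        beta *= beta_update  # beta_{k+1} = beta_k * beta_u
    return betas

def should_stop(beta_next, beta_max, cost, eps):
    """Stop as soon as either criterion holds."""
    return beta_next > beta_max or cost < eps

schedule = annealing_schedule(beta0=1.0, beta_update=2.0, beta_max=10.0)
```

With these placeholder values the search would run at β = 1, 2, 4, 8 and stop once the next value, 16, exceeds the maximum, unless the cost falls below ε earlier.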
S203, setting the initialization of the method of accurately estimating the pose of a planar model;
In the step, initializing the method of accurately estimating the pose of a planar model, i.e., the extended TsPose algorithm, with the coarse pose angles and the coarse z-entry of the translation vector obtained by the coarse pose estimation algorithm.
It should be clarified that the embodiment places no limitation on how the initialization of the extended TsPose algorithm is set, i.e., the initialization can be carried out in several ways: for example, using one of the two coarse poses in an execution of the extended TsPose algorithm; or inputting both coarse poses simultaneously and using each of them in turn; or, when the coarse pose estimation finishes, passing the coarse poses from the medium implementing the coarse pose estimation algorithm to the medium implementing the extended TsPose algorithm; or having the medium implementing the extended TsPose algorithm call the coarse poses from the medium implementing the coarse pose estimation algorithm.
S204, computing rotation matrices, and using the computed rotation matrices as the estimated rotation matrices of the iterative searching of the extended TsPose algorithm;
In the step, computing a rotation matrix from a coarse pose angle by the Rodrigues formula and setting it as the initial rotation matrix of the extended TsPose algorithm; generating the x-entry and the y-entry of the translation vector with a uniformly distributed pseudorandom generator, while the z-entry of the translation vector is given by the coarse pose estimation algorithm. Here, the said coarse pose angle consists of the coarse elevation angle and the coarse azimuth angle.
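The initialization of S204 can be sketched as below. The Rodrigues formula is shown in its generic axis-angle form; how the coarse elevation and azimuth angles map to an axis and an angle is not specified here and is left out. The sampling range for the x- and y-entries, the fixed seed, and the function names are assumptions for illustration.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    # Cross-product (skew-symmetric) matrix of the unit axis.
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def initial_translation(coarse_tz, rng, half_range=1.0):
    """Random x and y entries (ASSUMED range), coarse z entry."""
    tx, ty = rng.uniform(-half_range, half_range, size=2)
    return np.array([tx, ty, coarse_tz])

R0 = rodrigues([0.0, 0.0, 1.0], np.pi / 2)
T0 = initial_translation(10.0, np.random.default_rng(0))
```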
S205, determining the initial “tangent-image” points;
In the steps of S203-S204, obtaining the rotation matrix R and the translation vector T, corresponding to:
From (2-1′) and (2-2′) we can obtain two column vectors giving the parameters of the affine transformation as follows:
where the superscript k denotes the number of iterations in the method of accurately estimating the pose of a planar model, and k=0 denotes the 0-th iteration, i.e., the initialization.
Let the model points in the world reference frame be $w_i=(x_{wi}, y_{wi}, 1)^T$. The initial tangent plane is obtained as $l^k=(r_{13}, r_{23}, r_{33})^T$, k=0, from (1). Since the initial tangent plane $l^k$ and the “bundles” (which have been obtained from the given image points by the normalization step) are known, the “tangent-image” points $\hat{q}_i^k=(\hat{x}_i^k, \hat{y}_i^k)^T$, k=0, can then be computed by (2-2) and (2-3).
S206, using the “tangent-image” points to compute the match matrix, updating the match matrix, and determining the value of the cost function by the updated match matrix;
In the step, let:
$$d_{ij}=(h_1^T w_j-\hat{x}_i^k)^2+(h_2^T w_j-\hat{y}_i^k)^2 \tag{2-5′}$$
then, computing the match matrix by:
mij0=γe(β×d
where the superscript 0 of $m_{ij}^0$ denotes the initial value of the match matrix before the application of the Sinkhorn algorithm. The Sinkhorn algorithm is then used to update the match matrix. Let the updated elements of the match matrix be $m_{ij}$; the value of the cost function (2-7) is then computed, corresponding to:
where $h_1=\lambda(a_{11}, a_{12}, b_1)^T$, $h_2=\lambda(a_{21}, a_{22}, b_2)^T$, and α is a small constant.
It should be noted that, at the first iteration, $h_1$ and $h_2$ are given by (2-3′) and (2-4′) respectively, and in the remaining iterations, $h_1$, $h_2$, and λ are given by (2-8), (2-9), and (2-11) respectively.
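The match-matrix update of S206 can be sketched as below. Because the initial-value formula is not fully legible in the text, the sketch assumes the standard Softassign form m_ij^0 = γ·exp(−β(d_ij − α)); the slack row and column used to absorb unmatched points, their initial value, and the function names are likewise assumptions rather than the disclosed procedure.

```python
import numpy as np

def initial_match_matrix(d, beta, alpha, gamma=1.0):
    """ASSUMED Softassign form: gamma * exp(-beta * (d_ij - alpha))."""
    return gamma * np.exp(-beta * (np.asarray(d, dtype=float) - alpha))

def sinkhorn(m0, slack=1e-3, iters=200, tol=1e-9):
    """Alternate row and column normalization with one slack row/column."""
    n_rows, n_cols = m0.shape
    m = np.full((n_rows + 1, n_cols + 1), slack)
    m[:n_rows, :n_cols] = m0
    for _ in range(iters):
        previous = m.copy()
        # Normalize every real row (the slack row is left alone).
        m[:n_rows, :] /= m[:n_rows, :].sum(axis=1, keepdims=True)
        # Normalize every real column (the slack column is left alone).
        m[:, :n_cols] /= m[:, :n_cols].sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    return m[:n_rows, :n_cols]

# Made-up squared distances d_ij: small on the diagonal, large elsewhere.
d = np.array([[0.00, 4.00, 9.00],
              [4.00, 0.01, 5.00],
              [8.00, 6.00, 0.02]])
match = sinkhorn(initial_match_matrix(d, beta=5.0, alpha=0.1))
```

As β grows over the iterations, the entries of the match matrix move toward 0 or 1, so the soft assignment hardens into a one-to-one correspondence.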
S207, updating the affine transformation, recovering the rotation matrices based on the updated affine transformation, and selecting one rotation matrix and one corresponding tangent-image plane;
In the step, using the method of accurately estimating the pose of a planar model, i.e., the extended TsPose algorithm, to estimate the affine transformation relating image points and model points, and recovering two rotation matrices by (2-10), using the third columns of the two rotation matrices as the normal vectors of two new “tangent-image” planes and computing two groups of the second pose angles from the two normal vectors. Using the Sinkhorn algorithm to update the match matrix.
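For example: according to (2-8) and (2-9), given $h_1=(h_{11}, h_{12}, h_{13})^T$ and $h_2=(h_{21}, h_{22}, h_{23})^T$, let
$$A' = \begin{pmatrix} a'_{11} & a'_{12} \\ a'_{21} & a'_{22} \end{pmatrix}$$
then
$$a'_{11}=h_{11},\quad a'_{12}=h_{12},\quad a'_{21}=h_{21},\quad a'_{22}=h_{22}$$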
According to (2-11), we can obtain λ. Since $A'=\lambda A$, we then obtain the matrix $A$, and according to (2-10) we can obtain two rotation matrices $R_1$ and $R_2$. Let the third column of $R_1$ be $l_1$ and the third column of $R_2$ be $l_2$; $l_1$ and $l_2$ denote the two new “tangent-image” planes respectively.
Obtaining a group of pose angles from $l_1$, denoted by $(\theta_1, \varphi_1)$ (where $\theta_1$ is the elevation angle and $\varphi_1$ is the azimuth angle), and obtaining a group of pose angles from $l_2$, denoted by $(\theta_2, \varphi_2)$ (where $\theta_2$ is the elevation angle and $\varphi_2$ is the azimuth angle). Let the group of pose angles obtained in the step of S203 be $(\theta, \varphi)$ (where $\theta$ is the elevation angle and $\varphi$ is the azimuth angle). If $|\varphi_1-\varphi|<|\varphi_2-\varphi|$, then retaining $l_1$ as the new “tangent-image” plane, $l^{k+1}=l_1$, and retaining $R_1$ as the estimated rotation matrix at iteration k+1, $R^{k+1}=R_1$; otherwise, retaining $l_2$ as the new “tangent-image” plane, $l^{k+1}=l_2$, and retaining $R_2$ as the estimated rotation matrix at iteration k+1, $R^{k+1}=R_2$.
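The selection rule above can be sketched as follows. The sketch assumes the azimuth of a plane normal is the angle of its x-y projection and compares azimuths with wrap-around to (−π, π]; the text compares plain absolute differences, so the wrap-around, the angle convention, and the helper names are assumptions for illustration.

```python
import numpy as np

def azimuth_of(normal):
    """ASSUMED convention: azimuth is the angle of the normal's x-y part."""
    return np.arctan2(normal[1], normal[0])

def select_rotation(R1, R2, coarse_azimuth):
    """Keep the rotation whose normal azimuth is closer to the coarse one."""
    def gap(phi):
        # Absolute angular difference, wrapped to [0, pi].
        return abs((phi - coarse_azimuth + np.pi) % (2 * np.pi) - np.pi)
    return R1 if gap(azimuth_of(R1[:, 2])) < gap(azimuth_of(R2[:, 2])) else R2

def rot_zy(phi, tilt):
    """Helper for the example: Rz(phi) @ Ry(tilt)."""
    cz, sz, cy, sy = np.cos(phi), np.sin(phi), np.cos(tilt), np.sin(tilt)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rz @ Ry

# Two candidates whose normals point in opposite azimuth directions.
R1 = rot_zy(0.5, 0.3)
R2 = rot_zy(0.5 + np.pi, 0.3)
chosen = select_rotation(R1, R2, coarse_azimuth=0.4)
```

Anchoring every iteration to the same coarse azimuth keeps each trial on one branch of the two-fold planar ambiguity, which is how the pose redundancy is avoided.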
In the embodiment, the said pose includes: pose angles and the translation vector. It should be pointed out that the pose angles can be obtained from the rotation matrix. Hence, the said pose is equivalent to the rotation matrix and the translation vector. In addition, one notable advantage of the embodiment of the present invention is its ability of finding the points-correspondences simultaneously when estimating the pose. Therefore, the final output result includes the rotation matrix, the translation vector, and the match matrix.
S208, using the new tangent-image plane to estimate the translation vector and the updated “tangent-image” point;
Here, substituting $l^{k+1}$ into (2-2), (2-3), corresponding to:
$$l^{k+1}\cdot(\omega_i^{k+1}p_i)=1$$
$$V_i^{k+1}=\omega_i^{k+1}p_i$$
yields
$$V_i^{k+1}=(\hat{x}_i^{k+1},\hat{y}_i^{k+1},\hat{z}_i^{k+1})^T \tag{2-13}$$
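The two relations above determine the scale of each bundle in closed form: the scale is the reciprocal of the dot product between the plane normal and the bundle. A minimal sketch, assuming the p_i are the unit bundles from S201 and that no bundle is parallel to the plane; the function name is hypothetical.

```python
import numpy as np

def tangent_plane_points(bundles, normal):
    """Intersect each bundle with the plane {V : normal . V = 1}."""
    bundles = np.asarray(bundles, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # omega_i = 1 / (l . p_i); a zero dot product (parallel ray) is not handled.
    omega = 1.0 / (bundles @ normal)
    return bundles * omega[:, None]

rays = np.array([[0.0, 0.0, 1.0], [3.0, 4.0, 1.0]])
rays = rays / np.linalg.norm(rays, axis=1, keepdims=True)
V = tangent_plane_points(rays, normal=[0.0, 0.0, 1.0])
```

With the normal (0, 0, 1) the tangent plane is z = 1, so the intersection points reduce to the ordinary image-plane points (u, v, 1).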
According to the input coarse azimuth angle obtained in the coarse pose estimation algorithm, we can determine a new “tangent-image” plane.
A group of points is formed from $V_i^{k+1}$ and the “tangent-image” points $\hat{q}_i^k$ of the k-th iteration as follows:
$$Q_i^k=(\hat{x}_i^k,\hat{y}_i^k,\hat{z}_i^{k+1})^T \tag{2-14}$$
Then the rigidity constraints can be expressed by:
$$\lambda Q_i^k=Rp_{wi}+T \tag{2-16}$$
The parameters λ and R in (2-16) have been obtained in the step of S207, $Q_i^k$ is obtained in this step, and the model points $p_{wi}$ are known.
The translation vector T then can be obtained by minimizing the least-squares problem as follows:
Since the function in (2-17) is quadratic in T, given a fixed rotation R, the optimal value for T can be computed in closed form as:
Therefore, based on {λ(Qik), i=1, . . . , n}, the model points {pwi}, and the rotation matrix Rk, we can obtain the translation vector as follows:
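For a fixed rotation, a least-squares fit of the rigidity constraint (2-16) has a simple closed form. The sketch below assumes an unweighted sum of squared residuals over all points, in which case the optimal translation is the mean of λQ_i − R p_wi; if the disclosed cost weights the residuals (for example by the match matrix), the result would differ. The function name and sample values are hypothetical.

```python
import numpy as np

def estimate_translation(Q, model_points, R, lam):
    """Minimize sum_i ||lam * Q_i - (R p_i + T)||^2 over T (unweighted)."""
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(model_points, dtype=float)
    # Setting the gradient to zero gives T = mean(lam * Q_i - R p_i).
    return (lam * Q - P @ R.T).mean(axis=0)

# Synthetic check: planar model points (z = 0), known pose, lam = 2.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
T_true = np.array([0.1, -0.2, 5.0])
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
Q = (P @ R_true.T + T_true) / 2.0
T_est = estimate_translation(Q, P, R_true, lam=2.0)
```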
In the step, computing the intersection points of the bundles with each of the two tangent planes of the unit sphere in the camera reference frame to obtain two sets of intersection points; forming the new “tangent-image” points from the x-entries and the y-entries of the intersection points; forming the three-dimensional estimated camera representatives of the model points from the z-entries of the new “tangent-image” points and the coordinates of the “old” “tangent-image” points; and using these representatives and the model points to estimate two translation vectors by (2-18). In the step, the new “tangent-image” plane and the new “tangent-image” points are then selected.
In the embodiment, both said coarse pose angles and coarse z-entries of the translation vector are used as initializations to determine two new “tangent-image” planes, and two estimated poses, estimated pose 1 and estimated pose 2, are obtained based on the said new “tangent-image” planes. It should be pointed out that one of the two estimated poses is selected based on the input coarse azimuth angle, and that the said estimated pose includes the rotation matrix and the translation vector.
S209, judging whether the convergence criterion is satisfied; if it is, then terminating the iterative searching and going to S210; otherwise, going back to S206 and continuing the iterative searching with the new “tangent-image” points;
In the step, two ways for determining the convergence of the iterative searching are as follows:
(1) Judging whether the convergence criterion is satisfied, specifically, judging whether the uncertainty parameter βk+1=βk×βu is bigger than βF; if βk+1>βF, then exit the iterative searching and go to S210; otherwise, go back to S206; or
(2) Judging whether the convergence criterion is satisfied, specifically judging whether the value of cost function (2-7) is smaller than a prescribed value ε; if the value of the cost function (2-7) is smaller than the prescribed value ε, then exit the iterative searching, do S210; otherwise, go back to S206.
In the step, subtracting the second pose angles of the estimated poses from the coarse azimuth angle of the said coarse pose in the step of S203, selecting the “tangent-image” plane and the “tangent-image” points corresponding to the smaller difference as the new “tangent-image” plane and the new “tangent-image” points respectively, and then going back to step S206 until the convergence criterion is satisfied.
Let the new “tangent-image” points given by (8) be:
qik+1=(xik+1,yik+1)T (2-14′)
In this step, the new “tangent-image” plane and new “tangent-image” points are used as the current estimated results, and steps S206-S209 are repeated until the prescribed convergence criterion is satisfied. Said convergence criterion is the criterion for terminating the iterative loop: the loop terminates if β>βF, or if the value of the cost function (2-7) is smaller than a prescribed value ε.
Then, after n iterations, we obtain one estimated pose, referred to as the candidate estimated pose 1; in the same way, using the other coarse pose as the initialization of the iterative searching, we obtain the other estimated pose, referred to as the candidate estimated pose 2.
In the following, we give a further explanation of the iterative searching of the embodiment with reference to
With the initializations as the two coarse poses obtained by the step of S203 respectively, the iterative searching continues as illustrated by
S210, retaining the value of the cost function corresponding to the initialization with a given coarse pose, and judging whether both coarse poses have served as initializations; if not, going back to S203; if so, going to step S211;
If there is a coarse pose that has not been used as the initialization, then go back to S203, use it as the initialization of the iterative searching, and begin the iterative process for accurately estimating the pose of a planar model.
S211, based on the values of the cost function obtained from the iterative searching initialized with the different coarse poses, obtaining the final output result of the accurate pose.
In this step, the value of the cost function is retained each time the iterative searching finishes, and the candidate estimated pose corresponding to the smaller value of the cost function is selected as the final output. Here, said cost function refers to a quadratic function in terms of the correspondences between the model points and the image points and the transformation relating the model points and the image points.
For example, if the cost function value of the candidate estimated pose 1 is smaller than that of the candidate estimated pose 2, then use the candidate estimated pose 1 as the final output; otherwise, use the candidate estimated pose 2 as the final output.
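The selection in S210-S211 can be sketched as follows. The callable `refine` is a hypothetical stand-in for the iterative searching of S206-S209; it is assumed to return a candidate estimated pose together with its final cost function value. All names are illustrative and not part of the disclosure.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def select_final_pose(
    coarse_poses: Sequence[Pose],
    refine: Callable[[Pose], Tuple[Pose, float]],
) -> Tuple[Pose, float]:
    """Run the (hypothetical) iterative search once per coarse pose and keep
    the candidate with the smaller cost function value."""
    if len(coarse_poses) != 2:
        raise ValueError("exactly two coarse poses are expected")
    candidate_1 = refine(coarse_poses[0])  # (pose, cost) from coarse pose 1
    candidate_2 = refine(coarse_poses[1])  # (pose, cost) from coarse pose 2
    # Candidate 1 is used only when its cost is strictly smaller;
    # otherwise candidate 2 is used.
    return candidate_1 if candidate_1[1] < candidate_2[1] else candidate_2


# Usage with a dummy refinement that only looks up a stored cost.
dummy_costs = {"coarse pose 1": 0.3, "coarse pose 2": 0.1}
print(select_final_pose(list(dummy_costs), lambda p: (p, dummy_costs[p])))
```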
The Exemplary Embodiment of the Accurate Pose Estimation Method
The extended TsPose algorithm in the disclosed invention is based on the assumption that the planar model is known and the camera is calibrated. Unless otherwise stated, we use image points normalized by the focal length, i.e., points on an image plane located at the focal length f=1.
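A minimal sketch of this normalization is given below. It assumes square pixels and a known principal point; the function name and argument names are illustrative. The example values (focal length 769 pixels, principal point (256, 256)) are those of the simulated camera in the numerical embodiment.

```python
from typing import Tuple


def normalize_pixel(u_px: float, v_px: float, focal_px: float,
                    cx: float, cy: float) -> Tuple[float, float]:
    """Map a pixel coordinate to the image plane located at f = 1.

    The principal point (cx, cy) is subtracted first, then the result is
    divided by the focal length expressed in pixels (square pixels assumed).
    """
    return ((u_px - cx) / focal_px, (v_px - cy) / focal_px)


# A pixel lying one focal length to the right of the principal point
# maps to (1.0, 0.0) on the normalized image plane.
print(normalize_pixel(1025.0, 256.0, focal_px=769.0, cx=256.0, cy=256.0))
```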
As is illustrated by
Assume that the planar model is known and that the z axis of the world reference frame is along the normal vector of the planar model. The world reference frame, denoted by owxwywzw, is defined such that its x, y and z axes satisfy the right-hand rule. Since the model points are coplanar, without loss of generality we denote them by pwi=(xwi, ywi, 0)T and write their homogeneous coordinates as (xwi, ywi, 1)T. The camera representatives of the model points are pi=(xi, yi, zi)T, where i=1, 2, . . . , n; the image points are qi=(ui, vi)T, and the homogeneous image points are (ui, vi, 1)T.
Let oqi=(ui, vi, 1)T, i=1, 2, . . . , n, and qi=oqi/∥oqi∥, and let the unit normal vector of the image plane π be l=(lx, ly, lz)T. Since the image plane in this invention is both a tangent plane of a unit sphere and a plane for imaging, the image plane π is briefly referred to as the tangent-image plane, denoted by:
The intersection points of the bundles with the tangent-image plane are denoted by $V_i=(\hat{x}_i, \hat{y}_i, \hat{z}_i)^T$, and the $V_i$ are computed by combining (2-2) and (2-3):
$l\cdot(\omega_i p_i)=1$ (2-2)
$V_i=\omega_i p_i$ (2-3)
where ωi is a scale factor, i=1, 2, . . . , n.
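Equations (2-2) and (2-3) together give the scale factor as the reciprocal of the inner product of l and pi. The sketch below illustrates this for a single point; the function name and the tolerance used to reject bundles parallel to the plane are assumptions.

```python
from typing import Sequence, Tuple


def intersect_tangent_plane(
    l: Sequence[float], p: Sequence[float]
) -> Tuple[float, Tuple[float, float, float]]:
    """Intersect the bundle through p with the plane l . V = 1.

    From l . (w * p) = 1 the scale factor is w = 1 / (l . p), and the
    intersection point is V = w * p.
    """
    dot = l[0] * p[0] + l[1] * p[1] + l[2] * p[2]
    if abs(dot) < 1e-12:  # assumed tolerance: bundle parallel to the plane
        raise ValueError("the bundle does not intersect the tangent plane")
    w = 1.0 / dot
    return w, (w * p[0], w * p[1], w * p[2])


# With l along the optical axis, the intersection is the usual projection onto z = 1.
w, v = intersect_tangent_plane((0.0, 0.0, 1.0), (2.0, 4.0, 8.0))
print(w, v)
```

The first two entries of the returned point are the coordinates of the “tangent-image” point, and the third entry is its z-entry.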
To avoid confusion with the given image points q, we use $\hat{q}$ to denote the “new” image points, briefly referred to as “tangent-image” points. The coordinates of the “tangent-image” points are formed by taking the x and y entries of the points $V_i$, i.e., $\hat{q}_i=(\hat{x}_i, \hat{y}_i)^T$, and their corresponding homogeneous points are $(\hat{x}_i, \hat{y}_i, 1)^T$. Assume that the tangent-image plane π has been updated by k iterations, i.e., at the kth iteration the unit normal vector of the tangent-image plane π is $l^k$ and the “tangent-image” points are $\hat{q}_i^k$ (the superscript k denotes the kth iteration, and the subscript i denotes the index of the “tangent-image” point). Then the homogeneous “tangent-image” points and the homogeneous model points are related by the following affine transformation:
(2-4) can be briefly rewritten as:
q̂ik=λ(A
where, λ is a scale factor,
and B=(b1, b2)T.
To solve the affine parameters of (2-5), we define the cost function as follows:
where, h1=λ(a11, a12, b1)T, h2=λ(a21, a22, b2)T.
When the points-correspondences are unknown, the cost function E in (2-6) must be modified accordingly. In the following, we first introduce the match matrix M representing the points-correspondences. Let the number of model points be n1 and the number of image points be n2. Since the points-correspondences are unknown, each image point qi (i=1, 2, . . . , n2) can match any of the model points pwj (j=1, 2, . . . , n1). Suppose that the variables mij (i=1, 2, . . . , n2, j=1, 2, . . . , n1) represent the correspondences, so that mij=1 implies a matched pair while mij=0 implies an unmatched pair. The matrix M must therefore be a permutation matrix: each element of M is either 0 or 1, and each row and each column of M adds up to one, i.e., a two-way constraint is imposed on M. To account for occlusion and for failures of the image detection algorithm under some conditions, we add an extra row and column to M, slack row n1+1 and slack column n2+1, to represent the unmatched points. A value of 1 in the slack row n1+1 at column k, mi,n
To simultaneously solve for the match matrix M and the affine transformation, we modify the cost function in (2-6) as follows:
where h1=λ(a11, a12, b1)T, h2=λ(a21, a22, b2)T, and α is a small constant.
The match matrix M is updated by the Sinkhorn algorithm with the following steps:
a) initialize M by mij0=γ×e(β×d
b) normalize each row of the matrix by:
c) normalize each column of the matrix by:
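The alternating normalization can be sketched as follows. The sketch follows the usual softassign convention, in which the slack row and the slack column are themselves left unnormalized; this convention, the last-row and last-column placement of the slack entries, the iteration limit and the tolerance are all assumptions, since the normalization formulas are not reproduced in this text. The initialization of step a) is not covered.

```python
from typing import List


def sinkhorn_with_slack(m: List[List[float]], iterations: int = 50,
                        tol: float = 1e-9) -> List[List[float]]:
    """Alternately normalize rows and columns of a strictly positive matrix.

    The last row and the last column are treated as slack: the slack row is
    skipped during row normalization and the slack column is skipped during
    column normalization. The input matrix is not modified.
    """
    work = [list(row) for row in m]
    n_rows, n_cols = len(work), len(work[0])
    for _ in range(iterations):
        previous = [row[:] for row in work]
        # Row step: every non-slack row sums to one over all columns.
        for i in range(n_rows - 1):
            total = sum(work[i])
            work[i] = [value / total for value in work[i]]
        # Column step: every non-slack column sums to one over all rows.
        for j in range(n_cols - 1):
            total = sum(work[i][j] for i in range(n_rows))
            for i in range(n_rows):
                work[i][j] /= total
        change = max(abs(work[i][j] - previous[i][j])
                     for i in range(n_rows) for j in range(n_cols))
        if change < tol:
            break
    return work


# Two image points and two model points plus one slack row and one slack column.
m0 = [[1.0, 2.0, 0.5], [3.0, 1.0, 0.5], [0.5, 0.5, 0.5]]
for row in sinkhorn_with_slack(m0, iterations=200):
    print([round(value, 4) for value in row])
```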
The cost function $E_k$ is quadratic in the unknown vectors $h_1$ and $h_2$. Let the partial derivatives of $E_k$ with respect to $h_1$ and $h_2$ be
$\partial E_k/\partial h_1$ and $\partial E_k/\partial h_2$
respectively. To compute the optimal values of $h_1$ and $h_2$ that minimize the cost function $E_k$, we set
$\partial E_k/\partial h_1=0$, $\partial E_k/\partial h_2=0$.
When the correspondence variables $m_{ij}$ are fixed, the optimal values of $h_1$ and $h_2$ are computed in closed form as:
Assume that
we can recover two rotation matrices R1 and R2 with the 2×2 matrix A embedded in the upper left part, and the scale factor λ from the 2×2 matrix A′, the specific forms of the two rotation matrices are as follows:
where $S=-\mathrm{sign}(a_{11}a_{21}+a_{12}a_{22})$, sign denotes the sign of its argument, the three columns of each of the two rotation matrices satisfy the right-hand rule, and $S_r=a_{11}^2+a_{12}^2+a_{21}^2+a_{22}^2$.
Note that the unit normal vector l of the tangent-image plane π is along the optical axis and λl represents a vector perpendicular to the planar model in the camera reference frame. Hence, we use the positive λ, corresponding to:
Let $l_1$ and $l_2$ be the third columns of the rotation matrices $R_1$ and $R_2$; they are the unit normal vectors of the two new “tangent-image” planes. The pose angles computed from $l_1$ are $(\theta_1, \varphi_1)$ and those computed from $l_2$ are $(\theta_2, \varphi_2)$, where θ denotes an elevation angle and φ denotes an azimuth angle. Suppose that the coarse pose angles obtained by S203 are $(\theta, \varphi)$. If $|\varphi_1-\varphi|<|\varphi_2-\varphi|$, then $l^{k+1}=l_1$; otherwise $l^{k+1}=l_2$.
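The selection rule above can be sketched as follows. The function name and the pairing of each normal vector with its azimuth angle are illustrative assumptions. The comparison uses the plain absolute difference stated in the text, so no angle wrap-around is handled.

```python
from typing import Tuple

Vector = Tuple[float, float, float]


def select_normal(candidate_1: Tuple[Vector, float],
                  candidate_2: Tuple[Vector, float],
                  coarse_azimuth: float) -> Vector:
    """Pick the normal whose azimuth is closer to the coarse azimuth.

    Each candidate is a (normal vector, azimuth angle) pair. Candidate 1 is
    chosen only when its azimuth difference is strictly smaller.
    """
    l1, phi1 = candidate_1
    l2, phi2 = candidate_2
    if abs(phi1 - coarse_azimuth) < abs(phi2 - coarse_azimuth):
        return l1
    return l2


# Example with assumed angles in degrees and placeholder normal vectors.
normal_a = (0.2, 0.3, 0.9)
normal_b = (-0.2, -0.3, 0.9)
print(select_normal((normal_a, 55.0), (normal_b, 240.0), coarse_azimuth=60.0))
```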
Then, substituting $l^{k+1}$ into (2-2) and (2-3) gives:
$l^{k+1}\cdot(\omega_i^{k+1}p_i)=1$
$V_i^{k+1}=\omega_i^{k+1}p_i$
From the two equations above, we obtain
$V_i^{k+1}=(\hat{x}_i^{k+1},\hat{y}_i^{k+1},\hat{z}_i^{k+1})^T$ (2-13)
Using $V_i^{k+1}$ and the “tangent-image” points $\hat{q}_i^k$ of the kth iteration, we form a set of points as follows:
$Q_i^k=(\hat{x}_i^k,\hat{y}_i^k,\hat{z}_i^{k+1})^T$ (2-14)
Based on the rigidity constraints, we obtain:
$\lambda Q_i^k=Rp_{wi}+T$ (2-16)
The translation vector can be obtained by minimizing the least-squares errors as:
Since (2-17) is quadratic in T, given a fixed rotation matrix R, the optimal value of the translation vector T is given in closed form as:
Therefore, based on the points $\{\lambda Q_i^k,\ i=1, \ldots, n\}$, $\{p_{wi}\}$, and the rotation matrix $R^k$, we can obtain:
It should be pointed out here that, in the embodiment, said extended TsPose is an algorithm combining the softassign algorithm for finding correspondences with the TsPose algorithm for accurately estimating the pose of a planar model. The match matrix M in the extended TsPose algorithm is one of the parameters to be solved in the iterative searching; when the match matrix M remains constant and equal to an identity matrix during the iterations, the extended TsPose algorithm reduces to the TsPose algorithm.
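For the special case just mentioned, in which the match matrix is the identity and point i of the model corresponds to point i of the image, minimizing the sum of squared residuals of (2-16) over T for a fixed rotation gives the mean of λQi − R pwi. The sketch below illustrates only this special case; the function name and data layout are assumptions, and the weighting by a general match matrix is not covered.

```python
from typing import List, Sequence


def estimate_translation(rotation: Sequence[Sequence[float]], lam: float,
                         q_points: Sequence[Sequence[float]],
                         model_points: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares translation for a fixed rotation and known correspondences.

    Minimizing sum_i || lam * Q_i - R p_wi - T ||^2 over T yields
    T = (1 / n) * sum_i (lam * Q_i - R p_wi).
    """
    n = len(q_points)
    if n == 0 or n != len(model_points):
        raise ValueError("point sets must be non-empty and of equal size")
    total = [0.0, 0.0, 0.0]
    for q, p in zip(q_points, model_points):
        rotated = [sum(rotation[r][c] * p[c] for c in range(3)) for r in range(3)]
        for r in range(3):
            total[r] += lam * q[r] - rotated[r]
    return [value / n for value in total]


# Synthetic check: build Q_i from a known translation and recover it.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
true_t = (1.0, 2.0, 10.0)
lam = 2.0
q_pts = [tuple((p[r] + true_t[r]) / lam for r in range(3)) for p in model]
print(estimate_translation(identity, lam, q_pts, model))
```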
The simulated camera has a focal length of 769 pixels, an image size of 512×512, a principal point at (256, 256), and a pixel size of 0.0076 mm. The model points and the added spurious points are uniformly distributed in a square of 200 mm width. The number of model points is 50, the ratio between the number of spurious points and that of the model points is 1:2, and the number of image points is 75. Gaussian noise with zero mean and a standard deviation of 1 pixel is added to the image points. The elevation angle is set to 65 degrees, the azimuth angle is set to 60 degrees, and the translation vector is [0.5; 1; 1600] (mm). The maximum number of iterations for the extended TsPose algorithm is 50.
Step 1: setting the number of neighbor points in the Euclidean distance sense to 7, and constructing 50 model clusters and 50 image clusters.
Step 2: applying a transformation to the 50 model clusters, which includes: subtracting the corresponding centroid from each model cluster, then applying the “whitening transform”, computing the polar angles of each model cluster, and arranging the model points in each model cluster in increasing order of polar angle; similarly, applying the same procedure to the image clusters.
Step 3: computing the subspace distance between the image points of an image cluster in 8 permutations and the ordered model points in a model cluster, and then constructing the subspace distance matrix.
Step 4: setting the threshold 2 for the subspace distances, and selecting the model clusters and image clusters whose corresponding subspace distances fall below the threshold 2.
Step 5: using the original points in the model clusters and image clusters whose indices are the same as those of the selected points in the matching pairs to estimate the affine transformation and then obtain a coarse pose estimate associated with each matching pair, and using the k-means clustering method to cluster all the estimated poses to yield two coarse poses.
Step 6: using both coarse pose estimates as the initializations of the extended TsPose algorithm and as the baseline for selecting the estimated pose in the iterative searching, and obtaining two candidate estimated poses and match matrices.
Step 7: if the cost function value of the candidate estimated pose 1 is smaller than that of the candidate estimated pose 2, using the candidate estimated pose 1 as the final pose result and the corresponding match matrix as the final match matrix; otherwise, using the candidate estimated pose 2 and its corresponding match matrix as the final output result.
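Parts of Steps 1 and 2 can be sketched as follows. The sketch assumes that a cluster consists of a seed point together with its k nearest neighbors, which is an interpretation and not stated explicitly, and it deliberately omits the “whitening transform”, so only the centroid subtraction and the polar-angle ordering are shown. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_cluster(points: Sequence[Point], index: int, k: int = 7) -> List[Point]:
    """Return the seed point followed by its k nearest neighbors (Euclidean)."""
    sx, sy = points[index]
    others = sorted(
        (j for j in range(len(points)) if j != index),
        key=lambda j: math.hypot(points[j][0] - sx, points[j][1] - sy),
    )
    return [points[index]] + [points[j] for j in others[:k]]


def center_and_order(cluster: Sequence[Point]) -> List[Point]:
    """Subtract the centroid and sort by polar angle in [0, 2*pi).

    The whitening transform of Step 2 is not applied in this sketch.
    """
    n = len(cluster)
    mean_x = sum(p[0] for p in cluster) / n
    mean_y = sum(p[1] for p in cluster) / n
    centered = [(x - mean_x, y - mean_y) for x, y in cluster]
    return sorted(centered, key=lambda p: math.atan2(p[1], p[0]) % (2.0 * math.pi))


# Small example with k = 2 instead of the 7 neighbors used in the embodiment.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (-3.0, 0.0)]
print(center_and_order(knn_cluster(pts, 0, k=2)))
```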
We present a curve plot of the cost function in the numerical exemplary embodiment, as is illustrated by
The present invention includes an embodiment of the method of the simultaneous pose and points-correspondences determination from a planar model, which can solve the problem of pose redundancy in the simultaneous pose and points-correspondences determination from a planar model. In addition, the disclosed embodiment is based on coplanar points and places no restriction on the shape of the planar model. It performs well in a cluttered and occluded environment, and is noise-resilient in the presence of different levels of noise.
Accordingly, it should be apparent to those skilled in the art that the above steps of the exemplary embodiments of the present invention can be implemented by computer. Those steps can be implemented in a single computer or in a network composed of several computers; alternatively, they can be implemented as computer-readable code, and thus by a computer-readable medium storing a computer program, or by different electronic devices or an integrated electronic device. Other non-limiting exemplary embodiment(s) may be utilized and derived from the disclosure such that structural and logical substitutions and changes may be made without departing from the true spirit and scope of the present disclosure.
To implement said method, the present invention also discloses a computer program product of the simultaneous pose and points-correspondences determination from a planar model, comprising:
A computer-readable medium storing a computer program implementing the method described above, and a computer-readable code implementing the method of the simultaneous pose and points-correspondences determination from a planar model; the method of the simultaneous pose and points-correspondences determination from a planar model including:
Obtaining two possible coarse poses of the planar model based on a coarse pose estimation algorithm;
Obtaining two pose candidates based on the iterative searching of the extended TsPose algorithm which is initialized by each of the two possible coarse pose estimates; and
Selecting a pose as the final output from the two pose candidates based on said coarse poses.
The computer program product described above, wherein said obtaining two pose candidates based on the iterative searching of the extended TsPose algorithm further comprises:
using each of the two coarse pose estimates as an initial estimate, forming the initial correspondences matrix based on the computations of the initial “tangent-image” points and updating the correspondences matrix, and estimating the objective function value;
updating the affine transformation and recovering two rotation matrices based on the updated affine transformation, selecting a new rotation matrix and a new tangent-image plane based on the coarse azimuth angle of the coarse pose used in the initialization, and estimating a new translation vector and new “tangent-image” points based on the new tangent-image plane;
determining whether the convergence criterion is satisfied, if it is satisfied, then terminating the iterative searching of the extended TsPose algorithm; if not, then continuing the new iteration based on the estimated new “tangent-image” points until the convergence criterion is satisfied; and
said pose estimate including a rotation matrix, a translation vector, and/or a correspondence matrix; using said pose estimate as a pose candidate when the iterative searching is terminated.
The computer program product described above, wherein said method further comprises:
When the iterative searching terminates, judging whether each said initial estimate has been used; if there is an unused said initial estimate, using the unused said initial estimate as the initial estimate of the extended TsPose algorithm and conducting the iterative searching of the extended TsPose algorithm to obtain a pose candidate.
The computer program product described above, wherein said convergence criterion is that the uncertainty parameter β is more than the prescribed maximum value, wherein the uncertainty parameter β corresponds to the multiplication of the initial β0 and an uncertainty update parameter βu; and/or,
said convergence criterion is that the objective function value is smaller than a prescribed parameter ε;
Whenever either of said convergence criteria is satisfied, the iterative searching is terminated.
The computer program product described above, wherein selecting a pose as the final output from the two pose candidates based on said coarse poses further comprises:
When two said initial estimates are used, selecting the pose candidate corresponding to a smaller objective function value and using it as the final output accurate pose result.
The exemplary embodiments of the present invention are provided for illustration purpose only and not for the purpose of limiting the invention as defined by the appended claims and their equivalents.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2011/083850 | 12/12/2011 | WO | 00 | 5/23/2014 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/086678 | 6/20/2013 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6985620 | Sawhney | Jan 2006 | B2 |
7957584 | Najafi | Jun 2011 | B2 |
8126260 | Wallack | Feb 2012 | B2 |
20080298672 | Wallack | Dec 2008 | A1 |
20090074238 | Pfister | Mar 2009 | A1 |
20090110267 | Zakhor | Apr 2009 | A1 |
20120148145 | Liu | Jun 2012 | A1 |
20130250050 | Kanaujia | Sep 2013 | A1 |
20150371440 | Pirchheim | Dec 2015 | A1 |
Number | Date | Country |
---|---|---|
101377812 | Mar 2009 | CN |
Entry |
---|
Philip David , Daniel DeMenthon , Ramani Duraiswami , Hanan Samet, SoftPOSIT: Simultaneous Pose and Correspondence Determination, Proceedings of the 7th European Conference on Computer Vision—Part III, p. 698-714, May 28-31, 2002. |
International Search Report in international application No. PCT/CN2011/083850, mailed on Sep. 6, 2012. |
English Translation of the Written Opinion of the International Search Authority in international application No. PCT/CN2011/083850, mailed on Sep. 6, 2012. |
Planar Pose Estimation Baesed on Monocular Vision, Jun. 20, 2011. |
Number | Date | Country | |
---|---|---|---|
20140321735 A1 | Oct 2014 | US |