The present invention concerns a system and a computer program for estimating the location(s) of at least one point and/or the pose(s) of at least one sensor.
Camera pose estimation, or the perspective-n-point (PnP) problem, aims to determine the pose (location and orientation) of a camera, given a set of correspondences between 3-D points in space and their projections on the camera sensor. The problem has applications in robotics, odometry, and photogrammetry, where it is known as space resection. In the simplest case, one can use an algebraic closed-form solution to derive the camera pose from a minimal set of 3D-to-2D correspondences. Usually, three correspondences are used and hence these algorithms are called perspective-3-point or P3P methods. When there is a redundant set of points available (more than three), the most straightforward solution is to use robust algorithms, such as RANSAC, which run P3P (or its variants) on minimal subsets of correspondences. However, such algorithms suffer from low accuracy, instability and poor noise-robustness, due to the limited number of points.
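Purely as an illustration of this prior-art strategy, the following sketch shows the generic structure of such a robust estimator; `solve_p3p` and `reprojection_error` are hypothetical placeholders for any concrete P3P solver and per-correspondence residual and are not part of the present invention.

```python
import random

def ransac_pnp(correspondences, solve_p3p, reprojection_error,
               iterations=500, inlier_threshold=2.0):
    """Generic RANSAC loop over minimal 3-point subsets of 3D-to-2D correspondences.

    `solve_p3p` (returning candidate poses for a minimal subset) and
    `reprojection_error` (residual of one correspondence under a pose)
    are supplied by the caller.
    """
    best_pose, best_inliers = None, []
    for _ in range(iterations):
        sample = random.sample(correspondences, 3)        # minimal P3P subset
        for pose in solve_p3p(sample):                    # P3P may yield several poses
            inliers = [c for c in correspondences
                       if reprojection_error(pose, c) < inlier_threshold]
            if len(inliers) > len(best_inliers):
                best_pose, best_inliers = pose, inliers
    return best_pose, best_inliers
```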
An alternative approach is to directly estimate the camera pose using an objective function, such as the l2-norm of the reprojection error, defined over all available point correspondences. Minimisation of the l2-norm leads to the maximum likelihood estimator if a Gaussian noise model is assumed. However, the main drawback of the l2-norm is that its resulting cost function is non-convex and usually has many local minima. Therefore, iterative algorithms are used that rely on a good initialisation. The shortcomings of the l2-norm have led to the use of other norms, such as the l∞-norm. The main advantage of the l∞-norm is that its minimisation can be formulated as a quasi-convex problem and solved using Second-Order Cone Programming (SOCP). This leads to a unique solution; however, SOCP techniques are computationally demanding and rely on the correct tuning of extra parameters.
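For reference, denoting by π(R, t, si) the projection of the i-th 3-D point under camera rotation R and translation t, and by qi its measured image position, the two objective functions mentioned above can be written in a generic form (not claimed to be the exact notation of the original) as:

```latex
\min_{R,\,t}\ \sum_i \bigl\| \pi(R,t,s_i) - q_i \bigr\|_2^2
\qquad (\ell_2\text{-norm, maximum likelihood under Gaussian noise})
```

```latex
\min_{R,\,t}\ \max_i \bigl\| \pi(R,t,s_i) - q_i \bigr\|
\qquad (\ell_\infty\text{-norm, quasi-convex, solvable via SOCP})
```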
There is an interesting, well-known duality between pose estimation and triangulation, which allows common algorithms to be used for both problems. Triangulation estimates the location of a point given its projection in a number of calibrated cameras. Various triangulation algorithms exist, most of which again rely on minimising the reprojection error. To see the duality, note that in both cases we have a set of projections and we want to estimate the location of an object of interest: the camera, in pose estimation, and the point, in triangulation.
Finally, the most difficult problem is to jointly estimate the locations of the points in a scene and the poses of one or more cameras taking different images from different viewpoints. This problem is often called the SLAM problem, where SLAM stands for simultaneous localization and mapping. The SLAM problem is typically addressed with algorithms similar to those used for the two problems described before.
It is an object of the invention to provide a method that solves these problems faster and with higher quality.
According to the invention, these aims are achieved by means of the independent claims.
The approach of calculating the sub-spaces of potential locations of point(s) and/or of potential sensor pose(s) on the basis of the known information and of the errors, and of intersecting these sub-spaces into a point and/or pose intersection region, results in a reliable and efficient estimator for the location of point(s) and/or the sensor pose(s). Tests showed that this approach outperforms existing methods in reliability and speed.
The dependent claims refer to further advantageous embodiments.
The invention will be better understood with the aid of the description of an embodiment given by way of example and illustrated by the figures.
The at least one sensor is configured to take at least one image of a scene. Preferably, the sensor comprises a plurality of pixels, wherein each pixel occupies a pixel area on the sensor. Preferably, the sensor is a camera with a camera centre. In one embodiment, the camera is a pinhole camera or can be approximated by a pinhole camera. However, the invention is also applicable to fisheye cameras. In one embodiment, the at least one sensor is configured to take a set of images from different viewpoints. This can be realized by moving one sensor to different sensor poses and taking, at different times, the image corresponding to each viewpoint related to each sensor pose. However, it is also possible to have different sensors at different sensor poses to take the images from different viewpoints at the same time, as with stereo cameras or camera arrays. Obviously, it is also possible to combine those approaches and take some of the images with different sensors at first poses at a first time and some others of the images with the different sensors at second poses at a second time. A sensor pose is the combination of sensor location and sensor orientation. If in the following a sensor pose is determined, computed or estimated, it is meant that the sensor location and/or the sensor orientation is determined, computed or estimated.
The processor is any means configured to perform the methods described and/or claimed in the following; that is, any means to automatically perform the described/claimed steps. The processor could be a single processor or could comprise a plurality of interconnected processors. Those interconnected processors could be in a common housing or building, or could be remote from each other and connected by a network such as the Internet.
For simplicity and ease of visualisation, but without limiting the invention, we will describe the embodiments in a two dimensional space, resulting in one dimensional images; the extension of the idea to a three dimensional space and two dimensional sensors is straightforward and is also part of the invention.
If pi were known, equation (1) would define a line on which the point si must be located. However, due to the area of the pixel in the image plane I, the exact position pi of the point source si cannot be detected. It is only known that the exact position pi lies somewhere in the sub-region related to that pixel qi.
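Equation (1) itself is not reproduced in this text. A plausible form, assuming for illustration a two dimensional pinhole camera at the origin with focal length f looking along the z-axis, would be

```latex
p_i \;=\; f\,\frac{s_{i,x}}{s_{i,z}} \tag{1}
```

and, for a fixed projection pi, such an equation indeed constrains the point source si = (si,x, si,z) to a line through the camera centre.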
In a first embodiment, a point source s shall be localised on the basis of a set of images taken from different viewpoints.
In a first step, the point source s represented in the images of the set of images is identified, as shown by way of example in the figures.
In a second step, for each image of the set of images, the sub-space of potential locations of the point source s in the scene is computed on the basis of the viewpoint of the image and on the basis of the sub-region of the image representing the point source. This step is illustrated by an example in the figures.
In a third step, a point intersection region is computed by intersecting the sub-spaces of potential locations of the point source s obtained from the images of the set of images.
In a fourth step, a point 4′ of the point intersection region 4 is selected as an estimate for the location of the point source s. A good selection algorithm could be the centre of mass of the intersection region 4. However, other selection algorithms could be used to select a point in the point intersection region 4. The intersection region 4 can be shown to be consistent and therefore always yields a reliable estimate for the localisation of the point source.
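A minimal computational sketch of the second to fourth steps is given below. It uses the shapely library for the polygon operations and assumes, purely for illustration, a two dimensional pinhole model, a particular rotation convention, and a truncation of each sub-space of potential locations at a maximum depth so that it becomes a bounded wedge (triangle); none of these choices is prescribed by the embodiment.

```python
import math
from shapely.geometry import Polygon

def location_wedge(cam_pos, cam_theta, q, w, f, max_depth=100.0):
    """Sub-space of potential locations of a point source for one image,
    truncated at max_depth so that it becomes a bounded wedge (triangle).

    cam_pos   -- camera centre (x, z) in the scene (the viewpoint)
    cam_theta -- camera orientation in radians (assumed convention)
    q, w      -- centre and width of the sub-region (pixel) detecting the point
    f         -- focal length
    """
    corners = [cam_pos]
    for p in (q - w / 2.0, q + w / 2.0):
        dx, dz = p / f, 1.0                                        # back-projected ray, camera frame
        rx = math.cos(cam_theta) * dx + math.sin(cam_theta) * dz  # rotate into the scene frame
        rz = -math.sin(cam_theta) * dx + math.cos(cam_theta) * dz
        corners.append((cam_pos[0] + max_depth * rx, cam_pos[1] + max_depth * rz))
    return Polygon(corners)

def estimate_point_location(observations):
    """Intersect the wedges of all images (third step) and return the centre
    of mass of the point intersection region (fourth step)."""
    region = location_wedge(*observations[0])
    for obs in observations[1:]:
        region = region.intersection(location_wedge(*obs))
    c = region.centroid
    return (c.x, c.y), region

# hypothetical usage: two viewpoints observing the same point source near (1, 10)
observations = [((0.0, 0.0), 0.0, 0.1, 0.01, 1.0),
                ((2.0, 0.0), 0.0, -0.1, 0.01, 1.0)]
print(estimate_point_location(observations)[0])
```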
The first embodiment could be used either for multiple sensor/camera systems or for a sensor/camera moving to different viewpoints. Examples of multiple sensor systems are linear camera arrays or circular camera arrays. The shown algorithm could help to reliably and quickly reveal the point source locations in multiple camera systems. This approach could also be used to perform adaptive camera tracking. This could be achieved by a multiple camera system in which at least one of the multiple cameras can move its pose (location and/or orientation) in order to optimise (reduce) the point intersection region for a tracked point. When the tracked point changes its location, the movable camera/sensor could adapt its position to keep the point intersection region optimal/small.
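As a purely hypothetical illustration of such adaptive tracking, a movable camera could evaluate a set of candidate poses and select the one whose observation shrinks the point intersection region the most. The helper estimate_point_location from the sketch above is reused, and observe_from is a hypothetical callback predicting the observation the movable camera would make from a candidate pose.

```python
def best_next_pose(fixed_observations, candidate_poses, observe_from):
    """Pick the candidate pose of the movable camera that minimises the area
    of the resulting point intersection region for the tracked point."""
    best_pose, best_area = None, float("inf")
    for pose in candidate_poses:
        _, region = estimate_point_location(fixed_observations + [observe_from(pose)])
        if region.area < best_area:
            best_pose, best_area = pose, region.area
    return best_pose
```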
In a second embodiment, a number of point sources are known and an unknown sensor pose shall be estimated.
In a first step, sub-regions of the image, e.g. pixels, are identified or received, wherein each sub-region represents a different point source, e.g. the point sources 5.1, 5.2 and 5.3.
In a second step, for each point source the sub-space of potential poses of the sensor in the scene is computed on the basis of the location of the point source and on the basis of the sub-region of the image representing the point source. In the following, it is shown how this subspace of potential poses of the sensor can be computed.
Given a quantised projection qi, it is known that the true projected point pi satisfies
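The inequality itself is not reproduced in this text; a plausible form, assuming the sub-region (pixel) qi has width w and is centred at its reported coordinate, is

```latex
q_i - \frac{w}{2} \;\le\; p_i \;\le\; q_i + \frac{w}{2} \tag{2}
```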
Combining equation (1) with equation (2) and rearranging results in the equations
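The rearranged equations are likewise not reproduced here. A plausible form, under an illustrative two dimensional pinhole model with sensor location (tx, tz), orientation θ and focal length f (the rotation convention and the assumption of positive depth zc > 0 are choices made for illustration only), is

```latex
\Bigl(q_i - \tfrac{w}{2}\Bigr)\, z_c \;\le\; f\, x_c \;\le\; \Bigl(q_i + \tfrac{w}{2}\Bigr)\, z_c,
\qquad
\begin{aligned}
x_c &= \cos\theta\,(s_{i,x} - t_x) - \sin\theta\,(s_{i,z} - t_z),\\
z_c &= \sin\theta\,(s_{i,x} - t_x) + \cos\theta\,(s_{i,z} - t_z).
\end{aligned}
```

For a fixed orientation θ, each of the two inequalities is linear in (tx, tz) and its boundary line passes through the point source location (si,x, si,z), which is consistent with the geometric description given below.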
Since the point source location (six, siz), the camera parameters such as the focal length f and the size w of the sub-region of the sensor are known, the sub-space of potential sensor poses can be computed in the solution space (tx, tz, θ). If the camera/sensor orientation θ is assumed to be known, the sub-space of potential sensor poses/locations for a point source is the space formed between hyperplanes going through the point source. In a two dimensional case, there are two hyperplanes/lines forming a triangle. If the orientation θ of the sensor is unknown, the solution space becomes three dimensional and the triangle rotates around the point source s for changing orientations θ.
In a third step, the pose intersection region of the sub-spaces of potential poses corresponding to the point sources is calculated.
In a fourth step, a point of said pose intersection region is selected as the estimated pose of the sensor having captured the image, i.e. as the estimated viewpoint of the image. As in the previous embodiment, the centre of mass 7′ of the pose intersection region could be used as the estimator of the sensor pose. However, other points of the intersection region 7 could also be used as the estimator.
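A minimal sketch of the second to fourth steps of this embodiment is given below, assuming for illustration that the sensor orientation θ is known, reusing the rotation convention of the earlier sketch, and truncating each sub-space of potential sensor locations at a maximum range so that it becomes a bounded polygon.

```python
import math
from shapely.geometry import Polygon

def pose_wedge(s, theta, q, w, f, max_range=100.0):
    """Sub-space of potential sensor locations for a known point source s,
    given the sub-region (pixel) q of width w in which s was observed and a
    known sensor orientation theta; truncated at max_range."""
    corners = [s]
    for p in (q - w / 2.0, q + w / 2.0):
        dx, dz = p / f, 1.0                               # viewing ray, camera frame
        rx = math.cos(theta) * dx + math.sin(theta) * dz  # rotate into the scene frame
        rz = -math.sin(theta) * dx + math.cos(theta) * dz
        # the sensor lies behind the point source along the viewing ray
        corners.append((s[0] - max_range * rx, s[1] - max_range * rz))
    return Polygon(corners)

def estimate_sensor_location(point_sources, theta, pixels, w, f):
    """Intersect the wedges of all known point sources (third step) and return
    the centre of mass of the pose intersection region (fourth step)."""
    region = pose_wedge(point_sources[0], theta, pixels[0], w, f)
    for s, q in zip(point_sources[1:], pixels[1:]):
        region = region.intersection(pose_wedge(s, theta, q, w, f))
    c = region.centroid
    return (c.x, c.y), region

# hypothetical usage: three known point sources observed from a sensor near (0, 0)
print(estimate_sensor_location([(1.0, 5.0), (-1.0, 5.0), (0.0, 4.0)],
                               0.0, [0.2, -0.2, 0.0], 0.01, 1.0)[0])
```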
In a third embodiment, the locations of a plurality of point sources and/or the poses of different images of a set of images shall be determined on the basis of the set of images taken from different viewpoints or sensor poses.
In a first step, a plurality of point sources in the images of the set of images are identified (or received). In one embodiment, each of the identified point sources is represented in each of the images. In another embodiment, at least some of the identified point sources might be present only in some of the images of the set, because with different viewpoints new point sources are added to the images and some point sources are lost. In one embodiment, the sub-region or pixel of the image related to a point source is identified.
In a second step, for each point source of a first image of the set of images the sub-space of potential locations of the point source in the scene is computed on the basis of the sub-region of the image representing the point source.
The following steps are performed iteratively for each other image of the set of images, if there is more than one other image in the set of images. If the images are taken successively at different viewpoints, the steps could be performed in real time. In the following, the term “the other image” refers to the image of the set of images presently treated in this iteration.
In a third step, for each computed sub-space of potential locations of a point source (identified also in the other image), the sub-space of potential poses of the sensor having captured the other image is computed on the basis of the previously computed sub-space of potential locations of the point source and the sub-region of the other image representing the point source. In the second embodiment, it was described how the sub-space of potential poses of the sensor can be calculated for a single known point. Since the sub-space of potential locations of the point source is known, the sub-space of potential poses of the sensor of the other image can be calculated from this sub-space of potential locations of the point source and its sub-region on the other image or its sensor. Preferably, this is achieved by first computing, for each vertex (corner) of the sub-space of potential locations of the point source (or for some of the vertices), the sub-space of potential poses of the camera from that vertex. This is possible because the sub-space of potential locations of the point source is a convex polygon (or polyhedron for higher dimensional cases).
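A minimal sketch of this third step is given below, reusing pose_wedge from the previous sketch and the shapely helpers; combining the per-vertex sub-spaces as the convex hull of their union is an illustrative choice, not a prescription of the embodiment.

```python
from shapely.ops import unary_union

def pose_region_from_point_region(point_region, theta, q, w, f):
    """Sub-space of potential poses of the sensor of the other image, derived
    from a convex polygon of potential point-source locations: a pose wedge is
    computed from every vertex of the polygon and the wedges are combined."""
    vertices = list(point_region.exterior.coords)[:-1]   # drop the closing duplicate vertex
    wedges = [pose_wedge(v, theta, q, w, f) for v in vertices]
    return unary_union(wedges).convex_hull
```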
In a fourth step, a sensor intersection region between the sub-spaces of potential poses of the sensor of the other image corresponding to the point sources is computed. This is achieved by intersecting the sub-spaces 11.C, 12.C of potential sensor poses for the computed point sources 9.1, 9.2, resulting in a sensor intersection region 13.
In a fifth step, for each point source of the other image, the sub-space of potential locations of the point source in the scene is calculated on the basis of the sub-region of the other image representing the point source and on the basis of the sensor intersection region of the sub-spaces of potential poses of the sensor of the other image. As described in the first embodiment, the sub-space 10.1 of potential locations of a point source 9.1 could be improved by intersecting it with a sub-space of potential locations of the point source 9.1 derived from a second image (here the other image) taken from another viewpoint. Contrary to the first embodiment, the exact viewpoint or sensor pose of the other image is not known. But the sub-space of potential poses of the other image is known from the sensor intersection region 13 previously calculated. By calculating the sub-spaces of potential locations of the point source 9.1 for all poses in the sensor intersection region 13 and combining them, the sub-space of potential locations of the point source 9.1 is obtained. Similarly to the third step, it is also sufficient, in one embodiment, to calculate the sub-spaces 14.1, 14.2, 14.3 of potential locations of the point source 9.1 from the vertices of the intersection region 13 and to combine them into the sub-space 14.C of potential locations of the point source 9.1.
In a sixth step, for each point source 9.1, 9.2, the point intersection region 15 between the sub-space 14.C of potential locations of the point source 9.1 in the scene 1 derived from the other image and the previously computed sub-space 10.1 of potential locations of the point source 9.1 in the scene 1 is calculated as the new sub-space of potential locations of the point source 9.1. The fifth step could be performed first for all point sources 9.1 and 9.2 and then the sixth step could be performed for all point sources 9.1 and 9.2. In an alternative embodiment, the fifth and sixth steps could be performed successively for each point source 9.1 and 9.2 before the next point source is treated.
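A minimal sketch of the fifth and sixth steps for one point source is given below, again reusing location_wedge and unary_union from the earlier sketches; combining the per-vertex wedges as the convex hull of their union (cf. the combined sub-space 14.C) is once more an illustrative choice.

```python
def update_point_region(prev_point_region, sensor_region, theta, q, w, f):
    """Fifth step: back-project the sub-region q of the other image from every
    vertex of the sensor intersection region and combine the resulting wedges.
    Sixth step: intersect the combined sub-space with the previously computed
    sub-space of potential locations (cf. the point intersection region 15)."""
    vertices = list(sensor_region.exterior.coords)[:-1]
    wedges = [location_wedge(v, theta, q, w, f) for v in vertices]
    combined = unary_union(wedges).convex_hull
    return prev_point_region.intersection(combined)
```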
The third to sixth steps are repeated for all other images of the set of images, and the sensor intersection regions and the point intersection regions converge quickly and reliably to the true sensor poses and the true point locations.
If the locations of the point sources 9.1 and 9.2 shall be estimated, a location of each point intersection region 15 is selected to estimate the location of each point source 9.1 and 9.2. E.g. the centre of mass of the point intersection region 15 could be a good estimator. Alternatively or in addition, if the poses of the sensors shall be estimated, a pose of each sensor intersection region 13 is selected to estimate the pose of each sensor. E.g. the centre of mass of the sensor intersection region 13 could be a good estimator.
An application of such a system is in an autonomous moving device, like autonomous cars, autonomous flying devices (e.g. drones), robots, etc. Autonomous means here that the moving device controls its movement itself. The processor to determine the point source locations and the sensor pose could be arranged in the autonomous moving device or at a remote location connected to the autonomous moving device. The processor could calculate the location of the autonomous moving device on the basis of the sensor(s) in the moving device in real time and base the control of the movement of the moving device on its position relative to the identified point sources.
The geometrical operations like creating hyperplanes in a space, combining hyperplanes, intersecting subspaces, etc. can be performed on a processor by geometrical computing modules which are well known in the state of the art.
This application claims priority of the provisional patent application U.S. 62/232,667, filed Sep. 25, 2015 the contents whereof are hereby incorporated by reference.