The present invention relates to an apparatus and a method for localising a vehicle along a route.
Vehicle localisation has been a vigorously researched topic over the last few decades. For road vehicles especially, a recent approach, used by Behringer et al. (An Autonomous Ground Vehicle for Desert Driving in the DARPA Grand Challenge 2005, Proceedings of the 2005 IEEE Intelligent Transportation Systems Conference, pages 644-649, 2005) and Chen et al. (Developing a Completely Autonomous Vehicle, IEEE Intelligent Systems, 19(5):8-11, 2004), is to use some combination of differential global positioning systems (DGPS), inertial sensing and 3D laser sensing coupled with a prior survey.
The motivation of the present inventors is the generation of a high precision localisation without reliance on external infrastructure or workspace modification. Localisation and pose estimation derived from local sensors suffer from compounding errors. For example, stereo Visual Odometry (VO) produces locally metric maps and trajectories. However, when extending to larger scales without correction, the metric precision is lost and maps and trajectories become only topologically correct. Small angular errors compound, and over the course of a few hundred meters they lead pose estimates to be tens of meters in error. This makes "knowing where you are" impossible without some sort of correction or reference to prior data.
Previous work using a VO system, such as that performed by Napier et al. (Real-time Bounded-Error Pose Estimation for Road Vehicles using Vision, 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 1141-1146, 2010), attempted to correct for these small errors using aerial images as prior information, thereby maintaining the metric accuracy and global consistency of trajectory estimates. A coarse-to-fine approach was adopted in which progressively finer refinements to the pose estimates were made by matching images from the local stereo camera to aerial images. This approach produced pose estimates commensurate with the performance of off-the-shelf GPS over kilometer scales. A similar approach by Pink et al. (Visual Features for Vehicle Localization and Ego-Motion Estimation, IEEE Intelligent Vehicles Symposium, pages 254-260, 2009) extracts road markings from aerial and camera images to perform the localisation. Another approach, by Kummerle et al. (Large Scale Graph Based SLAM using Aerial Images as Prior Information, Proc. of Robotics: Science and Systems, 2009), extracts edges of buildings from aerial images for corrections using a 2D laser based system.
However, the suitability of aerial images for reliable and accurate correction of VO poses for road vehicles has its limitations: the road surface is often occluded by trees and bridges, and image resolution is of the order of tens of centimeters per pixel.
Feature based methods have been shown by Furgale et al. (Visual Teach and Repeat for Long-Range Rover Autonomy, Journal of Field Robotics, 27(5):534-560, 2010) to be very sensitive to relatively small changes in viewpoint, leading to a significant drop-off in matched features available for localisation.
During a survey stage of a route to be traversed, the present inventors leverage a VO system to synthesise a continuous image strip of the road as seen from above: a synthetic local orthographic image. This strip need not be metrically correct over large scales (100 m), but locally it provides an excellent template against which to match views obtained during subsequent traversals. In contrast with many registration techniques, the vehicle pose is not obtained with a feature based registration technique. Instead, the vehicle pose is obtained relative to a survey trajectory by maximising the mutual information between synthetic local orthographic images and a current, live view. The synthetic images allow localisation when traveling in either direction over the road surface. With this in hand, global localisation is possible if the survey vehicle's trajectory has been post-processed and optimised into a single global frame.
According to a first aspect of the present invention, there is provided a method for localizing a vehicle along a route, the method comprising:
Advantageously, the method of the present invention replaces the previously used aerial images with synthetic local orthographic images of the route. For example, these images may be generated by a survey vehicle, and vehicle localization for subsequent traversals of a route is done relative to the survey vehicle's trajectory. This approach allows generation of orthographic images under bridges, under trees and in man-made structures such as multi-storey car parks, and allows for a more accurate pose estimation than that exhibited by other approaches using aerial images. This method also has the advantage that only things which can be seen from a road vehicle's perspective are included in the images, excluding rooftops, grassy fields and the like. Preferably, the plurality of orthographic images of the route collectively comprise a continuous, overhead image strip of the route, so that the vehicle need only consider the section of the strip in its vicinity.
The relation between the orthographic images and the current view preferably comprises a comparison of the mutual information between the orthographic images and the current view.
Here, the orthographic images may be represented relative to a survey vehicle trajectory which is not necessarily metrically correct over large scales. The synthetic orthographic image generation is therefore not tied to a global frame, so does not require metric global consistency. There is no reliance on GPS, any external infrastructure or workspace modification. Synthetic orthographic images are generated at a resolution two orders of magnitude higher than the best available aerial images (5 mm per pixel). Accordingly, subsequent traversals of surveyed routes by a follow vehicle can then be localized against the survey trajectory using these high resolution, high fidelity orthographic images.
In an embodiment of the invention, the orthographic images created during the survey are obtained with a stereo camera mounted on a survey vehicle. The stereo camera comprises at least first and second imaging portions, such as first and second cameras, which are used to synthesise the orthographic images. In an embodiment, the first imaging portion is arranged to image at least a portion of a route track and at least a portion of a route environment, and the second imaging portion is preferably arranged to image only the route track. It is desirable that objects in the environment of the route, such as those off the road, are observed in the stereo images, as this improves the accuracy of the visual odometry.
According to a second aspect of the present invention there is provided an apparatus for localizing a vehicle along a route, the apparatus comprising
In an embodiment, the camera comprises a stereo camera.
According to a third aspect of the present invention there is provided a vehicle comprising the apparatus of the second aspect.
According to a fourth aspect of the present invention, there is provided a computer program element comprising computer code means to make a computer execute a method according to the first aspect.
An embodiment of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:
Referring to
In order to localise the vehicle 300 (or other device) when traversing the route, the route is first surveyed at step 201 to create a plurality of orthographic images of the route. The vehicle 300 is subsequently passed along the route and the camera 101, mounted to a front bumper 104 of the vehicle 300, is arranged to acquire a live view of the route from the perspective of the vehicle 300 at step 202. The processor 102 subsequently processes the live view at step 203 according to a computer program element at step 204, to resolve the pose of the vehicle 300 relative to a survey trajectory, by maximising a relation between the orthographic images and the live camera view.
In the embodiment illustrated, the repository is mounted within the apparatus on the vehicle, however, it is to be appreciated that the repository may alternatively be disposed at a remote site and communicatively coupled with the processor via a communications link 105. It is envisaged that this would facilitate updates to be made to the route images centrally, in accordance with further surveys of the route, so that all vehicles which traverse the route can access the most up-to-date images of the route.
Referring to
where $\oplus$ and $\ominus$ represent the composition and inverse composition operators respectively, such that ${}^{i}x_{i+1} \ominus {}^{i}x_{i+1}$ is the identity transformation.
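By way of illustration only, these operators admit a compact implementation. The following is a minimal sketch for SE2 poses parameterised as [x, y, theta]; the parameterisation and helper names are assumptions of the sketch, not the patent's implementation:

```python
import numpy as np

def oplus(a, b):
    """Compose pose b, expressed in the frame of pose a, onto a = [x, y, theta]."""
    c, s = np.cos(a[2]), np.sin(a[2])
    return np.array([a[0] + c * b[0] - s * b[1],
                     a[1] + s * b[0] + c * b[1],
                     a[2] + b[2]])

def ominus(a):
    """Unary inverse; the binary a (-) b used in the text is oplus(ominus(b), a)."""
    c, s = np.cos(a[2]), np.sin(a[2])
    return np.array([-(c * a[0] + s * a[1]),
                     s * a[0] - c * a[1],
                     -a[2]])

x = np.array([1.0, 2.0, 0.3])
assert np.allclose(oplus(ominus(x), x), 0.0)   # x (-) x is the identity transformation
```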
When the vehicle subsequently passes along the route, knowledge gleaned from a previously recorded excursion (survey) is used to ensure localisation relative to the prior survey. A fundamental competency is the availability/generation of a synthetic orthographic image around a particular node $X_i$ in the survey trajectory $T$. The function $I_\pi(X_i)$ is used to denote the generation of this image, which utilizes the local metric scene $M_i$ defined above. $I_\pi(X_i)$ can be computed during the vehicle's traversal of the route, but can also be precomputed. The process begins with the extraction of a ground plane, using the landmarks ${}^{i}l_j$ in $M_i$ and RANdom SAmple Consensus (RANSAC) to solve
$b_\pi^i \cdot \hat{n}_\pi^i = l \cdot \hat{n}_\pi^i$ (4)
where $b_\pi^i$ is the base, $\hat{n}_\pi^i$ the normal and $l$ an arbitrary point on the plane. In the present embodiment, a stereo camera 101 (
where $K$ is the matrix of camera intrinsics, $K^{-1}V_I^j$ is the ray associated with $V_I^j$ and $\lambda_j$ is the distance along $K^{-1}V_I^j$ to the intersection with the ground plane. A homography $H_i$ is then generated from $V_I$ and $V_\pi^i$ such that
$V_I = H_i V_\pi^i$ (7)
$H_i$ is then used to project the texture in the survey image's ROI taken at $X_i$ into an orthographic image, which is termed $I_\pi(X_i)$. This is done for all poses in the scene $M_i$. The camera frame rate of 20 Hz, coupled with the survey vehicle's velocity of approximately 20 kph, leads to adequate overlap between consecutive $V_\pi^i$ (road regions of interest projected onto the orthographic image $I_\pi(X_i)$). This presents an opportunity to combine ROIs for consecutive poses in $M_i$ by taking an average of intensity values. This generates an image, of length defined by the poses in $M_i$, of the road surface as seen from overhead in the vicinity of $X_i$ (
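As a concrete illustration of the survey-side processing just described, the sketch below fits the ground plane of eq. (4) with RANSAC and then uses the homography of eq. (7) to warp a road ROI into an overhead patch at roughly 5 mm per pixel. The helper names, the OpenCV calls and the default parameters are assumptions of the sketch, not the patent's implementation:

```python
import numpy as np
import cv2

def ransac_ground_plane(landmarks, iters=200, tol=0.05):
    """Fit a plane (base b, unit normal n) to Nx3 landmarks, eq. (4) style:
    inliers satisfy |(l - b) . n| < tol, i.e. l . n ~= b . n."""
    rng = np.random.default_rng(0)
    best_base, best_normal, best_count = None, None, 0
    for _ in range(iters):
        p0, p1, p2 = landmarks[rng.choice(len(landmarks), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:        # degenerate (collinear) sample
            continue
        n /= np.linalg.norm(n)
        count = int(np.sum(np.abs((landmarks - p0) @ n) < tol))
        if count > best_count:
            best_base, best_normal, best_count = p0, n, count
    return best_base, best_normal

def ortho_patch(image, roi_px, roi_ground_mm, res_mm=5.0):
    """Warp a trapezoidal road ROI into an overhead patch at ~5 mm/pixel.

    roi_px        -- 4x2 ROI corner pixels in the survey image (V_I)
    roi_ground_mm -- 4x2 matching ground-plane coordinates in mm (V_pi)
    """
    dst = np.float32(roi_ground_mm) / res_mm        # ground mm -> strip pixels
    dst -= dst.min(axis=0)                          # shift patch to its own origin
    w, h = np.ceil(dst.max(axis=0)).astype(int) + 1
    # eq. (7): H maps orthographic points to image points, V_I = H V_pi
    H = cv2.getPerspectiveTransform(dst, np.float32(roi_px))
    # WARP_INVERSE_MAP samples the survey image at H(p) for each ortho pixel p
    return cv2.warpPerspective(image, H, (int(w), int(h)),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```

Overlapping patches produced for consecutive poses in $M_i$ can then be blended into the strip by averaging intensity values, as described above.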
Consider now an image $I_k$ acquired on a subsequent traversal of a surveyed route at time $k$ in the vicinity of $X_i$. The pose of the vehicle can now be represented relative to $X_i$ with ${}^{i}t_k$, which is the transformation between $X_i$ and the location of the vehicle at time $k$ (See
$X_k = X_i \oplus {}^{i}t_k$ (8)
At run time the Stereo VO system provides a continual stream of estimates of the relative pose between camera frames, $v_k$. In the absence of any other knowledge this could be used to infer the trajectory open loop. If, however, the synthetic orthographic images can be leveraged to correct relative poses from the VO, then it is possible to track the motion (stay localised) relative to the survey trajectory. Furthermore, as a new stereo pair is presented, it is possible to use $v_k$ to seed a guess for the transformation ${}^{i}t_k^{o}$ between the new camera frame and the survey trajectory; for the example in
${}^{i}t_k^{o} = \ominus {}^{i-1}x_i \ominus {}^{i-2}x_{i-1} \oplus {}^{i-2}t_{k-1} \oplus v_k$ (9)
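Eq. (9) is then a straightforward chain of compositions. A sketch using the oplus/ominus helpers from the earlier SE2 example; the argument names (x_im1_i for ${}^{i-1}x_i$, and so on) are illustrative:

```python
def seed_guess(x_im1_i, x_im2_im1, t_im2_km1, v_k):
    """Eq. (9): t_k^o = (-)x_i (+) (-)x_{i-1} (+) t_{k-1} (+) v_k (SE2 here)."""
    t = oplus(ominus(x_im1_i), ominus(x_im2_im1))   # walk back along the survey edges
    t = oplus(t, t_im2_km1)                         # previous live pose, frame i-2
    return oplus(t, v_k)                            # append the latest VO increment
```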
As the vehicle moves it is necessary to track the location relative to sequential poses in the trajectory: $X_i$ will change as the vehicle moves. However, this is a trivial data association problem; the transition to a new reference pose can be predicted using $v_k$ as indicated by the VO system. The goal of this work is then to develop a way to hone this initial estimate ${}^{i}t_k^{o}$, and this is done by comparing the live view $I_k$ with that predicted by a hypothesised view of the synthetic orthographic image $I_\pi(X_i)$ at ${}^{i}t_k$.
The objective function used is based on Mutual Information (MI). If ${}^{i}t_k$ were known perfectly, then the projection $\mathrm{proj}(I_\pi(X_i), {}^{i}t_k)$ (the hypothesised view) of the road lying in $I_\pi(X_i)$ into the live view $I_k$ would overlap the live view completely. Conversely, if the pose is in error, the two views will not align and in particular will exhibit a markedly reduced amount of MI. The optimisation therefore finds a relative pose ${}^{i}\hat{t}_k$ which maximises image alignment by maximising MI (see
Mutual Information is used rather than a simple correlation based approach, as it has been shown to be robust against varying lighting conditions and occlusions. The MI between two images $I$ and $I^*$ can intuitively be thought of as the information shared between the two images. It is defined as follows.
$MI(I, I^*) = H(I) + H(I^*) - H(I, I^*)$ (10)
The MI is obtained by evaluating the Shannon entropy of the images individually, $H(I)$ and $H(I^*)$, and then evaluating the joint entropy $H(I, I^*)$. The entropy of a single image is a measure of how much information is contained within the image:

$H(I) = -\sum_{n=0}^{N} p_I(n) \log p_I(n)$ (11)
where $p_I(n)$ is the probability of a pixel in image $I$ having intensity $n$. An image can therefore be thought of as a random variable, with each pixel location $x$ having a distribution defined by $p_I(n) = p(I(x) = n)$, for $n \in [0, N]$, where $N$ is the maximum intensity (in the present embodiment, $N = 255$, since 8-bit grayscale images are used). The joint entropy is defined by

$H(I, I^*) = -\sum_{n=0}^{N} \sum_{m=0}^{N} p_{II^*}(n, m) \log p_{II^*}(n, m)$ (12)
where $p_{II^*}(n, m) = p(I(x) = n, I^*(x) = m)$ is the joint probability of intensity co-occurrences in both images. The MI can then be written as

$MI(I, I^*) = \sum_{n=0}^{N} \sum_{m=0}^{N} p_{II^*}(n, m) \log \frac{p_{II^*}(n, m)}{p_I(n)\, p_{I^*}(m)}$ (13)
As an implementation detail, it was found empirically that evaluating the MI over all possible pixel intensity values had little advantage over histogramming intensities into bins. Quantising the intensity values into 16 bins has a welcome smoothing effect on the cost surface and eases optimisation. Another advantage of using MI over other plausible measures such as SSD (Sum of Squared Distances) is that it is meaningfully bounded: the minimum MI is zero and the maximum is the lesser of the information contained within each of the images, $\min(H(I), H(I^*))$. The maximum possible information for an image is itself bounded, being attained by an image with a uniform distribution of pixel values.
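A minimal sketch of the binned MI computation described above, implementing the histogrammed form of eqs. (10)-(13) with numpy; the function name and the 16-bin default are illustrative:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=16):
    """MI between two equally sized 8-bit images, intensities quantised into bins."""
    hist_2d, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                   bins=bins, range=[[0, 256], [0, 256]])
    p_ab = hist_2d / hist_2d.sum()           # joint distribution p_II*(n, m)
    p_a = p_ab.sum(axis=1)                   # marginal p_I(n)
    p_b = p_ab.sum(axis=0)                   # marginal p_I*(m)
    nz = p_ab > 0                            # skip empty bins to avoid log(0)
    # eq. (13): sum p(n,m) log( p(n,m) / (p(n) p(m)) )
    return np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a[:, None] * p_b[None, :])[nz]))
```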
The problem of estimating the current pose relative to a pose $X_i$ in the survey trajectory $T$ then reduces to solving

${}^{i}\hat{t}_k = \underset{{}^{i}t_k}{\operatorname{argmax}}\; MI\big(I_k,\ \mathrm{proj}(I_\pi(X_i), {}^{i}t_k)\big)$ (14)
As the Stereo VO has high metric accuracy over small distances (~10 m), the deviations from the initial position estimates ${}^{i}t_k^{o}$ are relatively small, of the order of centimeters. Two approaches have been implemented for estimating ${}^{i}t_k$: the first is a fast, approximate SE2 correction and the second a full SE3 pose correction.
The application domain is road vehicles, so in the first approach, to reduce complexity and increase speed, ${}^{i}t_k$ is confined to in-road-plane motion, reducing the search space to SE2. However, the SE3 pose information is maintained, which allows for a correction to rolling and pitching during cornering and accelerations respectively. Rather than solve eq. (14) iteratively, for example using non-linear Gauss-Newton methods, the small search radius is exploited and a histogram filter is used to evaluate an approximation to eq. (14). The in-plane motion approximation has the consequence of reducing the sensitivity of the matching step to high frequency image content, such as fine texture on the tarmac: very small errors in pitch, roll or height cause misalignments which stop the matching process from leveraging this fine detail. It is envisaged that the vehicle will typically operate in urban environments, where the vast majority of roads have distinct road markings, and so this was deemed to be an acceptable trade-off for speed. However, for short periods where there are no road markings, the histogram filter can fall into local minima. This can lead to errors in the estimates of ${}^{i}\hat{t}_k$, which has the effect of pulling the trajectory off course. In order to avoid this, the difference between the corrected pose ${}^{i}\hat{t}_k$ in the survey trajectory and the initialisation from the Stereo VO, ${}^{i}t_k^{o}$, is first computed.
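The histogram-filter correction can be approximated by exhaustively scoring a small grid of SE2 offsets around the VO seed. The sketch below reuses oplus(...) and mutual_information(...) from the earlier examples; render_hypothesis is an assumed stand-in for $\mathrm{proj}(I_\pi(X_i), {}^{i}t_k)$, i.e. a callback that renders the orthographic strip into the live camera at a candidate pose, and the search radii are illustrative:

```python
import numpy as np
from itertools import product

def se2_mi_search(live_img, ortho_strip, t_seed, render_hypothesis,
                  radius_xy=0.30, radius_th=0.02, steps=7):
    """Approximate the argmax of eq. (14) over a small SE2 grid around the seed."""
    grid = np.linspace(-1.0, 1.0, steps)
    best_t, best_mi = t_seed, -np.inf
    for gx, gy, gth in product(grid, repeat=3):
        delta = np.array([gx * radius_xy, gy * radius_xy, gth * radius_th])
        t = oplus(t_seed, delta)             # small in-plane offset from the seed
        mi = mutual_information(live_img, render_hypothesis(ortho_strip, t))
        if mi > best_mi:
            best_t, best_mi = t, mi
    return best_t, best_mi
```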
$e = {}^{i}\hat{t}_k \ominus {}^{i}t_k^{o}$ (15)
If $e$ is greater than a threshold, then the right imaging portion 101a of the camera 101 is invoked, the localisation is performed on $I_{k,\mathrm{right}}$, and a check is made for consensus. If the pose estimates from both $I_{k,\mathrm{right}}$ and $I_k$ are commensurate, then the pose correction is adopted into the trajectory. If the pose estimates do not agree, then the match is ignored and ${}^{i}t_k^{o}$ is simply adopted into the trajectory. This is repeated until matching can be re-established. In essence, when the image matching step fails the system falls back to raw VO and runs in open-loop mode.
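The consensus fallback just described reduces to a few comparisons. A sketch reusing the earlier pose helpers; the threshold values and the localise_right_fn callback (which would rerun the MI localisation on the right image) are illustrative assumptions, and the norm over [x, y, theta] is a simplification:

```python
import numpy as np

def accept_or_fallback(t_hat, t_seed, localise_right_fn,
                       error_thresh=0.15, agree_thresh=0.05):
    """Adopt the MI correction only when consistent; else fall back to raw VO."""
    e = oplus(ominus(t_seed), t_hat)         # eq. (15): e = t_hat (-) t_seed
    if np.linalg.norm(e) <= error_thresh:    # correction close to the VO seed
        return t_hat
    t_hat_right = localise_right_fn()        # invoke the right imaging portion
    if np.linalg.norm(oplus(ominus(t_hat_right), t_hat)) <= agree_thresh:
        return t_hat                         # both views agree: keep correction
    return t_seed                            # disagree: open-loop VO this frame
```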
To evaluate the performance of the algorithm, the present inventors conducted experiments on data collected from an autonomous vehicle platform, such as that illustrated in
The precision of the localisation is obtained by comparing localized images $I_k$ and the corresponding artificially perturbed hypothesised views $\mathrm{proj}(I_\pi(X_i), {}^{i}t_k \oplus \epsilon)$, where $\epsilon$ is a perturbation of the order of centimeters.
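One way to realise this precision check, under the same assumed helpers as the earlier sketches, is to sweep the perturbation along one pose axis and record the MI falloff; if the localisation is precise, the curve peaks at zero perturbation:

```python
import numpy as np

def mi_falloff(live_img, ortho_strip, t_hat, render_hypothesis, axis=0):
    """Sweep epsilon (+/- 5 cm here) along one axis and record the MI of the
    re-rendered hypothesised view against the live image."""
    eps_range = np.linspace(-0.05, 0.05, 21)
    scores = []
    for eps in eps_range:
        delta = np.zeros(3)
        delta[axis] = eps
        scores.append(mutual_information(
            live_img, render_hypothesis(ortho_strip, oplus(t_hat, delta))))
    return eps_range, np.array(scores)       # peak expected at eps = 0
```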
This work presents a methodology for generating and exploiting synthetic local orthographic images to achieve centimeter-precision road vehicle localisation without any infrastructure or workspace modification. The method improves accuracy by an order of magnitude on a previous method using off-the-shelf aerial images. Stereo VO is used to generate synthetic orthographic images from a survey vehicle which are far superior in terms of resolution and fidelity to available aerial images. The approach also facilitates the generation of orthographic images in areas unavailable to aerial photography, such as under bridges, under trees and in covered areas. These images provide a high fidelity and stable template for view matching as, unlike feature based systems, the gross appearance of the road surface ahead of the vehicle is used. The approach also avoids all the tracking and data association required by feature based approaches. Centimeter-level accurate localisation and pose tracking is demonstrated on a 700 m trajectory, as well as robustness to partial occlusion and varying weather and lighting conditions.
Number | Date | Country | Kind |
---|---|---|---|
1116958.8 | Sep 2011 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2012/052402 | 9/27/2012 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/045935 | 4/4/2013 | WO | A |
Other Publications:

Spoerri, Anselm, "Novel Route Guidance Displays," IEEE Vehicle Navigation & Information Systems Conference, Ottawa, Canada, Oct. 1993.

Scott-Young, Stephen, "Seeing the Road Ahead: GPS-Augmented Reality Aids Drivers," GPS World, vol. 14, no. 11, Questex Media Group, Santa Ana, CA, USA, Nov. 1, 2003, front cover and pp. 22-28.

Napier, Ashley and Newman, Paul, "Generation and Exploitation of Local Orthographic Imagery for Road Vehicle Localisation," 2012 IEEE Intelligent Vehicles Symposium (IV), 2012, pp. 590-596, DOI: 10.1109/IVS.2012.6232165.

Badino, H., Huber, D. and Kanade, T., "Visual Topometric Localization," 2011 IEEE Intelligent Vehicles Symposium (IV), 2011, pp. 794-799, DOI: 10.1109/IVS.2011.5940504.

King, Peter, Vardy, Andrew, Vandrish, Peter and Anstey, Benjamin, "Real-Time Side Scan Image Generation and Registration Framework for AUV Route Following," 2012 IEEE/OES Autonomous Underwater Vehicles (AUV), 2012, pp. 1-6, DOI: 10.1109/AUV.2012.6380758.

Morita, H., Hild, M., Miura, J. and Shirai, Y., "View-Based Localization in Outdoor Environments Based on Support Vector Learning," 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005, pp. 2965-2970, DOI: 10.1109/IROS.2005.1545445.

International Preliminary Report on Patentability received for Patent Application No. PCT/GB2012/052402, mailed on Apr. 10, 2014, 8 pages.

International Search Report and Written Opinion of the International Searching Authority received for Patent Application No. PCT/GB2012/052402, mailed on Dec. 6, 2012, 12 pages.

GB Intellectual Property Office Search Report under Section 17(5) received for GB Patent Application No. 1116958.8, mailed on Dec. 15, 2011, 3 pages.

Furgale et al., "Visual Teach and Repeat for Long-Range Rover Autonomy," Journal of Field Robotics, vol. 27, no. 5, Sep. 1, 2010, pp. 534-560.

Lategahn et al., "Visual SLAM for Autonomous Ground Vehicles," IEEE International Conference on Robotics and Automation, May 9-13, 2011, 6 pages.

Royer et al., "Monocular Vision for Mobile Robot Localization and Autonomous Navigation," International Journal of Computer Vision, vol. 74, no. 3, Jan. 13, 2007, pp. 237-260.

Scaramuzza et al., "Exploiting Motion Priors in Visual Odometry for Vehicle-Mounted Cameras with Non-holonomic Constraints," IEEE/RSJ International Conference on Intelligent Robots and Systems, Sep. 25-30, 2011, 8 pages.

Segvic et al., "A Mapping and Localization Framework for Scalable Appearance-Based Navigation," Computer Vision and Image Understanding, vol. 113, no. 2, Feb. 1, 2009, pp. 172-187.

Tomasi et al., "Shape and Motion from Image Streams Under Orthography: A Factorization Method," International Journal of Computer Vision, vol. 9, no. 2, Nov. 1, 1992, pp. 137-154.

Levinson et al., "Robust Vehicle Localization in Urban Environments Using Probabilistic Maps," 2010 IEEE International Conference on Robotics and Automation, May 2010, pp. 4372-4378, 7 pages.

Glennie et al., "Static Calibration and Analysis of the Velodyne HDL-64E S2 for High Accuracy Mobile Scanning," Remote Sensing, vol. 2, pp. 1610-1624, DOI: 10.3390/rs2061610, 15 pages.

Harrison et al., "TICSync: Knowing When Things Happened," 2011 IEEE International Conference on Robotics and Automation, May 2011, pp. 356-363, 8 pages.

International Preliminary Report on Patentability received for Patent Application No. PCT/GB2012/052381, mailed on Apr. 10, 2014, 11 pages.

International Search Report and Written Opinion of the International Searching Authority received for Patent Application No. PCT/GB2012/052381, mailed on Dec. 12, 2012, 17 pages.

GB Intellectual Property Office Search Report under Section 17(5) received for GB Patent Application No. 1116959.6, mailed on Jan. 24, 2012, 3 pages.

Stewart et al., "LAPS-Localisation using Appearance of Prior Structure: 6DoF Monocular Camera Localisation using Prior Pointclouds," IEEE International Conference on Robotics and Automation, Saint Paul, Minnesota, May 14-18, 2012, pp. 2625-2632.

Cole et al., "Using Laser Range Data for 3D SLAM in Outdoor Environments," IEEE International Conference on Robotics and Automation, Orlando, Florida, May 15-19, 2006, pp. 1556-1563.

Moosmann et al., "Velodyne SLAM," IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, Jun. 5-9, 2011, pp. 393-398.

Baldwin et al., "Road Vehicle Localization with 2D Push-Broom LIDAR and 3D Priors," IEEE International Conference on Robotics and Automation, Saint Paul, Minnesota, May 14-18, 2012, pp. 2611-2617.

Wulf et al., "Robust Self-Localization in Industrial Environments Based on 3D Ceiling Structures," IEEE International Conference on Intelligent Robots and Systems, Beijing, China, Oct. 9-15, 2006, pp. 1530-1534.

Viola et al., "Alignment by Maximization of Mutual Information," International Journal of Computer Vision, vol. 24, no. 2, Sep. 1, 1997, pp. 137-154.

Weste et al., "Dynamic Time Warp Pattern Matching Using an Integrated Multiprocessing Array," IEEE Transactions on Computers, vol. C-32, no. 8, Aug. 1, 1983, pp. 731-744.

International Preliminary Report on Patentability received for Patent Application No. PCT/GB2012/052393, mailed on Apr. 10, 2014, 8 pages.

International Search Report and Written Opinion of the International Searching Authority received for Patent Application No. PCT/GB2012/052393, mailed on Jan. 3, 2013, 12 pages.

GB Intellectual Property Office Search Report under Section 17(5) received for GB Patent Application No. 1116961.2, mailed on Mar. 22, 2012, 3 pages.

Jordt et al., "Automatic High-Precision Self-Calibration of Camera-Robot Systems," IEEE International Conference on Robotics and Automation, Kobe, Japan, May 12-17, 2009, pp. 1244-1249.

Saez et al., "Underwater 3D SLAM through Entropy Minimization," IEEE International Conference on Robotics and Automation, Orlando, Florida, May 15-19, 2006, pp. 3562-3567.

Underwood et al., "Error Modeling and Calibration of Exteroceptive Sensors for Accurate Mapping Applications," Journal of Field Robotics, vol. 27, no. 1, Jan. 1, 2010, pp. 2-20.

Sheehan et al., "Automatic Self-Calibration of a Full Field-of-View 3D n-Laser Scanner," International Symposium on Experimental Robotics, Delhi, India, Dec. 18, 2010, 14 pages.

International Preliminary Report on Patentability received for Patent Application No. PCT/GB2012/052398, mailed on Apr. 10, 2014, 8 pages.

International Search Report and Written Opinion of the International Searching Authority received for Patent Application No. PCT/GB2012/052398, mailed on Dec. 6, 2012, 13 pages.

GB Intellectual Property Office Search Report under Section 17(5) received for GB Patent Application No. 1116960.4, mailed on Jul. 6, 2012, 3 pages.
Number | Date | Country
---|---|---
20140249752 A1 | Sep 2014 | US