An unmanned vehicle driving autonomously on today's roads must obtain an accurate position estimate that locates it within a lane to a precision of a few centimeters, including in GPS-denied environments (e.g., in a tunnel). In the preferred embodiment a human first drives a vehicle on a network of roads, with the vehicle recording position data from a GPS receiver and scene data from laser scanners, which are processed into a high-resolution map file. Later, a vehicle, manned or unmanned, drives on this same network of roads and, using its sensors, is able to determine its position on the road relative to the map file with significantly greater accuracy than GPS alone affords. It can also determine its position when GPS is temporarily unavailable, e.g., in narrow urban canyons. The resulting precision and reliability are key prerequisites of autonomous unmanned ground vehicle operation.
In an in-car navigation system, the ability to track vehicle position at higher resolution than GPS allows enables extra features that benefit a human driver. For example, according to the present invention it is possible to obtain a location estimate precise enough to specify which lane the vehicle is in. This information allows the vehicle's navigation system to provide helpful information to the driver beyond that which is possible with current technology. For example, the system may warn a human driver if she departs from a lane on a highway without an expressed intent to do so (e.g., activating a turn signal). It may also assist a human driver in deciding when to change lanes, such as when approaching a turn. Furthermore, the vehicle can maintain its position accuracy even in the absence of GPS signals, for example in tunnels, offering this assistance to human drivers even in GPS-denied environments.
In an unknown or hostile desert environment, an autonomous ground vehicle drives around while recording position and scene data. These data are converted into a high-resolution map file which is then analyzed and annotated by human experts, such as for military purposes. The autonomous vehicle is then able to follow specific missions with high accuracy and drive to targets with higher precision than would be possible with GPS alone. Furthermore, this system would be equally advantageous in areas devoid of GPS coverage. In this case, the system would produce position estimates using wheel odometry and inertial measurement unit data and still be able to follow precise routes and drive to targets effectively.
A computer 120 runs software algorithms 130 which log data from sensors 150, 160, and 170 and environment or LIDAR sensors 180. Sensors 150-180 are mounted in or on the vehicle 110 and are connected to computer 120 via digital communication interfaces. Computer 120 can be of any variety capable of receiving data via these interfaces. An operating system such as Mac OS X, Windows, or Linux can be adapted to integrate software 130. Computer 120 incorporates a storage medium 190, such as a hard disk drive, with sufficient storage to maintain a log file 198 of data from sensors 150-180.
Sensor 150 is an inertial measurement unit which reports to software 130 via computer 120 changes in x, y, and z acceleration and rotational acceleration about all three axes. Sensor 160 comprises odometry information obtained directly from vehicle 110, such as over an industry-standard CAN interface. This information preferably comprises wheel revolution speeds for each wheel and steering angle. These data are sent to software 130 via computer 120. Sensor 170 is a Global Positioning System receiver that obtains position estimates in global coordinates based on data from multiple orbiting satellites. These data are sent to software 130 via computer 120. In one embodiment, sensors 150, 160, and 170 are integrated via commercial off-the-shelf software so as to increase the overall reliability and accuracy of the vehicle position estimation. In another embodiment, the data from these sensors are treated separately until they are integrated directly into the software 130. Sensors 180 preferably comprise one or more LIDAR measurement systems which return distance and intensity or reflectivity data at a high data rate. In one embodiment, three sensors 180 are mounted on vehicle 110, one positioned on the left side facing left, one positioned on the right side facing right, and one positioned on the rear side facing rear. The data from the LIDAR sensors 180 are also sent to the software 130 via computer 120.
In the preferred embodiment of the invention, system 100 and system 200 perform mapping of a segment of the environment, creating map 201 of this segment. At a later time, system 300 drives through this segment of the environment and uses software 330, sensors 350-380, and map 201 to continuously output estimates of the location of vehicle 310.
Software 130 logs data from sensors 150-180 to the hard disk drive 190 of computer 120 as the vehicle travels through the section of the environment to be mapped. These data are recorded at various rates. For example, the GPS 170, IMU 150, and odometry 160 data may be recorded at 200 Hz while the LIDAR data 180 may be recorded at 75 Hz. These data are stored in a computer log file 198.
Log file 198 may be post-processed, either in system 100 or system 200, using available software that implements a Kalman filter for data smoothing. In this manner, discontinuities in sensor data such as GPS signals are filtered and a smooth stream of position estimations with improved accuracy is saved to an intermediate log file 199 in the preferred embodiment.
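By way of illustration only, the post-processing smoothing step may be sketched as a forward Kalman filter followed by a Rauch-Tung-Striebel backward pass over a single position axis. The constant-velocity model, sample period, and noise values below are assumptions for this sketch, not details of the preferred embodiment:

```python
import numpy as np

def kalman_smooth(zs, dt=0.005, q=1.0, r=4.0):
    """Rauch-Tung-Striebel smoother for a 1-D constant-velocity model.

    zs : noisy position measurements along one axis (e.g., one GPS coordinate)
    dt : sample period (200 Hz -> 0.005 s)
    q  : process noise intensity, r : measurement noise variance
    (illustrative values only; real tuning depends on the sensors)
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])                 # we observe position only
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    R = np.array([[r]])

    n = len(zs)
    x, P = np.zeros(2), np.eye(2) * 100.0
    xs_f, Ps_f, xs_p, Ps_p = [], [], [], []
    for z in zs:                               # forward (filtering) pass
        xp, Pp = F @ x, F @ P @ F.T + Q
        xs_p.append(xp); Ps_p.append(Pp)
        S = H @ Pp @ H.T + R
        K = Pp @ H.T @ np.linalg.inv(S)
        x = xp + (K @ (np.atleast_1d(z) - H @ xp)).ravel()
        P = (np.eye(2) - K @ H) @ Pp
        xs_f.append(x); Ps_f.append(P)

    xs_s = [None] * n                          # backward (smoothing) pass
    xs_s[-1] = xs_f[-1]
    for t in range(n - 2, -1, -1):
        G = Ps_f[t] @ F.T @ np.linalg.inv(Ps_p[t + 1])
        xs_s[t] = xs_f[t] + G @ (xs_s[t + 1] - xs_p[t + 1])
    return np.array([s[0] for s in xs_s])      # smoothed positions
```

In practice the smoother would run over the full pose state and fuse GPS, IMU, and odometry data; the sketch shows only the smoothing principle that produces intermediate log file 199.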
Referring then to FIG. 7, the map generation process is now described.
In step 712 the 3-D mesh is converted to a ground plane or near ground plane representation. The simplest way to perform this conversion is to delete any elements whose z-axis value exceeds a predetermined value; other more sophisticated techniques can be used if desired. The remaining data is then effectively a 2-D reflectivity map or representation of the environment. This conversion leaves a number of holes in the data where outliers existed, i.e., where elements not forming part of the ground plane, such as vehicles, were present. By removing the outliers from the calculations at this stage, later mapping and localization operations are improved because they ignore points with no data. In step 713 the remaining portion of the 3-D mesh, the ground plane portion, is used to render a 2-D image of the region. In step 714 the 2-D images for the regions that overlap are aligned. More details on the alignment operations are described below, but briefly, the optimal shift is found via a least squares problem. Between any two position labels, the preferred embodiment assumes the existence of two slack variables, one that measures the relative error in the x direction and one that measures the error in the y direction. The preferred embodiment adjusts the position data by minimizing the sum of all squared slack variables, under the constraints established by the ground map alignment. In step 716 the log file 199 is updated for the overlapping regions so that in those regions higher confidence log file entries are present for later operations.
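The slack-variable adjustment described for step 714 can be sketched as an ordinary linear least squares problem over x and y independently. The constraint tuple format and the anchoring of the first pose at the origin are illustrative assumptions of this sketch:

```python
import numpy as np

def adjust_poses(odom, matches, n):
    """Least-squares x-y pose adjustment with slack variables.

    odom    : list of (t, s, dx, dy) relative displacements from the log,
              meaning pose s should sit at pose t plus (dx, dy)
    matches : list of (t, s, dx, dy) offsets recovered by map alignment
    n       : number of poses; pose 0 is anchored at the origin
    Each constraint contributes one squared slack variable per axis;
    the solver minimizes their sum.  Illustrative only.
    """
    def solve_axis(k):                     # k = 2 for dx, k = 3 for dy
        rows, rhs = [np.eye(n)[0]], [0.0]  # soft anchor: pose 0 at origin
        for c in odom + matches:
            r = np.zeros(n)
            r[c[1]], r[c[0]] = 1.0, -1.0   # slack = (pose_s - pose_t) - d
            rows.append(r)
            rhs.append(c[k])
        sol, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
        return sol

    return np.stack([solve_axis(2), solve_axis(3)], axis=1)  # (n, 2) poses
```

With only odometry constraints the chain is reproduced exactly; adding a map-alignment constraint distributes any disagreement evenly over the intervening links, which is the behavior the slack-variable formulation is meant to achieve.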
In step 718, for the entire log file, the pose information and the LIDAR data are combined to form 3-D meshes, thus creating a 3-D environment for the entire area which has been recorded. In step 720 this entire 3-D mesh is then converted to a ground plane or near ground plane representation, again with outliers and other data not on or near the ground plane being deleted. In step 721 this ground plane portion is used to render a 2-D image. This 2-D representation is then stored in step 722 as map 201. In the preferred embodiment, map 201 may optionally be saved as a collection of small images instead of one large image, so that significant regions without intensity data need not be stored in the map. Consequently, the storage space required for map 201 is linear rather than quadratic in the distance traversed by the vehicle.
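The optional tiled storage of map 201 may be sketched as a dictionary of small square images keyed by tile coordinates, so that only tiles receiving data are allocated and storage grows with the road length rather than the bounding box of the trajectory. The tile size, 5-cm cell resolution, and NaN convention for empty cells are assumptions of this sketch:

```python
import numpy as np

class TiledMap:
    """Sparse 2-D reflectivity map stored as small square tiles."""

    def __init__(self, tile=128, res=0.05):
        self.tile, self.res = tile, res
        self.tiles = {}                        # (tx, ty) -> tile array

    def _index(self, x, y):
        """Map world coordinates to a tile key and in-tile cell index."""
        cx, cy = int(x // self.res), int(y // self.res)
        tx, ty = cx // self.tile, cy // self.tile
        return (tx, ty), (cx % self.tile, cy % self.tile)

    def set(self, x, y, reflectivity):
        key, (i, j) = self._index(x, y)
        t = self.tiles.setdefault(
            key, np.full((self.tile, self.tile), np.nan))
        t[i, j] = reflectivity

    def get(self, x, y):
        key, (i, j) = self._index(x, y)
        t = self.tiles.get(key)
        return np.nan if t is None else t[i, j]
```

Because unvisited tiles are never allocated, the memory footprint is linear in distance traversed, matching the storage property described above.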
Certain aspects of the map generation are now described in more detail. The following equations address the more general case, in which rotational displacements, as well as x and y displacements, of the poses are considered. The preferred embodiment simplifies these equations by assuming only x and y displacements. The preferred embodiment uses a version of the GraphSLAM algorithm, described in S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, MIT Press, Cambridge, Mass., 2005, which is hereby incorporated by reference, in developing the map 201. This description uses terminology from this text and extends it to the problem of road surface mapping.
In GraphSLAM, the vehicle transitions through a sequence of poses as mentioned above. In urban mapping, poses are three-dimensional vectors, comprising the x-y coordinates of the vehicle along with its heading direction (yaw); the remaining three dimensions (z, roll, and pitch) are irrelevant for this problem. Let x_t denote the pose at time t. Poses are linked together through relative odometry data, acquired from the vehicle's inertial guidance system:
x_t = g(u_t, x_{t-1}) + ε_t
In log-likelihood form, each odometry measurement induces a nonlinear quadratic constraint of the form
(x_t − g(u_t, x_{t-1}))^T R_t^{-1} (x_t − g(u_t, x_{t-1}))
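For concreteness, one odometry constraint of the above form can be evaluated as a simple quadratic penalty. The additive motion model g used here is an assumption for illustration only; a real system would use the vehicle kinematics:

```python
import numpy as np

def odometry_constraint(x_t, x_prev, u_t, R_inv):
    """Quadratic log-likelihood penalty for one odometry link.

    x_t, x_prev : poses (x, y, yaw) at times t and t-1
    u_t         : odometry reading between the two poses
    R_inv       : inverse covariance of the odometry noise
    g() here is the trivial additive model x_{t-1} + u_t (an assumption
    for this sketch, not the motion model of the preferred embodiment).
    """
    def g(u, x):
        return x + u                      # assumed motion model

    e = x_t - g(u_t, x_prev)              # residual epsilon_t
    return float(e @ R_inv @ e)           # (e)^T R^{-1} (e)
```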
As described above, in the preferred approach, the map 201 is a 2-D data structure which assigns to each x-y location in the environment an infrared reflectivity value (which can be thought of as a gray-level in a B/W image) based on the reflectivity or intensity values obtained from the LIDAR system 180. Thus, the ground map can be viewed as a high-resolution photograph of the ground with an orthographic camera. As seen in the various Figures, lane markings have a higher reflectivity than pavement and so are very apparent in the views.
To eliminate the effect of moving or movable objects in the map, referred to as outliers, the preferred approach fits a ground plane to each laser scan and only retains measurements that coincide with this ground plane. As a result, only the ground or near ground is mapped, and moving vehicles are automatically discarded from the data. For example, the cross-hatched areas of the Figures indicate regions where such outliers were discarded.
For any pose x_t and any fixed and known laser angle α_i relative to the vehicle coordinate frame, the expected infrared reflectivity can easily be calculated. Let h_i(m, x_t) be this function, which calculates the expected laser reflectivity for a given map m, robot pose x_t, and laser angle α_i. The observation process is modeled as follows:
z_t^i = h_i(m, x_t) + δ_t^i
In log-likelihood form, this provides a new set of constraints, which are of the form
(z_t^i − h_i(m, x_t))^T Q_t^{-1} (z_t^i − h_i(m, x_t))
In outdoor environments, a vehicle can use GPS for localization. GPS offers the advantage that its error is usually limited to a few meters. Let y_t denote the GPS signal for time t. (For notational convenience, y_t is treated as a 3-D vector, with the yaw estimate simply set to zero and the corresponding noise covariance in the measurement model set to infinity.) At first glance, one might integrate GPS through an additional constraint in the objective function J. The resulting constraints could be of the form

(x_t − y_t)^T Γ_t^{-1} (x_t − y_t)
where Γ_t is the noise covariance of the GPS signal. However, this form assumes that GPS noise is independent. In practice, GPS is subject to systematic noise, because GPS is affected by atmospheric properties that tend to change slowly with time.
The preferred approach models the systematic nature of the noise through a Markov chain, which uses a GPS bias term b_t as a latent variable. The assumption is that the actual GPS measurement is corrupted by an additive bias b_t, which cannot be observed directly (hence it is latent) but can be inferred from data. This model yields constraints of the form

(x_t + b_t − y_t)^T Γ_t^{-1} (x_t + b_t − y_t)
In this model, the latent bias variables bt are subject to a random walk of the form
b_t = γ b_{t-1} + β_t
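A short simulation of the bias random walk b_t = γ b_{t-1} + β_t illustrates why GPS noise cannot be treated as independent: successive bias samples are strongly correlated. The values of γ and the noise scale below are illustrative assumptions:

```python
import numpy as np

def simulate_gps_bias(n, gamma=0.999, beta_std=0.05, seed=0):
    """Sample the latent GPS bias random walk b_t = gamma*b_{t-1} + beta_t.

    gamma < 1 keeps the bias bounded over time; beta_std sets how quickly
    the slowly varying atmospheric error drifts.  Values are illustrative.
    """
    rng = np.random.default_rng(seed)
    b = np.zeros(n)
    for t in range(1, n):
        b[t] = gamma * b[t - 1] + rng.normal(0.0, beta_std)
    return b
```

The lag-one correlation of the simulated bias is close to γ, i.e., nearly 1, which is precisely the systematic behavior the independent-noise GPS constraint fails to capture.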
Putting this all together yields the goal function

J = Σ_t (x_t − g(u_t, x_{t-1}))^T R_t^{-1} (x_t − g(u_t, x_{t-1})) + Σ_{t,i} (z_t^i − h_i(m, x_t))^T Q_t^{-1} (z_t^i − h_i(m, x_t)) + Σ_t (x_t + b_t − y_t)^T Γ_t^{-1} (x_t + b_t − y_t) + Σ_t (b_t − γ b_{t-1})^T S_t^{-1} (b_t − γ b_{t-1})

where S_t denotes the covariance of the bias noise β_t.
The preferred approach optimizes J iteratively using conjugate gradient. Unfortunately, optimizing J directly, over all of its variables, is computationally infeasible.
A key step in GraphSLAM, which is adopted here, is to first integrate out the map variables. In particular, instead of optimizing J over all variables {x_t}, {b_t}, and m, first optimize a modified version of J that contains only the poses {x_t} and biases {b_t}, and then compute the most likely map. This is motivated by the fact that the map variables can be integrated out in the SLAM joint posterior. Since nearly all unknowns in the system are map variables, this makes the problem of optimizing J much easier, and it can be solved efficiently. Road surface patches that are seen only once during mapping have no bearing on the pose estimation; hence it is safe to remove the associated constraints from the goal function J. As far as the poses are concerned, the goal function remains identical.
Of concern, however, are places that are seen more than once. Those do create constraints between pose variables from which those places were seen. These constraints correspond to the loop closure problem in SLAM. To integrate those map variables out, the preferred approach uses a highly effective approximation known as map matching. Map matching compares local submaps to find the best alignment. The preferred approach implements map matching by first identifying regions of overlap, which will then form the local maps. By just operating on regions of overlap and using those results to update the overall map, the optimization problem is simplified by orders of magnitude. A further optimization is based on exploiting the linear nature of the corrections which occur in this map matching and pose updating.
A region of overlap is the result of driving over the same terrain twice. Formally, it is defined as two disjoint sequences of time indices, t_1, t_2, . . . and s_1, s_2, . . . , such that the corresponding grid cells in the map show an overlap that exceeds a given threshold θ.
Once such a region is found, the preferred approach builds two separate maps, one using only data from t_1, t_2, . . . , and the other using only data from s_1, s_2, . . . . It then searches for the alignment that maximizes the measurement probability, assuming that both maps adhere to a single maximum likelihood infrared reflectivity map in the area of overlap.
More specifically, a linear correlation field is computed between those maps for different x-y offsets between these images. Because both maps are incomplete, the preferred approach only computes correlation coefficients from elements whose infrared reflectivity value is known. In cases where the alignment is unique, a single peak in this correlation field is found, and the peak is then taken to be the best estimate for the local alignment. The relative shift is then labeled δ_{st}, and the resulting constraint is added to the objective J.
This leads to the introduction of the following constraint in J:
(x_t + δ_{st} − x_s)^T L_{st} (x_t + δ_{st} − x_s)
Here δ_{st} is the local shift between the poses x_s and x_t, and L_{st} is the strength of this constraint (an inverse covariance).
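The map matching search over x-y offsets can be sketched as a brute-force correlation field. The search radius, minimum-overlap threshold, and the NaN convention for unobserved cells are assumptions of this sketch:

```python
import numpy as np

def best_alignment(map_a, map_b, max_shift=5):
    """Brute-force x-y offset search between two local reflectivity maps.

    Cells without data are NaN and are excluded from each correlation
    coefficient, since only jointly observed cells are comparable.
    Returns the (dx, dy) shift of map_b that best matches map_a, along
    with the peak correlation coefficient.
    """
    best, best_r = (0, 0), -np.inf
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(map_b, dx, axis=0), dy, axis=1)
            valid = ~np.isnan(map_a) & ~np.isnan(shifted)
            if valid.sum() < 16:          # too little overlap to trust
                continue
            a, b = map_a[valid], shifted[valid]
            if a.std() == 0 or b.std() == 0:
                continue
            r = np.corrcoef(a, b)[0, 1]   # one cell of the correlation field
            if r > best_r:
                best_r, best = r, (dx, dy)
    return best, best_r
```

When the two submaps truly overlap, the correlation field exhibits a single sharp peak at the correct offset, which becomes δ_{st} in the constraint above.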
Replacing the map variables in J with this new constraint is clearly approximate; however, it makes the resulting optimization problem tractable. The result is a modified goal function J′, which replaces the many terms involving the measurement model with a small number of between-pose constraints. It is of the form

J′ = Σ_t (x_t − g(u_t, x_{t-1}))^T R_t^{-1} (x_t − g(u_t, x_{t-1})) + Σ_t (x_t + b_t − y_t)^T Γ_t^{-1} (x_t + b_t − y_t) + Σ_t (b_t − γ b_{t-1})^T S_t^{-1} (b_t − γ b_{t-1}) + Σ_{s,t} (x_t + δ_{st} − x_s)^T L_{st} (x_t + δ_{st} − x_s)
Notice that J′ does not contain any map variables m. The set of poses is then easily optimized using conjugate gradient descent (CG). After the poses are known, all map values for which one or more measurements are available are simply filled in, using the average infrared reflectivity value for each location in the map (which happens to be the maximum likelihood solution under the Gaussian noise model). This is equivalent to optimizing the missing constraints

J″ = Σ_{t,i} (z_t^i − h_i(m, x_t))^T Q_t^{-1} (z_t^i − h_i(m, x_t))
under the assumption that the poses {x_t} are known.
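Filling in the map once the poses are fixed reduces to per-cell averaging, which is the maximum likelihood solution under the Gaussian noise model. The dense grid representation below is an illustrative simplification:

```python
import numpy as np

def fill_map(measurements, shape):
    """Maximum-likelihood map under Gaussian noise: per-cell averaging.

    measurements : iterable of (row, col, reflectivity) tuples, projected
    into map cells after the poses have been fixed by optimizing J'.
    Cells that are never observed remain NaN (no data).
    """
    total = np.zeros(shape)
    count = np.zeros(shape)
    for r, c, v in measurements:
        total[r, c] += v
        count[r, c] += 1
    # average where observed; NaN elsewhere
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```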
Optimizing J′ and then J″ is done off-line by software 230. For the type of maps relevant to this description, the process of finding a suitable map takes only a few seconds; it requires less time than the acquisition of the data.
Given the corrected vehicle trajectory, it is straightforward to render the map images. It is preferred to utilize hardware-accelerated OpenGL to render smoothly interpolated polygons whose vertices are based on the distances and intensities returned by the three lasers as the vehicle traverses its trajectory. The images are saved in a format which breaks rectangular areas into square grids; only squares with data are saved, and the images can be rendered at 5-cm resolution faster than real time.
Software 230 includes in map file 201 header data which includes information such as the resolution of the map and the global coordinates of a reference point within the map, if available.
Vehicle 310 drives through the environment that is covered by map 201. Computer 320 runs software 330, which uses map 201 and data from sensors 350-380 to provide continuous location estimates relative to map 201. Software 330 continuously maintains an estimate of vehicle location. A particle filter is utilized, wherein multiple discrete vehicle location estimates 700 are tracked as shown in the Figures.
The preferred approach utilizes the same latent variable model discussed above. However, to achieve real-time performance, a particle filter is preferably utilized, known in robotics as Monte Carlo localization. The preferred approach maintains a three-dimensional pose vector (x, y, and yaw). Further, to correct for environment-related changes of infrared reflectivity (e.g., a wet versus a dry surface), the preferred approach also maintains a variable that effectively gamma-corrects the perceived road brightness.
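The particle state and its motion update may be sketched as follows. Each particle carries (x, y, yaw, gamma), with gamma the brightness-correction variable; the noise magnitudes are illustrative assumptions, not values of the preferred embodiment:

```python
import numpy as np

def predict_particles(particles, v, yaw_rate, dt, rng,
                      pos_std=0.05, yaw_std=0.01):
    """Propagate particles with odometry plus sampled noise.

    particles : (n, 4) array of (x, y, yaw, gamma) per particle
    v, yaw_rate : forward speed and yaw rate from odometry/IMU
    gamma receives only a small perturbation, since the brightness
    correction varies slowly.  Noise magnitudes are illustrative.
    """
    n = len(particles)
    yaw = particles[:, 2]
    particles[:, 0] += v * dt * np.cos(yaw) + rng.normal(0, pos_std, n)
    particles[:, 1] += v * dt * np.sin(yaw) + rng.normal(0, pos_std, n)
    particles[:, 2] += yaw_rate * dt + rng.normal(0, yaw_std, n)
    particles[:, 3] += rng.normal(0, 0.001, n)   # slow gamma random walk
    return particles
```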
Referring then to FIG. 8, the localization process is now described.
In step 814 any LIDAR data not on or near the ground plane is removed. In step 816, for each particle, the remaining ground plane LIDAR data is cross-correlated with the map data for the region in which the vehicle is traveling. In the preferred embodiment the mean squared error between the intensities of map 201 and the corresponding intensities from the projected LIDAR scans is computed, and in step 818 the weight of each particle is multiplied by the reciprocal of this error to update that particle's weight. Software 330 ignores LIDAR scans which fall outside a tolerance of the ground plane and similarly ignores LIDAR scans which fall on a region of map 201 which contains no data. In this manner, dynamic obstacles, which, in general, do not occur on the ground plane itself, are not factored into the location estimates by software 330. This embodiment is therefore particularly robust to dynamic obstacles such as pedestrians, bicycles, and other vehicles, which generally appear as outliers. If available, the indicated GPS location of the vehicle may also be incorporated into the weight, which is particularly useful when GPS data is first received after a period without GPS data.
In step 820 a single most likely vehicle location estimate is obtained by utilizing the various particle values. The most likely location is preferably computed by taking an average of all current particle locations weighted by the particle weights. This location estimate, preferably in global coordinates, can then be used to locate the vehicle on the map, for example on a navigation screen, in the navigation processing of an unmanned vehicle, or for lane location. In addition to providing a single estimate of the vehicle location, in the preferred embodiment further statistics are provided, such as the uncertainty of the vehicle location as measured by the standard deviation of the particles.
In step 822 a determination is made whether a resample period has been completed. It is necessary to occasionally resample the particles: very low weight particles are removed and additional higher accuracy particles are provided, generally by duplicating particles randomly so that higher weight particles tend to be duplicated, though providing some particles at the current GPS location may also improve the estimate. If it is not yet time to resample, control returns to step 806 to again wait for the next set of LIDAR data. If the resample period has been completed, in step 824 the particles are probabilistically resampled: in the preferred embodiment the low weight particles are removed and duplicates of particles, generally high weight particles, are incorporated. Control proceeds to step 806 after step 824 to again wait for LIDAR data, so that a loop is formed and positions are continually updated as the vehicle moves.
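The reciprocal-error weight update, the resampling of steps 822-824, and the weighted-mean location estimate of step 820 may be sketched together as follows. The systematic resampling scheme used here is one common choice for particle filters, not necessarily the scheme of the preferred embodiment:

```python
import numpy as np

def update_and_resample(particles, weights, errors, rng):
    """One measurement update followed by systematic resampling.

    errors : per-particle mean squared intensity error between the
    projected LIDAR scan and the map (illustrative inputs).  Weights
    are multiplied by the reciprocal of the error, normalized, and
    particles are then duplicated in proportion to weight.
    """
    weights = weights / (errors + 1e-9)            # reciprocal-error update
    weights = weights / weights.sum()
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n  # systematic resampling
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx], np.full(n, 1.0 / n)

def location_estimate(particles, weights):
    """Weighted mean of particle x-y positions plus its spread."""
    mean = np.average(particles[:, :2], axis=0, weights=weights)
    std = np.sqrt(np.average((particles[:, :2] - mean) ** 2,
                             axis=0, weights=weights))
    return mean, std
```

After resampling, high-weight particles dominate the population, so the weighted mean converges on the best-matching pose while the per-axis standard deviation reports the remaining uncertainty.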
Depending on the type of position data, post-processing of log file 198 is not strictly necessary. Although post-processing will generally improve accuracy and data consistency, a map 201 can still be generated from real-time data, particularly from odometry sensors.
In another embodiment, absolute position estimates are unavailable in log file 198, for example because GPS was not used. In this case software 230 dynamically generates position estimates based on available odometry data 160 and/or IMU data 150 using a vehicle motion model which relates these sensor data to vehicle position changes.
Instead of, or in addition to saving a two-dimensional map, software 230 may save the three dimensional data directly to a map file 201.
In an alternative embodiment, software 330 is augmented to directly compare distance information from LIDAR scans to distance information in a 3-D representation of map 201, in addition to comparing intensity information. The additional information provided by distance allows further accuracy to be obtained at the expense of computational efficiency. In one embodiment, k-dimensional (KD) tree lookups are used to compare distances. In another embodiment, a separate filter is further employed to reduce the effects of dynamic obstacles which are present in LIDAR scans but not in map 201.
In another embodiment of the invention, system 100 and 200 map an environment multiple times, wherein vehicle 110 drives through the environment multiple times and log files 198 and 199 are created, from which multiple map files 201 are created independently. Subsequently, software 230 merges the multiple map files 201 and outputs a single merged map file which incorporates the data from the individual map files. The merging process can reduce the effect of dynamic obstacles by choosing to incorporate LIDAR scans which occur closer to the ground plane, among other possibilities. Additionally, in the event that vehicle 110 drives a slightly different route each time, the individual map files 201 will contain data which partially cover different sections of the environment, and these sections can be merged together into one larger cohesive map file.
Localization techniques other than a particle filter may be used. The vehicle location estimate may be represented, among other possibilities, as a single Gaussian distribution, as a multi-hypothesis Gaussian distribution, as a histogram, or as any other representation of the position of the vehicle.
In another embodiment, software 330 is augmented to handle varying weather conditions. When LIDAR scans are projected into space and compared to the same locations in map 201, instead of computing the mean squared error between intensities of map 201 and scans 380, deviations from the average of each are compared, thereby eliminating global effects such as roads which have been darkened by rain. This may be used in place of, or in addition to, the gamma correction described above. In another embodiment, software 330 dynamically derives a mathematical mapping function which maps intensities from map 201 to intensities observed in laser scans and applies this transformation to values from map 201 before performing a direct comparison. One such implementation treats the weather-related gamma as a latent variable and thus includes a slowly-changing gamma as part of the state vector of the particle filter. In yet another embodiment, these two methods are combined.
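The deviation-from-mean comparison may be sketched as follows: subtracting each source's own mean intensity before the squared-error comparison cancels a global brightness shift such as a rain-darkened road. The matched 1-D input arrays are an illustrative simplification:

```python
import numpy as np

def weather_robust_error(map_vals, scan_vals):
    """Compare deviations from each source's own mean intensity.

    map_vals, scan_vals : matched 1-D arrays of map and scan intensities
    for the same ground cells.  A constant offset between the two (e.g.,
    a uniformly darkened wet road) contributes nothing to the error.
    """
    a = map_vals - np.mean(map_vals)
    b = scan_vals - np.mean(scan_vals)
    return float(np.mean((a - b) ** 2))
```

A plain mean-squared-error comparison on the same inputs would report a large error for a uniformly darkened road, while this measure reports near zero, preserving the relative contrast of lane markings against pavement.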
In an alternative embodiment, instead of fixing the number of particles at a constant value, the number of particles is dynamically adjusted based on factors such as CPU availability and location estimation uncertainty.
Another embodiment of the present invention comprises the use of an arbitrary mobile robotic platform in lieu of a traditional vehicle.
Many alternative embodiments comprising various sensor choices are easily conceivable based on the present invention. For example, the system can be adapted to function with a camera or cameras in addition to, or instead of, laser scanners. Furthermore, many types of location or position sensing devices other than those detailed herein may be utilized.
The algorithms and modules presented herein are not inherently related to any particular computer, vehicle, sensor, or other apparatus. Various general-purpose systems can be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the method steps. In addition, the present embodiments are not described with reference to any particular vehicle, sensor specification, computer operating system, or programming language. It will be appreciated that a variety of programming languages, sensor types, and vehicle platforms can be used to implement the teachings of the invention as described herein. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, features, attributes, methodologies, and other aspects of the invention can be implemented as software, hardware, firmware, or any combination of the three. Additionally, the present invention is not restricted to implementation in any specific operating system or environment or on any vehicle platform.
The invention described herein in its preferred embodiment has been demonstrated to be robust and accurate at mapping and localizing a vehicle on city streets and freeways. The mapping algorithm was tested successfully on a variety of urban roads. A challenging mapping experiment was a four-mile urban loop which was driven twice, though the preferred algorithm works on arbitrarily large datasets covering much longer distances. The preferred algorithm automatically identified and aligned 148 match points between the two loops, corrected the trajectory, and output consistent imagery at 5-cm resolution.
The online systems 100 and 300 are able to run in real-time on typical computer hardware and are capable of tracking a vehicle's location to within 5 to 10 cm accuracy, as compared to the 50-100 cm accuracy achievable with the best GPS systems. It is noted that this better than 10 cm, preferably 5 cm, accuracy is readily obtained when clean lines are present in the map 201, such as lane stripes, stop lines and the like. This condition most often occurs in the lateral direction due to the frequency of lane markers. The longitudinal accuracy often varies more widely as highly defined objects such as crosswalk lines or stop lines are not present as often, though accuracy will improve to 5 to 10 cm as they are approached, with dashed lines or reflectors then forming the primary cleanly defined lines. While strongly reflective markings enable the best localization performance, it should be noted that the invention described herein in its preferred embodiment maintains the ability to improve localization accuracy over traditional GPS/IMU systems even in the absence of strong or distinct markings, provided that there exists meaningful variation in the reflectivity of the ground, such as is common in typical unmarked roads.
This invention has also been shown to afford highly accurate localization even in the absence of GPS data during the localization phase given a fixed ground map. Using vehicle odometry data alone, system 300 is still capable of tracking vehicle location to within 5-10 cm accuracy on a typical street.
Embodiments of the invention have demonstrated robustness in the face of dynamic obstacles that were not present in the map file. When driving next to vehicles and other obstacles that are not incorporated into the map file, the system successfully ignores those obstacles and functions properly.
It will be understood by those skilled in the relevant art that the above-described implementations are merely exemplary, and many changes can be made without departing from the true spirit and scope of the present invention. Therefore, it is intended by the appended claims to cover all such changes and modifications that come within the true spirit and scope of this invention.