The present invention relates generally to collision avoidance systems, and more particularly, to a method and system for estimating the position and motion information of a threat vehicle by fusing vision and radar sensor observations of 3D points.
Collision avoidance systems for automotive navigation have emerged as an increasingly important safety feature in today's automobiles. A specific class of collision avoidance systems that has generated significant interest of late is advanced driving assistant systems (ADAS). Exemplary ADAS include lateral guidance assistance, adaptive cruise control (ACC), collision sensing/avoidance, urban driving and stop and go situation detection, lane change assistance, traffic sign recognition, high beam automation, and fully autonomous driving. The efficacy of these systems depends on accurately sensing the spatial and temporal environment information of a host object (i.e., the object or vehicle hosting or including the ADAS system or systems) with a low false alarm rate. Exemplary temporal environment information may include present and future road and/or lane status information, such as curvatures and boundaries; and the location and motion information of on-road/off-road obstacles, including vehicles, pedestrians and the surrounding area and background.
Conversely, the stereo camera system 14 may be configured to provide high quality angular measurements (lateral resolution) to identify the boundaries of the threat vehicle 12, but poor range estimates, as indicated by the vision error bounds 20. Moreover, although laser scanning radar can detect the occupying area of the threat vehicle 12, it is prohibitively expensive for automotive applications. In addition, affordable automotive laser detection and ranging (LADAR) can only reliably detect reflectors located on a threat vehicle 12 and cannot find all occupying areas of the threat vehicle 12.
In order to overcome the deficiencies associated with using either the stereo camera system 14 or the radar sensor 16 alone, certain conventional systems attempt to combine the lateral resolution capabilities of the stereo camera system 14 with the range capabilities of the radar sensor 16, i.e., to “fuse” multi-modality sensor measurements. Fusing multi-modality sensor measurements helps to reduce error bounds associated with each measurement alone, as indicated by the fused error bounds 22.
Multi-modal prior art fusion techniques are fundamentally limited because they treat the threat car as a point object. As such, conventional methods/systems can only estimate the location and motion information of the threat car (relative to the distance between the threat and host vehicles) when it is far away from the sensors (i.e., when the size of the threat car does not matter). However, when the threat vehicle is close to the host vehicle (<20 meters away), the conventional systems fail to consider the shape of the threat vehicle. Accounting for the shape of the vehicle provides for greater accuracy in determining if a collision is imminent.
Accordingly, what would be desirable, but has not yet been provided, is a method and system for fusing vision and radar sensing information to estimate the position and motion of a threat vehicle modeled as a rigid body object at close range, preferably less than about 20 meters from a host vehicle.
The above-described problems are addressed and a technical solution achieved in the art by providing a method for fusing depth and radar data to estimate at least a position of a threat object relative to a host object, the method comprising the steps of: receiving a plurality of depth values corresponding to at least the threat object; receiving radar data corresponding to the threat object; fitting at least one contour to a plurality of contour points corresponding to the plurality of depth values; identifying a depth closest point on the at least one contour relative to the host object; selecting a radar target based on information associated with the depth closest point on the at least one contour; fusing the at least one contour with radar data associated with the selected radar target based on the depth closest point on the at least one contour to produce a fused contour; and estimating at least the position of the threat object relative to the host object based on the fused contour.
According to an embodiment of the present invention, fusing the at least one contour with radar data associated with the selected radar target further comprises the steps of: fusing ranges and angles of the radar data associated with the selected radar target and the depth closest point on the at least one contour to form a fused closest point and translating the at least one contour to the fused closest point to form the fused contour, wherein the fused closest point is invariant. Translating the at least one contour to the fused closest point to form the fused contour further comprises the step of translating the at least one contour along a line formed on the origin of a coordinate system centered on the host object and the depth closest point to an intersection of the line and an arc formed by rotation of a central point associated with a best candidate radar target location about the origin of the coordinate system, wherein the best candidate radar target is selected from a plurality of radar targets by comparing Mahalanobis distances from the depth closest point to each of the plurality of radar targets.
According to an embodiment of the present invention, fitting at least one contour to the plurality of contour points corresponding to the plurality of depth values further comprises the steps of: extracting the plurality of contour points from the plurality of depth values, and fitting a rectangular model to the plurality of contour points. Fitting a rectangular model to the plurality of contour points further comprises the steps of: fitting a single line segment to the plurality of contour points to produce a first candidate contour, fitting two perpendicular line segments joined at one point to the plurality of contour points to produce a second candidate contour, and selecting a final contour according to a comparison of weighted fitting errors of the first and second candidate contours. The single line segment of the first candidate contour is fit to the plurality of contour points such that a sum of perpendicular distances to the single line segment is minimized, and the two perpendicular line segments of the second candidate contour are fit to the plurality of contour points such that the sum of perpendicular distances to the two perpendicular line segments is minimized. At least one of the single line segment and the two perpendicular line segments is fit to the plurality of contour points using a linear least squares model.
The two perpendicular line segments are fit to the plurality of contour points by: finding a leftmost point (L) and a rightmost point (R) on the two perpendicular line segments, forming a circle wherein the L and the R are points on a diameter of the circle and C is another point on the circle, calculating perpendicular errors associated with the line segments LC and RC, and moving C along the circle to find a best point (C′) such that the sum of the perpendicular errors associated with the line segments LC and RC is the smallest. According to an embodiment of the present invention, the method may further comprise estimating location and velocity information associated with the selected radar target based at least on the radar data.
According to an embodiment of the present invention, the method may further comprise the step of tracking the fused contour using an Extended Kalman Filter.
According to an embodiment of the present invention, a system for fusing depth and radar data to estimate at least a position of a threat object relative to a host object is provided, wherein a plurality of depth values corresponding to the threat object are received from a depth sensor, and radar data corresponding to at least the threat object is received from a radar sensor, comprising: a depth-radar fusion system communicatively connected to the depth sensor and the radar sensor, the depth-radar fusion system comprising: a contour fitting module configured to fit at least one contour to a plurality of contour points corresponding to the plurality of depth values, a depth-radar fusion module configured to: identify a depth closest point on the at least one contour relative to the host object, select a radar target based on information associated with the depth closest point on the at least one contour, and fuse the at least one contour with radar data associated with the selected radar target based on the depth closest point on the at least one contour to produce a fused contour; and a contour tracking module configured to estimate at least the position of the threat object relative to the host object based on the fused contour.
The depth sensor may be at least one of a stereo vision system comprising one of a 3D stereo camera and two monocular cameras calibrated to each other, an infrared imaging system, light detection and ranging (LIDAR), a line scanner, a line laser scanner, Sonar, and Light Amplification for Detection and Ranging (LADAR). The position of the threat object may be fed to a collision avoidance implementation system. The position of the threat object may be the location, size, pose and motion parameters of the threat object. The host object and the threat object may be vehicles.
Although embodiments of the present invention relate to the alignment of radar sensor and stereo vision sensor observations, other embodiments of the present invention relate to aligning two possibly disparate sets of 3D points. For example, according to another embodiment of the present invention, a method is described as comprising the steps of: receiving a first set of one or more 3D points corresponding to the threat object; receiving a second set of one or more 3D points corresponding to at least the threat object; selecting a first reference point in the first set; selecting a second reference point in the second set; performing a weighted average of a location of the first reference point and a location of the second reference point to form a location of a third fused point; computing a 3D translation of the location of the first reference point to the location of the third fused point; translating the first set of one or more 3D points according to the computed 3D translation; and estimating at least the position of the threat object relative to the host object based on the translated first set of one or more 3D points.
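By way of non-limiting illustration, the point-set alignment steps recited above may be sketched as follows. The function names, the weights `w1` and `w2`, and the sample coordinates are hypothetical and are chosen only to make the sketch self-contained; no particular weighting scheme is implied.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]

def fuse_reference_points(p1: Point3, p2: Point3, w1: float, w2: float) -> Point3:
    """Weighted average of two reference-point locations (weights are assumed inputs)."""
    total = w1 + w2
    return tuple((w1 * a + w2 * b) / total for a, b in zip(p1, p2))

def translate_points(points: Sequence[Point3], src: Point3, dst: Point3) -> List[Point3]:
    """Apply to every point the 3D translation that moves src onto dst."""
    t = tuple(d - s for s, d in zip(src, dst))
    return [tuple(c + dc for c, dc in zip(p, t)) for p in points]

# Hypothetical data: a first set of 3D points and a reference point from a second set.
first_set = [(1.0, 0.0, 10.0), (2.0, 0.0, 10.5), (3.0, 0.0, 11.0)]
ref1 = first_set[0]          # first reference point
ref2 = (1.0, 0.0, 12.0)      # second reference point
fused = fuse_reference_points(ref1, ref2, w1=1.0, w2=3.0)   # third, fused point
aligned = translate_points(first_set, ref1, fused)          # translated first set
```

The translated set keeps its shape; only its location changes, so that the first reference point coincides with the fused point.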
The present invention may be more readily understood from the detailed description of an exemplary embodiment presented below considered in conjunction with the attached drawings and in which like reference numerals refer to similar elements and in which:
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
A stereo vision module 36 accepts the stereo images 32 and outputs a range image 38 associated with the threat object, which comprises a plurality of at least one of 1-, 2-, or 3-dimensional depth values (i.e., scalar values for one dimension and points for two or three dimensions). Rather than deriving the depth values from a stereo vision system 36 employed as a depth sensor, the depth values may alternatively be produced by other types of depth sensors, including, but not limited to, infrared imaging systems, light detection and ranging (LIDAR), a line scanner, a line laser scanner, Sonar, and Light Amplification for Detection and Ranging (LADAR).
According to an embodiment of the present invention, a contour may be interpreted as an outline of at least a portion of an object, shape, figure and/or body, i.e., the edges or lines that define or bound a shape or object. According to another embodiment of the present invention, a contour may be a 2-dimensional (2D) or 3-dimensional (3D) shape that is fit to a plurality of points on an outline of an object.
According to another embodiment of the present invention, a contour may be defined as points estimated to belong to a continuous 2D vertical projection of a cuboid-modeled object's visible 3D points. The 3D points (presumed to be from the threat vehicle 12) may be vertically projected to a flat plane, that is, the height (y) dimension is collapsed, and thus the set of 3D points yields a 2D contour on a flat plane. Optionally, a 2D contour may be fit to the 3D points, based on the 3D points' (x,z) coordinates, and not based on the (y) coordinate.
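As a minimal sketch of the vertical projection described above, the height (y) coordinate may simply be dropped from each 3D point. The function name and the sample points are hypothetical.

```python
def project_to_ground_plane(points_3d):
    """Collapse the height (y) dimension, keeping the (x, z) coordinates."""
    return [(x, z) for (x, _y, z) in points_3d]

# Hypothetical 3D points presumed to lie on a threat vehicle.
points_3d = [(-1.0, 0.4, 10.0), (0.0, 1.1, 10.1), (1.0, 0.7, 10.0)]
points_2d = project_to_ground_plane(points_3d)
```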
The contour (i.e., the contour points 40) of a threat object (e.g., a threat vehicle) may be extracted from the depth values associated with the range image 38 using a vehicle contour extraction module 41. The vehicle contour extraction module 41 may be, for example, a computer-based module configured to perform a segmentation process, such as the segmentation processes described in co-pending U.S. patent application Ser. No. 10/766,976 filed Jan. 29, 2004, and U.S. Pat. No. 7,263,209, which are incorporated herein by reference in their entirety.
The contour points 40 are fed to a contour fitting module 42 to be described hereinbelow in connection with
As shown in
More particularly, depth-radar fusion module 50 finds a depth closest point on the 3-point contour 44 relative to the host object 10. The depth closest point is the point on the 3-point contour that is closest to the host vehicle 10. A radar target is selected based on information associated with the depth closest point on the 3-point contour 44. The 3-point contour 44 is fused with the radar data 34 associated with the selected radar target based on the depth closest point on the 3-point contour 44 to produce a fused contour. According to an embodiment of the present invention, the depth-radar fusion system 30 further comprises an extended Kalman filter 54 configured for tracking the fused contour 52 to estimate the threat vehicle's location, size, pose and motion parameters 56.
According to an embodiment of the present invention, a threat vehicle's 3-point contour 44 is determined from a plurality of contour points 40 based on depth (e.g., stereo vision (SV)) points/observations of the threat vehicle and the depth closest point on the contour of the threat vehicle relative to the host vehicle (i.e., the closest point as determined by the contour of the threat vehicle to the origin of a coordinate system centered on the host vehicle).
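For illustration, one way to locate the depth closest point is to take the nearest point to the origin on each segment of the contour and keep the nearest of those. The sketch below assumes a 2D contour given as an ordered list of (x, z) points in a coordinate system centered on the host vehicle; the function names and sample contour are hypothetical.

```python
import math

def closest_point_on_segment(a, b, q=(0.0, 0.0)):
    """Point on segment ab nearest to q (q defaults to the host-centered origin)."""
    dx, dz = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dz * dz
    if len_sq == 0.0:
        return a
    # Projection parameter of q onto the line through a and b, clamped to the segment.
    t = ((q[0] - a[0]) * dx + (q[1] - a[1]) * dz) / len_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dz)

def depth_closest_point(contour):
    """Nearest point to the origin over all consecutive segments of the contour."""
    candidates = [closest_point_on_segment(contour[i], contour[i + 1])
                  for i in range(len(contour) - 1)]
    return min(candidates, key=lambda p: math.hypot(p[0], p[1]))

# Hypothetical 3-point contour (L, C, R) in host coordinates, in meters.
contour = [(-1.0, 10.0), (1.0, 10.0), (1.0, 14.0)]
p_v = depth_closest_point(contour)
```

Note that the closest point need not be one of the three contour points; in this example it falls in the interior of the segment LC.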
As shown in
For fitting the single line segment 62, the sum of the perpendicular distances from the contour points 40 to the line segment 62 is minimized. In a preferred embodiment, a perpendicular linear least squares module is employed. More particularly, given the set of points (xi,zi) (i=1, . . . , n) (i.e., the contour points 40), the fitting module estimates the line z=a+bx such that the sum of perpendicular distances D to the line is minimized, i.e.,
By taking a square for both sides of Equation (1), and letting
then
To fit two perpendicular line segments 64, in a preferred embodiment of the present invention, a perpendicular linear least squares module is employed. More particularly, the leftmost and rightmost points, L and R, are found. A circle 66 is formed in which the line segment LR is a diameter. Perpendicular errors are calculated to the line segments LC and RC. The point C is moved along the circle 66 to find a best point (C′) (i.e., the line segments LC and RC forming right triangles are adjusted along the circle 66) such that the sum of the perpendicular errors to the line segments LC′ and RC′ is the smallest. With the above fitted two candidate contours 62, 64, the final fitted contour is chosen by selecting the candidate contour with the minimum weighted fitted error.
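The two candidate fits and the selection between them may be sketched as follows. This is an illustrative sketch under stated assumptions, not the fitting module itself: the line fit uses orthogonal regression (which minimizes the sum of squared perpendicular distances), the corner search samples C at a fixed number of positions on the circle, each contour point is charged to the nearer of the two segments, and the weights `w_line` and `w_corner` are hypothetical.

```python
import math

def fit_line_perpendicular(points):
    """Orthogonal-regression line: returns (centroid, unit direction, error)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    mz = sum(z for _, z in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    szz = sum((z - mz) ** 2 for _, z in points)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    theta = 0.5 * math.atan2(2.0 * sxz, sxx - szz)   # principal-axis angle
    ux, uz = math.cos(theta), math.sin(theta)
    # Sum of perpendicular distances from the points to the fitted line.
    err = sum(abs((z - mz) * ux - (x - mx) * uz) for x, z in points)
    return (mx, mz), (ux, uz), err

def point_segment_distance(p, a, b):
    """Distance from point p to segment ab."""
    dx, dz = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dz * dz
    t = 0.0 if len_sq == 0.0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dz) / len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dz))

def fit_corner(points, steps=720):
    """Move C along the circle with diameter LR; keep the C with the smallest error."""
    left = min(points, key=lambda p: p[0])
    right = max(points, key=lambda p: p[0])
    cx, cz = (left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0
    radius = math.hypot(right[0] - left[0], right[1] - left[1]) / 2.0
    best_c, best_err = None, float("inf")
    for k in range(steps):
        phi = 2.0 * math.pi * k / steps
        c = (cx + radius * math.cos(phi), cz + radius * math.sin(phi))
        # Assumption: each point contributes its distance to the nearer segment.
        err = sum(min(point_segment_distance(p, left, c),
                      point_segment_distance(p, c, right)) for p in points)
        if err < best_err:
            best_c, best_err = c, err
    return left, best_c, right, best_err

def select_contour(points, w_line=1.0, w_corner=1.0):
    """Choose the candidate with the smaller weighted fitting error."""
    _, _, line_err = fit_line_perpendicular(points)
    left, corner, right, corner_err = fit_corner(points)
    if w_line * line_err <= w_corner * corner_err:
        return "line", (left, right)
    return "corner", (left, corner, right)

# Hypothetical contour points forming a right-angle corner at (1, 10).
pts = [(-1.0, 12.0), (0.0, 11.0), (1.0, 10.0), (2.0, 11.0), (3.0, 12.0)]
kind, fitted = select_contour(pts)
```

Because C is constrained to the circle having LR as a diameter, the angle LCR is always a right angle, so the perpendicularity of the two segments never has to be enforced separately.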
Once the fitted contour of a threat vehicle and filtered radar objects are obtained, the depth-radar fusion module 50 adjusts the location of the fitted contour by using the radar data.
In step 82, a candidate radar target from radar returns is selected using depth closest point information. The best candidate radar target is selected from among the candidate radar targets A, B, based on its distance from the depth closest point pv. More particularly, a candidate radar target, say pr, may be selected from all radar targets by comparing the Mahalanobis distances from the depth closest point pv to each of the radar targets A, B.
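As a sketch of the selection in step 82, the Mahalanobis distance weights each coordinate difference by the measurement uncertainty in that direction. The covariance used below (small lateral variance, large range variance, in host x-z coordinates) and the target locations are assumptions made purely for illustration; the function names are hypothetical.

```python
import math

def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between 2D points p and q for a 2x2 covariance."""
    dx, dz = p[0] - q[0], p[1] - q[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Apply the inverse of the 2x2 covariance to the difference vector.
    ix = (d * dx - b * dz) / det
    iz = (-c * dx + a * dz) / det
    return math.sqrt(dx * ix + dz * iz)

def select_radar_target(p_v, targets, cov):
    """Return the radar target with the smallest Mahalanobis distance to p_v."""
    return min(targets, key=lambda t: mahalanobis_2d(p_v, t, cov))

p_v = (0.0, 10.0)                    # depth closest point
target_a = (0.5, 13.0)               # hypothetical radar target A
target_b = (2.0, 10.5)               # hypothetical radar target B
cov = [[0.04, 0.0], [0.0, 4.0]]      # assumed: accurate laterally, poor in range
p_r = select_radar_target(p_v, [target_a, target_b], cov)
```

In this example target B is nearer to pv in Euclidean terms, but target A is selected because its offset lies mostly along the range direction, where the depth measurement is assumed to be least certain.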
In step 84, ranges and angles of radar measurements and the depth closest point pv are fused to form the fused closest point pf. The fused closest point pf is found based on the depth closest point pv and the best candidate radar target location. The ranges and azimuth angles of the depth closest point pv and radar target pr may be expressed as (dv±σJ
According to an embodiment of the present invention, the fused azimuth angle and its uncertainty may be calculated in a similar manner.
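One common way to combine two measurements that each carry an uncertainty is inverse-variance weighting, sketched below for the range; the same function may be applied to the azimuth angles. This is an assumed fusion rule offered for illustration only, and the numerical values are hypothetical.

```python
def fuse_measurements(v1, sigma1, v2, sigma2):
    """Inverse-variance weighted fusion of two scalar measurements (assumed rule)."""
    w1 = 1.0 / sigma1 ** 2
    w2 = 1.0 / sigma2 ** 2
    fused = (w1 * v1 + w2 * v2) / (w1 + w2)
    fused_sigma = (1.0 / (w1 + w2)) ** 0.5
    return fused, fused_sigma

# Hypothetical values: vision range 10 m (sigma 2 m), radar range 12 m (sigma 0.5 m).
d_f, sigma_f = fuse_measurements(10.0, 2.0, 12.0, 0.5)
```

Under this rule the fused value is pulled toward the more certain measurement, and the fused uncertainty is smaller than either input uncertainty.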
In step 86, the contour from the depth closest point pv is translated to the fused closest point pf to form the fused contour 79 of the threat vehicle under the constraint that the fused closest point pf is invariant. The fused contour 79 can be obtained by translating the fitted contour from pv to pf. In graphical terms, the fused contour 79 is obtained by translating the SVS contour 132 along a line formed by the origin of a coordinate system centered on the host object and the depth closest point pv to an intersection of the line and an arc formed by rotation of a central point associated with a best candidate radar target location about the origin of the coordinate system, wherein the best candidate radar target is selected from a plurality of radar targets by comparing Mahalanobis distances from the depth closest point pv to each of the plurality of radar targets.
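The graphical construction of step 86 may be sketched as follows, assuming a 2D contour in host-centered (x, z) coordinates. The point pv is moved along the ray from the origin until it reaches the arc of a given radius about the origin, and the same translation is applied to every contour point. The function name, the `target_range` parameter, and the sample values are hypothetical.

```python
import math

def translate_contour_to_range(contour, p_v, target_range):
    """Slide the contour along the ray through the origin and p_v to target_range."""
    d_v = math.hypot(p_v[0], p_v[1])
    scale = target_range / d_v
    # Intersection of the ray through p_v with the arc of radius target_range.
    p_f = (p_v[0] * scale, p_v[1] * scale)
    tx, tz = p_f[0] - p_v[0], p_f[1] - p_v[1]
    return [(x + tx, z + tz) for x, z in contour], p_f

contour = [(-1.0, 10.0), (1.0, 10.0), (1.0, 14.0)]   # fitted contour (L, C, R)
p_v = (0.0, 10.0)                                    # depth closest point
fused_contour, p_f = translate_contour_to_range(contour, p_v, target_range=12.0)
```

The translation is rigid, so the shape and orientation of the fitted contour are preserved while its closest point lands on pf.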
According to another embodiment of the present invention, the depth closest point and the radar data 34 may be combined according to a weighted average.
Since false alarms and outliers may exist in both radar and vision processes, the fused contour 79 needs to be filtered before being reported to the collision avoidance implementation system 84 of
$$x_k = [x_c, \dot{x}_c, z_c, \dot{z}_c, r_L, r_R, \theta, \dot{\theta}]_k^T, \tag{3}$$
where c is the intersection point of the two perpendicular line segments if the contour is represented by two perpendicular lines, and otherwise is the midpoint of the single line segment; $[x_c, z_c]$ and $[\dot{x}_c, \dot{z}_c]$ are the location and velocity of point c in the host reference system, respectively; $r_L$ and $r_R$ are respectively the left and right side lengths of the vehicle; $\theta$ is the pose of the threat vehicle with respect to (w.r.t.) the x-direction; and $\dot{\theta}$ is the pose rate.
By considering a rigid body constraint, the motion of the threat vehicle in the host reference coordinate system can be modeled as a translation of point c in the x-z plane and a rotation w.r.t. axis y, which is defined down to the ground in an overhead view. In addition, assuming a constant velocity model holds between two consecutive frames for both translation and rotation motion, the kinematic equation of the system can be expressed as
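$$x_{k+1} = F_k x_k + v_k, \tag{4}$$
where $v_k \sim N(0, Q_k)$, and
$$F_k = \mathrm{diag}\{F_{cv}, F_{cv}, I_2, F_{cv}\}, \tag{5}$$
$$Q_k = \mathrm{diag}\{\sigma_x^2 Q_{cv}, \sigma_z^2 Q_{cv}, \sigma_r^2 I_2, \sigma_\theta^2 Q_{cv}\}. \tag{6}$$
In (5) and (6), $I_2$ is the two-dimensional identity matrix; $F_{cv}$ and $Q_{cv}$ are given by the constant velocity model; and $\sigma_x$, $\sigma_z$, $\sigma_r$, and $\sigma_\theta$ are system parameters.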
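For illustration, the block-diagonal matrices of Equations (5) and (6) may be assembled as sketched below. The specific forms of the constant velocity blocks (a position-velocity transition over a frame interval T and a continuous white-noise acceleration covariance) are textbook choices assumed here, and the function names and numerical values are hypothetical.

```python
def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, val in enumerate(row):
                out[offset + i][offset + j] = float(val)
        offset += len(b)
    return out

def scaled(matrix, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * v for v in row] for row in matrix]

def contour_motion_model(T, sigma_x, sigma_z, sigma_r, sigma_theta):
    """Build F_k and Q_k for the 8-dimensional contour state."""
    f_cv = [[1.0, T], [0.0, 1.0]]                       # assumed constant velocity block
    q_cv = [[T ** 3 / 3.0, T ** 2 / 2.0],
            [T ** 2 / 2.0, T]]                          # assumed process noise block
    i2 = [[1.0, 0.0], [0.0, 1.0]]
    F = block_diag(f_cv, f_cv, i2, f_cv)
    Q = block_diag(scaled(q_cv, sigma_x ** 2), scaled(q_cv, sigma_z ** 2),
                   scaled(i2, sigma_r ** 2), scaled(q_cv, sigma_theta ** 2))
    return F, Q

def predict_state(F, x):
    """Noise-free state propagation x_{k+1} = F_k x_k."""
    return [sum(f * v for f, v in zip(row, x)) for row in F]

F, Q = contour_motion_model(T=1.0 / 30.0, sigma_x=1.0, sigma_z=1.0,
                            sigma_r=0.1, sigma_theta=0.05)
x_k = [1.0, 3.0, 10.0, -6.0, 2.0, 4.0, 0.1, 0.3]   # hypothetical contour state
x_next = predict_state(F, x_k)
```

The identity block leaves the side lengths unchanged from frame to frame, consistent with the rigid body constraint, while the three constant velocity blocks advance the position of point c and the pose.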
Since the positions of the three points L, C, and R can be measured from fusion results, the observation state vector is
$$z_k = [x_L, z_L, x_C, z_C, x_R, z_R]_k. \tag{7}$$
According to the geometry, the measurement equation can be written as
$$z_k = h(x_k) + w_k, \tag{8}$$
where h is the state-to-observation mapping function, and $w_k$ is the observation noise under a Gaussian distribution assumption.
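One possible form of the mapping h for the two-segment contour is sketched below. The geometry is an assumption made for illustration: the segment from C to R is taken to lie along the pose direction θ, and the segment from C to L is taken to be perpendicular to it. The function name is hypothetical, and the single-segment case (where c is the midpoint) would need a different mapping.

```python
import math

def observe_contour(state):
    """Map the 8-D contour state to the observed points [xL, zL, xC, zC, xR, zR]."""
    x_c, _vx, z_c, _vz, r_l, r_r, theta, _omega = state
    ux, uz = math.cos(theta), math.sin(theta)      # unit vector along the pose direction
    # Assumed geometry: R lies along the pose direction, L along its perpendicular.
    x_r, z_r = x_c + r_r * ux, z_c + r_r * uz
    x_l, z_l = x_c - r_l * uz, z_c + r_l * ux
    return [x_l, z_l, x_c, z_c, x_r, z_r]

# Hypothetical state: corner at (1, 10), side lengths 2 m and 4 m, pose 0.3 rad.
z_k = observe_contour([1.0, 0.0, 10.0, 0.0, 2.0, 4.0, 0.3, 0.0])
```

Because h involves the sine and cosine of the pose, it is nonlinear in the state, which is why an extended rather than a linear Kalman filter is used for the contour.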
Once the system and observation equations have been generated, the EKF is employed to estimate the contour state vector and its covariance at each frame.
The method according to an embodiment of the present invention receives the radar data 34 from a radar sensor, comprising range-azimuth pairs that represent the locations of scattering centers (SCs) (i.e., the points of highest reflectivity of the radar signal) of potential threat targets, and feeds them through a multi-target tracking (MTT) module to estimate the locations and velocities of the SCs. The MTT module may dynamically maintain (create/delete) tracked SCs by evaluating their track scores.
More particularly, the MTT module can be related to the state vector of each SC defined by
$$x_k = [x, \dot{x}, z, \dot{z}]_k^T, \tag{9}$$
where $(x, z)$ and $(\dot{x}, \dot{z})$ are the location and velocity of the SC in the radar coordinate system, which is mounted on the host vehicle. A constant velocity model is used to describe the kinematics of the SC, i.e.,
$$x_{k+1} = F_k x_k + v_k, \tag{10}$$
where $F_k$ is the transformation matrix, and $v_k \sim N(0, Q_k)$ (i.e., a normal distribution with zero mean and covariance $Q_k$). The measurement state vector is
$$z_k = [d, \alpha]_k, \tag{11}$$
and the measurement equations are
$$d_k = \sqrt{x_k^2 + z_k^2} + n_d(k), \quad \alpha_k = \tan^{-1}(z_k / x_k) + n_\alpha(k), \tag{12}$$
where both $n_d(k)$ and $n_\alpha(k)$ are one-dimensional Gaussian noise terms.
Since the measurement equations (12) are nonlinear, the standard Extended Kalman Filtering (EKF) module may be employed to perform state (track) propagation and estimation.
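The linearization that an EKF requires for measurement equations (12) may be sketched as follows: the noise-free measurement function and its Jacobian with respect to the SC state of Equation (9). The function names are hypothetical, and `math.atan2` is used in place of the plain arctangent, with which it agrees whenever x is positive.

```python
import math

def radar_measurement(state):
    """Noise-free range and azimuth of an SC with state [x, x_dot, z, z_dot]."""
    x, _vx, z, _vz = state
    return [math.hypot(x, z), math.atan2(z, x)]

def radar_jacobian(state):
    """Partial derivatives of (d, alpha) with respect to [x, x_dot, z, z_dot]."""
    x, _vx, z, _vz = state
    d_sq = x * x + z * z
    d = math.sqrt(d_sq)
    return [[x / d, 0.0, z / d, 0.0],            # range row
            [-z / d_sq, 0.0, x / d_sq, 0.0]]     # azimuth row

state = [3.0, 1.0, 4.0, -2.0]        # hypothetical SC state
z_pred = radar_measurement(state)    # predicted measurement [d, alpha]
H = radar_jacobian(state)            # linearized measurement matrix
```

The velocity columns of the Jacobian are zero because range and azimuth depend only on position; velocity is observed indirectly through the constant velocity model across frames.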
To evaluate the health status of each track, the track score of each SC is monitored. Assume M is the measurement vector dimension, Pd the detection probability, Vc the measurement volume element, PFA the false alarm probability, H0 the false alarm (FA) hypothesis, H1 the true target hypothesis, βNT the new target density, and ys the signal amplitude-to-noise ratio. The track score can be initialized as
which can be updated by
Here, $\tilde{z}$ and $S$ are the measurement innovation and its covariance, respectively.
Once the evolution curve of the track score is obtained, a track can be deleted if L(k)−Lmax<THD, where Lmax is the maximum track score up to time tk, and THD is a track deletion threshold.
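The deletion test may be sketched as follows, given a history of track scores computed elsewhere. The function name, the score values, and the threshold are hypothetical; the threshold is taken to be negative so that a track is dropped only after its score has fallen well below its historical maximum.

```python
def should_delete_track(scores, thd):
    """Return True if L(k) - Lmax < THD for the latest score in the history."""
    l_max = max(scores)          # maximum track score up to the current time
    return scores[-1] - l_max < thd

healthy = should_delete_track([1.0, 3.0, 5.0, 4.0], thd=-3.0)
stale = should_delete_track([1.0, 3.0, 5.0, 4.0, 1.5], thd=-3.0)
```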
respectively. The sampling frequencies for both the radar and stereo vision systems are chosen as 30 Hz.
The synthetic observations for radar range and range-rate are generated by: rk=
To evaluate the simulation results, the averaged errors from vision and fusion are calculated by
where {circumflex over (x)} and
From these results, the following conclusions can be gleaned: (i) there is no significant difference in the x-errors between vision and fused data, since the vision azimuth detection errors are already small (compared with radar) and the fusion module cannot improve the x-errors any further; (ii) the z-errors in the fused result are much smaller than those from vision alone, especially when the threat vehicles are far away from the host. The vision sensor gives larger observation errors at larger range, and fusing with the accurate radar observations significantly improves the overall range estimation accuracy.
Embodiments of the method described above were integrated into an experimental stereo vision based collision sensing system, and tested in a vehicle stereo vision and radar test bed.
An extensive road test was conducted using 2 vehicles driven 1500 miles. Driving conditions included day and night drive times, in weather ranging from clear to moderate rain and moderate snow fall. Testing was conducted in heavy traffic conditions, using an aggressive driving style to challenge the crash sensing modules.
During the driving tests, each sensor was configured with an object time-to-collision decision threshold, so that objects could be tracked as they approached the test vehicle. The time-to-collision threshold was set at 250 ms from contact, as determined by each individual sensor's modules and also by the sensor fusion module. As an object crossed the time threshold, raw data, module decision results, and ground truth data were recorded for 5 seconds before and 5 seconds after each threshold crossing. Aggressive maneuvers thus produced 250 ms threshold crossings from time to time during each test drive. The recorded data and module outputs were analyzed to determine system performance in each of the close encounters that happened during the driving tests.
During the 1500 miles of testing, 307 objects triggered the 250 ms time-to-collision threshold of the radar detection modules, and 260 objects triggered the vision system's 250 ms time-to-collision threshold. Eight objects triggered the fusion module based time-to-collision threshold. Post-test data analysis determined that the eight objects detected by the fusion module were all 250 ms or closer to colliding with the test car, while the other detections were triggered by noise in the trajectory prediction of objects that were, upon analysis, found to be farther away from the test vehicle when the threshold crossing was triggered.
It is to be understood that the exemplary embodiments are merely illustrative of the invention and that many variations of the above-described embodiments may be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that all such variations be included within the scope of the following claims and their equivalents.
This application claims the benefit of U.S. provisional patent application No. 61/039,298 filed Mar. 25, 2008, the disclosure of which is incorporated herein by reference in its entirety.
This invention was made with U.S. government support under contract number 70NANB4H3044. The U.S. government has certain rights in this invention.
Number | Date | Country
---|---|---
61039298 | Mar 2008 | US