Described herein are techniques for fusing Radar/LiDAR information with camera information to improve vision algorithm performance. The approach uses low level fusion, in which raw active sensor information (detection, range, range-rate, and angle) is sent to the vision processing stage, so that active sensor information is exploited early in vision processing. The disclosure is directed to Advanced Driver Assistance Systems (ADAS) and autonomous vehicles. In these systems, multiple sensors are used to detect obstacles around the vehicle.
Current ADAS and autonomous vehicle systems use multiple sensors to detect obstacles around the vehicle. Most fusion systems use high level fusion as shown in the
Described herein is a technique for fusing Radar (LiDAR) and vision information so as to take advantage of the strength of each sensor at a low level. The benefit of the proposed approach is that it takes advantage of active sensor strength at a very early stage. The technique improves vision algorithm performance by providing useful information, such as range or range-rate, which cannot be easily computed from vision sensors.
There is limited work in the area of low level sensor fusion. An example of previous work is the European project SAVE-U, entitled RADAR SENSORS AND SENSOR PLATFORM USED FOR PEDESTRIAN PROTECTION by Tons et al., and published by Research Gate, publication number 4092473. In the project, Radar and vision information are combined at both low and high levels. At the low level, Radar information is used to define a region of interest which can be further processed by other sensors. The approach targets detection of pedestrians and bicycles. However, other types of objects can also be detected.
Various embodiments described herein include methods and systems for using Radar (LiDAR) information to improve vision algorithms, wherein Radar (LiDAR) detections, Radar (LiDAR) range information, and Radar range-rate are used as input to vision algorithms.
In one exemplary embodiment of the present invention, a method for constructing a region of interest from Radar (LiDAR) detection units (RDU) is presented, wherein RDUs are used as points in the image domain to divide the image into free vs. occupied space. Only the occupied space is searched by the vision algorithms, which results in significant savings in processing time. Free space may be searched at a lower frequency to protect against missed detections by the active sensor.
In addition, in one embodiment of the present invention, RDUs are used to increase the confidence of vision detections, since the presence of an RDU increases the likelihood that an object is present.
A missed vision detection around an RDU can be an indication of the failure of the vision algorithm to detect an object. In one exemplary embodiment of the present invention, image areas around the RDU are processed again with different parameters or different algorithms.
Even further, in one embodiment of the present invention, the number of RDUs in an image area is used as an indication of the size of the object for the purpose of object classification, for example, truck vs. vehicle. Also, the speed of the object determined from the RDU range-rate can be used by the classifier to distinguish a bicycle from a motorbike.
In one exemplary embodiment of the present invention, a range map is determined from Radar (LiDAR) detection range information. The range map can be used by vision algorithms to decide on the scale of search to use, on determining Time To Contact (TTC), and for properly placing vision detection boxes on the ground.
In an alternative embodiment of the present invention, the speed of the object in the image is determined from RDU range-rate information. This helps vision tracking by limiting the search space in the next frame. It can also be used to improve the classification results of the vision algorithms. For example, a high speed object cannot be classified as a pedestrian.
In another embodiment of the present invention, the height information from the LiDAR sensor can be used as a feature for object classification. This can help improve classification results and reduce false detections. An object taller than 8 meters, for example, is most likely a pole and not a pedestrian.
In accordance with one embodiment, an object detection system configured to detect an object proximate to a vehicle is provided. The system includes a radar-sensor, a camera, and a controller. The radar-sensor is used to detect a radar-signal reflected by an object in a radar-field-of-view. The camera is used to capture an image of the object in a camera-field-of-view that overlaps the radar-field-of-view. The controller is in communication with the radar-sensor and the camera. The controller is configured to determine a location of a radar-detection in the image indicated by the radar-signal, determine a parametric-curve of the image based on the radar-detections, define a region-of-interest of the image based on the parametric-curve derived from the radar-detection, wherein the region-of-interest is a subset of the camera-field-of-view, and process the region-of-interest of the image to determine an identity of the object.
The present invention will now be described, by way of example with reference to the accompanying drawings, in which:
It should be understood that the drawings are for purposes of illustrating the concepts of the invention and are not necessarily the only possible configuration for illustrating the invention.
The present principles advantageously provide a method and system for improving vision detection, classification and tracking based on RDUs. Although the present principles will be described primarily within the context of using Radar (LiDAR), the specific embodiments of the present invention should not be treated as limiting the scope of the invention. For example, in an alternative embodiment of the present invention, a Light Emitting Diode (LED) sensor may be used.
In ADAS and automated driving systems, sensors are used to detect, classify, and track obstacles around the host vehicle. Objects can be vehicles, pedestrians, or an unknown class referred to as general objects. Typically two or more sensors are used to overcome the shortcomings of a single sensor and to increase the reliability of object detection, classification, and tracking. The outputs of the sensors are then fused to determine the list of objects in the scene. Fusion can be done at a high level, where every sensor is processed independently and fusion is done at the end, or at a low level, where one sensor is used by another sensor at an early stage of processing. A combination of these fusion methods is also possible. Without loss of generality, the system and method presented herein focus on ADAS and Radar. In general, LiDAR provides more accurate and denser data and hence can result in better performance than Radar.
Monocular vision systems have been popular for ADAS applications due to their low cost and effectiveness in determining object class. To detect objects, a dataset of labeled image windows of fixed size (e.g., n×m) is built. The database includes both positive and negative examples of the object to be detected (e.g., a vehicle). A classifier is then trained to tell these windows apart. Every n×m window is then passed to the classifier for processing. Windows that the classifier labels positive contain the object, and those labeled negative do not. Object detection and classification can be done separately, i.e., first detect and then classify, or at the same time. Detected objects are then tracked in the temporal domain using, for example, the centroid of the object window.
This search for objects in vision systems is typically done in a sliding window fashion [Computer Vision: A Modern Approach, by Forsyth and Ponce, Publisher: Pearson, 2nd edition 2011], starting from the top left corner of the image and proceeding to the lower right. Since not all instances of an object will be the same size in the image, the search must be done over multiple scales. Typically three or more scales are used, depending on the processing power available. Sliding window detection is well behaved in practice, with different applications requiring different choices of features. However, the processing requirements are very high, which limits the types and number of objects that can be searched.
To speed up the search for objects, in one embodiment, RDUs are used to construct a boundary curve that divides the image into free and occupied space. The resulting occupied space is the only region that needs to be searched by the vision algorithm, and since it is usually smaller than the full image, it results in a significant processing time saving for the sliding window algorithm.
To construct the free space boundary, RDUs are first mapped to the image domain. The mapping of one sensor to another is well known in the art. It involves knowing the locations of the camera and the Radar and defining rotation and translation parameters to associate each RDU with an (x, y) location in the image space. Once RDUs are mapped into the image, they are treated as image points and fitted by a curve that extends across the image in the horizontal direction. Curve fitting is the process of constructing a curve that best fits a series of data points.
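As a minimal sketch of this mapping, assuming a pinhole camera with hypothetical intrinsics (focal lengths fx, fy and principal point cx, cy) and a known rotation R and translation t from the Radar frame to the camera frame, an RDU given by range and azimuth could be projected as follows. The function name, the axis conventions, and all numeric values are illustrative assumptions and are not taken from this disclosure.

```python
import math

def rdu_to_pixel(rng, azimuth_rad, R, t, fx, fy, cx, cy):
    """Project one RDU (range, azimuth) to an (x, y) pixel location.

    Assumed conventions (hypothetical): X right, Y down, Z forward, and the
    RDU lies in the radar's horizontal plane (Y = 0 in the radar frame).
    """
    # Polar radar measurement to Cartesian radar coordinates.
    p_radar = (rng * math.sin(azimuth_rad), 0.0, rng * math.cos(azimuth_rad))
    # Rigid transform into the camera frame: p_cam = R * p_radar + t.
    p_cam = [sum(R[i][j] * p_radar[j] for j in range(3)) + t[i] for i in range(3)]
    if p_cam[2] <= 0:
        return None  # behind the camera, so not visible in the image
    # Pinhole projection onto the image plane.
    x = fx * p_cam[0] / p_cam[2] + cx
    y = fy * p_cam[1] / p_cam[2] + cy
    return (x, y)

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Example: radar mounted 0.5 m below the camera (Y points down), no rotation.
pixel = rdu_to_pixel(20.0, 0.0, IDENTITY, (0.0, 0.5, 0.0), 800.0, 800.0, 320.0, 240.0)
```

In practice R, t and the intrinsics would come from an extrinsic and intrinsic calibration of the actual sensor pair.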
In one embodiment, the boundary curve is constructed from line segments that connect two RDUs. Starting from the leftmost RDU, a line is built between two consecutive RDU locations. The process continues until the rightmost RDU is reached. Many line fitting techniques can be used. In one embodiment, the well-known line DDA (Digital Differential Analyzer) algorithm is used, which samples the line at regular intervals between the end points. Its advantage is its simplicity and speed. This can be followed by image dilation and erosion to create a smoother curve.
In another embodiment, a parametric curve may be fitted to the RDUs. There is a significant amount of work in the literature on curve fitting. For example, splines have been widely used as a mathematical way to express curves. A spline is defined as a piecewise polynomial function whose derivatives satisfy some continuity constraints. It should be noted that a spline is one example of a curve and many others can be used for the purpose of building the boundary curve from RDUs.
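The segment-by-segment construction can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, RDUs are assumed to be already mapped to pixel coordinates, and the optional dilation and erosion step is omitted.

```python
def dda_line(p0, p1):
    """Sample the segment p0 -> p1 at unit steps along its longer axis (DDA)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = int(max(abs(x1 - x0), abs(y1 - y0)))
    if steps == 0:
        return [(round(x0), round(y0))]
    dx, dy = (x1 - x0) / steps, (y1 - y0) / steps
    return [(round(x0 + i * dx), round(y0 + i * dy)) for i in range(steps + 1)]

def boundary_from_rdus(rdu_pixels):
    """Chain DDA segments between consecutive RDUs, sorted left to right."""
    pts = sorted(rdu_pixels)
    boundary = []
    for a, b in zip(pts, pts[1:]):
        seg = dda_line(a, b)
        # Later segments skip their first sample, which repeats the previous end point.
        boundary.extend(seg if not boundary else seg[1:])
    return boundary

boundary = boundary_from_rdus([(4, 2), (0, 0), (6, 2)])
```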
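One possible spline choice, shown here only as a sketch, is a natural cubic spline that passes through the RDU image points and gives a boundary row for any image column. The choice of a natural (zero end curvature) spline, the function name, and the sample points are assumptions for illustration; the RDU columns must be strictly increasing.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return f(x) for the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Second derivatives m[0..n]; the natural end conditions fix m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    diag = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (h[i - 1] + h[i])
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward elimination of the tridiagonal system (Thomas algorithm).
    for i in range(2, n):
        w = h[i - 1] / diag[i - 1]
        diag[i] -= w * h[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    for i in range(n - 1, 0, -1):
        m[i] = (rhs[i] - h[i] * m[i + 1]) / diag[i]

    def f(x):
        # Locate the segment containing x, clamping to the first or last one.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return f

# Hypothetical RDU image points: columns and boundary rows in pixels.
rdu_cols = [0, 100, 250, 400]
rdu_rows = [200, 180, 190, 210]
boundary_row = natural_cubic_spline(rdu_cols, rdu_rows)
```

Calling boundary_row(x) for each image column then yields the boundary curve across the image.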
In vision algorithms, a confidence value is usually associated with vision detection and classification. The confidence value represents how close the object in the current window is to the images in the training database. The confidence value can be a single value if a single level classifier is used, or multiple values where each value is associated with a classifier level in cascade classification. The confidence value is typically scaled to a value between 0 and 1. Many algorithms rely on the confidence value for high level processing. In one embodiment, RDUs are used to increase or decrease the confidence level in the detection. A simple additive or subtractive strategy can be used, such as:
If an RDU exists inside the vision detection box, Confidence=Confidence+α*x, where x is the number of RDUs inside the vision detection window, and α is used to control the influence of RDUs on vision confidence. In most cases, RDUs should not completely control the confidence value of the vision detection. A maximum value of 0.1 can be used as an upper limit on the term α*x.
Similarly, a penalty of 0.1 is deducted from the confidence if no RDUs are present inside the vision detection window. Since Radar has poor azimuth angle accuracy, the vision window can be slightly expanded for the purpose of the confidence calculation.
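The additive and subtractive strategy can be sketched as follows. The 0.1 bonus cap and 0.1 penalty follow the values given above, while the function name, the value of alpha, and the horizontal margin used to widen the window are hypothetical tuning choices.

```python
def adjust_confidence(confidence, box, rdu_pixels, alpha=0.05,
                      max_bonus=0.1, penalty=0.1, margin=10):
    """Raise or lower a vision confidence based on RDUs inside the detection box.

    box is (left, top, right, bottom) in pixels; alpha and margin are
    assumed values, not taken from the disclosure.
    """
    left, top, right, bottom = box
    # Widen the box horizontally to tolerate radar azimuth error.
    left, right = left - margin, right + margin
    count = sum(1 for (x, y) in rdu_pixels
                if left <= x <= right and top <= y <= bottom)
    if count > 0:
        confidence += min(alpha * count, max_bonus)  # bonus capped at max_bonus
    else:
        confidence -= penalty
    # Keep the result in the usual 0 to 1 range.
    return min(max(confidence, 0.0), 1.0)
```

For example, a detection with confidence 0.6 and one RDU inside the window would be raised to 0.65 with these assumed settings, and lowered to 0.5 if no RDU falls inside the widened window.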
In another embodiment, the number of RDUs can be used to influence classification results. For example, a large number of detections in a vision window is an indication of the presence of a large object such as a truck. This concept is also illustrated in
In other embodiments, RDUs are used as a secondary object detection indicator. For example, if vision did not detect an object close to an RDU, it can be an indication that the vision algorithm missed the object due, for example, to large variations between the object in the window and what the vision algorithm is trained to detect. The vision algorithm can then adjust some of its parameters close to an RDU to allow the missed object to be detected. One typical example is to adjust the threshold value for detection and/or classification. The threshold value specifies how closely, for example 80%, the features detected in the window must match what is expected for this object. Near an RDU, the threshold value can be reduced to, for example, 70%. This may allow the object to be detected by the vision algorithm and avoids relying completely on RDUs to decide on object detection. In yet another embodiment, different features can be computed or a different algorithm can be used around an RDU if processing resources and time are available.
Range (or a depth map) is important for many ADAS applications. In computer vision, structure from motion or a stereo camera can be used to estimate a depth map. Structure from motion typically does not provide sufficient density, while a stereo camera is expensive in either hardware (two cameras) or software (disparity algorithms). RDU range is used to build an approximate range map. The range map can be used to reduce the search space in the sliding window algorithm, to estimate Time To Contact (TTC), and to properly place or resize the vision detection box.
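A sketch of the threshold adjustment is given below. The 80% and 70% values mirror the example above; the function names and the pixel radius that defines "near an RDU" are assumptions for illustration.

```python
def detection_threshold(window, rdu_pixels, base=0.80, near_rdu=0.70, radius=20):
    """Return the classifier threshold for a window, relaxed near any RDU.

    window is (left, top, right, bottom); radius (pixels) is a hypothetical value.
    """
    left, top, right, bottom = window
    for x, y in rdu_pixels:
        if (left - radius <= x <= right + radius
                and top - radius <= y <= bottom + radius):
            return near_rdu
    return base

def is_detection(score, window, rdu_pixels):
    """Accept the window if its classifier score reaches the local threshold."""
    return score >= detection_threshold(window, rdu_pixels)
```

With these assumed settings, a window scoring 0.75 is rejected in open image areas but accepted when an RDU lies within 20 pixels of it.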
In one embodiment, the range map is constructed as shown in
In another embodiment, the range image 610 is defined as shown in
The range map has a number of uses in vision algorithms. In one embodiment, the range map is used to decide the number of scales to use in the vision processing algorithm. As an example, if the object is far away (large range value), only one or two scales are used in the sliding window algorithm. This can result in significant savings in processing time.
In another embodiment, the range map is used for TTC calculation. The TTC uses the free space in front of the vehicle to estimate the time to reach an object in front of the vehicle. TTC can be defined as TTC=Range/Velocity, where Range is estimated from the range map and Velocity is the relative speed, that is, the speed of the host vehicle minus the speed of the target object. The target object speed can be computed from the RDU range-rate.
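As an illustrative sketch, the scale selection could look like the following. The window sizes and the range breakpoints are hypothetical; the only idea taken from the text is that distant objects appear small, so fewer scales need to be searched.

```python
def window_sizes_for_range(range_m, sizes=(32, 64, 128, 256),
                           mid_m=30.0, far_m=60.0):
    """Pick the sliding-window sizes (pixels) to search at a given range.

    sizes, mid_m and far_m are assumed values chosen for illustration.
    """
    ordered = sorted(sizes)
    if range_m >= far_m:
        return ordered[:1]   # far away: smallest window only
    if range_m >= mid_m:
        return ordered[:2]   # mid range: two smallest windows
    return ordered           # close: search all scales
```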
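The TTC definition can be written as a short sketch. The sign convention for range-rate (negative when the gap is closing) and the function names are assumptions; the formula itself is TTC=Range/Velocity as defined above.

```python
def target_speed_from_range_rate(host_speed_mps, range_rate_mps):
    """Target speed along the line of sight.

    Assumed sign convention: range-rate is negative when the gap is closing.
    """
    return host_speed_mps + range_rate_mps

def time_to_contact(range_m, host_speed_mps, target_speed_mps):
    """TTC = Range / (host speed - target speed); infinite if not closing."""
    closing = host_speed_mps - target_speed_mps
    if closing <= 0:
        return float("inf")
    return range_m / closing

# Host at 20 m/s, target 40 m ahead with a range-rate of -10 m/s.
ttc = time_to_contact(40.0, 20.0, target_speed_from_range_rate(20.0, -10.0))
```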
In one embodiment, the location and size of the detection box can be adjusted based on the range map. For example, it is well known in the art [Robust range estimation with a monocular camera for vision based forward collision warning system, K. Park and S. Hwang, Scientific World Journal, Jan 2014] that if the real width of a vehicle is known, the vehicle width in the image can be defined using the formula: Vehicle width in image=Camera focal length*(Real vehicle width/Z), where Z is the range in front of the camera. Using a typical value for the real vehicle width, such as 1.4 m to 2 m, the width of the vehicle in the image can be estimated. The projection of the object into the image can be estimated as follows. A point on the road at range Z will project to the image at a height y, where y is given by y=focal length*(camera height/Z).
Since Z is known from the range map, the height of the object y can be estimated. The analysis assumes that the camera is mounted so that the optical axis is parallel to the road surface. Similar analysis can be done if the camera is pitching down.
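The two formulas can be combined into a short sketch that sizes and places a detection box from the range. It assumes the focal length is expressed in pixels, that y is measured downward from the principal point row cy, and that the optical axis is parallel to the road; the function name, the 1.8 m default width, and the example values are hypothetical.

```python
def box_from_range(z_m, focal_px, cam_height_m, center_col, cy, real_width_m=1.8):
    """Return (left, right, bottom_row, width_px) for a vehicle at range z_m.

    center_col is the image column of the object's center; cy is the principal
    point row. real_width_m is an assumed value within the 1.4 m to 2 m span.
    """
    if z_m <= 0:
        raise ValueError("range must be positive")
    # Vehicle width in image = focal length * (real vehicle width / Z).
    width_px = focal_px * real_width_m / z_m
    # Ground contact row: y = focal length * (camera height / Z) below cy.
    bottom_row = cy + focal_px * cam_height_m / z_m
    left = center_col - width_px / 2.0
    return (left, left + width_px, bottom_row, width_px)

left, right, bottom_row, width_px = box_from_range(20.0, 800.0, 1.2, 320.0, 240.0)
```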
In an embodiment, RDU range-rate is used to improve tracking speed and accuracy. In vision tracking, an object detected at time “t” can be tracked at time “t+1” by searching the image at “t+1” to find the best match to the image window at time “t”. If no information on the object speed is available, the search area would be large and hence the search would be time consuming. A large search space also increases the risk of matching the wrong image window in the second frame. Using the range-rate and possibly the azimuth angle from the RDU helps to better estimate how much an object has moved and hence to focus the search on a smaller area of the image.
In yet another embodiment, the RDU range-rate can also be used to influence the classification results. As an example, if an object window has an RDU with a large range-rate (fast object), it can be used to increase the confidence of a vehicle class vs. truck, or motorcycle vs. bicycle.
As mentioned earlier, LiDAR can be used in a similar way to compute the free space and range map. Since LiDAR provides more accurate and denser data, the free space and range map can be estimated with higher accuracy. The approach can be very useful for sparse LiDAR, where only limited measurements are available during the LiDAR scan. However, LiDAR cannot, in general, provide range-rate and hence cannot be used to improve tracking, for example.
In one embodiment, the LiDAR height value can be used to improve vision detection in a number of ways. As an example, if a vision-detected object has a very large height value from LiDAR, this can be used as an indication that the vision detection is not correct, and hence the detection is discarded. In another example, if the LiDAR height value of a detected object is small, the vision detection may be discarded.
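One way to turn range-rate into a smaller search area is sketched below. It assumes purely radial motion along the optical axis between frames, so the box scales about the principal point (cx, cy) by the ratio of the current range to the predicted range; this motion model, the function name, and the pixel margin are assumptions for illustration, not the method of the disclosure.

```python
def predict_search_window(box, z_m, range_rate_mps, dt_s, cx, cy, margin_px=8):
    """Predict where a box seen at time t should be searched for at time t+1.

    box is (left, top, right, bottom) in pixels at time t.
    """
    z_next = z_m + range_rate_mps * dt_s
    if z_next <= 0:
        raise ValueError("predicted range must be positive")
    s = z_m / z_next  # image-plane scale change about the principal point
    left, top, right, bottom = box
    predicted = (cx + (left - cx) * s, cy + (top - cy) * s,
                 cx + (right - cx) * s, cy + (bottom - cy) * s)
    # Pad the predicted box to absorb measurement and model error.
    return (predicted[0] - margin_px, predicted[1] - margin_px,
            predicted[2] + margin_px, predicted[3] + margin_px)
```

Only the returned region, rather than a large neighborhood of the previous box, would then be searched for the best match.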
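A height-based plausibility check could be sketched as follows. The per-class height limits and the function name are hypothetical values chosen only to illustrate discarding detections that are too tall or too short.

```python
# Assumed plausible height ranges in meters; not taken from the disclosure.
HEIGHT_LIMITS_M = {"pedestrian": (0.8, 2.3), "vehicle": (1.0, 4.5)}

def keep_detection(label, lidar_height_m, limits=HEIGHT_LIMITS_M):
    """Keep a vision detection only if its LiDAR height is plausible for its class."""
    if label not in limits:
        return True  # no height prior for this class, so do not discard
    low, high = limits[label]
    return low <= lidar_height_m <= high
```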
To this end, the system 900 also includes a controller 912 in communication with the radar-sensor 904 and the camera 908. The controller 912 is configured to determine a location of a radar-detection 400 in the image indicated by the radar-signal 926, determine a parametric-curve 404 of the image 402 based on the radar-detections, and define a region-of-interest 406 of the image 402 based on the parametric-curve 404 derived from the radar-detection. Preferably, the region-of-interest 406 is a subset of the camera-field-of-view 910. The parametric-curve 404 helps to reduce the amount of image processing by limiting the area of the image 402 that is processed, because objects of interest are not expected outside of the region-of-interest 406. As such, the controller 912 is configured to process the region-of-interest 406 of the image to determine an identity 914 of the object.
Since the objects of interest are expected to be further away from the vehicle 924 than the boundary defined by the parametric-curve 404, the controller 912 is configured to define the region-of-interest 406 as being above the parametric-curve 404.
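Restricting the sliding window search to the region above the curve can be sketched as follows. Image rows are assumed to grow downward, so "above" means a smaller row index; the function name, the window size, the stride, and the rule of testing the window's bottom edge at its center column are illustrative assumptions.

```python
def roi_windows(image_w, image_h, boundary_row, win=64, stride=32):
    """List top-left corners of windows lying on or above the boundary curve.

    boundary_row(x) returns the boundary's image row at column x.
    """
    corners = []
    for top in range(0, image_h - win + 1, stride):
        for left in range(0, image_w - win + 1, stride):
            center_x = left + win // 2
            # Keep the window only if its bottom edge does not fall below the curve.
            if top + win <= boundary_row(center_x):
                corners.append((left, top))
    return corners

# Example with a flat boundary at row 96 in a 128 x 128 image.
corners = roi_windows(128, 128, lambda x: 96)
```

In this small example 6 of the 9 possible window positions remain, which is the kind of saving the region-of-interest is intended to provide.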
In order to determine when something detected in the image 402 has been adequately analyzed, the controller 912 may be configured to determine a confidence-value 916 of the identity 914 of the object 902 based on the image, and vary the confidence-value 916 based on the radar-detection.
When the controller detects the object based on the radar-signal and does not detect the object in the image, the controller 912 may decrease a threshold 918 used by the controller to process the image.
The controller 912 may be configured to determine a size 920 of the object based on a radar-detection-count 922 associated with the object.
While this invention has been described in terms of the preferred embodiments thereof, it is not intended to be so limited, but rather only to the extent set forth in the claims that follow.