This invention relates to non-line of sight obstacle detection, and more particularly to such obstacle detection in vehicular applications.
Despite an increase in the number of vehicles on the roads, the number of fatal road accidents has been trending downwards in the United States of America (USA) since 1990. One of the reasons for this trend is the advent of active safety features such as Advanced Driver Assistance Systems (ADAS).
Still, approximately 1.3 million fatalities occur due to road accidents annually. Nighttime driving scenarios are particularly dangerous, and almost half of intersection-related crashes are caused by the driver's inadequate surveillance. Better perception systems and increased situational awareness could help to make driving safer.
Many autonomous navigation technologies such as RADAR, LiDAR, and computer vision-based navigation are well established and are already being deployed in commercial products (e.g., commercial vehicles). While continuing to improve on those well-established technologies, researchers are also exploring how new and alternative uses of the existing technologies in an autonomous system's architecture (e.g., perception, planning, and control) can contribute to safer driving. One example of an alternative use of an existing technology of an autonomous system's architecture is described in Naser, Felix et al. “ShadowCam: Real-Time Detection Of Moving Obstacles Behind A Corner For Autonomous Vehicles.” 21st IEEE International Conference on Intelligent Transportation Systems, 4-7 Nov. 2018, Maui, Hi., United States, IEEE, 2018.
Aspects described herein relate to a new and alternative use of a moving vehicle's computer vision system for detecting dynamic (i.e., moving) objects that are out of a direct line of sight (around a corner) from the viewpoint of the moving vehicle. Aspects monitor changes in illumination (e.g., changes in shadows or cast light) in a region of interest to infer the presence of out-of-sight objects that cause the changes in illumination.
In a general aspect, an object detection method includes receiving sensor data including a number of images associated with a sensor region as the actor traverses an environment, the number of images characterizing changes of illumination in the sensor region over time, the sensor region including a region to be traversed by the actor in the future, and processing the number of images to determine a change of illumination in the sensor region over time. The processing includes registering the number of images to a common coordinate system based at least in part on odometry data characterizing the actor's traversal of the environment and determining the change of illumination in the sensor region over time based on the registered number of images. The method further includes determining an object detection result based at least in part on the change of illumination in the sensor region over time.
Aspects may include one or more of the following features.
The odometry data may be determined using a visual odometry method. The visual odometry method may be a direct-sparse odometry method. The change of illumination in the sensor region over time may be due to a shadow cast by an object. The object may not be visible to the sensor in the sensor region. Processing the number of images may include determining homographies for transforming at least some of the images to a common coordinate system. Registering the number of images may include using the homographies to warp the at least some images into the common coordinate system.
Determining the change of illumination in the sensor region over time may include determining a score characterizing the change of illumination in the sensor region over time. Determining the object detection result may include comparing the score to a predetermined threshold. The object detection result may indicate that an object is detected if the score is equal to or exceeds the predetermined threshold and the object detection result may indicate that no object is detected if the score does not exceed the predetermined threshold.
Determining the change of illumination in the sensor region over time may further include performing a color amplification procedure on the number of registered images. Determining the change of illumination in the sensor region over time may further include applying a low-pass filter to the number of color amplified images. Determining the change of illumination in the sensor region over time may further include applying a threshold to pixels of the number of images to classify the pixels as either changing over time or remaining static over time. Determining the change of illumination in the sensor region over time may further include performing a morphological filtering operation on the pixels of the images. Determining the change of illumination in the sensor region over time may further include generating a score characterizing the change of illumination in the sensor region over time including summing the morphologically filtered pixels of the images.
The common coordinate system may be a coordinate system associated with a first image of the number of images. The method may include providing the object detection result to an interface associated with the actor. The sensor may include a camera. The actor may include a vehicle. The vehicle may be an autonomous vehicle.
In another general aspect, software embodied on a non-transitory computer readable medium includes instructions for causing one or more processors to receive sensor data including a number of images associated with a sensor region as the actor traverses an environment, the number of images characterizing changes of illumination in the sensor region over time, the sensor region including a region to be traversed by the actor in the future, and to process the number of images to determine a change of illumination in the sensor region over time. The processing includes registering the number of images to a common coordinate system based at least in part on odometry data characterizing the actor's traversal of the environment, determining the change of illumination in the sensor region over time based on the registered number of images, and determining an object detection result based at least in part on the change of illumination in the sensor region over time.
In another general aspect, an object detection system includes an input for receiving sensor data including a number of images associated with a sensor region as the actor traverses an environment, the number of images characterizing changes of illumination in the sensor region over time, the sensor region including a region to be traversed by the actor in the future, and one or more processors for processing the number of images to determine a change of illumination in the sensor region over time. The processing includes registering the number of images to a common coordinate system based at least in part on odometry data characterizing the actor's traversal of the environment and determining the change of illumination in the sensor region over time based on the registered number of images. The system includes a classifier for determining an object detection result based at least in part on the change of illumination in the sensor region over time.
Aspects may have one or more of the following advantages.
Among other advantages, aspects described herein are able to detect non-line of sight objects even under nighttime driving conditions based on shadows and illumination cues. Aspects are advantageously able to detect obstacles behind buildings or parked cars and thus help to prevent collisions.
Unlike current sensor solutions (e.g. LiDAR, RADAR, ultrasonic, cameras, etc.) and algorithms widely used in ADAS applications, aspects described herein advantageously do not require a direct line of sight in order to detect and/or classify dynamic obstacles.
Aspects may advantageously include algorithms which run fully integrated on an autonomous car. Aspects also advantageously do not rely on fiducials or other markers on travel ways (e.g., AprilTags). Aspects advantageously operate even at night and can detect approaching vehicles or obstacles based on one or both of lights cast by the vehicles/obstacles (e.g., headlights) and shadows cast by the vehicles/obstacles before conventional sensor systems (e.g., a LiDAR) can detect the vehicles/obstacles.
Aspects advantageously do not rely on any infrastructure, hardware, or material assumptions. Aspects advantageously use a visual odometry technique that performs reliably in hallways and areas where only very few textural features exist.
Aspects advantageously increase safety by increasing the situational awareness of a human driver when used as an additional ADAS or of the autonomous vehicle when used as an additional perception module. Aspects therefore advantageously enable the human driver or the autonomous vehicle to avoid potential collisions with dynamic obstacles out of the direct line of sight at day and nighttime driving conditions.
In contrast to conventional taillight detection approaches, which are rule-based or learning-based, aspects advantageously do not require direct sight of the other vehicle in order to detect it.
Other features and advantages of the invention are apparent from the following description, and from the claims.
Referring to
To detect such out-of-sight objects, the sensor 113 captures data (e.g., video frames) associated with a sensor region 114 in front of the vehicle 100. The sensor data processor 116 processes the sensor data to identify any changes in illumination at particular physical locations (i.e., at locations in the fixed physical reference frame as opposed to a moving reference frame) that are indicative of an obstacle that is approaching the path of the vehicle 100 but is not directly visible to the sensor 113. The obstacle detection system 112 generates a detection output that is used by the vehicle 100 to avoid collisions or other unwanted interactions with any identified obstacle that is approaching the path of the vehicle 100.
For example, in
In
Referring to
The sensor data processor 116 includes a circular buffer 220, a visual odometry-based image registration and region of interest (ROI) module 222, a pre-processing module 224, and a classifier 226.
Each image 216 acquired by the sensor 113 is first provided to the circular buffer 220, which stores a most recent sequence of images from the sensor 113. In some examples, the circular buffer 220 maintains a pointer to an oldest image stored in the buffer. When the new image 216 is received, the pointer is accessed, and the oldest image stored at the pointer location is overwritten with the new image 216. The pointer is then updated to point to the previously second oldest image. Of course, other types of buffers such as FIFO buffers can be used to store the most recent sequence of images, so long as they are able to achieve real-time or near real-time performance.
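By way of illustration, a minimal Python sketch of such a pointer-based circular buffer is shown below; the class and method names are illustrative only and are not part of the described system.

```python
class CircularImageBuffer:
    """Fixed-capacity buffer that overwrites its oldest image when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.images = [None] * capacity   # pre-allocated slots
        self.oldest = 0                   # pointer to the oldest stored image
        self.count = 0                    # number of images currently stored

    def push(self, image):
        # Overwrite the slot holding the oldest image, then advance the pointer
        # so it references what was previously the second-oldest image.
        self.images[self.oldest] = image
        self.oldest = (self.oldest + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def sequence(self):
        # Return the stored images ordered from oldest to newest (f_c).
        start = self.oldest if self.count == self.capacity else 0
        return [self.images[(start + k) % self.capacity] for k in range(self.count)]
```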
With the new image 216 stored in the circular buffer 220, the circular buffer 220 outputs a most recent sequence of images, fc 228.
The most recent sequence of images, fc 228 is provided to the visual odometry-based image registration and ROI module 222, which registers the images in the sequence of images to a common viewpoint (i.e., projects the images to a common coordinate system) using a visual odometry-based registration method. A region of interest (ROI) is then identified in the registered images. The output of the visual odometry-based image registration and ROI module 222 is a sequence of registered images with a region of interest identified, fr 223.
As is described in greater detail below, in some examples, the visual odometry-based image registration and ROI module executes an algorithm that registers the images to a common viewpoint using a direct sparse odometry (DSO) technique. Very generally, DSO maintains a ‘point cloud’ of three-dimensional points encountered in a world as an actor traverses the world. This point cloud is sometimes referred to as the ‘worldframe’ view.
For each image in the sequence of images 228, DSO computes a pose Mwc including a rotation matrix, R and a translation vector t in a common (but arbitrary) frame of reference for the sequence 228. The pose represents the rotation and translation of a camera used to capture the image relative to the worldview.
The poses for each of the images are then used to compute homographies, H, which are in turn used to register each image to a common viewpoint such as a viewpoint of a first image of the sequence of images (e.g., the new image 216), as is described in greater detail below. In general, each of the homographies, H is proportional to information given by the planar surface equation, the rotation matrix, R, and the translation vector t between two images:
H ∝ R − t·nᵀ
where n designates the normal of the local planar approximation of a scene.
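As an illustration of this relationship, the following Python sketch composes a plane-induced homography from a relative rotation R, translation t, plane normal n, and plane distance d; scaling by the plane distance turns the proportionality above into an equality, and the example values are hypothetical.

```python
import numpy as np

def plane_induced_homography(R, t, n, d):
    """Homography induced by a plane with unit normal n at distance d from the
    reference camera, for a relative camera motion (R, t): H = R - (t nᵀ) / d."""
    t = np.asarray(t, dtype=float).reshape(3, 1)
    n = np.asarray(n, dtype=float).reshape(3, 1)
    return np.asarray(R, dtype=float) - (t @ n.T) / d

# Hypothetical values: identity rotation, small lateral translation, and a
# ground plane 1.5 m below the camera (normal pointing up in camera coordinates).
R = np.eye(3)
t = [0.1, 0.0, 0.0]
n = [0.0, -1.0, 0.0]
H = plane_induced_homography(R, t, n, d=1.5)
```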
Very generally, having established a common point of view of the “world,” the approach identifies a “ground plane” in the world, for example, corresponding to the surface of a roadway or the floor of a hallway, and processes the images based on the ground plane.
Referring to
In second and third steps 332, 333 at lines 2 and 3 of the algorithm 300, a pose Mc
In a fourth step 334 at line 4 of the algorithm 300, a for loop is initiated for iterating through the remaining images of the sequence of images 228. For each iteration, i of the for loop, the ith image of the sequence of images 228 is processed using fifth through fourteenth steps of the algorithm 300.
In the fifth 335 and sixth 336 steps at lines 5 and 6 of the algorithm 300, a pose, Mc
In the seventh and eighth steps 337, 338 at lines 7 and 8 of the algorithm 300, a transformation, Mc
where the rotation matrix R_{c_i c_0} is given by

R_{c_i c_0} = R_{w c_i} · (R_{w c_0})⁻¹

and the translation vector t_{c_i c_0} is given by

t_{c_i c_0} = R_{w c_i} · (−(R_{w c_0})⁻¹ · t_{w c_0}) + t_{w c_i}
In the ninth step 339 at line 9 of the algorithm 300, the planePointsw parameters are transformed from the worldframe view, w to the viewpoint of the camera in the ith image, ci using the following transformation:
In the tenth step 340 at line 10 of the algorithm 300, the result of the transformation of the ninth step 339, planePointsc
In the eleventh step 341 at line 11 of the algorithm 300, the distance, dc
In the twelfth step 342 at line 12 of the algorithm 300, the homography matrix, Hc
Finally, in the thirteenth step 343 at line 13 of the algorithm 300, a warpPerspective( ) function is used to compute a warped version of the ith image, fr,i by warping the perspective of the first image, f0 using the homography matrix, Hc
where K is the camera's intrinsic matrix.
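By way of illustration, the following Python sketch performs this warping step with OpenCV by lifting a Euclidean homography (such as H from the earlier sketch) into pixel coordinates with the intrinsic matrix K before calling warpPerspective( ); the intrinsic values shown are hypothetical.

```python
import cv2
import numpy as np

def warp_to_common_viewpoint(image, H, K):
    """Warp an image into the common viewpoint using the plane-induced
    homography H and the camera intrinsic matrix K."""
    G = K @ H @ np.linalg.inv(K)   # lift the Euclidean homography to pixel space
    G = G / G[2, 2]                # normalize so the bottom-right entry is 1
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, G, (w, h))

# Hypothetical intrinsics for a 640x480 camera.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
# warped = warp_to_common_viewpoint(frame, H, K)
```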
With the images of the sequence of images 228 registered to a common viewpoint, a region of interest (e.g., a region where a shadow is likely to appear) in each of the images is identified and the sequence of registered images with the region of interest identified is output as, fr. In some examples, the region of interest is hand annotated (e.g., an operator annotates the image(s)). In other examples, the region of interest is determined using one or more of maps, place-recognition, and machine learning techniques.
Referring again to
Referring to
In a first step 451 at line 1 of the pre-processing algorithm 400, the registered images, fr are processed by a crop/downsample( ) function, which crops the images according to the region of interest and down samples the cropped images (e.g., using a bilinear interpolation technique) to, for example, a 100×100 pixel image.
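A minimal Python sketch of this crop-and-downsample step is shown below; the (x, y, width, height) region-of-interest format and the 100×100 target size are assumptions used for illustration.

```python
import cv2

def crop_downsample(image, roi, size=(100, 100)):
    """Crop an image to a rectangular region of interest and downsample it to
    the given size using bilinear interpolation."""
    x, y, w, h = roi                      # ROI as (x, y, width, height)
    cropped = image[y:y + h, x:x + w]
    return cv2.resize(cropped, size, interpolation=cv2.INTER_LINEAR)
```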
In a second step 452 at line 2 of the pre-processing algorithm 400, the down-sampled images are processed in a mean( ) function, which computes a mean image over all of the down-sampled images as:

f̄_r = (1/n) · Σ_i f_{r,i}

where n is the number of images in the sequence.
In a third step 453 at line 3 of the pre-processing algorithm 400, the down-sampled images, fr and the mean image, f̄_r are used to generate a sequence of color amplified images, dr as follows:

d_{r,i} = |G((f_{r,i} − f̄_r) · α, k)|
In some examples, G is a linear blur filter of size k using isotropic Gaussian kernels with covariance matrix diag(σ², σ²), where the parameter σ is determined from k as σ = 0.3·((k−1)·0.5 − 1) + 0.8. The parameter α configures the amplification of the difference to the mean image. This color amplification process helps to improve the detectability of a shadow (sometimes even if the original signal is invisible to the human eye) by increasing the signal-to-noise ratio.
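The following Python sketch illustrates the mean-image and color-amplification steps along these lines; the values of α and k are illustrative, and the amplification expression follows the reconstruction given above rather than a definitive implementation.

```python
import cv2
import numpy as np

def color_amplify(frames, alpha=5.0, k=5):
    """Amplify per-pixel deviations from the mean image and blur the result with
    a Gaussian kernel of size k (sigma derived from k as in the text above)."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    mean_image = stack.mean(axis=0)                      # f_bar_r
    sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8
    amplified = []
    for frame in stack:
        diff = (frame - mean_image) * alpha              # amplified difference to mean
        blurred = cv2.GaussianBlur(diff, (k, k), sigma)  # G(., k)
        amplified.append(np.abs(blurred))                # d_r,i
    return amplified, mean_image
```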
In a fourth step 454 at line 4 of the pre-processing algorithm 400, the sequence of color amplified images, dr,i is then temporally low-pass filtered to generate a sequence of filtered images, tr as follows:
t_{r,i} = d_{r,i} · t + d_{r,i−1} · (1 − t)

where t_{r,i} is the filtered result of the color amplified images d_{r,i}.
In a fifth step 455 at line 5 of the pre-processing algorithm, a “dynamic” threshold is applied, on a pixel-by-pixel basis, to the filtered images tr to generate images with classified pixels, cr. In some examples, the images with classified pixels, cr include a classification of each pixel of each image as being either “dynamic” or “static” based on changes in illumination in the pixels of the images. To generate the images with classified pixels, cr a difference from the mean of each pixel of the filtered images is used as a criterion to determine a motion classification with respect to the standard deviation of the filtered image as follows:
where w is a tune-able parameter that depends on the noise distribution. In some examples, w is set to 2. The underlying assumption is that dynamic pixels are further away from the mean since they change more drastically. So, any pixels that are determined to be closer to the mean (i.e., by not exceeding the threshold) are marked with the value ‘0’ (i.e., classified as a “static” pixel). Any pixels that are determined to be further from the mean are marked with the value ‘1’ (i.e., classified as a “dynamic” pixel).
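By way of illustration, the following Python sketch combines the temporal low-pass filter of the fourth step with one plausible form of the per-pixel dynamic threshold just described; a single-channel frame, the blend factor t, and the use of the frame-wide mean and standard deviation are assumptions made for the sketch.

```python
import numpy as np

def temporal_low_pass(d_frames, t=0.5):
    """Blend each color-amplified frame with its predecessor:
    t_r,i = d_r,i * t + d_r,i-1 * (1 - t)."""
    filtered = [d_frames[0]]
    for prev, cur in zip(d_frames[:-1], d_frames[1:]):
        filtered.append(cur * t + prev * (1.0 - t))
    return filtered

def classify_pixels(t_frame, w=2.0):
    """Mark a pixel as dynamic (1) when it deviates from the frame mean by more
    than w standard deviations; otherwise mark it static (0)."""
    deviation = np.abs(t_frame - t_frame.mean())
    return (deviation > w * t_frame.std()).astype(np.uint8)
```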
In a sixth step 456 at line 6 of the pre-processing algorithm 400, a morphologicalFilter( ) function is applied to the classified pixels for each image, cr,i including applying a dilation to first connect pixels of the image which were classified as motion and then applying an erosion to reduce noise. In some examples, this is accomplished by applying morphological ellipse elements with two different kernel sizes, e.g.,
c_{r,i} = dilate(c_{r,i}, 1),

c_{r,i} = erode(c_{r,i}, 3).
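A Python sketch of this dilate-then-erode filtering with elliptical structuring elements is shown below; OpenCV structuring elements take explicit (width, height) sizes, so the kernel sizes used here are illustrative choices rather than the exact parameters of the example above.

```python
import cv2

def morphological_filter(mask, dilate_size=3, erode_size=7):
    """Connect dynamic pixels with a dilation, then remove isolated noise with
    an erosion, using elliptical structuring elements of two different sizes."""
    dilate_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                              (dilate_size, dilate_size))
    erode_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                             (erode_size, erode_size))
    mask = cv2.dilate(mask, dilate_kernel)
    return cv2.erode(mask, erode_kernel)
```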
In a seventh step 457 at line 7 of the pre-processing algorithm 400, the output of the sixth step 456 for each of the images, cr,i is summed, classified pixel-by-classified pixel, as follows:

s_r = Σ_i Σ_{(x,y)} c_{r,i}(x, y)
The intuitive assumption is that more movement between frames will result in a higher sum. Referring again to
Referring to
In a first step 561 at line 1 of the classification algorithm 500, the score, sr is compared to the camera-specific threshold, T.
In a second step 562 at line 2 of the classification algorithm 500, if the score, sr is greater than or equal to the camera-specific threshold, T then the classifier 226 outputs a detection result, c 218 indicating that the sequence of images, fc is “dynamic” and an obstacle is approaching the path of the vehicle 100.
Otherwise, in a third step 563 at line 5 of the classification algorithm, if the score, sr does not exceed the threshold, T, then the classifier 226 outputs a detection result, c 218 indicating that the sequence of images, fc is “static” and no obstacle is approaching the path of the vehicle 100. In some examples, the detection result, c 218 indicates whether it is safe for the vehicle 100 to continue along its path.
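A compact Python sketch of this scoring and threshold comparison is shown below; the summation mirrors the seventh pre-processing step, and the threshold value is purely illustrative since T is camera-specific.

```python
import numpy as np

def classify_sequence(masks, threshold):
    """Sum the morphologically filtered dynamic-pixel masks into a score s_r
    and compare it with a camera-specific threshold T."""
    score = sum(int(np.sum(mask)) for mask in masks)
    return "dynamic" if score >= threshold else "static"

# Hypothetical camera-specific threshold.
# result = classify_sequence(masks, threshold=1500)
```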
The detection result, c 218 is provided to the vehicle interface, which uses the detection result, c 218 to control the vehicle 100 (e.g., to prevent collisions between the vehicle and obstacles). In some examples, the vehicle interface temporally filters the detection results to further smooth the signal.
Referring to
As was the case in
To detect such out-of-sight objects, the sensor 613 captures data (e.g., video frames) associated with a sensor region 614 in front of the first vehicle 600. The sensor data processor 616 processes the sensor data to identify any changes in illumination at particular physical locations (i.e., at locations in the fixed physical reference frame as opposed to a moving reference frame) that are indicative of an obstacle that is approaching the path of the vehicle 600 but is not directly visible to the sensor 613. The obstacle detection system 612 generates a detection output that is used by the vehicle 600 to avoid collisions or other unwanted interactions with any identified obstacle that is approaching the path of the vehicle 600.
For example, in
In
While the examples described above are described in the context of a vehicle such as an automobile and a pedestrian approaching an intersection, other contexts are possible. For example, the system could be used by an automobile to detect other automobiles approaching an intersection. Alternatively, the system could be used by other types of vehicles such as wheelchairs or autonomous robots to detect non-line of sight objects.
In the examples described above, detection of shadows intersecting with a region of interest is used to detect non-line of sight objects approaching an intersection. But the opposite is possible as well, where an increase in illumination due to, for example, approaching headlights is used to detect non-line of sight objects approaching an intersection (e.g., at night).
In some examples, some of the modules (e.g., registration, pre-processing, and/or classification) described above are implemented using machine learning techniques such as neural networks.
In some examples, other types of odometry (e.g., dead reckoning) are used instead of or in addition to visual odometry in the algorithms described above.
The approaches described above can be implemented, for example, using a programmable computing system executing suitable software instructions or it can be implemented in suitable hardware such as a field-programmable gate array (FPGA) or in some hybrid form. For example, in a programmed approach the software may include procedures in one or more computer programs that execute on one or more programmed or programmable computing systems (which may be of various architectures such as distributed, client/server, or grid) each including at least one processor, at least one data storage system (including volatile and/or non-volatile memory and/or storage elements), and at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port). The software may include one or more modules of a larger program. The modules of the program can be implemented as data structures or other organized data conforming to a data model stored in a data repository.
The software may be stored in non-transitory form, such as being embodied in a volatile or non-volatile storage medium, or any other non-transitory medium, using a physical property of the medium (e.g., surface pits and lands, magnetic domains, or electrical charge) for a period of time (e.g., the time between refresh periods of a dynamic memory device such as a dynamic RAM). In preparation for loading the instructions, the software may be provided on a tangible, non-transitory medium, such as a CD-ROM or other computer-readable medium (e.g., readable by a general or special purpose computing system or device), or may be delivered (e.g., encoded in a propagated signal) over a communication medium of a network to a tangible, non-transitory medium of a computing system where it is executed. Some or all of the processing may be performed on a special purpose computer, or using special-purpose hardware, such as coprocessors or field-programmable gate arrays (FPGAs) or dedicated, application-specific integrated circuits (ASICs). The processing may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computing elements. Each such computer program is preferably stored on or downloaded to a computer-readable storage medium (e.g., solid state memory or media, or magnetic or optical media) of a storage device accessible by a general or special purpose programmable computer, for configuring and operating the computer when the storage device medium is read by the computer to perform the processing described herein. The system may also be considered to be implemented as a tangible, non-transitory medium, configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the processing steps described herein.
A number of embodiments of the invention have been described. Nevertheless, it is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the following claims. Accordingly, other embodiments are also within the scope of the following claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order independent, and thus can be performed in an order different from that described.
This application is a continuation-in-part of U.S. patent application Ser. No. 16/730,613 filed on Dec. 30, 2019 and entitled “INFRASTRUCTURE-FREE NLOS OBSTACLE DETECTION FOR AUTONOMOUS CARS,” which is a continuation-in-part of U.S. patent application Ser. No. 16/179,223 entitled “SYSTEMS AND METHOD OF DETECTING MOVING OBSTACLES” filed on Nov. 2, 2018 and which claims the benefit of U.S. Provisional Patent Application No. 62/915,570 titled “INFRASTRUCTURE FREE NLoS OBSTACLE DETECTION FOR AUTONOMOUS CARS” filed on Oct. 15, 2019, the disclosures of which are expressly incorporated by reference herein in their entirety.