The present disclosure relates generally to the field of stereo vision systems. More particularly, the disclosure relates to a stereo vision system having characteristics for improved operation during low-light conditions, and to methods for detecting and tracking objects during low-light conditions.
A stereo vision system may be incorporated into a vehicle to provide for viewing of a region in front of the vehicle during nighttime and other low-ambient-light conditions, and may include a plurality of camera sensors. Stereo vision systems can be used to detect objects and estimate the position of objects in the path of the vehicle in three dimensions. The detection and estimation can be obtained from the slightly different projections of the objects onto two camera sensors positioned with a horizontal offset between them. The difference between the images of the two sensors is called the horizontal disparity. This disparity provides the information for the third dimension of the position estimate.
A typical stereo vision system may be equipped with two identical camera sensors with parallel boresight vectors. The two camera sensors are positioned with an offset in a direction that is orthogonal to the boresight vectors. This offset or separation is called the baseline separation. The baseline separation and the tolerance of collinearity between the boresights of the two vision sensors impact the three-dimensional accuracy.
A radar, for example a monopulse radar, is typically equipped with two receive and/or two transmit apertures with a boresight angle and relative positioning that is chosen in a way similar to the stereo vision sensor described above. For example, in a radar with two receive apertures, the back scatter from a target that reaches one of the receive apertures typically reaches the other aperture with a slightly longer or shorter return path length. The difference in the return path length is used to compute the angle of the target with respect to a reference angle.
Like most vision systems, the camera sensors of a stereo vision system inevitably suffer under adverse illumination and weather conditions, often when driver assistance is needed most. In low-light conditions, such as between dusk and dawn, camera exposure times may be increased. As a result, the integrity of the images captured by the two camera sensors may be sufficiently degraded that a system or method cannot determine the horizontal disparity between the two sensors. Therefore, a need exists for a system and method to measure the horizontal disparity between camera sensors during low-light conditions.
One disclosed embodiment relates to a stereo vision system for use in a vehicle. The stereo vision system includes a first camera sensor and a second camera sensor. The first camera sensor is configured to sense first reflected energy and generate first sensor signals based on the sensed first reflected energy. The second camera sensor is configured to sense second reflected energy and generate second sensor signals based on the sensed second reflected energy. The stereo vision system further includes a processor configured to receive the first sensor signals from the first camera sensor and configured to receive the second sensor signals from the second camera sensor. The processor is configured to perform stereo matching based on the first sensor signals and the second sensor signals. The first camera sensor is configured to sense reflected energy that is infrared radiation. The second camera sensor is configured to sense reflected energy that is infrared radiation.
Another disclosed embodiment relates to a stereo vision system for use in a vehicle. The stereo vision system includes a first camera sensor, a second camera sensor, and a third camera sensor. The first camera sensor is configured to sense first reflected energy and generate first sensor signals based on the sensed first reflected energy. The second camera sensor is configured to sense second reflected energy and generate second sensor signals based on the sensed second reflected energy. The third camera sensor is configured to sense third reflected energy and generate third sensor signals based on the sensed third reflected energy. The stereo vision system further includes a processor configured to receive the first sensor signals from the first camera sensor, configured to receive the second sensor signals from the second camera sensor, and configured to receive the third sensor signals from the third camera sensor. The processor is further configured to perform stereo matching based on at least one of the first sensor signals, the second sensor signals, and the third sensor signals. The first camera sensor is configured to sense reflected energy that is visible radiation. The second camera sensor is configured to sense reflected energy that is visible radiation. The third camera sensor is configured to sense energy that is infrared radiation.
Still another disclosed embodiment relates to a method for stereo vision in a vehicle. The method includes sensing first reflected energy using a first camera sensor; generating first sensor signals based on the sensed first reflected energy; sensing second reflected energy using a second camera sensor; generating second sensor signals based on the sensed second reflected energy; and performing stereo matching based on the first sensor signals and the second sensor signals. The first reflected energy is infrared radiation. The second reflected energy is infrared radiation.
Referring generally to the figures, systems and methods for night vision object detection and driver assistance are shown and described. Various sensor technologies, sensor configurations, and illumination techniques are disclosed that may be used to overcome issues relating to the stereo vision system (SVS) operating in nighttime or other low ambient environments.
The stereo vision system may include a camera system including a plurality of camera sensors for sensing objects. The stereo vision system includes a stereo camera system including two cameras sensing reflected energy in the wavelength interval from 0.9 to 1.8 microns (900 to 1800 nanometers). The stereo vision system is equipped with eye-safe supplemental illumination selectively activated during low-light conditions. The stereo camera system may optionally include a third central camera that may be used for data fusion techniques to add further capabilities to the stereo vision system.
Typical stereo vision systems provide improved object detection and tracking capabilities across many environmental conditions. However, overall system performance can be limited under scenarios involving low ambient illumination (e.g., in the shadows of buildings or trees, in tunnels, and in covered parking garages) and nighttime operation at distances beyond the vehicle's headlight pattern (e.g., beyond approximately 30 meters while using low-beam headlights and beyond approximately 50 meters while using high-beam headlights).
Referring now to
Successful stereo ranging relies on measuring disparities (column shifts) between correlated structures as they appear in the left and right images captured by the cameras of the stereo vision system. In low ambient conditions, camera exposure times increase. This in turn degrades the left and right image quality (the image regions become blurred and defocused) and eventually the search for correlated structures between the left and right images can fail. The black regions 18 in the stereo range maps illustrate image areas in which this search for correlated structures has failed.
The stereo vision system 20 of the present disclosure may include a processing circuit 30 including a processor 32 and memory 34 for completing the various activities described herein. The processor 32 may be implemented as a general purpose processor, an application specific integrated circuit (ASIC), one or more field programmable gate arrays (FPGAs), a group of processing components, or other suitable electronic processing components. The memory 34 is one or more devices (e.g., RAM, ROM, flash memory, hard disk storage, etc.) for storing data and/or computer code for completing and/or facilitating the various user or client processes, layers, and modules described in the present disclosure. The memory 34 may be or include volatile memory or non-volatile memory, and may include database components, object code components, script components, or any other type of information structure for supporting the various activities and information structures of the present disclosure. The memory 34 is communicably connected to the processor 32 and includes computer code or instruction modules for executing one or more processes described herein.
The stereo vision system 20 of the present disclosure may further include a supplemental or active illumination source or component 36. The illumination source 36 may be utilized to provide illumination (e.g., infrared illumination) to allow the stereo camera system 22 to capture images in scenes devoid of light (e.g., driving in non-illuminated tunnels and parking garages or driving under thick forest canopies).
The systems and methods of the present disclosure provide for a sensing mode implementable using the stereo camera system 22 of the stereo vision system 20.
Sensing in the SWIR band shares a similarity with sensing in the human-visible band (the wavelength interval from 0.35 to 0.7 microns or 350 to 700 nanometers). SWIR light is reflective light, reflected off object surfaces much like light in the human-visible band. Therefore, imagery from the camera operating in the SWIR band may be processed using established machine vision techniques developed from imagery collected with cameras capable of sensing in the human-visible band. Images from the InGaAs cameras of the SWIR system are comparable in angular resolution and spatial detail to images from silicon-based cameras sensing in the human-visible band.
Stereo matching of the imagery from the left camera 24 and right camera 26 of the stereo camera system 22 operating in the visible band produces a stereo range map (or stereo disparity map). The stereo matching of the imagery may be accomplished using one or more well-known methods (e.g., CENSUS, sum of absolute differences (SAD), or normalized cross correlation (NCC)). The stereo matching of the imagery may be carried out with the processing circuit 30 of the stereo vision system 20, a component thereof (e.g., processor 32, memory 34, etc.), or with another data processing element in communication with the processing circuit 30. Downrange measurements at pixel locations are correlated with either the left camera pixel locations or right camera pixel locations. The elements of the stereo range map (or stereo disparity map) are commonly referred to as “range pix” or “range pixels.”
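By way of illustration only, and not as the disclosed implementation, the following is a minimal sketch of SAD block matching over a rectified left/right pair using NumPy; the window size and maximum disparity are assumed values, and a production system would typically use an optimized or hardware implementation of CENSUS, SAD, or NCC rather than this nested-loop form.

```python
import numpy as np

def sad_disparity(left, right, max_disparity=64, window=7):
    """Brute-force SAD block matching on rectified grayscale images.

    Returns a disparity map referenced to the left image; 0 marks pixels
    where no match was attempted. The parameters are illustrative only.
    """
    half = window // 2
    rows, cols = left.shape
    disparity = np.zeros((rows, cols), dtype=np.float32)
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    for r in range(half, rows - half):
        for c in range(half + max_disparity, cols - half):
            patch = left[r - half:r + half + 1, c - half:c + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(max_disparity):
                cand = right[r - half:r + half + 1,
                             c - d - half:c - d + half + 1]
                cost = np.abs(patch - cand).sum()   # SAD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[r, c] = best_d
    return disparity
```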
The stereo range map may be used as the basis for various modules of the stereo vision system 20 (e.g., for object detection, object tracking, and collision likelihood) and for various vehicle subsystems and applications such as forward collision warning, automatic emergency braking, adaptive cruise control, child back-over protection, etc. Such examples of machine vision algorithms are disclosed in U.S. Patent Application No. 2013/0251194, U.S. Pat. No. 8,509,523, and U.S. Pat. No. 8,594,370, all of which are incorporated by reference herein. It should be understood that the stereo range map may be used for any other type of vehicle subsystem or application.
Referring to
The left image and right image provided by the left camera and right camera are rectified (step 46). Image rectification may generally include removing lens distortion from the left and right camera images and bringing the left and right camera images into epipolar alignment.
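As a hedged sketch only, rectification of this kind can be performed with the OpenCV library as shown below; the calibration inputs (camera matrices, distortion coefficients, and the rotation R and translation T between the cameras) are assumed to come from a prior offline stereo calibration and are not specified by the disclosure.

```python
import cv2

def rectify_pair(left_img, right_img, K_left, dist_left, K_right, dist_right,
                 R, T, image_size):
    """Remove lens distortion and bring a left/right image pair into
    epipolar alignment. Calibration inputs are assumed to come from an
    offline stereo calibration (e.g., cv2.stereoCalibrate)."""
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
        K_left, dist_left, K_right, dist_right, image_size, R, T)
    map_lx, map_ly = cv2.initUndistortRectifyMap(
        K_left, dist_left, R1, P1, image_size, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(
        K_right, dist_right, R2, P2, image_size, cv2.CV_32FC1)
    left_rect = cv2.remap(left_img, map_lx, map_ly, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_img, map_rx, map_ry, cv2.INTER_LINEAR)
    return left_rect, right_rect, Q
```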
The rectified image is used to produce a stereo range map (step 48). The stereo range map may be computed using one or more well-known methods (e.g., CENSUS, SAD, NCC, etc.), as described above.
The stereo range map is analyzed to detect objects (step 50). The object detection generally includes the process of identifying legitimate objects in the images, separating foreground and background objects in the images, and computing positional measurements for each object relative to the vehicle (e.g., calculating the down range, cross range, height, or elevation of the object relative to the vehicle).
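The following is a simplified, hedged sketch of one way such detection could be performed on a stereo range map, by grouping range pixels of similar downrange into connected regions; the depth-layer width, range limit, and minimum pixel count are illustrative assumptions, not values from the disclosure.

```python
import numpy as np
from scipy import ndimage

def detect_objects(range_map, max_range_m=60.0, depth_tolerance_m=1.5,
                   min_pixels=150):
    """Group valid range pixels into candidate objects.

    range_map: 2-D array of downrange values in meters; 0 or NaN marks
    pixels with no stereo match. Thresholds are illustrative assumptions.
    """
    valid = np.isfinite(range_map) & (range_map > 0) & (range_map < max_range_m)
    detections = []
    # Slice the scene into depth layers and connect pixels within each layer.
    for near in np.arange(0.0, max_range_m, depth_tolerance_m):
        layer = valid & (range_map >= near) & (range_map < near + depth_tolerance_m)
        labels, count = ndimage.label(layer)
        for obj_id in range(1, count + 1):
            ys, xs = np.nonzero(labels == obj_id)
            if ys.size < min_pixels:
                continue  # reject clutter / spurious range pixels
            detections.append({
                "downrange_m": float(np.median(range_map[ys, xs])),
                "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
                "pixel_count": int(ys.size),
            })
    return detections
```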
The objects detected in the object detection step are tracked and classified (step 52). This includes identifying associated objects in consecutive video frames captured by the cameras, estimating kinematic properties of the objects, and classifying the objects into pre-defined categories (e.g., vehicle, pedestrian, bicyclist, etc.).
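As a hedged illustration of the association and kinematic-estimation steps, the sketch below links detections across consecutive frames with a nearest-neighbor gate and a constant-velocity update; the gate distance and the dictionary field names are assumptions for illustration, not the disclosed tracker.

```python
def associate_and_track(tracks, detections, dt, gate_m=2.0):
    """Nearest-neighbor association of new detections to existing tracks,
    followed by a constant-velocity update. 'tracks' and 'detections' are
    lists of dicts with 'downrange_m' and 'crossrange_m' keys (an assumed
    structure for illustration)."""
    unmatched = list(detections)
    for track in tracks:
        # Predict the track position forward by one frame interval.
        pred_x = track["downrange_m"] + track.get("vx_mps", 0.0) * dt
        pred_y = track["crossrange_m"] + track.get("vy_mps", 0.0) * dt
        best, best_dist = None, gate_m
        for det in unmatched:
            dist = ((det["downrange_m"] - pred_x) ** 2 +
                    (det["crossrange_m"] - pred_y) ** 2) ** 0.5
            if dist < best_dist:
                best, best_dist = det, dist
        if best is not None:
            # Update kinematics from the matched detection.
            track["vx_mps"] = (best["downrange_m"] - track["downrange_m"]) / dt
            track["vy_mps"] = (best["crossrange_m"] - track["crossrange_m"]) / dt
            track["downrange_m"] = best["downrange_m"]
            track["crossrange_m"] = best["crossrange_m"]
            unmatched.remove(best)
    # Any detection left unmatched starts a new track.
    tracks.extend({**det, "vx_mps": 0.0, "vy_mps": 0.0} for det in unmatched)
    return tracks
```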
An output signal is provided based on a result of the object tracking (step 54) in order to provide assistance to a driver of the vehicle. For example, the processor 32 may provide an output signal to a vehicle system, or safety application. Based on the objects tracked and classified, one or more safety applications may be enabled (step 56). The safety applications activated may be various types of applications for assisting the driver. Examples of such applications may include a forward collision warning (FCW) system, an automatic emergency braking (AEB) system, an adaptive cruise control (ACC) system, and a child back-over protection (CBP) system. In further embodiments, other safety applications or other applications may be enabled based on the object tracking and classification. In further embodiments, the output signal may be relayed to the driver of the vehicle, such as with a display (e.g., center stack display, dashboard display, heads-up display, etc.) and/or an audio, tactile, or visual alarm device.
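As one hedged example of how the output signal could enable a safety application, the sketch below asserts a forward collision warning when the estimated time to collision drops below an assumed threshold; the threshold and field names are illustrative only, not disclosed calibrations.

```python
def forward_collision_warning(track, ttc_warn_s=2.5):
    """Return True if a warning output signal should be asserted for a
    tracked object approaching the host vehicle. The threshold is an
    assumed illustrative value."""
    closing_speed = -track.get("vx_mps", 0.0)   # positive when the gap shrinks
    if closing_speed <= 0.0:
        return False                            # object is not approaching
    time_to_collision = track["downrange_m"] / closing_speed
    return time_to_collision < ttc_warn_s
```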
Referring to
Referring now to
In some embodiments, SWIR cameras constructed from InGaAs cannot deliver optimal or timely results in scenes devoid of light (e.g., driving in non-illuminated tunnels and parking garages or driving under thick forest canopies). Many vehicle subsystems or applications (e.g., FCW, AEB, ACC, and CBP as described above) require high scene sampling rates (camera frame rates of at least 30 frames per second) to establish reliable levels for the enabling of machine vision algorithms (object detection, object classification, object tracking, and collision likelihood). The frame rate requirement greatly limits a camera's maximum allowable integration or exposure time (the time given to accumulate light energy for a single image frame).
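As a hedged back-of-the-envelope sketch of this constraint, the integration time available per frame is bounded by the frame period less any readout overhead; the readout figure below is an assumption for illustration only.

```python
def max_exposure_ms(frame_rate_hz=30.0, readout_overhead_ms=3.0):
    """Upper bound on per-frame integration time at a given frame rate.
    At an assumed 30 fps with 3 ms of readout overhead, roughly 30 ms
    remain for light integration, regardless of how dark the scene is."""
    frame_period_ms = 1000.0 / frame_rate_hz
    return frame_period_ms - readout_overhead_ms
```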
In such cases, supplemental illumination generated by the supplemental or active illumination source or component 36 may be provided. In one embodiment, the supplemental or active illumination source 36 may include laser diodes. The laser diodes may emit energy in the wavelength interval from 1.2 to 1.8 microns (1200 to 1800 nanometers). Laser energy in a sub-region of this wavelength interval is qualified as eye safe by the Center for Devices and Radiological Health (CDRH), a branch of the United States Food and Drug Administration (FDA) responsible for the radiation safety performance of non-medical devices which emit electromagnetic radiation. The CDRH eye safety qualification specifically includes laser energy emissions from 1.4 to 1.6 microns (1400 to 1600 nanometers).
Supplemental illumination in the SWIR band is suitable for automotive safety applications and other vehicle subsystems and applications. Illumination in the SWIR band is not human-visible and therefore would not distract the driver of the equipped vehicle and would not interfere with the vision of drivers in oncoming vehicles. The illumination generated from the laser diodes may be compactly integrated into a vehicle's headlight assembly for forward-looking safety applications and integrated into a vehicle's taillight assembly for rear-looking safety applications.
Referring now to
Supplemental illumination in the SWIR band may also be generated from multiple light sources 36 (e.g., multiple laser diodes emitting energy in the wavelength interval from 1.4 to 1.6 microns). The collimated beam from each light source 36 may be diffused into a conic beam with a unique dispersion angle. The conic beams may be overlapped to form a layered broad-form active illumination region. According to an exemplary embodiment, the one or more light sources 36 may project overlapped diffused conic beams with dispersion angles of 60°, 40°, and 20°. Laser energy diffused into a conic beam has an inverse relationship between dispersion angle and downrange illumination distance. A larger dispersion angle reduces downrange illumination distance. For example, a first conic beam 80 with a dispersion angle of 60° provides a downrange illumination distance of 30 meters, a second conic beam 82 with a dispersion angle of 40° provides a downrange illumination distance of 60 meters, and a third conic beam 84 with a dispersion angle of 20° provides a downrange illumination distance of 90 meters. Overlapped diffused conic beams may be projected from a vehicle's headlight position, as shown in the figures.
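The trade between dispersion angle and beam spread can be sketched geometrically as follows; the beam-radius formula is a simple small-divergence approximation assumed for illustration, while the angle and distance pairings are those given above.

```python
import math

def beam_radius_m(dispersion_angle_deg, downrange_m):
    """Radius of a diffused conic beam at a given downrange, assuming the
    beam spreads symmetrically about the optical axis."""
    half_angle = math.radians(dispersion_angle_deg / 2.0)
    return downrange_m * math.tan(half_angle)

# Layered illumination pattern from the description: wider cones light the
# near field, narrower cones reach farther downrange.
for angle_deg, reach_m in ((60, 30), (40, 60), (20, 90)):
    print(f"{angle_deg:2d} deg cone -> ~{beam_radius_m(angle_deg, reach_m):.1f} m "
          f"radius at its {reach_m} m illumination distance")
```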
Referring now to
The above describes an embodiment of the stereo camera system incorporating a left camera and right camera (e.g., left camera 24 and right camera 26). Referring now to
Referring to
Referring to the process of
Referring now to
Downrange = (Baseline × Focal Length) / Disparity    (eq. 1)
The baseline is the real-world physical distance between the central optical axes of two cameras (illustrated by the arrow 132).
The disparity is the image-coordinate distance (in pixels) between corresponding regions in the left camera image and right camera image. The distance is shown as DLR[2] for the disparity corresponding to region 2 in the two images. The regions may be any size, from single pixels to arbitrarily-shaped clusters of pixels. The disparity may be computed using, for example, the CENSUS method or another stereo matching method. Computing the disparity between corresponding pixels (a region of size 1×1) in the left camera image and right camera image results in a stereo range map (or stereo disparity map) with unique downrange measurements for every pixel (commonly referred to as “range pix” or “range pixels”) corresponding to intensity pixels in the left camera image or right camera image. This process gives the highest resolution range map, but is computationally expensive.
The focal length is the calibrated measurement of optical convergence for collimated light rays, or equivalently stated as the distance required to bring parallel light rays to intersect on a single point after passing through the lens. All three cameras of the stereo camera system may use identical lens elements and therefore share the same focal length.
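A minimal numeric sketch of eq. 1 follows; the baseline, the focal length expressed in pixels, and the example disparities are assumed illustrative values, not parameters of the disclosed system.

```python
def downrange_m(disparity_px, baseline_m=0.20, focal_length_px=1000.0):
    """Apply eq. 1: Downrange = (Baseline x Focal Length) / Disparity.
    The focal length is expressed in pixels so that the disparity (also in
    pixels) cancels to give a downrange in meters. Values are assumed."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return (baseline_m * focal_length_px) / disparity_px

# Inverse proportionality of downrange and disparity:
# a disparity of 40 px maps to 5 m, while 4 px maps to 50 m.
assert abs(downrange_m(40) - 5.0) < 1e-9
assert abs(downrange_m(4) - 50.0) < 1e-9
```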
The relationship between downrange, baseline, focal length, and disparity (eq. 1) states the inverse proportionality of downrange and disparity. A large disparity corresponds to a small downrange and a small disparity corresponds to a large downrange. Referring to
The baseline and focal length may be varied, which may result in various advantages or disadvantages for the stereo vision system. A larger baseline may result in better downrange accuracy; however, this may result in a larger blind zone. Referring to
Referring to
Referring to
As stated above, the disparity is computed within the stereo matching method (CENSUS, SAD, or NCC). In order to find corresponding regions in the left camera image and right camera image, the method searches through all possible disparities. In other words, for a particular intensity pixel (a 1×1 region) in the left camera image, the stereo matching method looks for the best matching pixel in the right camera image over all possible disparities (pixel distances from the current pixel). For a particular stereo camera system, a reduction in the maximum number of disparities geometrically reduces the searching required to optimally match pixels between the left camera image and right camera image. This in turn reduces the execution time for stereo matching, allowing for faster frame rates (acquiring and processing more images within a specified time interval) and possibly running the stereo matching method on a less expensive embedded processor.
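As a hedged sketch of this effect, the count below approximates the number of matching-cost evaluations in a dense block-matching search as one candidate per pixel per allowed disparity; the image size and disparity limits are assumptions for illustration, and border handling is ignored.

```python
def matching_cost_evaluations(image_width_px, image_height_px, max_disparities):
    """Approximate number of cost evaluations for a dense left-to-right
    block-matching search: one candidate per pixel per allowed disparity."""
    return image_width_px * image_height_px * max_disparities

full_search = matching_cost_evaluations(1280, 720, 128)
reduced_search = matching_cost_evaluations(1280, 720, 48)
print(f"Reducing the disparity limit from 128 to 48 cuts the search from "
      f"{full_search:,} to {reduced_search:,} cost evaluations per frame.")
```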
Referring again to
Referring to
Another alternative embodiment of the present disclosure may be a hybrid camera system including a pair of cameras operating in the visible band (with minor additional sensitivity in the near infrared up to 1050 nm), and a center camera operating in the SWIR band as described above. According to an exemplary embodiment, the two cameras operating in the visible band are capable of sensing energy in a first wavelength interval (e.g., a wavelength interval from 0.4 to 1.1 microns (400 to 1100 nanometers)). The cameras' focal plane arrays may be constructed from common CMOS technology. According to an exemplary embodiment, the center camera operating in the SWIR band is capable of sensing energy in a second wavelength interval (e.g., a wavelength interval from 0.9 to 1.8 microns), as described above.
The imagery resulting from the use of the SWIR camera of the hybrid camera system may be used to confirm the information obtained from the cameras operating in the visible band. The SWIR camera has different spectral properties than the CMOS cameras. Therefore, the imagery from the SWIR camera may not confirm information as well with environmental colors that reflect well in the infrared, such as red.
However, black clothing (and other black materials) reflects well in the infrared. Thus, the SWIR camera can “see” black clothing much better at night, since common halogen headlights, which are nearly blackbody radiators, emit significant power in the infrared. The black objects detected by the SWIR camera are a significant addition to the objects detected by the CMOS cameras. The use of the SWIR camera allows the image processor to display the black materials more clearly and the object detection system to detect the objects more easily.
SWIR imagery with active SWIR illumination can be fused with the information from the CMOS stereo cameras to improve the CMOS stereo camera performance. The CMOS camera sensors may have about one-sixth of their peak sensitivity at 900 nm, decreasing to near zero above 1050 nm, so illumination within that near-infrared interval still results in increased signal strength. There may be an advantage when the normal headlights are on low beam, since the SWIR illumination is invisible and therefore can illuminate a pattern similar to visible high beams. Therefore, the top portions of pedestrians, close vehicles, and other objects are illuminated for the stereo vision system.
With SWIR illumination, the SWIR camera sensor has larger signals and can be a better confirmation or validation check for the visible CMOS stereo system. According to another exemplary embodiment, a thermal infrared sensor/camera may be used instead of the SWIR sensor. For example, a long-wave infrared sensor that allows detection of self-luminous infrared radiation from objects may be used. This type of thermal infrared sensor detects radiation from objects which radiate in this thermal range because the objects are at temperatures above absolute zero. Living beings typically radiate at a wavelength of around 10 microns. Vehicles and infrastructure radiate at shorter wavelengths as they get hotter. Either an SWIR or a thermal infrared camera may be used in the center location of the stereo vision system.
The stereo sensors and the infrared sensor can work together to enhance the night vision capability of the stereo vision system. As one example, sensor fusion can be used to fuse information extracted from cameras sensing in different spectrum bands. In order to capture the same scene at each time instance, sensors are typically aligned so that their lines of sight are parallel with each other. Sensor calibration is often a necessary step to remove lens distortion in the images and to meet the epipolar constraint for stereo matching. The geometric relations between the infrared sensor and the stereo sensors (relative positions and rotations) can also be precisely calculated during calibration, so that the sensing spaces of the two different sensors can be accurately related mathematically.
Sensor fusion can occur in different ways and at different levels. In one embodiment, sensor fusion may occur at the raw signal level. If the stereo sensors and the infrared sensor have the same spatial resolution (angular degree per pixel) in both the horizontal and vertical directions, then the infrared image can be registered to the left and right stereo images. Registration of the rectified images allows the infrared image to be merged with the left and right stereo images to improve signal to noise ratio in the stereo images. This approach combines the stereo sensors and the infrared sensor at the image level and assumes that objects reflect or irradiate energy in both the visible light spectrum and the infrared spectrum.
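A minimal sketch of such raw-signal-level fusion is shown below, assuming the infrared image has already been registered to a stereo image of the same resolution; the blending weight is an assumed illustrative parameter, not a disclosed value.

```python
import numpy as np

def fuse_registered_images(stereo_img, infrared_img, ir_weight=0.4):
    """Merge a registered infrared image into a visible stereo image to
    improve signal-to-noise ratio in dark regions. Both inputs are assumed
    to be co-registered, same-size, single-channel arrays."""
    stereo = stereo_img.astype(np.float32)
    infrared = infrared_img.astype(np.float32)
    fused = (1.0 - ir_weight) * stereo + ir_weight * infrared
    return np.clip(fused, 0, 255).astype(np.uint8)
```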
In another embodiment, sensor fusion may occur at the range map level. If the stereo sensors and the infrared sensor have the same spatial resolution (angular degree per pixel) in both the horizontal and vertical directions, then the infrared image can be registered to the left stereo image. Assuming the stereo range map is referenced to the left stereo image, the infrared image can then be combined with the range map, filling holes and missing parts in the range map based on infrared image segmentation. This approach also assumes that objects reflect or irradiate energy in both the visible light spectrum and the infrared spectrum.
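A hedged sketch of this range-map-level fusion follows, assuming the infrared image has been registered to the left stereo image and segmented into labeled regions by a separate (unspecified) segmentation step; holes are filled with the median valid range of their segment, and the thresholds are illustrative assumptions.

```python
import numpy as np

def fill_range_map_holes(range_map, ir_segments, min_valid_pixels=20):
    """Fill missing range pixels (NaN) using infrared image segmentation.

    range_map: 2-D array of downrange values with NaN for holes, assumed
    referenced to the left stereo image.
    ir_segments: 2-D integer label image of the registered infrared frame
    (0 = background), assumed produced by a separate segmentation step.
    """
    filled = range_map.copy()
    for segment_id in np.unique(ir_segments):
        if segment_id == 0:
            continue
        mask = ir_segments == segment_id
        valid = mask & np.isfinite(range_map)
        if np.count_nonzero(valid) < min_valid_pixels:
            continue  # not enough stereo support to trust this segment
        segment_range = np.median(range_map[valid])
        holes = mask & ~np.isfinite(range_map)
        filled[holes] = segment_range
    return filled
```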
In another embodiment, sensor fusion may occur at the detection level. The infrared sensor herein may also be replaced by a non-image forming technology, such as LIDAR or radar, or other technology that provides range information. Object detection and segmentation may be conducted separately in stereo range maps and the infrared images, or other ranging technology. Three-dimensional locations of detected objects may also be calculated separately based on available information from each sensor. Depending on the scene to be sensed, sensor fusion may happen in various ways.
For example, if an object is fully or partially detected by the stereo sensor, then the stereo detection can serve as a cue in infrared image based object detection and segmentation, and the downrange of the detected object can be directly obtained from the stereo detection. This is especially helpful when part of the object is missing from the stereo range map (e.g. black pants of a pedestrian at night).
If an object is only detected by the infrared sensor or non-CMOS ranging technology, then the infrared or non-CMOS detection is the output of the fusion process, and the stereo sensor can provide dynamic pitch angle calculation of the three camera sensors based on range information of the flat road surface immediately in front of the host vehicle. The dynamic pitch information enables an accurate downrange calculation of the detected object in the infrared image or non-CMOS data. In this case, the infrared or non-CMOS sensor plays a critical role in detecting dark objects that cannot be seen in the visible light spectrum.
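A hedged sketch consistent with the two cases described here is shown below; the overlap gate and the helper downrange_from_pitch (standing in for the dynamic-pitch, ground-plane-based ranging described above) are hypothetical constructs for illustration, not disclosed algorithms.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(stereo_dets, ir_dets, downrange_from_pitch, iou_gate=0.3):
    """Combine stereo and infrared detections at the detection level.

    downrange_from_pitch: assumed callable that ranges an infrared-only
    detection from the dynamic pitch / ground-plane estimate supplied by
    the stereo sensor (hypothetical helper for illustration).
    """
    fused = []
    for ir_det in ir_dets:
        match = max(stereo_dets, key=lambda s: iou(s["bbox"], ir_det["bbox"]),
                    default=None)
        if match is not None and iou(match["bbox"], ir_det["bbox"]) >= iou_gate:
            # Case 1: the stereo detection cues the infrared detection and
            # supplies the downrange directly.
            fused.append({**ir_det, "downrange_m": match["downrange_m"]})
        else:
            # Case 2: infrared-only detection; range it from the pitch model.
            fused.append({**ir_det,
                          "downrange_m": downrange_from_pitch(ir_det)})
    # Stereo-only detections could be passed through unchanged as well;
    # omitted here for brevity.
    return fused
```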
The construction and arrangement of the systems and methods as shown in the various exemplary embodiments are illustrative only. Although only a few embodiments have been described in detail in this disclosure, many modifications are possible (e.g., variations in sizes, dimensions, structures, shapes and proportions of the various elements, values of parameters, mounting arrangements, use of materials, colors, orientations, etc.). For example, the position of elements may be reversed or otherwise varied and the nature or number of discrete elements or positions may be altered or varied. Accordingly, all such modifications are intended to be included within the scope of the present disclosure. The order or sequence of any process or method steps may be varied or re-sequenced according to alternative embodiments. Other substitutions, modifications, changes, and omissions may be made in the design, operating conditions and arrangement of the exemplary embodiments without departing from the scope of the present disclosure.
The present disclosure contemplates methods, systems, and program products on any machine-readable media for accomplishing various operations. The embodiments of the present disclosure may be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of machine-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer or other machine with a processor. Combinations of the above are also included within the scope of machine-readable media. Machine-executable instructions include, for example, instructions and data, which cause a general purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
Although the figures may show a specific order of method steps, the order of the steps may differ from what is depicted. Also two or more steps may be performed concurrently or with partial concurrence. Such variation will depend on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations could be accomplished with standard programming techniques with rule based logic and other logic to accomplish the various connection steps, processing steps, comparison steps and decision steps.
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 61/976,930, filed Apr. 8, 2014. The foregoing provisional application is incorporated by reference herein in its entirety.