The subject disclosure relates to the fusion of radar and vision sensor systems.
Vehicles (e.g., automobiles, trucks, construction equipment, farm equipment, automated factory equipment) are increasingly outfitted with sensor systems that facilitate enhanced or automated vehicle operation. For example, when a sensor system detects an object directly ahead of the vehicle, a warning may be provided to the driver or automated braking or other collision avoidance maneuvers may be implemented. The information obtained by the sensor systems must facilitate the detection and identification of objects surrounding the vehicle. One type of sensor system, a light detection and ranging (lidar) system, provides a dense point cloud (i.e., a dense set of reflections) that can be helpful in identifying a potential region of interest for further investigation. However, lidar systems have weather-related and other limitations. Accordingly, it is desirable to provide fusion of radar and vision sensor systems.
In one exemplary embodiment, a method of fusing a radar system and a vision sensor system includes obtaining radar reflections resulting from transmissions of radio frequency (RF) energy. The method also includes obtaining image frames from one or more vision sensor systems, and generating region of interest (ROI) proposals based on the radar reflections and the image frames. Information is provided about objects detected based on the ROI proposals.
In addition to one or more of the features described herein, a radar map is obtained from the radar reflections. The radar map indicates an intensity of processed reflections at respective range values.
In addition to one or more of the features described herein, a visual feature map is obtained from the image frames. Obtaining the visual feature map includes processing the image frames using a neural network.
In addition to one or more of the features described herein, generating the ROI proposals includes finding an overlap between features of the visual feature map and points in the radar map.
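By way of non-limiting illustration only, the following sketch shows one way such an overlap could be computed when the radar map has been projected onto the same pixel grid as the visual feature map; the array shapes, the threshold values, and the helper name roi_proposals_from_overlap are assumptions made for illustration and are not details of the disclosure.

    import numpy as np
    from scipy import ndimage  # connected-component labeling for grouping overlap pixels

    def roi_proposals_from_overlap(feature_map, radar_map,
                                   feat_thresh=0.5, radar_thresh=0.1):
        """Illustrative sketch: propose ROIs where strong visual features
        and projected radar returns overlap on the same image grid.

        feature_map : (H, W, C) array of visual feature activations
        radar_map   : (H, W) array of projected radar intensities
        Returns a list of (row_min, col_min, row_max, col_max) boxes.
        """
        # Collapse the feature channels to a single saliency score per pixel.
        feat_saliency = feature_map.max(axis=-1)
        # A pixel is of interest only where both sensors agree.
        overlap = (feat_saliency > feat_thresh) & (radar_map > radar_thresh)
        # Group overlapping pixels into connected regions and box each one.
        labels, num_regions = ndimage.label(overlap)
        proposals = []
        for region_slice in ndimage.find_objects(labels):
            rows, cols = region_slice
            proposals.append((rows.start, cols.start, rows.stop, cols.stop))
        return proposals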
In addition to one or more of the features described herein, obtaining the radar map includes projecting three-dimensional clusters onto an image plane.
In addition to one or more of the features described herein, obtaining the three-dimensional clusters is based on performing a fast Fourier transform of the radar reflections.
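The following sketch illustrates, under assumed simplifications, a range-dimension fast Fourier transform followed by projection of three-dimensional points onto an image plane with a pinhole camera model; the layout of the radar samples, the thresholding used as a stand-in for cluster extraction, and the intrinsic matrix K are assumptions made for illustration only.

    import numpy as np

    def range_fft_detections(radar_cube, threshold_db=-20.0):
        # Illustrative only: FFT over the fast-time (range) axis, then keep
        # cells within threshold_db of the peak as crude detections. Grouping
        # detections into clusters and converting range/angle cells into
        # Cartesian (x, y, z) points is omitted here for brevity.
        spectrum = np.fft.fft(radar_cube, axis=-1)
        magnitude_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
        return np.argwhere(magnitude_db > magnitude_db.max() + threshold_db)

    def project_to_image_plane(points_xyz, K):
        # Project 3-D points given in the camera frame onto the image plane
        # using an assumed 3x3 pinhole intrinsic matrix K.
        points = np.asarray(points_xyz, dtype=float)
        homogeneous = (K @ points.T).T
        return homogeneous[:, :2] / homogeneous[:, 2:3]  # divide by depth to get (u, v)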
In addition to one or more of the features described herein, obtaining the visual feature map includes performing a convolutional process.
In addition to one or more of the features described herein, performing the convolutional process includes performing a series of convolutions of the image frames with a kernel matrix.
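A toy version of such a series of convolutions is sketched below; the 3-by-3 kernel matrix, the number of repetitions, and the rectifying nonlinearity between convolutions are assumptions made for illustration, since a trained convolutional neural network learns its own kernels.

    import numpy as np
    from scipy.signal import convolve2d

    # Assumed 3x3 edge-style kernel; a trained network learns its kernel values.
    KERNEL = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=float)

    def convolve_frame(gray_frame, kernel=KERNEL, num_layers=3):
        """Illustrative series of convolutions of one grayscale image frame
        with a kernel matrix, standing in for stacked convolutional layers."""
        feature = gray_frame.astype(float)
        for _ in range(num_layers):
            feature = convolve2d(feature, kernel, mode='same', boundary='symm')
            feature = np.maximum(feature, 0.0)  # ReLU-style nonlinearity
        return feature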
In addition to one or more of the features described herein, providing the information includes providing a display to a driver of a vehicle that includes the radar system and the vision sensor system.
In addition to one or more of the features described herein, providing the information is to a vehicle system of a vehicle that includes the radar system and the vision sensor system, the vehicle system including a collision avoidance system, an adaptive cruise control system, or an autonomous driving system.
In another exemplary embodiment, a fusion system includes a radar system to obtain radar reflections resulting from transmissions of radio frequency (RF) energy. The system also includes one or more vision sensor systems to obtain image frames, and a controller to generate region of interest (ROI) proposals based on the radar reflections and the image frames, and provide information about objects detected based on the ROI proposals.
In addition to one or more of the features described herein, the controller obtains a radar map from the radar reflections, the radar map indicating an intensity of processed reflections at respective range values.
In addition to one or more of the features described herein, the controller obtains a visual feature map based on processing the image frames using a neural network.
In addition to one or more of the features described herein, the controller generates the ROI proposals based on finding an overlap between features of the visual feature map and points in the radar map.
In addition to one or more of the features described herein, the controller obtains the radar map based on projecting three-dimensional clusters onto an image plane.
In addition to one or more of the features described herein, the controller obtains the three-dimensional clusters based on performing a fast Fourier transform of the radar reflections.
In addition to one or more of the features described herein, the controller obtains the visual feature map based on performing a convolutional process.
In addition to one or more of the features described herein, the controller performs the convolutional process based on performing a series of convolutions of the image frames with a kernel matrix.
In addition to one or more of the features described herein, the controller provides the information as a display to a driver of a vehicle that includes the radar system and the vision sensor system.
In addition to one or more of the features described herein, the controller provides the information to a vehicle system of a vehicle that includes the radar system and the vision sensor system, the vehicle system including a collision avoidance system, an adaptive cruise control system, or an autonomous driving system.
The above features and advantages, and other features and advantages of the disclosure are readily apparent from the following detailed description when taken in connection with the accompanying drawings.
Other features, advantages, and details appear, by way of example only, in the following detailed description, the detailed description referring to the drawings.
The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
As previously noted, vehicle systems that provide warnings or take automated actions require information from sensor systems that identify regions of interest (ROI) for investigation. A lidar system transmits pulsed laser beams and determines the range to detected objects based on the reflected signals. The lidar system obtains a denser set of reflections, referred to as a point cloud, than a radar system does. However, in addition to a relatively higher cost compared with radar systems, lidar systems require dry weather and do not provide Doppler information as radar systems do. Radar systems generally operate by transmitting radio frequency (RF) energy and receiving reflections of that energy from targets in the radar field of view. When a target is moving relative to the radar system, the frequency of the received reflections is shifted from the frequency of the transmissions. This shift corresponds with the Doppler frequency and can be used to determine the relative velocity of the target. That is, the Doppler information facilitates a determination of the velocity of a detected object relative to the platform (e.g., vehicle) of the radar system.
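For a monostatic radar, the relationship between the Doppler shift and the relative (radial) velocity is f_d = 2 * v * f_c / c, where f_c is the carrier frequency and c is the speed of light. The short computation below illustrates this relationship with an assumed 77 GHz carrier, a frequency common in automotive radar but not specified by the disclosure.

    SPEED_OF_LIGHT = 3.0e8          # m/s
    CARRIER_FREQUENCY = 77.0e9      # Hz, assumed automotive radar band

    def relative_velocity(doppler_shift_hz):
        """Radial velocity of a target from its measured Doppler shift:
        f_d = 2 * v_r * f_c / c, so v_r = f_d * c / (2 * f_c)."""
        return doppler_shift_hz * SPEED_OF_LIGHT / (2.0 * CARRIER_FREQUENCY)

    # Example: a +5.13 kHz Doppler shift at 77 GHz corresponds to roughly
    # 10 m/s of closing velocity (about 36 km/h).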
Embodiments of the systems and methods detailed herein relate to using a radar system to identify ROI. A fusion of radar and vision sensor systems is used to achieve the performance improvement that a lidar system provides over the radar system alone, while offering benefits over the lidar system in terms of better performance in wet weather and the additional availability of Doppler measurements. Specifically, a convolutional neural network is used to perform feature map extraction on frames obtained by a video or still camera, and this feature map is fused with a radar map obtained using a radar system. The fusion according to the one or more embodiments is more successful the higher the angular resolution of the radar system. Thus, the exemplary radar system discussed for explanatory purposes is an ultra-short-range radar (USRR) system. Cameras are discussed as exemplary vision sensor systems.
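As a non-limiting illustration of how these stages could be wired together in software, the sketch below threads the radar and camera data through placeholder callables for each stage; the function name and signature are assumptions made for illustration and do not describe any particular implementation of the embodiments.

    def fuse_radar_and_vision(radar_reflections, image_frame,
                              radar_to_map, frames_to_features,
                              propose_rois, classify_rois):
        """Illustrative data flow only: each stage is passed in as a callable
        so the sketch stays independent of any particular implementation."""
        radar_map = radar_to_map(radar_reflections)      # e.g., FFT and projection onto the image plane
        feature_map = frames_to_features(image_frame)    # e.g., convolutional feature extraction
        rois = propose_rois(feature_map, radar_map)      # overlap-based ROI proposals
        return classify_rois(feature_map, rois)          # detected and classified objects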
In accordance with an exemplary embodiment, the controller 110 includes processing circuitry to implement a deep learning convolutional neural network (CNN). The processing circuitry may include an application specific integrated circuit (ASIC), an electronic circuit, a processor 115 (shared, dedicated, or group) and memory 120 that executes one or more software or firmware programs.
At block 240, obtaining image frames 207 includes obtaining images from each of the cameras 150. An image frame 207 corresponds with the exemplary three-dimensional clusters 225.
At block 260, generating one or more region of interest (ROI) proposals includes using the radar map 235 resulting from the radar reflections 205 and the visual feature map 255 resulting from the image frames 207 as inputs. Specifically, objects that are indicated in the radar map 235 and visual features that are identified in the visual feature map 255 are compared to determine an overlap, which defines the ROI. The visual feature map 255 and the ROI proposals (generated at block 260) are used for region proposal (RP) pooling, at block 270. RP pooling, at block 270, refers to normalizing the ROI proposals (generated at block 260) to the same size. That is, the ROI proposals may be of different sizes (e.g., 32-by-32 pixels, 256-by-256 pixels) and may be normalized to the same size (e.g., 7-by-7 pixels) at block 270. The pixels in the visual feature map 255 that correspond with the ROI proposals are extracted and normalized to generate a normalized feature map 275.
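One simple way to normalize variably sized ROI proposals to a common size (e.g., 7-by-7), in the spirit of the RP pooling at block 270, is max pooling over an evenly divided grid of bins. The sketch below assumes each ROI is given as pixel bounds within the visual feature map 255; its implementation details are illustrative only.

    import numpy as np

    def rp_pool(feature_map, roi, out_size=7):
        """Normalize one ROI of an (H, W, C) feature map to out_size x out_size x C
        by max-pooling over a uniform grid of bins (an ROI-pooling-style sketch)."""
        r0, c0, r1, c1 = roi
        region = feature_map[r0:r1, c0:c1, :]
        rows = np.linspace(0, region.shape[0], out_size + 1).astype(int)
        cols = np.linspace(0, region.shape[1], out_size + 1).astype(int)
        pooled = np.zeros((out_size, out_size, feature_map.shape[-1]))
        for i in range(out_size):
            for j in range(out_size):
                # Guard against empty bins when the ROI is smaller than the grid.
                cell = region[rows[i]:max(rows[i + 1], rows[i] + 1),
                              cols[j]:max(cols[j + 1], cols[j] + 1), :]
                pooled[i, j] = cell.max(axis=(0, 1))
        return pooled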
Providing output, at block 290, can include multiple embodiments. According to an embodiment, the output may be a display 410 that is provided to the driver and that overlays an indication of the classified objects on a camera display. The display may include an image with boxes indicating the outlines of classified objects. Color or other coding may indicate the classification. The boxes are placed with a center location u, v in pixel coordinates and a size (width W and height H) in pixel units. Alternately or additionally, the output includes information that may be provided to one or more vehicle systems 140. The information may include the location and classification of each classified object in three-dimensional space from the vehicle perspective. The information may include, for each object, the detection probability, the object geometry, the velocity (i.e., heading angle and speed), which is determined based on Doppler information obtained by the radar system 130 or on frame-by-frame movement determined based on the cameras 150, and the position (e.g., in the x, y coordinate system).
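As an illustration of the display form described above, the snippet below draws one box from its pixel-coordinate center (u, v) and its pixel-unit width W and height H using the OpenCV drawing functions; the per-class color coding is an assumption made for illustration.

    import cv2  # OpenCV, assumed available for drawing the overlay

    # Assumed color coding per classification (BGR); purely illustrative.
    CLASS_COLORS = {"vehicle": (0, 255, 0), "pedestrian": (0, 0, 255)}

    def draw_detection(image, u, v, width, height, label):
        """Overlay one classified object as a box centered at pixel (u, v)
        with the given pixel width and height, color-coded by class."""
        color = CLASS_COLORS.get(label, (255, 255, 255))
        top_left = (int(u - width / 2), int(v - height / 2))
        bottom_right = (int(u + width / 2), int(v + height / 2))
        cv2.rectangle(image, top_left, bottom_right, color, thickness=2)
        cv2.putText(image, label, (top_left[0], top_left[1] - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
        return image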
While the above disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from its scope. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the present disclosure not be limited to the particular embodiments disclosed, but will include all embodiments falling within the scope thereof.