The subject disclosure relates to three-dimensional (3D) alignment of radar and camera sensors.
Vehicles (e.g., automobiles, farm equipment, automated factory equipment, construction equipment) increasingly include sensor systems that facilitate augmented or automated actions. For example, light detection and ranging (lidar) and radio detection and ranging (radar) sensors respectively emit light pulses or radio frequency energy and determine range and angle to a target based on reflected light or energy that is received and processed. A camera (e.g., still, video) facilitates target classification (e.g., pedestrian, truck, tree) using a neural network processor, for example. In autonomous driving, sensors must cover all 360 degrees around the vehicle. More than one type of sensor covering the same area provides functional safety and complementary information through sensor fusion. In this respect, the sensors must be geometrically aligned to provide sensing within a shared field of view (FOV). Yet, different types of sensors (e.g., radar, camera) obtain different types of information in different coordinate spaces. Accordingly, it is desirable to provide 3D alignment of radar and camera sensors.
In one exemplary embodiment, a method of performing a three-dimensional alignment of a radar and a camera with an area of overlapping fields of view includes positioning a corner reflector within the area, and obtaining sensor data for the corner reflector with the radar and the camera. The method also includes iteratively repositioning the corner reflector within the area and repeating the obtaining the sensor data, and determining a rotation matrix and a translation vector to align the radar and the camera such that a three-dimensional detection by the radar projects to a location on a two-dimensional image obtained by the camera according to the rotation matrix and the translation vector.
In addition to one or more of the features described herein, the obtaining sensor data with the camera includes determining a position of a light emitting diode disposed at an apex position of the corner reflector in an image of the corner reflector.
In addition to one or more of the features described herein, the obtaining the sensor data with the radar includes detecting the apex position of the corner reflector as a point target.
In addition to one or more of the features described herein, the method includes mapping a three-dimensional position obtained by operating on a radar detection with the rotation matrix and the translation vector to the location on the two-dimensional image.
In addition to one or more of the features described herein, the method includes defining a cost function as a sum of squared Mahalanobis distances between a location of a center of the corner reflector as determined by the camera and the location of the center of the corner reflector as determined by the radar and projected on the two-dimensional image obtained by the camera for each position of the corner reflector in the area.
In addition to one or more of the features described herein, the determining the rotation matrix and the translation vector includes determining the rotation matrix and the translation vector that minimize the cost function.
In addition to one or more of the features described herein, the determining the rotation matrix includes determining three angle values.
In addition to one or more of the features described herein, the determining the translation vector includes determining three position components.
In addition to one or more of the features described herein, the obtaining the sensor data with the camera includes using a pinhole camera.
In addition to one or more of the features described herein, the obtaining the sensor data with the camera includes using a fisheye camera.
In another exemplary embodiment, a system to align a radar and a camera with an area of overlapping fields of view includes a camera to obtain camera sensor data for a corner reflector positioned at different locations within the area, and a radar to obtain radar sensor data for the corner reflector at the different locations within the area. The system also includes a controller to determine a rotation matrix and a translation vector to align the radar and the camera such that a three-dimensional detection by the radar projects to a location on a two-dimensional image obtained by the camera according to the rotation matrix and the translation vector.
In addition to one or more of the features described herein, the camera determines a position of a light emitting diode disposed at an apex position of the corner reflector in an image of the corner reflector.
In addition to one or more of the features described herein, the radar detects the apex position of the corner reflector as a point target.
In addition to one or more of the features described herein, the controller maps a three-dimensional position obtained by operating on a radar detection with the rotation matrix and the translation vector to the location on the two-dimensional image.
In addition to one or more of the features described herein, the controller defines a cost function as a sum of squared Mahalanobis distances between a location of a center of the corner reflector as determined by the camera and the location of the center of the corner reflector as determined by the radar and projected on the two-dimensional image obtained by the camera for each position of the corner reflector in the area.
In addition to one or more of the features described herein, the controller determines the rotation matrix and the translation vector to minimize the cost function.
In addition to one or more of the features described herein, the controller determines the rotation matrix as three angle values.
In addition to one or more of the features described herein, the controller determines the translation vector as three position components.
In addition to one or more of the features described herein, the camera is a pinhole camera, and the pinhole camera and the radar are in a vehicle.
In addition to one or more of the features described herein, the camera is a fisheye camera, and the fisheye camera and the radar are in a vehicle.
The above features and advantages, and other features and advantages of the disclosure are readily apparent from the following detailed description when taken in connection with the accompanying drawings.
Other features, advantages and details appear, by way of example only, in the following detailed description, the detailed description referring to the drawings in which:
The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
As previously noted, vehicles increasingly include sensor systems such as radar and camera sensors. In an autonomous vehicle or a vehicle with autonomous features (e.g., autonomous parking), coverage of 360 degrees around the vehicle with more than one sensor facilitates obtaining complementary information through sensor fusion. However, sensor fusion (i.e., combining of data obtained by each sensor) requires geometric alignment of the sensors that share a FOV. If sensors are not aligned, detections by one sensor that are transformed to the frame of reference of the other sensor will project at the wrong coordinates. For example, radar detections that are transformed to the camera frame of reference will project at wrong image coordinates. Thus, the distance, in pixels, between the projected and the actual image locations is a measure of the misalignment of the sensors.
Embodiments of the systems and methods detailed herein relate to 3D alignment of radar and camera sensors. Specifically, transformation parameters between the radar and camera are determined for geometric alignment of the two types of sensors. Then, radar detections transformed to the camera frame of reference project onto the target image at the correct image coordinates. In the exemplary embodiment detailed herein, corner reflectors are used to determine the transformation parameters. In a radar system, a corner reflector appears as a strong point-like target with all reflected energy coming from near the apex. By inserting a light emitting diode (LED) in the apex of the corner reflector, image coordinates of the LED in the image obtained by the camera can be aligned with the apex detection by the radar system, as detailed below.
In accordance with an exemplary embodiment, a vehicle 100 includes a radar 110, a camera 120, and a controller 130 that processes data obtained by both sensors.
Three targets 140a, 140b, 140c (generally referred to as 140) are in the FOV of both the radar 110 and the camera 120 (the camera FOV 125 is indicated in the corresponding figure).
As also previously noted, the corner reflector 210 has an LED 220 in the center (i.e., at the apex) such that known image processing techniques performed by the controller 130 on an image obtained by the camera 120 identify the location of the LED 220 within the image. Two exemplary types of cameras 120, a pinhole camera and a fisheye camera, are discussed herein for explanatory purposes, but other types of known cameras 120 (i.e., any calibrated camera 120) may be used in the vehicle 100 and may be aligned according to the processes discussed below.
When the camera 120 is a pinhole camera, the image coordinates [u,v]T of the LED 220 are given by:
u=f{tilde over (X)}+u0 [EQ. 1]
v=f{tilde over (Y)}+v0 [EQ. 2]
In EQS. 1 and 2, f is the focal length of the pinhole camera, and {right arrow over (p)}0=[u0,v0]T is the principal point of the pinhole camera. {tilde over (X)} and {tilde over (Y)} are normalized (or projective) coordinates. Distortions introduced by lenses of the camera 120 may be considered in the model for a more accurate representation of the position of the LED 220, for example. When the camera 120 is a fisheye camera, within the Equidistance model, the image coordinates of the LED 220 are given by:
u=u0+cθ{tilde over (X)}/√({tilde over (X)}2+{tilde over (Y)}2) [EQ. 3]
v=v0+cθ{tilde over (Y)}/√({tilde over (X)}2+{tilde over (Y)}2) [EQ. 4]
where θ=arctan(√({tilde over (X)}2+{tilde over (Y)}2)) is the angle between the incoming ray and the optical axis.
In EQS. 3 and 4, c is a model parameter, and {tilde over (X)} and {tilde over (Y)} are normalized (or projective) coordinates.
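By way of a non-limiting illustration, the two camera mappings may be sketched as follows; the pinhole form follows EQS. 1 and 2, the fisheye form assumes a common Equidistance parametrization, and the function and variable names are illustrative only.

```python
import numpy as np

def project_pinhole(p, f, u0, v0):
    """Pinhole mapping (EQS. 1 and 2): 3D point p = [X, Y, Z] in the camera frame -> image pixel [u, v]."""
    X, Y, Z = p
    x_n, y_n = X / Z, Y / Z                    # normalized (projective) coordinates
    return np.array([f * x_n + u0, f * y_n + v0])

def project_equidistance(p, c, u0, v0):
    """One common form of the fisheye Equidistance mapping: radial image distance = c * incidence angle."""
    X, Y, Z = p
    x_n, y_n = X / Z, Y / Z
    rho = np.hypot(x_n, y_n)                   # radial distance in normalized coordinates
    if rho < 1e-12:                            # a point on the optical axis maps to the principal point
        return np.array([u0, v0])
    theta = np.arctan(rho)                     # angle between the incoming ray and the optical axis
    return np.array([u0 + c * theta * x_n / rho, v0 + c * theta * y_n / rho])
```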
When the radar obtains a detected location qi=[Xi Yi Zi]T for the corner reflector 210, the location qi is first transformed to a location pi in the frame of reference of the camera 120. The transformation is given by:
pi=Rqi+T [EQ. 5]
The projected image location {right arrow over (l)}i (i.e., based on the mapping from the three-dimensional location pi to the two-dimensional image) is given by:
{right arrow over (l)}i={right arrow over (F)}(pi) [EQ. 6]
In EQ. 6, the symbol {right arrow over (F)} stresses the vector nature of the mapping. When the transformation (R, T) is correct, {right arrow over (l)}i coincides with or closely approximates the image location {right arrow over (l)}ic of the LED 220 detected by the camera 120. The processes used to determine the rotation matrix R and the translation vector T are detailed below.
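By way of example only, EQS. 5 and 6 may be sketched as a projection helper that accepts the camera mapping {right arrow over (F)} as a function argument; all names are illustrative assumptions.

```python
import numpy as np

def project_radar_detection(q, R, T, F):
    """EQS. 5 and 6: transform a radar detection q (radar frame) to the camera frame and project it."""
    p = R @ q + T        # EQ. 5: rotation matrix R and translation vector T
    return F(p)          # EQ. 6: camera mapping F gives the 2D image location

# Illustrative use with the pinhole sketch above (assumed names):
#   l_projected = project_radar_detection(q_i, R, T, lambda p: project_pinhole(p, f, u0, v0))
#   misalignment_px = np.linalg.norm(l_projected - l_detected_led)
```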
The initial estimate ({circumflex over (R)},{circumflex over (T)}) of the rotation matrix R and the translation vector T may be obtained using a perspective-n-point (PnP) approach. PnP refers to the problem of estimating the pose of a camera 120 given a set of n 3D points in the world and their corresponding 2D projections in the image obtained by the camera. In the present case, the n 3D points qi=[Xi Yi Zi]T, where i is the index from 1 to n and T indicates a transpose for a column vector, are detections by the radar 110 in the radar-centered frame of reference. In the camera-centered frame of reference, the corresponding points have the coordinates pi=[X′i Y′i Z′i]T according to EQS. 5 and 6 such that:
{right arrow over (l)}i={right arrow over (F)}(pi)={right arrow over (F)}(Rqi+T) [EQ. 7]
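As one possible sketch of this initialization, a PnP solver such as OpenCV's solvePnP may be used for a perspective camera 120; the intrinsic matrix K and the function names are assumptions for illustration.

```python
import numpy as np
import cv2  # OpenCV, used here as one possible PnP solver

def initial_extrinsics_pnp(radar_points, led_pixels, K, dist_coeffs=None):
    """Initial estimate of (R, T) from n radar detections (3D, radar frame) and the
    corresponding LED image detections (2D), via the perspective-n-point problem."""
    object_pts = np.asarray(radar_points, dtype=np.float64).reshape(-1, 3)
    image_pts = np.asarray(led_pixels, dtype=np.float64).reshape(-1, 2)
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP did not return a solution")
    R_hat, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 rotation matrix
    return R_hat, tvec.reshape(3)
```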
At block 330, determining the rotation matrix R and the translation vector T involves determining the transformation (R, T) that minimizes the total camera-radar projection error. The cost function Φ is defined to facilitate the minimization.
Specifically, the cost function Φ is defined as the sum of squared Mahalanobis distances between the detected LED centers {right arrow over (l)}ic=[uic,vic]T and the location of the apex of the corner reflector 210 at each different position, as detected by the radar 110 and projected onto the camera plane:
Φ(R,T)=Σi[Δ{right arrow over (l)}i(R,T)]TΣi−1[Δ{right arrow over (l)}i(R,T)] [EQ. 8]
In EQ. 8, Σi indicates the covariance matrix, which characterizes the spatial errors, and the summation is over the different positions i of the corner reflector 210. Using EQS. 5 and 6, the residual Δ{right arrow over (l)}i is given by:
Δ{right arrow over (l)}i(R,T)={right arrow over (l)}ic−{right arrow over (F)}(pi)={right arrow over (l)}ic−{right arrow over (F)}(Rqi+T) [EQ. 9]
As EQ. 9 indicates, each covariance matrix Σi is composed of two parts: one relating to the camera 120 (c), the covariance of the detection of the LED 220 in the image, and the other relating to the radar 110 (r), the covariance of the radar detection projected onto the image plane:
Σi=Σi(c)+Σi(r) [EQ. 10]
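By way of example only, the cost function of EQS. 8-10 may be evaluated as follows, assuming the per-position 2x2 covariances are available; the names are illustrative.

```python
import numpy as np

def alignment_cost(R, T, radar_points, led_pixels, cam_covs, radar_covs, F):
    """EQS. 8-10: sum of squared Mahalanobis distances between detected LED centers and
    radar detections projected onto the image. cam_covs[i] and radar_covs[i] are the
    2x2 covariances Sigma_i(c) and Sigma_i(r) for the i-th corner-reflector position."""
    phi = 0.0
    for q, l_c, S_c, S_r in zip(radar_points, led_pixels, cam_covs, radar_covs):
        delta_l = np.asarray(l_c) - F(R @ np.asarray(q) + T)      # EQ. 9
        sigma = S_c + S_r                                         # EQ. 10
        phi += float(delta_l @ np.linalg.solve(sigma, delta_l))   # squared Mahalanobis distance
    return phi
```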
To calculate Σi(r), an analysis is done of the way the three-dimensional error of the radar detection manifests itself as the two-dimensional covariance in EQ. 10. With p=[X, Y, Z]T being a three-dimensional point in the field of view of the camera 120 and, according to EQ. 6, with {right arrow over (l)}={right arrow over (F)}(p) being the projection of p on the image, a small change in p will result in:
δ{right arrow over (l)}=(∂{right arrow over (F)}/∂p)δp [EQ. 11]
In component notation, EQ. 11 may be written as:
δlμ=Σk(∂Fμ/∂pk)δpk [EQ. 12]
With p1=X, p2=Y, p3=Z, l1=u, and l2=v, the projected covariance is given by:
Σμυ(r)=Σj,k(∂Fμ/∂pj)Γjk(∂Fυ/∂pk) [EQ. 13]
In EQ. 13, Γ is the covariance matrix describing the three-dimensional error of the radar detection, j, k=1, 2, 3, and μ, υ=1, 2.
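As a non-limiting sketch of EQS. 11-13, the radar covariance Γ may be propagated through a numerical (central-difference) Jacobian of the camera mapping; the step size and names are illustrative choices.

```python
import numpy as np

def projected_radar_covariance(p, Gamma, F, eps=1e-4):
    """EQS. 11-13 (first order): propagate the 3x3 radar covariance Gamma of the camera-frame
    point p into the 2x2 image-plane covariance Sigma(r), using a numerical Jacobian of F."""
    p = np.asarray(p, dtype=float)
    J = np.zeros((2, 3))
    for k in range(3):
        dp = np.zeros(3)
        dp[k] = eps
        J[:, k] = (F(p + dp) - F(p - dp)) / (2.0 * eps)   # central-difference partial derivatives
    return J @ Gamma @ J.T
```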
Determining the R and T that minimize the cost function Φ, according to EQ. 8, involves solving for six parameters in total. This is because the rotation matrix R is parameterized by three angles (ψ, θ, ϕ) and the translation vector T is parameterized by three components Tx, Ty, Tz. EQ. 8 is re-written by performing a Cholesky decomposition on Σi−1 as:
Σi−1=LiLiT [EQ. 14]
In EQ. 14, Li denotes a lower triangular matrix with real and positive diagonal entries. Then the cost function Φ may be re-written in a form suitable for nonlinear least squares optimization:
Φ(ψ,θ,ϕ,T)=Σi∥LiTΔ{right arrow over (l)}i(R,T)∥2 [EQ. 15]
From EQ. 15, the parameters associated with R and T may be estimated such that:
{circumflex over (ψ)},{circumflex over (θ)},{circumflex over (ϕ)},{circumflex over (T)}x,{circumflex over (T)}y,{circumflex over (T)}z=arg min[Φ(ψ,θ,ϕ,T)] [EQ. 16]
Optimization to determine the parameters of R and T may be performed using known tools and a standard numerical routine. Initial estimates may be obtained from geometric measurements of computer-aided design (CAD) drawings of sensor (radar 110 and camera 120) installation. As previously noted, a PnP estimation may be used for a perspective camera 120.
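For example, and without limitation, the whitened form of EQS. 14-16 may be minimized with a standard routine such as scipy.optimize.least_squares; the Euler-angle convention ("zyx") and the function names below are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_alignment(radar_points, led_pixels, covariances, F, x0):
    """Minimize the whitened cost of EQS. 14-16 over x = [psi, theta, phi, Tx, Ty, Tz].
    covariances[i] is Sigma_i of EQ. 10; x0 is the initial estimate (e.g., from PnP or CAD)."""
    # Whitening factors L_i with Sigma_i^-1 = L_i L_i^T (EQ. 14).
    Ls = [np.linalg.cholesky(np.linalg.inv(S)) for S in covariances]

    def residuals(x):
        R = Rotation.from_euler("zyx", x[:3]).as_matrix()   # one possible Euler convention
        T = x[3:]
        res = [L.T @ (np.asarray(l_c) - F(R @ np.asarray(q) + T))   # L_i^T * Delta_l_i (EQ. 15)
               for q, l_c, L in zip(radar_points, led_pixels, Ls)]
        return np.concatenate(res)

    sol = least_squares(residuals, np.asarray(x0, dtype=float))
    R_opt = Rotation.from_euler("zyx", sol.x[:3]).as_matrix()
    return R_opt, sol.x[3:]                                  # EQ. 16: arg min of the cost
```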
While the above disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from its scope. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the present disclosure not be limited to the particular embodiments disclosed, but will include all embodiments falling within the scope thereof.