This document describes techniques to perform multi-sensor calibration for sensors located in a vehicle with pre-defined and/or endurable markers.
A vehicle may include cameras attached to the vehicle for several purposes. For example, cameras may be attached to a roof of the vehicle for security purposes, for driving aid, or for facilitating autonomous driving. Cameras mounted on a vehicle can obtain images of one or more areas surrounding the vehicle. These images can be processed to obtain information about the road or about the objects surrounding the vehicle. For example, images obtained by a camera can be analyzed to determine distances of objects surrounding the autonomous vehicle so that the autonomous vehicle can be safely maneuvered around the objects.
This patent document describes example multi-sensor sequential calibration techniques to calibrate multiple sensors located on or in a vehicle.
An example method of calibrating one or more sensors for an autonomous vehicle comprises: receiving, from a first camera located on a vehicle, a first image comprising at least a portion of a road comprising lane markers, where the first image is obtained by the first camera at a first time; obtaining a calculated value of a position of an inertial measurement (IM) device at the first time; obtaining an optimized first extrinsic matrix of the first camera by adjusting a function of a first actual pixel location of a location of a lane marker in the first image and an expected pixel location of that location of the lane marker, where the expected pixel location is based at least on a previously known first extrinsic matrix of the first camera and the position of the IM device; and performing autonomous operation of the vehicle using the optimized first extrinsic matrix of the first camera when the vehicle is operated on another road or at another time.
In another example aspect, the above-described methods are embodied in the form of processor-executable code and stored in a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium includes code that, when executed by a processor, causes the processor to implement the methods described in the embodiments.
In yet another example embodiment, a device that is configured or operable to perform the above-described methods is disclosed.
The above and other aspects and their implementations are described in greater detail in the drawings, the descriptions, and the claims.
An autonomous vehicle may include sensors such as cameras, radars, and Light Detection and Ranging (LiDAR) sensors mounted on the autonomous vehicle to obtain sensor data items (e.g., an image from a camera, a point cloud map from a LiDAR sensor, or radar points from a radar) of one or more areas surrounding the autonomous vehicle. These sensor data items can be analyzed by a computer, which may be on-board the autonomous vehicle, to obtain information about the road (e.g., location of lane markers on the road) or about the objects surrounding the autonomous vehicle (e.g., distance to objects located around the autonomous vehicle). However, the sensors on the autonomous vehicle need to be calibrated so that a computer on-board the autonomous vehicle can precisely or accurately detect, for example, locations of the lane markers on the road or a location of an object in front of the autonomous vehicle.
The vehicle 104 includes a global positioning system (GPS) device 108 that provides information about a location of the vehicle 104. Since the GPS device 108 and the IMU sensor 106 are installed or located in the vehicle 104, the location of the IMU sensor 106 relative to the GPS device 108 is previously known. And, since the GPS device 108 and the multiple sensors (e.g., cameras 102a-102g) are installed or located in the vehicle 104, the location of each of the multiple sensors relative to the GPS device 108 is also approximately known in advance. Thus, location information provided by the GPS device 108 can be used to obtain the approximate locations of the IMU sensor 106 and/or the multiple sensors on the vehicle 104. The approximate locations of the IMU sensor 106 and/or the multiple sensors (e.g., cameras 102a-102g) are provided as initial values that can be fine-tuned with the example multi-sensor sequential calibration process described herein. The GPS device 108 may provide measurements (e.g., location) at a pre-determined frequency (e.g., 5 Hz or 10 Hz). In some embodiments, the IMU sensor 106 may provide measurements at a frequency that is higher than that of the measurements provided by the GPS device 108.
The example multi-sensor sequential calibration process described herein can calibrate multiple sensors at least based on: (1) an association determined between sequential sensor data of a road obtained from the multiple sensors (e.g., at least two cameras 102a, 102d located on the vehicle 104) and previously stored map information of the road; (2) measurements of the IMU sensor 106; and (3) measurements of the GPS device 108. The use of the IMU sensor 106 can make the example multi-sensor sequential calibration process an asynchronous process at least because the sequential sensor data (e.g., images) obtained by the multiple sensors (e.g., cameras 102a-102g) may not be synchronized with the measurement data provided by the IMU sensor 106.
The road 208 includes a calibration road segment 212 that has lane markers that can be affixed on either side of the road 208. The lane markers include a first set of lane blocks 210a located on a first side of the road 208, and a second set of lane blocks 210b located on a second side of the road 208 opposite to the first side. In each set of lane blocks, one lane block can be separated from another lane block by a pre-determined distance to form a set of broken lines on the road. A lane block 210a, 210b can have a rectangular shape and can have a white color. As shown in
The road 208 can be straight (as shown in
A computer, which may be located in the vehicle 202, may store a map database (shown as 415 in
The calibration module can provide an optimized estimate of a camera-IMU sensor transform, $T_c$, which describes the extrinsic parameters between the camera and the IMU sensor, by using Equation (1) and measurements obtained by the GPS device. The calibration module obtains a location of the vehicle at time t from the GPS device located on the vehicle. The calibration module can determine a position (also known as pose) of the multiple sensors (e.g., cameras) located on the vehicle and of the IMU sensor located on the vehicle based on the location of the vehicle provided by the GPS device and the previously known distances from the GPS device to the IMU sensor and to the multiple sensors. For example, if the vehicle includes at least one camera and an IMU sensor, the calibration module can obtain both the pose of the camera, $T^c_t$, and the pose of the IMU sensor, $X_t$, at time t. The camera-IMU sensor transform $T_c$ can be determined by the calibration module by using Equation (1):
$$T_c = X_t \ominus T^c_t \qquad \text{Equation (1)}$$
In this patent document, the symbols ⊕ and ⊖ denote the composition and the inverse composition operations of a Lie group. In Equation (1), the Lie group is SE(3), and the compose operator is the composition of 6D transforms. A 6D transform includes the (roll, pitch, yaw) orientation of the sensor (e.g., the IMU sensor or a camera), denoted R, and the (x, y, z) position of the sensor, denoted T, in the three-dimensional coordinate space $\mathbb{R}^3$. The composition $C = A \oplus B$ of two transforms $A = (R_a, T_a)$ and $B = (R_b, T_b)$ in SE(3) is $C = (R_a R_b,\; R_a T_b + T_a)$. The inverse composition reverses a composition. Equation (1) relates only the IMU sensor pose $X_t$ and the camera pose $T^c_t$. Although $X_t$ and $T^c_t$ change over time, the camera-IMU sensor transform $T_c$ between them can remain constant. The camera-IMU sensor transform $T_c$ is an element of SE(3), which is a Lie group.
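The composition rules above can be sketched in a few lines of Python. This is a minimal illustration, not the calibration module's implementation: transforms are assumed to be (R, T) pairs of nested lists, and the helper names (`compose`, `inverse`, `ominus`) are hypothetical. The text only says that ⊖ reverses a composition; the sketch assumes A ⊖ B = A⁻¹ ⊕ B, under which Equation (1) returns the camera pose expressed in the IMU frame.

```python
import math

# Hypothetical representation: a transform is a pair (R, T), where R is a 3x3
# rotation matrix (list of rows) and T is a 3-element translation list.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def compose(a, b):
    """C = A ⊕ B: rotation R_a R_b and translation R_a T_b + T_a."""
    (ra, ta), (rb, tb) = a, b
    return (mat_mul(ra, rb), [x + y for x, y in zip(mat_vec(ra, tb), ta)])

def inverse(a):
    """A^-1 = (R_a^T, -R_a^T T_a), so that A ⊕ A^-1 is the identity."""
    ra, ta = a
    rt = transpose(ra)
    return (rt, [-x for x in mat_vec(rt, ta)])

def ominus(a, b):
    """Assumed inverse composition A ⊖ B = A^-1 ⊕ B, so A ⊖ (A ⊕ C) = C."""
    return compose(inverse(a), b)

def rot_z(yaw):
    """Rotation about the z axis by yaw radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def close(a, b, tol=1e-9):
    """True if two transforms agree element-wise within tol."""
    return (all(abs(a[0][i][j] - b[0][i][j]) < tol for i in range(3) for j in range(3))
            and all(abs(x - y) < tol for x, y in zip(a[1], b[1])))

# Equation (1) under this convention: with IMU pose X_t and camera pose
# T_t^c = X_t ⊕ T_c, the camera-IMU transform is recovered as X_t ⊖ T_t^c.
X_t = (rot_z(0.3), [10.0, 5.0, 0.0])        # example IMU pose (made-up values)
T_c_true = (rot_z(-0.05), [1.2, 0.4, 1.6])  # example extrinsic (made-up values)
T_t_cam = compose(X_t, T_c_true)
T_c_est = ominus(X_t, T_t_cam)
print(close(T_c_est, T_c_true))  # True
```

Because X_t and T_t^c both move with the vehicle while their relative transform does not, repeating this at several times t should give (noisy) estimates of the same T_c.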
The variable $X_t$ denotes the IMU sensor pose at time t and is an element of SE(3), which is a Lie group. The calibration module can determine the IMU sensor pose $X_t$ based on the location information provided by the GPS device at time t and the previously known location of the IMU sensor relative to the GPS device. The variable $V_t \in \mathbb{R}^3$ denotes the velocity of the vehicle at time t. The variable $B_t \in \mathbb{R}^6$ denotes the IMU accelerometer and gyroscope biases at time t. The variable $\omega_t \in \mathbb{R}^3$ denotes the angular velocity of the vehicle at time t. The variable $T_c \in SE(3)$ is the camera-IMU sensor transform, i.e., the extrinsic parameters of a camera. The variable $K_c$ is the intrinsic matrix of the camera, which includes parameters that describe the optical characteristics of the camera (e.g., focal length, distortion, optical center). The variables $T_c$ and $K_c$ can be determined separately for each of the multiple cameras located on the vehicle.
As indicated by the legend in
In some embodiments, the IMU acceleration measurement, at, relates to an estimate of the true acceleration, ât, according to Equation (2).
$$a_t = R_t(\hat{a}_t - g) + b^{(a)}_t + H^{(a)}_t \qquad \text{Equation (2)}$$
where $R_t$ describes the rotation of the device at time t, $g$ is the 3D vector for gravity, $b^{(a)}_t$ is the bias of the acceleration, and $H^{(a)}_t$ is white noise. The IMU factor constrains the device rotation and the bias to be consistent with the measured acceleration.
In some embodiments, the IMU angular velocity measurement, $w_t$, relates to an estimate of the true angular velocity, $\hat{w}_t$, according to Equation (3).
$$w_t = \hat{w}_t + b^{(g)}_t + H^{(g)}_t \qquad \text{Equation (3)}$$
where $b^{(g)}_t$ and $H^{(g)}_t$ are the bias and the white noise of the gyroscope. The IMU factor constrains the gyroscope bias to match the measurement.
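A minimal sketch of how Equations (2) and (3) can be turned into residuals, assuming gravity g = (0, 0, -9.81) m/s² in the world frame and treating the noise terms as whatever the model leaves unexplained. R_t is applied exactly as written in Equation (2); if an implementation stores R_t as a device-to-world rotation, its transpose would be needed instead. The function names are hypothetical.

```python
# Hypothetical helpers illustrating Equations (2) and (3); the white-noise
# terms H^(a)_t and H^(g)_t are what remain after subtracting the model.
GRAVITY = [0.0, 0.0, -9.81]  # assumed world-frame gravity vector g (m/s^2)

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def accel_residual(a_meas, R_t, a_true, bias_a, g=GRAVITY):
    """a_t - (R_t (â_t - g) + b^(a)_t): the part of the measurement the model leaves unexplained."""
    specific_force = [a - gi for a, gi in zip(a_true, g)]
    predicted = [f + b for f, b in zip(mat_vec(R_t, specific_force), bias_a)]
    return [m - p for m, p in zip(a_meas, predicted)]

def gyro_residual(w_meas, w_true, bias_g):
    """w_t - (ŵ_t + b^(g)_t)."""
    return [m - (w + b) for m, w, b in zip(w_meas, w_true, bias_g)]

def squared_norm(v):
    return sum(x * x for x in v)

# A stationary, level IMU (R_t = identity, â_t = 0) reads +9.81 m/s^2 on z plus bias.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r_a = accel_residual([0.02, 0.0, 9.83], I3, [0.0, 0.0, 0.0], [0.02, 0.0, 0.02])
r_g = gyro_residual([0.001, 0.0, 0.0], [0.0, 0.0, 0.0], [0.001, 0.0, 0.0])
print(squared_norm(r_a) < 1e-12, squared_norm(r_g) < 1e-12)  # True True
```

In a factor-graph solver, squared residuals like these (weighted by the noise covariance) are what the IMU factor would contribute to the cost.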
The IMU factor uses the estimates of the true angular velocity to constrain the device rotation according to Equation (4).
where $R_i, R_j$ represent the orientation of the device at times i and j, Exp(·) is the exponential map function, and Δt is the change in time. The exponential map defines a conversion from a vector in $\mathbb{R}^3$ to a 3×3 rotation matrix.
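Equation (4) is not reproduced in this text. A form commonly used in IMU preintegration, assumed here, composes one exponential-map increment per IMU sample: R_j ≈ R_i ∏_k Exp(ŵ_k Δt), where ŵ_k is the bias-corrected angular velocity. The sketch below implements the exponential map with Rodrigues' formula and scores the constraint with a Frobenius-norm distance, a simplification of the Log-map error a full solver would typically use. All names are hypothetical.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def skew(v):
    """Skew-symmetric matrix [v]^ such that [v]^ u = v x u."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def exp_so3(phi):
    """Exponential map from phi in R^3 to a 3x3 rotation matrix (Rodrigues' formula)."""
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    theta = math.sqrt(sum(p * p for p in phi))
    if theta < 1e-12:
        k = skew(phi)  # first-order approximation for tiny angles
        return [[identity[i][j] + k[i][j] for j in range(3)] for i in range(3)]
    k = skew([p / theta for p in phi])
    k2 = mat_mul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * k[i][j] + c * k2[i][j] for j in range(3)] for i in range(3)]

def predict_rotation(R_i, omegas, dt):
    """R_j ≈ R_i · Π_k Exp(ŵ_k Δt), using bias-corrected angular velocities ŵ_k."""
    R = R_i
    for w in omegas:
        R = mat_mul(R, exp_so3([x * dt for x in w]))
    return R

def rotation_residual(R_i, R_j, omegas, dt):
    """Frobenius distance between the estimated R_j and the IMU prediction."""
    R_pred = predict_rotation(R_i, omegas, dt)
    return math.sqrt(sum((R_j[a][b] - R_pred[a][b]) ** 2 for a in range(3) for b in range(3)))

# Yawing at 0.5 rad/s for ten 0.1 s samples should rotate the device by 0.5 rad about z.
R_i = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_j = [[math.cos(0.5), -math.sin(0.5), 0.0], [math.sin(0.5), math.cos(0.5), 0.0], [0.0, 0.0, 1.0]]
print(rotation_residual(R_i, R_j, [[0.0, 0.0, 0.5]] * 10, 0.1) < 1e-9)  # True
```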
The IMU factor uses the estimates of the true acceleration to constrain the device velocity and position according to Equations (5)-(6).
where $v_i, v_j$ represent the device velocities at times i and j, and $p_i, p_j$ represent the device positions.
The IMU factor also constrains consecutive IMU bias terms to change slowly by minimizing the error according to Equation (7).
$$\lVert b^{(g)}_j - b^{(g)}_i \rVert_2 \quad \text{and} \quad \lVert b^{(a)}_j - b^{(a)}_i \rVert_2 \qquad \text{Equation (7)}$$
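Equations (5) and (6) are likewise not reproduced here. Assuming standard constant-acceleration kinematics within each IMU step (with the true acceleration estimates â_k expressed in the world frame and gravity already removed), the velocity and position constraints and the bias-smoothness term of Equation (7) can be sketched as follows. Function names are hypothetical.

```python
import math

def propagate(p_i, v_i, accels, dt):
    """Integrate world-frame accelerations â_k over equal steps dt, assuming the
    acceleration is constant within each step."""
    p, v = list(p_i), list(v_i)
    for a in accels:
        p = [pk + vk * dt + 0.5 * ak * dt * dt for pk, vk, ak in zip(p, v, a)]
        v = [vk + ak * dt for vk, ak in zip(v, a)]
    return p, v

def velocity_position_residual(p_i, v_i, p_j, v_j, accels, dt):
    """Differences between the state at time j and the IMU-propagated prediction."""
    p_pred, v_pred = propagate(p_i, v_i, accels, dt)
    r_v = [a - b for a, b in zip(v_j, v_pred)]
    r_p = [a - b for a, b in zip(p_j, p_pred)]
    return r_v, r_p

def bias_change(b_i, b_j):
    """Equation (7): L2 norm of the change in a bias between consecutive states."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(b_i, b_j)))

# Constant 1 m/s^2 along x for 1 s from rest: v = 1 m/s and p = 0.5 m.
r_v, r_p = velocity_position_residual([0.0] * 3, [0.0] * 3, [0.5, 0.0, 0.0],
                                      [1.0, 0.0, 0.0], [[1.0, 0.0, 0.0]] * 10, 0.1)
print(max(map(abs, r_v + r_p)) < 1e-9)  # True
print(round(bias_change([0.0, 0.0, 0.0], [0.3, 0.4, 0.0]), 6))  # 0.5
```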
Once the calibration module obtains IMU sensor measurement data for at least two consecutive times (e.g., 502a, 502b), the calibration module can obtain a Gaussian process factor. The Gaussian process factor can be a set of at least three values that relate the pose, velocity, and angular velocity of consecutive states, $(X_{t-1}, V_{t-1}, \omega_{t-1})$ and $(X_t, V_t, \omega_t)$, so that the calibration module can treat the factor between two states as a measurement. Thus, for example, the calibration module can obtain Gaussian process factor 504a (at least three values, including a value between $X_1$ and $X_2$, a value between $V_1$ and $V_2$, and a value between $\omega_1$ and $\omega_2$), Gaussian process factor 504b, and so on, as indicated in
The calibration module obtains an image from each of multiple cameras (e.g., a first camera and a second camera located on the vehicle) at time t=m where the image comprises one or more lane blocks (shown as 210a, 210b in
$$p'' = K_c X_t T_c P \qquad \text{Equation (8)}$$
where $K_c$ is the intrinsic matrix of the camera that obtained the image, $X_t$ is the IMU sensor pose, $T_c$ is the extrinsic matrix of the camera that obtained the image, and $P$ is the 3D world coordinates of a location (e.g., a corner) of the lane block in the image, which the calibration module obtains from the map database.
The loss function for each 3D-2D factor (also known as a localization factor) is the L2 (Euclidean) norm of the difference between the actual pixel location p′ and the expected pixel location p″, as shown in Equation (9):
$$\lVert p' - p'' \rVert_2 = 0 \qquad \text{Equation (9)}$$
where p′ is the actual pixel location of a location (e.g., corner) of the lane block in an image in two-dimensional coordinate space. Thus, the calibration module minimizes the difference between p′ and p″ by adjusting the extrinsic matrix Tc of the camera to obtain an optimized extrinsic matrix Tc of the camera.
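Equation (8) is written compactly; the sketch below assumes X_t maps IMU coordinates to world coordinates and T_c maps camera coordinates to IMU coordinates, so a world point is first moved into the camera frame with the inverse of X_t ⊕ T_c and then projected with K_c and a perspective division. The residual is Equation (9). To illustrate adjusting T_c, a hypothetical coordinate search refines only the translation of T_c; a real calibration would typically optimize all six degrees of freedom with a nonlinear least-squares solver. The synthetic values are made up.

```python
import math

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def compose(a, b):
    """A ⊕ B for (R, T) pairs: (R_a R_b, R_a T_b + T_a)."""
    (ra, ta), (rb, tb) = a, b
    return (mat_mul(ra, rb), [x + y for x, y in zip(mat_vec(ra, tb), ta)])

def project(K, X_t, T_c, P):
    """Expected pixel p'' of world point P (Equation (8)) under the stated assumptions."""
    R_wc, t_wc = compose(X_t, T_c)                                     # camera-to-world
    p_cam = mat_vec(transpose(R_wc), [p - t for p, t in zip(P, t_wc)])  # world -> camera
    u, v, w = mat_vec(K, p_cam)
    return (u / w, v / w)

def reprojection_error(p_actual, K, X_t, T_c, P):
    """Equation (9): L2 distance between the actual pixel p' and the expected pixel p''."""
    pu, pv = project(K, X_t, T_c, P)
    return math.hypot(p_actual[0] - pu, p_actual[1] - pv)

def refine_translation(T_c, K, X_t, corners, pixels, step=0.05, tol=1e-9, max_iters=5000):
    """Hypothetical coordinate search over the translation of T_c that reduces the
    summed squared reprojection error over several lane-block corners."""
    R, t = T_c[0], list(T_c[1])

    def cost(trans):
        return sum(reprojection_error(px, K, X_t, (R, trans), P) ** 2
                   for P, px in zip(corners, pixels))

    best, iters = cost(t), 0
    while step > tol and iters < max_iters:
        iters += 1
        improved = False
        for axis in range(3):
            for delta in (step, -step):
                cand = list(t)
                cand[axis] += delta
                c = cost(cand)
                if c < best:
                    t, best, improved = cand, c, True
        if not improved:
            step *= 0.5  # shrink the step once no axis move helps
    return (R, t), best

# Synthetic check: pixels generated with a "true" extrinsic, search starts from zero offset.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
X_t = (I3, [0.0, 0.0, 0.0])
true_T_c = (I3, [0.2, -0.1, 0.0])
corners = [[1.0, 0.5, 10.0], [-1.0, 0.5, 10.0], [1.0, 1.5, 14.0], [-1.0, 1.5, 14.0]]
pixels = [project(K, X_t, true_T_c, P) for P in corners]
(_, t_est), final_cost = refine_translation((I3, [0.0, 0.0, 0.0]), K, X_t, corners, pixels)
print([round(x, 4) for x in t_est], final_cost)
```

Several corners at different depths are used because a single point cannot constrain all three translation components.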
The multi-sensor sequential extrinsic calibration technique for the cameras can be performed by the calibration module as follows: the vehicle drives on a road (e.g., as shown in
In some embodiments, the calibration module can determine whether a 2D-2D image-to-image factor can be applied for two cameras that have at least partially overlapping fields of view. The identities of the two cameras having at least partially overlapping fields of view are previously known and may be stored in a table on the computer (e.g., 400 in
A 2D-2D image-to-image factor describes a constraint between two cameras for images captured at the same time; it requires the 2D-2D correspondences of the same lane block in the images of the two cameras to satisfy epipolar geometry.
The calibration module applies a 2D-2D factor constraint between ($T_c$, $K_c$, $p^c_r$) and ($T_d$, $K_d$, $p^d_s$) using Equations (10) to (14), where $p^c_r$ is the measured pixel coordinates of a location (e.g., a corner) of a lane block obtained from an image from camera_c, $p^d_s$ is the measured pixel coordinates of the same location (e.g., corner) of the same lane block obtained from an image from camera_d, $T_c$ and $T_d$ are the camera-IMU sensor transforms (camera extrinsic parameters) of camera_c and camera_d, respectively, and $K_c$ and $K_d$ are the intrinsic parameters of camera_c and camera_d, respectively:
$$\Delta P = T_c^{-1} \oplus T_d \qquad \text{Equation (10)}$$
ΔP describes the pose transform in SE(3) between the two cameras at poses $T_c$ and $T_d$.
where R and T are the rotation and translation components of the transform ΔP, and the $[\cdot]^{\wedge}$ operator in Equation (11) denotes the mapping from a unit vector to its skew-symmetric matrix.
$$H_r = K_c^{-1} \times p^c_r \qquad \text{Equation (12)}$$
where $H_r$ is a matrix that describes a 3D camera ray from camera_c through the pixel $p^c_r$.
$$H_s = K_d^{-1} \times p^d_s \qquad \text{Equation (13)}$$
where $H_s$ is a matrix that describes a 3D camera ray from camera_d through the pixel $p^d_s$.
$$H_r^{T} \times E \times H_s = 0 \qquad \text{Equation (14)}$$
Equation (14) defines the epipolar constraint, where E is the essential matrix. Since the extrinsic matrix $T_c$ of camera_c can be determined using the techniques described in Section V of this patent document, the calibration module can adjust the extrinsic matrix $T_d$ of camera_d such that the matrix multiplication shown in Equation (14) equals zero.
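A sketch of Equations (10) to (14), assuming T_c and T_d map camera coordinates to IMU coordinates, zero-skew pinhole intrinsics, and the essential matrix E = [T]^ R built from ΔP (the standard form; Equation (11) itself is not reproduced in this text). A correct correspondence gives a residual of zero; the residual's scale is arbitrary because E is only defined up to scale. Names and values are hypothetical.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def compose(a, b):
    """A ⊕ B for (R, T) pairs: (R_a R_b, R_a T_b + T_a)."""
    (ra, ta), (rb, tb) = a, b
    return (mat_mul(ra, rb), [x + y for x, y in zip(mat_vec(ra, tb), ta)])

def inverse(a):
    ra, ta = a
    rt = transpose(ra)
    return (rt, [-x for x in mat_vec(rt, ta)])

def skew(v):
    """[v]^: the skew-symmetric matrix with [v]^ u = v x u."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def essential_matrix(T_c, T_d):
    """E = [T]^ R, where (R, T) = ΔP = T_c^-1 ⊕ T_d (Equation (10))."""
    R, T = compose(inverse(T_c), T_d)
    return mat_mul(skew(T), R)

def pixel_to_ray(K, p):
    """H = K^-1 [u, v, 1]^T for a zero-skew pinhole K (Equations (12)-(13))."""
    return [(p[0] - K[0][2]) / K[0][0], (p[1] - K[1][2]) / K[1][1], 1.0]

def epipolar_residual(T_c, K_c, p_r, T_d, K_d, p_s):
    """H_r^T E H_s (Equation (14)); zero for a geometrically consistent correspondence."""
    e_hs = mat_vec(essential_matrix(T_c, T_d), pixel_to_ray(K_d, p_s))
    return sum(a * b for a, b in zip(pixel_to_ray(K_c, p_r), e_hs))

# Two cameras 0.5 m apart along x both see the lane-block corner (1, 0.2, 10).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
T_c = (I3, [0.0, 0.0, 0.0])
T_d = (I3, [0.5, 0.0, 0.0])
print(epipolar_residual(T_c, K, (740.0, 380.0), T_d, K, (690.0, 380.0)))  # 0.0
print(epipolar_residual(T_c, K, (740.0, 380.0), T_d, K, (690.0, 400.0)))  # about 0.01
```

With T_c held fixed, a solver would adjust T_d to drive these residuals toward zero over many lane-block correspondences.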
In some embodiments, the calibration module can determine whether an asynchronous 2D-2D image-to-image factor can be applied for two cameras that are pre-determined to have overlapping or non-overlapping fields of view. For example, for the example camera setup in
$$\Delta P = T_c^{-1} \oplus (X_m \ominus X_n) \oplus T_d \qquad \text{Equation (15)}$$
The calibration module can substitute ΔP from Equation (15) into Equations (11) to (14) to optimize (or adjust) the extrinsic matrices of camera_c and/or camera_d at the same time by minimizing the error function described in Equations (11) to (14). In some embodiments, if the calibration module determines that the extrinsic matrix of camera_c has already been optimized (e.g., by using the techniques described in Section V or VI), then the calibration module can minimize the error function described in Equations (11) to (14) by adjusting only the extrinsic matrix of camera_d.
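Equation (15) inserts the relative IMU motion between the two capture times. Assuming the convention X_m ⊖ X_n = X_m⁻¹ ⊕ X_n and camera-to-IMU extrinsics, the sketch checks that ΔP equals the direct relative pose (X_m ⊕ T_c)⁻¹ ⊕ (X_n ⊕ T_d) and that a lane-block corner seen at two different times satisfies the epipolar constraint. All names and numbers are illustrative.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def compose(a, b):
    """A ⊕ B for (R, T) pairs: (R_a R_b, R_a T_b + T_a)."""
    (ra, ta), (rb, tb) = a, b
    return (mat_mul(ra, rb), [x + y for x, y in zip(mat_vec(ra, tb), ta)])

def inverse(a):
    ra, ta = a
    rt = transpose(ra)
    return (rt, [-x for x in mat_vec(rt, ta)])

def ominus(a, b):
    """Assumed convention: A ⊖ B = A^-1 ⊕ B (relative motion from A to B)."""
    return compose(inverse(a), b)

def skew(v):
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def rot_z(yaw):
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def delta_pose_async(T_c, X_m, X_n, T_d):
    """Equation (15): ΔP = T_c^-1 ⊕ (X_m ⊖ X_n) ⊕ T_d."""
    return compose(compose(inverse(T_c), ominus(X_m, X_n)), T_d)

def pixel_to_ray(K, p):
    """K^-1 [u, v, 1]^T for a zero-skew pinhole intrinsic matrix."""
    return [(p[0] - K[0][2]) / K[0][0], (p[1] - K[1][2]) / K[1][1], 1.0]

def epipolar_residual(delta_p, K_c, p_r, K_d, p_s):
    """H_r^T E H_s with E = [T]^ R taken from ΔP."""
    R, T = delta_p
    e_hs = mat_vec(mat_mul(skew(T), R), pixel_to_ray(K_d, p_s))
    return sum(a * b for a, b in zip(pixel_to_ray(K_c, p_r), e_hs))

# Consistency check against the direct relative pose (made-up poses).
T_c = (rot_z(0.1), [1.0, 0.2, 1.5])
T_d = (rot_z(-0.2), [0.8, -0.3, 1.4])
X_m = (rot_z(0.05), [10.0, 2.0, 0.0])
X_n = (rot_z(0.08), [12.0, 2.1, 0.0])
dp = delta_pose_async(T_c, X_m, X_n, T_d)
direct = compose(inverse(compose(X_m, T_c)), compose(X_n, T_d))

# Identity extrinsics; the vehicle moved 2 m along x between times m and n.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
zero = (I3, [0.0, 0.0, 0.0])
dp2 = delta_pose_async(zero, zero, (I3, [2.0, 0.0, 0.0]), zero)
print(abs(epipolar_residual(dp2, K, (940.0, 380.0), K, (740.0, 380.0))) < 1e-12)  # True
```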
In some embodiments, the calibration module can also apply an image-to-3D factor to a camera. The image-to-3D factor replaces an exact 2D correspondence to a lane block with an image mask of lane markings. The distance transform of the mask, evaluated at a re-projected map point, defines the error.
In some embodiments, the calibration module can include a general lane mark constraint between a whole image and a map. This technique enables the formation of constraints to solid lines in addition to dashed ones. For each solid lane line, the HD map includes 3D world coordinates at an interval. For each dashed lane line, the 3D world coordinates are the locations (e.g., corners) of the lane dashes, as previously specified. The calibration module forms a constraint from a map point to an image by first applying image processing to find the lane marks in the camera image. The result is a binary image (mask) that indicates which pixels are lane marks and which are not. The calibration module then projects a map point of a solid or dashed line according to Equation (8) to obtain an expected pixel location p″, which can fall within a lane mark of the image mask. Given the expected pixel location p″, the calibration module determines the nearest pixel $l_m$ of a lane marker in the image obtained by the camera. The calibration module applies the constraint defined in Equation (16), adjusting the extrinsic matrix of the camera in Equation (8) to minimize the difference between the expected pixel location p″ and the nearest lane-marker pixel $l_m$:
$$\lVert p'' - l_m \rVert_2 = 0 \qquad \text{Equation (16)}$$
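The nearest-pixel search of Equation (16) and the distance-transform error of the image-to-3D factor can be sketched on a binary lane mask. At a pixel, the Euclidean distance transform equals the distance to the nearest lane pixel, so both constraints measure the same quantity there. Brute force is used for clarity; the mask layout ((u, v) = (column, row)) and the function names are assumptions.

```python
import math

def lane_pixels(mask):
    """(u, v) = (column, row) coordinates of all pixels marked as lane (value 1)."""
    return [(u, v) for v, row in enumerate(mask) for u, val in enumerate(row) if val]

def nearest_lane_pixel(mask, p_expected):
    """l_m: the lane-mark pixel closest to the expected pixel p'' (brute force)."""
    pixels = lane_pixels(mask)
    if not pixels:
        return None
    return min(pixels, key=lambda q: (q[0] - p_expected[0]) ** 2 + (q[1] - p_expected[1]) ** 2)

def lane_mark_residual(mask, p_expected):
    """Equation (16): ||p'' - l_m||_2, or None if the mask contains no lane pixels."""
    l_m = nearest_lane_pixel(mask, p_expected)
    if l_m is None:
        return None
    return math.hypot(p_expected[0] - l_m[0], p_expected[1] - l_m[1])

def distance_transform(mask):
    """Brute-force Euclidean distance transform: distance from every pixel to the
    nearest lane pixel. Adequate for toy masks; full images need a faster method."""
    pixels = lane_pixels(mask)
    if not pixels:
        return None
    return [[min(math.hypot(u - a, v - b) for a, b in pixels) for u in range(len(row))]
            for v, row in enumerate(mask)]

# A 5 x 7 mask with a vertical lane mark in column 3.
mask = [[1 if u == 3 else 0 for u in range(7)] for _ in range(5)]
print(nearest_lane_pixel(mask, (5.2, 2.0)))  # (3, 2)
print(lane_mark_residual(mask, (5.0, 2.0)))  # 2.0
print(distance_transform(mask)[0][0])        # 3.0
```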
In some embodiments, the method of
In some embodiments, the previously known second extrinsic matrix of the second camera is adjusted so that the matrix multiplication is equal to zero. In some embodiments, the second image is obtained by the second camera at the first time. In some embodiments, the essential matrix is based at least on the optimized first extrinsic matrix and the second extrinsic matrix. In some embodiments, the identities of the first camera and the second camera that have at least partially overlapping fields of view are previously known. In some embodiments, the first matrix for the first 3D camera ray is determined by multiplying an inverse of a first intrinsic matrix of the first camera with the first actual pixel location, and the second matrix for the second 3D camera ray is determined by multiplying an inverse of a second intrinsic matrix of the second camera with the second actual pixel location.
In some embodiments, a system for calibration of one or more sensors on or in a vehicle is described, where the system comprises a processor configured to perform the operations described at least in
In some embodiments, the calculated value of the position of the IM device at the first time is based on: a second location provided by the GPS device at a second time, a third location provided by the GPS device at a third time, and a previously known location of the IM device relative to that of the GPS device. In some embodiments, the calculated value of the position of the IM device is an interpolated value in between a second position of the IM device at the second time and a third position of the IM device at the third time, where the second position of the IM device is based on the second location provided by the GPS device and the previously known location of the IM device relative to that of the GPS device, and the third position of the IM device is based on the third location provided by the GPS device and the previously known location of the IM device relative to that of the GPS device.
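The GPS-based IMU position and its interpolation to an image timestamp can be sketched as follows, assuming GPS fixes in a local metric frame, a known IMU lever arm in the vehicle frame, and a level vehicle so that only yaw is needed to rotate the lever arm. Linear interpolation of position is an assumption; orientation would typically be interpolated separately (for example, with spherical linear interpolation). Names and numbers are hypothetical.

```python
import math

def imu_position_from_gps(gps_xyz, yaw, lever_arm):
    """IMU position = GPS position + R_z(yaw) · lever_arm, where lever_arm is the
    previously known IMU offset from the GPS antenna in the vehicle frame.
    Only yaw is used (level-vehicle simplification)."""
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy, dz = lever_arm
    return [gps_xyz[0] + c * dx - s * dy, gps_xyz[1] + s * dx + c * dy, gps_xyz[2] + dz]

def interpolate_imu_position(t1, fix2, fix3, lever_arm):
    """Linearly interpolate the IMU position at image time t1 between two GPS fixes,
    each given as (time, xyz, yaw), with fix2 strictly earlier than fix3."""
    t2, xyz2, yaw2 = fix2
    t3, xyz3, yaw3 = fix3
    if not (t2 < t3 and t2 <= t1 <= t3):
        raise ValueError("t1 must lie between two distinct GPS timestamps")
    p2 = imu_position_from_gps(xyz2, yaw2, lever_arm)
    p3 = imu_position_from_gps(xyz3, yaw3, lever_arm)
    alpha = (t1 - t2) / (t3 - t2)
    return [a + alpha * (b - a) for a, b in zip(p2, p3)]

# 10 Hz GPS, vehicle driving along +x at 10 m/s; IMU 1.5 m behind and 0.5 m below the antenna.
pos = interpolate_imu_position(0.05, (0.0, [0.0, 0.0, 0.0], 0.0),
                               (0.1, [1.0, 0.0, 0.0], 0.0), [-1.5, 0.0, -0.5])
print([round(x, 6) for x in pos])  # [-1.0, 0.0, -0.5]
```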
In some embodiments, a non-transitory computer readable storage medium having code stored thereon, the code, when executed by a processor, causing the processor to implement a method disclosed at least in
In some embodiments, the adjusting of the previously known third extrinsic matrix of the third camera or the optimized first extrinsic matrix of the first camera is based on a matrix multiplication of the first matrix, the second matrix, and an essential matrix so that the matrix multiplication is equal to zero, and the essential matrix describes a relationship of the first actual pixel location and the third actual pixel location. In some embodiments, the essential matrix is based at least on a second position of the IM device when the third image is obtained by the third camera, the position of the IM device when the first image is obtained by the first camera, the optimized first extrinsic matrix, and the third extrinsic matrix. In some embodiments, the first camera has a field of view that at least partially overlaps with that of the third camera.
In some embodiments, the method of
The autonomous operation module 425 can receive sensor data (e.g., images, PCD, radar data) of an environment in which the autonomous vehicle is operating, where the sensor data is obtained from the multiple sensors onboard the vehicle. The autonomous operation module 425 can instruct the autonomous vehicle to operate on a road (e.g., determine distance to vehicles in front of the autonomous vehicle, turn, apply brakes) based on the sensor data and based on the sensor data adjusted using the optimized extrinsic matrix obtained from the calibration module. In an example implementation, the autonomous operation module can determine a location of an object located in front of or next to or behind the autonomous vehicle based on a presence of the object on an image obtained by a camera onboard the autonomous vehicle and based on the optimized extrinsic matrix of the camera.
This patent document describes example sequential techniques for performing extrinsic camera calibration of a multi-sensor equipped vehicle. The calibration module in this patent document can use an IMU, a GPS, and an HD map of 3D lane markers to resolve the pose transforms between the cameras and the IMU. Image processing techniques can be applied by the calibration module to register views of map points, views of lane lines, and/or shared views of map points between images obtained by different cameras. Shared views include any image pair from cameras that share an overlapping field of view, whether the images were acquired at the same time or at different times, whether the cameras are of the same type or not, and whether the lenses have the same focal length or not. Constraints between shared views can maximize the agreement of the extrinsic calibration with epipolar geometry, which can be better than HD map-only calibration because map points may contain errors. Non-linear optimization, which may be performed by the calibration module on the constraints obtained from the IMU, the GPS, and the results of image processing, can yield the pose transforms. The example multi-sensor sequential calibration methods can yield precise extrinsic calibration of multi-view, standard, wide-angle, and telephoto lens cameras with perception up to 1000 m for an automotive application.
Some of the embodiments described herein are described in the general context of methods or processes, which may be implemented in one embodiment by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. A computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVD), etc. Therefore, the computer-readable media can include a non-transitory storage media. Generally, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer- or processor-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes.
Some of the disclosed embodiments can be implemented as devices or modules using hardware circuits, software, or combinations thereof. For example, a hardware circuit implementation can include discrete analog and/or digital components that are, for example, integrated as part of a printed circuit board. Alternatively, or additionally, the disclosed components or modules can be implemented as an Application Specific Integrated Circuit (ASIC) and/or as a Field Programmable Gate Array (FPGA) device. Some implementations may additionally or alternatively include a digital signal processor (DSP) that is a specialized microprocessor with an architecture optimized for the operational needs of digital signal processing associated with the disclosed functionalities of this application. Similarly, the various components or sub-components within each module may be implemented in software, hardware or firmware. The connectivity between the modules and/or components within the modules may be provided using any one of the connectivity methods and media that is known in the art, including, but not limited to, communications over the Internet, wired, or wireless networks using the appropriate protocols.
While this document contains many specifics, these should not be construed as limitations on the scope of an invention that is claimed or of what may be claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination. Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results.
Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this disclosure.