The instant specification generally relates to autonomous vehicles. More specifically, the instant specification relates to improving autonomous driving systems and components with calibration of various sensors of autonomous vehicles using natural scenes as well as motion of the autonomous vehicle and various objects encountered in driving environments.
An autonomous (fully and partially self-driving) vehicle (AV) operates by sensing an outside environment with various electromagnetic (e.g., radar and optical) and non-electromagnetic (e.g., audio and humidity) sensors. Some autonomous vehicles chart a driving path through the environment based on the sensed data. The driving path can be determined based on Global Navigation Satellite System (GNSS) data and road map data. While the GNSS and the road map data can provide information about static aspects of the environment (buildings, street layouts, road closures, etc.), dynamic information (such as information about other vehicles, pedestrians, street lights, etc.) is obtained from contemporaneously collected sensing data. Precision and safety of the driving path and of the speed regime selected by the autonomous vehicle depend on timely and accurate identification of various objects present in the driving environment and on the ability of a driving algorithm to process the information about the environment and to provide correct instructions to the vehicle controls and the drivetrain.
The present disclosure is illustrated by way of examples, and not by way of limitation, and can be more fully understood with references to the following detailed description when considered in connection with the figures, in which:
In one implementation, disclosed is a system that includes a plurality of sensors of a sensing system of an autonomous vehicle (AV), the sensing system configured to collect sensing data during operation of the AV. The disclosed system further includes a data processing system, operatively coupled to the sensing system, the data processing system configured to identify, based on the collected sensing data, a first reference point associated with a first object in an environment of the AV; determine i) a first estimated location of the first reference point using sensing data obtained by a first subset of the plurality of sensors and ii) a second estimated location of the first reference point using sensing data obtained by a second subset of the plurality of sensors; and adjust parameters of one or more of the plurality of sensors based on a loss function representative of a difference of the first estimated location and the second estimated location.
In another implementation, disclosed is a system that includes a first sensor and a second sensor of a sensing system of an AV, the sensing system configured to collect sensing data during operation of the AV. The disclosed system further includes a data processing system, operatively coupled to the sensing system, the data processing system configured to identify, based on the collected sensing data, a plurality of reference points associated with an environment of the AV, determine i) first estimated locations of each of the plurality of reference points using the sensing data collected by the first sensor and ii) second estimated locations of each of the plurality of reference points using the sensing data collected by the second sensor; and adjust parameters of at least one of the first sensor or the second sensor based on a loss function representative of a difference of the first estimated locations and the second estimated locations.
In another implementation, disclosed is a method that includes collecting sensing data using a plurality of sensors of a sensing system of an AV; identifying, based on the collected sensing data, a first reference point associated with a first object in an environment of the AV; determining i) a first estimated location of the first reference point using sensing data obtained by a first subset of the plurality of sensors and ii) a second estimated location of the first reference point using sensing data obtained by a second subset of the plurality of sensors; and adjusting parameters of one or more of the plurality of sensors based on a loss function representative of a difference of the first estimated location and the second estimated location.
An autonomous vehicle can employ a number of sensors to facilitate detection of objects in the outside environment and determination of the motion of such objects. The sensors typically include radio detection and ranging (radar) sensors, light detection and ranging (lidar) sensors, multiple digital cameras, sonars, positional sensors, and the like. Timely and precise detection of objects and their motion depends on accurate calibration of the sensors and maintaining calibration of the sensors throughout autonomous driving missions. Different types of sensors provide different and complementary benefits. For example, radars and lidars emit electromagnetic signals (radio signals or optical signals, respectively) that reflect from the objects, enabling determination of distances to the objects (e.g., from the time of flight of the signals) and of velocities of the objects (e.g., from the Doppler shift of the frequencies of the signals). Radars and lidars can cover an entire 360-degree view by using a series of consecutive sensing frames. Sensing frames can include numerous reflections covering the outside environment in a dense grid of return points. Each return point can be associated with the distance to the corresponding reflecting object and a radial velocity (a component of the velocity along the line of sight) of the reflecting object.
Lidars and radars have different advantages. Lidars, by virtue of a significantly smaller wavelength, have a higher spatial resolution, which allows obtaining many closely-spaced return points from the same object. Lidars, however, have expensive high-quality optical components that have to be carefully manufactured and maintained. Additionally, optical signals of lidars attenuate at distances of the order of several hundred meters and work less effectively in poor weather conditions.
Radar sensors are inexpensive, require less maintenance than lidar sensors, have a large working range of distances, and possess a good tolerance of adverse environmental conditions, e.g., foggy, snowy, or misty driving conditions. Because of the low cost and complexity of radar sensors, multiple radar devices can be mounted on a vehicle and positioned at locations that collectively provide a wide field of view. Precision of radar returns, however, can be limited by a relatively long wavelength, as compared with the wavelength of light used in the optical lidar technology. Both radars and lidars are capable of providing a three dimensional view of the environment.
A camera (e.g., a photographic or video camera) allows a high resolution of objects at both shorter and longer distances, but provides a projection of a three-dimensional space onto a two-dimensional image plane (or some other non-planar surface). As a result, the resolution of lateral coordinates of objects can be different (e.g., higher) than the resolution of radial distances to the same objects (with radial distances being estimated from, e.g., a focal length of the camera that provides the maximum sharpness of the images of objects).
With various sensors providing different benefits, an autonomous vehicle sensing system typically deploys sensors of multiple types, leveraging each sensor's advantages to obtain a more complete picture of the outside environment. For example, a lidar can accurately determine a distance to an object and the radial velocity of the object whereas a camera, upon acquiring a time series of images, can determine the lateral velocity of the object. Reliability of object tracking, however, depends on the accuracy of matching objects in the images and other data obtained by different sensors. For example, an unrecognized (and, therefore, uncorrected) error in a camera orientation of 1° results in an error of about 2 meters at a distance of 100 meters and can result in an incorrect lidar-camera association interpreting different objects as the same object or vice versa. A misalignment of a camera orientation can occur due to manufacturing (e.g., installation) tolerances, as a result of shaking due to road bumps, uneven heating of different components of the autonomous vehicle and the camera, and so on. The focal distance of the lens of a camera can be affected by elements of the environment precipitating on the camera or by wear of the optical and mechanical components of the camera. Correcting for various errors in alignment and inaccuracies of sensor data is referred to herein as calibration. Currently, calibration of various sensors of an autonomous vehicle can be performed during the vehicle's off time, and may require special facilities, sophisticated calibration targets depicting a special scenery, and various expensive procedures performed by fleet engineers. Such procedures, however, are too complex and costly to be performed after each driving mission and, moreover, do not protect against a loss of calibration during the performance of a particular mission.
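As a purely illustrative check of the figure quoted above, the lateral displacement produced by a small orientation error grows roughly linearly with range; the short Python sketch below (the function name is hypothetical) reproduces the 1°-at-100-meters estimate.

```python
import math

def lateral_error(distance_m: float, angle_error_deg: float) -> float:
    """Lateral displacement at a given range caused by an angular misalignment."""
    return distance_m * math.tan(math.radians(angle_error_deg))

print(round(lateral_error(100.0, 1.0), 2))  # ~1.75 m, i.e., an error of about 2 meters
```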
Aspects and implementations of the present disclosure address these and other shortcomings of the existing calibration technology by enabling methods and systems that perform run-time calibration and recalibration of various sensors of an autonomous vehicle using native scenery encountered during actual driving missions. Run-time calibration can include identifying features of outside objects (herein referred to as reference points) that are sufficiently prominent to be detectable by various sensors, such as multiple cameras, lidar(s), radar(s), and so on. Coordinates of reference points can be determined based on data from multiple sensors, e.g., by a lidar sensor, a front-facing camera, a surround-view camera, and so on. The coordinates determined by different sensors can differ from each other. Calibration of the sensors can be performed by optimizing various sensor parameters (such as directions of view, focal lengths, precise locations of the sensors on the AV, etc.) in a way that minimizes errors in coordinates of reference points as determined by different sensors. Reference points can be stationary, e.g., a road sign, a parked vehicle, a trunk of a tree, a feature of a building, bridge, or any other structure. In some instances, reference points can be moving, such as a vehicle or a pedestrian, and the run-time calibration can further include tracking a motion of the moving reference point relative to the autonomous vehicle over a period of time preceding and concurrent with the collection of the calibration data. Advantages of the described implementations include fast, accurate, and inexpensive calibration of sensors during actual driving missions using native scenes (actual objects) encountered in the driving environments. Described implementations can operate in situations of a large parallax of distributed sensors mounted at different locations of the autonomous vehicle.
A driving environment 101 can include any objects (animate or inanimate) located outside the AV, such as roadways, buildings, trees, bushes, sidewalks, bridges, mountains, other vehicles, pedestrians, piers, banks, landing strips, animals, birds, and so on. The driving environment 101 can be urban, suburban, rural, and so on. In some implementations, the driving environment 101 can be an off-road environment (e.g. farming or other agricultural land). In some implementations, the driving environment can be an indoor environment, e.g., the environment of an industrial plant, a shipping warehouse, a hazardous area of a building, and so on. In some implementations, the driving environment 101 can be substantially flat, with various objects moving parallel to a surface (e.g., parallel to the surface of Earth). In other implementations, the driving environment can be three-dimensional and can include objects that are capable of moving along all three directions (e.g., balloons, leaves, etc.). Hereinafter, the term “driving environment” should be understood to include all environments in which an autonomous motion (e.g., Level 5 and Level 4 systems), conditional autonomous motion (e.g., Level 3 systems), and/or motion of vehicles equipped with driver assistance technology (e.g., Level 2 systems) can occur. Additionally, “driving environment” can include any possible flying environment of an aircraft (or spacecraft) or a marine environment of a naval vessel. The objects of the driving environment 101 can be located at any distance from the AV, from close distances of several feet (or less) to several miles (or more).
The example AV 100 can include a sensing system 110. The sensing system 110 can include various electromagnetic (e.g., optical) and non-electromagnetic (e.g., acoustic) sensing subsystems and/or devices. The sensing system 110 can include a radar 114 (or multiple radars 114), which can be any system that utilizes radio or microwave frequency signals to sense objects within the driving environment 101 of the AV 100. The radar(s) 114 can be configured to sense both the spatial locations of the objects (including their spatial dimensions) and velocities of the objects (e.g., using the Doppler shift technology). Hereinafter, “velocity” refers to both how fast the object is moving (the speed of the object) as well as the direction of the object's motion. The sensing system 110 can include a lidar 112, which can be a laser-based unit capable of determining distances to the objects and velocities of the objects in the driving environment 101. Each of the lidar 112 and radar 114 can include a coherent sensor, such as a frequency-modulated continuous-wave (FMCW) lidar or radar sensor. For example, lidar 112 and/or radar 114 can use heterodyne detection for velocity determination. In some implementations, the functionality of a time-of-flight (ToF) and coherent lidar (or radar) is combined into a lidar (or radar) unit capable of simultaneously determining both the distance to and the radial velocity of the reflecting object. Such a unit can be configured to operate in an incoherent sensing mode (ToF mode) and/or a coherent sensing mode (e.g., a mode that uses heterodyne detection) or both modes at the same time. In some implementations, multiple lidars 112 and/or radars 114 can be mounted on AV 100.
Lidar 112 (and/or radar 114) can include one or more optical sources (and/or radio/microwave sources) producing and emitting signals and one or more detectors of the signals reflected back from the objects. In some implementations, lidar 112 and/or radar 114 can perform a 360-degree scanning in a horizontal direction. In some implementations, lidar 112 and/or radar 114 can be capable of spatial scanning along both the horizontal and vertical directions. In some implementations, the field of view can be up to 90 degrees in the vertical direction (e.g., with at least a part of the region above the horizon being scanned with radar signals). In some implementations, the field of view can be a full sphere (consisting of two hemispheres).
The sensing system 110 can further include one or more cameras 118 to capture images of the driving environment 101. Some cameras 118 can use a global shutter while other cameras 118 can use a rolling shutter. The images can be two-dimensional projections of the driving environment 101 (or parts of the driving environment 101) onto a projecting surface (flat or non-flat) of the camera(s). Some of the cameras 118 of the sensing system 110 can be video cameras configured to capture a continuous (or quasi-continuous) stream of images of the driving environment 101. The sensing system 110 can also include one or more sonars 116, for active sound probing of the driving environment 101, e.g., ultrasonic sonars, and one or more microphones 113 for passive listening to the sounds of the driving environment 101.
The sensing data obtained by the sensing system 110 can be processed by a data processing system 120 of AV 100. For example, the data processing system 120 can include a perception system 130. The perception system 130 can be configured to detect and track objects in the driving environment 101 and to recognize the detected objects. For example, the perception system 130 can analyze images captured by the cameras 118 and can be capable of detecting traffic light signals, road signs, roadway layouts (e.g., boundaries of traffic lanes, topologies of intersections, designations of parking places, and so on), presence of obstacles, and the like. The perception system 130 can further receive radar sensing data (Doppler data and ToF data) to determine distances to various objects in the environment 101 and velocities (radial and, in some implementations, transverse, as described below) of such objects. In some implementations, the perception system 130 can use radar data in combination with the data captured by the camera(s) 118, as described in more detail below.
The sensing system 110 can include one or more modules to facilitate calibration of various sensors of the sensing system 110. For example, the sensing system 110 can include a run-time calibration module (RTC) 115 that can be used to perform calibration of one or more sensors of the sensing system 110. More specifically, in some implementations, RTC 115 can identify multiple features in the driving environment of the AV that are capable of serving as reference points for sensor calibration. For example, reference points can include a road sign, an edge of a moving or parked car, an axle of a vehicle, a corner of a bus stop overhang, a top/bottom of a beam supporting a bridge, or any other suitable object or a part of such an object. RTC 115 can also access parameters of the one or more sensors, e.g., direction of view (optical axis) of a camera or an angular speed of a scanning by a lidar transmitter/receiver, etc., and determine coordinates of the reference points relative to Earth or the AV, based on the accessed parameters. RTC 115 can perform such determinations for multiple sensors and compute differences between the coordinates determined based on data from different sensors. RTC 115 can then adjust the parameters of at least one sensor to optimize (e.g., minimize) the computed differences. In some implementations, optimization can be performed based on multiple sensing frames obtained by the sensors at different instances of time during tracking of the reference points.
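By way of a non-limiting illustration, the comparison performed by RTC 115 can be sketched in a few lines of Python: location estimates of shared reference points produced by two sensors are compared and accumulated into a scalar mismatch that a subsequent parameter adjustment can drive down. All names and numbers below are hypothetical.

```python
import numpy as np

def pairwise_loss(estimates_a, estimates_b):
    """Sum of squared differences between two sensors' estimates of the same
    reference points; estimates_* map reference-point id -> 3D position in a
    common frame (e.g., the AV frame). Only shared points contribute."""
    shared = estimates_a.keys() & estimates_b.keys()
    return sum(float(np.sum((np.asarray(estimates_a[j]) - np.asarray(estimates_b[j])) ** 2))
               for j in shared)

# Hypothetical example: a stop-sign corner and a truck axle seen by a camera and a lidar.
camera = {"stop_sign_corner": [12.1, 3.4, 1.9], "truck_axle": [25.3, -2.0, 0.6]}
lidar = {"stop_sign_corner": [12.0, 3.5, 2.0], "truck_axle": [25.6, -2.1, 0.6]}
print(pairwise_loss(camera, lidar))  # residual mismatch to be driven down by calibration
```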
In some implementations, the sensing system 110 may further include sensor logs 117 to store recorded sensor readings for various sensors, such as lidar 112, radar(s) 114, camera(s) 118, and the like. Sensor logs 117 may be used to perform sensor calibration during AV downtime, e.g., during time between driving missions. Sensor logs 117 may be indexed by the time of sensing, to allow correlating data from different sensors. Downtime calibration may be performed similarly to the run-time calibration described below. For example, coordinates of various reference points observed during autonomous driving missions can be stored in sensor logs 117 and processed between driving missions, e.g., by the data processing system 120 of the AV or by a server to which the sensor logs 117 may be uploaded.
The perception system 130 can further receive information from a Global Navigation Satellite System (GNSS) positioning subsystem (not shown in
The data processing system 120 can further include an environment monitoring and prediction component 126, which can monitor how the driving environment 101 evolves with time, e.g., by keeping track of the locations and velocities of the animated objects (e.g., relative to Earth). In some implementations, the environment monitoring and prediction component 126 can keep track of the changing appearance of the environment due to a motion of the AV relative to the environment. In some implementations, the environment monitoring and prediction component 126 can make predictions about how various animated objects of the driving environment 101 will be positioned within a prediction time horizon. The predictions can be based on the current state of the animated objects, including current locations (coordinates) and velocities of the animated objects. Additionally, the predictions can be based on a history of motion (tracked dynamics) of the animated objects during a certain period of time that precedes the current moment. For example, based on stored data for a first object indicating accelerated motion of the first object during the previous 3-second period of time, the environment monitoring and prediction component 126 can conclude that the first object is resuming its motion from a stop sign or a red traffic light signal. Accordingly, the environment monitoring and prediction component 126 can predict, given the layout of the roadway and presence of other vehicles, where the first object is likely to be within the next 3 or 5 seconds of motion. As another example, based on stored data for a second object indicating decelerated motion of the second object during the previous 2-second period of time, the environment monitoring and prediction component 126 can conclude that the second object is stopping at a stop sign or at a red traffic light signal. Accordingly, the environment monitoring and prediction component 126 can predict where the second object is likely to be within the next 1 or 3 seconds. The environment monitoring and prediction component 126 can perform periodic checks of the accuracy of its predictions and modify the predictions based on new data obtained from the sensing system 110. The environment monitoring and prediction component 126 can operate in conjunction with RTC 115. For example, the environment monitoring and prediction component 126 can track relative motion of the AV and various objects (e.g., reference objects that are stationary or moving relative to Earth) and compensate for rolling shutter effects based on this tracking. For example, if the shutter time of a camera is T, and the relative velocity of the camera and the scenery being imaged is V, the data for regions of the scenery scanned near the end of the shutter time can be compensated by a shift VT relative to the regions scanned near the beginning of the shutter time. As depicted explicitly in
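For illustration, the rolling-shutter compensation mentioned above can be sketched as a simple linear shift proportional to the fraction of the shutter time elapsed when a given region was scanned; the linear model and all names are assumptions of this example.

```python
import numpy as np

def compensate_rolling_shutter(point, row_fraction, relative_velocity, shutter_time):
    """Refer a detected point back to its position at the start of the exposure.

    point: 3D position as sensed (meters); row_fraction: 0.0 for the first
    region scanned, 1.0 for the last; relative_velocity: velocity of the scene
    relative to the camera (m/s); shutter_time: full readout time T (seconds).
    """
    return (np.asarray(point, dtype=float)
            - np.asarray(relative_velocity, dtype=float) * shutter_time * row_fraction)

# A point scanned at the very end of a 30 ms readout, while the scene closes at
# 10 m/s, is referred back ~0.3 m to where it was at the start of the exposure.
print(compensate_rolling_shutter([50.0, 0.0, 1.5], 1.0, [-10.0, 0.0, 0.0], 0.03))
```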
The data generated by the perception system 130, the GNSS processing module 122, and the environment monitoring and prediction component 126 can be used by an autonomous driving system, such as AV control system (AVCS) 140. The AVCS 140 can include one or more algorithms that control how the AV is to behave in various driving situations and environments. For example, the AVCS 140 can include a navigation system for determining a global driving route to a destination point. The AVCS 140 can also include a driving path selection system for selecting a particular path through the immediate driving environment, which can include selecting a traffic lane, negotiating traffic congestion, choosing a place to make a U-turn, selecting a trajectory for a parking maneuver, and so on. The AVCS 140 can also include an obstacle avoidance system for safe avoidance of various obstructions (rocks, stalled vehicles, a jaywalking pedestrian, and so on) within the driving environment of the AV. The obstacle avoidance system can be configured to evaluate the size of the obstacles and the trajectories of the obstacles (if obstacles are animated) and select an optimal driving strategy (e.g., braking, steering, accelerating, etc.) for avoiding the obstacles.
Algorithms and modules of AVCS 140 can generate instructions for various systems and components of the vehicle, such as the powertrain, brakes, and steering 150, vehicle electronics 160, signaling 170, and other systems and components not explicitly shown in FIG. 1. The powertrain, brakes, and steering 150 can include an engine (internal combustion engine, electric engine, and so on), transmission, differentials, axles, wheels, steering mechanism, and other systems. The vehicle electronics 160 can include an on-board computer, engine management, ignition, communication systems, carputers, telematics, in-car entertainment systems, and other systems and components. The signaling 170 can include high and low headlights, stopping lights, turning and backing lights, horns and alarms, inside lighting system, dashboard notification system, passenger notification system, radio and wireless network transmission systems, and so on. Some of the instructions output by the AVCS 140 can be delivered directly to the powertrain, brakes, and steering 150 (or signaling 170) whereas other instructions output by the AVCS 140 are first delivered to the vehicle electronics 160, which generates commands to the powertrain, brakes, and steering 150 and/or signaling 170.
In one example, the AVCS 140 can determine that an obstacle identified by the data processing system 120 is to be avoided by decelerating the vehicle until a safe speed is reached, followed by steering the vehicle around the obstacle. The AVCS 140 can output instructions to the powertrain, brakes, and steering 150 (directly or via the vehicle electronics 160) to: (1) reduce, by modifying the throttle settings, a flow of fuel to the engine to decrease the engine rpm; (2) downshift, via an automatic transmission, the drivetrain into a lower gear; (3) engage a brake unit to reduce (while acting in concert with the engine and the transmission) the vehicle's speed until a safe speed is reached; and (4) perform, using a power steering mechanism, a steering maneuver until the obstacle is safely bypassed. Subsequently, the AVCS 140 can output instructions to the powertrain, brakes, and steering 150 to resume the previous speed settings of the vehicle.
Reference object detection module 210 can select multiple reference points associated with the identified objects. The reference points can be selected based on visibility and distinctiveness, e.g. a top of a road sign, a tip of a post, a front bumper of a car, a center of a bounding box of an object, and so on. Selected reference points can have certain locations described by radius-vectors {right arrow over (R)}j, with subscript j=1 . . . M enumerating selected reference points. The radius-vectors {right arrow over (R)}j may be defined with respect to Earth, in which case non-moving reference points (e.g., points associated with buildings, structures, road signs, etc.) are characterized by fixed {right arrow over (R)}j. Alternatively or additionally, the radius-vectors can be defined relative to the AV (e.g., relative to any suitably chosen center of a coordinate system associated with the AV, e.g., its center of mass) and denoted herein with {right arrow over (r)}j. In this case, radius-vectors {right arrow over (r)}j of some reference points can remain stationary if those reference points move in the same way as the AV (e.g., together with the traffic). The actual radius-vectors {right arrow over (r)}j (or {right arrow over (R)}j) may be unknown to the reference object detection module 210 (or other modules of the sensing or perception systems of the AV). Instead, the radius-vectors {right arrow over (r)}j (or {right arrow over (R)}j) can be inferred based on the data acquired by various sensors of the AV with a certain degree of accuracy that is determined by the intrinsic accuracy of the sensors, precision of calibration of the sensors, and so on. From the vantage point of sensor α, reference point j can appear to be at a location {right arrow over (r)}j(α) relative to sensor α.
Sensor parameterization module 220 can keep track of various parameters of the sensors of the AV. The sensor parameters can include intrinsic as well as extrinsic parameters. Intrinsic parameters refer to sensor characteristics and settings that are internal to the sensor (e.g., focal distance of a camera, angular frequency of a lidar, etc.). Extrinsic parameters refer to interrelationships of sensors to outside bodies (e.g., orientation of the optical axis of the camera relative to the body of the AV, location of the lidar on the roof of the AV, etc.). Since various sensors are, generally, located at different points on the AV, the parameters of sensors can include locations of the sensors {right arrow over (r)}α on the AV relative to the coordinate system associated with the AV, directions of view {right arrow over (n)}α of the sensors (e.g., directions of the optical axes of the sensors), focal lengths fα of the sensors, and so on. The parameters of sensors of the AV can be known from a manufacturing specification, from a previously performed calibration, or both. Based on the location of the sensors {right arrow over (r)}α, the direction of view {right arrow over (n)}α of sensor α, the focal length fα of sensor α, etc. (provided by sensor parameterization module 220), reference point tracking module 230 can determine location {right arrow over (r)}j(α) of reference point j relative to sensor α. Lateral and radial components (relative to the direction of view {right arrow over (n)}α) of {right arrow over (r)}j(α) can be determined from different procedures and, correspondingly, have different precisions. For example, lateral coordinates of a lidar return can be obtained from angular coordinates of the transmitted and/or received optical beams while the radial coordinate can be determined from the time-of-flight of the beams. The detected radius-vectors {right arrow over (r)}j(α) can, therefore, depend on {right arrow over (n)}α (as well as the focal distance fα).
To estimate the actual locations of reference points based on detected radius-vectors {right arrow over (r)}j(α), reference object tracking module 230 can use locations of the sensors {right arrow over (r)}α relative to the AV. Additionally, reference object tracking module 230 can receive data about motion of the AV (both the translational and the rotational motion of the AV) from an AV motion tracking module 240 and determine how locations of the sensors {right arrow over (r)}α relative to the AV are transformed to the locations of the same sensors relative to Earth. Together with the observed locations {right arrow over (r)}j(α), this enables reference object tracking module 230 to estimate actual locations {right arrow over (R)}j of various reference points relative to Earth. The estimates of the actual locations of various reference points can be obtained based on data from multiple sensors, e.g., from each sensor that is capable of observing the respective reference point, and can depend on the sensor whose data is used to obtain the estimates. For example, a location of reference point j estimated from a first sensor (or a first group of sensors) can be {right arrow over (R)}j(1) and estimated from a second sensor (or a second group of sensors) can be {right arrow over (R)}j(2). In some implementations, object tracking module 230 may input the difference in the locations estimated by the two sensors (or groups of sensors) {right arrow over (R)}j(1)−{right arrow over (R)}j(2) into a loss function 250 to minimize the observed error. In some implementations, the observed error, e.g., Ω12=({right arrow over (R)}j(1)(t1)−{right arrow over (R)}j(2)(t1))2 can be between locations estimated by two different sensors for the same imaging frame (indicated with time t1). In some implementations, the loss function can be summed over multiple sensors (or groups of sensors), Ω=ΣαβΩαβ, e.g., over all sensors to which the reference point j is visible. In some implementations, the loss function may be defined for different frames, e.g., frames taken at time t1, t2 . . . , etc. For example, the loss function can evaluate an error of two estimates of locations by the same sensor α (or groups of sensors) at different instances of time, e.g., Ωαα(t1, t2)=({right arrow over (R)}j(α)(t1)−{right arrow over (R)}j(α)(t2))2. In some implementations, the loss function can evaluate an error of two estimates of locations by different sensors α and β (or groups of sensors) for different time frames, e.g., Ωαβ(t1, t2)=({right arrow over (R)}j(α)(t1)−{right arrow over (R)}j(β)(t2))2. In some implementations, the loss function can evaluate an aggregate error for multiple sensors and different frames, Ω(t1, t2)=Σαβ({right arrow over (R)}j(α)(t1)−{right arrow over (R)}j(β)(t2))2. Numerous other loss functions 250 can be designed. In some implementations, loss function 250 can include constraints from an inertial measurement unit (IMU). For example, acceleration {right arrow over (a)} (and angular acceleration {right arrow over (w)}) obtained by the IMU can be integrated over time to obtain a change in the AV velocity and further integrated to obtain a change in the AV pose (position and orientation), e.g., between time t and any other time t′. The difference (e.g., quadratic difference) in the change in the AV pose predicted from the IMU data (and/or wheel odometry data) and the change predicted based on data from other sensors can be included in the loss function 250.
In some implementations, this loss function can also be used for intrinsic calibration of the IMU. Although for simplicity of the above illustrations the loss function is quadratic and isotropic (evaluating errors along all three spatial directions in the same way), various other non-quadratic and non-isotropic loss functions can be used. For example, a loss function can be selected that evaluates an error along the direction of the field of view {right arrow over (n)}α of a given sensor differently than an error along the directions perpendicular to the direction of view {right arrow over (n)}α, e.g., as described below in connection with
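For illustration, the pairwise loss terms Ωαβ discussed above can be written compactly as follows; the quadratic, isotropic form, the hypothetical sensor names, and the aggregation over unordered sensor pairs are all example choices.

```python
import numpy as np
from itertools import combinations

def omega(estimates, a, b, t1, t2):
    """Omega_ab(t1, t2) = |R_j^(a)(t1) - R_j^(b)(t2)|^2 for one reference point
    treated as stationary relative to Earth between the two frames."""
    diff = (np.asarray(estimates[a][t1], dtype=float)
            - np.asarray(estimates[b][t2], dtype=float))
    return float(diff @ diff)

def aggregate_omega(estimates, t1, t2):
    """One possible aggregate: sum of Omega_ab over all pairs of sensors that
    produced an estimate of the reference point."""
    return sum(omega(estimates, a, b, t1, t2) for a, b in combinations(estimates, 2))

# estimates[sensor][frame_time] -> Earth-frame estimate of the reference point.
estimates = {
    "lidar":        {0.0: [10.0, 2.0, 1.0], 0.1: [10.0, 2.1, 1.0]},
    "front_camera": {0.0: [10.1, 2.1, 1.0], 0.1: [10.1, 2.0, 1.1]},
}
print(omega(estimates, "lidar", "front_camera", 0.0, 0.0))  # same-frame, cross-sensor error
print(aggregate_omega(estimates, 0.0, 0.1))                 # cross-frame aggregate
```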
In some implementations, optimization can be performed with respect to both measured quantities (such as visible locations of reference points) as well as quantities that are unknown or partially known. For example, while the lateral coordinate of a reference object can be determined with substantial precision from a camera image, the radial coordinate of the same object can be ascertained from the same camera data with a lower accuracy. A distribution of likelihoods of such unknown or partially known quantities can be assumed and the optimization can be performed with respect to such distributions, e.g., parameterized with any suitable sets of parameters, including average value, variance, skewness, and the like, together with the optimization with respect to directly measured quantities. In some implementations, the distributions of unknown or partially known quantities can be non-parametric (e.g., non-Gaussian).
Sensor calibration module 260 can adjust parameters of various sensors in a way that optimizes (e.g., minimizes) the loss function 250. After optimization, sensor calibration module 260 updates the sensor parameterization module 220. Additionally, parameters of the sensors can be provided to perception system 130 of the AV, for more accurate interpretation of run-time sensing system data. As depicted with the respective errors in
Multiple variations of operations depicted in
Reference object detection module 210 can identify M reference points (indicated with black triangles) specified with radius-vectors (herein also referred to, for brevity, as locations) {right arrow over (r)}j relative to COM 330. A reference point can be any object, or a point of any object, that is observable by more than one sensor and associated with an identifiable feature of the object, e.g., as depicted, an edge point 315 of bus 314, an axle 317 of truck 316, a headlight 319 of car 318, a corner 321 of stop sign 320, and so on. Depicted with dashed lines are lines of sight from right-front camera 310 to various reference points 315, 317, 319, and 321.
Each sensor observes reference points (as well as any other objects) from a different vantage point relative to other sensors. For example, the difference {right arrow over (r)}α−{right arrow over (r)}β characterizes parallax between sensors α and β. The locations {right arrow over (r)}α, being defined relative to COM 330 of AV 302, can remain constant during motion of AV 302 (provided that shaking and outside impacts do not shift sensors). In the frame of sensor α, reference point j is visible at a location defined by a vector
{right arrow over (r)}j(α)={right arrow over (r)}j−{right arrow over (r)}α,
which is equal to the difference between the radius-vector of the reference point (in the COM frame) and the radius-vector of the corresponding sensor. Position of COM 330 and orientation of AV 302, generally, change with time. For example, over time interval Δt=t2−t1 between two frames of sensor α, taken at times t1 and t2, COM 330 can move by an amount characterized by a translation vector Δ{right arrow over (R)}={right arrow over (V)}Δt and turn by an angle characterized by a rotation vector Δ{right arrow over (θ)}={right arrow over (ω)}Δt. The components of the rotation vector along the three coordinate axes represent roll angle θx, pitch angle θy, and yaw angle θz, respectively. As a result of the translational motion and rotation of AV 302, the new positions of the sensors relative to Earth can be approximated as (vectors in the frame of Earth indicated with capital {right arrow over (R)}):
{right arrow over (R)}α(t2)={right arrow over (R)}α(t1)+{right arrow over (V)}Δt+{right arrow over (ω)}×{right arrow over (r)}αΔt.
This approximation applies provided that a variation of the velocity {right arrow over (V)} over the time interval Δt is much less than the velocity {right arrow over (V)} itself and the angle of rotation is small, |Δ{right arrow over (θ)}|<<1. Such conditions can be satisfied when two frames are obtained in quick succession so that the vehicle does not have time to change its position and orientation significantly. In the instances of two frames separated by longer times, a more accurate transformation can be used for the new positions of the sensors relative to Earth,
{right arrow over (R)}α(t2)={right arrow over (R)}α(t1)+Δ{right arrow over (R)}+{right arrow over (T)}Δ{right arrow over (θ)}[{right arrow over (r)}α].
where {right arrow over (T)}Δ{right arrow over (θ)} is a matrix of rotations to an arbitrary angle Δ{right arrow over (θ)} being applied to the vector location {right arrow over (r)}α of sensor α relative to COM. Any form of the matrix of rotations can be used, expressed via any suitable parameters, such as roll angle θx, pitch angle θy, and yaw angle θz, Euler's angles, or in any other suitable form, e.g. quaternion form, and so on.
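For illustration, both pose updates above can be sketched as follows, with the rotation matrix built from a rotation vector via Rodrigues' formula; the general-case update is expressed through the COM position and the rotated lever arm of the sensor, under the assumption that the AV axes are aligned with the Earth frame at time t1. All names are illustrative.

```python
import numpy as np

def rotation_matrix(delta_theta):
    """Rodrigues' formula: rotation matrix for a rotation vector delta_theta."""
    theta = np.linalg.norm(delta_theta)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(delta_theta, dtype=float) / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def sensor_position_small_angle(R_a_t1, V, omega, r_a, dt):
    """R_a(t2) = R_a(t1) + V*dt + (omega x r_a)*dt  (short dt, small rotation)."""
    return np.asarray(R_a_t1) + np.asarray(V) * dt + np.cross(omega, r_a) * dt

def sensor_position_general(R_com_t1, delta_R, delta_theta, r_a):
    """New Earth-frame sensor position from the COM position at t1, the COM
    displacement delta_R, and the rotated AV-frame lever arm r_a."""
    return (np.asarray(R_com_t1) + np.asarray(delta_R)
            + rotation_matrix(delta_theta) @ np.asarray(r_a))

r_a = np.array([1.5, 0.8, 1.2])  # e.g., a roof-mounted camera relative to the COM
print(sensor_position_small_angle(r_a, [10.0, 0.0, 0.0], [0.0, 0.0, 0.05], r_a, 0.1))
print(sensor_position_general([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.005], r_a))
```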
Over the time interval Δt, some of the reference points can remain stationary (relative to Earth), such as reference point 319 associated with parked car 318 and reference point 321 associated with stop sign 320. Some other reference points could have moved in the meantime, e.g., reference point 317 associated with moving truck 316 and reference point 315 associated with moving bus 314. Denoting the average velocity of j-th reference point over the time interval Δt as {right arrow over (V)}j, the new locations of the reference points relative to Earth are displaced by Δ{right arrow over (R)}j={right arrow over (V)}jΔt:
{right arrow over (R)}j(t2)={right arrow over (R)}j(t1)+{right arrow over (V)}jΔt.
The velocities {right arrow over (V)}j may be determined from lidar data, radar data, tracking data, including data that has been subject to statistical (e.g. Kalman) filtering, and so on. In some implementations, the velocities {right arrow over (V)}j may themselves be determined during sensor calibration as additional fitting parameters.
The new locations of the reference points after time interval Δt relative to the locations of the sensors can be obtained by subtracting the new positions of the sensors relative to Earth from the new locations of the reference points relative to Earth, e.g.,
{right arrow over (r)}j(α)(t2)={right arrow over (r)}j(α)(t1)+({right arrow over (V)}j−{right arrow over (V)})Δt−{right arrow over (ω)}×{right arrow over (r)}αΔt,
in the instances of small time intervals and rotation angles, and
{right arrow over (r)}j(α)(t2)={right arrow over (r)}j(α)(t1)+Δ{right arrow over (R)}j−Δ{right arrow over (R)}−{right arrow over (T)}Δ{right arrow over (θ)}[{right arrow over (r)}α],
in the instances of arbitrary time intervals and rotation angles.
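By way of example, the small-interval prediction above translates directly into code; the deviation of an actual observation at time t2 from this prediction is the kind of error the loss functions below penalize. Variable names are illustrative.

```python
import numpy as np

def predict_relative_location(r_j_alpha_t1, V_j, V, omega, r_alpha, dt):
    """r_j^(a)(t2) = r_j^(a)(t1) + (V_j - V)*dt - (omega x r_a)*dt."""
    r_j_alpha_t1, V_j, V = map(np.asarray, (r_j_alpha_t1, V_j, V))
    return r_j_alpha_t1 + (V_j - V) * dt - np.cross(omega, r_alpha) * dt

# Stationary stop sign (V_j = 0) seen 20 m ahead while the AV drives at 10 m/s
# and yaws slightly: after 0.1 s the sign appears roughly 1 m closer.
print(predict_relative_location([20.0, 0.0, 1.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0],
                                [0.0, 0.0, 0.02], [1.5, 0.8, 1.2], 0.1))
```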
More specifically,
Sensor α may track observed motion of reference point j over multiple frames, denoted with time stamps tk, to obtain multiple values {right arrow over (r)}j(α)(tk). A particular sensor may be tracking a longitudinal distance to reference point j differently than lateral coordinates of reference point j. For example, if sensor α is a camera, sensor α may track lateral coordinates of reference point j based on the coordinates of pixel(s) in the image plane of the camera and may track longitudinal (e.g., radial) coordinates by determining a focal length of the camera that provides the sharpest image of the reference point. Accordingly, accuracy of the lateral and radial distance determination may be different (e.g., a lateral distance may be determined with a higher accuracy). Similarly, if sensor α is a lidar (or radar), sensor α may track lateral coordinates of reference point j based on the angular information obtained from the lidar (or radar) transmitter/receiver and may track radial coordinates based on the time of flight of lidar (or radar) signals.
Direction of longitudinal sensing by sensor α is denoted herein by a unit vector {right arrow over (n)}α. For example, for a camera sensor, direction {right arrow over (n)}α can be the direction of an optical axis of the sensor, which may be perpendicular to the imaging plane of the camera (e.g., a plane of photodetectors of the camera). Correspondingly, sensor α can be separately keeping track of the longitudinal motion of reference point j, [{right arrow over (r)}j(α)(t)]∥={right arrow over (r)}j(α)(t)·{right arrow over (n)}α:
[{right arrow over (r)}j(α)(t2)]∥=[{right arrow over (r)}j(α)(t1)]∥+Δ{right arrow over (R)}j·{right arrow over (n)}α−Δ{right arrow over (R)}·{right arrow over (n)}α−{right arrow over (T)}Δ{right arrow over (θ)}[{right arrow over (r)}α]·{right arrow over (n)}α,
and the lateral motion of reference point j, [{right arrow over (r)}j(α)(t)]⊥={right arrow over (r)}j(α)(t)−{right arrow over (n)}α({right arrow over (r)}j(α)(t)·{right arrow over (n)}α), or in a different but substantially similar (rotated) representation, [{right arrow over (r)}j(α)(t)]⊥={right arrow over (r)}j(α)(t)×{right arrow over (n)}α:
[{right arrow over (r)}j(α)(t2)]⊥=[{right arrow over (r)}j(α)(t1)]⊥+Δ{right arrow over (R)}j×{right arrow over (n)}α−Δ{right arrow over (R)}×{right arrow over (n)}α−{right arrow over (T)}Δ{right arrow over (θ)}[{right arrow over (r)}α]×{right arrow over (n)}α.
For reference points that are not moving relative to Earth, Δ{right arrow over (R)}j=0.
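For illustration, the longitudinal/lateral split used in the two equations above amounts to a projection onto the direction of view and the remainder perpendicular to it; a minimal sketch (hypothetical names) follows.

```python
import numpy as np

def longitudinal(r, n):
    """[r]_par = r . n  (n assumed to be a unit vector)."""
    return float(np.dot(r, n))

def lateral(r, n):
    """[r]_perp = r - n (r . n), the projection onto the plane normal to n."""
    r, n = np.asarray(r, dtype=float), np.asarray(n, dtype=float)
    return r - n * np.dot(r, n)

n_cam = np.array([1.0, 0.0, 0.0])   # camera looking along the x axis
r = np.array([20.0, 3.0, 1.0])      # apparent location of a reference point
print(longitudinal(r, n_cam), lateral(r, n_cam))   # 20.0 and [0., 3., 1.]
```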
Radial [{right arrow over (r)}j(α)(t)]∥ and lateral [{right arrow over (r)}j(α)(t)]⊥ coordinates can be identified for individual frames by reference object detection module 210 and tracked across multiple frames tk by reference object tracking module 230. Sensor parameterization module 220 may assign values to different parameters of sensors, e.g., based on manufacturing specification or an outcome of the last calibration. The parameters can include the positions {right arrow over (r)}α of the sensors, the direction of the optical axes of the sensors {right arrow over (n)}α, focal lengths of the sensors fα, optical aberration, and the like. Sensor calibration module 260 may optimize at least some of these parameters using a loss function 250 suitably chosen to characterize a mismatch between expected (based on the assigned parameter values) and observed coordinates of various reference points. In one non-limiting example, the loss function for sensor α is a quadratic function that evaluates errors in cross-frame predictions for radial and lateral coordinates, e.g.,
Ωα=[Ωα]∥+[Ωα]⊥,
where [Ωα]∥ is a weighted (with weight Wα∥) squared difference between a measured radial coordinate of reference point [{right arrow over (r)}j(α)(t2)]∥ and the expected radial coordinate, as determined based on the previously measured radial coordinate of the same point [{right arrow over (r)}j(α)(t1)]∥. A sum may be taken over some or all reference points j (e.g., reference points that are visible to detector α).
Similarly, [Ωα]⊥ is a weighted (with weight Wα⊥) squared difference between measured lateral coordinates of the reference point [{right arrow over (r)}j(α)(t2)]⊥ and the expected lateral coordinates, as determined based on the previously measured lateral coordinates of the same point [{right arrow over (r)}j(α)(t1)]⊥, summed over some or all (visible) reference points.
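As a non-limiting example, a per-sensor loss of the form Ωα=[Ωα]∥+[Ωα]⊥ can be sketched as follows, assuming the expected (predicted) coordinates for time t2 have already been computed; the weights and the simple quadratic form are illustrative choices.

```python
import numpy as np

def sensor_loss(measured_t2, predicted_t2, n_a, w_par=1.0, w_perp=1.0):
    """Omega_a = [Omega_a]_par + [Omega_a]_perp over all visible reference points.

    measured_t2, predicted_t2: lists of 3D apparent locations r_j^(a), one entry
    per reference point j; n_a: unit direction of view of sensor a."""
    n_a = np.asarray(n_a, dtype=float)
    loss_par = loss_perp = 0.0
    for r_meas, r_pred in zip(measured_t2, predicted_t2):
        d = np.asarray(r_meas, dtype=float) - np.asarray(r_pred, dtype=float)
        d_par = np.dot(d, n_a)          # longitudinal (radial) error
        d_perp = d - n_a * d_par        # lateral error
        loss_par += w_par * d_par ** 2
        loss_perp += w_perp * float(d_perp @ d_perp)
    return loss_par + loss_perp

measured = [[19.0, 0.1, 1.0], [30.2, -2.0, 0.5]]
predicted = [[19.1, 0.0, 1.0], [30.0, -2.1, 0.5]]
print(sensor_loss(measured, predicted, [1.0, 0.0, 0.0], w_par=0.5, w_perp=2.0))
```

Assigning a larger lateral weight than longitudinal weight would reflect, for example, a camera whose lateral resolution exceeds its radial resolution.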
In some implementations, each loss function Ωα is minimized independently of other loss functions, with parameters of sensor α determined based on minimization of Ωα. In some implementations, minimization is performed for an aggregate loss function, e.g., Ω=ΣαΩα, summed over multiple (or all) sensors.
In the aggregate loss function, optimization can be performed with respect to at least some or all of the following parameters: direction of view {right arrow over (n)}α of sensor α, location {right arrow over (r)}α of sensor α (relative to COM 330 of the AV), displacement Δ{right arrow over (R)}j of reference point j (relative to Earth), displacement Δ{right arrow over (R)} of the AV (relative to Earth), a focal length of a camera sensor, and so on. In some implementations, the displacement Δ{right arrow over (R)}j of reference point j and displacement Δ{right arrow over (R)} of the AV may be determined based on data obtained from multiple sensors. For example, a lidar sensor may be capable of identifying coordinates of each reference point for different sensing frames. In some instances, the motion of the AV can be determined by first identifying (e.g., by perception system 130) those reference points that remain stationary relative to Earth. For example, such a determination can be performed by identifying multiple lidar return points that have the same velocity relative to the AV; such points can be associated with objects that are stationary with respect to Earth (buildings, structures, curbs, fences, bridges, traffic signs, etc.). Similarly, locations of various reference objects can be determined from radar data. As another example, a camera can determine locations of the reference points based on a two-dimensional pixel image of a portion of the driving environment and the focal distance that provides a sharp image of the respective reference points. In some implementations, locations of reference objects may be averages, e.g., weighted averages, of locations obtained by multiple sensors. The weighted average locations can be computed using empirically selected weights, which can be determined by field-testing.
Although the loss function used as an example above is a squared error loss function, any other suitable loss function can be used instead, including but not limited to an absolute error loss function, a Huber loss function, a log-cosh loss function, and the like. Parameters of the loss function, e.g., weights Wα∥ and Wα⊥, can be selected empirically. In some implementations, the weights (or other parameters) can be fixed. In some implementations, higher weights can be assigned to the sensor (or sensors) being calibrated and lower weights assigned to other sensors. In some instances, multiple (or all) sensors can be calibrated at once. Although in the above example the weights for the lateral distance errors are the same (e.g., Wα⊥), in some implementations errors in the vertical distance can be weighted differently than errors in the horizontal distance. Numerous other loss evaluation techniques and schemes can be used to optimize a suitably chosen loss function and determine sensor calibration parameters. Similarly, in those instances where a particular subset of sensors (e.g., one or more cameras and a lidar) is used together during perception processing of the AV, such a subset of sensors can be weighted differently than other sensors. For example, weights Wα∥ and Wα⊥ for the sensors of the subset can be larger than the weights for other sensors. In some implementations, calibration can be performed until a target accuracy is achieved, e.g., when the sum of the loss functions (e.g., for all or a subset of sensors) is within a certain threshold. The threshold can be specific to a particular type of sensor being calibrated and to the loss function being used, and may be determined based on empirical testing. For example, it may be empirically determined that a square error loss function needs to be below a first threshold for a direction of view of a camera to be determined within a 0.5-degree accuracy and below a second threshold for a higher 0.1-degree accuracy.
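As a purely synthetic end-to-end illustration of this optimization, the sketch below recovers a small yaw misalignment of one camera by minimizing an aggregate squared-error loss against lidar estimates of the same reference points; SciPy's general-purpose minimizer stands in for whichever optimization procedure sensor calibration module 260 actually employs, and all data are simulated.

```python
import numpy as np
from scipy.optimize import minimize

def yaw_matrix(yaw):
    """Rotation about the vertical (z) axis by the given yaw angle (radians)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
true_points = rng.uniform([5.0, -10.0, 0.0], [60.0, 10.0, 3.0], size=(12, 3))  # reference points (AV frame)
true_yaw_error = np.deg2rad(1.0)  # unknown camera misalignment to be recovered

# Simulated observations: lidar is treated as the trusted sensor, the camera
# reports points rotated by its (unknown) yaw error; both carry small noise.
lidar_estimates = true_points + rng.normal(0.0, 0.02, true_points.shape)
camera_reports = true_points @ yaw_matrix(true_yaw_error).T + rng.normal(0.0, 0.05, true_points.shape)

def loss(params):
    # Undo the candidate yaw correction and compare against the lidar estimates.
    corrected = camera_reports @ yaw_matrix(-params[0]).T
    return float(np.sum((corrected - lidar_estimates) ** 2))

result = minimize(loss, x0=[0.0], method="Nelder-Mead")
print(f"recovered yaw misalignment: {np.degrees(result.x[0]):.2f} degrees")  # close to 1.00
```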
The described techniques enable calibration of sensors that, individually, can be weakly sensitive to the changing natural scene. For example, a side-facing camera mounted on the AV can have a limited field of view and can detect objects (e.g., oncoming traffic) that move too quickly to allow an accurate calibration for small extrinsic errors in the camera positioning. By concurrently optimizing parameters of a combination of sensors, including sensors that observe the same natural scene from different vantage points, and by using the various feature matching techniques described above, RTC 115 can reliably and efficiently calibrate sensors that otherwise (without data available from other sensors) would not be amenable to accurate calibration.
Method 500 can use real-time sensing data obtained by scanning a driving environment of the AV using a plurality of sensors of the sensing system of the AV. The sensing system can include one or more lidar sensors, radar sensors, and/or one or more camera sensors. Camera sensors can include panoramic (surround-view) cameras, partially panoramic cameras, high-definition (high-resolution) cameras, close-view cameras, cameras having a fixed field of view (relative to the AV), cameras having a dynamic (adjustable) field of view, cameras having a fixed or adjustable focal distance, cameras having a fixed or adjustable numerical aperture, and any other suitable cameras. At block 510, method 500 can include collecting sensing data during operation of the AV. Operation of the AV can include starting the AV, driving the AV (e.g., on streets and highways, rural roads, etc.), stopping the AV, parking the AV, operating the AV in an idling mode, and so on. Collected sensing data can include lidar return points, each return point associated with a distance to a reflecting surface, a direction from the lidar transmitter/receiver towards the reflecting surface, a velocity (e.g., radial velocity) of the reflecting surface, a strength of the reflected signal, and the like. Collected sensing data can further include camera images. Each camera image can include a two-dimensional projection of a portion of the driving environment, obtained using an optical system having a particular focal distance and pointing at a particular direction (direction of view), which can be a direction of the optical axis of the camera's optical system. Camera images can carry information about instantaneous locations of various objects in the driving environment. Camera images can have different accuracy of mapping objects with respect to different directions, e.g., resolution of objects' locations can be greater in the lateral direction (relative to the direction of view) than in the radial direction. The data obtained by the sensing system of the AV can be provided to the data processing system, e.g., to the run-time calibration module (RTC 115).
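For illustration only, the collected sensing data enumerated above might be carried in simple records such as the following; the field layout is an assumption of this sketch and is not prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class LidarReturn:
    distance_m: float                     # range to the reflecting surface
    direction: Tuple[float, float]        # azimuth and elevation from the transmitter/receiver (rad)
    radial_velocity_mps: float            # Doppler-derived radial velocity
    intensity: float                      # strength of the reflected signal

@dataclass
class CameraFrame:
    timestamp_s: float
    focal_length_mm: float
    optical_axis: Tuple[float, float, float]  # direction of view
    image_path: str                       # reference to the stored two-dimensional image

ret = LidarReturn(25.4, (0.10, -0.02), -3.2, 0.87)
frame = CameraFrame(12.30, 8.0, (1.0, 0.0, 0.0), "front_cam_000123.png")
print(ret, frame)
```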
At block 520, method 500 can continue with the RTC identifying, based on the collected sensing data, a first reference point associated with a first object in an environment of the AV. The first reference point can be identified by multiple sensors, e.g., a lidar sensor, a panoramic camera, a frontal view camera, a side view camera, and the like. At block 530, the RTC can determine a first estimated location, e.g., {right arrow over (R)}1(1), of the first reference point using sensing data obtained by a first subset of the plurality of sensors. In some implementations, the first subset of sensors can include one sensor (e.g., a lidar sensor). In some implementations, the first subset of sensors can include multiple sensors (e.g., a lidar sensor and a frontal view camera). At block 540, the RTC can identify a second estimated location of the first reference point, e.g., {right arrow over (R)}1(2), using sensing data obtained by a second subset of the plurality of sensors. The second subset can similarly include a single sensor (e.g., a side view camera) or multiple sensors (e.g., a side view camera and a rear view camera). In some implementations, the first subset and the second subset of the plurality of sensors are overlapping (sharing at least some sensors). Due to errors in calibration of some sensors or changing environmental conditions, the two estimated locations can be different, {right arrow over (R)}1(1)≠{right arrow over (R)}1(2), even though they are related to the same (first) reference point.
At block 550, the RTC can compute a loss function representative of the difference of the first estimated location and the second estimated location, {right arrow over (R)}1(1)−{right arrow over (R)}1(2). In some implementations, the loss function can be a square error loss function, a mean absolute error function, a mean bias error function, a Huber function, a cross entropy function, or a Kullback-Leibler function. In some implementations, the loss function weighs differently a radial part of the difference of the first estimated location and the second estimated location, e.g., ({right arrow over (R)}1(1)−{right arrow over (R)}1(2))∥, and a lateral part, ({right arrow over (R)}1(1)−{right arrow over (R)}1(2))⊥, of the difference. At block 560, method 500 can continue with the RTC adjusting parameters of one or more of the plurality of sensors based on the computed loss function. The parameters of one or more of the plurality of sensors being adjusted can include a location of a sensor on the AV, a direction of view of a camera sensor, a focal length of the camera sensor, the direction of view associated with an optical axis of the camera sensor, and the like. The parameters can be adjusted to reduce the loss function and, therefore, the difference between the first estimated location and the second estimated location.
In some implementations, multiple reference points can be used to adjust parameters of the sensing system. For example, the RTC can identify a second reference point associated with a second object in the environment of the AV and determine a third estimated location, e.g. {right arrow over (R)}2(1), of the second reference point using the sensing data obtained by the first subset of the plurality of sensors (or a different, third subset of the plurality of sensors). The RTC can further determine a fourth estimated location of the second reference point, e.g. {right arrow over (R)}2(2), using the sensing data obtained by the second subset of the plurality of sensors (or a different, fourth subset of the plurality of sensors). The RTC can further apply the loss function to the difference between the third estimated location and the fourth estimated location, {right arrow over (R)}2(1)−{right arrow over (R)}2(2), and perform the optimization of the loss function to adjust one or more parameters of at least some of the sensors. In some implementations, the optimization based on the second reference point can be performed after the optimization based on the first reference point is completed. In some implementations, the optimization based on the second reference point can be performed concurrently and independently of the optimization based on the first reference point. In some implementations, the optimization based on the second reference point can be performed together with the optimization based on the first reference point, e.g., with the loss functions for the two reference points aggregated and optimized together (batch optimization). Any number of reference points can be processed as described above, e.g., one after another or aggregated together.
Example computer device 600 can include a processing device 602 (also referred to as a processor or CPU), a main memory 604 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 606 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory (e.g., a data storage device 618), which can communicate with each other via a bus 630.
Processing device 602 (which can include processing logic 603) represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, processing device 602 can be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 602 can also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In accordance with one or more aspects of the present disclosure, processing device 602 can be configured to execute instructions performing method 500 of run-time calibration of sensors of an autonomous vehicle.
Example computer device 600 can further comprise a network interface device 608, which can be communicatively coupled to a network 620. Example computer device 600 can further comprise a video display 610 (e.g., a liquid crystal display (LCD), a touch screen, or a cathode ray tube (CRT)), an alphanumeric input device 612 (e.g., a keyboard), a cursor control device 614 (e.g., a mouse), and an acoustic signal generation device 616 (e.g., a speaker).
Data storage device 618 can include a computer-readable storage medium (or, more specifically, a non-transitory computer-readable storage medium) 628 on which is stored one or more sets of executable instructions 622. In accordance with one or more aspects of the present disclosure, executable instructions 622 can comprise executable instructions performing method 500 of run-time calibration of sensors of an autonomous vehicle.
Executable instructions 622 can also reside, completely or at least partially, within main memory 604 and/or within processing device 602 during execution thereof by example computer device 600, main memory 604 and processing device 602 also constituting computer-readable storage media. Executable instructions 622 can further be transmitted or received over a network via network interface device 608.
While the computer-readable storage medium 628 is shown in an illustrative example as a single medium, the term "computer-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of executable instructions.
Some portions of the detailed descriptions above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “identifying,” “determining,” “storing,” “adjusting,” “causing,” “returning,” “comparing,” “creating,” “stopping,” “loading,” “copying,” “throwing,” “replacing,” “performing,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Examples of the present disclosure also relate to an apparatus for performing the methods described herein. This apparatus can be specially constructed for the required purposes, or it can be a general purpose computer system selectively programmed by a computer program stored in the computer system. Such a computer program can be stored in a computer readable storage medium, such as, but not limited to, any type of disk including optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic disk storage media, optical storage media, flash memory devices, any other type of machine-accessible storage media, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The methods and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems can be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems appears as set forth in the description above. In addition, the scope of the present disclosure is not limited to any particular programming language. It will be appreciated that a variety of programming languages can be used to implement the teachings of the present disclosure.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementation examples will be apparent to those of skill in the art upon reading and understanding the above description. Although the present disclosure describes specific examples, it will be recognized that the systems and methods of the present disclosure are not limited to the examples described herein, but can be practiced with modifications within the scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense. The scope of the present disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.