This patent document claims the benefit of U.S. patent application Ser. No. 15/996,292, filed on Jun. 1, 2018, which is incorporated herein by reference in its entirety for all purposes.
This document relates to camera position and orientation estimation based on captured images.
Autonomous vehicle navigation is a technology for sensing the position and movement of a vehicle and, based on the sensing, autonomously controlling the vehicle to navigate towards a destination. Autonomous vehicle navigation can have important applications in transportation of people, goods and services. One of the components of autonomous driving, which ensures the safety of the vehicle and its passengers, as well as people and property in the vicinity of the vehicle, is reliable navigation. Reliably navigating in urban environments requires precise location information of cars and obstacles adjacent to the vehicle.
Disclosed are devices, systems and methods for robust camera pose estimation. This may be achieved by incorporating a moving object smoothness constraint into a known camera pose estimation objective function, wherein the moving object smoothness constraint is based on object detection and tracking results.
In one aspect, the disclosed technology can be used to provide a method for robust camera pose estimation. This method, implemented in a vehicle, may include determining a first bounding box based on a previous frame, determining a second bounding box based on a current frame that is temporally subsequent to the previous frame, estimating the camera pose by minimizing a weighted sum of a camera pose function and a constraint function, where the camera pose function tracks a position and an orientation of the camera in time, and where the constraint function is based on coordinates of the first bounding box and coordinates of the second bounding box, and using the camera pose for navigating the vehicle.
In another exemplary aspect, the above-described methods are embodied in the form of processor-executable code and stored in a computer-readable program medium.
In yet another exemplary aspect, devices that are configured or operable to perform the above-described methods are disclosed.
The above and other aspects and features of the disclosed technology are described in greater detail in the drawings, the description and the claims.
Camera pose estimation is an important component of vision-based autonomous driving, where it is used to infer the locations of moving objects by backprojecting the detection results in the image to the 3D world. The pose of a camera is its position and orientation with respect to a reference coordinate system. In some embodiments, the pose includes a three-dimensional orientation and a three-dimensional translation. In an example, the orientation of an adjacent vehicle or object with respect to a host vehicle may be evolving, and may need to be tracked continuously in the frame of reference of the host vehicle, e.g., in its reference coordinate system, to ensure safe driving and maneuvering of the host vehicle.
In a vehicular system, the camera pose changes as a function of time due to the relative movements of the host and adjacent vehicles that must be accounted for, in the reference coordinate system of the host vehicle, to ensure that estimates of the relative locations of adjacent vehicles and obstacles are accurate. Unreliable estimates of the camera pose may prove very detrimental for autonomous driving and navigation. Embodiments of the disclosed technology advantageously provide a moving object smoothness constraint in addition to a known camera pose estimation objective function to improve the robustness of the pose estimation by incorporating object detection and tracking results. For example, the projection error between the current detection results and the projection of the moving object from tracking is minimized.
Existing systems may directly use the estimate of the camera pose to determine the vehicle position, as well as the positions/locations of adjacent vehicles. However, this approach is very sensitive to the raw estimate of the camera pose, which may be subject to many sources of error. Embodiments of the disclosed technology adopt an approach that relies on regularization, which improves the robustness of the resulting estimate.
The regularization term (also referred to as the constraint function in this document) penalizes perturbations in the solution to an optimization problem. In this case, the resulting camera pose estimate is less susceptible to perturbations that may result from a variety of factors (e.g., host vehicle movement, road vibrations, calibration errors) since it also relies on the location of an adjacent vehicle. More specifically, the temporal correlation or continuity of the location of the adjacent vehicle is used to suppress perturbations in the camera pose estimate. In other words, adding the regularization term encodes the assumption that the location of an adjacent vehicle does not change dramatically between two consecutive frames.
This patent document provides an exemplary mathematical framework for adding the regularization term to the camera pose objective function, and discusses its implementation in the safe navigation of vehicles, both fully-autonomous and semi-autonomous.
At a later time t2>t1, the camera is in a different (or second) pose 140 with respect to the first position of the camera 130. In some embodiments, the difference in the camera pose may be due to the movement of the host vehicle. The camera in the second pose 140 at time t2 captures its own image (or frame) 180 of the car 120. A second bounding box 160 identifies the rear of the car in the second frame, in which the car has moved relative to its position in the first frame 170. In other words, the bounding box for the same vehicle is tracked across different frames. In some embodiments, the car 120 is being tracked with high confidence by the tracking system in the vehicle that is implementing the disclosed technology.
The dashed line in
In automatic navigation, to control a vehicle's speed and bearing, it is useful to obtain a quantitative estimate of an object's movement (e.g., whether the next car is now closer or farther than it was at a previous time), and also of camera movement (to avoid unnecessarily accelerating or braking the vehicle based on camera pose fluctuations).
In some embodiments, the first bounding box 150 and the second bounding box 160 (which correspond to the same adjacent car 120 at different times), and the assumption that the car 120 could not have moved (or shifted locations) dramatically between times t1 and t2, may be used to constrain the camera movement between the two times when estimating the position and orientation of the camera. More generally, the bounding boxes from temporally different frames may be used to smooth the estimation of the camera pose, among other uses.
In some embodiments, the input image at time t is referred to as $I_t: \Omega \rightarrow \mathbb{R}^3$, where $\Omega \subset \mathbb{R}^2$ is the image domain. The corresponding 6 degree-of-freedom (DoF) camera pose in the local navigation coordinates, whose orientation may be parameterized in several ways (e.g., Euler angles, the Lie algebra so(3), unit quaternions, rotation matrices, and so on), may be represented as a 3D rigid body transformation matrix $P_t \in SE(3)$. In an example, the camera pose

$$P_t = \begin{bmatrix} R & T \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

is parameterized with the 6×1 vector $\xi \in \mathfrak{se}(3)$, where $R$ is a 3×3 rotation matrix, $T$ is a 3×1 translation vector, and $\mathfrak{se}(3)$ is the Lie algebra corresponding to the Special Euclidean group $SE(3)$. The function $G: \mathfrak{se}(3) \rightarrow SE(3)$ is defined as a function to form the rigid body transformation matrix. A 2D point in an image is denoted as $x = (u, v)^\top \in \Omega$. Its corresponding homogeneous vector is defined as $\dot{x} = (u, v, 1)^\top$, and de-homogenization is defined by $\pi(X) = (x/z, y/z)^\top$ for $X = (x, y, z)^\top$.
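The text does not specify how ξ is ordered or how G is evaluated. As a minimal sketch, assuming G is the standard exponential map from se(3) to SE(3) and that ξ stacks a translational part ρ before a rotational part ω (both assumptions), the pose matrix and the de-homogenization π can be computed as follows. The helper name `hat` is hypothetical.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix of a 3-vector, so that hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def G(xi):
    """Exponential map se(3) -> SE(3) (assumed form of G).

    Assumes xi = (rho, omega): translational part first, rotational part last.
    Returns the 4x4 rigid body transformation [[R, T], [0, 1]].
    """
    xi = np.asarray(xi, dtype=float)
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-9:
        # First-order approximation near the identity rotation.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * W + b * (W @ W)  # Rodrigues' rotation formula
        V = np.eye(3) + b * W + c * (W @ W)  # left Jacobian of SO(3)
    P = np.eye(4)
    P[:3, :3] = R
    P[:3, 3] = V @ rho
    return P


def pi(X):
    """De-homogenization: (x, y, z) -> (x/z, y/z)."""
    X = np.asarray(X, dtype=float)
    return X[:2] / X[2]
```

With this choice, $P_t = G(\xi)$, and a rotation of π/2 about the z axis, `G([0, 0, 0, 0, 0, np.pi / 2])`, maps the x axis onto the y axis.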
Based on the framework described above, an exemplary algorithm for incorporating a smoothness constraint into a known camera pose estimation objective function is defined as:
(1) The bottom center point of the detected bounding box of the moving object in a previous frame can be backprojected to the 3D world to get its 3D position p on the ground using a terrain map and the estimated camera pose.
(2) A plane passing through p, whose normal is the road direction at p, is constructed. The four corners of the detected bounding box are backprojected and intersected with this plane. Thus, the 3D positions of the four corners of the rear bounding box of the moving object may be determined.
(3) Tracking of the moving object is used to predict the 3D position of the four corners of the bounding box in the current frame.
(4) The camera pose is computed by minimizing the sum of an objective function $f(\xi)$ and the moving object smoothness constraint,

$$\xi^* = \arg\min_{\xi} \; f(\xi) + \lambda \sum_{i} \sum_{j=1}^{4} \left\| x_{i,j} - \hat{x}_{i,j} \right\|_2^2,$$

where $\lambda$ is the coefficient balancing the known objective function $f(\xi)$ and the smoothness constraint, $i$ iterates over the moving objects included in the optimization function, $j$ iterates over the four corners of the bounding box, $x_{i,j}$ is the detection result in the current frame, $\dot{X}_{i,j}$ is the homogeneous vector corresponding to the 3D corner point $X_{i,j}$, and $\hat{x}_{i,j}$ is the projection of the predicted 3D position of the corner into the current frame, defined as follows:

$$\hat{x}_{i,j} = \pi\left(K \, [I_{3\times3} \mid \mathbf{0}] \, G(\xi) \, \dot{X}_{i,j}\right),$$

where $K$ is the camera intrinsic matrix, which is formed from the intrinsic parameters of the camera. In some embodiments, the intrinsic parameters may include the focal length of the camera, the angle of view, and/or the center of the image captured by the camera.
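Steps (1) to (3) and the projection of the predicted corners can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the pose is passed as a 4×4 world-to-camera matrix (playing the role of G(ξ)), a flat ground plane z = ground_z stands in for the terrain map, the world z axis points up, and a constant-velocity model stands in for the tracker. All function names are hypothetical.

```python
import numpy as np


def pixel_ray(K, T_cw, x):
    """World-frame ray (center, unit direction) through pixel x for a world-to-camera pose T_cw."""
    R, t = T_cw[:3, :3], T_cw[:3, 3]
    center = -R.T @ t  # camera center in world coordinates
    d = R.T @ np.linalg.solve(K, np.array([x[0], x[1], 1.0]))
    return center, d / np.linalg.norm(d)


def intersect_plane(center, d, p, n):
    """Point where the ray center + s * d meets the plane through p with normal n."""
    s = n @ (p - center) / (n @ d)
    return center + s * d


def rear_face_corners(K, T_cw, box, ground_z=0.0, road_dir=(1.0, 0.0, 0.0)):
    """Steps (1)-(2): 3D corners of a box (u_min, v_min, u_max, v_max).

    Hypothetical simplification: a flat ground plane replaces the terrain map.
    """
    u0, v0, u1, v1 = box
    # Step (1): backproject the bottom center of the box onto the ground.
    c, d = pixel_ray(K, T_cw, ((u0 + u1) / 2.0, v1))
    p = intersect_plane(c, d, np.array([0.0, 0.0, ground_z]), np.array([0.0, 0.0, 1.0]))
    # Step (2): plane through p whose normal is the road direction at p.
    n = np.asarray(road_dir, dtype=float)
    corners = [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
    return np.array([intersect_plane(*pixel_ray(K, T_cw, x), p, n) for x in corners])


def predict_corners(X, velocity, dt):
    """Step (3): hypothetical constant-velocity prediction of the 3D corners."""
    return X + np.asarray(velocity, dtype=float) * dt


def project(K, T_cw, X):
    """x_hat = pi(K [I | 0] T_cw X_dot) for each row of X (N x 3)."""
    X_dot = np.hstack([X, np.ones((len(X), 1))])  # homogeneous 3D points
    Xc = (K @ np.hstack([np.eye(3), np.zeros((3, 1))]) @ T_cw @ X_dot.T).T
    return Xc[:, :2] / Xc[:, 2:3]
```

Backprojecting a box that was rendered from a known rear face should recover that face, which is a quick consistency check on the camera conventions before the corners are fed into the smoothness term.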
In some embodiments, the smoothness constraint may be reinterpreted as

$$\sum_{i} \sum_{j=1}^{P} \left\| x_{i,j} - \hat{x}_{i,j} \right\|_2^2,$$

where, similar to the description above, $i$ iterates over the moving objects included in the optimization function, $j$ iterates over the $P$ corners of a generic bounding polygon, $x_{i,j}$ is the detection result in the current frame, and $\hat{x}_{i,j}$ is the projection of the predicted 3D position of the $P$ corners into the current frame. Embodiments of the disclosed technology may use this alternate interpretation of the smoothness constraint to use detection and prediction results from different frames for different vehicles. For example, the Nth and (N−1)th frames may be used to determine the bounding polygons for a first vehicle, whereas the Nth and (N−3)th frames may be used to determine the bounding polygons for a second vehicle. Using different frames to compute the bounding boxes for different vehicles advantageously enables, for example, the disclosed implementations to account for different speeds of adjacent vehicles.
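The smoothness term and the weighted objective can be evaluated as below. The squared Euclidean norm is an assumption, as are the names `smoothness_constraint` and `total_objective`; `f` and `project_fn` are caller-supplied placeholders for the known objective and for the projection of the predicted corners. Storing one array per object lets each tracked vehicle use its own number of polygon corners P.

```python
import numpy as np


def smoothness_constraint(detections, projections):
    """Sum over objects i and polygon corners j of ||x_ij - x_hat_ij||^2 (assumed squared L2).

    detections, projections: lists holding one (P_i, 2) array per tracked object,
    where P_i may differ between objects (4 for a box, more for a polygon).
    """
    if len(detections) != len(projections):
        raise ValueError("need one projection array per detected object")
    total = 0.0
    for x, x_hat in zip(detections, projections):
        diff = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
        total += float(np.sum(diff ** 2))
    return total


def total_objective(f, xi, lam, detections, project_fn):
    """f(xi) + lam * smoothness term; project_fn(xi) returns the x_hat arrays for pose xi."""
    return f(xi) + lam * smoothness_constraint(detections, project_fn(xi))
```

The resulting scalar can then be handed to a general-purpose nonlinear optimizer over ξ; λ trades off how strongly the tracked vehicles pull the pose estimate.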
In some embodiments, the bounding polygon used for each of the tracked vehicles may be distinct. For example, a rectangular bounding box may be used for a first set of vehicles, and a many-cornered polygon (e.g., 12-15 corners) may be used for a second set of vehicles. Using different polygons for different vehicles advantageously enables, for example, the disclosed implementations to account for different cross-sections of adjacent vehicles.
In some embodiments, the coefficient λ that balances the known objective function and the smoothness constraint, as well as the confidence level cutoff (used to determine the number of moving objects included in the optimization function), may be selected based on experimental results. For example, different parameters may be tested, and the values that are robust and provide the best performance may be used. In other embodiments, the values may be updated periodically based on additional experimental results.
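One way to carry out such an experimental selection is a grid search. The sketch below assumes a hypothetical `evaluate(lam, cutoff, seq)` that returns an error metric for one recorded sequence, and treats "robust" as minimizing the worst-case error across sequences; both choices are illustrative.

```python
from itertools import product


def select_parameters(lambdas, cutoffs, sequences, evaluate):
    """Grid search for the (lambda, cutoff) pair with the lowest worst-case error.

    evaluate(lam, cutoff, seq) -> error on one recorded sequence (hypothetical).
    Scoring by the maximum error across sequences favors settings that are robust.
    """
    def worst_case(params):
        lam, cutoff = params
        return max(evaluate(lam, cutoff, seq) for seq in sequences)

    return min(product(lambdas, cutoffs), key=worst_case)
```

Rerunning the same search as new recordings become available is one way to update the values periodically.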
In an example, the objective function may be based on the transformation from world coordinates to camera coordinates, and its optimization yields the position and orientation of the camera, i.e., the camera pose.
The method 400 includes, at step 420, determining a second bounding box based on a current frame that is temporally subsequent to the previous frame. In some embodiments, and as described in step (3) of the algorithm, a second bounding box may be determined based on the current frame. In some embodiments, the previous and current frames are selected based on a frame rate used by the camera. For example, consecutive frames from a camera with a slow frame rate (e.g., 24 or 25 frames/sec) may be used to determine the first and second bounding boxes. In another example, intermediate frames from a camera with a fast frame rate (e.g., 60 frames/sec) may be dropped between the frames used to determine the first and second bounding boxes. In some embodiments, the choice of frame rate and the selection of frames to determine the bounding boxes may be based on the available hardware and computational processing power, as well as prevailing traffic conditions.
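A minimal sketch of frame selection, assuming a hypothetical target pairing rate `target_fps` chosen from the available compute and traffic conditions: the stride is the number of camera frames between the previous and current frame, so a stride of 1 uses consecutive frames and a larger stride drops intermediate frames.

```python
def frame_stride(camera_fps, target_fps):
    """Frames to advance between the previous and current frame (at least 1).

    e.g. a 24-25 fps camera with target_fps=25 gives 1 (consecutive frames);
    a 60 fps camera with target_fps=20 gives 3 (two intermediate frames dropped).
    """
    return max(1, round(camera_fps / target_fps))


def frame_pairs(frame_indices, stride):
    """(previous, current) frame index pairs separated by `stride` frames."""
    return [(frame_indices[k - stride], frame_indices[k])
            for k in range(stride, len(frame_indices), stride)]
```

For example, with a stride of 3 the frames 0 to 6 yield the pairs (0, 3) and (3, 6).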
In some embodiments, the first bounding box and the second bounding box may be of different sizes. For example, the method 400 may initially use consecutive bounding boxes of the same size. However, subsequently during operation, upon determining that the adjacent vehicle is moving away at a high speed and/or the camera is using a slow frame rate, the method 400 may select the second bounding box to be smaller than the first bounding box. Similarly, if the vehicle is closing in on the adjacent vehicle, then the second bounding box may be made larger than the first bounding box to fit the increasing size of the adjacent vehicle in the image.
The method 400 may include generating an initial estimate of the camera pose, which may be used in the determination of the first and second bounding boxes. In some embodiments, the initial estimate of the camera pose may be based on a Global Positioning System (GPS) sensor and/or an Inertial Measurement Unit (IMU). Using a previous estimate of the camera pose as an initial estimate for subsequent processing is not recommended since vibrations of the camera rack (for example, due to the road surface) may induce a drift in the camera pose. Instead, the GPS sensor and IMU, which provide measurements independent of the previous estimate, are used to generate the initial estimate.
The method 400 includes, at step 430, estimating the camera pose. In some embodiments, the camera pose may be estimated by minimizing a weighted sum of a camera pose function and a constraint function, where the camera pose function tracks a position and an orientation of the camera in time, and where the constraint function is based on coordinates of the first bounding box and coordinates of the second bounding box. In some embodiments, the camera pose may be estimated in a reference coordinate system of the host vehicle, with the orientation represented using any suitable parameterization (e.g., Euler angles, the Lie algebra so(3), unit quaternions, rotation matrices, and so on).
In some embodiments, and in the context of networked V2V (vehicle-to-vehicle) communications, the estimation of the camera pose may be augmented by location and/or orientation information received from other vehicles. For example, an adjacent car may transmit its location to the host vehicle, and the location information may be used to refine the camera pose estimate. Embodiments of the disclosed technology may receive this location information in a coordinate system that is different from the reference coordinate system. In these scenarios, the location information will be converted to the coordinate system of the host vehicle, and then incorporated into the estimation of the camera pose.
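The conversion step amounts to applying a rigid body transformation. The sketch assumes the 4×4 transform from the sender's coordinate system to the host vehicle's reference coordinate system is already known (how it is obtained is outside this sketch), and the function name is hypothetical.

```python
import numpy as np


def to_host_frame(p_remote, T_host_from_remote):
    """Convert a 3D point from a remote (e.g., V2V sender's) frame to the host frame.

    T_host_from_remote: assumed-known 4x4 rigid transform [[R, T], [0, 1]].
    """
    p = np.append(np.asarray(p_remote, dtype=float), 1.0)  # homogeneous point
    return (T_host_from_remote @ p)[:3]
```

The converted location can then be compared against, or added to, the constraints already used in the camera pose estimate.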
In some embodiments, the camera pose estimate may be based on any road markings or signs that are available in the images/frames captured. For example, since the exact positions and orientations of freeway on-ramp and off-ramp signs are known, their locations may be used to improve the estimation of the camera pose. The optimization function (which is the weighted sum of a camera pose function and a constraint function) may be augmented by another function that incorporates the known location and/or orientation of fixed objects, thereby further reducing the estimation error.
In some embodiments, and as described in step (4) of the algorithm, the constraint function (or regularization term) may iterate over the moving objects included in the optimization function. In other words, the bounding boxes corresponding to multiple vehicles may be incorporated in generating a robust estimate of the camera pose. In some embodiments, only those vehicles that are being tracked with high confidence (e.g., tracking error less than 5%) are included.
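Selecting which vehicles enter the sum over i can be sketched as a simple filter. The `Track` record and its fractional `tracking_error` field are hypothetical stand-ins for whatever the tracker reports; the 5% threshold is the example value from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    """Hypothetical tracker output for one adjacent vehicle."""
    track_id: int
    tracking_error: float  # fractional error, e.g. 0.03 means 3%


def high_confidence(tracks: List[Track], max_error: float = 0.05) -> List[Track]:
    """Keep only tracks whose tracking error is strictly below max_error."""
    return [t for t in tracks if t.tracking_error < max_error]
```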
In some embodiments, the weight (e.g., λ) corresponding to the constraint function may be determined experimentally. In other embodiments, it may be based on previous estimates of the camera pose. In yet other embodiments, it may be based on the tracking error. More generally, this weight balances the known objective function and the smoothness constraint, and may be kept constant, or varied over different timescales.
The method 400 includes, at step 440, using at least the camera pose for navigating the vehicle. In some embodiments, the first and second bounding boxes may correspond to the rear of a car that is in front of the vehicle. Accurately determining the positions of adjacent cars based on robust camera pose estimates enables the safe navigation of the vehicle.
In some embodiments, the acceleration, deceleration and/or steering of the vehicle may be based on the camera pose estimate and on the locations of adjacent vehicles determined using it. For example, a safe following distance from the vehicle directly in front of the host vehicle may be maintained based on the camera pose estimate.
Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing unit” or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
It is intended that the specification, together with the drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Additionally, the use of “or” is intended to include “and/or”, unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
Number | Date | Country
---|---|---
20200126255 A1 | Apr 2020 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15996292 | Jun 2018 | US
Child | 16717937 | | US