An embodiment of the invention relates generally to capturing images, and in particular, to performing an iterative bundle adjustment for an imaging device.
Digital imaging is an important aspect of many electronic devices, including mobile devices. Many electronic devices may comprise multiple cameras, where the images from the multiple cameras may require image processing to obtain a desirable image.
Bundle Adjustment (BA) is widely used, for example, in Augmented Reality (AR), Virtual Reality (VR) and computer vision. Applications requiring Bundle Adjustment include pose estimation, 3D reconstruction, and image stitching, for example. Bundle Adjustment may often be the most computationally intensive part of an image processing pipeline. Therefore, reducing the latency of Bundle Adjustment is important for enabling a higher system frame rate.
Accordingly, there is a need for methods and devices that improve the performance of digital imaging, including Bundle Adjustment in digital imaging.
A method of performing an iterative bundle adjustment for an imaging device is described. The method comprises implementing a plurality of functions in performing a bundle adjustment. Predetermined functions of the plurality of functions may be started, using a processor, for a second iteration in parallel with a first iteration of the plurality of functions. The result of the predetermined functions started during the first iteration may be used in a second iteration. An output of the bundle adjustment may then be generated for successive iterations.
An imaging device performing an iterative bundle adjustment is also described, where the imaging device may comprise a processor configured to implement a plurality of functions in performing the bundle adjustment. The processor may also start predetermined functions of the plurality of functions for a second iteration in parallel with a first iteration of the plurality of functions; use the result of the predetermined functions started during the first iteration in a second iteration; and generate an output of the bundle adjustment for successive iterations.
A non-transitory computer-readable storage medium having data stored therein representing software executable by a computer for performing an iterative bundle adjustment is also described, where the software may comprise instructions for implementing a plurality of functions in performing a bundle adjustment; starting predetermined functions of the plurality of functions for a second iteration in parallel with a first iteration of the plurality of functions; using the result of the predetermined functions started during the first iteration in a second iteration; and generating an output of the bundle adjustment for successive iterations.
The circuits and methods set forth below reduce the latency of a Bundle Adjustment operation, such as in a device receiving images. Bundle Adjustment generally relates to an adjustment of camera parameters for multiple cameras based upon the reflection of light from objects received by the cameras. A Bundle Adjustment operation comprises a plurality of image processing steps that may be operated sequentially. Conventional Bundle Adjustment operations are used in AR, VR or computer vision as a backend optimization tool and are often the bottleneck for achieving high system update rates for many applications. In contrast, the circuits and methods set forth below enable the next iteration of a Bundle Adjustment operation to be started in parallel with the current iteration, and may be described as Look-ahead Bundle Adjustment. According to some implementations, some steps of image processing, such as Jacobian and Hessian (J&H) execution for a next iteration, are started in parallel with (i.e. as a “look-ahead” operation) a Cost function and high-level Iterative Control (which may be for example a Levenberg-Marquardt (LM) control) during a current iteration, with the assumption that the corresponding solution will be accepted. Intermediate results can be saved and reused across iterations. Further, Jacobian and Hessian outputs for an accepted solution can be saved and can be restored across iterations.
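By way of illustration only, the look-ahead control flow described above can be sketched in Python. The one-dimensional problem (fitting x so that x² matches a target of 3), the names `jacobian_hessian`, `cost`, `lookahead_iteration` and `run`, and the update factors `KD` and `KI` are all assumptions made for this sketch and do not come from the described implementations. A thread pool stands in for parallel hardware, so the sketch shows the ordering of operations rather than an actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

TARGET = 3.0        # hypothetical measurement; the toy model predicts x**2
KD, KI = 2.0, 2.0   # assumed factors for decreasing / increasing mu

def jacobian_hessian(x):
    """Toy J&H: J is the derivative of the prediction x**2, H = J^T J."""
    j = 2.0 * x
    return j, j * j

def cost(x):
    """Squared residual of the toy model."""
    e = TARGET - x * x
    return e * e

def lookahead_iteration(x, jh, mu, prev_cost, pool):
    j, h = jh
    e = TARGET - x * x
    x_new = x + j * e / (h + mu * h)        # Solve: (H + mu*diag(H)) dX = J^T e
    # Look-ahead: start J&H for the next iteration, assuming x_new is accepted.
    future = pool.submit(jacobian_hessian, x_new)
    new_cost = cost(x_new)                  # Cost is evaluated while J&H is pending
    if new_cost < prev_cost:                # success: use the look-ahead J&H
        return x_new, future.result(), mu / KD, new_cost
    future.cancel()                         # failure: discard it, restore saved J&H
    return x, jh, mu * KI, prev_cost

def run(x0=1.0, iterations=10):
    with ThreadPoolExecutor(max_workers=1) as pool:
        x, mu = x0, 1.0
        jh, c = jacobian_hessian(x), cost(x)
        for _ in range(iterations):
            x, jh, mu, c = lookahead_iteration(x, jh, mu, c, pool)
    return x
```

In this sketch the saved `jh` pair plays the role of the stored Jacobian and Hessian outputs of the last accepted solution, which are restored when an iteration fails.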
There may also be a parallel execution of multiple branches (referred to below as P branches) of a solve operation. With the same Jacobian and Hessian output, one or multiple (P) branches of Solve and Cost functions can be run with different system parameters, for example with a different mu value related to a search scope, as will be described in more detail below. The parallel factor P (i.e. the number of branches running in parallel) can also be adaptive in run-time.
A modified high-level Iterative Control (e.g. LM control) can be implemented, where an iterative control block can Accept/Abort/Rerun/Restore a Jacobian and Hessian operation based on Cost results for all P branches. One advantage of the circuits and methods is that they lead to a reduction in latency for a bundle adjustment operation, particularly in cases where there are bursts of successful iterations or bursts of failed iterations (i.e. multiple consecutive iterations have the same status of success or fail). When an iteration fails, the next iteration can use a previous valid solution of a given function.
The embodiments of the circuits and methods can implement the described functions in a CPU, GPU, custom HW architecture (HWA), or a combination of elements of an integrated circuit device. The embodiments of the circuits and methods can exploit the parallelism within each of the functions to accelerate its execution. In one embodiment, the branches may run in parallel on different hardware (e.g. a CPU core, GPU, HWA). The LM control can check the Cost results in a particular order. In some embodiments, the branches may run on the same shared hardware. The branches may be run in a predetermined order, where the LM control can check the cost after each branch is finished. Operation of remaining branches can be aborted if a success condition is met for a current branch. In some embodiments, the number of parallel branches is fixed. In other embodiments, the number of parallel branches may be modified across iterations according to a system strategy.
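The shared-hardware variant described above, in which branches are run in a predetermined order and the remaining branches are skipped once one branch succeeds, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the scalar toy residual (the negative arctangent of x), the helper names `solve_1d` and `run_branches`, and the list of mu values are hypothetical and chosen only so that the first branch fails and a later one succeeds.

```python
import math

def solve_1d(j, h, e, mu):
    """Solve (H + mu*diag(H)) * dX = J^T e for a scalar toy problem."""
    return j * e / (h + mu * h)

def run_branches(x, j, h, residual, prev_cost, mus):
    """Run Solve and Cost branches in order, one mu value per branch.

    Returns (branch index, new x, new cost) for the first branch whose cost
    is lower than prev_cost, or (None, x, prev_cost) if all branches fail.
    """
    e = residual(x)
    for branch, mu in enumerate(mus):
        x_new = x + solve_1d(j, h, e, mu)
        new_cost = residual(x_new) ** 2
        if new_cost < prev_cost:
            return branch, x_new, new_cost   # remaining branches are not run
    return None, x, prev_cost

# Toy problem (assumption): the prediction is atan(x) and the measurement is 0.
residual = lambda x: 0.0 - math.atan(x)
x0 = 2.0
j0 = 1.0 / (1.0 + x0 * x0)                   # derivative of atan at x0
branch, x1, c1 = run_branches(x0, j0, j0 * j0, residual,
                              residual(x0) ** 2, mus=[0.1, 1.0, 10.0])
```

With these values the branch using mu = 0.1 overshoots and increases the cost, while the branch using mu = 1.0 lowers it, so the third branch is never evaluated.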
While the specification includes claims defining the features of one or more implementations of the invention that are regarded as novel, it is believed that the circuits and methods will be better understood from a consideration of the description in conjunction with the drawings. While various circuits and methods are disclosed, it is to be understood that the circuits and methods are merely exemplary of the inventive arrangements, which can be embodied in various forms. Therefore, specific structural and functional details disclosed within this specification are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the inventive arrangements in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting, but rather to provide an understandable description of the circuits and methods.
Before describing the figures in more detail below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms “transmit,” “receive,” and “communicate,” as well as derivatives thereof, encompass both direct and indirect communication. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrase “associated with,” as well as derivatives thereof, means to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The phrase “at least one of,” when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, “at least one of: A, B, and C” includes any of the following combinations: A, B, C; A and B; A and C; B and C; and A and B and C.
Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
Turning first to
The processor 102 may be coupled to a display 106 for displaying information to a user. The processor 102 may also be coupled to a memory 108 that allows storing information related to data or information associated with achieving a goal. The memory 108 could be implemented as a part of the processor 102, or could be implemented in addition to any cache memory of the processor, as is well known. The memory 108 could include any type of memory, such as a solid state drive (SSD), Flash memory, Read Only Memory (ROM) or any other memory element that provides long term memory, where the memory could be any type of internal memory of the electronic device or external memory accessible by the electronic device.
A user interface 110 is also provided to enable a user to both input data and receive data. The user interface could include a touch screen user interface commonly used on a portable communication device, and other input/output (I/O) elements, such as a speaker and a microphone. The user interface could also comprise devices for inputting or outputting data that could be attached to the mobile device by way of an electrical connector, or by way of a wireless connection, such as a Bluetooth or a Near Field Communication (NFC) connection.
The processor 102 may also be coupled to other elements that receive input data or provide data, including various sensors 111, an inertial measurement unit (IMU) 112 and a Global Positioning System (GPS) device 113 for activity tracking. For example, the IMU 112, which may include a gyroscope and an accelerometer, can provide various information related to the motion or orientation of the device, while the GPS device 113 provides location information associated with the device. The sensors, which may be a part of or coupled to a mobile device, may include by way of example a light intensity (e.g. ambient light or UV light) sensor, a proximity sensor, an environmental temperature sensor, a humidity sensor, a heart rate detection sensor, a galvanic skin response sensor, a skin temperature sensor, a barometer, a speedometer, an altimeter, a magnetometer, a hall sensor, a gyroscope, a WiFi transceiver, or any other sensor that may provide information to the mobile device. The processor 102 may receive input data by way of an input/output (I/O) port 114 or a transceiver 116 coupled to an antenna 118. While the electronic device of
Turning now to
A Bundle Adjustment operation may comprise an iterative, non-linear optimization process and generally may contain functions at two levels. At a low level (i.e. within an iteration), several functions may be run sequentially, such as Jacobian, Hessian, Solver and Cost compute functions, for performing image processing. The Jacobian of a function is a matrix of the first-order partial derivatives of the function, and may generally describe an amount of rotation, transformation and stretching. The Hessian of a function is the Jacobian of the function's gradient, and generally describes local curvature of a function of many variables. By way of example, the Hessian function can be represented by $H = J^T J$. The Solver block can solve a normal equation $(H + \mu \cdot \mathrm{diag}(H)) \cdot \delta X = J^T e$ and obtain a new candidate solution $X_{new} = X + \delta X$, where $\delta X$ is an output of the Solver block, which is obtained from the normal equation and is used to update the candidate solution $X_{new}$. Alternatively, the Solver block can solve a normal equation $(H + \mu \cdot I) \cdot \delta X = J^T e$, where $I$ is the identity matrix. Bundle Adjustment minimizes the squared Mahalanobis distance $e^T e$ over the parameter space $X$. As will be described in more detail below, the cost is then computed for a current iteration based on $X_{new}$ and represents a minimum error value associated with a vector parameter based upon 3D points captured by the camera and a measurement vector associated with measured image coordinates for all cameras. The cost function will be described in more detail below in reference to
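As a small worked example of the low-level functions, the following Python sketch builds H = JᵀJ and solves the damped normal equation (H + μ·diag(H))·δX = Jᵀe for a two-parameter problem. The line-fitting model y = a·x + b, the function name `lm_step`, and the sample points are assumptions chosen to keep the example short; they are not part of the described Bundle Adjustment pipeline. The residual e is taken as measurement minus prediction and J as the derivative of the prediction, so that the update is X + δX.

```python
def lm_step(X, points, mu):
    """One Jacobian/Hessian/Solve pass for the toy model y = a*x + b."""
    a, b = X
    J = [[x, 1.0] for x, _ in points]            # d(prediction)/d(a, b), one row per point
    e = [y - (a * x + b) for x, y in points]     # residual: measured minus predicted
    # Hessian approximation H = J^T J (2x2) and right-hand side g = J^T e
    H = [[sum(r[i] * r[k] for r in J) for k in range(2)] for i in range(2)]
    g = [sum(r[i] * ek for r, ek in zip(J, e)) for i in range(2)]
    # Damped system A = H + mu * diag(H)
    A = [[H[0][0] * (1.0 + mu), H[0][1]],
         [H[1][0], H[1][1] * (1.0 + mu)]]
    # Solve A * dX = g with Cramer's rule (adequate for a 2x2 system)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    dX = [(g[0] * A[1][1] - A[0][1] * g[1]) / det,
          (A[0][0] * g[1] - A[1][0] * g[0]) / det]
    return [a + dX[0], b + dX[1]]                # X_new = X + dX

points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]    # samples of y = 2x + 1
undamped = lm_step([0.0, 0.0], points, mu=0.0)   # reaches [2.0, 1.0] in one step
damped = lm_step([0.0, 0.0], points, mu=1.0)     # shorter step: [1.0, 1.0]
```

Because this toy model is linear, the undamped step lands on the exact solution, while a larger μ shortens the step, which is the behavior the iteration control relies on when it adjusts μ.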
At a high level, iteration control may be performed using “trust region” strategies (where, in mathematical optimization, a trust region denotes the subset of the region of an objective function that is approximated using a model function). If an adequate model of the objective function is found within the trust region, then the region is expanded, whereas the region is contracted if the approximation is poor. The region can be expanded or contracted using the variable mu (μ) provided to the Solve block 206, as will be described in more detail below. Commonly used iterative controls based upon trust region strategies include LM and Dogleg iteration control, for example. Using LM iteration control for example, the cost of a current iteration is compared with that of previous iterations to decide whether to accept or reject the current solution, where convergence criteria are checked by the LM control block. If the cost of the current iteration is smaller than that of the previous iteration, then the iteration is a success. The LM control block may then reduce mu (e.g. divide by a factor Kd to decrease search radius) and accept the new solution from the iteration. In contrast, if the cost of the current iteration is not smaller than that of the previous iteration, the iteration has failed. The LM control block may then increase mu (e.g. multiply by a factor Ki to increase search radius), reject the solution, and reuse the last accepted solution for the next iteration. Because of the sequential nature of the individual functions, the total delay may be based upon a number of iterations multiplied by the time taken by each iteration, where each iteration has sequential execution of its functions. While individual functions (e.g. Jacobian or Hessian) can be accelerated by using parallel branches, iteration control using trust region strategies can also improve performance during image processing.
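The accept/reject decision of the LM iteration control described above can be summarized in a few lines of Python. This is only a sketch: the function name `lm_control`, its argument layout, and the default values Kd = Ki = 2 are assumptions, since no particular factor values are given here.

```python
def lm_control(prev_cost, new_cost, mu, x_accepted, x_new, kd=2.0, ki=2.0):
    """Return (solution, cost, mu, success) for the next iteration.

    kd and ki are hypothetical values for the factors Kd and Ki.
    """
    if new_cost < prev_cost:
        # Success: accept the new solution and divide mu by Kd.
        return x_new, new_cost, mu / kd, True
    # Failure: reject, keep the last accepted solution, multiply mu by Ki.
    return x_accepted, prev_cost, mu * ki, False

# A successful iteration followed by a failed one.
state = lm_control(10.0, 4.0, mu=1.0, x_accepted=[0.0], x_new=[0.5])
state2 = lm_control(state[1], 6.0, mu=state[2], x_accepted=state[0], x_new=[0.9])
```

After the failed second iteration the accepted solution and its cost are unchanged and only mu differs, which is why a previously saved Jacobian and Hessian output remains valid for the next iteration.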
Turning now to
The Iteration Control block 322 provides the main control for the functions. It controls the input parameter from the P select circuit 314 for the Jacobian and Hessian function based upon a Parameter Select value. It also provides Start, Abort and Save signals to the Jacobian and Hessian blocks 301-303 to control Jacobian and Hessian outputs to be saved, and which Jacobian and Hessian output to be passed to the Solve and Cost branches, as will be described in more detail below in reference to
As set forth above, Bundle Adjustment refers to the adjustment of bundles of rays that leave 3D feature points and converge on each camera center, with respect to multiple camera positions and point coordinates. It jointly produces an optimal 3D structure and viewing parameters by minimizing the cost function for a model fitting error. The model fitting error is the re-projection error between the observed and the predicted image points, which is expressed for m images and n points as
$$\mathrm{Error}(P,X)=\sum_{i=1}^{n}\sum_{j=1}^{m} d\big(Q(P_j,X_i),\,X_{ij}\big)^{2}$$ (1)
where $X_{ij}$ represents a measured projection of point i on image j, $Q(P_j, X_i)$ is the predicted projection of point i on image j, and $d(x, y)$ is the Euclidean distance between the inhomogeneous image points represented by x and y. Using the re-projection error, the cost function is defined as
$$\mathrm{Cost}(P,X)=\min_{P,X}\,\mathrm{Error}(P,X)$$ (2)
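The re-projection error of equation (1) can be illustrated with a short Python sketch. The camera model is an assumption made for brevity: each camera is a hypothetical pinhole described only by a focal length and a translation (no rotation or lens distortion), and the names `project` and `reprojection_error` are illustrative. Observations are stored in a dictionary keyed by (point index i, image index j), so points that are not visible in an image can simply be omitted.

```python
def project(camera, point):
    """Predicted projection Q(P_j, X_i) for a simplified pinhole camera."""
    f, tx, ty, tz = camera                 # focal length and translation (assumed model)
    x, y, z = point
    depth = z + tz
    return (f * (x + tx) / depth, f * (y + ty) / depth)

def reprojection_error(cameras, points, observations):
    """Sum of squared Euclidean distances between predicted and measured projections."""
    total = 0.0
    for (i, j), (u_obs, v_obs) in observations.items():
        u, v = project(cameras[j], points[i])
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total

cameras = [(1.0, 0.0, 0.0, 0.0), (1.0, -1.0, 0.0, 0.0)]
points = [(0.0, 0.0, 2.0), (1.0, 1.0, 2.0)]
exact = {(i, j): project(cameras[j], points[i])
         for i in range(len(points)) for j in range(len(cameras))}
noisy = dict(exact)
noisy[(1, 0)] = (exact[(1, 0)][0] + 0.1, exact[(1, 0)][1])   # perturb one measurement
```

Here `reprojection_error(cameras, points, exact)` is zero by construction, and the perturbed measurement contributes the square of its 0.1 offset. The cost of equation (2) is the value this error takes at the camera and point parameters that minimize it.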
According to various aspects of the invention, different numbers of branches can be used. However, additional branches require additional circuits, and therefore require additional circuit footprint and power. When a number of branches are implemented, it may be beneficial to operate only some of those branches, depending upon the results of the various stages, including the cost function for example. As will be described in more detail below, different numbers of branches can be implemented, and it can be determined whether all of the branches should be used based upon the cost results generated.
When the circuit of
Turning first to
As can be seen in
Latency reduction can be determined as follows:
$$\delta T_1=(B_s-1)\cdot\min(T_{JH},\,T_{CLM})$$ (3)
where $T_{JH}$ is equal to the time for the Jacobian and Hessian function operations and $T_{CLM}$ is equal to the time for the Cost and Iteration Control operation (where the iteration control is performed using an LM control block for example). If the first branch of Solve and Cost is a success, the iteration is considered a success, and only one branch is useful in this case. In an embodiment where the parallel factor P is adaptive, the number of branches can gradually be reduced from P to 1 to save power. The LM control will perform the following operations: set mu′=mu/Kd; use the Jacobian and Hessian (Parameter Value V1) for the next iteration; save the Hessian output for the success branch; and update the iteration state as a success.
Turning now to
$$\delta T_2=B_f\cdot(T_{JH}+T_S+T_{CLM})+T_{JH}$$ (4)
where $T_S$ is equal to the time to perform the solve function.
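Equations (3) and (4) can be evaluated directly, as in the following Python sketch. It assumes that Bs and Bf denote the lengths of a burst of successful iterations and a burst of failed iterations, respectively; the function names and the timing values (in arbitrary time units) are hypothetical and are not measurements.

```python
def delta_t1(b_s, t_jh, t_clm):
    """Equation (3): (Bs - 1) * min(T_JH, T_CLM)."""
    return (b_s - 1) * min(t_jh, t_clm)

def delta_t2(b_f, t_jh, t_s, t_clm):
    """Equation (4): Bf * (T_JH + T_S + T_CLM) + T_JH."""
    return b_f * (t_jh + t_s + t_clm) + t_jh

# Hypothetical timings: J&H takes 4 units, Solve 3 units, Cost plus LM control 2 units.
t_jh, t_s, t_clm = 4.0, 3.0, 2.0
d1 = delta_t1(b_s=5, t_jh=t_jh, t_clm=t_clm)          # (5 - 1) * min(4, 2) = 8.0
d2 = delta_t2(b_f=3, t_jh=t_jh, t_s=t_s, t_clm=t_clm)  # 3 * (4 + 3 + 2) + 4 = 31.0
```

Equation (3) grows with the shorter of the two overlapped stages, so the overlap pays off most when the Jacobian and Hessian time and the Cost and Iteration Control time are comparable.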
As shown in
Turning now to
As shown in
Turning now to
According to some implementations, the plurality of functions may comprise a Jacobian function, a Hessian function, a solve function, and a cost function, where the predetermined functions started for a next iteration in parallel with the first iteration may comprise the Jacobian function and the Hessian function. A previous valid Jacobian and Hessian solution may be used for the next iteration when the next iteration fails. Also, intermediate results of the predetermined functions may be saved and reused across iterations of performing the bundle adjustment. Starting predetermined functions of the plurality of functions for a next iteration in parallel may comprise performing the predetermined functions on a plurality of branches, wherein a number of the plurality of branches can be adjusted in run-time. The method may further comprise providing a modified LM control based upon results of the cost function for all branches of the plurality of branches.
The various elements of the methods of
The benefits of the invention have been evaluated using 10 real datasets. The table below shows the iterations and the bursts for their iterations.
In the last two columns, the histograms of the burst lengths for failed and successful iterations are shown, where the N-th entry in a histogram represents the number of occurrences of a length-N burst of failed or successful iterations. For example, [0, 0, 0, 0, 1] represents one burst of length 5 and no shorter burst, and [1, 0, 1] represents one burst of length 1 and one burst of length 3.
The normalized latency of these datasets is listed in the table below, where the results of the existing technology are compared with those of the invention for P=1, 2, 3. In all cases, the invention significantly reduced the latency.
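The burst-length histogram format described above can be reproduced from a sequence of per-iteration statuses with a short Python sketch. The function name `burst_histogram` and the encoding of an iteration status as a boolean (True for success, False for failure) are assumptions for illustration.

```python
def burst_histogram(statuses, target):
    """Histogram of run lengths of `target` in `statuses`.

    Entry N-1 of the result counts the bursts of length N.
    """
    lengths = []
    run = 0
    for status in statuses:
        if status == target:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)                  # close a burst that ends the sequence
    hist = [0] * max(lengths, default=0)
    for n in lengths:
        hist[n - 1] += 1
    return hist

five_successes = burst_histogram([True] * 5, True)
mixed = burst_histogram([True, False, True, True, True], True)
```

Here `five_successes` is [0, 0, 0, 0, 1] and `mixed` is [1, 0, 1], matching the two examples given above; calling the function with `target=False` yields the corresponding histogram for failed iterations.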
As can be seen in Table 2, the percent savings increases for a given dataset with an increase in the number of branches P. While an increase in the number of branches P requires additional circuit and power resources, power can be reduced by disabling certain branches P depending upon the results of the cost evaluations as described above.
It can therefore be appreciated that new circuits for and methods of performing a Bundle Adjustment for an imaging device have been described. It will be appreciated by those skilled in the art that numerous alternatives and equivalents will be seen to exist that incorporate the disclosed invention. As a result, the invention is not to be limited by the foregoing implementations, but only by the following claims.