This application is based on and claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2021-0053759, filed on Apr. 26, 2021, in the Korean Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.
The disclosure relates to methods and apparatuses for accelerating simultaneous localization and mapping.
Simultaneous localization and mapping (SLAM) is a technology for obtaining information about peripheral areas by an apparatus moving along arbitrary spaces and estimating a map of the spaces as well as a current location of the apparatus based on the obtained information. SLAM technology is used in various fields including augmented reality (AR), robots, autonomous cars, etc. For example, an apparatus for performing SLAM may obtain an image of a space by using a sensor such as a camera, etc., and estimate a map of the space and a current location thereof through an analysis on the image and coordinates set-up.
SLAM may be divided into front-end for extracting feature points and performing operations of three-dimensional spatial coordinates based on information obtained from a sensor, and back-end for optimizing map information and current location information based on data received from the front-end. While the front-end considers only the increment of location movement, the back-end optimizes location information based on the map, and thus has a significant influence on the overall performance of SLAM. Meanwhile, operation quantity required for the back-end may vary depending on the size of a map, the size of sensor data, the required degree of precision, etc., and a method of performing a large volume of operations with high speed and low power may be required in SLAM using combinations of various sensors.
Provided are methods and apparatuses for accelerating simultaneous localization and mapping (SLAM). The technical objects which the disclosure aims to achieve are not limited to the foregoing, and other technical objects may be inferred from the following embodiments.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments of the disclosure.
According to an aspect of the disclosure, there is provided an apparatus for accelerating simultaneous localization and mapping (SLAM) including: a memory; and a processor configured to: obtain a first measurement, among a plurality of measurements, for a map point and a camera pose from the memory, determine, based on the first measurement, one or more elements corresponding to an optimization matrix, among a plurality of elements of a Hessian matrix, without generating an entirety of the Hessian matrix for the map point and the camera pose based on all of the plurality of measurements, and accumulate the determined one or more elements over the optimization matrix used to perform optimization operations corresponding to the map point and the camera pose.
The processor may include a pipeline structure configured to sequentially perform first operations related to the first measurement over consecutive cycles after the first measurement is loaded in a first cycle, and wherein the pipeline structure may be configured to perform second operations related to a second measurement, following the first operations regarding the first measurement after the second measurement is loaded in a second cycle which follows the first cycle.
The processor may be further configured to determine the one or more elements corresponding to the optimization matrix by computing first elements of a first matrix block for the camera pose, second elements of a second matrix block for the map point, and third elements of a third matrix block for at least one camera pose corresponding to the map point based on the first measurement.
The first measurement may include a first map point and at least one first camera pose corresponding to the first map point.
The processor may be further configured to perform optimization operations corresponding to states of the map point and the camera pose by using the optimization matrix based on elements sequentially determined for all measurements being accumulated over the optimization matrix.
The first measurement may correspond to a result of performing front-end operations for data obtained from a sensor including at least one of a camera, an inertial measurement unit (IMU), a depth sensor, a global positioning system (GPS), or an odometer.
Based on the first measurement corresponding to the result of performing the front-end operations for the data obtained from the camera and the IMU, the processor may be further configured to divide the first measurement into a first part corresponding to both of the camera and the IMU, and a second part corresponding only to the IMU.
The processor may be further configured to: based on the first part: determine first elements of a first matrix block for the camera pose, determine second elements of a second matrix block for the map point, determine third elements of a third matrix block for at least one camera pose corresponding to the map point, by using the first part, accumulate the first elements, the second elements and the third elements over the optimization matrix, and based on the second part: determine fourth elements of a fourth matrix block for the camera pose, and accumulate the fourth elements over the optimization matrix.
The processor may be further configured to divide operations to determine the one or more elements into a plurality of sub-tracks.
A track length of the plurality of sub-tracks may be set based on a number of camera poses in which the processor is able to perform operations simultaneously.
According to another aspect of the disclosure, there is provided a method of accelerating simultaneous localization and mapping (SLAM), the method including: obtaining a first measurement, among a plurality of measurements, for a map point and a camera pose from a memory, determining, based on the first measurement, one or more elements corresponding to an optimization matrix, among a plurality of elements of a Hessian matrix, without generating an entirety of the Hessian matrix for the map point and the camera pose based on all of the plurality of measurements, and accumulating the determined one or more elements over the optimization matrix used to perform optimization operations corresponding to the map point and the camera pose.
The method may further include sequentially performing first operations related to the first measurement over consecutive cycles after the first measurement is loaded in a first cycle, and performing second operations related to a second measurement, following the first operations regarding the first measurement, after the second measurement is loaded in a second cycle which follows the first cycle.
The determining of the one or more elements corresponding to the optimization matrix in connection with the first measurement may include determining first elements of a first matrix block for the camera pose, second elements of a second matrix block for the map point, and third elements of a third matrix block for at least one camera pose corresponding to the map point based on the first measurement.
The first measurement may include a first map point and at least one first camera pose corresponding to the first map point.
The method may further include performing optimization operations corresponding to states of the map point and the camera pose by using the optimization matrix based on elements sequentially determined for all measurements being accumulated over the optimization matrix.
The first measurement may correspond to a result of performing front-end operations for data obtained from a sensor including at least one of a camera, an inertial measurement unit (IMU), a depth sensor, a global positioning system (GPS), or an odometer.
The method may further include, based on the first measurement corresponding to the result of performing the front-end operations for the data obtained from the camera and the IMU, dividing the first measurement into a first part corresponding to both of the camera and the IMU, and a second part corresponding only to the IMU.
The method may further include, based on the first part: determining first elements of a first matrix block for the camera pose, determining second elements of a second matrix block for the map point, determining third elements of a third matrix block for at least one camera pose corresponding to the map point, by using the first part, and accumulating the first elements, the second elements and the third elements over the optimization matrix; and based on the second part: determining fourth elements of a fourth matrix block for the camera pose and accumulating the fourth elements over the optimization matrix.
The method may further include dividing operations to determine the one or more elements into a plurality of sub-tracks.
According to another aspect of the disclosure, there is provided a computer-readable recording medium on which a program for executing a method of accelerating simultaneous localization and mapping (SLAM) is recorded, the method including: obtaining a first measurement, among a plurality of measurements, for a map point and a camera pose from a memory, determining, based on the first measurement, one or more elements corresponding to an optimization matrix, among a plurality of elements of a Hessian matrix, without generating an entirety of the Hessian matrix for the map point and the camera pose based on all of the plurality of measurements, and accumulating the determined one or more elements over the optimization matrix used to perform optimization operations corresponding to the map point and the camera pose.
According to another aspect of the disclosure, there is provided an apparatus for accelerating simultaneous localization and mapping (SLAM) including: a memory; and a processor configured to: obtain a first measurement, among a plurality of measurements, for a map point and a camera pose from the memory, determine only first elements of an optimization matrix, the first elements corresponding to the map point and the camera pose in the first measurement, update the first elements into the optimization matrix, and perform optimization operations corresponding to the map point and the camera pose based on the optimization matrix.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the example embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the example embodiments are merely described below, by referring to the figures, to explain aspects. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
General terms which are currently used widely have been selected for use in consideration of their functions in example embodiments; however, such terms may be changed according to an intention of a person skilled in the art, precedents, advent of new technologies, etc. Further, in certain cases, terms have been arbitrarily selected, and in such cases, meanings of the terms will be described in detail in corresponding descriptions. Accordingly, the terms used in the example embodiments should be defined based on their meanings and overall descriptions of the embodiments, not simply by their names.
In some descriptions of the example embodiments, when a portion is described as being connected to another portion, the portion may be connected directly to another portion, or electrically connected to another portion with an intervening portion therebetween. An expression used in the singular encompasses the expression of the plural, unless it has a clearly different meaning in the context. When a portion “includes” an element, another element may be further included, rather than excluding the existence of the other element, unless otherwise described.
The terms “comprise” or “include” used in the example embodiments should not be construed as including all components or operations described in the specification, and may be understood as not including some of the components or operations, or further including additional components or operations.
While such terms as “first,” “second,” etc., may be used to describe various components, such components must not be limited to the above terms. The above terms are used only to distinguish one component from another.
The descriptions of the following embodiments should not be construed as limiting the scope of rights, and matters that those skilled in the art can easily derive should be construed as being included in the scope of rights of the embodiments. Hereinafter, example embodiments will be described in detail as an example, with reference to the attached drawings.
Referring to
According to an example embodiment, the connecting part 120 of the wearable electronic device 100 may include a projector, a processor 130, and an accelerator 140.
According to an example embodiment, the projector may receive data from the outside and emit a beam generated based on the received data to the lens 110. The beam emitted from the projector may be refracted by an object (e.g., a prism) having a refractive index and displayed on the lens 110. The refractive index may be an arbitrary index.
According to an example embodiment, the processor 130 may perform overall functions to control the wearable electronic device 100. The processor 130 may be implemented by an array of multiple logic gates, and may be implemented by a combination of a general purpose microprocessor and a memory in which a program executable by the microprocessor is stored.
The processor 130 may receive sensing data regarding the surrounding environment from the sensor. The sensor may include at least one of one or more cameras, an inertial measurement unit (IMU), a depth sensor (e.g., LiDAR), a global positioning system (GPS), and an odometer. The camera may include a pixel array, a complementary metal oxide semiconductor (CMOS) image sensor (CIS), a charge coupled device (CCD) image sensor, etc., but the disclosure is not limited thereto. For example, the processor 130 may obtain image data regarding the surrounding environment using the camera. Further, the processor 130 may obtain data regarding the location, orientation, speed, acceleration, etc. of the wearable electronic device 100 by using the IMU.
According to an example embodiment, the processor 130 may extract keypoints of sensing data received from the sensor, perform operations regarding spatial coordinates, and transmit operation results to the accelerator 140. For example, the processor 130 may extract at least one keypoint from the image data obtained by the camera based on a keypoint extraction algorithm. The processor 130 may perform operations regarding spatial coordinates of the wearable electronic device 100 based on location and orientation data obtained by the IMU. That is, the processor 130 may perform front-end operations of SLAM.
According to an example embodiment, the processor 130 may include a general central processing unit (CPU), image signal processor (ISP), image processing unit, etc., and may not be optimized for performing back-end operations of SLAM. The accelerator 140 is an apparatus for accelerating SLAM, and may be a processing unit optimized for performing back-end operations of SLAM. The accelerator 140 may be implemented by an array of multiple logic gates, and may be implemented by a combination of a microprocessor and a memory in which a program executable by the microprocessor is stored.
According to an example embodiment, in
Also, in
According to an example embodiment, the wearable electronic device 100 may further include a communication interface. The communication interface may be wireless or wired, and relay data exchange between external devices and the wearable electronic device 100. The wearable electronic device 100 may transmit data processed by the processor 130 and the accelerator 140 through the communication interface to external devices, and may receive data from external devices.
Referring to
According to an example embodiment, the wearable electronic device 200 may be connected to an external device 250 (e.g., a smartphone, a set-top box, etc.) through a communication interface 255. In
According to an example embodiment, the wearable electronic device 200 may transmit sensing data obtained by the sensor to the external device 250 via the communication interface 255. The processor 230 of the external device 250 may perform front-end operations in relation to the sensing data received from the wearable electronic device 200, and transmit operation results to the accelerator 240. The accelerator 240 may perform back-end operations based on the data obtained from the processor 230, and transmit operation results back to the wearable electronic device 200 through the communication interface 255.
Unlike the front-end, which considers only the increment of location movement, the back-end optimizes location information based on the map and thus has a significant influence on the overall performance of SLAM. Meanwhile, the operation quantity required for the back-end may vary depending on the size of a map, the size of sensor data, the required degree of precision, etc., and a method of performing a large volume of operations with high speed and low power may be required in SLAM using combinations of various sensors. Hereinafter, the process of accelerating back-end operations by the accelerator 140 of
A SLAM accelerator 30 is an apparatus for accelerating SLAM, and may correspond to the accelerator 140 of
Referring to
The factor graph memory 310 is hardware for storing various data processed by the SLAM accelerator 30, and for example, the factor graph memory 310 may store data processed or to be processed by the SLAM accelerator 30. The factor graph memory 310 may include random access memory (RAM), such as dynamic random access memory (DRAM), static random access memory (SRAM), etc., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), a CD-ROM, a Blu-ray disk, other optical disk storages, a hard disk drive (HDD), a solid state drive (SSD), or flash memory. However, the present disclosure is not limited thereto.
The factor graph memory 310 may store data received from a front-end processor (e.g., the processor 130 of
The back-end processor 320 may perform back-end operations for optimization of SLAM. For example, the back-end processor 320 may receive data from the factor graph memory 310 and perform optimization operations in relation to the received data. The received data may correspond to results of movement accumulation of a sensor fusion performed by the front-end processor. The back-end processor 320 may perform repetitive operations in relation to data received from the front-end processor or the factor graph memory 310. For example, the back-end processor 320 may perform operations for estimation of location of the electronic device (e.g., the wearable electronic device 100 of
According to an example embodiment, the back-end processor 320 may estimate the location of the electronic device in the created map. For example, the back-end processor 320 may estimate the location of the moving electronic device in the created map based on the repetitively performed operations. The back-end processor 320 may estimate the location of the electronic device in real time, and data regarding the estimated location may be updated in real time.
The operations performed by the back-end processor 320 may include a bundle adjustment (BA). When a set of images depicting a plurality of three-dimensional points from different perspectives is given, the BA may refer to refining, in real time, the three-dimensional coordinates describing the scene geometry, the parameters of relative motion, and the optical characteristics of the camera, according to optimization criteria involving the image projections corresponding to the respective points. Hereinafter, the process of optimizing three-dimensional coordinates of the map points and the camera poses will be described in detail with reference to
As shown in the example of
The back-end operations of SLAM may set an error e representing a difference between a measurement p and the estimate p̂ as an objective function, and include operations for estimating states in which the objective function is minimized for all measurements. The back-end operation of SLAM may be represented by the following Equation 1.
According to the above Equation 1, i represents a frame number, j represents a map point number, and X, which represents a state to be estimated through optimization operations, includes a camera pose Ci and a map point (landmark) Pj.
When the camera measurement is input, estimates may be obtained based on reprojection using three-dimensional coordinates of the map point, and based on a difference between the measurements and the estimates, the error e may be calculated. When the error is calculated, operations for optimizing the objective function according to Equation 1 may be performed.
For optimization of the objective function according to Equation 1, a Gauss-Newton method according to the following Equation 2 may be used.
X_{k+1} = X_k − (J_e^T(X_k) J_e(X_k))^{−1} J_e^T(X_k) e(X_k)

ΔX = (J_e^T(X_k) J_e(X_k))^{−1} J_e^T(X_k) e(X_k)

(J_e^T(X_k) J_e(X_k)) ΔX = J_e^T(X_k) e(X_k)

H(X_k) ΔX = b(X_k)   [Equation 2]
Each time a measurement is input, a state change ΔX for reducing the error according to the above Equation 1 may be estimated. Meanwhile, to estimate the state change ΔX, a Jacobian matrix J_e(X_k), which represents partial differentiation of the error, and a Hessian matrix H(X_k), which is a product of the transposed Jacobian matrix J_e^T(X_k) and the Jacobian matrix J_e(X_k), may need to be computed.
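By way of a non-limiting illustration, one Gauss-Newton update in the form of the above Equation 2 may be sketched as follows; the state layout, the error function, and the Jacobian function are placeholder assumptions and not part of the disclosed hardware.

```python
import numpy as np

def gauss_newton_step(X_k, error_fn, jacobian_fn):
    """One Gauss-Newton update in the form of Equation 2 (illustrative sketch).

    X_k         : current state estimate (1-D array of camera poses and map points)
    error_fn    : returns the stacked error vector e(X_k)  (assumed callable)
    jacobian_fn : returns the Jacobian J_e(X_k) of the error with respect to the state
    """
    e = error_fn(X_k)                 # e(X_k)
    J = jacobian_fn(X_k)              # J_e(X_k)
    H = J.T @ J                       # H(X_k) = J_e^T(X_k) J_e(X_k)
    b = J.T @ e                       # b(X_k) = J_e^T(X_k) e(X_k)
    delta_X = np.linalg.solve(H, b)   # solve H(X_k) ΔX = b(X_k)
    return X_k - delta_X              # X_{k+1} = X_k − ΔX
```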
In one example, when fx and fy represent focal lengths of the camera along the x-axis and the y-axis, respectively, X′, Y′, and Z′ represent three-dimensional coordinate elements of the map point based on a camera coordinate system, and R represents a rotation element of the camera, the Jacobian matrix may be calculated according to the following Equation 3.
According to Equation 3, the Jacobian matrix may be calculated by applying Lie algebra to a Jacobian matrix block JC for a camera pose and a Jacobian matrix block JP for a map point. The Jacobian matrix block JC for the camera pose may be a matrix obtained by partially differentiating a reprojection error with respect to the camera pose, and the Jacobian matrix block JP for the map point may be a matrix obtained by partially differentiating the reprojection error with respect to the map point.
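The analytic form of Equation 3 is not reproduced here; purely for illustration, the Jacobian matrix blocks JC and JP may be approximated numerically as in the following sketch, in which the pose parameterization (axis-angle rotation followed by translation), the focal lengths, and the helper names are assumptions.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from an axis-angle vector (Rodrigues formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(pose6, point3, fx, fy):
    """Pinhole reprojection of a map point; pose6 = [axis-angle rotation, translation] (assumed layout)."""
    Xc, Yc, Zc = rodrigues(pose6[:3]) @ point3 + pose6[3:]
    return np.array([fx * Xc / Zc, fy * Yc / Zc])

def jacobian_blocks(pose6, point3, fx=500.0, fy=500.0, eps=1e-6):
    """Numerical J_C (2x6, w.r.t. the camera pose) and J_P (2x3, w.r.t. the map point)."""
    base = project(pose6, point3, fx, fy)
    J_C = np.zeros((2, 6))
    J_P = np.zeros((2, 3))
    for k in range(6):
        d = np.zeros(6); d[k] = eps
        J_C[:, k] = (project(pose6 + d, point3, fx, fy) - base) / eps
    for k in range(3):
        d = np.zeros(3); d[k] = eps
        J_P[:, k] = (project(pose6, point3 + d, fx, fy) - base) / eps
    return J_C, J_P
```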
As the Hessian matrix corresponds to the product of the transposed Jacobian matrix and the Jacobian matrix, it may be calculated based on the Jacobian matrix blocks. For example, the Hessian matrix may be calculated according to the following Equation 4.
According to the above Equation 4, the Hessian matrix may be divided into four Hessian matrix blocks U, W, W^T, and V. Referring to Equation 4, the last line of Equation 2 may be represented by the following Equation 5.
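For instance, given per-measurement Jacobian blocks JC and JP, the blocks U, V, and W of the Hessian matrix may be accumulated as in the following sketch; the dense layout and the block sizes (6 parameters per camera pose, 3 per map point) are assumptions made only for illustration.

```python
import numpy as np

def hessian_blocks(J_C_list, J_P_list, pose_idx, point_idx, n_poses, n_points):
    """Accumulate the U, V, and W blocks of H = J^T J from per-measurement Jacobian blocks.

    J_C_list[m], J_P_list[m]  : 2x6 and 2x3 Jacobian blocks of measurement m
    pose_idx[m], point_idx[m] : camera pose i and map point j observed by measurement m
    """
    U = np.zeros((6 * n_poses, 6 * n_poses))    # camera-pose block (block-diagonal)
    V = np.zeros((3 * n_points, 3 * n_points))  # map-point block (block-diagonal)
    W = np.zeros((6 * n_poses, 3 * n_points))   # pose-to-point block
    for J_C, J_P, i, j in zip(J_C_list, J_P_list, pose_idx, point_idx):
        U[6*i:6*i+6, 6*i:6*i+6] += J_C.T @ J_C
        V[3*j:3*j+3, 3*j:3*j+3] += J_P.T @ J_P
        W[6*i:6*i+6, 3*j:3*j+3] += J_C.T @ J_P
    return U, V, W
```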
Referring to Equation 5, the state change ΔX may include a state change ΔXC for a camera pose and a state change ΔXP for a map point.
Meanwhile, as described above with reference to
According to an example embodiment, as in the following Equation 6, the state change ΔX may be estimated by Schur-complement operations using the Hessian matrix blocks U, W, W^T, and V.

S ΔX_C = s

S = U − W V^{−1} W^T

s = r_C − W V^{−1} r_P

ΔX_P = V^{−1}(r_P − W^T ΔX_C)   [Equation 6]
According to Equation 6, the operation for estimating the state change ΔX is reduced to a matrix operation involving only the camera pose (i.e., the operation for the S matrix and the s vector), and the state change ΔXC for the camera pose may be obtained first by this matrix operation. Then, the state change ΔXP for the map point may be obtained through back substitution. As such, when Schur-complement operations are used to estimate the state change ΔX, the operations required for estimation may be reduced significantly compared to the case of directly solving a Hessian matrix.
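A minimal dense sketch of the Schur-complement solve of Equation 6 is shown below; because V is (block-)diagonal, its inversion is inexpensive in practice, and the dense np.linalg.inv call here is used only to keep the illustration short.

```python
import numpy as np

def schur_solve(U, W, V, r_C, r_P):
    """Solve H ΔX = b through the Schur complement of Equation 6 (illustrative sketch)."""
    V_inv = np.linalg.inv(V)            # cheap in practice: V is (block-)diagonal
    S = U - W @ V_inv @ W.T             # S = U − W V^{-1} W^T
    s = r_C - W @ V_inv @ r_P           # s = r_C − W V^{-1} r_P
    dX_C = np.linalg.solve(S, s)        # S ΔX_C = s  (camera poses only)
    dX_P = V_inv @ (r_P - W.T @ dX_C)   # ΔX_P = V^{-1}(r_P − W^T ΔX_C)  (back substitution)
    return dX_C, dX_P
```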
The matrix block W and the matrix block WT may represent the relation between the map point and the camera pose. For example, the map point P0 of the matrix block W may be obtained in one frame corresponding to the camera pose C0, and the map point P3 of the matrix block W may be obtained in five frames corresponding to C0 to C4.
The matrix block U and the matrix block V may each be a diagonal matrix in which data is included only in the diagonal elements and not in any other elements. For example, the matrix block U, which is a matrix block for the camera poses C0 to C10, may include data only in a point where the camera poses C0 and C0 meet, a point where the camera poses C1 and C1 meet, . . . , and a point where the camera poses C10 and C10 meet. In addition, the matrix block V, which is a matrix block for the map points P0 to P13, may include data only in a point where the map points P0 and P0 meet, a point where the map points P1 and P1 meet, . . . , and a point where the map points P13 and P13 meet.
Meanwhile,
Referring to
In operation 1110, the SLAM accelerator may obtain the first measurement for a map point and a camera pose from the factor graph memory. The first measurement may correspond to a result of performing front-end operations for data obtained from a sensor including at least one of a camera, an IMU, a depth sensor, a GPS, and an odometer. The first measurement may include a first map point and at least one camera pose corresponding to the first map point.
The SLAM accelerator may include a pipeline structure configured to sequentially perform operations regarding the first measurement (i.e., operations corresponding to operations 1120 and 1130 to be described below) over consecutive cycles after the first measurement is loaded in a first cycle. The pipeline structure of the SLAM accelerator may perform operations regarding a second measurement, following the operations regarding the first measurement, when the second measurement is loaded in a second cycle following the first cycle. In other words, the SLAM accelerator may include a map point-based pipeline structure to perform, with high speed, the factor generation and Schur-complement operations that account for a significant amount of computation.
In operation 1120, the SLAM accelerator may compute elements affecting an optimization matrix in connection with the first measurement, among elements of a Hessian matrix, instead of generating a whole Hessian matrix for map points and camera poses of all measurements. For example, the SLAM accelerator may compute elements of a matrix block for the camera pose, elements of a matrix block for the map point, and elements of a matrix block for at least one camera pose corresponding to the map point, by using the first measurement.
In operation 1130, the SLAM accelerator may accumulate the computed elements over the optimization matrix used to perform the optimization operations for states of the map point and the camera pose. According to an example embodiment, the SLAM accelerator may accumulate the computed elements over the optimization matrix by adding or inserting the computed elements at their respective positions in the optimization matrix. According to an example embodiment, the optimization matrix is updated with the computed elements. The SLAM accelerator may perform optimization operations regarding the states of the map point and the camera pose by using the optimization matrix when elements sequentially computed for all measurements are accumulated over the optimization matrix. The optimization operation may include Schur-complement operations, and the optimization matrix may include the S matrix.
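The flow of operations 1110 to 1130 may be summarized by the following sketch; the helper names (compute_elements, accumulate) and the iteration over the factor graph memory are assumptions for illustration only, not the claimed circuit.

```python
def backend_update(factor_graph_memory, optimization_matrix, compute_elements, accumulate):
    """Per-measurement back-end flow (sketch): each measurement is loaded once, only the
    Hessian elements it affects are computed, and those elements are folded into the
    optimization matrix at their respective positions."""
    for measurement in factor_graph_memory:        # operation 1110: obtain one measurement
        elements = compute_elements(measurement)   # operation 1120: only the affected elements
        accumulate(optimization_matrix, elements)  # operation 1130: add them at their positions
    return optimization_matrix
```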
According to an example embodiment, as described above with reference to
According to an example embodiment, the SLAM accelerator may compute elements affecting an optimization matrix in connection with a single measurement, among elements of a Hessian matrix, instead of generating the Hessian matrix for map points and camera poses of all measurements. That is, the SLAM accelerator may selectively calculate elements that affect the S matrix in connection with the first measurement, among elements of a Hessian matrix, without generating an intermediary Jacobian matrix or a Hessian matrix. Accordingly, as no Jacobian matrix or Hessian matrix needs to be generated or stored, the memory size may be reduced. According to an example embodiment, the SLAM accelerator may compute elements affecting an optimization matrix in connection with a first measurement, among elements of a Hessian matrix, all at once, without having to reload the first measurement. That is, once one measurement is loaded, operations regarding the measurement are processed all at once, and thus, the same measurement does not need to be reloaded, which may lead to an increased operation speed.
The elements calculated for a single measurement may be transmitted directly to a Schur-complement operator. For example, the elements calculated for a single measurement may be accumulated over an optimization matrix for Schur-complement operations. Hereinafter, the process of calculating elements affecting the optimization matrix by the SLAM accelerator will be described in detail with reference to
For example, when the map point P0 is observed in four frames corresponding to the camera poses C1 to C4, the SLAM accelerator may compute reprojection errors e11, e21, e31, and e41. Further, the SLAM accelerator may compute the corresponding elements of the matrix block U, the corresponding element of the matrix block V, and the corresponding elements of the matrix block W.
The SLAM accelerator may generate the S matrix 1320 having the same size as the matrix block U after computing the elements of the matrix block W and the matrix block V, and the elements of the matrix block U, according to Schur-complement operations. For example, the SLAM accelerator may calculate elements of the S matrix 1320 according to the following Equation 7.
S_{i1,i2} = U_{i1,i2} − Σ_j W_{i1,j} V_j^{−1} W_{i2,j}^T   [Equation 7]
In Equation 7, i1 and i2 represent indices of camera poses, and j represents an index of a map point. In the example of
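As a hedged illustration of Equation 7, the following sketch accumulates the contribution of a single map-point track directly into the S matrix; the 6x6, 3x3, and 6x3 block shapes and the dense layout of S are assumptions.

```python
import numpy as np

def accumulate_track_into_S(S, pose_ids, U_blocks, V_block, W_blocks):
    """Accumulate one map-point track j into S per Equation 7 (illustrative sketch).

    pose_ids : indices i of the camera poses observing this map point
    U_blocks : 6x6 contributions J_Ci^T J_Ci for each observing pose
    V_block  : 3x3 block J_Pj^T J_Pj for the map point
    W_blocks : 6x3 blocks J_Ci^T J_Pj relating each observing pose to the map point
    """
    V_inv = np.linalg.inv(V_block)
    for a, i1 in enumerate(pose_ids):
        # diagonal U contribution of pose i1
        S[6*i1:6*i1+6, 6*i1:6*i1+6] += U_blocks[a]
        for c, i2 in enumerate(pose_ids):
            # subtract W_{i1,j} V_j^{-1} W_{i2,j}^T, as in Equation 7
            S[6*i1:6*i1+6, 6*i2:6*i2+6] -= W_blocks[a] @ V_inv @ W_blocks[c].T
```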
Referring to
According to an example embodiment, the SLAM accelerator may estimate a camera pose and a map point by using only data obtained from a camera. However, the disclosure is not limited thereto, and as such, according to another example embodiment, performance may be enhanced by using data obtained from other sensors, such as an IMU along with the aforementioned data from the camera. In this case, as shown in
Further, the errors related to the IMU factor may be defined according to the following Equation 9.
e_r = Log((Exp(ΔJ_r^{ij}(b_i − b̂_i)) ΔR_{ij})^T R_j R_i^T)

e_v = R_i(v_j − v_i − g Δt_{ij}) − (Δv_{ij} + ΔJ_v^{ij}(b_i − b̂_i))

e_p = R_i(p_j − p_i − v_i Δt_{ij} − ½ g Δt_{ij}^2) − (Δp_{ij} + ΔJ_p^{ij}(b_i − b̂_i))

e_b = b_j − b_i   [Equation 9]
In Equation 9, i and j represent frame numbers (i.e., camera pose numbers), and ij may indicate pre-integration from the ith frame to the jth frame.
The elements estimated by using the camera measurements among the elements of the state vector C for the camera pose described above with reference to
Referring to the example of
Referring to the example of
Accordingly, as illustrated in
As such, when a measurement corresponds to a result of performing front-end operations on data obtained from the camera and the IMU, the SLAM accelerator (or the back-end processor included in the SLAM accelerator) may divide the measurement into a first part affected by both the camera and the IMU, and a second part affected only by the IMU. Using the first part, the SLAM accelerator may compute elements of a matrix block for a camera pose, elements of a matrix block for a map point, and elements of a matrix block for at least one camera pose corresponding to the map point, and then accumulate the computed elements over the optimization matrix. Thereafter, the SLAM accelerator may compute elements of a matrix block for the camera pose by using the second part, and then accumulate the computed elements over the optimization matrix. According to an example embodiment, the SLAM accelerator may accumulate the elements computed by using the second part after accumulating the elements computed by using the first part.
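A minimal sketch of this two-part accumulation is shown below; the helpers (split, camera_blocks, imu_pose_block, accumulate_blocks, accumulate_pose) are hypothetical and serve only to illustrate the ordering described above.

```python
def accumulate_camera_imu_measurement(S, measurement, split, camera_blocks,
                                      imu_pose_block, accumulate_blocks, accumulate_pose):
    """Sketch of the camera+IMU case: the first part (affected by both the camera and the IMU)
    contributes pose, map-point, and pose-to-point blocks; the second part (affected only by
    the IMU) contributes an additional camera-pose block."""
    first_part, second_part = split(measurement)       # camera+IMU part / IMU-only part

    U_blk, V_blk, W_blks = camera_blocks(first_part)   # elements computed from the first part
    accumulate_blocks(S, U_blk, V_blk, W_blks)         # fold the first part into the optimization matrix

    U_imu = imu_pose_block(second_part)                # camera-pose elements from the IMU-only factor
    accumulate_pose(S, U_imu)                          # then fold the second part into the optimization matrix
```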
Referring to
The back-end processor 320 may obtain elements of the Jacobian matrix (JIMU) for the IMU, elements of the Jacobian matrix (Jcam) for the camera pose, elements of the Jacobian matrix (Jpoint) for the map point, and elements of a vector (e) for errors by performing a Jacobian update 1710 for the first data.
The back-end processor 320 may obtain elements of the matrix blocks U, W, and V, and elements of the vector r by performing a Hessian update 1720 for data output through the Jacobian update 1710.
Then, the back-end processor 320 may obtain a state change (ΔX) with optimized accumulated errors by performing operations of the Schur-complement operator 1730 and equation operations of the linear solver 1740 on the obtained elements of the matrix blocks U, W, and V and the elements of the vector r. The operations of the Schur-complement operator 1730 and the equation operations of the linear solver 1740 may correspond to the operations according to the above Equation 6. The state change (ΔX) may include a state change ΔXC for a camera pose and a state change ΔXP for a map point.
According to an example embodiment, the back-end processor 320 may sequentially perform Schur-complement operations based on the map point. For example, the back-end processor 320 may perform Schur-complement operations on elements of a matrix for the first map point and elements of a matrix for at least one camera pose corresponding to the first map point. The back-end processor 320 may accumulate results of performing operations on the first map point in a memory (e.g., the factor graph memory 310).
Then, the back-end processor 320 may perform Schur-complement operations on elements of a matrix for a second map point following the first map point and elements of a matrix for at least one camera pose corresponding to the second map point. The back-end processor sequentially performs Schur-complement operations based on a map point and accumulates the results in a memory to minimize the time required to load the data afterwards.
According to an example embodiment, the back-end processor 320 may obtain second data optimized from the first data based on result values accumulated in the memory. For example, the back-end processor 320 may obtain the second data, which corresponds to the first data in a new state 1750, by applying a state change (ΔX) obtained through operations of the Schur-complement operator 1730 and equation operations of the linear solver 1740 to the first data. The second data (X2) may refer to the state of the map point XP1 and the state of the camera pose XC1 of the first data (X1) with ΔXP and ΔXC applied, respectively.
Referring to
In operation 1810, the SLAM accelerator may divide operations to obtain elements of a matrix for a map point and a camera pose into a plurality of sub-tracks. A track length of the plurality of sub-tracks may be determined based on the number of camera poses in which the SLAM accelerator (or the back-end processor) is able to perform operations simultaneously.
For example, when the SLAM accelerator is capable of performing operations simultaneously only for two camera poses (or frames), the length of the sub-track may be set to ‘2’. For example, when a particular map point (e.g., P1) is obtained in four frames corresponding to the camera poses C1 to C4, a matrix for the map point (e.g., the matrix block V of
N_{subtrack} = N_{frame} − (subtrack length) + 1   [Equation 10]
According to Equation 10, when the length of the sub-track (subtrack length) is '2', and the number of frames (Nframe) in which a certain map point is obtained is 4, the number of sub-tracks (Nsubtrack) is '3'. Referring to
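The sub-track division of Equation 10 may be sketched as follows (the function name is an assumption):

```python
def split_into_subtracks(pose_ids, subtrack_length=2):
    """Divide a map-point track into overlapping sub-tracks per Equation 10 (sketch).

    Example: split_into_subtracks(["C1", "C2", "C3", "C4"], 2)
             -> [["C1", "C2"], ["C2", "C3"], ["C3", "C4"]]   (N_subtrack = 4 - 2 + 1 = 3)
    """
    n_subtrack = len(pose_ids) - subtrack_length + 1
    return [pose_ids[k:k + subtrack_length] for k in range(n_subtrack)]
```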
In operation 1820, the SLAM accelerator may perform operations in relation to a first sub-track, and store a first result value in the memory. The operations for the first sub-track may include optimization operations or Schur-complement operations.
According to an example embodiment illustrated in
The SLAM accelerator may obtain a matrix S1,1 and a vector b1,1 as a first result value by performing Schur-complement operations in relation to the matrix V1,1 and the matrices W1(1) and W1(2). Here, the matrix S is symmetric with respect to the diagonal, and as such, the SLAM accelerator may obtain only diagonal elements and upper triangular elements of the matrix S1,1 by performing Schur-complement operations. The SLAM accelerator may store the data of the obtained matrix S1,1 and vector b1,1 in the memory as the first result value.
In operation 1830, the SLAM accelerator may perform operations in relation to a second sub-track, and obtain a second result value. The operations for the second sub-track may include optimization operations or Schur-complement operations.
According to an example embodiment illustrated in
The SLAM accelerator may not load data of the second sub-track 1920 which overlaps with the data of the first sub-track 1910. For example, the SLAM accelerator may determine that the matrix W1(2) for the camera pose C2 corresponding to the first map point at the first sub-track 1910 overlaps with the matrix for the camera pose C2 corresponding to the second map point at the second sub-track 1920. Accordingly, the SLAM accelerator may reduce the amount of data that is loaded by loading only the matrix W1(3) for the camera pose C3 without separately loading the matrix W1(2) for the camera pose C2. As such, the SLAM accelerator may reduce the loading time by refraining from reloading previously loaded data.
The SLAM accelerator may obtain a matrix S1,2 and a vector b1,2 as a second result value by performing Schur-complement operations in relation to the matrix V1,2 and the matrices W1(2) and W1(3).
In operation 1840, the SLAM accelerator may accumulate the second result value over the first result value. For example, the SLAM accelerator may accumulate the second result value over the memory where the first result value is stored. That is, the SLAM accelerator may overwrite the first result value with the second result value.
According to an example embodiment illustrated in
Referring to
In one embodiment, the Schur-complement operator may compute a matrix S and a vector b using the following Equation 11. Equation 11 may correspond to pseudo-code for performing Schur-complement operations divided into a plurality of sub-tracks (e.g., five sub-tracks).
for (j = 0; j < N_subtracked_keypoint; j++)
  for (i = C_s; i < C_s + track_length; i++)
    S(i, i+0) += W_j(i) Q_j(i−4, i) W_j(i+0)^T
    S(i, i+1) += W_j(i) Q_j(i−3, i) W_j(i+1)^T
    S(i, i+2) += W_j(i) Q_j(i−2, i) W_j(i+2)^T
    S(i, i+3) += W_j(i) Q_j(i−1, i) W_j(i+3)^T
    S(i, i+4) += W_j(i) Q_j(i−0, i) W_j(i+4)^T
    b(i) += W_j(i) Σ_{k=i−4}^{i} V_j(k)^{−1} v_j(k)

(where Q_j(a, b) = Σ_{k=a}^{b} V_j(k)^{−1} and q = Σ_{k=i−4}^{i} V_j(k)^{−1} v_j(k))   [Equation 11]
In the above Equation 11, i represents an index of a camera pose, and j represents an index of a map point.
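A hedged Python rendering of the Equation 11 pseudo-code is given below; the dictionary-based storage of the W, V^{-1}, and v blocks and the clamping of indices at the start of the track are assumptions added only to make the sketch self-contained.

```python
def schur_subtrack_update(S, b, W, V_inv, v, Cs, track_length, n_keypoints, D=5):
    """Sketch of Equation 11: accumulate S(i, i+d) and b(i) over D sub-tracks at a time.

    W[j]        : maps camera pose index i to the block W_j(i) (6x3 NumPy array assumed)
    V_inv[j][k] : block V_j(k)^{-1} (3x3 assumed);  v[j][k] : vector v_j(k) (length 3 assumed)
    S, b        : dicts keyed by (i1, i2) and i; missing entries are treated as zero
    """
    def Q(j, a, c):
        # Q_j(a, c) = sum of V_j(k)^{-1} for k = a..c, clamped to the start of the track
        return sum(V_inv[j][k] for k in range(max(a, Cs), c + 1))

    for j in range(n_keypoints):
        for i in range(Cs, Cs + track_length):
            for d in range(D):
                # S(i, i+d) += W_j(i) Q_j(i-(D-1)+d, i) W_j(i+d)^T
                if (i + d) in W[j]:
                    S[(i, i + d)] = S.get((i, i + d), 0) + W[j][i] @ Q(j, i - (D - 1) + d, i) @ W[j][i + d].T
            # b(i) += W_j(i) q, with q accumulated over the last D poses of the track
            q = sum(V_inv[j][k] @ v[j][k] for k in range(max(i - (D - 1), Cs), i + 1))
            b[i] = b.get(i, 0) + W[j][i] @ q
```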
According to an example embodiment, the inverse multiplier 2010 may receive V data and r data obtained through a Hessian update (e.g., the Hessian update 1720 of
According to an example embodiment, the Q generator 2020 may generate a vector q and a matrix Qj(a,b) by receiving the inverse matrix of the V data (V−1) and the product of the inverse matrix of the V data and the r data (V−1r) from the inverse multiplier 2010.
According to an example embodiment, the W supplier 2030 may receive W data obtained through the Hessian update. For example, the W data may refer to elements of the matrix block W for a camera pose corresponding to a map point in a Hessian matrix (H(Xk)).
According to an example embodiment, the vector-scalar product array 2040 may perform multiplication operations in relation to the vector q and the matrix Qj(a,b) received from the Q generator 2020, and W data received from the W supplier 2030. According to an example embodiment, the tensor product array 2050 may perform tensor product operations in relation to the W data received from the W supplier 2030 and data of multiplication operations performed by the vector-scalar product array 2040. In this case, the tensor product array 2050 may perform tensor product operations in relation to the W data received from the W supplier 2030 through conversion to a transposed matrix by transposing values of rows with values of columns with respect to the diagonal elements.
According to an example embodiment, the vector-scalar product array 2040 may transmit the data of multiplication operations performed on the vector q and the W data to the vector accumulator memory 2060. According to an example embodiment, the tensor product array 2050 may transmit the W data and the tensor product operation data for data obtained from multiplication operations performed by the vector-scalar product array 2040 to the matrix accumulator memory 2070.
With reference to
According to an example embodiment, the W supplier 2030 may include a plurality of W registers corresponding to a plurality of divided sub-tracks. For example, when the W data is divided into five sub-tracks, the W supplier 2030 may include five W registers. In this case, the W registers may include one register including diagonal elements of the W data and four registers including off-diagonal elements of the W data.
According to an example embodiment, the W supplier 2030 may include a plurality of shift registers (W register). For example, the W supplier 2030 may move the received W data through the shift registers in consecutive order. According to an example embodiment, the number of shift registers may be identical to that of the divided sub-tracks.
According to an example embodiment, the W supplier 2030 may include a timing controller (t-con) configured to transmit data processed through the plurality of shift registers to the tensor product array 2050. For example, the data processed by the plurality of shift registers may be input to the timing controller in consecutive order, and the timing controller may transmit simultaneously the plurality of pieces of input data (e.g., Wj(i), Wj(i+1), Wj(i+2), Wj(i+3), and Wj(i+4)) to the tensor product array 2050.
According to an example embodiment, the W supplier 2030 may transmit the data processed through the plurality of shift registers to the vector-scalar product array 2040. For example, the W supplier 2030 may transmit the data (e.g., Wj(i)) processed by the register including diagonal elements of the W data to the vector-scalar product array 2040.
With reference to
According to an example embodiment, the Q generator 2020 may include a plurality of dual registers and adders corresponding to the plurality of divided sub-tracks. For example, the plurality of dual registers and adders may correspond to the shift registers. The Q generator 2020 may move in consecutive order the received inverse matrix (V−1) of the V data and the product (V−1r) of the inverse matrix of the V data and the r data through the plurality of dual registers and adders. In one embodiment, the number of dual registers may be identical to the number of divided sub-tracks.
According to an example embodiment, the Q generator 2020 may include a timing controller (t-con) configured to transmit data processed through the plurality of dual registers and adders to the vector-scalar product array 2040. For example, the data processed by the plurality of dual registers and adders may be input to the timing controller in consecutive order, and the timing controller may transmit simultaneously the plurality of pieces of input data (e.g., q, Qj(i−4,i), Qj(i−3,i), Qj(i−2,i), Qj(i−1,i), and Qj(i,i)) to the vector-scalar product array 2040.
With reference to
According to an example embodiment, the vector-scalar product array 2040 may perform vector-scalar product operations in relation to the W data, the vector q, and the matrix Qj(a,b) data. In such a case, the vector-scalar product array 2040 may perform multiplication operations in relation to the W data and the vector q data, and multiplication operations in relation to the W data and the matrix Qj(a,b) data simultaneously. For example, the vector-scalar product array 2040 may include a plurality of pipelined vector-scalar multipliers, and each pipelined vector-scalar multiplier may perform multiplication operations on the W data and the vector q data and multiplication operations on the W data and the matrix Qj(a,b) data in parallel.
According to an example embodiment, the vector-scalar product array 2040 may store results of multiplication operations on the W data and the vector q data (e.g., Wj(i)q) in the vector accumulator memory 2060. In one embodiment, the vector-scalar product array 2040 may transmit result values of multiplication operations on the W data and the matrix Qj(a,b) data (e.g., Wj(i)Qj(i−4,i), Wj(i)Qj(i−3,i), Wj(i)Qj(i−2,i), Wj(i)Qj(i−1,i), and Wj(i)Qj(i,i)) to the tensor product array 2050.
With reference to
According to an example embodiment, the tensor product array 2050 may perform tensor product operations on the W data and result values of multiplication operations on the W data and the matrix Qj(a,b) data. In this case, the tensor product array 2050 may perform tensor product operations on the W data received from the W supplier 2030 and the result values of multiplication operations on the W data and the matrix Qj(a,b) data received from the vector-scalar product array 2040 simultaneously. For example, the tensor product array 2050 may perform tensor product operations in relation to the W data received from the W supplier 2030 through conversion into a transposed matrix. The W data converted into a transposed matrix may include Wj(i)T, Wj(i+1)T, Wj(i+2)T, Wj(i+3)T, and Wj(i+4)T.
In one embodiment, the tensor product array 2050 may store result values of tensor product operations (e.g., Wj(i)Qj(i−4,i)Wj(i)T, Wj(i)Qj(i−3,i)Wj(i+1)T, Wj(i)Qj(i−2,i)Wj(i+2)T, Wj(i)Qj(i−1,i)Wj(i+3)T, and Wj(i)Qj(i,i)Wj(i+4)T) in the matrix accumulator memory 2070. As such, the SLAM accelerator according to the disclosure may perform effective operations for a factor graph having complicated connection relations by using division of sub-tracks, despite limitations in hardware resources. Further, the SLAM accelerator may move data in consecutive order by using shift registers, and by reusing the existing data, operations for each sub-track unit may be processed all at once. Accordingly, optimization operations may be performed with low power and high speed.
The SLAM accelerator may load the Nth keypoint measurement in the Kth cycle. The SLAM accelerator may sequentially perform operations related to the Nth measurement over consecutive cycles (i.e., the K+1th cycle to the K+7th cycle) after the Nth measurement is loaded in the Kth cycle. For example, the SLAM accelerator may sequentially perform data load, computation of the reprojection error, generation of a Jacobian matrix (or elements of a Jacobian matrix), generation of a Hessian matrix (or elements of a Hessian matrix), Q generator, W supplier, vector-scalar product, tensor product, vector accumulation, and matrix accumulation in relation to the Nth measurement.
Further, the SLAM accelerator may load the N+1th measurement in the K+1th cycle. The SLAM accelerator may perform operations related to the N+1th measurement, following the operations related to the Nth measurement, after the N+1th measurement is loaded in the K+1th cycle. For example, the SLAM accelerator may perform the data load for the N+1th measurement simultaneously with performing reprojection in relation to the Nth measurement in the K+1th cycle. Then, the SLAM accelerator may perform reprojection for the N+1th measurement simultaneously with performing generation of a Jacobian matrix for the Nth measurement in the K+2th cycle.
In addition, the SLAM accelerator may load the N+2th measurement in the K+2th cycle, and perform operations for the N+2th measurement, following the operations for the Nth and N+1th measurements. As such, the SLAM accelerator may include a pipeline structure configured to perform each of the plurality of operations for optimizing state variables, and may perform operations in relation to the plurality of measurements in parallel. Accordingly, optimization operations may be performed with high speed.
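To illustrate the pipelined overlap described above, the following sketch prints which operation each measurement occupies in each cycle; the stage list follows the operations named in the text, and the one-stage-per-cycle schedule is an assumption made only for illustration.

```python
STAGES = ["data load", "reprojection error", "Jacobian elements", "Hessian elements",
          "Q generator", "W supplier", "vector-scalar product", "tensor product",
          "vector accumulation", "matrix accumulation"]

def pipeline_schedule(n_measurements, k0=0):
    """Print the pipeline occupancy: measurement n enters in cycle k0 + n and advances one stage per cycle."""
    total_cycles = k0 + n_measurements + len(STAGES) - 1
    for k in range(k0, total_cycles):
        active = [f"measurement N+{n}: {STAGES[k - k0 - n]}"
                  for n in range(n_measurements) if 0 <= k - k0 - n < len(STAGES)]
        print(f"cycle K+{k - k0}: " + "; ".join(active))

pipeline_schedule(3)   # three measurements N, N+1, N+2 overlapping in the pipeline
```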
According to an example embodiment, the aforementioned method of accelerating SLAM may be recorded on a computer-readable recording medium on which one or more programs including instructions to execute the method are recorded. The computer-readable recording medium may include a hardware device specifically configured to store and execute program instructions, such as magnetic media including a hard disk, a floppy disk, and a magnetic tape, optical media such as a CD-ROM and a DVD, magneto-optical media such as a floptical disk, ROM, RAM, flash memory, etc. The program instructions may include not only machine language code, which is made by a compiler, but also high-level language code executable by a computer by using an interpreter, etc.
It should be understood that example embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each example embodiment should typically be considered as available for other similar features or aspects in other embodiments. While one or more example embodiments have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined by the following claims.