RGB-D camera based tracking system and method thereof

Description

FIELD

This disclosure relates generally to tracking systems and, more particularly, to an RGB-D camera based tracking system and method thereof.

BACKGROUND

Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

SUMMARY

A summary of certain embodiments disclosed herein is set forth below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of these certain embodiments and that these aspects are not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be set forth below.

Embodiments of the disclosure relate to a method for computing visual Simultaneous Localization and Mapping (SLAM). The method comprises generating, by a visual odometry module, a local odometry estimate; generating, by a keyframe generator, keyframes; creating a keyframe graph; adding a constraint to the keyframe graph using a loop constraint evaluator; and optimizing the keyframe graph with trajectory. The method further comprises generating a new keyframe between a keyframe and a current frame before generating a local odometry estimate. Adding the constraint to the keyframe graph using the loop constraint evaluator is based on a loop closure, wherein the loop closure is the return to previously visited locations. The method further comprises adjusting a pose graph based on edge weights of different constraints in the keyframe graph after optimization.

According to another aspect of the disclosure, a method of applying a probabilistic sensor model for a dense visual odometry comprises generating, by a keyframe generator, keyframes; creating a keyframe graph; adding a constraint to the keyframe graph using a loop constraint evaluator; and optimizing the keyframe graph with trajectory. The method further comprises generating a new keyframe between a keyframe and a current frame before generating a local odometry estimate. Adding the constraint to the keyframe graph using the loop constraint evaluator is based on a loop closure, wherein the loop closure is the return to previously visited locations. The method further comprises adjusting a pose graph based on edge weights of different constraints in the keyframe graph after optimization.

According to another aspect of the disclosure, a method using a t-distribution for photometric errors and a probabilistic sensor model for geometric errors comprises:

${\hat{ξ}}_{Hybrid} = \underset{ξ}{\arg \min} \sum_{i = 1}^{n} r_{i}^{⊤} W_{i}^{1 / 2} Σ^{- 1} W_{i}^{1 / 2} r_{i}$

According to another aspect of the disclosure, a visual SLAM system comprises a plurality of keyframes including a keyframe, a current keyframe, and a previous keyframe; a dual dense visual odometry configured to provide a pairwise transformation estimate between two of the plurality of keyframes; a keyframe generator configured to create a keyframe graph; a loop constraint evaluator configured to add a constraint to the keyframe graph; and a graph optimizer configured to produce a map with trajectory.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects, and advantages of this disclosure will become better understood when the following detailed description of certain exemplary embodiments is read with reference to the accompanying drawings, in which like characters represent like parts throughout the drawings, wherein:

FIG. 1 is a block diagram illustrating a visual SLAM system;

FIG. 2 is a block diagram illustrating the structure of an example keyframe graph and loop constraint evaluator;

FIG. 3 illustrates an RGB-D camera sensor model;

FIG. 4 is a block diagram of an uncertainty propagation; and

FIG. 5 illustrates an example of a map generated by a σ-DVO SLAM system.

DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the described embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the described embodiments. Thus, the described embodiments are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.

Various operations may be described as multiple discrete actions or operations in turn, in a manner that is most helpful in understanding the claimed subject matter. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order than the described embodiment. Various additional operations may be performed and/or described operations may be omitted in additional embodiments.

FIG. 1 is a block diagram illustrating a visual Simultaneous Localization and Mapping (SLAM) system 100 divided into a frontend 100a and a backend 100b. At the frontend 100a, the system 100 uses a visual odometry approach that makes full use of all pixel information from an RGB-D camera to generate a local transformation estimate 112. Which is to say, dense visual odometry 108 or 110 provides a pairwise transformation estimate between two of the image frames 102, 104, 106. As illustrated, a first pairwise transformation estimate is performed between keyframe 102 and current frame 104 using dense visual odometry 108. A second pairwise transformation estimate is performed between current frame 104 and previous frame 106 using dense visual odometry 110. A keyframe generator 114 is used to generate a keyframe V_k based on the quality of the odometry estimate. At the backend 100b of the system 100, a keyframe graph G⊂{V_k} 116 is created using the keyframe generator 114. At a loop constraint evaluator 118, constraints based on the return to previously visited locations, i.e., loop closure, are added to the keyframe graph to improve its connectivity. Graph optimizer 120 then optimizes the final graph with constraints to produce an optimized map with trajectory 122. More details on the keyframe graph 116 and the loop constraint evaluator 118 will be described below. The system 100 applies a probabilistic sensor model in the frontend 100a, and performs keyframe generation 114, loop constraint detection 118, and graph optimization 120 in the backend 100b.

FIG. 2 is a block diagram illustrating a structure of an example keyframe graph 200 comprising a backend graph optimization 202 and a local neighborhood 204. In the backend graph optimization 202, the loop constraints L_{K_i,K_j}, combined with odometry constraints weighted by I_{K_i,K_j}, are optimized. The recent keyframe K₁ and the frames tracked with respect to K₁, (f₁, . . . , f_n), are included in the local neighborhood 204. The keyframe K₁ and the tracked frames (f₁, . . . , f_n) are determined based on the ratio of entropies H_{K₁,f₁}/H_{K₁,f_n}. When the current frame does not contain sufficient information to track a new frame, a new keyframe is generated by using the entropy of a camera pose estimate. A new keyframe is generated when the estimated entropy between the keyframe and the current frame falls below a threshold normalized by the largest estimate entropy in the local neighborhood 204. The largest estimate entropy is assumed to be the one between the keyframe and the first frame. An additional keyframe generation strategy based on the curve estimate of the camera trajectory is proposed. The curve estimate ρ_{i,k} between frames i and k is defined as the ratio of the sum of the translations between consecutive frames (δ_{i,i−1}) in the local neighborhood N to the translation between the keyframe and the latest frame (δ_{i,k}).

$\begin{matrix} ρ_{i, k} = \frac{\sum_{i \in N}^{k} δ_{i, i - 1}}{δ_{i, k}} & equation (1) \end{matrix}$
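The curve estimate of equation (1) can be sketched in a few lines. This is only an illustration under stated assumptions: camera positions are taken as 3-vectors, the function name `curve_estimate` and the threshold `RHO_MAX` are hypothetical, and the disclosure does not specify a threshold value.

```python
import numpy as np

RHO_MAX = 1.5  # hypothetical threshold, not taken from the disclosure


def curve_estimate(key_position, frame_positions):
    """Ratio of summed frame-to-frame translations to the keyframe-to-latest translation.

    key_position:    (3,) camera position of the keyframe
    frame_positions: (n, 3) positions of the tracked frames, oldest first
    """
    path = np.vstack([key_position, frame_positions])
    # delta_{i,i-1}: translation between consecutive frames in the neighborhood
    steps = np.linalg.norm(np.diff(path, axis=0), axis=1)
    # delta_{i,k}: translation between the keyframe and the latest frame
    chord = np.linalg.norm(path[-1] - path[0])
    return steps.sum() / chord


straight = curve_estimate([0, 0, 0], [[1, 0, 0], [2, 0, 0], [3, 0, 0]])
curved = curve_estimate([0, 0, 0], [[1, 0, 0], [1, 1, 0], [0, 1, 0]])
needs_new_keyframe = curved > RHO_MAX
```

A straight path gives a ratio of 1, and the ratio grows as the trajectory bends back on itself, which is what makes it usable as a keyframe trigger.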

The return to a previously visited location, called loop closure, helps identify additional constraints for the graph at the loop constraint evaluator 118, as illustrated in FIG. 2. After optimization, the pose graph is adjusted based on the edge weights of different constraints in the graph. An erroneous loop constraint can sometimes lead to a poorly optimized final trajectory. Extending previous loop constraint generation methods, two additional techniques can be used to reduce the impact of wrong loop constraints. Firstly, the loop closure constraints are weighted based on the inverse square of the metric distance between the keyframes that form the loop closure. This is based on the intuition that a loop constraint between frames far from one another is prone to a larger error than one between frames close to one another. Secondly, occlusion filtering is performed to remove false loop closure constraints. The depth image provides geometry information which can be used to perform occlusion filtering between two keyframes. The standard deviation of the sensor model uncertainty of a depth point provides a bound on the maximum possible depth shift, as given by the following equation:

$\begin{matrix} η (Z_{i}) = \frac{q_{pix} bf}{2} [\frac{1}{Rnd (\frac{q_{pix} bf}{Z_{i}} - 0.5)} - \frac{1}{Rnd (\frac{q_{pix} bf}{Z_{i}} + 0.5)}] & equation (2) \end{matrix}$

All points which violate this assumption are considered occlusions.
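The two safeguards described above can be sketched as follows. The function names, the `eps` guard against division by zero, and the rejection ratio `MAX_OCCLUDED_FRACTION` are assumptions introduced for illustration; the depth bound is passed in as a precomputed per-point value rather than derived here.

```python
import numpy as np

MAX_OCCLUDED_FRACTION = 0.3  # hypothetical rejection ratio, not from the disclosure


def loop_closure_weight(position_a, position_b, eps=1e-6):
    """Weight a loop constraint by the inverse square of the keyframe distance."""
    d = np.linalg.norm(np.asarray(position_a, float) - np.asarray(position_b, float))
    return 1.0 / max(d * d, eps)


def occlusion_mask(z_predicted, z_observed, depth_bound):
    """Mark points whose depth shift exceeds the sensor-model bound as occluded."""
    return np.abs(np.asarray(z_observed) - np.asarray(z_predicted)) > np.asarray(depth_bound)


weight = loop_closure_weight([0.0, 0.0, 0.0], [2.0, 0.0, 0.0])
mask = occlusion_mask([1.00, 2.00, 3.00], [1.01, 2.50, 3.02], [0.02, 0.05, 0.10])
reject_loop = mask.mean() > MAX_OCCLUDED_FRACTION
```

Here the second point shifts by 0.5 m against a bound of 0.05 m and is flagged, so one point in three is occluded and the candidate loop closure is rejected under the assumed ratio.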

On generation of a new keyframe, the backend graph is updated with the previous keyframe information and a double window graph structure 200 is created. The pose graph in the backend is optimized using, for example, the open source library g2o. A final optimization on the termination of the visual odometry is performed to generate an optimized camera trajectory estimate.
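To illustrate how weighted odometry and loop constraints pull against each other during optimization, the following toy sketch solves a one-dimensional pose graph by weighted linear least squares with NumPy. It is a deliberately reduced stand-in and not the g2o-based backend: real keyframe poses are six-degree-of-freedom transformations, and the function name and edge format are assumptions.

```python
import numpy as np


def optimize_pose_graph_1d(num_poses, edges):
    """Solve a 1D pose graph; pose 0 is fixed at the origin.

    edges: iterable of (i, j, measurement of x_j - x_i, weight)
    """
    rows, rhs = [], []
    for i, j, measurement, weight in edges:
        row = np.zeros(num_poses)
        row[j] += 1.0
        row[i] -= 1.0
        scale = np.sqrt(weight)  # weighted least squares via row scaling
        rows.append(scale * row)
        rhs.append(scale * measurement)
    A = np.array(rows)[:, 1:]  # drop the column of the fixed pose
    solution, *_ = np.linalg.lstsq(A, np.array(rhs), rcond=None)
    return np.concatenate([[0.0], solution])


odometry = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (2, 3, 1.0, 1.0)]
loop = [(0, 3, 3.3, 1.0)]
poses = optimize_pose_graph_1d(4, odometry + loop)
```

With equal weights the 0.3 discrepancy is spread over all four edges, so each step grows to 1.075; lowering the loop weight would leave the odometry chain closer to its raw measurements.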

Generally, RGB-D cameras project infra-red patterns and recover depth from correspondences between two image views with a small parallax. During this process, the disparity is quantized into sub-pixels. This introduces a quantization error in the depth measurement. The noise due to quantization error in depth measurement is defined as

$\begin{matrix} η (Z_{i}) = \frac{q_{pix} bf}{2} [\frac{1}{Rnd (\frac{q_{pix} bf}{Z_{i}} - 0.5)} - \frac{1}{Rnd (\frac{q_{pix} bf}{Z_{i}} + 0.5)}] & equation (3) \end{matrix}$

where q_pix is the sub-pixel resolution of the device, b is the baseline, and f is the focal length. This error increases quadratically with range Z_i, thus preventing the use of depth observations from far objects. The 3D sensor noise of RGB-D cameras can be modeled with a zero-mean multivariate Gaussian distribution whose covariance matrix has the following as the diagonal components:

$\begin{matrix} σ_{11}^{2} = \tan (\frac{β_{x}}{2}) Z_{i}, σ_{22}^{2} = \tan (\frac{β_{y}}{2}) Z_{i}, σ_{33}^{2} = {η (Z_{i})}^{2} & equation (4) \end{matrix}$

where the σ₃₃² direction is along the ray, and β_x and β_y denote the angular resolutions in the x and y directions.
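Equations (3) and (4) can be evaluated numerically as in the sketch below. The device parameters (`Q_PIX`, `BASELINE`, `FOCAL`, `BETA`) are assumed values chosen only for illustration, and Rnd is interpreted as rounding to the nearest integer; neither choice is stated in the disclosure.

```python
import numpy as np

# Assumed device parameters, for illustration only.
Q_PIX = 8.0       # sub-pixel resolution (1/8 pixel disparity steps)
BASELINE = 0.075  # baseline b in meters
FOCAL = 580.0     # focal length f in pixels
BETA = np.deg2rad(0.1)  # angular resolution in x and y


def rnd(x):
    """Round to nearest integer (assumed meaning of Rnd)."""
    return np.floor(x + 0.5)


def depth_quantization_noise(z):
    """Equation (3): quantization noise eta(Z). Valid while the disparity exceeds one step."""
    qbf = Q_PIX * BASELINE * FOCAL
    return 0.5 * qbf * (1.0 / rnd(qbf / z - 0.5) - 1.0 / rnd(qbf / z + 0.5))


def sensor_variances(z, beta_x=BETA, beta_y=BETA):
    """Equation (4): diagonal components of the point covariance in ray coordinates."""
    eta = depth_quantization_noise(z)
    return np.array([np.tan(beta_x / 2.0) * z, np.tan(beta_y / 2.0) * z, eta ** 2])


eta_1m = depth_quantization_noise(1.0)
eta_3m = depth_quantization_noise(3.0)
```

Under these assumed parameters the noise at 3 m is roughly nine times the noise at 1 m, in line with the quadratic growth noted in the text.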

FIG. 3 illustrates an RGB-D camera sensor model. The camera is located at the origin and is looking up in the z direction. For each range of 1, 2, and 3 meters, 80 points are sampled and their uncertainties are expressed with ellipsoids. The error in the ray direction increases quadratically with range.

FIG. 4 is a block diagram of an uncertainty propagation. Each 3D point p_i in FIG. 4 is associated with a Gaussian distribution whose covariance matrices are Σ_i and Σ_i′, respectively,

$\begin{matrix} p (p_{i}) = \mathcal{N} (p_{i}, Σ_{i}) & equation (5) \end{matrix}$

where

$\begin{matrix} Σ_{i} = R_{ray} [\begin{matrix} σ_{11}^{2} & 0 & 0 \\ 0 & σ_{22}^{2} & 0 \\ 0 & 0 & σ_{33}^{2} \end{matrix}] R_{ray}^{⊤} & equation (6) \end{matrix}$

R_ray denotes the rotation matrix between the ray and camera coordinates.
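A minimal sketch of equation (6) follows. How R_ray is constructed is not specified in the text, so the sketch assumes it is the rotation that takes the camera z axis onto the viewing ray, built with the Rodrigues formula; the function names are hypothetical.

```python
import numpy as np


def rotation_z_to_ray(ray):
    """Rotation that maps the z axis onto the unit viewing ray (assumed form of R_ray)."""
    ray = np.asarray(ray, dtype=float)
    ray = ray / np.linalg.norm(ray)
    z_axis = np.array([0.0, 0.0, 1.0])
    v = np.cross(z_axis, ray)
    c = float(np.dot(z_axis, ray))
    if np.isclose(c, -1.0):
        return np.diag([1.0, -1.0, -1.0])  # half turn about x for the opposite direction
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)


def point_covariance(ray, variances):
    """Equation (6): rotate the diagonal ray-frame covariance into camera coordinates."""
    R = rotation_z_to_ray(ray)
    return R @ np.diag(variances) @ R.T


ray = np.array([1.0, 0.0, 1.0])
cov = point_covariance(ray, [1e-3, 1e-3, 4e-2])
```

The variance measured along the viewing ray stays equal to σ₃₃², while the ellipsoid as a whole is tilted to follow the ray, as drawn in FIG. 3.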

A method of linearization is used to propagate the uncertainty to the residuals and the likelihood function can be expressed as a Gaussian distribution,

$\begin{matrix} p (r_{i} | ξ) = \mathcal{N} (μ_{i}, Σ_{i}) & equation (7) \end{matrix}$

where

$\begin{matrix} μ_{i} = [\begin{matrix} μ_{i}^{I} \\ μ_{i}^{Z} \end{matrix}] = [\begin{matrix} I_{2} (π (g ({\overline{p}}_{i}, ξ))) - I_{1} (x_{i}) \\ Z_{2} (π (g ({\overline{p}}_{i}, ξ))) - {[g ({\overline{p}}_{i}, ξ)]}_{Z} \end{matrix}] & equation (8) \\ Σ_{i} = J_{i} Σ_{i} J_{i}^{⊤} + \operatorname{diag} (0, {[Σ_{i}^{'}]}_{3, 3}) & equation (9) \\ J_{i}^{⊤} = [\begin{matrix} \nabla r_{i}^{I} & \nabla r_{i}^{Z} \end{matrix}] = [\begin{matrix} \frac{\partial r_{i}^{I}}{\partial p_{i}} & \frac{\partial r_{i}^{Z}}{\partial p_{i}} \end{matrix}] & equation (10) \end{matrix}$

Here, [Σ_i′]_{3,3} denotes the variance of the back-projected point q_i′ in the z axis of the current camera coordinates as shown in FIG. 4. The maximum likelihood estimation is,

$\begin{matrix} {\hat{ξ}}_{Sensor} = \underset{ξ}{\arg \min} \sum_{i = 1}^{n} r_{i}^{⊤} Σ_{i}^{- 1} r_{i} & equation (11) \end{matrix}$
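The propagation in equation (9) and the cost in equation (11) can be sketched as below. The sketch assumes J_i is the 2×3 Jacobian of the residual with respect to the 3D point, as in equation (10), and treats the left-hand side of equation (9) as the 2×2 residual covariance; the function names are hypothetical.

```python
import numpy as np


def residual_covariance(J, point_cov, var_z_backprojected):
    """Equation (9): first-order propagation of the point covariance to the residual.

    J:                   (2, 3) Jacobian of (r_I, r_Z) with respect to the 3D point
    point_cov:           (3, 3) covariance of the point
    var_z_backprojected: scalar, the (3, 3) entry of the back-projected point covariance
    """
    return J @ point_cov @ J.T + np.diag([0.0, var_z_backprojected])


def sensor_cost(residuals, covariances):
    """Equation (11): sum of squared Mahalanobis distances of the residuals."""
    return sum(float(r @ np.linalg.solve(S, r)) for r, S in zip(residuals, covariances))


J = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
S = residual_covariance(J, np.diag([0.01, 0.01, 0.04]), 0.05)
cost = sensor_cost([np.array([0.1, 0.3])], [S])
```

In this example the residual covariance is diag(0.01, 0.09), so a residual of (0.1, 0.3) lies exactly one standard deviation away in each component and contributes a cost of 2. Note that the propagated covariance can become singular when the photometric gradient vanishes, which a practical implementation would need to guard against.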

The individual precision matrix is split into two square roots, Σ_i⁻¹ = Σ_i^{−1/2} Σ_i^{−1/2}, and normalized by applying the single precision matrix of the weighted residuals, Σ⁻¹, as

$\begin{matrix} {\hat{ξ}}_{Sensor} = \underset{ξ}{\arg \min} \sum_{i = 1}^{n} r_{i}^{⊤} Σ_{i}^{- 1 / 2} Σ^{- 1} Σ_{i}^{- 1 / 2} r_{i} & equation (12) \end{matrix}$

The photometric and geometric errors can be defined as,

$\begin{matrix} r_{i} = [\begin{matrix} r_{i}^{I} \\ r_{i}^{Z} \end{matrix}] = [\begin{matrix} I_{2} (π (g (π^{- 1} (x_{i}, Z_{i}), ξ))) - I_{1} (x_{i}), \\ Z_{2} (π (g (π^{- 1} (x_{i}, Z_{i}), ξ))) - {[g (π^{- 1} (x_{i}, Z_{i}), ξ)]}_{Z} \end{matrix}] & equation (13) \end{matrix}$

where Z_i = Z₁(x_i) and [·]_Z denotes the z component of the vector.
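A per-pixel evaluation of equation (13) might look like the following sketch. Several simplifications are assumed and are not taken from the disclosure: a pinhole camera with intrinsics `(fx, fy, cx, cy)`, the pose ξ represented directly as a rotation matrix `R` and translation `t`, and nearest-neighbor lookup in the second image.

```python
import numpy as np


def backproject(x, z, K):
    """pi^{-1}: pixel (column, row) with depth z to a 3D point (assumed pinhole model)."""
    fx, fy, cx, cy = K
    return np.array([(x[0] - cx) * z / fx, (x[1] - cy) * z / fy, z])


def project(p, K):
    """pi: 3D point to pixel coordinates (column, row)."""
    fx, fy, cx, cy = K
    return np.array([fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy])


def residual(x, I1, Z1, I2, Z2, R, t, K):
    """Equation (13): photometric and geometric residual for pixel x, or None if out of view."""
    z = Z1[x[1], x[0]]
    q = R @ backproject(x, z, K) + t            # g(pi^{-1}(x, Z), xi)
    u, v = np.rint(project(q, K)).astype(int)   # nearest-neighbor sampling (assumption)
    height, width = I2.shape
    if not (0 <= u < width and 0 <= v < height):
        return None
    return np.array([I2[v, u] - I1[x[1], x[0]], Z2[v, u] - q[2]])


K = (10.0, 10.0, 2.0, 2.0)
image = np.arange(25, dtype=float).reshape(5, 5)
depth = np.full((5, 5), 2.0)
r_identity = residual((3, 1), image, depth, image, depth, np.eye(3), np.zeros(3), K)
r_forward = residual((2, 2), image, depth, image, depth + 0.5,
                     np.eye(3), np.array([0.0, 0.0, 0.5]), K)
r_outside = residual((3, 1), image, depth, image, depth,
                     np.eye(3), np.array([10.0, 0.0, 0.0]), K)
```

Both residual components vanish when the second frame is consistent with the assumed motion, and pixels that warp outside the second image are skipped.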

To find the relative camera pose which minimizes the photometric and geometric errors, the energy function is the sum of weighted square errors as

$\begin{matrix} \hat{ξ} = \underset{ξ}{\arg \min} \sum_{i = 1}^{n} r_{i}^{⊤} W r_{i} & equation (14) \end{matrix}$

where n is the total number of valid pixels, and W ∈ ℝ^{2×2} denotes the weights for different errors.

Since the energy function is non-linear with respect to the relative camera pose ξ, the Gauss-Newton algorithm is usually applied to numerically find the optimal solution of equation (14), and the estimate is iteratively updated as:

$\begin{matrix} ξ_{k + 1} = ξ_{k} + Δξ, \quad (J^{⊤} (I_{n} ⊗ W) J) Δξ = - J^{⊤} (I_{n} ⊗ W) r & equation (15) \end{matrix}$

where ⊗ denotes the Kronecker product, r = (r₁, . . . , r_n)^⊤ ∈ ℝ^{2n×1}, and the Jacobian matrix is defined as

$\begin{matrix} J = [\begin{matrix} J_{1} \\ ⋮ \\ J_{n} \end{matrix}] \in ℝ^{2 n \times 6}, J_{i} = [\begin{matrix} \frac{\partial r_{i}^{I}}{\partial ξ_{1}} & \dots & \frac{\partial r_{i}^{I}}{\partial ξ_{6}} \\ \frac{\partial r_{i}^{Z}}{\partial ξ_{1}} & \dots & \frac{\partial r_{i}^{Z}}{\partial ξ_{6}} \end{matrix}] \in ℝ^{2 \times 6} & equation (16) \end{matrix}$
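One Gauss-Newton update of equation (15) can be sketched with NumPy as follows. The stacked Jacobian and residual are assumed to be given; the function name is hypothetical, and forming I_n ⊗ W explicitly is done only for clarity, since a practical implementation would exploit the block-diagonal structure.

```python
import numpy as np


def gauss_newton_step(J, r, W):
    """Solve (J^T (I_n kron W) J) dxi = -J^T (I_n kron W) r for the pose increment.

    J: (2n, 6) stacked Jacobian, r: (2n,) stacked residuals, W: (2, 2) weights
    """
    n = r.size // 2
    W_full = np.kron(np.eye(n), W)  # block-diagonal weight matrix I_n kron W
    H = J.T @ W_full @ J
    g = -J.T @ W_full @ r
    return np.linalg.solve(H, g)


rng = np.random.default_rng(0)
J = rng.normal(size=(10, 6))
xi_true = rng.normal(size=6)
r_at_zero = -J @ xi_true  # residual of a linear model evaluated at xi = 0
step = gauss_newton_step(J, r_at_zero, np.diag([1.0, 4.0]))
```

For a residual that is exactly linear in ξ, as in this synthetic example, a single step recovers the true pose parameters; for the non-linear image residuals the step is repeated until Δξ becomes small.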

Eq. (14) is equivalent to maximum likelihood estimation where each residual is independent and follows an identical Gaussian distribution,

$\begin{matrix} {\hat{ξ}}_{ML} = \underset{ξ}{\arg \max} \sum_{i = 1}^{n} \log p (r_{i} | ξ) & equation (17) \end{matrix}$

where p(r_i|ξ) = 𝒩(0, Σ). Note that this corresponds to the case of W = Σ⁻¹ in Eq. (14). When the residuals are instead modeled with a t-distribution with v degrees of freedom, Eq. (17) can be rewritten as:

$\begin{matrix} {\hat{ξ}}_{DVO} = \underset{ξ}{\arg \min} \sum_{i = 1}^{n} w_{i} r_{i}^{⊤} Σ^{- 1} r_{i} & equation (18) \end{matrix}$

where w_i = (v+2)/(v + r_i^⊤ Σ⁻¹ r_i). Note that this corresponds to the case of W = w_i Σ⁻¹ in Eq. (14).
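The weights w_i of equation (18) can be computed for all residuals at once, as in this sketch. The function name is hypothetical, and the default of five degrees of freedom is an assumed value that the text does not specify.

```python
import numpy as np


def t_distribution_weights(residuals, cov, dof=5.0):
    """w_i = (v + 2) / (v + r_i^T Sigma^{-1} r_i) for each 2D residual.

    residuals: (n, 2) array of (r_I, r_Z) pairs
    cov:       (2, 2) shared residual covariance Sigma
    dof:       degrees of freedom v (assumed default)
    """
    cov_inv = np.linalg.inv(cov)
    mahalanobis_sq = np.einsum("ni,ij,nj->n", residuals, cov_inv, residuals)
    return (dof + 2.0) / (dof + mahalanobis_sq)


residuals = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
weights = t_distribution_weights(residuals, np.eye(2))
```

The weight decreases monotonically with the Mahalanobis distance, so large residuals, which are likely outliers, contribute less to the pose estimate than they would under the Gaussian model of Eq. (17).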

The σ-dense visual odometry (σ-DVO) uses a t-distribution for photometric errors and propagates a sensor model with a Gaussian distribution for geometric errors. Combining Eq. (11) and Eq. (18), σ-DVO is defined as:

$\begin{matrix} {\hat{ξ}}_{Hybrid} = \underset{ξ}{\arg \min} \sum_{i = 1}^{n} r_{i}^{⊤} W_{i}^{1 / 2} Σ^{- 1} W_{i}^{1 / 2} r_{i} & equation (19) \end{matrix}$

where the weight matrix W_i = diag(w_i^I, w_i^Z) and

$\begin{matrix} w_{i}^{I} = \frac{v + 1}{v + {(\frac{r_{i}^{I}}{σ})}^{2}} & equation (20) \\ w_{i}^{Z} = \frac{1}{\nabla r_{i}^{Z} Σ_{i}^{- 1} \nabla r_{i}^{Z ⊤} + {[Σ_{i}^{'}]}_{3, 3}} & equation (21) \end{matrix}$
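The hybrid cost of equation (19) can be sketched as below. To avoid committing to a particular evaluation of the denominator of equation (21), the sketch takes that denominator as a precomputed scalar per pixel (`geometric_denominators`); this input, the function names, and the default degrees of freedom are assumptions.

```python
import numpy as np


def photometric_weight(r_intensity, sigma, dof=5.0):
    """Equation (20): t-distribution weight for the photometric residual."""
    return (dof + 1.0) / (dof + (r_intensity / sigma) ** 2)


def hybrid_cost(residuals, sigma, geometric_denominators, cov, dof=5.0):
    """Equation (19): sum of r^T W^{1/2} Sigma^{-1} W^{1/2} r with W = diag(w_I, w_Z).

    geometric_denominators: per-pixel denominator of equation (21), precomputed
    """
    cov_inv = np.linalg.inv(cov)
    total = 0.0
    for r, denom in zip(residuals, geometric_denominators):
        w_i = photometric_weight(r[0], sigma, dof)
        w_z = 1.0 / denom  # equation (21)
        W_half = np.diag([np.sqrt(w_i), np.sqrt(w_z)])
        total += float(r @ W_half @ cov_inv @ W_half @ r)
    return total


cost = hybrid_cost([np.array([2.0, 0.3])], sigma=1.0,
                   geometric_denominators=[0.09], cov=np.eye(2))
```

In the example the photometric term is down-weighted to 6/9 of its squared value, while the geometric term is normalized by its own propagated uncertainty, giving 8/3 + 1 in total.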

The σ-DVO algorithm can be implemented in any suitable client device such as a smart phone, tablet, mobile phone, personal digital assistant (PDA), or any other device. Returning to FIG. 1, the SLAM system 100 with the integrated σ-DVO algorithm uses a smaller number of keyframes owing to reduced drift in the system. A reduced number of keyframes means lower computational requirements in the backend of the system.

FIG. 5 illustrates an example of a map generated by a σ-DVO SLAM system 100. As can be seen, a consistent trajectory is generated using the σ-DVO SLAM system 100.

The embodiments described above have been shown by way of example, and it should be understood that these embodiments may be susceptible to various modifications and alternative forms. It should be further understood that the claims are not intended to be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.

Embodiments within the scope of the disclosure may also include non-transitory computer-readable storage media or machine-readable medium for carrying or having computer-executable instructions or data structures stored thereon. Such non-transitory computer-readable storage media or machine-readable medium may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such non-transitory computer-readable storage media or machine-readable medium can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. Combinations of the above should also be included within the scope of the non-transitory computer-readable storage media or machine-readable medium.

Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network.

Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

While the patent has been described with reference to various embodiments, it will be understood that these embodiments are illustrative and that the scope of the disclosure is not limited to them. Many variations, modifications, additions, and improvements are possible. More generally, embodiments in accordance with the patent have been described in the context of particular embodiments. Functionality may be separated or combined in blocks differently in various embodiments of the disclosure or described with different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.

Claims

1. A method for visual Simultaneous localization and Mapping (SLAM), the method comprising: capturing, with a camera, image frames of an environment, each image frame having an array of pixels, each pixel having an intensity and a depth; selecting, with a processor, keyframes from the image frames; determining, with the processor, an odometry estimate based on the image frames, the odometry estimate including estimated camera poses from which each of the image frames were captured, the estimated camera poses being determined in part based on a sensor noise model that indicates an uncertainty of the depth of each pixel of each of the image frames; generating, with the processor, a keyframe graph including the keyframes, relative transformations between the estimated camera poses of the keyframes defining first constraints for the keyframe graph; adding, with the processor, at least one second constraint to the keyframe graph in response to detecting at least one loop closure using a loop constraint evaluator; and optimizing the keyframe graph and the odometry estimate based on the estimated camera poses of the keyframes and the first constraints and the at least one second constraint of the keyframe graph.
2. The method of claim 1, the determining the odometry estimate further comprising: determining the estimated camera poses in part based on t-distribution that indicates an uncertainty of the intensity of each pixel of each of the image frames.
3. The method of claim 1 wherein the sensor noise model indicates relatively more uncertainty for depths relatively further from the camera and relatively less uncertainty for depths relatively closer from the camera.
4. The method of claim 1, the optimizing the keyframe graph further comprising: weighting the first constraints and the at least one second constraint of the keyframe graph based on a propagated uncertainty of the depth of each pixel of each of the image frames.
5. A method for dense visual odometry, the method comprising: capturing, with a camera, image frames, each image frame having an array of pixels, each pixel having an intensity and a depth; selecting, with a processor, keyframes from the image frames; determining, with the processor, an odometry estimate based on the image frames, the odometry estimate including estimated camera poses from which each of the image frames were captured, the estimated camera poses being determined in part based on a sensor noise model that indicates an uncertainty of the depth of each pixel of each of the image frames; generating, with the processor, a keyframe graph including the keyframes, relative transformations between the estimated camera poses of the keyframes defining constraints for the keyframe graph; optimizing the keyframe graph and the odometry estimate based on the estimated camera poses of the keyframes and the constraints of the keyframe graph.
6. The method of claim 5, the determining the odometry estimate further comprising: determining the estimated camera poses in part based on t-distribution that indicates an uncertainty of the intensity of each pixel of each of the image frames.
7. The method of claim 5 wherein the sensor noise model indicates relatively more uncertainty for depths relatively further from the camera and relatively less uncertainty for depths relatively closer from the camera.
8. The method of claim 5, the optimizing the keyframe graph further comprising: weighting the first constraints and the at least one second constraint of the keyframe graph based on a propagated uncertainty of the depth of each pixel of each of the image frames.
9. A visual Simultaneous localization and Mapping (SLAM) system, the system comprising: a camera configured to capture image frames of a 3D environment, each image frame having an array of pixels, each pixel having an intensity and a depth; and a processor configured to: select keyframes from the image frames; determine an odometry estimate based on the image frames, the odometry estimate including estimated camera poses from which each of the image frames were captured, the estimated camera poses being determined in part based on a sensor noise model that indicates an uncertainty of the depth of each pixel of each of the image frames; generate a keyframe graph including the keyframes, relative transformations between the estimated camera poses of the keyframes defining constraints for the keyframe graph; and optimize the keyframe graph and the odometry estimate based on the estimated camera poses of the keyframes and the constraints of the keyframe graph.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a 35 U.S.C. § 371 National Stage Application of PCT/EP2017/065677, filed on Jun. 26, 2017, which claims the benefit of U.S. Provisional Application No. 62/354,251, filed Jun. 24, 2016, the disclosures of which are herein incorporated by reference in their entirety.

PCT Information

Filing Document	Filing Date	Country	Kind
PCT/EP2017/065677	6/26/2017	WO	00

Publishing Document	Publishing Date	Country	Kind
WO2017/220815	12/28/2017	WO	A

US Referenced Citations (2)

Number	Name	Date	Kind
20120306847	Lim et al.	Dec 2012	A1
20140333741	Roumeliotis	Nov 2014	A1

Non-Patent Literature Citations (28)

Entry
International Search Report corresponding to PCT Application No. PCT/EP2017/065677, dated Oct. 10, 2017 (English language document) (6 pages).
Kerl, Christian, et al., Dense Visual SLAM for RGB-D Cameras, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nov. 3-7, 2013, Tokyo, Japan, pp. 2100-2106, XP_32537192A.
Kerl, Christian, et al., Robust Odometry Estimation for RGB-D Cameras, 2013 IEEE International Conference on Robotics and Automation (ICRA), May 6-10, 2013, Karlsruhe, Germany, pp. 3748-3754, XP_32506020A.
Strasdat, Hauke et al., Double Window Optimisation for Constant Time Visual SLAM, 2011 IEEE International Conference on Computer Vision, Nov. 6, 2011, pp. 2352-2359, XP_32101470A.
Babu, Benzun Wisely, et al., σ-DVO: Sensor Noise Model Meets Dense Visual Odometry, 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Sep. 19, 2016, pp. 18-26, XP_33023403A.
Konolige, K. et al., Motilal, FrameSLAM: From Bundle Adjustment to Real-Time Visual Mapping, IEEE Transactions on Robotics, vol. 24, No. 5, Oct. 2008, pp. 1066-1077.
Audras, C. et al., Real-time dense appearance-based SLAM for RGB-D sensors, Proceedings of Australian Conference on Robotics and Automation, Dec. 2011, 10 pages.
Baker S. et al., Lucas-Kanade 20 Years On: A Unifying Framework, International Journal of Computer Vision, 56(3), pp. 221-255, 2004.
Gauglitz, S. et al., Live Tracking and Mapping from Both General and Rotation-Only Camera Motion, IEEE International Symposium on Mixed and Augmented Reality, 2012, pp. 13-22.
Kümmerle, R. et al., g2o: A General Framework for graph Optimization, 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, May 2011, pp. 3607-3613.
Leonard, J. et al., Simultaneous Map Building and Localization for an Autonomous Mobile Robot, IEEE/RSJ International Workshop on Intelligent Robots and Systems IROS '91, vol. 3, Nov. 1991, Osaka, Japan, pp. 1442-1447.
Maimone, M. et al., Two Years of Visual Odometry on the Mars Exploration Rovers, Journal of Field Robotics, vol. 24, No. 3, pp. 169-186, 2007.
Mur-Artal, R. et al., ORB-SLAM: A Versatile and Accurate Monocular SLAM System, IEEE Transactions on Robotics, vol. 31(5), pp. 1147-1163, Oct. 2015.
Newcombe, R. et al., KinectFusion: Real-Time Dense Surface Mapping and Tracking, IEEE International Symposium on Mixed and Augmented Reality, Oct. 26-29, 2011, pp. 127-136.
Segal, A. et al., Generalized-ICP, Proceedings of Robotics: Science and Systems, Seattle, WA, USA, Jun. 28-Jul. 1, 2009, 8 pages.
Sturm, P. et al., A Factorization Based Algorithm for Multi-Image Projective Structure and Motion, 4th European Conference on Computer Vision, Cambridge, England, Apr. 1996, 10 pages.
Whelan, T. et al., ElasticFusion: Dense SLAM Without a Pose Graph, Robotics: Science and Systems, Rome, Italy, Jul. 2015, 9 pages.
Stückler, J. et al., Multi-Resolution Surfel Maps for Efficient Dense 3D Modeling and Tracking, Journal of Visual Communication and Image Representation, vol. 25(1), Jan. 2014, 30 pages.
Davison, A. J., Real-Time Simultaneous Localisation and Mapping with a Single Camera, Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV'03), vol. 2, Oct. 2003, 8 pages.
Endres, F. et al., An Evaluation of the RGB-D SLAM System, 2012 IEEE International Conference on Robotics and Automation, May 14-18, 2012, St. Paul, Minnesota, pp. 1691-1696.
Klein, G. et al., Parallel Tracking and Mapping for Small AR Workspaces, 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, 2007; ISMAR 2007, Nov. 2007, 10 pages.
Marchand, E. et al., Pose Estimation for Augmented Reality: A Hands-On Survey, IEEE Transaction on Visualization and Computer Graphics, vol. 22, No. 12, Dec. 2016, pp. 2633-2651.
Gutiérrez-Gómez, D. et al., Inverse Depth for Accurate Photometric and Geometric Error Minimisation in RGB-D Dense Visual Odometry, 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 83-89.
Sturm, J. et al., A Benchmark for the Evaluation of RGB-D SLAM Systems, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 573-580, Oct. 2012.
Newcombe, R. et al., DTAM: Dense Tracking and Mapping in Real-Time, 2011 IEEE International Conference on Computer Vision, pp. 2320-2327, Nov. 2011.
Whelan, T. et al., Real-Time Large-Scale Dense RGB-D SLAM with Volumetric Fusion, The International Journal of Robotics Research 2015, vol. 34(4-5), pp. 598-626, Apr. 2015.
Ruhnke, M. et al., Highly Accurate 3D Surface Models by Sparse Surface Adjustment, 2012 IEEE International Conference on Robotics and Automation (ICRA), pp. 751-757, May 2012.
Press, W. H. et al., Numerical Recipes in C, The Art of Scientific Computing, Second Edition, Cambridge University Press, 1992, Section 10, pp. 408-412.

Related Publications (1)

	Number	Date	Country
	20190377952 A1	Dec 2019	US

Provisional Applications (1)

	Number	Date	Country
	62354251	Jun 2016	US


Abstract