The present invention relates to techniques for mapping a three-dimensional (3D) space. The invention has particular, but not exclusive, relevance to generating a height map based on a sequence of images from a monocular camera, the sequence having been captured during a movement of the camera relative to a 3D space.
In the field of computer vision and robotics, in order to navigate a 3D space, such as an interior room, robotic devices may employ a range of techniques.
Simple navigation solutions may rely on limited perception and simple algorithms, for example an infra-red or ultrasonic sensor that detects objects within a line of sight, which may then be avoided.
Alternatively, more advanced solutions may employ tools and methods to construct a representation of a surrounding 3D space to enable navigation of the 3D space. Known techniques for constructing a representation of a 3D space include “structure from motion” and “multi-view stereo”. Certain techniques, known as “sparse”, use a reduced number of points or features, for example ten to a hundred, to generate a representation. These may be contrasted with “dense” techniques that generate representations with many thousands or millions of points. Typically, “sparse” techniques are easier to implement in real-time, for example at a frame rate of around 30 frames per second, since they use a limited number of points or features and thus limit the extent of the processing compared to more resource-intensive “dense” mapping techniques.
While great progress has been made around techniques such as “Simultaneous Localisation And Mapping” (SLAM) (see J. Engel, T. Schoeps, and D. Cremers. “LSD-SLAM: Large-scale direct monocular SLAM”. In Proceedings of the European Conference on Computer Vision (ECCV), 2014, and R. Mur-Artal and J. D. Tardos. “ORB-SLAM: Tracking and mapping recognizable features”. In Workshop on Multi View Geometry in Robotics (MVIGRO)—RSS 2014, 2014), the more advanced solutions typically rely on substantial computational resources and specialised sensor devices (such as LAser Detection And Ranging—LADAR—sensors, structured light sensors, or time-of-flight depth cameras) which make them difficult to translate to embedded computing devices that tend to control real-world commercial robotic devices such as, for example, relatively low-cost domestic floor cleaning robots.
Therefore, there is a desire for a dense, real-time mapping solution which can be implemented on a low-cost robotic device.
According to a first aspect of the present invention, there is provided an apparatus for mapping an observed 3D space. The apparatus comprises a mapping engine configured to generate a surface model for the space, a depth data interface to obtain a measured depth map for the space, a pose data interface to obtain a pose corresponding to the measured depth map, and a differentiable renderer. The differentiable renderer renders a predicted depth map as a function of the surface model and the pose from the pose data interface, and calculates partial derivatives of predicted depth values with respect to the geometry of the surface model. The mapping engine is further configured to evaluate a cost function comprising at least an error between the predicted depth map and the measured depth map, reduce the cost function using the partial derivatives from the differentiable renderer, and update the surface model using geometric parameters for the reduced cost function. Preferably, the differentiable renderer and the mapping engine are further configured to repeat their respective steps, iteratively, re-rendering the predicted depth map using the updated surface model, reducing the cost function, and updating the surface model. Preferably still, the surface model is updated until the depth map optimization (from the cost function minimization) converges.
In certain examples, the surface model comprises a fixed topology triangular mesh. In further examples, the surface model comprises a set of height values in relation to a reference plane within the space.
In some cases, the mapping engine is further configured to apply a threshold limit to the height values to calculate navigable space with respect to the reference plane.
In one variation, the mapping engine implements a generative model, which provides a depth map of the space as a sampled variable given at least the surface model and the pose as parameters.
In a further variation, the mapping engine is configured to linearize an error based on a difference between a measured depth map value and a corresponding rendered depth map value following the iterative minimization of the cost function, and use the said linearized error terms in at least one subsequent update of the surface model. The linearized error terms represent a measure of uncertainty in the estimated surface model. The linearized error terms enable the use of a recursive formulation that allows information from at least one, and typically a plurality, of past measurements to be used as prior probability values. These prior probability values may be jointly minimized with the residual errors calculated in the at least one subsequent update.
In a further example, there is also provided a robotic device incorporating the apparatus described above, and further comprising at least one image capture device to record a sequence of frames comprising one or more of depth data and image data. The robotic device also comprises a depth map processor to determine a depth map from the sequence of frames, and a pose processor to determine a pose of the at least one image capture device from the sequence of frames. The depth data interface of the apparatus is communicatively coupled to the depth map processor of the robotic device, and the pose data interface of the apparatus is communicatively coupled to the pose processor of the robotic device. One or more movement actuators are arranged to move the robotic device within the space, and a controller is arranged to control the one or more movement actuators, and is configured to access the surface model generated by the mapping engine to navigate the robotic device within the space.
In one example, the robotic device comprises a vacuuming system, and in a further example, the controller is arranged to selectively control the vacuuming system in accordance with the surface model generated by the mapping engine.
In some cases the image capture device is a monocular camera.
In a second embodiment of the invention, there is provided a method of generating a model of a 3D space. The method comprises obtaining a measured depth map for the space, obtaining a pose corresponding to the measured depth map, obtaining an initial surface model for the space, rendering a predicted depth map based upon the initial surface model and the obtained pose, obtaining, from the rendering of the predicted depth map, partial derivatives of the depth values with respect to geometric parameters of the surface model, reducing, using the partial derivatives, a cost function comprising at least an error between the predicted depth map and the measured depth map, and updating the initial surface model based on values for the geometric parameters from the cost function. Preferably, the method may be repeated, iteratively, each time rendering an updated predicted depth map based upon the previously updated surface model and the obtained pose, obtaining updated partial derivatives of the depth values with respect to geometric parameters of the previously updated surface model; optimizing the updated rendered depth map by minimizing, using the updated partial derivatives, a cost function comprising at least an error between the updated rendered depth map and the measured depth map, and updating the previous surface model based on values for the geometric parameters from the latest depth map following optimization. The method may be repeated until the optimization converges to a predetermined threshold.
Preferably, the method also comprises obtaining an observed color map for the space, obtaining an initial appearance model for the space, rendering a predicted color map based upon the initial appearance model, the initial surface model and the obtained pose, and obtaining, from the rendering of the predicted color map, partial derivatives of the color values with respect to parameters of the appearance model. The rendered color map is iteratively optimized by minimizing, using the partial derivatives, a cost function comprising an error between the predicted color map and the measured color map, and updating the initial appearance model based on values for the parameters of the appearance model from the color map following iterative optimization.
In some examples, the surface model comprises a fixed topology triangular mesh and the geometric parameters comprise at least a height above a reference plane within the space, and each triangle within the triangular mesh comprises three associated height estimates.
In other cases, the cost function comprises a polynomial function applied to each triangle within the triangular mesh.
In one variation, the predicted depth map comprises an inverse depth map, and for a given pixel of the predicted depth map, a partial derivative for an inverse depth value associated with the given pixel with respect to geometric parameters of the surface model comprises a set of partial derivatives of the inverse depth value with respect to respective heights of vertices of a triangle within the triangular mesh, said triangle being one that intersects a ray passing through the given pixel.
In other variations, the cost function comprises a function of linearized error terms, said error terms resulting from at least one previous comparison of the rendered depth map and the measured depth map, said error terms being linearized from said partial derivatives. In this manner error information from a given comparison, as represented within the partial derivatives, may be used in subsequent comparisons. For example, a set of linearized error terms representing a plurality of past comparisons may be jointly reduced with a set of non-linear error terms representing a current comparison.
In one example, the surface model is updated by reducing the cost function using a gradient-descent method.
In other examples, the method also comprises determining a set of height values from the surface model for the space, and determining an activity program for a robotic device according to the set of height values.
In a third embodiment of the invention, there is provided a non-transitory computer-readable storage medium comprising computer-executable instructions which, when executed by a processor, cause a computing device to obtain an observed depth map for a 3D space, obtain a pose corresponding to the observed depth map, obtain a surface model comprising a mesh of triangular elements, each triangular element having height values associated with vertices of the element, the height values representing a height above a reference plane, render a model depth map based upon the surface model and the obtained pose, including computing partial derivatives of rendered depth values with respect to height values of the surface model, compare the model depth map to the observed depth map, including determining an error between the model depth map and the observed depth map, and determine an update to the surface model based on the error and the computed partial derivatives.
In one example, the computer-executable instructions cause the computing device to, responsive to the update being determined, fuse nonlinear error terms associated with the update into a cost function associated with each triangular element. Preferably, the computer-executable instructions cause the computing device to iteratively optimize the predicted depth map by re-rendering an updated model depth map based upon an updated surface model, until the optimization converges to a predetermined threshold.
Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings, wherein:
Certain examples described herein relate to apparatus and techniques suitable for mapping a 3D space.
In some examples, the observed depth map data may be used to render (predict) a height map 100 in real-time. The reconstructed height map 100 may be processed to generate a free-space map (see also
In one example, and with regard to
For each new captured frame 210, and provided with initial surface model data 290 of a 3D space and camera pose data 230 from the image capture device, a predicted depth map 250 (and optionally a color map if initial color data is provided) is rendered for the observed 3D space using differentiable rendering (block 231). The resultant rendered depth map 250 is compared (block 251) to a measured depth map 240. The measured depth map 240 has been previously calculated (at block 221), for example by using a plane sweep algorithm, for each image frame 210 with corresponding pose data 220 captured by the image capture device. A nonlinear error 260 between the two depth maps (rendered 250 versus measured 240) is calculated. This nonlinear error value 260 is reduced (block 261) using the partial derivative gradient values 235, calculated as part of the differentiable rendering process (block 231), in order to optimize the rendered depth map, and optionally the color map. In a preferred example each cell on the surface map 290 is updated (block 271) according to the optimized depth map.
The optimization of the depth map (blocks 231, 251, 261) for a given frame 210, and the subsequent update to the surface model (block 271), are repeated, iteratively, until the optimization “converges”. The convergence of the optimization may, for example, be when the difference between the rendered depth map 250 and the measured depth map 240 falls below a pre-determined threshold value. The updated surface model 290 is used in conjunction with the original pose data 230 for the captured frame 210 to render an updated predicted depth map 250 (and optionally an updated color map if initial color data is provided) using differentiable rendering (block 231). The resultant updated rendered depth map 250 is compared (block 251) to the original measured depth map 240, and the nonlinear error 260 between the two is used in conjunction with the partial derivative gradient values 235 derived from the rendering process (block 231) to reduce the cost function (block 261). This process is repeated until the optimization converges, for example when the cost function, or error value between the rendered 250 and measured 240 depth maps, falls beneath a predetermined threshold. Once the optimization has converged, the resultant depth map may be “fused” into the surface model, ready for the next frame 210 to be calculated in a recursive manner utilizing the latest update to the surface model 290.
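By way of illustration only, the following Python sketch mirrors the per-frame loop described above (render, compare, reduce the cost using the rendering derivatives, update, repeat until convergence). The “renderer” here is a toy linear stand-in whose Jacobian is a constant matrix; a real implementation would rasterize the surface mesh and is not reproduced here.

```python
import numpy as np

# Toy stand-in for the differentiable renderer: the predicted depth of each
# pixel is a fixed linear function of the surface heights, so the Jacobian
# (partial derivatives of depth with respect to geometry) is a constant matrix.
rng = np.random.default_rng(0)
J = rng.normal(size=(100, 16))                  # pixels x height parameters
true_heights = rng.normal(size=16)
measured_depth = J @ true_heights + 0.01 * rng.normal(size=100)  # cf. map 240

def render(heights):
    """Return the predicted depth map and its partial derivatives (cf. block 231)."""
    return J @ heights, J

heights = np.zeros(16)                          # initial surface model (cf. 290)
prev_cost, step = np.inf, 1e-3
for iteration in range(200):
    predicted, jacobian = render(heights)       # differentiable rendering
    residual = predicted - measured_depth       # nonlinear error (cf. 260, block 251)
    cost = float(residual @ residual)           # cost function
    if prev_cost - cost < 1e-10:                # convergence test
        break
    prev_cost = cost
    heights -= step * 2.0 * jacobian.T @ residual   # reduce cost, update model (cf. 271)

print(f"converged after {iteration} iterations, final cost {cost:.5f}")
```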
The above-described camera tracking (210, 211, 220, 221, 230, 240) and mapping stages (231, 235, 250, 251, 260, 261, 271, 290) may be treated separately to simplify the method. In a first step, only the camera pose is estimated by the tracking stage (block 211), and it is subsequently treated as a fixed quantity for the duration of the rendering (block 231) and iterative optimization calculations (231, 235, 250, 251, 260, 261, 271, 290) for the current frame.
The presently disclosed method may be treated as a recursive, nonlinear optimization problem. Once the rendered depth map for a given frame 210 has been optimized (by iteratively minimizing the error value/reducing the cost function—block 261), and the surface model updated (block 271), the method is repeated (recursively) for each subsequent frame 210 captured by the image capture device (in this example a monocular video device) as it moves through a 3D space. Thus, as each new frame arrives, the measured depth data 240 is compared (block 251) with a generative differentiable rendering 250 of the latest surface model depth data estimate, and appropriate Bayesian updates are made to the rendered depth map.
Nonlinear residual values are formulated as the difference between the measured (inverse) depths in the current frame, and the predicted (inverse) depths generated by the rendered depth map. It may be more efficient to utilize the inverse depth values (i.e. 1/actual-depth) in calculations since the estimated distance values for far away objects may be effectively infinite, causing problems in the difference/error calculations. By utilizing inverse depth maps, these large/infinite depth values are instead reduced towards zero.
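As a short numerical illustration of this point (the values below are arbitrary), a surface at effectively infinite range yields an unbounded depth residual but a well-behaved inverse-depth residual:

```python
import numpy as np

measured_depth = np.array([0.5, 2.0, 50.0, np.inf])    # metres; last point "at infinity"
predicted_depth = np.array([0.45, 2.2, 60.0, 1000.0])

depth_residual = predicted_depth - measured_depth                 # last entry is -inf: unusable
inverse_residual = 1.0 / predicted_depth - 1.0 / measured_depth   # stays close to zero

print(depth_residual)    # [-0.05   0.2   10.   -inf]
print(inverse_residual)  # [ 0.2222 -0.0455 -0.0033  0.001]
```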
In order to obtain a recursive formulation and maintain all past measurements, the error terms are linearized and kept as “priors” that are jointly minimized with the residual values (the difference between the observed value and the estimated values) for the current frame.
Using the example efficient differentiable rendering approach enables rigorous incremental probabilistic fusion of standard, locally-estimated depth (and color) into an immediately-usable dense model. The present apparatus and method may therefore be employed for free-space and obstacle mapping by low-cost robots, using only a single forward-looking camera to provide detailed maps suitable for precise autonomous navigation.
Incorporation of the Apparatus and Method into a Robotic Device
In some examples, the apparatus and method described above may be implemented within a robotic device 400, as shown in
The robotic device 400 may also comprise a movement controller, such as a navigation engine 450 and a movement actuator 460. The movement actuator 460 may comprise at least one electric motor coupled, for example, to one or more wheels, tracks and/or rollers, and is arranged to move the robotic device 400 within a 3D space.
Furthermore, the navigation engine 450 of the robotic device 400 may also be coupled to both the mapping engine 330 of the mapping apparatus 300, and the movement actuator 460 of the robotic device 400. The navigation engine 450 controls movement of the robotic device 400 within a 3D space. In operation, the navigation engine 450 uses a “free-space map” (as will be described later on with reference to
Depth maps are measured and calculated by the depth map processor 430 from the retrieved image frames 210 of the 3D space, for example using a plane sweep algorithm, and communicated to the depth data interface 310 of the apparatus (block 510).
Frame-to-frame motion and pose data of the camera is calculated by a pose processor 440 (using techniques as discussed above). The camera pose data is retrieved by the pose data interface 320 of the mapping apparatus 300 and forwarded to the differentiable renderer 340 (block 520).
As outlined previously with reference to
The updated surface model along with the initial camera pose data (from block 520) is subsequently used by the differentiable renderer 340 to render an updated predicted depth map of the observed scene (block 540). The updated rendered depth map of the frame is compared directly to the original measured depth map for the frame (from block 510), and a cost function (including the error between the two maps) is reduced using the partial derivative values calculated by the differentiable rendering process (block 550). The surface model is updated, again, following optimization and the process (blocks 540, 550, 560, 570) is repeated, iteratively, until the optimization of the rendered depth map converges. The optimization may, for example, continue until the error term between the rendered and measured depth maps falls below a pre-determined threshold value.
After the iterative optimization process, the linearized error terms may also be updated. The linearized error terms represent an uncertainty of previously calculated values, and are used to create polynomial (in this example, quadratic) constraints on how the vertices of each triangular surface element of the surface model (in this example a triangular mesh) can be further modified/displaced in future recursions (e.g. at each frame) after the iterative optimization of the current (frame) depth map has been completed, and “fused” (i.e. included) into the latest surface model. The constraints are built from the residual errors between the rendered 250 and measured (“observed”) 240 depth maps.
The present example method combines a generative model approach and differentiable rendering process to maximise a likelihood function for each observed frame/scene 210, by which the method actively attempts to configure the rendered surface model to best represent the observed 3D space.
Furthermore, the linearized error terms allow a full posterior distribution to be stored and updated. The per-triangle nature of the information filters, rather than per-vertex, takes into account the connections between individual cells (vertices) on the map and discards no information while keeping computational complexity bounded.
The whole process is repeated for each frame captured, with each updated surface model replacing the previous model.
Whilst the apparatus and method described are primarily directed towards resolving a depth map, additional color data may be incorporated into the resultant height map/surface model and optimized during the process as well. In this case, the method is similar to that above, but includes some additional steps. Firstly, an observed color map for the 3D space is obtained, alongside an initial “appearance model” for the 3D space (using initial appearance parameters). A predicted color map is rendered based upon the initial appearance model, the initial surface model and the obtained camera pose data (see also
In addition to the components of the robotic device 605 shown in
A desirable property of the generated surface model is that it can be directly used for robot navigation and obstacle avoidance in a 3D space. In a preferred example, the reconstruction is based upon a triangular mesh atop a height map representation, and therefore a threshold may be applied to the calculated height values to generate usable quantities such as the drivable free-space area or a classification of walls, furniture and small obstacles based on their height.
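A minimal sketch of such thresholding is given below; the particular height thresholds and class names are illustrative assumptions for this example, not values taken from the method itself.

```python
import numpy as np

# Reconstructed heights above the floor reference plane, in metres (toy values).
height_map = np.array([[0.00, 0.01, 0.12],
                       [0.02, 0.45, 0.90],
                       [0.01, 0.03, 1.80]])

FREE_SPACE_LIMIT = 0.05   # below this the robot can drive (assumed threshold)
OBSTACLE_LIMIT = 0.60     # between the limits: small obstacle; above: wall/furniture

drivable = height_map < FREE_SPACE_LIMIT
small_obstacle = (height_map >= FREE_SPACE_LIMIT) & (height_map < OBSTACLE_LIMIT)

labels = np.where(drivable, "free",
                  np.where(small_obstacle, "obstacle", "wall/furniture"))
print(labels)
```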
Any one of the mapping apparatus 300 and navigation engine 450 above may be implemented upon a computing device embedded within a robotic device (as indicated by the dashed lines 620, 670 in
In a further example, once the surface model update is determined, the computer-executable instructions cause the computing device to fuse nonlinear error terms associated with the update into a cost function associated with each triangular element.
The present approach is based on a probabilistic generative model, and
Within a 3D space to be mapped, any given surface is parametrised by its geometry G and its appearance A. The “pose” of an image capture device such as a camera, and therefore any image taken with it, is the location and orientation of the camera within a given 3D space. A camera with an associated pose T in the 3D space samples the current frame, and an image I and an inverse depth (i.e. 1/actual-depth) map D are rendered.
Employing Bayesian probability techniques, the joint distribution that models the image formation process is:
P(I, D, G, A, T)=P(I|G, A, T)P(D|G, T)P(G)P(A)P(T)
The relationship between image observations and surface estimation can be also expressed using Bayes rule:
P(G, A, T|I, D)∝P(I, D|G, A, T)P(G)P(A)P(T)
This allows the derivation of a maximum a-posteriori (MAP) estimate of the camera pose and surface:
argmax_{G,A,T} P(I, D|G, A, T)P(G)P(A)P(T)
The term P(I, D|G, A, T) is a likelihood function which can be evaluated and differentiated using the differentiable renderer. No assumptions are made regarding the geometry and/or colors of the frame, and the problem is treated as one of maximum likelihood. The camera pose is treated as given by a dense tracking module. With these simplifications and taking the negative logarithm of the equation above, the following minimization problem is obtained:
argmin_{G,A} F(G, A, T)
with:
F(G, A, T) = ∥D̃ − D(G, T)∥_{Σ_D} + ∥Ĩ − I(G, A, T)∥_{Σ_I}
Here D̃ and Ĩ represent, respectively, the measured (observed) inverse depth map and image, with associated measurement uncertainties modelled by (diagonal) covariance matrices Σ_D and Σ_I, whereas D and I denote the rendered predicted inverse depth map and image using the current estimates of G, A, and a given T. Even though the differentiable rendering process, and therefore the function F(G, A, T), is nonlinear, having access to some initial estimates G_0, A_0, T_0, as well as being able to evaluate the cost function F and its derivatives with respect to the model parameters, allows a standard nonlinear least-squares estimate to be found in an iterative fashion. In particular, the partial derivatives of the rendered inverse depth map with respect to the geometry, ∂D/∂G, as well as those of the rendered image with respect to the appearance, ∂I/∂A, are required to be calculated, and are obtained from the differentiable rendering process at almost no extra computational cost.
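Purely as an illustration of evaluating F(G, A, T) with diagonal covariances, the toy snippet below computes the two covariance-weighted residual terms; the per-pixel variances and residual values are invented for the example, and the norm is taken in the usual squared Mahalanobis form.

```python
import numpy as np

def weighted_sq_norm(residual, variances):
    """Squared Mahalanobis-style norm ||r||^2_Sigma for a diagonal covariance Sigma."""
    return float(np.sum(residual ** 2 / variances))

# Toy per-pixel values (flattened maps); real maps would have one entry per pixel.
measured_inv_depth = np.array([0.50, 0.48, 0.02])
rendered_inv_depth = np.array([0.52, 0.47, 0.03])   # D(G, T)
measured_image = np.array([0.80, 0.70, 0.10])
rendered_image = np.array([0.75, 0.72, 0.12])       # I(G, A, T)
var_depth = np.full(3, 1e-3)                        # diagonal of Sigma_D
var_image = np.full(3, 1e-2)                        # diagonal of Sigma_I

F = (weighted_sq_norm(measured_inv_depth - rendered_inv_depth, var_depth)
     + weighted_sq_norm(measured_image - rendered_image, var_image))
print(F)
```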
The differentiable rendering method is based upon a weighted optimization of the depth map values (and optionally the color map values for the more advanced image modelling) as each new image (frame) is received. While the method utilizes the nonlinear error terms between the rendered and measured depth (and optionally color) maps of the latest frame captured, all previous such error measurements are kept as “prior” linear error terms to determine the polynomial (in this example, quadratic) constraints on how the vertices of the surface model (in this example, a triangular mesh) can be further modified/displaced after an optimized depth map has been fused into the surface model, as described below. Therefore, the more data is collected, rendered, optimized and fused into the surface model, the more robust the model becomes.
The optimization process requires several iterations, and the number of measurements and the size of the state space are high, though any Jacobian matrices (a matrix of all first-order partial derivatives of a vector-valued function) linking them are sparse. The present method is highly efficient owing to the differentiable rendering approach, wherein at each iteration of the optimization, the inverse depth (and optionally the color measurement) likelihood function is re-evaluated by rendering the predictions. At the same time, the per-pixel elements of the Jacobian matrices that will be used for the optimization stage are also calculated. When correctly implemented, this can be done at almost no additional computational cost.
With regards to
The values t, u, and v are the essential elements required to render a depth (t) and a color (u and v) for a particular pixel. The value t is directly related to the depth, whereas the barycentric coordinates (u and v) are used to interpolate the color c based on the RGB color triangle vertices (c0, c1, c2) in the following way:
c=(1−u−v)c0+uc1+vc2.
The rendered inverse depth di of a pixel i depends only on the geometry of the triangle that a ray is intersecting (and camera pose, that is assumed to be fixed for a given frame). In one example, the surface model is modelled using a height map, wherein each vertex has only one degree of freedom, its height z. Assuming that the ray intersects the triangle j specified by heights z0, z1, z2, at distance 1/di (where di is the inverse depth for a pixel i), the derivative can be expressed as follows:
If the more advanced step of differentiating color/appearance is employed, the rendered color ci of pixel i depends both on the triangle (j) geometry as well as the per-vertex color. The derivative of the rendered color with respect to the vertex colors is simply given by the barycentric coordinates: ∂ci/∂(c0, c1, c2) = [(1 − u − v)·I, u·I, v·I].
In this example, I denotes the identity matrix (3×3 in this case). Since in this loosely-coupled fusion the color image has already been used to generate a depth map that determines the height map, the dependency of the color image on the height map is ignored, i.e. the respective derivatives are not calculated. This is a conservative assumption that allows the colors and the height map to be treated independently. In essence, the color estimation simply serves to improve the representation of the height map.
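The interpolation and its Jacobian can be written out directly; the following sketch uses made-up barycentric coordinates and vertex colours:

```python
import numpy as np

u, v = 0.25, 0.40                # barycentric coordinates for the pixel's ray hit
w = 1.0 - u - v
c0 = np.array([1.0, 0.0, 0.0])   # RGB colours of the triangle's three vertices
c1 = np.array([0.0, 1.0, 0.0])
c2 = np.array([0.0, 0.0, 1.0])

# Rendered colour: c = (1 - u - v) c0 + u c1 + v c2
c = w * c0 + u * c1 + v * c2

# Jacobian of c with respect to the stacked vertex colours [c0, c1, c2]:
# each 3x3 block is the corresponding barycentric weight times the identity I.
I3 = np.eye(3)
dc_dvertex_colours = np.hstack([w * I3, u * I3, v * I3])   # shape (3, 9)

print(c)                          # [0.35 0.25 0.4 ]
print(dc_dvertex_colours.shape)   # (3, 9)
```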
Height Map Fusion through Linearization
The inverse depth error term as described above is of the form:
e_i = d̃_i − d_i(z_j)
where z_j denotes the heights of the triangle j intersected by the ray through pixel i. This is a scalar adaptation of the depth component of the minimization problem outlined previously. In this example z_j = [z_0, z_1, z_2]^T. After the optimization is completed, the error term is approximated linearly around the current estimate z̄_j:
e_i ≈ ē_i + E δz = ē_i − E z̄_j + E z_j =: e_i^lin
The Jacobian matrix E was computed as part of the gradient descent, as the derivative of the error term with respect to the triangle heights evaluated at the current estimate z̄_j: E = ∂e_i/∂z_j = −∂d_i/∂z_j.
After a frame has been fused into the surface model, the polynomial (in this example a quadratic) cost is accumulated on a “per-triangle” basis. These linearized error terms create polynomial (in this example, quadratic) constraints on how the vertices of the surface model (in this example, a triangular mesh) can be further modified/displaced after a depth map has been fused into the surface model. The constraints are built from the residual errors between the rendered and observed depth maps. Therefore, for each triangle j, a quadratic cost term is kept of the form:
c = c_0 + b^T z + z^T A z
Wherein the values of c_0, b, and A are initially zero. The gradient of these cost terms can be obtained in a straight-forward manner, and the per-triangle cost update (simply summing) based on the current linearized error term thus consists of the following operation:
Multiplying this out and rearranging provides the updates to the coefficient of the per-triangle quadratic cost:
The overall cost concerning the height map, Fz, thus amounts to:
Wherein e_i is the pixel-wise difference between the measured and the rendered inverse depth as described earlier, the sum over j runs over all triangles, and the sum over i runs over all pixels. After the optimization terminates (converges), the current nonlinear depth error terms are fused into the quadratic per-triangle cost terms. Note that, consequently, the number of linear cost terms is bounded by the number of triangles in the height map, whereas the number of nonlinear (inverse) depth error terms is bounded by the number of pixels of the image capture device. This is an important property for real-time operation.
As an example, the per-triangle error terms are initially set to zero, and the first depth map is fused into the surface model. After the first depth map has been fused into the surface model, the per-triangle quadratic constraints are updated, and they are used as the priors (“spring” constraints) for the fusion of the next depth map. This process is then repeated.
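The coefficient updates themselves are not reproduced in the text above; the sketch below therefore only assumes that fusing one linearized residual ē_i + E(z_j − z̄_j) adds its square to the per-triangle quadratic cost c_0 + b^T z + z^T A z, which is one straightforward way of realising the described accumulation.

```python
import numpy as np

def fuse_linearized_residual(c0, b, A, e_bar, E, z_bar):
    """Add (e_bar + E @ (z - z_bar))**2 to the quadratic cost c0 + b.z + z.A.z."""
    r = e_bar - E @ z_bar          # constant part of the linearized error
    c0 = c0 + r * r
    b = b + 2.0 * r * E            # linear coefficient
    A = A + np.outer(E, E)         # quadratic coefficient
    return c0, b, A

# One triangle with three vertex heights; coefficients start at zero.
c0, b, A = 0.0, np.zeros(3), np.zeros((3, 3))
E = np.array([0.2, -0.1, 0.4])     # Jacobian of the inverse-depth error w.r.t. heights
z_bar = np.array([0.05, 0.04, 0.06])
e_bar = 0.01                       # residual at the current estimate
c0, b, A = fuse_linearized_residual(c0, b, A, e_bar, E, z_bar)

# Evaluating the accumulated prior at the current estimate recovers e_bar**2.
z = z_bar
print(c0 + b @ z + z @ A @ z)      # ~= 0.0001
```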
Note furthermore that color fusion is not addressed here, but the skilled person could extend the above formulation in a straightforward manner. Since the color information is only used in this example for improved display of the height map, the preferred method abandons fusing the color and only uses the current-frame nonlinear color error terms in the overall cost function.
The height map fusion is formulated as an optimization problem. Furthermore, by means of differentiable rendering, the gradient of the associated cost function may be accessed without any considerable increase in computational demand. When optimizing the depth map (and optionally the color map) for each new frame 210, the apparatus and method iteratively solve a nonlinear “least squares” problem. A standard procedure, at each iteration, would require forming a normal equation and solving it, for example by means of Cholesky factorization. However, due to the size of the problem to be solved, direct methods that form the Hessian explicitly and rely on matrix factorization are prohibitively expensive.
Instead, the conjugate gradient descent algorithm is used, which is indirect, matrix-free and can access the Hessian through a dot product. At each iteration of conjugate gradient it is required to perform a line search in order to determine the step size in the descent direction. This requires a re-evaluation of the cost function. When evaluating the cost function with the present method, the gradient may be almost instantaneously accessed, and the optimal step size is not searched for, but instead the method accepts any step size that leads to a decrease in the cost, and in the next iteration the already-available gradient is used. Typically about 10-20 iterations are required until the optimization process converges, which in the current implementation allows the described fusion to run at a rate of about 15-20 fps. Convergence may occur, for example, when the error value between the rendered and the measured depth maps falls below a predetermined threshold value.
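The accept-any-decrease idea can be illustrated with a plain gradient step on a toy least-squares objective; the actual method uses conjugate gradient and the renderer's gradients, both of which are replaced here by simple stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(30, 8))        # stand-in Jacobian of a toy least-squares problem
target = rng.normal(size=30)

def cost(z):
    return float(np.sum((H @ z - target) ** 2))

def grad(z):
    return 2.0 * H.T @ (H @ z - target)

z, step = np.zeros(8), 1.0
current_cost = cost(z)
for iteration in range(100):
    g = grad(z)                      # gradient is available alongside the cost
    if np.linalg.norm(g) < 1e-8:     # converged (gradient effectively zero)
        break
    candidate = z - step * g         # take a step without searching for the optimal size
    candidate_cost = cost(candidate)
    if candidate_cost < current_cost:    # accept any step that decreases the cost
        z, current_cost = candidate, candidate_cost
    else:
        step *= 0.5                      # otherwise shrink the step and try again

print(iteration, current_cost)
```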
The disclosed apparatus and method provide a number of benefits over the prior art. Given the probabilistic interpretation and generative model used, Bayesian fusion using a “per triangle” information filter is performed. The approach is optimal up to linearization errors, and discards no information, while the computational complexity is bounded.
The method is highly scalable, both in terms of image resolution and scene representation. Using current GPUs, rendering can be done extremely efficiently, and calculating the partial derivatives comes at almost negligible cost. The disclosed method is both robust and efficient when applied directly to mobile robotics.
The above embodiments are to be understood as illustrative examples of the invention. Further embodiments are envisaged. For example, there exist many different types of camera and image retrieval methods. The depth, image and camera pose and tracking data might each be obtained from separate sources, for example depth data from a dedicated depth camera (such as the Microsoft Kinect™) and image data from a standard RGB camera. Furthermore, the tracking may also be directly integrated into the mapping process. In one example, the five most-recent frames are used to derive the depth maps for a single frame.
It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. It should be noted that use of method/process diagrams is not intended to imply a fixed order; for example in
Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.
This application is a continuation of International Application No. PCT/GB2017/051333, filed May 12, 2017, which claims priority to GB Application No. GB1608471.7, filed May 13, 2016, under 35 U.S.C. § 119(a). Each of the above-referenced patent applications is incorporated by reference in its entirety.