The invention relates to a mapping device for supporting localization and mapping using a heterogenous map comprising image-based map elements and structure-based map elements, a method of supporting localization and mapping using a heterogenous map comprising image-based map elements and structure-based map elements, a corresponding computer program, a corresponding computer-readable data carrier, and a corresponding data carrier signal.
Simultaneous Localization and Mapping (SLAM) and similar algorithms may be used for determining the pose, i.e., position and/or orientation, of a device moving through a real-world environment using a map, while at the same time augmenting the map with new information. This is achieved by utilizing information captured by one or more sensors which the device is carrying, comparing the captured information with the existing map to determine the device's pose, while at the same time augmenting the map with information derived from the captured sensor data.
An important research topic is performing SLAM with heterogeneous sensor information, i.e., localizing a device having a certain type of sensor, e.g., image-based or structure-based, using a heterogenous map which contains both image-based and structure-based elements. This is because devices may be equipped with a wide range of sensors of the same type or different types, having different capabilities. For example, an ordinary smartphone executing an Augmented Reality (AR) application may have only a monocular camera, whereas high-end smartphones or tablets may be equipped with a monocular camera and a Lidar (e.g., the Apple iPhone 12 Pro and the Apple iPad Pro). Moreover, Mixed-Reality (MR) headsets will likely be provided with multiple monocular cameras, and in some cases stereo cameras or Lidars; robotic vacuum cleaners may only possess a Lidar, while industrial robots may feature all the above sensors. Even if a device has multiple sensors of the same type, their characteristics may vary, e.g., with respect to the Field-of-View (FoV), range, resolution, density, frame rate, etc. Hence, there is a strong need to develop SLAM algorithms and solutions which can handle such heterogeneity.
When performing localization on heterogeneous maps which have been compiled using sensor data from potentially different devices at different occasions, a problem that arises is to decide which map information should be used to localize a device which has a specific type of sensor.
It is an object of the invention to provide an improved alternative to the above techniques and prior art.
More specifically, it is an object of the invention to provide improved solutions for supporting localization and mapping of devices using heterogenous maps.
These and other objects of the invention are achieved by means of different aspects of the invention, as defined by the independent claims. Embodiments of the invention are characterized by the dependent claims.
According to a first aspect of the invention, a mapping device for supporting localization and mapping using a heterogenous map is provided. The heterogenous map comprises image-based map elements and structure-based map elements. The mapping device is operative to receive a request pertaining to localization using current sensor data which is image-based sensor data or structure-based sensor data. The mapping device is further operative to access image-based map elements and structure-based map elements, representing visual features and structural features, respectively, in a real-world environment. Each map element is associated with a capturing time indicating a time of capturing the sensor data based on which it was derived. The mapping device is further operative to identify pairs of corresponding map elements. Each pair of corresponding map elements comprises an image-based map element of the accessed image-based map elements, and a structure-based map element of the accessed structure-based map elements. The image-based map element and the structure-based map element, i.e., the corresponding map elements, represent features at the same real-world location. The mapping device is further operative to determine, for each pair of corresponding map elements, an information difference metric between the image-based map element and the structure-based map element. The information difference metric is indicative of a difference in a representation of the real-world location by the image-based map element and a representation of the real-world location by the structure-based map element. The mapping device is further operative to select, for each pair of corresponding map elements, one of the map elements of the pair of corresponding map elements for updating the heterogenous map. The selection is based on a comparison of the respective capturing times, a type of the current sensor data, and the information difference metric.
According to a second aspect of the invention, a method of supporting localization and mapping using a heterogenous map is provided. The heterogenous map comprises image-based map elements and structure-based map elements. The method is performed by a mapping device and comprises receiving a request pertaining to localization using current sensor data which is image-based sensor data or structure-based sensor data. The method further comprises accessing image-based map elements and structure-based map elements, representing visual features and structural features, respectively, in a real-world environment. Each map element is associated with a capturing time indicating a time of capturing the sensor data based on which it was derived. The method further comprises identifying pairs of corresponding map elements. Each pair of corresponding map elements comprises an image-based map element of the accessed image-based map elements, and a structure-based map element of the accessed structure-based map elements. The image-based map element and the structure-based map element, i.e., the corresponding map elements, represent features at the same real-world location. The method further comprises determining, for each pair of corresponding map elements, an information difference metric between the image-based map element and the structure-based map element. The information difference metric is indicative of a difference in a representation of the real-world location by the image-based map element and a representation of the real-world location by the structure-based map element. The method further comprises selecting, for each pair of corresponding map elements, one of the map elements of the pair of corresponding map elements for updating the heterogenous map. The selection is based on a comparison of the respective capturing times, a type of the current sensor data, and the information difference metric.
According to a third aspect of the invention, a computer program is provided. The computer program comprises instructions which, when the computer program is executed by a computing device, such as a mapping device, cause the computing device to carry out the method according to an embodiment of the second aspect of the invention.
According to a fourth aspect of the invention, a computer-readable data carrier is provided. The computer-readable data carrier has stored thereon the computer program according to the third aspect of the invention.
According to a fifth aspect of the invention, a data carrier signal is provided. The data carrier signal carries the computer program according to the third aspect of the invention.
The invention makes use of an understanding that, in heterogenous SLAM, the problem arises that a device (commonly referred to as “agent” in SLAM literature) comprising one or more sensors of potentially different types, which uses a heterogenous map for localizing itself, i.e., determining its pose (position and/or orientation) in a real-world environment represented by the heterogenous map, has no means of assessing which of potentially several map elements of different types representing the same real-world location is the current or more recent representation of the real-world environment at that location. Such a scenario may arise if the real-world environment has changed in-between capturing of data by sensor devices having different types of sensors. If that occurs, the derived map elements having different types may have deviating representations of visual or structural features of the same real-world location.
Embodiments of the invention are advantageous in that the selection of map elements among map elements in a heterogenous map, i.e., map elements having different types, is not only based on type, e.g., a device comprising an image-based sensor is localized based on image-based map elements, but also takes into consideration the difference in representations of the real-world environment by the respective different-type map elements, as well as the time when their respective representations were valid, i.e., when the sensor data from which the map elements are derived has been captured.
Even though advantages of the invention have in some cases been described with reference to embodiments of the first aspect of the invention, corresponding reasoning applies to embodiments of the other aspects of the invention.
Further objectives of, features of, and advantages with, the invention will become apparent when studying the following detailed disclosure, the drawings, and the appended claims. Those skilled in the art realize that different features of the invention can be combined to create embodiments other than those described in the following.
The above, as well as additional objects, features, and advantages of the invention, will be better understood through the following illustrative and non-limiting detailed description of embodiments of the invention, with reference to the appended drawings, in which:
All the figures are schematic, not necessarily to scale, and generally only show parts which are necessary in order to elucidate the invention, wherein other parts may be omitted or merely suggested.
The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the following, embodiments of the invention will be described in relation to the scenario illustrated in
It is assumed here that the sensor device 110 is to be localized using an existing heterogeneous map which is updated while the sensor device 110 moves through the real-world environment 100, which in
Embodiments of the invention relate to a field known as collaborative mapping, which is based on multiple sensor devices capturing sensor data of the same real-world environment at different occasions, using sensors having different capabilities (e.g., resolution, accuracy, Field-of-View, etc.) and potentially different types (image-based vs. structure-based). Throughout this disclosure, a sensor, such as sensors 111, 121, or 131, is assumed to be either of image-based type or of structure-based type. Examples of image-based sensors are monocular cameras, stereo cameras, and any other optical sensor which captures a visual image of the environment 100. Examples of structure-based sensors are Lidars, stereo cameras, radars, and any other sensor which captures ranges or distances between the sensor and the environment 100.
In practice, collaborative mapping may be implemented in different ways, utilizing the distributed nature of the underlying architecture with different sensor devices (such as the sensor devices 110, 120, and 130) and a mapping server 140. The mapping server 140 may be provided as an application server or edgecloud which is accessible by, and can communicate with, the sensor devices 110, 120, and 130, via communication links 105 which are established between the sensor devices 110, 120, and 130, and one or more communications networks 160 which may include one or more radio-access-networks, such as cellular networks, WLAN/Wi-Fi networks, a Bluetooth network, or the like. The sensor devices 110, 120, and 130, may also exchange data with each other via the communication links 105, either directly in a device-to-device fashion or via the one or more communications networks 160.
For instance, a sensor device, such as sensor device 110, capturing sensor data utilizing its sensor 111 while moving through the environment 100, may transmit the captured sensor data to the mapping server 140, which creates map elements based on the captured sensor data and fuses the created map elements with an existing heterogenous map, which already contains map elements derived from sensor data captured by sensor devices (such as sensor devices 110, 120, and 130) at earlier occasions. The process of updating an existing map by adding, or fusing, new map elements which are created based on captured sensor data is commonly referred to as “mapping”. Then, the mapping device 140 sends the updated heterogenous map, or a portion of the updated heterogenous map covering an area of the environment 100 in which the sensor device 110 is to be localized, to the sensor device 110. Upon receiving the updated heterogenous map, or portion thereof, the sensor device 110 attempts to determine its pose, i.e., its position and/or orientation, using the updated heterogenous map, as is known in the art. The latter process is commonly referred to as “localization”. A portion of a larger map which covers an area of an environment in which a sensor device (or agent) is to be localized is commonly referred to as a “local map”.
In an alternative implementation of distributed SLAM, the sensor device 110 capturing sensor data with its sensor 111 while moving through the environment 100 may perform both localization and mapping, using a locally stored heterogenous map. This may, e.g., be a heterogenous map, or portion thereof covering an area of the environment 100 in which the sensor device 110 is to be localized, which the sensor device 110 has received from the mapping server 140. Similar to what is described hereinabove in relation to performing mapping by the mapping device 140, the sensor device 110 creates map elements based on the captured sensor data and fuses the created map elements with the existing local heterogenous map. It may then transmit the updated local map to the mapping device 140, which fuses the received updated map with its heterogeneous map. The map which is maintained by the mapping device 140 is commonly referred to as a “global map”. The mapping device 140 may optionally share the updated global map, or updated local maps, with the sensor devices 110, 120, and 130. In this way, sensor devices moving close to each other in the same environment benefit from each other's captured sensor data.
For both implementations, the sensor device 110 performs localization using a local map, which optionally may only cover a limited geographical area, whereas the mapping device 140 collects updated local maps or map elements from different sensor devices, such as sensor devices 110, 120, and 130, and fuses the updated information with its existing heterogenous map.
As yet a further alternative, collaborative mapping may also be implemented in a distributed fashion without relying on a mapping device 140 for maintaining a (global) map, as is described hereinbefore. In this alternative implementation, the sensor devices, such as sensor devices 110, 120, and 130, perform localization and mapping locally and share their updated local maps with each other, and each sensor device 110, 120, and 130, fuses the received updated map or map elements with its local heterogeneous map. In other words, the sensor devices 110, 120, and 130, perform localization and mapping autonomously and share updated map information with each other, via the communication links 105.
For the sake of simplicity, the present disclosure focuses on the first alternative described above, i.e., the sensor device 110 transmits captured sensor data to the mapping server 140, which creates map elements based on the captured sensor data and fuses the created map elements with an existing heterogenous map, which it then shares with the sensor device 110, and based on which the sensor device performs localization. However, embodiments of the invention are not limited to this specific implementation of distributed SLAM, and alternative embodiments may easily be envisaged. In particular, the mapping device 140 may comprise the sensor device 110, i.e., the sensor device 110 is performing both localization and mapping.
In
The processing circuitry 202 may comprise one or more processors 203, such as Central Processing Units (CPUs), microprocessors, application-specific processors, Graphics Processing Units (GPUs), and Digital Signal Processors (DSPs) including image processors, or a combination thereof, and a memory 204 comprising a computer program 205 comprising instructions. When executed by the processor(s) 203, the computer program 205 causes the mapping device 140 to become operative in accordance with embodiments of the invention described herein. The memory 204 may, e.g., be a Random-Access Memory (RAM), a Read-Only Memory (ROM), a Flash memory, or the like. The computer program 205 may be downloaded to the memory 204 by means of the network interface 206, as a data carrier signal carrying the computer program 205. The processing circuitry 202 may alternatively or additionally comprise one or more Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), or the like, which are operative to cause the mapping device 140 to become operative in accordance with embodiments of the invention described herein.
The network interface 206 may comprise one or more of a cellular modem (e.g., GSM, UMTS, LTE, 5G, or higher generation), a WLAN/Wi-Fi modem, a Bluetooth modem, a Near-Field Communication (NFC) modem, an Ethernet interface, or the like, for exchanging data through a communication link 105 with one or more sensor devices, such as sensor devices 110, 120, and 130. Data may be exchanged directly, or via one or more wired or wireless communications networks 160.
Further with reference to
In the present context, maps and map elements are data structures comprising data representing features of a real-world environment. One known alternative used for both image-based maps and structure-based maps in relation to SLAM is the so-called graph map. Graph maps are graph structures which are defined by vertices and edges. The vertices contain the map information, which for structure-based maps may include depth information, or pointcloud/segment descriptors and their respective poses. For image-based maps, the vertices may include keyframes, or 2D features or images and their respective poses. The edges describe the geometric transformation required to traverse from one vertex to adjacent vertices. A widely used framework for graph maps and their manipulation is g2o (https://openslam-org.github.io/g2o.html, R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, “g2o: A General Framework for Graph Optimization”, IEEE International Conference on Robotics and Automation (ICRA), 2011).
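By way of a non-limiting illustration, the graph map described above may be sketched as a simple data structure. The class names `Vertex`, `Edge`, and `GraphMap`, the pose layout, and the `neighbors` helper are hypothetical and chosen for illustration only; they do not reflect the g2o API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vertex_id: int
    pose: tuple      # assumed layout: (x, y, z, roll, pitch, yaw)
    payload: object  # e.g. keyframe features or a segment descriptor


@dataclass
class Edge:
    source: int
    target: int
    transform: tuple  # relative transformation from source to target


@dataclass
class GraphMap:
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_vertex(self, vertex):
        self.vertices[vertex.vertex_id] = vertex

    def add_edge(self, edge):
        # An edge may only connect vertices that are already in the map.
        if edge.source not in self.vertices or edge.target not in self.vertices:
            raise KeyError("both end points must exist before adding an edge")
        self.edges.append(edge)

    def neighbors(self, vertex_id):
        # Vertices reachable from vertex_id by traversing one edge.
        return [e.target for e in self.edges if e.source == vertex_id]


graph = GraphMap()
graph.add_vertex(Vertex(0, (0, 0, 0, 0, 0, 0), "keyframe-0"))
graph.add_vertex(Vertex(1, (1, 0, 0, 0, 0, 0), "keyframe-1"))
graph.add_edge(Edge(0, 1, (1, 0, 0, 0, 0, 0)))
```

In this sketch, a map element in the sense of this disclosure would correspond to one or more entries of `vertices` together with the edges attached to them.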
Another known alternative for both image-based maps and structure-based maps in relation to SLAM is the so-called occupancy map. An occupancy map is a 2D or 3D array of cells, an octree, or any other 3D space representation format, in which each cell can be either occupied or free, depending on whether a real-world object was present when the underlying sensor data was captured. A widely used framework for occupancy maps is OctoMap (https://octomap.github.io/, A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: an efficient probabilistic 3D mapping framework based on octrees”, Autonomous Robots, vol. 34, pages 189-206, Springer, 2013).
Occupancy maps can be represented as graph maps, so that both types represent the same information. For example, a graph map can encode parts of an occupancy grid map, where instead of the vertex being a point cloud, an image, or descriptors of those, it is a section or an element of an occupancy grid map (see, e.g., “Virtual Occupancy Grid Map for Submap-based Pose Graph SLAM and Planning in 3D Environments”, by B.-J. Ho, P. Sodhi, P. Teixeira, M. Hsiao, T. Kusnur, and M. Kaess, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2019).
Throughout this disclosure, a map element is to be understood as a subset of a map. For instance, for graph maps, a map element may comprise one or more vertices including its/their edges. For occupancy maps, a map element may comprise one or more cells. In practice, the number of vertices and edges, or the number of cells, i.e., the size of the map element, is a design parameter and may be selected according to the desired resolution of the map. For example, for a graph map, a new vertex is created for every “keyframe” which is a reference frame for the current space (instead of every frame), while for an occupancy map the desired depth of the tree-based space representation is selected, which in turn defines the desired map resolution in meters, and can be adjusted to reflect the resolution of the sensor capturing the data.
In the following, embodiments of the mapping device 140 for supporting localization and mapping using a heterogenous map are described with reference to
The mapping device 140 is operative to receive a request 311 pertaining to localization using current sensor data. The current sensor data is either image-based sensor data or structure-based sensor data. In general, the received request 311 may relate to localization of a sensor device comprising at least one sensor operative to capture the current sensor data, such as the sensor device 110. The request 311 may be received from the sensor device 110, and may comprise the current sensor data. In practice, the current sensor data is sensor data which the sensor device 110 has captured 310 using its sensor 111.
The mapping device 140 is further operative to access 313 image-based map elements and structure-based map elements. The image-based map elements represent visual features in a real-world environment, in particular the environment 100. In practice, the image-based map elements may be derived from sensor data captured 301 by one or more image-based sensors, e.g., the sensors 121 comprised in the sensor device 120 which has passed through the environment 100 at an earlier occasion, as part of a mapping process performed by the mapping device 140 during which the heterogenous map maintained by the mapping device 140 is updated 305. The image-based map elements are associated with respective capturing times indicating a time of capturing 301 the sensor data based on which they have been derived. The capturing time may, e.g., be indicated by a time stamp which is associated with the map element, e.g., as metadata.
The structure-based map elements represent structural features in a real-world environment, in particular the environment 100. In practice, the structure-based map elements may be derived from sensor data captured 302 by one or more structure-based sensors, e.g., the sensors 131 comprised in the sensor device 130 which has passed through the environment 100 at an earlier occasion, as part of a mapping process performed by the mapping device 140 during which the heterogenous map maintained by the mapping device 140 is updated 306. The structure-based map elements are associated with respective capturing times indicating a time of capturing 302 the sensor data based on which they have been derived.
The image-based map elements and the structure-based map elements may be part of a heterogenous map, e.g., a global map, which the mapping device 140 maintains and updates in a collaborative manner. The image-based map elements and the structure-based map elements, or the heterogenous map, may be stored in a local data storage which the mapping device is provided with, such as the memory 204, or in a data storage which is accessible by the mapping device 140 via the one or more communications networks 160, e.g., a cloud storage.
The mapping device 140 is further operative to identify 314 pairs of corresponding map elements. Each pair of corresponding map elements comprises an image-based map element of the accessed image-based map elements, and a structure-based map element of the accessed structure-based map elements. The image-based map element and the structure-based map element of the pair of corresponding map elements represent features at the same, or at least substantially the same, real-world location. In other words, the two corresponding map elements represent respective regions in the real-world environment which substantially overlap, and preferably are identical. By way of reminder, the real-world region is represented by a map element which comprises one or more cells. It is assumed here that the set of image-based map elements and the set of structure-based map elements are aligned, i.e., they use a common coordinate system which allows identifying pairs of corresponding map elements which represent visual and structural features, respectively, at the same, or substantially the same, real-world location.
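By way of a non-limiting illustration, identifying pairs of corresponding map elements in a common coordinate system may be sketched as follows. The `MapElement` class, the `region_key` helper, and the `cell_size` parameter are hypothetical; the sketch assumes that two map elements correspond if their locations fall into the same cubic region of the common coordinate system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapElement:
    kind: str            # "image" or "structure"
    location: tuple      # (x, y, z) in the common coordinate system, in metres
    capture_time: float  # time stamp of the underlying sensor data


def region_key(location, cell_size=1.0):
    # Quantize a location to the index of the cubic region which contains it.
    return tuple(math.floor(c / cell_size) for c in location)


def identify_pairs(image_elements, structure_elements, cell_size=1.0):
    # Index structure-based elements by region; if several elements fall into
    # the same region, this sketch keeps only the last one.
    by_region = {region_key(e.location, cell_size): e for e in structure_elements}
    pairs = []
    for image_element in image_elements:
        match = by_region.get(region_key(image_element.location, cell_size))
        if match is not None:
            pairs.append((image_element, match))
    return pairs


images = [MapElement("image", (0.2, 0.3, 0.0), 100.0),
          MapElement("image", (5.5, 0.0, 0.0), 110.0)]
structures = [MapElement("structure", (0.8, 0.1, 0.5), 200.0)]
pairs = identify_pairs(images, structures)
```

Only the first image-based map element has a counterpart in this example, since no structure-based map element covers the region around (5.5, 0.0, 0.0).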
The mapping device 140 is further operative, for each pair of corresponding map elements, to determine 315 an information difference metric between the image-based map element and the structure-based map element of the pair of corresponding map elements. The information difference metric is indicative of a difference in a representation of the real-world location by the image-based map element and a representation of the (same, or substantially the same) real-world location by the structure-based map element.
For occupancy maps, or map elements of occupancy-type, the information difference metric may be calculated based on a comparison of the respective states, i.e., whether the corresponding cell, or cells, of the corresponding image-based map element and the structure-based map element are occupied or not occupied. If the map elements comprise single cells, the information difference metric is binary, i.e., the states of the corresponding map elements are the same (e.g., the information difference metric is 0 or 0%), or not (accordingly, the information difference metric is 1 or 100%). If the map elements comprise multiple cells representing a real-world location, the information difference metric may, e.g., be determined as a ratio (e.g., between 0 and 1, or 0 and 100%), namely the number of pairs of corresponding cells which have different states, divided by the total number of pairs of corresponding cells comprised in the pair of corresponding map elements.
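By way of a non-limiting illustration, the ratio described above may be computed as sketched below. The function name `occupancy_difference` and the representation of a map element as a flat sequence of Boolean cell states (True for occupied) in a matching order are assumptions made for illustration.

```python
def occupancy_difference(image_cells, structure_cells):
    """Fraction of corresponding cell pairs whose occupancy states differ."""
    if len(image_cells) != len(structure_cells):
        raise ValueError("map elements must comprise the same number of cells")
    if not image_cells:
        raise ValueError("map elements must comprise at least one cell")
    differing = sum(1 for a, b in zip(image_cells, structure_cells) if a != b)
    return differing / len(image_cells)


# Single-cell map elements yield a binary metric.
single_same = occupancy_difference([True], [True])
single_diff = occupancy_difference([True], [False])

# Multi-cell map elements yield a ratio: 2 of 4 cell pairs differ here.
multi = occupancy_difference([True, False, True, True],
                             [True, True, True, False])
```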
If an octree representation is used, a real-world region is represented by a tree structure comprising cells, where the resolution increases when moving further down the tree structure. In this case, the information difference metric may be determined for different resolutions by limiting the depth of the octree in calculating the information difference metric.
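By way of a non-limiting illustration, limiting the depth of the octree when calculating the information difference metric may be sketched as follows. The sketch assumes a simplified octree in which a node is either a Boolean leaf (True for occupied) or a list of eight child nodes, and in which a node that is cut off at the depth limit counts as occupied if any of its descendants is occupied. The function names are hypothetical and do not reflect the OctoMap API.

```python
def occupied(node):
    # A leaf carries its own state; an inner node is occupied if any child is.
    if isinstance(node, bool):
        return node
    return any(occupied(child) for child in node)


def count_differences(a, b, depth):
    """Return (differing, total) cell counts at `depth` levels below a and b."""
    if depth == 0 or (isinstance(a, bool) and isinstance(b, bool)):
        cells = 8 ** depth  # number of finest-resolution cells covered
        return (cells if occupied(a) != occupied(b) else 0, cells)
    # A leaf reached early covers all eight of its would-be children.
    children_a = [a] * 8 if isinstance(a, bool) else a
    children_b = [b] * 8 if isinstance(b, bool) else b
    differing = total = 0
    for child_a, child_b in zip(children_a, children_b):
        d, t = count_differences(child_a, child_b, depth - 1)
        differing += d
        total += t
    return differing, total


def octree_difference(a, b, max_depth):
    differing, total = count_differences(a, b, max_depth)
    return differing / total


tree_a = [[True] + [False] * 7] + [False] * 7  # one occupied cell at depth 2
tree_b = [True] + [False] * 7                  # first octant fully occupied
coarse = octree_difference(tree_a, tree_b, 1)
fine = octree_difference(tree_a, tree_b, 2)
```

At depth 1 the two trees appear identical, whereas at depth 2 seven of the 64 finest cells differ, which shows how the chosen depth affects the resulting metric.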
For both occupancy maps and graph maps, or map elements of occupancy-type or graph-type, the information difference metric may be determined based on feature descriptors. A feature descriptor is generated by an algorithm which takes as input 3D data which is a structural representation of the real-world environment, and outputs feature descriptors, also known as feature vectors. Feature descriptors encode information into a series of numbers and act as a sort of numerical “fingerprint” that can be used to differentiate one feature from another. As an example, R. Dubé, A. Cramariuc, D. Dugas, H. Sommer, M. Dymczyk, J. Nieto, R. Siegwart, and C. Cadena, have described an algorithm for generating feature descriptors as 64×1 vectors of floating-point numbers in “SegMap: Segment-based mapping and localization using data-driven descriptors” (in “The International Journal of Robotics Research”, DOI: 10.1177/0278364919863090, 2019). The information difference metric may then be calculated as the distance in feature space between two feature descriptors at the same real-world location. The distance in feature space can, e.g., be computed as the Euclidean distance between the two 64×1 vectors of floating-point numbers. A distance of zero means that the two feature descriptors are exactly the same, i.e., there is no information difference between the corresponding image-based and structure-based map elements (e.g., the information difference metric is 0 or 0%). A non-zero distance is indicative of a difference in information. Because of noise, which is inherent to captured sensor data, the sensor's limited accuracy, etc., a non-zero threshold is used for determining whether two map elements represent the same information, or not, as is described below.
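By way of a non-limiting illustration, the distance in feature space and its comparison against a non-zero threshold may be sketched as follows. The function names and the threshold value of 0.5 are hypothetical; in practice, a suitable threshold depends on the descriptor algorithm and the sensor noise.

```python
import math


def descriptor_distance(descriptor_a, descriptor_b):
    """Euclidean distance between two feature descriptors of equal length."""
    if len(descriptor_a) != len(descriptor_b):
        raise ValueError("descriptors must have the same dimension")
    return math.dist(descriptor_a, descriptor_b)


def same_information(descriptor_a, descriptor_b, threshold):
    # Distances up to the threshold are attributed to noise, not to a change.
    return descriptor_distance(descriptor_a, descriptor_b) <= threshold


image_descriptor = [0.0] * 64
structure_descriptor = [3.0, 4.0] + [0.0] * 62

distance = descriptor_distance(image_descriptor, structure_descriptor)
unchanged = same_information(image_descriptor, structure_descriptor, threshold=0.5)
```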
The mapping device 140 is further operative to select 316, for each pair of corresponding map elements, one of the map elements of the pair of corresponding map elements for updating the heterogenous map. The map element is selected based on a comparison of the respective capturing times, a type of the current sensor data, and the information difference metric, as is described in further detail below.
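Purely as a hypothetical illustration of how the three inputs named above could be combined, one possible selection policy is sketched below. The policy, the `Element` class, and the `threshold` parameter are assumptions made for this sketch and are not asserted to be the selection rules of the embodiments: if the information difference metric does not exceed the threshold, the map element matching the type of the current sensor data is selected, and otherwise the more recently captured map element is selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str            # "image" or "structure"
    capture_time: float  # time stamp of the underlying sensor data


def select_element(image_element, structure_element, difference,
                   sensor_kind, threshold):
    # Hypothetical policy, for illustration only.
    if sensor_kind == "image":
        preferred, other = image_element, structure_element
    else:
        preferred, other = structure_element, image_element
    if difference <= threshold:
        # Both elements represent the same information: keep the matching type.
        return preferred
    # The representations deviate: prefer the more recent capture.
    return preferred if preferred.capture_time >= other.capture_time else other


old_image = Element("image", 100.0)
new_structure = Element("structure", 200.0)

agree = select_element(old_image, new_structure, 0.1, "image", threshold=0.3)
differ = select_element(old_image, new_structure, 0.8, "image", threshold=0.3)
```

In the second call, the selected map element is of a different type than the current sensor data, which is the case in which a transformation into the other type may be needed.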
Optionally, the mapping device 140 may further be operative to update 317, for each pair of corresponding map elements, the heterogenous map with the selected 316 map element, or a transformed map element which is derived therefrom. Depending on the type of the selected map element, i.e., image-based or structure-based, the selected map element may need to be transformed into a map element of the other type, as is described in further detail below. Updating 317 the heterogenous map may be achieved by adding a new map element, or replacing an existing map element at the same location as the selected map element with the selected map element, or a transformed map element which is derived therefrom. The process of updating maps with new sensor information, or map elements derived therefrom, is also referred to as fusing. For a graph map, new vertices and consequently new edges may be added when adding new map elements to an existing map, while the replacement of map elements by a new map element may incur the removal of vertices and/or edges. Similarly, for occupancy maps, map elements in the form of an octree can be updated with new information, or removed if environmental information is no longer present at a given location represented by the map element.
As an alternative, the mapping device 140 may be operative to perform localization 320 on the selected 316 map element, or a set of selected 316 map elements which are in proximity of the sensor device 110, without updating 317 the heterogenous map for future use. In particular, this may be the case if the mapping device 140 comprises the sensor device 110.
Maps, or map elements, of a given type may be transformed into maps or map elements, respectively, of the other type. Correspondingly, sensor data of a given type may be transformed into sensor data of the other type. For instance, “Visual localization within LIDAR maps for automated urban driving”, by R. W. Wolcott and R. M. Eustice, in “2014 IEEE/RSJ International Conference on Intelligent Robots and Systems”, DOI: 10.1109/IROS.2014.6942558, IEEE, 2014, discloses transforming structure-based map elements or sensor data into image-based map elements or sensor data, respectively. More specifically, 3D Lidar data extracted by a sensor and stored in a map is transformed into synthetic 2D images by applying a projection framework.
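By way of a non-limiting illustration, projecting 3D points into a 2D image plane may be sketched with a basic pinhole camera model. This is a generic textbook projection and is not asserted to be the projection framework of the cited publication; the function name `project_points`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the assumption that the points are already expressed in the camera coordinate frame are hypothetical.

```python
def project_points(points, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame, z forward) to pixel coordinates.

    Returns a list of (u, v, depth) tuples for points that lie in front of
    the camera and inside the image bounds.
    """
    pixels = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera, cannot be imaged
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, z))
    return pixels


lidar_points = [(0.0, 0.0, 2.0),   # on the optical axis
                (1.0, 0.0, 2.0),   # to the right of the axis
                (0.0, 0.0, -1.0)]  # behind the camera
synthetic = project_points(lidar_points, fx=100.0, fy=100.0,
                           cx=50.0, cy=40.0, width=200, height=100)
```

A synthetic 2D image could then be rendered from the returned pixel coordinates, e.g., by writing the depth or an intensity value at each pixel.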
The transformation of image-based map elements or sensor data into structure-based map elements or sensor data, may, e.g., be performed in accordance with what is described in “Monocular camera localization in 3D LiDAR maps” (by T. Caselitz, B. Steder, M. Ruhnke, and W. Burgard, in “2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)”, DOI: 10.1109/IROS.2016.7759304, IEEE, 2016) which describes creating a 3D point-cloud comprising 2D features extracted from 2D images by computing the 3D location of the 2D features using a SLAM algorithm. Yet another method to transform image-based map elements or sensor data into structure-based map elements or sensor data is based on unsupervised or supervised Machine Learning techniques, as proposed in “Unsupervised monocular depth estimation with left-right consistency”, by C. Godard, O. Mac Aodha, and G. J. Brostow (in “2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)”, pages 6602-6611, DOI: 10.1109/CVPR.2017.699, IEEE, 2017).
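The mapping device 140 may optionally be further operative to estimate 312 future real-world locations of a sensor device which is operative to capture the current sensor data, such as the sensor device 110. In practice, these are real-world locations at which the current sensor data is likely to be captured, i.e., locations which the sensor device 110 will visit in the near future. The future real-world locations of the sensor device 110 may be estimated 312 based on one or more of: a current position of the sensor device 110, a current and/or future velocity of the sensor device 110, a planned or estimated route of the sensor device 110, geographical limitations in the movement of the sensor device 110, and a processing time which is estimated to lapse between receiving the request 311 pertaining to localization and finalizing the updating 317 of the heterogenous map.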
If the mapping device 140 is further operative to estimate 312 future real-world locations of the sensor device 110, it is further operative to prioritize performing one or more of the operations described hereinbefore, in particular identifying 314 pairs of corresponding map elements, determining 315 an information difference metric, selecting 316 one of the corresponding map elements, and updating 317 the heterogenous map with the selected map element, for map elements which represent the estimated 312 future real-world locations of the sensor device 110 operative to capture the current sensor data. Preferably, the mapping device 140 may be operative to perform one or more of these operations only for map elements which represent the estimated 312 future real-world locations of the sensor device 110. Advantageously, and in view of the fact that SLAM and similar algorithms are computationally complex, map elements representing real-world locations which the sensor device is likely to visit in the near future are prioritized and can be made available in time for localizing 320 the sensor device 110.
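By way of a non-limiting illustration, estimating future locations and prioritizing map elements accordingly may be sketched as follows. The constant-velocity motion model, the planar coordinates, the search `radius`, and the function names `estimate_future_locations` and `prioritize` are assumptions made for this example only. A planned route or geographical limitations could replace the motion model.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def estimate_future_locations(
    position: Vec2, velocity: Vec2,
    processing_time_s: float, step_s: float, n_steps: int,
) -> List[Vec2]:
    """Constant-velocity prediction (an assumption of this sketch).

    Prediction starts after the estimated processing time, since updated map
    elements cannot be used for localization before processing has finished.
    """
    locations = []
    for k in range(n_steps):
        t = processing_time_s + k * step_s
        locations.append((position[0] + velocity[0] * t,
                          position[1] + velocity[1] * t))
    return locations


def prioritize(
    element_locations: Sequence[Vec2],
    future_locations: Sequence[Vec2],
    radius: float,
) -> List[int]:
    """Indices of map elements within `radius` of a predicted location,
    ordered so that the elements closest to the predicted path come first."""
    scored = []
    for index, location in enumerate(element_locations):
        distance = min(math.dist(location, f) for f in future_locations)
        if distance <= radius:
            scored.append((distance, index))
    return [index for _, index in sorted(scored)]
```

Map elements which are not returned by `prioritize` would simply be skipped, which corresponds to performing the operations only for map elements representing the estimated future locations.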
If the sensor device 110 and the mapping device 140 are separate devices, the mapping device 140 may optionally be further operative to transmit the updated heterogenous map 318, or a portion thereof (e.g., a local map, or a map comprising map elements which represent the estimated 312 future real-world locations of the sensor device 110) to the sensor device 110.
In the following, different alternatives for selecting 316 one of the map elements of the pair of corresponding map elements for updating the heterogenous map, based on a comparison of the respective capturing times, a type of the current sensor data, and the information difference metric, are described.
For instance, the mapping device 140 may be operative to select 316 one of the map elements of the pair of corresponding map elements for updating the heterogenous map, if the information difference metric exceeds a threshold, as follows. The mapping device 140 may be operative to determine the map element of the pair of corresponding map elements which is derived from the more recently captured sensor data, based on comparing the respective capturing times. In practice, this is the map element with the more recent capturing time, as indicated by the time stamp. This is the map element which is the more recent representation of the real-world environment 100. If the map element which is derived from the more recently captured sensor data is of the same type as the current sensor data, the mapping device 140 is operative to select the map element which is derived from the more recently captured sensor data. Else, if the map element which is derived from the more recently captured sensor data is of a different type than the current sensor data (e.g., the more recent map element is of image-based type, and the current sensor data is captured by a structure-based sensor, or vice versa), the mapping device 140 is operative to transform the map element which is derived from the more recently captured sensor data into a transformed map element having the same type as the current sensor data, and to select the transformed map element.
As another example, the mapping device 140 may be operative to select 316 one of the map elements of the pair of corresponding map elements for updating the heterogenous map, if the information difference metric does not exceed the threshold, by updating the heterogenous map with the map element having the same type as the current sensor data.
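By way of a non-limiting illustration, the two alternatives described above, i.e., the information difference metric exceeding or not exceeding the threshold, may be sketched as follows. The class `MapElement`, the function names, and the tie-breaking rule for equal capturing times are assumptions made for this example only. The `transform` argument is a hypothetical callback, and `relabel` is merely a placeholder which does not convert any payload.

```python
from dataclasses import dataclass, replace
from typing import Callable

IMAGE = "image"
STRUCTURE = "structure"


@dataclass(frozen=True)
class MapElement:
    kind: str            # IMAGE or STRUCTURE
    capture_time: float  # time stamp of the underlying sensor data
    payload: object = None


def relabel(element: MapElement, target_kind: str) -> MapElement:
    """Placeholder transformation: only changes the type label."""
    return replace(element, kind=target_kind)


def select_element(
    image_element: MapElement,
    structure_element: MapElement,
    sensor_kind: str,
    difference: float,
    threshold: float,
    transform: Callable[[MapElement, str], MapElement] = relabel,
) -> MapElement:
    if difference > threshold:
        # The two elements disagree: prefer the more recent representation.
        # Ties are resolved in favour of the image-based element (assumption).
        if image_element.capture_time >= structure_element.capture_time:
            newer = image_element
        else:
            newer = structure_element
        if newer.kind == sensor_kind:
            return newer
        return transform(newer, sensor_kind)
    # The two elements agree: use the one matching the current sensor data.
    return image_element if sensor_kind == IMAGE else structure_element
```

For example, with an image-based element captured at time 10, a structure-based element captured at time 20, image-based current sensor data, and a difference above the threshold, the sketch returns the structure-based element transformed into an image-based element.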
The threshold which is used in selecting 316 one of the map elements in accordance with the alternatives described above is a parameter which may be determined through experiments so as to provide a desired level of robustness for the determination of the information difference metric between two map elements. The threshold may further be adjusted according to the environment 100, such as indoor vs. outdoor, brightness or lighting conditions, etc., as well as the properties of the image-based and structure-based sensors, such as their accuracy, noise level, etc. As an example, in a recent study using SegMap, it was found that a suitable threshold for the distance in feature space was of the order of 15 (section 4.5 in “Toward localization and mapping with heterogeneous depth sensors”, by P. Carbó Cubero, Master Thesis, KTH Royal Institute of Technology, Stockholm, 2020).
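As a non-limiting illustration, an information difference metric expressed as a distance in feature space may be sketched as follows. The sketch assumes that both map elements of a pair have been encoded as descriptors of equal dimension in a common feature space, which is an assumption of this example. The function names are hypothetical, and the default threshold of 15 merely reuses the order of magnitude reported in the cited study as a placeholder.

```python
import math
from typing import Sequence


def information_difference(
    image_descriptor: Sequence[float],
    structure_descriptor: Sequence[float],
) -> float:
    """Euclidean distance between two descriptors in a shared feature space."""
    if len(image_descriptor) != len(structure_descriptor):
        raise ValueError("descriptors must have the same dimension")
    return math.dist(image_descriptor, structure_descriptor)


def exceeds_threshold(difference: float, threshold: float = 15.0) -> bool:
    """True if the two map elements are considered to disagree."""
    return difference > threshold
```

In practice, the threshold would be tuned experimentally for the environment and the sensors at hand rather than taken over as a fixed value.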
As a further alternative, the mapping device 140 may be operative to select one of the map elements of the pair of corresponding map elements for updating the heterogenous map further based on a prevalent type of neighboring map elements of the heterogenous map at a location where the map is to be updated. For an occupancy-type heterogenous map, this would be a number of map elements, or cells, around the location where the new map element is to be fused with the heterogenous map. For a graph-type heterogenous map, this may be map elements, or vertices, which are connected by edges with the vertex at the location where the map is to be updated. Thereby, the map elements which are contained in the heterogeneous map are, at least within a region of the map, which is relied on in the localization process, predominantly of the same type. This is advantageous in that switching between SLAM algorithms for performing localization using image-based map elements versus structure-based map elements can be avoided. Typically, switching of SLAM algorithms due to a change in type of map element on which localization is based incurs a computational cost which is caused by initialization of the SLAM algorithm.
More specifically, the mapping device 140 may be operative to select 316 one of the map elements of the pair of corresponding map elements for updating the heterogenous map as follows. The mapping device 140 may be operative to determine the map element of the pair of corresponding map elements which is derived from the more recently captured sensor data, based on comparing the respective capturing times. In practice, this is the map element with the more recent capturing time, as indicated by the time stamp. This is the map element which is the more recent representation of the real-world environment 100. If the prevalent type of neighboring map elements of the heterogenous map is of the same type as the map element which is derived from the more recently captured sensor data, the mapping device 140 is operative to select the map element which is derived from the more recently captured sensor data. Else, if the prevalent type of neighboring map elements of the heterogenous map is of a different type than the map element which is derived from the more recently captured sensor data, the mapping device 140 is operative to transform the map element which is derived from the more recently captured sensor data into a transformed map element having the same type as the current sensor data, and to select the transformed map element.
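By way of a non-limiting illustration, the selection based on the prevalent type of neighboring map elements may be sketched as follows. The function names and the representation of types as strings are assumptions made for this example only. If both types are equally frequent among the neighbors, the sketch picks the type encountered first, which is likewise an assumption.

```python
from collections import Counter
from typing import Sequence, Tuple


def prevalent_type(neighbor_types: Sequence[str]) -> str:
    """Most frequent type among the neighboring map elements."""
    if not neighbor_types:
        raise ValueError("at least one neighboring map element is required")
    return Counter(neighbor_types).most_common(1)[0][0]


def choose_type(
    newer_type: str,
    neighbor_types: Sequence[str],
    sensor_type: str,
) -> Tuple[str, bool]:
    """Return the type of the element to select for the more recent element,
    and whether the more recent element has to be transformed first."""
    if prevalent_type(neighbor_types) == newer_type:
        return newer_type, False
    # Otherwise the more recent element is brought to the type of the
    # current sensor data; no transformation is needed if it already has it.
    return sensor_type, sensor_type != newer_type
```

The neighbors would be surrounding cells for an occupancy-type map, or vertices connected by edges for a graph-type map.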
As yet a further alternative, the mapping device may be operative to select 316 one of the map elements of the pair of equivalent map elements for updating the heterogenous map further based on relative computational performance of localization algorithms for different types of map elements. Thereby, by selecting map elements of a type for which the corresponding localization-and-mapping algorithm is more efficient, e.g., faster or less resource consuming, processing time can be reduced, power consumption reduced, or resource requirements lowered.
More specifically, the mapping device 140 may be operative to select 316 one of the map elements of the pair of corresponding map elements for updating the heterogenous map as follows. The mapping device 140 is operative to determine the map element of the pair of corresponding map elements which is derived from the more recently captured sensor data, based on comparing the respective capturing times. In practice, this is the map element with the more recent capturing time, as indicated by the time stamp. This is the map element which is the more recent representation of the real-world environment 100. If the computational performance of a localization algorithm for map elements of the same type as the map element which is derived from the more recently captured sensor data is superior to the computational performance of a localization algorithm for map elements of a different type than the map element which is derived from the more recently captured sensor data, the mapping device 140 is operative to select the map element which is derived from the more recently captured sensor data. Else, if the computational performance of a localization algorithm for map elements of the same type as the map element which is derived from the more recently captured sensor data is not superior to the computational performance of a localization algorithm for map elements of a different type than the map element which is derived from the more recently captured sensor data, the mapping device 140 is operative to transform the map element which is derived from the more recently captured sensor data into a transformed map element having the same type as the current sensor data, and to select the transformed map element.
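By way of a non-limiting illustration, the selection based on relative computational performance may be sketched as follows. The sketch interprets superior performance as a strictly lower estimated localization time per type, which is an assumption of this example, as are the function name `choose_type_by_cost` and the cost table `cost_ms`. Other measures, such as power consumption or memory usage, could be used instead.

```python
from typing import Mapping, Tuple


def choose_type_by_cost(
    newer_type: str,
    sensor_type: str,
    cost_ms: Mapping[str, float],
) -> Tuple[str, bool]:
    """Return the type of the element to select, and whether the more
    recent element has to be transformed first.

    `cost_ms` maps "image" and "structure" to an estimated localization
    time in milliseconds (hypothetical figures supplied by the caller).
    """
    other_type = "structure" if newer_type == "image" else "image"
    if cost_ms[newer_type] < cost_ms[other_type]:
        return newer_type, False  # localization on the newer type is cheaper
    # Otherwise the more recent element is brought to the type of the
    # current sensor data; no transformation is needed if it already has it.
    return sensor_type, sensor_type != newer_type
```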
In addition, the mapping device 140 may be operative to select 316 one of the map elements of the pair of corresponding map elements for updating the heterogenous map based on properties of the sensors which have captured the sensor data from which the map elements have been derived. For instance, map elements which have been derived from sensor data captured by a sensor with higher accuracy or precision may be given preference.
Optionally, the mapping device 140 may further be operative, if the received request 311 pertains to localization using current sensor data which is image-based sensor data and structure-based sensor data, e.g., if the sensor device 110 comprises sensors 111 of both types, to indicate which type of current sensor data to use for localization. The indication is provided in response to the request 311 and may be transmitted 319 to the sensor device 110, if the sensor device 110 and the mapping device 140 are separate entities. The indication 319 may be transmitted separately from the updated heterogenous map 318, or together with the updated heterogenous map 318. By providing an indication as to which type of current sensor data to use for localization using the updated heterogenous map 318, localization may be controlled to commence using the type of current sensor data which is most suitable for the updated heterogenous map 318.
Optionally, the mapping device 140 may be further operative to perform localization 320 using the current sensor data, or sensor data derived therefrom. This may be the case if the mapping device 140 performs mapping and localization on behalf of the sensor device 110, e.g., if the sensor device 110 is constrained in terms of computational resources or battery power. More specifically, the mapping device 140 may further be operative, if the selected map element has the same type as the current sensor data, to perform localization 320 using the current sensor data. Else, if the selected map element has a different type than the current sensor data, the mapping device 140 is operative to transform the current sensor data into transformed sensor data having the same type as the selected map element, and to perform localization 320 using the transformed sensor data. Subsequently, if the sensor device 110 and the mapping device 140 are separate devices, the mapping device 140 may transmit information pertaining to the determined location of the sensor device 110 to the sensor device 110 (not shown in
In the following, embodiments of the method 400 of supporting localization and mapping using a heterogenous map comprising image-based map elements and structure-based map elements are described with reference to
The method 400 is performed by a mapping device and comprises receiving 401 a request pertaining to localization using current sensor data which is image-based sensor data or structure-based sensor data, and accessing 403 image-based map elements and structure-based map elements. The image-based map elements and structure-based map elements represent visual features and structural features, respectively, in a real-world environment. Each map element is associated with a capturing time indicating a time of capturing the sensor data based on which it was derived. The received request may relate to localization of a sensor device comprising at least one sensor operative to capture the current sensor data.
The method 400 further comprises identifying 404 pairs of corresponding map elements, each pair comprising an image-based map element of the accessed image-based map elements, and a structure-based map element of the accessed structure-based map elements. The image-based map element and the structure-based map element of the pair of corresponding map elements represent features at the same real-world location. The method 400 further comprises, for each pair of corresponding map elements, determining 405 an information difference metric between the image-based map element and the structure-based map element. The information difference metric is indicative of a difference in a representation of the real-world location by the image-based map element and a representation of the real-world location by the structure-based map element. The method 400 further comprises, for each pair of corresponding map elements, selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map based on a comparison of the respective capturing times, a type of the current sensor data, and the information difference metric.
The method 400 may optionally further comprise, for each pair of corresponding map elements, updating 407 the heterogenous map with the selected map element, or a transformed map element which is derived therefrom.
For instance, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may comprise determining, if the information difference metric exceeds a threshold, the map element of the pair of corresponding map elements which is derived from the more recently captured sensor data, based on comparing the respective capturing times. The selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further comprise, if the map element which is derived from the more recently captured sensor data is of the same type as the current sensor data, selecting the map element which is derived from the more recently captured sensor data. Else, if the map element which is derived from the more recently captured sensor data is of a different type than the current sensor data, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further comprise transforming the map element which is derived from the more recently captured sensor data into a transformed map element having the same type as the current sensor data, and selecting the transformed map element.
As another example, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may comprise selecting, if the information difference metric does not exceed the threshold, the map element having the same type as the current sensor data.
As a further example, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further be based on a prevalent type of neighboring map elements of the heterogenous map at a location where the map is to be updated. More specifically, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may comprise determining the map element of the pair of corresponding map elements which is derived from the more recently captured sensor data, based on comparing the respective capturing times. The selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further comprise selecting, if the prevalent type of neighboring map elements of the heterogenous map is of the same type as the map element which is derived from the more recently captured sensor data, the map element which is derived from the more recently captured sensor data. Else, if the prevalent type of neighboring map elements of the heterogenous map is of a different type than the map element which is derived from the more recently captured sensor data, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further comprise transforming the map element which is derived from the more recently captured sensor data into a transformed map element having the same type as the current sensor data, and selecting the transformed map element.
As yet a further example, the selecting 406 one of the map elements of the pair of equivalent map elements for updating the heterogenous map may further be based on relative computational performance of localization algorithms for different types of map elements. More specifically, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may comprise determining the map element of the pair of corresponding map elements which is derived from the more recently captured sensor data, based on comparing the respective capturing times. The selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further comprise selecting, if the computational performance of a localization algorithm for map elements of the same type as the map element which is derived from the more recently captured sensor data is superior to the computational performance of a localization algorithm for map elements of a different type than the map element which is derived from the more recently captured sensor data, the map element which is derived from the more recently captured sensor data. Else, if the computational performance of a localization algorithm for map elements of the same type as the map element which is derived from the more recently captured sensor data is not superior to the computational performance of a localization algorithm for map elements of a different type than the map element which is derived from the more recently captured sensor data, the selecting 406 one of the map elements of the pair of corresponding map elements for updating the heterogenous map may further comprise transforming the map element which is derived from the more recently captured sensor data into a transformed map element having the same type as the current sensor data, and selecting the transformed map element.
The method 400 may optionally further comprise estimating 402 future real-world locations of a sensor device operative to capture the current sensor data, and prioritizing performing one or more of: identifying pairs of corresponding map elements, determining an information difference metric, selecting one of the corresponding map elements, and updating the heterogenous map with the selected map element, for map elements representing the estimated future real-world locations of the sensor device operative to capture the current sensor data. The future real-world locations of a sensor device operative to capture the current sensor data are estimated based on one or more of: a current position of the sensor device, a current and/or future velocity of the sensor device, a planned or estimated route of the sensor device, geographical limitations in the movement of the sensor device, and a processing time which is estimated to lapse between receiving the request pertaining to localization and finalizing the updating of the heterogenous map.
If the received request pertains to localization using current sensor data which is image-based sensor data and structure-based sensor data, the method 400 may further comprise indicating 408, in response to the request, which type of current sensor data to use for localization.
The method 400 may optionally further comprise, if the selected map element has the same type as the current sensor data, performing localization 409 using the current sensor data. Else, if the selected map element has a different type than the current sensor data, the method may further comprise transforming the current sensor data into transformed sensor data having the same type as the selected map element, and performing localization 409 using the transformed sensor data.
It will be appreciated that the method 400 may comprise additional, alternative, or modified, steps in accordance with what is described throughout this disclosure. An embodiment of the method 400 may be implemented as a computer program comprising instructions which, when the computer program is executed by a computing device, such as the mapping device 140, cause the computing device to carry out the method 400. In particular, the method 400 may be implemented as the computer program 205 comprising instructions which, when executed by the one or more processor(s) 203 comprised in the mapping device 140, cause the mapping device 140 to carry out the method 400 and become operative in accordance with embodiments of the invention described herein.
The computer program 205 may be stored in a computer-readable data carrier, such as the memory 204. Alternatively, the computer program 205 may be carried by a data carrier signal, e.g., downloaded to the memory 204 via the network interface 206.
The person skilled in the art realizes that the invention by no means is limited to the embodiments described above. On the contrary, many modifications and variations are possible within the scope of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/069027 | 7/8/2021 | WO |