The system and method disclosed in this document relate to processing sensor data and, more particularly, to estimating semantic maps from 2D LiDAR scans.
Unless otherwise indicated herein, the materials described in this section are not admitted to be prior art by inclusion in this section.
Light detection and ranging (LiDAR) sensors are commonly used by robot vacuum cleaners to obtain LiDAR scans of an environment that is to be cleaned by the robot vacuum cleaner. However, these LiDAR scans tend to be noisy and incomplete. Accordingly, processes for interpreting these LiDAR scans tend to be highly prone to errors, which adversely affect the operations of the robot vacuum cleaner. Moreover, learning-based models for interpreting these LiDAR scans are challenging to develop due to the unavailability, and the expense of collecting, sufficiently large sets of training data having the detailed annotations that would enable a learning-based model to provide robust and useful interpretations of new LiDAR scans.
A method for training a model to estimate semantic labels for LiDAR scans is disclosed. The method comprises receiving, with a processor, a floorplan. The method further comprises generating, with the processor, a simulated LiDAR scan by converting the floorplan using a physics-based simulation model. The method further comprises annotating, with the processor, the simulated LiDAR scan with semantic labels. The method further comprises training, with the processor, the model using the simulated LiDAR scan.
A method for operating a device is disclosed. The method comprises capturing, with a LiDAR sensor of the device, a LiDAR scan of an environment. The method further comprises generating, with a processor of the device, semantic labels for the LiDAR scan using a trained model, the model having been trained in-part using simulated LiDAR scans. Generating the semantic labels comprises identifying portions of the LiDAR scan that correspond to a floor in the environment. Generating the semantic labels further comprises identifying portions of the LiDAR scan that correspond to a wall in the environment. The method further comprises operating at least one actuator of the device to perform a task depending on the semantic labels for the LiDAR scan.
The foregoing aspects and other features of the methods and systems are explained in the following description, taken in connection with the accompanying drawings.
For the purposes of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiments illustrated in the drawings and described in the following written specification. It is understood that no limitation to the scope of the disclosure is thereby intended. It is further understood that the present disclosure includes any alterations and modifications to the illustrated embodiments and includes further applications of the principles of the disclosure as would normally occur to one skilled in the art to which this disclosure pertains.
A system and method are disclosed herein for developing robust semantic mapping models for estimating semantic maps from LiDAR scans. In particular, the system and method enable the generation of realistic simulated LiDAR scans based on two-dimensional (2D) floorplans, for the purpose of providing a much larger set of training data that can be used to train robust semantic mapping models. These simulated LiDAR scans, as well as real LiDAR scans, are annotated using automated and manual processes with a rich set of semantic labels. Based on the annotated LiDAR scans, one or more semantic mapping models can be trained to estimate the semantic map for new LiDAR scans. The trained semantic mapping model(s) can be deployed in robot vacuum cleaners, as well as similar devices that must interpret LiDAR scans of an environment to perform a task.
This point cloud is interpreted to further indicate positions in the floorplan at which there was no reflective obstruction (shown in solid white) and positions in the floorplan that were not explored during the scanning process (shown with diagonal hatching). The positions in the floorplan interpreted to include no reflective obstruction are those positions through which the measurement light traveled to reach the detected obstructions (i.e., in between the detected obstructions and the LiDAR sensor trajectory). Conversely, positions in the floorplan interpreted to be unexplored are those positions through which the measurement light did not travel during the scanning process.
It should be appreciated that the point cloud of the LiDAR scan can indicate 2D positions or three-dimensional (3D) positions, depending on the scanning process utilized. In any case, in some embodiments, the point cloud is subsequently converted into a raster map after the scanning process, in particular a 2D raster map. The 2D raster map comprises a matrix or grid of pixels, in which each pixel indicates whether a corresponding 2D position in the floorplan (1) includes a physical obstruction, (2) does not include a physical obstruction, or (3) is unexplored. Accordingly, references to a LiDAR scan in the description should be understood to refer interchangeably to a point cloud or to an equivalent raster map. Likewise, references to points of a LiDAR scan and references to pixels of a LiDAR scan should be understood to be essentially interchangeable.
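By way of non-limiting illustration, the following Python sketch shows one possible way of converting a 2D point cloud into such a three-state raster map; the grid size, resolution, state encoding, and helper names are illustrative assumptions rather than a required implementation.

```python
# Illustrative sketch only: rasterize a 2D LiDAR point cloud into a grid whose
# cells are marked as unexplored, free (traversed by measurement light), or
# containing an obstruction. Bounds checks are omitted for brevity.
import numpy as np

UNEXPLORED, FREE, OBSTRUCTION = 0, 1, 2

def rasterize_scan(hits, sensor_poses, resolution=0.05, size=(400, 400)):
    """hits: iterable of (sensor_index, x, y) obstruction points in meters.
    sensor_poses: sequence of (x, y) sensor positions along the trajectory."""
    grid = np.full(size, UNEXPLORED, dtype=np.uint8)

    def to_cell(x, y):
        # Map metric coordinates to grid indices, with the origin at the grid center.
        return (int(round(x / resolution)) + size[0] // 2,
                int(round(y / resolution)) + size[1] // 2)

    for sensor_idx, hx, hy in hits:
        (r0, c0), (r1, c1) = to_cell(*sensor_poses[sensor_idx]), to_cell(hx, hy)
        # Every cell the beam traversed between the sensor and the hit is free space.
        n = max(abs(r1 - r0), abs(c1 - c0), 1)
        for t in np.linspace(0.0, 1.0, n, endpoint=False):
            r, c = int(round(r0 + t * (r1 - r0))), int(round(c0 + t * (c1 - c0)))
            if grid[r, c] == UNEXPLORED:
                grid[r, c] = FREE
        grid[r1, c1] = OBSTRUCTION  # the beam terminated on a detected obstruction
    return grid
```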
In the illustrated embodiment, the semantic labels include labels for each explored point and/or pixel of the semantic map 50 that distinguish between the floor in the floorplan that was scanned (shown in solid white), walls in the floorplan (shown in solid black), and other obstructions detected on the floor (e.g., clutter or furniture, shown with grid cross-hatching). In at least some embodiments, the semantic labels include labels that identify obstructions detected on the floor at a class-level (e.g., “sofa,” “table,” “TV stand,” and “bed”), as well as at an instance-level (e.g., “Sofa 1,” “Sofa 2,” “Table 1,” and “Table 2”).
In some embodiments, the semantic labels include labels for each point and/or pixel of the semantic map 50 that segment the floorplan into different rooms. In the illustration of
Finally, in at least some embodiments, the semantic labels include labels that identify points and/or pixels of the semantic map 50 that correspond to measurement errors caused by one of (i) glass and (ii) mirrors (shown with horizontal hatching). Particularly, it will be appreciated that materials such as glass and mirrors generally do not reflect light diffusely and instead reflect light in a specular manner. Accordingly, little to none of the measurement light emitted from the LiDAR sensor may be reflected directly back to the LiDAR sensor. As a result, the 2D LiDAR scan 10 may include erroneous points and/or pixels indicating an obstruction where there was no obstruction, as well as erroneous points and/or pixels indicating the lack of an obstruction where an obstruction was in fact present. The 2D semantic map 50 advantageously includes semantic labels identifying the points and/or pixels at which these measurement errors may exist. Additionally, in some embodiments, the semantic labels further include labels that identify points and/or pixels of the semantic map 50 predicted to include the glass or mirror itself that caused these errors (not shown).
The systems and methods described herein are advantageous improvements to conventional techniques for several reasons. Firstly, semantic mapping models trained according to the methods described herein take incomplete and noisy LiDAR scans as inputs, rather than requiring complete and clean floorplan drawings. Secondly, semantic mapping models trained according to the methods described herein can detect mismeasurements caused by mirrors and glass and can further localize the mirrors and glass. Thirdly, semantic mapping models trained according to the methods described herein can provide both fine-grained instance-level and class-level segmentation of the input LiDAR scans. Fourthly, the methods described herein provide a simulation pipeline that can be used to generate large-scale realistic training data, which greatly improves the performance of the trained semantic mapping models.
The processor 110 is configured to execute instructions to operate the computing device 100 to enable the features, functionality, characteristics and/or the like as described herein. To this end, the processor 110 is operably connected to the memory 120, the display screen 130, and the network communications module 150. The processor 110 generally comprises one or more processors which may operate in parallel or otherwise in concert with one another. It will be recognized by those of ordinary skill in the art that a “processor” includes any hardware system, hardware mechanism or hardware component that processes data, signals or other information. Accordingly, the processor 110 may include a system with a central processing unit, graphics processing units, multiple processing units, dedicated circuitry for achieving functionality, programmable logic, or other processing systems.
The memory 120 is configured to store data and program instructions that, when executed by the processor 110, enable the computing device 100 to perform various operations described herein. The memory 120 may be of any type of device capable of storing information accessible by the processor 110, such as a memory card, ROM, RAM, hard drives, discs, flash memory, or any of various other computer-readable media serving as data storage devices, as will be recognized by those of ordinary skill in the art.
The display screen 130 may comprise any of various known types of displays, such as LCD or OLED screens. The user interface 140 may include a variety of interfaces for operating the computing device 100, such as buttons, switches, a keyboard or other keypad, speakers, and a microphone. Alternatively, or in addition, the display screen 130 may comprise a touch screen configured to receive touch inputs from a user.
The network communications module 150 may comprise one or more transceivers, modems, processors, memories, oscillators, antennas, or other hardware conventionally included in a communications module to enable communications with various other devices. Particularly, the network communications module 150 generally includes a Wi-Fi module configured to enable communication with a Wi-Fi network and/or Wi-Fi router (not shown) configured to enable communication with various other devices. Additionally, the network communications module 150 may include a Bluetooth® module (not shown), as well as one or more cellular modems configured to communicate with wireless telephony networks.
The computing device 100 may also include a respective battery or other power source (not shown) configured to power the various components within the computing device 100. In one embodiment, the battery of the computing device 100 is a rechargeable battery configured to be charged when the computing device 100 is connected to a battery charger configured for use with the computing device 100.
In at least one embodiment, the memory 120 stores program instructions of a training data generation tool 160 configured to enable a user to generate richly annotated training data 170 for the purpose of training a semantic mapping model 180. As discussed in further detail below, the processor 110 is configured to execute program instructions of the training data generation tool 160 to enable the user to generate annotated training data, which generally takes the form of training data pairs consisting of input LiDAR scan data and corresponding output semantic maps, which are essentially similar to the exemplary LiDAR scan 10 of
A variety of methods and processes are described below for operating the computing device 100 to develop and train a semantic mapping model 180. In these descriptions, statements that a method, processor, and/or system is performing some task or function refers to a controller or processor (e.g., the processor 110 of the computing device 100) executing programmed instructions stored in non-transitory computer readable storage media (e.g., the memory 120 of the computing device 100) operatively connected to the controller or processor to manipulate data or to operate one or more components in the computing device 100 to perform the task or function. Additionally, the steps of the methods may be performed in any feasible chronological order, regardless of the order shown in the figures or the order in which the steps are described.
The method 200 begins with receiving a plurality of floorplans (block 210). Particularly, the processor 110 receives, and stores in the memory 120, a plurality of floorplans. In at least one embodiment, these floorplans are 2D floorplans. However, in other embodiments, 3D floorplans can be utilized. In some embodiments, these floorplans can be obtained from a public dataset or can be generated, such as by a generative adversarial network (GAN)-based method. The plurality of floorplans can be received or generated in a variety of formats, such as a raster image or vector image, or any of a variety of 3D model or 2D model file formats.
In some embodiments, the processor 110 receives user inputs from a user via the user interface 140 for the purpose of manually pre-processing or labeling the floorplans. In one embodiment, the processor 110 receives user inputs from a user via the user interface 140 defining a scaling factor for each respective floorplan. In one embodiment, the processor 110 receives user inputs from a user via the user interface 140 defining polygon-level annotations, such as polygons defining room boundaries or fixture boundaries (e.g., cabinetry, sinks, bath tubs, or the like). In some embodiments, the processor 110 performs certain automated pre-processing of the plurality of floorplans, such as converting the format of the floorplans into a standard format (e.g., converting raster images into vector images), using suitable processes.
The method 200 continues with generating a plurality of simulated LiDAR scans by converting the plurality of floorplans using a physics-based simulation model (block 220). Particularly, the processor 110 generates, and stores in the memory 120, a plurality of simulated LiDAR scans based on the plurality of floorplans. In each case, the processor 110 generates the simulated LiDAR scan based on a respective floorplan using a physics-based simulation model.
The method 500 continues with resizing and/or rotating the virtual object (block 520). Particularly, the processor 110 resizes and/or rotates the virtual object that is defined by the selected template. More particularly, in some embodiments, the processor 110 resizes the virtual object to match a scale of the selected floorplan or, optionally, applies a random scaling within a reasonable predefined range relative to the scale of the selected floorplan. Additionally, the processor 110 rotates the orientation of the virtual object randomly or according to predetermined rules for the virtual object (e.g., certain virtual objects might only be oriented in certain ways).
The method 500 continues with selecting a position for the virtual object within the virtual environment with reference to at least one placement rule (block 530). Particularly, the processor 110 selects a position for the virtual object within the virtual environment. The position for the virtual object within the virtual environment is selected depending on or constrained by one or more placement rules, which may depend on a type of object defined by the template. For example, certain virtual object types might only be placed in particular room types (e.g., a bed can only be placed in a bedroom or a bathtub can only be placed in a bathroom). As another example, certain virtual objects might only be placed a certain distance from a wall (e.g., a table), while certain other virtual objects might only be placed on or directly touching a wall (e.g., a wall mirror).
The method 500 continues with checking for a collision of the virtual object with another virtual structure within the virtual environment (block 540). Particularly, the processor 110 checks for a collision (i.e., an intersection) of the virtual object with another virtual structure (i.e., walls, fixtures, furniture, other objects, etc.) when placed at the selected position. In response to detecting a collision of the virtual object, the processor 110 selects another position within the virtual environment to place the virtual object (i.e., the method 500 returns to block 530). Otherwise, the processor 110 moves on to placing the next virtual object or finishing with the placing of virtual objects.
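By way of non-limiting illustration, one possible implementation of blocks 520-540 is sketched below in Python using the shapely geometry library for the intersection test; the placement rules, scaling range, and object/room representations are simplified assumptions, and at least one allowed room is assumed to exist.

```python
# Illustrative sketch of blocks 520-540: resize/rotate a virtual object, choose a
# rule-constrained position, and retry on collision with existing structures.
import random
from shapely.geometry import Polygon
from shapely import affinity

def place_object(template_poly, room_polys, occupied, allowed_rooms,
                 scale_range=(0.9, 1.1), max_tries=100):
    """template_poly: shapely Polygon of the virtual object footprint.
    room_polys: dict mapping room type -> list of room Polygons.
    occupied: list of Polygons already placed (walls, fixtures, furniture).
    allowed_rooms: room types this object type may be placed in (placement rule)."""
    # Block 520: random resize within a reasonable range and random rotation.
    factor = random.uniform(*scale_range)
    obj = affinity.scale(template_poly, factor, factor)
    obj = affinity.rotate(obj, random.uniform(0.0, 360.0))

    for _ in range(max_tries):
        # Block 530: choose a candidate position constrained by the placement rule.
        room = random.choice([r for t in allowed_rooms for r in room_polys.get(t, [])])
        minx, miny, maxx, maxy = room.bounds
        cand = affinity.translate(obj,
                                  xoff=random.uniform(minx, maxx) - obj.centroid.x,
                                  yoff=random.uniform(miny, maxy) - obj.centroid.y)
        # Block 540: reject the position if it collides with any existing structure
        # or leaves the selected room, and select another position.
        if room.contains(cand) and not any(cand.intersects(o) for o in occupied):
            occupied.append(cand)
            return cand
    return None  # the object could not be placed without a collision
```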
Returning to
The method 400 continues with simulating a scanning of the virtual environment by the virtual LiDAR sensor moved along the simulated moving trajectory (block 430). Particularly, the processor 110 simulates a scanning of the virtual environment by moving the virtual LiDAR sensor along the simulated moving trajectory. At each respective position of a plurality of positions along the simulated moving trajectory, the processor 110 simulates the emission of measurement light from the virtual LiDAR sensor at the respective position, the reflection of the measurement light through the virtual environment, and the reception of the measurement light at the virtual LiDAR sensor at the respective position, using a physics-based model. In simulating the emission, reflection, and reception of the measurement light, the processor 110 takes into consideration the virtual structures in the virtual environment (i.e., walls, fixtures, furniture, etc.) and their material properties, in particular their reflective characteristics. In this way, virtual glass or mirrors having specular reflective characteristics will give rise to realistic measurement errors.
In at least one embodiment, the processor 110 simulates the emission, reflection, and reception of the measurement light using a raytracing-based simulation model. The material properties of the virtual structures are modeled by laser/light intensity response curves (i.e., intensity vs. incident angle). Additionally, the processor 110 utilizes a LiDAR sensor model which models range, accuracy, and precision (e.g., which can be modeled by step functions or splines), as well as angular resolution, Lambertian reflectivity, detection probability, and beam divergence. In one embodiment, a signal attenuation in the raytracing-based simulation model is adjustable and may, for example, be set to
In one embodiment, a maximum recursion depth for the raytracing-based simulation model can be adjusted.
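By way of non-limiting illustration, the following Python sketch shows a simplified 2D raytracing loop in which specular surfaces (e.g., glass or mirrors) forward the measurement beam up to a maximum recursion depth, so that the reported range corresponds to the folded path length; the geometry helpers and the material model are illustrative assumptions and omit the intensity response curves and other sensor parameters discussed above.

```python
# Illustrative sketch of a raytracing-based scan simulation: walls are 2D segments
# with a specular flag; specular hits reflect the beam (up to max_depth), which is
# what produces realistic "phantom" returns behind mirrors and glass.
import math

def ray_segment_hit(o, d, a, b):
    """Intersect a ray (origin o, unit direction d) with segment a-b.
    Returns (distance t, unit normal n) or None if there is no hit."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = d[0] * ey - d[1] * ex
    if abs(denom) < 1e-12:                        # ray parallel to the segment
        return None
    t = ((a[0] - o[0]) * ey - (a[1] - o[1]) * ex) / denom
    u = ((a[0] - o[0]) * d[1] - (a[1] - o[1]) * d[0]) / denom
    if t <= 1e-9 or not 0.0 <= u <= 1.0:
        return None
    length = math.hypot(ex, ey)
    return t, (-ey / length, ex / length)

def trace(o, d, segments, max_depth=3, max_range=12.0, travelled=0.0):
    """segments: list of (a, b, specular) tuples. Returns a simulated range or None."""
    best = None
    for a, b, specular in segments:
        hit = ray_segment_hit(o, d, a, b)
        if hit is not None and (best is None or hit[0] < best[0]):
            best = (hit[0], hit[1], specular)
    if best is None or travelled + best[0] > max_range:
        return None                               # no return is detected for this beam
    t, n, specular = best
    if specular and max_depth > 0:
        # Glass/mirror: the beam is reflected specularly instead of returning to the
        # sensor, so the sensor later reports a range along the folded path.
        point = (o[0] + t * d[0], o[1] + t * d[1])
        dot = d[0] * n[0] + d[1] * n[1]
        r = (d[0] - 2 * dot * n[0], d[1] - 2 * dot * n[1])
        return trace(point, r, segments, max_depth - 1, max_range, travelled + t)
    return travelled + t                          # diffuse hit: total path length

def simulate_pose(pose, segments, n_beams=360):
    """Simulate one 360-degree sweep of the virtual LiDAR sensor at position (x, y)."""
    x, y = pose
    return [trace((x, y), (math.cos(2 * math.pi * k / n_beams),
                           math.sin(2 * math.pi * k / n_beams)), segments)
            for k in range(n_beams)]
```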
The method 400 continues with generating the simulated LiDAR scan based on the simulated scanning of the virtual environment (block 440). Particularly, based on the simulated scanning of the virtual environment, the processor 110 generates a simulated LiDAR scan of the respective floorplan. In particular, the processor 110 calculates simulated times of flight and/or simulated return times for the measurement light (i.e., a time between emission and reception of the measurement light). Based on the simulated times of flight and/or simulated return times, the processor 110 generates the simulated LiDAR scan, for example in the form of a point cloud or raster map, as discussed above with respect to the exemplary LiDAR scan 10. In at least one embodiment, the processor 110 applies sensor noise to the simulated LiDAR scan, or more particularly, to the simulated times of flight and/or simulated return times. In this way, the processor 110 generates a more realistic simulated LiDAR scan.
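By way of non-limiting illustration, the following Python sketch converts simulated round-trip return times into noisy range measurements and scan points (range = c·t/2); the noise parameters and detection probability are example values only.

```python
# Illustrative sketch of block 440: time of flight -> range, with additive Gaussian
# range noise and a detection-probability dropout as a simple sensor noise model.
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def scan_points_from_return_times(pose, beam_angles, return_times,
                                  range_sigma=0.02, p_detect=0.98):
    """pose: (x, y, heading); return_times[i] is the simulated round-trip time in
    seconds for beam_angles[i], or None if no return was received."""
    x, y, heading = pose
    points = []
    for angle, t in zip(beam_angles, return_times):
        if t is None or random.random() > p_detect:
            continue                               # missed detection
        rng = SPEED_OF_LIGHT * t / 2.0             # time of flight -> distance
        rng += random.gauss(0.0, range_sigma)      # additive range noise
        a = heading + angle
        points.append((x + rng * math.cos(a), y + rng * math.sin(a)))
    return points
```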
Returning to
The method 200 continues with annotating the plurality of simulated LiDAR scans and the plurality of real LiDAR scans with semantic labels (block 240). Particularly, the processor 110 annotates the plurality of simulated LiDAR scans and the plurality of real LiDAR scans by generating semantic labels for each respective LiDAR scan and compiling them into a respective semantic map for each respective LiDAR scan. In some embodiments, the processor 110 automatically generates at least some of the semantic labels for each simulated LiDAR scan based, in part, on the virtual environment that was defined based on the respective floorplan. In some embodiments, the processor 110 automatically generates at least some of the semantic labels for each real LiDAR scan based on the measurements of the respective real LiDAR scan. In some embodiments, the processor 110 generates at least some of the semantic labels for each real LiDAR scan based on manual user inputs received via a user interface.
In some embodiments, the processor 110 generates semantic labels that distinguish between unexplored regions, walls, floors, and other obstructions detected on the floors (e.g., furniture or clutter). Particularly, if a pixel that is unexplored coincides with a position that is outside of the virtual environment or within a virtual structure of the virtual environment, then the processor 110 labels the respective pixel with an “unexplored” semantic label. Additionally, if a pixel at which no obstruction was detected (i.e., white pixels) coincides with the floor of the virtual environment, then the processor 110 labels the respective pixel with a “floor” semantic label. Conversely, if a pixel at which an obstruction was detected (i.e., black pixels) coincides with a virtual wall of the virtual environment, then the processor 110 labels the respective pixel with a “wall” semantic label. Similarly, if a pixel at which an obstruction was detected (i.e., black pixels) coincides with a virtual furniture object or virtual clutter object in the virtual environment, then the processor 110 labels the respective pixel with a “furniture/clutter” semantic label. In at least some embodiments, the semantic labels include labels that identify virtual furniture objects or virtual clutter objects detected on the floor at a class-level (e.g., “sofa,” “table,” “TV stand,” and “bed”), as well as at an instance-level (e.g., “Sofa 1,” “Sofa 2,” “Table 1,” and “Table 2”).
Additionally, in some embodiments, the processor 110 generates semantic labels that identify a room type and room instance. Particularly, the processor 110 generates room labels for each pixel based on the room of the virtual environment corresponding to the position of the respective pixel. In at least some embodiments, the room segmentation labels identify the rooms at a class-level (i.e., room type) and at an instance-level (e.g., “Bedroom,” “Bathroom,” “Laundry Room,” “Hallway,” “Kitchen,” “Living Room 1,” “Living Room 2,” “Dining Room 1,” and “Dining Room 2”). However, in some embodiments, room labels may only identify the rooms at an instance-level (e.g., “room 1,” “room 2,” “room 3,” etc.).
In some embodiments, the processor 110 further generates semantic labels that identify measurement errors, such as those errors typically caused by glass or mirrors. Particularly, if a pixel at which no obstruction was detected (i.e., white pixels) coincides with a position that is outside of the virtual environment or within a virtual structure of the virtual environment, then the processor 110 labels the respective pixel with a “mirror/glass error” semantic label. Likewise, if a pixel at which an obstruction was detected (i.e., black pixels) coincides with a position that is outside of the virtual environment or within a virtual structure of the virtual environment, then the processor 110 labels the respective pixel with a “mirror/glass error” semantic label.
In some embodiments, the processor 110 further generates semantic labels that identify virtual structures containing mirrors or glass. Particularly, if a pixel coincides with a virtual furniture object or virtual clutter object in the virtual environment having material properties of glass or a mirror, then the processor 110 labels the respective pixel with a “mirror/glass” semantic label identifying that mirror or glass is located at that pixel.
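By way of non-limiting illustration, the labeling rules described above may be expressed as in the following Python sketch; the ground-truth query helpers on the virtual environment (e.g., env.is_wall, env.object_at, env.room_at) are assumed interfaces standing in for whatever scene representation is actually used.

```python
# Illustrative sketch of the automated per-pixel labeling rules for simulated scans.
UNEXPLORED, FREE, OBSTRUCTION = 0, 1, 2   # raster states of the simulated scan

def label_simulated_pixel(scan_value, env, r, c):
    """env is an assumed ground-truth interface to the virtual environment:
    env.outside_or_in_structure(r, c), env.is_floor(r, c), env.is_wall(r, c),
    env.object_at(r, c) -> object (with .class_name, .instance_id,
    .is_glass_or_mirror) or None, and env.room_at(r, c) -> room or None."""
    labels = []
    obj = env.object_at(r, c)
    if obj is not None and obj.is_glass_or_mirror:
        labels.append("mirror/glass")             # the reflective structure itself
    if scan_value == UNEXPLORED:
        if env.outside_or_in_structure(r, c):
            labels.append("unexplored")
    elif scan_value == FREE:
        if env.is_floor(r, c):
            labels.append("floor")
        elif env.outside_or_in_structure(r, c):
            labels.append("mirror/glass error")   # free space reported where none exists
    else:                                         # OBSTRUCTION
        if env.is_wall(r, c):
            labels.append("wall")
        elif obj is not None:
            # class-level and instance-level labels for obstructions on the floor
            labels += ["furniture/clutter", obj.class_name, obj.instance_id]
        elif env.outside_or_in_structure(r, c):
            labels.append("mirror/glass error")   # obstruction reported where none exists
    room = env.room_at(r, c)
    if room is not None:
        labels += [room.type_name, room.instance_id]   # room type and room instance
    return labels
```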
With continued reference to
The method 600 continues with automatically generating pixel-level semantic labels for each real LiDAR scan based on the polygon-level semantic labels and based on the measurements of the real LiDAR scan (block 630). Particularly, for each respective pixel in each respective real LiDAR scan, the processor 110 determines one or more semantic labels. The processor 110 determines pixel-level semantic labels based on the measurements from the real LiDAR scans and based on the polygon-level semantic labels.
Particularly, the processor 110 labels each pixel that is unexplored with the “unexplored” semantic label. Additionally, the processor 110 labels each pixel at which no obstruction was detected (i.e., white pixels) with the “floor” semantic label if the pixel is not bounded by a polygon identifying measurement errors or identifying furniture or other objects having mirrors or glass that caused measurement errors. Furthermore, the processor 110 labels each pixel at which an obstruction was detected (i.e., black pixels) with the “wall” semantic label if the pixel is not bounded by a polygon identifying furniture or other clutter. Conversely, the processor 110 labels each pixel at which an obstruction was detected (i.e., black pixels) with the “furniture/clutter” semantic label if the pixel is also bounded by a polygon indicating furniture or other clutter. In at least some embodiments, the semantic labels associated with the polygons identify virtual furniture objects or virtual clutter objects detected on the floor at a class-level (e.g., “sofa,” “table,” “TV stand,” and “bed”), as well as at an instance-level (e.g., “Sofa 1,” “Sofa 2,” “Table 1,” and “Table 2”).
Additionally, in some embodiments, the processor 110 labels each pixel with semantic labels that identify a room type and/or room instance if they are bounded by a polygon that specifies a room type and/or room instance. In at least some embodiments, the semantic labels associated with the polygons identify the rooms at a class-level (i.e., room type) and at an instance-level (e.g., “Bedroom,” “Bathroom,” “Laundry Room,” “Hallway,” “Kitchen,” “Living Room 1,” “Living Room 2,” “Dining Room 1,” and “Dining Room 2”). However, in some embodiments, the semantic labels associated with the polygons may only identify the rooms at an instance-level (e.g., “room 1,” “room 2,” “room 3,” etc.).
Moreover, in some embodiments, the processor 110 labels pixels with the “mirror/glass error” semantic label if they are bounded by a polygon that identifies those measurement errors in the real LiDAR scan. Likewise, the processor 110 labels each pixel with the “mirror/glass” semantic label if they are bounded by a polygon that identifies furniture or other objects having mirrors or glass.
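By way of non-limiting illustration, the following Python sketch transfers polygon-level annotations to pixel-level labels of a real LiDAR scan using point-in-polygon tests (here via the shapely library); the annotation tags and data layout are assumed representations.

```python
# Illustrative sketch of block 630: assign pixel-level labels to a real scan from
# manual polygon-level annotations, following the rules described above.
from shapely.geometry import Point

UNEXPLORED, FREE, OBSTRUCTION = 0, 1, 2

def label_real_pixel(scan_value, xy, annotations):
    """annotations: list of (shapely Polygon, tag, labels) where tag is one of
    'furniture/clutter', 'mirror/glass', 'mirror/glass error', or 'room', and
    labels holds the class/instance/room labels to transfer to covered pixels."""
    covering = [(tag, labels) for poly, tag, labels in annotations
                if poly.contains(Point(xy))]
    tags = {tag for tag, _ in covering}
    out = [lbl for _, labels in covering for lbl in labels]   # transferred labels

    if scan_value == UNEXPLORED:
        out.append("unexplored")
    elif scan_value == FREE:
        # Floor only if no polygon flags measurement errors or mirror/glass objects.
        if "mirror/glass error" not in tags and "mirror/glass" not in tags:
            out.append("floor")
    else:                                                     # OBSTRUCTION
        out.append("furniture/clutter" if "furniture/clutter" in tags else "wall")
    return out
```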
With continued reference to
Returning to
The semantic mapping model 180 may comprise any type or combination of traditional or deep-learning based models configured to perform semantic segmentation, panoptic segmentation, floorplan recognition, etc. The semantic mapping model 180 may, for example, comprise one or more machine learning models such as convolutional neural networks, or the like. As used herein, the term “machine learning model” refers to a system or set of program instructions and/or data configured to implement an algorithm, process, or mathematical model (e.g., a neural network) that predicts or otherwise provides a desired output based on a given input. It will be appreciated that, in general, many or most parameters of a machine learning model are not explicitly programmed and the machine learning model is not, in the traditional sense, explicitly designed to follow particular rules in order to provide the desired output for a given input. Instead, a machine learning model is provided with a corpus of training data from which it identifies or “learns” implicit patterns and statistical relationships in the data, which are generalized to make predictions or otherwise provide outputs with respect to new data inputs. The result of the training process is embodied in a plurality of learned parameters, kernel weights, and/or filter values that are used in the various components of the machine learning model to perform various operations or functions.
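By way of non-limiting illustration, one possible (and deliberately simplified) semantic mapping model is sketched below as a small fully convolutional encoder-decoder in PyTorch, trained with a per-pixel cross-entropy classification loss; the architecture, channel counts, and number of classes are illustrative assumptions rather than the semantic mapping model 180 itself.

```python
# Illustrative sketch of a per-pixel semantic segmentation model for raster scans.
import torch
import torch.nn as nn

class SimpleSemanticMapper(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):                     # x: (N, 1, H, W) raster scans
        return self.decoder(self.encoder(x))  # (N, num_classes, H, W) logits


# Example training step with placeholder data and a per-pixel classification loss.
model = SimpleSemanticMapper()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

scans = torch.randint(0, 3, (4, 1, 128, 128)).float()   # placeholder raster scans
labels = torch.randint(0, 6, (4, 128, 128))             # placeholder semantic maps
loss = criterion(model(scans), labels)
loss.backward()
optimizer.step()
```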
It should, at this point, be appreciated that the annotated training data 170 comprises two categories of training data pairs: (1) simulated training data pairs consisting of a respective simulated LiDAR scan and a corresponding semantic map, and (2) real training data pairs consisting of a respective real LiDAR scan and a corresponding semantic map.
Generally, the semantic mapping model 180 will achieve the best performance when trained with a large number of real training data pairs. Accordingly, if the number of available real training data pairs reaches a sufficient threshold amount of training data, then the semantic mapping model 180 could be trained only using the real training data pairs. However, in practice, it is very time-consuming to collect and annotate a sufficiently large number of real LiDAR scans so as to cover all the possible corner cases. Therefore, in most embodiments, the semantic mapping model 180 must be trained using a combination of the simulated training data pairs and the real training data pairs.
In at least some embodiments, a domain adaptation approach is utilized during the training process to bridge the gap between the simulated training data and the real training data.
In some embodiments, during training, learned parameters and/or weights of the feature extractors 720A and 720B are fine-tuned based both on (i) a classification loss depending on semantic labeling errors in the estimated semantic maps 730A, 730B and (ii) a discrimination loss depending on domain discrimination errors by the discriminator 710. In particular, the feature extractors 720A and 720B are fine-tuned to minimize classification errors in the estimated semantic maps and to minimize the ability of the discriminator 710 to discriminate between real and simulated LiDAR scans. Similarly, during training, the learned parameters and/or weights of the discriminator 710 are fine-tuned based on the discrimination loss depending on domain discrimination errors by the discriminator 710.
In some embodiments, both of the feature extractors 720A and 720B are trained simultaneously. Moreover, in some embodiments, the feature extractors 720A and 720B are the same neural network and share the same learned parameters and/or weights. Alternatively, in some embodiments, the feature extractor 720A (i.e., the source feature extractor) is pre-trained, prior to domain adaptation, using conventional methods depending only on a classification loss using the generally larger collection of simulated training data. Next, the feature extractor 720B (i.e., the target feature extractor) is trained depending on the discrimination loss and the classification loss, using the generally smaller collection of real training data. In any case, it should be appreciated that the semantic mapping model 180 that is deployed on end-user devices incorporates the feature extractor 720B (i.e., the target feature extractor) for feature extraction.
In some embodiments, this domain adaptation and/or training is an iterative process. For example, during some phases of the training, the parameters and/or weights of the feature extractors 720A and 720B are fine-tuned, while the parameters and/or weights of the discriminator 710 are fixed. Conversely, during other phases of the training, the parameters and/or weights of the discriminator 710 are fine-tuned, while the feature extractors 720A and 720B are fixed. Popular implementations of this iterative process could include GAN-based deep neural networks and expectation maximization (EM)-like methods.
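By way of non-limiting illustration, the following PyTorch sketch shows one alternating variant of the adversarial domain adaptation described above, using a single shared feature extractor (as in the embodiment in which the feature extractors 720A and 720B share the same weights); the module definitions, loss weighting, and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch of alternating adversarial domain adaptation: the extractor
# and classifier are updated against the classification loss plus a loss that fools
# the domain discriminator; the discriminator is then updated with features detached.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
classifier = nn.Conv2d(64, 6, 1)                       # per-pixel class logits
discriminator = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                              nn.Linear(64, 1))        # real-vs-simulated logit

cls_loss = nn.CrossEntropyLoss()
dom_loss = nn.BCEWithLogitsLoss()
opt_f = torch.optim.Adam(list(feature_extractor.parameters()) +
                         list(classifier.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def adaptation_step(sim_scans, sim_labels, real_scans, real_labels):
    # Phase 1: fine-tune extractor/classifier while the discriminator is not stepped.
    f_sim, f_real = feature_extractor(sim_scans), feature_extractor(real_scans)
    classification = (cls_loss(classifier(f_sim), sim_labels) +
                      cls_loss(classifier(f_real), real_labels))
    # "Fool" the discriminator: push real features toward the simulated-domain label (0).
    confusion = dom_loss(discriminator(f_real), torch.zeros(real_scans.size(0), 1))
    opt_f.zero_grad()
    (classification + 0.1 * confusion).backward()
    opt_f.step()

    # Phase 2: fine-tune the discriminator; extractor features are detached (fixed).
    d_sim = discriminator(feature_extractor(sim_scans).detach())
    d_real = discriminator(feature_extractor(real_scans).detach())
    discrimination = (dom_loss(d_sim, torch.zeros_like(d_sim)) +
                      dom_loss(d_real, torch.ones_like(d_real)))
    opt_d.zero_grad()
    discrimination.backward()
    opt_d.step()
```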
The processor 810 is configured to execute instructions to operate the robot vacuum cleaner 800 to enable the features, functionality, characteristics and/or the like as described herein. To this end, the processor 810 is operably connected to the memory 820, the LiDAR sensor 830, and the one or more actuators 840. The processor 810 generally comprises one or more processors which may operate in parallel or otherwise in concert with one another. It will be recognized by those of ordinary skill in the art that a “processor” includes any hardware system, hardware mechanism or hardware component that processes data, signals or other information. Accordingly, the processor 810 may include a system with a central processing unit, graphics processing units, multiple processing units, dedicated circuitry for achieving functionality, programmable logic, or other processing systems.
The memory 820 is configured to store data and program instructions that, when executed by the processor 810, enable the robot vacuum cleaner 800 to perform various operations described herein. The memory 820 may be of any type of device capable of storing information accessible by the processor 810, such as a memory card, ROM, RAM, hard drives, discs, flash memory, or any of various other computer-readable medium serving as data storage devices, as will be recognized by those of ordinary skill in the art. In at least one embodiment, the memory 820 stores the trained semantic mapping model 180. As discussed in further detail below, the processor 810 is configured to execute program instructions of the trained semantic mapping model 180 to estimate a semantic map based on a real LiDAR scan captured using the LiDAR sensor 830.
The LiDAR sensor 830 is configured to emit measurement light (e.g., lasers) and receive the measurement light after it has reflected throughout the environment. The processor 810 is configured to calculate times of flight and/or return times for the measurement light. Based on the calculated times of flight and/or return times, the processor 810 generates a real LiDAR scan, for example in the form of a point cloud or raster map.
The one or more actuators 840 at least include motors of a locomotion system that, for example, drive a set of wheels to cause the robot vacuum cleaner 800 to move throughout the environment during the LiDAR scanning process, as well as during a vacuuming operation. Additionally, the one or more actuators 840 at least include a vacuum suction system configured to vacuum the environment as the robot vacuum cleaner 800 is moved throughout the environment.
The robot vacuum cleaner 800 may also include a respective battery or other power source (not shown) configured to power the various components within the robot vacuum cleaner 800. In one embodiment, the battery of the robot vacuum cleaner 800 is a rechargeable battery configured to be charged when the robot vacuum cleaner 800 is connected to a battery charger configured for use with the robot vacuum cleaner 800.
The method 900 continues with determining semantic labels for the LiDAR scan using a trained model, the model having been trained in-part using training data including simulated LiDAR scans (block 920). Particularly, the processor 810 executes program instructions of the trained semantic mapping model 180 to estimate a semantic map for the environment based on the real LiDAR scan of the environment generated using the LiDAR sensor 830. This process may, for example, occur after or at the end of the learning phase that was initiated by the end-user.
In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to a floor in the environment. In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to a wall in the environment. In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to an obstruction detected on the floor in the environment. In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to unexplored regions of the environment.
In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to particular room types in the environment. In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to particular room instances in the environment.
In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the real LiDAR scan that correspond to measurement errors caused by glass and/or mirrors. In at least one embodiment, the processor 810 uses the trained semantic mapping model 180 to identify portions of the LiDAR scan that correspond to the glass and/or mirrors that caused the aforementioned measurement errors. In one embodiment, the processor 810 modifies the real LiDAR scan to correct the measurement errors caused by glass and/or mirrors.
The method 900 continues with operating one or more actuators of the robot vacuum cleaner depending on the semantic labels (block 930). Particularly, the processor 810 operates one or more of the actuators 840, such as motors of the locomotion system and/or the vacuum suction system, depending on the semantic labels of the generated semantic map for the environment. In one example, the processor 810 operates the motors of the locomotion system to efficiently navigate the environment depending on the semantic labels indicating the locations of walls, floors, furniture, mirrors, glass, clutter, and/or measurement errors. In another example, the processor 810 operates one or more of the actuators 840 to vacuum clean a particular room instance or room type depending on the semantic labels identifying the particular room instances or room types. These processes may, for example, occur during an operating phase that occurs after the learning phase and which was initiated by the end-user or automatically initiated based on a user-defined schedule.
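By way of non-limiting illustration, the following Python sketch strings blocks 910-930 together on the device; the sensor, actuator, and room-instance interfaces, as well as the class indices, are hypothetical placeholders for the robot vacuum cleaner's actual software stack rather than a required implementation.

```python
# Illustrative end-to-end sketch of method 900 on the robot vacuum cleaner 800.
import numpy as np
import torch

# Assumed class indices produced by the trained semantic mapping model.
FLOOR, WALL, FURNITURE, UNEXPLORED, GLASS_ERROR, GLASS = range(6)

def clean_target_room(lidar_sensor, model, actuators, room_map, target_room_id):
    # Block 910: capture a real LiDAR scan and rasterize it (see the earlier sketch).
    raster = lidar_sensor.capture_raster()                    # hypothetical device call
    x = torch.from_numpy(raster[None, None].astype(np.float32))

    # Block 920: estimate per-pixel semantic labels with the trained model.
    with torch.no_grad():
        semantic_map = model(x).argmax(dim=1)[0].numpy()      # (H, W) class ids

    # Block 930: operate the actuators depending on the semantic labels, e.g. vacuum
    # only the floor pixels of the requested room instance (room_map is an assumed
    # per-pixel room-instance map) while avoiding walls and pixels flagged as
    # glass/mirror measurement errors when planning the path.
    target_cells = np.argwhere((room_map == target_room_id) &
                               (semantic_map == FLOOR))
    blocked = (semantic_map == WALL) | (semantic_map == GLASS_ERROR)
    actuators.vacuum_on()                                     # hypothetical device call
    for cell in actuators.plan_coverage_path(target_cells, blocked):
        actuators.drive_to(cell)                              # hypothetical device call
```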
Embodiments within the scope of the disclosure may also include non-transitory computer-readable storage media or machine-readable medium for carrying or having computer-executable instructions (also referred to as program instructions) or data structures stored thereon. Such non-transitory computer-readable storage media or machine-readable medium may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such non-transitory computer-readable storage media or machine-readable medium can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. Combinations of the above should also be included within the scope of the non-transitory computer-readable storage media or machine-readable medium.
Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
While the disclosure has been illustrated and described in detail in the drawings and foregoing description, the same should be considered as illustrative and not restrictive in character. It is understood that only the preferred embodiments have been presented and that all changes, modifications and further applications that come within the spirit of the disclosure are desired to be protected.