This disclosure relates generally to computer vision, and more particularly to generating and using a three-dimensional layout model of an environment.
The construction of three-dimensional (3D) room layouts is useful in many applications. For example, 3D room layouts may be used in robotics, augmented reality, virtual reality, etc. In many cases, the construction of 3D room layouts relies on red, green, blue, depth (RGBD) cameras or light detection and ranging (LIDAR) scanners to obtain accurate boundaries of rooms. However, the use of an RGBD camera or a LIDAR scanner for constructing 3D room layouts is costly, as such technology comprises relatively expensive hardware.
The following is a summary of certain embodiments described in detail below. The described aspects are presented merely to provide the reader with a brief summary of these certain embodiments, and the description of these aspects is not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be explicitly set forth below.
According to at least one aspect, a computer-implemented method relates to generating a three-dimensional (3D) layout model of an environment. The method includes receiving a digital image. The digital image comprises two-dimensional data. The method includes generating instance segmentation data using the digital image. The instance segmentation data includes segmentation masks identifying architectural elements in the digital image. The method includes generating depth data using the digital image. The method includes generating a set of planes. Each plane is generated using the depth data of a corresponding segmentation mask. The set of planes includes at least a first plane and a second plane. The method includes generating boundary estimate data for the set of planes using corresponding boundary data of the segmentation masks. The method includes generating a set of plane segments by bounding the set of planes using the boundary estimate data. The set of plane segments includes a first plane segment corresponding to a bounding of the first plane and a second plane segment corresponding to a bounding of the second plane. The method includes generating boundary tolerance data for each boundary estimate. Each boundary tolerance data is used to create a plane buffer, which extends a corresponding boundary estimate by a predetermined distance. The method includes locating an intersection between the first plane segment and the second plane segment using the boundary estimate data and the boundary tolerance data. The method includes constructing a 3D layout model that includes at least a boundary segment connecting the first plane segment and the second plane segment at the intersection.
According to at least one aspect, a system includes one or more processors and one or more computer memories. The one or more computer memories are in data communication with the one or more processors. The one or more computer memories have computer readable data stored thereon. The computer readable data includes instructions that, when executed by the one or more processors, cause the one or more processors to perform a method. The method includes receiving a digital image. The digital image comprises two-dimensional data. The method includes generating instance segmentation data using the digital image. The instance segmentation data includes segmentation masks identifying architectural elements in the digital image. The method includes generating depth data using the digital image. The method includes generating a set of planes. Each plane is generated using the depth data of a corresponding segmentation mask. The set of planes includes at least a first plane and a second plane. The method includes generating boundary estimate data for the set of planes using corresponding boundary data of the segmentation masks. The method includes generating a set of plane segments by bounding the set of planes using the boundary estimate data. The set of plane segments includes a first plane segment corresponding to a bounding of the first plane and a second plane segment corresponding to a bounding of the second plane. The method includes generating boundary tolerance data for each boundary estimate. Each boundary tolerance data is used to create a plane buffer, which extends a corresponding boundary estimate by a predetermined distance. The method includes locating an intersection between the first plane segment and the second plane segment using the boundary estimate data and the boundary tolerance data. The method includes constructing a 3D layout model that includes at least a boundary segment connecting the first plane segment and the second plane segment at the intersection.
According to at least one aspect, one or more non-transitory computer readable media have computer readable data stored thereon. The computer readable data includes instructions that, when executed by one or more processors, cause the one or more processors to perform a method. The method includes receiving a digital image. The digital image comprises two-dimensional data. The method includes generating instance segmentation data using the digital image. The instance segmentation data includes segmentation masks identifying architectural elements in the digital image. The method includes generating depth data using the digital image. The method includes generating a set of planes. Each plane is generated using the depth data of a corresponding segmentation mask. The set of planes includes at least a first plane and a second plane. The method includes generating boundary estimate data for the set of planes using corresponding boundary data of the segmentation masks. The method includes generating a set of plane segments by bounding the set of planes using the boundary estimate data. The set of plane segments includes a first plane segment corresponding to a bounding of the first plane and a second plane segment corresponding to a bounding of the second plane. The method includes generating boundary tolerance data for each boundary estimate. Each boundary tolerance data is used to create a plane buffer, which extends a corresponding boundary estimate by a predetermined distance. The method includes locating an intersection between the first plane segment and the second plane segment using the boundary estimate data and the boundary tolerance data. The method includes constructing a 3D layout model that includes at least a boundary segment connecting the first plane segment and the second plane segment at the intersection.
These and other features, aspects, and advantages of the present invention are discussed in the following detailed description with reference to the accompanying drawings, throughout which like characters represent similar or like parts. Furthermore, the drawings are not necessarily to scale, as some features could be exaggerated or minimized to show details of particular components.
While the embodiments described herein have been shown and described by way of example, many of their advantages will be understood from the foregoing description, and it will be apparent that various changes can be made in the form, construction, and arrangement of the components without departing from the disclosed subject matter or without sacrificing one or more of its advantages. Indeed, the described forms of these embodiments are merely explanatory. These embodiments are susceptible to various modifications and alternative forms, and the following claims are intended to encompass and include such changes and not be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
At step 102, according to an example, the method 100 includes receiving a digital image from at least one image sensor. For example, the digital image may be a red, green, blue (RGB) image, a cyan, magenta, yellow (CMY) image, a grayscale image, or any type of image with pixels. The digital image may comprise a panoramic image or any similar type of image. In this regard, the method 100 is advantageous in that the digital image may be obtained from an image sensor that does not include a depth sensor. The digital image includes pixel data without depth data. The method 100 only requires that the digital image include two-dimensional (2D) data. Also, the digital image displays one or more architectural elements, which are to be represented in a 3D layout model. Upon receiving the digital image, the method 100 proceeds to step 104 and step 106.
At step 104, according to an example, the method 100 includes generating instance segmentation data using at least one digital image. More specifically, the 3D layout model generator 710 is configured to receive the digital image and generate, via an ML system 712, instance segmentation data that includes segmentation masks identifying various architectural elements (e.g., wall, floor, ceiling, etc.) in the digital image. Each segmentation mask also provides boundary data (e.g., edge data), which is used in later steps to bound the corresponding plane.
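As a non-authoritative illustration of step 104, instance segmentation may be performed with an off-the-shelf network. The minimal sketch below uses a torchvision Mask R-CNN; stock COCO weights do not cover architectural classes such as wall or floor, so in practice the model would be fine-tuned on such labels. The class list, file name, and thresholds are assumptions for illustration only.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Hypothetical label set; a deployed model would be fine-tuned on these classes.
ARCH_CLASSES = ["background", "wall", "floor", "ceiling", "door", "window"]

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=len(ARCH_CLASSES))
model.eval()  # weights fine-tuned for architectural elements would be loaded here

image = to_tensor(Image.open("room.jpg").convert("RGB"))  # 2D RGB data only
with torch.no_grad():
    output = model([image])[0]

# Keep confident detections; each mask is a per-pixel probability map.
keep = output["scores"] > 0.7
masks = (output["masks"][keep, 0] > 0.5).numpy()   # boolean segmentation masks
labels = [ARCH_CLASSES[i] for i in output["labels"][keep].tolist()]
```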
At step 106, according to an example, the method 100 includes generating depth data via a depth estimator 716. In an example embodiment, the depth estimator 716 comprises a machine learning system, which includes at least one machine learning model that is configured to generate depth data or a depth map in response to receiving the digital image as input data. In this regard, the depth estimator 716 estimates depth data for the digital image even though the digital image itself contains only 2D data.
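For instance, a monocular depth estimation network may serve as the depth estimator 716. The sketch below, a non-authoritative example, loads the publicly available MiDaS model via torch.hub. Note that MiDaS predicts relative inverse depth, so a scale and shift would need to be recovered (e.g., from a known camera height or a sparse measurement) before metric plane fitting; that recovery is outside this sketch.

```python
import cv2
import torch

# Load the small MiDaS model and its matching preprocessing transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

img = cv2.cvtColor(cv2.imread("room.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img))
    # Resize the prediction back to the input resolution.
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()
# 'depth' holds relative inverse depth per pixel; a global scale/shift
# must be estimated separately to obtain metric depth.
```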
In another example embodiment, the depth estimator 716 comprises a low-cost laser rangefinder. The laser rangefinder is positioned in a vicinity of the image sensor (e.g., camera) that generates each digital image. The laser rangefinder is positioned at a predetermined distance away from the image sensor. The laser rangefinder is configured to use a laser beam to determine a distance to an object (e.g., an architectural element). The depth estimator 716 is configured to generate laser measurements (e.g., sparse laser measurements) correlated with the digital image. In this regard, the depth estimator 716 is configured to generate depth data or a depth map for the digital image at low cost. After the depth data is generated, the method 100 proceeds to step 108.
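One simple way to turn such sparse laser measurements into a dense depth map, sketched here under the assumption that the laser returns have already been registered to pixel coordinates, is scattered-data interpolation. The pixel locations, depths, and image size below are made-up values for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Assumed inputs: pixel locations (u, v) of the sparse laser returns and
# their measured metric depths, registered to a 640x480 digital image.
uv = np.array([[120, 80], [500, 90], [320, 400], [60, 420], [580, 430]])
z = np.array([3.1, 3.4, 1.2, 1.5, 1.6])

uu, vv = np.meshgrid(np.arange(640), np.arange(480))
dense = griddata(uv, z, (uu, vv), method="linear")       # interpolate interior
dense_nn = griddata(uv, z, (uu, vv), method="nearest")   # fill outside the hull
dense = np.where(np.isnan(dense), dense_nn, dense)       # dense depth map
```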
At step 108, according to an example, the method 100 includes generating 3D configuration estimation data using the instance segmentation data, generated at step 104, and the depth data, generated at step 106. The configuration estimation data includes a set of planes. Each plane is fit to a group of depth data, which includes at least three non-collinear 3D points that are associated with a same segmentation mask, and is represented by a plane equation. In this regard, the model generation system 702 and/or the 3D layout model generator 710 is configured to obtain a group of at least three non-collinear 3D points of depth data for a particular segmentation mask and then fit a plane to those 3D points. The set of planes includes a number of planes in which each plane is generated for a respective group of depth data associated with a particular segmentation mask.
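A minimal sketch of this plane-fitting step is given below. It assumes a pinhole camera with known intrinsics (the fx, fy, cx, cy values are placeholders) so that each masked pixel and its metric depth can be back-projected to a 3D point; a plane is then fit by least squares via the singular value decomposition.

```python
import numpy as np

def backproject(depth, mask, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Lift masked pixels with metric depth to 3D camera coordinates."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])

def fit_plane(points):
    """Least-squares plane n.x = d through >= 3 non-collinear 3D points."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                    # direction of least variance
    return normal, float(normal @ centroid)

# e.g., fit the plane of the "floor" segmentation mask from the earlier sketch:
# n_floor, d_floor = fit_plane(backproject(depth, masks[labels.index("floor")]))
```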
As a non-limiting example, the 3D layout model generator 710 is configured to obtain a group of depth data comprising 3D points, which are associated with the segmentation mask of “floor,” and generate a plane for those 3D points. In this case, the set of planes includes at least the plane corresponding to an architectural element that is identified as “floor.” Upon generating the 3D configuration estimation data that includes the set of planes, the method 100 proceeds to step 110.
At step 110, according to an example, the method 100 includes generating a set of plane segments by bounding the set of planes. More specifically, after generating the set of planes, the model generation system 702 obtains boundary data (e.g., edge data) of the segmentation masks. The model generation system 702 generates boundary estimate data for each plane using the boundary data of the corresponding segmentation mask. The model generation system 702 then generates a plane segment for each plane by bounding that plane using the boundary estimate data associated with the corresponding segmentation mask. Upon bounding each plane of the set of planes, the method 100 proceeds to step 112.
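As an illustrative sketch of this bounding operation, each boundary pixel of a segmentation mask can be cast as a ray through the camera center and intersected with the fitted plane, yielding a 3D boundary estimate for the plane segment. The same placeholder intrinsics as above are assumed.

```python
import numpy as np
import cv2

def boundary_on_plane(mask, normal, d, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Project the mask's 2D boundary onto the plane n.x = d via ray casting."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    uv = max(contours, key=cv2.contourArea).reshape(-1, 2)  # (u, v) boundary pixels
    # Ray direction through each pixel for a pinhole camera at the origin.
    rays = np.column_stack([(uv[:, 0] - cx) / fx, (uv[:, 1] - cy) / fy,
                            np.ones(len(uv))])
    t = d / (rays @ normal)            # ray-plane intersection: n.(t*ray) = d
    return rays * t[:, None]           # 3D points bounding the plane segment

# e.g., boundary_pts = boundary_on_plane(floor_mask, n_floor, d_floor)
```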
At step 112, according to an example, the method 100 includes generating boundary tolerance data for each boundary estimate. The boundary tolerance data provides a plane segment with a plane buffer, i.e., a plane extension that extends outward from the boundary estimate data by a predetermined distance. The boundary tolerance data is advantageous in extending the bounding range of a plane segment in the event that the boundary estimate data is misestimated. Upon generating boundary tolerance data for each boundary estimate such that each plane segment includes a plane buffer, the method 100 proceeds to step 114.
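In a minimal sketch, assuming the plane segment's boundary has been expressed as a polygon in a 2D coordinate frame on its plane, such a plane buffer can be generated by offsetting the polygon outward by the predetermined distance, e.g., with the shapely library. The rectangle and the 0.15 m tolerance below are assumed values for illustration.

```python
import numpy as np
from shapely.geometry import Polygon

def in_plane_2d(points3d, normal):
    """Express coplanar 3D points in a 2D coordinate frame on their plane."""
    axis = np.array([1.0, 0.0, 0.0])
    if abs(normal @ axis) > 0.9:       # avoid a basis axis parallel to the normal
        axis = np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, axis); u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    return points3d @ np.column_stack([u, v])

# Toy boundary estimate: a 2 m x 3 m rectangle lying 2.5 m from the camera.
boundary_pts = np.array([[0, 0, 2.5], [2, 0, 2.5], [2, 3, 2.5], [0, 3, 2.5]], float)
n_floor = np.array([0.0, 0.0, 1.0])

boundary = Polygon(in_plane_2d(boundary_pts, n_floor))
buffered = boundary.buffer(0.15)       # plane buffer: extend outward by 0.15 m
```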
At step 114, according to an example, the method 100 includes constructing a 3D layout model by connecting the plane segments using the boundary estimate data and the boundary tolerance data. More specifically, upon generating the set of plane segments, the model generation system 702 and/or the 3D layout model generator 710 is configured to locate an intersection between plane segments using the boundary estimate data and the boundary tolerance data. In this regard, the intersection may be positioned within a vicinity of the boundary estimate data and the boundary tolerance data. In most cases, the intersection may be located near the boundary estimate data or between the boundary estimate data and the boundary tolerance data. The plane segments are then connected at a boundary segment defined at the intersection between a pair of plane segments. After the plane segments are connected at the intersections, each plane segment is indicative of a planar surface corresponding to a particular segmentation mask (e.g., wall, floor, ceiling, etc.), and each boundary segment is indicative of a connection between that planar surface and another planar surface. Also, once the 3D layout model is generated, the 3D layout model may be used to update or enhance the depth estimator 716 and/or the depth data of step 106.
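The geometric core of this step, locating the line along which two non-parallel planes meet, can be sketched as follows; restricting the candidate to the buffered vicinity of both boundary estimates then reduces to a distance check against the plane buffers.

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Line of intersection of planes n1.x = d1 and n2.x = d2."""
    direction = np.cross(n1, n2)
    if np.linalg.norm(direction) < 1e-9:
        return None, None              # planes are (nearly) parallel
    # Solve for one point on the line: three equations, three unknowns.
    A = np.vstack([n1, n2, direction])
    point = np.linalg.solve(A, np.array([d1, d2, 0.0]))
    return point, direction / np.linalg.norm(direction)

# Example: floor z = 0 meeting a wall x = 2.
p0, direction = plane_intersection(np.array([0., 0., 1.]), 0.0,
                                   np.array([1., 0., 0.]), 2.0)
# p0 lies on both planes; 'direction' runs along the floor-wall boundary.
# A candidate is kept only if it lies within the plane buffers of both segments.
```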
After the 3D layout model is generated at step 114, the 3D layout model may be used in a number of different ways and applications. For example, the 3D layout model may be used to generate measurement data associated with various dimensions of the 3D layout model. The 3D layout model may be displayed by a display device or transmitted to another computing device. The 3D layout model may be combined with other 3D layout models. The 3D layout model may be modified by a user. In this regard, the 3D layout model may be outputted and/or used downstream. For example, the 3D layout model may be used downstream by a navigation system to navigate a mobile robot around a room or a part of a building. Also, the 3D layout model is configured to be aligned and combined with one or more other 3D layout models to generate a unified 3D layout model. For example, the 3D layout model may be aligned and combined with another architectural structure (e.g., one or more walls) and/or another 3D layout model (e.g., one or more rooms) to create a unified 3D layout model that shows a greater portion of a house/building.
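As one small, hypothetical example of such measurement data, the width of a room can be read directly from the plane equations of two opposing, parallel wall segments; unit normals are assumed, as produced by the SVD fit above.

```python
import numpy as np

def distance_between_parallel_planes(n1, d1, n2, d2):
    """Metric separation of two (anti)parallel unit-normal planes n.x = d."""
    if n1 @ n2 < 0:                    # normals face opposite directions
        n2, d2 = -n2, -d2
    assert np.linalg.norm(np.cross(n1, n2)) < 1e-3, "walls are not parallel"
    return abs(d1 - d2)

# Example: opposing walls x = 0 and x = 4 (inward normal) give a 4 m width.
width = distance_between_parallel_planes(np.array([1., 0., 0.]), 0.0,
                                         np.array([-1., 0., 0.]), -4.0)
```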
In addition, the visualization 300 also displays some examples of depth data, which is generated by the depth estimator 716 based on the digital image 200. For ease and convenience of illustration, each depth point is illustrated as a dot on the visualization 300. As shown in the visualization 300, each segmentation mask is associated with its own group of depth points.
Upon identifying and establishing a group of depth data for each segmentation mask, the model generation system 702 is configured to generate configuration estimation data. The configuration estimation data includes a set of planes. Each plane is fit to a group of depth data, which includes at least three non-collinear 3D points that are associated with a same segmentation mask, and is represented by a plane equation.
After generating the set of planes, the model generation system 702 uses the boundary data of the segmentation masks to identify corresponding boundary estimate data for each plane. The model generation system 702 generates a plane segment for each plane by bounding that plane using corresponding boundary data of the corresponding segmentation mask. Next, upon generating the boundary estimate data for each plane, the model generation system 702 generates boundary tolerance data for each boundary estimate. The boundary tolerance data provides a plane buffer that extends a bound of a plane segment. The boundary tolerance data is a predetermined distance away from the boundary estimate. The boundary tolerance data is advantageous in providing a plane buffer that extends a bounding range of a plane segment in the event that the boundary estimate is misestimated.
In a non-limiting example, the set of plane segments includes a first plane segment 410, which is associated with boundary estimate data 412 and boundary tolerance data 414, and a second plane segment 420, which is associated with boundary estimate data 422 and boundary tolerance data 424.
As aforementioned, the model generation system 702 generates a number of planes. In a scenario with no boundary estimate data and no buffers for locating an intersection, a number of non-parallel planes may intersect and form a number of connections, boundary segments, and/or architectural elements that do not exist in the actual environment. Also, since any two non-parallel planes may eventually intersect at some point, these intersections may generate a large number of possible layouts. Taking this into account, the model generation system 702 is advantageous in (i) generating boundary estimate data using the boundary data of the segmentation masks as a guide, (ii) generating boundary tolerance data and plane buffers to account for a misestimation of the boundary estimate data, and (iii) using the boundary estimate data and the boundary tolerance data to determine a vicinity and general range for locating an actual intersection and an actual boundary segment, which then defines a bound for a plane segment. As such, the generation of boundary estimate data, boundary tolerance data, and plane buffers enables the model generation system 702 to locate an actual connection between planes for the formation of the 3D layout model with professional-grade precision.
In addition, the model generation system 702 locates an intersection between the first plane segment 410 and the second plane segment 420 using the boundary estimate data 412 and the boundary estimate data 422, as well as the boundary tolerance data 414 and the boundary tolerance data 424. The intersection may be located in a vicinity of the boundary estimate data 412, the boundary estimate data 422, the boundary tolerance data 414, and the boundary tolerance data 424. For example, the intersection is found to reside on the first plane segment 410 in a plane extension region of the buffer 416, at a location that is between the boundary estimate data 412 and the boundary tolerance data 414. The intersection is also found to reside on the second plane segment 420 at a location that is more inward than the boundary estimate data 422, such that the boundary estimate data 422 lies between the intersection and the boundary tolerance data 424; the intersection is thus also more inward than the boundary tolerance data 424 on the second plane segment 420.
The first plane segment 410 and the second plane segment 420 are then connected at a boundary segment 430 defined at the intersection between the first plane segment 410 and the second plane segment 420. The first plane segment 410 is bounded at one end portion by the boundary segment 430, which defines a boundary of the room. In this non-limiting example, the boundary segment 430 is indicative of a connection between the planar surface of the first plane segment 410 and the planar surface of the second plane segment 420.
In addition, the model generation system 702 is configured to align and combine a set of 3D layout models to generate a unified 3D layout model, for example, according to a method 600.
At step 602, according to an example, the method 600 includes generating or estimating camera pose data. The camera pose data represents the position and orientation of the camera relative to the environment (e.g., relative to one or more architectural elements), usually in three dimensions. The camera pose data is generated or estimated using features that are common to, or overlap among, two or more of the digital images.
The model generation system 702 and/or the 3D layout model generator 710 is configured to identify matching, common, and/or overlapping features via the instance segmentation data. For example, the 3D layout model generator 710 is configured to generate or estimate camera pose data based on geometric calculations relating to matching one or more plane segments from the set of plane segments of one digital image with one or more plane segments from another set of plane segments of another digital image. Upon generating or estimating the camera pose data, the method 600 proceeds to step 604.
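One standard way to recover such a relative pose, sketched here under the assumption that at least three matched, non-collinear 3D correspondences (e.g., corresponding plane-segment corners or centroids) are available, is the Kabsch rigid alignment; this is an illustrative choice, not necessarily the geometric calculation used by the 3D layout model generator 710.

```python
import numpy as np

def kabsch(src, dst):
    """Rigid transform (R, t) minimizing ||R @ src_i + t - dst_i|| over matches."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dc - R @ sc

# Tiny worked example: dst is src rotated 90 degrees about z and shifted.
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
dst = src @ Rz.T + np.array([2.0, 0.0, 0.0])
R, t = kabsch(src, dst)                # recovers the rotation and translation
```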
At step 604, according to an example, the method 600 includes aligning a set of 3D layout data and/or 3D layout models. In this regard, the model generation system 702 and/or the 3D layout model generator 710 is configured to use the camera pose data, generated at step 602, to align the set of 3D layout data and/or 3D layout models with respect to a common coordinate frame so that they may be combined into a unified 3D layout model.
In some implementations, when there is not a sufficient number of overlapping areas among the set of 3D layout data to generate camera pose data and/or when camera pose data is lacking, the model generation system 702 and/or the 3D layout model generator 710 is configured to use at least one computer vision algorithm to identify corresponding planes between the 3D layout data/models. Additionally or alternatively, a set of corresponding planes for unifying the 3D layout data/models may be manually defined. As an example, the set of corresponding planes for unifying 3D layout data/models may be received as input data from at least one user.
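A minimal sketch of such a plane-correspondence test, with purely illustrative tolerances, declares two planes corresponding when, after one of them has been transformed into the other's coordinate frame, their unit normals and plane offsets nearly agree.

```python
import numpy as np

def planes_correspond(n1, d1, n2, d2, ang_tol_deg=10.0, off_tol=0.1):
    """True if plane n1.x = d1 (already transformed into the second layout's
    frame) matches plane n2.x = d2 within angular and offset tolerances."""
    if n1 @ n2 < 0:                    # align normal orientations first
        n2, d2 = -n2, -d2
    angle = np.degrees(np.arccos(np.clip(n1 @ n2, -1.0, 1.0)))
    return angle < ang_tol_deg and abs(d1 - d2) < off_tol

# e.g., planes_correspond(n_wall_a, d_wall_a, n_wall_b, d_wall_b) -> bool
```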
After the unified 3D layout model is generated, the unified 3D layout model may be used in a number of different ways and applications. For example, the unified 3D layout model may be used to generate measurement data associated with various dimensions of the unified 3D layout model. The unified 3D layout model may be displayed by a display device or transmitted to another computing device. The unified 3D layout model may be modified by a user. In this regard, the unified 3D layout model may be outputted and/or used downstream.
The system 700 and/or the model generation system 702 includes at least one sensor system 706. The sensor system 706 includes one or more sensors. For example, the sensor system 706 includes an image sensor, such as a camera that generates digital images. The sensor system 706 may include at least one other sensor, such as an inertial measurement unit (IMU), depending upon the specific application (e.g., robot) of the model generation system 702. The sensor system 706 is operable to communicate with one or more other components (e.g., processing system 704 and memory system 708) of the system 700. For example, the sensor system 706 may provide sensor data (e.g., digital images), which is then processed by the processing system 704, via the 3D layout model generator 710, to generate one or more 3D layout models, one or more unified 3D layout models, measurement data relating to one or more 3D layout models or unified 3D layout models, or any number and combination thereof. The sensor system 706 is local, remote, or a combination thereof (e.g., partly local and partly remote) with respect to one or more components of the system 700. Upon receiving the sensor data (e.g., one or more digital images), the processing system 704, via the 3D layout model generator 710, is configured to process this sensor data (e.g., digital images) in connection with the ML system 712, the other relevant data 714, or any number and combination thereof.
The system 700 and/or the model generation system 702 includes a memory system 708, which is operatively connected to the processing system 704. In this regard, the processing system 704 is in data communication with the memory system 708. The memory system 708 includes at least one non-transitory computer readable storage medium, which is configured to store and provide access to various data to enable at least the processing system 704 to perform the operations and functionality, as disclosed herein. The memory system 708 comprises a single memory device or a plurality of memory devices. The memory system 708 may include electrical, electronic, magnetic, optical, semiconductor, electromagnetic, or any suitable storage technology. For instance, the memory system 708 may include random access memory (RAM), read only memory (ROM), flash memory, a disk drive, a memory card, an optical storage device, a magnetic storage device, a memory module, any suitable type of memory device, or any number and combination thereof.
The memory system 708 includes at least a 3D layout model generator 710 with an ML system 712, as well as other relevant data 714, which are stored thereon. Each of these components includes computer readable data that, when executed by the processing system 704, is configured to perform at least the functions disclosed in this disclosure. The computer readable data may include instructions, code, routines, various related data, any software technology, or any number and combination thereof. The 3D layout model generator 710 is configured to generate one or more 3D layout models, as well as one or more unified 3D layout models. In addition, the 3D layout model generator 710 is configured to generate measurement data relating to various dimensions taken with respect to one or more 3D layout models, one or more unified 3D layout models, or any number and combination thereof.
The ML system 712 includes at least one machine learning model, which is configured to perform instance segmentation. As a non-limiting example, the machine learning model may include an artificial neural network, a deep neural network, machine learning technology, or any number and combination thereof. More specifically, as discussed above, in response to receiving a digital image from the image sensor (e.g., camera), the ML system 712 is configured to generate instance segmentation data, which includes segmentation masks identifying various objects (e.g., one or more architectural elements) in a digital image. Also, the other relevant data 714 provides various data (e.g., operating system, etc.), which enables the system 700 and/or the model generation system 702 to perform the functions as discussed herein.
Also, the system 700 and/or the model generation system 702 includes a depth estimator 716. The depth estimator 716 is configured to generate depth data or a depth map. For instance, in an example embodiment, the depth estimator 716 comprises a machine learning system. The machine learning system includes at least one machine learning model, which is configured to generate depth data or a depth map in response to receiving one or more digital images as input data. In another example embodiment, the depth estimator 716 comprises a laser rangefinder, which is configured to generate depth data.
The system 700 and/or the model generation system 702 may include one or more I/O devices 718 (e.g., display device, microphone, speaker, etc.). For instance, the system 700 and/or the model generation system 702 may include a display device, which is configured to display one or more 3D layout models, one or more unified 3D layout models, measurement data relating to one or more of the 3D layout models, measurement data relating to one or more of the unified 3D layout models, or any number and combination thereof. Also, the system 700 and/or the model generation system 702 may include one or more I/O devices 718 to display the 3D layout model and receive input data, which allows for the modification of the 3D layout model. As a non-limiting example, the system 700 and/or the model generation system 702 includes a touchscreen on a mobile communication device that displays a first 3D layout model and then allows a user to delete a wall of the first 3D layout model and combine a second 3D layout model with the first 3D layout model. This feature is advantageous in enabling a user to interact with the model generation system 702 and one or more 3D layout models.
In addition, the system 700 includes other functional modules 720, such as any appropriate hardware, software, or combination thereof that assist with or contribute to the functioning of the system 700. For example, the other functional modules 720 include communication technology (e.g., wired communication technology, wireless communication technology, or a combination thereof) that enables components of the system 700 to communicate with each other and/or one or more computing devices 722 (e.g., mobile communication device, smart phone, laptop, tablet, etc.). The system 700 may also include a cloud computing system 724. The cloud computing system 724 is in data communication with the system 700 and the one or more other computing devices 722.
Also, the other functional modules 720 may include other components, such as an actuator. In this regard, for instance, when the model generation system 702 is employed in a robot vacuum, the other functional modules 720 further include one or more actuators, which relate to driving, steering, stopping, and/or controlling a movement of the robot vacuum based at least on the 3D layout model, the unified 3D layout model, measurement data relating to one or more 3D layout models or one or more unified 3D layout models, or any number and combination thereof.
As described in this disclosure, the system 700 and/or the model generation system 702 provides several advantages and benefits. For example, the model generation system 702 is configured to generate and construct 3D layout models without requiring or using a depth image (e.g., RGBD image). The system 700 and/or the model generation system 702 is configured to generate 3D layout models using an image sensor (e.g., RGB camera) that provides digital images comprising 2D data. In some embodiments, the model generation system 702 is configured to generate estimates of a depth map and/or depth data (e.g., dense depth data) via a machine learning system (e.g., CNN). These embodiments are advantageous in reducing the physical size and costs associated with generating 3D layout models of professional grade and precision. In other embodiments, the model generation system 702 is configured to estimate the 3D configurations using depth data, which is generated via sparse laser measurements taken by a laser rangefinder.
Also, the model generation system 702 is configured to construct one or more 3D layout models, as well as unified 3D layout models. In this regard, the model generation system 702 is configured to align multiple 3D layout models, which are generated from digital images, via camera pose data, and to construct a unified, larger 3D room layout that provides greater layout coverage of an environment. In addition, the model generation system 702 provides the added benefit of identifying different architectural elements in the 3D layout models. Furthermore, the model generation system 702 is advantageous in being configured to provide accurate measurement data with respect to various dimensions of the 3D layout models and/or the unified 3D layout models. The model generation system 702 is configured to provide at least these features at a relatively low cost. Moreover, the model generation system 702 is configured to provide these 3D layout models and/or unified 3D layout models downstream so that they may be used in various applications (e.g., robotics, augmented reality, virtual reality, etc.).
Furthermore, the above description is intended to be illustrative, and not restrictive, and is provided in the context of a particular application and its requirements. Those skilled in the art will appreciate from the foregoing description that the present invention may be implemented in a variety of forms and that the various embodiments may be implemented alone or in combination. Therefore, while the embodiments of the present invention have been described in connection with particular examples thereof, the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the described embodiments. The true scope of the embodiments and/or methods of the present invention is not limited to the embodiments shown and described, since various modifications will become apparent to the skilled practitioner upon a study of the drawings, the specification, and the following claims. Additionally or alternatively, components and functionality may be separated or combined differently than in the manner of the various described embodiments and may be described using different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.