Recreating a physical scene is useful for various user applications, including gaming, interior design, and renovation. One aspect of the physical scene is the architectural layout, including one or more walls, floors, and ceilings. Advances in augmented reality, deep networks, and open-source data sets have facilitated single-view room layout extraction and planar reconstruction. However, contemporary technology can be limited in accurately capturing a scene.
Layout extraction of a scene can utilize images taken on a user device, such as a smartphone. Piecewise planar reconstruction methods for layout extraction can attempt to retrieve geometric surface planes from the images, which can be single views or panoramic views. The quality of the layout extraction can depend on the technology equipped in the smartphone. For example, some smartphones only capture 2D red-green-blue (RGB) images, which can introduce layout extraction challenges, such as repeating textures or large low-texture surfaces, that hinder the perception of 3D surface geometry using conventional methods. Some smartphones may ignore contextual, or perceptual, information available to accurately reconstruct a scene. Some smartphones might not accurately capture complex geometries that can include corners, curvatures, and other architectural features, for example. In addition to camera limitations, technology can operate under certain assumptions about a room that impair layout extraction, such as assuming the room is strictly rectangular, that corners are visible and not occluded by furniture or other items, and that walls or other surfaces do not contain openings or architectural features, such as arches, columns, or baseboards.
Accordingly, there is a need for improvements in layout extraction systems and methods.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
While methods, apparatuses, and computer-readable media are described herein by way of examples and embodiments, those skilled in the art recognize that methods, apparatuses, and computer-readable media for layout extraction are not limited to the embodiments or drawings described. It should be understood that the drawings and description are not intended to limit the disclosure to the particular forms disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word “may” is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
The present method, apparatus, and computer-readable medium address at least the problems discussed above in layout extraction. As discussed above, layout extraction methods can utilize scene capturing technology and piecewise planar reconstruction techniques to model various scenes, e.g., rooms. However, exclusive reliance on RGB images can result in an inaccurate model of a scene.
Existing systems and methods might fail to recognize or utilize modeling tools available from the scene, such as gravity vectors, orientations, and depth information. Further, existing systems and methods might rely on assumptions regarding the layout that inaccurately capture the room geometry. For example, existing systems and methods may assume adjacent walls are connected when the walls are actually disconnected in the scene. In another example, existing systems and methods may assume a ceiling in the scene is rectangular, or that the scene contains only one ceiling. Accordingly, existing systems and methods may place corners or seams between walls and the ceiling at incorrect locations.
Existing systems and methods may be unable to model complex geometries, either partially or wholly ignoring or misinterpreting details in the scene, including doors, windows, curvatures, and various architectural features. Some existing systems and methods may utilize a single image to reconstruct a scene. The single image might not provide full visibility of the scene, though, as occlusions, such as furniture and wall-hanging objects, e.g., artwork, can block the background layout. Because of these limitations, existing systems and methods can be unable to model complex geometries that might include details such as corners, curvatures, walls that do not extend fully from a ceiling to a floor, windows, doors, and architectural features, such as arches, columns, or baseboards.
The present system addresses these problems by utilizing additional contextual, or perceptional, information (e.g. instance segmentation, geometric edge detection, occluded wall connectivity perception) and line segments identified in a scene to produce a more geometrically complete and consistent estimate of the architectural layout of the scene. As described herein, scenes can be more precisely modeled by better defining boundaries of layout planes based on the contextual information and/or line segments. Layout plane masks corresponding to the layout planes can be accurately predicted based on classifying and grouping line segments into planes to achieve precise mask boundaries. The layout planes can be defined with line segments to form the layout plane masks. In addition, fewer or no assumptions are made regarding room corners or seams between adjacent walls, for example, in comparison to existing methods. Instead, confidence values are determined regarding the layout plane masks to optimize the layout plane masks. Using plane connectivity and depth and semantic segmentation priors, the layout plane masks can be optimized, providing a detailed and true model of the scene.
The novel method, apparatus, and computer-readable medium for layout extraction will now be described with reference to the figures.
Based on one or more inputs, including one or more images of a scene or contextual information, the layout can be extracted, the layout being a geometrically consistent empty version of the scene. The layout can be a 3D representation of one or more surfaces, e.g., walls, floor, ceiling, windows, doors, soffits, etc., and/or 2D representations of the same data (floor plans, elevation plans, etc.). A single input or a plurality of inputs can be used to generate the 3D and/or 2D representations of the scene. As shown in
The methods described herein can be implemented in a user facing system. A user can take a photo or multiple photos of a space showing a scene (e.g., a room in their house). The user can also take a video of the space, in other examples. The user can utilize any user device having a camera, such as a smartphone, laptop, tablet, or digital camera. Using the techniques described in greater detail below, the layout of the scene can be modeled. Based on one or more inputs, a framework of the layout can be identified and modeled, rendering a more accurate view of the scene background. Once this modeling is complete, a user can apply virtual objects, e.g., furniture or wall-hanging objects, to the scene to simulate various interior design options, and architectural editing and planning, for example. Features of the system can include:
Steps of method 400 can generally include one or more of the following:
In this way, method 400 can provide detailed, automatic, parametric computer-aided design modeling.
Prior to the initial step (in methods 400, 3500, and/or 3600), a user can capture and upload one or more images of a scene and/or one or more videos of the scene. The image or images are then processed and analyzed to extract or determine the contextual, or perceptional, information. This can include, for example, 3D points, features, gravity, augmented reality data, etc. The one or more images can be inputs along with optional poses, 3D points, features, gravity information, inertial measurement unit (IMU) data streams, and/or augmented reality data. As part of this step, the input preprocessor 502 can obtain one or more images, gravity, camera poses, 3D points or depth maps, features, and/or AR data. The images can be RGBD (Red-Green-Blue-Depth) images, RGB images, gray-scale images, RGB images with associated IMU data streams, and/or RGBD images of a room/space with optional measurement of gravity vector. Of course, any representation of the captured scene (e.g. point clouds, depth-maps, meshes, voxels, where 3D points are assigned their respective texture/color) can be used. The inputs can be further processed to extract layout information, as described in greater detail below.
At step 401, a plurality of scene priors corresponding to an image of a scene can be stored, the plurality of scene priors comprising a semantic map indicating semantic labels associated with a plurality of pixels in the image and a plurality of line segments. Any of the scene priors, including the plurality of line segments, can themselves be generated from other inputs. The semantic labels can include at least one of a wall, a ceiling, or a floor, for example, or a window or a door.
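For illustration only, the stored scene priors from step 401 might be organized as in the following sketch; the container and field names (ScenePriors, semantic_map, line_segments, and so on) are assumptions introduced here, not terms from the disclosure.

```python
# Illustrative sketch only: a possible container for stored scene priors.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class ScenePriors:
    semantic_map: np.ndarray                 # (H, W) integer labels, e.g., wall / ceiling / floor
    line_segments: np.ndarray                # (N, 4) endpoints (x1, y1, x2, y2) in pixels
    gravity: Optional[np.ndarray] = None     # (3,) gravity direction in camera coordinates
    depth_map: Optional[np.ndarray] = None   # (H, W) metric depth, if available
    camera_intrinsics: Optional[np.ndarray] = None  # (3, 3) intrinsic matrix K

def store_scene_priors(store: dict, image_id: str, priors: ScenePriors) -> None:
    """Store priors keyed by image, mirroring step 401 at a high level."""
    store[image_id] = priors
```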
With reference to
The additional data derived from additional viewpoints and images also allows for improvements in layout extraction. Viewing aspects of the scene from multiple angles can improve the scene modeling by increasing visibility. For example, viewing nested objects, or objects adjacent to one another in various arrangements, from several angles can provide additional pixels such that removing some occlusions does not automatically eliminate visibility of the background layout. Instead, the additional pixels can act as replacement pixels to create visibility of the background behind some occlusions. Multiple views and images also allow for building better three-dimensional views of objects and provide additional views of geometry and textures from various architectural features.
Scene priors based on the one or more input images can include a semantic map (
The scene priors can be extracted from one or more images and can be contextual, or perceptional, information corresponding to a plurality of pixels in the one or more images. The contextual, or perceptional, information can include perceptual quantities, aligned to one or more of the input images, individually or stitched into composite images. The depth maps, for example, can be obtained by using dense fusion on RGBD (e.g., an RGB image taken on a camera with depth inputs), or by densifying sparse reconstruction from RGB images (such as through neural network depth estimation and multi-view stereo). Metric scale can be estimated using many methods, including multi-lens stereo baseline, active depth sensor, visual-inertial odometry, SLAM (Simultaneous Localization and Mapping) points, known object detection, learned depth or scale estimation, manual input, or other methods. In this step, a set of one or more images, optionally with information about gravity, poses, and depths, can be used to extract various perceptual information, such as semantic segmentation, edges, and others, using task-specific classical or deep-learning algorithms. The contextual, or perceptional, information can be aligned to one or more of the input images, or to a composed input image, e.g., a stitched panorama.
Various scene priors will now be described with reference to
The semantic map can include three-dimensional semantic maps, in which semantic labels (such as those described above) are associated with three-dimensional geometry, such as polygons or voxels. In this case, the semantic map still includes semantic labels associated with a plurality of pixels when the scene is viewed as an image from the vantage point of a camera, but the semantic map structure itself maps semantic labels to three-dimensional structures.
For example, the system can store a three dimensional geometric model corresponding to the scene or a portion of the scene. The three dimensional geometric model can store x, y, and z coordinates for various structures in the scene. These coordinates can correspond to depth information, since the coordinates can be used with camera parameters to determine a depth associated with pixels in an image viewed from the camera orientation.
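As a minimal sketch of how stored x, y, and z coordinates can yield depth for pixels viewed from a camera, the following assumes a pinhole model with intrinsics K and extrinsics (R, t); the function name and conventions are hypothetical, not taken from the disclosure.

```python
import numpy as np

def point_depth_and_pixel(X_world, R, t, K):
    """Project a 3D model point into the camera; return its pixel location and depth.

    X_world: (3,) point from the 3D geometric model.
    R, t:    extrinsics mapping world coordinates to camera coordinates.
    K:       (3, 3) intrinsic matrix.
    """
    X_cam = R @ X_world + t      # world -> camera coordinates
    depth = X_cam[2]             # z component is the depth along the optical axis
    uv = K @ (X_cam / depth)     # perspective projection; uv[2] == 1
    return uv[:2], depth
```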
Scene priors can be inputs to generate one or more outputs.
Camera parameters can include intrinsic parameters, such as focal length, radial distortion and settings such as exposure, color balance, etc., and also extrinsic parameters. Camera parameters can be scene priors to generate the layout extraction.
Gravity vectors can be estimated from an IMU, from a VIO or SLAM system, from camera level/horizon indicator, from vanishing point and horizon analysis, from neural networks, etc.
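As one illustrative possibility (an assumption, not the disclosed method), a gravity direction could be estimated from raw IMU accelerometer samples by low-pass filtering and normalizing:

```python
import numpy as np

def gravity_from_accelerometer(accel_samples, alpha=0.98):
    """Estimate a unit gravity vector by low-pass filtering accelerometer samples.

    accel_samples: iterable of (3,) readings in the device frame.
    alpha:         smoothing factor; higher values trust the running estimate more.
    """
    g = None
    for a in accel_samples:
        a = np.asarray(a, dtype=float)
        g = a if g is None else alpha * g + (1.0 - alpha) * a
    return g / np.linalg.norm(g)
```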
Determining an orientation map can include a method 1300 shown in
At step 1302, a plurality of first normal estimates from the plurality of pixels can be extracted. First normal estimates can be derived from background horizontal lines in the plurality of pixels. Accordingly, first normal estimates can be line-based normal estimates. As shown, step 1302 can include steps 1305-1307. At step 1305, a horizontal line from a pixel of the plurality of pixels can be detected. The detection of horizontal lines can be completed using a deep network. Lines, such as borders, can be selected, with outliers being rejected. Outliers can be identified by calculating the agreement of a vanishing point of a particular line against vanishing points of other lines in the scene. At step 1306, a vanishing point based on the horizontal line and a 3D gravity vector associated with the pixel can be calculated. At step 1307, the vanishing point can be combined with the 3D gravity vector. In this way, a 3D normal for a vertical plane is determined.
At step 1303, a plurality of second normal estimates from the normal map can be extracted. The second normal estimates can be from a deep network.
At step 1304, the plurality of first normal estimates can be concatenated with the plurality of second normal estimates. Step 1304 can be illustrated with
In this way, using gravity and the information of lines being horizontal inside the input mask area, a candidate normal vector for the plane can be created.
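A minimal sketch of this construction, assuming a pinhole camera and a unit gravity vector expressed in camera coordinates: the horizontal line's 3D direction is recovered from its vanishing point through the intrinsics, and the candidate normal is taken perpendicular to both that direction and gravity. The cross-product formulation is an assumption about one way to perform the combination of steps 1306 and 1307.

```python
import numpy as np

def candidate_vertical_normal(vanishing_point_px, K, gravity):
    """Candidate normal for a vertical plane from a horizontal line's vanishing point.

    vanishing_point_px: (u, v) vanishing point of the line in pixel coordinates.
    K:                  (3, 3) camera intrinsic matrix.
    gravity:            (3,) unit gravity vector in camera coordinates.
    """
    vp_h = np.array([vanishing_point_px[0], vanishing_point_px[1], 1.0])
    line_dir = np.linalg.inv(K) @ vp_h        # 3D direction of the horizontal line
    line_dir /= np.linalg.norm(line_dir)
    normal = np.cross(line_dir, gravity)      # perpendicular to both the line and gravity
    return normal / np.linalg.norm(normal)
```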
Referring back to
Each layout plane corresponds to a background plane, such as a wall/window/door plane, a floor plane, and/or a ceiling plane. For example, each layout plane can be a wall plane corresponding to a wall, a ceiling plane corresponding to a ceiling, or a floor plane corresponding to a floor. Layout planes can be planar or curved. Each of the layout planes can be stored as a segmentation mask for the image(s), the segmentation mask indicating which pixels in the image(s) correspond to a particular plane. 3D plane equations can also be computed for each of the layout planes, with the 3D plane equations defining the orientation of the plane in three-dimensional space. To track this information on a per-plane basis, planes can have corresponding plane identifiers (e.g., wall 1, wall 2, ceiling, etc.) and the plane identifiers can be associated with plane equations for each plane. The layout map can also include structures other than walls, ceilings, and floors, including architectural features of a room such as soffits, arches, pony walls, built-in cabinetry, etc.
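Purely as an illustration of this per-plane bookkeeping, a hypothetical record might associate each plane identifier with its segmentation mask, its 3D plane equation in the form n·X + d = 0, and a semantic label; all names and values below are invented for the example.

```python
import numpy as np

# Hypothetical per-plane records: identifiers are arbitrary strings,
# masks are boolean images, and (normal, offset) encode n·X + d = 0.
layout_planes = {
    "wall_1":  {"mask": np.zeros((480, 640), dtype=bool),
                "normal": np.array([1.0, 0.0, 0.0]), "offset": -2.5,
                "label": "wall"},
    "ceiling": {"mask": np.zeros((480, 640), dtype=bool),
                "normal": np.array([0.0, -1.0, 0.0]), "offset": 2.4,
                "label": "ceiling"},
}
```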
Borders between layout planes can be based on first scene priors 1201 in
At step 1501, a first set of borders comprising lines that form seams between two walls can be detected. The lines can represent seams between adjacent, touching wall planes, e.g., wall-wall vertical seams or edges. Line segments, a deep network, and/or other scene priors, e.g., input images and semantic segmentation, can be inputs to detect borders, as shown with reference to
At step 1502 in
At step 1503 in
Wall-wall seam, or border, detection can be refined with reference to
Referring back to
In an example, as shown in
A plurality of plane equations corresponding to the plurality of planes, a plurality of initial plane masks corresponding to the plurality of planes, and a plurality of connectivity values can be stored, each plane mask indicating the presence or absence of a particular plane at a plurality of pixel locations. The determination of the plane equations and the generation of the plane masks are described with respect to the previous steps. The plane masks can then be used to determine 3D plane equations corresponding to the planes. The plane equations can correspond to 3D plane parameters corresponding to the plurality of layout planes that estimate the geometry of the scene. In other words, the 3D plane equations can define the orientation of the planes in 3D space. These computed values are then stored to be used when determining estimated geometry of the scene.
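One plausible way (an assumption, not necessarily the disclosed procedure) to compute a 3D plane equation from an initial plane mask is to back-project the masked pixels to 3D points using available depth information and fit a plane by least squares:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane n·X + d = 0 through 3D points from a plane mask.

    points: (N, 3) array of 3D points back-projected from pixels inside the mask.
    Returns (normal, d) with a unit normal.
    """
    centroid = points.mean(axis=0)
    # The smallest singular vector of the centered points is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = -normal @ centroid
    return normal, d
```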
In another example,
The semantic map can be generated from one or more input images, for example. At step 1901, the semantic map can be superimposed on the plurality of borders to select for pixels corresponding to wall planes, ceiling planes, or floor planes. These planes can be used to generate the corresponding initial plane masks having plane equations.
Step 1901 can include looking up the semantic labels corresponding to the plurality of pixels in the semantic map to determine which labels are assigned to the pixels. The semantic map can be superimposed on the borders to identify which semantic label corresponds to each of the pixels. This can include, for example, identifying pixels that have a semantic label of wall, ceiling, or floor.
A user can select the pixels corresponding to the planes for modeling. For example, for a selected pixel location, it can be determined which plane is present at that location. Once the planes are identified, the 3D plane equations can be used in conjunction with the locations of pixels corresponding to the planes to determine the estimated geometry of the planes.
Step 1901 can additionally or alternatively include superimposing the semantic map onto the depth map discussed above to determine locations of planes within the scene. As discussed above, the plane masks can then be used to determine 3D plane equations corresponding to the planes. The 3D plane equations can define the orientation of the planes in 3D space.
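The following is a rough sketch of the superimposition in step 1901, assuming integer semantic labels and a rasterized border image; splitting the wall-labeled region into connected components is one plausible way to obtain per-plane initial masks.

```python
import numpy as np
from scipy.ndimage import label

def initial_plane_masks(semantic_map, border_mask, wall_label=1):
    """Split wall-labeled pixels into per-plane masks using detected borders.

    semantic_map: (H, W) integer semantic labels.
    border_mask:  (H, W) boolean image, True on rasterized border pixels.
    """
    wall_pixels = (semantic_map == wall_label) & ~border_mask
    components, count = label(wall_pixels)       # each connected region -> one plane
    return [(components == i) for i in range(1, count + 1)]
```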
Referring back to
Second scene priors 2001 can be seen in
The estimated geometry can include estimated geometry that is curved or curvilinear. The system can include functionality for identifying curved geometry (such as curved walls, arched ceilings, or other structures) and determining an estimate of the curvature, such as through an estimation of equations that describe the geometry or a modeling of the geometry based on continuity.
An example of planar optimization is shown in
At step 2201, a non-linear optimization function can be applied based at least in part on the plurality of initial plane masks, the plurality of connectivity values, and the one or more second scene priors to generate an initial estimated geometry of the plurality of layout planes, the initial estimated geometry comprising confidence values associated with the plurality of layout planes. Optimization accounts for low-confidence and high-error planes, which are detected and refined.
At step 2202 in
An exemplary implementation can be as follows:
At step 2203, the plurality of initial plane masks can be refined based at least in part on the refined estimated geometry to generate the plurality of optimized plane masks.
As shown in
The methods of
The following notation and assumptions can be used:
The optimization objective can be as follows:
\text{Objective} = \min_{\{\pi_i\}} \sum_i E_i^{\text{data}} + \sum_{i,j} \left(E_{i,j}^{\text{connectivity}} + E_{i,j}^{\text{manhattan}}\right)
where i, j iterate over the detected planes, and the data loss is defined as:
E_i^{\text{data}} = E_i^{\text{orientation}} + E_i^{\text{depth}}
Loss terms can include connectivity loss, photogrammetry loss, deep depth loss, orientation loss, and/or semantic occlusion loss.
Connectivity loss between planes π_i and π_j can be as follows:
where {r_k} is the set of image rays lying on the boundary between the masks of planes (π_i, π_j), and w_{i,j} ∈ [0, 1] is the connectivity weight, indicating whether two planes are connected. The connectivity as a floating value can reflect the confidence of the prior, e.g., how certain it is that two planes are actually connected.
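The connectivity expression itself is not reproduced above; one plausible formulation consistent with this description, offered only as an assumption, penalizes disagreement between the depths at which the two planes intersect each boundary ray, weighted by w_{i,j}:

```python
import numpy as np

def connectivity_loss(n_i, d_i, n_j, d_j, boundary_rays, w_ij):
    """Hypothetical connectivity term: boundary rays should hit both planes at the same depth.

    n_*, d_*:       plane parameters with n·X + d = 0.
    boundary_rays:  (K, 3) unit ray directions on the shared mask boundary.
    w_ij:           connectivity weight in [0, 1].
    """
    # Depth along ray r for plane (n, d): t such that n·(t r) + d = 0  =>  t = -d / (n·r).
    t_i = -d_i / (boundary_rays @ n_i)
    t_j = -d_j / (boundary_rays @ n_j)
    return w_ij * np.sum((t_i - t_j) ** 2)
```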
Manhattan loss between π_i and π_j can be as follows:

E_{i,j}^{\text{manhattan}} = \rho\left(\min\left(\left|n_i^{\top} n_j\right|,\; \left|\,\left|n_i^{\top} n_j\right| - 1\,\right|\right)\right)
The robust loss ρ helps when planes are actually non-Manhattan (e.g., Atlanta-world scenes), when plane priors indicate the planes are non-Manhattan but the rest of the constraints “pull” them toward being Manhattan, and when there are minor errors in the image vanishing geometry.
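A small sketch of the Manhattan term, using a Huber-style robustifier as a stand-in for ρ (the particular choice of ρ is an assumption):

```python
import numpy as np

def manhattan_loss(n_i, n_j, delta=0.05):
    """Manhattan term: zero when planes are orthogonal or parallel."""
    c = abs(float(n_i @ n_j))            # |n_i^T n_j| in [0, 1]
    r = min(c, abs(c - 1.0))             # residual vanishes for Manhattan configurations
    # Simple Huber robustifier as a stand-in for the robust loss rho.
    return 0.5 * r ** 2 if r <= delta else delta * (r - 0.5 * delta)
```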
Photogrammetry loss for π_i can be as follows:

where {P_i} is the set of 3D points, in the image reference frame, which lie inside mask_i when projected to the image.
Deep depth loss is the same as photogrammetry loss, with the only difference being that the 3D points are subsampled, since the input deep depth is dense, to get the set {P_i}.
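The photogrammetry and deep depth formulas are not reproduced above; a hedged reading, offered as an assumption, is a point-to-plane residual over the set {P_i}. For the deep depth loss, the same function could be applied to a subsampled set of points back-projected from the dense depth map.

```python
import numpy as np

def point_to_plane_loss(normal, d, points):
    """Hypothetical data term: squared distances of points {P_i} inside mask_i to the plane.

    normal: (3,) unit plane normal; d: scalar offset with n·X + d = 0.
    points: (N, 3) 3D points in the image reference frame.
    """
    residuals = points @ normal + d      # signed distance for a unit normal
    return np.sum(residuals ** 2)
```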
Orientation loss for π_i can be as follows:

where {n̂_i} is the set of feasible normals for plane π_i.

This set consists of normals voted by the scene vanishing points, as well as prior normals available (e.g., from the input depth). This set is calculated using the image orientation prior, by accumulating all the candidate normals for mask_i.
Semantic occlusion loss can be as follows:
The first term states that the plane π_i, which always lies in the background because it is a layout plane, cannot be “in front” of the 3D object points. The second term states that no part of the estimated wall plane can be placed under the floor.
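The exact expression is likewise not reproduced above; a hedged reading of the two stated constraints, offered as an assumption, is a pair of one-sided (hinge) penalties:

```python
import numpy as np

def semantic_occlusion_loss(normal, d, object_points, floor_normal, floor_d, wall_points):
    """Hypothetical hinge penalties for the two stated constraints.

    Assumes planes are parameterized as n·X + d = 0 and that floor_normal points upward.
    """
    # The layout plane must stay behind 3D object points: its depth along each
    # object ray may not be smaller than the object's own depth.
    rays = object_points / np.linalg.norm(object_points, axis=1, keepdims=True)
    plane_depth = -d / (rays @ normal)
    object_depth = np.linalg.norm(object_points, axis=1)
    behind_violation = np.clip(object_depth - plane_depth, 0.0, None)

    # Wall points must lie on or above the floor plane.
    below_floor = np.clip(-(wall_points @ floor_normal + floor_d), 0.0, None)
    return np.sum(behind_violation ** 2) + np.sum(below_floor ** 2)
```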
The plane equations can be optimized in a scene having multiple views available. The steps can include:
The association matrix C can be derived by using image information and prior dense normals, line tracking, normals from a deep network, or RGB and dense features from a deep network. Extending the optimization objective to multiple views can include the following steps:
\text{transform}_k(\pi_i) = \left[R_k n_i,\; d_i - (R_k n_i)^{\top} t_k\right]
w_{i,j}^k is the connectivity weight of planes π_i and π_j, as observed from camera k. That is, two world planes might be connected in 3D, but in view k, they do not appear as such. Besides planes moving outside the field of view, this can happen in cases of complex ceilings, poor wall-wall estimates, and occlusions.
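A small sketch of the transform above, assuming planes parameterized as n·X + d = 0 and a pose (R_k, t_k) that maps world coordinates into camera k:

```python
import numpy as np

def transform_plane_to_view(n_i, d_i, R_k, t_k):
    """Express world plane (n_i, d_i) in the coordinate frame of camera k."""
    n_k = R_k @ n_i
    d_k = d_i - n_k @ t_k   # matches transform_k(pi_i) = [R_k n_i, d_i - (R_k n_i)^T t_k]
    return n_k, d_k
```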
The layout processing step can generate orientation priors and/or the initial plane masks. The outputted assets represent the layout extraction of a scene for use in various applications of the systems and methods described herein. Using the techniques described herein, the layout of the scene can be modeled. User applications can include interior design, such as applying wall-hanging objects or furniture, or providing an empty room to allow reimagining of the space in the scene.
As discussed, layout extraction can be used in a variety of applications, such as interior design. The methods described herein can be implemented in a user-facing system, in which a user is prompted to take one or more photos of a space showing a scene (e.g., a room in their house). With reference to
Once this modeling is complete, a user can apply virtual objects, e.g., furniture or wall-hanging objects, to the scene to simulate various interior design options, and architectural editing and planning, for example. As shown in
A dataset of 250 wide-angle photographs from homes was gathered, captured from viewpoints that maximize scene visibility. A specialized tool was used to annotate the room layout, e.g., the ground-truth floor-wall boundary, even in challenging environments (e.g., kitchens).
Wall-floor edge error was evaluated; this error effectively measures the accuracy of the layout in 2D and does not require finding correspondences between predicted and ground-truth planes.
The results were evaluated against Render-And-Compare (RnC), an example of an existing system. For both systems, the same semantic segmentation and dense depth from a deep network were used as inputs (no LiDAR), to allow for comparison. The present system utilized line segments from LCNN directly.
The following Table 1 shows the quantitative results for the wall-floor (W-F) edge loss. RnC is tested with PlaneRCNN, an existing system, as an input, as well as with PlaneTR, the state-of-the-art piecewise planar reconstruction method. Since the present system uses semantic segmentation to carve out plane instances, PlaneTR plane masks were post-processed with semantic segmentation to make for a fairer comparison of the two methodologies. Ablation studies were also included to demonstrate the importance of the optimization losses used.
Table 1 presents quantitative results on the present system's in-house dataset, comparing the present method against RnC under various configurations, for the wall-floor (W-F) edge pixel loss. The arrow-down symbol indicates “lower is better”. For PlaneTR with semantic segmentation, the input plane masks are refined using semantic segmentation. The ablation studies show the importance of the wall-wall connectivity term and the orientation loss.
As shown, using the same input priors, the present method significantly outperforms the previous state of the art on the challenging in-house dataset. It can also be seen that plane segmentation quality has a detrimental effect on the results of existing methods, which have trouble generating precise masks for small wall segments with severe occlusions; this is not a problem for the layout-based approaches described herein.
Qualitative comparisons are also shown in
As shown, mask-based planar segmentation methods face no problem when a wall plane is clearly visible without occlusions (top row). But precise boundary estimation becomes challenging under severe occlusions, resulting in a less accurate layout estimate by the existing methods (bottom row). The present system estimates precise layout plane boundaries, which can be used to enforce reliable connectivity constraints between planes and obtain an accurate layout reconstruction, shown in (c).
At step 3501, a plurality of scene priors corresponding to an image of a scene can be stored. The plurality of scene priors can include a semantic map indicating semantic labels associated with a plurality of pixels in the image, geometry information corresponding to the plurality of pixels in the image, and one or more line segments corresponding to the scene. Step 3501 can be similar to step 401 (
The image can be an RGB image. The semantic labels can include at least one of a wall, a ceiling, or a floor. The scene priors can include one or more of a gravity vector corresponding to the scene; an edge map corresponding to a plurality of edges in the scene; a normal map corresponding to a plurality of normals in the scene; camera parameters of a camera configured to capture the image; or an orientation map corresponding to a plurality of orientation values in the scene. The geometry information can include one or more of a depth map corresponding to the plurality of pixels; photogrammetry points corresponding to a plurality of three-dimensional point values in the plurality of pixels; a sparse depth map corresponding to the plurality of pixels; a plurality of depth pixels storing both color information and depth information; a mesh representation corresponding to the plurality of pixels; a voxel representation corresponding to the plurality of pixels; or depth information associated with one or more polygons corresponding to the plurality of pixels.
At step 3502, one or more borders based on the one or more line segments can be generated. Each border can represent a separation between two layout planes in a plurality of layout planes of the scene. Step 3502 can be similar to step 402 (
The plurality of layout planes can include at least one of a planar plane or a curved plane. Step 3502 can include detecting a horizontal line from a pixel of the plurality of pixels in the image; calculating a vanishing point based on the horizontal line and a gravity vector associated with the pixel; and combining the vanishing point with the gravity vector to determine a plurality of normal estimates. Additionally or alternatively, step 3502 can include detecting a first set of borders comprising lines that form seams between two walls; detecting a second set of borders comprising lines that separate walls in the scene; and detecting a third set of borders comprising lines that separate walls from floors or ceilings in the scene. The detecting the first set of borders comprising lines that form seams between two walls can include determining a first end and a second end of a line segment; and determining whether the line segment forms a seam between two walls based at least in part on the first end, the second end, and a normal map of the scene.
At step 3503, a plurality of plane masks corresponding to the plurality of layout planes that estimate the geometry of the scene can be generated. The plurality of plane masks can be based at least in part on at least one of the plurality of scene priors and the one or more borders. Step 3503 can be similar to steps 402 and 403 (
Step 3503 can include generating a plurality of initial plane masks, the plurality of initial plane masks corresponding to the plurality of layout planes; generating a plurality of plane connectivity values based at least in part on the one or more borders, each plane connectivity value indicating connectivity between two layout planes in the plurality of layout planes; and refining the plurality of initial plane masks based at least in part on an estimated geometry of the plurality of layout planes, the estimated geometry based at least in part on at least one of the plurality of scene priors, the plurality of initial plane masks, and the plurality of connectivity values. The generating the plurality of initial plane masks and the plurality of plane connectivity values based at least in part on the plurality of borders can include superimposing the semantic map on the plurality of borders to select for pixels corresponding to the plurality of layout planes of the scene. The refining the plurality of initial plane masks based at least in part on an estimated geometry of the plurality of layout planes can include applying a non-linear optimization function based at least in part on the plurality of initial plane masks, the plurality of connectivity values, and at least one of the one or more scene priors to generate an initial estimated geometry of the plurality of layout planes, the initial estimated geometry comprising confidence values associated with the plurality of layout planes; detecting and refining one or more low-confidence layout planes in the plurality of layout planes in the initial estimated geometry having confidence values below a predetermined threshold to generate a refined estimated geometry; and refining the plurality of initial plane masks based at least in part on the refined estimated geometry to generate the plurality of plane masks.
Method 3500 can optionally include step 3504, at which one or more three-dimensional (3D) plane parameters corresponding to the plurality of layout planes that estimate the geometry of the scene can be generated. Step 3504 can be similar to steps 402 and 403 (
At step 3601, a first scene prior and a second scene prior corresponding to an image of a scene can be stored. The image can be of one or more corners in the scene. The first scene prior and the second scene prior can include a semantic map indicating semantic labels associated with a plurality of pixels in the image and geometry information corresponding to the plurality of pixels in the image. Step 3601 can be similar to step 401 (
The image can be one of a plurality of images of the scene. The first scene prior and the second scene prior can correspond to the plurality of images of the scene. The semantic map can indicate semantic labels associated with a plurality of pixels in the plurality of images. The first scene prior and the second scene prior can each include one or more of a gravity vector corresponding to the scene; an edge map corresponding to a plurality of edges in the scene; a normal map corresponding to a plurality of normals in the scene; camera parameters of a camera configured to capture the image; or an orientation map corresponding to a plurality of orientation values in the scene. The geometry information can include one or more of a depth map corresponding to the plurality of pixels; photogrammetry points corresponding to a plurality of three-dimensional point values in the plurality of pixels; a sparse depth map corresponding to the plurality of pixels; a plurality of depth pixels storing both color information and depth information; a mesh representation corresponding to the plurality of pixels; a voxel representation corresponding to the plurality of pixels; or depth information associated with one or more polygons corresponding to the plurality of pixels.
At step 3602, one or more borders based on at least one of the first scene prior or the second scene prior can be generated. Each border can represent a separation between two layout planes in a plurality of layout planes of the scene. Step 3602 can be similar to step 402 (
At step 3603, a plurality of plane masks corresponding to the plurality of layout planes that estimate a geometry of the scene can be generated. The plurality of plane masks can be based at least in part on the one or more borders. Step 3603 can be similar to steps 402 and 403 (
Method 3600 can optionally include step 3604, at which one or more three-dimensional (3D) plane parameters corresponding to the plurality of layout planes that estimate the geometry of the scene can be generated. Step 3604 can be similar to steps 402 and 403 (
As shown in
All of the software stored within memory 3701 can be stored as computer-readable instructions that, when executed by one or more processors 3702, cause the processors to perform the functionality described with respect to
Processor(s) 3702 execute computer-executable instructions and can be real or virtual processors. In a multi-processing system, multiple processors or multicore processors can be used to execute computer-executable instructions to increase processing power and/or to execute certain software in parallel.
Specialized computing environment 3700 additionally includes a communication interface 3703, such as a network interface, which is used to communicate with devices, applications, or processes on a computer network or computing system, collect data from devices on a network, and implement encryption/decryption actions on network communications within the computer network or on data stored in databases of the computer network. The communication interface conveys information such as computer-executable instructions, audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Specialized computing environment 3700 further includes input and output interfaces 3704 that allow users (such as system administrators) to provide input to the system to set parameters, to edit data stored in memory 3701, or to perform other administrative functions.
An interconnection mechanism (shown as a solid line in
Input and output interfaces 3704 can be coupled to input and output devices. For example, Universal Serial Bus (USB) ports can allow for the connection of a keyboard, mouse, pen, trackball, touch screen, or game controller, a voice input device, a scanning device, a digital camera, remote control, or another device that provides input to the specialized computing environment 3700.
Specialized computing environment 3700 can additionally utilize removable or non-removable storage, such as magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, USB drives, or any other medium which can be used to store information and which can be accessed within the specialized computing environment 3700.
Having described and illustrated the principles of our invention with reference to the described embodiment, it will be recognized that the described embodiment can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Elements of the described embodiment shown in software may be implemented in hardware and vice versa.
It will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. For example, the steps or order of operation of one of the above-described methods could be rearranged or occur in a different series, as understood by those skilled in the art. It is understood, therefore, that this disclosure is not limited to the particular embodiments disclosed, but it is intended to cover modifications within the spirit and scope of the present disclosure.
This application claims priority to U.S. Provisional Application No. 63/354,596, filed Jun. 22, 2022, and U.S. Provisional Application No. 63/354,608, filed Jun. 22, 2022, the disclosures of which are hereby incorporated by reference in their entirety.