Many modern computing applications reconstruct a scene for use in augmented reality (AR), virtual reality (VR), robotics, autonomous applications, etc. However, conventional scene reconstruction, such as dense three-dimensional (3D) reconstruction, has very high compute and memory requirements. Thus, present techniques are not suitable for real-time scene reconstruction in many applications, such as mobile applications that lack the necessary compute and memory resources.
To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.
Scene reconstruction can sometimes be referred to as dense mapping, and operates to digitally reconstruct a physical environment based on images or 3D scans of the physical environment.
In general, the present disclosure provides scene reconstruction methods and techniques, systems and apparatus for reconstructing scenes, and a two and a half dimensional (2.5D) model for modeling areas (e.g., planar areas, non-planar areas, boundary areas, holes in a plane, etc.) of a scene. With some examples, the 2.5D model can be integrated into a scene reconstruction system and can be used to model a portion of a scene while other portions of the scene can be modeled by a 3D model.
The present disclosure can provide scene reconstruction for applications such as robotics, AR, VR, autonomous driving, high definition (HD) mapping, etc. In particular, the present disclosure can provide a scene reconstruction system where all or portions of the scene are modeled using a 2.5D model, as described in greater detail herein. As such, the present disclosure can be implemented in systems where compute resources are limited, such as, for example, systems lacking a dedicated graphics processing unit (GPU), or the like.
Reference is now made to the detailed description of the embodiments as illustrated in the drawings. While embodiments are described in connection with the drawings and related descriptions, there is no intent to limit the scope to the embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents. In alternate embodiments, additional devices, or combinations of illustrated devices, may be added to or combined, without limiting the scope to the embodiments disclosed herein. The phrases “in one embodiment”, “in various embodiments”, “in some embodiments”, and the like are used repeatedly. Such phrases do not necessarily refer to the same embodiment. The terms “comprising”, “having”, and “including” are synonymous, unless the context dictates otherwise.
Scene reconstruction device 100 includes scene capture device 102, processing circuitry 104, memory 106, input and output devices 108 (I/O), network interface circuitry 110 (NIC), and a display 112. These components can be connected by a bus or busses (not shown). In general, such a bus system provides a mechanism for enabling the various components and subsystems of scene reconstruction device 100 to communicate with each other as intended. In some examples, the bus can be any of a variety of busses, such as, for example, a PCI bus, a USB bus, a front side bus, or the like.
Scene capture device 102 can be any of a variety of devices arranged to capture information about a scene. For example, scene capture device 102 can be a radar system, a depth camera system, a 3D camera system, a stereo camera system, or the like. Examples are not limited in this context. In general, however, scene capture device 102 can be arranged to capture information about the depth of a scene, such as, an indoor room (e.g., refer to
Scene reconstruction device 100 can include one or more of processing circuitry 104. Note, although processing circuitry 104 is depicted as a central processing unit (CPU), processing circuitry 104 can include a multi-threaded processor, a multi-core processor (whether the multiple cores coexist on the same or separate dies), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA). In some examples, processing circuitry 104 may include graphics processing portions and may include dedicated memory, multiple-threaded processing and/or some other parallel processing capability. In some examples, processing circuitry 104 may be circuitry arranged to perform particular computations, such as computations related to artificial intelligence (AI), machine learning, or graphics. Such circuitry may be referred to as an accelerator. Furthermore, although referred to herein as a CPU, circuitry associated with processing circuitry 104 may be a graphics processing unit (GPU), or may be neither a conventional CPU nor a GPU. Additionally, where multiple processing circuitry 104 are included in scene reconstruction device 100, each processing circuitry 104 need not be identical.
Memory 106 can be tangible media configured to store computer readable data and instructions. Examples of tangible media include circuitry for storing data (e.g., semiconductor memory), such as flash memory, non-transitory read-only memory (ROM), dynamic random access memory (DRAM), NAND memory, NOR memory, phase-change memory, battery-backed volatile memory, or the like. In general, memory 106 will include at least some non-transitory computer-readable medium arranged to store instructions executable by circuitry (e.g., processing circuitry 104, or the like). Memory 106 could include a DVD/CD-ROM drive and associated media, a memory card, or the like. Additionally, memory 106 could include a hard disk drive or a solid-state drive.
The input and output devices 108 include devices and mechanisms for receiving input information to scene reconstruction device 100 or for outputting information from scene reconstruction device 100. These may include a keyboard, a keypad, a touch screen incorporated into the display 112, audio input devices such as voice recognition systems, microphones, and other types of input devices. In various embodiments, the input and output devices 108 may be embodied as a computer mouse, a trackball, a track pad, a joystick, wireless remote, drawing tablet, voice command system, eye tracking system, and the like. The input and output devices 108 typically allow a user to select objects, icons, control areas, text and the like that appear on the display 112 via a command such as a click of a button or the like. Further, input and output devices 108 can include speakers, printers, infrared LEDs, display 112, and so on as well understood in the art. Display 112 can include any of a variety of devices to display images or a graphical user interface (GUI).
Memory 106 may include instructions 114, scene capture data 116, 2.5D plane data 118, 3D data 120, and visualization data 122. In general, processing circuitry 104 can execute instructions 114 to receive indications of a scene (e.g., indoor scene 200 of
Furthermore, the processing circuitry 104 can execute instructions 114 to generate both 2.5D plane data 118 and 3D data 120. More specifically, the present disclosure provides that portions of a scene can be represented by a 2D plane, and as such, 2.5D plane data 118 can be generated from scene capture data 116 for these portions of the scene. Likewise, other portions of the scene can be represented by 3D data, and as such, 3D data 120 can be generated from scene capture data 116 for these portions of the scene. Subsequently, visualization data 122 can be generated from the 2.5D plane data 118 and the 3D data 120. The visualization data 122 can include indications of a rendering of the scene. Visualization data 122 can be used in either a VR system or an AR system. As such, the visualization data 122 can include indications of a virtual rendering of the scene or an augmented reality rendering of the scene.
Indoor scene 200 includes a wall 202, a painting 204, and a couch 206. Scene reconstruction device 100 can be arranged to capture indications of indoor scene 200, such as, indications of depth (e.g., from device 102, from a fixed reference point, or the like) of points of indoor scene 200. It is noted, that points in indoor scene 200 are not depicted for purposes of clarity. Further, the number of points, or rather, the resolution, of the scene capture device can vary.
Indoor scene 200 is used to describe illustrative examples of the present disclosure, where a scene is reproduced by representing portions of the scene as a 2D plane and other portions of the scene as 3D objects. In particular, indoor scene 200 can be reproduced by representing portions of wall 202 not covered by painting 204 and couch 206 as 2D plane 208. Further, the frame portion of painting 204 can be represented as 3D object 210 while the canvas portion of painting 204 can be represented as 2D plane 212. Likewise, couch 206 can be represented as 3D object 214. By representing portions of indoor scene 200 as 2D planes, the present disclosure provides for real-time and/or on-device scene reconstructions without the need for large scale computational resources (e.g., GPU support, or the like).
Routine 300 can begin at block 302 “receive data comprising indications of a scene” where data including indications of a scene can be received. For example, processing circuitry 104 can execute instructions 114 to receive scene capture data 116. As a specific example, processing circuitry 104 can execute instructions 114 to cause scene capture device 102 to capture indications of a scene (e.g., indoor scene 200). Processing circuitry 104 can execute instructions 114 to store the captured indications as scene capture data 116.
Continuing to block 304 “identify planar areas within the scene” planar areas in the scene can be identified. In general, for indoor scenes, planar surfaces (e.g., walls, floors, ceilings, etc.) typically occupy a significant portion of the non-free space. Such planar areas are identified at block 304. For example, processing circuitry 104 can execute instructions 114 to identify areas within scene capture data 116 having a contiguous depth value, thereby forming a surface. In a specific example, depth values within a threshold value of each other across a selection of points will be identified as a planar surface. Referring to
Continuing to block 306 “segment the scene into planes and 3D objects” the scene can be segmented into planes and 3D objects. For example, points within the scene capture data 116 associated with the planar areas identified at block 304 can be segmented from the other points of the scene. Processing circuitry 104 can execute instructions 114 to identify or mark points of scene capture data 116 associated with the identified planes. As a specific example, the depth value of points associated with the identified planar areas can be multiplied by negative 1 (−1). In conventional systems, depth values are not negative. As such, a negative depth value can indicate inclusion within the planar areas. As another specific example, processing circuitry 104 can execute instructions 114 to generate 2.5D plane data 118 for 2D plane 208 and 2D plane 212.
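The sign-based marking described for block 306 can be illustrated with a minimal Python sketch. The function names `mark_planar` and `split_points`, the flat list of depth values, and the sample depths are assumptions made for illustration only; the sketch also assumes that a depth of exactly zero is not a valid measurement, since zero cannot carry a sign.

```python
from typing import List, Tuple


def mark_planar(depths: List[float], planar_idx: List[int]) -> List[float]:
    """Return a copy of depths in which planar points carry a negative depth.

    Hypothetical helper: multiplies the (positive) depth of each planar
    point by -1, as described for block 306.
    """
    marked = list(depths)
    for i in planar_idx:
        marked[i] = -abs(marked[i])
    return marked


def split_points(marked: List[float]) -> Tuple[List[int], List[int]]:
    """Split point indices into (planar, non_planar) using the depth sign."""
    planar = [i for i, d in enumerate(marked) if d < 0]
    non_planar = [i for i, d in enumerate(marked) if d >= 0]
    return planar, non_planar


# Illustrative depths in meters; indices 0, 1 and 3 were found to lie on a wall.
depths = [2.0, 2.01, 1.2, 1.99, 0.8]
marked = mark_planar(depths, [0, 1, 3])
planar, non_planar = split_points(marked)
```

Because the original depth is recoverable as the absolute value of the marked depth, this kind of marking needs no extra per-point storage.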
Continuing to subroutine block 308 “generate 2.5D plane models for planar areas” 2.5D plane models for the identified planar areas can be generated. For example, processing circuitry 104 can execute instructions 114 to generate 2.5D plane data 118 from points of scene capture data 116 associated with the identified planar areas. This is described in greater detail below, for example, with respect to
Continuing to subroutine block 312 “reconstruct the scene from the 2.5D plane models and the 3D object models” the scene can be reconstructed (e.g., visualized, or the like) from the 2.5D plane models and the 3D object models generated at subroutine block 308 and subroutine block 310. More particularly, processing circuitry 104 can execute instructions 114 to generate visualization data 122 from 2.5D plane data 118 generated at subroutine block 308 and the 3D data 120 generated at subroutine block 310. With some examples, processing circuitry 104 can execute instructions 114 to display the reconstructed scene (e.g., based on visualization data 122, or the like) on display 112. More specifically, processing circuitry 104 can execute instructions 114 to display the reconstructed indoor scene 200 as part of a VR or AR image.
It is noted that routine 300 depicts various subroutines for modeling objects or planes in a scene and for reconstructing the scene from these models. In scene reconstruction, scene capture data 116 typically includes indications of points, a point cloud, or surfels. Said differently, a point cloud is mostly used to model raw sensor data. From point cloud data, voxels can be generated. More specifically, volumetric methods can be applied to digitize the 3D space (e.g., the point cloud) with a regular grid, with each grid cell named a voxel. For each voxel, a value is stored to represent either the probability of this place being occupied (occupancy grid mapping), or its distance to the nearest surface (signed distance function (SDF), or truncated SDF (TSDF)).
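As an illustration of digitizing a point cloud with a regular grid, the following minimal Python sketch bins points into voxels keyed by integer grid indices. The names `voxel_key` and `count_hits`, the voxel size, and the sample points are assumptions; the per-voxel hit count is a deliberately simplified stand-in for the occupancy probability or TSDF value that a full system would store.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_key(p: Point, voxel_size: float) -> VoxelKey:
    """Map a 3D point to the integer index of the grid cell that contains it."""
    return (
        math.floor(p[0] / voxel_size),
        math.floor(p[1] / voxel_size),
        math.floor(p[2] / voxel_size),
    )


def count_hits(points: Iterable[Point], voxel_size: float) -> Dict[VoxelKey, int]:
    """Count how many points fall into each voxel.

    Only voxels that receive at least one point are stored, so the
    dictionary acts as a sparse (hashed) grid.
    """
    hits: Dict[VoxelKey, int] = {}
    for p in points:
        key = voxel_key(p, voxel_size)
        hits[key] = hits.get(key, 0) + 1
    return hits


# Four illustrative points (meters) binned into 5 cm voxels.
cloud = [(0.02, 0.03, 0.01), (0.04, 0.01, 0.04), (0.12, 0.03, 0.01), (-0.01, 0.0, 0.0)]
hits = count_hits(cloud, voxel_size=0.05)
```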
It is noted, that with conventional volumetric method techniques, it is impractical to generate voxels for a room-size or larger indoor space. That is, the memory of modern desktop computers is insufficient to store indications of all the voxels. As such, voxels may be compacted using octrees and hashing.
It is noted that the difficulty with representing indoor scenes as planar is that planar surfaces in the real world are usually not strictly planar. For example, attached on walls there can be power plugs, switches, paintings (e.g., painting 204, or the like), etc. Furthermore, using octree models, representation of large planar surfaces cannot be compressed as the large planar surface splits all the nodes it passes through. For example,
In general, routine 600 provides that planar surfaces of indoor scenes (e.g., walls, floors, ceilings, etc.), which usually occupy a significant portion of the non-free space, be modeled as a surface. As noted above, these large planar surfaces cannot be compressed using octree or hashing. For example, for octree maps, their efficiency comes from the fact that only nodes near the surface of an object are split into the finest resolution. However, as detailed above (e.g., see
Routine 600 can begin at block 602 “fit a plane to the planar surface” where a plane (e.g., defined in the X and Y coordinates, or the like) can be fit to the planar surface. For example, processing circuitry 104 can execute instructions 114 to fit a plane to the 2D plane 208 or the 2D plane 212. Continuing to block 604 “set values representing distance from the planar surface to fitted plane” values indicating a distance between the actual surface (e.g., 2D plane 208, 2D plane 212, or the like) and the fit plane (e.g., the plane generated at block 602) can be set. For example, processing circuitry 104 can execute instructions 114 to set a value representing the distance from the actual surface to the fitted plane at the center position of the cell. With some examples, this value can be based on a Truncated Signed Distance Function (TSDF). Additionally, with some examples, a weight can be set at block 604 where the weight is indicative of the confidence of the distance value (e.g., the TSDF value, or the like) and the occupancy state. More particularly, the TSDF value can mean the signed distance from the actual surface to the fitted plane. In some examples, the TSDF value can be updated whenever there is an observation of the surface near the fitted plane at the center position of the corresponding cell. Furthermore, the weights can indicate both confidence and occupancy. Regarding the weights, with some examples, the weights may have an initial value of 0, which can be increased (e.g., w+=1) when there is an observation of the surface fit to the plane at this position, or decreased (e.g., w*=0.5, or the like, to converge to 0 with infinite observations) when this position is observed to be free (unoccupied). A cell can be considered to be free if its weight is below a threshold (e.g., w<1.0).
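The weight rule described for block 604 can be sketched in a few lines of Python. The class name `PlaneCell` and its method names are hypothetical; the increment (w+=1), the decay (w*=0.5), and the free threshold (w<1.0) are the example values given above.

```python
from dataclasses import dataclass

FREE_THRESHOLD = 1.0  # example threshold from the text (w < 1.0 means free)


@dataclass
class PlaneCell:
    """One cell of a 2.5D plane model (hypothetical structure)."""

    tsdf: float = 0.0    # signed distance from the actual surface to the fit plane
    weight: float = 0.0  # confidence / occupancy, initially 0

    def observe_surface(self) -> None:
        """The surface was observed at this cell: increase the weight."""
        self.weight += 1.0

    def observe_free(self) -> None:
        """The cell was observed to be unoccupied: halve the weight."""
        self.weight *= 0.5

    def is_free(self) -> bool:
        """A cell counts as free while its weight is below the threshold."""
        return self.weight < FREE_THRESHOLD


cell = PlaneCell()
initially_free = cell.is_free()       # weight 0.0
for _ in range(3):
    cell.observe_surface()            # weight 3.0
occupied_after_hits = not cell.is_free()
cell.observe_free()                   # weight 1.5, still occupied
still_occupied = not cell.is_free()
cell.observe_free()                   # weight 0.75, now free again
free_after_decay = cell.is_free()
```

Halving rather than subtracting keeps the weight non-negative, so a cell that was seen many times needs several free observations before it is treated as a hole in the plane.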
As a specific example,
The 2.5D plane model 704 is updated when there is an aligned observation from a 3D sensor (e.g., scene capture device 102, or the like). Alignment is described in greater detail below. With some examples, updating a 2.5D plane model 704 can be based on the following pseudocode.
In the pseudocode above, the function “to_plane_frame” denotes the process of transforming a given point into the coordinate frame of the plane, which is defined such that the fit plane is spanned by the X- and Y-axes, and the Z-axis points towards the sensor. More specifically, the fit plane 706 is represented in the X-axis and Y-axis where the Z-axis points towards scene capture device 102. It is noted that the above pseudocode is just one example of an update algorithm and the present disclosure could be implemented using different updating algorithms under the same principle of the TSDF and weight definition.
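As one illustration of an update that follows the same TSDF and weight principle, the following minimal Python sketch transforms an observed point into the plane frame and folds its signed distance into the matching cell. Only the name `to_plane_frame` comes from the description; `PlaneFrame`, `update_cell`, `cell_size`, `truncation`, the running weighted average, and the sample wall geometry are assumptions made for illustration and are not the update algorithm of the disclosure.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
CellKey = Tuple[int, int]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


class PlaneFrame:
    """Coordinate frame of a fit plane: X and Y span the plane, Z points to the sensor."""

    def __init__(self, origin: Vec3, x_axis: Vec3, y_axis: Vec3, z_axis: Vec3):
        self.origin, self.x_axis, self.y_axis, self.z_axis = origin, x_axis, y_axis, z_axis


def to_plane_frame(frame: PlaneFrame, point: Vec3) -> Vec3:
    """Express a world-frame point in the plane frame (axes assumed orthonormal)."""
    d = (point[0] - frame.origin[0], point[1] - frame.origin[1], point[2] - frame.origin[2])
    return (dot(d, frame.x_axis), dot(d, frame.y_axis), dot(d, frame.z_axis))


def update_cell(cells: Dict[CellKey, Tuple[float, float]], frame: PlaneFrame,
                point: Vec3, cell_size: float, truncation: float) -> Optional[CellKey]:
    """Fold one surface observation into the 2.5D model; return the updated cell key."""
    x, y, z = to_plane_frame(frame, point)
    if abs(z) > truncation:
        return None  # too far from the fit plane to belong to this model
    key = (math.floor(x / cell_size), math.floor(y / cell_size))
    tsdf, w = cells.get(key, (0.0, 0.0))
    # Assumed update: running weighted average of the signed distance, then w += 1.
    cells[key] = ((tsdf * w + z) / (w + 1.0), w + 1.0)
    return key


# Illustrative wall at world x = 2.0 m, observed by a sensor at the world origin.
wall = PlaneFrame(origin=(2.0, 0.0, 0.0), x_axis=(0.0, -1.0, 0.0),
                  y_axis=(0.0, 0.0, 1.0), z_axis=(-1.0, 0.0, 0.0))
cells: Dict[CellKey, Tuple[float, float]] = {}
k1 = update_cell(cells, wall, (1.98, -0.12, 0.07), cell_size=0.05, truncation=0.1)
k2 = update_cell(cells, wall, (1.96, -0.11, 0.06), cell_size=0.05, truncation=0.1)
k3 = update_cell(cells, wall, (1.00, 0.00, 0.00), cell_size=0.05, truncation=0.1)
```

In this sketch the two nearby observations land in the same cell and their distances are averaged, while the third point lies one meter in front of the wall and is left for the 3D model.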
Returning to
It is noted, that the number of occupied voxels represented by 3D data is significantly reduced (e.g.,
The present disclosure provides for real-time (e.g., live, or the like) indoor scene (e.g., indoor scene 800, or the like) reconstruction without the need for a GPU. For example, indoor scene 800 was reconstructed in real-time by integrating over 20 depth camera frames per second on a single core of a modern CPU. An additional advantage of the present disclosure is that it can be used to further enhance understanding of the scene by machine learning applications. For example, as planar surfaces (e.g., walls, floors, ceilings, etc.) can be explicitly modeled, the machine learning agent can further infer the spatial structure of the scene, such as to segment rooms based on wall information, to ignore walls, floors, ceilings, and focus on things in the room, or the like. As a specific example, a machine learning agent can infer planar surfaces (e.g., walls, ceilings, floors, etc.) from the 2.5D plane data 118 and can then focus on objects represented in the 3D data 120, for example, to identify objects within an indoor scene without needing to parse the objects out from the planar surfaces.
The instructions 1008 transform the general, non-programmed machine 1000 into a particular machine 1000 programmed to carry out the described and illustrated functions in a specific manner. In alternative embodiments, the machine 1000 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1000 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1000 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a PDA, an entertainment media system, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart devices, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1008, sequentially or otherwise, that specify actions to be taken by the machine 1000. Further, while only a single machine 1000 is illustrated, the term “machine” shall also be taken to include a collection of machines 1000 that individually or jointly execute the instructions 1008 to perform any one or more of the methodologies discussed herein.
The machine 1000 may include processors 1002, memory 1004, and I/O components 1042, which may be configured to communicate with each other such as via a bus 1044. In an example embodiment, the processors 1002 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an ASIC, a Radio-Frequency Integrated Circuit (RFIC), a neural-network (NN) processor, an artificial intelligence accelerator, a vision processing unit (VPU), another processor, or any suitable combination thereof) may include, for example, a processor 1006 and a processor 1010 that may execute the instructions 1008.
The term “processor” is intended to include multi-core processors that may comprise two or more independent processors (sometimes referred to as “cores”) that may execute instructions contemporaneously. Although
The memory 1004 may include a main memory 1012, a static memory 1014, and a storage unit 1016, all accessible to the processors 1002 such as via the bus 1044. The main memory 1012, the static memory 1014, and storage unit 1016 store the instructions 1008 embodying any one or more of the methodologies or functions described herein. The instructions 1008 may also reside, completely or partially, within the main memory 1012, within the static memory 1014, within machine-readable medium 1018 within the storage unit 1016, within at least one of the processors 1002 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1000.
The I/O components 1042 may include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1042 that are included in a particular machine will depend on the type of machine. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1042 may include many other components that are not shown in
In further example embodiments, the I/O components 1042 may include biometric components 1032, motion components 1034, environmental components 1036, or position components 1038, among a wide array of other components. For example, the biometric components 1032 may include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 1034 may include acceleration sensor components (e.g., accelerometer), gravitation sensor components, rotation sensor components (e.g., gyroscope), and so forth. The environmental components 1036 may include, for example, illumination sensor components (e.g., photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., barometer), depth and/or proximity sensor components (e.g., infrared sensors that detect nearby objects, depth cameras, 3D cameras, stereoscopic cameras, or the like), gas sensors (e.g., gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 1038 may include location sensor components (e.g., a GPS receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.
Communication may be implemented using a wide variety of technologies. The I/O components 1042 may include communication components 1040 operable to couple the machine 1000 to a network 1020 or devices 1022 via a coupling 1024 and a coupling 1026, respectively. For example, the communication components 1040 may include a network interface component or another suitable device to interface with the network 1020. In further examples, the communication components 1040 may include wired communication components, wireless communication components, cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1022 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a USB).
Moreover, the communication components 1040 may detect identifiers or include components operable to detect identifiers. For example, the communication components 1040 may include Radio Frequency Identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar code, multi-dimensional bar codes such as Quick Response (QR) code, Aztec code, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar code, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 1040, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.
The various memories (i.e., memory 1004, main memory 1012, static memory 1014, and/or memory of the processors 1002) and/or storage unit 1016 may store one or more sets of instructions and data structures (e.g., software) embodying or utilized by any one or more of the methodologies or functions described herein. These instructions (e.g., the instructions 1008), when executed by processors 1002, cause various operations to implement the disclosed embodiments.
As used herein, the terms “machine-storage medium,” “device-storage medium,” “computer-storage medium” mean the same thing and may be used interchangeably in this disclosure. The terms refer to a single or multiple storage devices and/or media (e.g., a centralized or distributed database, and/or associated caches and servers) that store executable instructions and/or data. The terms shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media, including memory internal or external to processors. Specific examples of machine-storage media, computer-storage media and/or device-storage media include non-volatile memory, including by way of example semiconductor memory devices, e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), FPGA, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms “machine-storage media,” “computer-storage media,” and “device-storage media” specifically exclude carrier waves, modulated data signals, and other such media, at least some of which are covered under the term “signal medium” discussed below.
In various example embodiments, one or more portions of the network 1020 may be an ad hoc network, an intranet, an extranet, a VPN, a LAN, a WLAN, a WAN, a WWAN, a MAN, the Internet, a portion of the Internet, a portion of the PSTN, a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 1020 or a portion of the network 1020 may include a wireless or cellular network, and the coupling 1024 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1024 may implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1×RTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long range protocols, or other data transfer technology.
The instructions 1008 may be transmitted or received over the network 1020 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1040) and utilizing any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, the instructions 1008 may be transmitted or received using a transmission medium via the coupling 1026 (e.g., a peer-to-peer coupling) to the devices 1022. The terms “transmission medium” and “signal medium” mean the same thing and may be used interchangeably in this disclosure. The terms “transmission medium” and “signal medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1008 for execution by the machine 1000, and includes digital or analog communications signals or other intangible media to facilitate communication of such software. Hence, the terms “transmission medium” and “signal medium” shall be taken to include any form of modulated data signal, carrier wave, and so forth. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
Terms used herein should be accorded their ordinary meaning in the relevant arts, or the meaning indicated by their use in context, but if an express definition is provided, that meaning controls.
Herein, references to “one embodiment” or “an embodiment” do not necessarily refer to the same embodiment, although they may. Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively, unless expressly limited to a single one or multiple ones. Additionally, the words “herein,” “above,” “below” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. When the claims use the word “or” in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list, unless expressly limited to one or the other. Any terms not expressly defined herein have their conventional meaning as commonly understood by those having skill in the relevant art(s).
The following are a number of illustrative examples of the disclosure. These examples pertain to further embodiments, from which numerous permutations and configurations will be apparent.
Example 1. A computing apparatus, the computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: receive, from a depth measurement device, scene capture data comprising indications of an indoor scene; identify a planar area of the indoor scene from the scene capture data; model the planar area using a two-and-a-half-dimensional (2.5D) model; identify a non-planar area of the indoor scene from the scene capture data; model the non-planar area of the indoor scene using a three-dimensional (3D) model; and generate visualization data comprising indications of a digital reconstruction of the indoor scene based on the 2.5D model and the 3D model.
Example 2. The computing apparatus of example 1, wherein, to model the planar area using the 2.5D model, the instructions, when executed by the processor, configure the apparatus to: fit a planar surface to the planar area; and set, for each of a plurality of points on the plane, a distance from the fit plane to the planar surface.
Example 3. The computing apparatus of example 2, wherein the instructions, when executed by the processor, further configure the apparatus to derive the distance from the fit plane to the planar surface based on a truncated signed distance function (TSDF).
Example 4. The computing apparatus of example 2, wherein the instructions, when executed by the processor, further configure the apparatus to set, for each of the plurality of points on the plane, a weight value, wherein the weight value comprises an indication of a confidence of the distance.
Example 5. The computing apparatus of example 1, wherein the scene capture data comprises a plurality of points, and wherein the instructions, when executed by the processor, further configure the apparatus to: mark ones of the plurality of points associated with the planar area; and identify the non-planar area from the ones of the plurality of points that are not marked.
Example 6. The computing apparatus of example 1, wherein, to model the non-planar area using the 3D model, the instructions, when executed by the processor, configure the apparatus to derive voxel values and node values representing the non-planar area.
Example 7. A computer implemented method, comprising: receiving, from a depth measurement device, scene capture data comprising indications of an indoor scene; identifying a planar area of the indoor scene from the scene capture data; modeling the planar area using a two-and-a-half-dimensional (2.5D) model; identifying a non-planar area of the indoor scene from the scene capture data; modeling the non-planar area of the indoor scene using a three-dimensional (3D) model; and generating visualization data comprising indications of a digital reconstruction of the indoor scene based on the 2.5D model and the 3D model.
Example 8. The computer implemented method of example 7, wherein modeling the planar area using the 2.5D model comprises: fitting a planar surface to the planar area; and setting, for each of a plurality of points on the plane, a distance from the fit plane to the planar surface.
Example 9. The computer implemented method of example 8, comprising deriving the distance from the fit plane to the planar surface based on a truncated signed distance function (TSDF).
Example 10. The computer implemented method of example 8, comprising setting, for each of the plurality of points on the plane, a weight value, wherein the weight value comprises an indication of a confidence of the distance.
Example 11. The computer implemented method of example 7, wherein the scene capture data comprises a plurality of points, the method further comprising: marking ones of the plurality of points associated with the planar area; and identifying the non-planar area from the ones of the plurality of points that are not marked.
Example 12. The computer implemented method of example 7, wherein modeling the non-planar area using the 3D model comprises deriving voxel values and node values representing the non-planar area.
Example 13. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to: receive, from a depth measurement device, scene capture data comprising indications of an indoor scene; identify a planar area of the indoor scene from the scene capture data; model the planar area using a two-and-a-half-dimensional (2.5D) model; identify a non-planar area of the indoor scene from the scene capture data; model the non-planar area of the indoor scene using a three-dimensional (3D) model; and generate visualization data comprising indications of a digital reconstruction of the indoor scene based on the 2.5D model and the 3D model.
Example 14. The computer-readable storage medium of example 13, wherein, to model the planar area using the 2.5D model, the instructions, when executed by the computer, cause the computer to: fit a planar surface to the planar area; and set, for each of a plurality of points on the plane, a distance from the fit plane to the planar surface.
Example 15. The computer-readable storage medium of example 14, wherein the instructions, when executed by the computer, further cause the computer to derive the distance from the fit plane to the planar surface based on a truncated signed distance function (TSDF).
Example 16. The computer-readable storage medium of example 14, wherein the instructions, when executed by the computer, further cause the computer to set, for each of the plurality of points on the plane, a weight value, wherein the weight value comprises an indication of a confidence of the distance.
Example 17. The computer-readable storage medium of example 13, wherein the scene capture data comprises a plurality of points, and wherein the instructions, when executed by the computer, further cause the computer to: mark ones of the plurality of points associated with the planar area; and identify the non-planar area from the ones of the plurality of points that are not marked.
Example 18. The computer-readable storage medium of example 13, wherein, to model the non-planar area using the 3D model, the instructions, when executed by the computer, cause the computer to derive voxel values and node values representing the non-planar area.
Example 19. An apparatus, comprising: means for receiving, from a depth measurement device, scene capture data comprising indications of an indoor scene; means for identifying a planar area of the indoor scene from the scene capture data; means for modeling the planar area using a two-and-a-half-dimensional (2.5D) model; means for identifying a non-planar area of the indoor scene from the scene capture data; means for modeling the non-planar area of the indoor scene using a three-dimensional (3D) model; and means for generating visualization data comprising indications of a digital reconstruction of the indoor scene based on the 2.5D model and the 3D model.
Example 20. The apparatus of example 19, comprising means for fitting a planar surface to the planar area and means for setting, for each of a plurality of points on the plane, a distance from the fit plane to the planar surface to model the planar area using the 2.5D model.
Example 21. The apparatus of example 20, comprising means for deriving the distance from the fit plane to the planar surface based on a truncated signed distance function (TSDF).
Example 22. The apparatus of example 20, comprising means for setting, for each of the plurality of points on the plane, a weight value, wherein the weight value comprises an indication of a confidence of the distance.
Example 23. The apparatus of example 19, wherein the scene capture data comprises a plurality of points, the apparatus comprising means for marking ones of the plurality of points associated with the planar area and means for identifying the non-planar area from the ones of the plurality of points that are not marked.
Example 24. The apparatus of example 19, comprising means for deriving voxel values and node values representing the non-planar area to model the non-planar area using the 3D model.
Example 25. A head worn computing device, comprising: a frame; a display coupled to the frame; a processor; and a memory storing instructions that, when executed by the processor, configure the head worn computing device to: receive, from a depth measurement device, scene capture data comprising indications of an indoor scene; identify a planar area of the indoor scene from the scene capture data; model the planar area using a two-and-a-half-dimensional (2.5D) model; identify a non-planar area of the indoor scene from the scene capture data; model the non-planar area of the indoor scene using a three-dimensional (3D) model; generate visualization data comprising indications of a digital reconstruction of the indoor scene based on the 2.5D model and the 3D model; and cause the digital reconstruction of the indoor scene to be displayed on the display.
Example 26. The head worn computing device of example 25, wherein the head worn computing device is a virtual reality computing device or an augmented reality computing device.
Example 27. The head worn computing device of example 25, wherein, to model the planar area using the 2.5D model, the instructions, when executed by the processor, configure the head worn computing device to: fit a planar surface to the planar area; and set, for each of a plurality of points on the plane, a distance from the fit plane to the planar surface.
Example 28. The head worn computing device of example 27, wherein the instructions, when executed by the processor, further configure the head worn computing device to derive the distance from the fit plane to the planar surface based on a truncated signed distance function (TSDF).
Example 29. The head worn computing device of example 27, wherein the instructions, when executed by the processor, further configure the head worn computing device to set, for each of the plurality of points on the plane, a weight value, wherein the weight value comprises an indication of a confidence of the distance.
Example 30. The head worn computing device of example 25, wherein the scene capture data comprises a plurality of points, and wherein the instructions, when executed by the processor, further configure the head worn computing device to: mark ones of the plurality of points associated with the planar area; and identify the non-planar area from the ones of the plurality of points that are not marked.
Example 31. The head worn computing device of example 25, wherein, to model the non-planar area using the 3D model, the instructions, when executed by the processor, configure the head worn computing device to derive voxel values and node values representing the non-planar area.
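By way of non-limiting illustration, the planar modeling recited in the foregoing examples (fitting a plane, marking the points associated with the planar area, and setting a truncated signed distance and a confidence weight for points on the plane) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the disclosed implementation. The names `fit_plane`, `signed_distance`, `split_points`, and `Plane25D` are hypothetical, as are the cell size, the truncation distance, and the tolerance. The sketch further assumes that the plane is not vertical in the chosen coordinate frame, so that it can be written as z = a*x + b*y + c, that cells are indexed by the (x, y) coordinates of each point, and that observations are fused by a weighted running average, which is a common TSDF update rule that the disclosure does not itself specify.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c).

    Assumption: the plane is not vertical in this coordinate frame.
    """
    sxx = sxy = sx = syy = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    # Normal equations (A^T A) p = A^T z for rows [x, y, 1].
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("points are degenerate; cannot fit a plane")
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown at a time
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(_det3(mc) / det)
    return tuple(coeffs)


def signed_distance(point, plane):
    """Signed perpendicular distance from a point to the fit plane."""
    a, b, c = plane
    x, y, z = point
    return (z - (a * x + b * y + c)) / math.sqrt(a * a + b * b + 1.0)


def split_points(points, plane, tol=0.05):
    """Mark points within tol of the plane as planar; the rest are non-planar."""
    planar, non_planar = [], []
    for p in points:
        target = planar if abs(signed_distance(p, plane)) <= tol else non_planar
        target.append(p)
    return planar, non_planar


class Plane25D:
    """2.5D model: a 2D grid on the plane storing (distance, weight) per cell."""

    def __init__(self, plane, cell=0.1, trunc=0.05):
        self.plane, self.cell, self.trunc = plane, cell, trunc
        self.cells = {}  # (i, j) -> (truncated signed distance, weight)

    def integrate(self, point, weight=1.0):
        """Fuse one observation into its cell by a weighted running average."""
        d = signed_distance(point, self.plane)
        d = max(-self.trunc, min(self.trunc, d))  # truncate the distance
        key = (math.floor(point[0] / self.cell),
               math.floor(point[1] / self.cell))
        old_d, old_w = self.cells.get(key, (0.0, 0.0))
        new_w = old_w + weight
        self.cells[key] = ((old_d * old_w + d * weight) / new_w, new_w)
```

In such a sketch, the points returned as non-planar by `split_points` would be passed to a separate 3D model (for example, one deriving voxel values and node values), which is omitted here. Each planar area costs one two-dimensional grid rather than a three-dimensional volume, which is the source of the memory saving that the 2.5D model is intended to provide.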
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/103432 | 7/22/2020 | WO | |