An object or scene may be described using volumetric visual data consisting of a series of points. The points may be stored as a point cloud format that includes a collection of points in three-dimensional space. As point clouds can get quite large in data size, sending and processing point cloud data may need a data compression scheme that is specifically designed with respect to the unique characteristics of point cloud data.
The following summary presents a simplified summary of certain features. The summary is not an extensive overview and is not intended to identify key or critical elements.
Point cloud information (e.g., of a point cloud associated with content) may be predicted (e.g., between frames). A first plurality of sub-volumes (e.g., a current point cloud) may, for example, be coded based on a second plurality of sub-volumes (e.g., a reference point cloud). Geometry representation models of neighboring sub-volumes, of a current sub-volume, may be converted such that the converted geometry representation models may be of the same type used to represent a geometry of a portion of a point cloud contained in the current sub-volume. Additional geometry information of the neighboring sub-volumes may be obtained to predictively code (e.g., encode or decode) geometry information of the geometry representation model representing the point cloud portion in the current sub-volume. By increasing the amount of relevant geometry information, advantages may be achieved such as higher prediction accuracy, fewer coding bits, and higher compression.
These and other features and advantages are described in greater detail below.
Some features are shown by way of example, and not by limitation, in the accompanying drawings. In the drawings, like numerals reference similar elements.
The accompanying drawings and descriptions provide examples. It is to be understood that the examples shown in the drawings and/or described are non-exclusive, and that features shown and described may be practiced in other examples. Examples are provided for operation of point cloud or point cloud sequence encoding or decoding systems. More particularly, the technology disclosed herein may relate to point cloud compression as used in encoding and/or decoding devices and/or systems.
At least some visual data may describe an object or scene in content and/or media using a series of points. Each point may comprise a position in two dimensions (x and y) and one or more optional attributes like color. Volumetric visual data may add another positional dimension to these visual data. For example, volumetric visual data may describe an object or scene in content and/or media using a series of points that each may comprise a position in three dimensions (x, y, and z) and one or more optional attributes like color, reflectance, time stamp, etc. Volumetric visual data may provide a more immersive way to experience visual data, for example, compared to the at least some visual data. For example, an object or scene described by volumetric visual data may be viewed from any (or multiple) angles, whereas the at least some visual data may generally only be viewed from the angle in which it was captured or rendered. As a format for the representation of visual data (e.g., volumetric visual data, three-dimensional video data, etc.) point clouds are versatile in their capability in representing all types of three-dimensional (3D) objects, scenes, and visual content. Point clouds are well suited for use in various applications including, among others: movie post-production, real-time 3D immersive media or telepresence, extended reality, free viewpoint video, geographical information systems, autonomous driving, 3D mapping, visualization, medicine, multi-view replay, and real-time Light Detection and Ranging (LiDAR) data acquisition.
As explained herein, volumetric visual data may be used in many applications, including extended reality (XR). XR encompasses various types of immersive technologies, including augmented reality (AR), virtual reality (VR), and mixed reality (MR). Sparse volumetric visual data may be used in the automotive industry for the representation of three-dimensional (3D) maps (e.g., cartography) or as input to assisted driving systems. In the case of assisted driving systems, volumetric visual data may be typically input to driving decision algorithms. Volumetric visual data may be used to store valuable objects in digital form. In applications for preserving cultural heritage, a goal may be to keep a representation of objects that may be threatened by natural disasters. For example, statues, vases, and temples may be entirely scanned and stored as volumetric visual data having several billions of samples. This use-case for volumetric visual data may be particularly relevant for valuable objects in locations where earthquakes, tsunamis and typhoons are frequent. Volumetric visual data may take the form of a volumetric frame. The volumetric frame may describe an object or scene captured at a particular time instance. Volumetric visual data may take the form of a sequence of volumetric frames (referred to as a volumetric sequence or volumetric video). The sequence of volumetric frames may describe an object or scene captured at multiple different time instances.
Volumetric visual data may be stored in various formats. A point cloud may comprise a collection of points in a 3D space. Such points may be used to create a mesh comprising vertices and polygons, or other forms of visual content. As described herein, point cloud data may take the form of a point cloud frame, which describes an object or scene in content that is captured at a particular time instance. Point cloud data may take the form of a sequence of point cloud frames (e.g., point cloud video). As further described herein, point cloud data may be encoded by a source device (e.g., source device 102 as described herein with respect to
One format for storing volumetric visual data may be point clouds. A point cloud may comprise a collection of points in 3D space. Each point in a point cloud may comprise geometry information that may indicate the point's position in 3D space. For example, the geometry information may indicate the point's position using three Cartesian coordinates (x, y, and z) and/or using spherical coordinates (r, phi, theta) (e.g., if acquired by a rotating sensor). The positions of points in a point cloud may be quantized according to a space precision. The space precision may be the same or different in each dimension. The quantization process may create a grid in 3D space. One or more points residing within each sub-grid volume may be mapped to the sub-grid center coordinates, referred to as voxels. A voxel may be considered as a 3D extension of pixels corresponding to the 2D image grid coordinates. For example, similar to a pixel being the smallest unit in the example of dividing the 2D space (or 2D image) into discrete, uniform (e.g., equally sized) regions, a voxel may be the smallest unit of volume in the example of dividing 3D space into discrete, uniform regions. A point in a point cloud may comprise one or more types of attribute information. Attribute information may indicate a property of a point's visual appearance. For example, attribute information may indicate a texture (e.g., color) of the point, a material type of the point, transparency information of the point, reflectance information of the point, a normal vector to a surface of the point, a velocity at the point, an acceleration at the point, a time stamp indicating when the point was captured, or a modality indicating how the point was captured (e.g., running, walking, or flying). A point in a point cloud may comprise light field data in the form of multiple view-dependent texture information. Light field data may be another type of optional attribute information.
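The quantization of point positions onto a voxel grid may be sketched as follows. This is a minimal illustration, not the method of any particular encoder: the function names voxelize and voxel_center, the use of floor-based quantization, and the grid origin at (0, 0, 0) are assumptions made for the example.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point],
             precision: Tuple[float, float, float]) -> Set[Voxel]:
    """Map each point to the integer index of the grid cell containing it.

    `precision` is the cell edge length per dimension (it may differ per
    dimension). Points that fall in the same cell collapse to one voxel.
    """
    voxels: Set[Voxel] = set()
    for x, y, z in points:
        voxels.add((
            math.floor(x / precision[0]),
            math.floor(y / precision[1]),
            math.floor(z / precision[2]),
        ))
    return voxels


def voxel_center(voxel: Voxel,
                 precision: Tuple[float, float, float]) -> Point:
    """Return the center coordinates of a grid cell (assumed grid origin 0)."""
    return tuple((i + 0.5) * p for i, p in zip(voxel, precision))


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.0, 0.0)]
    cells = voxelize(cloud, (0.5, 0.5, 0.5))
    print(sorted(cells))
    print([voxel_center(c, (0.5, 0.5, 0.5)) for c in sorted(cells)])
```

In this sketch the first two points share a cell, so three input points yield two voxels; a coarser precision merges more points, and a finer precision merges fewer.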
The points in a point cloud may describe an object or a scene. For example, the points in a point cloud may describe the external surface and/or the internal structure of an object or scene. The object or scene may be synthetically generated by a computer. The object or scene may be generated from the capture of a real-world object or scene. The geometry information of a real-world object or a scene may be obtained by 3D scanning and/or photogrammetry. 3D scanning may include different types of scanning, for example, laser scanning, structured light scanning, and/or modulated light scanning. 3D scanning may obtain geometry information. 3D scanning may obtain geometry information, for example, by moving one or more laser heads, structured light cameras, and/or modulated light cameras relative to an object or scene being scanned. Photogrammetry may obtain geometry information. Photogrammetry may obtain geometry information, for example, by triangulating the same feature or point in different spatially shifted 2D photographs. Point cloud data may take the form of a point cloud frame. The point cloud frame may describe an object or scene captured at a particular time instance. Point cloud data may take the form of a sequence of point cloud frames. The sequence of point cloud frames may be referred to as a point cloud sequence or point cloud video. The sequence of point cloud frames may describe an object or scene captured at multiple different time instances.
The data size of a point cloud frame or point cloud sequence may be excessive (e.g., too large) for storage and/or transmission in many applications. For example, a single point cloud may comprise over a million points or even billions of points. Each point may comprise geometry information and one or more optional types of attribute information. The geometry information of each point may comprise three Cartesian coordinates (x, y, and z) and/or spherical coordinates (r, phi, theta) that may be each represented, for example, using at least 10 bits per component or 30 bits in total. The attribute information of each point may comprise a texture corresponding to a plurality of (e.g., three) color components (e.g., R, G, and B color components). Each color component may be represented, for example, using 8-10 bits per component or 24-30 bits in total. For example, a single point may comprise at least 54 bits of information, with at least 30 bits of geometry information and at least 24 bits of texture. If a point cloud frame includes a million such points, each point cloud frame may require 54 million bits or 54 megabits to represent. For dynamic point clouds that change over time, at a frame rate of 30 frames per second, a data rate of 1.62 gigabits per second may be required to send (e.g., transmit) the points of the point cloud sequence. Raw representations of point clouds may require a large amount of data, and the practical deployment of point-cloud-based technologies may need compression technologies that enable the storage and distribution of point clouds with a reasonable cost.
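The raw data-rate arithmetic may be reproduced with a short calculation. The function names and default bit depths below are illustrative assumptions taken from the minimum figures given in the example (10 bits per geometry component, 8 bits per color component).

```python
def raw_bits_per_point(geometry_bits_per_component: int = 10,
                       color_bits_per_component: int = 8) -> int:
    """Bits for one point: three geometry components plus three color components."""
    return 3 * geometry_bits_per_component + 3 * color_bits_per_component


def raw_data_rate_bps(points_per_frame: int,
                      frames_per_second: int,
                      bits_per_point: int) -> int:
    """Uncompressed bit rate of a point cloud sequence, in bits per second."""
    return points_per_frame * frames_per_second * bits_per_point


if __name__ == "__main__":
    bits = raw_bits_per_point()                      # 30 + 24 bits
    frame_bits = 1_000_000 * bits                    # one million points per frame
    rate = raw_data_rate_bps(1_000_000, 30, bits)    # 30 frames per second
    print(bits, frame_bits / 1e6, rate / 1e9)        # bits, megabits, gigabits/s
```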
Encoding may be used to compress and/or reduce the data size of a point cloud frame or point cloud sequence to provide for more efficient storage and/or transmission. Decoding may be used to decompress a compressed point cloud frame or point cloud sequence for display and/or other forms of consumption (e.g., by a machine learning based device, neural network-based device, artificial intelligence-based device, or other forms of consumption by other types of machine-based processing algorithms and/or devices). Compression of point clouds may be lossy (introducing differences relative to the original data) for the distribution to and visualization by an end-user, for example, on AR or VR glasses or any other 3D-capable device. Lossy compression may allow for a high ratio of compression but may imply a trade-off between compression and visual quality perceived by an end-user. Other frameworks, for example, frameworks for medical applications or autonomous driving, may require lossless compression to avoid altering the results of a decision obtained, for example, based on the analysis of the sent (e.g., transmitted) and decompressed point cloud frame.
A source device 102 may comprise a point cloud source 112, an encoder 114, and an output interface 116, for example, to encode point cloud sequence 108 into a bitstream 110. Point cloud source 112 may provide (e.g., generate) point cloud sequence 108, for example, from a capture of a natural scene and/or a synthetically generated scene. A synthetically generated scene may be a scene comprising computer generated graphics. Point cloud source 112 may comprise one or more point cloud capture devices, a point cloud archive comprising previously captured natural scenes and/or synthetically generated scenes, a point cloud feed interface to receive captured natural scenes and/or synthetically generated scenes from a point cloud content provider, and/or a processor(s) to generate synthetic point cloud scenes. The point cloud capture devices may include, for example, one or more laser scanning devices, structured light scanning devices, modulated light scanning devices, and/or passive scanning devices.
Point cloud sequence 108 may comprise a series of point cloud frames 124 (e.g., an example shown in
Encoder 114 may encode point cloud sequence 108 into a bitstream 110. To encode point cloud sequence 108, encoder 114 may use one or more lossless or lossy compression techniques to reduce redundant information in point cloud sequence 108. To encode point cloud sequence 108, encoder 114 may use one or more prediction techniques to reduce redundant information in point cloud sequence 108. Redundant information is information that may be predicted at a decoder 120 and may not need to be sent (e.g., transmitted) to decoder 120 for accurate decoding of point cloud sequence 108. For example, the Moving Picture Experts Group (MPEG) introduced a geometry-based point cloud compression (G-PCC) standard (ISO/IEC standard 23090-9: Geometry-based point cloud compression). G-PCC specifies the encoded bitstream syntax and semantics for transmission and/or storage of a compressed point cloud frame and the decoder operation for reconstructing the compressed point cloud frame from the bitstream. During standardization of G-PCC, a reference software (ISO/IEC standard 23090-21: Reference Software for G-PCC) was developed to encode the geometry and attribute information of a point cloud frame. To encode geometry information of a point cloud frame, the G-PCC reference software encoder may perform voxelization. The G-PCC reference software encoder may perform voxelization, for example, by quantizing positions of points in a point cloud. Quantizing positions of points in a point cloud may create a grid in 3D space. The G-PCC reference software encoder may map the points to the center coordinates of the sub-grid volume (e.g., voxel) that their quantized locations reside in. The G-PCC reference software encoder may perform geometry analysis using an occupancy tree to compress the geometry information. The G-PCC reference software encoder may entropy encode the result of the geometry analysis to further compress the geometry information.
To encode attribute information of a point cloud, the G-PCC reference software encoder may use a transform tool, such as Region Adaptive Hierarchical Transform (RAHT), the Predicting Transform, and/or the Lifting Transform. The Lifting Transform may be built on top of the Predicting Transform. The Lifting Transform may include an extra update/lifting step. The Lifting Transform and the Predicting Transform may be referred to as Predicting/Lifting Transform or pred lift. Encoder 114 may operate in a same or similar manner to an encoder provided by the G-PCC reference software.
Output interface 116 may be configured to write and/or store bitstream 110 onto transmission medium 104. The bitstream 110 may be sent (e.g., transmitted) to destination device 106. In addition or alternatively, output interface 116 may be configured to send (e.g., transmit), upload, and/or stream bitstream 110 to destination device 106 via transmission medium 104. Output interface 116 may comprise a wired and/or wireless transmitter configured to send (e.g., transmit), upload, and/or stream bitstream 110 according to one or more proprietary, open-source, and/or standardized communication protocols. The one or more proprietary, open-source, and/or standardized communication protocols may include, for example, Digital Video Broadcasting (DVB) standards, Advanced Television Systems Committee (ATSC) standards, Integrated Services Digital Broadcasting (ISDB) standards, Data Over Cable Service Interface Specification (DOCSIS) standards, 3rd Generation Partnership Project (3GPP) standards, Institute of Electrical and Electronics Engineers (IEEE) standards, Internet Protocol (IP) standards, Wireless Application Protocol (WAP) standards, and/or any other communication protocol.
Transmission medium 104 may comprise a wireless, wired, and/or computer readable medium. For example, transmission medium 104 may comprise one or more wires, cables, air interfaces, optical discs, flash memory, and/or magnetic memory. In addition or alternatively, transmission medium 104 may comprise one or more networks (e.g., the Internet) or file server(s) configured to store and/or send (e.g., transmit) encoded video data.
Destination device 106 may decode bitstream 110 into point cloud sequence 108 for display or other forms of consumption. Destination device 106 may comprise one or more of an input interface 118, a decoder 120, and/or a point cloud display 122. Input interface 118 may be configured to read bitstream 110 stored on transmission medium 104. Bitstream 110 may be stored on transmission medium 104 by source device 102. In addition or alternatively, input interface 118 may be configured to receive, download, and/or stream bitstream 110 from source device 102 via transmission medium 104. Input interface 118 may comprise a wired and/or wireless receiver configured to receive, download, and/or stream bitstream 110 according to one or more proprietary, open-source, standardized communication protocols, and/or any other communication protocol. Examples of the protocols include Digital Video Broadcasting (DVB) standards, Advanced Television Systems Committee (ATSC) standards, Integrated Services Digital Broadcasting (ISDB) standards, Data Over Cable Service Interface Specification (DOCSIS) standards, 3rd Generation Partnership Project (3GPP) standards, Institute of Electrical and Electronics Engineers (IEEE) standards, Internet Protocol (IP) standards, and Wireless Application Protocol (WAP) standards.
Decoder 120 may decode point cloud sequence 108 from encoded bitstream 110. For example, decoder 120 may operate in a same or similar manner as a decoder provided by G-PCC reference software. Decoder 120 may decode a point cloud sequence that approximates a point cloud sequence 108. Decoder 120 may decode a point cloud sequence that approximates a point cloud sequence 108 due to, for example, lossy compression of the point cloud sequence 108 by encoder 114 and/or errors introduced into encoded bitstream 110, for example, if transmission to destination device 106 occurs.
Point cloud display 122 may display a point cloud sequence 108 to a user. The point cloud display 122 may comprise, for example, a cathode ray tube (CRT) display, a liquid crystal display (LCD), a plasma display, a light emitting diode (LED) display, a 3D display, a holographic display, a head-mounted display, or any other display device suitable for displaying point cloud sequence 108.
Point cloud coding (e.g., encoding/decoding) system 100 is presented by way of example and not limitation. Point cloud coding systems different from the point cloud coding system 100 and/or modified versions of the point cloud coding system 100 may perform the methods and processes as described herein. For example, the point cloud coding system 100 may comprise other components and/or arrangements. Point cloud source 112 may, for example, be external to source device 102. Point cloud display 122 may, for example, be external to destination device 106 or omitted altogether (e.g., if point cloud sequence 108 is intended for consumption by a machine and/or storage device). Source device 102 may further comprise, for example, a point cloud decoder. Destination device 106 may comprise, for example, a point cloud encoder. For example, source device 102 may be configured to further receive an encoded bitstream from destination device 106. Receiving an encoded bitstream from destination device 106 may support two-way point cloud transmission between the devices.
As described herein, an encoder may quantize the positions of points in a point cloud according to a space precision, which may be the same or different in each dimension of the points. The quantization process may create a grid in 3D space. The encoder may map any points residing within each sub-grid volume to the sub-grid center coordinates, referred to as a voxel or a volumetric pixel. A voxel may be considered as a 3D extension of pixels corresponding to 2D image grid coordinates.
An encoder may represent or code a point cloud (e.g., a voxelized point cloud), for example, using an occupancy tree. For example, the encoder may split the initial volume or cuboid containing the point cloud into sub-cuboids. The initial volume or cuboid may be referred to as a bounding box. A cuboid may be, for example, a cube. The encoder may recursively split each sub-cuboid that contains at least one point of the point cloud. The encoder may not further split sub-cuboids that do not contain at least one point of the point cloud. A sub-cuboid that contains at least one point of the point cloud may be referred to as an occupied sub-cuboid. A sub-cuboid that does not contain at least one point of the point cloud may be referred to as an unoccupied sub-cuboid. The encoder may split an occupied sub-cuboid into, for example, two sub-cuboids (to form a binary tree), four sub-cuboids (to form a quadtree), or eight sub-cuboids (to form an octree). The encoder may split an occupied sub-cuboid to obtain further sub-cuboids. The sub-cuboids may have the same size and shape at a given depth level of the occupancy tree, for example, if the encoder splits the occupied sub-cuboid along a plane passing through the middle of edges of the sub-cuboid.
The initial volume or cuboid containing the point cloud may correspond to the root node of the occupancy tree. Each occupied sub-cuboid, split from the initial volume, may correspond to a node (off the root node) in a second level of the occupancy tree. Each occupied sub-cuboid, split from an occupied sub-cuboid in the second level, may correspond to a node (off the occupied sub-cuboid in the second level from which it was split) in a third level of the occupancy tree. The occupancy tree structure may continue to form in this manner for each recursive split iteration until, for example, some maximum depth level of the occupancy tree is reached or each occupied sub-cuboid has a volume corresponding to one voxel.
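The recursive splitting of occupied sub-cuboids into an octree may be sketched as follows. This is an illustrative sketch under stated assumptions: the bounding box is a cube with a power-of-two edge and origin (0, 0, 0), voxels are integer coordinates, splitting stops when a sub-cuboid is one voxel, and the function name count_occupied_per_level is hypothetical. The root is counted as the first level.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def count_occupied_per_level(voxels: List[Voxel], size: int) -> Dict[int, int]:
    """Recursively split a cube of edge `size` (a power of two) into eight
    sub-cubes, descending only into occupied ones.

    Returns the number of occupied nodes at each level (root = level 1).
    """
    counts: Dict[int, int] = {}

    def visit(origin: Voxel, edge: int, inside: List[Voxel], level: int) -> None:
        counts[level] = counts.get(level, 0) + 1
        if edge == 1:                      # volume of one voxel: stop splitting
            return
        half = edge // 2
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    child = (origin[0] + dx * half,
                             origin[1] + dy * half,
                             origin[2] + dz * half)
                    sub = [p for p in inside
                           if all(c <= v < c + half for v, c in zip(p, child))]
                    if sub:                # unoccupied sub-cuboids are not split
                        visit(child, half, sub, level + 1)

    if voxels:
        visit((0, 0, 0), size, list(voxels), 1)
    return counts


if __name__ == "__main__":
    print(count_occupied_per_level([(0, 0, 0), (3, 3, 3), (3, 3, 2)], 4))
```

For the three voxels in the usage example, the root has two occupied children, which together have three occupied children at the voxel level.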
Each non-leaf node of the occupancy tree may comprise or be associated with an occupancy word representing the occupancy state of the cuboid corresponding to the node. For example, a node of the occupancy tree corresponding to a cuboid that is split into 8 sub-cuboids may comprise or be associated with a 1-byte occupancy word. Each bit (referred to as an occupancy bit) of the 1-byte occupancy word may represent or indicate the occupancy of a different one of the eight sub-cuboids. Occupied sub-cuboids may be each represented or indicated by a binary “1” in the 1-byte occupancy word. Unoccupied sub-cuboids may be each represented or indicated by a binary “0” in the 1-byte occupancy word. Alternatively, occupied and unoccupied sub-cuboids may be represented or indicated by the opposite 1-bit binary values (e.g., a binary “0” representing or indicating an occupied sub-cuboid and a binary “1” representing or indicating an unoccupied sub-cuboid) in the 1-byte occupancy word.
Each bit of an occupancy word may represent or indicate the occupancy of a different one of the eight sub-cuboids, for example, following the so-called Morton order. For example, the least significant bit of an occupancy word may represent or indicate the occupancy of a first one of the eight sub-cuboids following the Morton order. The second least significant bit of an occupancy word may represent or indicate the occupancy of a second one of the eight sub-cuboids following the Morton order, etc.
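The packing of eight child occupancies into a 1-byte occupancy word may be sketched as follows. The bit assignment shown (x as the most significant of the three child-offset bits, then y, then z) is one possible Morton-order convention assumed for illustration; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def morton_child_index(dx: int, dy: int, dz: int) -> int:
    """Index (0..7) of a child sub-cuboid from its 0/1 offset along each axis.

    Assumed convention: x is the most significant bit, z the least.
    """
    return (dx << 2) | (dy << 1) | dz


def occupancy_word(occupied_children: Iterable[Tuple[int, int, int]]) -> int:
    """Build an 8-bit occupancy word: bit i is 1 if the i-th child in Morton
    order is occupied, with the first child at the least significant bit."""
    word = 0
    for dx, dy, dz in occupied_children:
        word |= 1 << morton_child_index(dx, dy, dz)
    return word


if __name__ == "__main__":
    # First and last children (in Morton order) occupied.
    print(format(occupancy_word([(0, 0, 0), (1, 1, 1)]), "08b"))
```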
The geometry of a point cloud may be represented by, and may be determined from, the initial volume and the occupancy words of the nodes in an occupancy tree. An encoder may send (e.g., transmit) the initial volume and the occupancy words of the nodes in the occupancy tree in a bitstream to a decoder for reconstructing the point cloud. The encoder may entropy encode the occupancy words. The encoder may entropy encode the occupancy words, for example, before sending (e.g., transmitting) the initial volume and the occupancy words of the nodes in the occupancy tree. The encoder may encode an occupancy bit of an occupancy word of a node corresponding to a cuboid. The encoder may encode an occupancy bit of an occupancy word of a node corresponding to a cuboid, for example, based on one or more occupancy bits of occupancy words of other nodes corresponding to cuboids that are adjacent or spatially close to the cuboid of the occupancy bit being encoded.
An encoder and/or a decoder may code (e.g., encode and/or decode) occupancy bits of occupancy words in sequence of a scan order. The scan order may also be referred to as a scanning order. For example, an encoder and/or a decoder may scan an occupancy tree in breadth-first order. All the occupancy words of the nodes of a given depth (e.g., level) within the occupancy tree may be scanned. All the occupancy words of the nodes of a given depth (e.g., level) within the occupancy tree may be scanned, for example, before scanning the occupancy words of the nodes of the next depth (e.g., level). Within a given depth, the encoder and/or decoder may scan the occupancy words of nodes in the Morton order. Within a given node, the encoder and/or decoder may scan the occupancy bits of the occupancy word of the node further in the Morton order.
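A breadth-first scan of an octree, with Morton order within each level, may be sketched as the following round trip between a voxel set and a list of occupancy words. This is a simplified illustration with no entropy coding; the function names, the cubic bounding box of edge 2**depth, and the child-index convention (x as the most significant bit) are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]


def encode_occupancy_words(voxels: Iterable[Voxel], depth: int) -> List[int]:
    """Occupancy words in breadth-first order: all nodes of one level are
    emitted before any node of the next level, in Morton order within a level."""
    words: List[int] = []
    nodes: List[Set[Voxel]] = [set(voxels)]       # voxels inside each node
    for level in range(depth):
        shift = depth - level - 1                 # coordinate bit split at this level
        next_nodes: List[Set[Voxel]] = []
        for inside in nodes:
            children: Dict[int, Set[Voxel]] = {}
            for x, y, z in inside:
                idx = ((((x >> shift) & 1) << 2)
                       | (((y >> shift) & 1) << 1)
                       | ((z >> shift) & 1))
                children.setdefault(idx, set()).add((x, y, z))
            word = 0
            for idx in sorted(children):          # Morton order of children
                word |= 1 << idx
                next_nodes.append(children[idx])
            words.append(word)
        nodes = next_nodes
    return words


def decode_occupancy_words(words: Iterable[int], depth: int) -> Set[Voxel]:
    """Rebuild the voxel set by replaying the same breadth-first scan."""
    it = iter(words)
    nodes: List[Voxel] = [(0, 0, 0)]
    for _ in range(depth):
        next_nodes: List[Voxel] = []
        for x, y, z in nodes:
            word = next(it)
            for idx in range(8):
                if (word >> idx) & 1:
                    next_nodes.append((2 * x + ((idx >> 2) & 1),
                                       2 * y + ((idx >> 1) & 1),
                                       2 * z + (idx & 1)))
        nodes = next_nodes
    return set(nodes)


if __name__ == "__main__":
    cloud = {(0, 0, 0), (3, 3, 3), (3, 3, 2)}
    words = encode_occupancy_words(cloud, depth=2)
    print(words, decode_occupancy_words(words, depth=2) == cloud)
```

Because parents are visited in Morton order and each parent's children are appended in Morton order, the nodes of the next level are already in Morton order when that level is scanned.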
Each of the occupied sub-cuboids (e.g., two occupied sub-cuboids 304 and 306) may correspond to a node off the root node in a second level of an occupancy tree 300. The occupied sub-cuboids (e.g., two occupied sub-cuboids 304 and 306) may be each further split into eight sub-cuboids. For example, one of the sub-cuboids 308 of the eight sub-cuboids split from the sub-cuboid 304 may be occupied, and the other seven sub-cuboids may be unoccupied. Three of the sub-cuboids 310, 312, and 314 of the eight sub-cuboids split from the sub-cuboid 306 may be occupied, and the other five sub-cuboids of the eight sub-cuboids split from the sub-cuboid 306 may be unoccupied. Two second level eight-bit occupancy words occW2,1 and occW2,2 may be constructed in this order to respectively represent the occupancy word of the node corresponding to the sub-cuboid 304 and the occupancy word of the node corresponding to the sub-cuboid 306.
Each of occupied sub-cuboids (e.g., four occupied sub-cuboids 308, 310, 312, and 314) may correspond to a node in a third level of an occupancy tree 300. The occupied sub-cuboids (e.g., four occupied sub-cuboids 308, 310, 312, and 314) may be each further split into eight sub-cuboids or 32 sub-cuboids in total. For example, four third level eight-bit occupancy words occW3,1, occW3,2, occW3,3 and occW3,4 may be constructed in this order to respectively represent the occupancy word of the node corresponding to the sub-cuboid 308, the occupancy word of the node corresponding to the sub-cuboid 310, the occupancy word of the node corresponding to the sub-cuboid 312, and the occupancy word of the node corresponding to the sub-cuboid 314.
Occupancy words of an example occupancy tree 300 may be entropy coded (e.g., entropy encoded by an encoder and/or entropy decoded by a decoder), for example, following the scanning order discussed herein (e.g., Morton order). The occupancy words of the example occupancy tree 300 may be entropy coded (e.g., entropy encoded by an encoder and/or entropy decoded by a decoder) as the succession of the seven occupancy words occW1,1 to occW3,4, for example, following the scanning order discussed herein. The scanning order discussed herein may be a breadth-first scanning order. The occupancy word(s) of all node(s) having the same depth (or level) as a current parent node may have already been entropy coded, for example, if the occupancy word of a current child node belonging to the current parent node is being entropy coded. For example, the occupancy word(s) of all node(s) having the same depth (e.g., level) as the current child node and having a lower Morton order than the current child node may have also already been entropy coded. Part of the already coded occupancy word(s) may be used to entropy code the occupancy word of the current child node. The already coded occupancy word(s) of neighboring parent and child node(s) may be used, for example, to entropy code the occupancy word of the current child node. The occupancy bit(s) of the occupancy word having a lower Morton order than a particular occupancy bit may have also already been entropy coded and may be used to code the occupancy bit of the occupancy word of the current child node, for example, if the particular occupancy bit of the occupancy word of the current child node is being coded (e.g., entropy coded).
The number (e.g., quantity) of possible occupancy configurations (e.g., sets of one or more occupancy words and/or occupancy bits) for a neighborhood of a current child cuboid may be 2^N, where N is the number (e.g., quantity) of cuboids in the neighborhood of the current child cuboid with already-coded occupancy bits. The neighborhood of the current child cuboid may comprise several dozens of cuboids. The neighborhood of the current child cuboid (e.g., several dozens of cuboids) may comprise 26 adjacent parent cuboids sharing a face, an edge, and/or a vertex with the parent cuboid of the current child cuboid and also several adjacent child cuboids having occupancy bits already coded sharing a face, an edge, or a vertex with the current child cuboid. The occupancy configuration for a neighborhood of the current child cuboid may have billions of possible occupancy configurations, even limited to a subset of the adjacent cuboids, making its direct use impractical. An encoder and/or decoder may use the occupancy configuration for a neighborhood of the current child cuboid to select the context (e.g., a probability model), among a set of contexts, of a binary entropy coder (e.g., binary arithmetic coder) that may code the occupancy bit of the current child cuboid. The context-based binary entropy coding may be similar to the Context Adaptive Binary Arithmetic Coder (CABAC) used in MPEG-H Part 2 (also known as High Efficiency Video Coding (HEVC)).
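The growth in the number of occupancy configurations with neighborhood size may be checked with a short calculation. Each neighboring cuboid contributes one binary occupancy value, so the count doubles with every cuboid added. The function name is hypothetical, and the neighborhood size of 32 is an arbitrary example of 26 parent cuboids plus a few child cuboids.

```python
def occupancy_configuration_count(num_neighbors: int) -> int:
    """Number of distinct occupied/unoccupied patterns over `num_neighbors`
    cuboids, each of which is either occupied or unoccupied."""
    return 2 ** num_neighbors


if __name__ == "__main__":
    for n in (6, 26, 32):
        print(n, occupancy_configuration_count(n))
```

With only the 26 adjacent parent cuboids the count is already about 67 million, and six additional child cuboids bring it above four billion, which is why a context cannot be assigned to each configuration directly.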
An encoder and/or a decoder may use several methods to reduce the occupancy configurations for a neighborhood of a current child cuboid being coded to a practical number (e.g., quantity) of reduced occupancy configurations. The 2^6 or 64 occupancy configurations of the six adjacent parent cuboids sharing a face with the parent cuboid of the current child cuboid may be reduced to 9 occupancy configurations. The occupancy configurations may be reduced by using geometry invariance. An occupancy score for the current child cuboid may be obtained from the 2^26 occupancy configurations of the 26 adjacent parent cuboids. The score may be further reduced into a ternary occupancy prediction (e.g., “predicted occupied,” “unsure”, or “predicted unoccupied”) by using score thresholds. The number (e.g., quantity) of occupied adjacent child cuboids and the number (e.g., quantity) of unoccupied adjacent child cuboids may be used instead of the individual occupancies of these child cuboids.
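Two of these reductions, thresholding a score into a ternary prediction and replacing individual child occupancies by counts, may be sketched as follows. How the occupancy score is computed and which threshold values are used are not specified here; the score range, the thresholds in the usage example, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def ternary_occupancy_prediction(score: float,
                                 low_threshold: float,
                                 high_threshold: float) -> str:
    """Reduce a scalar occupancy score to one of three predictions.

    Assumes a higher score means the child cuboid is more likely occupied.
    """
    if score >= high_threshold:
        return "predicted occupied"
    if score <= low_threshold:
        return "predicted unoccupied"
    return "unsure"


def count_child_occupancies(child_bits: Iterable[int]) -> Tuple[int, int]:
    """Replace individual already-coded child occupancies by two counts:
    (number occupied, number unoccupied)."""
    bits = list(child_bits)
    occupied = sum(1 for b in bits if b)
    return occupied, len(bits) - occupied


if __name__ == "__main__":
    # Hypothetical score in [0, 1] with hypothetical thresholds 0.25 and 0.75.
    print(ternary_occupancy_prediction(0.9, 0.25, 0.75))
    print(count_child_occupancies([1, 0, 1, 1]))
```

A reduced occupancy configuration could then combine such values (e.g., the ternary prediction and the two counts) instead of the full set of neighbor occupancy bits.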
An encoder and/or a decoder using one or more of the methods described herein may reduce the number (e.g., quantity) of possible occupancy configurations for a neighborhood of a current child cuboid to a more manageable number (e.g., a few thousand). It has been observed that, instead of associating a reduced number (e.g., quantity) of contexts (e.g., probability models) directly with the reduced occupancy configurations, another mechanism may be used, namely Optimal Binary Coders with Update on the Fly (OBUF). An encoder and/or a decoder may implement OBUF to limit the number (e.g., quantity) of contexts to a lower number (e.g., 32 contexts).
OBUF may use a limited number (e.g., 32) of contexts (e.g., probability models). The number (e.g., quantity) of contexts in OBUF may be a fixed number (e.g., fixed quantity). The contexts used by OBUF may be ordered, referred to by a context index (e.g., a context index in the range of 0 to 31), and associated from a lowest virtual probability to a highest virtual probability to code a “1”. A Look-Up Table (LUT) of context indices may be initialized at the beginning of a point cloud coding process. For example, the LUT may initially point, for all inputs, to the context (e.g., the context with context index 15) with the median virtual probability to code a “1” among the limited number (e.g., quantity) of contexts. This LUT may take an occupancy configuration for a neighborhood of a current child cuboid as input and output the context index associated with the occupancy configuration. The LUT may have as many entries as reduced occupancy configurations (e.g., around a few thousand entries). The coding of the occupancy bit of a current child cuboid may comprise steps including determining the reduced occupancy configuration of the current child node, obtaining a context index by using the reduced occupancy configuration as an entry to the LUT, coding the occupancy bit of the current child cuboid by using the context pointed to (or indicated) by the context index, and updating the LUT entry corresponding to the reduced occupancy configuration, for example, based on the value of the coded occupancy bit of the current child cuboid. The LUT entry may be decreased to a lower context index value, for example, if a binary “0” (e.g., indicating the current child cuboid is unoccupied) is coded. The LUT entry may be increased to a higher context index value, for example, if a binary “1” (e.g., indicating the current child cuboid is occupied) is coded.
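The LUT behavior described above may be sketched as follows. This is a minimal illustration only: the class name `ObufLut` and the update step of exactly one index position are assumptions made for the example, whereas the actual update may follow a theoretical model of virtual probabilities.

```python
NUM_CONTEXTS = 32    # fixed number of contexts, indexed 0..31
INITIAL_INDEX = 15   # context with the median virtual probability to code a "1"


class ObufLut:
    """Hypothetical LUT mapping a reduced occupancy configuration to a context index."""

    def __init__(self, num_configurations):
        # Every entry initially points to the median context.
        self.lut = [INITIAL_INDEX] * num_configurations

    def context_index(self, configuration):
        """Return the context index used to code the next bit for this configuration."""
        return self.lut[configuration]

    def update(self, configuration, coded_bit):
        """Move the entry up after coding a 1 and down after coding a 0.

        A step of one index is an assumption of this sketch.
        """
        index = self.lut[configuration]
        if coded_bit:
            index = min(index + 1, NUM_CONTEXTS - 1)
        else:
            index = max(index - 1, 0)
        self.lut[configuration] = index
```

An encoder and a decoder that apply the same sequence of `update` calls would keep identical tables, so no table needs to be signaled in this sketch.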
The update process of the context index may be, for example, based on a theoretical model of optimal distribution for virtual probabilities associated with the limited number (e.g., quantity) of contexts. This virtual probability may be fixed by a model and may be different from the internal probability of the context, which may evolve, for example, as bits of data are coded. The evolution of the internal probability of the context may follow a well-known process similar to the process in CABAC.
An encoder and/or a decoder may implement a “dynamic OBUF” scheme. The “dynamic OBUF” scheme may enable an encoder and/or a decoder to handle a much larger number (e.g., quantity) of occupancy configurations for a neighborhood of a current child cuboid, for example, than general OBUF. The use of a larger number (e.g., quantity) of occupancy configurations for a neighborhood of a current child cuboid may lead to improved compression capabilities, and may maintain complexity within reasonable bounds. By using an occupancy tree compressed by OBUF, an encoder and/or a decoder may reach a lossless compression performance as good as 1 bit per point (bpp) for coding the geometry of dense point clouds. An encoder and/or a decoder may implement dynamic OBUF to potentially further reduce the bit rate by more than 25% to 0.7 bpp.
OBUF may not take as input a large variety of reduced occupancy configurations for a neighborhood of a current child cuboid, and may potentially cause a loss of useful correlation. With OBUF, the size of the LUT of context indices may be increased to handle a wider variety of occupancy configurations for a neighborhood of a current child cuboid as input. Due to such an increase, statistics may be diluted, and compression performance may be worsened. For example, if the LUT has millions of entries and the point cloud has a hundred thousand points, then most of the entries may never be visited (e.g., looked up, accessed, etc.). Many entries may be visited only a few times and their associated context index may not be updated enough times to reflect any meaningful correlation between the occupancy configuration value and the probability of occupancy of the current child cuboid. Dynamic OBUF may be implemented to mitigate the dilution of statistics due to the increase of the number (e.g., quantity) of occupancy configurations for a neighborhood of a current child cuboid. This mitigation may be performed by a “dynamic reduction” of occupancy configurations in dynamic OBUF.
Dynamic OBUF may add an extra step of reduction of occupancy configurations for a neighborhood of a current child cuboid, for example, before using the LUT of context indices. This step may be called a dynamic reduction because it evolves, for example, based on the progress of the coding of the point cloud or, more precisely, based on already visited (e.g., looked up in the LUT) occupancy configurations.
As discussed herein, many possible occupancy configurations for a neighborhood of a current child cuboid may be potentially involved, but only a subset may be visited if the coding of a point cloud occurs. This subset may characterize the type of the point cloud. For example, most of the visited occupancy configurations may exhibit occupied adjacent cuboids of a current child cuboid, for example, if AR or VR dense point clouds are being coded. On the other hand, most of the visited occupancy configurations may exhibit only a few occupied adjacent cuboids of a current child cuboid, for example, if sensor-acquired sparse point clouds are being coded. The role of the dynamic reduction may be to obtain a more precise correlation, for example, based on the most visited occupancy configurations while putting aside (e.g., reducing aggressively) other occupancy configurations that are much less visited. The dynamic reduction may be updated on-the-fly, for example, after each visit (e.g., a lookup in the LUT) of an occupancy configuration, for example, if the coding of occupancy data occurs.
$\beta = \beta_1 \ldots \beta_K$
made of K bits. The size of the mask may decrease, for example, if occupancy configurations are visited (e.g., looked up in the LUT) a certain number (e.g., quantity) of times. The initial dynamic reduction function $DR_0$ may mask all bits for all occupancy configurations such that it is a constant function $DR_0(\beta) = 0$ for all occupancy configurations $\beta$. The dynamic reduction function may evolve from a function $DR_n$ to an updated function $DR_{n+1}$, for example, after each coding of an occupancy bit. The function may be defined by
$\beta' = DR_n(\beta) = \beta_1 \ldots \beta_{k_n(\beta)}$
where $k_n(\beta)$ 510 is the number (e.g., quantity) of non-masked bits. The initialization of $DR_0$ may correspond to $k_0(\beta) = 0$, and the natural evolution of the reduction function toward finer statistics may lead to an increasing number (e.g., quantity) of non-masked bits, $k_n(\beta) \le k_{n+1}(\beta)$. The dynamic reduction function may be entirely determined by the values of $k_n$ for all occupancy configurations $\beta$.
The visits (e.g., instances of a lookup in the LUT) to occupancy configurations may be tracked by a variable $NV(\beta')$ for all dynamically reduced occupancy configurations $\beta' = DR_n(\beta)$. The corresponding number (e.g., quantity) of visits $NV(\beta_V')$ may be increased by one, for example, after each instance of coding of an occupancy bit based on an occupancy configuration $\beta_V$. If this number (e.g., quantity) of visits $NV(\beta_V')$ is greater than a threshold $th_V$,
$NV(\beta_V') > th_V$
then the number (e.g., quantity) of unmasked bits $k_n(\beta)$ may be increased by one for all occupancy configurations $\beta$ being dynamically reduced to $\beta_V'$. This corresponds to replacing the dynamically reduced occupancy configuration $\beta_V'$ by the two new dynamically reduced occupancy configurations $\beta_0'$ and $\beta_1'$ defined by
$\beta_0' = \beta_V' 0 = \beta_{V,1} \ldots \beta_{V,k_n(\beta)} 0$ and $\beta_1' = \beta_V' 1 = \beta_{V,1} \ldots \beta_{V,k_n(\beta)} 1$.
In other words, the number (e.g., quantity) of unmasked bits has been increased by one, $k_{n+1}(\beta) = k_n(\beta) + 1$, for all occupancy configurations $\beta$ such that $DR_n(\beta) = \beta_V'$. The number (e.g., quantity) of visits of the two new dynamically reduced occupancy configurations may be initialized to zero:
$NV(\beta_0') = NV(\beta_1') = 0$. (I)
At the start of the coding, the initial number (e.g., quantity) of visits for the initial dynamic reduction function $DR_0$ may be set to
$NV(DR_0(\beta)) = NV(0) = 0$,
and the evolution of NV on dynamically reduced occupancy configurations may be entirely defined.
The corresponding LUT entry $LUT[\beta_V']$ may be replaced by the two new entries $LUT[\beta_0']$ and $LUT[\beta_1']$ that are initialized by the coder index associated with $\beta_V'$, for example, if a dynamically reduced occupancy configuration $\beta_V'$ is replaced by the two new dynamically reduced occupancy configurations $\beta_0'$ and $\beta_1'$,
$LUT[\beta_0'] = LUT[\beta_1'] = LUT[\beta_V']$, (II)
and then evolve separately. The evolution of the LUT of coder indices on dynamically reduced occupancy configurations may be entirely defined.
The reduction function $DR_n$ may be modeled by a series of growing binary trees $T_n$ 520 whose leaf nodes 530 are the reduced occupancy configurations $\beta' = DR_n(\beta)$. The initial tree may be the single root node associated with $0 = DR_0(\beta)$. The replacement of the dynamically reduced occupancy configuration $\beta_V'$ by $\beta_0'$ and $\beta_1'$ may correspond to growing the tree $T_n$ from the leaf node associated with $\beta_V'$, for example, by attaching to it two new nodes associated with $\beta_0'$ and $\beta_1'$. The tree $T_{n+1}$ may be obtained by this growth. The number (e.g., quantity) of visits NV and the LUT of context indices may be defined on the leaf nodes and evolve with the growth of the tree through equations (I) and (II).
The practical implementation of dynamic OBUF may be made by the storage of the array $NV[\beta']$ and the $LUT[\beta']$ of context indices, as well as the trees $T_n$ 520. An alternative to the storage of the trees may be to store the array $k_n[\beta]$ 510 of the number (e.g., quantity) of non-masked bits.
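The growth of the dynamic reduction described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the class name `DynamicReduction` is hypothetical, a reduced occupancy configuration is represented as the tuple of its unmasked leading bits, and dictionaries keyed by these tuples stand in for the arrays NV and LUT and for the leaf nodes of the tree.

```python
class DynamicReduction:
    """Hypothetical sketch of a dynamic reduction over K-bit configurations."""

    def __init__(self, num_bits, visit_threshold, initial_index=15):
        self.num_bits = num_bits
        self.visit_threshold = visit_threshold
        # Initially all bits are masked: the only leaf is the empty prefix.
        self.visits = {(): 0}
        self.lut = {(): initial_index}

    def reduce(self, bits):
        """Return the leaf (tuple of unmasked leading bits) covering `bits`."""
        for k in range(len(bits) + 1):
            prefix = tuple(bits[:k])
            if prefix in self.lut:
                return prefix
        raise ValueError("configuration is not covered by any leaf")

    def visit(self, bits):
        """Count one visit and split the leaf if the threshold is exceeded."""
        leaf = self.reduce(bits)
        self.visits[leaf] += 1
        can_split = len(leaf) < self.num_bits
        if can_split and self.visits[leaf] > self.visit_threshold:
            index = self.lut.pop(leaf)
            del self.visits[leaf]
            for next_bit in (0, 1):
                child = leaf + (next_bit,)
                self.lut[child] = index   # children inherit the context index
                self.visits[child] = 0    # visit counts restart at zero
        return leaf
```

Because every split creates both children, each configuration is always covered by exactly one leaf, which is why `reduce` may stop at the first matching prefix.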
A limitation for implementing dynamic OBUF may be its memory footprint. In some applications, a few million occupancy configurations may be practically handled, leading to about 20 bits $\beta_i$ constituting an entry configuration $\beta$ to the reduction function DR. Each bit $\beta_i$ may correspond to the occupancy status of a neighboring cuboid of a current child cuboid or a set of neighboring cuboids of a current child cuboid.
Higher (e.g., more significant) bits $\beta_i$ (e.g., $\beta_0$, $\beta_1$, etc.) may be the first bits to be unmasked, for example, during the evolution of the dynamic reduction function DR. The order of neighbor-based information put in the bits $\beta_i$ may impact the compression performance. Neighboring information may be ordered from higher (e.g., highest) priority to lower priority and put in this order into the bits $\beta_i$, from higher to lower weight. The priority may be, from the most important to the least important, occupancy of sets of adjacent neighboring child cuboids, then occupancy of adjacent neighboring child cuboids, then occupancy of adjacent neighboring parent cuboids, then occupancy of non-adjacent neighboring child nodes, and finally occupancy of non-adjacent neighboring parent nodes. Adjacent nodes sharing a face with the current child node may also have higher priority than adjacent nodes sharing an edge (but not sharing a face) with the current child node. Adjacent nodes sharing an edge with the current child node may have higher priority than adjacent nodes sharing only a vertex with the current child node.
At step 602, an occupancy configuration (e.g., occupancy configuration $\beta$) of the current child cuboid may be determined, for example, based on occupancy bits of already-coded cuboids in a neighborhood of the current child cuboid. At step 604, the occupancy configuration may be dynamically reduced, for example, using a dynamic reduction function $DR_n$. For example, the occupancy configuration $\beta$ may be dynamically reduced into a reduced occupancy configuration $\beta' = DR_n(\beta)$. At step 606, a context index may be looked up, for example, in a look-up table (LUT). For example, the encoder and/or decoder may look up the context index $LUT[\beta']$ in the LUT of the dynamic OBUF. At step 608, a context (e.g., probability model) may be selected. For example, the context pointed to by the context index may be selected. At step 610, the occupancy of the current child cuboid may be entropy coded. For example, the occupancy bit of the current child cuboid may be entropy coded (e.g., arithmetic coded) based on the context. The occupancy bit of the current child cuboid may thus be coded based on the occupancy bits of the already-coded cuboids neighboring the current child cuboid.
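Steps 602 to 610 may be sketched end to end as follows. This is a sketch under assumptions, not an implementation of any particular codec: the names `Context` and `code_occupancy_bit` are hypothetical, the arithmetic coder is replaced by the ideal code length $-\log_2 p$ of the coded bit, the probability model is a simple count-based estimate, and the final LUT update by one index position is assumed for illustration.

```python
import math


class Context:
    """Toy adaptive probability model estimating P(bit = 1) from counts."""

    def __init__(self):
        self.ones = 1
        self.total = 2

    def probability_of_one(self):
        return self.ones / self.total

    def update(self, bit):
        self.ones += bit
        self.total += 1


def code_occupancy_bit(bit, neighbor_bits, reduce_fn, lut, contexts):
    """Return the ideal cost in bits of coding `bit` with the selected context."""
    beta = tuple(neighbor_bits)                # step 602: occupancy configuration
    beta_reduced = reduce_fn(beta)             # step 604: dynamic reduction
    index = lut.setdefault(beta_reduced, 15)   # step 606: context index lookup
    context = contexts[index]                  # step 608: context selection
    p_one = context.probability_of_one()
    cost = -math.log2(p_one if bit else 1.0 - p_one)  # step 610 (ideal cost)
    context.update(bit)
    # Assumed LUT update: one index up after a 1, one index down after a 0.
    last = len(contexts) - 1
    lut[beta_reduced] = min(index + 1, last) if bit else max(index - 1, 0)
    return cost
```

For example, with 32 fresh contexts and an empty LUT, the first coded bit costs exactly one bit because the median context starts at probability one half.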
Although not shown in
In general, the occupancy tree is a lossless compression technique. The occupancy tree may be adapted to provide lossy compression, for example, by modifying the point cloud on the encoder side (e.g., down-sampling, removing points, moving points, etc.). The performance of the lossy compression may be weak. The occupancy tree may be a useful lossless compression technique for dense point clouds.
One approach to lossy compression for point cloud geometry may be to set the maximum depth of the occupancy tree to not reach the smallest volume size of one voxel but instead to stop at a bigger volume size (e.g., N×N×N cuboids (e.g., cubes), where N>1). The geometry of the points belonging to each occupied leaf node associated with the bigger volumes may then be modeled. This approach may be particularly suited for dense and smooth point clouds that may be locally modeled by smooth functions such as planes or polynomials. The coding cost may become the cost of the occupancy tree plus the cost of the local model in each of the occupied leaf nodes.
A scheme for modeling the geometry of the points belonging to each occupied leaf node associated with a volume size larger than one voxel may use sets of triangles as local models. The scheme may be referred to as the “TriSoup” scheme. TriSoup is short for “Triangle Soup” because the connectivity between triangles may not be part of the models. An occupied leaf node of an occupancy tree that corresponds to a cuboid with a volume greater than one voxel may be referred to as a TriSoup node. An edge belonging to at least one cuboid corresponding to a TriSoup node may be referred to as a TriSoup edge. A TriSoup node may comprise a presence flag ($s_k$) for each TriSoup edge of its corresponding occupied cuboid. A presence flag ($s_k$) of a TriSoup edge may indicate whether a TriSoup vertex ($V_k$) is present or not on the TriSoup edge. At most one TriSoup vertex ($V_k$) may be present on a TriSoup edge. For each vertex ($V_k$) present on a TriSoup edge of an occupied cuboid, the TriSoup node corresponding to the occupied cuboid may comprise a position ($p_k$) of the vertex ($V_k$) along the TriSoup edge.
In addition to the occupancy words of an occupancy tree, an encoder may entropy encode, for each TriSoup node of the occupancy tree, the TriSoup vertex presence flags and positions of each TriSoup edge belonging to TriSoup nodes of the occupancy tree. A decoder may similarly entropy decode the TriSoup vertex presence flags and positions of each TriSoup edge and vertex along a respective TriSoup edge belonging to a TriSoup node of the occupancy tree, in addition to the occupancy words of the occupancy tree.
A presence flag ($s_k$) and, if the presence flag ($s_k$) indicates the presence of a vertex, a position ($p_k$) of a current TriSoup edge (e.g., indicating a position of the vertex along the edge) may be entropy coded. The presence flag ($s_k$) and position ($p_k$) may be individually or collectively referred to as vertex information or TriSoup vertex information. The vertex information may be entropy coded, for example, based on already-coded presence flags and positions, of present TriSoup vertices, of TriSoup edges that neighbor the current TriSoup edge. The vertex information may be additionally or alternatively entropy coded, for example, based on occupancies of cuboids that neighbor the current TriSoup edge. Similar to the entropy coding of the occupancy bits of the occupancy tree, a configuration $\beta_{TS}$ for a neighborhood (also referred to as a neighborhood configuration $\beta_{TS}$) of a current TriSoup edge may be obtained and dynamically reduced into a reduced configuration $\beta_{TS}' = DR_n(\beta_{TS})$, for example, by using a dynamic OBUF scheme for TriSoup. A context index $LUT[\beta_{TS}']$ may be obtained from the OBUF LUT. At least a part of the vertex information of the current TriSoup edge may be entropy coded using the context (e.g., probability model) pointed to by the context index.
The TriSoup vertex position ($p_k$) (if present) along its TriSoup edge may be binarized, for example, to use a binary entropy coder to entropy code at least part of the vertex information of the current TriSoup edge. A number (e.g., quantity) of bits $N_b$ may be set for the quantization of the TriSoup vertex position ($p_k$) along the TriSoup edge of length N. The TriSoup edge of length N may be uniformly divided into $2^{N_b}$ quantization intervals. By doing so, the TriSoup vertex position ($p_k$) may be represented by $N_b$ bits ($p_k^j$, $j = 1, \ldots, N_b$) that may be individually coded by the dynamic OBUF scheme, as may the bit corresponding to the presence flag ($s_k$). The neighborhood configuration $\beta_{TS}$, the OBUF reduction function $DR_n$, and the context index may depend on the nature, characteristic, and/or property of the coded bit (e.g., a presence flag ($s_k$), a highest position bit ($p_k^1$), a second highest position bit ($p_k^2$), etc.). There may practically be several dynamic OBUF schemes, each dedicated to a specific bit of information (e.g., presence flag ($s_k$) or position bit ($p_k^j$)) of the vertex information.
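The uniform quantization and binarization of a vertex position may be sketched as follows. The function names and the clamping of out-of-range positions are assumptions made for this example; bits are listed from the highest position bit to the lowest.

```python
def quantize_position(position, edge_length, num_bits):
    """Map a position in [0, edge_length) to one of 2**num_bits uniform intervals."""
    num_intervals = 1 << num_bits
    index = int(position * num_intervals / edge_length)
    # Clamp so that a position exactly at the edge end stays in range (assumption).
    return min(max(index, 0), num_intervals - 1)


def position_bits(index, num_bits):
    """Binarize the interval index, most significant bit first."""
    return [(index >> (num_bits - 1 - j)) & 1 for j in range(num_bits)]
```

For example, on an edge of length 8 with two bits, the position 5.0 falls in interval 2 of 4, which is binarized as the bits 1 then 0. Each of these bits could then be coded with its own dynamic OBUF scheme.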
A value resulting from each cross product may be equal to an area of a parallelogram formed by the two vectors in the cross product. The value may be representative of an area of a triangle formed by the two vectors because the area of the triangle is equal to half of the value. The vector $\vec{n}$ may be indicative of the direction normal to a local surface representative of the portion of the point cloud, for example, since the vector $\vec{n}$ indicates a direction of the triangles (e.g., TriSoup triangles) representing (e.g., modeling) the portion of the point cloud. A one-component residual $\alpha_{res}$ along the line $(C, \vec{n})$ 810 may be coded, instead of a 3D residual vector, to maximize the effect of the centroid residual while minimizing its coding cost.
$C_{res} = \alpha_{res}\,\vec{n}$
The residual value $\alpha_{res}$ may be determined by the encoder as the intersection between the current point cloud and the line $(C, \vec{n})$, which is along the direction of the normalized vector $\vec{n}$. For example, a set of points, of the portion of the point cloud, closest (e.g., within a threshold distance, a threshold number of points) to the line may be determined. The set of points may be projected on the line and the residual value $\alpha_{res}$ may be determined as the mean component along the line of the projected points. The mean may be determined as a weighted mean whose weights depend on the distance of the set of points from the line. For example, a point from the set closer to the line may have a higher weight than another point from the set farther from the line.
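The weighted-mean determination of the residual described above may be sketched as follows. This is a minimal sketch under assumptions: the function name `centroid_residual` is hypothetical, the normal vector is assumed to be normalized, points are selected by a threshold distance to the line, and the weight 1 / (1 + distance) is only one possible choice that gives closer points a higher weight.

```python
import math


def centroid_residual(points, centroid, normal, max_distance):
    """Weighted mean of the components along `normal` of points near the line.

    `normal` is assumed to have unit length.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point in points:
        offset = [point[i] - centroid[i] for i in range(3)]
        # Signed component of the point along the line (C, n).
        along = sum(offset[i] * normal[i] for i in range(3))
        # Distance from the point to the line.
        perpendicular = [offset[i] - along * normal[i] for i in range(3)]
        distance = math.sqrt(sum(x * x for x in perpendicular))
        if distance <= max_distance:
            weight = 1.0 / (1.0 + distance)  # assumed weighting function
            weighted_sum += weight * along
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

The returned value would still have to be quantized before being coded.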
The residual value $\alpha_{res}$ may be quantized. For example, it may be quantized by a uniform quantization function having a quantization step similar to the quantization precision of the TriSoup vertices $V_k$. The quantization error may be maintained to be uniform over all vertices $V_k$ and $C + C_{res}$ such that the local surface is uniformly approximated.
The residual value $\alpha_{res}$ may be binarized and entropy coded into the bitstream, for example, by using a unary-based coding scheme. The residual value $\alpha_{res}$ may be coded using a set of flags. A flag $f_0$ may be coded to indicate if the residual value $\alpha_{res}$ is equal to zero. No further syntax elements may be needed, for example, if the flag $f_0$ indicates the residual value $\alpha_{res}$ is zero. A sign bit indicating a sign may be coded and the residual magnitude $|\alpha_{res}| - 1$ may be coded using an entropy code, for example, if the flag $f_0$ indicates the residual value $\alpha_{res}$ is not zero. The residual magnitude may be coded using a unary coding scheme that codes successive flags $f_i$ ($i \ge 1$) indicating if the residual value magnitude $|\alpha_{res}|$ is equal to i. A binary entropy coder may binarize the residual value $\alpha_{res}$ into the flags $f_i$ ($i \ge 0$) and entropy code the binarized residual value as well as the sign bit.
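The flag-based binarization described above may be sketched as follows. The function names and the flag polarities (a flag value of 1 meaning "equal", a sign bit of 1 meaning negative) are assumptions of this sketch, and the residual is assumed to be an already-quantized integer.

```python
def binarize_residual(residual):
    """Return the list of (name, bit) pairs coding an integer residual."""
    if residual == 0:
        return [("f0", 1)]
    magnitude = abs(residual)
    bins = [("f0", 0), ("sign", 1 if residual < 0 else 0)]
    # Unary part: flag f_i is 1 only when the magnitude equals i.
    for i in range(1, magnitude + 1):
        bins.append((f"f{i}", 1 if i == magnitude else 0))
    return bins


def debinarize_residual(bins):
    """Inverse of binarize_residual."""
    iterator = iter(bins)
    if next(iterator)[1] == 1:
        return 0
    negative = next(iterator)[1] == 1
    magnitude = 0
    for _, flag in iterator:
        magnitude += 1
        if flag:
            break
    return -magnitude if negative else magnitude
```

For example, a residual of -2 is binarized as f0 = 0, sign = 1, f1 = 0, f2 = 1. Each bin could then be passed to the binary entropy coder with its own context.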
Compression of the residual value $\alpha_{res}$ may be improved by determining bounds as shown in
The binary entropy coder used to code the binarized residual value $\alpha_{res}$ may be a context-adaptive binary arithmetic coder (CABAC) such that the probability model (also referred to as a context or an entropy coder) used to code at least one bit (e.g., $f_i$ or sign bit) of the binarized residual value $\alpha_{res}$ is updated depending on previously coded bits. The probability model of the binary entropy coder may be determined, for example, based on contextual information such as the values of the bounds m and M, the position of vertices $V_k$, or the size of the cuboid. The selection of the probability model (also referred to as an entropy coder or context) may be performed by a dynamic OBUF scheme with the contextual information described herein as inputs.
The reconstruction of a decoded point cloud from the set of TriSoup triangles may be referred to as “voxelization” and may be performed, for example, by ray tracing or rasterization, for each triangle individually before duplicate voxels from the voxelized triangles are removed.
An intersection point 904 (shown as $P_{int}$), if any, between ray 900 and a TriSoup triangle 901 belonging to a cube 902, corresponding to a TriSoup node, may be rounded (e.g., quantized) to obtain a decoded point corresponding to a voxel. For example, a ray, launched parallel to a coordinate axis in 3D space, may intersect a TriSoup triangle if and only if the projection, along the ray direction, of the center of a voxel belongs to the TriSoup triangle. The ray may be determined to intersect the TriSoup triangle if the point of intersection corresponds to the center of the voxel. This intersection may be determined by using (e.g., applying) a ray-triangle intersection algorithm (e.g., a ray tracing or ray casting technique) such as the Möller-Trumbore algorithm to generate voxels representing the triangle.
Ray tracing techniques such as the Möller-Trumbore algorithm may be, for example, based on generating, with respect to a triangle, barycentric coordinates of points of intersection between rays and a plane of the triangle. Points of the triangle may be determined from the barycentric coordinates.
under the condition $u + v + w = 1$. Any point P of the plane (containing TriSoup triangle 910) may have unique coordinates (u,v,w) in the barycentric coordinate system. A point with barycentric coordinates (u,v,w) may be described by an ordered triple of numbers u, v, and w. Barycentric coordinates (u,v,w) that sum to 1 (i.e., $u + v + w = 1$) may be known as homogeneous barycentric coordinates or normalized barycentric coordinates. The barycentric coordinates of the intersection point with respect to TriSoup triangle 910 may be determined using, for example, the well-known Möller-Trumbore algorithm.
The three vertices A, B, C of TriSoup triangle 910 may have respective barycentric coordinates A (1,0,0), B (0,1,0) and C (0,0,1), by converting points with Cartesian coordinates in 3D space to homogeneous barycentric coordinates. The convex hull (i.e., TriSoup triangle 910) of the three vertices A, B, and C may be equal to the set of all points such that the barycentric coordinates u, v, and w are each greater than or equal to zero:
$0 \le u, v, w$
The intersection point may be determined to belong to TriSoup triangle 910, for example, based on the intersection point having barycentric coordinates with an ordered triple of values that are each greater than or equal to zero. The intersection point may be determined to not belong to TriSoup triangle 910, for example, if at least one of the barycentric coordinates (i.e., one of u, v, or w) is negative. In that case, the intersection point is on the plane of the triangle, but not on an edge of or within TriSoup triangle 910. A point determined to belong to TriSoup triangle 910 may be a point at which the ray intersects TriSoup triangle 910 (e.g., within or at an edge of TriSoup triangle 910).
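The barycentric membership test described above may be sketched with the Möller-Trumbore algorithm as follows. The function names are hypothetical, and two choices are assumptions of this sketch: intersections behind the ray origin are rejected, and the intersection point is rounded to the nearest integer coordinates to obtain a voxel.

```python
import math


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def ray_triangle(origin, direction, a, b, c, eps=1e-9):
    """Return ((u, v, w), t) for a hit, or None if the ray misses the triangle.

    (u, v, w) are the barycentric weights of A, B and C; t is the ray parameter.
    """
    edge_ab = sub(b, a)
    edge_ac = sub(c, a)
    h = cross(direction, edge_ac)
    det = dot(edge_ab, h)
    if abs(det) < eps:
        return None  # ray parallel to the plane of the triangle
    inv_det = 1.0 / det
    s = sub(origin, a)
    v = inv_det * dot(s, h)          # weight of B
    q = cross(s, edge_ab)
    w = inv_det * dot(direction, q)  # weight of C
    u = 1.0 - v - w                  # weight of A, so that u + v + w = 1
    t = inv_det * dot(edge_ac, q)
    if u < 0 or v < 0 or w < 0 or t < 0:
        return None
    return (u, v, w), t


def voxel_from_ray(origin, direction, a, b, c):
    """Round the intersection point, if any, to integer voxel coordinates."""
    hit = ray_triangle(origin, direction, a, b, c)
    if hit is None:
        return None
    _, t = hit
    return tuple(int(math.floor(origin[i] + t * direction[i] + 0.5))
                 for i in range(3))
```

Launching one such ray per voxel column along each coordinate axis and collecting the non-empty results would give one possible voxelization of a triangle, with duplicates removed afterwards.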
In at least some technologies, the geometry of the point cloud may be represented by a master space-partitioning of a volume encompassing the point cloud. The master space-partitioning of the volume may be obtained by splitting the volume into sub-volumes. For example, a master occupancy tree structure may be derived from the volume by recursively splitting the volume into sub-volumes. The master occupancy tree structure may comprise a root node associated with the volume and non-leaf nodes and leaf nodes associated with sub-volumes as discussed in relation with
A tree-based geometry representation model as discussed on
Information associated with a tree-based geometry representation model of a current sub-volume may be coded, for example, based on information associated with a tree-based geometry representation model of at least one neighboring sub-volume. For example, the at least one neighboring sub-volume may be adjacent or spatially close to the current sub-volume as shown on
Information associated with a tree-based geometry representation model of a current sub-volume of the master occupancy tree may be entropy coded using OBUF or dynamic OBUF, for example, based on information associated with a tree-based geometry representation model of at least one neighboring sub-volume of the master occupancy tree as discussed above in
The geometry of a sub-volume associated with an occupied leaf node may be alternatively represented by a triangle-based geometry representation model, for example, a TriSoup-based geometry representation model.
Information associated with a TriSoup-based geometry representation model of a current sub-volume may be coded, for example, based on information associated with a TriSoup-based geometry representation model of at least one neighboring sub-volume. For example, the at least one neighboring sub-volume may be adjacent or spatially close to the current sub-volume as shown on
Information associated with a TriSoup-based geometry representation model of a current sub-volume associated with an occupied leaf node of the master occupancy tree (e.g., a current TriSoup node) may be vertex information of each vertex along a TriSoup edge of a TriSoup triangle belonging to the current TriSoup node. Vertex information associated with a vertex of a TriSoup triangle of the current TriSoup node may be a presence flag on an edge of the TriSoup triangle and possibly a position of the vertex along an edge of the TriSoup triangle. The entropy coding of the vertex information may be based on vertex information of at least one vertex along at least one edge of at least one TriSoup triangle belonging to at least one TriSoup node adjacent to the current TriSoup node.
A geometry of a portion of the point cloud contained in a current sub-volume of the master occupancy tree may be represented by a first type of geometry representation model, for example, a tree-based geometry representation model or a triangle-based geometry representation model. Information associated with the first type of geometry representation model of the current sub-volume may then be coded (encoded or decoded), for example, entropy coded, based on information associated with the first type of geometry representation model representing a portion of the point cloud contained in at least one neighboring sub-volume of the current sub-volume.
The information associated with the first type of geometry representation model of the current sub-volume may also be coded, for example, based on information associated with a second type of geometry representation model representing a portion of the point cloud contained in at least one other neighboring sub-volume of the current sub-volume.
The first type may be different from the second type. For example, the first type may be a tree-based geometry representation model and the second type may be a triangle-based geometry representation model or, inversely, the first type may be a triangle-based geometry representation model and the second type may be a tree-based geometry representation model.
In at least some Test Models (TM) under development at MPEG, for example, the TMC13 and the GeS-TM, the coding of information associated with a type of geometry representation model (e.g., a tree-based geometry representation model or a TriSoup-based geometry representation model) of a current sub-volume may be performed based only on information associated with the same type of geometry representation model.
If both first and second types of geometry representation models represent portions of the point cloud in sub-volumes, the information associated with the first type of geometry representation model may be coded, for example, based only on information associated with the first type of geometry representation model. The coding may not be based on information associated with the second type of geometry representation model of neighboring sub-volumes of the sub-volumes. Coding performance may be improved if the coding of information associated with a first type of geometry representation model of a current sub-volume can be performed based on both information associated with the first type of geometry representation model of neighboring sub-volumes of the current sub-volume and information associated with a second type of geometry representation model of neighboring sub-volumes of the current sub-volume.
Examples described herein are related to converting (e.g., regenerating, transforming) geometry representation models of neighboring sub-volumes of a current sub-volume to be coded such that converted geometry representation models are of the same type as that used to represent a geometry of a portion of a point cloud contained in the current sub-volume. Additional geometry information of neighboring sub-volumes of the current sub-volume may be obtained to predictively code geometry information of the geometry representation model representing the point cloud portion in the current sub-volume. Increasing the amount of relevant geometry information may lead to higher prediction accuracy, which leads to less coding bits and higher compression. First information associated with a first type of geometry representation model may represent a first portion of a point cloud geometry contained in a first sub-volume of a volume encompassing the point cloud geometry. The first information associated with a first type of geometry representation model may be obtained. The first portion of the point cloud geometry may be reconstructed using the first information. A second type of geometry representation model may represent a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume. Based on the second type of geometry representation model and the first type of geometry representation model being different from the second type, second information associated with the second type of geometry representation model representing the reconstructed first portion may be generated. Third information associated with the second type of geometry representation model representing the second portion of the point cloud geometry may be coded, for example, based on the second information.
The examples described herein may allow coding information (e.g., third information) associated with a second type of geometry representation model to represent a portion of the point cloud geometry contained in a current sub-volume (e.g., second sub-volume) based on generated information associated with second type of geometry representation model (different from the first type of geometry representation model) to represent portions of the point cloud geometry contained in neighboring sub-volumes (e.g., first sub-volume) containing a portion of the point cloud geometry that is represented by the first type of geometry representation model.
The information associated with a tree-based geometry representation model may be coded, for example, based on information generated from information associated with a first type of geometry representation model. The first type of geometry representation model may be, for example, a TriSoup-based geometry representation model. The TriSoup-based geometry representation model may represent a portion of the point cloud geometry contained in one neighboring sub-volume. The information associated with a tree-based geometry representation model may be coded, for example, based on information generated from information associated with the first type of geometry representation model (e.g., TriSoup-based geometry representation model), for example, if the second type of geometry representation model is a tree-based geometry representation model. The information associated with the triangle-based geometry representation model may be coded, for example, based on information generated from information associated with a first type of geometry representation model. The first type of geometry representation model may be, for example, a tree-based geometry representation model. The tree-based geometry representation model may represent a portion of the point cloud geometry contained in one neighboring sub-volume. The information associated with the triangle-based geometry representation model may be coded, for example, based on information generated from information associated with a first type of geometry representation model, for example, if the second type of geometry representation model is a triangle-based geometry representation model (e.g., a TriSoup-based geometry representation model).
At step 1210, first information (e.g., first information 1212) may be obtained. The first information (e.g., first information 1212) may be associated with a first type of geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume (e.g., first sub-volume 1211) of the volume encompassing the point cloud geometry. The first type of geometry representation model may be a tree-based geometry representation model, for example, an occupancy tree. The first information (e.g., first information 1212) may be occupancy related information, for example, as described herein with respect to
At step 1220, a reconstructed first portion (e.g., reconstructed first portion 1222) of the point cloud geometry may be obtained. The reconstructed first portion (e.g., reconstructed first portion 1222) may be obtained, for example, by reconstructing the first portion of the point cloud using the first information (e.g., first information 1212). At step 1230, second information (e.g., second information 1232) may be generated. The second information (e.g., second information 1232) may be generated, for example, based on a second type of geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume 1231 (e.g., current sub-volume) neighboring the first sub-volume. The first type of geometry representation model may be different from the second type of geometry representation model. The second information may be associated with the second type of geometry representation model representing the reconstructed first portion (e.g., reconstructed first portion 1222) of the point cloud geometry. The process at step 1230 may be performed, for example, based on (e.g., in response to) an indication of the second sub-volume (e.g., second sub-volume 1231). The second sub-volume may correspond, for example, to a current sub-volume to be coded. The process at step 1230 may be performed, for example, based on an indication of the second sub-volume being coded using the second type of geometry representation model. The second type of geometry representation model may be different from the first type of geometry representation model used to represent point cloud geometry contained in the first sub-volume (e.g., first sub-volume 1211) neighboring the second sub-volume.
At step 1235, third information (e.g., third information 1241) associated with the second type of geometry representation model representing the second portion of the point cloud geometry, may be obtained. At step 1240, third information (e.g., third information 1241) may be coded. The third information may be coded, for example, based on the second information (e.g., second information 1232). At step 1240, the process 1200 may further comprise obtaining fourth information (e.g., fourth information 1252). The fourth information may be associated with the second type of geometry representation model representing a third portion of a point cloud geometry contained in a third sub-volume (e.g., third sub-volume 1251) neighboring the second sub-volume (e.g., second sub-volume 1231). The third information (e.g., third information 1241) may be coded. The third information may be coded, for example, based on the second information (e.g., second information 1232). The third information may be coded, for example, based on the second information (e.g., second information 1232) and, if available, the fourth information (e.g., fourth information 1252).
The examples described herein may allow coding the third information associated with the second type of geometry representation model representing the second portion of the point cloud geometry. The third information may be coded, for example, based on information associated with both the first and the second types of geometry representation models of neighboring sub-volumes. The third information associated with a tree-based geometry representation model may be coded, for example, based on first information associated with a triangle-based geometry representation model and fourth information associated with tree-based geometry representation model. The third information associated with a triangle-based geometry representation model may be coded, for example, based on first information associated with a tree-based geometry representation model and fourth information associated with triangle-based geometry representation model.
The third information (associated with the second type of geometry representation model) may be coded, for example, based on a plurality of reconstructed neighboring sub-volumes. The plurality of reconstructed neighboring sub-volumes may include at least one sub-volume that was reconstructed using the first type of geometry representation model (e.g., reconstructed first portion 1222) and one sub-volume (e.g., third sub-volume 1251) that was reconstructed using the second type of geometry representation model. Neighboring sub-volumes coded using different geometry representation models may be considered in coding (e.g., used in context selection) the third information (e.g., third information 1241). Neighboring sub-volumes coded using different geometry representation models may be considered in coding, for example, by obtaining the second information from reconstructed first portion 1222 at step 1230.
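The flow of steps 1220 to 1240 may be illustrated by the following minimal Python sketch. It is not taken from any test model. The `SubVolume` class, the `"tree"` and `"triangle"` labels, and the `reconstruct` and `convert` lookup tables are hypothetical names introduced only for illustration. The sketch collects, for a current sub-volume, neighbor information expressed in the current sub-volume's own model type, regenerating that information from reconstructed points if a neighbor uses a different model type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SubVolume:
    model_type: str  # hypothetical labels: "tree" or "triangle"
    info: Any        # model-specific information (e.g., occupancy bits or vertex information)


def gather_prediction_info(
    current: SubVolume,
    neighbors: List[SubVolume],
    reconstruct: Dict[str, Callable[[Any], list]],
    convert: Dict[str, Callable[[list], Any]],
) -> List[Any]:
    """Collect neighbor information expressed in the current sub-volume's model type."""
    collected = []
    for nb in neighbors:
        if nb.model_type == current.model_type:
            # Same type: the neighbor's information may be used directly
            # (this plays the role of the fourth information).
            collected.append(nb.info)
        else:
            # Different type: reconstruct the neighbor's points (as at step 1220),
            # then regenerate information of the current type (as at step 1230).
            points = reconstruct[nb.model_type](nb.info)
            collected.append(convert[current.model_type](points))
    return collected
```

The returned list may then be used, for example, to select contexts for coding the third information. Because the conversion starts from reconstructed points, a decoder may be able to repeat the same derivation without additional signaling.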
Process 1200 may further comprise coding a first indication indicating the second type of geometry representation model. An encoder may encode the first indication in a bitstream. A decoder may decode the first indication from a bitstream.
The first indication may indicate, for example, a tree-based geometry representation model or a triangle-based geometry representation model. The triangle-based geometry representation model may include, for example, a TriSoup-based geometry representation model. The triangle-based geometry representation model may be associated with the second sub-volume (e.g., current sub-volume). The first indication may be signaled in a bitstream for at least one sub-volume containing a portion of the point cloud geometry. The first indication may be signaled, for example, for each sub-volume containing a portion of the point cloud geometry. The first indication may be signaled, for example, for each group of more than one sub-volume containing a portion of the point cloud geometry.
Coding the third information 1241 (as shown for example at step 1240) may comprise binarizing the third information 1241 and coding (e.g., entropy coding) the binarized third information 1241. Coding the third information 1241 (as shown for example at step 1240) may comprise binarizing the third information 1241 and coding (e.g., entropy coding) the binarized third information 1241, for example, based on contextual information derived from the second information 1232 by a coder (e.g., a binary entropy coder). The binarized third information 1241 may be coded (e.g., entropy coded). The binarized third information 1241 may be coded, for example, based on contextual information derived from the second information 1232 and fourth information 1252.
Coding (e.g., entropy coding) may use probabilities, for example, based on the contextual information. The coder (e.g., binary entropy coder) may be a Context Adaptive Binary Arithmetic Coder (CABAC) in which the contexts are selected, for example, based on the contextual information. The coder (e.g., binary entropy coder) may be OBUF or dynamic OBUF as discussed herein, for example, based on the contextual information.
The first type of geometry representation model may be a triangle-based geometry representation model representing the first portion of the point cloud and the second type of geometry representation model may be a tree-based geometry representation model representing the reconstructed first portion 1222 of the point cloud, the second portion of the point cloud and the third portion of the point cloud.
The first information may be associated with a TriSoup-based geometry representation model of the first sub-volume 1310 associated with an occupied leaf node of a master occupancy tree (e.g., first TriSoup node). The first information may include vertex information of each vertex along a TriSoup edge of a TriSoup triangle belonging to the first TriSoup node. Vertex information associated with a vertex of a TriSoup triangle of the first TriSoup node may be a presence flag on an edge of the TriSoup triangle. Vertex information associated with a vertex of a TriSoup triangle of the first TriSoup node may be a presence flag on an edge of the TriSoup triangle and a position of the vertex along an edge of the TriSoup triangle.
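The role of the contextual information may be illustrated by the following Python sketch. It is a simplified stand-in and does not reproduce CABAC state machines or OBUF. The count-based `AdaptiveBinaryModel` and the `context_from_neighbors` rule (a capped count of set neighbor bits) are assumptions made only for illustration. The sketch keeps one adaptive probability estimate per context and reports the ideal code length, in bits, of each binary symbol.

```python
import math
from collections import defaultdict


class AdaptiveBinaryModel:
    """Per-context adaptive probability estimate based on simple counts."""

    def __init__(self):
        # Smoothed counts of zeros and ones observed in each context.
        self.counts = defaultdict(lambda: [1, 1])

    def p_one(self, ctx):
        c0, c1 = self.counts[ctx]
        return c1 / (c0 + c1)

    def cost_and_update(self, ctx, bit):
        """Return the ideal code length of `bit` in bits, then update the context."""
        p = self.p_one(ctx) if bit else 1.0 - self.p_one(ctx)
        self.counts[ctx][bit] += 1
        return -math.log2(p)


def context_from_neighbors(second_info_bits, fourth_info_bits=()):
    """Hypothetical context rule: number of set neighbor bits, capped at 3."""
    return min(sum(second_info_bits) + sum(fourth_info_bits), 3)
```

In this sketch, bits of the binarized third information that share a context become cheaper to code as the estimate for that context adapts, which is the mechanism by which additional neighbor information may reduce the number of coding bits.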
The second, respectively third and fourth, information may include information associated with a local space-partitioning of the first sub-volume 1310, respectively with the second sub-volume 1320 and third sub-volume 1330. The local space-partitioning of the first 1310, respectively second 1320 and third 1330 sub-volume may be a local space partitioning tree. The local space-partitioning of the first 1310, respectively second 1320 and third 1330 sub-volume may be, for example, a local occupancy tree, and may be a part of the master occupancy tree. The second, respectively third and fourth, information may include occupancy bits of the occupancy words of occupied leaf nodes of the local occupancy tree of the first 1310, respectively second 1320 and third 1330 sub-volume.
The reconstructed first portion 1222 of the point cloud geometry may comprise a set of points defined by voxelizing the at least one triangle of the triangle-based geometry representation model representing the first portion of the point cloud. Voxelization of triangle may be performed as described herein with respect to
The tree-based geometry representation model representing the reconstructed first portion 1222 contained in the first sub-volume 1310 may comprise a local space-partitioning of the first sub-volume 1310. The local space-partitioning of the first sub-volume 1310 may be obtained, for example, by recursively splitting the first sub-volume 1310 into local sub-volumes containing at least one point of the set of points. The local space-partitioning of the first sub-volume 1310 may be a local space-partitioning tree and the second information 1232 may indicate information related to the leaf nodes of the local space-partitioning tree.
The local space-partitioning tree may be an occupancy tree comprising occupied leaf nodes containing at least one point of the set of points. The second information 1232 may indicate an occupancy bit associated with each leaf node of the local space-partitioning tree. The local space-partitioning tree may be part of a master space-partitioning tree splitting the volume encompassing the point cloud geometry into the sub-volumes. The recursive splitting may be stopped, for example, if a stopping condition is fulfilled.
The stopping condition may be fulfilled, for example, if a local sub-volume corresponding to each leaf node of the local space-partitioning tree contains a single point of the set of points. The stopping condition may be fulfilled, for example, if the size of a local sub-volume corresponding to one leaf node of the local space-partitioning tree is below a minimum size. The minimum size may be signaled in a bitstream. The stopping condition may be fulfilled, for example, if a maximum depth is reached. The maximum depth may be signaled in the bitstream.
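The recursive splitting and its stopping conditions may be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the sub-volume is a cube with a power-of-two edge length, points have integer coordinates, each split produces eight children, and the minimum-size condition is tested as `size <= min_size`. The function name `build_occupancy_leaves` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int, int]
Leaf = Tuple[Point, int]  # (origin, edge length) of an occupied leaf


def build_occupancy_leaves(
    points: List[Point],
    origin: Point,
    size: int,
    min_size: int = 1,
    max_depth: Optional[int] = None,
    depth: int = 0,
) -> List[Leaf]:
    """Recursively split a cubic sub-volume and return its occupied leaves."""
    inside = [
        p for p in points
        if all(origin[i] <= p[i] < origin[i] + size for i in range(3))
    ]
    if not inside:
        return []  # unoccupied local sub-volumes are not split further
    stop = (
        len(set(inside)) == 1                               # a single point remains
        or size <= min_size                                 # minimum size reached
        or (max_depth is not None and depth >= max_depth)   # maximum depth reached
    )
    if stop:
        return [(origin, size)]
    half = size // 2
    leaves: List[Leaf] = []
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                child = (origin[0] + dx, origin[1] + dy, origin[2] + dz)
                leaves += build_occupancy_leaves(
                    inside, child, half, min_size, max_depth, depth + 1
                )
    return leaves
```

For example, two points at opposite corners of a sub-volume of size 4 yield two occupied leaves of size 2, because each child then contains a single point. Setting `max_depth=0` stops the splitting at the sub-volume itself.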
The third information 1241 may indicate an occupancy bit associated with each leaf node of a local spanning-tree of the tree-based geometry representation model representing the second portion of the point cloud. The third information may be entropy coded by OBUF or dynamic OBUF, for example, based on the second information indicating occupancy bits of leaf nodes of the local space-partitioning tree of tree-based geometry representation model representing the reconstructed first portion 1222 of the point cloud. The fourth information 1252 may indicate an occupancy bit associated with each leaf node of a local spanning-tree of the tree-based geometry representation model to represent the third portion of the point cloud. The third information 1241 may be entropy coded by OBUF or dynamic OBUF, for example, based on the second information and the fourth information.
The first information may be associated with a tree-based geometry representation model of the first sub-volume 1610 associated with an occupied leaf node of a master occupancy tree (e.g., TriSoup node). The first information may include a local space-partitioning of the first sub-volume.
The second information may include information associated with a triangle-based geometry representation model representing the reconstructed first portion 1222 of the point cloud geometry contained in the first sub-volume 1610. The third, respectively fourth, information may include information associated with a triangle-based geometry representation model representing the second portion contained in the second sub-volume 1620, respectively the third portion of the point cloud geometry contained in the third sub-volume 1630.
The third, respectively fourth, information may include vertex information of each vertex along a TriSoup edge of a TriSoup triangle belonging to a TriSoup node corresponding to the second sub-volume 1620, respectively third sub-volume 1630. Vertex information associated with a vertex along an edge of a TriSoup node may be a presence flag on an edge of the TriSoup node. Vertex information associated with a vertex along an edge of a TriSoup node may be a presence flag on an edge of the TriSoup node and a position of the vertex along an edge of the TriSoup node.
The reconstructed first portion 1222 of the point cloud geometry may be a set of points derived from a space-partitioning of the tree-based geometry representation model representing the first portion of the point cloud contained in the first sub-volume 1610. The set of points may be derived, for example, based on points defined as being centers of sub-volumes of the space-partitioning of the tree-based geometry representation model representing the first portion of the point cloud contained in the first sub-volume 1610.
The center point of each occupied leaf node of the space-partitioning tree may be a point of the set of points. The center point of each occupied leaf node of the space-partitioning tree may be a point of the set of points, for example, if the space-partitioning is a space-partitioning tree.
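Deriving the set of points from occupied leaf nodes may be sketched in Python as follows. The representation of a leaf as an `(origin, size)` pair for an axis-aligned cube is an assumption made for illustration, and `leaf_centers` is a hypothetical helper name.

```python
from typing import List, Tuple

Leaf = Tuple[Tuple[float, float, float], float]  # (origin, edge length)


def leaf_centers(leaves: List[Leaf]) -> List[Tuple[float, float, float]]:
    """Return the center point of each occupied leaf node."""
    return [
        (origin[0] + size / 2.0, origin[1] + size / 2.0, origin[2] + size / 2.0)
        for origin, size in leaves
    ]
```

The resulting centers may serve as the reconstructed first portion from which triangle-based information is then generated.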
Triangle-based geometry representation model representing the reconstructed first portion 1222 of the point cloud geometry contained in the first sub-volume 1610 may comprise at least one triangle. The at least one triangle may be defined by vertices located on edges of the first sub-volume 1610. The second information may include vertex information of at least one vertex along at least one edge of the first sub-volume 1610. In some instances there may be at most one vertex per edge.
Vertex information associated with a vertex along an edge of the first sub-volume 1610 may be a presence flag of the vertex on an edge of the first sub-volume 1610. Vertex information associated with a vertex along an edge of the first sub-volume 1610 may be a presence flag of the vertex on an edge of the first sub-volume 1610 and a position of the vertex along an edge of the sub-volume 1610. A vertex along an edge of the first sub-volume 1610 may be derived, for example, based on a sub-set of the set of points. The sub-set of points may be derived, for example, based on points of the set of points having a distance lower than (e.g., smaller than, below, etc.) a distance threshold from the edge.
A position of a vertex along an edge of the first sub-volume 1610 may be derived, for example, based on projection of points of the sub-set of points on the edge. The points of the sub-set of points may be orthogonally projected on the edge.
Projection of the points of the sub-set of points on the edge may define positions of the projected points of the sub-set of points on the edge. The edge may be defined between a first and a second vertex. Projection of a point on the edge may be a position of the projected point on the edge between the first and second vertex. The position of the vertex along the edge may be derived, for example, based on an average or median value of the positions of the projected points of the sub-set on the edge.
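The derivation of a vertex along an edge may be sketched in Python as follows. This is a sketch under assumptions: the position is expressed as a fraction between 0 and 1 from the first to the second end point of the edge, points whose orthogonal projection falls outside the edge are ignored, and a return value of `None` stands for an absent vertex (presence flag equal to zero). The function name and parameters are hypothetical.

```python
import math
from statistics import mean, median


def vertex_position_on_edge(points, a, b, dist_threshold, use_median=False):
    """Return the vertex position as a fraction in [0, 1] along edge a->b, or None."""
    ab = [b[i] - a[i] for i in range(3)]
    length_sq = sum(c * c for c in ab)
    positions = []
    for p in points:
        ap = [p[i] - a[i] for i in range(3)]
        # Parameter of the orthogonal projection of p on the line through a and b.
        t = sum(ap[i] * ab[i] for i in range(3)) / length_sq
        if not 0.0 <= t <= 1.0:
            continue  # assumption: projections outside the edge are ignored
        foot = [a[i] + t * ab[i] for i in range(3)]
        distance = math.sqrt(sum((p[i] - foot[i]) ** 2 for i in range(3)))
        if distance < dist_threshold:
            positions.append(t)  # the point belongs to the sub-set
    if not positions:
        return None  # no nearby point: no vertex on this edge
    return median(positions) if use_median else mean(positions)
```

For an edge from (0, 0, 0) to (4, 0, 0) and a threshold of 1.0, the points (1, 0.5, 0) and (3, 0.5, 0) project to fractions 0.25 and 0.75, so the averaged vertex position is 0.5. A point at (2, 3, 0) is too far from the edge and does not contribute.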
The position of the vertex V1720 may be quantized along the edge 1720. The position of the vertex V1720 may be quantized along the edge 1720, for example, based on a quantizing parameter Δp. The edge may be split into four intervals of a same size (e.g., uniform quantizing) and an index, for example, equal to 1 to 4, may indicate the position of a vertex along the edge.
The third information 1241 may indicate vertex information of a vertex along an edge of the second sub-volume 1620. Vertex information may be coded (e.g., entropy coded) by OBUF or dynamic OBUF, for example, based on the second information indicating vertex information of at least one vertex along at least one edge of the first sub-volume 1610.
The third information 1241 may indicate a presence flag of a vertex on an edge of the second sub-volume 1620. The presence flag may be coded (e.g., entropy coded) by OBUF or dynamic OBUF, for example, based on the second information indicating a presence flag of at least one vertex along at least one edge of the first sub-volume 1610. The third information 1241 may indicate a position of a vertex on an edge of the second sub-volume 1620. The position of the vertex may be coded (e.g., entropy coded) by OBUF or dynamic OBUF, for example, based on the second information indicating a position of at least one vertex along at least one edge of the first sub-volume 1610. The fourth information 1252 may indicate vertex information of a vertex along an edge of the third sub-volume 1630. Vertex information may be coded (e.g., entropy coded) by OBUF or dynamic OBUF, for example, based on the second and fourth information.
The fourth information 1252 may indicate a presence flag of a vertex on an edge of the third sub-volume. The presence flag may be coded (e.g., entropy coded) by OBUF or dynamic OBUF, for example, based on the second and fourth information. The fourth information may indicate a position of a vertex on an edge of the third sub-volume. The position of the vertex may be coded (e.g., entropy coded) by OBUF or dynamic OBUF, for example, based on the second and fourth information.
The definition of the triangle-based geometry representation model of the portion of the set of points contained in the sub-volume 1113 may be limited to the definition of vertices on edges of the sub-volume 1113. The definition of the triangle-based geometry representation model of the portion of the set of points contained in the sub-volume 1113 may be limited to the definition of vertices on edges of the sub-volume 1113, for example, if vertex information of those vertices is useful for coding vertex information of triangles of the triangle-based geometry representation model representing the second portion contained in the second sub-volume 1100. The third information 1241 of the triangle-based geometry representation model representing the second portion of the point cloud geometry contained in the second sub-volume 1100 may be coded, for example, based on the information of the triangle-based geometry representation models of the neighboring sub-volumes 1110 to 1113.
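Uniform quantizing of a vertex position may be sketched in Python as follows. The sketch assumes that the position is given as a fraction of the edge length, so that the interval size corresponds to the edge length divided by the number of intervals, and that a quantized index is reconstructed at the midpoint of its interval. Both function names, and the midpoint reconstruction rule, are assumptions made for illustration.

```python
def quantize_vertex_position(t: float, num_intervals: int = 4) -> int:
    """Map a fractional position t in [0, 1] to an index in 1..num_intervals."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    index = int(t * num_intervals) + 1
    return min(index, num_intervals)  # t == 1.0 falls in the last interval


def dequantize_vertex_position(index: int, num_intervals: int = 4) -> float:
    """Reconstruct a position at the midpoint of the indexed interval (assumption)."""
    if not 1 <= index <= num_intervals:
        raise ValueError("index out of range")
    return (index - 0.5) / num_intervals
```

With four intervals, a position at 30 percent of the edge length maps to index 2, which is reconstructed at 0.375 under the midpoint assumption.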
At step 2010, first information may be determined (e.g., obtained, generated, retrieved, calculated). The first information may be associated with a first type of geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume of the volume encompassing the point cloud geometry.
At step 2020, a reconstructed first portion of the point cloud geometry may be obtained by reconstructing the first portion of the point cloud using the first information.
At step 2030, second information may be determined (e.g., obtained, generated, retrieved, calculated). The second information may be determined, for example, based on a second type of geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume. The first type of geometry representation model may be different from the second type of geometry representation model. The second information may be associated with the second type of geometry representation model representing the reconstructed first portion of the point cloud geometry.
At step 2040, third information may be encoded. The third information may be encoded, for example, based on the second information. The third information may be associated with the second type of geometry representation model representing the second portion of the point cloud geometry.
At step 2060, first information may be determined (e.g., obtained, generated, retrieved, calculated). The first information may be associated with a first type of geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume of the volume encompassing the point cloud geometry.
At step 2070, a reconstructed first portion of the point cloud geometry may be obtained by reconstructing the first portion of the point cloud using the first information.
At step 2080, second information may be determined (e.g., obtained, generated, retrieved, calculated). The second information may be determined, for example, based on a second type of geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume. The first type of geometry representation model may be different from the second type of geometry representation model. The second information may be associated with the second type of geometry representation model representing the reconstructed first portion of the point cloud geometry.
At step 2090, third information may be decoded. The third information may be decoded, for example, based on the second information. The third information may be associated with the second type of geometry representation model representing the second portion of the point cloud geometry.
The computer system 2100 may comprise one or more processors, such as a processor 2104. The processor 2104 may be a special purpose processor, a general purpose processor, a microprocessor, and/or a digital signal processor. The processor 2104 may be connected to a communication infrastructure 2102 (for example, a bus or network). The computer system 2100 may also comprise a main memory 2106 (e.g., a random access memory (RAM)), and/or a secondary memory 2108.
The secondary memory 2108 may comprise a hard disk drive 2110 and/or a removable storage drive 2112 (e.g., a magnetic tape drive, an optical disk drive, and/or the like). The removable storage drive 2112 may read from and/or write to a removable storage unit 2116. The removable storage unit 2116 may comprise a magnetic tape, optical disk, and/or the like. The removable storage unit 2116 may be read by and/or may be written to the removable storage drive 2112. The removable storage unit 2116 may comprise a computer usable storage medium having stored therein computer software and/or data.
The secondary memory 2108 may comprise other similar means for allowing computer programs or other instructions to be loaded into the computer system 2100. Such means may include a removable storage unit 2118 and/or an interface 2114. Examples of such means may comprise a program cartridge and/or cartridge interface (such as in video game devices), a removable memory chip (such as an erasable programmable read-only memory (EPROM) or a programmable read-only memory (PROM)) and associated socket, a thumb drive and USB port, and/or other removable storage units 2118 and interfaces 2114 which may allow software and/or data to be transferred from the removable storage unit 2118 to the computer system 2100.
The computer system 2100 may also comprise a communications interface 2120. The communications interface 2120 may allow software and data to be transferred between the computer system 2100 and external devices. Examples of the communications interface 2120 may include a modem, a network interface (e.g., an Ethernet card), a communications port, etc. Software and/or data transferred via the communications interface 2120 may be in the form of signals which may be electronic, electromagnetic, optical, and/or other signals capable of being received by the communications interface 2120. The signals may be provided to the communications interface 2120 via a communications path 2122. The communications path 2122 may carry signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and/or any other communications channel(s).
A computer program medium and/or a computer readable medium may be used to refer to tangible storage media, such as removable storage units 2116 and 2118 or a hard disk installed in the hard disk drive 2110. The computer program products may be means for providing software to the computer system 2100. The computer programs (which may also be called computer control logic) may be stored in the main memory 2106 and/or the secondary memory 2108. The computer programs may be received via the communications interface 2120. Such computer programs, when executed, may enable the computer system 2100 to implement the present disclosure as discussed herein. In particular, the computer programs, when executed, may enable the processor 2104 to implement the processes of the present disclosure, such as any of the methods described herein. Accordingly, such computer programs may represent controllers of the computer system 2100.
Features of the disclosure may be implemented in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) and gate arrays. Implementation of a hardware state machine to perform the functions described herein will also be apparent to persons skilled in the relevant art(s).
The example in
A computing device may perform a method comprising multiple operations. The computing device may comprise a decoder. The computing device may determine first information associated with a first geometry model representing a first portion of a point cloud geometry contained in a first sub-volume of the point cloud geometry; reconstruct the first portion of the point cloud geometry using the first information; determine, based on a second geometry model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the second geometry model representing the reconstructed first portion, wherein the first geometry model may be different from the second geometry model; and decode, based on the second information, third information associated with the second geometry model representing the second portion of the point cloud geometry. The computing device may determine fourth information associated with the second geometry model representing a third portion of the point cloud geometry contained in a third sub-volume neighboring the second sub-volume; and wherein the decoding the third information, from a bitstream, may be based on the second information and the fourth information. 
The computing device may decode, from a bitstream, a first indication indicating the second geometry model, wherein: the first indication may further indicate at least one of: a tree-based geometry model associated with the second sub-volume; or a triangle-based geometry model associated with the second sub-volume; and the first indication is signaled in the bitstream for one or more sub-volumes containing a portion of the point cloud geometry, wherein the decoding the third information further may comprise: binarizing the third information; and decoding, based on contextual information derived from the second information, the binarized third information, wherein: the first geometry model may be a triangle-based geometry model representing the first portion of the point cloud geometry; and the second geometry model may be a tree-based geometry model representing the reconstructed first portion of the point cloud geometry, the second portion of the point cloud geometry, and a third portion of the point cloud geometry; wherein: the reconstructed first portion of the point cloud geometry may comprise a set of points defined by voxelizing at least one triangle of the triangle-based geometry model; and the tree-based geometry model representing the reconstructed first portion of the point cloud geometry may comprise a local space-partitioning of first sub-volume determined by recursively splitting the first sub-volume into local sub-volumes containing at least one point of the set of points; wherein: the local space-partitioning may be a local space-partitioning tree; the second information may indicate occupancy bit associated with each leaf node of the local space-partitioning tree; and the local space-partitioning tree may be part of a master space-partitioning tree splitting a volume encompassing the point cloud geometry into sub-volumes; wherein: the local space-partitioning may be a local space-partitioning tree; and the recursive splitting of the first sub-volume 
may be stopped based on: a local sub-volume corresponding to each leaf node of the local space-partitioning tree may contain at least one point of the set of points; or a size of a local sub-volume corresponding to one leaf node of the local space-partitioning tree may be below a minimum size; wherein decoding the third information may comprise entropy decoding from a bitstream, by a binary entropy coder; wherein the entropy decoding may use probabilities based on the contextual information; wherein the binary entropy coder may be a Context Adaptive Binary Arithmetic Coder in which the contexts are selected based on the contextual information; wherein the binary entropy coder may be Optimal Binary Coders with Update on the Fly (OBUF) or dynamic OBUF based on the contextual information; wherein the third information may indicate information related to the leaf nodes of the local space-partitioning tree; wherein the local space-partitioning tree may be an occupancy tree comprising occupied leaf nodes containing at least one point of the set of points; wherein the minimum size may be signaled in a bitstream; wherein the stopping condition may be fulfilled when a maximum depth is reached; wherein the maximum depth may be signaled in the bitstream; wherein the center point of each occupied leaf node of the space-partitioning tree may be a point of the set of points if the space-partitioning is a space-partitioning tree; wherein the second information may include vertex information of at least one vertex along at least one edge of the triangle vertices, at most one vertex per edge; wherein one vertex along one edge of the first sub-volume may be derived based on a sub-set of the set of points; wherein the sub-set of points may be derived based on points of the set of points having a distance lower than a threshold from the edge; wherein position of one vertex along one edge of the first sub-volume may be derived based on projection of points of the sub-set of points on 
the edge; wherein projection of the points of the sub-set of points on the edge may define positions of the projected points of the sub-set of points on the edge; wherein the position of the vertex along the edge may be derived based on an average or median value of positions of the projected points of the sub-set on the edge; wherein the third information may indicate a position of a vertex on an edge of the second sub-volume and the position of the vertex is entropy encoded by OBUF or dynamic OBUF based on the second information indicating a position of at least one vertex along at least one edge of the first sub-volume; wherein the fourth information may indicate vertex information of a vertex along an edge of the third sub-volume and the vertex information is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information; wherein the fourth information may indicate a presence flag of a vertex on an edge of the third sub-volume and the presence flag is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information; wherein the fourth information may indicate a position of a vertex on an edge of the third sub-volume and the position of the vertex is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to encode the point cloud frame. A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
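As a non-limiting illustration, the recursive splitting of a sub-volume into a local space-partitioning tree, with stopping based on a minimum size or a maximum depth, may be sketched as follows. The sketch assumes an octree over a cubic sub-volume whose edge length is a power of two and integer point coordinates. The names `Node`, `build_local_tree`, and `occupied_leaves`, and the parameters `min_size` and `max_depth`, are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass, field

Point = tuple[int, int, int]


@dataclass
class Node:
    origin: Point  # minimum corner of the cubic local sub-volume
    size: int      # edge length of the local sub-volume
    children: list["Node"] = field(default_factory=list)


def build_local_tree(points, origin, size, min_size=1, depth=0, max_depth=None):
    """Recursively split a cubic sub-volume, keeping only occupied children."""
    node = Node(origin, size)
    # Assumed stopping conditions: minimum size or maximum depth reached.
    if size <= min_size or (max_depth is not None and depth >= max_depth):
        return node
    half = size // 2
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                child_origin = (origin[0] + dx * half,
                                origin[1] + dy * half,
                                origin[2] + dz * half)
                # Points of the set that fall inside this child sub-volume.
                inside = [p for p in points
                          if all(child_origin[i] <= p[i] < child_origin[i] + half
                                 for i in range(3))]
                if inside:  # only sub-volumes containing at least one point are kept
                    node.children.append(
                        build_local_tree(inside, child_origin, half,
                                         min_size, depth + 1, max_depth))
    return node


def occupied_leaves(node):
    """Return the occupied leaf nodes of the tree in traversal order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in occupied_leaves(child)]
```

In this sketch, the occupied leaves returned by `occupied_leaves` play the role of the leaf nodes whose occupancy may serve as contextual information for a neighboring sub-volume. How occupancy bits are actually arranged and signaled is not specified by the sketch.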
A computing device may perform a method comprising multiple operations. The computing device may comprise a decoder. The computing device may determine first information associated with a first geometry model representing a first portion of a point cloud geometry contained in a first sub-volume of the point cloud geometry; determine, based on a second geometry model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the second geometry model representing a reconstructed first portion of the point cloud geometry, wherein the first geometry model is different from the second geometry model; determine fourth information associated with the second geometry model representing a third portion of the point cloud geometry contained in a third sub-volume neighboring the second sub-volume; and decode, based on the second information and the fourth information, third information associated with the second geometry model representing the second portion of the point cloud geometry; wherein: the third information may indicate an occupancy bit associated with each leaf node of a local spanning-tree representing the second portion of the point cloud geometry; and the third information may be decoded based on the second information indicating occupancy bits of leaf nodes of a local space-partitioning tree representing the reconstructed first portion of the point cloud geometry; wherein: the fourth information may indicate an occupancy bit associated with each leaf node of a local spanning-tree representing the third portion of the point cloud geometry; and the third information may be decoded based on the second information and the fourth information; wherein: the first geometry model may be a tree-based geometry representation model representing the first portion of the point cloud geometry; and the second geometry model may be a triangle-based geometry representation model 
representing: the reconstructed first portion of the point cloud geometry; the second portion of the point cloud geometry; and the third portion of the point cloud geometry; wherein: the reconstructed first portion of the point cloud geometry may be a set of points derived based on points defined as centers of sub-volumes of a space-partitioning of tree-based geometry representation model representing the first portion of the point cloud geometry; wherein: triangle-based geometry representation model representing the reconstructed first portion of the point cloud geometry contained in the first sub-volume may comprise at least one triangle defined by vertices located on edges of the first sub-volume; wherein decoding the third information may comprise entropy decoding from a bitstream, by a binary entropy coder; wherein the entropy decoding may use probabilities based on the contextual information; wherein the binary entropy coder may be a Context Adaptive Binary Arithmetic Coder in which the contexts are selected based on the contextual information; wherein the binary entropy coder may be Optimal Binary Coders with Update on the Fly (OBUF) or dynamic OBUF based on the contextual information; wherein the third information may indicate information related to the leaf nodes of the local space-partitioning tree; wherein the local space-partitioning tree may be an occupancy tree comprising occupied leaf nodes containing at least one point of the set of points; wherein the minimum size may be signaled in a bitstream; wherein the stopping condition may be fulfilled when a maximum depth is reached; wherein the maximum depth may be signaled in the bitstream; wherein the center point of each occupied leaf node of the space-partitioning tree may be a point of the set of points if the space-partitioning is a space-partitioning tree; wherein the second information may include vertex information of at least one vertex along at least one edge of the triangle vertices, at most 
one vertex per edge; wherein one vertex along one edge of the first sub-volume may be derived based on a sub-set of the set of points; wherein the sub-set of points may be derived based on points of the set of points having a distance lower than a threshold from the edge; wherein a position of one vertex along one edge of the first sub-volume may be derived based on projection of points of the sub-set of points on the edge; wherein projection of the points of the sub-set of points on the edge may define positions of the projected points of the sub-set of points on the edge; wherein the position of the vertex along the edge may be derived based on an average or median value of positions of the projected points of the sub-set on the edge; wherein the third information may indicate a position of a vertex on an edge of the second sub-volume and the position of the vertex is entropy encoded by OBUF or dynamic OBUF based on the second information indicating a position of at least one vertex along at least one edge of the first sub-volume; wherein the fourth information may indicate vertex information of a vertex along an edge of the third sub-volume and the vertex information is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information; wherein the fourth information may indicate a presence flag of a vertex on an edge of the third sub-volume and the presence flag is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information; wherein the fourth information may indicate a position of a vertex on an edge of the third sub-volume and the position of the vertex is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. 
A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to encode the point cloud frame. A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
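As a non-limiting illustration, the derivation of a vertex along an edge of a sub-volume from a set of reconstructed points (selecting points closer to the edge than a threshold, projecting them on the edge, and taking an average or median of the projected positions) may be sketched as follows. The function name `vertex_on_edge`, its parameters, and the normalization of the position to the range 0 to 1 along the edge are assumptions made for this example only.

```python
from statistics import mean, median


def vertex_on_edge(points, edge_start, edge_end, threshold, use_median=False):
    """Return a vertex position in [0, 1] along the edge, or None if no point is near."""
    direction = [edge_end[i] - edge_start[i] for i in range(3)]
    length_sq = sum(c * c for c in direction)
    projected = []
    for p in points:
        rel = [p[i] - edge_start[i] for i in range(3)]
        # Scalar projection of the point on the edge, clamped to the edge extent.
        t = sum(rel[i] * direction[i] for i in range(3)) / length_sq
        t = min(1.0, max(0.0, t))
        closest = [edge_start[i] + t * direction[i] for i in range(3)]
        dist_sq = sum((p[i] - closest[i]) ** 2 for i in range(3))
        # Keep only points whose distance from the edge is lower than the threshold.
        if dist_sq < threshold ** 2:
            projected.append(t)
    if not projected:
        return None  # corresponds to a presence flag indicating no vertex
    return median(projected) if use_median else mean(projected)
```

In this sketch, a return value of `None` may be read as a presence flag equal to zero for the edge, and any other return value as the position of the single vertex on that edge. Quantization of the position is omitted.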
A computing device may perform a method comprising multiple operations. The computing device may comprise an encoder. The computing device may determine first information associated with a first geometry model representing a first portion of a point cloud geometry contained in a first sub-volume of the point cloud geometry; reconstruct the first portion of the point cloud geometry using the first information; determine, based on a second geometry model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the second geometry model representing the reconstructed first portion, wherein the first geometry model is different from the second geometry model; and encode, based on the second information, third information associated with the second geometry model representing the second portion of the point cloud geometry. The computing device may encode, in a bitstream, a first indication indicating the second geometry model, wherein: the first indication may indicate at least one of: a tree-based geometry representation model associated with the second sub-volume; or a triangle-based geometry representation model associated with the second sub-volume; and the first indication is signaled in the bitstream for at least one sub-volume containing a portion of the point cloud geometry; wherein: the third information may indicate vertex information of at least one vertex along an edge of the second sub-volume; and the vertex information may be encoded based on the second information indicating vertex information of at least one vertex along at least one edge of the first sub-volume; wherein: the third information may indicate a presence flag of at least one vertex on an edge of the second sub-volume; and the presence flag may be encoded based on the second information indicating a presence flag of at least one vertex along at least one edge of the first sub-volume. 
The computing device may determine fourth information associated with the second geometry model representing a third portion of a point cloud geometry contained in a third sub-volume neighboring the second sub-volume; and wherein the encoding the third information may be based on the second and fourth information; wherein the encoding the third information may further comprise: binarizing the third information; and encoding, based on a contextual information derived from the second information, the binarized third information; wherein encoding the third information may comprise entropy encoding in a bitstream, by a binary entropy coder; wherein the entropy encoding may use probabilities based on the contextual information; wherein the binary entropy coder may be a Context Adaptive Binary Arithmetic Coder in which the contexts are selected based on the contextual information; wherein the binary entropy coder may be Optimal Binary Coders with Update on the Fly (OBUF) or dynamic OBUF based on the contextual information; wherein the third information may indicate information related to the leaf nodes of the local space-partitioning tree; wherein the local space-partitioning tree may be an occupancy tree comprising occupied leaf nodes containing at least one point of the set of points; wherein the minimum size may be signaled in a bitstream; wherein the stopping condition may be fulfilled when a maximum depth is reached; wherein the maximum depth may be signaled in the bitstream; wherein the center point of each occupied leaf node of the space-partitioning tree may be a point of the set of points if the space-partitioning is a space-partitioning tree; wherein the second information may include vertex information of at least one vertex along at least one edge of the triangle vertices, at most one vertex per edge; wherein one vertex along one edge of the first sub-volume may be derived based on a sub-set of the set of points; wherein the sub-set of points may be derived 
based on points of the set of points having a distance lower than a threshold from the edge; wherein a position of one vertex along one edge of the first sub-volume may be derived based on projection of points of the sub-set of points on the edge; wherein projection of the points of the sub-set of points on the edge may define positions of the projected points of the sub-set of points on the edge; wherein the position of the vertex along the edge may be derived based on an average or median value of positions of the projected points of the sub-set on the edge; wherein the third information may indicate a position of a vertex on an edge of the second sub-volume and the position of the vertex is entropy encoded by OBUF or dynamic OBUF based on the second information indicating a position of at least one vertex along at least one edge of the first sub-volume; wherein the fourth information may indicate vertex information of a vertex along an edge of the third sub-volume and the vertex information is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information; wherein the fourth information may indicate a presence flag of a vertex on an edge of the third sub-volume and the presence flag is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information; wherein the fourth information may indicate a position of a vertex on an edge of the third sub-volume and the position of the vertex is entropy encoded by OBUF or dynamic OBUF based on the second and fourth information. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to decode the point cloud frame. 
A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
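As a non-limiting illustration, the benefit of coding binarized information with probabilities selected by contextual information may be sketched as follows. The sketch is not an implementation of a Context Adaptive Binary Arithmetic Coder, of OBUF, or of dynamic OBUF; it assumes a simple count-based probability estimate per context and computes only the ideal code length, in bits, that an arithmetic coder driven by those probabilities might approach. The names `AdaptiveBinaryModel` and `estimated_bits` are hypothetical.

```python
import math


class AdaptiveBinaryModel:
    """Per-context counts of zeros and ones giving an adaptive probability estimate."""

    def __init__(self):
        self.counts = {}  # context -> [count of 0s, count of 1s]

    def probability(self, context, bit):
        # Assumed initialization: one virtual observation of each symbol.
        c = self.counts.setdefault(context, [1, 1])
        return c[bit] / (c[0] + c[1])

    def update(self, context, bit):
        self.counts.setdefault(context, [1, 1])[bit] += 1


def estimated_bits(flags, contexts):
    """Ideal code length of `flags` when each flag uses the model of its context."""
    model = AdaptiveBinaryModel()
    total = 0.0
    for bit, ctx in zip(flags, contexts):
        total += -math.log2(model.probability(ctx, bit))
        model.update(ctx, bit)  # the decoder can mirror this update after each bit
    return total
```

For example, if the presence flags of vertices on edges of the second sub-volume tend to match the presence flags derived for the neighboring first sub-volume, using the neighboring flag as the context may give a shorter estimated code length than using a single context for all flags.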
A computing device may perform a method comprising multiple operations. The computing device may comprise an encoder. The computing device may obtain first information associated with a tree-based geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume of a volume encompassing the point cloud geometry; reconstruct the first portion of the point cloud geometry using the first information; generate, based on a triangle-based geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the triangle-based geometry representation model representing the reconstructed first portion; and encode, in the bitstream and based on the second information, third information associated with the triangle-based geometry representation model representing the second portion of the point cloud geometry. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to decode the point cloud frame. A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
A computing device may perform a method comprising multiple operations. The computing device may comprise a decoder. The computing device may obtain first information associated with a tree-based geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume of a volume encompassing the point cloud geometry; reconstruct the first portion of the point cloud geometry using the first information; generate, based on a triangle-based geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the triangle-based geometry representation model representing the reconstructed first portion; and decode, from the bitstream and based on the second information, third information associated with the triangle-based geometry representation model representing the second portion of the point cloud geometry. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to encode the point cloud frame. A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
A computing device may perform a method comprising multiple operations. The computing device may comprise an encoder. The computing device may obtain first information associated with a triangle-based geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume of a volume encompassing the point cloud geometry; reconstruct the first portion of the point cloud geometry using the first information; generate, based on a tree-based geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the tree-based geometry representation model representing the reconstructed first portion; and encode, in the bitstream and based on the second information, third information associated with the tree-based geometry representation model representing the second portion of the point cloud geometry. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to decode the point cloud frame. A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
A computing device may perform a method comprising multiple operations. The computing device may comprise a decoder. The computing device may obtain first information associated with a triangle-based geometry representation model representing a first portion of a point cloud geometry contained in a first sub-volume of a volume encompassing the point cloud geometry; reconstruct the first portion of the point cloud geometry using the first information; generate, based on a tree-based geometry representation model representing a second portion of the point cloud geometry contained in a second sub-volume neighboring the first sub-volume, second information associated with the tree-based geometry representation model representing the reconstructed first portion; and decode, from the bitstream and based on the second information, third information associated with the tree-based geometry representation model representing the second portion of the point cloud geometry. The computing device may comprise one or more processors and memory, storing instructions that, when executed by the one or more processors, perform the method described herein. A system may comprise the computing device configured to perform the described method, additional operations, and/or include additional elements; and a second computing device configured to encode the point cloud frame. A computer-readable medium may store instructions that, when executed, cause performance of the described method, additional operations, and/or include additional elements.
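As a non-limiting illustration, reconstructing a portion of the point cloud geometry from a triangle-based geometry representation model by voxelizing a triangle may be sketched as follows. The sketch assumes barycentric sampling of the triangle followed by rounding to the nearest integer position; the sampling density is an assumption and is not guaranteed to produce a gap-free voxelization for every triangle. The function name `voxelize_triangle` and the parameter `steps` are hypothetical.

```python
import math


def voxelize_triangle(a, b, c, steps=None):
    """Return a set of integer voxel positions sampled on triangle (a, b, c)."""
    if steps is None:
        # Assumed density: about two samples per unit of the longest axis extent.
        extent = max(abs(q[i] - p[i])
                     for p, q in ((a, b), (b, c), (c, a))
                     for i in range(3))
        steps = max(1, 2 * math.ceil(extent))
    voxels = set()
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            # Barycentric weights (u, v, w) with u + v + w = 1.
            u, v = i / steps, j / steps
            w = 1.0 - u - v
            voxels.add(tuple(round(u * a[k] + v * b[k] + w * c[k])
                             for k in range(3)))
    return voxels
```

The resulting set of points may then be used as the input to a recursive splitting of the first sub-volume, so that the reconstructed first portion is expressed with the same tree-based geometry representation model as the neighboring second sub-volume.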
One or more examples herein may be described as a process which may be depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, and/or a block diagram. Although a flowchart may describe operations as a sequential process, one or more of the operations may be performed in parallel or concurrently. The order of the operations shown may be re-arranged. A process may be terminated when its operations are completed, but could have additional steps not shown in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. If a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
Operations described herein may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks. Features of the disclosure may be implemented in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) and gate arrays. Implementation of a hardware state machine to perform the functions described herein will also be apparent to persons skilled in the art.
One or more features described herein may be implemented in a computer-usable data and/or computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other data processing device. The computer executable instructions may be stored on one or more computer readable media such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. The functionality of the program modules may be combined or distributed as desired. The functionality may be implemented in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like. Particular data structures may be used to more effectively implement one or more features described herein, and such data structures are contemplated within the scope of computer executable instructions and computer-usable data described herein. Computer-readable medium may comprise, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, memory or memory devices. 
A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.
A non-transitory tangible computer readable media may comprise instructions executable by one or more processors configured to cause operations described herein. An article of manufacture may comprise a non-transitory tangible computer readable machine-accessible medium having instructions encoded thereon for enabling programmable hardware to cause a device (e.g., an encoder, a decoder, a transmitter, a receiver, and the like) to allow operations described herein. The device, or one or more devices such as in a system, may include one or more processors, memory, interfaces, and/or the like.
Communications described herein may be determined, generated, sent, and/or received using any quantity of messages, information elements, fields, parameters, values, indications, information, bits, and/or the like. While one or more examples may be described herein using any of the terms/phrases message, information element, field, parameter, value, indication, information, bit(s), and/or the like, one skilled in the art understands that such communications may be performed using any one or more of these terms, including other such terms. For example, one or more parameters, fields, and/or information elements (IEs), may comprise one or more information objects, values, and/or any other information. An information object may comprise one or more other objects. At least some (or all) parameters, fields, IEs, and/or the like may be used and can be interchangeable depending on the context. If a meaning or definition is given, such meaning or definition controls.
One or more elements in examples described herein may be implemented as modules. A module may be an element that performs a defined function and/or that has a defined interface to other elements. The modules may be implemented in hardware, software in combination with hardware, firmware, wetware (e.g., hardware with a biological element) or a combination thereof, all of which may be behaviorally equivalent. For example, modules may be implemented as a software routine written in a computer language configured to be executed by a hardware machine (such as C, C++, Fortran, Java, Basic, Matlab or the like) or a modeling/simulation program such as Simulink, Stateflow, GNU Octave, or LabVIEW MathScript. Additionally or alternatively, it may be possible to implement modules using physical hardware that incorporates discrete or programmable analog, digital and/or quantum hardware. Examples of programmable hardware may comprise: computers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs); field programmable gate arrays (FPGAs); and/or complex programmable logic devices (CPLDs). Computers, microcontrollers and/or microprocessors may be programmed using languages such as assembly, C, C++ or the like. FPGAs, ASICs and CPLDs are often programmed using hardware description languages (HDL), such as VHSIC hardware description language (VHDL) or Verilog, which may configure connections between internal hardware modules with lesser functionality on a programmable device. The above-mentioned technologies may be used in combination to achieve the result of a functional module.
One or more of the operations described herein may be conditional. For example, one or more operations may be performed if certain criteria are met, such as in a computing device, a communication device, an encoder, a decoder, a network, a combination of the above, and/or the like. Example criteria may be based on one or more conditions such as device configurations, traffic load, initial system set up, packet sizes, traffic characteristics, a combination of the above, and/or the like. If the one or more criteria are met, various examples may be used. It may be possible to implement any portion of the examples described herein in any order and based on any condition.
Although examples are described above, features and/or steps of those examples may be combined, divided, omitted, rearranged, revised, and/or augmented in any desired manner. Various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this description, though not expressly stated herein, and are intended to be within the spirit and scope of the descriptions herein. Accordingly, the foregoing description is by way of example only, and is not limiting.
This application claims the benefit of U.S. Provisional Application No. 63/618,118, filed on Jan. 5, 2024. The above-referenced application is hereby incorporated by reference in its entirety.
| Number | Date | Country |
|---|---|---|
| 63618118 | Jan 2024 | US |