The examples and non-limiting embodiments relate generally to volumetric video coding, and more particularly, to hierarchical V3C patch remeshing for dynamic mesh coding.
It is known to perform encoding and decoding of video using an encoder and a decoder.
In accordance with an aspect, an apparatus includes: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; subsample a geometry component of a patch of the three-dimensional object at occupied positions, based on the sampling rate for the at least one layer; define respective search windows around the respective occupied positions; select respective salient points relative to the respective occupied positions within the respective search windows; triangulate the salient points to approximate a shape of the three-dimensional object; detect zero or more triangles that overlap with at least one unoccupied pixel; split the zero or more triangles that overlap with at least one unoccupied pixel until no triangle overlaps with the unoccupied pixels; and add zero or more additional triangles close to a border of the three-dimensional object to generate a resulting mesh for the at least one layer.
In accordance with an aspect, an apparatus includes: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: determine scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; and transmit the scalability information to a decoder; wherein the sampling rate is configured to be used with the decoder to subsample a geometry component of a patch of the three-dimensional object at occupied positions; wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance; wherein the scalability information is configured to be used with the decoder to reconstruct a mesh at different operating points to approximate a shape of the three-dimensional object.
In accordance with an aspect, an apparatus includes: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; generate a mesh using a depth patch triangulation method using the scalability information to approximate a shape of the three-dimensional object; evaluate a quality of a depth patch triangulation compared to a reconstructed three-dimensional object reconstructed without triangulated depth patches; and iterate until a reconstructed three-dimensional object using the depth patch triangulation method reaches an expected improved quality.
The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
The examples described herein relate to the encoding, signaling and rendering of volumetric video that is based on mesh coding. The examples described herein focus on methods for improving the quality of reconstructed mesh surfaces. In particular, the examples described herein relate to methods that improve the quality of decoded mesh textures and geometry by using a hierarchical representation of the mesh, which in turn increases the compression efficiency of the encoding pipeline.
Volumetric Video Data
Volumetric video data represents a three-dimensional scene or object and can be used as input for AR, VR and MR applications. Such data describes geometry (shape, size, position in 3D space) and respective attributes (e.g. color, opacity, reflectance), plus any possible temporal transformations of the geometry and attributes at given time instances (like frames in 2D video). Volumetric video is either generated from 3D models, i.e. CGI, or captured from real-world scenes using a variety of capture solutions, e.g. multi-camera, laser scan, a combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible. Typical representation formats for such volumetric data are triangle meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, i.e. “frames” as in 2D video, or by other means, e.g. the position of an object as a function of time.
Because volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for AR, VR, or MR applications, especially for providing 6DOF viewing capabilities.
Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is coding this 3D data as a set of textures and a depth map as is the case in the multi-view plus depth framework. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
MPEG Visual Volumetric Video-Based Coding (V3C)
Selected excerpts from the ISO/IEC 23090-5 Visual Volumetric Video-based Coding and Video-based Point Cloud Compression 2nd Edition standard are referred to herein.
Visual volumetric video, a sequence of visual volumetric frames, if uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.
The V3C specification enables the encoding and decoding processes of a variety of volumetric media by using video and image coding technologies. This is achieved through first a conversion of such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g. texture or material information, of such 3D data. An example is shown in
As further shown in
As shown in
Additional information that allows associating all these subcomponents and enables the inverse reconstruction, from a 2D representation back to a 3D representation is also included in a special component, referred to herein as the atlas. An atlas consists of multiple elements, namely patches. Each patch identifies a region in all available 2D components and contains information necessary to perform the appropriate inverse projection of this region back to the 3D space. The shape of such regions is determined through a 2D bounding box associated with each patch as well as their coding order. The shape of these regions is also further refined after the consideration of the occupancy information.
Atlases are partitioned into patch packing blocks of equal size. Refer for example to block 202 in
Axes orientations are specified for internal operations. For instance, the origin of the atlas coordinates is located on the top-left corner of the atlas frame. For the reconstruction step, an intermediate axes definition for a local 3D patch coordinate system is used. The 3D local patch coordinate system is then converted to the final target 3D coordinate system using appropriate transformation steps.
V3C High Level Syntax
Coded V3C video components are referred to herein as video bitstreams, while an atlas component is referred to as the atlas bitstream. Video bitstreams and atlas bitstreams may be further split into smaller units, referred to herein as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream.
V3C patch information is contained in an atlas bitstream, atlas_sub_bitstream( ), which contains a sequence of NAL units. A NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are contained in NAL units, each of which contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical, except that in the sample stream format specified in Annex D of ISO/IEC 23090-5 each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.
NAL units in an atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units. The former is dedicated to carry patch data, while the latter is dedicated to carry data necessary to properly parse the ACL units or any additional auxiliary data.
In the nal_unit_header( ) syntax, nal_unit_type specifies the type of the RBSP data structure contained in the NAL unit as specified in Table 4 of ISO/IEC 23090-5. nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies. The value of nal_layer_id shall be in the range of 0 to 62, inclusive. The value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (i.e., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.
V3C Extension Mechanisms
While designing the V3C specification it was envisaged that amendments or new editions can be created in the future. In order to ensure that the first implementations of V3C decoders are compatible with any future extension, a number of fields for future extensions to parameter sets were reserved.
For example, the second edition of V3C introduced an extension in VPS related to MIV and the packed video component.
Rendering and Meshes
A polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modeling. The faces usually consist of triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but may also be more generally composed of concave polygons, or even polygons with holes.
With reference to
Polygon meshes are defined by the following elements:
Vertex (402): a position in 3D space defined as (x,y,z) along with other information such as color (r,g,b), normal vector and texture coordinates.
Edge (404): a connection between two vertices.
Face (406): a closed set of edges 404, in which a triangle face has three edges, and a quad face has four edges. A polygon 408 is a coplanar set of faces 406. In systems that support multi-sided faces, polygons and faces are equivalent. Mathematically a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.
Surfaces (410): or smoothing groups, are useful, but not required to group smooth regions.
Groups: some mesh formats contain groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.
Materials: defined to allow different portions of the mesh to use different shaders when rendered.
UV coordinates: most mesh formats also support some form of UV coordinates which are a separate 2D representation of the mesh “unfolded” to show what portion of a 2-dimensional texture map to apply to different polygons of the mesh. It is also possible for meshes to contain other such vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).
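As a non-limiting illustration, the mesh elements described above can be summarized in a minimal data-structure sketch; the class and field names are illustrative and not part of any mesh format:

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    # Position in 3D space, plus optional per-vertex attributes.
    x: float
    y: float
    z: float
    color: tuple = (255, 255, 255)   # (r, g, b)
    uv: tuple = (0.0, 0.0)           # texture coordinates

@dataclass
class Mesh:
    vertices: list = field(default_factory=list)   # list of Vertex
    faces: list = field(default_factory=list)      # index triples (triangle mesh)

    def edges(self):
        # Derive the undirected edge set from the faces.
        es = set()
        for a, b, c in self.faces:
            for u, v in ((a, b), (b, c), (c, a)):
                es.add((min(u, v), max(u, v)))
        return es

# A single triangle face: 3 vertices, 3 edges, 1 face.
tri = Mesh(vertices=[Vertex(0, 0, 0), Vertex(1, 0, 0), Vertex(0, 1, 0)],
           faces=[(0, 1, 2)])
```

In this sketch the edges are derived from the faces rather than stored, which mirrors the definition above of a face as a closed set of edges.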
V-PCC mesh coding extension (MPEG M49588)
In the encoder extension 500, the input mesh data 502 is demultiplexed with demultiplexer 504 into vertex coordinates+attributes 506 and vertex connectivity 508. The vertex coordinates+attributes data 506 is coded using MPEG-I V-PCC (such as with MPEG-I VPCC encoder 510), whereas the vertex connectivity data 508 is coded (using vertex connectivity encoder 516) as auxiliary data 518. Both of these (encoded vertex coordinates and vertex attributes 517 and auxiliary data 518) are multiplexed using multiplexer 520 to create the final compressed output bitstream 522. Vertex ordering 514 is carried out on the reconstructed vertex coordinates 512 at the output of MPEG-I V-PCC 510 to reorder the vertices for optimal vertex connectivity encoding 516.
Based on the examples described herein, as shown in
As shown in
Based on the examples described herein, as shown in
Generic Mesh Compression
Mesh data may be compressed directly without projecting it into 2D planes, as is done in V-PCC based mesh coding. In fact, the anchor for the V-PCC mesh compression call for proposals (CfP) utilizes off-the-shelf mesh compression technology, Draco (https://google.github.io/draco/), for compressing mesh data excluding textures. Draco is used to compress vertex positions in 3D, connectivity data (faces), as well as UV coordinates. Additional per-vertex attributes may also be compressed using Draco. The actual UV texture may be compressed using traditional video compression technologies, such as H.265 or H.264.
Draco uses the edgebreaker algorithm at its core to compress 3D mesh information. Draco offers a good balance between simplicity and efficiency, and is part of Khronos endorsed extensions for the glTF specification. The main idea of the algorithm is to traverse mesh triangles in a deterministic way so that each new triangle is encoded next to an already encoded triangle. This enables prediction of vertex specific information from the previously encoded data by simply adding delta to the previous data. Edgebreaker utilizes symbols to signal how each new triangle is connected to the previously encoded part of the mesh. Connecting triangles in such a way results on average in 1 to 2 bits per triangle when combined with existing binary encoding techniques.
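The delta-prediction idea described above can be sketched in isolation. The sketch below uses a simplified previous-vertex predictor over quantized positions; Draco's actual edgebreaker traversal and its predictors are more elaborate, so this is an illustration of the principle only:

```python
def delta_encode(positions):
    """Encode quantized vertex positions as deltas from the previous vertex."""
    deltas, prev = [], (0, 0, 0)
    for p in positions:
        deltas.append(tuple(c - q for c, q in zip(p, prev)))
        prev = p
    return deltas

def delta_decode(deltas):
    """Invert delta_encode by accumulating the deltas."""
    positions, prev = [], (0, 0, 0)
    for d in deltas:
        prev = tuple(c + q for c, q in zip(d, prev))
        positions.append(prev)
    return positions

# Deltas between neighboring vertices are small, so they compress
# well with entropy coding once the traversal visits nearby vertices.
verts = [(10, 10, 0), (11, 10, 0), (11, 11, 1)]
assert delta_decode(delta_encode(verts)) == verts
```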
MPEG 3DG (ISO/IEC SC29 WG7) has issued a call for proposals (CfP) on the integration of mesh compression into the V3C standard (ISO/IEC 23090-5). During the work on a CfP response owned by the Applicant of the instant disclosure, the Applicant identified that straightforward regular meshing of V3C patches from depth maps (geometry component) generates meshes with too many faces 752 and vertices 753 compared to the original encoded mesh, sometimes up to ten times the number of faces 752; an example is shown in
Therefore, there is a need for a remeshing approach that keeps the quality of the decoded depth map but outputs a connectivity that is closer to the original mesh, i.e. reduces the number of vertices and faces.
Simple subsampling of the patch depth map (geometry component) may lead to poor geometry quality as represented in
The objective is to construct a triangulation of the depth map (geometry component) with i) a minimal number of triangles, ii) triangles with non-degenerate shapes, with a consistently oriented normal, iii) triangles that do not cover unoccupied depth map (geometry component) pixels, iv) vertices that capture well the salient points of the depth map (local maxima and minima), and v) patch borders that are reconstructed with a quality that can be parameterized by the encoder.
Furthermore, the triangulation should support a level-of-detail and/or multi-resolution that is adaptive to the depth map (geometry component), and the triangulation process should be fast and parallelizable.
Accordingly,
Currently signaling related to the preferred triangulation method and related parameters is missing. Thus, signaling related to the preferred triangulation method and related parameters are described herein. The methods described herein describe a re-meshing algorithm, as well as leveraging a layered scalability approach for the re-meshing algorithm.
A lot of research has been done by the computer graphics community in the area of remeshing. For example, mesh simplification (also known as decimation), such as simplification based on Quadric Error Metrics (QEMs) to e.g. simplify surfaces with color and texture, or progressive meshes, has been introduced. These approaches, and most of those based on them, rely on a priority queue of edges (or vertices) to be collapsed, with costs that depend on some metric computed from the edge or vertex neighborhood. The process is iterative, and costs need to be updated for all edges/vertices for which a neighboring edge or vertex has been collapsed or decimated.
Furthermore, despite the fact that some of these approaches, such as those based on QEMs, consider the attributes and not only the geometry for the cost estimations, the computational cost increases for limited quality gains, for example when using a quadric metric to simplify meshes with appearance attributes.
Wavelet decomposition is another approach to remeshing meshes into a hierarchical representation, including filters such as normal meshes, but the complexity of the approach and the limited compression gains have not enabled these approaches to be used in recent compression frameworks such as the current de facto state-of-the-art, Draco, which is based on an edgebreaker algorithm (e.g. for connectivity compression of triangle meshes) without any hierarchical or simplification tool. There have also been no extensions of the wavelet frameworks to dynamic meshes, but rather a focus on static meshes.
These approaches also do not take into account the nature of V3C patches, that have a depth map (geometry component) with possibly unoccupied pixels and that represent a projection and rasterization of a 3D mesh patch.
Disclosed herein is a method to remesh V3C patches in a way that can be parameterized by the encoder and provided to a decoder in or along V3C bitstream as spatial scalability levels or other types of signaling that are explicitly described hereafter.
Notably the examples described herein allow a mesh to be reconstructed with several resolutions, in the spirit of spatial resolution mesh scalability, so that the decoder can process the decoded V3C patches at different operating points: ranging from coarse meshes at high speed and low memory usage to full resolution meshes at highest quality but with longer execution times and a higher memory footprint. This scalability and flexibility in operating points is desirable for applications where speed and memory consumption are challenging (e.g. XR applications on mobile phones), where a greener, lower power consumption mode is required, or when several meshes need to be decoded and rendered in real time on the same device. The scalability is also exceptionally useful for enabling level-of-detail (LOD) based rendering of meshes at different distances from the viewer, where a higher quality is used for objects closer to the viewer and objects further away are rendered with fewer triangles.
The remeshing algorithm is performed in several hierarchical steps that can all be processed in parallel at a depth map (geometry component) block of pixels or pixel level. The algorithm is designed to preserve the geometry characteristics by sampling salient points, and to guarantee that non-degenerate triangles can be regenerated from the encoded patches.
Furthermore, the algorithm produces regions with coarser sampling when the geometry does not present salient features (planar region or smooth curvature region) and regions with finer sampling close to patch borders and on regions where curvature presents several salient points (e.g. topological salient pixels of the curvature field of the shape such as umbilical points).
The approach is numerically efficient, as each step exhibits a high level of data parallelism, and the approach processes data from coarse to fine granularity at the borders of the patches. When high-resolution patches need to be remeshed, the remeshing process at each step accesses a subset of pixels of the depth map (geometry component) by resampling or by defining small local regions (working groups) for processing; the full-resolution depth map is never processed as a whole at any moment.
The herein described approach can lead to decreased quality compared to fine-to-coarse approaches such as mesh simplification; however, the computational complexity is much lower, with higher data parallelism, and well-shaped triangles are guaranteed by construction, which makes the herein described method a good alternative for devices with limited computational capability.
Accordingly, disclosed herein is a method for performing optimization of the remeshing procedure at the encoder side, deriving values for optimized remeshing parameters at different operating points, providing the encoded mesh as a bitstream along with the optimized parameters and scalability information to a decoder, and reconstructing the mesh from the provided bitstream utilizing the optimized parameters at different operating points.
In the herein described method an encoder calculates a set of subsampling factors and provides them to the decoder as spatial scalability layer information. The spatial scalability layer information is first provided at the sequence level, e.g. through an extension of the ASPS. Each V3C patch can be further amended through V3C patch data unit syntax structure information, when necessary. For example, a V3C patch data unit syntax structure may provide information enabling a decoder to subsample a patch at a more optimal subsampling factor, which may differ along the patch X and Y axes due to the shape of the patch in 2D and in 3D.
The remeshing process used in the herein described method consists of the following items (1-5 immediately following). The syntax elements used in the description of the remeshing process illustrate only one possible implementation of signaling.
1. Initialization
The decoder initializes the layering at the sequence level, e.g. through ASPS information, with: the number of layers, e.g. through asps_num_scalable_layers_minus1, and its sampling rates, e.g. through asps_scalable_layer_sampling_rate[i] for all i from 0 to asps_num_scalable_layers_minus1. This sampling rate indicates that the decoder should sample one geometry component sample every asps_scalable_layer_sampling_rate[i] samples in the X and Y coordinates. This sampling rate can optionally be adapted at the V3C patch level as signaled by the encoder in the V3C bitstream. The sampling rates in X and Y can be specified as different values at the patch level, for example in the case of elongated patches.
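As a non-limiting illustration, the initialization can be sketched as follows. The ScalabilityInfo container is hypothetical; only the meaning of asps_scalable_layer_sampling_rate[i] (one geometry sample kept every N samples in X and in Y) is taken from the description above:

```python
class ScalabilityInfo:
    """Illustrative container mirroring asps_num_scalable_layers_minus1
    and asps_scalable_layer_sampling_rate[i] (not normative syntax)."""
    def __init__(self, sampling_rates):
        self.num_layers = len(sampling_rates)   # num_scalable_layers_minus1 + 1
        self.sampling_rates = sampling_rates    # one rate per layer, coarsest first

    def grid_positions(self, layer, width, height):
        # Sample one geometry component sample every `rate` samples in X and Y.
        rate = self.sampling_rates[layer]
        return [(x, y) for y in range(0, height, rate)
                       for x in range(0, width, rate)]

# Layer 0 is the coarsest layer; layer 2 is full resolution.
info = ScalabilityInfo([8, 4, 1])
coarse = info.grid_positions(0, 16, 16)   # candidate sample grid of layer 0
```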
2. Subsampling
The decoder then produces a coarse mesh corresponding to the lowest scalable layer.
The depth map is subsampled at occupied positions according to the maximum level of subsampling asps_scalable_layer_sampling_rate[0]. This generates the coarsest sampling of the patch geometry component.
Optionally the decoder may sample a patch with a sampling rate that is different from asps_scalable_layer_sampling_rate[0] if the flag pdu_mri_adaptive_sampling_factors_enable_flag is true, using pdu_mri_adaptive_sampling_factor_delta_X and pdu_mri_adaptive_sampling_factor_delta_Y as values to be added to asps_scalable_layer_sampling_rate in the X and Y dimensions, respectively. Such sampling rate deltas can optionally be specified for each layer as pdu_mri_adaptive_sampling_factor_delta_X[i] and pdu_mri_adaptive_sampling_factor_delta_Y[i].
Optionally the patch data unit information can specify that the patch should be discarded at a scalable layer through pdu_mri_skip_patch_at_layer_op[i].
The subsampling may also optionally include filtering. Filtering can include Laplacian or Gaussian pyramids, DWT, low-pass filters, etc., which can avoid aliasing issues in this coarse resampling. Such filters should ideally be occupancy-aware; in other words, the filters should be able to discard depth (geometry) values that are not valid.
The subsampling may also optionally select one of the nearest occupied pixels (this increases the computational cost).
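The subsampling of item 2 above can be sketched as an occupancy-aware pass over the depth map, including the optional snap to a nearby occupied pixel; the function name and the cell-limited snap search are illustrative choices, not normative behavior:

```python
def subsample_occupied(depth, occupancy, rate):
    """Keep one sample every `rate` pixels in X and Y, at occupied positions only.
    If a grid position is unoccupied, optionally snap to the nearest occupied
    pixel inside the sampling cell (at extra computational cost)."""
    h, w = len(depth), len(depth[0])
    samples = []
    for y in range(0, h, rate):
        for x in range(0, w, rate):
            if occupancy[y][x]:
                samples.append((x, y, depth[y][x]))
                continue
            # Snap: search the cell for the closest occupied pixel.
            best = None
            for yy in range(y, min(y + rate, h)):
                for xx in range(x, min(x + rate, w)):
                    if occupancy[yy][xx]:
                        d2 = (xx - x) ** 2 + (yy - y) ** 2
                        if best is None or d2 < best[0]:
                            best = (d2, xx, yy)
            if best is not None:
                _, xx, yy = best
                samples.append((xx, yy, depth[yy][xx]))
    return samples
```

Unoccupied cells with no valid pixel at all simply contribute no sample, so the coarse mesh never references invalid depth values.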
3. Local Optimization: (See
3.1 The decoder detects salient points in a local window and uses them rather than the first regular subsampled pixel positions. Such salient points can be defined in increasing order of computational complexity as local extrema, singularities of the curvature flow, and points on seams of the texture coordinates or edges of the texture pixels.
Local extrema are the local minimum or maximum of the valid depth (geometry) pixels; detecting such pixels is computationally efficient and they enable the method to better capture the range of the depth (geometry) signal.
Singularities of the curvature flow such as umbilical points are defined such that principal curvatures are equal at these points. While they are not discriminative for planes or smooth simple surfaces such as quadrics, umbilical points of shapes that exhibit more variety in curvature are located at the intersection of the shape curvature principal directions vector field separatrices (lines that separate portions of the shape that have homogeneous curvature flow), which are good candidates for adding vertices in a remeshing process.
Points may also be added on seams of the texture coordinates or edges of the texture pixels; adding such points ensures a higher-quality interpolation of the texture signal once the geometry is remeshed. Such detection may not be complex, but the texture component resolution may be higher than that of the geometry component.
Any combination of the aforementioned points may be the salient points.
3.2 The local window is centered on the subsampled pixel, and its size depends on the subsampling factor such that local windows do not overlap and, ideally, a pixel column or row separates them. This ensures well-shaped triangles, i.e. a Poisson disk around vertices by construction.
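As a non-limiting illustration of items 3.1-3.2, the sketch below moves each subsampled position to the most salient valid pixel inside its window, using only the local-extremum criterion (the computationally cheapest of the salient-point definitions above). The window half-size is derived from the sampling factor so that neighboring windows stay disjoint with a separating pixel row or column:

```python
def select_salient(depth, occupancy, cx, cy, rate):
    """Return the (x, y) of the local depth extremum inside a window centered
    at (cx, cy). Half-size rate // 2 - 1 keeps windows of adjacent samples
    disjoint, with at least one separating pixel row/column."""
    half = max(rate // 2 - 1, 0)
    h, w = len(depth), len(depth[0])
    window = [(x, y)
              for y in range(max(cy - half, 0), min(cy + half + 1, h))
              for x in range(max(cx - half, 0), min(cx + half + 1, w))
              if occupancy[y][x]]          # discard invalid depth values
    if not window:
        return None
    mean = sum(depth[y][x] for x, y in window) / len(window)
    # Local extremum: the valid pixel deviating most from the window mean,
    # which captures either a local minimum or a local maximum.
    return max(window, key=lambda p: abs(depth[p[1]][p[0]] - mean))
```

Curvature-flow singularities and texture-seam points from item 3.1 would plug into the same selection step as alternative or additional saliency criteria.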
3.3 Triangulate salient points (see
3.4. Occupancy-based correction: (see
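As a non-limiting illustration, the overlap test underlying the occupancy-based correction of item 3.4 can be sketched as follows, assuming the salient points have already been triangulated (item 3.3, e.g. with a Delaunay triangulation) and using integer pixel coordinates; the function name is illustrative:

```python
def triangle_covers_unoccupied(tri, occupancy):
    """Return True if any unoccupied pixel center lies inside the triangle.
    Such triangles would be split until no triangle overlaps unoccupied pixels."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    h, w = len(occupancy), len(occupancy[0])
    xmin, xmax = int(min(x0, x1, x2)), int(max(x0, x1, x2))
    ymin, ymax = int(min(y0, y1, y2)), int(max(y0, y1, y2))

    def edge(ax, ay, bx, by, px, py):
        # Signed area test: which side of edge (a, b) the point (p) lies on.
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return False  # degenerate triangle, nothing to rasterize
    for py in range(max(ymin, 0), min(ymax + 1, h)):
        for px in range(max(xmin, 0), min(xmax + 1, w)):
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) if area > 0 \
                     else (w0 <= 0 and w1 <= 0 and w2 <= 0)
            if inside and not occupancy[py][px]:
                return True
    return False
```

A correction pass would call this test on every triangle and split (or discard) the offending ones until the test fails for all triangles.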
3.5. Refine contours (see
3.6. Mesh optimization. The decoder optionally flips edges to maximize geometry smoothness, and/or optionally smooths the resulting mesh.
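The optional smoothing in item 3.6 can be sketched as a plain Laplacian step, where each vertex moves toward the centroid of its neighbors; edge flipping is omitted for brevity, and the blending factor alpha is an illustrative parameter, not a signaled one:

```python
def laplacian_smooth(vertices, edges, iterations=1, alpha=0.5):
    """Jacobi-style Laplacian smoothing of vertex positions.
    `edges` is a set of index pairs; alpha blends the old position with the
    neighbor centroid (alpha = 1 moves fully to the centroid)."""
    verts = [list(v) for v in vertices]
    neighbors = {i: [] for i in range(len(verts))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(iterations):
        new = []
        for i, v in enumerate(verts):
            ns = neighbors[i]
            if not ns:
                new.append(v)          # isolated vertex: keep as-is
                continue
            cen = [sum(verts[j][k] for j in ns) / len(ns) for k in range(3)]
            new.append([(1 - alpha) * v[k] + alpha * cen[k] for k in range(3)])
        verts = new
    return verts
```

Each iteration reduces local geometric noise; the number of iterations would correspond to the signaled asps_smoothing_iter in the syntax described later herein.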
4. Extracting other layers
The decoding of other layers involves the same sub-items as the initialization and may operate in two ways: 1) layer per layer, from the base layer to the next layer until the target layer is reached, where the goal is to output as many meshes as there are layers specified by asps_num_scalable_layers_minus1; or 2) directly from the base layer to the target layer, where only the base mesh and the target layer mesh are reconstructed.
When extracting the next layers the subsampling (2), the local optimization (3), especially the refine contours (3.5) and mesh optimization (3.6) are updated as follows (i-iv immediately below).
i) The subsampling operates at the current layer sampling rate on the geometry component, optionally with patch data unit information that signals a delta in the X and/or Y direction. If a geometry component sample is located at the same position, or optionally at a neighboring position, that is already occupied by the sample from the previous layer, then the sample position of the previous layer is kept; otherwise a new sample is added for the current layer. The position of a sample from a previous layer may have been determined through local optimization or occupancy-based correction (steps 3.1 and 3.4 of the previous layer), etc.; it is not based only on the subsampling (step 2) of the previous layer.
ii) The local optimization centers a window on the samples obtained in the previous step, i.e. for a lower layer. If a sample from a previous layer is located in the window, then optionally, this sample is retained, otherwise a new point is selected based on salient point detection.
The following are examples based on the numbering herein:
a mesh with layers 0 and 1;
a mesh with 3 layers 0, 1 and 2;
a mesh with 3 layers 0, 1 and 2 in direct mode, where the target layer is layer 2; and
a mesh with 3 layers 0, 1 and 2 in layer-per-layer mode, where the target layer is layer 2.
iii) In the layer-per-layer mode, the amount of local optimization is reduced due to the presence of samples in the local search windows, while in the direct mode, the local optimization is executed more often especially if the distance between layers in the hierarchy is large.
iv) In case the layer-per-layer mode is selected, 3.5 Refine contours and 3.6 Mesh optimization are applied layer per layer, otherwise in the direct mode, 3.5 Refine contours and 3.6 Mesh optimization are only applied for the target layer.
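As a non-limiting illustration of items i) and ii) above, the reuse of previous-layer samples when extracting a finer layer can be sketched as follows; the helper and its window parameter are illustrative:

```python
def extend_layer(prev_samples, candidate_positions, window):
    """Build the sample set of the next (finer) layer.
    prev_samples: (x, y) positions kept from the lower layer.
    candidate_positions: (x, y) grid positions at the current layer rate.
    window: half-size of the search window around each candidate."""
    kept = set(prev_samples)
    out = list(prev_samples)
    for cx, cy in candidate_positions:
        # If a previous-layer sample already lies in the window, keep it
        # and skip the candidate; otherwise add a new sample for this layer.
        near = any(abs(px - cx) <= window and abs(py - cy) <= window
                   for px, py in kept)
        if not near:
            out.append((cx, cy))
    return out
```

In layer-per-layer mode this helper would run once per layer transition, while in direct mode it would run once from the base layer grid to the target layer grid.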
5. Finalization. For each decoded layer, the texture component is mapped based on the decoded vertices' texture coordinates. The texture component is also downsampled based on the asps_scalable_layer_sampling_rate[i] information, optionally refined at the patch level.
In one embodiment the signaling information related to the method is provided as extensions to atlas_sequence_parameter_set_rbsp( ) and the patch_data_unit( ) syntax elements defined in ISO/IEC 23090-5.
asps_lodhierachical_enable_flag equal to 1 indicates that the hierarchical subsampling syntax elements are present.
asps_num_scalable_layers_minus1 indicates the number of scalable_layers that can be extracted according to the algorithm presented herein.
asps_scalable_layer_sampling_rate[i] indicates the sampling rate of the scalable layer with index i. Scalable layer index i shall be in the range of 0 to asps_num_scalable_layers_minus1, inclusive.
asps_remeshing_type indicates the type of subsampling filtering. asps_remeshing_type equal to 0 signals Gaussian pyramids, equal to 1 signals DWT multiresolution, and equal to 2 signals an FIR filter. Values from 3 to 255 can specify other types.
asps_max_edge_flips indicates the maximal number of allowed edge flips.
asps_smoothing_iter indicates the number of Laplacian smoothing iterations.
pdu_mri_skip_patch_at_layer_op[tileID][patchIdx] indicates the operating point from which the patch is used for reconstruction. The operating point is the scalability layer index.
In one embodiment the patches belonging to the same operating point may be grouped in one tile, so as to expose the information in sequence or frame parameter sets for easy bitstream pruning.
pdu_remeshing_local_window_size_x [tileID][p] indicates the local optimization window size in the x-direction centered at a point in a patch with index p of the current atlas tile, with tile ID equal to tileID, prior to its addition to the patch coordinate TilePatch3dOffsetV[tileID][p].
pdu_remeshing_local_window_size_y [tileID][p] indicates the local optimization window size in the y-direction centered at a point in a patch with index p of the current atlas tile, with tile ID equal to tileID, prior to its addition to the patch coordinate TilePatch3dOffsetV[tileID][p].
pdu_mri_adaptive_sampling_factors_enable_flag[tileID][p] equal to 1 for tile tileID and patch index p indicates that pdu_mri_adaptive_sampling_factor_delta_X and pdu_mri_adaptive_sampling_factor_delta_Y should be interpreted according to the algorithm presented herein.
pdu_mri_adaptive_sampling_factor_delta_X[tileID][p] for tile tileID and patch index p, indicates the delta to be applied to the operating sampling rate asps_scalable_layer_sampling_rate[i] at layer i for the geometry component X dimension. The resulting sampling factor is obtained as pdu_mri_adaptive_sampling_factor_delta_X[tileID][p] + asps_scalable_layer_sampling_rate[i].
pdu_mri_adaptive_sampling_factor_delta_Y[tileID][p] for tile tileID and patch index p, indicates the delta to be applied to the operating sampling rate asps_scalable_layer_sampling_rate[i] at layer i for the geometry component Y dimension. The resulting sampling factor is obtained as pdu_mri_adaptive_sampling_factor_delta_Y[tileID][p] + asps_scalable_layer_sampling_rate[i].
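The per-patch sampling factor derivation above is a simple addition of the signaled delta to the layer's operating rate, gated by the adaptive sampling flag. The sketch below restates that arithmetic; the function name and the treatment of a disabled flag as zero delta are assumptions for illustration.

```python
from typing import Tuple

def effective_sampling_factors(layer_rate: int,
                               adaptive_enabled: bool,
                               delta_x: int = 0,
                               delta_y: int = 0) -> Tuple[int, int]:
    """Per-patch sampling factors for the X and Y geometry dimensions:
    the layer's asps_scalable_layer_sampling_rate[i] plus the signaled
    pdu_mri_adaptive_sampling_factor_delta_* values, when enabled."""
    if not adaptive_enabled:
        return layer_rate, layer_rate
    return layer_rate + delta_x, layer_rate + delta_y
```

For example, a layer rate of 8 with deltas (+2, -1) yields per-patch factors (10, 7) when the flag is set, and (8, 8) otherwise.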
In one embodiment the signaling information related to the method is provided through two SEI messages, one corresponding to sequence level information containing the same information as the asps_mesh_extension( ) syntax structure, and one corresponding to patch level information containing the same information as the pdu_mesh_extension(tileID, patchIdx) syntax structure for each tile and patch for which this SEI applies.
Ideas herein are to be contributed to standardization in ISO/IEC SC 29 WG7 as part of a response to the CfP on mesh coding.
Structures and concepts described herein may be included as normative text in a standard.
The apparatus 900 may be a remote, virtual or cloud apparatus. The apparatus 900 may be either a writer or a reader (e.g. parser), or both a writer and a reader (e.g. parser). The apparatus 900 may be either a coder or a decoder, or both a coder and a decoder (codec). The apparatus 900 may be a user equipment (UE), a head mounted display (HMD), or any other fixed or mobile device.
The memory 904 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The memory 904 may comprise a database for storing data. Interface 912 enables data communication between the various items of apparatus 900, as shown in
As used herein, a level of detail may refer to a level of coarseness.
The following examples 1-29 are described herein.
Example 1: An apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; subsample a geometry component of a patch of the three-dimensional object at occupied positions, based on the sampling rate for the at least one layer; define respective search windows around the respective occupied positions; select respective salient points relative to the respective occupied positions within the respective search windows; triangulate the salient points to approximate a shape of a three-dimensional object; detect zero or more triangles that overlap with at least one unoccupied pixel; split the zero or more triangles that overlap with at least one unoccupied pixel until no triangle overlaps with the unoccupied pixels; and add zero or more additional triangles close to a border of the three-dimensional object to generate a resulting mesh for the at least one layer.
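The subsampling and salient-point selection steps of Example 1 can be sketched as follows. The sketch subsamples the geometry grid at occupied positions and, within a search window around each, picks the most salient occupied pixel. Using depth-gradient magnitude as the saliency criterion is an assumption for illustration, as are all names; the triangulation, splitting, and border steps are not shown.

```python
import numpy as np

def select_salient_points(depth: np.ndarray,
                          occupancy: np.ndarray,
                          rate: int,
                          win: int = 1) -> list:
    """Subsample the geometry component at occupied grid positions
    spaced by `rate`, then pick the occupied pixel with maximal
    depth-gradient magnitude (assumed saliency) inside a
    (2*win+1) x (2*win+1) search window around each position."""
    gy, gx = np.gradient(depth.astype(float))
    saliency = np.hypot(gx, gy)
    h, w = depth.shape
    points = []
    for y in range(0, h, rate):
        for x in range(0, w, rate):
            if not occupancy[y, x]:
                continue  # subsample only at occupied positions
            y0, y1 = max(0, y - win), min(h, y + win + 1)
            x0, x1 = max(0, x - win), min(w, x + win + 1)
            win_occ = occupancy[y0:y1, x0:x1]
            # Mask out unoccupied pixels so they are never selected.
            win_sal = np.where(win_occ, saliency[y0:y1, x0:x1], -np.inf)
            dy, dx = np.unravel_index(np.argmax(win_sal), win_sal.shape)
            points.append((y0 + dy, x0 + dx))
    return points
```

The selected points would then be triangulated, triangles overlapping unoccupied pixels split, and border triangles added, as recited in Example 1.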
Example 2: The apparatus of example 1, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: initialize, through atlas sequence parameter set information, the at least one layer and the at least one sampling rate.
Example 3: The apparatus of any of examples 1 to 2, wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance.
Example 4: The apparatus of any of examples 1 to 3, wherein the at least one sampling rate differs for different layers of the at least one layer.
Example 5: The apparatus of any of examples 1 to 4, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive adaptive sampling signaling comprising a flag indicating whether adaptive sampling is enabled; and add a delta to the at least one sampling rate, in response to the adaptive sampling being enabled.
Example 6: The apparatus of any of examples 1 to 5, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling indicating that the patch should be discarded at the at least one layer.
Example 7: The apparatus of any of examples 1 to 6, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: maintain a first sample position of a previous layer for a current layer, in response to a second sample of the current layer being located at a common or neighboring position of a first sample of the previous layer; and add a second sample position for the current layer, in response to the second sample of the current layer not being located at the common or neighboring position of the first sample of the previous layer.
Example 8: The apparatus of any of examples 1 to 7, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: update the subsampling of the geometry component for a current layer; and center a window of the respective search windows around samples generated based on the updated subsampling; retain a first sample from a previous layer, in response to the first sample being located within the centered window; and detect a new salient point, in response to the first sample not being located within the centered window.
Example 9: The apparatus of any of examples 1 to 8, wherein the defining of the respective search windows, and the selecting of the respective salient points are performed more frequently, in response to a distance between layers being greater than a threshold.
Example 10: The apparatus of any of examples 1 to 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: map a texture component based on respective texture coordinates for a set of respective vertices; and downsample the texture component, based on the at least one layer.
Example 11: The apparatus of any of examples 1 to 10, wherein: the scalability information is received as an extension to an atlas sequence parameter set raw byte sequence payload; or the scalability information is received as an extension to a patch data unit.
Example 12: The apparatus of any of examples 1 to 11, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: group patches belonging to a common operating point in a tile; and provide information related to a sequence of at least one frame parameter set for bitstream pruning.
Example 13: An apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: determine scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; and transmit the scalability information to a decoder; wherein the sampling rate is configured to be used with the decoder to subsample a geometry component of a patch of the three-dimensional object at occupied positions; wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance; wherein the scalability information is configured to be used with the decoder to reconstruct a mesh at different operating points to approximate a shape of the three-dimensional object.
Example 14: The apparatus of example 13, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: transmit adaptive sampling signaling comprising a flag indicating whether adaptive sampling is enabled; wherein the adaptive sampling signaling is configured to be used with the decoder to add a delta to the at least one sampling rate, in response to the adaptive sampling being enabled.
Example 15: The apparatus of any of examples 13 to 14, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: transmit signaling indicating that the patch should be discarded at the at least one layer.
Example 16: The apparatus of any of examples 13 to 15, wherein: the scalability information is transmitted as an extension to an atlas sequence parameter set raw byte sequence payload; or the scalability information is transmitted as an extension to a patch data unit.
Example 17: An apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; generate a mesh using a depth patch triangulation method using the scalability information to approximate a shape of the three-dimensional object; evaluate a quality of a depth patch triangulation compared to a reconstructed three-dimensional object reconstructed without triangulated depth patches; and iterate until a reconstructed three-dimensional object using the depth patch triangulation method reaches an expected improved quality.
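The evaluate-and-iterate loop of Example 17 can be sketched abstractly as below. All callables, the quality metric, and the termination policy are hypothetical placeholders; the sketch only shows the control flow of comparing a depth-patch-triangulated reconstruction against a reference reconstructed without triangulated depth patches and iterating toward a target quality.

```python
def refine_with_triangulation(reconstruct, triangulate, quality, target,
                              max_iter=10):
    """Iterate depth-patch triangulation until the reconstruction
    quality, relative to a reference without triangulated depth
    patches, reaches `target` (all callables are placeholders)."""
    reference = reconstruct(mesh=None)   # no triangulated depth patches
    candidate = reference
    for step in range(max_iter):
        mesh = triangulate(step)                 # refine the triangulation
        candidate = reconstruct(mesh=mesh)       # rebuild with the new mesh
        if quality(candidate, reference) >= target:
            break
    return candidate
```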
Example 18: The apparatus of example 17, wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance.
Example 19: The apparatus of any of examples 17 to 18, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive adaptive sampling signaling comprising a flag indicating whether adaptive sampling is enabled; and add a delta to the at least one sampling rate, in response to the adaptive sampling being enabled.
Example 20: The apparatus of any of examples 17 to 19, wherein: the scalability information is received as an extension to an atlas sequence parameter set raw byte sequence payload; or the scalability information is received as an extension to a patch data unit.
Example 21: A method includes receiving scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; subsampling a geometry component of a patch of the three-dimensional object at occupied positions, based on the sampling rate for the at least one layer; defining respective search windows around the respective occupied positions; selecting respective salient points relative to the respective occupied positions within the respective search windows; triangulating the salient points to approximate a shape of a three-dimensional object; detecting zero or more triangles that overlap with at least one unoccupied pixel; splitting the zero or more triangles that overlap with at least one unoccupied pixel until no triangle overlaps with the unoccupied pixels; and adding zero or more additional triangles close to a border of the three-dimensional object to generate a resulting mesh for the at least one layer.
Example 22: A method includes determining scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; and transmitting the scalability information to a decoder; wherein the sampling rate is configured to be used with the decoder to subsample a geometry component of a patch of the three-dimensional object at occupied positions; wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance; wherein the scalability information is configured to be used with the decoder to reconstruct a mesh at different operating points to approximate a shape of the three-dimensional object.
Example 23: A method includes receiving scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; generating a mesh using a depth patch triangulation method using the scalability information to approximate a shape of the three-dimensional object; evaluating a quality of a depth patch triangulation compared to a reconstructed three-dimensional object reconstructed without triangulated depth patches; and iterating until a reconstructed three-dimensional object using the depth patch triangulation method reaches an expected improved quality.
Example 24: An apparatus includes means for receiving scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; means for subsampling a geometry component of a patch of the three-dimensional object at occupied positions, based on the sampling rate for the at least one layer; means for defining respective search windows around the respective occupied positions; means for selecting respective salient points relative to the respective occupied positions within the respective search windows; means for triangulating the salient points to approximate a shape of a three-dimensional object; means for detecting zero or more triangles that overlap with at least one unoccupied pixel; means for splitting the zero or more triangles that overlap with at least one unoccupied pixel until no triangle overlaps with the unoccupied pixels; and means for adding zero or more additional triangles close to a border of the three-dimensional object to generate a resulting mesh for the at least one layer.
Example 25: An apparatus includes means for determining scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; and means for transmitting the scalability information to a decoder; wherein the sampling rate is configured to be used with the decoder to subsample a geometry component of a patch of the three-dimensional object at occupied positions; wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance; wherein the scalability information is configured to be used with the decoder to reconstruct a mesh at different operating points to approximate a shape of the three-dimensional object.
Example 26: An apparatus includes means for receiving scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; means for generating a mesh using a depth patch triangulation method using the scalability information to approximate a shape of the three-dimensional object; means for evaluating a quality of a depth patch triangulation compared to a reconstructed three-dimensional object reconstructed without triangulated depth patches; and means for iterating until a reconstructed three-dimensional object using the depth patch triangulation method reaches an expected improved quality.
Example 27: A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations is described and provided, the operations comprising: receiving scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; subsampling a geometry component of a patch of the three-dimensional object at occupied positions, based on the sampling rate for the at least one layer; defining respective search windows around the respective occupied positions; selecting respective salient points relative to the respective occupied positions within the respective search windows; triangulating the salient points to approximate a shape of a three-dimensional object; detecting zero or more triangles that overlap with at least one unoccupied pixel; splitting the zero or more triangles that overlap with at least one unoccupied pixel until no triangle overlaps with the unoccupied pixels; and adding zero or more additional triangles close to a border of the three-dimensional object to generate a resulting mesh for the at least one layer.
Example 28: A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations is described and provided, the operations comprising: determining scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; and transmitting the scalability information to a decoder; wherein the sampling rate is configured to be used with the decoder to subsample a geometry component of a patch of the three-dimensional object at occupied positions; wherein the at least one sampling rate defines a level of detail at which the geometry component is subsampled, where the level of detail increases as fewer occupied positions are subsampled, and where the level of detail is chosen depending on an operating parameter of a rendering device or viewer distance; wherein the scalability information is configured to be used with the decoder to reconstruct a mesh at different operating points to approximate a shape of the three-dimensional object.
Example 29: A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations is described and provided, the operations comprising: receiving scalability information, the scalability information comprising a number of at least one layer of a three-dimensional object, and at least one sampling rate for the at least one layer; generating a mesh using a depth patch triangulation method using the scalability information to approximate a shape of the three-dimensional object; evaluating a quality of a depth patch triangulation compared to a reconstructed three-dimensional object reconstructed without triangulated depth patches; and iterating until a reconstructed three-dimensional object using the depth patch triangulation method reaches an expected improved quality.
References to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, etc.
As used herein, the term ‘circuitry’ may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present. As a further example, as used herein, the term ‘circuitry’ would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term ‘circuitry’ would also cover, for example and if applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device. Circuitry may also be used to mean a function or a process, such as one implemented by an encoder or decoder, or a codec.
In the figures, arrows between individual blocks represent operational couplings there-between as well as the direction of data flows on those couplings.
It should be understood that the foregoing description is only illustrative. Various alternatives and modifications may be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.
The following acronyms and abbreviations that may be found in the specification and/or the drawing figures are defined as follows:
This application claims priority to U.S. Provisional Application No. 63/321,208, filed Mar. 18, 2022, which is hereby incorporated by reference in its entirety.