The examples and non-limiting embodiments relate generally to multimedia transport and, more particularly, to V-DMC displacement rectangular packing.
It is known to perform data compression and decoding in a multimedia system.
In accordance with an aspect, an apparatus includes at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive mesh data; determine a number of subdivisions for the mesh data, wherein a subdivision of the subdivisions comprises a level of detail level, wherein the number of subdivisions comprises a subdivision count; calculate displacement values for the subdivisions; pack the displacement values for the subdivisions in a rectangular region of a video frame; encode the video frame; and signal in or along a video-based dynamic mesh coding bitstream mapping information indicating a relation between displacement values for level of detail levels and regions of the video frame.
In accordance with an aspect, an apparatus includes at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive a video-based dynamic mesh coding bitstream comprising mesh data; extract from the video-based dynamic mesh coding bitstream a number of level of detail levels and mapping information indicating a relation between displacement values for the level of detail levels and regions of a video frame; receive level of detail information, wherein the level of detail information comprises a level of detail threshold; extract and decode the displacement values as indicated by the mapping information from a video substream corresponding to level of detail levels that are below or equal to the level of detail threshold; and reconstruct a level of detail level of the mesh data using the extracted and decoded displacement values.
In accordance with an aspect, a method includes receiving mesh data; determining a number of subdivisions for the mesh data, wherein a subdivision of the subdivisions comprises a level of detail level, wherein the number of subdivisions comprises a subdivision count; calculating displacement values for the subdivisions; packing the displacement values for the subdivisions in a rectangular region of a video frame; encoding the video frame; and signaling in or along a video-based dynamic mesh coding bitstream mapping information indicating a relation between displacement values for level of detail levels and regions of the video frame.
In accordance with an aspect, a method includes receiving a video-based dynamic mesh coding bitstream comprising mesh data; extracting from the video-based dynamic mesh coding bitstream a number of level of detail levels and mapping information indicating a relation between displacement values for the level of detail levels and regions of a video frame; receiving level of detail information, wherein the level of detail information comprises a level of detail threshold; extracting and decoding the displacement values as indicated by the mapping information from a video substream corresponding to level of detail levels that are below or equal to the level of detail threshold; and reconstructing a level of detail level of the mesh data using the extracted and decoded displacement values.
The foregoing embodiments and other features are explained in the following description, taken in connection with the accompanying drawings.
The examples described herein relate to volumetric video, particularly to the new standard called Video-based Dynamic Mesh Coding (V-DMC) ISO/IEC 23090-29, which is a new application of the Visual Volumetric Video Coding (V3C) standard family ISO/IEC 23090-5.
Visual volumetric video, a sequence of visual volumetric frames, if uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.
The V3C specification enables the representation of a variety of volumetric media by using video and image coding technologies. This is achieved by first converting such media from their corresponding 3D representations to multiple 2D representations, also referred to as V3C components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g., texture or material information, of such 3D data.
Additional information that allows associating all these subcomponents, and that enables the inverse reconstruction from a 2D representation back to a 3D representation, is also included in a special component, referred to in this document as the atlas. An atlas consists of multiple elements, called patches. Each patch identifies a region in all available 2D components and contains information necessary to perform the appropriate inverse projection of this region back to the 3D space. The shape of such regions is determined through a 2D bounding box associated with each patch as well as their coding order. The shape of these regions is also further refined after the consideration of the occupancy information.
Coded V3C video components are referred to in this document as video bitstreams, while a V3C atlas component is referred to as the atlas bitstream. Video bitstreams and atlas bitstreams may be further split into smaller units, referred to here as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream.
V3C patch information is contained in the atlas bitstream, which contains a sequence of NAL units. The NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. NAL units specify a generic format for use in both packet-oriented and bitstream-oriented systems. The format of NAL units for both packet-oriented transport and sample streams is identical, except that in the sample stream format specified in Annex D of ISO/IEC 23090-5, each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.
NAL units in the atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units. The former is dedicated to carrying patch data, while the latter is used to carry data necessary to properly parse the ACL units or any additional auxiliary data. A NAL unit is identified by a type that is specified in Table 4 of ISO/IEC 23090-5.
While designing the V3C specification, it was envisaged that amendments or new editions could be created in the future. In order to ensure that the first implementations of V3C decoders are compatible with any future extension, a number of fields in the parameter sets were reserved for future extensions.
The V-DMC approach consists of:
- generating a base mesh m_i that is a simplified (low-resolution) approximation of the original mesh (this is done for all frames of the dynamic mesh sequence);
- performing several iterative mesh subdivision steps (e.g., each triangle is converted into four triangles by connecting the triangle edge midpoints) on the generated base mesh, generating other approximation meshes m_i^n, where n stands for the number of subdivision iterations, with m_i = m_i^0;
- defining displacement vectors d_i^n, also named error vectors, for each vertex of each mesh approximation m_i^n with n > 0, such that, for each subdivision level, the deformed mesh obtained as m_i^n + d_i^n (i.e., by adding the displacement vectors to the subdivided mesh vertices) is the best approximation of the original mesh at that resolution, given the base mesh and prior subdivision levels;
- the displacement vectors may undergo a lazy wavelet transform prior to compression; and
- the attribute map of the original mesh is transferred to the deformed mesh at the highest resolution (i.e., highest subdivision level), such that texture coordinates are obtained for the deformed mesh and a new attribute map is generated.
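As a concrete illustration of the subdivision and deformation steps, the following is a minimal Python sketch of one midpoint-subdivision iteration and of applying displacement vectors to the subdivided vertices. This is illustrative only, not the V-DMC reference implementation, and all names are assumptions:

    import numpy as np

    def subdivide_midpoint(vertices, triangles):
        # One subdivision iteration: each triangle (a, b, c) becomes four
        # triangles by connecting its edge midpoints.
        verts = [np.asarray(v, dtype=float) for v in vertices]
        midpoint_of = {}  # edge (lo, hi) -> index of the midpoint vertex

        def midpoint(a, b):
            key = (min(a, b), max(a, b))
            if key not in midpoint_of:
                midpoint_of[key] = len(verts)
                verts.append((verts[a] + verts[b]) / 2.0)
            return midpoint_of[key]

        tris = []
        for a, b, c in triangles:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            tris += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        return np.stack(verts), tris

    def deform(subdivided_vertices, displacements):
        # m_i^n + d_i^n: add the per-vertex displacement (error) vectors to
        # the subdivided mesh to best approximate the original at this LOD.
        return subdivided_vertices + np.asarray(displacements)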
Referring to
Referring to
The encoder is illustrated on
The inter encoding process is similar to the intra encoding process with the following changes. The reconstructed reference base mesh is an input of the inter coding process 400. A new module called Motion Encoder 402 takes as input the quantized input base mesh 404 and the reconstructed quantized reference base mesh 318 to produce compressed motion information encoded as a compressed motion bitstream 406, which is multiplexed by multiplexer 408 into the encoder output compressed bitstream 420. All other modules and processes are similar to the intra encoding case.
The compressed bitstream generated by the encoder multiplexes:
- a sub-bitstream with the encoded base mesh using a static mesh codec;
- a sub-bitstream with the encoded motion data using an animation codec for base meshes, in case INTER coding is enabled;
- a sub-bitstream with the wavelet coefficients of the displacement vectors packed in an image and encoded using a video codec;
- a sub-bitstream with the attribute map encoded using a video codec; and
- a sub-bitstream that contains all metadata required to decode and reconstruct the mesh sequence based on the aforementioned sub-bitstreams.
The signaling of the metadata is based on the V3C syntax and includes necessary extensions that are specific to meshes.
The signaling of the metadata and substreams produced by the encoder and ingested by the decoder is an extension of V3C. It mainly consists of additional V3C unit header syntax, additional V3C unit payload syntax, and a mesh intra patch data unit.
Base meshes are the output of the base mesh substream decoder. A submesh is a set of vertices, their connectivity and the associated attributes which can be decoded completely independently in a mesh frame. Each base mesh can have one or more submeshes.
Resampled base meshes are the output of the mesh subdivision process. The inputs to the process are the base meshes (or sets of submeshes) as well as the information from the atlas data substream on how to subdivide/resample the meshes (submeshes).
A displacement video is the output of the displacement decoder. The inputs to the process are the decoded geometry video as well as the information from the atlas data substream on how to interpret/process this video. The displacement video contains displacement values to be added to the corresponding vertices.
ISO/IEC 23090-5: Visual Volumetric Video-based Coding (V3C) includes a base mesh substream and codec.
One of the key features of the current V-DMC specification design is the support for a base mesh signal that can be encoded using any currently specified or future static mesh codec. For example, such information could be coded using Draco 3D Graphics Compression or the MPEG Edgebreaker-based static mesh codec.
V-DMC introduces a new base mesh substream format to carry compressed frames of the base mesh. This new format is very similar to the atlas sub-bitstream of V3C, with the base mesh sub-bitstream also constructed using NAL units. High-level syntax (HLS) structures such as base mesh sequence parameter sets, base mesh frame parameter sets, and submesh layers are also specified.
ISO/IEC 23090-5: Visual Volumetric Video-based Coding (V3C) includes submeshes.
ISO/IEC 23090-5: Visual Volumetric Video-based Coding (V3C) includes Level-of-Detail (LoD). The V-DMC framework utilizes the well-known concept of Level-of-Detail (LoD).
ISO/IEC 23090-5: Visual Volumetric Video-based Coding (V3C) includes the concept of displacement video frame packing.
The video-based packing of displacement is illustrated on
Thus,
The approaches illustrated on
The problem consists of packing displacement data in a video frame in such a way that LODs can be easily extracted from coarse to fine with spatial random access, and that in-bitstream rate adaptation is allowed by simply dropping the highest LODs when needed, while achieving good compression performance.
As shown on
The V-DMC high level syntax (HLS) does not provide the possibility to pack LOD displacements in such a way that they can be utilized at the system level during smart delivery over unreliable or capacity-limited transmission channels.
Described herein are examples, including methods and machines, to enable packing of displacement data of V-DMC in rectangular regions of the video frame. The regions can, for example, correspond to tiles as defined in HEVC and VVC, or subpictures as defined in VVC. The positions of the regions and their mapping to LODs are signaled in the V-DMC bitstream, allowing an apparatus to perform inverse reconstruction of a mesh based on the selected LOD.
Described herein is an apparatus/method/computing program comprising: receiving mesh data; determining a subdivision count (number of LODs) for the given mesh data and calculating the displacement values for each subdivision iteration (LOD); packing the displacement values for each subdivision in a rectangular or square region of a video frame; and signaling in or along a V-DMC bitstream mapping information indicating the relation between displacement LODs and the regions of the video frame.
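The following is a minimal encoder-side sketch of such packing, under stated assumptions: displacement samples are already quantized to one integer value per vertex per LOD, and each LOD occupies one full-width, CTU-aligned horizontal band of the frame. The layout policy, function names, and default sizes are illustrative, not mandated by V-DMC:

    import numpy as np

    def pack_displacements(lod_values, frame_width=1024, ctu=64):
        # Pack each LOD's displacement samples into its own rectangular,
        # CTU-aligned region and record the LOD -> region mapping that would
        # be signaled in or along the V-DMC bitstream.
        bands, regions, v = [], [], 0
        for lod, values in enumerate(lod_values):
            rows = -(-len(values) // frame_width)   # ceil: rows needed
            rows = -(-rows // ctu) * ctu            # align height to CTUs
            band = np.zeros((rows, frame_width), dtype=np.uint16)
            band.flat[:len(values)] = values        # the remainder is padding
            regions.append({"region_id": lod, "u": 0, "v": v,
                            "width": frame_width, "height": rows})
            bands.append(band)
            v += rows
        return np.vstack(bands), regions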
The main purpose of packing in rectangular regions is to enable later encoding with a video codec that allows the regions to be decoded independently (e.g., sub-bitstreams in VVC).
Described herein is an apparatus/method/computing program comprising: receiving a V-DMC bitstream; extracting from the bitstream the number of LODs and the mapping information indicating the relation between displacement LODs and the video frame regions; receiving desired maximum LOD information from a rendering module; extracting and decoding only the displacement values, as indicated by the mapping information, from the video substream corresponding to the LOD levels that are below or equal to the desired maximum LOD level to be decoded; and reconstructing the desired LOD of the mesh data using the extracted and decoded displacement values.
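A matching decoder-side sketch, reusing the region-mapping format of the packing sketch above (num_vertices_per_lod would in practice be derived from the base mesh and the subdivision count, and is an assumed input here):

    def extract_displacements(frame, regions, num_vertices_per_lod, max_lod):
        # Decode only the regions whose LOD is at or below the threshold;
        # regions for higher LODs can be skipped or dropped entirely.
        out = []
        for lod, r in enumerate(regions):
            if lod > max_lod:
                break
            block = frame[r["v"]:r["v"] + r["height"],
                          r["u"]:r["u"] + r["width"]]
            out.append(block.ravel()[:num_vertices_per_lod[lod]])  # strip padding
        return out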
In an embodiment, the video frame to be encoded (including displacement data for different LOD-levels) may be prepared such that the resolution of the packed video frame is less than or equal to a maximum resolution allowed by a certain profile-tier-level (PTL) of the target video decoder. The target decoder may be based on the devices for which the application or the content provider chooses to enable V-DMC based use-cases.
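One possible, purely illustrative policy for meeting such a PTL constraint is to drop the finest LODs until the packed frame fits the decoder's luma sample budget. The sketch below reuses pack_displacements from above; the single-number budget is a simplification of real PTL limits, which also bound width, height, and rates:

    def trim_to_ptl(lod_values, max_luma_samples, frame_width=1024, ctu=64):
        # Repack with progressively fewer LODs until the frame resolution is
        # within the target decoder's assumed PTL budget.
        kept = list(lod_values)
        while kept:
            frame, regions = pack_displacements(kept, frame_width, ctu)
            if frame.size <= max_luma_samples:
                return frame, regions
            kept.pop()  # drop the finest remaining LOD
        raise ValueError("even the coarsest LOD exceeds the PTL limit")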
In an alternate embodiment, the video frame to be encoded (including displacement data for different LOD-levels) may be prepared such that the packed video frame does not contain any regions with empty data (or data not used for rendering), or contains regions with empty data (or data not used for rendering) whose amount is below a threshold. The threshold may be chosen based on the bitrate overhead resulting from including empty data (or data not used for rendering), or the threshold may be manually set by the system user; determination of the threshold thus need not be fully automated by such a calculation of bitrate overhead.
In an embodiment, the region with empty data may be filled with substitutable subpictures or slices or tiles with, for example, black pixels or gray pixels. The pixels may be of a different color other than gray or black, where the color of the pixels of the subpictures or slices or tiles used for the empty data is a configurable option, as long as there is an indication that the subpictures or slices or tiles used for the empty data are not part of the packed regions for displacement information. The indication that the subpictures or slices or tiles used for the empty data are not part of the packed regions for displacement information may be implicit, based on, for example, the color of the pixels of the subpictures or slices or tiles used for the empty data. For example, when the color of the pixels of the subpictures or slices or tiles used for the empty data is gray or black, this may be an implicit indication to a decoder that the subpictures or slices or tiles used for the empty data do not have displacement information.
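A sketch of such filling, assuming mid-gray (sample value 128 for 8-bit video) as the configurable fill color and the region-dictionary format used in the packing sketch above:

    import numpy as np

    GRAY = 128  # assumed fill value; the color is a configurable option

    def fill_substitute_regions(frame, displacement_regions):
        # Fill every sample not covered by a displacement region with gray,
        # serving as an implicit indication that those subpictures/slices/
        # tiles carry no displacement information.
        covered = np.zeros(frame.shape, dtype=bool)
        for r in displacement_regions:
            covered[r["v"]:r["v"] + r["height"],
                    r["u"]:r["u"] + r["width"]] = True
        frame[~covered] = GRAY
        return frame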
In one embodiment, the subdivision count is determined based on input parameters to the encoder.
In one embodiment, the displacement data is sorted by LOD-level and follows a vertex traversal order as defined in the V-DMC framework, such as the Morton order, for example.
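For example, a 2D Morton (Z-order) code interleaves the bits of a vertex's grid coordinates, and sorting vertices by this code yields one possible traversal order. This is a sketch only; the actual V-DMC traversal is defined by the framework itself, and the integer grid coordinates are an assumed input:

    def morton2d(x, y, bits=16):
        # Interleave the bits of (x, y) into a single Z-order code.
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)
            code |= ((y >> i) & 1) << (2 * i + 1)
        return code

    # Usage: sort vertex indices of one LOD by the Morton code of their
    # (assumed) integer grid coordinates:
    # order = sorted(range(len(grid_xy)), key=lambda k: morton2d(*grid_xy[k]))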
In one embodiment, the video frame can contain multiple sub-streams of V-DMC data, e.g., texture information and displacement information. In another embodiment, the video frame contains only displacement values.
In one embodiment, displacement data for each LOD is mapped to a separate region that is aligned to CTU boundaries.
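CTU alignment can be sketched as snapping a region's origin down and its size up to multiples of the CTU size (64 here is an assumed CTU size; HEVC and VVC allow several):

    def align_to_ctu(u, v, width, height, ctu=64):
        # Snap the origin down and grow the size so that the region starts
        # and ends exactly on coding-tree-unit boundaries.
        u0, v0 = (u // ctu) * ctu, (v // ctu) * ctu
        w = -(-(u + width - u0) // ctu) * ctu   # ceil to CTU multiple
        h = -(-(v + height - v0) // ctu) * ctu
        return u0, v0, w, h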
In another embodiment, these regions are defined as slices of HEVC.
In one embodiment, these regions are rectangular or square and correspond to tiles.
In one embodiment, the tiles are defined as motion-constrained tile sets (MCTS) such that motion compensation is constrained to refer to the same tile location in reconstructed reference frames.
In one embodiment, these regions are defined as VVC subpictures.
In one embodiment, the signaling information in a bitstream that allows the client to determine the relation between regions of the video frame and LODs, and to extract the displacement values of each LOD, is provided in the ASPS.
asps_vdmc_ext_packing_method equal to 0 specifies that the displacement component samples are packed in ascending order. asps_vdmc_ext_packing_method equal to 1 specifies that the displacement component samples are packed in descending order. asps_vdmc_ext_packing_method equal to 2 specifies that the displacement component samples are packed in regions described in the displacement_region_information( ) syntax structure.
dri_region_id[i] specifies the region ID for the LOD with index i.
dri_region_u[i] specifies the horizontal position of the top-left sample of the region that contains displacement values for the LOD with index i, in units of luma samples in the displacement video component frame.
dri_region_v[i] specifies the vertical position of the top-left sample of the region that contains displacement values for the LOD with index i, in units of luma samples in the displacement video component frame.
dri_region_width[i] plus 1 specifies the width of the region that contains displacement values for the LOD with index i, in units of luma samples in the displacement video component frame.
dri_region_height[i] plus 1 specifies the height of the region that contains displacement values for the LOD with index i, in units of luma samples in the displacement video component frame.
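A small container for these fields, showing in particular how the "plus 1" semantics of the width and height are applied when deriving the actual region rectangle. This is an illustrative sketch, not normative V-DMC decoding code:

    from dataclasses import dataclass

    @dataclass
    class DisplacementRegion:
        region_id: int      # dri_region_id[i]
        u: int              # dri_region_u[i], in luma samples
        v: int              # dri_region_v[i], in luma samples
        width_minus1: int   # dri_region_width[i]
        height_minus1: int  # dri_region_height[i]

        def rect(self):
            # The signaled width/height are "plus 1" values, so the true
            # region size is the coded value incremented by one.
            return (self.u, self.v,
                    self.width_minus1 + 1, self.height_minus1 + 1)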
In one embodiment, the displacement_region_information( ) syntax structure contains an explicit syntax element that maps index i to an LOD.
In one embodiment, the displacement_region_information( ) syntax structure can be placed in the VPS, the AFPS, or any other signaling information that is sent in or along the V-DMC bitstream.
In one embodiment (example), the packing of displacements into a video frame is performed as illustrated on
In one embodiment, inside each region, padding data is required to fill all pixels. This padding data may be inferred by the V-DMC decoder based on the number of vertices of the corresponding LOD.
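The inference can be sketched as the difference between the region's sample capacity and the number of displacement samples actually carried for that LOD (assuming, for simplicity, one sample per vertex):

    def inferred_padding(region_width, region_height, lod_vertex_count):
        # Samples in the region that carry no displacement data and must be
        # skipped by the decoder.
        capacity = region_width * region_height
        assert capacity >= lod_vertex_count, "region too small for this LOD"
        return capacity - lod_vertex_count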
In one embodiment, it may be an advantage for faster processing and decoding to explicitly signal, per frame, the length of the padding used for each LoD region in the displacement_region_information( ) syntax structure.
When the video codec used is VVC, each region is encoded as a VVC subpicture (independently decodable). In case the number and size of tiles or subpictures do not fill the full frame area, additional auxiliary tiles or subpictures may be added (called the substitute subpictures (1402, 1404, 1502, 1504)) as illustrated on
When the video codec used is HEVC, each region is encoded as a motion-constrained tile set (MCTS) that prevents motion compensation from referring to reference picture areas that are not within the same region.
In case multiple submeshes are enabled by the encoder, it is advantageous to be able to filter submeshes and their respective LODs adaptively. In this case, tiles or subpictures are defined per submesh and per LOD as well. In the ASPS, the displacement_region_information( ) is signaled for each submesh ID.
In one embodiment, the lifting transform parameters are constrained to using skipUpdate=true, so that no displacement value drift appears on lower LODs in case higher LOD(s) are not decoded or not processed.
In one embodiment, the signaling of mapping information can be done using file format level functionality. A dedicated box syntax structure may be added to indicate the relation of the various displacement LODs and the regions of the video frames.
In one embodiment, the signaling of mapping information can be done using supplemental enhancement information (SEI) messages of the video.
Attributes and displacements may be encoded in separate substreams with their own subpictures or packed together (the video frames contain both displacement and attribute pixels) with corresponding subpictures. Accordingly, the examples described herein are not limited to displacement (video) data, as they also relate to attribute (video) data. Optionally the attribute video data may be encoded/decoded using subpictures following an LOD structure into respective attribute video substreams. Optionally the attribute video data and the displacement video data may be packed into the same video frames and subpicture frames (thus frames containing displacement and attribute pixels).
In an embodiment, asve_lod_patches_enable_flag equal to 1 specifies that patches contain data per LoD. asve_lod_patches_enable_flag equal to 0 specifies that a patch contains data of all LoDs.
In an embodiment, mdu_lod_idx[tileID][patchIdx] indicates the LOD index to which the data in the current patch with index patchIdx, in the current atlas tile with tile ID equal to tileID, applies. When mdu_lod_idx[tileID][patchIdx] is not present, its value is inferred to be equal to 0.
In an embodiment, LoD extraction information SEI payload syntax is defined as follows:
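The syntax table itself is not reproduced here; the following Python sketch shows one plausible reading order for the payload, inferred from the semantics below. The bitstream-reader API (r.ue( ) for an Exp-Golomb-coded value), the descriptors, and the loop bounds are all assumptions, not the normative syntax:

    def read_lod_extraction_information(r):
        # Plausible field order for the LoD extraction information SEI
        # payload (hypothetical reader; descriptors assumed to be ue(v)).
        sei = {"extractable_unit_type_idx": r.ue(), "submeshes": []}
        num_submeshes = r.ue() + 1          # lei_number_of_submesh_minus1 + 1
        for i in range(num_submeshes):
            sm = {"submesh_id": r.ue(),     # lei_submesh_id[i]
                  "iterations": r.ue(),     # lei_subdivision_iteration_count
                  "unit_idx": []}
            for j in range(sm["iterations"] + 1):
                # lei_mcts_idx[i][j] when the unit type is 0 (MCTS),
                # otherwise lei_subpicture_idx[i][j] (VVC subpictures).
                sm["unit_idx"].append(r.ue())
            sei["submeshes"].append(sm)
        return sei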
The LoD extraction information SEI message semantics (such as that signaled with the LoD extraction information SEI payload syntax element above) includes the following:
lei_extractable_unit_type_idx indicates the type of extractable units within the video bitstream. lei_extractable_unit_type_idx equal to 0 specifies that the displacement video is encoded with MCTS (Motion-Constrained Tile Sets), as specified in ISO/IEC 23008-2. lei_extractable_unit_type_idx equal to 1 specifies that the displacement video is encoded with subpicture, as specified in ISO/IEC 23090-3.
lei_number_of_submesh_minus1 plus 1 indicates the number of submeshes.
lei_submesh_id[i] indicates the submesh ID of the i-th submesh.
lei_subdivision_iteration_count[i][j] indicates the number of iterations used for the subdivision of the i-th submesh.
lei_mcts_idx[i][j] indicates the identifier of the MCTS corresponding to the region where the j-th refinement level displacement data of the i-th submesh is located in the video bitstream. lei_mcts_idx[i][j] shall be the same as idx_of_mcts_in_set[m][n][l] in ISO/IEC 23008-2, Annex D.3.43. idx_of_mcts_in_set[m][n][l] specifies the MCTS index of the l-th MCTS in the n-th MCTS set that is associated with the m-th extraction information set.
lei_subpicture_idx[i][j] indicates the identifier of the subpicture corresponding to the region where the j-th refinement level displacement data of the i-th submesh is located in the video bitstream.
The examples described herein are applicable to ISO/IEC 23090-29, V-DMC, and V-DMC scalability. As such, the embodiments described herein may be included in the ISO/IEC 23090-29 standard, including the herein described stream flow.
In some examples, the transmitting apparatus 1680 and the receiving apparatus 1682 are at least partially within a common apparatus, and for example are located within a common housing 1650. In other examples the transmitting apparatus 1680 and the receiving apparatus 1682 are at least partially not within a common apparatus and have at least partially different housings. Therefore in some examples, the encoder 1630 and the decoder 1640 are at least partially within a common apparatus, and for example are located within a common housing 1650. For example the common apparatus comprising the encoder 1630 and decoder 1640 implements a codec. In other examples the encoder 1630 and the decoder 1640 are at least partially not within a common apparatus and have at least partially different housings, but when together still implement a codec.
3D media from the capture (e.g., volumetric capture) at a viewpoint 1612 of the scene 1615 (which includes a person 1613) is converted via projection to a series of 2D representations with occupancy, geometry, and attributes. Additional atlas information is also included in the bitstream to enable inverse reconstruction. For decoding, the received bitstream 1610 is separated into its components: atlas information and the occupancy, geometry, and attribute 2D representations. A 3D reconstruction is performed to reconstruct the scene 1615-1, created looking at the viewpoint 1612-1, with a "reconstructed" person 1613-1. The "-1" suffixes are used to indicate that these are reconstructions of the originals. As indicated at 1620, the decoder 1640 performs an action or actions based on the received signaling.
The apparatus 1700 includes a display and/or I/O interface 1708, which includes user interface (UI) circuitry and elements, that may be used to display features or a status of the methods described herein (e.g., as one of the methods is being performed or at a subsequent time), or to receive input from a user such as with using a keypad, camera, touchscreen, touch area, microphone, biometric recognition, one or more sensors, etc. The apparatus 1700 includes one or more communication e.g. network (N/W) interfaces (I/F(s)) 1710. The communication I/F(s) 1710 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique including via one or more links 1724. The communication I/F(s) 1710 may comprise one or more transmitters or one or more receivers.
The transceiver 1716 comprises one or more transmitters 1718 and one or more receivers 1720. The transceiver 1716 and/or communication I/F(s) 1710 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de) modulator, and encoder/decoder circuitries and one or more antennas, such as antennas 1714 used for communication over wireless link 1726.
The control module 1706 of the apparatus 1700 comprises one of or both parts 1706-1 and/or 1706-2, which may be implemented in a number of ways. The control module 1706 may be implemented in hardware as control module 1706-1, such as being implemented as part of the one or more processors 1702. The control module 1706-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array. In another example, the control module 1706 may be implemented as control module 1706-2, which is implemented as computer program code (having corresponding instructions) 1705 and is executed by the one or more processors 1702. For instance, the one or more memories 1704 store instructions that, when executed by the one or more processors 1702, cause the apparatus 1700 to perform one or more of the operations as described herein. Furthermore, the one or more processors 1702, one or more memories 1704, and example algorithms (e.g., as flowcharts and/or signaling diagrams), encoded as instructions, programs, or code, are means for causing performance of the operations described herein.
The apparatus 1700 to implement the functionality of control 1706 may correspond to any of the apparatuses depicted herein. Alternatively, apparatus 1700 and its elements may not correspond to any of the other apparatuses depicted herein, as apparatus 1700 may be part of a self-organizing/optimizing network (SON) node or other node, such as a node in a cloud.
The apparatus 1700 may also be distributed throughout the network including within and between apparatus 1700 and any network element (such as a base station and/or terminal device and/or user equipment).
Interface 1712 enables data communication and signaling between the various items of apparatus 1700, as shown in
The following examples are provided and described herein.
Example 1. An apparatus including: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive mesh data; determine a number of subdivisions for the mesh data, wherein a subdivision of the subdivisions comprises a level of detail level, wherein the number of subdivisions comprises a subdivision count; calculate displacement values for the subdivisions; pack the displacement values for the subdivisions in a rectangular region of a video frame; encode the video frame; and signal in or along a video-based dynamic mesh coding bitstream mapping information indicating a relation between displacement values for level of detail levels and regions of the video frame.
Example 2. The apparatus of example 1, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: configure the video frame and the displacement values for different level of detail levels such that a resolution of the video frame is less than or equal to a maximum resolution allowed by a profile tier level of a target video decoder.
Example 3. The apparatus of any of examples 1 to 2, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to configure the video frame and the displacement values for different level of detail levels such that at least one or more of the following applies: the video frame does not contain any regions with empty data, or the video frame does not contain any regions with data not used for decoding, or a number of regions of the video frame with empty data is below a threshold number of empty data regions, wherein the threshold number of empty data regions is based on a bitrate overhead that results from including the empty data, or a number of regions of the video frame with data not used for decoding is below a threshold number of empty data regions not used for decoding, wherein the threshold number of empty data regions not used for decoding is based on a bitrate overhead that results from including the empty data regions not used for decoding, or wherein the threshold number of empty data regions not used for decoding is received from a system user based on a manual setting by the system user.
Example 4. The apparatus of any of examples 1 to 3, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: configure the video frame to contain at least one region with empty data; encode the at least one region with empty data with substitutable subpictures or slices or tiles; wherein the substitutable subpictures or slices or tiles are not part of packed regions of the video frame having displacement information.
Example 5. The apparatus of any of examples 1 to 4, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: determine the subdivision count based on input parameters to an encoder, wherein the apparatus comprises the encoder.
Example 6. The apparatus of any of examples 1 to 5, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: sort the displacement values by level of detail level, wherein the sorting follows a vertex traversal order.
Example 7. The apparatus of any of examples 1 to 6, wherein the video frame comprises multiple sub-streams of video-based dynamic mesh coding data, wherein one sub-stream of the multiple sub-streams comprises texture information, and another sub-stream of the multiple sub-streams comprises displacement information.
Example 8. The apparatus of any of examples 1 to 7, wherein displacement data for a level of detail level is mapped to a separate region that is aligned to boundaries of a coding tree unit.
Example 9. The apparatus of example 8, wherein at least one or more of the following applies to the separate region: the separate region comprises a high efficiency video coding slice, or the separate region comprises a rectangular tile, or the separate region comprises a rectangular tile, and the tile comprises a motion constrained tile set such that motion compensation is constrained to refer to a same tile location in reconstructed reference frames, or the separate region comprises a versatile video coding subpicture.
Example 10. The apparatus of any of examples 1 to 9, wherein the mapping information indicating the relation between displacement values for level of detail levels and regions of the video frame signaled in or along the video-based dynamic mesh coding bitstream is provided in an atlas bitstream or an atlas sequence parameter set.
Example 11. The apparatus of any of examples 1 to 10, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: encode the rectangular region of the video frame as one or more video substreams, wherein a video substream of the one or more video substreams comprises a profile, tier, or level of detail level.
Example 12. The apparatus of any of examples 1 to 11, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: determine attribute video data for the subdivisions; and pack the attribute video data for the subdivisions in a rectangular region of a video frame.
Example 13. The apparatus of any of examples 1 to 12, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: encode attribute video data using subpictures into respective attribute video substreams, based on the subdivisions.
Example 14. The apparatus of any of examples 1 to 13, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: encode displacement video data using subpictures into respective displacement video substreams, based on the subdivisions.
Example 15. The apparatus of any of examples 1 to 14, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: pack attribute video data and displacement video data into the same video frames and subpicture frames, wherein the video frames and subpicture frames comprise displacement pixels and attribute pixels.
Example 16. An apparatus including: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive a video-based dynamic mesh coding bitstream comprising mesh data; extract from the video-based dynamic mesh coding bitstream a number of level of detail levels and mapping information indicating a relation between displacement values for the level of detail levels and regions of a video frame; receive level of detail information, wherein the level of detail information comprises a level of detail threshold; extract and decode the displacement values as indicated by the mapping information from a video substream corresponding to level of detail levels that are below or equal to the level of detail threshold; and reconstruct a level of detail level of the mesh data using the extracted and decoded displacement values.
Example 17. The apparatus of example 16, wherein the video frame and the displacement values for different level of detail levels are configured such that a resolution of the video frame is less than or equal to a maximum resolution allowed by a profile tier level of the apparatus, wherein the apparatus comprises a target video decoder.
Example 18. The apparatus of any of examples 16 to 17, wherein the video frame and the displacement values for different level of detail levels are configured such that at least one or more of the following applies: the video frame does not contain any regions with empty data, or the video frame does not contain any regions with data not used for decoding, or a number of regions of the video frame with empty data is below a threshold number of empty data regions, wherein the threshold number of empty data regions is based on a bitrate overhead that results from including the empty data, or a number of regions of the video frame with data not used for decoding is below a threshold number of empty data regions not used for decoding, wherein the threshold number of empty data regions not used for decoding is based on a bitrate overhead that results from including the empty data regions not used for decoding, or wherein the threshold number of empty data regions not used for decoding is based on a manual setting by a system user.
Example 19. The apparatus of any of examples 16 to 18, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: decode substitutable subpictures or slices or tiles from at least one region of the video frame with empty data; wherein the substitutable subpictures or slices or tiles are not part of packed regions of the video frame having displacement information.
Example 20. The apparatus of any of examples 16 to 19, wherein the number of level of detail levels comprises a subdivision count, wherein the subdivision count is based on input parameters to an encoder from which the video-based dynamic mesh coding bitstream is received.
Example 21. The apparatus of any of examples 16 to 20, wherein the displacement values are sorted by level of detail level, wherein the sorting follows a vertex traversal order.
Example 22. The apparatus of any of examples 16 to 21, wherein the video frame comprises multiple sub-streams of video-based dynamic mesh coding data, wherein one sub-stream of the multiple sub-streams comprises texture information, and another sub-stream of the multiple sub-streams comprises displacement information.
Example 23. The apparatus of any of examples 16 to 22, wherein displacement data for a level of detail level is mapped to a separate region that is aligned to boundaries of a coding tree unit.
Example 24. The apparatus of example 23, wherein at least one or more of the following applies to the separate region: the separate region comprises a high efficiency video coding slice, or the separate region comprises a rectangular tile, or the separate region comprises a rectangular tile, and the tile comprises a motion constrained tile set such that motion compensation is constrained to refer to a same tile location in reconstructed reference frames, or the separate region comprises a versatile video coding subpicture.
Example 25. The apparatus of any of examples 16 to 24, wherein the mapping information indicating the relation between displacement values for the level of detail levels and regions of the video frame extracted from the video-based dynamic mesh coding bitstream is provided in an atlas bitstream or an atlas sequence parameter set.
Example 26. The apparatus of any of examples 16 to 25, wherein the level of detail information is received from a rendering engine or other application entity.
Example 27. The apparatus of any of examples 16 to 26, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: decode attribute video data from the video-based dynamic mesh coding bitstream; and reconstruct a level of detail level of the mesh data using the decoded attribute video data.
Example 28. The apparatus of any of examples 16 to 27, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: decode attribute video data from subpictures from respective attribute video substreams, based on the level of detail levels.
Example 29. The apparatus of any of examples 16 to 28, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: decode displacement video data from subpictures from respective displacement video substreams, based on the level of detail levels.
Example 30. The apparatus of any of examples 16 to 29, wherein the instructions, when executed by the at least one processor, cause the apparatus at least to: decode attribute video data and displacement video data from the same video frames and subpicture frames, wherein the video frames and subpicture frames comprise displacement pixels and attribute pixels.
Example 31. A method including: receiving mesh data; determining a number of subdivisions for the mesh data, wherein a subdivision of the subdivisions comprises a level of detail level, wherein the number of subdivisions comprises a subdivision count; calculating displacement values for the subdivisions; packing the displacement values for the subdivisions in a rectangular region of a video frame; encoding the video frame; and signaling in or along a video-based dynamic mesh coding bitstream mapping information indicating a relation between displacement values for level of detail levels and regions of the video frame.
Example 32. A method including: receiving a video-based dynamic mesh coding bitstream comprising mesh data; extracting from the video-based dynamic mesh coding bitstream a number of level of detail levels and mapping information indicating a relation between displacement values for the level of detail levels and regions of a video frame; receiving level of detail information, wherein the level of detail information comprises a level of detail threshold; extracting and decoding the displacement values as indicated by the mapping information from a video substream corresponding to level of detail levels that are below or equal to the level of detail threshold; and reconstructing a level of detail level of the mesh data using the extracted and decoded displacement values.
Example 33. An apparatus including: means for receiving mesh data; means for determining a number of subdivisions for the mesh data, wherein a subdivision of the subdivisions comprises a level of detail level, wherein the number of subdivisions comprises a subdivision count; means for calculating displacement values for the subdivisions; means for packing the displacement values for the subdivisions in a rectangular region of a video frame; means for encoding the video frame; and means for signaling in or along a video-based dynamic mesh coding bitstream mapping information indicating a relation between displacement values for level of detail levels and regions of the video frame.
Example 34. An apparatus including: means for receiving a video-based dynamic mesh coding bitstream comprising mesh data; means for extracting from the video-based dynamic mesh coding bitstream a number of level of detail levels and mapping information indicating a relation between displacement values for the level of detail levels and regions of a video frame; means for receiving level of detail information, wherein the level of detail information comprises a level of detail threshold; means for extracting and decoding the displacement values as indicated by the mapping information from a video substream corresponding to level of detail levels that are below or equal to the level of detail threshold; and means for reconstructing a level of detail level of the mesh data using the extracted and decoded displacement values.
Example 35. A non-transitory computer readable medium including program instructions stored thereon for performing at least the following: receiving mesh data; determining a number of subdivisions for the mesh data, wherein a subdivision of the subdivisions comprises a level of detail level, wherein the number of subdivisions comprises a subdivision count; calculating displacement values for the subdivisions; packing the displacement values for the subdivisions in a rectangular region of a video frame; encoding the video frame; and signaling in or along a video-based dynamic mesh coding bitstream mapping information indicating a relation between displacement values for level of detail levels and regions of the video frame.
Example 36. A non-transitory computer readable medium including program instructions stored thereon for performing at least the following: receiving a video-based dynamic mesh coding bitstream comprising mesh data; extracting from the video-based dynamic mesh coding bitstream a number of level of detail levels and mapping information indicating a relation between displacement values for the level of detail levels and regions of a video frame; receiving level of detail information, wherein the level of detail information comprises a level of detail threshold; extracting and decoding the displacement values as indicated by the mapping information from a video substream corresponding to level of detail levels that are below or equal to the level of detail threshold; and reconstructing a level of detail level of the mesh data using the extracted and decoded displacement values.
Example 37. The apparatus of any of examples 1 to 15, wherein the apparatus is further caused to: signal an atlas sequence parameter set video-based dynamic mesh coding extension flag; wherein a value of the atlas sequence parameter set video-based dynamic mesh coding extension flag being equal to 1 specifies that patches contain data per level of detail; wherein a value of the atlas sequence parameter set video-based dynamic mesh coding extension flag being equal to 0 specifies that a patch contains data of all level of details.
Example 38. The apparatus of any of examples 1 to 15 or 37, wherein the apparatus is further caused to: signal a mesh patch data unit level of detail index per tile identifier and per patch index; wherein the mesh patch data unit level of detail index indicates a level of detail index that data in a current patch with an index corresponding to the patch index and in a current atlas tile with an identifier corresponding to the tile identifier applies to.
Example 39. The apparatus of example 38, wherein when the signaling of the mesh patch data unit level of detail index per tile identifier and per patch index is not present, a value of the mesh patch data unit level of detail index per tile identifier and per patch index is inferred to be equal to zero.
Example 40. The apparatus of any of examples 1 to 15 or 37 to 39, wherein the apparatus is further caused to signal level of detail extraction information with a level of detail extraction information payload supplemental enhancement information syntax element, wherein the level of detail extraction information signaled with the level of detail extraction information payload supplemental enhancement information syntax element indicates: an extractable unit type identifier that indicates a type of extractable units within the video-based dynamic mesh coding bitstream; a number of one or more submeshes; a submesh identifier per submesh; a subdivision iteration count per submesh; a motion constrained tile set identifier corresponding to a region, wherein the motion constrained tile set identifier is indicated per submesh, and per displacement data refinement level or subdivision iteration; and a subpicture identifier corresponding to a region, wherein the subpicture identifier is indicated per submesh, and per displacement data refinement level or subdivision iteration.
Example 41. The apparatus of example 40, wherein: a value of 0 for the extractable unit type index specifies that displacement video is encoded with motion-constrained tile sets, a value of 1 for the extractable unit type index specifies that displacement video is encoded with a subpicture, and the motion constrained tile set identifier is the same as: a motion constrained tile set index of a motion constrained tile set corresponding to a first index in a motion constrained tile set corresponding to a second index, wherein the motion constrained tile set corresponding to the second index is associated with an extraction information set corresponding to a third index.
Example 42. The apparatus of any of examples 16 to 30, wherein the apparatus is further caused to: receive signaling of an atlas sequence parameter set video-based dynamic mesh coding extension flag; wherein a value of the atlas sequence parameter set video-based dynamic mesh coding extension flag being equal to 1 specifies that patches contain data per level of detail; wherein a value of the atlas sequence parameter set video-based dynamic mesh coding extension flag being equal to 0 specifies that a patch contains data of all level of details; and reconstruct the level of detail level of the mesh data based on the signaling of the atlas sequence parameter set video-based dynamic mesh coding extension flag.
Example 43. The apparatus of any of examples 16 to 30 or 42, wherein the apparatus is further caused to: receive signaling of a mesh patch data unit level of detail index per tile identifier and per patch index; wherein the mesh patch data unit level of detail index indicates a level of detail index that data in a current patch with an index corresponding to the patch index and in a current atlas tile with an identifier corresponding to the tile identifier applies to; and reconstruct the level of detail level of the mesh data based on the signaling of the mesh patch data unit level of detail index per tile identifier and per patch index.
Example 44. The apparatus of example 43, wherein the apparatus is further caused to: infer a value of the mesh patch data unit level of detail index that is signaled per tile identifier and per patch index to be equal to zero, when the signaling of the mesh patch data unit level of detail index that is signaled per tile identifier and per patch index is not present.
Example 45. The apparatus of any of examples 16 to 30 or 42 to 44, wherein the apparatus is further caused to: receive signaling of level of detail extraction information with a level of detail extraction information payload supplemental enhancement information syntax element, wherein the level of detail extraction information signaled with the level of detail extraction information payload supplemental enhancement information syntax element indicates: an extractable unit type identifier that indicates a type of extractable units within the video-based dynamic mesh coding bitstream; a number of one or more submeshes; a submesh identifier per submesh; a subdivision iteration count per submesh; a motion constrained tile set identifier corresponding to a region, wherein the motion constrained tile set identifier is indicated per submesh, and per displacement data refinement level or subdivision iteration; and a subpicture identifier corresponding to a region, wherein the subpicture identifier is indicated per submesh, and per displacement data refinement level or subdivision iteration; and reconstruct the level of detail level of the mesh data based on the signaling of the level of detail extraction information with a level of detail extraction information payload supplemental enhancement information syntax element.
Example 46. The apparatus of example 45, wherein: a value of 0 for the extractable unit type index specifies that displacement video is encoded with motion-constrained tile sets, a value of 1 for the extractable unit type index specifies that displacement video is encoded with a subpicture, and the motion constrained tile set identifier is the same as: a motion constrained tile set index of a motion constrained tile set corresponding to a first index in a motion constrained tile set corresponding to a second index, wherein the motion constrained tile set corresponding to the second index is associated with an extraction information set corresponding to a third index.
References to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGAs), application specific circuits (ASICs), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, etc.
As used herein, the term ‘circuitry’, ‘circuit’ and variants may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and one or more memories that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even when the software or firmware is not physically present. As a further example, as used herein, the term ‘circuitry’ would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term ‘circuitry’ would also cover, for example and when applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device. Circuitry or circuit may also be used to mean a function or a process used to execute a method.
The term “non-transitory,” as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).
It should be understood that the foregoing description is only illustrative. Various alternatives and modifications may be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.
The following acronyms and abbreviations that may be found in the specification and/or the drawing figures are defined as follows (the abbreviations may be appended with each other or with other characters using e.g. a hyphen or dash (-), and may be case insensitive):
This application claims priority to U.S. Provisional Application No. 63/541,366, filed Sep. 29, 2023, which is hereby incorporated by reference in its entirety.