Embodiments of the present disclosure relate to a 3D data coding apparatus and a 3D data decoding apparatus.
In order to efficiently transmit or record 3D data, there are a 3D data coding apparatus that projects 3D data into a two-dimensional image, performs coding with a video coding scheme, and generates coded data, and a 3D data decoding apparatus that decodes a two-dimensional image from the coded data and reconstructs the 3D data.
As specific 3D data coding schemes, for example, there are MPEG-I Volumetric Video-based Coding (V3C) and Video-based Point Cloud Compression (V-PCC) (NPL 1). In V3C, in addition to a point cloud including positions of points and attribute information, a multi-view video can be coded and decoded. Existing video coding schemes include, for example, H.266/Versatile Video Coding (VVC), H.265/High Efficiency Video Coding (HEVC), and the like.
In the 3D data coding scheme of NPL 1, a geometry (depth image) and an attribute (color image) constituting 3D data (point cloud) are coded and decoded using the video coding schemes such as HEVC and VVC. In the 3D data coding scheme of NPL 2, a geometry (base mesh, mesh displacement (mesh displacement array, mesh displacement image)) and an attribute (texture mapping image) constituting 3D data (mesh) are coded and decoded using a vertex coding scheme such as Draco and the video coding schemes such as HEVC and VVC. There are experimental results that, in a case that the 3D data (mesh) is coded and decoded using the video coding schemes disclosed in NPL 2, replacing the video coding schemes with an arithmetic coding scheme to perform the coding of the mesh displacement enhances performance (NPL 3). In a case that the mesh displacement is arithmetically coded, there is a problem in that performance thereof depends on an initial value of context in arithmetic coding.
The present disclosure has an object to enhance coding efficiency of a mesh displacement and code and decode 3D data with high quality in coding and decoding of the 3D data using a video coding scheme.
In order to solve the problem described above, a 3D data decoding apparatus according to an aspect of the present disclosure is a 3D data decoding apparatus for decoding coded data. The 3D data decoding apparatus includes an arithmetic decoder configured to arithmetically decode mesh displacement from the coded data, a context selection unit configured to select a context in the arithmetic decoding, and a context initialization unit configured to set an initial value of the context. In the context initialization unit, a context initialization parameter for initializing the context is decoded from the coded data.
In order to solve the problem described above, a 3D data coding apparatus according to an aspect of the present disclosure is a 3D data coding apparatus for coding 3D data. The 3D data coding apparatus includes an arithmetic coder configured to arithmetically code mesh displacement, a context selection unit configured to select a context in the arithmetic coding, and a context initialization unit configured to set an initial value of the context. In the context initialization unit, a context initialization parameter for initializing the context is coded into coded data.
According to an aspect of the present disclosure, coding efficiency of a mesh displacement can be enhanced, and 3D data can be coded and decoded with high quality.
Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings.
The 3D data transmission system 1 is a system in which a coding stream obtained by coding target 3D data is transmitted, the transmitted coding stream is decoded, and thus 3D data is displayed. The 3D data transmission system 1 includes a 3D data coding apparatus 11, a network 21, a 3D data decoding apparatus 31, and a 3D data display apparatus 41.
3D data T is input to the 3D data coding apparatus 11.
The network 21 transmits a coding stream Te generated by the 3D data coding apparatus 11 to the 3D data decoding apparatus 31. The network 21 is the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or a combination thereof. The network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network configured to transmit broadcast waves of digital terrestrial television broadcasting, satellite broadcasting, or the like. The network 21 may be substituted by a storage medium in which the coding stream Te is recorded, such as a Digital Versatile Disc (DVD: trade name) or a Blu-ray Disc (BD: trade name).
The 3D data decoding apparatus 31 decodes each of the coding streams Te transmitted from the network 21 and generates one or multiple pieces of decoded 3D data Td.
The 3D data display apparatus 41 displays all or part of the one or multiple pieces of decoded 3D data Td generated by the 3D data decoding apparatus 31. For example, the 3D data display apparatus 41 includes a display device such as a liquid crystal display and an organic Electro-Luminescence (EL) display. Forms of the display include a stationary type, a mobile type, an HMD type, and the like. In a case that the 3D data decoding apparatus 31 has high processing capability, an image having high image quality is displayed, and in a case that the apparatus has lower processing capability, an image which does not require high processing capability and display capability is displayed.
Structure of Coding Stream Te
Prior to the detailed description of the 3D data coding apparatus 11 and the 3D data decoding apparatus 31 according to the present embodiment, a data structure of the coding stream Te generated by the 3D data coding apparatus 11 and decoded by the 3D data decoding apparatus 31 will be described.
Coded Video Sequence
In the coded video sequence, a set of data referred to by the 3D data decoding apparatus 31 to decode the sequence SEQ to be processed is defined. As illustrated in the coded video sequence of
In the video parameter set VPS, in a video including multiple layers, a set of coding parameters common to multiple videos and a set of coding parameters associated with the multiple layers and an individual layer included in the video are defined.
In the sequence parameter set SPS, a set of coding parameters referred to by the 3D data decoding apparatus 31 to decode a target sequence is defined. For example, a width and a height of a picture are defined. Note that multiple SPSs may exist. In that case, any of the multiple SPSs is selected from the PPS.
In the picture parameter set PPS, a set of coding parameters referred to by the 3D data decoding apparatus 31 to decode each picture in a target sequence is defined. For example, a reference value (pic_init_qp_minus26) of a quantization step size used for decoding of a picture and a flag (weighted_pred_flag) indicating an application of a weighted prediction are included. Note that multiple PPSs may exist. In that case, any of the multiple PPSs is selected from each picture in a target sequence.
Coded Picture
In the coded picture, a set of data referred to by the 3D data decoding apparatus 31 to decode the picture PICT to be processed is defined. As illustrated in the coded picture of
Coding Slice
In the coding slice, a set of data referred to by the 3D data decoding apparatus 31 to decode the slice S to be processed is defined. As illustrated in the coding slice of
The slice header includes a coding parameter group referred to by the 3D data decoding apparatus 31 to determine a decoding method for a target slice. Slice type indication information (slice_type) indicating a slice type is one example of a coding parameter included in the slice header.
Coding Slice Data
In the coding slice data, a set of data referred to by the 3D data decoding apparatus 31 to decode the slice data to be processed is defined. The slice data includes CTUs as illustrated in the coding slice header in
Coding Tree Unit
In the coding tree unit of
Coding Unit
As illustrated in the coding unit of
There are two types of predictions (prediction modes), which are intra prediction and inter prediction. The intra prediction refers to a prediction in an identical picture, and the inter prediction refers to prediction processing performed between different pictures (for example, between pictures of different display times, and between pictures of different layer images).
Transform and quantization processing is performed in units of CU, but the quantization transform coefficient may be subjected to entropy coding in units of subblock such as 4×4.
Configuration of 3D Data Decoding Apparatus According to First Embodiment
The de-multiplexing unit 301 inputs coded data multiplexed using a byte stream format, an ISO Base Media File Format (ISOBMFF), or the like, de-multiplexes the coded data, and outputs an atlas information coding stream, a base mesh coding stream, a mesh displacement coding stream, and an attribute image coding stream.
The atlas information decoder 302 inputs the atlas information coding stream output from the de-multiplexing unit 301 and decodes atlas information.
The base mesh decoder 303 decodes the base mesh coding stream coded using vertex coding (a 3D data compression coding scheme, such as Draco), and outputs a base mesh. The base mesh will be described below.
The mesh displacement decoder 305 decodes the mesh displacement coding stream and outputs a mesh displacement.
The mesh reconstruction unit 307 inputs the base mesh and the mesh displacement and reconstructs a mesh in a 3D space.
The attribute decoder 306 decodes the attribute image coding stream coded using VVC, HEVC, or the like, and outputs an attribute image of a YCbCr format. The attribute image may be a texture image developed along UV axes (a texture mapping image converted using a UV atlas method).
The color space conversion processing unit 308 performs color space conversion on the attribute image from the YCbCr format to an RGB format. Note that the attribute image coding stream coded as the RGB format may be decoded and the color space conversion may be omitted.
Decoding of Base Mesh
The mesh decoder 3031 decodes the intra-coded base mesh coding stream and outputs the base mesh. As a coding scheme, Draco or the like is used.
The motion information decoder 3032 decodes the inter-coded base mesh coding stream, and outputs motion information for each vertex of a reference mesh to be described below. As a coding scheme, entropy coding such as arithmetic coding is used.
The mesh motion compensation unit 3033 performs motion compensation on each vertex of the reference mesh input from the reference mesh memory 3034, based on the motion information, and outputs a mesh with its motion being compensated.
The reference mesh memory 3034 is a memory that stores the decoded mesh, so as to be referred to in later decoding processing.
Decoding of Mesh Displacement
The atlas information decoder 302 of
Context-Adaptive Binary Arithmetic Coding
The arithmetic decoder 3051, the de-binarization unit 3052, the context selection unit 3056, and the context initialization unit 3057 use a decoding method referred to as Context-adaptive binary arithmetic coding (CABAC). These may be collectively referred to as a CABAC decoder. In CABAC, all of the CABAC states are initialized at a start of a segment. In the CABAC decoder, each bit of a binary string (Bin String) corresponding to a syntax element is decoded. In a case that context is used, a context increment ctxInc is derived for each bit of the syntax element, the bit is decoded using the context, and the CABAC state of the used context is updated. Bits that do not use the context are decoded at an equal probability (EP, bypass), and derivation of ctxInc and updating of the CABAC state are omitted. The context is a variable area for storing a probability (state) of CABAC, and is identified with a value (0, 1, 2, . . . ) of ctxIdx. A case that 0 and 1 invariably have equal probabilities, i.e., 0.5 and 0.5, is referred to as EqualProbability (EP) or bypass. In this case, the state need not be stored for a specific syntax element, and thus context is not used. ctxIdx is derived with reference to ctxInc.
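As a non-normative sketch, the distinction above between context-coded bins and bypass bins can be modeled as follows; the probability update rule, the rate constant, and the class names are simplifying assumptions and do not reproduce the table-driven CABAC state machine itself.

```python
# Simplified model of context-adaptive decoding state (not the normative
# CABAC engine): each context stores an occurrence probability of bin "1",
# updated after every decoded bin; bypass bins use a fixed probability 0.5
# and keep no state.

class Context:
    def __init__(self, p_one=0.5):
        self.p_one = p_one  # current estimated probability that the bin is 1

    def update(self, bin_val, rate=0.05):
        # Exponential smoothing toward the observed bin value (stands in
        # for the table-driven CABAC state transition).
        self.p_one += rate * (bin_val - self.p_one)

def decode_bin(ctx, bin_val):
    """Decode one bin: with a context, the state is updated afterwards;
    in bypass (ctx is None) no state is stored or updated."""
    if ctx is None:
        return bin_val  # bypass: equal probability, no state update
    ctx.update(bin_val)
    return bin_val

ctx = Context()
for b in (1, 1, 1, 0):
    decode_bin(ctx, b)
# After mostly 1s, the estimated probability of "1" has moved above 0.5.
```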
Coordinate System
For coordinate systems of the mesh displacement (three-dimensional vector), the following two types of coordinate systems are used.
Cartesian coordinate system: An orthogonal (X, Y, Z) coordinate system defined in common in the entire 3D space, in which the axis directions do not change at the same time (within the same frame, within the same tile).
Local coordinate system: An orthogonal coordinate system defined for each region or for each vertex in the 3D space, in which the axis directions may change at the same time (within the same frame, within the same tile). This is a normal (D), tangent (U), bi-tangent (V) coordinate system. In other words, this is an orthogonal coordinate system including a first axis (D) indicated by a normal vector n_vec at a certain vertex (a plane including the certain vertex), and a second axis (U) and a third axis (V) indicated by two tangent vectors t_vec and b_vec orthogonal to the normal vector n_vec. n_vec, t_vec, and b_vec are each a three-dimensional vector. The (D, U, V) coordinate system may be referred to as an (n, t, b) coordinate system.
Decoding and Derivation of Control Parameters at Sequence Level
Here, control parameters used in the mesh displacement decoder 305 will be described.
asps_vdmc_ext_displacement_coordinate_system: Coordinate system conversion information indicating the coordinate system of the mesh displacement. In a case that a value thereof is equal to a prescribed first value (for example, 0), this indicates the Cartesian coordinate system. In a case that the value is equal to another second value (for example, 1), this indicates the local coordinate system.
asps_vdmc_ext_displacement_context_init_type: Context initialization timing information. This indicates the initialization timing of the context in arithmetic decoding of the mesh displacement. In a case that a value thereof is equal to a first value (for example, 0), the context is initialized for each GoF. In a case that the value is equal to a second value (for example, 1), the context is initialized for each segment (for example, the segment is a frame or a slice constituting the frame). Specifically, the mesh displacement decoder 305 initializes the context in a case that asps_vdmc_ext_displacement_context_init_type==1 (or asps_vdmc_ext_displacement_context_init_type==0 and frameIdxInGoF==0), where frameIdxInGoF is a variable indicating a frame position in the GoF.
In another configuration, in a case that asps_vdmc_ext_displacement_context_init_type is the first value, the context is initialized only at a start of a segment that is a random access point, and in a case that asps_vdmc_ext_displacement_context_init_type is the second value, the context is invariably initialized at a start of a segment. Note that whether the segment is the random access point may be determined based on whether nal_unit_type is a specific type. For example, the determination may be performed based on whether or not nal_unit_type is equal to NAL_GIDR_W_RADL, NAL_GBLA_N_LP, or NAL_GCRA. Alternatively, the segment may be determined to be the random access point in a case that nal_unit_type is in a range from NAL_GBLA_W_LP to NAL_GBLA_N_LP, or from NAL_GIDR_W_RADL to NAL_GIDR_N_LP.
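The two initialization timings and the random access determination described above can be sketched as follows; the NAL unit type names are taken from the examples in the text and are represented as strings purely for illustration.

```python
# Decision logic for when to (re)initialize the arithmetic-coding contexts,
# following the two timing modes described above. Only the structure of the
# check is taken from the text; the NAL type set is an illustrative subset.

RANDOM_ACCESS_NAL_TYPES = {"NAL_GIDR_W_RADL", "NAL_GBLA_N_LP", "NAL_GCRA"}

def should_init_context(init_type, frame_idx_in_gof, nal_unit_type=None):
    if init_type == 1:
        # Second value: invariably initialize at the start of each segment.
        return True
    # First value: initialize only at the start of a GoF
    # (frameIdxInGoF == 0) or at a random access point, determined
    # from nal_unit_type.
    if frame_idx_in_gof == 0:
        return True
    return nal_unit_type in RANDOM_ACCESS_NAL_TYPES
```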
According to the above-described configuration of performing initialization for each GoF, i.e., at each random access timing, deterioration in probability prediction accuracy due to initialization can be minimized, and therefore there is an effect of enhancing coding efficiency. In addition, according to the configuration of selecting initialization for each frame using a flag, there is also an effect of enhancing error tolerance.
asps_vdmc_ext_displacement_context_init_index: This indicates an index of a context initial value table in arithmetic decoding of the mesh displacement.
Decoding and Derivation of Control Parameters at Picture/Frame Level
The mesh displacement decoder 305 derives the coordinate system conversion parameter displacementCoordinateSystem as follows.
Alternatively, in a case that the syntax element appears at multiple levels, the coordinate system conversion parameter displacementCoordinateSystem may be derived by overwriting with a lower level value.
Derivation of Context Initialization Parameters
The mesh displacement decoder 305 derives context initialization parameters displacementContextInitType and displacementContextInitIndex as follows.
Alternatively, in a case that the syntax element appears at multiple levels, the context initialization parameters displacementContextInitType and displacementContextInitIndex may be derived by overwriting with a lower level value.
The mesh displacement decoder 305 may derive the context initialization parameters displacementContextInitType and displacementContextInitIndex every time the mesh displacement decoder 305 decodes each of the context initialization parameters. The gating flag may be, for example, afps_vdmc_ext_displacement_context_init_enable_flag. The context initialization parameters may be, for example,
gt0_flag is a flag indicating whether or not the absolute value of a mesh displacement coefficient is greater than 0. gt1_flag is a flag indicating whether or not the absolute value of the mesh displacement coefficient is greater than 1. rem_prefix is a prefix part of a Golomb code of the mesh displacement coefficient. rem_suffix is a suffix part of the Golomb code of the mesh displacement coefficient.
Operation of Mesh Displacement Decoder
The arithmetic decoder 3051 decodes the arithmetically coded mesh displacement coding stream, and outputs a binary signal. The binary signal may be a k-th order exponential Golomb code (k-th order Exp-Golomb-code).
The de-binarization unit 3052 decodes a quantized mesh displacement Qdisp being a multivalue signal from the binary signal.
The context selection unit 3056 includes a memory for storing contexts, and updates various contexts used for arithmetic decoding of the mesh displacement depending on a state. In arithmetic decoding of each coefficient of the mesh displacement, the following different context arrays may be used depending on a frame type ft (for example, 0: intra frame, 1: inter frame), a level lod (level of detail) of mesh subdivision, and a dimension dim of a mesh displacement vector. The context at least includes a variable indicating an occurrence probability of the binary signal.
Here, NUM_FT is the number of frame types, and NUM_FT=2. NUM_LOD is a maximum number of levels of mesh subdivision, and NUM_LOD=4. NUM_DIM is the number of dimensions of the mesh displacement vector, and NUM_DIM=3.
ctxSign[NUM_FT][NUM_LOD][NUM_DIM] is a context array used to decode the syntax element sign_flag. The arithmetic decoder 3051 decodes sign_flag of the displacement of the frame type ft, the level lod, and the dimension dim of the mesh displacement vector by using a value of ctxSign[ft][lod][dim]. Although an example of using the context is described herein, bypass may be used without using the context. In the configuration of using bypass, there is an effect of reducing the memory for the contexts and the amount of processing.
ctxCoeffGtN[NUM_FT][NUM_LOD][2][NUM_DIM] is a context array used to decode the syntax elements gt0_flag and gt1_flag. The arithmetic decoder 3051 decodes gt0_flag and gt1_flag of the displacement of the frame type ft, the level lod, and the dimension dim of the mesh displacement vector by using values of ctxCoeffGtN[ft][lod][0][dim] and ctxCoeffGtN[ft][lod][1][dim].
ctxCoeffRemPrefix[NUM_FT][NUM_LOD][NUM_DIM][7] is a context array used to decode the syntax element rem_prefix. The arithmetic decoder 3051 decodes rem_prefix[ft][lod][dim] of the displacement of the frame type ft, the level lod, and the dimension dim of the mesh displacement vector by using a value of ctxCoeffRemPrefix[ft][lod][dim][binIdx]. Here, binIdx indicates a bin position of binary of rem_prefix.
ctxCoeffRemSuffix[NUM_FT][NUM_LOD][NUM_DIM][7] is a context array used to decode the syntax element rem_suffix. The arithmetic decoder 3051 decodes rem_suffix[ft][lod][dim] of the displacement of the frame type ft, the level lod, and the dimension dim of the mesh displacement vector by using a value of ctxCoeffRemSuffix[ft][lod][dim][binIdx]. Here, binIdx indicates a bin position of binary of rem_suffix.
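A minimal sketch of allocating and indexing the four context arrays with the dimensions stated above (NUM_FT=2, NUM_LOD=4, NUM_DIM=3); representing each context by its occurrence probability alone is a simplifying assumption.

```python
# Allocation of the four context arrays with the dimensions given above.
# Each context is represented here simply by its probability, initialized
# to 0.5 (0x8000 in the document's 16-bit fixed-point notation).

NUM_FT, NUM_LOD, NUM_DIM = 2, 4, 3
DEFAULT_P = 0x8000 / (1 << 16)  # p = 0.5

def make_ctx(*shape):
    # Build a nested list of the given shape filled with DEFAULT_P.
    if len(shape) == 1:
        return [DEFAULT_P] * shape[0]
    return [make_ctx(*shape[1:]) for _ in range(shape[0])]

ctxSign           = make_ctx(NUM_FT, NUM_LOD, NUM_DIM)
ctxCoeffGtN       = make_ctx(NUM_FT, NUM_LOD, 2, NUM_DIM)
ctxCoeffRemPrefix = make_ctx(NUM_FT, NUM_LOD, NUM_DIM, 7)
ctxCoeffRemSuffix = make_ctx(NUM_FT, NUM_LOD, NUM_DIM, 7)

# Selecting the context for gt1_flag of an inter frame (ft=1), lod=2, dim=0:
p = ctxCoeffGtN[1][2][1][0]
```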
The context initialization unit 3057 initializes the context (occurrence probability of the binary signal), based on the parameters (here, the context initialization parameters displacementContextInitType and displacementContextInitIndex) decoded from the coded data.
In a case that the value of displacementContextInitType is equal to a first value (for example, 0), the context is initialized for each GoF. In a case that the value of displacementContextInitType is equal to a second value (for example, 1), the context is initialized for each frame.
As described above, the initialization timing and the initial value of the context can be set depending on the context initialization parameters. In a case that the context is initialized for each frame, there is no dependency on the context between frames, and therefore random access to any frame can be easily performed, and coding efficiency can be enhanced. In a case that the context is initialized for each GoF, coding efficiency can be further enhanced in comparison to a case that the context is initialized for each frame.
An example of the context initial value table will be described below. Note that each value “value” in an initialization table is such a value that 16 bits after the decimal point of an occurrence probability p are expressed in hexadecimal numbers, and p=value/(1<<16). For example, value=0x8000 indicates occurrence probability p=0.5.
Example of Initialization Table for sign_flag
Example of Initialization Table for gt0_flag and gt1_flag
Example of Initialization Table for rem_prefix
Example of Initialization Table for rem_suffix
The context initialization unit 3057 sets the initial value of the context used to code and decode binary of sign_flag, gt0_flag, gt1_flag, rem_prefix, and rem_suffix by using values of LUT_ctxSign[ft][lod][dim], LUT_ctxCoeffGtN[ft][lod][ ][dim], LUT_ctxCoeffRemPrefix[ft][lod][dim][ ], and LUT_ctxCoeffRemSuffix[ft][lod][dim][ ], respectively, depending on the frame type ft, the level lod of mesh subdivision, and the dimension dim of the mesh displacement vector. Note that, in a case that the level lod of mesh subdivision exceeds NUM_LOD−1, initialization is performed by using a default value (for example, value=0x8000, p=0.5).
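The table-based initialization with the default fallback can be sketched as follows; the table contents are dummy values (and only LUT_ctxSign is shown), so only the lookup-and-fallback structure follows the description above.

```python
# Context initialization from an initial-value table, with the default
# fallback (value = 0x8000, p = 0.5) when lod exceeds NUM_LOD - 1.
# The fixed-point convention is p = value / (1 << 16).

NUM_FT, NUM_LOD, NUM_DIM = 2, 4, 3
DEFAULT_VALUE = 0x8000

# Dummy table LUT_ctxSign[ft][lod][dim]; real tables are content-tuned.
LUT_ctxSign = [[[0x6000 + 0x800 * d for d in range(NUM_DIM)]
                for _ in range(NUM_LOD)] for _ in range(NUM_FT)]

def init_sign_context(ft, lod, dim):
    if lod > NUM_LOD - 1:
        value = DEFAULT_VALUE          # out-of-range lod: default p = 0.5
    else:
        value = LUT_ctxSign[ft][lod][dim]
    return value / (1 << 16)           # occurrence probability p
```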
The initial value of the context is switched depending on the features ft, lod, and dim of the mesh displacement, and therefore coding efficiency can be enhanced.
Configuration of Switching Context Initialization Depending on displacementContextInitIndex
In addition, initialization may be performed by using displacementContextInitIndex decoded from coded data. The context initialization unit 3057 switches the initial value of the context used to code and decode binary of sign_flag, gt0_flag, gt1_flag, rem_prefix, and rem_suffix depending on the context initialization index displacementContextInitIndex, the frame type ft, the level lod of mesh subdivision, and the dimension dim of the mesh displacement vector. Specifically, initialization is performed by using values of LUT_ctxSign[displacementContextInitIndex][ft][lod][dim], LUT_ctxCoeffGtN[displacementContextInitIndex][ft][lod][ ][dim], LUT_ctxCoeffRemPrefix[displacementContextInitIndex][ft][lod][dim][ ], and LUT_ctxCoeffRemSuffix[displacementContextInitIndex][ft][lod][dim][ ]. With the encoder appropriately selecting displacementContextInitIndex corresponding to an optimal context initial value and transmitting displacementContextInitIndex as syntax, a context more appropriate for the contents can be derived, and therefore coding efficiency can be enhanced.
Configuration of Switching Context Initialization Depending on displacementCoordinateSystem
The context initialization unit 3057 may switch the context initialization method, depending on whether the coordinate system is the Cartesian coordinate system or the local coordinate system.
For example, with sys=displacementCoordinateSystem, the context initialization unit 3057 sets the initial value of the context used to code and decode binary of sign_flag, gt0_flag, gt1_flag, rem_prefix, and rem_suffix. For example, initialization is performed by using values of LUT_ctxSign[sys][displacementContextInitIndex][ft][lod][dim], LUT_ctxCoeffGtN[sys][displacementContextInitIndex][ft][lod][ ][dim], LUT_ctxCoeffRemPrefix[sys][displacementContextInitIndex][ft][lod][dim][ ], and LUT_ctxCoeffRemSuffix[sys][displacementContextInitIndex][ft][lod][dim][ ], respectively, depending on the context initialization index displacementContextInitIndex, the coordinate system sys, the frame type ft, the level lod of mesh subdivision, and the dimension dim of the mesh displacement vector.
According to the configuration described above, the context initial value is changed depending on the syntax element displacementCoordinateSystem indicating the coordinate system in coded data, and an appropriate initial value can be derived depending on the coordinate system, and therefore coding efficiency can be enhanced. The local coordinate system (n, t, b) exhibits a greater change of displacement corresponding to a small value of dim, in comparison to the Cartesian coordinate system (x, y, z). Thus, it is appropriate that the initial value be set taking such a change into consideration.
The (x, y, z) coordinate system may allow use of the same context initial value regardless of the value of dim. In a case that displacementCoordinateSystem indicates the Cartesian coordinate system, the amount of memory of the context initial value table can be reduced by referring to the context initial value table with dim=0.
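The memory reduction described above can be sketched as an index-mapping helper; the coordinate system codes (0: Cartesian, 1: local) follow the definition of asps_vdmc_ext_displacement_coordinate_system given earlier.

```python
# Memory-saving lookup: in the Cartesian coordinate system the same
# initial value is used regardless of dim, by always referring to the
# table entry with dim = 0; the local coordinate system keeps
# per-dimension entries.

def table_dim(displacement_coordinate_system, dim):
    if displacement_coordinate_system == 0:  # Cartesian: share dim = 0 entry
        return 0
    return dim                               # local: per-dimension entries
```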
The inverse quantization unit 3053 performs inverse quantization, based on a quantization scale value iscale, and derives a mesh displacement Tdisp after transform (for example, wavelet transform). Tdisp may be the Cartesian coordinate system or the local coordinate system. iscale is a value derived from a quantization parameter of each component of the mesh displacement image.
Here, iscaleOffset=1<<(iscaleShift−1). iscaleShift may be a constant determined in advance, or may be a value coded at a sequence level, a picture/frame level, a tile/patch level, or the like and decoded from the coded data.
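A possible fixed-point inverse quantization consistent with the offset definition above; the exact rounding formula is an assumption, since only iscaleOffset is defined in this excerpt.

```python
# Fixed-point inverse quantization: scale the quantized displacement by
# iscale, add the rounding offset iscaleOffset = 1 << (iscaleShift - 1),
# and shift right by iscaleShift. The combination is an assumed formula.

def inverse_quantize(qdisp, iscale, iscale_shift):
    iscale_offset = 1 << (iscale_shift - 1)      # rounding offset
    return (qdisp * iscale + iscale_offset) >> iscale_shift
```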
The inverse transform processing unit 3054 performs inverse transform g (for example, inverse wavelet transform) and derives a mesh displacement d.
The coordinate system conversion processing unit 3055 converts the mesh displacement (the coordinate system of the mesh displacement) into the Cartesian coordinate system, based on the value of the coordinate system conversion parameter displacementCoordinateSystem. Specifically, in a case that displacementCoordinateSystem=1, conversion is performed from a displacement of the local coordinate system to a displacement of the Cartesian coordinate system. Here, d is a three-dimensional vector indicating the mesh displacement before coordinate system conversion. disp is a three-dimensional vector indicating the mesh displacement after coordinate system conversion, and is the Cartesian coordinate system. n_vec, t_vec, and b_vec are three-dimensional vectors (of the Cartesian coordinate system) corresponding to respective axes of the local coordinate system of a target region or a target vertex.
The derivation method shown in the above vector multiplication is individually expressed with a scalar as follows.
Note that the same variable name may be assigned before and after conversion with disp=d, and the value of d may be updated with coordinate conversion.
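Assuming the conversion expands d=(D, U, V) along the axis vectors as described above, a scalar-form sketch is as follows; the componentwise formula is an assumption consistent with the vector description.

```python
# Local (n, t, b) to Cartesian conversion: the decoded displacement
# d = (D, U, V) is expanded along the axis vectors n_vec, t_vec, b_vec,
# which are themselves three-dimensional Cartesian vectors.

def local_to_cartesian(d, n_vec, t_vec, b_vec):
    disp = [0.0, 0.0, 0.0]
    for k in range(3):  # scalar form of the vector expression
        disp[k] = d[0] * n_vec[k] + d[1] * t_vec[k] + d[2] * b_vec[k]
    return disp
```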
Alternatively, the following configuration may be employed.
Here, n_vec2, t_vec2, and b_vec2 are three-dimensional vectors (of the Cartesian coordinate system) corresponding to respective axes of the local coordinate system of a neighboring region.
Alternatively, the following configuration may be employed.
Here, n_vec3, t_vec3, and b_vec3 are three-dimensional vectors (of the Cartesian coordinate system) corresponding to respective axes of the local coordinate system of a target region with a reduced variation. For example, vectors of the coordinate system used for decoding are derived from the previous coordinate system and the current coordinate system as follows.
Here, for example, wShift=2, 3, or 4, WT=1<<wShift, and w=1 . . . WT−1. For example, in a case that w=3 and wShift=3,
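One plausible weighted blending of the previous and current axis vectors using the weights defined above (WT=1<<wShift, w=1 . . . WT−1); the normative formula is not reproduced in this excerpt, so this sketch is an assumption that only illustrates how w and wShift could combine the two coordinate systems.

```python
# Hypothetical weighted average of one axis vector of the previous
# coordinate system (prev_vec) and the current one (cur_vec),
# normalized by WT = 1 << w_shift.

def blend_axis(prev_vec, cur_vec, w, w_shift):
    wt = 1 << w_shift
    return [(w * p + (wt - w) * c) / wt for p, c in zip(prev_vec, cur_vec)]
```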
As in the following configuration, a configuration allowing selection depending on the value of the parameter displacementCoordinateSystem decoded from coded data may be employed.
Reconstruction of Mesh
The mesh subdivision unit 3071 subdivides the base mesh output from the base mesh decoder 303 and generates a subdivided mesh.
v12=(v1+v2)/2
v13=(v1+v3)/2
v23=(v2+v3)/2
Alternatively, the following may be employed.
v12=(v1+v2+1)>>1
v13=(v1+v3+1)>>1
v23=(v2+v3+1)>>1
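The midpoint subdivision above can be sketched per triangle as follows, using the integer rounded form (v1+v2+1)>>1 applied per coordinate.

```python
# Midpoint subdivision of one triangle (v1, v2, v3): each edge midpoint
# is computed componentwise with the rounded integer form shown above.

def midpoint(a, b):
    return [(ai + bi + 1) >> 1 for ai, bi in zip(a, b)]

def subdivide_triangle(v1, v2, v3):
    v12 = midpoint(v1, v2)
    v13 = midpoint(v1, v3)
    v23 = midpoint(v2, v3)
    return v12, v13, v23
```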
The mesh deformation unit 3072 inputs the subdivided mesh and the mesh displacement, adds mesh displacements d12, d13, and d23, to thereby generate and output a deformed mesh (
v12′=v12+d12
v13′=v13+d13
v23′=v23+d23
Note that the following may be employed: d12=disp[0][ ], d13=disp[1][ ], and d23=disp[2][ ].
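The deformation step above (v12′=v12+d12, and so on) is a componentwise addition of the decoded mesh displacement to each subdivided vertex; a minimal sketch:

```python
# Deform one subdivided vertex by its decoded mesh displacement,
# per the equations v12' = v12 + d12, etc., applied per coordinate.

def deform(vertex, displacement):
    return [v + d for v, d in zip(vertex, displacement)]
```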
Configuration of 3D Data Coding Apparatus According to First Embodiment
The atlas information coder 101 codes the atlas information.
The base mesh coder 103 codes the base mesh, and outputs a base mesh coding stream. As a coding scheme, Draco or the like is used.
The base mesh decoder 104 is similar to the base mesh decoder 303, and thus description thereof will be omitted.
The mesh displacement update unit 106 adjusts the mesh displacement, based on the (original) base mesh and the decoded base mesh, and outputs an updated mesh displacement.
The mesh displacement coder 107 codes the updated mesh displacement, and outputs a mesh displacement coding stream.
The mesh displacement decoder 108 is similar to the mesh displacement decoder 305, and thus description thereof will be omitted.
The mesh reconstruction unit 109 is similar to the mesh reconstruction unit 307, and thus description thereof will be omitted.
The attribute transfer unit 110 inputs the (original) mesh, the reconfigured mesh output from the mesh reconstruction unit 109 (mesh deformation unit 3072), and the attribute image, and outputs an attribute image optimized for the reconfigured mesh.
The padding unit 111 inputs the optimized attribute image, and performs padding processing in a region with empty pixel values.
The color space conversion processing unit 112 performs color space conversion from the RGB format to the YCbCr format.
The attribute coder 113 codes the attribute image of the YCbCr format output from the color space conversion processing unit 112, and outputs an attribute image coding stream. As a coding scheme, VVC, HEVC, or the like is used.
The multiplexing unit 114 multiplexes the atlas information coding stream, the base mesh coding stream, the mesh displacement coding stream, and the attribute image coding stream, and outputs these as coded data. As a multiplexing scheme, a byte stream format, an ISOBMFF, or the like is used.
Operation of Mesh Separation Unit
The mesh separation unit 115 generates the base mesh and the mesh displacement from the mesh.
The mesh decimation unit 1151 decimates a part of the vertices from the mesh, to thereby generate a base mesh.
Similarly to the mesh subdivision unit 3071, the mesh subdivision unit 1152 subdivides the base mesh to generate a subdivided mesh (
v4′=(v1+v2)/2
v5′=(v1+v3)/2
v6′=(v2+v3)/2
The mesh displacement derivation unit 1153 derives, as the mesh displacements, the displacements d4, d5, and d6 of the vertices v4, v5, and v6 relative to the vertices v4′, v5′, and v6′, based on the mesh and the subdivided mesh, and outputs the mesh displacements:
d4=v4−v4′
d5=v5−v5′
d6=v6−v6′
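The midpoint subdivision and displacement derivation above can be sketched as follows. This is an illustrative Python sketch; the function names and the tuple representation of vertices are not part of the scheme, only the equations v′=(va+vb)/2 and d=v−v′ are taken from the description.

```python
def midpoint(a, b):
    """Midpoint of two 3D vertices given as (x, y, z) tuples, e.g. v4' = (v1 + v2) / 2."""
    return tuple((ai + bi) / 2.0 for ai, bi in zip(a, b))

def subdivide_triangle(v1, v2, v3):
    """Insert a predicted vertex at the midpoint of each edge: returns (v4', v5', v6')."""
    return midpoint(v1, v2), midpoint(v1, v3), midpoint(v2, v3)

def displacement(v, v_pred):
    """Mesh displacement: original vertex minus its subdivided prediction (d = v - v')."""
    return tuple(vi - pi for vi, pi in zip(v, v_pred))
```

For example, for a base triangle v1=(0,0,0), v2=(2,0,0), v3=(0,2,0), the predicted vertex v4′ is (1,0,0), and an original vertex v4=(1,0,1) yields the displacement d4=(0,0,1).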
Coding of Base Mesh
The mesh coder 1031 has a base mesh intra-coding function, and intra-codes the base mesh and outputs a base mesh coding stream. As a coding scheme, Draco or the like is used.
The mesh decoder 1032 is similar to the mesh decoder 3031, and thus description thereof will be omitted.
The motion information coder 1033 has a base mesh inter-coding function, and inter-codes the base mesh and outputs a base mesh coding stream. As a coding scheme, entropy coding such as arithmetic coding is used.
The motion information decoder 1034 is similar to the motion information decoder 3032, and thus description thereof will be omitted.
The mesh motion compensation unit 1035 is similar to the mesh motion compensation unit 3033, and thus description thereof will be omitted.
The reference mesh memory 1036 is similar to the reference mesh memory 3034, and thus description thereof will be omitted.
Coding of Mesh Displacement
The coordinate system conversion processing unit 1071 converts the coordinate system of the mesh displacement from the Cartesian coordinate system to a coordinate system (for example, the local coordinate system) for coding the displacement, based on the value of the coordinate system conversion parameter displacementCoordinateSystem. Here, disp is a three-dimensional vector indicating the mesh displacement before coordinate system conversion, d is a three-dimensional vector indicating the mesh displacement after coordinate system conversion, and n_vec, t_vec, and b_vec are three-dimensional vectors (of the Cartesian coordinate system) indicating respective axes of the local coordinate system.
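The conversion from the Cartesian coordinate system to the local coordinate system can be sketched as the projection of disp onto the axes n_vec, t_vec, and b_vec. This is an illustrative sketch assuming the three axes are orthonormal; the function names are not from the specification.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def to_local(disp, n_vec, t_vec, b_vec):
    """Express a Cartesian displacement disp in the local frame by projecting it
    onto the (assumed orthonormal) normal, tangent, and bitangent axes,
    yielding the converted displacement d."""
    return (dot(disp, n_vec), dot(disp, t_vec), dot(disp, b_vec))
```

When the local axes coincide with the Cartesian axes, the conversion is the identity, which is consistent with displacementCoordinateSystem=0 indicating the Cartesian coordinate system.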
The mesh displacement coder 107 may update the value of displacementCoordinateSystem at a sequence level. Alternatively, the mesh displacement coder 107 may update the value at a picture/frame level. The initial value is 0, which indicates the Cartesian coordinate system.
In a case that displacementCoordinateSystem is updated at a sequence level, the syntax of the configuration of
In a case that displacementCoordinateSystem is changed at a picture/frame level, the syntax of the configuration of
The transform processing unit 1072 performs transform f (for example, wavelet transform) to derive the mesh displacement Tdisp after transform.
Tdisp[0][ ]=f(d[0][ ])
Tdisp[1][ ]=f(d[1][ ])
Tdisp[2][ ]=f(d[2][ ])
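As an illustration of applying the transform f to one component array, the following is one level of a simple Haar-style lifting transform. This is only a stand-in to show the predict/update structure; the actual wavelet used by the codec may differ.

```python
def wavelet_forward(x):
    """One level of a Haar-style lifting transform over a list of samples.

    Illustrative stand-in for the transform f applied to each component of d:
    odd samples are predicted from even samples (detail), and the even samples
    are updated with half the detail (approximation)."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]           # predict step
    approx = [e + d / 2.0 for e, d in zip(even, detail)]  # update step
    return approx + detail
```

A constant signal produces zero detail coefficients, which is what makes smooth displacement fields cheap to code after quantization.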
The quantization unit 1073 performs quantization, based on a quantization scale value scale derived from a quantization parameter of each component of the mesh displacement, and derives the mesh displacement Qdisp after quantization.
Alternatively, the scale value may be approximated with a power of 2, and Qdisp may be derived according to the following equations.
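The two quantization variants can be sketched as follows. This is an illustrative sketch (round-to-nearest with sign handling, assuming shift ≥ 1 for the power-of-2 case); the exact codec equations may differ.

```python
def quantize(tdisp, scale):
    """Round-to-nearest quantization of a transformed displacement by a scale value."""
    q = abs(tdisp) / scale + 0.5
    return int(q) if tdisp >= 0 else -int(q)

def quantize_pow2(tdisp, shift):
    """Same rounding with the scale approximated as 2**shift (shift >= 1),
    so the division is replaced by an integer right shift."""
    q = (abs(int(tdisp)) + (1 << (shift - 1))) >> shift
    return q if tdisp >= 0 else -q
```

Approximating the scale with a power of 2 lets the quantizer avoid a division, at the cost of restricting the available scale values.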
The binarization unit 1074 binarizes the quantized mesh displacement Qdisp, which is a multi-value signal, into a binary signal. The binary signal may be a k-th order exponential Golomb code.
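The k-th order exponential Golomb binarization can be sketched as follows. The zigzag mapping for signed values is one common convention and is shown as an assumption; the codec's actual sign handling may differ.

```python
def zigzag(v):
    """Map a signed value to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    (one common convention, assumed here for illustration)."""
    return 2 * v if v >= 0 else -2 * v - 1

def exp_golomb_k(value, k):
    """k-th order exponential Golomb code of a non-negative integer, as a bit string:
    a prefix of zeros followed by the binary form of value + 2**k."""
    code = value + (1 << k)                 # offset so the code word starts with a 1
    prefix_zeros = code.bit_length() - k - 1
    return '0' * prefix_zeros + format(code, 'b')
```

For k=0, the values 0, 1, 2 binarize to 1, 010, 011; increasing k shortens the codes for larger magnitudes, which suits displacement components with wider value ranges.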
The arithmetic coder 1075 arithmetically codes the binary signal and outputs a mesh displacement coding stream.
The context selection unit 1076 is similar to the context selection unit 3056, and thus description thereof will be omitted.
The context initialization unit 1077 is similar to the context initialization unit 3057, and thus description thereof will be omitted.
The mesh displacement coder 107 may update the value of displacementContextInitType at a sequence level. Alternatively, the mesh displacement coder 107 may update the value at a picture/frame level.
In a case that context initialization parameters are updated at a sequence level, the syntax of the configuration of
In a case that the context initialization parameters are changed at a picture/frame level, the syntax of the configuration of
As described above, the initialization timing and the initial value of the context can be set depending on the context initialization parameters. In a case that the context is initialized for each frame, there is no dependency on the context between frames, and therefore random access to any frame can be easily performed, and coding efficiency can be enhanced. In a case that the context is initialized for each GoF, coding efficiency can be further enhanced in comparison to a case that the context is initialized for each frame. The initial value of the context is switched depending on the features of the mesh displacement, and therefore coding efficiency can be enhanced.
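The initialization-timing decision described above can be sketched as follows. The parameter names are illustrative, not from the specification; the sketch only captures the per-frame versus per-GoF trade-off.

```python
def should_init_context(frame_idx, gof_size, init_per_frame):
    """Decide whether to reset the arithmetic-coding contexts at this frame.

    Per-frame initialization removes inter-frame context dependency, easing
    random access; per-GoF initialization resets only at GoF boundaries, letting
    contexts adapt across a GoF for better coding efficiency."""
    return init_per_frame or frame_idx % gof_size == 0
```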
The embodiment of the present disclosure has been described in detail above with reference to the drawings, but the specific configuration is not limited to the above embodiment, and various design modifications can be made without departing from the gist of the present disclosure.
The above-mentioned 3D data coding apparatus 11 and 3D data decoding apparatus 31 can be utilized by being installed in various apparatuses that perform transmission, reception, recording, and reproduction of 3D data. Note that the 3D data may be natural 3D data captured by a camera or the like, or may be artificial 3D data (including CG and GUI) generated by a computer or the like.
The embodiment of the present disclosure is not limited to the above-described embodiment, and various modifications are possible within the scope of the claims. That is, an embodiment obtained by combining technical means modified appropriately within the scope of the claims is also included in the technical scope of the present disclosure.
The embodiments of the present disclosure can be preferably applied to the 3D data decoding apparatus that decodes coded data in which 3D data is coded, and the 3D data coding apparatus that generates coded data in which 3D data is coded. The embodiments of the present disclosure can be preferably applied to a data structure of coded data generated by the 3D data coding apparatus and referred to by the 3D data decoding apparatus.
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
2022-201271 | Dec 2022 | JP | national

U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
20190174133 | Abe | Jun 2019 | A1
20200302658 | Simon | Sep 2020 | A1
20210287430 | Li | Sep 2021 | A1
20220343582 | Anton Dominguez | Oct 2022 | A1
20230033616 | Han | Feb 2023 | A1
20230343010 | Kwatra | Oct 2023 | A1

Foreign Patent Documents

Number | Date | Country
---|---|---
2021210548 | Oct 2021 | WO

Other Publications

"Information technology—Coded Representation of Immersive Media—Part 5: Visual Volumetric Video-based Coding (V3C) and Video-based Point Cloud Compression (V-PCC)", ISO/IEC 23090-5:2021(2E), ISO/IEC JTC 1/SC 29/WG 07, Secretariat: AFNOR (France).

Apple Inc., "[V-CG] Apple's Dynamic Mesh Coding CfP Response", ISO/IEC JTC 1/SC 29/WG 7, m59281, Online, Apr. 2022.

Chao Huang et al., "Arithmetic Coding of Displacements for Subdivision-based Mesh Compression", ISO/IEC JTC 1/SC 29/WG 7, m60300, Online, Jul. 2022.

Nishimura et al., "[V-DMC] [new] Block-Based Context-Adaptive Arithmetic Coding of Displacements", ISO/IEC JTC 1/SC 29/WG 7, m61065, Mainz, Oct. 24, 2022-Oct. 28, 2022, published Oct. 21, 2022.

Sharp Corporation, "[V-DMC] [EE4.7-related] Improvement on displacement arithmetic coding", ISO/IEC JTC 1/SC 29/WG 7, m61808, Jan. 16, 2023-Jan. 20, 2023, published Jan. 6, 2023.

Publication

Number | Date | Country
---|---|---
20240205407 A1 | Jun 2024 | US