The technical field of the disclosure is video compression, more specifically video coding of a 3D mesh consisting of points in a 3D space with associated attributes (e.g. connectivity, texture associated with the mesh, 2D coordinates of points in the texture image).
At least one of the present embodiments generally relates to a method or an apparatus in the context of the compression of images and videos of a 3D mesh with associated attributes.
According to a first aspect, there is provided a method. The method comprises steps for decoding one or more video bitstreams to obtain depth maps and attribute maps; further decoding a bitstream containing lists of border points to obtain segments of border points; generating a plurality of lists of adjacent patches based on syntax information; obtaining segments of border points of patches from said border points coded in said bitstream and from said lists of adjacent patches; generating occupancy maps of said patches using segments of border points of patches; building meshes of said patches based on occupancy, depth, and attribute maps; and filling inter-patch spaces between reconstructed patches and said segments of border points to obtain reconstructed models.
According to a second aspect, there is provided a method. The method comprises steps for correcting geometry and topology of a scaled three-dimensional mesh model; grouping triangles of the mesh into connected components; refining said connected components; extracting patch border segments of patches; creating occupancy maps of said patches from said patch border segments; rasterizing meshes of the connected components of said patches to create depth maps of the patches and depth video frames; reconstructing three-dimensional meshes of said patches from depth maps and said occupancy maps; filling inter-patch spaces based on said patch border segments and border edges of the reconstructed meshes of the patches; extracting all vertices of the mesh based on said reconstructed meshes to create a reconstructed point cloud; coloring the points of the reconstructed point cloud; and using the colored reconstructed point cloud to create attribute video frames.
According to another aspect, there is provided an apparatus. The apparatus comprises a processor. The processor can be configured to implement the general aspects by executing any of the described methods.
These and other aspects, features and advantages of the general aspects will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
One or more embodiments rest upon concepts introduced in other works under the name “mesh coding using V-PCC” and proposed as EE2.6 of the PCC Ad hoc Group.
The following sections briefly introduce background concepts on:
One of the approaches (MPEG 3DG/PCC's Test Model for Category 2, called TMC2) used in the state of the art to achieve good compression efficiency when coding point clouds consists of projecting multiple geometry and texture/attribute information onto the same position (pixel) of a 2D image; i.e., coding several layers of information per input point cloud. Typically, two layers are considered. This means that several 2D geometry and/or 2D texture/attribute images are generated per input point cloud. In the case of TMC2, two depth (for geometry) and color (for texture a.k.a. attribute) images are coded per input point cloud.
One embodiment proposes to extend the V-PCC (MPEG-I part 5) codec of the MPEG 3D Graphics (3DG) Ad hoc Group on Point Cloud Compression to enable the compression of meshes and, in particular, of dynamic textured meshes. It supplements an earlier idea that proposed a new coding scheme to code 3D meshes without an occupancy map, and proposes new methods to build the occupancy maps of the patches, to reconstruct the patches, and to fill the inter-patch spaces.
The described embodiments propose a novel approach to build the occupancy maps of the patches, to reconstruct the patches and to fill the inter-patch spaces based on the border segments of the patches (the lists of 3D points defining the borders of the patches) and on the patch information (patch parameters, depth map, attribute map).
The methods to build the occupancy map and to fill the inter-patch spaces triangulate the areas defined:
These processes are complex and require extracting the border edges from the reconstructed patches. To make them more efficient and to allow a parallel reconstruction of the patches, they have been updated so that the reconstruction of the patches and the filling of the patch borders are performed in a single pass.
One embodiment relates to the V-PCC codec, whose structure is shown in
“Interest in free-viewpoint video (FVV) has soared with recent advances in both capture technology and consumer virtual/augmented reality hardware, e.g., Microsoft HoloLens. As real-time view tracking becomes more accurate and pervasive, a new class of immersive viewing experiences becomes possible on a broad scale, demanding similarly immersive content.” Quoted from Microsoft (Collet et al., 2015).
Patches are represented by sets of parameters including the lists of border segments defining the 3D points of the borders of the patches.
The animated sequence that is captured can then be re-played from any virtual viewpoint with six degrees of freedom (6 dof). In order to provide such capabilities, image/video, point cloud, and textured mesh approaches exist.
The image/video-based approach stores a set of video streams plus additional metadata and performs a warping or any other reprojection to produce the image from the virtual viewpoint at playback. This solution requires high bandwidth and introduces many artifacts.
The point cloud approach will reconstruct an animated 3D point cloud from the set of input animated images, thus leading to a more compact 3D model representation. The animated point cloud can then be projected on the planes of a volume wrapping the animated point cloud, and the projected points (a.k.a. patches) encoded into a set of 2D coded video streams (e.g. using HEVC, AVC, VVC, . . . ) for its delivery. This is the solution developed in the MPEG V-PCC (ISO/IEC JTC1/SC29 WG11, w19332, V-PCC codec description, April 2020) standard, which leads to very good results. However, the nature of the model is very limited in terms of spatial resolution, and some artifacts can appear, such as holes on the surface for closeup views.
The border segments of a patch are used to:
The processes that will be updated by the embodiments described in the rest of this document are:
These two processes were carried out by triangulating the 2D polygons extracted from the 3D border segments and the borders of the reconstructed patches in our earlier approach.
With the same triangulation process used to create the occupancy maps, the inter-patch spaces are filled by triangulating the 2D polygons built based on the border segments and on the border edges of the reconstructed patches. The result of this process is shown in
An updated embodiment for building the occupancy map is described below.
An improved patch reconstruction process is described in a later section.
The process for the filling of the inter-patch spaces presented in an earlier approach has been removed because this process is now directly carried out during the patch reconstruction process.
The textured mesh approach will reconstruct an animated textured mesh (see
Textured mesh and point cloud solutions are both relevant, and even image/video solutions under some specific conditions. The modality (mesh or point cloud) is usually selected by the author according to the nature of the model. Sparse elements such as hairs or foliage will get better rendering using point clouds. Surfaces such as skin and clothes will get better rendering using meshes. Both solutions are thus good to optimize. Also note that these solutions can be combined to represent different parts of a model.
One or more embodiments propose a novel approach to leverage the V-PCC coder (which is projection based) to encode dynamic textured meshes. We propose a complete chain to project meshes into patches that are encoded using V-PCC video-based schemes. We also present a solution to avoid encoding occupancy maps, as required in standard V-PCC chains; instead, we propose edge contour encoding. Some additional solutions, such as fast implicit re-meshing, which avoids encoding the topology, as well as filtering methods to enhance the reconstructed meshes, are also presented.
In an Exploratory Experiment (EE) tracked in V-PCC, a solution was proposed to combine the use of V-PCC with the TFAN codec as a vertex connectivity codec.
Encoding and decoding architectures of such a proposition are respectively depicted in
Basically, at the encoder side, an input mesh is decomposed (demultiplexed) into two sets:
As the TFAN encoder generates data whose order is different from that of the input mesh, a reordering process must be applied with respect to the V-PCC output ordering.
In a first version, it was proposed to reorder on both sides and to transmit such a reordering table (
The resulting encodings are multiplexed into a so-called extended V-PCC bitstream.
It should be noted that when meshes are considered sparse (and the attributes are contained separately in an image file), this approach proposes a pre-processing: downscaling, transforming the texture images to vertex colors, and then voxelizing them to 10 bits prior to cleaning the non-manifold and degenerate faces resulting from the voxelization procedure.
At the decoder side, dual operations of the encoder are performed so that the mesh is reconstructed.
Although this proposal leverages and combines two existing codecs, the following disadvantages are pointed out:
Below is proposed a flowchart of this proposal (encoder and decoder, respectively, in
At the encoder side, a module 10, taking as input an input mesh, outputs connectivity information to a module 20 TFAN encoder and vertex/point coordinates and attributes to a vertex reordering module 30, whose intent is to align the placement of vertices/points with regards to TFAN and V-PCC point coding order, prior to processing the reordered vertices with a V-PCC encoder module 40. Eventually, the TFAN and V-PCC encoders' outputs are wrapped in an output bitstream, possibly compliant with an extended version of a V-PCC encoder.
At the decoder side, a module 60 parses the V-PCC mesh-extended bitstream so that a module 70 decodes the coded connectivity by a TFAN decoder while a module 80 decodes the attributes and coordinates of the associated vertices. Eventually, a module 90 combines all the decoded information to create an output mesh.
Typically, the above-described technology allows mesh objects to be losslessly compressed by a ratio of approximately 7 (e.g., 150 bpv → 23 bpv) with respect to the original (uncompressed) mesh model.
The main intent behind the general aspects described herein is twofold:
The current version of V-PCC codes information representative of a point cloud, i.e., attributes, occupancy map and geometry information, in one so-called atlas. Modules 3600 and 4300 are removed in a proposed embodiment and new processes are added.
The V-PCC encoder and decoder schemes are modified as follows:
Rather than sending the 2D occupancy map videos in the bitstream, the solution proposes to store in the bitstream:
According to the segmentation process, each triangle of the mesh is assigned to a patch and each triangle has a patch index value that indicates which patch it belongs to.
The algorithm to compute the segments of the border points of each patch and at the same time the list of the adjacent patches is described below.
For each triangle T of patch index pi(T):
For each list of edges between two patches p0 and p1:
This process is allowed because the triangle and the edge have been oriented clockwise in the first stage of the process.
After this process, we have Q lists of oriented points (Pi, Pj), containing the points of the border of the patches between the patches Pi and Pj. The pairs (Pi, Pj) have Pi<Pj, and Pj could be equal to −1 if the border of the patch Pi is not linked with any other patch.
These lists of border points can be used to extract the full border of one patch Pk by concatenating all the lists of points where the index Pk is present in the pair (Pi, Pj). If Pj is equal to Pk, the list of points must be reversed to get the points in the clockwise order corresponding to the patch.
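As an illustration of the concatenation just described, a minimal Python sketch is given below. The dictionary layout border_segments[(pi, pj)] (pairs with Pi < Pj, each mapped to a clockwise-oriented list of points) and the function name are assumptions made for the example, not part of the described syntax.

```python
def extract_patch_border(border_segments, pk):
    """Concatenate all border point lists involving patch pk, keeping the
    clockwise orientation of patch pk (lists stored for (pi, pj) are oriented
    clockwise with respect to pi, so they are reversed when pj == pk)."""
    border = []
    for (pi, pj), points in border_segments.items():
        if pi == pk:
            border.extend(points)
        elif pj == pk:
            border.extend(reversed(points))
    return border
```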
The set of pairs (Pi, Pj) could also be used to extract the lists of adjacent patches. The next example shows the list of adjacent patches:
Transfer of the border point segments and of the list of adjacent patches
The border point segments and the lists of adjacent patches are used by both the encoder and the decoder, and these data must be transmitted in the V-PCC bitstream.
The order of the points must be preserved, and the decoder must rebuild the list of points of each of the adjacent patches (i,j).
According to the order of the patches defined in the lists of adjacent patches in ascending order, the points of the border segments are concatenated into only one list.
To allow us to detect where a list of consecutive segments ends, the last point of each such list is duplicated.
The lists of border points may then be coded (for example, with Draco) as quantized point clouds, or any other encoder that will preserve the order of the points and their multiplicity. This encoder may be lossless or lossy.
The corresponding bitstream is stored in the V-PCC bitstream in a V3C Unit. The bitstream is stored in a V3C unit of type V3C_OVD corresponding to the occupancy video data, but another V3C unit could be used or specially defined for this.
The decoding process gets the corresponding V3C unit, decodes the data with the appropriate decoder (e.g., Draco), and obtains the list of border points containing the encoded points in the same order with the duplicate points.
According to the list of adjacent patches in ascending order, the points of the list are added to the corresponding border list points (Pi, Pj). Each detected duplicate point is used to know that the next point is the starting point of a new segment, and in this case the next pair (Pi, Pj) is fetched from the lists of the adjacent patches in ascending order.
This process is lossless, and the decoded lists of border points are the same as the encoded ones.
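A minimal sketch of this duplicate-point framing is shown below, assuming the per-pair point lists are held in a dictionary keyed by the (Pi, Pj) pairs and that ordered_pairs lists those pairs in the ascending order described above; the function names are illustrative only.

```python
def serialize_border_points(border_segments, ordered_pairs):
    """Concatenate the per-pair point lists, duplicating the last point of each
    list so that the decoder can detect where each list ends."""
    flat = []
    for pair in ordered_pairs:              # pairs visited in ascending order
        points = border_segments[pair]
        flat.extend(points)
        flat.append(points[-1])             # the duplicate marks the end of the list
    return flat

def deserialize_border_points(flat, ordered_pairs):
    """Rebuild the per-pair point lists by splitting on duplicated points."""
    rebuilt = {pair: [] for pair in ordered_pairs}
    pairs = iter(ordered_pairs)
    pair, previous = next(pairs), None
    for point in flat:
        if point == previous:               # duplicate found: the next point starts a new list
            pair, previous = next(pairs, None), None
            continue
        rebuilt[pair].append(point)
        previous = point
    return rebuilt
```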
The list of adjacent patches is required on the decoder side to reconstruct the border point segments and to reconstruct the patches, and these data must be transmitted.
This list is per patch and stores the indices of the neighboring patches. If the current frame is represented by n patches, for each patch of index i in [0; n−1], we have a list of adjacent patches: {j, k, l, m, o, . . . , [−1], [−1]}. These lists have some specific properties:
To code this list, we could create an intermediate list containing the delta values and code the delta list:
On the decoder side, the opposite process can be executed to rebuild the list:
After this process, we need to code in the V3C bitstreams the delta values of the adjacent patch lists.
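The exact delta definition is carried by the syntax elements introduced below (such as the pdu_delta_adjacent_patches element used later in the decoding description) and is not reproduced here; the following sketch only illustrates one plausible scheme, assuming each adjacency list is kept in ascending order and each entry is coded as the difference with the previous one, the first entry being coded relative to the current patch index.

```python
def encode_adjacency_deltas(adjacent, patch_index):
    """adjacent: ascending list of neighbouring patch indices for patch patch_index."""
    deltas, previous = [], patch_index
    for neighbour in adjacent:
        deltas.append(neighbour - previous)   # small values when neighbour indices are close
        previous = neighbour
    return deltas

def decode_adjacency_deltas(deltas, patch_index):
    """Inverse operation: accumulate the deltas to rebuild the adjacency list."""
    adjacent, previous = [], patch_index
    for delta in deltas:
        previous += delta
        adjacent.append(previous)
    return adjacent
```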
We must update the V3C syntax defined in w19579_ISO_IEC_FDIS_23090-5 with the following syntax elements:
Based on the border segments of the patches and on the lists of the adjacent patches, it is possible to build the occupancy map of a patch by computing the intersection between the border segments of the patch and the horizontal lines in the occupancy map.
For each horizontal line of the occupancy map, we can compute the intersection points with the border segments of the patch. The list of the intersection points can be ordered according to the values of u (see
A greedy algorithm can process the list of points and, for each point pi, invert the value of a flag marking whether the interval after the point is inside or outside the patch; if the interval [pi, pi+1] is inside the patch, the occupancy map pixels in this interval are set to true.
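A compact sketch of this scanline filling is given below. It assumes the patch border is available as a closed 2D polygon of (u, v) points in the patch's local frame and that the intersections are evaluated at pixel centres; the function name and the numpy layout are illustrative assumptions.

```python
import numpy as np

def build_occupancy_map(border, width, height):
    """border: list of (u, v) vertices of the closed patch border polygon."""
    occupancy = np.zeros((height, width), dtype=bool)
    n = len(border)
    for v in range(height):
        line = v + 0.5                                  # scanline at the pixel centre
        crossings = []
        for k in range(n):
            (u0, v0), (u1, v1) = border[k], border[(k + 1) % n]
            if (v0 <= line) != (v1 <= line):            # this border edge crosses the line
                t = (line - v0) / (v1 - v0)
                crossings.append(u0 + t * (u1 - u0))
        crossings.sort()                                # ordered along u
        # Even-odd rule: each consecutive pair of crossings bounds an inside interval.
        for u_in, u_out in zip(crossings[0::2], crossings[1::2]):
            lo = max(0, int(np.ceil(u_in - 0.5)))
            hi = min(width - 1, int(np.floor(u_out - 0.5)))
            occupancy[v, lo:hi + 1] = True
    return occupancy
```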
In addition to the occupancy information stored in the occupancy map, the proposed algorithm that builds the occupancy map stores, for each occupied point, a link to the intersection points {B0, B1, . . . } between the border segments and the horizontal/vertical lines. Each border point of the occupancy map stores the nearest intersection points on the border segments in both the u and v directions.
The intersection points are also added in the oriented list of the border segment points, noted {S0, B0, B1, . . . Bn, S1, Bn+1, Bn+2, . . . Bn+m, S2 . . . }, where the Si are the points of the border segments of the patch and the Bi are the intersection points. This list is named S′ and stored in memory for later use. Unlike the lists of the border segments of the patches presented in [1], the S′ list is not coded in the bitstream; like the occupancy maps, it is reconstructed by the decoding process.
According to the occupancy maps, the decoded geometry video, the border segments and the intersection points of the border segments (stored in S′), we can reconstruct the 3D mesh corresponding to each patch and directly fill the inter-patch space to be sure that the reconstructed patch covers all the space defined by the border segments. In this case, the additional process that fills the space between the patches (as proposed in the first approach) is not required because the inter-patch spaces are already covered by the reconstructed patches.
The reconstruction process proposed in the first approach has therefore been updated to directly fill the inter-patch spaces.
To allow a parallel reconstruction of a patch by a GPU shader, each occupied (u,v) pixel of the occupancy map could be reconstructed in parallel, followed by the reconstruction of the mesh corresponding to the square defined by the four points: (u,v), (u+1,v) (u+1,v+1) and (u, v+1), where these points are noted: p0, p1, p2 and p3, as shown in the example in
Each point (u,v) of the occupancy map has a list of intersection points Bi. Only the points that are in the square [p0, p1, p2, p3] must be considered as shown in
According to the numbers of occupied points on the four corners of the square, an ad hoc reconstruction using a specific meshing pattern is performed, as detailed in the sub-sections below.
To limit the number of cases that must be considered and the complexity of the reconstruction process, the border segments are simplified during the encoding process to guarantee that only one point Si is present in each 3D voxel of size one. After this simplification, we know that only one point Si can be present in each square (p0, p1, p2, p3).
If only one point is occupied, for example p3 as shown in
Note: As shown in
If two points are occupied, for example p3 and p0 as shown in
If the two points have only one intersection point Bi each, we could build the list B0, p0, p3, B1, and the points of S′ between B0 and B1, to triangulate this polygon (
If the two points have two intersection points each, as shown in
If the two occupied points are not adjacent (p0 and p2 or p1 and p3), as shown in
To separate these two cases, we can compute four sub-segments of S′:
The numbers of points in the sub-segments are used to know if the points are inside or outside the border segments, and according to this a specific reconstruction process is performed. In the following formula, we use the notation | . . . | to represent the number of points in the sub-segment.
If |S′B
If |S′B
If three points are occupied, for example p0, p1 and p3 as shown in
If two points have one border point and the other point has no border points (
If one point has two intersection points and the two other points have only one intersection point each, we could use the previously described processes to re-build the mesh (
If the three points have two border points each, we could use the previously described process in One Occupied Point to re-build the mesh (
If four points of the square are occupied, we could study the number of intersection points of each point to reconstruct the mesh.
If no border point is present for any points, the square has no intersection with the border segments and the square must be fully triangulated. According to the modulo of the u and v coordinates, the two created triangles to represent the square are (p0, p1, p3) and (p1, p2, p3) or (p0, p1, p2) and (p0, p2, p3) to guarantee that the complete patch will be meshed by diamond oriented triangles (
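The alternation of the diagonal can be sketched as below; the mapping between the parity of (u + v) and the chosen pattern is an assumption made for the example, the text only requiring that the pattern alternates so that the patch is meshed with diamond-oriented triangles.

```python
def triangulate_full_square(u, v, p0, p1, p2, p3):
    """p0..p3 are the vertex indices of (u, v), (u+1, v), (u+1, v+1), (u, v+1).
    Returns the two triangles covering the fully occupied square."""
    if (u + v) % 2 == 0:
        return [(p0, p1, p3), (p1, p2, p3)]   # diagonal p1-p3
    return [(p0, p1, p2), (p0, p2, p3)]       # diagonal p0-p2
```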
If only two occupied points have one intersection point each, we can build the list of the border points of the reconstructed mesh with the points in oriented order: B0, p1, p0, p3, p2, B1, and the points of S′ between B0 and B1. This list could be triangulated to build the mesh (
The other cases can be rebuilt by using the previous reconstruction processes based on one, two or three occupied points described in the previous sections.
If a point has two intersection points, the point is isolated, and the process described in section on one occupied point must be used (
If two adjacent points have only one intersection point each, the process described in section on two occupied points must be used (
If three adjacent points have one intersection point for the first one and the third one and no intersection point for the second one, the process described in section on three occupied points must be used (
The described processes can work with any kind of mesh with floating-point coordinates, where the positions of the points of the border segments are not aligned with the 2D grid of the patches. If the points of the input meshes are quantized on a regular grid, we can align the 2D grid of the patches with the 3D grid used to quantize the input meshes; in this case the positions of the points of the border segments of the patches (Si) will lie on the vertices of the 2D squares (Bj) and the previously described reconstruction process will be simpler.
An evolution of the proposed algorithm can be to store, on the intersection points (Bj), the index of the segment which created the intersection point. This information can be used to know whether two intersection points come from the same segment of the border segments, and in this case to directly triangulate the space without building and studying the list S′ of the intersection points between them. On the
According to the previous descriptions, the encoding process is summarized below.
The newly proposed processes update the following points:
And remove processes:
The decoding process is also simplified by the newly proposed processes.
The new processes update the points described in
And remove processes:
The segments of border points of the patches are coded losslessly so we can be sure that the positions of these points are correct.
In lossy mode, the geometry maps have been compressed with a video encoder and some disturbances appear in the depth values. Due to the depth filling processes and to the values of the neighboring patches stored in the depth maps, more disturbances can be observed on the borders of the patches or on the patch areas near the borders of the patches.
Knowing, on the decoded side, the exact positions of the border points, we can correct the depth values of the pixels of the depth map close to the edges of the border segments.
According to the patch parameters, we can project the border segments in the depth map and correct the pixel values of the depth maps with the computed depths during the rasterization of the border edges.
Following this process, the pixel values of the depth map (D) corresponding to the border of the patch are exact, and it is worthwhile to propagate this information inside the patch to also correct the pixel values close to the border.
For this, a second depth map (R) is used to store the exact depth values of the patches computed with the positions of the 3D border points. The border points are rasterized in this second depth map according to the patch parameters, and the values of the other pixels are set with a mipmap dilation.
This procedure is not limited to this implementation:
For each pixel (u,v) of the depth map D:
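The per-pixel rule applied at this step is not reproduced in this text. As a purely illustrative sketch, one possible correction, assuming the decoded depth D is pulled back toward the reference depth R built from the exact border points whenever the two diverge near a border, could look as follows; the mask, the threshold and the replacement rule are assumptions.

```python
import numpy as np

def correct_depth_near_borders(D, R, border_mask, max_deviation=2):
    """D, R: integer depth maps; border_mask: True for pixels close to a patch border."""
    corrected = D.copy()
    deviation = np.abs(D.astype(np.int32) - R.astype(np.int32))
    replace = border_mask & (deviation > max_deviation)
    corrected[replace] = R[replace]            # trust the border-derived depth R
    return corrected
```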
The coordinates of the border points must be coded with Draco as explained in an earlier section. For the low bitrate experiments, the size of the Draco bitstreams can be significant, and so it is valuable to reduce the size of these data.
To reduce the size of the Draco bitstreams, we need to reduce the number of points used to describe the borders of the patches.
Knowing the points of the border segments, we could use several algorithms to clean the segments of border points. In our implementation, we use the Douglas-Peucker algorithm to remove the points of the segments that are less representative.
Based on this process and according to one threshold parameter, we can simplify the 2D segments of border points.
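The classical Douglas-Peucker recursion driven by a single distance threshold is sketched below in plain Python (the function name is illustrative).

```python
import math

def douglas_peucker(points, threshold):
    """points: ordered list of (u, v) border points; returns the simplified polyline."""
    if len(points) < 3:
        return list(points)
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy) or 1.0
    # Find the intermediate point farthest from the chord (first point, last point).
    best_index, best_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        dist = abs(dy * (px - x0) - dx * (py - y0)) / norm
        if dist > best_dist:
            best_index, best_dist = i, dist
    if best_dist <= threshold:
        return [points[0], points[-1]]          # all intermediate points are removed
    # Keep the farthest point and recurse on the two halves.
    left = douglas_peucker(points[:best_index + 1], threshold)
    right = douglas_peucker(points[best_index:], threshold)
    return left[:-1] + right
```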
The next table shows the gain in terms of the number of edges when using this kind of simplification.
Another process that can be used to simplify the border segments is to compute the minimum path on a mesh between two vertices based on the Dijkstra algorithm. Based on some parameters, the list of points of each border segment (L) is simplified using some very simple rules (keep the first and the last point, remove one point in N, keep the extremum points, etc.). This shortened list of points is named S. After this first stage, for each point in S, we add to the final list F the shortest path computed with Dijkstra's algorithm between the current point (s(i)) and the next point in S (s(i+1)).
This process creates a smoother border with fewer points in the border segments.
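A sketch of this Dijkstra-based smoothing is given below. The mesh is assumed to be available as a vertex-adjacency graph with Euclidean edge lengths, and the subsampling keeps one interior point in N, which is only one of the simple rules listed above; all names are illustrative.

```python
import heapq, math

def dijkstra_path(graph, positions, start, goal):
    """graph: {vertex: [neighbour, ...]}; positions: 3D coordinates per vertex.
    Returns the list of vertices on the shortest path from start to goal."""
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == goal:
            break
        if d > dist.get(v, math.inf):
            continue
        for w in graph[v]:
            nd = d + math.dist(positions[v], positions[w])
            if nd < dist.get(w, math.inf):
                dist[w], prev[w] = nd, v
                heapq.heappush(heap, (nd, w))
    path, v = [goal], goal
    while v != start:
        v = prev[v]
        path.append(v)
    return path[::-1]

def smooth_border_segment(L, graph, positions, keep_one_in=4):
    """L: ordered vertex indices of one border segment; returns the smoothed list F."""
    S = [L[0]] + L[1:-1:keep_one_in] + [L[-1]]          # crude subsampling of L
    F = []
    for s_i, s_next in zip(S[:-1], S[1:]):
        path = dijkstra_path(graph, positions, s_i, s_next)
        F.extend(path if not F else path[1:])           # avoid duplicating the joints
    return F
```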
In V3C/V-PCC, the depth values stored in the depth maps are integer values in the range [0, 2^N−1], with N the bit depth of the video, and in the reconstruction process the depth values are used to reconstruct the geometry. The normal-axis coordinates of the reconstructed points are therefore also integer values. This method is well suited to encoding point clouds that have been quantized on a discrete grid with all points at integer coordinates, but some issues arise when coding the depth values of the center points of a discrete edge.
If the two vertices v1 and v2 of an edge are quantized, then the coordinates of the points are integer values (x1, y1, z1) and (x2, y2, z2), respectively. The projection of the normal coordinates in the depth maps (for example the Z coordinate) will give for the two points the values of depth: z1 and z2, respectively, which are integer values.
For example, the projection of the center of the edge (v1, v2), v3 = ((x1+x2)/2, (y1+y2)/2, (z1+z2)/2), will give the depth value (z1+z2)/2, which is not necessarily an integer value. To be coded in the depth maps, this value must be truncated and only the integer part of the depth kept.
After this, the reconstructed mesh will be aliased.
To limit the effect of this issue, we propose to scale the value of the depth stored in the depth map according to:
The value of the depth could be linearly scaled according to the formula:
where N is the bit depth of the video and MaxDepth is the maximum depth value of the patch. This value could be computed from all the depth values of the patch, but could also be sent in the bitstream to allow a more precise reconstruction.
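The formula itself is not reproduced above; one plausible form of such a linear scaling, assuming the patch depth range [0, MaxDepth] is mapped onto the full video range, is

$$\text{scaledDepth}(u,v)=\operatorname{round}\!\left(\frac{\text{depth}(u,v)\,(2^{N}-1)}{\text{MaxDepth}}\right),\qquad \text{depth}(u,v)\approx\frac{\text{scaledDepth}(u,v)\,\text{MaxDepth}}{2^{N}-1},$$

where the second expression is the inverse scaling that would be applied at reconstruction.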
To allow this process, the V3C/V-PCC syntax must be updated to indicate that the depth values of the current patch need to be scaled before the reconstruction process. The updated syntax will affect the V3C syntax defined in w19579_ISO_IEC_FDIS_23090-5 and the following syntax elements:
Note 3: In addition to Note 2, a delta value can also be stored in the bitstream for the merge and inter patches, and in this case the syntax must be updated as follows:
and the section 9.2.5.5.1 “General decoding process for patch data units coded in inter prediction mode” from the w19579_ISO_IEC_FDIS_23090-5 document, or respectively section 9.2.5.4 “Decoding process for patch data units coded in merge prediction mode” for the merge patch, must be updated to define the copy function as:
Respectively:
According to the previous descriptions, the encoding process is summarized below. The V-PCC encoding process has been updated to use the previously described process.
The segmentation process of the V-PCC encoder has been updated to consider the mesh format, and in particular to use the topology information of the surface given by this new format that was not available with the original point cloud format.
The Mesh V-PCC encoder loads as input a 3D mesh model. The input sequence's bounding box is scaled to the [0, 2^N−1] range, where N is the geometry quantization bit depth set by the user (
Before the encoding and segmentation processes, a pre-process (
Some vertices defining the models could be duplicated in the topology representation due to the uv texture coordinates, the normal values, or the color values. To increase the efficiency of the following processes, all the duplicate vertices (in terms of position coordinates) are merged to reduce the number of vertices that must be processed.
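A small sketch of this duplicate-vertex merge is given below (the numpy layout and names are illustrative); note that np.unique reorders the vertices, which is harmless here since the triangle indices are remapped accordingly.

```python
import numpy as np

def merge_duplicate_vertices(positions, triangles):
    """positions: (V, 3) float array; triangles: (T, 3) integer array of vertex indices.
    Merges vertices that share the same position and remaps the triangle indices."""
    unique_positions, remap = np.unique(positions, axis=0, return_inverse=True)
    return unique_positions, remap[triangles]
```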
As shown in
The pre-process (
To create a set of 2D patches that could be well encoded in the V-PCC bitstreams, we need to group the triangles of the mesh into a set of connected components (CC, a group of connected triangles) that will be rasterized to create the 2D patches.
It is necessary for the created connected components to have specific properties, to guarantee that all the parts of the mesh model are well described in the patches and that the patches are not too complex to code. The properties are:
To group the triangles, we describe each triangle by a vector S of size numberOfProjection, where numberOfProjection is the number of projection planes used to describe the mesh (6 projection planes by default: {−X, +X, −Y, +Y, −Z, +Z}, or more if 45° projections or extended projection plane modes are used; these modes can be activated with the flags ptc_45degree_no_projection_patch_constraint_flag and asps_extended_projection_enabled_flag) (
For each triangle j and for each projection plane i in [0, numberOfProjection −1], we store in S the normalized signed area of the 2D triangle projected onto the corresponding plane i:
where i is the projection plane index and j is the index of the triangles. The sign of the 2D area is set by comparing the projection plane orientation and the 3D triangle orientation. This can be computed with the dot product between the normal of the current 3D triangle, Normalj, and the normal of the used projection plane, Normali.
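The stored expression is not reproduced above; a plausible form, assuming the signed 2D projected area is normalized by the 3D triangle area so that the score lies in [−1, 1], is

$$S_j[i]=\operatorname{sign}\!\left(\mathbf{n}_j\cdot\mathbf{n}_i\right)\,\frac{\operatorname{Area2D}\!\left(\Pi_i(T_j)\right)}{\operatorname{Area3D}(T_j)},$$

where $\Pi_i$ is the orthographic projection onto plane i and $\mathbf{n}_j$, $\mathbf{n}_i$ are the unit normals of the triangle and of the projection plane. For an orthographic projection this quantity reduces to $\mathbf{n}_j\cdot\mathbf{n}_i$, i.e., the cosine of the angle between the triangle and the projection plane.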
Each triangle is represented by a vector S that describes the capability of the triangle to be represented by the projection planes.
To limit the number of created CCs, S vectors of each triangle are averaged according to the neighborhood triangles (
For each 3D cell, we compute the average of S over all the triangles intersecting the cell, noted cellS. The values of cellS are averaged over the neighboring cells (all cells at a distance less than an input parameter, searchSizeRefineSegmentation). For all triangles in the cell, we update S with cellS. This process can be executed several times according to the input parameter iterationCountRefineSegmentation.
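A sketch of this grid-based smoothing is shown below. Assigning each triangle to the single cell containing its centroid is a simplification of “all the triangles intersecting the cell”, and a cubic box is used as the neighborhood; the parameter names follow the text, the rest is illustrative.

```python
import numpy as np
from collections import defaultdict

def refine_scores(S, centroids, cell_size, searchSizeRefineSegmentation=2,
                  iterationCountRefineSegmentation=1):
    """S: (numTriangles, numberOfProjection) float array; centroids: (numTriangles, 3)."""
    cells = (centroids // cell_size).astype(int)
    buckets = defaultdict(list)                       # cell -> indices of its triangles
    for t, c in enumerate(map(tuple, cells)):
        buckets[c].append(t)
    r = searchSizeRefineSegmentation
    for _ in range(iterationCountRefineSegmentation):
        cellS = {c: S[idx].mean(axis=0) for c, idx in buckets.items()}
        for c, idx in buckets.items():
            neighbours = [cellS[n] for n in
                          ((c[0] + i, c[1] + j, c[2] + k)
                           for i in range(-r, r + 1)
                           for j in range(-r, r + 1)
                           for k in range(-r, r + 1))
                          if n in cellS]
            S[idx] = np.mean(neighbours, axis=0)      # every triangle in the cell gets the smoothed score
    return S
```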
For each orientation i, we set the score of the orientation i equal to the normalized score of one virtual triangle parallel to the corresponding projection plane. These scores are noted ScoreProjPlane(i).
According to the averaged scores S, we can assign to each CC of index I each triangle j that has:
During this process, if a CC that is too small is found (number of triangles less than minNumberOfTriangleByCC), the triangles of this CC are unregistered and allowed to be attached to other CCs.
This process creates the first connected component segmentation (
To refine the CC, based on the first CC segmentation, the isolated triangles (not attached by an edge to the current CC) or the triangles that are not attached to any CC are evaluated to see if they can be added to an existing CC.
Each isolated triangle that is not attached to any other triangle of the CC by an edge (two adjacent triangles must share two vertices) is removed from the CC (
For each triangle that is not represented in a CC, we check for each neighboring CC if the triangle can be rasterized, and out of all the possible CCs we attach the triangle to the largest one (
Note: the connected components (CC) are referred to as “patches” in the following sections.
The patch border segments of the patches are extracted as described in Section 6.1 (
The occupancy maps of the patches can be created from the patch border segments as described in Section 6.3 (
Using the 3D triangles and the occupancy map of each patch, we can rasterize the triangles of the patch in all the areas defined by the occupancy map and store these values in the depth map of the patch.
According to the process defined in an earlier section, the depth values of the patch can be scaled or not.
The 3D meshes of the patches can be reconstructed from depth maps and from the occupancy maps as described in Section 6.4 (
The inter-patch spaces can be filled according to the border patch segments and to the border edges of the reconstructed meshes of the patches, as described in Section 6.5 (
Based on the reconstructed meshes (patch + inter-patch filling), we can extract all the vertices of the mesh to create a reconstructed point cloud, noted RecPC (
This reconstructed point cloud has no colors, and we need to color the points. Based on the source mesh model, we can create a dense, colored source point cloud by sampling and quantizing the source mesh. This process creates a source-colored point cloud, noted SourcePC.
Like in V3C/V-PCC point cloud encoding, we can use a color transfer process, which colors the reconstructed point cloud based on the source point cloud colors. The colors of the RecPC points are obtained from the closest corresponding points in the SourcePC.
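A minimal sketch of such a nearest-neighbour colour transfer is shown below, assuming SourcePC is available as (N, 3) position and colour arrays and RecPC as an (M, 3) position array; scipy's KD-tree is used purely for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def transfer_colors(source_positions, source_colors, rec_positions):
    """Colour every reconstructed point with the colour of its closest source point."""
    tree = cKDTree(source_positions)
    _, nearest = tree.query(rec_positions)     # index of the closest SourcePC point
    return source_colors[nearest]
```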
Each point of the point cloud has (u,v) frame coordinates, which define the coordinates of the pixels in the depth map and in the attribute map that will be used to set the pixel values of the attribute frame of each reconstructed colored point (
According to the previous description, the decoding process is summarized below.
The V3C/V-PCC bitstream is parsed to get the stored data:
The video bitstreams are decoded to obtain the depth maps and the attribute maps.
The bitstreams containing the lists of border points are decoded to obtain the segments of border points.
The lists of the adjacent patches are rebuilt patch by patch based on the pdu_delta_adjacent_patches, mpdu_delta_adjacent_patches and ipdu_delta_adjacent_patches information.
The lists of the adjacent patches are used to rebuild the segments of border points of patches.
The segments of border points of patches are used to build the occupancy maps of the patches.
Based on the occupancy, depth and attribute maps, the meshes of the patches are built.
The filling of the inter-patch spaces is executed between the reconstructed patches and the segments of the border points to obtain the reconstructed models.
The general aspects described herein have direct application to the V-MESH coding activity, whose CfP was issued in October 2021. These aspects have the potential to be adopted in the V-MESH standard as part of the specification.
One embodiment of a method 3500 under the general aspects described here is shown in
Control proceeds from block 3530 to block 3540 for obtaining segments of border points of patches from said border points coded in said bitstream and from said lists of adjacent patches;
Control proceeds from block 3540 to block 3550 for generating occupancy maps of said patches using segments of border points of patches;
Control proceeds from block 3550 to block 3560 for building meshes of said patches based on occupancy, depth, and attribute maps; and,
Control proceeds from block 3560 to block 3570 for filling of inter-patch spaces between reconstructed patches and said segments of border points to obtain reconstructed models.
Another embodiment of a method 3600 under the general aspects described here is shown in
Control proceeds from block 3610 to block 3620 for grouping triangles of the mesh into connected components. Control proceeds from block 3620 to block 3630 for refining said connected components. Control proceeds from block 3630 to block 3640 for extracting patch border segments of patches. Control proceeds from block 3640 to block 3650 for creating occupancy maps of said patches from said patch border segments.
Control proceeds from block 3650 to block 3660 for rasterizing meshes of the connected component of said patches to create depth maps of the patches and depth video frames. Control proceeds from block 3660 to block 3670 for reconstructing three-dimensional meshes of said patches from depth maps and said occupancy maps.
Control proceeds from block 3670 to block 3680 for filling inter-patch spaces based on said patch border segments and border edges of the reconstructed meshes of the patches.
Control proceeds from block 3680 to block 3690 for extracting all vertices of the mesh based on said reconstructed meshes to create a reconstructed point cloud. Control proceeds from block 3690 to block 3692 for coloring the points of the reconstructed point cloud. Control proceeds from block 3692 to block 3697 for using the colored reconstructed point cloud to create attribute video frames.
Processor 3710 is also configured to either insert or receive information in a bitstream and to perform segmentation, compression, analysis, interpolation, representation or understanding of point cloud signals using any of the described aspects.
The embodiments described here include a variety of aspects, including tools, features, embodiments, models, approaches, etc. Many of these aspects are described with specificity and, at least to show the individual characteristics, are often described in a manner that may sound limiting. However, this is for purposes of clarity in description, and does not limit the application or scope of those aspects. Indeed, all of the different aspects can be combined and interchanged to provide further aspects. Moreover, the aspects can be combined and interchanged with aspects described in earlier filings as well.
The aspects described and contemplated in this application can be implemented in many different forms.
Various methods are described herein, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined.
Various numeric values are used in the present application. The specific values are for example purposes and the aspects described are not limited to these specific values.
The system 3700 includes at least one processor 3710 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this document. Processor 3710 can include embedded memory, an input/output interface, and various other circuitries as known in the art. The system 3700 includes at least one memory 3720 (e.g., a volatile memory device, and/or a non-volatile memory device).
System 3700 can include a storage device, which can include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash, magnetic disk drive, and/or optical disk drive. The storage device can include an internal storage device, an attached storage device (including detachable and non-detachable storage devices), and/or a network accessible storage device, as non-limiting examples.
Program code to be loaded onto processor 3710 to perform the various aspects described in this document can be stored in a storage device and subsequently loaded onto memory 3720 for execution by processor 3710. In accordance with various embodiments, one or more of processor 3710, memory 3720, or a storage device can store one or more of various items during the performance of the processes described in this document.
In some embodiments, memory inside of the processor 3710 and/or the memory 3720 is used to store instructions and to provide working memory for processing that is needed. In other embodiments, however, a memory external to the processing device (for example, the processing device can be either the processor 3710 or an external device) is used for one or more of these functions. The external memory can be the memory 3720 and/or a storage device, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of, for example, a television.
The embodiments can be carried out by computer software implemented by the processor 3710 or by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments can be implemented by one or more integrated circuits. The memory 3720 can be of any type appropriate to the technical environment and can be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples. The processor 3710 can be of any type appropriate to the technical environment, and can encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.
When a figure is presented as a flow diagram, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow diagram of a corresponding method/process.
The implementations and aspects described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or program). An apparatus can be implemented in, for example, appropriate hardware, software, and firmware. The methods can be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
Additionally, this application may refer to “determining” various pieces of information. Determining the information can include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application may refer to “accessing” various pieces of information. Accessing the information can include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information can include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a particular one of a plurality of transforms, coding modes or flags. In this way, in an embodiment the same transform, parameter, or mode is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
As will be evident to one of ordinary skill in the art, implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted. The information can include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal can be formatted to carry the bitstream of a described embodiment. Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries can be, for example, analog or digital information. The signal can be transmitted over a variety of different wired or wireless links, as is known. The signal can be stored on a processor-readable medium.
We describe a number of embodiments, across various claim categories and types. Features of these embodiments can be provided alone or in any combination. Further, embodiments can include one or more of the following features, devices, or aspects, alone or in any combination, across various claim categories and types:
Number | Date | Country | Kind |
---|---|---|---|
22305317.4 | Mar 2022 | EP | regional |
22305826.4 | Jun 2022 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2023/055282 | 3/2/2023 | WO |