The invention relates generally to compressing and representing point clouds, and more particularly to a method and system for predicting and applying transforms to three-dimensional blocks of point cloud data in which some positions in a block may not be occupied by a point.
Point Clouds
A point cloud is a set of data points in some coordinate system. In a three-dimensional (3D) coordinate system, the points can represent an external surface of an object. Point clouds can be acquired by a 3D sensor, which measures a large number of points on the surface of the object and outputs the point cloud as a data file. The point cloud represents the set of points that the sensor has measured.
Point clouds are used for many purposes, including 3D models for manufactured parts and a multitude of visualization, animation, and rendering applications.
Typically, the point cloud is a set of points in 3D space, with attributes associated with each point. For example, a given point can have a specific (x, y, z) coordinate specifying its position, along with one or more attributes associated with that point. Attributes can include data such as color values, motion vectors, surface normal vectors, and connectivity information. The amount of data associated with the point cloud can be massive, on the order of many gigabytes. Therefore, compression is needed to efficiently store or transmit the data associated with the point cloud for practical applications.
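For illustration, the following sketch shows one plausible in-memory representation of such a point cloud as parallel arrays; the names and the choice of RGB color as the attribute are illustrative assumptions, not part of the described embodiments.

```python
import numpy as np

# One row per point: floating-point (x, y, z) positions and, in parallel,
# a per-point attribute (here an RGB color); other attributes such as
# normals or motion vectors could be stored the same way.
num_points = 5
positions = np.random.rand(num_points, 3) * 10.0                     # (x, y, z) floats
colors = np.random.randint(0, 256, (num_points, 3), dtype=np.uint8)  # RGB attribute

# Point i is the pair (positions[i], colors[i]).
print(positions[0], colors[0])
```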
Compression
A number of methods are known for compressing images and videos using prediction and transforms. Existing methods for compressing images and videos typically operate on blocks of pixels. Given a block of data for images or video, every position in the block corresponds to a pixel position in the image or video.
However, unlike images or videos, if a 3D point cloud is partitioned into blocks, not all positions in the block are necessarily occupied by a point. Methods such as prediction and transforms used to efficiently compress video and image blocks will not work directly on blocks of 3D point cloud data. Therefore, there is a need for methods to perform prediction and transforms on blocks of 3D point cloud data for which some of the positions in the blocks may not be occupied by point data.
Applications
With the recent advancements and reductions in cost of 3D sensor technologies, there has been an increasingly wide proliferation of 3D applications such as virtual reality, mobile mapping, scanning of historical artifacts, and 3D printing. These applications use different kinds of sensors to acquire data from the real world in three dimensions, producing massive amounts of data. Representing these kinds of data as 3D point clouds has become a practical method for storing and conveying the data independent of how the data are acquired.
Usually, the point cloud is represented as a set of coordinates or meshes indicating the position of each point, along with one or more attributes associated with each point, such as color. Point clouds that include connectivity information among vertices are known as structured or organized point clouds. Point clouds that contain positions without connectivity information are unstructured or unorganized point clouds.
Much of the earlier work in reducing the size of point clouds, primarily structured ones, has come from computer graphics applications. Many of those applications achieve compression by reducing the number of vertices in triangular or polygonal meshes, for example by fitting surfaces or splines to the meshes. Block-based and hierarchical octree-based approaches can also be used to compress point clouds. For example, octree representations can be used to code structured point clouds or meshes.
Significant progress has been made over the past several decades on compressing images and videos. The Joint Photographic Experts Group (JPEG) standard, the H.264 or Moving Picture Experts Group (MPEG)-4 Part 10 standard, also known as Advanced Video Coding (AVC), and the High Efficiency Video Coding (HEVC) standard are widely used to compress images and video. These coding standards also utilize block-based and/or hierarchical methods for coding pixels. Concepts from these image and video coders have also been used to compress point clouds.
The embodiments of the invention provide a method and system for compressing a three-dimensional (3D) point cloud using prediction and transformation of attributes of the 3D point cloud. The point cloud is partitioned into 3D blocks. To compress each block, projections of attributes in previously coded blocks are used to determine directional predictions of attributes in the block currently being coded.
A modified shape-adaptive transform is used to transform the attributes in the current block or the prediction residual block, where the residual block results from determining a difference between the prediction block and the current block. The shape-adaptive transform is capable of operating on blocks that have “missing” elements or “holes,” i.e., blocks in which not all possible positions are occupied by points.
As defined herein, the term “position” refers to the location of a point in 3D space, i.e., the (x, y, z) location of a point anywhere in space, not necessarily aligned to a grid. For example, a position can be specified by floating-point numbers. The term “element” refers to data at a position within a uniformly partitioned block of data, similar in concept to how a matrix contains a grid of elements, or a block of pixels contains a grid of pixels.
Two embodiments for handling holes inside shapes are provided. One embodiment inserts a value into each hole, and the other shifts subsequent data to fill the holes. A decoder, knowing the coordinates of the points, can reverse these processes without the need for signaling additional shape or region information in the compressed bitstream, unlike the prior-art shape-adaptive discrete cosine transform (SA-DCT).
Point Cloud Preprocessing and Block Partitioning
Sometimes, point clouds are already arranged in a format that is amenable to block processing. For example, graph transforms can be used for compressing point clouds that are generated by sparse voxelization. The data in these point clouds are already arranged on a 3D grid in which each direction has dimensions 2^j, where j is a level within the voxel hierarchy, and the points in each hierarchy level have integer coordinates.
Partitioning such a point cloud into blocks, where the points are already arranged on a hierarchical integer grid, is straightforward. In general, however, point clouds acquired using other techniques can have floating-point coordinate positions, not necessarily arranged on a grid.
In order to be able to process point clouds without constraints on the acquisition technique, we preprocess the point cloud data so the points are located on a uniform grid. This preprocessing can also serve as a form of down-sampling.
The first step of preprocessing converts 110 the point cloud to an octree representation of voxels, i.e., volumetric or 3D pixels, according to an octree resolution r 102, i.e., the edge length of the smallest voxels. Given the minimal octree resolution r, the point cloud is organized or converted 110 into octree nodes. If a node contains no points, then the node is removed from the octree. If a node contains one or more points, then the node is further partitioned into smaller nodes. This process continues until the size, or edge length, of a leaf node reaches the minimal octree resolution r.
Each leaf node corresponds to a point output by the partitioning step. The position of the output point is set to the geometric center of the leaf node, and the value of any attribute associated with the point is set 120 to the average of the attribute values of the one or more points in the leaf node. This process ensures that the points output by the preprocessing are located on a uniform 3D grid 140 having the resolution r.
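The net effect of this preprocessing can be sketched compactly by quantizing positions directly to a grid of resolution r, which is equivalent in effect to descending the octree until every leaf has edge length r. The following Python sketch assumes this equivalence; the function name and array layout are illustrative.

```python
import numpy as np

def voxelize(positions, attributes, r):
    """Snap points to a uniform grid of resolution r, averaging attributes.

    Equivalent in effect to subdividing an octree until each leaf has edge
    length r: every occupied leaf (voxel) emits one point at its geometric
    center, carrying the mean attribute of the points that fell inside it.
    """
    origin = positions.min(axis=0)
    idx = np.floor((positions - origin) / r).astype(np.int64)    # voxel index per point
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)  # occupied voxels
    sums = np.zeros((len(uniq), attributes.shape[1]))
    np.add.at(sums, inverse, attributes.astype(float))           # sum attributes per voxel
    mean_attr = sums / np.bincount(inverse)[:, None]             # average per voxel
    centers = origin + (uniq + 0.5) * r                          # geometric voxel centers
    return uniq, centers, mean_attr
```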
When the points are arranged on a uniform grid, the region encompassing the set of points is partitioned 160 into 3D blocks of size k×k×k. A block contains k^3 elements; however, many of these elements can be empty, unless the point cloud happens to contain points at every possible position in each block. A block can also have different numbers of elements in each direction; for example, a block with dimensions k×m×n contains k·m·n elements.
At this stage, the difference between these 3D point cloud blocks and 2D blocks of pixels from conventional image processing becomes apparent. In conventional image processing, all elements of each 2D block correspond to pixel positions present in the image. In other words, all blocks are fully occupied.
However, in the block-based point cloud processing as described herein, the 3D blocks are not necessarily fully occupied. The blocks can contain between 1 and k^3 elements. Therefore, procedures, such as intra prediction and block-based transforms, used for conventional image and video coding cannot be directly applied to these 3D blocks. Hence, we provide techniques for accommodating the empty elements.
We define 130 replacement point positions at the center of each octree leaf node. Thus, the preprocessed point cloud 140 has a set of attribute values and a set of point positions. The point cloud can now be partitioned 160 into an array of k×k×k blocks 170 according to a block edge size 150.
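Given grid-aligned voxel indices such as those produced by the sketch above, the partitioning itself reduces to integer division. The sketch below (with illustrative names) groups points by block and records each point's local element position, making explicit that a block may hold anywhere from 1 to k^3 occupied elements.

```python
from collections import defaultdict
import numpy as np

def partition_into_blocks(grid_idx, attributes, k):
    """Group grid-aligned points into k x k x k blocks.

    grid_idx : (N, 3) integer voxel indices on the uniform grid.
    Returns a dict mapping each block index (bx, by, bz) to a list of
    (local_element_index, attribute) pairs for the occupied elements.
    """
    blocks = defaultdict(list)
    block_idx = grid_idx // k   # which block each point falls into
    local_idx = grid_idx % k    # element position inside that block
    for b, l, a in zip(map(tuple, block_idx), map(tuple, local_idx), attributes):
        blocks[b].append((l, a))
    return blocks
```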
Intra Prediction of 3D Point Cloud Blocks
Using prediction among blocks to reduce redundancy is a common technique in current coding standards such as H.264/AVC and HEVC. Adjacent decoded blocks are used to predict pixels in the current block, and then the prediction errors, or residuals, are optionally transformed and coded in a bitstream. We describe a block prediction scheme using a low-complexity prediction architecture in which the prediction is obtained from three directions, i.e., (x, y, z).
As described above, many of the k^3 elements in a block may not be occupied by points. Moreover, points within a block may not necessarily be positioned along the edges or boundaries of the block. The intra prediction techniques of H.264/AVC and HEVC use pixels along the boundaries of adjacent blocks to determine predictions for the current block.
In our scheme, data from known points is used to determine an interpolated or predicted value at an arbitrary position, in this case along the boundary between the previous block and the current block.
Suppose the block 202 above the current block contains a set of point positions P={p1, p2, . . . , pN}, with the points having associated attribute values A={a1, a2, . . . , aN}. Given a point position pboundary along the boundary, the prediction takes the form
aboundary = f(P, A, pboundary),
where aboundary is the predicted value of the attribute at the boundary position pboundary.
We can use nearest-neighbor interpolation and extrapolation, which reduces complexity and simplifies the handling of degenerate cases in which the adjacent block contains only one or two points, or in which all the points in the adjacent block are aligned on a plane perpendicular to the projection plane.
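Under the nearest-neighbor choice, the function f above reduces to returning the attribute of the closest known point. A minimal sketch, with illustrative names:

```python
import numpy as np

def predict_boundary(P, A, p_boundary):
    """Nearest-neighbor realization of aboundary = f(P, A, pboundary).

    P : (N, 3) positions of points in the adjacent, previously coded block.
    A : (N, d) attribute values associated with those points.
    p_boundary : (3,) position on the boundary plane between the blocks.
    Returning the attribute of the nearest point needs no special logic
    for degenerate cases such as a single point or coplanar points.
    """
    distances = np.linalg.norm(P - p_boundary, axis=1)
    return A[np.argmin(distances)]
```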
After the attribute values along the boundary plane are estimated, these values are then projected 206 or replicated into the current block parallel to the direction of prediction. This is similar to how prediction values are replicated into the current block for the directional intra prediction used in standards such as H.264/AVC and HEVC.
The projected and replicated values are used to predict attributes for points in the current block. For example, if the adjacent block in the y direction is used for prediction, then the set of points along the boundary pboundary are indexed in two dimensions, i.e., p(x, z), and the attribute for a point pcurr(x, y, z) in the current block is predicted using aboundary(x, z) for all values of y.
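Putting the boundary estimation and the directional replication together, a prediction block for the y direction might be formed as follows. This is an illustrative sketch; the block geometry and boundary sample positions are assumptions.

```python
import numpy as np

def predict_block_from_above(P, A, k, y_boundary=0.0):
    """Directional intra prediction of a k x k x k block along y.

    For each (x, z) column, estimate the attribute on the boundary plane
    y = y_boundary by nearest neighbor over the adjacent block's points
    (P, A), then replicate that value to every y position in the column,
    mirroring how H.264/AVC and HEVC replicate boundary pixels along the
    prediction direction.
    """
    pred = np.zeros((k, k, k) + A.shape[1:])
    for x in range(k):
        for z in range(k):
            p_b = np.array([x + 0.5, y_boundary, z + 0.5])  # boundary sample
            d = np.linalg.norm(P - p_b, axis=1)
            pred[x, :, z] = A[np.argmin(d)]                 # replicate along y
    return pred
```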
Transforms for 3D Block Data
After the prediction process, a 3D block containing prediction residuals for each point in the current block, or the current block itself if that yields lower coding distortion, is transformed. As was the case for the prediction process, not all the positions in the block may be occupied by a point. Therefore, the transform is designed to operate on these potentially sparse blocks. We consider two types of transforms: a novel variant of the conventional shape-adaptive discrete cosine transform (SA-DCT), designed for 3D point cloud attribute compression, and a 3D graph transform.
Modified Shape-Adaptive DCT
The shape-adaptive DCT (SA-DCT) is a well-known transform designed to code arbitrarily shaped regions in images. A region is defined by a contour, e.g., around a foreground region of an image. All the pixels inside the region are shifted and then transformed in two dimensions using orthogonal DCTs of varying lengths. The contour positions and quantized transform coefficients are then signaled in the bitstream.
For our 3D point cloud compression method, we treat the presence of points in a 3D block as a “region” to be coded, and positions in the block that do not contain points are considered as being outside the region. For the attribute coding application described herein, the point positions are already available at the decoder irrespective of what kind of transform is used.
Because our 3D SA-DCT regions are defined by the point positions and not by the attribute values of the points, there is no need to perform operations, such as foreground and background segmentation and coding of contours, as is typically done when the SA-DCT is used for conventional 2D image coding.
In one embodiment, as in the conventional SA-DCT, the occupied elements along each direction are shifted toward the block boundary to fill the holes, so that each one-dimensional (1D) DCT operates on a contiguous run of occupied elements. In another embodiment, all remaining empty positions in a 3D block are filled with predetermined values, so that all 1D DCTs applied to the block in a given direction have the same length, equal to the number of missing and non-missing elements along that direction in the 3D block.
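The shift variant can be sketched as one pass of length-adaptive 1D DCTs along a single axis; a full transform would repeat such passes along each axis in turn. The names, axis handling, and use of SciPy's DCT are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct

def sa_dct_pass(block, occupied, axis=0):
    """One axis pass of a modified shape-adaptive DCT (shift variant).

    block    : (k, k, k) array of attribute or residual values.
    occupied : (k, k, k) boolean mask of elements that contain points.
    For every 1D column along `axis`, the occupied elements are shifted
    to the front and transformed with an orthonormal DCT whose length
    equals the number of occupied elements. A decoder that knows the
    point positions can invert the shift, so no contour is signaled.
    """
    out = np.zeros(block.shape, dtype=float)
    vals = np.moveaxis(block, axis, 0)
    mask = np.moveaxis(occupied, axis, 0)
    res = np.moveaxis(out, axis, 0)      # a view, so writes land in `out`
    for i in range(vals.shape[1]):
        for j in range(vals.shape[2]):
            col = vals[mask[:, i, j], i, j]          # occupied elements only
            if col.size:
                res[:col.size, i, j] = dct(col, norm="ortho")
    return out
```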
3D Graph Transform
In one embodiment, the transform on the 3D blocks of attributes can use a graph transform. Because our point cloud is partitioned into 3D blocks, we can apply the graph transform on each block.
To apply the graph transform, a graph is first constructed over the occupied positions in the block, with edges connecting neighboring points.
In contrast to the modified SA-DCT, which always produces exactly one DC coefficient, the graph transform generates one DC coefficient for every disjoint connected set of points in the block, and each DC coefficient has a set of corresponding AC coefficients.
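A minimal sketch of such a block graph transform, assuming inverse-distance edge weights, a neighborhood distance threshold, and the eigenvectors of the combinatorial graph Laplacian as the basis (all illustrative choices):

```python
import numpy as np

def graph_transform(points, attributes, max_dist=1.0):
    """Transform block attributes with a graph built over the points.

    Points closer than max_dist are connected by edges weighted by
    inverse distance; the eigenvectors of the graph Laplacian L = D - W
    form the transform basis. Each connected component contributes one
    zero eigenvalue, i.e., one DC coefficient; the remaining basis
    vectors yield the corresponding AC coefficients.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    W = np.where((dist > 0) & (dist <= max_dist),
                 1.0 / np.maximum(dist, 1e-12), 0.0)   # weighted adjacency
    L = np.diag(W.sum(axis=1)) - W                     # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)               # ascending eigenvalues
    coeffs = eigvecs.T @ attributes                    # transform coefficients
    return eigvals, coeffs
```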
Preprocessing and Coding
The steps of the method described herein can be performed in a processor 100 connected to memory and input/output interfaces as known in the art.
Decoder
The embodiments of the invention extend some of the concepts used to code images and video to compress attributes of unstructured point clouds. Point clouds are preprocessed so that the points are arranged on a uniform grid, and then the grid is partitioned into 3D blocks. Unlike image and video processing, in which every element of a 2D block corresponds to a pixel position, our 3D blocks are not necessarily fully occupied by points. After performing 3D block-based intra prediction, we transform the resulting data, for example using a 3D shape-adaptive DCT or a graph transform, and then quantize the transform coefficients.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.