The present application generally relates to point cloud compression and, in particular, to methods and apparatus of encoding/decoding point cloud geometry data sensed by at least one sensor.
The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of at least one embodiment of the present application that is described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present application.
As a format for the representation of 3D data, point clouds have recently gained traction as they are versatile in their capability of representing all types of physical objects or scenes. Point clouds may be used for various purposes such as cultural heritage/buildings, in which objects like statues or buildings are scanned in 3D in order to share the spatial configuration of the object without sending or visiting it. It is also a way to preserve the knowledge of the object in case it is destroyed; for instance, a temple destroyed by an earthquake. Such point clouds are typically static, colored and huge.
Another use case is in topography and cartography, in which using 3D representations allows for maps that are not limited to the plane and may include the relief. Google Maps is now a good example of 3D maps but uses meshes instead of point clouds. Nevertheless, point clouds may be a suitable data format for 3D maps, and such point clouds are typically static, colored and huge.
Virtual Reality (VR), Augmented Reality (AR) and immersive worlds have recently become a hot topic and are foreseen by many as the future of 2D flat video. The basic idea is to immerse the viewer in a surrounding environment, in contrast to a standard TV that only allows the viewer to look at the virtual world in front of him/her. There are several degrees of immersion depending on the freedom of the viewer in the environment. A point cloud is a good format candidate for distributing VR/AR worlds.
The automotive industry, and more particularly foreseen autonomous cars, is also a domain in which point clouds may be intensively used. Autonomous cars should be able to “probe” their environment to make good driving decisions based on the detected presence and nature of nearby objects and on the road configuration.
A point cloud is a set of points located in a three-dimensional (3D) space, optionally with additional values attached to each of the points. These additional values are usually called attributes. Attributes may be, for example, three-component colors, material properties like reflectance and/or two-component normal vectors to a surface associated with a point.
A point cloud is thus a combination of geometry data (locations of the points in a 3D space usually represented by 3D cartesian coordinates x, y and z) and attributes.
Point clouds may be sensed by various types of devices like an array of cameras, depth sensors, lasers (light detection and ranging, also known as Lidars), radars, or may be computer-generated (for example in movie post-production). Depending on the use cases, point clouds may have from thousands up to billions of points for cartography applications. Raw representations of point clouds require a very high number of bits per point, with at least a dozen bits per cartesian coordinate x, y or z, and optionally more bits for the attribute(s), for instance three times 10 bits for the colors.
It is important in many applications to be able to either distribute point clouds to an end-user or store them in a server by consuming only a reasonable amount of bitrate or storage space, while maintaining an acceptable (or preferably very good) quality of experience. Efficient compression of these point clouds is a key point in order to make the distribution chain of many immersive worlds practical.
Compression may be lossy (like in video compression) for the distribution to and visualization by an end-user, for example on AR/VR glasses or any other 3D-capable device. Other use cases do require lossless compression, like medical applications or autonomous driving, to avoid altering the results of a decision obtained from the subsequent analysis of the compressed and transmitted point cloud.
Until recently, point cloud compression (aka PCC) was not addressed by the mass market and no standardized point cloud codec was available. In 2017, the standardization working group ISO/IEC JTC1/SC29/WG11, aka Moving Picture Experts Group or MPEG, initiated work items on point cloud compression. This has led to two standards, namely
Point cloud geometry data may be represented by geometrical elements defined in a three-dimensional space.
For example, point cloud geometry data may be represented by a voxel representation as depicted on
In G-PCC, an octree representation is used for representing point cloud geometry data. The geometrical elements are nodes of said octree, each node being associated with a cube (or more generally a “shoebox-shaped” cuboid) comprising at least one point of the point cloud. A root cube is associated with the root node of the octree. This root cube is split into eight equal child cubes associated with the eight child nodes of the root node. The child cubes are scanned in a predetermined scan order. Each child cube is marked occupied or unoccupied depending upon the presence or not of at least one point of the point cloud in the child cube. The split is applied recursively to child cubes, grand-child cubes, etc., until a predetermined maximum depth is reached. In G-PCC, the scanning is performed in a breadth-first order, i.e. all nodes of a given depth are processed before those of the next depth. The point cloud is coded as a set of points located in occupied leaf nodes having maximum depth, for instance at the centre of such nodes. Consequently, the set of points of the point cloud is entirely determined by the root cube, the predetermined scan order of child nodes, the maximum depth and an 8-bit occupancy pattern associated with each occupied internal node. The 8-bit occupancy pattern comprises 8 binary data fj, each associated with a child node j; the binary data fj indicates the occupancy data of the associated child node j.
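The recursive split and the 8-bit occupancy patterns can be illustrated by a minimal Python sketch. It assumes integer point coordinates inside a root cube [0, 2^maxDepth)^3 and a hypothetical child numbering j = 4·bx + 2·by + bz, where bx, by and bz flag the upper half along x, y and z; the actual G-PCC scan order may differ. The function name `octree_occupancy_patterns` is illustrative only.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[int, int, int]


def octree_occupancy_patterns(points: Sequence[Point], max_depth: int) -> List[int]:
    """8-bit occupancy patterns of the occupied internal nodes, in breadth-first order.

    The root cube is [0, 2**max_depth)^3. Bit j of a pattern is the binary data
    f_j of child j, with the assumed child numbering j = 4*bx + 2*by + bz.
    """
    patterns: List[int] = []
    queue: Deque[Tuple[Point, int, List[Point]]] = deque(
        [((0, 0, 0), 1 << max_depth, list(points))])
    while queue:
        (ox, oy, oz), size, pts = queue.popleft()
        if size == 1:
            continue  # leaf node at maximum depth: not split further
        half = size // 2
        children: List[List[Point]] = [[] for _ in range(8)]
        for x, y, z in pts:
            # Child index from the upper/lower half along each axis.
            j = (int(x - ox >= half) << 2) | (int(y - oy >= half) << 1) | int(z - oz >= half)
            children[j].append((x, y, z))
        pattern = 0
        for j, child_pts in enumerate(children):
            if child_pts:  # child j is occupied: f_j = 1
                pattern |= 1 << j
                child_origin = (ox + half * ((j >> 2) & 1),
                                oy + half * ((j >> 1) & 1),
                                oz + half * (j & 1))
                queue.append((child_origin, half, child_pts))
        patterns.append(pattern)
    return patterns
```

For the two points (0, 0, 0) and (3, 3, 3) with a maximum depth of 2, the root pattern has only f0 and f7 set (129), followed by the patterns 1 and 128 of its two occupied children.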
Encoding point cloud geometry data comprises coding the binary data fj by a context-adaptive entropy encoder. The binary data fj may be associated with a voxel of a voxel representation or a child node of an octree defined over a voxel representation.
Context-adaptive entropy encoding a binary data fj requires that a context associated with the binary data fj be selected. A context selection process should use relevant available (i.e. already coded) information and correlation to obtain an adequate context. The context is usually selected based on neighborhood occupancy configurations associated with the current node/cube (or cuboid)/voxel to be coded. A neighborhood occupancy configuration is a set of occupancy data associated with nodes/cubes (or cuboids)/voxels belonging to a causal neighborhood relative to a current node/cube (or cuboid)/voxel for which a binary data fj is to be entropy encoded.
In an octree representation of the point cloud, if nodes are processed in a breadth-first order as in G-PCC, one may profit from the knowledge of the occupancy data of nodes at higher depth as depicted on
Nodes at depth higher than the current node are used to obtain geometry information in regions (here for y higher than the y of the current node) not yet coded at current depth.
In G-PCC, a combination of neighboring nodes at current depth and current depth plus one is used to define a neighborhood. However, in order to limit the number of possible neighborhood occupancy configurations, the neighborhood has been limited to a subset of the set of nodes adjacent (defined as “sharing a face with”) to the current node.
Point cloud geometry data may also be represented by a coarse representation defined in a two-dimensional coordinate space.
For example, in the working group ISO/IEC JTC 1/SC 29/WG 7 on MPEG 3D Graphics Coding, a new codec named L3C2 (Low-Latency Low-Complexity Codec) is being considered to improve, relative to the G-PCC codec, the coding efficiency of Lidar-sensed point clouds. The codec L3C2 provides an example of a two-dimensional representation of the points of a point cloud, namely a coarse representation. A description of the codec can be found in the output document of the Working Group N00167, ISO/IEC JTC 1/SC 29/WG 7, MPEG 3D Graphics Coding, “Technologies under Consideration in G-PCC”, Aug. 31, 2021.
Basically, for each sensed point Pn of the point cloud, a sensor index λn associated with the sensor that sensed the point Pn and an azimuthal angle Φn representing a sensing angle of said sensor are obtained by converting the 3D cartesian coordinates (xn,yn,zn) representing the 3D location of the sensed point Pn. Points of the point cloud are then ordered based on the azimuthal angles Φn and the sensor indices λn, for example according to a lexicographic order based first on the azimuthal angle and then on the sensor index. The order index o(Pn) of a point Pn is then obtained by:
$$o(P_n) = \Phi_n \cdot K + \lambda_n$$
where K is the number of sensors.
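A minimal Python sketch of the order index, assuming the form o(Pn) = Φn·K + λn implied by the lexicographic order above, with Φn already available as an integer azimuthal angle index and λn in [0, K). The conversion from cartesian coordinates is not shown, and the function names `order_index` and `sort_points` are hypothetical.

```python
from typing import List, Tuple


def order_index(phi: int, sensor: int, num_sensors: int) -> int:
    """Order index o(P) = phi * K + lambda of a point.

    phi is assumed to be an already quantized (integer) azimuthal angle index
    and sensor the index lambda in [0, K) of the sensor that sensed the point.
    """
    if not 0 <= sensor < num_sensors:
        raise ValueError("sensor index must lie in [0, K)")
    return phi * num_sensors + sensor


def sort_points(points: List[Tuple[int, int]], num_sensors: int) -> List[Tuple[int, int]]:
    """Sort (phi, sensor) pairs by order index, i.e. lexicographically by (phi, sensor)."""
    return sorted(points, key=lambda p: order_index(p[0], p[1], num_sensors))
```

Because every sensor index is smaller than K, sorting by o(Pn) yields exactly the lexicographic order on (Φn, λn); for instance, with K = 4, the pairs (2, 0), (1, 3) and (1, 0) get order indices 8, 7 and 4.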
Encoding point cloud geometry data comprises encoding the occupancies of the coarse points of the coarse representation by encoding order index differences Δon, each representing the difference between the order indices of two consecutive points Pn−1 and Pn, obtained by:
$$\Delta o_n = o(P_n) - o(P_{n-1})$$
The order index o(P1) of the first point P1 may be directly coded into a bitstream. This is equivalent to arbitrarily setting the order index of a virtual zero-th point to zero, i.e. o(P0)=0, and coding Δo1=o(P1)−o(P0)=o(P1).
Given the order index o(P1) of the first point and the order index differences Δon, one can recursively reconstruct the order index o(Pn) of any point Pn by:
$$o(P_n) = o(P_{n-1}) + \Delta o_n$$
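The encoder-side differences and the decoder-side recursive reconstruction can be sketched in Python as follows, assuming order indices that are already sorted and the convention o(P0) = 0, so that Δon = o(Pn) − o(Pn−1) and o(Pn) = o(Pn−1) + Δon. The binarization and entropy coding of the differences are not shown, and the function names are illustrative.

```python
from itertools import accumulate
from typing import List


def order_index_differences(order_indices: List[int]) -> List[int]:
    """Encoder side: delta_o_n = o(P_n) - o(P_{n-1}), with a virtual o(P_0) = 0."""
    diffs: List[int] = []
    previous = 0  # o(P_0) = 0, so delta_o_1 = o(P_1)
    for o in order_indices:
        diffs.append(o - previous)
        previous = o
    return diffs


def reconstruct_order_indices(diffs: List[int]) -> List[int]:
    """Decoder side: o(P_n) = o(P_{n-1}) + delta_o_n, i.e. a running sum."""
    return list(accumulate(diffs))
```

For the order indices 4, 7 and 8, the coded differences are 4, 3 and 1, and their running sum gives back the original indices.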
Order index differences Δon are binarized and the obtained binary data fj are context-adaptive entropy encoded as follows: for each binary data fj, a context is selected based on a difference Cj between an azimuthal angle Φj associated with the binary data fj and an azimuthal angle of an already encoded point, and the binary data fj is context-adaptive entropy coded based on the selected context.
For example, a difference Cj associated with a binary data fj is calculated between an index of an azimuthal angle Φj associated with the binary data fj and an index of an azimuthal angle Φpenult,j associated with a penultimate already coded point Ppenult,j having the same sensor index as the sensor index λj associated with the binary data fj. The current point Pn−1 is never considered in the set of already coded points from which the penultimate already coded point Ppenult,j is determined:
$$C_j = \Phi_j - \Phi_{penult,j}$$
Then, a context index ctxIdx of a table of contexts may be selected as equal to Cj but, in order to limit the number of contexts, one may cap the context index by a threshold th1 as:
$$\mathrm{ctxIdx} = \min(C_j, th_1)$$
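The capped context selection can be sketched in Python as below. It assumes that Cj is the difference Φj − Φpenult,j between integer azimuthal angle indices and that Cj is non-negative because points are coded in increasing azimuthal order; how the penultimate point is tracked is left out, and the name `context_index` is hypothetical.

```python
def context_index(phi_j: int, phi_penult_j: int, th1: int) -> int:
    """ctxIdx = min(C_j, th1) with C_j = phi_j - phi_penult_j.

    phi_j and phi_penult_j are integer azimuthal angle indices; C_j is assumed
    to be non-negative because points are coded in increasing azimuthal order.
    """
    c_j = phi_j - phi_penult_j
    if c_j < 0:
        raise ValueError("C_j is expected to be non-negative")
    return min(c_j, th1)  # cap so that at most th1 + 1 contexts are used
```

With th1 = 5 (an arbitrary value), distances 0 to 4 each get their own context and every distance of 5 or more shares the last one, so only th1 + 1 contexts are needed.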
Context-adaptive entropy encoding a binary flag fj comprises selecting a context by a selection process based on already coded information. The context is selected based on neighborhood occupancy configurations associated with the current coarse point, i.e. a set of coarse points belonging to a causal neighborhood relative to the binary data fj to be entropy encoded.
A drawback of the prior art is its inability to use a big neighborhood, big enough to encompass most (or all) of the local occupancy correlations, to code the occupancy data of either a node/cube/voxel (three-dimensional geometrical element) in the voxel or octree representation or a coarse point (two-dimensional geometrical element) in the coarse representation.
In practice, using a big neighborhood involves too many possible neighborhood occupancy configurations for them to be used directly in selecting the context that is usually used for entropy coding the occupancy data (for example the binary information fj) of a current two- or three-dimensional geometrical element.
Several techniques have been proposed to reduce the number of possible neighborhood occupancy configurations.
Firstly, some “physical” arguments have been introduced like the masking of an occupied neighbor by another occupied neighbor placed between the former and the current node/cube/voxel; in this case the information of the former is discarded because this information is supposed to be weaker than the information of the latter.
Secondly, learning-based local reductions have been proposed, but the building of a local reduction depends strongly on the type of point clouds used for learning. The local reductions may be implemented using Look-Up Tables (LUT) of limited size.
Thirdly, a more flexible technique named OBUF (Optimal Binary Coder with Update on the Fly) has been introduced to reduce the number of contexts of the entropy coder by using a limited number of adaptive entropy coders (say 32), by associating a coder index (say 0 to 31) with each neighborhood occupancy configuration, and, after each coding of the occupancy data of a current node, by updating the coder index associated with the neighborhood occupancy configuration of the current node. More details on OBUF can be found in the ACM paper entitled “Using neighboring nodes for the compression of octrees representing the geometry of point clouds” (S. Lasserre et al., ACM MMSys'19, June 2019, Amherst, Massachusetts, USA).
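The OBUF principle can be illustrated by a simplified Python sketch; it is not the algorithm of the cited paper. It assumes that the coders are initialized with probabilities of a 1 increasing with their index, that a configuration's coder index moves one step up after coding a 1 and one step down after coding a 0, and that each coder adapts its probability with an exponential update of hypothetical rate 1/2^shift. The class name `ObufSketch` and its members are hypothetical.

```python
from typing import Dict, List


class ObufSketch:
    """Simplified OBUF-style mapping from neighborhood configurations to a few coders."""

    def __init__(self, num_coders: int = 32, shift: int = 4) -> None:
        self.num_coders = num_coders
        self.shift = shift
        # Coder k starts with a probability of a 1 that increases with k.
        self.p_one: List[float] = [(k + 0.5) / num_coders for k in range(num_coders)]
        # One coder index per visited neighborhood occupancy configuration.
        self.coder_index: Dict[int, int] = {}

    def _index(self, config: int) -> int:
        # Unseen configurations start at a middle coder.
        return self.coder_index.get(config, self.num_coders // 2)

    def probability_of_one(self, config: int) -> float:
        """Probability the selected coder would use to code the next bit."""
        return self.p_one[self._index(config)]

    def update(self, config: int, bit: int) -> None:
        """Update after coding the occupancy bit of a node with this configuration."""
        k = self._index(config)
        # Adapt the probability of the coder that was used.
        self.p_one[k] += (bit - self.p_one[k]) / (1 << self.shift)
        # Move the configuration to a coder more (bit = 1) or less (bit = 0) likely to see a 1.
        self.coder_index[config] = min(k + 1, self.num_coders - 1) if bit else max(k - 1, 0)
```

The number of adaptive probabilities is bounded by `num_coders`, but the dictionary still holds one coder index per visited neighborhood occupancy configuration, so its size grows with the number of configurations.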
The OBUF scheme has been combined with the first reduction technique, applied before OBUF, to limit the number of coder (context) indices that are dependent on reduced neighborhood occupancy configurations.
One of the problems to be solved is to allow the usage of a big neighborhood of a current geometrical element (node/voxel/coarse point), big enough to encompass most of (or all) local occupancy correlations, to select an entropy coder context for coding the occupancy data of a current geometrical element while maintaining the number of contexts reasonably low for practical implementation and meaningful context statistics.
The problem is not only the difficulty of implementing a high number of contexts (due mostly to memory access), but also that individual contexts are rarely visited, which leads to poor statistics and then to badly adapted probabilities used by the entropy coder for coding the occupancy data. This leads to an overall poor compression capability.
For example, a neighborhood involving 20 neighbors (which is actually not many neighbors in a three-dimensional space) would lead to 2^20 = 1,048,576 (about 1 million) possible neighborhood occupancy configurations. Consequently, when dealing with typical point clouds having a few million points at most, many of the neighborhood occupancy configurations will be visited a very limited number of times, and it becomes impossible to build reliable statistics per neighborhood occupancy configuration.
The prior art does not provide satisfactory solutions.
Solutions based on neighbor masking and reduction LUTs are specific to some types of content. Moreover, LUTs with millions of entries are not practical, and they are trained on a specific type of content.
Also, OBUF reduces the number of entropy coders (or equivalently of contexts) but the number of coder indices is equal to the number of (reduced) neighborhood occupancy configurations and having millions of coder indices is also impractical.
For these reasons, prior-art techniques have all worked with a spatially very limited neighborhood.
At least one embodiment of the present application has been devised with the foregoing in mind.
The following section presents a simplified summary of at least one embodiment in order to provide a basic understanding of some aspects of the present application. This summary is not an extensive overview of an embodiment. It is not intended to identify key or critical elements of an embodiment. The following summary merely presents some aspects of at least one of the embodiments in a simplified form as a prelude to the more detailed description provided elsewhere in the document.
According to a first aspect of the present application, there is provided a method of encoding, into a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space. The method comprises obtaining a series of at least one binary data representative of an occupancy data of at least one neighboring geometrical element belonging to a causal neighborhood of a current geometrical element of the multi-dimensional space; obtaining a first index from the series of at least one binary data, the first index being representative of a neighborhood occupancy configuration among a first set of neighborhood occupancy configuration indices representative of potential neighborhood occupancy configurations; obtaining a second index by applying an index range reduction function to the first index; the second index is representative of the neighborhood occupancy configuration among a second set of neighborhood occupancy configuration indices representative of the potential neighborhood occupancy configurations; the index range reduction function is adapted to map each index of the first set of neighborhood occupancy configuration indices to an index of the second set of neighborhood occupancy configuration indices; and the range of the second set of neighborhood occupancy configuration indices being lower than the range of the first set of neighborhood occupancy configuration indices; and entropy encoding, into the bitstream, at least one binary data representative of an occupancy data of the current geometrical element based on the second index.
According to a second aspect of the present application, there is provided a method of decoding, from a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space.
The method comprises obtaining a series of at least one binary data based on occupancy data of previously decoded geometrical elements belonging to a causal neighborhood of a current geometrical element of the multi-dimensional space; obtaining a first index from the series of at least one binary data, the first index being representative of a neighborhood occupancy configuration among a first set of neighborhood occupancy configuration indices representative of potential neighborhood occupancy configurations; obtaining a second index by applying an index range reduction function to the first index; the second index is representative of the neighborhood occupancy configuration among a second set of neighborhood occupancy configuration indices representative of the potential neighborhood occupancy configurations; the index range reduction function is adapted to map each index of the first set of neighborhood occupancy configuration indices to an index of the second set of neighborhood occupancy configuration indices; and the range of the second set of neighborhood occupancy configuration indices being lower than the range of the first set of neighborhood occupancy configuration indices; and entropy decoding, from the bitstream, at least one binary data representative of an occupancy data of the current geometrical element based on the second index.
According to a third aspect of the present application, there is provided an apparatus of encoding, into a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space.
The apparatus comprises one or more processors configured to carry out a method according to the first aspect of the present application.
According to a fourth aspect of the present application, there is provided an apparatus of decoding, from a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space.
The apparatus comprises one or more processors configured to carry out a method according to the second aspect of the present application.
According to a fifth aspect of the present application, there is provided a non-transitory computer-readable storage medium carrying instructions of program code for executing a method according to the first aspect of the present application.
According to a sixth aspect of the present application, there is provided a non-transitory computer-readable storage medium carrying instructions of program code for executing a method according to the second aspect of the present application.
The specific nature of at least one of the embodiments as well as other objects, advantages, features and uses of said at least one of the embodiments will become evident from the following description of examples taken in conjunction with the accompanying drawings.
Reference will now be made, by way of example, to the accompanying drawings which show embodiments of the present application, and in which:
Similar reference numerals may have been used in different figures to denote similar components.
At least one of the embodiments is described more fully hereinafter with reference to the accompanying figures, in which examples of at least one of the embodiments are depicted.
An embodiment may, however, be embodied in many alternate forms and should not be construed as limited to the examples set forth herein. Accordingly, it should be understood that there is no intent to limit embodiments to the particular forms disclosed. On the contrary, the disclosure is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present application.
At least one of the aspects generally relates to point cloud encoding and decoding, another aspect generally relates to transmitting a generated or encoded bitstream, and another aspect relates to receiving/accessing a decoded bitstream.
According to a first aspect of the present application, there is provided a method of encoding, into a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space. The method comprises obtaining a series of at least one binary data representative of an occupancy data of at least one neighboring geometrical element belonging to a causal neighborhood of a current geometrical element of the multi-dimensional space; obtaining a first index from the series of at least one binary data, the first index being representative of a neighborhood occupancy configuration among a first set of neighborhood occupancy configuration indices representative of potential neighborhood occupancy configurations; obtaining a second index by applying an index range reduction function to the first index; the second index is representative of the neighborhood occupancy configuration among a second set of neighborhood occupancy configuration indices representative of the potential neighborhood occupancy configurations; the index range reduction function is adapted to map each index of the first set of neighborhood occupancy configuration indices to an index of the second set of neighborhood occupancy configuration indices; and the range of the second set of neighborhood occupancy configuration indices being lower than the range of the first set of neighborhood occupancy configuration indices; and entropy encoding, into the bitstream, at least one binary data representative of an occupancy data of the current geometrical element based on the second index.
According to a second aspect of the present application, there is provided a method of decoding, from a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space.
The method comprises obtaining a series of at least one binary data based on occupancy data of previously decoded geometrical elements belonging to a causal neighborhood of a current geometrical element of the multi-dimensional space; obtaining a first index from the series of at least one binary data, the first index being representative of a neighborhood occupancy configuration among a first set of neighborhood occupancy configuration indices representative of potential neighborhood occupancy configurations; obtaining a second index by applying an index range reduction function to the first index; the second index is representative of the neighborhood occupancy configuration among a second set of neighborhood occupancy configuration indices representative of the potential neighborhood occupancy configurations; the index range reduction function is adapted to map each index of the first set of neighborhood occupancy configuration indices to an index of the second set of neighborhood occupancy configuration indices; and the range of the second set of neighborhood occupancy configuration indices being lower than the range of the first set of neighborhood occupancy configuration indices; and entropy decoding, from the bitstream, at least one binary data representative of an occupancy data of the current geometrical element based on the second index.
In an embodiment, the index range reduction function is a hashing function.
In an embodiment, said at least one binary data is encoded or decoded by a binary arithmetic coder or decoder using an internal probability based on at least the second index.
In an embodiment, encoding or decoding said at least one binary data comprises selecting a context among a set of contexts based on at least the second index.
In an embodiment, selecting the context further depends on predictors of said at least one binary data.
In an embodiment, encoding or decoding the at least one binary data representative of the occupancy data of the current geometrical element comprises obtaining a context index as an entry of a context index table determined from at least the second index; obtaining a context associated with a probability as the entry, associated with the context index, of a context table comprising the set of contexts; entropy encoding into the bitstream or entropy decoding from the bitstream, the at least one binary data using the probability; and updating the entry of the context index table based on the encoded or decoded binary data to a new value.
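The context-index-table embodiment can be sketched in Python as follows. Everything beyond the steps listed above is an assumption: the second index I2 is taken as already computed and lying in [0, num_second_indices), the contexts are initialized with probabilities of a 1 increasing with their index, the table entry is moved one step up or down after each coded bit, and the ideal code length −log2(p) stands in for a real binary arithmetic coder. The class and method names are hypothetical.

```python
import math
from typing import List


class ContextIndexTableCoder:
    """Sketch: I2 -> context index table -> context table -> probability -> update."""

    def __init__(self, num_second_indices: int, num_contexts: int = 32, shift: int = 4) -> None:
        self.num_contexts = num_contexts
        self.shift = shift
        # Context index table: one entry per second index I2, bounded in size.
        self.ctx_index_table: List[int] = [num_contexts // 2] * num_second_indices
        # Context table: each context holds an adaptive probability that the bit is 1.
        self.context_table: List[float] = [(k + 0.5) / num_contexts for k in range(num_contexts)]

    def code_bit(self, i2: int, bit: int) -> float:
        """'Code' one occupancy bit with the context selected by I2; return its ideal cost in bits."""
        if not 0 <= i2 < len(self.ctx_index_table):
            raise ValueError("second index out of range")
        ctx = self.ctx_index_table[i2]           # context index from the table entry
        p_one = self.context_table[ctx]          # probability of the selected context
        cost = -math.log2(p_one if bit else 1.0 - p_one)
        # Adapt the context probability towards the coded bit (stays strictly in (0, 1)).
        self.context_table[ctx] += (bit - p_one) / (1 << self.shift)
        # Update the context index table entry to a new value.
        self.ctx_index_table[i2] = min(ctx + 1, self.num_contexts - 1) if bit else max(ctx - 1, 0)
        return cost
```

Both tables are plain arrays sized by the range of the second set of indices and by the number of contexts, so memory stays bounded however large the neighborhood is; a run of 1s under the same I2 quickly lowers the per-bit cost.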
In an embodiment, the geometrical elements are defined in a two-dimensional space.
In an embodiment, the geometrical elements are defined in a three-dimensional space.
In some embodiments, there is provided an apparatus of encoding, into a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space. The apparatus comprises one or more processors configured to carry out a method according to the first aspect of the present application.
In some embodiments, there is provided an apparatus of decoding, from a bitstream, point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space. The apparatus comprises one or more processors configured to carry out a method according to the second aspect of the present application.
In some embodiments, there is provided a computer program product including instructions which, when the program is executed by one or more processors, cause the one or more processors to carry out a method according to the first aspect of the present application.
In some embodiments, there is provided a non-transitory computer-readable storage medium carrying instructions of program code for executing a method according to the first aspect of the present application.
In some embodiments, there is provided a computer program product including instructions which, when the program is executed by one or more processors, cause the one or more processors to carry out a method according to the second aspect of the present application.
In some embodiments, there is provided a non-transitory computer-readable storage medium carrying instructions of program code for executing a method according to the second aspect of the present application.
Moreover, the present aspects are not limited to MPEG standards such as MPEG-I part 5 or part 9 that relate to the Point Cloud Compression, and may be applied, for example, to other standards and recommendations, whether pre-existing or future-developed, and extensions of any such standards and recommendations (including MPEG-I part 5 and part 9). Unless indicated otherwise, or technically precluded, the aspects described in the present application may be used individually or in combination.
In brief, the present invention relates to encoding/decoding point cloud geometry data represented by geometrical elements occupying some discrete positions of a set of discrete positions of a multi-dimensional space. At least one binary data is obtained. The at least one binary data is representative of an occupancy data of neighboring geometrical elements belonging to a causal neighborhood of a current geometrical element of the multi-dimensional space. A first index is then obtained from the at least one binary data. The first index is representative of a neighborhood occupancy configuration among a first set of neighborhood occupancy configuration indices representative of potential neighborhood occupancy configurations. A second index is obtained by applying an index range reduction function to the first index. The second index is representative of the neighborhood occupancy configuration among a second set of neighborhood occupancy configuration indices representative of the potential neighborhood occupancy configurations. The index range reduction function is adapted to map each index of the first set of neighborhood occupancy configuration indices to an index of the second set of neighborhood occupancy configuration indices. The range (cardinality) of the second set of neighborhood occupancy configuration indices is lower than the range (cardinality) of the first set of neighborhood occupancy configuration indices. At least one binary data representative of an occupancy data of the current geometrical element is entropy encoded based on the second index.
An ideal index range reduction function should be easy to compute, it should take a high number of entries, and it should “randomly mix” neighborhood occupancy configurations. The rationale behind randomness being an efficient choice is as follows. For a given type of point clouds, a few neighborhood occupancy configurations will be visited much more often than the vast majority of the others. For example, Augmented Reality/Virtual Reality point clouds representing surfaces will tend to visit neighborhood occupancy configurations with occupied adjacent neighbors. Sparse point clouds will tend to visit neighborhood occupancy configurations with a few occupied neighbors. Volumetric point clouds will tend to visit neighborhood occupancy configurations with most neighbors occupied. Thanks to the random mixing nature of the index range reduction function, none of the types of point cloud is privileged, and it is likely that often visited configurations do not collide, i.e. do not share the same reduced neighborhood occupancy configuration after the function has been applied. Therefore, visiting statistics are well preserved. In practice, an index range reduction function may be selected to ensure that visiting statistics are preserved for a large class of realistic point clouds.
The present invention applies to any type of two- or three-dimensional geometrical elements defined in any representation used for representing point cloud geometry data. For example, a geometrical element may be a voxel in a voxel representation, a node of a tree representing the point cloud geometry data, or a coarse point of a coarse representation.
A current geometrical element representative of a part of the point cloud geometry data is considered.
In step 110, a binary data fj representative of the occupancy data of the current geometrical element is obtained.
In step 120, a series (indexed by n) of at least one binary data fj,n, representative of an occupancy data of at least one neighboring geometrical element belonging to a causal neighborhood of the current geometrical element, is obtained.
In step 130, a first index I1 is obtained from the series of at least one binary data fj,n. The first index I1 is representative of a neighborhood occupancy configuration among a first set of neighborhood occupancy configuration indices representative of potential neighborhood occupancy configurations. A neighborhood occupancy configuration index indicates which neighbors of the causal neighborhood are occupied by at least one point of the point cloud.
In an embodiment of step 130, the first index I1 may be a word made of a concatenation of the bits fj,n (indexed by n) indicating the occupancy data of each of said neighbors.
In a variant, the concatenation may be performed in a predetermined order of said neighbors.
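As an illustration of steps 120 and 130, the following Python sketch packs the neighbor occupancy bits fj,n into the first index I1. It assumes, for illustration only, that neighbor n is written to bit n (least significant bit first); any fixed order shared by the encoder and the decoder would serve equally well. With N neighbors the first set contains 2^N indices, e.g. 2^26 for the 26 neighbors of an octree node.

```python
from typing import Sequence


def first_index(neighbor_bits: Sequence[int]) -> int:
    """Concatenate the occupancy bits f_{j,n} (n = 0, 1, ...) into the word I1.

    Assumption: bit n of the result holds the occupancy of neighbor n, so the
    order of neighbor_bits is the predetermined neighbor order.
    """
    i1 = 0
    for n, bit in enumerate(neighbor_bits):
        if bit not in (0, 1):
            raise ValueError("occupancy bits must be 0 or 1")
        i1 |= bit << n  # place the occupancy of neighbor n at bit position n
    return i1
```

For example, `first_index([1, 0, 1, 1])` yields 13 (binary 1101).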
In step 140, a second index I2 is obtained by applying an index range reduction function F to the first index I1. The second index I2 is representative of the neighborhood occupancy configuration among a second set of neighborhood occupancy configuration indices representative of the potential neighborhood occupancy configurations. The index range reduction function F is adapted to map each index of the first set of neighborhood occupancy configuration indices to an index of the second set of neighborhood occupancy configuration indices. The range (cardinality) of the second set of neighborhood occupancy configuration indices is lower than the range (cardinality) of the first set of neighborhood occupancy configuration indices.
The index range reduction function F should output a number of possible indices, i.e. a cardinality of the second set of neighborhood occupancy configuration indices, large enough to avoid mixing unrelated neighboring statistics and, thus, to preserve compression capability.
On the other hand, the index range reduction function F should output a number of possible indices small enough to allow a practical implementation in terms of tracking of the statistics and to avoid dilution of the statistics. Typically, the cardinality of the second set of neighborhood occupancy configuration indices ranges from a few dozen to a few thousand possible neighborhood occupancy configuration indices.
In an embodiment of step 140, the index range reduction function F may be a hashing function.
In an embodiment of step 140, the hashing function may be a modulo function relative to an integer p:
$$I_2 = I_1 \bmod p$$
where p is preferably a prime number. In this case there are p possible second indices.
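A minimal sketch of the modulo variant follows; the prime p = 251 is an arbitrary illustrative choice, not a value taken from this description. It would reduce, for example, the 2^26 first indices of a 26-neighbor configuration word to 251 second indices.

```python
def reduce_mod_prime(i1: int, p: int = 251) -> int:
    """Index range reduction I2 = I1 mod p (p = 251 is an illustrative prime)."""
    return i1 % p
```

Every output lies in [0, p), so a probability or context table indexed by I2 needs only p entries.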
In an embodiment of step 140, the hashing function may be a linear combination followed by a modulo relative to an integer M:
$$I_2 = (a \cdot I_1) \bmod M$$
where a and M are preferably relatively prime numbers. In this case there are M possible hashed indices.
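The linear variant can be sketched in the same way, assuming the linear combination reduces to a single multiplication a·I1; the values a = 97 and M = 1024 are illustrative assumptions. Because a and M are relatively prime, multiplication by a permutes the residues modulo M, so any M consecutive first indices are spread over all M second indices exactly once.

```python
from math import gcd


def reduce_linear(i1: int, a: int = 97, m: int = 1024) -> int:
    """Index range reduction I2 = (a * I1) mod M; a and M are illustrative values."""
    if gcd(a, m) != 1:
        raise ValueError("a and M should be relatively prime")
    return (a * i1) % m
```

With these values, first indices 1 and 11 are sent to 97 and 43 respectively, so configurations that differ only in low-order neighbor bits do not stay adjacent after reduction.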
Any other hashing function may be suitable as long as it is easy to compute and mixes the neighborhood occupancy configurations well.
The occupancy information of the neighbors may not be the only useful information for encoding the occupancy data of a current geometrical element. Other information INF may be used in combination with the occupancy data of the neighbors.
In one variant, the information INF is based on inter frame prediction, which may provide a prediction of the occupancy data of a current geometrical element based on the motion of points of the point cloud. Such prediction information may also be used to encode the occupancy data of a current geometrical element.
In one variant, the information INF is other prediction information, such as the position of the current geometrical element within the multi-dimensional space. For example, in the coarse representation, the sensor index λ carries information useful for anticipating occupancy data because different sensors pointing to different elevations have different occupancy statistics. Another example is the position of a node within its parent node, because the geometry of the neighborhood is affected by this position.
In step 150, the binary data fj representative of an occupancy data of the current geometrical element is entropy encoded into a bitstream B based on the second index I2.
Then, the encoding method proceeds to the encoding of a binary data fj representative of the occupancy data of the next current geometrical element (step 160). The method ends when all geometrical elements have been considered.
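The loop over steps 110 to 160 can be illustrated on a toy two-dimensional occupancy grid scanned in raster order. The sketch below is hypothetical: the four-element causal neighborhood (left, up-left, up, up-right), the treatment of out-of-grid neighbors as unoccupied and the prime p = 7 are assumptions made for the example. It only produces the (I2, fj) pairs that step 150 would hand to an entropy coder.

```python
from typing import Iterator, List, Tuple

# Hypothetical causal neighborhood in raster order: left, up-left, up, up-right.
CAUSAL_OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]


def coding_pairs(grid: List[List[int]], p: int = 7) -> Iterator[Tuple[int, int]]:
    """Yield (I2, f_j) for each element of a 2D occupancy grid in raster order."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            f_j = grid[r][c]                              # step 110
            bits = []                                     # step 120
            for dr, dc in CAUSAL_OFFSETS:
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                bits.append(grid[rr][cc] if inside else 0)
            i1 = sum(b << n for n, b in enumerate(bits))  # step 130
            i2 = i1 % p                                   # step 140
            yield i2, f_j                                 # passed to step 150
```

Because only previously scanned elements are read, a decoder could rebuild the same I2 before decoding each fj.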
In an embodiment of step 110, depicted on
The decoding method 200 is straightforwardly deduced from the encoding method 100.
In step 210, a series (indexed by n) of at least one binary data fj,n is obtained based on occupancy data of previously decoded geometrical elements belonging to a causal neighborhood of a current geometrical element of the multi-dimensional space.
A second index I2 is obtained from steps 130 and 140.
In step 220, one binary data fj representative of the occupancy data of the current geometrical element is entropy decoded from the bitstream B based on the second index I2.
Then, the decoding method proceeds to the decoding of a binary data fj representative of the occupancy data of the next geometrical element indicated by the bitstream B. The method ends when all geometrical elements have been considered.
In an embodiment of the method 200, in step 230, a data representative of the occupancy data of the current geometrical element is obtained from at least one decoded binary data fj. When a data representative of the occupancy data of the current geometrical element is obtained, a new binary data fj is decoded from the bitstream until all the binary data fj of a new data have been obtained. Said new data is obtained from at least one decoded binary data fj, and so on until all the binary data fj are decoded from the bitstream. For example, when a geometrical element is a coarse point, the occupancy data of the coarse point is represented by an order index difference coded as a series of binary data fj.
In an embodiment of step 150, respectively 220, a binary data fj is encoded, respectively decoded, by a binary arithmetic coder, respectively decoder, using a probability pr of the binary data fj being true (fj=1) based on the second index I2. In this case, the second index I2 may point to the I2-th element of a table T of probabilities such that pr=T[I2]. The probability T[I2] may be updated after the coding of the binary data fj depending on its value (false vs true, or 0 vs 1).
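A minimal sketch of such an adaptive table T indexed by I2 follows. The update rule, which moves T[I2] a fraction 2^−shift of the way toward the coded value, is a common exponential-decay estimator chosen here as an assumption; the description only states that T[I2] may be updated depending on the value of fj. Practical coders would typically use fixed-point probabilities rather than floats.

```python
from typing import List


class ProbabilityTable:
    """Table T of probabilities that f_j = 1, indexed by the second index I2."""

    def __init__(self, size: int, shift: int = 4) -> None:
        self.t: List[float] = [0.5] * size  # every entry starts unbiased
        self.shift = shift                   # adaptation speed (illustrative)

    def probability(self, i2: int) -> float:
        return self.t[i2]                    # pr = T[I2]

    def update(self, i2: int, f_j: int) -> None:
        # Move T[I2] a fraction 2^-shift of the way toward the coded value.
        self.t[i2] += (f_j - self.t[i2]) / (1 << self.shift)
```

Only the entry selected by I2 changes, so each neighborhood occupancy configuration class keeps its own statistics.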
In an embodiment step 150, respectively 220, depicted on
In one variant, depicted on
In step 151, the context Ctx is selected by some selection process based at least on the second index I2 to provide a context index ctxIdx associated with a binary data fj.
A context table Tctx with Nctx entries usually stores the probabilities associated with the contexts, and a probability pctxIdx is obtained as the ctxIdx-th entry of the context table. The context is selected based at least on the second index I2 by
Ctx=Tctx[ctxIdx].
For example, the context index ctxIdx may be equal to the second index I2 such that the context Ctx is selected as
Ctx=Tctx[I2].
In step 152, respectively 221, a binary data fj is entropy encoded in the bitstream B, respectively entropy decoded from the bitstream B, using the probability pctxIdx.
Entropy coders are usually arithmetic coders but may be any other type of entropy coder, such as asymmetric numeral systems. In any case, optimal coders add −log2(pctxIdx) bits in the bitstream B to encode fj=1 or −log2(1−pctxIdx) bits in the bitstream B to encode fj=0. Once the binary data fj is encoded (decoded), the probability pctxIdx is updated by an updater taking the encoded binary data fj and pctxIdx as inputs; the update is usually performed by using update tables. The updated probability replaces the ctxIdx-th entry of the context table Tctx. Then, another binary data fj can be encoded, and so on. The update loop back to the context table is a bottleneck in the coding workflow, as another symbol can be encoded only after the update has been performed. For this reason, memory access to the context table must be as fast as possible, and minimizing the size of the context table helps ease its hardware implementation.
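The cost model and update loop described above can be sketched as follows. The function computes the ideal number of bits an optimal coder would spend on a sequence of (ctxIdx, fj) pairs; it does not produce a bitstream, and the exponential update with shift = 4 is an illustrative assumption rather than the update tables mentioned in the text.

```python
import math
from typing import Iterable, List, Tuple


def ideal_cost_bits(pairs: Iterable[Tuple[int, int]], n_ctx: int, shift: int = 4) -> float:
    """Sum of ideal code lengths for (ctxIdx, f_j) pairs under adaptive contexts.

    An optimal coder spends -log2(p) bits for f_j = 1 and -log2(1 - p) bits for
    f_j = 0, where p is the probability stored in the selected context.
    """
    t_ctx: List[float] = [0.5] * n_ctx  # context table Tctx with Nctx entries
    total = 0.0
    for ctx_idx, f_j in pairs:
        p = t_ctx[ctx_idx]
        total += -math.log2(p) if f_j == 1 else -math.log2(1.0 - p)
        # The next symbol may select the same context, so it has to wait for
        # this write-back: the update loop described above as a bottleneck.
        t_ctx[ctx_idx] += (f_j - p) / (1 << shift)
    return total
```

A run of identical values in one context costs 1 bit for the first symbol and progressively less afterwards, which is where the compression gain of a well-selected context comes from.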
A Context-Adaptive Binary Arithmetic decoder performs essentially the same operations as the Context-Adaptive Binary Arithmetic encoder, except that the coded binary data fj is decoded from the bitstream B by an entropy decoder using the probability pctxIdx.
Selecting an adequate context, i.e. the probability pctxIdx that best estimates the chance of the binary data fj being equal to 1, is essential to obtain good compression. Therefore, the context selection should use the occupancy data of at least one neighboring geometrical element belonging to a causal neighborhood of the current geometrical element, and the correlations between them, to obtain this adequate context.
In an embodiment of step 150, respectively of step 220, depicted on
In step 153, a context index CtxIdx is selected based on a dependency state D belonging to a set Dj of dependency states associated with the binary data fj (indexed by j). A set Dj may comprise at least the second index I2:
$$D_j = \{I_2, b_{j,0}, \ldots, b_{j,j-1}\}$$
where bj,0, . . . , bj,j−1 are the binary data representative of the occupancies of some neighboring geometrical elements belonging to a causal neighborhood of the current geometrical element. In one variant, the set Dj of dependency states may comprise predictors for the binary data fj:
In one embodiment of the latter variant, when the geometrical element is a node of an octree, an occupancy score scorem is computed for each of the eight sub-nodes SNm (m=0, . . . , 7) of the current node (current geometrical element) by using a weighted sum over the 26 neighbors that share a face, an edge or a vertex with the current node:
$$\mathrm{score}_m = \sum_{k} w_{k,m}\,\delta_k$$
where m is the sub-node index, k is a neighbor index, wk,m is the contribution (weight) from neighbor k to sub-node m, and δk is the occupancy data (0 for unoccupied, 1 for occupied) of the neighbor k. The score is transformed into a ternary information Predm belonging to the set {“predicted unoccupied”, “predicted occupied”, “not predicted”} of three prediction states by using two thresholds th0(No) and th1(No) that depend upon the number No of occupied neighbors. If the score scorem is lower than th0(No), then Predm is set to “predicted unoccupied”; if the score scorem is higher than th1(No), then Predm is set to “predicted occupied”; otherwise the score lies between the two thresholds and Predm is set to “not predicted”.
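The score and threshold logic can be sketched as below. The weights wk,m and the threshold functions th0(No) and th1(No) are not specified here, so the sketch takes them as inputs; the values used in the usage note are hypothetical.

```python
from typing import Callable, List, Sequence

PREDICTED_UNOCCUPIED = "predicted unoccupied"
PREDICTED_OCCUPIED = "predicted occupied"
NOT_PREDICTED = "not predicted"


def predict_subnodes(delta: Sequence[int],
                     w: Sequence[Sequence[float]],
                     th0: Callable[[int], float],
                     th1: Callable[[int], float]) -> List[str]:
    """Ternary prediction Pred_m for the 8 sub-nodes of the current node.

    delta[k] is the occupancy (0/1) of neighbor k among the 26 neighbors,
    w[k][m] is the weight of neighbor k for sub-node m.
    """
    n_o = sum(delta)                   # number No of occupied neighbors
    low, high = th0(n_o), th1(n_o)
    preds = []
    for m in range(8):
        score = sum(w[k][m] * delta[k] for k in range(len(delta)))
        if score < low:
            preds.append(PREDICTED_UNOCCUPIED)
        elif score > high:
            preds.append(PREDICTED_OCCUPIED)
        else:
            preds.append(NOT_PREDICTED)
    return preds
```

For instance, with all weights equal to 1, th0(No) = 1 and th1(No) = 5, a node with three occupied neighbors gets a score of 3 for every sub-node and therefore "not predicted" everywhere.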
The neighborhood may not be the only useful information for predicting the occupancy data of a current geometrical element.
For example, inter frame prediction may provide a prediction of occupancy data based on the motion of points of the point cloud.
Another prediction information may be the position of the element within the geometrical representation.
For example, when the geometrical element is a coarse point in the two-dimensional angular space, the sensor index λ carries information useful for anticipating occupancy data because different sensors pointing to different elevations have different occupancy statistics.
Another example is the position of a node within its parent node because the geometry of the neighborhood is affected by this position.
In one embodiment of step 153 of
CtxIdx=L[D]
where the context index table L has as many entries as the cardinality of the set Dj of dependency states.
In step 154, a context table Tctx with Nctx entries stores the probabilities associated with the contexts, and a context Ctx associated with a probability pctxIdx is obtained as the CtxIdx-th entry of the context table by:
Ctx=Tctx[CtxIdx].
In step 152, the binary data fj is encoded in the bitstream B by an entropy coder using the probability pctxIdx, as explained above.
In step 221, the binary data fj is decoded from the bitstream B by an entropy decoder using the probability pctxIdx, as explained above.
In step 155, the D-th entry of the context index table L[D] is updated from the encoded binary data fj to a new value Lupdated[D].
In one variant, the update may be performed based on a model of a memory channel such as
where M is the length of the memory and the binary data fj value is either 0 or 1 depending on the occupancy data of the current element.
In one variant, a memory channel may be implemented by using two other fixed lookup tables LM0 and LM1 by
$$L_{\mathrm{updated}}[D] = LM_0[L[D]] \quad \text{if } f_j = 0,$$
$$L_{\mathrm{updated}}[D] = LM_1[L[D]] \quad \text{if } f_j = 1.$$
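The context index table L and its memory-channel update through LM0 and LM1 can be sketched as follows. The number of contexts N_CTX = 8, the initial context index and the contents of LM0 and LM1 (one step down after a 0, one step up after a 1, saturating at the ends) are hypothetical choices; they assume the contexts are ordered by increasing probability of fj = 1. A dictionary stands in for the table so that a dependency state D can be any hashable tuple, e.g. (I2, bj,0, ..., bj,j−1).

```python
from typing import Dict, Hashable, List

N_CTX = 8
# Hypothetical fixed transition tables: a 0 moves the context index one step
# toward contexts with a low probability of 1, a 1 moves it one step up.
LM0: List[int] = [max(i - 1, 0) for i in range(N_CTX)]
LM1: List[int] = [min(i + 1, N_CTX - 1) for i in range(N_CTX)]


class ContextIndexTable:
    """Context index table L mapping a dependency state D to CtxIdx = L[D]."""

    def __init__(self, initial_ctx: int = N_CTX // 2) -> None:
        self.initial_ctx = initial_ctx
        self.l: Dict[Hashable, int] = {}  # entries created on first use

    def select(self, d: Hashable) -> int:              # step 153
        return self.l.get(d, self.initial_ctx)

    def update(self, d: Hashable, f_j: int) -> None:   # step 155
        ctx = self.select(d)
        self.l[d] = LM1[ctx] if f_j == 1 else LM0[ctx]
```

Since LM0 and LM1 are fixed, the update is a single table read per coded bit, which keeps the loop back to L short.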
The associated decoder performs essentially the same operations as the encoder, except that the encoded binary data fj is decoded from the bitstream B by an entropy decoder using the probability pctxIdx.
In an embodiment, the geometrical elements are defined in a three-dimensional space.
For example, the geometrical elements may be nodes/cubes of a tree representing point cloud geometry data or voxels of a voxel representation of point cloud geometry data.
In an embodiment, the geometrical elements are defined in a two-dimensional space.
For example, the geometrical elements may be occupied coarse points of a coarse representation defined in a two-dimensional coordinate space.
Said coarse representation may be defined as the coarse representation of the L3C2 codec as discussed above, i.e. a coarse representation dedicated to Lidar-sensed point clouds.
However, the coarse representation may also be defined for any type of sensor head, including rotating or non-rotating sensor heads. A sensing path is defined from sensor characteristics in a two-dimensional angular coordinate (Φ, θ) space comprising an azimuthal coordinate Φ representative of an azimuthal angle representing a sensing angle of a sensor relative to a referential, and an elevation coordinate θ representative of an elevation angle of a sensor relative to a horizontal referential plane. The sensing path is used to sense the points of the point cloud according to ordered coarse points representing potential locations of sensed points of the point cloud. Each coarse point is a geometrical element and is defined from one sample index s associated with a sensing time instant along the sensing path and one sensor index λ associated with a sensor.
On the illustrative example of
Each of these two coarse points belongs to a sensor sensing path (dashed line) defined from the sensing path SP.
An order index o(P) is associated with each coarse point according to the rank of said coarse point among the ordered coarse points:
$$o(P) = \lambda + s \cdot K$$
where K is the number of sensors of a set of sensors, or the number of different positions of a single sensor for a same sample index, and λ is the sensor index of the sensor that sensed the point P of the point cloud at the sensing time instant s.
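Assuming the order index is o(P) = λ + s·K, i.e. coarse points are ranked first by sample index s and then by sensor index λ, the mapping and its inverse are one line each in Python; the function names are hypothetical.

```python
from typing import Tuple


def order_index(s: int, lam: int, k: int) -> int:
    """o(P) = lambda + s * K for sample index s and sensor index lambda in [0, K)."""
    if not 0 <= lam < k:
        raise ValueError("sensor index must lie in [0, K)")
    return lam + s * k


def coarse_point(o: int, k: int) -> Tuple[int, int]:
    """Inverse mapping: recover (s, lambda) from an order index."""
    return divmod(o, k)
```

For example, with K = 16 sensors, the coarse point of sample index 3 and sensor index 2 has order index 50.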
On
Encoding point cloud geometry data comprises encoding the occupancies of the coarse points of the coarse representation by encoding order index differences Δo, each representing the difference between the order indices of two consecutive points P−1 and P:
$$\Delta o = o(P) - o(P_{-1})$$
The order index o(P1) of the first coarse point occupied by the first sensed point P1 of the point cloud may be directly encoded into the bitstream B. This is equivalent to arbitrarily setting the order index of a virtual zero-th point to zero, i.e. o(P0)=0, and coding Δo1=o(P1)−o(P0)=o(P1).
Given the order index o(P1) of the first coarse point occupied by the first sensed point P1 of the point cloud, and the order index differences Δo, one can recursively reconstruct the order index o(Pn) of any coarse point occupied by a sensed point Pn of the point cloud by:
$$o(P_n) = o(P_1) + \sum_{k=2}^{n} \Delta o_k$$
The order index differences Δon=o(Pn)−o(Pn−1) are binarized, and the obtained binary data fj are context-adaptive entropy encoded as discussed above.
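The differential coding of order indices and its recursive reconstruction can be checked with a short sketch. It assumes the virtual zero-th point o(P0) = 0 described above, so the first difference equals o(P1); the binarization of the differences is not shown because it is not detailed here.

```python
from itertools import accumulate
from typing import List


def order_differences(orders: List[int]) -> List[int]:
    """Delta o_n = o(P_n) - o(P_{n-1}), with a virtual o(P_0) = 0."""
    prev = [0] + orders[:-1]
    return [o - p for o, p in zip(orders, prev)]


def reconstruct_orders(deltas: List[int]) -> List[int]:
    """Running sum: o(P_n) = o(P_1) + Delta o_2 + ... + Delta o_n."""
    return list(accumulate(deltas))
```

For order indices [5, 9, 20] the coded differences are [5, 4, 11], and their running sum gives back the original indices.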
System 300 may be embedded as one or more devices including the various components described below. In various embodiments, the system 300 may be configured to implement one or more of the aspects described in the present application.
Examples of equipment that may form all or part of the system 300 include personal computers, laptops, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, connected vehicles and their associated processing systems, head mounted display devices (HMD, see-through glasses), projectors (beamers), “caves” (system including multiple displays), servers, video encoders, video decoders, post-processors processing output from a video decoder, pre-processors providing input to a video encoder, web servers, set-top boxes, and any other device for processing a point cloud, a video or an image or other communication devices. Elements of system 300, singly or in combination, may be embodied in a single integrated circuit (IC), multiple ICs, and/or discrete components. For example, in at least one embodiment, the processing and encoder/decoder elements of system 300 may be distributed across multiple ICs and/or discrete components. In various embodiments, the system 300 may be communicatively coupled to other similar systems, or to other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports.
The system 300 may include at least one processor 310 configured to execute instructions loaded therein for implementing, for example, the various aspects described in the present application. Processor 310 may include embedded memory, input output interface, and various other circuitries as known in the art. The system 300 may include at least one memory 320 (for example a volatile memory device and/or a non-volatile memory device). System 300 may include a storage device 340, which may include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random-Access Memory (DRAM), Static Random-Access Memory (SRAM), flash, magnetic disk drive, and/or optical disk drive. The storage device 340 may include an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
The system 300 may include an encoder/decoder module 330 configured, for example, to process data to provide encoded/decoded point cloud geometry data, and the encoder/decoder module 330 may include its own processor and memory. The encoder/decoder module 330 may represent module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both encoding and decoding modules. Additionally, encoder/decoder module 330 may be implemented as a separate element of system 300 or may be incorporated within processor 310 as a combination of hardware and software as known to those skilled in the art. Program code to be loaded onto processor 310 or encoder/decoder 330 to perform the various aspects described in the present application may be stored in storage device 340 and subsequently loaded onto memory 320 for execution by processor 310. In accordance with various embodiments, one or more of processor 310, memory 320, storage device 340, and encoder/decoder module 330 may store one or more of various items during the performance of the processes described in the present application. Such stored items may include, but are not limited to, a point cloud frame, encoded/decoded geometry/attributes videos/images or portions of the encoded/decoded geometry/attribute video/images, a bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
In several embodiments, memory inside of the processor 310 and/or the encoder/decoder module 330 may be used to store instructions and to provide working memory for processing that may be performed during encoding or decoding.
In other embodiments, however, a memory external to the processing device (for example, the processing device may be either the processor 310 or the encoder/decoder module 330) may be used for one or more of these functions. The external memory may be the memory 320 and/or the storage device 340, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory may be used to store the operating system of a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM may be used as working memory for video coding and decoding operations, such as for MPEG-2 part 2 (also known as ITU-T Recommendation H.262 and ISO/IEC 13818-2, also known as MPEG-2 Video), HEVC (High Efficiency Video coding), VVC (Versatile Video Coding), or MPEG-I part 5 (V-PCC) or part 9 (G-PCC).
The input to the elements of system 300 may be provided through various input devices as indicated in block 390. Such input devices include, but are not limited to, (i) an RF portion that may receive an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Composite input terminal, (iii) a USB input terminal, and/or (iv) an HDMI input terminal.
In various embodiments, the input devices of block 390 may have associated respective input processing elements as known in the art. For example, the RF portion may be associated with elements necessary for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down-converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down-converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF portion of various embodiments may include one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and de-multiplexers. The RF portion may include a tuner that performs various of these functions, including, for example, down-converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband.
In one set-top box embodiment, the RF portion and its associated input processing element may receive an RF signal transmitted over a wired (for example, cable) medium. Then, the RF portion may perform frequency selection by filtering, down-converting, and filtering again to a desired frequency band.
Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions.
Adding elements may include inserting elements in between existing elements, such as, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF portion may include an antenna.
Additionally, the USB and/or HDMI terminals may include respective interface processors for connecting system 300 to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within processor 310 as necessary. Similarly, aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within processor 310 as necessary. The demodulated, error corrected, and demultiplexed stream may be provided to various processing elements, including, for example, processor 310, and encoder/decoder 330 operating in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.
Various elements of system 300 may be provided within an integrated housing. Within the integrated housing, the various elements may be interconnected and transmit data therebetween using suitable connection arrangement 390, for example, an internal bus as known in the art, including the I2C bus, wiring, and printed circuit boards.
The system 300 may include communication interface 350 that enables communication with other devices via communication channel 700. The communication interface 350 may include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 700. The communication interface 350 may include, but is not limited to, a modem or network card and the communication channel 700 may be implemented, for example, within a wired and/or a wireless medium.
Data may be streamed to the system 300, in various embodiments, using a Wi-Fi network such as IEEE 802.11. The Wi-Fi signal of these embodiments may be received over the communications channel 700 and the communications interface 350 which are adapted for Wi-Fi communications. The communications channel 700 of these embodiments may be typically connected to an access point or router that provides access to outside networks including the Internet for allowing streaming applications and other over-the-top communications.
Other embodiments may provide streamed data to the system 300 using a set-top box that delivers the data over the HDMI connection of the input block 390.
Still other embodiments may provide streamed data to the system 300 using the RF connection of the input block 390.
The streamed data may be used as a way for signaling information used by the system 300.
The signaling information may comprise the bitstream B and/or information such as a number of points of a point cloud, coordinates and/or sensor setup parameters.
It is to be appreciated that signaling may be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth may be used to signal information to a corresponding decoder in various embodiments.
The system 300 may provide an output signal to various output devices, including a display 400, speakers 500, and other peripheral devices 600. The other peripheral devices 600 may include, in various examples of embodiments, one or more of a stand-alone DVR, a disk player, a stereo system, a lighting system, and other devices that provide a function based on the output of the system 300.
In various embodiments, control signals may be communicated between the system 300 and the display 400, speakers 500, or other peripheral devices 600 using signaling such as AV.Link (Audio/Video Link), CEC (Consumer Electronics Control), or other communications protocols that enable device-to-device control with or without user intervention.
The output devices may be communicatively coupled to system 300 via dedicated connections through respective interfaces 360, 370, and 380.
Alternatively, the output devices may be connected to system 300 using the communications channel 700 via the communications interface 350. The display 400 and speakers 500 may be integrated in a single unit with the other components of system 300 in an electronic device such as, for example, a television.
In various embodiments, the display interface 360 may include a display driver, such as, for example, a timing controller (T Con) chip.
The display 400 and speaker 500 may alternatively be separate from one or more of the other components, for example, if the RF portion of input 390 is part of a separate set-top box. In various embodiments in which the display 400 and speakers 500 may be external components, the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.
In
Some examples are described with regard to block diagrams and/or operational flowcharts. Each block represents a circuit element, module, or portion of code which includes one or more executable instructions for implementing the specified logical function(s). It should also be noted that in other implementations, the function(s) noted in the blocks may occur out of the indicated order. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.
The implementations and aspects described herein may be implemented in, for example, a method or a process, an apparatus, a computer program, a data stream, a bitstream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or computer program).
The methods may be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a computer readable storage medium. A computer readable storage medium may take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein may be considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples of computer readable storage mediums to which the present embodiments may be applied, is merely an illustrative and not an exhaustive listing as is readily appreciated by one of ordinary skill in the art: a portable computer diskette; a hard disk; a read-only memory (ROM); an erasable programmable read-only memory (EPROM or Flash memory); a portable compact disc read-only memory (CD-ROM); an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.
The instructions may form an application program tangibly embodied on a processor-readable medium.
Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. Examples of such apparatus include personal computers, laptops, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, head mounted display devices (HMD, see-through glasses), projectors (beamers), “caves” (system including multiple displays), servers, video encoders, video decoders, post-processors processing output from a video decoder, pre-processors providing input to a video encoder, web servers, set-top boxes, and any other device for processing a point cloud, a video or an image or other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Computer software may be implemented by the processor 310 or by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments may be also implemented by one or more integrated circuits. The memory 320 may be of any type appropriate to the technical environment and may be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples. The processor 310 may be of any type appropriate to the technical environment, and may encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.
As will be evident to one of ordinary skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms “a”, “an”, and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “includes/comprises” and/or “including/comprising” when used in this specification, may specify the presence of stated, for example, features, integers, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Moreover, when an element is referred to as being “responsive” or “connected” to another element, it may be directly responsive or connected to the other element, or intervening elements may be present. In contrast, when an element is referred to as being “directly responsive” or “directly connected” to another element, there are no intervening elements present.
It is to be appreciated that the use of any of the symbol/term “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, may be intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
Various numeric values may be used in the present application. The specific values are given for example purposes, and the aspects described are not limited to these specific values.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element without departing from the teachings of this application. No ordering is implied between a first element and a second element.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, is frequently used to convey that a particular feature, structure, characteristic, and so forth (described in connection with the embodiment/implementation) is included in at least one embodiment/implementation. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
Similarly, reference herein to “in accordance with an embodiment/example/implementation” or “in an embodiment/example/implementation”, as well as other variations thereof, is frequently used to convey that a particular feature, structure, or characteristic (described in connection with the embodiment/example/implementation) may be included in at least one embodiment/example/implementation. Thus, the appearances of the expression “in accordance with an embodiment/example/implementation” or “in an embodiment/example/implementation” in various places in the specification are not necessarily all referring to the same embodiment/example/implementation, nor are separate or alternative embodiments/examples/implementations necessarily mutually exclusive of other embodiments/examples/implementations.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims. Although not explicitly described, the present embodiments/examples and variants may be employed in any combination or sub-combination.
When a figure is presented as a flow diagram, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow diagram of a corresponding method/process.
Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
Various implementations involve decoding. “Decoding”, as used in this application, may encompass all or part of the processes performed, for example, on a received point cloud frame (including possibly a received bitstream which encodes one or more point cloud frames) in order to produce a final output suitable for display or for further processing in the reconstructed point cloud domain. In various embodiments, such processes include one or more of the processes typically performed by a decoder. In various embodiments, such processes also, or alternatively, include processes performed by a decoder of various implementations described in this application, for example,
As further examples, in one embodiment “decoding” may refer only to de-quantizing, in another embodiment “decoding” may refer to entropy decoding, in another embodiment “decoding” may refer only to differential decoding, and in another embodiment “decoding” may refer to combinations of de-quantizing, entropy decoding and differential decoding. Whether the phrase “decoding process” may be intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear based on the context of the specific descriptions and is believed to be well understood by those skilled in the art.
Various implementations involve encoding. In an analogous way to the above discussion about “decoding”, “encoding” as used in this application may encompass all or part of the processes performed, for example, on an input point cloud frame in order to produce an encoded bitstream. In various embodiments, such processes include one or more of the processes typically performed by an encoder. In various embodiments, such processes also, or alternatively, include processes performed by an encoder of various implementations described in this application.
As further examples, in one embodiment “encoding” may refer only to quantizing, in another embodiment “encoding” may refer only to entropy encoding, in another embodiment “encoding” may refer only to differential encoding, and in another embodiment “encoding” may refer to combinations of quantizing, differential encoding and entropy encoding. Whether the phrase “encoding process” may be intended to refer specifically to a subset of operations or generally to the broader encoding process will be clear based on the context of the specific descriptions and is believed to be well understood by those skilled in the art.
Additionally, this application may refer to “obtaining” various pieces of information. Obtaining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals particular information such as a number of points of the point cloud, coordinates, or sensor setup parameters. In this way, in an embodiment the same parameter may be used at both the encoder side and the decoder side. Thus, for example, an encoder may transmit (explicit signaling) a particular parameter to the decoder so that the decoder may use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling may be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling may be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” may also be used herein as a noun.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations.
Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Number | Date | Country | Kind |
---|---|---|---|
21306291.2 | Sep 2021 | EP | regional |
This application is a U.S. national phase application of International Application No. PCT/CN2022/099973, filed Jun. 20, 2022, which is based upon and claims priority to European Patent Application No. 21306291.2, filed on Sep. 17, 2021, the entire contents of which are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/099973 | 6/20/2022 | WO |