The present disclosure relates to an image processing apparatus and method, and more particularly, to an image processing apparatus and method capable of reducing deterioration in quality of a point cloud.
Conventionally, encoding/decoding of point cloud data expressing a three-dimensional shaped object as a set of points has been standardized by Moving Picture Experts Group (MPEG) (refer to Non-Patent Document 1, for example).
Furthermore, there has been proposed a method (hereinafter, also referred to as a video-based approach) in which geometry data and attribute data of a point cloud are projected onto a two-dimensional plane for each small region, an image (patch) projected onto the two-dimensional plane is disposed in a frame image, and the frame image is encoded by an encoding method for two-dimensional images (refer to Non-Patent Documents 2 to 4, for example).
Moreover, there has been conceived a method in which LiDAR (light detection and ranging, or laser imaging detection and ranging) data obtained by measuring scattered light with respect to light irradiation and analyzing a distance to a subject at a long distance and a property of the subject is encoded by an encoding method for two-dimensional images as in the video-based approach described above (refer to Non-Patent Document 5 and Patent Document 1, for example).
However, in a case of the method described in Non-Patent Document 5 or Patent Document 1, when the LiDAR data is transformed into an image, the LiDAR data is aligned with pixel positions in a two-dimensional image, and thus there is a possibility that the information amount of the geometry data is reduced. Therefore, there is a possibility that the quality of the point cloud data is reduced.
The present invention has been made in view of such circumstances, and an object thereof is to reduce deterioration in quality of a point cloud due to encoding/decoding.
An image processing apparatus according to one aspect of the present technology includes an angle image generation unit that generates an angle image having, as a pixel value, information indicating an angular component of geometry data of a point cloud expressing a three-dimensional shaped object as a set of points, and an angle image encoding unit that encodes the angle image generated by the angle image generation unit.
An image processing method according to one aspect of the present technology includes generating an angle image having, as a pixel value, information indicating an angular component of geometry data of a point cloud expressing a three-dimensional shaped object as a set of points, and encoding the generated angle image.
An image processing apparatus according to another aspect of the present technology includes an angle image decoding unit that decodes encoded data to generate an angle image having, as a pixel value, information indicating an angular component of geometry data of a point cloud expressing a three-dimensional shaped object as a set of points, and a construction unit that constructs the point cloud by using the angle image generated by the angle image decoding unit.
An image processing method according to another aspect of the present technology includes decoding encoded data to generate an angle image having, as a pixel value, information indicating an angular component of geometry data of a point cloud expressing a three-dimensional shaped object as a set of points, and constructing the point cloud by using the generated angle image.
In an image processing apparatus and method according to one aspect of the present technology, an angle image having, as a pixel value, information indicating an angular component of geometry data of a point cloud expressing a three-dimensional shaped object as a set of points is generated, and the generated angle image is encoded.
In an image processing apparatus and method according to another aspect of the present technology, encoded data is decoded to generate an angle image having, as a pixel value, information indicating an angular component of geometry data of a point cloud expressing a three-dimensional shaped object as a set of points, and the point cloud is constructed by using the generated angle image.
Modes for carrying out the present disclosure (hereinafter, referred to as embodiments) will be described below. Note that the description will be made in the following order.
<1. Two-Dimensional Image Encoding of LiDAR Data>
<Documents or the Like Supporting Technical Content/Technical Terms>
The scope disclosed in the present technology includes not only the content described in the embodiments but also the content described in the following non-patent documents and the like that are known at the time of filing, content of other documents referred to in the following non-patent documents, and the like.
That is, contents described in the above-described non-patent documents and patent document, contents of other documents referred to in the above-described non-patent documents and patent document, and the like serve as bases for determining support requirements.
<Point Cloud>
Conventionally, there has been 3D data such as a point cloud representing a three-dimensional structure with point position information, attribute information, and the like.
For example, in a case of a point cloud, a three-dimensional structure (three-dimensional shaped object) is expressed as a set of a large number of points. The data of the point cloud (also referred to as point cloud data) includes position information of each point (also referred to as geometry data) and attribute information (also referred to as attribute data) of each point. The attribute data can include any information. For example, color information, reflectance information, normal line information, or the like of each point may be included in the attribute data. Thus, the point cloud data has a relatively simple data structure, and can express an arbitrary three-dimensional structure with sufficient accuracy by using a sufficiently large number of points.
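As a rough, non-normative illustration of this data structure, point cloud data can be held as parallel per-point arrays of geometry and attributes. The Python layout and field names below are assumptions made for the sketch, not a format defined by the present disclosure.

```python
import numpy as np

# Minimal sketch of point cloud data: per-point geometry plus optional attributes.
# The dictionary layout and field names are illustrative assumptions.
num_points = 4
point_cloud = {
    "geometry": np.array([[0.0, 1.0, 2.0],
                          [0.5, 1.5, 2.5],
                          [1.0, 2.0, 3.0],
                          [1.5, 2.5, 3.5]], dtype=np.float32),      # (x, y, z) per point
    "color": np.zeros((num_points, 3), dtype=np.uint8),             # e.g. R, G, B
    "reflectance": np.ones(num_points, dtype=np.float32),           # e.g. LiDAR intensity
}
assert point_cloud["geometry"].shape == (num_points, 3)  # one coordinate row per point
```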
<Overview of Video-Based Approach>
In a video-based approach, geometry data and attribute data of such a point cloud are projected onto a two-dimensional plane for each small region. An image obtained by projecting geometry data and attribute data onto the two-dimensional plane is also referred to as a projected image. Furthermore, a projected image in each small region is referred to as a patch. For example, in the projected image (patch) of geometry data, position information of a point is represented as position information in a direction perpendicular to a projection plane (depth direction) (depth value (Depth)).
Then, each patch generated in this manner is disposed in the frame image. A frame image in which patches of geometry data are arranged is also referred to as a geometry video frame. Furthermore, a frame image in which patches of attribute data are arranged is also referred to as a color video frame. For example, each pixel value of the geometry video frame indicates the depth value described above.
Then, these video frames are encoded by an encoding method for two-dimensional images, such as Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC), for example. That is, point cloud data that is 3D data representing a three-dimensional structure can be encoded by using a codec for two-dimensional images.
<Occupancy Map>
Note that, in a case of such a video-based approach, an occupancy map can be further used. The occupancy map is map information indicating presence or absence of a projected image (patch) in the geometry video frame. For example, in the geometry video frame or color video frame, the occupancy map indicates a region where there is a patch by using a value “1” and a region where there is no patch by using a value “0”.
Such an occupancy map is encoded as data different from the geometry video frame or color video frame and transmitted to a decoding side. A decoder can grasp, with reference to the occupancy map, whether or not the region is a region where there is a patch, and therefore can reduce influence of noise or the like caused by encoding/decoding and can restore 3D data more accurately. For example, even if a depth value changes due to the encoding/decoding, the decoder can ignore a depth value of a region where there is no patch (does not process the depth value as position information of the 3D data) with reference to the occupancy map.
Note that, similarly to the geometry video frame, the color video frame, or the like, the occupancy map can also be transmitted as a video frame (that is, can be encoded/decoded by the codec for two-dimensional images).
<Polar Coordinate System>
Although geometry data of a point cloud generally represents a three-dimensional position of each point in an orthogonal coordinate system (x, y, z), the three-dimensional position of each point may be represented in a coordinate system using an angular component such as a polar coordinate system, for example. In a case of the polar coordinate system, a three-dimensional position of a point is represented by a distance r from a reference point (origin), an angle φ in a horizontal direction (on an X-Y plane), and an angle θ from a z axis (perpendicular to the X-Y plane) as illustrated in A of
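Under the convention just described (the distance r from the origin, the angle φ measured on the X-Y plane, and the angle θ measured from the z axis), the conversion between the orthogonal and polar representations can be sketched as follows. This is a generic mathematical illustration, not a formula mandated by the present disclosure.

```python
import numpy as np

def cartesian_to_polar(xyz):
    """(x, y, z) -> (r, theta, phi); theta is measured from the z axis,
    phi on the X-Y plane, matching the convention described above."""
    xyz = np.asarray(xyz, dtype=np.float64)
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    r = np.sqrt(x * x + y * y + z * z)
    ratio = np.divide(z, r, out=np.zeros_like(z), where=r > 0)
    theta = np.arccos(np.clip(ratio, -1.0, 1.0))
    phi = np.arctan2(y, x)
    return np.stack([r, theta, phi], axis=-1)

def polar_to_cartesian(rtp):
    """(r, theta, phi) -> (x, y, z); inverse of cartesian_to_polar."""
    rtp = np.asarray(rtp, dtype=np.float64)
    r, theta, phi = rtp[..., 0], rtp[..., 1], rtp[..., 2]
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)
    return np.stack([x, y, z], axis=-1)
```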
<LiDAR Data>
Incidentally, there is LiDAR (light detection and ranging, or laser imaging detection and ranging) data obtained by measuring scattered light with respect to light irradiation and by analyzing a distance to a subject at a long distance and a property of the subject.
When LiDAR data is generated, for example, linear scanning is performed while changing the angle θ in the polar coordinate system. Then, while changing φ in the polar coordinate system, such scanning is repeated to scan an entire circumference. By performing scanning with such a procedure, LiDAR data 11 indicating a result of detecting a body around an observation point 11A as illustrated in B of
Non-Patent Document 5 and Patent Document 1 propose a method in which such LiDAR data is encoded with an encoding method for two-dimensional images, as in the above-described video-based approach. For example, LiDAR data is transformed into an image by combining a plurality of linear scan data of the LiDAR data as illustrated in A of
The depth image 21 indicates depth information as a pixel value (luminance value) for each pixel. For example, the shorter the distance from the observation point, the larger the pixel value (higher luminance), and the farther the distance from the observation point, the smaller the pixel value (lower luminance).
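The mapping from distance to luminance is not fixed here; one monotonically decreasing choice that matches the description (nearer points brighter, farther points darker) is sketched below. The near/far limits and the inverse-distance form are assumptions for illustration only.

```python
import numpy as np

def depth_to_pixel(distance, d_near=1.0, d_far=100.0, bit_depth=8):
    """Map distance to a pixel (luminance) value: nearer -> larger value.
    d_near, d_far, and the inverse-distance mapping are illustrative only."""
    max_val = (1 << bit_depth) - 1
    d = np.clip(np.asarray(distance, dtype=np.float64), d_near, d_far)
    inv = (1.0 / d - 1.0 / d_far) / (1.0 / d_near - 1.0 / d_far)  # 1 at d_near, 0 at d_far
    return np.round(inv * max_val).astype(np.uint16)
```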
In a case of the example in B of
The attribute image 22 indicates attribute data (texture, reflectance, or the like, for example) as a pixel value for each pixel. For example, in a pixel at a position corresponding to each object in the depth image 21, attribute data of the object is represented as a pixel value.
Then, these images are encoded by the encoding method for two-dimensional images.
On the decoding side, the encoded data is decoded by a decoding method for two-dimensional images to generate the depth image 21 in B of
Thus, it is possible to apply an encoding/decoding method for two-dimensional images, which is more general than an encoding/decoding method for three-dimensional data, and therefore the LiDAR data can be encoded/decoded more easily. As a result, it is possible to reduce an increase in cost for encoding/decoding or to perform processing at a higher speed.
In a case of this method, however, each piece of scan data is aligned with pixel positions in the two-dimensional image, and the positions of the respective points may not completely coincide with the pixel positions. Even if the motion or sensing rate of the sensor is controlled so that the point positions coincide with the pixel positions, the positions of the respective points may not actually coincide completely with the pixel positions due to, for example, deviation (error) in the position or motion of the sensor. For example, one piece of scan data may be curved. As described above, there has been a possibility that, when the scan data is aligned with the pixel positions in the two-dimensional image, the positions are quantized, the information amount of the geometry data is reduced, and consequently the quality of the point cloud is reduced.
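The quantization effect described above can be seen in a small numerical example: if the actual per-point angles are snapped to a fixed pixel grid, the fractional deviation of each angle is discarded. The sample angles and grid spacing below are assumed values for illustration only.

```python
import numpy as np

# Hypothetical per-point scan angles (degrees); in practice they deviate slightly
# from an ideal uniform grid because of sensor motion or measurement error.
actual_phi = np.array([0.00, 0.37, 0.71, 1.12, 1.48])
grid_step = 0.36  # assumed angular spacing corresponding to one pixel column

# Aligning the points with pixel positions quantizes the angles...
pixel_column = np.round(actual_phi / grid_step).astype(int)
quantized_phi = pixel_column * grid_step

# ...and the discarded difference is geometry information lost by the imaging step.
print(actual_phi - quantized_phi)
```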
<Transmission of Angle Information>
Therefore, as illustrated in the first row (uppermost row) from the top of the table illustrated in
For example, an angle image having, as a pixel value, information indicating an angular component of the geometry data of the point cloud expressing a three-dimensional shaped object as a set of points is generated, and the generated angle image is encoded.
For example, in the image processing apparatus, there are provided an angle image generation unit that generates an angle image having, as a pixel value, information indicating an angular component of the geometry data of the point cloud expressing a three-dimensional shaped object as a set of points, and an angle image encoding unit that encodes the angle image generated by the angle image generation unit.
Thus, the angle information can be provided to a reception side. On the reception side, the point cloud is constructed by using the angle information.
For example, the encoded data is decoded to generate an angle image having, as a pixel value, information indicating an angular component of the geometry data of the point cloud expressing a three-dimensional shaped object as a set of points, and the point cloud is constructed by using the generated angle image.
For example, in the image processing apparatus, there are provided an angle image decoding unit that decodes the encoded data to generate an angle image having, as a pixel value, information indicating an angular component of the geometry data of the point cloud expressing a three-dimensional shaped object as a set of points, and a construction unit that constructs the point cloud by using the angle image generated by the angle image decoding unit.
Thus, it is possible to suppress the reduction in the information amount of the geometry data due to encoding/decoding. That is, it is possible to reduce deterioration in the quality of the point cloud.
Note that the above-described “coordinate system including an angle” representing a position of each point may be, for example, a polar coordinate system having angular components θ and φ, as described in the second row from the top of the table illustrated in
Furthermore, the above-described “coordinate system including an angle” representing a position of each point may be, for example, a cylindrical coordinate system having an angular component θ, as described in the third row from the top of the table illustrated in
For example, when geometry data is expressed in the polar coordinate system, an angle image 52 and an angle image 53 as illustrated in B of
Meanwhile, the angle image 52 is an image in which the angular component θ of geometry data serves as a pixel value. Each pixel of the angle image 52 corresponds to each pixel of the depth image 51. That is, a pixel value of each pixel of the angle image 52 indicates a sensing direction (angular component θ) for which a pixel value (depth information) of a pixel corresponding to the depth image 51 is obtained. For example, a larger pixel value (luminance value) indicates a larger value θ, and a smaller pixel value (luminance value) indicates a smaller value θ. Because the angular component θ changes in one piece of scan data, pixel values of the angle image 52 change in a longitudinal direction as in the example in B of
The angle image 53 is an image in which the angular component φ of the geometry data serves as a pixel value. Each pixel of the angle image 53 corresponds to each pixel of the depth image 51. That is, a pixel value of each pixel of the angle image 53 indicates a sensing direction (angular component φ) for which a pixel value (depth information) of the corresponding pixel of the depth image 51 is obtained. For example, a larger pixel value (luminance value) indicates a larger value φ, and a smaller pixel value (luminance value) indicates a smaller value φ. Because the angular component φ changes between pieces of scan data, pixel values of the angle image 53 change in a lateral direction as in the example in C of
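As a hedged sketch of how such angle images might be formed, each pixel can store the (suitably scaled) θ or φ under which the corresponding depth sample was taken. The scaling to an integer pixel range and the layout (rows follow θ within one scan, columns follow φ across successive scans) are assumptions for illustration.

```python
import numpy as np

def build_angle_images(theta_samples, phi_samples, bit_depth=8):
    """theta_samples, phi_samples: 2D arrays aligned with the depth image
    (rows: positions within one scan, columns: successive scans).
    Returns integer images whose pixel values encode theta and phi."""
    max_val = (1 << bit_depth) - 1

    def to_pixels(angles):
        angles = np.asarray(angles, dtype=np.float64)
        lo, hi = angles.min(), angles.max()
        scale = max_val / (hi - lo) if hi > lo else 0.0
        # The (lo, hi) range would have to be signaled so that the decoder
        # can map pixel values back to angles.
        return np.round((angles - lo) * scale).astype(np.uint16), (lo, hi)

    theta_image, theta_range = to_pixels(theta_samples)
    phi_image, phi_range = to_pixels(phi_samples)
    return theta_image, phi_image, theta_range, phi_range
```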
For such angle image 52 and angle image 53, for example, as described in the fourth row from the top of the table illustrated in
For example, as described in the fifth row from the top of the table illustrated in
On the decoding side, an angular component of each point is derived by using prediction residuals of the angle images 62 and 64 generated by decoding encoded data, and a point cloud is constructed by using the derived angular component.
Because the pixel values of the angle images 62 and 64 are prediction residuals, they are generally smaller than the pixel values of the angle images 52 and 53, and an increase in the code amount can be reduced.
A method for deriving the prediction value is arbitrary. Furthermore, information regarding the prediction used to derive the prediction value may be signaled. For example, the information regarding the prediction may be encoded. Furthermore, encoded data may be decoded to generate the information regarding the prediction.
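One plausible realization of such prediction residuals is to predict each angle from an ideal constant angular step (for example, taken from the sensor specification) and store only the difference. The constant-step predictor below is an assumption rather than the defined method, and the start angle and step would be part of the signaled information regarding the prediction.

```python
import numpy as np

def angle_residual_image(theta_samples, theta0, step):
    """Encoder side: keep only the difference between each measured angle and
    the constant-step prediction theta0 + row_index * step."""
    theta_samples = np.asarray(theta_samples, dtype=np.float64)
    rows = np.arange(theta_samples.shape[0]).reshape(-1, 1)
    predicted = theta0 + rows * step      # same prediction for every scan (column)
    return theta_samples - predicted      # residuals: small values around zero

def reconstruct_angles(residuals, theta0, step):
    """Decoder side: add the prediction back to the decoded residuals."""
    residuals = np.asarray(residuals, dtype=np.float64)
    rows = np.arange(residuals.shape[0]).reshape(-1, 1)
    return theta0 + rows * step + residuals
```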
For example, as described in the sixth row from the top of the table illustrated in
Furthermore, as described in the seventh row from the top of the table illustrated in
For example, as described in the eighth row from the top of the table illustrated in
For example, as described in the ninth row from the top of the table illustrated in
Furthermore, as described in the 10th row from the top of the table illustrated in
For example, as described in the 11th row from the top of the table illustrated in
Furthermore, as described in the 12th row from the top of the table illustrated in
<Dividing Depth Image for Multiple Layers>
Each pixel value of a depth image is depth information, but in general the detection range of LiDAR is wide, and thus the depth value range tends to be wide. If the depth value range is too wide, the bit depth of the pixel values of the depth image is insufficient, and not all the depth values can be expressed. For example, even if the bit depth of the depth image is 16 bits, the depth value range may be wider than that, and the full accuracy may not be expressible with 16 bits. Furthermore, for example, the depth value may be of float accuracy, in which case the range tends to be even wider.
Therefore, as described in the 13th row from the top of the table illustrated in
For example, it is assumed that a depth image 71 illustrated in A of
Therefore, each pixel of the depth image 71 may be classified on the basis of a depth value thereof, and as illustrated in C of
By dividing the depth image for multiple layers in this manner, even when the depth value range is wide, it is possible to represent all the depth values with bit depths less than that. Therefore, it is possible to reduce deterioration in the quality of the point cloud.
In other words, by applying the present technology, it is possible to encode/decode an image having a wide bit depth by using an inexpensive (low-performance) encoder/decoder that can only process images with a narrower bit depth. Therefore, an increase in cost can be reduced.
In addition, this layering is performed on the basis of a magnitude of a depth value (that is, as illustrated in D of
Note that the number of layers is arbitrary. Three or more layers may be formed. The number of layers may be variable. Furthermore, a range of the depth values assigned to each layer is arbitrary. The ranges of the depth values of the layers may be the same or different from each other. Furthermore, the range of the depth value of each layer may be variable. Expressible accuracy can be set according to a width of a range of a depth value. For example, a range of a depth value may be set wider for a layer including a part with high priority, such as a layer corresponding to a depth value on the close side or a layer including a depth value with a high density of points. Furthermore, a range of a depth value may be set narrower for a layer including only a part with low priority, such as a layer corresponding to a depth value on the far side or a layer including a depth value with a low density of points.
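A minimal sketch of this layering, assuming two layers split at a fixed depth threshold, with the far layer re-based by that threshold so that both layers fit into the available bit depth; the threshold value and the two-layer split are illustrative assumptions, and per-layer ranges could be chosen differently as described above.

```python
import numpy as np

def split_depth_into_layers(depth, threshold, bit_depth=16):
    """Assign each depth value to layer 0 (near) or layer 1 (far), and re-base
    the far layer so that both layers fit into `bit_depth` bits."""
    max_val = (1 << bit_depth) - 1
    depth = np.asarray(depth, dtype=np.int64)
    layer0 = np.where(depth <= threshold, depth, 0)
    layer1 = np.where(depth > threshold, depth - threshold, 0)
    assert layer0.max() <= max_val and layer1.max() <= max_val
    return layer0.astype(np.uint16), layer1.astype(np.uint16)

def merge_depth_layers(layer0, layer1, threshold):
    """Decoder side: undo the re-basing and merge the layers back together."""
    depth = layer0.astype(np.int64)
    far = layer1 > 0
    depth[far] = layer1[far].astype(np.int64) + threshold
    return depth
```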
Furthermore, in the depth image, a pixel value of a pixel having no depth value (a position where no body has been detected) may be 0, or may be a value other than 0, for example, an average of peripheral pixel values or the like. When an encoding method for two-dimensional images in which prediction is performed with reference to peripheral pixels is applied, setting this pixel value to a value other than 0 generally reduces the loss in encoding efficiency more than setting it to 0. When this pixel value is set to a value other than 0, it is only required to be excluded by using an occupancy map when the depth value is extracted from the depth image by unpacking.
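The padding described here (replacing pixels that carry no depth value with something close to their neighborhood rather than 0, so that intra prediction in a two-dimensional codec works better) might be sketched as follows; the 3×3 averaging window is an assumption.

```python
import numpy as np

def pad_empty_pixels(depth, occupancy):
    """Replace pixels with no depth value (occupancy == 0) by the average of
    their occupied neighbors in a 3x3 window; occupied pixels are unchanged."""
    padded = depth.astype(np.float64).copy()
    h, w = depth.shape
    for y, x in zip(*np.where(occupancy == 0)):
        y0, y1 = max(0, y - 1), min(h, y + 2)
        x0, x1 = max(0, x - 1), min(w, x + 2)
        neighborhood = depth[y0:y1, x0:x1]
        occupied = occupancy[y0:y1, x0:x1] == 1
        if occupied.any():
            padded[y, x] = neighborhood[occupied].mean()
    return padded.astype(depth.dtype)
```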
<Occupancy Map>
For the depth image, an occupancy map indicating regions that include useful information may be signaled. That is, such an occupancy map may be encoded. Furthermore, encoded data may be decoded to generate such an occupancy map.
In this case, the occupancy map can indicate, for each pixel or for each group of a plurality of pixels, which regions in the depth image include useful information. For example, each pixel of the occupancy map corresponds to each pixel of the depth image. A pixel having a value “1” in the occupancy map indicates that the pixel at the same position in the depth image has a useful depth value. A pixel having a value “0” in the occupancy map indicates that there is no depth value at the pixel at the same position in the depth image. By unpacking the depth image with the occupancy map having such a configuration, it is possible to more easily extract the useful depth values.
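Given such a per-pixel occupancy map, unpacking reduces to masking. A minimal sketch, assuming the occupancy map and the depth image have the same resolution:

```python
import numpy as np

def unpack_depth(depth_image, occupancy_map):
    """Keep only depth values whose occupancy bit is 1; padding values and
    noise introduced by encoding/decoding in empty regions are discarded."""
    mask = occupancy_map == 1
    rows, cols = np.nonzero(mask)
    return rows, cols, depth_image[mask]  # pixel positions and their useful depths
```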
When the depth image has a plurality of layers, as described in the 14th row from the top (bottom row) of the table illustrated in
<Encoding Apparatus>
Note that
As illustrated in
The decomposition processing unit 111 performs processing related to decomposition of geometry data. For example, the decomposition processing unit 111 acquires LiDAR data, which is a point cloud, input to the encoding apparatus 100. Furthermore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the decomposition processing unit 111 decomposes the geometry data of the acquired LiDAR data, which is expressed with polar coordinates (a coordinate system including angular components), to extract the angular components, and generates angle information including the angular components. The decomposition processing unit 111 also generates depth information, which is the distance component from the observation point.
The decomposition processing unit 111 supplies the generated angle information and depth information to the packing unit 113 together with attribute data. Furthermore, the decomposition processing unit 111 supplies the auxiliary patch information compression unit 112 with information (for example, information regarding an observation point position, a camera parameter, or the like) used for the generation of the angle information or depth information, as auxiliary patch information.
The auxiliary patch information compression unit 112 performs processing related to compression of the auxiliary patch information. For example, the auxiliary patch information compression unit 112 acquires the auxiliary patch information supplied from the decomposition processing unit 111. The auxiliary patch information compression unit 112 encodes (compresses) the acquired auxiliary patch information with an encoding method other than the encoding method for two-dimensional images. The auxiliary patch information may include information regarding prediction that derives a prediction value, the information being supplied from the decomposition processing unit 111. Furthermore, the encoding method is arbitrary as long as the method is not for two-dimensional images. The auxiliary patch information compression unit 112 supplies the obtained encoded data of the auxiliary patch information to the multiplexer 118.
The packing unit 113 performs processing related to data packing. For example, the packing unit 113 acquires geometry data (depth information or angle information) and attribute data that are supplied from the decomposition processing unit 111. Furthermore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the packing unit 113 generates a depth image that is a two-dimensional image having the acquired depth information as a pixel value. At that time, the packing unit 113 can also divide the depth image for multiple layers. The packing unit 113 regards the depth image as a video frame and generates a geometry video frame. When the depth image is divided for multiple layers, the packing unit 113 regards each layer of the depth image as a video frame and generates the geometry video frame.
Moreover, as described above in <1. Two-dimensional image encoding of LiDAR data>, the packing unit 113 generates an angle image that is a two-dimensional image having the acquired angle information (or information derived from the angle information) as a pixel value. The packing unit 113 regards the angle image as a video frame and generates an angle video frame.
Furthermore, the packing unit 113 generates an attribute image that is a two-dimensional image having the acquired attribute data as a pixel value. The packing unit 113 regards the attribute image as a video frame and generates an attribute video frame.
Moreover, as described above in <1. Two-dimensional image encoding of LiDAR data>, the packing unit 113 generates an occupancy map indicating, for the depth image, whether or not each region includes useful information. Note that, when the depth image includes a plurality of layers, the packing unit 113 can generate an occupancy map for each layer.
The packing unit 113 regards the occupancy map as a video frame. Note that, when an occupancy map is generated for each layer of the depth image, the packing unit 113 regards an occupancy map corresponding to each depth image as a video frame.
The packing unit 113 supplies them to the processing units in subsequent stages. For example, the packing unit 113 supplies the geometry video frame to the geometry video frame encoding unit 114. Furthermore, the packing unit 113 supplies the attribute video frame to the attribute video frame encoding unit 115. Moreover, the packing unit 113 supplies the occupancy map to the occupancy-map encoding unit 116. Furthermore, the packing unit 113 supplies the angle video frame to the angle video frame encoding unit 117.
The geometry video frame encoding unit 114 performs processing related to encoding of the geometry video frame. For example, the geometry video frame encoding unit 114 acquires the geometry video frame supplied from the packing unit 113. Furthermore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the geometry video frame encoding unit 114 encodes the acquired geometry video frame with the encoding method for two-dimensional images, and generates encoded data of the geometry video frame. The encoding method for two-dimensional images is arbitrary, and may be Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), or the like, for example. The geometry video frame encoding unit 114 supplies the encoded data of the geometry video frame to the multiplexer 118.
The attribute video frame encoding unit 115 performs processing related to encoding of the attribute video frame. For example, the attribute video frame encoding unit 115 acquires the attribute video frame supplied from the packing unit 113. Furthermore, the attribute video frame encoding unit 115 encodes the acquired attribute video frame with an arbitrary encoding method for two-dimensional images, such as AVC or HEVC, for example. The attribute video frame encoding unit 115 supplies the multiplexer 118 with the encoded data of the attribute video frame, the encoded data being obtained by the encoding.
The occupancy-map encoding unit 116 performs processing related to encoding of a video frame of an occupancy map. For example, the occupancy-map encoding unit 116 acquires the occupancy map supplied from the packing unit 113. Furthermore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the occupancy-map encoding unit 116 encodes the acquired occupancy map with, for example, an arbitrary encoding method for two-dimensional images. The occupancy-map encoding unit 116 supplies the multiplexer 118 with the encoded data of the occupancy map, the encoded data being obtained by the encoding.
The angle video frame encoding unit 117 performs processing related to encoding of the angle video frame. For example, the angle video frame encoding unit 117 acquires the angle video frame supplied from the packing unit 113. Furthermore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the angle video frame encoding unit 117 encodes the acquired angle video frame with, for example, an arbitrary encoding method for two-dimensional images, such as AVC or HEVC. The angle video frame encoding unit 117 supplies the multiplexer 118 with the encoded data of the angle video frame, the encoded data being obtained by the encoding.
The multiplexer 118 performs processing related to multiplexing. For example, the multiplexer 118 acquires the encoded data of the auxiliary patch information, the encoded data being supplied from the auxiliary patch information compression unit 112. Furthermore, the multiplexer 118 acquires the encoded data of the geometry video frame, the encoded data being supplied from the geometry video frame encoding unit 114. Moreover, the multiplexer 118 acquires the encoded data of the attribute video frame, the encoded data being supplied from the attribute video frame encoding unit 115. Furthermore, the multiplexer 118 acquires the encoded data of the occupancy map, the encoded data being supplied from the occupancy-map encoding unit 116. Moreover, the multiplexer 118 acquires the encoded data of the angle video frame, the encoded data being supplied from the angle video frame encoding unit 117.
The multiplexer 118 multiplexes these pieces of acquired information to generate a bit stream. The multiplexer 118 outputs the generated bit stream to outside of the encoding apparatus 100.
In the encoding apparatus 100 having such a configuration, each of the processing units from the decomposition processing unit 111 to the multiplexer 118 performs the processing in <1. Two-dimensional image encoding of LiDAR data> by applying the above-described present technology. Therefore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the encoding apparatus 100 can reduce deterioration in quality of a point cloud.
Note that these processing units (the decomposition processing unit 111 to the multiplexer 118) have an arbitrary configuration. For example, each of the processing units may include a logic circuit that achieves the above-described processing. Furthermore, each of the processing units may have, for example, a central processing unit (CPU), a read only memory (ROM), a random access memory (RAM), and the like, and execute a program by using the CPU, the ROM, the RAM, and the like to achieve the above-described processing. Needless to say, each of the processing units may have both the configurations and part of the above-described processing may be achieved by the logic circuit, and another may be achieved by executing the program. The configurations of the processing units may be independent from each other. For example, among the processing units, some processing units may achieve part of the above-described processing with a logic circuit, some other processing units may achieve the above-described processing by executing a program, and still some other processing units may achieve the above-described processing with both a logic circuit and execution of a program.
<Flow of Encoding Processing>
An example of a flow of encoding processing executed by the encoding apparatus 100 will be described with reference to a flowchart in
When the encoding processing starts, in Step S101, the decomposition processing unit 111 of the encoding apparatus 100 decomposes geometry data to generate angle information (and depth information).
In Step S102, the auxiliary patch information compression unit 112 compresses auxiliary patch information obtained by the processing in Step S101.
In Step S103, the packing unit 113 generates a depth image by using the depth information generated in Step S101, and generates a geometry video frame. Furthermore, the packing unit 113 generates an angle image by using the angle information generated in Step S101, and generates an angle video frame. Moreover, the packing unit 113 generates an attribute video frame by using attribute data. Furthermore, the packing unit 113 generates an occupancy map corresponding to the depth image.
In Step S104, the geometry video frame encoding unit 114 encodes, with an encoding method for two-dimensional images, the geometry video frame obtained by the processing in Step S103.
In Step S105, the attribute video frame encoding unit 115 encodes, with the encoding method for two-dimensional images, the attribute video frame obtained by the processing in Step S103.
In Step S106, the occupancy-map encoding unit 116 encodes, with the encoding method for two-dimensional images, the occupancy map obtained by the processing in Step S103.
In Step S107, the angle video frame encoding unit 117 encodes, with the encoding method for two-dimensional images, the angle video frame obtained by the processing in Step S103.
In Step S108, the multiplexer 118 multiplexes various kinds of information generated as described above, and generates a bit stream including these pieces of information. In Step S109, the multiplexer 118 outputs the bit stream generated by the processing in Step S108 to the outside of the encoding apparatus 100. When the processing in Step S109 ends, the encoding processing ends.
In the encoding processing in such a flow, each of the processing units from the decomposition processing unit 111 to the multiplexer 118 performs the processing in <1. Two-dimensional image encoding of LiDAR data> by applying the above-described present technology. Therefore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the encoding apparatus 100 can reduce deterioration in quality of a point cloud.
<Decoding Apparatus>
Note that
As illustrated in
The demultiplexer 211 performs processing related to demultiplexing of data. For example, the demultiplexer 211 can acquire a bit stream input to the decoding apparatus 200. The bit stream is supplied from the encoding apparatus 100, for example.
Furthermore, the demultiplexer 211 can demultiplex the bit stream. For example, with demultiplexing, the demultiplexer 211 can extract encoded data of auxiliary patch information from the bit stream. Furthermore, with demultiplexing, the demultiplexer 211 can extract encoded data of a geometry video frame from the bit stream. Moreover, with demultiplexing, the demultiplexer 211 can extract encoded data of an attribute video frame from the bit stream. Furthermore, with demultiplexing, the demultiplexer 211 can extract encoded data of an occupancy map from the bit stream. Moreover, with demultiplexing, the demultiplexer 211 can extract encoded data of an angle video frame from the bit stream.
Moreover, the demultiplexer 211 can supply extracted data to the processing units in subsequent stages. For example, the demultiplexer 211 can supply the extracted encoded data of the auxiliary patch information to the auxiliary patch information decoding unit 212. Furthermore, the demultiplexer 211 can supply the extracted encoded data of the geometry video frame to the geometry video frame decoding unit 213. Moreover, the demultiplexer 211 can supply the extracted encoded data of the attribute video frame to the attribute video frame decoding unit 214. Furthermore, the demultiplexer 211 can supply the extracted encoded data of the occupancy map to the occupancy-map decoding unit 215. Moreover, the demultiplexer 211 can supply the extracted encoded data of the angle video frame to the angle video frame decoding unit 216.
The auxiliary patch information decoding unit 212 performs processing related to decoding of the encoded data of the auxiliary patch information. For example, the auxiliary patch information decoding unit 212 can acquire the encoded data of the auxiliary patch information, the encoded data being supplied from the demultiplexer 211. Furthermore, the auxiliary patch information decoding unit 212 can decode the encoded data to generate auxiliary patch information. The auxiliary patch information may include information regarding prediction that derives a prediction value of angle information. Furthermore, the decoding method is arbitrary as long as the method is a method (decoding method not for two-dimensional images) corresponding to an encoding method applied at a time of encoding (for example, an encoding method applied by an auxiliary patch information compression unit 112). Moreover, the auxiliary patch information decoding unit 212 supplies the auxiliary patch information to the 3D reconstruction unit 218.
The geometry video frame decoding unit 213 performs processing related to decoding of the encoded data of the geometry video frame. For example, the geometry video frame decoding unit 213 can acquire the encoded data of the geometry video frame, the encoded data being supplied from the demultiplexer 211. Furthermore, the geometry video frame decoding unit 213 can decode the encoded data to generate a geometry video frame. Moreover, the geometry video frame decoding unit 213 can supply the geometry video frame to the unpacking unit 217.
The attribute video frame decoding unit 214 performs processing related to decoding of the encoded data of the attribute video frame. For example, the attribute video frame decoding unit 214 can acquire the encoded data of the attribute video frame, the encoded data being supplied from the demultiplexer 211. Furthermore, the attribute video frame decoding unit 214 can decode the encoded data to generate an attribute video frame. Moreover, the attribute video frame decoding unit 214 can supply the attribute video frame to the unpacking unit 217.
The occupancy-map decoding unit 215 performs processing related to decoding of the encoded data of the occupancy map. For example, the occupancy-map decoding unit 215 can acquire the encoded data of the occupancy map, the encoded data being supplied from the demultiplexer 211. Furthermore, the occupancy-map decoding unit 215 can decode the encoded data to generate an occupancy map. Moreover, the occupancy-map decoding unit 215 can supply the occupancy map to the unpacking unit 217.
The angle video frame decoding unit 216 performs processing related to decoding of the encoded data of the angle video frame. For example, the angle video frame decoding unit 216 can acquire the encoded data of the angle video frame, the encoded data being supplied from the demultiplexer 211. Furthermore, the angle video frame decoding unit 216 can decode the encoded data to generate an angle video frame. Moreover, the angle video frame decoding unit 216 can supply the angle video frame to the unpacking unit 217.
The unpacking unit 217 performs processing related to unpacking. For example, the unpacking unit 217 can acquire the geometry video frame supplied from the geometry video frame decoding unit 213. Moreover, the unpacking unit 217 can acquire the attribute video frame supplied from the attribute video frame decoding unit 214. Furthermore, the unpacking unit 217 can acquire the occupancy map supplied from the occupancy-map decoding unit 215. Moreover, the unpacking unit 217 can acquire the angle video frame supplied from the angle video frame decoding unit 216.
Moreover, the unpacking unit 217 can unpack, on the basis of the occupancy map, the geometry video frame, the attribute video frame, and the angle video frame, and can generate (restore) depth information, angle information, attribute data, and the like.
Furthermore, the unpacking unit 217 can supply the 3D reconstruction unit 218 with the generated depth information, angle information, attribute data, and the like.
The 3D reconstruction unit 218 performs processing related to reconstruction of 3D data. For example, the 3D reconstruction unit 218 can acquire the auxiliary patch information supplied from the auxiliary patch information decoding unit 212. Furthermore, the 3D reconstruction unit 218 can acquire the depth information, angle information, attribute data (that is, geometry data and attribute data), and the like supplied from the unpacking unit 217. The 3D reconstruction unit 218 reconstructs the point cloud (LiDAR data) by using these pieces of information.
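As a minimal sketch of this reconstruction, assuming the decoded angle images hold θ and φ directly in radians (not residuals) and the depth image holds the distance r from the observation point, the reconstruction amounts to a spherical-to-Cartesian conversion restricted to occupied pixels. The function below is an illustration, not the defined processing of the 3D reconstruction unit 218.

```python
import numpy as np

def reconstruct_points(depth_image, theta_image, phi_image, occupancy_map):
    """Rebuild (x, y, z) points from decoded depth and angle images.
    Assumes the angle images already hold theta/phi in radians and the
    depth image holds the distance r from the observation point."""
    mask = occupancy_map == 1
    r = depth_image[mask].astype(np.float64)
    theta = theta_image[mask].astype(np.float64)
    phi = phi_image[mask].astype(np.float64)
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)
    return np.stack([x, y, z], axis=-1)  # one (x, y, z) row per reconstructed point
```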
The 3D reconstruction unit 218 outputs the point cloud data (LiDAR data) obtained by such processing to outside of the decoding apparatus 200. The point cloud data (LiDAR data) is rendered and displayed on a display unit, recorded on a recording medium, or supplied to another apparatus via communication, for example.
In the decoding apparatus 200 having such a configuration, each of the processing units from the demultiplexer 211 to the 3D reconstruction unit 218 performs the processing in <1. Two-dimensional image encoding of LiDAR data> by applying the above-described present technology. Therefore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the decoding apparatus 200 can reduce deterioration in quality of a point cloud.
Note that these processing units (the demultiplexer 211 to the 3D reconstruction unit 218) have an arbitrary configuration. For example, each of the processing units may include a logic circuit that achieves the above-described processing. Furthermore, each of the processing units may have, for example, a CPU, a ROM, a RAM, and the like, and execute a program by using the CPU, the ROM, the RAM, and the like to achieve the above-described processing. Needless to say, each of the processing units may have both the configurations and part of the above-described processing may be achieved by the logic circuit, and another may be achieved by executing the program. The configurations of the processing units may be independent from each other. For example, among the processing units, some processing units may achieve part of the above-described processing with a logic circuit, some other processing units may achieve the above-described processing by executing a program, and still some other processing units may achieve the above-described processing with both a logic circuit and execution of a program.
<Flow of Decoding Processing>
An example of a flow of decoding processing executed by such a decoding apparatus 200 will be described with reference to a flowchart in
When the decoding processing starts, in Step S201, the demultiplexer 211 of the decoding apparatus 200 demultiplexes a bit stream.
In Step S202, the auxiliary patch information decoding unit 212 decodes encoded data of auxiliary patch information, the encoded data being extracted from the bit stream by the processing in Step S201.
In Step S203, the geometry video frame decoding unit 213 decodes encoded data of a geometry video frame, the encoded data being extracted from the bit stream by the processing in Step S201.
In Step S204, the attribute video frame decoding unit 214 decodes encoded data of an attribute video frame, the encoded data being extracted from the bit stream by the processing in Step S201.
In Step S205, the occupancy-map decoding unit 215 decodes encoded data of an occupancy map, the encoded data being extracted from the bit stream by the processing in Step S201.
In Step S206, the angle video frame decoding unit 216 decodes encoded data of an angle video frame, the encoded data being extracted from the bit stream by the processing in Step S201.
In Step S207, the unpacking unit 217 unpacks the geometry video frame on the basis of the occupancy map or the like, and generates (restores) depth information. Furthermore, the unpacking unit 217 unpacks the attribute video frame on the basis of the occupancy map or the like, and generates (restores) attribute data. Moreover, the unpacking unit 217 unpacks the angle video frame on the basis of the occupancy map or the like, and generates (restores) angle information.
In Step S208, the 3D reconstruction unit 218 reconstructs the point cloud data (LiDAR data) on the basis of the auxiliary patch information generated in Step S202 and various kinds of information obtained in Step S207. When the processing in Step S208 ends, the decoding processing ends.
In the decoding processing in such a flow, each of the processing units from the demultiplexer 211 to the 3D reconstruction unit 218 performs the processing in <1. Two-dimensional image encoding of LiDAR data> by applying the above-described present technology. Therefore, as described above in <1. Two-dimensional image encoding of LiDAR data>, the decoding apparatus 200 can reduce deterioration in quality of a point cloud.
<4. Supplementary Note>
<Point Cloud Data>
Although LiDAR data has been described above as an example of the point cloud data, the point cloud data may be other than LiDAR data. That is, the present technology can be applied to arbitrary point cloud data.
<Computer>
The above-described series of processing can be executed by hardware or can be executed by software. When a series of processing is executed by software, a program included in the software is installed on a computer. Here, examples of the computer include a computer incorporated in dedicated hardware, a general-purpose personal computer, for example, which is capable of executing various kinds of functions by installing various kinds of programs, or the like.
In a computer 900 illustrated in the drawing, a central processing unit (CPU) 901, a read only memory (ROM) 902, and a random access memory (RAM) 903 are mutually connected via a bus 904.
Furthermore, an input/output interface 910 is connected to the bus 904. An input unit 911, an output unit 912, a storage unit 913, a communication unit 914, and a drive 915 are connected to the input/output interface 910.
The input unit 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, or the like. The output unit 912 includes, for example, a display, a speaker, an output terminal, or the like. The storage unit 913 includes, for example, a hard disk, a RAM disk, a non-volatile memory, or the like. The communication unit 914 includes, for example, a network interface. The drive 915 drives a removable medium 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
In a computer configured as above, the series of processing described above is performed by the CPU 901 loading, for example, a program stored in the storage unit 913 to the RAM 903 via the input/output interface 910 and the bus 904 and executing the program. As appropriate, the RAM 903 also stores data necessary for the CPU 901 to execute various kinds of processing.
A program executed by the computer can be applied by being recorded on the removable medium 921 as a package medium, or the like, for example. In this case, the program can be installed on the storage unit 913 via the input/output interface 910 by attaching the removable medium 921 to the drive 915.
Furthermore, the program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting. In this case, the program can be received by the communication unit 914 and installed on the storage unit 913.
In addition, the program can be installed on the ROM 902 or the storage unit 913 in advance.
<Object to which Present Technology is Applied>
Although an encoding apparatus, a decoding apparatus, and the like have been described above as application examples of the present technology, the present technology can be applied to an arbitrary configuration.
For example, the present technology may be applied to various electronic devices such as a transmitter or receiver (for example, a television receiver or a mobile phone) for satellite broadcasting, wired broadcasting such as cable television, distribution on the Internet, or distribution to a terminal by cellular communication, or an apparatus (for example, a hard disk recorder or a camera) that records an image on a medium such as an optical disk, a magnetic disk, or a flash memory, or reproduces an image from such a storage medium.
Furthermore, for example, the present technology can also be implemented as a partial configuration of an apparatus, such as a processor (for example, a video processor) as a system large scale integration (LSI) or the like, a module (for example, a video module) using a plurality of processors or the like, a unit (for example, a video unit) using a plurality of modules or the like, or a set (for example, a video set) obtained by further adding another function to the unit.
Furthermore, for example, the present technology can also be applied to a network system including a plurality of apparatuses. For example, the present technology may be implemented as cloud computing in which the present technology is shared and processed jointly by a plurality of apparatuses via a network. For example, the present technology may be implemented in a cloud service that provides a service related to an image (moving image) to an arbitrary terminal such as a computer, an audio visual (AV) apparatus, a portable information processing terminal, or an Internet of Things (IoT) device.
Note that, in the present description, the system means a set of a plurality of components (apparatuses, modules (parts), or the like) without regard to whether or not all the components are in the same housing. Therefore, a plurality of apparatuses housed in separate housings and connected via a network, and one apparatus storing a plurality of modules in one housing are both systems.
<Field/Application to which Present Technology is Applicable>
A system, apparatus, processing unit, or the like to which the present technology is applied may be utilized in an arbitrary field such as traffic, medical care, crime prevention, agriculture, stockbreeding, mining, beauty, factory, home appliance, weather, or nature monitoring, for example. Furthermore, application thereof is also arbitrary.
<Others>
Furthermore, various kinds of information (metadata or the like) related to coded data (a bit stream) may be transmitted or recorded in any form as long as the information is associated with the coded data. Here, the term “associate” means, for example, that, when one piece of data is processed, another piece of data may be utilized (may be linked). That is, pieces of data associated with each other may be combined into one piece of data or may be separate pieces of data. For example, information associated with coded data (an image) may be transmitted on a transmission path different from the transmission path of the coded data (the image).
Furthermore, for example, information associated with coded data (image) may be recorded on a recording medium different from (or the same as) a recording medium of the coded data (image). Note that the “association” may be applied to a part of data instead of the entire data. For example, an image and information corresponding to the image may be associated with each other in arbitrary units such as in units of a plurality of frames, one frame, or a part in a frame.
Note that, in the present specification, terms such as “combine”, “multiplex”, “add”, “integrate”, “include”, “store”, “contain”, “put in”, or “insert” mean to collect a plurality of items into one, for example, to combine coded data and metadata into one piece of data, and mean one method of the above-described “association”.
Furthermore, embodiments of the present technology are not limited to the above-described embodiments, and various changes can be made without departing from the scope of the present technology.
For example, a configuration described as one apparatus (or processing unit) may be divided and configured as a plurality of apparatuses (or processing units). To the contrary, the configurations described above as a plurality of apparatuses (or processing units) may be collectively configured as one apparatus (or processing unit). Furthermore, needless to say, a configuration other than the configurations described above may be added to a configuration of each apparatus (or each processing unit). Moreover, if a configuration and operation of an entire system are substantially the same, a part of a configuration of a certain apparatus (or processing unit) may be included in a configuration of another apparatus (or another processing unit).
Furthermore, for example, the above-described programs may be executed in an arbitrary apparatus. In that case, the apparatus is only required to have a necessary function (function block, or the like) so that necessary information can be acquired.
Furthermore, for example, each step in one flowchart may be executed by one apparatus, or may be executed by being shared by a plurality of apparatuses. Moreover, when a plurality of pieces of processing is included in one step, the plurality of pieces of processing may be executed by one apparatus, or may be executed by being shared by a plurality of apparatuses. In other words, a plurality of pieces of processing included in one step can be executed as processing of a plurality of steps. To the contrary, pieces of processing described as a plurality of steps can be collectively executed as one step.
Furthermore, for example, a program executed by the computer may be a program in which processing of steps describing the program is executed in time series in an order described in the present description, or a program in which the processing may be executed in parallel, or separately at a necessary timing such as when a call is made. That is, unless there is a contradiction, the processing of each step may be executed in an order different from the order described above. Moreover, the processing of steps describing the program may be executed in parallel with processing of another program, or may be executed in combination with processing of another program.
Furthermore, for example, a plurality of techniques related to the present technology can be implemented independently as a single piece unless there is a contradiction. Needless to say, a plurality of arbitrary pieces of the present technology can be used in combination. For example, a part or all of the present technology described in any of the embodiments can be implemented in combination with a part or all of the present technology described in another embodiment. Furthermore, any part or all of the present technology described above can be implemented in combination with another technology not described above.
Note that the present technology can have the following configurations.
(1) An image processing apparatus including
(2) The image processing apparatus according to (1),
(3) The image processing apparatus according to (1),
(4) The image processing apparatus according to (1),
(5) The image processing apparatus according to (4), the image processing apparatus further including
(6) The image processing apparatus according to (1), the image processing apparatus further including an angular component extraction unit that extracts the angular component from the geometry data,
(7) The image processing apparatus according to (1), the image processing apparatus further including
(8) The image processing apparatus according to (7),
(9) The image processing apparatus according to (8),
(10) An image processing method including
(11) An image processing apparatus including
(12) The image processing apparatus according to (11),
(13) The image processing apparatus according to (11),
(14) The image processing apparatus according to
(15) The image processing apparatus according to (14), the image processing apparatus further including
(16) The image processing apparatus according to (11), the image processing apparatus further including
(17) The image processing apparatus according to (16), the image processing apparatus further including
(18) The image processing apparatus according to (16),
(19) The image processing apparatus according to (18), the image processing apparatus further including
(20) An image processing method including
Priority application: 2020-148334, Sep 2020, JP (national).
International filing: PCT/JP2021/030600, filed 8/20/2021 (WO).