The disclosure relates to the field of communication technologies, and in particular, to an immersive media data processing method and apparatus, a device, and a storage medium.
Immersive media is media content that delivers an immersive experience to a business object. Point cloud media may be immersive media. In the related art, although division of point cloud slices is supported and the point cloud slices are identified, a scenario of a plurality of point cloud slices existing in one point cloud frame and how to identify different point cloud slices in this scenario are not considered. In this case, encoding and file encapsulation based on the point cloud slices cannot be fully supported, consequently affecting decoding and presentation efficiency of point cloud media.
Provided are an immersive media data processing method and apparatus, a device, a storage medium, and a program product, which can implement encoding and file encapsulation based on point cloud slices, and improve decoding and presentation efficiency of point cloud media.
According to some embodiments, an immersive media data processing method, performed by a computer device, includes: obtaining a media file resource of immersive media; decapsulating the media file resource to obtain a point cloud bitstream including a plurality of point cloud frames and slice information indicating one or more point cloud slices in a point cloud frame; and decoding the point cloud bitstream based on the slice information.
According to some embodiments, an immersive media data processing apparatus includes: at least one memory configured to store computer program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: obtaining code configured to cause at least one of the at least one processor to obtain a media file resource of immersive media; decapsulation code configured to cause at least one of the at least one processor to decapsulate the media file resource to obtain a point cloud bitstream including a plurality of point cloud frames and slice information indicating each of one or more point cloud slices in a point cloud frame; and decoding code configured to cause at least one of the at least one processor to decode the point cloud bitstream based on the slice information.
According to some embodiments, a non-transitory computer-readable storage medium, storing computer code which, when executed by at least one processor, causes the at least one processor to at least: obtain a media file resource of immersive media; decapsulate the media file resource to obtain a point cloud bitstream including a plurality of point cloud frames and slice information indicating each of one or more point cloud slices in a point cloud frame; and decode the point cloud bitstream based on the slice information.
To describe the technical solutions of some embodiments of this disclosure more clearly, the following briefly introduces the accompanying drawings for describing some embodiments. The accompanying drawings in the following description show only some embodiments of the disclosure, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts. In addition, one of ordinary skill would understand that aspects of some embodiments may be combined together or implemented alone.
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the following further describes the present disclosure in detail with reference to the accompanying drawings. The described embodiments are not to be construed as a limitation to the present disclosure. All other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present disclosure.
In the following descriptions, related “some embodiments” describe a subset of all possible embodiments. However, it may be understood that the “some embodiments” may be the same subset or different subsets of all the possible embodiments, and may be combined with each other without conflict. As used herein, each of such phrases as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B, or C,” “at least one of A, B, and C,” and “at least one of A, B, or C,” may include all possible combinations of the items enumerated together in a corresponding one of the phrases. For example, the phrase “at least one of A, B, and C” includes within its scope “only A”, “only B”, “only C”, “A and B”, “B and C”, “A and C” and “all of A, B, and C.”
The following describes some technical terms involved in some embodiments:
Immersive media refers to a media file that can provide immersive media content, so that a business object immersed in the media content can get real-world sensory experiences such as a visual experience and an auditory experience. Immersive media may be classified into 3DoF media, 3DoF+ media, and 6DoF media based on a degree of freedom (DoF) of a business object when consuming media content. Point cloud media may be 6DoF media. In some embodiments, a user that consumes immersive media (for example, point cloud media) may be referred to as a business object.
A point cloud is a set of irregularly distributed discrete points in space that expresses a spatial structure and a surface attribute of a three-dimensional object or scene. Each point in the point cloud has at least three-dimensional position information, and may also have color, material, or other information based on different application scenarios. Usually, each point in the point cloud has the same number of additional attributes.
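The structure described above can be illustrated with a minimal Python sketch. The class and field names here are purely illustrative (they are not taken from any point cloud codec or library): each point carries a mandatory 3D position plus a uniform set of optional attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Point:
    # Every point has at least three-dimensional position information.
    x: float
    y: float
    z: float
    # Additional per-point attributes (e.g., color, material). Within one
    # point cloud, every point carries the same set of attribute names.
    attributes: Dict[str, object] = field(default_factory=dict)

def make_cloud(positions: List[Tuple[float, float, float]],
               attribute_names: List[str]) -> List[Point]:
    """Build a point cloud in which each point has the same attribute set."""
    return [
        Point(x, y, z, {name: None for name in attribute_names})
        for (x, y, z) in positions
    ]

cloud = make_cloud([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], ["color", "material"])
```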
The point cloud can flexibly and conveniently express a spatial structure and a surface attribute of a three-dimensional object or scene, and is therefore widely applied to virtual reality (VR) gaming, computer aided design (CAD), geographic information systems (GIS), autonomous navigation systems (ANS), digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, three-dimensional reconstruction of biological tissues and organs, and the like.
A point cloud may be obtained in the following ways: generation by computers, three-dimensional (3D) laser scanning, and 3D photogrammetry. A computer can generate point clouds of a virtual 3D object and scene. By 3D laser scanning, a point cloud with millions of points can be obtained per second for a static real-world 3D object or scene. By 3D photogrammetry, a point cloud with tens of millions of points can be obtained per second for a dynamic real-world 3D object or scene. In addition, in the medical field, point clouds of a biological tissue and organ can be obtained by magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information. These techniques reduce costs and time cycles for obtaining point cloud data, and improve data precision. Changes in manners of obtaining point cloud data may facilitate obtaining a large volume of point cloud data. With continuous accumulation of large-scale point cloud data, efficient storage, transmission, distribution, sharing, and standardization of point cloud data become the key to point cloud applications.
A track is a media data set in a media file encapsulation process, including a plurality of time sequential samples. One media file may include one or more tracks. For example, one media file may include one video media track, one audio media track, and one subtitle media track. Metadata information may also be included in a file as a media type in a form of a metadata media track.
A sample is an encapsulation unit in a media file encapsulation process. One track includes many samples. Each sample corresponds to timestamp information. For example, one video media track may include many samples, and one sample is usually one video frame. In another example, one sample in a point cloud media track may be one point cloud frame.
Sample number: a number of a sample. A number of the first sample in a track is 1.
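The track, sample, and sample number concepts defined above can be summarized in a short Python sketch. These class names are illustrative only and are not part of the ISOBMFF specification; the sketch simply shows that a track is an ordered set of timestamped samples and that sample numbers are 1-based.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    # One encapsulation unit. For a point cloud media track,
    # one sample is typically one point cloud frame.
    timestamp: int   # timestamp information associated with the sample
    payload: bytes   # encoded media data of the sample

@dataclass
class Track:
    # A media data set in the file encapsulation process, holding
    # a plurality of time-sequential samples.
    media_type: str        # e.g., "video", "audio", "point_cloud"
    samples: List[Sample]

    def sample_number(self, index: int) -> int:
        # The number of the first sample in a track is 1.
        return index + 1

track = Track("point_cloud", [Sample(0, b""), Sample(40, b"")])
```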
Some embodiments relate to an immersive media data processing technique. The following describes some concepts in an immersive media data processing process. Some embodiments are described by using an example in which immersive media is point cloud media.
Refer to
The point cloud collection is to convert, into binary digital information, point cloud data collected by a plurality of cameras from different angles. The binary digital information obtained through conversion from the point cloud data is a binary data stream, and may also be referred to as a bitstream of the point cloud data. The point cloud encoding is to convert a file of an original video format into a file of another video format through a compression technique. In terms of obtaining point cloud data, the point cloud data may be captured by a camera or generated by a computer. Different statistical characteristics may correspond to different compression encoding manners. Used compression encoding manners may include the international video encoding standard High Efficiency Video Coding (HEVC)/H.265, the international video encoding standard Versatile Video Coding (VVC)/H.266, the Chinese national video encoding standard Audio Video Coding Standard (AVS), the third-generation video encoding standard introduced by the AVS standard group (AVS3), and the like.
After point cloud encoding, an encoded data stream (for example, a point cloud bitstream) may be encapsulated, and the encoded data stream may be transmitted to a business object. The point cloud file encapsulation is to store an encoded and compressed point cloud bitstream in a file in an encapsulation format (or a container or a file container). Encapsulation formats may include an audio video interleaved (AVI) format or an ISOBMFF format. In some embodiments, a point cloud bitstream is encapsulated in a file container in an ISOBMFF file format to form a point cloud file (which may also be referred to as a media file, an encapsulated file, or a video file). The point cloud file may include a plurality of tracks, for example, may include one video track, one audio track, and one subtitle track.
After performing the encoding process and the file encapsulation process, a content production device may transmit the point cloud file to a client on a content consumption device. The client may perform inverse operations such as decapsulation and decoding to present final media content in the client. The point cloud file may be sent to the client based on various transport protocols. The transport protocols may include, but are not limited to: a dynamic adaptive streaming over HTTP (DASH) protocol, an HTTP live streaming (HLS) protocol, a smart media transport protocol (SMTP), and a transmission control protocol (TCP).
The file decapsulation process of the client is inverse to the file encapsulation process. The client may decapsulate the point cloud file according to the file encapsulation format, to obtain the point cloud bitstream. The decoding process of the client is also inverse to the encoding process. For example, the client may decode the point cloud bitstream to restore and present the media content.
For ease of understanding, refer to
The immersive media data processing technique involved in some embodiments may be implemented based on cloud technology. For example, a cloud server is used as the content production device. Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and network in a wide area network or a local area network, to achieve data computing, storage, processing, and sharing.
A point cloud media data processing process includes a data processing process on the content production device side and a data processing process on the content consumption device side.
The data processing process on the content production device side may include: (1) a process of obtaining and producing media content of the point cloud media; and (2) a process of encoding and file encapsulation of the point cloud media. The data processing process on the content consumption device side may include: (1) a process of file decapsulation and decoding of the point cloud media; and (2) a process of rendering the point cloud media. In addition, a transmission process of the point cloud media is involved between the content production device and the content consumption device. The transmission process may be implemented based on various transport protocols. The transport protocols may include, but are not limited to, the DASH protocol, the HLS protocol, the SMTP protocol, and the TCP protocol.
The processes involved in the point cloud media data processing process are described in detail below with reference to
1. Data processing process on content production device side:
(1) Process of obtaining and producing media content of point cloud media.
1) Process of obtaining media content of point cloud media.
The media content of the point cloud media is obtained by a capture device collecting a real-world sound-visual scene. In some embodiments, the capture device may be a hardware component configured in the content production device. For example, the capture device is a microphone, camera, or sensor of a terminal. In some embodiments, the capture device may be a hardware apparatus connected to the content production device, for example, a camera connected to a server, for providing a service of obtaining the media content of the point cloud media for the content production device. The capture device may include, but is not limited to: an audio device, a camera device, and a sensing device. The audio device may include an audio sensor, a microphone, and the like. The camera device may include a camera, a stereo camera, a light-field camera, and the like. The sensing device may include a laser device, a radar device, and the like. There may be a plurality of capture devices. These capture devices are deployed at some positions in real space to capture audio content and video content from different angles in the space. The captured audio content and video content are synchronized in both time and space. In some embodiments, media content in three-dimensional space that is for providing a multi-degree of freedom (for example, 6DoF) viewing experience and that is collected by a capture device deployed at a position may be referred to as point cloud media.
For example, video content of point cloud media is obtained. As shown in
The process of producing the media content of the point cloud media involved in some embodiments may be understood as a process of content producing of the point cloud media, and the content producing of the point cloud media may be implemented by producing content in a form of point cloud data obtained through capturing by cameras or camera arrays deployed at a plurality of positions. For example, the content production device may convert the point cloud media from a three-dimensional representation to a two-dimensional representation. The point cloud media may include geometry information, attribute information, occupancy map information, atlas data, and the like. The point cloud media may be processed before encoding. For example, point cloud data may be cut and mapped before encoding.
The description is as follows: (1) Three-dimensional representation data (for example, the point cloud data) of the collected and inputted point cloud media is projected to a two-dimensional plane, usually by orthogonal projection, perspective projection, or equi-rectangular projection (ERP). The point cloud media projected to the two-dimensional plane is represented by data of a geometry component, an occupancy component, and an attribute component. The data of the geometry component provides position information of each point of the point cloud media in three-dimensional space. The data of the attribute component provides an additional attribute (such as color, texture, or material information) of each point of the point cloud media. The data of the occupancy component indicates whether data in another component is associated with the point cloud media.
In some embodiments, in the process of content producing of the point cloud media, the geometry component is mandatory, and the occupancy component is conditionally mandatory.
In addition, a panoramic video can be captured by the capture device. After such a video is processed by the content production device and transmitted to the content consumption device for corresponding data processing, a business object on the content consumption device side can perform some actions (for example, head rotation) to view 360-degree video information, while other actions (for example, head movement) do not cause corresponding video changes, resulting in a poor VR experience. In this case, depth information that matches the panoramic video may be provided so that the business object obtains immersion and a better VR experience, which involves the 6DoF production technology. 6DoF means that the business object can move freely in the simulated scene. When the video content of the point cloud media is produced by using the 6DoF production technology, the capture device is usually a laser device or a radar device that captures point cloud data in space.
Captured audio content may be directly audio-encoded to form an audio bitstream of the point cloud media. Captured video content may be video-encoded to obtain a video bitstream of the point cloud media. If the 6DoF production technology is used, video encoding may use a specific encoding manner (for example, video-based point cloud compression). The audio bitstream and the video bitstream are encapsulated in a file container in a file format (for example, ISOBMFF) of the point cloud media to form a media file resource of the point cloud media. The media file resource may be a media file of the point cloud media, or media fragments that form such a media file. Metadata information (or "metadata") of the media file resource of the point cloud media is recorded by using media presentation description (MPD) information based on the file format of the point cloud media. The metadata information is a generic term for information related to presentation of the point cloud media. The metadata information may include description information of the media content, description information of a window, signaling information related to presentation of the media content, and the like. The content production device stores the media presentation description information and the media file resource that are formed through data processing. The media presentation description information may be added to the media file resource and delivered to the content consumption device.
Collected audio is encoded into a corresponding audio bitstream. Geometry information, attribute information, and occupancy map information of the point cloud media may be encoded in a video encoding manner, and atlas data of the point cloud media may be encoded in an entropy encoding manner. Encoded media is encapsulated in a file container in a format (such as ISOBMFF or HNSS), and is combined with metadata describing a media content attribute and window metadata to form a media file or to form an initialization segment and a media segment in a media file format.
For example, as shown in
The content consumption device may adaptively and dynamically obtain the media file resource and the corresponding media presentation description information of the point cloud media from the content production device through recommendation of the content production device or based on the business object on the content consumption device side. For example, the content consumption device may determine a viewing direction and a viewing position of the business object based on head/eye position information of the business object, and dynamically request the corresponding media file resource from the content production device based on the determined viewing direction and viewing position. The media file resource and the media presentation description information are transmitted from the content production device to the content consumption device by using a transmission mechanism (such as DASH or SMT). The process of file decapsulation on the content consumption device side is inverse to the process of file encapsulation on the content production device. The content consumption device decapsulates the media file resource based on a file format (for example, ISOBMFF) of the point cloud media, to obtain the corresponding audio bitstream and video bitstream. The process of decoding on the content consumption device side is inverse to the process of encoding on the content production device side. The content consumption device performs audio decoding on the audio bitstream to restore audio content. The content consumption device performs video decoding on the video bitstream to restore video content.
For example, as shown in
The content consumption device renders, based on rendering-related metadata in the media presentation description information corresponding to the media file resource, audio content obtained through audio decoding and video content obtained through video decoding, to play and output the content.
The immersive media system supports a data box. A data box is a data block or object that includes metadata of corresponding media content. During actual application, the content production device may use the data box to guide the content consumption device in consuming the media file of the point cloud media. The point cloud media may include a plurality of data boxes, for example, an ISOBMFF box that includes metadata describing corresponding information in file encapsulation, such as point cloud slice information related to each track for file encapsulation based on point cloud slices.
For example, as shown in
Based on the above, the content consumption device may dynamically obtain a media file resource corresponding to the point cloud media from the content production device side. The media file resource is obtained by encoding and encapsulating captured audio and video content by the content production device. In this case, based on receiving the media file resource returned by the content production device, the content consumption device may first decapsulate the media file resource to obtain corresponding audio and video bitstreams, decode the audio and video bitstreams, and present decoded audio and video content to the business object. The point cloud media may include, but is not limited to, video-based point cloud compression (VPCC) point cloud media and GPCC point cloud media.
A point cloud sequence is the highest-level syntactic structure of a point cloud bitstream. A point cloud sequence starts with sequence header information (or “sequence header”) and is followed by one or more point cloud frames. Each point cloud frame may include geometry header information (or “geometry header”), attribute header information (or “attribute header”), and one or more point cloud slices. The point cloud slice includes a geometry slice header, geometry data, an attribute slice header, and attribute data. When one point cloud frame includes a plurality of point cloud slices, the plurality of point cloud slices cannot be identified and distinguished by a point cloud slice indication method in the related art. Accordingly, a method is provided for encoding and encapsulating a point cloud slice, which may guide decoding, transmission, and presentation of point cloud media. The content production device may encode obtained point cloud data to obtain a point cloud bitstream including slice information. The slice information may indicate each of M point cloud slices included in a point cloud frame in the current point cloud bitstream, where M is a positive integer. For example, even when a point cloud frame includes a plurality of (M>1) point cloud slices, different point cloud slices can be distinguished. Further, the content production device may encapsulate the point cloud bitstream based on the slice information into a media file resource of immersive media, and transmit the media file resource to the content consumption device. Correspondingly, the content consumption device may decapsulate the obtained media file resource to obtain the point cloud bitstream including the slice information, may decode the obtained point cloud bitstream based on the slice information, and may present decoded point cloud data. 
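The hierarchy described above (sequence header, then frames, each with geometry and attribute headers and one or more slices) can be sketched in Python. The class names are illustrative, not part of any codec specification; the sketch shows how, under this model, slices of the same frame are distinguished by per-slice identifiers within the frame.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PointCloudSlice:
    # A slice carries a geometry slice header, geometry data,
    # an attribute slice header, and attribute data.
    slice_id: int
    geometry_slice_header: bytes
    geometry_data: bytes
    attribute_slice_header: bytes
    attribute_data: bytes

@dataclass
class PointCloudFrame:
    # Each frame has a geometry header, an attribute header,
    # and one or more point cloud slices.
    frame_id: int
    geometry_header: bytes
    attribute_header: bytes
    slices: List[PointCloudSlice] = field(default_factory=list)

@dataclass
class PointCloudSequence:
    # Highest-level structure: a sequence header followed by frames.
    sequence_header: bytes
    frames: List[PointCloudFrame] = field(default_factory=list)

seq = PointCloudSequence(sequence_header=b"seq_hdr")
frame = PointCloudFrame(frame_id=0, geometry_header=b"geo_hdr",
                        attribute_header=b"attr_hdr")
frame.slices = [
    PointCloudSlice(slice_id=i, geometry_slice_header=b"", geometry_data=b"",
                    attribute_slice_header=b"", attribute_data=b"")
    for i in range(2)  # a frame with M = 2 slices
]
seq.frames.append(frame)
```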
Some embodiments support a scenario in which a single point cloud frame includes a plurality of point cloud slices, to implement encoding and file encapsulation based on the point cloud slices. Correspondingly, the content consumption device can decode the corresponding point cloud slices based on a business object and the slice information, to implement partial decoding, thereby improving decoding and presentation efficiency of the point cloud media.
The method provided in some embodiments is applicable to a server side (for example, the content production device side), a player side (for example, the content consumption device side), and an intermediate node (such as a smart media transport (SMT) receiving entity or SMT sending entity) of the immersive media system. For a process in which the content production device encodes the point cloud data to obtain the point cloud bitstream including the slice information and encapsulates the point cloud bitstream based on the slice information into the media file resource, and for a process in which the content consumption device decapsulates the media file resource to obtain the corresponding point cloud bitstream and decodes the point cloud bitstream based on the included slice information, refer to the following descriptions of some embodiments as illustrated in
Further, refer to
A server may encapsulate a point cloud bitstream including slice information to obtain a media file resource, and may send the obtained media file resource to the client for consumption. Correspondingly, the client may obtain the media file resource and decapsulate the media file resource to obtain the corresponding point cloud bitstream. The point cloud bitstream includes a plurality of point cloud frames and the slice information. The slice information indicates each of M point cloud slices included in a point cloud frame in the point cloud bitstream, where M is a positive integer. A number of the point cloud frames and a number of point cloud slices included in each point cloud frame are not limited herein. In some embodiments, each point cloud slice has corresponding slice information. For example, using an mth point cloud slice in the M point cloud slices as an example (where m is a positive integer less than or equal to M), slice information corresponding to the mth point cloud slice may include all information related to the point cloud slice, such as a geometry slice header, geometry data, an attribute slice header, and attribute data that are related to the point cloud slice, which may include an identifier of the point cloud slice, an identifier of a point cloud frame to which the point cloud slice belongs, and the like. For a process in which the server encodes point cloud data and performs file encapsulation on an obtained point cloud bitstream, refer to a subsequent description in some embodiments as illustrated in
To support the operations in some embodiments, based on the related art, in some embodiments, several descriptive fields are added at a system layer, including a high-level syntactic level, a file encapsulation level, and a transmission signaling level. Subsequently, using an extended AVS encoding high-level syntactic structure, an ISOBMFF data box, and DASH signaling as examples, a relevant field is defined to support indications of point cloud slice encoding, point cloud file encapsulation, and transmission signaling.
The following describes in detail a relevant field extended in AVS GPCC bitstream high-level syntax (for example, an HLS definition related to an extended point cloud slice) with reference to relevant syntax, to describe content of the slice information.
In some embodiments, the slice information may include content that is related to a point cloud slice and that is in a geometry header. For example, for ease of understanding and description, using one point cloud frame (which may be any point cloud frame) in the point cloud bitstream as an example, the point cloud frame includes M point cloud slices, slice information related to the point cloud frame may include a first frame identification field, a multi-slice identification field, and a slice number field, and the first frame identification field, the multi-slice identification field, and the slice number field may be all added in a geometry header corresponding to the point cloud frame. The first frame identification field may indicate an identifier of the point cloud frame; the multi-slice identification field may indicate that the point cloud frame includes one or more point cloud slices; and the slice number field may indicate a number of point cloud slices included in the point cloud frame (which may be a true number value or a value that differs from a true number value by a constant).
In some embodiments, when a value of the multi-slice identification field is a first flag value (for example, 1), the point cloud frame includes a plurality of point cloud slices. In this case, M is a positive integer greater than 1, and a difference between a number of the plurality of point cloud slices and a field value of the slice number field is X, X being a non-negative integer (for example, 0, 1, or 2). When X=0, the field value of the slice number field is equal to a true number of the plurality of point cloud slices. When X=1, the field value of the slice number field plus 1 is equal to a true number of the plurality of point cloud slices. A value of X is not limited. In some embodiments, when a value of the multi-slice identification field is a second flag value (for example, 0), the point cloud frame includes one point cloud slice. In this case, M=1.
Further, for ease of understanding, refer to Table 1. Table 1 shows syntax of a geometry header information structure (for example, geometry_header( )) of point cloud media provided in some embodiments:
Some of the semantics of the syntax shown in Table 1 above are as follows: frame_id (for example, the first frame identification field) indicates an identifier of a current point cloud frame, and different point cloud slices in the same point cloud frame have the same identifier of the point cloud frame. When a value of gps_multi_slice_flag (for example, the multi-slice identification field) is 1 (for example, the first flag value), the current point cloud frame includes a plurality of point cloud slices; or when a value of gps_multi_slice_flag is 0 (for example, the second flag value), the current point cloud frame includes one point cloud slice. A value of gps_num_slice_minus_one (for example, the slice number field) is a number of point cloud slices in the current point cloud frame minus one (for example, X=1).
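Under the Table 1 semantics above (X = 1, i.e., the field stores the slice count minus one), the number of slices in the current frame can be recovered as in this small sketch. The function name is illustrative; only the two field semantics come from the description above.

```python
def slice_count(gps_multi_slice_flag: int, gps_num_slice_minus_one: int) -> int:
    """Number of point cloud slices in the current point cloud frame.

    When gps_multi_slice_flag is 0, the frame contains exactly one slice.
    When it is 1, gps_num_slice_minus_one holds the slice count minus one.
    """
    if gps_multi_slice_flag == 0:
        return 1
    return gps_num_slice_minus_one + 1
```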
In some embodiments, the slice information may further include content that is related to a point cloud slice and that is in an attribute header. For example, for ease of understanding and description, still using one point cloud frame in the point cloud bitstream as an example, slice information related to the point cloud frame may include a second frame identification field, the second frame identification field may be added in an attribute header corresponding to the point cloud frame, and the second frame identification field may indicate an identifier of the point cloud frame.
Further, for ease of understanding, refer to Table 2. Table 2 shows syntax of an attribute header information structure (for example, attribute_header( )) of point cloud media provided in some embodiments:
Some of the semantics of the syntax shown in Table 2 above are as follows: aps_frame_id (for example, the second frame identification field) indicates an identifier of a point cloud frame corresponding to a current attribute header.
In some embodiments, the slice information may further include content that is related to a point cloud slice and that is in a geometry slice header. For example, for one point cloud frame in the point cloud bitstream, the point cloud frame may include M point cloud slices, and each point cloud slice corresponds to one geometry slice header. For ease of understanding and description, using a target point cloud slice in the point cloud frame as an example, the target point cloud slice may be any point cloud slice in the M point cloud slices. In this case, slice information related to the point cloud frame may include a third frame identification field. The third frame identification field may be added in a geometry slice header related to the point cloud frame, for example, added in a geometry slice header corresponding to the target point cloud slice. In this case, the third frame identification field may indicate an identifier of a point cloud frame to which the target point cloud slice corresponding to the geometry slice header belongs.
For ease of understanding, refer to Table 3. Table 3 shows syntax of a geometry slice header structure (for example, geometry_slice_header( )) of point cloud media provided in some embodiments:
Some of the semantics of the syntax shown in Table 3 above are as follows: slice_id indicates an identifier of a current geometry point cloud slice (for example, a mark number of the point cloud slice in a point cloud frame to which the point cloud slice belongs). gbh_frame_id (for example, the third frame identification field) indicates an identifier of a point cloud frame corresponding to the current geometry point cloud slice.
In some embodiments, a point cloud slice header may be associated with a point cloud slice data part, for example, a geometry slice header and geometry data of the same point cloud slice may be associated. For ease of understanding, still using the target point cloud slice as an example, the geometry slice header corresponding to the target point cloud slice may further include a first slice identification field (for example, slice_id in Table 3) indicating the target point cloud slice. Based on this, in some embodiments, when the geometry slice header is added in geometry data bitstream information related to the target point cloud slice, the geometry slice header is associated with geometry data corresponding to the target point cloud slice. In some embodiments, when the first slice identification field indicating the target point cloud slice exists in geometry data corresponding to the target point cloud slice, the geometry slice header is associated with the geometry data. Based on the corresponding geometry slice header and the geometry data being associated, when a point cloud frame includes a plurality of point cloud slices, different point cloud slices may be distinguished from each other based on the first slice identification field in the geometry slice header. For example, a geometry slice header and geometry data that have the same slice_id belong to a point cloud slice indicated by the slice_id.
For ease of understanding, refer to Table 4. Table 4 shows syntax of a geometry data bitstream information structure (for example, geometry_data_bitstream( )) of point cloud media provided in some embodiments:
Some of the semantics of the syntax shown in Table 4 above are as follows: geometry_slice_header( ) is a geometry slice header (with a syntax structure shown in Table 3). The geometry slice header carries slice_id (for example, the first slice identification field) shown in Table 3, followed by geometry_data( ), which is geometry data associated with the geometry slice header. In the related art, each slice header and point cloud slice data are individually defined, but no indication is provided for how to store each slice header and point cloud slice data in a corresponding point cloud frame. In this case, each slice header and point cloud slice data are theoretically unordered. In some embodiments, geometry_slice_header( ) is followed by geometry_data( ); the geometry_slice_header( ) and the following geometry_data( ) belong to the same point cloud slice, and the geometry_slice_header( ) and the geometry_data( ) are associated for decoding in a unit of a point cloud slice. In some embodiments, a corresponding field indication slice_id may be directly added in geometry_data( ) to associate the geometry slice header with the geometry data.
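The header-to-data association described above can be sketched as follows (an illustrative Python sketch; the unit tuples and function name are hypothetical stand-ins for parsed bitstream units, not a real decoder API):

```python
def associate_slices(units):
    """Group each geometry slice header with the geometry data carrying
    the same slice_id; a header/data pair with one slice_id forms one
    point cloud slice, so slices remain distinguishable in a multi-slice
    frame even though the units arrive as a flat sequence."""
    slices = {}
    for kind, slice_id, payload in units:
        slices.setdefault(slice_id, {})[kind] = payload
    return slices

units = [
    ("geometry_slice_header", 0, "hdr0"),
    ("geometry_data", 0, "geo0"),
    ("geometry_slice_header", 1, "hdr1"),
    ("geometry_data", 1, "geo1"),
]
print(associate_slices(units)[1]["geometry_data"])  # -> geo1
```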
Similarly, the slice information may further include content that is related to a point cloud slice and that is in an attribute slice header. Each point cloud slice may correspond to one attribute slice header. For ease of understanding and description, still using a target point cloud slice in the M point cloud slices included in the point cloud frame as an example, slice information related to the point cloud frame may further include a fourth frame identification field. The fourth frame identification field may be added in an attribute slice header related to the point cloud frame, for example, added in an attribute slice header corresponding to the target point cloud slice. In this case, the fourth frame identification field may indicate an identifier of a point cloud frame to which the target point cloud slice corresponding to the attribute slice header belongs. In addition, when a value of a multi-slice identification field in a geometry header corresponding to the point cloud frame is a first flag value (for example, 1), the attribute slice header may further include a reflectance attribute quantization parameter offset, and the reflectance attribute quantization parameter offset may be configured for controlling a reflectance attribute quantization parameter. In some embodiments, when a value of the multi-slice identification field is a second flag value (for example, 0), the reflectance attribute quantization parameter offset may not be set.
For ease of understanding, refer to Table 5. Table 5 shows syntax of an attribute slice header structure (for example, attribute_slice_header( )) of point cloud media provided in some embodiments:
Some of the semantics of the syntax shown in Table 5 above are as follows: slice_id indicates an identifier of a current attribute point cloud slice (for example, a mark number of the point cloud slice in a point cloud frame to which the point cloud slice belongs). abh_frame_id indicates an identifier of a point cloud frame corresponding to the current attribute point cloud slice. gps_multi_slice_flag is the multi-slice identification field. When a value of the field is 1 (for example, the first flag value), a current point cloud frame includes a plurality of point cloud slices, and a corresponding reflQPoffset (for example, the reflectance attribute quantization parameter offset) may be indicated, where reflQPoffset may be a signed integer for controlling a reflectance attribute quantization parameter, ranging from −32 to 32.
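The use of reflQPoffset can be sketched as follows (an illustrative Python sketch; the function name is hypothetical, and clamping to the stated value range of −32 to 32 is an illustrative choice, not a behavior mandated by the text):

```python
def apply_refl_qp_offset(base_qp: int, refl_qp_offset: int) -> int:
    """Apply the reflectance attribute quantization parameter offset to a
    base quantization parameter. reflQPoffset is a signed integer whose
    value ranges from -32 to 32; out-of-range inputs are clamped here
    (an illustrative choice)."""
    offset = max(-32, min(32, refl_qp_offset))
    return base_qp + offset

print(apply_refl_qp_offset(30, 5))   # per-slice QP of 35
```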
In some embodiments, an attribute slice header and attribute data of the same point cloud slice may also be associated. For ease of understanding, still using the target point cloud slice as an example, the attribute slice header corresponding to the target point cloud slice may further include a second slice identification field (for example, slice_id in Table 5) indicating the target point cloud slice. Based on this, in some embodiments, when the attribute slice header is added in attribute data bitstream information related to the target point cloud slice, the attribute slice header is associated with attribute data corresponding to the target point cloud slice. In some embodiments, when the second slice identification field indicating the target point cloud slice exists in attribute data corresponding to the target point cloud slice, the attribute slice header is associated with the attribute data. Based on the corresponding attribute slice header and the attribute data being associated, when a point cloud frame includes a plurality of point cloud slices, different point cloud slices may be distinguished from each other based on the second slice identification field in the attribute slice header. For example, an attribute slice header and attribute data that have the same slice_id belong to a point cloud slice indicated by the slice_id.
For ease of understanding, refer to Table 6. Table 6 shows syntax of an attribute data bitstream information structure (for example, attribute_data_bitstream( )) of point cloud media provided in some embodiments:
Some of the semantics of the syntax shown in Table 6 above are as follows: attribute_slice_header( ) is an attribute slice header (with a syntax structure shown in Table 5). The attribute slice header carries slice_id (for example, the second slice identification field) shown in Table 5, followed by attribute_data_reflectance( ) (for example, reflectance attribute data) or attribute_data_color( ) (for example, color attribute data), which is attribute data associated with the attribute slice header (in addition to reflectance and color, there may be another type of attribute data, which is not limited herein). In some embodiments, attribute_slice_header( ) is followed by attribute_data (which may be attribute_data_reflectance( ) or attribute_data_color( ) or another type of attribute data), the attribute_slice_header( ) and the following attribute_data belong to the same point cloud slice, and the attribute_slice_header( ) and the attribute_data are associated for decoding in a unit of a point cloud slice. In some embodiments, a corresponding field indication slice_id may be directly added in attribute_data_reflectance( ) or attribute_data_color( ) to associate the attribute slice header with the attribute data.
The client may perform decoding in a unit of a point cloud slice through the foregoing field extension at the high-level syntactic level of the point cloud bitstream. For example, when a point cloud frame is decoded, a geometry header and an attribute header that correspond to the point cloud frame may be first decoded. The corresponding geometry header is parsed to determine the number of point cloud slices existing in the point cloud frame (for example, gps_num_slice_minus_one in Table 1 is parsed), for example, the number of to-be-decoded point cloud slices. In addition, an identifier of the current point cloud frame may also be determined (for example, frame_id in Table 1 and aps_frame_id in Table 2 are parsed). In this case, when a point cloud slice therein is decoded, a geometry slice header corresponding to the point cloud slice may be first decoded, and then geometry data associated with the geometry slice header may be decoded. Similarly, an attribute slice header corresponding to the point cloud slice may be first decoded, and then attribute data associated with the attribute slice header may be decoded. In this case, even if there are a plurality of point cloud slices, data between different point cloud slices can be differentiated. This can implement encoding and decoding technologies based on the point cloud slices, for example, can support differentiated encoding optimization on different point cloud slices. Point cloud frame identifiers (for example, frame_id in Table 1, aps_frame_id in Table 2, gbh_frame_id in Table 3, and abh_frame_id in Table 5) corresponding to different point cloud slices in the same point cloud frame have the same values.
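The decoding order described above can be sketched as follows (an illustrative Python sketch; the dict layout and all names are hypothetical stand-ins for parsed bitstream units, not a real decoder API):

```python
def decode_frame(frame):
    """Sketch of per-slice decoding: the geometry header is parsed first
    to determine the number of slices (gps_num_slice_minus_one + 1, with
    X = 1 as in Table 1), then each slice's geometry and attribute parts
    are decoded, matched to their slice headers by slice_id."""
    num_slices = frame["geometry_header"]["gps_num_slice_minus_one"] + 1
    decoded = []
    for sid in range(num_slices):
        geo = frame["geometry_slices"][sid]    # geometry slice header + data
        attr = frame["attribute_slices"][sid]  # attribute slice header + data
        decoded.append({"slice_id": sid,
                        "geometry": geo["data"],
                        "attribute": attr["data"]})
    return decoded

frame = {
    "geometry_header": {"frame_id": 7, "gps_num_slice_minus_one": 1},
    "geometry_slices": {0: {"data": "g0"}, 1: {"data": "g1"}},
    "attribute_slices": {0: {"data": "a0"}, 1: {"data": "a1"}},
}
print(len(decode_frame(frame)))  # number of decoded point cloud slices
```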
The slice information is added in the high-level syntax of the point cloud bitstream by the server in a point cloud data encoding process. Correspondingly, the client may subsequently decode the point cloud bitstream based on the slice information. For a decoding process, refer to a subsequent operation 402.
Similarly, when the server performs file encapsulation on the point cloud bitstream, metadata information for track transmission may be added. For example, corresponding slice encapsulation information may be added to be encapsulated together with the point cloud bitstream into a corresponding media file resource. The slice encapsulation information may include information related to a point cloud slice in a corresponding track, such as a number of point cloud slices included in the track, and identifiers of these point cloud slices. In some embodiments, if the point cloud bitstream is encapsulated based on a first encapsulation mode (for example, file encapsulation based on a point cloud slice), corresponding slice encapsulation information may be indicated in one or more data boxes in a slice track sample entry or a slice sample group entry. In some embodiments, if the point cloud bitstream is encapsulated based on a second encapsulation mode (for example, file encapsulation not based on a point cloud slice), corresponding slice encapsulation information may be indicated by a subsample information data box. An encapsulation mode for file encapsulation is not limited. For a process of file encapsulation, refer to operation 502 in some embodiments as illustrated in
Further, the following describes in detail a relevant field extended in a file encapsulation level (for example, an extended ISOBMFF data box) with reference to relevant syntax, to describe content of the slice encapsulation information.
In a scenario of file encapsulation based on the first encapsulation mode, for any point cloud frame in the point cloud bitstream, for ease of understanding and description, it may be assumed that M point cloud slices in the point cloud frame are encapsulated in T slice tracks, where T is a positive integer. A number of the slice tracks is not limited herein. The T slice tracks include a slice track Ti, i being a positive integer less than or equal to T. In this case, the slice track Ti is used as an example for subsequent description. The media file resource obtained by the client may include a slice track sample entry corresponding to the slice track Ti, and the slice track sample entry may indicate slice encapsulation information included in the slice track Ti.
Based on this, the slice track sample entry is defined in some embodiments. For a definition, refer to Table 7. Table 7 shows a definition of a slice track sample entry provided in some embodiments:
A sample entry type of a slice track is ‘apst’. The slice track sample entry may be included in SampleDescriptionBox. Each slice track may correspond to one slice track sample entry, and the sample entry may be an extension based on VolumetricVisualSampleEntry.
For a scenario of multi-track encapsulation based on a point cloud slice, T is a positive integer greater than 1. Each slice track may include one or more samples. Each sample may include zero or one or more point cloud slices corresponding to a point cloud frame. A number of point cloud slices included in each point cloud frame and a number of point cloud slices in each sample are not limited. For example, in the slice track Ti, a sample 1 may include one point cloud slice of a point cloud frame 1, a sample 2 may include two point cloud slices of a point cloud frame 2, and a sample 3 may include zero point cloud slices of a point cloud frame 3 (for example, the sample 3 is empty). Considering that numbers of point cloud slices in different point cloud frames are not exactly the same, and during multi-track encapsulation, point cloud slices may not exist in samples of some tracks, or a number of point cloud slices changes dynamically, some embodiments provide various manners for supporting the encapsulation mode based on point cloud slices.
In some embodiments, both static slice encapsulation information and dynamic slice encapsulation information may be indicated by the slice track sample entry. For ease of understanding, refer to Table 8. Table 8 shows syntax of a slice track sample entry structure (for example, AVSPCCSampleEntry) provided in some embodiments:
In some embodiments as depicted in Table 8, the slice track sample entry may include a dynamic slice field (for example, dynamic_num_slices_flag in Table 8), a slice maximum number field (for example, max_num_slice_ids in Table 8), and a third slice identification field (for example, slice_id in Table 8). The dynamic slice field may indicate a change status of slice encapsulation information in the slice track Ti. In some embodiments, the change status may include a static state and a dynamic state. The static state indicates that the samples in the slice track Ti include the same numbers of point cloud slices. The dynamic state indicates that the samples in the slice track Ti include not exactly the same numbers of point cloud slices. The slice maximum number field may indicate the number of point cloud slices included in a maximum number sample in the slice track Ti, the maximum number sample being the sample that includes the largest number of point cloud slices in the slice track Ti. It is assumed that a value of the slice maximum number field is N1, N1 being a positive integer. The third slice identification field may indicate an identifier of a point cloud slice included in each sample in the slice track Ti.
In some embodiments, when a value of the dynamic slice field is a third flag value (for example, 0), the change status of the slice encapsulation information in the slice track Ti is a static state. For example, for different samples, the slice encapsulation information in the slice track Ti does not change dynamically. In this case, N1 point cloud slices exist in each sample in the slice track Ti. In some embodiments, when a value of the dynamic slice field is a fourth flag value (for example, 1), the change status of the slice encapsulation information in the slice track Ti is a dynamic state. For example, for different samples, the slice encapsulation information in the slice track Ti changes dynamically. In this case, at most N1 point cloud slices exist in each sample in the slice track Ti. For example, there may be cases in which no point cloud slice exists in a sample, and numbers of point cloud slices included in samples in which point cloud slices exist may not be the same.
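The static and dynamic semantics of dynamic_num_slices_flag and max_num_slice_ids can be sketched as a consistency check (an illustrative Python sketch; the function name is hypothetical):

```python
def check_sample_slice_count(dynamic_num_slices_flag: int,
                             n1: int, count: int) -> bool:
    """Check a sample's slice count against the Table 8 semantics: in the
    static state (flag is the third flag value, 0) every sample carries
    exactly N1 point cloud slices; in the dynamic state (flag is the
    fourth flag value, 1) every sample carries at most N1 slices,
    possibly zero."""
    if dynamic_num_slices_flag == 0:
        return count == n1
    return 0 <= count <= n1
```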
For the case in which the slice encapsulation information changes dynamically, the slice maximum number field in the slice track sample entry shown in Table 8 indicates that at most N1 point cloud slices exist in each sample in the slice track Ti, but does not explicitly indicate the case in which no point cloud slice exists in a sample. Some embodiments can further introduce a slice sample group entry to resolve this problem. For ease of understanding, refer to Table 9. Table 9 shows a definition of a slice sample group entry provided in some embodiments:
A data box type of a slice sample group entry is ‘asig’, and the slice sample group entry may be included in SampleGroupDescriptionBox.
For ease of understanding, refer to Table 10. Table 10 shows syntax of a slice sample group entry structure (for example, AvsPccSliceInfoEntry) provided in some embodiments:
With reference to some embodiments as illustrated in Table 8, when the change status of the slice encapsulation information is the dynamic state, the media file resource may further include a slice sample group entry corresponding to the slice track Ti. As shown in Table 10, the slice sample group entry may include a slice presence field (for example, slice_contained_flag in Table 10). The slice presence field may indicate presence or absence of a point cloud slice in each sample of the slice track Ti. For ease of understanding and description, assuming that the point cloud bitstream includes S point cloud frames, the slice track Ti includes S samples, S being a positive integer, and the S samples include a sample Sj, j being a positive integer less than or equal to S. The sample Sj is used as an example for subsequent description. In some embodiments, when a value of a slice presence field corresponding to the sample Sj is a fifth flag value (for example, 1), the sample Sj belongs to a first sample group. A point cloud slice exists in each sample included in the first sample group, and slice encapsulation information included in each sample in the first sample group may be indicated by a subsample information data box (in this case, a subsample may be divided in a unit of a point cloud slice). For a syntactic structure of the subsample information data box, refer to subsequent Table 15. In some embodiments, when a value of a slice presence field corresponding to the sample Sj is a sixth flag value (for example, 0), the sample Sj belongs to a second sample group. No point cloud slice exists in each sample included in the second sample group.
The slice sample group entry is configured for classifying samples in a slice track into different sample groups, and each sample group may have its own characteristics. For example, the first sample group is a set of samples in which point cloud slices exist, and the second sample group is a set of samples in which no point cloud slices exist. Based on the above, whether a number of point cloud slices in a current sample is greater than 0 can be directly indicated by the slice presence field in the slice sample group entry, so that a sample in which no point cloud slice exists can be quickly identified.
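The classification by slice_contained_flag can be sketched as follows (an illustrative Python sketch; the function name is hypothetical):

```python
def group_samples(slice_contained_flags):
    """Partition sample indices of a slice track into the first sample
    group (flag is the fifth flag value, 1: point cloud slices exist)
    and the second sample group (flag is the sixth flag value, 0: no
    point cloud slice exists)."""
    first_group, second_group = [], []
    for idx, flag in enumerate(slice_contained_flags):
        (first_group if flag == 1 else second_group).append(idx)
    return first_group, second_group

print(group_samples([1, 0, 1, 0]))  # -> ([0, 2], [1, 3])
```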
In some embodiments, static slice encapsulation information and dynamic slice encapsulation information may be separately indicated. For example, the static slice encapsulation information may be indicated by the slice track sample entry (for its definition, refer to Table 7), and the dynamic slice encapsulation information may be indicated by the slice sample group entry. For ease of understanding, refer to Table 11. Table 11 shows syntax of a slice track sample entry structure (for example, AVSPCCSampleEntry) provided in some embodiments:
In some embodiments as depicted in Table 11, the slice track sample entry may include a dynamic slice field (for example, dynamic_num_slices_flag in Table 11), and the dynamic slice field may indicate a change status of the slice encapsulation information in the slice track Ti and has the same semantics as the field dynamic_num_slices_flag in Table 8. In some embodiments, when a value of the dynamic slice field is a third flag value (for example, 0), the change status of the slice encapsulation information in the slice track Ti is a static state. Further, a slice track sample entry indicating static slice encapsulation information may further include a first track slice number field (for example, num_slices in Table 11) and a fourth slice identification field (for example, slice_id in Table 11). The first track slice number field may indicate a number of point cloud slices included in each sample in the slice track Ti. Assuming that a value of the first track slice number field is K1, K1 point cloud slices exist in each sample in the slice track Ti, K1 being a positive integer. The fourth slice identification field may indicate an identifier of a point cloud slice included in each sample in the slice track Ti.
In some embodiments, when a value of the dynamic slice field is a fourth flag value (for example, a value of dynamic_num_slices_flag in Table 11 is 1), the change status of the slice encapsulation information in the slice track Ti is a dynamic state. In this case, if dynamic slice encapsulation information is still indicated in the slice track sample entry, the client still tries to decode a sample in which no point cloud slice exists, which results in a waste of decoding resources. Some embodiments can further introduce a slice sample group entry (for its definition, refer to Table 9) to indicate dynamic slice encapsulation information. For example, for some embodiments as depicted in Table 11, the media file resource may further include a slice sample group entry indicating dynamic slice encapsulation information. For ease of understanding, refer to Table 12. Table 12 shows syntax of a slice sample group entry structure (for example, AvsPccSliceInfoEntry) provided in some embodiments:
In some embodiments as depicted in Table 12, the slice sample group entry may include a second track slice number field (for example, num_slices in Table 12) and a fifth slice identification field (for example, slice_id in Table 12). For ease of understanding and description, assuming that the point cloud bitstream includes S point cloud frames, the slice track Ti includes S samples, S being a positive integer, and the S samples include a sample Sj, j being a positive integer less than or equal to S. Using the sample Sj as an example for description, the second track slice number field may indicate a number of point cloud slices included in the sample Sj in the slice track Ti. When the number of the point cloud slices included in the sample Sj is greater than 0, a point cloud slice exists in the sample Sj, and the corresponding fifth slice identification field may indicate an identifier of a point cloud slice included in the sample Sj.
Similar to some embodiments as illustrated in Table 10, samples in a slice track may also be classified into different sample groups based on the slice sample group entry shown in Table 12. For example, a sample that meets num_slices >0 in the slice track Ti may be classified into a first sample group. The first sample group is a set of samples in which point cloud slices exist in the slice track Ti. Conversely, a sample that does not meet num_slices >0 in the slice track Ti may be classified into a second sample group. The second sample group is a set of samples in which no point cloud slices exist in the slice track Ti.
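The choice between the static indication in the slice track sample entry (Table 11) and the dynamic indication in the slice sample group entry (Table 12) can be sketched as follows (an illustrative Python sketch; the dict layout and function name are hypothetical):

```python
def slice_ids_for_sample(sample_entry, sample_group_entry=None):
    """Pick the source of slice identifiers for one sample: the slice
    track sample entry when the track is static (dynamic flag 0), or the
    per-sample slice sample group entry when the track is dynamic
    (dynamic flag 1). A dynamic sample with num_slices of 0 carries no
    point cloud slice and need not be decoded."""
    if sample_entry["dynamic_num_slices_flag"] == 0:
        return sample_entry["slice_id"]
    if sample_group_entry is not None and sample_group_entry["num_slices"] > 0:
        return sample_group_entry["slice_id"]
    return []  # dynamic track, no point cloud slice in this sample
```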
In addition, the foregoing indication manner may also be applied to a point cloud tile (for example, a tile structure in the existing MPEG technology). The point cloud tile may include one or more point cloud slices. Point cloud slices included in one point cloud tile are all from the same point cloud frame, and a correspondence between a point cloud tile and a point cloud slice may span a plurality of point cloud frames. For example, for a point cloud frame frame1, it is assumed that a point cloud tile tile1 corresponds to a point cloud slice slice1 in the point cloud frame frame1, and for a point cloud frame frame2, if the point cloud tile tile1 also corresponds to the point cloud slice slice1 in the point cloud frame frame2, it may be considered that a correspondence between a point cloud tile and a point cloud slice does not change. If the correspondence is not updated in a current tile track, the point cloud tile tile1 corresponding to each point cloud frame includes the point cloud slice slice1 in the point cloud frame. For example, the correspondence still takes effect. In some embodiments, the correspondence may be updated. For example, for a point cloud frame frame1, a point cloud tile tile1 may correspond to a point cloud slice slice1 in the point cloud frame frame1; and for a point cloud frame frame2, the point cloud tile tile1 may correspond to a point cloud slice slice2 in the point cloud frame frame2. This is not limited.
Based on this, in some embodiments, file encapsulation may be performed based on the point cloud tile, with a corresponding indication manner of tile encapsulation information similar to the foregoing indication manner of the slice encapsulation information, which is as follows:
For ease of understanding and description, it is assumed that the M point cloud slices in the point cloud frame are classified into H point cloud tiles, H being a positive integer less than or equal to M, and the H point cloud tiles are encapsulated in G tile tracks, G being a positive integer. The G tile tracks include a tile track Gp, p being a positive integer less than or equal to G. Numbers of the point cloud tiles and the tile tracks are not limited. The tile track Gp is used as an example for subsequent description. In this case, the media file resource obtained by the client may include a tile track sample entry corresponding to the tile track Gp, and the tile track sample entry may indicate tile encapsulation information included in the tile track Gp.
Based on this, similar to the scenario of multi-track encapsulation based on a point cloud slice, for a scenario of multi-track encapsulation based on a point cloud tile, each tile track may include one or more samples, and each sample may include zero or one or more point cloud tiles corresponding to a point cloud frame. A number of point cloud tiles included in each point cloud frame and a number of point cloud tiles in each sample are not limited. In this case, during multi-track encapsulation, point cloud tiles may not exist in samples of some tracks, or a number of point cloud tiles changes dynamically. Some embodiments provide various manners for supporting the encapsulation mode based on point cloud tiles.
In some embodiments, both static tile encapsulation information and dynamic tile encapsulation information may be indicated by the tile track sample entry. Similar to some embodiments as illustrated in Table 8, the tile track sample entry may include a dynamic tile field, a tile maximum number field, and a first point cloud tile identification field. The dynamic tile field may indicate a change status of tile encapsulation information in the tile track Gp. In some embodiments, the change status may include a static state and a dynamic state. The static state indicates that the samples in the tile track Gp include the same numbers of point cloud tiles. The dynamic state indicates that the samples in the tile track Gp include not exactly the same numbers of point cloud tiles. The tile maximum number field may indicate the number of point cloud tiles included in a maximum number sample in the tile track Gp, the maximum number sample being the sample that includes the largest number of point cloud tiles in the tile track Gp. It is assumed that a value of the tile maximum number field is N2, N2 being a positive integer. The first point cloud tile identification field may indicate an identifier of a point cloud tile included in each sample in the tile track Gp.
In some embodiments, when a value of the dynamic tile field is a seventh flag value (for example, 0), the change status of the tile encapsulation information is a static state. For example, for different samples, the tile encapsulation information in the tile track Gp does not change dynamically. In this case, N2 point cloud tiles exist in each sample in the tile track Gp. In some embodiments, when a value of the dynamic tile field is an eighth flag value (for example, 1), the change status of the tile encapsulation information is a dynamic state. For example, for different samples, the tile encapsulation information in the tile track Gp changes dynamically. In this case, at most N2 point cloud tiles exist in each sample in the tile track Gp.
Further, for the scenario in which the tile encapsulation information changes dynamically, some embodiments introduce a tile sample group entry for indicating the scenario. For ease of understanding, refer to Table 13. Table 13 shows a definition of a tile sample group entry (for example, GPccTileInfoEntry) provided in some embodiments:
In some embodiments as depicted in Table 13, when the change status of the tile encapsulation information is the dynamic state, the media file resource may further include a tile sample group entry corresponding to the tile track Gp. As shown in Table 13, the tile sample group entry may include a tile presence field (for example, tile_contained_flag in Table 13). The tile presence field may indicate presence or absence of a point cloud tile in each sample of the tile track Gp. For ease of understanding and description, assuming that the point cloud bitstream includes S point cloud frames, the tile track Gp includes S samples, S being a positive integer, and the S samples include a sample Sj, j being a positive integer less than or equal to S. The sample Sj is used as an example for subsequent description. In some embodiments, when a value of a tile presence field corresponding to the sample Sj is a ninth flag value (for example, 1), the sample Sj belongs to a third sample group. A point cloud tile exists in each sample included in the third sample group, and tile encapsulation information included in each sample in the third sample group may be indicated by a subsample information data box (in this case, a subsample may be divided in a unit of a point cloud tile). For a syntactic structure of the subsample information data box, refer to subsequent Table 15. In some embodiments, when a value of a tile presence field corresponding to the sample Sj is a tenth flag value (for example, 0), the sample Sj belongs to a fourth sample group. No point cloud tile exists in each sample included in the fourth sample group.
Based on the above, whether the number of point cloud tiles in a current sample is greater than 0 can be directly indicated by the tile presence field in the tile sample group entry, so that a sample in which no point cloud tile exists can be quickly identified.
In some embodiments, static tile encapsulation information and dynamic tile encapsulation information may be separately indicated. For example, the static tile encapsulation information may be indicated by the tile track sample entry, and the dynamic tile encapsulation information may be indicated by the tile sample group entry.
For example, the tile track sample entry may include a dynamic tile field, and the dynamic tile field may indicate change status of the tile encapsulation information in the tile track Gp. In some embodiments, when a value of the dynamic tile field is a seventh flag value (for example, 0), the change status of the tile encapsulation information in the tile track Gp is a static state. Further, a tile track sample entry indicating static tile encapsulation information may further include a first track tile number field and a second point cloud tile identification field. The first track tile number field may indicate a number of point cloud tiles included in each sample in the tile track Gp. Assuming that a value of the first track tile number field is K2, K2 point cloud tiles exist in each sample in the tile track Gp, K2 being a positive integer. The second point cloud tile identification field may indicate an identifier of a point cloud tile included in each sample in the tile track Gp.
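For ease of understanding, the static case may be sketched as follows (a hypothetical Python model; the class and attribute names mirror the fields described above but are illustrative assumptions):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TileTrackSampleEntry:
    """Hypothetical model of a tile track sample entry in the static case."""
    dynamic_tile: int   # seventh flag value 0: static state
    num_tiles: int      # first track tile number field (K2)
    tile_ids: List[int] = field(default_factory=list)  # second point cloud tile identification field

    def tiles_in_each_sample(self) -> List[int]:
        # In the static state, every sample in the tile track
        # carries the same K2 point cloud tiles.
        assert self.dynamic_tile == 0
        assert len(self.tile_ids) == self.num_tiles
        return self.tile_ids
```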
In some embodiments, when a value of the dynamic tile field is an eighth flag value (for example, 1), the change status of the tile encapsulation information in the tile track Gp is a dynamic state. In this case, if dynamic tile encapsulation information is still indicated in the tile track sample entry, the client still tries to decode a sample in which no point cloud tile exists, which results in a waste of decoding resources. Some embodiments can further introduce a tile sample group entry to indicate dynamic tile encapsulation information. For example, the media file resource may further include a tile sample group entry indicating dynamic tile encapsulation information. For ease of understanding, refer to Table 14. Table 14 shows syntax of a tile sample group entry structure (for example, GPccTileInfoEntry) provided in some embodiments:
In some embodiments as depicted in Table 14, the tile sample group entry may include a second track tile number field (for example, num_tiles in Table 14) and a third point cloud tile identification field (for example, tile_id in Table 14). For ease of understanding and description, assume that the point cloud bitstream includes S point cloud frames, the tile track Gp includes S samples, S being a positive integer, and the S samples include a sample Sj, j being a positive integer less than or equal to S. Using the sample Sj as an example for description, the second track tile number field may indicate a number of point cloud tiles included in the sample Sj in the tile track Gp. When the number of the point cloud tiles included in the sample Sj is greater than 0, a point cloud tile exists in the sample Sj, and the third point cloud tile identification field may indicate an identifier of a point cloud tile included in the sample Sj.
Similar to some embodiments as illustrated in Table 13, samples in a tile track may also be classified into different sample groups based on the tile sample group entry shown in Table 14. For example, a sample that meets num_tiles > 0 in the tile track Gp may be classified into a third sample group. The third sample group is a set of samples in which point cloud tiles exist in the tile track Gp. Conversely, a sample that does not meet num_tiles > 0 in the tile track Gp may be classified into a fourth sample group. The fourth sample group is a set of samples in which no point cloud tiles exist in the tile track Gp.
In addition, in some embodiments, a definition of a subsample is further extended. For example, in some embodiments, the subsample information data box may include a component data indication field. When component data of a target point cloud slice exists in a subsample of the slice track Ti, the component data indication field may indicate a data volume of the component data included in the subsample. The target point cloud slice belongs to the M point cloud slices, and the target point cloud slice is encapsulated in the slice track Ti. In some embodiments, when a value of the component data indication field is a first field value (for example, 1), the subsample includes all of the component data of the target point cloud slice. In some embodiments, when a value of the component data indication field is a second field value (for example, 0), the subsample includes a part of the component data of the target point cloud slice.
For ease of understanding, refer to Table 15. Table 15 shows syntax of a subsample information data box structure (for example, SubSampleInformationBox) provided in some embodiments:
The syntax shown in Table 15 is the syntax corresponding to codec_parameters in SubsampleInformationBox. Some of the semantics of the syntax shown in Table 15 above are as follows: In a scenario of file encapsulation based on the second encapsulation mode, SubSampleInformationBox may be used in file encapsulation of the point cloud bitstream, and a subsample is defined based on a value of an identification field (for example, flags in Table 15) of subsample information data. The identification field indicates a type of subsample information in this data box. In some embodiments, when a value of flags is 0, a subsample based on a type of data carried by a point cloud slice is defined. In this case, one subsample includes one data type and relevant information. In some embodiments, when a value of flags is 1, a subsample based on a point cloud slice is defined. In this case, one subsample includes relevant information of one point cloud slice, including a geometry slice header, geometry data, an attribute slice header, and attribute data. The other values of flags are reserved. payloadType indicates a type of data in a point cloud slice included in a subsample, with a value of 0 indicating attribute data or a value of 1 indicating geometry data. attribute_present_flag indicates whether a subsample includes a color attribute and/or a reflectance attribute, defined in AVS-PCC. attribute_present_flag[0] indicates whether a color attribute is included. attribute_present_flag[1] indicates whether a reflectance attribute is included. slice_data indicates whether a subsample includes component data of a point cloud slice, with a value of 1 indicating that component data of a geometry and/or attribute type of a point cloud slice is included, or with a value of 0 indicating that point cloud parameter information is included. slice_id indicates an identifier of a point cloud slice corresponding to component data included in a subsample. 
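For ease of understanding, the mapping from the identification field (flags) to the subsample definition may be sketched as follows (a hypothetical helper; only the values 0 and 1 are defined above, and all other values are reserved):

```python
def subsample_definition(flags: int) -> str:
    """Map the 'flags' value of the subsample information data box to
    the type of subsample it defines."""
    if flags == 0:
        # Subsample based on the type of data carried by a point cloud
        # slice: one data type and its relevant information per subsample.
        return "one data type per subsample"
    if flags == 1:
        # Subsample based on a point cloud slice: one slice's relevant
        # information (geometry slice header, geometry data, attribute
        # slice header, attribute data) per subsample.
        return "one point cloud slice per subsample"
    return "reserved"
```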
When a value of all_component_data (for example, the component data indication field) is 1 (for example, the first field value), a current subsample includes all of the component data of a corresponding point cloud slice. When a value of all_component_data is 0 (for example, the second field value), a current subsample includes a part of the component data of a corresponding point cloud slice. Based on the above, the first encapsulation mode implements file encapsulation entirely based on the point cloud slice, and is a multi-track encapsulation mode for the point cloud slice, while the second encapsulation mode does not perform file encapsulation based on the point cloud slice. During actual application, an appropriate encapsulation mode may be selected based on a requirement. This is not limited.
A data volume of component data included in a subsample may be determined based on whether a corresponding slice track includes a component information data box. When a component information data box exists in the slice track Ti, a subsample of the slice track Ti includes component data corresponding to the component information data box. In some embodiments, when no component information data box exists in the slice track Ti, a subsample of the slice track Ti includes all component data of a target point cloud slice. The target point cloud slice belongs to the M point cloud slices, and the target point cloud slice is encapsulated in the slice track Ti.
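For ease of understanding, the foregoing rule may be sketched as follows (the dictionary shape of the slice track and the 'component_info_box' key are illustrative assumptions):

```python
def subsample_component_data(slice_track: dict):
    """Determine which component data a subsample of a slice track carries.

    If the slice track carries a component information data box, each
    subsample holds only the component data that box names; otherwise a
    subsample holds all component data of the target point cloud slice.
    """
    box = slice_track.get("component_info_box")
    if box is not None:
        return [box["component_type"]]
    return ["geometry", "attribute"]  # all component data of the slice
```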
In some embodiments, component data included in a subsample is further determined based on the foregoing extension of subsample-related syntactic information, thereby refining the definition of the subsample. The subsample information data box is applicable in both a single-track encapsulation mode and a multi-track encapsulation mode. In addition, for the foregoing encapsulation based on a point cloud slice, a subsample may also be divided by using a subsample information data box. For example, when N1 (N1>1) point cloud slices exist in each sample in the slice track Ti, each point cloud slice included in each sample may be defined as a subsample, to differentiate different point cloud slices in the same sample.
In some embodiments, partial transmission of point cloud media can be supported, thereby saving bandwidth. The following describes in detail a relevant field extended in DASH signaling with reference to relevant syntax.
In a scenario of file encapsulation based on the first encapsulation mode, the media file resource obtained by the client may include some or all slice tracks obtained through file encapsulation, which may include the following obtaining process: The client first receives a signaling message transmitted by the server. The signaling message is generated by the server based on slice encapsulation information included in each slice track. Using the slice track Ti as an example, the signaling message may include a point cloud slice identifier list corresponding to the slice track Ti, and the point cloud slice identifier list may include an identifier of a point cloud slice included in the slice track Ti. Further, the client may request the media file resource of the immersive media based on the signaling message. The media file resource may include data streams (for example, representations, also referred to as transport streams) corresponding to W slice tracks, where W is a positive integer less than or equal to T. For example, the client can know, based on the point cloud slice identifier list in the signaling message, a slice track where a point cloud slice requested by the client is located, to request a corresponding data stream. When W=T, the client requests data streams corresponding to all slice tracks. When W<T, the client requests data streams corresponding to some slice tracks, and the server may perform partial transmission.
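For ease of understanding, partial transmission based on the point cloud slice identifier list may be sketched as follows (the representation names and the dictionary shape of the signaling message are illustrative assumptions):

```python
def select_representations(signaling: dict, wanted_slice_ids):
    """Pick the W data streams (representations) whose point cloud slice
    identifier list overlaps the slices the client wants; when the result
    covers only some slice tracks, the server can perform partial
    transmission."""
    wanted = set(wanted_slice_ids)
    return [rep for rep, slice_ids in signaling.items()
            if wanted & set(slice_ids)]
```

For example, a client that only consumes the point cloud slice of slice_id=2 would request the corresponding representation alone, and the server would transmit only that data stream.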
In some embodiments, the point cloud slice identifier list may be configured for being added to a separate point cloud slice identifier descriptor or to a component descriptor related to the slice track Ti.
For example, when a slice track includes all component data of a point cloud slice, there is no relevant component descriptor. In this case, a point cloud slice identifier list may be added to a newly added and separate point cloud slice identifier descriptor. For ease of understanding, refer to Table 16. Table 16 shows syntax of a point cloud slice identifier descriptor (for example, AVSPCCSliceID descriptor) of point cloud media provided in some embodiments:
contained_slice_ids (for example, the point cloud slice identifier list) in Table 16 may be used as a separate descriptor to describe a representation or an adaptation set corresponding to the slice track.
In another example, when a slice track includes some component data of a point cloud slice, a relevant component descriptor may be used. In this case, a point cloud slice identifier list may be directly added to an existing component descriptor. For ease of understanding, refer to Table 17. Table 17 shows syntax of a component descriptor (for example, GPCCComponent descriptor) structure of point cloud media provided in some embodiments:
M (Mandatory) represents a mandatory field; and CM (Conditional Mandatory) represents conditional mandatory. component@contained_slice_ids is a point cloud slice identifier list added to a component descriptor.
Based on the description in operation 401, when the media file resource requested by the client includes all slice tracks, a to-be-decoded point cloud bitstream is a complete point cloud bitstream (including all point cloud slices); or when the media file resource requested by the client includes some slice tracks, a to-be-decoded point cloud bitstream is a partial point cloud bitstream (including some point cloud slices).
In some embodiments, partial decoding of point cloud media can be supported, thereby saving computing resources. Assuming that the W slice tracks obtained by the client include the slice track Ti, for ease of understanding, using the slice track Ti as an example for description, a process of decoding a data stream corresponding to another slice track is similar thereto. The process may be as follows: When decapsulating the media file resource, the client may parse metadata information at a file layer carried by the media file resource. The metadata information may include slice encapsulation information included in the slice track Ti (for example, the slice encapsulation information indicates a point cloud slice included in the slice track Ti). Further, a to-be-decoded point cloud slice may be determined, based on the slice encapsulation information, from a point cloud slice included in the slice track Ti. For example, assuming that the slice encapsulation information indicates that the slice track Ti includes a point cloud slice A1 in a sample 1 and a point cloud slice A2 in a sample 2, the client may select one point cloud slice adaptively or based on a user for decoding. The client may decode the to-be-decoded point cloud slice based on slice information related to the to-be-decoded point cloud slice, to implement partial decoding.
A process of determining, based on the slice encapsulation information, the to-be-decoded point cloud slice from the point cloud slice included in the slice track Ti may be as follows: When the slice encapsulation information indicates that a target sample in the slice track Ti does not include a point cloud slice, the client may not decode the target sample, but determine the to-be-decoded point cloud slice from a sample other than the target sample. For example, the client can skip decoding an empty sample based on the slice encapsulation information, thereby reducing a waste of decoding resources.
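For ease of understanding, skipping empty samples may be sketched as follows (the list-based encoding of the per-sample slice counts is an illustrative assumption):

```python
def samples_to_decode(num_slices_per_sample):
    """Return the 1-based indices of samples worth decoding: any sample
    whose slice encapsulation information reports zero point cloud slices
    (an empty sample) is skipped, avoiding a waste of decoding resources."""
    return [j for j, n in enumerate(num_slices_per_sample, start=1) if n > 0]
```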
In addition, in some embodiments, a corresponding start code and end code are further defined. A start code is a specific group of bit strings that cannot appear anywhere in a bitstream conforming to this part except as a start code. A start code may include a start code prefix and a start code value. The start code prefix is the bit string ‘0000 0000 0000 0000 0000 0001’. All start codes have to be byte-aligned. The start code value is an 8-bit integer and may indicate a type of the start code. Refer to Table 18. Table 18 shows a start code value list of point cloud media provided in some embodiments:
In some embodiments, a point cloud frame may be a frame indicated by a start code and an end code together. The start code may be any one of a frame start code (picture_start_code), an intra-frame predicted picture start code (intra_picture_start_code), or an inter-frame predicted picture start code (inter_picture_start_code). Correspondingly, the end code may be a frame end code (picture_end_code) corresponding to the frame start code, an intra-frame predicted picture end code (intra_picture_end_code) corresponding to the intra-frame predicted picture start code, or an inter-frame predicted picture end code (inter_picture_end_code) corresponding to the inter-frame predicted picture start code. One group of the start code and the end code may be used for distinguishing one point cloud frame. In addition, the start code and the end code that correspond to the point cloud frame may be other representations in Table 18.
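For ease of understanding, locating start codes in a byte-aligned bitstream may be sketched as follows (the 24-bit prefix ‘0000 0000 0000 0000 0000 0001’ corresponds to the bytes 0x00 0x00 0x01; the concrete start code values in the test are illustrative and are not taken from Table 18):

```python
def find_start_codes(bitstream: bytes):
    """Scan a byte-aligned bitstream for start codes: the 24-bit start
    code prefix 0x000001 followed by an 8-bit start code value that
    indicates the type of the start code."""
    PREFIX = b"\x00\x00\x01"
    codes = []
    i = 0
    while True:
        i = bitstream.find(PREFIX, i)
        if i < 0 or i + 3 >= len(bitstream):
            break
        codes.append((i, bitstream[i + 3]))  # (byte offset, start code value)
        i += 4
    return codes
```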
Based on the above, for one or more point cloud slices in a point cloud frame, in some embodiments, a corresponding bitstream high-level syntactic element (for example, slice information) is extended to distinguish different point cloud slices of different point cloud frames and to support differentiated encoding optimization on point cloud slices. In addition, for a multi-track encapsulation mode based on point cloud slices, in some embodiments, metadata information (such as slice encapsulation information, a signaling message, and slice information) for track transmission and sample decoding is indicated. By the method provided according to some embodiments, decoding, transmission, and presentation of point cloud media can be guided, and transmission bandwidth and computing resources can be saved, thereby improving decoding and presentation efficiency of the point cloud media.
Further, refer to
The server may obtain point cloud data of a real-world three-dimensional object or scene by using a capture device (for example, a camera array including a plurality of cameras), or the server may generate point cloud data of a virtual three-dimensional object or scene. The point cloud data may express a spatial structure and a surface attribute (such as color or material) of the corresponding three-dimensional object or scene. Further, the server may encode the obtained point cloud data to obtain a point cloud bitstream. The point cloud bitstream includes a plurality of point cloud frames and slice information. The slice information may indicate each of M point cloud slices included in a point cloud frame in the point cloud bitstream, where M is a positive integer.
In some embodiments, before encoding the point cloud data, the server may perform processing on the point cloud data, for example, perform cutting and mapping. The server may encode the point cloud data in an encoding manner. The encoding manner may be, for example, geometry-based point cloud compression (GPCC) or another encoding manner. This is not limited. In addition, as different point cloud slices in different point cloud frames are distinguished in some embodiments, the server can be supported to perform differentiated encoding optimization on point cloud slices. For example, some additional parameters may be defined at each point cloud slice level to achieve a gain. A manner of differentiated encoding is not limited.
The slice information may be added by the server to high-level syntax of the point cloud bitstream. For content and indication manners, refer to the relevant description of operation 401 in some embodiments as illustrated in
The server may perform file encapsulation on the point cloud bitstream in a first encapsulation mode or a second encapsulation mode. In some embodiments, if file encapsulation is performed on the point cloud bitstream in the first encapsulation mode (for example, file encapsulation is performed based on a point cloud slice), it is assumed that M point cloud slices in a point cloud frame are encapsulated in T slice tracks, T being a positive integer, and the T slice tracks include a slice track Ti, i being a positive integer less than or equal to T. The slice track Ti is used as an example for description. The encapsulation process may be as follows: The server may generate slice encapsulation information corresponding to the slice track Ti based on slice information related to a point cloud slice included in the slice track Ti. The slice encapsulation information belongs to metadata information related to the point cloud bitstream (other metadata information for track transmission may also be included). Further, the server may encapsulate the metadata information and the T slice tracks into the media file resource of the immersive media.
The slice encapsulation information may be added by the server to a corresponding data box. For content and indication manners, refer to the relevant description of operation 401 in some embodiments as illustrated in
In some embodiments, if file encapsulation is performed on the point cloud bitstream in the second encapsulation mode (for example, file encapsulation is performed not based on a point cloud slice), the server may indicate slice encapsulation information corresponding to each slice track by using a subsample information data box. The slice encapsulation information belongs to metadata information related to the point cloud bitstream (other metadata information for track transmission may also be included). The server may encapsulate the metadata information and the corresponding slice track into the media file resource of the immersive media.
Further, the server may transmit the media file resource of the immersive media to the client. If file encapsulation is performed in the first encapsulation mode, a corresponding signaling message may be added to describe relevant information of a point cloud slice in a corresponding data stream. For content and indication manners of the signaling message, refer to the relevant description of operation 401 in some embodiments as illustrated in
When the client requests data streams corresponding to some slice tracks based on the received signaling message, the media file resource transmitted by the server to the client includes some slice tracks.
Based on this, the server may encode obtained point cloud data to obtain a corresponding point cloud bitstream, the point cloud bitstream carrying slice information, may encapsulate the point cloud bitstream based on the slice information, and may transmit a media file resource requested by the client to the client for consumption. As the slice information can indicate each of M point cloud slices included in a point cloud frame in the point cloud bitstream, different point cloud slices in different point cloud frames can be differentiated, so that encoding and file encapsulation based on the point cloud slices can be implemented. In addition, in some embodiments, decoding, transmission, and presentation of point cloud media can be guided based on the point cloud slices, thereby improving decoding and presentation efficiency of the point cloud media.
Further,
Further, the server may perform file encapsulation on the point cloud bitstream in a manner of file encapsulation based on a point cloud slice (for example, the first encapsulation mode). The corresponding slice encapsulation information is as follows: Track1 (slice track 1): dynamic_num_slices_flag=0; max_num_slice_ids=1; slice_id=1 (for example, change status of slice encapsulation information corresponding to the slice track 1 is a static state, and each sample includes one point cloud slice of slice_id=1). Track2 (slice track 2): dynamic_num_slices_flag=1; max_num_slice_ids=1; slice_id=2 (for example, change status of slice encapsulation information corresponding to the slice track 2 is a dynamic state, and each sample includes at most one point cloud slice of slice_id=2). In AvsPccSliceInfoEntry (for example, a slice sample group entry) of Track2, num_slices corresponding to the sample 2 is equal to 0. For the other samples, num_slices=1 and slice_id=2. For example, in the slice track 2, the sample 2 includes no point cloud slice, and each of the other samples includes one point cloud slice of slice_id=2.
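For ease of understanding, the example above may be modeled as follows (the dictionary layout is an illustrative assumption; the flag and field values are taken from the example):

```python
# Hypothetical reconstruction of the example slice encapsulation information.
tracks = {
    "track1": {"dynamic_num_slices_flag": 0, "max_num_slice_ids": 1,
               "slice_ids": [1]},
    "track2": {"dynamic_num_slices_flag": 1, "max_num_slice_ids": 1,
               "slice_ids": [2],
               "num_slices_per_sample": {1: 1, 2: 0, 3: 1}},  # sample 2 is empty
}

def decodable_samples(track):
    """Samples worth decoding: all samples in the static case, and only
    the samples with num_slices > 0 in the dynamic case."""
    if track["dynamic_num_slices_flag"] == 0:
        return "all samples"
    return sorted(j for j, n in track["num_slices_per_sample"].items() if n > 0)
```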
The server may transmit a corresponding signaling message (also referred to as a signaling file) to the client. The signaling message includes: Representation1 (data stream 1 corresponding to the slice track 1): slice_id=1; and Representation2 (data stream 2 corresponding to the slice track 2): slice_id=2. Representation1 and Representation2 respectively correspond to track1 and track2.
Further, the client may request a corresponding data stream based on the signaling message. When decapsulating and decoding an obtained point cloud file/file segment, the client may decode a corresponding point cloud slice based on a user and slice information, to implement partial transmission and partial decoding.
For track2, it can be learned, from the information about the sample 2 indicated in AvsPccSliceInfoEntry, that the sample 2 includes no point cloud slice, so that the client can skip decoding the sample 2 during presentation.
In some embodiments, the slice information includes a first frame identification field, a multi-slice identification field, and a slice number field, and the first frame identification field, the multi-slice identification field, and the slice number field are all added in a geometry header corresponding to the point cloud frame; the first frame identification field indicates an identifier of the point cloud frame; the multi-slice identification field indicates that the point cloud frame includes one or more point cloud slices; and when a value of the multi-slice identification field is a first flag value, the point cloud frame includes a plurality of point cloud slices, M is a positive integer greater than 1, and a difference between a number of the plurality of point cloud slices and a field value of the slice number field is X, X being a non-negative integer; or when a value of the multi-slice identification field is a second flag value, the point cloud frame includes one point cloud slice, and M is equal to 1.
In some embodiments, the slice information includes a second frame identification field, the second frame identification field is added in an attribute header corresponding to the point cloud frame, and the second frame identification field indicates an identifier of the point cloud frame.
In some embodiments, the slice information includes a third frame identification field, the third frame identification field is added in a geometry slice header related to the point cloud frame, and the third frame identification field indicates an identifier of the point cloud frame to which a target point cloud slice corresponding to the geometry slice header belongs, the target point cloud slice belonging to the M point cloud slices.
In some embodiments, the geometry slice header includes a first slice identification field indicating the target point cloud slice; and when the geometry slice header is added in geometry data bitstream information related to the target point cloud slice, the geometry slice header is associated with geometry data corresponding to the target point cloud slice; or when the first slice identification field indicating the target point cloud slice exists in geometry data corresponding to the target point cloud slice, the geometry slice header is associated with the geometry data.
In some embodiments, the slice information further includes a fourth frame identification field, the fourth frame identification field is added in an attribute slice header related to the point cloud frame, and the fourth frame identification field indicates an identifier of the point cloud frame to which a target point cloud slice corresponding to the attribute slice header belongs, the target point cloud slice belonging to the M point cloud slices; and when a value of the multi-slice identification field is the first flag value, the attribute slice header further includes a reflectance attribute quantization parameter offset.
In some embodiments, the attribute slice header includes a second slice identification field indicating the target point cloud slice; and when the attribute slice header is added in attribute data bitstream information related to the target point cloud slice, the attribute slice header is associated with attribute data corresponding to the target point cloud slice; or when the second slice identification field indicating the target point cloud slice exists in attribute data corresponding to the target point cloud slice, the attribute slice header is associated with the attribute data.
The decapsulation module 11 may include: a message receiving unit 111 and a data request unit 112. The message receiving unit 111 is configured to receive a signaling message transmitted by a server, the signaling message being generated by the server based on the slice encapsulation information included in the slice track Ti, the signaling message including a point cloud slice identifier list corresponding to the slice track Ti, and the point cloud slice identifier list including an identifier of a point cloud slice included in the slice track Ti, and the point cloud slice identifier list being configured for being added to a separate point cloud slice identifier descriptor or to a component descriptor related to the slice track Ti. The data request unit 112 is configured to request the media file resource of the immersive media based on the signaling message, the media file resource including data streams corresponding to W slice tracks, and W being a positive integer less than or equal to T.
For implementation details of the message receiving unit 111 and the data request unit 112 according to some embodiments, reference may be made to operation 401 as illustrated in
The decoding module 12 is configured to decode the point cloud bitstream based on the slice information.
The W slice tracks include the slice track Ti. The decoding module 12 may include: an information parsing unit 121, a point cloud slice determining unit 122, and a point cloud slice decoding unit 123. The information parsing unit 121 is configured to parse metadata information carried by the media file resource, the metadata information including the slice encapsulation information included in the slice track Ti. The point cloud slice determining unit 122 is configured to determine, based on the slice encapsulation information, a to-be-decoded point cloud slice from a point cloud slice included in the slice track Ti. The point cloud slice determining unit 122 is configured to skip, when the slice encapsulation information indicates that a target sample in the slice track Ti does not include a point cloud slice, decoding the target sample, and determine the to-be-decoded point cloud slice from a sample other than the target sample. The point cloud slice decoding unit 123 is configured to decode the to-be-decoded point cloud slice based on slice information related to the to-be-decoded point cloud slice.
For implementation details of the information parsing unit 121, the point cloud slice determining unit 122, and the point cloud slice decoding unit 123 according to some embodiments, reference may be made to operation 402 as illustrated in
In some embodiments, the M point cloud slices in the point cloud frame are encapsulated in T slice tracks, T being a positive integer; the T slice tracks include a slice track Ti, i being a positive integer less than or equal to T; and the media file resource includes a slice track sample entry corresponding to the slice track Ti, and the slice track sample entry indicates slice encapsulation information included in the slice track Ti.
In some embodiments, the slice track sample entry includes a dynamic slice field, a slice maximum number field, and a third slice identification field; the dynamic slice field indicates change status of the slice encapsulation information in the slice track Ti; the slice maximum number field indicates a number of point cloud slices included in a maximum number sample in the slice track Ti, the maximum number sample is a sample with a maximum number of point cloud slices included in the slice track Ti, and a value of the slice maximum number field is N1, N1 being a positive integer; the third slice identification field indicates an identifier of a point cloud slice included in each sample in the slice track Ti; and when a value of the dynamic slice field is a third flag value, the change status of the slice encapsulation information is a static state, and N1 point cloud slices exist in each sample in the slice track Ti; or when a value of the dynamic slice field is a fourth flag value, the change status of the slice encapsulation information is a dynamic state, and at most N1 point cloud slices exist in each sample in the slice track Ti.
In some embodiments, when the change status of the slice encapsulation information is the dynamic state, the media file resource further includes a slice sample group entry corresponding to the slice track Ti, the slice sample group entry includes a slice presence field, and the slice presence field indicates presence or absence of a point cloud slice in a sample of the slice track Ti; when the point cloud bitstream includes S point cloud frames, the slice track Ti includes S samples, S being a positive integer; the S samples include a sample Sj, j being a positive integer less than or equal to S; when a value of a slice presence field corresponding to the sample Sj is a fifth flag value, the sample Sj belongs to a first sample group, a point cloud slice exists in each sample included in the first sample group, and slice encapsulation information included in each sample in the first sample group is indicated by a subsample information data box; or when a value of a slice presence field corresponding to the sample Sj is a sixth flag value, the sample Sj belongs to a second sample group, and no point cloud slice exists in each sample included in the second sample group.
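The first/second sample grouping above can be sketched as a simple partition over per-sample presence flags (the concrete values 1 and 0, standing in for the fifth and sixth flag values, are assumptions for illustration):

```python
# Illustrative partition of samples by a per-sample slice presence flag,
# mirroring the first and second sample groups described above.
# Assumed flag values: 1 for the fifth flag value (slice present),
# 0 for the sixth flag value (no slice present).
def group_by_slice_presence(presence_flags):
    """presence_flags[j] is the slice presence flag of sample Sj.
    Returns (first_group, second_group) as lists of sample indices."""
    first_group = [j for j, f in enumerate(presence_flags) if f == 1]
    second_group = [j for j, f in enumerate(presence_flags) if f == 0]
    return first_group, second_group
```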
In some embodiments, the slice track sample entry includes a dynamic slice field, and the dynamic slice field indicates the change status of the slice encapsulation information in the slice track Ti; when a value of the dynamic slice field is a third flag value, the change status of the slice encapsulation information is a static state; a slice track sample entry indicating static slice encapsulation information further includes a first track slice number field and a fourth slice identification field; the first track slice number field indicates a number of point cloud slices included in each sample in the slice track Ti, and when a value of the first track slice number field is K1, K1 point cloud slices exist in each sample in the slice track Ti, K1 being a positive integer; and the fourth slice identification field indicates an identifier of a point cloud slice included in each sample in the slice track Ti.
In some embodiments, when a value of the dynamic slice field is a fourth flag value, the change status of the slice encapsulation information is a dynamic state; the dynamic state indicates that the samples in the slice track Ti do not all include the same number of point cloud slices; the media file resource further includes a slice sample group entry indicating dynamic slice encapsulation information, and the slice sample group entry includes a second track slice number field and a fifth slice identification field; when the point cloud bitstream includes S point cloud frames, the slice track Ti includes S samples, S being a positive integer; the S samples include a sample Sj, j being a positive integer less than or equal to S; the second track slice number field indicates a number of point cloud slices included in the sample Sj in the slice track Ti; and when the number of the point cloud slices included in the sample Sj is greater than 0, the fifth slice identification field indicates an identifier of a point cloud slice included in the sample Sj.
In some embodiments, the M point cloud slices in the point cloud frame are classified into H point cloud tiles, H being a positive integer less than or equal to M; the H point cloud tiles are encapsulated in G tile tracks, G being a positive integer; the G tile tracks include a tile track Gp, p being a positive integer less than or equal to G; and the media file resource includes a tile track sample entry corresponding to the tile track Gp, and the tile track sample entry indicates tile encapsulation information included in the tile track Gp.
In some embodiments, the tile track sample entry includes a dynamic tile field, a tile maximum number field, and a first point cloud tile identification field; the dynamic tile field indicates the change status of the tile encapsulation information in the tile track Gp; the tile maximum number field indicates a number of point cloud tiles included in a maximum number sample in the tile track Gp, the maximum number sample is a sample with a maximum number of point cloud tiles included in the tile track Gp, and a value of the tile maximum number field is N2, N2 being a positive integer; the first point cloud tile identification field indicates an identifier of a point cloud tile included in each sample in the tile track Gp; and when a value of the dynamic tile field is a seventh flag value, the change status of the tile encapsulation information is a static state, and N2 point cloud tiles exist in each sample in the tile track Gp; or when a value of the dynamic tile field is an eighth flag value, the change status of the tile encapsulation information is a dynamic state, and at most N2 point cloud tiles exist in each sample in the tile track Gp.
In some embodiments, when the change status of the tile encapsulation information is the dynamic state, the media file resource further includes a tile sample group entry corresponding to the tile track Gp, the tile sample group entry includes a tile presence field, and the tile presence field indicates presence or absence of a point cloud tile in a sample of the tile track Gp; when the point cloud bitstream includes S point cloud frames, the tile track Gp includes S samples, S being a positive integer; the S samples include a sample Sj, j being a positive integer less than or equal to S; and when a value of a tile presence field corresponding to the sample Sj is a ninth flag value, the sample Sj belongs to a third sample group, a point cloud tile exists in each sample included in the third sample group, and tile encapsulation information included in each sample in the third sample group is indicated by a subsample information data box; or when a value of a tile presence field corresponding to the sample Sj is a tenth flag value, the sample Sj belongs to a fourth sample group, and no point cloud tile exists in each sample included in the fourth sample group.
In some embodiments, the tile track sample entry includes a dynamic tile field, and the dynamic tile field indicates the change status of the tile encapsulation information in the tile track Gp; when a value of the dynamic tile field is a seventh flag value, the change status of the tile encapsulation information is a static state; a tile track sample entry indicating static tile encapsulation information further includes a first track tile number field and a second point cloud tile identification field; the first track tile number field indicates a number of point cloud tiles included in each sample in the tile track Gp, and when a value of the first track tile number field is K2, K2 point cloud tiles exist in each sample in the tile track Gp, K2 being a positive integer; and the second point cloud tile identification field indicates an identifier of a point cloud tile included in each sample in the tile track Gp.
In some embodiments, when a value of the dynamic tile field is an eighth flag value, the change status of the tile encapsulation information is a dynamic state; the dynamic state indicates that the samples in the tile track Gp do not all include the same number of point cloud tiles; the media file resource further includes a tile sample group entry indicating dynamic tile encapsulation information, and the tile sample group entry includes a second track tile number field and a third point cloud tile identification field; when the point cloud bitstream includes S point cloud frames, the tile track Gp includes S samples, S being a positive integer; the S samples include a sample Sj, j being a positive integer less than or equal to S; the second track tile number field indicates a number of point cloud tiles included in the sample Sj in the tile track Gp; and when the number of the point cloud tiles included in the sample Sj is greater than 0, the third point cloud tile identification field indicates an identifier of a point cloud tile included in the sample Sj.
In some embodiments, the subsample information data box includes a component data indication field; when component data of a target point cloud slice exists in a subsample of the slice track Ti, the component data indication field indicates a data volume of the component data included in the subsample, the target point cloud slice belonging to the M point cloud slices, and the target point cloud slice being encapsulated in the slice track Ti; and when a value of the component data indication field is a first field value, the subsample includes all of the component data of the target point cloud slice; or when a value of the component data indication field is a second field value, the subsample includes a part of the component data of the target point cloud slice.
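As an illustrative reading of the component data indication field (the concrete values 1 and 0, standing in for the first and second field values, are assumptions, as are the function and parameter names), a slice's subsamples might be classified as follows:

```python
# Hypothetical interpretation of the component data indication field in the
# subsample information data box: an assumed value of 1 (first field value)
# means the subsample carries all component data of the target point cloud
# slice; an assumed value of 0 (second field value) means it carries a part.
def subsamples_for_slice(subsamples, slice_id):
    """subsamples: list of (slice_id, indication) tuples in stream order.
    Returns ('complete', 1) when one subsample carries all component data
    of the slice, or ('partial', n) when the slice's component data is
    spread over n partial subsamples."""
    parts = [ind for sid, ind in subsamples if sid == slice_id]
    if parts == [1]:
        return ("complete", 1)
    return ("partial", len(parts))
```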
In some embodiments, when a component information data box exists in the slice track Ti, a subsample of the slice track Ti includes component data corresponding to the component information data box; or when no component information data box exists in the slice track Ti, a subsample of the slice track Ti includes all component data of a target point cloud slice, the target point cloud slice belonging to the M point cloud slices, and the target point cloud slice being encapsulated in the slice track Ti.
In some embodiments, a point cloud frame may be a frame indicated by a start code and an end code together. The start code is any one of a frame start code, an intra-frame predicted picture start code, or an inter-frame predicted picture start code. The end code may be a frame end code corresponding to the frame start code, an intra-frame predicted picture end code corresponding to the intra-frame predicted picture start code, or an inter-frame predicted picture end code corresponding to the inter-frame predicted picture start code.
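A minimal sketch of such start/end-code framing follows; the byte values used as codes are placeholders assumed for illustration, not the actual bitstream syntax:

```python
# Illustrative frame extraction from a toy bitstream delimited by paired
# start and end codes. Any of the three start codes may open a frame, and
# each is closed by its corresponding end code. All code values below are
# made-up placeholders.
START_TO_END = {
    b"\x00\x00\x01\xb0": b"\x00\x00\x01\xb1",  # frame start / frame end
    b"\x00\x00\x01\xb2": b"\x00\x00\x01\xb3",  # intra-frame predicted picture start / end
    b"\x00\x00\x01\xb4": b"\x00\x00\x01\xb5",  # inter-frame predicted picture start / end
}

def extract_frames(stream: bytes):
    """Return the payload between each start code and its matching end code."""
    frames, pos = [], 0
    while pos < len(stream):
        for start, end in START_TO_END.items():
            if stream.startswith(start, pos):
                close = stream.find(end, pos + len(start))
                if close < 0:
                    return frames  # unterminated frame: stop scanning
                frames.append(stream[pos + len(start):close])
                pos = close + len(end)
                break
        else:
            pos += 1  # byte outside any frame: advance
    return frames
```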
For implementation details of the decapsulation module 11 and the decoding module 12 according to some embodiments, reference may be made to operations 401 and 402 as illustrated in
The M point cloud slices in the point cloud frame are encapsulated in T slice tracks, T being a positive integer; and the T slice tracks include a slice track Ti, i being a positive integer less than or equal to T. The encapsulation module 22 may include: an encapsulation information generating unit 221 and a file encapsulation unit 222. The encapsulation information generating unit 221 is configured to generate slice encapsulation information corresponding to the slice track Ti based on slice information related to a point cloud slice included in the slice track Ti, the slice encapsulation information belonging to metadata information related to the point cloud bitstream. The file encapsulation unit 222 is configured to encapsulate the metadata information and the T slice tracks into the media file resource of the immersive media.
For implementation details of the encapsulation information generating unit 221 and the file encapsulation unit 222 according to some embodiments, reference may be made to operation 502 as illustrated in
For implementation details of the encoding module 21 and the encapsulation module 22 according to some embodiments, reference may be made to operations 501 and 502 as illustrated in
According to some embodiments, each module or unit may exist separately, or modules or units may be combined into one or more units. Some units may be further split into multiple smaller functional subunits, thereby implementing the same operations without affecting the technical effects of some embodiments. The units are divided based on logical functions. In actual applications, a function of one unit may be realized by multiple units, or functions of multiple units may be realized by one unit. In some embodiments, the apparatus may further include other units. In actual applications, these functions may also be realized cooperatively by the other units, or cooperatively by multiple units.
A person skilled in the art would understand that these “modules” or “units” could be implemented by hardware logic, a processor or processors executing computer software code, or a combination of both. The “modules” or “units” may also be implemented in software stored in a memory of a computer or a non-transitory computer-readable medium, where the instructions of each unit are executable by a processor to thereby cause the processor to perform the respective operations of the corresponding unit.
In the computer device 1000 shown in
In addition, some embodiments provide a computer-readable storage medium, the computer-readable storage medium stores the computer program executed by the immersive media data processing apparatus 1 or the immersive media data processing apparatus 2, and the computer program includes program instructions. When a processor executes the program instructions, the immersive media data processing method in some embodiments as illustrated in
The computer-readable storage medium may be an internal storage unit of the immersive media data processing apparatus or the computer device provided in some embodiments, for example, a hard disk or an internal memory of the computer device. Alternatively, the computer-readable storage medium may be an external storage device of the computer device, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, or the like that is equipped on the computer device. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of the computer device. The computer-readable storage medium is configured to store the computer program and other programs and data for the computer device. The computer-readable storage medium may be further configured to temporarily store data that has been outputted or that is to be outputted.
In addition, some embodiments provide a computer program product or a computer program. The computer program product or the computer program includes computer instructions. The computer instructions are stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium. The processor executes the computer instructions to enable the computer device to perform the method provided in some embodiments as illustrated in
Further, refer to
In the specification, claims, and accompanying drawings of some embodiments, the terms “first”, “second”, and the like are intended to distinguish different objects and are not intended to describe a specific sequence. In addition, the terms “include” and any variant thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of operations or units is not limited to the listed operations or units, and further includes any operation or unit that is intrinsic to the process, method, apparatus, product, or device.
The foregoing embodiments are intended to describe, rather than limit, the technical solutions of the disclosure. A person of ordinary skill in the art shall understand that although the disclosure has been described in detail with reference to the foregoing embodiments, modifications can be made to the technical solutions described in the foregoing embodiments, or equivalent replacements can be made to some technical features in the technical solutions, provided that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the disclosure and the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202211008936.X | Aug 2022 | CN | national |
This application is a continuation application of International Application No. PCT/CN2023/106305 filed on Jul. 7, 2023, which claims priority to Chinese Patent Application No. 202211008936.X, filed on Aug. 22, 2022, the disclosures of each being incorporated by reference herein in their entireties.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2023/106305 | Jul 2023 | WO |
Child | 18989464 | US |