Disclosed are embodiments related to applying an overlay process to a picture.
High Efficiency Video Coding (HEVC) is a block-based video codec standardized by ITU-T and MPEG that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. Temporal prediction is achieved using uni-directional (P) or bi-directional inter (B) prediction on the block level from previously decoded reference pictures. In the encoder, the difference between the original pixel data and the predicted pixel data, referred to as the residual, is transformed into the frequency domain, quantized and then entropy coded before being transmitted together with necessary prediction parameters, such as prediction modes and motion vectors, which are also entropy coded. The decoder performs entropy decoding, inverse quantization and inverse transformation to obtain the residual, and then adds the residual to an intra or inter prediction to reconstruct a picture.
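The encode/decode round trip described above can be sketched as follows. This is an illustrative sketch only: the spatial transform is omitted for brevity, a uniform scalar quantizer stands in for the real quantization design, and none of the names are taken from any codec specification.

```python
def encode_block(original, prediction, qstep):
    """Compute the residual and return quantized residual coefficients."""
    residual = [o - p for o, p in zip(original, prediction)]
    # Uniform scalar quantization; a real encoder transforms first.
    return [round(r / qstep) for r in residual]

def decode_block(coeffs, prediction, qstep):
    """Inverse-quantize the residual and add it to the prediction."""
    residual = [c * qstep for c in coeffs]
    return [p + r for p, r in zip(prediction, residual)]

original   = [100, 104, 98, 101]
prediction = [ 99, 100, 99, 100]
coeffs = encode_block(original, prediction, qstep=2)
recon  = decode_block(coeffs, prediction, qstep=2)
# A larger qstep (i.e., a higher QP) gives coarser coefficients and
# a less faithful reconstruction.
```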
MPEG and ITU-T are working on the successor to HEVC within the Joint Video Exploratory Team (JVET). The name of this video codec is Versatile Video Coding (VVC), and version 1 of the VVC specification, which is the current version of VVC at the time of writing, has been published as Rec. ITU-T H.266 | ISO/IEC 23090-3, “Versatile Video Coding”, 2020.
A video (a.k.a., video sequence) consists of a series of pictures (a.k.a., images) where each picture consists of one or more components. Each component can be described as a two-dimensional rectangular array of sample values. It is common that a picture in a video sequence consists of three components: one luma component Y, where the sample values are luma values, and two chroma components, Cb and Cr, where the sample values are chroma values. It is also common that the dimensions of the chroma components are smaller than those of the luma component by a factor of two in each dimension. For example, the size of the luma component of an HD picture would be 1920×1080 and the chroma components would each have the dimension of 960×540. Components are sometimes referred to as color components.
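For the common 4:2:0-style subsampling in the HD example above, the chroma dimensions follow from the luma dimensions by halving each; a minimal illustrative helper (not spec code):

```python
def chroma_dimensions(luma_width, luma_height):
    """Chroma plane size when each chroma component is half the luma
    size in both dimensions (illustrative helper, not spec code)."""
    return luma_width // 2, luma_height // 2

# HD example from the text: a 1920x1080 luma component gives
# 960x540 chroma components.
hd_chroma = chroma_dimensions(1920, 1080)
```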
A block is one two-dimensional array of samples. In video coding, each component is split into blocks and the coded video bitstream consists of a series of coded blocks. It is common in video coding that the picture is split into units that cover a specific area of the picture. Each unit consists of all blocks from all components that make up that specific area and each block belongs fully to one unit. The macroblock in H.264 and the Coding unit (CU) in HEVC are examples of units.
A block can alternatively be defined as a two-dimensional array that a transform used in coding is applied to. These blocks are known under the name “transform blocks.” Alternatively, a block can be defined as a two-dimensional array that a single prediction mode is applied to. These blocks can be called “prediction blocks”. In this application, the word block is not tied to one of these definitions; the descriptions herein can apply to either definition.
A residual block consists of samples that represent sample value differences between sample values of the original source blocks and the prediction blocks. The residual block is processed using a spatial transform. In the encoder, the transform coefficients are quantized according to a quantization parameter (QP) which controls the precision of the quantized coefficients. The quantized coefficients can be referred to as residual coefficients. A high QP value would result in low precision of the coefficients and thus low fidelity of the residual block. A decoder receives the residual coefficients and applies inverse quantization and an inverse transform to derive the residual block.
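The relation between QP and precision can be illustrated with the common approximation that the quantization step size doubles for every increase of 6 in QP. This is a hedged sketch with illustrative names; real codecs use integer scaling tables rather than floating-point arithmetic.

```python
def quantization_step(qp):
    """Approximate HEVC-style quantization step size: it doubles for
    every increase of 6 in QP (a common approximation, not spec code)."""
    return 2 ** ((qp - 4) / 6)

def quantize(coeff, qp):
    """Quantize one transform coefficient at the given QP; a higher QP
    gives a larger step and therefore lower precision."""
    return round(coeff / quantization_step(qp))
```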
Both HEVC and VVC define a Network Abstraction Layer (NAL). All the data, i.e., both Video Coding Layer (VCL) and non-VCL data, in HEVC and VVC is encapsulated in NAL units. A VCL NAL unit contains data that represents picture sample values. A non-VCL NAL unit contains additional associated data such as parameter sets and supplemental enhancement information (SEI) messages. The NAL unit in HEVC begins with a header which specifies the NAL unit type of the NAL unit, which identifies what type of data is carried in the NAL unit, as well as the layer ID and the temporal ID to which the NAL unit belongs. The NAL unit type is transmitted in the nal_unit_type codeword in the NAL unit header and the type indicates and defines how the NAL unit should be parsed and decoded. The rest of the bytes of the NAL unit are the payload of the type indicated by the NAL unit type. A bitstream consists of a series of concatenated NAL units.
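The two-byte VVC NAL unit header layout (as in Table 2: a forbidden bit, a reserved bit, a 6-bit layer ID, a 5-bit NAL unit type and a 3-bit temporal ID) can be parsed as sketched below; the function name is illustrative.

```python
def parse_vvc_nal_header(b0, b1):
    """Parse the two-byte VVC NAL unit header: 1-bit forbidden_zero_bit,
    1-bit nuh_reserved_zero_bit, 6-bit nuh_layer_id, 5-bit nal_unit_type
    and 3-bit nuh_temporal_id_plus1 (function name is illustrative)."""
    return {
        "forbidden_zero_bit":    (b0 >> 7) & 0x1,
        "nuh_reserved_zero_bit": (b0 >> 6) & 0x1,
        "nuh_layer_id":          b0 & 0x3F,
        "nal_unit_type":         (b1 >> 3) & 0x1F,
        "nuh_temporal_id_plus1": b1 & 0x07,
    }
```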
The syntax for the NAL unit header for HEVC is shown in Table 1.
The syntax for the NAL unit header in the current version of VVC is shown in Table 2.
The NAL unit types of the current version of VVC are shown in Table 3.
The decoding order is the order in which NAL units shall be decoded, which is the same as the order of the NAL units within the bitstream. The decoding order may be different from the output order, which is the order in which decoded pictures are to be output, such as for display, by the decoder.
In HEVC and VVC, all pictures are associated with a TemporalId value that specifies the temporal layer to which the picture belongs. TemporalId values are decoded from the nuh_temporal_id_plus1 syntax element in the NAL unit header. The encoder is required to set TemporalId values such that pictures belonging to a lower layer are perfectly decodable when higher temporal layers are discarded. Assume, for instance, that an encoder has output a bitstream using temporal layers 0, 1 and 2. Then removing all layer 2 NAL units, or removing all layer 1 and 2 NAL units, will result in bitstreams that can be decoded without problems. This is ensured by restrictions in the HEVC specification with which the encoder must comply. For instance, a picture of a temporal layer is not allowed to reference a picture of a higher temporal layer.
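Discarding higher temporal layers as described can be sketched as a simple filter over (TemporalId, payload) pairs; the pair representation is an illustrative stand-in for real NAL unit parsing.

```python
def extract_temporal_sublayers(nal_units, max_tid):
    """Keep only NAL units whose TemporalId does not exceed max_tid.
    Because a picture may never reference a higher temporal layer,
    the filtered stream remains decodable."""
    return [(tid, data) for tid, data in nal_units if tid <= max_tid]

# Illustrative stream with temporal layers 0, 1 and 2:
stream = [(0, "I"), (2, "b"), (1, "B"), (2, "b"), (0, "P")]
base_and_mid = extract_temporal_sublayers(stream, max_tid=1)
```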
The value of the nuh_layer_id syntax element in the NAL unit header specifies the layer ID to which the NAL unit belongs. A layer access unit in VVC is defined as a set of one or more NAL units for which the VCL NAL units all have a particular value of nuh_layer_id, that are associated with each other according to a specified classification rule, that are consecutive in decoding order, and that contain exactly one coded picture.
A coded layer video sequence (CLVS) in VVC version 1 is defined as a sequence of layer access units that consists, in decoding order, of a CLVS layer access unit, followed by zero or more layer access units that are not CLVS layer access units, including all subsequent layer access units up to but not including any subsequent layer access unit that is a CLVS layer access unit. The relation between the layer access units and coded layer video sequences is illustrated in
In VVC version 1, layers may be coded independently or dependently from each other. When the layers are coded independently, a layer with, for example, nuh_layer_id 0 may not predict video data from another layer with e.g. nuh_layer_id 1. In VVC version 1, dependent coding between layers may be used, which enables support for scalable coding with SNR, spatial and view scalability.
VVC includes a picture header, which is a NAL unit having nal_unit_type equal to PH_NUT. The picture header is similar to the slice header, but the values of the syntax elements in the picture header are used to decode all slices of one picture. Each picture in VVC consists of one picture header NAL unit followed by all coded slices of the picture, where each coded slice is conveyed in one coded slice NAL unit.
For single layer coding in HEVC, an access unit (AU) is the coded representation of a single picture. An AU may consist of several video coding layer (VCL) NAL units as well as non-VCL NAL units.
An Intra Random Access Point (IRAP) picture in HEVC is a picture that does not refer to any pictures other than itself for prediction in its decoding process. The first picture in the bitstream in decoding order in HEVC must be an IRAP picture, but an IRAP picture may additionally also appear later in the bitstream. HEVC specifies three types of IRAP pictures, the broken link access (BLA) picture, the instantaneous decoder refresh (IDR) picture, and the clean random access (CRA) picture.
A coded video sequence (CVS) in HEVC is a series of access units starting at an IRAP access unit up to, but not including the next IRAP access unit in decoding order.
IDR pictures always start a new CVS. An IDR picture may have associated random access decodable leading (RADL) pictures. An IDR picture does not have associated random access skipped leading (RASL) pictures.
BLA pictures also start a new CVS and have the same effect on the decoding process as an IDR picture. However, a BLA picture in HEVC may contain syntax elements that specify a non-empty set of one or more reference pictures. A BLA picture may have associated RASL pictures, which are not output by the decoder and may not be decodable, as they may contain references to pictures that may not be present in the bitstream. A BLA picture may also have associated RADL pictures, which are decoded.
A CRA picture may have associated RADL or RASL pictures. As with a BLA picture, a CRA picture may contain syntax elements that specify a non-empty set of one or more reference pictures. For CRA pictures, a flag can be set to specify that the associated RASL pictures are not output by the decoder, because they may not be decodable, as they may contain references to pictures that are not present in the bitstream. A CRA may or may not start a CVS.
In VVC, there is also the GRA picture, which may or may not start a CVS and which does so without an intra picture. A coded layer video sequence start (CLVSS) picture in VVC is an IRAP picture or a GRA picture. A CLVSS picture in VVC may start a VVC coded layer video sequence (CLVS), which is similar to a CVS in HEVC. There is no BLA picture type in VVC.
HEVC specifies three types of parameter sets, the picture parameter set (PPS), the sequence parameter set (SPS) and the video parameter set (VPS). The PPS contains data that is common for a whole picture, the SPS contains data that is common for a coded video sequence (CVS) and the VPS contains data that is common for multiple CVSs.
VVC also uses the parameter set types of HEVC. In VVC, there is also the adaptation parameter set (APS) and the decoding parameter set (DPS). The APS may contain information that can be used for multiple slices, and two slices of the same picture can use different APSs. The DPS consists of information specifying the “worst case” in terms of profile and level that the decoder will encounter in the entire bitstream.
Supplementary Enhancement Information (SEI) messages are codepoints in the coded bitstream that do not influence the decoding process of coded pictures from VCL NAL units. SEI messages usually address issues of representation/rendering of the decoded bitstream. The overall concept of SEI messages and many of the messages themselves have been inherited from the H.264 and HEVC specifications into the VVC specification. In VVC, an SEI RBSP contains one or more SEI messages.
The SEI message syntax table describing the general structure of an SEI message in VVC is shown in Table 4. The type of each SEI message is identified by its payload type.
Annex D in the VVC specification specifies syntax and semantics for SEI message payloads for some SEI messages, and specifies the use of the SEI messages and VUI parameters for which the syntax and semantics are specified in ITU-T H.SEI|ISO/IEC 23002-7. The SEI payload structure in Annex D of VVC version 1 lists the SEI messages supported in VVC version 1. Table 5 shows general SEI payload syntax in VVC version 1 where the SEI payload is the container of SEI messages.
SEI messages assist in processes related to decoding, display or other purposes. However, SEI messages are not required for constructing the luma or chroma samples by the decoding process. Some SEI messages are required for checking bitstream conformance and for output timing decoder conformance. Other SEI messages are not required for checking bitstream conformance. A decoder is not required to support all SEI messages. Usually, if a decoder encounters an unsupported SEI message, it is discarded.
ITU-T H.274|ISO/IEC 23002-7, also referred to as VSEI, specifies the syntax and semantics of SEI messages. It is particularly intended for use with VVC bitstreams, although it is written in a manner generic enough that it may also be used with other types of coded video bitstreams. The first version of ITU-T H.274|ISO/IEC 23002-7 was finalized in July 2020. At the time of writing, version 2 is under development; JVET-U2006-v1 is the current draft for version 2, which specifies additional SEI messages for use with coded video bitstreams.
The persistence of an SEI message indicates the pictures to which the values signalled in the instance of the SEI message may apply. The part of the bitstream that the values of the SEI message may apply to are referred to as the persistence scope of the SEI message.
Each SEI message is thus associated with a persistence scope that specifies the part of the bitstream to which the SEI message applies. Table 6 describes the persistence scope of the SEI messages defined in VVC version 1 and Table 7 describes the persistence scope of the SEI messages defined in VSEI.
The current version of VVC includes a tool called tiles that divides a picture into rectangular spatially independent regions. Tiles in the VVC coding standard are similar to the tiles used in HEVC. Using tiles, a picture in VVC can be partitioned into rows and columns of CTUs where a tile is an intersection of a row and a column.
The tile structure is signalled in the picture parameter set (PPS) by specifying the heights of the rows and the widths of the columns. Individual rows and columns can have different sizes, but the partitioning always spans across the entire picture, from left to right and top to bottom respectively.
There is no decoding dependency between tiles of the same picture. This includes intra prediction, context selection for entropy coding and motion vector prediction. One exception is that in-loop filtering dependencies are generally allowed between tiles.
In the rectangular slice mode in VVC, a tile can further be split into multiple slices where each slice consists of a consecutive number of CTU rows inside one tile.
The concept of slices in HEVC divides the picture into independently coded slices, where decoding of one slice in a picture is independent of other slices of the same picture. Different coding types could be used for slices of the same picture, i.e. a slice could either be an I-slice, P-slice or B-slice. One purpose of slices is to enable resynchronization in case of data loss. In HEVC, a slice is a set of one or more CTUs.
In the current version of VVC, a slice is defined as an integer number of complete tiles or an integer number of consecutive complete CTU rows within a tile of a picture that are exclusively contained in a single NAL unit. A picture may be partitioned into either raster scan slices or rectangular slices. A raster scan slice consists of a number of complete tiles in raster scan order. A rectangular slice consists of a group of tiles that together occupy a rectangular region in the picture or a consecutive number of CTU rows inside one tile. Each slice has a slice header comprising syntax elements. Decoded slice header values from these syntax elements are used when decoding the slice. Each slice is carried in one VCL NAL unit. In an early version of the VVC draft specification, slices were referred to as tile groups. In the current version of VVC, the partition layout of the rectangular slices is signalled in the PPS as described in Table 8.
The semantics for the syntax elements in Table 8 are given below.
pps_no_pic_partition_flag equal to 1 specifies that no picture partitioning is applied to each picture referring to the PPS. pps_no_pic_partition_flag equal to 0 specifies that each picture referring to the PPS might be partitioned into more than one tile or slice. When sps_num_subpics_minus1 is greater than 0 or pps_mixed_nalu_types_in_pic_flag is equal to 1, the value of pps_no_pic_partition_flag shall be equal to 0.
pps_log2_ctu_size_minus5 plus 5 specifies the luma coding tree block size of each CTU. pps_log2_ctu_size_minus5 shall be equal to sps_log2_ctu_size_minus5.
pps_num_exp_tile_columns_minus1 plus 1 specifies the number of explicitly provided tile column widths. The value of pps_num_exp_tile_columns_minus1 shall be in the range of 0 to PicWidthInCtbsY−1, inclusive. When pps_no_pic_partition_flag is equal to 1, the value of pps_num_exp_tile_columns_minus1 is inferred to be equal to 0.
pps_num_exp_tile_rows_minus1 plus 1 specifies the number of explicitly provided tile row heights. The value of pps_num_exp_tile_rows_minus1 shall be in the range of 0 to PicHeightInCtbsY−1, inclusive. When pps_no_pic_partition_flag is equal to 1, the value of pps_num_exp_tile_rows_minus1 is inferred to be equal to 0.
pps_tile_column_width_minus1[i] plus 1 specifies the width of the i-th tile column in units of CTBs for i in the range of 0 to pps_num_exp_tile_columns_minus1, inclusive. pps_tile_column_width_minus1[pps_num_exp_tile_columns_minus1] is also used to derive the widths of the tile columns with index greater than pps_num_exp_tile_columns_minus1 as specified in clause 6.5.1 of the above-referenced VVC specification. The value of pps_tile_column_width_minus1[i] shall be in the range of 0 to PicWidthInCtbsY−1, inclusive. When not present, the value of pps_tile_column_width_minus1[0] is inferred to be equal to PicWidthInCtbsY−1.
pps_tile_row_height_minus1[i] plus 1 specifies the height of the i-th tile row in units of CTBs for i in the range of 0 to pps_num_exp_tile_rows_minus1, inclusive. pps_tile_row_height_minus1[pps_num_exp_tile_rows_minus1] is also used to derive the heights of the tile rows with index greater than pps_num_exp_tile_rows_minus1 as specified in clause 6.5.1 of the above-referenced VVC specification. The value of pps_tile_row_height_minus1[i] shall be in the range of 0 to PicHeightInCtbsY−1, inclusive. When not present, the value of pps_tile_row_height_minus1[0] is inferred to be equal to PicHeightInCtbsY−1.
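The derivation referred to above (clause 6.5.1) fills in the tile sizes beyond the explicitly signalled ones by repeating the last explicit width or height until the picture is covered, with a possibly smaller final tile. The same fill rule also applies to the slice heights within a tile signalled by pps_exp_slice_height_in_ctus_minus1. The following is a simplified sketch of that rule, not spec text, and the names are illustrative.

```python
def derive_tile_sizes(explicit_sizes_minus1, pic_size_in_ctbs):
    """Sketch of the clause 6.5.1 fill rule: explicit sizes first,
    then the last explicit size repeated over the remaining picture,
    with a possibly smaller final tile (simplified)."""
    sizes = []
    remaining = pic_size_in_ctbs
    for m1 in explicit_sizes_minus1:
        sizes.append(m1 + 1)
        remaining -= m1 + 1
    uniform = explicit_sizes_minus1[-1] + 1
    while remaining >= uniform:
        sizes.append(uniform)
        remaining -= uniform
    if remaining > 0:
        sizes.append(remaining)
    return sizes

# A picture 20 CTBs wide with one explicit 6-CTB column width
# yields columns of 6, 6, 6 and a final 2-CTB column.
columns = derive_tile_sizes([5], 20)
```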
pps_loop_filter_across_tiles_enabled_flag equal to 1 specifies that in-loop filtering operations across tile boundaries are enabled for pictures referring to the PPS. pps_loop_filter_across_tiles_enabled_flag equal to 0 specifies that in-loop filtering operations across tile boundaries are disabled for pictures referring to the PPS. The in-loop filtering operations include the deblocking filter, SAO, and ALF operations. When not present, the value of pps_loop_filter_across_tiles_enabled_flag is inferred to be equal to 0.
pps_rect_slice_flag equal to 0 specifies that the raster-scan slice mode is in use for each picture referring to the PPS and the slice layout is not signalled in the PPS. pps_rect_slice_flag equal to 1 specifies that the rectangular slice mode is in use for each picture referring to the PPS and the slice layout is signalled in the PPS. When not present, the value of pps_rect_slice_flag is inferred to be equal to 1. When sps_subpic_info_present_flag is equal to 1 or pps_mixed_nalu_types_in_pic_flag is equal to 1, the value of pps_rect_slice_flag shall be equal to 1.
pps_single_slice_per_subpic_flag equal to 1 specifies that each subpicture consists of one and only one rectangular slice. pps_single_slice_per_subpic_flag equal to 0 specifies that each subpicture could consist of one or more rectangular slices. When pps_no_pic_partition_flag is equal to 1, the value of pps_single_slice_per_subpic_flag is inferred to be equal to 1. Note 4—when there is only one subpicture per picture, pps_single_slice_per_subpic_flag equal to 1 means that there is only one slice per picture.
pps_num_slices_in_pic_minus1 plus 1 specifies the number of rectangular slices in each picture referring to the PPS. The value of pps_num_slices_in_pic_minus1 shall be in the range of 0 to MaxSlicesPerAu−1, inclusive, where MaxSlicesPerAu is specified in Annex A of the above-referenced VVC specification. When pps_no_pic_partition_flag is equal to 1, the value of pps_num_slices_in_pic_minus1 is inferred to be equal to 0. When pps_single_slice_per_subpic_flag is equal to 1, the value of pps_num_slices_in_pic_minus1 is inferred to be equal to sps_num_subpics_minus1.
pps_tile_idx_delta_present_flag equal to 0 specifies that pps_tile_idx_delta_val[i] syntax elements are not present in the PPS and all pictures referring to the PPS are partitioned into rectangular slice rows and rectangular slice columns in slice raster order. pps_tile_idx_delta_present_flag equal to 1 specifies that pps_tile_idx_delta_val[i] syntax elements could be present in the PPS and all rectangular slices in pictures referring to the PPS are specified in the order indicated by the values of the pps_tile_idx_delta_val[i] in increasing values of i. When not present, the value of pps_tile_idx_delta_present_flag is inferred to be equal to 0.
pps_slice_width_in_tiles_minus1[i] plus 1 specifies the width of the i-th rectangular slice in units of tile columns. The value of pps_slice_width_in_tiles_minus1[i] shall be in the range of 0 to NumTileColumns−1, inclusive. When not present, the value of pps_slice_width_in_tiles_minus1[i] is inferred to be equal to 0.
pps_slice_height_in_tiles_minus1[i] plus 1 specifies the height of the i-th rectangular slice in units of tile rows when pps_num_exp_slices_in_tile[i] is equal to 0. The value of pps_slice_height_in_tiles_minus1[i] shall be in the range of 0 to NumTileRows−1, inclusive.
When pps_slice_height_in_tiles_minus1[i] is not present, it is inferred as follows: If SliceTopLeftTileIdx[i]/NumTileColumns is equal to NumTileRows−1, the value of pps_slice_height_in_tiles_minus1[i] is inferred to be equal to 0; Otherwise, the value of pps_slice_height_in_tiles_minus1[i] is inferred to be equal to pps_slice_height_in_tiles_minus1[i−1].
pps_num_exp_slices_in_tile[i] specifies the number of explicitly provided slice heights for the slices in the tile containing the i-th slice (i.e., the tile with tile index equal to SliceTopLeftTileIdx[i]). The value of pps_num_exp_slices_in_tile[i] shall be in the range of 0 to RowHeightVal[SliceTopLeftTileIdx[i]/NumTileColumns]−1, inclusive. When not present, the value of pps_num_exp_slices_in_tile[i] is inferred to be equal to 0. If pps_num_exp_slices_in_tile[i] is equal to 0, the tile containing the i-th slice is not split into multiple slices. Otherwise (pps_num_exp_slices_in_tile[i] is greater than 0), the tile containing the i-th slice might or might not be split into multiple slices.
pps_exp_slice_height_in_ctus_minus1[i][j] plus 1 specifies the height of the j-th rectangular slice in the tile containing the i-th slice, in units of CTU rows, for j in the range of 0 to pps_num_exp_slices_in_tile[i]−1, inclusive, when pps_num_exp_slices_in_tile[i] is greater than 0. pps_exp_slice_height_in_ctus_minus1[i] [pps_num_exp_slices_in_tile[i] ] is also used to derive the heights of the rectangular slices in the tile containing the i-th slice with index greater than pps_num_exp_slices_in_tile[i]−1 as specified in clause 6.5.1 of the above-referenced VVC specification. The value of pps_exp_slice_height_in_ctus_minus1[i][j] shall be in the range of 0 to RowHeightVal[SliceTopLeftTileIdx[i]/NumTileColumns]−1, inclusive.
pps_tile_idx_delta_val[i] specifies the difference between the tile index of the tile containing the first CTU in the (i+1)-th rectangular slice and the tile index of the tile containing the first CTU in the i-th rectangular slice. The value of pps_tile_idx_delta_val[i] shall be in the range of −NumTilesInPic+1 to NumTilesInPic−1, inclusive. When not present, the value of pps_tile_idx_delta_val[i] is inferred to be equal to 0. When present, the value of pps_tile_idx_delta_val[i] shall not be equal to 0. When pps_rect_slice_flag is equal to 1, it is a requirement of bitstream conformance that, for any two slices, with picture-level slice indices idxA and idxB, that belong to the same picture and different subpictures, when SubpicIdxForSlice[idxA] is less than SubpicIdxForSlice[idxB], the value of idxA shall be less than idxB.
pps_loop_filter_across_slices_enabled_flag equal to 1 specifies that in-loop filtering operations across slice boundaries are enabled for pictures referring to the PPS. pps_loop_filter_across_slices_enabled_flag equal to 0 specifies that in-loop filtering operations across slice boundaries are disabled for pictures referring to the PPS. The in-loop filtering operations include the deblocking filter, SAO, and ALF operations. When not present, the value of pps_loop_filter_across_slices_enabled_flag is inferred to be equal to 0.
Subpictures are supported in the current version of VVC. A subpicture in VVC is defined as a rectangular region of one or more slices within a picture. That is, a subpicture contains one or more slices that collectively cover a rectangular region of a picture.
In the current version of VVC, subpicture location and size are signalled in the SPS. Boundaries of a subpicture region may be treated as picture boundaries (excluding in-loop filtering operations), conditioned on a per-subpicture flag sps_subpic_treated_as_pic_flag[i] in the SPS. In-loop filtering across subpicture boundaries is likewise conditioned on a per-subpicture flag sps_loop_filter_across_subpic_enabled_flag[i] in the SPS.
There is also a subpicture ID mapping mechanism, signalled in the SPS or in the PPS, which is gated by two flags in the SPS, sps_subpic_id_present_flag and sps_subpic_id_signalling_present_flag, and a flag in the PPS, pps_subpic_id_mapping_present_flag. The subpicture ID mapping mechanism maps each subpicture ID of a picture associated with the SPS/PPS to an index describing the bitstream order of the subpictures in the picture. This mechanism enables bitstream extraction and merge operations to be performed without having to rewrite the subpicture ID signalled in each slice; only the SPS/PPS may need to be rewritten.
Table 9 shows the subpicture syntax in the SPS in the current version of VVC. In Table 9 variable i is the subpicture index and the syntax elements for subpicture position, size and other properties are signalled for each subpicture in the order of subpicture index. For instance, all the syntax elements with i equal to 0 specify position, size and other properties of a subpicture with subpicture index equal to 0.
The semantics for the syntax elements in Table 9 are given below.
sps_subpic_info_present_flag equal to 1 specifies that subpicture information is present for the CLVS and there might be one or more than one subpicture in each picture of the CLVS. sps_subpic_info_present_flag equal to 0 specifies that subpicture information is not present for the CLVS and there is only one subpicture in each picture of the CLVS. When sps_res_change_in_clvs_allowed_flag is equal to 1, the value of sps_subpic_info_present_flag shall be equal to 0. Note 5—when a bitstream is the result of a subpicture sub-bitstream extraction process and contains only a subset of the subpictures of the input bitstream to the subpicture sub-bitstream extraction process, it might be required to set the value of sps_subpic_info_present_flag equal to 1 in the RBSP of the SPSs.
sps_num_subpics_minus1 plus 1 specifies the number of subpictures in each picture in the CLVS. The value of sps_num_subpics_minus1 shall be in the range of 0 to MaxSlicesPerAu−1, inclusive, where MaxSlicesPerAu is specified in Annex A of the above-referenced VVC specification. When not present, the value of sps_num_subpics_minus1 is inferred to be equal to 0.
sps_independent_subpics_flag equal to 1 specifies that all subpicture boundaries in the CLVS are treated as picture boundaries and there is no loop filtering across the subpicture boundaries. sps_independent_subpics_flag equal to 0 does not impose such a constraint. When not present, the value of sps_independent_subpics_flag is inferred to be equal to 1.
sps_subpic_same_size_flag equal to 1 specifies that all subpictures in the CLVS have the same width specified by sps_subpic_width_minus1[0] and the same height specified by sps_subpic_height_minus1[0]. sps_subpic_same_size_flag equal to 0 does not impose such a constraint. When not present, the value of sps_subpic_same_size_flag is inferred to be equal to 0. Let the variable tmpWidthVal be set equal to (sps_pic_width_max_in_luma_samples+CtbSizeY−1)/CtbSizeY, and the variable tmpHeightVal be set equal to (sps_pic_height_max_in_luma_samples+CtbSizeY−1)/CtbSizeY.
sps_subpic_ctu_top_left_x[i] specifies the horizontal position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of the syntax element is Ceil(Log2(tmpWidthVal)) bits. When sps_subpic_same_size_flag is equal to 1, the variable numSubpicCols, specifying the number of subpicture columns in each picture in the CLVS, is derived as follows: numSubpicCols=tmpWidthVal/(sps_subpic_width_minus1[0]+1). When sps_subpic_same_size_flag is equal to 1, the value of numSubpicCols*tmpHeightVal/(sps_subpic_height_minus1[0]+1)−1 shall be equal to sps_num_subpics_minus1. When not present, the value of sps_subpic_ctu_top_left_x[i] is inferred as follows: If sps_subpic_same_size_flag is equal to 0 or i is equal to 0, the value of sps_subpic_ctu_top_left_x[i] is inferred to be equal to 0; otherwise, the value of sps_subpic_ctu_top_left_x[i] is inferred to be equal to (i % numSubpicCols)*(sps_subpic_width_minus1[0]+1).
sps_subpic_ctu_top_left_y[i] specifies the vertical position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of the syntax element is Ceil(Log2(tmpHeightVal)) bits. When not present, the value of sps_subpic_ctu_top_left_y[i] is inferred as follows: If sps_subpic_same_size_flag is equal to 0 or i is equal to 0, the value of sps_subpic_ctu_top_left_y[i] is inferred to be equal to 0; otherwise, the value of sps_subpic_ctu_top_left_y[i] is inferred to be equal to (i/numSubpicCols)*(sps_subpic_height_minus1[0]+1).
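When sps_subpic_same_size_flag is equal to 1, the inferences above place the subpictures on a regular grid. The following is a sketch of that inference; the function and variable names are illustrative, and all positions are in units of CtbSizeY.

```python
def infer_subpic_top_left(i, subpic_width_minus1_0, subpic_height_minus1_0,
                          tmp_width_val):
    """Inferred (x, y) of the top-left CTU of the i-th subpicture when
    sps_subpic_same_size_flag is equal to 1 (illustrative sketch)."""
    # Number of subpicture columns across the picture width.
    num_subpic_cols = tmp_width_val // (subpic_width_minus1_0 + 1)
    x = (i % num_subpic_cols) * (subpic_width_minus1_0 + 1)
    y = (i // num_subpic_cols) * (subpic_height_minus1_0 + 1)
    return x, y

# Picture 8 CTBs wide, subpictures 4 CTBs wide and 2 CTBs tall:
# subpicture index 3 lands in column 1, row 1 of the grid.
pos = infer_subpic_top_left(3, 3, 1, 8)
```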
sps_subpic_width_minus1[i] plus 1 specifies the width of the i-th subpicture in units of CtbSizeY. The length of the syntax element is Ceil(Log2(tmpWidthVal)) bits. When not present, the value of sps_subpic_width_minus1[i] is inferred as follows: If sps_subpic_same_size_flag is equal to 0 or i is equal to 0, the value of sps_subpic_width_minus1[i] is inferred to be equal to tmpWidthVal−sps_subpic_ctu_top_left_x[i]−1; Otherwise, the value of sps_subpic_width_minus1[i] is inferred to be equal to sps_subpic_width_minus1[0]. When sps_subpic_same_size_flag is equal to 1, the value of tmpWidthVal % (sps_subpic_width_minus1[0]+1) shall be equal to 0.
sps_subpic_height_minus1[i] plus 1 specifies the height of the i-th subpicture in units of CtbSizeY. The length of the syntax element is Ceil(Log2(tmpHeightVal)) bits. When not present, the value of sps_subpic_height_minus1[i] is inferred as follows: If sps_subpic_same_size_flag is equal to 0 or i is equal to 0, the value of sps_subpic_height_minus1[i] is inferred to be equal to tmpHeightVal−sps_subpic_ctu_top_left_y[i]−1; Otherwise, the value of sps_subpic_height_minus1[i] is inferred to be equal to sps_subpic_height_minus1[0]. When sps_subpic_same_size_flag is equal to 1, the value of tmpHeightVal % (sps_subpic_height_minus1[0]+1) shall be equal to 0. It is a requirement of bitstream conformance that the shapes of the subpictures shall be such that each subpicture, when decoded, shall have its entire left boundary and entire top boundary consisting of picture boundaries or consisting of boundaries of previously decoded subpictures. For each subpicture with subpicture index i in the range of 0 to sps_num_subpics_minus1, inclusive, it is a requirement of bitstream conformance that all of the following conditions are true: The value of (sps_subpic_ctu_top_left_x[i] *CtbSizeY) shall be less than (sps_pic_width_max_in_luma_samples−sps_conf_win_right_offset*SubWidthC); The value of ((sps_subpic_ctu_top_left_x[i]+sps_subpic_width_minus1[i]+1)*CtbSizeY) shall be greater than (sps_conf_win_left_offset*SubWidthC); The value of (sps_subpic_ctu_top_left_y[i] *CtbSizeY) shall be less than (sps_pic_height_max_in_luma_samples−sps_conf_win_bottom_offset*SubHeightC); and The value of ((sps_subpic_ctu_top_left_y[i]+sps_subpic_height_minus1[i]+1)*CtbSizeY) shall be greater than (sps_conf_win_top_offset*SubHeightC).
sps_subpic_treated_as_pic_flag[i] equal to 1 specifies that the i-th subpicture of each coded picture in the CLVS is treated as a picture in the decoding process excluding in-loop filtering operations. sps_subpic_treated_as_pic_flag[i] equal to 0 specifies that the i-th subpicture of each coded picture in the CLVS is not treated as a picture in the decoding process excluding in-loop filtering operations. When not present, the value of sps_subpic_treated_as_pic_flag[i] is inferred to be equal to 1.
sps_loop_filter_across_subpic_enabled_flag[i] equal to 1 specifies that in-loop filtering operations across subpicture boundaries are enabled and might be performed across the boundaries of the i-th subpicture in each coded picture in the CLVS. sps_loop_filter_across_subpic_enabled_flag[i] equal to 0 specifies that in-loop filtering operations across subpicture boundaries are disabled and are not performed across the boundaries of the i-th subpicture in each coded picture in the CLVS. When not present, the value of sps_loop_filter_across_subpic_enabled_flag[i] is inferred to be equal to 0.
sps_subpic_id_len_minus1 plus 1 specifies the number of bits used to represent the syntax element sps_subpic_id[i], the syntax elements pps_subpic_id[i], when present, and the syntax element sh_subpic_id, when present. The value of sps_subpic_id_len_minus1 shall be in the range of 0 to 15, inclusive. The value of 1<<(sps_subpic_id_len_minus1+1) shall be greater than or equal to sps_num_subpics_minus1+1.
sps_subpic_id_mapping_explicitly_signalled_flag equal to 1 specifies that the subpicture ID mapping is explicitly signalled, either in the SPS or in the PPSs referred to by coded pictures of the CLVS. sps_subpic_id_mapping_explicitly_signalled_flag equal to 0 specifies that the subpicture ID mapping is not explicitly signalled for the CLVS. When not present, the value of sps_subpic_id_mapping_explicitly_signalled_flag is inferred to be equal to 0.
sps_subpic_id_mapping_present_flag equal to 1 specifies that the subpicture ID mapping is signalled in the SPS when sps_subpic_id_mapping_explicitly_signalled_flag is equal to 1. sps_subpic_id_mapping_present_flag equal to 0 specifies that subpicture ID mapping is signalled in the PPSs referred to by coded pictures of the CLVS when sps_subpic_id_mapping_explicitly_signalled_flag is equal to 1.
sps_subpic_id[i] specifies the subpicture ID of the i-th subpicture. The length of the sps_subpic_id[i] syntax element is sps_subpic_id_len_minus1+1 bits.
15. Picture order count (POC)
Pictures in HEVC are identified by their picture order count (POC) values, also known as full POC values. Each slice contains a code word, pic_order_cnt_lsb, that shall be the same for all slices in a picture. pic_order_cnt_lsb is also known as the least significant bits (lsb) of the full POC, since it is a fixed-length code word and only the least significant bits of the full POC are signalled. Both the encoder and the decoder keep track of POC and assign POC values to each picture that is encoded/decoded. The pic_order_cnt_lsb can be signalled using 4 to 16 bits. There is a variable MaxPicOrderCntLsb used in HEVC which is set to the maximum pic_order_cnt_lsb value plus 1. This means that if 8 bits are used to signal pic_order_cnt_lsb, the maximum value is 255 and MaxPicOrderCntLsb is set to 2^8 = 256. The picture order count value of a picture is called PicOrderCntVal in HEVC. Usually, PicOrderCntVal for the current picture is simply referred to as the POC of the current picture.
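The relation between the number of pic_order_cnt_lsb bits and MaxPicOrderCntLsb described above can be written out directly (a small sketch; the function name is ours, not from the spec):

```python
def max_pic_order_cnt_lsb(lsb_bits):
    # pic_order_cnt_lsb is signalled using 4 to 16 bits; MaxPicOrderCntLsb
    # is the maximum pic_order_cnt_lsb value plus 1, i.e. 2**lsb_bits.
    assert 4 <= lsb_bits <= 16
    return 1 << lsb_bits
```

For 8 bits this gives a maximum pic_order_cnt_lsb value of 255 and a MaxPicOrderCntLsb of 256, matching the example in the text.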
Certain challenges presently exist. For instance, in the existing systems for controlling an overlay process (e.g., a film grain, denoising, or renoising process), the overlay process can only be applied per picture, per subpicture, or per slice. For example, in the existing systems, the film grain process can be applied per picture using a film grain characteristics SEI message or can be applied per subpicture using a scalable-nested SEI message containing a film grain characteristics SEI message. This is not adequate for applications that require different overlay handlings, such as noise handling, for areas of the picture that do not match the subpicture or slice partitions. Examples of such picture areas include: i) rectangular areas that do not coincide with the slice partitioning in the picture; ii) areas in the picture with boundaries that do not coincide with the CTU partitioning or the CU partitioning in the picture; and iii) nonrectangular regions such as other polygon shaped regions, free-form regions, or unions of different polygon shaped regions.
Furthermore, in some applications it is desired to apply or avoid applying an overlay process such as a film grain process to an area of the picture with a particular content or color value. Some examples for this scenario are: i) areas with specific statistical properties or color for the content such as grass, sky or flat texture surfaces; ii) areas that are different in their importance, sensitivity or interest such as a human face; and iii) mix of natural and computer graphics content such as an animated figure on a natural background, a score board during sports, subtitles or text over natural background, etc.
In all the above scenarios it is desired to be able to define and control an overlay process for an area of the picture which is not bound to follow the slice boundaries. This is not currently possible using existing solutions in AVC, HEVC or VVC.
Yet another problem with using a scalable-nested SEI message containing a film grain characteristics SEI message is that all subpictures to which the film grain process is applied must be explicitly defined and signalled. If the film grain process is going to be applied to all of the picture except one single subpicture A in the picture, the signalling needs to include all the subpictures except subpicture A, which may not be bit-efficient. Some other limitations regarding the scalable nesting SEI message solution in VVC version 1 are: i) the number of scalable-nested SEI messages in a scalable nesting SEI message is limited to a maximum of 64; and ii) more than one scalable nesting SEI message is required to specify more than one scalable-nested film grain SEI message applied to different subpicture sets in a picture. Additionally, when the scalable-nested SEI message is used to apply an SEI message to a subpicture area, it uses the subpicture ID to define the subpicture area, but the subpicture information describing the subpicture layout may not always be present in a post-processing step decoupled from the decoder. A post-processing step may for instance only take the decoded picture and information from SEI messages as input.
Accordingly, in one aspect there is provided a method for applying an overlay process to a picture in a bitstream. In one embodiment the method includes decoding a first set of one or more overlay process parameters from syntax elements in the bitstream, the first set of one or more overlay process parameters specifying a first overlay process. The method also includes decoding a first set of one or more picture partitioning parameters from syntax elements in the bitstream, the first set of one or more picture partitioning parameters specifying a first segment area of the picture, wherein a boundary of the first segment area of the picture is not fully aligned with a boundary of the picture or a boundary of any subpictures of the picture or a boundary of any slices in the picture. The method further includes decoding the picture, wherein decoding the picture comprises applying the first overlay process on the first segment area of the picture using the first set of one or more overlay process parameters.
In another aspect there is provided a method performed by an encoder. In one embodiment, the method includes obtaining a first set of one or more overlay process parameters, the first set of one or more overlay process parameters specifying a first overlay process. The method also includes obtaining a first set of one or more picture partitioning parameters, the first set of one or more picture partitioning parameters specifying a first segment area of a picture, wherein a boundary of the first segment area of the picture is not fully aligned with a boundary of the picture or a boundary of any subpictures of the picture or a boundary of any slices in the picture. The method further includes generating a bitstream, wherein the bitstream comprises a first set of one or more syntax elements encoding the first set of one or more overlay process parameters and a second set of one or more syntax elements encoding the first set of one or more picture partitioning parameters.
In another aspect there is provided a computer program comprising instructions which, when executed by processing circuitry of an apparatus, cause the apparatus to perform any of the methods disclosed herein. In one embodiment, there is provided a carrier containing the computer program wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium. In another aspect there is provided an apparatus that is configured to perform the methods disclosed herein. The apparatus may include memory and processing circuitry coupled to the memory.
An advantage of the embodiments disclosed herein is that they enable overlay processes to be defined and applied independent from the knowledge of the slice and subpicture partitioning in the picture. The segment area to which the overlay process is applied may, therefore, be defined in a more flexible way in contrast to the scalable nesting SEI in VVC where the finest granularity to define segment area is a subpicture granularity.
Another advantage of the embodiments is that the overlay process can be applied to a segment area in the picture without the need to access the NAL units that describe the picture partitioning. For instance, if the overlay process is defined in an SEI message, there is no need to access the SPS in VVC, where the subpicture partitioning structure is defined, or the PPS in VVC, where the slice partitioning structure is defined, since the SEI message itself indicates to which part of the picture the overlay process is to be applied. The embodiments also remove the need to consider subpicture partitioning in defining the overlay process area when sub-bitstreams are merged to create one bitstream, which simplifies the bitstream merging process.
Another advantage of the embodiments is that the content creator is given greater flexibility in defining and applying overlay processes, such as, for example, a film grain process, a denoising process, a renoising process, or other overlay processes, to the decoded picture or video. As a result, the overlay process will not be limited to subpicture or slice partitions and can be defined and applied to or stopped from being applied to a desired area of the picture that does not necessarily match the slice or subpicture partitioning. Examples of such areas include: 1) rectangular areas that do not coincide with the slice partitioning in the picture; 2) areas in the picture with boundaries that do not coincide with the CTU partitioning or the CU partitioning in the picture; 3) nonrectangular regions such as other polygon shaped regions or a union of different polygon shaped regions; and 4) an area of the picture with a particular content or color value, such as, for example: i) areas with specific statistical properties of the luminance or color samples such as grass, sky, water or flat texture surfaces, ii) areas that are different in their importance, sensitivity or interest such as a human face, and iii) a mix of captured content and computer graphics content such as an animated figure in a natural background, a score board during sports, text over natural background, etc.
Another advantage of the embodiments is that the overlay process can be defined to be applied to all of the picture except a defined area. If the overlay process is going to be applied to all of the picture except one single area defined as A, it can be more efficient to signal the area A rather than all the other parts of the picture, depending on the partitioning of the picture into segment areas.
Another benefit of the embodiments, compared to the existing solution of the scalable-nested SEI message, where an SEI message can be signalled for subpictures using the subpicture ID, is that there is no need for the parser of the SEI message to be aware of the subpictures and their subpicture IDs. This is useful when the overlay process parameters of the SEI message are used in a post-processing operation detached from the decoding of the video.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
FIG. 1A illustrates the relation between layer access units and coded layer video sequences.
Noise in video originates from different sources. This noise can be suppressed by the encoder at the earliest stage of the process. When the picture is reconstructed at the decoder before display, modelled or unmodelled noise can be added to the decoded picture. Several motivations have been put forward for adding noise to increase subjective quality, an effect that, as a result of increasing picture resolutions, has now become more apparent. The first reason to add noise might be to introduce artistic effects, e.g. while shooting documentaries, portraits, or black and white scenes, to capture reality, or to get the "real cinema effect" for movies. The second reason is to hide coding artifacts such as blurriness, blocking, and banding effects that appear due to the heavy encoding procedure in the encoder.
A film grain process is supported in VVC. This process is essentially identical to the film grain processes specified in the H.264/AVC and HEVC video coding standards. The process includes an SEI message that carries a parametrized model for film grain synthesis in the decoder. The film grain characteristics SEI message includes a cancel flag, film_grain_characteristics_cancel_flag, which enables the film grain process when it is set equal to 0. Also, when the flag is set to 0, film grain parameter syntax elements follow the flag. Finally, film_grain_characteristics_persistence_flag specifies the persistence of the film grain characteristics SEI message for the current layer. In Table 10 below, a simplified version of the syntax is shown.
In the Film Grain Technology specification introduced in [SMPTE], a seed derivation method is specified that derives a seed value to be used for the film grain characteristics SEI process. The seed is initialized using information that is already available at the decoder and is selected from a predetermined set of 256 possible seeds in a look-up table. For the pseudo-random number generator, and to select 8×8 blocks of samples, the seed is initialized as: seed = Seed_LUT[(pic_offset + color_offset[c]) % 256], in which color_offset[c] is equal to 0, 85, and 170 for the Y, Cb and Cr channels respectively and pic_offset is defined as: pic_offset = POC(curr_pic) + (POC_offset << 5), where POC(curr_pic) is equal to the picture order count value of the current frame, and POC_offset is set equal to the value of idr_pic_id on IDR frames and to 0 otherwise. Moreover, the pseudo-random number generator for creation of 64×64 sample blocks is initialized as follows: seed = Seed_LUT[h + v*13], where h and v represent a value for the horizontal and vertical directions respectively. Both h and v are in the range of [0, 12] and determine which pattern of the film grain database is used as the source of film grain samples. Finally, in either case, the output of Seed_LUT[.], which is the value of the variable seed above, is used as the seed for the pseudo-random number generator.
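The two seed derivations quoted above can be sketched as follows. Note that the look-up table here is a placeholder identity table; the real Seed_LUT values come from the [SMPTE] specification and are not reproduced here.

```python
# Placeholder 256-entry table; the real Seed_LUT is defined in [SMPTE].
SEED_LUT = list(range(256))

COLOR_OFFSET = {"Y": 0, "Cb": 85, "Cr": 170}

def seed_for_8x8_blocks(poc_curr_pic, poc_offset, channel):
    # pic_offset = POC(curr_pic) + (POC_offset << 5)
    pic_offset = poc_curr_pic + (poc_offset << 5)
    # seed = Seed_LUT[(pic_offset + color_offset[c]) % 256]
    return SEED_LUT[(pic_offset + COLOR_OFFSET[channel]) % 256]

def seed_for_64x64_blocks(h, v):
    # h and v are in [0, 12] and select a film grain database pattern.
    assert 0 <= h <= 12 and 0 <= v <= 12
    return SEED_LUT[h + v * 13]
```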
The AV1 video codec format supports film grain generation. The film grain is applied when a picture is output. The sequence_header_obu( ) contains a film_grain_params_present_flag that is an enable flag for the film grain signalling and process. The film grain parameters are signalled last in the frame_header_obu( ) in a syntax table called film_grain_params( ) which is shown in Table 11 below.
In film_grain_params( ), there is first a flag, apply_grain, that controls whether film grain shall be applied to the current picture or not. Then there is a 16-bit grain_seed syntax element that is used as a seed for a pseudo-random number generator that generates the grains. The update_grain flag specifies whether film grain parameter values from a reference picture should be used, or if the film grain parameter values to use shall be decoded from the frame header. The reference picture to use is identified by the film_grain_params_ref_idx syntax element value. In Table 11, the frame header film grain parameters are represented by the more_film_grain_parameters( ) row to simplify the table. The value of grain_seed initializes the seed for the pseudo-random number generator used for the luma component of the white noise grain. For the chroma components Cb and Cr, the value is modified via an XOR operation as follows: Cb_seed = grain_seed ^ 0xb524 and Cr_seed = grain_seed ^ 0x49d8.
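The chroma seed derivation above is a plain XOR of the 16-bit grain_seed with fixed constants, e.g. (the helper name is ours):

```python
def av1_chroma_seeds(grain_seed):
    # Per the formulas above: Cb and Cr seeds are grain_seed XORed
    # with the constants 0xb524 and 0x49d8 respectively.
    cb_seed = grain_seed ^ 0xB524
    cr_seed = grain_seed ^ 0x49D8
    return cb_seed, cr_seed
```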
The scalable nesting SEI message in VVC provides a mechanism to associate SEI messages with specific OLSs, specific layers, or specific sets of subpictures. A scalable nesting SEI message contains one or more SEI messages. The SEI messages contained in the scalable nesting SEI message are also referred to as the scalable-nested SEI messages. The scalable nesting SEI message syntax in VVC is shown in Table 12.
A similar scalable nesting SEI message also exists in HEVC. The HEVC scalable nesting SEI message provides a mechanism to associate SEI messages with bitstream subsets corresponding to various operation points or with specific layers or sublayers. The subpicture concept does not exist in HEVC, so the nesting SEI message in HEVC cannot be used to associate SEI messages with specific sets of subpictures, in contrast to the VVC nesting SEI message. The scalable nesting SEI message syntax in HEVC is shown in Table 13.
In the description below, various embodiments are described that solve one or more of the above described problems. It is to be understood by a person skilled in the art that two or more embodiments, or parts of embodiments, may be combined to form new embodiments which are still covered by this disclosure.
The following terminology is used herein:
Segment area: A segment area is a part of a picture. A picture may be partitioned into multiple segment areas. Not all boundaries of a segment area are aligned with the picture, subpicture, or slice boundaries. A segment area may for instance be a tile, a CTU, a CU, a PU, or part of a picture, part of a subpicture, part of a slice, part of a tile, part of a CTU, part of a CU or part of a PU.
Sub-segment area: A sub-segment area is a part of a segment area. A segment area may be divided into multiple sub-segment areas. A sub-segment area may for instance be a subpicture, a slice, a tile, a CTU, a CU, a PU or part of a picture, part of a subpicture, part of a slice, part of a tile, part of a CTU, part of a CU or part of a PU.
Overlay process: An overlay process is a process which takes a pixel value and overlay process parameters as inputs and generates or outputs a final pixel value based on the inputs. One example of an overlay process is when the final pixel value is obtained by adding an overlay value to the initial pixel value, where the overlay value is obtained from the overlay process parameters. The overlay process may be applied on sample values or decoded pixel values in the decoding process or in a post-processing step, either after the segment area has been decoded but before the decoded picture is output from the decoder, or after the decoded picture is output but before the decoded picture is displayed. An example of the overlay process is a film grain process. Typically, the film grain pattern has been estimated from the original pixel values in the segment area before encoding, but it could also be estimated in other ways. Another example of an overlay process is a renoising process. A renoising process may add noise to decoded pixel values in a segment area where the original noise pattern is unknown, in order to mask coding artifacts. An overlay process need not be a pure function of the initial pixel value but may take both the initial pixel value and the overlay process parameters as input and output the final pixel value. The overlay process is denoted as O( ) in the following illustration: final_pixel_value=O(initial_pixel_value, overlay_process_parameters).
In one example, the overlay process is defined as a film grain process and the final pixel value is determined as the sum of the initial pixel value and a film grain value, where the film grain value is the output of F( ) that takes the film grain parameters as input: final_pixel_value=O(initial_pixel_value, film_grain_parameters)=initial_pixel_value+F(film_grain_parameters).
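As a minimal sketch of this film grain example, where F( ) is modeled as a hypothetical stand-in that simply reads a precomputed grain value from the parameters:

```python
def film_grain_overlay(initial_pixel_value, film_grain_parameters):
    # Hypothetical F(): in a real film grain process this would generate
    # a grain sample from the parametrized model and a pseudo-random seed.
    def F(params):
        return params["grain_value"]
    # final pixel value = initial pixel value + film grain value
    return initial_pixel_value + F(film_grain_parameters)
```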
Overlay process parameters: An overlay process parameter is a parameter related to an overlay process, such as type of the model or overlay filter used in the overlay process, strength of the model or filter, seed to pseudo-randomize patterns or other values for generating pattern for the overlay process such as the seed for film grain pattern in the film grain process. Parameters related to one overlay process may be referred to as a set of one or more overlay parameters.
Overlay value: An overlay value is a value obtained from the overlay process parameters. This value may be used for creating the final pixel value as the output of the overlay process.
Overlay process area: An overlay process area is a segment area of a picture to which an overlay process is applied.
In this embodiment, a segment area and an overlay process are specified where the boundaries of the segment area are not fully aligned with the boundaries of the picture or boundaries of existing subpictures or slices in the picture. That is, a segment area may have a shape and/or position not possible to produce with a subpicture partitioning and slice partitioning scheme. The picture is then decoded where decoding the picture comprises applying the overlay process only on the segment area of the picture. The overlay process may be applied using a set of one or more overlay process parameters signalled in the bitstream, e.g. in an SEI message, a parameter set such as APS, PPS, SPS or VPS, in a picture header or in a slice header.
When the boundaries of the segment area are not fully aligned with the boundaries of the picture or boundaries of existing subpictures or slices in the picture, there exists at least a part of the boundaries of the segment area which is not part of the picture boundaries and is not part of the boundaries of existing subpictures or slices in the picture. In other words, if one drew all boundaries of a picture, all boundaries of slices in the picture, and all boundaries of subpictures in the picture, one would see a difference (e.g., a new boundary line segment that does not coincide with any of the other boundary line segments) when the boundaries of the segment area are drawn.
A segment area of a picture may be specified by means of geometrical partitioning (i.e., the geometry or shape of the segment area is specified (explicitly or implicitly)). In one example, the geometrical partitioning is specified explicitly for the segment area that the overlay process is applied to. In another example, the geometrical partitioning of the segment area may be specified indirectly, for example, by specifying a segment area and specifying that the segment area that the overlay process is applied to is derived as the area of the picture which is not part of the segment area. The segment area may be a rectangular area. The segment area SA may have other shapes or be the union of two or more connected or unconnected areas in the picture.
In one embodiment, an indicator value is used to determine whether the overlay process using a set of one or more overlay process parameters is applied on a segment area of the picture or not. The indicator value may for instance be a flag or a value in a certain range such as an index. The indicator value may be signalled in a syntax element in the bitstream, e.g. in a CU, CTU, slice header, picture header or parameter set such as APS, PPS, SPS or VPS or from an SEI message. If the indicator value has a certain value, e.g. 1, then the overlay process is applied on the segment area using the set of one or more overlay process parameters. If the indicator value has another value, e.g. 0, then the overlay process is not applied on the segment area using the set of one or more overlay process parameters.
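The indicator logic can be sketched as below, assuming for illustration a simple additive overlay and a hypothetical parameter layout:

```python
def maybe_apply_overlay(segment_pixels, indicator_value, overlay_params):
    # Indicator value 1: apply the overlay process to the segment area.
    if indicator_value == 1:
        return [p + overlay_params["overlay_value"] for p in segment_pixels]
    # Indicator value 0: the segment area is left unchanged.
    return list(segment_pixels)
```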
In one embodiment, there is one indicator value for each segment area of a picture. In another embodiment, an indicator value is used to determine whether the overlay process using a set of one or more overlay process parameters is applied to one or more segment areas of a picture.
In another embodiment, the indicator value may also be used to determine whether the overlay process parameters are to be decoded from the bitstream or not.
In one embodiment, a decoder may perform all or a subset of the following steps for decoding a picture from a bitstream and applying an overlay:
In another embodiment the decoder may perform all or a subset of the following steps for decoding a picture from a bitstream and applying an overlay:
In another embodiment the decoder may perform all or a subset of the following steps:
In some embodiments, the first codeword is the same as the second codeword.
An encoder may perform all or a subset of the following steps:
In this embodiment syntax elements for the partitioning of the picture into segment areas and the overlay process parameters are signalled in one non-VCL NAL unit such as an SEI message. That is, picture partitioning parameters and overlay process parameters are contained in one non-VCL NAL unit.
In one example of this embodiment the picture partitioning parameters that are included in the NAL unit only explicitly specify the overlay process area(s) (i.e., the area(s) of the picture that no overlay process is applied to is not explicitly signalled), and the NAL unit also includes the overlay process parameters.
An example of syntax and semantics for this embodiment is shown below:
In one embodiment, there may be only one overlay process and one overlay process area in one non-VCL NAL unit applied to one picture.
A decoder may perform all or a subset of the following steps according to this embodiment:
An encoder may perform all or a subset of the following steps according to this embodiment:
In another example of this embodiment, a NAL unit for one picture may contain i) overlay process parameters that specify more than one overlay process and ii) picture partitioning parameters that specify more than one overlay process area.
In this embodiment syntax elements for the partitioning of the picture into segment areas and the overlay process are signalled in a header (such as a picture header (PH) or a slice header (SH)) or in a parameter set (such as a sequence parameter set (SPS), a picture parameter set (PPS), or an adaptive parameter set (APS)).
A decoder may perform all or a subset of the following steps according to this embodiment:
An encoder may perform all or a subset of the following steps according to this embodiment:
In a variant of this embodiment the first header or parameter set is the same as the second header or parameter set.
In this embodiment the overlay process area is specified in a header or parameter set (such as a picture header (PH), a slice header (SH), a sequence parameter set (SPS), a picture parameter set (PPS), or an adaptive parameter set (APS)) and the overlay process parameters are decoded from a separate NAL unit such as an SEI message. Partitioning of the picture into segment areas may be signalled in the same NAL unit where the overlay process area is signalled or in a separate NAL unit.
Exemplary decoder steps according to this embodiment are given here. A decoder may perform all or a subset of the following steps according to this embodiment:
In a variant of this embodiment, the first NAL unit and the third NAL unit are the same NAL unit.
In another variant of this embodiment, the second NAL unit and the third NAL unit are the same NAL unit.
An encoder may perform all or a subset of the following steps according to this embodiment:
In a variant of this embodiment, the first NAL unit and the third NAL unit are the same NAL unit.
In another variant of this embodiment, the second NAL unit and the third NAL unit are the same NAL unit.
In this embodiment two or more overlay processes are defined and applied to overlay process area(s) in the picture. Multiple overlay processes may be defined in one NAL unit, such as one SEI message, or in more than one NAL unit, such as multiple SEI messages.
A decoder may perform all or a subset of the following steps according to this embodiment:
In one variant of this embodiment the first segment area is the same as the second segment area.
In another variant of this embodiment, a first set of one or more overlay process parameters P1 and a first set of one or more picture partitioning parameters specifying a first overlay process area A1 are decoded from a first NAL unit, and a second set of one or more overlay process parameters P2 and a second set of one or more picture partitioning parameters specifying a second overlay process area A2 are decoded from a second NAL unit. A first picture is then decoded using P1, A1, P2 and A2.
In another variant of this embodiment, a first set of one or more overlay process parameters P1 is decoded from a first NAL unit and a first set of one or more picture partitioning parameters specifying a first overlay process area A1 is decoded from a second NAL unit, and a second set of one or more overlay process parameters P2 is decoded from a third NAL unit and a second set of one or more picture partitioning parameters specifying a second overlay process area A2 is decoded from a fourth NAL unit. A first picture is then decoded using P1, A1, P2 and A2. In another variant of this embodiment a first and a third NAL unit are the same NAL unit and the second and the fourth NAL unit are the same NAL unit.
An encoder may perform all or a subset of the following steps according to this embodiment:
In this embodiment, the set(s) of overlay parameters are signalled in a parameter set such as an APS, SPS or PPS, and the picture partitioning parameters, which specify one or more segment areas, are signalled together with information indicating, for each specified segment area, which overlay parameters are applied to that segment area. For example, multiple sets of overlay parameters, such as multiple sets of renoising filter models with different model IDs, may be signalled. In one embodiment, an ID is assigned to each set of one or more overlay process parameters signalled in a parameter set, and the set of one or more picture partitioning parameters is signalled in an SEI message together with, for each segment area, the corresponding ID of the set of one or more overlay process parameters, to specify which of the overlay processes is applied to that segment area.
A decoder may perform all or a subset of the following steps according to this embodiment:
An encoder may perform all or a subset of the following steps according to this embodiment:
This embodiment is similar to Embodiment 1 but focuses on the shape of the segment area(s). In this embodiment the segment areas may have the shape of a rectangular area, a polygon area, a free-form area, or other geometrically shaped segment areas that the picture can be partitioned into. A rectangular segment area may be specified by means of the position of its top-left corner and its horizontal and vertical size, or by the position of its two diagonal corners and the knowledge of its orientation, or by other means.
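The two rectangle specifications mentioned above (top-left corner plus size, or two diagonal corners) are equivalent, as the following sketch with hypothetical helper names illustrates:

```python
def rect_from_corner_and_size(x0, y0, width, height):
    # Top-left corner plus horizontal and vertical size;
    # returned as (left, top, right, bottom) with exclusive right/bottom.
    return (x0, y0, x0 + width, y0 + height)

def rect_from_diagonal_corners(xa, ya, xb, yb):
    # Two diagonal corners; the orientation is recovered with min/max.
    return (min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb))
```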
A segment area may be specified directly, or indirectly by removing specified areas from the picture and specifying the remaining area of the picture as the segment area.
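The indirect specification can be sketched as follows, assuming for illustration that the removed areas are signalled as rectangles and that sample positions are enumerated as (x, y) pairs:

```python
def complement_area(width, height, removed_rects):
    """Hypothetical sketch: the segment area is the set of samples left
    after removing the explicitly specified rectangles from the picture."""
    segment = {(x, y) for y in range(height) for x in range(width)}
    for rx, ry, rw, rh in removed_rects:
        segment -= {(x, y) for y in range(ry, ry + rh)
                           for x in range(rx, rx + rw)}
    return segment

seg = complement_area(2, 2, [(0, 0, 1, 1)])  # remove top-left sample
```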
In a variant of this embodiment, the shape of a segment area is specified using a mask. A mask may partition a picture into two segment areas, one specified by the black part of the mask and one specified by the white part of the mask.
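The mask variant can be sketched as below, assuming a binary mask where value 0 represents the black part and value 1 the white part:

```python
def partition_by_mask(mask):
    """Hypothetical sketch: a binary mask partitions the picture into two
    segment areas, one per mask value."""
    black = {(x, y) for y, row in enumerate(mask)
                    for x, v in enumerate(row) if v == 0}
    white = {(x, y) for y, row in enumerate(mask)
                    for x, v in enumerate(row) if v == 1}
    return black, white

mask = [[0, 1],
        [1, 1]]
black, white = partition_by_mask(mask)
```

A different overlay process (or none) would then be applied to each of the two returned areas.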
This embodiment is similar to Embodiment 1 with the addition that at least one of the segment areas is the union of at least two sub-segment areas (e.g., a first sub-segment area and a second sub-segment area). An example is shown in
This embodiment is similar to Embodiment 1 with the addition that the segment areas may be specified using position and/or size information, or using a list, an array, or pointers to a list of predefined masks or area forms. The segment areas may also be specified using straight lines that partition the picture into segment areas.
In this embodiment, more than one overlay process is specified (e.g., P1 and P2), the overlay process areas OPA1 for P1 and OPA2 for P2 overlap partially or completely, and the overlap is specified as segment area OPA1∩OPA2. One of the following may then be applied to the segment area OPA1∩OPA2 according to a rule such as:
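Computing the overlap area and resolving which process applies there can be sketched as follows. The priority rule shown ("the later-signalled process wins") is only one illustrative possibility:

```python
def rect_intersection(a, b):
    """Return the intersection of two (x, y, w, h) rectangles, or None
    if they do not overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x0, y0 = max(ax, bx), max(ay, by)
    x1, y1 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)

def process_for_overlap(p1, p2, rule="prefer_p2"):
    # One possible rule: in OPA1 ∩ OPA2 the later-signalled process wins.
    return p2 if rule == "prefer_p2" else p1

opa1 = (0, 0, 4, 4)
opa2 = (2, 2, 4, 4)
overlap = rect_intersection(opa1, opa2)  # the segment area OPA1 ∩ OPA2
```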
This embodiment is similar to Embodiment 1 with the addition that the overlay process areas may be specified using the color values in the picture. Examples of areas with specific color values and distributions include sky, grass, and human skin. In one example of this embodiment, the overlay process area is specified as the areas in the picture that have a certain color value distribution, and an overlay process such as a renoising process is applied to those particular overlay process areas of the picture.
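Color-based area selection can be sketched as below. For simplicity a single sample value per position and an inclusive value range are assumed; the thresholds are purely illustrative:

```python
def area_from_color_range(picture, lo, hi):
    """Hypothetical sketch: the overlay process area is the set of samples
    whose value falls within the signalled range [lo, hi]."""
    return {(x, y) for y, row in enumerate(picture)
                   for x, v in enumerate(row) if lo <= v <= hi}

picture = [[200, 210],
           [ 50, 220]]
sky_area = area_from_color_range(picture, 200, 255)  # e.g. "sky-like" values
```

A renoising process would then be applied only to the samples in the returned area.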
This embodiment is similar to Embodiment 1 with the addition of a syntax element, denoted f1, that specifies whether an overlay process is applied to a specified segment area (f1 equal to a first value) or to the entire picture except the specified segment area (f1 equal to a second value). One example of this embodiment is applying a film grain process everywhere in the picture except the segment area corresponding to a human face, or except the area with a particular color value or within a range of color values.
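The effect of f1 can be sketched as follows, assuming for illustration that f1 == 1 selects the segment area itself and f1 == 0 selects its complement:

```python
def samples_to_overlay(width, height, segment, f1):
    """Hypothetical sketch of syntax element f1: apply the overlay inside
    the segment area (f1 == 1) or everywhere except it (f1 == 0)."""
    all_samples = {(x, y) for y in range(height) for x in range(width)}
    return segment if f1 == 1 else all_samples - segment

face = {(0, 0)}                                   # e.g. a detected face area
grain_area = samples_to_overlay(2, 2, face, f1=0)  # film grain everywhere else
```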
In this embodiment, it is specified that the overlay process is applied to one or a subset of the temporal sublayers in a layer. In one example of this embodiment, the temporal ID value(s) of the temporal sublayer(s) to which the overlay process applies are specified in the same NAL unit in which the overlay process is specified. In another example of this embodiment, the temporal ID value(s) of the temporal sublayer(s) to which the overlay process applies are specified in a second NAL unit, different from a first NAL unit in which the overlay process is specified.
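The sublayer scoping above reduces to a membership test at decode time; a minimal sketch, with illustrative names only:

```python
def apply_overlay_to_picture(picture_tid, signalled_tids):
    """Hypothetical sketch: the overlay process is applied to a picture
    only if its temporal ID is among the signalled temporal ID values."""
    return picture_tid in signalled_tids

signalled = {0, 1}   # e.g. decoded from the same NAL unit as the overlay,
                     # or from a second NAL unit, per the two examples above
```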
A1. A method (700) for applying an overlay process to a picture in a bitstream, the method comprising decoding a first set of one or more overlay process parameters from syntax elements in the bitstream, the first set of one or more overlay process parameters specifying a first overlay process; decoding a first set of one or more picture partitioning parameters from syntax elements in the bitstream, the first set of one or more picture partitioning parameters specifying a first segment area of the picture, wherein a boundary of the first segment area of the picture is not fully aligned with a boundary of the picture or a boundary of any subpictures of the picture or a boundary of any slices in the picture; and decoding the picture, wherein decoding the picture comprises applying the first overlay process on the first segment area of the picture using the first set of one or more overlay process parameters.
A2. The method of embodiment A1, further comprising: decoding a first indicator value from a first syntax element in the bitstream, wherein applying the first overlay process on the first segment area of the picture using the first set of one or more overlay process parameters is done in response to the first indicator value being equal to a first value.
A3. The method of embodiment A2, further comprising: decoding a second set of one or more overlay process parameters from syntax elements in the bitstream, the second set of one or more overlay process parameters specifying a second overlay process; decoding a second set of one or more picture partitioning parameters from syntax elements in the bitstream, the second set of one or more picture partitioning parameters specifying a second segment area of the picture; decoding a second indicator value from a second syntax element in the bitstream; and either: in response to the second indicator value being equal to a first value, applying the second overlay process on the second segment area of the picture using the second set of one or more overlay process parameters, or in response to the second indicator value being equal to a second value, not applying the second overlay process on the second segment area of the picture.
A4. The method of embodiment A2, further comprising: decoding a second indicator value from a second syntax element in the bitstream; decoding from the bitstream additional picture partitioning parameters specifying a second segment area of the picture; and in response to the second indicator value being equal to a second value, not applying the first overlay process to the second segment area.
A5. The method of any one of the previous embodiments, wherein the first set of one or more picture partitioning parameters implicitly specifies the first segment area of the picture by explicitly specifying a complementary segment area of the picture, wherein the first segment area of the picture is specified as the area of the picture that is not part of the explicitly specified complementary segment area.
A6. The method of any one of the previous embodiments, wherein the first set of one or more picture partitioning parameters further implicitly specifies a second segment area, wherein the second segment area is the area of the picture that is not part of the first segment area.
A7. The method of any one of the previous embodiments, wherein the first set of one or more picture partitioning parameters and the first set of one or more overlay process parameters are signalled in the same Network Abstraction Layer (NAL) unit.
A8. The method of any one of the previous embodiments, wherein the first set of one or more overlay process parameters are decoded from syntax elements in the bitstream in response to the first indicator value being equal to a certain value; or the first set of one or more overlay process parameters are decoded from syntax elements in the bitstream in response to a third indicator value decoded from the bitstream being equal to a certain value.
A9. The method of any one of the previous embodiments, wherein the first set of one or more picture partitioning parameters specifies the first segment area of the picture by specifying at least one of: a color value, a luminance value, or a local similarity index value.
A10. The method of embodiment A2 or any embodiment that depends from embodiment A2, wherein the first syntax element is a flag.
A11. The method of any one of the previous embodiments, wherein the first segment area of the picture comprises at least one of: i) a part of a slice but not all of the slice, ii) a part of a tile but not all of the tile, iii) a part of a CTU but not all of the CTU, or iv) a part of a CU but not all of the CU.
A12. The method of any one of the previous embodiments, wherein the first segment area of the picture is a non-rectangular area.
A13. The method of any one of the previous embodiments, wherein the first segment area of the picture comprises at least a first sub-segment area and a second sub-segment area.
A13a. The method of embodiment A13, wherein the first and second sub-segment areas do not overlap.
A13b. The method of embodiment A13, wherein the first and second sub-segment areas are unconnected.
A14. The method of any one of the previous embodiments, wherein the overlay process is a film grain process.
A15. The method of any one of the previous embodiments, wherein the overlay process is a renoising process, a denoising process, or a post filtering process.
A16. The method of any one of the previous embodiments, wherein the first set of one or more overlay process parameters comprises at least one of: an overlay process model type parameter, an overlay process strength parameter, or an overlay process seed parameter.
A17. The method of any one of the previous embodiments, wherein the first set of one or more overlay process parameters is signalled in an SEI message, a parameter set (e.g., APS, PPS, SPS or VPS), a picture header, or a slice header.
A18. The method of any one of the previous embodiments, wherein an overlay process is applied to one or a subset of layers or to one or a subset of temporal sublayers in one layer.
A19. The method of embodiment A18, wherein applying an overlay process to one or a subset of temporal sublayers in one layer comprises applying an overlay process to one or a subset of temporal sublayers in one layer using a subset of the temporal sublayer IDs belonging to the one or a subset of temporal sublayers.
A20. The method of embodiment A2 or any embodiment that depends from embodiment A2, wherein the first indicator value is decoded from: a slice header, a picture header, a parameter set (e.g., an APS, PPS, SPS or VPS), or an SEI message.
A21. The method of any one of the previous embodiments, wherein at least one of the boundaries of the first segment area of the picture is not a slice boundary.
B1. A method (800) performed by an encoder, the method comprising obtaining a first set of one or more overlay process parameters, the first set of one or more overlay process parameters specifying a first overlay process; obtaining a first set of one or more picture partitioning parameters, the first set of one or more picture partitioning parameters specifying a first segment area of a picture, wherein a boundary of the first segment area of the picture is not fully aligned with a boundary of the picture or a boundary of any subpictures of the picture or a boundary of any slices in the picture; and generating a bitstream, wherein the bitstream comprises: a first set of one or more syntax elements encoding the first set of one or more overlay process parameters, and a second set of one or more syntax elements encoding the first set of one or more picture partitioning parameters.
B2. The method of embodiment B1, wherein the bitstream further comprises a first indicator syntax element encoding a first indicator, wherein the value of the first indicator indicates whether or not the first overlay process should be applied to the first segment area.
B3. The method of embodiment B2, further comprising: obtaining a second set of one or more overlay process parameters, the second set of one or more overlay process parameters specifying a second overlay process; and obtaining a second set of one or more picture partitioning parameters, the second set of one or more picture partitioning parameters specifying a second segment area of a picture, wherein the bitstream further comprises: a third set of one or more syntax elements encoding the second set of one or more overlay process parameters, a fourth set of one or more syntax elements encoding the second set of one or more picture partitioning parameters, and a second indicator syntax element encoding a second indicator, wherein the value of the second indicator indicates whether or not the second overlay process should be applied to the second segment area.
B3b. The method of embodiment B2, further comprising: encoding a second indicator value in the bitstream, wherein when the second indicator value is equal to a second value, the second indicator value indicates to a decoder that the decoder should decode from the bitstream picture partitioning parameters specifying a second segment area of the picture and should not apply the first overlay process to the second segment area.
B4. The method of any one of embodiments B1-B3b, wherein the first set of one or more picture partitioning parameters implicitly specifies the first segment area of the picture by explicitly specifying a complementary segment area of the picture, wherein the first segment area of the picture is specified as the area of the picture that is not part of the explicitly specified complementary segment area.
B5. The method of any one of embodiments B1-B4, wherein at least one of the boundaries of the first segment area of the picture is not a slice boundary.
B6. The method of any one of embodiments B1-B5, wherein the first set of one or more picture partitioning parameters further implicitly specifies a second segment area, wherein the second segment area is the area of the picture that is not part of the first segment area.
B7. The method of any one of embodiments B1-B6, wherein the first set of one or more picture partitioning parameters and the first set of one or more overlay process parameters are signalled in the same Network Abstraction Layer (NAL) unit.
B8. The method of any one of embodiments B1-B7, wherein the first set of one or more picture partitioning parameters specifies the first segment area of the picture by specifying at least one of a color value, a luminance value, or a local similarity index value.
B9. The method of embodiment B2 or any embodiment that depends from embodiment B2, wherein the first indicator syntax element is a flag.
B10. The method of any one of embodiments B1-B9, wherein the first segment area of the picture comprises at least one of: i) a part of a slice but not all of the slice, ii) a part of a tile but not all of the tile, iii) a part of a CTU but not all of the CTU, or iv) a part of a CU but not all of the CU.
B11. The method of any one of embodiments B1-B10, wherein the first segment area of the picture is a non-rectangular area.
B12. The method of any one of embodiments B1-B11, wherein the first segment area of the picture comprises at least a first sub-segment area and a second sub-segment area.
B13. The method of embodiment B12, wherein the first and second sub-segment areas do not overlap.
B13b. The method of embodiment B12, wherein the first and second sub-segment areas are unconnected.
B14. The method of any one of embodiments B1-B13b, wherein the overlay process is a film grain process.
B15. The method of any one of embodiments B1-B14, wherein the overlay process is a renoising process, a denoising process, or a post filtering process.
B16. The method of any one of embodiments B1-B15, wherein the first set of one or more overlay process parameters comprises at least one of: an overlay process model type parameter, an overlay process strength parameter, or an overlay process seed parameter.
B17. The method of any one of embodiments B1-B16, wherein the first set of one or more overlay process parameters is signalled in an SEI message, a parameter set (e.g., APS, PPS, SPS or VPS), a picture header, or a slice header.
B18. The method of embodiment B2 or any embodiment that depends from embodiment B2, wherein the first indicator syntax element is comprised in: a slice header, a picture header, a parameter set (e.g., an APS, PPS, SPS or VPS), or an SEI message.
C1. A computer program (943) comprising instructions (944) which when executed by processing circuitry (902) causes the processing circuitry (902) to perform the method of any one of the above embodiments.
C2. A carrier containing the computer program of embodiment C1, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium (942).
D1. An apparatus (900), the apparatus being adapted to perform the method of any one of embodiments A1-A21 or B1-B18.
E1. An apparatus (900), the apparatus comprising: memory (942); and processing circuitry (902), wherein the apparatus is configured to perform the method of any one of embodiments A1-A21 or B1-B18.
While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/066740 | 6/20/2022 | WO |
Number | Date | Country |
---|---|---|
63216279 | Jun 2021 | US |