At least one of the present embodiments generally relates to a method or an apparatus for video encoding or decoding, and more particularly, to a method or an apparatus comprising applying a spatial local illumination compensation.
To achieve high compression efficiency, image and video coding schemes usually employ prediction, including motion vector prediction, and transform to leverage spatial and temporal redundancy in the video content. Generally, intra or inter prediction is used to exploit the intra or inter frame correlation, then the differences between the original image and the predicted image, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded. To reconstruct the video, the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction. Recent additions to video compression technology include various industry standards, versions of the reference software and/or documentations such as Joint Exploration Model (JEM) and later VTM (Versatile Video Coding (VVC) Test Model) being developed by the JVET (Joint Video Exploration Team) group. The aim is to make further improvements to the existing HEVC (High Efficiency Video Coding) standard.
Existing methods for coding and decoding show some limitations in compensating illumination discrepancy between different regions/blocks in the same slice/picture. The issue is particularly salient for content comprising some sample values with gradually propagating spatial illumination variation in inter/intra/IBC prediction. Therefore, there is a need to improve the state of the art.
The drawbacks and disadvantages of the prior art are solved and addressed by the general aspects described herein.
According to a first aspect, there is provided a method. The method comprises video decoding by determining, for a current block being decoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one spatial reference block; decoding the current block using local illumination compensation based on the determined parameters. Advantageously, the at least one spatial reference block is a spatially neighboring block of the current block in the picture.
According to another aspect, there is provided a second method. The method comprises video encoding by determining, for a current block being encoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples of the current block and corresponding spatially neighboring reconstructed samples of at least one spatial reference block; encoding the current block using local illumination compensation based on the determined parameters. Advantageously, the at least one spatial reference block is a spatially neighboring block of the current block in the picture.
According to another aspect, there is provided an apparatus. The apparatus comprises one or more processors, wherein the one or more processors are configured to implement the method for video decoding according to any of its variants. According to another aspect, the apparatus for video decoding comprises means for determining, for a current block being decoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one spatial reference block; means for decoding the current block using local illumination compensation based on the determined parameters. Advantageously, the at least one spatial reference block is a spatially neighboring block of the current block in the picture.
According to another aspect, there is provided another apparatus. The apparatus comprises one or more processors, wherein the one or more processors are configured to implement the method for video encoding according to any of its variants. According to another aspect, the apparatus for video encoding comprises means for determining, for a current block being encoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one spatial reference block; means for encoding the current block using local illumination compensation based on the determined parameters. Advantageously, the at least one spatial reference block is a spatially neighboring block of the current block in the picture.
According to another general aspect of at least one embodiment, a syntax element is determined that indicates whether the spatial local illumination compensation applies on the current block or not.
According to another general aspect of at least one embodiment, the current block is coded using any of inter prediction, intra prediction, or IBC prediction.
According to another general aspect of at least one embodiment, the at least one spatial reference block is any of above neighboring block and left neighboring block.
According to another general aspect of at least one embodiment, the at least one spatial reference block is any of above neighboring block (B0), left neighboring block (A0), above-right neighboring block (B1), bottom-left neighboring block (A1) and above-left neighboring block (B2).
According to another general aspect of at least one embodiment, a syntax element is determined that indicates which spatial reference block is used in determining the parameters of the local illumination compensation.
According to another general aspect of at least one embodiment, the at least one spatial reference block is a neighboring block selected as motion vector predictor MVP candidate in Inter prediction.
According to another general aspect of at least one embodiment, the at least one spatial reference block is responsive to an intra prediction mode used to code the current block.
According to another general aspect of at least one embodiment, the at least one spatial reference block comprises the neighboring block selected as intra block copy reference block.
According to another general aspect of at least one embodiment, the neighboring reconstructed samples are located in the left and above boundaries of the current block and at least one spatial reference block.
According to another general aspect of at least one embodiment, the neighboring reconstructed samples are located in the multi left and above reference lines of the current block and at least one spatial reference block. According to another general aspect of at least one embodiment, the neighboring reconstructed samples are located in the whole reconstructed blocks of the current block and at least one spatial reference block. According to another general aspect of at least one embodiment, the at least one spatial reference block comprises a first spatial reference block and a second spatial reference block and wherein the spatially neighboring reconstructed samples of the first spatial reference block and the spatially neighboring reconstructed samples of the second spatial reference block are averaged to determine the parameters of the local illumination compensation.
According to another aspect, there is provided a third method. The method comprises video decoding by determining, for a current block being decoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one reference block; decoding the current block using local illumination compensation based on the determined parameters; wherein the neighboring reconstructed samples are located in the multi left and above reference lines of the current block and at least one reference block. According to another general aspect of at least one embodiment, the neighboring reconstructed samples are located in the whole reconstructed blocks of the current block and at least one spatial reference block.
According to another aspect, there is provided a fourth method. The method comprises video encoding by determining, for a current block being encoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one reference block; encoding the current block using local illumination compensation based on the determined parameters; wherein the neighboring reconstructed samples are located in the multi left and above reference lines of the current block and at least one reference block. According to another general aspect of at least one embodiment, the neighboring reconstructed samples are located in the whole reconstructed blocks of the current block and at least one spatial reference block.
According to another general aspect of at least one embodiment, there is provided a device comprising an apparatus according to any of the decoding embodiments; and at least one of (i) an antenna configured to receive a signal, the signal including the video block, (ii) a band limiter configured to limit the received signal to a band of frequencies that includes the video block, or (iii) a display configured to display an output representative of the video block.
According to another general aspect of at least one embodiment, there is provided a non-transitory computer readable medium containing data content generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a signal comprising video data generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, a bitstream is formatted to include data content generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described encoding/decoding embodiments or variants.
These and other aspects, features and advantages of the general aspects will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
In the drawings, examples of several embodiments are illustrated.
It is to be understood that the figures and descriptions have been simplified to illustrate elements that are relevant for a clear understanding of the present principles, while eliminating, for purposes of clarity, many other elements found in typical encoding and/or decoding devices. It will be understood that, although the terms first and second may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
The various embodiments are described with respect to the encoding/decoding of an image. They may be applied to encode/decode a part of image, such as a slice or a tile, a tile group or a whole sequence of images.
Various methods are described above, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined.
At least some embodiments relate to a method for encoding or decoding a video wherein a spatial LIC compensates for gradual illumination variation within a same picture.
where P(x,y) is the prediction signal of the current block at the coordinate (x,y); Pr(x+vx,y+vy) is the reference block pointed by the motion vector (vx,vy); α and β are the corresponding scaling factor and offset that are applied to the reference block.
As shown in
where N represents the number of template samples that are used for deriving the LIC parameters; T(xi,yi) is the template sample of the current block at the coordinate (xi,yi); T0/1(xi+vx0/1,yi+vy0/1) is the corresponding reference sample of the template sample based on the motion vector (vx0/1,vy0/1) (either L0 or L1) of the current block. Additionally, to reduce the computational complexity, both the template samples and the reference template samples are subsampled (2:1 subsampling) to derive the LIC parameters, i.e., only the shaded samples in
Moreover, when LIC is applied to bi-directional blocks (i.e., being predicted by two temporal prediction blocks), the LIC parameters are derived and applied for each prediction direction, i.e., L0 and L1, separately. As shown in
where α0 and β0 and α1 and β1 are the LIC parameters associated with the L0 and L1 motion vectors (i.e., (vx0,vy0) and (vx1,vy1)) of the current block; Pr0(x+vx0,y+vy0) and Pr1(x+vx1,y+vy1) are the corresponding temporal reference blocks of the current block from list L0 and L1, respectively.
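The per-direction application described above can be illustrated by the following minimal sketch. It assumes that the two compensated predictions are combined with equal weights, as in ordinary bi-prediction averaging; this weighting, the function name bi_pred_with_lic and the list-of-rows block layout are assumptions made for illustration only.

```python
def bi_pred_with_lic(ref_block0, ref_block1, params0, params1):
    """Combine two LIC-compensated predictions (L0 and L1).

    params0 = (alpha0, beta0) and params1 = (alpha1, beta1) are derived
    separately for each prediction direction. Equal weighting of the two
    compensated predictions is an assumption of this sketch.
    """
    alpha0, beta0 = params0
    alpha1, beta1 = params1
    return [
        [0.5 * (alpha0 * p0 + beta0) + 0.5 * (alpha1 * p1 + beta1)
         for p0, p1 in zip(row0, row1)]
        for row0, row1 in zip(ref_block0, ref_block1)
    ]
```

Each direction keeps its own scaling factor and offset, so an illumination change that differs between the L0 and L1 reference pictures is compensated independently before the two predictions are combined.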
When an inter block is predicted with merge mode, the LIC flag is included as a part of the motion information, in addition to the MVs and reference indices. When the merge candidate list is constructed, the LIC flag is inherited from the neighbor blocks for the merge candidates. Otherwise, the LIC flag is context coded with a single context. When the LIC tool is not applicable, the LIC flag is not signaled.
However, it is desirable to enhance the coding efficiency of some video contents that contain some gradual illumination variation inside a same picture. Such situation may typically happen in some gaming video content or computer graphic images where some illumination source is located at some place in the picture and light propagates gradually across the picture.
As described above, the LIC can be considered as one enhancement of the regular motion-compensated prediction by addressing the illumination changes between different pictures at the motion compensation stage. Though the prior-art LIC can compensate illumination discrepancy between different pictures, it is neither applied nor adapted for the illumination compensation between different blocks in the same picture.
This is solved and addressed by the general aspects described herein, which are directed to determining, for a current block being encoded or decoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one spatial reference block wherein the at least one spatial reference block is a spatially neighboring block of the current block in the picture. Thus, assuming one coding block and its spatial neighboring blocks inside the picture have propagating illuminance variations, after generating the prediction signal of the block, the present principles propose to apply a spatial LIC to enhance the prediction. As the reference block is not located in the temporal reference pictures, but instead in the same picture, both the reference block search and the template used for the spatial LIC parameter estimation are adjusted. Moreover, the decision of the spatial LIC flag, which indicates the usage of the spatial LIC, might also be defined. Besides, various embodiments of the spatial local illumination compensation (spatial LIC) are disclosed for inter/intra/IBC prediction, where different blocks in the same picture exhibit gradually propagating spatial illumination variation. In addition, various embodiments of the shape of the template used in local illumination compensation (spatial/temporal LIC) are also disclosed.
According to a generic embodiment, a method for encoding 100 is disclosed. The method comprises determining 11, for a current block being encoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one spatial reference block. Advantageously, the spatial reference block is a spatially neighboring block of the current block in the picture as described in various embodiments hereafter. Thus, the determined parameters for the local illumination compensation allow performing a spatial LIC. The spatial LIC is applied to a prediction of the current block to compensate for gradual illumination variation in the picture and results in a compensated prediction of the block. According to different embodiments, the prediction is one of an inter, intra or intra block copy (IBC) prediction. According to another embodiment, a syntax element indicating whether the spatial local illumination compensation applies on the current block or not is determined. After the spatial compensation of the prediction of the current block, a residual is for instance computed in the usual manner by subtracting the compensated prediction from the current block, and then the remaining processing (transform, quantization, CABAC encoding, etc.) is performed as in a state-of-the-art encoding method in a generic encoding step 12.
According to a generic embodiment, a method for decoding 200 is disclosed. The method comprises determining 21, for a current block being decoded in a picture, parameters for a local illumination compensation based on spatially neighboring reconstructed samples and corresponding spatially neighboring reconstructed samples of at least one spatial reference block. As for the encoding, the spatial reference block is a spatially neighboring block of the current block in the picture as described in various embodiments hereafter. According to different embodiments, the spatial LIC is enabled/disabled for the current block using a dedicated flag and the spatial LIC is applied to one of an inter, intra or IBC prediction of the current block. The decoding 22 then further comprises, for instance, decoding the residual values by performing the CABAC decoding, the dequantization of the transform coefficients and then the inverse transform of the decoded coefficients, and adding the so-decoded residual values to the compensated prediction to decode the current block.
Various embodiments of the generic spatial LIC used in an encoding or decoding method are described in the following. According to various embodiments, a block (or CU) level spatial LIC flag is defined for an inter/intra/IBC block to indicate whether the spatial LIC applies on the block or not. If the spatial LIC applies for an inter/intra/IBC block, according to another particular embodiment, a linear model for spatial illumination changes is defined using a scaling factor α and an offset β. The estimation of the spatial LIC parameters is derived by minimizing the difference between the neighboring reconstructed samples of the current block (current template) and the corresponding neighboring reconstructed samples of the spatial reference block (reference template) inside the same picture. Various embodiments described in the following relate to the derivation of the CU-level spatial LIC flag; the selection of a spatial neighboring block used as the reference block for spatial LIC parameters estimation, the generation of the template, which is composed by the neighboring reconstructed samples and is used for spatial LIC parameters estimation.
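The linear model and its estimation can be illustrated by the following minimal sketch. It uses the standard closed-form least-squares solution for fitting the current template to the reference template; the function names derive_lic_params and apply_lic, the flat-template fallback, the rounding and the 8-bit clipping range are assumptions made for illustration and are not taken from the described embodiments.

```python
def derive_lic_params(cur_template, ref_template):
    """Least-squares fit of cur ~= alpha * ref + beta over paired samples."""
    if not cur_template or len(cur_template) != len(ref_template):
        raise ValueError("templates must be non-empty and of equal length")
    n = len(cur_template)
    sum_cur = sum(cur_template)
    sum_ref = sum(ref_template)
    sum_ref_sq = sum(r * r for r in ref_template)
    sum_cross = sum(c * r for c, r in zip(cur_template, ref_template))
    denom = n * sum_ref_sq - sum_ref * sum_ref
    if denom == 0:
        # Flat reference template: scaling is undetermined, so this sketch
        # falls back to an offset-only model (assumption).
        alpha = 1.0
    else:
        alpha = (n * sum_cross - sum_cur * sum_ref) / denom
    beta = (sum_cur - alpha * sum_ref) / n
    return alpha, beta


def apply_lic(prediction, alpha, beta, bit_depth=8):
    """Apply the linear model to every prediction sample, then clip."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max_val, max(0, round(alpha * p + beta))) for p in row]
        for row in prediction
    ]


# Usage: the current template is twice as bright as the reference, plus 5.
alpha, beta = derive_lic_params([25, 45, 65, 85], [10, 20, 30, 40])
compensated = apply_lic([[10, 20], [30, 40]], alpha, beta)
```

Because both templates consist only of already reconstructed samples, the same derivation can be repeated at the decoder without transmitting α and β.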
In the following, for the spatial LIC in inter prediction, the spatial LIC derivation, the reference block decision and the generation of the template used for spatial LIC parameter estimation are described. Then, for the spatial LIC in intra prediction, the reference block decision and the template generation are also described, with emphasis on the differences compared to the spatial LIC in inter prediction. Next, for the spatial LIC in IBC prediction, the reference block decision is described. Finally, the spatial reference block search for inter/intra prediction is proposed.
According to a first embodiment, spatial LIC is applied during the encoding/decoding of an inter block.
According to the prior-art LIC tool described above, LIC is applied to compensate the temporal illumination changes between different frames in inter prediction and is referred to as temporal LIC in the following. Given that there might be some propagating illuminance variations between some spatial blocks inside the same frame, spatial LIC is proposed to further compensate the spatial illumination changes inside the same frame in inter prediction.
According to a variant embodiment, a spatial LIC flag spatial_lic_flag is defined to indicate whether spatial LIC applies or not. When an inter block is coded with merge mode, the spatial LIC flag is copied from neighboring blocks, in a way similar to motion information copy in merge mode; otherwise, the spatial LIC flag is signaled for the block.
According to another variant embodiment, when the spatial LIC applies for a CU, it is also based on a linear model for spatial illumination changes, using a scaling factor α and an offset β. The estimation of the spatial LIC parameters is derived by minimizing the difference between the neighboring reconstructed samples of the current block (i.e., the template T in
In
where N represents the number of template samples that are used for deriving the spatial LIC parameters; T(xi,yi) is the template sample of the current block at the coordinate (xi,yi); TA(xi,yi−hA) is the corresponding reconstructed sample of the template sample based on the above neighboring block (hA is the height of the above block) of the current block. Additionally, to reduce the computational complexity, only the shaded samples in
Similar estimation process for the left spatial LIC parameters (αL and βL) is derived as below, if the left spatial neighboring block of the current block is available:
where TL(xi−wL,yi) is the corresponding reconstructed sample of the template sample based on the left neighboring block (wL is the width of the left block) of the current block. Only the shaded samples in
If only above or left spatial neighboring block is available, the above spatial LIC parameters (αA and βA), or the left LIC parameters (αL and βL) are applied to the regular motion-compensated prediction samples to obtain the final prediction samples of the current block:
If both above and left spatial neighboring blocks are available, the above and left spatial LIC parameters are derived by separately minimizing the distortions between TA and T, and TL and T. Afterwards, the final prediction samples of the current block are generated by applying the final spatial LIC parameters, which are obtained by averaging the above and left spatial LIC parameters, as indicated as:
In case spatial_lic_flag is false, only the usual motion compensation decoding process is involved, for example as specified by the VVC decoding process. In case spatial_lic_flag is true, the next step 1070 consists of estimating the spatial LIC parameters with the available above/left spatial neighboring blocks. If both above and left spatial neighboring blocks are available (step 1080), the final spatial LIC parameters are obtained by averaging the above and left spatial LIC parameters in step 1090. Afterwards, as depicted in step 1100, the final prediction samples of the current block are generated by applying the spatial LIC parameters on the regular motion-compensated prediction samples.
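The decision flow of steps 1070 to 1100 can be sketched as follows. This is an illustrative outline only: the function names, the use of None to mark an unavailable neighboring block, the pass-through behavior when no neighbor is available, and the absence of rounding and clipping are assumptions of this sketch.

```python
def derive_lic_params(cur_template, ref_template):
    """Closed-form least-squares fit of cur ~= alpha * ref + beta."""
    n = len(cur_template)
    s_c, s_r = sum(cur_template), sum(ref_template)
    s_rr = sum(r * r for r in ref_template)
    s_cr = sum(c * r for c, r in zip(cur_template, ref_template))
    denom = n * s_rr - s_r * s_r
    alpha = (n * s_cr - s_c * s_r) / denom if denom else 1.0
    return alpha, (s_c - alpha * s_r) / n


def spatial_lic_predict(mc_pred, spatial_lic_flag, cur_template,
                        above_ref=None, left_ref=None):
    """Return the final prediction samples for one inter block.

    above_ref / left_ref are the reference templates of the above and left
    neighboring blocks, or None when that neighbor is unavailable.
    """
    if not spatial_lic_flag:
        # Regular motion-compensated prediction only.
        return [row[:] for row in mc_pred]
    # Step 1070: estimate parameters for each available neighbor.
    params = [derive_lic_params(cur_template, ref)
              for ref in (above_ref, left_ref) if ref is not None]
    if not params:
        return [row[:] for row in mc_pred]  # assumption: nothing to apply
    # Steps 1080/1090: average when both neighbors are available.
    alpha = sum(a for a, _ in params) / len(params)
    beta = sum(b for _, b in params) / len(params)
    # Step 1100: apply the linear model to the prediction samples.
    return [[alpha * p + beta for p in row] for row in mc_pred]
```

With a single available neighbor the averaging reduces to that neighbor's parameters, so the same code path covers the above-only, left-only and both-available cases.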
According to a variant of this embodiment, only the above or the left spatial LIC parameters are applied on the regular motion-compensated prediction samples to obtain the final prediction samples of the current block. The decision of which spatial reference block to use is, for instance, made via a rate-distortion (RD) or sum of absolute differences (SAD) check. A flag lic_refblk_flag, indicating which spatial reference block and corresponding spatial LIC parameter set is applied, is signaled in the bitstream. When lic_refblk_flag equals 0, the left spatial LIC parameters are applied; otherwise, the above spatial LIC parameters are applied.
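An encoder-side SAD check for this variant could look like the following sketch. The function names sad and choose_lic_refblk_flag are hypothetical, and comparing the compensated prediction against the original block is only one possible way to perform the check; a full RD decision would also account for the signaling cost.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def choose_lic_refblk_flag(orig_block, mc_pred, left_params, above_params):
    """Return 0 to select the left parameter set, 1 to select the above set."""
    candidates = {0: left_params, 1: above_params}
    costs = {}
    for flag, (alpha, beta) in candidates.items():
        compensated = [[alpha * p + beta for p in row] for row in mc_pred]
        costs[flag] = sad(orig_block, compensated)
    # On equal cost the first entry (flag 0, left) is kept.
    return min(costs, key=costs.get)
```

The decoder does not repeat this search: it parses lic_refblk_flag and derives only the selected parameter set.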
As aforementioned, when both above and left spatial neighboring blocks are available, the above and left spatial LIC parameters are separately derived; then, the above and left spatial LIC parameters are averaged to generate the final spatial LIC parameters and are applied to obtain the final prediction samples of the current block. Given that such method needs to perform the LMSE-based spatial LIC derivation twice, it introduces non-negligible complexity increase at both encoder and decoder.
According to another variant of this embodiment, to reduce the complexity of the proposed spatial LIC derivation, one improved spatial LIC algorithm is proposed for the case when both above and left spatial neighboring blocks are available.
After that, the LMSE-based derivation is employed to calculate the values of the scaling factor α and the offset β used for the spatial LIC by minimizing the difference between the reference template Tave and the template of current block T as below:
Finally, the derived spatial LIC parameters are applied on the regular motion-compensated prediction samples to obtain the final prediction samples of the current block based on the linear model as shown on
Therefore, for this variant, only one spatial LIC parameter estimation needs to be performed to form the final prediction samples of the current block.
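This reduced-complexity variant can be sketched as follows, assuming that the reference template Tave is formed as the sample-wise average of the above and left reference templates; that construction and the function names are assumptions made for illustration.

```python
def derive_lic_params(cur_template, ref_template):
    """Closed-form least-squares fit of cur ~= alpha * ref + beta."""
    n = len(cur_template)
    s_c, s_r = sum(cur_template), sum(ref_template)
    s_rr = sum(r * r for r in ref_template)
    s_cr = sum(c * r for c, r in zip(cur_template, ref_template))
    denom = n * s_rr - s_r * s_r
    alpha = (n * s_cr - s_c * s_r) / denom if denom else 1.0
    return alpha, (s_c - alpha * s_r) / n


def params_from_averaged_template(cur_template, above_ref, left_ref):
    """Average the two reference templates first, then run one LMSE fit."""
    if not (len(cur_template) == len(above_ref) == len(left_ref)):
        raise ValueError("all templates must have the same length")
    t_ave = [(a + l) / 2 for a, l in zip(above_ref, left_ref)]
    return derive_lic_params(cur_template, t_ave)
```

Compared with deriving the above and left parameters separately and averaging them afterwards, this calls the least-squares derivation once instead of twice, at the cost of one extra pass that averages the templates.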
According to a second embodiment, the motion vector prediction (MVP) candidate is used as the reference block in inter prediction.
where TAR(xi+wAR,yi−hAR) is the corresponding reconstructed sample of the template sample based on the above-right neighboring block (hAR and wAR are the height and width of the above-right block). Similar spatial LIC parameters derivation process could be performed for bottom-left (A1) and above-left (B2) spatial neighboring blocks if they are selected.
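The coordinate shifts used to fetch the reference template can be illustrated with the sketch below. It covers only the three shifts stated in this description (above, left and above-right); the function names, the recon[y][x] picture layout and the bounds check are assumptions made for illustration.

```python
def reference_offset(position, ref_width, ref_height):
    """Shift (dx, dy) applied to the current template coordinates."""
    offsets = {
        "above": (0, -ref_height),                 # TA(xi, yi - hA)
        "left": (-ref_width, 0),                   # TL(xi - wL, yi)
        "above_right": (ref_width, -ref_height),   # TAR(xi + wAR, yi - hAR)
    }
    return offsets[position]


def shifted_template(recon, template_coords, dx, dy):
    """Collect recon[y + dy][x + dx] for each template coordinate (x, y)."""
    samples = []
    for x, y in template_coords:
        xs, ys = x + dx, y + dy
        if not (0 <= ys < len(recon) and 0 <= xs < len(recon[0])):
            raise IndexError("reference template falls outside the picture")
        samples.append(recon[ys][xs])
    return samples


# Usage: 8x8 picture whose sample value encodes its position (10 * y + x).
recon = [[10 * y + x for x in range(8)] for y in range(8)]
# L-shaped template of a 2x2 block located at (4, 4).
coords = [(4, 3), (5, 3), (3, 4), (3, 5)]
above = shifted_template(recon, coords, *reference_offset("above", 2, 2))
```

The reference template keeps the same shape as the current template; only its position changes with the selected neighboring block.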
In case spatial_lic_flag is false, only the usual motion compensation decoding process is involved. In case spatial_lic_flag is true, the next step 2080 comprises estimating the spatial LIC parameters with the corresponding selected spatial neighboring block. Afterwards, as depicted in step 2090, the final prediction samples of the current block are generated by applying the spatial LIC parameters on the regular motion-compensated prediction samples. According to yet another variant of this embodiment, if the spatial LIC is applied for the current block, rather than using only the one of the five spatial candidates that is selected as the best MVP candidate, the spatial LIC parameters from any of these five spatial neighboring blocks may be applied to obtain the final prediction samples of the current block. The decision of which spatial reference block to use could be made via a rate-distortion (RD) or sum of absolute differences (SAD) check. An index lic_refblk_index, indicating which spatial reference block and corresponding spatial LIC parameter set is applied, is signaled in the bitstream.
According to a third embodiment, spatial LIC is applied during the encoding/decoding of an intra block. As aforementioned for inter prediction, the spatial LIC is proposed to compensate the spatial illumination changes inside the same frame. While the illumination changes could propagate gradually across the intra coded frame, the intra block to encode/decode might also contain those gradually propagating spatial illumination variation.
As specified by VVC, the planar and DC intra prediction modes are used to predict smooth and gradually changing regions, whereas angular prediction modes are used to capture different directional structures. However, even though the DC and planar intra prediction modes target smooth and gradually changing contents, they are unable to properly handle some contents with directional, gradually propagating illumination variations; the directional intra prediction modes have similar limits. Therefore, the third embodiment proposes to apply spatial LIC to compensate the spatial illumination changes for intra prediction.
As previously described for inter block, a spatial LIC flag spatial_lic_flag is defined and signaled for an intra block to indicate whether spatial LIC applies or not. When the spatial LIC applies, it is also based on a linear model for spatial illumination changes, using a scaling factor α and an offset β. The estimation of the spatial LIC parameters is also derived by minimizing the difference between the neighboring reconstructed samples of the current block and the corresponding neighboring reconstructed samples of the spatial reference block inside the picture.
As for selecting the possible spatial reference block, there are some differences between spatial LIC for inter prediction and intra prediction. For example, the spatial neighboring block used for estimating spatial LIC parameters is determined based on the intra prediction mode. Moreover, rather than considering both above and left boundaries, only above or left boundary is used to compose the template, which then is used for estimating spatial LIC parameters. Besides, according to yet another variant, the template is generated by more than just the reconstructed samples in the neighboring first above/left line, for example, the reconstructed samples in the second/third, or more above/left lines, or the whole reconstructed neighboring blocks. According to another variant embodiment, the proposed spatial LIC for intra prediction, is only activated for some intra prediction modes (i.e. DC and planar modes).
According to a variant of the third embodiment, spatial LIC is applied during the encoding/decoding of an intra block based on intra prediction mode. The spatial LIC parameters for intra prediction are estimated with the LMSE-based LIC derivation using the neighboring reconstructed samples of the nearest reconstructed spatial neighboring blocks (i.e. above/left/above-right/bottom-left/above-left in
The template used for estimating the spatial LIC parameters depends on the intra prediction mode IPM, as shown in Table 1.
According to another variant of the third embodiment, the intra prediction mode is a matrix weighted intra prediction.
If spatial LIC is applied for this intra coded CU with MIP, the templates used for estimating spatial LIC parameters are the same as the CU with non-angular modes, both the neighboring reconstructed samples of the above and left blocks (TA and TL in
If this block is intra predicted with MIP (step 3300), the estimation of the spatial LIC parameters with the spatial above and left neighboring blocks is performed (step 3314). If this block is intra predicted with conventional intra prediction, the template decision for the spatial LIC parameters is based on the intra prediction mode IPM (step 3304). Then the next step 3305 consists of estimating the spatial LIC parameters with the corresponding selected templates. Afterwards, as depicted in step 3306/3315, the final prediction samples of the current block are generated by applying the spatial LIC parameters on the regular intra prediction samples. According to a variant of this embodiment, for DC and planar modes, rather than only using the above and left spatial neighboring blocks of the current block as the reference blocks, the other three templates from the bottom-left, above-left, and above-right blocks could also be used together for the spatial LIC parameters.
According to another variant of this embodiment, rather than selecting only one of the five spatial templates for estimating the spatial LIC parameters when the intra prediction mode belongs to the horizontal/vertical directions, two or three templates could be used together to calculate the spatial LIC parameters. For example, for modes belonging to the horizontal directions (IPM 3 to 33), the left, bottom-left and above-left blocks could be used as the reference blocks and their neighboring reconstructed samples (TL, TBL and TAL in
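The template decision of steps 3300 to 3304 can be sketched as follows. The mapping of MIP and non-angular modes to both templates and of horizontal modes (IPM 3 to 33) to the left template follows this description; the vertical range 34 to 66, the fallback for any other mode value, and the function name are assumptions of this sketch, and Table 1 remains the authoritative mapping.

```python
PLANAR, DC = 0, 1


def select_templates(intra_pred_mode, is_mip):
    """Return the reference templates used for spatial LIC estimation."""
    if is_mip or intra_pred_mode in (PLANAR, DC):
        # MIP blocks use the same templates as non-angular modes.
        return ("above", "left")
    if 3 <= intra_pred_mode <= 33:
        # Horizontal directional modes.
        return ("left",)
    if 34 <= intra_pred_mode <= 66:
        # Vertical directional modes (range assumed for this sketch).
        return ("above",)
    # Any other mode value: fall back to both templates (assumption).
    return ("above", "left")
```

The selected templates are then passed to the same least-squares derivation that is used for inter blocks, so only the template selection differs between the two prediction types.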
According to additional variants of the third embodiment, several shapes for the template used for estimating spatial LIC parameters are disclosed. As aforementioned, the template used for estimating spatial LIC parameters is always L-shape around the current/reference block, which is composed by the neighboring reconstructed samples located in the left and above boundaries of the current/reference block. Rather than using this fixed L-shape template, some more flexible template generations are proposed in this section.
According to a first variant, only left or above boundary of a spatial reference block are used as template. According to a previous variant of the third embodiment, the selection of the reference template is derived from the intra prediction mode IPM to enhance the different impact of illumination changes from left and above reference samples under some situations. For modes belong to horizontal directions (IPM 3 to 33), left reference template (TL in
According to a second variant, multi reference lines of a spatial reference block are used as template.
According to another variant, a left-boundary template is applied for horizontal directional modes, and an above-boundary template is used for vertical directional modes. Computational complexity is reduced with fewer samples in the template, but the estimation accuracy of the illumination variation might also be affected. Therefore, according to another variant of this embodiment, multi reference lines from only the left/above side are applied for horizontal/vertical directional modes.
According to another variant, a flag lic_mrl_flag, indicating whether multi reference lines are applied for composing the template, is signaled in the bitstream. In case lic_mrl_flag is false, only the conventional nearest reference line (above/left boundary) will be applied for generating the template.
According to another variant, the template with multi reference line is applied in the spatial LIC parameters estimation for inter prediction. Indeed, different aspects of the multiple lines reference template are described for spatial LIC applied in intra prediction. However, this is for purposes of clarity in description, and does not limit the application or scope of those aspects either to intra prediction or to spatial LIC. Indeed, any of the different aspects can be combined and interchanged to provide a template with multi reference line applied in the spatial LIC parameters estimation for inter prediction, or a template with multi reference line applied in the prior-art LIC parameters estimation for inter prediction.
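As a non-limiting illustration, composing a one-sided template from one or several reference lines can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `gather_template`, the picture layout `recon[y][x]`, and the default of two lines when lic_mrl_flag is true are hypothetical choices, since the number of reference lines is not specified here.

```python
from typing import List


def gather_template(recon: List[List[int]], x0: int, y0: int,
                    width: int, height: int, side: str,
                    lic_mrl_flag: bool, num_lines: int = 2) -> List[int]:
    """Collect reconstructed samples adjacent to the block at (x0, y0).

    side is "left" or "above". When lic_mrl_flag is false, only the nearest
    reference line is used; otherwise up to num_lines lines are collected,
    nearest line first.
    """
    if side not in ("left", "above"):
        raise ValueError("side must be 'left' or 'above'")
    lines = num_lines if lic_mrl_flag else 1
    samples: List[int] = []
    for k in range(1, lines + 1):
        if side == "left":
            x = x0 - k
            if x < 0:
                break  # line lies outside the picture
            samples.extend(recon[y][x] for y in range(y0, y0 + height))
        else:
            y = y0 - k
            if y < 0:
                break  # line lies outside the picture
            samples.extend(recon[y][x] for x in range(x0, x0 + width))
    return samples
```

The same routine would be called on both the current block and the spatial reference block so that the two sample lists stay aligned for the parameter estimation.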
According to another variant, the template comprises a whole reconstructed neighboring block.
According to a variant, only for small blocks (block size ≤8×8), the template is generated using the reconstructed neighboring block. Advantageously, this feature makes it possible to reduce the complexity of the variant of
According to another variant, using the reconstructed neighboring block as the template is applied in the spatial LIC parameters estimation for inter prediction or in the prior-art LIC parameters estimation for inter prediction.
According to a fourth embodiment, spatial LIC is applied during the encoding/decoding of an IBC block.
Compared to the spatial LIC for inter/intra prediction as described above, the spatial reference block, which is used for spatial LIC estimation for IBC prediction, is the same reference block used for intra copy (i.e., the template TIBC in
where TIBC(xi−bvx,yi−bvy) is the corresponding reference sample of the template sample based on the block vector (bvx, bvy) of the current block.
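As a non-limiting illustration, the pairing of each template sample with its corresponding reference sample through the block vector can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `ibc_template_pairs` and the picture layout `recon[y][x]` are hypothetical, the displacement sign follows the expression TIBC(xi−bvx, yi−bvy) given above, and availability is reduced to a picture-bounds check.

```python
from typing import List, Sequence, Tuple


def ibc_template_pairs(recon: List[List[int]],
                       template_positions: Sequence[Tuple[int, int]],
                       bvx: int, bvy: int) -> List[Tuple[int, int]]:
    """Pair template samples of the current block with displaced reference samples.

    Each returned tuple is (current template sample, reference template sample),
    where the reference position is (xi - bvx, yi - bvy).
    """
    pic_h = len(recon)
    pic_w = len(recon[0]) if pic_h else 0
    pairs: List[Tuple[int, int]] = []
    for xi, yi in template_positions:
        xr, yr = xi - bvx, yi - bvy
        # Keep the pair only when the displaced sample lies inside the picture.
        if 0 <= xr < pic_w and 0 <= yr < pic_h:
            pairs.append((recon[yi][xi], recon[yr][xr]))
    return pairs
```

The resulting pairs would then feed the LIC parameter estimation in the same way as for the inter and intra cases.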
According to a fifth embodiment, the spatial reference block is searched in spatial LIC for intra and inter prediction. As aforementioned, the spatial LIC parameters for intra/inter prediction are estimated using the nearest reconstructed spatial neighboring blocks (above/left/above-right/bottom-left/above-left as illustrated on the exemplary
This application describes a variety of aspects, including tools, features, embodiments, models, approaches, etc. Many of these aspects are described with specificity and, at least to show the individual characteristics, are often described in a manner that may sound limiting. However, this is for purposes of clarity in description, and does not limit the application or scope of those aspects. Indeed, all of the different aspects can be combined and interchanged to provide further aspects. Moreover, the aspects can be combined and interchanged with aspects described in earlier filings as well.
The aspects described and contemplated in this application can be implemented in many different forms.
In the present application, the terms “reconstructed” and “decoded” may be used interchangeably, the terms “pixel” and “sample” may be used interchangeably, the terms “image,” “picture” and “frame” may be used interchangeably.
Various methods are described herein, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined. Additionally, terms such as “first”, “second”, etc. may be used in various embodiments to modify an element, component, step, operation, etc., such as, for example, a “first decoding” and a “second decoding”. Use of such terms does not imply an ordering to the modified operations unless specifically required. So, in this example, the first decoding need not be performed before the second decoding, and may occur, for example, before, during, or in an overlapping time period with the second decoding.
Various methods and other aspects described in this application can be used to modify modules, for example, the intra and/or inter prediction modules (160, 170, 260, 275) of a video encoder 100 and decoder 200 as shown in
Various numeric values are used in the present application, for example, the number of transforms, the number of transform levels, the indices of transforms. The specific values are for example purposes and the aspects described are not limited to these specific values.
Before being encoded, the video sequence may go through pre-encoding processing (101), for example, applying a color transform to the input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components). Metadata can be associated with the pre-processing, and attached to the bitstream.
In the encoder 100, a picture is encoded by the encoder elements as described below. The picture to be encoded is partitioned (102) and processed in units of, for example, CUs. Each unit is encoded using, for example, either an intra or inter mode. When a unit is encoded in an intra mode, it performs intra prediction (160). In an inter mode, motion estimation (175) and compensation (170) are performed. The encoder decides (105) which one of the intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag. Prediction residuals are calculated, for example, by subtracting (110) the predicted block from the original image block.
The prediction residuals are then transformed (125) and quantized (130). The quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (145) to output a bitstream. The encoder can skip the transform and apply quantization directly to the non-transformed residual signal. The encoder can bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization processes.
The encoder decodes an encoded block to provide a reference for further predictions. The quantized transform coefficients are de-quantized (140) and inverse transformed (150) to decode prediction residuals. Combining (155) the decoded prediction residuals and the predicted block, an image block is reconstructed. In-loop filters (165) are applied to the reconstructed picture to perform, for example, deblocking/SAO (Sample Adaptive Offset) filtering to reduce encoding artifacts. The filtered image is stored at a reference picture buffer (180).
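As a non-limiting illustration, the residual coding and in-encoder reconstruction described above can be sketched for the case where the transform is skipped and quantization is applied directly to the residual. This is a toy sketch under stated assumptions: the function names and the uniform scalar quantizer with step `qstep` are hypothetical, and entropy coding and in-loop filtering are omitted.

```python
import math
from typing import List


def quantize_residual(orig: List[int], pred: List[int], qstep: int) -> List[int]:
    """Subtract the prediction and quantize the residual (transform skipped)."""
    # floor(x + 0.5) rounds halves upward, avoiding Python's round-half-to-even.
    return [math.floor((o - p) / qstep + 0.5) for o, p in zip(orig, pred)]


def reconstruct(levels: List[int], pred: List[int], qstep: int,
                bit_depth: int = 8) -> List[int]:
    """De-quantize the levels, add the prediction, and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [min(max_val, max(0, p + lvl * qstep))
            for lvl, p in zip(levels, pred)]
```

Because the encoder runs `reconstruct` on its own quantized levels, it predicts subsequent blocks from the same samples the decoder will produce.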
In particular, the input of the decoder includes a video bitstream, which can be generated by video encoder 100. The bitstream is first entropy decoded (230) to obtain transform coefficients, motion vectors, and other coded information. The picture partition information indicates how the picture is partitioned. The decoder may therefore divide (235) the picture according to the decoded picture partitioning information. The transform coefficients are de-quantized (240) and inverse transformed (250) to decode the prediction residuals. Combining (255) the decoded prediction residuals and the predicted block, an image block is reconstructed. The predicted block can be obtained (270) from intra prediction (260) or motion-compensated prediction (i.e., inter prediction) (275). In-loop filters (265) are applied to the reconstructed image. The filtered image is stored at a reference picture buffer (280).
The decoded picture can further go through post-decoding processing (285), for example, an inverse color transform (e.g. conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping performing the inverse of the remapping process performed in the pre-encoding processing (101). The post-decoding processing can use metadata derived in the pre-encoding processing and signaled in the bitstream.
The system 5000 includes at least one processor 5010 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this document. Processor 5010 can include embedded memory, input output interface, and various other circuitries as known in the art. The system 5000 includes at least one memory 5020 (e.g., a volatile memory device, and/or a non-volatile memory device). System 5000 includes a storage device 5040, which can include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash, magnetic disk drive, and/or optical disk drive. The storage device 5040 can include an internal storage device, an attached storage device (including detachable and non-detachable storage devices), and/or a network accessible storage device, as non-limiting examples.
System 5000 includes an encoder/decoder module 5030 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 5030 can include its own processor and memory. The encoder/decoder module 5030 represents module(s) that can be included in a device to perform the encoding and/or decoding functions. As is known, a device can include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 5030 can be implemented as a separate element of system 5000 or can be incorporated within processor 5010 as a combination of hardware and software as known to those skilled in the art.
Program code to be loaded onto processor 5010 or encoder/decoder 5030 to perform the various aspects described in this document can be stored in storage device 5040 and subsequently loaded onto memory 5020 for execution by processor 5010. In accordance with various embodiments, one or more of processor 5010, memory 5020, storage device 5040, and encoder/decoder module 5030 can store one or more of various items during the performance of the processes described in this document. Such stored items can include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
In some embodiments, memory inside of the processor 5010 and/or the encoder/decoder module 5030 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding. In other embodiments, however, a memory external to the processing device (for example, the processing device can be either the processor 5010 or the encoder/decoder module 5030) is used for one or more of these functions. The external memory can be the memory 5020 and/or the storage device 5040, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of, for example, a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for MPEG-2 (MPEG refers to the Moving Picture Experts Group, MPEG-2 is also referred to as ISO/IEC 13818, and 13818-1 is also known as H.222, and 13818-2 is also known as H.262), HEVC (HEVC refers to High Efficiency Video Coding, also known as H.265 and MPEG-H Part 2), or VVC (Versatile Video Coding, a new standard being developed by JVET, the Joint Video Experts Team).
The input to the elements of system 5000 can be provided through various input devices as indicated in block 5005. Such input devices include, but are not limited to, (i) a radio frequency (RF) portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Component (COMP) input terminal (or a set of COMP input terminals), (iii) a Universal Serial Bus (USB) input terminal, and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal. Other examples, not shown in
In various embodiments, the input devices of block 5005 have associated respective input processing elements as known in the art. For example, the RF portion can be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) downconverting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which can be referred to as a channel in certain embodiments, (iv) demodulating the downconverted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers. The RF portion can include a tuner that performs various of these functions, including, for example, downconverting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF portion and its associated input processing element receive an RF signal transmitted over a wired (for example, cable) medium, and perform frequency selection by filtering, downconverting, and filtering again to a desired frequency band. Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements can include inserting elements in between existing elements, such as, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF portion includes an antenna.
Additionally, the USB and/or HDMI terminals can include respective interface processors for connecting system 5000 to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, can be implemented, for example, within a separate input processing IC or within processor 5010 as necessary. Similarly, aspects of USB or HDMI interface processing can be implemented within separate interface ICs or within processor 5010 as necessary. The demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 5010, and encoder/decoder 5030 operating in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.
Various elements of system 5000 can be provided within an integrated housing. Within the integrated housing, the various elements can be interconnected and transmit data therebetween using a suitable connection arrangement 5015, for example, an internal bus as known in the art, including the Inter-IC (I2C) bus, wiring, and printed circuit boards.
The system 5000 includes communication interface 5050 that enables communication with other devices via communication channel 5090. The communication interface 5050 can include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 5090. The communication interface 5050 can include, but is not limited to, a modem or network card and the communication channel 5090 can be implemented, for example, within a wired and/or a wireless medium.
Data is streamed, or otherwise provided, to the system 5000, in various embodiments, using a wireless network such as a Wi-Fi network, for example IEEE 802.11 (IEEE refers to the Institute of Electrical and Electronics Engineers). The Wi-Fi signal of these embodiments is received over the communications channel 5090 and the communications interface 5050 which are adapted for Wi-Fi communications. The communications channel 5090 of these embodiments is typically connected to an access point or router that provides access to external networks including the Internet for allowing streaming applications and other over-the-top communications. Other embodiments provide streamed data to the system 5000 using a set-top box that delivers the data over the HDMI connection of the input block 5005. Still other embodiments provide streamed data to the system 5000 using the RF connection of the input block 5005. As indicated above, various embodiments provide data in a non-streaming manner. Additionally, various embodiments use wireless networks other than Wi-Fi, for example a cellular network or a Bluetooth network.
The system 5000 can provide an output signal to various output devices, including a display 5065, speakers 5075, and other peripheral devices 5085. The display 5065 of various embodiments includes one or more of, for example, a touchscreen display, an organic light-emitting diode (OLED) display, a curved display, and/or a foldable display. The display 5065 can be for a television, a tablet, a laptop, a cell phone (mobile phone), or other device. The display 5065 can also be integrated with other components (for example, as in a smart phone), or separate (for example, an external monitor for a laptop). The other peripheral devices 5085 include, in various examples of embodiments, one or more of a stand-alone digital video disc (or digital versatile disc) (DVD, for both terms), a disk player, a stereo system, and/or a lighting system. Various embodiments use one or more peripheral devices 5085 that provide a function based on the output of the system 5000. For example, a disk player performs the function of playing the output of the system 5000.
In various embodiments, control signals are communicated between the system 5000 and the display 5065, speakers 5075, or other peripheral devices 5085 using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communications protocols that enable device-to-device control with or without user intervention. The output devices can be communicatively coupled to system 5000 via dedicated connections through respective interfaces 5065, 5075, and 5085. Alternatively, the output devices can be connected to system 5000 using the communications channel 5090 via the communications interface 5050. The display 5065 and speakers 5075 can be integrated in a single unit with the other components of system 5000 in an electronic device such as, for example, a television. In various embodiments, the display interface 5065 includes a display driver, such as, for example, a timing controller (T-Con) chip.
The display 5065 and speaker 5075 can alternatively be separate from one or more of the other components, for example, if the RF portion of input 5005 is part of a separate set-top box. In various embodiments in which the display 5065 and speakers 5075 are external components, the output signal can be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.
The embodiments can be carried out by computer software implemented by the processor 5010 or by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments can be implemented by one or more integrated circuits. The memory 5020 can be of any type appropriate to the technical environment and can be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples. The processor 5010 can be of any type appropriate to the technical environment, and can encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.
Various implementations involve decoding. “Decoding”, as used in this application, can encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display. In various embodiments, such processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding. In various embodiments, such processes also, or alternatively, include processes performed by a decoder of various implementations described in this application, for example, comprising deriving parameters of a spatial LIC and applying a spatial LIC to any of an inter prediction, intra prediction or IBC prediction.
As further examples, in one embodiment “decoding” refers only to entropy decoding, in another embodiment “decoding” refers only to differential decoding, and in another embodiment “decoding” refers to a combination of entropy decoding and differential decoding. Whether the phrase “decoding process” is intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear based on the context of the specific descriptions and is believed to be well understood by those skilled in the art.
Various implementations involve encoding. In an analogous way to the above discussion about “decoding”, “encoding” as used in this application can encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream. In various embodiments, such processes include one or more of the processes typically performed by an encoder, for example, partitioning, differential encoding, transformation, quantization, and entropy encoding. In various embodiments, such processes also, or alternatively, include processes performed by an encoder of various implementations described in this application, for example, deriving parameters of a spatial LIC and applying a spatial LIC to any of an inter prediction, intra prediction or IBC prediction.
As further examples, in one embodiment “encoding” refers only to entropy encoding, in another embodiment “encoding” refers only to differential encoding, and in another embodiment “encoding” refers to a combination of differential encoding and entropy encoding. Whether the phrase “encoding process” is intended to refer specifically to a subset of operations or generally to the broader encoding process will be clear based on the context of the specific descriptions and is believed to be well understood by those skilled in the art.
Note that the syntax elements as used herein, for example, spatial_lic_flag, lic_refblk_index, lic_mrl_flag are descriptive terms. As such, they do not preclude the use of other syntax element names.
When a figure is presented as a flow diagram, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow diagram of a corresponding method/process.
Various embodiments refer to rate distortion optimization. In particular, during the encoding process, the balance or trade-off between the rate and distortion is usually considered, often given the constraints of computational complexity. The rate distortion optimization is usually formulated as minimizing a rate distortion function, which is a weighted sum of the rate and of the distortion. There are different approaches to solve the rate distortion optimization problem. For example, the approaches may be based on an extensive testing of all encoding options, including all considered modes or coding parameters values, with a complete evaluation of their coding cost and related distortion of the reconstructed signal after coding and decoding. Faster approaches may also be used, to save encoding complexity, in particular with computation of an approximated distortion based on the prediction or the prediction residual signal, not the reconstructed one. A mix of these two approaches can also be used, such as by using an approximated distortion for only some of the possible encoding options, and a complete distortion for other encoding options. Other approaches only evaluate a subset of the possible encoding options. More generally, many approaches employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete evaluation of both the coding cost and related distortion.
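As a non-limiting illustration, an exhaustive rate distortion decision over a small set of candidates can be sketched as follows. This is a minimal sketch under stated assumptions: the weighted sum is written in the common Lagrangian form J = D + λ·R, and the function names, the candidate tuples and the numeric values are hypothetical.

```python
from typing import List, Tuple

# (mode name, distortion, rate in bits); illustrative values only.
Candidate = Tuple[str, float, float]


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Rate distortion cost as a weighted sum: J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_mode(candidates: List[Candidate], lam: float) -> str:
    """Return the name of the candidate with the lowest rate distortion cost."""
    best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
    return best[0]
```

With candidates `[("intra", 100.0, 20), ("inter", 120.0, 8), ("ibc", 90.0, 40)]`, a weight of 1 selects "intra" (cost 120), whereas a weight of 4 penalizes rate more heavily and selects "inter" (cost 152). A faster approach would replace the distortion values with approximations computed from the prediction or the prediction residual.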
The implementations and aspects described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or program). An apparatus can be implemented in, for example, appropriate hardware, software, and firmware. The methods can be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
Additionally, this application may refer to “determining” various pieces of information. Determining the information can include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application may refer to “accessing” various pieces of information. Accessing the information can include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information can include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a particular one of a plurality of parameters for transform. In this way, in an embodiment the same parameter is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
This disclosure has described various pieces of information, such as for example syntax, that can be transmitted or stored, for example. This information can be packaged or arranged in a variety of manners, including for example manners common in video standards such as putting the information into an SPS, a PPS, a NAL unit, a header (for example, a NAL unit header, or a slice header), or an SEI message. Other manners are also available, including for example manners common for system level or application level standards such as putting the information into:
As will be evident to one of ordinary skill in the art, implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted. The information can include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal can be formatted to carry the bitstream of a described embodiment. Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries can be, for example, analog or digital information. The signal can be transmitted over a variety of different wired or wireless links, as is known. The signal can be stored on a processor-readable medium.
We describe a number of embodiments. Features of these embodiments can be provided alone or in any combination, across various claim categories and types. Further, embodiments can include one or more of the following features, devices, or aspects, alone or in any combination, across various claim categories and types:
Number | Date | Country | Kind |
---|---|---|---|
21305170.9 | Feb 2021 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/051924 | 1/27/2022 | WO |