Embodiments of the present disclosure relate to, but are not limited to, video technology, and more particularly, to a neural network based loop filtering method, a video encoding method and apparatus, a video decoding method and apparatus, and a system.
Digital video compression technology is mainly used to compress huge digital image video data for facilitating transmission and storage. A picture of an original video sequence contains luma and chroma components. In a process of digital video encoding, an encoder reads a monochrome picture or color picture and partitions each frame of the picture into largest coding units (LCUs) with the same size (such as 128×128 or 64×64). According to a rule, each largest coding unit may be partitioned into rectangular coding units (CUs), and may be further partitioned into prediction units (PUs), transform units (TUs), or the like. A hybrid encoding framework includes a prediction module, a transform module, a quantization module, an entropy coding module, an in loop filter module, or the like. The prediction module may employ intra prediction and inter prediction. The intra prediction predicts pixel information within a current block based on information of the same picture, to eliminate spatial redundancy. The inter prediction may refer to information of different pictures, and search for motion vector information that best matches with the current block by using motion estimation, to eliminate temporal redundancy. The transform may convert the prediction residual to a frequency domain, to redistribute its energy. Combined with the quantization, information that is not sensitive to the human eye may be removed, to eliminate visual redundancy. The entropy coding may eliminate character redundancy according to probability information of a binary bitstream and a current context model, to generate a bitstream.
With the proliferation of Internet videos and people's increasing demand for video definition, although the existing digital video compression standards may save a lot of video data, it is still necessary to pursue better digital video compression technology to reduce the bandwidth and traffic pressure of digital video transmission.
The following is a summary for the subject matters described in detail herein. This summary is not intended to limit the scope of the claims.
An embodiment of the present disclosure provides a neural network based loop filtering (NNLF) method, which is applied to a filter for NNLF at a decoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes: decoding a residual offset usage flag of a reconstructed picture, where the flag is used to indicate whether residual offset needs to be performed when NNLF is performed on the reconstructed picture; and performing NNLF on the reconstructed picture using a first mode in response to determining, according to the flag, that residual offset does not need to be performed, or performing NNLF on the reconstructed picture using a second mode in response to determining, according to the flag, that residual offset needs to be performed; where the first mode is an NNLF mode in which residual offset is not performed on a residual picture output by the neural network, and the second mode is an NNLF mode in which residual offset is performed on the residual picture.
An embodiment of the present disclosure further provides a neural network based loop filtering method, which is applied to a filter for NNLF at a decoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes: when NNLF is performed on a reconstructed picture including three components input to the neural network, performing following processes for each component of the reconstructed picture: decoding a residual offset usage flag of the component, where the flag is used to indicate whether residual offset needs to be performed when NNLF is performed on the component; and performing NNLF on the component using a first mode in response to determining, according to the flag, that residual offset does not need to be performed, or performing NNLF on the component using a second mode in response to determining, according to the flag, that residual offset needs to be performed; where the first mode is an NNLF mode in which residual offset is not performed on the component of a residual picture output by the neural network, and the second mode is an NNLF mode in which residual offset is performed on the component of the residual picture.
An embodiment of the present disclosure further provides a video decoding method, which is applied to a video decoding apparatus and includes: performing following processes when NNLF is performed on a reconstructed picture: responsive to that NNLF enables residual offset, performing NNLF on the reconstructed picture according to the NNLF methods described in any one of embodiments applied to the filter for NNLF at the decoding side of the present disclosure.
An embodiment of the present disclosure further provides a neural network based loop filtering method, which is applied to a filter for NNLF at an encoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes: inputting a reconstructed picture into the neural network, to obtain a residual picture output by the neural network; calculating a rate distortion cost cost1 of performing NNLF on the reconstructed picture using a first mode and a rate distortion cost cost2 of performing NNLF on the reconstructed picture using a second mode, where the first mode is an NNLF mode in which residual offset is not performed on the residual picture, and the second mode is an NNLF mode in which residual offset is performed on the residual picture; and selecting the first mode or the second mode to perform NNLF on the reconstructed picture according to a comparison between the cost1 and the cost2.
An embodiment of the present disclosure further provides a neural network based loop filtering method, which is applied to a filter for NNLF at an encoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes: inputting a reconstructed picture including three components into the neural network, to obtain a residual picture output by the neural network, and performing following processes for each component of the reconstructed picture: calculating a rate distortion cost cost1 of performing NNLF on the component using a first mode and a rate distortion cost cost2 of performing NNLF on the component using a second mode, and selecting the first mode or the second mode to perform NNLF on the component according to a comparison between the cost1 and the cost2; where the first mode is an NNLF mode in which residual offset is not performed on the component of the residual picture, and the second mode is an NNLF mode in which residual offset is performed on the component of the residual picture.
An embodiment of the present disclosure further provides a video encoding method, which is applied to a video encoding apparatus and includes: performing following processes when NNLF is performed on a reconstructed picture: responsive to that NNLF enables residual offset, performing NNLF on the reconstructed picture according to the NNLF methods described in any one of embodiments applied to the filter for NNLF at the encoding side of the present disclosure; and encoding a residual offset usage flag of the reconstructed picture, to indicate whether residual offset needs to be performed when NNLF is performed on the reconstructed picture.
An embodiment of the present disclosure further provides a bitstream, where the bitstream is generated by the video encoding method described in any one of embodiments of the present disclosure.
An embodiment of the present disclosure further provides a neural network based loop filter, which includes a processor and a memory storing a computer program. The processor, when executing the computer program, is capable of implementing the neural network based loop filtering methods described in any one of embodiments of the present disclosure.
An embodiment of the present disclosure further provides a video decoding apparatus, which includes a processor and a memory storing a computer program. The processor, when executing the computer program, is capable of implementing the video decoding method described in any one of embodiments of the present disclosure.
An embodiment of the present disclosure further provides a video encoding apparatus, which includes a processor and a memory storing a computer program. The processor, when executing the computer program, is capable of implementing the video encoding method described in any one of embodiments of the present disclosure.
An embodiment of the present disclosure further provides a video encoding and decoding system, which includes the video encoding apparatus described in any one of embodiments of the present disclosure and the video decoding apparatus described in any one of embodiments of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer readable storage medium, and the computer readable storage medium stores a computer program. The computer program, when executed by a processor, is capable of implementing the neural network based loop filtering methods described in any one of embodiments of the present disclosure, or implementing the video decoding method described in any one of embodiments of the present disclosure, or implementing the video encoding method described in any one of embodiments of the present disclosure.
Other aspects may be understood upon reading and understanding the drawings and detailed description.
The accompanying drawings are used to provide understanding of the embodiments of the present disclosure, and constitute a part of the description, which is used to illustrate technical solutions of the present disclosure together with the embodiments of the present disclosure, and do not constitute a limitation to the technical solutions of the present disclosure.
Multiple embodiments are described in the present disclosure, however, the description is exemplary rather than restrictive, and it is apparent to those skilled in the art that there may be more embodiments and implementations within the scope of the embodiments described in the present disclosure.
In the description of the present disclosure, the wordings, such as “exemplarily” or “for example” and variations thereof, are used to indicate examples, instances, or illustrations. Any embodiment described in the present disclosure with wordings such as “exemplarily” or “for example” should not be construed as being more preferred or advantageous over other embodiments. The term “and/or” in the present disclosure is just the description for an association relationship between related objects, indicating that there may be three relationships. For example, A and/or B may represent these three situations: A exists alone; A and B exist simultaneously; or B exists alone. Terms “multiple” or “a plurality of” and variations thereof mean two or more than two. In addition, in order to describe the technical solutions of the embodiments of the present disclosure clearly, wordings such as “first” and “second” are adopted to distinguish same or similar items with substantially same functions and effects. Those skilled in the art may understand that wordings such as “first”, “second” do not limit the quantity and the execution order, and the wordings such as “first”, “second” do not limit that corresponding objects must be different.
When describing representative exemplary embodiments, the description may have presented methods and/or processes with a specific sequence of steps. However, to the extent that the methods or processes do not depend on the specific order of steps described herein, the methods or processes should not be limited to the specific order of steps as described. As those of ordinary skill in the art will understand, other orders of steps are also possible. Therefore, the specific order of steps set forth in the description should not be construed as the limitations of the claims. Furthermore, the claims for the methods and/or processes should not be limited to the described orders to perform their steps, and those skilled in the art may readily understand that these orders may be varied, and still remain within the spirit and scope of the embodiments of the present disclosure.
Neural network based loop filtering methods, a video encoding method and a video decoding method of the embodiments of the present disclosure may be applied to various video codec standards, for example, H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), H.266/Versatile Video Coding (VVC), Audio Video coding Standard (AVS), and other standards formulated by Moving Picture Experts Group (MPEG), Alliance for Open Media (AOM) and Joint Video Experts Team (JVET) and extensions of these standards, or any other customized standards.
As illustrated in the drawings, the video encoding apparatus 10 includes the following units.
A partition unit 101, and the partition unit 101 is configured to cooperate with a prediction unit 100, to partition received video data into slices, coding tree units (CTUs) or other larger units. The received video data may be a video sequence including video frames such as I frames, P frames or B frames.
A prediction unit 100, and the prediction unit 100 is configured to partition the CTU into coding units (CUs) and perform intra prediction encoding or inter prediction encoding on the CUs. When intra prediction and inter prediction are performed on a CU, the CU may be partitioned into one or more prediction units (PUs).
The prediction unit 100 includes an inter prediction unit 121 and an intra prediction unit 126.
The inter prediction unit 121 is configured to perform inter prediction on the PU, to generate prediction data for the PU, where the prediction data includes a prediction block of the PU, motion information of the PU and various syntax elements. The inter prediction unit 121 may include a motion estimation (ME) unit and a motion compensation (MC) unit. The motion estimation unit may be configured for motion estimation, to generate a motion vector, and the motion compensation unit may be configured to obtain or generate a prediction block according to the motion vector.
The intra prediction unit 126 is configured to perform intra prediction on the PU, to generate prediction data for the PU. The prediction data for the PU may include a prediction block of the PU and various syntax elements.
A residual generation unit 102 (represented by a circle with a plus sign behind the partition unit 101 in the drawing), and the residual generation unit 102 is configured to subtract the prediction block generated by the prediction unit 100 from the original block of the CU, to obtain a residual block of the CU.
A transform processing unit 104, and the transform processing unit 104 is configured to partition the CU into one or more transform units (TUs), and the partition of the prediction unit and the partition of the transform unit may be different. A residual block associated with a TU is a sub-block obtained by partitioning the residual block of the CU. A coefficient block associated with the TU is generated by applying one or more transforms to the residual block associated with the TU.
A quantization unit 106, and the quantization unit 106 is configured to quantize coefficients in the coefficient block based on a quantization parameter (QP). A quantization degree of the coefficient block may be adjusted by adjusting the QP.
An inverse quantization unit 108 and an inverse transform processing unit 110, and the inverse quantization unit 108 and the inverse transform processing unit 110 are configured to apply inverse quantization and inverse transform to the coefficient block, respectively, to obtain a reconstructed residual block associated with the TU.
A reconstruction unit 112 (represented by a circle with a plus sign behind the inverse transform processing unit 110 in the drawing), and the reconstruction unit 112 is configured to add the reconstructed residual block and the corresponding prediction block generated by the prediction unit 100, to obtain a reconstructed picture.
A filter unit 113, and the filter unit 113 is configured to perform in loop filtering on the reconstructed picture.
A decoded picture buffer 114, and the decoded picture buffer 114 is configured to store the reconstructed picture after in loop filtering. The intra prediction unit 126 may extract a reference picture of a block adjacent to the current block from the decoded picture buffer 114 to perform intra prediction. The inter prediction unit 121 may perform inter prediction on a PU of a current frame picture by using a reference picture of a previous frame buffered by the decoded picture buffer 114.
An entropy coding unit 115, and the entropy coding unit 115 is configured to perform an entropy coding operation on the received data (such as syntax elements, quantized coefficient blocks and motion information), to generate a video bitstream.
In other examples, the video encoding apparatus 10 may include more, fewer or different functional components than those in this example, for example, cancelling the transform processing unit 104 and the inverse transform processing unit 110.
The video decoding apparatus 15 includes the following units. An entropy decoding unit 150, and the entropy decoding unit 150 is configured to perform entropy decoding on a received encoded bitstream, to extract syntax elements, quantized coefficient blocks, motion information of PUs, or the like. A prediction unit 152, an inverse quantization unit 154, an inverse transform processing unit 155, a reconstruction unit 158 and a filter unit 159 may all perform corresponding operations based on the syntax elements extracted from the bitstream.
An inverse quantization unit 154, and the inverse quantization unit 154 is configured to perform inverse quantization on a quantized coefficient block associated with the TU.
An inverse transform processing unit 155, and the inverse transform processing unit 155 is configured to apply one or more inverse transforms to the inverse quantized coefficient block, to generate a reconstructed residual block of the TU.
A prediction unit 152, and the prediction unit 152 includes an inter prediction unit 162 and an intra prediction unit 164. If intra prediction encoding is adopted for the current block, the intra prediction unit 164 determines an intra prediction mode of the PU based on the syntax elements decoded from the bitstream, and performs intra prediction according to reconstructed reference information adjacent to the current block acquired from the decoded picture buffer 160. If inter prediction encoding is adopted for the current block, the inter prediction unit 162 determines a reference block of the current block based on motion information of the current block and the corresponding syntax elements, and performs inter prediction based on the reference block acquired from the decoded picture buffer 160.
A reconstruction unit 158 (represented by a circle with a plus sign behind the inverse transform processing unit 155 in the drawing), and the reconstruction unit 158 is configured to add the reconstructed residual block and the corresponding prediction block generated by the prediction unit 152, to obtain a reconstructed picture.
A filter unit 159, and the filter unit 159 is configured to perform in loop filtering on the reconstructed picture.
A decoded picture buffer 160, and the decoded picture buffer 160 is configured to store the reconstructed picture after in loop filtering as a reference picture for subsequent motion compensation, intra prediction, inter prediction, or the like. The filtered reconstructed picture after in loop filtering may also be output as decoded video data for presentation on a display apparatus.
In other embodiments, the video decoding apparatus 15 may include more, fewer or different functional components, for example, in some cases, the inverse transform processing unit 155 may be cancelled.
Herein, the current block may be a block-level encoding unit such as a current coding tree unit (CTU), a current coding unit (CU), or a current prediction unit (PU) in the current picture.
Based on the above video encoding apparatus and video decoding apparatus, the following basic encoding and decoding processes may be performed. At the encoding side, one frame of the picture is partitioned into blocks, intra prediction or inter prediction or other algorithms are performed on the current block to generate a prediction block of the current block, the prediction block is subtracted from an original block of the current block to obtain a residual block, transform and quantization are performed on the residual block to obtain quantization coefficients, and entropy encoding is performed on the quantization coefficients to generate a bitstream. At the decoding side, intra prediction or inter prediction is performed on the current block to generate a prediction block of the current block. In addition, inverse quantization and inverse transform are performed on quantization coefficients that are obtained by decoding the bitstream, to obtain a residual block. The prediction block is added with the residual block to obtain a reconstructed block, and reconstructed blocks form a reconstructed picture. In loop filtering is performed on the reconstructed picture on a picture basis or a block basis, to obtain a decoded picture. The encoding side also obtains a decoded picture through operations similar to those at the decoding side, and the decoded picture obtained at the encoding side is also referred to as a reconstructed picture after in loop filtering. The decoded picture after in loop filtering may be used as a reference frame for inter prediction of subsequent frames. Block partition information, mode information (such as prediction, transform, quantization, entropy coding or in loop filtering) and parameter information determined at the encoding side may be written into the bitstream. The decoding side determines the block partition information, mode information (such as prediction, transform, quantization, entropy coding or in loop filtering) and parameter information used at the encoding side through decoding the bitstream or performing analysis according to set information, thereby ensuring that the decoded picture obtained at the encoding side is the same as the decoded picture obtained at the decoding side.
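As an illustration of the data flow described above, the following is a minimal sketch (not the actual reference software), in which prediction, transform, quantization and entropy coding are replaced by trivial stand-ins; the helper names and the scalar quantization step are assumptions made purely for illustration.

```python
import numpy as np

def encode_block(orig_block, pred_block, step):
    # Residual = original - prediction; "transform + quantization" is replaced
    # by a simple scalar quantizer so that the round trip is easy to follow.
    residual = orig_block.astype(np.int32) - pred_block.astype(np.int32)
    coeff = np.round(residual / step).astype(np.int32)      # would be entropy coded into the bitstream
    recon = pred_block.astype(np.int32) + coeff * step      # encoder-side reconstruction
    return coeff, recon

def decode_block(coeff, pred_block, step):
    # Inverse quantization/transform stand-in, then add the prediction back.
    return pred_block.astype(np.int32) + coeff * step

orig = np.random.randint(0, 256, (8, 8))
pred = np.full((8, 8), 128)
coeff, recon_enc = encode_block(orig, pred, step=4)
recon_dec = decode_block(coeff, pred, step=4)
assert np.array_equal(recon_enc, recon_dec)   # encoder and decoder reconstructions stay in sync
```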
Although a block-based hybrid encoding framework is taken as an example above, the embodiments of the present disclosure are not limited thereto. With the development of technology, one or more modules in the framework and one or more steps in the process may be replaced or optimized.
The embodiments of the present disclosure relate to, but are not limited to, filter units (the filter unit may also be referred to as an in loop filtering module) and corresponding in loop filtering methods at the above encoding side and decoding side.
In an embodiment, the filter units at the encoding side and the decoding side include tools such as a deblocking filter (DBF) 20, a sample adaptive offset (SAO) filter 22 and an adaptive loop filter (ALF) 26. A neural network based loop filter (NNLF) 24 is also included between the SAO and the ALF, as illustrated in
In an exemplary embodiment, a neural network based loop filtering (NNLF) solution is provided, which is denoted as NNLF1, and the model used adopts a filter network illustrated in the drawings. There is a skip connection branch between the reconstructed picture input to the filter for NNLF1 and the filtered picture output by the filter for NNLF1.
One model is used for NNLF1 to perform filtering on the YUV components of the reconstructed picture (rec_YUV), to output the YUV components of the filtered picture (out_YUV), as illustrated in
In another exemplary embodiment, another NNLF solution is provided and denoted as NNLF2. Two models are used for NNLF2, in which one model is used to perform filtering on the luma component of the reconstructed picture, and the other model is used to perform filtering on the two chroma components of the reconstructed picture. The two models may adopt the same filter network, and there is also a skip connection branch between the reconstructed picture input to the filter for NNLF2 and the filtered picture output by the filter for NNLF2. As illustrated in
A first model of NNLF2 used to perform filtering on the luma component of the reconstructed picture is illustrated in
The above NNLF1 and NNLF2 solutions in neural network based video coding (NNVC) may be implemented by neural network based common software (NCS), in which the NCS is taken as a baseline tool in the reference software testing platform of NNVC, that is, the baseline NNLF.
In the field of deep learning, the concept of residual learning has been proposed, which enables a network to focus on learning residual information of a picture through a simple skip connection structure from an input to an output, so as to improve learning ability and prediction performance of the network. A basic structure of a residual network (ResNet) is illustrated in
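As a brief illustration of this residual learning structure, the following is a minimal PyTorch-style sketch (the layer sizes are arbitrary assumptions); the network body learns the residual F(x) and the skip connection adds the input back, so the block outputs x + F(x).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # The "body" only needs to learn the residual information F(x).
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # skip connection from the input to the output

x = torch.randn(1, 64, 16, 16)
print(ResidualBlock()(x).shape)   # torch.Size([1, 64, 16, 16])
```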
In video encoding, inter prediction technology enables a current frame to refer to picture information of a previous frame, so as to improve the encoding performance; however, the encoding effect of the previous frame will also affect that of subsequent frames. In the NNLF1 and NNLF2 solutions, in order to enable the filter network to adapt to the influence of the inter prediction technology, the training process of the model includes an initial training stage and an iterative training stage, and adopts a multi-round training manner. In the initial training stage, the model to be trained has not been deployed in the coder yet, and the first round of training is performed on the model by using collected sample data of the reconstructed picture, to obtain a model after the first round of training. In the iterative training stage, a model obtained from the previous round of training has been deployed in the coder. Firstly, the model after the first round of training is deployed in the coder, the sample data of the reconstructed picture is recollected, and the second round of training is performed on the model after the first round of training to obtain the model after the second round of training. Then, the model after the second round of training is deployed in the coder, the sample data of the reconstructed picture is recollected, and the third round of training is performed on the model after the second round of training to obtain the model after the third round of training; and so on iteratively. Finally, encoding testing is performed on the models after each round of training on the validation set, to find the model with the best encoding performance for actual deployment.
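The multi-round procedure described above may be summarized by the following sketch; train, deploy_and_collect_samples and evaluate_on_validation_set are hypothetical helpers standing in for the actual training and testing pipeline.

```python
def multi_round_training(initial_samples, num_rounds, train,
                         deploy_and_collect_samples, evaluate_on_validation_set):
    """Initial training on pre-collected samples, then iterative redeploy/retrain."""
    models = []
    # Initial training stage: no model is deployed in the coder yet.
    model = train(model=None, samples=initial_samples)
    models.append(model)
    # Iterative training stage: deploy the latest model, recollect samples, retrain.
    for _ in range(num_rounds - 1):
        samples = deploy_and_collect_samples(model)
        model = train(model=model, samples=samples)
        models.append(model)
    # Encoding testing on the validation set decides which model is actually deployed
    # (a higher score is assumed to mean better encoding performance).
    return max(models, key=evaluate_on_validation_set)
```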
However, with the multi-round training operation, there is still a certain lag between training and encoding testing. The analysis is as follows: a schematic diagram of the (N+1)-th round of training is illustrated in the drawings, in which the sample data of the reconstructed picture used for training the model (model_N+1) is generated by the coder in which the model after the N-th round of training (model_N) is deployed.
When encoding testing is performed on the model_N+1, the model (model_N+1) is deployed in the encoder or decoder. As illustrated in the drawings, the reconstructed picture input to model_N+1 during testing is affected, through inter prediction, by pictures that have already been filtered by model_N+1 itself, which is different from the training stage in which the sample data is generated by the coder with model_N deployed. Because of this lag between training and testing, the residual of the residual picture output by the neural network may be relatively large, and appropriately reducing the residual may improve the encoding performance.
Herein, for the residual value of the residual picture, that the residual of the residual picture becomes smaller by performing residual offset means that the residual value in the residual picture is closer to 0, that is, the absolute value of the residual value becomes smaller, for example, 3 becomes 2 and −3 becomes −2, but does not mean a change such as −3 becoming −4. The residual becoming smaller is for the residual picture as a whole, which may be that the absolute values of the residual values of some pixels become smaller and the residual values of other pixels remain unchanged, or may be that the absolute values of all non-zero residual values become smaller while the zero-value residual values do not change, or may be that the absolute values of only some non-zero residual values become smaller. For example, the residual values in the value intervals [1, 2] and [−2, −1] may remain unchanged, while the absolute values of the residual values greater than or equal to 3 or less than or equal to −3 may become smaller.
An embodiment of the present disclosure provides a neural network based loop filtering method, which is applied to a filter for NNLF at an encoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, as illustrated in
In S110, a reconstructed picture is input into the neural network, to obtain a residual picture output by the neural network.
In S120, a rate distortion cost cost1 of performing NNLF on the reconstructed picture using a first mode is calculated, and a rate distortion cost cost2 of performing NNLF on the reconstructed picture using a second mode is calculated.
The first mode is an NNLF mode in which residual offset is not performed on the residual picture, and the second mode is an NNLF mode in which residual offset is performed on the residual picture. A sketch of this mode decision is provided after S130.
In S130, the first mode is selected to perform NNLF on the reconstructed picture responsive to that the cost1 is less than the cost2 (i.e., cost1<cost2); the second mode is selected to perform NNLF on the reconstructed picture responsive to that the cost2 is less than the cost1 (i.e., cost2<cost1); or the first mode or the second mode is selected to perform NNLF on the reconstructed picture responsive to that the cost1 is equal to the cost2 (i.e., cost1=cost2).
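A minimal sketch of the decision in S110 to S130 is given below, where the distortion is measured as a sum of squared differences (SSD) for brevity and offset_fn is a placeholder for one set residual offset mode; the actual cost computation (for example, the component weighting) is described in the following embodiments.

```python
import numpy as np

def ssd(a, b):
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.sum(d * d))

def decide_nnlf_mode(rec, residual, orig, offset_fn):
    out1 = rec + residual                    # first mode: residual added as output by the network
    out2 = rec + offset_fn(residual)         # second mode: residual offset applied first
    cost1, cost2 = ssd(out1, orig), ssd(out2, orig)
    # cost1 < cost2 -> first mode; cost2 < cost1 -> second mode; ties may pick either.
    return ("first", out1) if cost1 <= cost2 else ("second", out2)
```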
In the neural network based loop filtering method of the embodiment, the encoding side may select a mode with the lower rate distortion cost from the mode with residual offset and the mode without residual offset to perform NNLF, to compensate for the performance loss caused by the lag in training of NNLF mode relative to encoding testing to a certain extent, thereby improving the effect of NNLF and enhancing the encoding performance.
Unless otherwise limited, residual offset herein refers to residual offset performed on the residual picture when neural network based loop filtering is performed on the reconstructed picture.
In an exemplary embodiment of the present disclosure, the reconstructed picture is a reconstructed picture of a current frame or a current slice or a current block, and may also be a reconstructed picture of other coding units. Herein, the reconstructed picture for performing NNLF may be coding units at different levels, such as picture-level (including frame and slice), or block-level.
In an exemplary embodiment of the present disclosure, a residual of the residual picture becomes smaller by performing residual offset.
In an exemplary embodiment of the present disclosure, the operation that the rate distortion cost cost1 of performing NNLF on the reconstructed picture using the first mode is calculated includes: adding the residual picture and the reconstructed picture, to obtain a first filtered picture; and calculating the cost1 according to a difference between the first filtered picture and a corresponding original picture. In a case where both the reconstructed picture and the residual picture include three components, such as Y component, U component and V component, the cost1 may be obtained by calculating sum of squared differences (SSDs) between the first filtered picture and the original picture on the three components and performing weighted sum on the SSDs on the three components. In the present embodiment, the operation that the first mode is selected to perform NNLF on the reconstructed picture includes: taking the first filtered picture obtained by adding the residual picture and the reconstructed picture as a filtered picture that is output after NNLF is performed on the reconstructed picture. In the present embodiment, when NNLF is performed on the reconstructed picture using the first mode, the above filtering method of NNLF1 or NNLF2 may be adopted, or other filtering methods that do not perform residual offset on the residual picture may also be adopted.
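A sketch of this weighted-SSD computation is given below; the weights are placeholders chosen only to show that the luma component may be weighted more heavily, not values specified by the present disclosure.

```python
import numpy as np

def weighted_ssd_cost(filtered_yuv, orig_yuv, weights=(0.75, 0.125, 0.125)):
    # Sum of squared differences per component, combined by a weighted sum.
    cost = 0.0
    for filt, orig, w in zip(filtered_yuv, orig_yuv, weights):
        diff = filt.astype(np.int64) - orig.astype(np.int64)
        cost += w * float(np.sum(diff * diff))
    return cost
```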
In an exemplary embodiment of the present disclosure, the operation that the rate distortion cost cost2 of performing NNLF on the reconstructed picture using the second mode is calculated includes: performing residual offset on the residual picture according to each of one or more set residual offset modes and adding with the reconstructed picture, to obtain a respective second filtered picture; calculating a respective rate distortion cost according to a difference between the respective second filtered picture and a corresponding original picture; and taking a minimum of the calculated rate distortion costs as the cost2.
In an example of the present embodiment, in a case where there is one set residual offset mode, one rate distortion cost is calculated, and the rate distortion cost is the cost2. In another example of the present embodiment, in a case where there are multiple set residual offset modes, for example, two set residual offset modes, two rate distortion costs are calculated, and the minimum rate distortion cost of the two rate distortion costs is the cost2.
In an example of the present embodiment, the operation of performing residual offset on the residual picture and adding with the reconstructed picture may be the operation that the result obtained by performing residual offset on the residual picture is added with the reconstructed picture. The residual offset mode is that, for example, 1 is subtracted from positive residual values in the residual picture, and 1 is added to negative residual values in the residual picture, that is, the residual value with a value of 0 is not offset (adjusted), so that the residual in the residual picture becomes smaller as a whole. However, in the specific implementation, it is not necessary to calculate in this order, for example, it is also possible to add the residual picture and the reconstructed picture firstly, in which the picture obtained by the addition also includes the residual picture, and then perform residual offset on the residual picture. Taking any pixel in the picture as an example, it is assumed that the value of the pixel in the residual picture (i.e. the residual value) is x, the value of the pixel in the reconstructed picture (i.e. the reconstructed value) is y, and 1 is subtracted from the residual value of the pixel by performing residual offset. When the value of the pixel in the second filtered picture is calculated, the result is the same either by the operation of first subtracting 1 from x and then adding y or by the operation of first adding x to y and then subtracting 1. In other embodiments of the present disclosure, including the embodiments at the decoding side, the specific implementation of performing residual offset on the residual picture (or its components) and adding with the reconstructed picture (or its components) is also the same.
In an example of the present embodiment, both the reconstructed picture and the residual picture include three components, and the operation of calculating the respective rate distortion cost according to the difference between the respective second filtered picture and the original picture includes: calculating SSDs between the respective second filtered picture and the original picture on the three components, and performing weighted sum on the SSDs on the three components, to obtain the respective rate distortion cost.
In an exemplary embodiment of the present disclosure, the set residual offset modes include one or more of the following types; illustrative sketches of these types are provided after the descriptions below.
A fixed value is added to or subtracted from a non-zero residual value in the residual picture, to make an absolute value of the non-zero residual value smaller. For example, 1 is subtracted from the positive residual values in the residual picture, and 1 is added to the negative residual values in the residual picture.
According to an interval in which a non-zero residual value in the residual picture is located, an offset value corresponding to the interval is added to or subtracted from the non-zero residual value, to make an absolute value of the non-zero residual value smaller; where there may be multiple intervals, and the larger the values in an interval, the larger the offset value corresponding to the interval. For example, 1 is subtracted from the residual values in the residual picture that are greater than or equal to 1 and less than or equal to 5, 2 is subtracted from the residual values that are greater than 5, 1 is added to the residual values that are less than or equal to −1 and greater than or equal to −5, and 2 is added to the residual values that are less than −5.
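The two types listed above may be sketched as follows; the offset value of 1 and the interval boundaries 5 and −5 are taken from the examples in the text, while everything else is illustrative.

```python
import numpy as np

def fixed_offset(residual, value=1):
    # Subtract `value` from positive residuals and add it to negative residuals;
    # zero residuals are left untouched, so every non-zero residual moves toward 0.
    out = residual.copy()
    out[residual > 0] -= value
    out[residual < 0] += value
    return out

def interval_offset(residual):
    # Larger residual magnitudes get a larger offset (example intervals from the text).
    out = residual.copy()
    out[(residual >= 1) & (residual <= 5)] -= 1
    out[residual > 5] -= 2
    out[(residual <= -1) & (residual >= -5)] += 1
    out[residual < -5] += 2
    return out

res = np.array([-7, -3, 0, 2, 6])
print(fixed_offset(res))     # [-6 -2  0  1  5]
print(interval_offset(res))  # [-5 -2  0  1  4]
```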
In the above embodiment, the encoding performance is improved by offsetting the residual information output by the filter network. As mentioned above, the residual of the residual picture output by the neural network may be relatively large, and the residual is reduced by performing residual offset, so as to improve the encoding performance. For one residual picture, the residual value of each pixel in the residual picture may be positive or negative; when reducing the residual, the fixed value (a positive number) may be subtracted from the positive residual values and added to the negative residual values, and the residual value with a value of 0 is not offset (adjusted), so that the residual values become smaller as a whole, that is, closer to 0, as illustrated in the drawings.
In addition to adopting the residual offset mode with the fixed value, other types of residual offset modes may also be adopted. For example, the residual values are segmented according to their magnitude, and offset operations with different precisions are attempted. For example, for a residual value with a larger absolute value, an offset value with a larger absolute value is set; and for a residual value with a smaller absolute value, an offset value with a smaller absolute value is set.
An example pseudo code is as follows.
It is assumed that the residual value corresponding to the current pixel of the current frame is res, and the offset value that needs to be decided is RO_FACTOR. The specific strategy for deriving the offset value is as follows:
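Since the pseudo code itself is not reproduced here, the following is a hedged reconstruction of such a derivation; the interval boundaries being ordered (x1 < x2, y1 > y2), the comparison operators, and the sign convention that RO_FACTOR is added to res are all assumptions.

```python
def derive_ro_factor(res, x, y, a, b):
    # res: residual value of the current pixel; returns the signed offset RO_FACTOR
    # such that res + RO_FACTOR moves the residual toward 0.
    x1, x2, x3 = x          # boundaries of the three positive intervals
    y1, y2, y3 = y          # boundaries of the three negative intervals
    a1, a2, a3 = a          # candidate offsets for positive residuals
    b1, b2, b3 = b          # candidate offsets for negative residuals
    if res == 0:
        return 0            # zero-value residuals are not offset
    if res > 0:
        if res <= x1:
            return -a1
        if res <= x2:
            return -a2
        return -a3          # third positive interval (up to x3 and beyond)
    if res >= y1:
        return b1
    if res >= y2:
        return b2
    return b3               # third negative interval
```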
Where {x1, x2, x3} represent positive residual values, {y1, y2, y3} represent negative residual values, and {a1, a2, a3} and {b1, b2, b3} are preset candidate fixed values.
The above solutions do not offset (adjust) the 0-value residual values, but search for the interval into which the non-zero residual values fall (a total of 6 intervals are set), to determine the offset (adjust) value to be used.
In the present embodiment, the multiple set residual offset modes may include one type of residual offset mode, or may include multiple types of residual offset modes.
In the above embodiment, the three components in the residual picture are offset uniformly by using the same residual offset mode. Performing residual offset on the residual picture by using this residual offset mode gives an overall optimal result under the premise of uniformly offsetting the three components. However, the residual offset mode is not necessarily an optimal residual offset mode for a specific component in the residual picture. In this regard, whether to perform residual offset and which residual offset mode to use may be determined separately for each component, so as to further optimize the encoding performance.
An embodiment of the present disclosure provides a neural network based loop filtering method, which is applied to a filter for NNLF at an encoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, as illustrated in
In S210, a reconstructed picture is input into the neural network, to obtain a residual picture output by the neural network; where both the reconstructed picture and the residual picture include three components, such as Y component, U component and V component.
In S220, following processes are performed for each component of the reconstructed picture and the residual picture, where the processes may be referred to as mode selection processes; a sketch of the mode selection processes is provided after these steps.
A rate distortion cost cost1 of performing NNLF on the component of the reconstructed picture using a first mode is calculated, and a rate distortion cost cost2 of performing NNLF on the component of the reconstructed picture using a second mode is calculated, where the first mode is an NNLF mode in which residual offset is not performed on the component of the residual picture, and the second mode is an NNLF mode in which residual offset is performed on the component of the residual picture.
The first mode is selected to perform NNLF on the component of the reconstructed picture responsive to that the cost1 is less than the cost2 (i.e., cost1<cost2); the second mode is selected to perform NNLF on the component of the reconstructed picture responsive to that the cost2 is less than the cost1 (i.e., cost2<cost1); or the first mode or the second mode is selected to perform NNLF on the component of the reconstructed picture responsive to that the cost1 is equal to the cost2 (i.e., cost1=cost2).
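A compact sketch of these mode selection processes is given below, with SSD as the distortion measure and offset_fn standing in for one set residual offset mode for the component; the roflag values follow the convention used later in the present disclosure (0: no residual offset, 1: residual offset).

```python
import numpy as np

def ssd(a, b):
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.sum(d * d))

def per_component_mode_selection(rec_yuv, res_yuv, orig_yuv, offset_fn):
    roflags, filtered = [], []
    for rec, res, orig in zip(rec_yuv, res_yuv, orig_yuv):   # Y, U, V components
        out1 = rec + res                # first mode for this component
        out2 = rec + offset_fn(res)     # second mode for this component
        if ssd(out1, orig) <= ssd(out2, orig):
            roflags.append(0)           # no residual offset for this component
            filtered.append(out1)
        else:
            roflags.append(1)           # residual offset for this component
            filtered.append(out2)
    return roflags, filtered
```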
In the present embodiment, the mode selection processes may be performed on each component separately, so that the encoding performance may be further optimized based on the above embodiment in which the three components are uniformly offset. Since the relevant operations are performed at the output side of the filter for NNLF, the influence on the computational complexity is not significant.
In an exemplary embodiment of the present disclosure, the reconstructed picture is a reconstructed picture of a current frame or a current slice or a current block.
In an exemplary embodiment of the present disclosure, a residual in the component of the residual picture becomes smaller by performing residual offset.
In an exemplary embodiment of the present disclosure, the operation that the rate distortion cost cost1 of performing NNLF on the component of the reconstructed picture using the first mode is calculated includes: adding the component of the residual picture and the component of the reconstructed picture, to obtain the filtered component; and calculating the cost1 according to a difference between the filtered component and the component of a corresponding original picture. In an example, the difference is represented by SSD, that is, the SSD between the filtered component and the component of the corresponding original picture is taken as the cost1. In other examples, the difference may also be represented as mean square error (MSE), mean absolute error (MAE) and other indicators, which is not limited in the present disclosure. Other embodiments of the present disclosure are the same.
In an example of the embodiment, the operation that the first mode is selected to perform NNLF on the component of the reconstructed picture includes: taking the filtered component obtained by adding the component of the residual picture and the component of the reconstructed picture as the component of a filtered picture that is output after NNLF is performed on the reconstructed picture. In the present embodiment, the component of the filtered picture is obtained by performing NNLF on the component of the reconstructed picture according to the first mode, and no residual offset is performed on the component of the residual picture. Therefore, the component of the filtered picture may be the same as the component of the filtered picture obtained by the NNLF solution that no residual offset is performed (such as NNLF1 or NNLF2).
In an exemplary embodiment of the present disclosure, the operation that the rate distortion cost cost2 of performing NNLF on the component of the reconstructed picture using the second mode is calculated includes: performing residual offset on the component of the residual picture according to each of one or more set residual offset modes for the component and adding with the component of the reconstructed picture, to obtain a respective second filtered component; calculating a respective rate distortion cost according to a difference between the respective second filtered component and the component of a corresponding original picture; and taking a minimum of the calculated rate distortion costs as the cost2.
In an example of the present embodiment, the operation that the second mode is selected to perform NNLF on the component of the reconstructed picture includes: taking the second filtered component obtained by performing residual offset on the component of the residual picture according to the residual offset mode with the minimum rate distortion cost and adding with the component of the reconstructed picture, as the component of a filtered picture that is output after NNLF is performed on the reconstructed picture.
In an example of the present embodiment, the operation of performing residual offset on the component of the residual picture and adding with the component of the reconstructed picture may be the operation that the result obtained by performing residual offset on the component of the residual picture is added with the component of the reconstructed picture. However, in the specific implementation, it is not necessary to calculate in this order.
In an example of the present embodiment, set residual offset modes for the three components are the same or different. For example, the set residual offset mode for the Y component is to subtract 1 from the positive residual value and add 1 to the negative residual value in the residual picture; and the set residual offset mode for the U component and the V component is to subtract 1 from the residual value greater than 2 and add 1 to the residual value less than −2 in the residual picture.
In an example of the embodiment, set residual offset modes for at least one of the three components include one or more of following types: a fixed value is added to or subtracted from a non-zero residual value in the component of the residual picture, to make an absolute value of the non-zero residual value smaller; or, according to an interval in which a non-zero residual value in the component of the residual picture is located, an offset value corresponding to the interval is added to or subtracted from the non-zero residual value, to make an absolute value of the non-zero residual value smaller, where there may be multiple intervals, and the larger the values in an interval, the larger the offset value corresponding to the interval.
An embodiment of the present disclosure further provides a video encoding method, which is applied to a video encoding apparatus and includes: performing following processes when NNLF is performed on a reconstructed picture, as illustrated in
In S310, responsive to that NNLF enables residual offset, NNLF is performed on the reconstructed picture according to the NNLF methods described in any one of embodiments applied to the filter for NNLF at the encoding side of the present disclosure.
In S320, a residual offset usage flag of the reconstructed picture is encoded, to indicate whether residual offset needs to be performed when NNLF is performed on the reconstructed picture.
In the embodiments of the present disclosure, when neural network based loop filtering is performed on the reconstructed picture, the residual picture may be offset (adjusted) or not offset (adjusted) according to the rate distortion cost, so as to compensate for the performance loss caused by the lag in training of NNLF mode relative to encoding testing, thereby improving the encoding performance.
In an exemplary embodiment of the present disclosure, the residual offset usage flag is a picture-level syntax element or a block-level syntax element.
In an exemplary embodiment of the present disclosure, it is determined that NNLF enables residual offset responsive to that one or more of following conditions are met: a residual offset enabled flag, for example, a sequence-level residual offset enabled flag, indicates that residual offset is enabled for NNLF.
In addition to the above conditions, other conditions may also be added. For example, a condition that a frame in which the input reconstructed picture is located is an inter encoded frame is taken as a necessary condition for NNLF to allow residual offset, or the like.
In other embodiments of the present disclosure, residual offset of NNLF may also be enabled all the time. In this case, there is no need to determine whether residual offset is enabled for NNLF through a flag, and the default is that NNLF enables residual offset.
In an exemplary embodiment of the present disclosure, the method further includes: responsive to that it is determined that NNLF disables residual offset, skipping encoding of the residual offset usage flag, and adding the reconstructed picture input into the neural network to the residual picture output by the neural network, to obtain a filtered picture that is output after NNLF is performed on the reconstructed picture. That is, at this time, filtering on the reconstructed picture may be implemented using NNLF without residual offset.
In an exemplary embodiment of the present disclosure, the method is to perform NNLF on the reconstructed picture according to any of the above embodiments of uniformly performing residual offset on the three components of the residual picture. A number of residual offset usage flags (roflag) of the reconstructed picture is one; responsive to that the first mode is selected to perform NNLF on the reconstructed picture, the roflag is set to a value (e.g., 0) indicating that residual offset does not need to be performed; and responsive to that the second mode is selected to perform NNLF on the reconstructed picture, the roflag is set to a value (e.g., 1) indicating that residual offset needs to be performed.
In an example of the present embodiment, the method further includes: responsive to that the roflag is set as the value indicating that residual offset needs to be performed and there are multiple set residual offset modes, continuing to encode an index of residual offset mode of the reconstructed picture, and the index of the residual offset mode is used to indicate a residual offset mode to be based when residual offset is performed. For example, when there are three set residual offset modes, the index of residual offset mode may be a 2-bit flag. The value of the flag being 0, 1 or 2 represents the three residual offset modes, respectively. The correspondence between the value of the flag and the residual offset mode is agreed in advance (for example, defined in the standard or protocol) at the encoding side and the decoding side.
In the present embodiment, two flags (i.e., the residual offset usage flag and the index of residual offset mode) are adopted to indicate whether residual offset needs to be performed and the residual offset mode to be based when residual offset is performed (when there are the multiple set residual offset modes), respectively. However, in another exemplary embodiment of the present disclosure, in a case where there are the multiple set residual offset modes, the residual offset usage flag is also used to indicate the residual offset mode to be based when residual offset is performed, that is, in the present embodiment, the residual offset usage flag is used to simultaneously indicate whether residual offset needs to be performed and the residual offset mode to be based when residual offset is performed. For example, in a case where there are three set residual offset modes, a 2-bit residual offset usage flag (roflag) is used, in which the four values of the roflag may indicate that residual offset does not need to be performed, residual offset is performed using the first residual offset mode, residual offset is performed using the second residual offset mode, and residual offset is performed using the third residual offset mode, respectively.
In a case where there are three set residual offset modes and residual offset does not need to be performed, the embodiment using two flags only needs to encode one 1-bit flag, that is, the residual offset usage flag, and does not need to encode the index of residual offset mode, while the embodiment using one flag needs to encode one 2-bit residual offset usage flag. In a case where there are three set residual offset modes and residual offset needs to be performed, the embodiment using two flags needs to encode one 1-bit residual offset usage flag and one 2-bit index of residual offset mode, while the embodiment using one flag needs to encode one 2-bit residual offset usage flag.
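The bit costs compared above can be made concrete with the following sketch; the bit-writing helpers and the fixed-length 2-bit coding of the index are assumptions used only for illustration.

```python
def write_two_flag_scheme(bits, use_offset, mode_index=None):
    bits.append('1' if use_offset else '0')        # 1-bit residual offset usage flag
    if use_offset:
        bits.extend(format(mode_index, '02b'))     # 2-bit index of residual offset mode

def write_one_flag_scheme(bits, use_offset, mode_index=None):
    value = 0 if not use_offset else mode_index + 1
    bits.extend(format(value, '02b'))              # one 2-bit flag carries both pieces of information

b_two, b_one = [], []
write_two_flag_scheme(b_two, use_offset=True, mode_index=2)   # 3 bits in total
write_one_flag_scheme(b_one, use_offset=True, mode_index=2)   # 2 bits in total
print(len(b_two), len(b_one))                                  # 3 2
```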
In an exemplary embodiment of the present disclosure, the method is to perform NNLF on the reconstructed picture according to any of the above embodiments of respectively performing residual offset on the three components of the residual picture. A number of residual offset usage flags roflag(j) of the reconstructed picture is three, j being 1, 2 and 3, and the roflag(j) is used to indicate whether residual offset needs to be performed when NNLF is performed on a j-th component of the reconstructed picture; responsive to that the first mode is selected to perform NNLF on the j-th component of the reconstructed picture, the roflag(j) is set to a value (e.g., 0) indicating that residual offset does not need to be performed; and responsive to that the second mode is selected to perform NNLF on the j-th component of the reconstructed picture, the roflag(j) is set to a value (e.g., 1) indicating that residual offset needs to be performed.
In an example of the embodiment, the method further includes: responsive to that the roflag(j) is set to the value indicating that residual offset needs to be performed and there are multiple set residual offset modes for the j-th component, continuing to encode an index of residual offset mode index(j) of the j-th component of the reconstructed picture, to indicate a residual offset mode to be based when residual offset is performed on the j-th component of the residual picture.
In another exemplary embodiment of the present disclosure, in a case where there are the multiple set residual offset modes for the j-th component, the residual offset usage flag of the j-th component is also used to indicate the residual offset mode to be based when residual offset is performed, that is, in the present embodiment, the residual offset usage flag of the j-th component is used to simultaneously indicate whether residual offset needs to be performed and the residual offset mode to be based when residual offset is performed.
An embodiment of the present disclosure further provides a neural network based loop filtering method, which is applied to a filter for NNLF at a decoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, as illustrated in
In S410, a residual offset usage flag roflag of a reconstructed picture is decoded, where the roflag is used to indicate whether residual offset needs to be performed when NNLF is performed on the reconstructed picture.
In S420, NNLF is performed on the reconstructed picture using a first mode in response to determining, according to the roflag, that residual offset does not need to be performed, or NNLF is performed on the reconstructed picture using a second mode in response to determining, according to the roflag, that residual offset needs to be performed.
Where the first mode is an NNLF mode in which residual offset is not performed on a residual picture output by the neural network, and the second mode is an NNLF mode in which residual offset is performed on the residual picture; a sketch of this decoding-side processing is provided below.
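A minimal sketch of the decoding-side behaviour in S410 and S420 is given below; offset_fn stands in for the residual offset mode that is either agreed in advance or indicated by a further index, as described in the following embodiments.

```python
def apply_nnlf(rec, residual, roflag, offset_fn):
    # roflag decoded from the bitstream: 0 -> first mode, 1 -> second mode (assumed values).
    if roflag == 0:
        return rec + residual               # first mode: no residual offset
    return rec + offset_fn(residual)        # second mode: offset the residual, then add
```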
In the present embodiment, the neural network based loop filtering method selects a better mode from two modes with and without residual offset by decoding the residual offset usage flag, which may enhance the filtering effect of NNLF and improve the quality of the decoded picture.
In an exemplary embodiment of the present disclosure, the roflag may adopt a 1-bit flag, and whether residual offset needs to be performed may be indicated according to the value of the roflag. For example, when the value of the roflag is 1, it is determined that residual offset needs to be performed, and when the value of roflag is 0, it is determined that residual offset does not need to be performed. Other embodiments of the present disclosure are the same.
In an exemplary embodiment of the present disclosure, a residual of the residual picture becomes smaller by performing residual offset.
In an exemplary embodiment of the present disclosure, the reconstructed picture is a reconstructed picture of a current frame or a current slice or a current block.
In an exemplary embodiment of the present disclosure, the residual offset usage flag is a picture-level syntax element or a block-level syntax element.
In an exemplary embodiment of the present disclosure, the operation that NNLF is performed on the reconstructed picture using the first mode includes: adding the residual picture output by the neural network and a reconstructed picture input into the neural network, to obtain a filtered picture that is output after NNLF is performed on the reconstructed picture. The operation that NNLF is performed on the reconstructed picture using the second mode includes: performing residual offset on the residual picture according to one of set residual offset modes, and adding with the reconstructed picture, to obtain a filtered picture that is output after NNLF is performed on the reconstructed picture.
In an example of the present embodiment, in a case where there are multiple set residual offset modes, the operation of performing residual offset on the residual picture according to one of the set residual offset modes includes: continuing to decode an index of residual offset mode of the reconstructed picture, where the index is used to indicate a residual offset mode to be based when residual offset is performed; and performing residual offset on the residual picture according to a residual offset mode indicated by the index.
In the present embodiment, two flags are used to indicate whether residual offset needs to be performed and the residual offset mode to be based when residual offset is performed, respectively. In another exemplary embodiment, in a case where there are the multiple set residual offset modes, the encoding side uses one flag, that is, the residual offset usage flag (roflag), to simultaneously indicate whether residual offset needs to be performed and the residual offset mode to be based when residual offset is performed. At this time, the decoding side continues to determine the residual offset mode to be based when residual offset is performed according to the roflag of the reconstructed picture, and performs residual offset on the residual picture according to the determined residual offset mode.
In an exemplary embodiment of the present disclosure, the set residual offset modes include one or more of the following types: a fixed value is added to or subtracted from a non-zero residual value in the residual picture, to make an absolute value of the non-zero residual value smaller; or, according to an interval in which a non-zero residual value in the residual picture is located, an offset value corresponding to the interval is added to or subtracted from the non-zero residual value, to make an absolute value of the non-zero residual value smaller, where there may be multiple intervals, and the larger the values in an interval, the larger the offset value corresponding to the interval.
An embodiment of the present disclosure further provides a neural network based loop filtering (NNLF) method, which is applied to a filter for NNLF at a decoding side. The filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes: as illustrated in
In S510, a residual offset usage flag roflag of the component of the reconstructed picture is decoded, where the roflag is used to indicate whether residual offset needs to be performed when NNLF is performed on the component of the reconstructed picture.
In S520, NNLF is performed on the component of the reconstructed picture using a first mode in response to determining, according to the roflag, that residual offset does not need to be performed, or NNLF is performed on the component of the reconstructed picture using a second mode in response to determining, according to the roflag, that residual offset needs to be performed.
Where the first mode is an NNLF mode in which residual offset is not performed on the component of a residual picture output by the neural network, and the second mode is an NNLF mode in which residual offset is performed on the component of the residual picture.
According to the neural network based loop filtering method of the present embodiment, a better mode for each component is selected from the two NNLF modes with and without residual offset to perform NNLF on each component by decoding the residual offset usage flag, which may further enhance the filtering effect of NNLF and improve the quality of the decoded picture, compared to the uniform mode selection for the multiple components.
In an exemplary embodiment of the present disclosure, a residual in the component of the residual picture becomes smaller by performing residual offset.
In an exemplary embodiment of the present disclosure, the reconstructed picture is a reconstructed picture of a current frame or a current slice or a current block.
In an exemplary embodiment of the present disclosure, the residual offset usage flag is a picture-level syntax element or a block-level syntax element.
In an exemplary embodiment of the present disclosure, the operation that NNLF is performed on the component of the reconstructed picture using the first mode includes: adding the component of the residual picture and the component of the reconstructed picture, to obtain the component of a filtered picture that is output after NNLF is performed on the reconstructed picture.
The operation that NNLF is performed on the component of the reconstructed picture using the second mode includes: performing residual offset on the component of the residual picture according to one of set residual offset modes for the component, and adding with the component of the reconstructed picture, to obtain the component of a filtered picture that is output after NNLF is performed on the reconstructed picture; where there are one or more set residual offset modes for the component.
In an example of the present embodiment, in a case where there are multiple set residual offset modes for the component, the operation of performing residual offset on the component of the residual picture according to one of the set residual offset modes for the component includes: continuing to decode an index of residual offset mode of the component of the reconstructed picture, where the index is used to indicate a residual offset mode to be based when residual offset is performed; and performing residual offset on the component of the residual picture according to a residual offset mode indicated by the index.
In an example of the present embodiment, the picture header is illustrated in the following table:
In the table, ro_enable_flag represents the sequence-level residual offset enabled flag. In a case where ro_enable_flag is 1, the following semantics are defined:
The compIdx in the above table represents the index of the color component. For a picture in YUV format, the value of compIdx is generally 0, 1 or 2.
In other examples, NNLF may be performed in units of blocks (such as CTUs), and in this case, the residual offset usage flag and the index of residual offset mode are defined as block-level syntax elements.
In the present embodiment, two flags are adopted to indicate, respectively, whether residual offset needs to be performed on the component and the residual offset mode to be based on when residual offset is performed. In another exemplary embodiment, in a case where there are multiple set residual offset modes for the component, the encoding side adopts one flag (that is, the residual offset usage flag (roflag)) to simultaneously indicate whether residual offset needs to be performed on the component and the residual offset mode to be based on when residual offset is performed. In this case, the decoding side determines, according to the roflag of the component, the residual offset mode to be based on when residual offset is performed, and performs residual offset on the component of the residual picture according to the determined residual offset mode.
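For the single-flag alternative mentioned above, one possible mapping, assumed here purely for illustration, is that a roflag value of 0 indicates that residual offset is not performed, while a value of k (k ≥ 1) indicates that residual offset is performed according to the k-th set residual offset mode:

```python
def interpret_combined_roflag(roflag: int):
    """Interpret a combined residual offset usage flag (assumed mapping only):
    roflag == 0      -> first mode, residual offset is not performed;
    roflag == k >= 1 -> second mode, using the (k-1)-th set residual offset mode.
    Returns (use_offset, mode_index)."""
    if roflag == 0:
        return False, None
    return True, roflag - 1
```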
In an exemplary embodiment of the present disclosure, set residual offset modes for the three components are the same or different; and set residual offset modes for at least one of the three components include one or more of the following types:
An embodiment of the present disclosure further provides a video decoding method, which is applied to a video decoding apparatus and includes: as illustrated in
In S610, it is determined whether NNLF enables residual offset.
In S620, responsive to that NNLF enables residual offset, NNLF is performed on the reconstructed picture according to the NNLF methods described in any one of embodiments of the present disclosure applied to the filter for NNLF at the decoding side.
According to the video decoding method of the present embodiment, by decoding the residual offset usage flag, a better mode is selected for each component from the two NNLF modes with and without residual offset to perform NNLF on that component, which may further enhance the filtering effect of NNLF and improve the quality of the decoded picture, compared with selecting a uniform mode for the multiple components.
In an exemplary embodiment of the present disclosure, it is determined that NNLF enables residual offset responsive to that one or more of the following conditions are met:
In addition to the above conditions, other conditions may also be added. For example, a condition that a frame in which the input reconstructed picture is located is an inter encoded frame is taken as a necessary condition for NNLF to enable (allow) residual offset, or the like.
In an example of using the sequence-level residual offset enabled flag, the sequence header of the video sequence is illustrated in the following table:
The ro_enable_flag in the table is the sequence-level residual offset enabled flag.
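The kind of check described above can be sketched as follows (Python); the concrete set of conditions is defined by the embodiments of the present disclosure, and frame_is_inter / require_inter are hypothetical inputs covering the optional inter-frame condition mentioned earlier:

```python
def nnlf_residual_offset_enabled(ro_enable_flag: int, frame_is_inter: bool,
                                 require_inter: bool = False) -> bool:
    """Decide whether NNLF enables residual offset for the current picture.

    ro_enable_flag : sequence-level residual offset enabled flag from the sequence header
    frame_is_inter : whether the current frame is an inter encoded (non-I) frame
    require_inter  : whether the optional inter-frame condition is imposed (an assumption)
    """
    if ro_enable_flag != 1:
        return False
    if require_inter and not frame_is_inter:
        return False
    return True
```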
In an exemplary embodiment of the present disclosure, the method further includes: responsive to that NNLF disables residual offset, adding a reconstructed picture input into the neural network and a residual picture output by the neural network, to obtain a filtered picture that is output after NNLF is performed on the reconstructed picture.
In an exemplary embodiment of the present disclosure, the filter for NNLF is arranged after a deblocking filter or a sample adaptive offset filter and before an adaptive loop filter. In an example of the present embodiment, the structure of the filter unit (or referred to as the in loop filter module, referring to
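Assuming a deblocking filter → SAO → NNLF → ALF order, which is one arrangement consistent with the above description, the filter unit could be sketched as follows (the function arguments are placeholders for the individual filters):

```python
def in_loop_filter_unit(rec_picture, deblocking, sao, nnlf, alf):
    """Apply the in loop filters in one order consistent with the description:
    the filter for NNLF is placed after the deblocking filter and the sample
    adaptive offset filter, and before the adaptive loop filter."""
    pic = deblocking(rec_picture)
    pic = sao(pic)
    pic = nnlf(pic)  # neural network based loop filter with skip connection branch
    return alf(pic)
```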
An embodiment of the present disclosure provides a neural network based loop filtering method. When loop filtering is performed on the reconstructed picture at the encoder side, processes are performed according to the order of the deployed filters, and when entering NNLF, the following processes are performed.
Step 1, it is determined whether residual offset is enabled in the current sequence according to the sequence-level residual offset enabled flag (ro_enable_flag). Responsive to that ro_enable_flag is "1", residual offset is enabled for the current sequence for attempt, and the process skips to step 2; and responsive to that ro_enable_flag is "0", residual offset is disabled for the current sequence, and the process ends (subsequent steps are skipped).
Step 2, the reconstructed picture of the current frame is input into the neural network of NNLF for prediction, to obtain the residual picture from the output of NNLF, and the residual picture is superimposed on the input reconstructed picture, to obtain the first filtered picture.
Step 3, residual offset is performed on the residual picture, and the offset residual picture is then superimposed on the input reconstructed picture, to obtain the second filtered picture.
Step 4, the rate distortion cost CNNLF is calculated by comparing the first filtered picture and the original picture of the current frame; and the rate distortion cost CRO is calculated by comparing the second filtered picture and the original picture of the current frame.
Step 5, the two costs are compared. Responsive to that CRO is less than CNNLF (CRO<CNNLF), the second filtered picture is taken as the filtered picture output by the filter for NNLF, that is, the second mode is selected to perform NNLF on the reconstructed picture. Responsive to that CRO is greater than or equal to CNNLF (CRO≥CNNLF), the first filtered picture is taken as the filtered picture output by the filter, that is, the first mode is selected to perform NNLF on the reconstructed picture.
The calculation formula of the rate distortion cost in the present embodiment is: cost=Wy×SSD(Y)+Wu×SSD(U)+Wv×SSD(V).
Where SSD(*) represents the sum of squared differences (SSD) for a certain color component; Wy, Wu and Wv represent the weight values of the SSDs for the Y component, U component and V component, respectively; for example, the ratio Wy:Wu:Wv may be taken as 10:1:1 or 8:1:1, or the like.
The calculation formula of SSD is as follows: SSD=Σ_{x=1..M}Σ_{y=1..N}(rec(x, y)−org(x, y))².
Where M represents the length of the reconstructed picture of the current frame, N represents the width of the reconstructed picture of the current frame, and rec(x, y) and org(x, y) represent the pixel values of the reconstructed picture and the original picture at the pixel (x, y), respectively.
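As a sketch only, the cost computation described by the above formulas may be written as follows in Python/NumPy, taking the weight ratio 10:1:1 as an example:

```python
import numpy as np

def ssd(rec: np.ndarray, org: np.ndarray) -> float:
    """Sum of squared differences between one component of the filtered
    (reconstructed) picture and the same component of the original picture."""
    diff = rec.astype(np.int64) - org.astype(np.int64)
    return float(np.sum(diff * diff))

def rd_cost(filtered_yuv, original_yuv, weights=(10, 1, 1)) -> float:
    """Weighted SSD cost over the Y, U and V components, i.e.
    cost = Wy*SSD(Y) + Wu*SSD(U) + Wv*SSD(V)."""
    return sum(w * ssd(f, o) for w, f, o in zip(weights, filtered_yuv, original_yuv))
```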
Step 6, the residual offset usage flag (picture_ro_enable_flag) of the current frame and the index of residual offset mode (picture_ro_index) are encoded into the bitstream.
Step 7, if all blocks in the current frame have been processed, processing on the current frame is terminated, and then the next frame may be continuously loaded for processing. If there are still blocks in the current frame that have not been processed, the process returns to step 2.
In the present embodiment, NNLF processing is performed in units of the reconstructed picture of the current frame. In other embodiments, NNLF processing may also be performed based on other coding units such as blocks (such as CTUs) and slices of the current frame.
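The following sketch ties steps 2 to 6 together for one frame (Python). Here nn_forward, residual_offset and writer are placeholders for the operations described above, and cost_fn may be, for example, the weighted SSD cost sketched earlier; this is only an illustration of the selection logic, not a definitive implementation.

```python
def nnlf_encoder_mode_selection(rec_yuv, org_yuv, nn_forward, residual_offset,
                                cost_fn, writer):
    """Encoder-side selection between the first mode (no residual offset) and the
    second mode (residual offset), following steps 2 to 6 of the present embodiment.

    rec_yuv / org_yuv : (Y, U, V) components of the reconstructed / original picture
    nn_forward        : the neural network of NNLF, returning the residual picture
    residual_offset   : the residual offset operation of the attempted offset mode
    cost_fn           : rate distortion cost function (e.g. weighted SSD)
    writer            : hypothetical bitstream writer exposing write_flag()
    """
    res_yuv = nn_forward(rec_yuv)                                                 # step 2
    filtered_1 = tuple(r + d for r, d in zip(rec_yuv, res_yuv))                   # first filtered picture
    filtered_2 = tuple(r + residual_offset(d) for r, d in zip(rec_yuv, res_yuv))  # step 3
    c_nnlf = cost_fn(filtered_1, org_yuv)                                         # step 4
    c_ro = cost_fn(filtered_2, org_yuv)
    use_offset = 1 if c_ro < c_nnlf else 0                                        # step 5
    writer.write_flag(use_offset)  # step 6: picture_ro_enable_flag; picture_ro_index
                                   # would also be written when several modes are set
    return filtered_2 if use_offset else filtered_1
```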
In the present embodiment, the NNLF1 baseline tool is selected for comparison. On the basis of NNLF1, mode selection processing is performed on inter encoded frames (i.e., non-I frames), and two residual offset modes using fixed values are set, in which the fixed values are set to 1 and 2, respectively. Under the common test conditions of the random access and low delay B configurations, the common test sequences specified by the Joint Video Experts Team (JVET) have been tested, and the anchor for the comparison is NNLF1. The results are illustrated in Tables 1 and 2:
The meanings of the parameters in the tables are as follows.
EncT: encoding time. A value of 10X % represents that, after the technology of performing residual offset based on the NNLF mode selection processes is integrated, the encoding time is 10X % of that before integration, which means that the encoding time increases by X %.
DecT: decoding time. A value of 10X % represents that, after the technology of performing residual offset based on the NNLF mode selection processes is integrated, the decoding time is 10X % of that before integration, which means that the decoding time increases by X %.
Class A1 and Class A2 are test video sequences with a resolution of 3840×2160, Class B is a test sequence with a resolution of 1920×1080, the resolution of Class C is 832×480, the resolution of Class D is 416×240, and the resolution of Class E is 1280×720; and Class F consists of screen content sequences with several different resolutions.
Y, U, and V are the three color components. The columns in which Y, U, and V are located represent the BD-rate (Bjøntegaard-delta rate) indicators of the test results on Y, U, and V. The smaller the value, the better the encoding performance.
By analyzing the data in the two tables, it may be seen that, by introducing the optimization method of residual offset, the encoding performance may be further improved on the basis of NNLF1, especially for the chroma components. The influence of the residual offset of the present embodiment on the decoding complexity is not significant.
NNLF mode selection may also be performed for an intra encoded frame (I frame) according to the method of the present embodiment.
An embodiment of the present disclosure further provides a bitstream, where the bitstream is generated by the video encoding method described in any one of embodiments of the present disclosure.
An embodiment of the present disclosure further provides a neural network based loop filter, as illustrated in
An embodiment of the present disclosure further provides a video decoding apparatus, illustrated in
An embodiment of the present disclosure further provides a video encoding apparatus, illustrated in
The processor of the above embodiments of the present disclosure may be a general-purpose processor, which includes a central processing unit (CPU), a network processor (NP), a microprocessor, etc., or may be other conventional processors, etc. The processor may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a discrete logic component or other programmable logic components, a discrete gate or a transistor logic component, a discrete hardware assembly, or other equivalent integrated or discrete logic circuits, or a combination thereof. That is, the processor in the above embodiments may be any processing component or component combination that implements the various methods, steps and logic block diagrams disclosed in the embodiments of the present disclosure. If the embodiments of the present disclosure are partially implemented in software, instructions for the software may be stored in a suitable non-transitory computer readable storage medium, and one or more processors may be used to execute the instructions in hardware to implement the methods of the embodiments of the present disclosure. The term “processor” used herein may refer to the above structure or any other structure suitable for implementation of the techniques described herein.
An embodiment of the present disclosure further provides a video encoding and decoding system, referring to
An embodiment of the present disclosure further provides a non-volatile computer readable storage medium, and the computer readable storage medium stores a computer program. The computer program, when executed by a processor, is capable of implementing the neural network based loop filtering methods described in any one of embodiments of the present disclosure, or implementing the video decoding method described in any one of embodiments of the present disclosure, or implementing the video encoding method described in any one of embodiments of the present disclosure.
In a first clause, provided is a neural network based loop filtering (NNLF) method, which is applied to a filter for NNLF at a decoding side, where the filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes:
In a second clause, the method of the first clause, a residual of the residual picture becomes smaller by performing residual offset;
In a third clause, the method of the first clause, where performing NNLF on the reconstructed picture using the first mode includes:
In a fourth clause, the method of the third clause, where in a case where there are a plurality of set residual offset modes, performing residual offset on the residual picture according to one of the set residual offset modes includes:
In a fifth clause, the method of the first clause, the set residual offset modes include one or more of the following types:
In a sixth clause, provided is a neural network based loop filtering (NNLF) method, which is applied to a filter for NNLF at a decoding side, where the filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes:
In a seventh clause, the method of the sixth clause, a residual in the component of the residual picture becomes smaller by performing residual offset;
In an eighth clause, the method of the sixth clause, where performing NNLF on the component of the reconstructed picture using the first mode includes: adding the component of the residual picture and the component of the reconstructed picture, to obtain the component of a filtered picture that is output after NNLF is performed on the reconstructed picture; and performing NNLF on the component of the reconstructed picture using the second mode includes:
performing residual offset on the component of the residual picture according to one of set residual offset modes for the component, and adding the offset component to the component of the reconstructed picture, to obtain the component of a filtered picture that is output after NNLF is performed on the reconstructed picture; where there are one or more set residual offset modes for the component.
In a ninth clause, the method of the eighth clause, where in a case where there are a plurality of set residual offset methods for the component, performing residual offset on the component of the residual picture according to one of the set residual offset modes for the component includes:
In a tenth clause, the method of the sixth clause, set residual offset modes for the three components are the same or different; and
In an eleventh clause, provided is a video decoding method, which is applied to a video decoding apparatus and includes:
In a twelfth clause, the method of the eleventh clause, where it is determined that NNLF enables residual offset responsive to that one or more of the following conditions are met:
In a thirteenth clause, the method of the eleventh clause, the method further includes:
In a fourteenth clause, the method of the eleventh clause, the filter for NNLF is arranged after a deblocking filter or a sample adaptive offset filter and before an adaptive loop filter.
In a fifteenth clause, provided is a neural network based loop filtering (NNLF) method, applied to a filter for NNLF at an encoding side, where the filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes:
In a sixteenth clause, the method of the fifteenth clause, the reconstructed picture is a reconstructed picture of a current frame or a current slice or a current block; and a residual of the residual picture becomes smaller by performing residual offset.
In a seventeenth clause, the method of the fifteenth clause, where calculating the rate distortion cost cost1 of performing NNLF on the reconstructed picture using the first mode includes:
In an eighteenth clause, the method of the seventeenth clause, where calculating the rate distortion cost cost2 of performing NNLF on the reconstructed picture using the second mode includes:
In a nineteenth clause, the method of the eighteenth clause, where selecting the second mode to perform NNLF on the reconstructed picture includes:
In a twentieth clause, the method of the eighteenth clause, where both the reconstructed picture and the residual picture include three components;
In a twenty-first clause, the method of the eighteenth clause, where the set residual offset modes include one or more of the following types:
In a twenty-second clause, provided is a neural network based loop filtering (NNLF) method, which is applied to a filter for NNLF at an encoding side, where the filter for NNLF includes a neural network and a skip connection branch from an input to an output of the filter for NNLF, and the method includes:
In a twenty-third clause, the method of the twenty-second clause, the reconstructed picture is a reconstructed picture of a current frame or a current slice or a current block; and a residual in the component of the residual picture becomes smaller by performing residual offset.
In a twenty-fourth clause, the method of the twenty-second clause, where calculating the rate distortion cost cost1 of performing NNLF on the component of the reconstructed picture using the first mode includes:
In a twenty-fifth clause, the method of the twenty-second clause, where calculating the rate distortion cost cost2 of performing NNLF on the component of the reconstructed picture using the second mode includes:
In a twenty-sixth clause, the method of the twenty-fifth clause, where selecting the second mode to perform NNLF on the component of the reconstructed picture includes:
In a twenty-seventh clause, the method of the twenty-fifth clause, where set residual offset modes for the three components are the same or different; and
In a twenty-eighth clause, provided is a video encoding method, which is applied to a video encoding apparatus and includes:
In a twenty-ninth clause, the method of the twenty-eighth clause, the residual offset usage flag is a picture-level syntax element or a block-level syntax element.
In a thirtieth clause, the method of the twenty-eighth clause, where it is determined that NNLF enables residual offset responsive to that one or more of the following conditions are met:
In a thirty-first clause, the method of the twenty-eighth clause, the method further includes:
In a thirty-second clause, the method of the twenty-eighth clause, where the method is to perform NNLF on the reconstructed picture according to the method of the fifteenth to twenty-first clauses;
In a thirty-third clause, the method of the thirty-second clause, where the method is to perform NNLF on the reconstructed picture according to the method of the eighteenth to twenty-first clauses, and the method further includes:
In a thirty-fourth clause, the method of the twenty-eighth clause, where the method is to perform NNLF on the reconstructed picture according to the method of the twenty-second to twenty-seventh clauses;
In a thirty-fifth clause, the method of the thirty-fourth clause, where the method is to perform NNLF on the reconstructed picture according to the method of the twenty-fifth to twenty-seventh clauses, and the method further includes:
In a thirty-sixth clause, provided is a bitstream, the bitstream is generated according to the video encoding method of any one of twenty-eighth to thirty-fifth clauses.
In a thirty-seventh clause, provided is a neural network based loop filter, which includes a processor and a memory storing a computer program, where the processor, when executing the computer program, is capable of implementing the neural network based loop filtering methods of any one of the first to tenth clauses or the fifteenth to twenty-seventh clauses.
In a thirty-eighth clause, provided is a video decoding apparatus, which includes a processor and a memory storing a computer program, where the processor, when executing the computer program, is capable of implementing the video decoding method according to any one of the eleventh to fourteenth clauses.
In a thirty-ninth clause, provided is a video encoding apparatus, which includes a processor and a memory storing a computer program, where the processor, when executing the computer program, is capable of implementing the video encoding method according to any one of the twenty-eighth to thirty-fifth clauses.
In a fortieth clause, provided is a video encoding and decoding system, which includes the video encoding apparatus of the thirty-ninth clause and the video decoding apparatus of the thirty-eighth clause.
In a forty-first clause, provided is a non-transitory computer readable storage medium storing a computer program, where the computer program, when executed by a processor, is capable of implementing the neural network based loop filtering methods according to any one of the first to tenth clauses or the fifteenth to twenty-seventh clauses, or implementing the video decoding method according to any one of the eleventh to fourteenth clauses, or implementing the video encoding method according to any one of the twenty-eighth to thirty-fifth clauses.
In one or more of the above exemplary embodiments, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If these functions are implemented in software, the functions may be stored on a computer readable medium or transmitted by the computer readable medium as one or more instructions or codes, which are executed by a hardware-based processing unit. The computer readable medium may include a computer-readable storage medium corresponding to a tangible medium such as a data storage medium, or may include a communication medium including any medium that facilitates the transfer of a computer program from one place to another, for example, transfer according to a communication protocol. In this manner, the computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or a carrier. The data storage medium may be any available medium that may be accessed by one or more computers or one or more processors to retrieve instructions, codes and/or data structures for implementation of the technologies described in the present disclosure. A computer program product may include a computer-readable medium.
By way of example but not limitation, such computer readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical storage devices, magnetic disk storage devices or other magnetic storage devices, a flash memory, or any other media that may be used to store desired program codes in the form of instructions or data structures and that may be accessed by a computer. Moreover, any connection may also be referred to as a computer-readable medium. For example, if a coaxial cable, a fiber optic cable, a twisted pair cable, a digital subscriber line (DSL), or a wireless technology such as infrared, radio, or microwave is used to transmit instructions from a website, a server, or other remote sources, then the coaxial cable, the fiber optic cable, the twisted pair cable, the DSL, or the wireless technology such as infrared, radio, or microwave is included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carriers, signals, or other transient (transitory) media, but are instead directed to non-transitory tangible storage media. As used herein, a disk and a disc include a compact disc (CD), a laser disc, an optical disc, a digital versatile disc (DVD), a floppy disk, or a Blu-ray disc, etc., where the disk usually reproduces data magnetically, while the disc reproduces data optically with lasers. Combinations of the above description should also be included within the scope of the computer-readable media.
This application is a Continuation Application of International Application No. PCT/CN2022/125231 filed on Oct. 13, 2022, which is incorporated herein by reference in its entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2022/125231 | Oct 2022 | WO |
| Child | 19175530 | | US |