The present invention relates in general to the field of image processing, and more specifically to the coding and the decoding of digital images and of sequences of digital images.
The coding/decoding of digital images applies in particular to images from at least one video sequence comprising:
The present invention applies similarly to the coding/decoding of 2D or 3D images. The invention may in particular, but not exclusively, be applied to the video coding implemented in current AVC, HEVC and VVC video encoders and their extensions (MVC, 3D-AVC, MV-HEVC, 3D-HEVC, etc.), and to the corresponding decoding.
Current video encoders (MPEG, AVC, HEVC, VVC, AV1, etc.) use a blockwise representation of the video sequence. The images are split up into blocks, which are able to be split up again recursively. Next, each block is coded by intra-image or inter-image prediction. Thus, some images are coded by spatial prediction (intra prediction, IBC (for “Intra Block Copy”) prediction), and other images are also coded by temporal prediction (inter prediction) with respect to one or more coded-decoded reference images, using motion compensation, which is well known to those skilled in the art.
A prediction block BP associated with a block currently being coded is related directly to at least one reference block BR0 of the image to which the block currently being coded belongs or of an already decoded image, conventionally called reference image. In order to match the reference block BR0 to the block currently being coded, the reference block BR0 is displaced for each spatial position (x,y) of the pixels thereof. A motion-compensated reference block BC0 is then obtained. The relationship between the prediction block BP and the motion-compensated reference block BC0 is then expressed as follows:
BP(x,y)=(1−w)*BC0(x,y)
where w is a prediction weighting parameter, which is 0 most of the time, but which may be adjustable, as explained below.
When for example a block currently being coded is predicted with respect to two reference blocks BR0 and BR1 belonging to one or two already decoded reference images, the two reference blocks BR0 and BR1 are motion-compensated, generating two motion-compensated reference blocks BC0 and BC1, which are then combined by linear weighting. Each pixel of the prediction block BP is the result of weighting of the pixels of the two motion-compensated reference blocks BC0 and BC1. More precisely, if for example the prediction is implemented row by row, and from left to right:
The most common weighting, applied by default, is the half-sum. To this end, the prediction block BP is computed according to the following relationship:
BP(x,y)=0.5*BC0(x,y)+0.5*BC1(x,y)
More elaborate weightings are possible.
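By way of illustration only, the two relationships above can be sketched as follows. The sketch assumes that the blocks are already motion-compensated and stored as lists of rows of sample values; the function names are hypothetical and are not taken from any standard.

```python
def predict_uni(bc0, w=0.0):
    """Uni-prediction: BP(x,y) = (1 - w) * BC0(x,y), with w = 0 by default."""
    return [[(1.0 - w) * p for p in row] for row in bc0]


def predict_bi_half_sum(bc0, bc1):
    """Default bi-prediction: BP(x,y) = 0.5 * BC0(x,y) + 0.5 * BC1(x,y)."""
    return [[0.5 * a + 0.5 * b for a, b in zip(r0, r1)]
            for r0, r1 in zip(bc0, bc1)]


# Two small motion-compensated reference blocks (illustrative values).
bc0 = [[100, 104], [96, 100]]
bc1 = [[110, 100], [100, 90]]

bp_uni = predict_uni(bc0)              # copy of BC0 when w = 0
bp_bi = predict_bi_half_sum(bc0, bc1)  # sample-wise average of BC0 and BC1
```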
In the HEVC standard, the linear weighting is applied uniformly to the image currently being coded. The weighting parameter w is fixed therein, and is signaled to the decoder, for each sub-image or “slice” of the image currently being coded. By default, in the case of a bi-prediction of the block currently being coded, the balanced weighting (0.5/0.5) is applied unless otherwise explicitly indicated in the PPS (for “Picture Parameter Set”) information.
In the VVC standard, the prediction is weighted block by block using the BCW (for “bi-prediction with CU level weights”) tool. A prediction block BP is computed according to the following relationship:
BP(x,y)=(1−w)*BC0(x,y)+w*BC1(x,y)
where the weighting parameter w may take 5 values: 0.5, 0.625, 0.375, 1.25, −0.25. The optimum value of the weighting parameter w to be applied is determined at the encoder and signaled to the decoder for each block. It is signaled first by a context-coded element that indicates whether the value w=0.5 is used, that is to say equal weighting of the motion-compensated reference blocks BC0 and BC1. If not, the weighting is signaled on 2 bits to indicate one of the 4 remaining values. This principle is adopted in the AV1 technique.
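The block-level selection described above can be sketched as follows. This is a simplified illustration, not the normative VVC process: it assumes the encoder selects w by minimizing the sum of squared errors alone (a real encoder would typically use a rate-distortion cost), and the mapping of the 2-bit index to the four remaining values is a hypothetical ordering.

```python
# The five weighting values listed in the text, equal weighting first.
BCW_WEIGHTS = (0.5, 0.625, 0.375, 1.25, -0.25)


def bcw_predict(bc0, bc1, w):
    """BP(x,y) = (1 - w) * BC0(x,y) + w * BC1(x,y)."""
    return [[(1.0 - w) * a + w * b for a, b in zip(r0, r1)]
            for r0, r1 in zip(bc0, bc1)]


def sse(a, b):
    """Sum of squared errors between two blocks of equal size."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def choose_bcw_weight(cur, bc0, bc1):
    """Return (w, equal_flag, index) for the current block.

    equal_flag is True when w = 0.5; otherwise index (0..3) identifies one of
    the four remaining values (hypothetical ordering, fits on 2 bits).
    """
    best_w = min(BCW_WEIGHTS, key=lambda w: sse(cur, bcw_predict(bc0, bc1, w)))
    equal_flag = best_w == 0.5
    index = None if equal_flag else BCW_WEIGHTS[1:].index(best_w)
    return best_w, equal_flag, index


choice = choose_bcw_weight([[104, 108]], [[100, 100]], [[108, 112]])
```

In every case the selected value has to be written to the stream, which is the signaling cost discussed in the text.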
It should be noted that, regardless of the video standard used, the weighting parameter w is associated with a relatively small number of values, thereby leading to a lack of precision in the weighted prediction that is applied. Furthermore, the encoder according to the abovementioned standards systematically has to code and transmit, to the decoder, the value of the weighting parameter w that has been selected, thereby increasing the signaling cost.
One of the aims of the invention is to rectify the drawbacks of the abovementioned prior art by improving the precision of the weighted prediction from the prior art, while also reducing the cost of signaling information related to this prediction.
To this end, one subject of the present invention relates to a method for predicting at least one current set of pixels, implemented by a prediction device, wherein said at least one current set of pixels is predicted based on at least one reference set of pixels, using a pixel prediction weighting function, characterized in that the pixel prediction weighting function for said at least one current set of pixels is associated with at least one weighting value computed based on analysis of at least one reference set of pixels.
Such a prediction method according to the invention advantageously makes it possible to rely only on one or more reference sets of pixels, in other words one or more sets of pixels that are already decoded at the time of the prediction, to estimate the weighting of the prediction of a current set of pixels. Since this or these reference sets of pixels are available at the time of the prediction of the current set of pixels, the estimation of the weighting of the prediction is improved as it is more spatially precise than that implemented in the prior art, which requires approximating or quantizing the one or more weighting values of the prediction.
According to one particular embodiment, the prediction weighting function is modified using at least one modification parameter that results from analysis of said at least one current set of pixels.
Such an embodiment advantageously makes it possible to apply a correction to the prediction weighting function that has been computed, when the current set of pixels contains an element that was not present/predictable in the one or more reference sets of pixels.
The invention also relates to a device for predicting at least one current set of pixels, comprising a processor that is configured to predict said at least one current set of pixels based on at least one reference set of pixels, using a pixel prediction weighting function.
Such a prediction device is characterized in that the pixel prediction weighting function for said at least one current set of pixels is associated with at least one weighting value computed based on analysis of at least one reference set of pixels. In one particular embodiment, the prediction device is a neural network.
The use of a neural network advantageously makes it possible to optimize the quality of the weighted prediction.
Such a prediction device is in particular able to implement the abovementioned prediction method.
The invention also relates to a method for coding at least one current set of pixels, implemented by a coding device, comprising the following:
Such a coding method is characterized in that the prediction set of pixels is obtained using the abovementioned prediction method according to the invention.
Such a coding method is advantageous in that it does not require the coding of one or more prediction weighting values of the prediction weighting function. This means that this or these prediction weighting values do not need to be transmitted by the encoder to a decoder for the current set of pixels, thereby making it possible to reduce the cost of signaling the information transmitted between the encoder and the decoder in favor of better quality of the image related to the improved precision of the prediction. Furthermore, any weighting value associated with the prediction weighting function does not need to be approximated or quantized with a view to being transmitted to the decoder, thereby making it possible to make this weighting value continuous for the set of pixels to be predicted.
According to one particular embodiment, the coding method comprises the following:
The invention also relates to a coding device or encoder for coding at least one current set of pixels, comprising a processor that is configured to implement the following:
Such a coding device is characterized in that the prediction set of pixels is obtained using the abovementioned prediction device according to the invention.
Such a coding device is in particular able to implement the abovementioned coding method.
The invention also relates to a method for decoding at least one current set of pixels, implemented by a decoding device, comprising the following:
Such a decoding method is characterized in that the prediction set of pixels is obtained using the abovementioned prediction method according to the invention. The advantage of such a decoding method lies in the fact that the prediction weighting function is computed autonomously by the decoder based on one or more available reference sets of pixels, without the decoder needing to read specific information from the data signal received from the encoder. Moreover, as already explained above, the at least one weighting value of the prediction weighting function, since it is neither coded nor transmitted in a data signal, may be made continuous, without having to be approximated or quantized, as is the case in the prior art.
In one particular embodiment, such a decoding method further comprises the following:
The invention also relates to a decoding device or decoder for decoding at least one current set of pixels, comprising a processor that is configured to implement the following:
Such a decoding device is characterized in that the prediction set of pixels is obtained using the abovementioned prediction device according to the invention.
Such a decoding device is in particular able to implement the abovementioned decoding method.
The invention also relates to a method for constructing at least one set of pixels from at least one reference set of pixels, implemented by a video data processing device. Such a construction method is characterized in that the set of pixels is constructed using a pixel prediction weighting function, such as the prediction function used in the abovementioned prediction method of the invention.
The prediction weighting function of the invention is thus not limited just to the context of an image prediction generating or not generating a prediction residual, and may be advantageously used in the case of an interpolation or an image synthesis based on one or more already decoded reference images.
The invention also relates to a computer program comprising instructions for implementing the prediction method according to the invention and also the coding or decoding method integrating the prediction method according to the invention, or else the abovementioned construction method, according to any one of the particular embodiments described above, when said program is executed by a processor. Such instructions may be permanently stored in a non-transitory memory medium of the prediction device implementing the abovementioned prediction method, of the encoder implementing the abovementioned coding method, of the decoder implementing the abovementioned decoding method, or of the video processing device implementing the abovementioned construction method.
This program may use any programming language and be in the form of source code, object code or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.
The invention also targets a computer-readable recording medium or information medium comprising instructions of a computer program as mentioned above. The recording medium may be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM, for example a CD-ROM, a DVD-ROM, a synthetic DNA (deoxyribonucleic acid), etc., or a microelectronic circuit ROM, or else a magnetic recording means, for example a USB key or a hard disk.
Moreover, the recording medium may be a transmissible medium such as an electrical or optical signal, which may be conveyed via an electrical or optical cable, by radio or by other means. The program according to the invention may in particular be downloaded from a network such as the Internet.
Alternatively, the recording medium may be an integrated circuit in which the program is incorporated, the circuit being designed to execute or to be used in the execution of the abovementioned prediction method, coding method, decoding method or construction method.
Other features and advantages will become apparent from reading particular embodiments of the invention, which are given by way of illustrative and non-limiting examples, and the appended drawings, in which:
A description is given below of a 2D or 3D image prediction method that is able to be implemented in any type of video encoder or decoder, for example compliant with the AVC, HEVC or VVC standards and their extensions (MVC, 3D-AVC, MV-HEVC, 3D-HEVC, etc.), or the like, such as for example a convolutional neural network (or CNN).
With reference to
Within the meaning of the invention, a current set of pixels Bc is understood to mean:
According to the invention, as shown in
According to the invention, as shown in
Of course, one or more other reference sets of pixels may be used together with the reference sets of pixels BR0 and BR1 to compute the current prediction set of pixels BPc.
In the embodiments presented below, it is assumed that the one or more reference sets of pixels BR0, BR1, etc. have the same geometry as the current set of pixels Bc to be predicted. Of course, it is also possible, depending on the context of the prediction, to oversize this reference set of pixels so as to cover an area greater than or equal to that of the current set of pixels Bc to be predicted.
With reference again to
In P1, said at least one reference set of pixels BR0 is analyzed.
Such analysis implements a motion estimation comprising an estimation of the pixel shift between said at least one reference set of pixels BR0 that was displaced beforehand and a predicted version BPc of a current set of pixels Bc that is not available at the time of the prediction. This motion estimation implements conventional motion compensation, at the end of which a motion-compensated set of pixels BRC0 is obtained. During this analysis, displacement information is obtained, such as a displacement vector V0 that describes the displacement of BR0 toward BRC0.
In P2, a weighting value w0 is computed for each pixel of the motion-compensated set of pixels BRC0, depending on the result of the analysis performed in P1.
In P3, a predicted version BPc of a current set of pixels Bc is computed according to the following function, for each coordinate (x,y) of a pixel under consideration of the motion-compensated reference set of pixels BRC0:
BPc(x,y)=w0(x,y)*BRC0(x,y)
When two reference sets of pixels BR0 and BR1 are considered during the analysis P1, thereby generating two motion-compensated sets of pixels BRC0 and BRC1 and the corresponding displacement vectors V0, V1, two weighting values w0, w1 are computed in P2 for the motion-compensated set of pixels BRC0 and the motion-compensated set of pixels BRC1, respectively. The current prediction set of pixels BPc is then computed in P3 according to the following function, for each coordinate (x,y) of a pixel under consideration of the motion-compensated sets of pixels:
BPc(x,y)=w0(x,y)*BRC0(x,y)+w1(x,y)*BRC1(x,y)
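The per-pixel weighting function above can be sketched as follows for any number of motion-compensated reference sets. The sketch assumes that each set and each weighting map is stored as a list of rows of the same size; the function name is hypothetical.

```python
def weighted_prediction(brc, w):
    """BPc(x,y) = sum over i of w_i(x,y) * BRC_i(x,y).

    brc: list of motion-compensated reference sets (BRC0, BRC1, ...).
    w:   list of per-pixel weighting maps (w0, w1, ...), same geometry.
    """
    height, width = len(brc[0]), len(brc[0][0])
    return [[sum(w_i[y][x] * b_i[y][x] for w_i, b_i in zip(w, brc))
             for x in range(width)]
            for y in range(height)]


brc0 = [[100, 120]]
brc1 = [[110, 80]]
w0 = [[0.5, 1.0]]   # second pixel relies on BRC0 only
w1 = [[0.5, 0.0]]
bpc = weighted_prediction([brc0, brc1], [w0, w1])
```

Unlike a block-level parameter, each weighting map here may take a different value at each pixel position.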
The prediction method that has just been described above may then be implemented for each current set of pixels to be predicted, considered to be unavailable at the time of the prediction.
On initialization, the code instructions of the computer program PG_P1 are for example loaded into a RAM memory (not shown) before being executed by the processor PROC_P1. The processor PROC_P1 of the processing unit UT_P1 implements in particular the actions of the prediction method described above, according to the instructions of the computer program PG_P1.
The prediction device receives, at input E_P1, one or more reference sets of pixels BR0, BR1, etc., computes the one or more corresponding displacement vectors V0, V1, etc. along with the one or more corresponding weighting values w0, w1, etc., and delivers, at output S_P1, the abovementioned prediction set of pixels BPc.
A computer CAL receives this information at input in order to compute the prediction set of pixels BPc according to the abovementioned relationship:
BPc(x,y)=w0(x,y)*BRC0(x,y)+w1(x,y)*BRC1(x,y).
In a manner known per se, the convolutional neural network RNC1 carries out a succession of layers of filtering, non-linearity and scaling operations. Each filter that is used is parameterized by a convolution kernel, and non-linearities are parameterized (ReLU, leaky ReLU, GDN (“generalized divisive normalization”), etc.). The neural network RNC1 is for example of the type described in the document D. Sun, et al., “PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume” CVPR 2018.
In this case, the neural network RNC1 may be trained:
To this end, in a preliminary phase, the network RNC1 is trained to carry out operation P1. For example, the network RNC1 is trained to minimize the root mean square error between an image Ii to be approximated and the result BPc of the weighted prediction of
The network RNC1 is trained during a training phase by presenting a plurality of associated reference sets of pixels BR0, BR1, etc. together with a current set of pixels Bc, and by changing, for example using a gradient descent algorithm, the weights of the network so as to minimize the mean squared error between Bc and the result BPc(x,y) computed according to the abovementioned relationship.
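The principle of this training phase can be illustrated with a deliberately reduced toy model. In the sketch below, the network is replaced by a single scalar parameter w, with BPc = w*BRC0 + (1−w)*BRC1, and plain gradient descent minimizes the mean squared error against Bc. This is an assumption made purely for illustration; the actual network RNC1 has many parameters and produces per-pixel weightings.

```python
def train_scalar_weight(samples, steps=200, lr=1e-4, w=0.5):
    """Toy stand-in for the training phase (not the actual network RNC1).

    samples: list of (Bc, BRC0, BRC1) triples, each a list of rows.
    Minimizes the mean squared error between Bc and w*BRC0 + (1-w)*BRC1.
    """
    for _ in range(steps):
        grad, count = 0.0, 0
        for bc, brc0, brc1 in samples:
            for rc, r0, r1 in zip(bc, brc0, brc1):
                for c, p0, p1 in zip(rc, r0, r1):
                    err = w * p0 + (1.0 - w) * p1 - c
                    grad += 2.0 * err * (p0 - p1)  # d(err^2)/dw
                    count += 1
        w -= lr * grad / count  # gradient descent step on the mean error
    return w


# Training data built with a "true" weight of 0.75 (illustrative values).
samples = [([[95.0, 60.0]], [[100, 50]], [[80, 90]])]
w_trained = train_scalar_weight(samples)
```

The learning rate and the number of steps are arbitrary choices suited to these sample values only.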
At the end of this preliminary training phase, the network RNC1 is fixed and suitable for use in the prediction device PRED2.
The network RNC1 is thus optimized to implement steps P1 and P2 of the weighted prediction of
A description will now be given, with reference to
In the example shown, two reference sets of pixels BR0 and BR1 are taken into account for the prediction.
To this end, as illustrated in
In P10, a motion estimate between BR0 and BR1 is computed. Such a step is performed through conventional motion search steps, such as for example an estimation of displacement vectors.
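One conventional form of motion search is an exhaustive block-matching search, sketched below under simplifying assumptions: a single displacement vector is estimated for the whole set of pixels, the matching cost is the mean absolute difference over the overlapping area, and the search range is ±2 pixels. The function names are hypothetical, and an optical-flow method producing one vector per pixel could be used instead.

```python
def mean_abs_diff(ref0, ref1, dx, dy):
    """Mean |ref0(x,y) - ref1(x+dx,y+dy)| over the overlapping area."""
    h, w = len(ref0), len(ref0[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            xs, ys = x + dx, y + dy
            if 0 <= xs < w and 0 <= ys < h:
                total += abs(ref0[y][x] - ref1[ys][xs])
                count += 1
    return total / count if count else float("inf")


def estimate_displacement(ref0, ref1, search=2):
    """Exhaustive search for the vector (dx, dy) that best maps ref0 onto ref1."""
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda v: mean_abs_diff(ref0, ref1, v[0], v[1]))


# Synthetic ramp image, and the same content moved one pixel to the right.
br0 = [[50 + 10 * x + 3 * y for x in range(6)] for y in range(6)]
br1 = [[50 + 10 * (x - 1) + 3 * y for x in range(6)] for y in range(6)]
v01 = estimate_displacement(br0, br1)
```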
With the vector V01 or V10 having been obtained in P10, P11 (
In the example of
In the example of
In the example of
With reference to
By way of illustration in
In contrast, a part Z0 of ERC0 and a part Z1 of ERC1 are undefined since they correspond to the unknown content that is located behind the element E of BR0 and the element E of BR1. However, as may be seen in
The computing P2 of prediction weighting values w0 and w1 then comprises the following:
With reference to
To this end, as illustrated in
At the end of this operation, an intermediate weighting is obtained in which, for each of the motion-compensated reference sets of pixels BRC0 and BRC1, the white content corresponds to w0int(x,y)=0 and w1int(x,y)=0 and the gray content corresponds to w0int(x,y)=0.5 and w1int(x,y)=0.5.
With reference to
where ws(x,y)=w0int(x,y)+w1int(x,y)
The computing of the sum of the intermediate weightings w0int(x,y) and w1int(x,y) is illustrated in
The computing P3 of the prediction weighting function then comprises the following:
With reference to
To this end, the following compensation weightings w0(x,y) and w1(x,y) are computed for each motion-compensated reference set of pixels BRC0 and BRC1, respectively:
w0(x,y)=w0int(x,y)/ws(x,y)
and
w1(x,y)=w1int(x,y)/ws(x,y)
Such compensation weighting is shown in
The predicted versions BRC0 and BRC1 are then motion-compensated and weighted by their respective weightings w0 and w1. Weighted compensated predicted versions BRCW0 and BRCW1 are then obtained.
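The computation of the intermediate weightings, their sum and the compensation weightings can be sketched as follows. The sketch assumes that a zero intermediate weighting marks the undefined parts (such as Z0 and Z1) and that 0.5 is used elsewhere, with the definedness given as boolean masks. The handling of positions where both references are undefined (sum equal to zero) is not specified in the text; the equal split used here is an assumption.

```python
def compensation_weights(valid0, valid1):
    """Return (w0, w1) from masks that are True where BRC0 / BRC1 is defined."""
    w0, w1 = [], []
    for r0, r1 in zip(valid0, valid1):
        row0, row1 = [], []
        for v0, v1 in zip(r0, r1):
            w0_int = 0.5 if v0 else 0.0   # intermediate weighting for BRC0
            w1_int = 0.5 if v1 else 0.0   # intermediate weighting for BRC1
            ws = w0_int + w1_int          # sum of the intermediate weightings
            if ws == 0.0:
                # Assumption: both undefined, fall back to an equal split.
                row0.append(0.5)
                row1.append(0.5)
            else:
                row0.append(w0_int / ws)
                row1.append(w1_int / ws)
        w0.append(row0)
        w1.append(row1)
    return w0, w1


def apply_weights(brc, w):
    """Weighted compensated version: BRCW(x,y) = w(x,y) * BRC(x,y)."""
    return [[wv * p for wv, p in zip(rw, rp)] for rw, rp in zip(w, brc)]


valid0 = [[True, False, True]]   # middle pixel undefined in BRC0
valid1 = [[True, True, False]]   # last pixel undefined in BRC1
w0, w1 = compensation_weights(valid0, valid1)
brcw0 = apply_weights([[100, 0, 90]], w0)
brcw1 = apply_weights([[104, 60, 0]], w1)
```

Where only one reference is defined, its weighting becomes 1 and the other becomes 0, so the undefined content does not contribute to the prediction.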
With reference to
Motion compensation has thus been constructed that includes a compensation weighting determined only by elements available at the time of the prediction, that is to say only by the reference sets of pixels BR0 and BR1. One particular advantage of such a compensation weighting over the currently standardized solutions is that, since BR0 and BR1 are perfectly known at the time of the prediction and the prediction according to the invention uses only BR0 and BR1, disocclusions can be dealt with effectively during the prediction, as illustrated in
The prediction weighting may be presented in another form. For example, a single weighting w may be used. Then, w weights the motion-compensated reference set of pixels BRC0 and (1−w) weights the motion-compensated reference set of pixels BRC1.
w may be computed based on the above embodiment starting from the weighting values w0 and w1 computed beforehand, and by performing:
Although a prediction method has been described above, such a method could also be implemented to construct or synthesize a set of pixels Bc from one or more reference sets of pixels BR0, BR1, etc. using the weighted prediction function that has just been described. The prediction method could be called a construction or synthesis method for the set of pixels Bc, corresponding for example to a missing set of pixels or a set of pixels not captured by a camera (360° video). In this case, strictly speaking, the prediction devices PRED1 and PRED2 will be considered more to be construction or synthesis devices. The prediction device PRED1 could thus implement a so-called view “synthesis” algorithm. For example, the VSRS (for “View Synthesis Reference Software”) software or the VVS (“Versatile View Synthesizer”) algorithm may be used as view synthesis algorithm. The construction or synthesis device PRED2 may for its part, as described above, be a neural network, such as for example a convolutional neural network, a multilayer perceptron, an LSTM (for “Long Short Term Memory”), etc.
A description is given below, with reference to
Such a coding method comprises the following:
In C1, the weighted prediction, in its steps P1 to P3 illustrated in
The following coding steps are conventional and compliant with AVC, HEVC, VVC coding or the like. Thus:
At the end of this operation, a quantized and coded difference signal BEccod is obtained.
During the coding C3, multiple coding possibilities may be explored, for example a plurality of prediction weighting values w0, w1, etc. may be investigated to find the best data rate/distortion or efficiency/complexity compromise.
The encoder may put the weighting values computed according to the prediction method described above in competition with weighting values that it may choose to transmit. To this end, it may evaluate the quality of the prediction BPc obtained from the above prediction method and measure the prediction error, for example using a root mean squared error. This prediction error may be compared with the prediction error resulting from a set of pairs of predetermined weighting values (w0,w1) as used in current video standards. This set may be restricted to (0.5,0.5), as for example in the HEVC (for “High Efficiency Video Coding”) standard or comprise other values, such as for example those used in the BCW (for “Bi-prediction with CU level Weights”) tool of the VVC (for “Versatile Video Coding”) standard. A flag will indicate to the decoder whether it should use the prediction method described above or whether it should apply the VVC BCW tool or whether it should apply the HEVC balanced prediction.
This putting of the prediction weighting values w0, w1, etc. into competition has the advantage of optimizing the precision of the weighted prediction in comparison with the default prediction weighting implemented in the prediction devices from the prior art. Indeed, the prediction weighting of the invention generates more precise weighting values, but may lead to greater distortion when the signal is not predictable. A conventional prediction weighting, although it is less spatially precise and at the expense of a data rate to be transmitted, may lead to lower distortion.
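The competition described above can be sketched as follows. The sketch assumes that the decision is based on the mean squared prediction error alone, whereas a real encoder would typically also account for the data rate of the flag and of any transmitted weighting values. The mode labels "derived" and "signaled" are hypothetical names for what the flag would convey.

```python
def mse(a, b):
    """Mean squared error between two sets of pixels of equal size."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def blend(brc0, brc1, w0, w1):
    """Conventional prediction with one pair of weights for the whole block."""
    return [[w0 * p0 + w1 * p1 for p0, p1 in zip(r0, r1)]
            for r0, r1 in zip(brc0, brc1)]


def choose_mode(cur, bp_derived, brc0, brc1, pairs=((0.5, 0.5),)):
    """Compare the derived prediction with predetermined (w0, w1) pairs.

    Returns ((mode, pair), error), where mode is "derived" (nothing to
    transmit besides the flag) or "signaled" (pair chosen by the encoder).
    """
    best_mode, best_err = ("derived", None), mse(cur, bp_derived)
    for w0, w1 in pairs:
        err = mse(cur, blend(brc0, brc1, w0, w1))
        if err < best_err:
            best_mode, best_err = ("signaled", (w0, w1)), err
    return best_mode, best_err


cur = [[100, 100]]
decision = choose_mode(cur, [[100, 90]], [[98, 102]], [[102, 98]])
```

The default set of pairs is restricted to (0.5,0.5); other predetermined pairs can be passed through the `pairs` argument.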
In C4, the data of the quantized and coded difference signal BEccod are written to a transport stream F able to be transmitted to a decoder, which will be described later in the description.
By contrast, in accordance with the invention, the weighting w0 and/or the weighting w1 are advantageously neither coded nor transmitted to the decoder.
Of course, in the case where the quantized and coded difference signal BEc is zero, which may be the case for the SKIP coding mode, the abovementioned steps C2 and C4 are not implemented.
According to this first embodiment, the actions performed by the coding method are implemented by computer program instructions. To that end, the coding device COD1 has the conventional architecture of a computer and comprises in particular a memory MEM_C1, a processing unit UT_C1, equipped for example with a processor PROC_C1, and driven by the computer program PG_C1 stored in memory MEM_C1. The computer program PG_C1 comprises instructions for implementing the actions of the coding method as described above when the program is executed by the processor PROC_C1.
On initialization, the code instructions of the computer program PG_C1 are for example loaded into a RAM memory (not shown) before being executed by the processor PROC_C1. The processor PROC_C1 of the processing unit UT_C1 implements in particular the actions of the coding method described above, according to the instructions of the computer program PG_C1.
The encoder COD1 receives, at input E_C1, a current set of pixels Bc and delivers, at output S_C1, the transport stream F, which is transmitted to a decoder using a suitable communication interface (not shown).
A description is given below, with reference to
Such a decoding method implements image decoding corresponding to the image coding of
The decoding method comprises the following:
In D1, data of the coded difference signal BEccod are extracted, in a conventional manner, from the received transport stream F.
In D2, BEccod is decoded in a conventional manner. At the end of this operation, a decoded difference signal BEcdec is obtained.
In D3, the weighted prediction according to the invention, in its steps P1 to P3 illustrated in
In D4, a reconstructed current set of pixels BDc is computed by combining the decoded difference signal BEcdec obtained in D2 with the prediction set of pixels BPc obtained in D3.
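The combination performed in D4 can be sketched together with the corresponding encoder-side operations. The sketch assumes that "combining" means sample-wise addition, that the difference signal is quantized with a uniform step, that no transform or entropy coding is applied, and that samples are clipped to an 8-bit range; all function names are hypothetical.

```python
def residual(bc, bpc):
    """Encoder side: difference signal BEc = Bc - BPc."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(bc, bpc)]


def quantize(block, step):
    """Uniform quantization of the difference signal (assumed scheme)."""
    return [[round(v / step) for v in row] for row in block]


def dequantize(levels, step):
    """Decoder side: decoded difference signal BEcdec."""
    return [[q * step for q in row] for row in levels]


def reconstruct(be_dec, bpc, max_val=255):
    """D4: BDc = clip(BPc + BEcdec), rounded to integer sample values."""
    return [[min(max_val, max(0, int(round(p + r))))
             for p, r in zip(rp, rr)]
            for rp, rr in zip(bpc, be_dec)]


bc = [[122, 64]]            # current set of pixels (encoder only)
bpc = [[117.0, 67.0]]       # weighted prediction, identical on both sides
levels = quantize(residual(bc, bpc), step=4)
bdc = reconstruct(dequantize(levels, step=4), bpc)
```

Because the prediction BPc is derived identically at the encoder and at the decoder from the reference sets of pixels, only the quantized difference signal needs to be exchanged.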
In a manner known per se, the reconstructed current set of pixels BDc may possibly undergo filtering by a loop filter performed on the reconstructed signal, which is well known to those skilled in the art.
Of course, in the case where the difference signal BEc that was computed during the abovementioned coding method is zero, which may be the case for the SKIP coding mode, the abovementioned steps D1 and D2 are not implemented.
According to this first embodiment, the actions performed by the decoding method are implemented by computer program instructions. To that end, the decoder DEC1 has the conventional architecture of a computer and comprises in particular a memory MEM_D1, a processing unit UT_D1, equipped for example with a processor PROC_D1, and driven by the computer program PG_D1 stored in memory MEM_D1. The computer program PG_D1 comprises instructions for implementing the actions of the decoding method as described above when the program is executed by the processor PROC_D1.
On initialization, the code instructions of the computer program PG_D1 are for example loaded into a RAM memory (not shown) before being executed by the processor PROC_D1. The processor PROC_D1 of the processing unit UT_D1 implements in particular the actions of the decoding method described above in connection with
The decoder DEC1 receives, at input E_D1, the transport stream F transmitted by the encoder COD1 of
A description will now be given, with reference to
Such a variant aims to improve the weighted prediction method of
To this end, on the encoder side, as illustrated in
As shown in
At the end of step C′1, a set of latent variables is obtained in the form of a signal U′. The signal U′ is quantized in C′2 by a quantizer QUANT, for example a uniform or vector quantizer controlled by a quantization parameter. A quantized signal U′q is then obtained.
At C′3, the quantized signal U′q is coded using an entropy encoder CE, for example of arithmetic type, with a determined statistic. This statistic is for example parameterized by probabilities of statistics, for example by modeling the variance and the mean of a Laplacian law (σ,μ), or else by considering hyperpriors as in the publication “Variational image compression with a scale hyperprior” by Ballé et al., which was presented at the ICLR 2018 conference. A coded quantized signal U′qcod is then obtained.
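The quantization and the role of the Laplacian statistic can be sketched as follows. The sketch assumes a uniform scalar quantizer and does not implement an arithmetic encoder; it only estimates the ideal code length of the quantized values under a Laplacian law of mean μ and standard deviation σ, both expressed in units of quantization levels. The function names are hypothetical.

```python
import math


def quantize_uniform(u, step):
    """Uniform scalar quantization of a list of latent values."""
    return [round(v / step) for v in u]


def laplace_cdf(x, mu, sigma):
    """CDF of a Laplacian law with mean mu and standard deviation sigma."""
    b = sigma / math.sqrt(2.0)   # scale parameter, since variance = 2 * b^2
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def estimated_bits(levels, mu, sigma):
    """Ideal code length, in bits, of integer levels under the Laplacian law."""
    total = 0.0
    for q in levels:
        # Probability mass of the quantization bin [q - 0.5, q + 0.5].
        p = laplace_cdf(q + 0.5, mu, sigma) - laplace_cdf(q - 0.5, mu, sigma)
        total += -math.log2(max(p, 1e-12))
    return total


u_q = quantize_uniform([0.9, -2.2, 0.1], step=1.0)
bits = estimated_bits(u_q, mu=0.0, sigma=1.0)
```

Values close to the mean receive a short code and rare values a long one, which is why the choice of statistic affects the data rate.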
In C′4, the coded quantized signal U′qcod is written to a transport stream F′, which is transmitted to a decoder DEC3, illustrated in
A description will now be given, with reference to
To this end, on the decoder side, as illustrated in
Following the reception of the stream F′, in D′2, entropy decoding is carried out on the coded quantized signal U′qcod using an entropy decoder DE corresponding to the entropy encoder CE of
In D′3, the decoded quantized signal U′q is concatenated with the latent space U obtained by the neural network RNC1 of
The neural network RNC1 then processes this concatenation through various layers, in the same way as in step P2 of
In a manner corresponding to
In the embodiments that have been disclosed above with reference to
These embodiments may be extended to three or more reference sets of pixels. To this end, the neural network RNC1 described with reference to
A degraded weighted prediction mode is of course possible, for example when only one reference frame is used for the prediction (case of type P prediction in video coding mode). Such a degraded mode is illustrated with reference to
The decoder DEC3′ differs from the decoder DEC3 of
This means that the prediction set of pixels BPc obtained at the end of step P3 of
BPc(x,y)=w0(x,y)*BRC0(x,y)+0*0=w0(x,y)*BRC0(x,y).
Number | Date | Country | Kind |
---|---|---|---|
FR2101632 | Feb 2021 | FR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/FR2022/050272 | 2/15/2022 | WO |