This disclosure relates to video encoding and/or decoding of an image or a video sequence.
A video sequence consists of several images (also referred to herein as “pictures”). When viewed on a screen, the image consists of pixels, each pixel having a red, green and blue (RGB) value. However, when encoding and decoding a video sequence, the image is often not represented using RGB values but typically using another color space, including but not limited to YCbCr, ICTCP, non-constant-luminance YCbCr, and constant luminance YCbCr. If we take the example of YCbCr, which is currently the most used representation, it is made up of three components: luma (Y), which roughly represents luminance, and chroma (Cb and Cr), both of which represent chrominance. It is often the case that Y is of full resolution, whereas the two other components, Cb and Cr, are of a smaller resolution. A typical example is a high definition (HD) video sequence containing 1920×1080 RGB pixels, which is often represented with a 1920×1080-resolution Y component, a 960×540 Cb component and a 960×540 Cr component. The elements in the components are called samples. In the example given above, there are therefore 1920×1080 samples in the Y component, and, hence, there is a direct relationship between samples and pixels. Therefore, in this document, the terms pixels and samples are sometimes used interchangeably. For the Cb and Cr components, there is no direct relationship between samples and pixels; a single Cb sample typically influences several pixels.
In the draft for the Versatile Video Coding (VVC) standard, which is developed by the Joint Video Experts Team (JVET), the decoding of an image can be thought of as carried out in two stages: (1) prediction decoding and (2) loop filtering. In the prediction decoding stage, the samples of the components (Y, Cb, and Cr) are partitioned into rectangular blocks. As an example, one block may be of size 4×8 samples, whereas another block may be of size 64×64 samples. The decoder obtains instructions for how to do a prediction for each block, for instance to copy samples from a previously decoded image (an example of temporal prediction), to copy samples from already decoded parts of the current image (an example of intra prediction), or to perform a combination thereof. To improve this prediction, the decoder may obtain a residual, often encoded using transform coding such as discrete sine transform (DST) or the discrete cosine transform (DCT). This residual is added to the prediction, and the decoder can proceed to decode the subsequent block.
The output from the prediction decoding stage is the three components: Y, Cb, and Cr. However, it is possible to further improve the fidelity of these components, and this is done in the loop filtering stage. The loop filtering stage in the current draft of VVC consists of three sub-stages: (1) a deblocking filter sub-stage, (2) a sample adaptive offset (SAO) filter sub-stage, and (3) an adaptive loop filter (ALF) sub-stage.
In the deblocking filter sub-stage, the decoder changes Y, Cb, and Cr by smoothing edges near block boundaries when certain conditions are met. This increases perceptual quality (subjective quality) since the human visual system is very good at detecting regular edges such as block artifacts along block boundaries. In the SAO sub-stage, the decoder adds or subtracts a signaled value to samples that meet certain conditions, such as being in a certain value range (band offset SAO) or having a specific neighborhood (edge offset SAO). This can reduce ringing noise since such noise often aggregates in certain value ranges or in specific neighborhoods (e.g., in local maxima). In this document, the reconstructed image components that are the result of this stage are denoted as YSAO, CbSAO, and CrSAO.
Embodiments of this disclosure relate to the third sub-stage (i.e., the ALF stage). The basic idea behind adaptive loop filtering is that the fidelity of the image components YSAO, CbSAO, and CrSAO can often be improved by filtering the image using a linear filter that is signaled from the encoder to the decoder. As an example, by solving a least-squares problem, the encoder can determine what coefficient values a linear filter should have in order to most efficiently lower the error between the reconstructed image components so far, YSAO, CbSAO, and CrSAO, and the original image components Yorg, Cborg, and Crorg. These coefficient values (or simply “coefficients” for short) can then be signaled from the encoder to the decoder. The decoder reconstructs the image as described above to get YSAO, CbSAO, and CrSAO, obtains the filter coefficients from the bit stream, and then applies the filter to get the final output, which are denoted as YALF, CbALF, CrALF.
In VVC, the ALF luma filter is more advanced than this. To start, it is observed that it is often advantageous to filter some samples with one set of coefficients, but avoid filtering other samples, or perhaps filter those other samples with another set of coefficients. To that end, VVC classifies every Y sample (i.e., every luma sample) into one of 25 classes. The class to which a sample belongs is decided for each 4×4 block based on the local neighborhood of that sample (8×8 neighborhood), specifically on the gradients of surrounding samples and the activity of surrounding samples. As can be seen from the VVC specification, four variables are computed to determine the characteristics of the local neighborhood of the current sample where filtH measures gradient horizontally, filtV measures gradients vertically, filtD0 measures gradients diagonally top left to bottom right, and filtD1 measures gradients diagonally top right to bottom left:
filtH[i][j]=Abs((recPicture[hx4+i][vy4+j]<<1)−recPicture[hx4+i−1][vy4+j]−recPicture[hx4+i+1][vy4+j]) (1471)
filtV[i][j]=Abs((recPicture[hx4+i][vy4+j]<<1)−recPicture[hx4+i][vy4+j−1]−recPicture[hx4+i][vy4+j+1]) (1472)
filtD0[i][j]=Abs((recPicture[hx4+i][vy4+j]<<1)−recPicture[hx4+i−1][vy4+j−1]−recPicture[hx4+i+1][vy4+j+1]) (1473)
filtD1[i][j]=Abs((recPicture[hx4+i][vy4+j]<<1)−recPicture[hx4+i+1][vy4+j−1]−recPicture[hx4+i−1][vy4+j+1]) (1474)
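The four measures above can be illustrated with a minimal Python sketch. The function name laplacians and the row-major layout rec[y][x] are assumptions made for illustration only, and the coordinate clamping performed by the specification is omitted, so the caller must stay at least one sample away from the picture border.

```python
def laplacians(rec, x, y):
    """Return (filtH, filtV, filtD0, filtD1) for the sample at column x, row y.

    rec is a list of rows, so rec[y][x] plays the role of recPicture[x][y].
    Border clamping is intentionally left out of this sketch.
    """
    c2 = rec[y][x] << 1  # twice the current sample
    filt_h = abs(c2 - rec[y][x - 1] - rec[y][x + 1])           # left / right
    filt_v = abs(c2 - rec[y - 1][x] - rec[y + 1][x])           # above / below
    filt_d0 = abs(c2 - rec[y - 1][x - 1] - rec[y + 1][x + 1])  # top left / bottom right
    filt_d1 = abs(c2 - rec[y - 1][x + 1] - rec[y + 1][x - 1])  # top right / bottom left
    return filt_h, filt_v, filt_d0, filt_d1


# A bright vertical line: no change along the vertical direction.
stripes = [[0, 10, 0], [0, 10, 0], [0, 10, 0]]
print(laplacians(stripes, 1, 1))
```

For the vertical line in this example, only the vertical measure is zero, which is what allows the later steps to infer the orientation of the local structure.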
Then, these variables are summed up in a local neighborhood around the current sample to get a more reliable estimate of the directionality of the neighborhood as follows, where sumH indicates the sum of filtH, sumV indicates the sum of filtV, sumD0 indicates the sum of filtD0, sumD1 indicates the sum of filtD1, and sumOfHV indicates the sum of sumH and sumV from VVC draft below:
sumH[x][y]=ΣiΣjfiltH[i][j], with i=−2 . . . 5, j=minY . . . maxY (1475)
sumV[x][y]=ΣiΣjfiltV[i][j], with i=−2 . . . 5, j=minY . . . maxY (1476)
sumD0[x][y]=ΣiΣjfiltD0[i][j], with i=−2 . . . 5, j=minY . . . maxY (1477)
sumD1[x][y]=ΣiΣjfiltD1[i][j], with i=−2 . . . 5, j=minY . . . maxY (1478)
sumOfHV[x][y]=sumH[x][y]+sumV[x][y] (1479)
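As a rough sketch of the summation step, the following Python function accumulates the four measures over an 8×8 window around the top-left sample of a 4×4 block. The name directional_sums, the rec[y][x] layout, and the default vertical range of −2 to 5 (standing in for minY to maxY) are assumptions of this sketch; border clamping is again omitted.

```python
def directional_sums(rec, x4, y4, i_range=range(-2, 6), j_range=range(-2, 6)):
    """Sum the four gradient measures over a window anchored at (x4, y4).

    rec[y][x] is a list of rows. j_range stands in for minY..maxY and its
    default value is an assumption of this sketch.
    """
    sum_h = sum_v = sum_d0 = sum_d1 = 0
    for j in j_range:
        for i in i_range:
            x, y = x4 + i, y4 + j
            c2 = rec[y][x] << 1
            sum_h += abs(c2 - rec[y][x - 1] - rec[y][x + 1])
            sum_v += abs(c2 - rec[y - 1][x] - rec[y + 1][x])
            sum_d0 += abs(c2 - rec[y - 1][x - 1] - rec[y + 1][x + 1])
            sum_d1 += abs(c2 - rec[y - 1][x + 1] - rec[y + 1][x - 1])
    return {"sumH": sum_h, "sumV": sum_v, "sumD0": sum_d0,
            "sumD1": sum_d1, "sumOfHV": sum_h + sum_v}


# 16x16 picture made of alternating dark and bright columns.
picture = [[10 * (x % 2) for x in range(16)] for _ in range(16)]
print(directional_sums(picture, 4, 4))
```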
Finally, based on these metrics, a classification is made to determine which set of filters filtIdx to use for the current sample and also a transposeIdx such that several directionalities can share the same filter coefficients, from VVC draft below:
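The classification filter index array filtIdx and transpose index array transposeIdx are derived by the following steps:
If sumV[x>>2][y>>2] is greater than sumH[x>>2][y>>2], the following applies:
hv1=sumV[x>>2][y>>2] (1480)
hv0=sumH[x>>2][y>>2] (1481)
dirHV=1 (1482)
Otherwise, the following applies:
hv1=sumH[x>>2][y>>2] (1483)
hv0=sumV[x>>2][y>>2] (1484)
dirHV=3 (1485)
If sumD0[x>>2][y>>2] is greater than sumD1[x>>2][y>>2], the following applies:
d1=sumD0[x>>2][y>>2] (1486)
d0=sumD1[x>>2][y>>2] (1487)
dirD=0 (1488)
Otherwise, the following applies:
d1=sumD1[x>>2][y>>2] (1489)
d0=sumD0[x>>2][y>>2] (1490)
dirD=2 (1491)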
hvd1=(d1*hv0>hv1*d0)?d1:hv1 (1492)
hvd0=(d1*hv0>hv1*d0)?d0:hv0 (1493)
dir1[x][y]=(d1*hv0>hv1*d0)?dirD:dirHV (1494)
dir2[x][y]=(d1*hv0>hv1*d0)?dirHV:dirD (1495)
dirS[x][y]=(hvd1*2>9*hvd0)?2:((hvd1>2*hvd0)?1:0) (1496)
varTab[ ]={0,1,2,2,2,2,2,3,3,3,3,3,3,3,3,4} (1497)
avgVar[x][y]=varTab[Clip3(0,15,(sumOfHV[x>>2][y>>2]*ac[x>>2][y>>2])>>(BitDepth−1))] (1498)
transposeTable[ ]={0,1,0,2,2,3,1,3}
transposeIdx[x][y]=transposeTable[dir1[x][y]*2+(dir2[x][y]>>1)]
filtIdx[x][y]=avgVar[x][y]
When dirS[x][y] is not equal to 0, filtIdx[x][y] is modified as follows:
filtIdx[x][y]+=(((dir1[x][y]&0x1)<<1)+dirS[x][y])*5 (1499)
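The derivation steps can be summarized in a short Python sketch. The function name classify is hypothetical, the value ac is treated as an externally supplied activity scaling factor, and two points are assumptions of the sketch: the comparisons that select between the two alternative assignments are taken to be sumV > sumH and sumD0 > sumD1, and the directional addition to filtIdx is skipped when dirS is 0.

```python
VAR_TAB = [0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4]
TRANSPOSE_TABLE = [0, 1, 0, 2, 2, 3, 1, 3]


def classify(sum_h, sum_v, sum_d0, sum_d1, ac, bit_depth):
    """Return (filtIdx, transposeIdx) for one 4x4 block (illustrative only)."""
    # Larger and smaller of the horizontal/vertical sums (assumed comparison).
    if sum_v > sum_h:
        hv1, hv0, dir_hv = sum_v, sum_h, 1
    else:
        hv1, hv0, dir_hv = sum_h, sum_v, 3
    # Larger and smaller of the two diagonal sums (assumed comparison).
    if sum_d0 > sum_d1:
        d1, d0, dir_d = sum_d0, sum_d1, 0
    else:
        d1, d0, dir_d = sum_d1, sum_d0, 2

    diagonal_dominates = d1 * hv0 > hv1 * d0
    hvd1 = d1 if diagonal_dominates else hv1
    hvd0 = d0 if diagonal_dominates else hv0
    dir1 = dir_d if diagonal_dominates else dir_hv
    dir2 = dir_hv if diagonal_dominates else dir_d
    dir_s = 2 if hvd1 * 2 > 9 * hvd0 else (1 if hvd1 > 2 * hvd0 else 0)

    activity = ((sum_h + sum_v) * ac) >> (bit_depth - 1)
    avg_var = VAR_TAB[min(15, max(0, activity))]  # Clip3(0, 15, ...)

    transpose_idx = TRANSPOSE_TABLE[dir1 * 2 + (dir2 >> 1)]
    filt_idx = avg_var
    if dir_s != 0:  # directional classes only
        filt_idx += (((dir1 & 0x1) << 1) + dir_s) * 5
    return filt_idx, transpose_idx


print(classify(1280, 0, 1280, 1280, ac=64, bit_depth=10))
```

With strong horizontal gradients and no vertical gradients, as in the call above, the sketch lands in the highest horizontal/vertical class; the input values and ac=64 are arbitrary illustration values.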
From the above, it can be seen that values of filtIdx from 0 to 4 do not have any specific directional characteristics. A value of filtIdx greater than 4 corresponds to directionality of the samples, since this means that dirS is greater than 0. Studying the addition to filtIdx,
filtIdx[x][y]+=(((dir1[x][y]&0x1)<<1)+dirS[x][y])*5,
we see that if there is a diagonal directionality, i.e., if dir1 is either 0 or 2, the first term will be zero, and either 1*5 (if dirS=1) or 2*5 (if dirS=2) can be added. (If dirS=0, the addition will not be done.) Hence, all values of filtIdx from 5 to 14 correspond to a diagonal directionality of the samples. Likewise, if there is a horizontal or vertical directionality, i.e., if dir1 is either 1 or 3, then the first term (dir1 & 1)<<1 will become 2. Therefore, in this case, either (2+1)*5 (if dirS=1) or (2+2)*5 (if dirS=2) will be added, resulting in values between 15 and 24. Hence, we have concluded that filtIdx indicates the directionality of the surrounding samples in the following way as described in Table 1:
Here, transposeIdx equal to 0 corresponds to no transpose of the filter coefficients, transposeIdx equal to 1 corresponds to mirroring the filter coefficients along the diagonal from top right to bottom left, transposeIdx equal to 2 corresponds to mirroring the filter coefficients along the vertical axis, and transposeIdx equal to 3 corresponds to rotating the filter coefficients 90 degrees.
This means that, when filtIdx is between 15 and 24 and transposeIdx is equal to 3, the local structure around the current sample has a vertical directionality, and, when transposeIdx is equal to 0, the local structure around the current sample has a horizontal directionality.
It is possible for the encoder to signal one set of coefficients for each of the 25 classes. In VVC, the ALF coefficients are signaled in adaptation parameter sets (APSs), which can then be referred to by an APS index that determines which of the defined sets to use when decoding pictures. The decoder will then first decide which class a sample belongs to, and then select the appropriate set of coefficients to filter the sample. However, signaling 25 sets of coefficients can be costly. Hence, the VVC standard also allows that only a few of the 25 classes are filtered using unique sets of coefficients. The remaining classes may reuse a set of coefficients used in another class, or it may be determined that they should not be filtered at all. For samples belonging to Cb or Cr, i.e., for chroma samples, no classification is used and the same set of coefficients is used for all samples.
Transmitting the filter coefficients is costly, and, therefore, the same coefficient value is used for two filter positions. For luma (samples in the Y-component), the coefficients are re-used in the way shown in
Assume R(x,y) is the sample to be filtered, situated in the middle of the
The filtered version of the reconstructed sample in position (x,y), which we will denote RF(x,y), is calculated in the following way from VVC specification equation 1411 to 1426 and Table 43, where (x,y)=(hx,vy) and C0=f[idx[0]], C1=f[idx[1]], C2=f[idx[2]], C3=f[idx[3]], C4=f[idx[4]], C5=f[idx[5]], C6=f[idx[6]], C7=f[idx[7]], C8=f[idx[8]], C9=f[idx[9]], C10=f[idx[10]] and C11=f[idx[11]]:
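If AlfCtbFiltSetIdxY[xCtb>>CtbLog2SizeY][yCtb>>CtbLog2SizeY] is less than 16, the following applies:
i=AlfCtbFiltSetIdxY[xCtb>>CtbLog2SizeY][yCtb>>CtbLog2SizeY] (1453)
f[j]=AlfFixFiltCoeff[AlfClassToFiltMap[i][filtIdx[x][y]]][j] (1454)
c[j]=2^BitDepth (1455)
Otherwise, the following applies:
i=slice_alf_aps_id_luma[AlfCtbFiltSetIdxY[xCtb>>CtbLog2SizeY][yCtb>>CtbLog2SizeY]−16] (1456)
f[j]=AlfCoeffL[i][filtIdx[x][y]][j] (1457)
c[j]=AlfClipL[i][filtIdx[x][y]][j] (1458)
If transposeIdx[x][y] is equal to 1, the following applies:
idx[ ]={9,4,10,8,1,5,11,7,3,0,2,6} (1459)
Otherwise, if transposeIdx[x][y] is equal to 2, the following applies:
idx[ ]={0,3,2,1,8,7,6,5,4,9,10,11} (1460)
Otherwise, if transposeIdx[x][y] is equal to 3, the following applies:
idx[ ]={9,8,10,4,3,7,11,5,1,0,2,6} (1461)
Otherwise, the following applies:
idx[ ]={0,1,2,3,4,5,6,7,8,9,10,11} (1462)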
hx+i=Clip3(0,pic_width_in_luma_samples−1,xCtb+x+i) (1463)
vy+j=Clip3(0,pic_height_in_luma_samples−1,yCtb+y+j) (1464)
curr=recPicture[hx][vy] (1465)
sum=f[idx[0]]*(Clip3(−c[idx[0]],c[idx[0]],recPicture[hx][vy+y3]−curr)+Clip3(−c[idx[0]],c[idx[0]],recPicture[hx][vy−y3]−curr))
+f[idx[1]]*(Clip3(−c[idx[1]],c[idx[1]],recPicture[hx+1][vy+y2]−curr)+Clip3(−c[idx[1]],c[idx[1]],recPicture[hx−1][vy−y2]−curr))
+f[idx[2]]*(Clip3(−c[idx[2]],c[idx[2]],recPicture[hx][vy+y2]−curr)+Clip3(−c[idx[2]],c[idx[2]],recPicture[hx][vy−y2]−curr))
+f[idx[3]]*(Clip3(−c[idx[3]],c[idx[3]],recPicture[hx−1][vy+y2]−curr)+Clip3(−c[idx[3]],c[idx[3]],recPicture[hx+1][vy−y2]−curr))
+f[idx[4]]*(Clip3(−c[idx[4]],c[idx[4]],recPicture[hx+2][vy+y1]−curr)+Clip3(−c[idx[4]],c[idx[4]],recPicture[hx−2][vy−y1]−curr))
+f[idx[5]]*(Clip3(−c[idx[5]],c[idx[5]],recPicture[hx+1][vy+y1]−curr)+Clip3(−c[idx[5]],c[idx[5]],recPicture[hx−1][vy−y1]−curr))
+f[idx[6]]*(Clip3(−c[idx[6]],c[idx[6]],recPicture[hx][vy+y1]−curr)+Clip3(−c[idx[6]],c[idx[6]],recPicture[hx][vy−y1]−curr))
+f[idx[7]]*(Clip3(−c[idx[7]],c[idx[7]],recPicture[hx−1][vy+y1]−curr)+Clip3(−c[idx[7]],c[idx[7]],recPicture[hx+1][vy−y1]−curr))
+f[idx[8]]*(Clip3(−c[idx[8]],c[idx[8]],recPicture[hx−2][vy+y1]−curr)+Clip3(−c[idx[8]],c[idx[8]],recPicture[hx+2][vy−y1]−curr))
+f[idx[9]]*(Clip3(−c[idx[9]],c[idx[9]],recPicture[hx+3][vy]−curr)+Clip3(−c[idx[9]],c[idx[9]],recPicture[hx−3][vy]−curr))
+f[idx[10]]*(Clip3(−c[idx[10]],c[idx[10]],recPicture[hx+2][vy]−curr)+Clip3(−c[idx[10]],c[idx[10]],recPicture[hx−2][vy]−curr))
+f[idx[11]]*(Clip3(−c[idx[11]],c[idx[11]],recPicture[hx+1][vy]−curr)+Clip3(−c[idx[11]],c[idx[11]],recPicture[hx−1][vy]−curr)) (1466)
sum=curr+((sum+64)>>alfShiftY) (1467)
alfPictureL[xCtb+x][yCtb+y]=Clip3(0,(1<<BitDepth)−1,sum) (1468)
CtbSizeY is the vertical size of the coding tree unit (CTU), which in VVC is typically 128×128. Here, the Clip3(x,y,z) operation simply makes sure that the value z never exceeds y or goes below x:
Clip3(x,y,z)=x if z<x; y if z>y; z otherwise
The clipping parameters “c[x]” are also to be signaled from the encoder to the decoder.
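The structure of the luma filtering equations can be illustrated with a reduced Python sketch. It is not the full 12-coefficient filter: the function alf_filter_sample, the taps argument (one entry per symmetric pair of neighbors), the rec[y][x] layout, and the default shift of 7 are assumptions made for illustration, while the rounding offset of 64 and the clipping follow the equations above.

```python
def clip3(lo, hi, z):
    """Clamp z to the range [lo, hi]."""
    return lo if z < lo else hi if z > hi else z


def alf_filter_sample(rec, x, y, taps, bit_depth, shift=7):
    """Filter one sample with clipped neighbor differences.

    taps is a list of ((dx, dy), coeff, clip). Each entry is applied to the
    neighbor at (x+dx, y+dy) and to its mirror at (x-dx, y-dy), so one
    coefficient serves two filter positions.
    """
    curr = rec[y][x]
    total = 0
    for (dx, dy), coeff, clip in taps:
        total += coeff * (
            clip3(-clip, clip, rec[y + dy][x + dx] - curr)
            + clip3(-clip, clip, rec[y - dy][x - dx] - curr)
        )
    # Python's >> floors for negative values, like an arithmetic shift.
    total = curr + ((total + 64) >> shift)
    return clip3(0, (1 << bit_depth) - 1, total)


rec = [[100, 100, 100], [100, 108, 100], [100, 100, 100]]
wide = [((1, 0), 32, 1024), ((0, 1), 32, 1024)]   # clipping has no effect
tight = [((1, 0), 32, 2), ((0, 1), 32, 2)]        # differences limited to +/-2
print(alf_filter_sample(rec, 1, 1, wide, bit_depth=8))
print(alf_filter_sample(rec, 1, 1, tight, bit_depth=8))
```

The second call shows how a small clipping parameter limits how far an outlier sample is pulled toward its neighbors.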
The ALF filter is designed to preserve the DC gain, which means that the sum of all filter coefficients, including the coefficient for the current sample, is equal to 128. Based on this design, the modification that ALF makes to a sample can be derived from equation 1466: excluding the clipping, the modification equals the sum, over the neighboring samples, of the difference between each neighboring sample and the current sample multiplied by the respective filter coefficient. When referring to filter coefficients in this disclosure, we mainly refer to these filter coefficients (e.g., excluding the center filter coefficient).
A similar filter design as shown above is used for ALF of chroma components but without use of any classification.
In the reference software for VVC Test Model (VTM)-10.0, the filter coefficients and the clipping parameters are optimized in rate distortion sense (e.g., to minimize the mean squared error while also considering the bits for transmission of filter coefficients and clipping parameters).
Section 8.8.5.7 of the VVC Specification describes a cross-component filtering process. The text of this section is reproduced below.
8.8.5.7 Cross-Component Filtering Process
Inputs of this process are:
hx+i=Clip3(0,pic_width_in_luma_samples−1,xL+i) (1528)
vy+j=Clip3(0,pic_height_in_luma_samples−1,yL+j) (1529)
curr=alfPictureC[xCtbC+x][yCtbC+y] (1530)
f[j]=CcAlfCoeff[j] (1531)
sum=f[0]*(recPictureL[hx][vy−yP1]−recPictureL[hx][vy])+f[1]*(recPictureL[hx−1][vy]−recPictureL[hx][vy])+f[2]*(recPictureL[hx+1][vy]−recPictureL[hx][vy])+f[3]*(recPictureL[hx−1][vy+yP1]−recPictureL[hx][vy])+f[4]*(recPictureL[hx][vy+yP1]−recPictureL[hx][vy])+f[5]*(recPictureL[hx+1][vy+yP1]−recPictureL[hx][vy])+f[6]*(recPictureL[hx][vy+yP2]−recPictureL[hx][vy]) (1532)
scaledSum=Clip3(−(1<<(BitDepth−1)),(1<<(BitDepth−1))−1,(sum+64)>>7) (1533)
sum=curr+scaledSum (1534)
ccAlfPicture[xCtbC+x][yCtbC+y]=Clip3(0,(1<<BitDepth)−1,sum) (1535)
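Equations 1532 to 1535 can be sketched in Python as follows. The function name cc_alf_sample, the luma[y][x] layout, and the default vertical offsets yP1=1 and yP2=2 are assumptions of this sketch; the mapping from the chroma position to the luma position (xL, yL) and the coordinate clamping are left to the caller.

```python
def clip3(lo, hi, z):
    """Clamp z to the range [lo, hi]."""
    return lo if z < lo else hi if z > hi else z


def cc_alf_sample(chroma_curr, luma, xl, yl, coeffs, bit_depth, y_p1=1, y_p2=2):
    """Refine one chroma sample from seven luma neighbor differences."""
    center = luma[yl][xl]
    # (dx, dy) offsets in the order of f[0]..f[6] in equation 1532.
    offsets = [(0, -y_p1), (-1, 0), (1, 0),
               (-1, y_p1), (0, y_p1), (1, y_p1), (0, y_p2)]
    total = sum(f * (luma[yl + dy][xl + dx] - center)
                for f, (dx, dy) in zip(coeffs, offsets))
    half = 1 << (bit_depth - 1)
    scaled = clip3(-half, half - 1, (total + 64) >> 7)
    return clip3(0, (1 << bit_depth) - 1, chroma_curr + scaled)


luma = [[50, 50, 50] for _ in range(4)]
luma[2][1] = 178  # bright luma sample directly below the co-located position
print(cc_alf_sample(100, luma, 1, 1, [0, 0, 0, 0, 2, 0, 0], bit_depth=8))
```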
One problem with the existing method for rate distortion optimization of filter coefficients in VVC Test Model (VTM) is that it only focuses on the minimization of error and rate and does not allow for flexible control of the filter strength for adaptive loop filter (ALF). Controlling the filter strength can be useful to avoid removing natural texture and also to give a desired amount of smoothing. Typically, the more filtering that is used, the smoother the image and the more natural texture (such as gravel or grass) is lost. At the same time, the less filtering that is used, the more artifacts remain in the image after ALF filtering. The current method for calculating the filters in VTM only looks at the best peak signal-to-noise ratio (PSNR) for a certain bit rate, which may give an over-smoothed look.
Aspects of the invention may overcome one or more of the problems with the existing method for rate distortion optimization of filter coefficients in VTM by applying a scaling factor on determined ALF coefficients before encoding them into the video bitstream. In some aspects, applying the scaling factor on determined ALF coefficients may make sure that the strength of the filter is kept sufficiently low. Aspects of the invention may provide means to control the amount of filter strength for ALF. Aspects of the invention may provide improved visual quality of VVC (e.g., better visual quality than VTM-10.0).
According to the first aspect of the present invention there is provided a method performed by an encoder for encoding an image. The method comprises determining adaptive loop filter, ALF, coefficient values. The method comprises determining a scaling factor. The method further comprises generating scaled ALF coefficient values by applying the scaling factor to one or more of the ALF coefficient values. The method comprises providing the scaled ALF coefficient values to a decoder, wherein providing the scaled ALF coefficient values to the decoder comprises encoding the scaled ALF coefficient values in a bitstream and conveying the bitstream over a network. The determined ALF coefficient values reduce an error between reconstructed image components and original image components, and the determined scaling factor improves subjective performance for the image to be encoded.
In some embodiments, applying the scaling factor to one or more of the ALF coefficient values may include applying the scaling factor as a multiplication of the one or more of the ALF coefficient values in floating point representation. In some embodiments, applying the scaling factor to one or more of the ALF coefficient values may include applying the scaling factor as a multiplication, addition, and/or shift of one or more filter coefficients in fixed point representation.
In some embodiments, determining the ALF coefficient values may include solving a least-squares problem.
In some embodiments, determining the scaling factor may include determining a strength of filtering with the ALF coefficient values. In some embodiments, determining the strength of filtering with the ALF coefficient values may include calculating a sum of the absolute values of the ALF coefficient values. In some embodiments, determining the scaling factor may further include comparing the sum of the absolute values of the ALF coefficient values to 128. In some embodiments, determining the strength of filtering with the ALF coefficient values may include calculating a sum of the squares of the ALF coefficient values. In some embodiments, determining the strength of filtering with the ALF coefficient values may include calculating a square root of a sum of the squares of the ALF coefficient values. In some embodiments, the strength of filtering with the ALF coefficient values may be determined based on a quantization parameter (QP).
In some embodiments, the scaling factor may be determined based on the strength of filtering with the ALF coefficient values.
In some embodiments, determining the scaling factor may include determining a classification type (e.g., vertical/horizontal, diagonal, and non-oriented), and the determined scaling factor may be based on the determined classification type. In some embodiments, the determined scaling factor may be based on whether the image is an intra coded picture or an inter coded picture.
In some embodiments, the method may further include determining that the scaling factor is not below 1 and using fixed ALF coefficient values only if the determined scaling factor is not below 1. In some embodiments, the method may further include determining that a strength of filtering with fixed filter coefficients is less than a threshold and using the fixed ALF coefficient values only if the determined strength is less than the threshold.
In some embodiments, the method may further include quantizing the scaled ALF coefficient values and adjusting the quantized coefficient values. In some embodiments, the method may further include determining a strength of filtering with the scaled ALF coefficient values and determining a strength of filtering with the adjusted quantized coefficient values. In some embodiments, the adjusted quantized coefficient values may be such that the strength of filtering with the adjusted quantized coefficient values is not greater than the strength of filtering with the scaled ALF coefficient values by more than a threshold amount. In some embodiments, adjusting the quantized coefficient values may include only adjusting the quantized coefficient values if the determined scaling factor is less than 1.
According to the second aspect of the present invention there is provided an apparatus adapted to perform the method according to the first aspect.
According to the third aspect of the present invention there is provided a method performed by a decoder for decoding an image. The method comprises receiving scaled ALF coefficient values signaled by an encoder, reconstructing image components, and filtering the image components by applying the scaled ALF coefficient values to generate final reconstructed image components.
According to the fourth aspect of the present invention there is provided an apparatus adapted to perform the method according to the third aspect.
According to the fifth aspect of the present invention there is provided a computer program comprising instructions for adapting an apparatus to perform the method according to the first or the third aspect.
According to the sixth aspect of the present invention there is provided a carrier comprising the computer program, and the carrier may be one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
According to the seventh aspect of the present invention there is provided an apparatus comprising processing circuitry and a memory, the memory comprising instructions executable by said processing circuitry, wherein the apparatus is operative to perform any of the methods set forth above.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
System
In some embodiments, the encoder 202 (e.g., the ALF portion of the loop filter 100 of the encoder 202) may be configured to determine ALF coefficient values, determine one or more scaling factors, and to generate scaled ALF coefficient values by applying the one or more scaling factors to the determined ALF coefficient values. The encoder 202 may signal the scaled ALF coefficient values to the decoder 204 (e.g., by encoding them into the video bit stream). The decoder 204 may be configured to use the scaled ALF coefficient values to decode an image. That is, the decoder 204 may be configured to reconstruct image components (e.g., to get YSAO, CbSAO, and CrSAO) and obtain the scaled filter coefficients from the bit stream, and the decoder 204 (e.g., the ALF portion of the loop filter 100 of the decoder 204) may be configured to filter the image components by applying the scaled ALF coefficient values to generate the final reconstructed image components (e.g., YALF, CbALF, and CrALF).
In some embodiments, the system 200 may use the one or more scaling factors to control the strength of ALF filtering. In some embodiments, a scaling factor may attenuate or amplify the strength of the ALF filter. For example, in some embodiments, a scaling factor that is below 1 may attenuate the strength of the ALF filter, a scaling factor that is above 1 may amplify the strength of the ALF filter, and a scaling factor that is equal to 1 may keep the filter strength the same.
In some embodiments, the encoder 202 (e.g., the ALF portion of the loop filter 100 of the encoder 202) may apply a scaling factor as a multiplication of one or more filter coefficients in floating point representation (e.g., Cnew(i)=C(i)*sf, where C(i) are the filter coefficients for non-center neighboring samples, and sf is the scaling factor). In some alternative embodiments, the encoder 202 (e.g., the ALF portion of the loop filter 100 of the encoder 202) may apply a scaling factor as a multiplication, addition, and/or shift of one or more filter coefficients in fixed point representation (e.g., Cnew(i)=C(i)>>sfShift, where C(i) are the filter coefficients for non-center neighboring samples, and 2^(−sfShift) is the scaling factor).
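The two ways of applying a scaling factor can be sketched in Python as follows. The function names are hypothetical, and scale_fixed is an additional illustration (an assumption, not taken from the description) of combining a multiplication, an addition, and a shift, with the scaling factor expressed in units of 1/128.

```python
def scale_float(coeffs, sf):
    """Floating point scaling: Cnew(i) = C(i) * sf."""
    return [c * sf for c in coeffs]


def scale_shift(coeffs, sf_shift):
    """Fixed point scaling by a right shift: Cnew(i) = C(i) >> sfShift.

    The effective scaling factor is 2^(-sfShift). Python's >> rounds
    negative values toward minus infinity.
    """
    return [c >> sf_shift for c in coeffs]


def scale_fixed(coeffs, sf_q7):
    """Fixed point scaling with rounding; sf_q7 is the factor times 128."""
    return [(c * sf_q7 + 64) >> 7 for c in coeffs]


coeffs = [10, -20, -5]
print(scale_float(coeffs, 0.5))
print(scale_shift(coeffs, 1))
print(scale_fixed(coeffs, 64))
```

The shift-only variant is the cheapest but restricts the scaling factor to powers of two, whereas the multiply-add-shift variant allows finer steps at the cost of one multiplication per coefficient.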
In some embodiments, determining one or more scaling factors may include determining one or more strengths of filtering (e.g., ALF filtering). In some embodiments, the encoder 202 may determine the strength of filtering as the sum of the absolute value of filter coefficients. In some alternative embodiments, the encoder 202 may determine the strength of filtering as the sum of the squares of the filter coefficients. In some other alternative embodiments, the encoder 202 may determine the strength of filtering as the square root of the sum of the squares of the filter coefficients. In some further alternative embodiments, the encoder 202 may determine the strength of filtering based on a quantization parameter (QP).
In some embodiments, the one or more scaling factors may be determined based on the one or more determined filter strengths. In some embodiments, determining the one or more scaling factors may include comparing the one or more determined strengths of filtering to one or more strength thresholds (e.g., a scaling factor that attenuates the filter strength may be used if the filter strength is above a strength threshold).
In some embodiments, the strength of ALF filtering is calculated by the sum of the absolute value of filter coefficients. In some embodiments, in floating point arithmetic, a sum of 1 may correspond to a filter that is un-restricted, a sum below 1 may correspond to a filter that is attenuated, and a sum above 1 may correspond to a filter that is amplified. In some fixed point arithmetic embodiments with a quantization of 1/128, a sum of 128 corresponds to a filter that is un-restricted, a sum below 128 corresponds to a filter that is attenuating, and a sum above 128 corresponds to a filter that has amplification.
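The strength measures described above can be sketched in Python as follows. The three strength functions follow the description directly; attenuating_scale is a hypothetical rule, assumed here for illustration, that leaves a filter untouched when its strength is at or below a threshold and otherwise scales it down so that the sum of absolute values meets the threshold.

```python
import math


def strength_abs(coeffs):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in coeffs)


def strength_squares(coeffs):
    """Sum of the squares of the coefficients."""
    return sum(c * c for c in coeffs)


def strength_l2(coeffs):
    """Square root of the sum of the squares of the coefficients."""
    return math.sqrt(strength_squares(coeffs))


def attenuating_scale(coeffs, threshold=128):
    """Hypothetical rule: attenuate only when the strength exceeds threshold."""
    s = strength_abs(coeffs)
    return 1.0 if s <= threshold else threshold / s


# Twelve non-center coefficients in fixed point with a quantization of 1/128.
coeffs = [10, 0, -20] + [30] * 8 + [-40]
print(strength_abs(coeffs), strength_squares(coeffs), strength_l2(coeffs))
print(attenuating_scale(coeffs))
```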
In some embodiments, the encoder 202 may determine the strength of filtering separately for positive filter coefficients and negative filter coefficients. In some embodiments, the strength of filtering for positive filter coefficients may be the sum of positive filter coefficients, and the strength of filtering for negative filter coefficients may be the sum of negative filter coefficients.
Positive filter coefficients decrease the sample distance between the current sample and the neighboring samples with positive coefficients. Negative filter coefficients increase the sample distance between the current sample and the neighboring samples with negative coefficients.
In some embodiments, to control the strength of filtering, a first scaling factor may be used for positive filter coefficients, and a second scaling factor may be used for negative filter coefficients.
In some embodiments, a first scaling factor for positive filter coefficients that is equal to 1 and a second scaling factor for negative filter coefficients that is less than 1 may increase the low-pass effect of the filter. In some embodiments, a first scaling factor for positive filter coefficients that is equal to 1 and a second scaling factor for negative filter coefficients that is greater than 1 may increase the high-pass effect of the filter.
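A minimal Python sketch of sign-dependent strength measurement and scaling is given below. The function names signed_strengths and scale_by_sign are hypothetical, and the coefficient values are arbitrary illustration values.

```python
def signed_strengths(coeffs):
    """Return (sum of positive coefficients, sum of negative coefficients)."""
    positive = sum(c for c in coeffs if c > 0)
    negative = sum(c for c in coeffs if c < 0)
    return positive, negative


def scale_by_sign(coeffs, sf_pos, sf_neg):
    """Apply sf_pos to positive coefficients and sf_neg to all others."""
    return [c * (sf_pos if c > 0 else sf_neg) for c in coeffs]


coeffs = [10, -20, 0, 30]
print(signed_strengths(coeffs))
# Keeping positive coefficients and halving negative ones leans toward low-pass.
print(scale_by_sign(coeffs, 1.0, 0.5))
```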
In some embodiments where ALF allows for the use of fixed filter coefficients (e.g., pre-determined by the specification) that have varying filter strength, the use of fixed filter coefficients may be avoided when applying a scaling factor below 1. This may enable full control of the filter strength for ALF.
Some alternative embodiments may allow for usage of fixed filter coefficients that have a filter strength less than a threshold.
In some embodiments, to enable control of the filter strength of ALF as part of filter optimization or filter selection, a maximum filter strength may be defined and kept track of during optimization and filter selection, and solutions that deviate too much from the maximum filter strength may be avoided.
In some embodiments, after filter coefficients have been quantized, the filter coefficients can be tuned to achieve better objective distortion and/or rate distortion by increasing or decreasing the filter coefficients from the quantized values. This adjustment of filter coefficients after quantization may be referred to as “fine-tuning”. By determining the filter strength before quantization and avoiding refinements in the fine-tuning stage that increase the filter strength more than a threshold, the strength of the filter can be maintained as part of the refinement. In some embodiments, refinement of filter coefficients after quantization may be omitted when the scaling factor is less than 1.
In some embodiments, this means that a condition may be added to the optimization so that the strength of the filter is kept constant during the optimization. As an example, this may mean that, if one coefficient is increased, another one must be decreased to arrive at a filter with the same strength. If the strength is measured as the sum of absolute values of the coefficients, this means that the sum of absolute values of the coefficients must be the same (or almost the same) during the optimization. If the strength is instead measured as the squared sum of absolute values of the coefficients, then the optimization instead must make sure that this measure does not change during optimization. This can be accomplished by minimizing the loss function:
where ck are the coefficients before optimization, dk are the coefficients after optimization, and BDrate(dk) is the loss function that has traditionally been minimized. This can, for instance, be done by taking steps based on the gradient of L,
for a sufficiently small step size α, such as α=0.001, and then updating the coefficients using dk,new=dk,old+Δdk. The variable λ sets the balance between keeping the same strength (large λ) and lowering the BD-rate penalty (small λ).
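One way to picture this update is the Python sketch below. It assumes, purely for illustration, a loss of the form L(d) = BDrate(d) + λ·(Σ dk² − Σ ck²)² and a step Δdk = −α·∂L/∂dk; neither expression is taken from the description. The gradient of the rate term is supplied by the caller through the hypothetical bdrate_grad function, since in practice it would have to be estimated.

```python
def penalty_gradient(d, c, lam):
    """Gradient of lam * (sum(d^2) - sum(c^2))^2 with respect to each d_k."""
    gap = sum(v * v for v in d) - sum(v * v for v in c)
    return [4.0 * lam * gap * v for v in d]


def gradient_step(d, c, lam, alpha, bdrate_grad):
    """Return updated coefficients d_new = d_old + delta_d."""
    g_rate = bdrate_grad(d)             # caller-supplied estimate
    g_pen = penalty_gradient(d, c, lam)
    return [dk - alpha * (gr + gp) for dk, gr, gp in zip(d, g_rate, g_pen)]


def no_rate_gradient(d):
    """Placeholder: pretend the rate term is flat."""
    return [0.0] * len(d)


c = [1.0, 0.0]   # coefficients before optimization
d = [2.0, 0.0]   # current, stronger candidate
print(gradient_step(d, c, lam=1.0, alpha=0.001, bdrate_grad=no_rate_gradient))
```

With a flat rate term, the step only pulls the candidate back toward the original strength, which is the behavior a large λ is meant to enforce.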
If instead the strength of the filter is measured as the sum of absolute values of the coefficients in the filter, the following loss function may instead be used:
Another way of ensuring the same strength is to test pairs of coefficients, where one coefficient is moved to increase the filter strength and the other coefficient is moved to decrease the filter strength by an equal amount or more. If the new combination results in a lower BDrate loss measure, the optimization procedure executes the change, and otherwise leaves it unchanged. It then proceeds to the next pair of coefficients.
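The pairwise procedure can be sketched in Python as follows, with the strength measured as the sum of absolute values. The function pairwise_refine and the caller-supplied loss function (standing in for the BD-rate measure) are hypothetical, and the sketch uses the case where the second coefficient moves by exactly the same amount as the first.

```python
def pairwise_refine(coeffs, loss, step=1):
    """Test coefficient pairs, keeping the sum of absolute values unchanged."""
    best = list(coeffs)
    best_loss = loss(best)
    n = len(best)
    for a in range(n):
        for b in range(n):
            if a == b or abs(best[b]) < step:
                continue  # b has no magnitude to give up
            cand = list(best)
            cand[a] += step if cand[a] >= 0 else -step  # |cand[a]| grows
            cand[b] -= step if cand[b] > 0 else -step   # |cand[b]| shrinks
            cand_loss = loss(cand)
            if cand_loss < best_loss:                   # keep only improvements
                best, best_loss = cand, cand_loss
    return best


def toy_loss(d):
    """Toy stand-in for a rate distortion measure: distance to [3, 1]."""
    return (d[0] - 3) ** 2 + (d[1] - 1) ** 2


print(pairwise_refine([2, 2], toy_loss))
```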
In some alternative embodiments, after “optimal” filter coefficients have been obtained, the “optimal” filter coefficients may be merged such that fewer filters are defined due to rate distortion preference. Doing so can alter the filter strength of the merged filter compared to the filter strength of the “optimal” filter, and this can lead to the use of a stronger filter than necessary. By determining the filter strength before merging filters, mergings that result in filters that deviate too much from the “optimal” filter strengths may be avoided. In some embodiments, merging of filter coefficients may be omitted when the scaling factor is less than 1.
Some further alternative embodiments may include determining the optimal filter coefficients for a coding tree unit (CTU), determining the filter strength for respective filters, and then selecting filter coefficients (after merging, quantization, refinement) that maintain the determined filter strength and otherwise omit ALF for that CTU.
In some embodiments, the filter strength of ALF may be controlled by the quantization parameter (QP). In some embodiments, a scaling factor less than 1 may be used to reduce the magnitude of the filter coefficients when QP is larger than a threshold (e.g., 36). In some embodiments, a filter strength larger than a QP dependent threshold is not allowed.
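A QP-controlled scaling factor of the kind described above might look like the following sketch. The QP threshold of 36 comes from the example in the text; the function names and the reduced factor of 0.75 are assumptions chosen for illustration only.

```python
def qp_scaling_factor(qp, qp_threshold=36, reduced_factor=0.75):
    """Return a factor below 1 only when QP exceeds the threshold.

    reduced_factor is an assumed value; the text only requires it to be < 1.
    """
    return reduced_factor if qp > qp_threshold else 1.0


def scale_coefficients(coeffs, factor):
    """Multiply every filter coefficient by the scaling factor."""
    return [c * factor for c in coeffs]


low_qp = scale_coefficients([10, -20, 30], qp_scaling_factor(32))
high_qp = scale_coefficients([10, -20, 30], qp_scaling_factor(40))
```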
In some embodiments, a scaling factor may be selected by testing different scaling factors and using the scaling factor that improves subjective performance for the content to be encoded. In some alternative embodiments, a scaling factor may be selected automatically by testing different scaling factors and selecting the scaling factor that maintains most of the objective performance of a non-scaled approach.
In some embodiments, the filter strength can be controlled individually for different classification types (e.g., vertical/horizontal, diagonal, and non-oriented). In some embodiments, a specific scaling factor may be used for each of the respective classifications, and the scaling factors for the respective classifications may be different from one another.
In some embodiments, the filter strength may be controlled differently for intra coded pictures than for inter coded pictures. In some embodiments, only filter strengths equal to or less than a pre-defined filter strength may be allowed.
In some embodiments, the filter strength may be controlled by measuring the sum of the squares of the filter coefficients. As an example, if c0=10, c1=0, c2=−20, c3 through c10=30, and c11=−40, then the sum of the squares of the filter coefficients is 10^2+0^2+(−20)^2+8*(30)^2+(−40)^2=9300.
In some embodiments, the encoder 202 may calculate a filter strength measure for all of the filters affecting an image. The maximum filter strength calculated for the filters affecting an image may be regarded as full strength. As an example, if there are four filters that affect an image, and their strength measures are 9300, 3200, 4600, and 10200, respectively, the encoder 202 may regard 10200 as full strength.
In some embodiments, the encoder 202 may apply a scaling factor to each of the filters that has a filter strength above a filter strength threshold. In some embodiments, the filter strength threshold may be a percentage or a fraction of the full strength measure. For example, a threshold factor of s=1.0 may result in no change to the filters because no filter will have a filter strength above the full strength measure (e.g., 10200×1.0). However, if the threshold factor is s=0.75, the encoder may make sure that no filter gets a strength larger than 0.75*10200=7650. This means that the filters with strength 3200 and 4600 will not be changed, but the filters with strength 9300 and 10200 will be scaled down until their strength is less than or equal to 7650. In some embodiments, scaling down the filters may be done by multiplying the filter coefficients by a factor r until the filter is below the strength threshold. In the example where c0=10, c1=0, c2=−20, c3 through c10=30, and c11=−40, scaling every filter coefficient by r=0.9 will give the new filter coefficients c0=9, c1=0, c2=−18, c3 through c10=27, and c11=−36, and the strength after the scaling will become 7533, which is less than the filter strength threshold of 7650.
In some embodiments, the encoder 202 may determine the scaling factor r as:
r=sqrt(s*maxstrength/strength)=sqrt(0.75*10200/9300)=sqrt(0.8226)=0.9070.
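The worked example above can be reproduced with a short sketch. The function names are hypothetical; the numbers are the ones used in the text. Because the strength here is a sum of squares, scaling every coefficient by r scales the strength by r squared, which is why the factor is a square root.

```python
import math


def strength_sum_of_squares(coeffs):
    """Filter strength measured as the sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def scaling_factors(strengths, s):
    """One factor per filter: sqrt(s * maxstrength / strength) for filters
    above the threshold s * maxstrength, and 1.0 for the others."""
    limit = s * max(strengths)
    return [math.sqrt(limit / st) if st > limit else 1.0 for st in strengths]


# c0=10, c1=0, c2=-20, c3..c10=30, c11=-40
example_filter = [10, 0, -20] + [30] * 8 + [-40]
strengths = [strength_sum_of_squares(example_filter), 3200, 4600, 10200]
factors = scaling_factors(strengths, s=0.75)

# Scaling by the computed factor lands on the threshold of 7650.
r = factors[0]
scaled_strength = strength_sum_of_squares([c * r for c in example_filter])
```

With s=0.75, the filters with strength 3200 and 4600 keep a factor of 1.0, while the filter with strength 9300 gets r of approximately 0.9070.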
In some embodiments, this scaling with a factor of r may be performed before quantization. However, this is not required, and, in some alternative embodiments, this scaling with a factor of r may be performed after quantization but before fine-tuning. In some additional embodiments, the scaling with a factor of r may instead be performed after fine tuning.
Some alternative embodiments may be similar to the embodiments using the sum of the squares of the filter coefficients as a measure for filter strength but may instead use the square root of the sum of squares of the filter coefficients as a measure for filter strength. As an example, if c0=10, c1=0, c2=−20, c3 through c10=30, and c11=−40, then the sum of the squares of the filter coefficients is 10^2+0^2+(−20)^2+8*(30)^2+(−40)^2=9300, and the measure is sqrt(9300)=96.44.
In some embodiments, the encoder 202 may calculate this measure for all the filters affecting an image. The maximum filter strength measure calculated for the filters affecting an image may be regarded as full strength. As an example, if there are four filters, and their strength measures are 96.44, 56.57, 67.82, and 100.99, respectively, the encoder 202 may regard 100.99 as full strength.
In some embodiments, the encoder 202 may apply a scaling factor to each of the filters that has a filter strength above a filter strength threshold. In some embodiments, the filter strength threshold may be a percentage or a fraction of the full strength measure. In some embodiments, the filter strength threshold may be equal to the full strength measure multiplied by a threshold factor s. For example, a threshold factor of s=1.0 may result in no change to the filters because none of the filters will have a filter strength above the full strength measure (e.g., 100.99×1.0). However, if the threshold factor is s=0.75, the encoder may make sure that no filter gets a strength larger than 0.75*100.99=75.75. This means that the filters with strength 56.57 and 67.82 will not be changed, but the filters with strength 96.44 and 100.99 will be scaled down until their strength is less than or equal to 75.75. In some embodiments, scaling down the filters may be done by multiplying the filter coefficients by a factor r until the filter is below the strength threshold. In the example where c0=10, c1=0, c2=−20, c3 through c10=30, and c11=−40, scaling every filter coefficient by r=0.7 will give the new filter coefficients c0=7, c1=0, c2=−14, c3 through c10=21, and c11=−28, and the strength after the scaling will become 67.51, which is less than the filter strength threshold of 75.75 and will satisfy the criterion. However, scaling every filter coefficient by r=0.8 will give the new filter coefficients c0=8, c1=0, c2=−16, c3 through c10=24, and c11=−32, and the strength after the scaling will become 77.15, which is higher than 75.75 and may not satisfy the criterion.
In some embodiments, the encoder 202 may determine the scaling factor r that perfectly hits the target as:
r=s*maxstrength/strength=0.75*sqrt(10200)/sqrt(9300)=0.75*sqrt(1.0968)=0.7855.
Because the coefficients may need to be rounded to integers afterwards, this calculated scaling factor r may still not hit the target exactly. In our example, multiplying by 0.7855 will give c0=7.854524, c1=0, c2=−15.709048, c3 through c10=23.563572, and c11=−31.418096. After rounding to integers, we get c0=8, c1=0, c2=−16, c3 through c10=24, and c11=−31, which gives the strength sqrt(5889)=76.74, which is too high. However, in some embodiments, the encoder 202 may nonetheless regard the scaled filter coefficients as close enough. Hence, in some embodiments, the encoder 202 may determine the filter coefficients by calculating the scaling factor r that perfectly hits the target, applying the calculated scaling factor to the filter coefficients, and then rounding each of the scaled filter coefficients to the closest integer.
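The scale-then-round procedure for the square-root measure can be sketched as below, using the numbers from the example. The function names are hypothetical. Note that Python's built-in `round` resolves exact .5 ties to the nearest even integer, which may differ from an encoder's rounding rule; no ties occur in this example.

```python
import math


def strength_l2(coeffs):
    """Filter strength measured as sqrt of the sum of squared coefficients."""
    return math.sqrt(sum(c * c for c in coeffs))


def scale_to_target(coeffs, s, max_strength):
    """Scale by r = s * maxstrength / strength, then round to integers.

    Filters already at or below the threshold are returned unchanged.
    """
    target = s * max_strength
    current = strength_l2(coeffs)
    if current <= target:
        return 1.0, list(coeffs)
    r = target / current
    return r, [round(c * r) for c in coeffs]


# c0=10, c1=0, c2=-20, c3..c10=30, c11=-40
example_filter = [10, 0, -20] + [30] * 8 + [-40]
r, rounded = scale_to_target(example_filter, s=0.75, max_strength=math.sqrt(10200))
final_strength = strength_l2(rounded)
```

After rounding, the strength is about 76.74, slightly above the target of about 75.75, which illustrates why the text treats the rounded result as "close enough" rather than exact.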
In some embodiments, the encoder 202 may control the filter strength based on how much ALF is used. In some embodiments, the encoder 202 may set a scaling factor that reduces the filter strength if ALF is used heavily.
Results: Visual Comparison
It may be hard to visualize video quality with a frame. However, as shown in
Results: Objective Comparison
Bjøntegaard delta rate (BDR) results for Example 1 with a scaling factor of 0.75 when compared against VTM-10.0 are shown in the tables below. A figure of −1% means that it is possible to reach the same measured distortion with 1% fewer bits. The results indicate that the solution in embodiment 1 can be set to maintain the BDR of ALF but with less filter strength. Most of the objective benefit of ALF can be kept with a scaling factor of 0.75. In the tables below, values that have not yet been determined are indicated with a “TBD.”
The table below shows all intra over VTM-10.0:
The table below shows random access over VTM-10.0:
The table below shows low-delay B over VTM-10.0:
Flowcharts
In some encoding embodiments, the process 700 may include a step 702 of determining ALF coefficient values. In some embodiments, the ALF coefficient values may be adaptive loop filter (ALF) coefficient values. In some embodiments, the ALF portion of the loop 100 may determine the ALF coefficient values.
In some embodiments, determining the ALF coefficient values in step 702 may include solving a least-squares problem. In some embodiments, the determined ALF coefficient values may reduce an error between reconstructed image components (e.g., YSAO, CbSAO, and CrSAO) and original image components (e.g., Yorg, Cborg, and Crorg).
In some encoding embodiments, the process 700 may include a step 704 of determining a scaling factor. In some embodiments, the determined scaling factor may improve subjective performance for the image to be encoded. In some embodiments, the determined scaling factor may maintain most of the objective performance of a non-scaled approach.
In some embodiments, determining the scaling factor in step 704 may include determining a strength of filtering with the ALF coefficient values. In some embodiments, determining the strength of filtering with the ALF coefficient values may include calculating a sum of the absolute values of the ALF coefficient values. In some alternative embodiments, determining the strength of filtering with the ALF coefficient values may include calculating a sum of the squares of the ALF coefficient values. In some other alternative embodiments, determining the strength of filtering with the ALF coefficient values may include calculating a square root of a sum of the squares of the ALF coefficient values. In some embodiments, the strength of filtering with the ALF coefficient values may be determined based on a quantization parameter (QP).
In some embodiments, the scaling factor may be determined in step 704 based on the strength of filtering with the ALF coefficient values. In some embodiments, determining the scaling factor in step 704 may include comparing the determined strength of filtering with the ALF coefficient values to a strength threshold (e.g., comparing the sum of the absolute values of the ALF coefficient values to a strength threshold of 128 or comparing the QP to a strength threshold of 36). In some embodiments, determining the scaling factor in step 704 may include determining that the strength of filtering with the ALF coefficient values is larger than a strength threshold and, if the strength of filtering with the ALF coefficient values is determined to be larger than the strength threshold, setting the scaling factor such that a strength of filtering with the scaled ALF coefficient values is less than or equal to the strength threshold. In some embodiments, determining the scaling factor in step 704 may include determining that the strength of filtering with the ALF coefficient values is larger than a strength threshold and, if the strength of filtering with the ALF coefficient values is determined to be larger than the strength threshold, setting the scaling factor to be less than 1.
In some embodiments, the process 700 (e.g., step 704 of the process 700) may include determining the strength threshold to which the determined strength of filtering with the ALF coefficient values is compared. In some embodiments, determining the strength threshold may include (i) for each filter affecting the image, determining a strength of filtering with ALF coefficient values of the filter and (ii) multiplying a threshold factor by a maximum determined strength of the filters affecting the image. In some embodiments (e.g., embodiments where the determined strength of filtering with the ALF coefficient values is the calculated sum of the squares of the ALF coefficient values), the determined scaling factor may be equal to sqrt(s*maxstrength/strength), where sqrt is the square root, s is the threshold factor, maxstrength is the maximum determined strength of the filters affecting the image, and strength is the determined strength of filtering with the ALF coefficient values. In some embodiments (e.g., embodiments where the determined strength of filtering with the ALF coefficient values is the calculated square root of the sum of the squares of the ALF coefficient values), the determined scaling factor may be equal to s*maxstrength/strength, where s is the threshold factor, maxstrength is the maximum determined strength of the filters affecting the image, and strength is the determined strength of filtering with the ALF coefficient values.
In some embodiments, determining the scaling factor in step 704 may include determining a classification type (e.g., one of vertical/horizontal, diagonal, and non-oriented), and the determined scaling factor may be based on the determined classification type. In some embodiments, the scaling factor determined in step 704 may additionally or alternatively be based on whether the image is an intra coded picture or an inter coded picture.
In some embodiments, the process 700 may include steps of determining that the scaling factor is not below 1 and using fixed ALF coefficient values only if the determined scaling factor is not below 1. In some embodiments, the process 700 may include steps of determining that a strength of filtering with fixed filter coefficients is less than a threshold and using the fixed ALF coefficient values only if the determined strength is less than the threshold.
In some encoding embodiments, the process 700 may include a step 706 of generating scaled ALF coefficient values. In some embodiments, generating the scaled ALF coefficient values may include applying the scaling factor to one or more of the ALF coefficient values, and the scaled ALF coefficient values may be for use by the decoder 204 in filtering image components.
In some embodiments, applying the scaling factor to one or more of the ALF coefficient values in step 706 may include applying the scaling factor as a multiplication of the one or more of the ALF coefficient values in floating point representation. In some alternative embodiments, applying the scaling factor to one or more of the ALF coefficient values in step 706 may include applying the scaling factor as a multiplication, addition, and/or shift of one or more filter coefficients in fixed point representation.
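As a minimal sketch of the fixed-point alternative, a scaling factor can be approximated as multiplier / 2^shift and applied with an integer multiplication, an added rounding offset, and a right shift. The function name and the choice of 96/128 (that is, 0.75 with 7 fractional bits) are assumptions for illustration.

```python
def scale_fixed_point(coeffs, multiplier, shift):
    """Scale integer coefficients by multiplier / 2**shift with rounding.

    Adding half of the divisor before the shift rounds to the nearest
    integer. Python's >> on negative integers is an arithmetic (floor)
    shift, so negative coefficients are handled consistently.
    shift must be at least 1.
    """
    offset = 1 << (shift - 1)
    return [(c * multiplier + offset) >> shift for c in coeffs]


# 96 / 2**7 = 0.75
scaled = scale_fixed_point([10, 0, -20, 30, -40], multiplier=96, shift=7)
```

A floating-point multiplication by 0.75 followed by rounding would give comparable values; the fixed-point form avoids floating-point arithmetic altogether.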
In some embodiments, the process 700 (e.g., the step 706 of the process 700) may include rounding the scaled coefficients to the closest integer, and the rounded scaled coefficients may be used by the decoder 204 in filtering image components.
In some embodiments, the process 700 may include an optional step 708 of quantizing the scaled ALF coefficient values. In some embodiments, ALF coefficient values and cross-component (CC) ALF coefficient values may be derived in floating point. In some embodiments where ALF coefficient values are derived in floating point, the ALF coefficient values in floating point may be quantized by multiplying them by 128 and then rounding them to integer values, the max value may be 128, and the min value may be −128. In some embodiments where CC ALF coefficient values are derived in floating point, the CC ALF filter coefficients may be quantized by representing them in multiples of 2, the max value may be 64, and the min value may be −64.
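The quantization of floating-point ALF coefficients described above (multiply by 128, round, and keep the result within −128 to 128) can be sketched as follows. The function name and the sample input values are invented for illustration, and clamping is one assumed way of enforcing the stated max and min values.

```python
def quantize_alf(coeffs_float, scale=128, lo=-128, hi=128):
    """Quantize floating-point coefficients to integers in [lo, hi]."""
    quantized = []
    for c in coeffs_float:
        q = round(c * scale)               # multiply by 128 and round
        quantized.append(max(lo, min(hi, q)))  # clamp to the allowed range
    return quantized


# Invented sample coefficients, not taken from the source.
quantized = quantize_alf([0.07, -0.155, 0.5, 1.2])
```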
In some embodiments, the optional step 708 may additionally include adjusting the quantized coefficient values. In some embodiments, the step 708 may include determining a strength of filtering with the scaled ALF coefficient values and determining a strength of filtering with the adjusted quantized coefficient values. In some embodiments, the adjusted quantized coefficient values may be such that the strength of filtering with the adjusted quantized coefficient values is not greater than the strength of filtering with the scaled ALF coefficient values by more than a threshold amount. In some embodiments, adjusting the quantized coefficient values may include only adjusting the quantized coefficient values if the determined scaling factor is less than 1. In some embodiments, the optional step 708 may include determining optimal filter coefficients for a coding tree unit (CTU) and determining a strength of filtering with the optimal ALF coefficient values, and the adjusted quantized coefficient values may maintain the determined strength of filtering with the optimal ALF coefficient values.
In some embodiments, the process 700 may include an optional step 710 of providing the scaled ALF coefficient values to the decoder 204. In some embodiments, the scaled ALF coefficient values provided to the decoder 204 may be the quantized coefficient values (or the adjusted quantized coefficient values). In some embodiments, providing the scaled ALF coefficient values to the decoder 204 may include encoding the scaled ALF coefficient values in a bitstream and conveying the bitstream over the network 110.
In some embodiments, the process 700 may include avoiding merging of ALF coefficient values when the scaling factor is less than 1.
In some embodiments, the scaling factor determined in step 704 may be a first scaling factor, the one or more of the ALF coefficient values to which the first scaling factor is applied in step 706 may be a first set of the ALF coefficient values, the process 700 (e.g., the step 704 of the process 700) may include determining a second scaling factor, and generating the scaled ALF coefficient values in step 706 may include applying the second scaling factor to one or more of the ALF coefficient values in a second set of the ALF coefficient values. In some embodiments, the first set of the ALF coefficient values may be positive filter coefficients, and the second set of the ALF coefficient values may be negative ALF coefficient values. In some embodiments, determining the first scaling factor may include determining a first strength of filtering, and determining the second scaling factor may include determining a second strength of filtering. In some embodiments, determining the first strength of filtering may include calculating a sum of the first set of the ALF coefficient values, and determining the second strength of filtering may include calculating a sum of the second set of the ALF coefficient values. In some embodiments, the first scaling factor may be determined based on the first strength of filtering, and the second scaling factor may be determined based on the second strength of filtering.
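One way to picture the two-factor variant is the sketch below, which computes one strength from the sum of the positive coefficients and another from the sum of the negative coefficients, then derives a separate factor for each set. The function name, the per-set limits, and the rule "factor = limit / strength, capped at 1" are assumptions; the text only states that each factor is based on its own strength.

```python
def split_scale(coeffs, pos_limit, neg_limit):
    """Scale positive and negative coefficients with separate factors.

    pos_limit and neg_limit are hypothetical per-set strength limits.
    """
    pos_strength = sum(c for c in coeffs if c > 0)
    neg_strength = -sum(c for c in coeffs if c < 0)  # reported as a magnitude
    pos_factor = min(1.0, pos_limit / pos_strength) if pos_strength else 1.0
    neg_factor = min(1.0, neg_limit / neg_strength) if neg_strength else 1.0
    scaled = [c * (pos_factor if c > 0 else neg_factor) for c in coeffs]
    return pos_factor, neg_factor, scaled


# c0=10, c1=0, c2=-20, c3..c10=30, c11=-40: positive sum 250, negative sum 60
example_filter = [10, 0, -20] + [30] * 8 + [-40]
pos_factor, neg_factor, scaled = split_scale(example_filter, pos_limit=200, neg_limit=60)
```

In this example only the positive set exceeds its assumed limit, so the positive coefficients are scaled by 0.8 while the negative coefficients are left unchanged.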
While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SE2021/050928 | 9/23/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63084975 | Sep 2020 | US |