The present invention relates generally to images and video coding. More particularly, an embodiment of the present invention relates to image reshaping in video coding.
BACKGROUND
In 2013, the MPEG group in the International Organization for Standardization (ISO), jointly with the International Telecommunication Union (ITU), released the first draft of the HEVC (also known as H.265) video coding standard (Ref. [4]). More recently, the same group has released a call for evidence to support the development of a next generation coding standard that provides improved coding performance over existing video coding technologies.
As used herein, the term ‘bit depth’ denotes the number of bits used to represent one of the color components of an image. Traditionally, images were coded at 8 bits per color component, per pixel (e.g., 24 bits per pixel); however, modern architectures may now support higher bit depths, such as 10 bits, 12 bits or more.
In a traditional image pipeline, captured images are quantized using a non-linear opto-electronic transfer function (OETF), which converts linear scene light into a non-linear video signal (e.g., gamma-coded RGB or YCbCr). Then, on the receiver, before being displayed, the signal is processed by an electro-optical transfer function (EOTF) which translates video signal values to output screen color values. Such non-linear functions include the traditional “gamma” curve, documented in ITU-R Rec. BT.709 and BT.2020, the “PQ” (perceptual quantization) curve described in SMPTE ST 2084, and the “Hybrid Log-gamma” or “HLG” curve described in Rec. ITU-R BT.2100.
As used herein, the term “forward reshaping” denotes a process of sample-to-sample or codeword-to-codeword mapping of a digital image from its original bit depth and original codewords distribution or representation (e.g., gamma or PQ or HLG, and the like) to an image of the same or different bit depth and a different codewords distribution or representation. Reshaping allows for improved compressibility or improved image quality at a fixed bit rate. For example, without limitation, reshaping may be applied to 10-bit or 12-bit PQ-coded HDR video to improve coding efficiency in a 10-bit video coding architecture. In a receiver, after decompressing the reshaped signal, the receiver may apply an “inverse reshaping function” to restore the signal to its original codeword distribution. As appreciated by the inventors here, as development begins for the next generation of a video coding standard, improved techniques for the integrated reshaping and coding of images are desired. Methods of this invention can be applicable to a variety of video content, including, but not limited to, content in standard dynamic range (SDR) and/or high-dynamic range (HDR).
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Signal reshaping and coding techniques for compressing images using rate-distortion optimization (RDO) are described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
Example embodiments described herein relate to signal reshaping and coding for video. In an encoder, a processor receives an input image in a first codeword representation to be reshaped to a second codeword representation, wherein the second codeword representation allows for a more efficient compression than the first codeword representation, and generates a forward reshaping function mapping pixels of the input image to a second codeword representation, wherein to generate the forward reshaping function, the encoder: divides the input image into multiple pixel regions, assigns each of the pixel regions to one of multiple codeword bins according to a first luminance characteristic of each pixel region, computes a bin metric for each one of the multiple codeword bins according to a second luminance characteristic of each of the pixel regions assigned to each codeword bin, allocates a number of codewords in the second codeword representation to each codeword bin according to the bin metric of each codeword bin and a rate distortion optimization criterion, and generates the forward reshaping function in response to the allocation of codewords in the second codeword representation to each of the multiple codeword bins.
In another embodiment, in a decoder, a processor receives coded bitstream syntax elements characterizing a reshaping model, wherein the syntax elements include one or more of a flag indicating a minimum codeword bin index value to be used in a reshaping construction process, a flag indicating a maximum codeword bin index value to be used in a reshaping construction process, a flag indicating a reshaping model profile type, wherein the model profile type is associated with default bin-relating parameters, including bin importance values, or a flag indicating one or more delta bin importance values to be used to adjust the default bin importance values defined in the reshaping model profile. The processor determines, based on the reshaping model profile, the default bin importance values for each bin and an allocation list of default numbers of codewords to be allocated to each bin according to the bin's importance value. Then,
for each codeword bin, the processor:
generates a forward reshaping function based on the number of codewords allocated to each codeword bin.
In another embodiment, in a decoder, a processor receives a coded bitstream comprising one or more coded reshaped images in a first codeword representation and metadata related to reshaping information for the coded reshaped images. The processor
generates based on the metadata related to the reshaping information, an inverse reshaping function and a forward reshaping function, wherein the inverse reshaping function maps pixels of the reshaped image from the first codeword representation to a second codeword representation, and the forward reshaping function maps pixels of an image from the second codeword representation to the first codeword representation. The processor extracts from the coded bitstream a coded reshaped image comprising one or more coded units, wherein for one or more coded units in the coded reshaped image:
for a reshaped intra-coded coding unit (CU) in the coded reshaped image, the processor:
for a reshaped inter-coded coding unit in the coded reshaped image, the processor:
In another embodiment, in a decoder, a processor receives a coded bitstream comprising one or more coded reshaped images in an input codeword representation and
reshaping metadata (207) for the one or more coded reshaped images in the coded bitstream. The processor generates a forward reshaping function (282) based on the reshaping metadata, wherein the forward reshaping function maps pixels of an image from a first codeword representation to the input codeword representation. The processor
generates an inverse reshaping function (265-3) based on the reshaping metadata or the forward reshaping function, wherein the inverse reshaping function maps pixels of a reshaped image from the input codeword representation to the first codeword representation. The processor extracts from the coded bitstream a coded reshaped image comprising one or more coded units, wherein:
for an intra-coded coding unit (intra-CU) in the coded reshaped image, the processor:
for an inter-coded CU (inter-CU) in the coded reshaped image, the processor:
generates a decoded image in the first codeword representation based on output samples in the reference buffer.
The video data of production stream (112) is then provided to a processor at block (115) for post-production editing. Block (115) post-production editing may include adjusting or modifying colors or brightness in particular areas of an image to enhance the image quality or achieve a particular appearance for the image in accordance with the video creator's creative intent. This is sometimes called “color timing” or “color grading.” Other editing (e.g. scene selection and sequencing, image cropping, addition of computer-generated visual special effects, etc.) may be performed at block (115) to yield a final version (117) of the production for distribution. During post-production editing (115), video images are viewed on a reference display (125).
Following post-production (115), video data of final production (117) may be delivered to encoding block (120) for delivering downstream to decoding and playback devices such as television sets, set-top boxes, movie theaters, and the like. In some embodiments, coding block (120) may include audio and video encoders, such as those defined by ATSC, DVB, DVD, Blu-Ray, and other delivery formats, to generate coded bit stream (122). In a receiver, the coded bit stream (122) is decoded by decoding unit (130) to generate a decoded signal (132) representing an identical or close approximation of signal (117). The receiver may be attached to a target display (140) which may have completely different characteristics than the reference display (125). In that case, a display management block (135) may be used to map the dynamic range of decoded signal (132) to the characteristics of the target display (140) by generating display-mapped signal (137).
Following coding (120) and decoding (130), decoded frames (132) may be processed by a backward (or inverse) reshaping function (160), which converts the re-quantized frames (132) back to the original EOTF domain (e.g., gamma), for further downstream processing, such as the display management process (135) discussed earlier. In some embodiments, the backward reshaping function (160) may be integrated with a de-quantizer in decoder (130), e.g., as part of the de-quantizer in an AVC or HEVC video decoder.
As used herein, the term “reshaper” may denote a forward or an inverse reshaping function to be used when coding and/or decoding digital images. Examples of reshaping functions are discussed in Ref. [2]. In Ref. [2], an in-loop block-based image reshaping method for high dynamic range video coding was proposed. That design allows block-based reshaping inside the coding loop, but at a cost of increased complexity. To be specific, the design requires maintaining two sets of decoded-image buffers: one set for inverse-reshaped (or non-reshaped) decoded pictures, which can be used both for prediction without reshaping and for output to a display, and another set for forward-reshaped decoded pictures, which is used only for prediction with reshaping. Though forward-reshaped decoded pictures can be computed on the fly, the complexity cost is very high, especially for inter-prediction (motion compensation with sub-pixel interpolation). In general, decoded-picture-buffer (DPB) management is complicated and requires very careful attention; thus, as appreciated by the inventors, simplified methods for coding video are desired.
In Ref. [6], additional reshaping-based codec architectures were presented, including an external, out-of-loop reshaper, an architecture with an in-loop intra only reshaper, an architecture with an in-loop reshaper for prediction residuals, and a hybrid architecture which combines both intra, in-loop, reshaping and inter, residual reshaping. The main goal of those proposed reshaping architectures is to improve subjective visual quality. Thus, many of these approaches will yield worse objective metrics, in particular the well-known Peak Signal to Noise Ratio (PSNR) metric.
In this invention, a new reshaper is proposed based on Rate-Distortion Optimization (RDO). In particular, when the targeted distortion metric is MSE (Mean Square Error), the proposed reshaper will improve both subjective visual quality and widely used objective metrics based on PSNR, Bjontegaard PSNR (BD-PSNR), or Bjontegaard Rate (BD-Rate). Note that any of the proposed reshaping architectures, without loss of generality, may be applied for the luminance component, one or more of the chroma components, or a combination of luma and chroma components.
Consider a reshaped video signal represented by a bit-depth of B bits in a color component (e.g., B=10 for Y, Cb, and/or Cr); thus, there are a total of 2^B available codewords. Consider dividing the desired codeword range [0, 2^B] into N segments or bins, and let Mk represent the number of codewords in the k-th segment or bin, after a reshaping mapping, so that given a target bit rate R, the distortion D between the source picture and the decoded or reconstructed picture is minimal. Without loss of generality, D may be expressed as a measure of the sum of square error (SSE) between corresponding pixel values of the source input (Source(i,j)) and the reconstructed picture (Recon(i,j))
D = SSE = Σ_{i,j} Diff(i,j)^2, (1)
where
Diff(i,j)=Source(i,j)−Recon(i,j).
The optimization reshaping problem may be re-written as: find Mk (k=0, 1, . . . , N−1), such that given a bitrate R, D is minimal, where Σ_{k=0}^{N−1} Mk ≤ 2^B.
Various optimization methods can be used to find a solution, but the optimal solution could be very complicated for real-time encoding. In this invention, a suboptimal, but more practical analytical solution is proposed.
Without loss of generality, consider an input signal represented by a bit depth of B bits (e.g., B=10), where the codewords are uniformly divided into N bins (e.g., N=32). By default, each bin is assigned Ma=2^B/N codewords (e.g., for N=32 and B=10, Ma=32). Next, a more efficient codeword allocation, based on RDO, will be demonstrated through an example.
As used herein, the term “narrow range” [CW1, CW2] denotes a continuous range of codewords between codewords CW1 and CW2 which is a subset of the full dynamic range [0, 2^B−1]. For example, in an embodiment, a narrow range may be defined as [16*2^(B−8), 235*2^(B−8)] (e.g., for B=10, the narrow range comprises values [64, 940]). Assuming the bit depth of the output signal is Bo, if the dynamic range of an input signal is within a narrow range, then, in what will be denoted as “default” reshaping, one can stretch the signal into the full range [0 2B
For the same quantization parameter (QP), the effect of increasing the number of codewords in a bin is equivalent to allocating more bits to code the signal within the bin; therefore, it is equivalent to reducing SSE or improving PSNR. However, a uniform increase of codeword allocation in each bin may not give better results than coding without reshaping, because the PSNR gain may not outweigh the increase in bitrate, i.e., this is not a good tradeoff in terms of RDO. Ideally, one would like to assign more codewords only to the bins which yield the best tradeoff on RDO, i.e., generate a significant SSE decrease (PSNR increase) at the expense of a small bitrate increase.
In an embodiment, RDO performance is improved through an adaptive piecewise reshaping mapping. The method can be applied to any type of a signal, including standard dynamic range (SDR) and high-dynamic range (HDR) signals. Using the previous simple case as an example, the goal of this invention is to assign either Ma or Mf codewords for each codeword segment or codeword bin.
At an encoder, given N codeword bins for the input signal, the average luminance variance of each bin can be approximated as follows:
varbin(k)=varbin(k)+Luma_var(i);
cbin(k)=cbin(k)+1. (2)
varbin(k)=varbin(k)/cbin(k) (3)
A person skilled in the art would appreciate that one may apply alternative metrics than luminance variance to characterize the sub-blocks. For example, one may use the standard deviation of luminance values, a weighted luminance variance or luminance value, a peak luminance, and the like.
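The accumulation in equations (2) and (3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the picture is a 2-D list of luma samples, the block size `block` is a hypothetical parameter, each block is assigned to a bin by its mean luma, and Luma_var(i) is taken to be the population variance of the i-th block.

```python
def bin_variances(luma, bit_depth=10, num_bins=32, block=16):
    """Return (varbin, cbin): average block variance and block count per bin.

    Assumptions (not stated in the text): non-overlapping block x block
    tiles, bin chosen from the block mean, population variance.
    """
    bin_width = (1 << bit_depth) // num_bins
    varbin = [0.0] * num_bins
    cbin = [0] * num_bins
    height, width = len(luma), len(luma[0])
    for y0 in range(0, height - block + 1, block):
        for x0 in range(0, width - block + 1, block):
            samples = [luma[y][x]
                       for y in range(y0, y0 + block)
                       for x in range(x0, x0 + block)]
            mean = sum(samples) / len(samples)
            luma_var = sum((s - mean) ** 2 for s in samples) / len(samples)
            k = min(int(mean) // bin_width, num_bins - 1)
            varbin[k] += luma_var   # equation (2): accumulate variance
            cbin[k] += 1            # equation (2): count blocks in bin k
    for k in range(num_bins):
        if cbin[k] > 0:
            varbin[k] /= cbin[k]    # equation (3): average per bin
    return varbin, cbin
```

Bins that receive no blocks keep a count of zero, which lets a later allocation step treat them separately.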
In an embodiment, the following pseudo code depicts an example of how an encoder may adjust the bin allocation using the computed metrics for each bin.
where THU denotes a predetermined upper threshold.
In another embodiment, the allocation may be performed as follows:
For the k-th bin,
where TH0 and TH1 denote predetermined lower and upper thresholds.
In another embodiment
For the k-th bin,
where THL denotes a predetermined lower threshold.
The above examples show how to select the number of codewords for each bin from two pre-selected numbers Mf and Ma. Thresholds (e.g., THU or THL) can be determined based on optimizing the rate distortion, e.g., through exhaustive search. Thresholds may also be adjusted based on the quantization parameter values (QP). In an embodiment, for B=10, thresholds may range between 1,000 and 10,000.
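A single-threshold selection between Mf and Ma might look like the Python sketch below. The comparison direction and the handling of empty bins are assumptions, modeled on the rule of thumb that low-variance bins should receive more codewords and on the assumption that Mf is the larger of the two values; the function name is hypothetical.

```python
def allocate_codewords(varbin, cbin, m_a, m_f, th_u):
    """Pick Mk for each bin from {0, Mf, Ma} using one upper threshold.

    Assumed rule: empty bins get 0 codewords, bins whose average
    variance is below th_u get m_f, all other bins get m_a.
    """
    allocation = []
    for variance, count in zip(varbin, cbin):
        if count == 0:
            allocation.append(0)       # no pixels fall in this bin
        elif variance < th_u:
            allocation.append(m_f)     # low variance: more codewords
        else:
            allocation.append(m_a)     # high variance: default amount
    return allocation
```

The two-threshold and lower-threshold variants differ only in the condition tested for each bin.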
In an embodiment, to expedite processing, a threshold may be determined from a fixed set of values, say, {2,000, 3,000, 4,000, 5,000, 6,000, 7,000}, using a Lagrangian optimization method. For example, for each TH(i) value in the set, using pre-defined training clips, one can run compression tests with fixed QP, and compute values of an objective function J defined as
J(i)=D+λR. (7)
Then, the optimal threshold value may be defined as the TH(i) value in the set for which J(i) is minimum.
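The selection rule around equation (7) can be sketched in Python as below. The distortion and rate figures would come from compression runs on training clips, which are outside this sketch; the `measurements` mapping and the numbers in the usage lines are placeholders, not measured results.

```python
def select_threshold(candidates, measurements, lam):
    """Return (best_th, best_j) minimizing J(i) = D + lam * R.

    measurements maps each candidate threshold to a (D, R) pair
    obtained elsewhere, e.g. from fixed-QP test encodes.
    """
    best_th, best_j = None, float("inf")
    for th in candidates:
        distortion, rate = measurements[th]
        j = distortion + lam * rate    # equation (7)
        if j < best_j:
            best_th, best_j = th, j
    return best_th, best_j


# Placeholder (D, R) values for illustration only.
example = {2000: (100.0, 50.0), 3000: (90.0, 52.0), 4000: (95.0, 49.0)}
best_th, best_j = select_threshold([2000, 3000, 4000], example, lam=2.0)
```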
In a more general example, one can predefine a look-up table (LUT). For example, in Table 1, the first row defines a set of thresholds dividing the full range of possible bin metrics (e.g., varbin(k) values) into segments, and the second row defines the corresponding number of codewords (CW) to be assigned in each segment. In an embodiment, one rule to build such a LUT is: if the bin variance is too big, one may need to spend lots of bits to reduce the SSE, therefore one can assign codeword (CW) values less than Ma. If the bin variance is very small, one can assign a CW value larger than Ma.
Using Table 1, the mapping of thresholds into codewords may be generated as follows:
For the k-th bin,
if there are no pixels in the bin
For example, given two thresholds and three codeword allocations, for B=10, in an embodiment, TH0=3,000, CW0=38, TH1=10,000, CW1=32, and CW2=28.
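A threshold-to-codeword look-up of this kind can be sketched in Python with a binary search. The treatment of a variance exactly equal to a threshold (it falls into the higher segment here) and the zero allocation for empty bins are assumptions; the function name is hypothetical.

```python
from bisect import bisect_right


def codewords_from_lut(variance, pixel_count, thresholds, codewords):
    """Map a bin metric to a codeword count via a threshold table.

    thresholds must be sorted ascending; codewords has one more
    entry than thresholds (one value per segment).
    """
    if pixel_count == 0:
        return 0                       # assumed: empty bins get nothing
    segment = bisect_right(thresholds, variance)
    return codewords[segment]


# Values from the example above: TH0=3,000, TH1=10,000, CW0..CW2.
TH = [3000, 10000]
CW = [38, 32, 28]
```

With these values, a bin variance of 2,500 yields 38 codewords, 5,000 yields 32, and 12,000 yields 28.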
In another embodiment, the two thresholds TH0 and TH1 may be selected as follows: a) consider TH1 to be a very large number (even infinity) and select TH0 from a set of predetermined values, e.g., using the RDO optimization in equation (7); b) given TH0, define a second set of possible values for TH1, e.g., set {10,000, 15,000, 20,000, 25,000, 30,000}, and apply equation (7) to identify the optimum value. The approach can be performed iteratively with a limited number of threshold values or until it converges.
One may note that after allocating codewords to bins according to any of the schemes defined earlier, either the sum of Mk values may exceed the maximum of available codewords (2^B) or there are unused codewords. If there are unused codewords, one may simply decide to do nothing, or allocate them to specific bins. On the other hand, if the algorithm assigned more codewords than available, then one may want to readjust the Mk values, e.g., by renormalizing the CW values. Alternatively, one may generate the forward reshaping function using the existing Mk values, but then readjust the output value of the reshaping function by scaling with (Σk Mk)/2^B. Examples of codeword reallocation techniques are also described in Ref. [7].
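One possible renormalization is sketched below in Python. Proportional scaling with rounding down is an assumption chosen so that the total can never exceed 2^B; other reallocation rules, such as those of Ref. [7], are not reproduced here.

```python
def renormalize(m, bit_depth=10):
    """Scale Mk values down proportionally if they exceed 2^B in total.

    Rounding down guarantees sum(result) <= 2^B; a few codewords may
    remain unused afterwards.
    """
    budget = 1 << bit_depth
    total = sum(m)
    if total <= budget:
        return list(m)                 # nothing to fix
    return [mk * budget // total for mk in m]
```

For instance, 32 bins of 38 codewords each (1,216 in total) are scaled back to 32 codewords per bin for B=10.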
In an embodiment, as an example and without limitation, the forward LUT (FLUT) can be built using the following C code.
In an embodiment, the inverse LUT can be built as follows:
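As an illustration only, and not the listing referred to above, the following Python sketch builds a forward LUT by accumulating the per-bin codeword counts and derives an inverse LUT by backward search. Equal-width input bins, linear mapping inside each bin, and the choice of the smallest input reaching each output value are all assumptions.

```python
from bisect import bisect_left


def build_flut(m, bit_depth=10):
    """Forward LUT: bin k maps linearly onto m[k] output codewords."""
    size = 1 << bit_depth
    bin_len = size // len(m)
    flut = [0] * size
    base = 0                           # codewords used by earlier bins
    for k, mk in enumerate(m):
        for j in range(bin_len):
            value = base + (mk * j) // bin_len
            flut[k * bin_len + j] = min(value, size - 1)
        base += mk
    return flut


def build_ilut(flut):
    """Inverse LUT: smallest input whose forward value reaches y."""
    size = len(flut)
    return [min(bisect_left(flut, y), size - 1) for y in range(size)]
```

With Mk=32 for all 32 bins and B=10, both tables reduce to the identity mapping.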
Syntax-wise, one can re-use the syntax proposed in previous applications, such as the piecewise polynomial mode or parametric model in References [5] and [6]. Table 2 shows such an example for N=32 for equation (4).
where,
reshaper_model_profile_type specifies the profile type to be used in the reshaper construction process. A given profile may provide information about default values being used, such as the number of bins, default bin importance or priority values, and default codeword allocations (e.g., Ma and/or Mf values).
reshaper_model_scale_idx specifies the index value of a scale factor (denoted as ScaleFactor) to be used in the reshaper construction process. The value of the ScaleFactor allows for improved control of the reshaping function for improved overall coding efficiency.
reshaper_model_min_bin_idx specifies the minimum bin index to be used in the reshaper construction process. The value of reshaper_model_min_bin_idx shall be in the range of 0 to 31, inclusive.
reshaper_model_max_bin_idx specifies the maximum bin index to be used in the reshaper construction process. The value of reshaper_model_max_bin_idx shall be in the range of 0 to 31, inclusive.
reshaper_model_bin_profile_delta[i] specifies the delta value to be used to adjust the profile of the i-th bin in the reshaper construction process. The value of reshaper_model_bin_profile_delta[i] shall be in the range of 0 to 1, inclusive.
Table 3 depicts another embodiment with an alternative, more efficient, syntax representation.
where,
reshaper_model_delta_max_bin_idx is set equal to the maximum allowed bin index (e.g., 31) minus the maximum bin index to be used in the reshaper construction process.
reshaper_model_num_cw_minus1 plus 1 specifies the number of codewords to be signalled.
reshaper_model_delta_abs_CW[i] specifies the i-th absolute delta codeword value.
reshaper_model_delta_sign_CW[i] specifies the sign for the i-th delta codeword. Then: reshaper_model_delta_CW[i]=(1-2*reshaper_model_delta_sign_CW[i])*reshaper_model_delta_abs_CW[i]; reshaper_model_CW[i]=32+reshaper_model_delta_CW[i].
reshaper_model_bin_profile_delta[i] specifies the delta value to be used to adjust the profile of the i-th bin in the reshaper construction process. The value of reshaper_model_bin_profile_delta[i] shall be in the range of 0 to 1 when reshaper_model_num_cw_minus1 is equal to 0. The value of reshaper_model_bin_profile_delta[i] shall be in the range of 0 to 2 when reshaper_model_num_cw_minus1 is equal to 1.
CW=32 when reshaper_model_bin_profile_delta[i] is set equal to 0, CW=reshaper_model_CW[0] when reshaper_model_bin_profile_delta[i] is set equal to 1; CW=reshaper_model_CW[1] when reshaper_model_bin_profile_delta[i] is set equal to 2. In an embodiment, reshaper_model_num_cw_minus1 is allowed to be larger than 1 allowing reshaper_model_num_cw_minus1 and reshaper_model_bin_profile_delta[i] to be signaled with ue(v) for more efficient coding.
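The derivation of the per-bin codeword count from these syntax elements can be sketched in Python as follows. The function name and list-based inputs are hypothetical; bins outside the signalled minimum and maximum bin indices are not handled here.

```python
def bin_codewords(delta_abs_cw, delta_sign_cw, bin_profile_delta, default_cw=32):
    """Derive CW for each bin from the signalled delta codewords.

    reshaper_model_delta_CW[i] = (1 - 2*sign[i]) * abs[i]
    reshaper_model_CW[i]       = 32 + reshaper_model_delta_CW[i]
    Profile delta 0 selects the default, 1 selects CW[0], 2 selects CW[1].
    """
    model_cw = [default_cw + (1 - 2 * sign) * magnitude
                for magnitude, sign in zip(delta_abs_cw, delta_sign_cw)]
    table = [default_cw] + model_cw
    return [table[delta] for delta in bin_profile_delta]
```

For example, absolute deltas [6, 4] with signs [0, 1] give reshaper_model_CW values of 38 and 28.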
In another embodiment, as described in Table 4, the number of codewords per bin may be defined explicitly.
reshaper_model_number_bins_minus1 plus 1 specifies the number of bins used for the luma component. In some embodiments it may be more efficient that the number of bins is a power of two. Then, the total number of bins may be represented by its log2 representation, e.g., using an alternative parameter like log2_reshaper_model_number_bins_minus1. For example, for 32 bins log2_reshaper_model_number_bins_minus1=4.
reshaper_model_bin_delta_abs_cw_prec_minus1 plus 1 specifies the number of bits used for the representation of the syntax reshaper_model_bin_delta_abs_CW[i].
reshaper_model_bin_delta_abs_CW[i] specifies the absolute delta codeword value for the i-th bin.
reshaper_model_bin_delta_sign_CW_flag[i] specifies the sign of reshaper_model_bin_delta_abs_CW[i] as follows:
In an embodiment, assuming the codeword allocation according to one of the earlier examples, e.g., equation (4), an example of how to define the parameters in Table 2, comprises:
First assume one assigns “bin importance” as follows:
For the k-th bin,
As used herein, the term “bin importance” is a value assigned to each of the N codeword bins to indicate the importance of all codewords in that bin in the reshaping process with respect to other bins.
In an embodiment, one may set the default_bin_importance from reshaper_model_min_bin_idx to reshaper_model_max_bin_idx to 1. The value of reshaper_model_min_bin_idx is set to the smallest bin index which has Mk not equal to 0. The value of reshaper_model_max_bin_idx is set to the largest bin index which has Mk not equal to 0. reshaper_model_bin_profile_delta for each bin within [reshaper_model_min_bin_idx reshaper_model_max_bin_idx] is the difference between bin_importance and the default_bin_importance.
An example of how to use the proposed parametric model to construct a Forward Reshaping LUT (FLUT) and an Inverse Reshaping LUT (ILUT) is shown as follows.
From a syntax point of view, alternative methods can also be applied. The key is to specify the number of codewords in each bin (e.g., Mk, for k=0, 1, 2,. . . , N−1) either explicitly or implicitly. In one embodiment, one can specify explicitly the number of codewords in each bin. In another embodiment, one can specify the codewords differentially. For example, the number of codewords in a bin can be determined using the difference of the number of codewords in the current bin and the previous bin (e.g., M_Delta(k)=M(k)−M(k−1)). In another embodiment, one can specify the most commonly used number of codewords (say, MM) and express the number of codewords in each bin as the difference of the codeword number in each bin from this number (e.g., M_Delta(k)=M(k)−MM).
In an embodiment, two reshaping methods are supported. One is denoted as the “default reshaper,” where Mf is assigned to all bins. The second, denoted as “adaptive reshaper,” applies the adaptive reshaper described earlier. The two methods can be signaled to a decoder as in Ref. [6] using a special flag, e.g., sps_reshaper_adaptive_flag (e.g., use sps_reshaper_adaptive_flag=0 for the default reshaper and sps_reshaper_adaptive_flag=1 for the adaptive reshaper).
The invention is applicable to any reshaping architecture proposed in Ref. [6], such as: an external reshaper, in-loop intra only reshaper, in-loop residue reshaper, or in-loop hybrid reshaper. As an example,
In the decoder (200_D), the following new normative blocks are added to a traditional block-based decoder: a block (250) (reshaper decoding) to reconstruct a forward reshaping function and an inverse reshaping function based on the encoded reshaping function parameters (207), a block (265-1) to apply the inverse reshaping function to the decoded data, and a block (265-2) to apply both the forward reshaping function and inverse reshaping function to generate the decoded video signal (162). For example, in (265-2) the reconstructed value is given by Rec=ILUT(FLUT(Pred)+Res), where FLUT denotes the forward reshaping LUT and ILUT denotes the inverse reshaping LUT.
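The reconstruction Rec=ILUT(FLUT(Pred)+Res) can be sketched in Python as below. Clipping the reshaped sum to the valid codeword range is an assumption added so that the inverse look-up never goes out of bounds; the toy 3-bit tables are illustrative values only.

```python
def reconstruct(pred, res, flut, ilut):
    """Rec = ILUT(FLUT(Pred) + Res) for a single sample."""
    max_cw = len(ilut) - 1
    reshaped = flut[pred] + res                  # prediction moved to reshaped domain
    reshaped = max(0, min(max_cw, reshaped))     # assumed clipping step
    return ilut[reshaped]


# Toy 3-bit LUTs for illustration only.
TOY_FLUT = [0, 2, 4, 6, 7, 7, 7, 7]
TOY_ILUT = [0, 1, 1, 2, 2, 3, 3, 4]
```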
In some embodiments, operations related to blocks 250 and 265 may be combined into a single processing block. As depicted in
As described in Ref. [6] and earlier in this specification, the forward reshaping LUT FwdLUT may be built by integration, while the inverse reshaping LUT may be built based on a backward mapping using the forward reshaping LUT (FwdLUT). In an embodiment, the forward LUT may be built using piecewise linear interpolation. At the decoder, inverse reshaping can be done by using the backward LUT directly or again by linear interpolation. The piece-wise linear LUT is built based on input pivot points and output pivot points.
Let (X1, Y1), (X2, Y2) be two input pivot points and their corresponding output values for each bin. Any input value X between X1 and X2 can be interpolated by the following equation:
Y=((Y2−Y1)/(X2−X1))*(X−X1)+Y1.
In a fixed-point implementation, the above equation can be rewritten as
Y=((m*X+2^(FP_PREC−1))>>FP_PREC)+c
where m and c denote the scalar and offset for linear interpolation and FP_PREC is a constant related to the fixed-point precision.
As an example, FwdLUT may be built as follows: Let the variable
lutSize =(1<<BitDepthY).
Let variables
binNum=reshaper_model_number_bins_minus1+1,
and
binLen=lutSize/binNum.
For the i-th bin, its two bracketing pivots (e.g., X1 and X2) may be derived as X1=i*binLen and X2=(i+1)*binLen. Then:
FP_PREC defines the fixed-point precision of the fractional part of the variables (e.g., FP_PREC=14). In an embodiment, binsLUT[] may be computed in higher precision than the precision of FwdLUT. For example, binsLUT[] values may be computed as 32-bit integers, but FwdLUT may be the binsLUT values clipped at 16 bits.
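A fixed-point construction of FwdLUT along these lines might look like the Python sketch below. It assumes that binsLUT[] holds the output pivots as a running sum of per-bin codewords, that X in the fixed-point formula is measured from the bin's lower pivot X1, and that the offset c equals Y1; `bin_cw` is a hypothetical input holding the codewords per bin.

```python
FP_PREC = 14  # fixed-point precision of the fractional part


def build_fwd_lut(bin_cw, bit_depth_y=10):
    """Piecewise-linear forward LUT using integer arithmetic only."""
    lut_size = 1 << bit_depth_y
    bin_num = len(bin_cw)
    bin_len = lut_size // bin_num
    # Output pivots: cumulative codeword counts (assumed meaning of binsLUT).
    bins_lut = [0] * (bin_num + 1)
    for i in range(bin_num):
        bins_lut[i + 1] = bins_lut[i] + bin_cw[i]
    rounding = 1 << (FP_PREC - 1)
    fwd_lut = [0] * lut_size
    for i in range(bin_num):
        x1 = i * bin_len
        y1, y2 = bins_lut[i], bins_lut[i + 1]
        m = ((y2 - y1) << FP_PREC) // bin_len    # slope in fixed point
        for x in range(x1, x1 + bin_len):
            y = ((m * (x - x1) + rounding) >> FP_PREC) + y1
            fwd_lut[x] = max(0, min(lut_size - 1, y))
    return fwd_lut
```

With 32 codewords in each of 32 bins and a 10-bit signal, the slope is exactly 1<<FP_PREC and the table is the identity.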
As described earlier, during reshaping, the codeword allocation may be adjusted using one or more thresholds (e.g., TH, THU, THL, and the like). In an embodiment, such thresholds may be generated adaptively based on the content characteristics.
BinHist[b]=100*(total pixels in bin b)/(total pixels in the picture), (10)
As discussed before, another good metric of image characteristics is the average variance (or standard deviation) of pixels in each bin, to be denoted BinVar[b]. BinVar[b] may be computed in “block mode” as varbin(k) in the steps described in the section leading to equations (2) and (3). Alternatively, the block-based calculation could be refined with pixel-based calculations. For example, denote as vf(i) the variance associated with a group of pixels surrounding the i-th pixel in an m×m neighborhood window (e.g., m=5) with the i-th pixel at its center. For example, if
denotes the mean value of pixels in a WN=m*m window (e.g., m=5) surrounding the i-th pixel with value x(i), then
An optional non-linear mapping, such as vf(i)=log10(vf(i)+1), can be used to suppress the dynamic range of raw variance values. Then, the variance factor may be used in calculating the average variance in each bin as
where Kb denotes the number of pixels in bin b.
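A pixel-based computation of BinVar[b] can be sketched in Python as follows. Several details are assumptions: the window is truncated at picture borders, vf(i) is the population variance of the window, and each pixel is assigned to a bin by its own value.

```python
import math


def bin_var_pixelwise(luma, bit_depth=10, num_bins=32, m=5, use_log=False):
    """Return (bin_var, counts): mean of per-pixel window variances per bin."""
    height, width = len(luma), len(luma[0])
    radius = m // 2
    bin_width = (1 << bit_depth) // num_bins
    sums = [0.0] * num_bins
    counts = [0] * num_bins            # Kb: number of pixels in bin b
    for y in range(height):
        for x in range(width):
            window = [luma[yy][xx]
                      for yy in range(max(0, y - radius), min(height, y + radius + 1))
                      for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            mean = sum(window) / len(window)
            vf = sum((p - mean) ** 2 for p in window) / len(window)
            if use_log:
                vf = math.log10(vf + 1)   # optional range compression
            b = min(luma[y][x] // bin_width, num_bins - 1)
            sums[b] += vf
            counts[b] += 1
    bin_var = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return bin_var, counts
```

The returned counts can also serve to compute BinHist[b] as in equation (10).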
An example plot of sorted average bin variance factors is depicted in
An example plot (605) of a computed CDF, based on the data of
5) Finally, in step 525, given the CDF BinVarSortDsdCDF[BinVarSortDsd[b]] as a function of the sorted average bin-variance values, one can define thresholds based on bin variances and the accumulated percentages.
Examples for determining a single threshold or two thresholds are shown in
When using two thresholds, an example of selecting THL and THU is depicted in
The techniques above can be easily extended to cases with more than two thresholds. The relationship can also be used to adjust the number of codewords (Mf, Ma, etc.). As a rule of thumb, in low-variance bins, one should assign more codewords to boost PSNR (and reduce MSE); for high-variance bins, one should assign fewer codewords to save bits.
In an embodiment, if the set of parameters (e.g., THL, THU, Ma, Mf, and the like) were obtained manually for specific content, for example, through an exhaustive manual parameter tuning, this automatic method may be applied to design a decision tree to categorize each content in order to set the optimum manual parameters automatically. For example, content categories include: film, television, SDR, HDR, cartoons, nature, action, and the like.
To reduce complexity, in-loop reshaping may be constrained using a variety of schemes. If in-loop reshaping is adopted in a video coding standard, then these constraints should be normative to guarantee decoder simplifications. For example, in an embodiment, luma reshaping may be disabled for certain block coding sizes. For example, one could disable intra and inter reshaper mode in an inter slice when nTbW*nTbH<TH, where the variable nTbW specifies the transform block width and variable nTbH specifies the transform block height. For example, for TH=64, reshaping is disabled for blocks with sizes 4×4, 4×8, and 8×4 in both intra and inter mode in inter-coded slices (or tiles).
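The block-size constraint amounts to a one-line test, sketched here in Python with a hypothetical function name:

```python
def reshaping_allowed(n_tb_w, n_tb_h, th=64):
    """Reshaping is disabled when nTbW * nTbH < TH."""
    return n_tb_w * n_tb_h >= th
```

For TH=64 this returns False for 4×4, 4×8, and 8×4 blocks and True for 8×8 and larger blocks.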
Similarly, in another embodiment, one may disable luma-based, chroma residue scaling in intra mode in inter-coded slices (or tiles), or when having separate luma and chroma partitioning trees is enabled.
Interaction with Other Coding Tools
In Ref. [6], it was described that a loop filter can operate either in the original pixel domain or in the reshaped pixel domain. In one embodiment it is suggested that loop filtering is performed in the original pixel domain (after picture reshaping). For example, in a hybrid in-loop reshaping architecture (200_E and 200_D), for an intra picture, one will need to apply inverse reshaping (265-1) before the loop filter (270-1).
As depicted in
For decoding intra-coded CUs (200B_D), Intra prediction (225) is performed on reshaped neighboring pixels. Given residual Res, and a predicted sample PredSample, the reconstructed sample (227) is derived as:
RecSample=Res+PredSample. (14)
Given the reconstructed samples (227), loop filtering (270) and inverse picture reshaping (265) are applied to derive RecSampleInDPB samples to be stored in DPB (260), where
RecSampleInDPB=InvLUT(LPF(RecSample))=InvLUT(LPF(Res+PredSample)), (15)
where InvLUT( ) denotes the inverse reshaping function or inverse reshaping look-up table, and LPF( ) denotes the loop-filtering operations.
In traditional coding, inter/intra-mode decisions are based on computing a distortion function (dfunc( )) between the original samples and the predicted samples. Examples of such functions include the sum of squared errors (SSE), the sum of absolute differences (SAD), and others. When using reshaping, at the encoder side (not shown), CU prediction and mode decision are performed in the reshaped domain. That is, for mode decision,
distortion=dfunc(FwdLUT(SrcSample)−RecSample), (16)
where FwdLUT( ) denotes the forward reshaping function (or LUT) and SrcSample denotes the original image samples.
For inter-coded CUs, at the decoder side (e.g., 200C_D), inter prediction is performed using reference pictures in the non-reshaped domain in the DPB. Then in reconstruction block 275, the reconstructed pixels (267) are derived as:
RecSample=(Res+FwdLUT(PredSample)). (17)
Given the reconstructed samples (267), loop filtering (270) and inverse picture reshaping (265) are applied to derive RecSampleInDPB samples to be stored in DPB, where
RecSampleInDPB=InvLUT(LPF(RecSample))=InvLUT(LPF(Res+FwdLUT(PredSample))). (18)
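As an illustration only, the decoder-side reconstruction of equations (14), (17), and (18) can be sketched as follows. The 8-bit toy bit depth, the square-root forward curve, the identity loop filter `lpf`, and all function names are assumptions made for brevity; they are not taken from the described embodiments.

```python
import numpy as np

BIT_DEPTH = 8                      # toy bit depth (assumption, keeps the LUTs small)
MAX_VAL = (1 << BIT_DEPTH) - 1

# Hypothetical monotone forward reshaping curve; any non-decreasing LUT would do.
codes = np.arange(MAX_VAL + 1)
fwd_lut = np.round(MAX_VAL * np.sqrt(codes / MAX_VAL)).astype(np.int64)
# Inverse LUT: for each reshaped value, the first original codeword that reaches it.
inv_lut = np.searchsorted(fwd_lut, codes, side="left").clip(0, MAX_VAL)


def lpf(samples):
    """Placeholder loop filter (identity); stands in for LPF( )."""
    return samples


def reconstruct_intra(res, pred_reshaped):
    """Equation (14): the intra prediction is already in the reshaped domain."""
    return np.clip(res + pred_reshaped, 0, MAX_VAL)


def reconstruct_inter(res, pred_original):
    """Equation (17): the inter prediction from the DPB is forward reshaped first."""
    return np.clip(res + fwd_lut[pred_original], 0, MAX_VAL)


def to_dpb(rec_sample):
    """Equation (18): loop filter, then inverse reshape before storing in the DPB."""
    return inv_lut[lpf(rec_sample)]


if __name__ == "__main__":
    pred = np.array([0, 64, 255])
    res = np.zeros(3, dtype=np.int64)
    print(to_dpb(reconstruct_inter(res, pred)))
```

Clipping to the valid sample range is added here for safety; the equations in the text omit it.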
At the encoder side (not shown), intra prediction is performed in the reshaped domain as
Res=FwdLUT(SrcSample)−PredSample, (19a)
under the assumption that all neighbor samples (PredSample) used for prediction are already in the reshaped domain. Inter prediction (e.g., using motion compensation) is performed in the non-reshaped domain (i.e., using reference pictures from the DPB directly), i.e.,
PredSample=MC(RecSampleInDPB), (19b)
where MC( ) denotes the motion compensation function. For motion estimation and fast mode decision, where residue is not generated, one can compute distortion using
distortion=dfunc(SrcSample−PredSample).
However, for full mode decision where residue is generated, mode decision is performed in the reshaped domain. That is, for full mode decision,
distortion=dfunc(FwdLUT(SrcSample)−RecSample). (20)
As explained before, the proposed in-loop reshaper allows reshaping to be adapted at the CU level, e.g., to set the variable CU_reshaper on or off as needed. Under the same architecture, for an inter-coded CU, the reconstructed pixels need to be in the reshaped domain even when CU_reshaper=off for that CU, i.e.,
RecSample=FwdLUT(Res+PredSample), (21)
so that intra-prediction always has neighboring pixels in the reshaped domain. The DPB pixels can be derived as:
RecSampleInDPB=InvLUT(LPF(RecSample))=InvLUT(LPF(FwdLUT(Res+PredSample))). (22)
For an intra-coded CU, depending on the encoding process, two alternative methods are proposed:
1) All intra-coded CUs are coded with CU_reshaper=on. In this case, no additional processing is needed because all pixels are already in the reshaped domain.
2) Some intra-coded CUs can be coded using CU_reshaper =off. In this case, for CU_reshaper=off, when applying intra prediction, one needs to apply inverse reshaping to the neighboring pixels so that intra prediction is performed in the original domain and the final reconstructed pixels need to be in the reshaped domain, i.e.,
In general, the proposed architectures may be used in a variety of combinations, such as in-loop intra-only reshaping, in-loop reshaping only for prediction residuals, or a hybrid architecture which combines both intra, in-loop, reshaping and inter, residual reshaping. For example, to reduce the latency in the hardware decoding pipeline, for inter slice decoding, one can perform intra prediction (that is, decode intra CUs in an inter slice) before inverse reshaping. An example architecture (200D_D) of such an embodiment is depicted in
RecSample=(Res+FwdLUT(PredSample)).
where FwdLUT(PredSample) denotes the output of the inter predictor (280) followed by forward reshaping (282). Otherwise, for Intra CUs (e.g., the Mux enables the output from 284), the output of the reconstruction module (285) is
RecSample=(Res+IPredSample),
where IPredSample denotes the output of the Intra Prediction block (284). The inverse reshaping block (265-3) generates
Y_CU=InvLUT[RecSample].
Applying intra prediction for inter slices in the reshaped domain is applicable to other embodiments as well, including those depicted in
PredSampleCombined=PredSampleIntra+FwdLUT(PredSampleInter)
RecSample=Res+PredSampleCombined,
that is, inter-coded samples in the original domain are reshaped before the addition. Otherwise, when the combined inter/intra prediction mode is done in the original domain, then:
PredSampleCombined=InvLUT(PredSampleIntra)+PredSampleInter
RecSample=Res+FwdLUT(PredSampleCombined),
that is, intra-predicted samples are inverse-reshaped to be in the original domain.
Similar considerations are applicable to the corresponding encoding embodiments as well, since encoders (e.g., 200_E) include a decoder loop that matches the corresponding decoder. As discussed earlier, equation (20) describes an embodiment where mode decision is performed in the reshaped domain. In another embodiment, mode decision may be performed in the original domain, that is:
distortion=dfunc(SrcSample−InvLUT(RecSample)).
For luma-based chroma QP offset or chroma residue scaling, the average CU luma value (
As in Ref. [6], one may apply the same proposed chromaDQP derivation process to balance the luma and chroma relationship caused by the reshaping curve. In an embodiment, one can derive a piece-wise chromaDQP value based on the codeword assignment for each bin. For example:
for the k-th bin,
scale_k=(M_k/M_a); (25)
chromaDQP=6*log2(scale_k);
end
As described in Ref. [6], it is recommended to use pixel-based weighted distortion when lumaDQP is enabled. When reshaping is used, in an example, the weight needed is adjusted based on the reshaping function (f(x)). For example:
W_rsp=f′(x)^2, (26)
where f′(x) denotes the slope of reshaping function f(x).
In another embodiment, one can derive piecewise weights directly based on codeword assignment for each bin. For example:
for the k-th bin,
W_rsp(k)=(M_k/M_a)^2. (27)
For a chroma component, the weight can be set to 1 or to some scaling factor sf. To reduce chroma distortion, sf can be set larger than 1. To increase chroma distortion, sf can be set smaller than 1. In one embodiment, sf can be used to compensate for equation (25). Since chromaDQP can only be set to an integer, sf can be used to accommodate the fractional part of chromaDQP; thus,
sf=2^((chromaDQP−INT(chromaDQP))/3).
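A minimal sketch of the per-bin chromaDQP derivation of equation (25) together with the fractional compensation factor sf is given below. The function name is hypothetical, and INT( ) is taken here to be a floor operation, which is an assumption; the text does not fix the rounding rule at this point.

```python
import math


def chroma_dqp_and_sf(m_k: int, m_a: int):
    """Return (integer chromaDQP, weight sf) for one bin.

    m_k: codewords assigned to the k-th bin; m_a: reference codeword count.
    INT() is implemented as floor (assumption).
    """
    scale_k = m_k / m_a                        # equation (25)
    dqp = 6.0 * math.log2(scale_k)             # real-valued chromaDQP
    dqp_int = math.floor(dqp)                  # value that can actually be signaled
    sf = 2.0 ** ((dqp - dqp_int) / 3.0)        # absorbs the fractional remainder
    return dqp_int, sf


if __name__ == "__main__":
    for m_k in (32, 40, 64):
        print(m_k, chroma_dqp_and_sf(m_k, 32))
```

With a floor-based INT( ), the remainder lies in [0, 1), so sf stays in [1, 2^(1/3)).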
In another embodiment, one can explicitly set the chromaQPOffset value in the Picture Parameter Set (PPS) or a slice header to control chroma distortion.
The reshaper curve or mapping function does not need to be fixed for the whole video sequence. For example, it can be adapted based on the quantization parameter (QP) or the target bit rate. In one embodiment, one can use a more aggressive reshaper curve when the bit rate is low and use less aggressive reshaping when the bit rate is relatively high. For example, given 32 bins in 10-bit sequences, each bin has initially 32 codewords. When the bit rate is relatively low, one can choose the number of codewords for each bin from the range [28, 40]. When the bit rate is high, one can choose from the range [31, 33] for each bin, or one can simply use an identity reshaper curve.
Given a slice (or a tile), reshaping at the slice (tile) level can be performed in a variety of ways that may trade-off coding efficiency with complexity, including: 1) disable reshaping in intra slices only; 2) disable reshaping in specific inter slices, such as inter slices on particular temporal level(s), or in inter slices which are not used for reference pictures, or in inter slices which are considered to be less important reference pictures. Such slice adaption can also be QP/rate dependent, so that different adaption rules can be applied for different QPs or bit rates.
In an encoder, under the proposed algorithm, a variance is computed for each bin (e.g., BinVar(b) in equation (13)). Based on that information, one can allocate codewords based on each bin variance. In one embodiment, BinVar(b) may be inversely linearly mapped to the number of codewords in each bin b. In another embodiment, non-linear mappings such as (BinVar(b))^2, sqrt(BinVar(b)), and the like, may be used to inversely map the number of codewords in bin b. In essence, this approach allows an encoder to apply arbitrary codewords to each bin, beyond the simpler mapping used earlier, where the encoder allocated codewords in each bin using the two upper-range values Mf and Ma (e.g., see
As an example,
alpha=(minCW−maxCW)/(maxVar−minVar);
beta=(maxCW*maxVar−minCW*minVar)/(maxVar−minVar);
bin_cw=round(alpha*bin_var+beta),
where minVar denotes the minimum variance across all bins, maxVar denotes the maximum variance across all bins, and minCW, maxCW denote the minimum and maximum number of codewords per bin, as determined by the reshaping model.
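The inverse linear mapping from bin variance to codewords can be sketched as below, using the alpha and beta expressions given above. The function name and the handling of the degenerate case where all bins share the same variance are assumptions.

```python
def codewords_from_variance(bin_var, min_cw, max_cw):
    """Map each bin variance to a codeword count between min_cw and max_cw.

    The lowest-variance bin receives max_cw and the highest-variance bin min_cw.
    """
    min_var, max_var = min(bin_var), max(bin_var)
    if max_var == min_var:
        # Degenerate case not covered by the text: use the midpoint (assumption).
        return [round((min_cw + max_cw) / 2)] * len(bin_var)
    alpha = (min_cw - max_cw) / (max_var - min_var)
    beta = (max_cw * max_var - min_cw * min_var) / (max_var - min_var)
    return [round(alpha * v + beta) for v in bin_var]


if __name__ == "__main__":
    print(codewords_from_variance([1.0, 3.0, 5.0], min_cw=28, max_cw=40))
```

Substituting bin_var=minVar gives alpha*minVar+beta=maxCW, and bin_var=maxVar gives minCW, which confirms that the mapping is inverse.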
In Ref. [6], to compensate for the interaction between luma and chroma, an additional chroma QP offset (denoted as chromaDQP or cQPO) and a luma-based chroma residual scaler (cScale) were defined. For example:
chromaQP=QP_luma+chromaQPOffset+cQPO, (28)
where chromaQPOffset denotes a chroma QP offset, and QP_luma denotes the luma QP for the coding unit. As presented in Ref. [6], in an embodiment
cQPO=−6*log 2(FwdLUT′[
where FwdLUT′ denotes the slope (first order derivative) of the FwdLUT( ). For an inter slice,
cScale=FwdLUT′[
where y=pow(2,x) denotes the y=2^x function.
Given the non-linear relationship between luma-derived QP values (denoted as qPi) and the final chroma QP values (denoted as Qpc) (for example, see Table 8-10, “Specification of Qpc as a function of qPi for ChromaArrayType equal to 1” in Ref [4]), in an embodiment cQPO and cScale may be further adjusted as follows.
Denote as f_QPi2QPc( ) a mapping between adjusted luma and chroma QP values, e.g., as in Table 8-10 of Ref. [4], then
For scaling the chroma residual, the scale needs to be calculated based on the actual difference between the chroma coding QP values before and after applying cQPO:
QPcBase=f_QPi2QPc[QP_luma+chromaQPOffset];
QPcFinal=f_QPi2QPc[QP_luma+chromaQPOffset+cQPO]; (32)
cQPO_refine=QPcFinal−QPcBase;
cScale=pow(2,−cQPO_refine/6).
In another embodiment, one can absorb chromaQPOffset into cScale too. For example,
QPcBase=f_QPi2QPc[QP_luma];
QPcFinal=f_QPi2QPc[QP_luma+chromaQPOffset+cQPO]; (33)
cTotalQPO_refine=QPcFinal−QPcBase;
cScale=pow(2,−cTotalQPO_refine/6).
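The refinement in equations (32) and (33) can be sketched as follows. The mapping `f_qpi2qpc` is written from the HEVC qPi-to-QpC relationship for ChromaArrayType equal to 1 and should be checked against Table 8-10 of Ref. [4]; the function names and the `absorb_offset` switch are hypothetical, and the clipping of qPi to its legal range is omitted.

```python
def f_qpi2qpc(qpi: int) -> int:
    """Luma-derived qPi to chroma QpC (HEVC-style mapping; verify against Ref. [4])."""
    table = [29, 30, 31, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37]  # qPi = 30..43
    if qpi < 30:
        return qpi
    if qpi > 43:
        return qpi - 6
    return table[qpi - 30]


def refined_cscale(qp_luma, chroma_qp_offset, cqpo, absorb_offset=False):
    """Return (refined QP offset, cScale).

    absorb_offset=False follows equation (32); True follows equation (33),
    where chromaQPOffset is folded into the scale as well.
    """
    base_in = qp_luma if absorb_offset else qp_luma + chroma_qp_offset
    qpc_base = f_qpi2qpc(base_in)
    qpc_final = f_qpi2qpc(qp_luma + chroma_qp_offset + cqpo)
    refine = qpc_final - qpc_base
    return refine, 2.0 ** (-refine / 6.0)


if __name__ == "__main__":
    print(refined_cscale(36, 0, -6))          # non-linear region of the table
    print(refined_cscale(20, 0, -6))          # linear region of the table
```

In the non-linear region of the table, a nominal cQPO of −6 yields an effective offset of −5, which is why the scale is computed from the refined difference rather than from cQPO directly.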
As an example, as described in Ref. [6], in an embodiment:
Let CSCALE_FP_PREC=16 denote a precision parameter
In an alternative embodiment, the operations for in-loop chroma reshaping may be expressed as follows. At the encoder side, for the residue (CxRes=CxOrg−CxPred) of chroma component Cx (e.g., Cb or Cr) of each CU or TU,
CxResScaled=CxRes*cScale[
where CxResScaled is the scaled Cb or Cr residue signal of the CU to be transformed and quantized. At the decoder side, CxResScaled is the scaled chroma residue signal after inverse quantization and transform, and
CxRes=CxResScaled/cScale[
The final reconstruction of chroma component is
CxRec=CxPred+CxRes. (36)
This approach allows the decoder to start inverse quantization and transform operations for chroma decoding immediately after syntax parsing. The cScale value being used for a CU may be shared by the Cb and Cr components, and from equations (29) and (30), it may be derived as:
where
Using cScale is not limited to chroma residue scaling for in-loop reshaping. The same method can be applied for out-of-loop reshaping as well. In out-of-loop reshaping, cScale may be used for scaling the chroma samples. The operations are the same as in the in-loop approach.
At the encoder side, when computing the chroma RDOQ, the lambda modifier for chroma adjustment (either when using QP offset or when using chroma residue scaling) also needs to be calculated based on the refined offset:
Modifier=pow(2,−cQPO_refine/3);
New_lambda=Old_lambda/Modifier. (38)
As noted in equation (35), using cScale may require a division in the decoder. To simplify the decoder implementation, one may decide to implement the same functionality using a division in the encoder and apply a simpler multiplication in the decoder. For example, let
cScaleInv=(1/cScale)
then, as an example, on an encoder
cResScale=CxRes*cScale=CxRes/(1/cScale)=CxRes/cScaleInv,
and on the decoder
CxRes=cResScale/cScale=cResScale*(1/cScale)=cResScale*cScaleInv.
In an embodiment, each luma-dependent chroma scaling factor may be calculated for a corresponding luma range in the piece-wise linear (PWL) representation instead of for each luma codeword value. Thus, chroma scaling factors may be stored in a smaller LUT (e.g., with 16 or 32 entries), say, cScaleInv[binIdx], instead of the 1024-entry LUT (for 10-bit Luma codewords) (say, cScale[Y]). The scaling operations at both the encoder and the decoder side may be implemented with fixed point integer arithmetic as follows:
c′=sign(c)*((abs(c)*s+2^(CSCALE_FP_PREC−1))>>CSCALE_FP_PREC),
where c is the chroma residual, s is the chroma residual scaling factor from cScaleInv[binIdx], binIdx is decided by the corresponding average luma value, and CSCALE_FP_PREC is a constant value related to precision.
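The fixed-point scaling rule above can be sketched as follows. CSCALE_FP_PREC=16 follows the precision parameter mentioned earlier; the helper `to_fixed`, which converts a real-valued scale into its integer representation, is hypothetical.

```python
CSCALE_FP_PREC = 16  # precision parameter, as in the earlier example


def to_fixed(scale: float) -> int:
    """Convert a real-valued scale to fixed point (hypothetical helper)."""
    return int(round(scale * (1 << CSCALE_FP_PREC)))


def scale_residual(c: int, s: int) -> int:
    """c' = sign(c) * ((abs(c) * s + 2^(CSCALE_FP_PREC-1)) >> CSCALE_FP_PREC)."""
    sign = -1 if c < 0 else 1
    rounding = 1 << (CSCALE_FP_PREC - 1)
    return sign * ((abs(c) * s + rounding) >> CSCALE_FP_PREC)


if __name__ == "__main__":
    s = to_fixed(1.5)
    print([scale_residual(c, s) for c in (-3, 0, 3, 10)])
```

Operating on abs(c) and restoring the sign afterwards keeps rounding symmetric around zero, which a plain arithmetic right shift of a negative value would not.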
In an embodiment, while the forward reshaping function may be represented using N equal segments (e.g., N=8, 16, 32, and the like), the inverse representation will comprise non-linear segments. From an implementation point of view, it is desirable to have a representation of the inverse reshaping function using equal segments as well; however, forcing such a representation may cause loss in coding efficiency. As a compromise, in an embodiment one may be able to construct an inverse reshaping function with a “mixed” PWL representation, combining both equal and unequal segments. For example, when using 8 segments, one may first divide the whole range into two equal segments, and then subdivide each of these into 4 unequal segments. Alternatively, one may divide the whole range into 4 equal segments and then subdivide each one into two unequal segments. Alternatively, one may first divide the whole range into several unequal segments, then subdivide each unequal segment into multiple equal segments. Alternatively, one may first divide the whole range into two equal segments, and then subdivide each equal segment into equal sub-segments, where the segment length in each group of sub-segments is not the same.
For example, without limitation, with 1,024 codewords, one could have: a) 4 segments with 150 codewords each and two segments with 212 codewords each, or b) 8 segments with 64 codewords each and 4 segments with 128 codewords each. The general purpose of such a combination of segments is to reduce the number of comparisons required to identify the PWL-piece index given a code value, thus simplifying hardware and software implementations.
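To illustrate why such a layout reduces comparisons, the following sketch locates the PWL piece index for option b) above (8 segments of 64 codewords and 4 segments of 128 codewords). The ordering of the two groups, with the 64-codeword segments covering the lower half of the range, is an assumption; the text does not specify it.

```python
def segment_index(code: int) -> int:
    """Return the PWL piece index for a 10-bit code value.

    Assumed layout: pieces 0..7 are 64 codewords wide and cover [0, 512);
    pieces 8..11 are 128 codewords wide and cover [512, 1024).
    """
    if not 0 <= code < 1024:
        raise ValueError("code value out of 10-bit range")
    if code < 512:                       # single comparison selects the group
        return code >> 6                 # equal 64-wide pieces: shift, no search
    return 8 + ((code - 512) >> 7)       # equal 128-wide pieces


if __name__ == "__main__":
    print([segment_index(c) for c in (0, 63, 64, 511, 512, 640, 1023)])
```

One comparison plus a shift replaces a search over up to 12 unequal pivots.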
In an embodiment, for a more efficient implementation related to chroma residue scaling, the following variations may be enabled:
As an example, given the decoder depicted in
As depicted in
From equation (34), at the decoder side, let CxResScaled denote the extracted scaled chroma residual signal after inverse quantization and transform (before block 288), and let
CxRes=CxResScaled*CScaleInv
denote the rescaled chroma residual generated by the Chroma Residual scaling block (288) to be used by the reconstruction unit (285-C) to compute CxRec=CxPred+CxRes, where CxPred is generated either by the Intra (284) or Inter (280) Prediction blocks.
The CScaleInv value being used for a Transform Unit (TU) may be shared by the Cb and Cr components and can be computed as follows:
Disabling luma-based chroma residual scaling for intra slices with dual trees may cause some loss in coding efficiency. To improve the effects of chroma reshaping, the following methods may be used:
In AVC and HEVC, the parameter delta_qp is allowed to modify the QP value for a coding block. In an embodiment, one can use the luma curve in the reshaper to derive the delta_qp value. One can derive a piece-wise lumaDQP value based on the codeword assignment for each bin. For example: for the k-th bin,
scale_k=(M_k/M_a); (39)
lumaDQP_k=INT(6*log2(scale_k)),
where INT( ) can be CEIL( ), ROUND( ), or FLOOR( ). The encoder can use a function of luma, e.g., average(luma), min(luma), max(luma), and the like, to find the luma value for that block, then use the corresponding lumaDQP value for that block. To get the rate-distortion benefit, from equation (27), one can use weighted distortion in mode decision and set
W_rsp(k)=scale_k^2.
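A sketch of the per-bin lumaDQP table and the matching distortion weights is shown below. The function name, the `int_fn` parameter used to choose among CEIL, ROUND, and FLOOR, and the treatment of bins with no codewords are assumptions.

```python
import math


def luma_dqp_per_bin(codewords, m_a, int_fn=round):
    """Return a list of (lumaDQP_k, W_rsp(k)) pairs, one per bin.

    int_fn plays the role of INT(): math.ceil, round, or math.floor.
    Bins with no codewords get (0, 0.0), which is an assumption.
    """
    table = []
    for m_k in codewords:
        if m_k <= 0:
            table.append((0, 0.0))
            continue
        scale_k = m_k / m_a                                  # equation (39)
        dqp_k = int(int_fn(6.0 * math.log2(scale_k)))
        table.append((dqp_k, scale_k ** 2))                  # weight = scale_k^2
    return table


if __name__ == "__main__":
    print(luma_dqp_per_bin([32, 64, 16, 0], m_a=32))
    print(luma_dqp_per_bin([40], m_a=32, int_fn=math.floor))
```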
In typical 10-bit video coding, it is preferable to use at least 32 bins for the reshaping mapping; however, to simplify the decoder implementation, in an embodiment, one may use fewer bins, say 16, or even 8 bins. Given that an encoder may already be using 32 bins to analyze the sequence and derive the codeword distribution, one can reuse the original 32-bin codeword distribution and derive the 16-bin codewords by adding the two corresponding 32-bin values inside each of the 16 bins, i.e.,
for i=0 to 15
For the chroma residue scaling factor, one can simply divide the codeword by 2, and point to the 32-bins chromaScalingFactorLUT. For example, given
the corresponding 16-bins CW allocation is
This approach can be extended to handle even fewer bins, say 8, then,
for i =0 to 7
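The bin-merging rule described above, in which each coarser bin receives the sum of the two finer bins it contains, can be sketched as follows. The function name `merge_bins` and the list names are hypothetical.

```python
def merge_bins(cw):
    """Halve the number of bins by summing adjacent pairs of codeword counts."""
    if len(cw) % 2:
        raise ValueError("number of bins must be even")
    return [cw[2 * i] + cw[2 * i + 1] for i in range(len(cw) // 2)]


if __name__ == "__main__":
    cw32 = [0, 0] + [34] * 28 + [36, 36]      # illustrative 32-bin allocation
    cw16 = merge_bins(cw32)                    # 32 bins -> 16 bins
    cw8 = merge_bins(cw16)                     # 16 bins -> 8 bins
    print(cw16)
    print(cw8)
```

Because each step only sums neighbors, the total number of allocated codewords is preserved at every bin count.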
When using a narrow range of valid codewords (e.g., [64, 940] for 10-bit signals and [16, 235] for 8-bit signals), care should be taken that the first and last bins do not map to reserved codewords. For example, for a 10-bit signal, with 8 bins, each bin will have 1024/8=128 codewords, and the first bin will be [0, 127]; however, since the standard codeword range is [64, 940], the first bin should only consider codewords [64, 127]. A special flag (e.g., video_full_range_flag=0) may be used to notify the decoder that the input video has a narrower range than the full range [0, 2^bitdepth−1] and that special care should be taken to not generate illegal codewords when processing the first and last bins. This is applicable to both luma and chroma reshaping.
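As a sketch of the narrow-range handling, the following hypothetical helper intersects each equal-width bin with the legal codeword range, so that the first and last bins exclude reserved codewords. The function name and the use of None for bins that lie entirely outside the legal range are assumptions.

```python
def valid_bin_ranges(bit_depth, num_bins, lo, hi):
    """Return per-bin (first, last) legal codewords, or None for an empty bin.

    lo and hi bound the legal codeword range, e.g. 64 and 940 for 10-bit video.
    """
    bin_size = (1 << bit_depth) // num_bins
    ranges = []
    for b in range(num_bins):
        first = max(b * bin_size, lo)
        last = min((b + 1) * bin_size - 1, hi)
        ranges.append((first, last) if first <= last else None)
    return ranges


if __name__ == "__main__":
    for b, r in enumerate(valid_bin_ranges(10, 8, 64, 940)):
        print(b, r)
```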
As an example, and without limitation, Appendix 2 provides an example syntax structure and associated syntax elements to support reshaping in the ISO/ITU Versatile Video Codec (VVC) (Ref. [8]) according to an embodiment using the architectures depicted in
Each one of the references listed herein is incorporated by reference in its entirety.
[8] B. Bross, J. Chen, and S. Liu, “Versatile Video Coding (Draft 3),” JVET output document, JVET-L1001, v9, uploaded, Jan. 8, 2019.
Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control, or execute instructions relating to signal reshaping and coding of images, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to the signal reshaping and coding processes described herein. The image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.
Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement methods related to signal reshaping and coding of images as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any non-transitory and tangible medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of non-transitory and tangible forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
Example embodiments that relate to the efficient signal reshaping and coding of images are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
The invention may be embodied in any of the forms described herein, including, but not limited to the following Enumerated Example Embodiments (EEEs) which describe structure, features, and functionality of some portions of the present invention.
accessing with a processor an input image in a first codeword representation; and
generating a forward reshaping function mapping pixels of the input image to a second codeword representation, wherein the second codeword representation allows for a more efficient compression than the first codeword representation, wherein generating the forward reshaping function comprises:
dividing the input image into multiple pixel regions;
assigning each of the pixel regions to one of multiple codeword bins according to a first luminance characteristic of each pixel region;
computing a bin metric for each one of the multiple codeword bins according to a second luminance characteristic of each of the pixel regions assigned to each codeword bin;
allocating a number of codewords in the second codeword representation to each codeword bin according to the bin metric of each codeword bin and a rate distortion optimization criterion;
and generating the forward reshaping function in response to the allocation of codewords in the second codeword representation to each of the multiple codeword bins.
assigning no codewords to the codeword bin, if no pixel regions are assigned to the codeword bin;
assigning a first number of codewords if the bin metric of the codeword bin is lower than an upper threshold value; and
assigning a second number of codewords to the codeword bin otherwise.
defining a set of potential threshold values;
for each threshold in the set of threshold values:
and selecting as the upper threshold the threshold value in the set of potential threshold values for which the RDO metric is minimum.
a flag indicating a minimum codeword bin index value to be used in a reshaping construction process,
a flag indicating a maximum codeword bin index value to be used in the reshaping construction process,
a flag indicating a reshaping model profile type, wherein each model profile type is associated with default bin-related parameters, or
one or more delta values to be used to adjust the default bin-related parameters.
0 if no codewords are assigned to the codeword bin;
2 if the first value of codewords is assigned to the codeword bin; and
1 otherwise.
dividing the luminance range of the pixel values in the input image into bins;
for each bin, determining a bin-histogram value and an average bin-variance value, wherein for a bin, the bin-histogram value comprises the number of pixels in the bin over the total number of pixels in the image and the average bin-variance value provides a metric of the average pixel variance of the pixels in the bin;
sorting the average bin variance values to generate a sorted list of average bin variance values and a sorted list of average bin variance-value indices;
computing a cumulative density function as a function of the sorted average bin variance values based on the bin-histogram values and the sorted list of average bin variance-value indices; and
determining the upper threshold based on a criterion satisfied by values of the cumulative density function.
where b denotes a bin number, PIC_ANALYZE_CW_BINS denotes the total number of bins, BinVarSortDsdCDF[b] denotes the output of the CDF function for bin b, BinHist[i] denotes the bin-histogram value for bin i, and BinIdxSortDsd[] denotes the sorted list of average bin variance-value indices.
receiving in a coded bitstream syntax elements characterizing a reshaping model, wherein the syntax elements include one or more of
a flag indicating a minimum codeword bin index value to be used in a reshaping construction process,
a flag indicating a maximum codeword bin index value to be used in a reshaping construction process,
a flag indicating a reshaping model profile type, wherein the model profile type is associated with default bin-related parameters, including bin importance values, or
a flag indicating one or more delta bin importance values to be used to adjust the default bin importance values defined in the reshaping model profile;
determining, based on the reshaping model profile, the default bin importance values for each bin and an allocation list of default numbers of codewords to be allocated to each bin according to the bin's importance value;
for each codeword bin:
generating a forward reshaping function based on the number of codewords allocated to each codeword bin.
where Ma and Mf are elements of the allocation list and bin_importance[k] denotes the bin importance value of the k-th bin.
receiving a coded bitstream (122) comprising one or more coded reshaped images in a first codeword representation and metadata (207) related to reshaping information for the coded reshaped images;
generating (250) an inverse reshaping function based on the metadata related to the reshaping information, wherein the inverse reshaping function maps pixels of the reshaped image from the first codeword representation to a second codeword representation;
generating (250) a forward reshaping function based on the metadata related to the reshaping information, wherein the forward reshaping function maps pixels of an image from the second codeword representation to the first codeword representation;
extracting from the coded bitstream a coded reshaped image comprising one or more coded units, wherein for one or more coded units in the coded reshaped image:
for an intra-coded coding unit (CU) in the coded reshaped image:
for an inter-coded coding unit in the coded reshaped image:
generating a decoded image based on the stored samples in the reference buffer.
Example implementation of bubble sort.
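As a non-authoritative sketch, a bubble sort that orders the average bin variances in descending order while tracking the original bin indices could look as follows, together with the accumulation of the CDF over the sorted bins. The names bin_var_sort_dsd, bin_idx_sort_dsd, and bin_hist mirror BinVarSortDsd, BinIdxSortDsd, and BinHist in the text; the descending order and the exact accumulation rule are assumptions inferred from those names.

```python
def bubble_sort_descending(bin_var):
    """Return (sorted variances, original bin indices), largest variance first."""
    values = list(bin_var)
    idx = list(range(len(values)))
    n = len(values)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if values[j] < values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                swapped = True
        if not swapped:          # already sorted, stop early
            break
    return values, idx


def cdf_over_sorted_bins(bin_hist, bin_idx_sort_dsd):
    """Accumulate bin-histogram values in the order given by the sorted indices."""
    cdf, total = [], 0.0
    for i in bin_idx_sort_dsd:
        total += bin_hist[i]
        cdf.append(total)
    return cdf


if __name__ == "__main__":
    bin_var_sort_dsd, bin_idx_sort_dsd = bubble_sort_descending([2.0, 5.0, 1.0, 3.0])
    print(bin_var_sort_dsd, bin_idx_sort_dsd)
    print(cdf_over_sorted_bins([0.125, 0.25, 0.5, 0.125], bin_idx_sort_dsd))
```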
As an example, this Appendix provides an example syntax structure and associated syntax elements according to an embodiment to support reshaping in the Versatile Video Codec (VVC) (Ref. [8]), currently under joint development by ISO and ITU. New syntax elements in the existing draft version are either in an italic font or explicitly noted. Equation numbers like (8-xxx) denote placeholders to be updated, as needed, in the final specification.
u(1)
if ( sps_reshaper_enabled_flag ) {
    tile_group_reshaper_model_present_flag    u(1)
    if ( tile_group_reshaper_model_present_flag )
        tile_group_reshaper_model ( )
    tile_group_reshaper_enable_flag    u(1)
    if ( tile_group_reshaper_enable_flag && ( !( qtbtt_dual_tree_intra_flag && tile_group_type == 1 ) ) )
        tile_group_reshaper_chroma_residual_scale_flag    u(1)
}
Add a new syntax table tile group reshaper model:
tile_group_reshaper_model ( ) {
    reshaper_model_min_bin_idx    ue(v)
    reshaper_model_delta_max_bin_idx    ue(v)
    reshaper_model_bin_delta_abs_cw_prec_minus1    ue(v)
    for ( i = reshaper_model_min_bin_idx; i <= reshaper_model_max_bin_idx; i++ ) {
        reshaper_model_bin_delta_abs_CW[ i ]    u(v)
        if ( reshaper_model_bin_delta_abs_CW[ i ] > 0 )
            reshaper_model_bin_delta_sign_CW_flag[ i ]    u(1)
    }
}
In General sequence parameter set RBSP semantics, add the following semantics:
sps_reshaper_enabled_flag equal to 1 specifies that reshaper is used in the coded video sequence (CVS). sps_reshaper_enabled_flag equal to 0 specifies that reshaper is not used in the CVS.
In Tile Group Header Syntax, Add the Following Semantics
tile_group_reshaper_model_present_flag equal to 1 specifies tile_group_reshaper_model( ) is present in tile group header. tile_group_reshaper_model_present_flag equal to 0 specifies tile_group_reshaper_model( ) is not present in tile group header. When tile_group_reshaper_model_present_flag is not present, it is inferred to be equal to 0.
tile_group_reshaper_enable_flag equal to 1 specifies that reshaper is enabled for the current tile group. tile_group_reshaper_enable_flag equal to 0 specifies that reshaper is not enabled for the current tile group. When tile_group_reshaper_enable_flag is not present, it is inferred to be equal to 0.
tile_group_reshaper_chroma_residual_scale_flag equal to 1 specifies that chroma residual scaling is enabled for the current tile group. tile_group_reshaper_chroma_residual_scale_flag equal to 0 specifies that chroma residual scaling is not enabled for the current tile group. When tile_group_reshaper_chroma_residual_scale_flag is not present, it is inferred to be equal to 0.
reshaper_model_min_bin_idx specifies the minimum bin (or piece) index to be used in the reshaper construction process. The value of reshaper_model_min_bin_idx shall be in the range of 0 to MaxBinldx, inclusive. The value of MaxBinldx shall be equal to 15.
reshaper_model_delta_max_bin_idx specifies the maximum allowed bin (or piece) index MaxBinIdx minus the maximum bin index to be used in the reshaper construction process. The value of reshaper_model_max_bin_idx is set equal to MaxBinIdx − reshaper_model_delta_max_bin_idx.
InputPivot[i]=i*OrgCW
The variable ReshapePivot[i], with i in the range of 0 to MaxBinIdx+1, inclusive, and the variables ScaleCoeff[i] and InvScaleCoeff[i], with i in the range of 0 to MaxBinIdx, inclusive, are derived as follows:
The variable ChromaScaleCoef[i], with i in the range of 0 to MaxBinIdx, inclusive, is derived as follows:
shiftC =11
Inputs to this process are:
predSamplesComb[x][y]=(w*predSamplesIntra[x][y]+(8−w)*predSamplesInter[x][y])>>3 (8-740)
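As a non-limiting illustration, the weighted combination of equation (8-740) may be sketched as follows. The function name and the representation of sample arrays as nested lists are hypothetical, and the weight w is assumed to lie in the range of 0 to 8, inclusive.

```python
def combine_intra_inter(pred_intra, pred_inter, w):
    """Blend intra and inter prediction samples as (w*intra + (8-w)*inter) >> 3.

    Assumption: w is an integer weight in the range 0..8.
    """
    return [
        [(w * p_intra + (8 - w) * p_inter) >> 3
         for p_intra, p_inter in zip(row_intra, row_inter)]
        for row_intra, row_inter in zip(pred_intra, pred_inter)
    ]
```

With w equal to 8 the output equals the intra prediction, and with w equal to 0 it equals the inter prediction. As written, the equation carries no rounding offset, so the shift truncates.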
Add the Following in Picture Reconstruction Process
Inputs to this process are:
recSamples[xCurr+i][yCurr+j]=ClipCidx1(predSamples[i][j]+resSamples[i][j]) (8-xxx)
predMapSamples[xCurr+i][yCurr+j]=predSamples[i][j] (8-xxx)
predMapSamples[xCurr+i][yCurr+j]=ReshapePivot[idxY]+(ScaleCoeff[idxY]*(predSamples[i][j]−InputPivot[idxY])+(1<<(shiftY−1)))>>shiftY (8-xxx)
The recSamples is derived as follows:
recSamples[xCurr+i][yCurr+j]=Clip1Y(predMapSamples[xCurr+i][yCurr+j]+resSamples[i][j]) (8-xxx)
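By way of example only, the luma reconstruction above (forward mapping of the prediction sample, followed by addition of the residual and clipping) may be sketched as follows. Several quantities are not given in this passage and are assumptions of the sketch: a 10-bit luma bit depth, shiftY equal to 14, OrgCW equal to the codeword range divided by 16 bins, idxY obtained by dividing the sample by OrgCW, and the construction of ReshapePivot and ScaleCoeff from a hypothetical per-bin codeword list `bin_cw`. The rounding term and right shift are assumed to mirror those of the inverse-mapping equation.

```python
BIT_DEPTH = 10        # assumed luma bit depth
SHIFT_Y = 14          # assumed fixed-point precision
MAX_BIN_IDX = 15
ORG_CW = (1 << BIT_DEPTH) // (MAX_BIN_IDX + 1)  # assumed: 64 codewords per input bin


def build_forward_tables(bin_cw):
    """Build hypothetical pivot and scale tables from per-bin codeword counts."""
    input_pivot = [i * ORG_CW for i in range(MAX_BIN_IDX + 2)]  # InputPivot[i] = i*OrgCW
    reshape_pivot = [0]
    for cw in bin_cw:
        reshape_pivot.append(reshape_pivot[-1] + cw)
    # Assumed derivation: slope cw/OrgCW in fixed point, rounded to nearest.
    scale_coeff = [((cw << SHIFT_Y) + ORG_CW // 2) // ORG_CW for cw in bin_cw]
    return input_pivot, reshape_pivot, scale_coeff


def clip1_y(x):
    """Clip to the valid luma range [0, 2^BitDepth - 1]."""
    return max(0, min((1 << BIT_DEPTH) - 1, x))


def forward_map(pred, input_pivot, reshape_pivot, scale_coeff):
    idx_y = pred // ORG_CW  # assumed piece index for equal-width input bins
    # Assumed grouping: the shift applies to the scaled offset, then the pivot is added.
    offset = (scale_coeff[idx_y] * (pred - input_pivot[idx_y])
              + (1 << (SHIFT_Y - 1))) >> SHIFT_Y
    return reshape_pivot[idx_y] + offset


def reconstruct_luma(pred, res, tables, apply_forward_map=True):
    """recSamples = Clip1Y(predMapSamples + resSamples).

    The condition selecting between the mapped and unmapped prediction is not
    stated in this passage, so it is exposed here as a hypothetical flag.
    """
    pred_map = forward_map(pred, *tables) if apply_forward_map else pred
    return clip1_y(pred_map + res)
```

With every entry of `bin_cw` equal to 64, the tables describe an identity mapping under these assumptions, which offers a quick sanity check of the fixed-point arithmetic.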
Inputs to this process are:
recSamples[xCurr+i][yCurr+j]=Clip1C(predSamples[i][j]+resSamples[i][j]) (8-xxx)
recSamples[xCurr+i][yCurr+j]=ClipCidx1(predSamples[i][j]+Sign(resSamples[i][j])*((Abs(resSamples[i][j])*varScale+(1<<(shiftC−1)))>>shiftC)) (8-xxx)
recSamples[xCurr+i][yCurr+j]=ClipCidx1(predSamples[i][j]) (8-xxx)
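As a non-limiting sketch, the chroma reconstruction with residual scaling may be written as follows, using shiftC equal to 11. The derivation of varScale is not reproduced in this passage, so `var_scale` is simply taken as an input. The 10-bit bit depth and the convention of passing `None` to indicate that scaling is not applied are assumptions of the sketch.

```python
SHIFT_C = 11


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def clip1_c(x, bit_depth=10):
    """Clip to the valid chroma range; the 10-bit default is an assumption."""
    return max(0, min((1 << bit_depth) - 1, x))


def reconstruct_chroma(pred, res, var_scale=None, bit_depth=10):
    """Reconstruct one chroma sample.

    With var_scale=None (hypothetical convention) the residual is added as is.
    Otherwise the magnitude of the residual is scaled in fixed point with
    rounding, and the sign is restored afterwards.
    """
    if var_scale is None:
        return clip1_c(pred + res, bit_depth)
    scaled = sign(res) * ((abs(res) * var_scale + (1 << (SHIFT_C - 1))) >> SHIFT_C)
    return clip1_c(pred + scaled, bit_depth)
```

Scaling the magnitude and restoring the sign afterwards keeps the rounding symmetric for positive and negative residuals. In this fixed-point form, a `var_scale` of 1 << shiftC (2048) corresponds to a scale factor of one.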
This clause is invoked when the value of tile_group_reshaper_enabled_flag is equal to 1. The input is the reconstructed picture luma sample array SL, and the output is the modified reconstructed picture luma sample array S′L after the inverse mapping process.
The inverse mapping process for luma sample value is specified in 8.4.6.1.
Input to this process is a luma location (xP, yP) specifying the luma sample location relative to the top-left luma sample of the current picture.
Output of this process is an inverse mapped luma sample value invLumaSample.
The value of invLumaSample is derived by applying the following ordered steps:
invLumaSample=InputPivot[idxYInv]+(InvScaleCoeff[idxYInv]*(SL[xP][yP]−ReshapePivot[idxYInv])+(1<<(shiftY−1)))>>shiftY (8-xxx)
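For illustration only, the inverse mapping equation above may be sketched as follows. The values of shiftY and OrgCW, the construction of the tables from a hypothetical per-bin codeword list `bin_cw`, and the linear search used to locate the piece index are all assumptions of the sketch. They are not taken from the piece-identification process of this specification.

```python
SHIFT_Y = 14      # assumed fixed-point precision
ORG_CW = 64       # assumed input bin width (10-bit samples, 16 bins)


def build_inverse_tables(bin_cw):
    """Build hypothetical pivot and inverse-scale tables from per-bin codeword counts."""
    input_pivot = [i * ORG_CW for i in range(len(bin_cw) + 1)]
    reshape_pivot = [0]
    for cw in bin_cw:
        reshape_pivot.append(reshape_pivot[-1] + cw)
    # Assumed derivation: slope OrgCW/cw in fixed point; empty bins get 0.
    inv_scale_coeff = [(ORG_CW << SHIFT_Y) // cw if cw else 0 for cw in bin_cw]
    return input_pivot, reshape_pivot, inv_scale_coeff


def find_piece(s, reshape_pivot):
    """Hypothetical linear search for the piece whose range contains s."""
    last = len(reshape_pivot) - 2
    for idx in range(last + 1):
        if s < reshape_pivot[idx + 1]:
            return idx
    return last


def inverse_map(s, input_pivot, reshape_pivot, inv_scale_coeff):
    """invLumaSample = InputPivot + ((InvScaleCoeff*(s - ReshapePivot) + round) >> shiftY)."""
    idx = find_piece(s, reshape_pivot)
    offset = (inv_scale_coeff[idx] * (s - reshape_pivot[idx])
              + (1 << (SHIFT_Y - 1))) >> SHIFT_Y
    return input_pivot[idx] + offset
```

Because the inverse slope is stored in fixed point, a forward mapping followed by this inverse mapping need not return the original sample exactly. Any further clipping of the result is outside the scope of the sketch.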
Input to this process is a luma sample value S.
Output of this process is an index idxS identifying the piece to which the sample S belongs.
The variable idxS is derived as follows:
Note that an alternative implementation to find the index idxS is as follows:
This application claims the benefit of priority to U.S. Provisional Patent Applications Ser. No. 62/792,122, filed on Jan. 14, 2019, Ser. No. 62/782,659, filed on Dec. 20, 2018, Ser. No. 62/772,228, filed on Nov. 28, 2018, Ser. No. 62/739,402, filed on Oct. 1, 2018, Ser. No. 62/726,608, filed on Sep. 4, 2018, Ser. No. 62/691,366, filed on Jun. 28, 2018, and Ser. No. 62/630,385, filed on Feb. 14, 2018, each of which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/017891 | 2/13/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62630385 | Feb 2018 | US | |
62691366 | Jun 2018 | US | |
62726608 | Sep 2018 | US | |
62739402 | Oct 2018 | US | |
62772228 | Nov 2018 | US | |
62782659 | Dec 2018 | US | |
62792122 | Jan 2019 | US |