This application relates generally to systems and methods of encoding high dynamic range (HDR) video content using reshaping algorithms.
As used herein, the term ‘dynamic range’ (DR) may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest grays (blacks) to brightest whites (highlights). In this sense, DR relates to a ‘scene-referred’ intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a ‘display-referred’ intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g. interchangeably.
As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans some 14-15 orders of magnitude of the human visual system (HVS). In practice, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated, in relation to HDR. As used herein, the terms enhanced dynamic range (EDR) or visual dynamic range (VDR) may individually or interchangeably relate to the DR that is perceivable within a scene or image by a human visual system (HVS) that includes eye movements, allowing for some light adaptation changes across the scene or image.
In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr) wherein each color component is represented by a precision of n-bits per pixel (e.g., n=8). Using linear luminance coding, images where n≤8 (e.g., color 24-bit JPEG images) are considered images of standard dynamic range, while images where n>8 may be considered images of enhanced dynamic range. EDR and HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
As used herein, the term “metadata” relates to any auxiliary information that is transmitted as part of the coded bitstream and assists a decoder to render a decoded image. Such metadata may include, but is not limited to, color space or gamut information, reference display parameters, and auxiliary signal parameters, such as those described herein.
Most consumer desktop displays currently support luminance of 200 to 300 cd/m2 or nits. Most consumer HDTVs range from 300 to 500 nits with new models reaching 1000 nits (cd/m2). Such conventional displays thus typify a lower dynamic range (LDR), also referred to as a standard dynamic range (SDR), in relation to HDR or EDR. As the availability of HDR content grows due to advances in both capture equipment (e.g., cameras) and HDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).
Gadgil, Neeraj J. et al.: “Efficient Banding-Alleviating Inverse Tone Mapping for High Dynamic Range Video”, 53rd Asilomar Conference on Signals, Systems, and Computers, IEEE, 3 Nov. 2019, pages 1885-1889, XP033750575, discloses that an approach to constructing an HDR image from a standard dynamic range (SDR) image is to use inverse tone mapping (iTM). However, iTM can create or amplify visual artifacts such as banding/false contouring in the resulting HDR images. To address this, a novel method is proposed to efficiently construct an iTM curve that reduces banding in the highlight regions of HDR images. The proposed approach uses a given iTM curve to estimate the banding risk in each luminance range, based on the input SDR image properties. Then, the risk measure is used to adjust the local slope of the iTM curve to avoid banding in the resulting HDR images. Experimental results show that the proposed method is highly effective in reducing banding in the HDR images.
WO 2020/033573 A1 discloses methods and systems for reducing banding artifacts when displaying high-dynamic-range images. Given an input image in a first dynamic range, and an input backward reshaping function mapping codewords from the first dynamic range to a second dynamic range, wherein the second dynamic range is equal or higher than the first dynamic range, statistical data based on the input image and the input backward reshaping function are generated to estimate the risk of banding artifacts in a target image in the second dynamic range generated by applying the input backward reshaping function to the input image. Separate banding alleviation algorithms are applied in the darks and highlights parts of the first dynamic range to generate a modified backward reshaping function, which when applied to the input image to generate the target image eliminates or reduces banding in the target image.
WO 2020/072651 A1 discloses methods and systems for reducing banding artifacts when displaying high-dynamic-range images reconstructed from coded reshaped images. Given an input image in a high dynamic range (HDR) which is mapped to a second image in a second dynamic range, banding artifacts in a reconstructed HDR image generated using the second image are reduced by a) in darks and mid-tone regions of the input image, adding noise to the input image before being mapped to the second image, and b) in highlights regions of the input image, modifying an input backward reshaping function, wherein the modified backward reshaping function will be used by a decoder to map a decoded version of the second image to the reconstructed HDR image. An example noise generation technique using simulated film-grain noise is provided.
EP 3 203 442 A1 discloses a processor for signal reshaping that receives an input image with an input bit depth. Block-based standard deviations are computed. The input codewords are divided into codeword bins and each bin is assigned a standard deviation value. For each bin, a standard deviation to bit-depth function is applied to the bin values to generate minimal bit depth values for each codeword bin. An output codeword mapping function is generated based on the input bit depth, a target bit depth, and the minimal bit depth values. The codeword mapping function is applied to the input image to generate an output image in the target bit depth.
The invention is defined by the independent claims. The dependent claims concern optional features of some embodiments of the invention. In growing uses for HDR content, such as cloud-based gaming, there is a need to transmit HDR video data to target display devices (e.g., a TV) using an encoding, such as an 8-bit base layer (BL), that has minimum latency. For cloud-gaming cases specifically, an 8-bit advanced video coding (AVC) BL may be needed. Accordingly, encoders for such cases need to transfer HDR content to a lower bit-depth domain and provide metadata to the receiving decoder such that the decoder reconstructs the HDR content from the decompressed BL.
Additionally, for cloud-based gaming and other real-time uses of HDR content, there is a need for low latency and lightweight computations. Accordingly, feature-based efficient reshaping algorithms for converting HDR content to BL and generating backwards reshaping metadata for reconstructing the HDR content may be used. Bitstreams may be generated that allow for an eight-piece polynomial function for luma reshaping and two-piece polynomials for chroma reshaping. This avoids heavy computation, reducing latency.
Additionally, an 8-bit BL may experience banding artifacts in the reconstructed HDR content. Banding generally appears in the smoother regions of an image. The visibility of banding depends on how large (e.g., how many pixels) the affected region is relative to the image as a whole. By using a content-adaptive non-linear reshaping function, banding is minimized.
Proposed systems and methods collect and use block-based image statistics, such as the standard deviation and histogram image statistics in the luma channel. These statistics are used to construct an image-feature as a function of discrete luma codeword-ranges (known as “bins”). The value of this binwise-feature indicates which bin has the greatest need for codewords. The identified bin is assigned as the “peak” bin for a functional curve (e.g., a Gaussian curve, a parabolic curve, or the like) that encompasses the entire luma codeword range. The shape of the curve is determined by the relative values of the feature. This functional curve is used to compute a forward reshaping function for the image. The forward reshaping function is used to compress the HDR video data, as described in more detail below. Additionally, to improve banding reduction performance, synthetically-generated film-grain noise of a fixed maximum strength can be injected to the HDR luma channel prior to reshaping. Accordingly, the proposed encoding framework is computationally efficient to meet the low delay requirement and is effective in reducing banding in the reconstructed HDR content.
Various aspects of the present disclosure relate to devices, systems, and methods for encoding video data using reshaping algorithms. While certain embodiments are directed to HDR video data, video data may also include Standard Dynamic Range (SDR) video data and other User Generated Content (UGC), such as gaming content.
In one exemplary aspect of the present disclosure, there is provided a video delivery system for context-based encoding of video data. The delivery system comprises a processor to perform encoding of video data. The processor is configured to receive the video data, the video data composed of a plurality of image frames, each image frame including a plurality of pixel blocks. The processor is configured to determine, for each pixel block, a luma bin index, determine, for each luma bin, a banding risk value, and determine Gaussian function parameters based on the banding risk value. The processor is configured to generate a differential reshaping function using the Gaussian function parameters, compute a luma-based forward reshaping function based on the differential reshaping function, and generate an output image for each image frame by applying the luma-based forward reshaping function to the respective image frame.
In another exemplary aspect of the present disclosure, there is provided a method for context-based encoding of video data. The method includes receiving the video data, the video data composed of a plurality of image frames, each image frame including a plurality of pixel blocks. The method includes determining, for each pixel block, a luma bin index, determining, for each luma bin, a banding risk value, and determining Gaussian function parameters based on the banding risk value. The method includes generating a differential reshaping function using the Gaussian function parameters, computing a luma-based forward reshaping function based on the differential reshaping function, and generating an output image for each image frame by applying the luma-based forward reshaping function to the respective image frame.
In another exemplary aspect of the present disclosure, there is provided a non-transitory computer-readable medium storing instructions that, when executed by a processor of a video delivery system, cause the video delivery system to perform operations comprising receiving the video data, the video data composed of a plurality of image frames, each image frame including a plurality of pixel blocks, determining, for each pixel block, a luma bin index, determining, for each luma bin, a banding risk value, determining Gaussian function parameters based on the banding risk value, generating a differential reshaping function using the Gaussian function parameters, computing a luma-based forward reshaping function based on the differential reshaping function, and generating an output image for each image frame by applying the luma-based forward reshaping function to the respective image frame.
In this manner, various aspects of the present disclosure provide for the display of images having a high dynamic range and high resolution, and effect improvements in at least the technical fields of image projection, holography, signal processing, and the like.
These and other more detailed and specific features of various embodiments are more fully disclosed in the following description, reference being had to the accompanying drawings, in which:
This disclosure and aspects thereof can be embodied in various forms, including hardware, devices or circuits controlled by computer-implemented methods, computer program products, computer systems and networks, user interfaces, and application programming interfaces; as well as hardware-implemented methods, signal processing circuits, memory arrays, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and the like. The foregoing is intended solely to give a general idea of various aspects of the present disclosure, and does not limit the scope of the disclosure in any way.
In the following description, numerous details are set forth, such as optical device configurations, timings, operations, and the like, in order to provide an understanding of one or more aspects of the present disclosure. It will be readily apparent to one skilled in the art that these specific details are merely exemplary and not intended to limit the scope of this application.
Moreover, while the present disclosure focuses mainly on examples in which the various circuits are used in digital projection systems, it will be understood that these are merely examples. It will further be understood that the disclosed systems and methods can be used in any device in which there is a need to project light; for example, cinema, consumer, and other commercial projection systems, heads-up displays, virtual reality displays, and the like. Disclosed systems and methods may be implemented in additional display devices, such as with an OLED display, an LCD display, a quantum dot display, or the like.
The video data of production stream (112) is then provided to a processor (or one or more processors such as a central processing unit (CPU)) at block (115) for post-production editing. Block (115) post-production editing may include adjusting or modifying colors or brightness in particular areas of an image to enhance the image quality or achieve a particular appearance for the image in accordance with the video creator's creative intent. This is sometimes called “color timing” or “color grading.” Other editing (e.g. scene selection and sequencing, image cropping, addition of computer-generated visual special effects, etc.) may be performed at block (115) to yield a final version (117) of the production for distribution. During post-production editing (115), video images are viewed on a reference display (125).
Following post-production (115), video data of final production (117) may be delivered to encoding block (120) for delivering downstream to decoding and playback devices such as television sets, set-top boxes, movie theaters, and the like. In some embodiments, coding block (120) may include audio and video encoders, such as those defined by ATSC, DVB, DVD, Blu-Ray, and other delivery formats, to generate coded bit stream (122). Methods described herein may be performed by the processor at block (120). In a receiver, the coded bit stream (122) is decoded by decoding unit (130) to generate a decoded signal (132) representing an identical or close approximation of signal (117). The receiver may be attached to a target display (140) which may have completely different characteristics than the reference display (125). In that case, a display management block (135) may be used to map the dynamic range of decoded signal (132) to the characteristics of the target display (140) by generating display-mapped signal (137). Additional methods described herein may be performed by the decoding unit (130) or the display management block (135). Both the decoding unit (130) and the display management block (135) may include their own processor, or may be integrated into a single processing unit.
As mentioned above, a Gaussian function may be used for determining the reshaping function. A Gaussian function y(⋅) for an input x is defined as:

y(x) = a·exp(−(x−μG)^2/(2σG^2))

where μG is the mean and σG is the standard deviation of the underlying Gaussian distribution. Without loss of generality, a=1 may be set such that the maximum of the Gaussian function is 1 for given inputs. For example, setting μG=0.5 produces a bell-shaped curve whose maximum is centered at the mean μG=0.5 and which tapers down symmetrically in both directions as x moves away from the mean. Varying the mean μG varies the “location” of the peak with respect to x. Additionally, varying 1/(2σG^2), henceforth referred to as kG, varies the width of the Gaussian function. Specifically, a higher value of kG results in a steeper bell-shaped curve, whereas lower values of kG “flatten” the curve.
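The effect of the parameters μG and kG on the curve can be illustrated with a minimal sketch (Python is used here for illustration; the disclosure's own examples use C-like pseudocode, and the function name is hypothetical):

```python
import math

def gaussian(x, mu_g, k_g, a=1.0):
    """Gaussian y(x) = a * exp(-k_g * (x - mu_g)^2), with k_g = 1/(2*sigma_g^2)."""
    return a * math.exp(-k_g * (x - mu_g) ** 2)

# With a = 1, the maximum of the curve is 1 and occurs at the mean.
assert gaussian(0.5, 0.5, 8.0) == 1.0
# The curve tapers symmetrically in both directions away from the mean.
assert abs(gaussian(0.3, 0.5, 8.0) - gaussian(0.7, 0.5, 8.0)) < 1e-9
# A higher k_G yields a steeper (narrower) bell: smaller value at the same offset.
assert gaussian(0.3, 0.5, 16.0) < gaussian(0.3, 0.5, 8.0)
```

The assertions mirror the properties described above: peak at μG, symmetry, and the width's inverse dependence on kG.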
Let (viY, viCb, viCr) be the YCbCr values at pixel i of an original HDR image of bit-depth ηv and spatial dimensions (W×H). Pixels are assumed to be stored in a data-structure in raster-scan order. There are a total of Nv=2^ηv codewords for the p-channel, p={Y, Cb, Cr}. Let (vLY, vHY), (vLCb, vHCb), (vLCr, vHCr) be the minimum and maximum values in the Y, Cb, Cr channels, denoted (vLp, vHp) for the p-channel.
Let sip be the forward-reshaped (BL) signal in the p-axis. There are Ns=2^ηs BL codewords, where ηs is the BL bit-depth.
Let TpF(⋅):[0, Nv−1]→[0, Ns−1] be the single-channel forward reshaping (integer-valued) function for the p-channel, where p can be one of the Y, Cb, Cr channels. It can be stored in the form of a look-up table (LUT), known as the forward LUT (FLUT). Let sip be the resulting reshaped i'th pixel value of the p-channel:

sip = TpF(vip)
The FLUT TpF(⋅) can also be constructed using a normalized FLUT {tilde over (T)}pF(⋅):[0,1)→[0,1):

sip = clip3(round[Ns·{tilde over (T)}pF(vip/Nv)], 0, Ns−1)

where round[⋅] is the rounding operation and clip3(value, a, b) is the clipping operation that limits value to the range [a, b].
Let {tilde over (T)}pB(⋅):[0,1)→[0,1) be the normalized backward reshaping function (BLUT) for the p-channel.
Using the BL and the BLUT, the reconstructed normalized HDR value in the pth channel at pixel i is

{tilde over (v)}i(r)p = {tilde over (T)}pB(sip/Ns)

and the reconstructed ηv-bit HDR is: vi(r)p = clip3(round[Nv·{tilde over (v)}i(r)p], 0, Nv−1).
The normalized BLUT {tilde over (T)}pB(⋅) is expressed as a set of polynomial pieces defined over pivot points {ρmp}, such that the m'th piece is used to reconstruct a given normalized BL value {tilde over (s)}ip, where m is such that ρmp ≤ {tilde over (s)}ip < ρm+1p. Note that, as a standard practice, TpF(⋅) and TpB(⋅) may be monotonically non-decreasing functions.
A forward reshaping function is a monotonically non-decreasing function that transfers a higher bit-depth codeword (e.g., 16-bit [0, 65535]) to a lower bit-depth (e.g., 8-bit [0, 255]). The forward reshaping function may be expressed as a forward look-up table (FLUT). The FLUT can be constructed using a differential look-up table (dLUT) that specifies the amount of increment over the previous value of the FLUT to get the current value. For example:

{tilde over (T)}YF(v) = {tilde over (T)}YF(v−1) + δY(v)   (Equation 6)

where δY(⋅) denotes the dLUT. Equation 6 also gives the expression for the FLUT using cumulative summation of all small increments up to the current codeword:

{tilde over (T)}YF(v) = Σu≤v δY(u)
Accordingly, the dLUT defines the local slope of its corresponding FLUT. The dLUT specifies how many codewords are allocated to a given luminance range. A higher value of dLUT indicates more codewords in that range. Conversely, the luma range that needs to spend more codewords for its corresponding image-content to be reconstructed needs a higher dLUT value compared to other luma ranges. Additionally, the dLUT may be based on image statistics such that the codewords are allocated in a way that removes banding.
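The relationship between the dLUT and the FLUT described above can be sketched as follows (Python for illustration; the helper name and toy dLUT values are hypothetical, and the normalization step is an assumption):

```python
def flut_from_dlut(dlut):
    """Build a normalized FLUT by cumulative summation of dLUT increments
    (as in Equation 6), then scale so the FLUT spans [0, 1]."""
    flut, acc = [], 0.0
    for d in dlut:
        acc += d
        flut.append(acc)
    total = flut[-1]
    return [f / total for f in flut]

# Toy dLUT peaked in the middle of the codeword range.
dlut = [0.1, 0.2, 0.4, 0.2, 0.1]
flut = flut_from_dlut(dlut)
# The resulting FLUT is monotonically non-decreasing and reaches 1.0.
assert all(b >= a for a, b in zip(flut, flut[1:]))
assert abs(flut[-1] - 1.0) < 1e-12
# The range with the largest dLUT value gets the largest local slope,
# i.e., the largest share of output codewords.
assert (flut[2] - flut[1]) == max(b - a for a, b in zip(flut, flut[1:]))
```

This makes concrete the statement that the dLUT is the local slope of its FLUT: a high dLUT value over some luma range concentrates output codewords there.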
In some embodiments, a Gaussian curve is selected as the dLUT. Such a dLUT is defined as:

δY(x) = exp(−kG(x−μG)^2)
Shifting the mean μG towards vHY while keeping kG constant shifts the peak of the dLUT towards the brighter end of the luma range, allocating more codewords there. If the kG parameter is lowered, the Gaussian curve becomes more “flat” around its peak, spreading the codeword allocation over a wider luma range.
As described above, a content-based non-linear luma reshaping function may be generated using Gaussian-dLUT curves. The backward reshaping metadata is approximated in the form of an 8-piece 1st order polynomial curve.
At step (704), the encoder (200) collects image statistics from the input HDR image. The image statistics may include the minimum and maximum pixel values from the luma and chroma channels of the input HDR image: (vLY, vHY), (vLCb, vHCb), and (vLCr, vHCr). In some implementations, the image statistics include letterbox detection, in which the letterbox or pillarbox in an image is identified. The start and end rows of “active” content (non-letterbox) are identified as (rminv, rmaxv), and the columns as (cminv, cmaxv), using a letterbox-detection algorithm. Additionally, an average block-based standard deviation (BLKSTD) and a block-based luma histogram (BLKHIST) are computed, as described below.
Processing blockwise pixels reduces the needed computations compared to a pixel-by-pixel approach. For blockwise processing, let the entire HDR luma codeword range be divided into NB non-overlapping codeword-intervals (bins), b=0, 1, . . . , NB−1. Each such interval is a luma bin containing Nv/NB codewords; Nv must be a multiple of NB. For example, NB=64 for an ηv=16 bit-depth signal means each luma bin contains 65536/64=1024 codewords. Let vb,cY be the HDR codeword at the center of bin b and {tilde over (v)}b,cY=vb,cY/Nv be its normalized value.
Let the non-letterbox (active content) part of the luma image be indexed by rows (rminv, rmaxv) and columns (cminv, cmaxv), and be divided into non-overlapping square pixel-blocks of (ωB×ωB) pixels. The k'th block is indicated by Bk, the set of pixel-indices in that block. Blockwise computations begin from (rminv, cminv) and ignore any partial-block pixels near the boundary towards the right columns and bottom rows to avoid letterbox content. For example, an HD (1920×1080) image has active content between (rminv, rmaxv)=(91, 990) and (cminv, cmaxv)=(0, 1919), as indicated by letterbox detection. Beginning from (rminv, cminv)=(91, 0) as the first pixel of the first block, the encoder (200) proceeds in raster-scan order to compute blockwise statistics. There are a total of

ΩB = └(rmaxv−rminv+1)/ωB┘·└(cmaxv−cminv+1)/ωB┘

such blocks, where └⋅┘ is the floor operation. Note that ωB is small enough (e.g., 16) as compared to the image dimensions that the number of boundary pixels not considered in any block is negligible as compared with either W or H.
For each block k (k=0, 1, . . . , ΩB−1), the mean (μkY) and standard deviation (σkY) of the luma pixel values in Bk are computed:

μkY = (1/(ωB·ωB))·Σi∈Bk viY,  σkY = √((1/(ωB·ωB))·Σi∈Bk (viY)^2 − (μkY)^2)

The corresponding luma bin index bk of the block-mean is:

bk = └μkY·NB/Nv┘

The BLKHIST is computed by counting the number of pixels that have block-mean bin index b:

hbY = (ωB·ωB)·|{k : bk=b}|

BLKSTD is computed in all bins b where hbY≠0 by averaging the standard deviation over all blocks that have block-mean bin index b:

{tilde over (σ)}bY = (Σk:bk=b σkY)/|{k : bk=b}|

For bins where hbY=0, the BLKSTD {tilde over (σ)}bY may be set to 0.
Using block-based image statistics over pixel image statistics saves on computations at a negligible loss of accuracy.
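The block-based statistics collection described above can be sketched as follows (Python for illustration; the disclosure's own examples use C-like pseudocode, and the function name and toy image are hypothetical):

```python
import math

def collect_block_stats(luma, block_size, n_bins, n_codewords):
    """Compute per-bin BLKHIST (pixel counts) and BLKSTD (mean of block
    standard deviations), binning each block by its mean luma value.
    `luma` is a 2-D list of codewords; partial boundary blocks are ignored."""
    rows, cols = len(luma), len(luma[0])
    hist = [0] * n_bins
    std_sum = [0.0] * n_bins
    blk_cnt = [0] * n_bins
    for r0 in range(0, rows - block_size + 1, block_size):
        for c0 in range(0, cols - block_size + 1, block_size):
            px = [luma[r][c] for r in range(r0, r0 + block_size)
                             for c in range(c0, c0 + block_size)]
            mean = sum(px) / len(px)
            var = sum(p * p for p in px) / len(px) - mean * mean
            b = min(int(mean * n_bins / n_codewords), n_bins - 1)  # bin of block mean
            hist[b] += len(px)
            std_sum[b] += math.sqrt(max(var, 0.0))
            blk_cnt[b] += 1
    blkstd = [std_sum[b] / blk_cnt[b] if blk_cnt[b] else 0.0 for b in range(n_bins)]
    return hist, blkstd

# Toy 4x4 image of 8-bit codewords, 2x2 blocks, 4 luma bins.
img = [[10, 10, 200, 200],
       [10, 10, 200, 200],
       [10, 10, 200, 200],
       [10, 10, 200, 200]]
hist, blkstd = collect_block_stats(img, 2, 4, 256)
assert hist == [8, 0, 0, 8]   # dark blocks fall in bin 0, bright blocks in bin 3
assert blkstd[0] == 0.0       # flat blocks have zero standard deviation
```

Each block contributes one mean and one standard deviation, so the inner sums run once per block rather than once per pixel pair, reflecting the computational saving noted above.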
As one particular example of performing statistics collections, the following pseudocode is presented using a C-like format:
Returning to
A banding indicator may be constructed using a multiplicative combination of BLKSTD and BLKHIST. For example, let ob be the predicted banding risk in bin b=0, 1, . . . , NB−1, formed by multiplying the bin's histogram count hbY with a decreasing function of its average block standard deviation {tilde over (σ)}bY. If hbY=0 for some bin, then there are no pixels in that bin, and the banding risk is 0. This indicator considers the effects of two features to indicate banding risk in larger areas of the image: a higher value of hbY indicates that a larger portion of the image falls in bin b, while a lower value of {tilde over (σ)}bY indicates smoother content in that bin; together, these indicate a higher risk of visible banding.
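The exact multiplicative combination is not reproduced here; the sketch below assumes one plausible form in which risk grows with the bin's share of pixels and shrinks with its average block standard deviation (large, smooth regions score highest). The function name and constants are hypothetical:

```python
def banding_risk(hist, blkstd, eps=1e-6):
    """Hypothetical per-bin banding-risk indicator: high histogram count
    combined with low block standard deviation yields high risk.
    Bins with no pixels carry zero risk."""
    total = sum(hist)
    risk = []
    for h, s in zip(hist, blkstd):
        if h == 0:
            risk.append(0.0)                      # empty bin -> no risk
        else:
            risk.append((h / total) * (1.0 / (s + eps)))
    return risk

hist = [8, 0, 0, 8]
blkstd = [0.5, 0.0, 0.0, 2.0]
risk = banding_risk(hist, blkstd)
assert risk[1] == 0.0 and risk[2] == 0.0   # empty bins carry zero risk
assert risk[0] > risk[3]                   # smoother populated bin ranks higher
```

The ordering of the resulting values, rather than their absolute scale, is what drives the later selection of the peak bin.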
As one particular example of determining banding risk, the following pseudocode is presented using a C-like format:
At step (708), the encoder (200) determines curve parameters for the earlier-defined Gaussian function. The Gaussian curve parameters include μG and kG. The Gaussian function has a maximum value (or peak) at its mean μG. The peak location of the banding risk across all luma bins is identified and set as the mean μG. Let bpeak denote the bin with the highest banding risk ob. The normalized bin-center codeword {tilde over (v)}bpeak,cY is set as the mean of the Gaussian function, as previously described with respect to Equation 11.
The parameter kG determines the shape of the Gaussian bell-curve and is computed as a number between (kG,min, kG,max), using the peak-bin index bpeak and a window of bins around it. The sum of risk in this window, taken relative to the total risk across all bins, measures how concentrated the banding risk is around the peak: a more concentrated risk maps to a higher kG (a steeper curve), whereas a more spread-out risk maps to a lower kG (a flatter curve).
As one particular example of determining Gaussian curve parameters, the following pseudocode is presented using a C-like format:
peak = b;
In this example, CLIP3{a, b, c} means the signal a is clipped to be in between [b,c].
At step (710), the encoder (200) constructs the dLUT. A pointwise dLUT, denoted by δY(⋅):[0,1)→[0,1], is constructed at NC equidistant “compute-points” xi, i=0, 1, . . . , NC−1, using a Gaussian curve with parameters μG, kG.
Setting NC<Nv significantly saves computations, e.g., in the case of 16-bit content, NC=4096 compute-points (i.e., 12-bit granularity) are used instead of the full Nv=65536 actual codewords. Here, 1/NC is the normalized stride for compute-points. NC needs to be a factor of Nv.
where δY,MAX is the minimum value imposed on the dLUT. Setting δY,MAX ensures that every luma range receives at least some minimum number of codewords, as a purely curve-based function may not otherwise guarantee that each luma range receives codewords.
Returning to
Equation 24 defines a curve at NC points that maps the entire HDR codeword range to a real-number range. To construct a normalized FLUT {tilde over (T)}YF(⋅), the dLUT values are cumulatively summed (per Equation 6) and normalized, such that no codewords are allocated outside the active range [vLY, vHY]. In the example of graph (1000), vLY=10000 and vHY=55536. Accordingly, the resulting FLUT does not allocate any codewords to v<vLY or v>vHY. As one particular example of obtaining the dLUT and FLUT at compute-points, the following pseudocode is presented using a C-like format:
Returning to
where s=round[Ns·{tilde over (s)}] is the BL codeword corresponding to a normalized BL value {tilde over (s)}.
To simplify computations, instead of using jointly-designed pivot points and a 2nd-order polynomial, fixed pivot points at equal intervals and a 1st-order polynomial are used. The 9 pivot points are computed as:

ρlY = min(l·(Ns/8), Ns−1), for l=0, 1, . . . , 8

For example, for 8-bit BL, the 9 pivot points are 0, 32, 64, 96, 128, 160, 192, 224, and 255. The normalized pivot points are:

{tilde over (ρ)}lY = ρlY/Ns

The corresponding mapping of the BL-domain pivot points to the HDR domain is computed using the normalized FLUT {tilde over (T)}YF(⋅).
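The fixed-pivot computation can be sketched as follows (Python for illustration; the helper name is hypothetical, and the clipping of the last pivot to Ns−1 is assumed from the 255 endpoint in the example above):

```python
def bl_pivots(n_s, n_pieces=8):
    """Fixed, equally spaced BL-domain pivot points for an n_pieces-piece
    1st-order polynomial BLUT; the last pivot is clipped to n_s - 1."""
    step = n_s // n_pieces
    return [min(l * step, n_s - 1) for l in range(n_pieces + 1)]

# For 8-bit BL (Ns = 256), the 9 pivot points from the text:
assert bl_pivots(256) == [0, 32, 64, 96, 128, 160, 192, 224, 255]
# Normalized pivot points lie in [0, 1).
norm = [p / 256 for p in bl_pivots(256)]
assert norm[0] == 0.0 and norm[-1] < 1.0
```

Because the pivots are fixed, no per-frame optimization of pivot locations is required, which is the computational saving noted above.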
To compute the start points of all the polynomial pieces: the start point of the first piece (l=0) corresponds to the lowest active HDR codeword, since {tilde over (T)}YF allocates its first BL codeword at vLY. Using the monotonically non-decreasing property of reshaping functions, for l=1, . . . , ΩY−1, the start point of piece l is the smallest HDR codeword v for which {tilde over (T)}YF(v) reaches the normalized pivot {tilde over (ρ)}lY. The end points for l=0, . . . , ΩY−2 are the codewords immediately preceding the start points of the respective next pieces, and the end point of the last piece (l=ΩY−1) corresponds to the highest active HDR codeword vHY.
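Once each piece's normalized (BL, HDR) endpoint pairs are known, the 1st-order coefficients follow from a two-point line fit; a minimal sketch (Python; the helper name and toy endpoints are hypothetical):

```python
def piece_coeffs(s0, v0, s1, v1):
    """1st-order polynomial coefficients (a0, a1) for one BLUT piece fit
    through its normalized endpoints, so that v = a0 + a1 * s."""
    a1 = (v1 - v0) / (s1 - s0)
    a0 = v0 - a1 * s0
    return a0, a1

# Toy piece mapping normalized BL [0.125, 0.25) back to HDR [0.2, 0.4).
a0, a1 = piece_coeffs(0.125, 0.2, 0.25, 0.4)
# The line reproduces both endpoints exactly.
assert abs((a0 + a1 * 0.125) - 0.2) < 1e-12
assert abs((a0 + a1 * 0.25) - 0.4) < 1e-12
```

With monotonically non-decreasing endpoints, every a1 is non-negative, so the assembled piecewise BLUT inherits the non-decreasing property required of reshaping functions.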
The BLUT is constructed using these 8 polynomial pieces. As one particular example for computing first-order BLUT coefficients, the following pseudocode is presented using a C-like format:
Returning to
As illustrated in
Here,
Graph (1500) includes a 45-degree line, indicating that the proposed reshaping operation is properly revertible. Visible steps in some parts of the graph (1500) (lower codewords in this example) indicate higher quantization due to fewer codewords being allocated in those parts. This shows the non-linearity of the proposed reshaping function, which allows for an unequal distribution of codewords.
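The revertibility and step behavior can be illustrated with a toy round trip. The sketch below uses a simple linear FLUT/BLUT pair purely for illustration (the actual reshaping functions are non-linear; the function names are hypothetical):

```python
def forward(v, n_v, n_s):
    """Toy linear FLUT: map an n_v-codeword HDR value to an n_s-codeword BL value."""
    return min(round(v * n_s / n_v), n_s - 1)

def backward(s, n_v, n_s):
    """Toy linear BLUT: map the BL value back to the HDR domain."""
    return min(round(s * n_v / n_s), n_v - 1)

n_v, n_s = 65536, 256
values = list(range(0, n_v, 97))
roundtrip = [backward(forward(v, n_v, n_s), n_v, n_s) for v in values]
# The round trip follows the 45-degree line to within one quantization step.
assert all(abs(r - v) <= n_v // n_s for v, r in zip(values, roundtrip))
# Reconstructed values land on multiples of the step size: the visible "steps".
assert all(r % (n_v // n_s) == 0 for r in roundtrip)
```

In the actual non-linear case, the step size varies across the codeword range, which is exactly what graph (1500) visualizes.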
Returning to
Accordingly, the use of an 8-piece polynomial function for luma reshaping avoids costly computations and reduces latency. Additionally, the use of Gaussian curves whose parameters are determined based on image features minimizes banding within the image frames.
In some implementations, the encoder (200) pre-builds a number of Gaussian dLUTs and, by cumulative addition, their corresponding intermediate FLUTs: {tilde over (T)}YF,C(⋅).
As previously described, there are two parameters, μG and kG, to construct the Gaussian curve. A 2D table is pre-built, which can be addressed using an index-pair (πμ, πk), where πμ=0, 1, . . . , Πμ−1 and πk=0, 1, . . . , Πk−1. A ΔμG stepsize is used for the μG parameter and ΔkG for the kG parameter to construct a (Πμ×Πk)-sized look-up table, in which each entry indexed by (πμ, πk) is an intermediate FLUT: {tilde over (T)}Y,{πμ,πk}F,C(⋅).
Here, Πμ = 1/ΔμG + 1 is the number of μG entries. For example, if ΔμG=0.1 is the step size, there are Πμ=11 entries covering [0,1]. Similarly, Πk = (kG,max−kG,min)/ΔkG + 1 is the number of kG entries. For example, if (kG,min, kG,max)=(2,10) and ΔkG=0.5, there will be Πk=17 entries. Thus, the 2D look-up table will have (Πμ×Πk)=11×17=187 pre-computed FLUTs that map the entire HDR codeword range from [0,1] to BL.
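The table sizing and the per-frame index quantization can be sketched as follows (Python for illustration; the rounding-based quantization is an assumption, and the helper names are hypothetical):

```python
def table_dims(d_mu, k_min, k_max, d_k):
    """Dimensions of the pre-built (Pi_mu x Pi_k) FLUT table."""
    pi_mu = round(1.0 / d_mu) + 1            # mu_G sampled over [0, 1]
    pi_k = round((k_max - k_min) / d_k) + 1  # k_G sampled over [k_min, k_max]
    return pi_mu, pi_k

def quantize_params(mu_g, k_g, d_mu, k_min, d_k):
    """Hypothetical quantization of per-frame (mu_G, k_G) to table indices."""
    return round(mu_g / d_mu), round((k_g - k_min) / d_k)

# The example from the text: step 0.1 over [0, 1] and step 0.5 over (2, 10).
pi_mu, pi_k = table_dims(0.1, 2.0, 10.0, 0.5)
assert (pi_mu, pi_k) == (11, 17)
assert pi_mu * pi_k == 187        # 187 pre-computed FLUTs
assert quantize_params(0.52, 3.6, 0.1, 2.0, 0.5) == (5, 3)
```

Replacing the per-frame curve evaluation with a table lookup trades a modest amount of memory for lower runtime cost, in line with the low-latency goal.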
As one particular example for pre-computing intermediate FLUTs, the following pseudocode is presented using a C-like format:
With the pre-built tables, for a given frame f, image features are computed and the parameters μG,f, kG,f are estimated by the encoder (200). Next, these are quantized to find the corresponding indices (πμ,f, πk,f) using:

πμ,f = round[μG,f/ΔμG],  πk,f = round[(kG,f−kG,min)/ΔkG]

The pre-computed intermediate FLUT {tilde over (T)}Y,{πμ,f,πk,f}F,C(⋅) is then retrieved from the 2D table and used for the frame.
For chroma channels, luma-weighted reshaping is used. This facilitates assigning more importance to reshaped luma content than to chroma, aiding typical video compression to spend more bits on the more visually significant luma part.
First, the range of BL codewords to be used for the p chroma channel, p=Cb or Cr, is determined based on the ratio of the HDR chroma range to the luma range. Specifically, the number of BL codewords used in channel p is denoted by srangep and computed as:

srangep = round[Ns·(vHp−vLp)/(vHY−vLY)]

The chroma-neutral point is shifted to the center of the BL axis such that the minimum and maximum reshaped BL codewords sminp, smaxp are:

sminp = Ns/2 − srangep/2,  smaxp = sminp + srangep − 1
Thus, the chroma forward reshaping for channel p is:

sip = round[sminp + (smaxp−sminp)·(vip−vLp)/(vHp−vLp)]

The corresponding backward reshaping function is:

vi(r)p = round[vLp + (vHp−vLp)·(sip−sminp)/(smaxp−sminp)]

The p-channel chroma backward reshaping parameters are expressed as first order polynomials using 1-piece 1st order polynomial (straight line) coefficients a0p,0 and a1p,0:

{tilde over (T)}pB({tilde over (s)}) = a0p,0 + a1p,0·{tilde over (s)}

where a1p,0 = ({tilde over (v)}Hp−{tilde over (v)}Lp)/({tilde over (s)}maxp−{tilde over (s)}minp) and a0p,0 = {tilde over (v)}Lp − a1p,0·{tilde over (s)}minp, with {tilde over (⋅)} denoting values normalized to [0,1).
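The chroma range computation and the 1-piece linear forward reshape can be sketched as follows (Python for illustration; the exact range formula and centering are assumptions consistent with the description above, and the helper names are hypothetical):

```python
def chroma_bl_range(n_s, v_l_p, v_h_p, v_l_y, v_h_y):
    """Hypothetical BL codeword range for a chroma channel, proportional to
    the ratio of the HDR chroma range to the HDR luma range, centered on
    the BL axis so the chroma-neutral point maps near Ns/2."""
    s_range = round(n_s * (v_h_p - v_l_p) / (v_h_y - v_l_y))
    s_min = n_s // 2 - s_range // 2
    s_max = s_min + s_range - 1
    return s_min, s_max

def chroma_forward(v, v_l_p, v_h_p, s_min, s_max):
    """1-piece linear forward reshape for a chroma value v."""
    return round(s_min + (s_max - s_min) * (v - v_l_p) / (v_h_p - v_l_p))

# Toy 16-bit chroma range reshaped into 8-bit BL.
s_min, s_max = chroma_bl_range(256, 20000, 45000, 10000, 60000)
assert s_min < 128 < s_max    # the allocated range straddles the BL center
assert chroma_forward(20000, 20000, 45000, s_min, s_max) == s_min
assert chroma_forward(45000, 20000, 45000, s_min, s_max) == s_max
```

Because chroma receives only a proportional share of the BL range, more of the BL codeword budget effectively favors luma, matching the luma-weighted design.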
As one particular example for computing revertible reshaping functions for p-chroma channel, the following pseudocode is presented using a C-like format:
As previously described, the encoder (200) includes a noise injection block (202) that injects film-grain noise to the HDR luma channel. After noise is injected, the HDR image is forward-reshaped to generate a BL signal using the determined reshaping functions. As one particular example for noise injection and forward reshaping, the following pseudocode is presented using a C-like format:
Here, ρ is the fixed maximum noise strength and Ξi is the ith pixel value of the normalized [−1,1] noise image. The noise image is generated using a pseudo-random number as an index to select an image from a noise-image bank, pre-computed and stored.
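The noise-injection step can be sketched as follows (Python for illustration; the bank contents, selection scheme, and function name are hypothetical, and the clipping mirrors the clip3 operation defined earlier):

```python
import random

def inject_noise(luma, noise_bank, rho, n_codewords, seed=0):
    """Add fixed-maximum-strength noise to HDR luma before reshaping.
    A pseudo-random index selects one pre-computed noise image whose
    samples are normalized to [-1, 1]; rho scales them in codeword units."""
    rng = random.Random(seed)
    noise = noise_bank[rng.randrange(len(noise_bank))]
    out = []
    for v, xi in zip(luma, noise):
        out.append(min(max(round(v + rho * xi), 0), n_codewords - 1))  # clip3
    return out

bank = [[-1.0, 0.0, 1.0]]
noisy = inject_noise([100, 200, 65535], bank, rho=16, n_codewords=65536)
# Output stays within the valid codeword range and within rho of the input.
assert noisy == [84, 200, 65535]
assert all(0 <= s <= 65535 for s in noisy)
```

Drawing from a pre-computed bank keeps the per-frame cost to an index lookup plus an add, consistent with the low-latency requirement.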
In some implementations, reshaping methods described herein are implemented using linear encoding architecture. One example of linear encoding architecture is provided in WIPO Publication No. WO2019/169174, “Linear Encoder for image/video processing,” by N. J. Gadgil and G-M Su, which is incorporated herein by reference in its entirety.
The above video delivery systems and methods may provide for encoding high dynamic range (HDR) video data using reshaping functions. Systems, methods, and devices in accordance with the present disclosure may take any one or more of the following configurations.
(1) A video delivery system for context-based encoding of video data, the delivery system comprising: a processor to perform encoding of video data, the processor configured to: receive the video data, the video data composed of a plurality of image frames, each image frame including a plurality of pixel blocks, determine, for each pixel block, a luma bin index, determine, for each luma bin, a banding risk value, determine Gaussian function parameters based on the banding risk value, generate a differential reshaping function using the Gaussian function parameters, compute a luma-based forward reshaping function based on the differential reshaping function, and generate an output image for each image frame by applying the luma-based forward reshaping function to the respective image frame.
(2) The video delivery system according to (1), wherein the Gaussian function includes a mean value and a width value, and wherein the mean value and the width value are each based on the banding risk value.
(3) The video delivery system according to any one of (1) to (2), wherein the processor is further configured to: determine a backwards reshaping function based on the luma-based forward reshaping function.
(4) The video delivery system according to (3), wherein the backwards reshaping function is approximated in the form of an 8-piece 1st order polynomial curve.
(5) The video delivery system according to any one of (1) to (4), wherein the processor is further configured to: determine maximum and minimum pixel values from luma and chroma channels of each of the plurality of image frames, and identify a letterbox within each of the plurality of image frames.
(6) The video delivery system according to any one of (1) to (5), wherein the processor is further configured to: compute, for each pixel block, a mean for luma pixel values included in the pixel block, and compute, for each pixel block, a standard deviation for luma pixel values included in the pixel block.
(7) The video delivery system according to (6), wherein the banding risk value determined for each luma bin is determined based on the mean and the standard deviation for luma pixel values included in each pixel block.
(8) The video delivery system according to any one of (1) to (7), wherein the processor is further configured to: compute a block histogram by counting a number of pixels that have a first block-mean bin index, and compute a block standard deviation by averaging standard deviation over all pixel blocks that have the first block-mean bin index.
(9) A method for context-based encoding of video data, the method comprising: receiving the video data, the video data composed of a plurality of image frames, each image frame including a plurality of pixel blocks, determining, for each pixel block, a luma bin index, determining, for each luma bin, a banding risk value, determining Gaussian function parameters based on the banding risk value, generating a differential reshaping function using the Gaussian function parameters, computing a luma-based forward reshaping function based on the differential reshaping function, and generating an output image for each image frame by applying the luma-based forward reshaping function to the respective image frame.
(10) The method according to (9), wherein the Gaussian function parameters include a mean value and a width value, and wherein the mean value and the width value are each based on the banding risk value.
(11) The method according to any one of (9) to (10), further comprising: determining a backwards reshaping function based on the luma-based forward reshaping function.
(12) The method according to (11), wherein the backwards reshaping function is approximated in the form of an 8-piece first-order polynomial curve.
(13) The method according to any one of (9) to (12), further comprising: determining maximum and minimum pixel values from luma and chroma channels of each of the plurality of image frames, and identifying a letterbox within each of the plurality of image frames.
(14) The method according to any one of (9) to (13), wherein determining, for each pixel block, the luma bin index includes: computing, for each pixel block, a mean for luma pixel values included in the pixel block, and computing, for each pixel block, a standard deviation for luma pixel values included in the pixel block.
(15) The method according to (14), wherein the banding risk value determined for each luma bin is determined based on the mean and the standard deviation for luma pixel values included in each pixel block.
(16) The method according to any one of (9) to (15), wherein determining, for each luma bin, the banding risk value includes: computing a block histogram by counting a number of pixels that have a first block-mean bin index, and computing a block standard deviation by averaging standard deviation over all pixel blocks that have the first block-mean bin index.
(17) The method according to any one of (9) to (16), further comprising: adding noise to each image frame.
(18) The method according to any one of (9) to (17), wherein the differential reshaping function defines a number of codewords allocated to a given luminance range.
(19) The method according to any one of (9) to (18), further comprising: setting a floor value of the differential reshaping function.
(20) A non-transitory computer-readable medium storing instructions that, when executed by an electronic processor, cause the electronic processor to perform operations comprising the method according to any one of (9) to (19).
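The encoding pipeline recited in configurations (1) and (9) — per-block luma statistics, a luma bin index per block, a per-bin banding risk, a Gaussian-shaped differential reshaping function, and a cumulative (forward) reshaping function — can be illustrated with a minimal sketch. The bin count, block size, banding-risk heuristic, and the mappings from risk to the Gaussian mean and width below are illustrative assumptions, not values taken from the disclosure; luma is assumed normalized to [0, 1].

```python
import math

NUM_BINS = 16   # assumed number of luma bins
BLOCK = 4       # assumed pixel-block size (4x4)
EPS = 1e-6

def block_stats(frame, block=BLOCK):
    """Yield (mean, standard deviation) of luma values for each pixel block."""
    h, w = len(frame), len(frame[0])
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [frame[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            yield mean, math.sqrt(var)

def banding_risk(frame):
    """Per-luma-bin risk: bins populated by many smooth (low-std) blocks
    are treated as banding-prone (an assumed heuristic)."""
    hist = [0] * NUM_BINS
    std_sum = [0.0] * NUM_BINS
    for mean, std in block_stats(frame):
        b = min(int(mean * NUM_BINS), NUM_BINS - 1)  # luma bin index
        hist[b] += 1
        std_sum[b] += std
    risk = []
    for b in range(NUM_BINS):
        avg_std = std_sum[b] / hist[b] if hist[b] else 0.0
        risk.append(hist[b] / (avg_std + EPS) if hist[b] else 0.0)
    return risk

def forward_reshaping(risk):
    """Differential function: a baseline plus one Gaussian bump per risky
    bin (mean = bin centre, width narrowing with risk); its cumulative
    sum, normalized to [0, 1], is the forward reshaping LUT."""
    peak = max(risk) or 1.0
    diff = []
    for b in range(NUM_BINS):
        x = (b + 0.5) / NUM_BINS
        d = 1.0
        for j, r in enumerate(risk):
            if r > 0:
                mu = (j + 0.5) / NUM_BINS             # Gaussian mean
                sigma = 0.05 + 0.1 * (1 - r / peak)   # Gaussian width
                d += (r / peak) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
        diff.append(d)
    cum, total = [], 0.0
    for d in diff:
        total += d
        cum.append(total)
    return [c / cum[-1] for c in cum]  # monotone LUT on [0, 1]
```

Because every differential value is at least the baseline of 1.0, the cumulative LUT is strictly increasing, which keeps the forward reshaping invertible for the backward function of configurations (3) and (11).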
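Configurations (4) and (12) approximate the backward reshaping function as an 8-piece first-order polynomial curve. One way such an approximation can be built — a sketch, not the method of the disclosure — is to split the reshaped codeword axis into equal intervals and fit each interval with a linear chord of the inverse mapping:

```python
def fit_backward_pwl(forward_lut, pieces=8):
    """Approximate the inverse of a monotone forward LUT with `pieces`
    first-order (linear) segments over equal reshaped-codeword intervals."""
    n = len(forward_lut)

    def inverse(y):
        # smallest normalized input whose forward value reaches y
        for i, v in enumerate(forward_lut):
            if v >= y:
                return i / (n - 1)
        return 1.0

    segs = []
    for p in range(pieces):
        y0, y1 = p / pieces, (p + 1) / pieces
        x0, x1 = inverse(y0), inverse(y1)
        slope = (x1 - x0) / (y1 - y0)
        segs.append((y0, x0, slope))  # x ~= x0 + slope * (y - y0)
    return segs

def apply_backward(segs, y):
    """Evaluate the piecewise-linear backward function at codeword y."""
    for y0, x0, slope in reversed(segs):
        if y >= y0:
            return x0 + slope * (y - y0)
    return segs[0][1]
```

The equal-interval split and chord fit are assumptions; a least-squares fit per segment would be an equally valid realization of an 8-piece first-order curve.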
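The letterbox identification of configurations (5) and (13) can be sketched as scanning for runs of top and bottom rows whose samples all sit at the frame minimum; the tolerance value and the min-based test are illustrative assumptions:

```python
def find_letterbox(frame, tol=1e-3):
    """Detect top/bottom letterbox bars as runs of rows whose values are
    all (near-)equal to the frame minimum; returns (top, bottom) counts."""
    lo = min(min(row) for row in frame)

    def is_bar(row):
        return max(row) - lo <= tol

    top = 0
    while top < len(frame) and is_bar(frame[top]):
        top += 1
    bottom = 0
    while bottom < len(frame) - top and is_bar(frame[-1 - bottom]):
        bottom += 1
    return top, bottom
```

Excluding the detected bars keeps the flat black regions from dominating the per-bin block statistics used for banding-risk estimation.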
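Configuration (17) adds noise to each image frame; low-amplitude dithering before reshaping is a common way to break up banding contours. A minimal sketch, in which the half-codeword amplitude is an assumed choice rather than a value from the disclosure:

```python
import random

def add_dither(frame, amplitude=0.5 / 255, seed=0):
    """Add small uniform noise to each normalized luma sample, clamped
    back to [0, 1]; deterministic via the seeded generator."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.uniform(-amplitude, amplitude)))
             for v in row] for row in frame]
```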
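Configurations (18) and (19) note that the differential reshaping function defines the codeword allocation per luminance range and that a floor value may be set on it. The sketch below shows why the floor matters: clamping the differential function guarantees a minimum codeword allocation everywhere, keeping the cumulative forward function strictly increasing. The floor value of 0.2 is an illustrative assumption.

```python
def apply_floor(diff, floor=0.2):
    """Clamp the differential reshaping function to a minimum value so
    every luminance range keeps a nonzero codeword allocation."""
    return [max(d, floor) for d in diff]

def to_forward_lut(diff):
    """Cumulate the differential function and normalize to [0, 1]."""
    cum, total = [], 0.0
    for d in diff:
        total += d
        cum.append(total)
    return [c / cum[-1] for c in cum]
```

Without the floor, any zero-valued bin of the differential function produces a flat segment in the forward LUT, i.e. a luminance range that receives no codewords and cannot be inverted by the backward reshaping function.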
With regard to the processes, systems, methods, heuristics, etc. described herein, it should be understood that, although the steps of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes could be practiced with the described steps performed in an order other than the order described herein. It further should be understood that certain steps could be performed simultaneously, that other steps could be added, or that certain steps described herein could be omitted. In other words, the descriptions of processes herein are provided for the purpose of illustrating certain embodiments, and should in no way be construed so as to limit the claims.
Accordingly, it is to be understood that the above description is intended to be illustrative and not restrictive. Many embodiments and applications other than the examples provided would be apparent upon reading the above description. The scope should be determined, not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. It is anticipated and intended that future developments will occur in the technologies discussed herein, and that the disclosed systems and methods will be incorporated into such future embodiments. In sum, it should be understood that the application is capable of modification and variation.
All terms used in the claims are intended to be given their broadest reasonable constructions and their ordinary meanings as understood by those knowledgeable in the technologies described herein unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments incorporate more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in fewer than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
| Number | Date | Country | Kind |
|---|---|---|---|
| 21203845.9 | Oct 2021 | EP | regional |
This application claims the benefit of priority from U.S. Provisional Patent Application No. 63/270,097, filed on Oct. 21, 2021, and European Patent Application No. 21203845.9, filed Oct. 21, 2021, both of which are incorporated herein by reference in their entirety.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2022/047226 | 10/20/2022 | WO | |
| Number | Date | Country | |
|---|---|---|---|
| 63270097 | Oct 2021 | US | |