The present disclosure relates generally to images. More particularly, an embodiment of the present invention relates to trim-pass correction for processing HDR video in cloud-based coding architectures.
As used herein, the term ‘dynamic range’ (DR) may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest grays (blacks) to brightest whites (highlights). In this sense, DR relates to a ‘scene-referred’ intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a ‘display-referred’ intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g. interchangeably.
As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans the 14-15 orders of magnitude of the human visual system (HVS). In practice, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated, in relation to HDR. As used herein, the terms visual dynamic range (VDR) or enhanced dynamic range (EDR) may individually or interchangeably relate to the DR that is perceivable within a scene or image by a human visual system (HVS) that includes eye movements, allowing for some light adaptation changes across the scene or image. As used herein, VDR may relate to a DR that spans 5 to 6 orders of magnitude. Thus, while perhaps somewhat narrower in relation to true scene referred HDR, VDR or EDR nonetheless represents a wide DR breadth and may also be referred to as HDR.
In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr) wherein each color component is represented by a precision of n-bits per pixel (e.g., n=8). For example, using gamma luminance coding, images where n≤8 (e.g., color 24-bit JPEG images) are considered images of standard dynamic range, while images where n≥10 may be considered images of enhanced dynamic range. HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
Most consumer desktop displays currently support luminance of 200 to 300 cd/m² (nits). Most consumer HDTVs range from 300 to 500 nits, with new models reaching 1,000 nits. Such conventional displays thus typify a lower dynamic range (LDR), also referred to as a standard dynamic range (SDR), in relation to HDR. As the availability of HDR content grows due to advances in both capture equipment (e.g., cameras) and HDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).
As used herein, the term “forward reshaping” denotes a process of sample-to-sample or codeword-to-codeword mapping of a digital image from its original bit depth and original codewords distribution or representation (e.g., gamma, PQ, HLG, and the like) to an image of the same or different bit depth and a different codewords distribution or representation. Reshaping allows for improved compressibility or improved image quality at a fixed bit rate. For example, without limitation, reshaping may be applied to 10-bit or 12-bit PQ-coded HDR video to improve coding efficiency in a 10-bit video coding architecture. In a receiver, after decompressing the received signal (which may or may not be reshaped), the receiver may apply an “inverse (or backward) reshaping function” to restore the signal to its original codeword distribution and/or to achieve a higher dynamic range.
In many video-distribution scenarios, HDR video may be coded in a multi-processor environment, typically referred to as a “cloud computing server.” In such an environment, trade-offs among ease of computing, workload balance among the computing nodes, and video quality, may force reshaping-related metadata to be updated on a frame-by-frame basis, which may result in unacceptable overhead, especially when transmitting video at low bit rates. Splitting of a scene into multiple computing nodes may create temporal inconsistencies in sub-scenes, which may affect how trim-pass data affect the output video. As appreciated by the inventors here, improved techniques for trim-pass correction in a cloud-based environment are desired.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Methods for trim-pass correction in cloud-based video coding of HDR video are described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
Example embodiments described herein relate to cloud-based reshaping and coding for HDR images. In an embodiment, in a cloud-based system for encoding HDR video, a current node receives a first video sequence comprising video frames in a high dynamic range. Then, one or more processors in the node:
Under this framework, given reference HDR content (120) and corresponding reference SDR content (125) (that is, content that represents the same images as the HDR content, but color-graded and represented in standard dynamic range), reshaped SDR content (134) is encoded and transmitted as SDR content in a single layer of a coded video signal (144) by an upstream encoding device that implements the encoder architecture. The reshaped and subsequently compressed SDR content is received and decoded, in the single layer of the video signal, by a downstream decoding device that implements the decoder architecture. Backward-reshaping metadata (152) is also encoded and transmitted in the video signal with the reshaped content so that HDR display devices can reconstruct HDR content based on the (reshaped) SDR content and the backward reshaping metadata. Without loss of generality, in some embodiments, as in non-backward-compatible systems, reshaped SDR content may not be watchable on its own but must be watched in combination with the backward reshaping function, which will generate watchable SDR or HDR content. In other embodiments which support backward compatibility, legacy SDR decoders can still play back the received SDR content without employing the backward reshaping function.
As illustrated in
Examples of backward reshaping metadata representing/specifying the optimal backward reshaping functions may include, but are not necessarily limited to only, any of: an inverse tone mapping function, inverse luma mapping functions, inverse chroma mapping functions, lookup tables (LUTs), polynomials, inverse display management coefficients/parameters, etc. In various embodiments, luma backward reshaping functions and chroma backward reshaping functions may be derived/optimized jointly or separately, and may be derived using a variety of techniques, for example, and without limitation, as described later in this disclosure.
The backward reshaping metadata (152), as generated by the backward reshaping function generator (150) based on the reshaped SDR images (134) and the target HDR images (120), may be multiplexed as part of the video signal 144, for example, as supplemental enhancement information (SEI) messaging.
In some embodiments, backward reshaping metadata (152) is carried in the video signal as a part of overall image metadata, which is separately carried in the video signal from the single layer in which the SDR images are encoded in the video signal. For example, the backward reshaping metadata (152) may be encoded in a component stream in the coded bitstream, which component stream may or may not be separate from the single layer (of the coded bitstream) in which the SDR images (134) are encoded.
Thus, the backward reshaping metadata (152) can be generated or pre-generated on the encoder side to take advantage of powerful computing resources and offline encoding flows (including but not limited to content adaptive multiple passes, look ahead operations, inverse luma mapping, inverse chroma mapping, CDF-based histogram approximation and/or transfer, etc.) available on the encoder side.
The encoder architecture of
In some embodiments, as illustrated in
Optionally, alternatively, or in addition, in the same or another embodiment, a backward reshaping block 158 extracts the backward (or forward) reshaping metadata (152) from the input video signal, constructs the backward reshaping functions based on the reshaping metadata (152), and performs backward reshaping operations on the decoded SDR images (156) based on the optimal backward reshaping functions to generate the backward reshaped images (160) (or reconstructed HDR images). In some embodiments, the backward reshaped images represent production-quality or near-production-quality HDR images that are identical to, or closely/optimally approximate, the reference HDR images (120). The backward reshaped images (160) may be outputted in an output HDR video signal (e.g., over an HDMI interface, over a video link, etc.) to be rendered on an HDR display device.
In some embodiments, display management operations specific to the HDR display device may be performed on the backward reshaped images (160) as a part of HDR image rendering operations that render the backward reshaped images (160) on the HDR display device.
Existing reshaping techniques may be frame-based, that is, new reshaping metadata is transmitted with each new frame, or scene-based, that is, new reshaping metadata is transmitted with each new scene. As used herein, the term “scene” for a video sequence (a sequence of frames/images) may relate to a series of consecutive frames in the video sequence sharing similar luminance, color and dynamic range characteristics. Scene-based methods work well in video-workflow pipelines which have access to the full scene; however, it is not unusual for content providers to use cloud-based multiprocessing, where, after dividing a video stream into segments, each segment is processed independently by a single computing node in the cloud. As used herein, the term “segment” denotes a series of consecutive frames in a video sequence. A segment may be part of a scene or it may include one or more scenes. Thus, processing of a scene may be split across multiple processors.
As discussed in Ref. [1], in certain cloud-based applications, under certain quality constraints, segment-based processing may necessitate generating reshaping metadata on a frame-by-frame basis, resulting in undesirable overhead. This may be an issue in very low bit-rate applications (e.g., lower than 1 Mbit/s). Ref. [6] proposed a solution to this problem using a two-stage architecture which includes: a) a dispatcher stage implemented on a single computing node, which allocates scenes into segments, and b) an encoding stage, where each node in the cloud encodes a sequence of segments. After a scene is segmented, the proposed scene-to-segment allocation process includes one or more iterations with an initial random assignment of scenes to nodes, followed by a refined assignment based on optimizing the allocation cost across all the nodes. In such an implementation, the total length of video to be processed in each node may vary across all the nodes.
Embodiments discussed in Ref. [7] provided an alternative solution. After a sequence is divided into segments, each segment to be processed by a separate node, in each node, each segment is sub-divided into sub-segments (or scenes) in such a way to minimize the need to update the corresponding reshaping function of each sub-segment, thus minimizing the overhead required to transmit reshaping-related metadata.
In some embodiments, a reference SDR signal (e.g. 125) based on HDR signal (120) may not have the desired “look.” Then, a colorist may adjust “trim parameters” (commonly referred to as lift, gain, and gamma (LGG)) to achieve the desired effect. This process may be referred to as a “trim pass,” and its main goal is to maintain the director's intent or look. As used herein, the term “trim pass” denotes a post-production process which may include trimming, rotating, cropping, flipping and adjusting the brightness, color and saturation of a video sequence to generate a sequence to be displayed on a display with a target dynamic range, typically lower than the dynamic range of the master. For example, given a movie mastered at 4,000 nits, a colorist may generate “trims” at 400, 500, and 1,000 nits. Such a trim pass may preserve artistic intent in the SDR signal, but after reshaping, it may also introduce unpleasant clipping artifacts in the darks (the low intensity regions) or in the highlights (the high intensity regions). Such artifacts may be further exaggerated by the video-to-nodes segmentation process and/or the video coding process (142). To reduce such artifacts, one could try to change the LGG “trim” data itself; however, studios do not allow any changes of their data after a director's approval, and any such change would require an additional review process. Hence, as appreciated by the inventors, it would be beneficial to be able to reduce artifacts introduced by the HDR to SDR mapping process. Such a methodology is described next.
As it was disclosed first in Ref. [7],
In preprocessing step 210, the mezzanine input is split into segments and each segment is assigned to a different computing node (e.g., node 205-N). These segments are mutually exclusive, i.e., they have no frames in common. Each node will also get a certain number of frames that are before the first frame in the segment and some frames after the last frame in the segment. These prior and post overlapped frames, called bumper frames, are only used for maintaining temporal consistency with the previous and the next node, respectively. Bumper frames are not encoded by the node. Without loss of generality, in an embodiment, these video segments may all be of equal, fixed length, except perhaps for the segment assigned to the last node. As an example, a sample distribution of a mezzanine (305) into three segments (307-1, 307-2, 307-3) along with their bumper frames (e.g., 309), and the assignment of these frames to different nodes, is illustrated in
After the preprocessing step 210 is over, each node gets access to its frames and a two-pass approach follows.
For ease of discussion, let L denote the number of frames in a segment, and let B denote the number of frames in each bumper section. Let the i-th frame in the mezzanine be denoted as $f_i$. In an embodiment, the first node encodes the frames $f_0$ to $f_{L-1}$ that are in the segment portion. This node has no left bumper, and its right bumper spans the frame range $f_L$ to $f_{L+B-1}$. The segment portion of node-N will process frames $f_{(N-1)L}$ to $f_{NL-1}$, with $f_{(N-1)L-B}$ to $f_{(N-1)L-1}$ being the left bumper and $f_{NL}$ to $f_{NL+B-1}$ being the right bumper. The last node will have no right bumper section and it may have fewer than L frames in the segment portion.
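As an example, and without limitation, the frame ranges described above may be sketched as follows. The function name `node_frame_ranges`, the `NodeFrames` container, and the clamping of bumpers at the ends of the mezzanine are illustrative assumptions and are not part of the disclosure; nodes are numbered from 1, following the indexing used in this paragraph.

```python
from typing import NamedTuple, Optional, Tuple

Range = Tuple[int, int]  # inclusive (first, last) frame indices


class NodeFrames(NamedTuple):
    left_bumper: Optional[Range]
    segment: Range
    right_bumper: Optional[Range]


def node_frame_ranges(n: int, total_frames: int, L: int, B: int) -> NodeFrames:
    """Frame ranges for node n (1-based), following the indexing in the text."""
    seg_first = (n - 1) * L
    if not 0 <= seg_first < total_frames:
        raise ValueError("node index outside the mezzanine")
    # The last node may hold fewer than L frames.
    seg_last = min(n * L, total_frames) - 1
    # The first node has no left bumper.
    left = (max(0, seg_first - B), seg_first - 1) if seg_first > 0 else None
    # The last node has no right bumper; bumpers are clamped to the mezzanine.
    right = None
    if seg_last + 1 < total_frames:
        right = (seg_last + 1, min(seg_last + B, total_frames - 1))
    return NodeFrames(left, (seg_first, seg_last), right)


if __name__ == "__main__":
    for node in (1, 2, 3):
        print(node, node_frame_ranges(node, total_frames=250, L=100, B=10))
```

In this sketch, bumper frames are only reported as ranges; consistent with the text, they would be read by a node for temporal consistency but not encoded by it.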
Given a node N, node N−1 is the left/previous neighbor-node and node N+1 is the right/next, or subsequent, neighbor. Referring to nodes that are left/previous nodes to N includes all the nodes from 0 to N−1. Similarly, referring to nodes that are right/next to N denotes all the nodes from N+1 to the last node. The two passes described earlier will now be discussed in further detail.
The key objective of this pass is to generate a list of scenes in the segment allocated to a node. The process starts by detecting scene cuts in all the frames allocated to the node, which include the node segment and both bumper sections. Only those scene cuts inside the segment will eventually be used by Pass-2 for scene-based encoding, but the scenes in the bumper sections are still useful for maintaining temporal consistency with the neighboring nodes.
Colorist-specified scene cuts (209) are read in from the XML file (207). An automatic scene cut detector (215) may also identify possible scene cut locations. These scene cuts from the colorists and the automatic detector are merged to get a first list of scenes, known as primary scenes. Primary scenes on the segment boundaries are split using bumper frames and a novel scene splitting technique. Splitting of a primary scene on a segment boundary creates additional scenes, known as secondary scenes or subscenes. Secondary scenes are added to the first list of scenes to get a second list. This list is then used by Pass-2 for scene-based encoding. Apart from the list of scenes, Pass-2 may also need auxiliary data (212) for forward reshaping of the secondary scenes. Details for each step are provided next.
Colorists and professional color-graders usually process each scene as a single unit. To meet their goals (e.g., proper color grading, inserting fade-ins and fade-outs, etc.), they need to manually detect scene cuts in the sequence. This information is stored in the XML file and can be used for other purposes as well. Every node will read only the relevant scene cuts for its segment from the XML file. These scene cuts may be in the segment section or in the bumper sections.
Even though XML scene cuts are defined by the colorists, they are not completely accurate. For grading purposes, sometimes colorists introduce scene cuts in the middle of a dissolving scene or at the start of fade in or fade out portion of a scene. These scene cuts, if taken into consideration during the reshaping phase, may cause flashing in the reconstructed HDR video and normally should be avoided. For this reason, in an embodiment, an automatic scene-cut detector (Auto-SCD) 215 is also employed.
An automatic scene-cut detector or Auto-SCD uses the change in luminance levels in different sections of consecutive video pictures to detect a scene change. Any scene cut detector known in the art can be used as the automatic detector. In an embodiment, such an automatic detector is oblivious to dissolving, fade in or fade out parts of the video and it can still detect all the true scene cuts correctly.
A potential problem with an automatic detector is false positives. Sometimes there are brightness changes within a scene due to camera panning, movements, occlusions, etc. These brightness changes may also be detected as scene cuts by the Auto-SCD. To discard these false positives, in an embodiment, the scene cuts from the XML file and those from the Auto-SCD are merged together in step 220. A person skilled in the art will appreciate that if there are no scene cuts defined in the XML file, one may simply use the output of the automatic scene detector. Similarly, in other embodiments, one may rely strictly on scene cuts defined in the XML file. Alternatively, one may also use more than two scene-cut detectors, where each one detects different attributes of interest, and then define the primary scenes based on a combination of all of their results (e.g., their intersection, their union, or a combination of other set operations).
Let $\Psi_{XML}^N$ be the set of frame indices representing scene start frames in node N as reported in the XML file. Similarly, let $\Psi_{Auto\text{-}SCD}^N$ denote the set of frame indices representing scene start frames in node N as reported by Auto-SCD. In an embodiment, merging the scene cuts from these two sets is equivalent to taking the intersection of these two sets:
$$\Psi_1^N = \Psi_{XML}^N \cap \Psi_{Auto\text{-}SCD}^N, \tag{1}$$
where $\Psi_1^N$ indicates the first list of scene cuts (or scenes) in the node N. These scenes are also known as primary scenes.
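As an example, and without limitation, the merge of equation (1) may be sketched with ordinary set operations. The function name `merge_scene_cuts` is an illustrative assumption; the fallback to the automatic detector when the XML file reports no cuts follows the alternative noted earlier.

```python
def merge_scene_cuts(xml_cuts, auto_cuts):
    """Primary scene cuts: frames flagged by both sources, per equation (1)."""
    xml_cuts, auto_cuts = set(xml_cuts), set(auto_cuts)
    if not xml_cuts:
        # No colorist cuts available: use the automatic detector alone.
        return sorted(auto_cuts)
    # A cut survives only if the XML file and the Auto-SCD agree on it.
    return sorted(xml_cuts & auto_cuts)


if __name__ == "__main__":
    print(merge_scene_cuts([0, 48, 120, 300], [0, 120, 215, 300]))
```

Here, frame 48 (reported only in the XML file, e.g., a cut inside a dissolve) and frame 215 (reported only by the detector, e.g., a false positive) are both discarded.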
As depicted in
As depicted in
It should be noted that splitting creates additional scenes and thus increases the metadata bitrate. The challenge is to achieve temporal consistency using a minimum number of splits to keep the metadata bitrate low. Bumper frames play a significant role in achieving a good visual quality while reducing the number of splits.
Consider a case with a parent scene P with M frames (M>1), ranging from the Q-th frame to the (Q+M−1)-th frame in the mezzanine.
Process 400 starts with an initialization step 410, where, given input HDR and SDR frames (405) for primary scene P, HDR and SDR histograms $h^v$ and $h^s$, and individual forward reshaping functions $\tilde{T}_j^F$ (FLUTs) are computed for each frame in scene P. As an example, and without limitation, given frame histograms, one can apply cumulative density function (CDF) matching (Ref. [4-5]) to generate the forward mapping function (FLUT) from HDR to SDR, e.g.,
$$\tilde{T}^F = \mathrm{CDF\_MATCHING}(h^v(b), h^s(b)). \tag{2}$$
Thus, for the j-th frame, this step generates:
$$\tilde{T}_j^F,\; h_j^v \quad \forall j \in [Q, Q+M-1], \tag{3}$$
where $h_j^v$ denotes a histogram of HDR frame j.
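As an example, and without limitation, a simplified form of CDF matching may be sketched as follows. This is not the method of Ref. [4-5], which is not reproduced here; it is a basic nearest-bin variant without interpolation or smoothing, and the function name `cdf_matching` and the normalization of the output to [0, 1] are illustrative assumptions.

```python
from bisect import bisect_left
from itertools import accumulate


def cdf_matching(hdr_hist, sdr_hist):
    """Build a normalized forward LUT (one entry per HDR bin) by CDF matching."""
    hdr_total, sdr_total = sum(hdr_hist), sum(sdr_hist)
    if hdr_total == 0 or sdr_total == 0:
        raise ValueError("histograms must not be empty")
    # Cumulative distributions of the HDR and SDR luma histograms.
    hdr_cdf = [c / hdr_total for c in accumulate(hdr_hist)]
    sdr_cdf = [c / sdr_total for c in accumulate(sdr_hist)]
    last = len(sdr_cdf) - 1
    flut = []
    for p in hdr_cdf:
        # Smallest SDR bin whose cumulative probability reaches p.
        s = min(bisect_left(sdr_cdf, p), last)
        flut.append(s / last if last else 0.0)
    return flut


if __name__ == "__main__":
    print(cdf_matching([1, 1, 1, 1], [2, 2]))
```

Because both cumulative distributions are non-decreasing, the resulting FLUT is non-decreasing as well, which is the property a forward reshaping function is expected to have.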
The segmentation methods described herein are agnostic on how frame-based reshaping functions are generated. Thus, in an embodiment, such reshaping functions may be generated directly from the available HDR video using any of the known reshaping techniques and without any dependency on the availability of a corresponding SDR video.
A scene FLUT $\tilde{T}_P^F$ is constructed for P by averaging all the frame FLUTs in the scene. In the following equation, b indicates the index in the FLUT. In an embodiment, FLUT values may be normalized, i.e., $\tilde{T}_j^F(b) \in [0.0, 1.0]$.
$$\tilde{T}_P^F(b) = \frac{1}{M} \sum_{j=Q}^{Q+M-1} \tilde{T}_j^F(b). \tag{4}$$
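As an example, and without limitation, the averaging of frame FLUTs into a scene FLUT may be sketched as follows. The function name `average_flut` and the representation of a FLUT as a Python list of normalized values are illustrative assumptions.

```python
def average_flut(frame_fluts):
    """Element-wise mean of equally sized per-frame FLUTs."""
    if not frame_fluts:
        raise ValueError("need at least one frame FLUT")
    count = len(frame_fluts)
    # zip(*...) walks the FLUTs index by index (the index b in the text).
    return [sum(column) / count for column in zip(*frame_fluts)]


if __name__ == "__main__":
    print(average_flut([[0.0, 0.5, 1.0], [0.0, 0.25, 1.0]]))
```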
The scene FLUT and the generated histograms are used to predict a “DC” value χj for every frame in scene P. If the height and width of a frame are H and W respectively, then its DC value is computed as
In an embodiment, the DC difference of every frame with its previous frame, denoted as $\Im_j^{DC}$, is used as one set of thresholds to make the splitting decisions. These $\Im_j^{DC}$ values are calculated once during the initialization phase and are used several times during the splitting process:
$$\Im_j^{DC} = \chi_j - \chi_{j-1} \quad \forall j \in [Q+1, Q+M-1]. \tag{6}$$
The maximum absolute element-wise difference between the FLUT of every frame and its previous frame's FLUT is also stored at the initialization stage, to be used as an additional set of thresholds for detecting smoothness violations:
$$\Im_j^{FLUT} = \max\left(\alpha \times \max\left(\{\, |\tilde{T}_j^F(b) - \tilde{T}_{j-1}^F(b)| \mid \forall b \,\}\right),\; \beta\right) \quad \forall j \in [Q+1, Q+M-1], \tag{7}$$
where α and β are configurable parameters, with typical values 2.0 and 0.60, respectively.
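As an example, and without limitation, the two threshold sets may be sketched as follows. The helper `dc_value` uses an assumed form of the DC value (the histogram-weighted sum of the scene FLUT divided by the frame area H×W), since the exact expression is not reproduced in this passage; `dc_thresholds` and `flut_thresholds` follow equations (6) and (7). All function names are illustrative.

```python
def dc_value(hdr_hist, scene_flut, height, width):
    """ASSUMED form of the DC value: histogram-weighted mean of the scene FLUT."""
    return sum(h * t for h, t in zip(hdr_hist, scene_flut)) / (height * width)


def dc_thresholds(dc_values):
    """Equation (6): DC difference of each frame with its previous frame."""
    return [cur - prev for prev, cur in zip(dc_values, dc_values[1:])]


def flut_thresholds(frame_fluts, alpha=2.0, beta=0.60):
    """Equation (7): max(alpha * max_b |T_j(b) - T_{j-1}(b)|, beta)."""
    out = []
    for prev, cur in zip(frame_fluts, frame_fluts[1:]):
        max_diff = max(abs(c - p) for c, p in zip(cur, prev))
        out.append(max(alpha * max_diff, beta))
    return out


if __name__ == "__main__":
    print(dc_thresholds([0.5, 0.75, 0.25]))
    print(flut_thresholds([[0.0, 1.0], [0.5, 1.0], [0.5, 1.0]], beta=0.5))
```

In both returned lists, entry 0 corresponds to frame Q+1, since the first frame of the scene has no previous frame within the scene. The scale of β relative to the FLUT values (normalized or in codewords) is left to the caller.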
Secondary scene cuts $C_g$ are collected in a sorted subscene set $\Omega_p$, where g is an index in the set. The frame index Q+M acts as the end-of-list marker and is not used as a scene cut. In an embodiment, secondary scene cuts at initialization are as follows:
$$\Omega_p = \{Q, Q+M\} = \{C_0, C_1\}. \tag{8}$$
In an embodiment, a violation subscene set Y is used to store the subscenes that violate the smoothness criteria. To start splitting parent scene P, at initialization, Y={P}. Only the scenes or subscenes in the violation set will be split later on. In summary, in step 410, the initialization step generates: $\Im_j^{DC}$ and $\Im_j^{FLUT}$ values, a violation set Y, and a sorted set of scene cuts $\Omega_p$.
In step 415, given a violation set Y and a sorted set of secondary scene cuts Ωp as the input, a new round of subscene splitting begins. One iterates through all the subscenes in violation set Y and decides how to split them.
Let Pg be a subscene in the violation set that spans the frame range [Cg, Cg+1−1]. For splitting, one compares subscene FLUT {tilde over (T)}P
where the mathematical operator |⋅| denotes the absolute value.
After the split, the subscene Pg is divided into two subscenes or secondary scenes and the new splitting index is inserted into the secondary set at the correct location.
$$\Omega_p = \Omega_p \cup \{C_s\}. \tag{10}$$
All the new splits from all the subscenes in the violation set are inserted into the set Ωp in a sorted manner. The violation set Y is set to an empty set after iterating through every subscene in it. The updated set Ωp is passed on to the next step in the splitting process.
In step 420, new subscene FLUTs are computed for every secondary scene in the updated set Ωp. Suppose at this time, the set Ωp contains G+1 secondary scene cuts from C0 to CG:
$$\Omega_p = \{C_0, C_1, \ldots, C_g, \ldots, C_{G-1}, C_G\}. \tag{11}$$
There are G subscenes in this iteration round, and the frame indices in the set $\Omega_p$ are in ascending order, i.e.,
$$Q = C_0 < C_1 < \ldots < C_g < \ldots < C_{G-1} < C_G = Q + M. \tag{12}$$
Consider subscene $P_g$ that spans the frame range $[C_g, C_{g+1}-1]$. To build a subscene FLUT, i.e., $\tilde{T}_{P_g}^F$, for $P_g$ for $g \in [0, G-1]$, a subscene overlap parameter θ is introduced to allow a small overlap between neighboring subscenes:
$$\Theta' = \max(Q,\; C_g - \theta), \qquad \Theta'' = \min(Q+M-1,\; C_{g+1} - 1 + \theta). \tag{13}$$
The overlap frames are used to estimate the forward LUT for the subscene Pg by averaging the FLUTs in the subscene and the overlap portion.
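As an example, and without limitation, the overlap window of equation (13) and the averaging of FLUTs over that window may be sketched as follows. The function names, the list-based FLUT representation, and the convention that `frame_fluts[0]` belongs to frame Q are illustrative assumptions.

```python
def overlap_window(Q, M, c_g, c_next, theta=1):
    """Equation (13): first and last frame used to build the subscene FLUT."""
    first = max(Q, c_g - theta)                 # never reach before the parent scene
    last = min(Q + M - 1, c_next - 1 + theta)   # never reach past the parent scene
    return first, last


def subscene_flut(frame_fluts, Q, M, c_g, c_next, theta=1):
    """Average the per-frame FLUTs over the subscene plus its overlap frames.

    frame_fluts[0] is assumed to belong to frame Q of the parent scene.
    """
    first, last = overlap_window(Q, M, c_g, c_next, theta)
    window = frame_fluts[first - Q:last - Q + 1]
    return [sum(col) / len(window) for col in zip(*window)]


if __name__ == "__main__":
    # Parent scene: frames 100..109, with secondary cuts at 100 and 104.
    print(overlap_window(100, 10, 100, 104))
    print(overlap_window(100, 10, 104, 110))
```

Clamping to [Q, Q+M−1] keeps the overlap inside the parent scene, so a subscene never borrows frames from a neighboring primary scene.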
In the current round of the splitting process, let the DC value be defined by λ. These DC values will be used later on in step 425 to find threshold violations at the subscene boundaries. Let
These new DC values for all the frames in the primary scene P are collected after iterating through all the subscenes in Ωp and computing statistics in them.
In step 425, temporal stability violations at boundaries between subscenes are detected. For example, for secondary scenes $P_{g-1}$ spanning $[C_{g-1}, C_g - 1]$ and $P_g$ spanning $[C_g, C_{g+1} - 1]$, a boundary check needs to be computed at $C_g$. If any one of the checks fails, then both subscenes $P_{g-1}$ and $P_g$ are moved to the violation set Y. For subscenes $P_g$ and $P_{g+1}$, a boundary check needs to be computed at $C_{g+1}$. The same checks are applied at each subscene boundary $C_g$, except at the first frame of the segment, $C_0$ (Q), and the last frame of the segment, $Q+M-1 = C_G - 1$.
Using equation (15), updated DC values (λj) are available for all the frames in the primary scene P after iterating through all the subscenes in Ωp. These values will be used in steps 425 and 430 to perform boundary-violation checks. The DC difference Δc
Δc
Violation Check #1:
Is it true that |Δc
If absolute DC difference |Δc
Violation Check #2:
Is it true that sign (Δc
The sign(x) (or signum) operator for a real number x is defined as follows: sign(x) = 1 if x > 0, sign(x) = 0 if x = 0, and sign(x) = −1 if x < 0.
A positive DC difference Δc
Violation Check #3:
Is it true that max ({|{tilde over (T)}P
If maximum of absolute element-wise difference between FLUTs {tilde over (T)}P
All the violation checks are at subscene boundaries. If there is a violation, then both subscenes are entered into the violation set. This ends the current round of splitting. At step 430, if the updated violation set is not empty, control goes back to step 415 with the updated Ωp and Y sets for the next round of splitting. Otherwise, if there are no boundary violations and the violation set is empty, the process terminates and step 440 outputs the final secondary set of subscenes. In an embodiment, in step 425, if a secondary scene in Y is only one frame long, it can be removed from the set, since it cannot be split further. Alternatively, such single-frame scenes can be ignored in step 415.
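As an example, and without limitation, the control flow of steps 410 through 440 may be sketched as follows. The split-selection rule and the boundary checks are passed in as caller-supplied functions (`choose_split` and `boundary_violated`, both hypothetical names), because this sketch only illustrates the iteration over the violation set and the sorted set of cuts, not the specific criteria. The early exit when a round adds no new cut is an added safeguard, not a step of the described process.

```python
from bisect import insort


def split_parent_scene(Q, M, choose_split, boundary_violated):
    """Iteratively split parent scene [Q, Q+M-1] into secondary scenes.

    choose_split(start, end) -> frame index in (start, end] to cut at
    boundary_violated(cuts, g) -> True if the boundary at cuts[g] fails a check
    Both callables are placeholders supplied by the caller.
    """
    cuts = [Q, Q + M]                # equation (8); Q+M is the end marker
    violations = {(Q, Q + M - 1)}    # start with the parent scene itself
    while violations:
        before = len(cuts)
        # Step 415: split each violating subscene that has more than one frame.
        for start, end in sorted(violations):
            if end > start:
                split = choose_split(start, end)
                if start < split <= end and split not in cuts:
                    insort(cuts, split)   # equation (10), keeps cuts sorted
        if len(cuts) == before:
            break                         # safeguard: nothing could be split
        violations = set()
        # Steps 420/425: check every interior boundary of the updated list.
        for g in range(1, len(cuts) - 1):
            if boundary_violated(cuts, g):
                for k in (g - 1, g):      # both neighbors enter the set
                    start, end = cuts[k], cuts[k + 1] - 1
                    if end > start:       # single-frame scenes are dropped
                        violations.add((start, end))
    return cuts                           # step 440: final cuts plus end marker


if __name__ == "__main__":
    midpoint = lambda s, e: (s + e + 1) // 2
    too_long = lambda c, g: max(c[g] - c[g - 1], c[g + 1] - c[g]) > 2
    print(split_parent_scene(0, 8, midpoint, too_long))
```

The toy criteria in the usage example (split at the midpoint, flag a boundary when a neighboring subscene is longer than two frames) are stand-ins chosen only to make the loop observable.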
In practice, a parent scene is sub-divided only if it is processed across two or more nodes. For example, a node may look for scene cuts at the left and right bumper sections. If no such scene cuts are detected, then it can deduce that the beginning or end of its segment is processed by neighboring nodes as well, thus, one or more primary scenes need to be subdivided.
Consider a scenario shown in
In an embodiment, for the example in
In an embodiment, these initial sync splits may be performed as part of step 410, and one can apply the splitting algorithm 400 on these primary scenes. The only minor change will be in the initialization step 410, where the set Ωp for each node will include one additional synchronization scene cut (320). Then, since there is no need to do further splits one can directly jump to step 420 after initialization. Next, the algorithm proceeds as usual.
Alternatively, given the original Ωp set, upon detecting that a primary scene is not fully in the current node, this sync subdivision may be performed in step 415 using the rules described earlier (e.g., for node N, if the primary scene does not terminate at node N, adding a scene cut at position CL−1−B) in lieu of using equation (9).
With these initial synchronization splits, the subscene cuts computed by nodes N and N+1 in isolation are expected to be reasonably aligned with each other. For node N, let Ψ1N denote a first list of scenes obtained after merging the XML scene cuts with Auto-SCD scene cuts as seen in
$$\Psi_2^N = \Psi_1^N \cup \Psi_l^N \cup \Psi_r^N \cup \{f_{NL}\} \tag{20}$$
There is a possibility that for a scene longer than a segment length, there may not be separate left or right sets, but a single set of secondary scene cuts. Let $S_k$ denote the starting frame index for the k-th scene in the list $\Psi_2^N$. Suppose there are K scenes in the list; then the elements in the list can be expressed as:
$$\Psi_2^N = \{S_0, S_1, S_2, \ldots, S_{k-1}, S_k, S_{k+1}, \ldots, S_K\} \tag{21}$$
where SK denotes a dummy scene cut that is just after the last frame of the segment. It is only used as an end-of-list marker. By default, S0=fNL as the first frame of the segment is also a beginning of a new scene for node N. The second list of scenes Ψ2N (222) is handed over to Pass-2 along with subscene-related auxiliary data.
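As an example, and without limitation, assembling the second list of scenes may be sketched as follows. The function name `second_scene_list`, its argument names, and the explicit filtering of cuts to the segment range are illustrative assumptions; the first frame of the segment is always included as a scene start, and a dummy cut just after the last frame serves as the end-of-list marker.

```python
def second_scene_list(primary, left_secondary, right_secondary, seg_first, seg_last):
    """Equations (20)-(21): sorted scene starts in the segment plus an end marker."""
    starts = set(primary) | set(left_secondary) | set(right_secondary) | {seg_first}
    # Only cuts inside the segment are used by Pass-2; bumper cuts are dropped.
    in_segment = sorted(s for s in starts if seg_first <= s <= seg_last)
    return in_segment + [seg_last + 1]   # S_K: dummy cut after the last frame


if __name__ == "__main__":
    print(second_scene_list([90, 130, 205], [110], [185], 100, 199))
```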
The second list of scenes Ψ2N has details about the primary and secondary scenes in the segment. Primary scenes do not need any additional data from Pass-1, but secondary scenes require the following auxiliary data from Pass-1.
The term “merging point” refers to a luminance value in the SDR or HDR domain which acts as a boundary point to modify an original forward reshaping function (FLUT) to correct for potential effects of trim-pass operations. For example, given low and high merging points in a set (or window) of consecutive frames, and normalized SDR values in [0, 1), the FLUT may be adjusted in the SDR luminance range from 0 up to the low merging point and in the range from the high merging point up to 1. Without limitation, examples of low-intensity and high-intensity luminance regions may be normalized luminance values in [0, 0.5) and [0.5, 1), respectively. Example methods to generate merging points are provided in Ref. [5] and in later sections of this specification.
There are two major types of scenes in the proposed architecture, namely, primary scenes and secondary scenes. Pass-2 processes all scenes to produce the same set of metadata parameters for every frame in that scene. In the forward phase of Pass-2, reshaping parameters are computed from all the frames in the statistics collection window of that scene.
As a rule, a primary scene will have no overlap with any neighboring scene, and secondary scenes are only allowed to have overlap with neighboring secondary scenes. In other words, the overlapping frames for a subscene can never come from a neighboring primary scene. The overlap parameter θ (see equation (13)) is set by the user and the default value is 1. Backward phase in Pass-2 uses no such overlap for primary or secondary scenes.
As mentioned earlier, a colorist may introduce trims in the reference SDR (125) to meet the director's intent. Sometimes the trim results in clipping of the highlights and/or crushing of the low intensity values. Reconstructing HDR from a trim-affected SDR introduces undesirable artifacts in the reconstructed HDR domain. To reduce these artifacts, as discussed in Ref. [5] and later in this specification, a trim correction algorithm needs (i) a reference display mapping (DM) curve computed using minimum, maximum, and mean (or average) HDR and SDR luma values in the content, and (ii) merging points in the low and/or high intensity regions.
As used herein, the term “display mapping” (DM) curve denotes a function which maps pixel values in an image frame (in a first dynamic range) to pixel values on a target display (in a second, different, dynamic range). For example, and without limitation, as discussed in Ref. [8], given min, max, and average luminance values in a video frame and corresponding min, average, and maximum luminance values in a target display, one may determine a tone-mapping curve mapping the input HDR values to corresponding HDR or SDR values. In an embodiment, such a tone-mapping or display-mapping (DM) curve may be used to better determine the range of HDR values where the original forward reshaping function needs to be adjusted so that artifacts due to the trim pass are reduced.
For trim handling in secondary scenes, the minimum and maximum HDR and SDR luma values used in the reference DM curve are fixed globally for the entire video sequence. In other words, there is no need to compute the minimum and maximum HDR or SDR luma values in each secondary scene. Let us denote the mean HDR and SDR luma values by $v_{avg}$ and $s_{avg}$, respectively. Let the merging points in the low- and high-intensity SDR regions be represented by $s_l^m$ and $s_h^m$, respectively. As an example, consider the scenario in
where B denotes the number of bumper frames. For the trim-parameter window , the low-intensity and high-intensity merging points are derived as:
=max({slm,j|j∈}).
=min({shm,j|j∈}) (23)
The trim parameter set is computed on all frames in the trim-parameter window, so its values are the same for both nodes. These values are copied to every subscene of the parent scene on both nodes (e.g., scenes A, B, E, and F). Because all the subscenes use the same set of values, temporal stability is maintained.
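As an illustration only, the following minimal Python sketch computes one trim parameter set over a trim-parameter window. The dictionary keys `v_avg`, `s_lm`, and `s_hm` and the function name are hypothetical. The sketch assumes that the average-luminance parameter of equation (22) is a plain mean over the window; the maximum and minimum follow equation (23).

```python
from statistics import mean


def window_trim_parameters(frames):
    """Return (v_avg, s_lm, s_hm) for a trim-parameter window.

    frames: list of dicts with hypothetical keys 'v_avg', 's_lm', 's_hm'
    holding the frame-based trim-pass correction parameters.
    """
    if not frames:
        raise ValueError("trim-parameter window is empty")
    # Assumption: equation (22) is a plain mean of the frame averages.
    v_avg = mean(f["v_avg"] for f in frames)
    # Equation (23): largest low merging point, smallest high merging point.
    s_lm = max(f["s_lm"] for f in frames)
    s_hm = min(f["s_hm"] for f in frames)
    return v_avg, s_lm, s_hm


window = [
    {"v_avg": 0.30, "s_lm": 0.10, "s_hm": 0.90},
    {"v_avg": 0.40, "s_lm": 0.15, "s_hm": 0.85},
    {"v_avg": 0.50, "s_lm": 0.12, "s_hm": 0.88},
]
params = window_trim_parameters(window)
```

Taking the maximum of the low merging points and the minimum of the high merging points yields the widest correction regions seen in the window, so that every frame in the window is covered.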
In an embodiment, the trim-parameter window may be defined as follows:
For longer parent scenes that span multiple nodes, the trim parameters are either copied or interpolated.
For node N, the trim value for any subscene is computed as follows. Let ρ represent any of the three parameters in the trim set. Let a0 denote the first frame (351) after the end of the start trim-parameter window 350 in node N. Frame a0 and all frames before a0 will use trim-set 1. Let an (353) denote the first frame of the end trim-window 355 in node N. Frame an and all frames after an will use trim-set 2. For all other frames between these two trim-parameter windows (i.e., frames with frame indices j∈[a0, an−1]), for example, frames in the subscene A, the trim parameters will be computed by linear interpolation between trim-set 1 and trim-set 2, depending on their distance from the anchor frames a0 and an. For example, in an embodiment, for the j-th frame, a trim parameter ρj may be computed as:
where the remaining two terms denote the corresponding parameters of trim-set 1 and trim-set 2 (see equations (22) and (23)). Then, in node N, for any subscene A within the two trim-parameter windows, that is, within frames as>a0 and ae−1<an, the scene-related trim parameters in subscene A are computed using an average of the frame trim parameters within that subscene, that is:
The same formula applies for any parameter in the trim set. Interpolation of trim parameters across subscenes ensures a gradual transition from the trim parameters in trim-set 1 to those in trim-set 2, which are computed at either end of node N. Without the interpolation, different trim parameters at the edges of the segment would cause drastic changes in the forward reshaping parameters from one subscene to the next, which may cause flashing or sudden brightness changes.
While equation (24) applies simple linear interpolation, alternative interpolation methods known in the art, such as second order, bilinear, polynomial, spline, and the like, may also be used.
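By way of example, the interpolation and the per-subscene averaging described above may be sketched as follows. The function names are hypothetical. The weights are an assumption chosen so that frame a0 receives the trim-set 1 value and frame an receives the trim-set 2 value; the exact form of equation (24) may differ.

```python
def interpolate_trim(rho1, rho2, a0, an, j):
    """Linearly interpolate one trim parameter for frame j, a0 <= j <= an.

    rho1 and rho2 are the trim-set 1 and trim-set 2 values of the parameter.
    """
    if an <= a0:
        raise ValueError("an must be greater than a0")
    w = (j - a0) / (an - a0)  # 0 at frame a0, 1 at frame an (assumed weights)
    return (1.0 - w) * rho1 + w * rho2


def subscene_trim(rho1, rho2, a0, an, a_s, a_e):
    """Average the interpolated values over frames a_s .. a_e - 1."""
    values = [interpolate_trim(rho1, rho2, a0, an, j) for j in range(a_s, a_e)]
    return sum(values) / len(values)


# Subscene A covers frames 120..139 between anchor frames a0=100 and an=200.
rho_a = subscene_trim(0.2, 0.6, 100, 200, 120, 140)
```

Because every frame in a subscene shares the averaged value, the parameter stays constant within a subscene and changes only gradually from one subscene to the next.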
If a node detects that one or more scenes are shared with both of its neighbor nodes (e.g., a prior and a subsequent node), then, in step 462, it uses the two synchronization scene cuts to define two trim-pass parameter windows (e.g., 350 and 355), and determines two new sets (trim-set 1 and trim-set 2) of updated trim-pass correction parameters (e.g., see equations (22) and (23)). Using these two sets, in step 464, the node will generate updated trim-pass parameters for all frames between the two trim-pass parameter windows by interpolating values between trim-set 1 and trim-set 2 according to their distance. Next (step 466), the following assignment is made:
As depicted in
As depicted in
Given a segment to scenes list (222),
From
Scene-based generation of a forward reshaping function (505) consists of two levels of operation. First, statistics are collected for each frame. For example, for luma, one computes the histograms for both SDR (hjs(b)) and HDR (hjv(b)) frames and stores them in the frame buffer for the j-th frame, where b is the bin index. After generating the 3DMT representation for each frame, one generates an “a/B” matrix representation denoted as:
$$B_j^F = (S_j^F)^T S_j^F, \qquad a_j^{F,ch} = (S_j^F)^T v_j^{F,ch}, \qquad (26)$$
where ch refers to a chroma channel (e.g., Cb or Cr), (SjF)T denotes a transpose matrix based on the reference HDR scene data and a parametric model of the forward reshaping function, and vjF,ch denotes a vector based on the SDR scene data and the parametric model of the forward reshaping function.
Given the statistics of each frame within the current scene, one can apply a scene-level algorithm to compute the optimal forward reshaping coefficients. For example, for luma, one can generate scene-based histograms for SDR (hs(b)) and HDR data (hv(b)) by summing or averaging the frame-based histograms. For example, in an embodiment,
$$h^s(b) = \sum_{j \in \text{scene}} h_j^s(b), \qquad h^v(b) = \sum_{j \in \text{scene}} h_j^v(b). \qquad (27)$$
Having both scene-level histograms, in an embodiment, as an example and without limitation, one can apply cumulative density function (CDF) matching (Ref. [4-5]) to generate the forward mapping function (FLUT) from HDR to SDR, e.g.,
$$\tilde{T}^F = \text{CDF\_MATCHING}\big(h^v(b), h^s(b)\big). \qquad (28)$$
However, one can apply any known reshaping method to generate the original forward mapping function. If a node detects the need to apply trim-pass correction, then this forward reshaping function will need to be corrected using a reference DM curve and the trim-pass correction methods discussed next.
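As an example and without limitation, a bare-bones form of CDF matching may be sketched in Python as shown below. This is a simplified illustration, not the method of Refs. [4-5]: it maps each HDR bin to the smallest SDR bin whose cumulative count reaches the HDR cumulative count, with no interpolation or smoothing. The function name is hypothetical and integer histogram counts are assumed.

```python
from itertools import accumulate


def cdf_matching(hist_hdr, hist_sdr):
    """Return a FLUT (list indexed by HDR bin) of SDR bin indices."""
    total_v, total_s = sum(hist_hdr), sum(hist_sdr)
    if total_v <= 0 or total_s <= 0:
        raise ValueError("both histograms must contain at least one pixel")
    cum_s = list(accumulate(hist_sdr))
    flut, s = [], 0
    for cum_v in accumulate(hist_hdr):
        # Compare the two CDFs by cross-multiplication to stay in integers:
        # cum_s[s] / total_s < cum_v / total_v.
        while s < len(cum_s) - 1 and cum_s[s] * total_v < cum_v * total_s:
            s += 1
        flut.append(s)
    return flut


# Toy example: 8 HDR bins and 4 SDR bins, both uniformly populated.
flut = cdf_matching([1] * 8, [2] * 4)
```

In this toy case two HDR bins map to each SDR bin, which illustrates the many-to-one nature of the forward mapping.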
While the frame-based FLUTs are typically designed based on a frame's minimum, average, and maximum HDR and SDR luminance values, a reference DM curve is constructed a bit differently. For primary scenes, a reference DM curve uses the minimum and maximum HDR and SDR luminance values for all the frames in the scene. For secondary scenes, a reference DM curve uses the global minimum and maximum values. These global-minimum and global-maximum HDR and SDR values may define a broader dynamic range than the corresponding frame-based values. In an embodiment, these values may be preselected to represent the allowable legal range of values during data transmission. For example, in an embodiment, they may be defined based on the SMPTE range, thus, for 16-bit HDR data and 8-bit SDR data, a DM curve will map HDR codewords in [4096, 60160] to SDR codewords in [16, 235]. Alternatively, one may apply the full range of possible values, for example, by mapping the full 16-bit HDR range [0, 65535] to the full 10-bit SDR range [0, 1023].
For primary scenes, clipping-related distortion in the scene is measured using the SDR and HDR scene histograms. For example, if values of the normalized SDR histograms (e.g., see equation (55)) are lower than a certain threshold (e.g., 0.20 for 8-bit SDR data and 0.05 for 10-bit SDR data) and/or if there are no “variance peaks” (see equations (56)-(58)), then no trim correction is required, i.e. TF={tilde over (T)}F. Otherwise, if trim-related clipping is detected, then the trim-correction process is enabled. A range-restricted DM curve {tilde over (T)}rrDM is used frequently in trim correction and it is built as follows. Initially, a reference DM curve {tilde over (T)}DM is constructed using, for example, the minimum, average and maximum HDR luma values in the scene, i.e., vminY, vavgY, and vmaxY:
$$\tilde{T}^{DM} = \text{DM\_TONE\_MAPPING}(v_{min}^Y, v_{avg}^Y, v_{max}^Y), \qquad (29)$$
where DM_TONE_MAPPING() refers to a tone-mapping representation mapping the HDR luminance range $[v_{min}^Y, v_{max}^Y]$ to the corresponding SDR range $[s_{min}^Y, s_{max}^Y]$ (e.g., see Ref. [8]). Differential FLUT $\tilde{t}^F$ and DM curve $\tilde{t}^{DM}$ values are derived from their original counterparts as:
$$\tilde{t}^F(i) = \tilde{T}^F(i) - \tilde{T}^F(i-1) \quad \text{for } i \in [v_{min}^Y + 1, v_{max}^Y],$$
$$\tilde{t}^{DM}(i) = \tilde{T}^{DM}(i) - \tilde{T}^{DM}(i-1) \quad \text{for } i \in [v_{min}^Y + 1, v_{max}^Y], \qquad (30)$$
where i denotes an index in the FLUT array. Values outside the index range [vminY+1, vmaxY] are set to zero. The differential DM curve is then bounded (or scaled) to the SDR range of the FLUT curve:
This bounded differential DM curve is denoted as the range-restricted differential DM curve. A cumulative sum of the elements of this range-restricted differential DM curve (corresponding to a simple integration operation) gives the range restricted DM curve {tilde over (T)}rrDM:
$$\tilde{T}_{rr}^{DM}(i) = \tilde{t}_{rr}^{DM}(i) + \tilde{T}_{rr}^{DM}(i-1) \quad \text{for } i > 0,$$
and
$$\tilde{T}_{rr}^{DM}(i) = \tilde{T}^F(i) \quad \text{for } i = 0. \qquad (32)$$
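As an example, the construction of the range-restricted DM curve may be sketched in Python as follows. The function names are hypothetical. The bounding step is an assumption: the differential DM curve is multiplied by a single factor so that its total rise equals the SDR rise of the FLUT over the same index range. The differentials and the cumulative sum follow equations (30) and (32).

```python
def differential(curve, v_min, v_max):
    """d[i] = curve[i] - curve[i-1] for i in [v_min+1, v_max], else zero."""
    d = [0.0] * len(curve)
    for i in range(v_min + 1, v_max + 1):
        d[i] = curve[i] - curve[i - 1]
    return d


def range_restricted_dm(flut, dm, v_min, v_max):
    """Build the range-restricted DM curve from a FLUT and a DM curve."""
    t_dm = differential(dm, v_min, v_max)
    dm_span = sum(t_dm)
    flut_span = flut[v_max] - flut[v_min]
    # Assumed bounding step: one scale factor matching the FLUT's SDR range.
    scale = flut_span / dm_span if dm_span else 0.0
    t_rr = [scale * d for d in t_dm]
    # Equation (32): start at the FLUT value at index 0, then integrate.
    curve = [float(flut[0])]
    for i in range(1, len(flut)):
        curve.append(t_rr[i] + curve[i - 1])
    return curve


flut = [10, 10, 12, 20, 30]   # toy FLUT, flat (crushed) at the low end
dm = [0, 10, 20, 30, 40]      # toy reference DM curve
rr_dm = range_restricted_dm(flut, dm, v_min=0, v_max=4)
```

The result keeps the shape of the DM curve while starting and ending at the same SDR values as the FLUT.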
For low-intensity trim correction, the original FLUT and DM curves are merged from $v_{min}^Y$ up to the low merging point. On the other hand, in high-intensity trim correction, the curves are combined from the high merging point onwards. A merging point can be defined as a FLUT index value that marks the beginning or end of the FLUT and DM merging process, depending on the area of trim correction. There can be one merging point in the low-intensity region and/or one in the high-intensity region.
As described in more detail in Ref. [5] and in a later section (see “Computing Merging Points for Trim-pass Correction”), one begins with an initial estimate of the merging points with respect to the SDR codeword range. Let these first estimates be $s_l^{m,1}$ and $s_h^{m,1}$ for the low- and high-intensity regions respectively. The FLUT curve $\tilde{T}^F$ is used to obtain the equivalent merging points in the HDR range, namely, $v_l^{m,1}$ and $v_h^{m,1}$, by reverse mapping SDR codewords to HDR codewords:
$$v_l^{m,1} = \max\{ i \mid \tilde{T}^F(i) = s_l^{m,1} \}, \qquad v_h^{m,1} = \min\{ i \mid \tilde{T}^F(i) = s_h^{m,1} \}. \qquad (33)$$
These first estimates are refined using the range restricted DM curve to get a second, more accurate, estimate of the merging point in the SDR range, slm,2 and shm,2, where
$$s_l^{m,2} = \tilde{T}_{rr}^{DM}(v_l^{m,1}), \qquad s_h^{m,2} = \tilde{T}_{rr}^{DM}(v_h^{m,1}). \qquad (34)$$
These second estimates in the SDR domain are then reverse mapped to equivalent HDR merging point values. These second HDR merging points, $v_l^{m,2}$ and $v_h^{m,2}$, are used for the final merging.
$$v_l^{m,2} = \max\{ i \mid \tilde{T}^F(i) = s_l^{m,2} \}, \qquad v_h^{m,2} = \min\{ i \mid \tilde{T}^F(i) = s_h^{m,2} \}. \qquad (35)$$
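The two-stage refinement of equations (33) to (35) may be sketched in Python as shown below. The function names are hypothetical. Two assumptions are made: the FLUT is non-decreasing, and when no FLUT entry equals the requested SDR codeword exactly, the nearest entry on the appropriate side is used (for a non-decreasing FLUT this coincides with the exact-match definition whenever a match exists). Rounding the range-restricted DM value to an integer codeword is also an assumption.

```python
def reverse_map_low(flut, s):
    """Largest HDR index whose FLUT value does not exceed SDR codeword s."""
    candidates = [i for i, val in enumerate(flut) if val <= s]
    if not candidates:
        raise ValueError("no HDR codeword maps at or below s")
    return max(candidates)


def reverse_map_high(flut, s):
    """Smallest HDR index whose FLUT value is at least SDR codeword s."""
    candidates = [i for i, val in enumerate(flut) if val >= s]
    if not candidates:
        raise ValueError("no HDR codeword maps at or above s")
    return min(candidates)


def refine_merging_points(flut, rr_dm, s_lm1, s_hm1):
    """Return ((v_lm1, v_hm1), (v_lm2, v_hm2)) per equations (33)-(35)."""
    v_lm1 = reverse_map_low(flut, s_lm1)      # equation (33)
    v_hm1 = reverse_map_high(flut, s_hm1)
    s_lm2 = round(rr_dm[v_lm1])               # equation (34), rounded
    s_hm2 = round(rr_dm[v_hm1])
    v_lm2 = reverse_map_low(flut, s_lm2)      # equation (35)
    v_hm2 = reverse_map_high(flut, s_hm2)
    return (v_lm1, v_hm1), (v_lm2, v_hm2)


flut = [0, 0, 1, 1, 2, 2, 3, 3]
rr_dm = [0.0, 0.0, 0.0, 0.6, 2.6, 3.0, 3.0, 3.0]
first, second = refine_merging_points(flut, rr_dm, s_lm1=1, s_hm1=2)
```

In this toy example the low merging point is unchanged by the refinement, while the high merging point moves from HDR index 4 to HDR index 6.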
During merging, the differential FLUT curve is replaced by the range-restricted differential DM curve in the low or high intensity regions to avoid clipping. In order to keep the SDR mapped range of the resulting hybrid curve the same as the range in the original FLUT curve, the range-restricted differential DM curve should be scaled appropriately, as shown.
$$\tilde{t}_{corr}^F(i) = \tilde{t}^F(i). \qquad (38)$$
Finally, a cumulative sum of the updated or trim-corrected differential FLUT curve, {tilde over (t)}corrF, provides the final, trim-corrected, luma FLUT TF.
$$T^F(i) = \tilde{t}_{corr}^F(i) + T^F(i-1) \quad \text{for } i > 0, \qquad T^F(i) = \tilde{T}^F(i) \quad \text{for } i = 0. \qquad (39)$$
An example of the process is illustrated in
For secondary scenes, the process starts with the updated trim parameters generated earlier (see process 450). As discussed earlier (see process 450 in
Suppose the SDR merging points that are read in from the data block for the current subscene are now $s_l^m$ and $s_h^m$ for the low- and high-intensity ranges, respectively. The HDR equivalents, $v_l^m$ and $v_h^m$, of these merging points can be determined using reverse mapping. Then, these merging points may be used to construct the hybrid FLUT and DM curve. Let
$$v_l^m = \max\{ i \mid \tilde{T}^F(i) = s_l^m \}, \qquad v_h^m = \min\{ i \mid \tilde{T}^F(i) = s_h^m \}. \qquad (40)$$
In an embodiment, a reference DM curve is generated based on global minimum and maximum possible HDR luma codeword values, denoted as HDRmin and HDRmax, (e.g., as those defined by the restricted SMPTE range) within the maximum possible range [0, 2B
$$\tilde{T}^{DM} = \text{DM\_TONE\_MAPPING}(HDR_{min}, v_{avg}^Y, HDR_{max}), \qquad (41)$$
where the HDR range [HDRmin, HDRmax] is mapped to [SDRmin,SDRmax] (e.g., the restricted SMPTE range, the full range [0, 2B
For temporal stability, it is advisable to use a similar DM curve for every subscene. Choosing fixed min/max values allows the DM curve to be consistent throughout the sequence, and it also covers the entire valid codeword range. These min/max values are configurable and should be set to the lowest and highest possible codeword values in the sequence.
Next, the differential DM curve {tilde over (t)}DM and FLUT curve {tilde over (t)}F are derived as:
$$\tilde{t}^{DM}(i) = \tilde{T}^{DM}(i) - \tilde{T}^{DM}(i-1) \quad \text{for } i \in [HDR_{min} + 1, HDR_{max}],$$
$$\tilde{t}^F(i) = \tilde{T}^F(i) - \tilde{T}^F(i-1) \quad \text{for } i \in [HDR_{min} + 1, HDR_{max}]. \qquad (42)$$
For trim correction in the low intensity region, the DM curve is scaled similarly to the legacy method. In an embodiment, scaling is preferred in the low intensity region to avoid introducing any constant offset in the FLUT. In the high intensity region, the FLUT curve is replaced by the original DM curve, without any scaling.
$$\tilde{t}_{corr}^F(i) = \tilde{t}^{DM}(i) \quad \text{for } i \in [v_h^m + 1, v_{max}^Y], \qquad (44)$$
$$\tilde{t}_{corr}^F(i) = \tilde{t}^F(i). \qquad (45)$$
Finally, a cumulative sum of the updated or trim-corrected differential FLUT curve, {tilde over (t)}corrF, yields the final luma FLUT TF.
$$T^F(i) = \tilde{t}_{corr}^F(i) + T^F(i-1) \quad \text{for } i > 0, \qquad T^F(i) = \tilde{T}^F(i) \quad \text{for } i = 0. \qquad (46)$$
An example, based on the curves in
For a secondary scene, in step 490, a reference DM is generated using scene-based average luminance values, but global minimum and maximum HDR luminance values (e.g., HDRmin, vavgY, HDRmax). As done for primary scenes, in step 492, differential FLUT and reference DM curves are constructed. Unlike what is done in primary scenes, the original merging points are not refined. Step 494 is similar to step 482; using the original merging points, the original differential FLUT curve is edited with segments from the differential DM curve that was generated in step 492, except that the low and high merging points are treated a bit differently. For a low-intensity merging point, the differential reference DM curve is scaled to match the SDR dynamic range (equation (43)). For a high-intensity merging point, the differential reference DM curve is used with no additional scaling (see equation (44)). Finally, in step 485, the trim-corrected FLUT is generated from the edited differential FLUT.
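As an example, and without limitation, the secondary-scene steps above may be sketched in Python as follows. The function name is hypothetical. Two assumptions are made: the low-region scaling of equation (43) uses a single factor that preserves the FLUT's SDR rise between HDRmin and the low merging point, and the upper end of the high-intensity replacement range is taken to be HDRmax.

```python
def trim_correct_secondary(flut, dm, v_lm, v_hm, hdr_min, hdr_max):
    """Return a trim-corrected FLUT built from a FLUT and a reference DM curve.

    v_lm / v_hm are HDR merging points; pass None to skip a region.
    """
    n = len(flut)
    t_f, t_dm = [0.0] * n, [0.0] * n
    for i in range(hdr_min + 1, hdr_max + 1):       # equation (42)
        t_f[i] = flut[i] - flut[i - 1]
        t_dm[i] = dm[i] - dm[i - 1]
    t_corr = list(t_f)                              # default, equation (45)
    if v_lm is not None:
        low = range(hdr_min + 1, v_lm + 1)
        dm_span = sum(t_dm[i] for i in low)
        f_span = sum(t_f[i] for i in low)
        # Assumed form of the low-region scaling (equation (43)).
        scale = f_span / dm_span if dm_span else 0.0
        for i in low:
            t_corr[i] = scale * t_dm[i]
    if v_hm is not None:
        for i in range(v_hm + 1, hdr_max + 1):      # equation (44), unscaled
            t_corr[i] = t_dm[i]
    out = [float(flut[0])]                          # equation (46)
    for i in range(1, n):
        out.append(t_corr[i] + out[i - 1])
    return out


flut = [0, 0, 0, 2, 4, 6, 8, 8, 8, 8]   # crushed lows, clipped highs
dm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]     # toy reference DM curve
corrected = trim_correct_secondary(flut, dm, v_lm=3, v_hm=6, hdr_min=0, hdr_max=9)
```

In the toy example the corrected curve rises through the previously flat low and high regions, while remaining equal to the original FLUT at and between the two merging points. Because the high-intensity segment is not scaled, the corrected values may exceed the original SDR maximum, so a practical implementation would presumably clip the result to the valid SDR range.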
For chroma (e.g., ch=Cb or ch=Cr), one may again average over the a/B frame-based representations in equations (26) to generate a scene-based a/B matrix representation given by
and generate parameters for a multiple color channel, multiple regression (MMR) model of a reshaping function as (Ref. [2-3])
$$m^{F,ch} = (B^F)^{-1} a^{F,ch}. \qquad (48)$$
Then, the reshaped SDR signal (229) can be generated as:
$$\hat{v}_j^{F,ch} = B^F m^{F,ch}. \qquad (49)$$
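As an illustration of the a/B approach of equations (26) and (48), the following Python sketch accumulates frame-based statistics, averages them over a scene, and solves the resulting linear system. It is a toy, first-order example with a three-term design row [1, y, c]; an actual MMR model (Refs. [2-3]) uses more terms, including cross products. All function names are hypothetical.

```python
def frame_stats(S, v):
    """Return B_j = S^T S and a_j = S^T v for one frame (equation (26))."""
    k = len(S[0])
    B = [[sum(r[p] * r[q] for r in S) for q in range(k)] for p in range(k)]
    a = [sum(r[p] * t for r, t in zip(S, v)) for p in range(k)]
    return B, a


def solve_linear(B, a):
    """Solve B m = a by Gaussian elimination with partial pivoting."""
    n = len(a)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(B, a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("B is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    m = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(M[r][c] * m[c] for c in range(r + 1, n))
        m[r] = (M[r][n] - acc) / M[r][r]
    return m


def scene_mmr(frames):
    """Average the frame-based a/B statistics, then solve (equation (48))."""
    stats = [frame_stats(S, v) for S, v in frames]
    n, k = len(stats), len(stats[0][1])
    B = [[sum(s[0][p][q] for s in stats) / n for q in range(k)] for p in range(k)]
    a = [sum(s[1][p] for s in stats) / n for p in range(k)]
    return solve_linear(B, a)


# Two toy frames whose targets follow v = 0.1 + 0.5*y + 0.25*c exactly.
frames = [
    ([[1, 0, 0], [1, 1, 0], [1, 0, 1]], [0.1, 0.6, 0.35]),
    ([[1, 1, 1], [1, 2, 1]], [0.85, 1.35]),
]
coeffs = scene_mmr(frames)
```

Storing only the small a/B statistics per frame lets a node compute scene-level coefficients without revisiting pixel data, which is the main appeal of this representation.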
Generating the scene-based backward reshaping function (152) also includes both frame-level and scene-level operations. Since the luma mapping function is a single-channel predictor, one can simply invert the forward reshaping function to obtain the backward reshaping function. For chroma, one forms a 3DMT representation using the reshaped SDR data (229) and the original HDR data (504) and computes a new frame-based a/B representation as:
$$B_j^B = (S_j^B)^T S_j^B, \qquad a_j^{B,ch} = (S_j^B)^T v_j^{B,ch}. \qquad (50)$$
At the scene-level, for luma, one may apply the histogram-weighted BLUT construction in Ref. [3] to generate the backward luma reshaping function. For chroma, one can again average the frame-based a/B representation to compute a scene-based a/B representation
with an MMR model solution for the backward reshaping mapping function given by
$$m^{B,ch} = (B^B)^{-1} a^{B,ch}. \qquad (52)$$
Then, in a decoder, the reconstructed HDR signal (160) can be generated as:
$$\hat{v}_j^{B,ch} = B^B m^{B,ch}. \qquad (53)$$
Trim-related clipping may be present in low- or high-intensity regions. To fix these clipping artifacts, as discussed earlier, the trim-affected forward reshaping curve (FLUT) is merged with a trim-free display-mapping (DM) curve to build a hybrid curve that avoids clipping. Two important parameters for generating this hybrid curve are the merging points for the FLUT and DM curves. Initially, these merging points (e.g., slm and shm), one for the low-intensity region and one for the high-intensity region, are calculated in the SDR codeword range. Then, equivalent merging points (e.g., vlm and vhm) are derived in the HDR range using one or more transformations, as discussed earlier and in Ref. [5]. These HDR merging points are eventually the ones used for constructing the hybrid (trim-corrected) FLUT. This section presents an example method for computing the merging points slm and shm in the SDR codeword range.
The SDR and HDR histograms and the original FLUT needed for trim pass correction may be frame-based, scene-based or window-based (e.g., computed within a narrow window of frames), but the merging points are evaluated in the same way. Let {tilde over (T)}F denote the FLUT curve affected by trim-related clipping, and let hs, hv be the SDR and HDR codeword histograms respectively.
The first step is to detect peaks in the SDR luma histogram $h^s$ using a moving-average filtered version of the SDR histogram, denoted as $h_{sm}^s$. In an embodiment, element-wise differences between the original and the smoothed SDR histogram may be used to estimate the locations of these peaks.
$$h_{pk}^s(b) = \max\big(0, h^s(b) - h_{sm}^s(b)\big) \quad \forall b \in [0, 2^{B_s} - 1], \qquad (54)$$
where Bs denotes the bit-depth of the SDR codewords. Histogram hpks is normalized by dividing each element by the sum of all the entries in the histogram to generate a normalized histogram
Each SDR bin b that has a normalized value
Due to HDR-to-SDR mapping, as the HDR codeword bit-depth is higher than the SDR bit-depth, multiple HDR codewords are mapped to a single SDR codeword. Clipping exacerbates this many-to-one mapping, as many more codewords get mapped to the same SDR codeword in the clipped regions. For any SDR codeword c, one can find the HDR codewords that are assigned to it and compute the variance of these codewords. Let
$$v_{min}^c = \min\{ i \mid \tilde{T}^F(i) = c \}, \qquad v_{max}^c = \max\{ i \mid \tilde{T}^F(i) = c \}, \qquad (56)$$
where c denotes any SDR codeword value from the entire SDR codeword range. Then, the cross-domain variance $\sigma_c^2$ for c can be calculated as
The cross domain variance σc2 is evaluated at every SDR codeword value and normalized by dividing with the sum of all the SDR codeword variances to get
Normalized variance values
The peaks in the SDR histogram and the cross domain variance array are collected in a set and separated out into low intensity and high intensity groups. Suppose j represents any peak in the set and τlocj<2B
$$s_l^{m,j} = \tau_{loc}^j + \eta \times \tau_{sev}^j, \qquad (59)$$
where η denotes a constant that is determined heuristically (e.g., η=25 for 8-bit SDR and η=100 for 10-bit SDR).
The low-merging point in the SDR range, slm, is then the maximum of all these merging points bounded by 2B
s
l
m=min (max{slm,j}, 2B
For the high-intensity region, the high merging point, $s_h^m$, in the SDR range is derived as follows:
$$s_h^{m,j} = \tau_{loc}^j - \eta \times \tau_{sev}^j, \qquad (61)$$
s
h
m=max (min[shm,j],2B
These SDR merging points are later used to find the final HDR merging points using one or more transformations.
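By way of a non-limiting example, the histogram-peak part of this procedure may be sketched in Python as follows. Several details are assumptions made only for illustration: the moving-average window width, the use of the normalized peak value as the severity of a peak, the 0.20 peak threshold for 8-bit SDR data, and the use of the mid-range codeword $2^{B_s-1}$ both to split peaks into low and high groups and to bound the merging points. The cross-domain variance peaks are omitted, and the function names are hypothetical.

```python
def peak_histogram(hist, half_window=2):
    """Normalized positive residual between hist and its moving average."""
    n = len(hist)
    resid = []
    for b in range(n):
        lo, hi = max(0, b - half_window), min(n, b + half_window + 1)
        smooth = sum(hist[lo:hi]) / (hi - lo)
        resid.append(max(0.0, hist[b] - smooth))    # equation (54)
    total = sum(resid)
    return [r / total for r in resid] if total else resid


def merging_points(peaks, bit_depth=8, eta=25):
    """peaks: list of (location, severity); returns (s_lm, s_hm) or None."""
    mid = 1 << (bit_depth - 1)                      # assumed bound
    low = [loc + eta * sev for loc, sev in peaks if loc < mid]    # eq. (59)
    high = [loc - eta * sev for loc, sev in peaks if loc >= mid]  # eq. (61)
    s_lm = min(max(low), mid) if low else None
    s_hm = max(min(high), mid) if high else None
    return s_lm, s_hm


# Toy 8-bit SDR histogram with clipping spikes at codewords 16 and 235.
hist = [10] * 256
hist[16], hist[235] = 1000, 500
norm = peak_histogram(hist)
peaks = [(b, p) for b, p in enumerate(norm) if p > 0.20]
s_lm, s_hm = merging_points(peaks)
```

In the toy example the spike at codeword 16 pushes the low merging point above the spike, and the spike at codeword 235 pulls the high merging point below it, so that both clipped regions fall inside the ranges to be corrected.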
Each of these references is incorporated by reference in its entirety.
1. H. Kadu et al., “Coding of high-dynamic range video using segment-based reshaping,” U.S. Pat. No. 10,575,028.
2. G-M. Su et al., “Multiple color channel multiple regression predictor,” U.S. Pat. No. 8,811,490.
3. Q. Song et al., PCT Patent Application Ser. No. PCT/US2019/031620, “High-fidelity full reference and high-efficiency reduced reference encoding in end-to-end single-layer backward compatible encoding pipeline,” filed on May 9, 2019, published as WO 2019/217751.
4. B. Wen et al., “Inverse luma/chroma mappings with histogram transfer and approximation,” U.S. Pat. No. 10,264,287.
5. H. Kadu and G-M. Su, “Reshaping curve optimization in HDR coding,” U.S. Pat. No. 10,397,576.
6. G-M. Su et al., “Workload allocation and processing in cloud-based coding of HDR video,” U.S. Provisional Patent Application, Ser. No. 63/049,673, filed on Jul. 9, 2020, also filed on Jul. 9, 2021, as PCT/US2021/040967.
7. H. Kadu et al., “Recursive segment to scene segmentation for cloud-based coding of HDR video,” U.S. Provisional Patent Application, Ser. No. 63/080,255, filed on Sep. 18, 2020.
8. A. Ballestad and A. Kostin, “Method and apparatus for image data transformation,” U.S. Pat. No. 8,593,480.
Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control or execute instructions relating to trim-pass correction in cloud-based video coding of HDR video, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to trim-pass correction and node-based processing in cloud-based video coding of HDR video as described herein. The image and video dynamic range extension embodiments may be implemented in hardware, software, firmware and various combinations thereof.
Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement methods for trim-pass correction and node-based processing in cloud-based video coding of HDR video as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any non-transitory and tangible medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of non-transitory and tangible forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
Example embodiments that relate to trim-pass correction and node-based processing in cloud-based video coding of HDR video are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and what is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):
EEE1. A method for trim-pass correction for a video segment to be encoded in a computing node, the method comprising:
receiving in a current computing node a first video sequence comprising video frames in a high dynamic range;
generating (455) for each video frame in the first video sequence a first forward reshaping function and frame-based trim-pass correction parameters, wherein a forward reshaping function maps frame-pixels from the high dynamic range to a second dynamic range lower than the high dynamic range;
for a subscene in the first video sequence that is part of a parent scene to be processed by the current computing node and one neighbor computing node:
applying the output forward reshaping function to the frames in the subscene to generate an output video subscene in the second dynamic range; and
compressing the output video subscene to generate a coded subscene in the second dynamic range.
EEE3. The method of EEE 1 or EEE 2, wherein for a high-dynamic range (HDR) frame in the first video sequence, frame-based trim-pass correction parameters comprise one or more of:
an average luminance value of the HDR frame,
a first merging point in a low-luminance region of a standard-dynamic range (SDR) frame representing the same scene as the HDR frame but at the second dynamic range, and
a second merging point in a high-luminance region of the SDR frame.
EEE4. The method of EEE 3, wherein computing the scene-based trim-pass correction parameters comprises computing:
where B denotes a number of bumper frames, denotes a set of frames in the trim-pass parameter window, vavgj denotes an average luminance value of the j-th HDR frame, slm,j denotes a first merging point of the j-th HDR frame, shm,j denotes a second merging point of the j-th HDR frame, and and denote the scene-based trim-pass correction parameters.
EEE5. The method of EEE 4, further comprising assigning the scene-based trim-pass correction parameters to every subscene to be processed in the current computing node.
EEE6. The method of any of EEEs 3-5, wherein generating for an HDR frame in the subscene an output forward reshaping function comprises:
generating a reference tone-mapping function mapping pixel values from the high dynamic range to pixel values in the second dynamic range based at least on an average luminance value in the subscene and global minimum and maximum HDR and SDR luminance values; and
generating the output forward reshaping function by:
starting from the high-intensity merging point, replacing the scene-based forward reshaping function based on a corresponding section in the reference tone-mapping function.
EEE7. The method of EEE 6, wherein generating the output forward reshaping function comprises:
generating a differential tone-mapping function based on the reference tone-mapping function;
generating a differential scene-based forward reshaping function based on the scene-based forward reshaping function;
generating an updated differential scene-based forward reshaping function by replacing, starting from the high-intensity merging point, the differential scene-based forward reshaping function with a corresponding part of the differential tone-mapping function; and
generating the output forward reshaping function by integrating the updated differential scene-based forward reshaping function.
EEE8. The method of EEE 6 or EEE 7, wherein given an average HDR luminance value and an average SDR luminance value in the subscene, the reference tone-mapping function maps:
the average HDR luminance value to the average SDR luminance value;
a maximum global luminance in the high dynamic range to a maximum global luminance value in the second dynamic range; and
a minimum global luminance value in the high dynamic range to a minimum global luminance value in the second dynamic range.
EEE9. The method of any of EEEs 1-8, wherein for a subscene in the first video sequence that is part of a parent scene to be processed by the current computing node and one neighbor computing node:
if the neighbor computing node is prior to the current computing node, then the trim-pass parameter window comprises all (B) bumper frames before the first frame of the first video sequence (C0), up to, but not including, frame Cs=C0+B; and
if the neighbor computing node is subsequent to the current computing node, then
determining (462) a first trim-pass parameter window comprising frames in the first video sequence and bumper frames shared with the neighbor prior node;
determining a second trim-pass parameter window comprising frames in the first video sequence and bumper frames shared with the neighbor subsequent node;
computing a first set of scene-based trim-pass correction parameters based on the frame-based trim-pass correction parameters of the frames in the first trim-pass parameter window;
computing a second set of scene-based trim-pass correction parameters based on the frame-based trim-pass correction parameters of the frames in the second trim-pass parameter window;
computing interpolated frame-based trim-pass correction parameters based on the first set and second set of the scene-based trim-pass correction parameters; and
where denotes a trim-pass correction parameter in the first set of scene-based trim-pass correction parameters, denotes a trim-pass correction parameter in the second set of scene-based trim-pass correction parameters, ρx denotes an interpolated frame-based trim-pass correction parameter for the x-th frame between a0, the last frame of the first trim-pass parameter window, and an, the first frame of the second trim-pass parameter window.
EEE12. The method of EEE 11, wherein generating a scene-based trim-pass correction parameter (ρA) based on the interpolated scene-based trim-pass correction parameters comprises computing
wherein as denotes the starting frame of the subscene and ae denotes the starting frame of the next subscene.
EEE13. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing with one or more processors a method in accordance with any one of the EEEs 1-12.
EEE14. An apparatus comprising a processor and configured to perform any one of the methods recited in EEEs 1-12.
Number | Date | Country | Kind |
---|---|---|---|
20196876.5 | Sep 2020 | EP | regional |
20200781.1 | Oct 2020 | EP | regional |
This application claims the benefit of priority from U.S. Provisional Patent Application 63/080,255, filed on 18 Sep. 2020; European Patent Application 20196876.5, filed on 18 Sep. 2020; U.S. Provisional Patent Application 63/089,154, filed on 8 Oct. 2020 and European Patent Application Ser. No. 20200781.1, filed on 8 Oct. 2020, which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/050957 | 9/17/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63080255 | Sep 2020 | US | |
63089154 | Oct 2020 | US |