This application claims the benefit, under 35 U.S.C. § 365 of International Application PCT/US2018/063254, filed Nov. 30, 2018, which was published in accordance with PCT Article 21(2) on Jun. 6, 2019 in English and which claims the benefit of European patent application 17306671.3, filed Nov. 30, 2017.
The present disclosure relates generally to picture and video distribution in high-dynamic range (HDR) and more particularly to YUV saturation control for HDR adaptation.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Standard-Dynamic-Range pictures (SDR pictures) are color pictures whose luminance values are represented with a limited dynamic, usually measured in powers of two or f-stops. SDR pictures have a dynamic of around 10 f-stops, i.e. a ratio of about 1000 between the brightest pixels and the darkest pixels in the linear domain, and are coded with a limited number of bits (most often 8 or 10 in HDTV (High Definition Television systems) and UHDTV (Ultra-High Definition Television systems)) in a non-linear domain, for instance by using the ITU-R BT.709 OETF (Opto-Electronic Transfer Function). This limited non-linear representation does not allow correct rendering of small signal variations, in particular in dark and bright luminance ranges. In High-Dynamic-Range pictures (HDR pictures), the signal dynamic is much higher (up to 20 f-stops, a ratio of about one million between the brightest pixels and the darkest pixels) and a new non-linear representation is needed in order to maintain a high accuracy of the signal over its entire range. In HDR pictures, raw data are usually represented in floating-point format (either 32-bit or 16-bit for each component, namely float or half-float), the most popular format being openEXR half-float format (16-bit per RGB component, i.e. 48 bits per pixel) or in integers with a long representation, typically at least 16 bits.
By contrast, High Dynamic Range pictures (HDR pictures) are color pictures whose luminance values are represented with an HDR dynamic that is higher than the dynamic of an SDR picture. The HDR dynamic is not yet defined by a standard but one may expect a dynamic range up to a few thousand nits. For instance, an HDR color volume is defined by an RGB BT.2020 color space and the values represented in said RGB color space belong to a dynamic range from 0 to 4000 nits. Another example of an HDR color volume is defined by an RGB BT.2020 color space and the values represented in said RGB color space belong to a dynamic range from 0 to 1000 nits.
Tone mapping is a technique used in image processing and computer graphics to map one set of colors to another to approximate the appearance of high-dynamic-range images in a medium that has a more limited dynamic range. Tone mapping addresses the problem of strong contrast reduction from the scene radiance to the displayable range while preserving the image details and color appearance important to appreciate the original scene content.
One challenging problem is performing tone mapping in a way that true colors are represented during the distribution of an HDR picture (or video) while, at the same time, distributing an associated SDR picture (or video) representative of a color-graded version of said HDR picture (or video). Current prior art techniques are limited in the solutions they present. Consequently, improved techniques are needed for the distribution of a video or HDR image that is to be processed by a medium of a more limited dynamic range in a way that colors are reproduced correctly.
In one embodiment, a method is provided for generating a tone mapping function to reduce dynamic range of a first image to produce a second image. A luma signal and a plurality of chroma components associated with the first and second image are then determined. A gamut color correction is then performed on the second image using an adaptive function. The adaptive function is generated by comparing said luma signal and at least one chroma component of said first and second image.
In a different embodiment, a system is provided having processing means configured to generate a tone mapping function to reduce dynamic range of a first image to produce a second image. A luma signal and a plurality of chroma components associated with the first and second image are then determined using the processing means. A gamut color correction is then performed using the processing means on the second image using an adaptive function. The adaptive function is generated by comparing said luma signal and at least one chroma component of said first and second image.
Additional features and advantages are realized through the techniques of the present disclosure. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
The present disclosure will be better understood and illustrated by means of the following embodiment and execution examples, in no way limitative, with reference to the appended figures on which:
Wherever possible, the same reference numerals will be used throughout the figures to refer to the same or like parts.
It is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for purposes of clarity, many other elements found in typical digital multimedia content delivery methods and systems. However, because such elements are well known in the art, a detailed discussion of such elements is not provided herein. The disclosure herein is directed to all such variations and modifications. In addition, various inventive features are described below that can each be used independently of one another or in combination with other features. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
In general, chromaticity is an objective specification of the quality of a color regardless of its luminance and is often represented by two independent parameters, namely hue (h) and colorfulness (also called saturation or purity). The white point of an illuminant or of a display is a neutral reference characterized by a chromaticity and all other chromaticities may be defined in relation to this reference using polar coordinates. The hue is the angular component, and the purity is the radial component, normalized by the maximum radius for that hue.
A color picture also contains several arrays of samples (pixel values) in a specific picture/video format which specifies all information relative to the pixel values of a picture (or a video) and all information which may be used by a display and/or any other device to visualize and/or decode a picture (or video), for example. A color picture comprises at least one component, in the shape of a first array of samples, usually a luma (or luminance) component, and at least one other component, in the shape of at least one other array of samples. Or, equivalently, the same information may also be represented by a set of arrays of color samples (color components), such as the traditional tri-chromatic RGB representation. A pixel value is represented by a vector of C values, where C is the number of components. Each value of a vector is represented with a number of bits which defines a maximal dynamic range of the pixel values.
In step 10, a set of parameters SP is obtained to reconstruct the image I3. These parameters are either parameters P obtained from the bitstream B, or recovered parameters Pr when at least one parameter P is lost, corrupted or not aligned with a decoded image to which graphics or an overlay has been added. In step 11, a module M1 obtains the decoded image and, in step 12, a module M2 reconstructs the image I3 from the decoded image by using the set of parameters SP. The decoded image data is obtained from the bitstream (signal) B or any other bitstream and, possibly, said bitstreams may be stored on a local memory or any other storage medium. In sub-step 101 (of step 10), a module M3 obtains the parameters P required to reconstruct the image I3. In sub-step 102 (of step 10), a module M4 checks if at least one of the parameters P is lost, corrupted or not aligned with the decoded image to which graphics or an overlay has been added. When none of the parameters P is lost, corrupted, or not aligned with that decoded image, the set of parameters SP only comprises the parameters P.
When at least one of the parameters P is lost, corrupted or not aligned with the decoded image to which graphics or an overlay has been added, in sub-step 103 (of step 10), a module M5 obtains an information data ID indicating how said parameters have been processed, in sub-step 104 (of step 10), a module M6 selects a recovery mode RMi according to said information data ID, and in sub-step 105 (of step 10), a module M7 recovers said at least one lost, corrupted or not aligned parameter by applying the selected recovery mode RMi. The at least one recovered parameter Pr is added to the set of parameters SP. In step 12, the image I3 is then reconstructed taking also into account said at least one recovered parameter Pr. The method is advantageous because it makes it possible to obtain parameters for a single layer based distribution solution when multiple single layer based distribution solutions share a same set of syntax elements for carrying a common set of parameters and when said single layer based distribution solutions require different recovery modes (processes) for recovering lost, corrupted or not aligned parameters, thus guaranteeing the success of the reconstruction of the image I3 for each of said single layer based distribution solutions. The method is also advantageous when a CE device, typically a set-top-box or a player, inserts graphics on top of a decoded image, because the method selects a specific recovery mode to replace the not aligned parameters by parameters adapted to the decoded image I2 plus the graphics (or overlay) and reconstructs the image I3 by using said recovered parameters from said decoded image to which graphics or an overlay has been added, thus avoiding flickering artefacts or undesired effects impacting the reconstructed image quality.
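The control flow of step 10 (sub-steps 101 to 105) may be sketched as follows. This is a minimal illustration only, under stated assumptions: the function and type names are hypothetical, a lost parameter is modelled as a `None` value, and the corruption and alignment checks of sub-step 102 are not modelled.

```python
from typing import Callable, Dict, Optional

# All names below are hypothetical; the text defines steps and modules, not an API.
Params = Dict[str, Optional[float]]
RecoveryMode = Callable[[Params], Params]


def needs_recovery(params: Params) -> bool:
    """Sub-step 102 (simplified): a parameter is treated as lost when it is None.
    Corruption and misalignment checks are outside the scope of this sketch."""
    return any(value is None for value in params.values())


def obtain_parameter_set(
    params: Params,
    information_data_id: int,
    recovery_modes: Dict[int, RecoveryMode],
) -> Params:
    """Step 10: return the set of parameters SP used to reconstruct the image I3."""
    if not needs_recovery(params):
        # SP only comprises the parameters P obtained in sub-step 101.
        return dict(params)
    # Sub-step 104: select the recovery mode RMi from the information data ID
    # (assumed to have been obtained in sub-step 103).
    recovery_mode = recovery_modes[information_data_id]
    # Sub-step 105: apply the selected recovery mode.
    return recovery_mode(params)
```

Keeping the recovery modes in a table indexed by the information data ID reflects the idea that several single layer based distribution solutions share the same syntax elements but need different recovery processes.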
It should be noted that image data refer to one or several arrays of samples (pixel values) in a specific image/video format which specifies all information relative to the pixel values of an image (or a video) and all information which may be used by a display and/or any other device to visualize and/or decode an image (or video), for example. An image comprises a first component, in the shape of a first array of samples, usually representative of luminance (or luma) of the image, and a second and third component, in the shape of other arrays of samples, usually representative of the color (or chroma) of the image. Or, equivalently, the same information may also be represented by a set of arrays of color samples, such as the traditional tri-chromatic RGB representation. A pixel value is represented by a vector of C values, where C is the number of components. Each value of a vector is represented with a number of bits which defines a maximal dynamic range of the pixel values.
Standard-Dynamic-Range images (SDR images) are images whose luminance values are represented with a limited number of bits (typically 8). This limited representation does not allow correct rendering of small signal variations, in particular in dark and bright luminance ranges. In high-dynamic range images (HDR images), the signal representation is extended to maintain a high accuracy of the signal over its entire range. In HDR images, pixel values representing luminance levels are usually represented in floating-point format (typically at least 10 bits per component, namely float or half-float), the most popular format being openEXR half-float format (16-bit per RGB component, i.e. 48 bits per pixel) or in integers with a long representation, typically at least 16 bits.
The High Efficiency Video Coding (HEVC) standard (ITU-T H.265 Telecommunication standardization sector of ITU (10/2014), series H: audiovisual and multimedia systems, infrastructure of audiovisual services - coding of moving video, High efficiency video coding, Recommendation ITU-T H.265) enables the deployment of new video services with enhanced viewing experience, such as Ultra HD broadcast services. In addition to an increased spatial resolution, Ultra HD can bring a wider color gamut (WCG) and a higher dynamic range (HDR) than the Standard dynamic range (SDR) HD-TV currently deployed. Different solutions for the representation and coding of HDR/WCG video have been proposed (SMPTE 2014, “High Dynamic Range Electro-Optical Transfer Function of Mastering Reference Displays”, SMPTE ST 2084, 2014, or Diaz, R., Blinstein, S. and Qu, S., “Integrating HEVC Video Compression with a High Dynamic Range Video Pipeline”, SMPTE Motion Imaging Journal, Vol. 125, Issue 1, February 2016, pp. 14-21).
SDR backward compatibility with decoding and rendering devices is an important feature in some video distribution systems, such as broadcasting or multicasting systems. A solution based on a single layer coding/decoding process may be backward compatible, e.g. SDR compatible, and may leverage legacy distribution networks and services already in place. Such a single layer based distribution solution enables high quality HDR rendering on HDR-enabled Consumer Electronic (CE) devices, while also offering high quality SDR rendering on SDR-enabled CE devices. Such a single layer based distribution solution generates an encoded signal, e.g. an SDR signal, and associated metadata (of a few bytes per video frame or scene) that can be used to reconstruct another signal, e.g. an HDR signal, from a decoded signal, e.g. an SDR signal. Metadata store parameter values used for the reconstruction of the signal and may be static or dynamic. Static metadata means metadata that remains the same for a video (set of images) and/or a program.
Static metadata are valid for the whole video content (scene, movie, clip . . . ) and may not depend on the image content. They may define, for example, image format or color space, color gamut. For instance, SMPTE ST 2086:2014, “Mastering Display Color Volume Metadata Supporting High Luminance and Wide Color Gamut Images” is such a kind of static metadata for use in production environments. The Mastering Display Colour Volume (MDCV) SEI (Supplemental Enhancement Information) message is the distribution flavor of ST 2086 for both H.264/AVC (“Advanced video coding for generic audiovisual Services”, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.264, Telecommunication Standardization Sector of ITU, January 2012) and HEVC video codecs. Dynamic metadata are content-dependent, that is, metadata can change with the image/video content, e.g. for each image or for each group of images. As an example, the SMPTE ST 2094:2016 family of standards, “Dynamic Metadata for Color Volume Transform”, are dynamic metadata for use in production environments. SMPTE ST 2094-30 can be distributed along an HEVC coded video stream thanks to the Colour Remapping Information (CRI) SEI message.
Other single layer based distribution solutions exist on distribution networks for which display adaptation dynamic metadata are delivered along with a legacy video signal. These single layer based distribution solutions may produce HDR 10-bit image data (e.g. image data whose signal is represented as an HLG10 or PQ10 signal as specified in Rec. ITU-R BT.2100-0, “Image parameter values for high dynamic range television for use in production and international program exchange”) and associated metadata from an input signal (typically 12 or 16 bits), encode said HDR 10-bit image data using, for example, an HEVC Main 10 profile encoding scheme, and reconstruct a video signal from a decoded video signal and said associated metadata. The dynamic range of the reconstructed signal is adapted according to the associated metadata, which may depend on characteristics of a target display.
Dynamic metadata transmission in actual real-world production and distribution facilities is hard to guarantee, and the metadata may be lost or corrupted because of splicing, overlay layer insertion, professional equipment pruning the bitstream, stream handling by affiliates and the current lack of standardization for the carriage of metadata throughout the post-production/professional plant. The single layer based distribution solutions cannot work without the presence of various sets of dynamic metadata, some of which are critical for guaranteeing the success of the reconstruction of the video signal. Similar issues may also occur when dynamic metadata are not aligned with an image to which graphics or an overlay has been added. This occurs, for example, when graphics (overlays, OSD, . . . ) are inserted in (added to) an image outside the distribution chain, because the metadata, computed for said image, are also applied once the graphics are inserted in (added to) the image. The metadata are then considered as being not aligned with the image to which graphics or an overlay has been added because they may not be adapted to the part of said image which contains the graphics or overlay. These issues might be characterized by image flickering on fixed portions of graphics when the decoded image is displayed over time or by undesirable effects (saturation, clipping . . . ) on the portion of the image containing graphics or overlay processed with inappropriate metadata (e.g. a bright OSD processed with metadata generated for dark content).
Referring back to
The format adaptation of the above-mentioned steps 21, 22, 25 and 26 may also include color space conversion and/or color gamut mapping. Usual format adapting processes may be used such as RGB-to-YUV or YUV-to-RGB conversion, BT.709-to-BT.2020 or BT.2020-to-BT.709, down-sampling or up-sampling chroma components, etc. Note that the well-known YUV color space also refers to the well-known YCbCr in the prior art. Annex E of the recommendation ETSI TS 103 433 V1.1.1, release 2016-8, provides an example of format adapting processes and inverse gamut mapping (Annex D). The input format adaptation step 21 may also include adapting the bit depth of the original image I1 to a specific bit depth, such as 10 bits for example, by applying a transfer function on the original image I1. For example, a PQ or HLG transfer function may be used (Rec. ITU-R BT.2100-0).
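As an illustration of one of the usual format adapting processes named above, the following sketch converts between non-linear R′G′B′ and Y′CbCr using the non-constant-luminance luma coefficients of ITU-R BT.709 and BT.2020. It is not the process of Annex E of ETSI TS 103 433; the function names are hypothetical, values are assumed normalized to [0, 1] for R′G′B′ and Y′ and to [-0.5, 0.5] for Cb and Cr, and quantization to integer code values is not shown.

```python
# Luma coefficients (Kr, Kb) from ITU-R BT.709 and ITU-R BT.2020.
LUMA_COEFFS = {"bt709": (0.2126, 0.0722), "bt2020": (0.2627, 0.0593)}


def rgb_to_ycbcr(r, g, b, standard="bt709"):
    """Convert normalized non-linear R'G'B' to Y'CbCr (hypothetical helper)."""
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr, standard="bt709"):
    """Inverse of rgb_to_ycbcr for the same set of luma coefficients."""
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b
```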
In one embodiment, the pre-processing stage 20 comprises steps 200-202. In step 200, a first component c1 of the output image I12 is obtained by mapping a first component C1 of the original image I1:
c1=TM(C1)
with TM being a mapping function. The mapping function TM may reduce or increase the dynamic range of the luminance of the original image I1 and its inverse may increase or reduce the dynamic range of the luminance of an image.
In one embodiment, in step 201, second and third components u′, v′ of the output image I12 are derived by correcting second and third components U′, V′ of the original image I1 according to the first component c1. The correction of the chroma components may be kept under control by tuning the parameters of the mapping. The color saturation and hue are thus under control. According to an embodiment of step 201, the second and third components U′ and V′ are divided by a scaling function β0(c1) whose value depends on the first component c1. Mathematically speaking, the second and third components u′, v′ are given by:
u′=U′/β0(c1), v′=V′/β0(c1)
Optionally, in step 202, the first component c1 may be adjusted to further control the perceived saturation, as follows:
c=c1−max(0,a.u′+b.v′)
where a and b are two parameters of the set of parameters SP. This step 202 makes it possible to control the luminance of the output image I12 so as to guarantee the perceived color matching between the colors of the output image I12 and the colors of the original image I1. The set of parameters SP may comprise parameters relative to the mapping function TM or its inverse ITM, and to the scaling function β0(c1). These parameters are associated with dynamic metadata and carried in a bitstream, for example the bitstream B. The parameters a and b may also be carried in a bitstream.
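Steps 200 to 202 of the pre-processing stage may be sketched per pixel as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the mapping function TM and the scaling function β0 are passed in as callables because their actual shapes are defined by the set of parameters SP, not by this sketch.

```python
def preprocess_pixel(C1, U, V, tm, beta0, a, b):
    """Apply steps 200-202 to one pixel (hypothetical helper).

    C1, U, V : components of the original image I1
    tm       : mapping function TM (callable, assumed given)
    beta0    : scaling function, evaluated on c1 (callable, assumed given)
    a, b     : chroma-to-luma parameters of the set SP
    """
    c1 = tm(C1)                         # step 200: c1 = TM(C1)
    scale = beta0(c1)
    u = U / scale                       # step 201: chroma divided by beta0(c1)
    v = V / scale
    c = c1 - max(0.0, a * u + b * v)    # step 202: optional luma adjustment
    return c, u, v
```

For example, with a square-root placeholder for TM and a constant placeholder for β0, `preprocess_pixel(0.25, 0.4, -0.2, lambda x: x ** 0.5, lambda c1: 2.0, 0.5, 0.5)` yields components close to (0.45, 0.2, -0.1).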
In one embodiment, at the post-processing part, in step 10, a set of parameters SP is obtained as explained in
c1=c+max(0,a.u′+b.v′)
where a and b are two parameters of the set of parameters SP. In step 121, the first component C1 of the image I3 is obtained by inverse-mapping the first component c1:
C1=ITM(c1)
In step 122, the second and third components U′, V′ of the image I3 are derived by inverse correcting the second and third components u′, v′ of the decoded image according to the component c1. According to an embodiment, the second and third components u′ and v′ are multiplied by a scaling function β0(c1) whose value depends on the first component c1. Mathematically speaking, the second and third components U′, V′ are given by:
U′=β0(c1).u′, V′=β0(c1).v′
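The post-processing operations described above (the inverse luma adjustment, step 121 and step 122) may be sketched per pixel as follows. As before, the function name is hypothetical and the inverse mapping function ITM and the scaling function β0 are assumed to be available as callables derived from the set of parameters SP.

```python
def postprocess_pixel(c, u, v, itm, beta0, a, b):
    """Invert the pre-processing for one pixel (hypothetical helper).

    c, u, v : components of the decoded image
    itm     : inverse mapping function ITM (callable, assumed given)
    beta0   : scaling function, evaluated on c1 (callable, assumed given)
    a, b    : chroma-to-luma parameters of the set SP
    """
    c1 = c + max(0.0, a * u + b * v)    # undo the luma adjustment
    C1 = itm(c1)                        # step 121: C1 = ITM(c1)
    scale = beta0(c1)
    U = scale * u                       # step 122: chroma multiplied by beta0(c1)
    V = scale * v
    return C1, U, V
```

Because the same c1 is recovered before β0 is evaluated, the division of the pre-processing stage and the multiplication here cancel exactly when the parameters a, b and the functions are unchanged.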
According to a first embodiment of the method of
and the second and third component U′, V′ are derived by applying a pseudo-gammatization using square-root (close to BT.709 OETF) to the RGB components of the original image I1:
In step 200, the first component y1 of the output image I12 is obtained by mapping said linear-light luminance component L:
y1=TM(L)
In step 201, the second and third components u′, v′ of the output image I12 are derived by correcting the second and third components U′, V′ according to the first component y1. At the post-processing part, in step 121, a linear-light luminance component L of the image I3 is obtained by inverse-mapping the first component y1:
L=ITM(y1)
In step 122, the second and third components U′, V′ of the image I3 are derived by inverse correcting the second and third components u′, v′ of the output image I12 according to the first component y1. According to an embodiment of step 122, the second and third components u′ and v′ are multiplied by a scaling function β0(y1) whose value depends on the first component y1. Mathematically speaking, the second and third components U′, V′ are given by:
U′=β0(y1).u′, V′=β0(y1).v′
According to a second embodiment of the method of
and the second and third component U′, V′ by applying a gammatization to the RGB components of the original image I1:
where γ may be a gamma factor, preferably equal to 2.4. Note that the component Y′, which is a non-linear signal, differs from the linear-light luminance component L.
In step 200, the first component y′1 of the output image I12 is obtained by mapping said component Y′:
y′1=TM(Y′)
In step 121, a reconstructed component Ŷ′ is obtained by inverse-mapping the first component y′1:
Ŷ′=ITM(y′1)
where ITM is the inverse of the mapping function TM. The values of the reconstructed component Ŷ′ thus belong to the dynamic range of the values of the component Y′.
In step 201, the second and third components u′, v′ of the output image I12 are derived by correcting the second and third components U′, V′ according to the first component y′1 and the reconstructed component Ŷ′. This step (201) makes it possible to control the colors of the output image I12 and guarantees their matching to the colors of the original image I1. The correction of the chroma components may be maintained under control by tuning the parameters of the mapping (inverse mapping). The color saturation and hue are thus under control. Such control is usually not possible when a non-parametric perceptual transfer function is used. According to an embodiment of step 201, the second and third components U′ and V′ are divided by a scaling function β0(y′1) whose value depends on the ratio of the reconstructed component Ŷ′ over the component y′1:
with Ω being a constant value depending on the color primaries of the original image I1 (equal to 1.3 for BT.2020, for example). At the post-processing part, in step 121, a component Ŷ′ of the image I3 is obtained by inverse-mapping the first component y′1:
Ŷ′=ITM(y′1)
In step 122, the second and third components U′, V′ of the image I3 are derived by inverse correcting the second and third components u′, v′ of the decoded image according to the first component y′1 and the component Ŷ′. According to an embodiment of step 122, the second and third components u′ and v′ are multiplied by the scaling function β0(y′1). Mathematically speaking, the second and third components U′, V′ are given by:
U′=β0(y′1).u′, V′=β0(y′1).v′
The mapping function TM is based on a perceptual transfer function whose goal is to convert a component of an original image I1 into a component of an output image I12, thus reducing (or increasing) the dynamic range of the values of their luminance. The values of a component of an output image I12 thus belong to a lower (or greater) dynamic range than the values of the component of an original image I1. The perceptual transfer function uses a limited set of control parameters.
According to one embodiment based on the recommendation ETSI TS 103 433 V1.1.1, the dynamic metadata may be conveyed according to either a so-called parameter-based mode or a table-based mode. The parameter-based mode may be of interest for distribution workflows whose primary goal is to provide direct SDR backward compatible services with very low additional payload or bandwidth usage for carrying the dynamic metadata. The table-based mode may be of interest for workflows equipped with low-end terminals or when a higher level of adaptation is required for properly representing both HDR and SDR streams.
In the parameter-based mode, dynamic metadata to be conveyed are luminance mapping parameters representative of the inverse function ITM, i.e.
In the table-based mode, dynamic metadata to be conveyed are pivot points of a piece-wise linear curve representative of the inverse mapping function ITM. For example, the dynamic metadata are luminanceMappingNumVal that indicates the number of the pivot points, luminanceMappingX that indicates the x values of the pivot points, and luminanceMappingY that indicates the y values of the pivot points (see recommendation ETSI TS 103 433 V1.1.1 clauses 6.2.7 and 6.3.7 for more details). Moreover, other dynamic metadata to be conveyed may be pivot points of a piece-wise linear curve representative of the scaling function β0(⋅). For example, the dynamic metadata are colorCorrectionNumVal that indicates the number of pivot points, colorCorrectionX that indicates the x values of the pivot points, and colorCorrectionY that indicates the y values of the pivot points (see the recommendation ETSI TS 103 433 V1.1.1 clauses 6.2.8 and 6.3.8 for more details). These dynamic metadata may be conveyed using the HEVC Colour Remapping Information (CRI) SEI message whose syntax is based on the SMPTE ST 2094-30 specification (recommendation ETSI TS 103 433 V1.1.1 Annex A.4). Typical payload is about 160 bytes per scene. In step 102, the CRI (Colour Remapping Information) SEI message (as specified in the HEVC/H.265 version published in December 2016) is parsed to obtain the pivot points of a piece-wise linear curve representative of the inverse mapping function ITM, the pivot points of a piece-wise linear curve representative of the scaling function β0(⋅), and the chroma to luma injection parameters a and b.
In step 12, the inverse mapping function ITM is derived from the pivot points relative to a piece-wise linear curve representative of the inverse mapping function ITM (see recommendation ETSI TS 103 433 V1.1.1 clause 7.2.3.3 for more details). In step 12, the scaling function β0(⋅) is also derived from the pivot points relative to a piece-wise linear curve representative of the scaling function β0(⋅) (see recommendation ETSI TS 103 433 V1.1.1 clause 7.2.3.4 for more details). Note that static metadata also used by the post-processing stage may be conveyed by an SEI message. For example, the selection of either the parameter-based mode or the table-based mode may be carried by the Information (TSI) user data registered SEI message (payloadMode) as defined by the recommendation ETSI TS 103 433 V1.1.1 (clause A.2.2). Static metadata such as, for example, the color primaries or the maximum mastering display luminance are conveyed by a Mastering Display Colour Volume (MDCV) SEI message as defined in AVC, HEVC.
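Deriving a function such as ITM or β0(⋅) from pivot points in the table-based mode amounts to evaluating a piece-wise linear curve. The sketch below is illustrative only and does not reproduce clauses 7.2.3.3 and 7.2.3.4 of ETSI TS 103 433: the function name is hypothetical, the x values are assumed strictly increasing, and holding the end values outside the pivot range is an assumption of this sketch.

```python
from bisect import bisect_right


def piecewise_linear(xs, ys):
    """Build a callable from pivot points (xs[i], ys[i]) (hypothetical helper).

    xs and ys play the role of, e.g., luminanceMappingX and luminanceMappingY;
    their common length plays the role of luminanceMappingNumVal.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two pivot points with matching x and y")
    if any(x1 <= x0 for x0, x1 in zip(xs, xs[1:])):
        raise ValueError("x values must be strictly increasing")

    def curve(x):
        # Assumption of this sketch: clamp to the end values outside the range.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1               # segment containing x
        t = (x - xs[i]) / (xs[i + 1] - xs[i])     # position within the segment
        return ys[i] + t * (ys[i + 1] - ys[i])

    return curve
```

The same helper can serve for both curves, since the inverse mapping function and the scaling function are each conveyed as their own list of pivot points.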
According to an embodiment of step 103, the information data ID is explicitly signaled by a syntax element in a bitstream and thus obtained by parsing the bitstream. For example, said syntax element is a part of an SEI message. According to an embodiment, said information data ID identifies the processing applied to the original image I1 to process the set of parameters SP. According to this embodiment, the information data ID may then be used to deduce how to use the parameters to reconstruct the image I3 (step 12). For example, when equal to 1, the information data ID indicates that the parameters SP have been obtained by applying the pre-processing stage (step 20) to an original HDR image I1 and that the decoded image is an SDR image. When equal to 2, the information data ID indicates that the parameters have been obtained by applying the pre-processing stage (step 20) to an HDR 10-bit image (input of step 20), that the decoded image is an HDR10 image, and that the mapping function TM is a PQ transfer function. When equal to 3, the information data ID indicates that the parameters have been obtained by applying the pre-processing stage (step 20) to an HDR10 image (input of step 20), that the decoded image is an HLG10 image, and that the mapping function TM is an HLG transfer function applied to the original image I1.
According to an embodiment of step 103, the information data ID is implicitly signaled. For example, the syntax element transfer_characteristics present in the VUI of HEVC (annex E) or AVC (annex E) usually identifies a transfer function (mapping function TM) to be used. Because different single layer distribution solutions use different transfer functions (PQ, HLG, . . . ), the syntax element transfer_characteristics may be used to identify implicitly the recovery mode to be used.
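The implicit signaling may be pictured as a lookup from the VUI transfer characteristics code to an information data ID. The code points below are those of the HEVC/AVC VUI (1 for BT.709, 16 for PQ as in SMPTE ST 2084, 18 for HLG); pairing them with the ID values 1, 2 and 3 used in the explicit example is an assumption of this sketch, as is the function name.

```python
from typing import Optional

# VUI transfer_characteristics code points -> information data ID.
# The ID values mirror the explicit-signaling example; this pairing is an
# assumption made for illustration, not a mapping stated by the text.
TRANSFER_CHARACTERISTICS_TO_ID = {
    1: 1,    # BT.709 transfer function: decoded image is an SDR image
    16: 2,   # PQ (SMPTE ST 2084): decoded image is an HDR10 (PQ) image
    18: 3,   # HLG: decoded image is an HLG10 image
}


def infer_information_data_id(transfer_characteristics: int) -> Optional[int]:
    """Return the inferred ID, or None when no inference is possible."""
    return TRANSFER_CHARACTERISTICS_TO_ID.get(transfer_characteristics)
```

Returning `None` for unknown code points leaves room for the other signaling options, such as a service defined at a higher transport or system layer.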
The information data ID may also be signaled by a service defined at a higher transport or system layer. In accordance with another example, a peak luminance value and the color space of the image I3 may be obtained by parsing the MDCV SEI message carried by the bitstream, and the information data ID may be deduced from specific combinations of peak luminance values and color spaces (color primaries).
According to an embodiment of step 102, a parameter P is considered as being lost when it is not present in (not retrieved from) the bitstream. For example, when the parameters P are carried by SEI message such as the CVRI or CRI SEI messages as described above, a parameter P is considered as being lost (not present) when the SEI message is not transmitted in the bitstream or when the parsing of the SEI message fails. According to an embodiment of step 103, a parameter P is considered as being corrupted when at least one of the following conditions is fulfilled:
According to an embodiment of the method, a recovery mode RMi is to replace all the parameters P by recovered parameters Pr even if only some of the parameters P are not corrupted, lost or not aligned with the decoded image (whose graphics or overlay is added to). According to an embodiment of the method, another recovery mode RMi is to replace each lost, corrupted or not aligned parameter P by a recovered parameter Pr. According to an embodiment of the method, a recovery mode RMi is to replace a lost, corrupted or not aligned parameter P by a value of a set of pre-determined parameter values previously stored. For example, a set of pre-determined parameter values may gather a pre-determined value for at least one metadata carried by the CRI and/or CVRI SEI message. A specific set of pre-determined parameter values may be determined, for example, for each single layer based distribution solution identified by the information data ID.
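The difference between replacing all parameters and replacing only the faulty ones can be sketched as two small functions. The dictionary layout, the function names and the `is_valid` callback are hypothetical; the stored default values would come from a set of pre-determined parameter values such as the one selected by the information data ID.

```python
from typing import Callable, Dict, Optional


def recover_all(defaults: Dict[str, float]) -> Dict[str, float]:
    """Recovery mode that replaces every parameter P by a recovered value Pr."""
    return dict(defaults)


def recover_each(
    parameters: Dict[str, Optional[float]],
    defaults: Dict[str, float],
    is_valid: Callable[[str, float], bool],
) -> Dict[str, float]:
    """Recovery mode that replaces only lost or corrupted parameters."""
    recovered = {}
    for name, default in defaults.items():
        value = parameters.get(name)           # None models a lost parameter
        if value is None or not is_valid(name, value):
            recovered[name] = default          # substitute the stored value
        else:
            recovered[name] = value            # keep the received value
    return recovered
```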
Table 1 (below) provides for an illustration of a non-limitative example of specific set of predetermined values for 3 different single layer based distribution solutions.
According to Table 1, three different sets of predetermined values are defined according to the information data ID. These sets of predetermined values define recovered values for some parameters used by the post-processing stage. The other parameters are set to fixed values that are common to the different single layer solutions.
According to an embodiment of step 104, a recovery mode RMi is selected according to either at least one characteristic of the original video (image I1), typically the peak luminance of the original content, or of a mastering display used to grade the input image data or the image data to be reconstructed, or at least one characteristic of another video, typically the peak luminance of the reconstructed image I3, or of a target display.
According to an embodiment, a recovery mode RMi is to check if a characteristic of the original video (I1) or of a mastering display used to grade the input image data or the image data to be reconstructed (e.g. a characteristic as defined in ST 2086) is present and to compute at least one recovered parameter from said characteristic. If said characteristic of the input video is not present and a characteristic of a mastering display is not present, one checks if a characteristic of the reconstructed image I3 or of the target display is present (e.g. the peak luminance as defined in CTA-861.3) and computes at least one recovered parameter from said characteristic. If said characteristic of the reconstructed image I3 is not present and said characteristic of the target display is not present, at least one recovered parameter is a fixed value (e.g. fixed by a video standardization committee or an industry forum, such as, for example, 1000 cd/m2).
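The fallback order described above can be sketched as a simple cascade. The function and argument names are hypothetical, and peak luminance is used as the characteristic only because it is the example the text gives; `None` stands for a characteristic that is not present.

```python
from typing import Optional

DEFAULT_PEAK_LUMINANCE = 1000.0  # cd/m2, the example fixed value from the text


def select_peak_luminance(
    original_peak: Optional[float] = None,
    mastering_display_peak: Optional[float] = None,
    reconstructed_peak: Optional[float] = None,
    target_display_peak: Optional[float] = None,
) -> float:
    """Pick the characteristic from which recovered parameters are computed.

    Order: original video or mastering display first, then reconstructed
    image or target display, then the fixed default.
    """
    for candidate in (
        original_peak,
        mastering_display_peak,
        reconstructed_peak,
        target_display_peak,
    ):
        if candidate is not None:
            return candidate
    return DEFAULT_PEAK_LUMINANCE
```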
In accordance with a non-limitative example, Table 2 provides examples of recovery values for some parameters used by the post-processing stage that depends on the presence of available information on the input/output content and mastering/target displays.
The parameters matrix_coefficient_value[ i] may be set according to the input/output video color space, BT.709 or BT.2020 (characteristic of the input or output video) obtained by parsing a MDCV SEI/ST 2086 message if present. The recovery mode depends on said color spaces. The parameter shadow_gain_control may be computed according to a value obtained by parsing a MDCV SEI/ST 2086 message if present. For example, an information representative of the peak luminance of a mastering display is obtained from said MDCV SEI/ST 2086 message and the parameter shadow_gain_control is computed by (recovery mode 1):
It is likely that, at the service level or for a specific workflow, the value of hdrDisplayMaxLuminance is known. This value may also be set to the peak luminance of a target (presentation) display when this characteristic is available. Otherwise (recovery mode 2), it is arbitrarily set to a default value, typically 1000 cd/m2. This default value corresponds to a currently observed reference maximum display mastering luminance in most of the current HDR markets.
When an overlay has to be added to a decoded image, the decoded image is obtained (step 11) and, in step 60, a composite image I′2 is obtained by adding graphics (overlay) to the decoded image. The information data ID is then obtained (step 103), a recovery mode is selected (step 104) and the selected recovery mode RMi is applied (step 105) to obtain recovered parameters Pr. The image I3 is then reconstructed (step 12) from the recovered parameters Pr and the decoded image.
According to an embodiment, the parameters Pr are obtained by training a large set of images of different aspects (bright, dark, with logos and other alternative embodiments). Optionally (not shown in
Device 70 comprises following elements that are linked together by a data and address bus 71:
In accordance with an example, the battery 76 is external to the device. For each of the mentioned memories, the word «register» used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data). The ROM 73 comprises at least a program and parameters. The ROM 73 may store algorithms and instructions to perform techniques in accordance with the present principles. When switched on, the CPU 72 uploads the program into the RAM and executes the corresponding instructions.
RAM 74 comprises, in a register, the program executed by the CPU 72 and uploaded after switch-on of the device 70, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
In accordance with an example, the input video or an original image of an input video is obtained from a source. For example, the source belongs to a set comprising:
In accordance with examples, the bitstreams carrying the metadata are sent to a destination. As an example, one or both of these bitstreams are stored in a local or remote memory, e.g. a video memory (74), a RAM (74) or a hard disk (73). In a variant, at least one of the bitstreams is sent to a storage interface (75), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (75), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.
In accordance with other examples, the bitstream carrying the metadata is obtained from a source. For example, the bitstream is read from a local memory, e.g. a video memory (74), a RAM (74), a ROM (73), a flash memory (73) or a hard disk (73). In a variant, the bitstream is received from a storage interface (75), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (75), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.
In accordance with examples, device 70 being configured to implement the method as described above, belongs to a set comprising:
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and any other device for processing an image or a video or other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a computer readable storage medium. A computer readable storage medium can take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein is considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples of computer readable storage mediums to which the present principles can be applied, is merely an illustrative and not exhaustive listing as is readily appreciated by one of ordinary skill in the art: a portable computer diskette; a hard disk; a read-only memory (ROM); an erasable programmable read-only memory (EPROM or Flash memory); a portable compact disc read-only memory (CD-ROM); an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.
The instructions may form an application program tangibly embodied on a processor-readable medium. In addition, as provided in
The HDR-to-SDR decomposition process aims at converting the input linear-light 4:4:4 HDR, to an SDR compatible version (also in 4:4:4 format). The process also uses side information such as the mastering display peak luminance, color primaries, and the color gamut of the container of the HDR and SDR pictures. The HDR-to-SDR decomposition process generates an SDR backward compatible version from the input HDR signal, using an invertible process that guarantees a high quality reconstructed HDR signal.
The process is summarized in
The linear-light luminance L is then derived from the linear-light RGB signal as follows:
with A=[A1 A2 A3]T being the conventional 3×3 R′G′B′-to-Y′CbCr conversion matrix (e.g. BT.2020 or BT.709 depending on the colour space), A1, A2, A3 being 1×3 matrices.
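The luminance derivation can be illustrated numerically. The sketch below assumes that L is obtained by applying the first row A1 of the conversion matrix to the linear-light RGB triplet, which is the conventional reading of the definitions above; the dictionary and function names are hypothetical, and the coefficients are the standard BT.709 and BT.2020 luma weights.

```python
# First row A1 of the conventional conversion matrix for each colour space.
LUMA_COEFFS = {
    "BT.709": (0.2126, 0.7152, 0.0722),
    "BT.2020": (0.2627, 0.6780, 0.0593),
}


def linear_luminance(r: float, g: float, b: float, colour_space: str = "BT.2020") -> float:
    """Weighted sum of linear-light R, G, B using the row A1 (assumed form)."""
    a1 = LUMA_COEFFS[colour_space]
    return a1[0] * r + a1[1] * g + a1[2] * b
```

Because each set of weights sums to one, a neutral sample with R=G=B keeps its value as luminance.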
Then the linear-light luminance L is mapped to an SDR-like luma Ytmp, using the luminance mapping function:
Ytmp=LUTTM(L) (eq. 3)
In the next step, the chroma components are built as follows (step 3 of
Then the U and V values are derived as follows:
In the final step, a colour correction (step 4) is applied
where A2, A3 are made of the second and third rows of coefficients of the R′G′B′-to-Y′CbCr conversion matrix, β0 is the pre-processing colour correction LUT, and CLAMPUV is a clamping function defined by CLAMPUV(x)=min(max(x,−512),511) for a 10-bit YUV output.
The luma component is corrected as follows:
Ypre1=Ypre0−ν×max(0,a×Upre1+b×Vpre1) (eq. 7)
Where a and b are two pre-defined parameters, and which results in the output SDR signal Ypre1Upre1Vpre1.
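The clamping function and the luma correction of eq. 7 translate directly into code. This is a minimal sketch; the function names are hypothetical and any values passed for ν, a and b are illustrative, since the text only states that they are pre-defined.

```python
def clamp_uv(x: float) -> float:
    """CLAMPUV for a 10-bit YUV output: limit x to the range [-512, 511]."""
    return min(max(x, -512), 511)


def correct_luma_pre(y_pre0: float, u_pre1: float, v_pre1: float,
                     nu: float, a: float, b: float) -> float:
    """Eq. 7: subtract a chroma-dependent term from the luma component."""
    return y_pre0 - nu * max(0, a * u_pre1 + b * v_pre1)
```

Note that the correction is one-sided: when a×Upre1+b×Vpre1 is negative, the luma is left unchanged.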
The luminance mapping variables used in the JHDR Tone mapping curve are defined in SMPTE ST 2094-20.
In one embodiment, the HDR reconstruction process is the inverse of the HDR-to-SDR decomposition process. It applies the following steps for each pixel of the SDR picture made of three components SDRy, SDRcb, SDRcr. First the values Upost1 and Vpost1 are derived as follows:
The value Ypost1 is derived as follows:
Ypost1=SDRy[x][y]+ν×max(0,a×Upost1+b×Vpost1) (eq. 9)
This may be followed by a clipping to keep the value within the legacy signal range.
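The luma reconstruction can be sketched as the inverse of the pre-processing luma correction. The sketch assumes that the chroma-dependent term is added back to the SDR luma (the mirror image of eq. 7) and that the optional clipping uses a 10-bit range of 0 to 1023; the function name, the clipping bounds and the sample values are illustrative assumptions.

```python
def reconstruct_luma(sdr_y: float, u_post1: float, v_post1: float,
                     nu: float, a: float, b: float,
                     y_min: float = 0, y_max: float = 1023) -> float:
    """Eq. 9 (assumed additive form), followed by an optional clipping."""
    y_post1 = sdr_y + nu * max(0, a * u_post1 + b * v_post1)
    return min(max(y_post1, y_min), y_max)
```

With the same ν, a, b and chroma values on both sides, the value removed by eq. 7 is restored exactly, provided no clipping occurs.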
Where βp is the post-processing colour correction LUT, that depends directly on the pre-processing colour correction LUT β0.
First a value T is derived as:
T=k0×Upost1×Vpost1+k1×Upost1×Upost1+k2×Vpost1×Vpost1 (eq. 11)
where k0, k1, k2 are predefined values depending on the SDR colour gamut. The value S0 is then initialized to 0, and the following applies:
If (T≤1),S0 is set to Sqrt(1−T)
Else, Upost1 and Vpost1 are modified as follows:
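The derivation of T and S0 can be sketched as follows. The modification applied in the else branch does not appear in this text; the sketch assumes, as a labeled guess, that Upost1 and Vpost1 are rescaled by 1/Sqrt(T) so that the recomputed T equals 1 while S0 stays at 0. The function name and the values of k0, k1, k2 used in any call are hypothetical.

```python
import math
from typing import Tuple


def derive_s0(u: float, v: float, k0: float, k1: float, k2: float) -> Tuple[float, float, float]:
    """Return (S0, U, V) following eq. 11 and the test on T."""
    t = k0 * u * v + k1 * u * u + k2 * v * v   # eq. 11
    if t <= 1:
        return math.sqrt(1 - t), u, v
    # Assumption: rescale the chroma pair so that T becomes exactly 1.
    scale = 1 / math.sqrt(t)
    return 0.0, u * scale, v * scale
```

Since T is a quadratic form in (Upost1, Vpost1), scaling both by 1/Sqrt(T) divides T by itself, which is why this choice brings the pair back to the boundary T=1.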
In a next step, the values R1, G1, B1 are derived as follows:
where MY′CbCr-to-R′G′B′ is the conventional conversion matrix from Y′CbCr to R′G′B′.
The values R2, G2, B2 are derived from R1, G1, B1 as follows:
The output samples HDRR, HDRG, HDRB are derived from R2, G2, B2 as follows:
A clipping may be applied to limit the range of the output HDR signal.
A scanning of the RGB cube is performed, and each RGB sample is modified to reach a luminance of 1 cd/m2. Then the following applies:
The output sample YUVSDR, as described in the HDR-to-SDR decomposition process, is built with β0=βtest. Then an error in the Lab color space, errorab, between RGBsdr and RGBhdr is computed. This step is controlled by a parameter (saturation skew) that controls the color saturation of the derived SDR signal. The value err is then updated as follows:
err=err+errorab (eq. 16)
The final value β0[Y] corresponds to βtest giving the lowest cumulated err value among all the tested βtest values.
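The selection of β0[Y] amounts to an exhaustive search over the tested βtest values. The sketch below shows only that search structure; `error_fn` is a hypothetical stand-in for the Lab-space error errorab between RGBsdr and RGBhdr, and the candidate list and samples are supplied by the caller.

```python
from typing import Callable, Iterable, Optional, Sequence, TypeVar

Sample = TypeVar("Sample")


def best_beta(candidates: Iterable[float],
              samples: Sequence[Sample],
              error_fn: Callable[[float, Sample], float]) -> Optional[float]:
    """Return the candidate with the lowest cumulated error over all samples."""
    best, best_err = None, float("inf")
    for beta_test in candidates:
        err = 0.0
        for sample in samples:
            err += error_fn(beta_test, sample)   # eq. 16: err = err + errorab
        if err < best_err:
            best, best_err = beta_test, err
    return best
```

The search would be repeated for each luma value Y, with the samples taken from the RGB cube scan at the corresponding luminance.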
The LUTs β0 and βtest are linked by the following equation
Where K is a constant value.
For coding of the LUT βP, as can be seen, the derivation of these LUTs is not straightforward. In particular, it is not possible to apply this derivation process at the decoder to build the LUT βP. Instead, the following process is applied.
A set of pre-defined default LUTs βP_default[k], k=1 to N, is used. For instance, one LUT is defined for each triple (container colour gamut, content colour gamut, peak luminance). At the pre-processing side, an adjustment function fadj is built to map the LUT βP_default[k] as closely as possible to the real LUT βP, that is, such that
βP_cod[Y]=fadj[Y]×βP_default[k][Y] (eq. 18)
is as close as possible to βP_real[k][Y] for all Y values.
To limit the coding cost, the function fadj is modeled using pivot points of a piece-wise linear model.
Only these PWL pivot points are coded. The post-processing can then decode these points, build the function fadj and reconstruct the βP_cod LUT from the default LUT βP_default (identified thanks to the coded content characteristics parameters) and fadj by applying equation eq. 18.
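The decoder-side reconstruction from pivot points can be sketched as follows. The pivot representation as (Y, value) pairs, the constant extension outside the first and last pivot, and the function names are assumptions made for illustration; only the product of the adjustment function and the default LUT comes from eq. 18.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def build_f_adj(pivots: Sequence[Tuple[float, float]], size: int) -> List[float]:
    """Sample a piece-wise linear function, given sorted (Y, value) pivots."""
    xs = [p[0] for p in pivots]
    f_adj = []
    for y in range(size):
        if y <= xs[0]:
            f_adj.append(pivots[0][1])        # assumed: hold the first value
        elif y >= xs[-1]:
            f_adj.append(pivots[-1][1])       # assumed: hold the last value
        else:
            i = bisect_right(xs, y) - 1       # segment with xs[i] <= y < xs[i+1]
            (x0, v0), (x1, v1) = pivots[i], pivots[i + 1]
            f_adj.append(v0 + (v1 - v0) * (y - x0) / (x1 - x0))
    return f_adj


def reconstruct_beta_p(f_adj: Sequence[float], beta_default: Sequence[float]) -> List[float]:
    """Eq. 18: multiply the default LUT by the adjustment function, per Y."""
    return [f * b for f, b in zip(f_adj, beta_default)]
```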
From eq. 5 and eq. 8, it can be seen that if Upre1 and/or Vpre1 are clamped, then it is not possible to achieve the equality Ypre1=Ypost1 necessary for good reconstruction.
As seen previously, the β0 computation results in finding the best color correction by minimizing an error criterion over the RGB samples. To be sure that no clamping occurs in eq. 5, the following must hold for all RGB samples of the frame:
To resolve this situation, UV saturation could be avoided by forcing β0(Y0) to be high enough, but this would lead to a loss of saturation across the whole SDR image, which would be inefficient for most content and is therefore not satisfactory. However, for some specific content, it is necessary to decrease the SDR saturation in order to allow a correct HDR reconstruction.
The process for deriving the LUT β0, which is independent of the content, can be performed in the following manner. It applies in the container color gamut and takes into account the content color gamut. An overview of this process is given herein.
The process can be summarized as follows. For each luma value Y, the following steps are applied. The luminance is generated using the inverse function of LUTTM: L=invLUTTM[Y]. Then the best β0[Y] for luminance L (and therefore for luma Y) is identified as follows. Values βtest in a given pre-defined range are evaluated as follows. The cumulative error err associated to βtest is computed as follows.
In order to prevent the UV saturation, the function fadj is adapted for each frame in the following way:
A graphical depiction of the result is provided in
This algorithm needs a temporal stabilization which can be done, for instance, by:
fadj_corrected(t)=0.25*fadj_corrected+0.75*fadj_corrected(t−1)
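The temporal stabilization can be read as a first-order recursive filter. The sketch below assumes that the first term of the formula is the adjustment function computed for the current frame and that the second term is the stabilized function of the previous frame; the function name and the list representation of fadj are hypothetical.

```python
from typing import List, Sequence


def stabilise(f_adj_current: Sequence[float],
              f_adj_previous: Sequence[float],
              weight_current: float = 0.25) -> List[float]:
    """Blend the current frame's fadj with the previous stabilized fadj."""
    weight_previous = 1.0 - weight_current    # 0.75 with the default weight
    return [
        weight_current * cur + weight_previous * prev
        for cur, prev in zip(f_adj_current, f_adj_previous)
    ]
```

With these weights, a sudden change in the per-frame adjustment reaches the output gradually over several frames, which limits visible flicker in the SDR saturation.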
According to an embodiment illustrated in
According to an embodiment of the disclosure, the network is a broadcast network, adapted to broadcast still pictures or video pictures from device A to decoding devices including the device B.
In addition, this same process can be applied both to HDR codecs that have a Constant Luminance mode (already discussed) and to those that have a Non Constant Luminance mode. The latter computes an HDR luma Y′ instead of a linear-light luminance L from the linear-light RGB signal while maintaining compatibility with SLHDR1 post-processing. In one embodiment, this luma Y′ component is a weighted sum of gamma-compressed R′G′B′ components of a color video, depending on the input color gamut. In this way, the HDR chrominance (U′HDR, V′HDR) components are corrected to produce SDR chroma components.
The instructions may form an application program tangibly embodied on a processor-readable medium.
Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
In this way, as provided in flowchart depictions of
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Number | Date | Country | Kind |
---|---|---|---|
17306671 | Nov 2017 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/063254 | 11/30/2018 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/108899 | 6/6/2019 | WO | A |
Number | Date | Country | |
---|---|---|---|
20210183028 A1 | Jun 2021 | US |