Recently developed image and video file formats, such as high dynamic range (HDR) and wide color gamut (WCG) formats, enable the creation and display of video and image content with more realistic contrast, brightness, and color. To support these file formats, a number of new electro-optical transfer functions have been developed. An electro-optical transfer function (EOTF) defines the non-linear mapping from digital code values representing pixels of an image to the luminance output of a display displaying the image. Some commonly used mappings are based on gamma transfer functions or logarithmic transfer functions.
The non-linearity of the EOTF generally implies that, depending on the particular EOTF being used during a file format conversion, the precision (i.e., number of digital code values) available in different luminance ranges may vary. Thus, due to the unique properties of each EOTF, different EOTFs may be preferable for different applications requiring file format conversion (e.g., production, compression for delivery, display, etc.). For instance, an EOTF that uses a gamma mapping in shadow and mid-tone regions but a logarithmic mapping in brighter luminance regions may be useful for providing backward compatibility to non-HDR displays when delivering a single video stream to both HDR and non-HDR displays. However, an EOTF that provides an absolute mapping from digital code value to display value may be better suited for maintaining creative intent when going from production to display.
In one example, the present disclosure describes a device, computer-readable medium, and method for image format conversion using luminance-adaptive dithering. For instance, in one example, a method includes acquiring an image in a first format, wherein the first format is associated with a first electro-optical transfer function, identifying a second format to which to convert the image, wherein the second format is associated with a second electro-optical transfer function, and applying dithering to the image in the second format, based on an evaluation of a luminance-dependent metric against a predefined threshold, wherein the luminance-dependent metric is computed from at least one of the first electro-optical transfer function and the second electro-optical transfer function.
In another example, a device includes a processor and a computer-readable medium storing instructions which, when executed by the processor, cause the processor to perform operations. The operations include acquiring an image in a first format, wherein the first format is associated with a first electro-optical transfer function, identifying a second format to which to convert the image, wherein the second format is associated with a second electro-optical transfer function, and applying dithering to the image in the second format, based on an evaluation of a luminance-dependent metric against a predefined threshold, wherein the luminance-dependent metric is computed from at least one of the first electro-optical transfer function and the second electro-optical transfer function.
In another example, a computer-readable medium stores instructions which, when executed by a processor, cause the processor to perform operations. The operations include acquiring an image in a first format, wherein the first format is associated with a first electro-optical transfer function, identifying a second format to which to convert the image, wherein the second format is associated with a second electro-optical transfer function, and applying dithering to the image in the second format, based on an evaluation of a luminance-dependent metric against a predefined threshold, wherein the luminance-dependent metric is computed from at least one of the first electro-optical transfer function and the second electro-optical transfer function.
The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
In one example, the present disclosure provides a technique for image format conversion using luminance-adaptive dithering. As discussed above, a number of new electro-optical transfer functions (EOTFs) have been developed to support file formats such as high dynamic range (HDR) and wide color gamut (WCG) formats, which enable the creation and display of video and image content with more realistic contrast, brightness, and color. Due to the unique properties of each EOTF, different EOTFs may be preferable for different applications requiring file format conversion (e.g., production, compression for delivery, display, etc.).
A typical HDR video production and delivery chain may include a number of interoperability points at which the video is converted from a source EOTF to a different, destination EOTF. The bit-depth of the video (i.e., the total number of digital code values available) typically remains constant (e.g., ten bits per color component, for source and destination) during these conversions. If the precision of the destination EOTF is lower than the precision of the source EOTF in a given luminance range, video quality may be lost in that range. The loss may be most apparent in smooth regions of an image, where a smooth gradient in the source video may exhibit contouring or banding artifacts after conversion.
Examples of the present disclosure apply dithering (i.e., an intentionally applied form of noise used to minimize large-scale patterns such as color banding) in a luminance-adaptive manner when converting an image or video file from a source EOTF to a destination EOTF (i.e., where the bit-depth may remain constant). A luminance-dependent metric may be computed for both the source EOTF and the destination EOTF. In one example, dithering is applied if the luminance-dependent metric of the destination EOTF falls below a predefined threshold that is determined based on the visibility of luminance differences to the human eye. In another example, dithering is applied if the difference between the luminance-dependent metrics of the source and destination EOTFs exceeds a predefined threshold that is determined based on an acceptable level of precision loss.
Although examples of the disclosure are discussed within the context of “file formats,” which may imply that images are stored in files, it will be appreciated that these examples could also apply to image conversions in which no files are stored (e.g., such as real-time broadcasts). As such, any references to “file formats” also apply to formats that are not stored, unless stated otherwise.
To better understand the present disclosure,
In one example, the system 100 generally comprises an EOTF comparison module 102 and a dithering module 104. The EOTF comparison module 102 examines the respective EOTFs of the source and destination images (e.g., the first and second EOTFs), and, based on this examination, determines whether or not to perform dithering in any regions of the converted image. In one example, the examination is performed on a pixel-by-pixel basis. In a further example, each pixel may be further broken down into separate color components (e.g., red, green, and blue) for examination.
In one example, the EOTF comparison module 102 computes a luminance-dependent metric 106, such as a luminance-dependent precision P(L), for each pixel in the source image and each corresponding pixel in the destination image. This luminance-dependent metric 106 may be obtained for each pixel/image pair based on the EOTF of the file format of the image (e.g., source image or destination image). Then, the EOTF comparison module 102 may evaluate the luminance-dependent metric 106 against a predefined threshold 108 in order to determine whether dithering should be performed on the destination image.
Based on the determination of the EOTF comparison module 102, the dithering module 104 may apply dithering to one or more pixels of the destination image to adjust the luminance(s) and/or colors of the one or more pixels.
In one example, the bit-depth of the destination image is equal to the bit-depth of the source image.
To further aid in understanding the present disclosure,
The method 200 begins in step 202. In step 204, a pixel of the image is selected.
In step 206, a luminance-dependent metric is computed for the pixel for both the source EOTF and the destination EOTF. In one example, the luminance-dependent metric may be referred to as “luminance-dependent precision,” or P(L), where L denotes the surround luminance of the selected pixel. Thus, if the digital code value representing the luminance of the selected pixel in the source image is denoted as n, then the value of the surround luminance, L, for the selected pixel can be obtained as EOTFS(n), where EOTFS is the EOTF of the source image file format. In other examples, the value of the surround luminance, L, for the selected pixel may be computed as a weighted sum, a median value, a maximum value, or a minimum value of the luminance in a local neighborhood surrounding the selected pixel.
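The neighborhood-based options for the surround luminance can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `surround_luminance`, the square window, the `radius` parameter, and the use of an equal-weight mean as the "weighted sum" are all assumptions made for the example.

```python
import statistics

def surround_luminance(lum, row, col, radius=1, mode="mean"):
    """Estimate the surround luminance L for pixel (row, col).

    `lum` is a 2-D list of luminance values (cd/m^2). The neighborhood is a
    square window of the given radius, clipped at the image border
    (window shape and size are assumptions of this sketch).
    """
    rows, cols = len(lum), len(lum[0])
    window = [
        lum[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    if mode == "mean":      # equal-weight special case of a weighted sum
        return sum(window) / len(window)
    if mode == "median":
        return statistics.median(window)
    if mode == "max":
        return max(window)
    if mode == "min":
        return min(window)
    raise ValueError(f"unknown mode: {mode}")
```

Setting `radius=0` reduces every mode to the luminance of the selected pixel itself, which corresponds to the case where L is obtained directly from the source EOTF.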
Given the surround luminance, L, for the selected pixel, the luminance-dependent precision P(L) for the selected pixel may be computed as follows:
where EOTF−1 denotes the inverse EOTF (i.e., the digital code value associated with a given luminance). The EOTF may be the source EOTF (i.e., EOTFS) or the destination EOTF (i.e., EOTFD). In this case, the luminance-dependent precision P(L) can be described as a monotonically increasing function of the number of codewords per luminance unit of the selected pixel. Any given EOTF may have different levels of precision in different luminance ranges. For instance, the source EOTF may have a higher precision than the destination EOTF in a first luminance range, but the destination EOTF may have a higher precision than the source EOTF in a second luminance range.
In step 208, a predefined threshold is evaluated based on at least one of the luminance-dependent metrics (i.e., based on the luminance-dependent metric for the source EOTF and/or the luminance-dependent metric for the destination EOTF).
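As a rough illustration of "codewords per luminance unit," the sketch below approximates the precision as the slope of the inverse EOTF, scaled by the number of available code values. This does not reproduce EQN. 1; the finite-difference form, the function names, and the choice of a PQ (SMPTE ST 2084) curve and a pure power-law curve with a 100 cd/m2 peak as example EOTFs are assumptions made for the example.

```python
def pq_inverse_eotf(lum, peak=10000.0):
    """Normalized PQ (SMPTE ST 2084) signal in [0, 1] for luminance in cd/m^2."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    ym = (max(lum, 0.0) / peak) ** m1
    return ((c1 + c2 * ym) / (1 + c3 * ym)) ** m2

def gamma_inverse_eotf(lum, peak=100.0, gamma=2.4):
    """Normalized signal for a pure power-law display (zero black level)."""
    return (min(max(lum, 0.0), peak) / peak) ** (1 / gamma)

def precision(inverse_eotf, lum, bit_depth=10, rel_step=1e-3):
    """Approximate codewords per cd/m^2 at luminance `lum`.

    Uses a central finite difference of the inverse EOTF and scales the
    normalized signal to full-range integer codes (both are assumptions).
    """
    codes = (1 << bit_depth) - 1
    h = max(lum * rel_step, 1e-6)
    lo, hi = max(lum - h, 0.0), lum + h
    return codes * (inverse_eotf(hi) - inverse_eotf(lo)) / (hi - lo)
```

Evaluating `precision(gamma_inverse_eotf, 50.0)` and `precision(pq_inverse_eotf, 50.0)` at the same bit-depth shows how two EOTFs can allocate different numbers of codewords to the same luminance range.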
In one example, evaluation of the threshold includes determining whether the luminance-dependent metric of the destination EOTF, i.e., PD(L), is below a first threshold, TP(L). In one example, the value of the first threshold TP(L) is determined based on the visibility of luminance differences to the human eye. For instance, in one example, first threshold TP(L) may be obtained using a luminance-dependent just noticeable difference (JND) metric, where the JND metric indicates the smallest error in luminance that would be visible to the human eye. The JND metric in this case increases as the surround luminance of the selected pixel increases. In one example, assuming a JND metric, J(L), the first threshold TP(L) can be calculated as:
In another example, evaluation of the threshold includes determining whether the difference between the luminance-dependent metric of the source EOTF, i.e., PS(L), and the luminance-dependent metric of the destination EOTF, i.e., PD(L) at the surround luminance of the selected pixel is above a second threshold T(L). In one example, the difference between PS(L) and PD(L), i.e., ΔP(L), can be calculated as:
ΔP_{S→D}(L) = P_S(L) − P_D(L). (EQN. 3)
In this case, when ΔP(L) is greater than zero, this indicates a loss of precision when mapping the luminance of L from the source EOTF (i.e., EOTFS) to the destination EOTF (i.e., EOTFD). Conversely, when ΔP(L) is less than zero, this indicates that precision can be maintained. Since the computations of PS(L) and PD(L) account for the respective bit-depths of the source and destination file formats, ΔP(L) will indicate the difference in luminance precision even if the source file format's bit depth is different from the destination file format's bit depth.
In step 210, it is determined, based on the evaluation of the threshold in step 208, whether dithering should be applied to the selected pixel. For instance, in one example, if the first luminance-dependent metric of the destination EOTF, i.e., PD(L), is below the first threshold TP(L)—i.e., PD(L)<TP(L)—then it is determined that dithering should be applied to the selected pixel. In another example, if the difference between the luminance-dependent metric of the source EOTF and the luminance-dependent metric of the destination EOTF is greater than the second threshold T(L)—i.e., ΔP(L)>T(L)—then it is determined that dithering should be applied to the selected pixel. In the latter case based on the second threshold T(L), the value of T(L) may be set to zero in order to apply dithering aggressively when loss of precision is detected. In another example, rather than making a binary (e.g., yes/no) decision as to whether to apply dithering, the strength of the dithering may be increased or decreased as a function of ΔP(L).
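The two threshold tests and the non-binary variant can be sketched as below. The function names are hypothetical, and the linear, clamped mapping from ΔP(L) to a dithering strength (with its `gain` and `max_strength` parameters) is only one possible choice; the description above says only that the strength may vary as a function of ΔP(L).

```python
def below_visibility_threshold(p_dst, t_p):
    """First test: dither when P_D(L) < T_P(L)."""
    return p_dst < t_p

def precision_loss(p_src, p_dst):
    """EQN. 3: delta P(L) = P_S(L) - P_D(L)."""
    return p_src - p_dst

def exceeds_loss_threshold(p_src, p_dst, t=0.0):
    """Second test: dither when delta P(L) > T(L); T(L) = 0 is the aggressive setting."""
    return precision_loss(p_src, p_dst) > t

def dither_strength(p_src, p_dst, gain=0.25, max_strength=1.0):
    """Non-binary variant: strength grows linearly with the loss, clamped
    to [0, max_strength] (an assumed mapping, for illustration only)."""
    return min(max(gain * precision_loss(p_src, p_dst), 0.0), max_strength)
```

With this mapping, a pixel whose destination precision is at least as high as its source precision receives a strength of zero, which matches the binary decision with T(L) set to zero.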
Step 212 confirms whether dithering should be applied to the selected pixel. If it is confirmed in step 212 that dithering should be applied to the selected pixel, then the dithering is applied to the selected pixel in step 214. In one example, the dithering may be applied based on the difference between the luminance-dependent metric of the source EOTF, i.e., PS(L), and the luminance-dependent metric of the destination EOTF, i.e., PD(L) at the surround luminance of the selected pixel. The difference between the luminance-dependent metric of the source and the luminance-dependent metric of the destination may be calculated as discussed above in connection with EQN. 3.
Alternatively, if it is confirmed in step 212 that dithering should not be applied to the selected pixel, then the selected pixel is left alone in step 216.
In step 218, it is determined whether there are any pixels remaining in the image (i.e., any pixels for which a determination as to whether to apply dithering has not yet been made).
If it is determined in step 218 that there are pixels remaining in the image, then the method 200 returns to step 204 and selects a new pixel (i.e., a pixel for which a determination as to whether to apply dithering has not yet been made) of the image. The method 200 then proceeds as described above.
Alternatively, if it is determined in step 218 that there are no pixels remaining in the image, then the method 200 ends in block 220.
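The per-pixel loop of the method 200 might be sketched as follows. Several choices here are assumptions rather than part of the description: the dither is uniform noise of plus or minus half a code value added before rounding, the surround luminance is taken as the pixel's own luminance, the transfer functions operate on normalized signals in [0, 1], and the decision of steps 206 through 212 is supplied as a caller-provided predicate named `needs_dither`.

```python
import random

def convert_with_adaptive_dither(src_codes, src_eotf, dst_inverse_eotf,
                                 needs_dither, bit_depth=10, rng=None):
    """Convert source code values to destination code values, dithering
    only the pixels for which `needs_dither(luminance)` returns True."""
    rng = rng or random.Random(0)                 # fixed seed for repeatability
    max_code = (1 << bit_depth) - 1
    out = []
    for n in src_codes:                           # step 204: select a pixel
        lum = src_eotf(n / max_code)              # L = EOTF_S(n)
        target = dst_inverse_eotf(lum) * max_code # ideal (fractional) code
        if needs_dither(lum):                     # steps 206-212: decide
            target += rng.uniform(-0.5, 0.5)      # step 214: apply dither
        # step 216 (no dither) falls through to plain rounding
        out.append(min(max(int(round(target)), 0), max_code))
    return out
```

Because the noise is added only where the predicate is true, pixels in luminance ranges with no precision loss are quantized exactly as they would be in a conversion without dithering.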
Thus, in some examples, the method 200 ensures that application of dithering is limited to the regions of the image in which video quality is lost due to conversion of the image's file format. As a result, the amount of additional noise and/or distortion that is added to the image is limited.
In one example, where it is determined through operation of the method 200 that dithering should be performed on a pixel, the dithering may be applied separately to each color component (e.g., red, green, blue) of the pixel. In this case, separate surround luminance values—denoted as LR, LG, and LB—may be computed for each color component of the pixel. Steps 206-210 may then be performed separately for each color component of the pixel based on these surround luminance values, applying separate thresholds (e.g., TP(LR), TP(LG), TP(LB), T(LR), T(LG), T(LB)) to determine when and how much dithering should be applied. In another example, dithering may be applied to a pixel based on the aggregate luminance over all of its color components, which could be obtained as a weighted sum of the respective luminances of the individual color components. It should be noted that this technique is not limited to R,G,B color representations of the pixel. For instance, X,Y,Z tristimulus representations or other representations may be used instead and analyzed accordingly.
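The two alternatives (per-component decisions versus a single decision on an aggregate luminance) can be illustrated as below. The function names are hypothetical, and the default weights are the BT.2020 luminance coefficients, chosen only as a plausible example; the description does not specify which weights to use.

```python
def aggregate_luminance(l_r, l_g, l_b, weights=(0.2627, 0.6780, 0.0593)):
    """Weighted sum of per-component luminances (default weights are the
    BT.2020 luminance coefficients, an assumption of this sketch)."""
    w_r, w_g, w_b = weights
    return w_r * l_r + w_g * l_g + w_b * l_b

def per_component_decisions(component_lums, needs_dither):
    """Apply the same decision predicate separately to each color component.

    `component_lums` maps a component name (e.g. "R") to its surround
    luminance; `needs_dither` stands in for steps 206-210.
    """
    return {name: needs_dither(lum) for name, lum in component_lums.items()}
```

The per-component form works unchanged for other representations, since the component names are just dictionary keys (for instance "X", "Y", and "Z").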
Although not expressly specified above, one or more steps of the method 200 may include a storing, displaying and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the method can be stored, displayed and/or outputted to another device as required for a particular application. Furthermore, operations, steps, or blocks in
In one example, one or both of the source file format and the destination file format may be defined as “scene-referred,” as opposed, e.g., to “display-referred.” In a display-referred format, the digital code values that represent the image are mapped to absolute luminance values on a display (e.g., up to the capabilities of the display). By contrast, in a scene-referred format, the digital code values represent scene light, and the mapping to luminance values on a display will be scaled relative to the capabilities of the display. For example, the same code value may be mapped to 500 candela per square meter (cd/m2) on a 1,000 cd/m2 peak luminance display, but scaled to 1,000 cd/m2 on a 2,000 cd/m2 peak luminance display.
Typically, scene-referred file formats are defined in terms of an opto-electronic transfer function (OETF) that maps scene light to digital code values, rather than in terms of an EOTF. Thus, when converting to and/or from a scene-referred format, the computation of luminance-dependent precision P(L) (e.g., in accordance with step 206 of the method 200) may include an additional operation of deriving an EOTF from the OETF of the scene-referred file format (i.e., the source and/or destination format). In one example, derivation of the EOTF from the OETF may include applying the inverse OETF to recover the relative scene light and then applying a display-dependent opto-optical transfer function (OOTF) to map the scene light to the luminance capabilities of the display.
In one example, the EOTF is derived from the OETF based on an assumption of a specific reference display's (or virtual display's) capabilities (e.g., peak luminance of 1,000 cd/m2 with a DCI-P3 color gamut).
The method 300 begins in step 302. In step 304, an opto-electronic transfer function (OETF) is identified.
In step 306, the inverse of the OETF (i.e., the relative scene light values that correspond to each digital codeword) is computed.
In step 308, an opto-optical transfer function (OOTF) is computed from the display parameters of the reference display. The OOTF maps scene light values to display light values of the reference display.
In step 310, the EOTF for the OETF is derived as a function of the inverse of the OETF computed in step 306 and the OOTF computed in step 308. In one example, the EOTF comprises a mapping from the digital codewords of the OETF to the display light values of the reference display.
The method ends in step 312. The EOTF derived in accordance with the method 300 may be used to determine the luminance-dependent precision P(L) as discussed above.
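The steps of the method 300 can be sketched as a composition of two functions. The example below uses the hybrid log-gamma (HLG) OETF of ITU-R BT.2100 as the scene-referred format and a simplified, achromatic form of its OOTF with zero black level; the choice of HLG, the function names, and the single-channel simplification are assumptions made for illustration.

```python
import math

# HLG constants as defined in ITU-R BT.2100
A = 0.17883277
B = 1 - 4 * A
C = 0.5 - A * math.log(4 * A)

def hlg_inverse_oetf(signal):
    """Step 306: relative scene light in [0, 1] for a normalized HLG signal."""
    if signal <= 0.5:
        return signal * signal / 3.0
    return (math.exp((signal - C) / A) + B) / 12.0

def make_ootf(peak_luminance):
    """Step 308: build an achromatic OOTF from the reference display's peak
    luminance (cd/m^2), using the peak-dependent system gamma."""
    gamma = 1.2 + 0.42 * math.log10(peak_luminance / 1000.0)
    return lambda scene_light: peak_luminance * scene_light ** gamma

def derive_eotf(inverse_oetf, ootf):
    """Step 310: EOTF(signal) = OOTF(OETF^-1(signal))."""
    return lambda signal: ootf(inverse_oetf(signal))
```

Calling `derive_eotf(hlg_inverse_oetf, make_ootf(1000.0))` yields an EOTF for a 1,000 cd/m2 reference display; passing a different peak luminance to `make_ootf` adapts the same OETF to a different display.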
In another example, the EOTF may be derived from the OETF using a different set of assumptions. For example, if the capabilities of the display on which the scene-referred image format has been or will be viewed are known, then the EOTF can be adjusted to fit the capabilities of that display rather than a generic reference display.
In another example (e.g., where the destination file format is a scene-referred image format), dithering can be minimized by performing multiple conversions in which each conversion (and, therefore, each level of dithering) corresponds to a particular target display.
In another example, a conservative estimate for the luminance-dependent precision of the target EOTF, PD(L), may be obtained when the destination image file format is a scene-referred image format. This conservative estimate may be obtained by using the EOTF that provides the minimum value for PD(L) over a set of most likely EOTFs, given the particular scene-referred image file format.
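A minimal sketch of this conservative estimate: given one precision function per candidate EOTF, take the minimum at the luminance of interest. The function name and the representation of candidates as a name-to-callable mapping are assumptions made for the example.

```python
def conservative_precision(candidates, lum):
    """Return (name, P_D(L)) for the candidate EOTF with the lowest
    precision at luminance `lum`.

    `candidates` maps a label for each likely EOTF to a function that
    returns its luminance-dependent precision at a given luminance.
    """
    name = min(candidates, key=lambda k: candidates[k](lum))
    return name, candidates[name](lum)
```

Using the minimum means that dithering is triggered whenever any of the likely displays would lose precision, at the cost of possibly dithering more than a specific display requires.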
In another example, if the transfer function associated with the source image file format is an EOTF, and the transfer function associated with the destination image file format is an OETF, then the source image can be pre-analyzed to determine the peak luminance and color gamut information.
In another example, an OOTF may be determined based on available metadata describing the peak luminance and color gamut of the reference display.
As depicted in
The hardware processor 402 may comprise, for example, a microprocessor, a central processing unit (CPU), or the like. The memory 404 may comprise, for example, random access memory (RAM), read only memory (ROM), a disk drive, an optical drive, a magnetic drive, and/or a Universal Serial Bus (USB) drive. The dithering module 405 may include circuitry and/or logic for performing special purpose functions relating to performing image format conversion using luminance-adaptive dithering. The input/output devices 406 may include, for example, a camera, a video camera, storage devices (including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive), a receiver, a transmitter, a display, an output port, or a user input device (such as a keyboard, a keypad, a mouse, and the like).
Although only one processor element is shown, it should be noted that the general-purpose computer may employ a plurality of processor elements. Furthermore, although only one general-purpose computer is shown in the Figure, if the method(s) as discussed above is implemented in a distributed or parallel manner for a particular illustrative example, i.e., the steps of the above method(s) or the entire method(s) are implemented across multiple or parallel general-purpose computers, then the general-purpose computer of this Figure is intended to represent each of those multiple general-purpose computers. Furthermore, one or more hardware processors can be utilized in supporting a virtualized or shared computing environment. The virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices. In such virtualized virtual machines, hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented.
It should be noted that the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable logic array (PLA), including a field-programmable gate array (FPGA), or a state machine deployed on a hardware device, a general purpose computer or any other hardware equivalents, e.g., computer readable instructions pertaining to the method(s) discussed above can be used to configure a hardware processor to perform the steps, functions and/or operations of the above disclosed method(s). In one example, instructions and data for the present dithering module or process 405 for performing image format conversion using luminance-adaptive dithering (e.g., a software program comprising computer-executable instructions) can be loaded into memory 404 and executed by hardware processor element 402 to implement the steps, functions or operations as discussed above in connection with the example methods 200 and 300. Furthermore, when a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.
The processor executing the computer readable or software instructions relating to the above described method(s) can be perceived as a programmed processor or a specialized processor. As such, the present dithering module 405 (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette and the like. More specifically, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.
While various examples have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred example should not be limited by any of the above-described examples, but should be defined only in accordance with the following claims and their equivalents.
This application is a continuation of U.S. patent application Ser. No. 15/914,355, filed on Mar. 7, 2018, now U.S. Pat. No. 10,832,613, which is herein incorporated by reference in its entirety. The present disclosure relates generally to digital media distribution, and relates more particularly to devices, non-transitory computer-readable media, and methods for converting image and video formats using luminance-adaptive dithering to reduce visual artifacts.
Number | Date | Country |
---|---|---|
20210056886 A1 | Feb 2021 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 15914355 | Mar 2018 | US |
Child | 17092892 | | US |