This invention relates to processing a video signal from a source, to convert from a high dynamic range (HDR) to a signal usable by devices having a lower dynamic range.
High dynamic range (HDR) video is starting to become available. HDR video has a dynamic range, i.e. the ratio between the brightest and darkest parts of the image, of 10000:1 or more. Dynamic range is sometimes expressed in “stops”, which is the logarithm to base 2 of the dynamic range. A dynamic range of 10000:1 therefore equates to 13.29 stops. The best modern cameras can capture a dynamic range of 13.5 stops and this is improving as technology develops.
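The relationship between a contrast ratio and its expression in stops can be sketched in a few lines of Python (illustrative only; the function name is ours):

```python
import math

def stops(dynamic_range: float) -> float:
    """Express a linear dynamic range (e.g. 10000 for 10000:1) in stops."""
    return math.log2(dynamic_range)

# log2(10000) is approximately 13.29 stops, as noted above;
# an SDR display at 100:1 offers only about 6.6 stops.
```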
Conventional televisions (and computer displays) have a restricted dynamic range of about 100:1. This is sometimes referred to as standard dynamic range (SDR).
HDR video provides a subjectively improved viewing experience. It is sometime described as an increased sense of “being there” or alternatively as providing a more “immersive” experience. For this reason many producers of video would like to produce HDR video rather than SDR video. Furthermore since the industry worldwide is moving to HDR video, productions are already being made with high dynamic range, so that they are more likely to retain their value in a future HDR world.
At present HDR video may be converted to SDR video through the process of “colour grading” or simply “grading”. This is a well-known process, of long heritage, in which the colour and tonality of the image is adjusted to create a consistent and pleasing look. Essentially this is a manual adjustment of the look of the video, similar in principle to using domestic photo processing software to change the look of still photographs. Professional commercial software packages are available to support colour grading. Grading is an important aspect of movie production: movies, which are produced in relatively high dynamic range, are routinely graded to produce SDR versions for conventional video distribution. However the process of colour grading requires the use of a skilled operator, is time consuming and therefore expensive. Furthermore it cannot be used on “live” broadcasts such as sports events.
HDR still images may be converted to SDR still images through the process of “tone mapping”. Conventional photographic prints have a similar, low, dynamic range to SDR video. There are many techniques in the literature for tone mapping still images. However these are primarily used, with user intervention in the same style as colour grading, to produce an artistically pleasing SDR image. There is no one accepted tone mapping algorithm that can be used automatically to generate an SDR image from an HDR one. Furthermore many tone mapping algorithms are computationally complex, rendering them unsuitable for real time video processing.
Attempts have been made to adapt still image tone mapping algorithms for application to video. However these tend to suffer from a fundamental problem of inconsistency across time. Conventional still image tone mapping produces an image-dependent mapping of the input HDR image to the output SDR image. Consequently the mapping changes according to the image content. This is unsuitable for video processing, where it is necessary to maintain the same mapping for objects in a scene as they move, change orientation, move in and out of shadows and appear and disappear from the scene. Therefore for video processing a static, i.e. image independent, mapping is required. Conventional still image tone mapping algorithms do not provide such a static mapping of HDR to SDR.
Various attempts have been made to convert between HDR video signals and signals usable by devices using lower dynamic ranges (for simplicity referred to as standard dynamic range (SDR)). One such approach is to modify an opto-electronic transfer function (OETF).
The Rec 709 OETF may be written as:

V = 4.5L for 0 ≤ L < 0.018

V = 1.099L^0.45 − 0.099 for 0.018 ≤ L ≤ 1

where:

L is the luminance of the image, 0 ≤ L ≤ 1

V is the corresponding electrical signal.

Note that although the Rec 709 characteristic is defined in terms of the power 0.45, overall, including the linear portion of the characteristic, the characteristic is closely approximated by a pure power law with exponent 0.5.
Combined with a display gamma of 2.4 this gives an overall system gamma of 1.2. This deliberate overall system non-linearity is designed to compensate for the subjective effects of viewing pictures in a dark surround and at relatively low brightness. This compensation is sometimes known as “rendering intent”. The power law of approximately 0.5 is specified in Rec 709 and the display gamma of 2.4 is specified in ITU Recommendation BT.1886 (hereafter Rec 1886). Whilst the above processing performs well in many systems, improvements are desirable for signals with extended dynamic range.
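For reference, the Rec 709 characteristic discussed above can be sketched as follows; the constants are the standard Rec 709 values (linear segment below 0.018, power 0.45 above):

```python
def rec709_oetf(luminance: float) -> float:
    """Rec 709 OETF: linear near black, power-law (exponent 0.45) elsewhere."""
    if luminance < 0.018:
        return 4.5 * luminance
    return 1.099 * luminance ** 0.45 - 0.099

# Overall the curve is close to a pure square root, which combined with
# a display gamma of 2.4 gives the system gamma of about 0.5 * 2.4 = 1.2.
```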
The arrangement shown in
We have appreciated that the dynamic range of the video signal may be increased by using alternative OETFs such as those mentioned, or other OETF, but that this can cause consequential problems in relation to other qualities of the video signal. We have further appreciated the need to maintain usability of video signals produced by HDR devices with equipment having lower than HDR dynamic range. We have further appreciated the need to avoid undesired colour changes when processing an HDR signal to provide usability with existing standards.
The invention is defined in the claims to which reference is directed.
In broad terms, the invention provides conversion of a video signal from a high dynamic range source to produce a signal usable by devices of a lower dynamic range, involving a function that compresses a luminance component in a manner that depends upon the maximum allowable luminance, in the lower dynamic range scheme, for the corresponding colour component of each pixel.
An embodiment of the invention provides advantages as follows. The separation into luminance and colour components prior to compression of luminance ensures that relative amounts of colour as represented in the source signals (such as RGB) do not alter as a result of the compression. This ensures that colours are not altered by the processing.
The use of a compression function that depends upon the maximum allowable luminance in the lower dynamic range scheme for the corresponding colour, that is the ratios of the colour components, of each pixel ensures that a given luminance value for a colour in the source signal may be modified in such a manner that it does not exceed (and therefore hard clip at) that which is possible in the target scheme.
The dependence on the maximum allowable brightness is preferably that the compression function has a maximum output for a given colour that is the maximum luminance output for that colour in the target scheme. This allows the full range of the target scheme to be used whilst ensuring that the brightness of all colours is altered appropriately to avoid perceptible colour shifts.
The compression function applied to the luminance component of each pixel is reversible in the sense that each output value may be converted back to a unique input value. This allows a target device that is capable of delivering HDR to operate a reverse process (decompression) so that the full HDR range is delivered. This reversibility may be achieved by use of a curve function that has a continuous, positive, non-zero gradient between the black and white points.
The compression applied to the luminance components may be provided as a single process or separated into a compression function and a limiting function. The compression function in such an arrangement may generate values outside the legal range of the target scheme. Accordingly, the limiting function serves the purpose of ensuring output signals remain within a legal range of the target scheme. Example compression functions include power laws, log functions or combinations of these with a linear portion. Preferably, the limiting function includes a linear portion for lower luminance values and log portion for higher luminance values. This ensures that darker parts of a scene are unaltered by the process, but brighter parts of a scene are modified so as to bring the luminance values into a tolerable dynamic range without altering colours.
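One possible shape for such a limiting function, with a linear portion below a knee and a logarithmic portion above it, might be sketched as follows (the knee and softness values are illustrative choices of ours, not values from the specification):

```python
import math

def soft_limit(y: float, knee: float = 0.8, softness: float = 0.25) -> float:
    """Linear below the knee, logarithmic above it.

    Values up to `knee` pass through unchanged (darker parts of the scene
    are unaltered); values above the knee are compressed logarithmically,
    with a gradient that matches at the knee so the curve is smooth.
    """
    if y <= knee:
        return y
    return knee + softness * math.log1p((y - knee) / softness)
```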
The conversion function may be implemented using dedicated hardware components for each of the processing steps, but preferably the conversion function is implemented using a three dimensional look up table (3D-LUT). Such a 3D-LUT may be pre-populated using calculations according to the invention such that an input signal comprising separate components may be converted to an output signal of separate components, but in which each of the output components is a function of all three input components. This is the nature of a 3D-LUT. The conversion function may also be implemented as separate modules. Such separate modules may themselves comprise look up tables.
One implementation of the limiting function is preferably a two dimensional look up table (2D-LUT). Such a two dimensional look up table would comprise the two dimensions of colour space, providing an output value that is the maximum luminance for each such colour in the two dimensional colour space. Further aspects may also be implemented as look up tables; for example the compression function may be a one dimensional look up table applied prior to the two dimensional limiting function.
Alternatively, the individual parts of the HDR to SDR conversion may be implemented arithmetically, e.g. with floating point inputs. The preferred implementation of the components would be as LUTs, where the bit depth is sufficiently small to permit this. As already noted, overall the components may be subsumed into a single 3D LUT which is the preferred implementation.
The invention will be described in more detail by way of example with reference to the accompanying drawings, in which:
The invention may be embodied in a method of processing video signals to convert between higher dynamic range and lower dynamic range compatible signals, devices for performing such conversion, transmitters, receivers and systems involving such conversion.
An embodiment of the invention will be described in relation to a processing step which may be embodied in a component within a broadcast chain. The component may be referred to as a pre-processor for ease of discussion, but it is to be understood as a functional module that may be implemented in hardware or software within another device or as a standalone component. A corresponding post-processor may be used later in the broadcast chain such as within a receiver or within an HDR display. In both cases, the function may be implemented as a 3D look up table. Some background relating to HDR video will be repeated for ease of reference.
An embodiment of the invention addresses two impediments to the wider adoption of high dynamic range (HDR) video. Firstly it is necessary to convert HDR video to signals recognisable as standard dynamic range (SDR) so that they may be distributed via conventional video channels using conventional video technology. Secondly a video format is needed that will allow video to be produced using existing infrastructure, video processing algorithms, and working practices. To address both these requirements, and others, it is necessary to convert HDR video into SDR video algorithmically, hence allowing automatic conversion.
A key difference between HDR images and SDR images is that the former support much brighter “highlights”. Highlights are bright parts of the image, such as specular reflections from objects, e.g. the image of the sun reflected in a chrome car bumper (automobile fender). In converting from HDR to SDR, for example during grading, a key process is to “compress” the highlights. That is, the amplitude of the highlights is reduced while minimising the effect on the rest of the image. The embodiment therefore provides for the automatic reduction in the amplitude of image highlights.
One way to reduce the dynamic range of an image is to apply a compressive, non-linear transfer function to each of the colour components (RGB) of the image. This is the situation of known arrangements as shown in the arrangement of
A compressive transfer function is a “convex” function, which in this context means a function in which the gradient decreases as the input argument increases. Furthermore such a compressive function should be strictly positive for positive arguments (because light amplitude, i.e. luminance, is strictly positive, you can't have negative photons). So an example of a compressive function might be: output=natural logarithm(input+1.0).
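The decreasing-gradient property of the example function can be checked numerically (a quick sketch; `gradient` is a simple central-difference helper of our own):

```python
import math

def compress(x: float) -> float:
    """The example compressive function from the text: ln(input + 1.0)."""
    return math.log(x + 1.0)

def gradient(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the gradient of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The gradient decreases as the input grows (compressive/"convex" in the
# sense used here), and the output is strictly positive for positive input.
```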
Examples of compressive functions are those already shown in
Unfortunately simply applying a compressive function to each of the colour components in the manner of
The embodiment provides a static mapping from HDR video to SDR video, that is, one in which the mapping is independent of picture content. Furthermore it may be implemented using simple hardware, a 3D lookup table (LUT) to implement the pre-processor or post-processor, such 3D-LUTs being already present in a high proportion of video displays. 3D-LUTs may also be purchased, at low cost, for professional video (i.e. using conventional serial digital interfaces (SDI)). The embodiment implements a conversion of HDR video to SDR compatible video independently of the scene content. It also provides a complementary restoration of the SDR compatible video produced from an HDR original back to HDR. That is, the conversion is reversible.
The overall process will first be described in relation to
We will first describe the arrangement shown in
An RGB to YCbCr converter 12 and corresponding converters 14 and 16 to convert back to RGB may be provided as part of a transmission channel. A standard definition display 20 contains an EOTF function such as Rec 1886 corresponding to Rec 709 which is capable of rendering an appropriate representation of the original HDR signal on the SDR display. It is the use of the pre-processor 40 that ensures an appropriate image is displayable. If the receiver has an HDR display 18 having an appropriate corresponding HDR EOTF, a post-processor 42 is provided to reverse the processing undertaken in the pre-processor 40 to recover the original RGB HDR signal to take advantage of the full dynamic range for display.
Some particular features of the arrangement of
The pre-processor 40 (
In order to convert the input RGB to Yu′v′ the signal is first converted to the CIE 1931 XYZ colour space. Because the input signal is derived from linear light via a (non-linear) OETF, the RGB components are first transformed back to linear using the inverse of the OETF in RGB to linear module 67. The conversion to XYZ may then simply be performed, as is well known in the literature, by pre-multiplying the RGB components (as a vector) by a 3×3 conversion matrix. The RGB to XYZ converter 60 receives the linear RGB signals and converts to XYZ format. At this stage, the XYZ signals represent the full dynamic range of linear RGB HDR signals. An XYZ to u′v′ converter 62 receives the XYZ signals and provides an output in u′v′ colour space. Separately the luminance component Y is provided to a compressor 61 which applies a compressive function, in the sense previously described, to reduce the range. This may also be referred to as “companding”. The companding applied may be similar to the “Knee” function shown in
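The conversion path just described (linear RGB to XYZ, then to Y plus u′v′) might be sketched as below; the matrix coefficients are the standard BT.709 RGB-to-XYZ values, which may differ from those chosen in a particular implementation:

```python
def rgb_to_Yuv_prime(r: float, g: float, b: float):
    """Linear BT.709 RGB -> CIE 1931 luminance Y plus u'v' chromaticity."""
    # Standard BT.709 RGB -> XYZ matrix (pre-multiplication of the RGB vector).
    X = 0.4124 * r + 0.3576 * g + 0.1805 * b
    Y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    Z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    # CIE 1976 u'v' chromaticity coordinates.
    d = X + 15 * Y + 3 * Z
    return Y, 4 * X / d, 9 * Y / d
```

For reference-white input (1, 1, 1) this yields Y close to 1 and the D65 white point chromaticity.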
The luminance component Y may be further modified to allow for viewing conditions such as by adding a black offset and applying a system gamma (described later). Such modifications to the luminance Y are applied to that luminance rather than separately to the RGB components as previously described to avoid changing colour saturation.
A compression function of the type applied by the compressor module 61 is shown in
The effect of the modifications may be to generate values that are outside the legal range 0 to 1 of RGB when the signal is converted back to RGB format. Accordingly, the luminance component is soft clipped to ensure the final RGB signal remains within its legal range. Referring back to
YMAX is provided to a limiter function 64 which receives the luminance component of the signal and, for each pixel, limits the luminance component based on the colour of that component to provide an output signal YPRACTICAL.
The limiter function is conceptually shown in
Referring back to
At an SDR receiver, the RGB signals may be used directly using a Rec 1886 EOTF. At an HDR receiver, the inverse of the process of
The compatibility of the RGB output from the pre-processor may be understood by referring back again to
The path from the HDR camera to an SDR display will now be considered. Recall that the RGB signal provided from the HDR device 10 has been provided according to a particular OETF. The first stage of the pre-processor reverses the camera OETF to generate linear RGB and then the luminance component. The luminance values could go beyond those displayable on an SDR display, and so the soft clipping provided by the compressive limiter function ensures the final RGB signal remains within its legal range and conceptually modifies the luminance component such that it falls within an allowable range 0 to 1 for an SDR display, but without particular modification to the shape of the signal versus luminance curve. At the output, a Rec 709 OETF is used, so the signal provided looks to a receiver like SDR Rec 709 and can be displayed at the receiver using a normal SDR EOTF.
The choice of OETF does not particularly impact the operation of an embodiment of the invention because, whatever the input, the first step is effectively conversion to linear light (i.e. no OETF) with sufficient precision (i.e. enough bits) to avoid artefacts. This is, potentially, a practical scenario because the embodiment might be used with the OpenEXR format, which is a 16 bit floating point format that (usually) stores linear light. Other floating point formats might also be used. One implementation would be to use a 3D LUT to perform the processing. The difficulty with this is, again, the number of bits required on the input for linear light with an HDR signal (a minimum of 16 bits for a linear light HDR signal). This may be circumvented by applying a non-linear compressive function to each channel (RGB) prior to inputting the signal into the 3D LUT. For example, a 16 bit linear signal might be reduced to 10 bits through a 1D LUT; a LUT is practical here because it is only one dimensional, and there are other, simple, ways to implement this compressive non-linearity prior to the 3D LUT. The proposed OETF as shown in
The concept of the embodiment is not strongly coupled to the choice of OETF; the arrangement may operate with any OETF that encodes HDR into a limited number of bits (e.g. 10 bits). A key point is that the simplest LUT implementation would need RGB linear light passed through (3) 1D LUTs and then the 3 reduced bit depth signals processed in a 3D LUT. Both the 1D LUTs and the 3D LUT might reasonably be implemented in the camera.
The post-processor 42, 52 within the path to an HDR display implements an inverse of the process of any of
The preferred implementation of the pre-processor 40, 50 and post-processor 42, 52 described in the embodiments is using a 3D look-up table (3D-LUT). Existing SDR receivers include a 3D-LUT to map the colorimetry of the input signal to that of the native colorimetry of the display, or to implement manufacturer selected pre-sets for the choice of “look” such as “vivid”, “film” and so on. Each “look” is designated by settings in the 3D-LUT that take the inputs in 3D RGB space and provide RGB outputs, wherein each of the R, G and B outputs is based on a combination of the RGB inputs (hence the 3D nature of the table). The size of the 3D-LUT will depend upon the number of bits in the signal. A 10 bit signal would require 2¹⁰ entries and a 30 bit signal 2³⁰ entries. The latter may be too large and so a design choice would be to use a smaller 3D-LUT and to interpolate between values.
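The table-size arithmetic, and the interpolation compromise it motivates, can be illustrated as follows (a one-dimensional interpolation is shown for brevity; a real 3D-LUT would interpolate trilinearly):

```python
def lut_entries(bits_per_channel: int, dims: int = 3) -> int:
    """Number of entries in a full-resolution LUT."""
    return 2 ** (bits_per_channel * dims)

# A full 10-bits-per-channel 3D-LUT needs 2**30 entries -- impractical,
# so real designs use a coarse grid and interpolate between nodes.

def interp_1d(lut: list, x: float) -> float:
    """Linear interpolation into a coarse 1D LUT, with x in [0, 1]."""
    pos = x * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)
    frac = pos - i
    return lut[i] * (1 - frac) + lut[i + 1] * frac
```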
The 3D-LUT already existing within SDR receivers could, therefore, be modified to implement the compression and limiting functions of the pre-processor. If this could be done, then there would be no requirement for a post-processor at HDR receivers. However, this would require transmission of the new 3D-LUT settings to existing SDR receivers and so is not the preferred option. Instead, it is preferred to implement a pre-processor 3D-LUT prior to transmission and to include the post-processor 3D-LUT within new HDR receivers. The post-processor 42, 52 may therefore be considered to be a component within a new HDR display, set-top-box, receiver or other device capable of receiving video signals. The preferred implementation is a simple modification by including appropriate values within an existing 3D-LUT of an HDR display. Such values could be provided at the point of manufacture or later by subsequent upgrade using an over air transmission or other route. The values for such a lookup table may be calculated according to the calculation for YMAX described herein including Appendix A and using chosen limiting functions such as those shown in
The 3D-LUT or other LUT may implement some or all of the functionality of the pre-processor and post-processor. Some aspects may require calculation for accuracy, other aspects could be performed by lookup. For example, the calculation of maximum luminance level can be pre-calculated and stored in a 2D LUT. However a problem with using multidimensional LUTs is that their memory requirements can get impracticably large depending on the number of bits in the input signal. For example the signal inputs may be floating point (e.g. 16 bit format), in which case a 2D LUT would be impracticably large. So for floating point signal it would be better to implement a module to perform calculations. The same goes for other parts of the functional components of
In general, 3D LUTs for video, e.g. changing colour space, use a reduced number of bits on the input to a lookup table and then interpolate to generate results for the full number of input bits. This works well in practice for video. However for intermediate steps of a process (as here) the loss of precision due to interpolation may be significant. We have appreciated, therefore, that it may not be appropriate to use multidimensional LUTs for all functional blocks.
However implemented, the arrangement ensures that the following three conditions are met:
(1) YHDR is less than or equal to YMAX
(2) YSDR may be greater than YMAX
(3) YPRACTICAL must be less than or equal to YMAX
This is the condition enforced by the limiter to avoid the problems discussed.
The embodiment provides an adjustment to the Y component for each pixel as before using a compressor block. However, the allowable brightness of a given pixel in the target dynamic range is not a fixed value for all colours and so the compressor 70 provides both a compressive and limiting function. The allowable brightness is a function of colour.
The purpose of the maximum brightness block may be appreciated by an example considering particular colours. Consider a pixel having a pure blue colour. This colour may have a maximum allowable luminance value in the target scheme that is lower than, say, that of a pure red pixel. If one applied the same luminance compression to both colours, one could leave the blue colour above its allowable level in the target scheme while the red remained within its allowable range. As a result, the blue colour would not be correctly represented (it would have a lower value than intended) and there would be a colour shift: more red in comparison to blue.
The maximum brightness block therefore determines a maximum allowable luminance value for each colour component in relation to the lower dynamic scheme. This is provided as an input to a compression block that applies a compression function to the luminance component of each pixel to produce a compressed luminance component. Significantly, the compression function depends upon the maximum allowable brightness in the lower dynamic range scheme for the corresponding colour component of each pixel. In this way, the effective compression curve used for each colour differs whilst ensuring a maximum RGB value is not violated.
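As an illustration only (this exact curve is not taken from the specification), a compression function whose ceiling is the per-colour maximum might look like:

```python
import math

def compress_to_ymax(y: float, ymax: float) -> float:
    """Compress luminance y so the output never reaches ymax.

    Illustrative curve: gradient ~1 near black (dark tones barely change)
    and an asymptote at the per-colour maximum, so the effective curve
    differs for each colour without exceeding the target scheme's range.
    The curve is invertible, so a post-processor could undo it exactly.
    """
    return ymax * (1.0 - math.exp(-y / ymax))
```

Because `ymax` differs per colour, the same input luminance maps to different outputs for, say, a blue pixel and a red pixel, each staying within its own legal ceiling.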
The output comprises an RGB signal that originated from an HDR RGB signal but which is usable within SDR systems. Moreover, the reverse process may be operated to recover an HDR RGB signal by splitting into components as before and operating a reverse of the compression curves.
One might think that quantisation problems could arise as a consequence of alterations to the luminance components by the compression and limiting functions and the subsequent delimiting and decompression functions. However, it is noted that grey pixels remain unaltered by the process and significant changes only occur to highly coloured pixels. The human eye is less sensitive to quantisation of colour than of luminance and so this is unlikely to be a problem. In any event, the precision of the compressor and limiter can be chosen to be sufficient such that these do not inherently limit the quantisation, and this is a further reason why quantisation problems should not arise.
We have appreciated a further advantage that may be provided in any of the embodiments of the invention by applying a further variation to those embodiments that implements colour compression. Separately from considerations of the dynamic range, it is preferred that modern displays and systems generally should use a wider colour gamut than previous systems. Accordingly, it is desired that a signal acquired using such a wider colour gamut such as Rec 2020 should be viewable on an existing display designed for Rec 709. For this purpose, an additional colour compressor may be provided within the pre-processor and post-processor as shown in
The choice of compression function applied to the radial colour components of
As previously noted, the invention may be implemented using separate functional components as described in relation to
This appendix addresses how to determine the maximum value of luminance (CIE 1931 Y) given a colour defined by u′/v′ colour co-ordinates. Let this maximum luminance value be denoted Ymax.
If we knew the colour coordinates X Ymax Z (CIE 1931) then, when we calculated the corresponding RGB co-ordinates in the output colour space, we would find that one or more of R, G and B would be 1.0, since this is the maximum permitted value for RGB components. To find Ymax we would need to find algebraic formulae for the values of RGB, given X Ymax Z, and then solve these to find Ymax. However, we have co-ordinates Ymax u′v′, so we need to find formulae for RGB in terms of Ymax u′v′; then we can solve for Ymax.
Given the values of Ymax u′v′ the corresponding values of X & Z are given by:

X = 9u′Ymax/(4v′)

Z = (12 − 3u′ − 20v′)Ymax/(4v′)  (1)
Given XYZ components, then RGB components are calculated by pre-multiplying by a 3×3 matrix (as is well known), where the matrix, denoted “M” herein, depends on the RGB colour space. So,

(R, G, B)ᵀ = M·(X, Y, Z)ᵀ  (2)
Substituting equation(s) 1 into equation 2 yields:

R = Ymax·(9u′M11/(4v′) + M12 + (12 − 3u′ − 20v′)M13/(4v′))

G = Ymax·(9u′M21/(4v′) + M22 + (12 − 3u′ − 20v′)M23/(4v′))

B = Ymax·(9u′M31/(4v′) + M32 + (12 − 3u′ − 20v′)M33/(4v′))  (3)
We may re-write this as:

(R, G, B)ᵀ = Ymax·K·(u′/v′, 1, 1/v′)ᵀ  (4)
where the values of the matrix K are defined as:

Ki1 = (9Mi1 − 3Mi3)/4

Ki2 = Mi2 − 5Mi3

Ki3 = 3Mi3,  for i = 1, 2, 3  (5)
Now, as stated above, for maximum luminance, Ymax, at least one of R, G and B must be 1.0. Therefore, from equation(s) 4 one or more of the following must be true:

Ymax = 1/(K11·u′/v′ + K12 + K13/v′)

Ymax = 1/(K21·u′/v′ + K22 + K23/v′)

Ymax = 1/(K31·u′/v′ + K32 + K33/v′)  (6)
where the 3 equations are derived from the maximum values of R, G & B equal to 1.0.
Hence the maximum luminance, Ymax, is the minimum of the values calculated from equation(s) 6.
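The appendix's procedure can be sketched end to end; we assume the standard BT.709 XYZ-to-RGB matrix (an implementation would substitute the matrix for its chosen colour space):

```python
def y_max(u: float, v: float) -> float:
    """Maximum CIE 1931 luminance Y for chromaticity (u', v') such that
    all RGB components stay <= 1.0, following the Appendix A method."""
    # Standard BT.709 XYZ -> RGB matrix, assumed here for illustration.
    M = [
        [3.2406, -1.5372, -0.4986],
        [-0.9689, 1.8758, 0.0415],
        [0.0557, -0.2040, 1.0570],
    ]
    x_per_y = 9 * u / (4 * v)                   # X / Y from the u'v' definitions
    z_per_y = (12 - 3 * u - 20 * v) / (4 * v)   # Z / Y likewise
    # Per-channel factor k such that R = k[0]*Y, G = k[1]*Y, B = k[2]*Y.
    k = [row[0] * x_per_y + row[1] + row[2] * z_per_y for row in M]
    # Requiring the largest channel to equal 1.0 gives Ymax = min(1/k).
    return min(1 / kc for kc in k if kc > 0)
```

For the D65 white point this gives Ymax near 1.0, and a saturated blue chromaticity gives a far lower Ymax than a saturated red one, matching the colour-shift discussion above.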
For example, with an ITU Recommendation BT.709 colour space the matrix M, to convert from XYZ to RGB, may be calculated, from the specification, to be:

M = ( 3.2406  −1.5372  −0.4986
     −0.9689   1.8758   0.0415
      0.0557  −0.2040   1.0570 )
From this we may calculate the matrix, K, to be:

K = ( 7.6653   0.9558  −1.4958
     −2.2112   1.6683   0.1245
     −0.6674  −5.4890   3.1710 )
Priority application: GB 1502016.7, filed Feb 2015 (national).

International filing: PCT/GB2016/050272, filed 2/5/2016 (WO).