The present disclosure generally involves converting image data from one format to another.
New display technology has drastically improved the potential image quality of content. Specific improvements include the ability to display a wider color gamut and a much larger brightness range (usually measured in nits). This combination is usually referred to as HDR (high dynamic range) or Ultra HD.
Unfortunately, almost all content is currently graded for SDR (standard dynamic range) displays. As a result, the potential advantages of HDR technology, including improvements in user experience due to wider color gamut and brightness ranges, are not fully realized. This results in degraded image quality and decreased consumer motivation to purchase higher-end displays.
Currently, HDR content is generated using either (A) native acquisition using HDR cameras (which is very expensive) or (B) up-conversion from SDR content using specialized software. This specialized software requires trained technicians to operate and professional color grading monitors, which can cost tens of thousands of dollars. Approaches to automating conversion of SDR to HDR have been proposed that may involve defining or selecting a set of parameters to define the conversion. Such approaches may provide suitable HDR content. However, such approaches may be limited by the selected parameters to certain situations, or may require pristine SDR content to provide usable HDR content. That is, conversion of SDR content that includes artifacts or is corrupted may produce unsatisfactory HDR content. The result is a continuing lack of HDR content.
According to an aspect, an apparatus may comprise a partition module partitioning input data representing a standard dynamic range image into a plurality of portions of data, wherein each portion represents a respective one of a plurality of SDR patches of the SDR image and each of the plurality of SDR patches covers a portion of the SDR image and the set of the plurality of SDR patches fully covers the SDR image; an autoencoder processing each of the plurality of SDR patches responsive to a plurality of model weights representing a model of SDR to HDR conversion to produce a respective plurality of estimated HDR patches; and an image stitching module stitching the estimated HDR patches together to form a HDR image version of the SDR image.
According to another aspect, a method may comprise partitioning input data representing a standard dynamic range image into a plurality of portions of data, wherein each portion represents a respective one of a plurality of SDR patches of the SDR image and each of the plurality of SDR patches covers a portion of the SDR image and the set of the plurality of SDR patches fully covers the SDR image; processing each of the plurality of SDR patches in a deep learning autoencoder responsive to a plurality of model weights representing a model of SDR to HDR conversion to produce a respective plurality of estimated HDR patches; and stitching the estimated HDR patches together to form a HDR image version of the SDR image.
The present disclosure may be better understood by consideration of the detailed description below in conjunction with the accompanying figures, in which:
In the various figures, like reference designators refer to the same or similar features.
The present disclosure is generally directed to conversion of image data from one format to another different format.
While one of ordinary skill in the art will readily contemplate various applications to which aspects and embodiments of the present disclosure can be applied, the following description will focus on apparatus, systems and methods for image conversion applications such as converting standard dynamic range (SDR) images or image data to high dynamic range (HDR) images or image data. Such processing may be used in various embodiments and devices such as set-top boxes, gateway devices, head end devices operated by a service provider, digital television (DTV) devices, mobile devices such as smart phones and tablets, etc. However, one of ordinary skill in the art will readily contemplate other devices and applications to which aspects and embodiments of the present disclosure can be applied. For example, an embodiment may comprise any device that has data processing capability. It is to be appreciated that the preceding listing of devices is merely illustrative and not exhaustive.
In general, an embodiment involves a deep learning approach to up-convert image content in SDR color space to image content in HDR color space using a training corpus of content in both SDR and HDR. An embodiment may comprise a convolutional neural network (CNN) including an autoencoder using a training corpus to learn how to extract relevant structural information from image patches and predict pixel values in HDR space. An embodiment may provide a non-parametric approach, i.e., image conversion is not based on a predetermined set of parameters. Instead, an embodiment involves learning the parameters and parameter values that produce the best-fit conversion result, thereby providing flexible conversion that can be used in systems that efficiently implement deep learning architectures, such as graphics processing units (GPUs).
In more detail with reference to the drawings, the SDR image passes from input 100 to block 110, where patch decomposition, or partitioning of the input image, occurs. To limit or reduce memory requirements during processing, rather than processing the complete SDR input image at one time, a series of patches is created from the input image. The creation of patches may also be considered partitioning of the input image into patches or portions of the input image, and block 110 may also be referred to herein as a partitioning module. The set of patches covers the complete original image. For example, block 110 processes a 1080p frame to create a series of 128×128×3 patches (i.e., 128 pixels×128 pixels×3 color channels per pixel (e.g., R, G, B)) with a 50% redundancy or overlap on each patch. Other patch sizes and redundancy values may be suitable and are contemplated. For patch sizes above 256×256, memory requirements may become prohibitive with current technology and/or scalability may be problematic. For patch sizes below 32×32, accuracy may be less than required or desirable.
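As an illustrative sketch of such a partitioning module, the following Python (numpy) code partitions an image into 128×128 overlapping patches with 50% redundancy. The reflection padding used to guarantee full coverage is an assumption for illustration; the disclosure does not specify a boundary policy.

```python
import numpy as np

def extract_patches(image, patch_size=128, overlap=0.5):
    """Partition an H x W x 3 image into overlapping square patches.

    A 50% overlap means the stride is half the patch size. The image is
    reflection-padded so that the set of patches fully covers it.
    Returns the patches, their top-left origins, and the padded shape.
    """
    stride = int(patch_size * (1 - overlap))  # 64 for 128 px at 50% overlap
    h, w, _ = image.shape
    # Pad so the patch grid covers the full image.
    pad_h = (patch_size - h) if h <= patch_size else (-(h - patch_size)) % stride
    pad_w = (patch_size - w) if w <= patch_size else (-(w - patch_size)) % stride
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    patches, origins = [], []
    for y in range(0, padded.shape[0] - patch_size + 1, stride):
        for x in range(0, padded.shape[1] - patch_size + 1, stride):
            patches.append(padded[y:y + patch_size, x:x + patch_size])
            origins.append((y, x))
    return np.stack(patches), origins, padded.shape[:2]
```

For a 1080p frame (1920×1080×3) this yields a grid of 128×128×3 patches at stride 64; the origins and padded shape are retained so the stitching step described below can reassemble the image.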
The image patches created by block 110 pass to block 130, which may be a deep learning autoencoder, e.g., a convolutional neural network. An example of an embodiment of block 130 comprises a convolutional autoencoder with skip connections, as explained further below.
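A minimal sketch of such a convolutional autoencoder with a skip connection is given below in PyTorch. The layer count and channel widths are illustrative assumptions, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class SkipAutoencoder(nn.Module):
    """Convolutional autoencoder with an encoder-to-decoder skip connection."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(channels, width, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(width, width * 2, 3, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(width * 2, width, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(width * 2, channels, 4, 2, 1)

    def forward(self, x):                    # x: (N, 3, 128, 128) SDR patch
        e1 = self.enc1(x)                    # (N, 64, 64, 64)
        e2 = self.enc2(e1)                   # (N, 128, 32, 32)
        d2 = self.dec2(e2)                   # (N, 64, 64, 64)
        skip = torch.cat([d2, e1], dim=1)    # skip connection from the encoder
        return self.dec1(skip)               # (N, 3, 128, 128) estimated HDR patch
```

The skip connection concatenates early encoder features onto the decoder path, which helps the network preserve the structural detail of the input patch while predicting HDR pixel values.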
The processing of image patches in block 130 occurs based on model weights provided by block 120. The model weights that produce a best fit of conversion of SDR input to HDR output are derived during a training operation that, as explained in more detail below, uses a training corpus of images in both SDR and HDR.
The output of block 130 comprises a plurality of estimated HDR patches corresponding to respective ones of the plurality of SDR patches. In block 140, the series of estimated HDR patches from block 130 are stitched together to form an output HDR image at block 150 corresponding to the input SDR image at block 100. The stitching operation performed in block 140 may comprise calculating, for each pixel location, a median of the values contributed by the overlapping patches.
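The following is a sketch of such a median-based stitching module, consistent with the extract_patches sketch above; the assumption that at most four patches overlap any pixel follows from the 50% redundancy.

```python
import numpy as np

def stitch_patches(patches, origins, padded_shape, out_shape, patch_size=128):
    """Reassemble overlapping HDR patch estimates by taking the per-pixel
    median over all patches covering each location."""
    h, w = padded_shape
    max_cover = 4  # at 50% overlap, at most a 2 x 2 group of patches covers a pixel
    estimates = np.full((max_cover, h, w, 3), np.nan, dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.intp)
    for patch, (y, x) in zip(patches, origins):
        ys, xs = np.mgrid[y:y + patch_size, x:x + patch_size]
        estimates[counts[ys, xs], ys, xs] = patch  # next free overlap layer
        counts[ys, xs] += 1
    stitched = np.nanmedian(estimates, axis=0)     # median across covering patches
    return stitched[:out_shape[0], :out_shape[1]]  # crop padding back off
```

For a 1080p frame the stitched result is cropped from the padded patch grid back to 1920×1080, yielding the output HDR image of block 150.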
As mentioned above, the conversion model weights are determined during a training operation.
A training corpus of images suitable for training apparatus such as the example of an embodiment described above may comprise a plurality of known SDR images and a plurality of respective known HDR images.
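As a sketch of such a training operation, the following PyTorch loop runs stochastic gradient descent over batches of paired SDR/HDR patches for a number of epochs, using the SkipAutoencoder sketched above. The MSE loss, learning rate, and batch size are illustrative assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(model, sdr_patches, hdr_patches, epochs=50, batch_size=16, lr=1e-3):
    """Learn model weights from paired SDR/HDR patch tensors (N, 3, 128, 128)."""
    loader = DataLoader(TensorDataset(sdr_patches, hdr_patches),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()              # assumed reconstruction loss
    for epoch in range(epochs):               # one epoch = one pass over the corpus
        for sdr, hdr in loader:               # each batch is a subset of the corpus
            opt.zero_grad()
            loss = loss_fn(model(sdr), hdr)
            loss.backward()                   # stochastic gradient descent update
            opt.step()
    return model.state_dict()                 # the learned model weights (block 120)
```

A usage sketch: `weights = train(SkipAutoencoder(), sdr_tensor, hdr_tensor)`, after which the weights may be supplied to block 130 for inference.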
The present description illustrates various aspects and embodiments. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, are included within the spirit and scope of the present description. For example, according to an aspect, an apparatus may comprise a partition module partitioning input data representing a SDR image into a plurality of portions of data, wherein each portion represents a respective one of a plurality of SDR patches of the SDR image and each of the plurality of SDR patches covers a portion of the SDR image and the set of the plurality of SDR patches fully covers the SDR image; an autoencoder processing each of the plurality of SDR patches responsive to a plurality of model weights representing a model of SDR to HDR conversion to produce a respective plurality of estimated HDR patches; and an image stitching module stitching the estimated HDR patches together to form a HDR image version of the SDR image.
According to another aspect, an apparatus may comprise a partition module partitioning input data representing a SDR image into a plurality of portions of data, wherein each portion represents a respective one of a plurality of SDR patches of the SDR image and each of the plurality of SDR patches covers a portion of the SDR image and the set of the plurality of SDR patches fully covers the SDR image; an autoencoder processing each of the plurality of SDR patches responsive to a plurality of model weights representing a model of SDR to HDR conversion to produce a respective plurality of estimated residual values wherein each of the plurality of residual values represents a difference between one of the plurality of SDR patches and a respective patch of a HDR image corresponding to the SDR image; combining each of the plurality of residual values with a respective one of the plurality of SDR patches to produce a plurality of estimated HDR patches; and an image stitching module stitching the plurality of estimated HDR patches together to form a HDR image version of the SDR image.
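As a minimal sketch of this residual formulation, assuming an autoencoder variant trained to output the difference between the HDR and SDR patches (the SkipAutoencoder sketched above could be trained this way), the final combination is a simple elementwise addition supplied by a skip connection from input to output:

```python
import torch

def estimate_hdr_patch(model, sdr_patch):
    """Combine the estimated residual with the SDR patch to form the HDR patch."""
    residual = model(sdr_patch)   # autoencoder output: estimated HDR minus SDR
    return sdr_patch + residual   # input skip connection yields the HDR estimate
```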
In an embodiment, model weights may be learned during a training operation using a stochastic gradient descent on a training corpus of images in both SDR and HDR.
In an embodiment, an autoencoder may comprise a convolutional autoencoder with one or more skip connections.
In an embodiment, each of a plurality of SDR patches and a plurality of estimated HDR patches may have a dimension of 128×128×3 and a 50% redundancy.
In an embodiment, a SDR image may comprise a single frame of HDTV content in 1080p resolution.
In an embodiment, a training corpus of images may comprise a plurality of known SDR images and a plurality of respective known HDR images, and the training operation may comprise processing a set of training data through an autoencoder during an epoch, wherein the set of training data includes a plurality of batches of images included in the training corpus of images, wherein each batch comprises a subset of the plurality of images included in the training corpus, and repeating the processing for a plurality of epochs.
In an embodiment, a SDR image may comprise image data in the BT.709 color space and a HDR image may comprise image data in the BT.2020 color space.
In an embodiment, an autoencoder having one or more skip connections may process a plurality of SDR image patches to produce a respective plurality of residual values each representing a difference between one of the plurality of SDR patches and a respective patch of a HDR image, and one of the skip connections may provide each of the plurality of SDR image patches to the output of the autoencoder to be combined with a respective one of the plurality of residual values to produce a plurality of estimated HDR image patches.
According to another aspect, a method of converting a SDR image to a HDR image may comprise partitioning input data representing a SDR image into a plurality of portions of data, wherein each portion represents a respective one of a plurality of SDR patches of the SDR image and each of the plurality of SDR patches covers a portion of the SDR image and the set of the plurality of SDR patches fully covers the SDR image; processing each of the plurality of SDR patches in a deep learning autoencoder responsive to a plurality of model weights representing a model of SDR to HDR conversion to produce a respective plurality of estimated HDR patches; and stitching the estimated HDR patches together to form a HDR image version of the SDR image.
According to another aspect, a method of converting a SDR image to a HDR image may comprise partitioning input data representing a SDR image into a plurality of portions of data, wherein each portion represents a respective one of a plurality of SDR patches of the SDR image and each of the plurality of SDR patches covers a portion of the SDR image and the set of the plurality of SDR patches fully covers the SDR image; processing each of the plurality of SDR patches in a deep learning autoencoder responsive to a plurality of model weights representing a model of SDR to HDR conversion to produce a respective plurality of estimated residual values wherein each of the plurality of residual values represents a difference between one of the plurality of SDR patches and a respective patch of a HDR image corresponding to the SDR image; combining each of the plurality of residual values with a respective one of the plurality of SDR patches to produce a plurality of estimated HDR patches; and stitching the estimated HDR patches together to form a HDR image version of the SDR image.
In an embodiment, a method may include a processing step preceded by learning model weights using a stochastic gradient descent on a training corpus of images in both SDR and HDR.
In an embodiment, a method may include processing using an autoencoder comprising a convolutional autoencoder with one or more skip connections.
In an embodiment, a method may include each of a plurality of SDR patches and a plurality of estimated HDR patches having a dimension of 128×128×3 and 50% redundancy.
In an embodiment, a method may include processing a SDR image comprising a single frame of HDTV content in 1080p resolution.
In an embodiment, a method may include a training operation processing a training corpus of images comprising a plurality of known SDR images and a plurality of respective known HDR images, and the training operation may further include processing a set of training data through an autoencoder during an epoch, wherein the set of training data includes a plurality of batches of images included in the training corpus of images, each batch comprising a subset of the plurality of images included in the training corpus, and repeating the processing for a plurality of epochs.
In an embodiment, a method may include processing a SDR image having image data in the BT.709 color space and a HDR image having image data in the BT.2020 color space.
In an embodiment, a method may include processing each of a plurality of SDR patches using a deep learning autoencoder having one or more skip connections and may further include processing the plurality of SDR image patches using the autoencoder to produce a respective plurality of residual values each representing a difference between one of the plurality of SDR patches and a respective patch of a HDR image, and wherein one of the skip connections provides each of the plurality of SDR image patches to the output of the autoencoder to be combined with a respective one of the plurality of residual values to produce a plurality of estimated HDR image patches.
According to another aspect, a non-transitory computer-readable medium may comprise instructions thereon which, when executed by a computer, cause the computer to carry out a method in accordance with any of the aspects and/or embodiments in accordance with the present disclosure.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present disclosure and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting features, aspects, and embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown. As an example of such an embodiment, a non-transitory computer readable medium may store executable program instructions to cause a computer executing the instructions to perform an embodiment of a method in accordance with the present disclosure.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
Herein, the phrase “coupled” is defined to mean directly connected to or indirectly connected with through one or more intermediate components. Such intermediate components may include both hardware and software based components.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. Functionalities provided by the various recited means are combined and brought together in the manner defined by the claims. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
Reference in the specification to “one embodiment” or “an embodiment”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment in accordance with the present disclosure. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
It is to be understood that aspects, embodiments and features in accordance with the present disclosure may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof, e.g., as a combination of hardware and software. Moreover, the software may be implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.
It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the programming used. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present disclosure is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present disclosure. All such changes and modifications are intended to be included within the scope of the appended claims.
Number | Date | Country
--- | --- | ---
62555710 | Sep 2017 | US