The present disclosure relates to digital techniques for representing image information and, in particular, to techniques for multi-color image information.
In modern computing applications, there are a variety of ways to represent multi-color image information. In many cases, image information is represented by a variety of orthogonal color components, sometimes called “planes.” For example, a multi-color image may be represented by red, green, and blue color components in an “RGB” color space. In another example, the same multi-color image may be represented by a luma and two chroma color components, in a Y-Cr-Cb color space. Standards have been developed to govern representation of color in images, which facilitates image exchange in modern computing applications.
The 4:2:0 chroma format is currently the most popular chroma sampling format in consumer-oriented video applications. In this format, each frame of a video sequence is represented with a luma component and two chroma components. The two chroma components, however, are represented using half the resolution vertically and horizontally compared to the luma component. This is possible because, for most content, the characteristics of the chroma signals permit a reduction in their resolution with limited impact on image quality. Such resolution reduction can help in reducing memory storage and bandwidth and provide some compressibility benefits when compressing the video sequence (e.g., the downscaling process can help in reducing some of the noise that may exist in the original 4:4:4 full resolution representation of each chroma component, making it easier to compress the chroma data). Alternative reduced resolution formats, such as the 4:2:2 format, also are known.
There are some applications where higher resolution of the chroma information is desirable, such as screen sharing, gaming, and still-image photography. Reducing chroma resolution compared to the luma resolution may result, in some cases, in artifacts around object edges/boundaries, especially if the edges have significant color differences (e.g. red object next to a blue background). Chroma subsampling can also exacerbate chroma leakage during compression. Chroma leakage, due to subsampling, can be especially visible in HDR content, where some techniques, such as the luma adjustment method for chroma conversion, have been proposed to keep such artifacts in check. Users often wish to maintain, e.g. for archiving purposes, the highest quality version of their content, which may include full resolution chroma samples, while distributing the lower resolution versions to others when needed.
Unfortunately, although a full resolution chroma feature is highly desirable, the majority of consumer-oriented devices currently deployed, such as set-top box decoders, mobile devices, computers, etc. can support only up to 4:2:0 chroma format hardware decoding. Although software decoding of natively coded 4:4:4 content is possible, such software applications could consume too much power in battery powered devices or might not be possible in real time coding applications involving certain resolutions and frame-rates.
Embodiments of the present disclosure provide techniques for representing video and images with full resolution color component information and yet remaining compatible with legacy processing systems that process images with reduced resolution information, such as the 4:2:2 and/or 4:2:0 representations that are popular with luma-chroma image representations. The image representation may include a scalable format that consists of a base layer, where image data is coded to match expectations of a legacy coder. The image representation also may include additional enhancement layer(s) that support upconversions of reduced-resolution color components to higher resolutions. The image representation not only provides power savings in decoding full-resolution representations but it also provides other benefits, such as scalable and power aware decoding.
The following discussion presents the techniques proposed by the present disclosure in context of a system that codes images in a luma-chroma color plane. As discussed herein, luma-chroma representations of images and/or video (“images” for convenience) commonly are represented in a 4:2:0 format, where chroma image components are represented with reduced resolution as compared to the luma color component. The principles of the present disclosure, however, may be extended to other image formats, as may be desired, where one color component is represented in a reduced-resolution representation as compared to another color component. The use of luma-chroma examples within the following discussion should not be interpreted to limit application of the proposed techniques to any particular color space.
The downsampler 110 may downsample resolution of the chroma color components to a lower resolution in conformance to the color format to which the base layer image adheres. Thus, in an implementation using a 4:2:2 format, the downsampler 110 may downsample the chroma color components (Cr, Cb) so that each has half the resolution in the horizontal direction as the corresponding luma color component. Similarly, in an implementation using a 4:2:0 format, the downsampler 110 may downsample the chroma color components (Cr, Cb) so that each has half the resolution in both the horizontal and vertical directions as the corresponding luma color component. The downsampler 110 may output downsampled Cr and Cb chroma data to the base layer buffer 120.
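By way of illustration only, the downsampling described above might be sketched as follows. The function name `downsample_chroma`, the list-of-rows plane representation, and the choice of a simple box (averaging) filter are assumptions made for this example; the disclosure does not prescribe a particular downsampling filter.

```python
def downsample_chroma(plane, fmt):
    """Box-filter a full-resolution chroma plane (a list of rows) to 4:2:2 or 4:2:0.

    Hypothetical helper: the averaging filter is an assumption for illustration.
    """
    if fmt not in ("4:2:2", "4:2:0"):
        raise ValueError(f"unsupported format: {fmt}")
    step_y = 2 if fmt == "4:2:0" else 1  # 4:2:0 also halves the vertical resolution
    out = []
    for y in range(0, len(plane), step_y):
        rows = plane[y:y + step_y]
        out_row = []
        for x in range(0, len(rows[0]), 2):  # both formats halve the horizontal resolution
            block = [v for r in rows for v in r[x:x + 2]]
            out_row.append((sum(block) + len(block) // 2) // len(block))  # rounded mean
        out.append(out_row)
    return out


cr_full = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
cr_422 = downsample_chroma(cr_full, "4:2:2")  # 4 rows x 2 columns
cr_420 = downsample_chroma(cr_full, "4:2:0")  # 2 rows x 2 columns
```

In this sketch the luma plane would be left untouched, so a 4x4 luma plane would be paired with a 4x2 chroma plane in the 4:2:2 case and a 2x2 chroma plane in the 4:2:0 case.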
The base layer buffer 120 may store luma component data and downsampled chroma component data until it is to be transferred to a file. The data stored in the base layer buffer 120 may form a base layer image (
The upsampler 130 may upsample downsampled chroma data from the base layer buffer 120 to a higher resolution. For example, the Cr and Cb chroma data may be upsampled from the 4:2:2 or 4:2:0 resolution as stored in the base layer buffer 120 to a full resolution format (e.g., 4:4:4). The upsampler 130 may operate according to a predefined upscaling technique such as Lanczos 5, bilinear, bicubic, or some other upsampler. Alternatively, the system 100 may dynamically select parameters of the upsampler 130 and provide metadata in a file identifying the selected parameters. The upsampling technique accounts for the chroma location type relative to that of the luma (i.e., whether the chroma location type is equal to 0, 1, 2, etc.), which may impact the phase of the upscaler used.
The residual generator 140 may generate residual signals for the Cr and Cb chroma data based on comparisons between the upsampled Cr and Cb chroma signals and the source Cr and Cb chroma signals at the system's input. The Cr and Cb chroma residual signals may be input to enhancement layer buffer(s) 150. These Cr and Cb chroma residual signals may form the basis of enhancement layer image(s) for the source image.
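The relationship between the upsampler and the residual generator might be sketched as follows. This is a minimal example under stated assumptions: nearest-neighbor replication is substituted for the Lanczos, bilinear, or bicubic filters named above purely to keep the arithmetic easy to follow, chroma location phase is ignored, and the helper names are hypothetical.

```python
def upsample_nearest(plane, fx, fy):
    """Replicate each sample fx times horizontally and fy times vertically.

    Nearest-neighbor replication is an assumption made for illustration only.
    """
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(fx)]
        out.extend(list(wide) for _ in range(fy))
    return out


def residual(source, predicted):
    """Per-sample difference between the source plane and its prediction."""
    return [[s - p for s, p in zip(srow, prow)]
            for srow, prow in zip(source, predicted)]


source_cr = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
base_cr_420 = [[35, 55], [115, 135]]             # downsampled chroma from the base layer
predicted_cr = upsample_nearest(base_cr_420, 2, 2)
cr_residual = residual(source_cr, predicted_cr)

# Adding the residual back to the prediction recovers the source samples.
recovered_cr = [[p + r for p, r in zip(prow, rrow)]
                for prow, rrow in zip(predicted_cr, cr_residual)]
```

The residual plane has the full resolution of the source chroma plane, which is why it can carry the detail that the reduced-resolution base layer omits.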
The enhancement layer image(s) 220, 230 may include color residual(s) that possess information corresponding to one or more reduced-resolution color components 214, 216 from the base layer image 210 at a higher resolution. Continuing with the 4:2:0 example above, the Cr and Cb planes 214, 216 of the base layer image 210 may have half the resolution horizontally and vertically compared to the luma plane 212 of the base layer image 210. Cr and Cb chroma residual enhancement layer images 220, 230 may provide information corresponding to the Cr and Cb chroma residuals generated by the residual generator 140. The enhancement layer images 220, 230 may provide information from which full resolution Cr and Cb chroma residuals may be derived.
Two enhancement layer images 220, 230 are shown in the example of
The principles of the present disclosure may be applied to provide multiple levels of scalability as may be desired. One such embodiment is illustrated in
As a further example of multi-level scalability, one or more enhancement layer images may support increased resolution of regions of interest within image(s). A region of interest (also “ROI”) may be a spatial area of a source image that is determined to contain image data that likely is of interest to human viewers. The example of
In such applications, a first layer of scalability may be provided by enhancement layer images 220, 230 that may provide enhancement information for the entire spatial area of a source image and a second layer of scalability may be provided by enhancement layer images 240, 250 that correspond to the spatial area of the region of interest 260. The enhancement layer images 220, 230 may provide enhancement information for the entire spatial area of the source image that, when decoded with content of the base layer 210, increases the resolution of the image so obtained (say, from a 4:2:0 format to a 4:2:2 format). The ROI enhancement layer images 240, 250 may provide enhancement information that, when decoded with content of the base layer image 210, increase resolution of the area corresponding to the region of interest 260 to a maximum resolution (e.g., from a 4:2:0 format to a 4:4:4 format). The ROI enhancement layer images 240, 250 may be coded differentially with respect to spatially coincident content of the enhancement layer images 220, 230, which themselves may be coded differentially with respect to the base layer image 210. In another implementation, the ROI enhancement layer images 240, 250 may be coded differentially directly from the base layer image 210 (e.g., without consideration of the enhancement layer images 220, 230).
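As a rough illustration of the region-of-interest case, the sketch below adds an ROI residual only to the samples that fall inside the region, leaving the rest of the plane at the resolution obtained from the other layers. The function name, the rectangular ROI described by a top-left corner, and the sample values are assumptions made for this example.

```python
def apply_roi_residual(plane, roi_residual, top, left):
    """Add an ROI-sized residual block into a copy of a full-size chroma plane.

    Hypothetical helper: the ROI is assumed to be a rectangle whose top-left
    corner is at (top, left) and whose size equals that of roi_residual.
    """
    height, width = len(roi_residual), len(roi_residual[0])
    if top < 0 or left < 0 or top + height > len(plane) or left + width > len(plane[0]):
        raise ValueError("ROI does not fit inside the plane")
    out = [row[:] for row in plane]
    for dy, res_row in enumerate(roi_residual):
        for dx, r in enumerate(res_row):
            out[top + dy][left + dx] += r
    return out


upsampled_cb = [[100] * 4 for _ in range(4)]   # chroma predicted from lower layers
roi_residual = [[1, 2], [3, 4]]                # enhancement data for a 2x2 ROI
refined_cb = apply_roi_residual(upsampled_cb, roi_residual, top=1, left=2)
```

Only the four samples inside the region change; a decoder that skips the ROI layer would simply keep the `upsampled_cb` values everywhere.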
The principles of the present disclosure find application in systems in which a base layer coder 180 supports monochrome coding such as by using the HEVC monochrome profiles or by using the HEVC Main/Main 10 profiles in which coding discards chroma planes. In such an application, the Cr and Cb chroma planes 214, 216 of the base layer image 210 would be empty. The Cr and Cb residual enhancement layer images 220, 230 would contain full resolution representations of the chroma information (e.g., they are not residuals).
The principles of the present disclosure also find applications in which Cr and Cb residual enhancement layers 220, 230 are not coded differentially with respect to upsampled Cr and Cb chroma information from the base layer image 210. For example, the upsampler(s) 130 and residual generator(s) 140 (
In one embodiment, the image file 200 may be represented using the High Efficiency Image File (commonly, “HEIF”) format defined by MPEG, for example, in ISO/IEC 23008-12 (MPEG-H Part 12). In particular, an implementation may use the derived image item and the alternative group concepts in this format. The base layer image 210, for example, may be stored as a primary item in a HEIF file. The enhancement layer image(s) 220, 230, 240, 250 may be placed in an ‘altr’ alternative group, which indicates that the base layer image 210 (e.g., having a 4:2:0 format) and the enhancement layer image(s) 220, 230, 240, 250 are alternatives of each other.
Returning to
In application, the enhancement layer image likely will contain mostly residual data, commonly shifted to the center of the bitdepth representation (i.e., if the coded representation is of 8 bits, a value of 128 is added to the residual signals and the result is clipped to the range 0 to 255). It therefore may be appropriate for the coder 180 to reduce the dynamic range of the signal and code it with a lower bitdepth. Different bitdepths could be used for the different layers, and scaling of the samples, which could differ for each layer, also may be used to increase precision.
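The offset-and-clip operation described above can be sketched as follows. The function names are hypothetical; the 8-bit case reproduces the stated arithmetic (add 128, clip to 0 through 255), and other bitdepths are handled by analogy, which is an assumption of this example.

```python
def offset_residual(res, bitdepth=8):
    """Shift signed residuals to the middle of the sample range and clip."""
    mid = 1 << (bitdepth - 1)          # 128 for 8-bit samples
    max_val = (1 << bitdepth) - 1      # 255 for 8-bit samples
    return [[min(max(v + mid, 0), max_val) for v in row] for row in res]


def restore_residual(stored, bitdepth=8):
    """Undo the offset on the decoder side (clipped values stay clipped)."""
    mid = 1 << (bitdepth - 1)
    return [[v - mid for v in row] for row in stored]


signed_residual = [[-25, 15], [0, 200]]
stored = offset_residual(signed_residual)
restored = restore_residual(stored)
```

Note that the last sample (200) exceeds what an 8-bit offset representation can hold, so it comes back as 127; this is the kind of case in which the scaling or bitdepth choices mentioned above would matter.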
In an embodiment, the coder 180 may operate on the chroma component data after the component data is packed into a virtual image format by an enhancement image packing unit 190. The enhancement image packing unit 190 may arrange the Cr and Cb chroma component data into a format that presents the component data to the coder 180 as if it were luma data. The coder 180 may apply its coding protocols to the virtual image as presented by the enhancement image packing unit 190, which may lead to generation of a file that has an enhancement layer image in an alternate format. Two examples of the alternate formats are shown in
In the example of
In the
A metadata field 530 may identify processing performed by the enhancement layer packing unit 190 and/or coder 180. For example, the metadata 530 may identify a packing relationship between the Cr and Cb chroma residuals 523, 524 within the virtual luma image. The metadata 530 also may identify a type of coding applied by the coder 180 in implementations where the system 100 may select dynamically a type of coding to be applied.
In the example of
A metadata field 630 may identify processing performed by the enhancement layer packing unit 190 and/or coder 180. For example, the metadata 630 may identify a packing relationship between the Cr and Cb chroma residuals 623, 624 within the virtual luma image (e.g., their horizontal placement with respect to each other). The metadata 630 also may identify a type of coding applied by the coder 180 in implementations where the system 100 may select dynamically a type of coding to be applied.
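One possible horizontal packing, together with metadata recording the arrangement, is sketched below. The dictionary layout, the metadata keys, the 4:2:0-sized dummy chroma planes, and the zero fill value are all assumptions made for illustration and are not defined by the disclosure.

```python
def pack_side_by_side(cr_res, cb_res, dummy_value=0):
    """Place Cr and Cb residuals next to each other in one virtual luma plane."""
    luma = [list(cr_row) + list(cb_row) for cr_row, cb_row in zip(cr_res, cb_res)]
    height, width = len(luma), len(luma[0])
    # Dummy 4:2:0-sized chroma planes so the image looks conventional to a coder.
    dummy = [[dummy_value] * ((width + 1) // 2) for _ in range((height + 1) // 2)]
    metadata = {"packing": "side_by_side", "left": "Cr", "right": "Cb",
                "split_x": len(cr_res[0])}  # hypothetical metadata keys
    return {"Y": luma, "Cr": dummy, "Cb": [row[:] for row in dummy],
            "metadata": metadata}


def unpack_side_by_side(image):
    """Split the virtual luma plane back into Cr and Cb using the metadata."""
    split_x = image["metadata"]["split_x"]
    cr_res = [row[:split_x] for row in image["Y"]]
    cb_res = [row[split_x:] for row in image["Y"]]
    return cr_res, cb_res


cr_in = [[1, 2], [3, 4]]
cb_in = [[5, 6], [7, 8]]
virtual_image = pack_side_by_side(cr_in, cb_in)
cr_out, cb_out = unpack_side_by_side(virtual_image)
```

Recording the split position in the metadata lets an unpacking stage recover both planes without assuming that they have equal widths.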
In the
A metadata field 730 may identify processing performed by the enhancement layer packing unit 190 and/or coder 180. For example, the metadata 730 may identify a packing relationship between the Cr and Cb chroma residuals 723, 724 within the virtual luma image (e.g., their interleaved relationship with respect to each other and the granularity at which they were selected). The metadata 730 also may identify a type of coding applied by the coder 180 in implementations where the system 100 may select dynamically a type of coding to be applied.
As in the other embodiments, the dummy Cr and Cb chroma images 726, 728 may contain null data. It is expected that, when the dummy Cr and Cb chroma images 726, 728 are coded by the coder 180, they will have extremely small bit sizes.
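An interleaved packing with a selectable granularity might look like the sketch below, which alternates stripes of Cr rows and Cb rows in the virtual luma plane. Row-based stripes, the `rows_per_stripe` parameter, and the metadata keys are assumptions of this example; the disclosure leaves the interleaving pattern and granularity open.

```python
def pack_interleaved(cr_res, cb_res, rows_per_stripe=1):
    """Alternate stripes of Cr rows and Cb rows in a single virtual luma plane."""
    luma = []
    for y in range(0, len(cr_res), rows_per_stripe):
        luma.extend(row[:] for row in cr_res[y:y + rows_per_stripe])
        luma.extend(row[:] for row in cb_res[y:y + rows_per_stripe])
    metadata = {"packing": "row_interleaved", "first": "Cr",
                "rows_per_stripe": rows_per_stripe}  # hypothetical metadata keys
    return luma, metadata


def unpack_interleaved(luma, metadata):
    """Separate the interleaved stripes again using the recorded granularity."""
    n = metadata["rows_per_stripe"]
    cr_res, cb_res = [], []
    for y in range(0, len(luma), 2 * n):
        stripe = luma[y:y + 2 * n]
        half = len(stripe) // 2        # a final partial stripe is split evenly
        cr_res.extend(stripe[:half])
        cb_res.extend(stripe[half:])
    return cr_res, cb_res


cr_rows = [[1, 2], [3, 4]]
cb_rows = [[5, 6], [7, 8]]
packed_luma, packing_meta = pack_interleaved(cr_rows, cb_rows, rows_per_stripe=1)
cr_back, cb_back = unpack_interleaved(packed_luma, packing_meta)
```

In this sketch the dummy chroma planes are omitted for brevity; they would be constant-valued planes, which is consistent with the expectation above that they code to very small sizes.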
In the example of
Similarly, Cb chroma data of a source image (also not shown) may be packed into the luma plane 832 of the Cb enhancement layer image 830. Virtual Cr and Cb chroma fields 834, 836 of the Cb enhancement layer image 830 may contain dummy data. The Cb enhancement layer image 830 may be presented to a second enhancement layer encoder 860 in a legacy representation (e.g., 4:2:0 or 4:2:2), which allows the Cb enhancement layer image 830 to be coded by a legacy coder of a conventional consumer electronics device.
A first enhancement layer decoder 930 may invert coding processes performed by a first enhancement layer encoder 850 (
A second enhancement layer decoder 940 may invert coding processes performed by a second enhancement layer encoder 860 (
An image reconstructor 980 may generate a reconstructed image 990 from the recovered base layer and enhancement layer images 950, 960, 970. As discussed, virtual luma planes 962, 972 of the Cr and Cb enhancement layer images 960, 970 may contain recovered Cr and Cb chroma data, respectively. The data in those luma planes 962, 972 may represent the Cr and Cb components at full resolution (e.g., matching the resolution of the luma data contained within the luma plane 952 of the recovered base layer image 950). The image reconstructor may derive a reconstructed image 990 at full resolution (e.g., 4:4:4) from the full resolution luma representation contained within the luma plane 952 of the base layer image 950, the full resolution Cr chroma representation contained within the luma plane 962 of the first enhancement layer image 960, and the Cb chroma representation contained within the luma plane 972 of the second enhancement layer image 970. Of course, in processing applications for which lower resolution image information is suitable (e.g., a recovered 4:2:0 representation of the source image suffices), a decoder may decode the base layer image 950 without processing of any coded enhancement layer image from the file.
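The reconstruction logic described above may be summarized in a short sketch. Decoded images are modeled here as dictionaries with `Y`, `Cr`, and `Cb` planes and a `format` label; this data model and the function name are assumptions made for illustration.

```python
def reconstruct(base, cr_enh=None, cb_enh=None):
    """Assemble an output image from decoded base and enhancement layer images.

    Each enhancement layer image is assumed to carry a full-resolution chroma
    component in its (virtual) luma plane; its own chroma planes are ignored.
    """
    if cr_enh is None or cb_enh is None:
        # No enhancement data processed: return the legacy representation.
        return {"format": base["format"], "Y": base["Y"],
                "Cr": base["Cr"], "Cb": base["Cb"]}
    return {"format": "4:4:4", "Y": base["Y"],
            "Cr": cr_enh["Y"], "Cb": cb_enh["Y"]}


base_image = {"format": "4:2:0", "Y": [[16, 17], [18, 19]],
              "Cr": [[128]], "Cb": [[128]]}
cr_layer = {"format": "4:2:0", "Y": [[120, 121], [122, 123]], "Cr": [[0]], "Cb": [[0]]}
cb_layer = {"format": "4:2:0", "Y": [[130, 131], [132, 133]], "Cr": [[0]], "Cb": [[0]]}

full_image = reconstruct(base_image, cr_layer, cb_layer)
legacy_image = reconstruct(base_image)
```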
The data flow diagrams of
The downsampler 1010 may downsample resolution of the chroma color components to a lower resolution in conformance to the color format to which the base layer image adheres. Thus, in an implementation using a 4:2:2 format, the downsampler 1010 may downsample the chroma color components (Cr, Cb) so that each has half the resolution in the horizontal direction as the corresponding luma color component. Similarly, in an implementation using a 4:2:0 format, the downsampler 1010 may downsample the chroma color components (Cr, Cb) so that each has half the resolution in both the horizontal and vertical directions as the corresponding luma color component. The downsampler 1010 may output downsampled Cr and Cb chroma data to the base layer buffer 1020.
The base layer buffer 1020 may store luma component data and downsampled chroma component data until it is to be transferred to a file. The data stored in the base layer buffer 1020 may form a base layer image (
The coder/decoder 1030 may code the downsampled Cr and Cb chroma signals according to the coding algorithm applied by the coder 1070 and decode the coded signals. Many coding algorithms are lossy coding processes, which cause signals to incur coding losses as they are coded and decoded. Thus, the coder/decoder 1030 may output Cr and Cb chroma signals that represent the Cr and Cb chroma signals that are input to the coder/decoder 1030 but exhibit some coding errors. The coding errors introduced by the coder/decoder 1030 are likely to resemble coding errors that are incurred by a decoding system (not shown) when the image file is decoded.
The upsampler 1040 may upsample downsampled Cr and Cb chroma signals input to it from the coder/decoder 1030 to a higher resolution. For example, the Cr and Cb chroma data may be upsampled from the 4:2:2 or 4:2:0 resolution as present in the base layer buffer 1020 to a full resolution format (e.g., 4:4:4). Again, the upsampler 1040 may operate according to a predefined upscaling technique such as Lanczos 5, bilinear, bicubic, or some other upsampler. The upsampling technique accounts for the chroma location type relative to that of the luma (i.e., whether the chroma location type is equal to 0, 1, 2, etc.), which can impact the phase of the upscaler used.
The residual generator 1050 may generate residual signals for the Cr and Cb chroma data based on comparisons between the upsampled Cr and Cb chroma signals and the source Cr and Cb chroma signals at the system's input. The Cr and Cb chroma residual signals may be input to respective enhancement layer buffers 1060. These Cr and Cb chroma residual signals may form the basis of the enhancement layer images for the source image.
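The benefit of placing the coder/decoder ahead of the upsampler can be seen in a small numeric sketch. Here a coarse quantizer stands in for a lossy codec and nearest-neighbor replication stands in for the upsampler; both substitutions, along with the function names and sample values, are assumptions made for illustration.

```python
def lossy_roundtrip(plane, step=8):
    """Stand-in for a lossy encode/decode: round samples to multiples of `step`."""
    return [[min(255, ((v + step // 2) // step) * step) for v in row] for row in plane]


def upsample_2x(plane):
    """Nearest-neighbor 2x upsampling in both directions (illustrative only)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.extend([list(wide), list(wide)])
    return out


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


source_cb = [[36, 40], [44, 48]]
base_cb = [[42]]                                  # downsampled chroma before coding
decoded_base_cb = lossy_roundtrip(base_cb)        # what a decoder will actually see

open_loop_residual = subtract(source_cb, upsample_2x(base_cb))
closed_loop_residual = subtract(source_cb, upsample_2x(decoded_base_cb))

# A decoder only has the decoded base layer available for its prediction.
decoder_prediction = upsample_2x(decoded_base_cb)
recon_open = add(decoder_prediction, open_loop_residual)
recon_closed = add(decoder_prediction, closed_loop_residual)
```

With the closed-loop residual, the base layer coding error is absorbed into the residual and the source is recovered (ignoring any loss in coding the residual itself); with the open-loop residual, every sample in this example is off by the base layer error of 2.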
Enhancement layer data stored in the enhancement layer buffer 1060 may be compressed by a coder 1090, which may (but need not) operate according to the same coding protocol(s) as in the coder 1070. The coder 1090 may operate directly on the Cr and Cb chroma components in which case the enhancement layer image 220 (
In an alternate embodiment, the coder/decoder 1030 may be implemented solely as a decoder that inverts coding operations performed by the coder 1070. In such an embodiment, coded Cr and Cb chroma data may be input to the decoder 1030 from the coder 1070 (path shown in phantom).
The system 1300 also may include enhancement layer decoder 1320, an image repacking unit 1330, adders 1340, and upsamplers 1350 for processing of enhancement layer images (
The upsamplers 1350 may perform upsampling operations on the Cr and Cb chroma signals output from the base layer decoder 1310. The upsampling operations may mimic the upsampling operations performed by the upsamplers shown in
The downsampler 1410 may downsample resolution of the chroma color components to a lower resolution in conformance to the color format to which the base layer image adheres. Thus, in an implementation using a 4:2:2 format, the downsampler 1410 may downsample the chroma color components (Cr, Cb) so that each has half the resolution in the horizontal direction as the corresponding luma color component. Similarly, in an implementation using a 4:2:0 format, the downsampler 1410 may downsample the chroma color components (Cr, Cb) so that each has half the resolution in both the horizontal and vertical directions as the corresponding luma color component. The downsampler 1410 may output downsampled Cr and Cb chroma data to the base layer buffer 1420.
The base layer buffer 1420 may store luma component data and downsampled chroma component data until it is to be transferred to a file. The data stored in the base layer buffer 1420 may form a base layer image (
The coder 1460 may code the luma data and downsampled Cr and Cb chroma data before the system 1400 stores the base layer image in the file. As in the prior embodiments, the compression may occur according to an interoperability coding standard such as HEVC or AV1. In such cases, the downsampling provided by the downsampler 1410 may conform to the resolution of image data (e.g., 4:2:2, 4:2:0 or another resolution) that is appropriate for the coder 1460 being used. In practice, the coder 1460 may be a coding system provided by a processing device on which the system 1400 operates.
The color converter 1430 may convert the source luma component and the downsampled Cr and Cb chroma components from a luma/chroma color space to an alternate color space such as a red/green/blue color space or a Y′UV color space. Other representations, e.g. a different RGB representation with different color primaries and a different transfer characteristics (e.g. from YCbCr BT.709 to RGB BT.2100 PQ) could be used. The color converter 1430 may output component data to the component processors 1440.1-1440.n. When the color converter 1430 has capability to convert input image data to multiple color spaces, the color converter 1430 may output metadata identifying a selected color space.
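As one concrete instance of such a conversion, the sketch below converts a single luma/chroma sample to red/green/blue using the BT.709 luma coefficients. Full-range 8-bit samples, per-sample operation on co-sited values, and the function name are assumptions of this example; a practical converter also would need to handle limited-range signals and any change of primaries or transfer characteristics.

```python
def ycbcr_to_rgb_bt709(y, cb, cr):
    """Convert one full-range 8-bit Y'CbCr sample to R'G'B' (BT.709 coefficients)."""
    pb, pr = cb - 128, cr - 128            # center the chroma samples on zero
    r = y + 1.5748 * pr                    # 2 * (1 - Kr), Kr = 0.2126
    g = y - 0.1873 * pb - 0.4681 * pr
    b = y + 1.8556 * pb                    # 2 * (1 - Kb), Kb = 0.0722
    return tuple(min(255, max(0, int(round(c)))) for c in (r, g, b))


mid_gray = ycbcr_to_rgb_bt709(128, 128, 128)
white = ycbcr_to_rgb_bt709(255, 128, 128)
```

Neutral chroma (128, 128) leaves the luma value unchanged in all three output components, which gives a quick sanity check on the coefficients.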
The system 1400 may have a plurality of component processors 1440.1-1440.n, one provided for each color component of the color space to which the color converter 1430 converts its input data. Each component processor 1440.1, . . . , 1440.n may possess an upsampler 1442 and a filter system 1444. As in the prior embodiments, the upsampler 1442 may upsample its respective color component to a full resolution of the source image. The upsampler 1442 may upsample downsampled Cr and Cb chroma signals input to it from the coder/decoder 1430 to a higher resolution. For example, the Cr and Cb chroma data may be upsampled from the 4:2:2 or 4:2:0 resolution as present in the base layer buffer 1420 to a full resolution format (e.g., 4:4:4). The filter system 1444 may apply filtering operations to the upsampled color component. Filtered output from the filters 1444 may be stored in the enhancement layer buffer 1450 and may form the basis of the enhancement layer image.
The system 1400 also may possess an image packing unit 1470 that arranges the component data stored in the enhancement layer buffer 1450 into a virtual image, which may be coded by a coder 1480. As in the prior embodiments, the coder 1480 may (but need not) operate according to a different compression standard than that of the coder 1460.
In an embodiment, the system 1400 also may include a coder/decoder (not shown) as described in the
Because the embodiment of
In the foregoing embodiments, base layer images and enhancement layer images may be stored in container formats such as a HEIF container. With a HEIF container, in order to be backwards compatible and allow older players to render a “traditional” 4:2:0 image, the base layer image (e.g., a 4:2:0 image) may be stored as a primary item in a HEIF file and enhancement layer image(s) may be stored in alternative groups. Alternatively, both image items (base and enhancement layer images) may be placed in an ‘altr’ alternative group, which may indicate that the images are alternatives of each other. Placing the base layer image as the first image in the altr group and enhancement layer image(s) as secondary images may facilitate backwards compatibility with decoders that are not programmed to recognize the enhancement layer image(s) because decoders typically are programmed to ignore data representations that they do not recognize. Systems, however, that are programmed to understand the enhancement layer image(s) can access, decode, and display the larger resolution (e.g., 4:4:4).
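The backwards-compatibility behavior described above can be modeled with a toy selection routine. This sketch does not parse HEIF files or use any HEIF library; the in-memory group representation, the item fields, and the policy of choosing the highest chroma resolution that a player supports are all assumptions made for illustration.

```python
# Ranking of chroma formats used by this toy model (an assumption).
FORMAT_RANK = {"4:2:0": 0, "4:2:2": 1, "4:4:4": 2}


def pick_alternative(altr_group, supported_formats):
    """Return the richest item in the group that the player can handle.

    Items whose format the player does not recognize are simply ignored.
    """
    usable = [item for item in altr_group if item["format"] in supported_formats]
    if not usable:
        raise ValueError("no alternative in the group can be decoded")
    return max(usable, key=lambda item: FORMAT_RANK[item["format"]])


altr_group = [
    {"item_id": 1, "format": "4:2:0", "role": "base"},          # listed first
    {"item_id": 2, "format": "4:4:4", "role": "enhancement"},
]

legacy_choice = pick_alternative(altr_group, {"4:2:0"})
modern_choice = pick_alternative(altr_group, {"4:2:0", "4:2:2", "4:4:4"})
```

A legacy player in this model ends up rendering the base layer item, while a player that understands the enhancement item selects the 4:4:4 alternative.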
In another embodiment, a system according to the foregoing embodiments may cascade operations, generating a succession of enhancement layer images. For example, an ‘altr’ group can have an image at 4:4:4 resolution, another image at 4:2:2 resolution, and a further image at 4:2:0 resolution. In a cascaded operation, the 4:4:4 version may be derived from the 4:2:2 version, and the 4:2:2 version may be derived from the 4:2:0 version. In the context of the
In a further embodiment, other features, such as HDR, could be included. For example, an ‘altr’ group can have 4:4:4 HDR, 4:2:0 HDR, and 4:2:0 SDR (primary) images, or can have 4:4:4 HDR, 4:4:4 SDR, and 4:2:0 SDR images. In the first case, the HDR enhancement image is derived first, followed by the 4:4:4 generation, while in the second case the 4:4:4 SDR enhancement image is derived first and the HDR one is derived thereafter. Closed-loop conversions for all these cases can be employed at the encoder end for improved performance.
The foregoing discussion has described the various embodiments of the present disclosure in the context of coding systems, decoding systems and process flows that they employ. In practice, these systems may be applied in a variety of devices, such as mobile devices provided with integrated video cameras (e.g., camera-enabled phones, entertainment systems and computers) and/or wired communication systems such as videoconferencing equipment and camera-enabled desktop computers. In some applications, the functional blocks described hereinabove may be provided as elements of an integrated software system, in which the blocks may be provided as elements of a computer program, which are stored as program instructions in memory and executed by a general processing system. In other applications, the functional blocks may be provided as discrete circuit components of a processing system, such as functional units within a digital signal processor or application-specific integrated circuit. Still other applications of the present invention may be embodied as a hybrid system of dedicated hardware and software components. Moreover, the functional blocks described herein need not be provided as separate elements. For example, although
Further, the figures illustrated herein have provided only so much detail as necessary to present the subject matter of the present invention. In practice, video coders and decoders typically will include functional units in addition to those described herein, including buffers to store data throughout the coding pipelines illustrated and communication transceivers to manage communication with the communication network and the counterpart coder/decoder device. Such elements have been omitted from the foregoing discussion for clarity.
Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
The present application claims benefit from priority of U.S. application Ser. No. 63/519,306, entitled “Techniques For Providing Chroma Format Scalability In Image Processing Applications” and filed Aug. 14, 2023, the disclosure of which is incorporated herein in its entirety.
| Number | Date | Country |
|---|---|---|
| 63519306 | Aug 2023 | US |