Current digital image and video coding systems typically process video data with uniform fidelity (meaning the sampled pixels are equally spaced) and with the same color format, bit-depth, color gamut, etc. However, there are situations where non-uniform fidelity is preferred.
Although a scalable video coding system could be used to support coding of video data with non-uniform fidelity by coding different portions of the video data with different fidelity characteristics in different enhancement layers, such techniques would have a number of drawbacks.
For example, more layers means more overhead, and the use of multiple layers to carry image data of different fidelities would result in higher-bit-rate coding, even if coded data were forced to skip mode in areas that did not carry data of the relevant fidelity. Further, encoding/decoding entire frames at multiple layers requires more memory and processing cycles. As further example drawbacks, modern scalable video coding standards do not support color format scalability, and boundaries between image areas having different fidelities would have to be aligned to coding blocks of the different layers. In addition, quality disruption would occur at boundaries between image areas having different fidelities, which may cause unpleasant visual effects when a low number of enhancement layers is used.
Accordingly, the inventors perceive a need in the art for a coding system that codes images with non-uniform fidelity regions by single layer coding.
Embodiments of the present disclosure provide techniques for non-uniform digital image fidelity and video coding. According to these techniques, a plurality of fidelity regions within an image may be identified. Each fidelity region may be associated with a fidelity characteristic. Video encoding may be performed for each pixel block of the image. The video encoding for each pixel block may include determining whether image data of a fidelity region neighboring the pixel block's fidelity region is a candidate for prediction. If so, content of the neighboring fidelity region may be interpolated using the fidelity characteristic of the pixel block. Subsequently, the pixel block may be predictively encoded using interpolated content.
As an example, a video coder may define multiple fidelity regions in different spatial areas of a video sequence, each of which may have different fidelity characteristics. The coder may code the different representations in a common video sequence. Where prediction data crosses boundaries between the regions, interpolation may be performed to create like-kind representations between the prediction data and the video content being coded.
The fidelity converter 110 may analyze input video and assign different fidelity characteristics to different spatial regions of the input video. The fidelity characteristics of a region may include respective definitions of characteristics that are useful to represent image content of the region, such as pixel density, color format, bit-depth or color gamut. Thus, where one region may have a 4:4:4 color format assigned to it, another region may have a 4:2:0 or 4:2:2 format assigned to it. Similarly, one region may utilize 16-bit assignments for color bit depth where another region may have 8- or 10-bit bit depths. Still further, one region may have a BT.2020 color gamut to represent image data where another region may utilize a BT.709 color gamut.
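Per-region fidelity assignments of this kind might be represented as follows. This is a minimal sketch only; the class name, field names, and the particular region labels are illustrative assumptions, not terminology from the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FidelityCharacteristics:
    color_format: str   # e.g., "4:4:4", "4:2:2", "4:2:0"
    bit_depth: int      # e.g., 8, 10, or 16 bits per color sample
    color_gamut: str    # e.g., "BT.2020" or "BT.709"

# Hypothetical assignment: a high-priority region receives richer
# fidelity characteristics than a lower-priority region.
regions = {
    "foreground": FidelityCharacteristics("4:4:4", 16, "BT.2020"),
    "background": FidelityCharacteristics("4:2:0", 8, "BT.709"),
}
```

A converter in the role of fidelity converter 110 might populate such a mapping from its content analysis, one entry per identified region.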
Fidelity regions may be defined based on content analysis performed across video data (or a portion thereof) that prioritizes image content and estimates the coding quality that is likely to arise from different fidelity representations. For example, prioritization may be performed based on region of interest (ROI) detection that identifies human faces or other foreground objects from video content. ROI detection also may be performed by foreground/background discrimination processes, field of focus estimation in virtual/augmented reality (VR/AR), or estimation of object motion within image data. Another example is screen content coding, in which case higher fidelity may be assigned to areas such as text and other graphically rendered objects.
Video frames may be parsed into pixel blocks, which represent spatial arrays of those frames. Pixel blocks need not be located wholly within one region or another so, as a consequence, some blocks may have content that belongs to different fidelity regions. Prediction operations, such as motion prediction searches, may employ interpolation (represented by interpolator 150) to convert candidate prediction data stored in the decoded picture buffer 140 to the fidelity characteristics of the pixel block being coded.
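One such conversion, bit-depth alignment, might be sketched as follows. The flat integer sample list and shift-based rescaling are illustrative assumptions; comparable conversions for color format or gamut would involve resampling or gamut mapping and are not shown.

```python
def convert_bit_depth(samples, src_depth, dst_depth):
    """Rescale integer samples from src_depth to dst_depth bits.

    Widening shifts samples up; narrowing rounds to nearest before
    shifting down, so prediction data matches the coded block's depth.
    """
    if dst_depth >= src_depth:
        return [s << (dst_depth - src_depth) for s in samples]
    shift = src_depth - dst_depth
    return [(s + (1 << (shift - 1))) >> shift for s in samples]
```

For example, an 8-bit reference sample of 255 widens to 1020 in a 10-bit domain, and narrowing reverses the mapping.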
In an embodiment, decoded video data from the video decoder 130 may be subject to interpolation (represented by interpolator 190) prior to being stored in the decoded picture buffer 140. Such interpolation may generate a plurality of interpolation regions 142.1-142.n, which may be stored in the decoded picture buffer 140.
Coded video data may be defined using pixel blocks as bases of representation, which represent spatial arrays of corresponding frames. As indicated, pixel blocks need not be located wholly within one region or another so, as a consequence, some blocks may have content that belongs to different fidelity regions. When prediction reference data identifies a portion of a reference frame as a basis of prediction, the interpolator 250 may convert the prediction data stored in the decoded picture buffer 240 to fidelity characteristics of the pixel block being decoded.
In an embodiment, decoded video data from the video decoder 220 may be subject to interpolation (represented by interpolator 290) prior to being stored in the decoded picture buffer 240. Such interpolation may generate a plurality of interpolation regions 252.1-252.n, which may be stored in the decoded picture buffer 240.
With the various fidelity regions thus defined, exchange of coded video may commence. An encoder may code video frames on a pixel-block-by-pixel-block basis. For each pixel block, the method 300 may determine whether image data of neighboring regions are candidates for prediction (box 330) and, if so, the encoder may interpolate content of the neighboring regions using the fidelity characteristics of the pixel block being coded (box 340). Thereafter, the encoder may code the pixel block predictively (box 350) using either reference frame data that already matches the fidelity characteristics of the pixel block being coded or the interpolated content generated at box 340. The encoder may transmit the coded video data to the decoder (msg. 360).
At the decoder, the decoder may analyze prediction references within the coded pixel block data to determine whether there is a mismatch between fidelity characteristics of reference frame data that will serve as prediction data for the pixel block and fidelity characteristics of the pixel block itself (box 370). If so, the decoder may convert content of the reference pixel block to the fidelity domain of the coded pixel block (box 380). Such conversion, of course, is unnecessary if the prediction data matches the fidelity characteristics of the pixel block being decoded. Thereafter, the decoder may decode the coded pixel block using the prediction data (box 390).
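The decoder-side steps of boxes 370-390 might be sketched as follows. The flat sample lists, the `convert` callback, and the function name are illustrative assumptions rather than elements of the disclosure.

```python
def decode_block(residual, prediction, block_fid, ref_fid, convert):
    """Hypothetical decode step for one coded pixel block.

    If the reference data's fidelity characteristics mismatch those of
    the coded block (box 370), convert the reference into the block's
    fidelity domain (box 380); then reconstruct by adding the decoded
    residual to the prediction (box 390).
    """
    if ref_fid != block_fid:
        prediction = convert(prediction, ref_fid, block_fid)
    return [p + r for p, r in zip(prediction, residual)]
```

When the fidelities already match, the conversion is skipped and the prediction data is used directly, mirroring the text above.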
Fidelity regions may be defined in a variety of ways. Where pixel density varies among regions, the positions of pixels in each region may be explicitly described in a binary map, which may be compressed losslessly. The map may identify pixel locations using locations of pixels in the master image as a basis for comparison. The map may be signaled per frame or only when a change happens.
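As one hypothetical instance of such lossless compression, the binary map could be run-length encoded; the function name and (bit, run-length) pair representation are assumptions for illustration only.

```python
def run_length_encode(bits):
    """Losslessly compress a binary sample map as (bit, run-length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([b, 1])       # start a new run
    return [(b, n) for b, n in runs]
```

Because the encoding is lossless, a decoder can expand the runs to recover the exact pixel-position map.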
Alternatively, pixel density information may be described as a function of spatial offsets (x, y) with regard to the top left corner of the master image:
In another embodiment, interval distances between two adjacent sample pixels (Interval_x and Interval_y, for example) may be represented, again, in pixel increments of the master image. In addition, an initial re-sampled pixel position may be defined relative to the top-left corner of the original image. Again, this information may be signaled per frame or only when changed.
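The interval-based signaling above might be expanded into sample positions as sketched below, assuming master-image coordinates with the origin at the top-left corner; the function and parameter names are illustrative.

```python
def sampled_positions(init_x, init_y, interval_x, interval_y, width, height):
    """Enumerate re-sampled pixel positions in master-image coordinates.

    Starts at the signaled initial position (init_x, init_y) and steps by
    the signaled intervals, scanning left-to-right, top-to-bottom.
    """
    return [(x, y)
            for y in range(init_y, height, interval_y)
            for x in range(init_x, width, interval_x)]
```

For a 4x4 master image with both intervals equal to 2 and an origin start, this yields the familiar quarter-density sampling grid.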
Another way of signaling the density is to partition the frame into multiple tiles or slices, with each one covering one density. Different tiles/slices may overlap each other, as shown in the example of
In the example of
As illustrated, the regions 410-440 may overlap each other spatially. Where overlap occurs between regions, the region having highest fidelity (e.g., highest pixel density, highest bit depth, etc.) may be taken to govern in the region of overlap.
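The overlap rule above might be sketched as follows; the dictionary representation and the particular ranking by (pixel density, bit depth) are illustrative assumptions about how "highest fidelity" could be ordered.

```python
def governing_region(regions_at_point):
    """Pick the region that governs where regions overlap spatially:
    the one with the highest fidelity, ranked here first by pixel
    density and then by bit depth."""
    return max(regions_at_point,
               key=lambda r: (r["pixel_density"], r["bit_depth"]))
```

Applied to a point covered by two regions of equal density but different bit depths, the deeper representation would govern.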
As indicated, pixel block boundaries need not align with region boundaries. Accordingly, pixel blocks may contain image data with non-uniform fidelity characteristics. As indicated, interpolation of image content may be performed to develop prediction data that matches the fidelity characteristics of the pixel blocks being coded.
As an example, a pixel block 450 may be identified in the frame 400 and located within the region 430. An area 455 may be identified as a candidate for prediction with respect to the pixel block 450. Notably, the candidate area 455 is found within the region 420 neighboring the region 430. Therefore, the frame 400 may be encoded by interpolating content of the region 420 using the fidelity characteristics of the pixel block 450. The pixel block 450 may be predictively coded using the interpolated content.
Conversely, a pixel block 460 may also be within the region 430. An area 465 may be identified as a prediction candidate with respect to pixel block 460. However, in this case, the candidate area 465 is also within the region 430 with the pixel block 460. Thus, the pixel block 460 may be predictively coded using reference frame data that already matches the fidelity characteristic of the pixel block 460.
Other processes may be performed for coding pixel blocks. To perform transform coding (for example, conversion from pixel residuals to discrete cosine transform coefficients), a non-uniform residual block may either be padded with additional residual values to create a pixel block with a uniform density of coefficients, or it may be partitioned into sub-blocks with a uniform density of residuals. For example,
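The padding alternative might be sketched as follows, assuming the non-uniform residual block is held as a sparse mapping from (x, y) position to residual value; the representation and zero fill value are illustrative assumptions.

```python
def pad_residuals(sparse, width, height, fill=0):
    """Fill unsampled positions of a non-uniform residual block so the
    transform operates on a uniformly dense width x height array.

    `sparse` maps (x, y) positions to residual values; positions absent
    from the map receive the padding value.
    """
    return [[sparse.get((x, y), fill) for x in range(width)]
            for y in range(height)]
```

The partitioning alternative would instead split the block into sub-blocks whose residuals are already uniformly dense, avoiding padding at the cost of smaller transforms.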
The foregoing discussion has described operation of the embodiments of the present disclosure in the context of video coders and decoders. Commonly, these components are provided as electronic devices. Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones or computer servers. Such computer programs typically are stored in physical storage media such as electronic-, magnetic- and/or optically-based storage devices, where they are read to a processor and executed. Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
For example, the techniques described herein may be performed by a central processor of a computer system.
As indicated, the memory 620 may store program instructions that, when executed, cause the processor to perform the techniques described hereinabove. The memory 620 may store the program instructions on electrical-, magnetic- and/or optically-based storage media.
The system 600 may possess other components as may be consistent with the system's role as an image source device, an image sink device or both. Thus, in a role as an image source device, the system 600 may possess one or more cameras 630 that generate the multi-view video. The system 600 also may possess a coder 640 to perform video coding on the video and a transmitter 650 (shown as TX) to transmit data out from the system 600. The coder 640 may be provided as a hardware device (e.g., a processing circuit separate from the central processor 610) or it may be provided in software as an application 614.1.
In a role as an image sink device, the system 600 may possess a receiver 650 (shown as RX), a decoder 680, a display 660 and user interface elements 670. The receiver 650 may receive data and the decoder 680 may decode the data. The display 660 may be a display device on which content of the view window is rendered. The user interface 670 may include component devices (such as motion sensors, touch screen inputs, keyboard inputs, remote control inputs and/or controller inputs) through which operators input data to the system 600.
Several embodiments of the present disclosure are specifically illustrated and described herein. However, it will be appreciated that modifications and variations of the present disclosure are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the disclosure.
This application benefits from priority of application Ser. No. 62/347,915, filed Jun. 9, 2016 and entitled “Non-Uniform Digital Image Fidelity and Video Coding,” the disclosure of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62347915 | Jun 2016 | US