The present disclosure relates generally to screen content coding, and more particularly, to advanced screen content coding with improved color (palette) table and index map coding.
Screen content coding creates new challenges for video compression because of its distinct signal characteristics compared to conventional video signals. There are multiple existing techniques for advanced screen content coding, e.g., pseudo string match, color palette coding, and intra motion compensation or intra block copy. Among these techniques, pseudo string match shows the highest gain for lossless coding, but with significant complexity overhead and difficulties in lossy coding mode. Color palette coding was developed for screen content under the assumption that non-camera captured content (e.g., computer-generated content) typically contains a limited number of distinct colors, rather than the continuous or near-continuous color tones found in many video sequences. Even though the pseudo string match and color palette coding methods showed great potential, intra motion compensation or intra block copy was adopted into the working draft (WD) version 4 and reference software of the on-going High Efficiency Video Coding (HEVC) range extension for screen content coding. However, the coding performance of intra block copy is bounded because of its fixed block decomposition. Performing block matching (similar to motion estimation, but within a picture) also significantly increases the encoder complexity in both computation and memory access.
According to one embodiment, there is provided a method for screen content encoding. The method includes deriving a color index map based on a current coding unit (CU). The method also includes encoding the color index map, wherein at least a portion of the color index map is encoded using a first coding technique, wherein a first indicator indicates a significant distance of the first coding technique. The method further includes combining the encoded color index map and the first indicator for transmission to a receiver.
According to another embodiment, there is provided a method for screen content decoding. The method includes receiving a video bitstream comprising a color index map. The method also includes receiving a first indicator. The method further includes decoding at least a portion of the color index map using a first decoding technique, wherein the first indicator indicates a significant distance of the first decoding technique. In addition, the method includes reconstructing pixels associated with a current coding unit (CU) based on the color index map.
Other embodiments include apparatuses configured to perform these methods.
For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, wherein like numbers designate like objects, and in which:
The following documents and standards descriptions are hereby incorporated into the present disclosure as if fully set forth herein:
T. Lin, S. Wang, P. Zhang, K. Zhou, “AHG7: Full-chroma (YUV444) dictionary+hybrid dual-coder extension of HEVC”, JCT-VC Document, JCTVC-K0133, Shanghai, China, October 2012 (hereinafter “REF1”);
W. Zhu, J. Xu, W. Ding, “RCE3 Test 2: Multi-stage Base Color and Index Map”, JCT-VC Document, JCTVC-N0287, Vienna, Austria, July 2013 (hereinafter “REF2”);
L. Guo, M. Karczewicz, J. Sole, “RCE3: Results of Test 3.1 on Palette Mode for Screen Content Coding”, JCT-VC Document, JCTVC-N0247, Vienna, Austria, July 2013 (hereinafter “REF3”);
L. Guo, M. Karczewicz, J. Sole, R. Joshi, “Non-RCE3: Modified Palette Mode for Screen Content Coding”, JCT-VC Document, JCTVC-N0249, Vienna, Austria, July 2013 (hereinafter “REF4”);
D.-K. Kwon, M. Budagavi, “RCE3: Results of test 3.3 on Intra motion compensation”, JCT-VC Document, JCTVC-N0205, Vienna, Austria, July 2013 (hereinafter “REF5”);
C. Pang, J. Sole, L. Guo, M. Karczewicz, R. Joshi, “Non-RCE3: Intra Motion Compensation with 2-D MVs”, JCT-VC Document, JCTVC-N0256, Vienna, Austria, July 2013 (hereinafter “REF6”);
C. Pang, J. Sole, L. Guo, M. Karczewicz, R. Joshi, “Non-RCE3: Pipeline Friendly Intra Motion Compensation”, JCT-VC Document, JCTVC-N0254, Vienna, Austria, July 2013 (hereinafter “REF7”);
D. Flynn, J. Sole and T. Suzuki, “Range Extension Draft 4”, JCTVC-L1005, August 2013 (hereinafter “REF8”); and
H. Yu, K. McCann, R. Cohen, and P. Amon, “Draft call for proposals for coding of screen content and medical visual content”, ISO/IEC JTC1/SC29/WG11 N13829, July 2013 (hereinafter “REF9”).
Embodiments of this disclosure provide an advanced screen content coding process with improved palette table and index map coding. The disclosed embodiments significantly outperform the current version of High-Efficiency Video Coding (HEVC Version 2). The disclosed embodiments include multiple algorithms that are specifically for coding screen content. These algorithms include pixel representation using a palette table (or equivalently, color table), palette table compression, color index map compression, string match, and residual compression. The embodiments disclosed herein are developed, harmonized, and integrated with the HEVC Range Extension (RExt) as future HEVC extensions to support efficient screen content coding. However, these embodiments could additionally or alternatively be implemented with existing video standards or any other suitable video standards. For ease of explanation, HEVC RExt is used herein as an example to describe the various embodiments. Similarly, HEVC RExt software is used to implement the various embodiments to showcase the compression efficiency.
The transmitter 100 is configured to perform a high-efficiency color palette compression (CPC) process that can be performed on each coding unit (CU) or coding tree unit (CTU) in a bitstream. As shown in
A palette table creating block 103 uses the CU 101 to derive or generate a palette table (sometimes referred to as a color table). An example palette table 303 is shown in
Based on the derived palette table 303, a color classifier block 105 uses the CU 101 to assign the colors or pixel values of the CU 101 into the color index map 311 and one or more prediction residual maps 313. A table encoding block 107 receives the palette table 303 and encodes the entries in the palette table 303. An index map encoding block 109 encodes the color index map 311 created by the color classifier block 105. These operations are described in greater detail below.
A residual encoding block 111 encodes each prediction residual map 313 created by the color classifier block 105. In some embodiments, the residual encoding block 111 performs adaptive fixed-length or variable-length residual binarization, as indicated at 321 in
Turning to
Although
Based on the derived palette table 303, each pixel in the original CU 101 can be converted to its color index within the palette table 303. Embodiments of this disclosure provide methods to efficiently compress the palette table 303 and the color index map 311 (described below) for each CU 101 into the stream. At the receiver side, the compressed bitstream can be parsed to reconstruct, for each CU 101, the complete palette table 303 and the color index map 311, and then further derive the pixel value at each position by combining the color index and palette table.
For simplicity, sequences of 4:4:4 are used in the disclosure. For 4:2:2 and 4:2:0 videos, chroma upsampling could be applied to obtain the 4:4:4 sequences, or each chroma component 402-404 could be processed independently. In the case of 4:0:0 monochrome videos, these can be treated as an individual plane of 4:4:4 without the other two planes. All methods for 4:4:4 can be applied directly.
The color components 402-404 can be interleaved together in a packing process, resulting in the packed CU 401. In an embodiment, a flag called enable_packed_component_flag is defined for each CU 101 to indicate whether the CU 101 is processed using packed mode (thus resulting in the CU 401) or conventional planar mode (i.e., the G, B, R or Y, U, V components 402-404 are processed independently).
Both packed mode and planar mode can have advantages and disadvantages. For instance, planar mode supports parallel color component processing for G/B/R or Y/U/V. However, planar mode may result in low coding efficiency. Packed mode can share the header information (such as the palette table 303 and color index map 311) for the CU 101 among different color components. However, packed mode might prevent multiple color components from being processed simultaneously or in a parallel fashion. One simple method to decide whether the current CU 101 should be encoded in the packed mode is to measure the rate distortion (R-D) cost.
The enable_packed_component_flag is used to explicitly signal the encoding mode to the decoder. In addition to defining the enable_packed_component_flag at the CU level for low-level handling, the flag can be duplicated in the slice header or even the sequence level (e.g., the Sequence Parameter Set or Picture Parameter Set) to allow slice level or sequence level handling, depending on the specific application requirement.
Palette Table and Index Map Derivation
The following describes operations at the palette table creating block 103 and the table encoding block 107 in
A new hash based palette table derivation will now be described, which can be used to efficiently determine the major colors and reduce error. For each CU 101, the palette table creating block 103 examines the color value of each pixel in the CU 101 and creates a color histogram using the three color components together, i.e., packed G, B, R or packed Y, Cb, Cr according to the frequency of occurrence of each color in descending order. To represent each 24-bit color, the G and B color components (or Y and Cb color components) can be bit-shifted accordingly. That is, each packed color can be represented according to a value (G<<16)+(B<<8)+(R) or (Y<<16)+(Cb<<8)+(Cr), where <<x is a left bit shift operation. The histogram is sorted according to the frequency of color occurrence in descending order.
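The packing and histogram step can be illustrated with a minimal Python sketch. The function names pack_color and color_histogram are hypothetical, and breaking frequency ties by the smaller packed value is an assumption made here only to keep the output deterministic.

```python
from collections import Counter

def pack_color(g, b, r):
    """Pack three 8-bit components into one 24-bit value: (G<<16)+(B<<8)+R."""
    return (g << 16) + (b << 8) + r

def color_histogram(pixels):
    """Return (packed_color, count) pairs in descending order of frequency.

    Ties are broken by ascending packed value (an assumption of this sketch).
    """
    counts = Counter(pack_color(g, b, r) for g, b, r in pixels)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# A toy CU flattened into a list of (G, B, R) pixels.
cu_pixels = [(0, 0, 192)] * 3 + [(0, 0, 240)] * 2 + [(1, 2, 3)]
histogram = color_histogram(cu_pixels)
```

The same packing applies to (Y, Cb, Cr) triples without modification.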
For lossy coding, the palette table creating block 103 then applies a hash-based neighboring color grouping process on the histogram-ordered color data to obtain a more compact palette table representation. For each color component, the least significant X bits (depending on quantization parameter (QP)) are cleared and a corresponding hash representation is generated using a hash function (G>>X<<(16+X))|(B>>X<<(8+X))|(R>>X<<X) or (Y>>X<<(16+X))|(Cb>>X<<(8+X))|(Cr>>X<<X), where >>x is a right bit shift operation, and X is determined based on QP. A hash table or, alternatively, a binary search tree (BST) data structure is used to quickly find colors having the same hash value. For any two hash values, their distance is defined as the maximum absolute difference of the corresponding color components.
During neighboring color grouping, the palette table creating block 103 processes packed colors in descending order of the frequency of occurrence, until N colors have been processed. If the number of colors in the current CU is smaller than N, then all colors in the current CU are processed. N is bounded by a predetermined maximum number of colors (max_num_of_colors). In some embodiments, max_num_of_colors=128, i.e., N<=128. After hash based color grouping, the N chosen colors (or all colors in the case that the number of colors in the current CU is smaller than N), are then reordered by sorting the colors in ascending order based on the value of each packed color. The result is a palette table such as the palette table 303 shown in
When the number of colors represented in the CU 101 is greater than the number of colors N in the palette table 303, the less-frequently occurring colors are arranged as residuals outside of the palette table 303. For example, the color values 49, 53, 50, and 51 are part of the palette table 303, while the color values 48, 52, 47, 54, 55, and 56 are residual colors 305 outside of the palette table 303.
The derivation of the palette table 303, as performed by the palette table creating block 103, can be described by the following pseudo-code.
In the pseudo-code above, ComputeHash(C, QP) applies the hash function (G>>X<<(16+X))|(B>>X<<(8+X))|(R>>X<<X) or (Y>>X<<(16+X))|(Cb>>X<<(8+X))|(Cr>>X<<X) to generate the hash value, where X is dependent on QP. Dist(hash1, hash2) obtains the maximum absolute difference of the corresponding color components in hash1 and hash2. Here, hash table and binary search tree data structures are utilized to quickly find the colors satisfying a certain condition based on their hash values.
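A minimal Python sketch of the hash and distance functions follows, together with one possible grouping policy. The mapping from QP to X is not given in this description, so x is passed in directly. The grouping rule in derive_palette (merge every color into the most frequent color that shares its hash, stop opening new groups after max_colors) is a simplifying assumption, not the exact procedure of the disclosure.

```python
def compute_hash(packed, x):
    """Clear the x least significant bits of each 8-bit component."""
    g, b, r = (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF
    return (g >> x << (16 + x)) | (b >> x << (8 + x)) | (r >> x << x)

def dist(hash1, hash2):
    """Maximum absolute difference over the three color components."""
    return max(abs(((hash1 >> s) & 0xFF) - ((hash2 >> s) & 0xFF))
               for s in (16, 8, 0))

def derive_palette(histogram, x, max_colors=128):
    """histogram: (packed_color, count) pairs in descending frequency order."""
    groups = {}  # hash value -> [representative packed color, total count]
    for packed, count in histogram:
        h = compute_hash(packed, x)
        if h in groups:
            groups[h][1] += count          # join an existing group
        elif len(groups) < max_colors:
            groups[h] = [packed, count]    # most frequent color represents it
    # Reorder the chosen colors in ascending order of packed value.
    return sorted(rep for rep, _ in groups.values())

palette = derive_palette([(192, 3), (240, 2), (193, 1)], x=2)
```

In this toy input, the colors 192 and 193 share a hash when x=2, so only 192 and 240 remain in the table.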
As discussed above, based on the derived palette table 303, the color classifier block 105 uses the CU 101 to assign the colors or pixel values of the CU 101 into the color index map 311 and one or more prediction residual maps 313. That is, the color classifier block 105 assigns each color in the palette table 303 to a color index within the palette table 303. For example, as indicated at 307 in
For a planar CU, each color component can have its own individual palette table, such as colorTable_Y, colorTable_U, colorTable_V or colorTable_R, colorTable_G, colorTable_B. In some embodiments, the palette table for a major component can be derived, such as Y in YUV or G in GBR, and this table can be shared for all components. Typically, when a shared Y or G palette table is used, color components other than Y or G have some mismatch between the original pixel colors and the colors in the shared palette table. The residual engine (such as HEVC coefficients coding methods) can then be applied to encode those mismatched residuals. In other embodiments, for a packed CU, a single palette table can be shared among all components.
The following pseudo code exemplifies the palette table and index map derivation.
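A minimal Python sketch of index map derivation is given here for illustration. The name derive_idx_map and the error metric (sum of absolute component differences) are assumptions of this sketch. Each pixel receives the index of its closest palette entry, and the difference is kept as a residual.

```python
def derive_idx_map(cu_pixels, palette):
    """Return (index map, residual map) for a 2D array of color triples."""
    idx_map, residual_map = [], []
    for row in cu_pixels:
        idx_row, res_row = [], []
        for pixel in row:
            # Assumed metric: sum of absolute component differences.
            best = min(range(len(palette)),
                       key=lambda i: sum(abs(p - c)
                                         for p, c in zip(pixel, palette[i])))
            idx_row.append(best)
            res_row.append(tuple(p - c for p, c in zip(pixel, palette[best])))
        idx_map.append(idx_row)
        residual_map.append(res_row)
    return idx_map, residual_map

def reconstruct(idx_map, residual_map, palette):
    """Receiver side: palette color plus residual gives the pixel value."""
    return [[tuple(c + d for c, d in zip(palette[i], res))
             for i, res in zip(idx_row, res_row)]
            for idx_row, res_row in zip(idx_map, residual_map)]

palette = [(0, 0, 192), (0, 0, 240)]
cu = [[(0, 0, 192), (0, 0, 241)],
      [(0, 0, 240), (0, 0, 190)]]
idx_map, residual_map = derive_idx_map(cu, palette)
```

With lossless residual coding, reconstruct(idx_map, residual_map, palette) returns the original CU.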
Palette Table Processing
For each CU 101, the transmitter 100 can derive the palette table 303 from the current CU 101 (referred to as explicit palette table carriage) or the transmitter 100 can derive the palette table 303 from a left or upper neighbor of the current CU 101 (referred to as implicit palette table carriage). The table encoding block 107 receives the palette table 303 and encodes the entries in the palette table 303.
Palette table processing involves the encoding of the size of the palette table 303 (i.e., the total number of distinct colors) and each color itself. The majority of the bits are consumed by the encoding of each color in the palette table 303. Hence, the focus will be placed on the color encoding (i.e., the encoding of each entry in the palette table 303).
The most straightforward method to encode the colors in a palette table is using a pulse code modulation (PCM) style algorithm, where each color is coded independently. Alternatively, the nearest prediction for successive color can be applied, and then the prediction delta can be encoded rather than the default color intensity, which is the so-called DPCM (differential PCM) style. Both methods can later be entropy encoded using an equal probability model or adaptive context model, depending on the trade-off between complexity costs and coding efficiency.
Embodiments of this disclosure provide another advanced scheme, called Neighboring Palette Table Merge, where a color_table_merge_flag is defined to indicate whether the current CU (e.g., the CU 101) uses the palette table associated with its left CU neighbor or its upper CU neighbor. If not, the current CU carries the palette table signaling explicitly. This process may also be referred to as neighboring palette table sharing. With this merging process, a color_table_merge_direction flag indicates the merging direction, which is either from the upper CU or from the left CU. Of course, the merging direction candidates could be in directions other than the upper CU or left CU (e.g., upper-left, upper-right, and the like). However, the upper CU and left CU are used in this disclosure to exemplify the concept. Each pixel in the current CU is compared with the entries in the existing palette table associated with the left CU or upper CU and assigned an index yielding the least prediction difference (i.e., pixel subtracts the closest color in the palette table) via the deriveIdxMap( ) pseudo code shown above. For the case where the prediction difference is non-zero, all of the residuals are encoded using the HEVC Range Extension (RExt) residual engine. The decision of whether or not to use the table merging process can be determined by the R-D cost.
When a color table is carried explicitly in the bit stream, it can be coded sequentially for each color component. Inter-table palette stuffing or intra-table color DPCM is applied as described below to code each entry sequentially for all three color components.
Inter-Table Palette Stuffing
Even when the palette table sharing method is not used, there may still exist colors that are common between the palette table 303 and the palette predictor. Therefore, applying an inter-table palette stuffing technique entry-by-entry can further improve coding efficiency. Here, the palette predictor is derived from a neighboring block, such as a left neighbor CU or an upper neighbor CU.
Let c(i) and r(j) represent the i-th entry in the current palette table 553 and the j-th entry in the palette predictor 551, respectively. It is noted again that each entry contains three color components (GBR, YCbCr, or the like). For each color entry c(i), i<=N, in the current table 553, the table encoding block 107 finds an identical match r(j) from the palette predictor 551. Instead of signaling c(i), j is encoded predictively. The predictor is determined as the smallest index k that is greater than the previously reconstructed j and that satisfies r(k)[0]>=c(i−1)[0]. The prediction difference (j−k) is signalled in the bitstream. Since the difference (j−k) is non-negative, no sign bit is needed.
It is noted that either a context adaptive model or a bypass model can be used to encode (j−k), as known in the art. Typically, a context adaptive model is used for high efficiency purposes while a bypass model is used for high-throughput and low-complexity requirements. In some embodiments of this disclosure, two context adaptive models can be used to encode the index prediction difference (j−k), using a dynamic truncated unary binarization scheme.
Intra-Table Color DPCM
If no match is found in the palette predictor 551 for the i-th entry in the current palette table 553, the value of the previous entry (the (i-1)th entry) is subtracted from the value of the i-th entry and the absolute difference (|d(i)|) is encoded using color DPCM for each component. In general, intra-table color DPCM produces fewer bits, which consist of the absolute predictive difference and a sign bit. Either a context adaptive model or a bypass model can be used to encode the absolute predictive difference and the associated sign bit, as known in the art. In addition, the sign bit could be hidden or not coded for some cases. For example, given that the current palette table 553 is already ordered in ascending order, the Y (or G) component difference doesn't require a sign bit at all. Likewise, the Cb (or B) component difference doesn't need the sign bit if the corresponding Y (or G) difference is zero. Furthermore, the Cr (or R) component difference doesn't need the sign bit if both the Y (or G) and Cb (or B) differences are zeros. As another example, the sign bit can be hidden if the absolute difference is zero. As yet another example, the sign bit can be hidden if the following boundary condition is satisfied: c[i-1]−|d(i)|<0 or c[i-1]+|d(i)|>255.
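The sign-hiding rules can be collected into one small Python sketch. The function name dpcm_entry and the convention that a sign value of 1 means a negative difference are assumptions made for illustration. A sign of None means the decoder can infer it.

```python
def dpcm_entry(prev, cur):
    """Return [(abs_diff, sign_or_None)] per component for entry cur.

    prev and cur are (G, B, R) or (Y, Cb, Cr) triples from a palette table
    sorted in ascending order of packed value.
    """
    out = []
    higher_all_zero = True  # all more significant differences are zero so far
    for p, c in zip(prev, cur):
        d = c - p
        a = abs(d)
        hidden = (
            a == 0                # zero difference: no sign needed
            or higher_all_zero    # ascending order forces a non-negative diff
            or p - a < 0          # a negative diff would leave the 8-bit range
            or p + a > 255        # a positive diff would leave the 8-bit range
        )
        out.append((a, None if hidden else (1 if d < 0 else 0)))
        higher_all_zero = higher_all_zero and d == 0
    return out

example = dpcm_entry((10, 200, 100), (12, 150, 130))
```

In this example only the second and third components carry a sign, because the first component difference is non-zero and neither boundary condition applies.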
For the first entry c(0) of the current table 553, if the inter-table palette stuffing technique is not used, each component of c(0) can be encoded using a fixed 8-bit bypass context model. Additionally or alternatively, it could be encoded using an adaptive context model to further improve the performance.
To better illustrate the inter-table palette stuffing and intra-table color DPCM techniques, an example using the data in the current palette table 553 will now be described.
Starting from the first entry c(0) of the current palette table 553, i.e., (G, B, R)=(0, 0, 192), it can be seen that c(0) does not have a match in the palette predictor 551; therefore, c(0) is encoded independently. The second entry c(1) of the current palette table 553, (G, B, R)=(0, 0, 240), also does not have a match in the palette predictor 551. Given that the first entry c(0) has already been coded, only the prediction difference between c(1) and c(0) should be carried in the bitstream, i.e., (0, 0, 240)−(0, 0, 192)=(0, 0, 48). For the third entry c(2) of the current table 553, an exact match is identified in the palette predictor 551 where j=1. The predictive index using the previously coded color entry is 0; therefore, only (1−0)=1 needs to be encoded. These coding techniques are applied until the last entry of the current table 553 (i.e., idx=12 in
The explicit coding of the color table is summarized in the following pseudo code, where N and M are the number of entries in current and reference color table, respectively.
The explicit decoding of the color table is summarized in the following pseudo code.
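A minimal Python sketch of explicit palette table coding and decoding is given below for illustration. The symbol names ("raw", "dpcm", "match"), the helper predict_k, and the predictor contents in the usage lines are hypothetical. Entropy coding is omitted. The sketch assumes that both tables are sorted in ascending order, that previous j starts at -1, and that the entry before c(0) is treated as (0, 0, 0).

```python
def predict_k(pred, prev_j, prev_first):
    """Smallest k > prev_j with pred[k][0] >= prev_first."""
    k = prev_j + 1
    while k < len(pred) and pred[k][0] < prev_first:
        k += 1
    return k

def encode_table(cur, pred):
    symbols, prev_j = [], -1
    for i, c in enumerate(cur):
        prev = cur[i - 1] if i > 0 else (0, 0, 0)
        if c in pred:                       # inter-table palette stuffing
            j = pred.index(c)
            symbols.append(("match", j - predict_k(pred, prev_j, prev[0])))
            prev_j = j
        elif i == 0:                        # first entry coded independently
            symbols.append(("raw", c))
        else:                               # intra-table color DPCM
            symbols.append(("dpcm", tuple(a - b for a, b in zip(c, prev))))
    return symbols

def decode_table(symbols, pred):
    cur, prev_j = [], -1
    for kind, val in symbols:
        prev = cur[-1] if cur else (0, 0, 0)
        if kind == "match":
            prev_j = predict_k(pred, prev_j, prev[0]) + val
            cur.append(pred[prev_j])
        elif kind == "raw":
            cur.append(val)
        else:
            cur.append(tuple(a + b for a, b in zip(prev, val)))
    return cur

pred_table = [(0, 0, 0), (0, 0, 255), (16, 16, 16)]   # made-up predictor
cur_table = [(0, 0, 192), (0, 0, 240), (0, 0, 255), (16, 16, 16)]
symbols = encode_table(cur_table, pred_table)
```

As in the worked example above, the third entry matches predictor index j=1 with predicted index k=0, so only the difference 1 is carried.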
There are several methods to generate the neighboring palette tables for use in the merging process in coding the current CU. Depending on the implementation, one of the methods (referred to as Method A for ease of explanation) requires updating at both the encoder and the decoder. Another method (referred to as Method B) is an encoder side only process. Both methods will now be described.
Method A: In this method, the palette tables of neighbor CUs are generated upon the available reconstructed pixels, regardless of CU depth, size, etc. For each CU, the reconstructions are retrieved for its neighboring CU at the same size and same depth (assuming the color similarity would be higher in this case).
Method B: In this method, the merging process occurs when a current CU shares the same size and depth as its upper CU neighbor and/or its left CU neighbor. The palette tables of the available neighbors are used to derive the color index map of the current CU for subsequent operations. For example, for a current 16×16 CU, if its neighboring CU (i.e., either its upper neighbor or its left neighbor) is encoded using the palette table and index method, the palette table of the neighboring CU is used for the current CU to derive the R-D cost. This merge cost is compared with the case where the current CU derives its palette table explicitly (as well as other conventional modes that may exist in the HEVC or HEVC RExt). Whichever case produces the lowest R-D cost is selected as the mode to be written into the output bit stream. In Method B, only the encoder is required to simulate different potential modes. At the decoder, the color_table_merge_flag and color_table_merge_direction flag indicate the merge decision and merge direction without requiring additional processing by the decoder.
Predictor Palette
To further reduce the complexity, a predictor palette is used to cache the colors that come from the previously coded palette table or another predictor palette, which eventually comes from the previously coded palette table. In one embodiment, the entries in the predictor palette come from the predictor palette or coded palette table of the left or upper CU of the current CU. After a CU is encoded with a color palette, the predictor palette is updated if this CU size is larger than or equal to the CU size associated with the predictor palette and the current palette is different from the predictor palette. If the current CU is not encoded using the palette mode, there is no change to the predictor palette. This is also referred to as predictor palette propagation. This predictor palette may be reset in the beginning of each picture or slice or each CU row.
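The update rule for predictor palette propagation can be sketched in a few lines of Python. The function name and argument layout are hypothetical, and a palette is represented simply as a list of color triples.

```python
def update_predictor(predictor, predictor_cu_size, cu_palette, cu_size,
                     palette_mode):
    """Return the (predictor palette, associated CU size) after coding a CU."""
    if (palette_mode
            and cu_size >= predictor_cu_size
            and cu_palette != predictor):
        # The just-coded palette becomes the new predictor.
        return list(cu_palette), cu_size
    # Non-palette CU, smaller CU, or identical palette: propagate unchanged.
    return predictor, predictor_cu_size

state = update_predictor([(0, 0, 0)], 8, [(1, 1, 1)], 16, palette_mode=True)
```

Resetting at the start of a picture, slice, or CU row would amount to replacing the state with an empty predictor.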
A number of methods are available for constructing the predictor palette. In a first method, for each CU encoding, the predictor palette is constructed from the predictor palette of its left CU or upper CU. In this method, one predictor palette table is saved for each CU.
A second method is different from the first method in that the palette table, instead of the predictor palette table, associated with the upper CU is used in the prediction process.
Color Index Map Processing/Coding
The index map encoding block 109 encodes the color index map 311 created by the color classifier block 105. To encode the color index map 311, the index map encoding block 109 performs at least one scanning operation (horizontal 315 or vertical 317) to convert the two-dimensional (2D) color index map 311 to a one-dimensional (1D) string. Then the index map encoding block 109 performs a string search algorithm (described below) to generate a plurality of matches. In some embodiments, the index map encoding block 109 performs separate horizontal and vertical scanning operations and performs the string search algorithm to determine which provides better results.
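The conversion from a 2D index map to a 1D string can be sketched as follows in Python. The function name scan is hypothetical. Horizontal scanning reads row by row, and vertical scanning reads column by column.

```python
def scan(index_map, vertical=False):
    """Flatten a 2D index map into a 1D list of indices."""
    lines = zip(*index_map) if vertical else index_map
    return [value for line in lines for value in line]

index_map = [[1, 2],
             [3, 4]]
horizontal = scan(index_map)
vertical = scan(index_map, vertical=True)
```

An encoder could run the string search on both results and keep whichever costs fewer bits.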
Embodiments of this disclosure provide a 1D string matching technique and a 2D variation to encode the color index map 311. At each position, the encoding technique finds a matched point and records the matched distance and length for the 1D string match, or records the width and height of the match for the 2D string match. For an unmatched position, its index intensity, or alternatively, the delta value between its index intensity and predicted index intensity, can be encoded directly.
A straightforward 1D search method can be performed over the color index map 601. For example,
The following is a result set for the encoding technique using the portion of the 1D color index vector 700 shown in
The following pseudo code is given for this matched pair derivation.
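A minimal Python sketch of a brute-force 1D matched pair search is shown here. The symbol format, the minimum match length min_len, and the tie-breaking toward the earliest start position are assumptions of this sketch. A match may overlap the current position, which is how a run of identical indices is expressed with distance 1.

```python
def find_matches(seq, min_len=2):
    """Return a list of ("match", distance, length) or ("unmatched", index)."""
    out, pos = [], 0
    while pos < len(seq):
        best_len, best_dist = 0, 0
        for start in range(pos):            # every earlier position
            length = 0
            while (pos + length < len(seq)
                   and seq[start + length] == seq[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= min_len:
            out.append(("match", best_dist, best_len))
            pos += best_len
        else:
            out.append(("unmatched", seq[pos]))
            pos += 1
    return out

def decode_matches(pairs):
    """Rebuild the 1D index string from the matched and unmatched symbols."""
    seq = []
    for pair in pairs:
        if pair[0] == "match":
            for _ in range(pair[2]):
                seq.append(seq[-pair[1]])
        else:
            seq.append(pair[1])
    return seq

indices = [14, 14, 14, 14, 17, 6, 17, 6, 17]
pairs = find_matches(indices)
```

The search is quadratic in the string length; the running hash described later in this disclosure is one way to speed it up.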
Simplified Color Index Map Coding
In some embodiments, the following operations can be performed as a simplified method for color index map processing in a 1D fashion. As described above, the color index map 601 can be represented by matched or unmatched pairs. For matched pairs, the pair of matched distance and length of group indices is signaled to the receiver.
There are a number of quite noticeable scenarios where a coding unit includes only a few colors. This can result in one or more large consecutive or adjacent sections that have the same index value. In such cases, signaling a (distance, length) pair may introduce more overhead than necessary. To address this issue, the simplified color index map processing method described below further reduces the number of bits consumed in coding the color index map.
As in the 1D index map coding solution, the concept of “distance” can be separated into two main categories: significant distance and normal distance. Normal distance is encoded using contexts. Then, associated lengths are encoded sequentially.
Embodiments of this method use significant distance. There are two types of significant distance for this method. One is distance=blockWidth. The other is distance=1. These two types of significant distance reflect the observation that distance=1 and distance=blockWidth are associated with the most significant percentage of the overall distance distribution. The two types of significant distance will now be described by way of illustration.
The coding method using distance=blockWidth is also referred to as CopyAbove coding. To illustrate the CopyAbove coding method, the 64×64 color index map 601 of
The coding method using distance=1 is also referred to as IndexMode coding or CopyLeft coding. To illustrate the IndexMode coding, consider the string 613 of indexes in the color index map 601. The string 613 includes a first index value ‘14’ followed by 51 subsequent index values ‘14’. Because each of the index values in the string 613 is the same, the 51 index values of the string 613 following the first ‘14’ can be coded together using distance=1 (which indicates that the index value that is a distance of one to the left of the current index value has the same value). The length of the matched string 613 is 51. Thus, the string 613 can be coded simply by indicating the IndexMode coding method and a length of 51 index values.
As described above, for this method of simplified color index map coding, the distance used for coding can be limited to the significant positions only; that is, the distance for these embodiments can be limited to only 1 or blockWidth. To further reduce the overhead, the length of the matched index can also be limited to the coding unit width. Using this definition, the distance and length pair can be signaled using only two binary flags (i.e., 2 bins) without sending the overhead of length and distance (it is inferred as the block width). For example, a first flag can indicate if the coding uses significant distance or does not use significant distance. If the first flag indicates that the coding uses significant distance, then a second flag can indicate if the significant distance is 1 (i.e., IndexMode) or blockWidth (i.e., CopyAbove). Since the matched string occurs line by line (or row by row) in a coding unit, any indices in a line which are not matched by distance=1 or distance=blockWidth are treated as unmatched indices. Such unmatched indices are coded one by one individually. For these unmatched indices, the prediction methods described above can be employed to improve the efficiency.
The decoder can perform decoding operations analogous to the CopyAbove coding and IndexMode coding techniques described above. For example, the decoder can receive the second flag, and based on the value of the second flag, the decoder knows to decode according to the CopyAbove or IndexMode decoding technique.
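The CopyAbove and IndexMode techniques can be sketched in Python as a run-based coder over the raster-scanned index map. This sketch signals an explicit run length for each mode, as in the string examples above, rather than the two-flag variant with an inferred length. The symbol names and the preference for CopyAbove on equal run lengths are assumptions.

```python
def encode_simplified(seq, width):
    """seq: raster-scanned indices of a block that is `width` indices wide."""
    out, pos = [], 0
    while pos < len(seq):
        runs = {}
        for mode, d in (("copy_above", width), ("index_mode", 1)):
            n = 0
            while (pos + n < len(seq) and pos + n - d >= 0
                   and seq[pos + n] == seq[pos + n - d]):
                n += 1
            runs[mode] = n
        mode = max(runs, key=runs.get)       # copy_above wins ties
        if runs[mode] > 0:
            out.append((mode, runs[mode]))
            pos += runs[mode]
        else:
            out.append(("unmatched", seq[pos]))
            pos += 1
    return out

def decode_simplified(symbols, width):
    seq = []
    for kind, val in symbols:
        if kind == "unmatched":
            seq.append(val)
        else:
            d = width if kind == "copy_above" else 1
            for _ in range(val):
                seq.append(seq[-d])          # copy from above or from the left
    return seq

block = [1, 2, 3, 4,
         1, 2, 3, 4,
         5, 5, 5, 5]
symbols = encode_simplified(block, width=4)
```

Here the second row is coded as one CopyAbove run of length 4, and the last three indices of the third row as one IndexMode run.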
A 2D variation of the 1D string matching technique described above can also be used. The 2D matching technique includes the following steps:
Step 1: The location of the current pixel and a reference pixel are identified as a starting point.
Step 2: A horizontal 1D string search is applied to the right direction of the current pixel and the reference pixel. The maximum search length is constrained by the end of the current horizontal row. The maximum search length can be recorded as right_width.
Step 3: A horizontal 1D string search is applied to the left direction of the current pixel and the reference pixel. The maximum search length is constrained by the beginning of the current horizontal row, and may also be constrained by the right_width of a prior 2D match. The maximum search length can be recorded as left_width.
Step 4: The same 1D string search is performed at the next row, using pixels below the current pixel and the reference pixel as the new current pixel and reference pixel.
Step 5: Stop when right_width==left_width==0.
Step 6: For each height[n]={1, 2, 3 . . . }, there is a corresponding array of width[n] (e.g., {{left_width[1], right_width[1]}, {left_width[2], right_width[2]}, {left_width[3], right_width[3]} . . . }).
Step 7: A new min_width array is defined as {{lwidth[1], rwidth[1]}, {lwidth[2], rwidth[2]}, {lwidth[3], rwidth[3]} . . . } for each height[n], where lwidth[n]=min(left_width[1:n-1]), rwidth[n]=min(right_width[1:n-1]).
Step 8: A size array {size[1], size[2], size[3] . . . } is also defined, where size[n]=height[n]×(lwidth[n]+rwidth[n]).
Step 9: Assuming that size[n] holds the maximum value in the size array, the width and height of the 2D string match are selected using the corresponding {lwidth[n], rwidth[n], height[n]}.
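Steps 6 through 9 can be illustrated with a short Python sketch. The function name is hypothetical, and the running minimum is taken over rows 1 through n inclusive, which is an interpretation of the index range given in step 7.

```python
def best_2d_match(left_width, right_width):
    """Pick the height and widths that maximize the matched area.

    left_width[n-1] and right_width[n-1] are the per-row 1D match lengths
    for row n. Returns (size, lwidth, rwidth, height).
    """
    best = (0, 0, 0, 0)
    lmin = rmin = float("inf")
    for height, (lw, rw) in enumerate(zip(left_width, right_width), start=1):
        lmin, rmin = min(lmin, lw), min(rmin, rw)   # min_width for this height
        size = height * (lmin + rmin)
        if size > best[0]:
            best = (size, lmin, rmin, height)
    return best

result = best_2d_match([2, 2, 1, 0], [3, 2, 2, 0])
```

For these widths the candidate sizes are 5, 8, 9, and 0, so a match of height 3 is selected.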
One technique to optimize the speed of a 1D or 2D search is to use a running hash. In some embodiments, a 4-pixel running hash structure can be used. A running hash is calculated for every pixel in the horizontal direction to generate a horizontal hash array running_hash_h[ ]. Another running hash is calculated on top of running_hash_h[ ] to generate a 2D hash array running_hash_hv[ ]. Each value match in the 2D hash array running_hash_hv[ ] represents a 4×4 block match. To perform a 2D match, 4×4 block matches are found before performing a pixel-wise comparison to their neighbors. Since a pixel-wise comparison is limited to 1-3 pixels, the search speed can be increased dramatically.
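The two-level hash can be sketched in Python as follows. The particular hash function (a multiply-and-add truncated to 32 bits) is an assumption, since no specific function is named here, and each window is hashed from scratch for brevity rather than updated incrementally.

```python
def window_hash(values):
    """Hash a short window of integers into 32 bits (assumed hash function)."""
    h = 0
    for v in values:
        h = (h * 31 + v) & 0xFFFFFFFF
    return h

def running_hash_h(row, n=4):
    """Hash of every n-pixel horizontal window in one row."""
    return [window_hash(row[i:i + n]) for i in range(len(row) - n + 1)]

def running_hash_hv(index_map, n=4):
    """Hash of every n x n block, built on top of the horizontal hashes."""
    h_rows = [running_hash_h(row, n) for row in index_map]
    return [[window_hash([h_rows[y + k][x] for k in range(n)])
             for x in range(len(h_rows[0]))]
            for y in range(len(h_rows) - n + 1)]

# Four rows in which columns 4..7 repeat columns 0..3.
index_map = [[r, r + 1, r + 2, r + 3] * 2 for r in range(4)]
hv = running_hash_hv(index_map)
```

Equal values in hv only indicate candidate 4×4 matches; a pixel-wise comparison is still needed to confirm a match and to extend it.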
From the above description, the matched widths of each row are different from each other, thus each row has to be processed separately. To achieve efficiency and low complexity, embodiments of this disclosure provide a block based algorithm that can be used in both hardware and software implementations. Similar in some respects to standard motion estimation, this algorithm processes one rectangular block at a time.
Take a 4×4 block as an example. The first step is to process each row in parallel. Each pixel in one row of the rectangle is assigned to one U_PIXEL module 800. A processing unit for processing each row is called a U_ROW module.
Four U_ROW modules 900 are employed to process the four rows of the 4×4 block. The four U_ROW modules 900 can be arranged in parallel in a U_CMP module.
The next step of the algorithm is to process each column of the cmp array in parallel. Each cmp in a column of the cmp array is processed by a U_COL module.
The number of zeros in each row of the array rw[n][0-3] is then counted and the four results are recorded to an array r_width[n]. The array r_width[n] is the same as the array rwidth[n] in step 7 of the 2D matching technique described above. The array l_width[n] is generated in the same manner. The min_width array in step 7 can be obtained as {{l_width[1], r_width[1]}, {l_width[2], r_width[2]}, {l_width[3], r_width[3]} . . . }.
This algorithm can be implemented in hardware or a combination of hardware and software to work in the parallel processing framework of any modern CPU (central processing unit), DSP (digital signal processor), or GPU (graphics processing unit). A simplified pseudo code for fast software implementation is listed below.
As shown in the pseudo code above, there is no data dependence in each FOR loop so typical software parallel processing methods, such as loop unrolling or MMX/SSE, can be applied to increase the execution speed.
This algorithm can also apply to a 1D search if the number of rows is limited to one. A simplified pseudo code for fast software implementation of a fixed length based 1D search is listed below.
After both the 1D search and the 2D search are completed, the maximum of (1D length, 2D size (width×height)) is selected as the “winner.” If the lwidth (left width) of the 2D match is non-zero, the length of the prior 1D match (length=length−lwidth) can be adjusted to avoid an overlap between the prior 1D match and the current 2D match. If the length of the prior 1D match becomes zero after the adjustment, it should be removed from the match list.
Next, a starting location is calculated using current_location+length if the previous match is a 1D match, or current_location+(lwidth+rwidth) if the previous match is a 2D match. When a 1D search is performed, if any to-be-matched pixel falls into any previous 2D match region where its location has already been covered by a 2D match, the next pixel or pixels are scanned through until a pixel is found that has not been coded by a previous match.
After obtaining the matched pairs, an entropy engine can be applied to convert these coding elements into the binary stream. In some embodiments, the entropy engine can use an equal probability model. An advanced adaptive context model could be applied as well for better compression efficiency. The following pseudo code is an example of the encoding procedure for each matched pair.
Correspondingly, the decoding process for the matched pair is provided in the following pseudo code.
It is noted that only pixels at unmatched positions will be encoded into the bit stream. To have a more accurate statistical model, some embodiments may use only these pixels and their neighbors for the palette table derivation, instead of using all pixels in the CU.
For encoding modes that determine an index or delta output, the encoding results usually contain a limited number of unique values. Embodiments of this disclosure provide a second delta palette table to utilize this observation. This delta palette table can be created after all literal data are obtained in the current CU. The delta palette table can be signaled explicitly in the bit stream. Alternatively, it can be created adaptively during the coding process, so that the table does not have to be included in the bit stream. A delta_color_table_adaptive_flag is provided for this choice.
In some embodiments, another advanced scheme, called Neighboring Delta Palette Table Merge, is provided. For adaptive delta palette generation, the encoder can use the delta palette from the top or left CU as an initial starting point. For non-adaptive palette generation, the encoder can also use the delta palette from the top or left CU, and then compare the R-D cost among the top, left, and current CUs.
A delta_color_table_merge_flag is defined to indicate whether the current CU uses the delta palette table from its left or upper CU. The current CU carries the delta palette table signaling explicitly only when delta_color_table_adaptive_flag==0 and delta_color_table_merge_flag==0 at the same time. For the merging process, if delta_color_table_merge_flag is asserted, another flag, delta_color_table_merge_direction, is defined to indicate whether the merge candidate is from either the upper CU or the left CU.
If delta_color_table_adaptive_flag==1, the following is an example of an encoding process for adaptive delta palette generation. On the decoder side, whenever the decoder receives literal data, the decoder can then regenerate the delta palette using the reverse steps.
Step 1: The arrays palette_table[ ] and palette_count[ ] are defined.
Step 2: The array palette_table[ ] is initialized as palette_table(n)=n (n=0 . . . 255). Alternatively, the palette_table[ ] from the top or left CU can be used as an initial value.
Step 3: The array palette_count[ ] is initialized as palette_count(n)=0 (n=0 . . . 255). Alternatively, the palette_count[ ] from the top or left CU can be used as an initial value.
Step 4: For any delta value c′, the following operations are performed:
a) Locate n so that palette_table(n)==delta c′;
b) Use n as the new index of delta c′;
c) ++palette_count(n);
d) Sort palette_count[ ] so that it is in descending order; and
e) Sort palette_table[ ] accordingly.
Step 5: The process returns to step 4 and the process is repeated until all delta c′ in the current CU are processed.
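The adaptive procedure above can be sketched as follows, together with a mirrored decoder. This is a minimal illustration under stated assumptions: delta values are taken to be integers in the range 0 to 255 (signed deltas would first need a mapping into that range), the function names are hypothetical, and ties in palette_count[ ] are broken with a stable sort so that the encoder and decoder stay synchronized.

```python
def _update(palette_table, palette_count, n):
    """Steps 4c-4e: increment the count at position n, then re-sort both arrays."""
    palette_count[n] += 1
    # sorted() is stable, so entries with equal counts keep their order.
    order = sorted(range(len(palette_count)), key=lambda i: -palette_count[i])
    palette_table[:] = [palette_table[i] for i in order]
    palette_count[:] = [palette_count[i] for i in order]


def encode_deltas(deltas, size=256):
    table, count = list(range(size)), [0] * size   # steps 1-3
    indices = []
    for c in deltas:
        n = table.index(c)                         # step 4a
        indices.append(n)                          # step 4b
        _update(table, count, n)
    return indices


def decode_indices(indices, size=256):
    table, count = list(range(size)), [0] * size
    deltas = []
    for n in indices:
        deltas.append(table[n])                    # look up before updating
        _update(table, count, n)
    return deltas
```

Frequently occurring delta values migrate toward small indices, which is what makes the table useful for a subsequent variable-length binarization.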
For any block that includes both text and graphics, a mask flag can be used to separate the text section and graphics section. The text section can be compressed using the compression method described above; the graphics section can be compressed by another compression method. Because the value of any pixel covered by the mask flag has been coded by the text layer losslessly, each pixel in the graphics section can be considered as a “don't-care-pixel”. When the graphics section is compressed, any arbitrary value can be assigned to a don't-care-pixel in order to obtain optimal compression efficiency.
The index map and residuals are generated during the palette table derivation process. Compressing the index map losslessly allows efficient processing using the 1D or 2D string search. In some embodiments, the 1D or 2D string search is constrained within the current CU; however, the search window can be extended beyond the current CU. The matched distance can be encoded using a pair of motion vectors in the horizontal and vertical directions, e.g., (MVy=matched_distance/cuWidth, MVx=matched_distance−cuWidth*MVy).
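As a small worked sketch of this mapping, the vertical component can be taken as the integer quotient of the matched distance by the CU width and the horizontal component as the remainder. The function names are hypothetical, and integer (floor) division is an assumption.

```python
def distance_to_mv(matched_distance, cu_width):
    """Split a raster-order distance into (mv_x, mv_y)."""
    mv_y, mv_x = divmod(matched_distance, cu_width)
    return mv_x, mv_y


def mv_to_distance(mv_x, mv_y, cu_width):
    """Inverse mapping back to a raster-order distance."""
    return mv_y * cu_width + mv_x
```

For a CU of width 8, a matched distance of 19 maps to MVy=2 and MVx=3, and the inverse mapping recovers 19.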
Because the image can have different spatial texture orientations at local regions, the 1D search can be performed in either the horizontal or vertical directions based on the value of a color_idx_map_pred_direction indicator. The optimal index scanning direction can be determined based on the R-D cost.
Improved Binarization
As shown above, the palette table and a pair of matched information for the color index map can be encoded using fixed length binarization. Alternatively, variable-length binarization can be used. For example, for palette table encoding, the palette table may have 8 different color values. Therefore, the corresponding color index map may contain only 8 different indices. Instead of using a fixed 3 bins to encode every index value equally, just one bin can be used to represent the background pixel. For example, the background pixel may be represented as 0. Then the remaining 7 pixel values can be represented using fixed-length codewords such as 1000, 1001, 1010, 1011, 1100, 1101, and 1110 to encode the color index. This is based on the fact that the background color may occupy the largest percentage of the image, and therefore a distinct codeword of only one bit for the background color could save space overall. This scenario occurs commonly for screen content. As an example, consider a 16×16 CU. Using fixed 3-bin binarization, the color index map requires 3×16×16=768 bins. Alternatively, let the background color, which occupies 40% of the image, be indexed as 0, while the other colors are equally distributed. In this case, the color index map only requires 2.8×16×16<768 bins.
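The bin count in this example can be checked with a short sketch. The helper names are hypothetical; the codeword assignment follows the example above (0 for the background index, and 1000 through 1110 for the remaining seven indices).

```python
def build_codewords():
    """Index 0 (background) -> '0'; index i in 1..7 -> '1' + 3-bit binary of i-1."""
    table = {0: "0"}
    for i in range(1, 8):
        table[i] = "1" + format(i - 1, "03b")
    return table


def count_bins(index_map, table):
    """Total number of bins needed to binarize a flat list of indices."""
    return sum(len(table[i]) for i in index_map)


def expected_bins_per_index(p_background, table):
    """Average bins per index when the non-background indices are equally likely."""
    others = [len(cw) for i, cw in table.items() if i != 0]
    return p_background * len(table[0]) + (1 - p_background) * sum(others) / len(others)
```

With a 40% background share, the average is 0.4×1+0.6×4=2.8 bins per index, or about 717 bins for a 16×16 CU, compared with 768 bins for the fixed 3-bin binarization. The saving disappears when the background share falls below one third, since 4−3p<3 requires p>1/3.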
For the matched pair encoding, the maximum possible value of the matched distance and length can be used to bound its binarization, given that the current technique is constrained to the area of the current CU. Mathematically, the matched distance and length could each be as long as 64×64=4K. However, this typically would not occur jointly. For every matched position, the matched distance is bounded by the distance between the current position and the very first position in the reference buffer (e.g., the first position in the current CU), which can be indicated as L. Therefore, the maximum number of bins for the distance binarization is log2(L)+1 (instead of fixed length), and the maximum number of bins for the length binarization is log2(cuSize−L)+1 with cuSize=cuWidth*cuHeight.
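The bounds above can be evaluated with the following sketch. It assumes that log2 is rounded down to an integer and that 1 ≤ L < cuSize; under that assumption, floor(log2(v))+1 equals the bit length of v. The function names are hypothetical.

```python
def max_bins(value_bound):
    """floor(log2(v)) + 1 for an integer v >= 1, i.e. the bit length of v."""
    if value_bound < 1:
        raise ValueError("bound must be at least 1")
    return value_bound.bit_length()


def matched_pair_bins(L, cu_width, cu_height):
    """Return (max distance bins, max length bins) at distance bound L."""
    cu_size = cu_width * cu_height
    return max_bins(L), max_bins(cu_size - L)
```

For a 16×16 CU with L=16, the distance needs at most 5 bins and the length at most 8 bins.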
In addition to the palette table and index map, residual coefficient coding could be significantly improved by different binarization methods. In the HEVC RExt and HEVC versions, the transform coefficient is binarized using variable-length codes, based on the observation that the coefficients produced after prediction, transform, and quantization using conventional methods typically have close-to-zero magnitudes, and the non-zero values are typically located in the upper-left corner of the transform unit. However, after the introduction of the transform skip coding tool in HEVC RExt, which enables bypassing the entire transform process, the residual magnitude distribution has changed. Especially when transform skip is enabled on screen content with distinct colors, there commonly exist coefficients with large values (i.e., values that are not close to zero, unlike ‘0’, ‘1’, or ‘2’), and the non-zero values may occur at random locations inside the transform unit. If the current HEVC coefficient binarization is used, it may result in a very long code word. Alternatively, fixed length binarization can be used, which could save the code length for the residual coefficients produced by the palette table and index coding mode.
New Predictive Pixel Generation Method
As described above, a 1D/2D string search is performed in encoding the color index map. At any location in the color index map where a matched index has been found, the decoder takes the pixel at the matched location and subtracts it from the original pixel to generate a residual pixel. This procedure can be performed either by using the corresponding color in the color palette table represented by the color index at the matched location, or by using the reconstructed pixel at the matched location.
There are two methods to generate the prediction value based on the two methods described above. In the first method, for any target pixel location, an RGB value is derived from the palette table by the major color index at the matched location, and this RGB value is used as the prediction value of the target pixel. However, this method forces the decoder to perform a color index derivation procedure on pixels that are outside of the current CU, resulting in an increase of decoding time.
To avoid the color index derivation procedure in the first method, a second method is applied where, for any target pixel location, the reconstructed pixel value at the matched location is used as the prediction value. In this method, the reconstructed value is not valid when the prediction pixel is within the current CU. In this case, however, a color index is available and its corresponding color in the color palette table can be used as the prediction pixel.
The residual value of any pixel in the current CU can be derived by subtracting its prediction value from the original value. It is then quantized and encoded into the bit-stream. The reconstructed value of any pixel in the current CU can be derived by adding its prediction value and the quantized residual value.
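The residual and reconstruction relationship can be sketched per color component as follows. The uniform quantizer with step size step is an assumption made only for illustration (the disclosure does not specify the quantizer here), the function names are hypothetical, and clipping to the valid sample range is omitted.

```python
def encode_pixel(orig, pred, step=1):
    """Quantized residual levels for one pixel; orig and pred are RGB tuples."""
    return tuple(round((o - p) / step) for o, p in zip(orig, pred))


def reconstruct_pixel(pred, levels, step=1):
    """Prediction plus the de-quantized residual."""
    return tuple(p + q * step for p, q in zip(pred, levels))
```

With step=1 the reconstruction is lossless; with a larger step the reconstruction error per component is bounded by half the step size.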
Single Color Mode
A single color CU can be either a CU with only one color at every pixel location or a CU having a single color in its palette with a uniform single-value index map. There are multiple methods to compress a single color CU in the palette mode. In one method, i.e., Single Color Mode, only this single color palette information is encoded and included in the bitstream. The entire color index map section is skipped. This is in contrast to encoding and transmitting the uniform all-zero index map. On the decoder side, if there is only a single color in the palette without an index map, every pixel location in the current CU will be filled up with the color in the palette.
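The decoder-side behavior can be sketched as follows. Representing a skipped index map as None is an assumption made for illustration, as is the function name.

```python
def decode_palette_cu(palette, index_map, width, height):
    """Rebuild a CU as rows of colors from a palette and a flat index map.

    index_map is None when the index map section was skipped (Single Color
    Mode); in that case the palette must hold exactly one color.
    """
    if index_map is None:
        if len(palette) != 1:
            raise ValueError("a skipped index map requires a single-color palette")
        return [[palette[0]] * width for _ in range(height)]
    return [
        [palette[index_map[y * width + x]] for x in range(width)]
        for y in range(height)
    ]
```

Decoding with the index map skipped gives the same CU as decoding an explicitly transmitted all-zero index map, without spending any bits on the map.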
Pixel Domain String Copy
As described above, the 1D/2D string copy is applied in the color index map domain. The 1D/2D string copy can also be applied in the pixel domain. Compared to the index map domain 1D/2D string copy, the 1D/2D string copy in the pixel domain includes a number of changes. The changes are as follows:
1. The palette table and the index map generation process are not necessary and can be skipped. As an alternative, all palette table generation, index map generation, and 1D/2D string search on index domain are still performed, but the palette table is not written to the bit stream. A coded map is generated based on the length of the 1D string match or the width and height of the 2D string match. The coded map indicates whether a pixel location is covered by a previous match. The next starting location is the first location that is not covered by a previous match.
2. When coding unmatched data, its RGB value (instead of the color index value) is written to the bit stream. When coding unmatched data, a pixel index coding method can also be applied where a one-bit flag is added in front of this RGB value in the syntax table. If this RGB value appears for the first time, the flag is set to 1 and this RGB value itself is coded to the bit stream. This RGB value is added to a lookup table after that. If this RGB value appears again, the flag is set to 0 and the lookup table index value instead of this RGB value is coded.
3. The predictive pixel generation method uses Option 2 of the single color mode (the reconstructed pixel value from the prediction pixel location is used as the prediction value).
4. For a single color CU, either Option 1 or Option 2 of the single color mode can be selected. When Option 1 is selected, the RGB value of the major color is written to the palette table section of the bit stream. When Option 2 is selected, if no upper line is used in the 1D search and no 2D option is allowed for the current CU, the RGB value of the major color is written to the palette table section of the bit stream.
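The pixel index coding method mentioned in item 2 can be sketched as follows. Symbols are represented here as (flag, payload) tuples rather than actual bit stream syntax, which is an assumption for illustration, as are the function names.

```python
def encode_unmatched(pixels):
    """Flag 1: first occurrence, payload is the RGB value itself.
    Flag 0: repeated value, payload is its lookup table index."""
    lookup = {}
    symbols = []
    for rgb in pixels:
        if rgb in lookup:
            symbols.append((0, lookup[rgb]))
        else:
            symbols.append((1, rgb))
            lookup[rgb] = len(lookup)   # add after coding the literal
    return symbols


def decode_unmatched(symbols):
    """Rebuild the same lookup table on the decoder side."""
    table = []
    pixels = []
    for flag, payload in symbols:
        if flag == 1:
            table.append(payload)
            pixels.append(payload)
        else:
            pixels.append(table[payload])
    return pixels
```

Because the decoder adds each literal RGB value to its table in the same order as the encoder, the lookup table itself never needs to be transmitted.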
In general, the 2D string copy is a flexible algorithm; it can perform operations on blocks of different widths and heights to find a match block. When the 2D string copy is constrained to the width and height of the CU, the 2D string copy becomes a fixed width/height block copy. Intra block copy (IBC) is substantially identical to this particular case of the 2D string copy that operates on the fixed width/height block. In the fixed width/height 2D string copy, the residual is encoded as well. This is also substantially identical to the residual coding method used by IBC.
Adaptive Chroma Sampling for Mixed Content
The embodiments described above provide various techniques for high-efficiency screen content coding under the framework of the HEVC/HEVC-RExt. In practice, in addition to pure screen content (such as text, graphics) or pure natural video, there is also content containing both computer-generated screen material and camera-captured natural video. This is referred to as mixed content. Currently, mixed content is processed with 4:4:4 chroma sampling. However, for the embedded camera-captured natural video portion in such mixed content, the 4:2:0 chroma sampling may be sufficient to provide perceptually lossless quality. This is due to the fact that human vision is less sensitive to spatial changes in the chroma components than to those in the luma components. Hence, sub-sampling typically is performed on the chroma components (e.g., the popular 4:2:0 video format) to achieve noticeable bit rate reduction while maintaining the same reconstructed visual quality.
Embodiments of this disclosure provide a flag, enable_chroma_subsampling, which is defined and signaled at the CU level recursively. For each CU, the encoder determines whether it is being coded using 4:2:0 or 4:4:4 according to the rate-distortion cost.
At the encoder side, for each CU, assuming the input is the 4:4:4 source shown in
Encoder Control
As discussed above, multiple flags are provided to control the low-level processing at the encoder. For example, enable_packed_component_flag is used to indicate whether the current CU uses its packed format or a conventional planar format for the encoding process. The decision whether or not to enable packed format could depend on the R-D cost calculated at the encoder. In some encoder implementations, a low-complexity solution could be achieved by analyzing the histogram of the CU and finding the best threshold for the decision.
The size of the palette table has a direct impact on the complexity. A parameter maxColorNum is introduced to control the trade-off between complexity and coding efficiency. The most straightforward way is choosing the option that results in the lowest R-D cost. The index map encoding direction could be determined by R-D optimization, or by using a local spatial orientation (e.g., edge direction estimation using a Sobel operator).
Some of the embodiments described above may limit the processing within every CTU or CU. In practice, this constraint can be relaxed. For example, for color index map processing, the line buffer from the upper CU or left CU can be used, as shown in
Decoder Syntax
The information provided below can be used to describe the decoding operations of the receiver 200 shown in
7.3.5.8 Coding Unit Syntax:
At operation 1701, a device derives a color index map based on a current CU. At operation 1703, the device encodes the color index map. The device encodes at least a portion of the color index map using a first coding technique. A first indicator indicates a significant distance of the first coding technique. For example, in some embodiments, a first value of the first indicator indicates an IndexMode coding technique that uses a significant distance equal to 1, and a second value of the first indicator indicates a CopyAbove coding technique that uses a significant distance equal to a block width of the current CU.
The portion of the color index map that the device encodes using the first coding technique is either a first string of indexes that has a matching second string of indexes immediately above the first string of indexes in the current CU, or a third string of indexes that all have the same value as a reference index value immediately to the left of a first index among the third string of indexes in the current CU.
At operation 1705, the device combines the encoded color index map and the first indicator for transmission to a receiver.
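The two coding techniques can be sketched on a flat, raster-order index map as follows. This is an illustrative assumption rather than the syntax of any embodiment: the token format, the greedy choice of the longer run, and the "literal" token used when neither technique applies are all hypothetical. The distance used is 1 for IndexMode and the block width for CopyAbove.

```python
def encode_index_map(index_map, width):
    """Greedy run coding with IndexMode (distance 1) and CopyAbove (distance width)."""
    tokens = []
    i, n = 0, len(index_map)
    while i < n:
        left_run = 0    # run of indexes equal to the index at position i - 1
        if i >= 1:
            while i + left_run < n and index_map[i + left_run] == index_map[i - 1]:
                left_run += 1
        above_run = 0   # run of indexes equal to the indexes one row above
        if i >= width:
            while (i + above_run < n
                   and index_map[i + above_run] == index_map[i + above_run - width]):
                above_run += 1
        if left_run == 0 and above_run == 0:
            tokens.append(("literal", index_map[i]))
            i += 1
        elif above_run > left_run:
            tokens.append(("copy_above", above_run))
            i += above_run
        else:
            tokens.append(("index_mode", left_run))
            i += left_run
    return tokens


def decode_index_map(tokens, width):
    """Rebuild the flat index map; each run copies from its significant distance."""
    out = []
    for kind, value in tokens:
        if kind == "literal":
            out.append(value)
        else:
            distance = 1 if kind == "index_mode" else width
            for _ in range(value):
                out.append(out[-distance])
    return out
```

In this sketch, the token kind plays the role of the first indicator: it tells the decoder which distance to copy from, so only the run length needs to accompany it.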
At operation 1801, a device receives a compressed video bitstream from a transmitter. The video bitstream includes an encoded color index map. The device also receives a first indicator. The first indicator indicates a significant distance of a first decoding technique. For example, in some embodiments, a first value of the first indicator indicates an IndexMode decoding technique that uses a significant distance equal to 1, and a second value of the first indicator indicates a CopyAbove decoding technique that uses a significant distance equal to a block width of the current CU.
At operation 1803, the device decodes at least a portion of the color index map using the first decoding technique, wherein the first indicator indicates the significant distance of the first decoding technique. Later, at operation 1805, the device reconstructs pixels associated with a current CU based on the color index map.
In some embodiments, some or all of the functions or processes of the one or more of the devices are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory.
It may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrases “associated with” and “associated therewith,” as well as derivatives thereof, mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 62/018,349, filed Jun. 27, 2014, entitled “ADVANCED SCREEN CONTENT CODING SOLUTION WITH IMPROVED COLOR TABLE AND INDEX MAP CODING METHODS—PART 4”, which is hereby incorporated by reference into this application as if fully set forth herein.
Number | Date | Country
---|---|---
62018349 | Jun 2014 | US