This U.S. Patent Application is related to U.S. patent application Ser. No. 13/169,959 “Method for Coding Pictures Using Hierarchical Transform Units,” filed by Cohen et al. on Jun. 27, 2011, incorporated herein by reference.
The invention relates generally to coding pictures, and more particularly to methods for coding pictures using selected transforms in the context of encoding and decoding pictures.
For the High Efficiency Video Coding (HEVC) standard currently under consideration by the ISO and ITU standards organizations, Discrete Cosine Transforms (DCT) and/or Discrete Sine Transforms (DST) are applied to 4×4 blocks of data. The decision as to which transform types to apply horizontally and vertically is defined by a look-up table, see
The hierarchical coding layers defined in the proposed standard include video sequence, picture, slice, and treeblock layers. Higher layers contain lower layers.
According to the proposed standard, a picture is partitioned into slices, and each slice is partitioned into a sequence of treeblocks (TBs) ordered consecutively in a raster scan. Pictures and TBs are broadly analogous to frames and macroblocks, respectively, in previous video coding standards, such as H.264/AVC. The maximum allowed size of the TB is 64×64 pixels of luma (intensity) and chroma (color) samples.
A Coding Unit (CU) is the basic unit of splitting used for Intra and Inter prediction. Intra prediction operates in the spatial domain of a single picture, while Inter prediction operates in the temporal domain among the picture to be predicted and a set of previously-decoded pictures. The CU is always square, and can be 128×128 (LCU), 64×64, 32×32, 16×16 and 8×8 pixels. The CU allows recursive splitting into four equally sized blocks, starting from the TB. This process gives a content-adaptive coding tree structure comprised of CU blocks that can be as large as the TB, or as small as 8×8 pixels.
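The recursive splitting described above can be sketched as a quadtree traversal. This is a minimal illustration only: the function name `split_cu` and the caller-supplied decision function `should_split` are hypothetical, and an actual encoder would base the split decision on rate-distortion criteria not shown here.

```python
def split_cu(x, y, size, should_split, min_size=8):
    """Recursively split a square block into four equally sized quadrants.

    should_split(x, y, size) is a hypothetical decision function.
    Returns the leaf CUs as a list of (x, y, size) tuples.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(split_cu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Split a 64x64 treeblock once, then split only its top-left 32x32 quadrant again.
leaves = split_cu(0, 0, 64,
                  lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
```

The leaves always tile the treeblock exactly, which is what makes the coding tree content-adaptive without leaving gaps or overlaps.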
A Prediction Unit (PU) is the basic unit used for carrying the information (data) related to the prediction processes. In general, data contained within a PU is predicted using data from previously-decoded PUs. Each CU can contain one or more PUs.
A Transform Unit (TU) is the basic unit used for the transformation and quantization processes. For Intra-coded blocks, the TU cannot be larger than the PU. Also, the TU does not exceed the size of the CU. Each CU may contain one or more TUs, and multiple TUs can be arranged in a tree structure, henceforth called the transform tree. Each TU has an associated Coded Block Flag (CBF), which indicates whether any of the transformed and quantized coefficients inside the TU are nonzero. If the CBF for a TU is zero (0), then all coefficients in the TU are zero, hence no inverse transformation needs to be performed. The inverse transformation is performed when the CBF is one (1).
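The role of the CBF can be illustrated with a short sketch. The function name `reconstruct_tu` and the `inverse_transform` callback are assumptions made for illustration; the sketch only shows that the inverse transform is bypassed when the CBF is zero.

```python
def reconstruct_tu(coeffs, cbf, inverse_transform):
    """Return the residual for one square TU.

    When cbf == 0 all coefficients are zero, so the residual is zero and
    the (hypothetical) inverse_transform callback is never invoked.
    """
    n = len(coeffs)
    if cbf == 0:
        return [[0] * n for _ in range(n)]
    return inverse_transform(coeffs)
```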
After decoding and inverse-quantizing a block of data, or more specifically, an Intra prediction residual, from a bitstream, the decoder applies two-dimensional transforms to the data in the TUs corresponding to that block. For separable transforms, the inverse-transform process is implemented as a set of one-dimensional transforms or inverse transforms applied vertically, and a set of one-dimensional transforms or inverse transforms applied horizontally.
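A separable two-dimensional transform of the kind described above can be sketched with plain matrix products. The sketch assumes orthonormal floating-point DCT-II and DST-VII bases, so that each inverse is simply a transpose; the integer matrices of the draft standard are not reproduced here, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def dst_matrix(n):
    """Orthonormal DST-VII basis; row k is the k-th basis vector."""
    return [[2 / math.sqrt(2 * n + 1)
             * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_2d(block, t_ver, t_hor):
    # 1-D transform down every column (vertical), then along every row (horizontal).
    return matmul(matmul(t_ver, block), transpose(t_hor))


def inverse_2d(coeffs, t_ver, t_hor):
    # For orthonormal bases, the inverse of each 1-D transform is its transpose.
    return matmul(matmul(transpose(t_ver), coeffs), t_hor)
```

For example, `inverse_2d(forward_2d(block, dst_matrix(4), dct_matrix(4)), dst_matrix(4), dct_matrix(4))` recovers a 4×4 `block` up to floating-point rounding, with the DST applied vertically and the DCT applied horizontally.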
Intra prediction is the process of using previously-decoded blocks to act as predictors of the current block. The difference between the current block and the predicted block is known as the prediction residual. One or more pixels in the current block are predicted by previously-decoded neighboring pixels based upon an Intra prediction mode, which indicates the direction or angle of prediction. For example, Intra prediction mode 0 is the vertical prediction mode, where pixels in the current block are predicted from pixels located directly above, in the previously-decoded block. Intra prediction mode 1 is the horizontal prediction mode, where pixels in the current block are predicted from pixels located directly to the left, in the previously-decoded neighboring block. In the current draft of the standard, one Intra prediction direction is associated with each PU.
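The two prediction modes named above, and the resulting prediction residual, can be sketched as follows. Only modes 0 and 1 are covered; the function names and the representation of the neighboring pixels as the lists `above` and `left` are assumptions for illustration.

```python
def predict(mode, above, left, n):
    """Predict an n x n block from previously-decoded neighboring pixels.

    mode 0 (vertical):   every row repeats the pixels directly above.
    mode 1 (horizontal): every column repeats the pixels directly to the left.
    """
    if mode == 0:
        return [list(above[:n]) for _ in range(n)]
    if mode == 1:
        return [[left[r]] * n for r in range(n)]
    raise ValueError("only modes 0 and 1 are sketched here")


def residual(current, predicted):
    """Prediction residual: current block minus predicted block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]
```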
The prior art defines the table 200 that maps each Intra prediction mode to a particular horizontal and vertical transform type, which is used for both encoding and decoding the prediction residual, see “CE7: Mode-dependent DCT or DST without 4×4 full matrix multiplication for intra prediction,” JCTVC-E125, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3, ISO/IEC JTC1/SC29/WG11, 5th Meeting: Geneva, CH, 16-23 Mar. 2011, Table 1. For example, when decoding a block that uses Intra prediction mode 0, a DST is applied vertically, and a DCT is applied horizontally. When decoding a block that uses Intra prediction mode 1, a DCT is applied vertically, and a DST is applied horizontally.
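The two table entries quoted above can be written as a small lookup structure. This sketch contains only modes 0 and 1 as stated in the text; the remaining entries of the table are not reproduced, and the names `MAPPING_4X4` and `transforms_for_mode` are hypothetical.

```python
DCT, DST = "DCT", "DST"

# mode -> (vertical transform, horizontal transform); only the two modes
# described in the text are listed here.
MAPPING_4X4 = {
    0: (DST, DCT),  # vertical prediction
    1: (DCT, DST),  # horizontal prediction
}


def transforms_for_mode(mode, table=MAPPING_4X4):
    """Return the (vertical, horizontal) transform types for an Intra mode."""
    return table[mode]
```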
The fixed mapping table is defined in the HEVC Working Draft. Currently, the mapping table only applies to 4×4 transforms.
In practice, however, image or video blocks, and their residuals, do not necessarily match the model that was assumed to design the prior art mapping table 200 in the draft standard.
Hence, there is a need for altering the mapping during the decoding and encoding process. There is also a need for a method to indicate when the mapping among the modes and transform types should be altered for Intra and Inter prediction.
A bitstream includes coded CUs, and their corresponding PUs and TUs. For Intra-coded PUs, the bitstream includes a prediction mode, also known as prediction direction or prediction angle, associated with each PU. A mapping table defined in the decoder and encoder maps each prediction direction to a vertical transform type and to a horizontal transform type. If a PU contains a TU of a size for which a mapping table is defined, and if the Coded Block Flag (CBF) for any of the TUs of that size is nonzero, then a “transform-toggle flag” (ttf) is present in the bitstream for the PU. The ttf indicates that the way the transforms are mapped is altered for all TUs of that size within the PU. The specific type of alteration to be performed when the ttf is set to one (1) is either pre-defined in the decoder and encoder, or it can be signaled in the bitstream. The ttf can also be a transform-toggle value, which can represent more than two conditions.
As shown in
In one embodiment as shown in
The embodiments of the invention can provide additional mapping tables, or the identical table, to use for TUs of other possible sizes. For each Intra prediction mode, the mapping table lists the mode or direction (e.g., VER, HOR, or DC), a vertical inverse transform (DCT or DST), and a horizontal inverse transform (DCT or DST). The number of modes is specified in the working draft.
For each CU 110, one or more PUs 111 are parsed from the input bitstream, and a transform tree 170 is generated. A “toggle-processed flag” (tpf) 112 associated with each PU is set to zero (0) 120. The tpf can be internal to the codec, e.g., stored in the memory or a register, and is not necessarily in the input bitstream 104.
The data in the TU is decoded according to the transform tree 170. The transform tree is described in detail in the related U.S. Patent Application, as well as the draft standard.
During the decoding, the first time an M×M TU with a nonzero CBF is encountered 130, a “transform-toggle flag” (ttf) 115 is decoded. The tpf for the PU that contains this TU is set 140 to one (1). If the TU is not M×M, and therefore does not have an associated mapping table, then the ttf is not decoded. In that case, the ttf remains unset. The decision on whether to set the ttf to 0 or 1 can also be based upon data already available or decoded, such as statistics from adjacent blocks, or other parameters. In this case, the ttf is not decoded from the bitstream, but is inferred from already available data.
If the ttf for a TU is set to 1, the ttf indicates that the way the transform types in the mapping table are applied to the TU is altered 150 for that TU.
Then, processing of the TU tree continues. The next time the M×M TU with a nonzero CBF is decoded, the tpf for its corresponding PU is checked. If the flag is set to one (1), then that indicates that the ttf for the PU has already been decoded, so the identical ttf is associated with this TU as well. In other words, the ttf is only present in the bitstream for the first M×M TU (with a non-zero CBF) contained in a PU, and the rest of the M×M TUs in that PU use the identical ttf.
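The interplay of the tpf and the ttf described above can be sketched as a small state update per TU. The class `PU`, the function `ttf_for_tu`, and the `read_flag` callback that stands in for the entropy decoder are hypothetical names; the sketch assumes a single size M for which a mapping table is defined.

```python
class PU:
    def __init__(self):
        self.tpf = 0     # toggle-processed flag, internal to the codec
        self.ttf = None  # transform-toggle flag; None means "unset"


def ttf_for_tu(pu, tu_size, cbf, read_flag, m=4):
    """Return the ttf that applies to a TU, decoding it at most once per PU.

    read_flag is a hypothetical callback that reads one flag from the bitstream.
    """
    if tu_size != m or cbf == 0:
        return None           # no mapping table applies, or nothing to transform
    if pu.tpf == 0:
        pu.ttf = read_flag()  # first M x M TU with nonzero CBF: decode the ttf
        pu.tpf = 1
    return pu.ttf             # later M x M TUs in the PU reuse the same ttf
```

Because the flag is read only when the tpf is still zero, the bitstream carries a single ttf per PU no matter how many M×M TUs with a nonzero CBF the PU contains.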
During the decoding of the data in the TUs, the inverse vertical and horizontal transforms are applied 160 to the data in the TUs to obtain the output 109. The output 109 can be reconstructed prediction residual data, or other image/video-related data based upon the unaltered or altered mapping 105 that maps each Intra prediction direction to the vertical and horizontal transform types. If the ttf for the TU is set to zero (0), then the unaltered mapping is used. If the ttf for the TU is set to one (1), then the altered mapping is used.
In this embodiment, the steps above are applied, and if the ttf for a TU is one (1), then a DCT in the mapping table becomes a DST, and a DST becomes a DCT. In other words, if the original mapping table specifies that a DCT is used as the horizontal transform for a given prediction mode, then a DST is used if the ttf is one (1). Similarly, a DCT is used instead of a DST. If the ttf is set to zero (0), then the original mapping is used.
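The alteration in this embodiment amounts to exchanging DCT and DST in both directions when the ttf is one. A minimal sketch, with hypothetical function names and transform types represented as strings:

```python
def toggle(transform):
    """Exchange DCT and DST."""
    return "DST" if transform == "DCT" else "DCT"


def alter_toggle_all(ver, hor, ttf):
    """Toggle both the vertical and horizontal transform types when ttf == 1."""
    if ttf == 1:
        return toggle(ver), toggle(hor)
    return ver, hor
```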
This embodiment is similar to Embodiment 1, except that the mapping is altered only for Intra prediction directions for which the horizontal and vertical transforms are different. Thus, if the table specifies that the DCT is used for both transforms, then the mapping is not altered. If the table specifies that a first set of horizontal DCT and vertical DST is used, then the mapping is altered so that a second set of horizontal DST and vertical DCT is used. The end effect is that the transforms in the first set are swapped 260 to effectively result in the second set, as shown in
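The swap in this embodiment can be sketched as follows. The function name is hypothetical; the alteration only has an effect when the two transform types differ.

```python
def alter_swap_if_different(ver, hor, ttf):
    """Swap the vertical and horizontal transform types when ttf == 1
    and the two types differ; otherwise leave the mapping unaltered."""
    if ttf == 1 and ver != hor:
        return hor, ver
    return ver, hor
```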
This embodiment is similar to Embodiment 1, except that the mapping of the first set is altered to the second set only for Intra prediction directions for which the horizontal and vertical transforms are identical. This maps a 2-D DCT to a 2-D DST, and a 2-D DST to a 2-D DCT, without altering any 2-D transform where the horizontal and vertical transform types are different.
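This variant can be sketched in the same style. The function name is hypothetical; mixed DCT/DST pairs pass through unchanged.

```python
def alter_toggle_if_identical(ver, hor, ttf):
    """When ttf == 1 and both transform types are identical, replace a
    2-D DCT by a 2-D DST and vice versa; leave mixed pairs unaltered."""
    if ttf == 1 and ver == hor:
        other = "DST" if ver == "DCT" else "DCT"
        return other, other
    return ver, hor
```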
This embodiment includes any of the embodiments described above. In addition, the mapping is altered only for a subset of Intra prediction directions (modes). For example, the mapping for Intra prediction angles that are near-vertical and near-horizontal remains unaltered, but more oblique prediction angles are allowed to have the mapping from the first set to the second set altered. Thus, the decision whether to alter the way the transforms are applied depends on both the transform type and the prediction direction.
This embodiment is similar to Embodiment 4, except that the transform type is not used to determine whether to alter the way the transforms are applied. Hence, whether the way the transforms are applied is altered depends only on the prediction direction.
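Restricting the alteration to a subset of prediction modes can be sketched as a guard around any of the alterations of the earlier embodiments. The set `alterable_modes` and the `alter` callback are assumptions; the text does not specify which mode numbers count as oblique.

```python
def alter_for_mode_subset(mode, ver, hor, ttf, alterable_modes, alter):
    """Apply alter(ver, hor) only when ttf == 1 and the mode is in the subset.

    alterable_modes is a hypothetical set of oblique prediction modes;
    alter is any mapping alteration, e.g. a swap of the two transform types.
    """
    if ttf == 1 and mode in alterable_modes:
        return alter(ver, hor)
    return ver, hor
```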
In the earlier embodiments, the mapping is altered depending on various factors. In this embodiment, a separate, pre-defined mapping table is used when the ttf is one (1). The ttf thus selects which of these two independent mapping tables to use.
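Selecting between two independent tables can be sketched as a simple lookup. The contents of `ALTERNATE_TABLE` below are invented purely for illustration, and only two modes are shown in each table.

```python
# mode -> (vertical transform, horizontal transform)
DEFAULT_TABLE = {0: ("DST", "DCT"), 1: ("DCT", "DST")}
ALTERNATE_TABLE = {0: ("DCT", "DCT"), 1: ("DST", "DST")}  # hypothetical contents


def transforms_with_table_select(mode, ttf):
    """Use the separate pre-defined table when ttf == 1, else the default one."""
    table = ALTERNATE_TABLE if ttf == 1 else DEFAULT_TABLE
    return table[mode]
```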
In this embodiment, the ttf can have integer values other than 0 and 1 (false or true). In this case, the value n represents an index, and the index is used either to alter the way transforms are applied, or to select an otherwise defined transform 270 according to the index, e.g., a discrete wavelet transform (DWT), an undecimated discrete wavelet transform (UDWT), or some other transform. That is, the table can be said to have an additional “virtual” column 251 of transforms, other than the transforms defined by the draft standard.
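A multi-valued transform-toggle value can be sketched as an index into a set of alternatives. The assignment of index values below (0 unaltered, 1 toggled, 2 and 3 selecting other transforms) is an assumption made for illustration and is not specified by the text.

```python
# Hypothetical "virtual column": index value -> otherwise defined transform.
EXTRA_TRANSFORMS = {2: "DWT", 3: "UDWT"}


def transforms_for_value(ver, hor, n):
    """Interpret the transform-toggle value n as an index."""
    if n == 0:
        return ver, hor                          # unaltered mapping
    if n == 1:
        flip = {"DCT": "DST", "DST": "DCT"}
        return flip[ver], flip[hor]              # altered mapping
    extra = EXTRA_TRANSFORMS[n]                  # otherwise defined transform
    return extra, extra
```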
In this embodiment, the tpf is associated with a CU rather than a PU. Thus, only one ttf is decoded or inferred for each CU, and all TUs in the CU that use a ttf share that ttf.
In this embodiment, ttfs and tpfs are associated with more than one M×M TU size. For example, 4×4 TUs can use one table, and 8×8 TUs can use another. Different TU sizes can share tables and alternate mappings as well. For example, TUs of size 8×8 and smaller can use one table, and TUs of size 16×16 and larger can use another table.
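Keeping a separate ttf per TU size can be sketched by keying the per-PU state on the size. The class and function names, the `read_flag` callback, and the choice of sizes 4 and 8 are assumptions; the presence of a key in the dictionary plays the role of a tpf equal to one for that size.

```python
class PUState:
    def __init__(self):
        self.ttf_by_size = {}  # a stored entry means the tpf for that size is 1


def ttf_for_sized_tu(pu, size, cbf, read_flag, sizes_with_tables=(4, 8)):
    """Return the ttf for a TU, decoding one ttf per PU and per TU size.

    read_flag is a hypothetical callback that reads one flag from the bitstream.
    """
    if size not in sizes_with_tables or cbf == 0:
        return None
    if size not in pu.ttf_by_size:
        pu.ttf_by_size[size] = read_flag()  # first TU of this size in the PU
    return pu.ttf_by_size[size]


def table_for_size(size, small_table, large_table):
    """Shared tables: 8x8 and smaller use one table, 16x16 and larger another."""
    return small_table if size <= 8 else large_table
```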
This embodiment allows for altered mapping for PUs other than Intra predicted PUs. For example, data associated with Inter-predicted PUs can be used to index a mapping table to select horizontal and vertical transform types.
In this embodiment, the ttf is decoded or inferred for every TU, or for every M×M TU with a nonzero CBF. In this case, the tpf is not needed because every relevant TU is associated with its own ttf.
In this embodiment, an index indicating which mapping table to use is decoded once per sequence, or other subset of the bitstream, e.g. GOP, slice, etc.
In this embodiment, the mapping table or tables are defined and decoded from the bitstream.
In this embodiment, the depth of the TU within the transform tree is also used to determine if the mapping is altered before the transforms are applied.
One embodiment of the invention deals with CUs or PUs that include multiple TUs having predetermined characteristics, for which specialized processing is signaled in the bitstream. When the CU or the PU includes multiple TUs with the predetermined characteristics, an indicator of how to process these TUs is only signaled in the bitstream for the first TU having the predetermined characteristics. Then, subsequent TUs, within the same PU or CU, having the same predetermined characteristics use that indicator, and the indicator is not signaled in the bitstream for the subsequent TUs for the CU or the PU. The predetermined characteristics can relate to the TU size or shape, intra or inter prediction, tree depth, and the like.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
Number | Date | Country | |
---|---|---|---|
20130003828 A1 | Jan 2013 | US |