The invention relates generally to video codecs, and more particularly to directional transforms used during the encoding and decoding of blocks of pixels in video frames and images.
A digital video codec compresses and decompresses a video. Codecs can be found in broadcast equipment, televisions, personal computers, video recorders and players, satellites, as well as mobile and on-line devices. Codecs partition each frame of the video into blocks of pixels, and process the blocks one at a time.
During encoding, spatial and temporal redundancies are eliminated to reduce the data rate. The invention is particularly concerned with the transforms that are used during the encoding and decoding of videos. The most common transform is a discrete cosine transform (DCT) as specified in the MPEG and H.264/AVC standards. The DCT converts pixel intensities in the spatial domain to transform coefficients in the frequency domain. The coefficients are then quantized, and entropy encoded to produce a compressed bitstream. The bitstream can be stored on a medium, e.g., a DVD, or communicated directly to the decoder. During decoding, the steps are inverted. After entropy decoding and inverse quantization, an inverse transformation is applied to recover the original video.
Generally, the number of decoders, e.g., consumer products all over the world, far exceeds the number of encoders. Therefore, to enable interoperability, only the bitstream and the decoding process are standardized. The encoding process is typically not specified at all in a standard.
Transforms
The DCT includes a horizontal 1-D DCT applied to each row of pixels in the block and a vertical 1-D DCT applied to each column. For blocks with predominantly horizontal or vertical features, the 2-D DCT is efficient. However, the 2-D DCT does not efficiently transform blocks that contain features that are not horizontal or vertical, i.e., directional features, where directional refers to orientations other than horizontal and vertical.
Generally, there are two methods that implement directional transforms. The first method applies the 2-D DCTs along predefined paths within the block. The second method applies a directional filter, followed by the 2-D DCT. Typically, a fan filter partitions the block into a set of directional sub-bands. Transforms are subsequently applied to each sub-band. Directional transforms such as contourlets are implemented in this way. Contourlets efficiently transform frames containing smooth regions separated by curved boundaries.
Directional transforms have been used to supplement the existing 2-D DCT or DCT-like transform for existing video coding methods, such as H.264/AVC. During the encoding process, the H.264/AVC encoder selects from a set of transforms, such as the conventional 2-D transform, and a set of directional transforms. The single transform that yields the best performance, in a rate/distortion sense, is then selected for the encoding and decoding.
After the transform, improvements can be made in the entropy encoding of the corresponding data by leveraging statistics of the directional data. In H.264/AVC, a context adaptive binary arithmetic coder (CABAC) or a context adaptive variable length coder (CAVLC) is used to entropy encode different types of data. The input symbols are mapped to binary code words and compressed by an arithmetic coder. Contexts are used to adapt the statistics used by the arithmetic coder. Each context stores the most probable symbol (either 0 or 1), and the corresponding probability.
The H.264/AVC standard is designed to use the 2-D DCT. Existing methods can use directional transforms to extend the performance of H.264/AVC encoders. However, those methods still generate and code the direction-related decisions and data using the conventional H.264/AVC framework. Thus, there is a need to efficiently represent directional information, as well as a need for improving coding efficiencies.
The purpose of the transform is to convert a block of varying pixel values to a block of coefficients for which most of the coefficients are zero. In the case of the DCT, an array of pixels is converted into a set of DCT coefficients representing low-frequency to high-frequency data in the block. The lowest frequency is the DC coefficient, which relates to the average value of all the pixels converted by the transforms. The next coefficient represents the magnitude of the lowest-frequency cosine wave that is contained in the signal. Subsequent coefficients correspond to increasing frequencies. If the data are well-suited for the DCT, then many of the frequency coefficients are zero, and are not needed by the decoder to reconstruct the video.
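The relationship between the DC coefficient and the average pixel value can be illustrated with a minimal sketch. It assumes an orthonormal 1-D DCT-II, which is one common normalization and is not mandated by the text above; the function name dct_1d is hypothetical.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a list of pixel values (assumed normalization)."""
    n = len(x)
    out = []
    for k in range(n):
        # k == 0 is the DC basis function; k > 0 are cosines of increasing frequency
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

# A perfectly flat row of eight pixels: all energy lands in the DC coefficient.
flat = dct_1d([10.0] * 8)
dc = flat[0]    # equals mean * sqrt(8) under this normalization
ac = flat[1:]   # all (numerically) zero
```

For this flat input, only the DC coefficient is nonzero, so a decoder would need a single value to reconstruct all eight pixels.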
One problem with existing directional transforms, which use a set of parallel 1-D transforms, is that the length of each 1-D transform may vary depending on the position of the transform in the block. For example, to transform an 8×8 block using a directional transform oriented at 45 degrees, the 1-D transform along the main diagonal of the block has eight elements, the adjacent 1-D transform has seven elements, and so on down to transforms of one or two elements, which are inefficient. A one-element transform is, at best, a scaling of one pixel value, which does little to improve coding efficiency. Thus, there is a need for a method for transforming blocks using these transforms in a way that does not suffer the inefficiencies exhibited by small transform paths, yet still maintains the directional properties of the original transform.
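The variation in transform length can be made concrete with a short sketch that counts the pixels on each 45-degree diagonal of a square block. The function name diagonal_path_lengths is hypothetical, and pixels are grouped by the sum of their row and column indices as one way of describing a 45-degree orientation.

```python
def diagonal_path_lengths(n):
    """Number of pixels on each 45-degree diagonal of an n x n block."""
    counts = {}
    for r in range(n):
        for c in range(n):
            d = r + c  # pixels with the same r + c lie on the same diagonal
            counts[d] = counts.get(d, 0) + 1
    return [counts[d] for d in range(2 * n - 1)]

lengths = diagonal_path_lengths(8)
# [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
short = [m for m in lengths if m <= 2]  # the inefficient one- and two-element transforms
```

In an 8×8 block, four of the fifteen diagonals have only one or two pixels, which are the short transforms the paragraph above identifies as inefficient.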
There is also a need for a method to apply a second set of transforms to the output of the first set of transforms in a way that further improves coding efficiency without degrading performance in the way that a 2-D DCT would when the data are uncorrelated in the second orthogonal direction.
Furthermore, there is a need for partitioned versions of this transform that are suited for coding prediction residual blocks that are commonly found in predictive coders such as H.264/AVC.
A bitstream includes a sequence of frames. Each frame is partitioned into encoded blocks. For each block, a set of paths is determined at a transform angle determined from a transform index in the bitstream. Transform coefficients are obtained from the bitstream. The transform coefficients include one DC coefficient for each path. An inverse transform is applied to the transform coefficients to produce a decoded video.
The encoder compresses an input video 1 into a bitstream 15. The encoder applies transformation, quantization, and entropy encoding to the input video as described in detail below. To ensure that the output video accurately reflects the input video, the decoder 20 performs the inverse steps in an inverse order. In addition, the encoder typically includes the equivalent of the decoder to provide feedback for the encoding process. Because all encoder variables are readily available in the encoder, the decoder in the encoder is relatively simple. The invention is particularly concerned with the inverse directional transforms 25.
As described below, and shown in
To ensure interoperability between the encoder and the decoder, video coding standards typically only specify the bitstream and the decoding process. However, it is understood that a description of the encoding process, as detailed below, is sufficient to exactly deduce the inverse decoding process by one of ordinary skill in the art.
Decoder
The coefficients are inverse quantized 24 and inverse transformed 25 so that the decoded blocks form the output or decoded video 2. The transform can be an inverse discrete cosine transform (IDCT). The transforms can include a 2D inverse discrete cosine transform, and a set of inverse directional transforms. Secondary inverse transforms can also be applied as described below in greater detail.
The information 160 is presented to the context generation module (CGM) of the decoder, which forwards selected contexts 921-922 to the CABAC decoder. Predicted transform indicators (PTI) 501 of the previously decoded blocks 160 are presented to a directional index decoding module (DIDM) 601, which generates a transform indicator 602 for the inverse transform 25. The inverse transform can use any of the inverse transforms, e.g., 1D horizontal and 1D vertical inverse DCTs (2D IDCT) 41, a set of inverse directional transforms 42, and any other known inverse transforms 43.
It is noted that current video coding standards only use a single pre-specified transform so that an index to different transforms is not needed. Also, current standards do not consider side information related to previously decoded blocks during the inverse transform.
Encoder
Input to the encoder is a block 101 of a frame of a video to be coded. As defined herein, blocks include macroblocks, sub-blocks, and block partitions, generally an array of pixels. In most coding applications, the operations are preferably performed on macroblocks and sub-blocks. The block can contain original video data, residuals from a spatial or motion-compensated prediction of video data, or other texture-related data to be transformed. The block can be partitioned into sub-blocks by a sub-block partition directional processing module (SPDPM) 200. Herein, the sub-blocks are processed one at a time as “blocks.”
Each block is transformed using transforms selected from a conventional two-dimensional discrete cosine transform (2-D DCT) 120, a set of directional transforms 130, or other transforms, generally transforms 125. The output of the transform is measured by a transform type and direction decision module (TTDDM) 300. The TTDDM uses a metric, such as a rate/distortion cost, to determine which of the transforms provides the best performance. The rate/distortion cost is the sum of an encoding rate and a scalar multiplied by the distortion. The transform type and direction that have a minimal cost are selected for the transforming. The performance can be, but is not limited to, a measure of the coding efficiency. The idea is that the transform which has the best performance is selected for the encoding, and the selected transform is signaled to the decoder as an index 16 in the bitstream.
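The rate/distortion selection described above can be sketched as follows. This is a minimal illustration and not the TTDDM itself: the cost is computed as rate plus a scalar (here called lam) times distortion, and the candidate names and numeric values are hypothetical.

```python
def select_transform(candidates, lam):
    """Pick the candidate with minimal cost J = rate + lam * distortion.

    candidates maps a transform name to a (rate_in_bits, distortion) pair.
    Both the names and the measurements are illustrative placeholders.
    """
    costs = {name: rate + lam * dist
             for name, (rate, dist) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Hypothetical measurements for one block.
candidates = {
    "dct_2d": (120, 40.0),
    "directional_45": (100, 42.0),
    "directional_135": (130, 39.0),
}
best, cost = select_transform(candidates, lam=2.0)
# best == "directional_45", cost == 184.0
```

In this example the 45-degree candidate wins even though its distortion is not the lowest, because its lower rate outweighs the small distortion penalty at this value of the scalar.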
The TTDDM can also receive input from a direction inference module (DIM) 400. The input to the DIM is a collection of data 160 indicating the transforms and directions used for adjacent previously processed blocks. The output of the DIM is a value or set of values corresponding to the data 160, such as preferred directions 431. The TTDDM uses this information to make the decision as to which transforms and directions are used to encode the block 101. The TTDDM can also forward a final partitioning indicator (FPI) 141 to the SPDPM as a guide for the partitioning. The TTDDM module produces the transformed block 102 and a selected transform indicator (STI) 145 representing the selected transform and direction.
Then, the transformed block 102 can be appropriately encoded 150 using entropy coding to produce an encoded output block 17.
The direction prediction module (DPM) 500 also receives information from the DIM, and information related to the previously processed blocks 160. The DPM uses this information to generate a predicted transform indicator (PTI) 501. The PTI is input to a directional index encoding module (DIEM) 600, along with the STI 145. The DIEM converts the representation to a binary codeword 603 for encoding by a context-adaptive binary arithmetic coder (CABAC) 190.
The contexts used by the CABAC are determined by a context generation module (CGM) 900. The input to the CGM is information about the transforms and directions used by adjacent previously encoded blocks from the DIM, or already coded information from the current block. The CGM produces contexts for the CABAC to encode the binary directional index. The CABAC outputs an encoded transform index 16.
Sub-Block and Partition Directional Processing Module
Transform Type and Direction Decision Module
The transform selector can be influenced by the DIM 400. The DIM, for example, can examine adjacent blocks to determine which directions are more likely to perform well for the current block. The measuring can then be limited to a subset of available directions, thus reducing the processing time. After these measurements are used to determine the best direction or transform, the selected transform indicator 145, and the corresponding transformed block 102 are output. If the TTDDM is operating on a selection of partitions, then the final partitioning indicator 141 that yields the best performance is also output to the SPDPM.
Direction Inference Module
A block selection module (BSM) 410 selects from the blocks 160 based on criteria, such as a distance of the selected blocks to the current block. The reliability decision module (RDM) 420 estimates the reliability of the selected blocks. The RDM module can use texture information, the position and other block data 412. A reliability factor 421 of each of the selected blocks, and the corresponding transform direction 411 are fed into the preferential direction determination module (PDDM) where the preferred directions 431 are identified.
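One possible way to combine block selection, reliability, and direction data into preferred directions is a reliability-weighted vote. The sketch below is an assumption for illustration only: the text does not specify the distance criterion, how reliability is computed, or how the PDDM ranks directions, and all names and values here are hypothetical.

```python
from collections import defaultdict

def preferred_directions(neighbors, max_distance, top_k=2):
    """Rank directions by summed reliability over nearby blocks.

    neighbors is a list of (direction_index, distance, reliability) tuples
    describing previously processed blocks (hypothetical representation).
    """
    scores = defaultdict(float)
    for direction, distance, reliability in neighbors:
        if distance <= max_distance:          # selection by distance (BSM role)
            scores[direction] += reliability  # weight by reliability (RDM role)
    # Highest score first; ties broken by the smaller direction index.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_k]

neighbors = [(3, 1, 0.9), (3, 1, 0.6), (5, 2, 0.8), (1, 4, 1.0)]
preferred = preferred_directions(neighbors, max_distance=2)
# preferred == [3, 5]; the block at distance 4 is not selected
```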
Directional Prediction Module
For encoding transformed texture residuals, the selected transform direction indicator 145 can be correlated with a texture predictor, such as an intra-prediction mode used in H.264/AVC. Therefore, the side information fed to the DPM can include, for example, the intra-prediction mode to select the indicator 501.
Directional Index Encoding Module
Δ=(IS−IP+N)mod N,
where IS and IP are the mapped indices of the selected and predicted direction indicators, respectively, and N is the number of possible directions, e.g., eight. Because small differences are more probable, the binarization 620 codes differences close to zero (0, 1, N−1, 2, N−2, . . . ) with fewer bits. Alternatively, the difference calculation can be bypassed 611, in which case the mapped transform indicator is forwarded directly to the binarization module 620. In this case, the context generation module 900 uses the predicted transform indicator to select an appropriate context.
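The difference calculation and the ordering of differences can be sketched as follows. The modular difference and the order (0, 1, N−1, 2, N−2, . . . ) follow the text above; the unary codeword is only an assumed example of a code that gives fewer bits to earlier entries in that order, and all function names are hypothetical.

```python
def index_difference(selected, predicted, n):
    """Delta = (IS - IP + N) mod N."""
    return (selected - predicted + n) % n

def recover_selected(delta, predicted, n):
    """Decoder side: invert the modular difference."""
    return (predicted + delta) % n

def binarization_order(n):
    """Differences ordered from most to least probable: 0, 1, N-1, 2, N-2, ..."""
    order = [0]
    for k in range(1, n // 2 + 1):
        order.append(k)
        if n - k != k:
            order.append(n - k)
    return order

def unary_codeword(delta, n):
    """Assumed unary code: rank r in the order maps to r ones and a zero."""
    rank = binarization_order(n).index(delta)
    return "1" * rank + "0"

delta = index_difference(selected=2, predicted=3, n=8)  # 7, i.e. one step below
word = unary_codeword(delta, 8)                         # "110"
```

With N equal to eight, a selected index one step below the predicted index yields a difference of 7, which sits third in the order and therefore receives a short codeword.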
Context Generation Module
The embodiment shown in
Primary and Secondary Directional Transforms
The transform includes a set of 1-D transforms {T0, T1, . . . , TN-1} 1102, where N is the total number of 1-D transforms applied to the block. The length li of transform Ti indicates the number of pixels on which the 1-D transform operates. Thus, the transforms {T0, T1, . . . , TN-1} in the set have corresponding lengths {l0, l1, . . . , lN-1}.
Each transform is applied to pixels along a path 1102 in the block. A path typically includes a set of contiguous or adjacent pixels. However, non-contiguous pixels can also be included in the path.
As shown in
During decoding, the secondary directional transform 26 can be applied to the set of secondary transform coefficients to reconstruct the DC coefficient of each path. During encoding, the first or DC coefficient of each path is discarded after the set of secondary transform coefficients 1170 is formed.
Each path is oriented, with respect to the vertical direction 1103, at a transform angle θ 1105, which is determined by the directional transform index. As described above, the transform index, which is determined during the encoding, is part of the bitstream to be decoded.
The paths for a particular block and transform are generally oriented in the same direction. The paths are generated as follows.
Step 1: A minimum transform path length Lmin 1110 is specified for the block. The transform path typically begins at a starting pixel 1120 located at an edge (or corner) of the block 1100. As stated above, if the values for the pixels are coefficients, the value for the starting pixel is the DC coefficient.
Step 2: The path continues along the angle θ until it reaches an end pixel 1121 at another edge of the block. The path length, in pixels, is m. If the starting pixel is at an edge or corner, then the length m=1.
Step 3: If m≥Lmin, then the path is considered complete. If the block still contains any pixels not on a path, then a new path is started. The new path can start at any untransformed pixel in the block. Typically, the next path begins at a pixel adjacent to or near the beginning of the previous path, or the path can start at an opposite corner of the block so that the distribution of path lengths within the block is substantially symmetric. The process continues with Step 2 until all pixels are transformed.
Step 4: If m<Lmin, then the path is too short, and the process continues by including a pixel adjacent to a previously processed pixel. If there is more than one adjacent pixel, then other paths in the block are used to determine the current path. If an unprocessed pixel is available, the pixel is made part of the path, the path length m is incremented, and the process continues with Step 2 in the direction (180−θ) until the edge of the block is reached. Thus, the direction θ is set to (180−θ), effectively a U-turn, before continuing with Step 2.
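The path generation procedure can be sketched for one simple case. The code below is a simplified illustration under stated assumptions, not the full procedure: it handles only a square block with 45-degree paths, always starts the next path on the next diagonal from the same corner (rather than alternating corners for symmetry), and does not consult other paths when choosing among adjacent pixels. It also keeps extending a path when too few pixels would remain to form another complete path. The function name generate_paths is hypothetical.

```python
def generate_paths(n, l_min):
    """Cover an n x n block with 45-degree paths of at least l_min pixels.

    A path that is too short when it reaches the block edge makes a U-turn
    onto the next diagonal and continues in the reversed direction.
    """
    paths, current, forward = [], [], True
    for d in range(2 * n - 1):
        # Pixels (row, col) on diagonal d, listed with increasing row.
        diag = [(r, d - r) for r in range(max(0, d - n + 1), min(n, d + 1))]
        current.extend(diag if forward else diag[::-1])
        covered = sum(len(p) for p in paths) + len(current)
        left = n * n - covered
        if len(current) >= l_min and (left == 0 or left >= l_min):
            paths.append(current)         # path is complete
            current, forward = [], True   # start a new path
        else:
            forward = not forward         # U-turn: reverse the direction
    if current:                           # only when the block is smaller than l_min
        paths.append(current)
    return paths

paths = generate_paths(8, 4)
path_lengths = [len(p) for p in paths]
# [6, 4, 5, 6, 7, 8, 7, 6, 5, 4, 6]
```

Compared with plain diagonals, which include paths of one, two, and three pixels in an 8×8 block, every path here has at least four pixels, and the short corner diagonals are folded into a single six-pixel path at each end.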
The inverse transform 26 operates along the same paths as described above, except inverse-transform coefficients are used. For example, if a 1-D DCT is used for each path, then the inverse transform would use the 1-D Inverse DCT (IDCT).
Secondary Directional Transforms
The inverse secondary transform operates along the same path as described above, except inverse secondary transform coefficients are used. The inverse secondary directional transform is performed before the inverse directional transform during the decoding.
The secondary transform further reduces redundancy in the DC components of the directional transform coefficients. Alternatively, the DC component of one directional transform can be used to predict the DC component of another directional transform.
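A minimal sketch of the secondary transform idea follows. It assumes the secondary transform is an orthonormal 1-D DCT applied to the ordered DC coefficients of the paths, and that the decoder applies the matching inverse before the inverse directional transforms; the actual secondary path and transform type are not fixed by the text above, and the function names and DC values are hypothetical.

```python
import math

def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(x):
    """Orthonormal DCT-II (assumed choice of secondary transform)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

# Hypothetical DC coefficients, one per path; neighboring paths are similar.
path_dcs = [40.0, 42.0, 41.0, 39.0]
secondary = dct_1d(path_dcs)      # encoder: replaces the per-path DC values
recovered = idct_1d(secondary)    # decoder: reconstructs the DC of each path
```

Because the DC values of neighboring paths tend to be similar, most of their energy collects in the first secondary coefficient, which is the redundancy reduction the paragraph above describes.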
Partitioned Directional Transforms
Then, directional transforms are generated using the steps described above, with the constraint that block partition A has a set of paths oriented at angle θA, and partition B has a set of paths oriented at angle θB. The primary angle of the directional transform θ is considered to be the same as θA. As a path is generated within the partition, the line 1310 approximates the edge of the partition. Thus, each pixel in the directional transform is either in partition A or B. The angles θA and θB can be different.
In one embodiment of the invention, the secondary directional transform is applied to both partitions A and B. To invert the process, the secondary inverse transform is applied, and then the inverse directional transforms are applied independently to the partitions A and B. In another embodiment of the invention, the secondary transform is applied independently to each partition.
That is, either both inverse secondary transforms can be applied before the inverse primary transforms, or the inverse secondary transform and inverse primary transform can be applied to the partitions independently. This decision is made adaptively on a per block basis.
Scaling and Quantization Order
After the primary and secondary transforms are completed, the resulting coefficients are scaled, ordered and quantized.
The scaling of the transform coefficients depends upon the length m of the path of each 1-D directional transform, or a location of the coefficient in the block. A 1-D transform of length m has a scale factor Sm. Thus, all coefficients in the path of length m are scaled by Sm. Typically, the scale factor is selected so the magnitudes of the DC coefficients are the same when transforming identical pixel values. For example, if a transform with length m=4 transforms four pixels each with value v, and a transform with length m=5 transforms five pixels each with value v, then the scale factor Sm is selected so that both transforms output the first (DC) coefficients with the same value.
Alternative scaling methods are also possible. Shorter transforms can be given smaller or larger scale factors based on the length m, or direction θ. The scaling can also be made part of the transform itself, to simplify the implementation of this process.
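The length-dependent scale factor can be illustrated with a small sketch. It assumes an orthonormal 1-D DCT, for which the DC coefficient of m identical pixels of value v is v times the square root of m; under that assumption, a scale factor proportional to one over the square root of m equalizes the DC outputs. Other normalizations would lead to other factors, and the function names are hypothetical.

```python
import math

def dc_coefficient(pixels):
    """DC term of an orthonormal 1-D DCT (assumed normalization)."""
    return sum(pixels) / math.sqrt(len(pixels))

def scale_factor(m):
    """One choice of Sm that equalizes DC outputs across path lengths."""
    return 1.0 / math.sqrt(m)

v = 10.0
dc4 = scale_factor(4) * dc_coefficient([v] * 4)  # path of length m = 4
dc5 = scale_factor(5) * dc_coefficient([v] * 5)  # path of length m = 5
# Both scaled DC coefficients equal v (up to floating-point rounding).
```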
The scaled coefficients are arranged in a scanning order. In one embodiment, the set of transforms {T0, T1, . . . , TN-1} is scanned independently and in order. In each transform, the first DC coefficient is scanned first, followed by the following AC coefficients in the same order as the path for that transform.
In another embodiment, all the first DC coefficients from each transform are scanned, followed by all the second coefficients from each transform, and so on, to the last transform. In the second embodiment, the order in which the transforms are scanned can vary. For example, the transforms can be scanned in order of their index, i.e., the first scan uses the DC coefficients from the set of transforms {T0, T1, . . . , TN-1} in the order {0, 1, . . . , N−1}. Alternatively, the transforms can be scanned in order of their length {l0, l1, . . . , lN-1}. The coefficients can also be scanned based on their relative position in the block. For example, all the coefficients along an edge of the block can be scanned first, followed by coefficients that are offset from the edge.
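The two scanning embodiments can be sketched with placeholder coefficient labels. In this illustration each transform is a list whose first entry is its DC coefficient; the function names are hypothetical, and scanning the longest transform first is an assumed reading of "in order of their length", since the text does not state the direction.

```python
def scan_by_transform(coeffs):
    """First embodiment: scan each transform in turn, DC first, then its AC terms."""
    return [c for transform in coeffs for c in transform]

def scan_by_position(coeffs, order=None):
    """Second embodiment: all first coefficients, then all second ones, and so on.

    order lists the transform indices to visit in each pass (default: by index).
    """
    if order is None:
        order = list(range(len(coeffs)))
    longest = max(len(t) for t in coeffs)
    return [coeffs[i][k] for k in range(longest) for i in order
            if k < len(coeffs[i])]

# Three transforms of lengths 3, 2 and 4 with placeholder labels.
coeffs = [["a0", "a1", "a2"], ["b0", "b1"], ["c0", "c1", "c2", "c3"]]

first = scan_by_transform(coeffs)
# a0 a1 a2 b0 b1 c0 c1 c2 c3
second = scan_by_position(coeffs)
# a0 b0 c0 a1 b1 c1 a2 c2 c3
by_length = sorted(range(len(coeffs)), key=lambda i: -len(coeffs[i]))
third = scan_by_position(coeffs, by_length)
# c0 a0 b0 c1 a1 b1 c2 a2 c3
```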
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
This Non-Provisional Application is related to U.S. Non-provisional application Ser. No. 12/603,100, entitled “Video Codecs with Directional Transforms,” filed Oct. 21, 2009, by Cohen et al., co-filed herewith, and incorporated herein by reference.