1. Field of the Invention
The present invention relates generally to video encoding. More particularly, the present invention relates to a method and apparatus for intra-prediction in a video encoder.
2. Description of the Related Art
Video compression is used in many current and emerging products, such as digital television set-top boxes (STBs), high definition television (HDTV) decoders, digital versatile disk (DVD) players, BLU-RAY disc players, digital camcorders, personal computers, and the like. Without video compression, digital video content can be extremely large, making it difficult or even impossible for the digital video content to be efficiently stored, transmitted, or viewed. There are numerous video coding methods that compress digital video content. Consequently, video coding standards have been developed to standardize the various video coding methods so that the compressed digital video content is rendered in formats that a majority of video decoders can recognize. For example, the Moving Picture Experts Group (MPEG) and the International Telecommunication Union (ITU-T) have developed video coding standards that are in wide use. Examples of these standards include the MPEG-1, MPEG-2 (ITU-T H.262), MPEG-4, ITU-T H.261, ITU-T H.263, and ITU-T H.264 standards.
Video encoding standards, such as the MPEG standards, typically achieve data compression by utilizing various coding techniques, such as spatial and temporal prediction, transform and quantization, entropy encoding, and the like. Prediction in video encoders typically includes both inter-prediction and intra-prediction for improving coding efficiency. Inter-prediction exploits the temporal correlation between images of a video sequence, whereas intra-prediction exploits the spatial correlation of pixels within a single image. Both types of prediction are typically performed on blocks of pixels.
For intra-prediction, the prediction of a block is formed by extrapolating from neighboring samples of previously coded and reconstructed blocks, and then the difference between the block and its prediction is coded. Such a technique, however, does not work well with images having complex textures. Furthermore, the farther the pixels being predicted are from the surrounding pixels, the greater the error in prediction.
Accordingly, there exists a need in the art for a method and apparatus for intra-prediction in a video encoder that overcomes the aforementioned deficiencies.
An aspect of the invention relates to a method of intra-prediction for a group of samples in an image being coded. In some embodiments, the method includes: defining a target template for the group of samples; comparing the target template with affine transformations of candidate templates within a search area of the image; identifying at least one matching template of the candidate templates as matching the target template; determining a candidate group of samples based on the at least one matching template; and coding the group of samples using the candidate group of samples as a predictor.
Another aspect of the invention relates to an apparatus for intra-prediction for a group of samples in an image being coded. In some embodiments, the apparatus includes: an encoder configured to code the group of samples using a candidate group of samples as a predictor; and a temporal/spatial prediction module, within the encoder, configured to: define a target template for the group of samples; compare the target template with affine transformations of candidate templates within a search area of the image; identify at least one matching template of the candidate templates as matching the target template; and determine the candidate group of samples based on the at least one matching template.
Another aspect of the invention relates to an apparatus for intra-prediction for a group of samples in an image being coded. In some embodiments, the apparatus includes: means for defining a target template for the group of samples; means for comparing the target template with affine transformations of candidate templates within a search area of the image; means for identifying at least one matching template of the candidate templates as matching the target template; means for determining a candidate group of samples based on the at least one matching template; and means for coding the group of samples using the candidate group of samples as a predictor.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
It should be noted that although aspects of the present invention are described within the context of H.264/MPEG-4 AVC, the present invention is not so limited. Namely, the video encoder can be an H.264/MPEG-4 AVC compliant encoder or an encoder that is compliant with any other compression standard that is capable of exploiting the intra-prediction scheme described herein. Note that aspects of the invention described below are not part of any compression standard presently known.
Input video data comprises a sequence of pictures, where each picture is a field or frame (two interlaced fields) having an array of luminance (luma) samples and two arrays of chrominance (chroma) samples. Each picture can be further divided into slices, which can be divided into macroblocks, which can be divided into blocks of different sizes. The input video data is coupled to the temporal/spatial prediction module 140 via path 110. The temporal/spatial prediction module 140 may include a variable block motion estimation (ME) module 142 and a motion compensation (MC) module 144. Motion vectors from the ME module 142 are received by the MC module 144 for improving the efficiency of the prediction of sample values. Motion compensation involves a prediction that uses motion vectors to provide offsets into the past and/or future reference pictures containing previously decoded sample values that are used to form the prediction error. Namely, the temporal/spatial prediction module 140 uses previously decoded picture(s) and motion vectors to construct an estimate of the current picture being coded.
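For concreteness, the following sketch illustrates the basic integer-pel motion compensation fetch described above. It is a minimal sketch only: the function name, the clipping behavior at picture boundaries, and the NumPy array layout are assumptions of this illustration, not details of the encoder 100.

```python
import numpy as np

def motion_compensate(ref, mv, top, left, height, width):
    """Fetch the predictor block that an integer-pel motion vector
    (dy, dx) points to in a previously decoded reference picture.
    Illustrative only; a real encoder also handles sub-pel
    interpolation and reference picture padding."""
    dy, dx = mv
    y0 = int(np.clip(top + dy, 0, ref.shape[0] - height))
    x0 = int(np.clip(left + dx, 0, ref.shape[1] - width))
    return ref[y0:y0 + height, x0:x0 + width]
```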
The temporal/spatial prediction module 140 may also perform spatial prediction processing, e.g., directional spatial prediction (DSP). Directional spatial prediction can be implemented for intra coding by extrapolating the edges of the previously decoded parts of the current picture into regions of the picture that are intra coded. This improves the quality of the prediction signal and also allows prediction from neighboring areas that were not coded using intra coding.
Furthermore, prior to performing motion compensation prediction for a given block, a coding mode must be selected. In the area of coding mode decision, MPEG provides a plurality of different coding modes. Generally, these coding modes are grouped into two broad classifications, inter coding and intra coding. Intra coding involves the coding of a block, macroblock, or slice in a picture using intra prediction, which is a prediction derived only from the same decoded picture. Conversely, inter coding involves the coding of a block, macroblock, or slice in a picture using inter prediction, which is a prediction derived from decoded picture(s) other than the current picture. Embodiments of an intra prediction process performed by the temporal/spatial prediction module 140 are described below.
Once a coding mode is selected, the temporal/spatial prediction module 140 generates a motion compensated prediction (predicted image) on path 152 of the contents of the block based on past and/or future reference pictures. This motion compensated prediction on path 152 is subtracted via the subtractor 115 from the current block of the video image on the path 110 to form an error signal, or predictive residual signal, on path 153. The predictive residual signal on the path 153 is passed to the transform module 160 for encoding.
The transform module 160 then applies a discrete cosine transform based (DCT-based) transform. Specifically, in H.264/MPEG-4 AVC, the transformation is applied to 4×4 blocks, where a separable integer transform is applied. An additional 2×2 transform is applied to the four DC coefficients of each chroma component. The resulting transformed coefficients are received by the quantization module 170, where the transform coefficients are quantized. H.264/MPEG-4 AVC uses scalar quantization.
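A minimal sketch of these transforms follows, assuming residual blocks stored as NumPy arrays. The matrices are the well-known H.264 forward core transform and the 2×2 Hadamard transform for the chroma DC coefficients; the norm-correcting scaling is left to the quantization stage, as in the standard.

```python
import numpy as np

# H.264/MPEG-4 AVC 4x4 forward core transform matrix. The
# norm-correcting scaling is folded into quantization, so only
# integer arithmetic is needed here.
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward_transform_4x4(residual_4x4):
    """Apply the separable integer transform W = Cf @ X @ Cf.T."""
    return CF @ residual_4x4 @ CF.T

def chroma_dc_transform_2x2(dc_2x2):
    """2x2 Hadamard transform applied to the four chroma DC
    coefficients of a chroma component."""
    h2 = np.array([[1, 1],
                   [1, -1]])
    return h2 @ dc_2x2 @ h2
```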
The resulting quantized transformed coefficients are then decoded in the inverse quantization module 175 and the inverse DCT module 165 to recover the reference picture(s) that will be stored in reference buffer 150. In H.264/MPEG-4 AVC, the in-loop deblocking filter 151 is also employed to minimize blockiness.
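Purely as a numerical illustration of this reconstruction loop, inverting the forward matrix recovers the residual exactly; note that H.264 itself specifies a distinct integer inverse transform whose scaling is merged into inverse quantization, so the sketch below is not the normative inverse.

```python
import numpy as np

def reconstruct_4x4(coeffs_4x4):
    """Numerical illustration only: invert the forward core transform
    to recover the residual block. The normative H.264 inverse
    transform is an integer approximation with its scaling folded
    into inverse quantization."""
    cf = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]], dtype=float)
    cf_inv = np.linalg.inv(cf)
    return cf_inv @ coeffs_4x4 @ cf_inv.T
```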
The resulting quantized transformed coefficients from the quantization module 170 are also received by the entropy encoder 180 via signal connection 171. The entropy encoder 180 may perform context-adaptive variable length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC), where the two-dimensional block of quantized coefficients is scanned using a particular scanning mode, e.g., a “zig-zag” order, to convert it into a one-dimensional string of quantized transformed coefficients.
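The sketch below illustrates the scanning step for a 4×4 block; the order shown is the standard H.264 frame zig-zag scan expressed as raster-scan indices.

```python
import numpy as np

# Standard H.264 zig-zag scan order for a 4x4 block, expressed as
# raster-scan indices into the flattened block.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]

def zigzag_scan(block_4x4):
    """Convert a 4x4 array of quantized transform coefficients into
    the one-dimensional string consumed by the entropy coder."""
    return block_4x4.ravel()[ZIGZAG_4X4]
```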
The data stream is received into the buffer 190, which is a first-in, first-out (FIFO) memory. A consequence of using different picture types and variable length coding is that the overall bit rate into the buffer 190 is variable. Namely, the number of bits used to code each frame can be different. In applications that involve a fixed-rate channel, the buffer 190 can be used to match the encoder output to the channel for smoothing the bit rate. Thus, the output signal of the buffer 190 is a compressed representation of the input video data on path 110, and it is output via a path 195. The rate control module 130 serves to monitor and adjust the bit rate of the data stream entering the buffer 190 to prevent overflow and underflow on the decoder side (within a receiver or target storage device, not shown) after transmission of the data stream.
In some embodiments, the modules of the video encoder 100 can be implemented using hardware, such as one or more integrated circuits (ICs), discrete components, circuit boards, and the like. In some embodiments, one or more of the modules of the video encoder 100 may be implemented via software (e.g., a processor executing software to perform the functionality of the module(s)). In some embodiments, the modules of the video encoder 100 may be implemented using a combination of hardware and software.
The method 401 begins at step 406, where a target template is defined for the target sub-block (e.g., template 308 for the upper right 2×2 sub-block 305 in the block 304). At step 408, a candidate template is selected within the search region (e.g., candidate template 310). At step 410, an indicium of match between the target template and the candidate template is computed.
One indicium, commonly used in MPEG, is the sum of absolute differences (SAD). In some embodiments, rather than directly measuring the difference between templates using a SAD, an optimal affine transformation of the candidate template is found that minimizes a mean squared error (MSE) with respect to the target template. Use of the affine transformation may result in a smaller prediction error as compared to not using the affine transformation, which advantageously results in fewer transform coefficients to be coded and transmitted to the decoder (i.e., coding efficiency is improved).
Notably, the target template may be represented by a vector with N samples (e.g., the template 308 has N=5 samples), i.e., x=[x1 . . . xN]. Likewise, a candidate template may be represented by a vector with N samples, i.e., y=[y1 . . . yN]. A matching indicium for a target template may be computed by finding an optimal affine transformation, x_hat, of y that minimizes:
|x−(αy+β)|²  (1)
where x_hat=αy+β is an affine transformation of y. In this problem, the coefficients α and β are unknown. This is an ordinary least-squares fit in two unknowns, whose closed-form solution is well known in the art: α=Σi(yi−ȳ)(xi−x̄)/Σi(yi−ȳ)² and β=x̄−αȳ, where x̄ and ȳ are the means of the template samples. Equation (1) represents the squared error (proportional to the mean squared error (MSE)) of an affine transformation of the candidate template with respect to the target template. The optimal affine transformation is the one that provides the minimum MSE (MMSE).
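A minimal sketch of both indicia follows, assuming template vectors stored as NumPy arrays. The fallback for a flat (zero-variance) candidate is an assumption of this sketch, not something specified above.

```python
import numpy as np

def sad(x, y):
    """Sum of absolute differences between two template vectors."""
    return np.abs(x - y).sum()

def affine_match_indicium(x, y):
    """Closed-form least-squares fit of the affine model a*y + b to
    the target x, returning (a, b, err) where err is |x-(a*y+b)|^2,
    i.e., equation (1)."""
    x_mean, y_mean = x.mean(), y.mean()
    var_y = ((y - y_mean) ** 2).sum()
    if var_y == 0.0:
        a = 0.0  # flat candidate: fall back to a pure DC offset
    else:
        a = ((y - y_mean) * (x - x_mean)).sum() / var_y
    b = x_mean - a * y_mean
    err = ((x - (a * y + b)) ** 2).sum()
    return a, b, err
```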
At step 412, a determination is made whether there are more candidate templates in the search region. If so, the method 400 returns to step 408, where another candidate template is selected. Otherwise, the method 400 proceeds to step 414. At step 414, the indicia of match between all of the candidate templates and the target template are analyzed to identify the best indicium. If an affine transform is used, the best candidate template is the one for which equation (1) is closest to zero. At step 416, a candidate sub-block corresponding to the best candidate template is defined and used as a predictor for the target sub-block (e.g., candidate sub-block 312). If an affine transform is used as an indicium of matching, the candidate sub-block is defined by affine transforming the sub-block adjacent the best candidate template in accordance with the coefficients α and β of the optimal affine transform found for the best candidate template. Note that the method 401 of intra prediction may be performed on any group of samples in an image, of which the current 2×2 sub-block is one example.
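Putting steps 406 through 416 together, a simplified per-sub-block search loop might look as follows. The L-shaped template layout and the helper names are illustrative assumptions (affine_match_indicium() is reused from the sketch above), and every candidate position is assumed to lie within the previously reconstructed search region.

```python
import numpy as np

def extract_template(img, top, left, size):
    """Illustrative L-shaped template: the row of reconstructed
    samples above and the column to the left of a size x size
    sub-block, flattened into a vector (N = 2*size + 1 samples;
    N = 5 for a 2x2 sub-block, as with template 308)."""
    above = img[top - 1, left - 1:left + size]
    left_col = img[top:top + size, left - 1]
    return np.concatenate([above, left_col]).astype(float)

def best_affine_predictor(img, top, left, size, search_positions):
    """Scan candidate positions, score each with the affine indicium
    (steps 408-412), pick the best (step 414), and return the
    affine-transformed candidate sub-block as the predictor
    (step 416)."""
    target = extract_template(img, top, left, size)
    best_err, best_a, best_b, best_pos = np.inf, 0.0, 0.0, None
    for (r, c) in search_positions:
        cand = extract_template(img, r, c, size)
        a, b, err = affine_match_indicium(target, cand)
        if err < best_err:
            best_err, best_a, best_b, best_pos = err, a, b, (r, c)
    r, c = best_pos
    cand_block = img[r:r + size, c:c + size].astype(float)
    # Apply the optimal affine transform to the sub-block adjacent
    # the best candidate template.
    return best_a * cand_block + best_b
```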
At step 418, a determination is made whether there are more sub-blocks in the target block. If so, the method 400 returns to step 404, where another target sub-block in the target block is selected. Otherwise, the method 400 proceeds to step 420. At step 420, predictors for each target sub-block are combined to produce a predictor for the target block.
The target template and the affine transformations of the M best candidate templates ("transformed candidate templates") may be treated as vectors in an N-dimensional space. The transformed candidate templates thus span a subspace in the N-dimensional vector space. At step 506, a projection of the target template onto the subspace spanned by the transformed candidate templates is computed. Mathematical techniques for computing a linear projection of a vector onto a subspace are known in the art. The projection produces coefficients respectively associated with the transformed candidate templates that relate the target template to them. At step 508, the coefficients for the M best candidate templates are obtained from the projection. At step 510, a weighted average of the sub-blocks associated with the M best candidate templates is computed using the respective coefficients as weights. At step 512, the values of the averaged sub-block are rounded and/or clipped to produce values in a valid range. At step 514, the averaged sub-block is used as a predictor for the target sub-block.
In another embodiment, the target template and the affine transformations of the M candidate templates ("transformed candidate templates") may be treated as vectors in an N-dimensional space. The transformed candidate templates thus span a subspace in the N-dimensional vector space. At step 606, a projection of the target template onto the subspace spanned by the transformed candidate templates is computed. Mathematical techniques for computing a linear projection of a vector onto a subspace are known in the art. The projection produces coefficients respectively associated with the M candidate templates that relate the target template to them. At step 608, the coefficients for the M candidate templates are obtained from the projection. At step 610, a weighted average of the sub-blocks associated with the M candidate templates is computed using the respective coefficients as weights. At step 612, the values of the averaged sub-block are rounded and/or clipped to produce values in a valid range. At step 614, the averaged sub-block is used as a predictor for the target sub-block.
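A sketch of this projection-based predictor (steps 506 through 514 and 606 through 614) follows, assuming the target template, the M transformed candidate templates, and their associated sub-blocks have been gathered into NumPy arrays; the least-squares solver stands in for the known projection techniques mentioned above.

```python
import numpy as np

def subspace_predictor(target, cand_templates, cand_subblocks, max_val=255):
    """Project the target template onto the subspace spanned by the M
    transformed candidate templates and use the projection
    coefficients as weights for the associated sub-blocks.

    target:          (N,) target template vector
    cand_templates:  (M, N) transformed candidate templates, one per row
    cand_subblocks:  (M, h, w) sub-blocks associated with the templates
    """
    # The least-squares solution of cand_templates.T @ w ~= target
    # yields the coefficients of the orthogonal projection of the
    # target onto the span of the candidate templates.
    w, *_ = np.linalg.lstsq(cand_templates.T, target, rcond=None)
    avg = np.tensordot(w, cand_subblocks, axes=1)  # weighted average
    # Round and clip to the valid sample range (steps 512/612).
    return np.clip(np.rint(avg), 0, max_val)
```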
In the embodiments described above, the candidate templates may be searched for within the search region with single-pixel accuracy. Alternatively, in some embodiments, the search for candidate templates within the search region for any of the above-described embodiments may be performed with sub-pixel accuracy.
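As one illustrative way of realizing sub-pixel accuracy, a template could be sampled at fractional positions using bilinear interpolation. The interpolation filter is a simplifying assumption of this sketch; H.264 sub-pel motion compensation, for comparison, uses longer interpolation filters.

```python
import numpy as np

def sample_subpel(img, y, x):
    """Bilinear interpolation of a sample at fractional coordinates
    (y, x); assumes (y, x) lies in the interior of the picture."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - y0, x - x0
    return (img[y0, x0] * (1 - fy) * (1 - fx)
            + img[y0, x0 + 1] * (1 - fy) * fx
            + img[y0 + 1, x0] * fy * (1 - fx)
            + img[y0 + 1, x0 + 1] * fy * fx)
```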
A method and apparatus for intra-prediction in a video encoder have been described. In some embodiments, intra-prediction is performed using a template matching technique in which the indicium of match between candidate templates and a target template is computed using affine transformations. Embodiments of the intra-prediction process described herein can be used in video encoders compliant with MPEG standards. Notably, the template matching intra-prediction process can be used as an additional intra-prediction mode in an H.264 compliant video encoder, alongside the modes defined by the standard.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.