This invention relates generally to intra mode prediction in a video coder. More particularly, this invention relates to a system and method for jointly selecting the intra prediction mode of each intra-coded block in a video sequence to improve the visual quality of the sequence.
Digital video coding technology enables the efficient storage and transmission of the vast amounts of visual data that compose a digital video sequence. With the development of international digital video coding standards, digital video has now become commonplace in a host of applications, ranging from video conferencing and DVDs to digital TV, mobile video, and Internet video streaming and sharing. Digital video coding standards provide the interoperability and flexibility needed to fuel the growth of digital video applications worldwide.
Two international organizations are currently responsible for developing and implementing digital video coding standards: the Video Coding Experts Group (“VCEG”) under the authority of the International Telecommunication Union, Telecommunication Standardization Sector (“ITU-T”) and the Moving Picture Experts Group (“MPEG”) under the authority of the International Organization for Standardization (“ISO”) and the International Electrotechnical Commission (“IEC”). The ITU-T has developed the H.26x (e.g., H.261, H.263) family of video coding standards and the ISO/IEC has developed the MPEG-x (e.g., MPEG-1, MPEG-4) family of video coding standards. The H.26x standards have been designed mostly for real-time video communication applications, such as video conferencing and video telephony, while the MPEG standards have been designed to address the needs of video storage, video broadcasting, and video streaming applications.
The ITU-T and the ISO/IEC have also joined efforts in developing high-performance, high-quality video coding standards, including the earlier H.262 (or MPEG-2) standard and the more recent H.264 (or MPEG-4 Part 10/AVC) standard. The H.264 video coding standard, adopted in 2003, provides high video quality at bit rates up to 50% lower than those of previous video coding standards. The H.264 standard provides enough flexibility to be applied to a wide variety of applications, including low and high bit rate applications as well as low and high resolution applications. New applications may be deployed over existing and future networks.
The H.264 video coding standard has a number of advantages that distinguish it from other existing video coding standards, while sharing common features with those standards. The basic video coding structure of H.264 is illustrated in
Each macroblock may be coded as an intra-coded macroblock by using information from its current video frame or as an inter-coded macroblock by using information from its previous frames. Intra-coded macroblocks are coded to exploit the spatial redundancies that exist within a given video frame through transform, quantization, and entropy (or variable-length) coding. Inter-coded macroblocks are coded to exploit the temporal redundancies that exist between macroblocks in successive frames, so that only changes between successive frames need to be encoded. This is accomplished through motion estimation and compensation.
In order to increase the efficiency of the intra coding process for the intra-coded macroblocks, spatial correlation between adjacent macroblocks in a given frame is exploited by using intra prediction 105. Since adjacent macroblocks in a given frame tend to have similar visual properties, a given macroblock in a frame may be predicted from already coded, surrounding macroblocks. The difference between the given macroblock and its prediction is then coded, which results in fewer bits to represent the given macroblock as compared to coding it directly. A block diagram illustrating intra prediction in more detail is shown in
Intra prediction may be performed for the entire 16×16 macroblock or it may be performed for each 4×4 block within a macroblock. These two different prediction types are denoted by “Intra_16×16” and “Intra_4×4”, respectively. The Intra_16×16 mode is more suited for coding very smooth areas of a video frame, while the Intra_4×4 mode is more suited for coding areas of a video frame having significant detail.
In the Intra—4×4 mode, each 4×4 block is predicted from spatially neighboring samples as illustrated in
For each 4×4 block, one of nine prediction modes defined by the H.264 video coding standard may be used. The nine prediction modes are illustrated in
Typical H.264 video coders select one from the nine possible Intra 4×4 prediction modes according to some criterion to code each 4×4 block within an intra-coded macroblock, in a process commonly referred to as “mode decision” or “mode selection”. Once the intra prediction mode is decided, the prediction pixels are taken from the reconstructed version of the neighboring blocks to form the prediction block. The residual is then obtained by subtracting the prediction block from the current block, as illustrated in
The mode decision criterion usually involves optimization of a cost to code the residual, as illustrated in
Rate-distortion optimization evaluates the Lagrange cost of predicting the block with each of the nine candidate modes and selects the mode that yields the minimum Lagrange cost. Because of the large number of available modes for coding a macroblock, the cost must be computed many times, which makes the coding mode decision stage computationally intensive.
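The conventional per-block decision described above can be sketched as follows. This is a minimal illustration that assumes the common Lagrangian form J = D + λR; the function names, the multiplier value, and the per-mode distortion and rate figures are hypothetical and are not taken from any particular coder.

```python
# Minimal sketch of a per-block Lagrangian mode decision.
# Assumption: J = D + lambda * R, with invented (distortion, rate) numbers.

def lagrange_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(measurements, lam):
    """Return (mode, cost) for the candidate mode with the minimum cost.

    measurements maps a mode index to a (distortion, rate_bits) pair.
    """
    costs = {
        mode: lagrange_cost(d, r, lam) for mode, (d, r) in measurements.items()
    }
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# Hypothetical measurements for three of the candidate modes of one block.
measurements = {0: (120.0, 10), 1: (90.0, 18), 2: (150.0, 6)}
mode, cost = best_mode(measurements, lam=4.0)
print(mode, cost)
```

In this sketch the chosen mode shifts as the multiplier changes: a small multiplier favors the low-distortion mode, while a large multiplier favors the mode that costs the fewest bits.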
Despite being computationally intensive, the cost optimization to decide the prediction mode(s) for a given block is typically based solely upon the previous blocks, as illustrated in
Accordingly, it would be desirable to provide techniques for deciding the coding modes of all blocks in a macroblock that achieve a better rate-distortion trade-off than the current approaches.
The invention includes a computer readable storage medium with executable instructions to select a plurality of blocks in a video sequence to be coded as intra-coded blocks. Aggregate intra prediction costs are computed for each intra-coded block relative to a corresponding previous intra-coded block. An intra prediction mode is selected for each intra-coded block based on the aggregate intra prediction costs.
An embodiment of the invention includes a method for selecting intra prediction modes for intra-coded blocks in a video sequence. Aggregate intra prediction costs associated with a plurality of intra prediction modes for each intra-coded block are computed relative to a subset of intra prediction modes for a corresponding previous intra-coded block. A subset of intra prediction modes for each intra-coded block is selected based on the aggregate intra prediction costs. An intra prediction mode from the subset of intra prediction modes for each intra-coded block that yields a smallest total aggregate intra prediction cost is determined.
Another embodiment of the invention includes a video coding apparatus having an interface for receiving a video sequence and a processor for coding the video sequence. The processor has executable instructions to select a plurality of blocks from the video sequence to be coded as intra-coded blocks and to select an intra prediction mode for each intra-coded block based on an aggregate intra prediction cost computed relative to a subset of intra prediction modes for a corresponding previous intra-coded block.
The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
The present invention provides an apparatus, method, and computer readable storage medium for high-quality intra prediction mode selection in a video coder. As generally used herein, intra mode prediction refers to the prediction of a block in a macroblock of a digital video sequence using a given intra prediction mode. The intra prediction mode may be selected from a plurality of intra prediction modes, such as the prediction modes specified by a given video coding standard or video coder, e.g., the H.264 video coding standard, for coding a video sequence. The block may be a 4×4 block or a 16×16 block from a 16×16 macroblock, or any other size block or macroblock as specified by the video coding standard or video coder.
According to an embodiment of the invention, an intra prediction mode is selected for each block in a given intra-coded macroblock based on aggregate intra prediction costs relative to a corresponding previous block. As generally used herein, aggregate intra prediction costs refer to cumulative intra prediction costs for a current intra-coded block and its corresponding previous intra-coded block. The costs can be a Sum of the Absolute Differences (“SAD”) cost between the original block and the predicted block, a Sum of Squared Errors (“SSE”) cost between the original block and the predicted block, or, more commonly, a rate-distortion cost.
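The two distortion-only costs named above can be illustrated with a short sketch. The 4×4 sample values and the constant predicted block are invented for the example; the function names are hypothetical.

```python
# Minimal sketch of SAD and SSE costs between an original block and a
# predicted block, each given as a list of rows of integer samples.

def sad(original, predicted):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(o - p)
        for row_o, row_p in zip(original, predicted)
        for o, p in zip(row_o, row_p)
    )


def sse(original, predicted):
    """Sum of squared differences between two equally sized blocks."""
    return sum(
        (o - p) ** 2
        for row_o, row_p in zip(original, predicted)
        for o, p in zip(row_o, row_p)
    )


# Invented 4x4 block and a flat prediction in which every sample is 11.
original = [
    [10, 12, 11, 13],
    [10, 12, 11, 13],
    [9, 11, 12, 14],
    [9, 11, 12, 14],
]
predicted = [[11] * 4 for _ in range(4)]

print(sad(original, predicted), sse(original, predicted))
```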
Accordingly, as generally used herein, an intra prediction cost for a given intra-coded block refers to the intra prediction cost associated with a given intra prediction mode selected for coding the block. As appreciated by one of ordinary skill in the art, the intra prediction cost for a given intra-coded block is computed by predicting the block relative to the reconstructed version of its neighboring blocks and coding the residual from the predicted block and the given block, as described above with reference to
As described in more detail herein below, a current intra-coded block and its corresponding previous intra-coded block are processed in a processing order. For example, the corresponding previous block in a macroblock for the second block to be processed in the macroblock is the first block processed in the macroblock, the corresponding previous block in a macroblock for the third block to be processed in the macroblock is the second block processed in the macroblock, the corresponding previous block for the fourth block to be processed in the macroblock is the third block processed in the macroblock, and so on. It is appreciated that the first block to be processed in the macroblock does not have a corresponding previous block. As described in more detail herein below, aggregate intra prediction costs computed for the first block in the macroblock are simply the intra prediction costs for coding the first block.
In one embodiment, intra prediction costs are computed for a subset of intra prediction modes for the corresponding previous block. The aggregate intra prediction costs for the current intra-coded block are then computed by adding the intra prediction costs for a plurality of intra prediction modes for the current intra-coded block to the intra prediction costs for the subset of intra prediction modes for the corresponding previous block.
For example, as described in more detail herein below, for a given previous block A, intra prediction costs are computed for a subset of intra prediction modes, e.g., three intra prediction modes out of a total of nine intra prediction modes such as those specified in the H.264 standard. Then, for a current block B, intra prediction costs are computed for all the intra prediction modes, e.g., for all the nine intra prediction modes. The intra prediction costs for the subset of intra prediction modes for previous block A are then added to the intra prediction costs for all the intra prediction modes for current block B to generate the aggregate intra prediction costs for the current block B.
According to an embodiment of the invention, a subset of intra prediction modes having the lowest aggregate intra prediction costs is selected for each intra-coded block. Using the example above, for current block B, a subset of, say, three intra prediction modes is selected.
Coding paths are then formed and stored between each intra prediction mode in the subset of intra prediction modes for the corresponding previous block and a corresponding intra prediction mode for the current block. A coding path, as generally used herein, refers to an association between an intra prediction mode for coding a previous block and an intra prediction mode for coding a current block. In one embodiment, each coding path is associated with an aggregate intra prediction cost.
Using the example above and as described in more detail herein below, each intra prediction mode in the subset of intra prediction modes in current block B has a coding path to a corresponding intra prediction mode in the subset of intra prediction modes for previous block A. For example, three coding paths are formed between current block B and previous block A for three intra prediction modes in the subset of intra prediction modes.
In one embodiment, a subset of coding paths having the lowest aggregate intra prediction costs are joined from the first to the last intra-coded block in a given macroblock. The aggregate intra prediction costs for the coding paths leading from the first to the last intra-coded block are then added to generate a subset of macroblock aggregate intra prediction costs. The coding path joining the first to the last intra-coded block that yields the lowest macroblock aggregate intra prediction cost is selected to determine the intra prediction mode for coding each intra-coded block in the macroblock.
For example, as specified in the H.264 and other like video coding standards, e.g., the MPEG family of video coding standards, a macroblock is a 16×16 macroblock having 4×4 or 16×16 intra-coded blocks. Each intra-coded block may be coded as specified in the video coding standard, such as, for example, by using intra prediction.
Next, as described in more detail herein below, aggregate intra prediction costs are computed for each intra-coded block relative to a corresponding previous intra-coded block in step 605. For example, each 16×16 macroblock has a total of 16 4×4 intra-coded blocks. Aggregate intra prediction costs for, for example, the second 4×4 intra-coded block in the 16×16 macroblock are computed relative to the first 4×4 intra-coded block in the 16×16 macroblock. That is, as described in more detail herein below, the aggregate intra prediction costs for the second 4×4 intra-coded block are computed by adding the intra prediction costs for the second 4×4 intra-coded block to the intra prediction costs for the first 4×4 intra-coded block.
It is appreciated that the intra prediction costs that are computed for each intra-coded block are the costs associated with intra prediction modes. It is further appreciated that the first intra-coded block in a given macroblock, by virtue of being the first block in the macroblock, does not have a corresponding previous block in the macroblock. Accordingly, its aggregate intra prediction costs are simply the intra prediction costs associated with intra prediction modes for predicting and coding the block.
Lastly, as described in more detail herein below, an intra prediction mode for each intra-coded block in the macroblock is selected based on the aggregate intra prediction costs in step 610. The intra prediction mode selected for each intra-coded block is selected according to an overall lowest intra prediction cost for the macroblock.
It is appreciated that, in contrast to traditional intra prediction performed in prior art approaches, the intra prediction modes selected for the macroblock are jointly selected between the blocks. That is, the selection of a prediction mode for a given block impacts the selection of the prediction mode for the immediately previous neighboring blocks. By jointly selecting the intra prediction modes for all the blocks in the macroblock, the intra mode decision is not just locally optimized as in the traditional prior art approaches, but rather, it is globally optimized for the entire macroblock.
Referring now to
According to an embodiment of the invention, a subset of the N intra prediction modes is selected for the previous block A in step 700. The subset of intra prediction modes is formed by computing aggregate intra prediction costs for coding the previous block A with the N intra prediction modes and selecting the M intra prediction modes that yield the lowest aggregate intra prediction costs for coding the previous block A. The subset contains M<N intra prediction modes, e.g., the subset may contain M=3 intra prediction modes.
It is appreciated that for the first block of the given macroblock, the subset of intra prediction modes contains the M prediction modes that yield the lowest intra prediction costs for coding the block. It is also appreciated that the intra prediction cost for coding the block according to a given prediction mode is computed by predicting and coding the block as described above with reference to
Next, intra prediction is conducted with N allowed prediction modes for the current block B in step 705. Notice that, for the previous block A, there are M reconstructed versions, each corresponding to one of the M selected coding modes, with each coding mode having defined neighboring information. Therefore, for current block B, each one of the N candidate modes is tried M times given different neighboring information in the previous block A. There are then M intra costs computed for each one of the N intra prediction modes for the current block B.
The aggregate intra prediction costs for coding block B are computed by adding the intra prediction costs for the N intra prediction modes for the current block B to the intra prediction costs for the subset of M intra prediction modes for coding the previous block A in step 710. It is appreciated that only one out of the M computed costs for current block B is added to each cost for block A. That is, if one out of the M modes in previous block A (which has a cost associated with it) is used to predict current block B, a cost can be obtained with this prediction, and only these two costs are added together. In this way, M aggregate intra prediction costs are computed for each intra prediction mode out of the N intra prediction modes available for coding the current block B, resulting in a total of N×M aggregate intra prediction cost computations.
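The N×M computation of step 710 can be sketched as follows. The sketch assumes a hypothetical callable `cur_cost(mode_a, mode_b)` that stands in for predicting and coding block B against the version of block A reconstructed with `mode_a`; the toy cost formula and the numeric costs for block A are invented for illustration.

```python
# Minimal sketch of the N x M aggregate cost table for current block B.
# Assumption: cur_cost(mode_a, mode_b) returns the intra prediction cost of
# block B in mode_b when block A was reconstructed using mode_a.

def aggregate_costs(prev_costs, cur_modes, cur_cost):
    """Return a dict mapping (mode_a, mode_b) to an aggregate cost.

    prev_costs maps each of the M retained modes of block A to its cost.
    cur_modes lists the N candidate modes of block B.
    """
    return {
        (mode_a, mode_b): cost_a + cur_cost(mode_a, mode_b)
        for mode_a, cost_a in prev_costs.items()
        for mode_b in cur_modes
    }


# Hypothetical values: M = 3 retained modes for block A, N = 9 for block B.
prev_costs = {1: 10.0, 3: 12.0, 4: 15.0}
cur_modes = list(range(9))


def toy_cost(mode_a, mode_b):
    # Invented stand-in for a real predict-and-code cost measurement.
    return 5.0 + abs(mode_a - mode_b)


table = aggregate_costs(prev_costs, cur_modes, toy_cost)
print(len(table))
```

Each entry adds exactly one cost for block A to one cost for block B, so the table holds N×M entries in total.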
A subset of M intra prediction modes for the current block B is then selected based on the aggregate intra prediction costs in step 715. This is accomplished by selecting, for each one out of the M intra prediction modes available for coding the previous block A, a corresponding one out of the N intra prediction modes for coding the current block B that yields the lowest aggregate intra prediction cost.
Lastly, a coding path is formed and stored between each one out of the M intra prediction modes available for coding the previous block A and its corresponding one out of the N intra prediction modes for coding the current block B that yields the lowest aggregate intra prediction cost in step 720.
Referring now to
That is, block 805 is the corresponding previous block for block 810, block 810 is the corresponding previous block for block 815, block 815 is the corresponding previous block for block 820, and so on. Each block is coded with one intra prediction mode as appreciated by one of ordinary skill in the art and as described above with reference to
Referring now to
A subset of intra prediction modes is also selected for current block B 925, as described in more detail herein above with reference to
As illustrated, each intra prediction mode 930-970 has an M intra prediction cost associated with it, for example, intra prediction mode mB1 930 has an M prediction cost JB1
This is done for all the intra prediction modes 930-970 for current block B 925; that is, for each one of intra prediction modes 930-970, three aggregate intra prediction costs are computed. Then, for each intra prediction mode 930-970, a corresponding intra prediction mode in subset 905 is selected as the one in the subset 905 that yields the lowest aggregate intra prediction cost. For example, intra prediction mode mA1 910 is selected out of intra prediction modes 910-920 in subset 905 as the one that yields the lowest aggregate intra prediction cost for intra prediction mode mB1 930.
The three intra prediction modes for current block B 925 are then selected as the ones that yield the lowest three aggregate intra prediction costs, for example, mB1 930, mB5 950, and mB8 965. As described herein above, coding paths are then formed and stored between the subset of intra prediction modes 905 for previous block A 900 and the subset of intra prediction modes for current block B 925.
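The selection just described, in which each mode of block B keeps its cheapest predecessor in block A and the three cheapest modes of block B are then retained, can be sketched as follows. The mode labels follow the example in the text, but the aggregate cost values are invented and the function name is hypothetical.

```python
# Minimal sketch of survivor selection and coding path formation.
# Assumption: table maps (mode_a, mode_b) to an aggregate cost.

def select_survivors(table, m):
    """Return a dict mapping each retained mode_b to (mode_a, cost).

    For every mode of block B, keep the predecessor mode of block A with
    the lowest aggregate cost; then keep the m cheapest modes of block B.
    Each retained entry represents one stored coding path.
    """
    best = {}
    for (mode_a, mode_b), cost in table.items():
        if mode_b not in best or cost < best[mode_b][1]:
            best[mode_b] = (mode_a, cost)
    kept = sorted(best.items(), key=lambda item: item[1][1])[:m]
    return dict(kept)


# Invented aggregate costs for three modes of block A and four of block B.
table = {
    ("mA1", "mB1"): 14.0, ("mA2", "mB1"): 16.0, ("mA3", "mB1"): 18.0,
    ("mA1", "mB5"): 15.0, ("mA2", "mB5"): 17.0, ("mA3", "mB5"): 19.0,
    ("mA1", "mB8"): 21.0, ("mA2", "mB8"): 20.0, ("mA3", "mB8"): 16.0,
    ("mA1", "mB2"): 30.0, ("mA2", "mB2"): 31.0, ("mA3", "mB2"): 29.0,
}

paths = select_survivors(table, m=3)
print(paths)
```

With these invented costs, the retained coding paths run from mA1 to mB1, from mA1 to mB5, and from mA3 to mB8.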
Referring now to
Coding paths 1000-1010 have aggregate intra prediction costs associated with them. Coding path 1000 has aggregate intra prediction cost JA1+JB1 1015 associated with it, coding path 1005 has aggregate intra prediction cost JA1+JB5 1020 associated with it, and coding path 1010 has aggregate intra prediction cost JA3+JB8 1025 associated with it.
It is appreciated by one of ordinary skill in the art that aggregate intra prediction costs 1015-1025 are the lowest aggregate intra prediction costs that were computed between previous block A 900 and current block B 925. It is also appreciated by one of ordinary skill in the art that coding paths are formed between the subset of intra prediction modes associated with the first block in a given macroblock all the way to the subset of intra prediction modes associated with the last block in a given macroblock. Selecting intra prediction modes for predicting and coding each block in the given macroblock is simply a matter of selecting the coding path that yields the lowest overall aggregate intra prediction cost.
Referring now to
It is appreciated that for a subset having M intra prediction modes, there are a total of M joined coding paths as each intra prediction mode in a subset selected for a current block is associated via a coding path with one intra prediction mode in the subset selected for its corresponding previous block. For example, in the case where M=3, a total of 3 joined coding paths are available. The joined coding path presenting the lowest aggregate intra prediction cost is selected as the final coding path.
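The joining of coding paths across a macroblock and the selection of the final path can be sketched end to end. This is an illustrative simplification, not a definitive implementation: the callable `cost(k, prev_mode, mode)` is a hypothetical stand-in for predicting and coding block k given the mode chosen for its corresponding previous block, and the toy cost values are invented.

```python
# Minimal sketch of joint mode selection over the blocks of one macroblock.
# Assumption: cost(k, prev_mode, mode) gives the intra prediction cost of
# block k in the given mode; prev_mode is None for the first block.

def joint_mode_selection(num_blocks, modes, cost, m):
    """Return (modes_per_block, total_cost) for the cheapest joined path."""
    # First block: aggregate costs are simply its own intra costs.
    first = sorted((cost(0, None, b), b) for b in modes)[:m]
    survivors = [(c, [b]) for c, b in first]

    for k in range(1, num_blocks):
        best = {}
        for agg, path in survivors:
            for b in modes:
                total = agg + cost(k, path[-1], b)
                # Keep the cheapest predecessor path for each current mode.
                if b not in best or total < best[b][0]:
                    best[b] = (total, path + [b])
        # Retain only the m joined paths with the lowest aggregate cost.
        survivors = sorted(best.values(), key=lambda s: s[0])[:m]

    total, path = min(survivors, key=lambda s: s[0])
    return path, total


def toy_cost(k, prev_mode, mode):
    # Invented per-block base costs plus a penalty tied to the previous mode.
    base = [[3, 1, 4], [2, 5, 1], [4, 2, 3]][k][mode]
    penalty = 0 if prev_mode is None else 2 * abs(prev_mode - mode)
    return base + penalty


result = joint_mode_selection(num_blocks=3, modes=[0, 1, 2], cost=toy_cost, m=2)
print(result)
```

Only M joined paths are carried from block to block, so the amount of work grows linearly with the number of blocks rather than exponentially with it.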
Referring now to
It is appreciated that by jointly selecting the intra prediction modes for all the blocks in the macroblock, that is, by selecting the intra prediction modes from the joined coding path that yields the lowest aggregate intra prediction cost, the intra mode decision for coding a video sequence is not just locally optimized as in traditional prior art approaches, but rather, it is globally optimized for the entire macroblock.
Referring now to
In accordance with an embodiment of the invention and as described above, processor 1310 has executable instructions or routines for coding the received video sequence by using intra prediction. For example, processor 1310 has a routine 1315 for selecting frames, macroblocks, and blocks in the video sequence to be intra-coded by using intra prediction and a routine 1320 for selecting an intra prediction mode for each intra-coded block based on aggregate intra prediction costs computed relative to a subset of intra prediction modes for a corresponding previous intra-coded block.
It is appreciated that video coding apparatus 1300 may be a stand-alone apparatus or may be a part of another device, such as, for example, a digital camera or camcorder, a hand-held mobile device, a webcam, a personal computer, a laptop, a personal digital assistant, and the like.
Advantageously, the present invention enables intra prediction to be performed globally in a macroblock to achieve high-quality video sequences. In contrast to traditional intra prediction approaches, the intra prediction modes selected for the macroblock are jointly selected between the blocks. In doing so, the intra mode decision is not just locally optimized as in the traditional prior art approaches, but rather, it is globally optimized for the entire macroblock, thereby achieving superior rate-distortion performance for the entire video sequence.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.