The present invention relates generally to video coding. More particularly, the present invention relates to template matching intra prediction in video coding processes.
This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC). In addition, there are currently efforts underway with regard to the development of new video coding standards. One such standard under development is the SVC standard, which will become the scalable extension to H.264/AVC. Another standard under development is the multi-view coding standard (MVC), which is also an extension of H.264/AVC. Yet another such effort involves the development of Chinese video coding standards.
A draft of the SVC standard is described in JVT-V201, “Joint Draft 9 of SVC Amendment”, 22nd JVT meeting, Marrakech, Morocco, January 2007, available at http://ftp3.itu.ch/av-arch/jvt-site/2007_01_Marrakech/JVT-V201.zip. A draft of the MVC standard is described in JVT-V209, “Joint Draft 2.0 on Multiview Video Coding”, 22nd JVT meeting, Marrakech, Morocco, January 2007, available at http://ftp3.itu.ch/av-arch/jvt-site/2007_01_Marrakech/JVT-V209.zip.
In general, conventional video coding standards (e.g., MPEG-1, H.261/263/264) incorporate intra-frame or inter-frame predictions which can be used to remove redundancies within a frame or among the video frames in multimedia applications and services. In a typical single-layer video codec, like H.264, a video frame is processed in macroblocks. If the macroblock (MB) is an inter-MB, the pixels in one MB can be predicted from the pixels in one or more reference frames. If the macroblock is an intra-MB, the pixels in the MB in the current frame can be predicted entirely from pixels that were already coded (or decoded) in the same video frame.
For both inter-MB and intra-MB, the MB can be decoded by decoding the syntax elements of the MB. Based on the syntax elements, the pixel predictors for each partition of the MB can be retrieved. Entropy decoding is performed to obtain the quantized coefficients, and inverse quantization and inverse transformation are performed on those coefficients to reconstruct the prediction residual. The pixel predictors are then added to the reconstructed prediction residuals to obtain the reconstructed pixel values of the MB.
At the encoder side, the prediction residuals can be the difference between the original pixels and their predictors. The residuals can be transformed and the transform coefficients can be quantized. The quantized coefficients can then be encoded using certain entropy-coding schemes. If the MB is an inter-MB, the following information related to mode decision can be coded. For example, using H.264 as an example, the information can include: an MB type to indicate whether this is an inter-MB; specific inter-frame prediction modes that are used, where the prediction modes indicate how the MB is partitioned, e.g., the MB can have one partition of size 16×16, or two 16×8 partitions and each partition can have different motion information, and so on; one or more reference frame indices to indicate the reference frames from which the pixel predictors are obtained, where different parts of an MB can have predictors from different reference frames; and one or more motion vectors to indicate the locations on the reference frames where the predictors are fetched.
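The residual coding loop described above can be sketched as follows. This is an illustrative simplification, not the H.264 process: a plain scalar quantizer stands in for the integer transform plus quantization, and entropy coding is omitted.

```python
import numpy as np

def encode_residual(original, predictor, qstep=8):
    # Residual is the difference between the original pixels and their
    # predictors; a real codec transforms it before quantization.
    residual = original.astype(np.int32) - predictor.astype(np.int32)
    # Scalar quantizer standing in for transform + quantization.
    return np.round(residual / qstep).astype(np.int32)

def decode_residual(quantized, predictor, qstep=8):
    # Inverse quantization, then add the predictors back, clipped to 8 bits.
    reconstructed = quantized * qstep + predictor.astype(np.int32)
    return np.clip(reconstructed, 0, 255).astype(np.uint8)
```

Note that quantization is lossy in general; the round trip is exact here only because the residual is a multiple of the quantization step.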
If the MB is an intra-MB, the following information can be coded, where H.264 is again used as an example: an MB type to indicate that this is an intra-MB; the intra-frame prediction modes used for luma, where if the luma signal is predicted using the intra4×4 mode, then each 4×4 block in the 16×16 luma block can have its own prediction mode, and sixteen intra4×4 modes can be coded for an MB. If the luma signal is predicted using the intra16×16 mode, then one intra16×16 mode can be associated with the entire MB; and an intra-frame prediction mode used for chroma. It should be noted that enhancements made to intra prediction in successive video coding standards, e.g., more intra prediction modes, can lead to enhanced coding efficiency.
Intra prediction is used in H.264/AVC to remove spatial redundancy by referring to neighboring samples of previously coded blocks which are to the left of and/or above the to-be-coded block. For luminance samples (i.e., samples of luminance components of a video signal), the size of the intra prediction mode can be 4×4 (Intra4×4), 8×8 (Intra8×8) or 16×16 (Intra16×16). Nine prediction modes (e.g., eight directional modes and one DC mode) for Intra4×4 and Intra8×8 can be used, where the eight directional prediction modes are shown in
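Two of the nine Intra4×4 modes can be sketched as follows. This is a hedged illustration: the DC averaging here is a simplified mean rather than the exact H.264 rounding rule, and the function names are assumptions.

```python
import numpy as np

def intra4x4_dc(above, left):
    # DC mode: every sample is predicted as the mean of the neighboring
    # reconstructed samples above and to the left of the block
    # (simplified; H.264 uses a specific integer rounding).
    dc = int(round((above.sum() + left.sum()) / (len(above) + len(left))))
    return np.full((4, 4), dc, dtype=np.uint8)

def intra4x4_vertical(above):
    # Vertical mode, one of the eight directional modes: each column
    # repeats the reconstructed sample directly above it.
    return np.tile(np.asarray(above).reshape(1, 4), (4, 1)).astype(np.uint8)
```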
A video codec comprises an encoder that transforms input video into a compressed representation suited for storage and/or transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
The prediction representation of the image block 212, as well as the image to be encoded 200, are used together to define a prediction error signal 220 which is used for prediction error coding 203. In prediction error coding 203, the prediction error signal 220 undergoes transform 222 and quantization 224. The data describing prediction error and predicted representation of the image block 212 (e.g., motion vectors, mode information, and quantized transform coefficients) are passed to entropy coding 226. The prediction error decoding 204 is substantially the opposite of the prediction error coding 203, with the prediction error decoding including an inverse quantization 228 and an inverse transform 230. The result of the prediction error decoding 204 is a reconstructed prediction error signal 232, which is used in combination with the predicted representation of the image block 212 to create the preliminary reconstructed image 214.
The decoder reconstructs output video by applying prediction mechanisms that are similar to those used by the encoder in order to form a predicted representation of the pixel blocks (using motion or spatial information created by the encoder and stored in the compressed representation). Additionally, the decoder utilizes prediction error decoding (the inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain). After applying the prediction and prediction error decoding processes, the decoder sums up the prediction and prediction error signals (i.e., the pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering processes in order to improve the quality of the output video before passing it on for display and/or storing it as a prediction reference for the forthcoming frames in the video sequence.
In conventional video codecs, such as that described above, motion information is indicated by motion vectors associated with each motion-compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (at the encoder side) or decoded (at the decoder side). Additionally, each of these motion vectors also represents the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently, motion vectors are typically coded differentially with respect to block-specific predicted motion vectors. For example, in a conventional video codec, the predicted motion vectors are created in a predefined way by calculating the median of the encoded or decoded motion vectors of adjacent blocks.
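The median-based differential coding of motion vectors described above can be sketched as follows; the helper names are illustrative.

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    # Component-wise median of the motion vectors of adjacent blocks,
    # used as the block-specific predicted motion vector.
    def med3(a, b, c):
        return sorted((a, b, c))[1]
    return (med3(mv_left[0], mv_above[0], mv_above_right[0]),
            med3(mv_left[1], mv_above[1], mv_above_right[1]))

def encode_mv(mv, predictor):
    # Motion vectors are coded differentially with respect to the
    # predicted motion vector; only this difference is entropy coded.
    return (mv[0] - predictor[0], mv[1] - predictor[1])
```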
Conventional video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g., the desired macroblock mode and associated motion vectors, where a macroblock can comprise, for example, a block of 16×16 pixels. This kind of cost function uses a weighting factor λ to tie together the exact or estimated image distortion due to lossy coding methods and the exact or estimated amount of information that is required to represent the pixel values in an image area:
C=D+λ×R (1)
In Eq. (1), C is the Lagrangian cost to be minimized, D is the image distortion (e.g., the mean squared error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
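A minimal sketch of Eq. (1) applied to mode decision follows; the candidate-tuple layout is an assumption for illustration.

```python
def lagrangian_cost(distortion, rate_bits, lam):
    # Eq. (1): C = D + lambda * R.
    return distortion + lam * rate_bits

def best_mode(candidates, lam):
    # Each candidate is (mode_name, distortion, rate_bits); the mode
    # with the smallest Lagrangian cost is selected.
    return min(candidates, key=lambda c: lagrangian_cost(c[1], c[2], lam))[0]
```

As expected, a larger weighting factor lambda shifts the decision toward cheaper-to-code modes even at higher distortion.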
According to conventional video encoders and decoders, e.g., H.264, intra prediction is based upon directional patterns, where the intra prediction direction needs to be signaled from the encoder to the decoder. Performing template matching intra prediction in alternative pixel orders can improve the coding efficiency.
Various embodiments of the present invention allow for the performance of template matching intra prediction based on a given priority. Priority values of all, or a subset of, the pixels on a border between a current N×N block and a reconstructed area are calculated. A border pixel p with the highest priority is used as the center of a template M×M block, and a search in the reconstructed area is performed to find the best matched candidate template. Distortion metrics, e.g., Sum of Absolute Difference (SAD), between known pixels in the to-match template and corresponding pixels in candidate templates are calculated and compared. The candidate template with the smallest distortion metric value is chosen as the best match. The corresponding pixels of the best-matched candidate template in the searching area are used as the predictors of the unknown pixels in the template centered at the pixel p with the highest priority, and the predicted pixels are marked as known. If not all of the pixels in the current N×N block are marked as known, the process is repeated. Coding efficiency in video and image coding, for example, can then be improved by utilizing template matching intra prediction in accordance with the various embodiments.
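The iterative procedure above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the priority here is simply the count of known pixels in each template (a stand-in for Eq. (2) below), and the search area is restricted to rows above the block, which are assumed to be already reconstructed.

```python
import numpy as np

def template_match_predict(frame, known, block_y, block_x, N=4, M=3):
    # Predicts the N x N block at (block_y, block_x) in place.
    # `frame` holds pixel values; `known` marks reconstructed pixels.
    h = M // 2
    _, W = frame.shape
    while True:
        # Pixels of the block that are not yet marked as known.
        todo = [(y, x) for y in range(block_y, block_y + N)
                for x in range(block_x, block_x + N) if not known[y, x]]
        if not todo:
            return
        # Highest-priority pixel: most known pixels in its M x M template
        # (illustrative stand-in for the priority of Eq. (2)).
        cy, cx = max(todo, key=lambda p: known[p[0]-h:p[0]+h+1,
                                               p[1]-h:p[1]+h+1].sum())
        tmpl = frame[cy-h:cy+h+1, cx-h:cx+h+1]
        mask = known[cy-h:cy+h+1, cx-h:cx+h+1]
        # Search candidate templates fully inside the reconstructed rows
        # above the block for the smallest SAD over known template pixels.
        best, best_sad = None, None
        for y in range(h, block_y - h):
            for x in range(h, W - h):
                cand = frame[y-h:y+h+1, x-h:x+h+1]
                sad = np.abs(cand[mask].astype(np.int64)
                             - tmpl[mask].astype(np.int64)).sum()
                if best_sad is None or sad < best_sad:
                    best, best_sad = cand, sad
        # The corresponding pixels of the best match predict the unknown
        # template pixels, which are then marked as known.
        frame[cy-h:cy+h+1, cx-h:cx+h+1][~mask] = best[~mask]
        known[cy-h:cy+h+1, cx-h:cx+h+1][~mask] = True
```

On periodic content, e.g., vertical stripes, the search finds an exactly matching template in the reconstructed area and the block is predicted without error.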
In accordance with other aspects of various embodiments of the present invention, priority can be described as a rule, for example, which prioritizes certain points over others. This results in a sequence of template blocks that are processed according to a particular order. In other words, priority can alternatively be achieved through processes that favor one template block over others at each stage of intra prediction in the N×N block.
These and other features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.
For exemplification, the system 10 shown in
The exemplary communication devices of the system 10 may include, but are not limited to, a combination of personal digital assistant (PDA) and mobile telephone 14, a mobile phone 12, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, etc. The communication devices may be stationary or mobile as when carried by an individual who is moving. The communication devices may also be located in a mode of transportation including, but not limited to, an automobile, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle, etc. Some or all of the communication devices may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the Internet 28. The system 10 may include additional communication devices and communication devices of different types.
The communication devices may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc. A communication device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
Various embodiments of the present invention provide systems and methods to perform template matching intra prediction based on a given priority, where the various embodiments can be implemented both at the encoder and decoder levels. According to various embodiments, template matching is performed along with prioritizing blocks based on factors including, but not limited to, a location of the center of the block to be matched and the image/frame content. This center can be located on a boundary separating the block to be intra predicted from an already reconstructed image/frame region, where the image/frame content can refer to that content which resides on the boundary. The template matching further comprises searching a part of or the entirety of the already reconstructed region for the best matching template block. Additionally, residual error is signaled to the decoder by the encoder. It should be noted that various embodiments of the present invention can be applied to both video coding and image coding systems and methods.
It should be noted that for the M×M block, the maximum value of M, i.e., Mmax can be fixed or can vary. When Mmax varies, it can be signaled from the encoder to the decoder. Therefore, in accordance with some embodiments, M is chosen as large as possible, but not larger than Mmax and small enough that the M×M block does not interfere with the non-reconstructed region outside of the N×N block. When Mmax is variable, it may vary according to the value of N or the image content.
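The rule for choosing M can be sketched as follows. The function name and the layout assumption are illustrative: M is assumed odd, and the non-reconstructed region outside the N×N block is assumed to lie below and to the right of it, as in raster-scan coding.

```python
def choose_template_size(p_y, p_x, block_y, block_x, N, M_max):
    # Pick the largest odd M <= M_max whose M x M template, centered at
    # border pixel (p_y, p_x), does not extend into the non-reconstructed
    # region outside the N x N block.
    for M in range(M_max, 0, -2):
        h = M // 2
        if p_y + h < block_y + N and p_x + h < block_x + N:
            return M
    return 1
```

A pixel near the top-left of the block can use the full M_max, while one near the bottom-right corner is forced down to a small template.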
As shown in
It should be noted that candidate templates, such as candidate template 830, can be any M×M block in the already reconstructed region 805 or, alternatively, in a portion of the already reconstructed region 805. Additionally, the search area within the reconstructed region 805 can be limited, for example, to a predetermined area to reduce the computational complexity of the search process effectuated therein. Alternatively, a list of predefined candidate templates may be utilized, where the predefined candidate templates are known to both the encoder and the decoder.
As described above, the pixel p with the highest priority is utilized as the center of the M×M block 820. In one embodiment, the priority can be defined as follows:
P(p)=C(p)+D(p) (2)
C(p) and D(p) are defined as follows:
C(q)=1, ∀q ∈
In accordance with another embodiment, the metrics C(p) and D(p) can be defined differently, where
Defined in this manner, d1(p) is the distance between a point p and the left-most edge of the N×N block 800 to be intra predicted, and d2(p) is the distance between a point p and the bottom-most edge of the N×N block 800 to be intra predicted. D(p) can be defined using horizontal edge detection filters (for points p on vertical boundaries of the region to be intra predicted) and using vertical edge filters (for points p on horizontal boundaries of the region to be intra predicted). Furthermore, P(p) can be more generally defined as P(p)=a*C(p)+b*D(p), where a+b=1. It should be noted that various embodiments described herein can assign some priority criteria (e.g., as a rule) to M×M blocks that result in blocks with different sizes and a certain order in which the blocks are encoded/decoded. In other words, priority can alternatively be achieved through processes, rules, criteria, etc. that favor one template block over others at each stage of intra prediction in the N×N block. However, it should be noted that priority can be defined according to a plurality of different methods other than those described herein.
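The weighted priority P(p)=a*C(p)+b*D(p) can be sketched as follows. The concrete definitions of C(p) and D(p) here are assumptions for illustration: C(p) is taken as the fraction of known pixels in the M×M template, and D(p) as a normalized edge strength computed with simple difference filters on already-reconstructed neighbors.

```python
import numpy as np

def priority(frame, known, y, x, M=3, a=0.5, b=0.5):
    # P(p) = a*C(p) + b*D(p), with a + b = 1.
    h = M // 2
    # C(p): fraction of known pixels in the M x M template around p.
    C = known[y-h:y+h+1, x-h:x+h+1].sum() / (M * M)
    # D(p): edge strength at p estimated from reconstructed neighbors
    # via horizontal and vertical difference filters, normalized to [0, 1].
    gx = abs(int(frame[y, x-1]) - int(frame[y-1, x-1]))
    gy = abs(int(frame[y-1, x]) - int(frame[y-1, x-1]))
    D = min(1.0, (gx + gy) / 255.0)
    return a * C + b * D
```

Setting a=1 recovers a purely fill-order-driven priority; setting b=1 drives the prediction order along strong edges first.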
Additionally,
As illustrated in
In order to support the additional intra prediction modes described above, syntax changes to H.264/AVC, for example, are shown below in bolded text. With regard to the derivation process for the Intra4×4PredMode, Table 1 shows an updated specification of Intra4×4PredMode[luma4×4BlkIdx] and associated names. It should be noted that the additional intra prediction modes described above can be applied to other standards as well, where alternative methods of signaling the use of template matching intra prediction to the decoder can be supported.
9 — Intra_4x4_Inpainting (prediction mode)
The variable Intra4×4PredMode is derived by applying the following procedure:
if( predIntra4x4PredMode == Intra_4x4_Inpainting)
predIntra4x4PredMode = Intra_4x4_DC
With regard to the derivation process for the Intra8×8PredMode, Table 2 shows an updated specification of Intra8×8PredMode[luma8×8BlkIdx] and associated names.
9 — Intra_8x8_Inpainting (prediction mode)
Given an intraM×MPredModeA and intraM×MPredModeB, the variable Intra8×8PredMode is derived by applying the following procedure:
if( predIntra8x8PredMode== Intra_8x8_Inpainting )
predIntra8x8PredMode = Intra_8x8_DC
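The derivation procedures above can be sketched together as follows. This is a hedged illustration: the mode indices and helper names are assumptions, and only the rule stated above is modeled, namely that a neighboring block coded with an inpainting mode contributes to mode prediction as if it were DC.

```python
INTRA_DC = 2          # assumed index for the DC mode
INTRA_INPAINTING = 9  # new mode index, per the updated tables above

def derive_pred_mode(inpainting_flag, coded_mode, neighbor_modes):
    # The block's own mode: inpainting when the flag is set, otherwise
    # the mode parsed as usual.
    mode = INTRA_INPAINTING if inpainting_flag else coded_mode
    # For predicting this block's mode from neighbors A and B, any
    # neighbor using an inpainting mode is treated as DC.
    mapped = [INTRA_DC if m == INTRA_INPAINTING else m
              for m in neighbor_modes]
    pred = min(mapped) if mapped else INTRA_DC
    return mode, pred
```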
An updated macroblock prediction syntax is also shown in Table 3 below.
inpainting_intra4x4_pred_mode_flag[ luma4x4BlkIdx ]    u(1) | ae(v)
if( !inpainting_intra4x4_pred_mode_flag[ luma4x4BlkIdx ] )
inpainting_intra8x8_pred_mode_flag[ luma8x8BlkIdx ]    u(1) | ae(v)
if( !inpainting_intra8x8_pred_mode_flag[ luma8x8BlkIdx ] )
Setting the inpainting_intra4×4_pred_mode_flag[luma4×4BlkIdx] equal to 1 specifies that Intra_4×4_Inpainting is used for the 4×4 luma blocks with an index luma4×4BlkIdx=0 . . . 15. Setting the inpainting_intra4×4_pred_mode_flag[luma4×4BlkIdx] equal to 0 specifies that Intra_4×4_Inpainting is not used for the 4×4 luma blocks with index luma4×4BlkIdx=0 . . . 15.
Likewise, setting the inpainting_intra8×8_pred_mode_flag[luma8×8BlkIdx] equal to 1 specifies that Intra_8×8_Inpainting is used for the 8×8 luma block with an index luma8×8BlkIdx=0 . . . 3. Setting the inpainting_intra8×8_pred_mode_flag[luma8×8BlkIdx] equal to 0 specifies that Intra_8×8_Inpainting is not used for the 8×8 luma block with index luma8×8BlkIdx=0 . . . 3.
The present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
Software and web implementations of the present invention could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps. It should also be noted that the words “component” and “module,” as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described in order to explain the principles of the present invention and its practical application to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.
| Number | Date | Country |
|---|---|---|
| 60946390 | Jun 2007 | US |