This application claims priority to and the benefit of Korean Patent Application No. 2007-82078, filed Aug. 16, 2007, the disclosure of which is incorporated herein by reference in its entirety.
1. Field of the Invention
The present invention relates to a method and apparatus for determining a block mode using bit-generation probability estimation in moving picture coding, and more particularly, to H.264 video coding technology for Internet protocol (IP)-television (TV) that estimates a bit-generation probability using an average value and a variance value of each block and determines whether or not a current block mode is a skip mode or a direct-prediction mode using the bit-generation probability, and thus can minimize the amount of computation for H.264 video coding and deterioration in image quality.
2. Discussion of Related Art
Digital video data is used in video conferencing, High Definition TVs (HDTVs), Video-on-Demand (VOD) receivers, personal computers (PCs) supporting Moving Picture Experts Group (MPEG) images, game consoles, terrestrial digital broadcast receivers, digital satellite broadcast receivers, cable TVs (CATVs), and so on. Since digital video data has image characteristics and significantly increases in amount during the process of digitizing an analog signal, it is compressed by an efficient compression method rather than being used as it is.
Digital image data is compressed mainly by three types of methods: a compression method using temporal redundancy, a compression method using spatial redundancy, and a compression method using statistical characteristics of the generated code. Among the three, the typical compression method using temporal redundancy is motion estimation and compensation, which is used in most moving picture compression standards such as MPEG, H.263, and so on.
The motion estimation and compensation method searches a previous or next reference screen for the portion most similar to a specific portion of the current screen and transfers only the difference component between the two portions. The method can reduce data effectively because the more accurately the motion vector is found, the smaller the difference component to be transferred. However, a considerable amount of estimation time and computation is required to determine the most similar portion in the previous or next frame. Therefore, research into reducing the motion estimation time, which takes the most time when encoding moving pictures, is ongoing.
Meanwhile, there are two main types of motion estimation methods. One is a pixel-by-pixel basis estimation method, and the other is a block-by-block basis estimation method, which is the most widely used algorithm.
The block-by-block basis estimation method divides an image into blocks of a predetermined size and finds the block best matching a current image block within a search region of a previous image. The displacement between the found block and the current image block is called a motion vector, which is then encoded and processed. To calculate the degree of matching between two blocks, various matching functions can be used. The most generally used matching function is the Sum of Absolute Differences (SAD), calculated by summing the absolute values of the differences between corresponding pixels in the two blocks.
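As a minimal sketch of such block matching (the function names, the flat-list block representation, and the candidate format are illustrative assumptions, not part of any standard):

```python
def sad(cur_block, ref_block):
    # Sum of Absolute Differences between two equally sized blocks,
    # each given as a flat list of pixel values.
    return sum(abs(c - r) for c, r in zip(cur_block, ref_block))

def best_match(cur_block, candidates):
    # candidates: iterable of (motion_vector, reference_block) pairs drawn
    # from the search region of the previous image; returns the pair whose
    # reference block has the smallest SAD against the current block.
    return min(candidates, key=lambda cand: sad(cur_block, cand[1]))
```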
In the case of an H.264 codec, the search is performed using a cost function based on Rate-Distortion Optimization (RDO) instead of the conventional SAD-based search method. The cost function used in H.264 computes a rate-distortion cost by adding, to the existing SAD value, the number of encoded coefficients multiplied by a Lagrange multiplier. Here, the number of encoded coefficients is approximated by a value proportional to the quantization coefficient value, and the cost value is obtained by multiplying this number by a fixed Lagrange multiplier. The search is performed based on this cost value.
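A hedged sketch of such a rate-distortion cost follows; the function names are illustrative, and the QP-to-multiplier rule shown is a commonly cited approximation rather than a formula taken from this text:

```python
def rd_cost(sad_value, rate_bits, lagrange_multiplier):
    # Rate-distortion cost: distortion (SAD) plus the rate term weighted
    # by the Lagrange multiplier, as described above.
    return sad_value + lagrange_multiplier * rate_bits

def lagrange_from_qp(qp):
    # Illustrative approximation only: one rule of thumb that ties the
    # Lagrange multiplier to the quantization parameter QP, reflecting the
    # statement that the rate term is made proportional to QP.
    return 0.85 * 2 ** ((qp - 12) / 3.0)
```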
In conventional moving picture coding, an encoding operation is performed in units of a 16×16 or an 8×8 block to obtain high compression efficiency and high image quality. In H.264 video coding, on the other hand, the block mode having the minimum rate-distortion cost is selected from among 8 kinds of different block modes.
However, in order to determine one of the 8 kinds of different block modes, main pixels and sub-pixels must be searched for each block mode, and also various kinds of encoding operations must be independently performed for each block mode. Thus, H.264 video coding requires a large amount of computation and consumes a large amount of computation time.
The present invention is directed to a method and apparatus for determining a block mode using a bit-generation probability in H.264 video coding for Internet protocol (IP)-television (TV), thus capable of minimizing the amount of computation for determining a block mode and deterioration in image quality.
One aspect of the present invention provides a method of determining a block mode using bit-generation probability estimation in moving picture coding, the method comprising: (a) performing motion estimation for an input image frame and determining a current macroblock and a corresponding reference macroblock; (b) calculating an average value and a variance value between the determined current macroblock and the determined corresponding reference macroblock; (c) calculating a bit-generation probability estimation value between the macroblocks using the average value and the variance value between the macroblocks; and (d) determining whether or not a current block mode requires additional motion estimation according to the calculated bit-generation probability estimation value.
Another aspect of the present invention provides an apparatus for determining a block mode using bit-generation probability estimation in moving picture coding, the apparatus comprising a motion estimator comprising: a motion estimation unit for performing motion estimation for an input image frame; and a block mode determination unit for determining whether or not a current block mode requires additional motion estimation using an average value and a variance value between a current macroblock determined by the motion estimation and a corresponding reference macroblock.
The above and other objects, features and advantages of the present invention will become more apparent to those of ordinary skill in the art from the following detailed description of exemplary embodiments thereof with reference to the attached drawings.
Hereinafter, exemplary embodiments of the present invention will be described in detail. However, the present invention is not limited to the embodiments disclosed below, but can be implemented in various ways. The following embodiments are described in order to enable those of ordinary skill in the art to embody and practice the present invention.
The accompanying drawing illustrates the configuration of an H.264 video encoder according to an exemplary embodiment of the present invention. Referring to the drawing, the video encoder 1 performs motion estimation on a currently input image frame using previously restored reference images, and a motion compensator 102 generates a motion-compensated macroblock according to the estimated motion vector.
Here, an intra prediction mode determiner 104 receives the currently input image frame in units of a macroblock together with the previously restored reference images, determines an intra prediction mode, and outputs the mode to an intra predictor 106. The intra predictor 106 generates an image compensated for differences in color and luminance and outputs the image to a subtractor 110.
The subtractor 110 outputs a difference image Dn between the input macroblock and its prediction, that is, either the macroblock motion-compensated by the motion compensator 102 or the image compensated for differences in color and luminance by the intra predictor 106. The output difference image Dn is transformed and quantized in units of a block by a Discrete Cosine Transformer (DCT) 120 and a quantizer 130.
An image frame quantized in units of a block is rearranged for variable length coding by a rearranger 140, and the rearranged image is entropy-encoded by an entropy encoder 150 and output in the form of Network Abstraction Layer (NAL) unit data.
While encoding proceeds, the quantized image output from the quantizer 130 is decoded by a dequantizer 160 and an Inverse Discrete Cosine Transformer (IDCT) 170. The decoded image is input to an adder 180, where the motion-compensated macroblock output from the motion compensator 102 or the color-difference- and luminance-compensated macroblock output from the intra predictor 106 is added to it, thereby restoring the image.
The restored image is passed through a filter 190 for enhancing image quality and then is stored as F′n 13, which is referred to as F′n-1 when a next image is encoded.
The most remarkable characteristic of the present invention is that, in the H.264 video encoder 1 constituted as shown in the accompanying drawing, whether or not a current block mode is a skip mode or a direct-prediction mode is determined by estimating a bit-generation probability from an average value and a variance value between blocks, rather than by performing DCT and quantization operations.
As illustrated in the accompanying drawing, the motion estimator for this purpose comprises a motion estimation unit 310 having a main pixel prediction module 311 and a sub-pixel prediction module 312, and a block mode determination unit 320 having a bit-generation probability estimator 321 and a block mode determiner 322.
First, in order to facilitate understanding of the present invention, differences between conventional art and the present invention will be described.
According to the conventional art, 4×4 DCT and quantization operations are performed, and a Coded Block Pattern (CBP) is obtained using the operation result. Then, using the CBP, it is determined whether or not a current mode is the skip mode in the case of a Predicted (P) frame, or it is determined whether or not a current mode is a direct-prediction mode in the case of a Bidirectionally predicted (B) frame. Subsequently, main pixel and sub-pixel estimation is performed for other modes except the skip mode and the direct-prediction mode, and rate-distortion computation is performed according to the result, thereby selecting a block mode having the minimum rate-distortion cost. Therefore, the amount of computation unnecessarily increases due to the DCT and quantization operations for obtaining a CBP value.
To solve this problem, the present invention does not perform DCT and quantization operations after the first main pixel and sub-pixel estimation is finished, but instead calculates a bit-generation probability estimation value using an average value and a variance value between blocks, thereby determining whether or not the current mode is a skip mode or a direct-prediction mode. This principle of the present invention will be described in detail below using equations.
First, when X∈R4×4 is the DCT result of motion-compensated 4×4 pixel data, the 4×4 data XQ∈R4×4 obtained by quantizing the DCT result may be expressed by Equation 1 below.
In Equation 1, XQ(i,j) denotes the element in the i-th column and j-th row of the quantized block XQ, X(i,j) denotes the element in the i-th column and j-th row of X before quantization, QP denotes a quantization coefficient of H.264, Q[(QP+12) % 6, i, j] denotes a quantization function depending on i, j and the remainder obtained by dividing (QP+12) by 6, and f denotes a quantization level offset value.
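Equation 1 itself is not reproduced in this text. Based solely on the definitions above and the general structure of forward quantization, one plausible form is the following sketch, in which the normalizing term 2^(q_bits) is an assumption rather than a value taken from this description:

\[
X_Q(i,j) = \operatorname{sign}\big(X(i,j)\big)\left\lfloor \frac{\lvert X(i,j)\rvert \cdot Q\big[(QP+12)\bmod 6,\, i,\, j\big] + f}{2^{\,q_\mathrm{bits}}} \right\rfloor
\]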
In Equation 1, the CBP value of a 4×4 block equals 1 when any of the quantized coefficients is not 0, and equals 0 otherwise.
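A minimal sketch of this coded-block-pattern rule (the function and argument names are illustrative):

```python
def cbp_4x4(quantized_coeffs):
    # quantized_coeffs: the 16 quantized transform coefficients X_Q(i, j)
    # of one 4x4 block.  The CBP bit is 1 if any coefficient is nonzero,
    # and 0 otherwise.
    return 1 if any(c != 0 for c in quantized_coeffs) else 0
```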
In other words, according to the conventional art, it is determined using the CBP value calculated by Equation 1 whether or not a current mode is a skip mode or a direct-prediction mode, and thus a large amount of computation is required.
In comparison with this, as illustrated in the accompanying drawing, the distribution of DCT output values closely resembles a two-dimensional Gaussian probability density function.
According to this similarity between the DCT output values and the two-dimensional Gaussian probability density function, the DCT output value Y obtained when the frequency/variance value X equals 0 may be regarded as the average, that is, the DC component value, of the DCT output values.
Therefore, from a DCT output value of a 4×4 block, an average value m4×4 between blocks may be calculated by Equation 2 below, and a variance value V4×4 between blocks may be calculated by Equation 3 below.
In Equations 2 and 3, P(i,j,t−k) denotes the pixel value in the i-th column and j-th row of a unit 4×4 block of the reference image at time (t−k), and P(i,j,t) denotes the corresponding pixel value of the current 4×4 block at time t.
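Equations 2 and 3 are likewise not reproduced here. Assuming, as the surrounding definitions suggest, that the statistics are taken over the 16 pixel differences between the current 4×4 block at time t and the corresponding reference block at time (t−k), they may be sketched as:

\[
m_{4\times4} = \frac{1}{16}\sum_{i=0}^{3}\sum_{j=0}^{3}\big(P(i,j,t)-P(i,j,t-k)\big)
\]
\[
V_{4\times4} = \frac{1}{16}\sum_{i=0}^{3}\sum_{j=0}^{3}\big(P(i,j,t)-P(i,j,t-k)-m_{4\times4}\big)^{2}
\]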
Using the thus-calculated average value m4×4 and variance value V4×4, a bit-generation probability estimation value Eh(m4×4, V4×4, QP) between blocks is calculated by Equation 4 below.
In Equation 4, m4×4 denotes the average value calculated in a unit 4×4 block, V4×4 denotes the variance value calculated in the unit 4×4 block, QP denotes a quantization coefficient of H.264, Q[(QP+12) % 6, i, j] denotes a quantization function depending on i, j and the remainder obtained by dividing (QP+12) by 6, f denotes a quantization level offset value, u(x) denotes a unit step function having a value of 1 when x ≥ 0 and a value of 0 when x < 0, and θ denotes a threshold value for estimating a bit-generation probability, calculated by a least square method to be 2.5 to 3.5.
The bit-generation probability estimation value between blocks calculated by Equation 4 has a value of 0 or 1, which is the same as the CBP value. When the bit-generation probability estimation value between blocks is 0, the block mode determiner 322 determines that a current mode is a skip mode or a direct-prediction mode not requiring any further motion estimation.
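Because Equation 4 itself is not reproduced in this text, the following is only an illustrative stand-in that mirrors the described behavior, namely a unit step applied to a statistic built from the block mean, the block variance, a QP-dependent quantization step, and the threshold θ; the specific way these quantities are combined is an assumption and not the patented formula:

```python
import math

def unit_step(x):
    # u(x) = 1 when x >= 0 and 0 when x < 0, as defined for Equation 4.
    return 1 if x >= 0 else 0

def bit_probability_estimate(m_4x4, v_4x4, qstep, theta=3.0):
    # Illustrative stand-in for Equation 4: predict whether any quantized
    # coefficient of the 4x4 residual would be nonzero by comparing a
    # statistic built from the block mean and standard deviation, scaled
    # by an assumed QP-dependent quantization step qstep, against the
    # threshold theta (reported to lie between about 2.5 and 3.5).
    statistic = (abs(m_4x4) + math.sqrt(max(v_4x4, 0.0))) / max(qstep, 1e-9)
    return unit_step(statistic - theta)
```

A value of 0 would then be interpreted, as described above, as a block expected to produce no coded bits, i.e., a candidate for the skip or direct-prediction mode.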
In other words, the conventional art requires a large amount of computation because, after main pixel and sub-pixel estimation is finished, DCT and quantization are performed, a CBP value is obtained from the result, and only then is it determined using the CBP value whether or not the current mode is a skip mode or a direct-prediction mode. According to the present invention, on the other hand, the amount of computation can be reduced because, when all of the 16 bit-generation probability estimation values calculated by Equation 4 are 0, it is possible to determine immediately after main pixel and sub-pixel estimation whether or not the current mode is a skip mode or a direct-prediction mode.
When the block mode for a P frame is determined using the bit-generation probability estimation value between blocks calculated by Equation 4, the determination shows an accuracy of about 75% or more.
Meanwhile, a block mode may be incorrectly determined by Equation 4. In a first case, a current mode is not determined as a skip mode or a direct-prediction mode, even though it is the skip mode or the direct-prediction mode. In a second case, a current mode is determined as a skip mode or a direct-prediction mode even though it is not the skip mode or the direct-prediction mode. In the first case, it does not matter because the current mode is corrected to the skip mode or the direct-prediction mode. However, in the second case, image quality may deteriorate. Therefore, in Equation 4, the threshold value θ for estimating a bit-generation probability may be reduced from the value of 2.5 to 3.5, which is calculated by the least square method, by about 10%.
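For instance (illustrative values), if θ is calculated as 3.0 by the least square method, reducing it by about 10% gives a working threshold of roughly 3.0 × 0.9 = 2.7, which makes the estimator less likely to declare a skip or direct-prediction mode and thereby guards against the second, quality-degrading case.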
On the other hand, in the case of the B frame, a bit-generation probability may be correctly estimated owing to bidirectional, i.e., forward and backward, prediction. Thus, the bit-generation probability estimation value Eh(m4×4, V4×4, QP) between blocks is calculated by Equation 5 below, which is obtained by simplifying Equation 4.
In Equation 5, m4×4 denotes the average value calculated in a unit 4×4 block, V4×4 denotes the variance value calculated in the unit 4×4 block, QP denotes a quantization coefficient of H.264, Q[(QP+12) % 6, i, j] denotes a quantization function depending on i, j and the remainder obtained by dividing (QP+12) by 6, and f denotes a quantization level offset value.
Using a bit-generation probability estimation value between blocks calculated by Equation 5, a block mode for the B frame is determined, which shows a high accuracy equal to or more than about 97%.
In this way, the method of determining a block mode according to an exemplary embodiment of the present invention can rapidly determine whether or not a current mode is a skip mode or a direct-prediction mode using only an average value and a variance value between blocks. In comparison with a conventional method of determining a block mode using DCT, the determination requires only about 50% of the amount of computation, and the actual encoding rate increases by 10 to 20%.
First, when an image frame is input, the motion estimation unit 310 performs motion estimation in units of a main pixel and a sub-pixel using the main pixel prediction module 311 and a sub-pixel prediction module 312, thereby estimating a motion vector (step 510). According to the estimated motion vector, a current macroblock and a reference macroblock corresponding to the current macroblock are determined.
Subsequently, using the bit-generation probability estimator 321, the block mode determination unit 320 calculates an average value and a variance value between the current macroblock and the corresponding reference macroblock determined according to the estimated motion vector (step 520), and then calculates a bit-generation probability estimation value between the blocks using the calculated average value and variance value (step 530).
Subsequently, the block mode determiner 322 of the block mode determination unit 320 checks whether all of 16 bit-generation probability estimation values calculated in step 530 are 0 (step 540). When all of the 16 bit-generation probability estimation values are 0, the block mode determiner 322 determines that a current mode is a skip mode or a direct-prediction mode not requiring additional motion estimation (step 550), and then transfers the determination result to the motion estimation unit 310 to finish motion estimation (step 560).
Meanwhile, when it is found in step 540 that not all of the 16 bit-generation probability estimation values are 0, the block mode determiner 322 determines that the current mode requires additional motion estimation, performs DCT and quantization, and then calculates a CBP value from the result (step 570). Subsequently, the block mode determiner 322 performs rate-distortion computation using the CBP value (step 580) and selects the block mode having the minimum rate-distortion cost (step 590).
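Putting steps 510 to 590 together, a schematic sketch of the decision flow follows; every encoder.* call is an illustrative placeholder standing in for a routine described above, not an actual H.264 API:

```python
def decide_block_mode(macroblock, reference, encoder):
    # Step 510: main pixel and sub-pixel motion estimation.
    mv = encoder.estimate_motion(macroblock, reference)
    estimates = []
    for cur_blk, ref_blk in encoder.paired_4x4_blocks(macroblock, reference, mv):
        # Step 520: average and variance between current and reference 4x4 blocks.
        m, v = encoder.mean_and_variance(cur_blk, ref_blk)
        # Step 530: bit-generation probability estimate for this 4x4 block.
        estimates.append(encoder.bit_probability_estimate(m, v))

    # Step 540: check whether all 16 estimates are 0.
    if all(e == 0 for e in estimates):
        # Steps 550-560: skip mode (P frame) or direct-prediction mode
        # (B frame); motion estimation is finished.
        return "skip_or_direct"

    # Steps 570-590: conventional path -- DCT, quantization, CBP,
    # rate-distortion computation, and selection of the block mode with
    # the minimum rate-distortion cost.
    return encoder.rate_distortion_best_mode(macroblock, reference, mv)
```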
For example, when the bit-generation probability estimation results based on the macroblock partition modes illustrated in the accompanying drawing are all 0, the current mode is determined to be a skip mode or a direct-prediction mode; when any of the results is 1, the block mode is determined through an additional searching process.
In this exemplary embodiment, a block mode is determined through an additional searching process when the bit-generation probability value of a macroblock is 1. However, those skilled in the art can instead determine a block mode using a clustering algorithm. In this case, needless to say, the criterion on the bit-generation probability estimation value that triggers the additional searching process may be changed by those skilled in the art.
According to the present invention, whether or not a current mode is a skip mode or a direct-prediction mode is first determined by estimating a bit-generation probability in H.264 video coding for IP-TV, and thus it is possible to reduce unnecessary computation. Therefore, by minimizing the amount of computation for determining a block mode, it is possible to increase an encoding rate and also minimize deterioration in image quality.
While the invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind
---|---|---|---
10-2007-0082078 | Aug. 16, 2007 | KR | national