The invention will be more fully understood by reference to the following drawings which are for illustrative purposes only:
Referring more specifically to the drawings, for illustrative purposes the present invention is embodied in the method and apparatus generally shown in
The invention pertains to video encoding mode selection within the framework of the AVC standard, but is also applicable in any video encoding system that uses block based encoding.
In
In
The basic steps of video coding and the basic structures of coding systems are well known in the art, and can be implemented in many different embodiments and configurations, so they are shown in general functional representations in
The present invention applies to the methods of selecting the encoding mode of step 16, and to the encoding mode selector 36, and in particular to skip mode selection. The invention can be implemented in any AVC standard video coding method and apparatus.
In the present invention, the approach is totally different from the prior art. Instead of fixed bias deduction, both penalty and bias become possible and are adaptively determined according to the real coding condition. The method is illustrated generally in
A more detailed description of this embodiment of the invention follows. First, the basic principles related to multi-pass encoding based rate-distortion optimization and SAD/SATD based mode decision for video compression within the AVC standard are presented (Section I). The encoding method of the present invention for skip mode cost modulation is then set forth in detail (Sections IIA, IIB, and IIC). Finally, a set of experimental results (Section III) and conclusions (Section IV) are provided.
I. AVC Encoding Mode Decision Overview
The selection of the best encoding mode to encode each macroblock is one of the decisions in the AVC standard that has a very direct impact on the bit rate R of the compressed bitstream, as well as on the distortion D in the decoded video sequence. The goal of encoding mode selection is to select the encoding mode that minimizes the distortion subject to a bit rate constraint. To obtain the optimal result, the macroblock mode decision is made by minimizing the Lagrangian functional:
J(s,c,MODE|QP, λMODE)=SSD(s,c,MODE|QP)+λMODE·R(s,c,MODE|QP)
where QP is the macroblock quantiser, λMODE is the Lagrange multiplier for mode decision, and MODE indicates a mode chosen from the set of potential prediction modes:
where 8×8 includes all the mode combinations of 8×8, 8×4, 4×8 and 4×4, and INTRA 8×8 MODE is included in FRExtension. The SKIP mode refers to the 16×16 mode where no motion and residual information is encoded. SSD is the sum of the squared differences between the original block s and its reconstruction c given as
and R(s,c,MODE|QP) is the number of bits associated with choosing MODE and QP, including the bits for the macroblock header, the motion, and all DCT (discrete cosine transform) blocks. cY[x, y, MODE|QP] and sY[x, y] represent the reconstructed and original luminance values; cu, cv and su, sv are the corresponding chrominance values.
The Lagrangian multiplier λMODE is given by
λMODE,P = 0.85 × 2^(QP/3)
for I and P frames and
for B frames, where QP is the macroblock quantization parameter.
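The Lagrangian mode decision described above can be sketched as follows. This is a minimal illustration, not the reference encoder: the helper names (`ssd`, `lambda_mode_p`, `rd_cost`, `best_mode`) are hypothetical, blocks are passed as flat sample lists rather than separate luminance and chrominance planes, and only the I/P-frame multiplier stated in the text is implemented.

```python
from typing import Dict, Sequence, Tuple


def ssd(original: Sequence[int], reconstructed: Sequence[int]) -> int:
    """Sum of squared differences between two equally sized sample lists."""
    if len(original) != len(reconstructed):
        raise ValueError("blocks must have the same number of samples")
    return sum((s - c) ** 2 for s, c in zip(original, reconstructed))


def lambda_mode_p(qp: int) -> float:
    """Lagrange multiplier for I and P frames as given in the text: 0.85 * 2^(QP/3)."""
    return 0.85 * 2.0 ** (qp / 3.0)


def rd_cost(distortion: float, rate_bits: int, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(candidates: Dict[str, Tuple[float, int]], qp: int) -> str:
    """Pick the mode with the smallest J.

    `candidates` maps a mode name to (SSD, rate in bits); both values are
    assumed to come from actually encoding and decoding the macroblock.
    """
    lam = lambda_mode_p(qp)
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
```

With illustrative numbers, `best_mode({"SKIP": (900.0, 1), "16x16": (400.0, 40)}, qp=24)` favors SKIP because the large multiplier makes the 40 bits expensive, whereas the same candidates at `qp=6` favor the 16×16 mode.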
The above approach can provide near-optimal performance. However, it requires multi-pass encoding and decoding, and the resulting complexity prevents its use in practical applications. To solve this problem, a low complexity approach is widely adopted. This low complexity approach can be described as minimizing the following cost functional:
J(s,c,MODE|QP,λMODE)=SA(T)D(s,c,MODE|QP)+λMODE·R(MV,REF)
Compared to the Lagrangian functional above, SSD is replaced by SAD or SATD (SA(T)D in the equation stands for either SAD or SATD), and the rate represents only the bits needed to code the motion vector and reference picture index. SAD is the sum of the absolute differences between the original block s and its reconstruction c, and SATD is the sum of the absolute transformed differences between the original block s and its reconstruction c. Note that cost estimation based on SATD is usually more accurate than estimation based on SAD, but SATD is more complex to compute. Therefore, the choice between SAD and SATD may depend on the application requirements.
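The difference between the two measures can be illustrated on a single 4×4 block. The sketch below assumes a 4×4 Hadamard transform for SATD and applies no normalization factor; the transform choice, the block size, and the function names are assumptions made for illustration, since the text does not specify them.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4: Block = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def _diff(a: Block, b: Block) -> Block:
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _matmul4(a: Block, b: Block) -> Block:
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def sad(orig: Block, pred: Block) -> int:
    """Sum of absolute differences, computed in the spatial domain."""
    return sum(abs(v) for row in _diff(orig, pred) for v in row)


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute Hadamard-transformed differences (unnormalized)."""
    transformed = _matmul4(_matmul4(H4, _diff(orig, pred)), H4)
    return sum(abs(v) for row in transformed for v in row)
```

SATD needs two extra matrix products per block, which is the additional complexity the paragraph above refers to.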
In this way, a one-pass mode decision can be obtained without any encoding and decoding process. Since there is no motion vector information for the SKIP and INTRA modes, some adjustments are made. For INTRA 16×16 mode, the Lagrangian cost is just SA(T)D; for SKIP mode, 8·f(QP) is subtracted from SA(T)D to favor the skip mode, where f denotes a fixed equation; for the whole intra 4×4 macroblock, 12·f(QP) is added to the SA(T)D before comparison with the best SA(T)D for inter prediction. This is an empirical value chosen to prevent the use of too many intra blocks. These strategies have been shown to improve the encoding performance.
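The fixed adjustments just described can be written as a small helper. The text only states that f is "a fixed equation", so f is passed in as a parameter here; the function name and mode labels are hypothetical.

```python
from typing import Callable


def conventional_adjusted_cost(mode: str, sa_t_d: float, qp: int,
                               f: Callable[[int], float]) -> float:
    """Apply the fixed, QP-only adjustments of the conventional method.

    `f` stands in for the fixed equation mentioned in the text; its actual
    form is not given there, so the caller must supply it.
    """
    if mode == "SKIP":
        return sa_t_d - 8 * f(qp)    # fixed bias in favor of skip mode
    if mode == "INTRA4x4":
        return sa_t_d + 12 * f(qp)   # fixed penalty against intra 4x4
    if mode == "INTRA16x16":
        return sa_t_d                # cost is just SA(T)D
    raise ValueError(f"no fixed adjustment defined for mode {mode!r}")
```

Note that the adjustment depends only on QP and never on the measured SA(T)D, which is the limitation the invention addresses.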
According to the above analysis, the conventional method of modulating the cost of the skip mode is fixed. Once the quantization scale is fixed, a fixed level is deducted from the original skip mode cost regardless of the actual coding conditions. This is illustrated in
IIA. Complexity based skip mode cost modulation
According to the above analysis, the skip mode decision is effectively a bit allocation problem. Skip mode saves bits at the expense of quality degradation, so the question is when to select it. According to conventional R/D theory, if the R/D relationship is exponential-like, the optimal bit allocation minimizes the quality difference among the macroblocks within a frame or a set of frames. Thus, if the current macroblock is likely to have a large coding distortion, more bits need to be assigned to it to reduce the distortion. Conversely, if the current macroblock is likely to have a small coding distortion, no additional bits need to be assigned to it. Once the average distortion is known, bit allocation can be applied accordingly. Therefore, if the expected distortion of the skip mode is larger than the expected average distortion, the skip mode should not be selected. If the expected distortion of the skip mode is less than the expected average distortion, more bias should be given to the skip mode so that the saved bits can be used by other macroblocks to reduce their distortion. Because the original skip mode estimation methods use a uniform cost deduction without considering the actual distortion, modification is needed to improve the R/D performance.
Therefore, the present invention performs skip mode estimation as shown in
In R/D theory, distortion is usually measured by the mean square error (MSE or SSD). Assuming the current picture uses the same quantization scale Q throughout, the expected distortion of the current picture after decoding can be roughly estimated as Q^2/12 for a uniform distribution. This value can then be set as a threshold and compared with the MSE of the skip mode. However, SAD/SATD are more widely used for cost estimation due to their low complexity and good performance. Hence, it is necessary to find a simple way to estimate the SAD/SATD based distortion of the current frame.
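The Q^2/12 estimate follows from a standard calculation, shown here under the assumption that the quantization error e is uniformly distributed over one quantization step:

```latex
% Assumption: the quantization error e is uniform on [-Q/2, Q/2]
\mathbb{E}[e^2]
  = \frac{1}{Q}\int_{-Q/2}^{Q/2} e^2 \, de
  = \frac{1}{Q}\left[\frac{e^3}{3}\right]_{-Q/2}^{Q/2}
  = \frac{1}{Q}\cdot\frac{2}{3}\cdot\frac{Q^3}{8}
  = \frac{Q^2}{12}
```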
The distortion due to quantization can be estimated according to the assumed PDF (probability density function) of the DCT residues. Assuming a uniform quantizer with step size Q, the SAD/SATD based distortion caused by quantization is given by
It can be shown that this infinite sum converges and is bounded by Q. Since there are 256 pixels in one macroblock, the SAD/SATD of one macroblock is bounded by 256·Q. Recent research shows that the Cauchy distribution more accurately reflects the distribution of AVC coded DCT residues. Thus, it can be used to obtain the expected distortion based on the above equation. In practice, rate control may use different Q values for different macroblocks. In this case, the picture level QP should be used to obtain the expected distortion. The expected distortion is then used as the threshold. If the SAD/SATD caused by the current skip mode is larger than the threshold, a penalty is gradually added to the skip mode cost; if it is less than the threshold, the cost of the skip mode is gradually reduced by subtracting a bias. After this modulation, the skip mode cost is compared to the best inter mode cost and the best intra mode cost to determine the final mode.
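The threshold comparison and modulation can be sketched as below. The text says the penalty and bias are applied "gradually" but does not give their form, so this sketch assumes a linear ramp proportional to the distance from the threshold; the gains, the function names, and the mode labels are all hypothetical.

```python
def modulate_skip_cost(skip_distortion: float, threshold: float,
                       penalty_gain: float = 0.5,
                       bias_gain: float = 0.5) -> float:
    """Raise or lower the skip mode cost depending on the threshold.

    Assumption: penalty and bias grow linearly with the distance between
    the skip mode SAD/SATD and the expected-distortion threshold.
    """
    if skip_distortion > threshold:
        # Skip would be worse than the expected distortion: add a penalty.
        return skip_distortion + penalty_gain * (skip_distortion - threshold)
    # Skip is at or below the expected distortion: subtract a bias.
    return skip_distortion - bias_gain * (threshold - skip_distortion)


def choose_mode(skip_distortion: float, best_inter_cost: float,
                best_intra_cost: float, threshold: float) -> str:
    """Compare the modulated skip cost with the best inter and intra costs."""
    costs = {
        "SKIP": modulate_skip_cost(skip_distortion, threshold),
        "INTER": best_inter_cost,
        "INTRA": best_intra_cost,
    }
    return min(costs, key=costs.get)
```

For example, with a threshold of 1000, a skip SAD of 1200 is raised to 1300 and loses to an inter cost of 1250, while a skip SAD of 800 is lowered to 700 and wins against an inter cost of 750.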
IIB. Complexity Based Threshold and Modulation Level Adjustment
In the current video encoding strategy, the accuracy of motion estimation is partly dependent on the quality of the previous reference frame.
If the previous frame has good quality, the prediction residue in the next frame will be smaller and the same quality can be obtained using fewer bits; if the previous frame has poor quality, the prediction residue in the next frame will be larger and the same quality can only be obtained using more bits. Although skip mode can save many bits, the quality of a macroblock (MB) coded in skip mode is usually worse than that of one coded in another inter mode. In low bit-rate conditions, the quality of the reference picture is usually not very good because of the large quantization scale. Hence, even with very good motion estimation, a large residue remains after compensation. Under this condition, the conventional method selects skip mode for a higher percentage of macroblocks, so the poor quality of the previous frame propagates more frequently to the subsequent frames. It is therefore necessary to add more penalty to the skip mode at low bit rates. Conversely, more bias should be given to the skip mode at high bit rates.
For a low complexity application such as a mobile device, SAD has to be used to save encoding time. Because SAD is computed in the spatial domain, it is sometimes less accurate than SATD. Through our investigation, we found that a smaller threshold should be used when SAD based prediction is used.
With the above analysis in mind, the present invention can be further adjusted as follows, as illustrated in
IIC. Overall Scheme for Skip Mode Cost Estimation
The overall skip mode estimation scheme of the invention is shown in
Finally, the skip mode cost is modulated by either penalty level or bias level, step 94.
An apparatus 100 for carrying out the overall skip mode estimation scheme of
The SAD or SATD is obtained accordingly. At the same time, quantization determination unit 106 derives the quantization scale from the rate control module (not shown). The initial threshold is calculated by threshold estimator 108, using inputs from difference calculator 104 and quantization determination unit 106. If SAD or a very fast codec is used, the threshold is adjusted according to the above procedure. Then, the skip mode SAD/SATD from difference calculator 104 is compared with the threshold from threshold estimator 108 in comparator 110. If it is greater than the threshold, the penalty modulation level is calculated in penalty modulation unit 112 based on the quantization level (and any adjustments to the threshold); if it is less than the threshold, the bias modulation level is calculated in bias modulation unit 114 based on the quantization level (and any adjustments to the threshold). Finally, the skip mode cost is modulated by either the penalty level or the bias level in skip mode modulation unit 116. Apparatus 100 is generally a processor, e.g. a digital computer, or a part thereof, and the various components may be implemented in hardware or in software.
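A software arrangement of these units might look like the following sketch. The text gives no numeric values, so every constant here (the fraction of the 256·Q bound used as the threshold, the SAD scale factor, the reference quantization step, and the gain formulas) is a placeholder chosen only to reproduce the stated trends: a smaller threshold for SAD, more penalty at a large quantization scale, and more bias at a small one. The class and method names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

MB_PIXELS = 256  # 16x16 luminance samples per macroblock


@dataclass
class SkipModulator:
    threshold_fraction: float = 0.25   # placeholder share of the 256*Q bound
    sad_threshold_scale: float = 0.75  # placeholder: smaller threshold for SAD
    q_reference: float = 20.0          # placeholder pivot between high and low bit rate

    def threshold(self, q_step: float, use_sad: bool) -> float:
        """Threshold estimator (unit 108), reduced when SAD is used."""
        base = self.threshold_fraction * MB_PIXELS * q_step
        return base * self.sad_threshold_scale if use_sad else base

    def levels(self, q_step: float) -> Tuple[float, float]:
        """Penalty and bias gains (units 112 and 114).

        The penalty gain grows and the bias gain shrinks as the quantization
        step grows, i.e. as the bit rate drops. q_step must be positive.
        """
        ratio = q_step / self.q_reference
        return min(1.0, 0.5 * ratio), min(1.0, 0.5 / ratio)

    def skip_cost(self, skip_distortion: float, q_step: float,
                  use_sad: bool = True) -> float:
        """Comparator (unit 110) followed by skip mode modulation (unit 116).

        `skip_distortion` is the skip mode SAD/SATD from unit 104.
        """
        limit = self.threshold(q_step, use_sad)
        penalty_gain, bias_gain = self.levels(q_step)
        if skip_distortion > limit:
            return skip_distortion + penalty_gain * (skip_distortion - limit)
        return skip_distortion - bias_gain * (limit - skip_distortion)
```

With these placeholder values, a skip distortion of 1024 at a quantization step of 20 is penalized when SAD is used (threshold 960) but biased when SATD is used (threshold 1280).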
Thus the invention provides improved method and apparatus of the types shown generally in
III. Experimental Results
The effectiveness of the skip mode optimization method of the present invention has been tested on two sequences. Flower is an interlaced sequence and is coded by an IP only structure with two reference fields (I frames); City is a progressive sequence and is coded by an IP only structure with one reference frame. In order to obtain a fair comparison, the performance was first tested using fixed QP with fast coding (SAD, fast inter mode decision in SONY Real Time AVC Encoder). The results are shown in Table 1 below. It is seen that the present method significantly improves the performance of the fast encoder when fixed QP is used.
Then, the performance was tested by using rate control when high quality coding (SATD, SONY High Quality AVC Encoder) is used. The results are shown in Table 2 below. It is seen that the present method also improves the performance of the high quality encoder when rate control is used.
Besides the SONY codec, the invention has also been tested on other codecs. Moderate coding gain has been obtained.
IV. Conclusions
This invention provides a method and apparatus to improve the skip mode selection in P pictures within the framework of the AVC standard.
Radically different from the prior art, which uses fixed bias to favor the skip mode, the invention has improved skip mode estimation by complexity based threshold determination, penalty modulation level adjustment and bias modulation level adjustment for the encoding mode selection. Experimental results have demonstrated the superior subjective and objective quality of the invention on various video sequences compared to the result obtained using a reference encoder. In the case of fast low complexity encoding, the improvement is significant. Moreover, this improvement is obtained without any complexity increase and can be easily embedded into any encoding system. Although the present invention makes use of the AVC framework, the encoding method of the present invention is applicable in any video encoding system that employs the block based encoding design.
Although the description above contains many details, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments of this invention. Therefore, it will be appreciated that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular is not intended to mean “one and only one” unless explicitly so stated, but rather “one or more.” All structural and functional equivalents to the elements of the above-described preferred embodiment that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device to address each and every problem sought to be solved by the present invention, for it to be encompassed by the present claims. Furthermore, no element or component in the present disclosure is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the claims. No claim element herein is to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for.”