(A) Field of the Invention
The present invention relates to a video decoding method, and more specifically, to a decoding method of context-based adaptive binary arithmetic coding (CABAC).
(B) Description of the Related Art
H.264/AVC is the latest video coding standard developed by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. It has several new features, including multiple reference frames, variable block size motion estimation, an integer DCT, an in-loop deblocking filter, and context-based adaptive binary arithmetic coding (CABAC). In comparison with MPEG-4, H.264/AVC can achieve up to 50% bit-rate saving under the same video quality constraint.
CABAC is one of two entropy coding methods in H.264/AVC, and the arithmetic decoding is shown in
Compared to the other entropy coding method, context-based adaptive variable length coding (CAVLC), CABAC saves more than 7% of the bit rate at the expense of higher computational complexity. Profiling results show that CABAC consumes about 10% of the total decoding time. Therefore, accelerating CABAC decoding with a hardwired implementation is desirable for high-performance or low-power applications.
According to an analysis of decoding times for different types of syntax elements, the present invention provides a highly efficient CABAC decoding method by decreasing the decoding cycles.
According to the present invention, a decoding method of a CABAC decoder comprises an arithmetic engine that performs two arithmetic decodings for Coefficient, Significant_flag and Last_significant_flag bins, and a novel context memory architecture that reads contexts at the same time in a clock cycle.
More specifically, the arithmetic decoding for a coefficient comprises the steps of: (1) providing a residual block comprising Significant_flags, Last_significant_flags, coefficients and the corresponding contexts; (2) sequentially resolving the Significant_flag and the Last_significant_flag of a non-zero coefficient; and (3) decoding the non-zero coefficient to obtain regular bins and bypass bins, wherein one or two bins are arithmetically decoded in a clock cycle.
Reading contexts at the same time comprises the steps of: (1) providing a video block comprising Significant_flags, Last_significant_flags, coefficients and the corresponding contexts; (2) rearranging a context table corresponding to the plurality of contexts to a first context table and a second context table, the first context table comprising Significant_flags of the contexts and the second context table comprising Last_significant_flags of the contexts; and (3) simultaneously reading contexts corresponding to the Significant_flags and the Last_significant_flags for decoding.
The objectives and advantages of the present invention will become apparent upon reading the following description and upon reference to the accompanying drawings in which:
The CABAC decoding method of the present invention is illustrated with reference to the appended drawings.
Table 1 shows the data distribution of different syntax elements (SE). According to the bin numbers in Table 1, Coded_block_flag, Coefficient, Significant_flag and Last_significant_flag (Sig. & Last_sig. pair) occupy around 80% of the total data and, in particular, 90% in I macroblocks. In addition, the data rate of I macroblocks is more than 3 times that of P and B macroblocks. The present invention therefore aims mainly to increase the decoding efficiency of I macroblocks.
H.264/AVC partitions one macroblock into 24 “4×4 residual blocks.”
Referring to
After the Significant_flags and Last_significant_flags are resolved, the non-zero coefficients mapped to the Sig. & Last_sig. pairs are obtained; in order, they are −1, 10 and −20. The CABAC decoder represents the coefficient value by a unary and 0th-order Exp-Golomb scheme and indicates the sign of the coefficient by a Sign_flag syntax element. The decoded coefficient includes a regular portion, which is the prefix part, followed by a bypass portion, which is the sign flag. If the coefficient is negative, the value of the Sign_flag is equal to 1. If the coefficient is positive, the value is equal to 0.
The present invention proposes two methods to reduce the clock cycles for decoding the Coefficient syntax element and the Sig. & Last_sig. pair, respectively.
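The binarization described above can be sketched as follows. This is a minimal illustration rather than the decoder of the embodiment: the function name `binarize_coefficient` is hypothetical, and the prefix cutoff of 14 is taken from the H.264/AVC binarization of the absolute coefficient level, not from the text above. The sketch produces the regular bins (unary prefix) and the bypass bins (0th-order Exp-Golomb suffix, if any, followed by the Sign_flag) for one non-zero coefficient.

```python
def binarize_coefficient(coeff, cutoff=14):
    """Return (regular_bins, bypass_bins) for one non-zero coefficient.

    Assumption: the unary prefix is truncated at `cutoff` ones, as in the
    H.264/AVC standard; larger values continue with an Exp-Golomb suffix.
    """
    if coeff == 0:
        raise ValueError("only non-zero coefficients are binarized")
    value = abs(coeff) - 1                      # absolute level minus 1

    # Regular portion: unary prefix, terminated by 0 unless truncated.
    regular = [1] * min(value, cutoff)
    if value < cutoff:
        regular.append(0)

    # Bypass portion: 0th-order Exp-Golomb suffix (only past the cutoff).
    bypass = []
    if value >= cutoff:
        rest, k = value - cutoff, 0
        while rest >= (1 << k):
            bypass.append(1)
            rest -= 1 << k
            k += 1
        bypass.append(0)
        for shift in range(k - 1, -1, -1):
            bypass.append((rest >> shift) & 1)

    # Sign_flag: 1 for a negative coefficient, 0 for a positive one.
    bypass.append(1 if coeff < 0 else 0)
    return regular, bypass


for c in (-1, 10, -20):
    print(c, binarize_coefficient(c))
```

For the coefficient −1, the sketch yields a single regular bin 0 followed by the Sign_flag bin 1, which is the case that dominates the cycle count discussed below.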
The CABAC decoder uses 41% of total cycles to decode Coefficient SE. Therefore, the present invention proposes a two-bin-per-cycle method as depicted in
A context memory 51 transfers context data to an arithmetic engine 53 through a forwarding circuit 52. The forwarding circuit 52 is configured to avoid reading non-updated context data when decoding a sequence of bins with the same context. The arithmetic engine 53 includes two arithmetic decoders 531 and 533 and two renormalization modules 532 and 534. The arithmetic decoder 531, the renormalization module 532, the arithmetic decoder 533 and the renormalization module 534 are connected in series. The arithmetic decoders 531 and 533 transmit bin values to a syntax element decoder and the number of shift bits to a buffer 54. The buffer 54 transmits the bit streams to the renormalization modules 532 and 534.
The arithmetic engine 53 includes the two arithmetic decoders 531 and 533, so that two regular bins, two bypass bins or a regular bin and a bypass bin can be decoded in a clock cycle.
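The series connection can be illustrated for the simplest case, two bypass bins. The sketch below is an assumption-laden software model, not the circuit of the embodiment: the function names are hypothetical, the bypass step follows the H.264/AVC bypass decoding rule (shift one bit into the offset, then compare it against the range), and regular-mode decoding is omitted because it additionally needs the standard's range table and context update.

```python
def decode_bypass(rng, offset, bit):
    """One bypass decoding step: returns (bin_value, new_offset)."""
    offset = (offset << 1) | bit          # shift in one bit from the stream
    if offset >= rng:
        return 1, offset - rng
    return 0, offset


def decode_two_bypass(rng, offset, bits):
    """Two chained bypass steps, modelling decoders 531 and 533 in series.

    The second stage consumes the offset produced by the first stage,
    so both bins are resolved within one call (one modelled clock cycle).
    """
    first, offset = decode_bypass(rng, offset, bits[0])
    second, offset = decode_bypass(rng, offset, bits[1])
    return (first, second), offset


print(decode_two_bypass(300, 100, (1, 0)))   # ((0, 1), 102)
```

The range is unchanged by bypass decoding, which is why no renormalization of the range appears between the two stages in this model.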
Referring to
If the first bin is not equal to 0, empirical data show that the percentage of coefficients equal to 2 or −2 is around 20%. Therefore, a second bin equal to 0 has the higher probability, and the second bin is assumed to be 0, which is still decoded in regular mode. If the coefficient is equal to 2, decoding takes two clock cycles: one cycle for the first bin, and one cycle for the second bin and the Sign_flag bin.
If the coefficient is equal to 3, it needs three clock cycles to complete the decoding, including a cycle for decoding the first bin, a cycle for decoding the second and the third bins, and a cycle for decoding the Sign_flag bin.
In other words, a regular bin together with a bypass bin can be decoded in one cycle by assuming that the second bin is in bypass mode. Therefore, if the coefficient is equal to 1 or −1, only one clock cycle is needed for decoding.
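The three cases above (coefficients of magnitude 1, 2 and 3) can be reproduced by a simple grouping rule, sketched below. The rule and the function name `schedule_coefficient` are assumptions made for illustration: the first cycle carries the first regular bin, joined by the Sign_flag only when that bin is 0, and later cycles carry up to two bins each. The behavior for magnitudes above 3 is an extrapolation of this rule and is not stated in the description.

```python
def schedule_coefficient(abs_level):
    """Group the bins of one coefficient into modelled clock cycles.

    "R1"/"R0" are regular prefix bins with value 1/0; "S" is the Sign_flag.
    Only magnitudes 1..14 (unary prefix without a suffix) are modelled.
    """
    if not 1 <= abs_level <= 14:
        raise ValueError("only magnitudes 1..14 are modelled")
    bins = ["R1"] * (abs_level - 1) + ["R0", "S"]

    # Cycle 1: the second decoder speculates on a bypass Sign_flag, which
    # is only valid when the first regular bin turns out to be 0.
    if bins[0] == "R0":
        return [bins]
    cycles = [[bins[0]]]

    # Later cycles (assumed rule): two bins per cycle until none remain.
    rest = bins[1:]
    for i in range(0, len(rest), 2):
        cycles.append(rest[i:i + 2])
    return cycles


for level in (1, 2, 3):
    print(level, schedule_coefficient(level))
```

Under this rule, magnitudes 1, 2 and 3 take one, two and three cycles respectively, in agreement with the description.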
Referring to Table 2, the present invention, in comparison with the prior art, can effectively reduce the clock cycles for coefficient decoding. Taking into account control overhead and stalls due to an empty buffer, the proposed two-bin-per-cycle method contributes a 13% reduction of total cycles.
The above embodiment performs two arithmetic decodings for a coefficient in one clock cycle. However, applications that perform other numbers of decodings per clock cycle based on the same concept are also covered by the present invention.
According to empirical analysis, there are on average 6 Significant_flags and 4 Last_significant_flags in one 4×4 block. Their decoding accounts for 31.7% of the total time. The context table is divided into two tables as shown in
By the rearrangement of context tables, the proposed CABAC decoder saves 12% of total cycles after taking into consideration stall due to buffer emptiness.
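The effect of splitting the context table can be sketched with a small access-count model. Everything named here is hypothetical: the classes, the placeholder context entries, and the choice of 15 flag positions per 4×4 block (taken from the H.264/AVC syntax, not from the text above). The model counts context memory reads only; it does not model the arithmetic engine or the cycle savings reported above.

```python
class SingleTableMemory:
    """One table for all contexts; each access returns one context."""

    def __init__(self, positions=15):
        self.positions = positions
        self.table = [(0, 0)] * (2 * positions)   # placeholder (state, MPS)
        self.reads = 0

    def read_sig(self, pos):
        self.reads += 1
        return self.table[pos]

    def read_last(self, pos):
        self.reads += 1
        return self.table[self.positions + pos]


class SplitTableMemory:
    """Two tables; one access returns the Sig. and Last_sig. contexts."""

    def __init__(self, positions=15):
        self.sig_table = [(0, 0)] * positions
        self.last_table = [(0, 0)] * positions
        self.reads = 0

    def read_pair(self, pos):
        self.reads += 1
        return self.sig_table[pos], self.last_table[pos]


def reads_for_block(flag_pairs):
    """Count context reads for a list of (sig, last) flags per position."""
    single, split = SingleTableMemory(), SplitTableMemory()
    for pos, (sig, last) in enumerate(flag_pairs):
        single.read_sig(pos)
        if sig:                      # a Last_sig. flag follows only a 1
            single.read_last(pos)
        split.read_pair(pos)         # both contexts fetched together
        if sig and last:
            break
    return single.reads, split.reads


# 6 Significant_flags and 4 Last_significant_flags, as in the average case.
print(reads_for_block([(1, 0), (0, 0), (1, 0), (1, 0), (0, 0), (1, 1)]))
```

For this block the single table needs 10 reads while the split tables need 6, because the context of a Last_significant_flag is already available when its Significant_flag has been resolved.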
In an embodiment, 309 clock cycles are used to decode a typical I-type macroblock. The decoder needs to run at only 45 MHz for a 1080 HD application.
The above-described embodiments of the present invention are intended to be illustrative only. Numerous alternative embodiments may be devised by those skilled in the art without departing from the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
096111738 | Apr 2007 | TW | national |