1. Field
This disclosure relates generally to decoders, and more specifically, to video decoders.
2. Related Art
Video decoding is required when video is transmitted from a source to a receiver. The receiver needs to be prepared for the type of signal being received, and standards have been, and continue to be, developed for this purpose. One such standard is H.264, a standard of the International Telecommunication Union (ITU). This standard is potentially applicable to any video transfer and has become particularly useful in cellular phone applications where video is transmitted. Because video involves transmitting so much data, high-quality video transfers using a cellular phone can be very time consuming. Thus, there is a continuing need to increase the speed of high-quality video transfers. Received video may also be downloaded to another medium, such as a computer, or displayed on a television, where the demand for quality is even higher. The H.264 standard uses compression to improve speed. One such technique exploits spatial redundancy by identifying and transmitting the differences between adjacent portions of a frame. This helps in some ways, but the demand for higher quality continues, especially for gains that sacrifice little or no speed.
Thus, there is a need for a decoder that improves on the speed and/or quality of current decoders.
The present invention is illustrated by way of example and is not limited by the accompanying figures, in which like references indicate similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.
In one aspect, a video encoder and a decoder analyze the spatial content of video data in an H.264 stream using the discrete cosine transform (DCT). Although the DCT is computed as part of the H.264 encoding process, it is not computed as part of the decoding process. Thus, when the DCT is needed for video post-processing or enhanced video encoding, it would ordinarily be computed from the video data after the video decoder has reconstructed it. A method for accelerating the computation of the DCT at the decoder side for intra-mode macroblocks uses information that the encoder computes and transmits as part of the H.264 video stream. This is better understood by reference to the following description and the drawings.
Shown in
A macroblock in this context is a portion of a frame. In the H.264 standard specifically, a macroblock is a square block of pixels whose side length is a power of 2. For example, a macroblock may be a 4×4 portion of a frame. Frame sizes vary. One example is a 1920×1080-pixel frame, used by the 1080p standard for high-definition televisions. Another example is a 176×144 frame, known as QCIF resolution, which is commonly used in mobile phones.
Decoder 14 also comprises video stream data 34, which comprises DCT of residual MB 36 and prediction mode flag 38. Decoder 14 further comprises the following: an inverse DCT and inverse quantizer 40 that receives DCT of residual MB 36 and outputs residual MB 42; a prediction MB generator 46 that receives prediction mode flag 38 and provides prediction MB 48; and an adder that receives residual MB 42 and prediction MB 48 and provides reconstructed MB 50. One of ordinary skill in the art will recognize decoder 14 as the decoder that forms part of an encoder implementing the H.264 standard. One of ordinary skill in the art would further understand that decoder 14 could also be used to decode received video data into a form usable for display. In that case, DCT of residual MB 36 and prediction mode flag 38 would come from the incoming video stream instead of from encoder portion 12.
DCT unit 15 receives prediction MB 48, prediction mode flag 38, and DCT of residual MB 36. DCT unit 15 comprises a DCT 16 that receives prediction mode flag 38 and prediction MB 48. It also comprises a summer 18 that receives an output of DCT 16 and DCT of residual MB 36 and provides DCT of MB 19. DCT 16 comprises control circuitry 21 and DCT computation circuitry 23. Control circuitry 21 has an input for receiving prediction mode flag 38. DCT computation circuitry 23 has a first input coupled to an output of control circuitry 21, a second input for receiving prediction MB 48, and an output coupled to summer 18.
Shown in
Decoder 114 also comprises video stream data 134, which comprises DCT of residual MB 136 and prediction mode flag 138. Decoder 114 further comprises the following: an inverse DCT and inverse quantizer 140 that receives DCT of residual MB 136 and outputs residual MB 142; a prediction MB generator 146 that receives prediction mode flag 138 and provides prediction MB 148; and an adder that receives residual MB 142 and prediction MB 148 and provides reconstructed MB 150. One of ordinary skill in the art will recognize decoder 114 as the decoder that forms part of an encoder implementing the H.264 standard. One of ordinary skill in the art would further understand that decoder 114 could also be used to decode received video data into a form usable for display. In that case, DCT of residual MB 136 and prediction mode flag 138 would come from the incoming video stream instead of from encoder portion 112.
DCT unit comprises a DCT 116 that receives reconstructed MB 150 and provides a DCT of MB 119 of data. Because DCT 116 receives reconstructed MB 150, separate control circuitry, such as control circuitry 21 of
Whether provided by DCT unit 15 or by DCT 116, DCT of MB 19 and DCT of MB 119 are the same data for a given input. The DCT of MB 19, 119 has several potential beneficial uses, whether or not the decoder is part of the encoder.
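The equivalence of the two paths follows from the linearity of the transform: DCT(prediction + residual) = DCT(prediction) + DCT(residual). The sketch below checks this numerically under two assumptions. First, "DCT" is taken to mean the H.264 4×4 integer core transform, with quantization and post-scaling left out. Second, the residual's transform is available in that same unquantized domain. The sample blocks and helper names are hypothetical.

```python
# Assumption: "DCT" = H.264 4x4 forward integer core transform Y = C * M * C^T,
# with quantization and post-scaling omitted.
C = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def core_transform(m):
    """Return C * m * C^T for a 4x4 integer block m (list of 4 rows)."""
    return [
        [sum(C[i][r] * m[r][s] * C[j][s] for r in range(4) for s in range(4))
         for j in range(4)]
        for i in range(4)
    ]


def add_blocks(a, b):
    """Element-wise sum of two 4x4 blocks."""
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]


# Hypothetical sample data: a vertical-mode prediction block and a residual block.
pred = [[10, 12, 14, 16] for _ in range(4)]
resid = [[1, -2, 0, 3], [0, 1, -1, 2], [2, 0, 0, -1], [-3, 1, 2, 0]]

# Path of DCT 116: transform the reconstructed block (prediction + residual).
via_dct_116 = core_transform(add_blocks(pred, resid))
# Path of DCT unit 15: transform the prediction, then add the residual's transform.
via_dct_unit_15 = add_blocks(core_transform(pred), core_transform(resid))

assert via_dct_116 == via_dct_unit_15
```

The practical difference between the paths is timing. The DCT unit 15 path only needs the prediction and the residual coefficients. The DCT 116 path must wait for the reconstructed macroblock.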
To encode a macroblock 20/120 into an H.264 stream, an H.264 encoder, such as an encoder portion 12/112, takes a macroblock MB 20/120 of video data and computes the quantities as shown in
To compute the DCT of this macroblock at the decoder side, the DCT of the reconstructed macroblock 150 can be directly computed as shown in
Intra prediction modes are modes defined in H.264. Each intra prediction mode specifies how the codec should incorporate data from neighboring, previously decoded macroblocks 28/128 to form the prediction. Here, the codec means an encoder (which typically includes a decoder) plus a decoder plus the other features involved in encoding and decoding. For example, the vertical prediction mode selects the bottom row of the macroblock directly above current macroblock 20/120 and replicates this row through all rows of the prediction. Each of the other prediction modes filters and replicates neighboring pixels. In general, these filtered pixels are replicated to multiple positions in the current prediction macroblock 32/132. Computing the DCT analytically allows this structure and the resulting redundant data to be exploited.
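Replication is easiest to see in the three simplest 4×4 modes. The sketch below builds the vertical, horizontal, and DC prediction blocks from neighboring pixels. It assumes that all neighbors are available. The standard's availability rules and the other six intra 4×4 modes are omitted, and the function names are hypothetical.

```python
def predict_vertical(top):
    """Intra_4x4_Vertical: copy the 4 pixels above the block into every row."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Intra_4x4_Horizontal: copy each left-neighbor pixel across its row."""
    return [[left[i]] * 4 for i in range(4)]


def predict_dc(top, left):
    """Intra_4x4_DC with both neighbors available: rounded mean of the 8 neighbors."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


top = [100, 102, 104, 106]   # hypothetical pixels above the block
left = [90, 95, 100, 105]    # hypothetical pixels to the left of the block
print(predict_vertical(top)[3])     # [100, 102, 104, 106]
print(predict_horizontal(left)[1])  # [95, 95, 95, 95]
print(predict_dc(top, left)[2][2])  # (412 + 390 + 4) >> 3 = 100
```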
The forward DCT for H.264 can be computed by applying the transformation matrices Ti to the 4×4 macroblock M in the following way:
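The transformation matrices are not reproduced in this text. Assume that the "DCT" is the standard H.264 4×4 integer core transform, with its post-scaling folded into quantization, and write the matrices $T_i$ as $C_f$ and its transpose (an assumed notation). One form of the computation, consistent with the coefficient expressions given for the vertical and horizontal modes, is:

$$
\mathrm{DCT}\{M\} = C_f \, M \, C_f^{T},
\qquad
C_f =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
$$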
In general, computing this transform in the manner of DCT 116 takes 64 additions/subtractions and 16 scalings (implementable as binary shifts when scaling by a power of 2). A DCT 116 that meets these criteria may be referred to as a fast DCT.
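The sketch below shows where the 64 additive operations and 16 scalings come from, assuming the H.264 integer core transform. Each 4-point row or column transform is a butterfly that costs 8 additions/subtractions and 2 shifts. A 4×4 block needs 4 row passes and 4 column passes, giving 8 × 8 = 64 additive operations and 8 × 2 = 16 shifts. The function names and the operation counter are illustrative only.

```python
def fast_1d(x, counter):
    """4-point forward core transform as a butterfly: 8 add/sub, 2 shifts."""
    a = x[0] + x[3]
    b = x[1] + x[2]
    c = x[0] - x[3]
    d = x[1] - x[2]
    counter["add"] += 4
    y0 = a + b
    y2 = a - b
    y1 = (c << 1) + d        # 2c + d
    y3 = c - (d << 1)        # c - 2d
    counter["add"] += 4
    counter["shift"] += 2
    return [y0, y1, y2, y3]


def fast_2d(m):
    """C * m * C^T via row butterflies, then column butterflies; returns (Y, op counts)."""
    counter = {"add": 0, "shift": 0}
    rows = [fast_1d(row, counter) for row in m]                      # rows of m * C^T
    cols = [fast_1d([rows[i][j] for i in range(4)], counter) for j in range(4)]
    y = [[cols[j][i] for j in range(4)] for i in range(4)]           # cols[j] is column j of Y
    return y, counter
```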
In the DC prediction mode (Intra_4x4_DC), all values of prediction macroblock 48 are set to the average of the available pixels neighboring the macroblock.
In this equation, $p_i$ represents the pixels within a previous macroblock selected by prediction mode select 130. The DCT of this prediction matrix, as performed by DCT 16 (more particularly, by DCT computation circuitry 23), is:
$$
\mathrm{DCT}\{M_{\text{pred}}\}[i,j] =
\begin{cases}
k_0, & i = 0,\ j = 0 \\
0, & \text{otherwise}
\end{cases}
$$
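A constant block has only one nonzero coefficient because rows 1 to 3 of the core transform matrix each sum to zero. The sketch below checks the shortcut against the full transform, assuming the unscaled H.264 core transform. Under that assumption, the single coefficient equals 16 times the DC prediction value. How this factor relates to $k_0$ depends on the DC equation, which is not reproduced in this text. The function names are hypothetical.

```python
C = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def core_transform(m):
    """Reference forward core transform C * m * C^T for a 4x4 integer block."""
    return [[sum(C[i][r] * m[r][s] * C[j][s] for r in range(4) for s in range(4))
             for j in range(4)] for i in range(4)]


def dct_of_dc_prediction(dc):
    """Shortcut for a constant block: only coefficient [0][0] is nonzero.

    Rows 1-3 of C sum to zero, so C * (dc * ones) * C^T keeps only
    (sum of row 0)^2 * dc = 16 * dc at position [0][0].
    """
    y = [[0] * 4 for _ in range(4)]
    y[0][0] = dc << 4    # 16 * dc as a single shift
    return y


dc = 100
assert dct_of_dc_prediction(dc) == core_transform([[dc] * 4 for _ in range(4)])
```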
For the vertical prediction mode (Intra_4x4_Vertical), prediction macroblock 48 defined by the H.264 standard has the form:
$$
M_{\text{pred}} =
\begin{bmatrix}
k_0 & k_1 & k_2 & k_3 \\
k_0 & k_1 & k_2 & k_3 \\
k_0 & k_1 & k_2 & k_3 \\
k_0 & k_1 & k_2 & k_3
\end{bmatrix}
$$
where $k_i$ are the pixel values of the bottom row of the macroblock vertically above this macroblock. This prediction macroblock can contain only horizontal frequency components. Explicit computation shows its DCT to be:
$$
\begin{aligned}
\mathrm{DCT}\{M_{\text{pred}}\}[0,0] &= 4[(k_0+k_3)+(k_1+k_2)] \\
\mathrm{DCT}\{M_{\text{pred}}\}[0,1] &= 8(k_0-k_3)+4(k_1-k_2) \\
\mathrm{DCT}\{M_{\text{pred}}\}[0,2] &= 4[(k_0+k_3)-(k_1+k_2)] \\
\mathrm{DCT}\{M_{\text{pred}}\}[0,3] &= 4(k_0-k_3)-8(k_1-k_2) \\
\mathrm{DCT}\{M_{\text{pred}}\}[i,j] &= 0, \quad i \neq 0
\end{aligned}
$$
Thus, the computation reduces to the complexity of a 1-D DCT. By factoring appropriately and reusing terms, the number of required additive operations can be reduced to eight and the number of scalings to six. An additive operation is defined as an addition or a subtraction.
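The factoring for the vertical mode reuses the sums $k_0+k_3$ and $k_1+k_2$ and the differences $k_0-k_3$ and $k_1-k_2$. The sketch below is one way to realize the eight-addition, six-shift count. It assumes the unscaled H.264 core transform, and the function name is hypothetical.

```python
def dct_vertical_prediction(k):
    """DCT{Mpred} for Intra_4x4_Vertical, where every row of Mpred is k[0..3].

    Only row 0 is nonzero. Cost: 8 additive operations and 6 shifts.
    """
    s03, s12 = k[0] + k[3], k[1] + k[2]   # 2 additions
    d03, d12 = k[0] - k[3], k[1] - k[2]   # 2 subtractions
    row0 = [
        (s03 + s12) << 2,                 # 4[(k0+k3)+(k1+k2)]
        ((d03 << 1) + d12) << 2,          # 8(k0-k3)+4(k1-k2)
        (s03 - s12) << 2,                 # 4[(k0+k3)-(k1+k2)]
        (d03 - (d12 << 1)) << 2,          # 4(k0-k3)-8(k1-k2)
    ]
    return [row0] + [[0] * 4 for _ in range(3)]


print(dct_vertical_prediction([3, 7, -2, 5])[0])  # [52, 20, 12, -80]
```

The same four intermediate terms feed all four outputs, which is why the cost matches a single 1-D transform rather than a full 2-D transform.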
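For the horizontal prediction mode (Intra_4x4_Horizontal), the DCT of prediction macroblock 48 computed by DCT 16 is similarly simple, except that the nonzero terms occur at vertical spatial frequencies. Prediction macroblock 48 has the form
$$
M_{\text{pred}} =
\begin{bmatrix}
k_0 & k_0 & k_0 & k_0 \\
k_1 & k_1 & k_1 & k_1 \\
k_2 & k_2 & k_2 & k_2 \\
k_3 & k_3 & k_3 & k_3
\end{bmatrix}
$$
where $k_i$ are the pixel values of the rightmost column of the macroblock to the left of this macroblock. Its DCT is:
$$
\begin{aligned}
\mathrm{DCT}\{M_{\text{pred}}\}[0,0] &= 4[(k_0+k_3)+(k_1+k_2)] \\
\mathrm{DCT}\{M_{\text{pred}}\}[1,0] &= 8(k_0-k_3)+4(k_1-k_2) \\
\mathrm{DCT}\{M_{\text{pred}}\}[2,0] &= 4[(k_0+k_3)-(k_1+k_2)] \\
\mathrm{DCT}\{M_{\text{pred}}\}[3,0] &= 4(k_0-k_3)-8(k_1-k_2) \\
\mathrm{DCT}\{M_{\text{pred}}\}[i,j] &= 0, \quad j \neq 0
\end{aligned}
$$
The computational complexity of this case is the same as that of the Intra_4x4_Vertical case.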
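The horizontal case is the transpose of the vertical case: the same four butterfly terms are computed, but the results land in column 0 instead of row 0. A minimal sketch, assuming the unscaled H.264 core transform and a hypothetical function name:

```python
def dct_horizontal_prediction(k):
    """DCT{Mpred} for Intra_4x4_Horizontal, where row i of Mpred is k[i] repeated.

    Only column 0 is nonzero. Cost: 8 additive operations and 6 shifts.
    """
    s03, s12 = k[0] + k[3], k[1] + k[2]   # 2 additions
    d03, d12 = k[0] - k[3], k[1] - k[2]   # 2 subtractions
    col0 = [
        (s03 + s12) << 2,                 # 4[(k0+k3)+(k1+k2)]
        ((d03 << 1) + d12) << 2,          # 8(k0-k3)+4(k1-k2)
        (s03 - s12) << 2,                 # 4[(k0+k3)-(k1+k2)]
        (d03 - (d12 << 1)) << 2,          # 4(k0-k3)-8(k1-k2)
    ]
    return [[col0[i], 0, 0, 0] for i in range(4)]
```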
In the diagonal down left prediction mode (Intra_4x4_Diagonal_Down_Left), the predicted macroblock has the following structure:
Computing the DCT of this structure requires 33 additive operations and 11 scalings by a power of two.
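The structure referred to above is not reproduced in this text. To illustrate the redundancy it contains, the sketch below builds the Intra_4x4_Diagonal_Down_Left prediction using the H.264 filtering rule as we understand it. It takes the pixels p[0..7] above and above-right of the block and assumes all of them are available. Every anti-diagonal (constant x + y) holds a single value, so only seven distinct positions carry information. The sketch does not reproduce the 33-operation DCT factorization, and the function name is hypothetical.

```python
def predict_diag_down_left(p):
    """Intra_4x4_Diagonal_Down_Left prediction (sketch), returned as pred[y][x].

    p[0..3] are the pixels above the block, p[4..7] the pixels above-right.
    """
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x == 3 and y == 3:
                # Bottom-right position uses a shortened filter at the end of p.
                pred[y][x] = (p[6] + 3 * p[7] + 2) >> 2
            else:
                # [1, 2, 1] / 4 filter centred on p[x + y + 1].
                n = x + y
                pred[y][x] = (p[n] + 2 * p[n + 1] + p[n + 2] + 2) >> 2
    return pred


pred = predict_diag_down_left([0, 10, 20, 30, 40, 50, 60, 70])  # hypothetical neighbors
print(pred[0][0], pred[1][2], pred[3][3])  # 10 40 68
```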
In the diagonal down right prediction mode (Intra_4x4_Diagonal_Down_Right), the predicted macroblock has the following structure:
Computing the DCT of this structure likewise requires 33 additive operations and 11 scalings by a power of two.
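The diagonal down right structure is likewise not reproduced in this text. The sketch below builds the Intra_4x4_Diagonal_Down_Right prediction from the top row, the left column, and the top-left corner pixel, following the H.264 filtering rule as we understand it. The result is constant along each diagonal (constant x − y), which is the redundancy the simplified DCT exploits. All neighbors are assumed available, and the function name is hypothetical.

```python
def predict_diag_down_right(top, left, corner):
    """Intra_4x4_Diagonal_Down_Right prediction (sketch), returned as pred[y][x].

    top[0..3] lie above the block, left[0..3] to its left, corner above-left.
    """
    def t(i):  # pixel above the block at column i; index -1 is the corner
        return corner if i < 0 else top[i]

    def lft(i):  # pixel left of the block at row i; index -1 is the corner
        return corner if i < 0 else left[i]

    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x > y:      # above the main diagonal: filter along the top row
                pred[y][x] = (t(x - y - 2) + 2 * t(x - y - 1) + t(x - y) + 2) >> 2
            elif x < y:    # below the main diagonal: filter along the left column
                pred[y][x] = (lft(y - x - 2) + 2 * lft(y - x - 1) + lft(y - x) + 2) >> 2
            else:          # main diagonal: filter centred on the corner pixel
                pred[y][x] = (top[0] + 2 * corner + left[0] + 2) >> 2
    return pred


pred = predict_diag_down_right([10, 20, 30, 40], [50, 60, 70, 80], 0)  # hypothetical
print(pred[0][0], pred[0][1], pred[1][0])  # 15 10 40
```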
Thus, the DCT of the prediction macroblock can be simplified for five prediction modes. The same approach can also be used to simplify the DCT calculation of prediction macroblocks for other prediction modes.
The precise time required to perform this computation depends on the technology used to implement it (e.g., in hardware as RTL, or in software on a given processor architecture). The above measures of operational complexity are intended to show the relative complexity of computing the DCT of the prediction macroblock for each prediction mode. They are largely independent of architecture and technology.
From [1], DCT 116 requires 64 additive operations and 16 scalings, which can be implemented as shifts.
The above table shows how many operations DCT 16 requires to compute the DCT of prediction macroblock 48. That is, DCT 16 computes only the DCT of the macroblock generated by prediction MB generator 46. Because the goal is the DCT of prediction macroblock 48 plus the residual, the DCT of residual macroblock 42 must be added to the DCT of prediction macroblock 48. This requires 16 more additions, one per coefficient. Therefore, the total operation count would be
In the last four cases in the table (vertical right, vertical left, horizontal up, and horizontal down), the total number of operations required is greater than the number required by DCT 116. Therefore, for a macroblock whose prediction mode is one of these modes, DCT 116 would be preferable in terms of operation count, at the expense of concurrency.
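The trade-off can be expressed as a per-macroblock choice between the two paths using only the additive-operation counts stated in the text: 8 for the vertical and horizontal modes, 33 for the two diagonal modes, 16 for the residual addition, and 64 for DCT 116. Intra_4x4_DC is left out because its count appears only in the table, which is not reproduced here. In this sketch the DC mode falls back to DCT 116, which is correct but may not be the cheapest path. The table, function, and return labels are hypothetical.

```python
# Additive-operation counts for DCT 16 (prediction macroblock only), as stated in the text.
PRED_DCT_ADDS = {
    "Intra_4x4_Vertical": 8,
    "Intra_4x4_Horizontal": 8,
    "Intra_4x4_Diagonal_Down_Left": 33,
    "Intra_4x4_Diagonal_Down_Right": 33,
}
RESIDUAL_ADDS = 16   # summer 18: one addition per coefficient
FULL_DCT_ADDS = 64   # DCT 116 applied to the reconstructed macroblock


def choose_dct_path(mode):
    """Return (path, additive ops) for the cheaper path by additive-op count.

    Modes without a stated count, including the vertical right/left and
    horizontal up/down modes that the text says exceed DCT 116, use DCT 116.
    """
    adds = PRED_DCT_ADDS.get(mode)
    if adds is not None and adds + RESIDUAL_ADDS < FULL_DCT_ADDS:
        return ("DCT 16", adds + RESIDUAL_ADDS)
    return ("DCT 116", FULL_DCT_ADDS)


print(choose_dct_path("Intra_4x4_Vertical"))        # ('DCT 16', 24)
print(choose_dct_path("Intra_4x4_Horizontal_Up"))   # ('DCT 116', 64)
```

Counting only additive operations is a simplification. A real selector might also weigh the scaling counts and the concurrency that DCT 16 offers.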
Shown in
Although the invention is described herein with reference to specific embodiments, various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. For example, although a single line was drawn serially through the cores from the group controller, this may be achieved with multiple lines or different lines from the group controller. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. Any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element of any or all the claims.
The term “coupled,” as used herein, is not intended to be limited to a direct coupling or a mechanical coupling.
Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an.” The same holds true for the use of definite articles.
Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.
Number | Date | Country | |
---|---|---|---|
20100239006 A1 | Sep 2010 | US |