The JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) standards were developed in response to the need for storage and distribution of images and video in digital form. JPEG is one of the primary image-coding formats for still images, while MPEG is one of the primary image-coding formats for motion pictures or video. The MPEG standard includes many variants, such as MPEG-1, MPEG-2, and Advanced Video Coding (AVC). Video Compact Discs (VCD) store video and audio content coded and formatted in accordance with MPEG-1 because the maximum bit rate for VCDs is 1.5 Mbps. The MPEG-1 video stream content on VCDs usually has a bit rate of 1.15 Mbps. MPEG-2 is the choice for distributing high-quality video and audio over cable and satellite, where it can be decoded by digital set-top boxes. Digital versatile discs also use MPEG-2.
Both JPEG and MPEG use the discrete cosine transform (DCT) for image compression. The encoder divides images into 8×8 square blocks of pixels. The 8×8 square blocks of pixels are the basic blocks on which the DCT is applied. DV uses 8×8 and 4×8 block transform types. The DCT separates out the high frequency and low frequency parts of the signal and transforms the input spatial domain signal into the frequency domain.
Low frequency components contain information to reconstruct the block to a certain level of accuracy whereas the high frequency components increase this accuracy. The size of the original 8×8 block is small enough to ensure that most of the pixels will have relatively similar values and therefore, on an average, the high frequency components have either zero or very small values.
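The behavior described above can be illustrated with a minimal 2-D DCT over an 8×8 block. This is a sketch for illustration only; the standards mandate specific fixed-point precision for a conforming implementation. A flat block, where all pixels are similar, concentrates its energy in the DC (lowest-frequency) coefficient, leaving the high frequency coefficients at or near zero:

```python
import math

N = 8

def dct_2d(block):
    """Type-II 2-D DCT of an 8x8 block of pixel values."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# A perfectly flat block: all energy lands in the (0, 0) DC coefficient,
# and every higher-frequency coefficient is (numerically) zero.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```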
The human visual system is much more sensitive to low frequency components than to high frequency components. Therefore, the high frequency components can be represented with less accuracy and fewer bits, without much noticeable quality degradation. Accordingly, a quantizer quantizes the 8×8 matrix of frequency coefficients, applying larger and hence coarser quantization steps to the high frequency components. The quantized matrix generally contains non-zero values in mostly lower frequency coefficients. Thus the encoding process for the basic 8×8 block aims to zero out most of the coefficients in the matrix prior to run-level coding, so that maximum compression is achieved. Different types of scanning are used so that the low frequency components are grouped together.
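Frequency-dependent quantization can be sketched as follows. The step matrix here is purely illustrative (step size grows with horizontal plus vertical frequency); it is not a quantization table from any standard:

```python
def quantize(coeffs, steps):
    """Divide each frequency coefficient by its quantization step and round."""
    return [[round(coeffs[u][v] / steps[u][v]) for v in range(8)]
            for u in range(8)]

# Illustrative step matrix: coarser steps at higher frequencies, so small
# high-frequency coefficients are quantized to zero.
steps = [[8 * (1 + u + v) for v in range(8)] for u in range(8)]
```

With this step matrix, a large DC coefficient survives quantization while small high-frequency coefficients collapse to zero, which is exactly what makes the subsequent run-level coding effective.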
The scanning scheme varies depending on the compression standard that is used. For example, MPEG-2 uses one type of scanning for progressive pictures and another scanning for interlaced pictures. MPEG-4 uses three types of scanning schemes. Other standards, such as DV-25, may use another type of scanning.
After the scan, the matrix is represented efficiently using run-length coding with Huffman Variable Length Codes (VLC). Each run-level VLC specifies the number of zeroes preceding a non-zero frequency coefficient. The “run” value indicates the number of zeroes and the “level” value is the magnitude of the non-zero frequency coefficient following the zeroes. After all non-zero coefficients are exhausted, an end-of-block (EOB) is transmitted in the bit-stream.
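The run-level pairing described above can be sketched as follows. Mapping each (run, level) pair to its Huffman VLC bit pattern is standard-specific and omitted here:

```python
EOB = ("EOB",)

def run_level_encode(scanned):
    """Convert a serial list of quantized coefficients into (run, level)
    pairs: 'run' counts the zeroes preceding a non-zero coefficient,
    'level' is that coefficient's value."""
    pairs, run = [], 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    pairs.append(EOB)  # trailing zeroes are implied by the end-of-block
    return pairs
```

For example, `run_level_encode([5, 0, 0, -3, 0, 1, 0, 0, 0])` yields `[(0, 5), (2, -3), (1, 1), EOB]`; the three trailing zeroes are never transmitted.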
Operations at the decoder happen in the opposite order. The decoder decodes the Huffman symbols first, followed by inverse scanning, inverse quantization, and the IDCT. An inverse scanner reverses the scanning. However, the content received by the decoder can be scanned according to one of several different scanning schemes.
Each additional scanning scheme could be supported by an additional parallel inverse scanner. However, this would add considerable hardware or firmware to the decoder.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
Presented herein is a unified architecture for inverse scanning according to a plurality of scanning schemes.
In one embodiment, there is presented a method for decoding video data. The method comprises receiving frequency coefficients; determining a scanning scheme associated with the frequency coefficients; receiving scaling factors associated with the frequency coefficients; ordering the scaling factors according to a first scanning scheme, wherein the scanning scheme associated with the frequency coefficients is the first scanning scheme; and ordering the scaling factors according to a second scanning scheme, wherein the scanning scheme associated with the frequency coefficients is the second scanning scheme.
In another embodiment, there is presented a circuit for decoding video data. The circuit comprises a processor and a memory. The memory is connected to the processor, and stores a plurality of instructions executable by the processor. The plurality of instructions are for receiving frequency coefficients; determining a scanning scheme associated with the frequency coefficients; receiving scaling factors associated with the frequency coefficients; ordering the scaling factors according to a first scanning scheme, wherein the scanning scheme associated with the frequency coefficients is the first scanning scheme; and ordering the scaling factors according to a second scanning scheme, wherein the scanning scheme associated with the frequency coefficients is the second scanning scheme.
In another embodiment, there is presented a decoder for decoding video data. The decoder comprises a VLC decoder and a circuit. The VLC decoder provides frequency coefficients. The circuit determines a scanning scheme associated with the frequency coefficients; receives scaling factors associated with the frequency coefficients; orders the scaling factors according to a first scanning scheme, wherein the scanning scheme associated with the frequency coefficients is the first scanning scheme; and orders the scaling factors according to a second scanning scheme, wherein the scanning scheme associated with the frequency coefficients is the second scanning scheme.
These and other advantages and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Referring now to
The pictures 310 can be considered as snapshots in time of moving objects. With pictures 310 occurring closely in time, it is possible to represent the content of one picture 310 based on the content of another picture 310, and information regarding the motion of the objects between the pictures 310.
Accordingly, segments 320 of one picture 310 (a predicted frame) are predicted by searching a reference frame 310 and selecting the segment 320 in the reference frame most similar to the segment 320 in the predicted frame. A motion vector indicates the spatial displacement between the segment 320 in the predicted frame (predicted segment) and the segment 320 in the reference frame (reference segment). The difference between the pixels in the predicted segment 320 and the pixels in the reference segment 320 is represented by an 8×8 matrix known as the prediction error 322. The predicted segment 320 can be represented by the prediction error 322 and the motion vector.
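The similarity search and the prediction error can be sketched minimally as below, using sum of absolute differences (SAD) as the similarity measure. SAD is a common encoder choice but is an assumption here; the text above does not fix a particular metric:

```python
def sad(a, b):
    """Sum of absolute differences between two same-sized pixel blocks;
    the reference segment minimizing this is the best match."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def prediction_error(predicted, reference):
    """Per-pixel difference matrix between the predicted segment and the
    selected reference segment."""
    return [[pp - rp for pp, rp in zip(pr, rr)]
            for pr, rr in zip(predicted, reference)]
```

Because a well-matched reference segment makes the differences small, the prediction error matrix tends to have low, consistent magnitudes, which is what makes it cheap to transform and quantize.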
In MPEG-2, the frames 310 can be represented based on the content of a previous frame 310, based on the content of a previous frame and a future frame, or not based on the content of another frame. In the case of segments 320 in frames not predicted from other frames, the pixels from the segment 320 are transformed to the frequency domain using DCT, thereby resulting in a DCT matrix 324. For predicted segments 320, the prediction error matrix is converted to the frequency domain using DCT, thereby resulting in a DCT matrix 324.
The segment 320 is small enough so that most of the pixels are similar, thereby resulting in high frequency coefficients of smaller magnitude than low frequency components. In a predicted segment 320, the prediction error matrix is likely to have low and fairly consistent magnitudes. Accordingly, the higher frequency coefficients are also likely to be small or zero. Therefore, high frequency components can be represented with less accuracy and fewer bits without noticeable quality degradation.
The coefficients of the DCT matrix 324 are quantized, using a higher number of bits to encode the lower frequency coefficients 324 and fewer bits to encode the higher frequency coefficients 324. The fewer bits for encoding the higher frequency coefficients 324 cause many of the higher frequency coefficients 324 to be encoded as zero. The foregoing results in a quantized matrix 325 and a set of scale factors.
As noted above, the higher frequency coefficients in the quantized matrix 325 are more likely to contain zero values. In the quantized matrix 325, the lower frequency coefficients are concentrated towards the upper left, while the higher frequency coefficients are concentrated towards the lower right. In order to concentrate the non-zero frequency coefficients, the quantized frequency coefficients 325 are scanned according to a scanning scheme, thereby forming a serial scanned data structure 330.
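One common scanning order, the zig-zag scan, can be generated as a sketch like this: the block is traversed along anti-diagonals from the top-left, alternating direction, so that low-frequency positions come first in the serial data structure:

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag
    scan order, lowest frequencies first."""
    order = []
    for d in range(2 * n - 1):  # each anti-diagonal of the block
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(cells if d % 2 else reversed(cells))
    return order

def scan(block, order):
    """Serialize a 2-D block into a list following the given scan order."""
    return [block[r][c] for r, c in order]
```

Other scanning schemes, such as the alternate scan used for interlaced pictures, differ only in the position list; the serialization step itself is identical.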
The serial scanned data structure 330 is encoded using variable length coding, thereby resulting in blocks 335. The VLC specifies the number of zeroes preceding a non-zero frequency coefficient. A “run” value indicates the number of zeroes and a “level” value is the magnitude of the nonzero frequency component following the zeroes. After all non-zero coefficients are exhausted, an end-of-block signal (EOB) indicates the end of the block 335.
Referring now to
The positions in the matrices indicate increments in the horizontal and vertical frequency components, wherein left and top correspond to the lowest frequency components. The numbers in the matrices indicate the scanning order for the frequency coefficient thereat.
Continuing to
Blocks 335 representing a frame are grouped into different slice groups 340. In MPEG-1, MPEG-2, and MPEG-4, each slice group 340 contains contiguous blocks 335. The slice group 340 includes the macroblocks representing each block 335 in the slice group 340, as well as additional parameters describing the slice group. The slice groups 340 forming the frame together form the data portion of a picture structure 345. The picture 345 includes the slice groups 340 as well as additional parameters. The pictures are then grouped together as a group of pictures 350. Generally, a group of pictures includes pictures representing reference frames (reference pictures) and predicted frames (predicted pictures), wherein all of the predicted pictures can be predicted from the reference pictures and other predicted pictures in the group of pictures 350. The group of pictures 350 also includes additional parameters. Groups of pictures are then stored, forming what is known as a video elementary stream 355.
The video elementary stream 355 is then packetized to form a packetized elementary stream 360. Each packet is then associated with a transport header 365a, forming what are known as transport packets 365b.
Referring now to
Referring now to
An inverse quantizer/inverse scanner (IQ/IZ) 520 provides dequantized frequency coefficients, associated with the appropriate frequencies, to the IDCT function 530. As noted, the frequency coefficients can be scanned according to any one of a number of different scanning schemes. The particular scanning scheme used can be determined based on the type of picture and type of compression used. For example, if the compression standard is MPEG-2 and the pictures are progressive, then the scanning scheme used is scanning scheme 205. If the compression standard is DV-25, then the scanning scheme used is scanning scheme 210. If the compression standard is MPEG-2 and the pictures are interlaced, then the scanning scheme used is scanning scheme 210.
Accordingly, depending on the particular scanning scheme 205 or 210, the IQ/IZ 520 creates a data structure with the scale factors. Each of the scale factors is associated with a particular one of the quantized frequency coefficients. In the data structure created by the IQ/IZ 520, the scale factors for the quantized frequency coefficients are ordered according to the scanning scheme used for scanning the frequency coefficients. The frequency coefficients are then multiplied by the data structure in dot-product fashion.
For example, where the quantized frequency coefficients are B00, B01, . . . , B07, B10, B11, . . . , B17, . . . B70, B71, . . . , B77, the scale factors are S00, S01, . . . , S07, S10, S11, . . . , S17, . . . S70, S71, . . . , S77, and scanning scheme 205 is used, the quantized frequency coefficients are received in the following order (top, left is first/bottom, right is last):
Accordingly, the IQ/IZ 520 orders the scale factors as:
The quantized frequency coefficients are then multiplied by the scale factors in dot-product fashion, resulting in:
In another example, where the quantized frequency coefficients are B00, B01, . . . , B07, B10, B11, . . . , B17, . . . B70, B71, . . . , B77, the scale factors are S00, S01, . . . , S07, S10, S11, . . . , S17, . . . S70, S71, . . . , S77, and scanning scheme 210 is used, the quantized frequency coefficients are received in the following order (top, left is first/bottom, right is last):
Accordingly, the IQ/IZ 520 orders the scale factors as:
The quantized frequency coefficients are then multiplied by the scale factors in dot-product fashion resulting in:
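The order-and-multiply step described above can be sketched as follows: rather than inverse-scanning the coefficients themselves, the scale factors are reordered into the same scan order as the incoming coefficients, and the two streams are multiplied element by element. The function names are illustrative, not from any standard:

```python
def reorder_scale_factors(scale, order):
    """Order the scale-factor matrix according to the given scan order,
    so each scale factor lines up with its quantized coefficient."""
    return [scale[r][c] for r, c in order]

def dequantize_scanned(coeffs_in_scan_order, scale_in_scan_order):
    """Element-wise ('dot-product fashion') multiplication of the quantized
    coefficients and the scale factors, both in the same scan order."""
    return [b * s for b, s in zip(coeffs_in_scan_order, scale_in_scan_order)]
```

The appeal of this arrangement is that a single multiplier datapath serves every scanning scheme; only the ordering of the scale factors changes between scheme 205 and scheme 210.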
The foregoing results in dequantized frequency coefficients. The dequantized frequency coefficients are then provided to the IDCT function 530. Where the block decoded corresponds to a reference frame, the output of the IDCT is the pixels forming a segment 320 of the frame. The IDCT provides the pixels in a reference frame 310 to a reference frame buffer 540. The reference frame buffer combines the decoded blocks 535 to reconstruct a frame 310. The frames stored in the frame buffer 540 are provided to the display engine.
Where the block 335 decoded corresponds to a predicted frame 310, the output of the IDCT is the prediction error with respect to a segment 320 in a reference frame(s) 310. The IDCT provides the prediction error to the motion compensation stage 550. The motion compensation stage 550 also receives the motion vector(s) from the parameter decoder 516. The motion compensation stage 550 uses the motion vector(s) to select the appropriate segments 320 from the reference frames 310 stored in the reference frame buffer 540. The segments 320 from the reference picture(s), offset by the prediction error, yield the pixel content associated with the predicted segment 320. Accordingly, the motion compensation stage 550 offsets the segments 320 from the reference frame(s) with the prediction error, and outputs the pixels associated with the predicted segment 320. The motion compensation stage 550 provides the pixels from the predicted block to another frame buffer 540. Additionally, some predicted frames are reference frames for other predicted frames. In the case where the block is associated with a predicted frame that is a reference frame for other predicted frames, the decoded block is stored in a reference frame buffer 540.
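The motion-compensated reconstruction step can be sketched as below. This is a simplification under assumed whole-pixel motion vectors; sub-pixel interpolation, bidirectional prediction, and clipping to the valid pixel range are omitted:

```python
def motion_compensate(reference_frame, mv, top, left, error, n=8):
    """Reconstruct a predicted segment: fetch the reference segment
    displaced by the motion vector (dy, dx), then offset each pixel
    by the decoded prediction error."""
    dy, dx = mv
    return [[reference_frame[top + dy + r][left + dx + c] + error[r][c]
             for c in range(n)]
            for r in range(n)]
```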
The embodiments described herein may be implemented as a board-level product, as a single chip, as an application specific integrated circuit (ASIC), or with varying levels of the decoder system integrated with other portions of the system as separate components. The degree of integration of the decoder system will primarily be determined by speed and cost considerations. Because of the sophisticated nature of modern processors, it is possible to utilize a commercially available processor, which may be implemented external to an ASIC implementation. Alternatively, if the processor is available as an ASIC core or logic block, then the commercially available processor can be implemented as part of an ASIC device, wherein certain functions can be implemented in firmware.
While the invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. Therefore, it is intended that the invention not be limited to the particular embodiment(s) disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.