The present patent application relates to the field of video decoders and in particular to video decoders having a simplified memory access profile.
In the field of digital video the most prevalent picture coding types are: I-pictures (intra-coded pictures) which are coded without reference to any other pictures and are often referred to as reference or anchor frames; P-pictures (predictive-coded pictures) which are coded using motion-compensated prediction from the past I- or P-reference picture, and may also be considered reference or anchor frames; and B-pictures (bidirectionally predictive-coded pictures) which are coded using motion compensation from a previous (backward) and a future (forward) I- or P-picture. These picture types will sometimes also be referred to as I, P or B frames.
A compression standard referred to as MPEG (Moving Picture Experts Group) compression is a set of methods for compression and decompression of full motion video images which uses the frame compression techniques described above. MPEG compression uses both motion compensation and discrete cosine transform (DCT) processes, among others, and can yield very high compression ratios. For better understanding of this compression standard, reference is made to “Digital Video: An Introduction to MPEG-2” by Barry G. Haskell, Atul Puri, and Arun N. Netravali, published by Chapman & Hall in 1997.
Presently most video decoders, such as MPEG-2 decoders, use external memory to create video frames from P-pictures and B-pictures by vector-controlled prediction from previously stored reference frames. This external memory is most likely DRAM-based because such devices represent the mainstream market of stand-alone memory devices. DRAM-based memories provide a burst-access mode to obtain a high bandwidth performance. This means that a number of consecutive data words (a burst) are transferred to or from memory by giving only a single read or write command. To exploit the available data bandwidth, the read and write accesses have to be burst-oriented. DRAM-based memories tend to have efficient memory transfers for large-size bursts only.
A first disadvantage is that vector-controlled prediction requires randomly positioned, block-based access to one or more reference frames in memory. The efficiency of such access to and from DRAM-based memories is rather low. A second disadvantage is that the memory access bandwidth required for reconstructing vector predicted frames varies dynamically with the video content.
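The low efficiency of randomly positioned block access can be made concrete with a small sketch. It is illustrative only and rests on assumptions that are not in the text: a 64-byte burst, a linear raster layout in which every block row needs its own burst(s), and a 17-byte-wide by 17-row luma fetch for one macro block. The function names are hypothetical.

```python
def bursts_touched(x_offset, width_bytes, burst_bytes):
    """Number of burst-aligned segments covered by one block row."""
    first = x_offset // burst_bytes
    last = (x_offset + width_bytes - 1) // burst_bytes
    return last - first + 1


def fetch_efficiency(x_offset, width_bytes, rows, burst_bytes):
    """Useful bytes divided by bytes actually transferred for one block."""
    per_row = bursts_touched(x_offset, width_bytes, burst_bytes) * burst_bytes
    return (width_bytes * rows) / (per_row * rows)


# Assumed figures: 64-byte bursts, 17x17 block (not taken from the text).
aligned = fetch_efficiency(x_offset=0, width_bytes=17, rows=17, burst_bytes=64)
straddling = fetch_efficiency(x_offset=60, width_bytes=17, rows=17, burst_bytes=64)
print(aligned, straddling)
```

Under these assumptions less than a third of the transferred data is used, and the fraction changes with the motion vector, which mirrors both disadvantages named above.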
Although many digital systems use MPEG-2 as compression standard, there is a market differentiation between so-called main-level and high-level systems. Not only the encoder but also the decoder implementations for the two kinds of system are quite different. The difference in processing speed and memory requirement is a factor of five to six. Another fast-emerging market differentiation is between systems (on chip) that can perform single high-level decoding and those that can perform dual high-level decoding. In the case of dual high-level MPEG-2 decoding, one or more state of the art MPEG-2 decoders will claim considerable system resources, in particular memory bandwidth to external memory and memory footprint for reference frame storage.
Due to improvements in mainstream CMOS performance, the high decoding speed has not resulted in a six times larger decoding block for high-level systems. Unfortunately, the memory requirements scale linearly in both access bandwidth and capacity thus having a bigger impact on the decoder architecture. Especially the difference in access bandwidth will imply a different approach in the case of external memories. This is complicated even further if the external memory has to be shared with other clients like CPU, scaler, graphics accelerators, image composition processors, etc. Memory resource sharing with other clients is a typical situation when the MPEG decoder is part of a system-on-chip, which uses unified external memory.
Previously known patent publication U.S. Pat. No. 6,088,391 relates to a memory system for B frames of pixel data, where each B frame includes a plurality of sections, and where each of the plurality of sections includes pixel data corresponding to the top and bottom fields of a frame. The memory system includes a memory organized into a plurality of segments for storing the pixel data, where the number of segments equals the number of frame sections plus two additional segments. However, each of the segments is half the size of a frame section. The memory system also includes a segmentation device for receiving and separating pixel data according to the top and bottom fields of each frame. The segmentation device tracks the segments to determine two available segments of said memory, and for each section of each frame, stores pixel data from the top field into one of the available segments and stores pixel data from the bottom field into the other available segment of the memory. A segment pointer table is preferably included for tracking the segments of memory for interlaced display. A decoder system includes the memory and the segmentation device, and also includes a reconstruction unit for receiving and decoding video data into pixel data, and display circuitry for retrieving pixel data from the segments. A method of storing and retrieving pixel data includes steps of separating and storing the pixel data by field into respective segments. After half a frame store, the data is retrieved by a display device for interlaced display.
A drawback of the above described decoder system and method according to U.S. Pat. No. 6,088,391 is that it only allows for partial reduction of memory size requirements but does not reduce memory bandwidth requirements, does not simplify the memory access profile, and does not reduce the dynamics in required memory access bandwidth.
Accordingly, there is a need for a video decoder and associated method implemented thereby, through which the memory access profile is simplified, the dynamics in memory access bandwidth are reduced, and further reduction of memory size requirements and memory access bandwidth can be achieved.
With the above in mind, it is an object of the present invention to provide an improved video decoder having an integrated memory buffer in combination with data compression and decompression, by which a simple access profile to external memory and low and fully deterministic memory access bandwidth to external memory can be achieved independent of video content.
This object is achieved in accordance with the characterizing portion of claim 1.
Thanks to the provision of means for compressing reference frame data using a scalable compression method; buffer means for intermediate storing of at least the vertical aperture (range) of motion vectors plus one row (slice) of macro blocks in lines of video per reference frame; means for decompressing reference frame data for enabling said means for motion compensation (MC) to reconstruct vector predicted pictures and macro blocks utilizing said decompressed reference frame data, both the size of reference frames to be stored and the memory access bandwidth requirements are reduced.
A further object of the present invention is to provide a method for simplifying the memory access profile and reducing the memory access bandwidth in a video decoder having an integrated memory buffer in combination with data compression and decompression, by which a simple access profile to external memory and low and fully deterministic memory access bandwidth to external memory can be achieved independent of video content.
This object is achieved in accordance with the characterizing portion of claim 18.
Thanks to the provision of steps for: variable length decoding (VLD) of compressed video data; inverse scan, inverse quantization, and Inverse Discrete Cosine Transformation (IDCT) decoding of intra coded pictures, intra coded macro blocks, and intra coded delta information; motion compensation for decoding vector predicted pictures and macro blocks; combining decoded intra-coded macro blocks, decoded intra-coded delta information, and motion compensated vector predicted macro blocks into reference frame or output frame data; compressing reference frame data using a scalable compression method; intermediately storing of at least the vertical aperture (range) of motion vectors plus one row (slice) of macro blocks in lines of video per reference frame in buffer means; decompressing reference frame data for enabling said means for motion compensation (MC) to reconstruct vector predicted pictures and macro blocks utilizing said decompressed reference frame data; outputting decoded picture data, both the size of reference frames to be stored and the memory access bandwidth requirements are reduced.
Preferred embodiments are listed in the dependent claims.
In the drawings, wherein like reference characters denote similar elements throughout the several views:
Still other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims. It should be further understood that the drawings are not necessarily drawn to scale and that, unless otherwise indicated, they are merely intended to conceptually illustrate the structures and procedures described herein.
In order to simplify the external memory access profile and to remove the dynamics in required external memory access bandwidth by a video decoder, in accordance with the present invention it is proposed to integrate memory. This integrated memory is used as a buffer 8, which is accessed in a first-in, first-out (FIFO) mode when video data is transferred from the external memory 9 into the buffer, and is accessed block-based by for instance a pre-fetch unit of the device that constructs predicted frames by motion vectors in the video decoder. The function of the buffer 8 is to hide complex (vector controlled) memory access profiles and dynamics in memory access bandwidth from the external memory 9.
Actually, the buffer implements a FIFO per reference frame. Hence, in the case of an MPEG-2 decoder the buffer 8 will contain at most two FIFOs. The preferred granularity of a FIFO element in the buffer 8 in FIFO mode is one slice, which is one row of macro blocks. It is assumed that one slice spans the full horizontal range of the picture. This assumption is not meant to be restrictive. Note that in practice the transfer of one slice (i.e. one FIFO element) takes several efficient burst accesses from external memory 9. A preferred further optimization is that one FIFO element, expressed in number of bytes, equals exactly the number of bytes that is acquired by an integer number of burst accesses from external memory 9.
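A minimal sketch of such a slice-granular FIFO follows. The class and function names are hypothetical, the 64-byte burst length is an assumption, and the window of 17 slices corresponds to the 256 + 16 lines quoted later for a high-level decoder. The sketch only models which slices are resident; it does not model the block-based read side.

```python
from collections import deque

MB_ROW_LINES = 16  # lines in one row of macro blocks (one slice)


def slice_bytes(pixels_per_line, bytes_per_pixel=1.5, lines=MB_ROW_LINES):
    """Size of one FIFO element (one full-width slice) in bytes."""
    return int(lines * pixels_per_line * bytes_per_pixel)


def bursts_per_slice(pixels_per_line, burst_bytes):
    """Whole bursts needed for one slice, and padding bytes left over."""
    size = slice_bytes(pixels_per_line)
    bursts = -(-size // burst_bytes)  # ceiling division
    return bursts, bursts * burst_bytes - size


class SliceFifo:
    """FIFO of slices for one reference frame (hypothetical helper)."""

    def __init__(self, capacity_slices):
        # The oldest slice is dropped automatically when the FIFO is full.
        self.slices = deque(maxlen=capacity_slices)

    def push(self, slice_index, data):
        self.slices.append((slice_index, data))

    def window(self):
        """Indices of the slices currently available for prediction."""
        return [index for index, _ in self.slices]


fifo = SliceFifo(capacity_slices=17)
for index in range(20):
    fifo.push(index, b"")
print(fifo.window())                 # slices 3 to 19 remain resident
print(bursts_per_slice(1920, 64))    # padding of 0 means an exact fit
```

With the assumed 64-byte burst, a 1920-pixel slice in 4:2:0 format is an exact multiple of the burst length, which is the situation the preferred optimization aims for.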
An advantageous approach for the run-out situation of the present reference frame in the FIFO 8 is to have video data of the next required reference frame already in place in the FIFO 8 when decoding of the next vector predicted picture starts. This can be achieved by loading the first slice of the next required reference frame into the FIFO 8 when the present slice of the vector predicted picture has been decoded and the next slice is still to be decoded as shown in
In the case of MPEG-1 and MPEG-2 the buffer size, in bits, should be greater than or equal to:
(the vertical range of a motion vector + one row of macro blocks, which spans the full horizontal size of the picture, both in lines) × the maximum number of pixels per line × the maximum number of reference frames × the number of bytes per pixel × the number of bits per byte, assuming that the video in the buffer 8 is uncompressed.
A single high-level MPEG-2 decoder for ATSC has a vertical range of a motion vector of 256, requires 16 lines for a macro block row, has a maximum of 1920 pixels per line, has at most 2 reference frames, uses 1.5 bytes per pixel, and obviously has 8 bits per byte. Hence, about 13 Mbit of buffer memory has to be integrated when no data compression is applied for a single high-level MPEG-2 decoder. This 13 Mbit of memory can be integrated with a high speed MPEG decoding pipe in one block. Such a block can cope with main-level decoding at 50/60 Hz without requiring external memory 9. In the case of high-level decoding the buffer 8 is used for vector-controlled prediction, which is the most access intensive operation. The missing memory capacity should be added externally but requires only a very simple, minimal bandwidth interface. The output of the decoder must, in both cases, be fed via an external display memory 13 to the output, during which stage it can be mixed with graphics and other video streams. In some main-level systems the display memory 13 can even be fully omitted.
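The buffer size bound above can be checked with a few lines of Python. This is only a worked instance of the stated formula using the ATSC figures quoted in the text; the function and parameter names are illustrative.

```python
def buffer_bits(vector_range_lines, mb_row_lines, pixels_per_line,
                reference_frames, bytes_per_pixel, bits_per_byte=8):
    """Minimum uncompressed buffer size in bits, per the formula above."""
    lines = vector_range_lines + mb_row_lines
    return int(lines * pixels_per_line * reference_frames
               * bytes_per_pixel * bits_per_byte)


bits = buffer_bits(vector_range_lines=256, mb_row_lines=16,
                   pixels_per_line=1920, reference_frames=2,
                   bytes_per_pixel=1.5)
print(bits)          # 12533760
print(bits / 1e6)    # about 12.5 Mbit
```

The result, roughly 12.5 Mbit, is the decoding part of the "about 13 Mbit" figure.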
Due to the simple access profile to external memory 9, it is suggested in accordance with the present invention to add a block based memory compression algorithm for compression and decompression to and from the external memory 9. Any block based memory compression algorithm can be added. However, scalable compression algorithms are preferred, e.g. such as described in WO 0117268 A1, the teachings of which are hereby incorporated by reference.
A video decoder in accordance with a first embodiment is illustrated schematically in
Such a state of the art MPEG-2 decoder would require a maximum theoretical rate from the external prediction memory 9 that is 200% of the video rate. E.g. high-definition video in the format 1920×1080 interlaced at 60 Hz has a net rate (no blanking) of about 62.2 Mpixel/s, which is about 93.3 Mbyte/s (assuming YUV 4:2:0 format). Hence, in this case a state of the art high-level MPEG-2 decoder requires in theory a maximum of 187 Mbyte/s memory access bandwidth. However, systems-on-chip have to assume a much worse figure due to the complicated memory access profile and SDRAM being efficient for large packets only.
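The bandwidth figures above follow from simple arithmetic, sketched here. The only interpretation added is that 60 Hz interlaced video carries 30 full frames per second; the names are illustrative.

```python
def net_pixel_rate(width, height, frames_per_second):
    """Active pixels per second, ignoring blanking."""
    return width * height * frames_per_second


# 60 fields per second interlaced equals 30 full frames per second.
pixel_rate = net_pixel_rate(1920, 1080, 30)
byte_rate = pixel_rate * 1.5      # YUV 4:2:0 uses 1.5 bytes per pixel
peak_rate = 2 * byte_rate         # 200% of the video rate

print(pixel_rate / 1e6)   # about 62.2 Mpixel/s
print(byte_rate / 1e6)    # about 93.3 Mbyte/s
print(peak_rate / 1e6)    # about 187 Mbyte/s
```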
In accordance with the present invention a video data compressor 6, a video data decompressor 7, and a buffer 8 have been added to the state of the art decoder, as illustrated in
The size of the buffer 8 can in one example be equal to (2×128+16)×1920×1.5×2×8=12.53376×10^6 bit for decoding plus preferably 16×1920×1.5×8≈0.4×10^6 bit for line-to-line conversion buffering when a scaler 14 is integrated, i.e. in total approximately 13 Mbit. The access profile to external memory 9 is very simple. However, the size of the reference frames has been reduced by the compression ratio as well as the memory access bandwidth, as is illustrated in
In a second embodiment, according to
Another advantage provided by the present invention is that, in using a buffer 8 in combination with compression and decompression, reference frames for P-pictures and B-pictures can be given different compression ratios. In principle, P-pictures require less impaired and hence less compressed reference frames than B-pictures, because P-pictures are predicted consecutively from one another and compression errors therefore risk accumulating. E.g., the required buffer size for a high-level MPEG-2 decoder is reduced from about 13 Mbit to about 3 Mbit when a 2:1 compressed reference frame is used for reconstructing P-pictures and two 4:1 compressed reference frames are used for reconstructing B-pictures. The advantage of using scalable compression is that only the 2:1 compressed reference frames have to be stored in memory. A scalable compression method makes it very easy to obtain the required 4:1 compressed reference frames directly from the 2:1 compressed reference frames, which feature is known to those skilled in the art. E.g., the 2:1 compressed reference frame can be split into two half planes: the first contains the most significant data, which represents the 4:1 compressed reference frame, and the second contains the least significant data, which combined with the first represents the 2:1 compressed reference frame. It is obvious to those skilled in the art that more levels of hierarchy can be introduced or factors other than two implemented.
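The split into two half planes can be illustrated with a toy example. It is not the scalable method of WO 0117268 A1: as a stand-in, it assumes that the "2:1" data are 4-bit quantized samples of 8-bit pixels, so that the upper 2 bits alone form a "4:1" representation. All names are hypothetical.

```python
def split_planes(samples, fine_bits=2):
    """Split quantized samples into a most and a least significant plane."""
    most_significant = [s >> fine_bits for s in samples]
    least_significant = [s & ((1 << fine_bits) - 1) for s in samples]
    return most_significant, least_significant


def merge_planes(most_significant, least_significant, fine_bits=2):
    """Recombine both planes into the less compressed representation."""
    return [(m << fine_bits) | l
            for m, l in zip(most_significant, least_significant)]


# Toy "2:1" data: 4-bit samples standing in for compressed pixels.
samples = [0b1011, 0b0110, 0b1111, 0b0000]
ms_plane, ls_plane = split_planes(samples)

print(ms_plane)                          # [2, 1, 3, 0], usable alone
print(merge_planes(ms_plane, ls_plane))  # [11, 6, 15, 0], the original data
```

In this toy model a B-picture would read only the most significant plane, while a P-picture would read both planes, so only one stored representation is needed.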
Thus, the decoder in accordance with the present invention can decode dual high-level MPEG-2, single high-level MPEG-2, and at least dual main-level MPEG-2 at relatively low memory access bandwidth and with an easy access profile to the external memory. The embodiment of using less compressed reference frames for recursively used vector predicted pictures (e.g. P-pictures) than for non-recursively used vector predicted pictures (e.g. B-pictures) could also be used without buffer (8). The advantage is a potential reduction in memory access bandwidth and no need for an integrated buffer (8). However, the disadvantage is that the memory access profile to external memory is not simplified.
Another advantage of this second embodiment is that a reference frame for a P-picture can be compressed at a smaller compression ratio than reference frames for B-pictures, since only one reference frame is needed. With the same amount of memory, reference frames for P-pictures can have up to a factor of two less compression than reference frames for B-pictures.
The preferred general concept of the second embodiment is illustrated in
A similar calculation results in 2.1 Mbit buffer memory when having compression ratios of 6:1 and 3:1 respectively for B-pictures and P-pictures.
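The compressed buffer sizes quoted above can be reproduced with the sketch below. It assumes one reading of the text: the buffer holds compressed data and must fit either one reference frame window at the P-picture ratio or two windows at the B-picture ratio, whichever is larger. The function names and default arguments are illustrative.

```python
def window_bits_per_frame(vector_range_lines=256, mb_row_lines=16,
                          pixels_per_line=1920, bytes_per_pixel=1.5):
    """Uncompressed buffer window for one reference frame, in bits."""
    lines = vector_range_lines + mb_row_lines
    return int(lines * pixels_per_line * bytes_per_pixel * 8)


def compressed_buffer_bits(p_ratio, b_ratio):
    """Buffer size for one P reference or two B references, in bits."""
    per_frame = window_bits_per_frame()
    p_case = 1 * per_frame / p_ratio   # P-pictures use one reference frame
    b_case = 2 * per_frame / b_ratio   # B-pictures use two reference frames
    return max(p_case, b_case)


print(compressed_buffer_bits(2, 4) / 1e6)   # about 3.1 Mbit
print(compressed_buffer_bits(3, 6) / 1e6)   # about 2.1 Mbit
```

When the B-picture ratio is twice the P-picture ratio, both cases need the same amount of buffer memory, which is why that pairing uses the buffer fully in either mode.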
The present invention further relates to a method for simplifying the memory access profile and reducing the dynamics in memory access bandwidth to reference frame memory in a video decoder comprising the steps of: variable length decoding (VLD) of compressed video data; inverse scan, inverse quantization, and Inverse Discrete Cosine Transformation (IDCT) decoding of intra coded pictures, intra coded macro blocks, and intra coded delta information; motion compensation for decoding vector predicted pictures and macro blocks; combining decoded intra-coded macro blocks, decoded intra-coded delta information, and motion compensated vector predicted macro blocks into reference frame or output frame data; compressing reference frame data using a scalable compression method; intermediately storing at least the vertical aperture (range) of motion vectors plus one row (slice) of macro blocks in lines of video per reference frame in buffer means; decompressing reference frame data for enabling said means for motion compensation (MC) to reconstruct vector predicted pictures and macro blocks utilizing said decompressed reference frame data; outputting decoded picture data.
In one embodiment the above method further comprises the steps of: storing said compressed reference frame data in external memory means; retrieving said compressed reference frame data from said external memory means; decompressing said retrieved reference frame data; intermediately storing in said buffer means said decompressed reference frame data; reconstructing vector predicted pictures and macro blocks utilizing said decompressed reference frame data.
In an alternative embodiment the above method further comprises the steps of: storing said compressed reference frame data in external memory means; retrieving said compressed reference frame data from said external memory means; intermediately storing in said buffer means said compressed reference frame data; decompressing said intermediately stored reference frame data; reconstructing vector predicted pictures and macro blocks utilizing said decompressed reference frame data.
In an alternative embodiment the above method further comprises the steps of: compressing a reference frame at a first compression ratio and a second compression ratio; reconstructing vector predicted pictures to be used as reference frames by reference frames compressed at said first compression ratio and vector predicted pictures that are not to be used as reference frames by reference frames compressed at said second compression ratio.
In yet an alternative embodiment the above method further comprises the steps of: compressing a reference frame at a first compression ratio and a second compression ratio; reconstructing P-pictures by reference frames compressed at said first compression ratio and B-pictures by reference frames compressed at said second compression ratio.
In still an alternative embodiment of the above method said first compression ratio is less than or equal to said second compression ratio.
In another alternative embodiment of the above method said first compression ratio is half of said second compression ratio.
In still another alternative embodiment of the above method said first compression ratio is 2:1 and said second compression ratio is 4:1.
In yet still another alternative embodiment of the above method said first compression ratio is 3:1 and said second compression ratio is 6:1.
In a still further alternative embodiment of the above method said first compression ratio is 4:1 and said second compression ratio is 8:1.
In yet another alternative embodiment the above method further comprises the steps of: deriving data for a reference frame compressed with said second compression ratio directly from data for the same reference frame being compressed with said first compression ratio; intermediately storing in said external memory means only data for said reference frame at said first compression ratio.
In still yet another alternative embodiment the above method further comprises the step of: intermediately storing in said external memory means said compressed reference frame data for said reference frame being compressed with said first compression ratio hierarchically, such that a first sub-image stored will contain most significant data representing the same reference frame compressed with said second compression ratio, being larger than said first compression ratio, and a second sub-image will contain least significant data, such that both sub-images together represent data for a reference frame compressed with said first compression ratio.
In a further alternative embodiment of the above method said second compression ratio is twice said first compression ratio.
In yet another alternative embodiment the above method further comprises said buffer means being an integrated memory buffer.
Thus, while there have been shown and described and pointed out fundamental novel features of the invention as applied to a preferred embodiment thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices illustrated, and in their operation, may be made by those skilled in the art without departing from the spirit of the invention. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or embodiment of the invention may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto.
Number | Date | Country | Kind |
---|---|---|---|
04100926.7 | Mar 2004 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB05/50650 | 2/23/2005 | WO | | 9/8/2006 |