The invention is directed to a novel system and method to compress video data in frame buffers within memory, such as in a Dynamic Random Access Memory (DRAM), or other external memory, which is used in DVD players and other related video products.
When decoding video frames for MPEG standards 1, 2 or 4, or other video coding schemes, some current input frames or previously decoded frames need to be written to or read from storage spaces within external memory. These act as frame buffers for storing input frames and previously decoded frames from different modules for motion compensation or visual display. These frame buffers occupy a great deal of storage space within the external memory and also take up a large amount of bandwidth in the transmission of video data. Thus, to reduce memory cost, it is desirable to adopt frame buffer compression processes. In conventional systems, the motion compensation process requires random access to frame data. As a result, conventional video coding schemes, such as MPEG schemes, cannot be used to compress the frame buffers. For some schemes using one dimensional or two dimensional transform techniques, the actual component implementations are either expensive or suffer from long processing latencies. In either case, conventional approaches require complicated algorithms.
Therefore, there exists a need in the art for a more effective buffering scheme that overcomes the shortcomings of the prior art. As will be seen, the invention accomplishes this in a novel manner.
The invention is directed to a system and method for encoding and compressing video data. The system includes a memory device configured to store video data and a corresponding memory controller configured to control the storage of video data in the memory device. The system further includes a frame buffer compression module configured to compress frame data received from a video module to be stored in the memory device according to the memory controller and configured to decompress compressed frame data received from the memory device according to the memory controller for use by a video module. In one embodiment, the frame buffer compression module includes a frame buffer compression encoder configured to encode and compress frame data received from a video module for storage in memory according to the memory controller. The frame buffer compression module also includes a corresponding frame buffer compression decoder configured to decode and decompress frame data received from memory according to the memory controller for use by a video module.
1. The Invention
The invention is directed to a novel buffer compression system, where two embodiments are described below. It will be understood by those skilled in the art, however, that the spirit and scope of the invention are not limited to the implementations described herein, but are defined in the appended claims and their equivalents and future claims in subsequent applications and their equivalents.
In a preferred embodiment, frame data is compressed in segments, and the frame buffer encoder further includes a quantizer configured to quantize an input frame segment to generate a quantized output; a DPCM configured to modulate the quantized output to generate a modulated output; a rice mapping module configured to perform rice mapping on the modulated output to generate a mapped output; and a variable length coding module (VLC) configured to encode the mapped output. The invention may further include a bit budget module configured to test whether a compressed segment is within a predetermined limit and a feedback loop configured to select mode parameters for the quantizer and the VLC. The invention may further include a packing module configured to prepare a package including a compressed data segment if the segment is compressed within the predetermined limit, and a feedback loop configured to select mode parameters for the quantizer and the VLC if the segment is not compressed within the predetermined limit. The invention may further include a worst case mode module configured to compress the segment if it is not within the predetermined limit, wherein the packing unit is configured to prepare and generate a package having the worst case compressed segment and mode information.
The frame buffer encoder further includes a smoothing module configured to perform a smoothing operation on an input pixel segment; a modified rice mapping component within the rice module configured to perform modified rice mapping on the modulated output to generate a mapped output; a bit borrowing module configured to share bit space among compressed segments to be transmitted; and a toggle module configured to perform a toggle operation to change a portion of the input pixel segments by toggling the bits that represent the segments. The toggle module may be configured to toggle the bits of every other frame for the same location.
On the decoder side of the system, the frame data, along with mode information that identifies the mode in which the segments are compressed and encoded, is decoded and decompressed in segments. The decoder may include an inverse variable length decoding module configured to decode the mapped output; an inverse rice mapping module configured to perform inverse rice mapping on the inverse modulated output to generate a mapped output; an inverse DPCM configured to inverse modulate the inverse quantized output to generate an inverse modulated output; and an inverse quantizer configured to inverse quantize an input frame segment to generate an inverse quantized output. The decoder may further include an unpacking module configured to unpack a received packet including the compressed data segment and mode information, and a feed-forward loop configured to send mode parameters for the quantizer and the VLC. The frame buffer decoder may further include an inverse bit borrowing module configured to share bit space among compressed segments to be transmitted; an inverse modified rice mapping component within the rice module configured to perform modified rice mapping on the modulated output to generate a mapped output; and an inverse smoothing module configured to perform a smoothing operation on an input pixel segment.
In one embodiment, the unpacking module may be configured to unpack a received packet including the compressed data segment and mode information, and a feed-forward loop configured to send the compression mode parameters for the quantizer and the VLC. In another embodiment, it is configured to unpack and feed forward mode information for the smoothing module, quantizer and the VLC. In either case it is configured to unpack worst case mode parameters configured to decode any received compressed data that was packed according to a worst case mode.
The bit borrowing module may be configured to maintain a pool of available bit space from previously compressed segments for use to store bits that represent subsequent segments, and possibly up to the limit of the bit space required for the previous segment for use to store bits that represent subsequent segments.
The rice module may be configured to perform a modified rice mapping on the modulated output to generate a mapped output that represents the values of a segment that is skewed from a rice mapping center point. A segment may be initially mapped using normal rice mapping beginning with a center point until an end of the segment is reached, with the remainder of the segment then mapped in a consecutive manner to generate a mapped output that represents the values of a segment that is skewed from a rice mapping center point.
The smoothing module may be configured to perform a smoothing operation on an input pixel segment by averaging the values of a plurality of segments prior to compressing and encoding the plurality of segments. The smoothing process may include transmitting information to a decoder indicating that a plurality of segments were compressed and encoded according to a smoothing mode, so that the segments can be accurately decoded.
The toggle module may be configured to perform a toggle operation to change a portion of the input pixel segments by toggling the bits that represent the segments. The toggle module may be configured to toggle the bits of every other frame for the same location.
In operation, the system configured according to the invention may begin by receiving a write request and video frame data from a video module to store video data into memory. In response, the system compresses and encodes a frame segment of the data received from the video module and stores the compressed and encoded segment in a memory device according to a memory controller. On the decoder side, the system can receive a read request from a video module, then decompress and decode segments of frame data received from the memory device according to the read request from the video module, then send the decompressed segments of frame data to the module. Compressing the segments may include encoding and compressing segments of frame data received from a video module with a frame buffer compression encoder for storage in memory according to a frame memory controller. Decompressing may include decoding and decompressing segments of frame data received from memory with a frame buffer compression decoder according to a frame memory controller.
In one embodiment, the system may perform the method of encoding by quantizing an input frame segment to generate a quantized output; performing differential pulse code modulation (DPCM) of the quantized output to generate a modulated output; performing rice mapping on the modulated output to generate a mapped output; and performing variable length coding (VLC) to encode the mapped output. Before sending a packaged segment, the system may first test with a bit budget module whether a compressed segment is within a predetermined bit limit, and select mode parameters for the quantizer and the VLC with a feedback loop. If the segment is not within the bit limit, the system may change the mode of one or more components within the encoding process by selecting other mode parameters for the quantizer and the VLC. If the segment is not within the predetermined limit, and if other modes are not able to bring the bit count below the bit limit, the segment may be compressed in a worst case mode, and a packaging unit may prepare and generate a package having the worst case compressed segment and mode information for use by the decoder.
In another embodiment, the encoder configured according to the invention may further enhance the system by performing a smoothing operation on an input pixel segment; performing modified rice mapping on the modulated output to generate a mapped output; and sharing bit space among compressed segments to be transmitted. In such a system, the packing module may then be configured to generate a packet including the compressed data segment and mode information if the segment is within the predetermined limit, where the mode parameters for the smoothing module, quantizer and the VLC are included. If not within the predetermined limit, the same package may be configured with the segment compressed under the worst case mode and include worst case parameters for decoding.
Upon receiving the packaged segment by the decoder, the system may be configured to process the segment by decoding the mapped output with an inverse variable length decoding method; performing an inverse rice mapping on the inverse modulated output to generate a mapped output; performing an inverse DPCM modulation on the inverse quantized output to generate an inverse modulated output; and performing an inverse quantization of an input frame segment to generate an inverse quantized output. The decoder may include an unpacking module configured to unpack a received packet including the compressed data segment and mode information, and to send mode parameters for the quantizer, the VLC and the smoothing module, if one exists, in a feed forward loop. The unpacking module may also include a worst case decoder module for decoding a segment encoded in the worst case mode if it is encoded in such a mode. At the decoder, the packet including the compressed data segment and mode information is unpacked, and the compression mode parameters for the smoothing module, quantizer and the VLC are fed forward for the decoding process. The unpacking module may further include unpacking worst case mode parameters configured to decode any received compressed data that was packed according to a worst case mode.
Among the different segments packaged, the packaged segments may share bit space among compressed segments to be transmitted. The sharing of the bit space includes maintaining a pool of available bit space from previously compressed segments for use to store bits that represent subsequent segments. The sharing of the bit space further includes maintaining a pool of available bit space from previously compressed segments up to the bit space required for the previous segment for use to store bits that represent subsequent segments.
The rice mapping may further include performing a modified rice mapping on the modulated output to generate a mapped output that represents the values of a segment that is skewed from a rice mapping center point. This may be performed until an end of the segment is reached and then maps the remainder of the segment in a consecutive manner to generate a mapped output that represents the values of a segment that is skewed from a rice mapping center point. The method may be performed on pixel segments by averaging the values of a plurality of segments prior to compressing and decoding the plurality of segments.
Still referring to
In a more detailed embodiment, a system may be configured for a 2:1 compression ratio with segments of 16-pixel data, where each pixel is one byte. This embodiment is intended as an example of a specific embodiment of the invention, and is not intended as limiting to the invention in any way.
FIGS. 3(a) and 3(b) illustrate a block diagram of a system according to the invention that includes an FBC system with an encoder 300 and a decoder 320. The encoder 300 is configured to receive a video frame input, in this example a 16-pixel frame segment, into a quantizer 302.
Let an input segment of 16-pixel data be Sk={si, i ε I1}, where I1={0, 1, . . . , 15}, and let the output compressed data be Ck. Each pixel si is an 8-bit value. For a 2:1 compression ratio, the bit budget for Ck is 16×8/2=64 bits. In the embodiment illustrated in
If the number of coding bits is not greater than the bit budget, the coding bits of each si are packed and stored to DRAM. Otherwise, another mode with other parameters is used to encode Sk. If even the last mode fails to meet the bit budget, a worst-case mode is used to encode Sk so that the bit budget constraint is met. When decoding compressed data Ck, as in
Still referring to
Referring to
According to the invention, a method of quantization is provided to quantize a video data segment. Accordingly, the dynamic range can be adjusted at the quantization level, and the quantized value can be represented in a smaller number of bits. To reduce the number of bits needed to encode the pixel data si of Sk, it can be quantized with a quantization step Qs defined as follows.
xi=int(si/Qs) (1)
where Xk={xi, i ε I1} is the quantization output and the function int (x) represents establishing an integer representation of x with a proper rounding. Since the dynamic range of data becomes smaller, a smaller number of bits can be used to represent the quantized value. Reducing the dynamic range has a consequence of a potential increase in quantization error, but the benefit is a reduced bit rate output for the quantizer, reducing the bandwidth required for transmission and further improving the compressibility of the data. For example, if the quantization step Qs=4, the value of xi becomes a 6-bit data representation with a dynamic range of 64.
In the decoding process, the reconstructed pixel value Sk′={si′, i ε I1} can be calculated by an inverse quantization process as
si′=xi×Qs (2)
It is important to note that there is no loss if Qs=1. To simplify the implementation, powers of 2 can be used for Qs so that the division and multiplication in equations (1) and (2) above can be easily calculated by bit shifting.
According to the invention, it has been observed that there is a correlation between neighboring pixel values. Therefore, the dynamic range of most values can be further reduced by using a Differential Pulse Code Modulation (DPCM) coding that considers the difference between a current pixel value and a prior pixel value. For example, according to one embodiment, the formula for the values of y can be as follows:
y_i = x_i − x_{i−1} for i ∈ I1 − {0}, and y_0 = x_0, (3)
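By way of illustration only, equations (1) and (2) with a power-of-2 step can be sketched with bit shifts as follows. The function names and the use of truncation for int( ) are assumptions made for this sketch; the text only requires "a proper rounding".

```python
def quantize(segment, shift):
    """Eq. (1) with Qs = 2**shift; truncation is assumed for int()."""
    return [s >> shift for s in segment]


def dequantize(quantized, shift):
    """Eq. (2): multiply each quantized value by Qs = 2**shift."""
    return [x << shift for x in quantized]


# Example with Qs = 4 (shift = 2): 8-bit inputs become 6-bit values.
segment = [12, 13, 200, 255]
quantized = quantize(segment, 2)
reconstructed = dequantize(quantized, 2)
```

With shift = 0 (Qs = 1) the round trip returns the input unchanged, which matches the lossless case noted above.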
where Yk={yi, i ε I1}. The reconstructed value Xk′={xi′, i ε I1} can be calculated by a DPCM decoding as
x_i′ = y_i + x_{i−1}′ for i ∈ I1 − {0}, and x_0′ = y_0. (4)
Note that there is no loss for this process.
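As a minimal sketch of equations (3) and (4), the DPCM step and its inverse may be written as below. The function names are hypothetical and chosen only for this example.

```python
def dpcm_encode(x):
    """Eq. (3): y[0] = x[0], y[i] = x[i] - x[i-1] for i >= 1."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


def dpcm_decode(y):
    """Eq. (4): rebuild each value from the previous reconstructed value."""
    x = []
    for i, value in enumerate(y):
        x.append(value if i == 0 else value + x[-1])
    return x


quantized = [3, 3, 50, 63]
differences = dpcm_encode(quantized)   # [3, 0, 47, 13]
restored = dpcm_decode(differences)    # equals quantized
```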
For the dynamic range, assume that xi ε [0, L−1]. Using Eq. (3), it can be shown that the range of the DPCM output is yi ε [−(L−1), L−1]. This means that the dynamic range becomes almost double. However, it has been observed that most values of yi concentrate in a region around the value of zero. For a typical data set, the distribution of yi follows a Laplacian distribution. This property leads to the use of variable length coding, discussed below, to code yi effectively.
For the output value of DPCM, when encoding yi, the value can be positive or negative. It has been observed that the majority of the data values exist around the zero point. According to the invention, instead of encoding its magnitude and sign separately, Rice mapping is used for improving the coding performance. This is because the resulting values concentrate in a region around the zero value. Referring to
Zk = {z_i, i ∈ I2}, where I2 = {0, 1, . . . , 2(L−1)}, as
z_i = 2|y_i| for y_i ≥ 0; and
z_i = 2|y_i| − 1 for all other values. (5)
The reconstructed value of yi can be calculated by an inverse Rice mapping as
y_i′ = z_i/2 if z_i is an even number; and
y_i′ = −(z_i+1)/2 for all other values. (6)
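Equations (5) and (6) can be sketched as follows, for illustration only. The function names are assumptions made for this example.

```python
def rice_map(y):
    """Eq. (5): non-negative y -> even index, negative y -> odd index."""
    return 2 * y if y >= 0 else 2 * (-y) - 1


def rice_unmap(z):
    """Eq. (6): even index -> z/2, odd index -> -(z+1)/2."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


# Values near zero receive the smallest indexes.
indexes = [rice_map(y) for y in (0, -1, 1, -2, 2)]   # [0, 1, 2, 3, 4]
```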
Since the values of DPCM with the Rice mapping concentrate in a small value region, variable length coding (VLC) can be used to compress the data effectively. To trade off coding efficiency against implementation cost, Golomb-Rice (GR) coding is adopted for the VLC because of its simplicity and because it requires no code tables. Let “m” be the GR coding parameter, which is a power of 2, m=2^k. The GR coding of zi consists of a unary part and a binary part. The unary part is formed as D consecutive zeros followed by a comma bit ‘1’, where D is the quotient of zi divided by m. The binary part is the last k bits of zi in binary representation. For example, if zi=22 and m=4, then k=2 and D=5. The unary part is ‘000001’, with five consecutive zeros indicating D=5. Since the binary representation of zi=22 is ‘10110’, the binary part is ‘10’, the last 2 bits of zi. Combining the unary and binary parts, the GR coding of zi for this example is ‘00000110’.
To decode the GR coding, the quotient of zi divided by m is recovered by counting the number of zeros until the comma bit ‘1’ is reached. Next, the k bits following the comma bit are extracted as the binary part. The final decoded value is formed by multiplying the quotient by m and adding the binary part.
To simplify the implementation for decoding, the invention provides a process for avoiding the use of a long unary part during encoding. This is done by setting a threshold level at which the encoding process will exit the FBC system and select another mode for encoding. This value can be preset as a default limit where the FBC process is stopped. Thus, if the length of any unary part in the above discussion is above some user-defined threshold value, such as 15 for example, the GR coding exits and the FBC system selects another mode. So, for example, a larger number to be encoded, such as 35, would have a larger number of bits for representation. If 15 is set as the default threshold for the failure of the FBC system, then 35 would be past the threshold level.
Two or more parameters may be selected for different modes in an implementation, and there is always a tradeoff between coding distortion and efficiency. The mode parameters are the quantization step Qs and the GR coding parameter m. There are many combinations for these selections. Theoretically, the more modes a system has, the better it can find a proper mode to encode the input 16-pixel values. However, there is a limit to the number of modes to be utilized in a system. This is because the compressed data is transmitted to a decoder system along with the mode information regarding the types and number of modes used to encode and compress the data. For example, in one embodiment used in practice, at most three bits are used for the mode information; therefore, at most eight modes may be used. Those skilled in the art will understand that there are such tradeoffs in different implementations, and the invention is directed to any such combinations and permutations of modes used for the encoding and compression process. In operation, the modes in which segments are compressed and encoded are identified, and information related to these modes is sent along with the compressed and encoded segments to the decoding and decompression process so that the segments are decoded and decompressed accurately.
In some cases, even when all modes are tried, the number of output bits fails to meet the bit budget. In this case, a worst case mode is used. The input pixel values are quantized with the minimum Qs values such that the total number of bits satisfies the bit budget constraint. Since the bits indicating the mode selection must be included in the calculation, some pixel values are quantized more coarsely to make room for the mode selection bits. To spread out the quantization error, these pixels are selected so that they are evenly distributed among the input pixels. For example, for 2:1 compression with a 3-bit mode selection, pixels 3, 7 and 11 are quantized by 32 to become 3-bit data and the remaining pixel values are quantized by 16 to become 4-bit data. The total number of bits is (3×3+13×4+3)=64, which equals the bit budget.
To further improve the coding performance, the invention provides another embodiment, an enhanced system for performing frame buffer compression, and one implementation is depicted in FIGS. 4(a) and 4(b) with the FBC coding and decoding. There are four significant changes compared to the embodiment discussed above. Two modules, smoothing and borrow bit control, are added; a novel Rice mapping operation is used; and a scheme to toggle input segment values is proposed. The details of these changes are discussed below. First, referring to
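The GR encoding and decoding steps described above can be sketched as below, using strings of '0' and '1' for readability. This is an illustrative sketch only; the function names and the string representation are assumptions, and the parameter k is the exponent in m = 2**k.

```python
def gr_encode(z, k):
    """Unary part (D zeros plus comma bit '1') followed by the last k bits of z."""
    d = z >> k                                   # quotient of z divided by m = 2**k
    binary = format(z & ((1 << k) - 1), "0{}b".format(k)) if k > 0 else ""
    return "0" * d + "1" + binary


def gr_decode(bits, k):
    """Return (decoded value, number of bits consumed) from the start of bits."""
    d = bits.index("1")                          # count zeros up to the comma bit
    binary = bits[d + 1:d + 1 + k]               # the k bits after the comma bit
    remainder = int(binary, 2) if k > 0 else 0
    return d * (1 << k) + remainder, d + 1 + k


code = gr_encode(22, 2)          # '00000110', the worked example from the text
value, used = gr_decode(code, 2)
```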
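The mode search described above might be sketched as follows. Several details here are assumptions made only for illustration: the mode table MODES is hypothetical (it is not the table of the embodiment), the first sample is assumed to be stored as a fixed-width field of 8 minus the quantization shift bits with no unary part, quantization truncates, and only the bit count is computed rather than the packed bits.

```python
BIT_BUDGET = 64    # 16 pixels x 8 bits / 2 for 2:1 compression
MODE_BITS = 3      # at most eight modes
MAX_UNARY = 15     # example unary threshold from the text

# Hypothetical (quantization shift, GR exponent k) pairs, least lossy first.
MODES = [(0, 1), (1, 1), (2, 1), (2, 0), (3, 1), (3, 0), (4, 0)]


def coded_bits(segment, q_shift, k):
    """Bits needed for one segment in one mode, or None if a unary part is too long."""
    x = [s >> q_shift for s in segment]
    total = MODE_BITS + (8 - q_shift)            # mode field + fixed-width first sample
    for i in range(1, len(x)):
        y = x[i] - x[i - 1]                      # DPCM
        z = 2 * y if y >= 0 else -2 * y - 1      # Rice mapping
        d = z >> k                               # number of zeros in the unary part
        if d > MAX_UNARY:
            return None
        total += d + 1 + k                       # zeros + comma bit + binary part
    return total


def select_mode(segment):
    """Return (mode index, bits) for the first mode within budget, else None."""
    for mode, (q_shift, k) in enumerate(MODES):
        bits = coded_bits(segment, q_shift, k)
        if bits is not None and bits <= BIT_BUDGET:
            return mode, bits
    return None                                  # caller falls back to the worst case mode
```

In this sketch a flat segment fits in the first mode, a gentle ramp needs a coarser mode, and a segment alternating between 0 and 255 fails every mode and would fall through to the worst case mode.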
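The bit count of the worst case example can be checked with a short sketch. The field layout (a list of value and width pairs) and the names below are assumptions for illustration; only the pixel positions, the quantization steps and the 3-bit mode field come from the text.

```python
MODE_BITS = 3
COARSE_PIXELS = (3, 7, 11)     # pixels quantized by 32 in the 2:1 example


def worst_case_encode(segment):
    """Return (value, width in bits) for each of the 16 pixels."""
    fields = []
    for i, s in enumerate(segment):
        if i in COARSE_PIXELS:
            fields.append((s >> 5, 3))   # quantize by 32 -> 3-bit data
        else:
            fields.append((s >> 4, 4))   # quantize by 16 -> 4-bit data
    return fields


def total_bits(fields):
    """Mode selection bits plus all pixel fields."""
    return MODE_BITS + sum(width for _, width in fields)


fields = worst_case_encode(list(range(0, 256, 16)))
budget_used = total_bits(fields)   # 3*3 + 13*4 + 3 = 64
```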
Next, referring to
For pixels at high frequency areas, the difference between pixels can be large. This means that the correlation between pixels is small. This leads to a large coding distortion using the conventional methods. According to another embodiment of the invention, in order to reduce the difference between pixels for this case, a novel smoothing filter is used. Let Fk={fi, i ε I1} be the output of the smoothing module. The smoothing process is as follows.
f_0 = s_0
f_1 = (s_0 + s_1)/2
f_i = (s_{i−2} + s_{i−1} + 2×s_i)/4 for i ≥ 2 (7)
The reconstructed value of si can be calculated by an inverse smoothing filter as
s_0′ = f_0
s_1′ = 2×f_1 − s_0′
s_i′ = (4×f_i − s_{i−2}′ − s_{i−1}′)/2 for i ≥ 2 (8)
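The smoothing filter of equation (7) and an inverse obtained by solving equation (7) for each s value (so that the second sample is recovered as 2×f1 minus the first reconstructed sample) can be sketched as follows. Exact fractional arithmetic is assumed here so that the round trip is lossless; the text does not specify how a hardware implementation would round, and the function names are hypothetical.

```python
def smooth(s):
    """Eq. (7): weighted average of each pixel with its two predecessors."""
    f = []
    for i in range(len(s)):
        if i == 0:
            f.append(float(s[0]))
        elif i == 1:
            f.append((s[0] + s[1]) / 2)
        else:
            f.append((s[i - 2] + s[i - 1] + 2 * s[i]) / 4)
    return f


def unsmooth(f):
    """Inverse filter: solve Eq. (7) for s[i] using already reconstructed values."""
    s = []
    for i in range(len(f)):
        if i == 0:
            s.append(f[0])
        elif i == 1:
            s.append(2 * f[1] - s[0])
        else:
            s.append((4 * f[i] - s[i - 2] - s[i - 1]) / 2)
    return s


pixels = [10, 200, 30, 180, 50]
smoothed = smooth(pixels)        # [10.0, 105.0, 67.5, 147.5, 77.5]
restored = unsmooth(smoothed)    # equals pixels
```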
According to the invention, a packing module that packages the compressed segment would send the compressed segment along with information of any smoothing mode operations so that the segment can be properly decoded when read from memory in response to a read request from a video module.
As discussed above in Section 2.2, the dynamic range of the DPCM output yi becomes almost double compared to that of the input quantized value xi. More particularly, if xi ε [0, L−1], then yi ε [−(L−1), L−1]. This requires doubling the indexes for the Rice mapping process. However, when decoding xi from yi, the value of x_{i−1} is already known. This reduces the potential number of xi values. Given x_{i−1}, it can be shown that yi ε [−x_{i−1}, (L−1)−x_{i−1}]. Thus, the dynamic range of yi becomes L, the same as that of xi. This implies that the coding efficiency can be improved by a proper mapping to an index belonging to the range [0, L−1]. Since, for typical data, yi concentrates in a region around zero, following the Laplacian distribution, a system configured according to the invention modifies the Rice mapping. Referring to
L = 8 and x_{i−1} = 5.
For a better implementation, the DPCM process is combined with the modified Rice mapping. FIGS. 7(a) and 7(b) show pseudocode for the encoding and decoding processes of this combined processing. Those skilled in the art will understand the function of the pseudocode.
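One possible reading of the modified Rice mapping is sketched below; it is not the pseudocode of the figures, which is not reproduced in this text. The assumption is that differences are interleaved as in the normal Rice mapping until the shorter side of the allowed range is exhausted, after which the remaining values are mapped consecutively, so that every index stays in [0, L−1]. The function names are hypothetical.

```python
def modified_rice_map(y, x_prev, L):
    """Map y in [-x_prev, (L-1)-x_prev] to an index in [0, L-1]."""
    d = min(x_prev, L - 1 - x_prev)          # length of the shorter side
    if abs(y) <= d:
        return 2 * y if y >= 0 else -2 * y - 1   # normal Rice mapping
    return abs(y) + d                        # consecutive mapping of the remainder


def modified_rice_unmap(z, x_prev, L):
    """Inverse of modified_rice_map for the same x_prev and L."""
    d = min(x_prev, L - 1 - x_prev)
    if z <= 2 * d:
        return z // 2 if z % 2 == 0 else -((z + 1) // 2)
    magnitude = z - d
    # Beyond the interleaved region only the longer side remains.
    return -magnitude if x_prev > L - 1 - x_prev else magnitude


# Example from the text: L = 8 and previous value 5, so y lies in [-5, 2].
indexes = [modified_rice_map(y, 5, 8) for y in range(-5, 3)]
# indexes == [7, 6, 5, 3, 1, 0, 2, 4]
```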
The pseudo code DPCM_ModifiedRiceMapping(x,z,L) of
Referring to
Since some segments of a frame are easy to compress while some are not, the coding efficiency can be improved if a portion of bits can be borrowed from other segments that have a surplus of bit space, and use this surplus to encode segments that require more bit space to compress, and are thus difficult to compress. For simplicity, the following borrow bit control when coding the k-th segment Sk is represented by
BW_k = BitsSave_k − BitsKeep_k (9)
BG_k = BG_0 + BW_k (10)
where BitsSavek is the number of saved bits in a pool up to Sk from previous segments. Thus, bit space from previous segments is reserved for use in future segments that are difficult to compress and therefore require extra bit space. BitsKeepk is the number of bits kept for future use so that all of the saved bits are not used up at once. Its value is a function of BitsSavek. This can be implemented in a look-up table. BWk is the number of borrowed bits, while BGk is the bit budget for Sk. BG0 is the normal bit budget for a segment. For 2:1 compression, for example, BG0=64 bits. According to equations (9) and (10), the available number of bits for coding Sk is increased by borrowing some bits from the bit-saving pool, while the rest of the bits in the pool are kept for future use. After coding a given Sk, the number of saved bits is updated for the next segment as follows.
BitsSave_{k+1} = BitsKeep_k + BG_k − Bits_k (11)
where Bitsk is the number of bits for coding Sk.
To simplify the implementation, it is assumed that the current segment Sk will not borrow bits beyond the previous segment Sk−1, and that the compressed data of Sk placed in the data slot of Sk−1 in DRAM is attached at the end of that slot. This implies that if BitsSavek is greater than BG0, it is clipped to BG0.
Furthermore, some bits are needed to indicate the number of borrowed bits for Sk so that the decoding process knows how to get the compressed data from the data slot of Sk−1. In one embodiment, to trade off this overhead against the efficiency of borrowing bits, four bits are used to represent the value of BWk with a 4-bit resolution, so that the full 64-bit range of the previous data slot can be identified.
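The borrow bit control of equations (9) to (11), together with the clipping to BG0, can be sketched as follows. The bits_keep rule (keep half of the pool) is a hypothetical stand-in for the look-up table mentioned in the text, and the function names are assumptions.

```python
BG0 = 64   # normal bit budget per segment for 2:1 compression


def bits_keep(bits_save):
    """Hypothetical stand-in for the look-up table: keep half of the pool."""
    return bits_save // 2


def budget_for_segment(bits_save):
    """Eqs. (9) and (10): borrow part of the pool on top of the normal budget."""
    borrow = bits_save - bits_keep(bits_save)    # BW_k
    return BG0 + borrow                          # BG_k


def update_pool(bits_save, bits_used):
    """Eq. (11), with the pool clipped to BG0."""
    budget = budget_for_segment(bits_save)
    if bits_used > budget:
        raise ValueError("segment exceeds its bit budget")
    new_save = bits_keep(bits_save) + budget - bits_used
    return min(new_save, BG0)


pool = 0
pool = update_pool(pool, 40)   # an easy segment leaves 24 bits in the pool
pool = update_pool(pool, 76)   # a hard segment uses its full budget of 64 + 12
```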
For 2:1 compression ratio, the compressed data format of k-th 16-pixel segment Sk is shown in
The B[i] and U[i] are the binary and unary parts of the i-th element zi for the GR coding of Zk={zi, i ε I1}, which are stored contiguously in the shaded area of the figure. Note that there is no unary part U[0] for the first element z0. For the mode, borrow bit, binary and unary fields, the bits are stored in regular order, MSB first. For example, mode bits of “100” mean that the mode is 4. B[0]=“000101” means that the value of the zero-th data equals 5 for GR coding. U[1]=“001” means that the unary part of the first data for GR coding equals 2. The compressed data are stored in DRAM as 32-bit words at increasing DRAM addresses. Ck[63 . . . 32] is stored first as the j-th word, while Ck[31 . . . 0] is stored in the (j+1)-th word.
As discussed above, eight modes, including the worst case mode, are used to compress the segment. For one implementation, the mode parameters are selected according to Table 1 below. Note that, in general, the modes are arranged in order of using fewer bits to compress while having more coding distortion.
According to the invention, in the FBC systems, there is a loss when coding input segments except when using mode 0. This loss accumulates when coding video using schemes with frame prediction. Fortunately, most schemes refresh the frame prediction at short intervals, such as having one frame without prediction every 15 frames. This stops the error accumulation and makes the system robust. In the case that the refresh interval is not short, the accumulated error leads to a large coding distortion. This problem becomes more serious when a segment does not change over time, because the errors then have the same sign; otherwise, the errors can cancel out. According to the invention, in order to reduce the error accumulation problem, it is proposed to change an input segment Sk={si, i ε I1} every other frame by subtracting it from the maximum possible value. Thus, for an 8-bit pixel data segment,
si″=255−si (12)
This subtraction is equivalent to toggling each bit of si between zero and one. It can be shown that this approach reduces the accumulated error significantly. In the ideal case, the error is cancelled out completely. In a preferred embodiment, the decoding requires the same toggle to recover the segment values, and for a segment at the same location, the bits are toggled every other frame. Within a frame, the toggling may vary between segments according to a fixed pattern. The simplest pattern is that all segments of a frame are toggled in the same way.
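The equivalence between equation (12) and a bitwise toggle can be illustrated with the short sketch below. The function name and the use of the frame index parity to decide when to toggle are assumptions for this example.

```python
def toggle_segment(segment, frame_index):
    """Apply Eq. (12) on every other frame (odd frame_index assumed here)."""
    if frame_index % 2 == 0:
        return list(segment)
    return [255 - s for s in segment]


segment = [0, 1, 128, 254, 255]
toggled = toggle_segment(segment, 1)     # [255, 254, 127, 1, 0]
recovered = toggle_segment(toggled, 1)   # the same toggle restores the input

# For 8-bit data, 255 - s flips every bit of s, i.e. it equals s XOR 0xFF.
same_as_xor = all(255 - s == s ^ 0xFF for s in range(256))
```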
Referring to
The invention has been described in the context of a system and method for compressing and encoding a video frame in segments for storage in memory, such as a DRAM, and correspondingly decompressing and decoding a video frame in segments according to the modes in which the segments were compressed and encoded. It will be understood by those skilled in the art, however, that such systems and methods can be made useful in many other applications, and that the scope of the invention or inventions described herein is not limited by the embodiments herein described, but is defined by the appended and future claims and their equivalents.