The present invention relates generally to image processing, and more specifically, to techniques for bandwidth efficient compression of images.
Images can be stored electronically in digital form as matrices of quantized values. Each matrix is a two-dimensional grid of individual picture elements, or “pixels.” Each pixel has an integer value representing a color or grayscale tonal value on a gradient scale. For example, a single 16-bit pixel value represents one color picked from a palette of 65,536 individual colors. The pixel values for each image are stored in a file representing the image rendered at a set dimension, such as 640×480 pixels.
In raw uncompressed form, the size of a digital image file increases dramatically with the size of the color palette and the image dimensions. A richer color palette requires more bits to represent each pixel value, while a larger dimensioned image requires a greater number of pixels. If the images are part of a moving sequence of images, as in video, the storage requirements are multiplied by the number of frames. Further, the bandwidth requirements to transmit and display a video sequence are much higher than with still images. It is therefore often desirable to utilize data compression to reduce data storage and bandwidth requirements. Compression algorithms take advantage of redundancy in the image and the peculiarities of the human vision system to reduce the size of a digital image file. The Moving Picture Experts Group (MPEG) file format is presently a commonly used format for compressing digital video. MPEG algorithms compress data into smaller bitstreams that can be easily transmitted and then decompressed. MPEG achieves its high compression rate by storing only the changes from one frame to another, instead of each entire frame. The video information is then encoded using a technique called the Discrete Cosine Transform (DCT).
Currently, digital images and video are being increasingly exchanged between interconnected networks of computer systems, including over the Internet, as well as between other computing devices such as personal digital assistants (PDAs) and cellular phones. Conventionally, the ability to exchange data, including digital video, over a network is limited by the network bandwidth available to each device. The bandwidth is affected by the capability of the network itself as well as by the means by which each client is connected. A slow modem connection, for instance, is a form of low bandwidth connection that can restrict the ability of an individual client to exchange data. Lower bandwidth means longer download times for larger files. Low bandwidth is particularly problematic when receiving digital video as content embedded, for instance, in Web pages.
One solution to the low bandwidth problem is to recompress video that is already stored in a compressed format, such as the MPEG file format, to further conserve space and bandwidth. The MPEG file format, however, is a video compression file format that is mostly used in a “lossy” version, that is, a version that loses some amount of data upon compression. Therefore, successive recompressions will result in additional data loss and in the formation of visual artifacts which deteriorate the perceptual quality of a video image.
Therefore, there is a need for an approach to compressing video that provides sufficient compression to reduce the bandwidth requirements of transmitting video while minimizing the incidence of artifacts in the compressed video images, so that such data can be efficiently transmitted and stored on the available mass storage devices.
The aforementioned and other embodiments of the present invention shall be described in greater depth in the drawings and detailed description provided below. In one embodiment, the present invention is a method for compressing video data, the method comprising, for each pixel, converting pixel data from the RGB color space to the YCbCr color space, quantizing the Y, Cb, and Cr values to generate a specified number of bits for each Y, Cb, and Cr value, rearranging and concatenating the bits of the quantized Y, Cb, and Cr values in Cb, Cr, Y format to create a word, and creating a bitstream using data derived from said word.
The step of creating a bitstream using data derived from said word comprises the steps of determining a first characteristic code value using the word, determining a second characteristic code value using the first characteristic code value, and concatenating said first and second characteristic code values to generate a coded bitstream. The first characteristic code value represents the difference between two successive words. The second characteristic code value represents the number of consecutive first characteristic code values having the same value.
Optionally, the method further comprises the step of determining the first characteristic code value by classifying it into one of a plurality of code length categories. There are at least four code length categories, selected from the group consisting of 4 bits, 9 bits, 15 bits, and 21 bits. The method further comprises the step of setting a value for a first set of bits in the first characteristic code value based on said classification. The last bit of the first characteristic code value specifies the sign of the first characteristic code value.
In another embodiment, the method for decoding compressed video data comprises extracting first characteristic code values and second characteristic code values from a coded bitstream, determining binary words representing pixels from the first and second characteristic code values so extracted, rearranging the binary words from a Cb, Cr, Y format into a Y, Cb, Cr format, subjecting the Y, Cb, and Cr values for each word to inverse quantization, and converting the inverse quantized Y, Cb, and Cr values from the YCbCr color space into the RGB color space. One of ordinary skill in the art would appreciate that the decoding process comprises the steps of the encoding process performed in reverse.
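One decode step, the unpacking of an 18-bit Cb, Cr, Y word back into its Y, Cb, and Cr components, can be sketched as follows. The 6-bit-per-component widths match the example quantization used elsewhere in this description; the function name is illustrative only.

```python
def unpack_cbcry(word):
    """Split an 18-bit CbCrY word back into its quantized components:
    Cb from the most significant six bits, Cr from the middle six bits,
    and Y from the least significant six bits (assuming the 6-bit
    quantization used as the example in this description)."""
    cb6 = (word >> 12) & 0x3F
    cr6 = (word >> 6) & 0x3F
    y6 = word & 0x3F
    return y6, cb6, cr6  # returned in Y, Cb, Cr order per the decode step
```

The inverse quantization and YCbCr-to-RGB conversion steps would then operate on the three values so recovered.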
In another embodiment, the system for compressing video data comprises a color converter for converting pixel data from the RGB color space to the YCbCr color space, quantization elements for quantizing each of the Y, Cb, and Cr values to generate a specified number of bits for each Y, Cb, and Cr value, means for rearranging and concatenating the bits of the quantized Y, Cb, and Cr values in Cb, Cr, Y format to create a word, and means for generating a coded bitstream based upon said word. The system further comprises a switch, which may be configured to select either one of, or a combination of, encoding techniques for compressing video data.
The system further comprises a means for generating a first characteristic code value wherein said first characteristic code value represents the difference between two successive words. The system further comprises a means for generating a second characteristic code value wherein the second characteristic code value represents the number of consecutive first characteristic code values having the same value.
In another embodiment, the present invention is directed to a method and system for compressing video data, the method comprising, for each pixel, converting pixel data from a first color space to a second color space having at least three value types, quantizing the three value types to generate a specified number of bits for each value type, rearranging and concatenating the bits of the quantized value types in a different format to create a word, and creating a bitstream using data derived from said word.
These and other features and advantages of the present invention will be appreciated, as they become better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
The present invention presents improved methods and systems for compressing video images. In one embodiment, the present invention is directed towards a method for compressing digital video by converting pixel data from the red, green and blue (RGB) color space to the luminance, blue color difference and red color difference (YCbCr) color space, quantizing each Y, Cb, and Cr value into a specified number of bits, and rearranging the Y, Cb, and Cr values into Cb, Cr, Y order to create a word. It should be appreciated that, by concatenating the three components (Y, Cb, Cr) and treating them as one piece of data, the data processing system does not have to process each plane separately and therefore need only perform a single read/write as opposed to three reads/writes.
Y=0.299R+0.587G+0.114B (1)
Cb=0.564(B−Y) (2)
Cr=0.713(R−Y) (3)
Equations (2) and (3) can be expanded so that the Cb and Cr color signals are entirely in terms of the R, G and B color signals:
Cb=−0.169R−0.331G+0.5B (4)
Cr=0.5R−0.419G−0.081B (5)
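The conversion of equations (1) through (5) can be sketched as follows. The function name and the treatment of inputs as plain integer color values are illustrative; the coefficients are taken directly from equations (1), (4), and (5) above.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one pixel from the RGB color space to the YCbCr color
    space using equations (1), (4), and (5). No offset is added to
    center Cb and Cr, matching the equations as written."""
    y = 0.299 * r + 0.587 * g + 0.114 * b     # equation (1)
    cb = -0.169 * r - 0.331 * g + 0.5 * b     # equation (4)
    cr = 0.5 * r - 0.419 * g - 0.081 * b      # equation (5)
    return y, cb, cr
```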
In the next step 102, each of the Y, Cb, and Cr pixel values obtained above is quantized into ‘k’, ‘l’, and ‘m’ bits, respectively. The steps involved in the quantization process are well known in the art. In one embodiment of the present invention, the value of each of ‘k’, ‘l’, and ‘m’ is 6. That is, the Y, Cb, and Cr values are quantized into 6-bit values each.
In the next step 103, the quantized Y, Cb, and Cr values are concatenated together to form a word, in the order CbCrY. That is, the Cb values occupy the most significant bit positions, Y values are placed in the least significant bit positions, and Cr values are placed in the middle. In the embodiment where each of k, l, and m is 6 bits, the length of the resulting concatenated word (k+l+m) is 18 bits. In this manner, each pixel is represented by an 18-bit word.
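Steps 102 and 103 can be sketched as follows. The uniform scaling used for the quantization is an assumption for illustration (the text notes only that quantization is well known in the art); the bit packing follows the CbCrY ordering described above.

```python
def quantize_6bit(value, max_value=255.0):
    """Uniformly quantize a component value to 6 bits (an assumed
    quantization scheme, shown for illustration only)."""
    level = int(round(value / max_value * 63))
    return max(0, min(63, level))

def pack_cbcry(y6, cb6, cr6):
    """Concatenate three 6-bit values in CbCrY order, per step 103:
    Cb occupies the most significant bit positions, Cr the middle,
    and Y the least significant, giving one 18-bit word per pixel."""
    return (cb6 << 12) | (cr6 << 6) | y6
```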
In the following step, the CbCrY words for the pixel data are collected line-by-line into a packet or buffer of selectable length N, such that the following arrangement of N words is obtained:
(CbCrY)1(CbCrY)2 . . . (CbCrY)N
This is depicted in step 104 of the flow chart. Each (k+l+m)-bit word CbCrY in the packet is characterized by a distinct pair of values—delta level (ΔLEVEL) and ‘RUN’. In the following steps 105 and 106, the values of delta level and RUN are respectively computed and encoded. The process of computing and encoding delta level and RUN values is explained in detail later in this document.
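The computation of delta level and RUN in steps 105 and 106 can be sketched as follows. The convention that the first word's delta level equals its absolute value follows the worked example later in this description, and the function names are illustrative only.

```python
def delta_levels(words):
    """Delta level for each word in a packet: the first word is taken
    at its absolute value, and every later word is the difference from
    its predecessor."""
    deltas = []
    previous = 0
    for word in words:
        deltas.append(word - previous)
        previous = word
    return deltas

def run_lengths(deltas):
    """RUN values: the number of consecutive delta levels sharing the
    same value, emitted as (delta, run) pairs."""
    runs = []
    for delta in deltas:
        if runs and runs[-1][0] == delta:
            runs[-1] = (delta, runs[-1][1] + 1)
        else:
            runs.append((delta, 1))
    return runs
```

For a packet of similarly valued pixels, the first delta is large and the rest are small repeated values, which RUN then collapses.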
The abovementioned steps 101 through 106 are repeated until data for all the pixels are encoded. The final coded bitstream for pixel data comprises, for each word representing a pixel, the code for delta level followed by the code value for RUN.
Referring to
The last bit of the delta level code specifies the sign of ΔLEVEL, as shown in ΔLEVEL code entries in the table of
The aforementioned code structure for delta level has two advantages. First, this code allows for transmitting the difference between the word values of pixels, rather than the entire word value. Thus, for example, if an image has a great deal of redundancy, which implies a number of similarly valued pixels, the first word will be long, as it represents the absolute pixel value, but the following delta level values will be small, as they represent only the differences between successive words or pixel values. Second, the delta level code structure of the present invention enables delta levels to be represented by codewords of predictable or known lengths. Although the absolute value of delta level may vary, depending upon the numerical difference it represents, the total length of the codeword is known and indicated by the values of the first two bits. This feature is particularly important in parallel processing environments, where multiple words must be processed concurrently. During parallel processing, if codewords are of variable length and their lengths cannot be determined without decoding, it cannot be determined where one codeword ends and the next begins. The code structure of the present invention also generates variable length words; however, the coding scheme lets the system predict the length of each word from the first two bits of that word. Therefore, the pointer can simply be moved ahead by the length indicated by the code size when performing parallel processing.
The code structure of RUN likewise yields codewords of predictable or known lengths. As with the code structure of delta level, the RUN code structure offers the same advantage in parallel processing, as the total length of the codeword is known and indicated by the values of the first two bits in the code.
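The pointer advance described above can be sketched as follows. Only the mapping of the prefix “11” to a 21-bit code is confirmed by the worked example later in this description; the other three prefix-to-length assignments are illustrative assumptions chosen to match the 4, 9, 15, and 21 bit code length categories.

```python
# Assumed mapping from the two leading prefix bits to total codeword
# length; only "11" -> 21 bits is confirmed by the worked example in
# the text. The other three assignments are illustrative assumptions.
PREFIX_TO_LENGTH = {"00": 4, "01": 9, "10": 15, "11": 21}

def split_codewords(bitstream):
    """Walk a bitstream of codes, advancing the pointer by the length
    announced in each codeword's first two bits. This is the property
    that lets parallel decoders find codeword boundaries without
    decoding the payloads."""
    position = 0
    codewords = []
    while position < len(bitstream):
        length = PREFIX_TO_LENGTH[bitstream[position:position + 2]]
        codewords.append(bitstream[position:position + length])
        position += length
    return codewords
```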
In accordance with the encoding method of the present invention, pixel data is first converted from R,G,B space to Y,Cb,Cr color space. Accordingly, as shown in row 502, the following corresponding (Y,Cb,Cr) values of the four pixels are obtained (referring to, and making use of equations (1) through (5) mentioned previously):
Thereafter, the Y, Cb, and Cr values are each quantized into 6-bit values. The quantized Y, Cb, Cr values are:
The corresponding binary values for the quantized Y,Cb,Cr values are shown in the row 503 of the table of
Next, the Y, Cb, and Cr values are rearranged into Cb, Cr, Y to create an 18-bit word. The corresponding decimal values of the 18-bit binary words for the four pixels are:
The aforementioned decimal values along with their corresponding binary values for pixels are given in row 504.
Next, the delta level values are computed, which measure the difference between two successive words. To compute the delta level for a word, the range within which the word falls is first determined. In this example, the first word is “180332”, as explained above. This word falls into the range 8258:270401. Therefore, the first two bits of the delta level code are set to “11”, and the next set of bits is the binary representation of the difference between the word and the beginning of the range (180332−8258). The final bit of the code denotes the sign of the delta level. The 21-bit code for the first word “180332” is shown in the row 505 of the table of
For the next word, the difference between this word and the previous word is “4160”, and it falls within the range 66:8257. On the basis of this information, all the bits of the binary code for the second word are determined. In the same manner, delta level codes for the other two words are also computed, and are shown in the row 505 of
Thereafter the RUN code is computed, which establishes the number of consecutive delta levels with the same value. In the illustrated example, the value of RUN for each of the first two words is 1, while that for the last two words is 2, as shown in row 506 of
To carry out the encoding process of the present invention, pixel data is first converted from the (R,G,B) color space to the (Y,Cb,Cr) color space. This step is carried out by the color converter 603. Next, the (Y,Cb,Cr) data is quantized by the quantization elements 604. The quantized pixels are then rearranged and concatenated by the R & CQP (Reorder & Concatenate Quantized Pixels) block 605. The pixel data from the video frames is then collected line-by-line by the block 606 for further processing. After a delay is introduced via the element 607, the delta level, which is the difference between two words, is calculated and coded by the block 608. Depending on the value of the delta level, the RUN value is computed and coded by the blocks 609 and 610. The coded delta level and RUN values are then used to generate the bitstream.
The encoding method of the present invention has been described with an exemplary quantization format wherein pixel data is converted from the (R,G,B) color space to the (Y,Cb,Cr) color space and each of the Y, Cb, and Cr values is quantized into a 6-bit binary value. However, one of ordinary skill in the art would appreciate that the Y, Cb, and Cr values may be quantized into binary values of any number of bits. Different levels of compression can be achieved by varying the quantization format, that is, by varying the number of bits used to represent the Y, Cb, and Cr values.
Further, although the encoding method of the present invention has been described with reference to its application to video, one of ordinary skill in the art would appreciate that this method may also be employed for bandwidth efficient compression of other types of data, such as graphics and still images.
Although described above in connection with particular embodiments of the present invention, it should be understood that the descriptions of the embodiments are illustrative of the invention and are not intended to be limiting. Various modifications and applications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined in the appended claims.