1. Field of the Invention
This invention relates in general to methods of analyzing images to determine in real-time whether an image can be compressed without causing visible anomalies. More specifically, but without restriction to the particular embodiments hereinafter described in accordance with the best mode of practice, this invention relates to methods and software for assessing the suitability of images for compression.
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all rights whatsoever. ATI and all ATI-based trademarks and logos are trademarks or registered trademarks of ATI, Inc. in the United States and other countries.
2. Background Art
The display of a computer system is thought of as a “frame” in which an image is presented. An image in the frame is decomposed into discrete elements termed pixels (picture elements). Pixels are the tiny dots of color on a computer display screen that collectively make up a computer display image. There are a number of color models that are employed to represent the pixels' individual colors. The purpose of a color model is to allow a convenient specification of colors within some color gamut. Typically, each pixel's display color is represented by three color components: red, green, and blue. This particular color model is often called the Red Green Blue (RGB) Color Model. The RGB model employs a unit cube subset of the 3D Cartesian coordinate system to map out the primary colors, with R, G, and B axes and with black (no light) at its origin. Thus, since the model is a unit cube, each axis has an intensity range between 0 and 1. The primary colors are additive, meaning that the individual contributions of each primary color are added together to yield the resultant pixel color. Each pixel's individual color characteristics are represented by a specific binary number. Thus, it follows that computer display images are typically represented as a two-dimensional matrix of digital data values.
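By way of illustration only (and not as part of the claimed invention), the short Python sketch below maps an 8-bit RGB pixel into the unit-cube representation described above; the 8-bit source depth and the function name are assumptions made for the example.

```python
# Illustrative sketch only: an 8-bit RGB pixel mapped into the RGB unit cube.
# The 8-bit assumption and the function name are ours, not taken from the text.

def to_unit_cube(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Normalize 8-bit RGB components to the [0, 1] axes of the RGB color cube."""
    return (r / 255.0, g / 255.0, b / 255.0)

# Black (the cube's origin), white, and a mid-tone orange.
print(to_unit_cube(0, 0, 0))        # (0.0, 0.0, 0.0)
print(to_unit_cube(255, 255, 255))  # (1.0, 1.0, 1.0)
print(to_unit_cube(255, 128, 0))    # (1.0, ~0.5, 0.0)
```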
The process of digitizing images ordinarily generates large amounts of image data. Depending upon the number of pixels in a frame, storing and/or transmitting uncompressed image data will usually require a large amount of computer memory. For this reason, whenever possible, it is advantageous to compress the original image data. Compression of image data improves the performance of video accelerators and reduces the amount of memory used. However, depending upon the particular image being compressed and the type of compression method being applied, compression can degrade the image's visual quality by varying amounts.
Various techniques have been developed in the past to determine whether or not a specific compressed image will be acceptable to use instead of its corresponding original image. However, these past techniques are either indiscriminate, which usually leads to the unfavorable result of visual artifacts, or labor-intensive, which precludes real-time use. Most of the existing methods for determining whether a compressed image will appear unflawed to the eye are purely numerical schemes and do not take the visual characteristics of the eye into account.
A common numerical technique used to make this determination is simply the calculation of the average error, in absolute magnitude, of the compressed image compared to the original non-compressed image. The average error is computed by summing, over all of the individual elements of the picture, the difference between each element of the compressed image and the corresponding element of the original non-compressed image. This method only accumulates the average overall error of the compressed image. The problem with this method is that it will at times discard acceptable compressed images. This is because there are many compressed images whose average overall error is extremely high but which visually appear to the eye to be identical to the original non-compressed image. The reason is that, although the actual error is high in places on the image where there is a great deal of high-frequency noise, the eye filters out these visual effects. Thus, only if the compressed image and the original image were placed side by side might the eye be able to detect a slight difference. At a casual glance, the eye is not able to distinguish between the two images.
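By way of illustration, the following Python sketch shows one way such an average-error calculation might be implemented; the exact formula used by prior-art systems may differ, so this is an assumed form rather than a reproduction of any particular scheme.

```python
import numpy as np

def average_absolute_error(original: np.ndarray, compressed: np.ndarray) -> float:
    """Mean absolute per-element difference between two images of equal shape.

    A sketch of the purely numerical scheme described above; it accumulates only
    the overall error and ignores where in the image (and at what spatial
    frequency) the error occurs, which is exactly its weakness.
    """
    if original.shape != compressed.shape:
        raise ValueError("images must have the same dimensions")
    diff = original.astype(np.int32) - compressed.astype(np.int32)
    return float(np.mean(np.abs(diff)))
```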
Another existing technique to determine when to compress images is the use of human-intervention to detect image textures that do or do not compress well. This technique is very labor-intensive and, hence, is not suitable for real-time use.
Also, it should be noted that a common method of image compression itself simply consists of applying a single compression method to all images. This method does not perform any analysis of how the images will visually appear after compression has been performed. These types of compression techniques, which do not evaluate the suitability of the compressed image at all, will invariably produce visual artifacts.
A common compression algorithm currently in use is the DirectX Texture Compression (DXTC) method. This method groups the original image pixels into 4×4 pixel blocks and codes new pixels from them. Each new pixel in the block is encoded into two bits. The new pixel's bits represent colors chosen so that they best reproduce the corresponding original pixel's colors. These colors are chosen by the following process. First, two of the original pixels are chosen as the “endpoint colors”, which define a line in the RGB color space. This line is termed the “compression axis”. Then, each remaining pixel in the block is encoded into the color value lying along the line that is closest to the original pixel color. Mathematically, this is accomplished by projecting the original pixel color onto the compression axis. Finally, every color value lying on the compression axis is encoded into two bits, representing four ranges on the compression axis. Hence, each pixel within the 4×4 pixel block is encoded into two bits representing one of these four values.
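The projection-and-quantization step described above can be sketched as follows in Python. The endpoint colors are assumed to be given, and the index assignment simply picks the nearest of four evenly spaced positions on the compression axis; real DXTC encoders also search for good endpoints and use a particular palette ordering, both of which are omitted here.

```python
import numpy as np

def encode_block_indices(block: np.ndarray, c0: np.ndarray, c1: np.ndarray) -> np.ndarray:
    """Project each pixel of a 4x4 RGB block onto the compression axis c0 -> c1
    and quantize the projection to one of four positions (2 bits per pixel).

    Illustrative only: endpoint selection and the exact DXTC palette ordering
    are simplified away.
    """
    axis = c1.astype(float) - c0.astype(float)      # compression axis in RGB space
    axis_len_sq = float(np.dot(axis, axis)) or 1.0  # guard against identical endpoints
    pixels = block.reshape(-1, 3).astype(float)
    # Parametric position t in [0, 1] of each pixel's projection onto the axis.
    t = np.clip((pixels - c0) @ axis / axis_len_sq, 0.0, 1.0)
    # Quantize to the nearest of four evenly spaced values along the axis.
    indices = np.rint(t * 3).astype(np.uint8)       # 2-bit index per pixel
    return indices.reshape(4, 4)
```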
The DXTC method may not work well for all images. This method does not apply any test to ensure accurate visual representation of the original image. It is only a compression algorithm. If the original colors do not map well, visual artifacts may result.
The present invention provides an analysis method to produce a series of measurements which determines whether an image is suitable for compression. These measurements are known as error metrics and relate directly to the likelihood of visible artifacts being produced after compression of the image. These error metrics are analyzed to produce a decision on whether or not the image texture is acceptable for compression. In one embodiment, the implementation of the method of the present invention can make the determination in real-time.
In one embodiment, analysis methods based on four error metrics are used to determine whether a compressed image should be used. The first metric is a signal-to-noise ratio (SNR) error metric that prevents compressed images with low SNR from being used to represent original images. Another metric detects the geometric correlation of pixels within individual blocks of the compressed image. The third metric determines whether the color mapping in the compressed image is accurate. A final metric is a size metric that filters out images smaller than a certain size and prevents them from being compressed. These analysis methods are used in combination to determine whether a compressed image should be kept. Most of the analysis methods are actually integral parts of the compression process. Therefore, the present invention incurs very little extra time cost to collect error metric data during compression. Also, since compression itself is an operation that takes significant time, data collection is a comparatively short operation. Furthermore, the processing cost is also low, since the compression process provides the data needed to perform the error metric analysis.
Further objects of the present invention together with additional features contributing thereto and advantages accruing therefrom will be apparent from the following description of the preferred embodiments of the invention which are shown in the accompanying drawing figures with like reference numerals indicating like components throughout, wherein:
The present invention provides an analysis method to produce a series of measurements that determines whether an image is suitable for compression. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It is apparent, however, to one skilled in the art, that the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
Determining whether compressed images are acceptable has long been a problem in the art and various methods have been attempted to analyze acceptability. Embodiments of the present invention are aimed at using a series of measurements to determine whether an image is suitable for compression. These measurements are known as error metrics and relate directly to the likelihood of visible artifacts being produced after compression of the image.
Approach Overview
Typically, given any input image, several analyses will be performed in accordance with the present invention.
In either case, in step 16, the method gathers data and performs the calculations necessary for the error metric analysis. In step 18, the analysis continues by checking the errors within all blocks against the error metric. The determining step, step 20, ensures that the method continues until all blocks are checked. In step 22, the method counts the number of blocks that failed the error metric. In step 26, the method rejects compression if the number of blocks that failed is greater than the error criterion of step 24. An alternate check is to see whether the summed values from the blocks fail the error criterion of step 24. If not, the method moves to step 28, where it checks whether additional analysis is to be performed. If so, the method begins again at step 10. Otherwise, compression is accepted (step 30). If the image has already been compressed, the compressed version is used. Notice that the second time around the method may skip one of steps 12 and 14 because it has already been performed. The method may proceed directly to step 16, as indicated by the dotted line.
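A minimal sketch of this accept/reject loop is given below in Python. The per-block metric function, the per-block threshold, and the image-level failure criterion are placeholders standing in for steps 16 through 26; none of the names or values are taken from the patent itself.

```python
from typing import Callable, Iterable

def accept_for_compression(
    blocks: Iterable,                        # the image split into blocks (output of steps 10-14)
    block_error: Callable[[object], float],  # step 16: gather data / compute a metric per block
    block_threshold: float,                  # step 18: per-block pass/fail limit (assumed)
    max_failed_fraction: float,              # step 24: image-level error criterion (assumed)
) -> bool:
    """Return True if the image should be compressed (step 30), False to reject (step 26)."""
    blocks = list(blocks)
    failed = sum(1 for b in blocks if block_error(b) > block_threshold)  # steps 18-22
    return (failed / max(len(blocks), 1)) <= max_failed_fraction          # step 24
```

The alternate check mentioned above, in which summed block values rather than a failure count are compared against the criterion of step 24, could be substituted for the counting logic.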
Overall, the error metrics are analyzed to produce a decision on whether or not the image texture is acceptable for compression. The implementation of the method of the present invention can make the determination in real-time. In one embodiment, the error metrics of the present invention are geared toward checking images that are compressed by the DXTC method. It should be understood that the error metrics can also be applied to a number of other compression algorithms. The description below details the application of each error metric analysis.
Frequency-space Analysis (Signal to Noise Ratio)
This analysis method is designed to detect whether the signal-to-noise ratio (SNR) of the compressed image is too low to sufficiently represent a source image. If the SNR of a source image is fundamentally high, then the lower SNR of a compressed image may not be able to represent the source sufficiently accurately (i.e., poor quality is visually detectable).
To see how this error metric affects visible image quality, consider this example analogy. A photograph (source image) is converted to a black and white image representation. If one were to look at the image representation in low resolution, it would look terrible. It would not look much like the source image unless the source image also happened to be black and white. This degradation in quality is caused by the introduction of noise. Noise is introduced because the conversion tried to represent a source image that is not representable in a binary state (i.e. bit depth=1).
The idea of non-representability can be illustrated by starting at the photograph level. The example photograph is an analog image, meaning that each point in the photograph can be one of an infinite number of values. To begin compression, the first step digitizes the image, so that individual points on the image can only take a certain set of discrete values. This introduces noise, and at that point the noise is called quantization noise. Then, in the process of compression, one might take a group of spatially adjacent elements (say, 16 elements of the picture, all next to each other) and try to represent them with only four levels of color, for example. This introduces another set of quantization noise.
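To make the two quantization stages concrete, the sketch below (an illustrative assumption, not part of the described method) digitizes a smooth ramp of 16 adjacent samples and then re-represents the group with only four levels; the residual at each stage is the quantization noise discussed above.

```python
import numpy as np

# An "analog" ramp sampled at 16 adjacent points (one row of pixels, say).
analog = np.linspace(0.0, 1.0, 16)

# Stage 1: digitize to 8-bit values -- introduces the first quantization noise.
digitized = np.round(analog * 255) / 255
noise_1 = digitized - analog

# Stage 2: represent the whole group with only 4 levels -- a second, larger
# quantization noise, analogous to encoding a block with four colors.
levels = np.linspace(digitized.min(), digitized.max(), 4)
four_level = levels[np.argmin(np.abs(digitized[:, None] - levels[None, :]), axis=1)]
noise_2 = four_level - digitized

print(np.abs(noise_1).max(), np.abs(noise_2).max())
```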
A characteristic of the DXTC compression method is that it works on blocks that are 4×4 elements in size. Low-frequency noise is introduced because there is discontinuity across these blocks. Each block is encoded with only four values, and the block next to it is encoded with four different values. Thus, there is no correlation between adjacent blocks. In other words, there is an error between the two blocks because the two blocks operate completely independently; neither shares any information with the other.
Another fundamental characteristic of noise is the loss of bit-depth. The RGB color images that are typically fed into the processor are usually 24 bit images. However, the maximum resolution of the DXTC compressed images is approximately 19 bits. So, there is a loss of 5 bits, which reduces the SNR.
Given this understanding of noise, the first detection step of the signal-to-noise error metric analysis is primarily concerned with eliminating images that do not fit into the two categories below.
The choice of using frequency-space analysis on low-frequency blocks should be noted. Those skilled in the art will appreciate that various other methods can be used to determine SNR. However, what is of concern is that low SNR in low-frequency blocks causes visually noticeable problems in DXTC. Noise in the higher-frequency components of the image is not particularly relevant because the characteristics of the eye tend to mask it.
The combination of
where c and the standard deviation σ are constants, and r_i and r_j represent the pixel x and y distances from the origin.
Z_M = U^T M U.
The elements of U are given by
for n = 1, …, N and m = 2, …, N, where the image M is an N×N array.
The various components can now be identified. The DC component is the upper-left corner element of each block. The values in the low-frequency (LF) blocks can now be identified (and counted) for each 16×16 pixel block once the DCT is performed. Also, note that with this error metric the image does not have to be compressed prior to the use of frequency-space analysis.
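A hedged Python sketch of this frequency-space step follows. It builds an orthonormal DCT matrix U, applies Z_M = U^T M U to an N×N block, and then inspects a low-frequency corner of the result. The element formula used for U is the standard orthonormal DCT-II matrix, and the size of the low-frequency region is an arbitrary choice for illustration; the patent's exact definitions are not reproduced here.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Standard orthonormal DCT-II matrix U (an assumption; the exact element
    formula from the text is not reproduced here)."""
    u = np.zeros((n, n))
    u[:, 0] = 1.0 / np.sqrt(n)               # constant first column (DC basis vector)
    rows = np.arange(n)
    for m in range(1, n):
        u[:, m] = np.sqrt(2.0 / n) * np.cos((2 * rows + 1) * m * np.pi / (2 * n))
    return u

def low_frequency_energy(block: np.ndarray, lf_size: int = 4) -> tuple[float, float]:
    """Transform an NxN block with Z = U^T M U and return (DC value, LF energy).

    The DC component is the upper-left element of Z; the 'low frequency'
    coefficients are taken from an lf_size x lf_size corner (illustrative choice).
    """
    n = block.shape[0]
    u = dct_matrix(n)
    z = u.T @ block.astype(float) @ u
    dc = float(z[0, 0])
    lf = z[:lf_size, :lf_size].copy()
    lf[0, 0] = 0.0                            # exclude the DC term itself
    return dc, float(np.sum(lf ** 2))
```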
Color Space Matching (Axis Mapping)
Another analysis method is based on an error metric named color space matching or axis mapping. In generating a DXTC image, DXTC works by defining a pair of colors and then a series of points which can only come from direct interpolation between those two colors. This means that the points lie on a static line across the 3-D color space defined by the three axes, R, G, and B (see
To illustrate this concept, consider the following example block. If all of the pixels in the block except one map well, then the entire block will be considered to map well. If none of the pixels map particularly well, then they may still be close enough. At some point there is a criterion, which is currently being determined largely by experimentation. The criterion can be set on an application-by-application basis. The proper setting of the criterion enables the analysis to determine whether the mapping is acceptable.
u_i = C_i − P
w_i = R_i − P
Therefore, u_i will be solved as being:
Thus, it will follow that q_i will equal:
In other words,
q_i = w_i − u_i.
Note that the dot product of v_i and q_i equals 0 because the vectors v_i and q_i are perpendicular to each other. Since these vectors are perpendicular, this ensures that C_i is the closest mapping point of the source (original) point R_i along the compression axis.
The error metric calculation proceeds by summing the squares of the axis mapping errors (step 56), the q_i's, over all original points, the R_i's, in the block. The calculation of q_i² is shown as follows:
The sum of the squares of the axis mapping errors equals:
Summed square error = Σ q_i²
The use of the summation of squares was chosen for speed reasons to avoid an expensive square root operation and because it is appropriate for calculating a resultant Root Mean Square (RMS) error over the entire block. However, it should be noted that a square root operation may be used in the error metric algorithm instead of the summation of the squares method. The error metric algorithm then determines if the resulting summed square error is greater than some preset limit, x (step 58 of
If Σ q_i² > x, then mark the block as having a high axis mapping error.
In one embodiment, x is a setting that can be adjusted on an application-by-application basis. This error metric is only considered after the entire image has been through the compression process. The total number of high-axis-error blocks is checked as a percentage of the total blocks in the image that are not Direct Current (DC) blocks, which consist of a single color. If this percentage is found to be greater than a specified preset limit, then the image is considered to have a high axis mapping error, and subsequently the compressed image is rejected.
An important point to note about this error metric is that the data required for the calculation is mostly generated as a by-product of the compression. Little additional effort is needed to collect the data to calculate this error metric.
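Under the definitions above, the per-block axis mapping error can be sketched in Python as shown below. The perpendicular error is obtained here via the Pythagorean relation q_i² = |w_i|² − |u_i|², which follows from v_i · q_i = 0; the threshold x is a placeholder to be set per application.

```python
import numpy as np

def block_axis_mapping_error(block: np.ndarray, p: np.ndarray, axis: np.ndarray) -> float:
    """Sum of the squared perpendicular distances q_i^2 of the original colors R_i
    from the compression axis through endpoint P with direction `axis`.

    Uses |q_i|^2 = |w_i|^2 - |u_i|^2, which holds because q_i is perpendicular
    to the axis (v_i . q_i = 0 in the text above).
    """
    w = block.reshape(-1, 3).astype(float) - p      # w_i = R_i - P
    axis = axis.astype(float)
    axis_len_sq = float(np.dot(axis, axis)) or 1.0  # guard against a degenerate axis
    u_len_sq = (w @ axis) ** 2 / axis_len_sq        # |u_i|^2, squared projection onto the axis
    q_sq = np.sum(w * w, axis=1) - u_len_sq         # |q_i|^2 per pixel
    return float(np.sum(np.maximum(q_sq, 0.0)))     # summed square error over the block

def has_high_axis_error(block: np.ndarray, p: np.ndarray, axis: np.ndarray, x: float) -> bool:
    """Mark the block if the summed square error exceeds the preset limit x."""
    return block_axis_mapping_error(block, p, axis) > x
```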
Geometric Correlation
Another analysis method is based on an error metric named geometric correlation. This analysis method is directed at detecting blocks that suffer from a problem with smooth gradients that are local to a single block. This problem is largely psychovisual and often appears when DXTC is used. Instances of this problem arise when images with smooth fades that occur over short distances are compressed. The reason behind this problem is that the quantization, particularly between adjacent blocks, tends not to be very well correlated. As a result, the image goes lumpy: what were previously nice, smooth gradient curves go patchy or lumpy because of the adjacent blocks.
The best way to detect this problem is to look at one of the intermediate values of the compression phase. One of the intermediate values derived from the compression phase is the mapping of the distance of points from the origin along the color axis. If those tend to form a geometric correlation across the block (i.e. if they tend to rise in increasing order from one corner across to the other corner), then generally that block is representing some kind of smooth field. And, if the detection finds that there is a large number of these blocks in the image, it usually means that the image has gone lumpy.
Consider an example image broken up into 12 blocks, and then consider an imaginary circle of a gradient that fades through this image. The circle would intersect each block at different points. Since each block will pick a different quantization, the end result is that there is a clear join, because of the differing values, where each block meets its neighbor. If there is just a diagonal line going through the blocks, then there would be a different number of pixels of each color in each block, so each block would tend to pick a different quantization.
The “Geometric Correlation along the Axis Error Metric” algorithm operates by identifying whether each of the DXTC 4×4 pixel blocks has a strong geometric correlation. Blocks in which there is a strong correlation of the mapped axis positions are frequently poorly represented by a DXTC compressed image. This is usually because adjacent blocks are individually compressed under the DXTC compression method and do not affect each other at all under current compression methods. This independent compressing of the blocks leads to different “endpoint colors” which may result in a semi-random visual perturbation of the resultant image, depending upon whether the blocks are correlated. If the source image contains a strong geometric pattern, this semi-random perturbation will disrupt the pattern. Since the eye is largely a pattern-recognition machine, the disruption of the pattern is clearly visible.
This error metric detects blocks that contain geometric patterns. To detect such blocks, the axis distances for each original pixel color are checked for geometric correlation across the block.
Then, in steps 62 and 64, two checks are performed to ensure that the pixel values decrease monotonically within each row (step 62) and within each column (step 64). In other words, the block is geometrically correlated if all of the tests below are true:
Additionally, the block is geometrically correlated if the same tests are true after the block has been rotated through 90, 180, or 270 degrees (step 68). Again, the number of geometrically correlated blocks is summed throughout the whole image, and the final result is considered as a percentage of the non-DC blocks. A high percentage means that the image should not be compressed, as the compressed version would suffer visible degradation in quality.
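The monotonicity tests of steps 62 through 68 can be sketched in Python as follows. Whether the decrease is strict or non-strict, and how equal values are treated, are assumptions here, since the explicit test expressions are not reproduced above.

```python
import numpy as np

def rows_and_cols_decrease(d: np.ndarray) -> bool:
    """True if the mapped axis distances decrease monotonically along every row
    (step 62) and every column (step 64). Non-strict decrease is assumed here."""
    return bool(np.all(np.diff(d, axis=1) <= 0) and np.all(np.diff(d, axis=0) <= 0))

def is_geometrically_correlated(distances: np.ndarray) -> bool:
    """A 4x4 block of axis distances is flagged as geometrically correlated if the
    monotonicity test passes for the block itself or for any 90/180/270 degree
    rotation of it (step 68)."""
    return any(rows_and_cols_decrease(np.rot90(distances, k)) for k in range(4))
```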
Size
A final error metric used is the size of the image. In essence, a decision is made so that any image smaller than a certain size is not compressed. Small images are not worth compressing, partly because they do not greatly affect the performance of the hardware, but mostly because the smaller the image supplied by the application, the more critical the individual detail in the image tends to be. Thus, if there is an image that is 4 pixels by 4 pixels, then each individual pixel is probably quite important, whereas for an image that is 2000 pixels by 2000 pixels, the individual pixels probably do not matter very much. In one embodiment of the present invention, the size cutoff is 64×64 pixels. An image above the size of 64×64 pixels will pass this analysis step. It should be understood that any size can be set for this analysis method, or that the size can be used to scale the sensitivity to other error metric conditions.
Combination of Analysis Methods
Those skilled in the art can appreciate that the sequence presented in
Performance
Most of the analysis methods described above are actually integral parts of the compression process. Therefore, the present invention incurs very little extra time cost to collect error metric data during compression. Also, since compression itself is an operation that takes significant time, data collection is a comparatively short operation. Furthermore, the processing cost is also low, since the compression process provides the data needed to perform the error metric analysis.
Conclusion
Thus an analysis method to produce a series of measurements which determines whether an image is suitable for compression is described in conjunction with one or more specific embodiments. While this invention has been described in detail with reference to certain preferred embodiments, it should be appreciated that the present invention is not limited to those precise embodiments. Rather, in view of the present disclosure, which describes the current best mode for practicing the invention, many modifications and variations would present themselves to those of skill in the art without departing from the scope and spirit of this invention. The invention is defined by the claims and their full scope of equivalents.