This invention relates to image compression and/or decompression.
In three dimensional (3D) computer graphics (CG) systems, various techniques are used to determine the shape of a 3D object to be displayed as a 2D view on a display screen. An arrangement often referred to as a “shader” then determines the surface appearance of the object. This will generally involve applying a surface “texture” to the drawn object, as well as taking into consideration the reflectivity of the object and also the location of light sources (in the virtual environment) relative to that object.
Applying a surface texture involves projecting a previously prepared and stored image called a “texture map” (representing the desired surface appearance of the object) onto a 3D shape. This is an established technique and will not be described here in detail, except in relation to its general requirement for a set of stored texture maps for projection onto graphically generated 3D shapes. Texture maps are just image data; the images happen to represent surface patterning to be applied to a CG object, but fundamentally they simply represent image data. In a computer game system, for example, there is often a need for very many of these texture maps, which means in practical terms that they need to be stored (for example on a computer game disk) in compressed form.
Many CG systems, particularly hardware-accelerated 3D CG devices in personal computers or in games machines, operate in real time, which is to say that they generate a new image for display once per display (frame) period. In order to achieve this, they require rapid access to stored texture maps and so require compression/decompression techniques which allow a relatively straightforward and rapid decompression of the stored texture maps.
Some image compression techniques are specifically designed to provide this feature, i.e. that the decompression process requires relatively little processing and relatively few memory accesses. An example is the family of S3 Texture Compression techniques, often referred to as DXTn (where n is 1-5), developed by S3 Graphics Ltd and described in references 1 and 2 below.
DXT1 in a basic form provides a fixed 6:1 compression of 24 bit RGB (red-green-blue) colour data so that a 4×4 block of pixels (384 bits) is compressed to a 64 bit data quantity. Each pixel block is compressed by picking a “start” and an “end” colour at 565 precision (that is, 5 bits for red, 6 for green and 5 for blue) and considering up to two full-precision intermediate colours which may be defined as being evenly distributed (on a straight line in RGB colour space) between the start and end colours. Accordingly, because the intermediate colours may be derived from the start and end colours, the intermediate colours do not need to be explicitly coded as part of the compressed data. Each pixel in the 4×4 pixel block is then encoded with a 2-bit index as a selection of a nearest one of these 4 colours. So the total number of bits used to encode the 4×4 pixel block is (16 pixels*2 bits per pixel)+(2 reference colours*(5+6+5) bits per reference colour)=64 bits.
When the block of pixels is decompressed, it is necessary just to detect the start and end colours, to interpolate the two intermediate colours evenly distributed in colour space between the start and end colours, and then use those four colours in a look-up table with the 2 bit index provided for each pixel. In this way, the more processor-intensive aspects of the compression/decompression processing (e.g. choice of the start and end colours) can be handled at the compression side, leaving the decompression as a relatively straightforward processing operation.
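By way of illustration only, the decompression of a single block as outlined above may be sketched as follows. The function names, the ordering of the four look-up table entries and the packing of the 2 bit indices (pixel 0 in the least significant bits) are assumptions made for this sketch, and the variant using only one intermediate colour is ignored.

```python
def expand_565(c):
    """Expand a 16-bit 5:6:5 colour to an 8-bit-per-channel (r, g, b) tuple."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def build_palette(start, end):
    """Return the 4-entry look-up table: start, end and two interpolated colours."""
    c0 = expand_565(start)
    c1 = expand_565(end)
    one_third = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
    two_thirds = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    return [c0, c1, one_third, two_thirds]


def decode_block(start, end, indices):
    """Decode 16 pixels from two 5:6:5 colours and a 32-bit field of 2-bit indices."""
    palette = build_palette(start, end)
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]


# Bit budget from the text: 16 indices of 2 bits plus two 5:6:5 reference colours.
BITS_PER_BLOCK = 16 * 2 + 2 * (5 + 6 + 5)
```

The decoder performs only shifts, a small amount of integer arithmetic and a table look-up per pixel, which is consistent with the low decompression cost described above.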
Other variants of DXT1, and other members of the DXTn family of techniques, use a similar approach and can also handle so-called alpha channel (transparency) information relating to the pixel block. For ease of explanation, DXT1 will be discussed here by way of example, but it will be appreciated that the techniques to be described are applicable both to the remaining DXT techniques and also to other compression techniques falling within the scope of the appended claims.
The DXT1 compression system described above provides an efficient way of compressing a single texture map, at a particular image size, for use in a CG system. However, the 3D object onto which the texture map is to be projected may vary in size—at a simple level, in dependence on how big an object is being represented and how far that object is displaced, in the virtual environment, from the virtual viewpoint. If only a single texture map were stored, the image size of the texture map may well not match the size required to map correctly onto the 3D object. But if a texture map were stored for each possible object scale, the storage requirements would be impractically large. So a convenient solution is that a few texture maps at a selection of different scales are stored, and for a particular object, a texture map (or rather, the relevant parts of a texture map) at the required scale is interpolated from the one or two stored maps nearest in scale to the required scale. Generally the aim will be to have a wide enough range of stored maps that most required scales will fall between a pair of stored map scales. To handle situations where the displayed object is moving relative to the virtual viewpoint, so that a differently scaled texture map may be required at each frame, this interpolation process can be carried out in real time.
The use of multiple scales of texture maps is sometimes referred to as “MIP mapping” (MIP being an acronym of the Latin phrase multum in parvo, meaning “much in a small space”). The term “MIP map” is generally used to refer to a set of texture maps at different scales. Often the scales form a geometric series, so that (for example) each scale is one quarter of the size (50% in each dimension) of the next higher scale. As an example, if a texture map has a basic size of 256×256 pixels, then the associated MIP map might contain a further eight versions of that texture map, at image sizes 128×128, 64×64, 32×32, 16×16, 8×8, 4×4, 2×2 and 1×1 pixels. The total storage requirement of the MIP map is very close to 1⅓ times the storage space of the basic (256×256) texture map.
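The storage figure quoted above can be checked with a short calculation. The following is a minimal sketch; the function names are hypothetical, and a square texture whose side halves at each level is assumed.

```python
def mip_chain_sizes(base):
    """Side lengths of a MIP chain, halving each dimension down to 1 pixel."""
    sizes = []
    size = base
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes


def storage_ratio(base):
    """Total pixel count of the whole chain relative to the base texture map."""
    total = sum(s * s for s in mip_chain_sizes(base))
    return total / (base * base)
```

For a 256×256 base map this gives nine levels in total (the base plus eight smaller versions) and a ratio of 87381/65536, which is just under 4/3.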
The shader uses the MIP map to produce texture information at a required scale, generally by interpolating between the two closest scales in the MIP map. So, for example, if the display size of an object means that a texture map at a scale of 40×40 pixels would be required, the shader would interpolate required parts of the texture from the 64×64 and 32×32 images in the MIP map.
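The choice of the two closest scales may be sketched as below. The function names are hypothetical, and the linear blend weight is purely an assumption for illustration; practical shaders may derive the weight differently.

```python
def bracketing_levels(required, sizes):
    """Return the (larger, smaller) stored sizes on either side of `required`."""
    ordered = sorted(sizes, reverse=True)
    if required >= ordered[0]:
        return ordered[0], ordered[0]
    if required <= ordered[-1]:
        return ordered[-1], ordered[-1]
    for larger, smaller in zip(ordered, ordered[1:]):
        if smaller <= required <= larger:
            return larger, smaller


def blend_weight(required, larger, smaller):
    """Weight given to the larger level (assumption: linear in image size)."""
    if larger == smaller:
        return 1.0
    return (required - smaller) / (larger - smaller)
```

For the 40×40 example, `bracketing_levels(40, [256, 128, 64, 32, 16, 8, 4, 2, 1])` selects the 64×64 and 32×32 versions.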
The set of scales listed above, in the 0.25 geometric series, is of course just one example of the use of MIP maps.
The DXT technique is considered to work well on texture maps containing significant high spatial frequency information, but less well on texture maps having smooth gradients (representing relatively low spatial frequency detail). An example of a texture map having such low frequency detail is a texture map representing the illumination of an object. With low spatial frequency texture maps, compression artefacts, sometimes representing discontinuities caused by the need to select the start and end colours in DXT compression on a block-by-block basis, can be visible.
Various other sets of scales, sometimes involving many more images in the MIP map, have been proposed, in order to try to improve the rendered appearance of displayed objects. Another possibility is to use a different colour space than RGB for the compression system. However, these attempts are found to suffer either from greatly increased storage requirements or undesirable additional processing overhead at the decompression stage.
This invention provides a method of image compression in which multiple versions of an image are compressed, each version having a different image resolution, the method comprising the steps of: for one or more compressed versions of the image: decompressing that compressed version to generate decompressed image data; detecting image differences between a higher resolution version of the image and the decompressed image data; and compressing difference data dependent upon the detected image differences.
This invention also provides a method of image decompression in which multiple compressed versions of an image are provided, each version having a different image resolution, along with compressed difference data dependent upon image differences between a decompressed image version and a respective higher resolution image version, the method comprising the steps of: selecting one or more image versions; decompressing the compressed image data relating to the selected image version(s); decompressing the difference data relating to respective higher resolutions than the selected image version(s); and combining the decompressed image data and the decompressed difference data to generate an output image at a required output resolution.
The selected image versions may be, for example, such that the resulting difference data represents resolutions spanning the required output resolution.
The invention provides an image data compression/decompression technique which is particularly (though not exclusively) suited to real time use in CG applications and which can provide an improved output image quality for little increase in memory or processing overhead. In particular, for similar storage and processing requirements, visible noise can be reduced (by the use of this technique compared to previous techniques) in situations where low spatial frequency image information such as smooth lighting gradients is encoded.
Various other aspects and features of the invention are defined in the appended claims.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Referring now to
The data processing apparatus may be, for example, a personal computer, a computer games machine such as a Sony® PlayStation 3® home entertainment machine or a hand-held machine such as a Sony® PlayStation Portable® entertainment machine.
The system unit 10 comprises a number of items interconnected by a bus structure: a central processing unit (CPU) 50; random access memory (RAM) 60; read only memory (ROM) 70; removable and/or fixed disk storage (such as optical disk storage) 80; an input/output (I/O) interface 90 for interfacing with peripherals such as the input device 30; a wired and/or wireless network interface 100 for interfacing with a network and/or internet connection 120; and a graphics card 110.
Two modes of operation of the data processing apparatus are described below: these are the preparation of compressed texture maps for use in later generation of graphical images, and the decompression of compressed texture maps for applying a texture to a graphical object. In general terms, these can be carried out by the same data processing apparatus, though it will be appreciated that it is perhaps more likely that the first of these processes would be carried out by a powerful non-portable system such as a so-called developer's kit or a powerful personal computer, whereas the second of the processes would be carried out by a consumer device such as one of the entertainment machines mentioned above. For the sake of the following description, it will be assumed that the apparatus shown in
In operation, computer program code is read from the disk storage 80, the ROM 70 and/or via the network connection 120 and is loaded into the RAM 60 for execution by the CPU 50, possibly in response to signals received from the input device 30. The CPU 50 generates data outputs which are passed to the graphics card 110.
The graphics card 110 acts on data received from the CPU 50 to prepare or “render” an image to be displayed on the display 20. A more detailed description of the graphics card 110 will be given below. In general terms, the graphics card (which need not of course be card-shaped or even removable in terms of its connection to the rest of the apparatus) comprises a microprocessor and associated memory and other hardware which are dedicated to handling processing tasks specific to the rendering of output images in an efficient way.
So, as an overview, the graphics card 110 receives data from the CPU 50, from which data it generates an output image to be stored in a display buffer 180. A primitive renderer 130 generates small image portions, known as primitives, from which the output image is built up. Each primitive might represent, for example, a small polygon forming part of an object or image background in the output image. A “depth” or “z” value is associated with each pixel of each primitive, to show its depth in the final image relative to other rendered primitives. The depth values are stored in a depth buffer 140. Similarly, a transparency or “α” (alpha) value is associated with each pixel to define a degree of transparency. In this way, the final image can be built up so that background pixels are hidden behind non-transparent foreground pixels; whereas background pixels may be wholly or partly seen if a foreground pixel at the same display position is completely or partly transparent.
A shader 160 applies surface textures to some or all of the rendered primitives using texture maps stored in a texture map buffer 170. These texture maps may have been retrieved from (for example) the disk storage 80, the ROM 70, the RAM 60 or via the network connection 120, and are stored locally in the texture map buffer 170 for ease and speed of access. It will be appreciated that the shader could work directly from the original source of the texture map data, i.e. without the need for local storage, but such an arrangement would almost certainly be considerably slower than caching the texture map data in local storage. Indeed, the depth buffer 140, the texture map buffer 170 and the display buffer 180—along with other storage requirements of the graphics card not shown in FIG. 2—form part of the graphics card's local storage 150 which is provided in or very close to the graphics card for speed of operation.
The shader takes into account the nature of the object or surface to be rendered and other factors such as its reflectivity and the nature and position of any lighting in the virtual environment, to apply a surface finish or appearance to that object. Many different shader techniques have been proposed and developed, such as vertex shading, pixel shading, geometrical shading and the like. These are all known in the art and will not be described in detail here, as the present embodiments are relevant to the generation of the texture map data which is applied by the shader, rather than to the particular technique by which that texture map data is applied.
Shaders are usually implemented using a shading language, which is a specifically designed programming language having features which are particularly relevant to the functionality of a shader. Some example functions, written in a shader language, will be given below. The graphics card used in the Sony® PlayStation 3® entertainment machine is reported to be capable of about 75 billion shader operations per second.
The term “MIP map” is used here to refer to a set of texture maps at different scales. In the present example, the scales form a geometric (i.e. logarithmic) series or set, so that each scale (resolution) is one quarter of the size (50% in each dimension) of the next higher scale. In particular, a texture map 200 has a basic size of 256×256 pixels. The MIP map contains up to a further eight versions of that texture map, at image sizes 128×128, 64×64, 32×32, 16×16, 8×8, 4×4, 2×2 and 1×1 pixels. In
The different texture map versions in the MIP map have all been compressed using (in this example) DXT1 compression.
The shader uses the MIP map to produce texture information at a required scale, generally by interpolating between the two closest scales in the MIP map. Various interpolation methods for this purpose have been proposed, and the particular interpolation technique is not important to the present embodiment. Interpolation at this stage benefits if anti-aliasing processing was carried out when the texture map versions at different scales were first generated.
The required scale of the texture map depends on the display size of the object being rendered, which in turn depends on a base size and also the distance at which the object is to be displayed (in the virtual environment) from the virtual viewpoint. Known techniques are used to establish the required scale of the texture map to be interpolated. Rather than interpolating an entire texture map at the required scale, generally only those portions required for the object's display are interpolated. This selection of portions for interpolation can take place at a pixel-by-pixel level.
So, for example, if the display size of an object means that a texture map 250 at a scale of (say) 160×160 pixels would be required, the shader interpolates required parts of the texture from the 256×256 version 200 and the 128×128 pixel version 210 in the MIP map. Because DXT1 compression is local to particular blocks of the compressed image (i.e. the decompression of a block does not require any other blocks to be decompressed), it is necessary only to decompress those parts of the versions which are relevant to the interpolation process. An example is shown in
From a “base” texture map 300 (at, say, 256×256 pixels), a MIP map is generated using known techniques, and optionally including known anti-aliasing processing, to produce a series of smaller versions of the texture map 310, 320, 330, 340 . . . . As described with reference to
The texture maps are compressed (e.g. using DXT1 compression) and are then decompressed for the purposes of the processing below. Those of the texture maps which go forward for storage or transmission (all but the map 300; see below) are stored or transmitted in compressed form.
The base texture map 300 is ultimately discarded; that is to say, it is not used in the MIP map which is stored or transmitted, and later referred to by the shader 160. However, it is used in the preparation of the MIP map and so is shown in
A series of “difference” texture maps is generated. These are shown as difference maps 350, 360, 370 and 380 in
The way in which the difference maps are generated will now be described. For each of the MIP map levels 310 . . . 330, the CPU 50 decompresses the map and carries out program process steps 342, 344, 346 and 348 as follows:
(step 342) image-expand (i.e. scale) the next lower resolution image version by a factor of 4 in area (i.e. by a factor of 2 in each dimension, so as to be the same size as the current level);
(step 344) calculate (i.e. detect) the difference, on a pixel by pixel basis, between pixels of the current level and pixels of the image-expanded next lower level;
(step 346) multiply the difference by a gain factor (a scaling constant) and apply an offset (e.g. 128=one half of the full range of pixel values); and
(step 348) use DXT1 or other compression to compress the difference image.
The gain factor might be, for example, 4 or 8 to improve the compression/decompression quality. This is possible because all of the difference values (before the offset is applied) will be reasonably close to zero. The offset is applied to handle negative difference values.
The image-expansion process may be a known bilinear filtering process.
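Steps 342 to 346 may be sketched as follows for a single colour channel held as a list of rows. This is an illustrative sketch only: the function names are hypothetical, pixel replication is used as a simple stand-in for the bilinear filtering mentioned above, the clamp to the 8 bit range is an assumption, and the final compression of step 348 is omitted.

```python
GAIN = 4      # example gain factor (the text suggests 4 or 8)
OFFSET = 128  # one half of the full range of 8-bit pixel values


def expand_2x(image):
    """Step 342: expand by 2 in each dimension (pixel replication stand-in)."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def difference_map(current, lower, gain=GAIN, offset=OFFSET):
    """Steps 344 and 346: per-pixel difference, then gain and offset."""
    expanded = expand_2x(lower)
    result = []
    for cur_row, exp_row in zip(current, expanded):
        out_row = []
        for cur, exp in zip(cur_row, exp_row):
            diff = cur - exp                 # step 344: signed difference
            coded = diff * gain + offset     # step 346: gain, then offset
            out_row.append(max(0, min(255, coded)))  # clamp (assumption)
        result.append(out_row)
    return result
```

With a gain of 4 and an offset of 128, differences in the range −32 to +31 are representable without clamping, which reflects the observation that the differences are expected to be reasonably close to zero.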
DXT1 compression in a basic form provides a fixed 6:1 compression of 24 bit RGB colour data so that a 4×4 block of pixels (384 bits) is compressed to a 64 bit data quantity. Each pixel block is compressed by picking (using known techniques) a start and end colour at 565 precision (that is, 5 bits for red, 6 for green and 5 for blue) and considering up to two full-precision intermediate colours which may be defined as being evenly distributed (on a straight line in RGB colour space) between the start and end colours. Each pixel in the block is then encoded with a 2-bit index as a selection of a nearest one of these 4 colours. So the total number of bits used to encode the 4×4 pixel block is (16 pixels*2 bits per pixel)+(2 reference colours*(5+6+5) bits per reference colour)=64 bits.
When the block of pixels is later decompressed, it is necessary just to detect the start and end colours, to interpolate two colours evenly distributed in colour space between the start and end colours, and then use those four colours in a look-up table with the 2 bit index provided for each pixel.
Referring to
It will be seen that the base texture map 300 is used in the generation of the difference image 350 but is not stored or transmitted for later use.
The texture map versions 310 . . . 340 are also subjected to DXT1 compression.
All of the data generated in
In
The basic process to generate a required texture map area 392 in an arbitrarily-sized required texture map scale 390 will now be described. In the present example, the required texture map scale is between the base size of 256×256 pixels and the next lower size of 128×128 pixels.
Two interpolation processes 400, 410 are carried out by the shader 160.
The interpolation process 400 decompresses and acts between regions 312 and 322 in two texture maps selected for this purpose by the shader 160: the texture map versions 310 (next lower from the required scale 390) and 320 (next lower again), to generate an interpolated pixel or region corresponding to the required region 392 but at one quarter of the required scale (i.e. one level down in the MIP map structure).
The interpolation process 410 decompresses and acts between regions 352, 362 in respective difference images 350, 360 at scales either side of the required scale 390. So, the interpolation of the difference data takes place using difference data at a next higher resolution than the resolutions used for the interpolation of the texture data.
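The selection of levels for the two interpolation processes may be sketched as below. The function name is hypothetical, and it is assumed for this sketch that the difference images are held at resolutions such as 256, 128, 64 and 32 pixels square, with each stored texture map version at half the resolution of the corresponding difference image.

```python
def select_levels(required, diff_sizes):
    """Pick the texture-map pair and difference-image pair for `required`.

    diff_sizes: difference-image resolutions in descending order
    (assumed layout, e.g. [256, 128, 64, 32]).
    Returns ((texture_upper, texture_lower), (diff_upper, diff_lower)).
    """
    for upper, lower in zip(diff_sizes, diff_sizes[1:]):
        if lower <= required <= upper:
            diff_pair = (upper, lower)                # either side of required
            texture_pair = (upper // 2, lower // 2)   # one level further down
            return texture_pair, diff_pair
    raise ValueError("required scale outside the stored range")
```

For a required scale of 160 pixels this yields texture maps at 128 and 64 pixels and difference images at 256 and 128 pixels, so the difference data is taken from one level higher than the texture data.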
The interpolation process 400 (i.e. the process which applies to the texture map versions 310 and 320 in the example of
The results of the two interpolation processes 400, 410 are passed to a combiner process 420, again implemented by the shader 160. The combiner process takes the two interpolated regions and combines them as follows:
subtract the offset from the difference image pixel value and divide the result by the gain factor (see above); and
add the resulting difference value to the pixel value at the corresponding pixel position in the interpolated texture map, to generate an output texture map at the required resolution.
An example of this process running in a shader programming language, along with an explanation of each command, is as follows:
half3 base=tex2Dbias (baseTex, in_uv, bias_amount).xyz;
half3 defines the type of the variable name which follows; so the variable base is a half precision three component (RGB) variable
tex2Dbias is a command to generate a pixel value from a MIP map at the currently required scale plus a bias MIP level, bias_amount (which in the above example would be 1, but could be a different number in other embodiments)
baseTex identifies the MIP map corresponding to the texture maps (in this example, the maps 310 . . . 340)
in_uv represents coordinates within a texture map
.xyz indicates that the first three components (RGB) of the argument should be passed as a result
So this command generates a required pixel using a texture map scale one lower (in the MIP chain) than would otherwise have been selected.
half3 diff=tex2D (diffTex, in_uv).xyz;
tex2D is a command to generate a pixel value from a MIP map at the required scale (no MIP level bias)
diffTex identifies the MIP map corresponding to the difference images (i.e. the images 350 . . . 380 in the example)
half3 combined=base+scale.xxx*(diff-offset.xxx);
combined is the output pixel value
scale (representing 1/gain factor) and offset have been described earlier with reference to
.xxx signifies a three component representation of the relevant variable, formed by replicating its first component
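The arithmetic of the final shader command can be mirrored outside the shader for checking purposes. The sketch below is illustrative only: the function names are hypothetical, 8 bit integer pixel values are assumed (a shader would typically work with normalised values), and the clamp and rounding are assumptions.

```python
def combine(base, diff, gain=4, offset=128):
    """Reconstruct one channel: base + (1/gain) * (diff - offset)."""
    scale = 1.0 / gain                      # "scale" represents 1/gain factor
    value = base + scale * (diff - offset)  # offset removed before scaling
    return max(0, min(255, round(value)))   # clamp to 8-bit range (assumption)


def combine_rgb(base, diff, gain=4, offset=128):
    """Apply the same reconstruction to each of the three colour components."""
    return tuple(combine(b, d, gain, offset) for b, d in zip(base, diff))
```

Note that the offset is removed before the scaling is applied, which is the inverse of the order used when the difference data was generated (gain first, then offset).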
In a possible development, for smaller required scales, representing (generally) more distant objects on which less detail can be seen, the use of the difference map could be reduced or even avoided altogether and instead just a conventional interpolation between texture maps having scales either side of the required output scale could be used. So, for example, if the required output scale were smaller than a certain level, the conventional technique could be used whereas for required output scales larger than such a threshold, the difference based technique could be used. Or a weighted sum of the conventional and new techniques' results could be used, with the weighting generally increasing in favour of the new (difference image) technique with increasing required scale. These two possibilities may be combined—i.e. a weighted and thresholded system, with the weighting applying above a threshold scale. Of course, the conventional technique would not be applicable at required scales above the second MIP level, assuming the largest texture map version were discarded as described above.
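A weighted and thresholded combination of the two techniques may be sketched as follows. The function names, the threshold of 32 pixels, the full-weight scale of 128 pixels and the linear ramp between them are all hypothetical values chosen for illustration.

```python
def difference_weight(required, threshold=32, full=128):
    """Weight of the difference-based result: 0 at or below `threshold`,
    rising linearly to 1 at `full` (illustrative values only)."""
    if required <= threshold:
        return 0.0
    if required >= full:
        return 1.0
    return (required - threshold) / (full - threshold)


def blend(conventional, difference_based, weight):
    """Weighted sum of the conventional and difference-based pixel values."""
    return (1.0 - weight) * conventional + weight * difference_based
```

Below the threshold only the conventional interpolation contributes, so in that range the difference images need not be decompressed at all.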
The storage requirements of the MIP map generated using the technique shown in
In summary, the embodiments described above provide an image data compression/decompression technique which is particularly (though not exclusively) suited to real time use in CG applications and which can provide an improved output image quality for little increase in memory or processing overhead. In particular, for similar storage and processing requirements, visible noise can be reduced (by the use of this technique compared to previous techniques) in situations where low spatial frequency image information such as smooth lighting gradients is encoded.
The above embodiments can be implemented by the data processing apparatus of
Number | Date | Country | Kind |
---|---|---|---|
0625401.5 | Dec 2006 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/GB2007/004862 | 12/18/2007 | WO | 00 | 10/20/2009 |