Real time on-chip texture decompression using shader processors

Description

BACKGROUND OF THE INVENTION
Field of the Invention

This disclosure relates generally to graphics processing, and in particular to a processing unit, method, and medium of texture decompression.

Description of the Related Art

Computer graphics processing systems process large amounts of data, typically with a graphics processing unit (GPU) performing a large percentage of the processing. A GPU is a complex integrated circuit that is configured to perform, inter alia, graphics-processing tasks. For example, a GPU may execute graphics-processing tasks required by an end-user application, such as a video-game application. The GPU may be a discrete device or may be included in the same device as another processor, such as a central processing unit (CPU).

A GPU produces the pixels that make up an image from a higher level description of its components in a process known as rendering. GPUs typically utilize a concept of continuous rendering by the use of computing elements to process pixel, texture, and geometric data. The computing elements may execute the functions of rasterizers, setup engines, color blenders, hidden surface removal, and texture mapping. These computing elements are often referred to as shaders, shader processors, shader arrays, shader pipes, shader pipe arrays, shader pipelines, or a shader engine, “shader” being a term in computer graphics referring to a set of software instructions or a program used by a graphics resource primarily to perform rendering effects. “Shader” may also refer to an actual hardware component or processor used to execute software instructions. A shader processor or program may read and render data and perform any type of processing of the data. GPUs equipped with a unified shader also simultaneously support many types of shader processing, including pixel, vertex, primitive, and generalized compute processing.

Much of the processing involved in generating complex graphics scenes involves texture data. Textures may be any of various types of data, such as color, transparency, lookup tables, or other data. In some embodiments, textures may be digitized images to be drawn onto geometric shapes to add visual detail. A large amount of detail, through the use of textures, may be mapped to the surface of a graphical model as the model is rendered to create a destination image. The purpose of texture mapping is to provide a realistic appearance on the surface of objects. Textures may specify many properties, including colors, surface properties like specular reflection or fine surface details in the form of normal or bump maps. A texture could also be image data, color or transparency data, roughness/smoothness data, reflectivity data, etc. A ‘texel’ is a texture element in the same way a ‘pixel’ is a picture element. The terms ‘texel’ and ‘pixel’ may be used interchangeably within this specification.

In 3D computer graphics, surface detail on objects is commonly added through the use of textures. For example, a 2D bitmap image of a brick wall may be applied, using texture mapping, to a set of polygons representing a 3D model of a building to give the 3D rendering of that object the appearance that it is made of bricks. Providing realistic computer graphics typically requires many high-quality, detailed textures. The use of textures can consume large amounts of storage space and bandwidth, and consequently textures may be compressed to reduce storage space and bandwidth utilization.

Texture compression has thus become a widely accepted feature of graphics hardware in general and 3D graphics hardware in particular. The goal of texture compression is to reduce storage and bandwidth costs on the graphics system while retaining as much of the quality of the original texture as possible. The compression and decompression methods described herein may be used to compress various types of texture information including image data, picture data, transparency information, smoothness or roughness data, or any other similarly structured data. As such, the term texture is used broadly herein to refer to the data being compressed or decompressed as part of a GPU.

Fixed-rate compression schemes have traditionally been used to compress textures and may generally suffer from several shortcomings as compared to variable-rate schemes. Unlike fixed-rate compression, variable-rate compression is more flexible and may allow for adjustments to quality as desired. For example, variable-rate compression may be set to achieve lossless compression. In some cases, the use of variable-rate compression schemes may provide better compression than traditional fixed-rate compression schemes. A variable-rate compression scheme, such as Joint Photographic Experts Group (JPEG), is typically not used for texture compression when on-the-fly decompression is desired due to the high complexity and implementation cost. Therefore, there is a need in the art for methods and mechanisms to enable low-cost on-the-fly decompression of variable-rate compressed textures.

In view of the above, improved processing units, methods, and mediums for performing real time decompression of compressed textures are desired.

SUMMARY OF EMBODIMENTS OF THE INVENTION

Various embodiments of processing units, methods and mediums for decompressing texture data are contemplated. In one embodiment, a first shader of a plurality of shaders may require a block of a texture to produce data used by a display device or in further processing. The first shader may be configured to calculate a virtual address of the block within an uncompressed version of the texture and convey the virtual address with a request for the block to a cache memory device. In response to determining an uncompressed version of the block is not stored in the cache, a second shader of the plurality of shaders may be initiated as a decompressing shader and the virtual address of the uncompressed version of the block may be passed to the decompressing shader. Also, in response to determining the uncompressed version of the block is not in the cache, a cache line may be allocated for the requested block.

The second shader may be configured to receive the compressed version of the block from the cache. The cache may be configured to utilize a table which maps a virtual address space of an uncompressed version of the texture to an address space of a compressed version of the texture. The cache and/or the second shader may be configured to determine the location and size of the compressed version of the block from the table. The table may also contain additional information, such as the value of the DC coefficient of a compressed version of each block of the texture.

After receiving the compressed version of the block from the cache, the second shader may be configured to decompress the compressed version of the block and then write a decompressed version of the block to the cache. After the decompressed version of the block has been written to the cache, the first shader may be configured to receive the decompressed version of the block from the cache. The first shader may then be configured to process the decompressed version of the block such that it may be applied to a rendered surface for display.

These and other features and advantages will become apparent to those of ordinary skill in the art in view of the following detailed descriptions of the approaches presented herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further advantages of the systems, methods, and mechanisms may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates one embodiment of a computer graphics system.

FIG. 2 is a block diagram of a GPU in accordance with one or more embodiments.

FIG. 3 illustrates a block diagram of one embodiment of a graphics processing system.

FIG. 4A illustrates a block diagram of one embodiment of a data cache.

FIG. 4B is a block mapping table in accordance with one or more embodiments.

FIG. 5 illustrates one embodiment of a virtual address space for an 8×8 block of texels.

FIG. 6 is a block diagram of one embodiment of a portion of data.

FIG. 7 is a generalized flow diagram illustrating one embodiment of a method to decompress a compressed block of a texture.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth to provide a thorough understanding of the methods and mechanisms presented herein. However, one having ordinary skill in the art should recognize that the various embodiments may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the approaches described herein. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.

This specification includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.

Terminology. The following paragraphs provide definitions and/or context for terms found in this disclosure (including the appended claims):

“Comprising.” This term is open-ended. As used in the appended claims, this term does not foreclose additional structure or steps. Consider a claim that recites: “A GPU comprising a plurality of shaders . . . .” Such a claim does not foreclose the GPU from including additional components (e.g., a texture unit, input/output circuitry, etc.).

“Configured To.” Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware, for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112, sixth paragraph, for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.

“First,” “Second,” etc. As used herein, these terms are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.). For example, in a processor having eight processing elements or cores, the terms “first” and “second” processing elements can be used to refer to any two of the eight processing elements. In other words, the “first” and “second” processing elements are not limited to logical processing elements 0 and 1.

Referring to FIG. 1, a block diagram of one embodiment of a computer graphics system is shown. Computer graphics system 100 includes computing system 102 and display device 114. Computing system 102 includes a graphics processing unit (GPU) 104 for processing graphics data. In some embodiments, GPU 104 may reside on a graphics card within computing system 102. GPU 104 may process graphics data to generate color and luminance values for each pixel of a frame for display on display device 114. GPU 104 may include one or more processing cores and/or an array of shaders to perform pixel manipulations.

Computing system 102 may include a software program application 108, an application programming interface (API) 110, and a driver 112, which may run on a CPU (not shown). API 110 may adhere to an industry-standard specification, such as OpenGL or DirectX. API 110 may communicate with driver 112. Driver 112 may translate standard code received from API 110 into a native format of instructions understood by GPU 104. GPU 104 may then execute the instructions received from driver 112.

Textures may be transferred to GPU 104 from system memory (not shown) or another storage device of computing system 102. In one embodiment, textures may be compressed using JPEG compression. In other embodiments, other types of variable-rate compression may be used to compress the textures. For the remainder of this specification, examples of JPEG type encoding will be used to describe the various embodiments. However, this is for illustrative purposes only, and other types of variable-rate compression may also be used with the methods and mechanisms described herein.

Driver 112 may reformat compressed textures as part of a tiling process. This reformatting may entail transcoding a JPEG-compressed texture into a hardware internal JPEG format. In other embodiments, the JPEG-compressed texture may be transcoded into other formats. The hardware internal JPEG format may contain additional information to facilitate the decompression process. For example, the hardware internal JPEG format may include a table with information on the location and sizes of the various blocks of the JPEG-compressed texture. The table may also include information on the DC coefficients of each 8×8 block of the JPEG-compressed texture. The table may further include Huffman codes, quantization tables, and other information to facilitate the decompression of the compressed texture. Driver 112 may also allocate a virtual address space for each of the compressed textures utilized by computing system 102. The size of each virtual address space may correspond to the size of the uncompressed texture.
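By way of illustration only, the following minimal sketch shows how a table of block locations might be built while compressed blocks are laid out back to back. The function name build_offset_table, the byte-string representation of blocks, and the trailing end-of-texture entry are assumptions made for this example and are not taken from the hardware internal format itself.

```python
from itertools import accumulate


def build_offset_table(compressed_blocks):
    """Return the start offset of every block, plus one final end offset.

    Assumption: blocks are stored contiguously, in order, with no padding.
    The extra last entry equals the total compressed size, so the size of
    block i is always table[i + 1] - table[i].
    """
    sizes = [len(block) for block in compressed_blocks]
    return list(accumulate(sizes, initial=0))


# Three hypothetical variable-size compressed blocks.
table = build_offset_table([b"aa", b"bbbbb", b"c"])
```

Because the blocks are variable-rate compressed, their offsets cannot be computed from the block index alone, which is why such a table may be stored alongside the texture.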

Computing system 102 will typically have various other devices/components not shown in FIG. 1, such as a CPU, buses, memory, peripheral devices, etc. For example, computing system 102 may include an I/O interface which may be coupled to other devices, such as a keyboard, printer, and mouse, in addition to display device 114. In some embodiments, computing system 102 may include a plurality of GPUs.

In another embodiment, a processor, such as GPU 104, may be defined in software. The software instructions may be stored in a computer readable storage medium and, when executed on a computing device, may define the processor. In a further embodiment, processors may comprise GPUs, CPUs, video processing units (VPUs), coprocessors, and/or other types of processors that are configured to process texture data. In some embodiments, the GPU and CPU may be separate integrated circuit devices/packages. In other embodiments, the GPU and CPU may be included in a single integrated circuit or package.

Referring to FIG. 2, a block diagram of one embodiment of a GPU 200 is shown. GPU 200 may be utilized to perform graphics-processing related tasks (e.g., using vertex shaders, geometry shaders, pixel shaders, etc.) and general-computing tasks (e.g., mathematical algorithms, physics simulations, etc.). In the example shown, GPU 200 includes shader processor array 210, command processor 212, texture memory 220, and memory controller 222 which may be configured to support direct-memory access (DMA). It is noted that the embodiment of GPU 200 depicted in FIG. 2 is for illustrative purposes only, and those skilled in the art will appreciate numerous alternative embodiments are possible. All such alternative embodiments are contemplated. Note also that GPU 200 may include many other components not shown in FIG. 2.

In the embodiment shown, shader processor array 210 comprises multiple processing units which may perform in parallel. Command processor 212 may issue commands and assign processing tasks to individual shader processors of shader processor array 210. In some embodiments, command processor 212 may include a dispatch processor (not shown) configured to divide a received workload into threads and distribute the threads among processing units of the shader processor array. Shader processor array 210 may be configured to perform various types of functions, including processing texture data and performing rendering algorithms to transform 3-dimensional texture objects into a 2-dimensional image. As noted above, shader processor array 210 may include a plurality of shader processors, and the plurality of shader processors may implement algorithms using a wide range of mathematical and logical operations on vertices and other texture data.

In some embodiments, GPU 200 may be configured to utilize one or more on-chip and/or off-chip memories for temporarily storing data. While such memories may be referred to herein as “caches”, it is noted that the use of such a term does not necessarily require any particular organization, structure or policies for such memories. For example, while such memories may utilize organizations and policies typically associated with central processing unit (CPU) caches, such as set associative organizations and replacement policies, any desired organization and/or storage policies may be utilized. In various embodiments, texture memory 220 is used for storing texture data. In such embodiments, texture memory 220 may provide faster access to certain texture data, such as texture data that is frequently used, than would be possible if the texture data were only stored in system memory 226 or local memory 230. System memory 226 may represent memory accessible by both GPU 200 and a central processing unit (CPU, not shown), while local memory 230 may represent memory which is directly accessible by only GPU 200. In various embodiments, texture memory 220 may include multiple levels in a hierarchical arrangement as is commonly known in the cache arts. The number of such cache levels included in texture memory 220 may vary from one embodiment to the next. Texture memory 220 may be implemented using a variety of memory technologies, such as static memory (e.g., SRAM), stacked-memory using dynamic memory (e.g., DRAM), or otherwise. Texture memory 220 may also include caching logic. The caching logic may be configured to cache data into texture memory 220 and to implement cache management policies that consider the relative latency and/or bandwidth of texture memory 220 versus system memory 226.

GPU 200 may also include memory controller 222. Memory controller 222 may be coupled to system memory 226 and local memory 230. Memory controller 222 may access data, such as compressed textures 228, in system memory 226. Compressed textures 228 may include a plurality of textures which may be compressed with any of a variety of variable-rate compression techniques, such as JPEG. Compressed textures 228, or portions of individual textures within compressed textures 228, may be transferred to texture memory 220 and shader processor array 210 of GPU 200 (via memory controller 222) without first being decompressed. Host driver 240 may transfer commands and data to GPU 200 via system memory 226. Local memory 230 may be utilized for storing vertex data and other data used by GPU 200, and GPU 200 may write frame data to local memory 230.

Referring now to FIG. 3, a block diagram of one embodiment of a graphics processing system is shown. Graphics processing system 300 may include shader controller 310, and shader controller 310 may assign specific graphics processing tasks to individual shader computing units within shader array 320. Shader controller 310 may perform pre-processing on graphics-processing tasks and general-computing tasks, and issue these tasks to shader array 320. Shader controller 310 may identify which processing elements of the shader array are available to process new workloads, and shader controller 310 may send the new workloads to the available processing elements of shader array 320. Shader controller 310 may keep track of which workloads are being processed by the different processing elements of the shader array, enabling a plurality of threads to execute in parallel.

Shader array 320 may include texture consuming shader 321 and decompressing shader 322, which are representative of any number and type of shader processors which may be included in shader array 320. In various embodiments, shader array 320 may include an additional shader processor which may be configured to generate texture data procedurally. Generally speaking, procedural texture generation refers to the process of generating a texture algorithmically. In various embodiments this procedural generation of texture is performed dynamically rather than in advance. Shader array 320 may be used for texture mapping and producing image data for a display device, among other tasks. As part of performing these operations, texture consuming shader 321 may issue a texture request to texture filter 330. The texture request may be for one or more portions (e.g., blocks, texels) of the texture. Texture filter 330 may generate a virtual address for the requested texture, and convey the virtual address with the request to cache 340. Cache 340 may store textures in the form of texel data associated with pixels. Some of the textures may be compressed, and some of the textures may be uncompressed.

After receiving the virtual address from texture filter 330, cache 340 may perform an address check against all known virtual address ranges to determine if the requested texture is stored in cache 340. If an uncompressed version of the requested texture is stored in cache 340, cache 340 may return the uncompressed version of the texture to texture filter 330. If the uncompressed version of the texture is not stored in cache 340, the attempted request may result in a cache miss. In response to a cache miss, decompressing shader 322 may be initiated for the purpose of decompressing a compressed version of the texture. In various embodiments, shader array 320 may receive a request from cache 340, or otherwise, to initiate a decompressing shader. Also in response to a cache miss, texture consuming shader 321 may pass the virtual address of the texture to decompressing shader 322. Resources for the decompressing shader program may be pre-allocated on decompressing shader 322 to decrease the shader start latency and simplify resource management. The request may be routed to a particular shader processor of shader array 320 based on the virtual address of the block being requested.

Cache 340 may be queried for a compressed version of the texture, and if the compressed version of the texture is stored in cache 340, the compressed version of the texture may be returned to decompressing shader 322. If the compressed version of the texture is not stored in cache 340, the compressed version of the texture may be retrieved from system memory or another location. Decompressing shader 322 may also receive additional tables, textures, and/or constants to facilitate the decompression operation. Decompressing shader 322 may decompress some additional compressed data necessary to decompress the requested texture. In the case of a JPEG-compressed texture, the texture may be transcoded from the original code to a new encoding scheme, and the new encoding scheme may be designed to make decompression more efficient. After decompressing shader 322 has received and decompressed the compressed version of the texture, texture consuming shader 321 may utilize the decompressed version of the texture for the appropriate rendering calculations. This process may continue for a plurality of textures and/or portions of textures. In another embodiment, the functions described as being performed by texture filter 330 may be performed by shader array 320, and shader array 320 may be coupled directly to cache 340.
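The request flow described above may be modeled in software as follows. This is a simplified sketch under stated assumptions: the class TextureCache and its fields are hypothetical, a Python callable stands in for the decompressing shader, and zlib is used only as a convenient stand-in for the variable-rate codec (it is not JPEG and is not part of the described embodiments).

```python
import zlib


class TextureCache:
    """Toy model of a cache keyed by the virtual address of an uncompressed block."""

    def __init__(self, compressed_store):
        self.lines = {}                     # virtual address -> decompressed bytes
        self.compressed = compressed_store  # virtual address -> compressed bytes
        self.misses = 0

    def fetch(self, vaddr, decompress):
        # Hit: an uncompressed version of the block is already cached.
        if vaddr in self.lines:
            return self.lines[vaddr]
        # Miss: hand the compressed block to the decompressing routine,
        # then store its output so later requests hit.
        self.misses += 1
        self.lines[vaddr] = decompress(self.compressed[vaddr])
        return self.lines[vaddr]


# Hypothetical block at virtual address 0x1000; zlib stands in for the codec.
store = {0x1000: zlib.compress(b"texels" * 8)}
cache = TextureCache(store)
first = cache.fetch(0x1000, zlib.decompress)   # miss, triggers decompression
second = cache.fetch(0x1000, zlib.decompress)  # hit, served from the cache
```

In this model the consumer always addresses the block by its uncompressed virtual address and never needs to know whether decompression occurred.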

Cache 340 may utilize a table to determine the address to which a given virtual address maps for the compressed versions of textures stored in cache 340. In various embodiments, the table (or portions thereof) may be stored in cache 340 or elsewhere. In one embodiment, the table may map a virtual address to another address of the compressed version of a texture. The address to which the virtual address is mapped may or may not itself be a virtual address. Numerous options for the types of addressing schemes utilized are possible and are contemplated. The table may store an offset for each block of the compressed version of the texture, wherein the offset gives the location from the beginning of the compressed version of the texture to the block. In various embodiments, the table may facilitate random access to the blocks of one or more compressed textures. The cache logic of cache 340 may determine an address of a given block in response to a request for the compressed version of the block. The cache logic may use the table to determine an offset at which the desired block is stored within a page or fetch unit of the cache. The plurality of shaders of shader array 320 may also use the table to determine the offset of a requested block of a texture. In various embodiments, cache 340 may utilize a plurality of tables with mapping information on a plurality of textures.
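As one possible illustration of the offset lookup, the sketch below converts the virtual address of an uncompressed block into the address of the corresponding compressed block. The names and the constant BLOCK_BYTES are assumptions for this example: it is assumed that uncompressed blocks are 8×8 texels of 4 bytes each and are laid out linearly in the virtual address space.

```python
BLOCK_BYTES = 8 * 8 * 4  # assumed size of one uncompressed 8x8 block, 4 bytes per texel


def compressed_address(vaddr, texture_vbase, compressed_base, offsets):
    """Map a virtual address in the uncompressed texture to a compressed-block address.

    offsets[i] is the distance from the beginning of the compressed texture
    to block i, as described for the table above.
    """
    index = (vaddr - texture_vbase) // BLOCK_BYTES
    return compressed_base + offsets[index]


# Hypothetical table for three blocks and hypothetical base addresses.
offsets = [0, 37, 90]
addr = compressed_address(0x4000 + 2 * BLOCK_BYTES, 0x4000, 0x9000, offsets)
```

Any virtual address that falls inside a block resolves to the start of that block's compressed data, which is what permits random access to individual blocks.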

After the texture data has been processed, shader array 320 may convey the image data to render unit 350. Render unit 350 may assign a specific number value that defines a unique color attribute for each pixel of an image frame. The number values may be passed to frame buffer 360 where they may be stored for use at the appropriate time, such as when they are rendered on display device 370.

On a subsequent operation, texture consuming shader 321 may be configured to perform the functions of a decompressing shader, and decompressing shader 322 may be configured to perform the functions of a texture consuming shader. Each shader processor of shader array 320 may be configured to perform a variety of functions depending on the requirements of the current operation.

In various embodiments, load balancing may be utilized to assign decompression tasks to underutilized shaders. Also, some space may be reserved in a number of compute units to allow decompression shaders to be launched on a number of compute units. Furthermore, multiple decompression requests may be packed into single instruction multiple data (SIMD) vectors. The SIMD vectors may facilitate the decompression of multiple blocks in one vector. In one embodiment, 16 blocks may be decompressed in one vector, with one block per four lanes.
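The packing arithmetic in the example above (16 blocks per vector, four lanes per block) may be sketched as follows. The resulting 64-lane width is an inference from those two figures, assuming lanes are not shared between blocks, and the helper names are hypothetical.

```python
LANES_PER_BLOCK = 4
BLOCKS_PER_VECTOR = 16
VECTOR_WIDTH = LANES_PER_BLOCK * BLOCKS_PER_VECTOR  # 64 lanes, by assumption


def pack_requests(block_addrs):
    """Group pending decompression requests into vectors of up to 16 blocks."""
    return [block_addrs[i:i + BLOCKS_PER_VECTOR]
            for i in range(0, len(block_addrs), BLOCKS_PER_VECTOR)]


def lane_assignment(lane):
    """Return (block slot within the vector, lane within that block)."""
    return divmod(lane, LANES_PER_BLOCK)


# Twenty hypothetical requests fill one full vector and part of a second.
vectors = pack_requests(list(range(20)))
```

A partially filled final vector, as in this usage, would leave some lanes idle; how such lanes are handled is not specified here.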

In various embodiments, graphics processing system 300 may enable on-the-fly procedural generation of texture data. One shader may generate on-the-fly texture data, and a second shader may utilize the generated texture data for rendering operations. A decompressing shader may access compressed data and another shader may be utilized to decompress additional data, such as one or more tables. Some of the compressed data may be compressed using a variety of compression techniques. In various embodiments, the decompressing shader may request data from the cache, and in response to a cache miss, another shader may be initiated to procedurally generate texture data.

Turning now to FIG. 4A, a block diagram of one embodiment of a data cache is shown. Cache 410 may contain portions of textures 420 and 430, which are representative of any number of portions of textures which may be stored in cache 410. Textures 420 and 430 may be compressed textures, while the plurality of textures stored in cache 410 may be a mix of compressed and uncompressed textures. Texture 420 may include blocks 422 and 423, which are representative of any number of blocks of texture 420. Texture 420 may also include table 421, which may map a virtual address space of texture 420 to an address space of compressed texture 420. Texture 430 may be organized similarly to texture 420. In another embodiment, table 421 may be stored separately from texture 420.

When a texture consuming shader requests a block of a texture from cache 410, and the request results in a cache miss, cache 410 may allocate cache line 440 for the requested block. Cache 410 may convey the address of the allocated cache line to a decompressing shader. After the decompressing shader has decompressed the compressed block corresponding to the requested block, the decompressing shader may be configured to write the decompressed block to cache line 440. Alternatively, the decompressing shader may write the decompressed block to various locations within cache 410. In response to the decompressing shader writing the decompressed block to cache line 440, the texture consuming shader may be configured to fetch the decompressed block from cache 410. The corresponding latency compensation queues may need to be extended to accommodate the larger latency resulting from the on-the-fly decompression of the compressed block.

After the decompressed version of the block has been written to cache line 440, cache 410 may store the compressed version of the block and the decompressed version of the block. In various embodiments, cache 410 may execute a retention policy that discards one of the versions of the block in response to determining both versions are stored in cache 410. In one embodiment, the decompressed version of the block may be discarded after it has been fetched by the texture consuming shader. In another embodiment, the compressed version of the block may be discarded after the decompressed version of the block has been written to cache 410. In a further embodiment, both the compressed and decompressed version of the block may be maintained in cache 410 for an extended period of time.
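The three retention alternatives described above may be expressed, purely for illustration, as a small policy function. The enumeration names, the dictionary representation of a cache entry, and the "write"/"fetch" event labels are assumptions of this sketch.

```python
from enum import Enum


class Retention(Enum):
    DROP_DECOMPRESSED_AFTER_FETCH = 1
    DROP_COMPRESSED_AFTER_WRITE = 2
    KEEP_BOTH = 3


def apply_retention(entry, policy, event):
    """Discard one version of a block when both are cached.

    entry is a dict that may hold 'compressed' and/or 'decompressed' data;
    event is 'write' (decompressed block written) or 'fetch' (block consumed).
    """
    if "compressed" not in entry or "decompressed" not in entry:
        return entry  # nothing to discard unless both versions are present
    if policy is Retention.DROP_COMPRESSED_AFTER_WRITE and event == "write":
        del entry["compressed"]
    elif policy is Retention.DROP_DECOMPRESSED_AFTER_FETCH and event == "fetch":
        del entry["decompressed"]
    return entry


# Hypothetical cache entry holding both versions of a block.
kept = apply_retention({"compressed": b"c", "decompressed": b"d"},
                       Retention.KEEP_BOTH, "fetch")
```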

In response to a request for an uncompressed version of a block of a texture, cache 410 may determine that the uncompressed version is not stored in cache 410. In various embodiments, in response to such a determination, cache 410 may automatically search for the compressed version of the block. If the compressed version of the block is stored in cache 410, cache 410 may notify a shader or other processing unit and/or cache 410 may convey the compressed version of the block to the shader or other processing unit.

In some embodiments, in response to a cache miss on a request for an uncompressed block, a separate software thread may be started, and the thread may initiate a decompressing shader. The texture consuming shader may convey the virtual address of the block to the decompressing shader. In various embodiments, when the shader finishes the decompression task, the decompressing shader may convey the uncompressed block(s) to the cache. In other embodiments, when the decompressing shader finishes the decompression operation, the decompressing shader may convey the shader output to the texture consuming shader.

Referring now to FIG. 4B, a block diagram of one embodiment of a block mapping table is shown. Table 421 may store mapping information for the plurality of blocks of texture 420 (of FIG. 4A). In various embodiments, table 421 may be organized in a variety of ways with other types of information in addition to what is illustrated in FIG. 4B. For example, in one embodiment, table 421 may include a DC coefficient value for each block of texture 420.

Table 421 may map the virtual address space of texture 420 to the physical address space of compressed texture 420 (of FIG. 4A). A decompressing shader (not shown) may fetch or otherwise receive one or more blocks of texture 420 from cache 410, and the decompressing shader may determine the location and size of the compressed blocks from table 421. The size of a compressed block may be determined by calculating the difference between the starting physical addresses of two adjacent blocks. In other embodiments, additional data may be provided to indicate size and/or location information for blocks. Further, the decompression shader may obtain additional information from table 421, such as a DC coefficient value of each block.
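The size calculation described above may be sketched as follows. The function name `compressed_block_extent` and the extra end-of-data entry at the end of the table are assumptions of this sketch; the disclosure states only that a size may be obtained as the difference between the starting addresses of two adjacent blocks.

```python
def compressed_block_extent(start_addresses, block_index):
    """Return (start, size) of a compressed block.

    start_addresses holds the starting address of every block in order,
    followed by one extra end-of-data entry (an assumption of this sketch)
    so that the last block also has a successor to subtract from.
    """
    if not 0 <= block_index < len(start_addresses) - 1:
        raise IndexError("block index out of range")
    start = start_addresses[block_index]
    size = start_addresses[block_index + 1] - start
    return start, size


# Hypothetical table for three blocks; the last entry marks the end of data.
table = [0, 96, 250, 400]
print(compressed_block_extent(table, 1))  # (96, 154)
```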

In some embodiments, the texture may be organized according to superblocks. A superblock may be a set of 16 8×8 blocks, which is a tile of 32×32 pixels, for a total of 1024 pixels. The index table for the texture may include a table entry for each superblock, and each table entry may give the address of the start of each superblock. In one embodiment, this address may be the location of the superblock within the texture. In another embodiment, this address may be an offset from the start of the texture. Each entry may also include a 4-bit index of the first 8×8 block belonging to the superblock. In some embodiments, superblocks may not be aligned with 2 kilobit (Kb) boundaries of the cache. Each entry may also include a 16-bit mask. The 16-bit mask may include one bit per block indicating whether that block starts in the next 2 Kb word.
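One possible reading of such an index table entry is sketched below in Python. The field names, the `SuperblockEntry` class, and in particular the interpretation of the mask (bit i set meaning that block i starts one 2 Kb word later than block i-1) are assumptions made for illustration; the disclosure does not fix an exact encoding.

```python
from dataclasses import dataclass


@dataclass
class SuperblockEntry:
    base_address: int    # address of the start of the superblock
    first_block: int     # 4-bit index of the first 8x8 block
    next_word_mask: int  # 16-bit mask, one bit per block


def fetch_unit_offset(entry, block):
    """Count how many 2 Kb words past the base the given block starts in.

    Assumption of this sketch: bit i of the mask means "block i starts in
    the next 2 Kb word relative to block i-1", so the offset is the number
    of set bits in positions 0..block.
    """
    if not 0 <= block < 16:
        raise ValueError("block index must be in 0..15")
    low_bits = entry.next_word_mask & ((1 << (block + 1)) - 1)
    return bin(low_bits).count("1")
```

With a mask in which only bits 4 and 8 are set, blocks 0-3 would start in the first word, blocks 4-7 in the second, and blocks 8-15 in the third under this interpretation.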

In some embodiments, the decompressing shader may transform the virtual address of the 8×8 block into the virtual address of a 32×32 superblock to calculate an entry number of the index table for lookup purposes. The decompressing shader may look up the entry of the index table corresponding to the superblock. The index table may be processed by a shader in a similar manner as other textures. The entries of the index table may be cached and processed.
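The entry-number calculation may be sketched as follows. For simplicity, the sketch works from texel coordinates rather than raw virtual addresses, and it assumes row-major ordering of superblocks and of blocks within a superblock, as well as a texture width that is a multiple of 32. These layout choices and the function name `superblock_entry` are assumptions, not details taken from the disclosure.

```python
BLOCK_SIZE = 8        # texels per block edge
SUPERBLOCK_SIZE = 32  # texels per superblock edge (4 x 4 blocks)
BLOCKS_PER_EDGE = SUPERBLOCK_SIZE // BLOCK_SIZE


def superblock_entry(x, y, texture_width):
    """Map texel coordinates to (index-table entry, block within superblock)."""
    superblocks_per_row = texture_width // SUPERBLOCK_SIZE
    entry = (y // SUPERBLOCK_SIZE) * superblocks_per_row + (x // SUPERBLOCK_SIZE)
    bx = (x % SUPERBLOCK_SIZE) // BLOCK_SIZE  # block column inside the superblock
    by = (y % SUPERBLOCK_SIZE) // BLOCK_SIZE  # block row inside the superblock
    return entry, by * BLOCKS_PER_EDGE + bx


# A 128-texel-wide texture has 4 superblocks per row.
print(superblock_entry(40, 35, 128))  # (5, 1)
```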

From each index table entry, the shader may obtain the base address, which may be a virtual address. The base address may be of the first fetch unit of the compressed superblock. The shader may also obtain the offset of the fetch unit containing the requested block which needs to be decompressed. The shader may also determine whether the block is compressed based on the address of the block. Certain address ranges may correspond to virtual addresses of uncompressed blocks, and other address ranges may correspond to physical addresses of compressed blocks. The shader may be able to distinguish between the different address ranges.
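Distinguishing the address ranges may be as simple as a range test. In the sketch below, the function name and the example range boundaries are hypothetical; an actual implementation would use whatever ranges the system assigns.

```python
def is_compressed_address(address, compressed_ranges):
    """Return True if the address lies in any half-open [start, end) range
    that has been reserved for compressed blocks."""
    return any(start <= address < end for start, end in compressed_ranges)


# Hypothetical layout: one range reserved for compressed data.
COMPRESSED_RANGES = [(0x8000_0000, 0x9000_0000)]
```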

Referring now to FIG. 5, a block diagram of one embodiment of a virtual address space for an 8×8 block of texels is shown. Each texel may be mapped to a unique address within virtual address space 570. Texel 1 may be mapped to address 501, texel 2 may be mapped to address 502, and so on, for all 64 texels of 8×8 block 500. Block 500 may be a block within a compressed texture, and virtual address space 570 may be allocated for block 500 of the compressed texture. The texture may include a plurality of blocks in addition to block 500. Virtual address space 570 may also include a unique address for each texel of the plurality of blocks in the texture.

For purposes of illustration, it will be assumed that an uncompressed texel is a 32-bit value (4 sets of 8-bit values). Other sizes of uncompressed texels may also be utilized with the methods and mechanisms described herein. For example, an uncompressed texel with a 24-bit value may be handled in a similar way. In various embodiments, a texture consuming shader may generate requests for individual texels. First, the shader may compute the virtual address of a texel. Then, the cache may be queried for the virtual address corresponding to the texel.
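The address computation performed by the texture consuming shader may be sketched as follows for 32-bit texels. The sketch assumes that blocks are laid out contiguously in row-major order, that texels are row-major within each block, and that the texture width is a multiple of 8; these layout choices and the function name are assumptions made for illustration.

```python
TEXEL_BYTES = 4  # 32-bit uncompressed texel
BLOCK_EDGE = 8   # 8x8 block of texels
BLOCK_BYTES = TEXEL_BYTES * BLOCK_EDGE * BLOCK_EDGE  # 256 bytes = 2048 bits


def texel_virtual_address(base, x, y, texture_width):
    """Return the virtual byte address of texel (x, y) in the uncompressed view."""
    blocks_per_row = texture_width // BLOCK_EDGE
    block = (y // BLOCK_EDGE) * blocks_per_row + (x // BLOCK_EDGE)
    within = (y % BLOCK_EDGE) * BLOCK_EDGE + (x % BLOCK_EDGE)
    return base + block * BLOCK_BYTES + within * TEXEL_BYTES
```

With this layout, all 64 texels of a block fall within one contiguous 256-byte range, so a single 2 Kb cache line may hold an entire uncompressed block.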

Turning now to FIG. 6, a block diagram of one embodiment of compressed data is shown. Data portion 605 may be a unit of fetch of the compressed data, and the size of data portion 605 may be based on the size of an uncompressed block. In one embodiment, a fetch unit may be of size 2 Kb. In other embodiments, a fetch unit may be any of various sizes. A plurality of compressed blocks may be packed into a fetch unit. In one embodiment, the maximum number of blocks that may be packed into a fetch unit may be assumed to be 16. In other embodiments, other numbers of blocks may be packed into a fetch unit. For one type of cache access scheme, it may be assumed that the data of the blocks do not cross boundaries of fetch units.

A block may be the smallest decodable unit of a compression format, such as JPEG. For JPEG, the block is an 8×8 pixel tile (with 64 pixels). When a texture is compressed, and a block of the texture requested by a shader needs to be decompressed, a cache line may be allocated in the cache for the block. In one embodiment, the cache line size may be 2 Kb to store an entire uncompressed block (32 bits*64=2 Kb). In other embodiments, the cache line size may be any of various sizes.

If a fetch unit contains an uncompressed block, then only one block may fit in the fetch unit. For a fetch unit containing compressed blocks, the fetch unit may also include a 176-bit header. The fetch unit may be assumed to have a capacity of 16 blocks. The header may include 16 11-bit offset values to indicate the locations of the compressed blocks within the fetch unit. The offsets reference the starting bit positions of the blocks. In other embodiments, there may be a variable number of offset indicators in the header.

As shown in FIG. 6, data portion 605 may include header 610 and blocks 611-626. Blocks 611-626 may be sixteen different blocks of a compressed texture. Header 610 may include offsets 631-646. Each offset may be an 11-bit offset value corresponding to the location of the corresponding block within data portion 605. In other embodiments, other bit-sizes of offset values may be utilized. Offset 631 may represent the starting address of block 611, offset 632 may represent the starting address of block 612, and so on. In some embodiments, there may be an additional offset indicating the last bit of the last block, to reduce unnecessary fetches from the cache.
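The header layout may be illustrated with a short Python sketch. An 11-bit offset can address any of the 2048 bit positions in a 2 Kb fetch unit, and sixteen such offsets occupy 176 bits. The bit ordering (first offset in the most significant bits) and the function names are assumptions of this sketch; the disclosure does not specify how the offsets are packed.

```python
OFFSET_BITS = 11
NUM_OFFSETS = 16
HEADER_BITS = OFFSET_BITS * NUM_OFFSETS  # 176 bits


def pack_header(offsets):
    """Pack 16 bit-offsets into a 176-bit integer, first offset in the top bits."""
    if len(offsets) != NUM_OFFSETS:
        raise ValueError("expected exactly 16 offsets")
    header = 0
    for off in offsets:
        if not 0 <= off < (1 << OFFSET_BITS):
            raise ValueError("offset does not fit in 11 bits")
        header = (header << OFFSET_BITS) | off
    return header


def unpack_header(header):
    """Recover the 16 offsets from a packed 176-bit header."""
    mask = (1 << OFFSET_BITS) - 1
    return [
        (header >> (OFFSET_BITS * (NUM_OFFSETS - 1 - i))) & mask
        for i in range(NUM_OFFSETS)
    ]
```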

In some embodiments, compressed 8×8 blocks of the texture may be packed and cross fetch unit boundaries. The corresponding information, showing that the block uses two fetch units, may be stored in an index table, and a decompressing shader may generate two fetches instead of one for blocks that cross fetch unit boundaries.
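Whether one or two fetches are needed follows from the block's starting bit position and size. The sketch below assumes bit positions are counted from the start of the compressed data and that fetch units are 2 Kb; the function name is hypothetical.

```python
FETCH_UNIT_BITS = 2048  # 2 Kb fetch unit


def fetch_units_for_block(start_bit, size_bits):
    """Return the indices of the fetch units touched by a compressed block."""
    if size_bits <= 0:
        raise ValueError("block size must be positive")
    first = start_bit // FETCH_UNIT_BITS
    last = (start_bit + size_bits - 1) // FETCH_UNIT_BITS
    return list(range(first, last + 1))


print(fetch_units_for_block(2000, 40))   # [0]
print(fetch_units_for_block(2000, 100))  # [0, 1]
```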

Turning now to FIG. 7, one embodiment of a method for decompressing a compressed block of a texture is shown. For purposes of discussion, the steps in this embodiment are shown in sequential order. It should be noted that in various embodiments of the method described below, one or more of the elements described may be performed concurrently, in a different order than shown, or may be omitted entirely. Other additional elements may also be performed as desired.

The method 700 starts in block 705, and then in block 710, a first shader of a plurality of shaders may determine the need for a block of a texture as part of the rendering operations for an image. The first shader may be a texture consuming shader. Next, the first shader may calculate the virtual address of the block (block 715). The first shader may have an uncompressed view of the texture, corresponding to the uncompressed version of the texture, and the virtual address may correspond to the location of the requested block within the uncompressed view. After block 715, the first shader may request the block from the cache and convey the virtual address with the request (block 720). Next, the cache may determine if an uncompressed version of the block is stored in the cache (conditional block 725). If the uncompressed version of the block is stored in the cache, the first shader may receive the uncompressed version of the block from the cache and process the block (block 770).

If the uncompressed version of the block is not stored in the cache, a second shader of the plurality of shaders may be initiated as a decompressing shader (block 730). The resources for the decompressing shader may be pre-allocated on one or more shader processors to decrease the shader start latency and simplify resource management. Also, the virtual address of the requested block may be passed from the first shader to the second shader. Next, a cache line may be allocated for the requested block (block 735). Then, the cache may determine if a compressed version of the block is stored in the cache (conditional block 740). In various embodiments, the cache may make this determination in response to a request by the second shader for the compressed version of the block. In other embodiments, the cache may make this determination automatically in response to determining the uncompressed version of the block is not stored in the cache (conditional block 725).

If the compressed version of the block is stored in the cache (conditional block 740), then the cache and/or second shader may determine the location and size of the compressed version of the block from the table (block 750). If the compressed version of the block is not stored in the cache (conditional block 740), then the compressed version of the block may be fetched (e.g., from local or system memory) and stored in the cache (block 745). Fetching the compressed version of the block from system memory may entail fetching the entire compressed texture or some portion of the texture. The cache may be configured to utilize a table which maps the virtual address space of an uncompressed version of the texture to an address space of a compressed version of the texture. The cache and/or second shader may determine the location and size of the compressed version of the block from the table (block 750). The table may also contain additional information, such as the value of the DC coefficient of a compressed version of each block of the texture. After block 750, the compressed version of the block may be conveyed to the second shader from the cache (block 755).

In another embodiment, if the compressed version of the block is not in the cache (conditional block 740), steps 745, 750, and 755 may be replaced with alternate steps. In the alternate steps, the compressed version of the block may be fetched from system memory and provided directly to the second shader. These alternate steps may be more efficient than having the second shader receive the compressed version of the block from the cache. In a further embodiment, the compressed version of the block may be fetched from system memory and provided directly to the second shader while also being written to the cache.

After the second shader receives the compressed version of the block (block 755), the second shader may decompress the compressed version of the block (block 760). Next, the second shader may write the decompressed version of the block to the cache (block 765). Then, the first shader may receive the decompressed version of the block from the cache and process the block as part of the rendering operations for the current image (block 770). After block 770, the method may end in block 775. Method 700 may be repeated for a plurality of blocks from a plurality of textures.
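The overall flow of method 700 for a single block may be summarized in a short Python sketch. The dictionaries standing in for the cache and for memory, the function name `request_block`, and the use of `zlib` in place of a texture codec such as JPEG are all simplifications assumed for illustration; the table lookup of block 750 is reduced to a direct dictionary access.

```python
import zlib


def request_block(vaddr, cache, memory):
    """Walk the steps of method 700 for one block.

    cache:  dict keyed by ("u", vaddr) for uncompressed and ("c", vaddr)
            for compressed versions of a block
    memory: dict mapping vaddr to compressed bytes (stands in for local
            or system memory)
    """
    # Conditional block 725: is the uncompressed version already cached?
    if ("u", vaddr) in cache:
        return cache[("u", vaddr)]           # block 770

    # Blocks 730/735: a decompressing shader is initiated and a cache line
    # is allocated (both implicit in this sketch).
    # Conditional block 740: is the compressed version cached?
    if ("c", vaddr) not in cache:
        cache[("c", vaddr)] = memory[vaddr]  # block 745: fetch and store

    compressed = cache[("c", vaddr)]         # blocks 750/755
    block = zlib.decompress(compressed)      # block 760 (zlib is a stand-in)
    cache[("u", vaddr)] = block              # block 765
    return block                             # block 770
```

A second request for the same virtual address returns directly from the cache at conditional block 725 without invoking the decompression path.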

Although the features and elements are described in the example embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the example embodiments or in various combinations with or without other features and elements. The present invention may be implemented in a computer program or firmware tangibly embodied in a non-transitory computer-readable storage medium having machine readable instructions for execution by a machine, a processor, and/or any general purpose computer for use with or by any non-volatile memory device. The computer-readable storage medium may contain program instructions which are operable to enable the functions, methods, and operations described in this specification. Suitable processors include, by way of example, both general and special purpose processors.

Typically, a processor will receive instructions and data from a read only memory (ROM), a RAM, and/or a storage device having stored software or firmware. Storage devices suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, read only memories (ROMs), magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).

The above described embodiments may be designed in software using a hardware description language (HDL) such as Verilog or VHDL. The HDL-design may model the behavior of an electronic system, and the design may be synthesized and ultimately fabricated into a hardware device. In addition, the HDL-design may be stored in a computer product and loaded into a computer system prior to hardware manufacture.

Types of hardware components, processors, or machines which may be used by or in conjunction with the present invention include Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), microprocessors, or any integrated circuit. Such processors may be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions (such instructions capable of being stored on a computer-readable medium). The results of such processing may be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements aspects of the methods and mechanisms described herein.

Software instructions, such as those used to implement image rendering calculations and shader tasks, may be stored on a computer-readable storage medium. A computer-readable storage medium may include any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The computer-readable storage medium may include, but is not limited to, magnetic or optical media (e.g., disk (fixed or removable), tape, CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, or Blu-Ray), RAM (e.g., synchronous dynamic RAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM, low-power DDR (LPDDR2, etc.) SDRAM, Rambus DRAM (RDRAM), static RAM (SRAM)), ROM, non-volatile memory (e.g. Flash memory) accessible via a peripheral interface such as the USB interface, micro-electro-mechanical systems (MEMS), and storage media accessible via a communication medium such as a network and/or a wireless link.

Although several embodiments of approaches have been shown and described, it will be apparent to those of ordinary skill in the art that a number of changes, modifications, or alterations to the approaches as described may be made. Changes, modifications, and alterations should therefore be seen as within the scope of the methods and mechanisms described herein. It should also be emphasized that the above-described embodiments are only non-limiting examples of implementations.

Claims

1. An apparatus comprising: a plurality of processing elements comprising circuitry configured to process data in parallel; and circuitry configured to: process a first graphics processing task by a first processing element of the plurality of processing elements, wherein the first graphics processing task processes a plurality of blocks of a texture to produce image data for a display device; and based at least in part on a determination that multiple blocks of the plurality of blocks of the texture do not have an uncompressed version in the apparatus: generate a vector by packing multiple decompression requests targeting the multiple blocks into the vector; and process the multiple decompression requests by a second processing element of the plurality of processing elements.
2. The apparatus as recited in claim 1, wherein the vector is a single instruction multiple data (SIMD) vector.
3. The apparatus as recited in claim 1, wherein each of the multiple decompression requests of the vector is processed in parallel by multiple lanes of a plurality of lanes of the second processing element.
4. The apparatus as recited in claim 3, wherein prior to generating the vector, the circuitry is further configured to transcode a first compressed version of the multiple blocks to a second compressed version of the multiple blocks.
5. The apparatus as recited in claim 3, wherein, based at least in part on a determination that the second processing element has completed the multiple decompression requests, the circuitry is further configured to send the uncompressed version of the multiple blocks to the first processing element to complete the first graphics processing task.
6. The apparatus as recited in claim 4, wherein the circuitry is further configured to insert, in the multiple decompression requests when generating the vector, identification that specifies a storage location of the second compressed version of the multiple blocks.
7. The apparatus as recited in claim 1, wherein the apparatus is configured to: maintain a first virtual address space for a compressed version of the texture; and maintain a second virtual address space different from the first virtual address space for an uncompressed version of the texture.
8. A method for decompressing texture data, the method comprising: processing, by circuitry of a processing unit, a first graphics processing task by a first processing element of a plurality of processing elements, wherein the first graphics processing task processes a plurality of blocks of a texture to produce image data for a display device; and based at least in part on a determination that multiple blocks of the plurality of blocks of the texture do not have an uncompressed version available: generating a vector by packing multiple decompression requests targeting the multiple blocks into the vector; and processing the multiple decompression requests by a second processing element of the plurality of processing elements.
9. The method as recited in claim 8, wherein the vector is a single instruction multiple data (SIMD) vector.
10. The method as recited in claim 8, wherein each of the multiple decompression requests of the vector is processed in parallel by multiple lanes of a plurality of lanes of the second processing element.
11. The method as recited in claim 10, wherein prior to generating the vector, the method further comprises transcoding a first compressed version of the multiple blocks to a second compressed version of the multiple blocks.
12. The method as recited in claim 11, further comprising inserting, in the multiple decompression requests when generating the vector, identification that specifies a storage location of the second compressed version of the multiple blocks.
13. The method as recited in claim 10, wherein, in response to determining that the second processing element has completed the multiple decompression requests, the method further comprises sending the uncompressed version of the multiple blocks to the first processing element to complete the first graphics processing task.
14. The method as recited in claim 13, further comprising: processing, by the second processing element, a second graphics processing task; and processing, by the first processing element, a vector comprising multiple decompression requests targeting multiple blocks of a texture of the second graphics processing task.
15. A computing system comprising: a processing unit; and a cache; wherein the processing unit comprises: a plurality of processing elements configured to process data in parallel; circuitry configured to: process a graphics processing task by a first processing element of the plurality of processing elements, wherein the graphics processing task processes a plurality of blocks of a texture to produce image data for a display device; and based at least in part on a determination that multiple blocks of the plurality of blocks of the texture do not have an uncompressed version in the cache: generate a vector by packing multiple decompression requests targeting the multiple blocks into the vector; and process the multiple decompression requests by a second processing element of the plurality of processing elements.
16. The computing system as recited in claim 15, wherein the vector is a single instruction multiple data (SIMD) vector.
17. The computing system as recited in claim 15, wherein each of the multiple decompression requests of the vector is processed in parallel by multiple lanes of a plurality of lanes of the second processing element.
18. The computing system as recited in claim 17, wherein prior to generating the vector, the circuitry is further configured to transcode a first compressed version of the multiple blocks to a second compressed version of the multiple blocks.
19. The computing system as recited in claim 18, wherein the circuitry is further configured to insert, in the multiple decompression requests when generating the vector, identification that specifies a storage location of the second compressed version of the multiple blocks.
20. The computing system as recited in claim 17, wherein, based at least in part on a determination that the second processing element has completed the multiple decompression requests, the circuitry is further configured to send the uncompressed version of the multiple blocks to the first processing element to complete the graphics processing task.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/709,462, now U.S. Pat. No. 11,043,010, entitled “REAL TIME ON-CHIP TEXTURE DECOMPRESSION USING SHADER PROCESSORS”, filed Dec. 10, 2019, which is a continuation of U.S. patent application Ser. No. 15/181,130, now U.S. Pat. No. 10,510,164, entitled “REAL TIME ON-CHIP TEXTURE DECOMPRESSION USING SHADER PROCESSORS”, filed Jun. 13, 2016, which is a continuation of U.S. patent application Ser. No. 13/163,071, now U.S. Pat. No. 9,378,560, entitled “REAL TIME ON-CHIP TEXTURE DECOMPRESSION USING SHADER PROCESSORS”, filed Jun. 17, 2011, the entirety of which is incorporated herein by reference.

20110173476	Reed	Jul 2011	A1
20110211036	Tran	Sep 2011	A1
20110213928	Grube et al.	Sep 2011	A1
20110216069	Keall et al.	Sep 2011	A1
20110221743	Keall et al.	Sep 2011	A1
20110231629	Shiraishi	Sep 2011	A1
20110231722	Mukherjee et al.	Sep 2011	A1
20110242125	Hall et al.	Oct 2011	A1
20110243469	McAllister et al.	Oct 2011	A1
20110267346	Howson	Nov 2011	A1
20110279294	Sagar	Nov 2011	A1
20110285913	Astrachan	Nov 2011	A1
20110310105	Koneru et al.	Dec 2011	A1
20110317766	Lim et al.	Dec 2011	A1
20120014597	Matsunaga	Jan 2012	A1
20120076297	Koziol et al.	Mar 2012	A1
20120084058	Sowerby et al.	Apr 2012	A1
20120092353	Paltashev et al.	Apr 2012	A1
20120102277	Kim et al.	Apr 2012	A1
20120131596	Lefebvre et al.	May 2012	A1
20120166757	Volvovski et al.	Jun 2012	A1
20120194527	Hartog et al.	Aug 2012	A1
20120194562	Ivashin	Aug 2012	A1
20120198106	Yang	Aug 2012	A1
20120256771	Mitchem et al.	Oct 2012	A1
20120268558	Choi et al.	Oct 2012	A1
20120275725	Kelly et al.	Nov 2012	A1
20120284587	Yu et al.	Nov 2012	A1
20120290868	Gladwin et al.	Nov 2012	A1
20120301046	Wallace	Nov 2012	A1
20120320067	Iourcha	Dec 2012	A1
20120320069	Lee et al.	Dec 2012	A1
20130018932	Bhaskar et al.	Jan 2013	A1
20130034309	Nystad	Feb 2013	A1
20130036290	Nystad	Feb 2013	A1
20130044961	Amit et al.	Feb 2013	A1
20130091500	Earl et al.	Apr 2013	A1
20130107942	Chen et al.	May 2013	A1
20130166260	Fan et al.	Jun 2013	A1
20130169642	Frascati et al.	Jul 2013	A1
20130191649	Muff et al.	Jul 2013	A1
20130195352	Nystad	Aug 2013	A1
20130251256	Deng et al.	Sep 2013	A1
20130262538	Wegener	Oct 2013	A1
20130278617	Strom et al.	Oct 2013	A1
20130339322	Amit et al.	Dec 2013	A1
20140063016	Howson et al.	Mar 2014	A1
20140092091	Li et al.	Apr 2014	A1
20140132614	Landsberger et al.	May 2014	A1
20140139513	Mammou	May 2014	A1
20140146872	Du et al.	May 2014	A1
20140157285	Chung et al.	Jun 2014	A1
20140177971	Strom et al.	Jun 2014	A1
20140189281	Sokol, Jr.	Jul 2014	A1
20140192891	Alshina et al.	Jul 2014	A1
20140198122	Grossman	Jul 2014	A1
20140210840	Ellis	Jul 2014	A1
20140258997	Lim et al.	Sep 2014	A1
20140267355	Kilgariff et al.	Sep 2014	A1
20140297962	Rozas	Oct 2014	A1
20140325266	Hoffman et al.	Oct 2014	A1
20140354641	Flordal	Dec 2014	A1
20140359219	Evans et al.	Dec 2014	A1
20140369614	Fenney	Dec 2014	A1
20140372722	Flordal et al.	Dec 2014	A1
20150012506	Bhagavan et al.	Jan 2015	A1
20150019813	Loh	Jan 2015	A1
20150019834	Loh	Jan 2015	A1
20150022520	Kim et al.	Jan 2015	A1
20150046672	Sych	Feb 2015	A1
20150046678	Moloney et al.	Feb 2015	A1
20150049110	Lum et al.	Feb 2015	A1
20150067008	Kamath	Mar 2015	A1
20150116342	Haase et al.	Apr 2015	A1
20150125085	Gupta	May 2015	A1
20150178214	Alameldeen et al.	Jun 2015	A1
20150248292	Koker et al.	Sep 2015	A1
20150262385	Satoh et al.	Sep 2015	A1
20150269181	Kuettel et al.	Sep 2015	A1
20150324228	Alsup et al.	Nov 2015	A1
20150338904	Henry et al.	Nov 2015	A1
20150370488	Watanabe et al.	Dec 2015	A1
20150373353	Jeong	Dec 2015	A1
20150378733	Beylin et al.	Dec 2015	A1
20150378734	Hansen et al.	Dec 2015	A1
20150378741	Lukyanov et al.	Dec 2015	A1
20150379682	Golas et al.	Dec 2015	A1
20150379684	Ramani	Dec 2015	A1
20150381202	Satpathy	Dec 2015	A1
20160004642	Sugimoto et al.	Jan 2016	A1
20160035128	Zhao et al.	Feb 2016	A1
20160055093	Turner et al.	Feb 2016	A1
20160055094	Patsilaras et al.	Feb 2016	A1
20160078665	Kwon et al.	Mar 2016	A1
20160078666	Park et al.	Mar 2016	A1
20160086299	Sharma et al.	Mar 2016	A1
20160092371	Shanbhogue	Mar 2016	A1
20160117246	Maurice et al.	Apr 2016	A1
20160140688	Lee	May 2016	A1
20160198164	Lin et al.	Jul 2016	A1
20160239209	Malyugin et al.	Aug 2016	A1
20160248440	Greenfield	Aug 2016	A1
20160260228	Chen et al.	Sep 2016	A1
20160260246	Oldcorn et al.	Sep 2016	A1
20160283390	Coulson	Sep 2016	A1
20160291984	Lu et al.	Oct 2016	A1
20160300320	Iourcha	Oct 2016	A1
20160306738	Bak et al.	Oct 2016	A1
20160321182	Grubisic et al.	Nov 2016	A1
20160323103	Baptist et al.	Nov 2016	A1
20160323407	de los Reyes Darias et al.	Nov 2016	A1
20160342545	Arai et al.	Nov 2016	A1
20160353122	Krajcevski	Dec 2016	A1
20160364830	Deng et al.	Dec 2016	A1
20160379337	Tsai et al.	Dec 2016	A1
20170032543	Cho	Feb 2017	A1
20170084055	Kwon	Mar 2017	A1
20170115924	Abali et al.	Apr 2017	A1
20170163284	Conway et al.	Jun 2017	A1
20170185451	Mirza et al.	Jun 2017	A1
20170185533	Rozas	Jun 2017	A1
20170186224	Diard	Jun 2017	A1
20170221256	Maksymczuk	Aug 2017	A1
20170256020	Sansottera et al.	Sep 2017	A1
20170256025	Abraham	Sep 2017	A1
20170257513	Matsumoto	Sep 2017	A1
20170269851	Oportus Valenzuela et al.	Sep 2017	A1
20170278215	Appu et al.	Sep 2017	A1
20170287098	Strugar et al.	Oct 2017	A1
20170344485	Chen et al.	Nov 2017	A1
20170345125	Golas	Nov 2017	A1
20170371792	Oportus Valenzuela	Dec 2017	A1
20170371797	Oportus Valenzuela	Dec 2017	A1
20180004532	Yasin et al.	Jan 2018	A1
20180012392	Kryachko	Jan 2018	A1
20180048532	Poort et al.	Feb 2018	A1
20180052773	Lin	Feb 2018	A1
20180060235	Yap et al.	Mar 2018	A1
20180074827	Mekkat	Mar 2018	A1
20180082398	Ashkar et al.	Mar 2018	A1
20180082470	Nijasure et al.	Mar 2018	A1
20180084269	Qiu	Mar 2018	A1
20180088822	Alameldeen et al.	Mar 2018	A1
20180089091	Akenine-Moller et al.	Mar 2018	A1
20180095823	Fahim	Apr 2018	A1
20180103261	Sun et al.	Apr 2018	A1
20180108331	Chao	Apr 2018	A1
20180114290	Paltashev et al.	Apr 2018	A1
20180150991	Tannenbaum	May 2018	A1
20180165210	Sethuraman	Jun 2018	A1
20180165786	Bourd et al.	Jun 2018	A1
20180165789	Gruber et al.	Jun 2018	A1
20180165790	Schneider	Jun 2018	A1
20180173623	Koob et al.	Jun 2018	A1
20180182155	Mirza et al.	Jun 2018	A1
20180189924	Gould et al.	Jul 2018	A1
20180217930	Koob et al.	Aug 2018	A1
20180225224	Senior et al.	Aug 2018	A1
20180239705	Heirman et al.	Aug 2018	A1
20180246655	Schmit	Aug 2018	A1
20180247387	Riguer	Aug 2018	A1
20180292897	Wald et al.	Oct 2018	A1
20180293695	Sharma et al.	Oct 2018	A1
20180293697	Ray et al.	Oct 2018	A1
20180293701	Appu et al.	Oct 2018	A1
20180293703	Vaidyanathan et al.	Oct 2018	A1
20180300857	Appu et al.	Oct 2018	A1
20180300903	Gierach et al.	Oct 2018	A1
20180301125	Haraden	Oct 2018	A1
20180308211	Surti et al.	Oct 2018	A1
20180308280	Surti et al.	Oct 2018	A1
20180308285	Doyle et al.	Oct 2018	A1
20180329830	Senior et al.	Nov 2018	A1
20180349315	Heggelund et al.	Dec 2018	A1
20190034333	Sazegari et al.	Jan 2019	A1
20190042410	Gould et al.	Feb 2019	A1
20190052553	Subramanian	Feb 2019	A1
20190052913	Hachfeld	Feb 2019	A1
20190066352	Kazakov	Feb 2019	A1
20190068974	Pohl	Feb 2019	A1
20190087305	Mola	Mar 2019	A1
20190095331	Diamand	Mar 2019	A1
20190096095	Veernapu et al.	Mar 2019	A1
20190102178	Zbiciak	Apr 2019	A1
20190102300	Blankenship et al.	Apr 2019	A1
20190102324	Ozsoy et al.	Apr 2019	A1
20190121880	Scherer, III et al.	Apr 2019	A1
20190171894	Poddar	Jun 2019	A1
20190206090	Ray et al.	Jul 2019	A1
20190213775	Dimitrov et al.	Jul 2019	A1
20190243780	Gopal et al.	Aug 2019	A1
20190250921	Forsyth	Aug 2019	A1
20190286570	Miura	Sep 2019	A1
20190304140	Fuller	Oct 2019	A1
20190311529	Lacey et al.	Oct 2019	A1
20190318445	Benthin et al.	Oct 2019	A1
20190377498	Das	Dec 2019	A1
20200118299	Iourcha	Apr 2020	A1
20200125501	Durham et al.	Apr 2020	A1
20200162100	Beckman et al.	May 2020	A1
20200167076	Lai	May 2020	A1
20200174945	Mukherjee et al.	Jun 2020	A1
20200186811	Madajczak et al.	Jun 2020	A1
20200218471	Chen	Jul 2020	A1
20200250097	Holland	Aug 2020	A1
20210011646	Nystad	Jan 2021	A1
20210049099	Turner	Feb 2021	A1
20210141729	Rao	May 2021	A1
20210142438	Appu	May 2021	A1
20210295583	Vaidyanathan	Sep 2021	A1
20210312592	Liu	Oct 2021	A1
20210366177	Viitanen	Nov 2021	A1
20210398241	Fetterman	Dec 2021	A1
20220012592	Jain	Jan 2022	A1
20220051476	Woop	Feb 2022	A1
20220066776	Stevens	Mar 2022	A1
20220069840	Haggebrant	Mar 2022	A1
20220092826	Sharma	Mar 2022	A1
20220197651	Kumar	Jun 2022	A1
20220207644	Korobkov	Jun 2022	A1
20220301228	Junkins	Sep 2022	A1
20220309190	Gopal	Sep 2022	A1
20220309732	Woo	Sep 2022	A1
20220398686	Persson	Dec 2022	A1
20220413854	Ray	Dec 2022	A1
20220414011	Mandal	Dec 2022	A1
20220414939	Pillai	Dec 2022	A1
20230100106	Dewan	Mar 2023	A1
20230195625	Allan	Jun 2023	A1
20230195638	Uhrenholt	Jun 2023	A1
20230297382	Abali	Sep 2023	A1
20230315627	Abali	Oct 2023	A1
20240078186	Uhrenholt	Mar 2024	A1

Foreign Referenced Citations (4)

Number	Date	Country
101072349	Nov 2007	CN
101425175	May 2009	CN
101470671	Jul 2009	CN
2000105839	Apr 2000	JP

Non-Patent Literature Citations (11)

Entry
Akenine-Moller, et al., “Graphics Processing Units for Handhelds”, Proceedings of the IEEE, May 2008, pp. 779-789, vol. 96, Issue 5, IEEE, New York, NY, USA.
Woo et al., “A 195mW, 9.1 MVertices/s Fully Programmable 3-D Graphics Processor for Low-Power Mobile Devices”, IEEE Journal of Solid-State Circuits, Nov. 19, 2008, pp. 2370-2380, vol. 43, No. 11, IEEE, Piscataway, NJ, USA.
Akenine-Moller, et al., “6.3 Procedural Texturing”, Real-Time Rendering, pp. 178-180, Third Edition, Jul. 25, 2008, CRC Press.
International Search Report and Written Opinion in International Application No. PCT/US2012/042442, mailed Oct. 31, 2012, 11 pages.
Andersson, Johan, “Chapter 5: Terrain Rendering in Frostbite Using Procedural Shader Splatting”, Advanced Real-Time Rendering in 3D Graphics and Games, SIGGRAPH 2007 Course 28, Aug. 8, 2007, 21 pages.
Andersson, Johan, “Terrain Rendering in Frostbite Using Procedural Shader Splatting”, Advanced Real-Time Rendering in 3D Graphics and Games, SIGGRAPH 2007 Course 28, Aug. 8, 2007, 52 pages.
Beam, Josh, “Tutorial: Getting Started with the OpenGL Shading Language (GLSL)”, Dec. 30, 2010, 7 pages, http://joshbeam.com/articles/getting_started_with_glsl/. [Retrieved Oct. 14, 2014].
Notification of the First Office Action in Chinese Application No. 201280029522.3, mailed Mar. 30, 2016, 20 pages.
Office Action in Japanese Patent Application No. 2014-515991, mailed Mar. 29, 2016, 9 pages.
Final Office Action in Japanese Patent Application No. 2014-515991, mailed Aug. 30, 2016, 6 pages.
Communication pursuant to Article 94(3) in European Application No. 12732897.9, mailed Oct. 13, 2017, 5 pages.

Related Publications (1)

	Number	Date	Country
	20210312668 A1	Oct 2021	US

Continuations (3)

	Number	Date	Country
Parent	16709462	Dec 2019	US
Child	17352809		US
Parent	15181130	Jun 2016	US
Child	16709462		US
Parent	13163071	Jun 2011	US
Child	15181130		US

Abstract