The present invention is generally related to computer implemented graphics. More particularly, the present invention is directed towards an efficient method for accessing memory.
Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers and home video game computers. In such graphic systems, some procedure must be implemented to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a polygon, e.g., a triangle, or a vector. All graphic pictures are formed with combinations of these graphic primitives. Many procedures may be utilized to perform graphic primitive rendering.
Texture mapping schemes were developed to enhance the images rendered by early graphics systems. Early graphic systems displayed images representing objects having extremely smooth surfaces. That is, textures, bumps, scratches, or other surface features were not modeled. In order to improve the quality of the image, texture mapping was developed to model the complexity of real world surface images. In general, texture mapping is the mapping of an image or a function onto a surface in three dimensions. For example, the texture would be a picture of whatever material the designer was trying to convey (e.g., brick, stone, vegetation, wood, etc.) and would contain shading information as well as the texture and color to create the impression of a complex, dimensional surface. Texture mapping is now widely established and widely implemented in most computer graphics systems.
Modern realistic texture mapping (e.g., as required for modern 3D rendering applications) requires the manipulation of large amounts of data. Generally speaking, as a given 3-D scene becomes more realistic, the more realistic the texture map (or simply texture) being used in the texture mapping operations. Accordingly, realistic high-resolution textures can be very large (e.g., several megabytes of data). The bandwidth required for accessing such textures is also very large. The memory and bandwidth requirements can exceed the capabilities of even the most modern real-time 3-D rendering systems.
One prior art approach to alleviating texture memory and bandwidth requirements involves the implementation of various schemes whereby only a sub portion of the texture that may be needed in a scene is fetched from memory at a time. Large textures are typically stored in system memory, as opposed to local graphics memory. To satisfy the bandwidth requirements and latency constraints, a sub portion of the texture is fetched into the local graphics memory and used in the texture mapping operations of the 3-D rendering process.
For example, one prior art scheme only fetches that portion of a texture that is needed to render the visible scene (e.g., that portion of the scene within the view volume). As the scene changes (e.g., as the viewpoint of the view volume changes), other portions of the texture are fetched as required. This technique relies on a page fault type mechanism to fetch needed texture data when that data is not resident in local graphics memory.
The problems with this approach is the amount of time required to fetch the needed texture data from system memory to the local graphics memory. The required texture data needs to be pulled in (e.g., DMA transfer) from the system memory via the system memory controller and graphics bus (e.g., AGP bus, PCI express, etc.). The data transfer bandwidth from system memory via the graphics bus is much lower than that from the local graphics memory. Additionally, there is a significant amount of added latency imposed on data transfers across the graphics bus.
The above problems increase the time required to service the texture data page fault. This delay can cause stalling of the graphics rendering pipeline. Such stalling is very harmful to real-time 3D rendering applications. The stalling often leads to choppy frame rates and other noticeable pauses when new texture data must be fetched.
Some 3D rendering applications are especially dependent on smooth and reliable access to needed texture data. For example, MIP mapping generally requires several versions of a given texture to be stored and available in local graphics memory (e.g., a full resolution version and several lower resolution versions of a texture), and hence, texture memory demands tend to be high. In a highly dynamic rendering environment where the rendered scene changes rapidly, high bandwidth low latency access to the texture data is critical to overall performance. Stalling the 3D rendering pipeline due to prior art page fault type texture fetching rapidly leads to choppy frame rates, noticeable pauses, and similar problems as the rendered scene changes.
One approach to maintaining rendering speed and frame rate is to increase the amount of local graphics memory (e.g., 128 Mb, 256 Mb, etc.). Such an approach is expensive. Another prior art solution is to increase the performance of the page fault handling system. This approach is also expensive, in that it can require expensive high speed components (e.g., multi channel system memory, exotic high speed DDR RAM, PCIx graphics bus, etc.). Even with such components, however, there are practical limits to the degree to which the performance of the prior art page fault texture memory fetching schemes can be improved. Thus, what is needed is a more efficient way to maintain rendering speed and frame rate for those 3D rendering applications that utilize texture mapping.
Embodiments of the present invention provide a method and system for implementing texture data access for real time 3D rendering applications. Embodiments of the present invention perform texture data access operations while maintaining rendering speed and frame rate for those 3D rendering applications that utilize texture mapping.
In one embodiment, the present invention is implemented as a method for storing texture data. The method includes the step of accessing a low resolution version of a block of texture data in a low latency memory and storing a high resolution version of the block of texture data in high latency memory. Upon a request for the high resolution version of the block of texture data, the high resolution version is fetched from the high latency memory to the low latency memory. The low resolution version is subsequently accessed from the low latency memory until the high resolution version is fetched into the low latency memory.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the embodiments of the present invention.
Embodiments of the present invention provide a method and system for implementing texture data access for real time 3D rendering applications. Embodiments of the present invention perform texture data access operations while maintaining rendering speed and frame rate for those 3D rendering applications that utilize texture mapping. Embodiments of the present invention and their benefits are further described below.
Notation and Nomenclature:
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 500 of
As shown in
Referring still to
The data is accessed by a GPU as needed to render a given scene. For example, for high pixel count (e.g., 1600 by 1200 pixels, etc.) complex scenes, the several large texture maps are typically too large to fit within the graphics cache 103 or the graphics memory 102, and must be stored within the system memory 101. However, as described above, accessing the system memory is too slow to enable effective high-speed, real-time 3-D rendering by the GPU. Consequently, the large texture maps are divided into discrete blocks of texture data and these blocks are fetched, or otherwise transferred, into the higher speed graphics memory 102 and graphics cache 103. Texture mapping applications such as MIP mapping place usually high demands on the texture mapping system since several versions of a given texture map must be stored and accessed.
Since the graphics memory 102 and the graphics cache 103 are generally smaller than the system memory 101, the blocks of texture data (e.g., from one or more different MIP map levels) are fetched into the higher speed memory on an as-needed basis. Thus, for example, a MIP mapping texture operation can be implemented by the GPU accessing texture data from its high-speed graphics cache 103 (e.g., typically included within a GPU) and graphics memory 102 (e.g., typically coupled to the GPU via a high performance bus). In the prior art, when “nonresident” blocks of texture data are needed (e.g., required texture data that is not stored in the graphics memory 102 or the graphics cache 103), the GPU would fetch the required texture data from system memory 101, thereby stalling the GPU's graphics rendering pipeline until the required texture data is fetched.
Embodiments of the present invention utilize an intelligent texture data fetching process that maintains rendering speed and frame rate for the GPU. In one embodiment, several blocks of texture data (e.g., of differing resolutions) are stored in low latency memory. In those cases where a nonresident block is needed by the GPU, embodiments of the present invention utilize different resolution versions of a nonresident block, which are in fact resident in low latency memory, to maintain the rendering speed and frame rate speed while the correct resolution versions of the nonresident block of texture data is fetched into the low latency memory. This avoids stalling the graphics rendering pipeline while waiting for the correct resolution version of the texture data.
For example, when a nonresident block of texture data is needed by the GPU, the GPU can use a corresponding lower resolution block of the texture data in its texture mapping operation. As the corresponding lower resolution version of texture data is used, the GPU fetches the correct resolution version (e.g., the higher resolution version) from the high latency memory for storage in the low latency memory. Once the correct resolution version is stored in low latency memory, the texture mapping operation proceeds using the correct resolution version of the block of texture data. This maintains the GPU's rendering frame rate while avoiding a pipeline stall, as would be caused by waiting for the correct version of the texture data to be fetched.
As known by those skilled in the art, MIP mapping is a widely used type of level of detail filtering used in a texture mapping process. The filtering is configured to prevent moiré interference patterns, aliasing, and rendering artifacts by using multiple lower resolution versions 202-204 of a texture map in addition to a full resolution version of the texture map 201. The full resolution version 201 contains all the surface details of the objects. For example, at close distances to a rendered object, the texture map 201 renders in its original full detail. As the distances increase, successively lower resolution versions of the texture (e.g., versions 202-204) are used, as indicated by the axis 210. By choosing the appropriate texture map resolution and detail, MIP mapping ensures that pixels do not get lost at further distances. Instead, properly averaged smaller versions of the original texture map are used. Each of these stages is known as a MIP map level. It should be noted that although
As described above, the different MIP map levels (e.g., levels 0, level 1, and level 2 as depicted in
As described above, to facilitate handling, access, and fetching, each texture map is divided into a series of constituent blocks of texture data. Thus, each MIP map level is divided into a series of constituent blocks of texture data. Hence, for a given block of texture data, there exists corresponding different resolution versions of that block of texture data (e.g., a level 0 version of a block of texture data and a level 1 resolution version of a block of texture data, etc.).
Embodiments of the present invention utilize an intelligent texture data fetching process that maintains rendering speed and frame rate for the GPU. For example, in one embodiment, several blocks of level 0 texture data 411, level 1 texture data 412, and level 2 texture data 413 are stored in low latency memory 401. The texture data 411-413 is used in the MIP-mapping process executed by the GPU, for example, in the manner indicated in scene 300 of
In one embodiment, a virtual memory paging system is being used to manage texture memory. In such an embodiment, the blocks of texture data correspond to pages. Accordingly, a page fault mechanism is used to fetch the nonresident page of texture data. For example, when the GPU requests access to a nonresident page of texture data, a virtual memory subsystem can handle the request in the same manner as a page fault, and fetch the nonresident page from high latency memory (e.g., system memory) into the low latency memory (e.g., the graphics memory and/or graphics cache).
In this manner, a virtual memory subsystem can be extended to understand the particular issues related to accessing texture data for a 3D graphics rendering process. Instead of faulting and waiting for the host to supply data to the GPU, the GPU can simply re-issue a request to a coarser MIP-map level of the texture data until such a request succeeds (or some programmable number of faults has occurred, in which case normal fault handling could be started). In such an embodiment, applications could defer paging-in the needed texture data until a subsequent frame, with only minor impact on image quality due to the incorrect (e.g. coarser) MIP-map levels being used.
In one embodiment, proximity usage bits are added to texture pages/blocks which are currently being fetched by hardware in order to allow the device driver to better predict which pages/blocks are actually visible or potentially visible in the scene (e.g., 3D scene 300 of
In one embodiment, decompression functionality can be added to the page fault mechanism. In such an embodiment, the texture data in the high latency memory 401 is stored in a compressed form. Thus, when faulted pages are fetched from the high latency memory 401, the faulted pages are transferred more efficiently across the bus 420. The compressed texture data is then decompressed on-the-fly by the page fault mechanism (e.g., the GPU).
In one embodiment, the GPU can be configured to run a pixel shader program to regenerate faulted pages on the fly. In such an embodiment, nonresident blocks of texture data can be generated on-the-fly by filtering a resident block of texture data. For example, one of the texture data pages 411 can be filtered by the GPU using a shader program to obtain a coarser version of the texture data page. This coarser version of the texture data page is then used by the GPU until the correct version can be paged in from the high latency memory 401.
Alternatively, in one embodiment, a procedural texture can be implemented wherein the texture is generated using a program on-the-fly. In such an embodiment, a function can be used (e.g., fractal function, turbulence function, etc.) to generate the texture as opposed to fetching the texture image from some location in memory. Thus for example certain regular type textures (e.g., wood grain, marble, clouds, fire, etc.) can be realistically produced by a shader program. This method would allow different resolution versions of the texture to be produced as needed, potentially saving significant amounts of texture memory.
It should be noted that although embodiments of the present invention have been described in the context of MIP mapping, it should be understood by those skilled in the art that the texture memory management mechanisms as described herein are suited for use in other types of texture mapping functions which requires efficient management of texture memory (e.g., anisotropic filtering, antialiasing, etc.). As such, the texture block swapping process as described above can be utilized a respective of any texture block access “misses” when MIP mapping, or even when MIP mapping is not in use.
With reference now to
In general, computer system 500 comprises at least one CPU 501 coupled to a system memory 515 and a graphics processor unit (GPU) 510 via one or more busses as shown. The GPU 510 is coupled to a display 512. As shown in
As with computer system 500 of
Process 700 begins in step 701, where a low resolution version of a block of texture data is stored in a low latency memory. As described above, the low latency memory can be local graphics memory (e.g., graphics memory 616 of
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.
Number | Name | Date | Kind |
---|---|---|---|
6295594 | Meier | Sep 2001 | B1 |
6304268 | Iourcha et al. | Oct 2001 | B1 |
7023445 | Sell | Apr 2006 | B1 |
20010011326 | Yoshikawa et al. | Aug 2001 | A1 |
20020004860 | Roman | Jan 2002 | A1 |
20030016229 | Dorbie et al. | Jan 2003 | A1 |
20050024376 | Gettman et al. | Feb 2005 | A1 |
20050193169 | Ahluwalia | Sep 2005 | A1 |