The present invention is generally related to hardware accelerated graphics computer systems.
Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (e.g., GPUs, etc.) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data, where the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives and produce real-time rendered 3-D images.
The real-time rendered 3-D images are generated using rasterization technology. Rasterization technology is widely used in computer graphics systems, and generally refers to the mechanism by which the grid of multiple pixels comprising an image is influenced by the graphics primitives. For each primitive, a typical rasterization system steps from pixel to pixel and determines whether or not to “render” (write a given pixel into a frame buffer or pixel map) as per the contribution of the primitive. This, in turn, determines how to write the data to the display buffer representing each pixel.
Various traversal algorithms and various rasterization methods have been developed for computing all of the pixels covered by the primitive(s) comprising a given 3-D scene. For example, some solutions involve generating the pixels in a unidirectional manner. Such traditional unidirectional solutions involve generating the pixels row-by-row in a constant direction (e.g. left to right). The coverage for each pixel is evaluated to determine if the pixel is inside the primitive being rasterized. This requires that the sequence shift across the primitive to a starting location on a first side of the primitive upon finishing at a location on an opposite side of the primitive.
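The unidirectional traversal described above can be sketched as follows. This is a minimal illustration only, assuming pixel-center sampling and an edge-function coverage test; the names `edge` and `rasterize_unidirectional` are hypothetical and do not come from the described hardware.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells which side of edge a->b the point lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_unidirectional(tri, width, height):
    """Visit pixels row by row, always left to right, and collect covered pixels."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        # Each new row restarts on the same side: this is the "shift" across
        # the primitive that the unidirectional approach requires.
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center (assumption)
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:  # accept either winding order
                covered.append((x, y))
    return covered
```

A real rasterizer would restrict the scan to the bounding box of the primitive and apply a tie-breaking rule on shared edges; both are omitted here for brevity.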
Other traditional methods involve stepping pixels in a local region following a space filling curve such as a Hilbert curve. The coverage for each pixel is evaluated to determine if the pixel is inside the primitive being rasterized. This technique does not have the large shifts (which can cause inefficiency in the system) of the unidirectional solutions, but is typically more complicated to design than the unidirectional solution.
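As a rough illustration of stepping pixels along a space filling curve, the sketch below converts a distance along a Hilbert curve into grid coordinates using the widely published iterative conversion. The function name `hilbert_d2xy` and the `order` parameter are assumptions made for this example, not part of the described system.

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_traversal(order):
    """Return every pixel of the grid in Hilbert-curve order."""
    return [hilbert_d2xy(order, d) for d in range(1 << (2 * order))]
```

Each step moves to an adjacent pixel, so the traversal never makes the large shifts of the row-by-row approach; the cost is the extra index arithmetic.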
Once the primitives are rasterized into their constituent pixels, these pixels are then processed in pipeline stages subsequent to the rasterization stage where the rendering operations are performed. Typically, these rendering operations involve reading the results of prior rendering for a given pixel from the frame buffer, modifying the results based on the current operation, and writing the new values back to the frame buffer. For example, to determine if a particular pixel is visible, the distance from the pixel to the camera is often used. The distance for the current pixel is compared to the closest previous pixel from the frame buffer, and if the current pixel is visible, then the distance for the current pixel is written to the frame buffer for comparison with future pixels. Similarly, rendering operations that assign a color to a pixel often blend the color with the color that resulted from previous rendering operations. Operations in which a frame buffer value is read for a particular pixel, modified, and written back are generally referred to as R-M-W operations. Generally, rendering operations assign a color to each of the pixels of a display in accordance with the degree of coverage of the primitives comprising a scene. The per pixel color is also determined in accordance with texture map information that is assigned to the primitives, lighting information, and the like.
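The read, modify, and write steps can be made concrete with a small sketch. It assumes that a smaller depth value means closer to the camera and that blending is a simple linear interpolation controlled by an `alpha` factor; both are illustrative assumptions, and the function names are hypothetical.

```python
def depth_test_rmw(depth_buffer, x, y, z):
    """R-M-W on depth: read the stored value, compare, write back if closer."""
    stored = depth_buffer[y][x]      # read the result of prior rendering
    if z < stored:                   # assumption: smaller z is closer to the camera
        depth_buffer[y][x] = z       # write the new closest distance
        return True                  # the current pixel is visible
    return False


def blend_rmw(color_buffer, x, y, src, alpha):
    """R-M-W on color: blend the new color with the previously rendered color."""
    dst = color_buffer[y][x]                                        # read
    out = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))  # modify
    color_buffer[y][x] = out                                        # write
    return out
```

Because each call reads a value that an earlier call may have written, two such operations on the same pixel must not be reordered or interleaved.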
In many systems, the capability of performing R-M-W operations presents a hazard that must be overcome in the system design. In particular, many systems process multiple primitives concurrently. However, most graphics systems present the appearance that primitives are rendered in the order in which they are provided to the GPU. If two sequential primitives utilize R-M-W operations, then the GPU must give the appearance that the value that is written by the first primitive is the value read by the second primitive for any particular pixel. The hazard for the system is how to concurrently process primitives yet maintain the appearance of sequential processing as required by many graphics programming models (e.g. OpenGL or DirectX).
A variety of techniques exist to mitigate the R-M-W hazard depending on the application. A system may maintain a transaction log of the color updates required for a pixel. At the end of rendering a scene, the sorted transaction log may be used to create the final pixel color. Another common solution is referred to as a “scoreboard”. A scoreboard is an array of memory that is used to indicate all of the screen locations where rendering of R-M-W operations may be occurring at any given time. When a primitive is rasterized, each pixel is checked against the scoreboard and is only rendered if no other pixel is currently rendering the same location. When rendering for a pixel begins, the scoreboard is marked for the pixel location. Upon completion of rendering, the scoreboard for a location is cleared. In this way, the system can render concurrently pixels in primitives which do not overlap pixels from other primitives, and will render serially any pixels in primitives which do overlap.
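A scoreboard of the kind just described might be sketched as below. The class and method names (`Scoreboard`, `try_begin`, `complete`) are hypothetical, and the one-byte-per-pixel array is an assumption chosen for readability rather than a statement about any actual hardware layout.

```python
class Scoreboard:
    """One flag per screen location marking where R-M-W rendering is in progress."""

    def __init__(self, width, height):
        self.width = width
        self.busy = bytearray(width * height)  # 0 = idle, 1 = rendering in progress

    def try_begin(self, x, y):
        """Mark the location and return True, or return False if it is already busy."""
        i = y * self.width + x
        if self.busy[i]:
            return False          # another pixel is rendering this location
        self.busy[i] = 1
        return True

    def complete(self, x, y):
        """Clear the mark once rendering for the location has finished."""
        self.busy[y * self.width + x] = 0
```

Pixels at different locations pass `try_begin` independently and can render concurrently, while a second pixel at the same location must wait until `complete` has been called for the first.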
A problem exists, however, with the ability of prior art scoreboard 3-D rendering architectures to function with the latency that occurs when accessing graphics memory. For example, as pixel fragments are updated in a graphics memory (e.g., frame buffer, etc.), an undesirable amount of latency is incurred as the scoreboard mechanism functions to mitigate the R-M-W hazards. As described above, depending on the specifics of individual systems, a large amount of this latency is due to the scoreboard checking of concurrently rendered pixels.
Thus, a need exists for a rasterization process that can scale as graphics needs require and provide added performance while reducing the impact of graphics memory access latency.
Embodiments of the present invention implement a rasterization process that can scale as graphics needs require and provide added performance while reducing the impact of graphics memory access latency.
In one embodiment, the present invention is implemented as a method for latency buffered scoreboarding in a graphics pipeline of a graphics processor. The method includes receiving a graphics primitive for rasterization in a raster stage of a graphics processor and rasterizing the graphics primitive to generate a plurality of pixels related to the graphics primitive. An ID is stored and is used to account for an initiation of parameter evaluation for each of the pixels (e.g., when the pixels are sent down the pipeline) as the pixels are transmitted to a subsequent stage of the graphics processor. One or more buffers are used to store the resulting fragment data when the pixels emerge from the pipeline. The ID and the fragment data from the buffering are compared to determine whether they correspond to one another. The completion of parameter evaluation for each of the pixels is accounted for when the ID and the fragment data match and as the fragment data is written to a memory (e.g., L2 cache, graphics memory, or the like).
In one embodiment, the accounting for the initiation of parameter evaluation and the accounting for the completion of parameter evaluation comprises a scoreboarding process that is implemented by the graphics processor. This scoreboarding process can be implemented along with the buffering of the fragment data to compensate for latency in accessing the memory. In one embodiment, the accounting for the completion of parameter evaluation as provided by the scoreboarding process is configured to ensure coherency of the fragment data written to the memory. In one embodiment, the buffers for the fragment data comprise an L1 cache and the memory comprises an L2 cache.
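The overall flow of the method summarized above can be modeled in software as a sketch. Everything here is an illustrative assumption: the class name `LatencyBufferedScoreboard`, the use of x-y coordinates as the stored ID, and the dictionary standing in for the memory (e.g., an L2 cache).

```python
from collections import deque


class LatencyBufferedScoreboard:
    """Software model: mark at launch, buffer results, clear on matching write."""

    def __init__(self):
        self.marks = set()              # scoreboard marks for in-flight pixels
        self.pending_ids = deque()      # IDs stored when pixels are launched
        self.fragment_buffer = deque()  # fragment data emerging from the pipeline
        self.memory = {}                # stand-in for the L2 cache / graphics memory

    def launch(self, xy):
        """Account for initiation; refuse pixels that collide with an in-flight one."""
        if xy in self.marks:
            return False
        self.marks.add(xy)
        self.pending_ids.append(xy)
        return True

    def emerge(self, xy, data):
        """Buffer fragment data as it leaves the pipeline."""
        self.fragment_buffer.append((xy, data))

    def drain(self):
        """Write buffered data to memory and clear marks whose stored ID matches."""
        cleared = []
        while self.fragment_buffer:
            xy, data = self.fragment_buffer.popleft()
            self.memory[xy] = data          # the write to memory
            if xy in self.pending_ids:      # ID and fragment data correspond
                self.pending_ids.remove(xy)
                self.marks.discard(xy)      # account for completion
                cleared.append(xy)
        return cleared
```

In this model the mark stays set while data sits in the buffer, so a colliding pixel cannot launch until the earlier result has actually reached memory.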
In this manner, embodiments of the present invention provide for a rasterization process that can scale as graphics needs require and provide added performance while reducing the impact of graphics memory access latency. For example, pixel fragments can be updated in a graphics memory (e.g., L2 cache, frame buffer, etc.) while buffer memory compensates for any latency. These benefits can be provided as the scoreboard mechanism prevents R-M-W hazards.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.
Notation and Nomenclature:
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 100 of
Computer System Platform:
It should be appreciated that the GPU 110 can be implemented as a discrete component, a discrete graphics card designed to couple to the computer system 100 via a connector (e.g., AGP slot, PCI-Express slot, etc.), a discrete integrated circuit die (e.g., mounted directly on a motherboard), or as an integrated GPU included within the integrated circuit die of a computer system chipset component (not shown). Additionally, a local graphics memory 114 can be included for the GPU 110 for high bandwidth graphics data storage.
Embodiments of the present invention implement a method and system for latency buffered scoreboarding in a graphics pipeline of a graphics processor (e.g., GPU 110 of
The traversal pattern 221 has advantages for maintaining a cache of relevant data and reducing the memory requests required for frame buffer access. For example, generating pixels that are near recently generated pixels is important when recent groups of pixels and/or their corresponding depth values, stencil values, and the like are kept in memories of a limited size (e.g., cache memories, etc.).
In each case (e.g., with triangle 301, triangle 401, etc.), an objective of the rasterization process is to determine which pixels have at least some degree of coverage by a given primitive. These pixels are then passed on to the subsequent stages of the graphics pipeline to be rendered into the appropriate fragment data and stored into the frame buffer memory for display.
The scoreboard unit 503 functions by accounting for the initiation of parameter evaluation for each of the pixels received from the raster unit 502. As pixels are emitted by the raster unit 502, the scoreboard unit 503 accounts for their processing initiation by using a scoreboard data structure (e.g., a scoreboard memory). As the raster unit 502 transmits pixels down the pipeline (e.g., to the program sequencer unit 504, etc.), it sets markers for the pixels into the scoreboard memory to account for the initiation of parameter evaluation. These markers (e.g., bits, flags, or the like) on the scoreboard signify that those respective pixels of the display have fragments “in-flight” within the graphics pipeline.
The in-flight marks of the scoreboard function by preventing read-modify-write hazards so that, for example, a subsequent primitive that rasterizes to the same pixels does not erroneously fetch stale data from graphics memory, caches, or the like. When a primitive is rasterized by the raster unit 502, each pixel is checked against the scoreboard 503 and is only launched down the pipeline if no other in-flight pixel is currently rendering the same location. Upon completion of rendering, the scoreboard for that location is cleared, thereby allowing concurrent rendering for pixels in primitives which do not overlap and serially rendering any pixels in primitives which do overlap.
The program sequencer 220 functions by controlling the operation of the functional modules of the graphics pipeline 210. The program sequencer 220 can interact with the graphics driver (e.g., a graphics driver executing on the CPU 101) to control the manner in which the functional modules (e.g., the ALU 505, etc.) of the graphics pipeline receive information, configure themselves, and process graphics primitives. The program sequencer 220 also functions by managing multiple pass rendering operations, where fragment data will loop around through the graphics pipeline two or more times (e.g., looping back from the data write unit 506) to implement, for example, more complex pixel shading, or the like.
The ALU unit 505 functions by performing parameter evaluation processing on the fragment data received from the raster unit 502 and the program sequencer 504. The parameter evaluation process can be one of a number of different evaluation processes, or pixel tests, which determine the degree to which the tiles from a given primitive influence pixel colors in the frame buffer 510. For example, the parameter evaluation process can be interpolation of primitive attributes, or a depth evaluation process, where, for example, depth values for the tiles passed from the raster unit 502 are tested against the depth values for those pixels already residing within the frame buffer 510. Alternatively, the parameter evaluation process can be a transparency evaluation, where a transparency value for the tiles passed from the raster unit is tested against the pixels already in the frame buffer. The objective is to identify pixels which will not ultimately be drawn in the frame buffer 510 and discard them to save processing bandwidth. For example, in a case where the parameter comprises a depth value, the objective is to identify those tiles which are behind other primitives, or are otherwise occluded, and discard them from the pipeline.
Once the ALU 505 completes operation on a pixel, the pixel is transmitted to the data write unit 506. The data write unit 506 writes the completed pixels to the fragment data cache 520. For multiple pass processing, data write unit 506 writes the fragment data back to the program sequencer 504.
The fragment data cache 520 functions by maintaining a high-speed cache memory for low latency access by the graphics pipeline. The fragment data cache 520 services read requests from the program sequencer 504, read and write requests from the raster unit 502, and data writes from the data write unit 506. The fragment data cache 520 is responsible for maintaining coherence between its internal caches and the graphics memory 114 (e.g., the frame buffer memory).
As described above, as pixels are emitted by the raster unit 502, the scoreboard unit 503 is marked to account for their processing initiation, thereby signifying that those respective pixels of the display have fragments “in-flight” within the graphics pipeline. The raster unit 502 also stores an ID that corresponds to the mark within the scoreboard unit 503. This ID is referred to as a raster scoreboard clear packet and includes information that identifies the in-flight fragments that have just been launched. The raster scoreboard clear packet is transmitted from the raster unit 502 to the buffer 701 of the fragment data cache 520 as shown by line 711. In one embodiment, the identifying information is the x-y coordinate information of the pixel.
The scoreboard clear logic 750 functions by comparing the ID of the scoreboard clear packets 711 and/or 712 and the fragment data from the buffers 721-724. As fragment data packets are emitted from the buffers 721-724 and are stored into the L2 cache 601, those fragment data packets are compared against the IDs of the scoreboard clear packets in the buffers 701-702. The scoreboard clear signal 610 is held back while a determination is made as to whether these IDs (e.g., 711-712 vs 721-724) match. If they do not match, the scoreboard clear signal 610 is sent immediately. If they do match, the scoreboard logic 750 waits for the data to arrive at the L2 cache 601 before sending the scoreboard clear signal 610. This attribute is implemented due to the fact that some pixels which are cleared will not have data written at all to the fragment data cache 520, so if the scoreboard logic 750 waited for a match before sending the clear 610, the system could potentially lock up.
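The decision described in the preceding paragraph can be sketched as a small function. This is only one reading of that paragraph, with hypothetical names: `scoreboard_clear_decision`, a list of `(id, data)` tuples standing in for the fragment data buffers, and a dictionary standing in for the L2 cache.

```python
def scoreboard_clear_decision(clear_id, buffered_fragments, l2_cache):
    """Decide when the scoreboard clear for clear_id may be sent.

    Returns "immediate" when no buffered fragment carries the ID (no data will
    be written for that pixel), or "after_write" once the matching fragment
    data has been stored into l2_cache.
    """
    matching = [frag for frag in buffered_fragments if frag[0] == clear_id]
    if not matching:
        # Nothing to wait for; waiting for a match here could lock up.
        return "immediate"
    for frag in matching:
        frag_id, data = frag
        l2_cache[frag_id] = data          # data arrives at the cache first
        buffered_fragments.remove(frag)
    return "after_write"                  # only now may the clear be sent
```

The two return values model the timing of the clear signal rather than whether it is sent; in both branches the clear is eventually issued.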
This signifies to the rest of the graphics pipeline that the pixels corresponding to the marker have finished processing and have been stored into the L2 cache 601. If the scoreboard clear logic 750 does not detect a match, it does not send the scoreboard clear signal 610 back to the scoreboard 503, which prevents the emission of any new coincident pixels from the raster unit 502. This attribute ensures coherence by preventing rasterized fragments that collide with outstanding IDs from entering the pipeline until the outstanding IDs are cleared in the scoreboard.
In this manner, the scoreboard unit 503 and the fragment data cache 520 enable the accounting for the initiation and the completion of pixel parameter evaluation to ensure that a latency between a fetch of required parameter data and the modification and writeback of the parameter data by a subsequent stage of the graphics pipeline does not corrupt the rendering process.
It should be noted that the cache coherency attributes provided by the scoreboard unit 503 and the fragment data cache 520 are particularly useful in those architectures which do not have a dedicated ROP (render operations) hardware unit. For example, on an architecture that uses a streaming fragment processor for doing parameter evaluation, shader execution, and streaming processing, as well as the backend render operations, the cache coherency and read-modify-write hazard mitigation provided by the scoreboard unit 503 and the fragment data cache 520 enable a more efficient GPU design. Such a design would require fewer transistors and consume less power. These benefits can be particularly useful in, for example, battery-powered handheld device applications.
In one embodiment, the ID for the scoreboard clear packet comprises a hash of the corresponding fragment's frame buffer location. For example, when a fragment is emitted from the raster unit 502, the raster unit 502 sets a marker bit in the scoreboard 503 whose address in the scoreboard is a hash of that fragment's x-y frame buffer location. Additionally, the fragment packet is tagged if a bit was set in the scoreboard for it.
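A hashed scoreboard address of this kind might look like the sketch below. The hash shown (the low four bits of x and y packed into one byte) and the 256-entry scoreboard size are assumptions chosen purely for illustration; the source does not specify the hash function. The names `scoreboard_address` and `emit_fragment` are likewise hypothetical.

```python
SCOREBOARD_BITS = 256  # hypothetical scoreboard size


def scoreboard_address(x, y):
    """Hypothetical hash: pack the low 4 bits of y and x into an 8-bit address."""
    return ((y & 0xF) << 4) | (x & 0xF)


def emit_fragment(scoreboard, x, y):
    """Set the marker bit for a fragment, or return None if the bit is already set."""
    addr = scoreboard_address(x, y)
    if scoreboard[addr]:
        return None                      # colliding fragment must wait
    scoreboard[addr] = 1                 # marker bit at the hashed address
    # The fragment packet is tagged because a bit was set in the scoreboard for it.
    return {"x": x, "y": y, "addr": addr, "tagged": True}
```

Because distinct locations can hash to the same address, a hashed scoreboard may occasionally serialize pixels that do not actually overlap; this costs some concurrency but never permits a real hazard.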
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.