The present invention is generally related to graphics computer systems.
Generally, a computer system suited to handle 3D image data includes a specialized graphics processor unit, or GPU, in addition to a traditional CPU (central processing unit). The GPU includes specialized hardware configured to handle 3D computer-generated objects. The GPU is configured to operate on a set of data models and their constituent “primitives” (usually mathematically described polygons) that define the shapes, positions, and attributes of the objects. The hardware of the GPU processes the objects, implementing the calculations required to produce realistic 3D images on a display of the computer system.
The performance of a typical graphics rendering process is highly dependent upon the performance of the system's underlying hardware. High performance real-time graphics rendering requires high data transfer bandwidth to the memory storing the 3D object data and the constituent primitives. Thus, more expensive prior art GPU subsystems (e.g., GPU equipped graphics cards) typically include larger (e.g., 128 MB or larger) specialized, expensive, high bandwidth local graphics memories for feeding the required data to the GPU. Less expensive prior art GPU subsystems include smaller (e.g., 64 MB or less) such local graphics memories, and some of the least expensive GPU subsystems have no local graphics memory.
A problem with the prior art low-cost GPU subsystems (e.g., having smaller amounts of local graphics memory) is the fact that the data transfer bandwidth to the system memory, or main memory, of a computer system is much less than the data transfer bandwidth to the local graphics memory. Typical GPUs with any amount of local graphics memory need to read command streams and scene descriptions from system memory. A GPU subsystem with a small or absent local graphics memory also needs to communicate with system memory in order to access and update pixel data including pixels representing images which the GPU is constructing. This communication occurs across a graphics bus, or the bus that connects the graphics subsystem to the CPU and system memory.
In one example, per-pixel Z-depth data is read across the system bus and compared with a computed value for each pixel to be rendered. For all pixels which have a computed Z value less than the Z value read from system memory, the computed Z value and the computed pixel color value are written to system memory. In another example, pixel colors are read from system memory and blended with computed pixel colors to produce translucency effects before being written to system memory. Higher resolution images (images with a greater number of pixels) require more system memory bandwidth to render. Images representing larger numbers of 3D objects require more system memory bandwidth to render. The low data transfer bandwidth of the graphics bus acts as a bottleneck on overall graphics rendering performance.
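By way of illustration only, the two examples above can be sketched in software as follows. The sketch assumes a simple "less than" depth comparison as described and a conventional alpha-weighted blend for translucency; the names `Traffic`, `depth_test_and_write`, and `blend` are hypothetical and do not appear in the described embodiments. The counters merely tally how many frame buffer reads and writes each operation generates, which is the traffic that would otherwise cross the graphics bus.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    """Tally of per-pixel memory accesses (illustrative only)."""
    reads: int = 0
    writes: int = 0


def depth_test_and_write(z_buffer, color_buffer, fragments, traffic):
    """Apply a less-than Z test to each (pixel_index, z, color) fragment."""
    for idx, z, color in fragments:
        stored_z = z_buffer[idx]      # one read of the stored Z value
        traffic.reads += 1
        if z < stored_z:              # computed Z is closer than stored Z
            z_buffer[idx] = z         # write the new Z value
            color_buffer[idx] = color  # write the new color value
            traffic.writes += 2


def blend(color_buffer, fragments, traffic):
    """Blend each (pixel_index, color, alpha) fragment over the stored color."""
    for idx, color, alpha in fragments:
        dst = color_buffer[idx]       # one read of the stored color
        traffic.reads += 1
        color_buffer[idx] = tuple(
            alpha * s + (1.0 - alpha) * d for s, d in zip(color, dst)
        )
        traffic.writes += 1           # one write of the blended color
```

Every fragment costs at least one read, and every visible or blended fragment costs one or two writes, so the access count grows with both image resolution and scene complexity.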
Thus, what is required is a solution capable of reducing the limitations imposed by the limited data transfer bandwidth of a graphics bus of a computer system. What is required is a solution that ameliorates the bottleneck imposed by the much smaller data transfer bandwidth of the graphics bus in comparison to the data transfer bandwidth of the GPU to local graphics memory. The present invention provides a novel solution to the above requirement.
Embodiments of the present invention ameliorate the bottleneck imposed by the much smaller data transfer bandwidth of the graphics bus in comparison to the data transfer bandwidth of the GPU to local graphics memory.
In one embodiment, the present invention is implemented as a system for cooperative graphics processing across a graphics bus in a computer system. The system includes a bridge coupled to a system memory via a system memory bus. The bridge is also coupled to a GPU (graphics processor unit) via the graphics bus. The bridge includes a fragment processor for implementing cooperative graphics processing with the GPU coupled to the graphics bus. The fragment processor is configured to implement a plurality of raster operations on graphics data stored in the system memory. The graphics bus interconnect coupling the bridge to the GPU can be an AGP-based interconnect or a PCI Express-based interconnect. The GPU can be an add-in card-based GPU or can be a discrete integrated circuit device mounted (e.g., surface mounted, etc.) on the same printed circuit board (e.g., motherboard) as the bridge.
In one embodiment, the graphics data stored in the system memory comprises a frame buffer used by both the fragment processor and the GPU. One mode of cooperative graphics processing involves the fragment processor implementing frame buffer blending on the graphics data in the system memory (e.g., the frame buffer). In one embodiment, the fragment processor is configured to implement multi-sample expansion on graphics data received from the GPU and store the resulting expanded data in the system memory frame buffer. In one embodiment, the fragment processor is configured to evaluate a Z plane equation coverage value for a plurality of pixels (e.g., per polygon) stored in the system memory, wherein the Z plane equation coverage value is received from the GPU via the graphics bus.
In this manner, embodiments of the present invention implement a much more efficient use of the limited data transfer bandwidth of the graphics bus interconnect, and thus dramatically improve overall graphics rendering performance in comparison to the prior art. Furthermore, the benefits provided by the embodiments of the present invention are even more evident in those architectures which primarily utilize system memory for frame buffer graphics data storage.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.
Notation and Nomenclature:
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 100 of
Computer System Platform:
It should be appreciated that although the GPU 110 is depicted in
Referring still to
The cooperative graphics processing reduces the total amount of data that must be transferred across the bandwidth constrained graphics bus 120. By performing certain graphics rendering operations within the bridge component 105, the comparatively high bandwidth system memory bus 121 can be used to access graphics data 116, reducing the amount of data access latency experienced by these rendering operations. The resulting reduction in access latency, and increase in transfer bandwidth, allows the overall graphics rendering operations to proceed more efficiently, thereby increasing the performance of bandwidth-demanding 3D rendering applications. This cooperative graphics rendering process is described in further detail in
The bridge 105 includes a fragment processor 201 for implementing cooperative graphics processing with the GPU 110. The fragment processor 201 is configured to implement a plurality of raster operations on graphics data stored in the system memory. These raster operations executed by the fragment processor 201 suffer a much lower degree of latency in comparison to raster operations performed by the GPU 110. This is due to both the higher data transfer bandwidth of the system memory bus 121 and the shorter communications path (e.g., lower data access latency) between the fragment processor 201 and the graphics data 116 within the system memory 115.
Performing a portion of the raster operations, or all of the raster operations, required for graphics rendering in the fragment processor 201 reduces the number of graphics data accesses (e.g., both reads and writes) that must be performed by the GPU 110. For example, by implementing fragment operations within the fragment processor 201, accesses to fragment data (e.g., graphics data 116) required for iterating fragment colors across multiple pixels can be performed across the high bandwidth system memory bus 121. For example, fragment data can be accessed, iterated across multiple pixels, and the resulting pixel color values can be stored back into the system memory 115 all across the high bandwidth system memory bus 121. The interpolation and iteration functions can be executed by the fragment processor 201 in conjunction with an internal RAM 215. Such fragment processing operations comprise a significant portion of the rendering accesses to the graphics data 116. Implementing them using a fragment processor within the bridge will effectively remove such traffic from the low bandwidth graphics bus 120.
In one embodiment, the fragment processor 201 substantially replaces the functionality of the fragment processor 205 within the GPU 110. In this manner, the incorporation of the fragment processor 201 renders the fragment processor 205 within the GPU 110 optional. For example, in one embodiment, the GPU 110 is an off-the-shelf card-based GPU that is detachably connected to the computer system 100 via a graphics bus interconnect slot (e.g., AGP slot, PCI Express slot, etc.). Such an off-the-shelf card-based GPU would typically incorporate its own one or more fragment processors for use in those systems having a conventional prior art type bridge. The graphics bus interconnect can be an AGP-based interconnect or a PCI Express-based interconnect. The GPU 110 can be an add-in card-based GPU or can be a discrete integrated circuit device mounted (e.g., surface mounted, etc.) on the same printed circuit board (e.g., motherboard) as the bridge 105. When connected to the bridge 105 of the computer system 100 embodiment, the included fragment processor(s) can be disabled (e.g., by the graphics driver).
Alternatively, the GPU 110 can be configured specifically for use with a bridge component having an internal fragment processor (e.g., fragment processor 201). Such a configuration provides advantages in that the GPU integrated circuit die area that would otherwise be dedicated to an internal fragment processor can be saved (e.g., thereby reducing GPU costs) or used for other purposes. In this manner, the inclusion of a fragment processor 205 within the GPU 110 is optional.
In one embodiment, the internal fragment processor 205 within the GPU 110 can be used by the graphics driver in conjunction with the fragment processor 201 within the bridge 105 to implement concurrent raster operations within both components. In such an embodiment, the graphics driver would allocate some portion of fragment processing to the fragment processor 201 and the remainder to the fragment processor 205. The graphics driver would balance the processing workloads between the bridge component 105 and the GPU 110 to best utilize the high bandwidth low latency connection of the bridge component 105 to the system memory 115. For example, to best utilize the high bandwidth system memory bus 121, it would be preferable to implement as large a share as possible of the fragment processing workloads within the bridge component 105 (e.g., fragment processor 201). This would ensure as large a percentage of the fragment operations as is practical are implemented using the low latency high bandwidth system memory bus 121. The remaining fragment processing workloads would be allocated to the fragment processor 205 of the GPU 110.
Implementing fragment processing operations within the bridge component 105 provides an additional benefit in that the amount of integrated circuit die area within the GPU 110 that must be dedicated to “bookkeeping” can be reduced. Bookkeeping logic is used by conventional GPUs to keep track of accesses to the graphics data 116 that are “in-flight”. Such in-flight data accesses are used to hide the latency the GPU 110 experiences when reading or writing to the graphics data 116 across the low bandwidth graphics bus 120. In general, in-flight data accesses refer to a queued number of data reads or data writes that are issued to the graphics data 116 that have been initiated, but whose results have yet to be received.
Bookkeeping logic is used to keep track of such in-flight accesses and, for example, to make sure storage is on hand when read results from the graphics data 116 arrive and to ensure the graphics data 116 is not corrupted when multiple writes have been issued. The more complex the bookkeeping logic, the more in-flight data accesses the GPU can maintain, and thus, the more the effects of the high latency can be hidden. By offloading fragment processing operations to the bridge 105 (e.g., fragment processor 201), the demands placed on any bookkeeping logic within the GPU 110 are reduced.
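The relationship between latency and the number of in-flight accesses that must be tracked can be estimated with Little's law (accesses in flight equal access rate multiplied by latency). The following is a minimal sketch under that assumption; the function name `required_in_flight` and all numeric values used with it are hypothetical and are not taken from any described embodiment.

```python
def required_in_flight(latency_ns, bandwidth_bytes_per_s, bytes_per_access):
    """Estimate how many accesses must be outstanding to sustain a bandwidth.

    Uses Little's law: bytes in flight = bandwidth * latency. The result is
    rounded up to a whole number of accesses using integer arithmetic.
    """
    numerator = latency_ns * bandwidth_bytes_per_s
    denominator = 1_000_000_000 * bytes_per_access  # convert ns to s
    return -(-numerator // denominator)             # ceiling division


# Hypothetical figures: the same target bandwidth at two different latencies.
high_latency = required_in_flight(1000, 2_000_000_000, 64)
low_latency = required_in_flight(100, 2_000_000_000, 64)
```

With these illustrative figures, a tenfold reduction in latency reduces the number of accesses that bookkeeping logic must track from 32 to 4.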
In this manner, embodiments of the present invention implement a much more efficient use of the limited data transfer bandwidth of the graphics bus interconnect, and thus greatly improve overall graphics rendering performance in comparison to the prior art architectures. Furthermore, the benefits provided by the embodiments of the present invention are even more evident in those architectures which primarily utilize system memory for frame buffer graphics data storage.
The
The Z buffer blending module 322 evaluates depth information per fragment of a polygon and iterates the resulting depth information across the multiple pixels. As with color blending, Z buffer blending involves a large number of reads and writes to the frame buffer 310 and similarly benefits from the high bandwidth and low latency of the system memory bus 121.
In one embodiment, the fragment processor 201 is configured to use a Z plane equation coverage value to iterate depth information across multiple pixels of a polygon. In such an embodiment, the depth and orientation of a polygon in 3-D space is defined using a Z plane equation. The Z plane equation is used by the fragment processor 201 to determine depth information for each constituent pixel covered by the polygon, and is a much more compact method of describing depth information for a polygon than by using a list of Z values for each fragment of the polygon. Additional description of Z plane raster operations can be found in commonly assigned U.S. Patent Application “Z PLANE ROP” by Steve Molnar, filed on Jun. 28, 2004, Ser. No. 10/878,460, which is incorporated herein in its entirety.
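As a non-limiting illustration of why a plane equation is more compact than a list of per-fragment Z values, the following sketch derives the three coefficients of a plane z = a*x + b*y + c from a triangle and evaluates it at each covered pixel. The function names `plane_from_triangle` and `iterate_z`, and the representation of coverage as a list of (x, y) pixel coordinates, are assumptions made for this example only.

```python
def plane_from_triangle(v0, v1, v2):
    """Return (a, b, c) with z = a*x + b*y + c through three (x, y, z) vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear in x, y")
    # Solve the 2x2 linear system for the depth gradients by Cramer's rule.
    a = ((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0)) / det
    b = ((x1 - x0) * (z2 - z0) - (x2 - x0) * (z1 - z0)) / det
    c = z0 - a * x0 - b * y0
    return a, b, c


def iterate_z(plane, covered_pixels):
    """Evaluate the plane equation at every covered (x, y) pixel."""
    a, b, c = plane
    return {(x, y): a * x + b * y + c for x, y in covered_pixels}
```

In this sketch only three coefficients plus the coverage information need to be sent for a polygon, regardless of how many pixels the polygon covers.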
The compression module 323 compresses and decompresses per pixel data for storage and retrieval from the frame buffer 310. For example, in some rendering operations a given pixel can have multiple value samples, with a number of bits per sample. For example, in a case where each pixel of a display includes 8 sample points, the compression module 323 would compress/decompress the data describing such sample points for easier access to and from the frame buffer 310.
The multi sample expansion module 324 performs multi sample expansion operations on the fragment data. For example, depending upon the application (e.g., anti-aliasing) the sample expansion module 324 can expand sample information from one sample point per pixel into eight sample points per pixel. Because the expanded data is several times larger than the unexpanded data, it is desirable to perform the sample expansion in the fragment processor 201 for storage into the frame buffer 310, as opposed to in the GPU 110.
Referring still to
For example, as opposed to sending individual pixels and their values, the GPU 110 sends fragments to the fragment processor 201. These fragments are in pre-expanded form (i.e., they have not yet undergone multi sample expansion). The fragments undergo multi sample expansion within the fragment processor 201. Multi sample expansion is used in applications involving anti-aliasing and the like. A typical multi sample expansion would take one sample of one fragment and expand it into four samples (e.g., 4× anti-aliasing) or eight samples (e.g., 8× anti-aliasing). This much larger quantity of data is then transferred to the frame buffer 310 across the high bandwidth system memory bus 121 as opposed to the low bandwidth graphics bus 120. For example, in a typical anti-aliasing application, a given pixel can be expanded from one sample comprising 32 bits into eight samples comprising 32 bits each.
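The arithmetic of the above example can be sketched as follows. The 32 bits per sample figure is taken from the example above; the function names `expand_fragment` and `bus_bits`, and the use of simple replication of one sample into every sample point, are assumptions made for illustration and do not limit how expansion is performed.

```python
BITS_PER_SAMPLE = 32  # per the 32-bit sample in the example above


def expand_fragment(sample, samples_per_pixel):
    """Replicate one sample into `samples_per_pixel` sample points (assumed scheme)."""
    return [sample] * samples_per_pixel


def bus_bits(num_fragments, samples_per_pixel):
    """Return (bits sent over the graphics bus, bits sent over the system memory bus)."""
    graphics_bus_bits = num_fragments * BITS_PER_SAMPLE
    system_memory_bus_bits = num_fragments * samples_per_pixel * BITS_PER_SAMPLE
    return graphics_bus_bits, system_memory_bus_bits
```

For a single fragment with eight samples per pixel, `bus_bits(1, 8)` returns `(32, 256)`: 32 bits cross the graphics bus 120, and the 256 bits of expanded data cross only the system memory bus 121.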
Similarly, the Z plane equation can be expanded into 4×, 8×, etc. samples per pixel by the fragment processor 201 from the plane equation for the original polygon. The resulting expanded Z data is then transferred from the fragment processor 201 across the high bandwidth system memory bus 121 to the frame buffer 310.
Process 500 begins in step 501, where the fragment processor 201 receives fragment pre-expanded color values from the GPU 110 via the graphics bus 120. In step 502, the fragment processor 201 performs a multi sample color value expansion for a plurality of pixels. In step 503, the fragment processor 201 receives a Z plane equation coverage value from the GPU 110. In step 504, the fragment processor 201 performs a Z plane iteration process to generate iterated Z values for a plurality of pixels. In step 505, as described above, the fragment processor 201 stores the resulting expanded color values and the resulting expanded Z values into the frame buffer 310 via the high bandwidth system memory bus 121. Subsequently, in step 506, the GPU 110 accesses the expanded color values and expanded Z values to render the image.
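Steps 501 through 505 of process 500 can be modeled in software as in the following minimal sketch. It is a simplified illustration under stated assumptions, not the described hardware: the function name `process_500_sketch`, the use of dictionaries to stand in for the incoming fragments and the frame buffer 310, the plane form z = a*x + b*y + c, and the sub-pixel `sample_offsets` are all hypothetical.

```python
def process_500_sketch(fragments, z_plane, sample_offsets, frame_buffer):
    """Model steps 501-505 for one polygon.

    fragments: dict mapping covered (x, y) pixels to one pre-expanded color
        value each (step 501).
    z_plane: (a, b, c) coefficients of z = a*x + b*y + c (step 503).
    sample_offsets: assumed sub-pixel (dx, dy) positions, one per sample.
    frame_buffer: dict standing in for the frame buffer in system memory.
    """
    a, b, c = z_plane
    for (x, y), color in fragments.items():
        # Step 502: multi sample color expansion (simple replication assumed).
        colors = [color] * len(sample_offsets)
        # Step 504: iterate the Z plane equation at each sample position.
        depths = [a * (x + dx) + b * (y + dy) + c for dx, dy in sample_offsets]
        # Step 505: store expanded color and Z values in the frame buffer.
        frame_buffer[(x, y)] = list(zip(colors, depths))
    return frame_buffer
```

In this model only the `fragments` and `z_plane` arguments correspond to data received over the graphics bus 120; the per-sample values are generated and stored on the system memory side.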
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.
Number | Name | Date | Kind |
---|---|---|---|
3091657 | Stuessel | May 1963 | A |
3614740 | Delagi et al. | Oct 1971 | A |
3940740 | Coontz | Feb 1976 | A |
3987291 | Gooding et al. | Oct 1976 | A |
4101960 | Stokes et al. | Jul 1978 | A |
4541046 | Nagashima et al. | Sep 1985 | A |
4566005 | Apperley et al. | Jan 1986 | A |
4748585 | Chiarulli et al. | May 1988 | A |
4885703 | Deering | Dec 1989 | A |
4897717 | Hamilton et al. | Jan 1990 | A |
4951220 | Ramacher et al. | Aug 1990 | A |
4958303 | Assarpour et al. | Sep 1990 | A |
4965716 | Sweeney | Oct 1990 | A |
4965751 | Thayer et al. | Oct 1990 | A |
4985848 | Pfeiffer et al. | Jan 1991 | A |
4985988 | Littlebury | Jan 1991 | A |
5036473 | Butts et al. | Jul 1991 | A |
5040109 | Bowhill et al. | Aug 1991 | A |
5047975 | Patti et al. | Sep 1991 | A |
5175828 | Hall et al. | Dec 1992 | A |
5179530 | Genusov et al. | Jan 1993 | A |
5197130 | Chen et al. | Mar 1993 | A |
5210834 | Zurawski et al. | May 1993 | A |
5263136 | DeAguiar et al. | Nov 1993 | A |
5276893 | Savaria | Jan 1994 | A |
5327369 | Ashkenazi | Jul 1994 | A |
5357623 | Megory-Cohen | Oct 1994 | A |
5375223 | Meyers et al. | Dec 1994 | A |
5388206 | Poulton et al. | Feb 1995 | A |
5388245 | Wong | Feb 1995 | A |
5392437 | Matter et al. | Feb 1995 | A |
5408606 | Eckart | Apr 1995 | A |
5418973 | Ellis et al. | May 1995 | A |
5430841 | Tannenbaum et al. | Jul 1995 | A |
5430884 | Beard et al. | Jul 1995 | A |
5432905 | Hsieh et al. | Jul 1995 | A |
5448496 | Butts et al. | Sep 1995 | A |
5498975 | Cliff et al. | Mar 1996 | A |
5513144 | O'Toole | Apr 1996 | A |
5513354 | Dwork et al. | Apr 1996 | A |
5517666 | Ohtani et al. | May 1996 | A |
5522080 | Harney | May 1996 | A |
5530457 | Helgeson | Jun 1996 | A |
5560030 | Guttag et al. | Sep 1996 | A |
5561808 | Kuma et al. | Oct 1996 | A |
5574847 | Eckart et al. | Nov 1996 | A |
5574944 | Stager | Nov 1996 | A |
5578976 | Yao | Nov 1996 | A |
5627988 | Oldfield | May 1997 | A |
5634107 | Yumoto et al. | May 1997 | A |
5638946 | Zavracky | Jun 1997 | A |
5644753 | Ebrahim et al. | Jul 1997 | A |
5649173 | Lentz | Jul 1997 | A |
5666169 | Ohki et al. | Sep 1997 | A |
5682552 | Kuboki et al. | Oct 1997 | A |
5682554 | Harrell | Oct 1997 | A |
5706478 | Dye | Jan 1998 | A |
5754191 | Mills et al. | May 1998 | A |
5761476 | Martell | Jun 1998 | A |
5764243 | Baldwin | Jun 1998 | A |
5766979 | Budnaitis | Jun 1998 | A |
5784590 | Cohen et al. | Jul 1998 | A |
5784640 | Asghar et al. | Jul 1998 | A |
5796974 | Goddard et al. | Aug 1998 | A |
5802574 | Atallah et al. | Sep 1998 | A |
5809524 | Singh et al. | Sep 1998 | A |
5812147 | Van Hook et al. | Sep 1998 | A |
5835788 | Blumer et al. | Nov 1998 | A |
5848254 | Hagersten | Dec 1998 | A |
5909595 | Rosenthal et al. | Jun 1999 | A |
5913218 | Carney et al. | Jun 1999 | A |
5920352 | Inoue | Jul 1999 | A |
5925124 | Hilgendorf et al. | Jul 1999 | A |
5940090 | Wilde | Aug 1999 | A |
5940858 | Green | Aug 1999 | A |
5949410 | Fung | Sep 1999 | A |
5950012 | Shiell et al. | Sep 1999 | A |
5956252 | Lau et al. | Sep 1999 | A |
5978838 | Mohamed et al. | Nov 1999 | A |
5996996 | Brunelle | Dec 1999 | A |
5999199 | Larson | Dec 1999 | A |
5999990 | Sharrit et al. | Dec 1999 | A |
6009454 | Dummermuth | Dec 1999 | A |
6016474 | Kim et al. | Jan 2000 | A |
6041399 | Terada et al. | Mar 2000 | A |
6049672 | Shiell et al. | Apr 2000 | A |
6049870 | Greaves | Apr 2000 | A |
6065131 | Andrews et al. | May 2000 | A |
6067262 | Irrinki et al. | May 2000 | A |
6069540 | Berenz et al. | May 2000 | A |
6072686 | Yarbrough | Jun 2000 | A |
6073158 | Nally et al. | Jun 2000 | A |
6092094 | Ireton | Jul 2000 | A |
6094116 | Tai et al. | Jul 2000 | A |
6108766 | Hahn et al. | Aug 2000 | A |
6112019 | Chamdani et al. | Aug 2000 | A |
6131152 | Ang et al. | Oct 2000 | A |
6141740 | Mahalingaiah et al. | Oct 2000 | A |
6144392 | Rogers | Nov 2000 | A |
6150610 | Sutton | Nov 2000 | A |
6189068 | Witt et al. | Feb 2001 | B1 |
6192073 | Reader et al. | Feb 2001 | B1 |
6192458 | Arimilli et al. | Feb 2001 | B1 |
6208361 | Gossett | Mar 2001 | B1 |
6209078 | Chiang et al. | Mar 2001 | B1 |
6219628 | Kodosky et al. | Apr 2001 | B1 |
6222552 | Haas et al. | Apr 2001 | B1 |
6230254 | Senter et al. | May 2001 | B1 |
6239810 | Van Hook et al. | May 2001 | B1 |
6247094 | Kumar et al. | Jun 2001 | B1 |
6249288 | Campbell | Jun 2001 | B1 |
6252610 | Hussain | Jun 2001 | B1 |
6255849 | Mohan | Jul 2001 | B1 |
6292886 | Makineni et al. | Sep 2001 | B1 |
6301600 | Petro et al. | Oct 2001 | B1 |
6307169 | Sun et al. | Oct 2001 | B1 |
6314493 | Luick | Nov 2001 | B1 |
6317819 | Morton | Nov 2001 | B1 |
6351808 | Joy et al. | Feb 2002 | B1 |
6363285 | Wey | Mar 2002 | B1 |
6363295 | Akram et al. | Mar 2002 | B1 |
6370617 | Lu et al. | Apr 2002 | B1 |
6437789 | Tidwell et al. | Aug 2002 | B1 |
6438664 | McGrath et al. | Aug 2002 | B1 |
6476808 | Kuo et al. | Nov 2002 | B1 |
6480927 | Bauman | Nov 2002 | B1 |
6490654 | Wickeraad et al. | Dec 2002 | B2 |
6496193 | Surti et al. | Dec 2002 | B1 |
6496902 | Faanes et al. | Dec 2002 | B1 |
6499090 | Hill et al. | Dec 2002 | B1 |
6525737 | Duluk, Jr. et al. | Feb 2003 | B1 |
6529201 | Ault et al. | Mar 2003 | B1 |
6545683 | Williams | Apr 2003 | B1 |
6597357 | Thomas | Jul 2003 | B1 |
6603481 | Kawai et al. | Aug 2003 | B1 |
6624818 | Mantor et al. | Sep 2003 | B1 |
6631423 | Brown et al. | Oct 2003 | B1 |
6631463 | Floyd et al. | Oct 2003 | B1 |
6657635 | Hutchins et al. | Dec 2003 | B1 |
6658447 | Cota-Robles | Dec 2003 | B2 |
6674841 | Johns et al. | Jan 2004 | B1 |
6690381 | Hussain et al. | Feb 2004 | B1 |
6700588 | MacInnis et al. | Mar 2004 | B1 |
6715035 | Colglazier et al. | Mar 2004 | B1 |
6732242 | Hill et al. | May 2004 | B2 |
6750870 | Olarig | Jun 2004 | B2 |
6809732 | Zatz et al. | Oct 2004 | B2 |
6812929 | Lavelle et al. | Nov 2004 | B2 |
6825848 | Fu et al. | Nov 2004 | B1 |
6839062 | Aronson et al. | Jan 2005 | B2 |
6862027 | Andrews et al. | Mar 2005 | B2 |
6891543 | Wyatt | May 2005 | B2 |
6915385 | Leasure et al. | Jul 2005 | B1 |
6944744 | Ahmed et al. | Sep 2005 | B2 |
6952214 | Naegle et al. | Oct 2005 | B2 |
6965982 | Nemawarkar | Nov 2005 | B2 |
6975324 | Valmiki et al. | Dec 2005 | B1 |
6976126 | Clegg et al. | Dec 2005 | B2 |
6978149 | Morelli et al. | Dec 2005 | B1 |
6978457 | Johl et al. | Dec 2005 | B1 |
6981106 | Bauman et al. | Dec 2005 | B1 |
6985151 | Bastos et al. | Jan 2006 | B1 |
7015909 | Morgan, III et al. | Mar 2006 | B1 |
7031330 | Bianchini, Jr. | Apr 2006 | B1 |
7032097 | Alexander et al. | Apr 2006 | B2 |
7035979 | Azevedo et al. | Apr 2006 | B2 |
7148888 | Huang | Dec 2006 | B2 |
7151544 | Emberling | Dec 2006 | B2 |
7154500 | Heng et al. | Dec 2006 | B2 |
7159212 | Schenk et al. | Jan 2007 | B2 |
7185178 | Barreh et al. | Feb 2007 | B1 |
7202872 | Paltashev et al. | Apr 2007 | B2 |
7260677 | Vartti et al. | Aug 2007 | B1 |
7305540 | Trivedi et al. | Dec 2007 | B1 |
7321787 | Kim | Jan 2008 | B2 |
7334110 | Faanes et al. | Feb 2008 | B1 |
7369815 | Kang et al. | May 2008 | B2 |
7373478 | Yamazaki | May 2008 | B2 |
7406698 | Richardson | Jul 2008 | B2 |
7412570 | Moll et al. | Aug 2008 | B2 |
7486290 | Kilgariff et al. | Feb 2009 | B1 |
7487305 | Hill et al. | Feb 2009 | B2 |
7493452 | Eichenberger et al. | Feb 2009 | B2 |
7545381 | Huang et al. | Jun 2009 | B2 |
7564460 | Boland et al. | Jul 2009 | B2 |
7750913 | Parenteau et al. | Jul 2010 | B1 |
7777748 | Bakalash et al. | Aug 2010 | B2 |
7852341 | Rouet et al. | Dec 2010 | B1 |
7869835 | Zu | Jan 2011 | B1 |
8020169 | Yamasaki | Sep 2011 | B2 |
8416251 | Gadre et al. | Apr 2013 | B2 |
8424012 | Karandikar et al. | Apr 2013 | B1 |
8493396 | Karandikar et al. | Jul 2013 | B2 |
8493397 | Su et al. | Jul 2013 | B1 |
8683184 | Lew et al. | Mar 2014 | B1 |
8687008 | Karandikar et al. | Apr 2014 | B2 |
8698817 | Gadre et al. | Apr 2014 | B2 |
8711161 | Scotzniovsky et al. | Apr 2014 | B1 |
8725990 | Karandikar et al. | May 2014 | B1 |
8736623 | Lew et al. | May 2014 | B1 |
8738891 | Karandikar et al. | May 2014 | B1 |
20010026647 | Morita | Oct 2001 | A1 |
20020005729 | Leedy | Jan 2002 | A1 |
20020026623 | Morooka | Feb 2002 | A1 |
20020031025 | Shimano et al. | Mar 2002 | A1 |
20020085000 | Sullivan et al. | Jul 2002 | A1 |
20020087833 | Burns et al. | Jul 2002 | A1 |
20020116595 | Morton | Aug 2002 | A1 |
20020130874 | Baldwin | Sep 2002 | A1 |
20020144061 | Faanes et al. | Oct 2002 | A1 |
20020158869 | Ohba et al. | Oct 2002 | A1 |
20020194430 | Cho | Dec 2002 | A1 |
20030001847 | Doyle et al. | Jan 2003 | A1 |
20030001857 | Doyle | Jan 2003 | A1 |
20030003943 | Bajikar | Jan 2003 | A1 |
20030014457 | Desai et al. | Jan 2003 | A1 |
20030016217 | Vlachos et al. | Jan 2003 | A1 |
20030016844 | Numaoka | Jan 2003 | A1 |
20030020173 | Huff et al. | Jan 2003 | A1 |
20030031258 | Wang et al. | Feb 2003 | A1 |
20030051091 | Leung et al. | Mar 2003 | A1 |
20030061409 | RuDusky | Mar 2003 | A1 |
20030067473 | Taylor et al. | Apr 2003 | A1 |
20030080963 | Van Hook et al. | May 2003 | A1 |
20030093506 | Oliver et al. | May 2003 | A1 |
20030115500 | Akrout et al. | Jun 2003 | A1 |
20030169269 | Sasaki et al. | Sep 2003 | A1 |
20030172326 | Coffin, III et al. | Sep 2003 | A1 |
20030188118 | Jackson | Oct 2003 | A1 |
20030204673 | Venkumahanti et al. | Oct 2003 | A1 |
20030204680 | Hardage, Jr. | Oct 2003 | A1 |
20030227461 | Hux et al. | Dec 2003 | A1 |
20040012597 | Zatz et al. | Jan 2004 | A1 |
20040073771 | Chen et al. | Apr 2004 | A1 |
20040073773 | Demjanenko | Apr 2004 | A1 |
20040103253 | Kamei et al. | May 2004 | A1 |
20040193837 | Devaney et al. | Sep 2004 | A1 |
20040205281 | Lin et al. | Oct 2004 | A1 |
20040205326 | Sindagi et al. | Oct 2004 | A1 |
20040212730 | MacInnis et al. | Oct 2004 | A1 |
20040215887 | Starke | Oct 2004 | A1 |
20040221117 | Shelor | Nov 2004 | A1 |
20040263519 | Andrews et al. | Dec 2004 | A1 |
20050012749 | Gonzalez et al. | Jan 2005 | A1 |
20050012759 | Valmiki et al. | Jan 2005 | A1 |
20050024369 | Xie | Feb 2005 | A1 |
20050060601 | Gomm | Mar 2005 | A1 |
20050071722 | Biles | Mar 2005 | A1 |
20050088448 | Hussain et al. | Apr 2005 | A1 |
20050140682 | Sumanaweera et al. | Jun 2005 | A1 |
20050239518 | D'Agostino et al. | Oct 2005 | A1 |
20050262332 | Rappoport et al. | Nov 2005 | A1 |
20050280652 | Hutchins et al. | Dec 2005 | A1 |
20060020843 | Frodsham et al. | Jan 2006 | A1 |
20060064517 | Oliver | Mar 2006 | A1 |
20060064547 | Kottapalli et al. | Mar 2006 | A1 |
20060103659 | Karandikar et al. | May 2006 | A1 |
20060152519 | Hutchins et al. | Jul 2006 | A1 |
20060152520 | Gadre et al. | Jul 2006 | A1 |
20060176308 | Karandikar et al. | Aug 2006 | A1 |
20060176309 | Gadre et al. | Aug 2006 | A1 |
20070076010 | Swamy et al. | Apr 2007 | A1 |
20070130444 | Mitu et al. | Jun 2007 | A1 |
20070285427 | Morein et al. | Dec 2007 | A1 |
20080016327 | Menon et al. | Jan 2008 | A1 |
20080278509 | Washizu et al. | Nov 2008 | A1 |
20090235051 | Codrescu et al. | Sep 2009 | A1 |
20120023149 | Kinsman et al. | Jan 2012 | A1 |
Number | Date | Country |
---|---|---|
07-101885 | Apr 1995 | JP |
H08-077347 | Mar 1996 | JP |
H08-153032 | Jun 1996 | JP |
08-297605 | Dec 1996 | JP |
09-287217 | Oct 1997 | JP |
09-287217 | Nov 1997 | JP |
H09-325759 | Dec 1997 | JP |
10-222476 | Aug 1998 | JP |
11-190447 | Jul 1999 | JP |
2000-148695 | May 2000 | JP |
2001-022638 | Jan 2001 | JP |
2003-178294 | Jun 2003 | JP |
2004-252990 | Sep 2004 | JP |
1998-018215 | Aug 2000 | KR |
413766 | Dec 2000 | TW |
436710 | May 2001 | TW |
442734 | Jun 2001 | TW |
093127712 | Jul 2005 | TW |
Entry |
---|
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1997, p. 8-1. |
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1999, pp. 8-1, 9-1. |
Intel, Pentium Processor Family Developer's Manual, 1997, pp. 2-13. |
Fisher, Joseph A., Very Long Instruction Word Architecture and the ELI-512, ACM, 1993, pp. 140-150. |
Hamacher, V. Carl et al., Computer Organization, Second Edition, McGraw Hill, 1984, pp. 1-9. |
Kozyrakis, “A Media enhanced vector architecture for embedded memory systems,” Jul. 1999, http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-99-1059.pdf. |
Brown, Brian; “Data Structure And Number Systems”; 2000; http://www.ibilce.unesp.br/courseware/datas/data3.htm. |
“Alpha Testing State”; http://msdn.microsoft.com/library/en-us/directx9_c/directx/graphics/programmingguide/GettingStarted/Direct3Kdevices/States/renderstates/alphatestingstate.asp. |
“Anti-aliasing”; http://en.wikipedia.org/wiki/Anti-aliasing. |
“Vertex Fog”; http://msdn.microsoft.com/library/en-us/directx9_c/Vertex_fog.asp?frame=true. |
NVIDIA Corporation, Technical Brief: Transform and Lighting; dated 1999; month unknown. |
Graham, Susan L. et al., Getting Up to Speed: The future of Supercomputing, the National Academies Press, 2005, glossary. |
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 102 and 338 (NVID-P001502). |
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 305. |
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1988, pp. 273. |
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1984, pp. 566. |
Graston et al. (Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture); Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems; pp. 138-144; Year of Publication: 2001. |
Duca et al., A Relational Debugging Engine for Graphics Pipeline, International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2005, pp. 453-463, ISSN:0730-0301. |
Gadre, S., Patent Application Entitled “Video Processor Having Scalar and Vector Components with Command FIFO for Passing Function Calls from Scalar to Vector”, U.S. Appl. No. 11/267,700, filed Nov. 4, 2005. |
Gadre, S., Patent Application Entitled “Stream Processing in a Video Processor”, U.S. Appl. No. 11/267,599, filed Nov. 4, 2005. |
Karandikar et al., Patent Application Entitled: “Multidimensional Datapath Processing in a Video Processor”, U.S. Appl. No. 11/267,638, filed Nov. 4, 2005. |
Karandikar et al., Patent Application Entitled: “A Latency Tolerant System for Executing Video Processing Operations”, U.S. Appl. No. 11/267,875, filed Nov. 4, 2005. |
Gadre, S., Patent Application Entitled “Separately Schedulable Condition Codes For a Video Processor”, U.S. Appl. No. 11/267,793, filed Nov. 4, 2005. |
Lew, et al., Patent Application Entitled “A Programmable DMA Engine for Implementing Memory Transfers for a Video Processor”, U.S. Appl. No. 11/267,777, filed Nov. 4, 2005. |
Karandikar et al., Patent Application Entitled: “A Pipelined L2 Cache for Memory Transfers for a Video Processor”, U.S. Appl. No. 11/267,606, filed Nov. 4, 2005. |
Karandikar, et al., Patent Application Entitled: “Command Acceleration in a Video Processor”, U.S. Appl. No. 11/267,640, filed Nov. 4, 2005. |
Karandikar, et al., Patent Application Entitled “A Configurable SIMD Engine in a Video Processor”, U.S. Appl. No. 11/267,393, filed Nov. 4, 2005. |
Karandikar, et al., Patent Application Entitled “Context Switching on a Video Processor Having a Scalar Execution Unit and a Vector Execution Unit”, U.S. Appl. No. 11/267,778, filed Nov. 4, 2005. |
Lew, et al., Patent Application Entitled “Multi Context Execution on a Video Processor”, U.S. Appl. No. 11/267,780, filed Nov. 4, 2005. |
Su, Z, et al., Patent Application Entitled: “State Machine Control for a Pipelined L2 Cache to Implement Memory Transfers for a Video Processor”, U.S. Appl. No. 11/267,119, filed Nov. 4, 2005. |
Free On-Line Dictionary of Computing (FOLDOC), definition of “video”, from foldoc.org/index.cgi?query=video&action=Search, May 23, 2008. |
FOLDOC, definition of “frame buffer”, from foldoc.org/index.cgi?query=frame+buffer&action=Search, Oct. 3, 1997. |
FOLDOC, definition of “motherboard”, from foldoc.org/index.cgi?query=motherboard&action=Search, Aug. 10, 2000. |
FOLDOC, definition of “separate compilation”, from foldoc.org/index.cgi?query=separate+compilation&action=Search, Feb. 19, 2005. |
FOLDOC, definition of “vector processor”, http://foldoc.org/, Sep. 11, 2003. |
FOLDOC (Free On-Line Dictionary of Computing), definition of X86, Feb. 27, 2004. |
FOLDOC, definition of “superscalar,” http://foldoc.org/, Jun. 22, 2009. |
FOLDOC, definition of Pentium, Sep. 30, 2003. |
Wikipedia, definition of “scalar processor,” Apr. 4, 2009. |
Wikipedia, entry page defining term “SIMD”, last modified Mar. 17, 2007. |
FOLDOC, Free Online Dictionary of Computing, definition of SIMD, foldoc.org/index.cgi?query=simd&action=Search, Nov. 4, 1994. |
Definition of “queue” from Free On-Line Dictionary of Computing (FOLDOC), http://foldoc.org/index.cgi?query=queue&action=Search, May 15, 2007. |
Definition of “first-in first-out” from FOLDOC, http://foldoc.org/index.cgi?query=fifo&action=Search, Dec. 6, 1999. |
Definition of “block” from FOLDOC, http://foldoc.org/index.cgi?block, Sep. 23, 2004. |
Wikipedia, definition of Multiplication, accessed from en.wikipedia.org/w/index.php?title=Multiplication&oldid=1890974, published Oct. 13, 2003. |
Graham, Susan L. et al., Getting Up to Speed: The future of Supercomputing, the National Academies Press, 2005, glossary, Feb. 2005. |
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 102 and 338 (NVID-P001502), Dec. 1987. |
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 305, Dec. 1987. |
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1988, pp. 273, Dec. 1988. |
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1984, pp. 566, Dec. 1988. |
Wikipedia, definition of “subroutine”, published Nov. 29, 2003, four pages. |
Graston et al. (Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture); Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems; pp. 138-144; Year of Publication: 2001, Oct. 2001. |
SearchStorage.com Definitions, “Pipeline Burst Cache,” Jul. 31, 2001, url: http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci214414,00.html. |
Parhami, Behrooz, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, Jun. 2000, pp. 413-418. |
gDEBugger, graphicRemedy, http://www.gremedy.com, Aug. 8, 2006. |
Duca et al., A Relational Debugging Engine for Graphics Pipeline, International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2005, pp. 453-463, ISSN:0730-0301, Jul. 2005. |
Merriam-Webster Dictionary Online; Definition for “program”; retrieved Dec. 14, 2010. |
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1997, p. 8-1, Jan. 1997. |
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1999, pp. 8-1, 9-1, May 1999. |
Intel, Intel Pentium III Xeon Processor at 500 and 550Mhz, Feb. 1999. |
Intel, Intel MMX Technology at a Glance, Jun. 1997. |
Intel, Pentium Processor Family Developer's Manual, 1997, pp. 2-13, Oct. 199. |
Intel, Pentium processor with MMX Technology at 233Mhz Performance Brief, Jan. 1998, pp. 3 and 8. |
PCreview, article entitled “What is a Motherboard”, from www.pcreview.co.uk/articles/Hardware/What_is_a_Motherboard., Nov. 22, 2005. |
Wikipedia, definition of “vector processor”, http://en.wikipedia.org/, May 14, 2007. |
Fisher, Joseph A., Very Long Instruction Word Architecture and the ELI-512, ACM, 1993, pp. 140-150, Jun. 1993. |
Quinnell, Richard A. “New DSP Architectures Go “Post-Harvard” for Higher Performance and Flexibility” Techonline; posted May 1, 2002. |
IBM TDB, Device Queue Management, vol. 31 Iss. 10, pp. 45-50, Mar. 1, 1989. |
Hamacher, V. Carl et al., Computer Organization, Second Edition, McGraw Hill, 1984, pp. 1-9, May 1984. |
Kozyrakis, “A Media enhanced vector architecture for embedded memory systems,” Jul. 1999, http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-99/1059.pdf. |
HPL-PD A Parameterized Research Approach, May 31, 2004, http://web.archive.org/web/*/www.trimaran.org/docs/5_hpl-pd.pdf. |
Hutchins E., SC10: A Video Processor And Pixel-Shading GPU for Handheld Devices; presented at the Hot Chips conferences on Aug. 23, 2004. |
Brown, Brian; “Data Structure And Number Systems”; 2000; http://www.ibilce.unesp.br/courseware/datas/data3.htm, Mar. 2000. |
“Alpha Testing State”; http://msdn.microsoft.com/library/en-us/directx9_c/directx/graphics/programmingguide/GettingStarted/Direct3Kdevices/States/renderstates/alphatestingstate.asp, Sep. 2004. |
“Anti-aliasing”; http://en.wikipedia.org/wiki/Anti-aliasing, Mar. 2004. |
“Vertex Fog”; http://msdn.microsoft.com/library/en-us/directx9_c/Vertex_fog.asp?frame=true, Apr. 2008. |
Wilson D., NVIDIA's Tiny 90nm G71 and G73: GeForce 7900 and 7600 Debut; at http://www.anandtech.com/show/1967/2; dated Sep. 3, 2006, retrieved Jun. 16, 2011. |
Woods J., Nvidia GeForce FX Preview, at http://www.tweak3d.net/reviews/nvidia/nv30preview/1.shtml; dated Nov. 18, 2002; retrieved Jun. 16, 2011. |
NVIDIA Corporation, Technical Brief: Transform and Lighting; dated 1999; month unknown, Apr. 1999. |