Method and system for implementing fragment operation processing across a graphics bus interconnect

Information

  • Patent Grant
  • Patent Number
    9,092,170
  • Date Filed
    Tuesday, October 18, 2005
  • Date Issued
    Tuesday, July 28, 2015
  • Field of Search
    • US
    • 345/422
    • 345/501
    • 345/502
    • 345/503
    • 345/506
    • 345/520
    • 345/522
    • 345/531
    • 345/532
    • 345/568
    • 345/582
    • 710/3
    • 710/107
    • 710/126
    • 710/128
    • 710/129
    • 710/131
    • 710/132
    • 710/305
    • 710/306
    • 710/310
    • 710/316
    • 710/317
    • 711/2
    • 711/154
    • 711/206
  • International Classifications
    • G06F13/14
    • G06F15/16
    • G06F3/12
    • Term Extension
      920 days
Abstract
A method and system for cooperative graphics processing across a graphics bus in a computer system. The system includes a bridge coupled to a system memory via a system memory bus and coupled to a graphics processor via the graphics bus. The bridge includes a fragment processor for implementing cooperative graphics processing with the graphics processor coupled to the graphics bus. The fragment processor is configured to implement a plurality of raster operations on graphics data stored in the system memory.
Description
FIELD OF THE INVENTION

The present invention is generally related to graphics computer systems.


BACKGROUND OF THE INVENTION

Generally, a computer system suited to handle 3D image data includes a specialized graphics processor unit, or GPU, in addition to a traditional CPU (central processing unit). The GPU includes specialized hardware configured to handle 3D computer-generated objects. The GPU is configured to operate on a set of data models and their constituent “primitives” (usually mathematically described polygons) that define the shapes, positions, and attributes of the objects. The hardware of the GPU processes the objects, implementing the calculations required to produce realistic 3D images on a display of the computer system.


The performance of a typical graphics rendering process is highly dependent upon the performance of the system's underlying hardware. High performance real-time graphics rendering requires high data transfer bandwidth to the memory storing the 3D object data and the constituent primitives. Thus, more expensive prior art GPU subsystems (e.g., GPU equipped graphics cards) typically include larger (e.g., 128 MB or larger) specialized, expensive, high bandwidth local graphics memories for feeding the required data to the GPU. Less expensive prior art GPU subsystems include smaller (e.g., 64 MB or less) such local graphics memories, and some of the least expensive GPU subsystems have no local graphics memory.


A problem with prior art low-cost GPU subsystems (e.g., those having smaller amounts of local graphics memory) is that the data transfer bandwidth to the system memory, or main memory, of a computer system is much lower than the data transfer bandwidth to the local graphics memory. Typical GPUs with any amount of local graphics memory need to read command streams and scene descriptions from system memory. A GPU subsystem with a small or absent local graphics memory also needs to communicate with system memory in order to access and update pixel data, including pixels representing images which the GPU is constructing. This communication occurs across a graphics bus, or the bus that connects the graphics subsystem to the CPU and system memory.


In one example, per-pixel Z-depth data is read across the system bus and compared with a computed value for each pixel to be rendered. For all pixels which have a computed Z value less than the Z value read from system memory, the computed Z value and the computed pixel color value are written to system memory. In another example, pixel colors are read from system memory and blended with computed pixel colors to produce translucency effects before being written to system memory. Higher resolution images (images with a greater number of pixels) require more system memory bandwidth to render. Images representing larger numbers of 3D objects require more system memory bandwidth to render. The low data transfer bandwidth of the graphics bus acts as a bottleneck on overall graphics rendering performance.
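
To make this traffic concrete, the following is a minimal C sketch (not taken from the patent; the frame buffer layout and names are illustrative assumptions) of the per-pixel Z test just described, where every candidate pixel costs a read across the bus and every passing pixel costs a write back:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative frame buffer entry; layout and types are assumptions. */
typedef struct {
    float    z;      /* stored Z-depth               */
    uint32_t color;  /* stored color, 8 bits/channel */
} Pixel;

/* Per-pixel Z test: each iteration reads fb[i].z across the bus, and
 * each passing pixel writes a Z value and a color value back. */
void z_test_and_write(Pixel *fb, size_t n,
                      const float *computed_z,
                      const uint32_t *computed_color)
{
    for (size_t i = 0; i < n; i++) {
        if (computed_z[i] < fb[i].z) {    /* read from system memory */
            fb[i].z     = computed_z[i];  /* write back              */
            fb[i].color = computed_color[i];
        }
    }
}
```

Translucency blending behaves the same way: a read of the existing color, a blend, and a write, so bus traffic scales with both image resolution and scene complexity.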


Thus, what is required is a solution capable of reducing the limitations imposed by the limited data transfer bandwidth of a graphics bus of a computer system. What is required is a solution that ameliorates the bottleneck imposed by the much smaller data transfer bandwidth of the graphics bus in comparison to the data transfer bandwidth of the GPU to local graphics memory. The present invention provides a novel solution to the above requirement.


SUMMARY OF THE INVENTION

Embodiments of the present invention ameliorate the bottleneck imposed by the much smaller data transfer bandwidth of the graphics bus in comparison to the data transfer bandwidth of the GPU to local graphics memory.


In one embodiment, the present invention is implemented as a system for cooperative graphics processing across a graphics bus in a computer system. The system includes a bridge coupled to a system memory via a system memory bus. The bridge is also coupled to a GPU (graphics processor unit) via the graphics bus. The bridge includes a fragment processor for implementing cooperative graphics processing with the GPU coupled to the graphics bus. The fragment processor is configured to implement a plurality of raster operations on graphics data stored in the system memory. The graphics bus interconnect coupling the bridge to the GPU can be an AGP-based interconnect or a PCI Express-based interconnect. The GPU can be an add-in card-based GPU or can be a discrete integrated circuit device mounted (e.g., surface mounted, etc.) on the same printed circuit board (e.g., motherboard) as the bridge.


In one embodiment, the graphics data stored in the system memory comprises a frame buffer used by both the fragment processor and the GPU. One mode of cooperative graphics processing involves the fragment processor implementing frame buffer blending on the graphics data in the system memory (e.g., the frame buffer). In one embodiment, the fragment processor is configured to implement multi-sample expansion on graphics data received from the GPU and store the resulting expanded data in the system memory frame buffer. In one embodiment, the fragment processor is configured to evaluate a Z plane equation coverage value for a plurality of pixels (e.g., per polygon) stored in the system memory, wherein the Z plane equation coverage value is received from the GPU via the graphics bus.


In this manner, embodiments of the present invention implement a much more efficient use of the limited data transfer bandwidth of the graphics bus interconnect, and thus dramatically improve overall graphics rendering performance in comparison to the prior art. Furthermore, the benefits provided by the embodiments of the present invention are even more evident in those architectures which primarily utilize system memory for frame buffer graphics data storage.





BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.



FIG. 1 shows a computer system in accordance with one embodiment of the present invention.



FIG. 2 shows a diagram depicting fragment operation processing as implemented by a computer system in accordance with one embodiment of the present invention.



FIG. 3 shows a diagram depicting fragment processing operations executed by the fragment processor and the rendering operations executed by the GPU within a cooperative graphics rendering process in accordance with one embodiment of the present invention.



FIG. 4 shows a diagram depicting information that is transferred from the GPU to the fragment processor and to the frame buffer in accordance with one embodiment of the present invention.



FIG. 5 shows a flowchart of the steps of a cooperative graphics rendering process in accordance with one embodiment of the present invention.





DETAILED DESCRIPTION OF THE INVENTION

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.


Notation and Nomenclature:


Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.


It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 100 of FIG. 1), or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.


Computer System Platform:



FIG. 1 shows a computer system 100 in accordance with one embodiment of the present invention. Computer system 100 depicts the components of a basic computer system in accordance with embodiments of the present invention providing the execution platform for certain hardware-based and software-based functionality. In general, computer system 100 comprises at least one CPU 101, a system memory 115, and at least one graphics processor unit (GPU) 110. The CPU 101 can be coupled to the system memory 115 via the bridge component 105 or can be directly coupled to the system memory 115 via a memory controller internal to the CPU 101. The GPU 110 is coupled to a display 112. System 100 can be implemented as, for example, a desktop computer system or server computer system, having a powerful general-purpose CPU 101 coupled to a dedicated graphics rendering GPU 110. In such an embodiment, components would be included that are designed to add peripheral buses, specialized graphics memory, I/O devices (e.g., disk drive 112), and the like. The bridge component 105 also supports expansion buses coupling the disk drive 112.


It should be appreciated that although the GPU 110 is depicted in FIG. 1 as a discrete component, the GPU 110 can be implemented as a discrete graphics card designed to couple to the computer system via a graphics bus connection (e.g., AGP, PCI Express, etc.), as a discrete integrated circuit die (e.g., mounted directly on the motherboard), or as an integrated GPU included within the integrated circuit die of a computer system chipset component (e.g., integrated within the bridge chip 105). Additionally, a local graphics memory 111 can optionally be included for the GPU 110 for high bandwidth graphics data storage. It also should be noted that although the bridge component 105 is depicted as a discrete component, the bridge component 105 can be implemented as an integrated controller within a different component (e.g., within the CPU 101, GPU 110, etc.) of the computer system 100. Similarly, system 100 can be implemented as a set-top video game console device such as, for example, the Xbox®, available from Microsoft Corporation of Redmond, Wash.


EMBODIMENTS OF THE PRESENT INVENTION

Referring still to FIG. 1, embodiments of the present invention reduce constraints imposed by the limited data transfer bandwidth of a graphics bus (e.g., graphics bus 120) of a computer system. Embodiments of the present invention ameliorate the bottleneck imposed by the much smaller data transfer bandwidth of the graphics bus 120 in comparison to the data transfer bandwidth of the system memory bus 121 to system memory 115. This is accomplished in part by the bridge 105 implementing cooperative graphics processing in conjunction with the GPU 110 to reduce the amount of data that must be transferred across the graphics bus 120 during graphics rendering operations. As shown in FIG. 1, the bridge component 105 is a core logic chipset component that provides core logic functions for the computer system 100.


The cooperative graphics processing reduces the total amount of data that must be transferred across the bandwidth constrained graphics bus 120. By performing certain graphics rendering operations within the bridge component 105, the comparatively high bandwidth system memory bus 121 can be used to access the graphics data 116, reducing the amount of data access latency experienced by these rendering operations. The resulting reduction in access latency, and increase in transfer bandwidth, allows the overall graphics rendering operations to proceed more efficiently, thereby increasing the performance of bandwidth-demanding 3D rendering applications. This cooperative graphics rendering process is described in further detail in FIG. 2 below.



FIG. 2 shows a diagram depicting fragment operation processing in accordance with one embodiment of the present invention. As depicted in FIG. 2, the GPU 110 is coupled to the bridge 105 via the low bandwidth graphics bus 120. The bridge 105 is further coupled to the system memory 115 via the high bandwidth system memory bus 121. FIG. 2 depicts a configuration whereby the system memory 115 is used as frame buffer memory (e.g., graphics data 116) for the computer system (as opposed to a local graphics memory). The bridge 105 is configured to access the graphics data 116 via the high bandwidth system memory bus 121. The bridge 105 is also coupled to the GPU 110 via the graphics bus 120.


The bridge 105 includes a fragment processor 201 for implementing cooperative graphics processing with the GPU 110. The fragment processor 201 is configured to implement a plurality of raster operations on graphics data stored in the system memory. These raster operations executed by the fragment processor 201 suffer a much lower degree of latency in comparison to raster operations performed by the GPU 110. This is due to both the higher data transfer bandwidth of the system memory bus 121 and the shorter communications path (e.g., lower data access latency) between the fragment processor 201 and the graphics data 116 within the system memory 115.


Performing a portion of the raster operations, or all of the raster operations, required for graphics rendering in the fragment processor 201 reduces the amount of graphics data accesses (e.g., both reads and writes) that must be performed by the GPU 110. For example, by implementing fragment operations within the fragment processor 201, accesses to fragment data (e.g., graphics data 116) required for iterating fragment colors across multiple pixels can be performed across the high bandwidth system memory bus 121. For example, fragment data can be accessed, iterated across multiple pixels, and the resulting pixel color values can be stored back into the system memory 115 all across the high bandwidth system memory bus 121. The interpolation and iteration functions can be executed by the fragment processor 201 in conjunction with an internal RAM 215. Such fragment processing operations comprise a significant portion of the rendering accesses to the graphics data 116. Implementing them using a fragment processor within the bridge will effectively remove such traffic from the low bandwidth graphics bus 120.
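
As a rough sketch of the iteration work that moves onto the system memory bus, a fragment color might be interpolated and iterated across a span of pixels as follows (the span representation and names are illustrative assumptions, not the fragment processor's actual datapath):

```c
#include <stddef.h>
#include <stdint.h>

/* Interpolate one 8-bit channel between two endpoint values. */
static uint8_t lerp_u8(uint8_t a, uint8_t b, float t)
{
    return (uint8_t)((float)a + t * ((float)b - (float)a));
}

/* Iterate a fragment color across a horizontal span of pixels,
 * interpolating from color c0 at x0 to color c1 at x1. Every
 * read-modify-write here stays on the high bandwidth memory bus. */
void iterate_span(uint32_t *fb_row, size_t x0, size_t x1,
                  uint32_t c0, uint32_t c1)
{
    for (size_t x = x0; x < x1; x++) {
        float t = (float)(x - x0) / (float)(x1 - x0);
        uint8_t r = lerp_u8((uint8_t)(c0 >> 16), (uint8_t)(c1 >> 16), t);
        uint8_t g = lerp_u8((uint8_t)(c0 >> 8),  (uint8_t)(c1 >> 8),  t);
        uint8_t b = lerp_u8((uint8_t)c0,         (uint8_t)c1,         t);
        fb_row[x] = ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
    }
}
```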


In one embodiment, the fragment processor 201 substantially replaces the functionality of the fragment processor 205 within the GPU 110. In this manner, the incorporation of the fragment processor 201 renders the fragment processor 205 within the GPU 110 optional. For example, in one embodiment, the GPU 110 is an off-the-shelf card-based GPU that is detachably connected to the computer system 100 via a graphics bus interconnect slot (e.g., AGP slot, PCI Express slot, etc.). Such an off-the-shelf card-based GPU would typically incorporate its own one or more fragment processors for use in those systems having a conventional prior art type bridge. The graphics bus interconnect can be an AGP-based interconnect or a PCI Express-based interconnect. The GPU 110 can be an add-in card-based GPU or can be a discrete integrated circuit device mounted (e.g., surface mounted, etc.) on the same printed circuit board (e.g., motherboard) as the bridge 105. When connected to the bridge 105 of the computer system 100 embodiment, the included fragment processor(s) can be disabled (e.g., by the graphics driver).


Alternatively, the GPU 110 can be configured specifically for use with a bridge component having an internal fragment processor (e.g., fragment processor 201). Such a configuration provides advantages in that the GPU integrated circuit die area that would otherwise be dedicated to an internal fragment processor can be saved (e.g., thereby reducing GPU costs) or used for other purposes. In this manner, the inclusion of a fragment processor 205 within the GPU 110 is optional.


In one embodiment, the internal fragment processor 205 within the GPU 110 can be used by the graphics driver in conjunction with the fragment processor 201 within the bridge 105 to implement concurrent raster operations within both components. In such an embodiment, the graphics driver would allocate some portion of fragment processing to the fragment processor 201 and the remainder to the fragment processor 205. The graphics driver would balance the processing workloads between the bridge component 105 and the GPU 110 to best utilize the high bandwidth low latency connection of the bridge component 105 to the system memory 115. For example, to best utilize the high bandwidth system memory bus 121, it would be preferable to implement as large a share as possible of the fragment processing workloads within the bridge component 105 (e.g., fragment processor 201). This would ensure as large a percentage of the fragment operations as is practical are implemented using the low latency high bandwidth system memory bus 121. The remaining fragment processing workloads would be allocated to the fragment processor 205 of the GPU 110.
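
A driver-side split of the kind described might look like the following sketch; the capacity model and names are illustrative assumptions rather than an actual driver interface:

```c
#include <stddef.h>

/* Hypothetical policy: allocate as large a share of the fragment
 * workload as possible to the bridge's fragment processor, and send
 * only the remainder to the GPU's own fragment processor. */
typedef struct {
    size_t bridge_capacity;  /* fragments per frame the bridge can absorb */
} DriverPolicy;

void split_fragment_work(const DriverPolicy *policy, size_t total,
                         size_t *to_bridge, size_t *to_gpu)
{
    *to_bridge = (total < policy->bridge_capacity)
                     ? total
                     : policy->bridge_capacity;  /* fill the bridge first */
    *to_gpu = total - *to_bridge;                /* remainder to the GPU  */
}
```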


Implementing fragment processing operations within the bridge component 105 provides an additional benefit in that the amount of integrated circuit die area within the GPU 110 that must be dedicated to “bookkeeping” can be reduced. Bookkeeping logic is used by conventional GPUs to keep track of accesses to the graphics data 116 that are “in-flight”. Such in-flight data accesses are used to hide the latency the GPU 110 experiences when reading or writing to the graphics data 116 across the low bandwidth graphics bus 120. In general, in-flight data accesses refer to a queued number of data reads or data writes that are issued to the graphics data 116 that have been initiated, but whose results have yet to be received.


Bookkeeping logic is used to keep track of such in-flight accesses and, for example, to make sure storage is on hand when read results from the graphics data 116 arrive and to ensure the graphics data 116 is not corrupted when multiple writes have been issued. The more complex the bookkeeping logic, the more in-flight data accesses the GPU can maintain, and thus, the more the effects of the high latency can be hidden. By offloading fragment processing operations to the bridge 105 (e.g., fragment processor 201), the demands placed on any bookkeeping logic within the GPU 110 are reduced.
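
The bookkeeping can be pictured as a small table of outstanding requests, as in the simplified sketch below (queue depth and names are assumed for illustration; real GPUs track considerably more state):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN_FLIGHT 16  /* assumed depth; deeper tables hide more latency */

/* One issued-but-not-completed frame buffer read. */
typedef struct {
    uint64_t address;
    bool     valid;
} InFlightRead;

typedef struct {
    InFlightRead slots[MAX_IN_FLIGHT];
    size_t       count;
} Bookkeeping;

/* Issue a read only if a slot is free; a full table means no further
 * latency can be hidden and the pipeline must stall. */
bool try_issue_read(Bookkeeping *bk, uint64_t address)
{
    if (bk->count == MAX_IN_FLIGHT)
        return false;
    for (size_t i = 0; i < MAX_IN_FLIGHT; i++) {
        if (!bk->slots[i].valid) {
            bk->slots[i].address = address;
            bk->slots[i].valid   = true;
            bk->count++;
            return true;
        }
    }
    return false;
}

/* Retire a read when its result arrives; the slot guarantees storage
 * is on hand to receive the returned data. */
void complete_read(Bookkeeping *bk, uint64_t address)
{
    for (size_t i = 0; i < MAX_IN_FLIGHT; i++) {
        if (bk->slots[i].valid && bk->slots[i].address == address) {
            bk->slots[i].valid = false;
            bk->count--;
            return;
        }
    }
}
```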


In this manner, embodiments of the present invention implement a much more efficient use of the limited data transfer bandwidth of the graphics bus interconnect, and thus greatly improve overall graphics rendering performance in comparison to prior art architectures. Furthermore, the benefits provided by the embodiments of the present invention are even more evident in those architectures which primarily utilize system memory for frame buffer graphics data storage.



FIG. 3 shows a diagram depicting fragment processing operations executed by the fragment processor 201 and the rendering operations executed by the GPU 110 within a cooperative graphics rendering process in accordance with one embodiment of the present invention. As depicted in FIG. 3, the fragment processor 201 implements its accesses to the frame buffer 310 via the high bandwidth system memory bus 121. The GPU 110 implements its accesses to the frame buffer 310 via the low bandwidth graphics bus 120.


The FIG. 3 embodiment shows the functions included within the fragment processor 201. As shown in FIG. 3, the fragment processor 201 includes raster operations such as frame buffer blending 321 and Z buffer blending 322, as well as other operations such as compression 323 and multi-sample expansion 324. For example, the frame buffer blending module 321 blends fragment colors into the pixel data of the frame buffer 310. Generally, this involves interpolating existing pixel colors with fragment colors and iterating those resulting values across the multiple pixels of a polygon. Such frame buffer blending involves a large number of reads and writes to the frame buffer 310 and thus directly benefits from the high bandwidth and low latency afforded by the system memory bus 121.
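
A conventional source-over alpha blend conveys the read-modify-write character of this module; the blend mode and 8-bit ARGB format below are assumptions for illustration, not the module's actual operation set:

```c
#include <stdint.h>

/* Blend one 8-bit channel: src weighted by alpha, dst by (255 - alpha). */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    return (uint8_t)(((int)src * alpha + (int)dst * (255 - alpha)) / 255);
}

/* Blend a fragment color over an existing frame buffer pixel (ARGB).
 * Each blended pixel costs a frame buffer read and a write, which is
 * why performing this over the system memory bus pays off. */
uint32_t blend_pixel(uint32_t src, uint32_t dst)
{
    uint8_t a = (uint8_t)(src >> 24);
    uint8_t r = blend_channel((uint8_t)(src >> 16), (uint8_t)(dst >> 16), a);
    uint8_t g = blend_channel((uint8_t)(src >> 8),  (uint8_t)(dst >> 8),  a);
    uint8_t b = blend_channel((uint8_t)src,         (uint8_t)dst,         a);
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  | (uint32_t)b;
}
```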


The Z buffer blending module 322 evaluates depth information per fragment of a polygon and iterates the resulting depth information across the multiple pixels. As with color blending, Z buffer blending involves a large number of reads and writes to the frame buffer 310 and similarly benefits from the high bandwidth and low latency of the system memory bus 121.


In one embodiment, the fragment processor 201 is configured to use a Z plane equation coverage value to iterate depth information across multiple pixels of a polygon. In such an embodiment, the depth and orientation of a polygon in 3-D space is defined using a Z plane equation. The Z plane equation is used by the fragment processor 201 to determine depth information for each constituent pixel covered by the polygon, and is a much more compact method of describing depth information for a polygon than by using a list of Z values for each fragment of the polygon. Additional description of Z plane raster operations can be found in commonly assigned U.S. Patent Application “Z PLANE ROP” by Steve Molnar, filed on Jun. 28, 2004, Ser. No. 10/878,460, which is incorporated herein by reference in its entirety.
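
The compactness is easy to see with a plane of the form z(x, y) = A·x + B·y + C: three coefficients stand in for a per-fragment list of depths. Below is a minimal sketch of the iteration, under assumed coverage-mask and buffer layouts:

```c
/* Illustrative Z plane iteration: z(x, y) = a*x + b*y + c. The
 * coverage mask and buffer layout are assumptions for illustration. */
typedef struct {
    float a, b, c;  /* plane coefficients received from the GPU */
} ZPlane;

void iterate_z_plane(const ZPlane *p,
                     int x0, int y0, int x1, int y1, /* bounding box */
                     const unsigned char *coverage,  /* 1 = covered  */
                     float *z_out, int pitch)
{
    for (int y = y0; y < y1; y++) {
        for (int x = x0; x < x1; x++) {
            if (coverage[y * pitch + x]) {
                z_out[y * pitch + x] =
                    p->a * (float)x + p->b * (float)y + p->c;
            }
        }
    }
}
```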


The compression module 323 compresses and decompresses per-pixel data for storage in and retrieval from the frame buffer 310. In some rendering operations a given pixel can have multiple value samples, with a number of bits per sample. For example, in a case where each pixel of a display includes 8 sample points, the compression module 323 would compress/decompress the data describing those sample points for easier access to and from the frame buffer 310.
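
One simple scheme of this general kind (an assumption for illustration, not the patent's actual method) stores a single value for a pixel whenever all of its samples agree, which is common away from polygon edges:

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLES_PER_PIXEL 8  /* matches the 8-sample example above */

/* If every sample of a pixel carries the same value, one stored
 * 32-bit value can stand in for all eight. */
bool compress_pixel(const uint32_t samples[SAMPLES_PER_PIXEL],
                    uint32_t *stored_value)
{
    for (int i = 1; i < SAMPLES_PER_PIXEL; i++) {
        if (samples[i] != samples[0])
            return false;        /* samples differ: store all eight */
    }
    *stored_value = samples[0];  /* 8:1 reduction for this pixel    */
    return true;
}
```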


The multi sample expansion module 324 performs multi sample expansion operations on the fragment data. For example, depending upon the application (e.g., anti-aliasing), the multi sample expansion module 324 can expand sample information from one sample point per pixel into eight sample points per pixel. It is thus desirable to perform the sample expansion in the fragment processor 201 for storage into the frame buffer 310, as opposed to in the GPU 110.
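
Expansion itself amounts to replicating the single incoming sample into the per-pixel sample slots before the frame buffer write; the fixed 8× factor and names below are illustrative assumptions:

```c
#include <stdint.h>

/* Illustrative 8x multi-sample expansion: one 32-bit sample arriving
 * over the graphics bus becomes eight 32-bit samples written to the
 * frame buffer over the system memory bus. */
void expand_samples_8x(uint32_t sample, uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = sample;
}
```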


Referring still to FIG. 3, in the present embodiment, texture operations and lighting operations are still performed by the GPU 110 (e.g., module 331 and module 332). The texture operations and lighting operations can proceed much more quickly since the relatively limited bandwidth of the graphics bus 120 is free from the traffic that has been moved to the fragment processor 201.



FIG. 4 shows a diagram depicting information that is transferred from the GPU 110 to the fragment processor 201 and to the frame buffer 310 in accordance with one embodiment of the present invention. FIG. 4 shows a case illustrating the benefits of using the fragment processor 201 to perform Z value iteration and multi sample color expansion in accordance with one embodiment of the present invention. As shown in FIG. 4, the GPU 110 can be configured to transfer pre-expanded color values and Z plane equation coverage values to the fragment processor 201 for iteration and expansion. This results in a very compact transfer of data across the low bandwidth graphics bus 120 to the fragment processor 201.


For example, as opposed to sending individual pixels and their values, the GPU 110 sends fragments to the fragment processor 201. These fragments are pre-expanded. The fragments undergo multi sample expansion within the fragment processor 201. Multi sample expansion is used in applications involving anti-aliasing and the like. A typical multi sample expansion would take one sample of one fragment and expand it into four samples (e.g., 4× anti-aliasing) or eight samples (e.g., 8× anti-aliasing). This much larger quantity of data is then transferred to the frame buffer 310 across the high bandwidth system memory bus 121 as opposed to the low bandwidth graphics bus 120. For example, in a typical anti-aliasing application, a given pixel can be expanded from one sample comprising 32 bits into eight samples comprising 32 bits each.
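
The arithmetic behind the saving is straightforward; the short program below simply restates the 8× example from the text, using the per-sample sizes given above:

```c
#include <stdio.h>

int main(void)
{
    const int bits_per_sample   = 32;  /* per the example above */
    const int samples_per_pixel = 8;   /* 8x anti-aliasing      */

    int graphics_bus_bits = bits_per_sample;                     /* pre-expanded */
    int memory_bus_bits   = bits_per_sample * samples_per_pixel; /* expanded     */

    printf("graphics bus: %d bits/pixel\n", graphics_bus_bits);
    printf("memory bus:   %d bits/pixel\n", memory_bus_bits);
    printf("graphics bus traffic reduced %dx per pixel\n",
           memory_bus_bits / graphics_bus_bits);
    return 0;
}
```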


Similarly, the Z plane equation can be expanded into 4×, 8×, etc. samples per pixel by the fragment processor 201 from the plane equation for the original polygon. The resulting expanded Z data is then transferred from the fragment processor 201 across the high bandwidth system memory bus 121 to the frame buffer 310.



FIG. 5 shows a flowchart of the steps of a cooperative graphics rendering process 500 in accordance with one embodiment of the present invention. As depicted in FIG. 5, process 500 shows the basic steps involved in a cooperative graphics rendering process as implemented by a fragment processor (e.g., fragment processor 201) of a bridge (e.g., bridge 105) and the GPU (e.g., GPU 110) of a computer system (e.g., computer system 100 of FIG. 2).


Process 500 begins in step 501, where the fragment processor 201 receives fragment pre-expanded color values from the GPU 110 via the graphics bus 120. In step 502, the fragment processor 201 performs a multi sample color value expansion for a plurality of pixels. In step 503, the fragment processor 201 receives a Z plane equation coverage value from the GPU 110. In step 504, the fragment processor 201 performs a Z plane iteration process to generate iterated Z values for a plurality of pixels. In step 505, as described above, the fragment processor 201 stores the resulting expanded color values and the resulting expanded Z values into the frame buffer 310 via the high bandwidth system memory bus 121. Subsequently, in step 506, the GPU 110 accesses the expanded color values and expanded Z values to render the image.
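
Read end to end, process 500 can be sketched as follows; every type and function here is a hypothetical stand-in for the hardware steps described, not an actual API:

```c
#include <stdio.h>

typedef struct { unsigned color; } Fragment;  /* pre-expanded color (step 501) */
typedef struct { float a, b, c; } ZPlaneEq;   /* Z plane equation   (step 503) */
typedef struct { unsigned samples[8]; float z[8]; } FbPixel;

/* Step 502: one incoming sample becomes eight per-pixel samples. */
static void expand_color_samples(const Fragment *f, FbPixel *px)
{
    for (int i = 0; i < 8; i++)
        px->samples[i] = f->color;
}

/* Step 504: evaluate the plane at assumed per-sample offsets
 * (simplified to a 1-D walk for illustration). */
static void iterate_z_values(const ZPlaneEq *zp, FbPixel *px)
{
    for (int i = 0; i < 8; i++)
        px->z[i] = zp->a * (float)i + zp->c;
}

int main(void)
{
    Fragment f  = { 0x00FF00u };         /* received over the graphics bus */
    ZPlaneEq zp = { 0.5f, 0.0f, 1.0f };  /* received over the graphics bus */
    FbPixel  px;

    expand_color_samples(&f, &px);  /* step 502 */
    iterate_z_values(&zp, &px);     /* step 504 */

    /* Step 505: px would now be stored over the system memory bus;
     * step 506: the GPU reads the expanded values back to render. */
    printf("sample0: color=%06X z=%.2f\n", px.samples[0], px.z[0]);
    return 0;
}
```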


The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

Claims
  • 1. A system for cooperative graphics processing across a graphics bus comprising: a computer system comprising: a system memory; a graphics bus; a system memory bus, a graphics processor coupled to the graphics bus; and a bridge comprising a fragment processor, the bridge being coupled to the system memory via the system memory bus, and to the graphics processor via the graphics bus, wherein, the graphics processor and fragment processor are configured to perform a plurality of fragment processing operations cooperatively, wherein a graphics driver executing on the computer system balances the plurality of fragment processing operations between the fragment processor and the graphics processor by allocating at least a portion of the plurality of fragment processing to the fragment processor to be performed and allocating a remaining portion of the plurality of fragment processing operations to the graphics processor to be performed, further wherein, the system memory bus has a greater bandwidth than the graphics bus.
  • 2. The system of claim 1, wherein the plurality of fragment processing operations comprises a plurality of raster operations on graphics data stored in the system memory.
  • 3. The system of claim 1, wherein the graphics data stored in the system memory comprises a frame buffer used by the fragment processor and the graphics processor.
  • 4. The system of claim 1, wherein the plurality of fragment processing operations comprises frame buffer blending on the graphics data in the system memory.
  • 5. The system of claim 1, wherein the plurality of fragment processing operations comprises multi-sample expansion on graphics data received from the graphics processor and store resulting expanded data in the system memory.
  • 6. The system of claim 1, wherein the plurality of fragment processing operations comprises evaluating a Z-plane equation coverage value for a plurality of pixels stored in the system memory, wherein the Z-plane equation coverage value is received from the graphics processor.
  • 7. The system of claim 1, wherein the bridge is a North bridge chipset component of the computer system.
  • 8. The system of claim 1, wherein the graphics processor is configured to use a portion of the system memory for frame buffer memory.
  • 9. The system of claim 1, wherein the graphics processor is detachably coupled to the graphics bus by a connector.
  • 10. The system of claim 1, wherein the graphics bus is an AGP graphics bus.
  • 11. The system of claim 1, wherein the graphics bus is a PCI Express graphics bus.
  • 12. The system of claim 1, wherein the graphics driver balances the plurality of fragment processing operations between the fragment processor and the graphics processor by allocating as large a share as possible of the plurality of fragment processing operations to the fragment processor.
  • 13. The system of claim 1, wherein the system memory is used as frame buffer memory for the computer system.
  • 14. The system of claim 1, wherein an amount of data access latency experienced by performing fragment processing operations in the fragment processor is reduced relative to an amount of data access latency experienced by performing the fragment processing operations in the graphics processor.
  • 15. A bridge for implementing cooperative graphics processing with a graphics processor coupled to the bridge across a graphics bus comprising: a computer system comprising: a system memory bus interface comprising a system memory bus; a graphics bus interface comprising a graphics bus; and a fragment processor disposed in the bridge coupled to the system memory bus, the fragment processor being configured to perform a plurality of fragment processing operations cooperatively with a graphics processor coupled to the graphics bus, wherein, a graphics driver executing on the computer system balances the plurality of fragment processing operations between the fragment processor and the graphics processor by allocating at least a portion of the plurality of fragment processing operations to the fragment processor to be performed and allocating a remaining portion of the plurality of fragment processing operations to the graphics processor to be performed, further wherein, the system memory bus has a greater bandwidth than the graphics bus.
  • 16. The bridge of claim 15, wherein the plurality of fragment processing operations comprises a plurality of raster operations on graphics data stored in the system memory.
  • 17. The system of claim 15, wherein the bridge is configured to use a frame buffer in the system memory for the processing of graphics data.
  • 18. The system of claim 15, wherein the plurality of fragment processing operations comprises frame buffer blending on the graphics data in the system memory.
  • 19. The system of claim 15, wherein the plurality of fragment processing operations comprises multi-sample expansion on graphics data received from the graphics processor and store resulting expanded data in the system memory.
  • 20. The system of claim 15 wherein the plurality of fragment processing operations comprises evaluating a Z plane equation coverage value for a plurality of pixels stored in the system memory, wherein the Z plane equation coverage value is received from the graphics processor.
  • 21. The system of claim 15, wherein the graphics processor is detachably coupled to the graphics bus by a connector.
  • 22. The system of claim 15, wherein the graphics driver balances the plurality of fragment processing operations between the fragment processor and the graphics processor by allocating as large a share as possible of the plurality of fragment processing operations to the fragment processor.
  • 23. The system of claim 15, wherein the system memory bus is coupled to a system memory, and wherein the system memory is used as frame buffer memory for the computer system.
  • 24. In a bridge of a computer system, a method for cooperatively implementing fragment processing operations with a graphics processor across a graphics bus in a computer system, comprising: in a computer system, receiving at a fragment processor pre-expanded color values from the graphics processor via the graphics bus, the fragment processor, graphics processor and graphics bus being disposed in the computer system; performing a multi-sample expansion on the color values resulting in expanded color value graphics data, the multi-sample expansion comprising at least a portion of the fragment processing to be performed cooperatively by the fragment processor and the graphics processor in the computer system across the graphics bus; storing the expanded color value graphics data into a frame buffer in a system memory through a system memory bus; and rendering an image to a display, the rendering performed by the graphics processor accessing the expanded color value graphics data in the frame buffer, wherein, the multi-sample expansion is balanced by a graphics driver executing on the computer system by allocating the portion of the fragment processing to the fragment processor to be performed and allocating a remaining portion of the fragment processing to be performed cooperatively to the graphics processor to be performed, further wherein, the system memory bus has a greater bandwidth than the graphics bus.
  • 25. The method of claim 24, further comprising: receiving Z plane equation coverage values from the graphics processor via the graphics bus; performing a Z plane iteration process to generate iterated Z values for a plurality of pixels; storing the iterated Z values into the frame buffer; and rendering the image to the display, the rendering performed by the graphics processor accessing the iterated Z values in the frame buffer.
  • 26. The method of claim 24, wherein the bridge is a North bridge chipset component of the computer system.
  • 27. The method of claim 26, wherein the graphics processor is detachably coupled to the graphics bus by a connector.
  • 28. The method of claim 27, wherein the graphics bus is an AGP graphics bus.
  • 29. The method of claim 27, wherein the graphics bus is a PCI Express graphics bus.
  • 30. The method of claim 24, wherein the graphics driver balances the plurality of fragment processing operations between the fragment processor and the graphics processor by allocating as large a share as possible of the plurality of fragment processing operations to the fragment processor.
  • 31. The method of claim 24, wherein the system memory is used as frame buffer memory for the computer system.
US Referenced Citations (268)
Number Name Date Kind
3091657 Stuessel May 1963 A
3614740 Delagi et al. Oct 1971 A
3940740 Coontz Feb 1976 A
3987291 Gooding et al. Oct 1976 A
4101960 Stokes et al. Jul 1978 A
4541046 Nagashima et al. Sep 1985 A
4566005 Apperley et al. Jan 1986 A
4748585 Chiarulli et al. May 1988 A
4885703 Deering Dec 1989 A
4897717 Hamilton et al. Jan 1990 A
4951220 Ramacher et al. Aug 1990 A
4958303 Assarpour et al. Sep 1990 A
4965716 Sweeney Oct 1990 A
4965751 Thayer et al. Oct 1990 A
4985848 Pfeiffer et al. Jan 1991 A
4985988 Littlebury Jan 1991 A
5036473 Butts et al. Jul 1991 A
5040109 Bowhill et al. Aug 1991 A
5047975 Patti et al. Sep 1991 A
5175828 Hall et al. Dec 1992 A
5179530 Genusov et al. Jan 1993 A
5197130 Chen et al. Mar 1993 A
5210834 Zurawski et al. May 1993 A
5263136 DeAguiar et al. Nov 1993 A
5276893 Savaria Jan 1994 A
5327369 Ashkenazi Jul 1994 A
5357623 Megory-Cohen Oct 1994 A
5375223 Meyers et al. Dec 1994 A
5388206 Poulton et al. Feb 1995 A
5388245 Wong Feb 1995 A
5392437 Matter et al. Feb 1995 A
5408606 Eckart Apr 1995 A
5418973 Ellis et al. May 1995 A
5430841 Tannenbaum et al. Jul 1995 A
5430884 Beard et al. Jul 1995 A
5432905 Hsieh et al. Jul 1995 A
5448496 Butts et al. Sep 1995 A
5498975 Cliff et al. Mar 1996 A
5513144 O'Toole Apr 1996 A
5513354 Dwork et al. Apr 1996 A
5517666 Ohtani et al. May 1996 A
5522080 Harney May 1996 A
5530457 Helgeson Jun 1996 A
5560030 Guttag et al. Sep 1996 A
5561808 Kuma et al. Oct 1996 A
5574847 Eckart et al. Nov 1996 A
5574944 Stager Nov 1996 A
5578976 Yao Nov 1996 A
5627988 Oldfield May 1997 A
5634107 Yumoto et al. May 1997 A
5638946 Zavracky Jun 1997 A
5644753 Ebrahim et al. Jul 1997 A
5649173 Lentz Jul 1997 A
5666169 Ohki et al. Sep 1997 A
5682552 Kuboki et al. Oct 1997 A
5682554 Harrell Oct 1997 A
5706478 Dye Jan 1998 A
5754191 Mills et al. May 1998 A
5761476 Martell Jun 1998 A
5764243 Baldwin Jun 1998 A
5766979 Budnaitis Jun 1998 A
5784590 Cohen et al. Jul 1998 A
5784640 Asghar et al. Jul 1998 A
5796974 Goddard et al. Aug 1998 A
5802574 Atallah et al. Sep 1998 A
5809524 Singh et al. Sep 1998 A
5812147 Van Hook et al. Sep 1998 A
5835788 Blumer et al. Nov 1998 A
5848254 Hagersten Dec 1998 A
5909595 Rosenthal et al. Jun 1999 A
5913218 Carney et al. Jun 1999 A
5920352 Inoue Jul 1999 A
5925124 Hilgendorf et al. Jul 1999 A
5940090 Wilde Aug 1999 A
5940858 Green Aug 1999 A
5949410 Fung Sep 1999 A
5950012 Shiell et al. Sep 1999 A
5956252 Lau et al. Sep 1999 A
5978838 Mohamed et al. Nov 1999 A
5996996 Brunelle Dec 1999 A
5999199 Larson Dec 1999 A
5999990 Sharrit et al. Dec 1999 A
6009454 Dummermuth Dec 1999 A
6016474 Kim et al. Jan 2000 A
6041399 Terada et al. Mar 2000 A
6049672 Shiell et al. Apr 2000 A
6049870 Greaves Apr 2000 A
6065131 Andrews et al. May 2000 A
6067262 Irrinki et al. May 2000 A
6069540 Berenz et al. May 2000 A
6072686 Yarbrough Jun 2000 A
6073158 Nally et al. Jun 2000 A
6092094 Ireton Jul 2000 A
6094116 Tai et al. Jul 2000 A
6108766 Hahn et al. Aug 2000 A
6112019 Chamdani et al. Aug 2000 A
6131152 Ang et al. Oct 2000 A
6141740 Mahalingaiah et al. Oct 2000 A
6144392 Rogers Nov 2000 A
6150610 Sutton Nov 2000 A
6189068 Witt et al. Feb 2001 B1
6192073 Reader et al. Feb 2001 B1
6192458 Arimilli et al. Feb 2001 B1
6208361 Gossett Mar 2001 B1
6209078 Chiang et al. Mar 2001 B1
6219628 Kodosky et al. Apr 2001 B1
6222552 Haas et al. Apr 2001 B1
6230254 Senter et al. May 2001 B1
6239810 Van Hook et al. May 2001 B1
6247094 Kumar et al. Jun 2001 B1
6249288 Campbell Jun 2001 B1
6252610 Hussain Jun 2001 B1
6255849 Mohan Jul 2001 B1
6292886 Makineni et al. Sep 2001 B1
6301600 Petro et al. Oct 2001 B1
6307169 Sun et al. Oct 2001 B1
6314493 Luick Nov 2001 B1
6317819 Morton Nov 2001 B1
6351808 Joy et al. Feb 2002 B1
6363285 Wey Mar 2002 B1
6363295 Akram et al. Mar 2002 B1
6370617 Lu et al. Apr 2002 B1
6437789 Tidwell et al. Aug 2002 B1
6438664 McGrath et al. Aug 2002 B1
6476808 Kuo et al. Nov 2002 B1
6480927 Bauman Nov 2002 B1
6490654 Wickeraad et al. Dec 2002 B2
6496193 Surti et al. Dec 2002 B1
6496902 Faanes et al. Dec 2002 B1
6499090 Hill et al. Dec 2002 B1
6525737 Duluk, Jr. et al. Feb 2003 B1
6529201 Ault et al. Mar 2003 B1
6545683 Williams Apr 2003 B1
6597357 Thomas Jul 2003 B1
6603481 Kawai et al. Aug 2003 B1
6624818 Mantor et al. Sep 2003 B1
6631423 Brown et al. Oct 2003 B1
6631463 Floyd et al. Oct 2003 B1
6657635 Hutchins et al. Dec 2003 B1
6658447 Cota-Robles Dec 2003 B2
6674841 Johns et al. Jan 2004 B1
6690381 Hussain et al. Feb 2004 B1
6700588 MacInnis et al. Mar 2004 B1
6715035 Colglazier et al. Mar 2004 B1
6732242 Hill et al. May 2004 B2
6750870 Olarig Jun 2004 B2
6809732 Zatz et al. Oct 2004 B2
6812929 Lavelle et al. Nov 2004 B2
6825848 Fu et al. Nov 2004 B1
6839062 Aronson et al. Jan 2005 B2
6862027 Andrews et al. Mar 2005 B2
6891543 Wyatt May 2005 B2
6915385 Leasure et al. Jul 2005 B1
6944744 Ahmed et al. Sep 2005 B2
6952214 Naegle et al. Oct 2005 B2
6965982 Nemawarkar Nov 2005 B2
6975324 Valmiki et al. Dec 2005 B1
6976126 Clegg et al. Dec 2005 B2
6978149 Morelli et al. Dec 2005 B1
6978457 Johl et al. Dec 2005 B1
6981106 Bauman et al. Dec 2005 B1
6985151 Bastos et al. Jan 2006 B1
7015909 Morgan, III et al. Mar 2006 B1
7031330 Bianchini, Jr. Apr 2006 B1
7032097 Alexander et al. Apr 2006 B2
7035979 Azevedo et al. Apr 2006 B2
7148888 Huang Dec 2006 B2
7151544 Emberling Dec 2006 B2
7154500 Heng et al. Dec 2006 B2
7159212 Schenk et al. Jan 2007 B2
7185178 Barreh et al. Feb 2007 B1
7202872 Paltashev et al. Apr 2007 B2
7260677 Vartti et al. Aug 2007 B1
7305540 Trivedi et al. Dec 2007 B1
7321787 Kim Jan 2008 B2
7334110 Faanes et al. Feb 2008 B1
7369815 Kang et al. May 2008 B2
7373478 Yamazaki May 2008 B2
7406698 Richardson Jul 2008 B2
7412570 Moll et al. Aug 2008 B2
7486290 Kilgariff et al. Feb 2009 B1
7487305 Hill et al. Feb 2009 B2
7493452 Eichenberger et al. Feb 2009 B2
7545381 Huang et al. Jun 2009 B2
7564460 Boland et al. Jul 2009 B2
7750913 Parenteau et al. Jul 2010 B1
7777748 Bakalash et al. Aug 2010 B2
7852341 Rouet et al. Dec 2010 B1
7869835 Zu Jan 2011 B1
8020169 Yamasaki Sep 2011 B2
8416251 Gadre et al. Apr 2013 B2
8424012 Karandikar et al. Apr 2013 B1
8493396 Karandikar et al. Jul 2013 B2
8493397 Su et al. Jul 2013 B1
8683184 Lew et al. Mar 2014 B1
8687008 Karandikar et al. Apr 2014 B2
8698817 Gadre et al. Apr 2014 B2
8711161 Scotzniovsky et al. Apr 2014 B1
8725990 Karandikar et al. May 2014 B1
8736623 Lew et al. May 2014 B1
8738891 Karandikar et al. May 2014 B1
20010026647 Morita Oct 2001 A1
20020005729 Leedy Jan 2002 A1
20020026623 Morooka Feb 2002 A1
20020031025 Shimano et al. Mar 2002 A1
20020085000 Sullivan et al. Jul 2002 A1
20020087833 Burns et al. Jul 2002 A1
20020116595 Morton Aug 2002 A1
20020130874 Baldwin Sep 2002 A1
20020144061 Faanes et al. Oct 2002 A1
20020158869 Ohba et al. Oct 2002 A1
20020194430 Cho Dec 2002 A1
20030001847 Doyle et al. Jan 2003 A1
20030001857 Doyle Jan 2003 A1
20030003943 Bajikar Jan 2003 A1
20030014457 Desai et al. Jan 2003 A1
20030016217 Vlachos et al. Jan 2003 A1
20030016844 Numaoka Jan 2003 A1
20030020173 Huff et al. Jan 2003 A1
20030031258 Wang et al. Feb 2003 A1
20030051091 Leung et al. Mar 2003 A1
20030061409 RuDusky Mar 2003 A1
20030067473 Taylor et al. Apr 2003 A1
20030080963 Van Hook et al. May 2003 A1
20030093506 Oliver et al. May 2003 A1
20030115500 Akrout et al. Jun 2003 A1
20030169269 Sasaki et al. Sep 2003 A1
20030172326 Coffin, III et al. Sep 2003 A1
20030188118 Jackson Oct 2003 A1
20030204673 Venkumahanti et al. Oct 2003 A1
20030204680 Hardage, Jr. Oct 2003 A1
20030227461 Hux et al. Dec 2003 A1
20040012597 Zatz et al. Jan 2004 A1
20040073771 Chen et al. Apr 2004 A1
20040073773 Demjanenko Apr 2004 A1
20040103253 Kamei et al. May 2004 A1
20040193837 Devaney et al. Sep 2004 A1
20040205281 Lin et al. Oct 2004 A1
20040205326 Sindagi et al. Oct 2004 A1
20040212730 MacInnis et al. Oct 2004 A1
20040215887 Starke Oct 2004 A1
20040221117 Shelor Nov 2004 A1
20040263519 Andrews et al. Dec 2004 A1
20050012749 Gonzalez et al. Jan 2005 A1
20050012759 Valmiki et al. Jan 2005 A1
20050024369 Xie Feb 2005 A1
20050060601 Gomm Mar 2005 A1
20050071722 Biles Mar 2005 A1
20050088448 Hussain et al. Apr 2005 A1
20050140682 Sumanaweera et al. Jun 2005 A1
20050239518 D'Agostino et al. Oct 2005 A1
20050262332 Rappoport et al. Nov 2005 A1
20050280652 Hutchins et al. Dec 2005 A1
20060020843 Frodsham et al. Jan 2006 A1
20060064517 Oliver Mar 2006 A1
20060064547 Kottapalli et al. Mar 2006 A1
20060103659 Karandikar et al. May 2006 A1
20060152519 Hutchins et al. Jul 2006 A1
20060152520 Gadre et al. Jul 2006 A1
20060176308 Karandikar et al. Aug 2006 A1
20060176309 Gadre et al. Aug 2006 A1
20070076010 Swamy et al. Apr 2007 A1
20070130444 Mitu et al. Jun 2007 A1
20070285427 Morein et al. Dec 2007 A1
20080016327 Menon et al. Jan 2008 A1
20080278509 Washizu et al. Nov 2008 A1
20090235051 Codrescu et al. Sep 2009 A1
20120023149 Kinsman et al. Jan 2012 A1
Foreign Referenced Citations (18)
Number Date Country
07-101885 Apr 1995 JP
H08-077347 Mar 1996 JP
H08-153032 Jun 1996 JP
08-297605 Dec 1996 JP
09-287217 Oct 1997 JP
09-287217 Nov 1997 JP
H09-325759 Dec 1997 JP
10-222476 Aug 1998 JP
11-190447 Jul 1999 JP
2000-148695 May 2000 JP
2001-022638 Jan 2001 JP
2003-178294 Jun 2003 JP
2004-252990 Sep 2004 JP
1998-018215 Aug 2000 KR
413766 Dec 2000 TW
436710 May 2001 TW
442734 Jun 2001 TW
093127712 Jul 2005 TW
Non-Patent Literature Citations (79)
Entry
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1997, p. 8-1.
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1999, pp. 8-1, 9-1.
Intel, Pentium Processor Family Developer's Manual, 1997, pp. 2-13.
Fisher, Joseph A., Very Long Instruction Word Architecture and the ELI-512, ACM, 1993, pp. 140-150.
Hamacher, V. Carl et al., Computer Organization, Second Edition, McGraw Hill, 1984, pp. 1-9.
Kozyrakis, “A Media enhanced vector architecture for embedded memory systems,” Jul. 1999, http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-99-1059.pdf.
Brown, Brian; “Data Structure And Number Systems”; 2000; http://www.ibilce.unesp.br/courseware/datas/data3.htm.
“Alpha Testing State”; http://msdn.microsoft.com/library/en-us/directx9—c/directx/graphics/programmingguide/GettingStarted/Direct3Kdevices/States/renderstates/alphatestingstate.asp.
“Anti-aliasing”; http://en.wikipedia.org/wiki/Anti-aliasing.
“Vertex Fog”; http://msdn.microsoft.com/library/en-us/directx9—c/Vertex—fog.asp?frame=true.
NVIDIA Corporation, Technical Brief: Transform and Lighting; dated 1999; month unknown.
Graham, Susan L. et al., Getting Up to Speed: The future of Supercomputing, the National Academies Press, 2005, glossary.
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 102 and 338 (NVID-P001502).
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 305.
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1988, pp. 273.
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1984, pp. 566.
Graston et al. (Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture); Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems; pp. 138-144; Year of Publication: 2001.
Duca et al., A Relational Debugging Engine for Graphics Pipeline, International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2005, pp. 453-463, ISSN:0730-0301.
Gadre, S., Patent Application Entitled “Video Processor Having Scalar and Vector Components with Command FIFO for Passing Function Calls from Scalar to Vector”, U.S. Appl. No. 11/267,700, filed Nov. 4, 2005.
Gadre, S., Patent Application Entitled “Stream Processing in a Video Processor”, U.S. Appl. No. 11/267,599, filed Nov. 4, 2005.
Karandikar et al., Patent Application Entitled: “Multidimensional Datapath Processing in a Video Processor”, U.S. Appl. No. 11/267,638, filed Nov. 4, 2005.
Karandikar et al., Patent Application Entitled: “A Latency Tolerant System for Executing Video Processing Operations”, U.S. Appl. No. 11/267,875, filed Nov. 4, 2005.
Gadre, S., Patent Application Entitled “Separately Schedulable Condition Codes For a Video Processor”, U.S. Appl. No. 11/267,793, filed Nov. 4, 2005.
Lew, et al., Patent Application Entitled “A Programmable DMA Engine for Implementing Memory Transfers for a Video Processor”, U.S. Appl. No. 11/267,777, filed Nov. 4, 2005.
Karandikar et al., Patent Application Entitled: “A Pipelined L2 Cache for Memory Transfers for a Video Processor”, U.S. Appl. No. 11/267,606, filed Nov. 4, 2005.
Karandikar, et al., Patent Application Entitled: “Command Acceleration in a Video Processor”, U.S. Appl. No. 11/267,640, filed Nov. 4, 2005.
Karandikar, et al., Patent Application Entitled “A Configurable SIMD Engine in a Video Processor”, U.S. Appl. No. 11/267,393, filed Nov. 4, 2005.
Karandikar, et al., Patent Application Entitled “Context Switching on a Video Processor Having a Scalar Execution Unit and a Vector Execution Unit”, U.S. Appl. No. 11/267,778, filed Nov. 4, 2005.
Lew, et al., Patent Application Entitled “Multi Context Execution on a Video Processor”, U.S. Appl. No. 11/267,780, filed Nov. 4, 2005.
Su, Z, et al., Patent Application Entitled: “State Machine Control for a Pipelined L2 Cache to Implement Memory Transfers for a Video Processor”, U.S. Appl. No. 11/267,119, filed Nov. 4, 2005.
Free On-Line Dictionary of Computing (FOLDOC), definition of “video”, from foldoc.org/index.cgi?query=video&action=Search, May 23, 2008.
FOLDOC, definition of “frame buffer”, from foldoc.org/index.cgi?query=frame+buffer&action=Search, Oct. 3, 1997.
FOLDOC, definition of “motherboard”, from foldoc.org/index.cgi?query=motherboard&action=Search, Aug. 10, 2000.
FOLDOC, definition of “separate compilation”, from foldoc.org/index.cgi?query=separate+compilation&action=Search, Feb. 19, 2005.
FOLDOC, definition of “vector processor”, http://foldoc.org/, Sep. 11, 2003.
FOLDOC (Free On-Line Dictionary of Computing), definition of X86, Feb. 27, 2004.
FOLDOC, definition of “superscalar,” http://foldoc.org/, Jun. 22, 2009.
FOLDOC, definition of Pentium, Sep. 30, 2003.
Wikipedia, definition of “scalar processor,” Apr. 4, 2009.
Wikipedia, entry page defining term “SIMD”, last modified Mar. 17, 2007.
FOLDOC, Free Online Dictionary of Computing, definition of SIMD, foldoc.org/index.cgi?query=simd&action=Search, Nov. 4, 1994.
Definition of “queue” from Free on-Line Dictionary of Computing (FOLDOC), http://folddoc.org/index.cgi?query=queue&action=Search, May 15, 2007.
Definition of “first-in first-out” from FOLDOC, http://foldoc.org/index.cgi?query=fifo&action=Search, Dec. 6, 1999.
Definition of “block” from FOLDOC, http://foldoc.org/index.cgi?block, Sep. 23, 2004.
Wikipedia, definition of Multiplication, accessed from en.wikipedia.org/w/index.php?title=Multiplication&oldid=1890974, published Oct. 13, 2003.
Graham, Susan L. et al., Getting Up to Speed: The future of Supercomputing, the National Academies Press, 2005, glossary, Feb. 2005.
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 102 and 338 (NVID-P001502), Dec. 1987.
Rosenberg, Jerry M., Dictionary of Computers, Information Processing & Telecommunications, 2nd Edition, John Wiley & Sons, 1987, pp. 305, Dec. 1987.
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1988, pp. 273, Dec. 1988.
Graf, Rudolf F., Modern Dictionary of Electronics, Howard W. Sams & Company, 1984, pp. 566, Dec. 1988.
Wikipedia, definition of “subroutine”, published Nov. 29, 2003, four pages.
Graston et al. (Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture); Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems; pp. 138-144; Year of Publication: 2001, Oct. 2001.
SearchStorage.com Definitions, “Pipeline Burst Cache,” Jul. 31, 2001, url: http://searchstorage.techtarget.com/sDefinition/0,,sid5—gci214414,00.html.
Parhami, Behrooz, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, Jun. 2000, pp. 413-418.
gDEBugger, graphicRemedy, http://www.gremedy.com, Aug. 8, 2006.
Duca et al., A Relational Debugging Engine for Graphics Pipeline, International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2005, pp. 453-463, ISSN:0730-0301, Jul. 2005.
Merriam-Webster Dictionary Online; Definition for “program”; retrieved Dec. 14, 2010.
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1997, p. 8-1, Jan. 1997.
Intel, Intel Architecture Software Developer's Manual, vol. 1: Basic Architecture, 1999, pp. 8-1, 9-1, May 1999.
Intel, Intel Pentium III Xeon Processor at 500 and 550Mhz, Feb. 1999.
Intel, Intel MMX Technology at a Glance, Jun. 1997.
Intel, Pentium Processor Family Developer's Manual, 1997, pp. 2-13, Oct. 1997.
Intel, Pentium processor with MMX Technology at 233Mhz Performance Brief, Jan. 1998, pp. 3 and 8.
PCreview, article entitled “What is a Motherboard”, from www.pcreview.co.uk/articles/Hardware/What—is—a—Motherboard., Nov. 22, 2005.
Wikipedia, definition of “vector processor”, http://en.wikipedia.org/, May 14, 2007.
Fisher, Joseph A., Very Long Instruction Word Architecture and the ELI-512, ACM, 1993, pp. 140-150, Jun. 1993.
Quinnell, Richard A. “New DSP Architectures Go “Post-Harvard” for Higher Performance and Flexibility” Techonline; posted May 1, 2002.
IBM TDB, Device Queue Management, vol. 31 Iss. 10, pp. 45-50, Mar. 1, 1989.
Hamacher, V. Carl et al., Computer Organization, Second Edition, McGraw Hill, 1984, pp. 1-9, May 1984.
Kozyrakis, “A Media enhanced vector architecture for embedded memory systems,” Jul. 1999, http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-99/1059.pdf.
HPL-PD A Parameterized Research Approach—May 31, 2004 http://web.archive.org/web/*/www.trimaran.org/docs/5—hpl-pd.pdf.
Hutchins E., SC10: A Video Processor And Pixel-Shading GPU for Handheld Devices; presented at the Hot Chips conference on Aug. 23, 2004.
Brown, Brian; “Data Structure And Number Systems”; 2000; http://www.ibilce.unesp.br/courseware/datas/data3.htm, Mar. 2000.
“Alpha Testing State”; http://msdn.microsoft.com/library/en-us/directx9—c/directx/graphics/programmingguide/GettingStarted/Direct3Kdevices/States/renderstates/alphatestingstate.asp, Sep. 2004.
“Anti-aliasing”; http://en.wikipedia.org/wiki/Anti-aliasing, Mar. 2004.
“Vertex Fog”; http://msdn.microsoft.com/library/en-us/directx9—c/Vertex—fog.asp?frame=true, Apr. 2008.
Wilson D., NVIDIA's Tiny 90nm G71 and G73: GeForce 7900 and 7600 Debut; at http://www.anandtech.com/show/1967/2; dated Sep. 3, 2006, retrieved Jun. 16, 2011.
Woods J., Nvidia GeForce FX Preview, at http://www.tweak3d.net/reviews/nvidia/nv30preview/1.shtml; dated Nov. 18, 2002; retrieved Jun. 16, 2011.
NVIDIA Corporation, Technical Brief: Transform and Lighting; dated 1999; month unknown, Apr. 1999.