This Application is related to the following U.S. Patent Applications:
“A PROGRAM SEQUENCER FOR GENERATING INDETERMINANT LENGTH SHADER PROGRAMS FOR A GRAPHICS PROCESSOR”, by Mahan et al., filed on Aug. 15, 2007, Ser. No. 11/893,404; “FRAGMENT SPILL/RELOAD FOR A GRAPHICS PROCESSOR”, by Mahan et al., filed on Aug. 15, 2007, Ser. No. 11/893,502; “SHADER PROGRAM INSTRUCTION FETCH”, by Mahan et al., filed on Aug. 15, 2007, Ser. No. 11/893,503; and “SOFTWARE ASSISTED SHADER MERGING”, by Mahan et al., filed on Aug. 15, 2007, Ser. No. 11/893,439.
The present invention is generally related to programming graphics computer systems.
Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (e.g., GPUs, etc.) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data. Generally, the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives and produce real-time rendered 3-D images.
In modern real-time 3-D graphics rendering, the functional units of the GPU need to be programmed in order to properly execute many of the more refined pixel shading techniques. These techniques require, for example, the blending of colors into a pixel in accordance with factors in a rendered scene which affect the nature of its appearance to an observer. Such factors include, for example, fogginess, reflections, light sources, and the like. In general, several graphics rendering programs (e.g., small specialized programs that are executed by the functional units of the GPU) influence a given pixel's color in a 3-D scene. Such graphics rendering programs are commonly referred to as shader programs, or simply shaders. In more modern systems, some types of shaders can be used to alter the actual geometry of a 3-D scene (e.g., Vertex shaders) and other primitive attributes.
In a typical GPU architecture, each of the GPU's functional units is associated with a low level, low latency internal memory (e.g., register set, etc.) for storing instructions that program the architecture for processing the primitives. The instructions typically comprise shader programs and the like. The instructions are loaded into their intended GPU functional units by propagating them through the pipeline. As the instructions are passed through the pipeline, when they reach their intended functional unit, that functional unit will recognize its intended instructions and store them within its internal registers.
Prior to being loaded into the GPU, the instructions are typically stored in system memory. Because of the much larger size of the system memory, a large number of shader programs can be stored there. A number of different graphics processing programs (e.g., shader programs, fragment programs, etc.) can reside in system memory. The programs can each be tailored to perform a specific task or accomplish a specific result. In this manner, the graphics processing programs stored in system memory act as a library, with each of a number of shader programs configured to accomplish a different specific function. For example, depending upon the specifics of a given 3-D rendering scene, specific shader programs can be chosen from the library and loaded into the GPU to accomplish a specialized, customized result.
The graphics processing programs, shader programs, and the like are transferred from system memory to the GPU through a DMA (direct memory access) operation. This allows the GPU to selectively pull in the specific programs it needs. The GPU can assemble an overall graphics processing program, shader, etc. by selecting two or more of the graphics programs in system memory and DMA transferring them into the GPU.
There are problems with conventional GPU architectures in selectively assembling more complex graphics programs, shader programs, or the like from multiple subprograms. In general, it is advantageous to link two or more graphics programs together in order to implement more complex or more feature filled render processing. A problem exists, however, in that in order to link multiple graphics processing programs together, the addressing schemes of the programs need to properly refer to GPU memory such that the two programs execute as intended. For example, in a case where two shader programs are linked to form a longer shader routine, the first shader address mechanism needs to correctly reference the second shader address mechanism. Additionally, both shader address mechanisms need to properly and coherently refer to the specific GPU functional units and/or registers in which they will be stored. This can involve quite a bit of overhead in those cases where there are many different graphics programs stored in system memory and a given application wants to be able to link multiple programs in a number of different orders, combinations, total lengths, and the like.
The programs in system memory have no way of knowing the order in which they will be combined, the number of them there will be in any given combination, or whether they will be combined at all. Due to the real time rendering requirements, the configurations of the combinations need to be determined on-the-fly, and need to be implemented as rapidly as possible in order to maintain acceptable frame rates. It is still desirable to DMA transfer the programs from the system memory to the GPU (e.g., on an as-needed basis). In order to facilitate DMA transfers, the desired programs need to be modified to properly point to their respective correct addresses and to properly order themselves for execution with the various functional units of the GPU. Unfortunately, this results in a large number of read-modify-write (R-M-W) operations, where the programs must be read, their address mechanisms altered such that the individual instructions comprising each program correctly match their intended functional units and registers, and the programs written back to system memory. Only after the required R-M-W operations have been completed can the desired programs be DMA transferred into the GPU. This results in a large amount of undesirable processor overhead.
The increased overhead proves especially problematic for the ability of prior art 3-D rendering architectures to scale to handle the increasingly complex 3-D scenes of today's applications. Scenes now commonly contain hundreds of programs, each consisting of up to hundreds of instructions. Thus, a need exists for a program loading process that can scale as graphics application needs require and provide added performance without incurring penalties such as increased processor overhead.
Moreover, different operations or programs within a graphics pipeline may originate from different sources. For example, a traditional graphics pipeline may involve multiple application programming interfaces (APIs), with separate APIs being used for interacting with the shader modules and the raster operation modules. With different APIs, and often different underlying sources of graphics data, read-modify-write hazards are created. For example, a change to one program currently operating on the graphics pipeline may unintentionally overwrite portions of a second program, while the second program is still in use.
Embodiments for programming a graphics pipeline, and modules within the graphics pipeline, are detailed herein. These embodiments address the need for allowing multiple shader programs to be merged into the same instruction table, while avoiding read-modify-write hazards. Several of these embodiments utilize offset registers associated with the instruction tables for the modules within the pipeline. The offset register serves as a pointer to locations in the instruction table, which allows instructions to be written to the instruction table, without requiring that the shader programs have explicit addresses.
One embodiment describes a method of implementing software assisted shader merging for a graphics pipeline. The method involves accessing a first shader program in memory, and generating a first shader instruction from that program. This first instruction is loaded into an instruction table at a first location, indicated by an offset register. A second shader program in memory is then accessed, and used to generate a second shader instruction. The second shader instruction is loaded into the instruction table at a second location indicated by the offset register.
Another embodiment describes a graphics processing unit (GPU). The GPU includes an integrated circuit die, made up of a number of stages of the GPU, as well as a memory interface for interfacing with the graphics memory, and a host interface for interfacing with a computer system. The stages of the GPU make up a graphics pipeline, which is configured to access a first shader program stored in memory. A shader instruction is generated from this first shader program, and loaded into an instruction table at a location indicated by an offset register. The pipeline is also configured to access a second shader program, which is used to generate another shader instruction, which is loaded into the instruction table at another location indicated by the offset register.
Another embodiment describes a handheld computer system device. The device is made up of system memory, a central processing unit (CPU), and a graphics processing unit (GPU). The GPU includes a graphics pipeline, which is configured to access a shader program stored in memory, generate a shader instruction from the shader program, and load the instruction into a location in an instruction table indicated by an offset register. The graphics pipeline is further configured to access another shader program, generate a second shader instruction from this second shader program, and load the second instruction into another location in the instruction table, as indicated by the offset register.
The present invention is illustrated by way of example, and not by way of limitation, in the Figures of the accompanying drawings and in which like reference numerals refer to similar elements.
It should be appreciated that the GPU can be implemented as a discrete component, a discrete graphics card designed to couple to the computer system via a connector (e.g., AGP slot, PCI-Express slot, etc.), a discrete integrated circuit die (e.g., mounted directly on a motherboard), or as an integrated GPU included within the integrated circuit die of a computer system chipset component (not shown), or within the integrated circuit die of a PSOC (programmable system-on-a-chip). Additionally, a local graphics memory 114 can be included for the GPU for high bandwidth graphics data storage. The GPU is depicted as including pipeline 210, which is described in greater detail below, with reference to
As depicted in
In some embodiments, the graphics pipeline may be utilized for other, non-graphics purposes. For example, the graphics pipeline, and the GPU, may be utilized to implement general-purpose GPU operations, e.g., physical simulation, such as the draping of cloth across an object, can be calculated using graphics hardware such as the GPU and the graphics pipeline. While embodiments below are described in terms of operating on graphics primitives, it is understood that other embodiments are well suited to applications involving non-graphics oriented applications.
The program sequencer functions by controlling the operation of the functional modules of the graphics pipeline. The program sequencer can interact with the graphics driver (e.g., a graphics driver executing on the CPU) to control the manner in which the functional modules of the graphics pipeline receive information, configure themselves for operation, and process graphics primitives. For example, in the
In one embodiment, data proceeds between the various functional modules in a packet based format. For example, the graphics driver transmits data to the GPU in the form of data packets that are specifically configured to interface with and be transmitted along the fragment pipe communications pathways of the pipeline. Such data packets may include pixel packets or register packets. The pixel packets generally include information regarding a group or tile of pixels (e.g., 4 pixels, 8 pixels, 16 pixels, etc.) and coverage information for one or more primitives that relate to the pixels. The register packets can include configuration information that enables the functional modules of the pipeline to configure themselves for rendering operations. For example, the register packets can include configuration bits, instructions, functional module addresses, etc. that can be used by one or more of the functional modules of the pipeline to configure themselves for the current rendering mode, or the like. In addition to pixel rendering information and functional module configuration information, the data packets can include shader program instructions that program the functional modules of the pipeline to execute shader processing on the pixels. For example, the instructions comprising a shader program can be transmitted down the graphics pipeline and be loaded by one or more designated functional modules. Once loaded, during rendering operations, the functional module can execute the shader program on the pixel data to achieve the desired rendering effect.
In some embodiments, pixel packets may make multiple passes through the graphics pipeline. For example, as a packet is processed through the graphics pipeline, some of the instructions in a particular functional module may be performed on that packet during an initial pass, and additional instructions may be performed during a subsequent pass. In the depicted embodiment, if a packet is to pass through the graphics pipeline an additional time, it is returned to the program sequencer, which may pass it through the graphics pipeline for additional processing. In another embodiment, the data write unit may pass the partially-processed data packet to the fragment data cache, and the program sequencer may retrieve the packet for additional processing. As is explained below, some embodiments utilize this approach to enable additional instructions to be loaded into the graphics pipeline modules.
In some embodiments, as noted above, such shader program instructions are passed as packets through the graphics pipeline, and loaded into one or more designated functional modules. One such embodiment “labels” instruction packets to identify which functional module or modules should utilize the instructions contained therein, e.g., by including a register address or pointer in the packet header, indicating which instruction table the instruction packet is intended for. For example, as an instruction packet passes through a functional module, one of three possible results may occur, based upon the header for the packet: the packet is not intended for that module, and so the module ignores the packet and passes it along the graphics pipeline; the packet is intended solely for that module, and so the module utilizes the instructions contained therein, and “consumes” the packet, not passing it down the graphics pipeline; or the packet is intended for several modules, including the current module, and so the module utilizes the instructions contained therein, and passes the packet down the graphics pipeline.
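The three routing outcomes described above can be sketched as a small decision function. This is an illustrative model only; the function and field names (`route_packet`, `targets`) are hypothetical and not drawn from the source.

```python
# Sketch of the three routing outcomes for an instruction packet arriving
# at a functional module. Names and packet layout are illustrative.

CONSUME = "consume"        # packet is solely for this module: use it, drop it
PASS = "pass"              # packet is not for this module: forward unchanged
USE_AND_PASS = "use+pass"  # packet targets several modules, including this one

def route_packet(module_addr, packet):
    """Decide what a functional module does with an instruction packet."""
    targets = packet["targets"]  # set of module addresses from the header
    if module_addr not in targets:
        return PASS
    if targets == {module_addr}:
        return CONSUME
    return USE_AND_PASS
```

A module would apply this test to every packet propagating down the fragment pipe, consuming only those addressed to it.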
In one embodiment, the GPU stores the span of pixels in graphics memory subsequent to program execution for the first portion. This clears the stages of the pipeline to be used for loading instructions for the second portion. Subsequent to loading instructions from the second portion, the GPU accesses the span of pixels in the graphics memory to perform program execution for the second portion. In this manner, the program sequencer loads the first portion of the shader program, processes the span of pixels in accordance with the first portion, temporarily stores the intermediate results in the graphics memory, loads the second portion of the shader program, retrieves the intermediate results from the graphics memory, and processes the span of pixels (e.g., the intermediate results) in accordance with the second portion. This process can be repeated until all portions of an indeterminate length shader program have been executed and the span of pixels has been completely processed. The resulting processed pixel data is then transferred to the graphics memory for rendering onto the display.
Several embodiments combine the use of instruction packets with the storing/reloading of intermediate results to reprogram the graphics pipeline part way through processing a span of pixels. As is explained in greater detail below, the graphics pipeline can be “cleared” through the use of a spill/reload operation, allowing instruction packets to pass between functional modules. In this manner, the highly optimized and efficient fragment pipe communications pathway implemented by the functional modules of the graphics pipeline can be used not only to transmit pixel data between the functional modules, but to also transmit configuration information and shader program instructions between the functional modules.
Referring still to
To execute shader programs of indeterminate length, the program sequencer controls the graphics pipeline to execute such indeterminate length shader programs by executing them in portions. The program sequencer accesses a first portion of the shader program from the system memory 114 and loads the instructions from the first portion into the plurality of stages of the pipeline (e.g., the ALU, the data write component, etc.) of the GPU to configure the GPU for program execution. As described above, the instructions for the first portion can be transmitted to the functional modules of the graphics pipeline as pixel packets that propagate down the fragment pipeline. A span of pixels (e.g., a group of pixels covered by a primitive, etc.) is then processed in accordance with the instructions from the first portion. A second portion of the shader program is then accessed (e.g., DMA transferred in from the system memory) and instructions from the second portion are then loaded into the plurality of stages of the pipeline.
The span of pixels is then processed in accordance with the instructions from the second portion. In this manner, multiple shader program portions can be accessed, loaded, and executed to perform operations on the span of pixels. For example, for a given shader program that comprises a hundred or more portions, for each of the portions, the GPU can process the span of pixels by loading instructions for the portion and executing instructions for that portion, and so on until all the portions comprising the shader program are executed. This attribute enables embodiments of the present invention to implement the indeterminate length shader programs. As described above, no arbitrary limit is placed on the length of a shader program that can be executed.
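The load/execute/spill/reload cycle described above can be sketched as a simple loop. This is a toy model under stated assumptions: shader portions are represented as lists of per-pixel functions, and a dictionary stands in for graphics memory; none of these names come from the source.

```python
# Illustrative loop: executing an indeterminate-length shader in portions,
# spilling the span of pixels to "graphics memory" between portions.
# All names (run_shader_in_portions, graphics_mem) are hypothetical.

def run_shader_in_portions(portions, span):
    """Process a span of pixels with a shader split into portions."""
    graphics_mem = {}                 # stands in for graphics memory
    for portion in portions:
        # Load this portion's instructions into the pipeline stages.
        pipeline = list(portion)
        # Process the span (here, each instruction is a per-pixel function).
        for instr in pipeline:
            span = [instr(p) for p in span]
        # Spill intermediate results so the next portion can be loaded,
        # then reload them before that portion executes.
        graphics_mem["span"] = span
        span = graphics_mem["span"]
    return span
```

The key point the sketch captures is that only one portion's instructions ever occupy the pipeline stages at a time, so the shader's total length is unbounded.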
The functional modules are typical functional modules of the 3-D graphics rendering pipeline (e.g., setup unit, raster unit, texturing unit, etc.). The 3-D graphics rendering pipeline comprises a core component of a GPU, and can be seen as a more basic diagram of the pipeline 210 from
Each of the instruction block images comprises a graphics rendering epoch that programs the hardware components of, for example, a functional unit 322 (e.g., an ALU unit, etc.) to perform a graphics rendering operation. A typical instruction block image (e.g., instruction block image 301) comprises a number of instructions. In the DMA transfer embodiment described above, the instruction block images are stored in system memory and are maintained there until needed by a graphics application executing on the GPU.
The
Additional details regarding the DMA transfer of instruction block images/portions and their assembly to create shader programs can be found in the commonly assigned United States patent application “Address independent shader program loading” by Mahan, et al., filed on Aug. 15, 2007, application Ser. No. 11/893,427, which is incorporated herein by reference in its entirety.
Instruction Tables and Offset Registers
With reference now to
In the depicted embodiment, each functional module is associated with an instruction table. For example, program sequencer 220 is associated with command table 420. For such an embodiment, the instruction table contains instructions for the module on what actions to perform, or how to handle and manipulate data. For example, the command table may contain instructions for the program sequencer on where to locate shader program instruction sets, or how to configure other modules within the pipeline. In the depicted embodiment, instruction tables within the pipeline are 64 entries in length; it is understood that the length of instruction tables may vary across different embodiments, and embodiments are well suited to applications involving instruction tables of different sizes. Moreover, while
In the depicted embodiment, an offset register is associated with each instruction table. The offset register is used as a pointer to a location within its associated instruction table, such that a new instruction received by a particular module, e.g., as a register packet, is loaded into the module's instruction table at the position indicated by the offset. The use of the offset register, in these embodiments, allows for new instructions to be inserted into the instruction table, without requiring that the underlying program or instructions contain the instruction table offset. Accordingly, modules within the pipeline can be programmed, with reference to this offset register, and without requiring that a shader program be compiled or patched whenever the contents or ordering of the instruction table is modified. For example, offset register 421 is associated with the command table of the program sequencer, and may indicate the first unused entry in the command table.
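The offset-register mechanism described above can be modeled as a write pointer into a fixed-size table. This is a sketch of the described behavior, not actual hardware; the class and method names are invented for illustration.

```python
# Minimal model of an instruction table whose offset register acts as a
# write pointer. Instructions carry no explicit table address: each write
# lands at the offset, which then advances to the next free entry.

class InstructionTable:
    def __init__(self, size=64):       # 64 entries, per the embodiment
        self.entries = [None] * size
        self.offset = 0                # offset register: next write position

    def set_offset(self, value):
        """Model a register write that repositions the pointer."""
        self.offset = value

    def load(self, instruction):
        """Write one instruction at the offset, then advance the pointer."""
        self.entries[self.offset] = instruction
        self.offset += 1
```

Because the table, not the program, tracks the write position, a shader program need not be recompiled or patched when the table's contents or ordering change.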
Similarly, module 225 is associated with table 425, with corresponding offset register 426. Module 230 is associated with table 430, with corresponding offset register 431. Module 235 is associated with table 435, with corresponding offset register 436. Module 240 is associated with table 440, with corresponding offset register 441.
Also, in some embodiments, a global offset register (not pictured) is maintained for the graphics pipeline. In several such embodiments, the global offset register is an alias for the offset registers of each of the modules within the pipeline. This offers several advantages: first, all of the offset registers within the pipeline may be modified, by altering the current value of the global offset register; and second, if a register packet is intended for use by all of the modules within the pipeline, it can be directed at the global offset register, rather than an offset register specific to one or another of the modules.
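The aliasing behavior of the global offset register can be sketched as a single write fanning out to every module's offset register. The class names here are hypothetical; the sketch only illustrates the alias semantics described above.

```python
# Sketch: a global offset register aliased to the per-module offset
# registers, so one write updates every module in the pipeline.

class Module:
    def __init__(self):
        self.offset = 0                # this module's offset register

class Pipeline:
    def __init__(self, modules):
        self.modules = modules

    def set_global_offset(self, value):
        # The global offset is an alias: a single write reaches every
        # module's offset register at once.
        for m in self.modules:
            m.offset = value
```

This captures both advantages noted above: one write adjusts all offsets, and a packet aimed at the global register need not name any specific module.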
Register Packet Generation
With reference now to
One of the operations of the program sequencer is the generation of register packets. As noted previously, register packets are specialized data packets, which are passed down the graphics pipeline, and used to program or configure the modules within the pipeline, e.g., by loading instructions into a module's instruction table. One source of instructions may be an instruction block retrieved from system memory, e.g., as shown in
In several embodiments, the instruction blocks which make up a shader program are processed by the program sequencer, and used to generate register packets. A register packet may be made up of a header section 501 and a payload section 503. The header section may contain a variety of information, and will typically include an address field 502. The address field indicates which module or modules this particular register packet is intended for; e.g., a register packet intended for the ALU may have a different address entry than a packet intended for the data write unit, while a register packet intended for global application throughout the graphics pipeline may have still another address entry. The payload of a register packet is typically the instruction intended to be loaded into the consuming module; in the depicted embodiment, the payload portion of the register packet is shown as being 32 bits wide. It is understood that, across different embodiments, the contents of the register packet, as well as the size and contents of any field within the packet, may vary.
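A register packet of the kind described above can be sketched as a header carrying an address field plus a 32-bit instruction payload. The encoding below is purely illustrative; the text does not specify an actual bit layout, and the constructor name is invented.

```python
# Sketch of a register packet: a header whose address field names the
# destination module(s), plus one 32-bit instruction word as payload.
# Field layout is illustrative, not taken from the source.

def make_register_packet(address, payload):
    # The payload is modeled as a single 32-bit instruction word,
    # matching the 32-bit payload width of the depicted embodiment.
    assert 0 <= payload < 2**32, "payload must fit in 32 bits"
    return {"header": {"address": address}, "payload": payload}
```

A packet built this way carries everything a module needs to recognize it (the address field) and to program one instruction table entry (the payload).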
Loading Instructions
With reference now to
The command table is loaded with a number of command instructions. In different embodiments, these command instructions may have different sources; in one embodiment, for example, a GPU driver executing on a host computer may pass these command instructions to a GPU, and hence to the graphics pipeline and the program sequencer.
Command instruction 0, in the depicted embodiment, is flagged as an “immediate” instruction; the presence of the immediate opcode in an instruction, in this embodiment, indicates that the data for the instruction is located in the instruction itself, as opposed to in memory or in a nearby register. Here, command instruction 0 tells the program sequencer to set the value of the global offset to zero. As described above, some embodiments utilize a global offset value, which aliases to the individual offsets used by each module throughout the pipeline. By initializing the global offset to zero, each offset associated with a module in the pipeline is similarly set to zero. In the depicted embodiment, this has the effect of resetting the pointer for table 625 to point to the first entry in the table; any instructions written to this table will therefore be written beginning at position 0.
Command instruction 1, in the depicted embodiment, is a “gather” operation. A gather operation instructs the program sequencer to access a location in memory, e.g., graphics memory, and “read” a specified quantity of data stored at that location. Here, the program sequencer is instructed to access memory location 0x100, and read 64 words of data from that location. In some embodiments, such as the depicted embodiment, a gather operation is used to point the program sequencer at the location of instruction blocks; the program sequencer can then retrieve the instruction blocks, and generate the appropriate register packets.
Once the program sequencer has generated a register packet, the packet is passed down the pipeline. If a register packet is intended for a particular module, that module will recognize the address entry in the register packet. The module will then take the contents of the payload of the register packet, and load them into an associated data register, e.g., data register 627. The module will then attempt to write the contents of the data register to the position in its associated instruction table indicated by the associated offset register value. Here, for example, module 225 will recognize a register packet generated by the program sequencer. Module 225 will extract the payload of the register packet, and load it into the data register. The contents of the data register will then be written to instruction table 625 beginning from the position indicated by offset register 626. As entries are written to the instruction table, the value in the offset register may be incremented, such that the offset points to the next available location in the instruction table.
Once the program sequencer has finished gathering the indicated data, and generating any appropriate register packets, command instruction 2 from the command table is performed. In the depicted embodiment, command instruction 2 is an “execute.stop” instruction. In the depicted embodiment, an execute.stop instruction indicates to the program sequencer that, upon receipt of data to be processed, the data packets should be passed through the pipeline where instructions ranging from a first value through a second value in each instruction table should be performed. Here, a pixel packet received by the program sequencer would have a sequence value in the header initialized to the first indicated value, 0, and as operations are performed on the pixel packet, the sequence value would be incremented. Once the sequence value reaches the second indicated value, 7, no further operations would be performed on the pixel packet. In summary, command instruction 2 instructs the program sequencer and the graphics pipeline to perform an eight instruction program, beginning at instruction 0 and ending at instruction 7, and then halt operations until additional data packets are received for processing.
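The three command types walked through above (immediate, gather, execute.stop) can be sketched as a toy command-table interpreter. The command encoding, class, and function names are all illustrative assumptions; only the behavior of each command follows the description.

```python
# Toy interpreter for the command-table example above: "immediate" carries
# its data in the instruction, "gather" reads words from memory into the
# instruction table via the offset, and "execute.stop" names the range of
# table entries to perform before halting. Encoding is illustrative.

class Table:
    def __init__(self, size=64):
        self.entries = [None] * size
        self.offset = 0

def run_command_table(commands, memory, table):
    for cmd in commands:
        if cmd["op"] == "immediate":
            table.offset = cmd["value"]           # e.g. reset offset to 0
        elif cmd["op"] == "gather":
            words = memory[cmd["addr"]:cmd["addr"] + cmd["count"]]
            for w in words:                       # one register packet per word
                table.entries[table.offset] = w
                table.offset += 1
        elif cmd["op"] == "execute.stop":
            return (cmd["first"], cmd["last"])    # range to execute, then halt
    return None
```

Run against the example sequence (offset to 0, gather 64 words, execute entries 0 through 7), the interpreter fills the table from entry 0 and reports the (0, 7) execution range.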
Program Instruction Fetch
With reference now to
With reference to step 705, a program sequencer in a graphics pipeline is configured. In various embodiments, as noted previously, the method and approach of configuring the program sequencer may vary. For example, in one embodiment, a GPU driver issues a series of command instructions to the program sequencer, which configure the program sequencer to control the graphics pipeline. In one such embodiment, a series of command instructions are loaded into a command table associated with the program sequencer; these command instructions control the operation of the program sequencer, which in turn governs the flow of data to the graphics pipeline.
With reference now to step 710, a shader program stored in memory is accessed. In some embodiments, shader programs are made up of multiple instruction blocks, and stored in memory, e.g., graphics memory or system memory. The instruction blocks are made up of instructions intended for modules within the graphics pipeline. These instructions configure the various modules to perform certain tasks on data packets being passed through the pipeline. These instruction blocks are retrieved, e.g., by the program sequencer carrying out a gather operation.
With reference now to step 720, shader instructions are generated from the shader program. As noted previously, in some embodiments, shader programs are made up of a collection of instruction blocks, which in turn are made up of individual instructions. In some embodiments, instructions may be read, one at a time, directly from memory; in another embodiment, individual instructions may need to be derived, e.g., by the program sequencer, from the shader program.
With reference now to step 730, shader instructions are loaded into an instruction table for a pipeline module at locations indicated by an offset register. As described previously, some embodiments utilize an offset register as a “pointer.” The value contained in the offset register indicates a position within the instruction table for the related module where instructions should be loaded. In one such embodiment, the program sequencer generates register packets, which contain one or more instructions intended for a particular module. The module recognizes the register packet, e.g., by reading an address field included in the header of the packet, and extracts the instructions contained therein. These instructions are then loaded into the instruction table for the module, beginning at the position indicated by the offset register. As such, the shader instructions can be passed to their intended destination and utilized by the appropriate module, without the need for the shader program to include specific instruction table offsets.
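The steps above can be condensed into one end-to-end sketch: access the instruction blocks of a shader program, generate the individual instructions, and load them at the position named by the offset register. Every name below is hypothetical; the sketch only traces the flow of steps 710 through 730.

```python
# End-to-end sketch of the method: load every instruction of a shader
# program (a list of instruction blocks) into an instruction table at the
# position indicated by the offset register, advancing the offset as it goes.

def load_shader(table, offset_reg, shader_program):
    pos = offset_reg["value"]
    for block in shader_program:     # step 710: access the instruction blocks
        for instr in block:          # step 720: generate individual instructions
            table[pos] = instr       # step 730: load at the offset position
            pos += 1
    offset_reg["value"] = pos        # offset now points past the loaded program
    return table
```

Note that the shader program itself never names a table position; the offset register supplies the destination, which is the address independence the method provides.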
Software Assisted Shader Program Merging
In some embodiments, the above approach is utilized to allow multiple shader programs to be “merged” into the same instruction table. In this way, these embodiments address the problem described previously, in which multiple sources of shader programs may generate programs independently.
With reference to
As discussed previously, a command table may be loaded with command instructions from various sources. In one embodiment, for example, a GPU driver executing on a host computer may pass these command instructions to a GPU, and thence to the graphics pipeline and program sequencer.
Command instruction 0, in the depicted embodiment, is an “immediate” instruction, and instructs the program sequencer to set the global offset to zero. As such, the offsets associated with the instruction tables for the various modules within the pipeline will be set to zero as well. In some embodiments, e.g., where no global offset register is utilized, multiple command instructions may be utilized, to set each individual offset register to zero, or some other starting value.
Command instruction 1, in
Command instruction 2, in
Command instruction 3, in the depicted embodiment, is an execute.stop instruction, and instructs the program sequencer to process received data packets by configuring the modules in the pipeline to perform the instructions contained in entries 0 through 11.
By loading instruction tables in this manner, e.g., through the use of the offset register, the program sequencer can load multiple programs into a single instruction table at the same time, in an address independent manner. As such, the programs being loaded do not need to include an explicit instruction table offset. Therefore, the use of the program sequencer, the offset register, and this approach to coding instructions allows for a more flexible and adaptable graphics pipeline.
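The command sequence above can be sketched as follows, assuming the eight- and four-instruction programs of the depicted embodiment; the function and instruction names are illustrative assumptions.

```python
table = [None] * 16  # shared instruction table for a single pipeline module

def load(table, start, instructions):
    """Write instructions beginning at 'start'; return the next free entry."""
    for i, instr in enumerate(instructions):
        table[start + i] = instr
    return start + len(instructions)

offset = 0                                                   # command 0: set offset to zero
offset = load(table, offset, [f"p1_{i}" for i in range(8)])  # command 1: first program, entries 0-7
offset = load(table, offset, [f"p2_{i}" for i in range(4)])  # command 2: second program, entries 8-11
execute_range = (0, offset - 1)                              # command 3: execute.stop over entries 0-11
```

Neither program embeds a table address; the second program lands at entry 8 only because the offset register advanced past the first.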
Altering a Program
As noted previously, one issue in handling multiple programs in the same instruction table is that if one of the programs is changed, it may adversely affect other programs currently loaded into the table, e.g., through a read-modify-write interaction. Some embodiments address this problem through the use of the offset register.
One approach to handling modified shader programs is to completely reload the affected instruction tables. This approach is illustrated in
Offset 926 is again initialized to zero, and the instructions corresponding to the first program are passed through data register 927, and written to instruction table 925 as entries 0 through 7. The instructions corresponding to the second program are then written to instruction table 925 beginning at the first available location, as indicated by the offset register. In the depicted embodiment, the second program is an eight instruction program, and so is written to entries 8 through 15.
Command instruction 3 is still an execute.stop instruction, but now instructs the program sequencer to configure the modules in the pipeline to perform the instructions contained in entries 0 through 15.
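The full-reload approach can be sketched as follows, assuming (as depicted) the second program has grown from four to eight instructions; the names are illustrative assumptions.

```python
table = [None] * 32  # affected instruction table is rewritten from the top
offset = 0           # offset register re-initialized to zero

def load(table, start, instructions):
    """Write instructions beginning at 'start'; return the next free entry."""
    for i, instr in enumerate(instructions):
        table[start + i] = instr
    return start + len(instructions)

offset = load(table, offset, [f"first_{i}" for i in range(8)])   # unchanged first program, entries 0-7
offset = load(table, offset, [f"second_{i}" for i in range(8)])  # modified second program, entries 8-15
execute_range = (0, offset - 1)                                  # execute.stop now spans entries 0-15
```

The first program is rewritten verbatim even though it did not change, which is what makes this approach simple but relatively inefficient.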
In some embodiments, the first program may have been altered, rather than the second. In these embodiments, the approach described with reference to
Efficient Instruction Table Updating
In some embodiments, a more efficient approach is available, as illustrated in
As shown in
In some embodiments, the first program may have been altered, rather than the second. In these embodiments, the approach described with reference to
Software Assisted Shader Merging
With reference now to
With reference to step 1105, a program sequencer in a graphics pipeline is configured. In various embodiments, as noted previously, the method and approach of configuring the program sequencer may vary. For example, in one embodiment, a GPU driver issues a series of command instructions to the program sequencer, which configure the program sequencer to control the graphics pipeline. In one such embodiment, a series of command instructions are loaded into a command table associated with the program sequencer; these command instructions control the operation of the program sequencer, which in turn governs the flow of data to the graphics pipeline.
With reference now to step 1110, a first shader program stored in memory is accessed. In some embodiments, shader programs are made up of multiple instruction blocks, and stored in memory, e.g., graphics memory or system memory. The instruction blocks are made up of instructions intended for modules within the graphics pipeline. These instructions configure the various modules to perform certain tasks on data packets being passed through the pipeline. These instruction blocks are retrieved, e.g., by the program sequencer carrying out a gather operation.
With reference now to step 1115, shader instructions are generated from the first shader program. As noted previously, in some embodiments, shader programs are made up of a collection of instruction blocks, which in turn are made up of individual instructions. In some embodiments, instructions may be read, one at a time, directly from memory; in other embodiments, individual instructions may need to be derived, e.g., by the program sequencer, from the shader program.
With reference now to step 1120, this first set of shader instructions are loaded into an instruction table for a pipeline module at locations indicated by an offset register. As described previously, some embodiments utilize an offset register as a “pointer.” The value contained in the offset register indicates a position within the instruction table for the related module where instructions should be loaded. In one such embodiment, the program sequencer generates register packets, which contain one or more instructions intended for a particular module. The module recognizes the register packet, e.g., by reading an address field included in the header of the packet, and extracts the instructions contained therein. These instructions are then loaded into the instruction table for the module, beginning at the position indicated by the offset register.
With reference now to step 1130, a second shader program stored in memory is accessed.
With reference now to step 1135, shader instructions are generated from this second shader program.
With reference now to step 1140, the second set of shader instructions are loaded into the instruction table for the pipeline module at the position indicated by the offset register, e.g., at the first available position following the first set of shader instructions. In this way, shader instructions corresponding to multiple shader programs can be passed to their intended destination and utilized by the appropriate module, without the need for the shader program to include specific memory addresses.
With reference now to step 1150, the program sequencer for the graphics pipeline is reconfigured. In some embodiments, the program sequencer, and specifically the command table for the program sequencer, is reconfigured. This may occur, for example, when a change has been made to one of the shader programs currently loaded in a graphics pipeline, and the GPU driver wishes to utilize the changed shader program. In one such embodiment, the GPU driver reloads the command table for the program sequencer.
With reference now to step 1155, the program sequencer executes the updated command instructions, to load a modified shader program into the appropriate instruction table. In some embodiments, this step entails gathering the modified shader program from memory, generating register packets, and forwarding the register packets down the pipeline. Modules within the pipeline will recognize packets intended for their consumption, and load the updated instructions into the instruction tables. In some embodiments, the updated shader program can be written over top of the shader program it is intended to replace. In other embodiments, the shader program may be written to a different location within the instruction table, e.g., where sufficient blank or unused entries exist, in order to allow the shader program to be written contiguously.
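The two update strategies of step 1155 can be sketched with a hypothetical helper; the function name and the list-based table representation are assumptions. An updated program is written over top of the one it replaces when it fits, and otherwise into a contiguous run of unused entries.

```python
def update_program(table, old_start, old_len, new_instrs):
    """Write new_instrs over the old program if they fit in its entries;
    otherwise place them in the first contiguous run of unused (None) entries."""
    if len(new_instrs) <= old_len:
        for i, instr in enumerate(new_instrs):   # overwrite in place
            table[old_start + i] = instr
        return old_start
    run = 0
    for idx, entry in enumerate(table):          # scan for enough blank entries
        run = run + 1 if entry is None else 0
        if run == len(new_instrs):
            start = idx - run + 1
            for i, instr in enumerate(new_instrs):
                table[start + i] = instr
            return start
    raise ValueError("no contiguous unused space for the updated program")
```

Writing to a fresh run of unused entries avoids disturbing the other programs already resident in the table, at the cost of leaving the superseded entries behind.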
Notation and Nomenclature:
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 100 of
Embodiments of the present invention are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the following claims.