1. Field of Invention
The present invention relates generally to the field of 3D computer graphics rendering, and more particularly, to ways of and means for improving the performance of parallel graphics rendering processes running on 3D parallel graphics rendering systems supporting the decomposition of 3D scene objects among their multiple graphics processing pipelines (GPPLs).
2. Brief Description of the State of Knowledge in the Art
Applicants' copending U.S. patent application Ser. No. 11/897,536, incorporated herein by reference, in its entirety, discloses diverse kinds of PC-level computing systems embodying different types of parallel graphics rendering subsystems (PGRSs) with graphics processing pipelines (GPPLs) generally illustrated in
In general, such graphics-based computing systems support multiple modes of graphics rendering parallelism across their GPPLs, including image and object division modes, which can be adaptively and dynamically switched into operation during the run-time of any graphics application running on the host computing system. While each mode of parallel operation has its advantages, as described in copending U.S. patent application Ser. No. 11/897,536, supra, the object division mode of parallel operation is particularly helpful during the running of interactive gaming applications because this mode has the potential of resolving many bottleneck conflicts which naturally accompany such demanding applications.
Today, real-time graphics applications, such as advanced video games, are more demanding than ever, utilizing massive textures, an abundance of polygons, high depth-complexity, anti-aliasing, multi-pass rendering, and the like, with such demands growing rapidly over time.
Clearly, conventional PC-based graphics systems fail to address the dynamically changing needs of modern graphics applications. By their very nature, prior art PC-based graphics systems are unable to resolve the variety of bottlenecks (e.g. geometry limited, pixel limited, data transfer limited, and memory limited) summarized in FIG. 3C1 of copending U.S. patent application Ser. No. 11/897,536, that dynamically arise along 3D graphic pipelines. Consequently, such prior art graphics systems are often unable to maintain a high and steady level of performance throughout a particular graphics application.
Thus, a given pipeline in a parallel graphics system is only as strong as the weakest of its stages, and a single bottleneck determines the overall throughput along the graphics pipelines, resulting in an unstable frame rate, poor scalability, and poor performance.
While each parallelization mode described above and summarized in copending U.S. patent application Ser. No. 11/897,536 solves only part of the bottleneck dilemma currently existing along PC-based graphics pipelines, no single parallelization method, in and of itself, is sufficient to resolve all bottlenecks in demanding graphics applications and enable the quantum leaps in graphics performance necessary for photo-realistic imagery in real-time interactive graphics environments.
Thus, there is a great need in the art for a new and improved way of and means for practicing parallel 3D graphics rendering processes in modern multiple-GPU based computer graphics systems, while avoiding the shortcomings and drawbacks of such prior art methodologies and apparatus.
Accordingly, a primary object of the present invention is to provide a new and improved method of and apparatus for practicing parallel 3D graphics processes in modern multiple-GPU based computer graphics systems, based on monitoring the graphics workload at sub-frame resolution, treating graphics tasks as objects, and parallelizing graphics task-objects in 3D scenes among multiple graphics processing pipelines (GPPLs).
Another object of the present invention is to provide a new and improved parallel graphics processing subsystem that matches the optimal parallel mode of division to the graphics workload at each instant of time during the running of a graphics-based application.
Another object of the present invention is to provide such a parallel graphics processing subsystem supporting various division modes among GPPLs: image division, object division, and improved object division with no recomposition.
Another object of the present invention is to provide a new and improved method of parallel graphics processing on a parallel graphics processing system that is capable of real-time modification of the flow structure of the incoming graphics commands such that multi-mode parallelism is carried out among GPPLs in an optimal manner.
Another object of the present invention is to provide a new and improved parallel graphics processing system that makes real-time (i.e. online) decisions as to the best parallelization method for operating the GPPLs, and modifies the flow of the incoming commands in real-time accordingly.
Another object of the present invention is to provide a new and improved method of controlling the operation of parallel graphics processing among a plurality of GPPLs on a parallel graphics processing system according to a new type of object-division parallelism, involving the performance of sub-frame division, wherein each frame of a 3D scene to be rendered is divided into a set of minimal tasks (where each task is considered a macro-object of sorts), and then, in the spirit of object-division parallelism, the processing of these divided tasks is distributed among multiple GPUs.
Another object of the present invention is to provide a new and improved parallel graphics processing system having an object-division mode of parallel graphics processing, wherein each frame of a 3D scene to be rendered is divided into a set of minimal tasks (where each task is considered a macro-object of sorts), and then, in the spirit of object-division parallelism, the processing of these divided tasks is distributed among multiple GPUs in a real-time manner during the run-time of the graphics-based application executing on the CPU(s) of the associated host computing system.
Another object of the present invention is to provide a new and improved host computing system, having one or more CPUs and employing a parallel graphics processing system having an object-division mode of parallel graphics processing, wherein each frame of a 3D scene to be rendered is divided into a set of minimal tasks (where each task is considered a macro-object of sorts), and then, in the spirit of object-division parallelism, the processing of these divided tasks is distributed among multiple GPUs in a real-time manner during the run-time of the graphics-based application executing on the CPU(s) of the host computing system.
These and other objects of the present invention will become apparent hereinafter and in the claims to invention.
For a more complete understanding of how to practice the Objects of the Present Invention, the following Detailed Description of the Illustrative Embodiments can be read in conjunction with the accompanying Drawings, briefly described below:
FIG. 3B1 is a schematic representation of the subcomponents of a first illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) that can be employed in the PGPS of the present invention depicted in
FIG. 3B2 is a schematic representation of the subcomponents of a second illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) that can be employed in the PGPS of the present invention depicted in
FIG. 3B3 is a schematic representation of the subcomponents of an illustrative embodiment of a CPU-based graphics processing pipeline that can be employed in the PGPS of the present invention depicted in
In contemporary graphics applications, multiple rendering targets are used rather than a single back buffer. Scene objects are simultaneously rendered to a set of rendering ‘surfaces’ in texture memory in order to generate effects such as shadow maps and reflections. The rendering ‘surfaces’ can be rendered in various orders; however, any order must satisfy the dependencies between surfaces. At some stage, all ‘surfaces’ must be merged into the back buffer.
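By way of a non-limiting illustration (and not reproduced from any of the referenced figures), the following Direct3D 9-style C++ sketch shows how a scene might first be rendered into an off-screen ‘surface’ (e.g. a shadow map) and then sampled while rendering into the back buffer, thereby creating a dependency between the two surfaces; the helper functions renderShadowCasters() and renderSceneWithShadows() are hypothetical placeholders for the application's own draw calls.

```cpp
#include <d3d9.h>

// Hypothetical helpers standing in for the application's actual draw calls.
static void renderShadowCasters(IDirect3DDevice9* dev)    { (void)dev; /* draw occluders */ }
static void renderSceneWithShadows(IDirect3DDevice9* dev) { (void)dev; /* draw lit scene */ }

// Sketch: pass 1 writes a texture 'surface' that pass 2 reads, so the
// back-buffer pass depends on the shadow-map pass and must follow it.
void renderFrame(IDirect3DDevice9* dev, IDirect3DTexture9* shadowMap)
{
    IDirect3DSurface9* backBuffer    = NULL;
    IDirect3DSurface9* shadowSurface = NULL;
    dev->GetRenderTarget(0, &backBuffer);           // remember the current back buffer
    shadowMap->GetSurfaceLevel(0, &shadowSurface);  // top-level mip used as a render 'surface'

    // Pass 1: render scene objects into the shadow-map surface.
    dev->SetRenderTarget(0, shadowSurface);
    dev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, 0xFFFFFFFF, 1.0f, 0);
    dev->BeginScene();
    renderShadowCasters(dev);
    dev->EndScene();

    // Pass 2: merge into the back buffer, sampling the surface produced above.
    dev->SetRenderTarget(0, backBuffer);
    dev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, 0xFF000000, 1.0f, 0);
    dev->SetTexture(0, shadowMap);                  // dependency: pass 2 reads pass 1's output
    dev->BeginScene();
    renderSceneWithShadows(dev);
    dev->EndScene();

    shadowSurface->Release();
    backBuffer->Release();
}
```

Any valid ordering of such passes must respect this dependency: the shadow-map pass must complete before the back-buffer pass that samples its output.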
The present invention monitors the rendering order and controls the rendering flow by breaking down the sequence of rendering commands into blocks. Some of the heaviest blocks are further broken down into entities called task-objects. There are different possible break-down (graphics frame/stream division) schemes according to the chosen mode of parallelism: e.g. time division, image division, classical (depth-based) object division, or ‘depthless’ object division, each being supported in real-time in Applicants' parallel graphics processing system described in great detail in copending U.S. application Ser. No. 12/077,072, incorporated herein by reference. Optimization of the division scheme and of the parallelization of task-objects among multiple GPPLs is carried out by a scheduler.
The parallel 3D graphics processing system and method of the present invention can be practiced in diverse kinds of computing and micro-computing environments in which 3D graphics support is required or desired. Referring to
In
As shown, the PMCM further comprises an OS-GPU interface (I/F) and Utilities; Merge Management Module; Distribution Management Module; Distributed Graphics Function Control; and Hub Control, as described in greater detail in U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, incorporated herein by reference.
As shown, the Decomposition Module further comprises a Load Balance Submodule, and a Division Submodule, whereas the Distribution Module comprises a Distribution Management Submodule and an Interconnect Network.
Also, the Rendering Module comprises the plurality of GPPLs, whereas the Re-Composition Module comprises the Pixel Shader, the Shader Program Memory and the Video Memory (e.g. Z Buffer and Color Buffers) within each of the GPPLs cooperating over the Interconnect Network.
In FIG. 3B1, a first illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in
In FIG. 3B2, a second illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in
In FIG. 3B3, an illustrative embodiment of a CPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in
In
Having described the system architecture of the illustrative embodiment of the present invention, it is now appropriate to focus attention on its new and improved mode of parallel graphics processing, carried out according to its task-based object division principles of operation.
When a task-object uses or creates a render target that is used by a subsequent rendering operation to a different target, a dependency is set up between the two task-objects. A simplified example is shown in
This scene is generated by the code of
The above code is converted into the Block Dependency Graph of
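Although the figure containing the graph itself is not reproduced here, the following C++ sketch suggests one plausible in-memory representation of such a Block Dependency Graph, under the assumption that blocks are identified by integer indices and resources (textures, surfaces, buffers) by numeric handles; the names BlockNode and buildDependencyGraph are illustrative only, and a simplified last-writer rule is used to add edges.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using BlockId    = int;
using ResourceId = std::uint64_t;  // texture/surface/buffer handle (assumed)

struct BlockNode {
    std::vector<ResourceId> reads;          // resources sampled or bound as inputs
    std::vector<ResourceId> writes;         // render targets, Z-buffer, etc.
    std::unordered_set<BlockId> dependsOn;  // edges of the Block Dependency Graph
};

// Add an edge from the most recent earlier block that wrote a resource to any
// later block that reads it (a simplified last-writer dependency rule).
void buildDependencyGraph(std::vector<BlockNode>& blocks)
{
    std::unordered_map<ResourceId, BlockId> lastWriter;
    for (BlockId id = 0; id < static_cast<BlockId>(blocks.size()); ++id) {
        for (ResourceId r : blocks[id].reads) {
            auto it = lastWriter.find(r);
            if (it != lastWriter.end() && it->second != id)
                blocks[id].dependsOn.insert(it->second);
        }
        for (ResourceId r : blocks[id].writes)
            lastWriter[r] = id;
    }
}
```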
In
The host computing system of the present invention, performing the task-object based graphics parallelization of the present invention, is depicted in
Task-object and sub-frame division refers to the ability to divide a frame into minimal tasks and to distribute the processing of these tasks among multiple GPUs. This is a new way of achieving graphics parallelization at sub-frame resolution. In order to break down the entire rendering flow into task-objects within a single frame, the stream of commands must be scanned and a map of all the textures and surfaces used during the scene must be created. The tasks are then organized in a Task Graph, which is sent to a Scheduling mechanism. Finally, the tasks are executed on the designated GPU(s), the partial results are inter-communicated by the synchronizer mechanism, and the next tasks are processed.
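The following C++ skeleton, offered only as a simplified illustration of the per-frame flow just described, divides a toy command stream into task-objects and assigns them to GPUs in round-robin fashion; the types Command and TaskObject, the one-task-per-command rule, and the round-robin schedule are assumptions made for brevity, and the dependency analysis and synchronization steps are only indicated by comments.

```cpp
#include <cstdio>
#include <vector>

// Minimal illustrative types (assumed; real command/task records would be richer).
struct Command    { int id; };
struct TaskObject { int id; int assignedGpu; };

// Stand-in for scanning the command stream and dividing it into task-objects.
static std::vector<TaskObject> separateIntoTaskObjects(const std::vector<Command>& stream)
{
    std::vector<TaskObject> tasks;
    for (const Command& c : stream)
        tasks.push_back({c.id, -1});        // toy rule: one task-object per command
    return tasks;
}

// Stand-in for the Scheduling mechanism: toy round-robin assignment.
static void scheduleTasks(std::vector<TaskObject>& tasks, int gpuCount)
{
    for (TaskObject& t : tasks)
        t.assignedGpu = t.id % gpuCount;
}

int main()
{
    std::vector<Command> frame = {{0}, {1}, {2}, {3}};
    std::vector<TaskObject> tasks = separateIntoTaskObjects(frame);  // scan + divide
    // (Task Graph construction from resource dependencies would occur here.)
    scheduleTasks(tasks, /*gpuCount=*/2);
    for (const TaskObject& t : tasks)
        std::printf("task %d -> GPU %d (partial results then synchronized)\n",
                    t.id, t.assignedGpu);
    return 0;
}
```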
Every command sent by the application to the 3D Engine is intercepted and accumulated in a Command Buffer 601.
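A minimal C++ sketch of such an interception buffer is given below; the class name CommandBuffer, the InterceptedCommand record, and its fields are assumptions introduced solely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumed record for one intercepted command.
struct InterceptedCommand {
    enum class Kind { Clear, Draw, SetRenderTarget, SetTexture, Other } kind;
    std::size_t argumentBytes;  // size of captured arguments, useful later for cost estimates
};

// Command Buffer (601): accumulates every command the application sends to the
// 3D Engine until the Block Separator consumes the buffered stream.
class CommandBuffer {
public:
    void intercept(const InterceptedCommand& cmd) { commands_.push_back(cmd); }
    const std::vector<InterceptedCommand>& commands() const { return commands_; }
    void reset() { commands_.clear(); }
private:
    std::vector<InterceptedCommand> commands_;
};
```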
The Block Separator 602 processes the Command Buffer. Each set of commands can be defined as a Block 603. For example, a block could be created for each draw command and its preceding commands, for all commands between two SetRenderTarget commands, or even for an entire frame. The definition of a block can vary for several reasons: larger blocks (and therefore fewer blocks) are faster to analyze, thus saving CPU time, while smaller blocks allow a more precise distribution.
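One such block-separation policy, cutting at every SetRenderTarget command, can be sketched in C++ as follows; the types CmdKind and Block and the cutting predicate are illustrative assumptions, and other predicates (one block per draw call, or one per frame) would change only the test inside the loop.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-ins (assumed) for the intercepted command stream.
enum class CmdKind { Clear, Draw, SetRenderTarget, Other };

struct Block {                     // Block (603): a contiguous run of commands
    std::size_t firstCommand;
    std::size_t commandCount;
};

// Block Separator (602): split the buffered stream into blocks by cutting at
// every SetRenderTarget command. Larger blocks are cheaper to analyze (saving
// CPU time); smaller blocks allow a more precise distribution.
std::vector<Block> separateIntoBlocks(const std::vector<CmdKind>& stream)
{
    std::vector<Block> blocks;
    Block current{0, 0};
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (stream[i] == CmdKind::SetRenderTarget && current.commandCount > 0) {
            blocks.push_back(current);
            current = Block{i, 0};
        }
        ++current.commandCount;
    }
    if (current.commandCount > 0)
        blocks.push_back(current);
    return blocks;
}
```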
Each Block can be broken down into task-objects in several ways, according to the various parallelization modes (such as Image Division, Object Division, and Depthless Object Division). The Task Separator 604 is responsible for splitting the block into a set of optional Processing Techniques, each technique consisting of several task-objects. For example, assume a simple Block with a Clear command and 3 Draw calls, generating the image of
This block could generate several Task-object sets, as shown in
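For the example block above (a Clear command followed by 3 Draw calls), the following C++ sketch illustrates how the block might be split into alternative task-object sets for two of the parallelization modes; the types TaskObj and ProcessingTechnique, the tile and round-robin partitioning rules, and the omission of the Depthless Object Division case are all simplifying assumptions.

```cpp
#include <string>
#include <vector>

// Assumed task-object record: which draw calls it covers and, for image
// division, which screen tile it is clipped to (-1 means the full screen).
struct TaskObj {
    std::vector<int> drawCalls;
    int screenTile;
};
struct ProcessingTechnique {
    std::string mode;              // "ImageDivision", "ObjectDivision", ...
    std::vector<TaskObj> tasks;    // one task-object per GPPL
};

// Task Separator (604): for a block with N draw calls, propose one candidate
// task-object set per supported parallelization mode.
std::vector<ProcessingTechnique> splitBlock(int drawCallCount, int gpuCount)
{
    std::vector<ProcessingTechnique> options;

    // Image division: every GPPL replays all draw calls, clipped to its own tile.
    ProcessingTechnique image{"ImageDivision", {}};
    for (int gpu = 0; gpu < gpuCount; ++gpu) {
        TaskObj t{{}, gpu};
        for (int d = 0; d < drawCallCount; ++d) t.drawCalls.push_back(d);
        image.tasks.push_back(t);
    }
    options.push_back(image);

    // Object division: the draw calls themselves are partitioned across GPPLs.
    ProcessingTechnique object{"ObjectDivision", {}};
    for (int gpu = 0; gpu < gpuCount; ++gpu) {
        TaskObj t{{}, -1};
        for (int d = gpu; d < drawCallCount; d += gpuCount) t.drawCalls.push_back(d);
        object.tasks.push_back(t);
    }
    options.push_back(object);

    return options;
}
```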
The Dependency 608 component finds all the resources updated and needed by each block task. For example, a drawing block updates the Render Target, and typically the Z-Buffer too, and it depends on the Vertex Buffer, the sampled Textures, and again the Z-Buffer.
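This bookkeeping can be sketched as per-task read and write sets, as in the following C++ fragment; the type names and the conflict test are illustrative assumptions.

```cpp
#include <cstdint>
#include <unordered_set>

using ResourceHandle = std::uint64_t;  // texture, surface or buffer identifier (assumed)

// Dependency (608): the sets of resources a block task reads and writes. For a
// typical drawing block, writes = { render target, Z-buffer } and
// reads = { vertex buffer, sampled textures, Z-buffer }.
struct TaskDependencies {
    std::unordered_set<ResourceHandle> reads;
    std::unordered_set<ResourceHandle> writes;
};

// Two tasks must be ordered (or synchronized across GPPLs) when one writes a
// resource that the other reads or writes.
bool mustOrder(const TaskDependencies& a, const TaskDependencies& b)
{
    for (ResourceHandle r : a.writes)
        if (b.reads.count(r) || b.writes.count(r)) return true;
    for (ResourceHandle r : b.writes)
        if (a.reads.count(r)) return true;
    return false;
}
```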
The Cost Approximation 607 module is responsible for approximating the cost of a task before it is executed. Typically, the cost depends on the amount of work to be done and on the cost of communication to/from the task-object, which depends mostly on the size of the resource (in bytes) and the bandwidth of the PCI-e bus. The approximation of cost is critical for scheduling and therefore must occur before execution and should be as precise as possible. The module attempts to find a correlation between the streamed commands and the true cost of a task.
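A simple additive cost model consistent with this description is sketched below; the fill-rate and PCI-e bandwidth constants are assumed placeholder values which, in practice, would be calibrated from the correlation the module learns between the streamed commands and the measured cost of executed tasks.

```cpp
#include <cstdint>

// Assumed inputs to the Cost Approximation (607) module.
struct TaskCostInputs {
    std::uint64_t estimatedWorkUnits;  // e.g. pixels or primitives to process
    std::uint64_t transferBytes;       // resource data moved to/from the task-object
};

// cost [seconds] ~= work / processing_rate + transfer_bytes / bus_bandwidth
double approximateCost(const TaskCostInputs& in)
{
    const double workUnitsPerSecond = 5.0e9;  // assumed GPU processing rate
    const double busBytesPerSecond  = 4.0e9;  // assumed PCI-e bus bandwidth
    const double renderCost   = static_cast<double>(in.estimatedWorkUnits) / workUnitsPerSecond;
    const double transferCost = static_cast<double>(in.transferBytes)      / busBytesPerSecond;
    return renderCost + transferCost;
}
```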
The present application is a Continuation-in-Part (CIP) of the following Applications: Ser. No. 12/077,072 filed Mar. 14, 2008; Ser. No. 11/897,536 filed Aug. 30, 2007; U.S. application Ser. No. 11/789,039 filed Apr. 23, 2007; U.S. application Ser. No. 11/655,735 filed Jan. 18, 2007, which is based on Provisional Application Ser. No. 60/759,608 filed Jan. 18, 2006; U.S. application Ser. No. 11/648,160 filed Dec. 31, 2006; U.S. application Ser. No. 11/386,454 filed Mar. 22, 2006; U.S. application Ser. No. 11/340,402 filed Jan. 25, 2006, which is based on Provisional Application No. 60/647,146 filed Jan. 25, 2005; U.S. application Ser. No. 10/579,682 filed May 17, 2006, which is a National Stage Entry of International Application No. PCT/IL2004/001069 filed Nov. 19, 2004, which is based on Provisional Application Ser. No. 60/523,084 filed Nov. 19, 2003; each said patent application being commonly owned by Lucid Information Technology, Ltd., and being incorporated herein by reference as if set forth fully herein.
Provisional Applications:

Number | Date | Country
---|---|---
60/759,608 | Jan. 2006 | US
60/647,146 | Jan. 2005 | US
60/523,084 | Nov. 2003 | US
Parent/Child Continuation-in-Part Data:

Relation | Number | Date | Country
---|---|---|---
Parent | 12/077,072 | Mar. 2008 | US
Child | 12/229,215 | | US
Parent | 11/897,536 | Aug. 2007 | US
Child | 12/077,072 | | US
Parent | 11/789,039 | Apr. 2007 | US
Child | 11/897,536 | | US
Parent | 11/655,735 | Jan. 2007 | US
Child | 11/789,039 | | US
Parent | 11/648,160 | Dec. 2006 | US
Child | 11/655,735 | | US
Parent | 11/386,454 | Mar. 2006 | US
Child | 11/648,160 | | US
Parent | 11/340,402 | Jan. 2006 | US
Child | 11/386,454 | | US
Parent | 10/579,682 | Mar. 2007 | US
Child | 11/340,402 | | US