A portion of the disclosure of this patent document contains material which is subject to (copyright or mask work) protection. The (copyright or mask work) owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all (copyright or mask work) rights whatsoever.
This disclosure relates generally to the field of computer programming. More particularly, but not by way of limitation, it relates to techniques for programming graphical and computational applications to execute on a variety of graphical and computational processors.
Computers and other computational devices typically have at least one programmable processing element that is generally known as a central processing unit (CPU). They frequently also have other programmable processors that are used for specialized processing of various types, such as graphics processing operations, and are hence typically called graphics processing units (GPUs). GPUs generally comprise multiple cores or processing elements designed for executing the same instruction on parallel data streams, making them more effective than general-purpose CPUs for algorithms in which processing of large blocks of data is done in parallel. In general, a CPU functions as the host and hands off specialized parallel tasks to the GPUs.
Several frameworks have been developed for heterogeneous computing platforms that have CPUs and GPUs. These frameworks include OpenGL™. OpenGL focuses on using the GPU for graphics processing and provides APIs for rendering 2D and 3D graphics.
The OpenGL framework offers a C-like development environment in which users can create applications to run on various different types of CPUs, GPUs, digital signal processors (DSPs), and other processors. OpenGL also provides a compiler and a runtime environment in which code can be compiled and executed within a heterogeneous computing system. When using OpenGL, developers can use a single, unified language to target all of the processors for which an OpenGL driver is available. This is done by presenting the developer with an abstract platform model and application programming interface (API) that conceptualizes all of these architectures in a similar way, as well as an execution model supporting data and task parallelism across heterogeneous architectures.
When an OpenGL program is executed, a series of API calls configure the system for execution, an embedded compiler compiles the OpenGL code, and the runtime asynchronously coordinates execution between parallel tasks. A typical OpenGL-based system runs source code through an embedded compiler on the end-user system to generate executable code for a target GPU available on that system. Then, the executable code, or portions of the executable code, are sent to the target GPU and are executed. However, this approach, particularly the compiling step, may take too long for some types of applications, such as graphics-intensive games.
In some sense, OpenGL itself may be considered as a state machine, with each command potentially resulting in a state change that requires the generation and/or compilation of new GPU code. This arises from the fact that certain GPU functions rely on dedicated circuitry within the GPU, while others require use of the programmable features of the GPU. Depending on the particular GPU hardware being used, these types of state changes can be very expensive from a computation time perspective. Additionally, in recent years, evolution of GPU hardware has outpaced evolution of OpenGL, such that, in some sense, OpenGL APIs are mismatched to the hardware environment in which the programs will run. The result is that a developer may inadvertently be writing code that is particularly inefficient for at least some hardware on which it will run.
Therefore, there is a need in the art for a framework for GPU programming that more closely relates the APIs to the underlying hardware, such that a developer is aware of the distinctions between the fixed-function portions and the programmable portions of modern GPUs. This awareness can enable a developer to write code that executes more efficiently on modern devices.
One disclosed embodiment includes a non-transitory computer readable medium having instructions stored thereon to support immutable pipeline state objects containing code for a graphics processing unit (GPU). When executed, the instructions can cause one or more processors to create an immutable pipeline state object that contains compiled information about one or more graphics operations to display a graphical object. The immutable pipeline state object can be compiled at application load time to encapsulate executable instructions for a GPU and externalize mutable attributes requiring re-compilation when changed. The one or more graphics operations can include one or more shaders of a type selected from the group consisting of a vertex shader, fragment shader, and a vertex fetch configuration. The one or more graphics operations can include at least one item selected from the group consisting of blend state, rasterization enablement, and multisample masking.
The non-transitory computer readable medium of this first disclosed embodiment can further include instructions that cause the one or more processors to create a set of one or more associated state options for the immutable state object. The set of one or more associated state options can include data attributes that can be changed without causing a corresponding change to the executable instructions for the GPU and the associated immutable state object. Examples of such attributes include input textures or input vertex data, viewport size, and/or occlusion query data.
Another disclosed embodiment relates to a method of generating GPU code for graphical operations in an application program. The method can include defining one or more objects, such as a target frame buffer configuration, to be persistent throughout a rendering pass executed by a GPU. The method can further include defining a plurality of immutable pipeline state objects, each associated with a graphical operation and containing compiled executable instructions for the GPU. The method can also include defining one or more state options associated with the immutable state object. The one or more state options can include data attributes that can be changed to alter the corresponding graphical operation without causing a change to the executable instructions for the GPU. In the disclosed method of this embodiment, the compiled executable instructions for the GPU can be arranged so as to be compiled only one time at a time other than draw time of the graphical operation and cached for repeated use thereafter. This time can be when the application is installed onto a target system or when the application is loaded into a memory of the target system for execution. The immutable pipeline state objects can further include additional parameters that affect the compiled executable instructions for the GPU. The immutable pipeline state objects can also include at least one shader, such as a vertex shader, fragment shader, and a vertex fetch configuration, and at least one additional item, such as a blend state, rasterization enablement, and multisample masking.
Yet another disclosed embodiment relates to a computing device having a memory and a processor, the processor including a CPU and a GPU. The processor can be configured to execute program code stored in the memory, thereby creating an immutable pipeline state object and a set of one or more associated state options for the immutable state object. The immutable pipeline state object can contain compiled information about one or more graphics operations to display a graphical object and can be adapted to be compiled at a time other than the time at which the graphical object is rendered so as to encapsulate executable instructions for a GPU and externalize mutable attributes requiring re-compilation when changed. The set of one or more associated state options for the immutable state object can include data attributes that can be changed without causing a corresponding change to the executable instructions for the GPU and the associated immutable state object. The one or more graphics operations can include one or more shaders, such as a vertex shader, fragment shader, or a vertex fetch configuration. The one or more graphics operations can include at least one item such as blend state, rasterization enablement, and multisample masking. The time other than the time at which the graphical object is rendered can be a time that an application including the program code is installed onto the computing device or a time that an application including the program code is loaded into the memory of the computing device for execution.
An innovative GPU framework and related APIs present more accurate representations of the target hardware so that the distinctions between the fixed-function and programmable features of the GPU are perceived by a developer. This permits a program and/or a graphics object generated or manipulated by the program to be understood as not just code, but machine states that are associated with the code. When such an object is defined, the definitional components requiring programmable GPU features can be compiled only once and reused repeatedly as needed. Similarly, when a state change is made, the state changes correspond to the state changes made on the hardware. Additionally, the creation of these immutable objects prevents a developer from inadvertently changing portions of the program or object that cause it to behave differently than intended.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention. References to numbers without subscripts or suffixes are understood to reference all instances of subscripts and suffixes corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter. Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.
As used herein, the term “a computer system” can refer to a single computer or a plurality of computers working together to perform the function described as being performed on or by a computer system. Similarly, a machine-readable medium can refer to a single physical medium or a plurality of media that may together contain the indicated information stored thereon. A processor can refer to a single processing element or a plurality of processing elements, implemented either on a single chip or on multiple processing chips.
It will be appreciated that in the development of any actual implementation (as in any development project), numerous decisions must be made to achieve the developers' specific goals (e.g., compliance with system- and business-related constraints), and that these goals may vary from one implementation to another. It will also be appreciated that such development efforts might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the design and implementation of systems having the benefit of this disclosure.
Turning now to
The application 120 may be delivered to the target machine 105 in any desired manner, including electronic transport over a network and physical transport of machine-readable media. This generally involves delivery of the application 120 to a server (not shown in
Upon launch of the application 120, one action performed by the application can be the creation of a collection of pipeline objects 155 that may include state information 125, fragment shaders 130, and vertex shaders 135. The application may then be compiled by an embedded GPU compiler 145, which compiles the representation provided by the compiler 115 into native binary code for the GPU 150. The compiled native code may be cached in cache 140 or stored elsewhere in the target system 105 to improve performance if the same pipeline is recreated later, such as during future launches of the application. Finally, the GPU 150 may execute the native binary code, performing the graphics and compute kernels for data parallel operations.
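The caching step above can be sketched as a lookup keyed on the full pipeline description, so that recreating an identical pipeline later skips the embedded compiler. This is a minimal illustration under stated assumptions: `compile_to_native`, `PipelineCache`, and the dictionary fields are hypothetical stand-ins for compiler 145 and cache 140, not part of any real driver API.

```python
import hashlib
import json


def compile_to_native(ir: str) -> bytes:
    """Hypothetical stand-in for embedded GPU compiler 145 (the slow step)."""
    return ("native:" + ir).encode("utf-8")


class PipelineCache:
    """Models cache 140: native code keyed by the full pipeline description."""

    def __init__(self):
        self._store = {}
        self.compile_count = 0  # number of times the compiler actually ran

    @staticmethod
    def key_for(description: dict) -> str:
        # Serialize with sorted keys so equal descriptions give equal keys.
        blob = json.dumps(description, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get_or_compile(self, description: dict) -> bytes:
        key = self.key_for(description)
        if key not in self._store:
            self.compile_count += 1
            ir = json.dumps(description, sort_keys=True)
            self._store[key] = compile_to_native(ir)
        return self._store[key]


cache = PipelineCache()
desc = {"vertex_shader": "vs_main", "fragment_shader": "fs_main", "blend": "off"}
first = cache.get_or_compile(desc)
# Recreating the same pipeline later (e.g., on a later launch) hits the cache,
# even if the description's keys arrive in a different order.
again = cache.get_or_compile(
    {"blend": "off", "fragment_shader": "fs_main", "vertex_shader": "vs_main"}
)
```

A real system would persist the cache across launches (for example on disk); the in-memory dictionary here only illustrates the lookup logic.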
Referring now to
As illustrated in
The storage device 214 is typically a magnetic hard drive, an optical drive, a non-volatile solid-state memory device, or other types of memory systems, which maintain data (e.g. large amounts of data) even after power is removed from the system. While
Referring now to
Computing system 300 includes a CPU 310 and a GPU 330. In the embodiment illustrated in
In addition, computing system 300 also includes a system memory 340 that may be accessed by CPU 310 and GPU 330. In various embodiments, computing system 300 may comprise a supercomputer, a desktop computer, a laptop computer, a video-game console, an embedded device, a handheld device (e.g., a mobile telephone, smart phone, MP3 player, a camera, a GPS device, or other mobile device), or any other device that includes or is configured to include a GPU. Although not illustrated in
GPU 330 assists CPU 310 by performing certain special functions, such as graphics-processing tasks and data-parallel, general-compute tasks, usually faster than CPU 310 could perform them in software.
GPU 330 is coupled with CPU 310 and system memory 340 over link 350. Link 350 may be any type of bus or communications fabric used in computer systems, including a peripheral component interconnect (PCI) bus, an accelerated graphics port (AGP) bus, a PCI Express (PCIe) bus, or another type of link, including non-bus links. If multiple links 350 are employed, they may be of different types.
In addition to system memory 340, computing system 300 may include a local memory 320 that is coupled to GPU 330, as well as to link 350. Local memory 320 is available to GPU 330 to provide access to certain data (such as data that is frequently used) faster than would be possible if the data were stored in system memory 340. Both CPU 310 and GPU 330 can also contain caches or local memory within them.
Although a single CPU 310 and GPU 330 are illustrated in
Turning now to
A unified programming interface may be used to develop software on a system generally corresponding to that described above with respect to
Aspects of an innovative GPU programming framework and associated APIs may be best understood as a tiered structure having three levels:
A first level is things that must be defined at the time a rendering pass is started and cannot be changed until the rendering pass is complete. First and foremost of these is the image that is being rendered to, i.e., the frame buffer configuration. Frame buffer configuration can include buffer size and dimensions, color parameters, etc. The fixation of frame buffer configuration at the start and for the duration of a render pass is in contradistinction to prior art GPU frameworks/APIs, in which change to frame buffer configuration is treated just like any other command, and can thus appear at any point, including in the middle of a rendering pass. Further aspects of frame buffer configuration are described in the co-pending application incorporated by reference above.
A second level is choosing a pipeline state object, which includes all of the graphics state that must be compiled into GPU code. Because running the GPU compiler on the target system is computationally expensive, it is preferable to do all of this at once, generally not at run time, but rather at the time the application is installed on the target system or when the application is loaded for execution. By incorporating functionality triggering a GPU code re-compile into a pipeline state object, APIs may be developed that make a developer aware of when the code being written will result in a computationally expensive and time-consuming recompile. Further aspects and details of the pipeline state object are discussed below.
A third level is state options that are easy/inexpensive to change on the fly (e.g., during execution of the application) without the necessity of compiling new GPU code. Further aspects of these state options are described below as well as in the co-pending application incorporated by reference above.
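The three levels can be modeled as three kinds of objects with different mutability. In this minimal sketch, all class and field names (`FramebufferConfig`, `PipelineState`, `StateOptions`, `RenderPass`) are hypothetical: level 1 is fixed when the pass starts, level 2 is an immutable object that would hold compiled GPU code, and level 3 is a plain mutable container that can change between draws without recompilation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FramebufferConfig:
    """Level 1: fixed when the rendering pass starts."""
    width: int
    height: int
    color_format: str


@dataclass(frozen=True)
class PipelineState:
    """Level 2: everything that must be compiled into GPU code."""
    vertex_shader: str
    fragment_shader: str
    blend_enabled: bool


@dataclass
class StateOptions:
    """Level 3: cheap to change between draws; no recompilation."""
    viewport: tuple = (0, 0, 0, 0)
    textures: list = field(default_factory=list)


class RenderPass:
    def __init__(self, framebuffer: FramebufferConfig):
        self._framebuffer = framebuffer  # chosen once, at pass start
        self.pipeline = None
        self.options = StateOptions()
        self.draws = []

    @property
    def framebuffer(self) -> FramebufferConfig:
        # Read-only: no setter, so the configuration cannot change mid-pass.
        return self._framebuffer

    def set_pipeline(self, pipeline: PipelineState) -> None:
        self.pipeline = pipeline  # selects an already-compiled object

    def draw(self, vertex_count: int) -> None:
        if self.pipeline is None:
            raise RuntimeError("no pipeline selected")
        self.draws.append((self.pipeline, self.options.viewport, vertex_count))


rp = RenderPass(FramebufferConfig(1920, 1080, "bgra8"))
rp.set_pipeline(PipelineState("vs_main", "fs_main", blend_enabled=False))
rp.options.viewport = (0, 0, 1920, 1080)
rp.draw(3)
rp.options.viewport = (0, 0, 960, 540)  # level-3 change only
rp.draw(3)
```

The design choice illustrated is that the type of each object, not a runtime check, tells the developer which changes are cheap and which are not.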
Once a frame buffer has been defined, a command or sequence of commands to render an object to the frame buffer can be created. In some embodiments, this task can fall to a render command encoder 544 as illustrated in
Sampler 508 may be an immutable object constructed using the Device method newSamplerWithDescriptor, which uses the sampler descriptor object 520 as an input value. Sampler descriptor object 520 may in turn be a mutable container for sampler properties including filtering options, addressing modes, maximum anisotropy, level-of-detail parameters, and depth comparison mode. To construct the sampler 508, desired values for sampler properties may be set in the sampler descriptor object 520 before it is used as an input in constructing the sampler 508.
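The descriptor-then-object pattern can be sketched as follows: the mutable descriptor is filled in, and construction copies its current values into an immutable object, so later edits to the descriptor do not affect the sampler. The function `new_sampler_with_descriptor` is a hypothetical Python analogue of the newSamplerWithDescriptor method named above, and the property names and default values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class SamplerDescriptor:
    """Mutable container for sampler properties (models descriptor 520)."""
    min_filter: str = "nearest"
    mag_filter: str = "nearest"
    address_mode: str = "clamp_to_edge"
    max_anisotropy: int = 1
    lod_max_clamp: float = 1000.0
    compare_function: str = "never"


@dataclass(frozen=True)
class Sampler:
    """Immutable sampler (models sampler 508)."""
    settings: tuple  # frozen snapshot of the descriptor's properties

    def get(self, name: str):
        return dict(self.settings)[name]


def new_sampler_with_descriptor(desc: SamplerDescriptor) -> Sampler:
    # Hypothetical analogue of newSamplerWithDescriptor: snapshot the values.
    return Sampler(tuple(sorted(asdict(desc).items())))


desc = SamplerDescriptor()
desc.min_filter = "linear"
desc.max_anisotropy = 8
sampler = new_sampler_with_descriptor(desc)
desc.max_anisotropy = 1  # reusing the descriptor does not alter the sampler
```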
Depth stencil state object 538 may be a mutable object used in constructing the render command encoder object 544. Depth stencil state object 538 may itself be constructed using depth stencil state descriptor object 530, which may be a mutable state object that contains settings for depth and/or stencil state. For example, depth stencil state descriptor object 530 may include a depth value for setting the depth, stencil back face state and stencil front face state properties for specifying separate stencil states for front and back-facing primitives, and a depth compare function property for specifying how a depth test is performed. For example, leaving the value of the depth compare function property at its default value indicates that the depth test always passes, which means an incoming fragment remains a candidate to replace the data at the specified location. If a fragment's depth value fails the depth test, the incoming fragment is discarded. Construction of a custom depth stencil state descriptor object 530 itself may require creation of a stencil state object 522 which may be an immutable state object. Other graphics states may also be part of the pipeline.
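The depth test described above can be sketched as a compare function applied between the incoming fragment's depth and the stored depth, with the default compare function always passing. This is a simplified model: the compare-function names, `DepthStencilDescriptor` fields, and the `depth_test` helper are hypothetical, and stencil state is not modeled.

```python
import operator
from dataclasses import dataclass

# Hypothetical compare-function table: (fragment_depth, stored_depth) -> bool.
COMPARE_FUNCTIONS = {
    "never": lambda frag, stored: False,
    "less": operator.lt,
    "less_equal": operator.le,
    "equal": operator.eq,
    "greater": operator.gt,
    "greater_equal": operator.ge,
    "not_equal": operator.ne,
    "always": lambda frag, stored: True,
}


@dataclass
class DepthStencilDescriptor:
    depth_compare_function: str = "always"  # default: depth test always passes
    depth_write_enabled: bool = False


def depth_test(desc, fragment_depth, depth_buffer, x, y) -> bool:
    """Returns True if the fragment survives; may update the depth buffer."""
    stored = depth_buffer[y][x]
    passed = COMPARE_FUNCTIONS[desc.depth_compare_function](fragment_depth, stored)
    if not passed:
        return False  # the incoming fragment is discarded
    if desc.depth_write_enabled:
        depth_buffer[y][x] = fragment_depth
    return True


depth_buffer = [[0.5]]
default_desc = DepthStencilDescriptor()
less_desc = DepthStencilDescriptor("less", depth_write_enabled=True)
```

With the default descriptor any fragment passes; with `"less"`, only fragments closer than the stored value survive and, because writes are enabled here, replace it.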
Pipeline state 542, examples of which are discussed below, may be an object containing compiled graphics rendering state, such as rasterization (including multisampling), visibility, and blend state. Pipeline state 542 may also contain programmable states such as two graphics shader functions to be executed on the GPU. One of these shader functions may be for vertex operations and one for fragment operations. The state in the pipeline state object 542 may generally be assembled and compiled at runtime on the target system, typically when the application is installed or loaded rather than when an object is drawn. Pipeline state object 542 may be constructed using the pipeline state descriptor object 540, which may be a mutable descriptor object and a container for graphics rendering states.
In general, to construct the pipeline state object 542, a pipeline state descriptor object 540 may first be constructed and its values then set as desired. For example, a rasterization enabled property (BOOL type) may be set to NO, so that all primitives are dropped before rasterization and no fragments are processed. Disabling rasterization may be useful to obtain feedback from vertex-only transformations. Other values that may be set include vertex and fragment function properties that specify the vertex and fragment shaders, and a blend state value that specifies the blend state of a specified frame buffer attachment. If the frame buffer attachment supports multisampling, multiple samples can be created per fragment, and the following pipeline state properties can be set to determine coverage: the sampleCount property, for the number of samples for each fragment; the sampleMask property, for specifying a bitmask that is initially bitwise ANDed with the coverage mask produced by the rasterizer (by default, the sampleMask bitmask may generally be all ones, so a bitwise AND with that bitmask does not change any values); an alphaToCoverageEnabled property, to specify whether the alpha channel fragment output may be used as a coverage mask; an alphaToOneEnabled property, for setting the alpha channel fragment values to one; and a sampleCoverage property, specifying a value (between 0.0 and 1.0, inclusive) that is used to generate a coverage mask, which may then be bitwise ANDed with the coverage value produced by the rasterizer.
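The coverage arithmetic above can be sketched with integer bitmasks, one bit per sample. How a sampleCoverage fraction becomes a mask is hardware- and API-specific; this sketch assumes, purely for illustration, that it enables the `round(fraction * sampleCount)` lowest-numbered samples. Alpha-to-coverage is not modeled, and both function names are hypothetical.

```python
def coverage_from_fraction(sample_coverage: float, sample_count: int) -> int:
    """Assumed mapping: enable round(fraction * count) lowest-numbered samples."""
    if not 0.0 <= sample_coverage <= 1.0:
        raise ValueError("sampleCoverage must be between 0.0 and 1.0 inclusive")
    n = round(sample_coverage * sample_count)
    return (1 << n) - 1


def final_coverage(raster_mask: int, sample_count: int,
                   sample_mask: int = ~0, sample_coverage: float = 1.0) -> int:
    """Combine rasterizer coverage with sampleMask and sampleCoverage."""
    all_samples = (1 << sample_count) - 1   # one bit per sample
    mask = raster_mask & sample_mask        # default all-ones mask is a no-op
    mask &= coverage_from_fraction(sample_coverage, sample_count)
    return mask & all_samples
```

For example, with four samples and full rasterizer coverage, `final_coverage(0b1111, 4, sample_mask=0b0101)` keeps only samples 0 and 2.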
Pipeline state descriptor object 540 itself may be constructed using one or more objects that include function object 524, blend state 526, and pixel format 528. Function object 524 may represent a handle to a single function that runs on the GPU and may be created by compiling source code from an input value string. Function object 524 generally only relates to state values on graphics apps but not compute apps. Blend state 526 may be a mutable object containing values for blending. Blending may be a fragment operation that uses a highly configurable blend function to mix the incoming fragment's color data (source) with values in the frame buffer (destination). Blend functions may determine how the source and destination fragment values are combined with blend factors. Some of the properties that define the blend state may include 1) blending enabled property (BOOL value) for enabling blending; 2) writeMask property for specifying a bitmask that restricts which color bits are blended; 3) rgbBlendFunction and alphaBlendFunction properties for assigning blend functions for the RGB and Alpha fragment data; and 4) sourceRGBBlendFactor, sourceAlphaBlendFactor, destinationRGBBlendFactor, and destinationAlphaBlendFactor properties for assigning source and destination blend factors.
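The blend properties listed above combine as `result = function(source * sourceFactor, destination * destinationFactor)`, evaluated separately for RGB and alpha. The sketch below is a simplified model with a few common factors and functions; the snake_case names are hypothetical mirrors of the properties in the text, and the write mask is modeled per channel (masked channels keep the destination value), which is an assumption about its granularity.

```python
from dataclasses import dataclass

# Blend factors as functions of the source and destination RGBA tuples.
FACTORS = {
    "zero": lambda s, d: 0.0,
    "one": lambda s, d: 1.0,
    "source_alpha": lambda s, d: s[3],
    "one_minus_source_alpha": lambda s, d: 1.0 - s[3],
}
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "reverse_subtract": lambda a, b: b - a,
}


@dataclass
class BlendState:
    blending_enabled: bool = False
    write_mask: tuple = (True, True, True, True)  # R, G, B, A
    rgb_blend_function: str = "add"
    alpha_blend_function: str = "add"
    source_rgb_blend_factor: str = "one"
    source_alpha_blend_factor: str = "one"
    destination_rgb_blend_factor: str = "zero"
    destination_alpha_blend_factor: str = "zero"


def blend(state: BlendState, src: tuple, dst: tuple) -> tuple:
    if not state.blending_enabled:
        out = list(src)  # source simply replaces destination
    else:
        out = []
        for i in range(4):
            if i < 3:  # RGB channels
                fn = FUNCTIONS[state.rgb_blend_function]
                sf = FACTORS[state.source_rgb_blend_factor](src, dst)
                df = FACTORS[state.destination_rgb_blend_factor](src, dst)
            else:      # alpha channel
                fn = FUNCTIONS[state.alpha_blend_function]
                sf = FACTORS[state.source_alpha_blend_factor](src, dst)
                df = FACTORS[state.destination_alpha_blend_factor](src, dst)
            out.append(fn(src[i] * sf, dst[i] * df))
    # Channels excluded by the write mask keep the destination value.
    return tuple(o if w else d for o, w, d in zip(out, state.write_mask, dst))


alpha_blend = BlendState(
    blending_enabled=True,
    source_rgb_blend_factor="source_alpha",
    destination_rgb_blend_factor="one_minus_source_alpha",
)
src = (1.0, 0.0, 0.0, 0.25)   # translucent red fragment
dst = (0.0, 0.0, 1.0, 1.0)    # opaque blue already in the frame buffer
```

With these settings, `blend(alpha_blend, src, dst)` yields the familiar "over" mix of a quarter red and three quarters blue.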
In general, it may be desirable to know the pixel formats of every render target (multiple colors, depth, and stencil) as part of building the RenderPipelineState. This can allow the compiler to know how to format the output memory. Additionally, pipeline objects can be created at any time, although it may be desirable to create them early during application launch. This allows selection of a pre-created pipeline during the second level of execution described above.
Once all the required objects have been constructed, render command encoder object 544 may be constructed using those objects. Thus, in summary, to construct and initialize the render command encoder object 544, in one embodiment, one or more frame buffer attachments 532, each of which contains the state of a destination for rendering commands (e.g., color buffer, depth buffer, or stencil buffer), may first be constructed. Next, a mutable frame buffer descriptor object 534 that contains the frame buffer state, including its associated attachments, may be constructed. Then, using the frame buffer descriptor object 534, render command encoder object 544 can be constructed by calling a command buffer method (e.g., renderCommandEncoderWithFramebuffer).
At this point, a pipeline state object 542 to represent the compiled pipeline state, such as shader, rasterization (including multisampling), visibility, and blend state may be constructed by first creating the mutable descriptor object, pipeline state descriptor 540, and setting the desired graphics rendering state for the render-to-texture operation for pipeline state descriptor object 540. After pipeline state object 542 has been created, a render command encoder method (e.g., setPipelineState) may be called to associate the pipeline state object 542 to the render command encoder 544.
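The construction sequence above (attachments, then frame buffer descriptor, then encoder, then setPipelineState) can be sketched as follows. The snake_case methods are hypothetical analogues of renderCommandEncoderWithFramebuffer and setPipelineState. The pixel-format check in `set_pipeline_state` is an assumption added to show why knowing render-target formats when building the pipeline matters; it is not described as such in the text.

```python
from dataclasses import dataclass, field


@dataclass
class FramebufferAttachment:        # models attachment 532
    kind: str                       # "color", "depth", or "stencil"
    pixel_format: str


@dataclass
class FramebufferDescriptor:        # models mutable descriptor 534
    attachments: list = field(default_factory=list)


@dataclass(frozen=True)
class RenderPipelineState:          # models pipeline state object 542
    vertex_function: str
    fragment_function: str
    color_pixel_format: str


class RenderCommandEncoder:         # models encoder 544
    def __init__(self, fb_desc: FramebufferDescriptor):
        # Snapshot the frame buffer configuration: fixed for the whole pass.
        self._formats = tuple((a.kind, a.pixel_format) for a in fb_desc.attachments)
        self.pipeline = None

    def set_pipeline_state(self, pipeline: RenderPipelineState) -> None:
        # Assumed validation: the pipeline must target an attached color format.
        color_formats = [f for kind, f in self._formats if kind == "color"]
        if pipeline.color_pixel_format not in color_formats:
            raise ValueError("pipeline was compiled for a different color format")
        self.pipeline = pipeline


class CommandBuffer:
    def render_command_encoder_with_framebuffer(self, fb_desc):
        return RenderCommandEncoder(fb_desc)


fb = FramebufferDescriptor([
    FramebufferAttachment("color", "bgra8"),
    FramebufferAttachment("depth", "depth32f"),
])
pso = RenderPipelineState("vs_main", "fs_main", "bgra8")
encoder = CommandBuffer().render_command_encoder_with_framebuffer(fb)
encoder.set_pipeline_state(pso)
# Editing the descriptor afterwards does not change the encoder's pass.
fb.attachments.append(FramebufferAttachment("color", "rgba16f"))
```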
To reiterate, pipelines can be expensive to create because they invoke the compiler. Ideally, an application would create many pipelines upon launch or when loading content, and then, for each frame, create a render command encoder and, in sequence, set the pipelines, other states, and resources necessary for each object to draw. At any given time, a render command can have one “current” pipeline, which can be changed over time. The act of switching to an already-created pipeline can be expected to be inexpensive. An application can also create a pipeline at any time, including just before execution, but doing so during animation may take an unacceptably long time. Thus, once created, a pipeline can be used with many different render commands. For example, an application could create one pipeline and hold onto it indefinitely, creating a new render command for each frame and using the pipeline in each frame to draw an object.
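The usage pattern described above, creating pipelines once and then cheaply switching among them every frame, can be sketched as below. `Device`, `Encoder`, and `new_render_pipeline_state` are hypothetical names, and the tuple returned by the device stands in for compiled GPU code; the point is that the compile counter stays constant no matter how many frames are rendered.

```python
class Device:
    def __init__(self):
        self.compile_count = 0

    def new_render_pipeline_state(self, vertex_fn: str, fragment_fn: str):
        self.compile_count += 1          # the expensive step: invokes the compiler
        return ("pipeline", vertex_fn, fragment_fn)  # stand-in for compiled code


class Encoder:
    def __init__(self):
        self.current = None              # one "current" pipeline at a time
        self.log = []

    def set_pipeline_state(self, pipeline) -> None:
        self.current = pipeline          # cheap: selects an existing object

    def draw(self, obj: str) -> None:
        self.log.append((self.current, obj))


device = Device()
# At launch or content load: build every pipeline the app will need.
pipelines = {
    "opaque": device.new_render_pipeline_state("vs_main", "fs_opaque"),
    "transparent": device.new_render_pipeline_state("vs_main", "fs_blend"),
}


def render_frame(scene):
    encoder = Encoder()                  # a new encoder for each frame
    for obj, kind in scene:
        encoder.set_pipeline_state(pipelines[kind])
        encoder.draw(obj)
    return encoder.log


for _ in range(60):                      # sixty frames, zero new compiles
    log = render_frame([("terrain", "opaque"), ("glass", "transparent")])
```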
An exemplary pipeline state object 542 is illustrated in
Because of the fixed relationships between the objects and the linkage between these relationships and the GPU code generation and compilation process, it is advantageous to combine them into a single pipeline state object. This allows the requisite GPU code to be generated and compiled only one time, either when the application is installed on the target system or when the application is loaded. The GPU machine code so-generated is then stored, allowing the code to be retrieved, passed to the GPU, and executed whenever a particular object is drawn. This avoids the necessity of generating and compiling the associated code on the fly each time a particular object is to be drawn.
In other words, pipeline state objects are immutable objects that represent compiled machine code. By use of different pipeline state objects for various graphical operations, when a draw call is made for an object associated with a particular pipeline state object, it is not necessary to look at the state of the API to determine, for example, whether a shader must be compiled because all states associated with the application program have already had corresponding GPU code generated, compiled, and stored so as to be ready for execution with whatever parameters are supplied on the fly during application run time.
In the prior art, determinations of whether code recompilation was necessary were typically made by generating a hash for each compiled and cached GPU code segment. When a state was changed that might require new code generation, a corresponding hash of the state could be generated and checked against the available cached compiled states to determine whether the required executable code was already available. While this saved some time for instances in which the requisite code had already been compiled, even the checking process could be unduly time consuming in some applications. Thus, the use of pipeline state objects as described above can save significant time during program execution because the generation and compilation of all required GPU code can be front loaded to application installation or initiation, rather than taking place during the rendering operations.
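The difference between the two approaches can be sketched by counting work per draw. Both driver classes below are hypothetical models: the prior-art style hashes the current mutable state on every draw to find cached code, while the pipeline style compiles up front and each draw simply uses the object it is handed.

```python
class HashCheckingDriver:
    """Prior-art style: look up cached code from the current state on every draw."""

    def __init__(self):
        self.state = {}
        self.cache = {}
        self.hash_checks = 0
        self.compiles = 0

    def draw(self) -> str:
        key = frozenset(self.state.items())  # hashed by the dict lookup below
        self.hash_checks += 1
        if key not in self.cache:
            self.compiles += 1
            self.cache[key] = f"code{self.compiles}"
        return self.cache[key]


class PipelineDriver:
    """Pipeline style: compile once, up front; draws do no lookup."""

    def __init__(self):
        self.compiles = 0

    def make_pipeline(self, **state) -> str:
        self.compiles += 1
        return f"code{self.compiles}"

    def draw(self, pipeline: str) -> str:
        return pipeline


old = HashCheckingDriver()
for i in range(100):
    old.state["blend"] = (i % 2 == 0)
    old.draw()

new = PipelineDriver()
blended = new.make_pipeline(blend=True)
opaque = new.make_pipeline(blend=False)
results = [new.draw(blended if i % 2 == 0 else opaque) for i in range(100)]
```

Both end up compiling two variants, but the hash-checking driver performs a state lookup on all 100 draws, which is the per-draw overhead the pipeline approach removes.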
While the parameters that affect code generation are incorporated into the immutable pipeline state object, there are still a variety of other parameters that do not affect code generation or require recompilation of the GPU code. These data and parameters, e.g., size of viewport, input textures, input vertex data, occlusion queries, etc., are available to be modified through API calls. In short, this can be understood as follows: changing the data that is manipulated is easy and computationally inexpensive, and is thus encapsulated in mutable objects and modifiable by API calls. Changing the way the data is manipulated is harder and computationally intensive, and is thus encapsulated in immutable objects that are constructed and compiled once and not at draw time. Thus, the goal for pipeline state objects is to encapsulate everything that requires code generation. What requires code generation/compilation is different from GPU to GPU, so it is desirable to create a union of all such functions so that a general API can be used with a variety of GPU hardware. Exemplary pipeline state objects can encapsulate the following: vertex fetch configuration, vertex shader, fragment shader, blend state, color formats attached to frame buffer, multi-sample mask, depth write enabled state, and rasterization enabled state.
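The split between "how the data is manipulated" and "what data is manipulated" can be expressed directly in types. In this sketch the frozen dataclass lists the exemplary pipeline contents named above, and a mutable container holds the draw-time parameters; all class and field names are hypothetical. Changing a pipeline field requires building a new object (which in a real system would mean generating new GPU code), while draw parameters are changed by plain assignment.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class PipelineStateObject:
    """How data is manipulated: changes here imply new GPU code."""
    vertex_fetch_configuration: tuple
    vertex_shader: str
    fragment_shader: str
    blend_state: str
    color_formats: tuple
    multisample_mask: int
    depth_write_enabled: bool
    rasterization_enabled: bool


@dataclass
class DrawParameters:
    """What data is manipulated: cheap to change through API calls."""
    viewport: tuple = (0, 0, 0, 0)
    input_textures: list = field(default_factory=list)
    vertex_data: list = field(default_factory=list)
    occlusion_query: object = None


pso = PipelineStateObject(
    vertex_fetch_configuration=(("position", "float3"),),
    vertex_shader="vs_main",
    fragment_shader="fs_main",
    blend_state="alpha",
    color_formats=("bgra8",),
    multisample_mask=0xF,
    depth_write_enabled=True,
    rasterization_enabled=True,
)
params = DrawParameters()
params.viewport = (0, 0, 800, 600)         # mutable: no recompilation implied
no_blend = replace(pso, blend_state="none")  # a new, separately compiled object
```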
Program instructions and/or a database that represent and embody the described techniques and mechanisms may be stored on a machine-readable storage medium. The program instructions may include machine-readable instructions that when executed by the machine, cause the machine to perform the actions of the techniques described herein.
A machine-readable storage medium may include any storage media accessible by a computer during use to provide instructions and/or data to the computer, and may include multiple instances of a physical medium as if they were a single physical medium. For example, a machine-readable storage medium may include storage media such as magnetic or optical media, e.g., disk (fixed or removable), tape, CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, or Blu-Ray. Storage media may further include volatile or non-volatile memory media such as RAM, ROM, non-volatile memory (e.g., flash memory) accessible via a peripheral interface such as the USB interface, etc. Storage media may include micro-electro-mechanical systems (MEMS), as well as storage media accessible via a communication medium such as a network and/or a wireless link.
It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention therefore should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Number | Date | Country
---|---|---
62005131 | May 2014 | US