Embodiments of the present disclosure relate generally to the field of graphics processing and more specifically to the field of improved shader binary caching and execution for efficient graphics processing.
High-level graphics APIs (e.g., OpenGL and DirectX) allow applications to specify the execution of particular shaders. Shaders are instruction sets that define how particular pieces of geometry or fragments are processed by a graphics processor. These shader instruction sets can be long and detailed, and they often execute millions of times. To enable execution of these shaders on graphics processors, a compiler is employed. An exemplary compiler takes an instruction set written in a programming language (e.g., the C programming language or a similar C-like language) and compiles it into shader binary code, or microcode, that can be executed on the graphics processor.
One or more of these shader instruction sets may be compiled and linked together to form an entire execution pipeline, or program object. The process by which an application selects shaders, compiles the selected shader sources into binaries, and links them into a program object may require an unbounded amount of time. In particular, an exemplary compiler may spend an extensive amount of time on optimization while compiling and linking intermediate results and/or the final output.
Shader compilation on mobile hardware can consume a significant portion of the frame render time. For example, for an application such as a video game running at 60 Hz, the frame render time is roughly 16 ms (1000 ms / 60 ≈ 16.7 ms). If the application compiles a handful of shaders, each taking on average 3-5 ms to compile on current mobile hardware, the total compilation time can easily exceed the frame time and cause visible stuttering on the screen.
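The frame-budget arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes shaders are compiled synchronously on the render thread, one after another, and it ignores the time needed to render the frame itself; the function names are hypothetical.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def compile_stall_ms(shader_count: int, ms_per_shader: float) -> float:
    """Total stall if each shader is compiled synchronously, one after another."""
    return shader_count * ms_per_shader


def exceeds_frame(shader_count: int, ms_per_shader: float,
                  refresh_hz: float = 60.0) -> bool:
    """True if compilation alone overruns one frame interval (a visible hitch)."""
    return compile_stall_ms(shader_count, ms_per_shader) > frame_budget_ms(refresh_hz)


# At 60 Hz the budget is about 16.7 ms per frame.
budget = frame_budget_ms(60.0)
# Five shaders at 4 ms each (the middle of the 3-5 ms range) stall for 20 ms.
stall = compile_stall_ms(5, 4.0)
hitch = exceeds_frame(5, 4.0)
```

Because real frames also spend time rendering, this comparison understates the problem: even a stall somewhat below 16.7 ms can push a frame past its deadline.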
An application that compiles shaders during runtime risks frame hitches, that is, intervals when rendering stops or slows while shaders are compiled and programs are linked. Such visible stuttering or frame-rate hitches are undesirable. Applications may attempt to avoid this by compiling at points where the results of the compile or link are not required for immediate execution. Despite such timing efforts, states or contexts may change in 3D graphics, requiring one or more shaders to be recompiled during runtime. So even if an application attempts to compile and link all required shaders ahead of time (such as between levels of a video game), the application may still need to recompile shaders during runtime.
It would also be difficult for an application vendor to supply a shader binary library containing every binary that might be needed to avoid compiling. Such a library might contain hundreds of thousands of shader binaries to cover every possible configuration and shader combination. Even then, the stored binaries would become out of date whenever unexpected changes occur: shader binaries must be recompiled whenever the graphics hardware, the application, or the graphics drivers change. In other words, a large prebuilt binary library does not provide the flexibility needed to keep the executable shader binaries current with changes to hardware and software.
Embodiments of the present invention provide solutions to the challenges inherent in efficiently compiling shaders during runtime. According to one embodiment of the present invention, a method for compiling a shader for execution by a graphics processor is disclosed. The method comprises selecting a shader for execution. A key is computed for the selected shader. A memory is searched for a copy of the computed key. If the copy of the computed key is located in the memory, a shader binary stored in the memory is passed to the graphics processor for execution. Otherwise, the shader is compiled to produce the shader binary for execution by the graphics processor, and the shader binary is stored in the memory. The shader binary is associated with the computed key and the copy of the computed key.
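The method above amounts to a lookup-or-compile routine. The following Python sketch is illustrative only: it models the memory as an in-process dictionary, uses a 64-bit BLAKE2b digest as a stand-in key function, and replaces the real shader compiler with a hypothetical `fake_compile` stub so the control flow can be run.

```python
import hashlib
from typing import Callable, Dict

ShaderCache = Dict[int, bytes]  # key -> shader binary


def compute_key(source: str, compiler_args: str) -> int:
    """Stand-in key: 64-bit BLAKE2b digest of the source and compiler arguments."""
    h = hashlib.blake2b(digest_size=8)
    h.update(source.encode("utf-8"))
    h.update(b"\0")  # NUL separator keeps ("ab", "c") distinct from ("a", "bc")
    h.update(compiler_args.encode("utf-8"))
    return int.from_bytes(h.digest(), "little")


def get_shader_binary(source: str, compiler_args: str, cache: ShaderCache,
                      compile_fn: Callable[[str, str], bytes]) -> bytes:
    key = compute_key(source, compiler_args)
    binary = cache.get(key)
    if binary is not None:
        return binary  # hit: pass the stored binary to the graphics processor
    binary = compile_fn(source, compiler_args)  # miss: compile ...
    cache[key] = binary  # ... and store the binary under its key
    return binary


# Usage with a hypothetical compiler stub that records each invocation.
compile_calls = []


def fake_compile(source: str, compiler_args: str) -> bytes:
    compile_calls.append((source, compiler_args))
    return f"BIN[{compiler_args}]:{source}".encode("utf-8")


cache: ShaderCache = {}
src = "void main() { gl_FragColor = vec4(1.0); }"
first = get_shader_binary(src, "-O2", cache, fake_compile)   # compiles
second = get_shader_binary(src, "-O2", cache, fake_compile)  # cache hit
```

Only the first request pays the compilation cost; a second request with different compiler arguments produces a different key and therefore a separate cache entry.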
In a computer system according to one embodiment of the present invention, the computer system comprises a processor, a graphics processor, and a memory. The memory is operable to store instructions that, when executed, perform a method for compiling a shader for execution by the graphics processor. The method comprises selecting a shader for execution. A key is computed for the selected shader. A memory is searched for a copy of the computed key. If the copy of the computed key is located in the memory, a shader binary stored in the memory is passed to the graphics processor for execution. Otherwise, the shader is compiled to produce the shader binary for execution by the graphics processor, and the shader binary is stored in the memory. The shader binary is associated with the computed key and the copy of the computed key.
In a computer system according to one embodiment of the present invention, the computer system comprises a compiler, a memory, a graphics driver module, and a graphics processor. The compiler is operable to compile and link shader source code to create a shader binary. The memory is operable to store a plurality of shader binaries. Each shader binary is paired with an associated key. The graphics driver module is operable to select one or more shaders for execution by the graphics processor, to compute a key for a selected shader, and to search the memory for a copy of the computed key. If the copy of the computed key is located in the memory, a shader binary is passed from the memory for execution by the graphics processor. Otherwise, the compiler is operable to compile and link the shader to create a shader binary for execution by the graphics processor, and the shader binary is stored in the memory. The shader binary is associated with the computed key and the copy of the computed key.
The present invention will be better understood from the following detailed description, taken in conjunction with the accompanying drawing figures in which like reference characters designate like elements and in which:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention. The drawings showing embodiments of the invention are semi-diagrammatic and not to scale and, particularly, some of the dimensions are for the clarity of presentation and are shown exaggerated in the drawing Figures. Similarly, although the views in the drawings for the ease of description generally show similar orientations, this depiction in the Figures is arbitrary for the most part. Generally, the invention can be operated in any orientation.
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer readable media into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices. When a component appears in several embodiments, the use of the same reference numeral signifies that the component is the same component as illustrated in the original embodiment.
The present invention provides a solution to the increasing challenges inherent in compiling and linking shader source code to produce shader binaries at runtime. Various embodiments of the present disclosure provide an exemplary persistent cache memory that stores shader binaries and associated keys. As discussed in detail below, when a graphics driver selects a shader instruction set for execution by a graphics processor, a cache is searched for a key associated with a shader binary for the selected shader instruction set. If the key is found in the cache, the corresponding shader binary is sent to the graphics processor for execution; otherwise, the shader instruction set is sent to a compiler, which compiles and links it to create a shader binary for execution by the graphics processor and for storage in the cache.
As discussed herein, some mobile applications, such as a WebGL-compatible browser, may compile a significant number of shaders during a single runtime of the application. In one embodiment, each shader binary for a mobile computing system may be 1-2 KB in size, so over time the persistent cache 108 may fill up. One solution is to track the usage of shader binaries and evict old entries according to a least recently used (LRU) cache replacement policy. Another solution is a coarse ring buffer scheme using a pair of persistent files. For example, when one of the persistent files is full, the other persistent file may be truncated and new entries appended to it.
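Both eviction strategies can be sketched briefly. In this hypothetical Python sketch, `LRUShaderCache` evicts least recently used entries once a byte budget is exceeded, and `TwoFileRingCache` models the two persistent files as in-memory dictionaries; a real driver would append to and truncate files on disk, and the class names and size limits are assumptions.

```python
from collections import OrderedDict
from typing import Optional


class LRUShaderCache:
    """Evicts least recently used binaries once total size exceeds max_bytes."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.entries: "OrderedDict[int, bytes]" = OrderedDict()  # oldest first

    def get(self, key: int) -> Optional[bytes]:
        binary = self.entries.get(key)
        if binary is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return binary

    def put(self, key: int, binary: bytes) -> None:
        if key in self.entries:
            self.used_bytes -= len(self.entries.pop(key))
        self.entries[key] = binary
        self.used_bytes += len(binary)
        # Evict from the old end, but always keep the entry just inserted.
        while self.used_bytes > self.max_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)


class TwoFileRingCache:
    """Coarse ring buffer over two append-only 'files' (modelled as dicts)."""

    def __init__(self, max_entries_per_file: int):
        self.limit = max_entries_per_file
        self.files = [{}, {}]
        self.active = 0  # index of the file currently being appended to

    def get(self, key: int) -> Optional[bytes]:
        for f in self.files:
            if key in f:
                return f[key]
        return None

    def put(self, key: int, binary: bytes) -> None:
        if len(self.files[self.active]) >= self.limit:
            self.active ^= 1  # active file is full: switch to the other one
            self.files[self.active].clear()  # truncate it before appending
        self.files[self.active][key] = binary


# LRU usage: a 4 KB budget holds two 1.5 KB binaries.
lru = LRUShaderCache(max_bytes=4096)
lru.put(1, bytes(1536))
lru.put(2, bytes(1536))
lru.get(1)               # touch key 1, so key 2 is now least recently used
lru.put(3, bytes(1536))  # exceeds 4096 bytes: key 2 is evicted

# Ring usage: two files of two entries each.
ring = TwoFileRingCache(max_entries_per_file=2)
for k in (10, 11, 12, 13, 14):  # the fifth put truncates the file holding 10, 11
    ring.put(k, bytes([k]))
```

The LRU variant needs per-entry usage tracking, while the ring variant evicts a whole file's worth of entries at once; that coarseness is the trade-off for only ever appending to or truncating files.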
In one embodiment, shader binaries stored in a persistent cache may be compressed (e.g., with run-length encoding (RLE)). In one embodiment, a shader binary is compressed immediately before it is placed into the cache and decompressed immediately after it is retrieved from the cache.
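The text names RLE only as an example. A minimal byte-oriented run-length encoder, written here as an assumption about what such a scheme might look like, stores each run as a (count, value) byte pair with counts capped at 255. RLE pays off only when a binary contains long runs, such as zero padding; on data without runs, this particular format doubles the size.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) byte pairs; each run is at most 255 bytes long."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode: expand each (count, value) pair."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * count
    return bytes(out)


# Compress immediately before placement, decompress immediately after retrieval.
cache = {}
binary = b"\x01\x02" + bytes(300) + b"\x03"  # hypothetical binary with zero padding
cache["shader-key"] = rle_encode(binary)
restored = rle_decode(cache["shader-key"])
```

Here the 303-byte input shrinks to 10 bytes: one pair each for the leading and trailing bytes, and two pairs (255 + 45) for the 300 zero bytes.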
As illustrated in
In one embodiment, a cache stores shader binaries and associates each shader binary with a key. In one embodiment, the key size and the hash function used to produce the key may be chosen such that the probability of collisions is kept extremely low. In one embodiment, a 64-bit key may suffice, because the number of distinct shaders a typical application compiles for execution is at most in the tens of thousands. In one embodiment, an exemplary key is computed by applying a hash function to the shader source string and to the shader arguments that are also passed to the shader compiler. These arguments are computed internally by the graphics driver from the graphics driver's current state. The same shader instruction set may therefore be compiled with different compiler arguments, resulting in multiple key/value pairs being added to the cache as graphics driver states change.
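The key computation and the 64-bit sizing argument can be illustrated together. In this sketch, which is an assumption rather than the driver's actual scheme, the compiler arguments derived from driver state are modelled as a dictionary and serialized in sorted order so that the same state always yields the same key; BLAKE2b truncated to 8 bytes stands in for the unspecified hash function. The second function applies the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n keys of b bits.

```python
import hashlib
import math
from typing import Dict


def compute_key(source: str, compiler_args: Dict[str, str]) -> int:
    """64-bit key over the shader source plus driver-derived compiler arguments."""
    h = hashlib.blake2b(digest_size=8)
    h.update(source.encode("utf-8"))
    for name in sorted(compiler_args):  # sorted: key must not depend on dict order
        h.update(b"\0" + name.encode("utf-8") + b"="
                 + compiler_args[name].encode("utf-8"))
    return int.from_bytes(h.digest(), "little")


def collision_probability(n_keys: int, key_bits: int = 64) -> float:
    """Birthday-bound probability that any two of n_keys random keys collide."""
    exponent = n_keys * (n_keys - 1) / 2.0 ** (key_bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x


src = "void main() { gl_FragColor = vec4(1.0); }"
# Hypothetical argument names: the same source under two driver states.
key_a = compute_key(src, {"fog": "off", "alpha_test": "off"})
key_b = compute_key(src, {"fog": "linear", "alpha_test": "off"})
# Tens of thousands of shaders in a 64-bit key space.
p = collision_probability(50_000)
```

For 50,000 keys the estimate is on the order of 10^-10, which illustrates why a 64-bit key may suffice at the scale the text describes.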
In one embodiment, as discussed herein, the cache 210 may also contain a global key that is computed at runtime from the current graphics driver version, the current compiler version, and other hardware-related states. The global key may be computed at graphics driver startup and compared to a global key previously stored in the cache 210. If the previously stored global key and the new global key do not match, the cache 210 is out of date and all stored shader binaries are invalidated. Once the stored shader binaries are invalidated, a shader selected for execution must be compiled even if a shader binary for it is stored in the cache 210, because that stored binary is no longer valid. Such global keys may be used to ensure that the application uses only up-to-date shader binaries.
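Global-key validation at driver startup can be sketched as follows. The version strings, the hardware-state description, and the `PersistentShaderCache` class are hypothetical; the sketch shows only that any change to an input of the global key empties the cache and thereby forces recompilation.

```python
import hashlib
from typing import Dict, Optional


def compute_global_key(driver_version: str, compiler_version: str,
                       hardware_state: str) -> str:
    """Digest of everything that, if changed, makes cached binaries unusable."""
    h = hashlib.blake2b(digest_size=8)
    for part in (driver_version, compiler_version, hardware_state):
        h.update(part.encode("utf-8") + b"\0")
    return h.hexdigest()


class PersistentShaderCache:
    def __init__(self) -> None:
        self.global_key: Optional[str] = None  # as loaded from persistent storage
        self.binaries: Dict[int, bytes] = {}

    def validate(self, current_global_key: str) -> bool:
        """Run at driver startup. Returns False and drops all binaries on mismatch."""
        if self.global_key != current_global_key:
            self.binaries.clear()  # every stored binary is now invalid
            self.global_key = current_global_key
            return False
        return True


cache = PersistentShaderCache()
old_key = compute_global_key("driver-1.0", "compiler-1.0", "gpu-rev-a")
cache.validate(old_key)          # first run: nothing cached yet
cache.binaries[42] = b"binary"   # populated during the session
still_valid = cache.validate(old_key)
new_key = compute_global_key("driver-1.1", "compiler-1.0", "gpu-rev-a")
after_update = cache.validate(new_key)  # driver update invalidates the cache
```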
As illustrated in
In one embodiment, a checksum may be used to ensure that a shader binary stored in a persistent cache 210 is uncorrupted. As noted herein, comparing a computed key to a key stored in the cache 210 may be used to ensure that the previously stored shader binary associated with the stored key is still valid (that is, that there have been no software or hardware changes), while a checksum is used to ensure that the stored shader binary has not been corrupted, for example by copy errors. In other words, the key ensures that the selected shader binary is the correct, up-to-date one, while the checksum ensures that the cached shader binary contains no errors.
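The division of labor between key and checksum can be shown with CRC-32 from Python's standard `zlib` module. The text does not specify which checksum is used; CRC-32 is an assumption here, chosen because it detects every single-bit error. In this sketch a corrupted entry is treated like a miss, so the caller falls back to compiling.

```python
import zlib
from typing import Dict, Optional, Tuple

CacheEntry = Tuple[bytes, int]  # (shader binary, CRC-32 of the binary)


def store(cache: Dict[int, CacheEntry], key: int, binary: bytes) -> None:
    cache[key] = (binary, zlib.crc32(binary))


def load(cache: Dict[int, CacheEntry], key: int) -> Optional[bytes]:
    entry = cache.get(key)
    if entry is None:
        return None  # key not found: no binary for this shader and state
    binary, checksum = entry
    if zlib.crc32(binary) != checksum:
        del cache[key]  # corrupted (e.g. a copy error): discard, caller recompiles
        return None
    return binary


cache: Dict[int, CacheEntry] = {}
original = b"SHADER-MICROCODE"
store(cache, 7, original)
good = load(cache, 7)

# Simulate a copy error: flip one bit but keep the old checksum.
binary, checksum = cache[7]
cache[7] = (bytes([binary[0] ^ 0x01]) + binary[1:], checksum)
corrupted = load(cache, 7)
```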
In one embodiment, GLSL shaders used in applications are first compiled into an intermediate form in ARB assembly code, which is in turn compiled into executable shader microcode (a shader binary). In one embodiment, both the ARB assembly and the shader microcode or binary are stored in the cache 308. In one exemplary environment, an OpenGL graphics rendering API supports user-supplied ARB assembly programs and fixed-function shading, and these are also cached in the cache 308 for later retrieval.
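Caching both compilation stages can be sketched as below. `glsl_to_arb` and `arb_to_microcode` are hypothetical stand-ins for the two compiler stages, and the entry layout (ARB text plus microcode) is an assumption. User-supplied ARB programs enter at the second stage, so in this sketch they are keyed separately from GLSL sources.

```python
import hashlib
from typing import Callable, Dict, Tuple


def _key(kind: str, text: str) -> str:
    # Prefix with the input kind so a GLSL source and an ARB program never share a key.
    data = kind.encode("utf-8") + b"\0" + text.encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:16]


class TwoStageCache:
    def __init__(self, glsl_to_arb: Callable[[str], str],
                 arb_to_microcode: Callable[[str], bytes]) -> None:
        self.glsl_to_arb = glsl_to_arb
        self.arb_to_microcode = arb_to_microcode
        self.entries: Dict[str, Tuple[str, bytes]] = {}  # key -> (ARB, microcode)

    def from_glsl(self, glsl: str) -> bytes:
        key = _key("glsl", glsl)
        if key not in self.entries:
            arb = self.glsl_to_arb(glsl)  # stage 1: GLSL -> ARB assembly
            self.entries[key] = (arb, self.arb_to_microcode(arb))  # stage 2
        return self.entries[key][1]

    def from_arb(self, arb: str) -> bytes:
        key = _key("arb", arb)
        if key not in self.entries:  # user-supplied ARB skips stage 1
            self.entries[key] = (arb, self.arb_to_microcode(arb))
        return self.entries[key][1]


calls = {"glsl_to_arb": 0, "arb_to_microcode": 0}


def fake_glsl_to_arb(glsl: str) -> str:
    calls["glsl_to_arb"] += 1
    return "!!ARBfp1.0\n# compiled from GLSL\nEND"


def fake_arb_to_microcode(arb: str) -> bytes:
    calls["arb_to_microcode"] += 1
    return arb.encode("utf-8")[::-1]


cache = TwoStageCache(fake_glsl_to_arb, fake_arb_to_microcode)
m1 = cache.from_glsl("void main() {}")
m2 = cache.from_glsl("void main() {}")  # served from cache
user_arb = "!!ARBvp1.0\nEND"
u1 = cache.from_arb(user_arb)
u2 = cache.from_arb(user_arb)           # served from cache
```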
In one embodiment, the desired shader binary (e.g., an ARB assembly and shader binary used to produce the desired executable) retrieved during a current runtime session was stored in the cache 308 during a previous runtime session. As illustrated in
In one embodiment, as illustrated in
In step 404 of
In step 408 of
In step 410 of
In step 416 of
Although certain preferred embodiments and methods have been disclosed herein, it will be apparent from the foregoing disclosure to those skilled in the art that variations and modifications of such embodiments and methods may be made without departing from the spirit and scope of the invention. It is intended that the invention shall be limited only to the extent required by the appended claims and the rules and principles of applicable law.
This application claims the benefit of Provisional Application No. 61/585,620, filed on Jan. 11, 2012, titled “GRAPHICS PROCESSOR CLOCK SCALING, APPLICATION LOAD TIME IMPROVEMENTS, AND DYNAMICALLY ADJUSTING RESOLUTION OF RENDER BUFFER TO IMPROVE AND STABILIZE FRAME TIMES OF A GRAPHICS PROCESSOR,” by Swaminathan Narayanan, et al., which is herein incorporated by reference.
Number | Date | Country
---|---|---
61585620 | Jan 2012 | US