The present invention relates to graphics processing, and more particularly, to a graphics processing system for performing deferred vertex attribute shading based on split vertex bitstreams and a related graphics processing method.
As known in the art, graphics processing is typically carried out in a pipelined fashion, with multiple pipeline stages operating on the data to generate the final rendering output (e.g., a frame that is displayed). Many graphics processing pipelines now include one or more programmable processing stages, commonly referred to as “shaders”, which execute programs to perform graphics processing operations to generate the desired graphics data. For example, the graphics processing pipeline may include a vertex shader and a pixel (fragment) shader. These shaders are programmable processing stages that may execute shader programs on input data values to generate a desired set of output data values for being further processed by the rest of the graphics pipeline stages. The shaders of the graphics processing pipeline may share programmable processing circuitry, or may be distinct programmable processing units.
For example, the vertex shading operation may include a vertex position shading operation and a vertex attribute shading operation for vertices of primitives in each frame. With regard to a bin-based rendering scheme, there are two choices for shading vertex attributes. One conventional design is to perform the vertex attribute shading at the binning process (i.e., vertex phase (VP) pass) and store the vertex attribute shading results of vertices of all primitives in the frame into a bin memory. Since the vertex attribute shading is performed only once for each vertex, the shading burden may be reduced. However, since the bin memory is needed to store vertex attribute shading results of many vertices, the memory traffic and the memory space requirement are large. In addition, a performance drop may occur in some cases.
The other conventional design is to perform the vertex attribute shading at the rendering process (i.e., pixel phase (PP) pass) after the binning process is done and store the vertex attribute shading results of vertices in an on-chip cache. Since the vertex attribute shading results are stored in the on-chip cache only, the memory traffic and memory space requirement of the bin memory can be reduced. However, since the vertex attribute is shaded on-the-fly when being used, a performance drop may occur due to excessive repeated attribute shading of vertices that may have been shaded before, or due to inefficient SIMD (single instruction multiple data) execution resulting from an insufficient bin vertex count.
Thus, there is a need for an innovative vertex attribute shading design which is capable of avoiding excessive vertex attribute shading results being written to and read from a bin memory without too much loss of the shading performance.
One of the objectives of the claimed invention is to provide a graphics processing system for performing deferred vertex attribute shading based on split vertex bitstreams and a related graphics processing method.
According to a first aspect of the present invention, an exemplary graphics processing system is disclosed. The exemplary graphics processing system includes a first storage device, a second storage device, a vertex position shader, a vertex classification module, and a vertex attribute shader. The vertex position shader is arranged to perform vertex position shading for vertices of primitives in a frame at a binning process. The vertex classification module is arranged to classify the vertices of the primitives in the frame into first-type vertices and second-type vertices according to vertex distribution. The vertex attribute shader is arranged to perform deferred vertex attribute shading for the first-type vertices and the second-type vertices at a rendering process following the binning process, wherein vertex attribute shading results of at least a portion of the first-type vertices classified by the vertex classification module are stored in the second storage device, and vertex attribute shading results of at least a portion of the second-type vertices classified by the vertex classification module are stored in the first storage device.
According to a second aspect of the present invention, an exemplary graphics processing method is disclosed. The exemplary graphics processing method includes: performing vertex position shading for vertices of primitives in a frame at a binning process; classifying the vertices of the primitives in the frame into first-type vertices and second-type vertices according to vertex distribution; and performing deferred vertex attribute shading for the first-type vertices and the second-type vertices at a rendering process following the binning process, wherein vertex attribute shading results of at least a portion of the first-type vertices classified by the classifying step are stored in a second storage device but not a first storage device, and vertex attribute shading results of at least a portion of the second-type vertices classified by the classifying step are stored in the first storage device.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Certain terms are used throughout the following description and claims, which refer to particular components. As one skilled in the art will appreciate, electronic equipment manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is coupled to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
The command and data of a frame are fed into the vertex position shader 112 to undergo vertex position shading. As shown in
Since the vertex attribute shading is deferred to a post-binning pass (i.e., PP pass), the binning process can be done fast and avoid storing varying data (i.e., vertex attribute shading results) of vertices into the first storage device 106. In addition to the vertex shader and the pixel shader, pre-depth (Pre-Z) processing is a feature supported by many GPUs. The Pre-Z processing stage is placed before a pixel shading stage in the pipeline. For example, if the Pre-Z processing stage decides that a primitive is behind a geometry (i.e., the primitive in a screen space is invisible), the primitive can be discarded such that following processing of the primitive can be omitted to save the system resource. Hence, the graphics processing system 100 may be configured to apply hidden surface removal processing (i.e., Pre-Z operation) per bin/tile before the vertex attribute shading begins at the PP pass.
With regard to the vertex classification module 102, it is arranged to classify vertices of primitives in a frame into first-type vertices (e.g., in-tile vertices or local vertices) and second-type vertices (e.g., out-tile vertices or global vertices) according to vertex distribution indicated by the vertex position information generated from the vertex position shader 112, and then store the classification result (i.e., information associated with in-tile vertices and out-tile vertices) into a table (or a list) in the first storage device 106. In this embodiment, the vertex classification module 102 is arranged to use a tile size TS for dividing screen display space (i.e., one frame) into a plurality of tiles each having at least one bin, where each of the first-type vertices (in-tile vertices) classified by the vertex classification module 102 is used only by primitive(s) within a single tile of the tiles, and each of the second-type vertices (out-tile vertices) classified by the vertex classification module 102 is used by primitive(s) across multiple tiles of the tiles. That is, when a primitive covers more than one tile, its associated vertices are classified as second-type vertices (out-tile vertices). Hence, multiple tiles may share vertex attribute shading results (varying data) of the second-type vertices (out-tile vertices) because the multiple tiles share the same primitive.
Concerning the vertex V0, it is used/referenced by a single primitive P0 only. Since the primitive P0 is inside a single tile Tile_0 only, the vertex V0 is classified as a first-type vertex (in-tile vertex). Concerning the vertex V1, it is used/referenced by multiple primitives P0, P1, and P2. Since at least one of the primitives P0, P1, and P2 (i.e., each of primitives P1 and P2) is across multiple tiles Tile_0 and Tile_1, the vertex V1 is classified as a second-type vertex (out-tile vertex). Concerning the vertex V2, it is used/referenced by multiple primitives P0, P2, and P3. Since at least one of the primitives P0, P2, and P3 (i.e., each of primitives P2 and P3) is across multiple tiles Tile_0 and Tile_1, the vertex V2 is classified as a second-type vertex (out-tile vertex). Concerning the vertex V3, it is used/referenced by a single primitive P1 only. Since the primitive P1 is across multiple tiles Tile_0 and Tile_1, the vertex V3 is classified as a second-type vertex (out-tile vertex). Concerning the vertex V4, it is used/referenced by multiple primitives P1, P2, P3, and P4. Since at least one of the primitives P1, P2, P3, and P4 (i.e., each of primitives P1, P2 and P3) is across multiple tiles Tile_0 and Tile_1, the vertex V4 is classified as a second-type vertex (out-tile vertex). Concerning the vertex V5, it is used/referenced by multiple primitives P3 and P4. Since at least one of the primitives P3 and P4 (i.e., primitive P3) is across multiple tiles Tile_0 and Tile_1, the vertex V5 is classified as a second-type vertex (out-tile vertex). Concerning the vertex V6, it is used/referenced by a single primitive P4 only. Since the primitive P4 is inside a single tile Tile_1 only, the vertex V6 is classified as a first-type vertex (in-tile vertex).
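By way of illustration only, the classification rule applied to the vertices V0 through V6 above may be sketched as follows. The function name, the dictionary layout, and the use of integer tile indices are assumptions made for this sketch and are not part of the disclosed hardware; the vertex lists per primitive are inferred from the usage relationships stated above. A vertex is treated as in-tile only when every primitive referencing it lies within one and the same tile.

```python
from typing import Dict, List, Set


def classify_vertices(prim_vertices: Dict[str, List[str]],
                      prim_tiles: Dict[str, Set[int]]) -> Dict[str, bool]:
    """Return {vertex: is_in_tile} (hypothetical helper).

    For each vertex, collect every tile touched by any primitive that
    references it. The vertex is in-tile only if that set holds one tile.
    """
    tiles_per_vertex: Dict[str, Set[int]] = {}
    for prim, verts in prim_vertices.items():
        for v in verts:
            tiles_per_vertex.setdefault(v, set()).update(prim_tiles[prim])
    return {v: len(tiles) == 1 for v, tiles in tiles_per_vertex.items()}


# Primitives and the vertices they reference, as described in the text.
prim_vertices = {
    "P0": ["V0", "V1", "V2"],
    "P1": ["V1", "V3", "V4"],
    "P2": ["V1", "V2", "V4"],
    "P3": ["V2", "V4", "V5"],
    "P4": ["V4", "V5", "V6"],
}
# Tiles covered by each primitive: P0 lies in tile 0, P4 in tile 1,
# and P1, P2, P3 each span both tiles.
prim_tiles = {"P0": {0}, "P1": {0, 1}, "P2": {0, 1}, "P3": {0, 1}, "P4": {1}}

result = classify_vertices(prim_vertices, prim_tiles)
in_tile = sorted(v for v, flag in result.items() if flag)
out_tile = sorted(v for v, flag in result.items() if not flag)
```

In this sketch only V0 and V6 end up in the in-tile list, which matches the classification given above.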
The first-type vertices (called in-tile vertices hereinafter) and the second-type vertices (called out-tile vertices hereinafter) are split into two streams of vertices that will be shaded for attributes after the binning process. The proposed deferred vertex attribute shading design employs a split-stream deferred vertex attribute shading scheme, and treats the in-tile vertices and the out-tile vertices differently. For example, vertex attribute shading results of at least a portion (i.e., part or all) of the in-tile vertices classified by the vertex classification module 102 are stored in the second storage device (e.g., an on-chip cache) 110 but not the first storage device (e.g., an off-chip bin memory) 106, and vertex attribute shading results of at least a portion (i.e., part or all) of the out-tile vertices are stored in the first storage device 106. Since an out-tile vertex is used/referenced by a primitive intersecting with multiple tiles, the vertex attribute shading result of the out-tile vertex stored in the first storage device 106 is calculated and used by the pixel shader 104 when pixel/fragment shading is applied to at least one bin in one tile, and the vertex attribute shading result of the out-tile vertex can be loaded from the first storage device 106 and can be reused by the pixel shader 104 when pixel/fragment shading is applied to at least one bin in another tile. An in-tile vertex is shaded by vertex attribute shading on-the-fly when being used, and a vertex attribute shading result of the in-tile vertex is not written into the first storage device 106 in most cases, thus saving the memory traffic of the first storage device 106. It should be noted that out-tile vertices are shaded by the vertex attribute shader 114 with lower priority than in-tile ones when they are not for immediate use for the current tile.
As can be seen from
For example, the tile size TS for each frame is adaptively selected based on static determination. That is, the tile size TS for each frame is adaptively selected based on non-frame-adaptive condition(s). The static determination may depend on a screen resolution of an application because the ratio of out-tile vertices to in-tile vertices changes with the screen resolution of the application. Alternatively, the static determination may depend on the number of shader units employed because more shading power employed can afford re-shading and keep less varying data of out-tile vertices going to the first storage device 106.
For another example, the tile size TS for each frame is adaptively selected based on dynamic determination. That is, the tile size TS for each frame is adaptively selected based on frame-adaptive condition(s). The dynamic determination may depend on whether a shader-bound or memory bound status of a frame changes. Alternatively, the dynamic determination may depend on whether an average primitive size of a frame changes.
The binning result generated by the binning module 103 and the classification result generated by the vertex classification module 102 are stored into the first storage device 106. In one exemplary design, the first storage device 106 may be implemented using a bin memory 300 shown in
The VFB 304 is used to store flags of each vertex. For example, the flags of each vertex may include is_shaded, is_in_tile, etc. The flag is_shaded is indicative of a shading status of the vertex. For example, when the flag is_shaded of a vertex is set to “1” (i.e., is_shaded=1 (true)), it means the vertex attribute shading has been done to the vertex; and when the flag is_shaded of the vertex is set to “0” (i.e., is_shaded=0 (false)), it means the vertex attribute shading has not been done to the vertex. Initially, the flag is_shaded of each vertex is set to “0”. The flag is_in_tile is indicative of a vertex type of the vertex. For example, when a vertex is classified as a first-type vertex (in-tile vertex), is_in_tile=1 (true); and when a vertex is classified as a second-type vertex (out-tile vertex), is_in_tile=0 (false).
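For illustration, a per-vertex flag store of this kind may be modeled as one byte per vertex with one bit per flag. The bit positions, the class name VertexFlagBuffer, and its methods are assumptions of this sketch; the description does not specify how the VFB 304 lays out its flags.

```python
# Hypothetical bit assignments; the actual VFB layout is not specified.
IS_SHADED = 0x1
IS_IN_TILE = 0x2


class VertexFlagBuffer:
    """Toy model of a vertex flag buffer: one flag byte per vertex."""

    def __init__(self, vertex_count: int) -> None:
        # All bytes start at zero, so is_shaded is initially 0 for every vertex.
        self._flags = bytearray(vertex_count)

    def set_in_tile(self, index: int, in_tile: bool) -> None:
        if in_tile:
            self._flags[index] |= IS_IN_TILE
        else:
            self._flags[index] &= ~IS_IN_TILE & 0xFF

    def mark_shaded(self, index: int) -> None:
        self._flags[index] |= IS_SHADED

    def is_in_tile(self, index: int) -> bool:
        return bool(self._flags[index] & IS_IN_TILE)

    def is_shaded(self, index: int) -> bool:
        return bool(self._flags[index] & IS_SHADED)


vfb = VertexFlagBuffer(4)
vfb.set_in_tile(1, True)   # vertex 1 classified as in-tile at the VP pass
vfb.mark_shaded(1)         # vertex 1 shaded later at the PP pass
```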
Please refer to
With regard to the COV buffer 310 shown in
As mentioned above, the vertex attribute shader 114 is arranged to perform deferred vertex attribute shading at the PP pass. In this embodiment, the vertex attribute shader 114 may have a plurality of processing elements to support SIMD (single instruction multiple data) execution. For example, a fixed number of inputs of the same processing job (for example, vertices in the same shader kernel/shader type) are collected together into a wave before being sent to the vertex attribute shader 114 using SIMD architecture such as SIMD-64 architecture or SIMD-32 architecture. When the SIMD-64 architecture is employed by the vertex attribute shader 114, the vertex attribute shader 114 can perform SIMD execution of a wave of 64 vertices in the same shader kernel/shader type. When the SIMD-32 architecture is employed by the vertex attribute shader 114, the vertex attribute shader 114 can perform SIMD execution of a wave of 32 vertices in the same shader kernel/shader type. The vertex packing module 108 is arranged to group/pack un-shaded in-tile vertices and un-shaded out-tile vertices of the same shader kernel/shader type in waves of SIMD execution for the deferred vertex attribute shading at the vertex attribute shader 114.
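The wave formation described above may be sketched as follows, purely as an illustration. The function pack_waves and the representation of a vertex as a (vertex id, shader kernel) pair are assumptions of this sketch. Vertices are bucketed by shader kernel, and each bucket is cut into waves of at most the SIMD width, so a final wave may be only partially filled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pack_waves(vertices: Iterable[Tuple[int, str]],
               wave_size: int = 64) -> List[Tuple[str, List[int]]]:
    """Group un-shaded vertices of the same shader kernel into waves.

    vertices: (vertex_id, shader_kernel) pairs (hypothetical representation).
    Returns a list of (shader_kernel, vertex_ids), each with at most
    wave_size ids.
    """
    by_kernel: Dict[str, List[int]] = defaultdict(list)
    for vertex_id, kernel in vertices:
        by_kernel[kernel].append(vertex_id)

    waves: List[Tuple[str, List[int]]] = []
    for kernel, ids in by_kernel.items():
        for start in range(0, len(ids), wave_size):
            waves.append((kernel, ids[start:start + wave_size]))
    return waves


# 70 vertices using kernel "A" and 10 using kernel "B", SIMD-64.
pending = [(i, "A") for i in range(70)] + [(100 + i, "B") for i in range(10)]
waves = pack_waves(pending, wave_size=64)
wave_shapes = [(kernel, len(ids)) for kernel, ids in waves]
```

With these inputs the second wave of kernel "A" carries only 6 vertices and the wave of kernel "B" only 10, which illustrates why a low vertex count per wave leaves SIMD lanes idle.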
The present invention proposes several manners to group/pack un-shaded first-type vertices (in-tile vertices) and un-shaded second-type vertices (out-tile vertices) of the same shader kernel/shader type. In accordance with a first grouping/packing manner, the vertex packing module 108 groups at least one un-shaded first-type vertex and at least one un-shaded second-type vertex within a same tile into a wave of SIMD execution.
To improve un-shaded out-tile vertex grouping, the first grouping/packing manner is modified to extend the un-shaded out-tile vertex grouping from a current tile to at least one neighboring tile (e.g., four neighboring tiles). Hence, in accordance with a second grouping/packing manner, the vertex packing module 108 groups at least one un-shaded in-tile vertex within a current tile and un-shaded out-tile vertices within the current tile and at least one neighboring tile into a wave of SIMD execution.
In accordance with a third grouping/packing manner, the vertex packing module 108 groups un-shaded in-tile vertices located only within a same tile into a wave of SIMD execution, and groups un-shaded out-tile vertices located only within a same tile into a wave of SIMD execution.
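The three grouping/packing manners may be contrasted with the following sketch, which is illustrative only. It assumes that all listed vertices are un-shaded and share one shader kernel, that each vertex is tagged with the tile in which it is located, and that the function names and tuple layout are hypothetical.

```python
from typing import List, Set, Tuple

# (vertex_id, tile, is_in_tile); representation assumed for this sketch.
Vertex = Tuple[int, int, bool]


def _chunks(ids: List[int], wave_size: int) -> List[List[int]]:
    return [ids[i:i + wave_size] for i in range(0, len(ids), wave_size)]


def first_manner(vertices: List[Vertex], tile: int,
                 wave_size: int) -> List[List[int]]:
    """Mix in-tile and out-tile vertices of the same tile in one wave."""
    ids = [v for v, t, _ in vertices if t == tile]
    return _chunks(ids, wave_size)


def second_manner(vertices: List[Vertex], tile: int, neighbors: Set[int],
                  wave_size: int) -> List[List[int]]:
    """As the first manner, plus out-tile vertices of neighboring tiles."""
    ids = [v for v, t, in_tile in vertices
           if t == tile or (t in neighbors and not in_tile)]
    return _chunks(ids, wave_size)


def third_manner(vertices: List[Vertex], tile: int,
                 wave_size: int) -> List[List[int]]:
    """Keep in-tile and out-tile vertices of a tile in separate waves."""
    in_ids = [v for v, t, in_tile in vertices if t == tile and in_tile]
    out_ids = [v for v, t, in_tile in vertices if t == tile and not in_tile]
    return _chunks(in_ids, wave_size) + _chunks(out_ids, wave_size)


verts = [(0, 0, True), (1, 0, True), (2, 0, False),
         (3, 1, False), (4, 1, True), (5, 1, False)]
w1 = first_manner(verts, tile=0, wave_size=4)
w2 = second_manner(verts, tile=0, neighbors={1}, wave_size=4)
w3 = third_manner(verts, tile=0, wave_size=4)
```

In this toy input the second manner fills one wave completely by borrowing out-tile vertices 3 and 5 from the neighboring tile, whereas the third manner produces two partially filled waves.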
The vertex attribute shader 114 applies different storage strategies to vertex attribute shading results (i.e., varying data) of the in-tile vertices and vertex attribute shading results (i.e., varying data) of the out-tile vertices. Specifically, the vertex attribute shading results (i.e., varying data) of at least a portion (i.e., part or all) of the in-tile vertices processed by the vertex attribute shader 114 are stored in the second storage device (e.g., on-chip cache) 110 only, while the vertex attribute shading results (i.e., varying data) of all out-tile vertices processed by the vertex attribute shader 114 will be eventually written into the first storage device (e.g., off-chip bin memory) 106.
Consider a case where one tile is composed of multiple bins. When a vertex attribute shading result of an in-tile vertex inside one bin of a tile is held in an on-chip cache and the in-tile vertex is used by a primitive inside the tile, the vertex attribute shading result of the in-tile vertex can be reused when another bin of the tile is processed by the pixel shader 104. In other words, caching vertex attribute shading results of in-tile vertices can enable in-tile reuse. However, when an overflow condition of the second storage device 110 is met, meaning that the second storage device 110 is already full or almost full, vertex attribute shading results (i.e., varying data) of a portion of the in-tile vertices processed by the vertex attribute shader 114 may be overflowed to the first storage device 106 such as the COV buffer 310 shown in
In this embodiment, the vertex attribute shading results of at least a portion of the in-tile vertices and the vertex attribute shading results of at least a portion of the out-tile vertices are held in the on-chip cache, and the vertex attribute shading results of at least a portion of the out-tile vertices are further copied to the bin memory.
When a vertex attribute shading result of a specific vertex (e.g., a non-overflowed in-tile vertex, an out-tile vertex, or an overflowed in-tile vertex) is requested by the pixel shader 104 and a cache hit occurs, the requested vertex attribute shading result of the specific vertex can be read from the on-chip cache 802 without needing any memory traffic of the bin memory 300. However, when a vertex attribute shading result of a specific vertex (e.g., an out-tile vertex or an overflowed in-tile vertex) is requested by the pixel shader 104 and a cache miss occurs, the requested vertex attribute shading result of the specific vertex is not available in the on-chip cache 802, and memory traffic of the bin memory 300 is needed to obtain the requested vertex attribute shading result of the specific vertex.
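The storage and lookup behavior described above may be modeled, in a deliberately simplified form, as follows. The class VaryingStore, its fields, and the capacity-based overflow test are assumptions of this sketch; cache eviction and re-shading of in-tile vertices that are found in neither store are omitted.

```python
from typing import Any, Dict


class VaryingStore:
    """Toy model of an on-chip cache backed by an off-chip bin memory."""

    def __init__(self, cache_capacity: int) -> None:
        self.cache: Dict[int, Any] = {}       # stands in for the on-chip cache
        self.bin_memory: Dict[int, Any] = {}  # stands in for the bin memory
        self.cache_capacity = cache_capacity
        self.bin_reads = 0                    # counts bin-memory read traffic

    def store(self, vertex_id: int, varying: Any, in_tile: bool) -> None:
        if len(self.cache) < self.cache_capacity:
            self.cache[vertex_id] = varying
            if not in_tile:
                # Out-tile results are additionally copied to the bin memory.
                self.bin_memory[vertex_id] = varying
        else:
            # Overflow condition: the result goes to the bin memory instead
            # (for an in-tile vertex, this models the overflow buffer).
            self.bin_memory[vertex_id] = varying

    def fetch(self, vertex_id: int) -> Any:
        if vertex_id in self.cache:
            return self.cache[vertex_id]      # cache hit, no bin traffic
        self.bin_reads += 1                   # cache miss
        return self.bin_memory[vertex_id]     # KeyError would mean re-shade


store = VaryingStore(cache_capacity=2)
store.store(0, "varying-0", in_tile=True)    # cached only
store.store(1, "varying-1", in_tile=False)   # cached and copied
store.store(2, "varying-2", in_tile=True)    # cache full, overflowed
hit = store.fetch(0)
reads_after_hit = store.bin_reads
miss = store.fetch(2)
reads_after_miss = store.bin_reads
```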
With regard to the system configuration shown in
In general, each of the first storage device 106 and the second storage device 110 has a limited storage capacity. To buffer vertex attribute shading results of more first-type vertices (in-tile vertices) and second-type vertices (out-tile vertices), a data compression and decompression technique may be employed by a graphics processing system. Hence, vertex attribute shading results of first-type vertices (in-tile vertices) and second-type vertices (out-tile vertices) can be further compressed upon storing and de-compressed upon use.
The de-compressor 904 is arranged to use a decompression algorithm matching the compression algorithm used by the compressor 902. The compressed vertex attribute shading results of the first-type vertices and the second-type vertices are transmitted to the pixel shader 104 through the de-compressor 904. Hence, the de-compressor 904 receives a compressed vertex attribute shading result of a requested vertex from one of the first storage device 106 and the second storage device 110, and outputs a de-compressed vertex attribute shading result of the requested vertex to the pixel shader 104 for pixel/fragment shading.
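A matched compress/de-compress pair may be sketched as follows. The description does not name a compression algorithm, so the general-purpose zlib codec from the Python standard library is used here only as a stand-in, and the packing of varying data as little-endian 32-bit floats is likewise an assumption of this sketch.

```python
import struct
import zlib
from typing import List


def compress_varyings(values: List[float]) -> bytes:
    """Pack varying data as 32-bit floats, then compress (stand-in codec)."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)


def decompress_varyings(blob: bytes) -> List[float]:
    """Inverse of compress_varyings; must use the matching algorithm."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Values chosen to be exactly representable as 32-bit floats.
varyings = [0.0, 0.5, 1.0, 0.25] * 16
stored = compress_varyings(varyings)       # written on the storage side
restored = decompress_varyings(stored)     # read on the pixel-shader side
```

The round trip is lossless in this sketch because zlib is a lossless codec; whether an actual implementation would use lossless or lossy compression is not stated in the description.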
It is noted that, in an alternative design, the compressor 902 may be placed after the second storage device 110. By this implementation, the vertex attribute shading results of first-type vertices (in-tile vertices) and second-type vertices (out-tile vertices) are stored into the second storage device 110 first, and the compressor 902 employs the compression on the vertex attribute shading results thereafter. Thus, the compressor 902 reads vertex attribute shading results of first-type vertices (in-tile vertices) from the second storage device 110 and outputs the compressed vertex attribute shading results of first-type vertices (in-tile vertices) to the de-compressor 904, and reads vertex attribute shading results of second-type vertices (out-tile vertices) and outputs the compressed vertex attribute shading results of second-type vertices (out-tile vertices) to the first storage device 106.
In the example shown in
As mentioned above, the vertex classification is performed by the vertex classification module 102 at the VP pass, and the classification result is referenced by the vertex attribute shader 114 at the PP pass. In one exemplary design, the vertex attribute shader 114 may be further arranged to tune the performance adaptively at the PP phase by re-classifying first-type vertices (in-tile vertices) originally classified by the vertex classification module 102 as second-type vertices (out-tile vertices) and/or re-classifying second-type vertices (out-tile vertices) originally classified by the vertex classification module 102 as first-type vertices (in-tile vertices).
The vertex attribute shader 114 may check a first predetermined criterion. When some primitives are not shared by many tiles and the shader kernel/shader type is short, the first predetermined criterion is met. In a case where the first predetermined criterion is met, the vertex classification module 102 may optionally re-classify some second-type vertices (out-tile vertices) as first-type vertices (in-tile vertices) at the rendering process (i.e., PP pass) to avoid sending many vertex attribute shading results to the first storage device 106. As long as the flag is_shaded of a vertex in the VFB 304 is not set to “1” during the vertex attribute shading of one tile, the vertex attribute shading of another tile will not see the vertex as “shaded” and will shade the vertex again. Hence, this re-classification feature is easier to implement.
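The effect of leaving the is_shaded flag unset may be illustrated with the following sketch. The function names and the representation of the flag buffer as a dictionary are assumptions; the sketch further assumes that is_shaded is set only for vertices still treated as out-tile vertices.

```python
from typing import Dict, Iterable, List, Set


def shade_tile(tile_vertices: Iterable[int],
               is_shaded: Dict[int, bool],
               reclassified: Set[int],
               shade_log: List[int]) -> None:
    """Shade the vertices needed by one tile (hypothetical helper).

    Vertices in `reclassified` were out-tile but are now treated as in-tile:
    they are shaded on demand and their is_shaded flag is left at 0.
    """
    for v in tile_vertices:
        if is_shaded.get(v, False):
            continue                 # result already available, reuse it
        shade_log.append(v)          # perform vertex attribute shading
        if v not in reclassified:
            is_shaded[v] = True      # only out-tile results are marked


def count_shades(reclassified: Set[int]) -> int:
    """Vertex 7 is needed by two tiles; count how often it is shaded."""
    is_shaded: Dict[int, bool] = {}
    log: List[int] = []
    shade_tile([7], is_shaded, reclassified, log)   # first tile
    shade_tile([7], is_shaded, reclassified, log)   # second tile
    return len(log)


shades_as_out_tile = count_shades(set())
shades_when_reclassified = count_shades({7})
```

Here the re-classified vertex is shaded once per tile and no result needs to be written to the bin memory, trading a repeated shading operation for reduced memory traffic.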
In addition, the vertex attribute shader 114 may further check a second predetermined criterion. When there are too many first-type vertices (in-tile vertices) in one tile, the shader kernel/shader type is long, and the first-type vertices (in-tile vertices) are used/referenced by primitives across multiple bins in the tile, the second predetermined criterion is met. Hence, in a case where the second predetermined criterion is met, the vertex classification module 102 may optionally re-classify those first-type vertices (in-tile vertices) in the tile as second-type vertices (out-tile vertices) at the rendering process (i.e., PP pass), thereby enabling reuse of vertex attribute shading results of vertices when the pixel shader 104 processes different bins in the tile.
As mentioned above, vertex attribute shading results of first-type vertices (in-tile vertices) are held in the second storage device 110 at the first priority. When an overflow condition of the second storage device 110 is met, one option is to re-shade certain first-type vertices (in-tile vertices). However, when re-shading for first-type vertices (in-tile vertices) still costs a lot of shading power, the vertex attribute shader 114 may either re-classify the first-type vertices (in-tile vertices) as second-type vertices (out-tile vertices) or may overflow vertex attribute shading results of first-type vertices (in-tile vertices) to an off-chip COV buffer to maximize in-tile reuse. Though such a design comes with a price of memory traffic, it will be limited to certain tiles only.
Alternatively, in-tile vertices and out-tile vertices can be classified as all in-tile vertices or all out-tile vertices under certain scenarios. For example, when the application is severely bound by memory traffic, all of the second-type vertices (out-tile vertices) can be treated as first-type vertices (in-tile vertices) at the rendering process (i.e., PP pass). For another example, when the application consists of big triangles or it is required to have all vertex attribute shading results stored into the bin memory, all of the first-type vertices (in-tile vertices) can be treated as second-type vertices (out-tile vertices) at the rendering process (i.e., PP pass).
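The two global overrides described above may be expressed as a small decision function. This is a sketch only: the function name, the parameter names, and the rule that a requirement to keep all results in the bin memory takes precedence over the memory-bound case are assumptions not stated in the description.

```python
def effective_is_in_tile(is_in_tile: bool,
                         memory_bound: bool = False,
                         keep_all_in_bin_memory: bool = False) -> bool:
    """Return the vertex type used at the PP pass (hypothetical helper)."""
    if keep_all_in_bin_memory:
        # Big triangles, or all results must reside in the bin memory:
        # treat every vertex as an out-tile vertex.
        return False
    if memory_bound:
        # Severely memory-bound application: treat every vertex as in-tile.
        return True
    return is_in_tile  # otherwise keep the VP-pass classification


out_tile_when_memory_bound = effective_is_in_tile(False, memory_bound=True)
in_tile_when_forced = effective_is_in_tile(True, keep_all_in_bin_memory=True)
unchanged = effective_is_in_tile(True)
```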
In summary, the proposed graphics processing system performs deferred vertex attribute shading operation based on split vertex streams, where vertex attribute shading results (i.e., varying data) of in-tile vertices are held in an on-chip cache most of the time, thereby saving the memory traffic of an off-chip bin memory. In addition, when compression on attribute shading results (i.e., varying data) of out-tile vertices is implemented, more saving on the memory traffic of the off-chip bin memory can be achieved. Though storing vertex attribute shading results (i.e., varying data) of in-tile vertices in the on-chip cache may consume a small part of the shading power due to re-shading of in-tile vertices, the performance gain due to saving on the memory traffic would surpass the shading loss.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
This application claims the benefit of U.S. provisional application No. 62/032,632, filed on Aug. 3, 2014 and incorporated herein by reference.
Number | Date | Country
---|---|---
62/032,632 | Aug. 2014 | US