There is a continuing demand for graphics applications that are faster, more realistic, and more detailed than previous graphics applications. Such demanding graphics applications include, for example, video, mobile computing, gaming, educational, and personal computing applications. Accordingly, there is an accompanying demand to process graphics data faster, in greater detail, and, in general, more realistically and in real time.
Realistic rendering or display of three-dimensional (3-D) graphics may be limited in some instances by constraints of a processing system and/or of the methodology used to render the 3-D graphics. 3-D graphics may be rendered using pipelined processing to provide different effects such as, for example, textures, Z-buffering, and color blending. However, the pipeline may be slowed, compromised, or impractical for providing realistic 3-D graphics in real time due to inefficiencies therein.
Thus, there exists a need for a system and method to efficiently process 3-D graphics.
The several embodiments described herein are solely for the purpose of illustration. Embodiments may include any currently or hereafter-known versions of the elements described herein. Various embodiments of the present disclosure will be described in detail. However, such details are included to facilitate understanding of and to describe exemplary embodiments of the present disclosure. In some instances, details such as well-known methods, types of data, protocols, procedures, components, electrical structures, and circuits are not described in detail, or are shown in block diagram form, in order not to obscure embodiments hereof. Furthermore, some embodiments may be described in particular embodiments but may be implemented in hardware, software, firmware, middleware, or a combination thereof. Therefore, persons skilled in the art will recognize from this description that other embodiments may be practiced with various modifications and alterations.
Conventional stencil buffers provide a mask for a scene being rendered on a per-pixel basis. The per-pixel formation and processing of a conventional stencil buffer requires considerable bandwidth (e.g., bus traffic). High costs to system resources, including processing power consumption, processing time, heat generation, and memory allocations, may effectively compromise a graphics processing and rendering operation.
In some embodiments herein, a hierarchical stencil buffer (HSB) is provided to reduce bandwidth requirements for processing graphics. As an initial matter, the HSB is created to accommodate stencil values.
At operation 110, the graphics data is set up for processing by the graphics processing operation 100. Setup logic may take vertex information defining point locations (e.g., x, y, z coordinates) and translate the vertex information to data that may be used in further processing of the graphics. For example, setup operation 110 may include extruding an edge(s) of an object in the scene being rendered. Setup operation 110 may further include operations related to a light source as indicated in the legend of
In some embodiments herein, as a graphics scene is rendered relative to a light source in the graphics scene, a HSB may be created to store stencil values. The stencil values may provide an indication of whether the pixels being rendered are illuminated by the light source in the scene or in shadow relative to the light source. In some embodiments, the HSB writes all pixels that are in shadow relative to the light source to the buffer thereof.
The format of the stencil value may vary or be based on implementation (i.e., format) of the HSB. In some embodiments, the format of the stencil buffer values may be based on a hardware and/or software protocol, or other factors. For example, some hardware and software implementations may use 8 bits for stencil values whereas other systems may use 1 bit. It should be appreciated that the format or protocol for representing the stencil values may vary, while adhering to other aspects of some embodiments herein.
In some embodiments, the creation of the HSB as outlined in
A rasterization operation 215 may render a graphics scene to determine Z (i.e., depth) values for the objects, surfaces, and areas in the scene. As understood by those skilled in the art, the Z values are used to resolve visibility in the scene. A hierarchical Z buffer 220 may be used to store depth values.
At operation 225, a shadow test is performed on the objects, surfaces, and areas in the graphics scene being processed. The shadow test operation 225 operates to avoid performing the shadow test on a per-pixel basis. Performing a shadow test for a graphics scene on a pixel-by-pixel basis may be extremely resource hungry and time consuming. Furthermore, the bandwidth that may be used to transfer information between a processor and a memory may impact other operations relying on the bus structure. In some embodiments, a reduction in the number of times a processor references a memory device may provide a corresponding reduction in power consumption and heat generation by the processor.
Shadow test operation 225 includes, after the HSB is written to memory (e.g., a cache memory, a RAM device, etc.), testing pixels as they are rendered to determine whether they are in or out of shadow relative to the light source previously used to create the HSB. Since the HSB includes a number of hierarchical levels or representations of the graphics scene, shadow test operation 225 may not need to traverse the entire hierarchical stencil buffer to determine whether a particular pixel is in shadow. For example, shadow test operation 225 may compare a pixel to a 32×32 pixel hierarchical level to see whether it is in or out of shadow. If the pixel is in shadow, then there is no need to further traverse the HSB, since the lower hierarchical levels (e.g., 16×16, 8×8, 4×4) that include the pixel will also indicate that the pixel is in shadow. In this manner, a savings in processing power, processing time, and bandwidth utilization may be provided by the HSB, in some embodiments hereof.
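The early-out traversal described above may be sketched as follows. This is a minimal illustration only, not the implementation of any embodiment: the function names `build_hsb` and `in_shadow`, the choice of a dictionary keyed by tile size, and the convention that a tile entry is 1 only when every pixel it covers is in shadow are all assumptions made for the example. Small tile sizes are used in place of 32×32 to keep the example readable.

```python
def build_hsb(stencil, tile_sizes=(4, 2)):
    """Build coarse levels from a per-pixel stencil (1 = in shadow).

    Assumption for this sketch: a tile entry is 1 only if every pixel
    covered by the tile is in shadow, otherwise 0.
    Returns a dict mapping tile size -> 2-D grid of tile entries.
    """
    height, width = len(stencil), len(stencil[0])
    levels = {}
    for t in tile_sizes:
        grid = []
        for ty in range(0, height, t):
            row = []
            for tx in range(0, width, t):
                all_shadow = all(
                    stencil[y][x]
                    for y in range(ty, min(ty + t, height))
                    for x in range(tx, min(tx + t, width))
                )
                row.append(1 if all_shadow else 0)
            grid.append(row)
        levels[t] = grid
    return levels


def in_shadow(stencil, levels, x, y):
    """Test one pixel, coarsest level first.

    Returns (result, lookups), where lookups counts how many buffer
    entries were read before the answer was known.
    """
    lookups = 0
    for t in sorted(levels, reverse=True):
        lookups += 1
        if levels[t][y // t][x // t]:
            # The whole tile is in shadow, so finer levels need not be read.
            return True, lookups
    # No coarse level could decide; fall back to the per-pixel value.
    lookups += 1
    return bool(stencil[y][x]), lookups
```

In this sketch a pixel inside a fully shadowed coarse tile is resolved with a single lookup, while a pixel in a mixed tile falls through to the finer levels and, at worst, to the per-pixel stencil value.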
In some embodiments, in the event shadow test operation 225 determines the tested pixel is in shadow, an “in shadow” value is associated with the pixel. The “in shadow” value may be passed down the pipeline to assist in other operations and/or provide a tag for the pixel. In some embodiments, some additional information may be passed down pipeline 200 even though the pixel failed shadow test operation 225. The additional information may include, for example, shadow penumbra or an alpha value (i.e., transparency) that may be used in, for example, a blending function to create soft shadows.
In the event shadow test operation 225 determines the tested pixel is not in shadow (i.e., visible), then the pixel is permitted to continue down pipeline 200 for further processing operations. The further processing operation may be used to add texture, color, and other attributes for rendering, for example, a photo-realistic scene.
Those skilled in the art should appreciate that texture operation 250, Z test operation 260, and color blend operation 270 may be implemented in a variety of methods and techniques, without departing from the disclosure and embodiments herein. Each of texture operation 250, Z test operation 260, and color blend operation 270 may be implemented consistent with known texture, Z test, and color blend operations for rendering of graphics. It is noted that texture operation 250, Z test operation 260, and color blend operation 270 may use, store, and reference associated texture data 255, Z-buffer 265, and color buffer 275, respectively.
In some embodiments, Z test operation 260 may take advantage of operating efficiencies afforded by a hierarchical Z-buffer, as understood by those skilled in the art.
Also, presented in
In some embodiments herein, the highest n levels of the HSB may be aligned with the size of the cache (i.e., memory) available. It is noted that the size of memory referenced here may be taken after subtracting out cache that may be needed for other purposes such as, for example, higher levels of hierarchical Z, textures, etc. Also, due to the reduced memory requirements that may be afforded by using the HSB in some embodiments herein, numerous stencil tests may be available in local cache, thereby resulting in a significant reduction in bandwidth over a bus.
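As a rough illustration of aligning the highest levels with the available cache, the following sketch counts how many of the coarsest levels fit within a byte budget. The function names, the tile sizes, the one-entry-per-tile storage layout, and the example budget are hypothetical and chosen only for the example; an actual embodiment may size its levels differently.

```python
import math


def level_bytes(width, height, tile, bits_per_value=8):
    """Bytes needed for one level that stores one value per tile x tile block."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return math.ceil(tiles * bits_per_value / 8)


def levels_that_fit(width, height, tiles_coarse_to_fine, budget_bytes,
                    bits_per_value=8):
    """Return (n, used): how many of the coarsest levels fit in the budget.

    budget_bytes is assumed to already exclude cache reserved for other
    purposes (e.g., hierarchical Z levels or textures).
    """
    used = 0
    n = 0
    for tile in tiles_coarse_to_fine:
        size = level_bytes(width, height, tile, bits_per_value)
        if used + size > budget_bytes:
            break
        used += size
        n += 1
    return n, used
```

For a hypothetical 1024×768 scene with 8-bit entries and levels at 32×32, 16×16, 8×8, and 4×4, the three coarsest levels occupy 768 + 3,072 + 12,288 = 16,128 bytes, so they would fit in a 16 KB budget while the 4×4 level would not.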
In some embodiments herein, the HSB is compressed. That is, the values stored in the HSB are in a compressed state. In some embodiments, graphics rendering hardware may be modified to read the stencil value and do a decompression thereof. Also, sending compressed stencil values of the HSB across a computing system bus is another way to reduce memory bandwidth.
In some embodiments, the HSB may not contain a continuous set of hierarchical levels. While the HSB may contain a plurality of hierarchical levels, each one half the size of the one above it, in some embodiments some of the levels of the HSB herein may be eliminated. The elimination of certain HSB hierarchical levels may be based on an optimization of the HSB. Additionally, implementations of the HSB herein are flexible since the size of the HSB levels stored in hardware may vary.
The following is an exemplary outline of a shadow algorithm using a HSB and compression:
As mentioned hereinabove, the HSB may be compressed to further leverage efficiencies of the HSB gained by, for example, reduced bandwidth requirements. Compression may be used to introduce better memory hierarchy utilization by the HSB hierarchy. In one instantiation hereof, a simple run-length encoding (RLE) scheme for compressing the levels of the HSB hierarchy may be used. Using the algorithm from above, the HSB hierarchy will contain integer (e.g., a byte) values that are either 0 or 1. An escape sequence (e.g., an all 1's byte) may be used to indicate that all 0's or 1's will be compressed. Thus, each byte can include a repeat count in 7 bits with the remaining bit indicating whether the value being repeated is a 0 or a 1. A count of 128 may be prohibited since that particular bit sequence may indicate transitions in and out of the “0's and 1's” only modes. When not in the “0's and 1's” only modes, a high bit may be used to indicate a repeat count followed by the byte to repeat (i.e., non-repeating individual bytes with the high bit set would become two bytes).
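The “0's and 1's” only mode described above may be sketched as follows. This is a simplified illustration under stated assumptions, not the coding scheme of any embodiment: the repeated value is assumed to occupy the high bit and the repeat count the low 7 bits, runs are capped at 126 so that the all 1's byte (0xFF) is never emitted and remains free for use as an escape, and the escape handling and 8-bit mode themselves are omitted. The names `rle_encode_bits` and `rle_decode_bits` are hypothetical.

```python
MAX_RUN = 126  # assumption: keep 0xFF unused so it can serve as an escape byte


def rle_encode_bits(values):
    """Run-length encode a sequence of 0/1 stencil values.

    Each output byte holds the repeated value in bit 7 and the run
    length (1..MAX_RUN) in bits 0..6.
    """
    out = bytearray()
    i = 0
    n = len(values)
    while i < n:
        v = values[i]
        if v not in (0, 1):
            raise ValueError("only 0/1 values are supported in this mode")
        run = 1
        while i + run < n and values[i + run] == v and run < MAX_RUN:
            run += 1
        out.append((v << 7) | run)
        i += run
    return bytes(out)


def rle_decode_bits(data):
    """Invert rle_encode_bits."""
    values = []
    for b in data:
        values.extend([b >> 7] * (b & 0x7F))
    return values
```

Under these assumptions, 213 stencil values consisting of 10 zeros, 200 ones, and 3 zeros encode to four bytes, with the run of 200 ones split into runs of 126 and 74.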
What follows is an exemplary coding scheme for compressing an HSB, in accordance with some embodiments herein. For example,
8BIT Mode Compression May be Represented by:
As an example, refer to the image of a sample stencil buffer section in
In this simple illustrative example, 2 bits per byte could have been used to compress everything down to 16 bytes. However, the example was provided as an illustration of a representative compression, not as an exhaustive discussion. Furthermore, it should be appreciated that other compression techniques, methods, and protocols may be used in conjunction with the HSB hereof. Other compression schemes may be beneficial depending on the types of data being encoded. For example, encoding may be conducted on a block basis (e.g., 16×16) to get better spatial coherency of the data. The deltas between adjacent values may be computed before compressing so that gradients (e.g., soft shadow falloffs) may be turned into constant values.
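The delta step mentioned above may be sketched as follows, to show how a linear gradient becomes a run of constant values that a run-length coder can then compress. The function names are hypothetical, and the sketch uses plain Python integers; how negative deltas would be stored in a fixed-width byte is left open here.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

For instance, a soft shadow falloff of 0, 32, 64, 96, 128 becomes 0, 32, 32, 32, 32 after the delta step.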
In an instance where the HSB contains only values of 1 or 0 as presented in the example here, a compression scheme that packs every 8 values into a byte may be used instead of other compression techniques, methods, and schemes. In some embodiments, the RLE compression scheme (or others) hereof may be done in conjunction with another scheme.
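Packing every 8 values into a byte, as described above, may be sketched as follows. The least-significant-bit-first ordering and the names `pack_bits` and `unpack_bits` are assumptions made for the example; the source does not specify a bit order.

```python
def pack_bits(values):
    """Pack 0/1 stencil values into bytes, 8 per byte, LSB first (assumed order)."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v not in (0, 1):
            raise ValueError("only 0/1 values can be bit-packed")
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(data, count):
    """Recover the first `count` values from bytes produced by pack_bits."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(count)]
```

Because the final byte may be only partly used, the caller is assumed to track the number of valid values separately, which is why `unpack_bits` takes an explicit count.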
At operation 510, stencil values are stored in the HSB in a compressed state. The compression scheme may vary. The HSB may include stencil values for all pixels not in shadow (or in shadow) relative to the light being evaluated.
In some embodiments, one of a number of devices that may be connected to I/O interface 720 includes a display device 730. Display device 730 may provide a mechanism upon which graphics may be rendered.
Graphics processor 715 may be utilized to perform graphics processing for processor 705 in order to reduce the workload on processor 705. Moreover, graphics processor 715 may include a rendering engine 735 having a rendering pipeline in accordance with embodiments hereof for a HSB, including a compressed HSB. In some embodiments, graphics processor 715 may not be present or may not be used in the creation and/or usage of a HSB herein. In some embodiments, processor 705 may be used, alone or in combination with other devices (e.g., memory 710), to implement some of the embodiments herein.
It should be appreciated that system 700 is only exemplary and that any type of computing device that renders graphics may be utilized in implementing aspects of the invention. In some embodiments,
It should be understood that system 700 may include, in some embodiments, additional, fewer, and alternative components and devices to those depicted in
The following table, Table 2, illustrates a bandwidth reduction that may be obtained using a HSB, in accordance herewith. If it is assumed that the HSB has a capture rate of 50%, then the bandwidth may be reduced from about 2 GB/s to about 1 GB/s. In the instance a 90% capture rate is assumed, the table shows that only about 2 MB/s bandwidth is required.
Based on our calculations, the estimated bandwidth reduction is 50-90% per light source. The bandwidth was calculated in the table above, and the 50-90% savings is based on the bandwidth savings obtained using a hierarchical technique for the Z-buffer.
The foregoing disclosure has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope set forth in the appended claims.