The present invention is generally related to hardware accelerated graphics computer systems.
Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (e.g., GPUs, etc.) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data, where the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives (e.g., comprising “points”, “lines”, “triangles”, etc.) and produce real-time rendered 3-D images.
The real-time rendered 3-D images are generated using raster display technology. Raster display technology is widely used in computer graphics systems, and generally refers to the mechanism by which the grid of multiple pixels comprising an image is influenced by the graphics primitives. For each primitive, a typical rasterization system generally steps from pixel to pixel and determines whether or not to “render,” or write a given pixel into a frame buffer or pixel map, as per the contribution of the primitive. This, in turn, determines how to write the data to the display buffer representing each pixel.
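By way of illustration only, the pixel-by-pixel coverage decision described above can be sketched in Python using edge functions, a common rasterization technique that is assumed here rather than taken from the present description. The sketch assumes counter-clockwise vertex order in a y-up coordinate convention, samples each pixel at its center, and omits tie-breaking (fill) rules; the names `edge` and `rasterize_triangle` are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: >= 0 when (px, py) lies on the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Step from pixel to pixel and return the set of covered (x, y) pixels."""
    xs = [v0[0], v1[0], v2[0]]
    ys = [v0[1], v1[1], v2[1]]
    # Only visit pixels inside the triangle's extent, clamped to the screen.
    x_min = max(int(min(xs)), 0)
    x_max = min(int(max(xs)), width - 1)
    y_min = max(int(min(ys)), 0)
    y_max = min(int(max(ys)), height - 1)
    covered = set()
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(v1[0], v1[1], v2[0], v2[1], px, py)
            w1 = edge(v2[0], v2[1], v0[0], v0[1], px, py)
            w2 = edge(v0[0], v0[1], v1[0], v1[1], px, py)
            # Inside (or on the boundary of) all three edges: the pixel is written.
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.add((x, y))
    return covered
```

For the triangle (0, 0), (4, 0), (0, 4) on an 8×8 screen, the ten pixels with x + y ≤ 3 are reported as covered.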
Various traversal algorithms and various rasterization methods have been developed for converting a graphics primitive based description into a pixel based description (e.g., rasterizing pixel to pixel per primitive) in a way such that all pixels within the primitives comprising a given 3-D scene are covered. For example, some solutions involve generating the pixels in a unidirectional manner. Such traditional unidirectional solutions involve generating the pixels row-by-row in a constant direction. This requires that, upon finishing a row at a location on one side of the primitive, the sequence shift back across the primitive to a starting location on the opposite side.
Other traditional methods involve stepping pixels in a local region following a space filling curve such as a Hilbert curve. The coverage for each pixel is evaluated to determine if the pixel is inside the primitive being rasterized. This technique does not have the large shifts (which can cause inefficiency in the system) of the unidirectional solutions, but is typically more complicated to design than the unidirectional solution.
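For comparison, the following sketch generates a Hilbert-curve visiting order for an n×n block of pixels (n a power of two), using the widely published index-to-coordinate conversion. It is illustrative only; in the traversal described above, each visited pixel would then be tested for coverage by the primitive. The function names are hypothetical.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that consecutive sub-curves join end to end."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map distance d along the Hilbert curve to (x, y) in an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """All pixels of an n x n block, listed in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]
```

Every step in the resulting order moves to an edge-adjacent pixel, which avoids the long return shifts of the unidirectional scan at the cost of the more involved index arithmetic.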
Once the primitives are rasterized into their constituent pixels, these pixels are then processed in pipeline stages subsequent to the rasterization stage where the rendering operations are performed. Typically, these rendering operations involve reading the results of prior rendering for a given pixel from the frame buffer, modifying the results based on the current operation, and writing the new values back to the frame buffer. For example, to determine if a particular pixel is visible, the distance from the pixel to the camera is often used. The distance for the current pixel is compared to the closest previous pixel from the frame buffer, and if the current pixel is visible, then the distance for the current pixel is written to the frame buffer for comparison with future pixels. Similarly, rendering operations that assign a color to a pixel often blend the color with the color that resulted from previous rendering operations. Generally, rendering operations assign a color to each of the pixels of a display in accordance with the degree of coverage of the primitives comprising a scene. The per pixel color is also determined in accordance with texture map information that is assigned to the primitives, lighting information, and the like.
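The read-modify-write pattern described above can be illustrated with the following minimal sketch of a depth test followed by color blending. The “less than” depth comparison, the source-over blend, and the names `FrameBuffer` and `shade_pixel` are assumptions made for illustration; actual render operation units support many configurable test and blend modes.

```python
class FrameBuffer:
    """Hypothetical per-pixel depth and color storage."""

    def __init__(self, width, height):
        # Depth starts "infinitely far"; color starts black.
        self.depth = [[float("inf")] * width for _ in range(height)]
        self.color = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]


def shade_pixel(fb, x, y, z, rgb, alpha):
    """Depth-test one pixel, then blend its color over the stored color."""
    # Read: compare against the closest depth previously written here.
    if z >= fb.depth[y][x]:
        return False  # hidden behind an earlier pixel; frame buffer unchanged
    # Write the new, closer depth for comparison with future pixels.
    fb.depth[y][x] = z
    # Modify: blend with the color left by previous rendering operations.
    dst = fb.color[y][x]
    fb.color[y][x] = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, dst))
    return True
```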
A problem exists however with the ability of prior art 3-D rendering architectures to scale to handle the increasingly complex 3-D scenes of today's applications. Many of these applications require the ability to efficiently implement complex screen coloring and rendering effects in real-time, such as, for example, complex OpenVG (Open Vector Graphics) screen effects. Additional applications require the ability to accurately draw complex objects or characters in real-time whether or not the character will ultimately appear on screen, and the ability to accurately simulate “lens flare” type effects for bright light sources.
With computer screens now commonly having screen resolutions of 1920×1200 pixels or larger, traditional methods of increasing 3-D rendering performance to handle increasingly demanding applications are problematic. For example, increasing clock speed to improve performance has negative side effects, such as increasing power consumption and increasing the heat produced by the GPU integrated circuit die. Other methods for increasing performance, such as incorporating large numbers of parallel execution units for parallel execution of GPU operations have negative side effects such as increasing integrated circuit die size, decreasing yield of the GPU manufacturing process, increasing power requirements, and the like.
Thus, a need exists for a rasterization process that can scale as graphics application needs require and provide added performance without incurring penalties such as increased power consumption and/or reduced fabrication yield.
Embodiments of the present invention provide a method and system for a rasterization process that can scale as graphics application needs require and provide added performance without incurring penalties such as increased power consumption and/or reduced fabrication yield.
In one embodiment, the present invention is implemented as a method for bounding region accumulation for graphics rendering. The method is implemented within a raster unit of a GPU (graphics processing unit). The method includes receiving a plurality of graphics primitives (e.g., triangles) for rasterization in a raster stage of a graphics processor and rasterizing the primitives to generate a plurality of pixels related to the primitives and a plurality of respective bounding regions related to the primitives. Upon receiving an accumulation start command (e.g., from a graphics driver), the bounding regions are accumulated in an accumulation register coupled within a raster unit of the GPU. The accumulation continues until an accumulation stop command is received. The operation results in an accumulated bounding region. Access to the accumulated bounding region is enabled to facilitate a subsequent graphics rendering operation.
In one embodiment, the bounding regions are bounding boxes. In such an embodiment, the accumulated bounding box has a left limit related to a leftmost one of the respective bounding boxes, a right limit related to a rightmost one of the respective bounding boxes, an upper limit related to an uppermost one of the respective bounding boxes, and a lower limit related to a lowermost one of the respective bounding boxes. In one embodiment, the accumulated bounding box can be accessed by the raster unit to perform a scissoring operation on a subsequently received graphics primitive, and the scissoring operation can be configured to implement an OpenVG paint application operation.
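The relationship between the respective bounding boxes and the accumulated bounding box described above amounts to taking the minimum of the left and upper limits and the maximum of the right and lower limits. A minimal Python sketch follows; it assumes integer pixel coordinates with y increasing downward (so the “upper” limit is the smallest y), and the names `Box`, `union`, and `accumulate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Box:
    left: int
    top: int     # upper limit (assumes screen y grows downward)
    right: int
    bottom: int  # lower limit


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))


def accumulate(boxes: Iterable[Box]) -> Optional[Box]:
    """Fold per-primitive boxes into one accumulated box (None if empty)."""
    acc = None
    for b in boxes:
        acc = b if acc is None else union(acc, b)
    return acc
```

For example, accumulating Box(2, 3, 5, 6), Box(0, 4, 3, 9), and Box(4, 1, 7, 2) yields Box(0, 1, 7, 9): each limit comes from whichever primitive extends farthest in that direction.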
In one embodiment, the accumulated bounding box can be accessed by the raster unit to perform a pre-rendering operation on a stream of subsequently received graphics primitives (e.g., primitives comprising a complex character or object). The pre-rendering operation can be used to determine whether the object resulting from the stream of primitives will ultimately appear on a display. In one embodiment, the pre-rendering operation is configured to determine a screen area size of an object resulting from the stream of subsequently received graphics primitives. The screen area size can be used to implement a camera lens flare effect on a display.
In this manner, embodiments of the present invention efficiently implement complex screen coloring and rendering effects in real-time, such as, for example, complex OpenVG (Open Vector Graphics) screen effects and “lens flare” type effects for bright light sources. Additionally, embodiments of the present invention can enhance real-time rendering performance by quickly and accurately determining whether complex objects or characters will ultimately be rendered on screen or not, thereby saving valuable computing cycles from being wasted. These attributes facilitate a rasterization process that can scale without resorting to prior art half measures (e.g., excessive clock speed increases) that increase power consumption and/or reduce fabrication yield.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the embodiments of the present invention.
Notation and Nomenclature:
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 100 of
Computer System Platform:
It should be appreciated that the GPU 110 can be implemented as a discrete component, a discrete graphics card designed to couple to the computer system 100 via a connector (e.g., AGP slot, PCI-Express slot, etc.), a discrete integrated circuit die (e.g., mounted directly on a motherboard), or as an integrated GPU included within the integrated circuit die of a computer system chipset component (not shown). Additionally, a local graphics memory 114 can be included for the GPU 110 for high bandwidth graphics data storage.
Embodiments of the present invention implement a method and system for bounding box accumulation for graphics rendering. The method is implemented within a raster unit of a GPU (e.g., GPU 110). The method includes receiving a plurality of graphics primitives (e.g., triangles) for rasterization in a raster unit of a GPU and rasterizing the graphics primitives to generate a plurality of pixels related to the graphics primitives and a plurality of respective bounding boxes related to the graphics primitives. Upon receiving an accumulation start command (e.g., from a graphics driver executing on the CPU 101), the bounding boxes are accumulated in an accumulation register coupled within the raster unit. The accumulation continues until an accumulation stop command is received (e.g., from the graphics driver). The operation results in the production of an accumulated bounding box that bounds the region occupied by all of the primitives received between the start command and the stop command. Access to the accumulated bounding box is enabled to facilitate a subsequent graphics rendering operation. Embodiments of the present invention and their benefits are further described below.
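The start/stop behavior described above can be modeled in software as follows. This is a hypothetical behavioral sketch, not a description of the register hardware: the class and method names, the (left, top, right, bottom) layout, and the choice to clear the register on the start command are all assumptions.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); assumed layout


class AccumulationRegister:
    """Hypothetical behavioral model of a bounding box accumulation register."""

    def __init__(self) -> None:
        self.active = False
        self.value: Optional[Box] = None

    def start(self) -> None:
        # Accumulation start command: assumed to clear any previous result.
        self.active = True
        self.value = None

    def stop(self) -> Optional[Box]:
        # Accumulation stop command: freeze and expose the accumulated box.
        self.active = False
        return self.value

    def on_primitive(self, box: Box) -> None:
        """Called once per rasterized primitive with its bounding box."""
        if not self.active:
            return  # outside the start/stop window: no effect
        if self.value is None:
            self.value = box
        else:
            l, t, r, b = self.value
            self.value = (min(l, box[0]), min(t, box[1]),
                          max(r, box[2]), max(b, box[3]))
```

Primitives submitted before the start command or after the stop command leave the accumulated value untouched, so the result bounds exactly the primitives received between the two commands.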
In one embodiment, as depicted in
Thus, as depicted in
The boustrophedonic pattern has advantages for maintaining a cache of relevant data and reducing the memory requests required for frame buffer and texture access. For example, generating pixels that are near recently generated pixels is important when recent groups of pixels and/or their corresponding texture values are kept in memories of a limited size (e.g., cache memories, etc.). Additional details regarding boustrophedonic pattern rasterization can be found in U.S. Patent Application “A GPU HAVING RASTER COMPONENTS CONFIGURED FOR USING NESTED BOUSTROPHEDONIC PATTERNS TO TRAVERSE SCREEN AREAS” by Franklin C. Crow et al., Ser. No. 11/304,904, filed on Dec. 15, 2005, which is incorporated herein in its entirety.
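A single-level boustrophedonic walk over a grid of tiles can be sketched as follows. This is a simplified illustration only: the referenced application describes nested boustrophedonic patterns, which are not reproduced here, and the names `boustrophedon` and `max_step` are hypothetical.

```python
def boustrophedon(cols, rows):
    """Visit a cols x rows grid of tiles, reversing direction on each row."""
    order = []
    for row in range(rows):
        # Even rows run left-to-right, odd rows right-to-left, so each row
        # starts next to where the previous row ended.
        cols_in_row = range(cols) if row % 2 == 0 else range(cols - 1, -1, -1)
        for col in cols_in_row:
            order.append((col, row))
    return order


def max_step(order):
    """Largest Manhattan distance between consecutively visited tiles."""
    return max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(order, order[1:]))
```

For an 8×4 grid, `max_step(boustrophedon(8, 4))` is 1, whereas a unidirectional row-by-row order such as `[(c, r) for r in range(4) for c in range(8)]` has a largest step of 8 (from the end of one row back to the start of the next), which is the kind of shift that works against a cache of limited size.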
It should be noted that although embodiments of the present invention are described in the context of boustrophedonic rasterization, other types of rasterization patterns can be used. For example, the algorithms and GPU stages described herein for rasterizing tile groups can be readily applied to traditional left-to-right, line-by-line rasterization patterns.
As depicted in
In one embodiment, the present invention utilizes an accumulation start command from a graphics driver executing on the CPU 101 in order to begin accumulating the bounding boxes. Each of the primitives that make up the objects 501 and 502 has its own corresponding bounding box (e.g., bounding box 401) as described above. Additionally, as described above, each bounding box corresponds to the outer limits of its respective primitive. Upon receiving the accumulation start command, the bounding boxes for the primitives comprising the objects 501 and 502 are accumulated by using an accumulation register.
As each of the primitives comprising the objects 501 and 502 is rasterized and processed, the outermost limits among all of the bounding boxes processed so far (e.g., the leftmost, rightmost, uppermost, and lowermost limits) are retained, such that the accumulated bounding box 510 is generated. As shown in
In this manner, the accumulated bounding box 510 can be thought of as a bounding box that precisely bounds all of the primitives that have been processed between the accumulation start command and the accumulation stop command. Thus for example, as shown in
In one embodiment, the accumulated bounding box is configured to utilize a depth component in addition to the two-dimensional left-right upper-lower components. For example, in such an embodiment, a three-dimensional accumulated bounding box has a nearest extent and a farthest extent, in addition to rightmost, topmost, leftmost and bottommost extents. The outer boundaries of the three-dimensional accumulated bounding box are built up out of the respective three-dimensional bounding boxes of the primitives rendered between the start time and the stop time. As such, the use and generation of this 3D accumulated bounding box is analogous to the 2D accumulated bounding box described above.
It should be noted that although embodiments of the present invention are described herein with respect to the term “bounding boxes”, other regions besides boxes can be used to implement the accumulating functionality described above. More generally, the objective is to accumulate a bounding region within the accumulation register, and the bounding region can take a number of different forms, of which a bounding box is one example. For example, a bounding region employing six points (e.g., a hexagon, etc.) as opposed to four points (e.g., a square, rectangle, quadrilateral, etc.) can be used to accumulate the bounding regions, bounding boxes, or the like for each of the primitives rendered between the start time and the stop time. Furthermore, an axis-aligned irregular octagon or a coarse bitmap representation (which is not a polygon) are additional examples that can be used to define the bounding region. Similarly, a 3D accumulated bounding region is not limited to a cuboid representation, or the like, and can be, for example, a coarse bitmap representation of irregular shape or orientation, or the like. Such implementations are within the scope of the present invention.
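As one example of a non-rectangular bounding region, a coarse bitmap can be accumulated by marking every coarse cell touched by each primitive's bounding box and combining the results with a bitwise OR. In the sketch below, the 32-pixel cell size, the row-major bit layout, and the function names are assumptions chosen for illustration.

```python
CELL = 32  # assumed coarse cell size in pixels


def box_to_mask(box, grid_w, grid_h, cell=CELL):
    """Set one bit per coarse cell touched by an inclusive pixel box."""
    left, top, right, bottom = box
    mask = 0
    for cy in range(max(top // cell, 0), min(bottom // cell, grid_h - 1) + 1):
        for cx in range(max(left // cell, 0), min(right // cell, grid_w - 1) + 1):
            mask |= 1 << (cy * grid_w + cx)  # row-major bit index
    return mask


def accumulate_mask(boxes, grid_w, grid_h, cell=CELL):
    """Accumulated bitmap bounding region: the OR of all per-primitive masks."""
    acc = 0
    for box in boxes:
        acc |= box_to_mask(box, grid_w, grid_h, cell)
    return acc
```

For two small primitives in opposite corners of a 4×4 cell grid, the bitmap marks only 2 of the 16 cells, whereas a single accumulated bounding box spanning both would cover all 16.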
Embodiments of the present invention enable access to the accumulated bounding box to facilitate one or more subsequent graphics rendering operations. This is illustrated in display screen 600 by the rendered objects 601 and 602. The rendered objects 601 and 602 result from, in this case, an OpenVG paint application operation. In the
In this manner, embodiments of the present invention provide a quick and efficient way to create a scissor box (e.g., by using the accumulated bounding box 510 as a scissor box) to implement a “spray paint” application similar to stenciling. The accumulated bounding box 510 prevents rendering of colors into unnecessary screen areas (e.g., outside the stencil). This is very helpful in those applications where it is very expensive to apply colors to a stenciled-out area. For example, OpenVG is a graphics programming interface that can allow the application of elaborate paints to the stenciled area (e.g., plaids, checkerboard paints, pre-rendered textures, etc.). Such OpenVG painting can be quite expensive in terms of both compute cycles and power consumption when such elaborate paints are applied on a per pixel basis. Since the process can be quite expensive, being able to apply the paint only in the limited scissor box area (e.g., accumulated bounding box 510) saves compute cycles and power. This allows shader effects to be applied to a smaller area (the accumulated bounding box) in comparison to the screen area.
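The saving described above can be illustrated by restricting an expensive per-pixel paint to the scissor box. In this sketch the checkerboard paint, the set-based stencil, and the function names are hypothetical stand-ins for an OpenVG paint and a stencil buffer; the point illustrated is only that pixels outside the scissor box are never visited.

```python
def checkerboard(x, y, size=4):
    """Stand-in for an elaborate paint: alternating 0/1 squares."""
    return ((x // size) + (y // size)) % 2


def paint_in_scissor(stencil, scissor, paint=checkerboard):
    """Evaluate paint only for stenciled pixels inside the inclusive scissor box.

    Returns the painted pixels and the number of pixel positions visited.
    """
    left, top, right, bottom = scissor
    painted = {}
    visited = 0
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            visited += 1
            if (x, y) in stencil:
                painted[(x, y)] = paint(x, y)
    return painted, visited
```

With a 10×10 stenciled region and an accumulated bounding box of the same extent, 100 positions are visited, compared with 2,304,000 for a full 1920×1200 screen.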
Another application of an accumulated bounding box in accordance with embodiments of the present invention is to use an accumulated bounding box in conjunction with a preview render. A preview render describes a case where one or more complex objects can be quickly rendered at a much lower geometric resolution in order to make intelligent decisions with regard to how the subsequent full resolution rendering is to be handled. In one embodiment, the primitives comprising the one or more complex objects are quickly rendered in rough geometric detail and the corresponding accumulated bounding box is generated. The accumulated bounding box can be quickly examined to determine whether or not the object(s) will actually appear on screen (e.g., whether any portion of the accumulated bounding box intersects the screen, drawing window, or the like). If the accumulated bounding box shows no intersection with the screen, window, or the like, or shows that the object(s) are depth occluded, the graphics rendering process can completely skip rendering the complex object(s), and consequently save a large number of computer cycles and power expenditure. If the accumulated bounding box shows the complex object(s) will intersect the screen, then the object is rendered in full geometric detail.
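The decision step of the preview render can be sketched as an overlap test between the accumulated bounding box from the low-detail pass and the screen or window rectangle. The inclusive (left, top, right, bottom) convention and the function names are assumptions, and the depth-occlusion check mentioned above is not modeled.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def intersects(a: Rect, b: Rect) -> bool:
    """True when two inclusive rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def needs_full_render(preview_box: Optional[Rect], window: Rect) -> bool:
    """Decide whether to render the complex object at full geometric detail."""
    if preview_box is None:
        return False  # the preview pass produced nothing to bound
    return intersects(preview_box, window)
```

On a 1920×1200 window, a preview box of (2000, 10, 2100, 50) lies entirely off screen, so the full-detail render can be skipped, while (1900, 1190, 2000, 1300) overlaps the lower-right corner and must be rendered.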
Another exemplary application of an accumulated bounding box is to provide a way to determine the on-screen size of a rendered object. An accumulated bounding box for an object can be used to determine how big an object will appear on screen, or how much screen area an object will consume, taking into account any depth occlusion. This can be used to determine how certain rendering effects can be implemented. For example, in a case where a stream of primitives is used to model a bright light source (e.g., the sun), data regarding the screen area that the light source will consume (e.g., as provided by an accumulated bounding box for the light source) can be used to implement camera lens flare for the light source.
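One possible way to turn the accumulated bounding box of a light source into a flare strength is sketched below: the box is clipped to the screen, and the visible fraction of the screen is mapped to an intensity. The linear ramp, the `full_at` threshold, and the function names are hypothetical choices for illustration, and depth occlusion is not modeled.

```python
def visible_area(box, screen):
    """Pixel area of an inclusive box after clipping it to the screen."""
    left = max(box[0], screen[0])
    top = max(box[1], screen[1])
    right = min(box[2], screen[2])
    bottom = min(box[3], screen[3])
    if right < left or bottom < top:
        return 0  # entirely off screen
    return (right - left + 1) * (bottom - top + 1)


def flare_intensity(box, screen, full_at=0.05):
    """Map the on-screen fraction covered by the light to a 0..1 flare strength.

    full_at: assumed fraction of the screen at which the flare saturates.
    """
    screen_area = (screen[2] - screen[0] + 1) * (screen[3] - screen[1] + 1)
    fraction = visible_area(box, screen) / screen_area
    return min(fraction / full_at, 1.0)
```

On a 100×100 screen, a 10×10 light covers 1% of the screen and yields a strength of about 0.2 under these assumed parameters.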
Another exemplary application of an accumulated bounding box is depth peeling. In one embodiment, a 3D accumulated bounding region is used to implement a depth peeling function for a rendered object. In such an implementation, an accumulated 3D bounding region can be used to scissor a stream of primitives being rendered such that primitives that are rendered outside the 3-D bounding region (e.g., primitives nearer than the near limit of the 3D bounding region) can be removed, or peeled away, analogous to the manner in which layers of an onion are peeled away to reveal the underlying visible layer. Accordingly, this functionality is referred to as depth peeling.
Depth peeling is particularly useful in those applications where complex objects are built up of many parts, for example, from a core, to an intermediate layer, to an outer layer (e.g., an internal combustion engine, a model of the human body, etc.). For example, during an initial rendering of the complex object, the start accumulation and stop accumulation commands can be issued to create a 3D bounding region. This 3D bounding region has depth limits and X Y limits such that the bounding region is smaller than the overall complex object. The 3D bounding region can then be used to scissor a subsequent rendering of the complex object. The scissor operation can remove the outer layer from the rendering so that the inner layer is visible (e.g., the engine without the engine block, the human body without the skin, etc.).
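The depth peeling use described above can be sketched in two steps: accumulate a 3-D bounding region from the inner parts rendered between the start and stop commands, then use it to scissor the fragments of a later pass. The six-tuple layout (x0, y0, z0, x1, y1, z1), the convention that a smaller z is nearer, per-fragment scissoring, and the function names are assumptions for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Box3 = Tuple[float, float, float, float, float, float]  # (x0, y0, z0, x1, y1, z1)
Fragment = Tuple[float, float, float]                   # (x, y, z)


def accumulate_3d(boxes: Iterable[Box3]) -> Optional[Box3]:
    """Union of 3-D boxes: min of the low/near limits, max of the high/far ones."""
    acc = None
    for b in boxes:
        if acc is None:
            acc = b
        else:
            lows = tuple(min(acc[i], b[i]) for i in range(3))
            highs = tuple(max(acc[i], b[i]) for i in range(3, 6))
            acc = lows + highs
    return acc


def peel(fragments: Iterable[Fragment], region: Box3) -> List[Fragment]:
    """Keep only fragments inside the region; nearer outer layers are removed."""
    x0, y0, z0, x1, y1, z1 = region
    return [f for f in fragments
            if x0 <= f[0] <= x1 and y0 <= f[1] <= y1 and z0 <= f[2] <= z1]
```

In the engine example, fragments of the engine block lie nearer than the near limit of the region accumulated from the engine's inner parts and are therefore discarded, while fragments of the inner parts survive.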
Additional details regarding depth peeling and applications thereof can be found in U.S. Pat. No. 6,989,840 “ORDER INDEPENDENT TRANSPARENCY RENDERING SYSTEM AND METHOD”, by Everitt et al., and U.S. Pat. No. 6,744,433 “SYSTEM AND METHOD FOR USING AND COLLECTING INFORMATION FROM A PLURALITY OF DEPTH LAYERS”, by Bastos et al., which are both incorporated herein in their entirety.
The
The raster unit 702 includes a coarse raster component 703 and a fine raster component 704. The coarse raster component 703 implements a coarse rasterization, as it rapidly searches a grid of tiles to identify tiles of interest (e.g., tiles that are covered by a primitive). Each tile comprises a group of pixels (2×2, 4×4, 8×8, 16×16, etc.). The coarse raster searches by these large groups, in contrast to individual pixels, in order to quickly traverse a screen area. Once the tiles of interest are identified, the fine raster component 704 individually identifies the pixels that are covered by the primitive. Hence, in such an embodiment, the coarse raster component 703 rapidly searches a grid of pixels by using tiles, and the fine raster component 704 uses the information generated by the coarse raster component 703 and implements fine granularity rasterization by individually identifying pixels covered by the primitive.
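The division of labor between the coarse raster component and the fine raster component can be sketched as follows. The sketch is illustrative only: it assumes 4×4 tiles, counter-clockwise triangles in a y-up convention, pixel-center sampling, and a conservative tile test that rejects a tile only when all four of its corners lie outside the same edge; the function names are hypothetical.

```python
TILE = 4  # assumed tile size in pixels (one of the sizes listed above)


def edge(a, b, p):
    """Edge function: >= 0 when p is on the inner (left) side of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def triangle_edges(tri):
    v0, v1, v2 = tri
    return [(v1, v2), (v2, v0), (v0, v1)]


def coarse_raster(tri, width, height, tile=TILE):
    """Return origins of tiles that may be covered by the triangle."""
    edges = triangle_edges(tri)
    tiles = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            corners = [(tx, ty), (tx + tile, ty), (tx, ty + tile), (tx + tile, ty + tile)]
            # An edge function is linear, so if it is negative at all four
            # corners it is negative over the whole tile: safe to skip.
            if any(all(edge(a, b, c) < 0 for c in corners) for a, b in edges):
                continue
            tiles.append((tx, ty))
    return tiles


def fine_raster(tri, tiles, width, height, tile=TILE):
    """Test individual pixel centers, but only inside tiles the coarse pass kept."""
    edges = triangle_edges(tri)
    covered = set()
    for tx, ty in tiles:
        for y in range(ty, min(ty + tile, height)):
            for x in range(tx, min(tx + tile, width)):
                center = (x + 0.5, y + 0.5)
                if all(edge(a, b, center) >= 0 for a, b in edges):
                    covered.add((x, y))
    return covered
```

For the triangle (0, 0), (16, 0), (0, 16) on a 16×16 screen, the coarse pass discards 3 of the 16 tiles without any per-pixel work, and the fine pass then finds the same 136 covered pixels that an exhaustive per-pixel scan would.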
The pixel test component 705 receives the pixels from the fine raster component 704. The pixel test component 705 is configured to determine whether some portion of the primitive being rendered will not be viewed. The pixel test component 705 implements a plurality of pixel test operations which can reduce the scope of the pixel coverage (e.g., reduce the number of pixels turned on by a coverage mask). The pixel test operations include, for example, depth tests, stencil tests, window ID tests, and the like. These pixel tests are implemented to determine whether one or more pixels, or even all of the pixels, related to the primitive will be turned off. The surviving pixels are then transmitted onward to the de-serializer 706. For example, in a case where no pixels related to the primitive survive (e.g., such as when the primitive is completely occluded), those pixels are discarded, and those pixels do not result in any bounding box accumulation for that primitive.
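The interaction between the pixel tests and bounding box accumulation noted above, where a fully occluded primitive contributes nothing, can be sketched as follows. Only a depth test is modeled. The dictionary depth buffer, the assumption that the accumulated box is tightened to the surviving pixels (rather than taken from the whole primitive), and the function name are hypothetical, and the depth buffer is not updated in this simplified sketch.

```python
def accumulate_surviving(primitives, depth_buffer, acc=None):
    """Accumulate (left, top, right, bottom) over pixels that pass a depth test.

    primitives: iterable of fragment lists, each fragment an (x, y, z) tuple.
    depth_buffer: dict mapping (x, y) to the closest stored depth.
    """
    for fragments in primitives:
        # "Less than" depth test; pixels never written count as infinitely far.
        survivors = [(x, y) for x, y, z in fragments
                     if z < depth_buffer.get((x, y), float("inf"))]
        if not survivors:
            continue  # e.g., completely occluded: no bounding box contribution
        xs = [x for x, _ in survivors]
        ys = [y for _, y in survivors]
        box = (min(xs), min(ys), max(xs), max(ys))
        if acc is None:
            acc = box
        else:
            acc = (min(acc[0], box[0]), min(acc[1], box[1]),
                   max(acc[2], box[2]), max(acc[3], box[3]))
    return acc
```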
The de-serializer 706 functions by unwinding pixel groups received from the fine raster component 704 via the pixel test component 705 into individual pixels that are then transferred on a one pixel per clock basis to the shader unit 707. The de-serializer 706 also functions as the link to an accumulation register 721 that accumulates the bounding boxes for graphics primitives as described above.
The shader unit 707 performs pixel shader processing for each of the pixels received from the de-serializer 706. The shader unit 707 operates on the pixels in accordance with the parameters iterated across each of the pixels. Once the shader unit 707 completes operation on a pixel, the pixel is transmitted to render operations unit 708. Render operations unit 708 performs back end rendering operations on the pixels received from the shader unit 707, and writes the completed pixels to frame buffer 114.
It should be noted that in the
Additionally, it should be noted that although the raster unit 702 is shown with two scissor registers 725 and 726, embodiments of the present invention can be implemented with a single scissor register that would be configured to provide both the custom scissoring functionality as described above and OpenGL compliant scissoring functionality.
Referring now to
Process 800 begins in step 801, where a raster unit of the GPU (e.g., raster unit 702) receives a plurality of graphics primitives (e.g., triangles) for rasterization. In step 802, the primitives are rasterized to generate a plurality of pixels related to the primitives and a plurality of respective bounding regions related to the primitives. In step 803, an accumulation start command is received from a graphics driver. In step 804, the bounding regions are accumulated in an accumulation register 721 coupled within the raster unit 702 of the GPU 110. In step 805, an accumulation stop command is received from the graphics driver, thus defining the limits of the accumulated bounding region. In step 806, access to the accumulated bounding region is enabled to facilitate one or more subsequent graphics rendering operations, as described above.
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.
Number | Name | Date | Kind |
---|---|---|---|
5010515 | Torborg, Jr. | Apr 1991 | A |
5313287 | Barton | May 1994 | A |
5452104 | Lee | Sep 1995 | A |
6501564 | Schramm et al. | Dec 2002 | B1 |
6744433 | Bastos et al. | Jun 2004 | B1 |
6956579 | Diard et al. | Oct 2005 | B1 |
6989840 | Everitt et al. | Jan 2006 | B1 |
7154066 | Talwar et al. | Dec 2006 | B2 |
7184040 | Tzvetkov | Feb 2007 | B1 |
7289119 | Heirich et al. | Oct 2007 | B2 |
20010005209 | Lindholm et al. | Jun 2001 | A1 |
20040085313 | Moreton et al. | May 2004 | A1 |
20050225670 | Wexler et al. | Oct 2005 | A1 |
20060245001 | Lee et al. | Nov 2006 | A1 |
20060267981 | Naoi | Nov 2006 | A1 |
20060274070 | Herman et al. | Dec 2006 | A1 |
20080118148 | Jiao et al. | May 2008 | A1 |