The present invention is generally related to computer graphics systems.
Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a vertex, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform graphics rendering.
Specialized graphics processing units (e.g., GPUs, etc.) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data, where the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives (e.g., comprising “points”, “lines”, “triangles”, etc.) and produce real-time rendered images.
“Path rendering” is a well-established resolution-independent approach to 2D computer graphics characterized by the specification of graphics objects as paths. A path is a sequence of trajectories and contours. In this context, a trajectory is a connected sequence of path commands. Path commands include line segments, Bézier curve segments, and partial elliptical arcs. Each path command has an associated set of numeric parameters known as path coordinates. When a pair of path coordinates defines a 2D (x,y) location, this pair is a control point. Intuitively a trajectory corresponds to pressing a pen's tip down on paper, dragging it to draw on the paper, and eventually lifting the pen.
A contour is a trajectory with the same start and end point; in other words, a closed trajectory. These contours and trajectories may be convex, self-intersecting, nested in other contours, or may intersect other trajectories/contours in the path. There is generally no bound on the number of path segments or trajectories/contours in a path. For a so-called rendering “primitive”, paths can be quite complex.
Paths are rendered by either filling or stroking the path. Conceptually, path filling corresponds to determining what points (framebuffer sample locations) are logically “inside” the path. Stroking corresponds roughly to the region swept out by a fixed-width pen that is centered on the trajectory and held orthogonal to the trajectory's tangent direction as it travels along the trajectory.
Salient features of path rendering systems include the ability to both fill and stroke paths (see
GPUs can greatly accelerate the rendering of paths with a “stencil, then cover” method for path rendering. This method renders a path in two steps: first, stenciling the path's filled or stroked coverage into a stencil buffer; second, covering the path's filled or stroked coverage with a conservatively rasterized region. Normally each color sample has a single corresponding stencil sample. In order to minimize aliasing artifacts in the path rendering process, it is highly desirable to determine rasterized path coverage at multiple sub-pixel locations within a pixel. In order to achieve any acceptable level of quality for path rendering, path rasterization typically should test 8 or more sub-pixel locations per pixel.
“Stencil, then cover” methods can improve quality within a reasonable memory budget by maintaining a sufficient plurality of stencil samples per pixel and performing the rasterization for the stenciling step at all these sub-pixel locations. Existing anti-aliasing methods use supersampling or multi-sampling to maintain a one-to-one mapping of each color sample to a corresponding stencil sample and rasterization sample location. Such approaches are expensive because color samples are often 32-bit (or larger) values while stencil samples are often 8-bit (or smaller) values, so adding stencil samples under a one-to-one mapping requires a substantially greater amount of storage and processing for the accompanying color samples. There is a substantial processing, data storage, and power consumption cost associated with maintaining a corresponding color sample for every coverage location tested during the stenciling of paths. For this reason, it is advantageous to associate a single color sample with multiple stencil samples and their corresponding rasterization locations. Color samples can then be updated during the cover step based on the aggregate result of their associated plurality of stencil sample coverage determinations.
For example, if the GPU associates 4 stencil samples (and corresponding rasterization locations) with 1 color sample and maintains 4 color samples per pixel, the effective number of stencil samples per pixel is 16. In this configuration, the stencil step for “stencil, then cover” operates at 16 stencil samples per pixel while shading and blending during the “cover” step operates at 4 color samples per pixel. During the cover step, a fractional stencil test result is computed based on the 4 Boolean stencil test results for each stencil sample corresponding to the color sample. For example, if three of the four samples passed the stencil test during the cover step, the fractional stencil test result for the color sample would be 75% and the shaded color for the color sample should be modulated by 75% prior to blending and update of the color sample.
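By way of illustration only, the fractional stencil test described above may be sketched in software as follows. The function names, the RGBA tuple layout, and the use of premultiplied color are assumptions made for this sketch; they are not taken from any particular GPU implementation.

```python
from fractions import Fraction


def fractional_coverage(stencil_pass):
    """Return the fraction of a color sample's stencil samples that passed."""
    if not stencil_pass:
        raise ValueError("a color sample needs at least one stencil sample")
    passed = sum(1 for result in stencil_pass if result)
    return Fraction(passed, len(stencil_pass))


def modulate(color, coverage):
    """Scale every channel of a color by the fractional coverage.

    Assumption: the color is premultiplied RGBA, so all four channels scale.
    """
    return tuple(float(channel * coverage) for channel in color)


# Three of the four Boolean stencil tests pass, as in the example above.
coverage = fractional_coverage([True, True, True, False])
shaded = modulate((1.0, 0.5, 0.0, 1.0), coverage)
```

Here `coverage` evaluates to 3/4, and the shaded color is scaled by 75% before it would be blended into the color sample.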
A problem with this approach arises when the cover geometry is broken up into triangles for efficient hardware rasterization. The cover geometry is typically represented as a rectangle or polygonal convex hull conservatively bounding the path's region. GPU rasterizers typically rasterize arbitrary rectangles or convex polygons as two or more triangles to simplify and regularize the rasterization process. While logically the covering geometry is a single rectangle or polygonal convex hull, breaking the covering geometry into triangles creates internal seams or edges. This creates a situation where stencil samples and their associated rasterization locations associated with a single color sample may belong to different triangle primitives, forcing the associated color sample to “straddle” two (or more) triangle primitives. In this case, the color sample may be blended once per primitive, each time with only that primitive's partial coverage. This will lead to visible blending artifacts at these interior seams of the covering geometry. For example, if triangle M covers 2 stencil samples associated with color sample A while triangle N covers the other 2 stencil samples associated with color sample A and all the samples pass the stencil test, the color sample would be blended twice, each time modulated by 50%, when the correct behavior would be to blend once, modulated by 100%. Two blends modulated by 50% produce a different result than a single blend modulated by 100%. In general, any color samples straddling stencil samples belonging to different triangles of the covering geometry will be prone to incorrect blending.
The present invention addresses implementation and quality ramifications that result from extending the “stencil, then cover” approach to rasterization scenarios where a single color sample is associated with a plurality of stencil values and coverage locations.
A related problem is efficiently generating covering geometry for the cover step that guarantees it conservatively covers all the samples generated during the stencil step of “stencil, then cover” path rendering. Existing methods require the computation, storage, and rendering of polygonal bounding regions.
A method to avoid color samples straddling interior seams of covering geometry and that guarantees conservative covering geometry without explicit polygonal bounding regions is therefore desirable.
In one embodiment, the present invention is implemented as a GPU acceleration method for rendering paths. The method includes accessing data comprising a path, stenciling the path, wherein a bounding region of a plurality of stencil samples updated during the stenciling is accumulated, and provoking GPU hardware to produce a rasterized region for covering the bounding region as one object without interior seams. In the preferred embodiment, the bounding region is maintained as an image-space bounding box because of the ease of accumulating a conservative bounding box during rasterization and the ease of rendering such a rectangle without interior seams. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced with alternative bounding region representations.
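As a non-limiting software sketch of the summarized method, the following accumulates an image-space bounding box while samples are stenciled and then visits the box as one rectangle. The class name, method names, and sample positions are hypothetical; in an actual embodiment the accumulation would be performed by GPU rasterization hardware rather than by code of this kind.

```python
import math


class BoundingBoxAccumulator:
    """Accumulates a conservative pixel-aligned box around stenciled samples."""

    def __init__(self):
        self.min_x = self.min_y = math.inf
        self.max_x = self.max_y = -math.inf

    def add_sample(self, x, y):
        """Grow the box to include the whole pixel containing sample (x, y)."""
        px, py = math.floor(x), math.floor(y)
        self.min_x, self.min_y = min(self.min_x, px), min(self.min_y, py)
        self.max_x, self.max_y = max(self.max_x, px + 1), max(self.max_y, py + 1)

    def is_empty(self):
        return self.min_x == math.inf

    def pixels(self):
        """Yield each pixel of the box exactly once, as one seamless rectangle."""
        if self.is_empty():
            return
        for py in range(self.min_y, self.max_y):
            for px in range(self.min_x, self.max_x):
                yield (px, py)


# Stencil step: record every sample position that was updated (made-up values).
box = BoundingBoxAccumulator()
for sample in [(2.3, 5.9), (4.1, 3.2), (3.5, 4.5)]:
    box.add_sample(*sample)

# Cover step: visit the accumulated box as a single object.
covered = list(box.pixels())
```

Because the rectangle is traversed as one object, no pixel appears twice in `covered`, which is the property that rules out the double blending at interior seams.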
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of non-transitory electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer readable storage medium of a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system (e.g., computer system 800 of
Embodiments of the present invention implement a new rasterization mode primarily useful for path rendering. In conventional rasterization, primitives (such as triangles) are rasterized one at a time, and the rasterization process first determines which “coarse” tiles may be updated by the triangle and then for each coarse tile determines exactly which samples are covered by the triangle. For conventional 3D rendering that is almost always the desired result. However, GPU acceleration of path rendering is implemented as a two-step process where the first step generates “coverage” information in one buffer and the second step shades all covered samples. The second step typically resets the coverage information determined in the first step so the “stencil, then cover” process can be repeated for additional paths to render.
When each color sample has a single associated stencil sample and coverage location, the “stencil, then cover” process 200 operates correctly in the presence of internal seams 321 or 322. However the present invention addresses the case when a color sample is associated with a plurality of stencil values and coverage locations.
When path rendering is implemented with a “Multi-Stencil” optimization, there are more coverage samples in the stencil buffer than in the color buffer. During the Cover step, the coverage information is converted into opacity stored in the alpha channel and the color is modulated by this opacity. If the cover geometry is rasterized as a sequence of triangles, then some pixels on the edge of a triangle may be partially covered by multiple triangles, and blending such pixels once per triangle will produce artifacts. Embodiments of this invention are therefore needed to produce correct cover results with multi-stencil, particularly when the path is subject to arbitrary transformations on the GPU.
Each color sample illustrates a different situation. Color samples A (411) and D (414) straddle the internal seam 405 while color samples B (412) and C (413) do not straddle.
Color sample C (413) has all of its stencil samples covered by the path (shown as solid dots enclosed by 413's circle) and is completely within the lower triangle of bounding box 401. This means the color sample is 100% covered and will be rasterized by a single covering triangle primitive. Color sample B (412) is similar in that all its coverage samples are completely within the upper triangle of bounding box 401, but only 75% of its stencil samples are covered by rasterized path 410. For both color samples 413 and 412 all the stencil samples for the color samples are rasterized by a single covering triangle primitive (the upper triangle for color sample 412 and lower triangle for color sample 413).
Path rasterization artifacts however arise in the case of color samples A and D (411 and 414) because these color samples have associated stencil values and coverage locations that straddle the internal seam 405. The source of these path rasterization artifacts is that the coverage of the straddling color samples is “split” between two different triangles making up bounding box 401.
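For illustration, a simple test for whether a color sample straddles an internal seam may be sketched as follows. The seam endpoints and stencil sample locations below are invented coordinates, not values read from the figures, and the handling of samples lying exactly on the seam is an assumption of this sketch (actual rasterizers apply their own tie-breaking rules).

```python
def side_of_seam(point, seam_start, seam_end):
    """Return +1, -1, or 0 for which side of the seam line the point lies on."""
    cross = ((seam_end[0] - seam_start[0]) * (point[1] - seam_start[1])
             - (seam_end[1] - seam_start[1]) * (point[0] - seam_start[0]))
    return (cross > 0) - (cross < 0)


def straddles(stencil_locations, seam_start, seam_end):
    """True when a color sample's stencil locations fall on both sides of the seam."""
    sides = {side_of_seam(p, seam_start, seam_end) for p in stencil_locations}
    sides.discard(0)  # assumption: on-seam samples are ignored in this sketch
    return len(sides) > 1


# Hypothetical diagonal seam of a bounding box split into two triangles.
seam = ((0.0, 0.0), (8.0, 8.0))

# One color sample whose stencil locations sit on both sides of the diagonal.
split_sample = [(3.25, 3.75), (3.75, 3.25), (3.25, 3.25), (3.75, 3.75)]
# One color sample entirely within a single triangle.
whole_sample = [(1.25, 5.25), (1.75, 5.25), (1.25, 5.75), (1.75, 5.75)]

split_result = straddles(split_sample, *seam)
whole_result = straddles(whole_sample, *seam)
```

In this sketch `split_result` is true and `whole_result` is false, mirroring the distinction drawn between color samples A and D on the one hand and B and C on the other.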
Concentrating on color sample A 411, its stencil samples 431 are rasterized by the upper triangle while the stencil sample 432 is rasterized by the lower triangle. Color sample 411 is completely within the path 410 and so 100% covered but it will be 75% covered by the upper triangle and 25% covered by the lower triangle. The color of the color sample will therefore be updated twice, once with 25% coverage by the lower triangle and once with 75% coverage by the upper triangle.
The implementation of these two updates is examined in further detail. Assuming the lower triangle is rasterized first, when rendered this way, the 25% coverage update takes 25% of the path color and 75% of the background color. Then when the upper triangle is rasterized, 75% of the path color is combined with 25% of the prior sample color. Because the prior sample color includes 75% of the background color, some of the background color will “leak” into the coloring of rasterized path 410 along internal seam 405. However this should not be the case when, despite internal seam 405, the coverage of color sample 411 is 100%, meaning no background color should be present. This artifact will happen whenever an internal seam splits the stencil sample coverage for a color sample. This is undesirable and the present invention, as will be detailed, avoids this type of path rendering artifact, thereby avoiding compromised path rendering quality. Moreover path rendering standards generally disallow blending color samples more than once when rendering a path.
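The arithmetic of the two updates can be worked through with a short sketch. The blend function below is the coverage-weighted mix described in the preceding paragraph; the black background and white path color are arbitrary values chosen for illustration.

```python
def blend(dst, src, coverage):
    """Mix `coverage` of the source color with (1 - coverage) of the destination."""
    return tuple(coverage * s + (1.0 - coverage) * d for s, d in zip(src, dst))


background = (0.0, 0.0, 0.0)  # hypothetical background color
path_color = (1.0, 1.0, 1.0)  # hypothetical opaque path color

# Split update: lower triangle first (25%), then upper triangle (75%).
after_lower = blend(background, path_color, 0.25)
split = blend(after_lower, path_color, 0.75)

# Correct update: one blend at the true 100% coverage.
single = blend(background, path_color, 1.0)

# Fraction of background that leaked into a fully covered color sample.
leak = single[0] - split[0]

# The 50%/50% case: two half-coverage blends instead of one full blend.
halves = blend(blend(background, path_color, 0.5), path_color, 0.5)
```

With these values the split update yields 0.8125 per channel rather than 1.0, so 18.75% of the background leaks through; the 50%/50% case yields 0.75, a 25% leak.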
Concentrating on color sample D 414, a similar but different situation is shown. In this case, two stencil samples fall on either side of the internal seam 405. While the top triangle rasterizes with just 25% of the samples covered because one stencil sample 423 is uncovered by the path, the bottom triangle rasterizes with 50% of the stencil samples covered. While the specific situation is different from color sample A 411, the underlying situation of the internal seam 405 “splitting” the coverage remains. Again, this leads to compromised path rendering quality.
The cover step in “stencil, then cover” process 500 is performed by operation 505. This operation is analogous to path covering 204 in the prior art, except rather than rasterize the bounding representation as a set of triangle primitives, which leads to internal seams 321, 322, and 405, the bounding region is rasterized directly. Skilled practitioners of the art will recognize that an image-space aligned bounding box is readily rasterized as a screen-space aligned rectangle without internal seams rather than as two (or more) triangles.
In this manner embodiments of the present invention provide a means to GPU-accelerate existing path rendering standards. Implementations of existing standards can re-target their path rendering to benefit from the substantial quality and performance benefits of GPU acceleration. For example, a web browser could be implemented to render SVG through GPU acceleration to achieve immersive web experiences at higher resolutions and quality levels than possible with only CPU-based path rendering.
In alternate embodiments, the accumulated bounding box could be any conservative bounding region representation of the “coverage” step's coverage. For example, a GPU with multiple rasterizers, each owning different sub-regions of the frame buffer, might keep a bounding box per rasterizer. As another example, the rasterizer might keep a list of unique coarse tiles visited during the “coverage” step. Such alternative conservative bounding regions may provide tighter bounds than a simple bounding box. This has the advantage of allowing the second shading step to examine, shade, and update potentially fewer pixels.
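The coarse-tile alternative may be sketched as follows, purely as an illustration of why a tile list can be tighter than a single bounding box. The 16-pixel tile size, the function names, and the sample positions are assumptions of this sketch.

```python
import math

TILE_SIZE = 16  # hypothetical coarse tile edge length, in pixels


def visited_tiles(samples, tile_size=TILE_SIZE):
    """Return the set of unique coarse tiles containing the given samples."""
    return {(math.floor(x) // tile_size, math.floor(y) // tile_size)
            for x, y in samples}


def bounding_box_tile_count(tiles):
    """Number of tiles spanned by the single box enclosing all visited tiles."""
    xs = [tx for tx, _ in tiles]
    ys = [ty for _, ty in tiles]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


# Two distant clusters of stenciled samples (made-up positions).
samples = [(1.5, 1.5), (2.5, 3.0), (100.5, 100.5)]
tiles = visited_tiles(samples)
box_tiles = bounding_box_tile_count(tiles)
```

For these samples the tile list holds 2 tiles, while the enclosing bounding box spans 49 tiles, so the cover step would examine far fewer pixels when driven by the tile list.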
Operation 503 may be split into two operations, where stenciling the path 203 is separate from collecting the bounding region. Sometimes the “cover” step is not performed immediately after the “stencil” step, so the bounding region must be regenerated. In this case, any geometry that bounds the stencil geometry should produce a sufficient bounding region, although possibly more conservative than necessary.
Skilled practitioners will recognize that the rasterization functionality in
Several advantages of the present invention are now discussed. The conservative nature of this rasterization is also beneficial when implementing “shared edge” support for path rendering. In this case, cover geometry constructed from rasterizing triangles may not be sufficient to cover samples along a shared outer edge of the path, but a bounding box bloated to a coarse raster tile will cover all samples.
Another benefit is guaranteeing each and every sample visited during the “stencil” step will be shaded during the “cover” step. This provides a level of robustness that is difficult to guarantee if object-space covering geometry is transformed instead. Under extreme projective transform, it is difficult to numerically guarantee that every sample visited during the “coverage” step is necessarily visited during the shading step because different geometry is being rendered for the cover geometry in the shading step than during the prior “coverage” step. Embodiments of the present invention however provide a robust guarantee because the shading step's geometry must reflect coverage determined in the “coverage” step.
A conservative rasterization that always fully covers coarse raster tiles is also beneficial to stencil compression because it allows the GPU's rasterization and depth-stencil processing (or ZROP) units to easily conclude that a fully covered tile is trivially re-compressible. This can increase fill rate and reduce memory bandwidth and power consumption without requiring complex logic and speculative memory reads to determine whether a partially covered tile is re-compressible.
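One way to picture a bounding box bloated to coarse raster tiles is the following sketch, which snaps the box outward to tile boundaries so that every touched tile is fully covered. The function name, the half-open box convention, and the 16-pixel tile size are assumptions made for illustration.

```python
def bloat_to_tiles(min_x, min_y, max_x, max_y, tile_size=16):
    """Expand a half-open pixel box outward to the enclosing tile boundaries.

    The minimum corner rounds down and the maximum corner rounds up, so the
    result always contains the input box. tile_size is a hypothetical value.
    """
    def round_down(v):
        return (v // tile_size) * tile_size

    def round_up(v):
        return -((-v) // tile_size) * tile_size

    return (round_down(min_x), round_down(min_y),
            round_up(max_x), round_up(max_y))


bloated = bloat_to_tiles(3, 17, 30, 33)
aligned = bloat_to_tiles(16, 16, 32, 32)
```

Here the box (3, 17, 30, 33) grows to (0, 16, 32, 48), while a box that is already tile-aligned is returned unchanged.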
The conservative bounding box used in this invention can either be constructed from a set of vertices provoked explicitly for this purpose, or can be reused and inherited from geometry used in the stencil step. If it can be reused from the stencil step, then in some cases the entire cover step can be executed without executing any vertex transformation work on the GPU, which may lead to significant power savings. If the cover geometry has complex shading requiring interpolated attributes, then it may be desirable to provoke geometry for the bounding box that includes attributes from which the setup unit can derive plane equations. The rasterized bounding box only requires a single set of plane equations per attribute (as opposed to per triangle), so the GPU has freedom in choosing which plane equations to select.
It should be appreciated that the GPU 830 can be implemented as a discrete component, a discrete graphics card designed to couple to the computer system 800 via a connector (e.g., AGP slot, PCI-Express slot, etc.), a discrete integrated circuit die (e.g., mounted directly on a motherboard), or as an integrated GPU included within the integrated circuit die of a computer system chipset component (not shown). Additionally, a local graphics memory 820 can be included for the GPU 830 for high bandwidth graphics data storage.
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.