This relates generally to computers and, particularly, to graphics processing.
Display technology is shifting: 3D displays and 3D TV are becoming mainstream, and 3D cinema is already widespread around the world. The next Nintendo handheld gaming device, the Nintendo 3DS, will have an autostereoscopic display. Furthermore, public stereo TV was launched in January 2010 in South Korea, and advertising companies are increasingly using 3D displays. 3D displays are clearly an active area, but there are very few specialized algorithms for 3D graphics for such displays.
An optimized rasterization algorithm may be used for stereoscopic and 3D multi-view graphics. The algorithm is based on analytical computations, in contrast to standard rasterization for multi-view graphics, which uses either accumulation buffering-like techniques or stochastic rasterization. Rendering real-time graphics for stereo or multi-view displays calls for rather high quality, especially for objects that are out of focus. Current solutions handle this poorly, because reaching high quality in these difficult regions is very costly in terms of computations and memory bandwidth usage.
In the following, bold characters are vectors (x,y,w) in homogeneous 2D space. It is well known that a time-continuous edge equation can be written as:
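$$e(t) = a(t)\,x + b(t)\,y + c(t),$$
for an edge through two vertices, $\mathbf{p}_1$ and $\mathbf{p}_0$, where
$$(a, b, c) = \mathbf{p}_1 \times \mathbf{p}_0 = t^2\,\mathbf{f} + t\,\mathbf{g} + \mathbf{h},$$
under the assumption that a vertex moves linearly:
$$\mathbf{p}_i(t) = (1 - t)\,\mathbf{q}_i + t\,\mathbf{r}_i.$$
The vectors $\mathbf{f}$, $\mathbf{g}$, and $\mathbf{h}$ are computed as:
$$\mathbf{f} = (\mathbf{r}_1 - \mathbf{q}_1) \times (\mathbf{r}_0 - \mathbf{q}_0),$$
$$\mathbf{g} = \mathbf{q}_1 \times (\mathbf{r}_0 - \mathbf{q}_0) + (\mathbf{r}_1 - \mathbf{q}_1) \times \mathbf{q}_0,$$
$$\mathbf{h} = \mathbf{q}_1 \times \mathbf{q}_0.$$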
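The expansion above can be checked numerically. The following is a minimal Python sketch, not part of the described embodiments; the helper names (`cross`, `lerp`, `edge_coefficients`, `edge_at`) are hypothetical and chosen only for illustration. It computes f, g, and h from the start and end positions of two vertices and evaluates the resulting quadratic in t.

```python
def cross(u, v):
    """Cross product of two 3-vectors stored as (x, y, w) tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def lerp(q, r, t):
    """Linearly moving vertex: p(t) = (1 - t) q + t r."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(q, r))


def edge_coefficients(q0, r0, q1, r1):
    """Return (f, g, h) such that p1(t) x p0(t) = t^2 f + t g + h."""
    d0 = tuple(b - a for a, b in zip(q0, r0))  # r0 - q0
    d1 = tuple(b - a for a, b in zip(q1, r1))  # r1 - q1
    f = cross(d1, d0)
    g = tuple(a + b for a, b in zip(cross(q1, d0), cross(d1, q0)))
    h = cross(q1, q0)
    return f, g, h


def edge_at(f, g, h, t):
    """Evaluate (a, b, c) = t^2 f + t g + h at time t."""
    return tuple(t * t * fi + t * gi + hi for fi, gi, hi in zip(f, g, h))
```

For any t, `edge_at(f, g, h, t)` should agree with `cross(lerp(q1, r1, t), lerp(q0, r0, t))` up to floating-point rounding.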
If you focus on a single pixel, it can be shown that the edge equation for motion blur becomes:
$$e(t) = \alpha t^2 + \beta t + \gamma.$$
Note that $q_{iy} = r_{iy}$ and $q_{iw} = r_{iw}$ (for a multi-view setting). That is, the y-coordinates and w-coordinates for a moving vertex, $\mathbf{p}_i(t)$, are the same for the start position, $\mathbf{q}_i$, and the end position, $\mathbf{r}_i$, as described in
So in summary, we obtain:
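$$\mathbf{f} = (0, 0, 0),$$
$$\mathbf{g} = \bigl(0,\; q_{1w}(r_{0x} - q_{0x}) - q_{0w}(r_{1x} - q_{1x}),\; q_{0y}(r_{1x} - q_{1x}) - q_{1y}(r_{0x} - q_{0x})\bigr),$$
and
$$\mathbf{h} = \mathbf{q}_1 \times \mathbf{q}_0.$$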
This is considerably less expensive to compute than the previous expressions for generalized motion blur. These computations would be done in a triangle setup, and while it is beneficial to have a faster triangle setup, the real gain comes from the fact that root finding becomes much faster with our equations. This is so because $\mathbf{f} = (0, 0, 0)$, which means that $e(t) = \alpha t^2 + \beta t + \gamma$ becomes $e(t) = \alpha t + \beta$, i.e., a first degree polynomial instead of a second degree polynomial (note that α and β are not necessarily the same α and β as in the second degree polynomial). So in our optimized situation, the parameters, (a, b, c), for the edge equation become:
$$(a, b, c) = (h_x,\; g_y t + h_y,\; g_z t + h_z).$$
As can be seen, a is no longer a function of t, and intuitively, this can be understood by making an analogy to non-homogeneous edge equations, where n=(a,b) is the “normal” of the edge. This normal is computed as $n = (a, b) = (-(y_1 - y_0),\; x_1 - x_0)$, where $(x_0, y_0)$ and $(x_1, y_1)$ are the screen space vertices of the edge. As can be seen, a only depends on the y-components, and for the multi-view case, the y-coordinates remain the same for all t, as we have seen above. The conclusion is that a must be constant.
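The simplified setup can be sketched as follows. This is an illustrative Python sketch under the stated multi-view assumption (each vertex keeps its y and w and moves only in x); the function names `multiview_edge_setup` and `edge_params` are hypothetical and do not come from the described embodiments.

```python
def cross(u, v):
    """Cross product of two 3-vectors stored as (x, y, w) tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def multiview_edge_setup(q0, r0, q1, r1):
    """Return (g, h) for an edge whose vertices move only in x.

    Assumes q_i and r_i share the same y and w, so f = (0, 0, 0)
    and only the x-displacements enter g.
    """
    d0x = r0[0] - q0[0]
    d1x = r1[0] - q1[0]
    g = (0.0,
         q1[2] * d0x - q0[2] * d1x,
         q0[1] * d1x - q1[1] * d0x)
    h = cross(q1, q0)
    return g, h


def edge_params(g, h, t):
    """(a, b, c) = (h_x, g_y t + h_y, g_z t + h_z); a does not depend on t."""
    return (h[0], g[1] * t + h[1], g[2] * t + h[2])
```

For example, with q0 = (1, 2, 1), r0 = (3, 2, 1), q1 = (4, 5, 2), and r1 = (5, 5, 2), the setup gives g = (0, 3, -8) and h = (1, -2, 3), and the first component of `edge_params` stays at 1 for every t.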
For a particular sample point, $(x_0, y_0)$, the edge equation becomes:
$$e(t) = t\,(g_y y_0 + g_z) + (h_x x_0 + h_y y_0 + h_z) = \alpha t + \beta.$$
The inside function, $i(t)$, equals 0 if $e(t) > 0$ and 1 otherwise. The visibility function is defined as $v(t) = i_0(t)\, i_1(t)\, i_2(t)$, as described by Gribel et al. “Analytical Motion Blur Rasterization With Compression,” High-Performance Graphics, pp. 163-172, 2010. The sample point is inside the triangle throughout the interval where $v(t) = 1$, as can be seen in
The second degree nature of time-continuous edge equations makes it possible for a triangle to cover multiple spans throughout t for each sample. In contrast to this, thanks to the first degree characteristic of the multi-view edge functions, the visibility function will only be v(t)=1 for one contiguous interval in t, which simplifies the algorithm further.
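Because each multi-view edge function is linear in t, the set of t where a single edge is inside is a half-line, and the intersection of three half-lines is at most one interval. A minimal Python sketch of that intersection follows. The function name `visible_interval` is hypothetical, and the parameter range [0, 1] is an assumption taken from the linear vertex motion between the start and end positions.

```python
def visible_interval(edges, t_min=0.0, t_max=1.0):
    """Return (t_enter, t_exit) where all edges are inside, or None.

    `edges` holds one (alpha, beta) pair per edge, for e_k(t) = alpha*t + beta
    evaluated at a fixed sample point. Following the inside function in the
    text, an edge is inside when e_k(t) <= 0.
    """
    lo, hi = t_min, t_max
    for alpha, beta in edges:
        if alpha > 0:
            # e_k(t) <= 0 holds for t <= -beta / alpha
            hi = min(hi, -beta / alpha)
        elif alpha < 0:
            # e_k(t) <= 0 holds for t >= -beta / alpha
            lo = max(lo, -beta / alpha)
        elif beta > 0:
            # constant edge function that is always outside
            return None
    return (lo, hi) if lo < hi else None
```

For instance, the three edges (2, -1), (-4, 1), and (0, -3) give the single interval (0.25, 0.5). A second degree edge function would instead require a root solver and could produce more than one span.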
Let us now focus on a particular scanline with $y = y_0$, and let x vary along this scanline. In this case, we arrive at a simplified edge equation:
$$e(x, t) = \alpha t + \gamma + h_x x,$$
where $\alpha = g_y y_0 + g_z$ and $\gamma = h_y y_0 + h_z$.
Let us look at what happens for two neighboring pixels, $(x, y_0)$ and $(x+1, y_0)$, and solve for t in $e(x, t_0) = 0$ and $e(x+1, t_1) = 0$:
$$t_0 = (-\gamma - h_x x)/\alpha,$$
$$t_1 = t_0 - h_x/\alpha.$$
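The relation between neighboring roots means that, once the first root on a scanline is known, each subsequent root costs a single add. The following Python sketch illustrates this under the assumption that α is nonzero on the scanline; the function name `scanline_roots` and its parameters are hypothetical.

```python
def scanline_roots(g, h, y0, x_start, count):
    """Roots t of e(x, t) = alpha*t + gamma + h_x*x for consecutive pixels.

    g and h are the (x, y, z) edge coefficient tuples, y0 is the scanline,
    and the roots are returned for x = x_start, x_start + 1, ...
    """
    alpha = g[1] * y0 + g[2]
    gamma = h[1] * y0 + h[2]
    if alpha == 0:
        raise ValueError("edge function does not depend on t on this scanline")
    t = (-gamma - h[0] * x_start) / alpha  # t_0 for the first pixel
    dt = -h[0] / alpha                     # t_1 - t_0, constant along the scanline
    roots = []
    for _ in range(count):
        roots.append(t)
        t += dt                            # one add per pixel step in x
    return roots
```

With g = (0, 3, -8), h = (1, -2, 3), and y0 = 2, this gives α = -2 and γ = -1, so the roots for x = 0, 1, 2, 3 are -0.5, 0.0, 0.5, and 1.0.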
This can be visualized in the epipolar plane at y0, as shown in
Based on these observations, we devise a new algorithm for quick analytical rasterization for multi-view graphics. There are several different embodiments here.
Starting values for t for each edge equation for n scanlines are computed, and with a single instruction multiple data (SIMD) width of n, we compute the next n t-values with SIMD instructions.
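As a rough illustration of the per-scanline setup, the sketch below emulates n SIMD lanes with plain Python lists, one lane per scanline. It is only a scalar stand-in for real SIMD instructions, it assumes α is nonzero on every scanline, and the names `lane_setup` and `lane_step` are hypothetical.

```python
def lane_setup(g, h, y_first, n, x_start):
    """Per-lane starting t and per-lane increment for n consecutive scanlines."""
    t, dt = [], []
    for lane in range(n):
        y = y_first + lane
        alpha = g[1] * y + g[2]   # alpha differs per scanline
        gamma = h[1] * y + h[2]
        t.append((-gamma - h[0] * x_start) / alpha)
        dt.append(-h[0] / alpha)
    return t, dt


def lane_step(t, dt):
    """Advance every lane by one pixel in x; maps to one SIMD add."""
    return [ti + di for ti, di in zip(t, dt)]
```

With g = (0, 1, 0), h = (2, 0, -4), two scanlines starting at y = 1, and x_start = 0, the lanes start at t = [4, 2] and move to [2, 1] after one step.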
In one embodiment of the algorithm, shown in
At block 16, y is set equal to y+n. A check at block 18 determines if y-n is outside the bounding box. If so, triangle rasterization is finished and, otherwise, the flow iterates back to processing scanlines.
In another embodiment of the algorithm, shown in
First, in flow 40b, for each of the n scanlines, initial t values are calculated, as well as the increments $\Delta t = -h_x/\alpha$ (block 52). Then x- and t-values for all pixels in the tile are computed in parallel (flow 40b, block 54, flow 40c, blocks 56 and 58). The visibility function is also evaluated in parallel (flow 40c, block 60), and, on success, the surviving samples are processed (block 62) (as described in the previous embodiment, flow 40a, blocks 44-48). Otherwise, the tile is done (block 64).
Then the flow returns to the left most flow 40a in
By solving the multi-view rasterization problem analytically, we avoid all types of noise in terms of visibility, in some embodiments. In some embodiments, the quality in terms of visibility is exact, i.e., it cannot be improved beyond our solution.
We also devise a technique for efficiently traversing a multi-view triangle. After some mathematics, we came to the conclusion that only simple adds are needed to traverse from one pixel to the neighboring pixel (in x), and this makes our traversal algorithms (we have two different embodiments) particularly fast.
We use analytical visibility computations over the camera line, instead of point sampling. We developed specialized edge equations that are fast to evaluate iteratively along a scanline, or over multiple scanlines using a SIMD instruction set. All of this makes for a very fast algorithm with high quality, in some embodiments.
The computer system 130, shown in
In the case of a software implementation, the pertinent code may be stored in any suitable semiconductor, magnetic, or optical memory, including the main memory 132 (as indicated at 139) or any available memory within the graphics processor. Thus, in one embodiment, the code to perform the sequences of
The graphics processing techniques described herein may be implemented in various hardware architectures. For example, graphics functionality may be integrated within a chipset. Alternatively, a discrete graphics processor may be used. As still another embodiment, the graphics functions may be implemented by a general purpose processor, including a multicore processor.
References throughout this specification to “one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation encompassed within the present invention. Thus, appearances of the phrase “one embodiment” or “in an embodiment” are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be instituted in other suitable forms other than the particular embodiment illustrated and all such forms may be encompassed within the claims of the present application.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.