This relates to digital image processing and, particularly, to the rendering of motion blur and to related compression techniques.
Motion blur is generated when the shutter of the camera is open for a finite time and some relative motion occurs inside the field of view of the camera. It is an effect that is important for offline rendering for feature films, since the frame rate is rather low (~24 frames per second). With motion blur in the rendered images, apparent jerkiness in the animation can be reduced or removed entirely. However, motion blur is also becoming an important visual effect for real-time rendering, e.g., for games. In order to get good performance, various rather crude approximations, which may or may not apply in all cases, are used.
In general, motion blur rendering can be divided into two parts, namely, visibility determination and shading computations. Most solutions that converge to a correctly rendered image are based on point sampling. The more samples that are used, the better the resulting image, but at the same time the rendering cost goes up. In many cases, one can obtain reasonable results with rather few shader samples compared to the number of visibility samples. For example, RenderMan uses only a single shader sample for motion-blurred micropolygons.
The motion blur visibility problem may be solved analytically in order to avoid sampling noise. One embodiment involves rasterization of motion-blurred triangles with analytical visibility.
Assume that the entire transform matrix, including projection, for a vertex is called M. A vertex in homogeneous clip space is then obtained as $p = M\bar{p}$, where $\bar{p}$ is the vertex before transformation. In the following, a hat, e.g., $\hat{p}$, denotes the $(x, y, w)$ components of such a clip-space vertex.
The standard (no motion) edge function, in homogeneous form, through two vertices, say $\hat{p}_0$ and $\hat{p}_1$, is:

$$e(x, y, w) = (\hat{p}_1 \times \hat{p}_0) \cdot (x, y, w) = ax + by + cw. \quad (1)$$
A sample point, $(x, y, w)$, is inside the triangle if $e_i(x, y, w) \leq 0$ for $i \in \{0, 1, 2\}$, i.e., for all three edges of the triangle. Next, this is extended with a time dimension.
Assume that the vertices move linearly from the beginning of a frame, at $t = 0$, to the end of a frame, at $t = 1$. At $t = 0$, we denote the vertices as $q_i$, and we call them $r_i$ at $t = 1$. Since there is neither a bar nor a hat on these vertices, all $q_i$ and $r_i$ are in homogeneous clip space. A linearly interpolated vertex is then given as:
$$p_i(t) = (1 - t) q_i + t r_i, \quad (2)$$
for a certain instant $t \in [0, 1]$. The coefficients of a time-dependent edge equation are given by:
$$(a, b, c) = \hat{p}_1 \times \hat{p}_0 = \big((1 - t)\hat{q}_1 + t\hat{r}_1\big) \times \big((1 - t)\hat{q}_0 + t\hat{r}_0\big) = t^2 f + t g + h, \quad (3)$$
where:
$$h = \hat{q}_1 \times \hat{q}_0,$$
$$k = \hat{q}_1 \times \hat{r}_0 + \hat{r}_1 \times \hat{q}_0,$$
$$f = h - k + \hat{r}_1 \times \hat{r}_0,$$
$$g = -2h + k. \quad (4)$$
Each edge equation is now a function of time consisting of three functions: $(a(t), b(t), c(t))$, where, for example, $a(t) = f_x t^2 + g_x t + h_x$. Finally, the entire time-dependent edge equation is:
$$e(x, y, t) = a(t) x + b(t) y + c(t), \quad (5)$$
where we have set w=1 since rasterization is done in screen space (x,y).
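To make the construction concrete, the coefficient vectors f, g, and h of Equations 3 and 4 can be computed once per edge from the (x, y, w) components of the clip-space vertices at t = 0 and t = 1, after which the time-dependent edge function of Equation 5 can be evaluated for any sample position and time. The C++ sketch below is only an illustration of these formulas; the type and function names are not taken from any particular implementation.

```cpp
// Minimal vector of the (x, y, w) components of a clip-space vertex.
struct Vec3 { float x, y, w; };

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.w - a.w * b.y,
             a.w * b.x - a.x * b.w,
             a.x * b.y - a.y * b.x };
}
static Vec3 add(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.w + b.w }; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.w - b.w }; }
static Vec3 mul(const Vec3& a, float s)       { return { a.x * s, a.y * s, a.w * s }; }

// Coefficient vectors of one time-dependent edge equation:
// (a(t), b(t), c(t)) = f t^2 + g t + h   (Equations 3 and 4).
struct TimeEdge { Vec3 f, g, h; };

// q0, q1: the edge's vertices at t = 0; r0, r1: the same vertices at t = 1.
TimeEdge makeTimeEdge(const Vec3& q0, const Vec3& q1, const Vec3& r0, const Vec3& r1) {
    Vec3 h = cross(q1, q0);                      // h = q1 x q0
    Vec3 k = add(cross(q1, r0), cross(r1, q0));  // k = q1 x r0 + r1 x q0
    Vec3 f = add(sub(h, k), cross(r1, r0));      // f = h - k + r1 x r0
    Vec3 g = add(mul(h, -2.0f), k);              // g = -2h + k
    return { f, g, h };
}

// e(x, y, t) = a(t) x + b(t) y + c(t), with w = 1 (Equation 5).
float evalTimeEdge(const TimeEdge& e, float x, float y, float t) {
    float a = (e.f.x * t + e.g.x) * t + e.h.x;
    float b = (e.f.y * t + e.g.y) * t + e.h.y;
    float c = (e.f.w * t + e.g.w) * t + e.h.w;
    return a * x + b * y + c;
}
```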
For now, we assume that each pixel has a single sample point at (x0,y0). Extensions to multi-sampling and super-sampling just increase the sampling rate. If we consider a particular pixel, then (x0,y0) are constant. In this case, the time-dependent edge function becomes a function of time, t, alone:
$$e(x_0, y_0, t) = e(t) = a(t) x_0 + b(t) y_0 + c(t). \quad (6)$$
This expression can be expanded using Equation 3:

$$e(t) = \big(f \cdot (x_0, y_0, 1)\big) t^2 + \big(g \cdot (x_0, y_0, 1)\big) t + h \cdot (x_0, y_0, 1) = \alpha t^2 + \beta t + \gamma, \quad (7)$$

where $(\alpha, \beta, \gamma)$ are constants for a certain sample point, $(x_0, y_0)$. Hence, each edge equation is a quadratic function in $t$. Next, we introduce a binary inside-function, $i(t)$, as:

$$i(t) = \begin{cases} 1, & e(t) \leq 0,\\ 0, & \text{otherwise}, \end{cases} \quad (8)$$

i.e., $i(t) = 1$ for all $t \in [0, 1]$ when $(x_0, y_0)$ is inside the corresponding time-dependent edge equation. Note that the inside functions, $i_k(t)$, can be computed analytically by solving the second-degree polynomial in Equation 7.
For a moving triangle, we have three time-dependent edge functions, $e_k(t)$, where $k \in \{0, 1, 2\}$. The point $(x_0, y_0)$ will be inside the moving triangle when all inside functions are positive at the same time. This visibility function can be expressed as:

$$v(t) = i_0(t)\, i_1(t)\, i_2(t), \quad (9)$$

i.e., the product of the three inside functions.
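One possible way to evaluate the inside functions and the visibility function for a sample point is to solve each quadratic $\alpha_k t^2 + \beta_k t + \gamma_k$ analytically, collect the time spans in [0, 1] where the edge reports "inside" (using the $e \leq 0$ convention above), and then intersect the three span sets. The following C++ sketch assumes that the quadratic coefficients for the sample point have already been computed; names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// A time span [t0, t1], restricted to [0, 1].
struct Span { float t0, t1; };

// Time spans in [0, 1] where a*t^2 + b*t + c <= 0, i.e. where the binary
// inside-function i(t) of one edge equals 1 (using the e <= 0 inside test).
std::vector<Span> insideSpans(float a, float b, float c) {
    const float eps = 1e-12f, inf = 1e30f;
    std::vector<Span> out;
    auto clip = [&](float t0, float t1) {
        t0 = std::max(t0, 0.0f);
        t1 = std::min(t1, 1.0f);
        if (t0 < t1) out.push_back({ t0, t1 });
    };
    if (std::fabs(a) < eps) {                   // (nearly) linear: b*t + c <= 0
        if (std::fabs(b) < eps) { if (c <= 0.0f) clip(0.0f, 1.0f); }
        else if (b > 0.0f)      clip(-inf, -c / b);
        else                    clip(-c / b, inf);
        return out;
    }
    float disc = b * b - 4.0f * a * c;
    if (disc < 0.0f) {                          // no real roots: sign of 'a' decides
        if (a < 0.0f) clip(0.0f, 1.0f);
        return out;
    }
    float s  = std::sqrt(disc);
    float r0 = (-b - s) / (2.0f * a);
    float r1 = (-b + s) / (2.0f * a);
    if (r0 > r1) std::swap(r0, r1);
    if (a > 0.0f) clip(r0, r1);                 // opens upward: negative between the roots
    else { clip(-inf, r0); clip(r1, inf); }     // opens downward: negative outside the roots
    return out;
}

// Intersection of two sorted, disjoint span lists.
std::vector<Span> intersectSpans(const std::vector<Span>& A, const std::vector<Span>& B) {
    std::vector<Span> out;
    for (size_t i = 0, j = 0; i < A.size() && j < B.size(); ) {
        float lo = std::max(A[i].t0, B[j].t0);
        float hi = std::min(A[i].t1, B[j].t1);
        if (lo < hi) out.push_back({ lo, hi });
        if (A[i].t1 < B[j].t1) ++i; else ++j;
    }
    return out;
}
```

The spans where $v(t)$ equals one are then obtained by intersecting the three per-edge span lists, e.g., intersectSpans(intersectSpans(s0, s1), s2).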
We derive the equation for the depth during the time span where the sample point, $(x_0, y_0)$, is inside the moving triangle. Perspective-correct interpolation coordinates, $(u, v)$, can be used to interpolate any per-vertex attribute. This is done as:
$$s(u, v) = (1 - u - v)\, p_0 + u\, p_1 + v\, p_2, \quad (10)$$
where $p_k$ are the attribute vectors at the three vertices, and $s(u, v)$ is the interpolated attribute vector. The edge equations, $e_k$, can be used to compute $(u, v)$:

$$u = \frac{e_1}{e_0 + e_1 + e_2}, \qquad v = \frac{e_2}{e_0 + e_1 + e_2}. \quad (11)$$

Note that $u$, $v$, and all $e_k$ are functions of $(x_0, y_0)$, but this was left out to shorten notation. Equation 11 also holds when time-dependent edge equations are used.
The depth buffer stores interpolated depth values. Assuming that $p_k = (p_{xk}, p_{yk}, p_{zk}, p_{wk})$, $k \in \{0, 1, 2\}$, are the triangle vertices in clip space (before division by w), one first uses Equation 10 and then computes the depth as $d = s_z / s_w$ for a particular fragment with perspective-correct barycentric coordinates, $(u, v)$.
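For a static triangle, this depth follows directly from the edge-function values at the sample point, assuming the convention of Equation 11 for $(u, v)$. A minimal C++ sketch (illustrative names, no error handling for degenerate triangles):

```cpp
// Perspective-correct depth of a static triangle at a sample point, given the
// three edge-function values e0, e1, e2 at that point and the clip-space z and
// w components of the three vertices (Equations 10, 11 and 13).
float staticDepth(float e0, float e1, float e2, const float pz[3], const float pw[3]) {
    float sum = e0 + e1 + e2;
    float u = e1 / sum;                  // barycentric coordinates from edge functions
    float v = e2 / sum;
    float w0 = 1.0f - u - v;
    float sz = w0 * pz[0] + u * pz[1] + v * pz[2];
    float sw = w0 * pw[0] + u * pw[1] + v * pw[2];
    return sz / sw;                      // d = sz / sw, in [0, 1] for a typical projection
}
```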
When we turn from static triangles to moving triangles, $p_k$ are functions of time (Equation 2), and so are the edge equations. Let us first take a look at one of the texture coordinates, $u$ (see Equation 11):

$$u(t) = \frac{e_1(t)}{e_0(t) + e_1(t) + e_2(t)} = \frac{\alpha_1 t^2 + \beta_1 t + \gamma_1}{\sum_{k} \big(\alpha_k t^2 + \beta_k t + \gamma_k\big)}, \quad (12)$$
where the three time-dependent edge equations are $\alpha_k t^2 + \beta_k t + \gamma_k$ (Equation 7). The texture coordinate, $u$, becomes a rational polynomial of degree two in $t$. The major difference, when compared to static triangles, appears when the entire depth function is put together,

$$d = \frac{s_z}{s_w} = \frac{(1 - u - v)\, p_{z0} + u\, p_{z1} + v\, p_{z2}}{(1 - u - v)\, p_{w0} + u\, p_{w1} + v\, p_{w2}}, \quad (13)$$
where all $p_{zi}$ and $p_{wi}$ are functions of time according to Equation 2, and $u$ and $v$ are functions of time (Equation 12) as well. When these expressions replace the corresponding terms in Equation 13, we arrive at a cubic rational polynomial for the depth function for a certain sample point, $(x_0, y_0)$:

$$d(t) = \frac{m_3 t^3 + m_2 t^2 + m_1 t + m_0}{n_3 t^3 + n_2 t^2 + n_1 t + n_0}, \quad (14)$$

with coefficients that depend on the sample point.
Two advantages of using $d = s_z / s_w$ are that the depth is in the range $[0, 1]$, due to the way the projection matrix is set up, and that depth buffer compression can be implemented efficiently, since this depth is linear over a (static) triangle. Alternatively, $d = s_z$ can be used, which generates the same images, but the depth will now range between the near and the far plane, $[z_{\text{near}}, z_{\text{far}}]$. This simplifies the depth function for moving triangles. It is still a rational function in $t$ with degree three in the numerator, but the degree of the denominator is reduced to two, that is:

$$d(t) = \frac{m_3 t^3 + m_2 t^2 + m_1 t + m_0}{n_2 t^2 + n_1 t + n_0}. \quad (15)$$
In theory, the intersection of the three inside functions in the visibility function can result in at most four disjoint time spans where the resulting function, $v(t)$, is positive. This is because each inside function can consist of two disjoint positive parts. In practice, we have encountered at most three intervals when considering triangles that are front-facing for any value of $t$. Most commonly, however, only a single interval is generated for most triangles and samples.
The term "interval" denotes a range in the time dimension, together with the color and depth for that time range. An interval is denoted by Δ. In practice, the third-degree rational depth function (Equation 15) is approximated by a linear function. The motivation for this is that the depth function rarely varies much beyond such an approximation within a pixel, and it makes computations much faster. In addition, our experience with this approximation has been good.
Given the depth function approximation, each interval stores the following parameters (a data-structure sketch follows this list):

$t_i^s$: time at the beginning of the interval
$t_i^e$: time at the end of the interval
$z_i$: depth at the beginning of the interval
$k_i$: slope of the depth function
$c_i$: color of the interval (16)
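A minimal data structure for these parameters might look as follows. The construction from a visible time span simply samples the depth function at the span's endpoints to obtain the linear approximation, which is one straightforward way to fit it; the names and the depthAt callback are illustrative assumptions.

```cpp
// Per-pixel interval Delta_i: the parameters listed in (16), with the color
// stored as RGBA.
struct Interval {
    float ts, te;        // time at the beginning / end of the interval
    float z;             // depth at the beginning of the interval
    float k;             // slope of the linearized depth function
    float r, g, b, a;    // color of the interval
};

// Build an interval for a visible time span [ts, te] by sampling the depth
// function at the span's endpoints; 'depthAt' stands for the rational depth
// function of Equation 15 evaluated at this sample point, and is assumed to
// be provided by the caller.
template <class DepthFn>
Interval makeInterval(float ts, float te, DepthFn depthAt,
                      float r, float g, float b, float a) {
    float z0 = depthAt(ts);
    float z1 = depthAt(te);
    Interval iv;
    iv.ts = ts;
    iv.te = te;
    iv.z  = z0;
    iv.k  = (te > ts) ? (z1 - z0) / (te - ts) : 0.0f;  // slope from the endpoint fit
    iv.r = r; iv.g = g; iv.b = b; iv.a = a;
    return iv;
}
```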
Our interval is analogous to a fragment in rendering without motion blur, and an example of an interval is shown in
Interval insertion is illustrated in
Our approach is based on trying to keep the number of intervals stored per pixel small, and to facilitate compression when possible. When two intervals intersect, one can use clipping to create non-overlapping (in time) intervals. However, that can generate up to four intervals, which is undesirable. An intersection can also generate only two intervals, but in such cases, we also refrain from clipping since our compression mechanism works better with unclipped intervals. Note that using non-clipped intervals requires a slightly more complex resolve procedure. For opaque rendering, simple depth test optimizations can be included in this process as well, and this is shown in the bottom two illustrations 34 and 36 of
Note that to facilitate depth testing, we keep the intervals sorted on $t_i^s$ per pixel. This can be done during interval insertion using insertion sort, for example.
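For example, a simple sorted insertion step could look like this, assuming the Interval struct sketched earlier (the function name is illustrative):

```cpp
#include <vector>

// Insert a new interval into a pixel's list, keeping the list sorted on ts.
void insertSorted(std::vector<Interval>& pixel, const Interval& iv) {
    auto it = pixel.begin();
    while (it != pixel.end() && it->ts <= iv.ts) ++it;  // find the first later start time
    pixel.insert(it, iv);
}
```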
A rasterization with compression sequence, shown in
After rendering all moving and non-moving triangles, as indicated in
The resolve pass (blocks 46, 47 and 48) processes each pixel independently of the other pixels, and sweeps the sorted intervals in a pixel from time t=0 to t=1. During the sweep, we maintain a per-pixel list, called the Active List, of all intervals overlapping the current time of the sweep, as indicated in block 46 of
Between each pair of encounters (the three cases above), the final color for that particular subspan in time is computed for the n disjoint intervals. If a box filter is used, the colors of all intervals are simply combined into one final color, weighted by their duration in time. For the transparent resolve procedure, the only difference is that the color, $c_k$, of the resolved interval, $\Delta_k$, is computed by blending the intervals in the Active List in back-to-front order, based on the alpha component of each color.
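As an illustration of the opaque resolve with a box filter, the sketch below sweeps the event times of a pixel's intervals, picks the frontmost interval over each subspan using the linearized depth, and weights its color by the subspan's duration. It is a simplified stand-in for the Active-List sweep described above (no depth-test optimizations, no transparency); all names are illustrative.

```cpp
#include <algorithm>
#include <vector>

struct Color { float r, g, b; };

// Simplified opaque resolve of one pixel with a box filter. bgColor is used
// where no interval covers the time line.
Color resolveOpaque(const std::vector<Interval>& intervals, Color bgColor) {
    std::vector<float> events{ 0.0f, 1.0f };   // interval starts/ends plus the frame bounds
    for (const Interval& iv : intervals) { events.push_back(iv.ts); events.push_back(iv.te); }
    std::sort(events.begin(), events.end());

    Color out{ 0.0f, 0.0f, 0.0f };
    for (size_t e = 0; e + 1 < events.size(); ++e) {
        float t0 = std::max(events[e], 0.0f);
        float t1 = std::min(events[e + 1], 1.0f);
        if (t1 <= t0) continue;                // empty or out-of-frame subspan
        float tm = 0.5f * (t0 + t1);           // representative time of the subspan

        const Interval* front = nullptr;       // frontmost interval active at tm
        float bestDepth = 0.0f;
        for (const Interval& iv : intervals) {
            if (iv.ts <= tm && tm <= iv.te) {
                float d = iv.z + iv.k * (tm - iv.ts);   // linearized depth at tm
                if (!front || d < bestDepth) { front = &iv; bestDepth = d; }
            }
        }
        float w = t1 - t0;                     // box-filter weight = duration in time
        Color c = front ? Color{ front->r, front->g, front->b } : bgColor;
        out.r += w * c.r; out.g += w * c.g; out.b += w * c.b;
    }
    return out;
}
```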
When there are many small triangles with a relatively high degree of motion blur, each pixel may need to store a large number of intervals, $\Delta_i$, in order to exactly represent the color of the pixel. We have observed up to a few hundred intervals per pixel in extreme cases. This is clearly not desirable for a rasterization-based algorithm due to the extra pressure on the memory subsystem, that is, increased memory bandwidth usage. The problem can be alleviated by using a tiling architecture, where the triangles are sorted into tiles (rectangular regions of pixels) in a front-end pass, and per-pixel rendering is done in a back-end pass. Tiles can be processed in parallel in the back end by separate cores, since the rasterization and per-pixel work are independent at this point. However, a tiling architecture alone cannot solve the problem. Instead, we use lossy compression of intervals.
The merging of intervals is illustrated in the top part of
We use an oracle-based approach to attack this problem. Our oracle function is denoted:
$$o_{i,j} = O(\Delta_i, \Delta_j), \quad (18)$$
where the oracle function, $O(\cdot)$, operates on two intervals, $\Delta_i$ and $\Delta_j$, where $i < j$. The task of the oracle is to estimate how appropriate it is to merge the two input intervals. Given an oracle function, $O$, we compute oracle function values, $o_{i,j}$, for all $i$ and $j \in \{i+1, i+2, i+3\}$. For transparent scenes with a high level of depth overlap, we have increased the search range up to +10, instead of +3. In the next step, the interval pair with the lowest $o_{i,j}$ is merged. Depending on the implementation, this process may continue until the number of intervals per pixel falls within a desired range, or until no more appropriate merges are possible. Next, we describe an oracle, for two intervals, that generates a lower value the more appropriate they are to merge.
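The greedy, oracle-driven compression loop can be sketched as follows: for each interval, the oracle is evaluated against the next few intervals in the sorted list, the best-scoring pair is merged, and the process repeats until the per-pixel budget is met. The oracle and merge functions are passed in as placeholders for Equations 18-20; the budget and names are illustrative.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Greedy compression of a pixel's interval list (sorted on ts).
void compressIntervals(std::vector<Interval>& ivs, std::size_t maxIntervals,
                       float (*oracle)(const Interval&, const Interval&),
                       Interval (*merge)(const Interval&, const Interval&)) {
    const std::size_t searchRange = 3;  // could be raised to 10 for transparent scenes
    while (ivs.size() > maxIntervals && ivs.size() >= 2) {
        std::size_t bi = 0, bj = 1;
        float best = std::numeric_limits<float>::max();
        for (std::size_t i = 0; i + 1 < ivs.size(); ++i) {
            for (std::size_t j = i + 1; j < ivs.size() && j <= i + searchRange; ++j) {
                float o = oracle(ivs[i], ivs[j]);   // o_{i,j} of Equation 18
                if (o < best) { best = o; bi = i; bj = j; }
            }
        }
        ivs[bi] = merge(ivs[bi], ivs[bj]);          // replace Delta_i by the merged interval
        ivs.erase(ivs.begin() + static_cast<std::ptrdiff_t>(bj));  // drop Delta_j
    }
}
```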
Our oracle function, O, may be described, for example, by the following formula, where i<j:
$$O(\Delta_i, \Delta_j) = h_1 \max(t_j^s - t_i^e, 0) + h_2\,|$$

The first term favors merging of intervals that are located close in time (even overlapping). The second term
We will describe the merging of two intervals, Δi and Δj, into a new interval, Δ′i. The merge is described as: Δ′i=merge(Δi,Δj), i<j, where the new parameters are:
$$t'^{\,s}_i = t^s_i,$$
$$t'^{\,e}_i = \max(t^e_i, t^e_j),$$
$$z'_i = (1 - \alpha)\, z_i + \alpha \big(z_j - k_j (t^s_j - t^s_i)\big),$$
$$k'_i = (1 - \alpha)\, k_i + \alpha k_j,$$
$$c'_i = (1 - \alpha)\, c_i + \alpha c_j, \quad (20)$$
where $\alpha = (t^e_j - t^s_j) / \big((t^e_i - t^s_i) + (t^e_j - t^s_j)\big)$ is used to linearly blend the parameters depending on the lengths (in time) of the intervals being merged. As can be seen, the slope, $k'_i$, of the depth function and the color, $c'_i$, are simply linear interpolations of the input intervals' parameters. The depth, $z'_i$, is slightly more complex, because we need to blend the depths at the same instant in time. Since we want the new depth at time $t^s_i$, we evaluate $\Delta_j$'s depth function at $t^s_i$ and use that value for blending. For future work, it would be interesting to investigate other approaches to depth merging, e.g., where the area under the depth function is kept constant after compression. An example of merging two intervals is shown in
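A direct transcription of the merge in Equation 20, using the Interval struct sketched earlier, could look as follows; the per-channel color blend is an illustrative choice for representing $c_i$.

```cpp
#include <algorithm>

// Merge two intervals as in Equation 20, with
// alpha = (tje - tjs) / ((tie - tis) + (tje - tjs)).
// Interval 'i' is assumed to start no later than interval 'j' (i < j in the sorted list).
Interval mergeIntervals(const Interval& i, const Interval& j) {
    float li = i.te - i.ts;                       // duration of Delta_i
    float lj = j.te - j.ts;                       // duration of Delta_j
    float alpha = lj / (li + lj);                 // blend weight from the interval lengths
    Interval m;
    m.ts = i.ts;                                  // t's = tis
    m.te = std::max(i.te, j.te);                  // t'e = max(tie, tje)
    float zjAtTis = j.z - j.k * (j.ts - i.ts);    // Delta_j's depth evaluated at tis
    m.z = (1.0f - alpha) * i.z + alpha * zjAtTis; // blend depths at the same instant
    m.k = (1.0f - alpha) * i.k + alpha * j.k;     // blend slopes
    m.r = (1.0f - alpha) * i.r + alpha * j.r;     // blend colors per channel
    m.g = (1.0f - alpha) * i.g + alpha * j.g;
    m.b = (1.0f - alpha) * i.b + alpha * j.b;
    m.a = (1.0f - alpha) * i.a + alpha * j.a;
    return m;
}
```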
The computer system 130, shown in
In the case of a software implementation, the pertinent code may be stored in any suitable semiconductor, magnetic, or optical memory, including the main memory 132 or any available memory within the graphics processor. Thus, in one embodiment, the code to perform the sequences of
The graphics processing techniques described herein may be implemented in various hardware architectures. For example, graphics functionality may be integrated within a chipset. Alternatively, a discrete graphics processor may be used. As still another embodiment, the graphics functions may be implemented by a general purpose processor, including a multicore processor.
References throughout this specification to “one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation encompassed within the present invention. Thus, appearances of the phrase “one embodiment” or “in an embodiment” are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be instituted in other suitable forms other than the particular embodiment illustrated and all such forms may be encompassed within the claims of the present application.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.