The present technology relates to compression of polygon mesh displacement data for computer graphics including but not limited to ray and path tracing. The technology herein provides a custom compression algorithm for generating high quality crack-free displaced micromeshes (“DMMs”) for computer graphics, while being fast enough to handle dynamic content in modern real-time applications.
Still more particularly, the technology herein relates to a method for computing a compressed representation of dense triangle meshes such as for ray tracing workloads, and using lossy compression techniques to more efficiently store geometric displacements of polygon meshes such as for ray and path tracing while maintaining watertightness.
As graphics rendering fidelity has increased and the graphics industry has made huge strides in how to model the behavior of light and its interactions with objects within virtual environments, there is now a huge demand for very detailed, more realistic virtual environments. This has meant a huge increase in the amount of geometry that developers would like to model and image. However, memory bandwidth remains a bottleneck that limits the amount of geometry that graphics hardware can obtain from memory for rendering.
In the past, tessellation shaders addressed the memory bandwidth problem by generating—on the fly—a polygon mesh (see
Such a tessellated mesh is said to be “watertight” when there are no gaps between polygons. The mesh is said to not be “watertight” if—pretending the mesh were a real object immersed in water—water would leak in through any seams or holes between geometric shapes or polygons forming the mesh. Even tiny gaps between polygons can lead to missing pixels that can be seen in a rendered image. See
One source of such gaps resulted from performing floating-point operations in different orders—which did not always give the same results. Unfortunately, ordering shader calculations to make them identical for neighboring patches could cost a lot in performance. T-junctions—another watertight tessellation problem—occur when a patch is split even though one or more of its edges are flat. If the patch sharing the flat edges is not also split the same way, then a crack is created. See
Cracks and pixel dropouts were thus known to result from differing levels of tessellation, from the formation of T-junctions, due to computation issues, and for other reasons. Because any practical system represents the location of any given vertex using finite precision, vertices do not (to the detailed calculation and processing hardware) always in fact precisely lie on adjoining segments between polygons. Although this problem may be exacerbated by the lower precision of some hardware rasterizers and other graphics hardware, it exists for any finite precision representation, including IEEE floating point.
Previous approaches often required solving a complex global optimization problem in order to maximize quality without introducing cracks. But the only way to guarantee a flawless rendering is through precise representation of relationships; vertices that are logically equal must be exactly equal. See Moreton et al (2001). Furthermore, real-time graphics applications often need to compress newly generated data on a per frame basis (e.g., the output of a physics simulation) before it can be rendered. Thus, to satisfy the demands of current graphics systems, one must be both very careful and very fast in processing what amounts to a firehose of information.
Ray tracing performance scales nicely as geometric complexity increases, making it a good candidate for visualization of such more complex and realistic environments. As an example, it is possible using ray tracing to increase the amount of geometry modeling a scene by a factor of 100 and not incur much of a time performance penalty (for example, tracing time might double—but generally not increase by anything close to a hundredfold).
The problem: even though real time or close to real time processing of vast numbers of triangles is now practical, the acceleration data structures needed to support tracing such increased complexity geometry have the potential to grow in size linearly with the increased amount of geometry and could take an amount of time to build that similarly increases linearly with the amount of geometry. Complex 3D scenes composed of billions of triangles are onerous to store in memory and transfer into the rendering hardware. A goal is to make it possible to dramatically increase the amount of geometry while avoiding a proportional increase in the time it takes to build an acceleration data structure or the space it takes to store the acceleration data structure in memory.
Work to compress polygon meshes for ray and path tracing has been done in the past. See for example Thonat et al, “Tessellation-free displacement mapping for ray tracing”, ACM Transactions on Graphics, Volume 40, Issue 6, Article 282, pp. 1-16 (December 2021), doi.org/10.1145/3478513.3480535, dl.acm.org/doi/abs/10.1145/3478513.3480535; Wang et al, “View-dependent displacement mapping”, ACM Transactions on Graphics, Volume 22, Issue 3, pp. 334-339 (July 2003), doi.org/10.1145/882262.882272; Lier et al, “A high-resolution compression scheme for ray tracing subdivision surfaces with displacement”, Proceedings of the ACM on Computer Graphics and Interactive Techniques, Volume 1, Issue 2, Article 33, pp. 1-17 (August 2018), doi.org/10.1145/3233308; Chun et al, “Multiple layer displacement mapping with lossless image compression”, International Conference on Technologies for E-Learning and Digital Entertainment (Edutainment 2010): Entertainment for Education, Digital Techniques and Systems, pp. 518-528; Szirmay-Kalos et al, “Displacement Mapping on the GPU—State of the Art”, Computer Graphics Forum, Volume 27, Issue 6, pp. 1567-1592 (September 2008).
However, there is much room for improving how to represent polygon meshes for applications including but not limited to ray and path tracing in more compact, compressed forms that achieve “watertightness”. In particular, there are several reasons why consistent mesh generation and representation are not simple. As one example, forward differencing can suffer from round-off error when evaluating a long sequence of vertices of a tessellated mesh. This problem can sometimes be made worse if the compressor and decompressor use different computation hardware. Even if the implementations were identical, the same inputs with differing rounding modes might yield unequal results. Also, if different patches are processed independently, it is simply not possible to match things up as you go or clean up small discrepancies after the fact—rather, consistent triangle mesh representation, compression, decompression and processing should be accomplished from the beginning as a part of the design. It is important to realize that in order to have a guarantee of perfect watertight rendering there can be no errors or inconsistencies—not even a single bit. See Moreton et al, Watertight Tessellation using Forward Differencing, EGGH01: SIGGRAPH/Eurographics Workshop on Graphics Hardware (2001).
Embodiments herein employ a fast compression scheme that enables encoding sub triangles of a triangle mesh in parallel, with minimal synchronization, while producing high quality results that are free of cracks.
The introduction of Displaced Micro-meshes (DMMs) fills the aforementioned gap by helping to solve the memory bandwidth problem. See the micromesh patent applications. Very high quality, high-definition content is often very coherent, or locally similar. In order to achieve dramatically increased geometric quantities, we can use μ-mesh (also “micromesh”)—a structured representation of geometry that exploits coherence for compactness (compression) and exploits its structure for efficient rendering with intrinsic level of detail (LOD) and animation. Micromesh is a powerful concept that has the ability to yield substantial speed and efficiency increases; for example, a huge advantage of micromesh tracing is the ability to rapidly and efficiently cull large portions of the mesh. The μ-mesh structure can for example be used to avoid large increases in bounding volume hierarchy (BVH) construction costs (time and space) while preserving high efficiency. When rasterizing, the intrinsic μ-mesh LOD can be used to rasterize right-sized primitives.
While applying displacement mapping to micromesh enables efficient rendering of highly complex 3D objects, as noted above, compressed numerical representations for the displacement map can create problems with “watertightness” if not implemented carefully. In particular, any lossy compression used to represent localized displacement map numerical representations has the potential to create cracks in the visualized/rendered micromesh if not handled appropriately.
Example embodiments herein provide a custom compression algorithm for generating high quality crack-free displaced micromeshes (“DMMs”), while being fast enough to handle dynamic content in modern real-time applications. The technology herein succeeds in providing a crackfree micromesh in the form of a structured representation that enables it to be stored in very compact, compressed formats. In some example implementations, the average storage space per triangle is decreased from the typical ˜100 bits per triangle to on the order of only 1 or 2 bits per triangle.
In one embodiment, such compression is achieved through a novel hierarchical encoding scheme using linearly interpolated vertex displacement amounts between minimum and maximum triangles forming a prismoid.
Furthermore, to satisfy the requirements above, we developed a fast compression scheme that enables encoding sub triangles in parallel, with minimal synchronization, while producing high quality results that are free of cracks.
In one embodiment, displacement amounts can be stored in a flat, uncompressed format such that, for example, an unsigned normalized value (such as UNORM11) for any microvertex can be directly accessed. Displacement amounts can also be stored in a new compression format that uses a predict-and-correct mechanism.
One embodiment of our compression algorithm constrains correction bit widths so the set of displacement values representable with a given μ-mesh type is a strict superset of all values representable with a more compressed μ-mesh type. By the encoder organizing the μ-mesh types from most to least compressed, we can proceed to directly encode sub triangles in “compression ratio order” using a predict-and-correct (P&C) scheme, starting with the most compressed μ-mesh type, until a desired level of quality is achieved. This scheme enables parallel encoding while maximizing compression ratio, and without introducing mismatching displacement values along edges shared between sub triangles.
Further aspects include determining what constraints need to be put in place to guarantee crack-free compression; a fast encoding algorithm for a single sub triangle using the prediction & correction scheme; a compression scheme for meshes that adopt a uniform tessellation rate (i.e., all base triangles contain the same number of μ-triangles); compressor extensions to handle adaptively tessellated triangle meshes; and techniques that exploit wraparound computation methods to increase compression performance.
One embodiment provides a set of rules on DMM correction and shift bit widths that enable a given micro-mesh type to always be able to represent a more compressed micro-mesh type. These rules, in conjunction with additional constraints on the order used to encode DMMs, enable a compression scheme as a parallel algorithm, with little communication required among independently compressed DMMs, and still being able to guarantee high quality crack free results. In one embodiment, the technology herein transforms a previously global optimization into a local one, enabling parallel crack-free compression of DMMs, with very little “inter-triangle” communication required at compression time.
When rendering using data from a compressed representation, we need to be able to efficiently access required data. When rendering a pixel, we can directly address associated texels by computing the memory address of the compressed block containing the required texel data. Texel compression schemes use fixed block size compression, which makes possible direct addressing of texel blocks. When compressing displacement maps (see below) in one embodiment, we use a hierarchy of fixed size blocks with compressed encodings therein.
Further novel features include:
Crackfree Guarantee
An often-used general solution to cracking is to ensure the hardware or shader uses the same input data for shared vertices and shared edges. But displacement offsets or differences along shared edges have been known to pick up slightly different or varying values, which can lead to cracking artifacts. This can be especially true where the shared vertex/shared edge numerical values are accessed and determined locally/independently e.g., on a randomly ordered basis rather than together and/or in a particular order.
The example non-limiting embodiments herein provide crackfree, watertightness guarantees despite such challenges.
DMM Compression
When highly detailed geometry is described, it is important that the description be as compact as possible. The viability of detailed geometry for real-time computer graphics relies on being able to render directly from a compact representation. The above-referenced copending commonly-assigned “micromesh” patent applications describe the incorporation of displacement maps (DMs) into a μ-mesh representation. Because the DMs are high quality μ-mesh components, they may be compressed by taking advantage of inherent coherence. DMs can be thought of as representatives of data associated with vertices. This data class may be understood as calling for both lossless and lossy compression schemes. Where a lossless scheme can exactly represent an input, a lossy scheme is allowed to approximate an input to within a measured tolerance. The fact that a scheme is lossy means that data is being lost—which should make higher compression ratios and more compact data representations possible. However, as noted above, the problem is not (just) ensuring the decompressor recovers the compressed data in a deterministic way—it is further complicated by the need to recover the same (bit-for-bit) displacement values whenever the vertices are on a shared tessellated edge between two different polygons.
Lossy schemes may flag where an inexact encoding has occurred, or indicate which samples failed to encode losslessly.
Displacement Block Storage
In one embodiment, the mesh displacement information is stored in a number of different compressed formats that allow us to describe the microtriangles with as few bits as possible.
In one example embodiment, the micromesh comprises a mesh of base triangles that are stitched or joined together at their respective vertices. These base triangles can be referred to as “API” base triangles because they each define three vertices of the type that can be processed by a legacy vertex shader or ray tracer. However, in one embodiment, the base triangle itself is not imaged, rasterized or otherwise visualized, and instead serves as a platform for a recursively-subdividable displacement-mapped micromesh. This micromesh is formed as a regular 2^n×2^n mesh (where n is any non-zero integer), with each further tessellated level subdividing each sub triangle in the previous level into four (4) smaller sub triangles according to a barycentric grid and a space filling curve. See
In example embodiments, a displacement value is stored for each microvertex of the micromesh. These displacement values are stored in displacement blocks such as shown in
In one embodiment, because of the way the displacement values are configured, no compression is needed in order to fit displacement values for lower tessellation levels into a single cacheline. As
See
“Full” Precision Displacement Values are Represented as UNORM11
For context,
Thus, in example embodiments, displacement amounts can be stored in a flat, uncompressed format where the UNORM11 displacement for any μ-vertex can be directly accessed.
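By way of non-limiting illustration, the following C++-style sketch shows one way a full-precision displacement in [0, 1] might be quantized to and from UNORM11; the round-to-nearest behavior and the function names are assumptions for illustration rather than a description of any particular hardware:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize a normalized displacement in [0.0, 1.0] to an 11-bit unsigned value
// (UNORM11). Round-to-nearest is assumed here; a given implementation may differ.
inline uint16_t floatToUnorm11(float x)
{
    x = std::clamp(x, 0.0f, 1.0f);
    return static_cast<uint16_t>(std::lround(x * 2047.0f));
}

// Map an 11-bit unsigned value back to a normalized displacement in [0.0, 1.0].
inline float unorm11ToFloat(uint16_t u)
{
    return static_cast<float>(u & 0x7FF) / 2047.0f;
}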
However, as the tessellation level increases, so does the number of microvertices, and we soon run out of room in a single cacheline to store the corresponding displacement values in UNORM11. See
Displacement Compression with Forward Differencing (“Predict-and-Correct”)
The P&C mechanism in an example embodiment relies on the recursive subdivision process used to form a μ-mesh. A set of base anchor points are specified for the base triangle. At each level of subdivision, new vertex displacement values are formed by averaging the displacement values of two adjacent vertices from the previous (coarser) subdivision level. This is the prediction step: predict that the value is the average of the two adjacent vertices.
The next step corrects that prediction by moving it up or down to get to where it should be. When those movements are small, or are allowed to be stored lossily, the number of bits used to correct the prediction can be smaller than the number of bits needed to directly encode it. The bit width of the correction factors is variable per level.
In more detail, for predict-and-correct, a set of base anchor displacements are specified for the base triangle as shown in
disp_amount_prediction=(disp_amount_v0+disp_amount_v1+1)/2
It will be noted that the encoder will communicate the base anchor displacements to the decoder, and the decoder in recursively subdividing the base triangle into increasingly deeper levels of subdivision (resulting in higher and higher tessellation levels) will already have calculated the adjacent microvertex displacement values which are thus available for computing (by linear interpolation) the displacement values for new intermediate microvertices.
Of course, the actual displacement value of a microvertex is not necessarily the same as its immediate neighbors—the micromesh is configured in one embodiment so any microtriangle can have an independent orientation, which means that its three microvertices can have independently defined displacement values. So, as in a typical forward differencing system, the encoder also calculates, and communicates to the decoder, a scalar correction to the prediction. In other words, the encoder computes the prediction and then compares the prediction to the actual displacement value of the microvertex. See
d(4)=(d(2)+d(1)+1)/2+correction(4)
d(7)=(d(5)+d(3)+1)/2+correction(7).
Thus, the next step performed by both the encoder and the decoder is to correct the predicted displacement amount with a per-vertex scalar correction, moving the displacement amount up or down to reach the final displacement amount. When these movements are small, or allowed to be stored lossily, the number of bits used to correct the prediction can be smaller than the number of bits needed to directly encode it. In practice it is likely for higher subdivision levels to require smaller corrections due to self-similarity of the surface, and so the bit-widths of the correction factors are reduced for higher levels. See
The base anchor displacements are unsigned (UNORM11) while the corrections are signed (two's complement). In one embodiment, a shift value is also introduced to allow corrections to be stored at less than the full width. Shift values are stored per subdivision level with 4 variants (a different shift value for the microvertices of each of the three sub triangle edges, and a fourth shift value for interior microvertices) to allow vertices on each of the sub triangle's edges to be shifted independently (e.g., using simple shift registers) from each other and from vertices internal to the sub triangle.
In more detail, at deeper and deeper tessellation levels, the micromesh surface tends to become more and more self-similar—permitting the encoder to use fewer and fewer bits to encode the signed correction between the actual surface and the predicted surface. The encoding scheme in one embodiment provides variable length coding for the signed correction: more encoding bits may be used for coarse corrections, while fewer encoding bits are needed for finer corrections. In example embodiments, this variable length coding of correction values is tied to tessellation level as follows:
Thus, in one embodiment, when corrections for a great many microtriangles are being encoded, the number of correction bits per microtriangle can be small (e.g., as small as a single bit).
Meanwhile, in one embodiment, the encoding scheme uses block floating point, which allows even one bit precision to be placed wherever in the range it is needed or desired. Thus, “shift bits” allow adjustment of the amplitude of corrections, similar to a shared exponent. The shifts for the above tessellation levels may be as follows in one embodiment:
The decoder (and the encoder when recovering displacement values it previously compressed) may use a hardware shift circuit such as a shift register to shift correction values by amounts and in directions specified by the shift values. For example, the level 5 4-bit shift values can shift the 1-bit correction value to any of 16 different shift positions to provide a relatively large dynamic range for the 1-bit correction value.
Providing different shifts for different levels and different shifts for each edge and interior vertices prevents “chain reactions” or domino-like effects (i.e., where knocking down one domino causes the momentum to propagate to a next domino, which propagates it to a further domino, and so on) and avoids the need for global optimization of the mesh. By decoupling the shift values used to encode/decode the interior vertices from those used for the edge vertices, we enable the edge vertices to match their counterparts on neighboring micromeshes which share the same edges, without propagating the constraints on their values to the interior vertices. Without such decoupling, constraints can emerge locally, propagate throughout the mesh, and effectively become global constraints. As will be explained below, the widths of the shift and correction values cannot be arbitrary, but must follow constraints to ensure bit-for-bit matching between compression levels.
The predict-and-correct operation is expressed in the following example Formula 1, written in pseudo-code:
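While the exact Formula 1 is not reproduced here, the following C++-style sketch is consistent with the surrounding description of the decode step (rounded-average prediction, signed correction scaled by the per-level shift value, and mod-2048 wraparound); the names are illustrative assumptions:

#include <cstdint>

// v0, v1: adjacent decoded UNORM11 displacements from the previous subdivision level.
// correction: signed per-vertex correction; shift: per-level, per-edge/interior shift value.
inline uint16_t decodeDisplacement(uint16_t v0, uint16_t v1, int32_t correction, uint32_t shift)
{
    uint32_t prediction = (static_cast<uint32_t>(v0) + v1 + 1u) / 2u;   // rounded average (the "+1" rounds up)
    int64_t  decoded    = static_cast<int64_t>(prediction)
                        + static_cast<int64_t>(correction) * (int64_t(1) << shift);
    return static_cast<uint16_t>(((decoded % 2048) + 2048) % 2048);     // UNORM11 wraparound (mod 2048)
}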
Each final displacement amount then becomes a source of prediction for the next level down. Note that each prediction has an extra “+1” term which allows for rounding rather than truncation, since the division here is a truncating integer division. It is equivalent to prediction=round((v0+v1)/2) in exact precision arithmetic, rounding half-integers up to the next whole number.
As will be understood from the discussion below, a primary design goal for this compression algorithm is to constrain the correction bit widths so that the set of displacement values representable with a given μ-mesh type is a strict superset of all values representable with a more compressed μ-mesh type. The above correction and shift value widths meet this constraint.
In another embodiment, the displacement map may be generated and encoded using the above described predict and correct (P&C) technique and the constant-time algorithm for finding the closest correction. In an embodiment, as described above, the P&C technique and the algorithm for finding the closest correction are used in association with the fast compression scheme that constrains correction bit widths in displacement encodings.
Displacement Storage
Displacement amounts are stored in 64B or 128B granular blocks called displacement blocks. The collection of displacement blocks for a single base triangle is called a displacement block set. A displacement block encodes displacement amounts for either 8×8 (64), 16×16 (256), or 32×32 (1024) μ-triangles.
In a particular non-limiting implementation, the largest memory footprint displacement set will have uniform uncompressed displacement blocks covering 8×8 (64) μ-triangles in 64 bytes. The smallest memory footprint would come from uniformly compressed displacement blocks covering 32×32 in 64 bytes, which specifies ˜0.5 bits per μ-triangle. There is roughly a factor of 16× difference between the two. The actual memory footprint achieved will fall somewhere within this range. The size of a displacement block in memory (64B or 128B) paired with the number of μ-triangles it can represent (64, 256 or 1024) defines a μ-mesh type. We can order μ-mesh types from most to least compressed, giving a “compression ratio order” used in watertight compression—see
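As a worked example only (the pairings below are drawn from the block sizes and μ-triangle counts mentioned above and are not an exhaustive list of μ-mesh types), the storage cost per μ-triangle can be computed as follows:

#include <cstdio>

// Bits of displacement-block storage per microtriangle for a given pairing of
// block size (64 or 128 bytes) and microtriangle count (64, 256 or 1024).
constexpr double bitsPerMicrotriangle(int blockBytes, int microTriangles)
{
    return (blockBytes * 8.0) / microTriangles;
}

int main()
{
    printf("64B block,  64 utris:   %.2f bits/utri (uncompressed)\n",    bitsPerMicrotriangle(64, 64));    // 8.00
    printf("128B block, 1024 utris: %.2f bits/utri\n",                   bitsPerMicrotriangle(128, 1024)); // 1.00
    printf("64B block,  1024 utris: %.2f bits/utri (most compressed)\n", bitsPerMicrotriangle(64, 1024));  // 0.50
    return 0;   // the spread from 8.0 down to 0.5 bits/utri is the ~16x factor noted above
}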
As the
While the number of displacement blocks in the above table increases geometrically with larger numbers of triangles, self-culling at the decoder/graphics generation side will often or usually (e.g., in ray tracing) ensure that only one or a small number of the displacement blocks is actually retrieved from memory.
In some embodiments, the base anchor points are unsigned (UNORM11) while the corrections are signed (two's complement). A shift value allows for corrections to be stored at less than the full width. Shift values are stored per level with four variants to allow vertices on each of the sub triangle mesh edges to be shifted independently from each other and from vertices internal to the sub triangle. Each decoded value becomes a source of prediction for the next level down.
Compressor—Sub Triangle Encoder
According to some embodiments, a 2-pass approach is used to encode a sub triangle with a given μ-mesh type. See
The first pass uses the P&C scheme described above to compute lossless corrections for a subdivision level, while keeping track of the overall range of values the corrections take. The optimal shift value for each edge and for the internal vertices is then determined so that the available correction bits can cover the entire range. This process is performed independently for the vertices situated on the three sub triangle edges and for the internal vertices of the sub triangle, for a total of 4 shift values per subdivision level. The independence of this process for each edge is required to satisfy the constraints for crack-free compression.
The second pass encodes the sub triangle using once again the P&C scheme, but this time with lossy corrections and shift values computed in the 1st pass. The second pass uses the first pass results (and in particular the maximum correction range and number of bits available for correction) to structure the lossy correction and shift values—the latter allowing the former to represent larger numbers than possible without shifting. The result of these two passes can be used as-is, or can provide the starting point for optimization algorithms that can further improve quality and/or compression ratio.
A hardware implementation of the P&C scheme may exhibit wrapping around behavior in case of (integer) overflow or underflow. This property can be exploited in the 2nd pass to represent correction values by “wrapping around” that wouldn't otherwise be reachable given the limited number of bits available. This also means that the computation of shift values based on the range of corrections can exploit wrapping to obtain higher-quality results (see “Improving shift value computation by utilizing wrapping” below).
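As a reduced, illustrative sketch of the two-pass idea (a single subdivision step along one edge, with assumed names; the wrapping-aware refinements and quality metrics discussed elsewhere in this description are omitted):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

inline uint16_t predictU11(uint16_t v0, uint16_t v1) { return (uint16_t)((v0 + v1 + 1u) / 2u); }

inline uint16_t decodeU11(uint16_t prediction, int32_t correction, uint32_t shift)
{
    int64_t d = (int64_t)prediction + (int64_t)correction * ((int64_t)1 << shift);
    return (uint16_t)(((d % 2048) + 2048) % 2048);
}

// parents[i] and parents[i+1] are already-decoded UNORM11 values bracketing new
// vertex i; refs[i] is that vertex's uncompressed reference displacement
// (refs.size() == parents.size() - 1). b is the correction bit width (>= 1).
void encodeOneLevelAlongEdge(const std::vector<uint16_t>& parents,
                             const std::vector<uint16_t>& refs,
                             int b,
                             std::vector<int32_t>& corrections,  // out: lossy corrections
                             uint32_t& shift)                    // out: shift shared by this level/edge
{
    // Pass 1: lossless corrections and their overall range.
    int32_t lo = 0, hi = 0;
    for (size_t i = 0; i < refs.size(); ++i) {
        int32_t c = (int32_t)refs[i] - (int32_t)predictU11(parents[i], parents[i + 1]);
        lo = std::min(lo, c);
        hi = std::max(hi, c);
    }
    // Smallest shift so that shifted b-bit corrections cover [lo, hi] (capped at 10).
    shift = 0;
    while (shift < 10 && ((((int64_t)(1 << (b - 1)) - 1) << shift) < hi ||
                          -((int64_t)1 << (b - 1 + shift)) > lo))
        ++shift;

    // Pass 2: lossy corrections quantized to b bits using that shift.
    corrections.clear();
    for (size_t i = 0; i < refs.size(); ++i) {
        uint16_t p    = predictU11(parents[i], parents[i + 1]);
        int32_t  diff = (int32_t)refs[i] - (int32_t)p;
        int32_t  step = 1 << shift;
        int32_t  c    = (diff >= 0) ? (diff + step / 2) / step : -((-diff + step / 2) / step);
        c = std::clamp(c, -(1 << (b - 1)), (1 << (b - 1)) - 1);
        corrections.push_back(c);
        // decodeU11(p, c, shift) is the value a decoder reconstructs for vertex i.
    }
}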
Note that the encoding procedure can never fail per se, and for a given μ-mesh type, a sub triangle can always be encoded. That said, the compressor can analyze the result of this compression step and by using a variety of metrics and/or heuristics decide that the resulting quality is not sufficient. (See “Using displacement direction lengths in the encoding success metric” below.)
In this case the compressor can try to encode the sub triangle with less compressed μ-mesh types, until the expected quality is met. This iterative process can lead to attempting to encode a sub triangle with a μ-mesh type that cannot represent all of its μ-triangles. In this case the sub triangle is recursively split into four sub triangles until it can be encoded. In one embodiment, the initial split step splits only when the current subtriangle contains more microtriangles than can be encoded with the current micromesh type (hence the need to recursively split until the number of microtriangles in the subtriangle matches the number that can be encoded with the current micromesh type).
Exploiting Mod 2048 Arithmetic
In the above prediction calculation expressions, the compressor tries to compute the correction based on the prediction, the shift and the uncompressed value. But in one embodiment, this correction computation can be a bit tricky when the computation is performed using wrapping arithmetic (e.g., 0, 1, 2, . . . 2046, 2047, 0, 1, 2 . . . ) for mod 2048 arithmetic—which is what the decoder hardware uses in one embodiment when adding the prediction to the correction based on unsigned UNORM11 values. Specifically, while the averaging operation is a typical averaging, the decoded position wraps according to unsigned arithmetic rules when adding the correction to the prediction. Meanwhile, the error metric is in one embodiment not based on wrapping arithmetic. Therefore, it is up to the software encoder to either avoid wrapping based on stored values or to make that wrapping outcome sensible. An algorithm by which the encoder can make use of this wrapping and exploit it to improve quality is described below. An alternative embodiment could clamp the addition results and prevent wraparound (thereby effectively discarding information), but would then lose the ability to improve compression results by exploiting the wraparound behavior. In one embodiment, exploiting the wraparound behavior can decrease error by a factor of 3.
Displacement Compression—A Robust Constant-Time Algorithm for Finding the Closest Correction
As described above, corrections from subdivision level n to subdivision level n+1 are signed integers with a fixed number of bits b (given by the sub triangle format and subdivision level) and are applied according to the formula above. Although an encoder may compute corrections in any of several different ways, a common problem for an encoder is to find the b-bit value of c (correction) that minimizes the absolute difference between the decoded value d and a reference (uncompressed) value r in the formula in
This is complicated by how the integer arithmetic wraps around (it is equivalent to the group operation in the Abelian group ℤ/2^11ℤ), but the error metric is computed without wrapping around (it is not the Euclidean metric in ℤ/2^11ℤ). An example is provided to further show how this is a nontrivial problem.
Consider the case p=100, r=1900, s=0, and b=7, illustrated in
Shown is the number line of all UNORM11 values from 0 to 2047, the locations of predicted value p in thick line and reference value r in a dot-dash line, and in the lighter shade around the thick line of p, all possible values of d for all possible corrections (since b=7, the possible corrections are the signed integers from −2^6=−64 to 2^6−1=63 inclusive).
In this example, there is a shift of 0 and a possible correction range of −64 to +63 as shown by the vertical lines on the left and right side of the prediction line labelled p. The encoder should preferably pick a correction whose decoded value is closest to the r line within the standard (non-wrapping) Euclidean metric. This would appear to be the right-most vertical line at +63. However, when applying wraparound arithmetic, the closest line to the reference line r is not the right-most line, but rather the left-most line at −64, since this leftmost line has the least distance from the reference line r using wraparound arithmetic.
In this case, the solution is to choose the correction of c=63, giving a decoded value of d=163 and an error of abs(r−d)=1737. If the distance metric were that of ℤ/2^11ℤ, the solution would instead be c=−64, giving a decoded value of d=36 and an error of 183 (wrapping around). So, even though the error metric of ℤ/2^11ℤ is easier to compute, it produces a correction with the opposite sign of the correct solution, which results in objectionable visual artifacts such as pockmarks.
Next, consider the case p=100, r=1900, s=6, and b=3, illustrated in
In this case, the solution is to choose the correction of c=−4, giving a decoded value of d=1892 and an error of abs(r-d)=8. The wraparound behavior may be exploited to get a good result here, but by doing so, it is seen that a nonzero shift can give a lower error than the previous case, even with fewer bits.
Other scenarios are possible. The previous scenario involved arithmetic underflow; cases requiring arithmetic overflow are also possible, as well as cases where no overflow or underflow is involved, and cases where a correction obtains zero error.
The below presents pseudocode for an algorithm that, given unsigned integers 0≤p<2048, 0≤r<2048, an unsigned integer shift 0≤s<11, and an unsigned integer bit width 0≤b≤11, always returns the best possible integer value of c (between −2^(b−1) and 2^(b−1)−1 inclusive if b>0, or equal to 0 if b=0) within a finite number of operations (regardless of the number of b-bit possibilities for c). In the illustrated pseudocode for the sequential algorithm steps 1-8 below, non-mathematical italic text within parentheses represents comments, and modulo operations (mod) are taken to return positive values.
(Early check for the zero-bit case) If b is equal to 0, return 0.
(Range of representable values around 0 with shift applied is −nR . . . pR) Set nR=2^(b−1+s), pR=nR−2^s.
(Difference in ℤ) Set signed integer d=r−p.
(Is the reference value between the two extreme corrections?) If (d mod 2048)>pR and 2048−(d mod 2048)>nR:
Otherwise: (The reference value is between two representable values; find them in ℤ/2^11ℤ; then the ideal correction must be one of the two.)
using floating-point arithmetic for the division.
(Compute error for iLo) Set eLo to the absolute difference of r, and the result of substituting correction=iLo into Formula 1 above.
(Compute error for iHi) Set eHi to the absolute difference of r, and the result of substituting correction=iHi into Formula 1.
(Choose the option with lower error) If eLo≤eHi, return iLo. Otherwise, return iHi.
Basically, the pseudocode algorithm recognizes that the reference line r must always be either between two correction value lines within the representable range, exactly coincident with a correction value line within the range, or outside the representable range entirely. The algorithm distinguishes two cases (the reference value lies in the gap between the two extreme corrections, or the reference value lies between two representable values) and chooses the candidate correction with the lower error. In effect, the wraparound case provides a “shortcut” for situations where the predicted and reference values are near opposite ends of the bit-limited displacement value range in one embodiment.
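For reference only (this is not the constant-time algorithm above), a brute-force search over all b-bit corrections using the same decode formula and the same non-wrapping error metric can serve as an oracle for validating a constant-time implementation, since b≤11 bounds the search to at most 2048 candidates; names are illustrative:

#include <cstdint>
#include <cstdlib>

// Decode formula from above: d = (p + c * 2^s) mod 2048.
inline int decodeU11(int p, int c, int s)
{
    long long d = (long long)p + (long long)c * (1LL << s);
    return (int)(((d % 2048) + 2048) % 2048);
}

// Brute-force oracle (NOT the constant-time algorithm): try every b-bit signed
// correction and keep the one minimizing the plain, non-wrapping error |r - d|.
int bestCorrectionBruteForce(int p, int r, int s, int b)
{
    if (b == 0) return 0;
    int best = 0, bestErr = 1 << 30;
    for (int c = -(1 << (b - 1)); c <= (1 << (b - 1)) - 1; ++c) {
        int err = std::abs(r - decodeU11(p, c, s));
        if (err < bestErr) { bestErr = err; best = c; }
    }
    return best;
}
// Examples from the text: p=100, r=1900, s=0, b=7 returns 63 (error 1737);
// p=100, r=1900, s=6, b=3 returns -4 (decoded value 1892, error 8).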
Compressor—Improving Shift Value Computation by Utilizing Wrapping
Minimizing the size of the shift at each level for each vertex type may improve compression quality. The distance between the representable corrections (see the possible decoded values shown in
For instance, consider a correction level and vertex type where the differences mod 2048 between each reference and predicted value are distributed as in
In more detail,
One possible algorithm may be as follows. Subtract 2048 from (differences mod 2048) that are greater than or equal to 1024, so that all wrapped differences wi will lie within the range of integers −1024 . . . 1023 inclusive. See
Then compute the shift s given the level bit width b as the minimum number s such that
2^s·(2^(b−1)−1)≥max(wi)
and
−2^s·2^(b−1)≤min(wi).
In one example, this transform can be included as part of “pass one” of an encoder to compute lossless corrections (see
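A C++-style sketch of this shift computation follows; the names are illustrative, and the exact treatment of the 1024 boundary when wrapping differences is an assumption:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// refs[i] and preds[i] are UNORM11 reference and predicted values for one
// correction level and vertex type; b is the correction bit width (1..11).
uint32_t computeShiftWithWrapping(const std::vector<int>& refs,
                                  const std::vector<int>& preds,
                                  int b)
{
    int lo = 0, hi = 0;
    for (size_t i = 0; i < refs.size(); ++i) {
        int w = ((refs[i] - preds[i]) % 2048 + 2048) % 2048;   // difference mod 2048, in 0..2047
        if (w >= 1024) w -= 2048;                              // wrap into -1024..1023
        lo = std::min(lo, w);
        hi = std::max(hi, w);
    }
    uint32_t s = 0;
    // Smallest s with  2^s*(2^(b-1)-1) >= max(w)  and  -2^s*2^(b-1) <= min(w)  (capped at 10).
    while (s < 10 && ((((int64_t)(1 << (b - 1)) - 1) << s) < hi ||
                      -((int64_t)1 << (b - 1 + s)) > lo))
        ++s;
    return s;
}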
Compressor—Using Displacement Ranges in the Encoding Success Metric
A method for interpreting scaling information as a per-vertex signal of importance, and a method for using per-vertex importance to modify the displacement encoder error metric are described. This improves quality where needed and reduces size where quality is not as important.
As described above, each vertex has a range over which it may be displaced, given by the displacement map specification. For instance, with the prismoid specification, the length of this range scales with the length of the interpolated direction vector and the interpolated scale. Meanwhile, the decoded input and output of the encoded format has fixed range and precision (UNORM11 values) as discussed above. This means that the minimum and maximum values may result in different absolute displacements in different areas of a mesh—and therefore, a UNORM11 error of a given size for one part of a mesh may result in more or less visual degradation compared to another.
In one embodiment, a per-mesh-vertex importance (e.g., a “saliency”) is allowed to be provided to the encoder such as through the error metric. One option is for this to be the possible displacement range in object space of each vertex (e.g., distance x scale in the prismoid representation—which is a measure of differences and thus computed error in object space); however, this could also be the output of another process, or guided by a user. For example, an artist could indicate which vertices have higher “importance” to achieve improved imaging results, e.g., so higher quality is provided around a character's face and hands than around her clothing.
The mesh vertex importance is interpolated linearly to get an “importance” level for each μ-mesh vertex. Then, within the error metric, the compressed versus uncompressed error for each error metric element is weighted by an error metric “importance” derived from the element's μ-mesh vertices' levels of “importance”. These weighted errors are then accumulated, and the resulting accumulated error—which is now weighted based on “importance” level—is compared against the error condition(s). In this way, the compressor frequently chooses more compressed formats for regions of the mesh with lower “importance”, and less compressed formats for regions of the mesh with higher “importance”.
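One possible, non-limiting way to express such an importance-weighted error accumulation is sketched below; the structure, the names and the simple mean-of-vertex-importances weighting are assumptions for illustration:

#include <vector>

// Each element carries its compressed-vs-uncompressed error and the interpolated
// importance of its micro-vertices (names and weighting are illustrative).
struct ErrorElement {
    float error;                 // |uncompressed - decoded| contribution of this element
    float vertexImportance[3];   // importance interpolated to its micro-vertices
};

bool meetsQualityTarget(const std::vector<ErrorElement>& elements, float maxWeightedError)
{
    float accumulated = 0.0f;
    for (const ErrorElement& e : elements) {
        // One possible choice: weight by the mean importance of the element's vertices.
        float weight = (e.vertexImportance[0] + e.vertexImportance[1] + e.vertexImportance[2]) / 3.0f;
        accumulated += weight * e.error;
    }
    return accumulated <= maxWeightedError;   // compare the weighted error against the error condition
}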
Compressor—Constraints for Crack-Free Compression
The discussion above explains how a compressor can compress a micromesh defined by a base triangle. By organizing the μ-mesh types from most to least compressed as shown in
In the example shown, the microvertices are assigned a designator such as “S1”. Here, the letter “S” refers to “subdivision” and the number following refers to the number of the subdivision. Thus, one can see that “S0” vertices on the top and bottom of the shared edge for each sub triangle will be stored at subdivision level zero—namely in uncompressed format. A first subdivision will generate the “S1” vertex at subdivision level 1, and a second subdivision will generate the “S2” vertices at subdivision level 2.
To avoid cracks along the shared edge, the decoded displacement values of the two triangles must match. S0 vertices match since they are always encoded uncompressed. S1 and S2 vertices will match if and only if (1) the sub triangles are encoded in “compression ratio order” and (2) displacement values encoded with a more compressed μ-mesh type are always representable by less compressed μ-mesh types. The second constraint implies that for a given subdivision level a less compressed μ-mesh type should never use fewer bits than a more compressed μ-mesh type. For instance, if the right sub triangle uses a μ-mesh type more compact than the left sub triangle, the right sub triangle will be encoded first. Moreover, the post-encoding displacement values of the right sub triangle's edge (i.e., its edge that is shared with the left sub triangle) will be copied to replace the corresponding displacement values of the left sub triangle. Property (2) ensures that once compressed, the displacement values along the left sub triangle's shared edge are losslessly encoded, creating a perfect match along the shared edge.
In this example, these two sub triangles are encoded with different micromesh types (for example, assume the sub triangle on the left is more compressed than the sub triangle on the right). As discussed above, the compressor in one embodiment works from more compressed to less compressed formats, so in this case, displacements for the sub triangle on the left will be encoded first. So let's assume the displacements for the sub triangle on the left have already been successfully encoded and a processor is now trying to encode the displacements for the sub triangle on the right—and in particular, displacements for the microvertices of the triangle on the right that lie on the edge shared between the two triangles. The displacement values to be encoded to the shared edge microvertices of the right side sub triangle must match, bit for bit, the displacement values already encoded for the shared edge vertices of the left side sub triangle. Cracking may result if they don't match exactly.
If the shared edge vertices on the right side triangle are going to match, bit for bit, the shared edge vertices on the left side triangle, the number of bits used to represent displacement for the right side triangle must be equal to or greater than the number of bits used to represent displacement for the left side triangle. For this reason, the vertices facing one another on the left and right sub triangle shared edge have the same subdivision level—for example, a left side S0 vertex matches a right side S0 vertex, a left side S1 vertex matches a right side S1 vertex, a left side S2 vertex matches a right side S2 vertex and so on. Thus, on edges shared between sub triangles, a less compressed displacement format can never use fewer bits for a given subdivision level than a facing, more compressed displacement format. For example, if you imagine recording on a horizontal line, such as in a spreadsheet, the number of bits assigned to represent the vertices for a given subdivision level across all the different micromesh types, sorted from more compressed to less compressed, that number will form a monotonic sequence that increases or stays the same, and never decreases. In other words, there can never be fewer bits for a given subdivision level in the less compressed type than there are bits in the more compressed type. Example embodiments impose this constraint on the encoding scheme to guarantee watertightness assuming the encoding algorithm is deterministic (it does not have any stochastic components).
Thus, in this example, we see 2× more vertices on the right than on the left. Some edge vertices shared between the sub triangles on the left and the right do not belong to the same subdivision level. For example, “S2” vertices on the left side sub triangle face S1 vertices on the right side sub triangle, and S1 vertices on the left side sub triangle face S0 vertices on the right side sub triangle. Therefore, the number of bits assigned to encode the same shared vertices for the left and right side sub triangles is not necessarily the same.
In particular, in one embodiment, the higher (tessellation rate) subdivision levels are assigned fewer bits per vertex for displacement encoding, so it is likely that the number of bits available to encode, for example, S1 is going to be higher than the number of bits available to encode S2. However, as discussed above, when processing sub triangles having different tessellation rates, it is preferable in some embodiments to encode lower tessellation rate sub triangles before encoding adjoining higher tessellation rate triangles in order to guarantee that the information associated with the adjoining sub triangle can match bit-for-bit. Specifically, since fewer bits may be available for encoding the higher tessellation rate sub triangle on the right, it is otherwise not guaranteed that the vertex encoding for the higher tessellation rate sub triangle on the right can represent the shared edge values of the lower tessellation rate sub triangle on the left. First encoding the sub triangle with the lower tessellation rate on the left will ensure that the higher tessellation rate sub triangle on the right will be able to represent the same vertex information, so long as, within a micromesh type, the number of displacement encoding bits for increasingly deep/recursive subdivision levels does not increase:
# bits for subdivision level k≤# bits for subdivision level j
where j is any less subdivided level (lower tessellation ratio) than k.
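The two bit-width rules discussed above (non-increasing with subdivision level within a μ-mesh type, and non-decreasing across μ-mesh types ordered from most to least compressed) can be checked mechanically; the table layout in the sketch below is an assumption for illustration:

#include <cstddef>
#include <vector>

// bitsPerLevel[t][k]: correction bit width used by mu-mesh type t at subdivision
// level k, with types ordered from most to least compressed (assumed layout).
bool satisfiesCrackFreeBitWidthRules(const std::vector<std::vector<int>>& bitsPerLevel)
{
    for (size_t t = 0; t < bitsPerLevel.size(); ++t) {
        const std::vector<int>& levels = bitsPerLevel[t];
        // Within a type: a deeper level may not use more bits than a shallower level.
        for (size_t k = 1; k < levels.size(); ++k)
            if (levels[k] > levels[k - 1]) return false;
        // Across types: a less compressed type may not use fewer bits than a more
        // compressed type at the same subdivision level.
        if (t > 0) {
            const std::vector<int>& prev = bitsPerLevel[t - 1];
            for (size_t k = 0; k < levels.size() && k < prev.size(); ++k)
                if (levels[k] < prev[k]) return false;
        }
    }
    return true;
}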
To summarize, when encoding a triangle mesh according to some high performance embodiments, the following constraints on ordering are adopted to avoid cracks in the mesh:
Thus, the following constraints are imposed on correction bit width configurations in some embodiments:
The rule above accounts for micromesh types that represent the same number of microtriangles (i.e. same number of subdivisions), but with different storage requirements (e.g. 1024 microtriangles in 128B or 64B).
In one embodiment, the effective number of bits used to represent a displacement value is given by the sum of its correction and shift bit widths. Also, in the example of
These example constraints allow different sub triangles in the mesh to be processed independently (both for encoding and for subsequent decoding) by high performance, asynchronous parallel processing, while ensuring those processes will independently derive the same displacement values for vertices shared between adjacent sub triangles when encoding the mesh. They also prevent situations where a larger precision data representation is squeezed into a smaller number of bits, which would result in a loss of numerical resolution and thus the inability to provide a bit-for-bit match of displacement values at interfacing vertices of different sub triangles. It's a little like interviewing different eyewitnesses of an important event independently in different rooms without letting them talk to one another, and having each witness nevertheless agree on exactly the same sequence of events.
Compressor—Mesh Encoder (uniform)
The pseudo-code below and shown in
Note that each sub triangle carries a set of reference displacement values, which are the target values for compression. An edge shared by an encoded sub triangle and one or more not-yet-encoded sub triangles is deemed “partially encoded”. To ensure crack-free compression, its decompressed displacement values are propagated to the not-yet-encoded sub triangles, where they replace their reference values.
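While the pseudo-code itself is not reproduced here, the following skeleton (placeholder types and callbacks; the recursive splitting step is omitted) illustrates the loop described above: μ-mesh types are processed in compression ratio order, and each encoded sub triangle's decoded edge displacements are propagated to its not-yet-encoded neighbors:

#include <cstddef>
#include <functional>
#include <vector>

// Placeholder sub triangle record; a real encoder would carry full per-vertex data.
struct SubTriangle {
    std::vector<unsigned short> referenceDisplacements;  // UNORM11 targets for compression
    std::vector<size_t>         neighbors;               // sub triangles sharing an edge
    bool                        encoded = false;
};

void encodeMeshUniform(std::vector<SubTriangle>& tris,
                       int numMeshTypes,  // mu-mesh types, index 0 = most compressed
                       const std::function<bool(SubTriangle&, int)>& tryEncode,  // encode + quality check
                       const std::function<void(const SubTriangle&, SubTriangle&)>& propagateSharedEdge)
{
    // Process mu-mesh types in "compression ratio order", most compressed first.
    for (int type = 0; type < numMeshTypes; ++type) {
        for (size_t i = 0; i < tris.size(); ++i) {
            SubTriangle& tri = tris[i];
            if (tri.encoded) continue;
            bool lastType = (type == numMeshTypes - 1);
            if (!tryEncode(tri, type) && !lastType)
                continue;                      // retry later with a less compressed type
            tri.encoded = true;
            // Replace neighbors' reference values on shared ("partially encoded") edges
            // with this sub triangle's decoded values so later encodings match bit for bit.
            for (size_t n : tri.neighbors)
                if (!tris[n].encoded) propagateSharedEdge(tri, tris[n]);
        }
    }
}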
The
In this case, the builder has subdivided the
This is where the algorithm takes advantage of a constraint that the less compressed top sub triangle vertex formats must be able to represent the more compressed vertex formats of the lower sub triangles. This may sound like a redundant requirement—won't a less compressed format always be able to represent the values of a more compressed format? Not necessarily—if both formats use lossy compression, there exists the possibility that a less compressed format will not be able to represent certain values that a more compressed format is able to represent. However, if such a situation were allowed to occur, the result would be cracks in the mesh. Accordingly, in example embodiments, a constraint is imposed to prevent this—namely any less compressed type can always represent all values of a more compressed type.
But even this constraint is not enough to guarantee no cracking. This is because the displacement values the decompressor will recover from the lowermost sub triangles on the edge shared with the uppermost sub triangle are not the original displacements of the mesh, but rather have passed through a lossy compression process. Accordingly, in one embodiment, we place bit-for-bit matching above precision, and propagate the successfully compressed then recovered values from the lower sub triangle vertices onto the shared edge with the uppermost sub triangle, thereby substituting the propagated values for the uppermost sub triangle's own vertex displacements. By propagating these displacement values recovered from decompressing the lower sub triangle vertex to the less-compressed uppermost sub triangle—and with the constraint that the less compressed format of the uppermost sub triangle can exactly represent those propagated values from a more compressed format—it can now be guaranteed that the vertex displacements the decoder recovers for the uppermost sub triangle will be bit-for-bit identical with the corresponding vertex displacements the decoder will recover for the lowermost sub triangles along the shared edge—with no requirement that the decoder decodes both at the same time or knows there is a shared edge.
The algorithm will then try to recompress the four subdivided upper sub triangles as shown in
As
Compressor—Mesh Encoder (Adaptive)
As shown below, encoding of adaptively tessellated meshes uses an additional outer loop in order to process sub triangles in ascending tessellation rate order:
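While the original pseudo-code is not reproduced here, the following sketch (assumed names) illustrates such an outer loop: sub triangles are grouped by ascending tessellation rate, and each group is then encoded with the uniform scheme described above:

#include <map>
#include <utility>
#include <vector>

// Placeholder record: the tessellation rate and an index into the mesh's sub triangle array.
struct AdaptiveSubTriangle {
    int tessellationRate;   // e.g., number of microtriangles in this sub triangle
    int index;
};

// Group sub triangle indices by tessellation rate, lowest rate first. Each group
// is then encoded with the uniform scheme sketched earlier, propagating decoded
// shared-edge values (for 2x rate differences, only every other vertex is affected).
std::vector<std::vector<int>> groupByAscendingTessellationRate(const std::vector<AdaptiveSubTriangle>& tris)
{
    std::map<int, std::vector<int>> byRate;       // std::map iterates keys in ascending order
    for (const AdaptiveSubTriangle& t : tris)
        byRate[t.tessellationRate].push_back(t.index);

    std::vector<std::vector<int>> groups;
    for (auto& kv : byRate)
        groups.push_back(std::move(kv.second));
    return groups;                                // encode groups[0] first, then groups[1], ...
}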
The example compression technique herein does not make any assumption of whether the mesh we are compressing is manifold or not, and therefore we can compress non-manifold meshes just fine. This property can be quite important (often assets from games are not manifold) and makes the example embodiment more robust.
Note that when updating the reference displacements for edges shared with sub triangles that use a 2× higher tessellation rate, only every other vertex is affected (see
Images generated applying one or more of the techniques disclosed herein may be displayed on a monitor or other display device. In some embodiments, the display device may be coupled directly to the system or processor generating or rendering the images. In other embodiments, the display device may be coupled indirectly to the system or processor such as via a network. Examples of such networks include the Internet, mobile telecommunications networks, a WIFI network, as well as any other wired and/or wireless networking system. When the display device is indirectly coupled, the images generated by the system or processor may be streamed over the network to the display device. Such streaming allows, for example, video games or other applications, which render images, to be executed on a server or in a data center and the rendered images to be transmitted and displayed on one or more user devices (such as a computer, video game console, smartphone, other mobile device, etc.) that are physically separate from the server or data center. Hence, the techniques disclosed herein can be applied to enhance the images that are streamed and to enhance services that stream images such as NVIDIA GeForce Now (GFN), Google Stadia, and the like.
Furthermore, images generated applying one or more of the techniques disclosed herein may be used to train, test, or certify deep neural networks (DNNs) used to recognize objects and environments in the real world. Such images may include scenes of roadways, factories, buildings, urban settings, rural settings, humans, animals, and any other physical object or real-world setting. Such images may be used to train, test, or certify DNNs that are employed in machines or robots to manipulate, handle, or modify physical objects in the real world. Furthermore, such images may be used to train, test, or certify DNNs that are employed in autonomous vehicles to navigate and move the vehicles through the real world. Additionally, images generated applying one or more of the techniques disclosed herein may be used to convey information to users of such machines, robots, and vehicles.
Furthermore, images generated applying one or more of the techniques disclosed herein may be used to display or convey information about a virtual environment such as the metaverse, Omniverse, or a digital twin of a real environment. Furthermore, images generated applying one or more of the techniques disclosed herein may be used to display or convey information on a variety of devices including a personal computer (e.g., a laptop), an Internet of Things (IoT) device, a handheld device (e.g., smartphone), a vehicle, a robot, or any device that includes a display.
All patents, patent applications and publications cited herein are incorporated by reference for all purposes as if expressly set forth.
All patents & publications cited above are incorporated by reference as if expressly set forth. While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
This application claims priority to U.S. Provisional Patent Application No. 63/245,155 filed Sep. 16, 2021, the entire content of which is herein incorporated by reference. This application is related to the following commonly-owned patent applications, each of which is incorporated herein by reference for all purposes as if expressly set forth herein:
U.S. patent application Ser. No. 17/946,235 filed Sep. 16, 2022 entitled Micro-Meshes, A Structured Geometry For Computer Graphics (21-SC-1926US02; 6610-126);
U.S. patent application Ser. No. 17/946,221 filed Sep. 16, 2022 entitled Accelerating Triangle Visibility Tests For Real-Time (22-DU-0175US01; 6610-124); and
US Patent Application no. xxxxxx filed Sep. 16, 2022 entitled Displaced Micro-meshes for Ray and Path Tracing (22-AU-0623US01/6610-125).