Ray tracing is a graphics rendering technique in which simulated rays of light are cast to test for object intersection, and pixels are colored based on the result of the ray cast. Ray tracing is computationally more expensive than rasterization-based techniques, but produces more physically accurate results. Improvements in ray tracing operations are constantly being made.
A more detailed understanding can be had from the following description, given by way of example in conjunction with the accompanying drawings wherein:
Techniques for performing ray tracing operations are provided. The techniques include dividing a primitive of a scene to generate primitive portions; identifying, from the primitive portions, and based on an opacity texture, one or more opaque primitive portions and one or more invisible primitive portions; generating box nodes for a bounding volume hierarchy corresponding to the opaque primitive portions, but not the invisible primitive portions; and inserting the generated box nodes into the bounding volume hierarchy.
In various alternatives, the processor 102 includes a central processing unit (CPU), a graphics processing unit (GPU), a CPU and GPU located on the same die, or one or more processor cores, wherein, in different implementations, each processor core is a CPU or a GPU. In various alternatives, the memory 104 is located on the same die as the processor 102, or is located separately from the processor 102. The memory 104 includes a volatile or non-volatile memory, for example, random access memory (RAM), dynamic RAM, or a cache.
The storage 106 includes a fixed or removable storage, for example, without limitation, a hard disk drive, a solid state drive, an optical disk, or a flash drive. The input devices 108 include, without limitation, a keyboard, a keypad, a touch screen, a touch pad, a detector, a microphone, an accelerometer, a gyroscope, a biometric scanner, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals). The output devices 110 include, without limitation, a display, a speaker, a printer, a haptic feedback device, one or more lights, an antenna, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals).
The input driver 112 and output driver 114 include one or more hardware, software, and/or firmware components that are configured to interface with and drive input devices 108 and output devices 110, respectively. The input driver 112 communicates with the processor 102 and the input devices 108, and permits the processor 102 to receive input from the input devices 108. The output driver 114 communicates with the processor 102 and the output devices 110, and permits the processor 102 to send output to the output devices 110. The output driver 114 includes an accelerated processing device (“APD”) 116 which is coupled to a display device 118, which, in some examples, is a physical display device or a simulated device that uses a remote display protocol to show output. The APD 116 is configured to accept compute commands and graphics rendering commands from processor 102, to process those compute and graphics rendering commands, and to provide pixel output to display device 118 for display. As described in further detail below, the APD 116 includes one or more parallel processing units configured to perform computations in accordance with a single-instruction-multiple-data (“SIMD”) paradigm. Thus, although various functionality is described herein as being performed by or in conjunction with the APD 116, in various alternatives, the functionality described as being performed by the APD 116 is additionally or alternatively performed by other computing devices having similar capabilities that are not driven by a host processor (e.g., processor 102) and configured to provide graphical output to a display device 118. For example, it is contemplated for any processing system that performs processing tasks in accordance with a SIMD paradigm to be configured to perform the functionality described herein. Alternatively, it is contemplated that computing systems that do not perform processing tasks in accordance with a SIMD paradigm perform the functionality described herein.
The APD 116 executes commands and programs for selected functions, such as graphics operations and non-graphics operations that are suited for parallel processing. In various examples, the APD 116 is used for executing graphics pipeline operations such as pixel operations, geometric computations, and rendering an image to display device 118 based on commands received from the processor 102. The APD 116 also executes compute processing operations that are not directly related to graphics operations, such as operations related to video, physics simulations, computational fluid dynamics, or other tasks, based on commands received from the processor 102. In some examples, these compute processing operations are performed by executing compute shaders on the SIMD units 138.
The APD 116 includes compute units 132 that include one or more SIMD units 138 that are configured to perform operations at the request of the processor 102 (or another unit) in a parallel manner according to a SIMD paradigm. The SIMD paradigm is one in which multiple processing elements share a single program control flow unit and program counter and thus execute the same program but are able to execute that program with different data. In one example, each SIMD unit 138 includes sixteen lanes, where each lane executes the same instruction at the same time as the other lanes in the SIMD unit 138 but is able to execute that instruction with different data. In some situations, lanes are switched off with predication if not all lanes need to execute a given instruction. In some situations, predication is also used to execute programs with divergent control flow. More specifically, for programs with conditional branches or other instructions where control flow is based on calculations performed by an individual lane, predication of lanes corresponding to control flow paths not currently being executed, and serial execution of different control flow paths allows for arbitrary control flow.
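The predication scheme described above can be sketched in software. The following is a simplified illustrative model only, not the behavior of any actual SIMD unit 138; the function and parameter names are hypothetical:

```python
# Illustrative model of SIMD predication: all lanes share one program
# counter, so both sides of a branch are evaluated for every lane, and a
# per-lane mask decides which result each lane commits.
def simd_select(data, cond, then_fn, else_fn):
    # Evaluate the per-lane condition to form the predication mask.
    mask = [cond(x) for x in data]
    # Every lane "executes" both control-flow paths; the mask selects
    # the result that is actually written back, emulating lanes being
    # switched off on the path they do not take.
    then_results = [then_fn(x) for x in data]
    else_results = [else_fn(x) for x in data]
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]
```

In this model, serially executing both paths while masking writes is what allows arbitrary divergent control flow on lanes that share a single control flow unit.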
The basic unit of execution in compute units 132 is a work-item. Each work-item represents a single instantiation of a program that is to be executed in parallel in a particular lane. In various examples, work-items are executed simultaneously (or partially simultaneously and partially sequentially) as a “wavefront” on a single SIMD processing unit 138. One or more wavefronts are included in a “work group,” which includes a collection of work-items designated to execute the same program. In some implementations, a work group is executed by executing each of the wavefronts that make up the work group. In alternatives, the wavefronts are executed on a single SIMD unit 138 or on different SIMD units 138. In some implementations, wavefronts are the largest collection of work-items that are executed simultaneously (or pseudo-simultaneously) on a single SIMD unit 138. “Pseudo-simultaneous” execution occurs in the case of a wavefront that is larger than the number of lanes in a SIMD unit 138. In such a situation, wavefronts are executed over multiple cycles, with different collections of the work-items being executed in different cycles. An APD scheduler 136 is configured to perform operations related to scheduling various workgroups and wavefronts on compute units 132 and SIMD units 138.
The parallelism afforded by the compute units 132 is suitable for graphics related operations such as pixel value calculations, vertex transformations, and other graphics operations. Thus in some instances, a graphics pipeline 134, which accepts graphics processing commands from the processor 102, provides computation tasks to the compute units 132 for execution in parallel.
The compute units 132 are also used to perform computation tasks not related to graphics or not performed as part of the “normal” operation of a graphics pipeline 134 (e.g., custom operations performed to supplement processing performed for operation of the graphics pipeline 134). An application 126 or other software executing on the processor 102 transmits programs that define such computation tasks to the APD 116 for execution.
The APD 116, including the compute units 132, implements ray tracing, which is a technique that renders a 3D scene by testing for intersection between simulated light rays and objects in a scene. In some implementations, much of the work involved in ray tracing is performed by programmable shader programs, executed on the SIMD units 138 in the compute units 132, as described in additional detail below.
In examples, traversal through the ray tracing pipeline 300 is performed partially or fully by the scheduler 136, either autonomously or under control of the processor 102, or partially or fully by a shader program (such as a bounding volume hierarchy traversal shader program) executing on one or more of the SIMD units 138. In some examples, testing a ray against boxes and triangles (inside the acceleration structure traversal stage 304) is hardware accelerated (meaning that a fixed function hardware unit performs the steps for those tests). In other examples, such testing is performed by software such as a shader program executing on one or more SIMD units 138. Herein, where the phrase “the ray tracing pipeline does [a thing]” is used, this means that the hardware and/or software that implements the ray tracing pipeline 300 does that thing. Although described as executing on the SIMD unit 138 of
In some modes of operation, the ray tracing pipeline 300 operates in the following manner. A ray generation shader 302 is executed. The ray generation shader 302 sets up data for a ray to test against a triangle or scene that includes a collection of triangles and requests that the acceleration structure traversal stage 304 test the ray for intersection with triangles.
The acceleration structure traversal stage 304 traverses an acceleration structure, which is a data structure that describes a scene and objects within the scene, and tests the ray against triangles in the scene. In some examples, during this traversal, for triangles that are intersected by the ray, the ray tracing pipeline 300 triggers execution of an any hit shader 306 and/or an intersection shader 307 if those shaders are specified by the material of the intersected triangle. Note that multiple triangles can be intersected by a single ray. It is not guaranteed that the acceleration structure traversal stage will traverse the acceleration structure in the order from closest-to-ray-origin to farthest-from-ray-origin. In some examples, the acceleration structure traversal stage 304 triggers execution of a closest hit shader 310 for the triangle closest to the origin of the ray that the ray hits, or, if no triangles were hit, triggers a miss shader.
Note, it is possible for the any hit shader 306 or intersection shader 307 to “reject” an intersection from the acceleration structure traversal stage 304, and thus the acceleration structure traversal stage 304 triggers execution of the miss shader 312 if no intersections are found to occur with the ray or if one or more intersections are found but are all rejected by the any hit shader 306 and/or intersection shader 307. An example circumstance in which an any hit shader 306 “rejects” a hit is when at least a portion of a triangle that the acceleration structure traversal stage 304 reports as being hit is fully transparent (“invisible”). In an example, the acceleration structure traversal stage 304 tests geometry and not transparency. Thus, in these examples, the any hit shader 306 that is invoked due to an intersection with a triangle having at least some transparency sometimes determines that the reported intersection should not count as a hit due to “intersecting” a transparent portion of the triangle. A typical use for the closest hit shader 310 is to color a ray based on a texture for the material. A typical use for the miss shader 312 is to color a ray with a color set by a skybox. It should be understood that, in various implementations, the shader programs defined for the closest hit shader 310 and miss shader 312 implement a wide variety of techniques for coloring rays and/or performing other operations.
A typical way in which ray generation shaders 302 generate rays is with a technique referred to as backwards ray tracing. In backwards ray tracing, the ray generation shader 302 generates a ray having an origin at the point of the camera. The point at which the ray intersects a plane defined to correspond to the screen defines the pixel on the screen whose color the ray is being used to determine. If the ray hits an object, that pixel is colored based on the closest hit shader 310. If the ray does not hit an object, the pixel is colored based on the miss shader 312. It is possible for multiple rays to be cast per pixel, with the final color of the pixel being determined by some combination of the colors determined for each of the rays of the pixel.
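The camera-ray setup described above can be sketched as follows. This is a minimal pinhole-camera model assumed for illustration, not the contents of any particular ray generation shader 302; the field-of-view convention and names are assumptions:

```python
import math

# Backwards ray tracing setup: one ray per pixel, with its origin at the
# camera and its direction passing through that pixel's position on the
# image plane (the plane defined to correspond to the screen).
def generate_ray(px, py, width, height, fov_deg=90.0):
    # Map the pixel center to normalized device coordinates in [-1, 1].
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) / 2.0)
    x = (2.0 * (px + 0.5) / width - 1.0) * aspect * scale
    y = (1.0 - 2.0 * (py + 0.5) / height) * scale
    # Camera sits at the origin looking down -z; normalize the direction.
    dx, dy, dz = x, y, -1.0
    inv_len = 1.0 / math.sqrt(dx * dx + dy * dy + dz * dz)
    origin = (0.0, 0.0, 0.0)
    direction = (dx * inv_len, dy * inv_len, dz * inv_len)
    return origin, direction
```

Casting several such rays per pixel with sub-pixel offsets, and combining the resulting colors, is one way the multi-ray-per-pixel case described above can be realized.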
It is possible for any of the any hit shader 306, intersection shader 307, closest hit shader 310, and miss shader 312, to spawn their own rays, which enter the ray tracing pipeline 300 at the ray test point. These rays can be used for any purpose. One common use is to implement environmental lighting or reflections. In an example, when a closest hit shader 310 is invoked, the closest hit shader 310 spawns rays in various directions. For each object, or a light, hit by the spawned rays, the closest hit shader 310 adds the lighting intensity and color to the pixel corresponding to the closest hit shader 310. It should be understood that although some examples of ways in which the various components of the ray tracing pipeline 300 are used to render a scene have been described, any of a wide variety of techniques are alternatively used.
As described above, the determination of whether a ray intersects an object is referred to herein as a “ray intersection test.” The ray intersection test involves shooting a ray from an origin and determining whether the ray intersects a geometric primitive (e.g., a triangle) and, if so, what distance from the origin the triangle intersection is at. For efficiency, the ray tracing test uses a representation of space referred to as an acceleration structure, such as a bounding volume hierarchy. In a bounding volume hierarchy, each non-leaf node represents an axis aligned bounding box that bounds the geometry of all children of that node. In an example, the base node represents the maximal extents of an entire region for which the ray intersection test is being performed. In this example, the base node has two children that each typically represent different axis aligned bounding boxes that subdivide the entire region. Each of those two children has two child nodes that represent axis aligned bounding boxes that subdivide the space of their parents, and so on. Leaf nodes represent a triangle or other geometric primitive against which a ray intersection test is performed. A non-leaf node is sometimes referred to as a “box node” herein and a leaf node is sometimes referred to as a “triangle node” herein.
The bounding volume hierarchy data structure allows the number of ray-triangle intersections (which are complex and thus expensive in terms of processing resources) to be reduced as compared with a scenario in which no such data structure were used and therefore all triangles in a scene would have to be tested against the ray. Specifically, if a ray does not intersect a particular bounding box, and that bounding box bounds a large number of triangles, then all triangles in that box are eliminated from the test. Thus, a ray intersection test is performed as a sequence of tests of the ray against axis-aligned bounding boxes, followed by tests against triangles.
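The pruning behavior described above can be sketched in software. The following is a simplified illustrative model, not the acceleration structure traversal stage 304 itself; the dictionary-based node layout is an assumption made for clarity:

```python
# A ray-box "slab" test against an axis-aligned bounding box: the ray
# intersects the box only if the entry and exit intervals along every
# axis overlap at a non-negative distance.
def ray_hits_box(origin, direction, box_min, box_max):
    t_near, t_far = -float("inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if not (lo <= o <= hi):
                return False  # ray is parallel to and outside this slab
        else:
            t0, t1 = (lo - o) / d, (hi - o) / d
            t_near = max(t_near, min(t0, t1))
            t_far = min(t_far, max(t0, t1))
    return t_near <= t_far and t_far >= 0.0

# BVH traversal: if the ray misses a box node, every primitive bounded
# by that box is eliminated without an individual ray-triangle test.
def traverse(node, origin, direction, tested):
    if "prim" in node:
        tested.append(node["prim"])  # leaf: a real tracer intersects here
        return
    if ray_hits_box(origin, direction, node["min"], node["max"]):
        for child in node["children"]:
            traverse(child, origin, direction, tested)
```

In this sketch, a ray that misses a box node never descends into its subtree, which is exactly the savings the bounding volume hierarchy provides over testing every triangle in the scene.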
The spatial representation 402 of the bounding volume hierarchy is illustrated in the left side of
It is possible to render a geometrically complex object by representing that object by a large number of detailed polygons. An alternative technique is a technique in which simple geometry, such as a single primitive, is rendered with an opacity texture that indicates which portions of that primitive are considered opaque and which portions are considered invisible. This alternative technique has the benefit that a much smaller amount of geometry is processed.
In a corresponding rendered image 506, colored pixels 510 corresponding to the leaf are shown, and empty pixels 508 are shown for the portions outside of the leaf. Rendering the primitive 502 with the opacity texture 504 results in pixels 510 corresponding to the opaque portions of the primitive 502, but no pixels corresponding to the invisible portions of the primitive 502. The portions of the render target corresponding to the empty pixels 508 might be colored by other rendering.
In the example application, the ray tracing pipeline 300 casts the ray to determine what color to display for the ray. To make this determination, the ray tracing pipeline 300 executes an any hit shader 306 to identify 610 all hits with primitives 602. For each such hit, the ray tracing pipeline 300 (e.g., within the any hit shader 306) evaluates whether the position of the hit in the opacity texture is considered opaque or invisible 612. After all hits on primitives have been identified with the any hit shaders 306 and opacity has been evaluated for each such hit primitive, a closest hit shader 310 examines the group of hit primitives for which the hit is opaque to determine which such hit is the closest hit 614. Then, the closest hit shader 310 determines a color for that hit (which can be done through any technically feasible means such as applying a texture and lighting and performing other steps).
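The flow above can be sketched as follows. This is an illustrative software model of the any hit/closest hit interaction, not actual shader code; the opacity texture is assumed, for simplicity, to be a 2D grid of booleans, and all names are hypothetical:

```python
# Any-hit evaluation: map the normalized (u, v) coordinates of a hit to
# a texel of the opacity texture and report whether it is opaque.
def is_opaque(opacity_texture, u, v):
    rows, cols = len(opacity_texture), len(opacity_texture[0])
    r = min(int(v * rows), rows - 1)
    c = min(int(u * cols), cols - 1)
    return opacity_texture[r][c]

# Closest-hit selection: hits landing on invisible texels are rejected
# (as an any hit shader would reject them), and the nearest remaining
# hit wins. Each hit is a (distance, u, v) tuple.
def closest_opaque_hit(hits, opacity_texture):
    accepted = [h for h in hits if is_opaque(opacity_texture, h[1], h[2])]
    return min(accepted, default=None)  # None models the miss case
```

Note that in this model every candidate hit incurs an opacity evaluation, which mirrors why reducing the number of any hit shader invocations, as discussed next, improves performance.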
The above technique for determining a closest hit for primitives that have an opacity texture is fairly expensive in terms of processing time. For example, multiple instances of the any hit shader 306 are executed. Thus, reducing the number of instances of the any hit shader 306 that are executed would improve performance.
The portion of a particular primitive 602 that is considered opaque by the opacity texture 604 is sometimes quite small. For instance, in primitive 602(1), only a central region is considered opaque. Similarly, for the other illustrated primitives 602 of
The BVH builder 801 accepts scene geometry 805 and generates a bounding volume hierarchy 803. The scene geometry 805 includes primitives that describe a scene, which is provided by an application or other entity. The bounding volume hierarchy (“BVH”) 803 is similar to the bounding volume hierarchy 404 of
Referring to
The BVH builder 801 uses any technically feasible technique to generate the initial BVH 807. In an example, the BVH builder 801 builds the initial BVH 807 by iteratively geometrically subdividing the geometry of the scene (e.g., by bisecting the bounding box of the scene along a particular axis). Each subdivision results in a different bounding box, which the BVH builder 801 sets as a box node 802. The BVH builder 801 uses certain criteria, such as a maximum number of primitives in a box node 802 or a maximum depth in the BVH 807, to determine which box nodes 802 the leaf nodes 804 are parented to. For example, the BVH builder 801 makes a box node 802 whose bounding box contains a maximum of two primitives the parent of the leaf nodes 804 for those two primitives. The result is a set of box nodes 802, each of which points to either one or more other box nodes 802 or one or more other triangle nodes 804 as illustrated.
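An iterative-subdivision build of this kind can be sketched as follows. This is a deliberately simplified model, not the BVH builder 801's actual algorithm: primitives are reduced to (position, name) pairs along a single split axis, and the leaf-parenting criterion is a maximum primitive count:

```python
# Top-down build: bisect the primitive set along the split axis until a
# box node bounds at most MAX_PRIMS_PER_LEAF primitives, at which point
# that box node becomes the parent of one leaf per primitive.
MAX_PRIMS_PER_LEAF = 2

def build_bvh(prims):
    if len(prims) <= MAX_PRIMS_PER_LEAF:
        # Box node that parents the leaf nodes for these primitives.
        return {"leaves": [name for _, name in prims]}
    ordered = sorted(prims)          # order primitives along the axis
    mid = len(ordered) // 2          # bisect the region
    return {"children": [build_bvh(ordered[:mid]), build_bvh(ordered[mid:])]}
```

A real builder would track the actual axis-aligned bounding boxes and alternate or choose split axes; the recursion structure, however, matches the description above.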
The BVH builder 801 generates a refined BVH 809 in the following manner. The BVH builder 801 examines one or more primitives associated with the leaf nodes 804 of the initial BVH 807. The BVH builder 801 divides the primitives associated with one or more such leaf nodes 804 into smaller primitives as shown in
To generate the refined BVH 809, the BVH builder 801 adds box nodes 806 into the tree of the initial BVH 807. The added box nodes 806 have associated bounding boxes that are smaller than the bounding boxes of the non-divided primitives 804. Moreover, the added box nodes 806 have bounding boxes that bound the opaque primitive portions 704, but not the invisible portions 706. In other words, the added box nodes 806 have bounding boxes that encompass an area that is smaller than the primitives that are divided.
In some implementations, the leaf nodes 804 remain the same as in the initial BVH 807. More specifically, instead of replacing the leaf nodes 804, which point to undivided primitives 700, with leaf nodes that bound only the opaque portions 704, and having the added box nodes 806 point to these replaced leaf nodes, the leaf nodes 804 that correspond to the undivided primitives 700 remain in the refined BVH 809. The added box nodes 806 point to these original leaf nodes 804. It is possible for multiple added box nodes 806 to point to a single such leaf node 804, as illustrated. This occurs because it is possible for the bounding boxes corresponding to the added box nodes 806 to bound an area that is smaller than a particular undivided primitive 700. Because it is possible for multiple such added box nodes 806 to exist in the refined BVH 809 for a single undivided primitive 700, it is possible for the refined BVH 809 to include multiple box nodes 806 that point to the same undivided primitive 700.
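The refinement described above can be sketched as follows, under an assumed dictionary-based node layout (this is an illustration, not the BVH builder 801's data format): for each leaf whose primitive has been divided into portions classified by the opacity texture, one added box node is emitted per opaque portion, and every added box node points back at the same original leaf.

```python
# Emit added box nodes for a divided primitive. Each portion is a
# (bounds, opaque_flag) pair; invisible portions get no node at all,
# so traversal of a ray that only crosses invisible area never reaches
# the leaf. All added nodes share the one original leaf node.
def refine_leaf(leaf, portions):
    added = []
    for bounds, opaque in portions:
        if opaque:
            added.append({"bounds": bounds, "child": leaf})
    return added
```

Because the leaf itself is shared rather than duplicated, the refined hierarchy stores the undivided primitive only once, however many added box nodes reference it.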
The smaller box nodes 806 provide the benefit that a smaller number of any hit shader instances are executed. This occurs because the smaller box nodes 806 result in fewer intersection tests with leaf nodes. More specifically, because the added box nodes 806 do not exist for the invisible portions 706, BVH traversal for a ray that intersects such invisible portions 706 does not reach the leaf node 804 for the undivided primitive 700 corresponding to those invisible portions 706. Keeping the undivided primitives 700 in the refined BVH 809, rather than adding the divided primitives (e.g., opaque primitive portions 704), results in a smaller amount of data being required for the refined BVH 809. The added box nodes consume less storage than the divided primitives would.
In some implementations, the BVH builder 801 is configured to generate the refined BVH 809 in the following manner. For each box node 802 in the initial BVH 807 that is the parent of a leaf node 804, the BVH builder 801 generates an added bounding box 806 for one or more of the opaque primitive portions 704 of the leaf node 804. The BVH builder 801 sets the parent of each such added bounding box 806 to the box node 802 that is the parent of the leaf node 804. The BVH builder 801 also sets the parent of that leaf node 804 to each such added bounding box 806 that corresponds to that leaf node 804. The BVH builder 801 modifies the box node 802 so that the box node 802 is no longer the parent of the leaf node 804.
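The re-parenting steps above can be sketched as a single operation, again under an assumed dictionary-based node layout used only for illustration:

```python
# Splice added box nodes between a parent box node and a leaf node:
# each added box node becomes a parent of the leaf, and the parent box
# node's direct edge to the leaf is replaced by edges to the added boxes.
def reparent(parent_box, leaf, added_boxes):
    for box in added_boxes:
        box["children"] = [leaf]          # added box parents the leaf
    # The parent box node no longer points at the leaf directly...
    parent_box["children"] = [c for c in parent_box["children"] if c is not leaf]
    # ...and instead points at the added box nodes.
    parent_box["children"].extend(added_boxes)
```

After this operation, a ray only reaches the leaf by first hitting one of the added, tighter bounding boxes.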
In the example of
Although it has been described that an initial BVH 807 is generated and then converted to a refined BVH 809, it is also possible for the BVH builder 801 to generate the refined BVH 809 directly, without first creating an initial BVH 807. Any such generated BVH 809 would have one or more box nodes 806 having corresponding bounding boxes that bound opaque primitive portions and exclude at least a portion of a primitive of the scene that is not considered opaque according to an opacity texture. In addition, the generated BVH 809 would include leaf nodes 804 corresponding to the original primitives, where the box nodes 806 point to such leaf nodes 804. In addition, in some instances, multiple of the box nodes 806 of such a generated BVH 809 would point to a single such leaf node 804.
During ray tracing, traversal of the refined BVH 809 occurs in a similar manner as described elsewhere herein. For example, a ray tracing pipeline 300 would traverse the BVH nodes, including box nodes and leaf nodes, performing an intersection test for a ray against such nodes. For box nodes, a failed intersection test eliminates children of that box node from consideration. For leaf nodes, the result of the intersection test determines whether the ray intersects the corresponding leaf node. The technique described with respect to
It should be understood that when the phrase “the ray tracing pipeline 300 performs an action” is used, it means that the hardware, software, or combination of hardware and software that implements the ray tracing pipeline 300 performs those steps.
The method 900 begins at step 902, where a BVH builder 801 divides one or more primitives of a scene to generate primitive portions. The primitives that are divided are designated as having an associated opacity texture.
At step 904, the BVH builder 801 identifies, from the primitive portions, opaque primitive portions, and invisible primitive portions. Opaque primitive portions are portions designated as opaque by the opacity texture. Invisible primitive portions are portions designated as invisible by the opacity texture.
At step 906, the BVH builder 801 generates box nodes corresponding to the opaque primitive portions but not the invisible primitive portions. In an example, the BVH builder 801 generates one box node for each primitive portion. The box nodes generated in this manner are assigned a bounding box that bounds the corresponding opaque primitive portion.
At step 908, the BVH builder 801 inserts the generated box nodes into a bounding volume hierarchy, with the generated box nodes being parents of the leaf node corresponding to the original undivided primitive. In some examples, the BVH builder 801 modifies the box node that pointed to the primitive to instead point to one or more box nodes generated based on that primitive. In addition, in some examples, the BVH builder 801 modifies the box node that pointed to the primitive to no longer point to that primitive.
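The method 900 as a whole can be sketched end-to-end under the same kind of assumed node layout. The `divide` and `classify` callbacks stand in for the primitive-division step and the opacity-texture lookup, respectively; all names are illustrative:

```python
# End-to-end sketch of method 900: divide the textured primitive
# (step 902), classify the portions via the opacity texture (step 904),
# generate box nodes for opaque portions only (step 906), and splice
# them into the hierarchy above the original leaf (step 908).
def apply_method_900(parent_box, leaf, divide, classify):
    portions = divide(leaf["prim"])                          # step 902
    opaque = [p for p in portions if classify(p)]            # step 904
    added = [{"bounds": p, "children": [leaf]} for p in opaque]  # step 906
    parent_box["children"].remove(leaf)                      # step 908
    parent_box["children"].extend(added)
    return added
```

Note that no node of any kind is generated for the invisible portions, which is the source of the traversal savings described earlier.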
It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element can be used alone without the other features and elements or in various combinations with or without other features and elements.
The methods provided can be implemented in a general purpose computer, a processor, or a processor core. Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), and/or a state machine. Such processors can be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data including netlists (such instructions capable of being stored on a computer readable media). The results of such processing can be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements features of the disclosure.
The methods or flow charts provided herein can be implemented in a computer program, software, or firmware incorporated in a non-transitory computer-readable storage medium for execution by a general purpose computer or a processor. Examples of non-transitory computer-readable storage mediums include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).